210 Course PDF

This document contains lecture notes on thermodynamics and statistical mechanics. It includes sections covering fundamentals of probability, thermodynamics, heat engines, entropy, thermodynamic potentials, Maxwell relations, and applications of thermodynamics. The notes are intended as a work in progress by the author for a course at the University of California, San Diego.

Lecture Notes on Thermodynamics and Statistical Mechanics

(A Work in Progress)

Daniel Arovas
Department of Physics
University of California, San Diego

December 10, 2019


Contents

Contents i

List of Tables xiv

List of Figures xv

0.1 Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

0.2 General references . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1 Fundamentals of Probability 3

1.1 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.2 Statistical Properties of Random Walks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.2.1 One-dimensional random walk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.2.2 Thermodynamic limit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

1.2.3 Entropy and energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.3 Basic Concepts in Probability Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.3.1 Fundamental definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.3.2 Bayesian statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

1.3.3 Random variables and their averages . . . . . . . . . . . . . . . . . . . . . . . . . 10

1.4 Entropy and Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

1.4.1 Entropy and information theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

1.4.2 Probability distributions from maximum entropy . . . . . . . . . . . . . . . . . . 13

1.4.3 Continuous probability distributions . . . . . . . . . . . . . . . . . . . . . . . . . . 17

1.5 General Aspects of Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . . . 18

1.5.1 Discrete and continuous distributions . . . . . . . . . . . . . . . . . . . . . . . . . 18

1.5.2 Central limit theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

1.5.3 Moments and cumulants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

1.5.4 Multidimensional Gaussian integral . . . . . . . . . . . . . . . . . . . . . . . . . . 22

1.6 Bayesian Statistical Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

1.6.1 Frequentists and Bayesians . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

1.6.2 Updating Bayesian priors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

1.6.3 Hyperparameters and conjugate priors . . . . . . . . . . . . . . . . . . . . . . . . 25

1.6.4 The problem with priors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

2 Thermodynamics 29

2.1 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

2.2 What is Thermodynamics? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

2.2.1 Thermodynamic systems and state variables . . . . . . . . . . . . . . . . . . . . . 30

2.2.2 Heat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

2.2.3 Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

2.2.4 Pressure and Temperature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

2.2.5 Standard temperature and pressure . . . . . . . . . . . . . . . . . . . . . . . . . . 35

2.3 The Zeroth Law of Thermodynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

2.4 Mathematical Interlude : Exact and Inexact Differentials . . . . . . . . . . . . . . . . . . 37

2.5 The First Law of Thermodynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

2.5.1 Conservation of energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

2.5.2 Single component systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

2.5.3 Ideal gases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

2.5.4 Adiabatic transformations of ideal gases . . . . . . . . . . . . . . . . . . . . . . . 44

2.5.5 Adiabatic free expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

2.6 Heat Engines and the Second Law of Thermodynamics . . . . . . . . . . . . . . . . . . . 47

2.6.1 There’s no free lunch so quit asking . . . . . . . . . . . . . . . . . . . . . . . . . . 47

2.6.2 Engines and refrigerators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

2.6.3 Nothing beats a Carnot engine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

2.6.4 The Carnot cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

2.6.5 The Stirling cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

2.6.6 The Otto and Diesel cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

2.6.7 The Joule-Brayton cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

2.6.8 Carnot engine at maximum power output . . . . . . . . . . . . . . . . . . . . . . . 58

2.7 The Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

2.7.1 Entropy and heat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

2.7.2 The Third Law of Thermodynamics . . . . . . . . . . . . . . . . . . . . . . . . . . 60

2.7.3 Entropy changes in cyclic processes . . . . . . . . . . . . . . . . . . . . . . . . . . 61

2.7.4 Gibbs-Duhem relation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

2.7.5 Entropy for an ideal gas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

2.7.6 Example system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

2.7.7 Measuring the entropy of a substance . . . . . . . . . . . . . . . . . . . . . . . . . 66

2.8 Thermodynamic Potentials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

2.8.1 Energy E . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

2.8.2 Helmholtz free energy F . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

2.8.3 Enthalpy H . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

2.8.4 Gibbs free energy G . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

2.8.5 Grand potential Ω . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

2.9 Maxwell Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

2.9.1 Relations deriving from E(S, V, N) . . . . . . . . . . . . . . . . . . . . . . . . . . 70

2.9.2 Relations deriving from F(T, V, N) . . . . . . . . . . . . . . . . . . . . . . . . . . 70

2.9.3 Relations deriving from H(S, p, N) . . . . . . . . . . . . . . . . . . . . . . . . . . 71

2.9.4 Relations deriving from G(T, p, N) . . . . . . . . . . . . . . . . . . . . . . . . . . 71

2.9.5 Relations deriving from Ω(T, V, µ) . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

2.9.6 Generalized thermodynamic potentials . . . . . . . . . . . . . . . . . . . . . . . . 73

2.10 Equilibrium and Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

2.10.1 Equilibrium . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

2.10.2 Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

2.11 Applications of Thermodynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

2.11.1 Adiabatic free expansion revisited . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

2.11.2 Energy and volume . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

2.11.3 van der Waals equation of state . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

2.11.4 Thermodynamic response functions . . . . . . . . . . . . . . . . . . . . . . . . . . 80

2.11.5 Joule effect: free expansion of a gas . . . . . . . . . . . . . . . . . . . . . . . . . . 83

2.11.6 Throttling: the Joule-Thomson effect . . . . . . . . . . . . . . . . . . . . . . . . 84

2.12 Phase Transitions and Phase Equilibria . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

2.12.1 p-v-T surfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

2.12.2 The Clausius-Clapeyron relation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

2.12.3 Liquid-solid line in H₂O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

2.12.4 Slow melting of ice : a quasistatic but irreversible process . . . . . . . . . . . . . . 92

2.12.5 Gibbs phase rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

2.13 Entropy of Mixing and the Gibbs Paradox . . . . . . . . . . . . . . . . . . . . . . . . . . 96

2.13.1 Computing the entropy of mixing . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

2.13.2 Entropy and combinatorics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

2.13.3 Weak solutions and osmotic pressure . . . . . . . . . . . . . . . . . . . . . . . . . 99

2.13.4 Effect of impurities on boiling and freezing points . . . . . . . . . . . . . . . . . . 101

2.13.5 Binary solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

2.14 Some Concepts in Thermochemistry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

2.14.1 Chemical reactions and the law of mass action . . . . . . . . . . . . . . . . . . . . 111

2.14.2 Enthalpy of formation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

2.14.3 Bond enthalpies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

2.15 Appendix I : Integrating Factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

2.16 Appendix II : Legendre Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

2.17 Appendix III : Useful Mathematical Relations . . . . . . . . . . . . . . . . . . . . . . . . 122

3 Ergodicity and the Approach to Equilibrium 127

3.1 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

3.2 Modeling the Approach to Equilibrium . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

3.2.1 Equilibrium . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

3.2.2 The Master Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

3.2.3 Equilibrium distribution and detailed balance . . . . . . . . . . . . . . . . . . . . 129

3.2.4 Boltzmann’s H-theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

3.3 Phase Flows in Classical Mechanics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

3.3.1 Hamiltonian evolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

3.3.2 Dynamical systems and the evolution of phase space volumes . . . . . . . . . . . 132

3.3.3 Liouville’s equation and the microcanonical distribution . . . . . . . . . . . . . . 135

3.4 Irreversibility and Poincaré Recurrence . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136

3.4.1 Poincaré recurrence theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136

3.4.2 Kac ring model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

3.5 Remarks on Ergodic Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142

3.5.1 Definition of ergodicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142

3.5.2 The microcanonical ensemble . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

3.5.3 Ergodicity and mixing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144

3.6 Thermalization of Quantum Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148

3.6.1 Quantum dephasing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148

3.6.2 Eigenstate thermalization hypothesis . . . . . . . . . . . . . . . . . . . . . . . . . 150

3.6.3 When is the ETH true? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

3.7 Appendix I : Formal Solution of the Master Equation . . . . . . . . . . . . . . . . . . . . 152

3.8 Appendix II : Radioactive Decay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153

3.9 Appendix III: Transition to Ergodicity in a Simple Model . . . . . . . . . . . . . . . . . . 155

4 Statistical Ensembles 165

4.1 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

4.2 Microcanonical Ensemble (µCE) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

4.2.1 The microcanonical distribution function . . . . . . . . . . . . . . . . . . . . . . . 166

4.2.2 Density of states . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167

4.2.3 Arbitrariness in the definition of S(E) . . . . . . . . . . . . . . . . . . . . . . . . 170

4.2.4 Ultra-relativistic ideal gas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170

4.2.5 Discrete systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170

4.3 The Quantum Mechanical Trace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171

4.3.1 The density matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171

4.3.2 Averaging the DOS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172

4.3.3 Coherent states . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173

4.4 Thermal Equilibrium . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175

4.4.1 Two systems in thermal contact . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175

4.4.2 Thermal, mechanical and chemical equilibrium . . . . . . . . . . . . . . . . . . . . 177

4.4.3 Gibbs-Duhem relation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177

4.5 Ordinary Canonical Ensemble (OCE) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178

4.5.1 Canonical distribution and partition function . . . . . . . . . . . . . . . . . . . . . 178

4.5.2 The difference between P(Eₙ) and Pₙ . . . . . . . . . . . . . . . . . . . . . . . . . 179

4.5.3 Additional remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179

4.5.4 Averages within the OCE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180

4.5.5 Entropy and free energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180

4.5.6 Fluctuations in the OCE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181

4.5.7 Thermodynamics revisited . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182

4.5.8 Generalized susceptibilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184

4.6 Grand Canonical Ensemble (GCE) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185

4.6.1 Grand canonical distribution and partition function . . . . . . . . . . . . . . . . . 185

4.6.2 Entropy and Gibbs-Duhem relation . . . . . . . . . . . . . . . . . . . . . . . . . . 186

4.6.3 Generalized susceptibilities in the GCE . . . . . . . . . . . . . . . . . . . . . . . . 187

4.6.4 Fluctuations in the GCE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187

4.6.5 Gibbs ensemble . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188

4.7 Statistical Ensembles from Maximum Entropy . . . . . . . . . . . . . . . . . . . . . . . . 189

4.7.1 µCE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189

4.7.2 OCE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190

4.7.3 GCE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190

4.8 Ideal Gas Statistical Mechanics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191

4.8.1 Maxwell velocity distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192

4.8.2 Equipartition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193

4.9 Selected Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195

4.9.1 Spins in an external magnetic field . . . . . . . . . . . . . . . . . . . . . . . . . . 195

4.9.2 Negative temperature (!) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196

4.9.3 Adsorption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198

4.9.4 Elasticity of wool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198

4.9.5 Noninteracting spin dimers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201

4.10 Statistical Mechanics of Molecular Gases . . . . . . . . . . . . . . . . . . . . . . . . . . . 202

4.10.1 Separation of translational and internal degrees of freedom . . . . . . . . . . . . . 202

4.10.2 Ideal gas law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204

4.10.3 The internal coordinate partition function . . . . . . . . . . . . . . . . . . . . . . 204

4.10.4 Rotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204

4.10.5 Vibrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207

4.10.6 Two-level systems : Schottky anomaly . . . . . . . . . . . . . . . . . . . . . . . . 207

4.10.7 Electronic and nuclear excitations . . . . . . . . . . . . . . . . . . . . . . . . . . . 209

4.11 Appendix I : Additional Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

4.11.1 Three state system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

4.11.2 Spins and vacancies on a surface . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212

4.11.3 Fluctuating interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214

4.11.4 Dissociation of molecular hydrogen . . . . . . . . . . . . . . . . . . . . . . . . . . 216

5 Noninteracting Quantum Systems 219

5.1 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219

5.2 Statistical Mechanics of Noninteracting Quantum Systems . . . . . . . . . . . . . . . . . . 220

5.2.1 Bose and Fermi systems in the grand canonical ensemble . . . . . . . . . . . . . . 220

5.2.2 Quantum statistics and the Maxwell-Boltzmann limit . . . . . . . . . . . . . . . . 221

5.2.3 Single particle density of states . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223

5.3 Quantum Ideal Gases : Low Density Expansions . . . . . . . . . . . . . . . . . . . . . . . 224

5.3.1 Expansion in powers of the fugacity . . . . . . . . . . . . . . . . . . . . . . . . . . 224

5.3.2 Virial expansion of the equation of state . . . . . . . . . . . . . . . . . . . . . . . 225

5.3.3 Ballistic dispersion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227

5.4 Entropy and Counting States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227

5.5 Photon Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229

5.5.1 Thermodynamics of the photon gas . . . . . . . . . . . . . . . . . . . . . . . . . . 229

5.5.2 Classical arguments for the photon gas . . . . . . . . . . . . . . . . . . . . . . . . 231

5.5.3 Surface temperature of the earth . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232

5.5.4 Distribution of blackbody radiation . . . . . . . . . . . . . . . . . . . . . . . . . . 233

5.5.5 What if the sun emitted ferromagnetic spin waves? . . . . . . . . . . . . . . . . . 234

5.6 Lattice Vibrations : Einstein and Debye Models . . . . . . . . . . . . . . . . . . . . . . . 235

5.6.1 One-dimensional chain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235

5.6.2 General theory of lattice vibrations . . . . . . . . . . . . . . . . . . . . . . . . . . 237

5.6.3 Einstein and Debye models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240

5.6.4 Melting and the Lindemann criterion . . . . . . . . . . . . . . . . . . . . . . . . . 242

5.6.5 Goldstone bosons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245

5.7 The Ideal Bose Gas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247

5.7.1 General formulation for noninteracting systems . . . . . . . . . . . . . . . . . . . 247

5.7.2 Ballistic dispersion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248

5.7.3 Isotherms for the ideal Bose gas . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252

5.7.4 The λ-transition in Liquid ⁴He . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253

5.7.5 Fountain effect in superfluid ⁴He . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254

5.7.6 Bose condensation in optical traps . . . . . . . . . . . . . . . . . . . . . . . . . . . 256

5.8 The Ideal Fermi Gas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259

5.8.1 Grand potential and particle number . . . . . . . . . . . . . . . . . . . . . . . . . 259

5.8.2 The Fermi distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259

5.8.3 T = 0 and the Fermi surface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259

5.8.4 The Sommerfeld expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263

5.8.5 Magnetic susceptibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267

5.8.6 Moment formation in interacting itinerant electron systems . . . . . . . . . . . . . 270

5.8.7 White dwarf stars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279

5.9 Appendix I : Second Quantization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281

5.9.1 Basis states and creation/annihilation operators . . . . . . . . . . . . . . . . . . . 281

5.9.2 Second quantized operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283

5.10 Appendix II : Ideal Bose Gas Condensation . . . . . . . . . . . . . . . . . . . . . . . . . . 284

5.11 Appendix III : Example Bose Condensation Problem . . . . . . . . . . . . . . . . . . . . . 286

6 Classical Interacting Systems 289

6.1 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289

6.2 Ising Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290

6.2.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290

6.2.2 Ising model in one dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290

6.2.3 Zero external field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291

6.2.4 Chain with free ends . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292

6.2.5 Ising model in two dimensions : Peierls’ argument . . . . . . . . . . . . . . . . . . 293

6.2.6 Two dimensions or one? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297

6.2.7 High temperature expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298

6.3 Nonideal Classical Gases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301

6.3.1 The configuration integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301

6.3.2 One-dimensional Tonks gas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301

6.3.3 Mayer cluster expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303

6.3.4 Lowest order expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307

6.3.5 One-particle irreducible clusters and the virial expansion . . . . . . . . . . . . . . 309

6.3.6 Cookbook recipe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311

6.3.7 Hard sphere gas in three dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . 312

6.3.8 Weakly attractive tail . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313

6.3.9 Spherical potential well . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313

6.3.10 Hard spheres with a hard wall . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315

6.4 Lee-Yang Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318

6.4.1 Analytic properties of the partition function . . . . . . . . . . . . . . . . . . . . . 318

6.4.2 Electrostatic analogy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319

6.4.3 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320

6.5 Liquid State Physics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322

6.5.1 The many-particle distribution function . . . . . . . . . . . . . . . . . . . . . . . . 322

6.5.2 Averages over the distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323

6.5.3 Virial equation of state . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326

6.5.4 Correlations and scattering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328

6.5.5 Correlation and response . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331

6.5.6 BBGKY hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333

6.5.7 Ornstein-Zernike theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335

6.5.8 Percus-Yevick equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335

6.5.9 Ornstein-Zernike approximation at long wavelengths . . . . . . . . . . . . . . . . 337

6.6 Coulomb Systems : Plasmas and the Electron Gas . . . . . . . . . . . . . . . . . . . . . . 339

6.6.1 Electrostatic potential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339

6.6.2 Debye-Hückel theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340

6.6.3 The electron gas : Thomas-Fermi screening . . . . . . . . . . . . . . . . . . . . . . 342

6.7 Polymers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345

6.7.1 Basic concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345

6.7.2 Polymers as random walks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346

6.7.3 Flory theory of self-avoiding walks . . . . . . . . . . . . . . . . . . . . . . . . . . . 353

6.7.4 Polymers and solvents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354

6.8 Appendix I : Potts Model in One Dimension . . . . . . . . . . . . . . . . . . . . . . . . . 356

6.8.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356

6.8.2 Transfer matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356

7 Mean Field Theory of Phase Transitions 361

7.1 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361

7.2 The van der Waals system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362

7.2.1 Equation of state . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362

7.2.2 Analytic form of the coexistence curve near the critical point . . . . . . . . . . . . 365

7.2.3 History of the van der Waals equation . . . . . . . . . . . . . . . . . . . . . . . . 368

7.3 Fluids, Magnets, and the Ising Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370

7.3.1 Lattice gas description of a fluid . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370

7.3.2 Phase diagrams and critical exponents . . . . . . . . . . . . . . . . . . . . . . . . 372

7.3.3 Gibbs-Duhem relation for magnetic systems . . . . . . . . . . . . . . . . . . . . . 374

7.3.4 Order-disorder transitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374

7.4 Mean Field Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376

7.4.1 h = 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377

7.4.2 Specific heat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378

7.4.3 h ≠ 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380

7.4.4 Magnetization dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381

7.4.5 Beyond nearest neighbors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 384

7.4.6 Ising model with long-ranged forces . . . . . . . . . . . . . . . . . . . . . . . . . . 384

7.5 Variational Density Matrix Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386

7.5.1 The variational principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386

7.5.2 Variational density matrix for the Ising model . . . . . . . . . . . . . . . . . . . . 387

7.5.3 Mean Field Theory of the Potts Model . . . . . . . . . . . . . . . . . . . . . . . . 390

7.5.4 Mean Field Theory of the XY Model . . . . . . . . . . . . . . . . . . . . . . . . . 392

7.5.5 XY model via neglect of fluctuations method . . . . . . . . . . . . . . . . . . . . 394

7.6 Landau Theory of Phase Transitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395

7.6.1 Quartic free energy with Ising symmetry . . . . . . . . . . . . . . . . . . . . . . . 395

7.6.2 Cubic terms in Landau theory : first order transitions . . . . . . . . . . . . . . . . 397

7.6.3 Magnetization dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399

7.6.4 Sixth order Landau theory : tricritical point . . . . . . . . . . . . . . . . . . . . . 400

7.6.5 Hysteresis for the sextic potential . . . . . . . . . . . . . . . . . . . . . . . . . . . 403

7.7 Mean Field Theory of Fluctuations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405

7.7.1 Correlation and response in mean field theory . . . . . . . . . . . . . . . . . . . . 405

7.7.2 Calculation of the response functions . . . . . . . . . . . . . . . . . . . . . . . . . 406

7.7.3 Beyond the Ising model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 409

7.8 Global Symmetries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 412

7.8.1 Symmetries and symmetry groups . . . . . . . . . . . . . . . . . . . . . . . . . . . 412

7.8.2 Lower critical dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413

7.8.3 Continuous symmetries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415

7.8.4 Random systems : Imry-Ma argument . . . . . . . . . . . . . . . . . . . . . . . . 416

7.9 Ginzburg-Landau Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 418

7.9.1 Ginzburg-Landau free energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 418

7.9.2 Domain wall profile . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419

7.9.3 Derivation of Ginzburg-Landau free energy . . . . . . . . . . . . . . . . . . . . . . 420

7.9.4 Ginzburg criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424

7.10 Appendix I : Equivalence of the Mean Field Descriptions . . . . . . . . . . . . . . . . . . 426

7.10.1 Variational Density Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427

7.10.2 Mean Field Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428

7.11 Appendix II : Additional Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429

7.11.1 Blume-Capel model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429

7.11.2 Ising antiferromagnet in an external field . . . . . . . . . . . . . . . . . . . . . . . 430



7.11.3 Canted quantum antiferromagnet . . . . . . . . . . . . . . . . . . . . . . . . . . . 434

7.11.4 Coupled order parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436

8 Nonequilibrium Phenomena 441


8.1 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 441

8.2 Equilibrium, Nonequilibrium and Local Equilibrium . . . . . . . . . . . . . . . . . . . . . 442

8.3 Boltzmann Transport Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 444

8.3.1 Derivation of the Boltzmann equation . . . . . . . . . . . . . . . . . . . . . . . . . 444


8.3.2 Collisionless Boltzmann equation . . . . . . . . . . . . . . . . . . . . . . . . . . . 445

8.3.3 Collisional invariants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447

8.3.4 Scattering processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447

8.3.5 Detailed balance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449

8.3.6 Kinematics and cross section . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 450


8.3.7 H-theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451

8.4 Weakly Inhomogeneous Gas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 453

8.5 Relaxation Time Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 455

8.5.1 Approximation of collision integral . . . . . . . . . . . . . . . . . . . . . . . . . . 455


8.5.2 Computation of the scattering time . . . . . . . . . . . . . . . . . . . . . . . . . . 455

8.5.3 Thermal conductivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 456

8.5.4 Viscosity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458

8.5.5 Oscillating external force . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 460

8.5.6 Quick and Dirty Treatment of Transport . . . . . . . . . . . . . . . . . . . . . . . 461


8.5.7 Thermal diffusivity, kinematic viscosity, and Prandtl number . . . . . . . . . . . . 462

8.6 Diffusion and the Lorentz model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463

8.6.1 Failure of the relaxation time approximation . . . . . . . . . . . . . . . . . . . . . 463

8.6.2 Modified Boltzmann equation and its solution . . . . . . . . . . . . . . . . . . . . 464


8.7 Linearized Boltzmann Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466

8.7.1 Linearizing the collision integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466

8.7.2 Linear algebraic properties of L̂ . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467



8.7.3 Steady state solution to the linearized Boltzmann equation . . . . . . . . . . . . . 468


8.7.4 Variational approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469
8.8 The Equations of Hydrodynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 472
8.9 Nonequilibrium Quantum Transport . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473
8.9.1 Boltzmann equation for quantum systems . . . . . . . . . . . . . . . . . . . . . . 473
8.9.2 The Heat Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 478
8.9.3 Calculation of Transport Coefficients . . . . . . . . . . . . . . . . . . . . . . . . . 478
8.9.4 Onsager Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 479
8.10 Stochastic Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481
8.10.1 Langevin equation and Brownian motion . . . . . . . . . . . . . . . . . . . . . . . 481
8.10.2 Langevin equation for a particle in a harmonic well . . . . . . . . . . . . . . . . . 484
8.10.3 Discrete random walk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485
8.10.4 Fokker-Planck equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 486
8.10.5 Brownian motion redux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 488
8.10.6 Master Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
8.11 Appendix I : Boltzmann Equation and Collisional Invariants . . . . . . . . . . . . . . . . 491
8.12 Appendix II : Distributions and Functionals . . . . . . . . . . . . . . . . . . . . . . . . . . 494
8.13 Appendix III : General Linear Autonomous Inhomogeneous ODEs . . . . . . . . . . . . . 496
8.14 Appendix IV : Correlations in the Langevin formalism . . . . . . . . . . . . . . . . . . . . 502
8.15 Appendix V : Kramers-Krönig Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 505
List of Tables

2.1 Specific heat of some common substances. . . . . . . . . . . . . . . . . . . . . . . . . . . . 41


2.2 Performances of real heat engines. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
2.3 Van der Waals parameters for some common gases. . . . . . . . . . . . . . . . . . . . . . . 79
2.4 Latent heats of fusion and vaporization at p = 1 atm. . . . . . . . . . . . . . . . . . . . . . 102
2.5 Enthalpies of formation of some common substances. . . . . . . . . . . . . . . . . . . . . . 114
2.6 Average bond enthalpies for some common bonds. . . . . . . . . . . . . . . . . . . . . . . 116

3.1 Comparison of time and microcanonical averages (I). . . . . . . . . . . . . . . . . . . . . . 161


3.2 Comparison of time and microcanonical averages (II). . . . . . . . . . . . . . . . . . . . . 163

4.1 Rotational and vibrational temperatures of common molecules. . . . . . . . . . . . . . . . . 205


4.2 Nuclear angular momentum states for homonuclear diatomic molecules. . . . . . . . . . . 211

5.1 Debye temperatures and melting points for some common elements. . . . . . . . . . . . . 242

6.1 Exact, Percus-Yevick, and hypernetted chain results for hard spheres. . . . . . . . . . . . 337

7.1 van der Waals parameters for some common gases . . . . . . . . . . . . . . . . . . . . . . 364
7.2 Critical exponents. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381

8.1 Viscosities, thermal conductivities, and Prandtl numbers for some common gases. . . . . . 463

xvi LIST OF TABLES
List of Figures

1 My father and my dog. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1 The falling ball novelty. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5


1.2 Approaching the Gaussian distribution as N → ∞. . . . . . . . . . . . . . . . . . . . . . . 6

2.1 Microscale to macroscale in physics vs. social sciences. . . . . . . . . . . . . . . . . . . . . 31


2.2 Gas pressure as a space-time averaged quantity. . . . . . . . . . . . . . . . . . . . . . . . . 32
2.3 The constant volume gas thermometer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.4 The phase diagram of H₂O. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.5 Convergence of measurements in the gas thermometer. . . . . . . . . . . . . . . . . . . . . 37
2.6 Two distinct paths with identical endpoints. . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.7 The first law of thermodynamics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.8 $C_V$ for one mole of H₂ gas. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.9 Molar heat capacities $c_V$ for three solids. . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.10 Adiabatic free expansion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.11 A perfect engine. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.12 Heat engine and refrigerator. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.13 A wonder engine driving a Carnot refrigerator. . . . . . . . . . . . . . . . . . . . . . . . . 50
2.14 The Carnot cycle. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.15 The Stirling cycle. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.16 The Otto cycle. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.17 The Diesel cycle. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.18 The Joule-Brayton cycle. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

xviii LIST OF FIGURES

2.19 Decomposition of a thermodynamic loop into Carnot cycles. . . . . . . . . . . . . . . . . . 61

2.20 Check for instability. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

2.21 Adiabatic free expansion via a thermal path. . . . . . . . . . . . . . . . . . . . . . . . . . 77

2.22 Throttling. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

2.23 Temperature inversion for the van der Waals gas. . . . . . . . . . . . . . . . . . . . . . . . 86

2.24 Phase diagrams of single component systems. . . . . . . . . . . . . . . . . . . . . . . . . . 87

2.25 p(v, T ) surface for the ideal gas equation of state. . . . . . . . . . . . . . . . . . . . . . . . 88

2.26 A p-v-T surface for a substance which contracts upon freezing. . . . . . . . . . . . . . . . 89

2.27 p − v − T surfaces for a substance which expands upon freezing. . . . . . . . . . . . . . . 90

2.28 Projection of the p-v-T surface onto the (v, p) plane. . . . . . . . . . . . . . . . . . . . . . 91

2.29 Phase diagram for CO₂ in the (p, T) plane. . . . . . . . . . . . . . . . . . . . . . . . . . . 93

2.30 Surface melting data for ice. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

2.31 Multicomponent system. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

2.32 Mixing among three different species of particles. . . . . . . . . . . . . . . . . . . . . . . . 98

2.33 Osmotic pressure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

2.34 Gibbs free energy for a binary solution. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

2.35 Binary systems: chemical potential shifts and (T, µ) phase diagrams. . . . . . . . . . . . . 105

2.36 Phase diagram for the binary system in the (x, T ) plane. . . . . . . . . . . . . . . . . . . . 106

2.37 Gibbs free energy for an ideal binary solution. . . . . . . . . . . . . . . . . . . . . . . . . . 107

2.38 Phase diagrams for azeotropes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

2.39 Binary fluid mixture in equilibrium with a vapor. . . . . . . . . . . . . . . . . . . . . . . . 109

2.40 Eutectic phase diagram. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

2.41 Reaction enthalpy. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

2.42 Reaction enthalpy for hydrogenation of ethene. . . . . . . . . . . . . . . . . . . . . . . . . 117

2.43 The Legendre transformation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

3.1 Time evolution of two immiscible fluids . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

3.2 Projected evolution of a set $R_0$ under the mapping $g_\tau$. . . . . . . . . . . . . . . . . . . . . 137

3.3 Poincaré recurrence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138



3.4 The Kac ring model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

3.5 Simulation of the Kac ring model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140

3.6 More simulations of the Kac ring model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141

3.7 An ergodic flow which is not mixing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144

3.8 The baker’s transformation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

3.9 Multiply iterated baker’s transformation. . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

3.10 Arnold’s cat map. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

3.11 Hierarchy of dynamical systems. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148

3.12 Poincaré sections for the ball and piston problem. . . . . . . . . . . . . . . . . . . . . . . . 157
3.13 Long time averages of $X_{\rm av}(t) \equiv t^{-1}\int_0^t dt'\, X(t')$. . . . . . . . . . . . . . . . . . . . . . . . . 160
4.1 Complex integration contours C for inverse Laplace transform $\mathcal{L}^{-1}\big[Z(\beta)\big] = D(E)$. When the product dN is

4.2 A system S in contact with a ‘world’ W . The union of the two, universe U = W ∪ S, is said to be the ‘univ

4.3 Averaging the quantum mechanical discrete density of states yields a continuous curve. . 173

4.4 Two systems in thermal contact. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176

4.5 Microscopic, statistical interpretation of the First Law of Thermodynamics. . . . . . . . . 183



4.6 Maxwell distribution of speeds $\phi(v/v_0)$. The most probable speed is $v_{\rm MAX} = \sqrt{2}\, v_0$. The average speed is vA

4.7 When entropy decreases with increasing energy, the temperature is negative. Typically, kinetic degrees of fr

4.8 The monomers in wool are modeled as existing in one of two states. The low energy undeformed state is A,

4.9 Upper panel: length L(τ, T ) for kB T /ε̃ = 0.01 (blue), 0.1 (green), 0.5 (dark red), and 1.0 (red). Bottom pan

4.10 A model of noninteracting spin dimers on a lattice. Each red dot represents a classical spin for which σj =

4.11 Heat capacity per molecule as a function of temperature for (a) heteronuclear diatomic gases, (b) a single v

5.1 Partitions of bosonic occupation states. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228

5.2 Spectral density of blackbody radiation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233

5.3 A linear chain of masses and springs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236

5.4 Crystal structure, Bravais lattice, and basis. . . . . . . . . . . . . . . . . . . . . . . . . . . 237

5.5 Phonon spectra. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240

5.6 The polylogarithm function $\mathrm{Li}_s(z)$. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249



5.7 Molar heat capacity of the ideal Bose gas. . . . . . . . . . . . . . . . . . . . . . . . . . . . 251

5.8 Phase diagrams for the ideal Bose gas. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253

5.9 Phase diagram of ⁴He. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254

5.10 Specific heat of liquid ⁴He in the vicinity of the λ-transition. . . . . . . . . . . . . . . . . . 255

5.11 The fountain effect. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256

5.12 The Fermi distribution. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260

5.13 Fermi surfaces for two and three-dimensional structures. . . . . . . . . . . . . . . . . . . . 262

5.14 Deformation of the complex integration contour in eqn. 5.219. . . . . . . . . . . . . . . . . 264

5.15 Fermi distributions in the presence of a magnetic field. . . . . . . . . . . . . . . . . . . . . 268

5.16 A graduate student experiences the Stoner enhancement. . . . . . . . . . . . . . . . . . . 275

5.17 Mean field phase diagram of the Hubbard model, including paramagnetic (P), ferromagnetic (F), and antife

5.18 Mass-radius relationship for white dwarf stars. . . . . . . . . . . . . . . . . . . . . . . . . 280

6.1 Clusters and boundaries for the square lattice Ising model. . . . . . . . . . . . . . . . . . . 294

6.2 A two-dimensional square lattice mapped onto a one-dimensional chain. . . . . . . . . . . 296

6.3 High temperature expansion diagrams. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299

6.4 High temperature expansion for the correlation function. . . . . . . . . . . . . . . . . . . . 300

6.5 The Lennard-Jones potential. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304

6.6 Keeping up with the Joneses. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305

6.7 Diagrams and the Mayer cluster expansion. . . . . . . . . . . . . . . . . . . . . . . . . . . 305

6.8 Vertex labels in the configuration integral. . . . . . . . . . . . . . . . . . . . . . . . . . . . 306

6.9 Symmetry factors for cluster diagrams. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308

6.10 Connected versus irreducible clusters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310

6.11 The overlap of hard sphere Mayer functions. . . . . . . . . . . . . . . . . . . . . . . . . . . 313

6.12 Mayer function for an attractive spherical well with a repulsive core. . . . . . . . . . . . . 315

6.13 Density of hard spheres in the presence of a hard wall. . . . . . . . . . . . . . . . . . . . . 316

6.14 Singularities of the partition function. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319

6.15 Fugacity $z$ and $pv_0/k_B T$ versus dimensionless specific volume $v/v_0$. . . . . . . . . . . . . . 321

6.16 Hard sphere pair distribution functions: simulation and experiment. . . . . . . . . . . . . 325

6.17 Monte Carlo pair distribution functions for liquid water. . . . . . . . . . . . . . . . . . . . 327

6.18 Elastic and inelastic scattering. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329

6.19 Static structure factor of the Lennard-Jones fluid. . . . . . . . . . . . . . . . . . . . . . . . 331

6.20 The Thomas-Fermi atom. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344

6.21 Some examples of linear chain polymers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346

6.22 trans and gauche orientations in carbon chains. . . . . . . . . . . . . . . . . . . . . . . . . 347

6.23 The polymer chain as a random coil. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349

6.24 Radius of gyration $R_g$ of polystyrene in a toluene and benzene solvent. . . . . . . . . . . . 354

7.1 Pressure versus volume for the van der Waals gas. . . . . . . . . . . . . . . . . . . . . . . 363

7.2 Molar free energy f (T, v) of the van der Waals system. . . . . . . . . . . . . . . . . . . . . 365

7.3 Maxwell construction in the (v, p) plane. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366

7.4 Pressure-volume isotherms for the van der Waals system, with Maxwell construction. . . . 368

7.5 Universality of the liquid-gas transition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369

7.6 The lattice gas model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371

7.7 Comparison of liquid-gas and Ising ferromagnet phase diagrams. . . . . . . . . . . . . . . 373

7.8 Order-disorder transition on the square lattice. . . . . . . . . . . . . . . . . . . . . . . . . 375

7.9 Ising mean field theory at h = 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378

7.10 Ising mean field theory at h = 0.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379

7.11 Dissipative magnetization dynamics $\dot{m} = -f'(m)$. . . . . . . . . . . . . . . . . . . . . . . . 382

7.12 Magnetization hysteresis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383

7.13 Variational field free energy versus magnetization. . . . . . . . . . . . . . . . . . . . . . . 389

7.14 Phase diagram for the quartic Landau free energy. . . . . . . . . . . . . . . . . . . . . . . 396

7.15 Behavior of the quartic free energy $f(m) = \frac{1}{2}am^2 - \frac{1}{3}ym^3 + \frac{1}{4}bm^4$. . . . . . . . . . . . . . 398

7.16 Fixed points for $\phi(u) = \frac{1}{2}ru^2 - \frac{1}{3}u^3 + \frac{1}{4}u^4$ and flow $\dot{u} = -\phi'(u)$. . . . . . . . . . . . . . . . 400

7.17 Behavior of the sextic free energy $f(m) = \frac{1}{2}am^2 + \frac{1}{4}bm^4 + \frac{1}{6}cm^6$. . . . . . . . . . . . . . . 401

7.18 Sextic free energy $\phi(u) = \frac{1}{2}ru^2 - \frac{1}{4}u^4 + \frac{1}{6}u^6$ for different values of $r$. . . . . . . . . . . . . 402

7.19 Fixed points ϕ′ (u∗ ) = 0 and flow u̇ = −ϕ′ (u) for the sextic potential. . . . . . . . . . . . . 403

7.20 A domain wall in a one-dimensional Ising model. . . . . . . . . . . . . . . . . . . . . . . . 413



7.21 Domain walls in the two and three dimensional Ising model. . . . . . . . . . . . . . . . . . 414
7.22 A domain wall in an XY ferromagnet. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 416
7.23 Imry-Ma domains and free energy versus domain size. . . . . . . . . . . . . . . . . . . . . 417
7.24 Mean field phase diagram for the Blume-Capel model. . . . . . . . . . . . . . . . . . . . . 430
7.25 Mean field solution for an Ising antiferromagnet in an external field. . . . . . . . . . . . . 432
7.26 Mean field phase diagram for an Ising antiferromagnet in an external field. . . . . . . . . . 433
7.27 Mean field phase diagram for the model of eqn. 7.355. . . . . . . . . . . . . . . . . . . . . 435
7.28 Phase diagram for the model of eqn. 7.367. . . . . . . . . . . . . . . . . . . . . . . . . . . 439

8.1 Level sets for a sample $f(\bar{x}, \bar{p}, \bar{t}) = A\, e^{-\frac{1}{2}(\bar{x} - \bar{p}\bar{t})^2}\, e^{-\frac{1}{2}\bar{p}^2}$. . . . . . . . . . . . . . . . . . . . . . 446
8.2 One and two particle scattering processes. . . . . . . . . . . . . . . . . . . . . . . . . . . . 448
8.3 Graphic representation of the equation $n\,\sigma\,\bar{v}_{\rm rel}\,\tau = 1$. . . . . . . . . . . . . . . . . . . . . . 456
8.4 Gedankenexperiment to measure shear viscosity η in a fluid. . . . . . . . . . . . . . . . . . 458
8.5 Experimental data on thermal conductivity and shear viscosity. . . . . . . . . . . . . . . . 460
8.6 Scattering in the center of mass frame. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
8.7 The thermocouple. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 476
8.8 Peltier effect refrigerator. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477
8.9 The Chapman-Kolmogorov equation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 487
8.10 Discretization of a continuous function. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495
8.11 Regions for some of the double integrals encountered in the text. . . . . . . . . . . . . . . 503
0.1. PREFACE 1

0.1 Preface

This is a proto-preface. A more complete preface will be written after these notes are completed.
These lecture notes are intended to supplement a course in statistical physics at the upper division
undergraduate or beginning graduate level.
I was fortunate to learn this subject from one of the great statistical physicists of our time, John Cardy.
I am grateful to my wife Joyce and to my children Ezra and Lily for putting up with all the outrageous
lies I’ve told them about getting off the computer ‘in just a few minutes’ while working on these notes.
These notes are dedicated to the only two creatures I know who are never angry with me: my father and
my dog.

Figure 1: My father (Louis) and my dog (Henry).


0.2. GENERAL REFERENCES 1

0.2 General references

– L. Peliti, Statistical Mechanics in a Nutshell (Princeton University Press, 2011)


The best all-around book on the subject I’ve come across thus far. Appropriate for the graduate
or advanced undergraduate level.

– J. P. Sethna, Entropy, Order Parameters, and Complexity (Oxford, 2006)


An excellent introductory text with a very modern set of topics and exercises. Available online at
http://www.physics.cornell.edu/sethna/StatMech

– M. Kardar, Statistical Physics of Particles (Cambridge, 2007)


A superb modern text, with many insightful presentations of key concepts.

– M. Plischke and B. Bergersen, Equilibrium Statistical Physics (3rd edition, World Scientific, 2006)
An excellent graduate level text. Less insightful than Kardar but still a good modern treatment of
the subject. Good discussion of mean field theory.

– E. M. Lifshitz and L. P. Pitaevskii, Statistical Physics (part I, 3rd edition, Pergamon, 1980)
This is volume 5 in the famous Landau and Lifshitz Course of Theoretical Physics. Though dated,
it still contains a wealth of information and physical insight.

– F. Reif, Fundamentals of Statistical and Thermal Physics (McGraw-Hill, 1987)


This has been perhaps the most popular undergraduate text since it first appeared in 1967, and
with good reason.
Chapter 1

Fundamentals of Probability

1.1 References

– C. Gardiner, Stochastic Methods (4th edition, Springer-Verlag, 2010)


Very clear and complete text on stochastic methods with many applications.

– J. M. Bernardo and A. F. M. Smith, Bayesian Theory (Wiley, 2000)


A thorough textbook on Bayesian methods.

– D. Williams, Weighing the Odds: A Course in Probability and Statistics (Cambridge, 2001)
A good overall statistics textbook, according to a mathematician colleague.

– E. T. Jaynes, Probability Theory (Cambridge, 2007)


An extensive, descriptive, and highly opinionated presentation, with a strongly Bayesian approach.

– A. N. Kolmogorov, Foundations of the Theory of Probability (Chelsea, 1956)


The Urtext of mathematical probability theory.

4 CHAPTER 1. FUNDAMENTALS OF PROBABILITY

1.2 Statistical Properties of Random Walks

1.2.1 One-dimensional random walk

Consider the mechanical system depicted in Fig. 1.1, a version of which is often sold in novelty shops. A
ball is released from the top and cascades consecutively through $N$ levels. The details of each ball's
motion are governed by Newton's laws of motion. However, predicting where any given ball will end up in
the bottom row is difficult, because the ball's trajectory depends sensitively on its initial conditions, and
may even be influenced by random vibrations of the entire apparatus. We therefore abandon all hope of
integrating the equations of motion and treat the system statistically. That is, we assume, at each level,
that the ball moves to the right with probability $p$ and to the left with probability $q = 1 - p$. If there is
no bias in the system, then $p = q = \frac{1}{2}$. The position $X$ after $N$ steps may be written

$$X = \sum_{j=1}^{N} \sigma_j \,, \tag{1.1}$$

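where $\sigma_j = +1$ if the ball moves to the right at level $j$, and $\sigma_j = -1$ if the ball moves to the left at level
$j$. At each level, the probability for these two outcomes is given by

$$P_\sigma = p\, \delta_{\sigma,+1} + q\, \delta_{\sigma,-1} = \begin{cases} p & \text{if } \sigma = +1 \\ q & \text{if } \sigma = -1 \,. \end{cases} \tag{1.2}$$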

This is a normalized discrete probability distribution of the type discussed in section 1.5 below. The
multivariate distribution for all the steps is then
$$P(\sigma_1, \ldots, \sigma_N) = \prod_{j=1}^{N} P(\sigma_j) \,. \tag{1.3}$$

Our system is equivalent to a one-dimensional random walk. Imagine an inebriated pedestrian on a
sidewalk taking steps to the right and left at random. After $N$ steps, the pedestrian's location is $X$.
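The statistical model is easy to simulate. What follows is a minimal Python sketch, not part of the original notes. It assumes Python's standard `random` module as the source of independent steps, and the function names `simulate_walk` and `mean_position` are our own. Each walk draws $N$ steps $\sigma_j = \pm 1$ with probabilities $p$ and $q$ as in eqns. 1.2 and 1.3. Averaging $X$ over many walks then gives an estimate that can be compared with the exact mean $N(2p-1)$ of eqn. 1.4.

```python
import random


def simulate_walk(n_steps: int, p: float, rng: random.Random) -> int:
    """Return X = sigma_1 + ... + sigma_N for a single walk (eqn. 1.1).

    Each step is +1 with probability p and -1 with probability q = 1 - p,
    independently of every other step (eqns. 1.2 and 1.3).
    """
    return sum(1 if rng.random() < p else -1 for _ in range(n_steps))


def mean_position(n_steps: int, p: float, n_walks: int, seed: int = 0) -> float:
    """Monte Carlo estimate of <X>, averaged over n_walks independent walks."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    total = sum(simulate_walk(n_steps, p, rng) for _ in range(n_walks))
    return total / n_walks


if __name__ == "__main__":
    N, p = 100, 0.6
    estimate = mean_position(N, p, n_walks=5000)
    print(f"Monte Carlo <X> = {estimate:.2f}; exact N(2p - 1) = {N * (2 * p - 1):.2f}")
```

Because the seed is fixed, the estimate is reproducible. Its statistical error shrinks like $1/\sqrt{n_{\rm walks}}$.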
Now let’s compute the average of X:
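$$\langle X \rangle = \Big\langle \sum_{j=1}^{N} \sigma_j \Big\rangle = N \langle \sigma \rangle = N \sum_{\sigma = \pm 1} \sigma\, P(\sigma) = N(p - q) = N(2p - 1) \,. \tag{1.4}$$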

This could be identified as an equation of state for our system, as it relates a measurable quantity X to
the number of steps $N$ and the local bias $p$. Next, let's compute the average of $X^2$:

$$\langle X^2 \rangle = \sum_{j=1}^{N} \sum_{j'=1}^{N} \langle \sigma_j \sigma_{j'} \rangle = N^2 (p - q)^2 + 4Npq \,. \tag{1.5}$$

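Here we have used

$$\langle \sigma_j \sigma_{j'} \rangle = \delta_{jj'} + \big(1 - \delta_{jj'}\big)(p - q)^2 = \begin{cases} 1 & \text{if } j = j' \\ (p - q)^2 & \text{if } j \neq j' \,. \end{cases} \tag{1.6}$$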
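For completeness, here is the intermediate step connecting eqns. 1.5 and 1.6. The double sum contains $N$ diagonal terms ($j = j'$), each equal to 1, and $N(N-1)$ off-diagonal terms, each equal to $(p-q)^2$:

```latex
\begin{aligned}
\langle X^2 \rangle &= \sum_{j=1}^{N} \sum_{j'=1}^{N} \langle \sigma_j \sigma_{j'} \rangle
  = N + N(N-1)\,(p-q)^2 \\
&= N^2 (p-q)^2 + N\big[1 - (p-q)^2\big]
  = N^2 (p-q)^2 + 4Npq \,.
\end{aligned}
```

The last step uses $p + q = 1$, so that $1 - (p-q)^2 = (p+q)^2 - (p-q)^2 = 4pq$.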
1.2. STATISTICAL PROPERTIES OF RANDOM WALKS 5

Figure 1.1: The falling ball system, which mimics a one-dimensional random walk.

Note that $\langle X^2 \rangle \ge \langle X \rangle^2$, which must be so because

$$\mathrm{Var}(X) = \langle (\Delta X)^2 \rangle \equiv \big\langle \big(X - \langle X \rangle\big)^2 \big\rangle = \langle X^2 \rangle - \langle X \rangle^2 \ge 0 \,. \tag{1.7}$$

This is called the variance of $X$. We have $\mathrm{Var}(X) = 4Npq$. The root mean square deviation, $\Delta X_{\rm rms}$,
is the square root of the variance: $\Delta X_{\rm rms} = \sqrt{\mathrm{Var}(X)}$. Note that the mean value of $X$ is linearly
proportional to $N$,¹ but the RMS fluctuations $\Delta X_{\rm rms}$ are proportional to $N^{1/2}$. In the limit $N \to \infty$,
then, the ratio $\Delta X_{\rm rms}/\langle X \rangle$ vanishes as $N^{-1/2}$. This is a consequence of the central limit theorem (see
§1.5.2 below), and we shall meet up with it again on several occasions.
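There are only $2^N$ step sequences, so for small $N$ eqns. 1.4, 1.5 and 1.7 can be checked exactly by brute-force enumeration. Below is a minimal Python sketch using only the standard library. The helper name `exact_moments` is our own: it weights every sequence by eqn. 1.3 and accumulates $X$ and $X^2$.

```python
from itertools import product


def exact_moments(n_steps: int, p: float):
    """Return (<X>, <X^2>, Var X) by summing over all 2**N step sequences."""
    q = 1.0 - p
    mean = mean_sq = 0.0
    for steps in product((+1, -1), repeat=n_steps):
        n_right = steps.count(+1)
        weight = p**n_right * q ** (n_steps - n_right)  # product of P(sigma_j), eqn. 1.3
        x = sum(steps)  # eqn. 1.1
        mean += weight * x
        mean_sq += weight * x * x
    return mean, mean_sq, mean_sq - mean**2


if __name__ == "__main__":
    N, p = 10, 0.3
    q = 1.0 - p
    mean, mean_sq, var = exact_moments(N, p)
    print(mean, N * (p - q))                              # eqn. 1.4
    print(mean_sq, N**2 * (p - q) ** 2 + 4 * N * p * q)   # eqn. 1.5
    print(var, 4 * N * p * q)                             # Var(X) = 4Npq
```

Enumeration is exponential in $N$, so it is only useful as a check. The closed forms show directly that $\Delta X_{\rm rms}/|\langle X \rangle| = \sqrt{4pq}\,/\,(|p-q|\sqrt{N})$.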
We can do even better. We can find the complete probability distribution for $X$. It is given by

$$P_{N,X} = \binom{N}{N_R}\, p^{N_R}\, q^{N_L} \,, \tag{1.8}$$
where $N_{R/L}$ are the numbers of steps taken to the right/left, with $N = N_R + N_L$ and $X = N_R - N_L$.
There are many distinct ways to take $N_R$ steps to the right. For example, our first $N_R$ steps could
all be to the right, and the remaining $N_L = N - N_R$ steps would then all be to the left. Or our final $N_R$
steps could all be to the right. Each of these distinct sequences occurs with probability $p^{N_R} q^{N_L}$.
How many possibilities are there? Elementary combinatorics tells us this number is

$$\binom{N}{N_R} = \frac{N!}{N_R!\, N_L!} \,. \tag{1.9}$$
Note that $N \pm X = 2N_{R/L}$, so we can replace $N_{R/L} = \frac{1}{2}(N \pm X)$. Thus,

$$P_{N,X} = \frac{N!}{\big(\frac{N+X}{2}\big)!\, \big(\frac{N-X}{2}\big)!}\; p^{(N+X)/2}\, q^{(N-X)/2} \,. \tag{1.10}$$
¹ The exception is the unbiased case $p = q = \frac{1}{2}$, where $\langle X \rangle = 0$.
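Eqn. 1.10 translates directly into code. The sketch below is our own illustration: `prob_position` is a hypothetical helper name, and we assume Python 3.8 or later for `math.comb`. The sketch also makes explicit that $P_{N,X}$ vanishes unless $N + X$ is even, since $X = N_R - N_L$ always has the same parity as $N = N_R + N_L$.

```python
from math import comb


def prob_position(n_steps: int, x: int, p: float) -> float:
    """Exact P_{N,X} from eqn. 1.10.

    Returns 0 when |X| > N or when N + X is odd, since X = N_R - N_L
    always has the same parity as N = N_R + N_L.
    """
    if abs(x) > n_steps or (n_steps + x) % 2 != 0:
        return 0.0
    n_right = (n_steps + x) // 2   # N_R = (N + X)/2
    n_left = n_steps - n_right     # N_L = (N - X)/2
    return comb(n_steps, n_right) * p**n_right * (1.0 - p) ** n_left


if __name__ == "__main__":
    N, p = 20, 0.6
    total = sum(prob_position(N, x, p) for x in range(-N, N + 1))
    print(f"sum over X of P_(N,X) = {total:.12f}")  # normalization check
```

Summing $X\,P_{N,X}$ over $X$ in the same way reproduces $\langle X \rangle = N(p-q)$ from eqn. 1.4.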

Figure 1.2: Comparison of exact distribution of eqn. 1.10 (red squares) with the Gaussian distribution
of eqn. 1.19 (blue line).

1.2.2 Thermodynamic limit

Consider the limit $N \to \infty$ but with $x \equiv X/N$ finite. This is analogous to what is called the thermodynamic
limit in statistical mechanics. Since $N$ is large, $x$ may be considered a continuous variable. We
evaluate $\ln P_{N,X}$ using Stirling's asymptotic expansion

$$\ln N! \simeq N \ln N - N + \mathcal{O}(\ln N) \,. \tag{1.11}$$
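To see how good the leading terms of eqn. 1.11 are, one can compare them with $\ln N! = \ln\Gamma(N+1)$ computed by the standard library's `math.lgamma`. In the sketch below (our own illustration; `stirling_remainder` is a hypothetical name), the relative error goes to zero while the absolute remainder grows only logarithmically. That remainder is well described by the next term of Stirling's series, $\frac{1}{2}\ln(2\pi N)$.

```python
import math


def stirling_remainder(n: int) -> float:
    """Return ln n! - (n ln n - n), i.e. the part dropped in eqn. 1.11."""
    exact = math.lgamma(n + 1)  # ln n! = ln Gamma(n + 1)
    return exact - (n * math.log(n) - n)


if __name__ == "__main__":
    for n in (10, 100, 1_000, 10_000):
        rem = stirling_remainder(n)
        next_term = 0.5 * math.log(2 * math.pi * n)
        rel = rem / math.lgamma(n + 1)
        print(f"N={n:>6}  remainder={rem:.5f}  0.5*ln(2*pi*N)={next_term:.5f}  relative={rel:.2e}")
```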

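We then have

$$\begin{aligned}
\ln P_{N,X} &\simeq N \ln N - N - \tfrac{1}{2}N(1+x) \ln\!\big[\tfrac{1}{2}N(1+x)\big] + \tfrac{1}{2}N(1+x) \\
&\qquad - \tfrac{1}{2}N(1-x) \ln\!\big[\tfrac{1}{2}N(1-x)\big] + \tfrac{1}{2}N(1-x) + \tfrac{1}{2}N(1+x) \ln p + \tfrac{1}{2}N(1-x) \ln q \\
&= -N \Big[\tfrac{1+x}{2} \ln\!\big(\tfrac{1+x}{2}\big) + \tfrac{1-x}{2} \ln\!\big(\tfrac{1-x}{2}\big)\Big] + N \Big[\tfrac{1+x}{2} \ln p + \tfrac{1-x}{2} \ln q\Big] \,.
\end{aligned} \tag{1.12}$$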

Notice that the terms proportional to $N \ln N$ have all cancelled, leaving us with a quantity which is linear
in $N$. We may therefore write $\ln P_{N,X} = -N f(x) + \mathcal{O}(\ln N)$, where

$$f(x) = \Big[\tfrac{1+x}{2} \ln\!\big(\tfrac{1+x}{2}\big) + \tfrac{1-x}{2} \ln\!\big(\tfrac{1-x}{2}\big)\Big] - \Big[\tfrac{1+x}{2} \ln p + \tfrac{1-x}{2} \ln q\Big] \,. \tag{1.13}$$
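The statement $\ln P_{N,X} = -N f(x) + \mathcal{O}(\ln N)$ can be checked numerically. The sketch below (our own; `f_rate` and `log_prob_exact` are hypothetical helper names) evaluates the exact $\ln P_{N,X}$ of eqn. 1.10 with `math.lgamma`, so that large $N$ causes no overflow. It then compares $-\frac{1}{N}\ln P_{N,X}$ with $f(x)$ at fixed $x = X/N$. The gap shrinks roughly like $(\ln N)/N$.

```python
import math


def f_rate(x: float, p: float) -> float:
    """The function f(x) of eqn. 1.13, valid for -1 < x < 1."""
    q = 1.0 - p
    a, b = (1 + x) / 2, (1 - x) / 2
    return a * math.log(a) + b * math.log(b) - (a * math.log(p) + b * math.log(q))


def log_prob_exact(n_steps: int, x_pos: int, p: float) -> float:
    """Exact ln P_{N,X} from eqn. 1.10 (requires |X| < N and N + X even)."""
    n_right = (n_steps + x_pos) // 2
    n_left = n_steps - n_right
    return (math.lgamma(n_steps + 1) - math.lgamma(n_right + 1) - math.lgamma(n_left + 1)
            + n_right * math.log(p) + n_left * math.log(1.0 - p))


if __name__ == "__main__":
    p, x = 0.6, 0.3
    for n in (100, 10_000, 1_000_000):
        x_pos = round(x * n)  # X = xN; even for these N, so N + X is even
        print(n, -log_prob_exact(n, x_pos, p) / n, f_rate(x, p))
```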

We have just shown that in the large N limit we may write

$$P_{N,X} = C\, e^{-N f(X/N)} \,, \tag{1.14}$$

where $C$ is a normalization constant². Since $N$ is by assumption large, the function $P_{N,X}$ is dominated
by the minimum (or minima) of $f(x)$, where the probability is maximized. To find the minimum of $f(x)$,
we set $f'(x) = 0$, where

$$f'(x) = \tfrac{1}{2} \ln\!\Big(\frac{q}{p} \cdot \frac{1+x}{1-x}\Big) \,. \tag{1.15}$$

² The origin of $C$ lies in the $\mathcal{O}(\ln N)$ and $\mathcal{O}(N^0)$ terms in the asymptotic expansion of $\ln N!$. We have ignored these terms
here. Accounting for them carefully reproduces the correct value of $C$ in eqn. 1.20.
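Setting $f'(x) = 0$, we obtain

$$ \frac{1+x}{1-x} = \frac{p}{q} \quad \Rightarrow \quad \bar x = p - q \, , \tag{1.16} $$

where we have used $p + q = 1$. We also have

$$ f''(x) = \frac{1}{1-x^2} \, , \tag{1.17} $$

so invoking Taylor's theorem,

$$ f(x) = f(\bar x) + \tfrac{1}{2} f''(\bar x)\,(x - \bar x)^2 + \ldots \tag{1.18} $$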
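Since $f(\bar x) = 0$ and $1 - \bar x^2 = (1 - \bar x)(1 + \bar x) = 4pq$, so that $f''(\bar x) = 1/4pq$, putting it all together we have

$$ P_{N,X} \approx C \exp\!\left[ -\frac{N (x - \bar x)^2}{8pq} \right] = C \exp\!\left[ -\frac{(X - \bar X)^2}{8Npq} \right] , \tag{1.19} $$

where $\bar X = \langle X \rangle = N(p-q) = N \bar x$. The constant $C$ is determined by the normalization condition,

$$ \sum_{X=-\infty}^{\infty} P_{N,X} \approx \frac{1}{2} \int_{-\infty}^{\infty}\! dX\; C \exp\!\left[ -\frac{(X - \bar X)^2}{8Npq} \right] = \sqrt{2\pi N pq}\; C \, , \tag{1.20} $$

where the factor of $\frac{1}{2}$ appears because $X$ only takes values of the same parity as $N$, i.e. it changes in steps of 2. Thus $C = 1/\sqrt{2\pi N pq}$. Why don't we go beyond second order in the Taylor expansion of $f(x)$? We will find out in §1.5.2 below.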
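To see how quickly the Gaussian form of eqns. 1.19 and 1.20 takes over, one can compare it with the exact result of eqn. 1.10 for increasing $N$. The sketch below is a minimal check under our own assumptions: the bias $p = 0.6$ is arbitrary, the helper names are hypothetical, and the exact probabilities are evaluated through `math.lgamma` so that large $N$ does not overflow.

```python
from math import exp, lgamma, log, pi, sqrt

def exact(N, X, p):
    """P_{N,X} from eqn. 1.10 via log-gamma to avoid overflow; X must share N's parity."""
    nr = (N + X) // 2              # number of right steps N_R
    nl = N - nr                    # number of left steps N_L
    log_P = lgamma(N + 1) - lgamma(nr + 1) - lgamma(nl + 1) + nr * log(p) + nl * log(1 - p)
    return exp(log_P)

def gaussian(N, X, p):
    """Gaussian approximation of eqns. 1.19-1.20, with C = 1/sqrt(2 pi N p q)."""
    q = 1 - p
    X_bar = N * (p - q)
    return exp(-(X - X_bar) ** 2 / (8 * N * p * q)) / sqrt(2 * pi * N * p * q)

def max_error(N, p):
    """Largest |exact - Gaussian| over the allowed displacements X = -N, -N+2, ..., N."""
    return max(abs(exact(N, X, p) - gaussian(N, X, p)) for X in range(-N, N + 1, 2))

for N in (10, 100, 1000):
    print(N, max_error(N, 0.6))
```

The largest pointwise discrepancy shrinks steadily as $N$ grows, consistent with the neglected higher-order terms in the Taylor expansion of $f(x)$ becoming unimportant.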

1.2.3 Entropy and energy

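The function $f(x)$ can be written as a sum of two contributions, $f(x) = e(x) - s(x)$, where

$$ \begin{aligned} s(x) &= -\tfrac{1+x}{2}\ln\!\big(\tfrac{1+x}{2}\big) - \tfrac{1-x}{2}\ln\!\big(\tfrac{1-x}{2}\big) \\ e(x) &= -\tfrac{1}{2}\ln(pq) - \tfrac{1}{2}\,x\ln(p/q) \, . \end{aligned} \tag{1.21} $$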

The function $S(N,x) \equiv N s(x)$ is analogous to the statistical entropy of our system³. To leading order in $N$, we have

$$ S(N,x) = N s(x) \simeq \ln \binom{N}{N_R} = \ln \binom{N}{\frac{1}{2}N(1+x)} \, . \tag{1.22} $$

Thus, the statistical entropy is the logarithm of the number of ways the system can be configured so as to yield the same value of $X$ (at fixed $N$). The second contribution to $f(x)$ is the energy term. We write

$$ E(N,x) = N e(x) = -\tfrac{1}{2}N\ln(pq) - \tfrac{1}{2}N x\ln(p/q) \, . \tag{1.23} $$

The energy term biases the probability $P_{N,X} \propto \exp(S - E)$ so that low energy configurations are more probable than high energy configurations. For our system, we see that when $p < q$ (i.e. $p < \frac{1}{2}$), the energy is minimized by taking $x$ as small as possible (meaning as negative as possible). The smallest allowed value of $x = X/N$ is $x = -1$. Conversely, when $p > q$ (i.e. $p > \frac{1}{2}$), the energy is minimized by taking $x$ as large as possible, which means $x = 1$. The average value of $x$, as we have computed explicitly, is $\bar x = p - q = 2p - 1$, which falls somewhere in between these two extremes.

3
The function $s(x)$ is the specific entropy.
In actual thermodynamic systems, entropy and energy are not dimensionless. What we have called S
here is really S/kB , which is the entropy in units of Boltzmann’s constant. And what we have called E
here is really E/kB T , which is energy in units of Boltzmann’s constant times temperature.
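The decomposition $f = e - s$ and the statement $S \simeq \ln\binom{N}{N_R}$ are easy to check numerically. In the sketch below the values $p = 0.3$ and $x = 0.2$ are arbitrary, the helper names are our own, and `entropy_gap` (a hypothetical helper) measures how far $N s(x)$ is from the exact logarithm of the binomial coefficient.

```python
from math import lgamma, log

def s(x):
    """Specific entropy s(x) of eqn. 1.21."""
    a, b = (1 + x) / 2, (1 - x) / 2
    return -a * log(a) - b * log(b)

def e(x, p):
    """Energy per step e(x) of eqn. 1.21."""
    q = 1 - p
    return -0.5 * log(p * q) - 0.5 * x * log(p / q)

def f(x, p):
    """Rate function f(x) of eqn. 1.13, written out directly."""
    a, b = (1 + x) / 2, (1 - x) / 2
    return a * log(a) + b * log(b) - a * log(p) - b * log(1 - p)

def entropy_gap(N, x):
    """N s(x) minus the exact ln C(N, N_R), with N_R = N(1 + x)/2 rounded to an integer."""
    nr = round(N * (1 + x) / 2)
    ln_binom = lgamma(N + 1) - lgamma(nr + 1) - lgamma(N - nr + 1)
    return N * s(x) - ln_binom

print(f(0.2, 0.3) - (e(0.2, 0.3) - s(0.2)))   # zero up to rounding
for N in (100, 1000, 10000):
    print(N, entropy_gap(N, 0.2))              # grows only logarithmically with N
```

The gap grows like $\ln N$ while $S$ itself grows like $N$, which is why eqn. 1.22 holds only to leading order.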

1.3 Basic Concepts in Probability Theory

Here we recite the basics of probability theory.

1.3.1 Fundamental definitions

The natural mathematical setting is set theory. Sets are generalized collections of objects. The basics: $\omega \in A$ is a binary relation which says that the object $\omega$ is an element of the set $A$. Another binary relation is set inclusion. If all members of $A$ are in $B$, we write $A \subseteq B$. The union of sets $A$ and $B$ is denoted $A \cup B$ and the intersection of $A$ and $B$ is denoted $A \cap B$. The Cartesian product of $A$ and $B$, denoted $A \times B$, is the set of all ordered pairs $(a,b)$ where $a \in A$ and $b \in B$.
Some details: If $\omega$ is not in $A$, we write $\omega \notin A$. Sets may also be objects, so we may speak of sets of sets, but typically the sets which will concern us are simple discrete collections of numbers, such as the possible rolls of a die $\{1,2,3,4,5,6\}$, or the real numbers $\mathbb{R}$, or Cartesian products such as $\mathbb{R}^N$. If $A \subseteq B$ but $A \neq B$, we say that $A$ is a proper subset of $B$ and write $A \subset B$. Another binary operation is the set difference $A \backslash B$, which contains all $\omega$ such that $\omega \in A$ and $\omega \notin B$.
In probability theory, each object ω is identified as an event. We denote by Ω the set of all events, and
∅ denotes the set of no events. There are three basic axioms of probability:

i) To each set A is associated a non-negative real number P (A), which is called the probability of A.

ii) P (Ω) = 1.

iii) If $\{A_i\}$ is a collection of disjoint sets, i.e. if $A_i \cap A_j = \emptyset$ for all $i \neq j$, then

$$ P\Big( \bigcup_i A_i \Big) = \sum_i P(A_i) \, . \tag{1.24} $$

From these axioms follow a number of conclusions. Among them, let ¬A = Ω\A be the complement of
A, i.e. the set of all events not in A. Then since A ∪ ¬A = Ω, we have P (¬A) = 1 − P (A). Taking A = Ω,
we conclude P (∅) = 0.
The meaning of $P(A)$ is that if events $\omega$ are chosen from $\Omega$ at random, then the relative frequency for $\omega \in A$ approaches $P(A)$ as the number of trials tends to infinity. But what do we mean by 'at random'?

One meaning we can impart to the notion of randomness is that a process is random if its outcomes can be accurately modeled using the axioms of probability. This entails the identification of a probability space $\Omega$ as well as a probability measure $P$. For example, in the microcanonical ensemble of classical statistical physics, the space $\Omega$ is the collection of phase space points $\varphi = \{q_1, \ldots, q_n, p_1, \ldots, p_n\}$ and the probability measure is $d\mu = \Sigma^{-1}(E) \prod_{i=1}^n dq_i\, dp_i\; \delta\big(E - H(q,p)\big)$, so that for $A \subseteq \Omega$ the probability of $A$ is $P(A) = \int d\mu\; \chi_A(\varphi)$, where $\chi_A(\varphi) = 1$ if $\varphi \in A$ and $\chi_A(\varphi) = 0$ if $\varphi \notin A$ is the characteristic function of $A$. The quantity $\Sigma(E)$ is determined by normalization: $\int d\mu = 1$.

1.3.2 Bayesian statistics

We now introduce two additional probabilities. The joint probability for sets A and B together is written
P (A ∩ B). That is, P (A ∩ B) = Prob[ω ∈ A and ω ∈ B]. For example, A might denote the set of all
politicians, B the set of all American citizens, and C the set of all living humans with an IQ greater than
60. Then A ∩ B would be the set of all politicians who are also American citizens, etc. Exercise: estimate
P (A ∩ B ∩ C).
The conditional probability of $B$ given $A$ is written $P(B|A)$. We can compute the joint probability $P(A \cap B) = P(B \cap A)$ in two ways:

$$ P(A \cap B) = P(A|B) \cdot P(B) = P(B|A) \cdot P(A) \, . \tag{1.25} $$

Thus,

$$ P(A|B) = \frac{P(B|A)\, P(A)}{P(B)} \, , \tag{1.26} $$

a result known as Bayes' theorem. Now suppose the 'event space' is partitioned as $\{A_i\}$. Then

$$ P(B) = \sum_i P(B|A_i)\, P(A_i) \, . \tag{1.27} $$

We then have

$$ P(A_i|B) = \frac{P(B|A_i)\, P(A_i)}{\sum_j P(B|A_j)\, P(A_j)} \, , \tag{1.28} $$

a result sometimes known as the extended form of Bayes' theorem. When the event space is a 'binary partition' $\{A, \neg A\}$, we have

$$ P(A|B) = \frac{P(B|A)\, P(A)}{P(B|A)\, P(A) + P(B|\neg A)\, P(\neg A)} \, . \tag{1.29} $$

Note that $P(A|B) + P(\neg A|B) = 1$ (which follows from $\neg\neg A = A$).


As an example, consider the following problem in epidemiology. Suppose there is a rare but highly contagious disease $A$ which occurs in 0.01% of the general population. Suppose further that there is a simple test for the disease which is accurate 99.99% of the time. That is, out of every 10,000 tests, the correct answer is returned 9,999 times, and the incorrect answer is returned only once. Now let us administer the test to a large group of people from the general population. Those who test positive are quarantined. Question: what is the probability that someone chosen at random from the quarantine group actually has the disease? We use Bayes' theorem with the binary partition $\{A, \neg A\}$. Let $B$ denote the event that an individual tests positive. Anyone from the quarantine group has tested positive. Given this datum, we want to know the probability that that person has the disease. That is, we want $P(A|B)$. Applying eqn. 1.29 with

$$ P(A) = 0.0001 \, , \quad P(\neg A) = 0.9999 \, , \quad P(B|A) = 0.9999 \, , \quad P(B|\neg A) = 0.0001 \, , $$

we find $P(A|B) = \frac{1}{2}$. That is, there is only a 50% chance that someone who tested positive actually has the disease, despite the test being 99.99% accurate! The reason is that, given the rarity of the disease in the general population, the number of false positives is statistically equal to the number of true positives.
In the above example, we had $P(B|A) + P(B|\neg A) = 1$, but this is not generally the case. What is true instead is $P(B|A) + P(\neg B|A) = 1$. Epidemiologists define the sensitivity of a binary classification test as the fraction of actual positives which are correctly identified, and the specificity as the fraction of actual negatives that are correctly identified. Thus, $\mathrm{se} = P(B|A)$ is the sensitivity and $\mathrm{sp} = P(\neg B|\neg A)$ is the specificity. We then have $P(B|\neg A) = 1 - P(\neg B|\neg A)$. Therefore,

$$ P(B|A) + P(B|\neg A) = 1 + P(B|A) - P(\neg B|\neg A) = 1 + \mathrm{se} - \mathrm{sp} \, . \tag{1.30} $$

In our previous example, $\mathrm{se} = \mathrm{sp} = 0.9999$, in which case the RHS above gives 1. In general, if $P(A) \equiv f$ is the fraction of the population which is afflicted, then

$$ P(\text{infected}\,|\,\text{positive}) = \frac{f \cdot \mathrm{se}}{f \cdot \mathrm{se} + (1-f) \cdot (1-\mathrm{sp})} \, . \tag{1.31} $$
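Eqn. 1.31 is a one-line function of the prevalence $f$, sensitivity and specificity. The sketch below is illustrative only (the function name is our own); it reproduces the 50% result of the epidemiology example and shows how strongly the answer depends on the prior $f$, here raised to a hypothetical 1%.

```python
def posterior(f: float, se: float, sp: float) -> float:
    """P(infected | positive), eqn. 1.31: prevalence f, sensitivity se, specificity sp."""
    true_pos = f * se                  # P(B|A) P(A)
    false_pos = (1 - f) * (1 - sp)     # P(B|not A) P(not A)
    return true_pos / (true_pos + false_pos)

# The example in the text: f = 0.0001, se = sp = 0.9999
print(posterior(1e-4, 0.9999, 0.9999))   # 0.5 up to rounding
# Same test, but a hypothetical prevalence of 1%
print(posterior(1e-2, 0.9999, 0.9999))
```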

For continuous distributions, we speak of a probability density. We then have

$$ P(y) = \int dx\; P(y|x)\, P(x) \tag{1.32} $$

and

$$ P(x|y) = \frac{P(y|x)\, P(x)}{\int dx'\, P(y|x')\, P(x')} \, . \tag{1.33} $$

The range of integration may depend on the specific application.
The quantities P (Ai ) are called the prior distribution. Clearly in order to compute P (B) or P (Ai |B)
we must know the priors, and this is usually the weakest link in the Bayesian chain of reasoning. If
our prior distribution is not accurate, Bayes’ theorem will generate incorrect results. One approach to
approximating prior probabilities P (Ai ) is to derive them from a maximum entropy construction.
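Eqn. 1.33 can be evaluated on a grid when no closed form is at hand. As a minimal sketch, assume (hypothetically) that the data $y$ are $h = 7$ heads in $n = 10$ tosses of a coin with unknown bias $x$, and compare a flat prior with an invented prior concentrated near $x = 0.3$. The priors need not be normalized, since any constant factor cancels between numerator and denominator of eqn. 1.33.

```python
import numpy as np

# Hypothetical data y: h heads in n tosses of a coin whose unknown bias is x.
n, h = 10, 7
x = np.linspace(0.0, 1.0, 2001)              # grid over the parameter x in [0, 1]
dx = x[1] - x[0]
likelihood = x**h * (1 - x) ** (n - h)       # P(y|x), up to an x-independent factor

def posterior(prior):
    """Eqn. 1.33 on a grid: P(x|y) = P(y|x) P(x) / integral dx' P(y|x') P(x')."""
    unnormalized = likelihood * prior
    return unnormalized / (unnormalized.sum() * dx)   # Riemann sum for the denominator

def mean(density):
    return (x * density).sum() * dx

flat = np.ones_like(x)                               # uniform prior
peaked = np.exp(-((x - 0.3) ** 2) / (2 * 0.05**2))   # hypothetical prior concentrated near 0.3

print(mean(posterior(flat)))     # close to (h + 1)/(n + 2) = 2/3
print(mean(posterior(peaked)))   # pulled strongly toward 0.3 by the prior
```

The two posterior means differ substantially for the same data, which illustrates why the choice of prior is the weakest link in the chain.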

1.3.3 Random variables and their averages

Consider an abstract probability space $\mathcal{X}$ whose elements (i.e. events) are labeled by $x$. The average of any function $f(x)$ is denoted as $\mathsf{E}f$ or $\langle f \rangle$, and is defined for discrete sets as

$$ \mathsf{E}f = \langle f \rangle = \sum_{x \in \mathcal{X}} f(x)\, P(x) \, , \tag{1.34} $$

where $P(x)$ is the probability of $x$. For continuous sets, we have

$$ \mathsf{E}f = \langle f \rangle = \int_{\mathcal{X}} dx\; f(x)\, P(x) \, . \tag{1.35} $$

Typically for continuous sets we have $\mathcal{X} = \mathbb{R}$ or $\mathcal{X} = \mathbb{R}_{\ge 0}$. Gardiner and other authors introduce an extra symbol, $X$, to denote a random variable, with $X(x) = x$ being its value. This is formally useful but notationally confusing, so we'll avoid it here and speak loosely of $x$ as a random variable.
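When there are two random variables $x \in \mathcal{X}$ and $y \in \mathcal{Y}$, the relevant space is the product space $\Omega = \mathcal{X} \times \mathcal{Y}$, and

$$ \mathsf{E}f(x,y) = \langle f(x,y) \rangle = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} f(x,y)\, P(x,y) \, , \tag{1.36} $$

with the obvious generalization to continuous sets. This generalizes to higher rank products, i.e. $x_i \in \mathcal{X}_i$ with $i \in \{1, \ldots, N\}$. The covariance of $x_i$ and $x_j$ is defined as

$$ C_{ij} \equiv \Big\langle \big(x_i - \langle x_i \rangle\big)\big(x_j - \langle x_j \rangle\big) \Big\rangle = \langle x_i x_j \rangle - \langle x_i \rangle \langle x_j \rangle \, . \tag{1.37} $$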
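Both forms of eqn. 1.37 can be evaluated directly from a tabulated joint distribution. The table $P(x,y)$ below is invented purely for illustration; the sketch uses NumPy and checks that the definition and the shortcut agree.

```python
import numpy as np

# A hypothetical joint distribution P(x, y) on a 2 x 3 grid of values
xs = np.array([0.0, 1.0])
ys = np.array([-1.0, 0.0, 1.0])
P = np.array([[0.10, 0.20, 0.10],
              [0.05, 0.15, 0.40]])   # rows: x values, columns: y values
assert np.isclose(P.sum(), 1.0)

X, Y = np.meshgrid(xs, ys, indexing="ij")   # X[i, j] = xs[i], Y[i, j] = ys[j]
Ex, Ey = (P * X).sum(), (P * Y).sum()       # <x>, <y> as in eqn. 1.36
Exy = (P * X * Y).sum()                     # <xy>

cov_definition = (P * (X - Ex) * (Y - Ey)).sum()   # <(x - <x>)(y - <y>)>
cov_shortcut = Exy - Ex * Ey                      # <xy> - <x><y>
print(cov_definition, cov_shortcut)
```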

If $f(x)$ is a convex function then one has

$$ \mathsf{E}f(x) \ge f(\mathsf{E}x) \, . \tag{1.38} $$

For twice-differentiable functions, $f(x)$ is convex if $f''(x) \ge 0$ everywhere⁴. If $f(x)$ is convex on some interval $[a,b]$ then for $x_{1,2} \in [a,b]$ we must have

$$ f\big(\lambda x_1 + (1-\lambda) x_2\big) \le \lambda f(x_1) + (1-\lambda) f(x_2) \, , \tag{1.39} $$

where $\lambda \in [0,1]$. This is easily generalized to

$$ f\Big(\sum_n p_n x_n\Big) \le \sum_n p_n f(x_n) \, , \tag{1.40} $$

where $p_n = P(x_n)$, a result known as Jensen's theorem.
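A quick randomized check of eqn. 1.40 for a few familiar convex functions. This is a sketch, not a proof: the choice of functions, the point range and the random seed are arbitrary assumptions, and `jensen_gap` is a hypothetical helper returning the right side minus the left side of eqn. 1.40.

```python
import numpy as np

def jensen_gap(f, x, p):
    """Return sum_n p_n f(x_n) - f(sum_n p_n x_n); non-negative for convex f (eqn. 1.40)."""
    return np.dot(p, f(x)) - f(np.dot(p, x))

rng = np.random.default_rng(0)
convex_functions = {"exp": np.exp, "square": np.square, "-log": lambda t: -np.log(t)}
for trial in range(1000):
    x = rng.uniform(0.1, 5.0, size=8)     # positive points, so that -log x is defined
    p = rng.random(8)
    p /= p.sum()                          # probabilities p_n summing to 1
    for name, f in convex_functions.items():
        assert jensen_gap(f, x, p) >= -1e-12, name
print("Jensen's inequality held in every trial")
```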

1.4 Entropy and Probability

1.4.1 Entropy and information theory

It was shown in the classic 1948 work of Claude Shannon that entropy is in fact a measure of information⁵.
Suppose we observe that a particular event occurs with probability p. We associate with this observation
an amount of information I(p). The information I(p) should satisfy certain desiderata:
4
A function g(x) is concave if −g(x) is convex.
5
See ‘An Introduction to Information Theory and Entropy’ by T. Carter, Santa Fe Complex Systems Summer School,
June 2011. Available online at http://astarte.csustan.edu/~tom/SFI-CSSS/info-theory/info-lec.pdf.

1 Information is non-negative, i.e. I(p) ≥ 0.

2 If two events occur independently so their joint probability is p1 p2 , then their information is addi-
tive, i.e. I(p1 p2 ) = I(p1 ) + I(p2 ).

3 I(p) is a continuous function of p.

4 There is no information content to an event which is always observed, i.e. I(1) = 0.

From these four properties, it is easy to show that the only possible function $I(p)$ is

$$ I(p) = -A \ln p \, , \tag{1.41} $$

where $A$ is an arbitrary positive constant that can be absorbed into the base of the logarithm, since $\log_b x = \ln x / \ln b$. We will take $A = 1$ and use $e$ as the base, so $I(p) = -\ln p$. Another common choice is to take the base of the logarithm to be 2, so $I(p) = -\log_2 p$. In this latter case, the units of information are known as bits. Note that $I(0) = \infty$. This means that the observation of an extremely rare event carries a great deal of information⁶.
Now suppose we have a set of events labeled by an integer $n$ which occur with probabilities $\{p_n\}$. What is the expected amount of information in $N$ observations? Since event $n$ occurs an average of $N p_n$ times, and the information content in $p_n$ is $-\ln p_n$, we have that the average information per observation is

$$ S = \frac{\langle I_N \rangle}{N} = -\sum_n p_n \ln p_n \, , \tag{1.42} $$

which is known as the entropy of the distribution. Thus, maximizing S is equivalent to maximizing the
information content per observation.
Consider, for example, the information content of course grades. As we shall see, if the only constraint on the probability distribution is that of overall normalization, then $S$ is maximized when all the probabilities $p_n$ are equal. The entropy measured in bits is then $S = \log_2 \Gamma$, since $p_n = 1/\Gamma$, where $\Gamma$ is the number of possible outcomes. Thus, for pass/fail grading, the maximum average information per grade is $-\log_2(\frac{1}{2}) = \log_2 2 = 1$ bit. If only A, B, C, D, and F grades are assigned, then the maximum average information per grade is $\log_2 5 = 2.32$ bits. If we expand the grade options to include {A+, A, A-, B+, B, B-, C+, C, C-, D, F}, then the maximum average information per grade is $\log_2 11 = 3.46$ bits.
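The grade figures above follow from evaluating eqn. 1.42 in base 2 for uniform distributions. The sketch below reproduces them; the non-uniform distribution in the test is an invented example showing that unequal probabilities carry less information per grade.

```python
from math import log2

def entropy_bits(probs):
    """Shannon entropy -sum p log2 p in bits; outcomes with p = 0 contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)

schemes = {
    "pass/fail": ["P", "F"],
    "A-F": ["A", "B", "C", "D", "F"],
    "with +/-": ["A+", "A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D", "F"],
}
for name, grades in schemes.items():
    uniform = [1 / len(grades)] * len(grades)
    print(f"{name:10s} {entropy_bits(uniform):.2f} bits")
```

The printed values match the 1, 2.32 and 3.46 bits quoted in the text.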
Equivalently, consider, following the discussion in vol. 1 of Kardar, a random sequence $\{n_1, n_2, \ldots, n_N\}$ where each element $n_j$ takes one of $K$ possible values. There are then $K^N$ such possible sequences, and to specify one of them requires $\log_2(K^N) = N \log_2 K$ bits of information. However, if the value $n$ occurs with probability $p_n$, then on average it will occur $N_n = N p_n$ times in a sequence of length $N$, and the total number of such sequences will be

$$ g(N) = \frac{N!}{\prod_{n=1}^K N_n!} \, . \tag{1.43} $$
6
My colleague John McGreevy refers to I(p) as the surprise of observing an event which occurs with probability p. I like
this very much.

In general, this is far less than the total possible number $K^N$, and the number of bits necessary to specify one from among these $g(N)$ possibilities is

$$ \log_2 g(N) = \log_2(N!) - \sum_{n=1}^K \log_2(N_n!) \approx -N \sum_{n=1}^K p_n \log_2 p_n \, , \tag{1.44} $$

up to corrections of order $\ln N$, which are negligible compared with the leading term linear in $N$. Here we have invoked Stirling's approximation. If the distribution is uniform, then we have $p_n = \frac{1}{K}$ for all $n \in \{1, \ldots, K\}$, and $\log_2 g(N) \approx N \log_2 K$.
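The approximation in eqn. 1.44 can be checked by computing $\log_2 g(N)$ exactly with the log-gamma function. The distribution below, $\{\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{8}\}$, is a hypothetical choice made so that $N_n = N p_n$ is an integer whenever $N$ is a multiple of 8; its entropy is 1.75 bits per symbol.

```python
from math import lgamma, log, log2

def log2_multinomial(counts):
    """log2 of g = N! / prod_n N_n!, computed with lgamma to avoid huge integers."""
    N = sum(counts)
    return (lgamma(N + 1) - sum(lgamma(c + 1) for c in counts)) / log(2)

probs = [0.5, 0.25, 0.125, 0.125]          # hypothetical distribution over K = 4 values
H = -sum(p * log2(p) for p in probs)       # entropy in bits per symbol

def gap_per_symbol(N):
    """(N H - log2 g(N)) / N, using N_n = N p_n (integers when N is a multiple of 8)."""
    counts = [round(N * p) for p in probs]
    return (N * H - log2_multinomial(counts)) / N

for N in (8, 80, 800, 8000):
    print(N, gap_per_symbol(N))
```

The per-symbol discrepancy falls off roughly like $(\ln N)/N$, as expected from the neglected Stirling corrections.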

1.4.2 Probability distributions from maximum entropy

We have shown how one can proceed from a probability distribution and compute various averages. We
now seek to go in the other direction, and determine the full probability distribution based on a knowledge
of certain averages.
At first, this seems impossible. Suppose we want to reproduce the full probability distribution for an $N$-step random walk from knowledge of the average $\langle X \rangle = (2p-1)N$, where $p$ is the probability of moving to the right at each step (see §1.2 above). The problem seems ridiculously underdetermined, since there are $2^N$ possible configurations for an $N$-step random walk: $\sigma_j = \pm 1$ for $j = 1, \ldots, N$. Overall normalization requires

$$ \sum_{\{\sigma_j\}} P(\sigma_1, \ldots, \sigma_N) = 1 \, , \tag{1.45} $$

but this just imposes one constraint on the $2^N$ probabilities $P(\sigma_1, \ldots, \sigma_N)$, leaving $2^N - 1$ overall parameters. What principle allows us to reconstruct the full probability distribution

$$ P(\sigma_1, \ldots, \sigma_N) = \prod_{j=1}^N \big( p\, \delta_{\sigma_j,1} + q\, \delta_{\sigma_j,-1} \big) = \prod_{j=1}^N p^{(1+\sigma_j)/2}\, q^{(1-\sigma_j)/2} \, , \tag{1.46} $$

corresponding to $N$ independent steps?

The principle of maximum entropy

The entropy of a discrete probability distribution $\{p_n\}$ is defined as

$$ S = -\sum_n p_n \ln p_n \, , \tag{1.47} $$

where here we take $e$ as the base of the logarithm.


The entropy may therefore be regarded as a function of the probability distribution: $S = S\big(\{p_n\}\big)$. One special property of the entropy is the following. Suppose we have two independent normalized distributions $\{p^A_a\}$ and $\{p^B_b\}$. The joint probability for events $a$ and $b$ is then $P_{a,b} = p^A_a\, p^B_b$. The entropy of the joint distribution is then

$$ \begin{aligned} S &= -\sum_a \sum_b P_{a,b} \ln P_{a,b} = -\sum_a \sum_b p^A_a p^B_b \ln\big(p^A_a p^B_b\big) = -\sum_a \sum_b p^A_a p^B_b \big( \ln p^A_a + \ln p^B_b \big) \\ &= -\sum_a p^A_a \ln p^A_a \cdot \sum_b p^B_b - \sum_b p^B_b \ln p^B_b \cdot \sum_a p^A_a = -\sum_a p^A_a \ln p^A_a - \sum_b p^B_b \ln p^B_b \\ &= S^A + S^B \, . \end{aligned} $$
Thus, the entropy of a joint distribution formed from two independent distributions is additive.
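A numerical check of this additivity, assuming two arbitrary (invented) distributions $p^A$ and $p^B$ whose product forms the joint distribution:

```python
import numpy as np

def entropy(p):
    """S = -sum p ln p for an array of probabilities (zero entries are skipped)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

pA = np.array([0.2, 0.5, 0.3])
pB = np.array([0.1, 0.6, 0.25, 0.05])
P_joint = np.outer(pA, pB)      # P_{a,b} = p^A_a p^B_b for independent distributions

print(entropy(P_joint), entropy(pA) + entropy(pB))   # equal up to rounding
```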
Suppose all we knew about $\{p_n\}$ was that it was normalized. Then $\sum_n p_n = 1$. This is a constraint on the values $\{p_n\}$. Let us now extremize the entropy $S$ with respect to the distribution $\{p_n\}$, but subject to the normalization constraint. We do this using Lagrange's method of undetermined multipliers. We define

$$ S^*\big(\{p_n\}, \lambda\big) = -\sum_n p_n \ln p_n - \lambda \Big( \sum_n p_n - 1 \Big) \tag{1.48} $$

and we freely extremize $S^*$ over all its arguments. Thus, for all $n$ we have

$$ \begin{aligned} 0 &= \frac{\partial S^*}{\partial p_n} = -\big( \ln p_n + 1 + \lambda \big) \\ 0 &= \frac{\partial S^*}{\partial \lambda} = -\Big( \sum_n p_n - 1 \Big) \, . \end{aligned} \tag{1.49} $$

From the first of these equations, we obtain $p_n = e^{-(1+\lambda)}$, and from the second we obtain

$$ \sum_n p_n = e^{-(1+\lambda)} \cdot \sum_n 1 = \Gamma\, e^{-(1+\lambda)} = 1 \, , \tag{1.50} $$

where $\Gamma \equiv \sum_n 1$ is the total number of possible events. Thus, $p_n = 1/\Gamma$, which says that all events are equally probable.
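Now suppose we know one other piece of information, which is the average value $X = \sum_n X_n p_n$ of some quantity. We now extremize $S$ subject to two constraints, and so we define

$$ S^*\big(\{p_n\}, \lambda_0, \lambda_1\big) = -\sum_n p_n \ln p_n - \lambda_0 \Big( \sum_n p_n - 1 \Big) - \lambda_1 \Big( \sum_n X_n p_n - X \Big) \, . \tag{1.51} $$

We then have

$$ \frac{\partial S^*}{\partial p_n} = -\big( \ln p_n + 1 + \lambda_0 + \lambda_1 X_n \big) = 0 \, , \tag{1.52} $$

which yields the two-parameter distribution

$$ p_n = e^{-(1+\lambda_0)}\, e^{-\lambda_1 X_n} \, . \tag{1.53} $$

To fully determine the distribution $\{p_n\}$ we need to invoke the two equations $\sum_n p_n = 1$ and $\sum_n X_n p_n = X$, which come from extremizing $S^*$ with respect to $\lambda_0$ and $\lambda_1$, respectively:

$$ \begin{aligned} 1 &= e^{-(1+\lambda_0)} \sum_n e^{-\lambda_1 X_n} \\ X &= e^{-(1+\lambda_0)} \sum_n X_n\, e^{-\lambda_1 X_n} \, . \end{aligned} \tag{1.54} $$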
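In practice the pair of equations 1.54 is solved numerically: eliminating $\lambda_0$ leaves a single equation for $\lambda_1$, and because $d\langle X \rangle / d\lambda_1 = -\mathrm{Var}(X) < 0$ the mean is monotonic in $\lambda_1$, so bisection suffices. The sketch below is our own illustration, not a method from the text; the bracket $[-50, 50]$, the tolerance and the six-sided-die example with target mean 4.5 are all assumptions.

```python
from math import exp

def maxent_distribution(values, target, lo=-50.0, hi=50.0, tol=1e-12):
    """Maximum-entropy p_n proportional to exp(-lam * X_n) with sum_n X_n p_n = target.

    Assumes min(values) < target < max(values) and that the root lies in [lo, hi].
    """
    def dist(lam):
        exponents = [-lam * x for x in values]
        top = max(exponents)                 # subtract the largest exponent to avoid overflow
        w = [exp(t - top) for t in exponents]
        Z = sum(w)                           # normalization, i.e. e^{1 + lambda_0} up to the shift
        return [wi / Z for wi in w]

    def mean(lam):
        return sum(p * x for p, x in zip(dist(lam), values))

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(mid) > target:
            lo = mid          # mean too large: increase lambda_1
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam, dist(lam)

faces = [1, 2, 3, 4, 5, 6]                   # hypothetical example: X_n = n for a die
lam, p = maxent_distribution(faces, 4.5)
print(lam, [round(pn, 4) for pn in p])
```

A mean above 3.5 yields $\lambda_1 < 0$, so the distribution tilts exponentially toward the larger faces.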

General formulation

The generalization to $K$ extra pieces of information (plus normalization) is immediately apparent. We have

$$ X^a = \sum_n X^a_n\, p_n \, , \tag{1.55} $$

and therefore we define

$$ S^*\big(\{p_n\}, \{\lambda_a\}\big) = -\sum_n p_n \ln p_n - \sum_{a=0}^K \lambda_a \Big( \sum_n X^a_n\, p_n - X^a \Big) \, , \tag{1.56} $$

with $X^{(a=0)}_n \equiv X^{(a=0)} = 1$. Then the optimal distribution which extremizes $S$ subject to the $K+1$ constraints is

$$ \begin{aligned} p_n &= \exp\Big\{ -1 - \sum_{a=0}^K \lambda_a X^a_n \Big\} \\ &= \frac{1}{Z} \exp\Big\{ -\sum_{a=1}^K \lambda_a X^a_n \Big\} \, , \end{aligned} \tag{1.57} $$

where $Z = e^{1+\lambda_0}$ is determined by normalization: $\sum_n p_n = 1$. This is a $(K+1)$-parameter distribution, with $\{\lambda_0, \lambda_1, \ldots, \lambda_K\}$ determined by the $K+1$ constraints in eqn. 1.55.

Example

As an example, consider the random walk problem. We have two pieces of information:

$$ \begin{aligned} \sum_{\sigma_1} \cdots \sum_{\sigma_N} P(\sigma_1, \ldots, \sigma_N) &= 1 \\ \sum_{\sigma_1} \cdots \sum_{\sigma_N} P(\sigma_1, \ldots, \sigma_N) \sum_{j=1}^N \sigma_j &= X \, . \end{aligned} \tag{1.58} $$

Here the discrete label $n$ from §1.4.2 ranges over $2^N$ possible values, and may be written as an $N$-digit binary number $r_N \cdots r_1$, where $r_j = \frac{1}{2}(1 + \sigma_j)$ is 0 or 1. Extremizing $S$ subject to these constraints, we obtain

$$ P(\sigma_1, \ldots, \sigma_N) = C \exp\Big\{ -\lambda \sum_j \sigma_j \Big\} = C \prod_{j=1}^N e^{-\lambda \sigma_j} \, , \tag{1.59} $$

where $C \equiv e^{-(1+\lambda_0)}$ and $\lambda \equiv \lambda_1$. Normalization then requires

$$ \mathrm{Tr}\, P \equiv \sum_{\{\sigma_j\}} P(\sigma_1, \ldots, \sigma_N) = C \big( e^{\lambda} + e^{-\lambda} \big)^N = 1 \, , \tag{1.60} $$

hence $C = (2\cosh\lambda)^{-N}$. We then have


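$$ P(\sigma_1, \ldots, \sigma_N) = \prod_{j=1}^N \frac{e^{-\lambda \sigma_j}}{e^{\lambda} + e^{-\lambda}} = \prod_{j=1}^N \big( p\, \delta_{\sigma_j,1} + q\, \delta_{\sigma_j,-1} \big) \, , \tag{1.61} $$

where

$$ p = \frac{e^{-\lambda}}{e^{\lambda} + e^{-\lambda}} \, , \qquad q = 1 - p = \frac{e^{\lambda}}{e^{\lambda} + e^{-\lambda}} \, . \tag{1.62} $$

We then have $X = (2p - 1)N$, which determines $p = \frac{1}{2}(1 + X/N)$, and we have recovered the Bernoulli distribution.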
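From eqn. 1.62 one finds $p - q = -\tanh\lambda$, so fixing $X/N$ fixes $\lambda = -\tanh^{-1}(X/N)$. The sketch below checks this inversion for the illustrative values $N = 100$, $X = 40$, and also verifies the normalization $C = (2\cosh\lambda)^{-N}$ by brute-force enumeration over all $2^6$ configurations of a short walk with an arbitrary $\lambda = 0.3$.

```python
from itertools import product
from math import atanh, cosh, exp, prod

def p_from_lambda(lam):
    """Right-step probability p of eqn. 1.62."""
    return exp(-lam) / (exp(lam) + exp(-lam))

def lambda_from_x(x):
    """Invert X/N = p - q = -tanh(lam), which follows from eqn. 1.62; requires |x| < 1."""
    return -atanh(x)

N, X = 100, 40
lam = lambda_from_x(X / N)
print(lam, p_from_lambda(lam), 0.5 * (1 + X / N))   # the last two agree

# Brute-force check of the normalization C = (2 cosh lam)^(-N) on a short walk
n_small, lam0 = 6, 0.3
Z = sum(prod(exp(-lam0 * s) for s in sigmas) for sigmas in product((1, -1), repeat=n_small))
print(Z, (2 * cosh(lam0)) ** n_small)
```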
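Of course there are no miracles⁷, and there is an infinite family of distributions for which $X = (2p-1)N$ that are not Bernoulli. For example, we could have imposed another constraint, such as $E = \sum_{j=1}^{N-1} \sigma_j \sigma_{j+1}$. This would result in the distribution

$$ P(\sigma_1, \ldots, \sigma_N) = \frac{1}{Z} \exp\Big\{ -\lambda_1 \sum_{j=1}^N \sigma_j - \lambda_2 \sum_{j=1}^{N-1} \sigma_j \sigma_{j+1} \Big\} \, , \tag{1.63} $$

with $Z(\lambda_1, \lambda_2)$ determined by normalization: $\sum_{\sigma} P(\sigma) = 1$. This is the one-dimensional Ising chain of classical equilibrium statistical physics. Defining the transfer matrix $R_{ss'} = e^{-\lambda_1 (s+s')/2}\, e^{-\lambda_2 s s'}$ with $s, s' = \pm 1$,

$$ R = \begin{pmatrix} e^{-\lambda_1 - \lambda_2} & e^{\lambda_2} \\ e^{\lambda_2} & e^{\lambda_1 - \lambda_2} \end{pmatrix} = e^{-\lambda_2} \cosh\lambda_1\, \mathbb{I} + e^{\lambda_2}\, \tau^x - e^{-\lambda_2} \sinh\lambda_1\, \tau^z \, , \tag{1.64} $$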
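where $\tau^x$ and $\tau^z$ are Pauli matrices, we have that

$$ Z_{\rm ring} = \mathrm{Tr}\big(R^N\big) \, , \qquad Z_{\rm chain} = \mathrm{Tr}\big(R^{N-1} S\big) \, , \tag{1.65} $$

where $S_{ss'} = e^{-\lambda_1 (s+s')/2}$, i.e.

$$ S = \begin{pmatrix} e^{-\lambda_1} & 1 \\ 1 & e^{\lambda_1} \end{pmatrix} = \cosh\lambda_1\, \mathbb{I} + \tau^x - \sinh\lambda_1\, \tau^z \, . \tag{1.66} $$

The appropriate case here is that of the chain, but in the thermodynamic limit $N \to \infty$ both chain and ring yield identical results, so we will examine here the results for the ring, which are somewhat easier to obtain. Clearly $Z_{\rm ring} = \zeta_+^N + \zeta_-^N$, where $\zeta_\pm$ are the eigenvalues of $R$:

$$ \zeta_\pm = e^{-\lambda_2} \cosh\lambda_1 \pm \sqrt{ e^{-2\lambda_2} \sinh^2\!\lambda_1 + e^{2\lambda_2} } \, . \tag{1.67} $$

In the thermodynamic limit, the $\zeta_+$ eigenvalue dominates, and $Z_{\rm ring} \simeq \zeta_+^N$. We now have

$$ X = \Big\langle \sum_{j=1}^N \sigma_j \Big\rangle = -\frac{\partial \ln Z}{\partial \lambda_1} = -\frac{N \sinh\lambda_1}{\sqrt{\sinh^2\!\lambda_1 + e^{4\lambda_2}}} \, . \tag{1.68} $$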
7
See §10 of An Enquiry Concerning Human Understanding by David Hume (1748).

We also have $E = -\partial \ln Z / \partial \lambda_2$. These two equations determine the Lagrange multipliers $\lambda_1(X, E, N)$ and $\lambda_2(X, E, N)$. In the thermodynamic limit, we have $\lambda_i = \lambda_i(X/N, E/N)$. Thus, if we fix $X/N = 2p - 1$ alone, there is a continuous one-parameter family of distributions, parametrized by $\varepsilon = E/N$, which satisfy the constraint on $X$.
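The transfer-matrix results can be checked numerically. The sketch below, using NumPy with arbitrary sample multipliers $\lambda_1 = 0.4$, $\lambda_2 = -0.3$, first compares $\mathrm{Tr}(R^N)$ against a brute-force sum over all $2^N$ ring configurations for $N = 8$, then compares a finite-difference derivative of $\ln\zeta_+$ with eqn. 1.68. The function names and the step size `h` are our own choices.

```python
import itertools
import numpy as np

def transfer_matrix(l1, l2):
    """R_{ss'} = exp(-l1 (s + s')/2) exp(-l2 s s'), rows and columns ordered s = +1, -1."""
    s = np.array([1.0, -1.0])
    S, Sp = np.meshgrid(s, s, indexing="ij")
    return np.exp(-l1 * (S + Sp) / 2) * np.exp(-l2 * S * Sp)

def z_ring_brute(N, l1, l2):
    """Direct sum over all 2^N configurations of a ring (periodic boundary conditions)."""
    total = 0.0
    for sig in itertools.product((1, -1), repeat=N):
        field = sum(sig)
        bonds = sum(sig[j] * sig[(j + 1) % N] for j in range(N))
        total += np.exp(-l1 * field - l2 * bonds)
    return total

l1, l2, N = 0.4, -0.3, 8
R = transfer_matrix(l1, l2)
print(z_ring_brute(N, l1, l2), np.trace(np.linalg.matrix_power(R, N)))

def ln_zeta_plus(l1, l2):
    """Log of the largest eigenvalue of the (symmetric) transfer matrix."""
    return np.log(np.max(np.linalg.eigvalsh(transfer_matrix(l1, l2))))

# X/N in the thermodynamic limit: -d ln(zeta_+)/d lambda_1, versus eqn. 1.68
h = 1e-6
x_numeric = -(ln_zeta_plus(l1 + h, l2) - ln_zeta_plus(l1 - h, l2)) / (2 * h)
x_formula = -np.sinh(l1) / np.sqrt(np.sinh(l1) ** 2 + np.exp(4 * l2))
print(x_numeric, x_formula)
```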
So what is it about the maximum entropy approach that is so compelling? Maximum entropy gives us
a calculable distribution which is consistent with maximum ignorance given our known constraints. In
that sense, it is as unbiased as possible, from an information theoretic point of view. As a starting point,
a maximum entropy distribution may be improved upon, using Bayesian methods for example (see §1.6.2
below).

1.4.3 Continuous probability distributions

Suppose we have a continuous probability density $P(\varphi)$ defined over some set $\Omega$. We have observables

$$ X^a = \int d\mu\; X^a(\varphi)\, P(\varphi) \, , \tag{1.69} $$

where $d\mu$ is the appropriate integration measure. We assume $d\mu = \prod_{j=1}^D d\varphi_j$, where $D$ is the dimension of $\Omega$. Then we extremize the functional

$$ S^*\big[P(\varphi), \{\lambda_a\}\big] = -\int_\Omega d\mu\; P(\varphi) \ln P(\varphi) - \sum_{a=0}^K \lambda_a \Big( \int_\Omega d\mu\; P(\varphi)\, X^a(\varphi) - X^a \Big) \tag{1.70} $$

with respect to $P(\varphi)$ and with respect to $\{\lambda_a\}$. Again, $X^0(\varphi) \equiv X^0 \equiv 1$. This yields the following result:

$$ \ln P(\varphi) = -1 - \sum_{a=0}^K \lambda_a X^a(\varphi) \, . \tag{1.71} $$

The $K+1$ Lagrange multipliers $\{\lambda_a\}$ are then determined from the $K+1$ constraint equations in eqn. 1.69.
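As an example, consider a distribution $P(x)$ over the real numbers $\mathbb{R}$. We constrain

$$ \int_{-\infty}^{\infty}\! dx\; P(x) = 1 \, , \qquad \int_{-\infty}^{\infty}\! dx\; x\, P(x) = \mu \, , \qquad \int_{-\infty}^{\infty}\! dx\; x^2\, P(x) = \mu^2 + \sigma^2 \, . \tag{1.72} $$

Extremizing the entropy, we then obtain

$$ P(x) = C\, e^{-\lambda_1 x - \lambda_2 x^2} \, , \tag{1.73} $$

where $C = e^{-(1+\lambda_0)}$. We already know the answer:

$$ P(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\; e^{-(x-\mu)^2/2\sigma^2} \, . \tag{1.74} $$

In other words, $\lambda_1 = -\mu/\sigma^2$ and $\lambda_2 = 1/2\sigma^2$, with $C = (2\pi\sigma^2)^{-1/2} \exp(-\mu^2/2\sigma^2)$.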

1.5 General Aspects of Probability Distributions

1.5.1 Discrete and continuous distributions

Consider a system whose possible configurations $|\,n\,\rangle$ can be labeled by a discrete variable $n \in \mathcal{C}$, where $\mathcal{C}$ is the set of possible configurations. The total number of possible configurations, which is to say the order of the set $\mathcal{C}$, may be finite or infinite. Next, consider an ensemble of such systems, and let $P_n$ denote the probability that a given random element from that ensemble is in the state (configuration) $|\,n\,\rangle$. The collection $\{P_n\}$ forms a discrete probability distribution. We assume that the distribution is normalized, meaning

$$ \sum_{n \in \mathcal{C}} P_n = 1 \, . \tag{1.75} $$

Now let $A_n$ be a quantity which takes values depending on $n$. The average of $A$ is given by

$$ \langle A \rangle = \sum_{n \in \mathcal{C}} P_n A_n \, . \tag{1.76} $$

Typically, $\mathcal{C}$ is the set of integers ($\mathbb{Z}$) or some subset thereof, but it could be any countable set. As an example, consider the throw of a single six-sided die. Then $P_n = \frac{1}{6}$ for each $n \in \{1, \ldots, 6\}$. Let $A_n = 0$ if $n$ is even and 1 if $n$ is odd. Then we find $\langle A \rangle = \frac{1}{2}$, i.e. on average half the throws of the die will result in an odd number.
It may be that the system's configurations are described by several discrete variables $\{n_1, n_2, n_3, \ldots\}$. We can combine these into a vector $\boldsymbol{n}$ and then we write $P_{\boldsymbol{n}}$ for the discrete distribution, with $\sum_{\boldsymbol{n}} P_{\boldsymbol{n}} = 1$.
Another possibility is that the system's configurations are parameterized by a collection of continuous variables, $\varphi = \{\varphi_1, \ldots, \varphi_n\}$. We write $\varphi \in \Omega$, where $\Omega$ is the phase space (or configuration space) of the system. Let $d\mu$ be a measure on this space. In general, we can write

$$ d\mu = W(\varphi_1, \ldots, \varphi_n)\; d\varphi_1\, d\varphi_2 \cdots d\varphi_n \, . \tag{1.77} $$

The phase space measure used in classical statistical mechanics gives equal weight $W$ to equal phase space volumes:

$$ d\mu = C \prod_{\sigma=1}^r dq_\sigma\, dp_\sigma \, , \tag{1.78} $$

where $C$ is a constant we shall discuss later⁸.


Any continuous probability distribution $P(\varphi)$ is normalized according to

$$ \int d\mu\; P(\varphi) = 1 \, . \tag{1.79} $$

8
Such a measure is invariant with respect to canonical transformations, which are the broad class of transformations
among coordinates and momenta which leave Hamilton’s equations of motion invariant, and which preserve phase space
volumes under Hamiltonian evolution. For this reason dµ is called an invariant phase space measure.

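The average of a function $A(\varphi)$ on configuration space is then

$$ \langle A \rangle = \int d\mu\; P(\varphi)\, A(\varphi) \, . \tag{1.80} $$

For example, consider the Gaussian distribution

$$ P(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\; e^{-(x-\mu)^2/2\sigma^2} \, . \tag{1.81} $$

From the result⁹

$$ \int_{-\infty}^{\infty}\! dx\; e^{-\alpha x^2}\, e^{-\beta x} = \sqrt{\frac{\pi}{\alpha}}\; e^{\beta^2/4\alpha} \, , \tag{1.82} $$

we see that $P(x)$ is normalized. One can then compute

$$ \begin{aligned} \langle x \rangle &= \mu \\ \langle x^2 \rangle - \langle x \rangle^2 &= \sigma^2 \, . \end{aligned} \tag{1.83} $$

We call $\mu$ the mean and $\sigma$ the standard deviation of the distribution, eqn. 1.81.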
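Eqns. 1.82 and 1.83 are easy to confirm by brute-force quadrature. This sketch uses a plain Riemann sum on a wide, fine uniform grid, which is highly accurate here because the integrands are smooth and negligible at the edges; the grid, the $(\alpha, \beta)$ pairs and the values $\mu = 1.5$, $\sigma = 0.7$ are arbitrary choices of ours.

```python
import numpy as np

def riemann(fx, x):
    """Plain Riemann sum on a uniform grid; accurate here since the integrands decay rapidly."""
    return fx.sum() * (x[1] - x[0])

x = np.linspace(-40.0, 40.0, 400001)

def gaussian_integral_exact(alpha, beta):
    """Right-hand side of eqn. 1.82."""
    return np.sqrt(np.pi / alpha) * np.exp(beta**2 / (4 * alpha))

for alpha, beta in [(1.0, 0.0), (0.5, 1.3), (2.0, -3.0)]:
    numeric = riemann(np.exp(-alpha * x**2 - beta * x), x)
    print(alpha, beta, numeric, gaussian_integral_exact(alpha, beta))

# Normalization, mean and variance of eqn. 1.81 (cf. eqn. 1.83)
mu, sigma = 1.5, 0.7
P = np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
norm = riemann(P, x)
mean = riemann(x * P, x)
var = riemann(x**2 * P, x) - mean**2
print(norm, mean, var)   # approximately 1, mu, sigma^2
```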
The quantity $P(\varphi)$ is called the distribution or probability density. One has

$$ P(\varphi)\, d\mu = \text{probability that configuration lies within volume } d\mu \text{ centered at } \varphi $$

For example, consider the probability density $P = 1$ normalized on the interval $x \in [0,1]$. The probability that some $x$ chosen at random will be exactly $\frac{1}{2}$, say, is infinitesimal: one would have to specify each of the infinitely many digits of $x$. However, we can say that $x \in [0.45, 0.55]$ with probability $\frac{1}{10}$.
If $x_1$ and $x_2$ are each independently distributed according to $P_1(x)$, then the probability distribution on the product space $(x_1, x_2)$ is simply the product of the distributions: $P_2(x_1, x_2) = P_1(x_1)\, P_1(x_2)$. Suppose we have a function $\phi(x_1, \ldots, x_N)$. How is it distributed? Let $P(\phi)$ be the distribution for $\phi$. We then have

$$ \begin{aligned} P(\phi) &= \int_{-\infty}^{\infty}\! dx_1 \cdots \int_{-\infty}^{\infty}\! dx_N\; P_N(x_1, \ldots, x_N)\; \delta\big( \phi(x_1, \ldots, x_N) - \phi \big) \\ &= \int_{-\infty}^{\infty}\! dx_1 \cdots \int_{-\infty}^{\infty}\! dx_N\; P_1(x_1) \cdots P_1(x_N)\; \delta\big( \phi(x_1, \ldots, x_N) - \phi \big) \, , \end{aligned} \tag{1.84} $$

where the second line is appropriate if the $\{x_j\}$ are themselves distributed independently. Note that

$$ \int_{-\infty}^{\infty}\! d\phi\; P(\phi) = 1 \, , \tag{1.85} $$

so $P(\phi)$ is itself normalized.


9
Memorize this!

1.5.2 Central limit theorem


In particular, consider the distribution function of the sum $X = \sum_{i=1}^N x_i$. We will be particularly interested in the case where $N$ is large. For general $N$, though, we have

$$ P_N(X) = \int_{-\infty}^{\infty}\! dx_1 \cdots \int_{-\infty}^{\infty}\! dx_N\; P_1(x_1) \cdots P_1(x_N)\; \delta\big( x_1 + x_2 + \ldots + x_N - X \big) \, . \tag{1.86} $$

It is convenient to compute the Fourier transform¹⁰ of $P_N(X)$:

$$ \begin{aligned} \hat P_N(k) &= \int_{-\infty}^{\infty}\! dX\; P_N(X)\, e^{-ikX} \\ &= \int_{-\infty}^{\infty}\! dX \int_{-\infty}^{\infty}\! dx_1 \cdots \int_{-\infty}^{\infty}\! dx_N\; P_1(x_1) \cdots P_1(x_N)\; \delta\big( x_1 + \ldots + x_N - X \big)\, e^{-ikX} = \big[ \hat P_1(k) \big]^N \, , \end{aligned} \tag{1.87} $$

where

$$ \hat P_1(k) = \int_{-\infty}^{\infty}\! dx\; P_1(x)\, e^{-ikx} \tag{1.88} $$

is the Fourier transform of the single variable distribution $P_1(x)$. The distribution $P_N(X)$ is a convolution of the individual $P_1(x_i)$ distributions. We have therefore proven that the Fourier transform of a convolution is the product of the Fourier transforms.
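The discrete analogue of eqn. 1.87 can be demonstrated with NumPy's FFT. In this sketch the two integer-valued distributions are invented for illustration; zero-padding to the full output length makes the circular FFT product reproduce the ordinary (linear) convolution. The second part computes the distribution of the sum of four dice both by repeated convolution and as the fourth power of a single transform.

```python
import numpy as np

# Two hypothetical discrete distributions on the integers 0, 1, 2, ...
p1 = np.array([0.2, 0.5, 0.3])
p2 = np.array([0.1, 0.4, 0.4, 0.1])

direct = np.convolve(p1, p2)                 # distribution of the sum x1 + x2

# Zero-pad to the full convolution length so the circular product equals the linear one
n = len(p1) + len(p2) - 1
via_fft = np.fft.ifft(np.fft.fft(p1, n) * np.fft.fft(p2, n)).real
print(np.allclose(direct, via_fft))          # True

# N-fold sum of a fair die (faces shifted to 0..5): P_N transform = (P_1 transform)^N
die = np.full(6, 1 / 6)
N = 4
m = N * (len(die) - 1) + 1                   # support size of the sum
pN_fft = np.fft.ifft(np.fft.fft(die, m) ** N).real
pN_direct = die
for _ in range(N - 1):
    pN_direct = np.convolve(pN_direct, die)
print(np.allclose(pN_fft, pN_direct))        # True
```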
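OK, now we can write for $\hat P_1(k)$

$$ \begin{aligned} \hat P_1(k) &= \int_{-\infty}^{\infty}\! dx\; P_1(x) \Big( 1 - ikx - \tfrac{1}{2} k^2 x^2 + \tfrac{1}{6} i k^3 x^3 + \ldots \Big) \\ &= 1 - ik\langle x \rangle - \tfrac{1}{2} k^2 \langle x^2 \rangle + \tfrac{1}{6} i k^3 \langle x^3 \rangle + \ldots \, . \end{aligned} \tag{1.89} $$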
10
Jean Baptiste Joseph Fourier (1768-1830) had an illustrious career. The son of a tailor, and orphaned at age eight,
Fourier’s ignoble status rendered him ineligible to receive a commission in the scientific corps of the French army. A
Benedictine minister at the École Royale Militaire of Auxerre remarked, ”Fourier, not being noble, could not enter the
artillery, although he were a second Newton.” Fourier prepared for the priesthood but his affinity for mathematics proved
overwhelming, and so he left the abbey and soon thereafter accepted a military lectureship position. Despite his initial
support for revolution in France, in 1794 Fourier ran afoul of a rival sect while on a trip to Orleans and was arrested
and very nearly guillotined. Fortunately the Reign of Terror ended soon after the death of Robespierre, and Fourier was
released. He went on Napoleon Bonaparte’s 1798 expedition to Egypt, where he was appointed governor of Lower Egypt. His
organizational skills impressed Napoleon, and upon return to France he was appointed to a position of prefect in Grenoble.
It was in Grenoble that Fourier performed his landmark studies of heat, and his famous work on partial differential equations
and Fourier series. It seems that Fourier’s fascination with heat began in Egypt, where he developed an appreciation of
desert climate. His fascination developed into an obsession, and he became convinced that heat could promote a healthy
body. He would cover himself in blankets, like a mummy, in his heated apartment, even during the middle of summer. On
May 4, 1830, Fourier, so arrayed, tripped and fell down a flight of stairs. This aggravated a developing heart condition,
which he refused to treat with anything other than more heat. Two weeks later, he died. Fourier’s is one of the 72 names
of scientists, engineers and other luminaries which are engraved on the Eiffel Tower.

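Thus,

$$ \ln \hat P_1(k) = -i\mu k - \tfrac{1}{2}\sigma^2 k^2 + \tfrac{1}{6} i \gamma^3 k^3 + \ldots \, , \tag{1.90} $$

where

$$ \begin{aligned} \mu &= \langle x \rangle \\ \sigma^2 &= \langle x^2 \rangle - \langle x \rangle^2 \\ \gamma^3 &= \langle x^3 \rangle - 3\langle x^2 \rangle \langle x \rangle + 2\langle x \rangle^3 \, . \end{aligned} \tag{1.91} $$

We can now write

$$ \big[ \hat P_1(k) \big]^N = e^{-iN\mu k}\, e^{-N\sigma^2 k^2/2}\, e^{iN\gamma^3 k^3/6} \cdots \tag{1.92} $$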
Now for the inverse transform. In computing $P_N(X)$, we will expand the term $e^{iN\gamma^3 k^3/6}$ and all subsequent terms in the above product as a power series in $k$. We then have

$$ \begin{aligned} P_N(X) &= \int_{-\infty}^{\infty}\! \frac{dk}{2\pi}\; e^{ik(X - N\mu)}\, e^{-N\sigma^2 k^2/2} \Big\{ 1 + \tfrac{1}{6} i N \gamma^3 k^3 + \ldots \Big\} \\ &= \Big( 1 - \frac{\gamma^3}{6}\, N \frac{\partial^3}{\partial X^3} + \ldots \Big) \frac{1}{\sqrt{2\pi N\sigma^2}}\; e^{-(X - N\mu)^2/2N\sigma^2} \\ &= \Big( 1 - \frac{\gamma^3}{6}\, N^{-1/2} \frac{\partial^3}{\partial \xi^3} + \ldots \Big) \frac{1}{\sqrt{2\pi N\sigma^2}}\; e^{-\xi^2/2\sigma^2} \, . \end{aligned} \tag{1.93} $$

In going from the second line to the third, we have written $X = N\mu + \sqrt{N}\,\xi$, in which case $\partial_X = N^{-1/2}\,\partial_\xi$, and the non-Gaussian terms give a subleading contribution which vanishes in the $N \to \infty$ limit. We have just proven the central limit theorem: in the limit $N \to \infty$, the distribution of a sum of $N$ independent, identically distributed random variables $x_i$ is a Gaussian with mean $N\mu$ and standard deviation $\sqrt{N}\,\sigma$. Our only assumptions are that the mean $\mu$ and standard deviation $\sigma$ exist for the distribution $P_1(x)$. Note that $P_1(x)$ itself need not be a Gaussian: it could be a very peculiar distribution indeed, but so long as its first and second moments exist, where the $k$th moment is simply $\langle x^k \rangle$, the distribution of the sum $X = \sum_{i=1}^N x_i$ is a Gaussian.
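The central limit theorem can be watched at work on a decidedly non-Gaussian $P_1$: a fair six-sided die. The sketch below (our own example; `sum_distribution` is a hypothetical helper) builds the exact distribution of the sum of $N = 30$ dice by repeated convolution and compares it with a Gaussian of mean $N\mu$ and variance $N\sigma^2$.

```python
import numpy as np

def sum_distribution(p1, N):
    """Exact pmf of the sum of N i.i.d. integer variables whose pmf p1 lives on 0, 1, 2, ..."""
    pN = np.array([1.0])
    for _ in range(N):
        pN = np.convolve(pN, p1)
    return pN

faces = np.arange(1, 7)
p1 = np.full(6, 1 / 6)
mu = np.dot(p1, faces)                       # 3.5
var = np.dot(p1, faces**2) - mu**2           # 35/12

N = 30
pN = sum_distribution(p1, N)
X = np.arange(len(pN)) + N * faces[0]        # the sum runs from N to 6N
gauss = np.exp(-(X - N * mu) ** 2 / (2 * N * var)) / np.sqrt(2 * np.pi * N * var)
print(np.max(np.abs(pN - gauss)), pN.max())  # pointwise error versus peak height
```

Even for this flat, bounded $P_1$, the largest pointwise error is already a small fraction of the peak height at $N = 30$.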

1.5.3 Moments and cumulants

Consider a general multivariate distribution P (x1 , . . . , xN ) and define the multivariate Fourier transform

Z∞ Z∞  N
X 
P̂ (k1 , . . . , kN ) = dx1 · · · dxN P (x1 , . . . , xN ) exp − i kj xj . (1.94)
−∞ −∞ j=1

The inverse relation is


Z∞ Z∞  N
X 
dk1 dkN
P (x1 , . . . , xN ) = ··· P̂ (k1 , . . . , kN ) exp + i kj xj . (1.95)
2π 2π
−∞ −∞ j=1
22 CHAPTER 1. FUNDAMENTALS OF PROBABILITY


Acting on P̂ (k), the differential operator i ∂k brings down from the exponential a factor of xi inside the
i
integral. Thus,
"    #
∂ m1 ∂ mN
m m
i ··· i P̂ (k) = x1 1 · · · xN N . (1.96)
∂k1 ∂kN
k=0
Similarly, we can reconstruct the distribution from its moments, viz .

X ∞
X (−ik1 )m1 (−ikN )mN
m1 m
P̂ (k) = ··· ··· x1 · · · xN N . (1.97)
m1 ! mN !
m1 =0 mN =0

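The cumulants $\langle\langle x_1^{m_1} \cdots x_N^{m_N}\rangle\rangle$ are defined by the Taylor expansion of $\ln \hat P(\mathbf{k})$:

$$ \ln \hat P(\mathbf{k}) = \sum_{m_1=0}^{\infty} \cdots \sum_{m_N=0}^{\infty} \frac{(-ik_1)^{m_1}}{m_1!} \cdots \frac{(-ik_N)^{m_N}}{m_N!}\, \big\langle\!\big\langle x_1^{m_1} \cdots x_N^{m_N}\big\rangle\!\big\rangle . \tag{1.98} $$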

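There is no simple general form for the cumulants. It is straightforward to derive the following low order results:

$$
\begin{aligned}
\langle\langle x_i\rangle\rangle &= \langle x_i\rangle \\
\langle\langle x_i x_j\rangle\rangle &= \langle x_i x_j\rangle - \langle x_i\rangle\langle x_j\rangle \\
\langle\langle x_i x_j x_k\rangle\rangle &= \langle x_i x_j x_k\rangle - \langle x_i x_j\rangle\langle x_k\rangle - \langle x_j x_k\rangle\langle x_i\rangle - \langle x_k x_i\rangle\langle x_j\rangle + 2\langle x_i\rangle\langle x_j\rangle\langle x_k\rangle .
\end{aligned}
\tag{1.99}
$$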
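Eqn. 1.99 can be checked exactly on a small discrete example. The sketch below assumes a hypothetical joint distribution of two independent 0/1 (Bernoulli) variables with $P(x_0 = 1) = p$ and $P(x_1 = 1) = q$, and uses exact rational arithmetic. For a single Bernoulli variable the third cumulant is $p(1-p)(1-2p)$, and mixed cumulants of independent variables should vanish.

```python
from fractions import Fraction
from itertools import product


def moment(dist, idx):
    """<x_{i1} x_{i2} ...> for a discrete joint distribution {point: probability}."""
    total = Fraction(0)
    for point, prob in dist.items():
        term = prob
        for i in idx:
            term *= point[i]
        total += term
    return total


def cumulant2(dist, i, j):
    """<<x_i x_j>> from the second line of eqn. 1.99."""
    return moment(dist, (i, j)) - moment(dist, (i,)) * moment(dist, (j,))


def cumulant3(dist, i, j, k):
    """<<x_i x_j x_k>> from the third line of eqn. 1.99."""
    def m(*idx):
        return moment(dist, idx)
    return (m(i, j, k) - m(i, j) * m(k) - m(j, k) * m(i) - m(k, i) * m(j)
            + 2 * m(i) * m(j) * m(k))


def independent_bernoullis(p, q):
    """Joint distribution of two independent 0/1 variables with P(1) = p and q."""
    dist = {}
    for a, b in product((0, 1), repeat=2):
        dist[(a, b)] = (p if a else 1 - p) * (q if b else 1 - q)
    return dist


if __name__ == "__main__":
    p, q = Fraction(1, 3), Fraction(3, 4)
    d = independent_bernoullis(p, q)
    print(cumulant3(d, 0, 0, 0))  # 2/27, i.e. p(1-p)(1-2p)
    print(cumulant3(d, 0, 0, 1))  # 0, since x_0 and x_1 are independent
```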

1.5.4 Multidimensional Gaussian integral

Consider the multivariable Gaussian distribution,

$$ P(\mathbf{x}) \equiv \bigg(\frac{\det A}{(2\pi)^n}\bigg)^{1/2} \exp\big(-\tfrac{1}{2}\, x_i A_{ij} x_j\big) , \tag{1.100} $$
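where $A$ is a symmetric positive definite $n \times n$ matrix, and repeated indices are summed. A mathematical result which is extremely important throughout physics is the following:

$$ Z(\mathbf{b}) = \bigg(\frac{\det A}{(2\pi)^n}\bigg)^{1/2} \int_{-\infty}^{\infty}\!dx_1 \cdots \int_{-\infty}^{\infty}\!dx_n\; \exp\big(-\tfrac{1}{2}\, x_i A_{ij} x_j + b_i x_i\big) = \exp\big(\tfrac{1}{2}\, b_i A^{-1}_{ij} b_j\big) . \tag{1.101} $$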

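Here, the vector $\mathbf{b} = (b_1,\ldots,b_n)$ is identified as a source. Since $Z(0) = 1$, we have that the distribution $P(\mathbf{x})$ is normalized. Now consider averages of the form

$$
\begin{aligned}
\langle x_{j_1} \cdots x_{j_{2k}}\rangle &= \int\! d^n\!x\; P(\mathbf{x})\, x_{j_1} \cdots x_{j_{2k}} = \frac{\partial^{2k} Z(\mathbf{b})}{\partial b_{j_1} \cdots \partial b_{j_{2k}}}\bigg|_{\mathbf{b}=0} \\
&= \sum_{\rm contractions} A^{-1}_{j_{\sigma(1)} j_{\sigma(2)}} \cdots A^{-1}_{j_{\sigma(2k-1)} j_{\sigma(2k)}} .
\end{aligned}
\tag{1.102}
$$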

The sum in the last term is over all contractions of the indices $\{j_1,\ldots,j_{2k}\}$. A contraction is an arrangement of the $2k$ indices into $k$ pairs. There are $C_{2k} = (2k)!/2^k k!$ possible such contractions. To obtain this result for $C_{2k}$, we start with the first index and then find a mate among the remaining $2k-1$ indices. Then we choose the next unpaired index and find a mate among the remaining $2k-3$ indices. Proceeding in this manner, we have

$$ C_{2k} = (2k-1)\cdot(2k-3) \cdots 3 \cdot 1 = \frac{(2k)!}{2^k\, k!} . \tag{1.103} $$
Equivalently, we can take all possible permutations of the $2k$ indices, and then divide by $2^k k!$, since permutation within a given pair results in the same contraction and permutation among the $k$ pairs results in the same contraction. For example, for $k = 2$, we have $C_4 = 3$, and

$$ \langle x_{j_1} x_{j_2} x_{j_3} x_{j_4}\rangle = A^{-1}_{j_1 j_2} A^{-1}_{j_3 j_4} + A^{-1}_{j_1 j_3} A^{-1}_{j_2 j_4} + A^{-1}_{j_1 j_4} A^{-1}_{j_2 j_3} . \tag{1.104} $$
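The counting argument behind eqn. 1.103 translates directly into a recursive enumeration of contractions. The following minimal sketch (function names are illustrative) pairs the first index with each possible mate, recurses on the rest, and sums products of $A^{-1}$ entries as in eqn. 1.102. The $2\times 2$ matrix `a_inv` used in the checks is an arbitrary hypothetical choice.

```python
import math


def pairings(items):
    """Yield every way of splitting `items` (even length) into unordered pairs.

    Pair the first element with each remaining element in turn, then recurse,
    mirroring the (2k-1)(2k-3)...3.1 counting argument of eqn. 1.103.
    """
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for m in range(len(rest)):
        remaining = rest[:m] + rest[m + 1:]
        for tail in pairings(remaining):
            yield [(first, rest[m])] + tail


def wick_moment(a_inv, js):
    """<x_{j1} ... x_{j2k}> for the Gaussian of eqn. 1.100, summed over contractions."""
    total = 0.0
    for pairing in pairings(list(js)):
        term = 1.0
        for a, b in pairing:
            term *= a_inv[a][b]
        total += term
    return total


if __name__ == "__main__":
    for k in range(1, 6):
        count = sum(1 for _ in pairings(list(range(2 * k))))
        print(k, count, math.factorial(2 * k) // (2 ** k * math.factorial(k)))
```

With `a_inv = [[2.0, 0.5], [0.5, 1.0]]`, the three terms of eqn. 1.104 for indices `(0, 0, 1, 1)` are $2 \cdot 1 + 0.5^2 + 0.5^2 = 2.5$, which is what `wick_moment` returns.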

If we define $b_i = ik_i$, we have

$$ \hat P(\mathbf{k}) = \exp\big(-\tfrac{1}{2}\, k_i A^{-1}_{ij} k_j\big) , \tag{1.105} $$

from which we read off the cumulants $\langle\langle x_i x_j\rangle\rangle = A^{-1}_{ij}$, with all higher order cumulants vanishing.
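Eqn. 1.101 is easy to test numerically for $n = 2$. The sketch below assumes an arbitrary, hypothetical positive definite matrix and source vector, evaluates the Gaussian integral by a plain Riemann sum on a finite grid (the grid half-width and spacing are illustrative choices), and compares the result with $\exp(\frac12 b_i A^{-1}_{ij} b_j)$.

```python
import math


def z_of_b_numeric(A, b, half_width=8.0, step=0.05):
    """Left side of eqn. 1.101 for n = 2, by a Riemann sum on a square grid."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    prefactor = math.sqrt(det) / (2 * math.pi)  # (det A / (2 pi)^2)^(1/2)
    n_pts = int(round(2 * half_width / step)) + 1
    grid = [-half_width + i * step for i in range(n_pts)]
    total = 0.0
    for x1 in grid:
        for x2 in grid:
            quad = a11 * x1 * x1 + (a12 + a21) * x1 * x2 + a22 * x2 * x2
            total += math.exp(-0.5 * quad + b[0] * x1 + b[1] * x2)
    return prefactor * total * step * step


def z_of_b_exact(A, b):
    """Right side of eqn. 1.101: exp(b_i (A^-1)_ij b_j / 2) for a 2x2 matrix."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    inv = [[a22 / det, -a12 / det], [-a21 / det, a11 / det]]
    quad = sum(b[i] * inv[i][j] * b[j] for i in range(2) for j in range(2))
    return math.exp(0.5 * quad)
```

For example, with `A = [[2.0, 0.6], [0.6, 1.0]]` and `b = (0.3, -0.5)`, `z_of_b_numeric(A, b)` and `z_of_b_exact(A, b)` should agree to many digits, and `z_of_b_numeric(A, (0.0, 0.0))` should be close to 1, reflecting the normalization $Z(0) = 1$.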

1.6 Bayesian Statistical Inference

1.6.1 Frequentists and Bayesians

The field of statistical inference is roughly divided into two schools of practice: frequentism and Bayesianism. You can find several articles on the web discussing the differences between these two approaches.
In both cases we would like to model observable data x by a distribution. The distribution in general
depends on one or more parameters θ. The basic worldviews of the two approaches are as follows:

Frequentism: Data x are a random sample drawn from an infinite pool at some frequency. The underlying parameters θ, which are to be estimated, remain fixed during this process. There is no information prior to the model specification. The experimental conditions under which the data are collected are presumed to be controlled and repeatable. Results are generally expressed in terms of confidence intervals and confidence levels, obtained via statistical hypothesis testing. Probabilities have meaning only for data yet to be collected. Calculations generally are computationally straightforward.

Bayesianism: The only data x which matter are those which have been observed. The
parameters θ are unknown and described probabilistically using a prior distribution, which
is generally based on some available information but which also may be at least partially
subjective. The priors are then to be updated based on observed data x. Results are expressed
in terms of posterior distributions and credible intervals. Calculations can be computationally
intensive.

In essence, frequentists say the data are random and the parameters are fixed, while Bayesians say the data are fixed and the parameters are random¹¹. Overall, frequentism has dominated over the past several hundred years, but Bayesianism has been coming on strong of late, and many physicists seem naturally drawn to the Bayesian perspective.

¹¹ "A frequentist is a person whose long-run ambition is to be wrong 5% of the time. A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule." – Charles Annis.

1.6.2 Updating Bayesian priors

Given data $D$ and a hypothesis $H$, Bayes' theorem tells us

$$ P(H|D) = \frac{P(D|H)\, P(H)}{P(D)} . \tag{1.106} $$

Typically the data are in the form of a set of values $x = \{x_1,\ldots,x_N\}$, and the hypothesis in the form of a set of parameters $\theta = \{\theta_1,\ldots,\theta_K\}$. It is notationally helpful to express distributions of $x$ and distributions of $x$ conditioned on $\theta$ using the symbol $f$, and distributions of $\theta$ and distributions of $\theta$ conditioned on $x$ using the symbol $\pi$, rather than using the symbol $P$ everywhere. We then have

$$ \pi(\theta|x) = \frac{f(x|\theta)\, \pi(\theta)}{\int_\Theta d\theta'\, f(x|\theta')\, \pi(\theta')} , \tag{1.107} $$
where $\Theta \ni \theta$ is the space of parameters. Note that $\int_\Theta d\theta\, \pi(\theta|x) = 1$. The denominator of the RHS is simply $f(x)$, which is independent of $\theta$, hence $\pi(\theta|x) \propto f(x|\theta)\,\pi(\theta)$. We call $\pi(\theta)$ the prior for $\theta$, $f(x|\theta)$ the likelihood of $x$ given $\theta$, and $\pi(\theta|x)$ the posterior for $\theta$ given $x$. The idea here is that while our initial guess at the $\theta$ distribution is given by the prior $\pi(\theta)$, after taking data, we should update this distribution to the posterior $\pi(\theta|x)$. The likelihood $f(x|\theta)$ is entailed by our model for the phenomenon which produces the data. We can use the posterior to find the distribution of new data points $y$, called the posterior predictive distribution,

$$ f(y|x) = \int_\Theta d\theta\, f(y|\theta)\, \pi(\theta|x) . \tag{1.108} $$

This is the update of the prior predictive distribution,

$$ f(x) = \int_\Theta d\theta\, f(x|\theta)\, \pi(\theta) . \tag{1.109} $$

Example: coin flipping

Consider a model of coin flipping based on a standard Bernoulli distribution, where $\theta \in [0,1]$ is the probability for heads ($x = 1$) and $1-\theta$ the probability for tails ($x = 0$). That is,

$$
\begin{aligned}
f(x_1,\ldots,x_N|\theta) &= \prod_{j=1}^N \Big[(1-\theta)\,\delta_{x_j,0} + \theta\,\delta_{x_j,1}\Big] \\
&= \theta^X (1-\theta)^{N-X} ,
\end{aligned}
\tag{1.110}
$$

where $X = \sum_{j=1}^N x_j$ is the observed total number of heads, and $N-X$ the corresponding number of tails. We now need a prior $\pi(\theta)$. We choose the Beta distribution,

$$ \pi(\theta) = \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)} , \tag{1.111} $$

where $B(\alpha,\beta) = \Gamma(\alpha)\,\Gamma(\beta)/\Gamma(\alpha+\beta)$ is the Beta function. One can check that $\pi(\theta)$ is normalized on the unit interval: $\int_0^1 d\theta\, \pi(\theta) = 1$ for all positive $\alpha$, $\beta$. Even if we limit ourselves to this form of the prior, different Bayesians might bring different assumptions about the values of $\alpha$ and $\beta$. Note that if we choose $\alpha = \beta = 1$, the prior distribution for $\theta$ is flat, with $\pi(\theta) = 1$.
We now compute the posterior distribution for $\theta$:

$$ \pi(\theta|x_1,\ldots,x_N) = \frac{f(x_1,\ldots,x_N|\theta)\, \pi(\theta)}{\int_0^1 d\theta'\, f(x_1,\ldots,x_N|\theta')\, \pi(\theta')} = \frac{\theta^{X+\alpha-1}(1-\theta)^{N-X+\beta-1}}{B(X+\alpha,\, N-X+\beta)} . \tag{1.112} $$

Thus, we retain the form of the Beta distribution, but with updated parameters,

$$
\begin{aligned}
\alpha' &= X + \alpha \\
\beta' &= N - X + \beta .
\end{aligned}
\tag{1.113}
$$

In general, Bayesian updating does not preserve the functional form of the prior. We can also compute the prior predictive,

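$$
\begin{aligned}
f(x_1,\ldots,x_N) &= \int_0^1\! d\theta\, f(x_1,\ldots,x_N|\theta)\, \pi(\theta) \\
&= \frac{1}{B(\alpha,\beta)} \int_0^1\! d\theta\, \theta^{X+\alpha-1}(1-\theta)^{N-X+\beta-1} = \frac{B(X+\alpha,\, N-X+\beta)}{B(\alpha,\beta)} .
\end{aligned}
\tag{1.114}
$$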

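The posterior predictive is then

$$
\begin{aligned}
f(y_1,\ldots,y_M|x_1,\ldots,x_N) &= \int_0^1\! d\theta\, f(y_1,\ldots,y_M|\theta)\, \pi(\theta|x_1,\ldots,x_N) \\
&= \frac{1}{B(X+\alpha,\, N-X+\beta)} \int_0^1\! d\theta\, \theta^{X+Y+\alpha-1}(1-\theta)^{N-X+M-Y+\beta-1} \\
&= \frac{B(X+Y+\alpha,\, N-X+M-Y+\beta)}{B(X+\alpha,\, N-X+\beta)} ,
\end{aligned}
\tag{1.115}
$$

where $Y = \sum_{j=1}^M y_j$ is the number of heads among the new flips.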
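The Beta-Bernoulli update of eqns. 1.113 to 1.115 reduces to bookkeeping of head and tail counts. The sketch below is a minimal illustration. It works with $\ln B$ via `math.lgamma` to avoid overflow for long data sets, and the coin-flip data in the usage block are made up.

```python
import math


def log_beta(a, b):
    """ln B(a, b) = ln Gamma(a) + ln Gamma(b) - ln Gamma(a + b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_update(alpha, beta, flips):
    """Eqn. 1.113: (alpha, beta) -> (X + alpha, N - X + beta) for 0/1 flips."""
    heads = sum(flips)
    return alpha + heads, beta + len(flips) - heads


def predictive(alpha, beta, flips, new_flips):
    """Eqn. 1.115: probability of the specific sequence new_flips given flips.

    With flips = [] this reduces to the prior predictive of eqn. 1.114.
    """
    a1, b1 = beta_update(alpha, beta, flips)
    y, m = sum(new_flips), len(new_flips)
    return math.exp(log_beta(a1 + y, b1 + m - y) - log_beta(a1, b1))


if __name__ == "__main__":
    data = [1, 0, 1, 1, 0, 1, 1, 1]       # made-up data: X = 6 heads in N = 8 flips
    print(beta_update(1, 1, data))        # flat prior alpha = beta = 1 -> (7, 3)
    print(predictive(1, 1, data, [1]))    # probability that the next flip is heads
```

For a single new flip, eqn. 1.115 gives $f(y=1|x) = (X+\alpha)/(N+\alpha+\beta)$, so the last line evaluates to $7/10$ up to floating point rounding. Updating in two batches gives the same hyperparameters as updating once on the combined data.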

1.6.3 Hyperparameters and conjugate priors

In the above example, θ is a parameter of the Bernoulli distribution, i.e. the likelihood, while quantities
α and β are hyperparameters which enter the prior π(θ). Accordingly, we could have written π(θ|α, β)

for the prior. We then have for the posterior

$$ \pi(\theta|x,\alpha) = \frac{f(x|\theta)\, \pi(\theta|\alpha)}{\int_\Theta d\theta'\, f(x|\theta')\, \pi(\theta'|\alpha)} , \tag{1.116} $$

replacing eqn. 1.107, etc., where $\alpha \in \mathcal{A}$ is the vector of hyperparameters. The hyperparameters can also be distributed, according to a hyperprior $\rho(\alpha)$, and the hyperpriors can further be parameterized by hyperhyperparameters, which can have their own distributions, ad nauseam.
What use is all this? We've already seen a compelling example: when the posterior is of the same form as the prior, the Bayesian update can be viewed as an automorphism of the hyperparameter space $\mathcal{A}$, i.e. one set of hyperparameters $\alpha$ is mapped to a new set of hyperparameters $\tilde\alpha$.


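Definition: A parametric family of distributions $\mathcal{P} = \big\{\pi(\theta|\alpha) \,|\, \theta \in \Theta,\ \alpha \in \mathcal{A}\big\}$ is called a conjugate family for a family of distributions $\big\{f(x|\theta) \,|\, x \in \mathcal{X},\ \theta \in \Theta\big\}$ if, for all $x \in \mathcal{X}$ and $\alpha \in \mathcal{A}$,

$$ \pi(\theta|x,\alpha) \equiv \frac{f(x|\theta)\, \pi(\theta|\alpha)}{\int_\Theta d\theta'\, f(x|\theta')\, \pi(\theta'|\alpha)} \in \mathcal{P} . \tag{1.117} $$

That is, $\pi(\theta|x,\alpha) = \pi(\theta|\tilde\alpha)$ for some $\tilde\alpha \in \mathcal{A}$, with $\tilde\alpha = \tilde\alpha(\alpha, x)$.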

As an example, consider the conjugate Bayesian analysis of the Gaussian distribution. We assume a likelihood

$$ f(x|u,s) = (2\pi s^2)^{-N/2} \exp\bigg\{-\frac{1}{2s^2}\sum_{j=1}^N (x_j - u)^2\bigg\} . \tag{1.118} $$

The parameters here are $\theta = \{u, s\}$. Now consider the prior distribution

$$ \pi(u,s|\mu_0,\sigma_0) = (2\pi\sigma_0^2)^{-1/2} \exp\bigg\{-\frac{(u-\mu_0)^2}{2\sigma_0^2}\bigg\} . \tag{1.119} $$

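Note that the prior distribution is independent of the parameter $s$ and only depends on $u$ and the hyperparameters $\alpha = (\mu_0, \sigma_0)$. We now compute the posterior:

$$
\begin{aligned}
\pi(u,s|x,\mu_0,\sigma_0) &\propto f(x|u,s)\, \pi(u,s|\mu_0,\sigma_0) \\
&\propto \exp\bigg\{-\Big(\frac{1}{2\sigma_0^2} + \frac{N}{2s^2}\Big) u^2 + \Big(\frac{\mu_0}{\sigma_0^2} + \frac{N\langle x\rangle}{s^2}\Big) u - \Big(\frac{\mu_0^2}{2\sigma_0^2} + \frac{N\langle x^2\rangle}{2s^2}\Big)\bigg\} ,
\end{aligned}
\tag{1.120}
$$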

with $\langle x\rangle = \frac{1}{N}\sum_{j=1}^N x_j$ and $\langle x^2\rangle = \frac{1}{N}\sum_{j=1}^N x_j^2$. This is also a Gaussian distribution for $u$, and after supplying the appropriate normalization one finds

$$ \pi(u,s|x,\mu_0,\sigma_0) = (2\pi\sigma_1^2)^{-1/2} \exp\bigg\{-\frac{(u-\mu_1)^2}{2\sigma_1^2}\bigg\} , \tag{1.121} $$

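with

$$
\begin{aligned}
\mu_1 &= \mu_0 + \frac{N\big(\langle x\rangle - \mu_0\big)\,\sigma_0^2}{s^2 + N\sigma_0^2} \\
\sigma_1^2 &= \frac{s^2\,\sigma_0^2}{s^2 + N\sigma_0^2} .
\end{aligned}
\tag{1.122}
$$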

Thus, the posterior is among the same family as the prior, and we have derived the update rule for the
hyperparameters (µ0 , σ0 ) → (µ1 , σ1 ). Note that σ1 < σ0 , so the updated Gaussian prior is sharper than
the original. The updated mean µ1 shifts in the direction of hxi obtained from the data set.
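The update rule of eqn. 1.122 can be written as a small function. A useful sanity check, sketched below under the assumption that the noise width $s$ is known and fixed, is that updating on the data in two batches gives the same $(\mu_1, \sigma_1)$ as updating on all of it at once. The data values are made up for illustration.

```python
import math


def gaussian_update(mu0, sigma0, s, data):
    """Eqn. 1.122: map (mu0, sigma0) -> (mu1, sigma1) given data with known noise s."""
    n = len(data)
    mean = sum(data) / n
    denom = s ** 2 + n * sigma0 ** 2
    mu1 = mu0 + n * (mean - mu0) * sigma0 ** 2 / denom
    sigma1 = math.sqrt(s ** 2 * sigma0 ** 2 / denom)
    return mu1, sigma1


if __name__ == "__main__":
    data = [1.2, 0.7, 1.9, 1.4]  # made-up observations
    batch = gaussian_update(0.0, 2.0, 0.5, data)
    two_step = gaussian_update(*gaussian_update(0.0, 2.0, 0.5, data[:2]), 0.5, data[2:])
    print(batch, two_step)  # should agree up to rounding
```

Sequential consistency follows because the inverse variances add, $\sigma_1^{-2} = \sigma_0^{-2} + N/s^2$, which is the content of the second line of eqn. 1.122.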

1.6.4 The problem with priors

We might think that for the coin flipping problem, the flat prior $\pi(\theta) = 1$ is an appropriate initial one, since it does not privilege any value of $\theta$. This prior therefore seems 'objective' or 'unbiased', also called 'uninformative'. But suppose we make a change of variables, mapping the interval $\theta \in [0,1]$ to the entire real line according to $\zeta = \ln\big[\theta/(1-\theta)\big]$. In terms of the new parameter $\zeta$, we write the prior as $\tilde\pi(\zeta)$. Clearly $\pi(\theta)\, d\theta = \tilde\pi(\zeta)\, d\zeta$, so $\tilde\pi(\zeta) = \pi(\theta)\, d\theta/d\zeta$. For our example, we find $\tilde\pi(\zeta) = \frac{1}{4}\,{\rm sech}^2(\zeta/2)$, which is not flat. Thus what was uninformative in terms of $\theta$ has become very informative in terms of the new parameter $\zeta$. Is there any truly unbiased way of selecting a Bayesian prior?
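The Jacobian $d\theta/d\zeta$ for the change of variables $\zeta = \ln[\theta/(1-\theta)]$ can be checked by finite differences. This minimal sketch compares a central difference of the inverse map $\theta(\zeta) = 1/(1+e^{-\zeta})$ with $\frac14{\rm sech}^2(\zeta/2)$; the step size $h$ and the sample points are arbitrary choices.

```python
import math


def theta_of_zeta(zeta):
    """Inverse of zeta = ln[theta / (1 - theta)], i.e. the logistic function."""
    return 1.0 / (1.0 + math.exp(-zeta))


def flat_prior_in_zeta(zeta):
    """pi~(zeta) = pi(theta) d theta / d zeta with pi(theta) = 1."""
    return 0.25 / math.cosh(zeta / 2.0) ** 2


if __name__ == "__main__":
    h = 1e-5
    for z in (-3.0, 0.0, 1.5):
        numeric = (theta_of_zeta(z + h) - theta_of_zeta(z - h)) / (2 * h)
        print(z, numeric, flat_prior_in_zeta(z))  # Jacobian check
```

The transformed prior still integrates to 1 over the real line, but it is peaked at $\zeta = 0$ rather than flat.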
One approach, advocated by E. T. Jaynes, is to choose the prior distribution $\pi(\theta)$ according to the principle of maximum entropy. For continuous parameter spaces, we must first define a parameter space metric so as to be able to 'count' the number of different parameter states. The entropy of a distribution $\pi(\theta)$ is then dependent on this metric: $S = -\int d\mu(\theta)\, \pi(\theta) \ln \pi(\theta)$.
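Another approach, due to Jeffreys, is to derive a parameterization-independent prior from the likelihood $f(x|\theta)$ using the so-called Fisher information matrix,

$$
\begin{aligned}
I_{ij}(\theta) &= -E_\theta\bigg[\frac{\partial^2 \ln f(x|\theta)}{\partial\theta_i\, \partial\theta_j}\bigg] \\
&= -\int\! dx\; f(x|\theta)\, \frac{\partial^2 \ln f(x|\theta)}{\partial\theta_i\, \partial\theta_j} .
\end{aligned}
\tag{1.123}
$$

The Jeffreys prior $\pi_J(\theta)$ is defined as

$$ \pi_J(\theta) \propto \sqrt{\det I(\theta)} . \tag{1.124} $$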
One can check that the Jeffreys prior is invariant under reparameterization. As an example, consider the Bernoulli process, for which $\ln f(x|\theta) = X \ln\theta + (N-X)\ln(1-\theta)$, where $X = \sum_{j=1}^N x_j$. Then

$$ -\frac{d^2 \ln f(x|\theta)}{d\theta^2} = \frac{X}{\theta^2} + \frac{N-X}{(1-\theta)^2} , \tag{1.125} $$

and since $E_\theta X = N\theta$, we have

$$ I(\theta) = \frac{N}{\theta(1-\theta)} \quad\Rightarrow\quad \pi_J(\theta) = \frac{1}{\pi\sqrt{\theta(1-\theta)}} , \tag{1.126} $$
which felicitously corresponds to a Beta distribution with $\alpha = \beta = \frac{1}{2}$. In this example the Jeffreys prior turned out to be a conjugate prior, but in general this is not the case.
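For the Bernoulli likelihood, the expectation in eqn. 1.123 is a finite sum over $X \sim {\rm Binomial}(N, \theta)$, so eqn. 1.126 can be verified directly. The sketch below assumes Python 3.8+ for `math.comb`; the values of $\theta$ and $N$ in the usage line are arbitrary.

```python
import math


def fisher_info_bernoulli(theta, n):
    """Eqn. 1.123 for N Bernoulli trials: average eqn. 1.125 over X ~ Binomial(N, theta)."""
    total = 0.0
    for x in range(n + 1):
        prob = math.comb(n, x) * theta ** x * (1 - theta) ** (n - x)
        total += prob * (x / theta ** 2 + (n - x) / (1 - theta) ** 2)
    return total


def jeffreys_bernoulli(theta):
    """Eqn. 1.126: normalized Jeffreys prior 1 / (pi sqrt(theta (1 - theta)))."""
    return 1.0 / (math.pi * math.sqrt(theta * (1 - theta)))


if __name__ == "__main__":
    print(fisher_info_bernoulli(0.3, 10), 10 / (0.3 * 0.7))  # should agree
```

The normalization $1/\pi$ is $1/B(\frac12, \frac12)$, since $B(\frac12,\frac12) = \Gamma(\frac12)^2/\Gamma(1) = \pi$.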
We can try to implement the Jeffreys procedure for a two-parameter family where each $x_j$ is normally distributed with mean $\mu$ and standard deviation $\sigma$. Let the parameters be $(\theta_1, \theta_2) = (\mu, \sigma)$. Then

$$ -\ln f(x|\theta) = N \ln\sqrt{2\pi} + N \ln\sigma + \frac{1}{2\sigma^2}\sum_{j=1}^N (x_j - \mu)^2 , \tag{1.127} $$

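and the Fisher information matrix is

$$ I(\theta) = -\frac{\partial^2 \ln f(x|\theta)}{\partial\theta_i\, \partial\theta_j} = \begin{pmatrix} N\sigma^{-2} & 2\sigma^{-3}\sum_j (x_j - \mu) \\ 2\sigma^{-3}\sum_j (x_j - \mu) & -N\sigma^{-2} + 3\sigma^{-4}\sum_j (x_j - \mu)^2 \end{pmatrix} . \tag{1.128} $$

Taking the expectation value, we have $E\,(x_j - \mu) = 0$ and $E\,(x_j - \mu)^2 = \sigma^2$, hence

$$ E\, I(\theta) = \begin{pmatrix} N\sigma^{-2} & 0 \\ 0 & 2N\sigma^{-2} \end{pmatrix} \tag{1.129} $$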

and the Jeffreys prior is $\pi_J(\mu,\sigma) \propto \sigma^{-2}$. This is problematic because if we choose a flat metric on the $(\mu,\sigma)$ upper half plane, the Jeffreys prior is not normalizable. Note also that the Jeffreys prior no longer resembles a Gaussian, and hence is not a conjugate prior.
Chapter 2

Thermodynamics

2.1 References

– E. Fermi, Thermodynamics (Dover, 1956)
This outstanding and inexpensive little book is a model of clarity.

– A. H. Carter, Classical and Statistical Thermodynamics (Benjamin Cummings, 2000)
A very relaxed treatment appropriate for undergraduate physics majors.

– H. B. Callen, Thermodynamics and an Introduction to Thermostatistics (2nd edition, Wiley, 1985)
A comprehensive text appropriate for an extended course on thermodynamics.

– D. V. Schroeder, An Introduction to Thermal Physics (Addison-Wesley, 2000)
An excellent thermodynamics text appropriate for upper division undergraduates. Contains many illustrative practical applications.

– D. Kondepudi and I. Prigogine, Modern Thermodynamics: From Heat Engines to Dissipative Structures (Wiley, 1998)
Lively modern text with excellent choice of topics and good historical content. More focus on chemical and materials applications than in Callen.

– L. E. Reichl, A Modern Course in Statistical Physics (2nd edition, Wiley, 1998)
A graduate level text with an excellent and crisp section on thermodynamics.


2.2 What is Thermodynamics?

Thermodynamics is the study of relations among the state variables describing a thermodynamic system,
and of transformations of heat into work and vice versa.

2.2.1 Thermodynamic systems and state variables

Thermodynamic systems contain large numbers of constituent particles, and are described by a set of
state variables which describe the system’s properties in an average sense. State variables are classified
as being either extensive or intensive.
Extensive variables, such as volume V , particle number N , total internal energy E, magnetization M ,
etc., scale linearly with the system size, i.e. as the first power of the system volume. If we take two
identical thermodynamic systems, place them next to each other, and remove any barriers between them,
then all the extensive variables will double in size.
Intensive variables, such as the pressure $p$, the temperature $T$, the chemical potential $\mu$, the electric field $\mathbf{E}$, etc., are independent of system size, scaling as the zeroth power of the volume. They are the same throughout the system, if that system is in an appropriate state of equilibrium. The ratio of any two extensive variables is an intensive variable. For example, we write $n = N/V$ for the number density, which scales as $V^0$. Intensive variables may also be inhomogeneous. For example, $n(\mathbf{r})$ is the number density at position $\mathbf{r}$, defined as the ratio $\Delta N/\Delta V$, where $\Delta N$ is the number of particles inside a volume $\Delta V$ which contains the point $\mathbf{r}$, in the limit $V \gg \Delta V \gg V/N$.
Classically, the full motion of a system of $N$ point particles requires $6N$ variables to fully describe it ($3N$ positions and $3N$ velocities or momenta, in three space dimensions)¹. Since the constituents are very small, $N$ is typically very large. A typical solid or liquid, for example, has a mass density on the order of $\varrho \sim 1\,{\rm g/cm^3}$; for gases, $\varrho \sim 10^{-3}\,{\rm g/cm^3}$. The constituent atoms have masses of $10^0$ to $10^2$ grams per mole, where one mole of X contains $N_A$ of X, and $N_A = 6.0221415 \times 10^{23}$ is Avogadro's number². Thus, we roughly expect number densities $n$ of $10^{-2}$ to $10^0\,{\rm mol/cm^3}$ for solids and liquids, and $10^{-5}$ to $10^{-3}\,{\rm mol/cm^3}$ for gases. Clearly we are dealing with fantastically large numbers of constituent particles in a typical thermodynamic system. The underlying theoretical basis for thermodynamics, where we use a small number of state variables to describe a system, is provided by the microscopic theory of statistical mechanics, which we shall study in the weeks ahead.
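The number density estimates above amount to $n = \varrho/M$ in mol/cm³, with $M$ the molar mass, multiplied by $N_A$ to get particles per cm³. A minimal sketch of that arithmetic, evaluated at the ends of the quoted ranges (the function name is illustrative):

```python
AVOGADRO = 6.0221415e23  # N_A as quoted in the text, per mole


def number_density(mass_density_g_cm3, molar_mass_g_mol):
    """Return (n in mol/cm^3, n in particles/cm^3) from rho and the molar mass."""
    n_mol = mass_density_g_cm3 / molar_mass_g_mol
    return n_mol, n_mol * AVOGADRO


if __name__ == "__main__":
    # Corners of the quoted ranges: rho ~ 1 or 1e-3 g/cm^3, M ~ 1 to 100 g/mol
    for rho in (1.0, 1e-3):
        for molar_mass in (1.0, 100.0):
            n_mol, n_part = number_density(rho, molar_mass)
            print(f"rho={rho:g}  M={molar_mass:g}  n={n_mol:.0e} mol/cm^3 = {n_part:.1e} cm^-3")
```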
Intensive quantities such as $p$, $T$, and $n$ ultimately involve averages over both space and time. Consider for example the case of a gas enclosed in a container. We can measure the pressure (relative to atmospheric pressure) by attaching a spring to a moveable wall, as shown in Fig. 2.2. From the displacement of the spring and the value of its spring constant $k$ we determine the force $F$. This force is due to the difference in pressures, so $p = p_0 + F/A$. Microscopically, the gas consists of constituent atoms or molecules, which are constantly undergoing collisions with each other and with the walls of the container. When a particle bounces off a wall, it imparts an impulse $2\hat{\mathbf{n}}(\hat{\mathbf{n}} \cdot \mathbf{p})$, where $\mathbf{p}$ is the particle's momentum and $\hat{\mathbf{n}}$ is the unit vector normal to the wall. (Only particles with $\mathbf{p} \cdot \hat{\mathbf{n}} > 0$ will hit the wall.) Multiply this by the number of particles colliding with the wall per unit time, and one finds the net force on the wall; dividing by the area gives the pressure $p$. Within the gas, each particle travels for a distance $\ell$, called the mean free path, before it undergoes a collision. We can write $\ell = \bar{v}\tau$, where $\bar{v}$ is the average particle speed and $\tau$ is the mean free time. When we study the kinetic theory of gases, we will derive formulas for $\ell$ and $\bar{v}$ (and hence $\tau$). For now it is helpful to quote some numbers to get an idea of the relevant distance and time scales. For O₂ gas at standard temperature and pressure ($T = 0^\circ$C, $p = 1$ atm), the mean free path is $\ell \approx 1.1 \times 10^{-5}$ cm, the average speed is $\bar{v} \approx 480$ m/s, and the mean free time is $\tau \approx 2.5 \times 10^{-10}$ s. Thus, particles in the gas undergo collisions at a rate $\tau^{-1} \approx 4.0 \times 10^9\,{\rm s}^{-1}$. A measuring device, such as our spring, or a thermometer, effectively performs time and space averages. If there are $N_c$ collisions with a particular patch of wall during some time interval on which our measurement device responds, then the root mean square fluctuations in the local pressure will be on the order of $N_c^{-1/2}$ times the average. Since $N_c$ is a very large number, the fluctuations are negligible.

Figure 2.1: From microscale to macroscale: physical versus social sciences.

¹ For a system of N molecules which can freely rotate, we must then specify 3N additional orientational variables (the Euler angles) and their 3N conjugate momenta. The dimension of phase space is then 12N.
² Hence, 1 guacamole = 6.0221415 × 10²³ guacas.
If the system is in steady state, the state variables do not change with time. If furthermore there are no macroscopic currents of energy or particle number flowing through the system, the system is said to be in equilibrium. A continuous succession of equilibrium states is known as a thermodynamic path, which can be represented as a smooth curve in a multidimensional space whose axes are labeled by state variables. A thermodynamic process is any change or succession of changes which results in a change of the state variables. In a cyclic process, the initial and final states are the same. In a quasistatic process, the system passes through a continuous succession of equilibria. A reversible process is one where the external conditions and the thermodynamic path of the system can be reversed; it is both quasistatic and non-dissipative (i.e. no friction). The slow expansion of a gas against a piston head, whose counter-force is always infinitesimally less than the force $pA$ exerted by the gas, is reversible. To reverse this process, we simply add infinitesimally more force to $pA$ and the gas compresses. Examples of quasistatic processes which are not reversible include slowly dragging a block across the floor and the slow leak of air from a tire. Irreversible processes, as a rule, are dissipative. Other special processes include isothermal ($dT = 0$), isobaric ($dp = 0$), isochoric ($dV = 0$), and adiabatic ($\bar{d}Q = 0$, i.e. no heat exchange):

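$$
\begin{aligned}
\text{reversible:}\ \ & \bar{d}Q = T\,dS & \qquad \text{isothermal:}\ \ & dT = 0 \\
\text{spontaneous:}\ \ & \bar{d}Q < T\,dS & \text{isochoric:}\ \ & dV = 0 \\
\text{adiabatic:}\ \ & \bar{d}Q = 0 & \text{isobaric:}\ \ & dp = 0 .
\end{aligned}
$$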

We shall discuss later the entropy S and its connection with irreversibility.

Figure 2.2: The pressure p of a gas is due to an average over space and time of the impulses due to the
constituent particles.

How many state variables are necessary to fully specify the equilibrium state of a thermodynamic system? For a single component system, such as water which is composed of one constituent molecule, the answer is three. These can be taken to be $T$, $p$, and $V$. One always must specify at least one extensive variable, else we cannot determine the overall size of the system. For a multicomponent system with $g$ different species, we must specify $g+2$ state variables, which may be $\{T, p, N_1, \ldots, N_g\}$, where $N_a$ is the number of particles of species $a$. Another possibility is the set $\{T, p, V, x_1, \ldots, x_{g-1}\}$, where the concentration of species $a$ is $x_a = N_a/N$. Here, $N = \sum_{a=1}^g N_a$ is the total number of particles. Note that $\sum_{a=1}^g x_a = 1$.
It then follows that if we specify more than $g+2$ state variables, there must exist a relation among them. Such relations are known as equations of state. The most famous example is the ideal gas law,
Such relations are known as equations of state. The most famous example is the ideal gas law,

$$ pV = N k_B T , \tag{2.1} $$

relating the four state variables $T$, $p$, $V$, and $N$. Here $k_B = 1.3806503 \times 10^{-16}$ erg/K is Boltzmann's constant. Another example is the van der Waals equation,

$$ \bigg(p + \frac{aN^2}{V^2}\bigg)(V - bN) = N k_B T , \tag{2.2} $$

where $a$ and $b$ are constants which depend on the molecule which forms the gas. For a third example, consider a paramagnet, where

$$ \frac{M}{V} = \frac{CH}{T} , \tag{2.3} $$

where $M$ is the magnetization, $H$ the magnetic field, and $C$ the Curie constant.
Any quantity which, in equilibrium, depends only on the state variables is called a state function. For example, the total internal energy $E$ of a thermodynamic system is a state function, and we may write $E = E(T, p, V)$. State functions can also serve as state variables, although the most natural state variables are those which can be directly measured.

2.2.2 Heat

Once thought to be a type of fluid, heat is now understood in terms of the kinetic theory of gases, liquids, and solids as a form of energy stored in the disordered motion of constituent particles. The units of heat are therefore units of energy, and it is appropriate to speak of heat energy, which we shall simply abbreviate as heat:³

$$ 1\,{\rm J} = 10^7\,{\rm erg} = 6.242 \times 10^{18}\,{\rm eV} = 2.390 \times 10^{-4}\,{\rm kcal} = 9.478 \times 10^{-4}\,{\rm BTU} . \tag{2.4} $$

We will use the symbol $Q$ to denote the amount of heat energy absorbed by a system during some given thermodynamic process, and $\bar{d}Q$ to denote a differential amount of heat energy. The symbol $\bar{d}$ indicates an 'inexact differential', about which we shall have more to say presently. This means that heat is not a state function: there is no 'heat function' $Q(T, p, V)$.

2.2.3 Work

In general we will write the differential element of work $\bar{d}W$ done by the system as

$$ \bar{d}W = \sum_i F_i\, dX_i , \tag{2.5} $$

where $F_i$ is a generalized force and $dX_i$ a generalized displacement⁴. The generalized forces and displacements are themselves state variables, and by convention we will take the generalized forces to be intensive and the generalized displacements to be extensive. As an example, in a simple one-component system, we have $\bar{d}W = p\, dV$. More generally, we write

$$ \bar{d}W = \overbrace{p\, dV - \mathbf{H} \cdot d\mathbf{M} - \mathbf{E} \cdot d\mathbf{P} - \sigma\, dA + \ldots}^{-\sum_j y_j\, dX_j} \;-\; \overbrace{\big(\mu_1\, dN_1 + \mu_2\, dN_2 + \ldots\big)}^{\sum_a \mu_a\, dN_a} \tag{2.6} $$

Here we distinguish between two types of work. The first involves changes in quantities such as volume, magnetization, electric polarization, area, etc. The conjugate forces $y_i$ applied to the system are then $-p$, the magnetic field $\mathbf{H}$, the electric field $\mathbf{E}$, and the surface tension $\sigma$, respectively. The second type of work involves changes in the number of constituents of a given species. For example, energy is required in order to dissociate an H₂ molecule into two hydrogen atoms. The effect of such a process is $dN_{{\rm H}_2} = -1$ and $dN_{\rm H} = +2$.
As with heat, $\bar{d}W$ is an inexact differential, and work $W$ is not a state variable, since it is path-dependent. There is no 'work function' $W(T, p, V)$.

³ One calorie (cal) is the amount of heat needed to raise 1 g of H₂O from $T_0 = 14.5^\circ$C to $T_1 = 15.5^\circ$C at a pressure of $p_0 = 1$ atm. One British Thermal Unit (BTU) is the amount of heat needed to raise 1 lb. of H₂O from $T_0 = 63^\circ$F to $T_1 = 64^\circ$F at a pressure of $p_0 = 1$ atm.
⁴ We use the symbol $\bar{d}$ in the differential $\bar{d}W$ to indicate that this is not an exact differential. More on this in section 2.4 below.

2.2.4 Pressure and Temperature

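The units of pressure ($p$) are force per unit area. The SI unit is the Pascal (Pa): 1 Pa = 1 N/m² = 1 kg/m s². Other units of pressure we will encounter:

$$
\begin{aligned}
1\ {\rm bar} &\equiv 10^5\ {\rm Pa} \\
1\ {\rm atm} &\equiv 1.01325 \times 10^5\ {\rm Pa} \\
1\ {\rm torr} &\equiv 133.3\ {\rm Pa} .
\end{aligned}
$$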

Temperature ($T$) has a very precise definition from the point of view of statistical mechanics, as we shall see. Many physical properties depend on the temperature; such properties are called thermometric properties. For example, the resistivity of a metal $\rho(T,p)$ or the number density of a gas $n(T,p)$ are both thermometric properties, and can be used to define a temperature scale. Consider the device known as the 'constant volume gas thermometer' depicted in Fig. 2.3, in which the volume or pressure of a gas may be used to measure temperature. The gas is assumed to be in equilibrium at some pressure $p$, volume $V$, and temperature $T$. An incompressible fluid of density $\varrho$ is used to measure the pressure difference $\Delta p = p - p_0$, where $p_0$ is the ambient pressure at the top of the reservoir:

$$ p - p_0 = \varrho g (h_2 - h_1) , \tag{2.7} $$

where $g$ is the acceleration due to gravity. The height $h_1$ of the left column of fluid in the U-tube provides a measure of the change in the volume of the gas:

$$ V(h_1) = V(0) - A h_1 , \tag{2.8} $$

where $A$ is the (assumed constant) cross-sectional area of the left arm of the U-tube. The device can operate in two modes:

• Constant pressure mode: The height of the reservoir is adjusted so that the height difference $h_2 - h_1$ is held constant. This fixes the pressure $p$ of the gas. The gas volume still varies with temperature $T$, and we can define
$$ \frac{T}{T_{\rm ref}} = \frac{V}{V_{\rm ref}} , \tag{2.9} $$
where $T_{\rm ref}$ and $V_{\rm ref}$ are the reference temperature and volume, respectively.

• Constant volume mode: The height of the reservoir is adjusted so that $h_1 = 0$, hence the volume of the gas is held fixed, and the pressure varies with temperature. We then define
$$ \frac{T}{T_{\rm ref}} = \frac{p}{p_{\rm ref}} , \tag{2.10} $$
where $T_{\rm ref}$ and $p_{\rm ref}$ are the reference temperature and pressure, respectively.

What should we use for a reference? One might think that a pot of boiling water will do, but anyone who has gone camping in the mountains knows that water boils at lower temperatures at high altitude (lower pressure). This phenomenon is reflected in the phase diagram for H₂O, depicted in Fig. 2.4. There are two special points in the phase diagram, however. One is the triple point, where the solid, liquid, and vapor (gas) phases all coexist. The second is the critical point, which is the terminus of the curve separating liquid from gas. At the critical point, the latent heat of transition between liquid and gas phases vanishes (more on this later on). The triple point temperature $T_t$ is thus unique and is by definition $T_t = 273.16$ K. The pressure at the triple point is 611.7 Pa = $6.037 \times 10^{-3}$ atm.

Figure 2.3: The constant volume gas thermometer. The gas is placed in thermal contact with an object of temperature $T$. An incompressible fluid of density $\varrho$ is used to measure the pressure difference $\Delta p = p_{\rm gas} - p_0$.
A question remains: are the two modes of the thermometer compatible? E.g. if we boil water at $p = p_0 = 1$ atm, do they yield the same value for $T$? And what if we use a different gas in our measurements? In fact, all these measurements will in general be incompatible, yielding different results for the temperature $T$. However, in the limit that we use a very low density gas, all the results converge. This is because all low density gases behave as ideal gases, and obey the ideal gas equation of state $pV = N k_B T$.
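The convergence of gas thermometer readings at low density (Fig. 2.5) suggests extrapolating the readings to $n \to 0$. The sketch below is illustrative only: it assumes, as a simplification, that each gas's reading deviates from the ideal-gas value linearly in density, and the "readings" are synthetic numbers generated from that assumption, not measurements. `constant_volume_reading` implements eqn. 2.10 with the triple point as the reference temperature.

```python
T_TRIPLE = 273.16  # K; triple point of water, used here as T_ref


def constant_volume_reading(p, p_ref, t_ref=T_TRIPLE):
    """Eqn. 2.10: temperature inferred from the pressure ratio at fixed volume."""
    return t_ref * p / p_ref


def extrapolate_to_zero_density(densities, readings):
    """Fit readings = T0 + slope * density by least squares; return T0 (the n -> 0 value)."""
    k = len(densities)
    mean_n = sum(densities) / k
    mean_t = sum(readings) / k
    cov = sum((n - mean_n) * (t - mean_t) for n, t in zip(densities, readings))
    var = sum((n - mean_n) ** 2 for n in densities)
    return mean_t - (cov / var) * mean_n


if __name__ == "__main__":
    # Synthetic, hypothetical readings: each "gas" gets its own linear
    # non-ideality coefficient c, but all share the same n -> 0 limit.
    densities = [0.2, 0.4, 0.6, 0.8]  # arbitrary units
    t_true = 373.0                    # hypothetical temperature, K
    for name, c in (("gas A", 0.010), ("gas B", -0.004)):
        readings = [t_true * (1 + c * n) for n in densities]
        print(name, readings[-1], extrapolate_to_zero_density(densities, readings))
```

The two synthetic gases disagree at finite density but extrapolate to the same $T_0$, which is the behavior the text describes. Real data need not be linear in density, so a straight-line fit is only a first approximation.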

2.2.5 Standard temperature and pressure

It is customary in the physical sciences to define certain standard conditions with respect to which any
arbitrary conditions may be compared. In thermodynamics, there is a notion of standard temperature
and pressure, abbreviated STP. Unfortunately, there are two different definitions of STP currently in use,
one from the International Union of Pure and Applied Chemistry (IUPAC), and the other from the U.S.
National Institute of Standards and Technology (NIST). The two standards are:

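$$
\begin{aligned}
{\rm IUPAC}: &\quad T_0 = 0^\circ{\rm C} = 273.15\ {\rm K} , \quad p_0 = 10^5\ {\rm Pa} \\
{\rm NIST}: &\quad T_0 = 20^\circ{\rm C} = 293.15\ {\rm K} , \quad p_0 = 1\ {\rm atm} = 1.01325 \times 10^5\ {\rm Pa}
\end{aligned}
$$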

Figure 2.4: A sketch of the phase diagram of H2 O (water). Two special points are identified: the triple
point (Tt , pt ) at which there is three phase coexistence, and the critical point (Tc , pc ), where the latent
heat of transformation from liquid to gas vanishes. Not shown are transitions between several different
solid phases.

To make matters worse, in the past it was customary to define STP as $T_0 = 0^\circ$C and $p_0 = 1$ atm. We will use the NIST definition in this course. Unless I slip and use the IUPAC definition. Figuring out what I mean by STP will keep you on your toes.
The volume of one mole of ideal gas at STP is then

$$ V = \frac{N_A k_B T_0}{p_0} = \begin{cases} 22.711\ \ell & \text{(IUPAC)} \\ 24.055\ \ell & \text{(NIST)} , \end{cases} \tag{2.11} $$

where $1\ \ell = 10^3\ {\rm cm^3} = 10^{-3}\ {\rm m^3}$ is one liter. Under the old definition of STP as $T_0 = 0^\circ$C and $p_0 = 1$ atm, the volume of one mole of gas at STP is 22.414 ℓ, which is a figure I remember from my 10th grade chemistry class with Mr. Lawrence.
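Eqn. 2.11 is a one-line computation. This sketch uses the values of $k_B$ and $N_A$ quoted in this chapter (converted to SI) and reports liters. For the NIST convention ($T_0 = 293.15$ K, $p_0 = 1$ atm) the formula gives roughly 24.06 ℓ, while the IUPAC and old-STP conventions give about 22.71 ℓ and 22.41 ℓ.

```python
K_B = 1.3806503e-23  # J/K, i.e. the 1.3806503e-16 erg/K quoted in the text
N_A = 6.0221415e23   # 1/mol, as quoted in the text
ATM = 1.01325e5      # Pa


def molar_volume_liters(t0_kelvin, p0_pascal):
    """Eqn. 2.11: V = N_A k_B T0 / p0 for one mole of ideal gas, in liters."""
    v_m3 = N_A * K_B * t0_kelvin / p0_pascal
    return v_m3 * 1000.0  # 1 liter = 1e-3 m^3


if __name__ == "__main__":
    for label, t0, p0 in (("IUPAC", 273.15, 1e5),
                          ("NIST", 293.15, ATM),
                          ("old STP", 273.15, ATM)):
        print(f"{label:8s} {molar_volume_liters(t0, p0):.3f} liters")
```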

2.3 The Zeroth Law of Thermodynamics

Equilibrium is established by the exchange of energy, volume, or particle number between different systems or subsystems:

$$
\begin{aligned}
\text{energy exchange} &\implies T = \text{constant} &&\implies \text{thermal equilibrium} \\
\text{volume exchange} &\implies p/T = \text{constant} &&\implies \text{mechanical equilibrium} \\
\text{particle exchange} &\implies \mu/T = \text{constant} &&\implies \text{chemical equilibrium}
\end{aligned}
$$

Figure 2.5: As the gas density tends to zero, the readings of the constant volume gas thermometer
converge.

Equilibrium is transitive, so

If A is in equilibrium with B, and B is in equilibrium with C, then A is in equilibrium with C.

This is known as the Zeroth Law of Thermodynamics⁵.

2.4 Mathematical Interlude : Exact and Inexact Differentials

The differential

$$ dF = \sum_{i=1}^k A_i\, dx_i \tag{2.12} $$

is called exact if there is a function $F(x_1,\ldots,x_k)$ whose differential gives the right hand side of eqn. 2.12. In this case, we have

$$ A_i = \frac{\partial F}{\partial x_i} \quad\Longleftrightarrow\quad \frac{\partial A_i}{\partial x_j} = \frac{\partial A_j}{\partial x_i} \quad \forall\ i,j . \tag{2.13} $$

For exact differentials, the integral between fixed endpoints is path-independent:

$$ \int_A^B dF = F(x_1^B,\ldots,x_k^B) - F(x_1^A,\ldots,x_k^A) , \tag{2.14} $$

from which it follows that the integral of $dF$ around any closed path must vanish:

$$ \oint dF = 0 . \tag{2.15} $$

⁵ As we shall see further below, thermomechanical equilibrium in fact leads to constant $p/T$, and thermochemical equilibrium to constant $\mu/T$. If there is thermal equilibrium, then $T$ is already constant, and so thermomechanical and thermochemical equilibria then guarantee the constancy of $p$ and $\mu$.

Figure 2.6: Two distinct paths with identical endpoints.

When the cross derivatives are not identical, i.e. when $\partial A_i/\partial x_j \neq \partial A_j/\partial x_i$, the differential is inexact. In this case, the integral of dF is path dependent, and does not depend solely on the endpoints. As an example, consider the differential
$$
dF = K_1\, y\, dx + K_2\, x\, dy\,. \tag{2.16}
$$
Let’s evaluate the integral of dF , which is the work done, along each of the two paths in Fig. 2.6:

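$$
W^{(\mathrm{I})} = K_1 \int_{x_A}^{x_B} dx\, y_A + K_2 \int_{y_A}^{y_B} dy\, x_B = K_1 y_A (x_B - x_A) + K_2 x_B (y_B - y_A) \tag{2.17}
$$
$$
W^{(\mathrm{II})} = K_1 \int_{x_A}^{x_B} dx\, y_B + K_2 \int_{y_A}^{y_B} dy\, x_A = K_1 y_B (x_B - x_A) + K_2 x_A (y_B - y_A)\,. \tag{2.18}
$$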

Note that in general $W^{(\mathrm{I})} \neq W^{(\mathrm{II})}$. Thus, if we start at point A, the kinetic energy at point B will depend on the path taken, since the work done is path-dependent. The difference between the work done along the two paths is
$$
W^{(\mathrm{I})} - W^{(\mathrm{II})} = \oint dF = (K_2 - K_1)(x_B - x_A)(y_B - y_A)\,. \tag{2.19}
$$

Thus, we see that if $K_1 = K_2$, the work is the same for the two paths. In fact, if $K_1 = K_2$, the work would be path-independent, and would depend only on the endpoints. This is true for any path, and not just piecewise linear paths of the type depicted in Fig. 2.6. Thus, if $K_1 = K_2$, we are justified in using the notation dF for the differential in eqn. 2.16; explicitly, we then have $F = K_1 xy$. However, if $K_1 \neq K_2$, the differential is inexact, and we will henceforth write đF in such cases.
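To see eqn. 2.19 at work, the short Python sketch below integrates dF = K1 y dx + K2 x dy numerically along the two piecewise-linear paths of Fig. 2.6. The endpoints A = (0, 0), B = (2, 1) and the two choices of (K1, K2) are arbitrary values picked for illustration, and `work_along_path` is a hypothetical helper name, not anything from the text.

```python
def work_along_path(K1, K2, path, n=1000):
    """Line integral of dF = K1*y dx + K2*x dy along a polyline through the points in `path`.

    Each straight segment is parametrized as (x, y) = P0 + t (P1 - P0), 0 <= t <= 1,
    and integrated with the midpoint rule (exact here, since the integrand is linear in t).
    """
    total = 0.0
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        dx, dy = (x1 - x0) / n, (y1 - y0) / n
        for k in range(n):
            t = (k + 0.5) / n
            x, y = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
            total += K1 * y * dx + K2 * x * dy
    return total

xA, yA, xB, yB = 0.0, 0.0, 2.0, 1.0
path_I = [(xA, yA), (xB, yA), (xB, yB)]    # along x first, then along y
path_II = [(xA, yA), (xA, yB), (xB, yB)]   # along y first, then along x

for K1, K2 in [(1.0, 3.0), (2.0, 2.0)]:
    W_I = work_along_path(K1, K2, path_I)
    W_II = work_along_path(K1, K2, path_II)
    print(f"K1={K1}, K2={K2}: W_I={W_I:.6f}, W_II={W_II:.6f}, "
          f"difference={W_I - W_II:.6f}, eqn 2.19 gives {(K2 - K1) * (xB - xA) * (yB - yA):.6f}")
```

With K1 ≠ K2 the two paths give different answers whose difference matches eqn. 2.19; with K1 = K2 both give K1 xB yB, the value of F = K1 xy at B.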

Figure 2.7: The first law of thermodynamics is a statement of energy conservation.

2.5 The First Law of Thermodynamics

2.5.1 Conservation of energy

The first law is a statement of energy conservation, and is depicted in Fig. 2.7. It says, quite simply, that
during a thermodynamic process, the change in a system’s internal energy E is given by the heat energy
Q added to the system, minus the work W done by the system:

$$
\Delta E = Q - W\,. \tag{2.20}
$$

The differential form of this, the First Law of Thermodynamics, is
$$
dE = \text{đ}Q - \text{đ}W\,. \tag{2.21}
$$
We use the symbol đ in the differentials đQ and đW to remind us that these are inexact differentials. The energy E, however, is a state function, hence dE is an exact differential.
Consider a volume V of fluid held in a flask, initially at temperature T0 , and held at atmospheric pressure.
The internal energy is then E0 = E(T0 , p, V ). Now let us contemplate changing the temperature in two
different ways. The first method (A) is to place the flask on a hot plate until the temperature of the fluid
rises to a value T1 . The second method (B) is to stir the fluid vigorously. In the first case, we add heat
QA > 0 but no work is done, so WA = 0. In the second case, if we thermally insulate the flask and use
a stirrer of very low thermal conductivity, then no heat is added, i.e. QB = 0. However, the stirrer does
work −WB > 0 on the fluid (remember W is the work done by the system). If we end up at the same
temperature T1 , then the final energy is E1 = E(T1 , p, V ) in both cases. We then have

$$
\Delta E = E_1 - E_0 = Q_A = -W_B\,. \tag{2.22}
$$

It also follows that for any cyclic transformation, where the state variables are the same at the beginning
and the end, we have
$$
\Delta E_{\rm cyclic} = Q - W = 0 \quad\Longrightarrow\quad Q = W \quad \text{(cyclic)}\,. \tag{2.23}
$$

2.5.2 Single component systems

A single component system is specified by three state variables. In many applications, the total number
of particles N is conserved, so it is useful to take N as one of the state variables. The remaining two can

be (T, V) or (T, p) or (p, V). The differential form of the first law says
$$
\begin{aligned}
dE &= \text{đ}Q - \text{đ}W \\
&= \text{đ}Q - p\, dV + \mu\, dN\,.
\end{aligned} \tag{2.24}
$$
The quantity µ is called the chemical potential. We ask: how much heat is required in order to make an infinitesimal change in temperature, pressure, volume, or particle number? We start by rewriting eqn. 2.24 as
$$
\text{đ}Q = dE + p\, dV - \mu\, dN\,. \tag{2.25}
$$
We now must roll up our sleeves and do some work with partial derivatives.
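• (T, V, N) systems : If the state variables are (T, V, N), we write
$$
dE = \left(\frac{\partial E}{\partial T}\right)_{V,N} dT + \left(\frac{\partial E}{\partial V}\right)_{T,N} dV + \left(\frac{\partial E}{\partial N}\right)_{T,V} dN\,. \tag{2.26}
$$
Then
$$
\text{đ}Q = \left(\frac{\partial E}{\partial T}\right)_{V,N} dT + \left[\left(\frac{\partial E}{\partial V}\right)_{T,N} + p\right] dV + \left[\left(\frac{\partial E}{\partial N}\right)_{T,V} - \mu\right] dN\,. \tag{2.27}
$$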

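• (T, p, N) systems : If the state variables are (T, p, N), we write
$$
dE = \left(\frac{\partial E}{\partial T}\right)_{p,N} dT + \left(\frac{\partial E}{\partial p}\right)_{T,N} dp + \left(\frac{\partial E}{\partial N}\right)_{T,p} dN\,. \tag{2.28}
$$
We also write
$$
dV = \left(\frac{\partial V}{\partial T}\right)_{p,N} dT + \left(\frac{\partial V}{\partial p}\right)_{T,N} dp + \left(\frac{\partial V}{\partial N}\right)_{T,p} dN\,. \tag{2.29}
$$
Then
$$
\begin{aligned}
\text{đ}Q = {}& \left[\left(\frac{\partial E}{\partial T}\right)_{p,N} + p\left(\frac{\partial V}{\partial T}\right)_{p,N}\right] dT + \left[\left(\frac{\partial E}{\partial p}\right)_{T,N} + p\left(\frac{\partial V}{\partial p}\right)_{T,N}\right] dp \\
&+ \left[\left(\frac{\partial E}{\partial N}\right)_{T,p} + p\left(\frac{\partial V}{\partial N}\right)_{T,p} - \mu\right] dN\,.
\end{aligned} \tag{2.30}
$$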

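• (p, V, N) systems : If the state variables are (p, V, N), we write
$$
dE = \left(\frac{\partial E}{\partial p}\right)_{V,N} dp + \left(\frac{\partial E}{\partial V}\right)_{p,N} dV + \left(\frac{\partial E}{\partial N}\right)_{p,V} dN\,. \tag{2.31}
$$
Then
$$
\text{đ}Q = \left(\frac{\partial E}{\partial p}\right)_{V,N} dp + \left[\left(\frac{\partial E}{\partial V}\right)_{p,N} + p\right] dV + \left[\left(\frac{\partial E}{\partial N}\right)_{p,V} - \mu\right] dN\,. \tag{2.32}
$$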

SUBSTANCE          c_p (J/mol K)    c̃_p (J/g K)
Air                29.07            1.01
Aluminum           24.2             0.897
Copper             24.47            0.385
CO2                36.94            0.839
Diamond            6.115            0.509
Ethanol            112              2.44
Gold               25.42            0.129
Helium             20.786           5.193
Hydrogen           28.82            14.30
H2O (−10 °C)       38.09            2.05
H2O (25 °C)        75.34            4.181
H2O (100+ °C)      37.47            2.08
Iron               25.1             0.450
Lead               26.4             0.127
Lithium            24.8             3.58
Neon               20.786           1.03
Oxygen             29.38            0.918
Paraffin (wax)     900              2.5
Uranium            27.7             0.116
Zinc               25.3             0.387

Table 2.1: Specific heat (at 25 °C, unless otherwise noted) of some common substances. (Source: Wikipedia.)

The heat capacity of a body, C, is by definition the ratio đQ/dT of the amount of heat absorbed by the body to the associated infinitesimal change in temperature dT. The heat capacity will in general be different if the body is heated at constant volume or at constant pressure. Setting dV = 0 gives, from eqn. 2.27,
$$
C_{V,N} = \left(\frac{\text{đ}Q}{dT}\right)_{V,N} = \left(\frac{\partial E}{\partial T}\right)_{V,N}\,. \tag{2.33}
$$

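Similarly, if we set dp = 0, then eqn. 2.30 yields
$$
C_{p,N} = \left(\frac{\text{đ}Q}{dT}\right)_{p,N} = \left(\frac{\partial E}{\partial T}\right)_{p,N} + p\left(\frac{\partial V}{\partial T}\right)_{p,N}\,. \tag{2.34}
$$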

Unless explicitly stated as otherwise, we shall assume that N is fixed, and will write CV for CV,N and Cp
for Cp,N .
The units of heat capacity are energy divided by temperature, e.g. J/K. The heat capacity is an extensive quantity, scaling with the size of the system. If we divide by the number of moles ν = N/NA, we obtain the molar heat capacity, sometimes called the molar specific heat: c = C/ν. Specific heat is also sometimes quoted in units of heat capacity per gram of substance. We shall define
$$
\tilde{c} = \frac{C}{mN} = \frac{c}{M} = \frac{\text{heat capacity per mole}}{\text{mass per mole}}\,. \tag{2.35}
$$
Here m is the mass per particle and M is the mass per mole: M = NA m.
Suppose we raise the temperature of a body from T = TA to T = TB. How much heat is required? We have
$$
Q = \int_{T_A}^{T_B} dT\, C(T)\,, \tag{2.36}
$$

Figure 2.8: Heat capacity CV for one mole of hydrogen (H2 ) gas. At the lowest temperatures, only
translational degrees of freedom are relevant, and f = 3. At around 200 K, two rotational modes are
excitable and f = 5. Above 1000 K, the vibrational excitations begin to contribute. Note the logarithmic
temperature scale. (Data from H. W. Wooley et al., Jour. Natl. Bureau of Standards, 41, 379 (1948).)

where C = CV or C = Cp depending on whether volume or pressure is held constant. For ideal gases, as we shall discuss below, C(T) is constant, and thus
$$
Q = C(T_B - T_A) \quad\Longrightarrow\quad T_B = T_A + \frac{Q}{C}\,. \tag{2.37}
$$

In metals at very low temperatures one finds C = γT, where γ is a constant⁶. We then have
$$
Q = \int_{T_A}^{T_B} dT\, C(T) = \tfrac{1}{2}\gamma\left(T_B^2 - T_A^2\right) \tag{2.38}
$$
$$
T_B = \sqrt{T_A^2 + 2\gamma^{-1} Q}\,. \tag{2.39}
$$
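Eqns. 2.37 through 2.39 are easy to evaluate numerically; the Python sketch below does so and checks eqn. 2.39 against eqn. 2.38 by a round trip. The numbers (C = 2 J/K, a coefficient γ = 10⁻³ J/K², the starting temperatures and heat inputs) are illustrative assumptions, not data for any particular body or metal, and the function names are ours. Note that γ here is the low-temperature coefficient in C = γT, not the ratio c_p/c_V.

```python
import math

def final_temp_constant_C(T_A, Q, C):
    """Eqn. 2.37: final temperature when the heat capacity C (J/K) is constant."""
    return T_A + Q / C

def heat_linear_C(T_A, T_B, gamma):
    """Eqn. 2.38: heat needed when C(T) = gamma*T (gamma in J/K^2, not c_p/c_V)."""
    return 0.5 * gamma * (T_B**2 - T_A**2)

def final_temp_linear_C(T_A, Q, gamma):
    """Eqn. 2.39: final temperature when C(T) = gamma*T."""
    return math.sqrt(T_A**2 + 2.0 * Q / gamma)

# Illustrative numbers only: 10 J into a body with C = 2 J/K starting at 300 K
print(final_temp_constant_C(300.0, 10.0, 2.0))   # 305.0

# Low-temperature metal with an assumed gamma = 1e-3 J/K^2, starting at 1 K
T_B = final_temp_linear_C(1.0, 1.0e-3, 1.0e-3)
print(T_B, heat_linear_C(1.0, T_B, 1.0e-3))      # the round trip recovers Q = 1e-3 J
```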

2.5.3 Ideal gases

The ideal gas equation of state is pV = N kB T. In order to invoke the formulae in eqns. 2.27, 2.30, and 2.32, we need to know the state function E(T, V, N). A landmark experiment by Joule in the mid-19th century established that the energy of a low density gas is independent of its volume⁷. Essentially, a gas at temperature T was allowed to freely expand from one volume V to a larger volume V′ > V, with no added heat Q and no work W done. Therefore the energy cannot change. What Joule found was that the temperature also did not change. This means that E cannot depend on the volume: E(T, V, N) = E(T, N).
⁶ In most metals, the difference between CV and Cp is negligible.
⁷ See the description in E. Fermi, Thermodynamics, pp. 22-23.

Since E is extensive, we conclude that
$$
E(T, V, N) = \nu\, \varepsilon(T)\,, \tag{2.40}
$$
where ν = N/NA is the number of moles of substance. Note that ν is an extensive variable. From eqns. 2.33 and 2.34, we conclude
$$
C_V(T) = \nu\, \varepsilon'(T)\,, \qquad C_p(T) = C_V(T) + \nu R\,, \tag{2.41}
$$

where we invoke the ideal gas law to obtain the second of these. Empirically it is found that CV(T) is temperature independent over a wide range of T, far enough from the boiling point. We can then write CV = ν cV, where ν ≡ N/NA is the number of moles, and where cV is the molar heat capacity. We then have
$$
c_p = c_V + R\,, \tag{2.42}
$$
where R = NA kB = 8.31446 J/mol K is the gas constant. We denote by γ = cp/cV the ratio of the specific heats at constant pressure and at constant volume.
From the kinetic theory of gases, one can show that
$$
\begin{aligned}
&\text{monatomic gases:} && c_V = \tfrac{3}{2} R\,, && c_p = \tfrac{5}{2} R\,, && \gamma = \tfrac{5}{3} \\
&\text{diatomic gases:} && c_V = \tfrac{5}{2} R\,, && c_p = \tfrac{7}{2} R\,, && \gamma = \tfrac{7}{5} \\
&\text{polyatomic gases:} && c_V = 3R\,, && c_p = 4R\,, && \gamma = \tfrac{4}{3}\,.
\end{aligned}
$$

Digression : kinetic theory of gases

We will conclude in general from noninteracting classical statistical mechanics that the specific heat of a substance is $c_V = \tfrac{1}{2} f R$, where f is the number of phase space coordinates, per particle, for which there is a quadratic kinetic or potential energy function. For example, a point particle has three translational degrees of freedom, and the kinetic energy is a quadratic function of their conjugate momenta: $H_0 = (p_x^2 + p_y^2 + p_z^2)/2m$. Thus, f = 3. Diatomic molecules have two additional rotational degrees of freedom – we don’t count rotations about the symmetry axis – and their conjugate momenta also appear quadratically in the kinetic energy, leading to f = 5. For polyatomic molecules, all three Euler angles and their conjugate momenta are in play, and f = 6.
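The table of molar heat capacities above follows directly from c_V = ½ f R together with eqn. 2.42. A minimal Python sketch, using exact fractions so that the ratios come out as in the table (the function name is ours):

```python
from fractions import Fraction

def ideal_gas_heat_capacities(f):
    """Return (c_V/R, c_p/R, gamma) for f quadratic degrees of freedom per particle."""
    c_V = Fraction(f, 2)        # c_V = f R / 2
    c_p = c_V + 1               # c_p = c_V + R, eqn. 2.42
    return c_V, c_p, c_p / c_V  # gamma = c_p/c_V = 1 + 2/f

for label, f in [("monatomic", 3), ("diatomic", 5), ("polyatomic", 6)]:
    c_V, c_p, gamma = ideal_gas_heat_capacities(f)
    print(f"{label:10s} f={f}: c_V = {c_V} R, c_p = {c_p} R, gamma = {gamma}")
```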
The reason that f = 5 for diatomic molecules rather than f = 6 is quantum mechanical. Translational eigenstates form a continuum or, in a box, are quantized with a spacing ∆kα = 2π/Lα that is very small because the dimensions Lα are macroscopic. Angular momentum, and hence rotational kinetic energy, is also quantized, and for rotations about a principal axis with very low moment of inertia I, the corresponding energy scale ℏ²/2I is very large, so a high temperature is required in order to thermally populate these states. Thus, degrees of freedom with a quantization energy on the order of or greater than ε0 are ‘frozen out’ for temperatures T ≲ ε0/kB.
Figure 2.9: Molar heat capacities cV for three solids. The solid curves correspond to the predictions of the Debye model, which we shall discuss later.

In solids, each atom is effectively connected to its neighbors by springs; such a potential arises from quantum mechanical and electrostatic consideration of the interacting atoms. Thus, each positional degree of freedom contributes to the potential energy, and its conjugate momentum contributes to the kinetic energy. This results in f = 6. Assuming only lattice vibrations, then, the high temperature limit for cV(T) for any solid is predicted to be 3R = 24.943 J/mol K. This is called the Dulong-Petit law. The high temperature limit is reached above the so-called Debye temperature, which is roughly proportional to the melting temperature of the solid.
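The Dulong-Petit prediction can be compared directly with the elemental solids of Table 2.1. The Python sketch below does this under one stated assumption: Table 2.1 lists c_p while the law concerns c_V, and we simply treat c_p ≈ c_V, as footnote 6 suggests is reasonable for metals. Diamond, whose Debye temperature lies far above room temperature, is the expected outlier.

```python
R = 8.314462618  # gas constant, J/(mol K)

# Molar c_p at 25 C for the elemental solids in Table 2.1, J/(mol K).
# Assumption: c_p and c_V differ only slightly in solids, so we compare c_p with 3R.
solids = {"Aluminum": 24.2, "Copper": 24.47, "Gold": 25.42, "Iron": 25.1,
          "Lead": 26.4, "Lithium": 24.8, "Uranium": 27.7, "Zinc": 25.3,
          "Diamond": 6.115}

dulong_petit = 3 * R   # classical high-temperature limit, about 24.94 J/(mol K)
ratios = {name: c / dulong_petit for name, c in solids.items()}
for name, ratio in ratios.items():
    print(f"{name:9s} c_p/3R = {ratio:.3f}")
```

The metals all land within roughly ten percent of 3R at room temperature, while diamond sits at about a quarter of it.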
In Table 2.1, we list cp and c̃p for some common substances at T = 25 °C (unless otherwise noted). Note that cp for the monatomic gases He and Ne is to high accuracy given by the value from kinetic theory, $c_p = \tfrac{5}{2}R = 20.786$ J/mol K. For the diatomic gases oxygen (O2) and air (mostly N2 and O2), kinetic theory predicts $c_p = \tfrac{7}{2}R = 29.10$ J/mol K, which is close to the measured values. Kinetic theory predicts cp = 4R = 33.258 J/mol K for polyatomic gases; the measured values for CO2 and H2O (steam) are about 11% and 13% higher, respectively.
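The comparison in the preceding paragraph can be tabulated with a few lines of Python. The measured values are the molar c_p entries of Table 2.1 (the H2O row is steam just above 100 °C), and the f assigned to each gas follows the text's monatomic/diatomic/polyatomic grouping; the sketch itself is only an illustration.

```python
R = 8.314462618  # gas constant, J/(mol K)

# name: (measured molar c_p from Table 2.1 in J/(mol K), f as grouped in the text)
gases = {"Helium": (20.786, 3), "Neon": (20.786, 3),
         "Oxygen": (29.38, 5), "Air": (29.07, 5),
         "CO2": (36.94, 6), "H2O (100+ C)": (37.47, 6)}

deviation = {}
for name, (measured, f) in gases.items():
    predicted = (f / 2 + 1) * R   # c_p = c_V + R = (f/2 + 1) R
    deviation[name] = 100 * (measured - predicted) / predicted
    print(f"{name:13s} kinetic theory {predicted:6.3f}  measured {measured:6.3f}  "
          f"({deviation[name]:+.1f}%)")
```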

2.5.4 Adiabatic transformations of ideal gases

Assuming dN = 0 and E = ν ε(T), eqn. 2.27 tells us that
$$
\text{đ}Q = C_V\, dT + p\, dV\,. \tag{2.43}
$$
Invoking the ideal gas law to write p = νRT/V, and remembering CV = ν cV, we have, setting đQ = 0,
$$
\frac{dT}{T} + \frac{R}{c_V}\,\frac{dV}{V} = 0\,. \tag{2.44}
$$
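Since R/cV = (cp − cV)/cV = γ − 1, we can immediately integrate to obtain
$$
\text{đ}Q = 0 \quad\Longrightarrow\quad
\begin{cases}
T V^{\gamma-1} = \text{constant} \\
p V^{\gamma} = \text{constant} \\
T^{\gamma} p^{1-\gamma} = \text{constant}
\end{cases} \tag{2.45}
$$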

where the last two equations are obtained from the first by invoking the ideal gas law. These are all adiabatic equations of state. Note the difference between the adiabatic equation of state d(pV^γ) = 0 and the isothermal equation of state d(pV) = 0. Since γ = 1 + 2/f, we can equivalently write these three conditions as
$$
V^2 T^f = V_0^2 T_0^f\,, \qquad p^f V^{f+2} = p_0^f V_0^{f+2}\,, \qquad T^{f+2} p^{-2} = T_0^{f+2} p_0^{-2}\,. \tag{2.46}
$$
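One way to convince oneself that eqns. 2.45 and 2.46 really follow from eqn. 2.44 is to integrate eqn. 2.44 numerically and check that the claimed combinations stay constant. The Python sketch below uses a classical fourth-order Runge-Kutta step in V; the choice of one mole of a diatomic gas doubling its volume from 0.0224 m³ at 300 K is purely illustrative.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)

def adiabat_temperature(T0, V0, V1, f, steps=10_000):
    """Integrate eqn. 2.44, dT/dV = -(R/c_V) T/V, from (V0, T0) to V1 with classical RK4."""
    c_V = 0.5 * f * R

    def slope(V, T):
        return -(R / c_V) * T / V

    h = (V1 - V0) / steps
    V, T = V0, T0
    for _ in range(steps):
        k1 = slope(V, T)
        k2 = slope(V + h / 2, T + h * k1 / 2)
        k3 = slope(V + h / 2, T + h * k2 / 2)
        k4 = slope(V + h, T + h * k3)
        T += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        V += h
    return T

# One mole of a diatomic gas (f = 5) doubling its volume; the numbers are illustrative
f = 5
gamma = 1 + 2 / f
T0, V0, V1 = 300.0, 0.0224, 0.0448           # K, m^3, m^3
T1 = adiabat_temperature(T0, V0, V1, f)
p0, p1 = R * T0 / V0, R * T1 / V1             # ideal gas law with nu = 1

print(T0 * V0**(gamma - 1), T1 * V1**(gamma - 1))   # T V^(gamma-1), eqn. 2.45
print(p0 * V0**gamma, p1 * V1**gamma)               # p V^gamma, eqn. 2.45
print(V0**2 * T0**f, V1**2 * T1**f)                 # V^2 T^f, eqn. 2.46
```

Each printed pair agrees to many digits, which is the numerical content of "adiabatic equation of state".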

It turns out that air is a rather poor conductor of heat. This suggests the following model for an adiabatic atmosphere. The hydrostatic pressure decrease associated with an increase dz in height is dp = −ρg dz, where ρ is the density and g the acceleration due to gravity. Assuming the gas is ideal, the density can be written as ρ = Mp/RT, where M is the molar mass. Thus,
$$
\frac{dp}{p} = -\frac{Mg}{RT}\, dz\,. \tag{2.47}
$$

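If the height changes are adiabatic, then, from d(T^γ p^{1−γ}) = 0, we have
$$
dT = \frac{\gamma-1}{\gamma}\,\frac{T}{p}\, dp = -\frac{\gamma-1}{\gamma}\,\frac{Mg}{R}\, dz\,, \tag{2.48}
$$
with the solution
$$
T(z) = T_0 - \frac{\gamma-1}{\gamma}\,\frac{Mg}{R}\, z = \left(1 - \frac{\gamma-1}{\gamma}\,\frac{z}{\lambda}\right) T_0\,, \tag{2.49}
$$
where T0 = T(0) is the temperature at the earth’s surface, and
$$
\lambda = \frac{R T_0}{M g}\,. \tag{2.50}
$$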

With M = 28.88 g/mol and γ = 7/5 for air, and assuming T0 = 293 K, we find λ = 8.6 km, and dT/dz = −(1 − γ⁻¹) T0/λ = −9.7 K/km. Note that in this model the atmosphere ends at a height zmax = γλ/(γ − 1) = 30 km.
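The three numbers just quoted can be reproduced with a few lines of Python. The inputs are those of the text (M = 28.88 g/mol, γ = 7/5, T0 = 293 K), plus an assumed g = 9.8 m/s², which the text does not specify.

```python
R = 8.314462618   # gas constant, J/(mol K)
g = 9.8           # m/s^2 (assumed value of the gravitational acceleration)
M = 28.88e-3      # kg/mol, molar mass of air as quoted in the text
gamma = 7 / 5     # diatomic ideal gas
T0 = 293.0        # K, surface temperature

lam = R * T0 / (M * g)                  # eqn. 2.50, in meters
lapse = -(1 - 1 / gamma) * T0 / lam     # dT/dz from eqn. 2.49, in K/m
z_max = gamma * lam / (gamma - 1)       # height at which T(z) reaches zero

print(f"lambda = {lam / 1e3:.1f} km")
print(f"dT/dz  = {lapse * 1e3:.1f} K/km")
print(f"z_max  = {z_max / 1e3:.0f} km")
```

Note that the lapse rate −(1 − γ⁻¹) Mg/R does not depend on T0 at all; only λ and z_max do.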
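Again invoking the adiabatic equation of state, we can find p(z):
$$
\frac{p(z)}{p_0} = \left(\frac{T}{T_0}\right)^{\frac{\gamma}{\gamma-1}} = \left(1 - \frac{\gamma-1}{\gamma}\,\frac{z}{\lambda}\right)^{\frac{\gamma}{\gamma-1}} \tag{2.51}
$$
Recall that
$$
e^x = \lim_{k\to\infty} \left(1 + \frac{x}{k}\right)^k\,. \tag{2.52}
$$
Thus, in the limit γ → 1, where k = γ/(γ − 1) → ∞, we have p(z) = p0 exp(−z/λ). Finally, since ρ ∝ p/T from the ideal gas law, we have
$$
\frac{\rho(z)}{\rho_0} = \left(1 - \frac{\gamma-1}{\gamma}\,\frac{z}{\lambda}\right)^{\frac{1}{\gamma-1}}\,. \tag{2.53}
$$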
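The profiles 2.51 and 2.53, and the isothermal limit obtained from eqn. 2.52, can be explored with the Python sketch below. Heights are measured in units of λ; the clipping at z_max (where the model atmosphere ends) is our own choice, made so that the functions return 0 rather than a complex number above the top of the atmosphere.

```python
import math

def _base(z_over_lam, gamma):
    """T(z)/T0 = 1 - ((gamma-1)/gamma) z/lambda, eqn. 2.49, clipped at zero above z_max."""
    return max(0.0, 1 - (gamma - 1) / gamma * z_over_lam)

def pressure_ratio(z_over_lam, gamma):
    """p(z)/p0, eqn. 2.51."""
    return _base(z_over_lam, gamma) ** (gamma / (gamma - 1))

def density_ratio(z_over_lam, gamma):
    """rho(z)/rho0, eqn. 2.53."""
    return _base(z_over_lam, gamma) ** (1 / (gamma - 1))

for x in (0.5, 1.0, 2.0):   # heights in units of lambda
    print(f"z/lambda = {x}: p/p0 = {pressure_ratio(x, 1.4):.4f} (adiabatic, gamma = 7/5), "
          f"{math.exp(-x):.4f} (isothermal limit)")
```

Taking γ very close to 1 reproduces exp(−z/λ), and the ratio of pressure to density profiles returns T(z)/T0, as the ideal gas law requires.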

2.5.5 Adiabatic free expansion

Consider the situation depicted in Fig. 2.10. A quantity (ν moles) of gas in equilibrium at temperature T and volume V1 is allowed to expand freely into an evacuated chamber of volume V2 by the removal of a barrier. Clearly no work is done on or by the gas during this process, hence W = 0. If the walls are everywhere insulating, so that no heat can pass through them, then Q = 0 as well. The First Law then gives ∆E = Q − W = 0, and there is no change in energy.
If the gas is ideal, then since E(T, V, N) = ν cV T, the condition ∆E = 0 gives ∆T = 0, and there is no change in temperature. (If the walls are insulating against the passage of heat, they must also prevent the passage of particles, so ∆N = 0.) There is of course a change in volume: ∆V = V2, hence there is a change in pressure. The initial pressure is p = N kB T/V1 and the final pressure is p′ = N kB T/(V1 + V2).
If the gas is nonideal, then the temperature will in general change. Suppose E(T, V, N) = α V^x N^{1−x} T^y, where α, x, and y are constants. This form is properly extensive: if V and N double, then E doubles. If the volume changes from V to V′ under an adiabatic free expansion, then we must have, from ∆E = 0,
$$
\left(\frac{V}{V'}\right)^{x} = \left(\frac{T'}{T}\right)^{y} \quad\Longrightarrow\quad T' = T \cdot \left(\frac{V}{V'}\right)^{x/y}\,. \tag{2.54}
$$
If x/y > 0, the temperature decreases upon the expansion. If x/y < 0, the temperature increases.
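Eqn. 2.54 is simple enough to explore directly. The Python sketch below evaluates T′ for a doubling of the volume; the exponents (x, y) are hypothetical values chosen only to show the three cases x = 0 (ideal gas), x/y > 0 and x/y < 0, not parameters of any real gas.

```python
def temperature_after_free_expansion(T, V, V_new, x, y):
    """Final temperature for E = alpha V^x N^(1-x) T^y at fixed E and N (eqn. 2.54)."""
    return T * (V / V_new) ** (x / y)

# Hypothetical exponents, for illustration only
T, V, V_new = 300.0, 1.0, 2.0
for x, y in [(0.0, 1.0), (0.1, 1.0), (-0.1, 1.0)]:
    T_new = temperature_after_free_expansion(T, V, V_new, x, y)
    print(f"x/y = {x / y:+.1f}: T' = {T_new:.2f} K")
```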
Without an equation of state, we can’t say precisely what happens to the pressure, although we know on general grounds that it must decrease because, as we shall see, thermodynamic stability entails a positive isothermal compressibility: $\kappa_T = -\frac{1}{V}\left(\frac{\partial V}{\partial p}\right)_{T,N} > 0$.
Adiabatic free expansion of a gas is a spontaneous process, arising due to the natural internal dynamics of the system. It is also irreversible. If we wish to take the gas back to its original state, we must do work on it to compress it. If the gas is ideal, then the initial and final temperatures are identical, so we can place the system in thermal contact with a reservoir at temperature T and follow a thermodynamic path along an isotherm. With Vi = V1 and Vf = V1 + V2, the work 𝒲 done on the gas during compression is then
$$
\mathcal{W} = -N k_B T \int_{V_f}^{V_i} \frac{dV}{V} = N k_B T \ln\!\left(\frac{V_f}{V_i}\right) = N k_B T \ln\!\left(1 + \frac{V_2}{V_1}\right) \tag{2.55}
$$

Figure 2.10: In the adiabatic free expansion of a gas, there is volume expansion with no work or heat
exchange with the environment: ∆E = Q = W = 0.

Figure 2.11: A perfect engine would extract heat Q from a thermal reservoir at some temperature T and
convert it into useful mechanical work W . This process is alas impossible, according to the Second Law
of thermodynamics. The inverse process, where work W is converted into heat Q, is always possible.

The work done by the gas is $W = \int p\, dV = -\mathcal{W}$. Since ∆E = 0 along the isotherm, heat energy Q = W < 0 is transferred to the gas from the reservoir during the compression. Thus, $\mathcal{Q} = \mathcal{W} > 0$ is given off by the gas to its environment.
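A minimal Python sketch of eqn. 2.55, evaluated both from the closed form and by a direct midpoint-rule integration of −∫ p dV along the isotherm, for ν moles so that N kB = νR. The example (one mole at 300 K, with the evacuated chamber as large as the original volume) is an illustrative assumption; since only V2/V1 enters, the absolute volumes are arbitrary.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)

def recompression_work(nu, T, V1, V2):
    """Work done ON nu moles of ideal gas compressed isothermally from V1 + V2 back to V1, eqn. 2.55."""
    return nu * R * T * math.log(1 + V2 / V1)

def recompression_work_numeric(nu, T, V1, V2, n=100_000):
    """The same quantity, -(integral of p dV from V_f = V1 + V2 to V_i = V1), by the midpoint rule."""
    V_f, V_i = V1 + V2, V1
    h = (V_i - V_f) / n                       # negative step: the volume shrinks
    return -h * sum(nu * R * T / (V_f + (k + 0.5) * h) for k in range(n))

# One mole at 300 K, with the evacuated chamber as large as the original volume
W_on = recompression_work(1.0, 300.0, 1.0, 1.0)
print(f"{W_on:.1f} J of work done on the gas; the same amount of heat is given off")
```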

2.6 Heat Engines and the Second Law of Thermodynamics

2.6.1 There’s no free lunch so quit asking

A heat engine is a device which takes a thermodynamic system through a repeated cycle which can be
represented as a succession of equilibrium states: A → B → C · · · → A. The net result of such a cyclic
process is to convert heat into mechanical work, or vice versa.
For a system in equilibrium at temperature T , there is a thermodynamically large amount of internal
energy stored in the random internal motion of its constituent particles. Later, when we study statistical
mechanics, we will see how each ‘quadratic’ degree of freedom in the Hamiltonian contributes ½ kB T to the
total internal energy. An immense body in equilibrium at temperature T has an enormous heat capacity
C, hence extracting a finite quantity of heat Q from it results in a temperature change ∆T = −Q/C which
is utterly negligible. Such a body is called a heat bath, or thermal reservoir . A perfect engine would, in
each cycle, extract an amount of heat Q from the bath and convert it into work. Since ∆E = 0 for a
cyclic process, the First Law then gives W = Q. This situation is depicted schematically in Fig. 2.11.
One could imagine running this process virtually indefinitely, slowly sucking energy out of an immense
heat bath, converting the random thermal motion of its constituent molecules into useful mechanical
work. Sadly, this is not possible:

A transformation whose only final result is to extract heat from a source at fixed temperature
and transform that heat into work is impossible.

This is known as the Postulate of Lord Kelvin. It is equivalent to the postulate of Clausius,

A transformation whose only result is to transfer heat from a body at a given temperature to
a body at higher temperature is impossible.

These postulates, which have been repeatedly validated by empirical observations, constitute the Second Law of Thermodynamics.

2.6.2 Engines and refrigerators

While it is not possible to convert heat into work with 100% efficiency, it is possible to transfer heat from one thermal reservoir to another one, at lower temperature, and to convert some of that heat into work. This is what an engine does. The energy accounting for one cycle of the engine is depicted in the left hand panel of Fig. 2.12. An amount of heat $Q_2 > 0$ is extracted from the reservoir at temperature $T_2$. Since the reservoir is assumed to be enormous, its temperature change $\Delta T_2 = -Q_2/C_2$ is negligible, and its temperature remains constant – this is what it means for an object to be a reservoir. A lesser amount of heat, $\mathcal{Q}_1$, with $0 < \mathcal{Q}_1 < Q_2$, is deposited in a second reservoir at a lower temperature $T_1$. Its temperature change $\Delta T_1 = +\mathcal{Q}_1/C_1$ is also negligible. The difference $W = Q_2 - \mathcal{Q}_1$ is extracted as useful work. We define the efficiency, η, of the engine as the ratio of the work done to the heat extracted from the upper reservoir, per cycle:
$$
\eta = \frac{W}{Q_2} = 1 - \frac{\mathcal{Q}_1}{Q_2}\,. \tag{2.56}
$$
This is a natural definition of efficiency, since it will cost us fuel to maintain the temperature of the upper reservoir over many cycles of the engine. Thus, the efficiency is proportional to the ratio of the work done to the cost of the fuel.
A refrigerator works according to the same principles, but the process runs in reverse. An amount of heat $Q_1$ is extracted from the lower reservoir – the inside of our refrigerator – and is pumped into the upper reservoir. As Clausius’ form of the Second Law asserts, it is impossible for this to be the only result of our cycle. Some amount of work $\mathcal{W}$ must be performed on the refrigerator in order for it to extract the heat $Q_1$. Since ∆E = 0 for the cycle, a heat $\mathcal{Q}_2 = \mathcal{W} + Q_1$ must be deposited into the upper reservoir during each cycle. The analog of efficiency here is called the coefficient of refrigeration, κ, defined as
$$
\kappa = \frac{Q_1}{\mathcal{W}} = \frac{Q_1}{\mathcal{Q}_2 - Q_1}\,. \tag{2.57}
$$
Thus, κ is proportional to the ratio of the heat extracted to the cost of electricity, per cycle.
Please note the deliberate notation here. I am using symbols Q and W to denote the heat supplied to the engine (or refrigerator) and the work done by the engine, respectively, and 𝒬 and 𝒲 to denote the heat taken from the engine and the work done on the engine.
A perfect engine has $\mathcal{Q}_1 = 0$ and η = 1; a perfect refrigerator has $\mathcal{Q}_2 = Q_1$ and κ = ∞. Both violate the Second Law. Sadi Carnot⁸ (1796 – 1832) realized that a reversible cyclic engine operating between two thermal reservoirs must produce the maximum amount of work W, and that the amount of work produced is independent of the material properties of the engine. We call any such engine a Carnot engine.
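The bookkeeping behind eqns. 2.56 and 2.57 fits in two small Python functions. In this sketch `Q1_rejected` plays the role of 𝒬1 (heat the engine dumps into the cold reservoir) and `Q2_rejected` that of 𝒬2 (heat the refrigerator dumps into the hot reservoir); the function names and the per-cycle energies in the example are hypothetical, chosen only for illustration.

```python
def engine_efficiency(Q2, Q1_rejected):
    """eta = W/Q2 = 1 - Q1/Q2, eqn. 2.56, with W = Q2 - Q1 the work done per cycle."""
    W = Q2 - Q1_rejected
    return W / Q2

def refrigeration_coefficient(Q1, Q2_rejected):
    """kappa = Q1/W = Q1/(Q2 - Q1), eqn. 2.57, with W = Q2 - Q1 the work done on the refrigerator."""
    W_on = Q2_rejected - Q1
    return Q1 / W_on

# Illustrative energies per cycle, in joules
print(engine_efficiency(1000.0, 600.0))         # 0.4
print(refrigeration_coefficient(300.0, 400.0))  # 3.0
```

Setting `Q1_rejected = 0` gives η = 1, the perfect engine forbidden by the Second Law; letting `Q2_rejected` approach `Q1` sends κ to infinity, the equally forbidden perfect refrigerator.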
The efficiency of a Carnot engine may be used to define a temperature scale. We know from Carnot’s
observations that the efficiency ηC can only be a function of the temperatures T1 and T2 : ηC = ηC (T1 , T2 ).
⁸ Carnot died during the cholera epidemic of 1832. His is one of the 72 names engraved on the Eiffel Tower.

Figure 2.12: An engine (left) extracts heat Q2 from a reservoir at temperature T2 and deposits a
smaller amount of heat Q1 into a reservoir at a lower temperature T1 , during each cycle. The difference
W = Q2 − Q1 is transformed into mechanical work. A refrigerator (right) performs the inverse process,
drawing heat Q1 from a low temperature reservoir and depositing heat Q2 = Q1 + W into a high
temperature reservoir, where W is the mechanical (or electrical) work done per cycle.

We can then define
$$
\frac{T_1}{T_2} \equiv 1 - \eta_{\rm C}(T_1, T_2)\,. \tag{2.58}
$$
Below, in §2.6.4, we will see how, using an ideal gas as the ‘working substance’ of the Carnot engine, this temperature scale coincides precisely with the ideal gas temperature scale from §2.2.4.

2.6.3 Nothing beats a Carnot engine

The Carnot engine is the most efficient engine possible operating between two thermal reservoirs. To see
this, let’s suppose that an amazing wonder engine has an efficiency even greater than that of the Carnot
engine. A key feature of the Carnot engine is its reversibility – we can just go around its cycle in the
opposite direction, creating a Carnot refrigerator. Let’s use our notional wonder engine to drive a Carnot
refrigerator, as depicted in Fig. 2.13.
We assume that
$$
\frac{W}{Q_2} = \eta_{\rm wonder} > \eta_{\rm Carnot} = \frac{\mathcal{W}'}{\mathcal{Q}_2'}\,. \tag{2.59}
$$
But from the figure, we have $W = \mathcal{W}'$, and therefore the heat energy $\mathcal{Q}_2' - Q_2$ transferred to the upper reservoir is positive. From
$$
W = Q_2 - \mathcal{Q}_1 = \mathcal{Q}_2' - Q_1' = \mathcal{W}'\,, \tag{2.60}
$$
we see that this is equal to the heat energy extracted from the lower reservoir, since no external work is done on the system:
$$
\mathcal{Q}_2' - Q_2 = Q_1' - \mathcal{Q}_1 > 0\,. \tag{2.61}
$$

Figure 2.13: A wonder engine driving a Carnot refrigerator.

Therefore, the existence of the wonder engine entails a violation of the Second Law. Since the Second
Law is correct – Lord Kelvin articulated it, and who are we to argue with a Lord? – the wonder engine
cannot exist.
We further conclude that all reversible engines running between two thermal reservoirs have the same efficiency, which is the efficiency of a Carnot engine. For an irreversible engine, we must have
$$
\eta = \frac{W}{Q_2} = 1 - \frac{\mathcal{Q}_1}{Q_2} \le 1 - \frac{T_1}{T_2} = \eta_{\rm C}\,. \tag{2.62}
$$
Thus,
$$
\frac{Q_2}{T_2} - \frac{\mathcal{Q}_1}{T_1} \le 0\,. \tag{2.63}
$$

2.6.4 The Carnot cycle

Let us now consider a specific cycle, known as the Carnot cycle, depicted in Fig. 2.14. The cycle consists of two adiabats and two isotherms. The work done per cycle is simply the area inside the curve on our p–V diagram:
$$
W = \oint p\, dV\,. \tag{2.64}
$$
The gas inside our Carnot engine is called the ‘working substance’. Whatever it may be, the system obeys the First Law,
$$
dE = \text{đ}Q - \text{đ}W = \text{đ}Q - p\, dV\,. \tag{2.65}
$$

We will now assume that the working material is an ideal gas, and we compute W as well as Q1 and Q2 to find the efficiency of this cycle. In order to do this, we will rely upon the ideal gas equations,
$$
E = \frac{\nu R T}{\gamma - 1}\,, \qquad pV = \nu R T\,, \tag{2.66}
$$

Figure 2.14: The Carnot cycle consists of two adiabats (dark red) and two isotherms (blue).

where γ = cp/cV = 1 + 2/f, where f is the effective number of molecular degrees of freedom contributing to the internal energy. Recall f = 3 for monatomic gases, f = 5 for diatomic gases, and f = 6 for polyatomic gases. The finite difference form of the first law is
$$
\Delta E = E_f - E_i = Q_{if} - W_{if}\,, \tag{2.67}
$$
where i denotes the initial state and f the final state.

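AB: This stage is an isothermal expansion at temperature T2. It is the ‘power stroke’ of the engine. We have
$$
W_{AB} = \int_{V_A}^{V_B} dV\, \frac{\nu R T_2}{V} = \nu R T_2 \ln\!\left(\frac{V_B}{V_A}\right) \tag{2.68}
$$
$$
E_A = E_B = \frac{\nu R T_2}{\gamma - 1}\,, \tag{2.69}
$$
hence
$$
Q_{AB} = \Delta E_{AB} + W_{AB} = \nu R T_2 \ln\!\left(\frac{V_B}{V_A}\right). \tag{2.70}
$$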
BC: This stage is an adiabatic expansion. We have
$$
Q_{BC} = 0 \tag{2.71}
$$
$$
\Delta E_{BC} = E_C - E_B = \frac{\nu R}{\gamma - 1}\,(T_1 - T_2)\,. \tag{2.72}
$$
The energy change is negative, and the heat exchange is zero, so the engine still does some work during this stage:
$$
W_{BC} = Q_{BC} - \Delta E_{BC} = \frac{\nu R}{\gamma - 1}\,(T_2 - T_1)\,. \tag{2.73}
$$

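CD: This stage is an isothermal compression, and we may apply the analysis of the isothermal expansion, mutatis mutandis:
$$
W_{CD} = \int_{V_C}^{V_D} dV\, \frac{\nu R T_1}{V} = \nu R T_1 \ln\!\left(\frac{V_D}{V_C}\right) \tag{2.74}
$$
$$
E_C = E_D = \frac{\nu R T_1}{\gamma - 1}\,, \tag{2.75}
$$
hence
$$
Q_{CD} = \Delta E_{CD} + W_{CD} = \nu R T_1 \ln\!\left(\frac{V_D}{V_C}\right). \tag{2.76}
$$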
DA: This last stage is an adiabatic compression, and we may draw on the results from the adiabatic expansion in BC:
$$
Q_{DA} = 0 \tag{2.77}
$$
$$
\Delta E_{DA} = E_A - E_D = \frac{\nu R}{\gamma - 1}\,(T_2 - T_1)\,. \tag{2.78}
$$
The energy change is positive, and the heat exchange is zero, so work is done on the engine:
$$
W_{DA} = Q_{DA} - \Delta E_{DA} = \frac{\nu R}{\gamma - 1}\,(T_1 - T_2)\,. \tag{2.79}
$$

We now add up all the work values from the individual stages to get for the cycle
$$
\begin{aligned}
W &= W_{AB} + W_{BC} + W_{CD} + W_{DA} \\
&= \nu R T_2 \ln\!\left(\frac{V_B}{V_A}\right) + \nu R T_1 \ln\!\left(\frac{V_D}{V_C}\right).
\end{aligned} \tag{2.80}
$$
Since the process is cyclic, ∆E = 0 and therefore Q = W, which can of course be verified explicitly by computing Q = QAB + QBC + QCD + QDA. To finish up, recall the adiabatic ideal gas equation of state, d(TV^{γ−1}) = 0. This tells us that
$$
T_2 V_B^{\gamma-1} = T_1 V_C^{\gamma-1} \tag{2.81}
$$
$$
T_2 V_A^{\gamma-1} = T_1 V_D^{\gamma-1}\,. \tag{2.82}
$$
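Dividing these two equations, we find
$$
\frac{V_B}{V_A} = \frac{V_C}{V_D}\,, \tag{2.83}
$$
and therefore
$$
W = \nu R (T_2 - T_1) \ln\!\left(\frac{V_B}{V_A}\right) \tag{2.84}
$$
$$
Q_{AB} = \nu R T_2 \ln\!\left(\frac{V_B}{V_A}\right). \tag{2.85}
$$
Finally, the efficiency is given by the ratio of these two quantities:
$$
\eta = \frac{W}{Q_{AB}} = 1 - \frac{T_1}{T_2}\,. \tag{2.86}
$$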
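The stage-by-stage results 2.68 through 2.79 can be tallied numerically to confirm eqns. 2.84 and 2.86. The Python sketch below fixes V_C and V_D from the adiabats (eqns. 2.81 and 2.82) and then sums the work and heat; the choice of one mole of a diatomic gas between 300 K and 500 K with V_B = 2 V_A is illustrative only, and `carnot_cycle` is a hypothetical helper name.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)

def carnot_cycle(nu, T1, T2, V_A, V_B, gamma):
    """Work done by, and heat absorbed by, an ideal gas in each stage of the Carnot cycle."""
    # The adiabats fix V_C and V_D via T V^(gamma-1) = const, eqns. 2.81-2.82
    V_C = V_B * (T2 / T1) ** (1 / (gamma - 1))
    V_D = V_A * (T2 / T1) ** (1 / (gamma - 1))
    k = nu * R / (gamma - 1)
    W = {"AB": nu * R * T2 * math.log(V_B / V_A),   # eqn. 2.68
         "BC": k * (T2 - T1),                       # eqn. 2.73
         "CD": nu * R * T1 * math.log(V_D / V_C),   # eqn. 2.74
         "DA": k * (T1 - T2)}                       # eqn. 2.79
    Q = {"AB": W["AB"], "BC": 0.0, "CD": W["CD"], "DA": 0.0}   # eqns. 2.70, 2.71, 2.76, 2.77
    return W, Q

# One mole of a diatomic gas between T1 = 300 K and T2 = 500 K (illustrative values)
W, Q = carnot_cycle(nu=1.0, T1=300.0, T2=500.0, V_A=0.010, V_B=0.020, gamma=1.4)
W_total, Q_total = sum(W.values()), sum(Q.values())
eta = W_total / Q["AB"]
print(f"W = {W_total:.2f} J, Q = {Q_total:.2f} J, eta = {eta:.4f}")
```

The adiabatic contributions W_BC and W_DA cancel exactly, so the net work comes entirely from the two isotherms, and the efficiency depends only on T1/T2.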

Figure 2.15: A Stirling cycle consists of two isotherms (blue) and two isochores (green).

2.6.5 The Stirling cycle

Many other engine cycles are possible. The Stirling cycle, depicted in Fig. 2.15, consists of two isotherms and two isochores. Recall the isothermal ideal gas equation of state, d(pV) = 0. Thus, for an ideal gas Stirling cycle, we have
$$
p_A V_1 = p_B V_2\,, \qquad p_D V_1 = p_C V_2\,, \tag{2.87}
$$
which says
$$
\frac{p_B}{p_A} = \frac{p_C}{p_D} = \frac{V_1}{V_2}\,. \tag{2.88}
$$

AB: This isothermal expansion is the power stroke. Assuming ν moles of ideal gas throughout, we have pV = νRT2 = pA V1, hence
$$
W_{AB} = \int_{V_1}^{V_2} dV\, \frac{\nu R T_2}{V} = \nu R T_2 \ln\!\left(\frac{V_2}{V_1}\right). \tag{2.89}
$$
Since AB is an isotherm, we have EA = EB, and from ∆EAB = 0 we conclude QAB = WAB.


BC: Isochoric cooling. Since dV = 0 we have WBC = 0. The energy change is given by
$$
\Delta E_{BC} = E_C - E_B = \frac{\nu R (T_1 - T_2)}{\gamma - 1}\,, \tag{2.90}
$$
which is negative. Since WBC = 0, we have QBC = ∆EBC.

CD: Isothermal compression. Clearly
$$
W_{CD} = \int_{V_2}^{V_1} dV\, \frac{\nu R T_1}{V} = -\nu R T_1 \ln\!\left(\frac{V_2}{V_1}\right). \tag{2.91}
$$

Since CD is an isotherm, we have EC = ED , and from ∆ECD = 0 we conclude QCD = WCD .


DA: Isochoric heating. Since dV = 0 we have WDA = 0. The energy change is given by
$$
\Delta E_{DA} = E_A - E_D = \frac{\nu R (T_2 - T_1)}{\gamma - 1}\,, \tag{2.92}
$$
which is positive, and opposite to ∆EBC. Since WDA = 0, we have QDA = ∆EDA.

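We now add up all the work contributions to obtain
$$
\begin{aligned}
W &= W_{AB} + W_{BC} + W_{CD} + W_{DA} \\
&= \nu R (T_2 - T_1) \ln\!\left(\frac{V_2}{V_1}\right).
\end{aligned} \tag{2.93}
$$
The cycle efficiency is once again
$$
\eta = \frac{W}{Q_{AB}} = 1 - \frac{T_1}{T_2}\,. \tag{2.94}
$$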
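The same tally for the Stirling cycle is sketched below in Python. It follows the text's convention in eqn. 2.94 of dividing the net work by Q_AB alone, and it shows that the heat released on the isochore BC exactly balances the heat absorbed on DA. The numerical values (one mole, γ = 7/5, 300 K and 500 K, V2 = 2 V1) and the helper name are illustrative assumptions.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)

def stirling_cycle(nu, T1, T2, V1, V2, gamma):
    """Work done by, and heat absorbed by, an ideal gas in each stage of the Stirling cycle."""
    k = nu * R / (gamma - 1)
    W = {"AB": nu * R * T2 * math.log(V2 / V1),    # isothermal expansion, eqn. 2.89
         "BC": 0.0,                                # isochore
         "CD": -nu * R * T1 * math.log(V2 / V1),   # isothermal compression, eqn. 2.91
         "DA": 0.0}                                # isochore
    Q = {"AB": W["AB"], "BC": k * (T1 - T2),       # Q_BC = Delta E_BC, eqn. 2.90
         "CD": W["CD"], "DA": k * (T2 - T1)}       # Q_DA = Delta E_DA, eqn. 2.92
    return W, Q

W, Q = stirling_cycle(nu=1.0, T1=300.0, T2=500.0, V1=0.010, V2=0.020, gamma=1.4)
W_total = sum(W.values())
print(f"W = {W_total:.2f} J, eta = W/Q_AB = {W_total / Q['AB']:.4f}, "
      f"Q_BC + Q_DA = {Q['BC'] + Q['DA']:.2f} J")
```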

2.6.6 The Otto and Diesel cycles

The Otto cycle is a rough approximation to the physics of a gasoline engine. It consists of two adiabats and two isochores, and is depicted in Fig. 2.16. Assuming an ideal gas, along the adiabats we have d(pV^γ) = 0. Thus,
$$
p_A V_1^\gamma = p_B V_2^\gamma\,, \qquad p_D V_1^\gamma = p_C V_2^\gamma\,, \tag{2.95}
$$
which says
$$
\frac{p_B}{p_A} = \frac{p_C}{p_D} = \left(\frac{V_1}{V_2}\right)^{\gamma}. \tag{2.96}
$$

AB: Adiabatic expansion, the power stroke. The heat transfer is QAB = 0, so from the First Law we have WAB = −∆EAB = EA − EB, thus
$$
W_{AB} = \frac{p_A V_1 - p_B V_2}{\gamma - 1} = \frac{p_A V_1}{\gamma - 1}\left[1 - \left(\frac{V_1}{V_2}\right)^{\gamma-1}\right]. \tag{2.97}
$$

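Note that this result can also be obtained from the adiabatic equation of state pV^γ = pA V1^γ:
$$
W_{AB} = \int_{V_1}^{V_2} p\, dV = p_A V_1^\gamma \int_{V_1}^{V_2} dV\, V^{-\gamma} = \frac{p_A V_1}{\gamma - 1}\left[1 - \left(\frac{V_1}{V_2}\right)^{\gamma-1}\right]. \tag{2.98}
$$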

BC: Isochoric cooling (exhaust); dV = 0 hence WBC = 0. The heat QBC absorbed is then
$$
Q_{BC} = E_C - E_B = \frac{V_2}{\gamma - 1}\,(p_C - p_B)\,. \tag{2.99}
$$
In a realistic engine, this is the stage in which the old burned gas is ejected and new gas is inserted.

Figure 2.16: An Otto cycle consists of two adiabats (dark red) and two isochores (green).

Figure 2.17: A Diesel cycle consists of two adiabats (dark red), one isobar (light blue), and one isochore
(green).

CD: Adiabatic compression; QCD = 0 and WCD = EC − ED:
$$
W_{CD} = \frac{p_C V_2 - p_D V_1}{\gamma - 1} = -\frac{p_D V_1}{\gamma - 1}\left[1 - \left(\frac{V_1}{V_2}\right)^{\gamma-1}\right]. \tag{2.100}
$$

DA: Isochoric heating, i.e. the combustion of the gas. As with BC we have dV = 0, and thus WDA = 0. The heat QDA absorbed by the gas is then
$$
Q_{DA} = E_A - E_D = \frac{V_1}{\gamma - 1}\,(p_A - p_D)\,. \tag{2.101}
$$

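The total work done per cycle is then
$$
\begin{aligned}
W &= W_{AB} + W_{BC} + W_{CD} + W_{DA} \\
&= \frac{(p_A - p_D) V_1}{\gamma - 1}\left[1 - \left(\frac{V_1}{V_2}\right)^{\gamma-1}\right],
\end{aligned} \tag{2.102}
$$
and the efficiency is defined to be
$$
\eta \equiv \frac{W}{Q_{DA}} = 1 - \left(\frac{V_1}{V_2}\right)^{\gamma-1}. \tag{2.103}
$$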
The ratio V2 /V1 is called the compression ratio. We can make our Otto cycle more efficient simply by
increasing the compression ratio. The problem with this scheme is that if the fuel mixture becomes too
hot, it will spontaneously ‘preignite’, and the pressure will jump up before point D in the cycle is reached.
A Diesel engine avoids preignition by compressing the air only, and then later spraying the fuel into the
cylinder when the air temperature is sufficient for fuel ignition. The rate at which fuel is injected is
adjusted so that the ignition process takes place at constant pressure. Thus, in a Diesel engine, step DA
is an isobar. The compression ratio is r ≡ VB /VD , and the cutoff ratio is s ≡ VA /VD . This refinement of
the Otto cycle allows for higher compression ratios (of about 20) in practice, and greater engine efficiency.
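For the Diesel cycle, we have, briefly,
$$
\begin{aligned}
W &= p_A (V_A - V_D) + \frac{p_A V_A - p_B V_B}{\gamma - 1} + \frac{p_C V_C - p_D V_D}{\gamma - 1} \\
&= \frac{\gamma\, p_A (V_A - V_D)}{\gamma - 1} - \frac{(p_B - p_C) V_B}{\gamma - 1}
\end{aligned} \tag{2.104}
$$
and
$$
Q_{DA} = \frac{\gamma\, p_A (V_A - V_D)}{\gamma - 1}\,. \tag{2.105}
$$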
To find the efficiency, we will need to eliminate pB and pC in favor of pA using the adiabatic equation of state d(pV^γ) = 0. Thus,
$$
p_B = p_A \cdot \left(\frac{V_A}{V_B}\right)^{\gamma}, \qquad p_C = p_A \cdot \left(\frac{V_D}{V_B}\right)^{\gamma}, \tag{2.106}
$$
where we’ve used pD = pA and VC = VB. Putting it all together, the efficiency of the Diesel cycle is
$$
\eta = \frac{W}{Q_{DA}} = 1 - \frac{1}{\gamma}\,\frac{r^{1-\gamma}\,(s^\gamma - 1)}{s - 1}\,. \tag{2.107}
$$
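Eqns. 2.103 and 2.107 are compared in the Python sketch below. It treats air as an ideal diatomic gas (γ = 7/5) and uses a cutoff ratio s = 2 purely for illustration. In both cycles the compression ratio is the ratio of the largest to the smallest volume. As s → 1 the isobaric step shrinks to nothing and the Diesel formula reduces to the Otto one; for s > 1 the factor (s^γ − 1)/(γ(s − 1)) exceeds 1, so at equal compression ratio the Diesel cycle is slightly less efficient. Its practical advantage is that it tolerates much larger r.

```python
def otto_efficiency(r, gamma):
    """Eqn. 2.103, with compression ratio r = V2/V1."""
    return 1 - r ** (1 - gamma)

def diesel_efficiency(r, s, gamma):
    """Eqn. 2.107, with compression ratio r = V_B/V_D and cutoff ratio s = V_A/V_D > 1."""
    return 1 - r ** (1 - gamma) * (s ** gamma - 1) / (gamma * (s - 1))

gamma = 1.4   # air treated as a diatomic ideal gas
for r in (8, 10, 20):
    print(f"r = {r:2d}: Otto {otto_efficiency(r, gamma):.3f}, "
          f"Diesel (s = 2) {diesel_efficiency(r, 2.0, gamma):.3f}")
```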

2.6.7 The Joule-Brayton cycle

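Our final example is the Joule-Brayton cycle, depicted in Fig. 2.18, consisting of two adiabats and two isobars. Along the adiabats we have d(pV^γ) = 0. Thus,
$$
p_2 V_A^\gamma = p_1 V_D^\gamma\,, \qquad p_2 V_B^\gamma = p_1 V_C^\gamma\,, \tag{2.108}
$$
which says
$$
\frac{V_D}{V_A} = \frac{V_C}{V_B} = \left(\frac{p_2}{p_1}\right)^{\gamma^{-1}}. \tag{2.109}
$$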
2.6. HEAT ENGINES AND THE SECOND LAW OF THERMODYNAMICS 57

Figure 2.18: A Joule-Brayton cycle consists of two adiabats (dark red) and two isobars (light blue).

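AB: This isobaric expansion at $p = p_2$ is the power stroke. We have

$$
W_{AB} = \int_{V_A}^{V_B} \! dV \, p_2 = p_2 (V_B - V_A) \tag{2.110}
$$

$$
\Delta E_{AB} = E_B - E_A = \frac{p_2 (V_B - V_A)}{\gamma - 1} \tag{2.111}
$$

$$
Q_{AB} = \Delta E_{AB} + W_{AB} = \frac{\gamma\, p_2 (V_B - V_A)}{\gamma - 1} . \tag{2.112}
$$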

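BC: Adiabatic expansion; $Q_{BC} = 0$ and $W_{BC} = E_B - E_C$. The work done by the gas is

$$
\begin{aligned}
W_{BC} &= \frac{p_2 V_B - p_1 V_C}{\gamma - 1} = \frac{p_2 V_B}{\gamma - 1}\left(1 - \frac{p_1 V_C}{p_2 V_B}\right) \\
&= \frac{p_2 V_B}{\gamma - 1}\left[1 - \left(\frac{p_1}{p_2}\right)^{1 - \gamma^{-1}}\right] .
\end{aligned} \tag{2.113}
$$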

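CD: Isobaric compression at $p = p_1$.

$$
W_{CD} = \int_{V_C}^{V_D} \! dV \, p_1 = p_1 (V_D - V_C) = -p_2 (V_B - V_A)\left(\frac{p_1}{p_2}\right)^{1 - \gamma^{-1}} \tag{2.114}
$$

$$
\Delta E_{CD} = E_D - E_C = \frac{p_1 (V_D - V_C)}{\gamma - 1} \tag{2.115}
$$

$$
Q_{CD} = \Delta E_{CD} + W_{CD} = -\frac{\gamma\, p_2}{\gamma - 1}\,(V_B - V_A)\left(\frac{p_1}{p_2}\right)^{1 - \gamma^{-1}} . \tag{2.116}
$$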

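DA: Adiabatic compression; $Q_{DA} = 0$ and $W_{DA} = E_D - E_A$. The work done by the gas is

$$
\begin{aligned}
W_{DA} &= \frac{p_1 V_D - p_2 V_A}{\gamma - 1} = -\frac{p_2 V_A}{\gamma - 1}\left(1 - \frac{p_1 V_D}{p_2 V_A}\right) \\
&= -\frac{p_2 V_A}{\gamma - 1}\left[1 - \left(\frac{p_1}{p_2}\right)^{1 - \gamma^{-1}}\right] .
\end{aligned} \tag{2.117}
$$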

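The total work done per cycle is then

$$
W = W_{AB} + W_{BC} + W_{CD} + W_{DA} = \frac{\gamma\, p_2 (V_B - V_A)}{\gamma - 1}\left[1 - \left(\frac{p_1}{p_2}\right)^{1 - \gamma^{-1}}\right] \tag{2.118}
$$

and the efficiency is defined to be

$$
\eta \equiv \frac{W}{Q_{AB}} = 1 - \left(\frac{p_1}{p_2}\right)^{1 - \gamma^{-1}} . \tag{2.119}
$$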
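As a check on the algebra leading to eqns. 2.118 and 2.119, the sketch below adds up the four stroke works of eqns. 2.110, 2.113, 2.114 and 2.117 for one set of state variables and compares $W/Q_{AB}$ with the closed form. The pressures, volumes and $\gamma = 1.4$ are hypothetical illustration values in arbitrary consistent units.

```python
def brayton_work_and_heat(p1, p2, V_A, V_B, gamma=1.4):
    """Return (W, Q_AB) for the cycle of Fig. 2.18, in arbitrary consistent units.

    A -> B: isobar at p2; B -> C: adiabat; C -> D: isobar at p1; D -> A: adiabat.
    """
    k = (p2 / p1) ** (1.0 / gamma)                   # eqn. 2.109
    V_C, V_D = k * V_B, k * V_A
    W_AB = p2 * (V_B - V_A)                          # eqn. 2.110
    W_BC = (p2 * V_B - p1 * V_C) / (gamma - 1.0)     # eqn. 2.113
    W_CD = p1 * (V_D - V_C)                          # eqn. 2.114
    W_DA = (p1 * V_D - p2 * V_A) / (gamma - 1.0)     # eqn. 2.117
    Q_AB = gamma * p2 * (V_B - V_A) / (gamma - 1.0)  # eqn. 2.112
    return W_AB + W_BC + W_CD + W_DA, Q_AB


gamma = 1.4
W, Q_AB = brayton_work_and_heat(p1=1.0, p2=10.0, V_A=1.0, V_B=2.0, gamma=gamma)
eta_closed_form = 1.0 - (1.0 / 10.0) ** (1.0 - 1.0 / gamma)   # eqn. 2.119
print(W / Q_AB, eta_closed_form)
```

Note that the efficiency depends only on the pressure ratio $p_2/p_1$, not on the volumes chosen for the power stroke.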

2.6.8 Carnot engine at maximum power output

While the Carnot engine described above in §2.6.4 has maximum efficiency, it is practically useless,
because the isothermal processes must take place infinitely slowly in order for the working material to
remain in thermal equilibrium with each reservoir. Thus, while the work done per cycle is finite, the cycle
period is infinite, and the engine power is zero.
A modification of the ideal Carnot cycle is necessary to create a practical engine. The idea⁹ is as follows. During the isothermal expansion stage, the working material is maintained at a temperature $T_2^w < T_2$. The temperature difference between the working material and the hot reservoir drives a thermal current,

$$
\frac{\bar{d}Q_2}{dt} = \kappa_2 \,(T_2 - T_2^w) . \tag{2.120}
$$

Here, $\kappa_2$ is a transport coefficient which describes the thermal conductivity of the chamber walls, multiplied by a geometric parameter (the ratio of the total wall area to its thickness). Similarly, during the isothermal compression, the working material is maintained at a temperature $T_1^w > T_1$, which drives a thermal current to the cold reservoir,

$$
\frac{\bar{d}Q_1}{dt} = \kappa_1 \,(T_1^w - T_1) . \tag{2.121}
$$
Now let us assume that the upper isothermal stage requires a duration $\Delta t_2$ and the lower isotherm a duration $\Delta t_1$. Then

$$
Q_2 = \kappa_2\, \Delta t_2\, (T_2 - T_2^w) \tag{2.122}
$$

$$
Q_1 = \kappa_1\, \Delta t_1\, (T_1^w - T_1) . \tag{2.123}
$$

⁹ See F. L. Curzon and B. Ahlborn, Am. J. Phys. 43, 22 (1975). I am grateful to Professor Asle Sudbø for correcting a typo in one expression and providing a simplified form of another.

| Power source | $T_1$ (°C) | $T_2$ (°C) | $\eta_{\rm Carnot}$ | $\eta$ (theor.) | $\eta$ (obs.) |
|---|---|---|---|---|---|
| West Thurrock (UK) coal-fired steam plant | ∼25 | 565 | 0.641 | 0.40 | 0.36 |
| CANDU (Canada) PHW nuclear reactor | ∼25 | 300 | 0.480 | 0.28 | 0.30 |
| Larderello (Italy) geothermal steam plant | ∼80 | 250 | 0.323 | 0.175 | 0.16 |

Table 2.2: Observed performances of real heat engines, taken from Table 1 of Curzon and Ahlborn (1975).

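Since the engine is reversible, we must have

$$
\frac{Q_1}{T_1^w} = \frac{Q_2}{T_2^w} , \tag{2.124}
$$

which says

$$
\frac{\Delta t_1}{\Delta t_2} = \frac{\kappa_2\, T_1^w\, (T_2 - T_2^w)}{\kappa_1\, T_2^w\, (T_1^w - T_1)} . \tag{2.125}
$$

The power is

$$
P = \frac{Q_2 - Q_1}{(1 + \alpha)(\Delta t_1 + \Delta t_2)} , \tag{2.126}
$$

where we assume that the adiabatic stages require a combined time of $\alpha\,(\Delta t_1 + \Delta t_2)$. Thus, we find

$$
P = \frac{\kappa_1 \kappa_2}{1 + \alpha} \cdot \frac{(T_2^w - T_1^w)(T_1^w - T_1)(T_2 - T_2^w)}{\kappa_1 T_2^w (T_1^w - T_1) + \kappa_2 T_1^w (T_2 - T_2^w)} . \tag{2.127}
$$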

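We optimize the engine by maximizing $P$ with respect to the temperatures $T_1^w$ and $T_2^w$. This yields

$$
T_2^w = T_2 - \frac{T_2 - \sqrt{T_1 T_2}}{1 + \sqrt{\kappa_2/\kappa_1}} \tag{2.128}
$$

$$
T_1^w = T_1 + \frac{\sqrt{T_1 T_2} - T_1}{1 + \sqrt{\kappa_1/\kappa_2}} . \tag{2.129}
$$

The efficiency at maximum power is then

$$
\eta = \frac{Q_2 - Q_1}{Q_2} = 1 - \frac{T_1^w}{T_2^w} = 1 - \sqrt{\frac{T_1}{T_2}} . \tag{2.130}
$$

One also finds at maximum power

$$
\frac{\Delta t_2}{\Delta t_1} = \sqrt{\frac{\kappa_1}{\kappa_2}} . \tag{2.131}
$$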

Finally, the maximized power is

$$
P_{\rm max} = \frac{\kappa_1 \kappa_2}{1 + \alpha}\left(\frac{\sqrt{T_2} - \sqrt{T_1}}{\sqrt{\kappa_1} + \sqrt{\kappa_2}}\right)^{2} . \tag{2.132}
$$
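One way to convince oneself of eqns. 2.128–2.132 is to maximize eqn. 2.127 by brute force and compare. The sketch below is a plain grid search, not the analytic method of Curzon and Ahlborn; the reservoir temperatures, conductances $\kappa_{1,2}$ (in W/K) and $\alpha = 0$ are hypothetical choices, and the function names are ours.

```python
import math


def power(t1w, t2w, T1, T2, k1, k2, alpha=0.0):
    """Eqn. 2.127: power of the finite-rate Carnot engine."""
    num = k1 * k2 * (t2w - t1w) * (t1w - T1) * (T2 - t2w)
    den = k1 * t2w * (t1w - T1) + k2 * t1w * (T2 - t2w)
    return num / ((1.0 + alpha) * den)


def optimal_working_temperatures(T1, T2, k1, k2):
    """Eqns. 2.128 and 2.129."""
    g = math.sqrt(T1 * T2)
    t2w = T2 - (T2 - g) / (1.0 + math.sqrt(k2 / k1))
    t1w = T1 + (g - T1) / (1.0 + math.sqrt(k1 / k2))
    return t1w, t2w


def max_power(T1, T2, k1, k2, alpha=0.0):
    """Eqn. 2.132."""
    ratio = (math.sqrt(T2) - math.sqrt(T1)) / (math.sqrt(k1) + math.sqrt(k2))
    return k1 * k2 / (1.0 + alpha) * ratio ** 2


def brute_force_max(T1, T2, k1, k2, n=300):
    """Grid search over T1 < t1w < t2w < T2; returns (P, t1w, t2w)."""
    best = (-math.inf, None, None)
    for i in range(1, n):
        t1w = T1 + (T2 - T1) * i / n
        for j in range(1, n):
            t2w = t1w + (T2 - t1w) * j / n
            P = power(t1w, t2w, T1, T2, k1, k2)
            if P > best[0]:
                best = (P, t1w, t2w)
    return best


T1, T2, k1, k2 = 300.0, 600.0, 1.0, 2.0
t1w, t2w = optimal_working_temperatures(T1, T2, k1, k2)
P_grid, _, _ = brute_force_max(T1, T2, k1, k2)
print(power(t1w, t2w, T1, T2, k1, k2), max_power(T1, T2, k1, k2), P_grid)
print(1.0 - t1w / t2w, 1.0 - math.sqrt(T1 / T2))   # eqn. 2.130
```

The grid maximum approaches $P_{\rm max}$ from below as the grid is refined, and the efficiency at the analytic optimum reproduces eqn. 2.130 independently of $\kappa_1/\kappa_2$.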

Table 2.2, taken from the article of Curzon and Ahlborn (1975), shows how the efficiency of this practical Carnot cycle, given by eqn. 2.130, rather accurately predicts the efficiencies of functioning power plants.
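The "theor." column of Table 2.2 can be reproduced from the listed temperatures with eqn. 2.130. The sketch below converts °C to kelvin with 273.15; because the table's cold-side temperatures are only approximate (∼25 °C, ∼80 °C), agreement is expected only to about the second decimal place.

```python
import math

# (T1, T2) in degrees Celsius, as listed in Table 2.2
plants = {
    "West Thurrock (coal)": (25.0, 565.0),
    "CANDU (nuclear)": (25.0, 300.0),
    "Larderello (geothermal)": (80.0, 250.0),
}


def efficiencies(t1_celsius, t2_celsius):
    """Return (Carnot efficiency, efficiency at maximum power from eqn. 2.130)."""
    T1, T2 = t1_celsius + 273.15, t2_celsius + 273.15   # convert to kelvin
    return 1.0 - T1 / T2, 1.0 - math.sqrt(T1 / T2)


results = {name: efficiencies(*temps) for name, temps in plants.items()}
for name, (eta_carnot, eta_max_power) in results.items():
    print(f"{name:24s} Carnot {eta_carnot:.3f}   max-power {eta_max_power:.3f}")
```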

2.7 The Entropy

2.7.1 Entropy and heat

The Second Law guarantees us that an engine operating between two heat baths at temperatures $T_1$ and $T_2$ must satisfy

$$
\frac{\mathcal{Q}_1}{T_1} + \frac{Q_2}{T_2} \le 0 , \tag{2.133}
$$

with the equality holding for reversible processes. This is a restatement of eqn. 2.63, after writing $\mathcal{Q}_1 = -Q_1$ for the heat transferred to the engine from reservoir #1. Consider now an arbitrary closed curve in the $p$-$V$ plane. We can describe such a curve, to arbitrary accuracy, as a combination of Carnot cycles, as shown in Fig. 2.19. Each little Carnot cycle consists of two adiabats and two isotherms. We then conclude

$$
\sum_i \frac{\mathcal{Q}_i}{T_i} \longrightarrow \oint_{\mathcal{C}} \frac{\bar{d}Q}{T} \le 0 , \tag{2.134}
$$

with equality holding if all the cycles are reversible. Rudolf Clausius, in 1865, realized that one could then define a new state function, which he called the entropy, $S$, that depended only on the initial and final states of a reversible process:

$$
dS = \frac{\bar{d}Q}{T} \quad \Longrightarrow \quad S_B - S_A = \int_A^B \frac{\bar{d}Q}{T} . \tag{2.135}
$$

Since Q is extensive, so is S; the units of entropy are [S] = J/K.

2.7.2 The Third Law of Thermodynamics

Eqn. 2.135 determines the entropy up to a constant. By choosing a standard state Υ, we can define
SΥ = 0, and then by taking A = Υ in the above equation, we can define the absolute entropy S for
any state. However, it turns out that this seemingly arbitrary constant SΥ in the entropy does have
consequences, for example in the theory of gaseous equilibrium. The proper definition of entropy, from
the point of view of statistical mechanics, will lead us to understand how the zero temperature entropy
of a system is related to its quantum mechanical ground state degeneracy. Walther Nernst, in 1906,
articulated a principle which is sometimes called the Third Law of Thermodynamics,

The entropy of every system at absolute zero temperature always vanishes.

Again, this is not quite correct, and quantum mechanics tells us that S(T = 0) = kB ln g, where g is the
ground state degeneracy. Nernst’s law holds when g = 1.
We can combine the First and Second laws to write

$$
dE + \bar{d}W = \bar{d}Q \le T\, dS , \tag{2.136}
$$

where the equality holds for reversible processes.

2.7.3 Entropy changes in cyclic processes

For a cyclic process, whether reversible or not, the change in entropy around a cycle is zero: ∆SCYC = 0.
This is because the entropy S is a state function, with a unique value for every equilibrium state. A
cyclical process returns to the same equilibrium state, hence S must return as well to its corresponding
value from the previous cycle.
Consider now a general engine, as in Fig. 2.12. Let us compute the total entropy change in the entire
Universe over one cycle. We have

(∆S)TOTAL = (∆S)ENGINE + (∆S)HOT + (∆S)COLD , (2.137)

written as a sum over entropy changes of the engine itself, the hot reservoir, and the cold reservoir¹⁰.

¹⁰ We neglect any interfacial contributions to the entropy change, which will be small compared with the bulk entropy change in the thermodynamic limit of large system size.

Figure 2.19: An arbitrarily shaped cycle in the $p$-$V$ plane can be decomposed into a number of smaller Carnot cycles. Red curves indicate isotherms and blue curves adiabats, with $\gamma = \frac{5}{3}$.

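Clearly $(\Delta S)_{\rm ENGINE} = 0$. The changes in the reservoir entropies are

$$
(\Delta S)_{\rm HOT} = \int\limits_{T = T_2} \frac{\bar{d}Q_{\rm HOT}}{T} = -\frac{Q_2}{T_2} < 0 \tag{2.138}
$$

$$
(\Delta S)_{\rm COLD} = \int\limits_{T = T_1} \frac{\bar{d}Q_{\rm COLD}}{T} = \frac{Q_1}{T_1} = -\frac{\mathcal{Q}_1}{T_1} > 0 , \tag{2.139}
$$

because the hot reservoir loses heat $Q_2 > 0$ to the engine, and the cold reservoir gains heat $Q_1 = -\mathcal{Q}_1 > 0$ from the engine. Therefore,

$$
(\Delta S)_{\rm TOTAL} = -\left(\frac{\mathcal{Q}_1}{T_1} + \frac{Q_2}{T_2}\right) \ge 0 . \tag{2.140}
$$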
Thus, for a reversible cycle, the net change in the total entropy of the engine plus reservoirs is zero. For
an irreversible cycle, there is an increase in total entropy, due to spontaneous processes.
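The bookkeeping of eqns. 2.137–2.140 is simple enough to tabulate. In the sketch below (hypothetical reservoir temperatures and heat input, in K and J), an engine delivering the Carnot work leaves the total entropy unchanged, one delivering less work increases it, and one claiming more work would decrease it, which the Second Law forbids.

```python
def reservoir_entropy_change(Q2, W, T1, T2):
    """Total entropy change per cycle, eqns. 2.137-2.140.

    Q2 is the heat drawn from the hot reservoir at T2 and W the work output.
    The engine returns to its initial state, so (dS)_ENGINE = 0, and by the
    First Law the cold reservoir at T1 receives Q1 = Q2 - W.
    """
    Q1 = Q2 - W
    dS_hot = -Q2 / T2      # eqn. 2.138
    dS_cold = Q1 / T1      # eqn. 2.139
    return dS_hot + dS_cold


T1, T2, Q2 = 300.0, 600.0, 1000.0      # hypothetical values: K, K, J
W_carnot = Q2 * (1.0 - T1 / T2)
dS_reversible = reservoir_entropy_change(Q2, W_carnot, T1, T2)
dS_irreversible = reservoir_entropy_change(Q2, 0.8 * W_carnot, T1, T2)
dS_forbidden = reservoir_entropy_change(Q2, 1.2 * W_carnot, T1, T2)
print(dS_reversible, dS_irreversible, dS_forbidden)
```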

2.7.4 Gibbs-Duhem relation

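Recall eqn. 2.6:

$$
\bar{d}W = -\sum_j y_j\, dX_j - \sum_a \mu_a\, dN_a . \tag{2.141}
$$

For reversible systems, we can therefore write

$$
dE = T\, dS + \sum_j y_j\, dX_j + \sum_a \mu_a\, dN_a . \tag{2.142}
$$

This says that the energy $E$ is a function of the entropy $S$, the generalized displacements $\{X_j\}$, and the particle numbers $\{N_a\}$:

$$
E = E\big(S, \{X_j\}, \{N_a\}\big) . \tag{2.143}
$$

Furthermore, we have

$$
T = \left(\frac{\partial E}{\partial S}\right)_{\{X_j, N_a\}} , \qquad
y_j = \left(\frac{\partial E}{\partial X_j}\right)_{S, \{X_{i(\ne j)}, N_a\}} , \qquad
\mu_a = \left(\frac{\partial E}{\partial N_a}\right)_{S, \{X_j, N_{b(\ne a)}\}} . \tag{2.144}
$$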

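Since $E$ and all its arguments are extensive, we have

$$
\lambda E = E\big(\lambda S, \{\lambda X_j\}, \{\lambda N_a\}\big) . \tag{2.145}
$$

We now differentiate the LHS and RHS above with respect to $\lambda$, setting $\lambda = 1$ afterward. The result is

$$
\begin{aligned}
E &= S\, \frac{\partial E}{\partial S} + \sum_j X_j\, \frac{\partial E}{\partial X_j} + \sum_a N_a\, \frac{\partial E}{\partial N_a} \\
&= T S + \sum_j y_j X_j + \sum_a \mu_a N_a .
\end{aligned} \tag{2.146}
$$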

Mathematically astute readers will recognize this result as an example of Euler's theorem for homogeneous functions. Taking the differential of eqn. 2.146, and then subtracting eqn. 2.142, we obtain

$$
S\, dT + \sum_j X_j\, dy_j + \sum_a N_a\, d\mu_a = 0 . \tag{2.147}
$$

This is called the Gibbs-Duhem relation. It says that there is one equation of state which may be written in terms of all the intensive quantities alone. For example, for a single component system, we must have $p = p(T, \mu)$, which follows from

$$
S\, dT - V\, dp + N\, d\mu = 0 . \tag{2.148}
$$
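Both the Euler relation 2.146 and the Gibbs-Duhem relation 2.148 can be checked numerically for any explicit entropy function. The sketch below assumes the ideal-gas form of eqn. 2.153 with $f = 3$, works in units where $k_B = 1$, extracts $T$, $p$, $\mu$ by central finite differences, and then evaluates both relations; the state values and the constant $b$ are arbitrary.

```python
import math

KB = 1.0     # units with k_B = 1 (an assumption for this sketch)
F_DOF = 3    # f = 3: monatomic ideal gas


def entropy(E, V, N, b=1.0):
    """Eqn. 2.153, with an arbitrary constant b."""
    return 0.5 * F_DOF * N * KB * math.log(E / N) + N * KB * math.log(V / N) + N * b


def partial(f, args, i, h=1e-6):
    """Central finite difference of f with respect to args[i], relative step h."""
    up, dn = list(args), list(args)
    step = h * args[i]
    up[i] += step
    dn[i] -= step
    return (f(*up) - f(*dn)) / (2.0 * step)


def intensives(E, V, N):
    """T, p, mu from dS = dE/T + (p/T) dV - (mu/T) dN."""
    T = 1.0 / partial(entropy, (E, V, N), 0)
    p = T * partial(entropy, (E, V, N), 1)
    mu = -T * partial(entropy, (E, V, N), 2)
    return T, p, mu


E, V, N = 3.0, 2.0, 1.5
T, p, mu = intensives(E, V, N)
S = entropy(E, V, N)
euler_residual = E - (T * S - p * V + mu * N)          # eqn. 2.146

# Gibbs-Duhem, eqn. 2.148, for a small displacement of the state
T2, p2, mu2 = intensives(E * (1 + 1e-5), V * (1 + 2e-5), N * (1 - 1e-5))
gibbs_duhem = S * (T2 - T) - V * (p2 - p) + N * (mu2 - mu)
print(euler_residual, gibbs_duhem)
```

The Gibbs-Duhem residual is of second order in the displacement, so it shrinks quadratically as the step is reduced.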

2.7.5 Entropy for an ideal gas

For an ideal gas, we have $E = \frac{1}{2} f N k_B T$, and

$$
\begin{aligned}
dS &= \frac{1}{T}\, dE + \frac{p}{T}\, dV - \frac{\mu}{T}\, dN \\
&= \tfrac{1}{2} f N k_B\, \frac{dT}{T} + \frac{p}{T}\, dV + \left(\tfrac{1}{2} f k_B - \frac{\mu}{T}\right) dN .
\end{aligned} \tag{2.149}
$$

Invoking the ideal gas equation of state $pV = N k_B T$, we have

$$
dS\big|_N = \tfrac{1}{2} f N k_B\, d\ln T + N k_B\, d\ln V . \tag{2.150}
$$

Integrating, we obtain

$$
S(T, V, N) = \tfrac{1}{2} f N k_B \ln T + N k_B \ln V + \varphi(N) , \tag{2.151}
$$

where $\varphi(N)$ is an arbitrary function. Extensivity of $S$ places restrictions on $\varphi(N)$, so that the most general case is

$$
S(T, V, N) = \tfrac{1}{2} f N k_B \ln T + N k_B \ln\!\left(\frac{V}{N}\right) + N a , \tag{2.152}
$$

where $a$ is a constant. Equivalently, we could write

$$
S(E, V, N) = \tfrac{1}{2} f N k_B \ln\!\left(\frac{E}{N}\right) + N k_B \ln\!\left(\frac{V}{N}\right) + N b , \tag{2.153}
$$

where $b = a - \frac{1}{2} f k_B \ln\!\big(\frac{1}{2} f k_B\big)$ is another constant. When we study statistical mechanics, we will find that for the monatomic ideal gas the entropy is

$$
S(T, V, N) = N k_B \left[\frac{5}{2} + \ln\!\left(\frac{V}{N \lambda_T^3}\right)\right] , \tag{2.154}
$$

where $\lambda_T = \sqrt{2\pi\hbar^2 / m k_B T}$ is the thermal wavelength, which involves Planck's constant. Let's now contrast two illustrative cases.

• Adiabatic free expansion – Suppose the volume freely expands from $V_i$ to $V_f = r V_i$, with $r > 1$. Such an expansion can be effected by a removal of a partition between two chambers that are otherwise thermally insulated (see Fig. 2.10). We have already seen how this process entails

$$
\Delta E = Q = W = 0 . \tag{2.155}
$$

But the entropy changes! According to eqn. 2.153, we have

$$
\Delta S = S_f - S_i = N k_B \ln r . \tag{2.156}
$$

• Reversible adiabatic expansion – If the gas expands quasistatically and reversibly, then $S = S(E, V, N)$ holds everywhere along the thermodynamic path. We then have, assuming $dN = 0$,

$$
0 = dS = \tfrac{1}{2} f N k_B\, \frac{dE}{E} + N k_B\, \frac{dV}{V} = N k_B\, d\ln\!\big(V E^{f/2}\big) . \tag{2.157}
$$

Integrating, we find

$$
\frac{E}{E_0} = \left(\frac{V_0}{V}\right)^{2/f} . \tag{2.158}
$$

Thus,

$$
E_f = r^{-2/f} E_i \quad \Longleftrightarrow \quad T_f = r^{-2/f} T_i . \tag{2.159}
$$
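The two cases above can be checked against the monatomic ideal gas entropy of eqn. 2.154. The sketch below uses CODATA 2018 values of the constants; the choice of one mole of argon at 298.15 K and 1 bar is ours, for illustration, and the resulting molar entropy should come out close to the tabulated standard value of roughly 155 J/(mol K). Free expansion keeps $T$ fixed and gives $\Delta S = N k_B \ln r$, while the reversible adiabat with $T_f = r^{-2/3} T_i$ leaves $S$ unchanged.

```python
import math

KB = 1.380649e-23          # J/K
HBAR = 1.054571817e-34     # J s
NA = 6.02214076e23         # 1/mol
AMU = 1.66053906660e-27    # kg


def thermal_wavelength(T, m):
    """lambda_T = sqrt(2 pi hbar^2 / (m k_B T)), in metres."""
    return math.sqrt(2.0 * math.pi * HBAR ** 2 / (m * KB * T))


def sackur_tetrode(T, V, N, m):
    """Eqn. 2.154: entropy of a monatomic ideal gas, in J/K."""
    return N * KB * (2.5 + math.log(V / (N * thermal_wavelength(T, m) ** 3)))


m_argon = 39.948 * AMU
T0, p0 = 298.15, 1.0e5                 # 25 C and 1 bar
V0 = NA * KB * T0 / p0                 # volume of one mole
S_molar = sackur_tetrode(T0, V0, NA, m_argon)

r = 2.0
# Free expansion: E, hence T, is unchanged while V -> r V (eqns. 2.155-2.156)
dS_free = sackur_tetrode(T0, r * V0, NA, m_argon) - S_molar
# Reversible adiabatic expansion: T -> r**(-2/f) T with f = 3 (eqn. 2.159)
dS_adiabatic = sackur_tetrode(T0 * r ** (-2.0 / 3.0), r * V0, NA, m_argon) - S_molar
print(f"S(argon) = {S_molar:.1f} J/(mol K)")
print(dS_free, NA * KB * math.log(r), dS_adiabatic)
```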

2.7.6 Example system

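Consider a model thermodynamic system for which

$$
E(S, V, N) = \frac{a S^3}{N V} , \tag{2.160}
$$

where $a$ is a constant. We have

$$
dE = T\, dS - p\, dV + \mu\, dN , \tag{2.161}
$$

and therefore

$$
T = \left(\frac{\partial E}{\partial S}\right)_{V,N} = \frac{3 a S^2}{N V} \tag{2.162}
$$

$$
p = -\left(\frac{\partial E}{\partial V}\right)_{S,N} = \frac{a S^3}{N V^2} \tag{2.163}
$$

$$
\mu = \left(\frac{\partial E}{\partial N}\right)_{S,V} = -\frac{a S^3}{N^2 V} . \tag{2.164}
$$

Choosing any two of these equations, we can eliminate $S$, which is inconvenient for experimental purposes. This yields three equations of state,

$$
\frac{T^3}{p^2} = 27 a\, \frac{V}{N} , \qquad \frac{T^3}{\mu^2} = 27 a\, \frac{N}{V} , \qquad \frac{p}{\mu} = -\frac{N}{V} , \tag{2.165}
$$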

only two of which are independent.

What about $C_V$ and $C_p$? To find $C_V$, we recast eqn. 2.162 as

$$
S = \left(\frac{N V T}{3 a}\right)^{1/2} . \tag{2.166}
$$

We then have

$$
C_V = T \left(\frac{\partial S}{\partial T}\right)_{V,N} = \frac{1}{2}\left(\frac{N V T}{3 a}\right)^{1/2} = \frac{N}{18 a}\, \frac{T^2}{p} , \tag{2.167}
$$

where the last equality on the RHS follows upon invoking the first of the equations of state in eqn. 2.165. To find $C_p$, we eliminate $V$ from eqns. 2.162 and 2.163, obtaining $T^2/p = 9 a S/N$. From this we obtain

$$
C_p = T \left(\frac{\partial S}{\partial T}\right)_{p,N} = \frac{2N}{9 a}\, \frac{T^2}{p} . \tag{2.168}
$$

Thus, $C_p/C_V = 4$.

We can derive still more. To find the isothermal compressibility $\kappa_T = -\frac{1}{V}\left(\frac{\partial V}{\partial p}\right)_{T,N}$, use the first of the equations of state in eqn. 2.165. To derive the adiabatic compressibility $\kappa_S = -\frac{1}{V}\left(\frac{\partial V}{\partial p}\right)_{S,N}$, use eqn. 2.163, and then eliminate the inconvenient variable $S$.
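The recipes above are easy to carry out numerically. The sketch below sets the constants of eqn. 2.160 to the hypothetical values $a = N = 1$, picks an arbitrary state $(T, p)$, and takes each partial derivative by a central finite difference at the appropriate fixed variable. Besides $C_p/C_V = 4$, it shows $\kappa_T/\kappa_S = 4$, consistent with the general identity $\kappa_T/\kappa_S = C_p/C_V$.

```python
a, N = 1.0, 1.0        # hypothetical values of the constants in eqn. 2.160


def S_of_TV(T, V):
    return (N * V * T / (3.0 * a)) ** 0.5      # eqn. 2.166


def S_of_Tp(T, p):
    return N * T ** 2 / (9.0 * a * p)          # from T^2/p = 9aS/N


def V_of_Tp(T, p):
    return N * T ** 3 / (27.0 * a * p ** 2)    # first equation of state, eqn. 2.165


def V_of_Sp(S, p):
    return (a * S ** 3 / (N * p)) ** 0.5       # eqn. 2.163 solved for V


def deriv(f, x, h=1e-6):
    """Central finite-difference derivative of a one-variable function."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


T, p = 2.0, 0.5
V, S = V_of_Tp(T, p), S_of_Tp(T, p)
C_V = T * deriv(lambda t: S_of_TV(t, V), T)
C_p = T * deriv(lambda t: S_of_Tp(t, p), T)
kappa_T = -deriv(lambda q: V_of_Tp(T, q), p) / V
kappa_S = -deriv(lambda q: V_of_Sp(S, q), p) / V
print(C_p / C_V, kappa_T / kappa_S)
```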
Suppose we use this system as the working substance for a Carnot engine. Let's compute the work done and the engine efficiency. To do this, it is helpful to eliminate $S$ in the expression for the energy, and to rewrite the equation of state:

$$
E = pV = \sqrt{\frac{N}{27 a}}\; V^{1/2}\, T^{3/2} , \qquad p = \sqrt{\frac{N}{27 a}}\; \frac{T^{3/2}}{V^{1/2}} . \tag{2.169}
$$

We assume $dN = 0$ throughout. We now see that for isotherms,

$$
dT = 0 : \qquad \frac{E}{\sqrt{V}} = \text{constant} \tag{2.170}
$$

Furthermore, since

$$
\bar{d}W\big|_T = \sqrt{\frac{N}{27 a}}\; T^{3/2}\, \frac{dV}{V^{1/2}} = 2\, dE\big|_T , \tag{2.171}
$$

we conclude that

$$
dT = 0 : \qquad W_{if} = 2(E_f - E_i) , \qquad Q_{if} = E_f - E_i + W_{if} = 3(E_f - E_i) . \tag{2.172}
$$

For adiabats, eqn. 2.162 says $d(TV) = 0$, and therefore

$$
\bar{d}Q = 0 : \qquad TV = \text{constant} , \qquad \frac{E}{T} = \text{constant} , \qquad EV = \text{constant} \tag{2.173}
$$

as well as $W_{if} = E_i - E_f$. We can use these relations to derive the following:

$$
E_B = \sqrt{\frac{V_B}{V_A}}\; E_A , \qquad E_C = \frac{T_1}{T_2}\sqrt{\frac{V_B}{V_A}}\; E_A , \qquad E_D = \frac{T_1}{T_2}\, E_A . \tag{2.174}
$$
VA T2 VA T2

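Now we can write

$$
W_{AB} = 2(E_B - E_A) = 2\left(\sqrt{\frac{V_B}{V_A}} - 1\right) E_A \tag{2.175}
$$

$$
W_{BC} = (E_B - E_C) = \sqrt{\frac{V_B}{V_A}}\left(1 - \frac{T_1}{T_2}\right) E_A \tag{2.176}
$$

$$
W_{CD} = 2(E_D - E_C) = 2\, \frac{T_1}{T_2}\left(1 - \sqrt{\frac{V_B}{V_A}}\right) E_A \tag{2.177}
$$

$$
W_{DA} = (E_D - E_A) = \left(\frac{T_1}{T_2} - 1\right) E_A \tag{2.178}
$$

Adding up all the work, we obtain

$$
W = W_{AB} + W_{BC} + W_{CD} + W_{DA} = 3\left(\sqrt{\frac{V_B}{V_A}} - 1\right)\left(1 - \frac{T_1}{T_2}\right) E_A . \tag{2.179}
$$

Since

$$
Q_{AB} = 3(E_B - E_A) = \tfrac{3}{2}\, W_{AB} = 3\left(\sqrt{\frac{V_B}{V_A}} - 1\right) E_A , \tag{2.180}
$$

we find once again

$$
\eta = \frac{W}{Q_{AB}} = 1 - \frac{T_1}{T_2} . \tag{2.181}
$$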
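The closed-form results 2.175–2.181 can be cross-checked by integrating $p\, dV$ numerically around the cycle, using the equation of state 2.169 and the adiabat condition $TV = \text{constant}$. The sketch below assumes the hypothetical values $a = N = 1$, $T_2 = 2$, $T_1 = 1$, $V_A = 1$, $V_B = 4$, and uses composite Simpson integration; $Q_{AB}$ comes from the First Law on the hot isotherm.

```python
import math

a, N = 1.0, 1.0                       # hypothetical constants of eqn. 2.160
PREF = math.sqrt(N / (27.0 * a))


def pressure(T, V):
    return PREF * T ** 1.5 / math.sqrt(V)      # eqn. 2.169


def energy(T, V):
    return PREF * math.sqrt(V) * T ** 1.5      # E = pV, eqn. 2.169


def simpson(f, x0, x1, n=2000):
    """Composite Simpson rule (n even); also valid when x1 < x0."""
    h = (x1 - x0) / n
    total = f(x0) + f(x1)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(x0 + k * h)
    return total * h / 3.0


T2, T1 = 2.0, 1.0                          # hot and cold reservoir temperatures
V_A, V_B = 1.0, 4.0                        # ends of the hot isotherm
V_C, V_D = V_B * T2 / T1, V_A * T2 / T1    # adiabats obey T V = const (eqn. 2.173)

W_AB = simpson(lambda V: pressure(T2, V), V_A, V_B)
W_BC = simpson(lambda V: pressure(T2 * V_B / V, V), V_B, V_C)
W_CD = simpson(lambda V: pressure(T1, V), V_C, V_D)
W_DA = simpson(lambda V: pressure(T1 * V_D / V, V), V_D, V_A)
W = W_AB + W_BC + W_CD + W_DA
Q_AB = energy(T2, V_B) - energy(T2, V_A) + W_AB     # First Law on the hot isotherm
print(W / Q_AB, 1.0 - T1 / T2)
```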

2.7.7 Measuring the entropy of a substance

If we can measure the heat capacity $C_V(T)$ or $C_p(T)$ of a substance as a function of temperature down to the lowest temperatures, then we can measure the entropy. At constant pressure, for example, we have $T\, dS = C_p\, dT$, hence

$$
S(p, T) = S(p, T = 0) + \int_0^T \! dT'\, \frac{C_p(T')}{T'} . \tag{2.182}
$$

The zero temperature entropy is $S(p, T = 0) = k_B \ln g$ where $g$ is the quantum ground state degeneracy at pressure $p$. In all but highly unusual cases, $g = 1$ and $S(p, T = 0) = 0$.
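In practice eqn. 2.182 is evaluated from tabulated heat-capacity data. The sketch below applies the trapezoid rule to synthetic data of the form $C_p = A T^3$ (a low-temperature Debye-like form chosen here for illustration, with a hypothetical coefficient $A$), for which the exact answer is $A T^3/3$. It assumes $S(p, T=0) = 0$ and that $C_p/T \to 0$ as $T \to 0$.

```python
def entropy_from_heat_capacity(temps, heat_caps):
    """Trapezoid-rule estimate of S(T) - S(0) from eqn. 2.182.

    temps must start at T = 0 and increase; heat_caps holds C_p at those
    temperatures. The integrand C_p/T at T = 0 is set to 0, which assumes
    C_p vanishes faster than T as T -> 0.
    """
    integrand = [0.0] + [c / t for c, t in zip(heat_caps[1:], temps[1:])]
    S = 0.0
    for i in range(1, len(temps)):
        S += 0.5 * (integrand[i] + integrand[i - 1]) * (temps[i] - temps[i - 1])
    return S


A = 1.0e-3                                   # hypothetical coefficient, J/K^4
temps = [0.01 * k for k in range(1001)]      # 0 K to 10 K
heat_caps = [A * t ** 3 for t in temps]      # synthetic C_p = A T^3 data
S_numeric = entropy_from_heat_capacity(temps, heat_caps)
S_exact = A * temps[-1] ** 3 / 3.0           # integral of A T'^2 from 0 to T
print(S_numeric, S_exact)
```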

2.8 Thermodynamic Potentials

Thermodynamic systems may do work on their environments. Under certain constraints, the work done
may be bounded from above by the change in an appropriately defined thermodynamic potential.

2.8.1 Energy E

Suppose we wish to create a thermodynamic system from scratch. Let's imagine that we create it from scratch in a thermally insulated box of volume $V$. The work we must do to assemble the system is then $W = E$. After we bring all the constituent particles together, pulling them in from infinity (say), the system will have total energy $E$. After we finish, the system may not be in thermal equilibrium. Spontaneous processes will then occur so as to maximize the system's entropy, but the internal energy remains at $E$.
We have, from the First Law, $dE = \bar{d}Q - \bar{d}W$, and combining this with the Second Law in the form $\bar{d}Q \le T\, dS$ yields

$$
dE \le T\, dS - \bar{d}W . \tag{2.183}
$$

Rearranging terms, we have $\bar{d}W \le T\, dS - dE$. Hence, the work done by a thermodynamic system under conditions of constant entropy is bounded above by $-dE$, and the maximum $\bar{d}W$ is achieved for a reversible process. It is sometimes useful to define the quantity

$$
\bar{d}W_{\rm free} = \bar{d}W - p\, dV , \tag{2.184}
$$

which is the differential work done by the system other than that required to change its volume. Then we have

$$
\bar{d}W_{\rm free} \le T\, dS - p\, dV - dE , \tag{2.185}
$$

and we conclude for systems at fixed $(S, V)$ that $\bar{d}W_{\rm free} \le -dE$.
In equilibrium, the equality in Eqn. 2.183 holds, and for single component systems where $\bar{d}W = p\, dV - \mu\, dN$ we have $E = E(S, V, N)$ with

$$
T = \left(\frac{\partial E}{\partial S}\right)_{V,N} , \qquad -p = \left(\frac{\partial E}{\partial V}\right)_{S,N} , \qquad \mu = \left(\frac{\partial E}{\partial N}\right)_{S,V} . \tag{2.186}
$$

These expressions are easily generalized to multicomponent systems, magnetic systems, etc.
Now consider a single component system at fixed $(S, V, N)$. We conclude that $dE \le 0$, which says that spontaneous processes in a system with $dS = dV = dN = 0$ always lead to a reduction in the internal energy $E$. Therefore, spontaneous processes drive the internal energy $E$ to a minimum in systems at fixed $(S, V, N)$.

2.8.2 Helmholtz free energy F

Suppose now that we create our system while it is in constant contact with a thermal reservoir at temperature $T$. Then as we create our system, it will absorb heat from the reservoir. Therefore, we don't have to supply the full internal energy $E$, but rather only $E - Q$, since the system receives heat energy $Q$ from the reservoir. In other words, we must perform work $W = E - TS$ to create our system, if it is constantly in equilibrium at temperature $T$. The quantity $E - TS$ is known as the Helmholtz free energy, $F$, which is related to the energy $E$ by a Legendre transformation,

$$
F = E - TS . \tag{2.187}
$$

The general properties of Legendre transformations are discussed in Appendix II, §2.16.
Again invoking the Second Law, we have

$$
dF \le -S\, dT - \bar{d}W . \tag{2.188}
$$

Rearranging terms, we have $\bar{d}W \le -S\, dT - dF$, which says that the work done by a thermodynamic system under conditions of constant temperature is bounded above by $-dF$, and the maximum $\bar{d}W$ is achieved for a reversible process. We also have the general result

$$
\bar{d}W_{\rm free} \le -S\, dT - p\, dV - dF , \tag{2.189}
$$

and we conclude, for systems at fixed $(T, V)$, that $\bar{d}W_{\rm free} \le -dF$.
Under equilibrium conditions, the equality in Eqn. 2.188 holds, and for single component systems where $\bar{d}W = p\, dV - \mu\, dN$ we have $dF = -S\, dT - p\, dV + \mu\, dN$. This says that $F = F(T, V, N)$ with

$$
-S = \left(\frac{\partial F}{\partial T}\right)_{V,N} , \qquad -p = \left(\frac{\partial F}{\partial V}\right)_{T,N} , \qquad \mu = \left(\frac{\partial F}{\partial N}\right)_{T,V} . \tag{2.190}
$$

For spontaneous processes, $dF \le -S\, dT - p\, dV + \mu\, dN$ says that spontaneous processes drive the Helmholtz free energy $F$ to a minimum in systems at fixed $(T, V, N)$.

2.8.3 Enthalpy H

Suppose now that we create our system while it is thermally insulated, but in constant mechanical contact with a 'volume bath' at pressure $p$. For example, we could create our system inside a thermally insulated chamber with one movable wall where the external pressure is fixed at $p$. Thus, when creating the system, in addition to the system's internal energy $E$, we must also perform work $pV$ in order to make room for it. In other words, we must perform work $W = E + pV$. The quantity $E + pV$ is known as the enthalpy, $\mathcal{H}$. (We use the calligraphic font $\mathcal{H}$ for enthalpy to avoid confusing it with magnetic field, $H$.) The enthalpy is obtained from the energy via a different Legendre transformation than that used to obtain the Helmholtz free energy $F$, i.e.

$$
\mathcal{H} = E + pV . \tag{2.191}
$$

Again invoking the Second Law, we have

$$
d\mathcal{H} \le T\, dS - \bar{d}W + p\, dV + V\, dp , \tag{2.192}
$$

hence with $\bar{d}W_{\rm free} = \bar{d}W - p\, dV$, we have in general

$$
\bar{d}W_{\rm free} \le T\, dS + V\, dp - d\mathcal{H} , \tag{2.193}
$$

and we conclude, for systems at fixed $(S, p)$, that $\bar{d}W_{\rm free} \le -d\mathcal{H}$.
In equilibrium, for single component systems,

$$
d\mathcal{H} = T\, dS + V\, dp + \mu\, dN , \tag{2.194}
$$

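which says $\mathcal{H} = \mathcal{H}(S, p, N)$, with

$$
T = \left(\frac{\partial \mathcal{H}}{\partial S}\right)_{p,N} , \qquad V = \left(\frac{\partial \mathcal{H}}{\partial p}\right)_{S,N} , \qquad \mu = \left(\frac{\partial \mathcal{H}}{\partial N}\right)_{S,p} . \tag{2.195}
$$

For spontaneous processes, $d\mathcal{H} \le T\, dS + V\, dp + \mu\, dN$, which says that spontaneous processes drive the enthalpy $\mathcal{H}$ to a minimum in systems at fixed $(S, p, N)$.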

2.8.4 Gibbs free energy G

If we create a thermodynamic system at conditions of constant temperature $T$ and constant pressure $p$, then it absorbs heat energy $Q = TS$ from the reservoir and we must expend work energy $pV$ in order to make room for it. Thus, the total amount of work we must do in assembling our system is $W = E - TS + pV$. This is the Gibbs free energy, $G$. The Gibbs free energy is obtained from $E$ after two Legendre transformations, viz.

$$
G = E - TS + pV . \tag{2.196}
$$

Note that $G = F + pV = \mathcal{H} - TS$. The Second Law says that

$$
dG \le -S\, dT + V\, dp + p\, dV - \bar{d}W , \tag{2.197}
$$

which we may rearrange as $\bar{d}W_{\rm free} \le -S\, dT + V\, dp - dG$. Accordingly, we conclude, for systems at fixed $(T, p)$, that $\bar{d}W_{\rm free} \le -dG$.
For equilibrium one-component systems, the differential of $G$ is

$$
dG = -S\, dT + V\, dp + \mu\, dN , \tag{2.198}
$$

therefore $G = G(T, p, N)$, with

$$
-S = \left(\frac{\partial G}{\partial T}\right)_{p,N} , \qquad V = \left(\frac{\partial G}{\partial p}\right)_{T,N} , \qquad \mu = \left(\frac{\partial G}{\partial N}\right)_{T,p} . \tag{2.199}
$$

Recall that Euler's theorem for single component systems requires $E = TS - pV + \mu N$, which says $G = \mu N$. Thus, the chemical potential $\mu$ is the Gibbs free energy per particle. For spontaneous processes, $dG \le -S\, dT + V\, dp + \mu\, dN$, hence spontaneous processes drive the Gibbs free energy $G$ to a minimum in systems at fixed $(T, p, N)$.

2.8.5 Grand potential Ω

The grand potential, sometimes called the Landau free energy, is defined by

$$
\Omega = E - TS - \mu N . \tag{2.200}
$$

Under equilibrium conditions, its differential is

$$
d\Omega = -S\, dT - p\, dV - N\, d\mu , \tag{2.201}
$$

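hence

$$
-S = \left(\frac{\partial \Omega}{\partial T}\right)_{V,\mu} , \qquad -p = \left(\frac{\partial \Omega}{\partial V}\right)_{T,\mu} , \qquad -N = \left(\frac{\partial \Omega}{\partial \mu}\right)_{T,V} . \tag{2.202}
$$

Again invoking eqn. 2.146, we find $\Omega = -pV$, which says that the pressure is the negative of the grand potential per unit volume.
The Second Law tells us

$$
d\Omega \le -\bar{d}W - S\, dT - \mu\, dN - N\, d\mu , \tag{2.203}
$$

hence

$$
\bar{d}\widetilde{W}_{\rm free} \equiv \bar{d}W_{\rm free} + \mu\, dN \le -S\, dT - p\, dV - N\, d\mu - d\Omega . \tag{2.204}
$$

We conclude, for systems at fixed $(T, V, \mu)$, that $\bar{d}\widetilde{W}_{\rm free} \le -d\Omega$.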
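The Euler-theorem consequences $G = \mu N$ and $\Omega = -pV$ can be verified numerically for a concrete free energy. The sketch below assumes the monatomic ideal gas, builds $F = E - TS$ from $E = \frac{3}{2} N k_B T$ and eqn. 2.154 in units with $k_B = \hbar = m = 1$, obtains $p$ and $\mu$ from eqn. 2.190 by finite differences, and checks both identities at an arbitrary state.

```python
import math

# Units with k_B = hbar = m = 1 (an assumption for this sketch).


def helmholtz(T, V, N):
    """F = E - TS for a monatomic ideal gas, using E = 3NT/2 and eqn. 2.154."""
    lam = math.sqrt(2.0 * math.pi / T)            # thermal wavelength
    S = N * (2.5 + math.log(V / (N * lam ** 3)))
    return 1.5 * N * T - T * S


def partial(f, args, i, h=1e-6):
    """Central finite difference with a relative step."""
    up, dn = list(args), list(args)
    step = h * args[i]
    up[i] += step
    dn[i] -= step
    return (f(*up) - f(*dn)) / (2.0 * step)


T, V, N = 1.3, 50.0, 2.0
F = helmholtz(T, V, N)
p = -partial(helmholtz, (T, V, N), 1)     # eqn. 2.190
mu = partial(helmholtz, (T, V, N), 2)
G = F + p * V                             # G = F + pV
Omega = F - mu * N                        # Omega = E - TS - mu N
print(G - mu * N, Omega + p * V, p * V - N * T)
```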

2.9 Maxwell Relations

Maxwell relations are conditions equating certain derivatives of state variables which follow from the
exactness of the differentials of the various state functions.

2.9.1 Relations deriving from E(S, V, N)

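The energy $E(S, V, N)$ is a state function, with

$$
dE = T\, dS - p\, dV + \mu\, dN , \tag{2.205}
$$

and therefore

$$
T = \left(\frac{\partial E}{\partial S}\right)_{V,N} , \qquad -p = \left(\frac{\partial E}{\partial V}\right)_{S,N} , \qquad \mu = \left(\frac{\partial E}{\partial N}\right)_{S,V} . \tag{2.206}
$$

Taking the mixed second derivatives, we find

$$
\frac{\partial^2 E}{\partial S\, \partial V} = \left(\frac{\partial T}{\partial V}\right)_{S,N} = -\left(\frac{\partial p}{\partial S}\right)_{V,N} \tag{2.207}
$$

$$
\frac{\partial^2 E}{\partial S\, \partial N} = \left(\frac{\partial T}{\partial N}\right)_{S,V} = \left(\frac{\partial \mu}{\partial S}\right)_{V,N} \tag{2.208}
$$

$$
\frac{\partial^2 E}{\partial V\, \partial N} = -\left(\frac{\partial p}{\partial N}\right)_{S,V} = \left(\frac{\partial \mu}{\partial V}\right)_{S,N} . \tag{2.209}
$$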

2.9.2 Relations deriving from F (T, V, N)

The Helmholtz free energy $F(T, V, N)$ is a state function, with

$$
dF = -S\, dT - p\, dV + \mu\, dN , \tag{2.210}
$$

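and therefore

$$
-S = \left(\frac{\partial F}{\partial T}\right)_{V,N} , \qquad -p = \left(\frac{\partial F}{\partial V}\right)_{T,N} , \qquad \mu = \left(\frac{\partial F}{\partial N}\right)_{T,V} . \tag{2.211}
$$

Taking the mixed second derivatives, we find

$$
\frac{\partial^2 F}{\partial T\, \partial V} = -\left(\frac{\partial S}{\partial V}\right)_{T,N} = -\left(\frac{\partial p}{\partial T}\right)_{V,N} \tag{2.212}
$$

$$
\frac{\partial^2 F}{\partial T\, \partial N} = -\left(\frac{\partial S}{\partial N}\right)_{T,V} = \left(\frac{\partial \mu}{\partial T}\right)_{V,N} \tag{2.213}
$$

$$
\frac{\partial^2 F}{\partial V\, \partial N} = -\left(\frac{\partial p}{\partial N}\right)_{T,V} = \left(\frac{\partial \mu}{\partial V}\right)_{T,N} . \tag{2.214}
$$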
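Relation 2.212 can be tested on a non-ideal example. The sketch below assumes a van der Waals free energy (the monatomic ideal-gas $F$ with $V \to V - Nb$, minus $aN^2/V$), in units with $k_B = \hbar = m = 1$ and with hypothetical constants $a$, $b$. It obtains $S$ and $p$ from eqn. 2.211 by finite differences and then compares $(\partial S/\partial V)_{T,N}$ with $(\partial p/\partial T)_{V,N}$; for this model both equal $N/(V - Nb)$.

```python
import math

# Units with k_B = hbar = m = 1; a_vdw and b_vdw are hypothetical constants.
a_vdw, b_vdw, N = 0.5, 0.1, 1.0


def helmholtz(T, V):
    """Van der Waals free energy: ideal-gas F with V -> V - N b, minus a N^2 / V."""
    lam3 = (2.0 * math.pi / T) ** 1.5
    return -N * T * (math.log((V - N * b_vdw) / (N * lam3)) + 1.0) - a_vdw * N ** 2 / V


def entropy(T, V, h=1e-5):
    return -(helmholtz(T + h, V) - helmholtz(T - h, V)) / (2.0 * h)   # S = -dF/dT


def pressure(T, V, h=1e-5):
    return -(helmholtz(T, V + h) - helmholtz(T, V - h)) / (2.0 * h)   # p = -dF/dV


T, V, h = 1.5, 2.0, 1e-3
dS_dV = (entropy(T, V + h) - entropy(T, V - h)) / (2.0 * h)     # (dS/dV)_T
dp_dT = (pressure(T + h, V) - pressure(T - h, V)) / (2.0 * h)   # (dp/dT)_V
print(dS_dV, dp_dT, N / (V - N * b_vdw))
```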

2.9.3 Relations deriving from H(S, p, N)

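The enthalpy $\mathcal{H}(S, p, N)$ satisfies

$$
d\mathcal{H} = T\, dS + V\, dp + \mu\, dN , \tag{2.215}
$$

which says $\mathcal{H} = \mathcal{H}(S, p, N)$, with

$$
T = \left(\frac{\partial \mathcal{H}}{\partial S}\right)_{p,N} , \qquad V = \left(\frac{\partial \mathcal{H}}{\partial p}\right)_{S,N} , \qquad \mu = \left(\frac{\partial \mathcal{H}}{\partial N}\right)_{S,p} . \tag{2.216}
$$

Taking the mixed second derivatives, we find

$$
\frac{\partial^2 \mathcal{H}}{\partial S\, \partial p} = \left(\frac{\partial T}{\partial p}\right)_{S,N} = \left(\frac{\partial V}{\partial S}\right)_{p,N} \tag{2.217}
$$

$$
\frac{\partial^2 \mathcal{H}}{\partial S\, \partial N} = \left(\frac{\partial T}{\partial N}\right)_{S,p} = \left(\frac{\partial \mu}{\partial S}\right)_{p,N} \tag{2.218}
$$

$$
\frac{\partial^2 \mathcal{H}}{\partial p\, \partial N} = \left(\frac{\partial V}{\partial N}\right)_{S,p} = \left(\frac{\partial \mu}{\partial p}\right)_{S,N} . \tag{2.219}
$$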

2.9.4 Relations deriving from G(T, p, N)

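The Gibbs free energy $G(T, p, N)$ satisfies

$$
dG = -S\, dT + V\, dp + \mu\, dN , \tag{2.220}
$$

therefore $G = G(T, p, N)$, with

$$
-S = \left(\frac{\partial G}{\partial T}\right)_{p,N} , \qquad V = \left(\frac{\partial G}{\partial p}\right)_{T,N} , \qquad \mu = \left(\frac{\partial G}{\partial N}\right)_{T,p} . \tag{2.221}
$$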

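Taking the mixed second derivatives, we find

$$
\frac{\partial^2 G}{\partial T\, \partial p} = -\left(\frac{\partial S}{\partial p}\right)_{T,N} = \left(\frac{\partial V}{\partial T}\right)_{p,N} \tag{2.222}
$$

$$
\frac{\partial^2 G}{\partial T\, \partial N} = -\left(\frac{\partial S}{\partial N}\right)_{T,p} = \left(\frac{\partial \mu}{\partial T}\right)_{p,N} \tag{2.223}
$$

$$
\frac{\partial^2 G}{\partial p\, \partial N} = \left(\frac{\partial V}{\partial N}\right)_{T,p} = \left(\frac{\partial \mu}{\partial p}\right)_{T,N} . \tag{2.224}
$$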

2.9.5 Relations deriving from Ω(T, V, µ)

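The grand potential $\Omega(T, V, \mu)$ satisfies

$$
d\Omega = -S\, dT - p\, dV - N\, d\mu , \tag{2.225}
$$

hence

$$
-S = \left(\frac{\partial \Omega}{\partial T}\right)_{V,\mu} , \qquad -p = \left(\frac{\partial \Omega}{\partial V}\right)_{T,\mu} , \qquad -N = \left(\frac{\partial \Omega}{\partial \mu}\right)_{T,V} . \tag{2.226}
$$

Taking the mixed second derivatives, we find

$$
\frac{\partial^2 \Omega}{\partial T\, \partial V} = -\left(\frac{\partial S}{\partial V}\right)_{T,\mu} = -\left(\frac{\partial p}{\partial T}\right)_{V,\mu} \tag{2.227}
$$

$$
\frac{\partial^2 \Omega}{\partial T\, \partial \mu} = -\left(\frac{\partial S}{\partial \mu}\right)_{T,V} = -\left(\frac{\partial N}{\partial T}\right)_{V,\mu} \tag{2.228}
$$

$$
\frac{\partial^2 \Omega}{\partial V\, \partial \mu} = -\left(\frac{\partial p}{\partial \mu}\right)_{T,V} = -\left(\frac{\partial N}{\partial V}\right)_{T,\mu} . \tag{2.229}
$$

Relations deriving from $S(E, V, N)$

We can also derive Maxwell relations based on the entropy $S(E, V, N)$ itself. For example, we have

$$
dS = \frac{1}{T}\, dE + \frac{p}{T}\, dV - \frac{\mu}{T}\, dN . \tag{2.230}
$$

Therefore $S = S(E, V, N)$ and

$$
\frac{\partial^2 S}{\partial E\, \partial V} = \left(\frac{\partial (T^{-1})}{\partial V}\right)_{E,N} = \left(\frac{\partial (p\, T^{-1})}{\partial E}\right)_{V,N} , \tag{2.231}
$$

et cetera.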

2.9.6 Generalized thermodynamic potentials

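We have up until now assumed a generalized force-displacement pair $(y, X) = (-p, V)$. But the above results also generalize to e.g. magnetic systems, where $(y, X) = (H, M)$. In general, we have

$$
dE = T\, dS + y\, dX + \mu\, dN \tag{2.232}
$$

$$
F = E - TS , \qquad dF = -S\, dT + y\, dX + \mu\, dN \tag{2.233}
$$

$$
\mathcal{H} = E - yX , \qquad d\mathcal{H} = T\, dS - X\, dy + \mu\, dN \tag{2.234}
$$

$$
G = E - TS - yX , \qquad dG = -S\, dT - X\, dy + \mu\, dN \tag{2.235}
$$

$$
\Omega = E - TS - \mu N , \qquad d\Omega = -S\, dT + y\, dX - N\, d\mu . \tag{2.236}
$$

Generalizing $(-p, V) \to (y, X)$, we also obtain, mutatis mutandis, the following Maxwell relations:

$$
\begin{array}{lll}
\left(\dfrac{\partial T}{\partial X}\right)_{S,N} = \left(\dfrac{\partial y}{\partial S}\right)_{X,N} &
\left(\dfrac{\partial T}{\partial N}\right)_{S,X} = \left(\dfrac{\partial \mu}{\partial S}\right)_{X,N} &
\left(\dfrac{\partial y}{\partial N}\right)_{S,X} = \left(\dfrac{\partial \mu}{\partial X}\right)_{S,N} \\[3ex]
\left(\dfrac{\partial T}{\partial y}\right)_{S,N} = -\left(\dfrac{\partial X}{\partial S}\right)_{y,N} &
\left(\dfrac{\partial T}{\partial N}\right)_{S,y} = \left(\dfrac{\partial \mu}{\partial S}\right)_{y,N} &
\left(\dfrac{\partial X}{\partial N}\right)_{S,y} = -\left(\dfrac{\partial \mu}{\partial y}\right)_{S,N} \\[3ex]
\left(\dfrac{\partial S}{\partial X}\right)_{T,N} = -\left(\dfrac{\partial y}{\partial T}\right)_{X,N} &
\left(\dfrac{\partial S}{\partial N}\right)_{T,X} = -\left(\dfrac{\partial \mu}{\partial T}\right)_{X,N} &
\left(\dfrac{\partial y}{\partial N}\right)_{T,X} = \left(\dfrac{\partial \mu}{\partial X}\right)_{T,N} \\[3ex]
\left(\dfrac{\partial S}{\partial y}\right)_{T,N} = \left(\dfrac{\partial X}{\partial T}\right)_{y,N} &
\left(\dfrac{\partial S}{\partial N}\right)_{T,y} = -\left(\dfrac{\partial \mu}{\partial T}\right)_{y,N} &
\left(\dfrac{\partial X}{\partial N}\right)_{T,y} = -\left(\dfrac{\partial \mu}{\partial y}\right)_{T,N} \\[3ex]
\left(\dfrac{\partial S}{\partial X}\right)_{T,\mu} = -\left(\dfrac{\partial y}{\partial T}\right)_{X,\mu} &
\left(\dfrac{\partial S}{\partial \mu}\right)_{T,X} = \left(\dfrac{\partial N}{\partial T}\right)_{X,\mu} &
\left(\dfrac{\partial y}{\partial \mu}\right)_{T,X} = -\left(\dfrac{\partial N}{\partial X}\right)_{T,\mu}
\end{array}
$$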
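As a magnetic illustration of the fourth row, $(\partial S/\partial y)_{T,N} = (\partial X/\partial T)_{y,N}$ with $(y, X) = (H, M)$, the sketch below assumes a paramagnet of $N$ independent two-level moments, for which $G(T, H) = -N k_B T \ln[2\cosh(\mu_0 H / k_B T)]$; this model and the unit choices $k_B = \mu_0 = 1$ are our assumptions, not results derived in this section. $S$ and $M$ follow from $dG = -S\, dT - M\, dH$ by finite differences, and the two sides of the Maxwell relation are compared with the exact $-N H\, {\rm sech}^2(H/T)/T^2$.

```python
import math

N = 1.0   # number of moments; k_B and the magnetic moment are set to 1 (assumptions)


def gibbs(T, H):
    """G(T, H) = -N T ln[2 cosh(H/T)] for independent two-level moments."""
    return -N * T * math.log(2.0 * math.cosh(H / T))


def deriv(f, x, h):
    return (f(x + h) - f(x - h)) / (2.0 * h)


def entropy(T, H):
    return -deriv(lambda t: gibbs(t, H), T, 1e-5)        # dG = -S dT - M dH


def magnetization(T, H):
    return -deriv(lambda b: gibbs(T, b), H, 1e-5)


T, H = 0.8, 0.6
dS_dH = deriv(lambda b: entropy(T, b), H, 1e-3)          # (dS/dH)_T
dM_dT = deriv(lambda t: magnetization(t, H), T, 1e-3)    # (dM/dT)_H
exact = -N * H / (T ** 2 * math.cosh(H / T) ** 2)
print(dS_dH, dM_dT, exact)
```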

2.10 Equilibrium and Stability

2.10.1 Equilibrium

Suppose we have two systems, $A$ and $B$, which are free to exchange energy, volume, and particle number, subject to overall conservation rules

$$
E_A + E_B = E , \qquad V_A + V_B = V , \qquad N_A + N_B = N , \tag{2.237}
$$

where $E$, $V$, and $N$ are fixed. Now let us compute the change in the total entropy of the combined systems when they are allowed to exchange energy, volume, or particle number. We assume that the

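entropy is additive, i.e.

$$
\begin{aligned}
dS = &\left[\left(\frac{\partial S_A}{\partial E_A}\right)_{V_A,N_A} - \left(\frac{\partial S_B}{\partial E_B}\right)_{V_B,N_B}\right] dE_A
+ \left[\left(\frac{\partial S_A}{\partial V_A}\right)_{E_A,N_A} - \left(\frac{\partial S_B}{\partial V_B}\right)_{E_B,N_B}\right] dV_A \\
&+ \left[\left(\frac{\partial S_A}{\partial N_A}\right)_{E_A,V_A} - \left(\frac{\partial S_B}{\partial N_B}\right)_{E_B,V_B}\right] dN_A .
\end{aligned} \tag{2.238}
$$

Note that we have used $dE_B = -dE_A$, $dV_B = -dV_A$, and $dN_B = -dN_A$. Now we know from the Second Law that spontaneous processes result in $T\, dS > 0$, which means that $S$ tends to a maximum. If $S$ is a maximum, it must be that the coefficients of $dE_A$, $dV_A$, and $dN_A$ all vanish, else we could increase the total entropy of the system by a judicious choice of these three differentials. From $T\, dS = dE + p\, dV - \mu\, dN$, we have

$$
\frac{1}{T} = \left(\frac{\partial S}{\partial E}\right)_{V,N} , \qquad \frac{p}{T} = \left(\frac{\partial S}{\partial V}\right)_{E,N} , \qquad \frac{\mu}{T} = -\left(\frac{\partial S}{\partial N}\right)_{E,V} . \tag{2.239}
$$

Thus, we conclude that in order for the system to be in equilibrium, so that $S$ is maximized and can increase no further under spontaneous processes, we must have

$$
T_A = T_B \qquad \text{(thermal equilibrium)} \tag{2.240}
$$

$$
\frac{p_A}{T_A} = \frac{p_B}{T_B} \qquad \text{(mechanical equilibrium)} \tag{2.241}
$$

$$
\frac{\mu_A}{T_A} = \frac{\mu_B}{T_B} \qquad \text{(chemical equilibrium)} \tag{2.242}
$$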
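The thermal-equilibrium condition can be seen directly by maximizing the total entropy over the energy split. The sketch below assumes two monatomic ideal gases (only the energy-dependent part of eqn. 2.153 matters here, with $k_B = 1$) and hypothetical particle numbers and total energy; a simple search over $E_A$ lands where $T_A = T_B$, i.e. where $E_A/N_A = E_B/N_B$. Volume and particle exchange could be treated the same way.

```python
import math


def s_of_E(E, N, f=3):
    """Energy-dependent part of eqn. 2.153 with k_B = 1: (f/2) N ln(E/N)."""
    return 0.5 * f * N * math.log(E / N)


N_A, N_B, E_total = 1.0, 3.0, 10.0     # hypothetical subsystem sizes and energy
n = 10000
candidates = [E_total * k / n for k in range(1, n)]
E_A = max(candidates, key=lambda e: s_of_E(e, N_A) + s_of_E(E_total - e, N_B))
T_A = E_A / (1.5 * N_A)                # from E = (3/2) N k_B T
T_B = (E_total - E_A) / (1.5 * N_B)
print(E_A, T_A, T_B)
```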

2.10.2 Stability

Next, consider a uniform system with energy $E' = 2E$, volume $V' = 2V$, and particle number $N' = 2N$. We wish to check that this system is not unstable with respect to spontaneously becoming inhomogeneous. To that end, we imagine dividing the system in half. Each half would have energy $E$, volume $V$, and particle number $N$. But suppose we divided up these quantities differently, so that the left half had slightly different energy, volume, and particle number than the right, as depicted in Fig. 2.20. Does the entropy increase or decrease? We have

$$
\begin{aligned}
\Delta S &= S(E + \Delta E, V + \Delta V, N + \Delta N) + S(E - \Delta E, V - \Delta V, N - \Delta N) - S(2E, 2V, 2N) \\
&= \frac{\partial^2 S}{\partial E^2}\, (\Delta E)^2 + \frac{\partial^2 S}{\partial V^2}\, (\Delta V)^2 + \frac{\partial^2 S}{\partial N^2}\, (\Delta N)^2 \\
&\quad + 2\, \frac{\partial^2 S}{\partial E\, \partial V}\, \Delta E\, \Delta V + 2\, \frac{\partial^2 S}{\partial E\, \partial N}\, \Delta E\, \Delta N + 2\, \frac{\partial^2 S}{\partial V\, \partial N}\, \Delta V\, \Delta N .
\end{aligned} \tag{2.243}
$$

Thus, we can write

$$
\Delta S = \sum_{i,j} Q_{ij}\, \Psi_i\, \Psi_j , \tag{2.244}
$$

Figure 2.20: To check for an instability, we compare the entropy of a uniform system to its total entropy when we reapportion its energy, volume, and particle number slightly unequally.

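where

$$
Q = \begin{pmatrix}
\dfrac{\partial^2 S}{\partial E^2} & \dfrac{\partial^2 S}{\partial E\, \partial V} & \dfrac{\partial^2 S}{\partial E\, \partial N} \\[2ex]
\dfrac{\partial^2 S}{\partial E\, \partial V} & \dfrac{\partial^2 S}{\partial V^2} & \dfrac{\partial^2 S}{\partial V\, \partial N} \\[2ex]
\dfrac{\partial^2 S}{\partial E\, \partial N} & \dfrac{\partial^2 S}{\partial V\, \partial N} & \dfrac{\partial^2 S}{\partial N^2}
\end{pmatrix} \tag{2.245}
$$

is the matrix of second derivatives, known in mathematical parlance as the Hessian, and $\Psi = (\Delta E, \Delta V, \Delta N)$. Note that $Q$ is a symmetric matrix.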
Since S must be a maximum in order for the system to be in equilibrium, we are tempted to conclude
that the homogeneous system is stable if and only if all three eigenvalues of Q are negative. If one or more
of the eigenvalues is positive, then it is possible to choose a set of variations Ψ such that ∆S > 0, which
would contradict the assumption that the homogeneous state is one of maximum entropy. A matrix with
this restriction is said to be negative definite. While it is true that Q can have no positive eigenvalues,
it is clear from homogeneity of S(E, V, N ) that one of the three eigenvalues must be zero, corresponding
to the eigenvector Ψ = (E, V, N ). Homogeneity means S(λE, λV, λN ) = λS(E, V, N ). Now let us take
λ = 1 + η, where η is infinitesimal. Then ∆E = ηE, ∆V = ηV , and ∆N = ηN , and homogeneity says
S(E ± ∆E, V ± ∆V, N ± ∆N ) = (1 ± η) S(E, V, N ) and ∆S = (1 + η)S + (1 − η)S − 2S = 0. We then
have a slightly weaker characterization of Q as negative semidefinite.
However, if we fix one of the components of $(\Delta E, \Delta V, \Delta N)$ to be zero, then $\Psi$ must have some component orthogonal to the zero eigenvector, in which case $\Delta S < 0$. Suppose we set $\Delta N = 0$ and we just examine the stability with respect to inhomogeneities in energy and volume. We then restrict our attention to the upper left $2 \times 2$ submatrix of $Q$. A general symmetric $2 \times 2$ matrix may be written

$$
Q = \begin{pmatrix} a & b \\ b & c \end{pmatrix} \tag{2.246}
$$

It is easy to solve for the eigenvalues of $Q$. One finds

$$
\lambda_{\pm} = \left(\frac{a + c}{2}\right) \pm \sqrt{\left(\frac{a - c}{2}\right)^2 + b^2} . \tag{2.247}
$$

In order for $Q$ to be negative definite, we require $\lambda_+ < 0$ and $\lambda_- < 0$. Thus, ${\rm Tr}\, Q = a + c = \lambda_+ + \lambda_- < 0$ and $\det Q = ac - b^2 = \lambda_+ \lambda_- > 0$. Taken together, these conditions require

$$
a < 0 , \qquad c < 0 , \qquad ac > b^2 . \tag{2.248}
$$

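Going back to thermodynamic variables, this requires

$$
\frac{\partial^2 S}{\partial E^2} < 0 , \qquad \frac{\partial^2 S}{\partial V^2} < 0 , \qquad \frac{\partial^2 S}{\partial E^2} \cdot \frac{\partial^2 S}{\partial V^2} > \left(\frac{\partial^2 S}{\partial E\, \partial V}\right)^2 . \tag{2.249}
$$

Thus the entropy is a concave function of $E$ and $V$ at fixed $N$. Had we set $\Delta E = 0$ and considered the lower right $2 \times 2$ submatrix of $Q$, we'd have concluded that $S(V, N)$ is concave at fixed $E$. Since $\left(\frac{\partial S}{\partial E}\right)_V = T^{-1}$, we have $\frac{\partial^2 S}{\partial E^2} = -\frac{1}{T^2}\left(\frac{\partial T}{\partial E}\right)_V = -\frac{1}{T^2 C_V} < 0$, and we conclude $C_V > 0$ for stability.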
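The structure of the Hessian described above, one zero eigenvalue along $\Psi = (E, V, N)$ and a negative definite $2 \times 2$ block once $\Delta N = 0$, can be checked numerically. The sketch below assumes the monatomic ideal gas entropy of eqn. 2.153 in units with $k_B = 1$, at an arbitrary state, and builds $Q$ by central finite differences; plain Python is used, so no linear-algebra library is needed.

```python
import math


def entropy(E, V, N, b=1.0):
    """Eqn. 2.153 for a monatomic ideal gas (f = 3) in units with k_B = 1."""
    return N * (1.5 * math.log(E / N) + math.log(V / N) + b)


def hessian(f, x, h=1e-4):
    """Central-difference Hessian of f at the point x, using relative steps."""
    n = len(x)
    steps = [h * xi for xi in x]

    def shifted(i, j, si, sj):
        y = list(x)
        y[i] += si * steps[i]
        y[j] += sj * steps[j]
        return f(*y)

    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            H[i][j] = (shifted(i, j, 1, 1) - shifted(i, j, 1, -1)
                       - shifted(i, j, -1, 1) + shifted(i, j, -1, -1)) / (4.0 * steps[i] * steps[j])
    return H


x = (2.0, 3.0, 1.0)                  # (E, V, N), hypothetical state
Q = hessian(entropy, x)
Qx = [sum(Q[i][j] * x[j] for j in range(3)) for i in range(3)]   # should vanish
q_ee, q_ev, q_vv = Q[0][0], Q[0][1], Q[1][1]   # upper-left 2x2 block (Delta N = 0)
print(Qx, q_ee < 0, q_vv < 0, q_ee * q_vv > q_ev ** 2)
```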
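Many thermodynamic systems are held at fixed $(T, p, N)$, which suggests we examine the stability criteria for $G(T, p, N)$. Suppose our system is in equilibrium with a reservoir at temperature $T_0$ and pressure $p_0$. Then, suppressing $N$ (which is assumed constant), we have

$$
G(T_0, p_0) = E - T_0 S + p_0 V . \tag{2.250}
$$

Now suppose there is a fluctuation in the entropy and the volume of our system, which is held at fixed particle number. Going to second order in $\Delta S$ and $\Delta V$, we have

$$
\begin{aligned}
\Delta G &= \left[\left(\frac{\partial E}{\partial S}\right)_V - T_0\right] \Delta S + \left[\left(\frac{\partial E}{\partial V}\right)_S + p_0\right] \Delta V \\
&\quad + \frac{1}{2}\left[\frac{\partial^2 E}{\partial S^2}\, (\Delta S)^2 + 2\, \frac{\partial^2 E}{\partial S\, \partial V}\, \Delta S\, \Delta V + \frac{\partial^2 E}{\partial V^2}\, (\Delta V)^2\right] + \ldots
\end{aligned} \tag{2.251}
$$

Equilibrium requires that the coefficients of $\Delta S$ and $\Delta V$ both vanish, i.e. that $T = \left(\frac{\partial E}{\partial S}\right)_{V,N} = T_0$ and $p = -\left(\frac{\partial E}{\partial V}\right)_{S,N} = p_0$. The condition for stability is that $\Delta G > 0$ for all $(\Delta S, \Delta V)$. Stability therefore requires that the Hessian matrix $Q$ be positive definite, with

$$
Q = \begin{pmatrix} \dfrac{\partial^2 E}{\partial S^2} & \dfrac{\partial^2 E}{\partial S\, \partial V} \\[2ex] \dfrac{\partial^2 E}{\partial S\, \partial V} & \dfrac{\partial^2 E}{\partial V^2} \end{pmatrix} . \tag{2.252}
$$

Thus, we have the following three conditions:

$$
\frac{\partial^2 E}{\partial S^2} = \left(\frac{\partial T}{\partial S}\right)_V = \frac{T}{C_V} > 0 \tag{2.253}
$$

$$
\frac{\partial^2 E}{\partial V^2} = -\left(\frac{\partial p}{\partial V}\right)_S = \frac{1}{V \kappa_S} > 0 \tag{2.254}
$$

$$
\frac{\partial^2 E}{\partial S^2} \cdot \frac{\partial^2 E}{\partial V^2} - \left(\frac{\partial^2 E}{\partial S\, \partial V}\right)^2 = \frac{T}{V \kappa_S C_V} - \left(\frac{\partial T}{\partial V}\right)_S^2 > 0 . \tag{2.255}
$$

As we shall discuss below, the quantity $\alpha_S \equiv \frac{1}{V}\left(\frac{\partial V}{\partial T}\right)_{S,N}$ is the adiabatic thermal expansivity coefficient. Since $\left(\frac{\partial T}{\partial V}\right)_S = 1/(V \alpha_S)$, we therefore conclude that stability of any thermodynamic system requires

$$
\frac{C_V}{T} > 0 , \qquad \kappa_S > 0 , \qquad |\alpha_S| > \sqrt{\frac{\kappa_S C_V}{V T}} . \tag{2.256}
$$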
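Because the third stability condition comes from squaring $(\partial T/\partial V)_S$, it constrains the magnitude of $\alpha_S$, and the sign matters in practice: for an ideal gas $\alpha_S < 0$, since the gas cools on adiabatic expansion. The sketch below evaluates eqn. 2.255 and the bound for a monatomic ideal gas, using $\kappa_S = 1/\gamma p$ and $T V^{2/3} = \text{const}$ along adiabats, in units with $k_B = 1$ at a hypothetical state.

```python
import math

# Monatomic ideal gas in units with k_B = 1; the state (N, T, V) is hypothetical.
N, T, V = 1.0, 2.0, 5.0
gamma = 5.0 / 3.0
p = N * T / V
C_V = 1.5 * N
kappa_S = 1.0 / (gamma * p)        # adiabats obey p V^gamma = const
alpha_S = -1.5 / T                 # adiabats obey T V^(2/3) = const
dT_dV_S = 1.0 / (V * alpha_S)      # (dT/dV)_S

det_condition = T / (V * kappa_S * C_V) - dT_dV_S ** 2      # left side of eqn. 2.255
bound = math.sqrt(kappa_S * C_V / (V * T))
print(det_condition > 0, abs(alpha_S) > bound, alpha_S > bound)
```

The determinant condition holds and $|\alpha_S|$ exceeds the bound, while the signed $\alpha_S$ does not, so the criterion must be read with the absolute value.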

Figure 2.21: Adiabatic free expansion via a thermal path. The initial and final states do not lie along an adiabat! Rather, for an ideal gas, the initial and final states lie along an isotherm.

2.11 Applications of Thermodynamics

A discussion of various useful mathematical relations among partial derivatives may be found in the
appendix in §2.17. Some facility with the differential multivariable calculus is extremely useful in the
analysis of thermodynamics problems.

2.11.1 Adiabatic free expansion revisited

Consider once again the adiabatic free expansion of a gas from initial volume Vi to final volume Vf = rVi .
Since the system is not in equilibrium during the free expansion process, the initial and final states do
not lie along an adiabat, i.e. they do not have the same entropy. Rather, as we found, from Q = W = 0,
we have that Ei = Ef , which means they have the same energy, and, in the case of an ideal gas, the
same temperature (assuming N is constant). Thus, the initial and final states lie along an isotherm. The
situation is depicted in Fig. 2.21. Now let us compute the change in entropy ∆S = Sf − Si by integrating
along this isotherm. Note that the actual dynamics are irreversible and do not quasistatically follow
any continuous thermodynamic path. However, we can use a fictitious thermodynamic path as a means of comparing S in the initial and final states.
We have

$$\Delta S = S_f - S_i = \int_{V_i}^{V_f}\! dV\,\Big(\frac{\partial S}{\partial V}\Big)_{T,N} . \qquad (2.257)$$

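But from a Maxwell relation derived from F, we have

$$\Big(\frac{\partial S}{\partial V}\Big)_{T,N} = \Big(\frac{\partial p}{\partial T}\Big)_{V,N} , \qquad (2.258)$$

hence

$$\Delta S = \int_{V_i}^{V_f}\! dV\,\Big(\frac{\partial p}{\partial T}\Big)_{V,N} . \qquad (2.259)$$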

For an ideal gas, we can use the equation of state $pV = Nk_BT$ to obtain

$$\Big(\frac{\partial p}{\partial T}\Big)_{V,N} = \frac{Nk_B}{V} . \qquad (2.260)$$

The integral can now be computed:

$$\Delta S = \int_{V_i}^{rV_i}\! dV\,\frac{Nk_B}{V} = Nk_B\ln r , \qquad (2.261)$$

as we found before, in eqn. 2.156. What is different about this derivation? Previously, we derived the entropy change from the explicit formula for S(E, V, N). Here, we did not need to know this function. The Maxwell relation allowed us to compute the entropy change using only the equation of state.
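To make the point concrete, here is a minimal numerical sketch (our own, not from the notes) that evaluates eqn. 2.259 directly from an equation of state p(T, V), with $(\partial p/\partial T)_V$ taken by central differences and the volume integral done with Simpson's rule. It assumes units in which $Nk_B = 1$; the function names are illustrative.

```python
import math

def dp_dT(p, T, V, rel_step=1e-5):
    """(dp/dT) at fixed V (and N), by central differences."""
    h = rel_step * T
    return (p(T + h, V) - p(T - h, V)) / (2 * h)

def entropy_change_isotherm(p, T, Vi, Vf, n=1000):
    """Delta S = integral from Vi to Vf of (dp/dT)_{V,N} dV, eqn. 2.259,
    using composite Simpson's rule with an even number n of intervals."""
    dV = (Vf - Vi) / n
    total = dp_dT(p, T, Vi) + dp_dT(p, T, Vf)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * dp_dT(p, T, Vi + k * dV)
    return total * dV / 3

NkB = 1.0                            # units where N k_B = 1
ideal = lambda T, V: NkB * T / V     # ideal gas equation of state
r = 3.0
print(entropy_change_isotherm(ideal, T=300.0, Vi=1.0, Vf=r), math.log(r))
```

Passing a van der Waals p(T, v) instead should give $R\ln[(v_f - b)/(v_i - b)]$ per mole, since the $a/v^2$ term does not depend on temperature.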

2.11.2 Energy and volume

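We saw how $E(T,V,N) = \frac{1}{2}fNk_BT$ for an ideal gas, independent of the volume. In general we should have

$$E(T,V,N) = N\,\phi\Big(T, \frac{V}{N}\Big) . \qquad (2.262)$$

For the ideal gas, $\phi(T, V/N) = \frac{1}{2}fk_BT$ is a function of T alone and is independent of the other intensive quantity V/N. How does energy vary with volume? At fixed temperature and particle number, we have, from E = F + TS,

$$\Big(\frac{\partial E}{\partial V}\Big)_{T,N} = \Big(\frac{\partial F}{\partial V}\Big)_{T,N} + T\Big(\frac{\partial S}{\partial V}\Big)_{T,N} = -p + T\Big(\frac{\partial p}{\partial T}\Big)_{V,N} , \qquad (2.263)$$

where we have used the Maxwell relation $(\partial S/\partial V)_{T,N} = (\partial p/\partial T)_{V,N}$, derived from the mixed second derivative $\partial^2 F/\partial T\,\partial V$. Another way to derive this result is as follows. Write dE = T dS − p dV + µ dN and then express dS in terms of dT, dV, and dN, resulting in

$$dE = T\Big(\frac{\partial S}{\partial T}\Big)_{V,N} dT + \bigg[T\Big(\frac{\partial S}{\partial V}\Big)_{T,N} - p\bigg] dV + \bigg[\mu - T\Big(\frac{\partial \mu}{\partial T}\Big)_{V,N}\bigg] dN , \qquad (2.264)$$

where in the last term we have also used the Maxwell relation $(\partial S/\partial N)_{T,V} = -(\partial\mu/\partial T)_{V,N}$. Now read off $(\partial E/\partial V)_{T,N}$ and use the same Maxwell relation as before to recover eqn. 2.263. Applying this result to the ideal gas law $pV = Nk_BT$ results in the vanishing of the RHS, hence for any substance obeying the ideal gas law we must have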

E(T, V, N ) = ν ε(T ) = N ε(T )/NA . (2.265)
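The statement that $T(\partial p/\partial T)_V - p$ vanishes whenever p is linear in T at fixed V/N is easy to verify numerically. The sketch below (illustrative only) evaluates this combination by central differences, per mole, for the ideal gas, for a hypothetical equation of state of the form p = T·f(v) with an arbitrarily chosen f, and for the van der Waals gas with the O2 constants of table 2.3, where the result should instead equal $a/v^2$ (eqn. 2.267). Units are L, bar, mol, K, with R = 0.08314462618 L·bar/(mol·K).

```python
def internal_pressure(p, T, v, rel_step=1e-5):
    """T (dp/dT)_v - p, which by eqn. 2.263 equals (dE/dV)_{T,N} (per mole here)."""
    h = rel_step * T
    dpdT = (p(T + h, v) - p(T - h, v)) / (2 * h)
    return T * dpdT - p(T, v)

R = 0.08314462618          # L·bar/(mol·K)
a, b = 1.378, 0.03183      # van der Waals constants for O2, table 2.3

ideal = lambda T, v: R * T / v
scaled = lambda T, v: T * (0.01 / v + 1.0 / v**2)   # hypothetical p = T f(v)
vdw = lambda T, v: R * T / (v - b) - a / v**2

T, v = 300.0, 22.4
for name, eos in [("ideal", ideal), ("T f(v)", scaled), ("van der Waals", vdw)]:
    print(f"{name:14s} {internal_pressure(eos, T, v):+.6e}")
print(f"a/v^2          {a / v**2:+.6e}")
```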



  
Gas              a (L²·bar/mol²)   b (L/mol)   p_c (bar)   T_c (K)   v_c (L/mol)
Acetone          14.09             0.0994      52.82       505.1     0.2982
Argon            1.363             0.03219     48.72       150.9     0.0966
Carbon dioxide   3.640             0.04267     74.04       304.0     0.1280
Ethanol          12.18             0.08407     63.83       516.3     0.2522
Freon            10.78             0.0998      40.09       384.9     0.2994
Helium           0.03457           0.0237      2.279       5.198     0.0711
Hydrogen         0.2476            0.02661     12.95       33.16     0.0798
Mercury          8.200             0.01696     1055        1723      0.0509
Methane          2.283             0.04278     46.20       190.2     0.1283
Nitrogen         1.408             0.03913     34.06       128.2     0.1174
Oxygen           1.378             0.03183     50.37       154.3     0.0955
Water            5.536             0.03049     220.6       647.0     0.0915

Table 2.3: Van der Waals parameters for some common gases. (Source: Wikipedia.)
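The $p_c$, $T_c$ and $v_c$ columns of table 2.3 are consistent with the van der Waals relations $v_c = 3b$, $p_c = a/27b^2$, $T_c = 8a/27bR$ of eqn. 2.323 (for water, for instance, $5.536/(27\times 0.03049^2) \approx 220.6$ bar). A short sketch that regenerates a few rows from the a and b columns, assuming R = 0.08314462618 L·bar/(mol·K):

```python
R = 0.08314462618   # gas constant in L·bar/(mol·K)

def vdw_critical_point(a, b):
    """Critical constants of the van der Waals gas (eqn. 2.323):
    v_c = 3b, p_c = a / (27 b^2), T_c = 8a / (27 b R).
    With a in L^2·bar/mol^2 and b in L/mol: v_c in L/mol, p_c in bar, T_c in K."""
    return 3.0 * b, a / (27.0 * b**2), 8.0 * a / (27.0 * b * R)

# A few rows of table 2.3: (a, b)
gases = {
    "Carbon dioxide": (3.640, 0.04267),
    "Hydrogen": (0.2476, 0.02661),
    "Water": (5.536, 0.03049),
}
for name, (a, b) in gases.items():
    vc, pc, Tc = vdw_critical_point(a, b)
    print(f"{name:15s} p_c = {pc:7.2f} bar   T_c = {Tc:6.1f} K   v_c = {vc:.4f} L/mol")
```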

2.11.3 van der Waals equation of state

It is clear that the same conclusion follows for any equation of state of the form p(T, V, N) = T·f(V/N), where f(V/N) is an arbitrary function of its argument: the energy remains independent of the volume, E(T, V, N) = ν ε(T)¹¹. This is not true, however, for the van der Waals equation of state,

$$\Big(p + \frac{a}{v^2}\Big)(v - b) = RT , \qquad (2.266)$$

where $v = N_AV/N$ is the molar volume. We then find (always assuming constant N),

$$\Big(\frac{\partial E}{\partial V}\Big)_T = \Big(\frac{\partial \varepsilon}{\partial v}\Big)_T = T\Big(\frac{\partial p}{\partial T}\Big)_V - p = \frac{a}{v^2} , \qquad (2.267)$$

where E(T, V, N) ≡ ν ε(T, v). We can integrate this to obtain

$$\varepsilon(T,v) = \omega(T) - \frac{a}{v} , \qquad (2.268)$$

where ω(T) is arbitrary. From eqn. 2.33, we immediately have

$$c_V = \Big(\frac{\partial \varepsilon}{\partial T}\Big)_v = \omega'(T) . \qquad (2.269)$$

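What about $c_p$? This requires a bit of work. We start with eqn. 2.34,

$$c_p = \Big(\frac{\partial \varepsilon}{\partial T}\Big)_p + p\Big(\frac{\partial v}{\partial T}\Big)_p = \omega'(T) + \Big(p + \frac{a}{v^2}\Big)\Big(\frac{\partial v}{\partial T}\Big)_p \qquad (2.270)$$

¹¹ Note $V/N = v/N_A$.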

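We next take the differential of the equation of state (at constant N):

$$\begin{aligned} R\,dT &= \Big(p + \frac{a}{v^2}\Big)dv + (v - b)\Big(dp - \frac{2a}{v^3}\,dv\Big) \\ &= \Big(p - \frac{a}{v^2} + \frac{2ab}{v^3}\Big)dv + (v - b)\,dp . \end{aligned} \qquad (2.271)$$

We can now read off the result for the volume expansion coefficient,

$$\alpha_p = \frac{1}{v}\Big(\frac{\partial v}{\partial T}\Big)_p = \frac{1}{v}\cdot\frac{R}{p - \frac{a}{v^2} + \frac{2ab}{v^3}} . \qquad (2.272)$$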

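We now have for $c_p$,

$$\begin{aligned} c_p &= \omega'(T) + \frac{\big(p + \frac{a}{v^2}\big)R}{p - \frac{a}{v^2} + \frac{2ab}{v^3}} \\ &= \omega'(T) + \frac{R^2Tv^3}{RTv^3 - 2a(v - b)^2} , \end{aligned} \qquad (2.273)$$

where $v = VN_A/N$ is the molar volume.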


To fix ω(T), we consider the v → ∞ limit, where the density of the gas vanishes. In this limit, the gas must be ideal, hence eqn. 2.268 says that $\omega(T) = \frac{1}{2}fRT$. Therefore $c_V(T,v) = \frac{1}{2}fR$, just as in the case of an ideal gas. However, rather than $c_p = c_V + R$, which holds for ideal gases, $c_p(T,v)$ is given by eqn. 2.273. Thus,

$$c_V^{\rm VDW} = \tfrac{1}{2}fR \qquad (2.274)$$

$$c_p^{\rm VDW} = \tfrac{1}{2}fR + \frac{R^2Tv^3}{RTv^3 - 2a(v - b)^2} . \qquad (2.275)$$

Note that $c_p(a \to 0) = c_V + R$, which is the ideal gas result.
As we shall see in chapter 7, the van der Waals system is unstable throughout a region of parameters, where it undergoes phase separation between high density (liquid) and low density (gas) phases. The above results are valid only in the stable regions of the phase diagram.
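A quick consistency check of the algebra leading to eqn. 2.275 (a sketch, not part of the original notes): the code below evaluates both lines of eqn. 2.273, the first using p from the equation of state, and confirms that they agree and that a → 0 recovers $c_V + R$. The N2 constants of table 2.3 are converted to SI units, and T = 300 K, v = 22.4 L/mol and f = 5 are assumed illustrative values.

```python
R = 8.314462618  # J/(mol·K)

def vdw_pressure(T, v, a, b):
    """Van der Waals equation of state, eqn. 2.266, solved for p."""
    return R * T / (v - b) - a / v**2

def cp_vdw(T, v, a, b, f):
    """Molar heat capacity at constant pressure, eqn. 2.275."""
    return 0.5 * f * R + R**2 * T * v**3 / (R * T * v**3 - 2.0 * a * (v - b)**2)

def cp_vdw_first_line(T, v, a, b, f):
    """First line of eqn. 2.273, with omega'(T) = f R / 2 and p from the equation of state."""
    p = vdw_pressure(T, v, a, b)
    return 0.5 * f * R + (p + a / v**2) * R / (p - a / v**2 + 2.0 * a * b / v**3)

# N2 from table 2.3 in SI units: 1 L^2·bar = 0.1 Pa·m^6, 1 L = 1e-3 m^3
a, b, f = 1.408 * 0.1, 0.03913e-3, 5
T, v = 300.0, 22.4e-3                     # K, m^3/mol
print(cp_vdw(T, v, a, b, f), cp_vdw_first_line(T, v, a, b, f), 3.5 * R)
```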

2.11.4 Thermodynamic response functions

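Consider the entropy S expressed as a function of T, V, and N:

$$dS = \Big(\frac{\partial S}{\partial T}\Big)_{V,N} dT + \Big(\frac{\partial S}{\partial V}\Big)_{T,N} dV + \Big(\frac{\partial S}{\partial N}\Big)_{T,V} dN . \qquad (2.276)$$

Dividing by dT at constant pressure, multiplying by T, and assuming dN = 0 throughout, we have

$$C_p - C_V = T\Big(\frac{\partial S}{\partial V}\Big)_T\Big(\frac{\partial V}{\partial T}\Big)_p . \qquad (2.277)$$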

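Appealing to a Maxwell relation derived from F(T, V, N), and then appealing to eqn. 2.479, we have

$$\Big(\frac{\partial S}{\partial V}\Big)_T = \Big(\frac{\partial p}{\partial T}\Big)_V = -\Big(\frac{\partial p}{\partial V}\Big)_T\Big(\frac{\partial V}{\partial T}\Big)_p . \qquad (2.278)$$

This allows us to write

$$C_p - C_V = -T\Big(\frac{\partial p}{\partial V}\Big)_T\Big(\frac{\partial V}{\partial T}\Big)_p^2 . \qquad (2.279)$$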

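We define the response functions,

$$\text{isothermal compressibility:}\quad \kappa_T = -\frac{1}{V}\Big(\frac{\partial V}{\partial p}\Big)_T = -\frac{1}{V}\,\frac{\partial^2 G}{\partial p^2} \qquad (2.280)$$

$$\text{adiabatic compressibility:}\quad \kappa_S = -\frac{1}{V}\Big(\frac{\partial V}{\partial p}\Big)_S = -\frac{1}{V}\,\frac{\partial^2 H}{\partial p^2} \qquad (2.281)$$

$$\text{thermal expansivity:}\quad \alpha_p = \frac{1}{V}\Big(\frac{\partial V}{\partial T}\Big)_p . \qquad (2.282)$$

Thus,

$$C_p - C_V = \frac{VT\alpha_p^2}{\kappa_T} , \qquad (2.283)$$

or, in terms of intensive quantities,

$$c_p - c_V = \frac{vT\alpha_p^2}{\kappa_T} , \qquad (2.284)$$

where, as always, $v = VN_A/N$ is the molar volume.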
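The above relation generalizes to any conjugate force-displacement pair (−p, V) → (y, X):

$$\begin{aligned} C_y - C_X &= -T\Big(\frac{\partial y}{\partial T}\Big)_X\Big(\frac{\partial X}{\partial T}\Big)_y \\ &= T\Big(\frac{\partial y}{\partial X}\Big)_T\Big(\frac{\partial X}{\partial T}\Big)_y^2 . \end{aligned} \qquad (2.285)$$

For example, we could have $(y, X) = (H^\alpha, M^\alpha)$.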
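The relation can be checked numerically for a specific equation of state. Using the triple-product identity $(\partial v/\partial T)_p = -(\partial p/\partial T)_v/(\partial p/\partial v)_T$, eqn. 2.279 becomes, per mole, $c_p - c_V = -T(\partial p/\partial T)_v^2/(\partial p/\partial v)_T$, which needs only p(T, v). The sketch below (illustrative; N2 constants from table 2.3 converted to SI, and the state point T = 300 K, v = 22.4 L/mol assumed) compares a finite-difference evaluation with the van der Waals result $R^2Tv^3/[RTv^3 - 2a(v-b)^2]$ from eqn. 2.275, and with R for the ideal gas.

```python
R = 8.314462618  # J/(mol·K)

def cp_minus_cv(p, T, v, rel_step=1e-5):
    """c_p - c_V = -T (dp/dT)_v^2 / (dp/dv)_T, the molar form of eqn. 2.279
    rewritten with the triple-product rule, evaluated by central differences."""
    hT, hv = rel_step * T, rel_step * v
    dp_dT = (p(T + hT, v) - p(T - hT, v)) / (2 * hT)
    dp_dv = (p(T, v + hv) - p(T, v - hv)) / (2 * hv)
    return -T * dp_dT**2 / dp_dv

a, b = 0.1408, 0.03913e-3                   # N2 in SI units (Pa·m^6/mol^2, m^3/mol)
vdw = lambda T, v: R * T / (v - b) - a / v**2
ideal = lambda T, v: R * T / v

T, v = 300.0, 22.4e-3
closed_form = R**2 * T * v**3 / (R * T * v**3 - 2 * a * (v - b)**2)   # eqn. 2.275 minus c_V
print(cp_minus_cv(ideal, T, v) / R, cp_minus_cv(vdw, T, v) / closed_form)
```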


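A similar relationship can be derived between the compressibilities $\kappa_T$ and $\kappa_S$. This time we start with the volume, writing

$$dV = \Big(\frac{\partial V}{\partial p}\Big)_{S,N} dp + \Big(\frac{\partial V}{\partial S}\Big)_{p,N} dS + \Big(\frac{\partial V}{\partial N}\Big)_{S,p} dN . \qquad (2.286)$$

Dividing by dp at constant temperature, multiplying by $-V^{-1}$, and keeping N constant, we have

$$\kappa_T - \kappa_S = -\frac{1}{V}\Big(\frac{\partial V}{\partial S}\Big)_p\Big(\frac{\partial S}{\partial p}\Big)_T . \qquad (2.287)$$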

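Again we appeal to a Maxwell relation, writing

$$\Big(\frac{\partial S}{\partial p}\Big)_T = -\Big(\frac{\partial V}{\partial T}\Big)_p , \qquad (2.288)$$

and after invoking the chain rule,

$$\Big(\frac{\partial V}{\partial S}\Big)_p = \Big(\frac{\partial V}{\partial T}\Big)_p\Big(\frac{\partial T}{\partial S}\Big)_p = \frac{T}{C_p}\Big(\frac{\partial V}{\partial T}\Big)_p , \qquad (2.289)$$

we obtain

$$\kappa_T - \kappa_S = \frac{vT\alpha_p^2}{c_p} . \qquad (2.290)$$

Comparing eqns. 2.284 and 2.290, we find

$$(c_p - c_V)\,\kappa_T = (\kappa_T - \kappa_S)\,c_p = vT\alpha_p^2 . \qquad (2.291)$$

This result entails

$$\frac{c_p}{c_V} = \frac{\kappa_T}{\kappa_S} . \qquad (2.292)$$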
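For the ideal gas everything here can be done by hand, but a small numerical sketch helps confirm eqns. 2.290 and 2.292. It assumes a diatomic ideal gas (f = 5), uses the ideal-gas adiabat $pv^\gamma = \text{const}$ with $\gamma = c_p/c_V$, and computes both compressibilities by central differences at an assumed state point T = 300 K, p = 10⁵ Pa.

```python
R = 8.314462618  # J/(mol·K)

def compressibility(v_of_p, p, rel_step=1e-6):
    """-(1/v) dv/dp along a given path v(p), by central differences."""
    h = rel_step * p
    return -(v_of_p(p + h) - v_of_p(p - h)) / (2 * h) / v_of_p(p)

f = 5                                   # diatomic ideal gas (assumed)
cV, cp = 0.5 * f * R, (0.5 * f + 1) * R
gamma = cp / cV
T0, p0 = 300.0, 1.0e5                   # K, Pa
v0 = R * T0 / p0

isotherm = lambda p: R * T0 / p                       # v(p) at fixed T
adiabat = lambda p: v0 * (p0 / p) ** (1.0 / gamma)    # p v^gamma = const through (T0, p0)

kT = compressibility(isotherm, p0)
kS = compressibility(adiabat, p0)
alpha = 1.0 / T0                                      # alpha_p = 1/T for the ideal gas
print(kT / kS, gamma)                                 # eqn. 2.292
print(kT - kS, v0 * T0 * alpha**2 / cp)               # eqn. 2.290
```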
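The corresponding result for magnetic systems is

$$(c_H - c_M)\,\chi_T = (\chi_T - \chi_S)\,c_H = T\Big(\frac{\partial m}{\partial T}\Big)_H^2 , \qquad (2.293)$$

where m = M/ν is the magnetization per mole of substance, and

$$\text{isothermal susceptibility:}\quad \chi_T = \Big(\frac{\partial m}{\partial H}\Big)_T = -\frac{1}{\nu}\,\frac{\partial^2 G}{\partial H^2} \qquad (2.294)$$

$$\text{adiabatic susceptibility:}\quad \chi_S = \Big(\frac{\partial m}{\partial H}\Big)_S = -\frac{1}{\nu}\,\frac{\partial^2 H}{\partial H^2} . \qquad (2.295)$$

Here the enthalpy and Gibbs free energy are

$$H = E - HM \qquad\qquad dH = T\,dS - M\,dH \qquad (2.296)$$

$$G = E - TS - HM \qquad\qquad dG = -S\,dT - M\,dH . \qquad (2.297)$$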

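Remark: The previous discussion has assumed an isotropic magnetic system where M and H are collinear, hence H·M = HM. More generally, the susceptibilities are tensors,

$$\chi_T^{\alpha\beta} = \Big(\frac{\partial m^\alpha}{\partial H^\beta}\Big)_T = -\frac{1}{\nu}\,\frac{\partial^2 G}{\partial H^\alpha\,\partial H^\beta} \qquad (2.298)$$

$$\chi_S^{\alpha\beta} = \Big(\frac{\partial m^\alpha}{\partial H^\beta}\Big)_S = -\frac{1}{\nu}\,\frac{\partial^2 H}{\partial H^\alpha\,\partial H^\beta} . \qquad (2.299)$$

In this case, the enthalpy and Gibbs free energy are

$$H = E - \boldsymbol{H}\cdot\boldsymbol{M} \qquad\qquad dH = T\,dS - \boldsymbol{M}\cdot d\boldsymbol{H} \qquad (2.300)$$

$$G = E - TS - \boldsymbol{H}\cdot\boldsymbol{M} \qquad\qquad dG = -S\,dT - \boldsymbol{M}\cdot d\boldsymbol{H} . \qquad (2.301)$$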

2.11.5 Joule effect: free expansion of a gas

Previously we considered the adiabatic free expansion of an ideal gas. We found that Q = W = 0 hence ∆E = 0, which means the process is isothermal, since E = νε(T) is volume-independent. The entropy changes, however, since $S(E,V,N) = Nk_B\ln(V/N) + \frac{1}{2}fNk_B\ln(E/N) + Ns_0$. Thus,

$$S_f = S_i + Nk_B\ln\Big(\frac{V_f}{V_i}\Big) . \qquad (2.302)$$

What happens if the gas is nonideal?


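We integrate along a fictitious thermodynamic path connecting initial and final states, where dE = 0 along the path. We have

$$0 = dE = \Big(\frac{\partial E}{\partial V}\Big)_T dV + \Big(\frac{\partial E}{\partial T}\Big)_V dT \qquad (2.303)$$

hence

$$\Big(\frac{\partial T}{\partial V}\Big)_E = -\frac{(\partial E/\partial V)_T}{(\partial E/\partial T)_V} = -\frac{1}{C_V}\Big(\frac{\partial E}{\partial V}\Big)_T . \qquad (2.304)$$

We also have

$$\Big(\frac{\partial E}{\partial V}\Big)_T = T\Big(\frac{\partial S}{\partial V}\Big)_T - p = T\Big(\frac{\partial p}{\partial T}\Big)_V - p . \qquad (2.305)$$

Thus,

$$\Big(\frac{\partial T}{\partial V}\Big)_E = \frac{1}{C_V}\bigg[p - T\Big(\frac{\partial p}{\partial T}\Big)_V\bigg] . \qquad (2.306)$$

Note that the term in square brackets vanishes for any system obeying the ideal gas law. For a nonideal gas,

$$\Delta T = \int_{V_i}^{V_f}\! dV\,\Big(\frac{\partial T}{\partial V}\Big)_E , \qquad (2.307)$$

which is in general nonzero.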


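Now consider a van der Waals gas, for which

$$\Big(p + \frac{a}{v^2}\Big)(v - b) = RT .$$

We then have

$$p - T\Big(\frac{\partial p}{\partial T}\Big)_V = -\frac{a}{v^2} = -\frac{a\nu^2}{V^2} . \qquad (2.308)$$

In §2.11.3 we concluded that $C_V = \frac{1}{2}f\nu R$ for the van der Waals gas, hence

$$\Delta T = -\frac{2a\nu}{fR}\int_{V_i}^{V_f}\!\frac{dV}{V^2} = \frac{2a}{fR}\bigg(\frac{1}{v_f} - \frac{1}{v_i}\bigg) . \qquad (2.309)$$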

Figure 2.22: In a throttle, a gas is pushed through a porous plug separating regions of different pressure.
The change in energy is the work done, hence enthalpy is conserved during the throttling process.

Thus, if $V_f > V_i$, we have $T_f < T_i$ and the gas cools upon expansion.
Consider O2 gas with an initial molar volume of $v_i = 22.4$ L/mol, which is the STP value for an ideal gas, freely expanding to a volume $v_f = \infty$ for maximum cooling. According to table 2.3, a = 1.378 L²·bar/mol², and taking f = 5 for a diatomic gas we have $\Delta T = -2a/fRv_i = -0.296$ K, which is a pitifully small amount of cooling. Adiabatic free expansion is a very inefficient way to cool a gas.
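The O2 estimate above is a one-line evaluation of eqn. 2.309; a sketch, assuming R in L·bar/(mol·K) so that a from table 2.3 can be used directly:

```python
R = 0.08314462618  # L·bar/(mol·K)

def joule_delta_T(a, f, v_i, v_f=float("inf")):
    """Temperature change in the free expansion of a van der Waals gas, eqn. 2.309:
    Delta T = (2a / f R) (1/v_f - 1/v_i); v_f = inf gives the maximum cooling."""
    return (2.0 * a / (f * R)) * (1.0 / v_f - 1.0 / v_i)

# O2: a from table 2.3, f = 5 for a diatomic molecule, v_i = 22.4 L/mol
print(f"{joule_delta_T(a=1.378, f=5, v_i=22.4):.3f} K")   # prints -0.296 K
```

Taking a finite $v_f$, e.g. doubling the volume, halves the cooling.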

2.11.6 Throttling: the Joule-Thomson effect

In a throttle, depicted in Fig. 2.22, a gas is forced through a porous plug which separates regions of different pressures. According to the figure, the work done by a given element of gas is

$$W = \int_0^{V_f}\! dV\,p_f - \int_0^{V_i}\! dV\,p_i = p_fV_f - p_iV_i . \qquad (2.310)$$

Now we assume that the system is thermally isolated so that the gas exchanges no heat with its environment, nor with the plug. Then Q = 0 so ∆E = −W, and

$$E_i + p_iV_i = E_f + p_fV_f \qquad (2.311)$$

$$H_i = H_f , \qquad (2.312)$$

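where H is enthalpy. Thus, the throttling process is isenthalpic. We can therefore study it by defining a fictitious thermodynamic path along which dH = 0. Then, choosing T and p as state variables,

$$0 = dH = \Big(\frac{\partial H}{\partial T}\Big)_p dT + \Big(\frac{\partial H}{\partial p}\Big)_T dp \qquad (2.313)$$

hence

$$\Big(\frac{\partial T}{\partial p}\Big)_H = -\frac{(\partial H/\partial p)_T}{(\partial H/\partial T)_p} . \qquad (2.314)$$

The numerator on the RHS is computed by writing dH = T dS + V dp and then dividing by dp, to obtain

$$\Big(\frac{\partial H}{\partial p}\Big)_T = V + T\Big(\frac{\partial S}{\partial p}\Big)_T = V - T\Big(\frac{\partial V}{\partial T}\Big)_p . \qquad (2.315)$$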

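The denominator is

$$\Big(\frac{\partial H}{\partial T}\Big)_p = \Big(\frac{\partial H}{\partial S}\Big)_p\Big(\frac{\partial S}{\partial T}\Big)_p = T\Big(\frac{\partial S}{\partial T}\Big)_p = C_p . \qquad (2.316)$$

Thus,

$$\Big(\frac{\partial T}{\partial p}\Big)_H = \frac{1}{c_p}\bigg[T\Big(\frac{\partial v}{\partial T}\Big)_p - v\bigg] = \frac{v}{c_p}\big(T\alpha_p - 1\big) , \qquad (2.317)$$

where $\alpha_p = \frac{1}{V}\big(\frac{\partial V}{\partial T}\big)_p$ is the volume expansion coefficient.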
From the van der Waals equation of state, we obtain, from eqn. 2.272,

$$T\alpha_p = \frac{T}{v}\Big(\frac{\partial v}{\partial T}\Big)_p = \frac{RT/v}{p - \frac{a}{v^2} + \frac{2ab}{v^3}} = \frac{v - b}{v - \frac{2a}{RT}\big(\frac{v - b}{v}\big)^2} . \qquad (2.318)$$

Assuming $v \gg a/RT,\ b$, we have

$$\Big(\frac{\partial T}{\partial p}\Big)_H = \frac{1}{c_p}\Big(\frac{2a}{RT} - b\Big) . \qquad (2.319)$$

Thus, for $T > T^* = 2a/Rb$, we have $(\partial T/\partial p)_H < 0$ and the gas heats up upon an isenthalpic pressure decrease. For $T < T^*$, the gas cools under such conditions.
In fact, there are two inversion temperatures $T^*_{1,2}$ for the van der Waals gas. To see this, we set $T\alpha_p = 1$, which is the criterion for inversion. From eqn. 2.318 it is easy to derive
$$\frac{b}{v} = 1 - \sqrt{\frac{bRT}{2a}} . \qquad (2.320)$$

We insert this into the van der Waals equation of state to derive a relationship $T = T^*(p)$ at which $T\alpha_p = 1$ holds. After a little work, we find

$$p = -\frac{3RT}{2b} + \sqrt{\frac{8aRT}{b^3}} - \frac{a}{b^2} . \qquad (2.321)$$

This is a quadratic equation for $\sqrt{T}$, the solution of which is

$$T^*(p) = \frac{2a}{9bR}\bigg(2 \pm \sqrt{1 - \frac{3b^2p}{a}}\,\bigg)^2 . \qquad (2.322)$$

In Fig. 2.23 we plot pressure versus temperature in scaled units, showing the curve along which $(\partial T/\partial p)_H = 0$. The volume, pressure, and temperature scales defined are

$$v_c = 3b \quad,\quad p_c = \frac{a}{27\,b^2} \quad,\quad T_c = \frac{8a}{27\,bR} . \qquad (2.323)$$

Figure 2.23: Inversion temperature T ∗ (p) for the van der Waals gas. Pressure and temperature are given
in terms of pc = a/27b2 and Tc = 8a/27bR, respectively.

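Values for $p_c$, $T_c$, and $v_c$ are provided in table 2.3. If we define $\bar v = v/v_c$, $\bar p = p/p_c$, and $\bar T = T/T_c$, then the van der Waals equation of state may be written in dimensionless form:

$$\Big(\bar p + \frac{3}{\bar v^2}\Big)\big(3\bar v - 1\big) = 8\bar T . \qquad (2.324)$$

In terms of the scaled parameters, the equation for the inversion curve $(\partial T/\partial p)_H = 0$ becomes

$$\bar p = 9 - 36\bigg(1 - \sqrt{\tfrac{1}{3}\bar T}\bigg)^2 \quad\Longleftrightarrow\quad \bar T = 3\bigg(1 \pm \tfrac{1}{2}\sqrt{1 - \tfrac{1}{9}\bar p}\,\bigg)^2 . \qquad (2.325)$$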

Thus, there is no inversion for $p > 9\,p_c$. We are usually interested in the upper inversion temperature, $T_2^*$, corresponding to the upper sign in eqn. 2.322. The maximum inversion temperature occurs for p = 0, where $T^*_{\rm max} = 2a/bR = \frac{27}{4}\,T_c$. For H2, from the data in table 2.3, we find $T^*_{\rm max}({\rm H_2}) = 224$ K, which is within 10% of the experimentally measured value of 205 K.
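The inversion curve and the numbers quoted for H2 can be reproduced with a short sketch of eqns. 2.320-2.322 (illustrative code; a and b for H2 from table 2.3, R in L·bar/(mol·K)). As a self-check, the second function inserts $b/v = 1 - \sqrt{bRT/2a}$ back into the van der Waals equation; at either root $T^*(p)$ it should return the pressure p that was put in. At p = 0 the upper root corresponds to v → ∞, so that check is only meaningful for p > 0.

```python
import math

R = 0.08314462618  # L·bar/(mol·K)

def inversion_temperatures(p, a, b):
    """Lower and upper van der Waals inversion temperatures at pressure p, eqn. 2.322.
    Returns None above the maximum inversion pressure 9 p_c = a / (3 b^2)."""
    disc = 1.0 - 3.0 * b**2 * p / a
    if disc < 0:
        return None
    s = math.sqrt(disc)
    prefactor = 2.0 * a / (9.0 * b * R)
    return prefactor * (2.0 - s)**2, prefactor * (2.0 + s)**2

def vdw_pressure_on_inversion_curve(T, a, b):
    """Pressure from the equation of state with b/v = 1 - sqrt(bRT/2a), eqn. 2.320 (needs p > 0)."""
    v = b / (1.0 - math.sqrt(b * R * T / (2.0 * a)))
    return R * T / (v - b) - a / v**2

a, b = 0.2476, 0.02661              # H2 from table 2.3
Tc = 8.0 * a / (27.0 * b * R)
T_low, T_max = inversion_temperatures(0.0, a, b)
print(f"T*_max(H2) = {T_max:.1f} K = {T_max / Tc:.3f} T_c")
```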
 
What happens when H2 gas leaks from a container with $T > T_2^*$? Since $(\partial T/\partial p)_H < 0$ and ∆p < 0, we have ∆T > 0. The gas warms up, and the heat facilitates the reaction 2 H2 + O2 → 2 H2O, which releases energy, and we have a nice explosion.

2.12 Phase Transitions and Phase Equilibria

A typical phase diagram of a p-V-T system is shown in Fig. 2.24(a). The solid lines delineate boundaries between distinct thermodynamic phases. These lines are called coexistence curves. Along these curves, we can have coexistence of two phases, and the thermodynamic potentials are singular. The order of the singularity is often taken as a classification of the phase transition. I.e. if the thermodynamic potentials E, F, G, and H have discontinuous or divergent m-th derivatives, the transition between the respective phases is said to be m-th order. Modern theories of phase transitions generally only recognize two possibilities: first order transitions, where the order parameter changes discontinuously through the transition, and second order transitions, where the order parameter vanishes continuously at the boundary from ordered to disordered phases¹². We’ll discuss order parameters during Physics 140B.

Figure 2.24: (a) Typical thermodynamic phase diagram of a single component p-V-T system, showing triple point (three phase coexistence) and critical point. (Source: Univ. of Helsinki.) Also shown: phase diagrams for ³He (b) and ⁴He (c). What a difference a neutron makes! (Source: Britannica.)
For a more interesting phase diagram, see Fig. 2.24(b,c), which displays the phase diagrams for ³He and ⁴He. The only difference between these two atoms is that the former has one fewer neutron: (2p + 1n + 2e) in ³He versus (2p + 2n + 2e) in ⁴He. As we shall learn when we study quantum statistics, this extra neutron makes all the difference, because ³He is a fermion while ⁴He is a boson.

2.12.1 p-v-T surfaces

The equation of state for a single component system may be written as

f (p, v, T ) = 0 . (2.326)

This may in principle be inverted to yield p = p(v, T) or v = v(T, p) or T = T(p, v). The single constraint f(p, v, T) = 0 on the three state variables defines a surface in {p, v, T} space. An example of such a surface is shown in Fig. 2.25, for the ideal gas.
Real p-v-T surfaces are much richer than that for the ideal gas, because real systems undergo phase transitions in which thermodynamic properties are singular or discontinuous along certain curves on the p-v-T surface. An example is shown in Fig. 2.26. The high temperature isotherms resemble those of the ideal gas, but as one cools below the critical temperature $T_c$, the isotherms become singular. Precisely at $T = T_c$, the isotherm $p = p(v, T_c)$ becomes perfectly horizontal at $v = v_c$, which is the critical molar volume. This means that the isothermal compressibility, $\kappa_T = -\frac{1}{v}\big(\frac{\partial v}{\partial p}\big)_T$, diverges at $T = T_c$. Below $T_c$, the isotherms have a flat portion, as shown in Fig. 2.28, corresponding to a two-phase region where liquid and vapor coexist. In the (p, T) plane, sketched for H2O in Fig. 2.4 and shown for CO2 in Fig. 2.29, this liquid-vapor phase coexistence occurs along a curve, called the vaporization (or boiling) curve. The density changes discontinuously across this curve; for H2O, the liquid is approximately 1000 times denser than the vapor at atmospheric pressure. The density discontinuity vanishes at the critical point. Note that one can continuously transform between liquid and vapor phases, without encountering any phase transitions, by going around the critical point and avoiding the two-phase region.

¹² Some exotic phase transitions in quantum matter, which do not quite fit the usual classification schemes, have recently been proposed.

Figure 2.25: The surface p(v, T) = RT/v corresponding to the ideal gas equation of state, and its projections onto the (p, T), (p, v), and (T, v) planes.
In addition to liquid-vapor coexistence, solid-liquid and solid-vapor coexistence also occur, as shown in Fig. 2.26. The triple point $(T_t, p_t)$ lies at the confluence of these three coexistence regions. For H2O, the locations of the triple point and critical point are given by

T_t = 273.16 K                          T_c = 647 K
p_t = 611.7 Pa = 6.037 × 10⁻³ atm       p_c = 22.06 MPa = 217.7 atm

Figure 2.26: A p-v-T surface for a substance which contracts upon freezing. The red dot is the critical
point and the red dashed line is the critical isotherm. The yellow dot is the triple point at which there
is three phase coexistence of solid, liquid, and vapor.

2.12.2 The Clausius-Clapeyron relation

Recall that the homogeneity of E(S, V, N ) guaranteed E = T S − pV + µN , from Euler’s theorem. It also
guarantees a relation between the intensive variables T , p, and µ, according to eqn. 2.148. Let us define
g ≡ G/ν = NA µ, the Gibbs free energy per mole. Then

dg = −s dT + v dp , (2.327)

where s = S/ν and v = V /ν are the molar entropy and molar volume, respectively. Along a coexistence
curve between phase #1 and phase #2, we must have g1 = g2 , since the phases are free to exchange
energy and particle number, i.e. they are in thermal and chemical equilibrium. This means

dg1 = −s1 dT + v1 dp = −s2 dT + v2 dp = dg2 . (2.328)

Therefore, along the coexistence curve we must have

$$\Big(\frac{dp}{dT}\Big)_{\rm coex} = \frac{s_2 - s_1}{v_2 - v_1} = \frac{\ell}{T\,\Delta v} , \qquad (2.329)$$

where

$$\ell \equiv T\,\Delta s = T(s_2 - s_1) \qquad (2.330)$$

is the molar latent heat of transition. A heat ℓ must be supplied in order to change from phase #1 to phase #2, even without changing p or T. If ℓ is the latent heat per mole, then we write ℓ̃ as the latent heat per gram: ℓ̃ = ℓ/M, where M is the molar mass.

Figure 2.27: Equation of state for a substance which expands upon freezing, projected to the (v, T ) and
(v, p) and (T, p) planes.

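Along the liquid-gas coexistence curve, we typically have $v_{\rm gas} \gg v_{\rm liquid}$, and assuming the vapor is ideal, we may write $\Delta v \approx v_{\rm gas} \approx RT/p$. Thus,

$$\Big(\frac{dp}{dT}\Big)_{\rm liq-gas} = \frac{\ell}{T\,\Delta v} \approx \frac{p\,\ell}{RT^2} . \qquad (2.331)$$

If ℓ remains constant throughout a section of the liquid-gas coexistence curve, we may integrate the above equation to get

$$\frac{dp}{p} = \frac{\ell}{R}\,\frac{dT}{T^2} \quad\Longrightarrow\quad p(T) = p(T_0)\,e^{\ell/RT_0}\,e^{-\ell/RT} . \qquad (2.332)$$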
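Eqn. 2.332 is easy to use for quick estimates. The sketch below assumes, for water, a normal boiling point of 373.15 K at 1 atm and a molar latent heat of vaporization ℓ ≈ 40.7 kJ/mol, treated as constant (these inputs are assumptions, not values from the notes), and evaluates the vapor pressure at 90°C and the boiling point at half an atmosphere.

```python
import math

R = 8.314462618  # J/(mol·K)

def vapor_pressure(T, T0, p0, ell):
    """Eqn. 2.332 with a constant molar latent heat ell (J/mol):
    p(T) = p0 exp[-(ell/R)(1/T - 1/T0)]."""
    return p0 * math.exp(-(ell / R) * (1.0 / T - 1.0 / T0))

def boiling_temperature(p, T0, p0, ell):
    """Inverse of vapor_pressure: the temperature at which the vapor pressure equals p."""
    return 1.0 / (1.0 / T0 - (R / ell) * math.log(p / p0))

# Assumed inputs for water: normal boiling point 373.15 K at 1 atm, ell ~ 40.7 kJ/mol
T0, p0, ell = 373.15, 1.0, 40.7e3
print(f"p(90 C)         ~ {vapor_pressure(363.15, T0, p0, ell):.2f} atm")
print(f"T_boil(0.5 atm) ~ {boiling_temperature(0.5, T0, p0, ell) - 273.15:.1f} C")
```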
p R T2

2.12.3 Liquid-solid line in H2 O

Life on planet earth owes much of its existence to a peculiar property of water: the solid is less dense
than the liquid along the coexistence curve. For example at T = 273.1 K and p = 1 atm,

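$$\tilde v_{\rm water} = 1.00013\ {\rm cm^3/g} \quad,\quad \tilde v_{\rm ice} = 1.0907\ {\rm cm^3/g} . \qquad (2.333)$$

The latent heat of the transition is ℓ̃ = 333 J/g = 79.5 cal/g. Thus,

$$\Big(\frac{dp}{dT}\Big)_{\rm liq-sol} = \frac{\tilde\ell}{T\,\Delta\tilde v} = \frac{333\ {\rm J/g}}{(273.1\ {\rm K})\,(-9.05\times 10^{-2}\ {\rm cm^3/g})} = -1.35\times 10^{8}\ \frac{\rm dyn}{{\rm cm^2\,K}} \approx -133\ \frac{\rm atm}{^\circ{\rm C}} . \qquad (2.334)$$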

Figure 2.28: Projection of the p-v-T surface of Fig. 2.26 onto the (v, p) plane.

The negative slope of the melting curve is invoked to explain the movement of glaciers: as glaciers slide down a rocky slope, they generate enormous pressure at obstacles.¹³ Due to this pressure, the story goes, the melting temperature decreases, and the glacier melts around the obstacle, so it can flow past it, after which it refreezes. But it is not the case that the bottom of the glacier melts under the pressure, for consider a glacier of height h = 1 km. The pressure at the bottom is p ∼ gh/ṽ ∼ 10⁷ Pa, which is only about 100 atmospheres. Such a pressure can produce only a small shift in the melting temperature of about $\Delta T_{\rm melt} = -0.75$°C.
Does the Clausius-Clapeyron relation explain how we can skate on ice? When my daughter was seven years old, she had a mass of about M = 20 kg. Her ice skates had blades of width about 5 mm and length about 10 cm. Thus, even on one foot, she imparted an additional pressure of only

$$\Delta p = \frac{Mg}{A} \approx \frac{20\ {\rm kg}\times 9.8\ {\rm m/s^2}}{(5\times 10^{-3}\ {\rm m})\times(10^{-1}\ {\rm m})} = 3.9\times 10^{5}\ {\rm Pa} = 3.9\ {\rm atm} . \qquad (2.335)$$

The corresponding change in the melting temperature is thus minuscule: $\Delta T_{\rm melt} \approx -0.03$°C.
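The numbers in eqn. 2.334 and in the glacier and skating estimates can be reproduced with a few lines (a sketch using only the data quoted in the text, converted to SI). The glacier pressure is computed as $gh/\tilde v_{\rm ice}$, about 89 atm, a bit below the rounded "about 100 atmospheres" used above, so the resulting shift comes out somewhat smaller than 0.75°C.

```python
ATM = 101325.0                 # Pa per atmosphere
T = 273.1                      # K
ell = 333.0                    # J/g, latent heat of melting quoted in the text
v_water, v_ice = 1.00013e-6, 1.0907e-6   # m^3/g, eqn. 2.333 (1 cm^3 = 1e-6 m^3)

slope = ell / (T * (v_water - v_ice))    # dp/dT in Pa/K, eqn. 2.334
print(f"dp/dT = {slope:.3e} Pa/K = {slope / ATM:.1f} atm/K")

def melting_shift(delta_p):
    """Linearized Clausius-Clapeyron estimate: Delta T_melt ~ Delta p / (dp/dT)."""
    return delta_p / slope

g = 9.8                                       # m/s^2
p_glacier = g * 1000.0 / (v_ice * 1e3)        # g h / v_ice, h = 1 km, v_ice in m^3/kg
p_skate = 20.0 * g / (5e-3 * 1e-1)            # eqn. 2.335: M g / A
print(f"glacier: dp = {p_glacier / ATM:.0f} atm, dT = {melting_shift(p_glacier):.2f} K")
print(f"skater:  dp = {p_skate / ATM:.1f} atm, dT = {melting_shift(p_skate):.3f} K")
```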
So why could my daughter skate so nicely? The answer isn’t so clear!14 There seem to be two relevant
issues in play. First, friction generates heat which can locally melt the surface of the ice. Second, the
surface of ice, and of many solids, is naturally slippery. Indeed, this is the case for ice even if one is
standing still, generating no frictional forces. Why is this so? It turns out that the Gibbs free energy
of the ice-air interface is larger than the sum of free energies of ice-water and water-air interfaces. That
is to say, ice, as well as many simple solids, prefers to have a thin layer of liquid on its surface, even at temperatures well below its bulk melting point. If the intermolecular interactions are not short-ranged¹⁵, theory predicts a surface melt thickness $d \propto (T_m - T)^{-1/3}$. In Fig. 2.30 we show measurements by Gilpin (1980) of the surface melt on ice, down to about −50°C. Near 0°C the melt layer thickness is about 40 nm, but this decreases to ∼1 nm at T = −35°C. At very low temperatures, skates stick rather than glide. Of course, the skate material is also important, since that will affect the energetics of the second interface. The 19th century novel, Hans Brinker, or The Silver Skates by Mary Mapes Dodge tells the story of the poor but stereotypically decent and hardworking Dutch boy Hans Brinker, who dreams of winning an upcoming ice skating race, along with the top prize: a pair of silver skates. All he has are some lousy wooden skates, which won’t do him any good in the race. He has money saved to buy steel skates, but of course his father desperately needs an operation because – I am not making this up – he fell off a dike and lost his mind. The family has no other way to pay for the doctor. What a story! At this point, I imagine the suspense must be too much for you to bear, but this isn’t an American Literature class, so you can use Google to find out what happens (or rent the 1958 movie, directed by Sidney Lumet). My point here is that Hans’ crappy wooden skates can’t compare to the metal ones, even though the surface melt between the ice and the air is the same. The skate blade material also makes a difference, both for the interface energy and, perhaps more importantly, for the generation of friction as well.

¹³ The melting curve has a negative slope at relatively low pressures, where the solid has the so-called Ih hexagonal crystal structure. At pressures above about 2500 atmospheres, the crystal structure changes, and the slope of the melting curve becomes positive.
¹⁴ For a recent discussion, see R. Rosenberg, Physics Today 58, 50 (2005).

2.12.4 Slow melting of ice: a quasistatic but irreversible process

Suppose we have an ice cube initially at temperature $T_0 < \Theta \equiv 273.15$ K (i.e. Θ = 0°C) and we toss it into a pond of water. We regard the pond as a heat bath at some temperature $T_1 > \Theta$. Let the mass of the ice be M. How much heat Q is absorbed by the ice in order to raise its temperature to $T_1$? Clearly

$$Q = M\tilde c_S(\Theta - T_0) + M\tilde\ell + M\tilde c_L(T_1 - \Theta) , \qquad (2.336)$$

where $\tilde c_S$ and $\tilde c_L$ are the specific heats of ice (solid) and water (liquid), respectively¹⁶, and ℓ̃ is the latent heat of melting per unit mass. The pond must give up this much heat to the ice, hence the entropy of the pond, discounting the new water which will come from the melted ice, must decrease:

$$\Delta S_{\rm pond} = -\frac{Q}{T_1} . \qquad (2.337)$$

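Now we ask what is the entropy change of the H2O in the ice. We have

$$\Delta S_{\rm ice} = \int\!\frac{\bar dQ}{T} = \int_{T_0}^{\Theta}\! dT\,\frac{M\tilde c_S}{T} + \frac{M\tilde\ell}{\Theta} + \int_{\Theta}^{T_1}\! dT\,\frac{M\tilde c_L}{T} = M\tilde c_S\ln\Big(\frac{\Theta}{T_0}\Big) + \frac{M\tilde\ell}{\Theta} + M\tilde c_L\ln\Big(\frac{T_1}{\Theta}\Big) . \qquad (2.338)$$

The total entropy change of the system is then

$$\begin{aligned} \Delta S_{\rm total} &= \Delta S_{\rm pond} + \Delta S_{\rm ice} \\ &= M\tilde c_S\ln\Big(\frac{\Theta}{T_0}\Big) - M\tilde c_S\Big(\frac{\Theta - T_0}{T_1}\Big) + M\tilde\ell\Big(\frac{1}{\Theta} - \frac{1}{T_1}\Big) + M\tilde c_L\ln\Big(\frac{T_1}{\Theta}\Big) - M\tilde c_L\Big(\frac{T_1 - \Theta}{T_1}\Big) \end{aligned} \qquad (2.339)$$

¹⁵ For example, they could be of the van der Waals form, due to virtual dipole fluctuations, with an attractive 1/r⁶ tail.
¹⁶ We assume $\tilde c_S(T)$ and $\tilde c_L(T)$ have no appreciable temperature dependence, and we regard them both as constants.

Figure 2.29: Phase diagram for CO2 in the (p, T) plane. (Source: www.scifun.org.)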

Now since $T_0 < \Theta < T_1$, we have

$$M\tilde c_S\Big(\frac{\Theta - T_0}{T_1}\Big) < M\tilde c_S\Big(\frac{\Theta - T_0}{\Theta}\Big) . \qquad (2.340)$$

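Therefore,

$$\Delta S_{\rm total} > M\tilde\ell\Big(\frac{1}{\Theta} - \frac{1}{T_1}\Big) + M\tilde c_S\,f\big(T_0/\Theta\big) + M\tilde c_L\,f\big(\Theta/T_1\big) , \qquad (2.341)$$

where f(x) = x − 1 − ln x. Clearly $f'(x) = 1 - x^{-1}$ is negative on the interval (0, 1), so f decreases monotonically there, from $f(0^+) = \infty$ to f(1) = 0. Hence f(x) > 0 for x ∈ (0, 1). Since $T_0 < \Theta < T_1$, both arguments $T_0/\Theta$ and $\Theta/T_1$ lie in (0, 1), and the latent heat term is positive as well, so we conclude $\Delta S_{\rm total} > 0$.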
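A numerical sanity check of eqns. 2.339 and 2.341 (a sketch; the specific heats $\tilde c_S \approx 2.09$ J/(g·K) for ice and $\tilde c_L \approx 4.18$ J/(g·K) for water, and the temperatures, are assumed illustrative inputs, while ℓ̃ = 333 J/g is the value quoted in §2.12.3). The difference between $\Delta S_{\rm total}$ and the bound of eqn. 2.341 should equal $M\tilde c_S(\Theta - T_0)(1/\Theta - 1/T_1)$, the amount dropped in eqn. 2.340.

```python
import math

def delta_S_total(M, T0, T1, c_s, c_l, ell, theta=273.15):
    """Delta S_pond + Delta S_ice from eqns. 2.336-2.338 (M in g, c_s, c_l in J/(g·K), ell in J/g)."""
    Q = M * c_s * (theta - T0) + M * ell + M * c_l * (T1 - theta)
    dS_pond = -Q / T1
    dS_ice = M * c_s * math.log(theta / T0) + M * ell / theta + M * c_l * math.log(T1 / theta)
    return dS_pond + dS_ice

def lower_bound(M, T0, T1, c_s, c_l, ell, theta=273.15):
    """Right-hand side of eqn. 2.341, with f(x) = x - 1 - ln x."""
    f = lambda x: x - 1.0 - math.log(x)
    return M * ell * (1.0 / theta - 1.0 / T1) + M * c_s * f(T0 / theta) + M * c_l * f(theta / T1)

# Assumed inputs: 100 g of ice at -20 C tossed into a pond at +20 C
args = dict(M=100.0, T0=253.15, T1=293.15, c_s=2.09, c_l=4.18, ell=333.0)
print(delta_S_total(**args), lower_bound(**args))   # J/K
```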

2.12.5 Gibbs phase rule

Equilibrium between two phases means that p, T , and µ(p, T ) are identical. From

µ1 (p, T ) = µ2 (p, T ) , (2.342)

we derive an equation for the slope of the coexistence curve, the Clausius-Clapeyron relation. Note that
we have one equation in two unknowns (T, p), so the solution set is a curve. For three phase coexistence,
we have
µ1 (p, T ) = µ2 (p, T ) = µ3 (p, T ) , (2.343)
which gives us two equations in two unknowns. The solution is then a point (or a set of points). A critical
point also is a solution of two simultaneous equations:

critical point =⇒ v1 (p, T ) = v2 (p, T ) , µ1 (p, T ) = µ2 (p, T ) . (2.344)



Figure 2.30: Left panel: data from R. R. Gilpin, J. Colloid Interface Sci. 77, 435 (1980) showing measured thickness of the surface melt on ice at temperatures below 0°C. The straight line has slope −1/3, as predicted by theory. Right panel: phase diagram of H2O, showing various high pressure solid phases. (Source: Physics Today, December 2005.)

Recall $v = N_A\big(\frac{\partial\mu}{\partial p}\big)_T$. Note that there can be no four phase coexistence for a simple p-V-T system.
Now for the general result. Suppose we have σ species, with particle numbers Na , where a = 1, . . . , σ. It is
useful to briefly recapitulate the derivation of the Gibbs-Duhem relation. The energy E(S, V, N1 , . . . , Nσ )
is a homogeneous function of degree one:

E(λS, λV, λN1 , . . . , λNσ ) = λE(S, V, N1 , . . . , Nσ ) . (2.345)

From Euler’s theorem for homogeneous functions (just differentiate with respect to λ and then set λ = 1),
we have

$$E = TS - pV + \sum_{a=1}^{\sigma}\mu_a N_a . \qquad (2.346)$$

Taking the differential, and invoking the First Law,

$$dE = T\,dS - p\,dV + \sum_{a=1}^{\sigma}\mu_a\,dN_a , \qquad (2.347)$$

we arrive at the relation

$$S\,dT - V\,dp + \sum_{a=1}^{\sigma}N_a\,d\mu_a = 0 , \qquad (2.348)$$

of which eqn. 2.147 is a generalization to additional internal ‘work’ variables. This says that the σ + 2 quantities $(T, p, \mu_1, \ldots, \mu_\sigma)$ are not all independent. We can therefore write

$$\mu_\sigma = \mu_\sigma\big(T, p, \mu_1, \ldots, \mu_{\sigma-1}\big) . \qquad (2.349)$$

If there are ϕ different phases, then in each phase j, with j = 1, …, ϕ, there is a chemical potential $\mu_a^{(j)}$ for each species a. We then have

$$\mu_\sigma^{(j)} = \mu_\sigma^{(j)}\big(T, p, \mu_1^{(j)}, \ldots, \mu_{\sigma-1}^{(j)}\big) . \qquad (2.350)$$

Here $\mu_a^{(j)}$ is the chemical potential of the a-th species in the j-th phase. Thus, there are ϕ such equations relating the 2 + ϕσ variables $\big(T, p, \{\mu_a^{(j)}\}\big)$, meaning that only 2 + ϕ(σ − 1) of them may be chosen as independent. This, then, is the dimension of ‘thermodynamic space’ containing a maximal number of intensive variables:

$$d_{\rm TD}(\sigma,\varphi) = 2 + \varphi\,(\sigma - 1) . \qquad (2.351)$$

To completely specify the state of our system, we of course introduce a single extensive variable, such as the total volume V. Note that the total particle number $N = \sum_{a=1}^{\sigma}N_a$ may not be conserved in the presence of chemical reactions!
Now suppose we have equilibrium among ϕ phases. We have implicitly assumed thermal and mechanical equilibrium among all the phases, meaning that p and T are constant. Chemical equilibrium applies on a species-by-species basis. This means

$$\mu_a^{(j)} = \mu_a^{(j')} \qquad (2.352)$$

where j, j′ ∈ {1, …, ϕ}. This gives σ(ϕ − 1) independent equations¹⁷. Thus, we can have phase equilibrium among the ϕ phases of σ species over a region of dimension

$$d_{\rm PE}(\sigma,\varphi) = 2 + \varphi\,(\sigma - 1) - \sigma\,(\varphi - 1) = 2 + \sigma - \varphi . \qquad (2.353)$$

Since dPE ≥ 0, we must have ϕ ≤ σ + 2. Thus, with two species (σ = 2), we could have at most four
phase coexistence.
If the various species can undergo ρ distinct chemical reactions of the form

$$\zeta_1^{(r)}A_1 + \zeta_2^{(r)}A_2 + \cdots + \zeta_\sigma^{(r)}A_\sigma = 0 , \qquad (2.354)$$

where $A_a$ is the chemical formula for species a, and $\zeta_a^{(r)}$ is the stoichiometric coefficient for the a-th species in the r-th reaction, with r = 1, …, ρ, then we have an additional ρ constraints of the form

$$\sum_{a=1}^{\sigma}\zeta_a^{(r)}\,\mu_a^{(j)} = 0 . \qquad (2.355)$$

Therefore,

$$d_{\rm PE}(\sigma,\varphi,\rho) = 2 + \sigma - \varphi - \rho . \qquad (2.356)$$

One might ask which value of j we are to use in eqn. 2.355, or whether we in fact have ϕ such equations for each r. The answer is that eqn. 2.352 guarantees that the chemical potential of species a is the same in all the phases, hence it doesn’t matter what value one chooses for j in eqn. 2.355.
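The counting behind eqns. 2.351-2.356 can be written out in a few lines. The sketch below (function names are our own) subtracts the σ(ϕ − 1) equilibrium conditions of eqn. 2.352 and the ρ reaction constraints of eqn. 2.355 from $d_{\rm TD}$, and confirms the closed form 2 + σ − ϕ − ρ and the maximum number of coexisting phases.

```python
def d_TD(sigma, phi):
    """Number of independent intensive variables, eqn. 2.351."""
    return 2 + phi * (sigma - 1)

def d_PE(sigma, phi, rho=0):
    """Dimension of the phase-equilibrium region: d_TD minus the sigma (phi - 1)
    chemical-equilibrium conditions and the rho reaction constraints."""
    return d_TD(sigma, phi) - sigma * (phi - 1) - rho

def max_phases(sigma, rho=0):
    """Largest number of phases phi for which d_PE >= 0 (0 if none)."""
    return max((phi for phi in range(1, sigma + 4) if d_PE(sigma, phi, rho) >= 0), default=0)

for sigma in (1, 2, 3):
    print(sigma, [d_PE(sigma, phi) for phi in range(1, sigma + 3)], max_phases(sigma))
```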
¹⁷ Set j = 1 and let j′ range over the ϕ − 1 values 2, …, ϕ.


Let us assume that no reactions take place, i.e. ρ = 0, so the total number of particles $\sum_{b=1}^{\sigma}N_b$ is conserved. Instead of choosing $(T, p, \mu_1^{(j)}, \ldots, \mu_{\sigma-1}^{(j)})$ as $d_{\rm TD}$ intensive variables, we could have chosen $(T, p, x_1^{(j)}, \ldots, x_{\sigma-1}^{(j)})$, where $x_a = N_a/N$ is the concentration of species a.
Why do phase diagrams in the (p, v) and (T, v) plane look different than those in the (p, T ) plane?18
For example, Fig. 2.27 shows projections of the p-v-T surface of a typical single component substance
onto the (T, v), (p, v), and (p, T ) planes. Coexistence takes place along curves in the (p, T ) plane, but in
extended two-dimensional regions in the (T, v) and (p, v) planes. The reason that p and T are special is
that temperature, pressure, and chemical potential must be equal throughout an equilibrium phase if it
is truly in thermal, mechanical, and chemical equilibrium. This is not the case for an intensive variable
such as specific volume v = NA V /N or chemical concentration xa = Na /N .

2.13 Entropy of Mixing and the Gibbs Paradox

2.13.1 Computing the entropy of mixing

Entropy is widely understood as a measure of disorder. Of course, such a definition should be supple-
mented by a more precise definition of disorder – after all, one man’s trash is another man’s treasure.
To gain some intuition about entropy, let us explore the mixing of a multicomponent ideal gas. Let $N = \sum_a N_a$ be the total number of particles of all species, and let $x_a = N_a/N$ be the concentration of species a. Note that $\sum_a x_a = 1$.
For any substance obeying the ideal gas law $pV = Nk_BT$, the entropy is

$$S(T,V,N) = Nk_B\ln(V/N) + N\phi(T) , \qquad (2.357)$$

since $(\partial S/\partial V)_{T,N} = (\partial p/\partial T)_{V,N} = Nk_B/V$. Note that in eqn. 2.357 we have divided V by N before taking the logarithm. This is essential if the entropy is to be an extensive function (see §2.7.5). One might think that the configurational entropy of an ideal gas should scale as $\ln(V^N) = N\ln V$, since each particle can be anywhere in the volume V. However, if the particles are indistinguishable, then permuting the particle labels does not result in a distinct configuration, and so the configurational entropy is proportional to $\ln(V^N/N!) \approx N\ln(V/N) + N$, using Stirling's approximation $\ln N! \approx N\ln N - N$. The origin of this indistinguishability factor will become clear when we discuss the quantum mechanical formulation of statistical mechanics. For now, note that such a correction is necessary in order that the entropy be an extensive function.
If we did not include this factor and instead wrote $S^*(T,V,N) = N k_B \ln V + N\phi(T)$, then we would find $S^*(T,V,N) - 2S^*(T,\tfrac{1}{2}V,\tfrac{1}{2}N) = N k_B \ln 2$, i.e. the total entropy of two identical systems of particles separated by a barrier would increase if the barrier were removed and they were allowed to mix. This seems absurd, though, because we could just as well regard the barrier as invisible. This is known as the Gibbs paradox. The resolution of the Gibbs paradox is to include the indistinguishability correction, which renders $S$ extensive, in which case $S(T,V,N) = 2S(T,\tfrac{1}{2}V,\tfrac{1}{2}N)$.

¹⁸ The same can be said for multicomponent systems: the phase diagram in the $(T,x)$ plane at constant $p$ looks different from the phase diagram in the $(T,\mu)$ plane at constant $p$.
2.13. ENTROPY OF MIXING AND THE GIBBS PARADOX 97

Figure 2.31: A multicomponent system consisting of isolated gases, each at temperature $T$ and pressure $p$. The system entropy increases when all the walls between the different subsystems are removed.

Consider now the situation in Fig. 2.31, where we have separated the different components into their own volumes $V_a$. Let the pressure and temperature be the same everywhere, so $pV_a = N_a k_B T$. The entropy of the unmixed system is then

$$S_{\rm unmixed} = \sum_a S_a = \sum_a \Big[ N_a k_B \ln(V_a/N_a) + N_a \phi_a(T) \Big] . \qquad (2.358)$$

Now let us imagine removing all the barriers separating the different gases and letting the particles mix thoroughly. The result is that each component gas occupies the full volume $V$, so the entropy is

$$S_{\rm mixed} = \sum_a S_a = \sum_a \Big[ N_a k_B \ln(V/N_a) + N_a \phi_a(T) \Big] . \qquad (2.359)$$

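Thus, the entropy of mixing is

$$\Delta S_{\rm mix} = S_{\rm mixed} - S_{\rm unmixed} = \sum_a N_a k_B \ln(V/V_a) = -N k_B \sum_a x_a \ln x_a , \qquad (2.360)$$

where $x_a = N_a/N = V_a/V$ is the fraction of species $a$. Note that $\Delta S_{\rm mix} \ge 0$, since each $x_a \le 1$ and hence every term $-x_a \ln x_a$ is nonnegative.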
What if all the components were initially identical? It seems absurd that the entropy should increase simply by removing some invisible barriers. This is again the Gibbs paradox. In this case, the resolution of the paradox is to note that the sum in the expression for $S_{\rm mixed}$ is a sum over distinct species. Hence if the particles are all identical, we have $S_{\rm mixed} = N k_B \ln(V/N) + N\phi(T) = S_{\rm unmixed}$, and therefore $\Delta S_{\rm mix} = 0$.
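Eqn. 2.360 is easy to evaluate numerically. The following is a minimal sketch in Python, working in units where $k_B = 1$ so that it returns the mixing entropy per particle, $\Delta S_{\rm mix}/N k_B = -\sum_a x_a \ln x_a$. The function name is illustrative. Note that the caller must merge identical species into a single entry before calling it; that bookkeeping is exactly the resolution of the Gibbs paradox described above, and a single species gives zero.

```python
import math

def mixing_entropy_per_particle(counts):
    """Return Delta S_mix / (N k_B) = -sum_a x_a ln x_a  (eqn. 2.360).

    counts: particle numbers N_a of the *distinct* species present.
    Identical species must already be merged into one entry.
    """
    total = sum(counts)
    s = 0.0
    for n in counts:
        if n > 0:                      # x ln x -> 0 as x -> 0
            x = n / total
            s -= x * math.log(x)
    return s

print(mixing_entropy_per_particle([500, 500]))  # two distinct gases, 50/50: ln 2
print(mixing_entropy_per_particle([1000]))      # one species: no mixing entropy
```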

2.13.2 Entropy and combinatorics

As we shall learn when we study statistical mechanics, the entropy may be interpreted in terms of the number of microstates $W(E,V,N)$ available to a system at fixed energy $E$, volume $V$, and particle number $N$. One has

$$S(E,V,N) = k_B \ln W(E,V,N) . \qquad (2.361)$$
98 CHAPTER 2. THERMODYNAMICS

Figure 2.32: Mixing among three different species of particles. The mixed configuration has an additional
entropy, ∆Smix .

Consider a system consisting of $\sigma$ different species of particles. Now let it be that for each species label $a$, $N_a$ particles of that species are confined among $Q_a$ little boxes such that at most one particle can fit in a box (see Fig. 2.32). How many ways $W$ are there to configure $N$ identical particles among $Q$ boxes? Clearly

$$W = \binom{Q}{N} = \frac{Q!}{N!\,(Q-N)!} . \qquad (2.362)$$

Were the particles distinct, we'd have $W_{\rm distinct} = Q!/(Q-N)!$, which is $N!$ times greater. This is because permuting distinct particles results in a different configuration, and there are $N!$ ways to permute $N$ particles.
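To make eqn. 2.362 concrete, the sketch below (Python, with illustrative function names) enumerates every placement of $N$ particles in $Q$ single-occupancy boxes by brute force: once treating the particles as identical, so that a configuration is just the set of occupied boxes, and once as distinct, so that the assignment of particles to boxes is ordered. The counts agree with $Q!/N!(Q-N)!$ and $Q!/(Q-N)!$, and their ratio is $N!$.

```python
import itertools
import math

def count_identical(Q, N):
    # Identical particles: a configuration is the set of occupied boxes.
    return sum(1 for _ in itertools.combinations(range(Q), N))

def count_distinct(Q, N):
    # Distinct particles: particle i sits in box b[i]; order matters.
    return sum(1 for _ in itertools.permutations(range(Q), N))

Q, N = 7, 3
print(count_identical(Q, N), math.comb(Q, N))         # 35 35
print(count_distinct(Q, N), math.perm(Q, N))          # 210 210
print(count_distinct(Q, N) // count_identical(Q, N))  # 6 = 3!
```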
 
The entropy for species $a$ is then $S_a = k_B \ln W_a = k_B \ln\binom{Q_a}{N_a}$. We then use Stirling's approximation,

$$\ln(K!) = K\ln K - K + \tfrac{1}{2}\ln K + \tfrac{1}{2}\ln(2\pi) + {\cal O}(K^{-1}) , \qquad (2.363)$$

which is an asymptotic expansion valid for $K \gg 1$. One then finds for $Q, N \gg 1$, with $x = N/Q \in [0,1]$,

$$\begin{aligned}
\ln\binom{Q}{N} &= Q\ln Q - Q - \big[xQ\ln(xQ) - xQ\big] - \big[(1-x)Q\ln\big((1-x)Q\big) - (1-x)Q\big] \\
&= -Q\,\big[x\ln x + (1-x)\ln(1-x)\big] .
\end{aligned} \qquad (2.364)$$

Here we have kept all terms of order $Q$ in Stirling's expansion; the neglected terms are of order $\ln Q$. Since $\ln Q \ll Q$, they are small and we are safe to stop here. Summing up the contributions from all the species, we get

$$S_{\rm unmixed} = k_B \sum_{a=1}^\sigma \ln W_a = -k_B \sum_{a=1}^\sigma Q_a \big[x_a \ln x_a + (1-x_a)\ln(1-x_a)\big] , \qquad (2.365)$$

where $x_a = N_a/Q_a$ is the initial dimensionless density of species $a$.
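A quick numerical check of eqn. 2.364, as a sketch: compare the exact $\ln\binom{Q}{N}$, computed with Python's `math.lgamma` (using $\ln\Gamma(K+1) = \ln K!$), against the leading-order form at fixed $x = 1/4$. The absolute discrepancy grows only like $\tfrac{1}{2}\ln(2\pi Q x(1-x))$, which is what the dropped $\tfrac{1}{2}\ln K$ and $\tfrac{1}{2}\ln 2\pi$ terms of eqn. 2.363 contribute, so the relative error shrinks as $Q$ grows.

```python
import math

def ln_binom_exact(Q, N):
    # ln[Q! / (N! (Q-N)!)] via the log-gamma function, ln Γ(K+1) = ln K!
    return math.lgamma(Q + 1) - math.lgamma(N + 1) - math.lgamma(Q - N + 1)

def ln_binom_stirling(Q, N):
    # Leading-order result of eqn. 2.364, with x = N/Q
    x = N / Q
    return -Q * (x * math.log(x) + (1 - x) * math.log(1 - x))

for Q in (100, 10_000, 1_000_000):
    N = Q // 4                                    # x = 1/4 exactly
    exact, approx = ln_binom_exact(Q, N), ln_binom_stirling(Q, N)
    print(Q, exact, approx, (approx - exact) / exact)
```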


Now let's remove all the partitions between the different species so that each of the particles is free to explore all of the boxes. There are $Q = \sum_a Q_a$ boxes in all. The total number of ways of placing $N_1$ particles of species $a = 1$ through $N_\sigma$ particles of species $\sigma$ is

$$W_{\rm mixed} = \frac{Q!}{N_0!\, N_1! \cdots N_\sigma!} , \qquad (2.366)$$

where $N_0 = Q - \sum_{a=1}^\sigma N_a$ is the number of vacant boxes. Again using Stirling's rule, we find

$$S_{\rm mixed} = -k_B\, Q \sum_{a=0}^\sigma \tilde{x}_a \ln \tilde{x}_a , \qquad (2.367)$$

where $\tilde{x}_a = N_a/Q$ is the fraction of all boxes containing a particle of species $a$, and $\tilde{x}_0 = N_0/Q$ is the fraction of empty boxes. Note that

$$\tilde{x}_a = \frac{N_a}{Q} = \frac{N_a}{Q_a} \cdot \frac{Q_a}{Q} = x_a f_a , \qquad (2.368)$$

where $f_a \equiv Q_a/Q$. Note that $\sum_{a=1}^\sigma f_a = 1$.
Let's assume all the densities are initially the same, so $x_a = x$ for all $a$, and hence $\tilde{x}_a = x f_a$. In this case, $f_a = Q_a/Q = N_a/N$ is the fraction of species $a$ among all the particles. We then have $\tilde{x}_0 = 1 - x$, and

$$\begin{aligned}
S_{\rm mixed} &= -k_B Q \sum_{a=1}^\sigma x f_a \ln(x f_a) - k_B Q\, \tilde{x}_0 \ln \tilde{x}_0 \\
&= -k_B Q\,\big[x\ln x + (1-x)\ln(1-x)\big] - k_B\, x Q \sum_{a=1}^\sigma f_a \ln f_a .
\end{aligned} \qquad (2.369)$$
a=1

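Comparing with eqn. 2.365 (with $x_a = x$ for all $a$), the first term on the RHS of eqn. 2.369 is just $S_{\rm unmixed}$. Since $xQ = N$, the entropy of mixing is

$$\Delta S_{\rm mix} = -N k_B \sum_{a=1}^\sigma f_a \ln f_a , \qquad (2.370)$$

where $N = \sum_{a=1}^\sigma N_a$ is the total number of particles among all species (excluding vacancies) and $f_a = N_a/N$ is the fraction of species $a$ among all the particles. This has the same form as eqn. 2.360.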
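The following sketch checks eqn. 2.370 against the exact combinatorics of eqns. 2.362 and 2.366 (with $k_B = 1$). It computes $\ln W_{\rm mixed} - \sum_a \ln W_a$ with `math.lgamma` for three species at a common initial density $x = 0.3$ and compares it with $-N\sum_a f_a \ln f_a$. The box counts $Q_a$ are made-up illustrative values; the two results differ only by the ${\cal O}(\ln Q)$ terms dropped from Stirling's formula.

```python
import math

def ln_fact(k):
    return math.lgamma(k + 1)                       # ln k!

def delta_s_exact(Qs, x):
    """ln W_mixed - sum_a ln W_a, species a confined to Q_a boxes at density x."""
    Ns = [round(x * Qa) for Qa in Qs]              # particles of each species
    Q, N = sum(Qs), sum(Ns)
    ln_w_unmixed = sum(ln_fact(Qa) - ln_fact(Na) - ln_fact(Qa - Na)
                       for Qa, Na in zip(Qs, Ns))                    # eqn. 2.362
    ln_w_mixed = ln_fact(Q) - ln_fact(Q - N) - sum(ln_fact(Na) for Na in Ns)  # eqn. 2.366
    return ln_w_mixed - ln_w_unmixed

def delta_s_formula(Qs, x):
    Q = sum(Qs)
    N = x * Q
    return -N * sum((Qa / Q) * math.log(Qa / Q) for Qa in Qs)        # eqn. 2.370

Qs = [200_000, 300_000, 500_000]                   # hypothetical box counts Q_a
print(delta_s_exact(Qs, 0.3), delta_s_formula(Qs, 0.3))
```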

2.13.3 Weak solutions and osmotic pressure

Suppose one of the species is much more plentiful than all the others, and label it with $a = 0$. We will call this the solvent. The entropy of mixing is then

$$\Delta S_{\rm mix} = -k_B \left[ N_0 \ln\!\left(\frac{N_0}{N_0 + N'}\right) + \sum_{a=1}^\sigma N_a \ln\!\left(\frac{N_a}{N_0 + N'}\right) \right] , \qquad (2.371)$$

where $N' = \sum_{a=1}^\sigma N_a$ is the total number of solute molecules, summed over all solute species. We assume the solution is weak, which means $N_a \le N' \ll N_0$. Expanding in powers of $N'/N_0$ and $N_a/N_0$, we find

$$\Delta S_{\rm mix} = -k_B \sum_{a=1}^\sigma \left[ N_a \ln\!\left(\frac{N_a}{N_0}\right) - N_a \right] + {\cal O}\big(N'^2/N_0\big) . \qquad (2.372)$$

Consider now a solution consisting of $N_0$ molecules of a solvent and $N_a$ molecules of species $a$ of solute, where $a = 1, \ldots, \sigma$. We begin by expanding the Gibbs free energy $G(T, p, N_0, N_1, \ldots, N_\sigma)$ as a power series in the small quantities $N_a$. We have

$$G\big(T, p, N_0, \{N_a\}\big) = N_0\, g_0(T,p) + k_B T \sum_a N_a \ln\!\left(\frac{N_a}{e N_0}\right) + \sum_a N_a\, \psi_a(T,p) + \frac{1}{2N_0} \sum_{a,b} A_{ab}(T,p)\, N_a N_b . \qquad (2.373)$$

The first term on the RHS corresponds to the Gibbs free energy of the solvent. The second term is due
to the entropy of mixing. The third term is the contribution to the total free energy from the individual
species. Note the factor of e in the denominator inside the logarithm, which accounts for the second term
in the brackets on the RHS of eqn. 2.372. The last term is due to interactions between the species; it is
truncated at second order in the solute numbers.
The chemical potential for the solvent is

$$\mu_0(T,p) = \frac{\partial G}{\partial N_0} = g_0(T,p) - k_B T \sum_a x_a - \tfrac{1}{2} \sum_{a,b} A_{ab}(T,p)\, x_a x_b , \qquad (2.374)$$

and the chemical potential for species $a$ is

$$\mu_a(T,p) = \frac{\partial G}{\partial N_a} = k_B T \ln x_a + \psi_a(T,p) + \sum_b A_{ab}(T,p)\, x_b , \qquad (2.375)$$

where $x_a = N_a/N_0$ is the concentration of solute species $a$ (only the symmetric part of $A_{ab}$ enters eqn. 2.373, so we may take $A_{ab} = A_{ba}$). By assumption, the last term on the RHS of each of these equations is small, since $N_{\rm solute} \ll N_0$, where $N_{\rm solute} = \sum_{a=1}^\sigma N_a$ is the total number of solute molecules. To lowest order, then, we have

$$\mu_0(T,p) = g_0(T,p) - x\, k_B T \qquad (2.376)$$

$$\mu_a(T,p) = k_B T \ln x_a + \psi_a(T,p) , \qquad (2.377)$$

where $x = \sum_a x_a$ is the total solute concentration.
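Eqns. 2.374 and 2.375 follow from differentiating eqn. 2.373. As a sanity check, the sketch below evaluates $G$ for made-up values of $g_0$, $\psi_a$ and a symmetric $A_{ab}$ (in units where $k_B T = 1$, with two solute species), treats the particle numbers as continuous, and compares centered finite differences with the closed forms. All parameter values are hypothetical.

```python
import math

# Hypothetical parameters, in units where k_B T = 1: two solute species.
g0 = -3.0
psi = [0.7, -0.4]
A = [[0.5, 0.2],
     [0.2, -0.3]]          # symmetric interaction matrix A_ab

def G(N0, Ns):
    """Gibbs free energy of eqn. 2.373 (particle numbers treated as continuous)."""
    val = N0 * g0
    for a, Na in enumerate(Ns):
        val += Na * math.log(Na / (math.e * N0)) + Na * psi[a]
    val += sum(A[a][b] * Ns[a] * Ns[b]
               for a in range(2) for b in range(2)) / (2.0 * N0)
    return val

def mu0_closed_form(N0, Ns):                      # eqn. 2.374
    x = [Na / N0 for Na in Ns]
    return g0 - sum(x) - 0.5 * sum(A[a][b] * x[a] * x[b]
                                   for a in range(2) for b in range(2))

def mua_closed_form(a, N0, Ns):                   # eqn. 2.375
    x = [Na / N0 for Na in Ns]
    return math.log(x[a]) + psi[a] + sum(A[a][b] * x[b] for b in range(2))

def centered_difference(f, h=1e-3):
    return (f(h) - f(-h)) / (2.0 * h)

N0, Ns = 1.0e4, [30.0, 50.0]
mu0_fd = centered_difference(lambda h: G(N0 + h, Ns))
mu1_fd = centered_difference(lambda h: G(N0, [Ns[0] + h, Ns[1]]))
print(mu0_fd, mu0_closed_form(N0, Ns))
print(mu1_fd, mua_closed_form(0, N0, Ns))
```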
If we add sugar to a solution confined by a semipermeable membrane¹⁹, the pressure increases! To see why, consider a situation where a rigid semipermeable membrane separates a solution (solvent plus solutes) from a pure solvent. There is energy exchange through the membrane, so the temperature is $T$ throughout. There is no volume exchange, however: $dV = dV' = 0$, hence the pressure need not be the same on both sides. Since the membrane is permeable to the solvent, the chemical potential $\mu_0$ is the same on each side. This means

$$g_0(T, p_R) - x k_B T = g_0(T, p_L) , \qquad (2.378)$$

where $p_{L,R}$ is the pressure on the left and right sides of the membrane (the solution being on the right), and $x = N_{\rm solute}/N_0$ is again the total solute concentration. This equation tells us that the pressure cannot be the same on both sides of the membrane. If the pressure difference is small, we can expand in powers of the osmotic pressure, $\pi \equiv p_R - p_L$, and we find

$$\pi \left(\frac{\partial \mu_0}{\partial p}\right)_T = x\, k_B T . \qquad (2.379)$$
19
‘Semipermeable’ in this context means permeable to the solvent but not the solute(s).

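Figure 2.33: Osmotic pressure causes the column on the right side of the U-tube to rise higher than the column on the left by an amount $\Delta h = \pi/\varrho g$, where $\varrho$ is the mass density of the liquid and $g$ is the acceleration of gravity.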

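But a Maxwell relation (§2.9) guarantees

$$\left(\frac{\partial \mu}{\partial p}\right)_{T,N} = \left(\frac{\partial V}{\partial N}\right)_{T,p} = v(T,p)/N_A , \qquad (2.380)$$

where $v(T,p)$ is the molar volume of the solvent. Substituting this into eqn. 2.379 and using $N_A k_B = R$, we obtain

$$\pi v = x R T , \qquad (2.381)$$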

which looks very much like the ideal gas law, even though we are talking about dense (but 'weak') solutions! The resulting pressure has a demonstrable effect, as sketched in Fig. 2.33. Consider a solution containing $\nu$ moles of sucrose (C₁₂H₂₂O₁₁) per kilogram (55.52 mol) of water at 30 °C. We find $\pi = 2.5$ atm when $\nu = 0.1$.
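The sucrose figure can be reproduced from eqn. 2.381. The sketch below uses the text's 55.52 mol of water per kilogram and assumes a molar volume of water of $v \approx 18.0$ cm³/mol, a value the text does not quote. It also converts $\pi$ into the column height $\Delta h = \pi/\varrho g$ of Fig. 2.33, taking $\varrho \approx 1000$ kg/m³ and $g = 9.81$ m/s² (again our assumptions).

```python
R = 8.314          # J/(mol K)
ATM = 101_325.0    # Pa

def osmotic_pressure(nu, T, n_solvent=55.52, v_molar=18.0e-6):
    """pi = x R T / v  (eqn. 2.381), with x = nu / n_solvent.

    nu: moles of solute per kg of solvent; T in kelvin;
    v_molar: solvent molar volume in m^3/mol (assumed value for water).
    Returns pi in Pa.
    """
    x = nu / n_solvent
    return x * R * T / v_molar

pi = osmotic_pressure(0.1, 303.15)
print(pi / ATM)               # about 2.5 atm
print(pi / (1000.0 * 9.81))   # column height in metres, roughly 26 m
```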
One might worry about the expansion in powers of $\pi$ when $\pi$ is much larger than the ambient pressure. But in fact the next term in the expansion is smaller than the first term by a factor of order $\pi\kappa_T$, where $\kappa_T$ is the isothermal compressibility. For water one has $\kappa_T \approx 4.4 \times 10^{-5}\ {\rm atm}^{-1}$, hence we can safely ignore the higher order terms in the Taylor expansion.

2.13.4 Effect of impurities on boiling and freezing points

Along the coexistence curve separating liquid and vapor phases, the chemical potentials of the two phases
are identical:

$$\mu^0_L(T,p) = \mu^0_V(T,p) . \qquad (2.382)$$

Here we write $\mu^0$ for $\mu$ to emphasize that we are talking about a phase with no impurities present. This equation provides a single constraint on the two variables $T$ and $p$, hence one can, in principle, solve to obtain $T = T^*_0(p)$, which is the equation of the liquid-vapor coexistence curve in the $(T,p)$ plane. Now suppose there is a solute present in the liquid. We then have

$$\mu_L(T,p,x) = \mu^0_L(T,p) - x k_B T , \qquad (2.383)$$



| Substance | Latent heat of fusion ℓ̃_f (J/g) | Melting point (°C) | Latent heat of vaporization ℓ̃_v (J/g) | Boiling point (°C) |
|-----------|------|--------|------|---------|
| C₂H₅OH    | 108  | −114   | 855  | 78.3    |
| NH₃       | 339  | −75    | 1369 | −33.34  |
| CO₂       | 184  | −57    | 574  | −78     |
| He        | –    | –      | 21   | −268.93 |
| H₂        | 58   | −259   | 455  | −253    |
| Pb        | 24.5 | 327.5  | 871  | 1750    |
| N₂        | 25.7 | −210   | 200  | −196    |
| O₂        | 13.9 | −219   | 213  | −183    |
| H₂O       | 334  | 0      | 2270 | 100     |

Table 2.4: Latent heats of fusion and vaporization at $p = 1$ atm.

where $x$ is the dimensionless solute concentration, summed over all species. The condition for liquid-vapor coexistence now becomes

$$\mu^0_L(T,p) - x k_B T = \mu^0_V(T,p) . \qquad (2.384)$$

This will lead to a shift in the boiling temperature at fixed $p$. Assuming this shift is small, let us expand to lowest order in $T - T^*_0(p)$, writing

$$\mu^0_L(T^*_0, p) + \left(\frac{\partial \mu^0_L}{\partial T}\right)_p \big(T - T^*_0\big) - x k_B T = \mu^0_V(T^*_0, p) + \left(\frac{\partial \mu^0_V}{\partial T}\right)_p \big(T - T^*_0\big) . \qquad (2.385)$$

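Note that

$$\left(\frac{\partial \mu}{\partial T}\right)_{p,N} = -\left(\frac{\partial S}{\partial N}\right)_{T,p} \qquad (2.386)$$

from a Maxwell relation deriving from exactness of $dG$. Since $S$ is extensive, we can write $S = (N/N_A)\, s(T,p)$, where $s(T,p)$ is the molar entropy, so that $(\partial \mu/\partial T)_p = -s/N_A$. Using $\mu^0_L(T^*_0,p) = \mu^0_V(T^*_0,p)$ and solving for $T$ to lowest order in $x$, we obtain

$$T^*(p,x) = T^*_0(p) + \frac{x R\, \big[T^*_0(p)\big]^2}{\ell_v(p)} , \qquad (2.387)$$

where $\ell_v = T^*_0 \cdot (s_V - s_L)$ is the molar latent heat of the liquid-vapor transition²⁰. The shift $\Delta T^* = T^* - T^*_0$ is called the boiling point elevation.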
As an example, consider seawater, which contains approximately 35 g of dissolved Na⁺Cl⁻ per kilogram of H₂O. The atomic masses of Na and Cl are 23.0 and 35.4, respectively, hence the total ionic concentration in seawater (neglecting everything but sodium and chlorine, and counting two ions per NaCl unit) is given by

$$x = \frac{2 \cdot 35}{23.0 + 35.4} \bigg/ \frac{1000}{18} \approx 0.022 . \qquad (2.388)$$
²⁰ Latent heat is discussed further in §2.12.2.

The latent heat of vaporization of H₂O at atmospheric pressure is $\ell_v = 40.7$ kJ/mol, hence

$$\Delta T^* = \frac{(0.022)\,(8.3\ {\rm J/mol\,K})\,(373\ {\rm K})^2}{4.1 \times 10^4\ {\rm J/mol}} \approx 0.6\ {\rm K} . \qquad (2.389)$$

Put another way, the boiling point elevation of H₂O at atmospheric pressure is about 0.28 °C per percent solute. We can express this as $\Delta T^* = K m$, where the molality $m$ is the number of moles of solute per kilogram of solvent. For H₂O, we find $K = 0.51$ °C kg/mol.
Similar considerations apply at the freezing point, where we equate the chemical potential of the solvent plus solute to that of the pure solid. The latent heat of fusion for H₂O is about $\ell_f = T^0_f \cdot (s_{\rm LIQUID} - s_{\rm SOLID}) = 6.01$ kJ/mol²¹. We thus predict a freezing point depression of $\Delta T^* = -x R\,(T^*_0)^2/\ell_f$, where $T^*_0$ is now the freezing temperature of the pure solvent; this amounts to about −1.03 °C × x[%]. This can be expressed once again as $\Delta T^* = -K m$, with $K = 1.86$ °C kg/mol²².
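The numbers quoted in this subsection follow from eqn. 2.387 and its freezing-point analogue. The sketch below uses the text's inputs ($\ell_v = 40.7$ kJ/mol, $\ell_f = 6.01$ kJ/mol, 1000/18 mol of water per kilogram) with $T^*_0 = 373.15$ K and $273.15$ K, and, like eqn. 2.388, treats each dissolved ion as an independent solute particle. The helper name `shift` is illustrative.

```python
R = 8.314                         # J/(mol K)
MOL_WATER_PER_KG = 1000.0 / 18.0

def shift(x, T0, latent):
    """|Delta T*| = x R T0^2 / latent  (eqn. 2.387); latent heat in J/mol."""
    return x * R * T0**2 / latent

# Seawater: 35 g NaCl per kg of water, two ions per formula unit (eqn. 2.388)
x_sea = (2 * 35 / (23.0 + 35.4)) / MOL_WATER_PER_KG
print(x_sea)                                  # ~0.022
print(shift(x_sea, 373.15, 40.7e3))           # boiling point elevation, ~0.6 K
print(-shift(x_sea, 273.15, 6.01e3))          # freezing point depression, ~ -2.2 K

# Constants K in |Delta T*| = K m, with m = moles of solute per kg of solvent
K_b = shift(1.0 / MOL_WATER_PER_KG, 373.15, 40.7e3)
K_f = shift(1.0 / MOL_WATER_PER_KG, 273.15, 6.01e3)
print(K_b, K_f)                               # ~0.51 and ~1.86 K kg/mol
```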

2.13.5 Binary solutions

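Consider a binary solution, and write the Gibbs free energy $G(T,p,N_A,N_B)$ as

$$\begin{aligned}
G(T,p,N_A,N_B) &= N_A\, \mu^0_A(T,p) + N_B\, \mu^0_B(T,p) + N_A k_B T \ln\!\left(\frac{N_A}{N_A+N_B}\right) \\
&\quad + N_B k_B T \ln\!\left(\frac{N_B}{N_A+N_B}\right) + \lambda\, \frac{N_A N_B}{N_A+N_B} .
\end{aligned} \qquad (2.390)$$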

The first four terms on the RHS represent the free energy of the individual component fluids and the
entropy of mixing. The last term is an interaction contribution. With λ > 0, the interaction term prefers
that the system be either fully A or fully B. The entropy contribution prefers a mixture, so there is a
competition. What is the stable thermodynamic state?
It is useful to write the Gibbs free energy per particle, $g(T,p,x) = G/(N_A+N_B)$, in terms of $T$, $p$, and the concentration $x \equiv x_B = N_B/(N_A+N_B)$ of species B (hence $x_A = 1 - x$ is the concentration of species A). Then

$$g(T,p,x) = (1-x)\,\mu^0_A + x\,\mu^0_B + k_B T\,\big[x\ln x + (1-x)\ln(1-x)\big] + \lambda\, x(1-x) . \qquad (2.391)$$

In order for the system to be stable against phase separation into relatively A-rich and B-rich regions, we
must have that g(T, p, x) be a convex function of x. Our first check should be for a local instability, i.e.
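spinodal decomposition. We have

$$\frac{\partial g}{\partial x} = \mu^0_B - \mu^0_A + k_B T \ln\!\left(\frac{x}{1-x}\right) + \lambda\,(1-2x) \qquad (2.392)$$

and

$$\frac{\partial^2 g}{\partial x^2} = \frac{k_B T}{x} + \frac{k_B T}{1-x} - 2\lambda . \qquad (2.393)$$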
²¹ See Table 2.4, and recall $M = 18$ g/mol is the molar mass of H₂O.

²² It is more customary to write $\Delta T^* = T^*_{\rm pure\ solvent} - T^*_{\rm solution}$ in the case of the freezing point depression, in which case $\Delta T^*$ is positive.

Figure 2.34: Gibbs free energy per particle for a binary solution as a function of concentration $x = x_B$ of the B species (pure A at the left end $x = 0$; pure B at the right end $x = 1$), in units of the interaction parameter $\lambda$. Dark red curve: $T = 0.65\,\lambda/k_B > T_c$; green curve: $T = \lambda/2k_B = T_c$; blue curve: $T = 0.40\,\lambda/k_B < T_c$. We have chosen $\mu^0_A = 0.60\,\lambda - 0.50\,k_B T$ and $\mu^0_B = 0.50\,\lambda - 0.50\,k_B T$. Note that the free energy $g(T,p,x)$ is not convex in $x$ for $T < T_c$, indicating an instability and necessitating a Maxwell construction.

The spinodal is given by the solution to the equation $\partial^2 g/\partial x^2 = 0$, which is

$$T^*(x) = \frac{2\lambda}{k_B}\, x\,(1-x) . \qquad (2.394)$$

Since $x(1-x)$ achieves its maximum value of $\tfrac{1}{4}$ at $x = \tfrac{1}{2}$, we have $T^* \le \lambda/2k_B$.
In Fig. 2.34 we sketch the free energy $g(T,p,x)$ versus $x$ for three representative temperatures. For $T > \lambda/2k_B$, the free energy is everywhere convex in $x$. When $T < \lambda/2k_B$, the free energy resembles the blue curve in Fig. 2.34, and the system is unstable to phase separation. The two phases are said to be immiscible, or, equivalently, there exists a solubility gap. To determine the coexistence curve, we perform a Maxwell construction, writing

$$\frac{g(x_2) - g(x_1)}{x_2 - x_1} = \left.\frac{\partial g}{\partial x}\right|_{x_1} = \left.\frac{\partial g}{\partial x}\right|_{x_2} . \qquad (2.395)$$

Here, $x_1$ and $x_2$ are the boundaries of the two-phase region. These equations admit a symmetry under $x \leftrightarrow 1 - x$, hence we can set $x_1 = x$ and $x_2 = 1 - x$. We find

$$g(1-x) - g(x) = (1-2x)\,\big(\mu^0_B - \mu^0_A\big) , \qquad (2.396)$$

and invoking eqns. 2.395 and 2.392 we obtain the solution

$$T_{\rm coex}(x) = \frac{\lambda}{k_B} \cdot \frac{1-2x}{\ln\!\left(\frac{1-x}{x}\right)} . \qquad (2.397)$$
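Eqns. 2.394 and 2.397 are easy to tabulate, and the coexistence composition at a given temperature can be found by solving eqn. 2.397 for $x \in (0, \tfrac{1}{2})$. With $u = 1 - 2x$, eqn. 2.397 reads $k_B T_{\rm coex}/\lambda = u/(2\,{\rm artanh}\, u)$, which decreases with $u$, so $T_{\rm coex}$ increases monotonically in $x$ on this interval and simple bisection is safe. The sketch below works in units $\lambda = k_B = 1$; the function names are illustrative.

```python
import math

# Units: lambda = k_B = 1, so T_c = 1/2.

def t_spinodal(x):
    return 2.0 * x * (1.0 - x)                        # eqn. 2.394

def t_coex(x):
    return (1.0 - 2.0 * x) / math.log((1.0 - x) / x)  # eqn. 2.397, 0 < x < 1/2

def coexistence_x(T, tol=1e-12):
    """A-rich boundary x1 of the two-phase region at T < 1/2; then x2 = 1 - x1.

    Bisection works because t_coex increases monotonically on (0, 1/2).
    """
    lo, hi = 1e-300, 0.5 - 1e-15
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if t_coex(mid) < T:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for x in (0.1, 0.25, 0.4, 0.49):
    print(x, t_spinodal(x), t_coex(x))   # t_spinodal <= t_coex < 1/2

x1 = coexistence_x(0.4)
print(x1, 1.0 - x1)                      # coexisting compositions at T = 0.4
```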

Figure 2.35: Upper panels: chemical potential shifts $\Delta\mu_\pm = \Delta\mu_A \pm \Delta\mu_B$ versus concentration $x = x_B$. The dashed black line is the spinodal, and the solid black line the coexistence boundary. Temperatures range from $T = 0$ (dark blue) to $T = 0.6\,\lambda/k_B$ (red) in steps of $0.1\,\lambda/k_B$. Lower panels: phase diagram in the $(T, \Delta\mu_\pm)$ planes. The black dot is the critical point.

The phase diagram for the binary system is shown in Fig. 2.36. For $T < T^*(x)$, the system is unstable, and spinodal decomposition occurs. For $T^*(x) < T < T_{\rm coex}(x)$, the system is metastable, just like the van der Waals gas in its corresponding regime. Real binary solutions behave qualitatively like the model discussed here, although the coexistence curve is generally not symmetric under $x \leftrightarrow 1 - x$, and the single phase region extends down to low temperatures for $x \approx 0$ and $x \approx 1$. If $\lambda$ itself is temperature-dependent, there can be multiple solutions to eqns. 2.394 and 2.397. For example, one could take

$$\lambda(T) = \frac{\lambda_0\, T^2}{T^2 + T_0^2} . \qquad (2.398)$$

In this case, $k_B T > \lambda(T)/2$ at both high and low temperatures, and we expect the single phase region to be reentrant. Such a phenomenon occurs in water-nicotine mixtures, for example.
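Applying the instability criterion $k_B T < \lambda(T)/2$ at each temperature with the illustrative form 2.398, and dividing by $T$, gives the quadratic condition $T^2 - (\lambda_0/2k_B)\,T + T_0^2 < 0$. A miscibility gap therefore exists only if $\lambda_0 > 4 k_B T_0$, and it is bounded by a lower and an upper critical temperature. The sketch below (units $k_B = 1$, hypothetical $\lambda_0$ and $T_0$) computes that window.

```python
import math

def lam(T, lam0, T0):
    return lam0 * T**2 / (T**2 + T0**2)          # eqn. 2.398

def miscibility_gap(lam0, T0):
    """Window where k_B T < lambda(T)/2, i.e. T^2 - (lam0/2) T + T0^2 < 0.

    Units: k_B = 1. Returns (T_lower, T_upper), or None if the mixture
    is single phase at all temperatures.
    """
    disc = (lam0 / 4.0) ** 2 - T0**2
    if disc <= 0.0:
        return None
    r = math.sqrt(disc)
    return lam0 / 4.0 - r, lam0 / 4.0 + r

lam0, T0 = 10.0, 1.0                 # hypothetical parameters
Tl, Tu = miscibility_gap(lam0, T0)
print(Tl, Tu)                        # two-phase region only for Tl < T < Tu
print(miscibility_gap(3.0, 1.0))     # None: lam0 < 4 T0, always miscible
```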

Figure 2.36: Phase diagram for the binary system. The black curve is the coexistence curve, and the
dark red curve is the spinodal. A-rich material is to the left and B-rich to the right.

It is instructive to consider the phase diagram in the $(T,\mu)$ plane. We define the chemical potential shifts,

$$\Delta\mu_A \equiv \mu_A - \mu^0_A = k_B T \ln(1-x) + \lambda\, x^2 \qquad (2.399)$$

$$\Delta\mu_B \equiv \mu_B - \mu^0_B = k_B T \ln x + \lambda\, (1-x)^2 , \qquad (2.400)$$

and their sum and difference, $\Delta\mu_\pm \equiv \Delta\mu_A \pm \Delta\mu_B$. From the Gibbs-Duhem relation, we know that we can write $\mu_B$ as a function of $T$, $p$, and $\mu_A$. Alternately, we could write $\Delta\mu_\pm$ in terms of $T$, $p$, and $\Delta\mu_\mp$, so we can choose which among $\Delta\mu_+$ and $\Delta\mu_-$ we wish to use in our phase diagram. The results are plotted in Fig. 2.35. It is perhaps easiest to understand the phase diagram in the $(T, \Delta\mu_-)$ plane. At low temperatures, below $T = T_c = \lambda/2k_B$, there is a first order phase transition at $\Delta\mu_- = 0$. For $T < T_c$ and $\Delta\mu_- = 0^+$, i.e. infinitesimally positive, the system is in the A-rich phase, but for $\Delta\mu_- = 0^-$, i.e. infinitesimally negative, it is B-rich. The concentration $x = x_B$ changes discontinuously across the phase boundary. The critical point lies at $(T, \Delta\mu_-) = (\lambda/2k_B, 0)$.
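If we choose $N = N_A + N_B$ to be the extensive variable, then fixing $N$ means $dN_A + dN_B = 0$. So at fixed $T$ and $p$,

$$dG\big|_{T,p} = \mu_A\, dN_A + \mu_B\, dN_B \quad\Rightarrow\quad dg\big|_{T,p} = -\Delta\mu_-\, dx , \qquad (2.401)$$

where we have dropped the term $(\mu^0_B - \mu^0_A)\,dx$, which is linear in $x$ and does not affect the Maxwell construction. Since $\Delta\mu_-(x,T) = \varphi(x,T) - \varphi(1-x,T) = -\Delta\mu_-(1-x,T)$, where $\varphi(x,T) = \lambda x - k_B T \ln x$, we have that the coexistence boundary in the $(x, \Delta\mu_-)$ plane is simply the line $\Delta\mu_- = 0$, because $\int_x^{1-x} dx'\, \Delta\mu_-(x',T) = 0$.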

Note also that there is no two-phase region in the (T, ∆µ) plane; the phase boundary in this plane is
a curve which terminates at a critical point. As we saw in §2.12, the same situation pertains in single
component (p, v, T ) systems. That is, the phase diagram in the (p, v) or (T, v) plane contains two-phase
regions, but in the (p, T ) plane the boundaries between phases are one-dimensional curves. Any two-phase
behavior is confined to these curves, where the thermodynamic potentials are singular.

Figure 2.37: Gibbs free energy per particle $g$ for an ideal binary solution for temperatures $T \in [T^*_A, T^*_B]$. The Maxwell construction is shown for the case $T^*_A < T < T^*_B$. Right: phase diagram, showing the two-phase region and distillation sequence in $(x,T)$ space.

Phase separation of this kind can be seen in a number of systems. A popular example involves mixtures of water and ouzo or other anise-based liqueurs, such as arak and absinthe. Starting with the pure liqueur ($x = 1$), at a temperature below the coexistence curve maximum, the concentration is diluted by adding water. Follow along on Fig. 2.36 by starting at the point $(x = 1,\ k_B T/\lambda = 0.4)$ and moving to the left. Eventually, one hits the boundary of the two-phase region. At this point, the mixture turns milky, due to the formation of large droplets of the two coexisting phases (on either side of the coexistence region) which scatter light, a process known as spontaneous emulsification²³. As one continues to dilute the solution with more water, eventually one passes all the way through the coexistence region, at which point the solution becomes clear once again and is described as a single phase.
What happens if $\lambda < 0$? In this case, both the entropy and the interaction energy prefer a mixed phase, and there is no instability to phase separation. The two fluids are said to be completely miscible. An example would be benzene, C₆H₆, and toluene, C₇H₈ (i.e. C₆H₅CH₃). The phase diagram would be blank, with no phase boundaries below the boiling transition, because the fluid could exist as a mixture in any proportion.
Any fluid will eventually boil if the temperature is raised sufficiently high. Let us assume that the boiling points of our A and B fluids are $T^*_{A,B}$, and without loss of generality let us take $T^*_A < T^*_B$ at some given fixed pressure²⁴. This means $\mu^L_A(T^*_A, p) = \mu^V_A(T^*_A, p)$ and $\mu^L_B(T^*_B, p) = \mu^V_B(T^*_B, p)$. What happens to the

23
An emulsion is a mixture of two or more immiscible liquids.
24
We assume the boiling temperatures are not exactly equal!

Figure 2.38: Negative (left) and positive (right) azeotrope phase diagrams. From Wikipedia.

mixture? We begin by writing the free energies of the mixed liquid and mixed vapor phases as

$$g_L(T,p,x) = (1-x)\,\mu^L_A(T,p) + x\,\mu^L_B(T,p) + k_B T\,\big[x\ln x + (1-x)\ln(1-x)\big] + \lambda_L\, x(1-x) \qquad (2.402)$$

$$g_V(T,p,x) = (1-x)\,\mu^V_A(T,p) + x\,\mu^V_B(T,p) + k_B T\,\big[x\ln x + (1-x)\ln(1-x)\big] + \lambda_V\, x(1-x) . \qquad (2.403)$$

Typically λV ≈ 0. Consider these two free energies as functions of the concentration x, at fixed T and
p. If the curves never cross, and gL (x) < gV (x) for all x ∈ [0, 1], then the liquid is always the state of
lowest free energy. This is the situation in the first panel of Fig. 2.37. Similarly, if gV (x) < gL (x) over
this range, then the mixture is in the vapor phase throughout. What happens if the two curves cross at
some value of x? This situation is depicted in the second panel of Fig. 2.37. In this case, there is always
a Maxwell construction which lowers the free energy throughout some range of concentration, i.e. the
system undergoes phase separation.
In an ideal fluid, we have $\lambda_L = \lambda_V = 0$, and setting $g_L = g_V$ requires

$$(1-x)\,\Delta\mu_A(T,p) + x\,\Delta\mu_B(T,p) = 0 , \qquad (2.404)$$

where $\Delta\mu_{A/B}(T,p) = \mu^L_{A/B}(T,p) - \mu^V_{A/B}(T,p)$. Expanding the chemical potential about a given temperature $T^*$,

$$\mu(T,p) = \mu(T^*,p) - s(T^*,p)\,(T - T^*) - \frac{c_p(T^*,p)}{2T^*}\,(T - T^*)^2 + \ldots , \qquad (2.405)$$

where we have used $(\partial\mu/\partial T)_{p,N} = -(\partial S/\partial N)_{T,p} = -s(T,p)$, the entropy per particle, and $(\partial s/\partial T)_{p,N} = c_p/T$.

Figure 2.39: Free energies before Maxwell constructions for a binary fluid mixture in equilibrium with a vapor ($\lambda_V = 0$). Panels show (a) $\lambda_L = 0$ (ideal fluid), (b) $\lambda_L < 0$ (miscible fluid; negative azeotrope), (c) $\lambda_L > 0$ (positive azeotrope), (d) $\lambda_L > 0$ (heteroazeotrope). Thick blue and red lines correspond to temperatures $T^*_A$ and $T^*_B$, respectively, with $T^*_A < T^*_B$. Thin blue and red curves are for temperatures outside the range $[T^*_A, T^*_B]$. The black curves show the locus of points where $g$ is discontinuous, i.e. where the liquid and vapor free energy curves cross. The yellow curve in (d) corresponds to the coexistence temperature for the fluid mixture. In this case the azeotrope forms within the coexistence region.

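Thus, expanding $\Delta\mu_{A/B}$ about $T^*_{A/B}$, we have

$$\begin{aligned}
\Delta\mu_A &\equiv \mu^L_A - \mu^V_A = (s^V_A - s^L_A)(T - T^*_A) + \frac{c^V_{pA} - c^L_{pA}}{2T^*_A}\,(T - T^*_A)^2 + \ldots \\
\Delta\mu_B &\equiv \mu^L_B - \mu^V_B = (s^V_B - s^L_B)(T - T^*_B) + \frac{c^V_{pB} - c^L_{pB}}{2T^*_B}\,(T - T^*_B)^2 + \ldots
\end{aligned} \qquad (2.406)$$

We assume $s^V_{A/B} > s^L_{A/B}$, i.e. the vapor phase has greater entropy per particle. Thus, $\Delta\mu_{A/B}(T)$ changes sign from negative to positive as $T$ rises through $T^*_{A/B}$. If we assume that these are the only sign changes for $\Delta\mu_{A/B}(T)$ at fixed $p$, then eqn. 2.404 can only be solved for $T \in [T^*_A, T^*_B]$: for $T < T^*_A$ both $\Delta\mu_A$ and $\Delta\mu_B$ are negative, while for $T > T^*_B$ both are positive. This immediately leads to the phase diagram in the rightmost panel of Fig. 2.37.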
According to the Gibbs phase rule, with $\sigma = 2$, two-phase equilibrium ($\varphi = 2$) occurs along a subspace of dimension $d_{\rm PE} = 2 + \sigma - \varphi = 2$. Thus, if we fix the pressure $p$ and the concentration $x = x_B$, liquid-gas equilibrium occurs at a particular temperature $T^*$, known as the boiling point. Since the liquid and the vapor with which it is in equilibrium at $T^*$ may have different compositions, i.e. different values of $x$, one may distill the mixture to separate the two pure substances, as follows. First, given a liquid mixture of A and B, we bring it to boiling, as shown in the rightmost panel of Fig. 2.37. The vapor is at a different concentration $x$ than the liquid (a lower value of $x$ if the boiling point of pure A is less than that of pure B, as shown). If we collect the vapor, the remaining fluid is at a higher value of $x$. The collected vapor is then condensed, forming a liquid at the lower $x$ value. This is then brought to a boil, and the resulting vapor is drawn off and condensed, and so on. The result is a purified A state. The remaining liquid is then at a higher B concentration. By repeated boiling and condensation, A and B can be separated. For liquid-vapor transitions, the upper curve, representing the lowest temperature at a given concentration for which the mixture is a homogeneous vapor, is called the dew point curve. The lower curve, representing the highest temperature at a given concentration for which the mixture is a homogeneous liquid, is called the bubble point curve. The same phase diagram applies to liquid-solid mixtures where both phases are completely miscible. In that case, the upper curve is called the liquidus, and the lower curve the solidus.
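For the ideal case ($\lambda_L = \lambda_V = 0$) the two curves in the rightmost panel of Fig. 2.37 can be computed in closed form. The common-tangent construction is equivalent to equating $\mu_A$ and $\mu_B$ between the phases, and with $\mu_A = \mu^0_A + k_B T\ln(1-x)$, $\mu_B = \mu^0_B + k_B T \ln x$ (eqns. 2.399 and 2.400 at $\lambda = 0$) this gives $(1-x_V)/(1-x_L) = e^{\Delta\mu_A/k_BT}$ and $x_V/x_L = e^{\Delta\mu_B/k_BT}$. The sketch below keeps only the linear term of eqn. 2.406 and assumes hypothetical boiling points and an entropy of vaporization of $10\,k_B$ per particle for both species. At each $T$ it returns the liquid composition (on the bubble point curve) and the vapor composition (on the dew point curve).

```python
import math

def tie_line(T, TA, TB, ds_A=10.0, ds_B=10.0):
    """Compositions (x_L, x_V), x = x_B, of coexisting liquid and vapor at T.

    Ideal solution (lambda_L = lambda_V = 0). Equal mu_A and mu_B in both
    phases give (1 - x_V)/(1 - x_L) = exp(dmu_A/kT) and x_V/x_L = exp(dmu_B/kT),
    with dmu = mu^L - mu^V kept to linear order in eqn. 2.406:
    dmu = (s^V - s^L)(T - T*). ds_A, ds_B are s^V - s^L in units of k_B
    (assumed values). Valid for TA < T < TB.
    """
    a = math.exp(ds_A * (T - TA) / T)
    b = math.exp(ds_B * (T - TB) / T)
    x_L = (1.0 - a) / (b - a)
    x_V = b * x_L
    return x_L, x_V

TA, TB = 350.0, 380.0          # hypothetical boiling points of A and B (K)
for T in (355.0, 365.0, 375.0):
    x_L, x_V = tie_line(T, TA, TB)
    print(T, round(x_L, 3), round(x_V, 3))   # vapor is richer in A: x_V < x_L
```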
When a homogeneous liquid or vapor at concentration $x$ is heated or cooled to a temperature $T$ such that $(x,T)$ lies within the two-phase region, the mixture phase separates into the two end states $(x^*_L, T)$ and $(x^*_V, T)$, which lie on the two branches of the boundary of the two-phase region, at the same temperature. The locus of points at constant $T$ joining these two points is called the tie line. To determine how much of each of these two homogeneous phases separates out, we use particle number conservation. If $\eta_{L,V}$ is the fraction of particles in the homogeneous liquid and homogeneous vapor phases, with $\eta_L + \eta_V = 1$, then $\eta_L x^*_L + \eta_V x^*_V = x$, which says $\eta_L = (x - x^*_V)/(x^*_L - x^*_V)$ and $\eta_V = (x - x^*_L)/(x^*_V - x^*_L)$. This is known as the lever rule.
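The lever rule is a two-line computation. The minimal sketch below (the function name is illustrative) returns the particle fractions in each phase from the overall composition $x$ and the tie-line endpoints $x^*_L$, $x^*_V$, and refuses compositions that do not lie on the tie line, since such a mixture is single phase.

```python
def lever_rule(x, x_L, x_V):
    """Return (eta_L, eta_V): fractions of particles in the liquid and vapor.

    Solves eta_L * x_L + eta_V * x_V = x with eta_L + eta_V = 1.
    """
    if not min(x_L, x_V) <= x <= max(x_L, x_V):
        raise ValueError("x lies off the tie line: the mixture is single phase")
    eta_L = (x - x_V) / (x_L - x_V)
    eta_V = (x - x_L) / (x_V - x_L)
    return eta_L, eta_V

print(lever_rule(0.45, 0.6, 0.4))   # mostly vapor, since x is closer to x_V
```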
For many binary mixtures, the boiling point curve is as shown in Fig. 2.38. Such cases are called azeotropes. For negative azeotropes, the maximum of the boiling curve lies above both $T^*_{A,B}$. The free energy curves for this case are shown in panel (b) of Fig. 2.39. For $x < x^*$, where $x^*$ is the azeotropic composition, one can distill A but not B. Similarly, for $x > x^*$ one can distill B but not A. The situation is different for positive azeotropes, where the minimum of the boiling curve lies below both $T^*_{A,B}$, corresponding to the free energy curves in panel (c) of Fig. 2.39. In this case, distillation (i.e. condensing and reboiling the collected vapor) from either side of $x^*$ results in the azeotrope. One can of course collect the fluid instead of the vapor. In general, for both positive and negative azeotropes, starting from a given concentration $x$, one can only arrive at pure A plus azeotrope (if $x < x^*$) or pure B plus azeotrope (if $x > x^*$). Ethanol (C₂H₅OH) and water (H₂O) form a positive azeotrope which is 95.6% ethanol and 4.4% water by weight. The individual boiling points are $T^*_{\rm C_2H_5OH} = 78.4$ °C and $T^*_{\rm H_2O} = 100$ °C, while the azeotrope boils at $T^*_{\rm AZ} = 78.2$ °C. No amount of distillation of this mixture can purify ethanol beyond the 95.6% level. To go beyond this level of purity, one must resort to azeotropic distillation, which involves introducing another component, such as benzene (or a less carcinogenic additive), which alters the molecular interactions.
To model the azeotrope system, we need to take $\lambda_L \ne 0$, in which case one can find two solutions to the free energy crossing condition $g_V(x) = g_L(x)$. With two such crossings come two Maxwell constructions, hence the phase diagrams in Fig. 2.38. Generally, negative azeotropes are found in systems with $\lambda_L < 0$, whereas positive azeotropes are found when $\lambda_L > 0$. As we've seen, such repulsive interactions between

Figure 2.40: Phase diagram for a eutectic mixture in which a liquid L is in equilibrium with two solid
phases α and β. The same phase diagram holds for heteroazeotropes, where a vapor is in equilibrium
with two liquid phases.

the A and B components in general lead to a phase separation below a coexistence temperature TCOEX (x)
given by eqn. 2.397. What happens if the minimum boiling point lies within the coexistence region?
This is the situation depicted in panel (d) of Fig. 2.39. The system is then a liquid/vapor version of the
solid/liquid eutectic (see Fig. 2.40), and the minimum boiling point mixture is called a heteroazeotrope.

2.14 Some Concepts in Thermochemistry

2.14.1 Chemical reactions and the law of mass action

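Suppose we have a chemical reaction among $\sigma$ species, written as

$$\zeta_1 A_1 + \zeta_2 A_2 + \cdots + \zeta_\sigma A_\sigma = 0 , \qquad (2.407)$$

where

$A_a$ = chemical formula
$\zeta_a$ = stoichiometric coefficient .

For example, we could have

$$-3\,{\rm H_2} - {\rm N_2} + 2\,{\rm NH_3} = 0 \qquad (3\,{\rm H_2} + {\rm N_2} \rightleftharpoons 2\,{\rm NH_3}) \qquad (2.408)$$

for which

$$\zeta({\rm H_2}) = -3 , \quad \zeta({\rm N_2}) = -1 , \quad \zeta({\rm NH_3}) = 2 . \qquad (2.409)$$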

When $\zeta_a > 0$, the corresponding $A_a$ is a product; when $\zeta_a < 0$, the corresponding $A_a$ is a reactant. The bookkeeping of the coefficients $\zeta_a$ which ensures conservation of each individual species of atom in the reaction(s) is known as stoichiometry²⁵.
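Stoichiometric bookkeeping amounts to checking that $\sum_a \zeta_a\, n_{ja} = 0$ for every element $j$, where $n_{ja}$ counts the atoms of element $j$ in molecule $A_a$. A short sketch for eqn. 2.408 follows, with each formula entered by hand as an element-count dictionary (our own representation, not a chemistry library).

```python
from collections import Counter

# Species of eqn. 2.408 as (zeta_a, {element: atoms per molecule})
reaction = [
    (-3, {"H": 2}),            # H2  (reactant)
    (-1, {"N": 2}),            # N2  (reactant)
    (+2, {"N": 1, "H": 3}),    # NH3 (product)
]

def atom_balance(reaction):
    """Net change in the number of atoms of each element: sum_a zeta_a n_{ja}."""
    net = Counter()
    for zeta, formula in reaction:
        for element, n in formula.items():
            net[element] += zeta * n
    return dict(net)

print(atom_balance(reaction))               # {'H': 0, 'N': 0}: balanced
print(sum(zeta for zeta, _ in reaction))    # sum_a zeta_a = -2
```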
Now we ask: what are the conditions for equilibrium? At constant $T$ and $p$, which is typical for many chemical reactions, the condition is that $G\big(T,p,\{N_a\}\big)$ be a minimum. Now

$$dG = -S\,dT + V\,dp + \sum_a \mu_a\, dN_a , \qquad (2.410)$$

so if we let the reaction go forward, we have $dN_a = \zeta_a$, and if it runs in reverse we have $dN_a = -\zeta_a$. Since $G$ cannot decrease in either direction at a minimum, setting $dT = dp = 0$ we have the equilibrium condition

$$\sum_{a=1}^\sigma \zeta_a\, \mu_a = 0 . \qquad (2.411)$$

Let us investigate the consequences of this relation for ideal gases. The chemical potential of the $a^{\rm th}$ species is

$$\mu_a(T,p) = k_B T\, \phi_a(T) + k_B T \ln p_a . \qquad (2.412)$$

Here $p_a = p\, x_a$ is the partial pressure of species $a$, where $x_a = N_a/\sum_b N_b$ is the dimensionless concentration of species $a$. Chemists sometimes write $x_a = [A_a]$ for the concentration of species $a$. In equilibrium we must have

$$\sum_a \zeta_a \Big[ \ln p + \ln x_a + \phi_a(T) \Big] = 0 , \qquad (2.413)$$

which says

$$\sum_a \zeta_a \ln x_a = -\sum_a \zeta_a \ln p - \sum_a \zeta_a\, \phi_a(T) . \qquad (2.414)$$

Exponentiating, we obtain the law of mass action:

$$\prod_a x_a^{\zeta_a} = p^{-\sum_a \zeta_a} \exp\!\Big( -\sum_a \zeta_a\, \phi_a(T) \Big) \equiv \kappa(p,T) . \qquad (2.415)$$
a a
25
Antoine Lavoisier, the "father of modern chemistry", made pioneering contributions in both chemistry and biology. In
particular, he is often credited as the progenitor of stoichiometry. An aristocrat by birth, Lavoisier was an administrator of
the Ferme générale, an organization in pre-revolutionary France which collected taxes on behalf of the king. At the age of
28, Lavoisier married Marie-Anne Pierette Paulze, the 13-year-old daughter of one of his business partners. She would later
join her husband in his research, and she played a role in his disproof of the phlogiston theory of combustion. The phlogiston
theory was superseded by Lavoisier’s work, where, based on contemporary experiments by Joseph Priestley, he correctly
identified the pivotal role played by oxygen in both chemical and biological processes (i.e. respiration). Despite his fame as
a scientist, Lavoisier succumbed to the Reign of Terror. His association with the Ferme générale, which collected taxes from
the poor and the downtrodden, was a significant liability in revolutionary France (think Mitt Romney vis-a-vis Bain Capital).
Furthermore – and let this be a lesson to all of us – Lavoisier had unwisely ridiculed a worthless pseudoscientific pamphlet,
ostensibly on the physics of fire, and its author, Jean-Paul Marat. Marat was a journalist with scientific pretensions, but
apparently little in the way of scientific talent or acumen. Lavoisier effectively blackballed Marat’s candidacy to the French
Academy of Sciences, and the time came when Marat sought revenge. Marat was instrumental in getting Lavoisier and
other members of the Ferme générale arrested on charges of counterrevolutionary activities, and on May 8, 1794, after a
trial lasting less than a day, Lavoisier was guillotined. Along with Fourier and Carnot, Lavoisier’s name is one of the 72
engraved on the Eiffel Tower. Source: http://www.vigyanprasar.gov.in/scientists/ALLavoisier.htm.

The quantity $\kappa(p,T)$ is called the equilibrium constant. When $\kappa$ is large, the LHS of the above equation is large. This favors maximal concentration $x_a$ for the products ($\zeta_a > 0$) and minimal concentration $x_a$ for the reactants ($\zeta_a < 0$). This means that the equation REACTANTS ⇋ PRODUCTS is shifted to the right, i.e. the products are plentiful and the reactants are scarce. When $\kappa$ is small, the LHS is small and the reaction is shifted to the left, i.e. the reactants are plentiful and the products are scarce. Remember we are describing equilibrium conditions here. Now, since $\kappa \propto p^{-\sum_a \zeta_a}$, we observe that reactions for which $\sum_a \zeta_a > 0$ shift to the left with increasing pressure and shift to the right with decreasing pressure, while for reactions with $\sum_a \zeta_a < 0$ the situation is reversed: they shift to the right with increasing pressure and to the left with decreasing pressure. When $\sum_a \zeta_a = 0$ there is no shift upon increasing or decreasing pressure.
The rate at which the equilibrium constant changes with temperature is given by
$$ \left( \frac{\partial \ln \kappa}{\partial T} \right)_{\!p} = -\sum_a \zeta_a\, \phi_a'(T) \,. \tag{2.416} $$

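Now from eqn. 2.412 we have that the enthalpy per particle for species $a$ is
$$ h_a = \mu_a - T \left( \frac{\partial \mu_a}{\partial T} \right)_{\!p} , \tag{2.417} $$
since $H = G + TS$ and $S = -\left( \partial G / \partial T \right)_p$. We find
$$ h_a = -k_B T^2\, \phi_a'(T) \,, \tag{2.418} $$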


and thus
$$ \left( \frac{\partial \ln \kappa}{\partial T} \right)_{\!p} = \frac{\sum_a \zeta_a h_a}{k_B T^2} = \frac{\Delta h}{k_B T^2} \,, \tag{2.419} $$
where $\Delta h$ is the enthalpy of the reaction, which is the heat absorbed or emitted as a result of the reaction. When $\Delta h > 0$ the reaction is endothermic and the yield increases with increasing $T$. When $\Delta h < 0$ the reaction is exothermic and the yield decreases with increasing $T$.
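Eqn. 2.419 is the van 't Hoff equation. If one assumes (an assumption not made in the text) that $\Delta h$ is essentially constant between $T_1$ and $T_2$, it integrates to $\ln[\kappa(T_2)/\kappa(T_1)] = -(\Delta h/R)(1/T_2 - 1/T_1)$, where working per mole replaces $k_B$ by the gas constant $R$. The Python sketch below evaluates this closed form and cross-checks it by integrating eqn. 2.419 numerically; the value $\Delta h = \pm 50$ kJ/mol is an arbitrary illustration, not data for any particular reaction.

```python
import math

R = 8.314462618  # gas constant, J/(mol K); Delta h is molar, so k_B -> R


def kappa_ratio(delta_h, T1, T2):
    """kappa(T2)/kappa(T1) from eqn. (2.419), assuming Delta h (J/mol) is constant."""
    return math.exp(-delta_h / R * (1.0 / T2 - 1.0 / T1))


def kappa_ratio_numeric(delta_h, T1, T2, n=10_000):
    """Same ratio by midpoint-rule integration of d ln(kappa)/dT = Delta h / (R T^2)."""
    dT = (T2 - T1) / n
    ln_ratio = sum(delta_h / (R * (T1 + (k + 0.5) * dT) ** 2) for k in range(n)) * dT
    return math.exp(ln_ratio)


endo = kappa_ratio(+50e3, 300.0, 350.0)  # endothermic: kappa grows with T
exo = kappa_ratio(-50e3, 300.0, 350.0)   # exothermic: kappa shrinks with T
print(endo, exo, kappa_ratio_numeric(+50e3, 300.0, 350.0))
```

With a constant $\Delta h$ the exothermic ratio is simply the reciprocal of the endothermic one, consistent with the sign rule stated above.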
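As an example, consider the reaction H₂ + I₂ ⇋ 2 HI. We have
$$ \zeta(\mathrm{H_2}) = -1 \,, \qquad \zeta(\mathrm{I_2}) = -1 \,, \qquad \zeta(\mathrm{HI}) = 2 \,. \tag{2.420} $$
Suppose our initial system consists of $\nu_1^0$ moles of H₂, $\nu_2^0 = 0$ moles of I₂, and $\nu_3^0$ moles of undissociated HI. These mole numbers determine the initial concentrations $x_a^0$, where $x_a^0 = \nu_a^0 \big/ \sum_b \nu_b^0$. Define the fraction of the initial HI which has dissociated,
$$ \alpha \equiv \frac{x_3^0 - x_3}{x_3^0} \,, \tag{2.421} $$
in which case (the total number of moles does not change, since $\sum_a \zeta_a = 0$) we have
$$ x_1 = x_1^0 + \tfrac{1}{2} \alpha\, x_3^0 \,, \qquad x_2 = \tfrac{1}{2} \alpha\, x_3^0 \,, \qquad x_3 = (1 - \alpha)\, x_3^0 \,. \tag{2.422} $$
Then the law of mass action gives
$$ \frac{4\,(1 - \alpha)^2}{\alpha\, (\alpha + 2r)} = \kappa \,, \tag{2.423} $$
where $r \equiv x_1^0 / x_3^0 = \nu_1^0 / \nu_3^0$. This yields a quadratic equation, which can be solved to find $\alpha(\kappa, r)$. Note that $\kappa = \kappa(T)$ for this reaction since $\sum_a \zeta_a = 0$. The enthalpy of this reaction is positive: $\Delta h > 0$.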
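Clearing denominators in eqn. 2.423 gives $(4 - \kappa)\alpha^2 - (8 + 2\kappa r)\alpha + 4 = 0$. The function $4(1-\alpha)^2 - \kappa\alpha(\alpha + 2r)$ equals $4 > 0$ at $\alpha = 0$ and $-\kappa(1 + 2r) < 0$ at $\alpha = 1$, so exactly one root is physical. A minimal Python sketch (the function name is ours) uses the numerically stable form $\alpha = 2c / \big(B + \sqrt{B^2 - 4ac}\big)$ with $B = 8 + 2\kappa r$, which picks out that root directly and needs no special case at $\kappa = 4$, where the quadratic degenerates to a linear equation.

```python
import math


def dissociated_fraction(kappa: float, r: float) -> float:
    """Fraction alpha of the initial HI that has dissociated, from eqn. (2.423).

    4 (1 - alpha)^2 = kappa alpha (alpha + 2 r) is rewritten as
    (4 - kappa) alpha^2 - B alpha + 4 = 0 with B = 8 + 2 kappa r.
    """
    if kappa <= 0.0 or r < 0.0:
        raise ValueError("need kappa > 0 and r >= 0")
    B = 8.0 + 2.0 * kappa * r
    disc = B * B - 16.0 * (4.0 - kappa)  # = 16 kappa + 32 kappa r + 4 kappa^2 r^2 > 0
    # Stable root formula 2c / (B + sqrt(disc)) with c = 4; the denominator exceeds 8,
    # so the result lies strictly between 0 and 1.
    return 8.0 / (B + math.sqrt(disc))


for kappa, r in [(1.0, 0.0), (4.0, 1.0), (50.0, 0.3)]:
    print(kappa, r, dissociated_fraction(kappa, r))
```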

Formula    Name                   State      ΔH_f^0 (kJ/mol)
Ag         Silver                 crystal          0.0
C          Graphite               crystal          0.0
C          Diamond                crystal          1.9
O₃         Ozone                  gas            142.7
H₂O        Water                  liquid        -285.8
H₃BO₃      Boric acid             crystal      -1094.3
ZnSO₄      Zinc sulfate           crystal       -982.8
NiSO₄      Nickel sulfate         crystal       -872.9
Al₂O₃      Aluminum oxide         crystal      -1657.7
Ca₃P₂O₈    Calcium phosphate      crystal      -4120.8
HCN        Hydrogen cyanide       liquid         108.9
SF₆        Sulfur hexafluoride    gas          -1220.5
CaF₂       Calcium fluoride       crystal      -1228.0
CaCl₂      Calcium chloride       crystal       -795.4

Table 2.5: Enthalpies of formation of some common substances.

2.14.2 Enthalpy of formation

Most chemical reactions take place under constant pressure. The heat $Q_{if}$ associated with a given isobaric process is
$$ Q_{if} = \int_i^f dE + \int_i^f p\, dV = (E_f - E_i) + p\,(V_f - V_i) = H_f - H_i \,, \tag{2.424} $$
where $H$ is the enthalpy,
$$ H = E + pV \,. \tag{2.425} $$
Note that the enthalpy H is a state function, since E is a state function and p and V are state variables.
Hence, we can meaningfully speak of changes in enthalpy: ∆H = Hf − Hi . If ∆H < 0 for a given reaction,
we call it exothermic – this is the case when Qif < 0 and thus heat is transferred to the surroundings. Such
reactions can occur spontaneously, and, in really fun cases, can produce explosions. The combustion of
fuels is always exothermic. If ∆H > 0, the reaction is called endothermic. Endothermic reactions require
that heat be supplied in order for the reaction to proceed. Photosynthesis is an example of an endothermic
reaction.
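Suppose we have two reactions
$$ A + B \xrightarrow{\;(\Delta H)_1\;} C \tag{2.426} $$
and
$$ C + D \xrightarrow{\;(\Delta H)_2\;} E \,. \tag{2.427} $$
Then we may write
$$ A + B + D \xrightarrow{\;(\Delta H)_3\;} E \,, \tag{2.428} $$
with
$$ (\Delta H)_1 + (\Delta H)_2 = (\Delta H)_3 \,. \tag{2.429} $$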
We can use this additivity of reaction enthalpies to define a standard molar enthalpy of formation. We first define the standard state of a pure substance at a given temperature to be its state (gas, liquid, or solid) at a pressure $p = 1$ bar. The standard reaction enthalpies at a given temperature are then defined to be the reaction enthalpies when the reactants and products are all in their standard states. Finally, we define the standard molar enthalpy of formation $\Delta H_f^0(X)$ of a compound $X$ at temperature $T$ as the reaction enthalpy for the compound $X$ to be produced by its constituents when they are in their standard state. For example, if $X = \mathrm{SO_2}$, then we write
$$ \mathrm{S} + \mathrm{O_2} \xrightarrow{\;\Delta H_f^0[\mathrm{SO_2}]\;} \mathrm{SO_2} \,. \tag{2.430} $$

Figure 2.41: Left panel: reaction enthalpy and activation energy (exothermic case shown). Right panel: reaction enthalpy as a difference between enthalpy of formation of reactants and products.

The enthalpy of formation of any element in its standard state is zero at all temperatures, by definition: $\Delta H_f^0[\mathrm{O_2}] = \Delta H_f^0[\mathrm{He}] = \Delta H_f^0[\mathrm{K}] = \Delta H_f^0[\mathrm{Mn}] = 0$, etc.
Suppose now we have a reaction
$$ a\,A + b\,B \xrightarrow{\;\Delta H\;} c\,C + d\,D \,. \tag{2.431} $$
To compute the reaction enthalpy $\Delta H$, we can imagine forming the components A and B from their standard state constituents. Similarly, we can imagine doing the same for C and D. Since the number of atoms of a given kind is conserved in the process, the constituents of the reactants must be the same as those of the products, and we have
$$ \Delta H = -a\, \Delta H_f^0(A) - b\, \Delta H_f^0(B) + c\, \Delta H_f^0(C) + d\, \Delta H_f^0(D) \,. \tag{2.432} $$

A list of a few enthalpies of formation is provided in Table 2.5. Note that the reaction enthalpy is independent of the actual reaction path. That is, the difference in enthalpy between A and B is the same whether the reaction is A → B or A → X → (Y + Z) → B. This statement is known as Hess's Law.
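Eqn. 2.432 is a signed sum, $\Delta H = \sum_a \zeta_a\, \Delta H_f^0(a)$, with the same sign convention ($\zeta_a > 0$ for products) used for the law of mass action. The Python sketch below applies it to a few reactions built from Table 2.5, with zero for the elements H₂, O₂, and graphite in their standard states; the dictionary keys and helper names are ours. Because the sum is linear in the stoichiometric coefficients, adding two reaction equations adds their enthalpies, which is Hess's law in bookkeeping form.

```python
# Standard molar enthalpies of formation, kJ/mol: values from Table 2.5, plus zeros
# for elements in their standard states (H2, O2, graphite).
DHF = {
    "C(graphite)": 0.0,
    "C(diamond)": 1.9,
    "H2(g)": 0.0,
    "O2(g)": 0.0,
    "O3(g)": 142.7,
    "H2O(l)": -285.8,
}


def reaction_enthalpy(stoich):
    """Delta H = sum_a zeta_a DHf(a); zeta > 0 for products, < 0 for reactants (eqn. 2.432)."""
    return sum(zeta * DHF[species] for species, zeta in stoich.items())


def combine(*reactions):
    """Add reaction equations species by species, as in Hess's law."""
    total = {}
    for rxn in reactions:
        for species, zeta in rxn.items():
            total[species] = total.get(species, 0) + zeta
    return total


graphite_to_diamond = {"C(graphite)": -1, "C(diamond)": 1}
ozone_to_oxygen = {"O3(g)": -2, "O2(g)": 3}       # 2 O3 -> 3 O2
water = {"H2(g)": -2, "O2(g)": -1, "H2O(l)": 2}    # 2 H2 + O2 -> 2 H2O(l)
print(reaction_enthalpy(graphite_to_diamond),
      reaction_enthalpy(ozone_to_oxygen),
      reaction_enthalpy(water))
print(reaction_enthalpy(combine(ozone_to_oxygen, water)))
```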
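Note that
$$ dH = dE + p\, dV + V\, dp = \bar{d}Q + V\, dp \,, \tag{2.433} $$
hence
$$ C_p = \left( \frac{\bar{d}Q}{dT} \right)_{\!p} = \left( \frac{\partial H}{\partial T} \right)_{\!p} . \tag{2.434} $$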

bond    kJ/mol    bond    kJ/mol    bond    kJ/mol    bond     kJ/mol
H−H       436     C−C       348     C−S       259     F−F        155
H−C       412     C=C       612     N−N       163     F−Cl       254
H−N       388     C≡C       811     N=N       409     Cl−Br      219
H−O       463     C−N       305     N≡N       945     Cl−I       210
H−F       565     C=N       613     N−O       157     Cl−S       250
H−Cl      431     C≡N       890     N−F       270     Br−Br      193
H−Br      366     C−O       360     N−Cl      200     Br−I       178
H−I       299     C=O       743     N−Si      374     Br−S       212
H−S       338     C−F       484     O−O       146     I−I        151
H−P       322     C−Cl      338     O=O       497     S−S        264
H−Si      318     C−Br      276     O−F       185     P−P        172
                  C−I       238     O−Cl      203     Si−Si      176

Table 2.6: Average bond enthalpies for some common bonds, in kJ/mol. (Source: L. Pauling, The Nature of the Chemical Bond (Cornell Univ. Press, NY, 1960).)

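We therefore have
$$ H(T, p, \nu) = H(T_0, p, \nu) + \nu \int_{T_0}^{T} \! dT'\, c_p(T') \,. \tag{2.435} $$
For ideal gases, we have $c_p(T) = \big(1 + \tfrac{1}{2} f\big) R$. For real gases, over a range of temperatures, there are small variations:
$$ c_p(T) = \alpha + \beta\, T + \gamma\, T^2 \,. \tag{2.436} $$
Two examples ($300\ \mathrm{K} < T < 1500\ \mathrm{K}$, $p = 1$ atm):
$$ \mathrm{O_2}: \quad \alpha = 25.503\, \frac{\mathrm{J}}{\mathrm{mol\,K}} \,, \quad \beta = 13.612 \times 10^{-3}\, \frac{\mathrm{J}}{\mathrm{mol\,K^2}} \,, \quad \gamma = -42.553 \times 10^{-7}\, \frac{\mathrm{J}}{\mathrm{mol\,K^3}} $$
$$ \mathrm{H_2O}: \quad \alpha = 30.206\, \frac{\mathrm{J}}{\mathrm{mol\,K}} \,, \quad \beta = 9.936 \times 10^{-3}\, \frac{\mathrm{J}}{\mathrm{mol\,K^2}} \,, \quad \gamma = 11.14 \times 10^{-7}\, \frac{\mathrm{J}}{\mathrm{mol\,K^3}} $$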
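With the polynomial fit of eqn. 2.436, the integral in eqn. 2.435 can be done in closed form: per mole, $H(T) - H(T_0) = \alpha (T - T_0) + \tfrac{1}{2}\beta (T^2 - T_0^2) + \tfrac{1}{3}\gamma (T^3 - T_0^3)$. The Python sketch below evaluates this for the two quoted gases and cross-checks it with Simpson's rule, which is exact for a quadratic integrand. It is only meaningful inside the quoted fit range of 300 K to 1500 K; the helper names are ours.

```python
# Coefficients (alpha, beta, gamma) of c_p = alpha + beta T + gamma T^2, SI units, from the text
CP_COEFFS = {
    "O2": (25.503, 13.612e-3, -42.553e-7),
    "H2O": (30.206, 9.936e-3, 11.14e-7),
}


def cp(species, T):
    """Molar heat capacity c_p(T) in J/(mol K), eqn. (2.436)."""
    a, b, g = CP_COEFFS[species]
    return a + b * T + g * T * T


def delta_h(species, T0, T):
    """Molar enthalpy change H(T) - H(T0) in J/mol, eqn. (2.435) integrated analytically."""
    a, b, g = CP_COEFFS[species]
    return a * (T - T0) + b / 2.0 * (T**2 - T0**2) + g / 3.0 * (T**3 - T0**3)


def delta_h_simpson(species, T0, T, n=100):
    """Cross-check by Simpson's rule (n even); exact for a quadratic c_p up to rounding."""
    h = (T - T0) / n
    s = cp(species, T0) + cp(species, T)
    s += sum((4 if k % 2 else 2) * cp(species, T0 + k * h) for k in range(1, n))
    return s * h / 3.0


for gas in CP_COEFFS:
    print(gas, delta_h(gas, 300.0, 1000.0), delta_h_simpson(gas, 300.0, 1000.0))
```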

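If all the gaseous components in a reaction can be approximated as ideal, then we may write
$$ (\Delta H)_{\rm rxn} = (\Delta E)_{\rm rxn} + \sum_a \zeta_a\, RT \,, \tag{2.437} $$
where the sum runs over the gaseous species only (with the $\zeta_a$ counted in moles), since the $pV$ contribution of condensed phases is negligible, and the subscript ‘rxn’ stands for ‘reaction’. Here $(\Delta E)_{\rm rxn}$ is the change in energy from reactants to products.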

2.14.3 Bond enthalpies

The enthalpy needed to break a chemical bond is called the bond enthalpy, $h[\,\bullet\,]$. The bond enthalpy is the energy required to dissociate one mole of gaseous bonds to form gaseous atoms. A table of bond enthalpies is given in Tab. 2.6. Breaking a bond is endothermic, since energy is required to break chemical bonds, so bond enthalpies are positive. Of course, the actual bond energies can depend on the location of a bond in a given molecule, and the values listed in the table reflect averages over the possible bond environments.

Figure 2.42: Calculation of reaction enthalpy for the hydrogenation of ethene (ethylene), C₂H₄.
The bond enthalpies in Tab. 2.6 may be used to compute reaction enthalpies. Consider, for example, the reaction 2 H₂(g) + O₂(g) → 2 H₂O(l). We then have, from the table,
$$ \begin{aligned} (\Delta H)_{\rm rxn} &= 2\, h[\mathrm{H{-}H}] + h[\mathrm{O{=}O}] - 4\, h[\mathrm{H{-}O}] \\ &= -483\ \mathrm{kJ/mol\ O_2} \,. \end{aligned} \tag{2.438} $$

Thus, 483 kJ of heat would be released for every two moles of H₂O produced, if the H₂O were in the gaseous phase. Since H₂O is liquid at STP, we should also include the condensation energy of the gaseous water vapor into liquid water. At $T = 100^\circ$C the latent heat of vaporization is $\tilde{\ell} = 2270$ J/g, but at $T = 20^\circ$C, one has $\tilde{\ell} = 2450$ J/g, hence with $M = 18$ g/mol we have $\ell = 44.1$ kJ/mol. Therefore, the heat produced by the reaction 2 H₂(g) + O₂(g) ⇌ 2 H₂O(l) is $(\Delta H)_{\rm rxn} = -571.2$ kJ/mol O₂. Since the reaction produces two moles of water, we conclude that the enthalpy of formation of liquid water at STP is half this value: $\Delta H_f^0[\mathrm{H_2O}] = -285.6$ kJ/mol.
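The bookkeeping in the last two paragraphs can be reproduced in a few lines. The Python sketch below uses only numbers quoted above (the three bond enthalpies from Tab. 2.6, $\tilde{\ell} = 2450$ J/g at 20°C, and $M = 18$ g/mol); the variable names are ours. The formation enthalpy comes out negative, as it must for an exothermic formation reaction, and within about 0.2 kJ/mol of the tabulated value in Table 2.5.

```python
# Average bond enthalpies from Table 2.6, kJ/mol
h = {"H-H": 436, "O=O": 497, "H-O": 463}

# 2 H2(g) + O2(g) -> 2 H2O(g): break 2 H-H and 1 O=O, form 4 H-O bonds (eqn. 2.438)
dH_gas = 2 * h["H-H"] + h["O=O"] - 4 * h["H-O"]

ell_per_gram = 2450.0                  # J/g, latent heat of vaporization near 20 C
M_water = 18.0                         # g/mol
ell = ell_per_gram * M_water / 1000.0  # kJ/mol

dH_liquid = dH_gas - 2 * ell  # condensing two moles of vapor releases 2*ell
dHf_water = dH_liquid / 2     # per mole of H2O(l)
print(dH_gas, ell, dH_liquid, dHf_water)
```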
Consider next the hydrogenation of ethene (ethylene): C₂H₄ + H₂ ⇌ C₂H₆. The product is known as ethane. The energy accounting is shown in Fig. 2.42. To compute the enthalpies of formation of ethene and ethane from the bond enthalpies, we need one more bit of information, which is the standard enthalpy of formation of C(g) from C(s), since the solid is the standard state at STP. This value is $\Delta H_f^0[\mathrm{C(g)}] = 718$ kJ/mol. We may now write


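$$ \begin{aligned}
2\,\mathrm{C(g)} + 4\,\mathrm{H(g)} &\xrightarrow{\;-2260\ \mathrm{kJ}\;} \mathrm{C_2H_4(g)} \\
2\,\mathrm{C(s)} &\xrightarrow{\;1436\ \mathrm{kJ}\;} 2\,\mathrm{C(g)} \\
2\,\mathrm{H_2(g)} &\xrightarrow{\;872\ \mathrm{kJ}\;} 4\,\mathrm{H(g)} \,.
\end{aligned} $$
Thus, using Hess's law, i.e. adding up these reaction equations, we have
$$ 2\,\mathrm{C(s)} + 2\,\mathrm{H_2(g)} \xrightarrow{\;48\ \mathrm{kJ}\;} \mathrm{C_2H_4(g)} \,. $$
Thus, the formation of ethene is endothermic. For ethane,
$$ \begin{aligned}
2\,\mathrm{C(g)} + 6\,\mathrm{H(g)} &\xrightarrow{\;-2820\ \mathrm{kJ}\;} \mathrm{C_2H_6(g)} \\
2\,\mathrm{C(s)} &\xrightarrow{\;1436\ \mathrm{kJ}\;} 2\,\mathrm{C(g)} \\
3\,\mathrm{H_2(g)} &\xrightarrow{\;1308\ \mathrm{kJ}\;} 6\,\mathrm{H(g)} \,,
\end{aligned} $$
and adding these gives
$$ 2\,\mathrm{C(s)} + 3\,\mathrm{H_2(g)} \xrightarrow{\;-76\ \mathrm{kJ}\;} \mathrm{C_2H_6(g)} \,, $$
which is exothermic.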
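The same Hess's-law bookkeeping can be automated: atomize the elements (718 kJ per mole of C(s), and $h[\mathrm{H{-}H}] = 436$ kJ per mole of H₂, so 3 × 436 = 1308 kJ for three moles), then subtract the enthalpies of the bonds formed. The Python sketch below does this with the Tab. 2.6 values; the helper name is ours, and the results are bond-enthalpy estimates rather than measured enthalpies of formation. As a consistency check, the hydrogenation enthalpy obtained by subtracting the two formation enthalpies agrees with a direct count of bonds broken (C=C, H−H) and formed (C−C, two C−H).

```python
# Average bond enthalpies from Table 2.6 (kJ/mol) and Delta H_f[C(g)] from the text
h = {"H-H": 436, "H-C": 412, "C=C": 612, "C-C": 348}
DHF_C_GAS = 718


def formation_enthalpy(n_C, n_H2, bonds):
    """Estimate Delta H_f (kJ) from n_C C(s) and n_H2 H2(g): atomize, then form the bonds."""
    atomize = n_C * DHF_C_GAS + n_H2 * h["H-H"]
    form = -sum(count * h[bond] for bond, count in bonds.items())
    return atomize + form


ethene = formation_enthalpy(2, 2, {"C=C": 1, "H-C": 4})
ethane = formation_enthalpy(2, 3, {"C-C": 1, "H-C": 6})
# Hydrogenation C2H4 + H2 -> C2H6 via Hess's law (Delta H_f[H2] = 0) ...
hydrogenation = ethane - ethene
# ... and directly: break C=C and H-H, form C-C and two new C-H bonds.
direct = (h["C=C"] + h["H-H"]) - (h["C-C"] + 2 * h["H-C"])
print(ethene, ethane, hydrogenation, direct)
```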

2.15 Appendix I : Integrating Factors

Suppose we have an inexact differential
$$ \bar{d}W = A_i\, dx_i \,. \tag{2.439} $$
Here I am adopting the ‘Einstein convention’ where we sum over repeated indices unless otherwise explicitly stated; $A_i\, dx_i = \sum_i A_i\, dx_i$. An integrating factor $e^{L(\boldsymbol{x})}$ is a function which, when divided into $\bar{d}W$, yields an exact differential:
$$ dU = e^{-L}\, \bar{d}W = \frac{\partial U}{\partial x_i}\, dx_i \,. \tag{2.440} $$
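Clearly we must have
$$ \frac{\partial^2 U}{\partial x_i\, \partial x_j} = \frac{\partial}{\partial x_i} \Big( e^{-L} A_j \Big) = \frac{\partial}{\partial x_j} \Big( e^{-L} A_i \Big) \,. \tag{2.441} $$
Applying the Leibniz rule and then multiplying by $e^L$ yields
$$ \frac{\partial A_j}{\partial x_i} - A_j\, \frac{\partial L}{\partial x_i} = \frac{\partial A_i}{\partial x_j} - A_i\, \frac{\partial L}{\partial x_j} \,. \tag{2.442} $$
If there are $K$ independent variables $\{x_1, \ldots, x_K\}$, then there are $\tfrac{1}{2} K(K-1)$ independent equations of the above form, one for each distinct $(i,j)$ pair. These equations can be written compactly as
$$ \Omega_{ijk}\, \frac{\partial L}{\partial x_k} = F_{ij} \,, \tag{2.443} $$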

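where
$$ \Omega_{ijk} = A_j\, \delta_{ik} - A_i\, \delta_{jk} \tag{2.444} $$
$$ F_{ij} = \frac{\partial A_j}{\partial x_i} - \frac{\partial A_i}{\partial x_j} \,. \tag{2.445} $$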

Note that $F_{ij}$ is antisymmetric, and resembles a field strength tensor, and that $\Omega_{ijk} = -\Omega_{jik}$ is antisymmetric in the first two indices (but is not totally antisymmetric in all three).
Can we solve these $\tfrac{1}{2} K(K-1)$ coupled equations to find an integrating factor $L$? In general the answer is no. However, when $K = 2$ we can always find an integrating factor. To see why, let's call $x \equiv x_1$ and $y \equiv x_2$. Consider now the ODE
$$ \frac{dy}{dx} = -\frac{A_x(x,y)}{A_y(x,y)} \,. \tag{2.446} $$
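This equation can be integrated to yield a one-parameter set of integral curves, indexed by an initial condition. The equation for these curves may be written as $U_c(x,y) = 0$, where $c$ labels the curves. Then along each curve we have
$$ \begin{aligned} 0 = \frac{dU_c}{dx} &= \frac{\partial U_c}{\partial x} + \frac{\partial U_c}{\partial y}\, \frac{dy}{dx} \\ &= \frac{\partial U_c}{\partial x} - \frac{A_x}{A_y}\, \frac{\partial U_c}{\partial y} \,. \end{aligned} \tag{2.447} $$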

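Thus,
$$ A_y\, \frac{\partial U_c}{\partial x} = A_x\, \frac{\partial U_c}{\partial y} \equiv e^{-L} A_x A_y \,. \tag{2.448} $$
This equation defines the integrating factor $L$:
$$ L = -\ln\!\left( \frac{1}{A_x}\, \frac{\partial U_c}{\partial x} \right) = -\ln\!\left( \frac{1}{A_y}\, \frac{\partial U_c}{\partial y} \right) . \tag{2.449} $$
We now have that
$$ A_x = e^L\, \frac{\partial U_c}{\partial x} \,, \qquad A_y = e^L\, \frac{\partial U_c}{\partial y} \,, \tag{2.450} $$
and hence
$$ e^{-L}\, \bar{d}W = \frac{\partial U_c}{\partial x}\, dx + \frac{\partial U_c}{\partial y}\, dy = dU_c \,. \tag{2.451} $$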
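A numerical illustration of an integrating factor for $K = 2$, using an example of our choosing: for one mole of monatomic ideal gas the heat differential is $\bar{d}Q = C_V\, dT + (RT/V)\, dV$, so $(x, y) = (T, V)$, $A_x = C_V$, and $A_y = RT/V$. The differential is exact precisely when $F_{xy} = \partial A_y/\partial x - \partial A_x/\partial y$ vanishes (eqn. 2.445). The Python sketch below estimates $F_{xy}$ by central differences, first for $\bar{d}Q$ itself and then after dividing by the candidate integrating factor $e^L = T$, which is what eqn. 2.449 yields for the integral curves $T^{C_V/R}\, V = \text{const}$.

```python
R = 8.314462618  # gas constant, J/(mol K)
CV = 1.5 * R     # assumption: one mole of a monatomic ideal gas


def A_T(T, V):
    """Coefficient of dT in the heat differential."""
    return CV


def A_V(T, V):
    """Coefficient of dV in the heat differential (this is the pressure RT/V)."""
    return R * T / V


def curl(ax, ay, x, y, h=1e-6):
    """F_xy = d(ay)/dx - d(ax)/dy by central differences; zero iff ax dx + ay dy is exact."""
    day_dx = (ay(x + h, y) - ay(x - h, y)) / (2 * h)
    dax_dy = (ax(x, y + h) - ax(x, y - h)) / (2 * h)
    return day_dx - dax_dy


T0, V0 = 300.0, 0.0224  # an arbitrary state: K, m^3
inexact = curl(A_T, A_V, T0, V0)            # equals R/V0: dQ is not exact
exact = curl(lambda T, V: A_T(T, V) / T,    # divide by e^L = T ...
             lambda T, V: A_V(T, V) / T,    # ... to get dQ/T
             T0, V0)                        # approximately zero: dQ/T is exact
print(inexact, exact)
```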

2.16 Appendix II : Legendre Transformations

A convex function of a single variable $f(x)$ is one for which $f''(x) > 0$ everywhere. The Legendre transform of a convex function $f(x)$ is a function $g(p)$ defined as follows. Let $p$ be a real number, and consider the line $y = px$, as shown in Fig. 2.43. We define the point $x(p)$ as the value of $x$ for which the difference $F(x,p) = px - f(x)$ is greatest. Then define $g(p) = F\big(x(p), p\big)$.²⁶ The value $x(p)$ is unique if $f(x)$ is convex, since $x(p)$ is determined by the equation
$$ f'\big(x(p)\big) = p \,. \tag{2.452} $$

Figure 2.43: Construction for the Legendre transformation of a function $f(x)$.

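Note that from $p = f'\big(x(p)\big)$ we have, according to the chain rule,
$$ 1 = \frac{d}{dp}\, f'\big(x(p)\big) = f''\big(x(p)\big)\, x'(p) \quad\Longrightarrow\quad x'(p) = \Big[ f''\big(x(p)\big) \Big]^{-1} . \tag{2.453} $$
From this, we can prove that $g(p)$ is itself convex:
$$ \begin{aligned} g'(p) &= \frac{d}{dp} \Big[ p\, x(p) - f\big(x(p)\big) \Big] \\ &= p\, x'(p) + x(p) - f'\big(x(p)\big)\, x'(p) = x(p) \,, \end{aligned} \tag{2.454} $$
hence
$$ g''(p) = x'(p) = \Big[ f''\big(x(p)\big) \Big]^{-1} > 0 \,. \tag{2.455} $$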

In higher dimensions, the generalization of the definition $f''(x) > 0$ is that a function $F(x_1, \ldots, x_n)$ is convex if the matrix of second derivatives, called the Hessian,
$$ H_{ij}(\boldsymbol{x}) = \frac{\partial^2 F}{\partial x_i\, \partial x_j} \tag{2.456} $$
is positive definite. That is, all the eigenvalues of $H_{ij}(\boldsymbol{x})$ must be positive for every $\boldsymbol{x}$. We then define the Legendre transform $G(\boldsymbol{p})$ as
$$ G(\boldsymbol{p}) = \boldsymbol{p} \cdot \boldsymbol{x} - F(\boldsymbol{x}) \tag{2.457} $$
where
$$ \boldsymbol{p} = \boldsymbol{\nabla} F \,. \tag{2.458} $$
Note that
$$ dG = \boldsymbol{x} \cdot d\boldsymbol{p} + \boldsymbol{p} \cdot d\boldsymbol{x} - \boldsymbol{\nabla} F \cdot d\boldsymbol{x} = \boldsymbol{x} \cdot d\boldsymbol{p} \,, \tag{2.459} $$
which establishes that $G$ is a function of $\boldsymbol{p}$ and that
$$ \frac{\partial G}{\partial p_j} = x_j \,. \tag{2.460} $$
Note also that the Legendre transformation is self-dual, which is to say that the Legendre transform of $G(\boldsymbol{p})$ is $F(\boldsymbol{x})$: $F \to G \to F$ under successive Legendre transformations.

²⁶ Note that $g(p)$ may be a negative number, if the line $y = px$ lies everywhere below $f(x)$.
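A quick numerical check of the one-variable definition and of self-duality, using an example of our choosing: for the convex function $f(x) = e^x$, maximizing $px - f(x)$ gives $x(p) = \ln p$ and $g(p) = p \ln p - p$ for $p > 0$. The Python sketch below performs the maximization by ternary search, which is valid because $px - f(x)$ is concave in $x$ whenever $f$ is convex, and then applies the same routine to $g$, which should return $f$. The search brackets are assumptions that must contain the maximizer.

```python
import math


def legendre(f, p, lo, hi, iters=200):
    """g(p) = max_x [p x - f(x)] over [lo, hi] for convex f, by ternary search.

    p x - f(x) is concave in x, so comparing it at two interior points tells us
    which third of the bracket cannot contain the maximum.
    """
    a, b = lo, hi
    for _ in range(iters):
        m1 = a + (b - a) / 3.0
        m2 = b - (b - a) / 3.0
        if p * m1 - f(m1) < p * m2 - f(m2):
            a = m1  # the maximum lies to the right of m1
        else:
            b = m2  # the maximum lies to the left of m2
    x = 0.5 * (a + b)
    return p * x - f(x)


def g(p):
    """Numerical Legendre transform of exp(x); the exact answer is p ln p - p."""
    return legendre(math.exp, p, -20.0, 20.0)


print(g(2.0), 2.0 * math.log(2.0) - 2.0)
print(legendre(g, 1.0, 1e-3, 50.0), math.e)  # transforming back recovers f(1) = e
```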
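We can also define a partial Legendre transformation as follows. Consider a function of $q$ variables $F(\boldsymbol{x}, \boldsymbol{y})$, where $\boldsymbol{x} = \{x_1, \ldots, x_m\}$ and $\boldsymbol{y} = \{y_1, \ldots, y_n\}$, with $q = m + n$. Define $\boldsymbol{p} = \{p_1, \ldots, p_m\}$, and
$$ G(\boldsymbol{p}, \boldsymbol{y}) = \boldsymbol{p} \cdot \boldsymbol{x} - F(\boldsymbol{x}, \boldsymbol{y}) \,, \tag{2.461} $$
where
$$ p_a = \frac{\partial F}{\partial x_a} \qquad (a = 1, \ldots, m) \,. \tag{2.462} $$
These equations are then to be inverted to yield
$$ x_a = x_a(\boldsymbol{p}, \boldsymbol{y}) = \frac{\partial G}{\partial p_a} \,. \tag{2.463} $$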

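Note that
$$ p_a = \frac{\partial F}{\partial x_a} \big( \boldsymbol{x}(\boldsymbol{p}, \boldsymbol{y}), \boldsymbol{y} \big) \,. \tag{2.464} $$
Thus, from the chain rule,
$$ \delta_{ab} = \frac{\partial p_a}{\partial p_b} = \frac{\partial^2 F}{\partial x_a\, \partial x_c}\, \frac{\partial x_c}{\partial p_b} = \frac{\partial^2 F}{\partial x_a\, \partial x_c}\, \frac{\partial^2 G}{\partial p_c\, \partial p_b} \,, \tag{2.465} $$
which says
$$ \frac{\partial^2 G}{\partial p_a\, \partial p_b} = \frac{\partial x_a}{\partial p_b} = K^{-1}_{ab} \,, \tag{2.466} $$
where the $m \times m$ partial Hessian is
$$ \frac{\partial^2 F}{\partial x_a\, \partial x_b} = \frac{\partial p_a}{\partial x_b} = K_{ab} \,. \tag{2.467} $$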
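Note that $K_{ab} = K_{ba}$ is symmetric. With respect to the $\boldsymbol{y}$ coordinates, we have $\partial G / \partial y_\mu = -\partial F / \partial y_\mu$ at fixed $\boldsymbol{p}$; since $\boldsymbol{x}(\boldsymbol{p}, \boldsymbol{y})$ also depends on $\boldsymbol{y}$, differentiating once more gives
$$ \frac{\partial^2 G}{\partial y_\mu\, \partial y_\nu} = -\frac{\partial^2 F}{\partial y_\mu\, \partial y_\nu} + \frac{\partial^2 F}{\partial y_\mu\, \partial x_a}\, K^{-1}_{ab}\, \frac{\partial^2 F}{\partial x_b\, \partial y_\nu} = -\big( L - M^{\rm t} K^{-1} M \big)_{\mu\nu} \,, \tag{2.468} $$
where $M_{a\mu} = \partial^2 F / \partial x_a\, \partial y_\mu$ and
$$ L_{\mu\nu} = \frac{\partial^2 F}{\partial y_\mu\, \partial y_\nu} \tag{2.469} $$
is the partial Hessian in the $\boldsymbol{y}$ coordinates. When the mixed derivatives $M_{a\mu}$ vanish, this reduces to $\partial^2 G / \partial y_\mu \partial y_\nu = -L_{\mu\nu}$. Now if the full $q \times q$ Hessian matrix $H_{ij}$ is positive definite, then any principal submatrix such as $K_{ab}$ or $L_{\mu\nu}$, as well as the Schur complement $L - M^{\rm t} K^{-1} M$, must also be positive definite. In this case, the partial Legendre transform is convex in $\{p_1, \ldots, p_m\}$ and concave in $\{y_1, \ldots, y_n\}$.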

2.17 Appendix III : Useful Mathematical Relations

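Consider a set of $n$ independent variables $\{x_1, \ldots, x_n\}$, which can be thought of as a point in $n$-dimensional space. Let $\{y_1, \ldots, y_n\}$ and $\{z_1, \ldots, z_n\}$ be other choices of coordinates. Then
$$ \frac{\partial x_i}{\partial z_k} = \frac{\partial x_i}{\partial y_j}\, \frac{\partial y_j}{\partial z_k} \,. \tag{2.470} $$
Note that this entails a matrix multiplication: $A_{ik} = B_{ij} C_{jk}$, where $A_{ik} = \partial x_i / \partial z_k$, $B_{ij} = \partial x_i / \partial y_j$, and $C_{jk} = \partial y_j / \partial z_k$. We define the determinant
$$ \det\!\left( \frac{\partial x_i}{\partial z_k} \right) \equiv \frac{\partial(x_1, \ldots, x_n)}{\partial(z_1, \ldots, z_n)} \,. \tag{2.471} $$
Such a determinant is called a Jacobian. Now if $A = BC$, then $\det(A) = \det(B) \cdot \det(C)$. Thus,
$$ \frac{\partial(x_1, \ldots, x_n)}{\partial(z_1, \ldots, z_n)} = \frac{\partial(x_1, \ldots, x_n)}{\partial(y_1, \ldots, y_n)} \cdot \frac{\partial(y_1, \ldots, y_n)}{\partial(z_1, \ldots, z_n)} \,. \tag{2.472} $$
Recall also that
$$ \frac{\partial x_i}{\partial x_k} = \delta_{ik} \,. \tag{2.473} $$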

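Consider the case $n = 2$. We have
$$ \frac{\partial(x,y)}{\partial(u,v)} = \det \begin{pmatrix} \left( \frac{\partial x}{\partial u} \right)_v & \left( \frac{\partial x}{\partial v} \right)_u \\[4pt] \left( \frac{\partial y}{\partial u} \right)_v & \left( \frac{\partial y}{\partial v} \right)_u \end{pmatrix} = \left( \frac{\partial x}{\partial u} \right)_{\!v} \left( \frac{\partial y}{\partial v} \right)_{\!u} - \left( \frac{\partial x}{\partial v} \right)_{\!u} \left( \frac{\partial y}{\partial u} \right)_{\!v} . \tag{2.474} $$
We also have
$$ \frac{\partial(x,y)}{\partial(u,v)} \cdot \frac{\partial(u,v)}{\partial(r,s)} = \frac{\partial(x,y)}{\partial(r,s)} \,. \tag{2.475} $$
From this simple mathematics follow several very useful results.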
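1) First, write
$$ \frac{\partial(x,y)}{\partial(u,v)} = \left[ \frac{\partial(u,v)}{\partial(x,y)} \right]^{-1} . \tag{2.476} $$
Now let $v = y$:
$$ \frac{\partial(x,y)}{\partial(u,y)} = \left( \frac{\partial x}{\partial u} \right)_{\!y} = \frac{1}{\left( \partial u / \partial x \right)_y} \,. \tag{2.477} $$
Thus,
$$ \left( \frac{\partial x}{\partial u} \right)_{\!y} = 1 \bigg/ \left( \frac{\partial u}{\partial x} \right)_{\!y} \tag{2.478} $$
2) Second, we have
$$ \frac{\partial(x,y)}{\partial(u,y)} = \left( \frac{\partial x}{\partial u} \right)_{\!y} = \frac{\partial(x,y)}{\partial(x,u)} \cdot \frac{\partial(x,u)}{\partial(u,y)} = -\left( \frac{\partial y}{\partial u} \right)_{\!x} \left( \frac{\partial x}{\partial y} \right)_{\!u} , $$
which is to say
$$ \left( \frac{\partial x}{\partial y} \right)_{\!u} \left( \frac{\partial y}{\partial u} \right)_{\!x} = -\left( \frac{\partial x}{\partial u} \right)_{\!y} . \tag{2.479} $$
Invoking eqn. 2.478, we conclude that
$$ \left( \frac{\partial x}{\partial y} \right)_{\!u} \left( \frac{\partial y}{\partial u} \right)_{\!x} \left( \frac{\partial u}{\partial x} \right)_{\!y} = -1 \,. \tag{2.480} $$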
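Eqn. 2.480 can be checked numerically on an equation of state where all three partial derivatives are nontrivial. The Python sketch below takes $(x, y, u) = (p, V, T)$ for one mole of a van der Waals gas, with CO₂ constants used purely for illustration. $p(V,T)$ and $T(p,V)$ are explicit; $V(T,p)$ is found by Newton's method from the ideal-gas guess, which is safe here because the chosen state lies above the critical temperature, where $p(V)$ is monotonic. Each derivative is then taken by central differences.

```python
R = 0.083145            # gas constant, L bar / (mol K)
a, b = 3.640, 0.04267   # van der Waals constants for CO2 (L^2 bar/mol^2, L/mol); illustrative


def p_of(V, T):
    """Van der Waals pressure p(V, T) for one mole."""
    return R * T / (V - b) - a / V**2


def T_of(p, V):
    """The van der Waals equation solved explicitly for T(p, V)."""
    return (p + a / V**2) * (V - b) / R


def V_of(T, p):
    """Solve p_of(V, T) = p for V by Newton's method, starting from the ideal-gas volume."""
    V = R * T / p
    for _ in range(100):
        f = p_of(V, T) - p
        dfdV = -R * T / (V - b) ** 2 + 2.0 * a / V**3
        step = f / dfdV
        V -= step
        if abs(step) < 1e-14 * V:
            break
    return V


def deriv(fun, x, rel=1e-6):
    """Central-difference derivative of a one-argument function."""
    h = rel * abs(x)
    return (fun(x + h) - fun(x - h)) / (2.0 * h)


T0, V0 = 400.0, 3.0
p0 = p_of(V0, T0)
dp_dV_T = deriv(lambda V: p_of(V, T0), V0)  # (dp/dV)_T
dV_dT_p = deriv(lambda T: V_of(T, p0), T0)  # (dV/dT)_p
dT_dp_V = deriv(lambda p: T_of(p, V0), p0)  # (dT/dp)_V
print(dp_dV_T * dV_dT_p * dT_dp_V)          # close to -1, as eqn. (2.480) requires
```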

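3) Third, we have
$$ \frac{\partial(x,v)}{\partial(u,v)} = \frac{\partial(x,v)}{\partial(y,v)} \cdot \frac{\partial(y,v)}{\partial(u,v)} \,, \tag{2.481} $$
which says
$$ \left( \frac{\partial x}{\partial u} \right)_{\!v} = \left( \frac{\partial x}{\partial y} \right)_{\!v} \left( \frac{\partial y}{\partial u} \right)_{\!v} \tag{2.482} $$
This is simply the chain rule of partial differentiation.
4) Fourth, we have
$$ \begin{aligned} \frac{\partial(x,y)}{\partial(u,y)} &= \frac{\partial(x,y)}{\partial(u,v)} \cdot \frac{\partial(u,v)}{\partial(u,y)} \\ &= \left( \frac{\partial x}{\partial u} \right)_{\!v} \left( \frac{\partial y}{\partial v} \right)_{\!u} \left( \frac{\partial v}{\partial y} \right)_{\!u} - \left( \frac{\partial x}{\partial v} \right)_{\!u} \left( \frac{\partial y}{\partial u} \right)_{\!v} \left( \frac{\partial v}{\partial y} \right)_{\!u} , \end{aligned} \tag{2.483} $$
which says
$$ \left( \frac{\partial x}{\partial u} \right)_{\!y} = \left( \frac{\partial x}{\partial u} \right)_{\!v} - \left( \frac{\partial x}{\partial y} \right)_{\!u} \left( \frac{\partial y}{\partial u} \right)_{\!v} \tag{2.484} $$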

5) Fifth, whenever we differentiate one extensive quantity with respect to another, holding only intensive quantities constant, the result is simply the ratio of those extensive quantities. For example,
$$ \left( \frac{\partial S}{\partial V} \right)_{\!p,T} = \frac{S}{V} \,. \tag{2.485} $$
The reason should be obvious. In the above example, $S(p,V,T) = V \phi(p,T)$, where $\phi$ is a function of the two intensive quantities $p$ and $T$. Hence differentiating $S$ with respect to $V$ holding $p$ and $T$ constant is the same as dividing $S$ by $V$. Note that this implies
$$ \left( \frac{\partial S}{\partial V} \right)_{\!p,T} = \left( \frac{\partial S}{\partial V} \right)_{\!p,\mu} = \left( \frac{\partial S}{\partial V} \right)_{\!n,T} = \frac{S}{V} \,, \tag{2.486} $$
where $n = N/V$ is the particle density.


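6) Sixth, suppose we have a function $\Phi(y,v)$ and we write
$$ d\Phi = x\, dy + u\, dv \,. \tag{2.487} $$
That is,
$$ x = \left( \frac{\partial \Phi}{\partial y} \right)_{\!v} \equiv \Phi_y \,, \qquad u = \left( \frac{\partial \Phi}{\partial v} \right)_{\!y} \equiv \Phi_v \,. \tag{2.488} $$
Now we may write
$$ dx = \Phi_{yy}\, dy + \Phi_{yv}\, dv \tag{2.489} $$
$$ du = \Phi_{vy}\, dy + \Phi_{vv}\, dv \,. \tag{2.490} $$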

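If we demand $dv = 0$, this yields
$$ \left( \frac{\partial x}{\partial u} \right)_{\!v} = \frac{\Phi_{yy}}{\Phi_{vy}} \,. \tag{2.491} $$
Note that $\Phi_{vy} = \Phi_{yv}$. From the equation $du = 0$ we also derive
$$ \left( \frac{\partial y}{\partial v} \right)_{\!u} = -\frac{\Phi_{vv}}{\Phi_{vy}} \,. \tag{2.492} $$
Next, we use eqn. 2.490 with $du = 0$ to eliminate $dy$ in favor of $dv$, and then substitute into eqn. 2.489. This yields
$$ \left( \frac{\partial x}{\partial v} \right)_{\!u} = \Phi_{yv} - \frac{\Phi_{yy}\, \Phi_{vv}}{\Phi_{vy}} \,. \tag{2.493} $$
Finally, eqn. 2.490 with $dv = 0$ yields
$$ \left( \frac{\partial y}{\partial u} \right)_{\!v} = \frac{1}{\Phi_{vy}} \,. \tag{2.494} $$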

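Combining the results of eqns. 2.491, 2.492, 2.493, and 2.494, we have
$$ \begin{aligned} \frac{\partial(x,y)}{\partial(u,v)} &= \left( \frac{\partial x}{\partial u} \right)_{\!v} \left( \frac{\partial y}{\partial v} \right)_{\!u} - \left( \frac{\partial x}{\partial v} \right)_{\!u} \left( \frac{\partial y}{\partial u} \right)_{\!v} \\ &= \frac{\Phi_{yy}}{\Phi_{vy}} \left( -\frac{\Phi_{vv}}{\Phi_{vy}} \right) - \left( \Phi_{yv} - \frac{\Phi_{yy}\, \Phi_{vv}}{\Phi_{vy}} \right) \frac{1}{\Phi_{vy}} = -1 \,. \end{aligned} \tag{2.495} $$
Thus, if $\Phi = E(S,V)$, then $(x,y) = (T,S)$ and $(u,v) = (-p,V)$, and we have
$$ \frac{\partial(T,S)}{\partial(-p,V)} = -1 \,. \tag{2.496} $$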

Nota bene: It is important to understand what other quantities are kept constant, otherwise we can run into trouble. For example, it would seem that eqn. 2.495 would also yield
$$ \frac{\partial(\mu,N)}{\partial(p,V)} = 1 \,. \tag{2.497} $$
But then we should have
$$ \frac{\partial(T,S)}{\partial(\mu,N)} = \frac{\partial(T,S)}{\partial(-p,V)} \cdot \frac{\partial(-p,V)}{\partial(\mu,N)} = +1 \qquad \text{(WRONG!)} $$
when according to eqn. 2.495 it should be $-1$. What has gone wrong?
The problem is that we have not properly specified what else is being held constant. In eqn. 2.496 it is $N$ (or $\mu$) which is being held constant, while in eqn. 2.497 it is $S$ (or $T$) which is being held constant. Therefore a naive application of the chain rule for determinants yields the wrong result, as we have seen.
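Let's be more careful. Applying the same derivation to $dE = x\, dy + u\, dv + r\, ds$ and holding $s$ constant, we conclude
$$ \frac{\partial(x,y,s)}{\partial(u,v,s)} = \left( \frac{\partial x}{\partial u} \right)_{\!v,s} \left( \frac{\partial y}{\partial v} \right)_{\!u,s} - \left( \frac{\partial x}{\partial v} \right)_{\!u,s} \left( \frac{\partial y}{\partial u} \right)_{\!v,s} = -1 \,. \tag{2.498} $$
Thus, if
$$ dE = T\, dS + y\, dX + \mu\, dN \,, \tag{2.499} $$
where $(y, X) = (-p, V)$ or $(H^\alpha, M^\alpha)$ or $(E^\alpha, P^\alpha)$, the appropriate thermodynamic relations are
$$ \begin{aligned}
\frac{\partial(T,S,N)}{\partial(y,X,N)} &= -1 & \qquad \frac{\partial(T,S,\mu)}{\partial(y,X,\mu)} &= -1 \\
\frac{\partial(\mu,N,X)}{\partial(T,S,X)} &= -1 & \qquad \frac{\partial(\mu,N,y)}{\partial(T,S,y)} &= -1 \\
\frac{\partial(y,X,S)}{\partial(\mu,N,S)} &= -1 & \qquad \frac{\partial(y,X,T)}{\partial(\mu,N,T)} &= -1
\end{aligned} \tag{2.500} $$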

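For example,
$$ \frac{\partial(T,S,N)}{\partial(-p,V,N)} = \frac{\partial(-p,V,S)}{\partial(\mu,N,S)} = \frac{\partial(\mu,N,V)}{\partial(T,S,V)} = -1 \tag{2.501} $$
and
$$ \frac{\partial(T,S,\mu)}{\partial(-p,V,\mu)} = \frac{\partial(-p,V,T)}{\partial(\mu,N,T)} = \frac{\partial(\mu,N,-p)}{\partial(T,S,-p)} = -1 \,. \tag{2.502} $$
If we are careful, then the results in eqn. 2.500 can be quite handy, especially when used in conjunction with eqn. 2.472. For example, we have
$$ \left( \frac{\partial S}{\partial V} \right)_{\!T,N} = \frac{\partial(T,S,N)}{\partial(T,V,N)} = \underbrace{\frac{\partial(T,S,N)}{\partial(p,V,N)}}_{=1} \cdot \frac{\partial(p,V,N)}{\partial(T,V,N)} = \left( \frac{\partial p}{\partial T} \right)_{\!V,N} , \tag{2.503} $$
which is one of the Maxwell relations derived from the exactness of $dF(T,V,N)$. Some other examples include
$$ \left( \frac{\partial V}{\partial S} \right)_{\!p,N} = \frac{\partial(V,p,N)}{\partial(S,p,N)} = \underbrace{\frac{\partial(V,p,N)}{\partial(S,T,N)}}_{=1} \cdot \frac{\partial(S,T,N)}{\partial(S,p,N)} = \left( \frac{\partial T}{\partial p} \right)_{\!S,N} \tag{2.504} $$
$$ \left( \frac{\partial S}{\partial N} \right)_{\!T,p} = \frac{\partial(S,T,p)}{\partial(N,T,p)} = \underbrace{\frac{\partial(S,T,p)}{\partial(\mu,N,p)}}_{=1} \cdot \frac{\partial(\mu,N,p)}{\partial(N,T,p)} = -\left( \frac{\partial \mu}{\partial T} \right)_{\!p,N} , \tag{2.505} $$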
which are Maxwell relations deriving from $dH(S,p,N)$ and $dG(T,p,N)$, respectively. Note that due to the alternating nature of the determinant (it is antisymmetric under interchange of any two rows or columns) we have
$$ \frac{\partial(x,y,z)}{\partial(u,v,w)} = -\frac{\partial(y,x,z)}{\partial(u,v,w)} = \frac{\partial(y,x,z)}{\partial(w,v,u)} = \ldots \,. \tag{2.506} $$

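In general, it is usually advisable to eliminate $S$ from a Jacobian. If we have a Jacobian involving $T$, $S$, and $N$, we can write
$$ \frac{\partial(T,S,N)}{\partial(\,\bullet\,,\,\bullet\,,N)} = \frac{\partial(T,S,N)}{\partial(p,V,N)}\, \frac{\partial(p,V,N)}{\partial(\,\bullet\,,\,\bullet\,,N)} = \frac{\partial(p,V,N)}{\partial(\,\bullet\,,\,\bullet\,,N)} \,, \tag{2.507} $$
where each $\bullet$ is a distinct arbitrary state variable other than $N$.
If our Jacobian involves $S$, $V$, and $N$, we write
$$ \frac{\partial(S,V,N)}{\partial(\,\bullet\,,\,\bullet\,,N)} = \frac{\partial(S,V,N)}{\partial(T,V,N)} \cdot \frac{\partial(T,V,N)}{\partial(\,\bullet\,,\,\bullet\,,N)} = \frac{C_V}{T} \cdot \frac{\partial(T,V,N)}{\partial(\,\bullet\,,\,\bullet\,,N)} \,. \tag{2.508} $$
If our Jacobian involves $S$, $p$, and $N$, we write
$$ \frac{\partial(S,p,N)}{\partial(\,\bullet\,,\,\bullet\,,N)} = \frac{\partial(S,p,N)}{\partial(T,p,N)} \cdot \frac{\partial(T,p,N)}{\partial(\,\bullet\,,\,\bullet\,,N)} = \frac{C_p}{T} \cdot \frac{\partial(T,p,N)}{\partial(\,\bullet\,,\,\bullet\,,N)} \,. \tag{2.509} $$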

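For example,
$$ \left( \frac{\partial T}{\partial p} \right)_{\!S,N} = \frac{\partial(T,S,N)}{\partial(p,S,N)} = \underbrace{\frac{\partial(T,S,N)}{\partial(p,V,N)}}_{=1} \cdot \frac{\partial(p,V,N)}{\partial(p,T,N)} \cdot \frac{\partial(p,T,N)}{\partial(p,S,N)} = \frac{T}{C_p} \left( \frac{\partial V}{\partial T} \right)_{\!p,N} \tag{2.510} $$
$$ \left( \frac{\partial V}{\partial p} \right)_{\!S,N} = \frac{\partial(V,S,N)}{\partial(p,S,N)} = \frac{\partial(V,S,N)}{\partial(V,T,N)} \cdot \frac{\partial(V,T,N)}{\partial(p,T,N)} \cdot \frac{\partial(p,T,N)}{\partial(p,S,N)} = \frac{C_V}{C_p} \left( \frac{\partial V}{\partial p} \right)_{\!T,N} . \tag{2.511} $$
With $\kappa \equiv -\frac{1}{V} \frac{\partial V}{\partial p}$ the compressibility, we see that the second of these equations says $\kappa_T\, c_V = \kappa_S\, c_p$, relating the isothermal and adiabatic compressibilities and the molar heat capacities at constant volume and constant pressure. This relation was previously established in eqn. 2.292.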
Chapter 3

Ergodicity and the Approach to Equilibrium

3.1 References

– R. Balescu, Equilibrium and Nonequilibrium Statistical Mechanics (Wiley, 1975)
An advanced text with an emphasis on fluids and kinetics.

– R. Balian, From Macrophysics to Microphysics (2 vols., Springer-Verlag, 2006)
A very detailed discussion of the fundamental postulates of statistical mechanics and their implications.


3.2 Modeling the Approach to Equilibrium

3.2.1 Equilibrium

A thermodynamic system typically consists of an enormously large number of constituent particles, a typical ‘large number’ being Avogadro's number, $N_A = 6.02 \times 10^{23}$. Nevertheless, in equilibrium, such a system is characterized by a relatively small number of thermodynamic state variables. Thus, while a complete description of a (classical) system would require us to account for $O\big(10^{23}\big)$ evolving degrees of freedom, with respect to the physical quantities in which we are interested, the details of the initial conditions are effectively forgotten over some microscopic time scale $\tau$, called the collision time, and over some microscopic distance scale $\ell$, called the mean free path.¹ The equilibrium state is time-independent.

3.2.2 The Master Equation

Relaxation to equilibrium is often modeled with something called the master equation. Let $P_i(t)$ be the probability that the system is in a quantum or classical state $i$ at time $t$. Then write
$$ \frac{dP_i}{dt} = \sum_j \Big( W_{ij}\, P_j - W_{ji}\, P_i \Big) \,. \tag{3.1} $$
Here, $W_{ij}$ is the rate at which $j$ makes a transition to $i$. Note that we can write this equation as
$$ \frac{dP_i}{dt} = -\sum_j \Gamma_{ij}\, P_j \,, \tag{3.2} $$
where
$$ \Gamma_{ij} = \begin{cases} -W_{ij} & \text{if } i \neq j \\ \sideset{}{'}\sum_k W_{kj} & \text{if } i = j \,, \end{cases} \tag{3.3} $$
where the prime on the sum indicates that $k = j$ is to be excluded. The constraints on the $W_{ij}$ are that $W_{ij} \ge 0$ for all $i, j$, and we may take $W_{ii} \equiv 0$ (no sum on $i$). Fermi's Golden Rule of quantum mechanics says that
$$ W_{ij} = \frac{2\pi}{\hbar}\, \big| \langle\, i \,|\, \hat{V} \,|\, j \,\rangle \big|^2\, \rho(E_j) \,, \tag{3.4} $$
where $\hat{H}_0\, |\, i \,\rangle = E_i\, |\, i \,\rangle$, $\hat{V}$ is an additional potential which leads to transitions, and $\rho(E_j)$ is the density of final states at energy $E_j$ (energy is conserved in the transition, so $E_i = E_j$). The fact that $W_{ij} \ge 0$ means that if each $P_i(t = 0) \ge 0$, then $P_i(t) \ge 0$ for all $t \ge 0$. To see this, suppose that at some time $t > 0$ one of the probabilities $P_i$ is crossing zero and about to become negative. But then eqn. 3.1 says that $\dot{P}_i(t) = \sum_j W_{ij}\, P_j(t) \ge 0$. So $P_i(t)$ can never become negative.
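The two properties just derived (conservation of $\sum_i P_i$ and non-negativity of each $P_i$) are easy to watch in a direct integration of eqn. 3.1. The Python sketch below uses hypothetical rates for a three-state cycle 0 → 1 → 2 → 0 (rates 1, 1, and 2) and forward-Euler time steps. Euler steps preserve the total probability exactly up to rounding because the right-hand side of eqn. 3.1 sums to zero; they also keep each $P_i \ge 0$, but only as long as the time step times the largest total outflow rate from a state stays below 1, which is a property of the discretization rather than of eqn. 3.1 itself.

```python
def master_rhs(W, P):
    """Right-hand side of eqn. (3.1); W[i][j] is the rate of the transition j -> i."""
    n = len(P)
    return [sum(W[i][j] * P[j] - W[j][i] * P[i] for j in range(n)) for i in range(n)]


def evolve(W, P0, dt=1e-3, steps=20_000):
    """Forward-Euler integration of the master equation; returns every intermediate P(t)."""
    P = list(P0)
    history = [P]
    for _ in range(steps):
        dP = master_rhs(W, P)
        P = [p + dt * d for p, d in zip(P, dP)]
        history.append(P)
    return history


# Hypothetical cycle 0 -> 1 -> 2 -> 0 with rates 1, 1, 2 (and W[i][i] = 0).
W = [[0.0, 0.0, 2.0],
     [1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
history = evolve(W, [1.0, 0.0, 0.0])
print(history[-1])  # approaches the stationary distribution (0.4, 0.4, 0.2)
```

For these rates, balancing inflow and outflow state by state gives $P^{\rm eq} = (0.4, 0.4, 0.2)$, which the integration approaches from the initial condition $P = (1, 0, 0)$.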
¹ Exceptions involve quantities which are conserved by collisions, such as overall particle number, momentum, and energy. These quantities relax to equilibrium in a special way called hydrodynamics.

3.2.3 Equilibrium distribution and detailed balance

If the transition rates $W_{ij}$ are themselves time-independent, then we may formally write
$$ P_i(t) = \big( e^{-\Gamma t} \big)_{ij}\, P_j(0) \,. \tag{3.5} $$
Here we have used the Einstein ‘summation convention’ in which repeated indices are summed over (in this case, the $j$ index). Note that
$$ \sum_i \Gamma_{ij} = 0 \,, \tag{3.6} $$
which says that the total probability $\sum_i P_i$ is conserved:
$$ \frac{d}{dt} \sum_i P_i = -\sum_{i,j} \Gamma_{ij}\, P_j = -\sum_j \Big( P_j \sum_i \Gamma_{ij} \Big) = 0 \,. \tag{3.7} $$
We conclude that $\vec{\phi} = (1, 1, \ldots, 1)$ is a left eigenvector of $\Gamma$ with eigenvalue $\lambda = 0$. The corresponding right eigenvector, which we write as $P_i^{\rm eq}$, satisfies $\Gamma_{ij}\, P_j^{\rm eq} = 0$, and is a stationary (i.e. time-independent) solution to the master equation. Generally, there is only one right/left eigenvector pair corresponding to $\lambda = 0$, in which case any initial probability distribution $P_i(0)$ converges to $P_i^{\rm eq}$ as $t \to \infty$, as shown in Appendix I (§3.7).
In equilibrium, the total rate of transitions into a state $|\, i \,\rangle$ is equal to the total rate of transitions out of $|\, i \,\rangle$. If, for each state $|\, j \,\rangle$, the transition rate from $|\, i \,\rangle$ to $|\, j \,\rangle$ is equal to the transition rate from $|\, j \,\rangle$ to $|\, i \,\rangle$, we say that the rates satisfy the condition of detailed balance. In other words,
$$ W_{ij}\, P_j^{\rm eq} = W_{ji}\, P_i^{\rm eq} \,. \tag{3.8} $$
Assuming $W_{ij} \neq 0$ and $P_j^{\rm eq} \neq 0$, we can divide to obtain
$$ \frac{W_{ji}}{W_{ij}} = \frac{P_j^{\rm eq}}{P_i^{\rm eq}} \,. \tag{3.9} $$
Note that detailed balance is a stronger condition than that required for a stationary solution to the master equation.
If $\Gamma = \Gamma^{\rm t}$ is symmetric, then the right eigenvectors and left eigenvectors are transposes of each other, hence $P_i^{\rm eq} = 1/N$, where $N$ is the dimension of $\Gamma$. The system then satisfies the conditions of detailed balance. See Appendix II (§3.8) for an example of this formalism applied to a model of radioactive decay.
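To see that detailed balance is strictly stronger than stationarity, the Python sketch below compares two sets of hypothetical rates on three states. The first uses Metropolis-style rates $W_{ij} = \min\big(1, e^{-(E_i - E_j)/k_B T}\big)$ (a common construction, chosen here for illustration, with made-up energies), for which eqn. 3.8 holds with Boltzmann weights $P_i^{\rm eq} \propto e^{-E_i/k_B T}$. The second uses the one-way cycle 0 → 1 → 2 → 0 with rates 1, 1, 2: the distribution $(0.4, 0.4, 0.2)$ has zero net flux into every state, yet eqn. 3.8 fails pair by pair because no reverse transitions exist.

```python
import math


def boltzmann(E, kT=1.0):
    """Equilibrium probabilities P_i^eq proportional to exp(-E_i / kT)."""
    w = [math.exp(-e / kT) for e in E]
    Z = sum(w)
    return [x / Z for x in w]


def metropolis_rates(E, kT=1.0):
    """Hypothetical rates W[i][j] (for j -> i) = min(1, exp(-(E_i - E_j)/kT)), W[i][i] = 0."""
    n = len(E)
    return [[0.0 if i == j else min(1.0, math.exp(-(E[i] - E[j]) / kT))
             for j in range(n)] for i in range(n)]


def detailed_balance_violation(W, P):
    """max over pairs of |W_ij P_j - W_ji P_i|; zero iff eqn. (3.8) holds."""
    n = len(P)
    return max(abs(W[i][j] * P[j] - W[j][i] * P[i]) for i in range(n) for j in range(n))


def net_flux(W, P):
    """Right-hand side of eqn. (3.1); all zero iff P is stationary."""
    n = len(P)
    return [sum(W[i][j] * P[j] - W[j][i] * P[i] for j in range(n)) for i in range(n)]


E = [0.0, 0.5, 2.0]
W_db, P_db = metropolis_rates(E), boltzmann(E)
print(detailed_balance_violation(W_db, P_db), net_flux(W_db, P_db))

W_cyc = [[0.0, 0.0, 2.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # cycle 0 -> 1 -> 2 -> 0
P_cyc = [0.4, 0.4, 0.2]
print(detailed_balance_violation(W_cyc, P_cyc), net_flux(W_cyc, P_cyc))
```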

3.2.4 Boltzmann’s H-theorem

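Suppose for the moment that $\Gamma$ is a symmetric matrix, i.e. $\Gamma_{ij} = \Gamma_{ji}$. Then construct the function
$$ H(t) = \sum_i P_i(t) \ln P_i(t) \,. \tag{3.10} $$
Then
$$ \begin{aligned} \frac{dH}{dt} &= \sum_i \frac{dP_i}{dt} \big( 1 + \ln P_i \big) = \sum_i \frac{dP_i}{dt} \ln P_i \\ &= -\sum_{i,j} \Gamma_{ij}\, P_j \ln P_i \\ &= \sum_{i,j} \Gamma_{ij}\, P_j \big( \ln P_j - \ln P_i \big) \,, \end{aligned} \tag{3.11} $$
where we have used $\sum_i \Gamma_{ij} = 0$ (which also makes $\sum_i dP_i/dt$ vanish, so the ‘1’ in the first line drops out). Now switch $i \leftrightarrow j$ in the above sum and add the terms to get
$$ \frac{dH}{dt} = \frac{1}{2} \sum_{i,j} \Gamma_{ij} \big( P_i - P_j \big) \big( \ln P_i - \ln P_j \big) \,. \tag{3.12} $$
Note that the $i = j$ term does not contribute to the sum. For $i \neq j$ we have $\Gamma_{ij} = -W_{ij} \le 0$, and using the result
$$ (x - y)(\ln x - \ln y) \ge 0 \,, \tag{3.13} $$
we conclude
$$ \frac{dH}{dt} \le 0 \,. \tag{3.14} $$
In equilibrium, $P_i^{\rm eq}$ is a constant, independent of $i$. We write
$$ P_i^{\rm eq} = \frac{1}{\Omega} \,, \quad \Omega = \sum_i 1 \quad\Longrightarrow\quad H = -\ln \Omega \,. \tag{3.15} $$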

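If $\Gamma_{ij} \neq \Gamma_{ji}$, we can still prove a version of the H-theorem, provided the rates satisfy detailed balance (eqn. 3.8). Define a new symmetric matrix
$$ \overline{W}_{ij} \equiv W_{ij}\, P_j^{\rm eq} = W_{ji}\, P_i^{\rm eq} = \overline{W}_{ji} \,, \tag{3.16} $$
and the generalized H-function,
$$ H(t) \equiv \sum_i P_i(t) \ln\!\left( \frac{P_i(t)}{P_i^{\rm eq}} \right) . \tag{3.17} $$
Then
$$ \frac{dH}{dt} = -\frac{1}{2} \sum_{i,j} \overline{W}_{ij} \left( \frac{P_i}{P_i^{\rm eq}} - \frac{P_j}{P_j^{\rm eq}} \right) \left[ \ln\!\left( \frac{P_i}{P_i^{\rm eq}} \right) - \ln\!\left( \frac{P_j}{P_j^{\rm eq}} \right) \right] \le 0 \,. \tag{3.18} $$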
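The generalized H-theorem can be watched numerically. The Python sketch below reuses hypothetical Metropolis-style rates (which satisfy detailed balance with respect to Boltzmann weights, $k_B T = 1$, made-up energies), integrates eqn. 3.1 with small forward-Euler steps from a strictly positive, non-equilibrium starting distribution, and records both $H(t)$ from eqn. 3.17 and the right-hand side of eqn. 3.18. For a small enough step, each Euler update is itself a stochastic matrix that leaves $P^{\rm eq}$ fixed, so the discrete sequence of $H$ values is also non-increasing, up to rounding.

```python
import math


def boltzmann(E):
    """P_i^eq proportional to exp(-E_i), i.e. kT = 1."""
    w = [math.exp(-e) for e in E]
    Z = sum(w)
    return [x / Z for x in w]


def metropolis_rates(E):
    """Hypothetical rates W[i][j] (for j -> i) = min(1, exp(-(E_i - E_j))), W[i][i] = 0."""
    n = len(E)
    return [[0.0 if i == j else min(1.0, math.exp(-(E[i] - E[j]))) for j in range(n)]
            for i in range(n)]


def H_rel(P, Peq):
    """Generalized H-function, eqn. (3.17)."""
    return sum(p * math.log(p / q) for p, q in zip(P, Peq))


def dH_dt(W, P, Peq):
    """Right-hand side of eqn. (3.18), with Wbar_ij = W_ij P_j^eq."""
    n = len(P)
    y = [p / q for p, q in zip(P, Peq)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            Wbar = W[i][j] * Peq[j]
            total += Wbar * (y[i] - y[j]) * (math.log(y[i]) - math.log(y[j]))
    return -0.5 * total


E = [0.0, 0.5, 2.0]
W, Peq = metropolis_rates(E), boltzmann(E)
P, dt = [0.05, 0.15, 0.80], 0.01
H_values, rates = [], []
for _ in range(2000):
    H_values.append(H_rel(P, Peq))
    rates.append(dH_dt(W, P, Peq))
    flux = [sum(W[i][j] * P[j] - W[j][i] * P[i] for j in range(3)) for i in range(3)]
    P = [p + dt * f for p, f in zip(P, flux)]
print(H_values[0], H_values[-1], max(rates))
```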

3.3 Phase Flows in Classical Mechanics

3.3.1 Hamiltonian evolution

The master equation provides us with a semi-phenomenological description of a dynamical system's relaxation to equilibrium. It explicitly breaks time reversal symmetry. Yet the microscopic laws of Nature are (approximately) time-reversal symmetric. How can a system which obeys Hamilton's equations of motion come to equilibrium?
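Let's start our investigation by reviewing the basics of Hamiltonian dynamics. Recall the Lagrangian $L = L(q, \dot{q}, t) = T - V$. The Euler-Lagrange equations of motion for the action $S\big[q(t)\big] = \int dt\, L$ are
$$ \dot{p}_\sigma = \frac{d}{dt} \left( \frac{\partial L}{\partial \dot{q}_\sigma} \right) = \frac{\partial L}{\partial q_\sigma} \,, \tag{3.19} $$
where $p_\sigma$ is the canonical momentum conjugate to the generalized coordinate $q_\sigma$:
$$ p_\sigma = \frac{\partial L}{\partial \dot{q}_\sigma} \,. \tag{3.20} $$
The Hamiltonian, $H(q,p)$, is obtained by a Legendre transformation,
$$ H(q,p) = \sum_{\sigma=1}^{r} p_\sigma\, \dot{q}_\sigma - L \,. \tag{3.21} $$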

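Note that
$$ \begin{aligned} dH &= \sum_{\sigma=1}^{r} \left( p_\sigma\, d\dot{q}_\sigma + \dot{q}_\sigma\, dp_\sigma - \frac{\partial L}{\partial q_\sigma}\, dq_\sigma - \frac{\partial L}{\partial \dot{q}_\sigma}\, d\dot{q}_\sigma \right) - \frac{\partial L}{\partial t}\, dt \\ &= \sum_{\sigma=1}^{r} \left( \dot{q}_\sigma\, dp_\sigma - \frac{\partial L}{\partial q_\sigma}\, dq_\sigma \right) - \frac{\partial L}{\partial t}\, dt \,. \end{aligned} \tag{3.22} $$
Thus, we obtain Hamilton's equations of motion,
$$ \frac{\partial H}{\partial p_\sigma} = \dot{q}_\sigma \,, \qquad \frac{\partial H}{\partial q_\sigma} = -\frac{\partial L}{\partial q_\sigma} = -\dot{p}_\sigma \tag{3.23} $$
and
$$ \frac{dH}{dt} = \frac{\partial H}{\partial t} = -\frac{\partial L}{\partial t} \,. \tag{3.24} $$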
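Define the $2r$-component vector $\varphi$ by its components,
$$ \varphi_i = \begin{cases} q_i & \text{if } 1 \le i \le r \\ p_{i-r} & \text{if } r < i \le 2r \,. \end{cases} \tag{3.25} $$
Then we may write Hamilton's equations compactly as
$$ \dot{\varphi}_i = J_{ij}\, \frac{\partial H}{\partial \varphi_j} \,, \tag{3.26} $$
where
$$ J = \begin{pmatrix} 0_{r \times r} & 1_{r \times r} \\ -1_{r \times r} & 0_{r \times r} \end{pmatrix} \tag{3.27} $$
is a $2r \times 2r$ matrix. Note that $J^{\rm t} = -J$, i.e. $J$ is antisymmetric, and that $J^2 = -1_{2r \times 2r}$.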

3.3.2 Dynamical systems and the evolution of phase space volumes

Consider a general dynamical system,
$$ \frac{d\varphi}{dt} = V(\varphi) \,, \tag{3.28} $$
where $\varphi(t)$ is a point in an $n$-dimensional phase space. Consider now a compact² region $R_0$ in phase space, and consider its evolution under the dynamics. That is, $R_0$ consists of a set of points $\{\varphi \mid \varphi \in R_0\}$, and if we regard each $\varphi \in R_0$ as an initial condition, we can define the time-dependent set $R(t)$ as the set of points $\varphi(t)$ that were in $R_0$ at time $t = 0$:
$$ R(t) = \big\{ \varphi(t) \,\big|\, \varphi(0) \in R_0 \big\} \,. \tag{3.29} $$

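Now consider the volume $\Omega(t)$ of the set $R(t)$. We have
$$ \Omega(t) = \int_{R(t)} d\mu \tag{3.30} $$
where
$$ d\mu = d\varphi_1\, d\varphi_2 \cdots d\varphi_n \,, \tag{3.31} $$
for an $n$-dimensional phase space. We then have
$$ \Omega(t + dt) = \int_{R(t+dt)} d\mu = \int_{R(t)} d\mu\, \left| \frac{\partial \varphi_i(t + dt)}{\partial \varphi_j(t)} \right| , \tag{3.32} $$
where
$$ \left| \frac{\partial \varphi_i(t + dt)}{\partial \varphi_j(t)} \right| \equiv \frac{\partial(\varphi'_1, \ldots, \varphi'_n)}{\partial(\varphi_1, \ldots, \varphi_n)} \tag{3.33} $$
is a determinant, which is the Jacobian of the transformation from the set of coordinates $\{\varphi_i = \varphi_i(t)\}$ to the coordinates $\{\varphi'_i = \varphi_i(t + dt)\}$. But according to the dynamics, we have
$$ \varphi_i(t + dt) = \varphi_i(t) + V_i\big(\varphi(t)\big)\, dt + O(dt^2) \tag{3.34} $$
and therefore
$$ \frac{\partial \varphi_i(t + dt)}{\partial \varphi_j(t)} = \delta_{ij} + \frac{\partial V_i}{\partial \varphi_j}\, dt + O(dt^2) \,. \tag{3.35} $$
We now make use of the equality
$$ \ln \det M = \operatorname{Tr} \ln M \,, \tag{3.36} $$
for any matrix $M$, which gives us,³ for small $\varepsilon$,
$$ \det\big( 1 + \varepsilon A \big) = \exp \operatorname{Tr} \ln\big( 1 + \varepsilon A \big) = 1 + \varepsilon \operatorname{Tr} A + \tfrac{1}{2} \varepsilon^2 \Big[ \big( \operatorname{Tr} A \big)^2 - \operatorname{Tr}\big( A^2 \big) \Big] + \ldots \tag{3.37} $$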
² ‘Compact’ in the parlance of mathematical analysis means ‘closed and bounded’.
³ The equality $\ln \det M = \operatorname{Tr} \ln M$ is most easily proven by bringing the matrix to diagonal form via a similarity transformation, and proving the equality for diagonal matrices.

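Thus,

$$\Omega(t+dt) = \Omega(t) + \int\limits_{R(t)}\! d\mu\ \nabla\cdot V\,dt + \mathcal{O}(dt^2)\ , \qquad (3.38)$$

which says

$$\frac{d\Omega}{dt} = \int\limits_{R(t)}\! d\mu\ \nabla\cdot V = \oint\limits_{\partial R(t)}\!\! dS\ \hat{n}\cdot V \qquad (3.39)$$

Here, the divergence is the phase space divergence,

$$\nabla\cdot V = \sum_{i=1}^n \frac{\partial V_i}{\partial\phi_i}\ , \qquad (3.40)$$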

and we have used the divergence theorem to convert the volume integral of the divergence to a surface
integral of n̂ · V , where n̂ is the surface normal and dS is the differential element of surface area, and
∂R denotes the boundary of the region R. We see that if ∇·V = 0 everywhere in phase space, then Ω(t)
is a constant, and phase space volumes are preserved by the evolution of the system.
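The small-ε expansion (3.37), which is what turns the Jacobian into the divergence, is easy to check numerically. The sketch below (plain Python; the random 3 × 3 test matrix is an arbitrary illustrative choice) compares det(1 + εA) with its second-order expansion. For a 3 × 3 matrix the remainder is exactly ε³ det A, so it should shrink by a factor of about 1000 each time ε is reduced tenfold.

```python
import random


def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, k) = M
    return a * (e * k - f * h) - b * (d * k - f * g) + c * (d * h - e * g)


def trace(M):
    return sum(M[i][i] for i in range(len(M)))


def square(M):
    n = len(M)
    return [[sum(M[i][m] * M[m][j] for m in range(n)) for j in range(n)] for i in range(n)]


def one_plus(A, eps):
    """The matrix 1 + eps * A."""
    return [[(1.0 if i == j else 0.0) + eps * A[i][j] for j in range(3)] for i in range(3)]


def second_order(A, eps):
    """Right-hand side of eqn. 3.37, truncated after the eps**2 term."""
    return 1 + eps * trace(A) + 0.5 * eps**2 * (trace(A) ** 2 - trace(square(A)))


rng = random.Random(1)
A = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
for eps in (1e-1, 1e-2, 1e-3):
    residual = det3(one_plus(A, eps)) - second_order(A, eps)
    # For a 3x3 matrix the remainder is exactly eps**3 * det(A).
    print(f"eps={eps:g}  residual={residual:.3e}  eps^3*detA={eps**3 * det3(A):.3e}")
```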
For an alternative derivation, consider a function ϱ(ϕ, t) which is defined to be the density of some collection of points in phase space at phase space position ϕ and time t. This must satisfy the continuity equation,

$$\frac{\partial\varrho}{\partial t} + \nabla\cdot(\varrho V) = 0\ . \qquad (3.41)$$

The continuity equation says that ‘nobody gets lost’. If we integrate it over a region of phase space R, we have

$$\frac{d}{dt}\int\limits_R\! d\mu\ \varrho = -\int\limits_R\! d\mu\ \nabla\cdot(\varrho V) = -\oint\limits_{\partial R}\! dS\ \hat{n}\cdot(\varrho V)\ . \qquad (3.42)$$

It is perhaps helpful to think of ϱ as a charge density, in which case J = ϱV is the current density. The above equation then says

$$\frac{dQ_R}{dt} = -\oint\limits_{\partial R}\! dS\ \hat{n}\cdot J\ , \qquad (3.43)$$

where Q_R is the total charge contained inside the region R. In other words, the rate of increase or decrease of the charge within the region R is equal to the total integrated current flowing in or out of R at its boundary.
The Leibniz rule lets us write the continuity equation as

$$\frac{\partial\varrho}{\partial t} + V\cdot\nabla\varrho + \varrho\,\nabla\cdot V = 0\ . \qquad (3.44)$$

But now suppose that the phase flow is divergenceless, i.e. ∇·V = 0. Then we have

$$\frac{D\varrho}{Dt} \equiv \left(\frac{\partial}{\partial t} + V\cdot\nabla\right)\varrho = 0\ . \qquad (3.45)$$

Figure 3.1: Time evolution of two immiscible fluids. The local density remains constant.

The combination inside the brackets above is known as the convective derivative. It tells us the total rate of change of ϱ for an observer moving with the phase flow. That is,

$$\frac{d}{dt}\,\varrho\big(\phi(t),t\big) = \frac{\partial\varrho}{\partial\phi_i}\frac{d\phi_i}{dt} + \frac{\partial\varrho}{\partial t} = \sum_{i=1}^n V_i\,\frac{\partial\varrho}{\partial\phi_i} + \frac{\partial\varrho}{\partial t} = \frac{D\varrho}{Dt}\ . \qquad (3.46)$$

If Dϱ/Dt = 0, the local density remains the same during the evolution of the system. If we consider the ‘characteristic function’

$$\varrho(\phi, t=0) = \begin{cases} 1 & \text{if } \phi \in R_0 \\ 0 & \text{otherwise} \end{cases} \qquad (3.47)$$

then the vanishing of the convective derivative means that the image of the set R₀ under time evolution will always have the same volume.
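Hamiltonian evolution in classical mechanics is volume preserving. The equations of motion are

$$\dot q_i = +\frac{\partial H}{\partial p_i}\ , \qquad \dot p_i = -\frac{\partial H}{\partial q_i} \qquad (3.48)$$

A point in phase space is specified by r positions qᵢ and r momenta pᵢ, hence the dimension of phase space is n = 2r:

$$\phi = \begin{pmatrix} q \\ p \end{pmatrix}\ , \qquad V = \begin{pmatrix} \dot q \\ \dot p \end{pmatrix} = \begin{pmatrix} \partial H/\partial p \\ -\partial H/\partial q \end{pmatrix}\ . \qquad (3.49)$$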

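Hamilton’s equations of motion guarantee that the phase space flow is divergenceless, since the mixed second partial derivatives of H cancel:

$$\nabla\cdot V = \sum_{i=1}^r \left\{\frac{\partial\dot q_i}{\partial q_i} + \frac{\partial\dot p_i}{\partial p_i}\right\} = \sum_{i=1}^r \left\{\frac{\partial}{\partial q_i}\left(\frac{\partial H}{\partial p_i}\right) + \frac{\partial}{\partial p_i}\left(-\frac{\partial H}{\partial q_i}\right)\right\} = 0\ . \qquad (3.50)$$

Thus, the convective derivative vanishes, viz.

$$\frac{D\varrho}{Dt} \equiv \frac{\partial\varrho}{\partial t} + V\cdot\nabla\varrho = 0\ , \qquad (3.51)$$

for any distribution ϱ(ϕ, t) on phase space which evolves according to the continuity equation (3.41). Hence the value of the density ϱ(ϕ(t), t) along any trajectory is constant, which tells us that the phase flow is incompressible. In particular, phase space volumes are preserved.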
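Volume preservation can also be seen numerically. The sketch below (standard-library Python) uses the pendulum H = p²/2 − cos q as an illustrative Hamiltonian, which is an assumption and not an example from the text, and estimates the Jacobian determinant (3.33) of the map from initial to final phase space points by central finite differences. The integrator is the leapfrog (Störmer-Verlet) scheme, chosen because each of its sub-steps is an area-preserving shear; a generic integrator such as forward Euler would not preserve phase space area exactly.

```python
import math


def leapfrog(q, p, dt=0.01, steps=1000):
    """Evolve the pendulum H = p**2/2 - cos(q) to t = steps * dt with leapfrog.

    Each sub-step is a shear of the (q, p) plane, so the discrete map is itself
    exactly area preserving, like the exact Hamiltonian flow it approximates.
    """
    for _ in range(steps):
        p -= 0.5 * dt * math.sin(q)   # half kick:  p_dot = -dH/dq = -sin q
        q += dt * p                   # drift:      q_dot = +dH/dp = p
        p -= 0.5 * dt * math.sin(q)   # half kick
    return q, p


def jacobian_det(flow, q0, p0, h=1e-5):
    """Central finite-difference estimate of det d(q(t), p(t)) / d(q0, p0)."""
    qa, pa = flow(q0 + h, p0)
    qb, pb = flow(q0 - h, p0)
    dq_dq0, dp_dq0 = (qa - qb) / (2 * h), (pa - pb) / (2 * h)
    qa, pa = flow(q0, p0 + h)
    qb, pb = flow(q0, p0 - h)
    dq_dp0, dp_dp0 = (qa - qb) / (2 * h), (pa - pb) / (2 * h)
    return dq_dq0 * dp_dp0 - dq_dp0 * dp_dq0


for q0, p0 in [(0.5, 0.0), (1.2, 0.3), (-1.0, 1.5)]:
    # Each determinant should equal 1 up to finite-difference error.
    print(q0, p0, jacobian_det(leapfrog, q0, p0))
```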

3.3.3 Liouville’s equation and the microcanonical distribution

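Let ϱ(ϕ) = ϱ(q, p) be a distribution on phase space. Assuming the evolution is Hamiltonian, we can write

$$\frac{\partial\varrho}{\partial t} = -\dot\phi\cdot\nabla\varrho = -\sum_{k=1}^r \left(\dot q_k\,\frac{\partial}{\partial q_k} + \dot p_k\,\frac{\partial}{\partial p_k}\right)\varrho = -i\hat L\varrho\ , \qquad (3.52)$$

where L̂ is a differential operator known as the Liouvillian:

$$\hat L = -i\sum_{k=1}^r \left\{\frac{\partial H}{\partial p_k}\frac{\partial}{\partial q_k} - \frac{\partial H}{\partial q_k}\frac{\partial}{\partial p_k}\right\}\ . \qquad (3.53)$$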

Eqn. 3.52, known as Liouville’s equation, bears an obvious resemblance to the Schrödinger equation from
quantum mechanics.
Suppose that Λa (ϕ) is conserved by the dynamics of the system. Typical conserved quantities include
the components of the total linear momentum (if there is translational invariance), the components of the
total angular momentum (if there is rotational invariance), and the Hamiltonian itself (if the Lagrangian
is not explicitly time-dependent). Now consider a distribution ϱ(ϕ, t) = ϱ(Λ₁, Λ₂, …, Λ_k) which is a function only of these various conserved quantities. Then from the chain rule, we have

$$\dot\phi\cdot\nabla\varrho = \sum_a \frac{\partial\varrho}{\partial\Lambda_a}\,\dot\phi\cdot\nabla\Lambda_a = 0\ , \qquad (3.54)$$

since for each a we have

$$\frac{d\Lambda_a}{dt} = \sum_{\sigma=1}^r \left(\frac{\partial\Lambda_a}{\partial q_\sigma}\,\dot q_\sigma + \frac{\partial\Lambda_a}{\partial p_\sigma}\,\dot p_\sigma\right) = \dot\phi\cdot\nabla\Lambda_a = 0\ . \qquad (3.55)$$

We conclude that any distribution ϱ(ϕ, t) = ϱ(Λ₁, Λ₂, …, Λ_k) which is a function solely of conserved dynamical quantities is a stationary solution to Liouville’s equation.

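Clearly the microcanonical distribution,

$$\varrho_E(\phi) = \frac{\delta\big(E - H(\phi)\big)}{D(E)} = \frac{\delta\big(E - H(\phi)\big)}{\int\! d\mu\ \delta\big(E - H(\phi)\big)}\ , \qquad (3.56)$$

is a fixed point solution of Liouville’s equation, since it depends on ϕ only through the conserved quantity H(ϕ).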

3.4 Irreversibility and Poincaré Recurrence

The dynamics of the master equation describe an approach to equilibrium. These dynamics are irreversible: dH/dt ≤ 0, where H is Boltzmann’s H-function. However, the microscopic laws of physics are (almost) time-reversal invariant⁴, so how can we understand the emergence of irreversibility? Furthermore, any dynamics which are deterministic, invertible, and volume-preserving in a finite phase space exhibit the phenomenon of Poincaré recurrence, which guarantees that phase space trajectories return arbitrarily close to their initial conditions if one waits long enough.

3.4.1 Poincaré recurrence theorem

The proof of the recurrence theorem is simple. Let gτ be the ‘τ -advance mapping’ which evolves points
in phase space according to Hamilton’s equations. Assume that gτ is invertible and volume-preserving,
as is the case for Hamiltonian flow. Further assume that phase space volume is finite. Since energy is
preserved in the case of time-independent Hamiltonians, we simply ask that the volume of phase space
at fixed total energy E be finite, i.e.
$$\int\! d\mu\ \delta\big(E - H(q,p)\big) < \infty\ , \qquad (3.57)$$

where dµ = dq dp is the phase space uniform integration measure.

Theorem: In any finite neighborhood R0 of phase space there exists a point ϕ0 which will return to R0
after m applications of gτ , where m is finite.
Proof: Assume the theorem fails; we will show this assumption results in a contradiction. Consider the set Υ formed from the union of all sets $g_\tau^k R_0$ for all k ∈ ℕ:

$$\Upsilon = \bigcup_{k=0}^\infty g_\tau^k R_0 \qquad (3.58)$$

We assume that the sets $\{g_\tau^k R_0 \,|\, k \in \mathbb{N}\}$ are pairwise disjoint⁵. The volume of a union of disjoint sets is the sum of the individual volumes. Thus,

$$\mathrm{vol}(\Upsilon) = \sum_{k=0}^\infty \mathrm{vol}\big(g_\tau^k R_0\big) = \mathrm{vol}(R_0)\cdot\sum_{k=0}^\infty 1 = \infty\ , \qquad (3.59)$$

since $\mathrm{vol}(g_\tau^k R_0) = \mathrm{vol}(R_0)$ from volume preservation. But clearly Υ is a subset of the entire phase space, hence we have a contradiction, because by assumption phase space is of finite volume.

⁴ Actually, the microscopic laws of physics are not time-reversal invariant, but rather are invariant under the product PCT, where P is parity, C is charge conjugation, and T is time reversal.
⁵ The set of natural numbers ℕ is taken here to be the set of non-negative integers {0, 1, 2, …}.

Figure 3.2: Successive images of a set R₀ under the τ-advance mapping $g_\tau$, projected onto a two-dimensional phase plane. The Poincaré recurrence theorem guarantees that if phase space has finite volume, and $g_\tau$ is invertible and volume preserving, then for any set R₀ there exists an integer m such that $R_0 \cap g_\tau^m R_0 \neq \emptyset$.
Thus, the assumption that the sets $\{g_\tau^k R_0 \,|\, k \in \mathbb{N}\}$ are pairwise disjoint fails. This means that there exists some pair of integers k and l, with k ≠ l, such that $g_\tau^k R_0 \cap g_\tau^l R_0 \neq \emptyset$. Without loss of generality we may assume k < l. Apply the inverse $g_\tau^{-1}$ to this relation k times to get $g_\tau^{l-k} R_0 \cap R_0 \neq \emptyset$. Now choose any point $\phi_1 \in g_\tau^m R_0 \cap R_0$, where m = l − k, and define $\phi_0 = g_\tau^{-m}\phi_1$. Then by construction both ϕ₀ and $g_\tau^m\phi_0$ lie within R₀ and the theorem is proven.
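The theorem guarantees a return but says nothing about how long it takes. A minimal sketch, using a toy map chosen purely for illustration (rotation of the unit circle by an irrational angle, x ↦ x + α mod 1, which is invertible and length preserving on a space of finite measure), records the first return time to an ε-neighborhood of the starting point. The value of α and the starting point are arbitrary assumptions.

```python
import math

ALPHA = (math.sqrt(5) - 1) / 2   # irrational rotation number (illustrative choice)


def circle_distance(x, y):
    """Distance between two points of the circle [0, 1) with 0 and 1 identified."""
    d = abs(x - y) % 1.0
    return min(d, 1.0 - d)


def first_return(x0, eps, max_iter=10**6):
    """Smallest m >= 1 with g^m(x0) within eps of x0, for g(x) = x + ALPHA mod 1.

    g is invertible (rotate backwards) and preserves length on a space of finite
    measure, so recurrence is guaranteed; the question is only how long it takes.
    """
    for m in range(1, max_iter + 1):
        if circle_distance((x0 + m * ALPHA) % 1.0, x0) < eps:
            return m
    return None


for eps in (1e-1, 1e-2, 1e-3, 1e-4):
    print(eps, first_return(0.2, eps))   # 5, 55, 610, 6765: Fibonacci numbers
```

For this golden-mean rotation the return times are Fibonacci numbers and grow roughly like 1/ε; the shrinking target is what makes recurrence slow.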
Poincaré recurrence has remarkable implications. Consider a bottle of perfume which is opened in an otherwise evacuated room, as depicted in fig. 3.3. The perfume molecules evolve according to Hamiltonian evolution. The positions are bounded because physical space is finite. The momenta are bounded because the total energy is conserved, hence no single particle can have a momentum such that $T(p) > E_{\rm TOT}$, where T(p) is the single particle kinetic energy function⁶. Thus, phase space, however large, is still bounded. Hamiltonian evolution, as we have seen, is invertible and volume preserving, therefore the system is recurrent. All the molecules must eventually return to the bottle. What’s more, they all must return with momenta arbitrarily close to their initial momenta!⁷ In this case, we could define the region R₀ as

$$R_0 = \Big\{ (q_1,\ldots,q_r,p_1,\ldots,p_r) \ \Big|\ |q_i - q_i^0| \le \Delta q \ \text{and}\ |p_j - p_j^0| \le \Delta p \ \ \forall\, i,j \Big\}\ , \qquad (3.60)$$

which specifies a hypercube in phase space centered about the point (q⁰, p⁰).

⁶ In the nonrelativistic limit, $T = p^2/2m$. For relativistic particles, we have $T = (p^2c^2 + m^2c^4)^{1/2} - mc^2$.

Figure 3.3: Poincaré recurrence guarantees that if we remove the cap from a bottle of perfume in an otherwise evacuated room, all the perfume molecules will eventually return to the bottle! (Here H is the Hubble constant.)
Each of the three central assumptions – finite phase space, invertibility, and volume preservation – is
crucial. If any one of these assumptions does not hold, the proof fails. Obviously if phase space is infinite
the flow needn’t be recurrent since it can keep moving off in a particular direction. Consider next a
volume-preserving map which is not invertible. An example might be a mapping f : ℝ → ℝ which takes
any real number to its fractional part. Thus, f(π) = 0.14159265…. Let us restrict our attention to
intervals of width less than unity. Clearly f is then volume preserving. The action of f on the interval
[2, 3) is to map it to the interval [0, 1). But [0, 1) remains fixed under the action of f , so no point within
the interval [2, 3) will ever return under repeated iterations of f . Thus, f does not exhibit Poincaré
recurrence.
Consider next the case of the damped harmonic oscillator. In this case, phase space volumes contract. For a one-dimensional oscillator obeying $\ddot x + 2\beta\dot x + \Omega_0^2 x = 0$ one has ∇·V = −2β < 0, since β > 0 for physical damping. Thus the convective derivative is $D_t\varrho = -(\nabla\cdot V)\,\varrho = 2\beta\varrho$, which says that the density increases exponentially in the comoving frame, as $\varrho(t) = e^{2\beta t}\varrho(0)$. Thus, phase space volumes collapse: $\Omega(t) = e^{-2\beta t}\,\Omega(0)$, and are not preserved by the dynamics. The proof of recurrence therefore fails. In this case, it is possible for the set Υ to be of finite volume, even if it is the union of an infinite number of sets $g_\tau^k R_0$, because the volumes of these component sets themselves decrease exponentially, as $\mathrm{vol}(g_\tau^n R_0) = e^{-2n\beta\tau}\,\mathrm{vol}(R_0)$. A damped pendulum, released from rest at some small angle θ₀, will not return arbitrarily close to these initial conditions.
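The contraction Ω(t) = e^{−2βt} Ω(0) can be checked by integrating the oscillator. In this sketch the parameter values β = 0.1 and Ω₀ = 1 are illustrative assumptions. The equation of motion is linear, so the Jacobian of the map from initial to final phase space points has the evolved unit vectors as its columns, and its determinant is the ratio Ω(t)/Ω(0). A fourth-order Runge-Kutta integrator is used; it is not volume preserving in general, but it is accurate enough here for the numerical ratio to match e^{−2βt} closely.

```python
import math


def rk4_step(f, y, dt):
    """One classical fourth-order Runge-Kutta step for y' = f(y)."""
    k1 = f(y)
    k2 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k1)])
    k3 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k2)])
    k4 = f([yi + dt * ki for yi, ki in zip(y, k3)])
    return [yi + dt / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def evolve(y, f, t, dt=1e-3):
    for _ in range(round(t / dt)):
        y = rk4_step(f, y, dt)
    return y


BETA, OMEGA0 = 0.1, 1.0   # illustrative damping rate and natural frequency


def damped(y):
    """Phase space velocity V for x_ddot + 2 BETA x_dot + OMEGA0**2 x = 0."""
    x, v = y
    return [v, -2 * BETA * v - OMEGA0**2 * x]


t = 5.0
# The flow is linear, so the columns of its Jacobian are the evolved unit vectors.
c1 = evolve([1.0, 0.0], damped, t)
c2 = evolve([0.0, 1.0], damped, t)
volume_ratio = c1[0] * c2[1] - c2[0] * c1[1]   # Omega(t) / Omega(0)
print(volume_ratio, math.exp(-2 * BETA * t))   # both about 0.3679
```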

⁷ Actually, what the recurrence theorem guarantees is that there is a configuration arbitrarily close to the initial one which recurs, to within the same degree of closeness.

Figure 3.4: Left: A configuration of the Kac ring with N = 16 sites and F = 4 flippers. The flippers,
which live on the links, are represented by blue dots. Right: The ring system after one time step.
Evolution proceeds by clockwise rotation. Spins passing through flippers are flipped.

3.4.2 Kac ring model

The implications of the Poincaré recurrence theorem are surprising – even shocking. If one takes a bottle
of perfume in a sealed, evacuated room and opens it, the perfume molecules will diffuse throughout the
room. The recurrence theorem guarantees that after some finite time T all the molecules will go back
inside the bottle (and arbitrarily close to their initial velocities as well). The hitch is that this could take
a very long time, e.g. much much longer than the age of the Universe.
On less absurd time scales, we know that most systems come to thermodynamic equilibrium. But
how can a system both exhibit equilibration and Poincaré recurrence? The two concepts seem utterly
incompatible!
A beautifully simple model due to Kac shows how a recurrent system can exhibit the phenomenon of
equilibration. Consider a ring with N sites. On each site, place a ‘spin’ which can be in one of two states:
up or down. Along the N links of the system, F of them contain ‘flippers’. The configuration of the
flippers is set at the outset and never changes. The dynamics of the system are as follows: during each
time step, every spin moves clockwise a distance of one lattice spacing. Spins which pass through flippers
reverse their orientation: up becomes down, and down becomes up.
The ‘phase space’ for this system consists of $2^N$ discrete configurations. Since each configuration maps
onto a unique image under the evolution of the system, phase space ‘volume’ is preserved. The evolution
is invertible; the inverse is obtained simply by rotating the spins counterclockwise. Figure 3.4 depicts an
example configuration for the system, and its first iteration under the dynamics.
Figure 3.5: Three simulations of the Kac ring model with N = 2500 sites and three different concentrations of flippers. The red line shows the magnetization as a function of time, starting from an initial configuration in which 100% of the spins are up. The blue line shows the prediction of the Stosszahlansatz, which yields an exponentially decaying magnetization with time constant τ.

Suppose the flippers were not fixed, but moved about randomly. In this case, we could focus on a single spin and determine its configuration probabilistically. Let pₙ be the probability that a given spin is in the up configuration at time n. The probability that it is up at time (n + 1) is then

$$p_{n+1} = (1-x)\,p_n + x\,(1-p_n)\ , \qquad (3.61)$$
where x = F/N is the fraction of flippers in the system. In words: a spin will be up at time (n + 1) if it was up at time n and did not pass through a flipper, or if it was down at time n and did pass through a flipper. If the flipper locations are randomized at each time step, then the probability of flipping is simply x = F/N. Equation 3.61 can be solved immediately:

$$p_n = \tfrac{1}{2} + (1-2x)^n\,\big(p_0 - \tfrac{1}{2}\big)\ , \qquad (3.62)$$

which decays exponentially to the equilibrium value of $p_{\rm eq} = \frac{1}{2}$ with time scale

$$\tau(x) = -\frac{1}{\ln|1-2x|}\ . \qquad (3.63)$$

Figure 3.6: Simulations of the Kac ring model. Top: N = 2500 sites with F = 201 flippers. After 2500 iterations, each spin has flipped an odd number of times, so the recurrence time is 2N. Middle: N = 2500 with F = 2400, resulting in a near-complete reversal of the population with every iteration. Bottom: N = 25000 with F = 1000, showing long time equilibration and dramatic resurgence of the spin population.

We identify τ(x) as the microscopic relaxation time over which local equilibrium is established. If we define the magnetization m ≡ (N↑ − N↓)/N, then m = 2p − 1, so $m_n = (1-2x)^n\, m_0$. The equilibrium magnetization is $m_{\rm eq} = 0$. Note that for ½ < x < 1 the magnetization reverses sign each time step, as well as decreasing exponentially in magnitude.
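The recursion (3.61), its closed-form solution (3.62), and the relaxation time (3.63) can be cross-checked in a few lines. In the sketch below the flipper concentrations x = 0.1 and x = 0.9 are illustrative choices; for x = 0.9 the printed magnetizations alternate in sign while decaying, as just described, and both values share the same τ because only |1 − 2x| enters.

```python
import math


def iterate_master(p0, x, n):
    """Iterate the Stosszahlansatz recursion p_{n+1} = (1 - x) p_n + x (1 - p_n), eqn. 3.61."""
    p = p0
    for _ in range(n):
        p = (1 - x) * p + x * (1 - p)
    return p


def closed_form(p0, x, n):
    """Eqn. 3.62: p_n = 1/2 + (1 - 2x)^n (p0 - 1/2)."""
    return 0.5 + (1 - 2 * x) ** n * (p0 - 0.5)


def relaxation_time(x):
    """Eqn. 3.63: tau(x) = -1 / ln|1 - 2x|."""
    return -1.0 / math.log(abs(1 - 2 * x))


for x in (0.1, 0.9):   # illustrative flipper concentrations below and above 1/2
    ms = [2 * iterate_master(1.0, x, n) - 1 for n in range(6)]   # m_n = 2 p_n - 1
    print(x, [round(m, 4) for m in ms], "tau =", round(relaxation_time(x), 3))
```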
The assumption that leads to equation 3.61 is called the Stosszahlansatz⁸, a long German word meaning, approximately, ‘assumption on the counting of hits’. The resulting dynamics are irreversible: the magnetization inexorably decays to zero. However, the Kac ring model is purely deterministic, and the Stosszahlansatz can at best be an approximation to the true dynamics. Clearly the Stosszahlansatz fails to account for correlations such as the following: if spin i is flipped at time n, then spin i + 1 will have been flipped at time n − 1. Also if spin i is flipped at time n, then it also will be flipped at time n + N. Indeed, since the dynamics of the Kac ring model are invertible and volume preserving, it must exhibit Poincaré recurrence. We see this most vividly in figs. 3.5 and 3.6.

⁸ Unfortunately, many important physicists were German and we have to put up with a legacy of long German words like Gedankenexperiment, Zitterbewegung, Bremsstrahlung, Stosszahlansatz, Kartoffelsalat, etc.
The model is trivial to simulate. The results of such a simulation are shown in figure 3.5 for a ring of
N = 1000 sites, with F = 100 and F = 24 flippers. Note how the magnetization decays and fluctuates
about the equilibrium value meq = 0, but that after N iterations m recovers its initial value: mN = m0 .
The recurrence time for this system is simply N if F is even, and 2N if F is odd, since every spin will
then have flipped an even number of times.
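As the text says, the model is trivial to simulate. A minimal sketch follows (standard-library Python; the random placement of flippers, the seed, and the function names are illustrative assumptions). Starting from all spins up, it checks the exact recurrence, $m_N = m_0$ for even F and $m_N = -m_0$, $m_{2N} = m_0$ for odd F, and prints the early-time magnetization next to the Stosszahlansatz prediction $m_n = (1-2x)^n m_0$. The two agree exactly at n = 1 and remain close at early times, up to finite-N fluctuations.

```python
import random


def kac_step(spins, flippers):
    """Advance the Kac ring one time step.

    Every spin moves one site clockwise (site i -> i + 1 mod N) and is reversed
    if the link it crosses, flippers[i] for the link (i, i + 1), holds a flipper.
    """
    N = len(spins)
    new = [0] * N
    for i in range(N):
        new[(i + 1) % N] = -spins[i] if flippers[i] else spins[i]
    return new


def simulate(N, F, steps, seed=0):
    """Return the magnetization history m_0, ..., m_steps, starting from all spins up."""
    rng = random.Random(seed)
    flippers = [False] * N
    for link in rng.sample(range(N), F):   # F flippers on distinct, randomly chosen links
        flippers[link] = True
    spins = [1] * N
    history = [1.0]
    for _ in range(steps):
        spins = kac_step(spins, flippers)
        history.append(sum(spins) / N)
    return history


N = 1000
m_even = simulate(N, F=100, steps=N)        # x = 0.1, F even
m_odd = simulate(N, F=101, steps=2 * N)     # F odd
print(m_even[N])                  # 1.0: after N steps every spin crossed all F flippers
print(m_odd[N], m_odd[2 * N])     # -1.0 1.0: F odd, so the recurrence time is 2N
for n in (1, 5, 10, 20):          # early decay compared with m_n = (1 - 2x)^n
    print(n, m_even[n], (1 - 2 * 0.1) ** n)
```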
In figure 3.6 we plot three other simulations. The top panel shows a ring with an odd number of flippers, for which the recurrence time is 2N. The middle panel shows what happens when x > ½, so that the magnetization wants to reverse its sign with every iteration. The bottom panel shows a simulation for a larger ring, with N = 25000 sites. Note that the fluctuations in m about equilibrium are smaller than in the cases with N = 1000 sites. Why?

3.5 Remarks on Ergodic Theory

3.5.1 Definition of ergodicity

A mechanical system evolves according to Hamilton’s equations of motion. We have seen how such a
system is recurrent in the sense of Poincaré.
There is a level beyond recurrence called ergodicity. In an ergodic system, time averages over intervals [0, T] with T → ∞ may be replaced by phase space averages. The time average of a function f(ϕ) is defined as

$$\big\langle f(\phi)\big\rangle_t = \lim_{T\to\infty}\frac{1}{T}\int\limits_0^T\! dt\ f\big(\phi(t)\big)\ . \qquad (3.64)$$

For a Hamiltonian system, the phase space average of the same function is defined by

$$\big\langle f(\phi)\big\rangle_S = \int\! d\mu\ f(\phi)\,\delta\big(E - H(\phi)\big) \bigg/ \int\! d\mu\ \delta\big(E - H(\phi)\big)\ , \qquad (3.65)$$

where H(ϕ) = H(q, p) is the Hamiltonian, and where δ(x) is the Dirac δ-function. Thus,

$$\text{ergodicity} \quad\Longleftrightarrow\quad \big\langle f(\phi)\big\rangle_t = \big\langle f(\phi)\big\rangle_S\ , \qquad (3.66)$$

for all smooth functions f(ϕ) for which ⟨f(ϕ)⟩_S exists and is finite. Note that we do not average over all of phase space. Rather, we average only over a hypersurface along which H(ϕ) = E is fixed, i.e. over one of the level sets of the Hamiltonian function. This is because the dynamics preserves the energy.
Ergodicity means that almost all points ϕ will, upon Hamiltonian evolution, move in such a way as to
eventually pass through every finite neighborhood on the energy surface, and will spend equal time in
equal regions of phase space.
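Let χ_R(ϕ) be the characteristic function of a region R:

$$\chi_R(\phi) = \begin{cases} 1 & \text{if } \phi \in R \\ 0 & \text{otherwise,} \end{cases} \qquad (3.67)$$

where H(ϕ) = E for all ϕ ∈ R. Then

$$\big\langle \chi_R(\phi)\big\rangle_t = \lim_{T\to\infty} \left(\frac{\text{time spent in } R}{T}\right)\ . \qquad (3.68)$$

If the system is ergodic, then

$$\big\langle \chi_R(\phi)\big\rangle_t = P(R) = \frac{D_R(E)}{D(E)}\ , \qquad (3.69)$$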
where P(R) is the a priori probability to find ϕ ∈ R, based solely on the relative volumes of R and of the entire phase space. The latter,

$$D(E) = \int\! d\mu\ \delta\big(E - H(\phi)\big)\ , \qquad (3.70)$$

is called the density of states; it is the surface area of phase space at energy E. Similarly,

$$D_R(E) = \int\limits_R\! d\mu\ \delta\big(E - H(\phi)\big) \qquad (3.71)$$

is the density of states for the phase space subset R. Note that

$$D(E) \equiv \int\! d\mu\ \delta\big(E - H(\phi)\big) = \int\limits_{S_E}\! \frac{dS}{|\nabla H|} \qquad (3.72)$$

$$D(E) = \frac{d}{dE}\int\! d\mu\ \Theta\big(E - H(\phi)\big) = \frac{d\Omega(E)}{dE}\ . \qquad (3.73)$$

Here, dS is the differential surface element, S_E is the constant-H hypersurface H(ϕ) = E, and Ω(E) is the volume of phase space over which H(ϕ) < E. Note also that we may write

$$d\mu = dE\ d\Sigma_E\ , \qquad (3.74)$$

where

$$d\Sigma_E = \frac{dS}{|\nabla H|}\bigg|_{H(\phi)=E} \qquad (3.75)$$

is the invariant surface element.

3.5.2 The microcanonical ensemble

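The distribution,

$$\varrho_E(\phi) = \frac{\delta\big(E - H(\phi)\big)}{D(E)} = \frac{\delta\big(E - H(\phi)\big)}{\int\! d\mu\ \delta\big(E - H(\phi)\big)}\ , \qquad (3.76)$$

defines the microcanonical ensemble (µCE) of Gibbs.
We could also write

$$\big\langle f(\phi)\big\rangle_S = \frac{1}{D(E)}\int\limits_{S_E}\! d\Sigma_E\ f(\phi)\ , \qquad (3.77)$$

integrating over the hypersurface S_E rather than the entire phase space.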

Figure 3.7: Constant phase space velocity at an irrational angle over a toroidal phase space is ergodic,
but not mixing. A circle remains a circle, and a blob remains a blob.

3.5.3 Ergodicity and mixing

Just because a system is ergodic, it doesn’t necessarily mean that ϱ(ϕ, t) → ϱ_eq(ϕ), for consider the following motion on the toroidal space {ϕ = (q, p) | 0 ≤ q < 1, 0 ≤ p < 1}, where we identify opposite edges, i.e. we impose periodic boundary conditions. We also take q and p to be dimensionless, for simplicity of notation. Let the dynamics be given by

$$\dot q = 1\ , \qquad \dot p = \alpha\ . \qquad (3.78)$$

The solution is

$$q(t) = q_0 + t\ , \qquad p(t) = p_0 + \alpha t\ , \qquad (3.79)$$

hence the phase curves are given by

$$p = p_0 + \alpha(q - q_0)\ . \qquad (3.80)$$

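Now consider the average of some function f(q, p). We can write f(q, p) in terms of its Fourier transform,

$$f(q,p) = \sum_{m,n} \hat f_{mn}\, e^{2\pi i(mq+np)}\ . \qquad (3.81)$$

We have, then,

$$f\big(q(t),p(t)\big) = \sum_{m,n} \hat f_{mn}\, e^{2\pi i(mq_0+np_0)}\, e^{2\pi i(m+\alpha n)t}\ . \qquad (3.82)$$

We can now perform the time average of f:

$$\big\langle f(q,p)\big\rangle_t = \hat f_{00} + \lim_{T\to\infty}\frac{1}{T}\sum_{m,n}{}'\ \hat f_{mn}\, e^{2\pi i(mq_0+np_0)}\ \frac{e^{2\pi i(m+\alpha n)T}-1}{2\pi i(m+\alpha n)} = \hat f_{00} \quad \text{if } \alpha \text{ irrational.} \qquad (3.83)$$

Here the prime on the sum indicates that the (m, n) = (0, 0) term is omitted. When α is irrational, m + αn ≠ 0 for every other pair (m, n), so each remaining term stays bounded as T → ∞ and is suppressed by the prefactor 1/T. Clearly,

$$\big\langle f(q,p)\big\rangle_S = \int\limits_0^1\! dq\int\limits_0^1\! dp\ f(q,p) = \hat f_{00} = \big\langle f(q,p)\big\rangle_t\ , \qquad (3.84)$$

so the system is ergodic.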
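Eqns. 3.83 and 3.84 can be illustrated with a single Fourier mode. The sketch below (the test function, initial point, and averaging time are illustrative choices) time-averages f(q, p) = cos 2π(q − p), whose phase space average is f̂₀₀ = 0, along the flow (3.79). For irrational α = √2 the time average comes out close to zero, as ergodicity requires. For the rational value α = 1 the combination q − p is conserved, the term with m + αn = 0 survives in eqn. 3.83, and the time average retains a memory of the initial condition.

```python
import math


def time_average(f, q0, p0, alpha, T=2000.0, dt=0.01):
    """Midpoint-rule estimate of (1/T) * integral_0^T f(q(t), p(t)) dt along the
    torus flow q(t) = q0 + t, p(t) = p0 + alpha t (mod 1), eqn. 3.79."""
    n = int(round(T / dt))
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        total += f((q0 + t) % 1.0, (p0 + alpha * t) % 1.0)
    return total / n


def f(q, p):
    """A single Fourier mode with (m, n) = (1, -1); its phase space average f_00 is 0."""
    return math.cos(2 * math.pi * (q - p))


q0, p0 = 0.1, 0.3
avg_irrational = time_average(f, q0, p0, alpha=math.sqrt(2))
avg_rational = time_average(f, q0, p0, alpha=1.0)
print(avg_irrational)                                   # close to 0 = <f>_S
print(avg_rational, math.cos(2 * math.pi * (q0 - p0)))  # equal, and not 0
```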



Figure 3.8: The baker’s transformation is a successive stretching, cutting, and restacking.

The situation is depicted in fig. 3.7. If we start with the characteristic function of a disc,

$$\varrho(q,p,t=0) = \Theta\big(a^2 - (q-q_0)^2 - (p-p_0)^2\big)\ , \qquad (3.85)$$

then it remains the characteristic function of a disc:

$$\varrho(q,p,t) = \Theta\big(a^2 - (q-q_0-t)^2 - (p-p_0-\alpha t)^2\big)\ . \qquad (3.86)$$

For an example of a transition to ergodicity in a simple dynamical Hamiltonian model, see §3.9.
A stronger condition one could impose is the following. Let A and B be subsets of S_E. Define the measure

$$\nu(A) = \int\! d\Sigma_E\ \chi_A(\phi) \bigg/ \int\! d\Sigma_E = \frac{D_A(E)}{D(E)}\ , \qquad (3.87)$$

where χ_A(ϕ) is the characteristic function of A. The measure of a set A is the fraction of the energy surface S_E covered by A. This means ν(S_E) = 1, since S_E is the entire phase space at energy E. Now let g be a volume-preserving map on phase space. Given two measurable sets A and B, we say that a system is mixing if

$$\text{mixing} \quad\Longleftrightarrow\quad \lim_{n\to\infty} \nu\big(g^n A \cap B\big) = \nu(A)\,\nu(B)\ . \qquad (3.88)$$

In other words, the fraction of B covered by the nth iterate of A, i.e. gⁿA, is, as n → ∞, simply the fraction of S_E covered by A. The iterated map gⁿ distorts the region A so severely that it eventually spreads out ‘evenly’ over the entire energy hypersurface. Of course by ‘evenly’ we mean ‘with respect to any finite length scale’, because at the very smallest scales, the phase space density is still locally constant as one evolves with the dynamics.

Figure 3.9: The multiply iterated baker’s transformation. The set A covers half the phase space and its area is preserved under the map. Initially, the fraction of B covered by A is zero. After many iterations, the fraction of B covered by gⁿA approaches ½.

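Mixing means that

$$\big\langle f(\phi)\big\rangle = \int\! d\mu\ \varrho(\phi,t)\, f(\phi) \ \xrightarrow[t\to\infty]{}\ \int\! d\mu\ f(\phi)\,\delta\big(E-H(\phi)\big) \bigg/ \int\! d\mu\ \delta\big(E-H(\phi)\big) \equiv \mathrm{Tr}\Big[f(\phi)\,\delta\big(E-H(\phi)\big)\Big] \Big/ \mathrm{Tr}\Big[\delta\big(E-H(\phi)\big)\Big]\ . \qquad (3.89)$$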

Physically, we can imagine regions of phase space being successively stretched and folded. During the
stretching process, the volume is preserved, so the successive stretch and fold operations map phase space
back onto itself.
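An example of a mixing system is the baker’s transformation, depicted in fig. 3.8. The baker map is defined by

$$g(q,p) = \begin{cases} \big(2q\,,\,\tfrac{1}{2}p\big) & \text{if } 0 \le q < \tfrac{1}{2} \\[4pt] \big(2q-1\,,\,\tfrac{1}{2}p + \tfrac{1}{2}\big) & \text{if } \tfrac{1}{2} \le q < 1\ . \end{cases} \qquad (3.90)$$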

Note that g is invertible and volume-preserving. The baker’s transformation consists of an initial stretch in which q is expanded by a factor of two and p is contracted by a factor of two, which preserves the total volume. The system is then mapped back onto the original area by cutting and restacking, which we can call a ‘fold’. The inverse transformation is accomplished by stretching first in the vertical (p) direction and squashing in the horizontal (q) direction, followed by a slicing and restacking. Explicitly,

$$g^{-1}(q,p) = \begin{cases} \big(\tfrac{1}{2}q\,,\,2p\big) & \text{if } 0 \le p < \tfrac{1}{2} \\[4pt] \big(\tfrac{1}{2}q + \tfrac{1}{2}\,,\,2p-1\big) & \text{if } \tfrac{1}{2} \le p < 1\ . \end{cases} \qquad (3.91)$$

Figure 3.10: The Arnold cat map applied to an image of 150 × 150 pixels. After 300 iterations, the image repeats itself. (Source: Wikipedia)
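The baker’s map (3.90) and its inverse (3.91) are easy to code, and the mixing property (3.88) can be estimated by Monte Carlo. In the sketch below, the sets A = {q < ½} and B = [0.6, 0.9] × [0, 0.3] and the sample size are illustrative choices. Since g preserves area, ν(gⁿA ∩ B) = ν(A ∩ g⁻ⁿB), so sample points of B are pulled back with g⁻¹ and we ask whether they land in A. The covered fraction starts at zero, fluctuates for small n, and approaches ν(A) = ½.

```python
import random


def baker(q, p):
    """Baker's map g, eqn. 3.90."""
    if q < 0.5:
        return 2 * q, 0.5 * p
    return 2 * q - 1, 0.5 * p + 0.5


def baker_inverse(q, p):
    """Inverse baker's map g^{-1}, eqn. 3.91."""
    if p < 0.5:
        return 0.5 * q, 2 * p
    return 0.5 * q + 0.5, 2 * p - 1


def covered_fraction(samples, n):
    """Fraction of the sample points x in B with g^{-n}(x) in A = {q < 1/2},
    i.e. a Monte Carlo estimate of nu(g^n A intersect B) / nu(B)."""
    hits = 0
    for q, p in samples:
        for _ in range(n):
            q, p = baker_inverse(q, p)
        hits += q < 0.5
    return hits / len(samples)


rng = random.Random(42)
# B = [0.6, 0.9] x [0, 0.3] is disjoint from A, so the fraction starts at 0.
B_samples = [(rng.uniform(0.6, 0.9), rng.uniform(0.0, 0.3)) for _ in range(20000)]
for n in (0, 1, 2, 3, 5, 10, 20):
    print(n, covered_fraction(B_samples, n))   # tends to nu(A) = 1/2 as n grows
```

In floating point, each application of g⁻¹ doubles p and discards one binary digit, so only about 50 iterations carry meaningful information.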

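Another example of a mixing system is Arnold’s ‘cat map’⁹

$$g(q,p) = \Big( [q+p]\,,\,[q+2p] \Big)\ , \qquad (3.92)$$

where [x] denotes the fractional part of x. One can write this in matrix form as

$$\begin{pmatrix} q' \\ p' \end{pmatrix} = \overbrace{\begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}}^{M} \begin{pmatrix} q \\ p \end{pmatrix} \mod \mathbb{Z}^2\ . \qquad (3.93)$$

The matrix M is very special because it has integer entries and its determinant is det M = 1. This means that the inverse also has integer entries. The inverse transformation is then

$$\begin{pmatrix} q \\ p \end{pmatrix} = \overbrace{\begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}}^{M^{-1}} \begin{pmatrix} q' \\ p' \end{pmatrix} \mod \mathbb{Z}^2\ . \qquad (3.94)$$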

⁹ The cat map gets its name from its initial application, by Arnold, to the image of a cat’s face.

Figure 3.11: The hierarchy of dynamical systems.

Now for something cool. Suppose that our image consists of a set of discrete points located at (n₁/k, n₂/k), where the denominator k ∈ ℤ is fixed, and where n₁ and n₂ range over the set {1, …, k}. Clearly g and its inverse preserve this set, since the entries of M and M⁻¹ are integers. If there are two possibilities for each pixel (say off and on, or black and white), then there are $2^{k^2}$ possible images, and the cat map
will map us invertibly from one image to another. Therefore it must exhibit Poincaré recurrence! This
phenomenon is demonstrated vividly in fig. 3.10, which shows a k = 150 pixel (square) image of a cat
subjected to the iterated cat map. The image is stretched and folded with each successive application of
the cat map, but after 300 iterations the image is restored! How can this be if the cat map is mixing? The
point is that only the discrete set of points (n₁/k, n₂/k) is periodic. Points with different denominators
will exhibit a different periodicity, and points with irrational coordinates will in general never return
to their exact initial conditions, although recurrence says they will come arbitrarily close, given enough
iterations. The baker’s transformation is also different in this respect, since the denominator of the p
coordinate is doubled upon each successive iteration.
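The 300-iteration recurrence of fig. 3.10 can be reproduced without the image itself. The sketch below (standard-library Python; the function names are hypothetical) computes the smallest m with Mᵐ ≡ 1 (mod k), which is the recurrence time of every k × k image, and cross-checks it against brute-force iteration of a small image whose pixels carry distinct labels. For k = 150 it returns 300, in agreement with the figure.

```python
def cat_map_period(k):
    """Smallest m >= 1 with M^m = 1 (mod k), where M = [[1, 1], [1, 2]] (eqn. 3.93).

    Every pixel (n1/k, n2/k) is an integer combination of (1/k, 0) and (0, 1/k),
    so this is the number of iterations after which any k x k image recurs.
    """
    a, b, c, d = 1 % k, 1 % k, 1 % k, 2 % k    # entries of M^m, reduced mod k
    identity = (1 % k, 0, 0, 1 % k)
    m = 1
    while (a, b, c, d) != identity:
        # M^(m+1) = M @ M^m with M = [[1, 1], [1, 2]]
        a, b, c, d = (a + c) % k, (b + d) % k, (a + 2 * c) % k, (b + 2 * d) % k
        m += 1
    return m


def image_period(k):
    """Brute force: move the labels of a k x k 'image' with distinct pixel labels
    under the cat map until the original image reappears."""
    original = {(n1, n2): n1 * k + n2 for n1 in range(k) for n2 in range(k)}
    image, m = dict(original), 0
    while True:
        image = {((n1 + n2) % k, (n1 + 2 * n2) % k): label
                 for (n1, n2), label in image.items()}
        m += 1
        if image == original:
            return m


print(cat_map_period(150))   # 300, the recurrence seen in fig. 3.10
print([image_period(k) for k in range(2, 8)] == [cat_map_period(k) for k in range(2, 8)])
```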
The student should now contemplate the hierarchy of dynamical systems depicted in fig. 3.11, understanding the characteristic features of each successive refinement¹⁰.

3.6 Thermalization of Quantum Systems

3.6.1 Quantum dephasing

Thermalization of quantum systems is fundamentally different from that of classical systems. Whereas time evolution in classical mechanics is in general described by a nonlinear dynamical system, the Schrödinger equation for time evolution in quantum mechanics is linear:

$$i\hbar\,\frac{\partial\Psi}{\partial t} = \hat H\Psi\ , \qquad (3.95)$$

where Ĥ is a many-body Hamiltonian. In classical mechanics, the thermal state is constructed by time evolution: this is the content of the ergodic theorem. In quantum mechanics, as we shall see, the thermal distribution must be encoded in the eigenstates themselves.
¹⁰ There is something beyond mixing, called a K-system. A K-system has positive Kolmogorov-Sinai entropy. For such a system, closed orbits separate exponentially in time, and consequently the Liouvillian L̂ has a Lebesgue spectrum with denumerably infinite multiplicity.

Let us assume an initial condition at t = 0,

$$|\Psi(0)\rangle = \sum_\alpha C_\alpha\,|\Psi_\alpha\rangle\ , \qquad (3.96)$$

where {|Ψ_α⟩} is an orthonormal eigenbasis for Ĥ satisfying Ĥ|Ψ_α⟩ = E_α|Ψ_α⟩. The expansion coefficients are C_α = ⟨Ψ_α|Ψ(0)⟩, and normalization requires

$$\langle\Psi(0)|\Psi(0)\rangle = \sum_\alpha |C_\alpha|^2 = 1\ . \qquad (3.97)$$

The time evolution of |Ψ⟩ is then given by

$$|\Psi(t)\rangle = \sum_\alpha C_\alpha\, e^{-iE_\alpha t/\hbar}\,|\Psi_\alpha\rangle\ . \qquad (3.98)$$

The energy is distributed according to the time-independent function

$$P(E) = \langle\Psi(t)|\,\delta(E-\hat H)\,|\Psi(t)\rangle = \sum_\alpha |C_\alpha|^2\,\delta(E-E_\alpha)\ . \qquad (3.99)$$

Thus, the average energy is time-independent and is given by

$$\langle E\rangle = \langle\Psi(t)|\hat H|\Psi(t)\rangle = \int\limits_{-\infty}^{\infty}\! dE\ P(E)\,E = \sum_\alpha |C_\alpha|^2\,E_\alpha\ . \qquad (3.100)$$

The root mean square fluctuations of the energy are given by

$$(\Delta E)_{\rm rms} = \Big\langle \big(E - \langle E\rangle\big)^2 \Big\rangle^{1/2} = \sqrt{\sum_\alpha |C_\alpha|^2 E_\alpha^2 - \Big(\sum_\alpha |C_\alpha|^2 E_\alpha\Big)^2}\ . \qquad (3.101)$$

Typically we assume that the distribution P(E) is narrowly peaked about ⟨E⟩, such that (ΔE)_rms ≪ ⟨E⟩ − E₀, where E₀ is the ground state energy. Note that P(E) = 0 for E < E₀, i.e. the eigenspectrum of Ĥ is bounded from below.
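Now consider a general quantum observable described by an operator A. We have

$$\langle A(t)\rangle = \langle\Psi(t)|A|\Psi(t)\rangle = \sum_{\alpha,\beta} C_\alpha^*\, C_\beta\, e^{i(E_\alpha - E_\beta)t/\hbar}\, A_{\alpha\beta}\ , \qquad (3.102)$$

where A_{αβ} = ⟨Ψ_α|A|Ψ_β⟩. In the limit of large times, and assuming the spectrum is nondegenerate (E_α ≠ E_β for α ≠ β), the oscillating off-diagonal terms average to zero, and we have

$$\langle A\rangle_t \equiv \lim_{T\to\infty}\frac{1}{T}\int\limits_0^T\! dt\ \langle A(t)\rangle = \sum_\alpha |C_\alpha|^2\, A_{\alpha\alpha}\ . \qquad (3.103)$$

Note that this implies that all coherence between different eigenstates is lost in the long time limit, due to dephasing.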
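Dephasing is easy to see numerically. The sketch below uses a toy four-level spectrum, random coefficients C_α, and a random Hermitian observable (all illustrative assumptions, not taken from the text), with ħ = 1. It approximates the long-time average of eqn. 3.102 by a Riemann sum over a long window and compares it with the diagonal sum (3.103). The residual difference comes from the off-diagonal terms, which are suppressed roughly like 1/(T |E_α − E_β|).

```python
import cmath
import math
import random

rng = random.Random(7)
energies = [0.0, 0.7, 1.9, 3.2]      # nondegenerate toy spectrum E_alpha (hbar = 1)
d = len(energies)

# Random normalized expansion coefficients C_alpha, eqn. 3.97.
C = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(d)]
norm = math.sqrt(sum(abs(c) ** 2 for c in C))
C = [c / norm for c in C]

# A random Hermitian observable in the energy eigenbasis: A[b][a] = conj(A[a][b]).
A = [[0j] * d for _ in range(d)]
for a in range(d):
    A[a][a] = complex(rng.uniform(-1, 1))
    for b in range(a + 1, d):
        A[a][b] = complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
        A[b][a] = A[a][b].conjugate()

# <A(t)> = sum_{a,b} C_a^* C_b exp(i (E_a - E_b) t) A_ab, eqn. 3.102.
terms = [(C[a].conjugate() * C[b] * A[a][b], energies[a] - energies[b])
         for a in range(d) for b in range(d)]


def expectation(t):
    return sum(w * cmath.exp(1j * omega * t) for w, omega in terms).real


T, n = 20000.0, 100000
time_avg = sum(expectation((k + 0.5) * T / n) for k in range(n)) / n
diagonal = sum(abs(C[a]) ** 2 * A[a][a].real for a in range(d))   # eqn. 3.103
print(time_avg, diagonal)   # agree up to corrections of order 1/(T * smallest gap)
```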

3.6.2 Eigenstate thermalization hypothesis

The essential ideas behind the eigenstate thermalization hypothesis (ETH) were described independently by J. Deutsch (1991) and by M. Srednicki (1994). The argument goes as follows. If the total energy is the only conserved quantity, and if A is a local, translationally-invariant, few-body operator, then the time average ⟨A⟩ is given by its microcanonical value,

$$\langle A\rangle_t = \sum_\alpha |C_\alpha|^2 A_{\alpha\alpha} = \frac{\sum_\alpha A_{\alpha\alpha}\,\Theta(E_\alpha \in I)}{\sum_\alpha \Theta(E_\alpha \in I)} \equiv \langle A\rangle_E\ , \qquad (3.104)$$

where I = [E, E + ΔE] is an energy interval of width ΔE. So once again, time averages are microcanonical averages.
But how is it that this is the case? The hypothesis of Deutsch and of Srednicki is that thermalization in isolated and bounded quantum systems occurs at the level of individual eigenstates. That is, for all eigenstates |Ψ_α⟩ with E_α ∈ I, one has

$$A_{\alpha\alpha} = \langle A\rangle_{E_\alpha}\ . \qquad (3.105)$$

This means that thermal information is encoded in each eigenstate. This is called the eigenstate thermalization hypothesis (ETH).
An equivalent version of the ETH is the following scenario. Suppose we have an infinite or extremely large quantum system U (the ‘universe’) fixed in an eigenstate |Ψ_α⟩. Then form the projection operator P_α = |Ψ_α⟩⟨Ψ_α|. Projection operators satisfy P² = P, and the eigenspectrum of P_α consists of one eigenvalue 1, with all the remaining eigenvalues zero¹¹. Now consider a partition U = W ∪ S, where W ≫ S. We imagine S to be the ‘system’ and W the ‘world’. We can always decompose the state |Ψ_α⟩ in a complete product basis for W and S, viz.

$$|\Psi_\alpha\rangle = \sum_{p=1}^{N_W}\sum_{j=1}^{N_S} Q^\alpha_{pj}\, |\psi^W_p\rangle \otimes |\psi^S_j\rangle\ . \qquad (3.106)$$

Here N_{W/S} is the size of the basis for W/S. The reduced density matrix for S is defined as

$$\rho_S = \mathop{\mathrm{Tr}}_W P_\alpha = \sum_{j,j'=1}^{N_S}\left(\sum_{p=1}^{N_W} Q^\alpha_{pj}\, Q^{\alpha*}_{pj'}\right) |\psi^S_j\rangle\langle\psi^S_{j'}|\ . \qquad (3.107)$$

The claim is that ρ_S approximates a thermal density matrix on S, i.e.

$$\rho_S \approx \frac{1}{Z_S}\, e^{-\beta\hat H_S}\ , \qquad (3.108)$$

where Ĥ_S is some Hamiltonian on S, and $Z_S = \mathrm{Tr}\, e^{-\beta\hat H_S}$, so that Tr ρ_S = 1 and ρ_S is properly normalized.
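Eqn. 3.107 is a partial trace, and it can be sketched directly from the amplitudes $Q^\alpha_{pj}$. The code below uses a random normalized state in a product space of illustrative dimensions N_W = 64 and N_S = 4 (assumptions, not values from the text). This state is not an eigenstate of any particular Hamiltonian, so the sketch only illustrates the bookkeeping of eqn. 3.107 and the properties Tr ρ_S = 1 and ρ_S = ρ_S†, not the thermal form (3.108).

```python
import math
import random


def reduced_density_matrix(Q):
    """Partial trace over W, eqn. 3.107: rho_S[j][jp] = sum_p Q[p][j] * conj(Q[p][jp]).

    Q[p][j] holds the amplitude of |psi_p^W> (x) |psi_j^S> in the state |Psi_alpha>.
    """
    NW, NS = len(Q), len(Q[0])
    return [[sum(Q[p][j] * Q[p][jp].conjugate() for p in range(NW)) for jp in range(NS)]
            for j in range(NS)]


rng = random.Random(3)
NW, NS = 64, 4   # illustrative basis sizes with N_W >> N_S
Q = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(NS)] for _ in range(NW)]
norm = math.sqrt(sum(abs(z) ** 2 for row in Q for z in row))
Q = [[z / norm for z in row] for row in Q]   # normalize so that <Psi|Psi> = 1

rho = reduced_density_matrix(Q)
print(sum(rho[j][j] for j in range(NS)).real)   # Tr rho_S = 1, up to rounding
print(all(abs(rho[j][jp] - rho[jp][j].conjugate()) < 1e-12
          for j in range(NS) for jp in range(NS)))  # rho_S is Hermitian
```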
¹¹ More generally, we could project onto a K-dimensional subspace, in which case there would be K eigenvalues of +1 and N − K eigenvalues of 0, where N is the dimension of the entire vector space.

A number of issues remain to be clarified:

(i) What do we mean by “approximates”?

(ii) What do we mean by Ĥ_S?

(iii) What do we mean by the temperature T?

We address these in reverse order. The temperature T of an eigenstate |Ψ_α⟩ of a Hamiltonian Ĥ is defined by setting its energy density E_α/V_U to the thermal energy density, i.e.

$$\frac{E_\alpha}{V_U} = \frac{1}{V_U}\,\frac{\mathrm{Tr}\,\hat H\, e^{-\beta\hat H}}{\mathrm{Tr}\, e^{-\beta\hat H}}\ . \qquad (3.109)$$

Here, $\hat H = \hat H_U$ is the full Hamiltonian of the universe $U = W \cup S$. Our intuition is that $\hat H_S$ should reflect a restriction of the original Hamiltonian $\hat H_U$ to the system $S$. What should be done, though, about the interface parts of $\hat H_U$ which link $S$ and $W$? For lattice Hamiltonians, we can simply, but somewhat arbitrarily, cut all the bonds coupling $S$ and $W$. But we could easily imagine some other prescription, such as halving the coupling strength along all such interface bonds. Indeed, the definition of $\hat H_S$ is somewhat arbitrary. However, so long as we use $\rho_S$ to compute averages of local operators which lie sufficiently far from the boundary of $S$, the precise details of how we truncate $\hat H_U$ to $\hat H_S$ are unimportant. This brings us to the first issue: the approximation of $\rho_S$ by its Gibbs form in eqn. 3.108 is only valid when we consider averages of local operators lying within the bulk of $S$. This means that we must only examine operators whose support is confined to regions farther than some distance $\xi_T$ from $\partial S$, where $\xi_T$ is a thermal correlation length. This, in turn, requires that $L_S \gg \xi_T$, i.e. the linear size $L_S$ of the region $S$ must be very large on the scale of $\xi_T$. How do we define $\xi_T$? For a model such as the Ising model, it can be taken to be the usual correlation length obtained from the spin-spin correlation function $\langle \sigma_{\mathbf r}\, \sigma_{\mathbf r'} \rangle_T$. More generally, we may choose the largest correlation length from among the correlators of all the independent local operators in our system. Stated quantitatively, the requirement is that $\exp\!\big(-d_\partial(\mathbf r)/\xi_T\big) \ll 1$, where $d_\partial(\mathbf r)$ is the shortest distance from the location of our local operator $\mathcal O_{\mathbf r}$ to the boundary of $S$. At criticality, the exponential is replaced by a power law $\big(d_\partial(\mathbf r)/\xi_T\big)^{-p}$, where $p$ is a critical exponent. Another implicit assumption here is that $V_S \ll V_W$.
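Eqn. 3.109 fixes $\beta$ only implicitly. The sketch below solves it by bisection for a deliberately simple, hypothetical model: decoupled two-level units with levels $\pm h$ per site and $k_B = 1$, whose thermal energy per site is $-h\tanh(\beta h)$. It is meant only to illustrate the matching step; for a genuinely interacting $\hat H_U$ the right-hand side of eqn. 3.109 would itself have to be computed numerically.

```python
import math

def energy_per_site(beta, h=1.0):
    """Thermal energy per site of decoupled two-level units with levels -h and +h (assumed model)."""
    return -h * math.tanh(beta * h)

def beta_from_energy_density(e, h=1.0, beta_max=1e3, tol=1e-12):
    """Solve energy_per_site(beta) = e for beta by bisection.

    Assumes -h < e < 0, so that a positive temperature exists; the energy
    per site decreases monotonically as beta increases.
    """
    lo, hi = 0.0, beta_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if energy_per_site(mid, h) > e:
            lo = mid          # still too hot: increase beta
        else:
            hi = mid
    return 0.5 * (lo + hi)

beta = beta_from_energy_density(-0.5)
print(beta, 1.0 / beta)   # beta = artanh(0.5) ≈ 0.5493, i.e. T ≈ 1.82 in units of h
```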

3.6.3 When is the ETH true?

There is no rigorous proof of the ETH. Deutsch showed that the ETH holds for the case of an integrable Hamiltonian weakly perturbed by a single Gaussian random matrix. Horoi et al. (1995) showed that nuclear shell model wavefunctions reproduce thermodynamic predictions. Recent numerical work by M. Rigol and collaborators has verified the applicability of the ETH in small interacting boson systems. ETH fails for so-called integrable models, which possess a large number of conserved quantities that commute with the Hamiltonian. Integrable models are, however, quite special, and as Deutsch showed, integrability is spoiled by weak perturbations, in which case ETH applies.
ETH also fails in the case of noninteracting disordered systems which exhibit Anderson localization. Single particle energy eigenstates $\psi_j$ whose energies $\varepsilon_j$ lie in the localized portion of the eigenspectrum decay exponentially, as $|\psi_j(\mathbf r)|^2 \sim \exp\!\big(-|\mathbf r - \mathbf r_j|/\xi(\varepsilon_j)\big)$, where $\mathbf r_j$ is some position in space associated with $\psi_j$ and $\xi(\varepsilon_j)$ is the localization length. Within the localized portion of the spectrum, $\xi(\varepsilon)$ is finite.
152 CHAPTER 3. ERGODICITY AND THE APPROACH TO EQUILIBRIUM

As $\varepsilon$ approaches a mobility edge, $\xi(\varepsilon)$ diverges as a power law. In the delocalized regime, eigenstates are spatially extended and typically decay at worst as a power law.$^{12}$ Exponentially localized states
are unable to thermalize with other distantly removed localized states. Of course, all noninteracting
systems will violate ETH, because they are integrable. The interacting version of this phenomenon,
many-body localization (MBL), is a topic of intense current interest in condensed matter and statistical
physics. MBL systems also exhibit a large number of conserved quantities, but in contrast to the case of
integrable systems, where each conserved quantity is in general expressed in terms of an integral of a local
density, in MBL systems the conserved quantities are themselves local, although emergent. The emergent
nature of locally conserved quantities in MBL systems means that they are not simply expressed in terms
of the original local operators of the system, but rather are arrived at via a sequence of local unitary
transformations.
Note again that, in contrast to the classical case, time evolution of a quantum state does not create the thermal state. Rather, it reveals the thermal distribution which is encoded in all eigenstates, after sufficient time has elapsed for dephasing to occur, so that all correlations between the wavefunction expansion coefficients $C_\alpha$ and $C_{\alpha'}$ for $\alpha \ne \alpha'$ are lost.

3.7 Appendix I : Formal Solution of the Master Equation

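Recall the master equation $\dot P_i = -\Gamma_{ij} P_j$ (repeated indices are summed). The matrix $\Gamma_{ij}$ is real but not necessarily symmetric. For such a matrix, the left eigenvectors $\phi^\alpha_i$ and the right eigenvectors $\psi^\beta_j$ are in general different:

$$ \phi^\alpha_i\, \Gamma_{ij} = \lambda_\alpha\, \phi^\alpha_j \,, \qquad \Gamma_{ij}\, \psi^\beta_j = \lambda_\beta\, \psi^\beta_i \,. \tag{3.110} $$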

Note that the eigenvalue equation for the right eigenvectors is $\Gamma\psi = \lambda\psi$ while that for the left eigenvectors is $\Gamma^{\rm t}\phi = \lambda\phi$. The characteristic polynomial is the same in both cases:

$$ F(\lambda) \equiv \det(\lambda - \Gamma) = \det(\lambda - \Gamma^{\rm t}) \,, \tag{3.111} $$

which means that the left and right eigenvalues are the same. Note also that $\big[F(\lambda)\big]^* = F(\lambda^*)$, hence the eigenvalues are either real or appear in complex conjugate pairs. Multiplying the eigenvector equation for $\phi^\alpha$ on the right by $\psi^\beta_j$ and summing over $j$, multiplying the eigenvector equation for $\psi^\beta$ on the left by $\phi^\alpha_i$ and summing over $i$, and subtracting the two results yields

$$ \big(\lambda_\alpha - \lambda_\beta\big)\, \langle \phi^\alpha \,|\, \psi^\beta \rangle = 0 \,, \tag{3.112} $$

where the inner product is

$$ \langle \phi \,|\, \psi \rangle = \sum_i \phi_i\, \psi_i \,. \tag{3.113} $$

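Whenever $\lambda_\alpha \ne \lambda_\beta$, eqn. 3.112 forces $\langle \phi^\alpha | \psi^\beta \rangle = 0$. Assuming a nondegenerate spectrum, we can then normalize the eigenvectors so that

$$ \langle \phi^\alpha \,|\, \psi^\beta \rangle = \delta_{\alpha\beta} \,, \tag{3.114} $$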
$^{12}$ Recall that in systems with no disorder, eigenstates exhibit Bloch periodicity in space.
3.8. APPENDIX II : RADIOACTIVE DECAY 153

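in which case we can write

$$ \Gamma = \sum_\alpha \lambda_\alpha\, |\psi^\alpha\rangle\langle\phi^\alpha| \quad\Longleftrightarrow\quad \Gamma_{ij} = \sum_\alpha \lambda_\alpha\, \psi^\alpha_i\, \phi^\alpha_j \,. \tag{3.115} $$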

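We have seen that $\vec\phi = (1, 1, \ldots, 1)$ is a left eigenvector with eigenvalue $\lambda = 0$, since $\sum_i \Gamma_{ij} = 0$. We do not know a priori the corresponding right eigenvector, which depends on other details of $\Gamma_{ij}$. Now let's expand $P_i(t)$ in the right eigenvectors of $\Gamma$, writing

$$ P_i(t) = \sum_\alpha C_\alpha(t)\, \psi^\alpha_i \,. \tag{3.116} $$

Then

$$ \frac{dP_i}{dt} = \sum_\alpha \frac{dC_\alpha}{dt}\, \psi^\alpha_i = -\Gamma_{ij} P_j = -\sum_\alpha C_\alpha\, \Gamma_{ij}\, \psi^\alpha_j = -\sum_\alpha \lambda_\alpha\, C_\alpha\, \psi^\alpha_i \,. \tag{3.117} $$

This allows us to write

$$ \frac{dC_\alpha}{dt} = -\lambda_\alpha\, C_\alpha \quad\Longrightarrow\quad C_\alpha(t) = C_\alpha(0)\, e^{-\lambda_\alpha t} \,. \tag{3.118} $$

Hence, we can write

$$ P_i(t) = \sum_\alpha C_\alpha(0)\, e^{-\lambda_\alpha t}\, \psi^\alpha_i \,. \tag{3.119} $$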

It is now easy to see that $\mathrm{Re}(\lambda_\alpha) \ge 0$ for all $\alpha$, or else the probabilities will become negative. For suppose $\mathrm{Re}(\lambda_\alpha) < 0$ for some $\alpha$. Then as $t \to \infty$, the sum in eqn. 3.119 will be dominated by the term for which $\lambda_\alpha$ has the most negative real part; all other contributions will be subleading. But we must have $\sum_i \psi^\alpha_i = 0$, since $\psi^\alpha$ must be orthogonal to the left eigenvector $\vec\phi^{\,\alpha=0} = (1, 1, \ldots, 1)$. Therefore, at least one component of $\psi^\alpha_i$ (i.e. for some value of $i$) must have a negative real part, which means a negative probability!$^{13}$ As we have already proven that an initial nonnegative distribution $\{P_i(t=0)\}$ will remain nonnegative under the evolution of the master equation, we conclude that $\mathrm{Re}(\lambda_\alpha) \ge 0$ for all $\alpha$, and that $P_i(t) \to P_i^{\rm eq}$ as $t \to \infty$, relaxing to the $\lambda = 0$ right eigenvector.
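The conclusion can be illustrated numerically. In the sketch below the rates among three states are arbitrary, non-symmetric, hypothetical values, and we assume the convention that `W[i][j]` is the rate for $j \to i$, with $\Gamma_{ij} = -W_{ij}$ for $i \ne j$ and $\Gamma_{jj} = \sum_{k \ne j} W_{kj}$, so that $\sum_i \Gamma_{ij} = 0$. A fourth-order Runge-Kutta integration of $\dot P_i = -\Gamma_{ij} P_j$ (our choice of integrator) conserves $\sum_i P_i$ and relaxes to a vector annihilated by $\Gamma$, i.e. the $\lambda = 0$ right eigenvector.

```python
def gamma_matrix(W):
    """Gamma_ij = -W_ij for i != j and Gamma_jj = sum_{k != j} W_kj.

    W[i][j] is the (assumed) rate for the transition j -> i; each column of Gamma sums to zero.
    """
    n = len(W)
    G = [[0.0 if i == j else -W[i][j] for j in range(n)] for i in range(n)]
    for j in range(n):
        G[j][j] = sum(W[k][j] for k in range(n) if k != j)
    return G

def relax(P, G, t, steps=6000):
    """Integrate dP_i/dt = -sum_j Gamma_ij P_j from 0 to t with fourth-order Runge-Kutta."""
    n, h = len(P), t / steps

    def rhs(Q):
        return [-sum(G[i][j] * Q[j] for j in range(n)) for i in range(n)]

    for _ in range(steps):
        k1 = rhs(P)
        k2 = rhs([P[i] + 0.5 * h * k1[i] for i in range(n)])
        k3 = rhs([P[i] + 0.5 * h * k2[i] for i in range(n)])
        k4 = rhs([P[i] + h * k3[i] for i in range(n)])
        P = [P[i] + h * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) / 6 for i in range(n)]
    return P

# Hypothetical non-symmetric rates among three states.
W = [[0.0, 1.0, 0.2],
     [0.5, 0.0, 2.0],
     [0.3, 0.7, 0.0]]
G = gamma_matrix(W)
P_final = relax([1.0, 0.0, 0.0], G, t=30.0)
residual = max(abs(sum(G[i][j] * P_final[j] for j in range(3))) for i in range(3))
print(sum(P_final), residual)   # total probability ~1, residual ~0 (stationary)
```

With these rates the nonzero eigenvalues of $\Gamma$ are real and positive (about 1.56 and 3.14), so by $t = 30$ every transient mode in eqn. 3.119 has decayed.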

3.8 Appendix II : Radioactive Decay

Consider a group of atoms, some of which are in an excited state which can undergo nuclear decay. Let $P_n(t)$ be the probability that $n$ atoms are excited at some time $t$. We then model the decay dynamics by

$$ W_{mn} = \begin{cases} 0 & \text{if } m \ge n \\ n\gamma & \text{if } m = n - 1 \\ 0 & \text{if } m < n - 1 \,. \end{cases} \tag{3.120} $$
$^{13}$ Since the probability $P_i(t)$ is real, if the eigenvalue with the most negative real part is complex, there will be a corresponding complex conjugate eigenvalue, and summing over all eigenvectors will result in a real value for $P_i(t)$.

Here, $\gamma$ is the decay rate of an individual atom, which can be determined from quantum mechanics. The master equation then tells us

$$ \frac{dP_n}{dt} = (n+1)\,\gamma\, P_{n+1} - n\,\gamma\, P_n \,. \tag{3.121} $$

The interpretation here is as follows: let $|n\rangle$ denote a state in which $n$ atoms are excited, so that $P_n(t) = \big|\langle \psi(t) | n \rangle\big|^2$. Then $P_n(t)$ will increase due to spontaneous transitions from $|n+1\rangle$ to $|n\rangle$, and will decrease due to spontaneous transitions from $|n\rangle$ to $|n-1\rangle$.
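The average number of excited atoms in the system is

$$ N(t) = \sum_{n=0}^\infty n\, P_n(t) \,. \tag{3.122} $$

Note that

$$ \begin{aligned} \frac{dN}{dt} &= \sum_{n=0}^\infty n \Big[ (n+1)\,\gamma\, P_{n+1} - n\,\gamma\, P_n \Big] \\ &= \gamma \sum_{n=0}^\infty \Big[ n(n-1)\, P_n - n^2 P_n \Big] \\ &= -\gamma \sum_{n=0}^\infty n\, P_n = -\gamma N \,, \end{aligned} \tag{3.123} $$

where in the second line we shifted the summation index in the first term, $n + 1 \to n$. Thus,

$$ N(t) = N(0)\, e^{-\gamma t} \,. \tag{3.124} $$

The relaxation time is $\tau = \gamma^{-1}$, and the equilibrium distribution is

$$ P^{\rm eq}_n = \delta_{n,0} \,. \tag{3.125} $$

Note that this satisfies detailed balance.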


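We can go a bit farther here. Let us define

$$ P(z,t) \equiv \sum_{n=0}^\infty z^n\, P_n(t) \,. \tag{3.126} $$

This is sometimes called a generating function. Then

$$ \frac{\partial P}{\partial t} = \gamma \sum_{n=0}^\infty z^n \Big[ (n+1)\, P_{n+1} - n\, P_n \Big] = \gamma\, \frac{\partial P}{\partial z} - \gamma z\, \frac{\partial P}{\partial z} \,. \tag{3.127} $$

Thus,

$$ \frac{1}{\gamma}\, \frac{\partial P}{\partial t} - (1 - z)\, \frac{\partial P}{\partial z} = 0 \,. \tag{3.128} $$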
3.9. APPENDIX III: TRANSITION TO ERGODICITY IN A SIMPLE MODEL 155

We now see that any function $f(\xi)$ satisfies the above equation, where $\xi = \gamma t - \ln(1 - z)$. Thus, we can write

$$ P(z,t) = f\big(\gamma t - \ln(1 - z)\big) \,. \tag{3.129} $$

Setting $t = 0$ we have $P(z,0) = f\big(-\ln(1-z)\big)$, and inverting this result we obtain $f(u) = P(1 - e^{-u}, 0)$, i.e.

$$ P(z,t) = P\big(1 + (z-1)\, e^{-\gamma t},\, 0\big) \,. \tag{3.130} $$

The total probability is $P(z=1,t) = \sum_{n=0}^\infty P_n$, which clearly is conserved: $P(1,t) = P(1,0)$. The average number of excited atoms is

$$ N(t) = \sum_{n=0}^\infty n\, P_n(t) = \frac{\partial P}{\partial z}\bigg|_{z=1} = e^{-\gamma t}\, \frac{\partial P(z,0)}{\partial z}\bigg|_{z=1} = N(0)\, e^{-\gamma t} \,. \tag{3.131} $$
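For a sharp initial condition $P_n(0) = \delta_{n,N_0}$ we have $P(z,0) = z^{N_0}$, and eqn. 3.130 gives $P(z,t) = \big[(1 - e^{-\gamma t}) + z\, e^{-\gamma t}\big]^{N_0}$: a binomial distribution in which each atom independently survives with probability $e^{-\gamma t}$. The Python sketch below (illustrative parameter values; Runge-Kutta is our choice of integrator, and the function names are hypothetical) checks this against a direct integration of eqn. 3.121.

```python
import math

def binomial_solution(n0, gamma, t):
    """P_n(t) for P_n(0) = delta_{n,n0}: binomial with survival probability exp(-gamma t)."""
    q = math.exp(-gamma * t)
    return [math.comb(n0, n) * q**n * (1 - q) ** (n0 - n) for n in range(n0 + 1)]

def integrate_master(n0, gamma, t, steps=4000):
    """RK4 integration of dP_n/dt = (n+1) gamma P_{n+1} - n gamma P_n for n = 0..n0."""
    def rhs(P):
        return [((n + 1) * gamma * P[n + 1] if n < n0 else 0.0) - n * gamma * P[n]
                for n in range(n0 + 1)]

    P = [0.0] * n0 + [1.0]          # all n0 atoms excited at t = 0
    h = t / steps
    for _ in range(steps):
        k1 = rhs(P)
        k2 = rhs([p + 0.5 * h * k for p, k in zip(P, k1)])
        k3 = rhs([p + 0.5 * h * k for p, k in zip(P, k2)])
        k4 = rhs([p + h * k for p, k in zip(P, k3)])
        P = [p + h * (a + 2 * b + 2 * c + d) / 6 for p, a, b, c, d in zip(P, k1, k2, k3, k4)]
    return P

n0, gamma, t = 20, 1.0, 0.7
numeric = integrate_master(n0, gamma, t)
exact = binomial_solution(n0, gamma, t)
print(max(abs(a - b) for a, b in zip(numeric, exact)))      # tiny: integration error only
print(sum(n * p for n, p in enumerate(numeric)), n0 * math.exp(-gamma * t))  # eqn. 3.124
```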

3.9 Appendix III: Transition to Ergodicity in a Simple Model

A ball of mass $m$ executes perfect one-dimensional motion along the symmetry axis of a piston. Above the ball lies a mobile piston head of mass $M$ which slides frictionlessly inside the piston. Both the ball and piston head execute ballistic motion, with two types of collision possible: (i) the ball may bounce off the floor, which is assumed to be infinitely massive and fixed in space, and (ii) the ball and piston head may engage in a one-dimensional elastic collision. The Hamiltonian is

$$ H = \frac{P^2}{2M} + \frac{p^2}{2m} + MgX + mgx \,, $$

where $X$ is the height of the piston head and $x$ the height of the ball. Another quantity is conserved by the dynamics: $\Theta(X - x)$. That is, the ball always remains below the piston head.

(a) Choose an arbitrary length scale $L$, and then energy scale $E_0 = MgL$, momentum scale $P_0 = M\sqrt{gL}$, and time scale $\tau_0 = \sqrt{L/g}$. Show that the dimensionless Hamiltonian becomes

$$ \bar H = \tfrac{1}{2} \bar P^2 + \bar X + \frac{\bar p^2}{2r} + r \bar x \,, $$

with $r = m/M$, and with equations of motion $d\bar X/d\bar t = \partial \bar H/\partial \bar P$, etc. (Here the bar indicates dimensionless variables: $\bar P = P/P_0$, $\bar t = t/\tau_0$, etc.) What special dynamical consequences hold for $r = 1$?

(b) Compute the microcanonical average piston height $\langle X \rangle$. The analogous dynamical average is

$$ \langle X \rangle_t = \lim_{T\to\infty} \frac{1}{T} \int_0^T \! dt\; X(t) \,. $$

When computing microcanonical averages, it is helpful to use the Laplace transform, discussed toward the end of §3.3 of the notes. (It is possible to compute the microcanonical average by more brute force methods as well.)

(c) Compute the microcanonical average of the rate of collisions between the ball and the floor. Show that this is given by

$$ \sum_i \delta(t - t_i) = \Theta(v)\, v\, \delta(x - 0^+) \,. $$

The analogous dynamical average is

$$ \langle \gamma \rangle_t = \lim_{T\to\infty} \frac{1}{T} \int_0^T \! dt \sum_i \delta(t - t_i) \,, $$

where $\{t_i\}$ is the set of times at which the ball hits the floor.

(d) How do your results change if you do not enforce the dynamical constraint X ≥ x?

(e) Write a computer program to simulate this system. The only input should be the mass ratio r
(set Ē = 10 to fix the energy). You also may wish to input the initial conditions, or perhaps to
choose the initial conditions randomly (all satisfying energy conservation, of course!). Have your
program compute the microcanonical as well as dynamical averages in parts (b) and (c). Plot out
the Poincaré section of P vs. X for those times when the ball hits the floor. Investigate this for
several values of r. Just to show you that this is interesting, I’ve plotted some of my own numerical
results in fig. 3.12.

Solution:
(a) Once we choose a length scale $L$ (arbitrary), we may define $E_0 = MgL$, $P_0 = M\sqrt{gL}$, $V_0 = \sqrt{gL}$, and $\tau_0 = \sqrt{L/g}$ as energy, momentum, velocity, and time scales, respectively, and the result follows directly. Rather than write $\bar P = P/P_0$ etc., we will drop the bar notation and write

$$ H = \tfrac{1}{2} P^2 + X + \frac{p^2}{2r} + rx \,. $$

(b) What is missing from the Hamiltonian, of course, is the interaction potential between the ball and the piston head. We assume that both objects are impenetrable, so the potential energy is infinite when the two overlap. We further assume that the ball is a point particle (otherwise reset ground level to minus the diameter of the ball). We can eliminate the interaction potential from $H$ if we enforce that each time $X = x$ the ball and the piston head undergo an elastic collision. From energy and momentum conservation, it is easy to derive the elastic collision formulae

$$ P' = \frac{1-r}{1+r}\, P + \frac{2}{1+r}\, p \,, \qquad p' = \frac{2r}{1+r}\, P - \frac{1-r}{1+r}\, p \,. $$
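A quick numerical check of these formulae is sketched below (arbitrary illustrative values; `collide` and `kinetic` are hypothetical helper names). In the dimensionless units the piston has mass 1 and the ball mass $r$, so the map should conserve $P + p$ and $\frac{1}{2}P^2 + p^2/2r$, and should reduce to an exchange of momenta when $r = 1$.

```python
def collide(P, p, r):
    """Elastic piston/ball collision in dimensionless units (piston mass 1, ball mass r)."""
    P_new = ((1 - r) * P + 2 * p) / (1 + r)
    p_new = (2 * r * P - (1 - r) * p) / (1 + r)
    return P_new, p_new

def kinetic(P, p, r):
    """Kinetic energy P^2/2 + p^2/2r."""
    return 0.5 * P**2 + p**2 / (2 * r)

P, p, r = 0.7, 1.9, 0.3           # arbitrary pre-collision values
P2, p2 = collide(P, p, r)
print(P + p, P2 + p2)                          # agree to rounding: momentum conserved
print(kinetic(P, p, r), kinetic(P2, p2, r))    # agree to rounding: energy conserved
print(collide(0.7, 1.9, 1.0))                  # (1.9, 0.7): momenta exchanged when r = 1
```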

We can now answer the last question from part (a). When $r = 1$, we have that $P' = p$ and $p' = P$, i.e. the ball and piston simply exchange momenta. The problem is then equivalent to two identical particles elastically bouncing off the bottom of the piston, and moving through each other as if they were completely transparent. When the trajectories cross, however, the particles exchange identities.

Figure 3.12: Poincaré sections for the ball and piston head problem. Each color corresponds to a different initial condition. When the mass ratio $r = m/M$ exceeds unity, the system apparently becomes ergodic.
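Averages within the microcanonical ensemble are normally performed with respect to the phase space distribution

$$ \varrho(\varphi) = \frac{\delta\big(E - H(\varphi)\big)}{\mathrm{Tr}\, \delta\big(E - H(\varphi)\big)} \,, $$

where $\varphi = (P, X, p, x)$, and

$$ \mathrm{Tr}\, F(\varphi) = \int_{-\infty}^\infty \! dP \int_0^\infty \! dX \int_{-\infty}^\infty \! dp \int_0^\infty \! dx\; F(P, X, p, x) \,. $$

Since $X \ge x$ is a dynamical constraint, we should define an appropriately restricted microcanonical average:

$$ \big\langle F(\varphi) \big\rangle_{\mu ce} \equiv \frac{\widetilde{\mathrm{Tr}}\, \big[F(\varphi)\, \delta\big(E - H(\varphi)\big)\big]}{\widetilde{\mathrm{Tr}}\, \delta\big(E - H(\varphi)\big)} \,, $$

where

$$ \widetilde{\mathrm{Tr}}\, F(\varphi) \equiv \int_{-\infty}^\infty \! dP \int_0^\infty \! dX \int_{-\infty}^\infty \! dp \int_0^X \! dx\; F(P, X, p, x) $$

is the modified trace. Note that the integral over $x$ has an upper limit of $X$ rather than $\infty$, since the region of phase space with $x > X$ is dynamically inaccessible.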
When computing the traces, we shall make use of the following result from the theory of Laplace transforms. The Laplace transform of a function $K(E)$ is

$$ \hat K(\beta) = \int_0^\infty \! dE\; K(E)\, e^{-\beta E} \,. $$

The inverse Laplace transform is given by

$$ K(E) = \int_{c - i\infty}^{c + i\infty} \! \frac{d\beta}{2\pi i}\; \hat K(\beta)\, e^{\beta E} \,, $$

where the integration contour, which is a line extending from $\beta = c - i\infty$ to $\beta = c + i\infty$, lies to the right of any singularities of $\hat K(\beta)$ in the complex $\beta$-plane. For this problem, all we shall need is the following:

$$ K(E) = \frac{E^{t-1}}{\Gamma(t)} \quad\Longleftrightarrow\quad \hat K(\beta) = \beta^{-t} \,. $$

For a proof, see §4.2.2 of the lecture notes.
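The forward direction of this transform pair is easy to confirm numerically. The sketch below applies Simpson's rule on a truncated interval $[0, E_{\max}]$; the cutoff, the grid size, and the helper names are our assumptions, and $t \ge 1$ is assumed so that the integrand stays finite at $E = 0$.

```python
import math

def laplace_numeric(K, beta, E_max, n=20000):
    """Simpson's-rule estimate of int_0^E_max dE K(E) exp(-beta*E); n must be even."""
    h = E_max / n
    total = K(0.0) + K(E_max) * math.exp(-beta * E_max)
    for k in range(1, n):
        E = k * h
        total += (4 if k % 2 else 2) * K(E) * math.exp(-beta * E)
    return total * h / 3

def power_law(t):
    """K(E) = E^(t-1) / Gamma(t); assumes t >= 1 so that K(0) is finite."""
    return lambda E: E ** (t - 1) / math.gamma(t)

beta = 2.0
for t in (1.0, 2.5, 3.0):
    print(t, laplace_numeric(power_law(t), beta, E_max=40.0), beta ** (-t))
```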
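We’re now ready to compute the microcanonical average of $X$. We have

$$ \langle X \rangle = \frac{N(E)}{D(E)} \,, $$

where

$$ N(E) = \widetilde{\mathrm{Tr}}\, \big[ X\, \delta(E - H) \big] \,, \qquad D(E) = \widetilde{\mathrm{Tr}}\, \delta(E - H) \,. $$

Let’s first compute $D(E)$. To do this, we compute the Laplace transform $\hat D(\beta)$:

$$ \begin{aligned} \hat D(\beta) &= \widetilde{\mathrm{Tr}}\, e^{-\beta H} = \int_{-\infty}^\infty \! dP\, e^{-\beta P^2/2} \int_{-\infty}^\infty \! dp\, e^{-\beta p^2/2r} \int_0^\infty \! dX\, e^{-\beta X} \int_0^X \! dx\, e^{-\beta r x} \\ &= \frac{2\pi\sqrt{r}}{\beta} \int_0^\infty \! dX\, e^{-\beta X} \bigg( \frac{1 - e^{-\beta r X}}{\beta r} \bigg) = \frac{\sqrt{r}}{1+r} \cdot \frac{2\pi}{\beta^3} \,. \end{aligned} $$

Similarly for $\hat N(\beta)$ we have

$$ \begin{aligned} \hat N(\beta) &= \widetilde{\mathrm{Tr}}\, X\, e^{-\beta H} = \int_{-\infty}^\infty \! dP\, e^{-\beta P^2/2} \int_{-\infty}^\infty \! dp\, e^{-\beta p^2/2r} \int_0^\infty \! dX\, X\, e^{-\beta X} \int_0^X \! dx\, e^{-\beta r x} \\ &= \frac{2\pi\sqrt{r}}{\beta} \int_0^\infty \! dX\, X\, e^{-\beta X} \bigg( \frac{1 - e^{-\beta r X}}{\beta r} \bigg) = \frac{(2+r)\sqrt{r}}{(1+r)^2} \cdot \frac{2\pi}{\beta^4} \,. \end{aligned} $$

Taking the inverse Laplace transform, we then have

$$ D(E) = \frac{\sqrt{r}}{1+r} \cdot \pi E^2 \,, \qquad N(E) = \frac{(2+r)\sqrt{r}}{(1+r)^2} \cdot \tfrac{1}{3} \pi E^3 \,. $$

We then have

$$ \langle X \rangle = \frac{N(E)}{D(E)} = \bigg( \frac{2+r}{1+r} \bigg) \cdot \tfrac{1}{3} E \,. $$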

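The ‘brute force’ evaluation of the integrals isn’t so bad either. We have

$$ D(E) = \int_{-\infty}^\infty \! dP \int_0^\infty \! dX \int_{-\infty}^\infty \! dp \int_0^X \! dx\; \delta\big( \tfrac{1}{2} P^2 + \tfrac{1}{2r} p^2 + X + rx - E \big) \,. $$

To evaluate, define $P = \sqrt{2}\, u_x$ and $p = \sqrt{2r}\, u_y$. Then we have $dP\, dp = 2\sqrt{r}\, du_x\, du_y$ and $\frac{1}{2} P^2 + \frac{1}{2r} p^2 = u_x^2 + u_y^2$. Now convert to 2D polar coordinates with $w \equiv u_x^2 + u_y^2$, so that $du_x\, du_y = \pi\, dw$ after integrating over the polar angle. Thus,

$$ \begin{aligned} D(E) &= 2\pi\sqrt{r} \int_0^\infty \! dw \int_0^\infty \! dX \int_0^X \! dx\; \delta(w + X + rx - E) \\ &= \frac{2\pi}{\sqrt{r}} \int_0^\infty \! dw \int_0^\infty \! dX\; \Theta(E - w - X)\, \Theta(X + rX - E + w) \\ &= \frac{2\pi}{\sqrt{r}} \int_0^E \! dw \int_{(E-w)/(1+r)}^{E-w} \! dX = \frac{2\pi\sqrt{r}}{1+r} \int_0^E \! dq\; q = \frac{\sqrt{r}}{1+r} \cdot \pi E^2 \,, \end{aligned} $$

with $q = E - w$. Similarly,

$$ \begin{aligned} N(E) &= 2\pi\sqrt{r} \int_0^\infty \! dw \int_0^\infty \! dX\, X \int_0^X \! dx\; \delta(w + X + rx - E) \\ &= \frac{2\pi}{\sqrt{r}} \int_0^\infty \! dw \int_0^\infty \! dX\, X\; \Theta(E - w - X)\, \Theta(X + rX - E + w) \\ &= \frac{2\pi}{\sqrt{r}} \int_0^E \! dw \int_{(E-w)/(1+r)}^{E-w} \! dX\, X = \frac{2\pi}{\sqrt{r}} \int_0^E \! dq \bigg( 1 - \frac{1}{(1+r)^2} \bigg) \cdot \tfrac{1}{2} q^2 = \bigg( \frac{2+r}{1+r} \bigg) \cdot \frac{\sqrt{r}}{1+r} \cdot \tfrac{1}{3} \pi E^3 \,. \end{aligned} $$

Figure 3.13: Long time running numerical averages $X_{\rm av}(t) \equiv t^{-1} \int_0^t dt'\, X(t')$ for $r = 0.3$ (top) and $r = 1.2$ (bottom), each for three different initial conditions, with $E = 10$ in all cases. Note how in the $r = 0.3$ case the long time average is dependent on the initial condition, while the $r = 1.2$ case is ergodic and hence independent of initial conditions. The dashed black line shows the restricted microcanonical average, $\langle X \rangle_{\mu ce} = \frac{2+r}{1+r} \cdot \frac{1}{3} E$.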
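The last step of each brute-force evaluation reduces to a one-dimensional integral over $q = E - w$, which is easy to check. The sketch below does the outer integral with the midpoint rule, does the inner $X$ integrals in closed form, and compares with $D(E) = \frac{\sqrt r}{1+r}\pi E^2$ and $\langle X \rangle = \frac{2+r}{1+r} \cdot \frac{1}{3}E$. The grid size and function name are illustrative choices.

```python
import math

def brute_force_DN(r, E, n=4000):
    """Midpoint-rule evaluation of the reduced integrals for D(E) and N(E).

    For fixed w, X runs from (E - w)/(1 + r) to E - w; the X integrals are done exactly.
    """
    hw = E / n
    D = N = 0.0
    for i in range(n):
        q = E - (i + 0.5) * hw          # q = E - w at the midpoint of each w cell
        a, b = q / (1 + r), q
        D += (b - a) * hw               # integral of 1 over X
        N += 0.5 * (b * b - a * a) * hw # integral of X over X
    prefactor = 2 * math.pi / math.sqrt(r)
    return prefactor * D, prefactor * N

r, E = 1.2, 10.0
D, N = brute_force_DN(r, E)
print(D, math.sqrt(r) / (1 + r) * math.pi * E**2)   # agree closely
print(N / D, (2 + r) / (1 + r) * E / 3)              # both ≈ 4.8485
```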

(c) Using the general result

$$ \delta\big(F(x) - A\big) = \sum_i \frac{\delta(x - x_i)}{\big|F'(x_i)\big|} \,, $$

where $F(x_i) = A$, we recover the desired expression. We should be careful not to double count, so to avoid this difficulty we can evaluate $\delta(t - t_i^+)$, where $t_i^+ = t_i + 0^+$ is infinitesimally later than $t_i$. The point here is that when $t = t_i^+$ we have $p = rv > 0$ (i.e. just after hitting the bottom). Similarly, at times $t = t_i^-$ we have $p < 0$ (i.e. just prior to hitting the bottom). Note $v = p/r$. Again we write $\gamma(E) = N(E)/D(E)$, this time with

$$ N(E) = \widetilde{\mathrm{Tr}}\, \big[ \Theta(p)\, r^{-1} p\; \delta(x - 0^+)\; \delta(E - H) \big] \,. $$

The Laplace transform is

$$ \hat N(\beta) = \int_{-\infty}^\infty \! dP\, e^{-\beta P^2/2} \int_0^\infty \! dp\; r^{-1} p\, e^{-\beta p^2/2r} \int_0^\infty \! dX\, e^{-\beta X} = \sqrt{\frac{2\pi}{\beta}} \cdot \frac{1}{\beta} \cdot \frac{1}{\beta} = \sqrt{2\pi}\; \beta^{-5/2} \,. $$

Thus,

$$ N(E) = \tfrac{4\sqrt{2}}{3}\, E^{3/2} $$

and

$$ \langle \gamma \rangle = \frac{N(E)}{D(E)} = \frac{4\sqrt{2}}{3\pi} \bigg( \frac{1+r}{\sqrt{r}} \bigg) E^{-1/2} \,. $$
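Evaluating the two closed forms at the simulation energy $E = 10$ gives the microcanonical reference values used in the comparisons below. A minimal sketch (the function name is illustrative):

```python
import math

def microcanonical_averages(r, E):
    """Restricted microcanonical <X> and <gamma> for the ball and piston (dimensionless units)."""
    X_avg = (2 + r) / (1 + r) * E / 3
    gamma_avg = 4 * math.sqrt(2) / (3 * math.pi) * (1 + r) / math.sqrt(r) / math.sqrt(E)
    return X_avg, gamma_avg

for r in (0.3, 1.2):
    X_avg, gamma_avg = microcanonical_averages(r, E=10.0)
    print(r, round(X_avg, 4), round(gamma_avg, 4))
# 0.3 5.8974 0.4505
# 1.2 4.8485 0.3812
```

These agree with the $\mu$ce entries for $r = 0.3$ in Table 3.1 and for $r = 1.2$ in Table 3.2.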

 r    X(0)   ⟨X(t)⟩   ⟨X⟩_µce   ⟨γ(t)⟩   ⟨γ⟩_µce
0.3   0.1    6.1743   5.8974    0.5283   0.4505
0.3   1.0    5.7303   5.8974    0.4170   0.4505
0.3   3.0    5.7876   5.8974    0.4217   0.4505
0.3   5.0    5.8231   5.8974    0.4228   0.4505
0.3   7.0    5.8227   5.8974    0.4228   0.4505
0.3   9.0    5.8016   5.8974    0.4234   0.4505
0.3   9.9    6.1539   5.8974    0.5249   0.4505
1.2   0.1    4.8509   4.8485    0.3816   0.3812
1.2   1.0    4.8479   4.8485    0.3811   0.3812
1.2   3.0    4.8493   4.8485    0.3813   0.3812
1.2   5.0    4.8482   4.8485    0.3813   0.3812
1.2   7.0    4.8472   4.8485    0.3808   0.3812
1.2   9.0    4.8466   4.8485    0.3808   0.3812
1.2   9.9    4.8444   4.8485    0.3807   0.3812

Table 3.1: Comparison of time averages and microcanonical ensemble averages for $r = 0.3$ and $r = 1.2$. Initial conditions are $P(0) = x(0) = 0$, with $X(0)$ given in the table and $E = 10$. Averages were performed over a period extending for $N_b = 10^7$ bounces.

(d) When the constraint $X \ge x$ is removed, we integrate over all phase space. We then have

$$ \hat D(\beta) = \mathrm{Tr}\, e^{-\beta H} = \int_{-\infty}^\infty \! dP\, e^{-\beta P^2/2} \int_{-\infty}^\infty \! dp\, e^{-\beta p^2/2r} \int_0^\infty \! dX\, e^{-\beta X} \int_0^\infty \! dx\, e^{-\beta r x} = \frac{2\pi}{\sqrt{r}\, \beta^3} \,. $$

For part (b) we would then have

$$ \hat N(\beta) = \mathrm{Tr}\, X\, e^{-\beta H} = \int_{-\infty}^\infty \! dP\, e^{-\beta P^2/2} \int_{-\infty}^\infty \! dp\, e^{-\beta p^2/2r} \int_0^\infty \! dX\, X\, e^{-\beta X} \int_0^\infty \! dx\, e^{-\beta r x} = \frac{2\pi}{\sqrt{r}\, \beta^4} \,. $$

The respective inverse Laplace transforms are $D(E) = \pi E^2/\sqrt{r}$ and $N(E) = \frac{1}{3}\pi E^3/\sqrt{r}$. The microcanonical average of $X$ would then be

$$ \langle X \rangle = \tfrac{1}{3} E \,. $$

Using the restricted phase space, we obtained a value which is greater than this by a factor of $(2+r)/(1+r)$. That the restricted average gives a larger value makes good sense, since $X$ is not allowed to descend below $x$ in that case. For part (c), we would obtain the same result for $N(E)$, since $x = 0$ in the average. We would then obtain

$$ \langle \gamma \rangle = \frac{4\sqrt{2}}{3\pi}\, r^{1/2} E^{-1/2} \,. $$

The restricted microcanonical average yields a rate which is larger by a factor $(1+r)/r$. Again, it makes good sense that the restricted average should yield a higher rate, since the ball is not allowed to attain a height greater than the instantaneous value of $X$.
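The only difference between the restricted and unrestricted Laplace transforms is the range of the $x$ integral. Under the weight $e^{-\beta X} e^{-\beta r x}$ the heights $X$ and $x$ are independent exponential variables with rates $\beta$ and $\beta r$, so the ratio $\hat D_{\rm restricted}(\beta)/\hat D_{\rm unrestricted}(\beta)$ is the probability that $x < X$, which equals $r/(1+r)$ for every $\beta$ (and hence also for the corresponding $D(E)$). The Monte Carlo sketch below checks this ratio; the sample size, seed, and function name are arbitrary choices.

```python
import random

def restricted_fraction(r, beta=1.0, samples=100_000, seed=0):
    """Monte Carlo estimate of P(x < X) for X ~ Exp(beta) and x ~ Exp(beta * r)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        X = rng.expovariate(beta)        # piston height weight exp(-beta X)
        x = rng.expovariate(beta * r)    # ball height weight exp(-beta r x)
        hits += x < X
    return hits / samples

for r in (0.3, 1.2):
    print(r, restricted_fraction(r), r / (1 + r))   # estimate vs exact r/(1+r)
```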
(e) It is straightforward to simulate the dynamics. So long as $0 < x(t) < X(t)$, we have

$$ \dot X = P \,, \qquad \dot P = -1 \,, \qquad \dot x = \frac{p}{r} \,, \qquad \dot p = -r \,. $$

Starting at an arbitrary time $t_0$, these equations are integrated to yield

$$ \begin{aligned} X(t) &= X(t_0) + P(t_0)\,(t - t_0) - \tfrac{1}{2}(t - t_0)^2 \\ P(t) &= P(t_0) - (t - t_0) \\ x(t) &= x(t_0) + \frac{p(t_0)}{r}\,(t - t_0) - \tfrac{1}{2}(t - t_0)^2 \\ p(t) &= p(t_0) - r\,(t - t_0) \,. \end{aligned} $$

We must stop the evolution when one of two things happens. The first possibility is a bounce at $t = t_b$, meaning $x(t_b) = 0$. The momentum $p(t)$ changes discontinuously at the bounce, with $p(t_b^+) = -p(t_b^-)$, and where $p(t_b^-) < 0$ necessarily. The second possibility is a collision at $t = t_c$, meaning $X(t_c) = x(t_c)$. Integrating across the collision, we must conserve both energy and momentum. This means

$$ \begin{aligned} P(t_c^+) &= \frac{1-r}{1+r}\, P(t_c^-) + \frac{2}{1+r}\, p(t_c^-) \\ p(t_c^+) &= \frac{2r}{1+r}\, P(t_c^-) - \frac{1-r}{1+r}\, p(t_c^-) \,. \end{aligned} $$
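The event-driven scheme just described can be sketched in Python as follows. This is an illustration under stated assumptions, not the program used to produce the tables: the function name, the choice of an upward initial ball momentum with $P(0) = x(0) = 0$, and the default number of bounces are ours. Because both bodies accelerate downward at the same rate between events, $X - x$ changes linearly in time, which makes the next collision time explicit; the time integral of $X(t)$ over each free flight is accumulated exactly.

```python
import math

def simulate(r, E=10.0, X0=7.0, n_bounces=100_000):
    """Event-driven simulation of the ball and piston head (dimensionless units).

    Piston: height X, momentum P (mass 1). Ball: height x, momentum p (mass r).
    Starts from P = x = 0 with the ball moving upward (an assumed initial condition).
    Returns the time averages <X>_t and <gamma>_t (floor bounces per unit time).
    """
    X, P, x = X0, 0.0, 0.0
    p = math.sqrt(2.0 * r * (E - X0))      # energy conservation fixes |p|
    elapsed = 0.0
    X_integral = 0.0
    bounces = 0
    while bounces < n_bounces:
        v = p / r
        # Both bodies fall with unit acceleration, so X - x changes linearly in time.
        closing = v - P
        t_c = max(X - x, 0.0) / closing if closing > 0.0 else math.inf
        # Next floor contact: positive root of x + v*t - t**2/2 = 0.
        t_b = v + math.sqrt(max(v * v + 2.0 * x, 0.0))
        dt = min(t_c, t_b)
        X_integral += X * dt + 0.5 * P * dt**2 - dt**3 / 6.0   # exact integral of X(t)
        elapsed += dt
        X += P * dt - 0.5 * dt**2
        P -= dt
        x += v * dt - 0.5 * dt**2
        p -= r * dt
        if t_b <= t_c:                     # bounce off the floor: p -> -p
            x, p = 0.0, abs(p)
            bounces += 1
        else:                              # elastic ball-piston collision
            x = X
            P, p = ((1 - r) * P + 2 * p) / (1 + r), (2 * r * P - (1 - r) * p) / (1 + r)
    return X_integral / elapsed, bounces / elapsed

X_avg, gamma_avg = simulate(1.2, n_bounces=50_000)
print(X_avg, gamma_avg)   # compare with 4.8485 and 0.3812, the r = 1.2 microcanonical values
```

Repeating the run with $r = 0.3$ and several values of `X0` is a simple way to explore the dependence on initial conditions in the non-ergodic regime.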

In Tables 3.1 and 3.2 I report on the results of numerical simulations, comparing dynamical averages with (restricted) phase space averages within the microcanonical ensemble. For $r = 0.3$ the microcanonical averages poorly approximate the dynamical averages, and the dynamical averages are dependent on the initial conditions, indicating that the system is not ergodic. For $r = 1.2$, the agreement between dynamical and microcanonical averages generally improves with averaging time.

 r    X(0)   N_b     ⟨X(t)⟩      ⟨X⟩_µce     ⟨γ(t)⟩       ⟨γ⟩_µce
1.2   7.0   10^4   4.8054892   4.8484848   0.37560388   0.38118510
1.2   7.0   10^5   4.8436969   4.8484848   0.38120356   0.38118510
1.2   7.0   10^6   4.8479414   4.8484848   0.38122778   0.38118510
1.2   7.0   10^7   4.8471686   4.8484848   0.38083749   0.38118510
1.2   7.0   10^8   4.8485825   4.8484848   0.38116282   0.38118510
1.2   7.0   10^9   4.8486682   4.8484848   0.38120259   0.38118510
1.2   1.0   10^9   4.8485381   4.8484848   0.38118069   0.38118510
1.2   9.9   10^9   4.8484886   4.8484848   0.38116295   0.38118510

Table 3.2: Comparison of time averages and microcanonical ensemble averages for $r = 1.2$, with $N_b$ ranging from $10^4$ to $10^9$.

Indeed, it has been shown by N. I. Chernov, Physica D 53, 233 (1991), building on the work of M. P. Wojtkowski, Comm. Math. Phys. 126, 507 (1990), that this system is ergodic for $r > 1$. Wojtkowski also showed that this system is equivalent to the wedge billiard, in which a single point particle of mass $\mu$ bounces inside the two-dimensional wedge-shaped region $\{(x, y) \,|\, x \ge 0,\; y \ge x \,\mathrm{ctn}\,\phi\}$ for some fixed angle $\phi = \tan^{-1}\!\sqrt{m/M}$.
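To see this, pass to relative ($\mathcal X$) and center-of-mass ($\mathcal Y$) coordinates,

$$ \mathcal X = X - x \,, \quad P_{\mathcal X} = \frac{mP - Mp}{M+m} \,, \qquad \mathcal Y = \frac{MX + mx}{M+m} \,, \quad P_{\mathcal Y} = P + p \,. $$

Then

$$ H = \frac{(M+m)\, P_{\mathcal X}^2}{2Mm} + \frac{P_{\mathcal Y}^2}{2(M+m)} + (M+m)\, g\, \mathcal Y \,. $$

There are two constraints. One requires $X \ge x$, i.e. $\mathcal X \ge 0$. The second requires $x > 0$, i.e.

$$ x = \mathcal Y - \frac{M}{M+m}\, \mathcal X \ge 0 \,. $$

Now define $x \equiv \mathcal X$, $p_x \equiv P_{\mathcal X}$, and rescale $y \equiv \frac{M+m}{\sqrt{Mm}}\, \mathcal Y$ and $p_y \equiv \frac{\sqrt{Mm}}{M+m}\, P_{\mathcal Y}$ to obtain

$$ H = \frac{1}{2\mu} \big( p_x^2 + p_y^2 \big) + \mathcal M g\, y \,, $$

with $\mu = \frac{Mm}{M+m}$ the familiar reduced mass and $\mathcal M = \sqrt{Mm}$. The constraints are then $x \ge 0$ and $y \ge \sqrt{M/m}\; x$.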
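The change of variables can be checked numerically. For random (hypothetical) masses and phase space points satisfying $0 \le x \le X$, the sketch below evaluates $H$ in the original variables and in the rescaled wedge variables, and confirms that the image point lies inside the wedge $x \ge 0$, $y \ge \sqrt{M/m}\,x$. It checks only the energy and the constraints, not the canonical structure; the function names are illustrative.

```python
import math
import random

def h_original(X, P, x, p, M, m, g):
    """H = P^2/2M + p^2/2m + M g X + m g x (piston height X, ball height x)."""
    return P**2 / (2 * M) + p**2 / (2 * m) + M * g * X + m * g * x

def to_wedge(X, P, x, p, M, m):
    """Map (X, P, x, p) to the rescaled wedge variables (x_w, p_x, y, p_y)."""
    x_w = X - x                                  # relative coordinate
    p_x = (m * P - M * p) / (M + m)              # relative momentum
    Y = (M * X + m * x) / (M + m)                # center of mass
    y = (M + m) / math.sqrt(M * m) * Y
    p_y = math.sqrt(M * m) / (M + m) * (P + p)
    return x_w, p_x, y, p_y

def h_wedge(x_w, p_x, y, p_y, M, m, g):
    """H = (p_x^2 + p_y^2) / 2 mu + sqrt(M m) g y."""
    mu = M * m / (M + m)
    return (p_x**2 + p_y**2) / (2 * mu) + math.sqrt(M * m) * g * y

rng = random.Random(2)
for _ in range(5):
    M, m, g = rng.uniform(0.5, 3.0), rng.uniform(0.5, 3.0), 9.8
    x = rng.uniform(0.0, 1.0)
    X = x + rng.uniform(0.0, 1.0)
    P, p = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x_w, p_x, y, p_y = to_wedge(X, P, x, p, M, m)
    assert abs(h_wedge(x_w, p_x, y, p_y, M, m, g) - h_original(X, P, x, p, M, m, g)) < 1e-9
    assert x_w >= 0 and y >= math.sqrt(M / m) * x_w - 1e-12
print("energy and wedge constraints agree")
```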
Chapter 4

Statistical Ensembles

4.1 References

– F. Reif, Fundamentals of Statistical and Thermal Physics (McGraw-Hill, 1987)
This has been perhaps the most popular undergraduate text since it first appeared in 1967, and with good reason.

– A. H. Carter, Classical and Statistical Thermodynamics (Benjamin Cummings, 2000)
A very relaxed treatment appropriate for undergraduate physics majors.

– D. V. Schroeder, An Introduction to Thermal Physics (Addison-Wesley, 2000)
This is the best undergraduate thermodynamics book I’ve come across, but only 40% of the book treats statistical mechanics.

– C. Kittel, Elementary Statistical Physics (Dover, 2004)
Remarkably crisp, though dated, this text is organized as a series of brief discussions of key concepts and examples. Published by Dover, so you can’t beat the price.

– M. Kardar, Statistical Physics of Particles (Cambridge, 2007)
A superb modern text, with many insightful presentations of key concepts.

– M. Plischke and B. Bergersen, Equilibrium Statistical Physics (3rd edition, World Scientific, 2006)
An excellent graduate level text. Less insightful than Kardar but still a good modern treatment of the subject. Good discussion of mean field theory.

– E. M. Lifshitz and L. P. Pitaevskii, Statistical Physics (part I, 3rd edition, Pergamon, 1980)
This is volume 5 in the famous Landau and Lifshitz Course of Theoretical Physics. Though dated, it still contains a wealth of information and physical insight.

166 CHAPTER 4. STATISTICAL ENSEMBLES

4.2 Microcanonical Ensemble (µCE)

4.2.1 The microcanonical distribution function

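We have seen how in an ergodic dynamical system, time averages can be replaced by phase space averages:

$$ \text{ergodicity} \quad\Longleftrightarrow\quad \big\langle f(\varphi) \big\rangle_t = \big\langle f(\varphi) \big\rangle_S \,, \tag{4.1} $$

where

$$ \big\langle f(\varphi) \big\rangle_t = \lim_{T\to\infty} \frac{1}{T} \int_0^T \! dt\; f\big(\varphi(t)\big) \tag{4.2} $$

and

$$ \big\langle f(\varphi) \big\rangle_S = \int \! d\mu\; f(\varphi)\, \delta\big(E - \hat H(\varphi)\big) \bigg/ \int \! d\mu\; \delta\big(E - \hat H(\varphi)\big) \,. \tag{4.3} $$

Here $\hat H(\varphi) = \hat H(q,p)$ is the Hamiltonian, and $\delta(x)$ is the Dirac $\delta$-function.$^1$ Thus, averages are taken over a constant energy hypersurface which is a subset of the entire phase space.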
We’ve also seen how any phase space distribution $\varrho(\Lambda_1, \ldots, \Lambda_k)$ which is a function of conserved quantities $\Lambda_a(\varphi)$ is automatically a stationary (time-independent) solution to Liouville’s equation. Note that the microcanonical distribution,

$$ \varrho_E(\varphi) = \delta\big(E - \hat H(\varphi)\big) \bigg/ \int \! d\mu\; \delta\big(E - \hat H(\varphi)\big) \,, \tag{4.4} $$

is of this form, since $\hat H(\varphi)$ is conserved by the dynamics. Linear and angular momentum conservation are generally broken by elastic scattering off the walls of the sample.
So averages in the microcanonical ensemble are computed by evaluating the ratio

$$ \langle A \rangle = \frac{\mathrm{Tr}\, A\, \delta(E - \hat H)}{\mathrm{Tr}\, \delta(E - \hat H)} \,, \tag{4.5} $$

where Tr means ‘trace’, which entails an integration over all phase space:

$$ \mathrm{Tr}\, A(q,p) \equiv \frac{1}{N!} \prod_{i=1}^N \int \! \frac{d^d\!p_i\, d^d\!q_i}{(2\pi\hbar)^d}\; A(q,p) \,. \tag{4.6} $$

Here $N$ is the total number of particles and $d$ is the dimension of physical space in which each particle moves. The factor of $1/N!$, which cancels in the ratio between numerator and denominator, is present for indistinguishable particles.$^2$ The normalization factor $(2\pi\hbar)^{-Nd}$ renders the trace dimensionless. Again, this cancels between numerator and denominator. These factors may then seem arbitrary in the definition of the trace, but we’ll see how they in fact are required from quantum mechanical considerations. So we now adopt the following metric for classical phase space integration:

$$ d\mu = \frac{1}{N!} \prod_{i=1}^N \frac{d^d\!p_i\, d^d\!q_i}{(2\pi\hbar)^d} \,. \tag{4.7} $$
$^1$ We write the Hamiltonian as $\hat H$ (classical or quantum) in order to distinguish it from magnetic field ($H$) or enthalpy ($H$).
$^2$ More on this in chapter 5.
4.2. MICROCANONICAL ENSEMBLE (µCE) 167

4.2.2 Density of states

The denominator,
D(E) = Tr δ(E − Ĥ) , (4.8)
is called the density of states. It has dimensions of inverse energy, such that
E+∆E
Z Z Z
′ ′
D(E) ∆E = dE dµ δ(E − Ĥ) = dµ (4.9)
E E<Ĥ<E+∆E
= # of states with energies between E and E + ∆E .

Let us now compute $D(E)$ for the nonrelativistic ideal gas. The Hamiltonian is

$$ \hat H(q,p) = \sum_{i=1}^N \frac{\mathbf p_i^2}{2m} \,. \tag{4.10} $$

We assume that the gas is enclosed in a region of volume $V$, and we’ll do a purely classical calculation, neglecting discreteness of its quantum spectrum. We must compute

$$ D(E) = \frac{1}{N!} \int \prod_{i=1}^N \frac{d^d\!p_i\, d^d\!q_i}{(2\pi\hbar)^d}\; \delta\bigg( E - \sum_{i=1}^N \frac{\mathbf p_i^2}{2m} \bigg) \,. \tag{4.11} $$

We shall calculate $D(E)$ in two ways. The first method utilizes the Laplace transform, $Z(\beta)$:

$$ Z(\beta) = \mathcal L\big[D(E)\big] \equiv \int_0^\infty \! dE\; e^{-\beta E}\, D(E) = \mathrm{Tr}\, e^{-\beta \hat H} \,. \tag{4.12} $$

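The inverse Laplace transform is then

$$ D(E) = \mathcal L^{-1}\big[Z(\beta)\big] \equiv \int_{c - i\infty}^{c + i\infty} \! \frac{d\beta}{2\pi i}\; e^{\beta E}\, Z(\beta) \,, \tag{4.13} $$

where $c$ is such that the integration contour is to the right of any singularities of $Z(\beta)$ in the complex $\beta$-plane. We then have

$$ \begin{aligned} Z(\beta) &= \frac{1}{N!} \prod_{i=1}^N \int \! \frac{d^d\!x_i\, d^d\!p_i}{(2\pi\hbar)^d}\; e^{-\beta \mathbf p_i^2/2m} \\ &= \frac{V^N}{N!} \bigg( \int_{-\infty}^\infty \! \frac{dp}{2\pi\hbar}\; e^{-\beta p^2/2m} \bigg)^{Nd} \\ &= \frac{V^N}{N!} \bigg( \frac{m}{2\pi\hbar^2} \bigg)^{Nd/2} \beta^{-Nd/2} \,. \end{aligned} \tag{4.14} $$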

 
Figure 4.1: Complex integration contours $\mathcal C$ for inverse Laplace transform $\mathcal L^{-1}\big[Z(\beta)\big] = D(E)$. When the product $dN$ is odd, there is a branch cut along the negative $\mathrm{Re}\,\beta$ axis.

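The inverse Laplace transform is then

$$ \begin{aligned} D(E) &= \frac{V^N}{N!} \bigg( \frac{m}{2\pi\hbar^2} \bigg)^{Nd/2} \oint_{\mathcal C} \frac{d\beta}{2\pi i}\; e^{\beta E}\, \beta^{-Nd/2} \\ &= \frac{V^N}{N!} \bigg( \frac{m}{2\pi\hbar^2} \bigg)^{Nd/2} \frac{E^{\frac{1}{2}Nd - 1}}{\Gamma(Nd/2)} \,, \end{aligned} \tag{4.15} $$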

exactly as before. The integration contour for the inverse Laplace transform is extended in an infinite semicircle in the left half $\beta$-plane. When $Nd$ is even, the function $\beta^{-Nd/2}$ has a pole of order $Nd/2$ at the origin. When $Nd$ is odd, there is a branch cut extending along the negative $\mathrm{Re}\,\beta$ axis, and the integration contour must avoid the cut, as shown in fig. 4.1. One can check that this results in the same expression above, i.e. we may analytically continue from even values of $Nd$ to all positive values of $Nd$.
 
For a general system, the Laplace transform, $Z(\beta) = \mathcal L\big[D(E)\big]$, also is called the partition function. We shall again meet up with $Z(\beta)$ when we discuss the ordinary canonical ensemble. Our final result, then, is

$$ D(E, V, N) = \frac{V^N}{N!} \bigg( \frac{m}{2\pi\hbar^2} \bigg)^{Nd/2} \frac{E^{\frac{1}{2}Nd - 1}}{\Gamma(Nd/2)} \,. \tag{4.16} $$
Here we have emphasized that the density of states is a function of $E$, $V$, and $N$. Using Stirling’s approximation,

$$ \ln N! = N \ln N - N + \tfrac{1}{2} \ln N + \tfrac{1}{2} \ln(2\pi) + \mathcal O\big(N^{-1}\big) \,, \tag{4.17} $$

we may define the statistical entropy,

$$ S(E, V, N) \equiv k_B \ln D(E, V, N) = N k_B\, \phi\Big( \frac{E}{N}, \frac{V}{N} \Big) + \mathcal O(\ln N) \,, \tag{4.18} $$

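where

$$ \phi\Big( \frac{E}{N}, \frac{V}{N} \Big) = \frac{d}{2} \ln\Big( \frac{E}{N} \Big) + \ln\Big( \frac{V}{N} \Big) + \frac{d}{2} \ln\Big( \frac{m}{d\pi\hbar^2} \Big) + \big(1 + \tfrac{1}{2} d\big) \,. \tag{4.19} $$

Recall $k_B = 1.3806503 \times 10^{-16}$ erg/K is Boltzmann’s constant.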
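To see how quickly the extensive form 4.18 takes over, one can compare $\ln D$ from eqn. 4.16, evaluated exactly with the log-Gamma function, against $N\phi$ from eqn. 4.19. The sketch below uses units with $m = \hbar = 1$ and hypothetical values of $E/N$ and $V/N$; taking the logarithm of the dimensionful $D(E)$ implicitly fixes an energy unit, as discussed in §4.2.3.

```python
import math

def ln_D_exact(E, V, N, d, m=1.0, hbar=1.0):
    """ln D(E, V, N) from eqn. 4.16, with lgamma handling N! and Gamma(Nd/2)."""
    return (N * math.log(V) - math.lgamma(N + 1)
            + 0.5 * N * d * math.log(m / (2 * math.pi * hbar**2))
            + (0.5 * N * d - 1) * math.log(E)
            - math.lgamma(0.5 * N * d))

def phi(e, v, d, m=1.0, hbar=1.0):
    """phi(E/N, V/N) from eqn. 4.19."""
    return (0.5 * d * math.log(e) + math.log(v)
            + 0.5 * d * math.log(m / (d * math.pi * hbar**2)) + 1 + 0.5 * d)

d, e, v = 3, 2.0, 5.0      # hypothetical energy and volume per particle
for N in (10, 1000, 100_000):
    per_particle = ln_D_exact(e * N, v * N, N, d) / N
    print(N, per_particle, phi(e, v, d), per_particle - phi(e, v, d))
```

The per-particle discrepancy shrinks roughly like $(\ln N)/N$, consistent with the $\mathcal O(\ln N)$ correction in eqn. 4.18.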

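Second method

The second method invokes a mathematical trick. First, let’s rescale $p^\alpha_i \equiv \sqrt{2mE}\, u^\alpha_i$. We then have, with $h = 2\pi\hbar$,

$$ D(E) = \frac{V^N}{N!} \bigg( \frac{\sqrt{2mE}}{h} \bigg)^{Nd} \frac{1}{E} \int \! d^M\!u\; \delta\big( u_1^2 + u_2^2 + \ldots + u_M^2 - 1 \big) \,. \tag{4.20} $$

Here we have written $\mathbf u = (u_1, u_2, \ldots, u_M)$ with $M = Nd$ as an $M$-dimensional vector. We’ve also used the rule $\delta(Ex) = E^{-1}\delta(x)$ for $\delta$-functions. We can now write

$$ d^M\!u = u^{M-1}\, du\, d\Omega_M \,, \tag{4.21} $$

where $d\Omega_M$ is the $M$-dimensional differential solid angle. We now have our answer:$^3$

$$ D(E) = \frac{V^N}{N!} \bigg( \frac{\sqrt{2m}}{h} \bigg)^{Nd} E^{\frac{1}{2}Nd - 1} \cdot \tfrac{1}{2}\, \Omega_{Nd} \,. \tag{4.22} $$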

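What remains is for us to compute $\Omega_M$, the total solid angle in $M$ dimensions. We do this by a nifty mathematical trick. Consider the integral

$$ \begin{aligned} I_M &= \int \! d^M\!u\; e^{-u^2} = \Omega_M \int_0^\infty \! du\; u^{M-1}\, e^{-u^2} \\ &= \tfrac{1}{2}\, \Omega_M \int_0^\infty \! ds\; s^{\frac{1}{2}M - 1}\, e^{-s} = \tfrac{1}{2}\, \Omega_M\, \Gamma\big(\tfrac{1}{2}M\big) \,, \end{aligned} \tag{4.23} $$

where $s = u^2$, and where

$$ \Gamma(z) = \int_0^\infty \! dt\; t^{z-1}\, e^{-t} \tag{4.24} $$

is the Gamma function, which satisfies $z\,\Gamma(z) = \Gamma(z+1)$.$^4$ On the other hand, we can compute $I_M$ in Cartesian coordinates, writing

$$ I_M = \bigg( \int_{-\infty}^\infty \! du_1\; e^{-u_1^2} \bigg)^M = \big( \sqrt{\pi} \big)^M \,. \tag{4.25} $$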
$^3$ The factor of $\frac{1}{2}$ preceding $\Omega_M$ in eqn. 4.22 appears because $\delta(u^2 - 1) = \frac{1}{2}\delta(u - 1) + \frac{1}{2}\delta(u + 1)$. Since $u = |\mathbf u| \ge 0$, the second term can be dropped.
$^4$ Note that for integer argument, $\Gamma(k) = (k-1)!$

Therefore

$$ \Omega_M = \frac{2\pi^{M/2}}{\Gamma(M/2)} \,. \tag{4.26} $$

We thereby obtain $\Omega_2 = 2\pi$, $\Omega_3 = 4\pi$, $\Omega_4 = 2\pi^2$, etc., the first two of which are familiar.
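Eqn. 4.26 can be checked directly, together with the statement that both methods give the same $D(E)$: with $h = 2\pi\hbar$, eqns. 4.22 and 4.16 agree only if $\big(\sqrt{2m}/h\big)^{Nd}\,\frac{1}{2}\Omega_{Nd} = \big(m/2\pi\hbar^2\big)^{Nd/2}/\Gamma(Nd/2)$. The sketch below (units $m = \hbar = 1$; the function name is illustrative) verifies both numerically.

```python
import math

def solid_angle(M):
    """Total solid angle of the unit sphere in M dimensions, eqn. 4.26."""
    return 2 * math.pi ** (M / 2) / math.gamma(M / 2)

print(solid_angle(2) / math.pi, solid_angle(3) / math.pi, solid_angle(4) / math.pi**2)  # ≈ 2, 4, 2

# Units m = hbar = 1, so h = 2*pi. Compare the prefactors of E^{Nd/2 - 1} from both methods.
for Nd in (1, 2, 3, 6, 11):
    second_method = (math.sqrt(2.0) / (2 * math.pi)) ** Nd * 0.5 * solid_angle(Nd)
    first_method = (1.0 / (2 * math.pi)) ** (Nd / 2) / math.gamma(Nd / 2)
    print(Nd, second_method / first_method)   # ratio 1 up to rounding
```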

4.2.3 Arbitrariness in the definition of S(E)

Note that $D(E)$ has dimensions of inverse energy, so one might ask how we are to take the logarithm of a dimensionful quantity in eqn. 4.18. We must introduce an energy scale, such as $\Delta E$ in eqn. 4.9, and define $\tilde D(E; \Delta E) = D(E)\, \Delta E$ and $S(E; \Delta E) \equiv k_B \ln \tilde D(E; \Delta E)$. The definition of statistical entropy then involves the arbitrary parameter $\Delta E$; however, this only affects $S(E)$ in an additive way. That is,

$$ S(E, V, N; \Delta E_1) = S(E, V, N; \Delta E_2) + k_B \ln\Big( \frac{\Delta E_1}{\Delta E_2} \Big) \,. \tag{4.27} $$

Note that the difference between the two definitions of $S$ depends only on the ratio $\Delta E_1/\Delta E_2$, and is independent of $E$, $V$, and $N$.

4.2.4 Ultra-relativistic ideal gas

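Consider an ultrarelativistic ideal gas, with single particle dispersion $\varepsilon(p) = cp$. We then have

$$ \begin{aligned} Z(\beta) &= \frac{V^N \Omega_d^N}{N!\, h^{Nd}} \bigg( \int_0^\infty \! dp\; p^{d-1}\, e^{-\beta c p} \bigg)^N \\ &= \frac{V^N}{N!} \bigg( \frac{\Gamma(d)\, \Omega_d}{c^d h^d \beta^d} \bigg)^N \,. \end{aligned} \tag{4.28} $$

The statistical entropy is $S(E, V, N) = k_B \ln D(E, V, N) = N k_B\, \phi\big(\frac{E}{N}, \frac{V}{N}\big)$, with

$$ \phi\Big( \frac{E}{N}, \frac{V}{N} \Big) = d \ln\Big( \frac{E}{N} \Big) + \ln\Big( \frac{V}{N} \Big) + \ln\bigg( \frac{\Omega_d\, \Gamma(d)}{(dhc)^d} \bigg) + (d + 1) \,. \tag{4.29} $$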

4.2.5 Discrete systems

For classical systems where the energy levels are discrete, the states of the system $|\sigma\rangle$ are labeled by a set of discrete quantities $\{\sigma_1, \sigma_2, \ldots\}$, where each variable $\sigma_i$ takes discrete values. The number of ways of configuring the system at fixed energy $E$ is then

$$ \Omega(E, N) = \sum_\sigma \delta_{\hat H(\sigma), E} \,, \tag{4.30} $$

where the sum is over all possible configurations. Here $N$ labels the total number of particles. For example, if we have $N$ spin-$\frac{1}{2}$ particles on a lattice which are placed in a magnetic field $H$, so the
4.3. THE QUANTUM MECHANICAL TRACE 171

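individual particle energy is $\varepsilon_i = -\mu_0 H \sigma$, where $\sigma = \pm 1$, then in a configuration in which $N_\uparrow$ particles have $\sigma_i = +1$ and $N_\downarrow = N - N_\uparrow$ particles have $\sigma_i = -1$, the energy is $E = (N_\downarrow - N_\uparrow)\, \mu_0 H$. The number of configurations at fixed energy $E$ is

$$ \Omega(E, N) = \binom{N}{N_\uparrow} = \frac{N!}{\big( \frac{N}{2} - \frac{E}{2\mu_0 H} \big)!\, \big( \frac{N}{2} + \frac{E}{2\mu_0 H} \big)!} \,, \tag{4.31} $$

since $N_{\uparrow/\downarrow} = \frac{N}{2} \mp \frac{E}{2\mu_0 H}$. The statistical entropy is $S(E, N) = k_B \ln \Omega(E, N)$.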
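For small $N$ the sum in eqn. 4.30 can be carried out by brute force and compared with eqn. 4.31. The sketch below assumes units with $\mu_0 H = 1$ (an illustrative choice) and enumerates every configuration $\sigma \in \{\pm 1\}^N$ with `itertools.product`; the function names are hypothetical.

```python
import itertools
import math

def count_configurations(N, E, mu0H=1):
    """Brute force eqn. 4.30: count sigma in {+1,-1}^N with E = -mu0H * sum(sigma)."""
    return sum(1 for sigma in itertools.product((1, -1), repeat=N)
               if -mu0H * sum(sigma) == E)

def omega_formula(N, E, mu0H=1):
    """Eqn. 4.31 with N_up = N/2 - E/(2 mu0H); E must be an allowed energy."""
    n_up = (N * mu0H - E) // (2 * mu0H)
    return math.comb(N, n_up)

N = 8
energies = range(-N, N + 1, 2)          # E = (N_down - N_up) mu0H changes in steps of 2
assert all(count_configurations(N, E) == omega_formula(N, E) for E in energies)
print([omega_formula(N, E) for E in energies])   # [1, 8, 28, 56, 70, 56, 28, 8, 1]
```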

4.3 The Quantum Mechanical Trace

Thus far our understanding of ergodicity is rooted in the dynamics of classical mechanics. A Hamiltonian
flow which is ergodic is one in which time averages can be replaced by phase space averages using the
microcanonical ensemble. What happens, though, if our system is quantum mechanical, as all systems
ultimately are?

4.3.1 The density matrix

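First, let us consider that our system $S$ will in general be in contact with a world $W$. We call the union of $S$ and $W$ the universe, $U = W \cup S$. Let $|N\rangle$ denote a quantum mechanical state of $W$, and let $|n\rangle$ denote a quantum mechanical state of $S$. Then the most general wavefunction we can write is of the form

$$
|\Psi\rangle = \sum_{N,n} \Psi_{N,n}\, |N\rangle \otimes |n\rangle . \tag{4.32}
$$

Now let us compute the expectation value of some operator $\hat A$ which acts as the identity within $W$, meaning $\langle N|\hat A|N'\rangle = \hat A\,\delta_{NN'}$, where $\hat A$ is the 'reduced' operator which acts within $S$ alone. We then have

$$
\begin{aligned}
\langle\Psi|\hat A|\Psi\rangle &= \sum_{N,N'} \sum_{n,n'} \Psi^*_{N,n} \Psi_{N',n'}\, \delta_{NN'}\, \langle n|\hat A|n'\rangle \\
&= \mathrm{Tr}\,\big(\hat\varrho\, \hat A\big) ,
\end{aligned} \tag{4.33}
$$

where

$$
\hat\varrho = \sum_N \sum_{n,n'} \Psi^*_{N,n} \Psi_{N,n'}\, |n'\rangle\langle n| \tag{4.34}
$$

is the density matrix. The time-dependence of $\hat\varrho$ is easily found:

$$
\begin{aligned}
\hat\varrho(t) &= \sum_N \sum_{n,n'} \Psi^*_{N,n} \Psi_{N,n'}\, |n'(t)\rangle\langle n(t)| \\
&= e^{-i\hat H t/\hbar}\, \hat\varrho\, e^{+i\hat H t/\hbar} ,
\end{aligned} \tag{4.35}
$$

where $\hat H$ is the Hamiltonian for the system $S$. Thus, we find

$$
i\hbar\, \frac{\partial\hat\varrho}{\partial t} = \big[\hat H, \hat\varrho\big] . \tag{4.36}
$$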
172 CHAPTER 4. STATISTICAL ENSEMBLES

Figure 4.2: A system S in contact with a 'world' W. The union of the two, U = W ∪ S, is said to be the 'universe'.

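Note that the density matrix evolves according to a slightly different equation than an operator in the Heisenberg picture, for which

$$
\hat A(t) = e^{+i\hat H t/\hbar}\, \hat A\, e^{-i\hat H t/\hbar} \quad\Longrightarrow\quad i\hbar\, \frac{\partial\hat A}{\partial t} = \big[\hat A, \hat H\big] = -\big[\hat H, \hat A\big] . \tag{4.37}
$$

For Hamiltonian systems, we found that the phase space distribution $\varrho(q,p,t)$ evolved according to the Liouville equation,

$$
i\, \frac{\partial\varrho}{\partial t} = L\varrho , \tag{4.38}
$$

where the Liouvillian $L$ is the differential operator

$$
L = -i \sum_{j=1}^{Nd} \left\{ \frac{\partial\hat H}{\partial p_j} \frac{\partial}{\partial q_j} - \frac{\partial\hat H}{\partial q_j} \frac{\partial}{\partial p_j} \right\} . \tag{4.39}
$$

Accordingly, any distribution $\varrho(\Lambda_1, \ldots, \Lambda_k)$ which is a function of constants of the motion $\Lambda_a(q,p)$ is a stationary solution to the Liouville equation: $\partial_t\, \varrho(\Lambda_1, \ldots, \Lambda_k) = 0$. Similarly, any quantum mechanical density matrix which commutes with the Hamiltonian is a stationary solution to eqn. 4.36. The corresponding microcanonical distribution is

$$
\hat\varrho_E = \delta\big(E - \hat H\big) . \tag{4.40}
$$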

4.3.2 Averaging the DOS

If our quantum mechanical system is placed in a finite volume, the energy levels will be discrete, rather than continuous, and the density of states (DOS) will be of the form

$$
D(E) = \mathrm{Tr}\,\delta\big(E - \hat H\big) = \sum_l \delta(E - E_l) , \tag{4.41}
$$

where $\{E_l\}$ are the eigenvalues of the Hamiltonian $\hat H$. The spectrum of kinetic energies remains discrete for any finite $V$, but it must approach the continuum result in the thermodynamic limit $V \to \infty$. To recover the continuum result, we average the DOS over a window of width $\Delta E$:

$$
\overline{D(E)} = \frac{1}{\Delta E} \int_E^{E+\Delta E} \! dE'\; D(E') . \tag{4.42}
$$

If we take the limit $\Delta E \to 0$ but with $\Delta E \gg \delta E$, where $\delta E$ is the spacing between successive quantized levels, we recover a smooth function, as shown in fig. 4.3. We will in general drop the bar and refer to this function as $D(E)$. Note that $\delta E \sim 1/D(E) = e^{-N\phi(\varepsilon, v)}$ is (typically) exponentially small in the size of the system. Hence even if we took $\Delta E \propto V^{-1}$, which vanishes in the thermodynamic limit, there would still be exponentially many energy levels within an interval of width $\Delta E$.

Figure 4.3: Averaging the quantum mechanical discrete density of states yields a continuous curve.

4.3.3 Coherent states

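The quantum-classical correspondence is elucidated with the use of coherent states. Recall that the one-dimensional harmonic oscillator Hamiltonian may be written

$$
\begin{aligned}
\hat H_0 &= \frac{p^2}{2m} + \tfrac12 m\omega_0^2 q^2 \\
&= \hbar\omega_0 \big(a^\dagger a + \tfrac12\big) ,
\end{aligned} \tag{4.43}
$$

where $a$ and $a^\dagger$ are ladder operators satisfying $\big[a, a^\dagger\big] = 1$, which can be taken to be

$$
a = \ell\, \frac{\partial}{\partial q} + \frac{q}{2\ell} , \qquad a^\dagger = -\ell\, \frac{\partial}{\partial q} + \frac{q}{2\ell} , \tag{4.44}
$$

with $\ell = \sqrt{\hbar/2m\omega_0}$. Note that

$$
q = \ell\,\big(a + a^\dagger\big) , \qquad p = \frac{\hbar}{2i\ell}\,\big(a - a^\dagger\big) . \tag{4.45}
$$

The ground state satisfies $a\, \psi_0(q) = 0$, which yields

$$
\psi_0(q) = (2\pi\ell^2)^{-1/4}\, e^{-q^2/4\ell^2} . \tag{4.46}
$$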

The normalized coherent state $|z\rangle$ is defined as

$$
|z\rangle = e^{-\frac12|z|^2}\, e^{z a^\dagger} |0\rangle = e^{-\frac12|z|^2} \sum_{n=0}^{\infty} \frac{z^n}{\sqrt{n!}}\, |n\rangle . \tag{4.47}
$$

The overlap of coherent states is given by

$$
\langle z_1|z_2\rangle = e^{-\frac12|z_1|^2}\, e^{-\frac12|z_2|^2}\, e^{\bar z_1 z_2} , \tag{4.48}
$$

hence different coherent states are not orthogonal. Despite this nonorthogonality, the coherent states allow a simple resolution of the identity,

$$
1 = \int \! \frac{d^2 z}{2\pi i}\, |z\rangle\langle z| \qquad ; \qquad \frac{d^2 z}{2\pi i} \equiv \frac{d\,\mathrm{Re}\,z\; d\,\mathrm{Im}\,z}{\pi} , \tag{4.49}
$$

which is straightforward to establish.
To gain some physical intuition about the coherent states, define

$$
z \equiv \frac{Q}{2\ell} + \frac{i\ell P}{\hbar} \tag{4.50}
$$

and write $|z\rangle \equiv |Q,P\rangle$. One finds (exercise!)

$$
\psi_{Q,P}(q) = \langle q|z\rangle = (2\pi\ell^2)^{-1/4}\, e^{-iPQ/2\hbar}\, e^{iPq/\hbar}\, e^{-(q-Q)^2/4\ell^2} , \tag{4.51}
$$

hence the coherent state $\psi_{Q,P}(q)$ is a wavepacket Gaussianly localized about $q = Q$, but oscillating with average momentum $P$.
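For example, we can compute

$$
\langle Q,P|\, q\, |Q,P\rangle = \langle z|\, \ell\,(a + a^\dagger)\, |z\rangle = 2\ell\, \mathrm{Re}\, z = Q \tag{4.52}
$$

$$
\langle Q,P|\, p\, |Q,P\rangle = \langle z|\, \frac{\hbar}{2i\ell}\,(a - a^\dagger)\, |z\rangle = \frac{\hbar}{\ell}\, \mathrm{Im}\, z = P \tag{4.53}
$$

as well as

$$
\langle Q,P|\, q^2\, |Q,P\rangle = \langle z|\, \ell^2 (a + a^\dagger)^2\, |z\rangle = Q^2 + \ell^2 \tag{4.54}
$$

$$
\langle Q,P|\, p^2\, |Q,P\rangle = -\langle z|\, \frac{\hbar^2}{4\ell^2}\, (a - a^\dagger)^2\, |z\rangle = P^2 + \frac{\hbar^2}{4\ell^2} . \tag{4.55}
$$

Thus, the root mean square fluctuations in the coherent state $|Q,P\rangle$ are

$$
\Delta q = \ell = \sqrt{\frac{\hbar}{2m\omega_0}} , \qquad \Delta p = \frac{\hbar}{2\ell} = \sqrt{\frac{m\hbar\omega_0}{2}} , \tag{4.56}
$$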

and $\Delta q \cdot \Delta p = \frac12\hbar$. Thus we learn that the coherent state $\psi_{Q,P}(q)$ is localized in phase space, i.e. in both position and momentum. If we have a general operator $\hat A(q,p)$, we can then write

$$
\langle Q,P|\, \hat A(q,p)\, |Q,P\rangle = A(Q,P) + \mathcal{O}(\hbar) , \tag{4.57}
$$

where $A(Q,P)$ is formed from $\hat A(q,p)$ by replacing $q \to Q$ and $p \to P$.
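The coherent-state moments in eqns. 4.52 to 4.56 can be checked numerically in a truncated number basis. This sketch assumes units with $\hbar = 1$ by default, truncates the Fock expansion of eqn. 4.47 at a hypothetical cutoff `nmax`, and evaluates $\langle a\rangle$, $\langle aa\rangle$ and $\langle a^\dagger a\rangle$ directly from the amplitudes; all function names are illustrative.

```python
import cmath
import math


def coherent_amplitudes(z, nmax=80):
    """Fock amplitudes c_n = exp(-|z|^2/2) z^n / sqrt(n!) of eqn. 4.47, for n = 0..nmax."""
    amps = []
    c = cmath.exp(-abs(z) ** 2 / 2)   # c_0
    for n in range(nmax + 1):
        amps.append(c)
        c = c * z / math.sqrt(n + 1)  # c_{n+1} = c_n z / sqrt(n+1)
    return amps


def phase_space_moments(Q, P, ell, hbar=1.0, nmax=80):
    """Return <q>, <p>, Delta q, Delta p in the coherent state |Q, P>."""
    z = Q / (2 * ell) + 1j * ell * P / hbar  # eqn. 4.50
    c = coherent_amplitudes(z, nmax)
    # <a>, <a a> and <a^dagger a> from the matrix elements of the ladder operators
    a = sum(c[n].conjugate() * c[n + 1] * math.sqrt(n + 1) for n in range(nmax))
    aa = sum(c[n].conjugate() * c[n + 2] * math.sqrt((n + 1) * (n + 2)) for n in range(nmax - 1))
    num = sum(n * abs(cn) ** 2 for n, cn in enumerate(c))
    q_mean = 2 * ell * a.real     # q = ell (a + a^dagger)
    p_mean = hbar / ell * a.imag  # p = (hbar / 2 i ell)(a - a^dagger)
    # (a + a^dag)^2 = aa + a^dag a^dag + 2 a^dag a + 1, and <a^dag a^dag> = conj(<a a>)
    q2 = ell ** 2 * (2 * aa.real + 2 * num + 1)
    p2 = -(hbar / (2 * ell)) ** 2 * (2 * aa.real - 2 * num - 1)
    return q_mean, p_mean, math.sqrt(q2 - q_mean ** 2), math.sqrt(p2 - p_mean ** 2)
```

For $Q = 1$, $P = 0.5$, $\ell = 0.7$, the returned widths should reproduce $\Delta q = \ell$ and $\Delta p = \hbar/2\ell$, so that $\Delta q\, \Delta p = \hbar/2$ up to truncation and rounding error.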


Since

$$
\frac{d^2 z}{2\pi i} \equiv \frac{d\,\mathrm{Re}\,z\; d\,\mathrm{Im}\,z}{\pi} = \frac{dQ\, dP}{2\pi\hbar} , \tag{4.58}
$$

we can write the trace using coherent states as

$$
\mathrm{Tr}\,\hat A = \frac{1}{2\pi\hbar} \int_{-\infty}^{\infty} \! dQ \int_{-\infty}^{\infty} \! dP\; \langle Q,P|\, \hat A\, |Q,P\rangle . \tag{4.59}
$$

We now can understand the origin of the factor $2\pi\hbar$ in the denominator of each $(q_i, p_i)$ integral over classical phase space in eqn. 4.6.
Note that $\omega_0$ is arbitrary in our discussion. By increasing $\omega_0$, the states become more localized in $q$ and more plane-wave-like in $p$. However, so long as $\omega_0$ is finite, the width of the coherent state in each direction is proportional to $\hbar^{1/2}$, and thus vanishes in the classical limit.

4.4 Thermal Equilibrium

4.4.1 Two systems in thermal contact

Consider two systems in thermal contact, as depicted in fig. 4.4. The two subsystems #1 and #2 are free to exchange energy, but their respective volumes and particle numbers remain fixed. We assume the contact is made over a surface, and that the energy associated with that surface is negligible when compared with the bulk energies $E_1$ and $E_2$. Let the total energy be $E = E_1 + E_2$. Then the density of states $D(E)$ for the combined system is

$$
D(E) = \int \! dE_1\; D_1(E_1)\, D_2(E - E_1) . \tag{4.60}
$$

The probability density for system #1 to have energy $E_1$ is then

$$
P_1(E_1) = \frac{D_1(E_1)\, D_2(E - E_1)}{D(E)} . \tag{4.61}
$$

Note that $P_1(E_1)$ is normalized: $\int dE_1\, P_1(E_1) = 1$. We now ask: what is the most probable value of $E_1$? We find out by differentiating $P_1(E_1)$ with respect to $E_1$ and setting the result to zero. This requires

$$
\begin{aligned}
0 &= \frac{1}{P_1(E_1)} \frac{dP_1(E_1)}{dE_1} = \frac{\partial}{\partial E_1} \ln P_1(E_1) \\
&= \frac{\partial}{\partial E_1} \ln D_1(E_1) + \frac{\partial}{\partial E_1} \ln D_2(E - E_1) .
\end{aligned} \tag{4.62}
$$

Figure 4.4: Two systems in thermal contact.

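Since $S_i = k_B \ln D_i$, and since $\partial E_2/\partial E_1 = -1$ at fixed total energy $E$, we conclude that the maximally likely partition of energy between systems #1 and #2 is realized when

$$
\frac{\partial S_1}{\partial E_1} = \frac{\partial S_2}{\partial E_2} . \tag{4.63}
$$

This guarantees that

$$
S(E, E_1) = S_1(E_1) + S_2(E - E_1) \tag{4.64}
$$

is a maximum with respect to the energy $E_1$, at fixed total energy $E$.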


The temperature $T$ is defined as

$$
\frac{1}{T} = \left(\frac{\partial S}{\partial E}\right)_{V,N} , \tag{4.65}
$$

a result familiar from thermodynamics. The difference is that now we have a more rigorous definition of the entropy. When the total entropy $S$ is maximized, we have that $T_1 = T_2$. Once again, two systems which are in thermal contact and can exchange energy will in equilibrium have equal temperatures.
According to eqns. 4.19 and 4.29, the entropies of nonrelativistic and ultrarelativistic ideal gases in $d$ space dimensions are given by

$$
S_{\rm NR} = \tfrac12 N d\, k_B \ln\Big(\frac{E}{N}\Big) + N k_B \ln\Big(\frac{V}{N}\Big) + \text{const.} \tag{4.66}
$$

$$
S_{\rm UR} = N d\, k_B \ln\Big(\frac{E}{N}\Big) + N k_B \ln\Big(\frac{V}{N}\Big) + \text{const.} \tag{4.67}
$$

Invoking eqn. 4.65, we then have

$$
E_{\rm NR} = \tfrac12 N d\, k_B T , \qquad E_{\rm UR} = N d\, k_B T . \tag{4.68}
$$

We saw that the probability distribution $P_1(E_1)$ is maximized when $T_1 = T_2$, but how sharp is the peak in the distribution? Let us write $E_1 = E_1^* + \Delta E_1$, where $E_1^*$ is the solution to eqn. 4.62. We then have

$$
\ln P_1(E_1^* + \Delta E_1) = \ln P_1(E_1^*) + \frac{1}{2k_B} \frac{\partial^2 S_1}{\partial E_1^2}\bigg|_{E_1^*} (\Delta E_1)^2 + \frac{1}{2k_B} \frac{\partial^2 S_2}{\partial E_2^2}\bigg|_{E_2^*} (\Delta E_1)^2 + \ldots , \tag{4.69}
$$

where $E_2^* = E - E_1^*$. We must now evaluate

$$
\frac{\partial^2 S}{\partial E^2} = \frac{\partial}{\partial E}\left(\frac{1}{T}\right) = -\frac{1}{T^2} \left(\frac{\partial T}{\partial E}\right)_{V,N} = -\frac{1}{T^2 C_V} , \tag{4.70}
$$

where $C_V = \big(\partial E/\partial T\big)_{V,N}$ is the heat capacity. Thus,

$$
P_1 = P_1^*\, e^{-(\Delta E_1)^2 / 2 k_B T^2 \bar C_V} , \tag{4.71}
$$

where

$$
\bar C_V = \frac{C_{V,1}\, C_{V,2}}{C_{V,1} + C_{V,2}} . \tag{4.72}
$$

The distribution is therefore a Gaussian, and the fluctuations in $\Delta E_1$ can now be computed:

$$
\big\langle (\Delta E_1)^2 \big\rangle = k_B T^2 \bar C_V \quad\Longrightarrow\quad (\Delta E_1)_{\rm RMS} = k_B T \sqrt{\bar C_V / k_B} . \tag{4.73}
$$

The individual heat capacities $C_{V,1}$ and $C_{V,2}$ scale with the volumes $V_1$ and $V_2$, respectively. If $V_2 \gg V_1$, then $C_{V,2} \gg C_{V,1}$, in which case $\bar C_V \approx C_{V,1}$. Therefore the RMS fluctuations in $\Delta E_1$ are proportional to the square root of the system size, whereas $E_1$ itself is extensive. Thus, the ratio $(\Delta E_1)_{\rm RMS}/E_1 \propto V^{-1/2}$ scales as the inverse square root of the volume. The distribution $P_1(E_1)$ is thus extremely sharp.
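A minimal numerical illustration of eqns. 4.61 to 4.73, under the simplifying assumption (ours, not the notes') that each density of states is a pure power law, $D_i(E_i) \propto E_i^{a_i}$, as for an ideal gas with $a_i \approx \frac12 d N_i$. Then $1/T = k_B a_i/E_i$ and $C_{V,i} = a_i k_B$, so the peak of $P_1(E_1)$ should sit at equal temperatures, $E_1/a_1 = E_2/a_2$, and its variance should approach $k_B T^2 \bar C_V$ for large $a_i$. The grid-based estimator and the function names are illustrative, with $k_B = 1$ by default.

```python
import math


def energy_partition_stats(a1, a2, E=1.0, n=200001):
    """Peak, mean and variance of P1(E1) proportional to E1**a1 * (E - E1)**a2, on a uniform grid."""
    h = E / (n - 1)
    grid = [k * h for k in range(1, n - 1)]  # interior points; the weight vanishes at both ends
    logw = [a1 * math.log(x) + a2 * math.log(E - x) for x in grid]
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]    # subtract the maximum to avoid underflow
    norm = sum(w)
    mean = sum(x * wi for x, wi in zip(grid, w)) / norm
    var = sum((x - mean) ** 2 * wi for x, wi in zip(grid, w)) / norm
    peak = grid[logw.index(m)]
    return peak, mean, var


def gaussian_prediction(a1, a2, E=1.0, kB=1.0):
    """Equal-temperature peak and eqn. 4.73 variance when C_i = a_i kB and kB T = E / (a1 + a2)."""
    T = E / ((a1 + a2) * kB)
    C1, C2 = a1 * kB, a2 * kB
    Cbar = C1 * C2 / (C1 + C2)  # eqn. 4.72
    return a1 * E / (a1 + a2), kB * T ** 2 * Cbar
```

Multiplying both $a_1$ and $a_2$ by ten should shrink the relative width $(\Delta E_1)_{\rm RMS}/\langle E_1\rangle$ by roughly $\sqrt{10}$, in line with the $V^{-1/2}$ scaling above.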

4.4.2 Thermal, mechanical and chemical equilibrium



We have $dS\big|_{V,N} = \frac{1}{T}\,dE$, but in general $S = S(E,V,N)$. Equivalently, we may write $E = E(S,V,N)$. The full differential of $E(S,V,N)$ is then $dE = T\,dS - p\,dV + \mu\,dN$, with $T = \big(\frac{\partial E}{\partial S}\big)_{V,N}$, $p = -\big(\frac{\partial E}{\partial V}\big)_{S,N}$, and $\mu = \big(\frac{\partial E}{\partial N}\big)_{S,V}$. As we shall discuss in more detail, $p$ is the pressure and $\mu$ is the chemical potential. We may thus write the total differential $dS$ as

$$
dS = \frac{1}{T}\, dE + \frac{p}{T}\, dV - \frac{\mu}{T}\, dN . \tag{4.74}
$$
Employing the same reasoning as in the previous section, we conclude that entropy maximization for two
systems in contact requires the following:

• If two systems can exchange energy, then T1 = T2 . This is thermal equilibrium.


• If two systems can exchange volume, then p1 /T1 = p2 /T2 . This is mechanical equilibrium.
• If two systems can exchange particle number, then µ1 /T1 = µ2 /T2 . This is chemical equilibrium.

4.4.3 Gibbs-Duhem relation

The energy $E(S,V,N)$ is an extensive function of extensive variables, i.e. it is homogeneous of degree one in its arguments. Therefore $E(\lambda S, \lambda V, \lambda N) = \lambda E$, and taking the derivative with respect to $\lambda$ and then setting $\lambda = 1$ yields

$$
\begin{aligned}
E &= S \left(\frac{\partial E}{\partial S}\right)_{V,N} + V \left(\frac{\partial E}{\partial V}\right)_{S,N} + N \left(\frac{\partial E}{\partial N}\right)_{S,V} \\
&= TS - pV + \mu N .
\end{aligned} \tag{4.75}
$$

Taking the differential of each side, using the Leibniz rule on the RHS, and plugging in $dE = T\,dS - p\,dV + \mu\,dN$, we arrive at the Gibbs-Duhem relation⁵,

$$
S\, dT - V\, dp + N\, d\mu = 0 . \tag{4.76}
$$

This, in turn, says that any one of the intensive quantities $(T, p, \mu)$ can be written as a function of the other two, in the case of a single-component system.

4.5 Ordinary Canonical Ensemble (OCE)

4.5.1 Canonical distribution and partition function

Consider a system S in contact with a world W , and let their union U = W ∪ S be called the ‘universe’.
The situation is depicted in fig. 4.2. The volume VS and particle number NS of the system are held
fixed, but the energy is allowed to fluctuate by exchange with the world W . We are interested in the
limit NS → ∞, NW → ∞, with NS ≪ NW , with similar relations holding for the respective volumes and
energies. We now ask what is the probability that S is in a state | n i with energy En . This is given by
the ratio
$$
\begin{aligned}
P_n &= \lim_{\Delta E \to 0} \frac{D_W(E_U - E_n)\, \Delta E}{D_U(E_U)\, \Delta E} \\
&= \frac{\#\text{ of states accessible to } W \text{ given that } E_S = E_n}{\text{total }\#\text{ of states in } U} .
\end{aligned} \tag{4.77}
$$

Then

$$
\begin{aligned}
\ln P_n &= \ln D_W(E_U - E_n) - \ln D_U(E_U) \\
&= \ln D_W(E_U) - \ln D_U(E_U) - E_n\, \frac{\partial \ln D_W(E)}{\partial E}\bigg|_{E=E_U} + \ldots \\
&\equiv -\alpha - \beta E_n .
\end{aligned} \tag{4.78}
$$

The constant $\beta$ is given by

$$
\beta = \frac{\partial \ln D_W(E)}{\partial E}\bigg|_{E=E_U} = \frac{1}{k_B T} . \tag{4.79}
$$

Thus, we find $P_n = e^{-\alpha} e^{-\beta E_n}$. The constant $\alpha$ is fixed by the requirement that $\sum_n P_n = 1$:

$$
P_n = \frac{1}{Z}\, e^{-\beta E_n} , \qquad Z(T,V,N) = \sum_n e^{-\beta E_n} = \mathrm{Tr}\, e^{-\beta\hat H} . \tag{4.80}
$$

We've already met $Z(\beta)$ in eqn. 4.12; it is the Laplace transform of the density of states. It is also called the partition function of the system $S$. Quantum mechanically, we can write the ordinary canonical density matrix as

$$
\hat\varrho = \frac{e^{-\beta\hat H}}{\mathrm{Tr}\, e^{-\beta\hat H}} , \tag{4.81}
$$

which is known as the Gibbs distribution. Note that $\big[\hat\varrho, \hat H\big] = 0$, hence the ordinary canonical distribution is a stationary solution to the evolution equation for the density matrix. Note also that the OCE is specified by three parameters: $T$, $V$, and $N$.

⁵ See §2.7.4.

4.5.2 The difference between P (En ) and Pn

Let the total energy of the Universe be fixed at $E_U$. The joint probability density $P(E_S, E_W)$ for the system to have energy $E_S$ and the world to have energy $E_W$ is

$$
P(E_S, E_W) = \frac{D_S(E_S)\, D_W(E_W)\, \delta(E_U - E_S - E_W)}{D_U(E_U)} , \tag{4.82}
$$

where

$$
D_U(E_U) = \int_{-\infty}^{\infty} \! dE_S\; D_S(E_S)\, D_W(E_U - E_S) , \tag{4.83}
$$

which ensures that $\int dE_S \int dE_W\, P(E_S, E_W) = 1$. The probability density $P(E_S)$ is defined such that $P(E_S)\, dE_S$ is the (differential) probability for the system to have an energy in the range $[E_S, E_S + dE_S]$. The units of $P(E_S)$ are $E^{-1}$. To obtain $P(E_S)$, we simply integrate the joint probability density $P(E_S, E_W)$ over all possible values of $E_W$, obtaining

$$
P(E_S) = \frac{D_S(E_S)\, D_W(E_U - E_S)}{D_U(E_U)} , \tag{4.84}
$$

as we have in eqn. 4.77.
Now suppose we wish to know the probability $P_n$ that the system is in a particular state $|n\rangle$ with energy $E_n$. Clearly

$$
P_n = \lim_{\Delta E \to 0} \frac{\text{probability that } E_S \in [E_n, E_n + \Delta E]}{\#\text{ of } S \text{ states with } E_S \in [E_n, E_n + \Delta E]} = \frac{P(E_n)\, \Delta E}{D_S(E_n)\, \Delta E} = \frac{D_W(E_U - E_n)}{D_U(E_U)} . \tag{4.85}
$$

4.5.3 Additional remarks

The formula of eqn. 4.77 is quite general and holds in the case where $N_S/N_W = \mathcal{O}(1)$, so long as we are in the thermodynamic limit, where the energy associated with the interface between $S$ and $W$ may be neglected. In this case, however, one is not licensed to perform the subsequent Taylor expansion, and the distribution $P_n$ is no longer of the Gibbs form. It is also valid for quantum systems⁶, in which case we interpret $P_n = \langle n|\varrho_S|n\rangle$ as a diagonal element of the density matrix $\varrho_S$. The density of states functions may then be replaced by

$$
\begin{aligned}
D_W(E_U - E_n)\, \Delta E \;\to\; e^{S_W(E_U - E_n,\, \Delta E)} &\equiv \mathrm{Tr}_W \int_{E_U - E_n}^{E_U - E_n + \Delta E} \! dE\; \delta\big(E - \hat H_W\big) \\
D_U(E_U)\, \Delta E \;\to\; e^{S_U(E_U,\, \Delta E)} &\equiv \mathrm{Tr}_U \int_{E_U}^{E_U + \Delta E} \! dE\; \delta\big(E - \hat H_U\big) .
\end{aligned} \tag{4.86}
$$

The off-diagonal matrix elements of $\varrho_S$ are negligible in the thermodynamic limit.

⁶ See T.-C. Lu and T. Grover, arXiv 1709.08784.

4.5.4 Averages within the OCE

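Averages within the OCE are computed as

$$
\langle\hat A\rangle = \mathrm{Tr}\,\big(\hat\varrho\, \hat A\big) = \frac{\sum_n \langle n|\hat A|n\rangle\, e^{-\beta E_n}}{\sum_n e^{-\beta E_n}} , \tag{4.87}
$$

where we have conveniently taken the trace in a basis of energy eigenstates. In the classical limit, we have

$$
\varrho(\varphi) = \frac{1}{Z}\, e^{-\beta \hat H(\varphi)} , \qquad Z = \mathrm{Tr}\, e^{-\beta\hat H} = \int \! d\mu\; e^{-\beta \hat H(\varphi)} , \tag{4.88}
$$

with $d\mu = \frac{1}{N!} \prod_{j=1}^N \big(d^d q_j\, d^d p_j / h^d\big)$ for identical particles ('Maxwell-Boltzmann statistics'). Thus,

$$
\langle A\rangle = \mathrm{Tr}\,(\varrho A) = \frac{\int d\mu\; A(\varphi)\, e^{-\beta \hat H(\varphi)}}{\int d\mu\; e^{-\beta \hat H(\varphi)}} . \tag{4.89}
$$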

4.5.5 Entropy and free energy

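The Boltzmann entropy is defined by

$$
S = -k_B\, \mathrm{Tr}\, \big(\hat\varrho \ln \hat\varrho\big) = -k_B \sum_n P_n \ln P_n . \tag{4.90}
$$

The Boltzmann entropy and the statistical entropy $S = k_B \ln D(E)$ are identical in the thermodynamic limit. We define the Helmholtz free energy $F(T,V,N)$ as

$$
F(T,V,N) = -k_B T \ln Z(T,V,N) , \tag{4.91}
$$

hence

$$
P_n = e^{\beta F}\, e^{-\beta E_n} , \qquad \ln P_n = \beta F - \beta E_n . \tag{4.92}
$$

Therefore the entropy is

$$
S = -k_B \sum_n P_n \big(\beta F - \beta E_n\big) = -\frac{F}{T} + \frac{\langle\hat H\rangle}{T} , \tag{4.93}
$$

which is to say $F = E - TS$, where

$$
E = \sum_n P_n E_n = \frac{\mathrm{Tr}\, \hat H e^{-\beta\hat H}}{\mathrm{Tr}\, e^{-\beta\hat H}} \tag{4.94}
$$

is the average energy. We also see that

$$
Z = \mathrm{Tr}\, e^{-\beta\hat H} = \sum_n e^{-\beta E_n} \quad\Longrightarrow\quad E = \frac{\sum_n E_n\, e^{-\beta E_n}}{\sum_n e^{-\beta E_n}} = -\frac{\partial}{\partial\beta} \ln Z = \frac{\partial}{\partial\beta} \big(\beta F\big) . \tag{4.95}
$$

Thus, $F(T,V,N)$ is a Legendre transform of $E(S,V,N)$, with

$$
dF = -S\, dT - p\, dV + \mu\, dN , \tag{4.96}
$$

which means

$$
S = -\left(\frac{\partial F}{\partial T}\right)_{V,N} , \qquad p = -\left(\frac{\partial F}{\partial V}\right)_{T,N} , \qquad \mu = +\left(\frac{\partial F}{\partial N}\right)_{T,V} . \tag{4.97}
$$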
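The relations $F = -k_B T \ln Z$, $F = E - TS$ and $S = -(\partial F/\partial T)_{V,N}$ can be checked for any small set of discrete levels. The sketch below uses an arbitrary, hypothetical four-level spectrum in units where $k_B = 1$; the shift by the ground-state energy only guards against overflow and does not change any result.

```python
import math


def canonical(levels, T, kB=1.0):
    """Return (Z, F, E, S) for a discrete spectrum, following eqns. 4.80 and 4.90-4.94."""
    beta = 1.0 / (kB * T)
    e0 = min(levels)  # shift by the ground-state energy to avoid overflow in the exponentials
    weights = [math.exp(-beta * (e - e0)) for e in levels]
    z_shifted = sum(weights)
    Z = z_shifted * math.exp(-beta * e0)
    F = e0 - kB * T * math.log(z_shifted)  # equals -kB T ln Z
    P = [w / z_shifted for w in weights]
    E = sum(p * e for p, e in zip(P, levels))
    S = -kB * sum(p * math.log(p) for p in P if p > 0)
    return Z, F, E, S
```

A central difference of $F$ in $T$ should then reproduce $S$, and a central difference of $\beta F$ in $\beta$ should reproduce $E$, as in eqn. 4.95.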

4.5.6 Fluctuations in the OCE

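In the OCE, the energy is not fixed. It therefore fluctuates about its average value $E = \langle\hat H\rangle$. Note that

$$
\begin{aligned}
-\frac{\partial E}{\partial\beta} = k_B T^2\, \frac{\partial E}{\partial T} = \frac{\partial^2 \ln Z}{\partial \beta^2}
&= \frac{\mathrm{Tr}\, \hat H^2 e^{-\beta\hat H}}{\mathrm{Tr}\, e^{-\beta\hat H}} - \left(\frac{\mathrm{Tr}\, \hat H e^{-\beta\hat H}}{\mathrm{Tr}\, e^{-\beta\hat H}}\right)^{\!2} \\
&= \big\langle \hat H^2 \big\rangle - \big\langle \hat H \big\rangle^2 .
\end{aligned} \tag{4.98}
$$

Thus, the heat capacity is related to the fluctuations in the energy, just as we saw in §4.4.1:

$$
C_V = \left(\frac{\partial E}{\partial T}\right)_{V,N} = \frac{1}{k_B T^2} \Big( \big\langle \hat H^2 \big\rangle - \big\langle \hat H \big\rangle^2 \Big) . \tag{4.99}
$$

For the nonrelativistic ideal gas, we found $C_V = \frac{d}{2} N k_B$, hence the ratio of RMS fluctuations in the energy to the energy itself is

$$
\frac{\sqrt{\big\langle (\Delta\hat H)^2 \big\rangle}}{\langle\hat H\rangle} = \frac{\sqrt{k_B T^2 C_V}}{\frac{d}{2} N k_B T} = \sqrt{\frac{2}{Nd}} , \tag{4.100}
$$

and the ratio of the RMS fluctuations to the mean value vanishes in the thermodynamic limit.
The full distribution function for the energy is

$$
P(E) = \big\langle \delta(E - \hat H) \big\rangle = \frac{\mathrm{Tr}\, \delta(E - \hat H)\, e^{-\beta\hat H}}{\mathrm{Tr}\, e^{-\beta\hat H}} = \frac{1}{Z}\, D(E)\, e^{-\beta E} . \tag{4.101}
$$

Thus,

$$
P(E) = \frac{e^{-\beta[E - T S(E)]}}{\int dE'\; e^{-\beta[E' - T S(E')]}} , \tag{4.102}
$$

where $S(E) = k_B \ln D(E)$ is the statistical entropy. Let's write $E = \bar E + \delta E$, where $\bar E$ extremizes the combination $E - T S(E)$, i.e. $\bar E$ is the solution to $T S'(E) = 1$, where the energy derivative of $S$ is performed at fixed volume $V$ and particle number $N$. We now expand $S(\bar E + \delta E)$ to second order in $\delta E$, obtaining

$$
S(\bar E + \delta E) = S(\bar E) + \frac{\delta E}{T} - \frac{(\delta E)^2}{2 T^2 C_V} + \ldots \tag{4.103}
$$

Recall that $S''(E) = \frac{\partial}{\partial E}\big(\frac{1}{T}\big) = -\frac{1}{T^2 C_V}$. Thus,

$$
E - T S(E) = \bar E - T S(\bar E) + \frac{(\delta E)^2}{2 T C_V} + \mathcal{O}\big((\delta E)^3\big) . \tag{4.104}
$$

Applying this to both numerator and denominator of eqn. 4.102, we obtain⁷

$$
P(E) = \mathcal{N} \exp\left[ -\frac{(\delta E)^2}{2 k_B T^2 C_V} \right] , \tag{4.105}
$$

where $\mathcal{N} = \big(2\pi k_B T^2 C_V\big)^{-1/2}$ is a normalization constant which guarantees $\int dE\, P(E) = 1$. Once again, we see that the distribution is a Gaussian centered at $\langle E\rangle = \bar E$, and of width $(\Delta E)_{\rm RMS} = \sqrt{k_B T^2 C_V}$. This is a consequence of the Central Limit Theorem.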
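A sketch of the fluctuation formula eqn. 4.99, assuming $k_B = 1$ and, for concreteness, a hypothetical spectrum of $N$ independent two-level systems with level spacing $\varepsilon$ (so the level $k\varepsilon$ has degeneracy $\binom{N}{k}$). It compares a finite-difference $C_V$ with $\langle(\Delta\hat H)^2\rangle / k_B T^2$, and checks that the relative fluctuation falls as $N^{-1/2}$.

```python
import math


def energy_moments(levels, degeneracies, T, kB=1.0):
    """Mean and variance of H in the OCE for levels E_n with degeneracies g_n."""
    beta = 1.0 / (kB * T)
    e0 = min(levels)  # energy shift for numerical safety; it cancels in all averages
    w = [g * math.exp(-beta * (e - e0)) for e, g in zip(levels, degeneracies)]
    Z = sum(w)
    mean = sum(wi * e for wi, e in zip(w, levels)) / Z
    mean_sq = sum(wi * e * e for wi, e in zip(w, levels)) / Z
    return mean, mean_sq - mean ** 2


def two_level_spectrum(N, eps=1.0):
    """N independent two-level systems (energies 0 and eps): level k*eps has degeneracy C(N, k)."""
    return [k * eps for k in range(N + 1)], [math.comb(N, k) for k in range(N + 1)]
```

Because the two-level systems are independent, both $\langle\hat H\rangle$ and $\langle(\Delta\hat H)^2\rangle$ grow linearly with $N$, so quadrupling $N$ should exactly halve the ratio $(\Delta E)_{\rm RMS}/E$.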

4.5.7 Thermodynamics revisited

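The average energy within the OCE is

$$
E = \sum_n E_n P_n , \tag{4.106}
$$

and therefore

$$
\begin{aligned}
dE &= \sum_n E_n\, dP_n + \sum_n P_n\, dE_n \\
&= \bar{d}Q - \bar{d}W ,
\end{aligned} \tag{4.107}
$$

where

$$
\bar{d}W = -\sum_n P_n\, dE_n \tag{4.108}
$$

$$
\bar{d}Q = \sum_n E_n\, dP_n . \tag{4.109}
$$

Finally, from $P_n = Z^{-1} e^{-E_n/k_B T}$, we can write

$$
E_n = -k_B T \ln Z - k_B T \ln P_n , \tag{4.110}
$$

with which we obtain

$$
\begin{aligned}
\bar{d}Q &= \sum_n E_n\, dP_n \\
&= -k_B T \ln Z \sum_n dP_n - k_B T \sum_n \ln P_n\, dP_n \\
&= T\, d\Big(-k_B \sum_n P_n \ln P_n\Big) = T\, dS .
\end{aligned} \tag{4.111}
$$

In the last step we used $\sum_n dP_n = 0$, which follows from normalization; it eliminates the first term and also gives $d\big(\sum_n P_n \ln P_n\big) = \sum_n \ln P_n\, dP_n$.

⁷ In applying eqn. 4.104 to the denominator of eqn. 4.102, we shift $E'$ by $\bar E$ and integrate over the difference $\delta E' \equiv E' - \bar E$, retaining terms up to quadratic order in $\delta E'$ in the argument of the exponent.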

Figure 4.5: Microscopic, statistical interpretation of the First Law of Thermodynamics.

Note also that

$$
\begin{aligned}
\bar{d}W &= -\sum_n P_n\, dE_n \\
&= -\sum_n P_n \bigg( \sum_i \frac{\partial E_n}{\partial X_i}\, dX_i \bigg) \\
&= -\sum_{n,i} P_n\, \Big\langle n \Big| \frac{\partial\hat H}{\partial X_i} \Big| n \Big\rangle\, dX_i \equiv \sum_i F_i\, dX_i ,
\end{aligned} \tag{4.112}
$$

so the generalized force $F_i$ conjugate to the generalized displacement $dX_i$ is

$$
F_i = -\sum_n P_n\, \frac{\partial E_n}{\partial X_i} = -\left\langle \frac{\partial\hat H}{\partial X_i} \right\rangle . \tag{4.113}
$$

This is the force acting on the system⁸. In the chapter on thermodynamics, we defined the generalized force conjugate to $X_i$ as $y_i \equiv -F_i$.
Thus we see from eqn. 4.107 that there are two ways that the average energy can change; these are depicted in the sketch of fig. 4.5. Starting from a set of energy levels $\{E_n\}$ and probabilities $\{P_n\}$, we can shift the energies to $\{E'_n\}$. The resulting change in energy $(\Delta E)_{\rm I} = -W$ is identified with the work done on the system. We could also modify the probabilities to $\{P'_n\}$ without changing the energies. The energy change in this case is the heat absorbed by the system: $(\Delta E)_{\rm II} = Q$. This provides us with a statistical and microscopic interpretation of the First Law of Thermodynamics.

⁸ In deriving eqn. 4.113, we have used the so-called Feynman-Hellmann theorem of quantum mechanics: $d\langle n|\hat H|n\rangle = \langle n|\, d\hat H\, |n\rangle$, if $|n\rangle$ is an energy eigenstate.
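The split of $dE$ into heat and work in eqns. 4.107 to 4.111 can be made concrete with finite steps. In this sketch we assume, purely for illustration, that every level scales with a single displacement $X$ as $E_n = X e_n$ (with $k_B = 1$); using midpoint values makes the bookkeeping identity $dE = \bar d Q - \bar d W$ exact, and the comparison $\bar d Q \approx T\, dS$ accurate to third order in the step. The function names are hypothetical.

```python
import math


def gibbs(levels, T, kB=1.0):
    """Canonical probabilities P_n = exp(-E_n / kB T) / Z."""
    e0 = min(levels)
    w = [math.exp(-(e - e0) / (kB * T)) for e in levels]
    Z = sum(w)
    return [wi / Z for wi in w]


def entropy(P, kB=1.0):
    return -kB * sum(p * math.log(p) for p in P if p > 0)


def heat_and_work(base_levels, T, X, dT, dX, kB=1.0):
    """Split a small change (T, X) -> (T + dT, X + dX) into heat and work, assuming E_n = X * e_n."""
    E_a = [X * e for e in base_levels]
    E_b = [(X + dX) * e for e in base_levels]
    P_a, P_b = gibbs(E_a, T, kB), gibbs(E_b, T + dT, kB)
    E_mid = [(ea + eb) / 2 for ea, eb in zip(E_a, E_b)]
    P_mid = [(pa + pb) / 2 for pa, pb in zip(P_a, P_b)]
    dQ = sum(e * (pb - pa) for e, pa, pb in zip(E_mid, P_a, P_b))    # sum_n E_n dP_n
    dW = -sum(p * (eb - ea) for p, ea, eb in zip(P_mid, E_a, E_b))   # -sum_n P_n dE_n
    dE = sum(p * e for p, e in zip(P_b, E_b)) - sum(p * e for p, e in zip(P_a, E_a))
    T_dS = (T + dT / 2) * (entropy(P_b, kB) - entropy(P_a, kB))
    return dE, dQ, dW, T_dS
```

Scaling $T$ and $X$ together leaves every $P_n$ unchanged, so such a change is pure work: $\bar d Q = T\, dS = 0$ and $dE = -\bar d W$.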

4.5.8 Generalized susceptibilities

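Suppose our Hamiltonian is of the form

$$
\hat H = \hat H(\lambda) = \hat H_0 - \lambda\, \hat Q , \tag{4.114}
$$

where $\lambda$ is an intensive parameter, such as magnetic field. Then

$$
Z(\lambda) = \mathrm{Tr}\, e^{-\beta(\hat H_0 - \lambda\hat Q)} \tag{4.115}
$$

and

$$
\frac{1}{Z} \frac{\partial Z}{\partial\lambda} = \beta \cdot \frac{1}{Z}\, \mathrm{Tr}\, \big( \hat Q\, e^{-\beta\hat H(\lambda)} \big) = \beta\, \langle\hat Q\rangle . \tag{4.116}
$$

But then from $Z = e^{-\beta F}$ we have

$$
Q(\lambda, T) = \langle\hat Q\rangle = -\left(\frac{\partial F}{\partial\lambda}\right)_T . \tag{4.117}
$$

Typically we will take $Q$ to be an extensive quantity. We can now define the susceptibility $\chi$ as

$$
\chi = \frac{1}{V} \frac{\partial Q}{\partial\lambda} = -\frac{1}{V} \frac{\partial^2 F}{\partial\lambda^2} . \tag{4.118}
$$

The volume factor in the denominator ensures that $\chi$ is intensive.
It is important to realize that we have assumed here that $\big[\hat H_0, \hat Q\big] = 0$, i.e. the 'bare' Hamiltonian $\hat H_0$ and the operator $\hat Q$ commute. If they do not commute, then the response functions must be computed within a proper quantum mechanical formalism, which we shall not discuss here.
Note also that we can imagine an entire family of observables $\{\hat Q_k\}$ satisfying $\big[\hat Q_k, \hat Q_{k'}\big] = 0$ and $\big[\hat H_0, \hat Q_k\big] = 0$, for all $k$ and $k'$. Then for the Hamiltonian

$$
\hat H(\vec\lambda) = \hat H_0 - \sum_k \lambda_k\, \hat Q_k , \tag{4.119}
$$

we have that

$$
Q_k(\vec\lambda, T) = \langle\hat Q_k\rangle = -\left(\frac{\partial F}{\partial\lambda_k}\right)_{T,\, N_a,\, \lambda_{k' \ne k}} \tag{4.120}
$$

and we may define an entire matrix of susceptibilities,

$$
\chi_{kl} = \frac{1}{V} \frac{\partial Q_k}{\partial\lambda_l} = -\frac{1}{V} \frac{\partial^2 F}{\partial\lambda_k\, \partial\lambda_l} . \tag{4.121}
$$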
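As an illustration of eqns. 4.117 and 4.118, consider the spin-$\frac12$ paramagnet of §4.2.5 with $\hat H_0 = 0$, $\lambda = H$ and $\hat Q = \mu_0 \sum_i \sigma_i$, for which $Z = \big(2\cosh(\beta\mu_0 H)\big)^N$ and $F = -N k_B T \ln\big(2\cosh(\beta\mu_0 H)\big)$. The sketch below differentiates $F$ numerically, with $k_B = \mu_0 = 1$ by default, and normalizes $\chi$ per spin rather than per volume (an assumption for convenience); at $H = 0$ it should reproduce the Curie form $\mu_0^2/k_B T$.

```python
import math


def free_energy(H, T, N=1, mu0=1.0, kB=1.0):
    """F = -N kB T ln(2 cosh(beta mu0 H)) for N independent spins with energies -mu0 H sigma."""
    return -N * kB * T * math.log(2.0 * math.cosh(mu0 * H / (kB * T)))


def magnetization(H, T, N=1, mu0=1.0, kB=1.0, h=1e-5):
    """Q = <Q_hat> = -dF/dH (eqn. 4.117) by a central difference."""
    return -(free_energy(H + h, T, N, mu0, kB) - free_energy(H - h, T, N, mu0, kB)) / (2 * h)


def susceptibility(H, T, N=1, mu0=1.0, kB=1.0, h=1e-4):
    """chi = -(1/N) d^2F/dH^2 (eqn. 4.118, but per spin instead of per volume)."""
    f0 = free_energy(H, T, N, mu0, kB)
    fp = free_energy(H + h, T, N, mu0, kB)
    fm = free_energy(H - h, T, N, mu0, kB)
    return -(fp - 2 * f0 + fm) / (h * h) / N
```

Here $-\partial F/\partial H$ should equal $N\mu_0 \tanh(\beta\mu_0 H)$, and the per-spin susceptibility should equal $\beta\mu_0^2\, \mathrm{sech}^2(\beta\mu_0 H)$.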

4.6 Grand Canonical Ensemble (GCE)

4.6.1 Grand canonical distribution and partition function

Consider once again the situation depicted in fig. 4.2, where a system S is in contact with a world W ,
their union U = W ∪ S being called the ‘universe’. We assume that the system’s volume VS is fixed,
but otherwise it is allowed to exchange energy and particle number with W . Hence, the system’s energy
ES and particle number NS will fluctuate. We ask what is the probability that S is in a state | n i with
energy En and particle number Nn . This is given by the ratio
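$$
\begin{aligned}
P_n &= \lim_{\Delta E \to 0} \frac{D_W(E_U - E_n,\, N_U - N_n)\, \Delta E}{D_U(E_U, N_U)\, \Delta E} \\
&= \frac{\#\text{ of states accessible to } W \text{ given that } E_S = E_n \text{ and } N_S = N_n}{\text{total }\#\text{ of states in } U} .
\end{aligned} \tag{4.122}
$$

Then

$$
\begin{aligned}
\ln P_n &= \ln D_W(E_U - E_n,\, N_U - N_n) - \ln D_U(E_U, N_U) \\
&= \ln D_W(E_U, N_U) - \ln D_U(E_U, N_U) \\
&\qquad - E_n\, \frac{\partial \ln D_W(E,N)}{\partial E}\bigg|_{\substack{E=E_U \\ N=N_U}} - N_n\, \frac{\partial \ln D_W(E,N)}{\partial N}\bigg|_{\substack{E=E_U \\ N=N_U}} + \ldots \\
&\equiv -\alpha - \beta E_n + \beta\mu N_n .
\end{aligned} \tag{4.123}
$$

The constants $\beta$ and $\mu$ are given by

$$
\beta = \frac{\partial \ln D_W(E,N)}{\partial E}\bigg|_{\substack{E=E_U \\ N=N_U}} = \frac{1}{k_B T} \tag{4.124}
$$

$$
\mu = -k_B T\, \frac{\partial \ln D_W(E,N)}{\partial N}\bigg|_{\substack{E=E_U \\ N=N_U}} . \tag{4.125}
$$

The quantity $\mu$ has dimensions of energy and is called the chemical potential. Nota bene: Some texts define the 'grand canonical Hamiltonian' $\hat K$ as

$$
\hat K \equiv \hat H - \mu\hat N . \tag{4.126}
$$

Thus, $P_n = e^{-\alpha} e^{-\beta(E_n - \mu N_n)}$. Once again, the constant $\alpha$ is fixed by the requirement that $\sum_n P_n = 1$:

$$
P_n = \frac{1}{\Xi}\, e^{-\beta(E_n - \mu N_n)} , \qquad \Xi(\beta, V, \mu) = \sum_n e^{-\beta(E_n - \mu N_n)} = \mathrm{Tr}\, e^{-\beta(\hat H - \mu\hat N)} = \mathrm{Tr}\, e^{-\beta\hat K} . \tag{4.127}
$$

Thus, the quantum mechanical grand canonical density matrix is given by

$$
\hat\varrho = \frac{e^{-\beta\hat K}}{\mathrm{Tr}\, e^{-\beta\hat K}} . \tag{4.128}
$$

Note that $\big[\hat\varrho, \hat K\big] = 0$. The quantity $\Xi(T, V, \mu)$ is called the grand partition function. It stands in relation to a corresponding free energy in the usual way:

$$
\Xi(T, V, \mu) \equiv e^{-\beta\Omega(T, V, \mu)} \quad\Longleftrightarrow\quad \Omega = -k_B T \ln \Xi , \tag{4.129}
$$

where $\Omega(T, V, \mu)$ is the grand potential, also known as the Landau free energy. The dimensionless quantity $z \equiv e^{\beta\mu}$ is called the fugacity.
If $\big[\hat H, \hat N\big] = 0$, the grand partition function may be expressed as a sum over contributions from each $N$ sector, viz.

$$
\Xi(T, V, \mu) = \sum_N e^{\beta\mu N}\, Z(T, V, N) . \tag{4.130}
$$

When there is more than one species, we have several chemical potentials $\{\mu_a\}$, and accordingly we define

$$
\hat K = \hat H - \sum_a \mu_a \hat N_a , \tag{4.131}
$$

with $\Xi = \mathrm{Tr}\, e^{-\beta\hat K}$ as before.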

4.6.2 Entropy and Gibbs-Duhem relation

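In the GCE, the Boltzmann entropy is

$$
\begin{aligned}
S &= -k_B \sum_n P_n \ln P_n \\
&= -k_B \sum_n P_n \big( \beta\Omega - \beta E_n + \beta\mu N_n \big) \\
&= -\frac{\Omega}{T} + \frac{\langle\hat H\rangle}{T} - \frac{\mu\, \langle\hat N\rangle}{T} ,
\end{aligned} \tag{4.132}
$$

which says

$$
\Omega = E - TS - \mu N , \tag{4.133}
$$

where

$$
E = \sum_n E_n P_n = \mathrm{Tr}\,\big(\hat\varrho\, \hat H\big) \tag{4.134}
$$

$$
N = \sum_n N_n P_n = \mathrm{Tr}\,\big(\hat\varrho\, \hat N\big) . \tag{4.135}
$$

Therefore, $\Omega(T,V,\mu)$ is a double Legendre transform of $E(S,V,N)$, with

$$
d\Omega = -S\, dT - p\, dV - N\, d\mu , \tag{4.136}
$$

which entails

$$
S = -\left(\frac{\partial\Omega}{\partial T}\right)_{V,\mu} , \qquad p = -\left(\frac{\partial\Omega}{\partial V}\right)_{T,\mu} , \qquad N = -\left(\frac{\partial\Omega}{\partial\mu}\right)_{T,V} . \tag{4.137}
$$

Since $\Omega(T,V,\mu)$ is an extensive quantity, we must be able to write $\Omega = V\, \omega(T,\mu)$. We identify the function $\omega(T,\mu)$ as the negative of the pressure:

$$
\begin{aligned}
\frac{\partial\Omega}{\partial V} &= -\frac{k_B T}{\Xi} \left(\frac{\partial\Xi}{\partial V}\right)_{T,\mu} = \frac{1}{\Xi} \sum_n \frac{\partial E_n}{\partial V}\, e^{-\beta(E_n - \mu N_n)} \\
&= \left\langle \frac{\partial\hat H}{\partial V} \right\rangle = -p(T,\mu) .
\end{aligned} \tag{4.138}
$$

Therefore,

$$
\Omega = -pV , \qquad p = p(T,\mu) \quad \text{(equation of state)} . \tag{4.139}
$$

This is consistent with the result from thermodynamics that $G = E - TS + pV = \mu N$. Taking the differential, we recover the Gibbs-Duhem relation,

$$
d\Omega = -S\, dT - p\, dV - N\, d\mu = -p\, dV - V\, dp \quad\Longrightarrow\quad S\, dT - V\, dp + N\, d\mu = 0 . \tag{4.140}
$$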

4.6.3 Generalized susceptibilities in the GCE

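We can appropriate the results of §4.5.8 and apply them, mutatis mutandis, to the GCE. Suppose we have a family of observables $\{\hat Q_k\}$ satisfying $\big[\hat Q_k, \hat Q_{k'}\big] = 0$, $\big[\hat H_0, \hat Q_k\big] = 0$, and $\big[\hat N_a, \hat Q_k\big] = 0$ for all $k$, $k'$, and $a$. Then for the grand canonical Hamiltonian

$$
\hat K(\vec\lambda) = \hat H_0 - \sum_a \mu_a \hat N_a - \sum_k \lambda_k\, \hat Q_k , \tag{4.141}
$$

we have that

$$
Q_k(\vec\lambda, T) = \langle\hat Q_k\rangle = -\left(\frac{\partial\Omega}{\partial\lambda_k}\right)_{T,\, \mu_a,\, \lambda_{k' \ne k}} \tag{4.142}
$$

and we may define the matrix of generalized susceptibilities,

$$
\chi_{kl} = \frac{1}{V} \frac{\partial Q_k}{\partial\lambda_l} = -\frac{1}{V} \frac{\partial^2\Omega}{\partial\lambda_k\, \partial\lambda_l} . \tag{4.143}
$$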

4.6.4 Fluctuations in the GCE

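Both energy and particle number fluctuate in the GCE. Let us compute the fluctuations in particle number. We have

$$
N = \langle\hat N\rangle = \frac{\mathrm{Tr}\, \hat N e^{-\beta(\hat H - \mu\hat N)}}{\mathrm{Tr}\, e^{-\beta(\hat H - \mu\hat N)}} = \frac{1}{\beta} \frac{\partial}{\partial\mu} \ln\Xi . \tag{4.144}
$$

Therefore,

$$
\begin{aligned}
\frac{1}{\beta} \frac{\partial N}{\partial\mu} &= \frac{\mathrm{Tr}\, \hat N^2 e^{-\beta(\hat H - \mu\hat N)}}{\mathrm{Tr}\, e^{-\beta(\hat H - \mu\hat N)}} - \left( \frac{\mathrm{Tr}\, \hat N e^{-\beta(\hat H - \mu\hat N)}}{\mathrm{Tr}\, e^{-\beta(\hat H - \mu\hat N)}} \right)^{\!2} \\
&= \big\langle \hat N^2 \big\rangle - \big\langle \hat N \big\rangle^2 .
\end{aligned} \tag{4.145}
$$

Note now that

$$
\frac{\big\langle \hat N^2 \big\rangle - \big\langle \hat N \big\rangle^2}{\big\langle \hat N \big\rangle^2} = \frac{k_B T}{N^2} \left(\frac{\partial N}{\partial\mu}\right)_{T,V} = \frac{k_B T}{V}\, \kappa_T , \tag{4.146}
$$

where $\kappa_T$ is the isothermal compressibility. Note:

$$
\begin{aligned}
\left(\frac{\partial N}{\partial\mu}\right)_{T,V} &= \frac{\partial(N,T,V)}{\partial(\mu,T,V)} = -\frac{\partial(N,T,V)}{\partial(V,T,\mu)} \\
&= -\frac{\partial(N,T,V)}{\partial(N,T,p)} \cdot \frac{\partial(N,T,p)}{\partial(V,T,p)} \cdot \overbrace{\frac{\partial(V,T,p)}{\partial(N,T,\mu)}}^{1} \cdot \frac{\partial(N,T,\mu)}{\partial(V,T,\mu)} \\
&= -\frac{N^2}{V^2} \left(\frac{\partial V}{\partial p}\right)_{T,N} = \frac{N^2}{V}\, \kappa_T .
\end{aligned} \tag{4.147}
$$

Thus,

$$
\frac{(\Delta N)_{\rm RMS}}{N} = \sqrt{\frac{k_B T\, \kappa_T}{V}} , \tag{4.148}
$$

which again scales as $V^{-1/2}$.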
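For the classical ideal gas (§4.8), the grand canonical weight of the $N$-particle sector is $x^N/N!$ with $x = V e^{\beta\mu}/\lambda_T^d$, so $N$ is Poisson distributed. Since $\kappa_T = 1/p$ for an ideal gas, eqn. 4.146 predicts $\langle(\Delta N)^2\rangle = \langle N\rangle$. The sketch below, with $k_B T = 1$ and a hypothetical truncation of the sum far into the tail, checks this and also compares the variance with $k_B T\, \partial N/\partial\mu$ from eqn. 4.145.

```python
import math


def gce_number_stats(x, nmax=None):
    """Mean and variance of N for grand canonical weights x**N / N! (classical ideal gas)."""
    if nmax is None:
        nmax = int(x + 20 * math.sqrt(x) + 50)  # hypothetical cutoff, far into the tail
    logw = [N * math.log(x) - math.lgamma(N + 1) for N in range(nmax + 1)]
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]  # proportional to the terms of Xi in eqn. 4.174
    norm = sum(w)
    mean = sum(N * wi for N, wi in enumerate(w)) / norm
    var = sum((N - mean) ** 2 * wi for N, wi in enumerate(w)) / norm
    return mean, var
```

With $x = 400$ one should find $\langle N\rangle \approx 400$ and $(\Delta N)_{\rm RMS}/N \approx 0.05$, i.e. $N^{-1/2}$.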

4.6.5 Gibbs ensemble

Let the system's particle number $N$ be fixed, but let it exchange energy and volume with the world $W$. Mutatis mutandis, we have

$$
P_n = \lim_{\Delta E \to 0} \lim_{\Delta V \to 0} \frac{D_W(E_U - E_n,\, V_U - V_n)\, \Delta E\, \Delta V}{D_U(E_U, V_U)\, \Delta E\, \Delta V} . \tag{4.149}
$$

Then

$$
\begin{aligned}
\ln P_n &= \ln D_W(E_U - E_n,\, V_U - V_n) - \ln D_U(E_U, V_U) \\
&= \ln D_W(E_U, V_U) - \ln D_U(E_U, V_U) \\
&\qquad - E_n\, \frac{\partial \ln D_W(E,V)}{\partial E}\bigg|_{\substack{E=E_U \\ V=V_U}} - V_n\, \frac{\partial \ln D_W(E,V)}{\partial V}\bigg|_{\substack{E=E_U \\ V=V_U}} + \ldots \\
&\equiv -\alpha - \beta E_n - \beta p V_n .
\end{aligned} \tag{4.150}
$$

The constants $\beta$ and $p$ are given by

$$
\beta = \frac{\partial \ln D_W(E,V)}{\partial E}\bigg|_{\substack{E=E_U \\ V=V_U}} = \frac{1}{k_B T} \tag{4.151}
$$

$$
p = k_B T\, \frac{\partial \ln D_W(E,V)}{\partial V}\bigg|_{\substack{E=E_U \\ V=V_U}} . \tag{4.152}
$$

The corresponding partition function is

$$
Y(T,p,N) = \mathrm{Tr}\, e^{-\beta(\hat H + pV)} = \frac{1}{V_0} \int_0^\infty \! dV\; e^{-\beta pV}\, Z(T,V,N) \equiv e^{-\beta G(T,p,N)} , \tag{4.153}
$$

where $V_0$ is a constant which has dimensions of volume. The factor $V_0^{-1}$ in front of the integral renders $Y$ dimensionless. Note that $G(V_0') = G(V_0) + k_B T \ln(V_0'/V_0)$, so the difference is not extensive and can be neglected in the thermodynamic limit. In other words, it doesn't matter what constant we choose for $V_0$, since it contributes subextensively to $G$. Moreover, in computing averages, the constant $V_0$ divides out in the ratio of numerator and denominator. Like the grand potential $\Omega$, the Gibbs free energy $G(T,p,N)$ is a double Legendre transform of the energy $E(S,V,N)$, viz.

$$
\begin{aligned}
G &= E - TS + pV \\
dG &= -S\, dT + V\, dp + \mu\, dN ,
\end{aligned} \tag{4.154}
$$

which entails

$$
S = -\left(\frac{\partial G}{\partial T}\right)_{p,N} , \qquad V = +\left(\frac{\partial G}{\partial p}\right)_{T,N} , \qquad \mu = +\left(\frac{\partial G}{\partial N}\right)_{T,p} . \tag{4.155}
$$
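For a classical ideal gas, $Z(T,V,N) \propto V^N$, so eqn. 4.153 gives $Y \propto (k_B T/p)^{N+1}$ and hence $\langle V\rangle = (\partial G/\partial p)_{T,N} = (N+1)\, k_B T/p$; the extra $1$ is a subextensive correction to the ideal gas law. The sketch below checks this by integrating over $V$ numerically with Simpson's rule, taking $k_B = 1$ by default; the cutoff and grid size are illustrative choices, and the $V$-independent factors ($N!$, $\lambda_T$, $V_0$) cancel between numerator and denominator.

```python
import math


def mean_volume(N, T, p, kB=1.0, n=40000):
    """<V> in the Gibbs ensemble for a classical ideal gas, with weight V**N * exp(-beta p V)."""
    beta = 1.0 / (kB * T)
    v_max = (N + 1 + 40 * math.sqrt(N + 1)) / (beta * p)  # far out in the tail of the weight
    h = v_max / n                                          # n must be even for Simpson's rule
    vols = [j * h for j in range(n + 1)]
    # log of the weight; at V = 0 it equals 1 for N = 0 and 0 otherwise
    logs = [N * math.log(v) - beta * p * v if v > 0 else (0.0 if N == 0 else -math.inf) for v in vols]
    m = max(logs)
    num = den = 0.0
    for j, (v, lw) in enumerate(zip(vols, logs)):
        w = 1 if j in (0, n) else (4 if j % 2 else 2)
        weight = math.exp(lw - m)  # math.exp(-inf) is 0.0
        num += w * v * weight
        den += w * weight
    return num / den
```

For $N = 50$, $k_B T = 1.2$ and $p = 0.7$ the result should be $51 \times 1.2/0.7 \approx 87.4$.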

4.7 Statistical Ensembles from Maximum Entropy

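The basic principle: maximize the entropy,

$$
S = -k_B \sum_n P_n \ln P_n . \tag{4.156}
$$

4.7.1 µCE

We maximize $S$ subject to the single constraint

$$
C = \sum_n P_n - 1 = 0 . \tag{4.157}
$$

We implement the constraint $C = 0$ with a Lagrange multiplier, $\bar\lambda \equiv k_B \lambda$, writing

$$
S^* = S - k_B \lambda\, C , \tag{4.158}
$$

and freely extremizing over the distribution $\{P_n\}$ and the Lagrange multiplier $\lambda$. Thus,

$$
\begin{aligned}
\delta S^* &= \delta S - k_B \lambda\, \delta C - k_B C\, \delta\lambda \\
&= -k_B \sum_n \Big[ \ln P_n + 1 + \lambda \Big]\, \delta P_n - k_B C\, \delta\lambda \equiv 0 .
\end{aligned} \tag{4.159}
$$

We conclude that $C = 0$ and that

$$
\ln P_n = -\big(1 + \lambda\big) , \tag{4.160}
$$

and we fix $\lambda$ by the normalization condition $\sum_n P_n = 1$. This gives

$$
P_n = \frac{1}{\Omega} , \qquad \Omega = \sum_n \Theta(E + \Delta E - E_n)\, \Theta(E_n - E) . \tag{4.161}
$$

Note that $\Omega$ is the number of states with energies between $E$ and $E + \Delta E$.

4.7.2 OCE

We maximize $S$ subject to the two constraints

$$
C_1 = \sum_n P_n - 1 = 0 , \qquad C_2 = \sum_n E_n P_n - E = 0 . \tag{4.162}
$$

We now have two Lagrange multipliers. We write

$$
S^* = S - k_B \sum_{j=1}^{2} \lambda_j C_j , \tag{4.163}
$$

and we freely extremize over $\{P_n\}$ and $\{\lambda_j\}$. We therefore have

$$
\begin{aligned}
\delta S^* &= \delta S - k_B \sum_n \big( \lambda_1 + \lambda_2 E_n \big)\, \delta P_n - k_B \sum_{j=1}^{2} C_j\, \delta\lambda_j \\
&= -k_B \sum_n \Big[ \ln P_n + 1 + \lambda_1 + \lambda_2 E_n \Big]\, \delta P_n - k_B \sum_{j=1}^{2} C_j\, \delta\lambda_j \equiv 0 .
\end{aligned} \tag{4.164}
$$

Thus, $C_1 = C_2 = 0$ and

$$
\ln P_n = -\big(1 + \lambda_1 + \lambda_2 E_n\big) . \tag{4.165}
$$

We define $\lambda_2 \equiv \beta$ and we fix $\lambda_1$ by normalization. This yields

$$
P_n = \frac{1}{Z}\, e^{-\beta E_n} , \qquad Z = \sum_n e^{-\beta E_n} . \tag{4.166}
$$

4.7.3 GCE

We maximize $S$ subject to the three constraints

$$
C_1 = \sum_n P_n - 1 = 0 , \qquad C_2 = \sum_n E_n P_n - E = 0 , \qquad C_3 = \sum_n N_n P_n - N = 0 . \tag{4.167}
$$

We now have three Lagrange multipliers. We write

$$
S^* = S - k_B \sum_{j=1}^{3} \lambda_j C_j , \tag{4.168}
$$

and hence

$$
\begin{aligned}
\delta S^* &= \delta S - k_B \sum_n \big( \lambda_1 + \lambda_2 E_n + \lambda_3 N_n \big)\, \delta P_n - k_B \sum_{j=1}^{3} C_j\, \delta\lambda_j \\
&= -k_B \sum_n \Big[ \ln P_n + 1 + \lambda_1 + \lambda_2 E_n + \lambda_3 N_n \Big]\, \delta P_n - k_B \sum_{j=1}^{3} C_j\, \delta\lambda_j \equiv 0 .
\end{aligned} \tag{4.169}
$$

Thus, $C_1 = C_2 = C_3 = 0$ and

$$
\ln P_n = -\big(1 + \lambda_1 + \lambda_2 E_n + \lambda_3 N_n\big) . \tag{4.170}
$$

We define $\lambda_2 \equiv \beta$ and $\lambda_3 \equiv -\beta\mu$, and we fix $\lambda_1$ by normalization. This yields

$$
P_n = \frac{1}{\Xi}\, e^{-\beta(E_n - \mu N_n)} , \qquad \Xi = \sum_n e^{-\beta(E_n - \mu N_n)} . \tag{4.171}
$$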
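A numerical check of the maximum-entropy argument of §4.7.2: find $\beta$ so that the Gibbs distribution has a prescribed mean energy, then perturb it along random directions that preserve both constraints $C_1 = C_2 = 0$ and confirm that the entropy always decreases. The four-level spectrum, the bisection bracket and the function names are hypothetical; $k_B = 1$ throughout.

```python
import math
import random


def gibbs_probs(levels, beta):
    """Canonical probabilities exp(-beta E_n) / Z (eqn. 4.166), with kB = 1."""
    e0 = min(levels)
    w = [math.exp(-beta * (e - e0)) for e in levels]
    Z = sum(w)
    return [wi / Z for wi in w]


def mean_energy(levels, beta):
    return sum(p * e for p, e in zip(gibbs_probs(levels, beta), levels))


def beta_for_energy(levels, E_target, lo=-50.0, hi=50.0, iters=200):
    """Bisection for the multiplier beta = lambda_2 such that <E> = E_target."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_energy(levels, mid) > E_target:
            lo = mid  # <E> decreases with beta, so beta must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


def entropy(P):
    return -sum(p * math.log(p) for p in P if p > 0)


def constrained_direction(levels, rng):
    """Random v with sum_n v_n = 0 and sum_n E_n v_n = 0, i.e. delta C_1 = delta C_2 = 0."""
    basis = []
    for u in ([1.0] * len(levels), list(levels)):  # Gram-Schmidt on the constraint vectors
        for b in basis:
            c = sum(x * y for x, y in zip(u, b))
            u = [x - c * y for x, y in zip(u, b)]
        norm = math.sqrt(sum(x * x for x in u))
        basis.append([x / norm for x in u])
    v = [rng.uniform(-1.0, 1.0) for _ in levels]
    for b in basis:  # project out both constraint directions
        c = sum(x * y for x, y in zip(v, b))
        v = [x - c * y for x, y in zip(v, b)]
    return v
```

Because $\ln P_n$ is linear in $E_n$ for the Gibbs distribution, the first-order change of $S$ along any such direction vanishes, and the second-order change is negative.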

4.8 Ideal Gas Statistical Mechanics

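The ordinary canonical partition function for the ideal gas was computed in eqn. 4.14. We found

$$
\begin{aligned}
Z(T,V,N) &= \frac{1}{N!} \prod_{i=1}^N \int \frac{d^d x_i\, d^d p_i}{(2\pi\hbar)^d}\, e^{-\beta p_i^2/2m} \\
&= \frac{V^N}{N!} \left( \int_{-\infty}^{\infty} \frac{dp}{2\pi\hbar}\, e^{-\beta p^2/2m} \right)^{\! Nd} \\
&= \frac{1}{N!} \left( \frac{V}{\lambda_T^d} \right)^{\! N} ,
\end{aligned} \tag{4.172}
$$

where $\lambda_T$ is the thermal wavelength:

$$
\lambda_T = \sqrt{2\pi\hbar^2 / m k_B T} . \tag{4.173}
$$

Physically, $\lambda_T$ is, up to a numerical factor of order unity, the de Broglie wavelength of a particle of mass $m$ which has a kinetic energy of $k_B T$.
In the GCE, we have

$$
\begin{aligned}
\Xi(T,V,\mu) &= \sum_{N=0}^{\infty} e^{\beta\mu N}\, Z(T,V,N) \\
&= \sum_{N=0}^{\infty} \frac{1}{N!} \left( \frac{V e^{\mu/k_B T}}{\lambda_T^d} \right)^{\! N} = \exp\left( \frac{V e^{\mu/k_B T}}{\lambda_T^d} \right) .
\end{aligned} \tag{4.174}
$$

From $\Xi = e^{-\Omega/k_B T}$, we find that the grand potential is

$$
\Omega(T,V,\mu) = -V k_B T\, e^{\mu/k_B T} \big/ \lambda_T^d . \tag{4.175}
$$

Since $\Omega = -pV$ (see §4.6.2), we have

$$
p(T,\mu) = k_B T\, \lambda_T^{-d}\, e^{\mu/k_B T} . \tag{4.176}
$$

The number density can also be calculated:

$$
n = \frac{N}{V} = -\frac{1}{V} \left(\frac{\partial\Omega}{\partial\mu}\right)_{T,V} = \lambda_T^{-d}\, e^{\mu/k_B T} . \tag{4.177}
$$

Combined, the last two equations recapitulate the ideal gas law, $pV = N k_B T$.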
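The chain from eqn. 4.175 to the ideal gas law can be verified by finite differences. This sketch assumes $d = 3$ and units with $\hbar = k_B = m = 1$ by default; it differentiates $\Omega$ with respect to $V$ and $\mu$ and compares $pV$ with $N k_B T$. The function names are illustrative.

```python
import math


def thermal_wavelength(T, m=1.0, hbar=1.0, kB=1.0):
    """Eqn. 4.173: lambda_T = sqrt(2 pi hbar^2 / (m kB T))."""
    return math.sqrt(2 * math.pi * hbar ** 2 / (m * kB * T))


def grand_potential(T, V, mu, d=3, m=1.0, hbar=1.0, kB=1.0):
    """Eqn. 4.175: Omega = -V kB T exp(mu / kB T) / lambda_T^d."""
    lam = thermal_wavelength(T, m, hbar, kB)
    return -V * kB * T * math.exp(mu / (kB * T)) / lam ** d


def pressure_and_number(T, V, mu, h=1e-6, **kw):
    """p = -(dOmega/dV)_{T,mu} and N = -(dOmega/dmu)_{T,V}, both by central differences."""
    p = -(grand_potential(T, V + h, mu, **kw) - grand_potential(T, V - h, mu, **kw)) / (2 * h)
    N = -(grand_potential(T, V, mu + h, **kw) - grand_potential(T, V, mu - h, **kw)) / (2 * h)
    return p, N
```

For example, `pressure_and_number(1.5, 10.0, -2.0)` should return a pair with $pV = N k_B T$ to within the finite-difference error.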

4.8.1 Maxwell velocity distribution

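The distribution function for momenta is given by

$$
g(\boldsymbol{p}) = \bigg\langle \frac{1}{N} \sum_{i=1}^N \delta(\boldsymbol{p}_i - \boldsymbol{p}) \bigg\rangle . \tag{4.178}
$$

Note that $g(\boldsymbol{p}) = \big\langle \delta(\boldsymbol{p}_i - \boldsymbol{p}) \big\rangle$ is the same for every particle, independent of its label $i$. We compute the average $\langle A\rangle = \mathrm{Tr}\, \big(A e^{-\beta\hat H}\big) \big/ \mathrm{Tr}\, e^{-\beta\hat H}$. Setting $i = 1$, all the integrals other than that over $\boldsymbol{p}_1$ divide out between numerator and denominator. We then have

$$
\begin{aligned}
g(\boldsymbol{p}) &= \frac{\int d^3 p_1\, \delta(\boldsymbol{p}_1 - \boldsymbol{p})\, e^{-\beta\boldsymbol{p}_1^2/2m}}{\int d^3 p_1\, e^{-\beta\boldsymbol{p}_1^2/2m}} \\
&= (2\pi m k_B T)^{-3/2}\, e^{-\beta\boldsymbol{p}^2/2m} .
\end{aligned} \tag{4.179}
$$

Textbooks commonly refer to the velocity distribution $f(\boldsymbol{v})$, which is related to $g(\boldsymbol{p})$ by

$$
f(\boldsymbol{v})\, d^3 v = g(\boldsymbol{p})\, d^3 p . \tag{4.180}
$$

Hence,

$$
f(\boldsymbol{v}) = \left( \frac{m}{2\pi k_B T} \right)^{\! 3/2} e^{-m\boldsymbol{v}^2/2k_B T} . \tag{4.181}
$$

This is known as the Maxwell velocity distribution. Note that the distributions are normalized, viz.

$$
\int d^3 p\; g(\boldsymbol{p}) = \int d^3 v\; f(\boldsymbol{v}) = 1 . \tag{4.182}
$$

If we are only interested in averaging functions of $v = |\boldsymbol{v}|$ which are isotropic, then we can define the Maxwell speed distribution, $\tilde f(v)$, as

$$
\tilde f(v) = 4\pi v^2 f(v) = 4\pi \left( \frac{m}{2\pi k_B T} \right)^{\! 3/2} v^2\, e^{-mv^2/2k_B T} . \tag{4.183}
$$

Note that $\tilde f(v)$ is normalized according to

$$
\int_0^\infty \! dv\; \tilde f(v) = 1 . \tag{4.184}
$$

Figure 4.6: Maxwell distribution of speeds $\varphi(v/v_0)$. The most probable speed is $v_{\rm MAX} = \sqrt{2}\, v_0$. The average speed is $v_{\rm AVG} = \sqrt{8/\pi}\, v_0$. The RMS speed is $v_{\rm RMS} = \sqrt{3}\, v_0$.

It is convenient to represent $v$ in units of $v_0 = \sqrt{k_B T/m}$, in which case

$$
\tilde f(v) = \frac{1}{v_0}\, \varphi(v/v_0) , \qquad \varphi(s) = \sqrt{\frac{2}{\pi}}\; s^2\, e^{-s^2/2} . \tag{4.185}
$$

The distribution $\varphi(s)$ is shown in fig. 4.6. Computing averages, we have

$$
C_k \equiv \langle s^k \rangle = \int_0^\infty \! ds\; s^k \varphi(s) = 2^{k/2} \cdot \frac{2}{\sqrt{\pi}}\, \Gamma\Big(\frac32 + \frac{k}{2}\Big) . \tag{4.186}
$$

Thus, $C_0 = 1$, $C_1 = \sqrt{8/\pi}$, $C_2 = 3$, etc. The speed averages are

$$
\langle v^k \rangle = C_k \left( \frac{k_B T}{m} \right)^{\! k/2} . \tag{4.187}
$$

Note that the average velocity is $\langle\boldsymbol{v}\rangle = 0$, but the average speed is $\langle v\rangle = \sqrt{8 k_B T/\pi m}$. The speed distribution is plotted in fig. 4.6.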
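The moments in eqn. 4.186 can be checked by direct quadrature of $\varphi(s)$. This sketch uses composite Simpson's rule on $[0, 20]$; the truncation point, grid size and function names are illustrative assumptions.

```python
import math


def phi(s):
    """Reduced Maxwell speed distribution, eqn. 4.185."""
    return math.sqrt(2 / math.pi) * s * s * math.exp(-s * s / 2)


def moment_numeric(k, s_max=20.0, n=20000):
    """Composite Simpson estimate of <s^k> = integral of s^k phi(s) over [0, s_max]."""
    h = s_max / n  # n must be even
    total = 0.0
    for j in range(n + 1):
        s = j * h
        w = 1 if j in (0, n) else (4 if j % 2 else 2)
        total += w * s ** k * phi(s)
    return total * h / 3


def moment_exact(k):
    """Eqn. 4.186: C_k = 2^{k/2} (2 / sqrt(pi)) Gamma(3/2 + k/2)."""
    return 2 ** (k / 2) * 2 / math.sqrt(math.pi) * math.gamma(1.5 + k / 2)
```

A coarse grid search for the maximum of $\varphi(s)$ should also land near $s = \sqrt2$, the most probable speed quoted in fig. 4.6.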

4.8.2 Equipartition

The Hamiltonian for ballistic (i.e. massive nonrelativistic) particles is quadratic in the individual com-
ponents of each momentum pi . There are other cases in which a classical degree of freedom appears
194 CHAPTER 4. STATISTICAL ENSEMBLES

quadratically in Ĥ as well. For example, an individual normal mode ξ of a system of coupled oscillators
has the Lagrangian
$$ L = \tfrac{1}{2}\dot\xi^2 - \tfrac{1}{2}\omega_0^2 \xi^2 , \qquad (4.188) $$
where the dimensions of $\xi$ are $[\xi] = M^{1/2} L$ by convention. The Hamiltonian for this normal mode is then
$$ \hat H = \frac{p^2}{2} + \tfrac{1}{2}\omega_0^2 \xi^2 , \qquad (4.189) $$
from which we see that both the kinetic as well as potential energy terms enter quadratically into the
Hamiltonian. The classical rotational kinetic energy is also quadratic in the angular momentum compo-
nents.
Let us compute the contribution of a single quadratic degree of freedom in Ĥ to the partition function.
We’ll call this degree of freedom ζ – it may be a position or momentum or angular momentum – and
we’ll write its contribution to Ĥ as
$$ \hat H_\zeta = \tfrac{1}{2} K \zeta^2 , \qquad (4.190) $$
where K is some constant. Integrating over ζ yields the following factor in the partition function:
$$ \int_{-\infty}^{\infty} d\zeta\, e^{-\beta K\zeta^2/2} = \left(\frac{2\pi}{\beta K}\right)^{1/2} . \qquad (4.191) $$

The contribution to the Helmholtz free energy is then
$$ \Delta F_\zeta = \tfrac{1}{2} k_B T \ln\!\left(\frac{K}{2\pi k_B T}\right) , \qquad (4.192) $$

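and therefore the contribution to the internal energy $E$ is
$$ \Delta E_\zeta = \frac{\partial}{\partial\beta}\big(\beta\, \Delta F_\zeta\big) = \frac{1}{2\beta} = \tfrac{1}{2} k_B T . \qquad (4.193) $$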
We have thus derived what is commonly called the equipartition theorem of classical statistical mechanics:

To each degree of freedom which enters the Hamiltonian quadratically is associated a contribution $\tfrac{1}{2} k_B T$ to the internal energy of the system. This results in a concomitant contribution of $\tfrac{1}{2} k_B$ to the heat capacity.
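The theorem can be checked numerically for a single quadratic degree of freedom. The sketch below works in units with $k_B = 1$ (an assumption made only for the example) and evaluates the Boltzmann-weighted average of $\tfrac{1}{2}K\zeta^2$ by midpoint quadrature. The result should be $T/2$ regardless of the stiffness $K$.

```python
import math

def mean_quadratic_energy(K, T, width=12.0, n=20_000):
    """<(1/2) K zeta^2> with Boltzmann weight exp(-K zeta^2 / 2T), units k_B = 1."""
    sigma = math.sqrt(T / K)               # rms spread of zeta
    a = width * sigma                      # integrate over [-a, a]; tails negligible
    h = 2 * a / n
    num = den = 0.0
    for i in range(n):
        z = -a + (i + 0.5) * h
        w = math.exp(-K * z * z / (2 * T))
        num += 0.5 * K * z * z * w
        den += w
    return num / den

for K in (0.1, 1.0, 50.0):
    print(K, mean_quadratic_energy(K, T=2.0))   # each should be close to T/2 = 1
```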

We now see why the internal energy of a classical ideal gas with $f$ (quadratic) degrees of freedom per molecule is $E = \tfrac{1}{2} f N k_B T$, and $C_V = \tfrac{1}{2} f N k_B$. This result also has applications in the theory of solids. The
atoms in a solid possess kinetic energy due to their motion, and potential energy due to the spring-like
interatomic potentials which tend to keep the atoms in their preferred crystalline positions. Thus, for a
three-dimensional crystal, there are six quadratic degrees of freedom (three positions and three momenta)
per atom, and the classical energy should be E = 3N kB T , and the heat capacity CV = 3N kB . As we
shall see, quantum mechanics modifies this result considerably at temperatures below the highest normal
mode (i.e. phonon) frequency, but the high temperature limit is given by the classical value CV = 3νR
(where ν = N/NA is the number of moles) derived here, known as the Dulong-Petit limit.

4.9 Selected Examples

4.9.1 Spins in an external magnetic field

Consider a system of $N_S$ spins, each of which can be either up ($\sigma = +1$) or down ($\sigma = -1$). The
Hamiltonian for this system is
$$ \hat H = -\mu_0 H \sum_{j=1}^{N_S} \sigma_j , \qquad (4.194) $$

where now we write Ĥ for the Hamiltonian, to distinguish it from the external magnetic field H, and µ0
is the magnetic moment per particle. We treat this system within the ordinary canonical ensemble. The
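partition function is
$$ Z = \sum_{\sigma_1} \cdots \sum_{\sigma_{N_S}} e^{-\beta\hat H} = \zeta^{N_S} , \qquad (4.195) $$
where $\zeta$ is the single particle partition function:
$$ \zeta = \sum_{\sigma=\pm 1} e^{\mu_0 H\sigma/k_B T} = 2\cosh\!\left(\frac{\mu_0 H}{k_B T}\right) . \qquad (4.196) $$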

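The Helmholtz free energy is then
$$ F(T,H,N_S) = -k_B T \ln Z = -N_S k_B T \ln\!\left[2\cosh\!\left(\frac{\mu_0 H}{k_B T}\right)\right] . \qquad (4.197) $$
The magnetization is
$$ M = -\left(\frac{\partial F}{\partial H}\right)_{T,N_S} = N_S\, \mu_0 \tanh\!\left(\frac{\mu_0 H}{k_B T}\right) . \qquad (4.198) $$
The energy is
$$ E = \frac{\partial}{\partial\beta}\big(\beta F\big) = -N_S\, \mu_0 H \tanh\!\left(\frac{\mu_0 H}{k_B T}\right) . \qquad (4.199) $$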
Hence, E = −HM , which we already knew, from the form of Ĥ itself.
Each spin here is independent. The probability that a given spin has polarization σ is

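$$ P_\sigma = \frac{e^{\beta\mu_0 H\sigma}}{e^{\beta\mu_0 H} + e^{-\beta\mu_0 H}} . \qquad (4.200) $$
The total probability is unity, and the average polarization is a weighted average of $\sigma = +1$ and $\sigma = -1$ contributions:
$$ P_\uparrow + P_\downarrow = 1 , \qquad \langle\sigma\rangle = P_\uparrow - P_\downarrow = \tanh\!\left(\frac{\mu_0 H}{k_B T}\right) . \qquad (4.201) $$
At low temperatures $T \ll \mu_0 H/k_B$, we have $P_\uparrow \approx 1 - e^{-2\mu_0 H/k_B T}$. At high temperatures $T \gg \mu_0 H/k_B$, the two polarizations are nearly equally likely, and $P_\sigma \approx \tfrac{1}{2}\big(1 + \sigma\mu_0 H/k_B T\big)$.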

The isothermal magnetic susceptibility is defined as
$$ \chi_T = \frac{1}{N_S}\left(\frac{\partial M}{\partial H}\right)_T = \frac{\mu_0^2}{k_B T}\, \mathrm{sech}^2\!\left(\frac{\mu_0 H}{k_B T}\right) . \qquad (4.202) $$
(Typically this is computed per unit volume rather than per particle.) At $H = 0$, we have $\chi_T = \mu_0^2/k_B T$, which is known as the Curie law.
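A minimal numerical check of eqns. 4.197 to 4.202, assuming units with $k_B = 1$ and per-spin quantities ($N_S = 1$): it recovers $M$ from a centered difference of $F$, confirms $E = -HM$, and compares $\chi_T$ at $H = 0$ with the Curie law. The function names and the sample values of $T$ and $H$ are illustrative.

```python
import math

def free_energy(T, H, mu0=1.0):
    """F per spin, eqn. 4.197 with N_S = 1 and k_B = 1."""
    return -T * math.log(2 * math.cosh(mu0 * H / T))

def magnetization(T, H, mu0=1.0):
    """M per spin, eqn. 4.198."""
    return mu0 * math.tanh(mu0 * H / T)

def energy(T, H, mu0=1.0):
    """E per spin, eqn. 4.199."""
    return -mu0 * H * math.tanh(mu0 * H / T)

def chi(T, H, mu0=1.0):
    """Susceptibility per spin, eqn. 4.202."""
    return mu0 ** 2 / T / math.cosh(mu0 * H / T) ** 2

T, H, d = 0.7, 0.3, 1e-6
M_from_F = -(free_energy(T, H + d) - free_energy(T, H - d)) / (2 * d)
chi_from_M = (magnetization(T, d) - magnetization(T, -d)) / (2 * d)

print(M_from_F, magnetization(T, H))           # agree: M = -dF/dH
print(energy(T, H), -H * magnetization(T, H))  # agree: E = -HM
print(chi_from_M, chi(T, 0.0), 1 / T)          # Curie law at H = 0
```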

Aside

The energy $E = -HM$ here is not the same quantity we discussed in our study of thermodynamics. In fact, the thermodynamic energy for this problem vanishes! Here is why. To avoid confusion, we'll need to invoke a new symbol for the thermodynamic energy, $\mathcal{E}$. Recall that the thermodynamic energy $\mathcal{E}$ is a function of extensive quantities, meaning $\mathcal{E} = \mathcal{E}(S, M, N_S)$. It is obtained from the free energy $F(T, H, N_S)$ by a double Legendre transform:
$$ \mathcal{E}(S, M, N_S) = F(T, H, N_S) + TS + HM . \qquad (4.203) $$
Now from eqn. 4.197 we derive the entropy
$$ S = -\frac{\partial F}{\partial T} = N_S k_B \ln\!\left[2\cosh\!\left(\frac{\mu_0 H}{k_B T}\right)\right] - N_S\, \frac{\mu_0 H}{T} \tanh\!\left(\frac{\mu_0 H}{k_B T}\right) . \qquad (4.204) $$
Thus, using eqns. 4.197 and 4.198, we obtain $\mathcal{E}(S, M, N_S) = 0$.

The potential confusion here arises from our use of the expression $F(T, H, N_S)$. In thermodynamics, it is the Gibbs free energy $G(T, p, N)$ which is a double Legendre transform of the energy: $G = \mathcal{E} - TS + pV$. By analogy, with magnetic systems we should perhaps write $G = \mathcal{E} - TS - HM$, but in keeping with many textbooks we shall use the symbol $F$ and refer to it as the Helmholtz free energy. The quantity we've called $E$ in eqn. 4.199 is in fact $E = \mathcal{E} - HM$, which means $\mathcal{E} = 0$. The energy $\mathcal{E}(S, M, N_S)$ vanishes here because the spins are noninteracting.
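The vanishing of the thermodynamic energy $F + TS + HM$ can also be confirmed numerically. The sketch below (units $k_B = 1$, a single spin, illustrative names) obtains $S$ and $M$ from centered differences of eqn. 4.197 and then evaluates the double Legendre transform of eqn. 4.203.

```python
import math

def free_energy(T, H, mu0=1.0):
    """F per spin, eqn. 4.197 with N_S = 1 and k_B = 1."""
    return -T * math.log(2 * math.cosh(mu0 * H / T))

def thermodynamic_energy(T, H, mu0=1.0, d=1e-6):
    """F + TS + HM, with S = -dF/dT and M = -dF/dH from centered differences."""
    S = -(free_energy(T + d, H, mu0) - free_energy(T - d, H, mu0)) / (2 * d)
    M = -(free_energy(T, H + d, mu0) - free_energy(T, H - d, mu0)) / (2 * d)
    return free_energy(T, H, mu0) + T * S + H * M

for T, H in [(0.5, 0.2), (1.0, 1.0), (3.0, -0.7)]:
    print(T, H, thermodynamic_energy(T, H))   # zero up to rounding error
```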

4.9.2 Negative temperature (!)

Consider again a system of NS spins, each of which can be either up (+) or down (−). Let Nσ be the
number of sites with spin σ, where σ = ±1. Clearly N+ + N− = NS . We now treat this system within
the microcanonical ensemble.
The energy of the system is
$$ E = -HM , \qquad (4.205) $$
where $H$ is an external magnetic field, and $M = (N_+ - N_-)\,\mu_0$ is the total magnetization. We now compute $S(E)$ by counting states in the microcanonical ensemble. The number of ways of arranging the system with $N_+$ up spins is
$$ \Omega = \binom{N_S}{N_+} , \qquad (4.206) $$

Figure 4.7: When entropy decreases with increasing energy, the temperature is negative. Typically,
kinetic degrees of freedom prevent this peculiarity from manifesting in physical systems.

hence the entropy is
$$ S = k_B \ln\Omega = -N_S k_B \Big\{ x\ln x + (1-x)\ln(1-x) \Big\} \qquad (4.207) $$

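in the thermodynamic limit: $N_S \to \infty$, $N_+ \to \infty$, with $x = N_+/N_S$ constant. Now the magnetization is $M = (N_+ - N_-)\mu_0 = (2N_+ - N_S)\mu_0$, hence if we define the maximum energy $E_0 \equiv N_S \mu_0 H$, then
$$ \frac{E}{E_0} = -\frac{M}{N_S \mu_0} = 1 - 2x \quad\Longrightarrow\quad x = \frac{E_0 - E}{2E_0} . \qquad (4.208) $$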

We therefore have
$$ S(E, N_S) = -N_S k_B \left[ \frac{E_0 - E}{2E_0} \ln\!\left(\frac{E_0 - E}{2E_0}\right) + \frac{E_0 + E}{2E_0} \ln\!\left(\frac{E_0 + E}{2E_0}\right) \right] . \qquad (4.209) $$

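We now have
$$ \frac{1}{T} = \left(\frac{\partial S}{\partial E}\right)_{N_S} = \frac{\partial S}{\partial x}\,\frac{\partial x}{\partial E} = \frac{N_S k_B}{2E_0} \ln\!\left(\frac{E_0 - E}{E_0 + E}\right) . \qquad (4.210) $$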
We see that the temperature is positive for −E0 ≤ E < 0 and is negative for 0 < E ≤ E0 .
What has gone wrong? The answer is that nothing has gone wrong – all our calculations are perfectly
correct. This system does exhibit the possibility of negative temperature. It is, however, unphysical in
that we have neglected kinetic degrees of freedom, which result in an entropy function S(E, NS ) which
is an increasing function of energy. In this system, $S(E, N_S)$ achieves a maximum of $S_{\rm max} = N_S k_B \ln 2$ at $E = 0$ (i.e. $x = \tfrac{1}{2}$), and then turns over and starts decreasing. In fact, our results are completely consistent with eqn. 4.199: the energy $E$ is an odd function of temperature. Positive energy requires negative temperature! Another example of this peculiarity is provided in the appendix in §4.11.2.
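To see that the negative-temperature branch is not an artifact of Stirling's approximation, the sketch below computes $1/T$ directly from the exact count $\ln\binom{N_S}{N_+}$ by a centered difference in $N_+$ and compares it with eqn. 4.210. Units $k_B = \mu_0 H = 1$ and the function names are assumptions of the example.

```python
import math

def S_exact(n_up, NS):
    """k_B ln binom(NS, n_up) via lgamma (k_B = 1)."""
    return math.lgamma(NS + 1) - math.lgamma(n_up + 1) - math.lgamma(NS - n_up + 1)

def inv_T_exact(n_up, NS):
    """Centered difference dS/dE; E = -(2 n_up - NS) in units mu0 H = 1,
    so going from n_up - 1 to n_up + 1 changes E by -4."""
    return (S_exact(n_up + 1, NS) - S_exact(n_up - 1, NS)) / -4.0

def inv_T_formula(E, NS):
    """Eqn. 4.210 with E0 = NS (mu0 H = 1)."""
    E0 = NS
    return NS / (2 * E0) * math.log((E0 - E) / (E0 + E))

NS = 10_000
for n_up in (2_000, 5_000, 8_000):
    E = -(2 * n_up - NS)
    print(n_up, E, inv_T_exact(n_up, NS), inv_T_formula(E, NS))
```

With mostly down spins ($E > 0$) both columns give $1/T < 0$; with mostly up spins ($E < 0$) they give $1/T > 0$.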

4.9.3 Adsorption

PROBLEM: A surface containing NS adsorption sites is in equilibrium with a monatomic ideal gas. Atoms
adsorbed on the surface have an energy −∆ and no kinetic energy. Each adsorption site can accommodate
at most one atom. Calculate the fraction f of occupied adsorption sites as a function of the gas density
n, the temperature T , the binding energy ∆, and physical constants.
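SOLUTION: The grand partition function for the surface is
$$ \Xi_{\rm surf} = e^{-\Omega_{\rm surf}/k_B T} = \sum_{j=0}^{N_S} \binom{N_S}{j} e^{j(\mu+\Delta)/k_B T} = \Big(1 + e^{\mu/k_B T}\, e^{\Delta/k_B T}\Big)^{N_S} . \qquad (4.211) $$
The fraction of occupied sites is
$$ f = \frac{\langle \hat N_{\rm surf}\rangle}{N_S} = -\frac{1}{N_S}\frac{\partial\Omega_{\rm surf}}{\partial\mu} = \frac{e^{\mu/k_B T}}{e^{\mu/k_B T} + e^{-\Delta/k_B T}} . \qquad (4.212) $$
Since the surface is in equilibrium with the gas, its fugacity $z = \exp(\mu/k_B T)$ and temperature $T$ are the same as in the gas.
For a monatomic ideal gas, the single particle partition function is $\zeta = V\lambda_T^{-3}$, where $\lambda_T = \sqrt{2\pi\hbar^2/mk_B T}$ is the thermal wavelength. Thus, the grand partition function, for indistinguishable particles, is
$$ \Xi_{\rm gas} = \exp\!\Big(V\lambda_T^{-3}\, e^{\mu/k_B T}\Big) . \qquad (4.213) $$
The gas density is
$$ n = \frac{\langle \hat N_{\rm gas}\rangle}{V} = -\frac{1}{V}\frac{\partial\Omega_{\rm gas}}{\partial\mu} = \lambda_T^{-3}\, e^{\mu/k_B T} . \qquad (4.214) $$
We can now solve for the fugacity: $z = e^{\mu/k_B T} = n\lambda_T^3$. Thus, the fraction of occupied adsorption sites is
$$ f = \frac{n\lambda_T^3}{n\lambda_T^3 + e^{-\Delta/k_B T}} . \qquad (4.215) $$
Interestingly, the solution for $f$ involves the constant $\hbar$.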
It is always advisable to check that the solution makes sense in various limits. First of all, if the gas
density tends to zero at fixed T and ∆, we have f → 0. On the other hand, if n → ∞ we have f → 1,
which also makes sense. At fixed n and T , if the adsorption energy is (−∆) → −∞, then once again
f = 1 since every adsorption site wants to be occupied. Conversely, taking (−∆) → +∞ results in n → 0,
since the energetic cost of adsorption is infinitely high.
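Eqn. 4.215 is easy to evaluate in SI units. The sketch below uses CODATA values of $\hbar$, $k_B$, the electron-volt and the atomic mass unit. The particular mass, temperature, binding energy and densities in the example (roughly an argon-like atom with $\Delta = 0.1$ eV at 100 K) are hypothetical parameters chosen for illustration, not data from the notes.

```python
import math

HBAR = 1.054571817e-34   # J s
K_B = 1.380649e-23       # J / K
EV = 1.602176634e-19     # J
AMU = 1.66053906660e-27  # kg

def thermal_wavelength(m, T):
    """lambda_T = sqrt(2 pi hbar^2 / m k_B T), in metres."""
    return math.sqrt(2 * math.pi * HBAR ** 2 / (m * K_B * T))

def occupied_fraction(n, T, Delta, m):
    """Eqn. 4.215: f = n lam^3 / (n lam^3 + exp(-Delta / k_B T))."""
    z = n * thermal_wavelength(m, T) ** 3        # fugacity of the gas
    return z / (z + math.exp(-Delta / (K_B * T)))

m, T, Delta = 40 * AMU, 100.0, 0.1 * EV          # hypothetical parameters
for n in (1e20, 1e25, 1e30):                     # gas densities in m^-3
    print(n, occupied_fraction(n, T, Delta, m))
```

Increasing $n$ (or $\Delta$) drives $f$ from near 0 toward 1, in line with the limits just discussed.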

4.9.4 Elasticity of wool

Wool consists of interlocking protein molecules which can stretch into an elongated configuration, but
reversibly so. This feature gives wool its very useful elasticity. Let us model a chain of these proteins

Figure 4.8: The monomers in wool are modeled as existing in one of two states. The low energy unde-
formed state is A, and the higher energy deformed state is B. Applying tension induces more monomers
to enter the B state.

by assuming they can exist in one of two states, which we will call A and B, with energies εA and εB
and lengths ℓA and ℓB . The situation is depicted in fig. 4.8. We model these conformational degrees of
freedom by a spin variable σ = ±1 for each molecule, where σ = +1 in the A state and σ = −1 in the B
state. Suppose a chain consisting of N monomers is placed under a tension τ . We then have

$$ \hat H = \sum_{j=1}^{N} \Big[\varepsilon_A\, \delta_{\sigma_j,+1} + \varepsilon_B\, \delta_{\sigma_j,-1}\Big] . \qquad (4.216) $$

Similarly, the length is


$$ \hat L = \sum_{j=1}^{N} \Big[\ell_A\, \delta_{\sigma_j,+1} + \ell_B\, \delta_{\sigma_j,-1}\Big] . \qquad (4.217) $$

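The Gibbs partition function is $Y = \mathrm{Tr}\, e^{-\hat K/k_B T}$, with $\hat K = \hat H - \tau\hat L$:
$$ \hat K = \sum_{j=1}^{N} \Big[\tilde\varepsilon_A\, \delta_{\sigma_j,+1} + \tilde\varepsilon_B\, \delta_{\sigma_j,-1}\Big] , \qquad (4.218) $$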

where ε̃A ≡ εA − τ ℓA and ε̃B ≡ εB − τ ℓB . At τ = 0 the A state is preferred for each monomer, but when
τ exceeds τ ∗ , defined by the relation ε̃A = ε̃B , the B state is preferred. One finds

$$ \tau^* = \frac{\varepsilon_B - \varepsilon_A}{\ell_B - \ell_A} . \qquad (4.219) $$

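Once again, we have a set of $N$ noninteracting spins. The partition function is $Y = \zeta^N$, where $\zeta$ is the single monomer partition function, $\zeta = \mathrm{Tr}\, e^{-\beta\hat h}$, where
$$ \hat h = \tilde\varepsilon_A\, \delta_{\sigma,+1} + \tilde\varepsilon_B\, \delta_{\sigma,-1} \qquad (4.220) $$
is the single “spin” Hamiltonian. Thus,
$$ \zeta = \mathrm{Tr}\, e^{-\beta\hat h} = e^{-\beta\tilde\varepsilon_A} + e^{-\beta\tilde\varepsilon_B} . \qquad (4.221) $$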

It is convenient to define the differences

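$$ \Delta\varepsilon = \varepsilon_B - \varepsilon_A , \qquad \Delta\ell = \ell_B - \ell_A , \qquad \Delta\tilde\varepsilon = \tilde\varepsilon_B - \tilde\varepsilon_A , \qquad (4.222) $$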



Figure 4.9: Upper panel: length L(τ, T ) for kB T /ε̃ = 0.01 (blue), 0.1 (green), 0.5 (dark red), and 1.0
(red). Bottom panel: dimensionless force constant k/N (∆ℓ)2 versus temperature.

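in which case the partition function $Y$ and the Gibbs free energy $G = -k_B T \ln Y$ are
$$ Y(T,\tau,N) = e^{-N\beta\tilde\varepsilon_A} \Big[1 + e^{-\beta\Delta\tilde\varepsilon}\Big]^N \qquad (4.223) $$
$$ G(T,\tau,N) = N\tilde\varepsilon_A - N k_B T \ln\!\Big[1 + e^{-\Delta\tilde\varepsilon/k_B T}\Big] \qquad (4.224) $$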

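The average length is
$$ L = \langle\hat L\rangle = -\left(\frac{\partial G}{\partial\tau}\right)_{T,N} = N\ell_A + \frac{N\,\Delta\ell}{e^{(\Delta\varepsilon - \tau\Delta\ell)/k_B T} + 1} . \qquad (4.225) $$
The polymer behaves as a spring, and for small $\tau$ the spring constant is
$$ k = \left(\frac{\partial\tau}{\partial L}\right)_{\tau=0} = \frac{4k_B T}{N(\Delta\ell)^2} \cosh^2\!\left(\frac{\Delta\varepsilon}{2k_B T}\right) . \qquad (4.226) $$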
The results are shown in fig. 4.9. Note that length increases with temperature for τ < τ ∗ and decreases
with temperature for $\tau > \tau^*$. Note also that $k$ diverges at both low and high temperatures. At low $T$ (with $\tau < \tau^*$), the energy gap $\Delta\varepsilon$ dominates and $L = N\ell_A$, while at high temperatures $k_B T$ dominates and $L = \tfrac{1}{2} N(\ell_A + \ell_B)$.
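A minimal sketch of eqns. 4.225 and 4.226 in units with $k_B = 1$. The monomer parameters below are hypothetical. It checks the spring constant against a numerical derivative of $L(\tau)$ and the low- and high-temperature limits of the length at $\tau = 0$.

```python
import math

def length(tau, T, N, eA, eB, lA, lB):
    """Mean chain length, eqn. 4.225 (k_B = 1)."""
    de, dl = eB - eA, lB - lA
    return N * lA + N * dl / (math.exp((de - tau * dl) / T) + 1)

def spring_constant(T, N, eA, eB, lA, lB):
    """k = (d tau / d L) at tau = 0, eqn. 4.226."""
    de, dl = eB - eA, lB - lA
    return 4 * T * math.cosh(de / (2 * T)) ** 2 / (N * dl ** 2)

params = dict(N=100, eA=0.0, eB=1.0, lA=1.0, lB=2.0)   # hypothetical monomer
T, d = 0.5, 1e-6
dL_dtau = (length(d, T, **params) - length(-d, T, **params)) / (2 * d)
print(1 / dL_dtau, spring_constant(T, **params))          # should agree
print(length(0.0, 0.01, **params), length(0.0, 1000.0, **params))  # ~N lA, ~N(lA+lB)/2
```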

4.9.5 Noninteracting spin dimers

Consider a system of noninteracting spin dimers as depicted in fig. 4.10. Each dimer contains two spins,
and is described by the Hamiltonian

Ĥdimer = −J σ1 σ2 − µ0 H (σ1 + σ2 ) . (4.227)

Here, J is an interaction energy between the spins which comprise the dimer. If J > 0 the interaction is
ferromagnetic, which prefers that the spins be aligned. That is, the lowest energy states are $|{\uparrow\uparrow}\rangle$ and $|{\downarrow\downarrow}\rangle$. If $J < 0$ the interaction is antiferromagnetic, which prefers that spins be anti-aligned: $|{\uparrow\downarrow}\rangle$ and $|{\downarrow\uparrow}\rangle$.⁹
Suppose there are $N_d$ dimers. Then the OCE partition function is $Z = \zeta^{N_d}$, where $\zeta(T, H)$ is the single dimer partition function. To obtain $\zeta(T, H)$, we sum over the four possible states of the two spins, obtaining
$$ \zeta = \mathrm{Tr}\, e^{-\hat H_{\rm dimer}/k_B T} = 2\, e^{-J/k_B T} + 2\, e^{J/k_B T} \cosh\!\left(\frac{2\mu_0 H}{k_B T}\right) . $$
Thus, the free energy is
$$ F(T, H, N_d) = -N_d k_B T \ln 2 - N_d k_B T \ln\!\left[e^{-J/k_B T} + e^{J/k_B T}\cosh\!\left(\frac{2\mu_0 H}{k_B T}\right)\right] . \qquad (4.228) $$

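The magnetization is
$$ M = -\left(\frac{\partial F}{\partial H}\right)_{T,N_d} = 2N_d\,\mu_0 \cdot \frac{e^{J/k_B T}\sinh\!\left(\frac{2\mu_0 H}{k_B T}\right)}{e^{-J/k_B T} + e^{J/k_B T}\cosh\!\left(\frac{2\mu_0 H}{k_B T}\right)} . \qquad (4.229) $$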

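It is instructive to consider the zero field isothermal susceptibility per spin,
$$ \chi_T = \frac{1}{2N_d}\left.\frac{\partial M}{\partial H}\right|_{H=0} = \frac{\mu_0^2}{k_B T} \cdot \frac{2\, e^{J/k_B T}}{e^{J/k_B T} + e^{-J/k_B T}} . \qquad (4.230) $$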

The quantity µ20 /kB T is simply the Curie susceptibility for noninteracting classical spins. Note that we
correctly recover the Curie result when J = 0, since then the individual spins comprising each dimer are
in fact noninteracting. For the ferromagnetic case, if J ≫ kB T , then we obtain
$$ \chi_T(J \gg k_B T) \approx \frac{2\mu_0^2}{k_B T} . \qquad (4.231) $$
This has the following simple interpretation. When J ≫ kB T , the spins of each dimer are effectively
locked in parallel. Thus, each dimer has an effective magnetic moment µeff = 2µ0 . On the other hand,
there are only half as many dimers as there are spins, so the resulting Curie susceptibility per spin is $\tfrac{1}{2} \times (2\mu_0)^2/k_B T$.
9
Nota bene: we are concerned with classical spin configurations only; there is no superposition of states allowed in this model!

Figure 4.10: A model of noninteracting spin dimers on a lattice. Each red dot represents a classical spin
for which σj = ±1.

When −J ≫ kB T , the spins of each dimer are effectively locked in one of the two antiparallel configura-
tions. We then have
$$ \chi_T(-J \gg k_B T) \approx \frac{2\mu_0^2}{k_B T}\, e^{-2|J|/k_B T} . \qquad (4.232) $$
In this case, the individual dimers have essentially zero magnetic moment.
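The dimer results can be checked by brute force. The sketch below (units $k_B = 1$, illustrative names) sums $e^{-\hat H_{\rm dimer}/T}$ over the four classical states, takes a second difference of $\ln\zeta$ in $H$ to get the susceptibility per spin, and compares with eqn. 4.230 and its ferromagnetic limit.

```python
import math
from itertools import product

def zeta_bruteforce(T, H, J, mu0=1.0):
    """Single-dimer partition function by explicit sum over (s1, s2) = +-1."""
    z = 0.0
    for s1, s2 in product((1, -1), repeat=2):
        energy = -J * s1 * s2 - mu0 * H * (s1 + s2)
        z += math.exp(-energy / T)
    return z

def chi_closed(T, J, mu0=1.0):
    """Zero-field susceptibility per spin, eqn. 4.230."""
    return mu0 ** 2 / T * 2 * math.exp(J / T) / (math.exp(J / T) + math.exp(-J / T))

def chi_numeric(T, J, mu0=1.0, dH=1e-4):
    """chi per spin = (T/2) d^2 ln(zeta)/dH^2 at H = 0 (M per dimer = T d ln(zeta)/dH)."""
    lnz = lambda h: math.log(zeta_bruteforce(T, h, J, mu0))
    return 0.5 * T * (lnz(dH) - 2 * lnz(0.0) + lnz(-dH)) / dH ** 2

for J in (0.0, 1.0, -1.0, 5.0):
    print(J, chi_numeric(0.5, J), chi_closed(0.5, J))
```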

4.10 Statistical Mechanics of Molecular Gases

4.10.1 Separation of translational and internal degrees of freedom

The states of a noninteracting atom or molecule are labeled by its total momentum p and its internal
quantum numbers, which we will simply write with a collective index α, specifying rotational, vibrational,
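and electronic degrees of freedom. The single particle Hamiltonian is then
$$ \hat h = \frac{\mathbf{p}^2}{2m} + \hat h_{\rm int} , \qquad (4.233) $$
with
$$ \hat h\, \big|\mathbf{k}, \alpha\big\rangle = \left(\frac{\hbar^2\mathbf{k}^2}{2m} + \varepsilon_\alpha\right) \big|\mathbf{k}, \alpha\big\rangle . \qquad (4.234) $$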
The partition function is
$$ \zeta = \mathrm{Tr}\, e^{-\beta\hat h} = \sum_{\mathbf{p}} e^{-\beta\mathbf{p}^2/2m} \sum_j g_j\, e^{-\beta\varepsilon_j} . \qquad (4.235) $$
Here we have replaced the internal label α with a label j of energy eigenvalues, with gj being the
degeneracy of the internal state with energy εj . To do the p sum, we quantize in a box of dimensions
$L_1 \times L_2 \times \cdots \times L_d$, using periodic boundary conditions. Then
$$ \mathbf{p} = \left(\frac{2\pi\hbar n_1}{L_1}, \frac{2\pi\hbar n_2}{L_2}, \ldots, \frac{2\pi\hbar n_d}{L_d}\right) , \qquad (4.236) $$

where each ni is an integer. Since the differences between neighboring quantized p vectors are very tiny,
we can replace the sum over p by an integral:
$$ \sum_{\mathbf{p}} \longrightarrow \int \frac{d^dp}{\Delta p_1 \cdots \Delta p_d} \qquad (4.237) $$

where the volume in momentum space of an elementary rectangle is


$$ \Delta p_1 \cdots \Delta p_d = \frac{(2\pi\hbar)^d}{L_1 \cdots L_d} = \frac{(2\pi\hbar)^d}{V} . \qquad (4.238) $$
Thus,
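$$ \zeta = V \int \frac{d^dp}{(2\pi\hbar)^d}\, e^{-\mathbf{p}^2/2mk_B T} \sum_j g_j\, e^{-\varepsilon_j/k_B T} = V\lambda_T^{-d}\, \xi \qquad (4.239) $$
$$ \xi(T) = \sum_j g_j\, e^{-\varepsilon_j/k_B T} . \qquad (4.240) $$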

Here, ξ(T ) is the internal coordinate partition function. The full N -particle ordinary canonical partition
function is then
$$ Z_N = \frac{1}{N!}\left(\frac{V}{\lambda_T^d}\right)^{N} \xi^N(T) . \qquad (4.241) $$
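Using Stirling’s approximation, we find the Helmholtz free energy $F = -k_B T \ln Z$ is
$$ \begin{aligned} F(T, V, N) &= -N k_B T \left[\ln\!\left(\frac{V}{N\lambda_T^d}\right) + 1 + \ln\xi(T)\right] \\ &= -N k_B T \left[\ln\!\left(\frac{V}{N\lambda_T^d}\right) + 1\right] + N\varphi(T) , \end{aligned} \qquad (4.242) $$
where
$$ \varphi(T) = -k_B T \ln\xi(T) \qquad (4.243) $$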
is the internal coordinate contribution to the single particle free energy. We could also compute the
partition function in the Gibbs (T, p, N ) ensemble:
$$ Y(T, p, N) = e^{-\beta G(T,p,N)} = \frac{1}{V_0}\int_0^\infty dV\, e^{-\beta pV} Z(T, V, N) = \left(\frac{k_B T}{pV_0}\right)\left(\frac{k_B T}{p\lambda_T^d}\right)^{N} \xi^N(T) . \qquad (4.244) $$
Thus, in the thermodynamic limit,
$$ \mu(T, p) = \frac{G(T, p, N)}{N} = k_B T \ln\!\left(\frac{p\lambda_T^d}{k_B T}\right) - k_B T \ln\xi(T) = k_B T \ln\!\left(\frac{p\lambda_T^d}{k_B T}\right) + \varphi(T) . \qquad (4.245) $$

4.10.2 Ideal gas law

Since the internal coordinate contribution to the free energy is volume-independent, we have
$$ V = \left(\frac{\partial G}{\partial p}\right)_{T,N} = \frac{N k_B T}{p} , \qquad (4.246) $$

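and the ideal gas law applies. The entropy is
$$ S = -\left(\frac{\partial G}{\partial T}\right)_{p,N} = N k_B\left[\ln\!\left(\frac{k_B T}{p\lambda_T^d}\right) + 1 + \tfrac{1}{2}d\right] - N\varphi'(T) , \qquad (4.247) $$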

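and therefore the heat capacities are
$$ C_p = T\left(\frac{\partial S}{\partial T}\right)_{p,N} = \left(\tfrac{1}{2}d + 1\right) N k_B - N T \varphi''(T) \qquad (4.248) $$
$$ C_V = T\left(\frac{\partial S}{\partial T}\right)_{V,N} = \tfrac{1}{2} d N k_B - N T \varphi''(T) . \qquad (4.249) $$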

Thus, any temperature variation in Cp must be due to the internal degrees of freedom.

4.10.3 The internal coordinate partition function

At energy scales of interest we can separate the internal degrees of freedom into distinct classes, writing

ĥint = ĥrot + ĥvib + ĥelec (4.250)

as a sum over internal Hamiltonians governing rotational, vibrational, and electronic degrees of freedom.
Then
ξint = ξrot · ξvib · ξelec . (4.251)

Associated with each class of excitation is a characteristic temperature $\Theta$. Rotational and vibrational temperatures of a few common molecules are listed in Table 4.1.

4.10.4 Rotations

Consider a class of molecules which can be approximated as an axisymmetric top. The rotational Hamil-
tonian is then

$$ \hat h_{\rm rot} = \frac{L_a^2 + L_b^2}{2I_1} + \frac{L_c^2}{2I_3} = \frac{\hbar^2 L(L+1)}{2I_1} + \left(\frac{1}{2I_3} - \frac{1}{2I_1}\right) L_c^2 , \qquad (4.252) $$

molecule Θrot (K) Θvib (K)


H2 85.4 6100
N2 2.86 3340
H2 O 13.7 , 21.0 , 39.4 2290 , 5180 , 5400

Table 4.1: Some rotational and vibrational temperatures of common molecules.

where $\hat{\mathbf{n}}_{a,b,c}(t)$ are the principal axes, with $\hat{\mathbf{n}}_c$ the symmetry axis, and $L_{a,b,c}$ are the components of the angular momentum vector $\mathbf{L}$ about these instantaneous body-fixed principal axes. The components of $\mathbf{L}$ along space-fixed axes $\{x, y, z\}$ are written as $L_{x,y,z}$. Note that
$$ \big[L^\mu, L_c\big] = n_c^\nu \big[L^\mu, L^\nu\big] + \big[L^\mu, n_c^\nu\big] L^\nu = i\epsilon_{\mu\nu\lambda}\, n_c^\nu L^\lambda + i\epsilon_{\mu\nu\lambda}\, n_c^\lambda L^\nu = 0 , \qquad (4.253) $$
which is equivalent to the statement that $L_c = \hat{\mathbf{n}}_c \cdot \mathbf{L}$ is a rotational scalar. We can therefore simultaneously specify the eigenvalues of $\{\mathbf{L}^2, L_z, L_c\}$, which form a complete set of commuting observables (CSCO).¹⁰ The eigenvalues of $L_z$ are $m\hbar$ with $m \in \{-L, \ldots, L\}$, while those of $L_c$ are $k\hbar$ with $k \in \{-L, \ldots, L\}$. There is a $(2L+1)$-fold degeneracy associated with the $L_z$ quantum number.
We assume the molecule is prolate, so that $I_3 < I_1$. We can then define two temperature scales,
$$ \Theta = \frac{\hbar^2}{2I_1 k_B} , \qquad \tilde\Theta = \frac{\hbar^2}{2I_3 k_B} . \qquad (4.254) $$
Prolateness then means $\tilde\Theta > \Theta$. We conclude that the rotational partition function for an axisymmetric molecule is given by

$$ \xi_{\rm rot}(T) = \sum_{L=0}^{\infty} (2L+1)\, e^{-L(L+1)\Theta/T} \sum_{k=-L}^{L} e^{-k^2(\tilde\Theta - \Theta)/T} \qquad (4.255) $$

In diatomic molecules, $I_3$ is extremely small, and $\tilde\Theta \gg T$ at all relevant temperatures. Only the $k = 0$ term contributes to the partition sum, and we have

$$ \xi_{\rm rot}(T) = \sum_{L=0}^{\infty} (2L+1)\, e^{-L(L+1)\Theta/T} . \qquad (4.256) $$

When T ≪ Θ, only the first few terms contribute, and


$$ \xi_{\rm rot}(T) = 1 + 3\, e^{-2\Theta/T} + 5\, e^{-6\Theta/T} + \ldots \qquad (4.257) $$
In the high temperature limit, we have a slowly varying summand. The Euler-Maclaurin summation formula may be used to evaluate such a series:
$$ \sum_{k=0}^{n} F_k = \int_0^n dk\, F(k) + \tfrac{1}{2}\big[F(0) + F(n)\big] + \sum_{j=1}^{\infty} \frac{B_{2j}}{(2j)!}\Big[F^{(2j-1)}(n) - F^{(2j-1)}(0)\Big] \qquad (4.258) $$

10
Note that while we cannot simultaneously specify the eigenvalues of two components of L along axes fixed in space,
we can simultaneously specify the components of L along one axis fixed in space and one axis rotating with a body. See
Landau and Lifshitz, Quantum Mechanics, §103.

where $B_j$ is the $j^{\rm th}$ Bernoulli number:
$$ B_0 = 1 , \quad B_1 = -\tfrac{1}{2} , \quad B_2 = \tfrac{1}{6} , \quad B_4 = -\tfrac{1}{30} , \quad B_6 = \tfrac{1}{42} . \qquad (4.259) $$

Thus,

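$$ \sum_{k=0}^{\infty} F_k = \int_0^\infty dx\, F(x) + \tfrac{1}{2} F(0) - \tfrac{1}{12} F'(0) + \tfrac{1}{720} F'''(0) + \ldots \qquad (4.260) $$
We have $F(x) = (2x+1)\, e^{-x(x+1)\Theta/T}$, for which $\int_0^\infty dx\, F(x) = T/\Theta$, hence
$$ \xi_{\rm rot} = \frac{T}{\Theta} + \frac{1}{3} + \frac{1}{15}\frac{\Theta}{T} + \frac{4}{315}\left(\frac{\Theta}{T}\right)^2 + \ldots \qquad (4.261) $$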
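The high temperature series 4.261 can be compared against the exact sum 4.256. In the sketch below $t = T/\Theta$. The cutoff `L_max` is a heuristic of the example (every omitted term carries a factor of at most $e^{-100}$), not part of the notes.

```python
import math

def xi_rot_sum(t):
    """Exact rotational partition function, eqn. 4.256, with t = T / Theta."""
    L_max = int(10 * math.sqrt(t)) + 20      # omitted terms carry exp(-L(L+1)/t) < e^-100
    return sum((2 * L + 1) * math.exp(-L * (L + 1) / t) for L in range(L_max + 1))

def xi_rot_series(t):
    """Euler-Maclaurin high temperature series, eqn. 4.261, through (Theta/T)^2."""
    a = 1.0 / t
    return t + 1 / 3 + a / 15 + 4 * a * a / 315

for t in (2.0, 5.0, 10.0, 50.0):
    print(t, xi_rot_sum(t), xi_rot_series(t))
```

The discrepancy shrinks like $(\Theta/T)^3$ as $T$ grows, while at low temperature the sum reduces to the few terms of eqn. 4.257.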

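Recall that $\varphi(T) = -k_B T \ln\xi(T)$. We conclude that $\varphi_{\rm rot}(T) \approx -3k_B T\, e^{-2\Theta/T}$ for $T \ll \Theta$ and $\varphi_{\rm rot}(T) \approx -k_B T \ln(T/\Theta)$ for $T \gg \Theta$. We have seen that the internal coordinate contribution to the heat capacity is $\Delta C_V = -N T \varphi''(T)$. For diatomic molecules, then, this contribution is exponentially suppressed for $T \ll \Theta$, while for high temperatures we have $\Delta C_V = N k_B$. One says that the rotational excitations are ‘frozen out’ at temperatures much below $\Theta$. Including the first few terms, we have
$$ \Delta C_V(T \ll \Theta) = 12\, N k_B \left(\frac{\Theta}{T}\right)^2 e^{-2\Theta/T} + \ldots \qquad (4.262) $$
$$ \Delta C_V(T \gg \Theta) = N k_B \left\{ 1 + \frac{1}{45}\left(\frac{\Theta}{T}\right)^2 + \frac{16}{945}\left(\frac{\Theta}{T}\right)^3 + \ldots \right\} . \qquad (4.263) $$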
45 T 945 T

Note that CV overshoots its limiting value of N kB and asymptotically approaches it from above.
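The overshoot can be seen numerically without any expansion. The sketch below (units $k_B = 1$, $t = T/\Theta$, illustrative names) computes $\Delta C_V/Nk_B$ from the energy-fluctuation formula $C = (\langle E^2\rangle - \langle E\rangle^2)/k_B T^2$, which for independent molecules is equivalent to $-NT\varphi''(T)$, and compares it with eqns. 4.262 and 4.263.

```python
import math

def rot_heat_capacity(t):
    """Delta C_V / (N k_B) for a heteronuclear diatomic, t = T / Theta.
    Uses C/k_B = Var(e) / t^2 with level energies e_L = L(L+1) in units of k_B Theta."""
    L_max = int(10 * math.sqrt(t)) + 20
    Z = E1 = E2 = 0.0
    for L in range(L_max + 1):
        e = L * (L + 1)
        w = (2 * L + 1) * math.exp(-e / t)
        Z += w
        E1 += e * w
        E2 += e * e * w
    E1, E2 = E1 / Z, E2 / Z
    return (E2 - E1 * E1) / t ** 2

def low_T(t):
    return 12 * math.exp(-2 / t) / t ** 2                      # eqn. 4.262

def high_T(t):
    return 1 + (1 / t) ** 2 / 45 + 16 * (1 / t) ** 3 / 945     # eqn. 4.263

peak = max(rot_heat_capacity(0.01 * i) for i in range(30, 300))
print(rot_heat_capacity(0.2), low_T(0.2))
print(rot_heat_capacity(10.0), high_T(10.0))
print(peak)            # exceeds 1: the heat capacity overshoots N k_B
```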
Special care must be taken in the case of homonuclear diatomic molecules, for then only even or odd L
states are allowed, depending on the total nuclear spin. This is discussed below in §4.10.7.
For polyatomic molecules, the moments of inertia generally are large enough that the molecule’s rotations
can be considered classically. We then have

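$$ \varepsilon(L_a, L_b, L_c) = \frac{L_a^2}{2I_1} + \frac{L_b^2}{2I_2} + \frac{L_c^2}{2I_3} . \qquad (4.264) $$
The classical rotational partition function is then
$$ \xi_{\rm rot}(T) = \frac{1}{g_{\rm rot}} \int \frac{dL_a\, dL_b\, dL_c\, d\phi\, d\theta\, d\psi\, \sin\theta}{(2\pi\hbar)^3}\, e^{-\varepsilon(L_a, L_b, L_c)/k_B T} , \qquad (4.265) $$
where $(\phi, \theta, \psi)$ are the Euler angles, and the factor $\sin\theta$ is the Jacobian relating the body-frame components $L_{a,b,c}$ to the momenta canonically conjugate to the Euler angles. Recall $\phi \in [0, 2\pi]$, $\theta \in [0, \pi]$, and $\psi \in [0, 2\pi]$. The factor $g_{\rm rot}$ accounts for physically indistinguishable orientations of the molecule brought about by rotations, which can happen when more than one of the nuclei is the same. We then have
$$ \xi_{\rm rot}(T) = \frac{1}{g_{\rm rot}} \left(\frac{2k_B T}{\hbar^2}\right)^{3/2} \sqrt{\pi I_1 I_2 I_3} . \qquad (4.266) $$
This leads to $\Delta C_V = \tfrac{3}{2} N k_B$.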



4.10.5 Vibrations

Vibrational frequencies are often given in units of inverse wavelength, such as cm$^{-1}$, called a wavenumber. To convert to a temperature scale $T^*$, we write $k_B T^* = h\nu = hc/\lambda$, hence $T^* = (hc/k_B)\,\lambda^{-1}$, and we multiply by
$$ \frac{hc}{k_B} = 1.439\ {\rm K\cdot cm} . \qquad (4.267) $$
For example, infrared absorption ($\sim 50\ {\rm cm}^{-1}$ to $10^4\ {\rm cm}^{-1}$) reveals that the ‘asymmetric stretch’ mode of the H$_2$O molecule has a vibrational frequency of $\nu = 3756\ {\rm cm}^{-1}$. The corresponding temperature scale is $T^* \approx 5404$ K.
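The conversion factor follows from the SI-defined values of $h$, $c$ and $k_B$. The short sketch below (names are illustrative) gives $hc/k_B \approx 1.4388$ K·cm, and hence $T^* \approx 5404$ K for the 3756 cm$^{-1}$ asymmetric stretch of H$_2$O.

```python
H_PLANCK = 6.62607015e-34   # J s (exact in the 2019 SI)
C_LIGHT = 2.99792458e10     # cm / s (exact)
K_B = 1.380649e-23          # J / K (exact in the 2019 SI)

HC_OVER_KB = H_PLANCK * C_LIGHT / K_B   # K cm

def wavenumber_to_kelvin(nu_tilde):
    """Temperature scale T* = (hc / k_B) * nu_tilde, with nu_tilde in cm^-1."""
    return HC_OVER_KB * nu_tilde

print(HC_OVER_KB)                  # about 1.4388 K cm
print(wavenumber_to_kelvin(3756))  # about 5404 K
```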
Vibrations are normal modes of oscillations. A single normal mode Hamiltonian is of the form

$$ \hat h = \frac{p^2}{2m} + \tfrac{1}{2} m\omega^2 q^2 = \hbar\omega\big(a^\dagger a + \tfrac{1}{2}\big) . \qquad (4.268) $$
2m
In general there are many vibrational modes, hence many normal mode frequencies $\omega_\alpha$. We must then include all of them, resulting in
$$ \xi_{\rm vib} = \prod_\alpha \xi_{\rm vib}^{(\alpha)} . \qquad (4.269) $$
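For each such normal mode, the contribution is
$$ \xi = \sum_{n=0}^{\infty} e^{-(n+\frac{1}{2})\hbar\omega/k_B T} = e^{-\hbar\omega/2k_B T} \sum_{n=0}^{\infty} \Big(e^{-\hbar\omega/k_B T}\Big)^n = \frac{e^{-\hbar\omega/2k_B T}}{1 - e^{-\hbar\omega/k_B T}} = \frac{1}{2\sinh(\Theta/2T)} , \qquad (4.270) $$
where $\Theta = \hbar\omega/k_B$. Then
$$ \varphi = k_B T \ln\big[2\sinh(\Theta/2T)\big] = \tfrac{1}{2} k_B\Theta + k_B T \ln\big(1 - e^{-\Theta/T}\big) . \qquad (4.271) $$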

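The contribution to the heat capacity is
$$ \Delta C_V = N k_B \left(\frac{\Theta}{T}\right)^2 \frac{e^{\Theta/T}}{\big(e^{\Theta/T} - 1\big)^2} = \begin{cases} N k_B\, (\Theta/T)^2 \exp(-\Theta/T) & (T \to 0) \\ N k_B & (T \to \infty) \end{cases} \qquad (4.272) $$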
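A small sketch of eqn. 4.272 per mode, written in terms of $t = T/\Theta$ (an illustrative parametrization). Computing the denominator with `math.expm1` avoids loss of precision at high temperature, where $e^{\Theta/T} - 1$ is small. The checks cover both limits plus the leading high-temperature correction $1 - \tfrac{1}{12}(\Theta/T)^2$, which follows from expanding eqn. 4.272.

```python
import math

def vib_heat_capacity(t):
    """Delta C_V / (N k_B) for one normal mode, eqn. 4.272, with t = T / Theta."""
    x = 1.0 / t
    return x * x * math.exp(x) / math.expm1(x) ** 2

for t in (0.05, 0.2, 1.0, 10.0, 100.0):
    print(t, vib_heat_capacity(t))
```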

4.10.6 Two-level systems : Schottky anomaly

Consider now a two-level system, with energies ε0 and ε1 . We define ∆ ≡ ε1 − ε0 and assume without
loss of generality that ∆ > 0. The partition function is

$$ \zeta = e^{-\beta\varepsilon_0} + e^{-\beta\varepsilon_1} = e^{-\beta\varepsilon_0}\big(1 + e^{-\beta\Delta}\big) . \qquad (4.273) $$

Figure 4.11: Heat capacity per molecule as a function of temperature for (a) heteronuclear diatomic
gases, (b) a single vibrational mode, and (c) a single two-level system.

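The free energy is
$$ f = -k_B T \ln\zeta = \varepsilon_0 - k_B T \ln\big(1 + e^{-\Delta/k_B T}\big) . \qquad (4.274) $$
The entropy for a given two level system is then
$$ s = -\frac{\partial f}{\partial T} = k_B \ln\big(1 + e^{-\Delta/k_B T}\big) + \frac{\Delta}{T} \cdot \frac{1}{e^{\Delta/k_B T} + 1} \qquad (4.275) $$
and the heat capacity is $c = T(\partial s/\partial T)$, i.e.
$$ c(T) = \frac{\Delta^2}{k_B T^2} \cdot \frac{e^{\Delta/k_B T}}{\big(e^{\Delta/k_B T} + 1\big)^2} . \qquad (4.276) $$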

Thus,
$$ c(k_B T \ll \Delta) = \frac{\Delta^2}{k_B T^2}\, e^{-\Delta/k_B T} \qquad (4.277) $$
$$ c(k_B T \gg \Delta) = \frac{\Delta^2}{4k_B T^2} . \qquad (4.278) $$
We find that c(T ) has a characteristic peak at T ∗ ≈ 0.42 ∆/kB . The heat capacity vanishes in both the
low temperature and high temperature limits. At low temperatures, the gap to the excited state is much
greater than kB T , and it is not possible to populate it and store energy. At high temperatures, both
ground state and excited state are equally populated, and once again there is no way to store energy.
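The location of the Schottky peak is a one-line optimization. The sketch below (illustrative names, $t = k_B T/\Delta$) maximizes $c/k_B$ from eqn. 4.276 by golden-section search and also checks the high temperature tail of eqn. 4.278.

```python
import math

def schottky(t):
    """c / k_B for one two-level system, eqn. 4.276, with t = k_B T / Delta."""
    x = 1.0 / t
    return x * x * math.exp(x) / (math.exp(x) + 1) ** 2

def peak_temperature(lo=0.1, hi=2.0, iters=100):
    """Golden-section search for the maximum of schottky(t) on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c1, c2 = b - g * (b - a), a + g * (b - a)
        if schottky(c1) < schottky(c2):
            a = c1        # maximum lies in [c1, b]
        else:
            b = c2        # maximum lies in [a, c2]
    return 0.5 * (a + b)

t_star = peak_temperature()
print(t_star, schottky(t_star))   # peak near k_B T / Delta = 0.42
```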
If we have a distribution of independent two-level systems, the heat capacity of such a system is a sum over the individual Schottky functions:
$$ C(T) = \sum_i \tilde c(\Delta_i/k_B T) = N \int_0^\infty d\Delta\, P(\Delta)\, \tilde c(\Delta/k_B T) , \qquad (4.279) $$
where $N$ is the number of two level systems, $\tilde c(x) = k_B\, x^2 e^x/(e^x + 1)^2$, and where $P(\Delta)$ is the distribution function, which satisfies the normalization condition
$$ \int_0^\infty d\Delta\, P(\Delta) = 1 . \qquad (4.280) $$
If $P(\Delta) \propto \Delta^r$ for $\Delta \to 0$, then the low temperature heat capacity behaves as $C(T) \propto T^{1+r}$. Many amorphous or glassy systems contain such a distribution of two
level systems, with r ≈ 0 for glasses, leading to a linear low-temperature heat capacity. The origin of
these two-level systems is not always so clear but is generally believed to be associated with local atomic
configurations for which there are two low-lying states which are close in energy. The paradigmatic example is the mixed crystalline solid (KBr)$_{1-x}$(KCN)$_x$ which over the range $0.1 \lesssim x \lesssim 0.6$ forms an
‘orientational glass’ at low temperatures. The two level systems are associated with different orientation
of the cyanide (CN) dipoles.

4.10.7 Electronic and nuclear excitations

For a monatomic gas, the internal coordinate partition function arises due to electronic and nuclear
degrees of freedom. Let’s first consider the electronic degrees of freedom. We assume that kB T is small
compared with energy differences between successive electronic shells. The atomic ground state is then
computed by filling up the hydrogenic orbitals until all the electrons are used up. If the atomic number is a ‘magic number’ ($Z$ = 2 (He), 10 (Ne), 18 (Ar), 36 (Kr), 54 (Xe), etc.) then the atom has all shells filled and $L = 0$ and $S = 0$. Otherwise the last shell is partially filled and one or both of $L$ and $S$ will be nonzero. The atomic ground state configuration ${}^{2S+1}L_J$ is then determined by Hund’s rules:

1. The LS multiplet with the largest S has the lowest energy.


2. If the largest value of S is associated with several multiplets, the multiplet with the largest L has
the lowest energy.
3. If an incomplete shell is not more than half-filled, then the lowest energy state has J = |L − S|. If
the shell is more than half-filled, then J = L + S.

The last of Hund’s rules distinguishes between the (2S + 1)(2L + 1) states which result upon fixing S and
L as per rules #1 and #2. It arises due to the atomic spin-orbit coupling, whose effective Hamiltonian
may be written Ĥ = ΛL · S, where Λ is the Russell-Saunders coupling. If the last shell is less than or
equal to half-filled, then Λ > 0 and the ground state has J = |L − S|. If the last shell is more than
half-filled, the coupling is inverted, i.e. $\Lambda < 0$, and the ground state has $J = L + S$.¹¹
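Hund’s three rules are mechanical enough to code directly. The sketch below is an illustrative implementation (not taken from the notes) for $n$ equivalent electrons in a single open shell of orbital angular momentum $\ell$. Electrons are placed spin-up first, filling $m_\ell = \ell, \ell-1, \ldots$ so as to maximize $S$ and then $L$, and $J$ is chosen according to rule 3.

```python
from fractions import Fraction

LETTERS = "SPDFGHIKLMNOQ"   # spectroscopic letters for L = 0, 1, 2, ... (J is skipped)

def hund_ground_state(l, n):
    """Return (S, L, J) for n electrons in a shell with orbital quantum number l."""
    if not 0 <= n <= 2 * (2 * l + 1):
        raise ValueError("a shell holds between 0 and 2(2l+1) electrons")
    m_values = list(range(l, -l - 1, -1))     # m_l = l, l-1, ..., -l
    n_up = min(n, 2 * l + 1)                  # rule 1: maximize S
    n_down = n - n_up
    S = Fraction(n_up - n_down, 2)
    L = abs(sum(m_values[:n_up]) + sum(m_values[:n_down]))   # rule 2: maximize L
    J = abs(L - S) if n <= 2 * l + 1 else L + S              # rule 3
    return S, L, J

def term_symbol(l, n):
    """Ground state term written as (2S+1) L J, e.g. '3P2'."""
    S, L, J = hund_ground_state(l, n)
    return f"{2 * S + 1}{LETTERS[L]}{J}"

print(term_symbol(1, 2), term_symbol(1, 4), term_symbol(2, 6), term_symbol(3, 1))
```

For $p^2$ (carbon), $p^4$ (oxygen), $d^6$ (iron) and $f^1$ this yields ${}^3P_0$, ${}^3P_2$, ${}^5D_4$ and ${}^2F_{5/2}$.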
The electronic contribution to ξ is then
$$ \xi_{\rm elec} = \sum_{J=|L-S|}^{L+S} (2J+1)\, e^{-\Delta\varepsilon(L,S,J)/k_B T} \qquad (4.281) $$

11
See e.g. §72 of Landau and Lifshitz, Quantum Mechanics, which, in my humble estimation, is the greatest physics book
ever written.

where
$$ \Delta\varepsilon(L, S, J) = \tfrac{1}{2}\Lambda\Big[J(J+1) - L(L+1) - S(S+1)\Big] . \qquad (4.282) $$

At high temperatures, $k_B T$ is larger than the energy difference between the different $J$ multiplets, and we have $\xi_{\rm elec} \sim (2L+1)(2S+1)\, e^{-\beta\varepsilon_0}$, where $\varepsilon_0$ is the ground state energy. At low temperatures, a particular value of $J$ is selected (that determined by Hund’s third rule) and we have $\xi_{\rm elec} \sim (2J+1)\, e^{-\beta\varepsilon_0}$. If, in addition, there is a nonzero nuclear spin $I$, then we also must include a factor $\xi_{\rm nuc} = (2I+1)$, neglecting the small hyperfine splittings due to the coupling of nuclear and electronic angular momenta.
For heteronuclear diatomic molecules, i.e. molecules composed from two different atomic nuclei, the internal partition function simply receives a factor of $\xi_{\rm elec} \cdot \xi^{(1)}_{\rm nuc} \cdot \xi^{(2)}_{\rm nuc}$, where the first factor is a sum over molecular electronic states, and the second two factors arise from the spin degeneracies of the two nuclei.
For homonuclear diatomic molecules, the exchange of nuclear centers is a symmetry operation, and does
not represent a distinct quantum state. To correctly count the electronic states, we first assume that the
total electronic spin is S = 0. This is generally a very safe assumption. Exchange symmetry now puts
restrictions on the possible values of the molecular angular momentum L, depending on the total nuclear
angular momentum Itot . If Itot is even, then the molecular angular momentum L must also be even.
If the total nuclear angular momentum is odd, then $L$ must be odd. This is so because the molecular ground state configuration is ${}^1\Sigma_g^+$.¹²
The total number of nuclear states for the molecule is (2I + 1)2 , of which some are even under nuclear
exchange, and some are odd. The number of even states, corresponding to even total nuclear angular
momentum is written as gg , where the subscript conventionally stands for the (mercifully short) German
word gerade, meaning ‘even’. The number of odd (Ger. ungerade) states is written gu . Table 4.2 gives
the values of gg,u corresponding to half-odd-integer I and integer I.
The final answer for the rotational component of the internal molecular partition function is then

ξrot (T ) = gg ζg + gu ζu , (4.283)

where

$$ \zeta_g = \sum_{L\ {\rm even}} (2L+1)\, e^{-L(L+1)\Theta_{\rm rot}/T} , \qquad \zeta_u = \sum_{L\ {\rm odd}} (2L+1)\, e^{-L(L+1)\Theta_{\rm rot}/T} . \qquad (4.284) $$

For hydrogen, the molecules with the larger nuclear statistical weight are called orthohydrogen and those
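with the smaller statistical weight are called parahydrogen. For H$_2$, we have $I = \tfrac{1}{2}$, hence the ortho state has $g_u = 3$ and the para state has $g_g = 1$. In D$_2$, we have $I = 1$ and the ortho state has $g_g = 6$ while the para state has $g_u = 3$. In equilibrium, the ratio of ortho to para states is then
$$ \frac{N^{\rm ortho}_{\rm H_2}}{N^{\rm para}_{\rm H_2}} = \frac{g_u\,\zeta_u}{g_g\,\zeta_g} = \frac{3\,\zeta_u}{\zeta_g} , \qquad \frac{N^{\rm ortho}_{\rm D_2}}{N^{\rm para}_{\rm D_2}} = \frac{g_g\,\zeta_g}{g_u\,\zeta_u} = \frac{2\,\zeta_g}{\zeta_u} . \qquad (4.285) $$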
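Eqn. 4.285 is easily evaluated with the rotational temperature $\Theta_{\rm rot} = 85.4$ K for H$_2$ from Table 4.1. The sketch below (illustrative names) builds $\zeta_g$ and $\zeta_u$ from eqn. 4.284 and shows the equilibrium ortho:para ratio approaching 3 at high temperature and vanishing at low temperature.

```python
import math

THETA_ROT_H2 = 85.4   # K, from Table 4.1

def zeta_parity(t, parity):
    """Partial rotational sum over L of one parity (0: even, 1: odd), t = T / Theta_rot."""
    L_max = int(10 * math.sqrt(t)) + 20
    return sum((2 * L + 1) * math.exp(-L * (L + 1) / t)
               for L in range(parity, L_max + 1, 2))

def ortho_para_ratio_H2(T):
    """N_ortho / N_para = 3 zeta_u / zeta_g for H2 in equilibrium (eqn. 4.285)."""
    t = T / THETA_ROT_H2
    return 3 * zeta_parity(t, 1) / zeta_parity(t, 0)

for T in (20.0, 77.0, 300.0, 1000.0):
    print(T, ortho_para_ratio_H2(T))
```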

12
See Landau and Lifshitz, Quantum Mechanics, §86.

2I gg gu
odd I(2I + 1) (I + 1)(2I + 1)
even (I + 1)(2I + 1) I(2I + 1)

Table 4.2: Number of even (gg ) and odd (gu ) total nuclear angular momentum states for a homonuclear
diatomic molecule. I is the ground state nuclear spin.

Incidentally, how do we derive the results in Table 4.2? The total nuclear angular momentum $I_{\rm tot}$ is the quantum mechanical sum of the two individual nuclear angular momenta, each of which has magnitude $I$. From elementary addition of angular momenta, we have

I ⊗ I = 0 ⊕ 1 ⊕ 2 ⊕ · · · ⊕ 2I . (4.286)

The right hand side of the above equation lists all the possible multiplets. Thus, Itot ∈ {0, 1, . . . , 2I}.
Now let us count the total number of states with even Itot . If 2I is even, which is to say if I is an integer,
we have
$$ g_g^{(2I = {\rm even})} = \sum_{n=0}^{I} \Big\{2\cdot(2n) + 1\Big\} = (I+1)(2I+1) , \qquad (4.287) $$

because the degeneracy of each multiplet is 2Itot + 1. It follows that

$$ g_u^{(2I = {\rm even})} = (2I+1)^2 - g_g = I(2I+1) . \qquad (4.288) $$

On the other hand, if 2I is odd, which is to say I is a half odd integer, then

$$ g_g^{(2I = {\rm odd})} = \sum_{n=0}^{I-\frac{1}{2}} \Big\{2\cdot(2n) + 1\Big\} = I(2I+1) . \qquad (4.289) $$

It follows that
$$ g_u^{(2I = {\rm odd})} = (2I+1)^2 - g_g = (I+1)(2I+1) . \qquad (4.290) $$
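The counting in eqns. 4.287 to 4.290 can be cross-checked by brute force: decompose $I \otimes I$ into multiplets $I_{\rm tot} = 0, 1, \ldots, 2I$ (eqn. 4.286), weight each by $2I_{\rm tot} + 1$, and sort by the parity of $I_{\rm tot}$. The sketch below (illustrative names) compares this with the closed forms of Table 4.2 for $I = \tfrac{1}{2}, 1, \ldots, 4$.

```python
from fractions import Fraction

def count_by_parity(I):
    """(g_g, g_u): number of I (x) I states with even / odd total I_tot."""
    two_I = int(2 * I)
    g_g = sum(2 * j + 1 for j in range(0, two_I + 1) if j % 2 == 0)
    g_u = sum(2 * j + 1 for j in range(0, two_I + 1) if j % 2 == 1)
    return g_g, g_u

def table_4_2(I):
    """Closed forms from Table 4.2, returned as integers."""
    if int(2 * I) % 2 == 1:                          # half-odd-integer I
        g_g, g_u = I * (2 * I + 1), (I + 1) * (2 * I + 1)
    else:                                            # integer I
        g_g, g_u = (I + 1) * (2 * I + 1), I * (2 * I + 1)
    return int(g_g), int(g_u)

for k in range(1, 9):
    I = Fraction(k, 2)
    print(I, count_by_parity(I), table_4_2(I))
```

For $I = \tfrac{1}{2}$ this gives $(g_g, g_u) = (1, 3)$ and for $I = 1$ it gives $(6, 3)$, matching the H$_2$ and D$_2$ values quoted above.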

4.11 Appendix I : Additional Examples

4.11.1 Three state system

Consider a spin-1 particle where σ = −1, 0, +1. We model this with the single particle Hamiltonian

ĥ = −µ0 H σ + ∆(1 − σ 2 ) . (4.291)

We can also interpret this as describing a spin if σ = ±1 and a vacancy if σ = 0. The parameter ∆ then
represents the vacancy formation energy. The single particle partition function is

ζ = Tr e−β ĥ = e−β∆ + 2 cosh(βµ0 H) . (4.292)



With $N_S$ distinguishable noninteracting spins (e.g. at different sites in a crystalline lattice), we have $Z = \zeta^{N_S}$ and
$$ F \equiv N_S f = -k_B T \ln Z = -N_S k_B T \ln\!\Big[e^{-\beta\Delta} + 2\cosh(\beta\mu_0 H)\Big] , \qquad (4.293) $$

where f = −kB T ln ζ is the free energy of a single particle. Note that

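$$ \hat n_V = 1 - \sigma^2 = \frac{\partial \hat h}{\partial\Delta} \tag{4.294} $$
$$ \hat m = \mu_0\, \sigma = -\frac{\partial \hat h}{\partial H} \tag{4.295} $$
are the vacancy number and magnetization, respectively. Thus,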


$$ n_V = \big\langle \hat n_V \big\rangle = \frac{\partial f}{\partial\Delta} = \frac{e^{-\Delta/k_BT}}{e^{-\Delta/k_BT} + 2\cosh(\mu_0 H/k_BT)} \tag{4.296} $$

and

$$ m = \big\langle \hat m \big\rangle = -\frac{\partial f}{\partial H} = \frac{2\mu_0 \sinh(\mu_0 H/k_BT)}{e^{-\Delta/k_BT} + 2\cosh(\mu_0 H/k_BT)} \ . \tag{4.297} $$
At weak fields we can compute
$$ \chi_T = \frac{\partial m}{\partial H}\bigg|_{H=0} = \frac{\mu_0^2}{k_BT} \cdot \frac{2}{2 + e^{-\Delta/k_BT}} \ . \tag{4.298} $$

We thus obtain a modified Curie law. At temperatures T ≪ ∆/k_B, the vacancies are frozen out and we
recover the usual Curie behavior. At high temperatures, where T ≫ ∆/k_B, the low temperature result
is reduced by a factor of 2/3, which accounts for the fact that one third of the time the particle is in a
nonmagnetic state with σ = 0.
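A quick numerical check of eqns. 4.297 and 4.298 may be helpful. The sketch below is a minimal Python example in which we set k_B = µ0 = ∆ = 1 (an arbitrary choice of units made for illustration). It differentiates m(H) from eqn. 4.297 numerically at H = 0, compares with the closed form for χ_T, and prints T·χ_T, which should cross over from µ0²/k_B at low temperature to (2/3)µ0²/k_B at high temperature.

```python
import math


def magnetization(H, T, mu0=1.0, Delta=1.0, kB=1.0):
    """m(H, T) from eqn. 4.297."""
    b = 1.0 / (kB * T)
    zeta = math.exp(-b * Delta) + 2.0 * math.cosh(b * mu0 * H)   # eqn. 4.292
    return 2.0 * mu0 * math.sinh(b * mu0 * H) / zeta


def chi_formula(T, mu0=1.0, Delta=1.0, kB=1.0):
    """Zero-field susceptibility from eqn. 4.298."""
    return (mu0**2 / (kB * T)) * 2.0 / (2.0 + math.exp(-Delta / (kB * T)))


def chi_numeric(T, h=1e-6, **params):
    """Central-difference estimate of dm/dH at H = 0."""
    return (magnetization(h, T, **params) - magnetization(-h, T, **params)) / (2.0 * h)


for T in (0.05, 0.5, 5.0, 500.0):
    print(f"T = {T:7.2f}   T*chi (numeric) = {T * chi_numeric(T):.6f}   "
          f"T*chi (formula) = {T * chi_formula(T):.6f}")
```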

4.11.2 Spins and vacancies on a surface

PROBLEM: A collection of spin-½ particles is confined to a surface with N sites. For each site, let σ = 0 if
there is a vacancy, σ = +1 if there is a particle present with spin up, and σ = −1 if there is a particle present
with spin down. The particles are non-interacting, and the energy for each site is given by ε = −Wσ²,
where −W < 0 is the binding energy.

(a) Let Q = N↑ + N↓ be the number of spins, and N0 be the number of vacancies. The surface
magnetization is M = N↑ − N↓ . Compute, in the microcanonical ensemble, the statistical entropy
S(Q, M ).

(b) Let q = Q/N and m = M/N be the dimensionless particle density and magnetization density,
respectively. Assuming that we are in the thermodynamic limit, where N, Q, and M all tend to
infinity, but with q and m finite, find the temperature T(q, m). Recall Stirling's formula

ln(N !) = N ln N − N + O(ln N ) .

(c) Show explicitly that T can be negative for this system. What does negative T mean? What physical
degrees of freedom have been left out that would avoid this strange property?

SOLUTION: There is a constraint on N↑ , N0 , and N↓ :

N↑ + N0 + N↓ = Q + N0 = N . (4.299)

The total energy of the system is E = −W Q.

(a) The number of states available to the system is


$$ \Omega = \frac{N!}{N_\uparrow!\; N_0!\; N_\downarrow!} \ . \tag{4.300} $$

Fixing Q and M , along with the above constraint, is enough to completely determine {N↑ , N0 , N↓ }:

$$ N_\uparrow = \tfrac12 (Q + M) \quad,\quad N_0 = N - Q \quad,\quad N_\downarrow = \tfrac12 (Q - M) \ , \tag{4.301} $$

whence
$$ \Omega(Q,M) = \frac{N!}{\big[\frac12(Q+M)\big]!\; \big[\frac12(Q-M)\big]!\; (N-Q)!} \ . \tag{4.302} $$
The statistical entropy is S = k_B ln Ω:
$$ S(Q,M) = k_B \ln(N!) - k_B \ln\Big[\tfrac12(Q+M)\Big]! - k_B \ln\Big[\tfrac12(Q-M)\Big]! - k_B \ln\big[(N-Q)!\big] \ . \tag{4.303} $$

(b) Now we invoke Stirling’s rule,

ln(N !) = N ln N − N + O(ln N ) , (4.304)

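to obtain
\begin{align}
\ln\Omega(Q,M) &= N\ln N - N - \tfrac12(Q+M)\ln\Big[\tfrac12(Q+M)\Big] + \tfrac12(Q+M) \nonumber \\
&\qquad - \tfrac12(Q-M)\ln\Big[\tfrac12(Q-M)\Big] + \tfrac12(Q-M) \nonumber \\
&\qquad - (N-Q)\ln(N-Q) + (N-Q) \tag{4.305} \\
&= N\ln N - \tfrac12 Q \ln\Big[\tfrac14\big(Q^2 - M^2\big)\Big] - \tfrac12 M \ln\bigg(\frac{Q+M}{Q-M}\bigg) - (N-Q)\ln(N-Q) \ . \nonumber
\end{align}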
Combining terms,
$$ \ln\Omega(Q,M) = -Nq \ln\Big[\tfrac12\sqrt{q^2 - m^2}\,\Big] - \tfrac12 N m \ln\bigg(\frac{q+m}{q-m}\bigg) - N(1-q)\ln(1-q) \ , \tag{4.306} $$
where Q = Nq and M = Nm. Note that the entropy S = k_B ln Ω is extensive. The statistical
entropy per site is thus
$$ s(q,m) = -k_B\, q \ln\Big[\tfrac12\sqrt{q^2 - m^2}\,\Big] - \tfrac12 k_B\, m \ln\bigg(\frac{q+m}{q-m}\bigg) - k_B (1-q)\ln(1-q) \ . \tag{4.307} $$

The temperature is obtained from the relation
$$ \frac{1}{T} = \bigg(\frac{\partial S}{\partial E}\bigg)_{\!M} = -\frac{1}{W}\bigg(\frac{\partial s}{\partial q}\bigg)_{\!m} = -\frac{k_B}{W}\,\ln(1-q) + \frac{k_B}{W}\,\ln\Big[\tfrac12\sqrt{q^2 - m^2}\,\Big] \ , \tag{4.308} $$
where the minus sign arises because E = −WQ = −WNq, so that dE = −WN dq. Thus,
$$ T = \frac{W/k_B}{\ln\!\Big[\sqrt{q^2 - m^2}\,\Big/\,2(1-q)\Big]} \ . \tag{4.309} $$

(c) We have 0 ≤ q ≤ 1 and −q ≤ m ≤ q, so T is real (thank heavens!). But it is easy to choose {q, m}
such that T < 0. For example, when m = 0 we have $T = -W/\big[k_B \ln(2q^{-1} - 2)\big]$, and T < 0 for all
q ∈ (0, 2/3). The reason for this strange state of affairs is that the entropy S is bounded, and is not a
monotonically increasing function of the energy E (or of the dimensionless quantity Q). The entropy
is maximized for N↑ = N0 = N↓ = N/3, which says m = 0 and q = 2/3. Since E = −WQ, decreasing q
below this point (with m = 0 fixed) raises the energy but reduces the entropy, hence (∂S/∂E) < 0 in this
range, which immediately gives T < 0. Conversely, T → 0⁺ as q → 1, where every site is occupied and the
energy is lowest. What we've left out are kinetic degrees of freedom, such as vibrations
and rotations, whose energies are unbounded, and which result in an increasing S(E) function.
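The sign of T is a useful check on the conventions in eqns. 4.308 and 4.309. The following minimal Python sketch (W = k_B = 1 is an arbitrary choice of units) evaluates the entropy per site s(q, m) of eqn. 4.307, obtains 1/T = (∂S/∂E)_M by finite differences using E = −WNq (so that dE = −WN dq), and compares with the closed form T = (W/k_B)/ln[√(q² − m²)/2(1 − q)].

```python
import math

W = 1.0    # binding energy (arbitrary units; an assumption for illustration)
kB = 1.0   # Boltzmann constant in the same units


def s_per_site(q, m):
    """Entropy per site s(q, m)/k_B, eqn. 4.307 (requires |m| < q < 1)."""
    return (-q * math.log(0.5 * math.sqrt(q * q - m * m))
            - 0.5 * m * math.log((q + m) / (q - m))
            - (1.0 - q) * math.log(1.0 - q))


def T_numeric(q, m, dq=1e-6):
    """1/T = dS/dE at fixed M, with E = -W N q so that dE = -W N dq."""
    ds_dq = kB * (s_per_site(q + dq, m) - s_per_site(q - dq, m)) / (2.0 * dq)
    return 1.0 / (-ds_dq / W)


def T_closed_form(q, m):
    """T = (W/k_B) / ln[ sqrt(q^2 - m^2) / (2 (1 - q)) ]."""
    return (W / kB) / math.log(math.sqrt(q * q - m * m) / (2.0 * (1.0 - q)))


for q in (0.3, 0.5, 0.6, 0.7, 0.9, 0.99):
    print(f"q = {q:4.2f}  m = 0   T = {T_numeric(q, 0.0):+9.4f}")
```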

4.11.3 Fluctuating interface

Consider an interface between two dissimilar fluids. In equilibrium, in a uniform gravitational field, the
denser fluid is on the bottom. Let z = z(x, y) be the height of the interface between the fluids, relative to
equilibrium. The potential energy is a sum of gravitational and surface tension terms, with
$$ U_{\rm grav} = \int\! d^2x \int_0^{z(\boldsymbol{x})}\!\! dz'\ \Delta\rho\, g\, z' \tag{4.310} $$
$$ U_{\rm surf} = \int\! d^2x\ \tfrac12\, \sigma\, (\nabla z)^2 \ . \tag{4.311} $$

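We won't need the kinetic energy in our calculations, but we can include it just for completeness. It isn't
clear a priori how to model it, so we will assume a rather general form
$$ T = \int\! d^2x \int\! d^2x'\ \tfrac12\, \mu(\boldsymbol{x},\boldsymbol{x}')\, \frac{\partial z(\boldsymbol{x},t)}{\partial t}\, \frac{\partial z(\boldsymbol{x}',t)}{\partial t} \ . \tag{4.312} $$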
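We assume that the (x, y) plane is a rectangle of dimensions $L_x \times L_y$. We also assume
$\mu(\boldsymbol{x},\boldsymbol{x}') = \mu\big(|\boldsymbol{x}-\boldsymbol{x}'|\big)$. We can then Fourier transform
$$ z(\boldsymbol{x}) = \big(L_x L_y\big)^{-1/2} \sum_{\boldsymbol{k}} z_{\boldsymbol{k}}\, e^{i\boldsymbol{k}\cdot\boldsymbol{x}} \ , \tag{4.313} $$
where the wavevectors k are quantized according to
$$ \boldsymbol{k} = \frac{2\pi n_x}{L_x}\,\hat{\boldsymbol{x}} + \frac{2\pi n_y}{L_y}\,\hat{\boldsymbol{y}} \ , \tag{4.314} $$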

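with integer n_x and n_y, if we impose periodic boundary conditions (for calculational convenience). The
Lagrangian is then
$$ L = \frac12 \sum_{\boldsymbol{k}} \Big[\, \mu_{\boldsymbol{k}}\, \big|\dot z_{\boldsymbol{k}}\big|^2 - \big(g\,\Delta\rho + \sigma k^2\big)\, \big|z_{\boldsymbol{k}}\big|^2 \,\Big] \ , \tag{4.315} $$
where
$$ \mu_{\boldsymbol{k}} = \int\! d^2x\ \mu\big(|\boldsymbol{x}|\big)\, e^{-i\boldsymbol{k}\cdot\boldsymbol{x}} \ . \tag{4.316} $$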

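Since z(x, t) is real, we have the relation $z_{-\boldsymbol{k}} = z^*_{\boldsymbol{k}}$, therefore the Fourier coefficients at k and −k are not
independent. The canonical momenta are given by
$$ p_{\boldsymbol{k}} = \frac{\partial L}{\partial \dot z^*_{\boldsymbol{k}}} = \mu_{\boldsymbol{k}}\, \dot z_{\boldsymbol{k}} \quad,\quad p^*_{\boldsymbol{k}} = \frac{\partial L}{\partial \dot z_{\boldsymbol{k}}} = \mu_{\boldsymbol{k}}\, \dot z^*_{\boldsymbol{k}} \tag{4.317} $$
The Hamiltonian is then
\begin{align}
\hat H &= {\sum_{\boldsymbol{k}}}' \Big[\, p_{\boldsymbol{k}}\, \dot z^*_{\boldsymbol{k}} + p^*_{\boldsymbol{k}}\, \dot z_{\boldsymbol{k}} \,\Big] - L \tag{4.318} \\
&= {\sum_{\boldsymbol{k}}}' \bigg[\, \frac{|p_{\boldsymbol{k}}|^2}{\mu_{\boldsymbol{k}}} + \big(g\,\Delta\rho + \sigma k^2\big)\, |z_{\boldsymbol{k}}|^2 \,\bigg] \ , \tag{4.319}
\end{align}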

where the prime on the k sum indicates that only one of the pair {k, −k} is to be included, for each k.
We may now compute the ordinary canonical partition function:
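\begin{align}
Z &= {\prod_{\boldsymbol{k}}}' \int\! \frac{d^2p_{\boldsymbol{k}}\, d^2z_{\boldsymbol{k}}}{(2\pi\hbar)^2}\ e^{-|p_{\boldsymbol{k}}|^2/\mu_{\boldsymbol{k}} k_B T}\ e^{-(g\,\Delta\rho + \sigma k^2)\,|z_{\boldsymbol{k}}|^2/k_B T} \nonumber \\
&= {\prod_{\boldsymbol{k}}}' \bigg(\frac{k_B T}{2\hbar}\bigg)^{\!2} \frac{\mu_{\boldsymbol{k}}}{g\,\Delta\rho + \sigma k^2} \ . \tag{4.320}
\end{align}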

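Thus,
$$ F = -k_B T \sum_{\boldsymbol{k}} \ln\!\bigg(\frac{k_B T}{2\hbar\,\Omega_{\boldsymbol{k}}}\bigg) \ , \tag{4.321} $$
where¹³
$$ \Omega_{\boldsymbol{k}} = \bigg(\frac{g\,\Delta\rho + \sigma k^2}{\mu_{\boldsymbol{k}}}\bigg)^{\!1/2} \tag{4.322} $$
is the normal mode frequency for surface oscillations at wavevector k. For deep water waves, it is
appropriate to take $\mu_{\boldsymbol{k}} = \Delta\rho/|\boldsymbol{k}|$, where ∆ρ = ρ_L − ρ_G ≈ ρ_L is the difference between the densities of
water and air. This choice reproduces the familiar deep-water dispersion $\Omega_{\boldsymbol{k}}^2 = g|\boldsymbol{k}| + \sigma|\boldsymbol{k}|^3/\Delta\rho$ of
gravity-capillary waves.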
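It is now easy to compute the thermal average
$$ \big\langle |z_{\boldsymbol{k}}|^2 \big\rangle = \frac{\int\! d^2z_{\boldsymbol{k}}\ |z_{\boldsymbol{k}}|^2\, e^{-(g\,\Delta\rho+\sigma k^2)\,|z_{\boldsymbol{k}}|^2/k_BT}}{\int\! d^2z_{\boldsymbol{k}}\ e^{-(g\,\Delta\rho+\sigma k^2)\,|z_{\boldsymbol{k}}|^2/k_BT}} = \frac{k_B T}{g\,\Delta\rho + \sigma k^2} \ . \tag{4.323} $$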
¹³ Note that there is no prime on the k sum for F, as we have divided the logarithm of Z by two and replaced the half
sum by the whole sum.

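Note that this result does not depend on µ_k, i.e. on our choice of kinetic energy. One defines the
correlation function
\begin{align}
C(\boldsymbol{x}) \equiv \big\langle z(\boldsymbol{x})\, z(\boldsymbol{0}) \big\rangle &= \frac{1}{L_x L_y} \sum_{\boldsymbol{k}} \big\langle |z_{\boldsymbol{k}}|^2 \big\rangle\, e^{i\boldsymbol{k}\cdot\boldsymbol{x}} = \int\!\frac{d^2k}{(2\pi)^2}\ \frac{k_B T}{g\,\Delta\rho + \sigma k^2}\ e^{i\boldsymbol{k}\cdot\boldsymbol{x}} \nonumber \\
&= \frac{k_B T}{2\pi\sigma} \int_0^\infty\!\! dq\ \frac{\cos(q|\boldsymbol{x}|)}{\sqrt{q^2 + \xi^{-2}}} = \frac{k_B T}{2\pi\sigma}\, K_0\big(|\boldsymbol{x}|/\xi\big) \ , \tag{4.324}
\end{align}
where $\xi = \sqrt{\sigma/g\,\Delta\rho}$ is the correlation length (the capillary length), and where K_0(z) is the modified Bessel
function (Bessel function of imaginary argument). The asymptotic behavior of K_0(z) for small z is
K_0(z) ∼ ln(2/z), whereas for large z one has K_0(z) ∼ (π/2z)^{1/2} e^{−z}. We see that on large length scales
the correlations decay exponentially, but on small length scales they diverge. This divergence is due to
the improper energetics we have assigned to short wavelength fluctuations of the interface. Roughly, it
can be cured by imposing a cutoff on the integral, or by insisting that the shortest distance scale is a
molecular diameter.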
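To get a feel for eqn. 4.324, the sketch below (Python, standard library only) evaluates K_0 from the integral representation K_0(x) = ∫₀^∞ e^{−x cosh t} dt with a trapezoid rule, and compares it with the small-x form ln(2/x) − γ (γ is Euler's constant, the next term in the small-x expansion) and the large-x form (π/2x)^{1/2} e^{−x}. The helper `interface_correlation` evaluates C = (k_BT/2πσ) K_0(|x|/ξ) with ξ = √(σ/g∆ρ), using the standard two-dimensional integral ∫ d²k/(2π)² e^{ik·x}/(k² + ξ^{−2}) = K_0(|x|/ξ)/2π. The parameter values in the last line are illustrative assumptions (roughly water against air near room temperature), not numbers from the notes.

```python
import math


def bessel_k0(x, t_max=20.0, steps=20000):
    """K_0(x) for x > 0 from K_0(x) = integral_0^inf exp(-x cosh t) dt (trapezoid rule)."""
    h = t_max / steps
    total = 0.5 * (math.exp(-x) + math.exp(-x * math.cosh(t_max)))
    for i in range(1, steps):
        total += math.exp(-x * math.cosh(i * h))
    return total * h


def interface_correlation(r, kBT, sigma, g_drho):
    """C(r) = kBT/(2 pi sigma) * K_0(r/xi), with capillary length xi = sqrt(sigma / (g Delta rho))."""
    xi = math.sqrt(sigma / g_drho)
    return kBT / (2.0 * math.pi * sigma) * bessel_k0(r / xi)


EULER_GAMMA = 0.5772156649015329
for x in (1e-3, 0.1, 1.0, 10.0, 50.0):
    small = math.log(2.0 / x) - EULER_GAMMA                 # valid for x << 1
    large = math.sqrt(math.pi / (2.0 * x)) * math.exp(-x)   # valid for x >> 1
    print(f"x = {x:6g}  K0 = {bessel_k0(x):.6e}  small-x = {small:.6e}  large-x = {large:.6e}")

# Illustrative (assumed) SI parameters: kBT ~ 4.1e-21 J, sigma = 0.072 N/m, g*Delta_rho = 9.81 * 1000
print(interface_correlation(1e-3, kBT=4.1e-21, sigma=0.072, g_drho=9.81 * 1000.0))
```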

4.11.4 Dissociation of molecular hydrogen

Consider the reaction


$$ {\rm H} \ \rightleftharpoons\ p^+ + e^- \ . \tag{4.325} $$
In equilibrium, we have
µH = µp + µe . (4.326)
What is the relationship between the temperature T and the fraction x of hydrogen which is dissociated?
Let us assume a fraction x of the hydrogen is dissociated. Then the densities of H, p, and e are

nH = (1 − x) n , np = xn , ne = xn . (4.327)

The canonical partition function for N particles of a given species is
$$ Z_N = \frac{g^N\, V^N}{N!\;\lambda_T^{3N}}\ e^{-N\varepsilon_{\rm int}/k_BT} \ , \tag{4.328} $$
where g is the degeneracy and ε_int the internal energy for a given species. We have ε_int = 0 for p and e,
and ε_int = −∆ for H, where ∆ = e²/2a_B = 13.6 eV, the binding energy of hydrogen. Neglecting hyperfine
splittings¹⁴, we have g_H = 4, while g_e = g_p = 2 because each has spin S = ½. Thus, the associated grand
potentials are

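\begin{align}
\Omega_{\rm H} &= -g_{\rm H}\, V k_B T\, \lambda_{T,{\rm H}}^{-3}\, e^{(\mu_{\rm H}+\Delta)/k_B T} \tag{4.329} \\
\Omega_p &= -g_p\, V k_B T\, \lambda_{T,p}^{-3}\, e^{\mu_p/k_B T} \tag{4.330} \\
\Omega_e &= -g_e\, V k_B T\, \lambda_{T,e}^{-3}\, e^{\mu_e/k_B T} \ , \tag{4.331}
\end{align}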
¹⁴ The hyperfine splitting in hydrogen is on the order of (m_e/m_p) α⁴ m_e c² ∼ 10⁻⁶ eV, which is on the order of 0.01 K. Here
α = e²/ħc is the fine structure constant.

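where
$$ \lambda_{T,a} = \sqrt{\frac{2\pi\hbar^2}{m_a k_B T}} \tag{4.332} $$
for species a. The corresponding number densities are
$$ n = -\frac{1}{V}\bigg(\frac{\partial\Omega}{\partial\mu}\bigg)_{\!T,V} = g\, \lambda_T^{-3}\, e^{(\mu-\varepsilon_{\rm int})/k_B T} \ , \tag{4.333} $$
and the fugacity z = e^{µ/k_BT} of a given species is given by
$$ z = g^{-1}\, n\, \lambda_T^3\, e^{\varepsilon_{\rm int}/k_B T} \ . \tag{4.334} $$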

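We now invoke µ_H = µ_p + µ_e, which says z_H = z_p z_e, or
$$ g_{\rm H}^{-1}\, n_{\rm H}\, \lambda_{T,{\rm H}}^3\, e^{-\Delta/k_B T} = \Big(g_p^{-1}\, n_p\, \lambda_{T,p}^3\Big)\Big(g_e^{-1}\, n_e\, \lambda_{T,e}^3\Big) \ , \tag{4.335} $$
which yields
$$ \bigg(\frac{x^2}{1-x}\bigg)\, n\,\tilde\lambda_T^3 = e^{-\Delta/k_B T} \ , \tag{4.336} $$
where $\tilde\lambda_T = \sqrt{2\pi\hbar^2/m^* k_B T}$, with $m^* = m_p m_e/m_{\rm H} \approx m_e$. Note that
$$ \tilde\lambda_T = a_B\, \sqrt{\frac{4\pi m_{\rm H}}{m_p}}\, \sqrt{\frac{\Delta}{k_B T}} \ , \tag{4.337} $$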

where a_B = 0.529 Å is the Bohr radius. Thus, setting m_H/m_p ≈ 1, we have
$$ \frac{x^2}{1-x} \cdot (4\pi)^{3/2}\, \nu\, \bigg(\frac{T_0}{T}\bigg)^{\!3/2} = e^{-T_0/T} \ , \tag{4.338} $$
where T_0 = ∆/k_B = 1.578 × 10⁵ K and ν = n a_B³. Consider for example a temperature T = 3000 K, for
which T_0/T = 52.6, and assume that x = ½. We then find ν = 1.69 × 10⁻²⁷, corresponding to a density
of n = 1.14 × 10⁻² cm⁻³. At this temperature, the fraction of hydrogen atoms in their first excited
(2s) state, which lies ¾∆ above the ground state since the levels are E_n = −∆/n², is
x′ ∼ e^{−3T_0/4T} ≈ 7 × 10⁻¹⁸. This is quite striking: half the hydrogen atoms are completely
dissociated, which requires an energy of ∆, yet the number in their first excited state, requiring energy
¾∆, is some seventeen orders of magnitude smaller. The student should reflect on why this can be the case.
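The arithmetic in this example can be reproduced in a few lines. This Python sketch solves eqn. 4.336 for ν = n a_B³, using λ̃_T³ = (4π T_0/T)^{3/2} a_B³ from eqn. 4.337 with m_H/m_p set to 1, at T = 3000 K and x = 1/2, and converts to a number density with a_B = 0.529 Å. The function name `nu_from_x` is hypothetical.

```python
import math

T0 = 1.578e5        # Delta / k_B in kelvin, as quoted in the text
a_B_cm = 0.529e-8   # Bohr radius in cm


def nu_from_x(x, T):
    """nu = n a_B^3 from eqn. 4.336, with lambda~_T^3 = (4 pi T0/T)^{3/2} a_B^3 (m_H/m_p ~ 1)."""
    lam3_over_aB3 = (4.0 * math.pi * T0 / T) ** 1.5
    return (1.0 - x) / x**2 * math.exp(-T0 / T) / lam3_over_aB3


nu = nu_from_x(0.5, 3000.0)
n_cm3 = nu / a_B_cm**3
print(f"T0/T = {T0 / 3000.0:.1f}, nu = {nu:.3e}, n = {n_cm3:.3e} cm^-3")
```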
Chapter 5

Noninteracting Quantum Systems

5.1 References

– F. Reif, Fundamentals of Statistical and Thermal Physics (McGraw-Hill, 1987)
This has been perhaps the most popular undergraduate text since it first appeared in 1967, and
with good reason.

– A. H. Carter, Classical and Statistical Thermodynamics (Benjamin Cummings, 2000)
A very relaxed treatment appropriate for undergraduate physics majors.

– D. V. Schroeder, An Introduction to Thermal Physics (Addison-Wesley, 2000)
This is the best undergraduate thermodynamics book I've come across, but only 40% of the book
treats statistical mechanics.

– C. Kittel, Elementary Statistical Physics (Dover, 2004)
Remarkably crisp, though dated, this text is organized as a series of brief discussions of key concepts
and examples. Published by Dover, so you can't beat the price.

– R. K. Pathria, Statistical Mechanics (2nd edition, Butterworth-Heinemann, 1996)
This popular graduate level text contains many detailed derivations which are helpful for the student.

– M. Plischke and B. Bergersen, Equilibrium Statistical Physics (3rd edition, World Scientific, 2006)
An excellent graduate level text. Less insightful than Kardar but still a good modern treatment of
the subject. Good discussion of mean field theory.

– E. M. Lifshitz and L. P. Pitaevskii, Statistical Physics (part I, 3rd edition, Pergamon, 1980)
This is volume 5 in the famous Landau and Lifshitz Course of Theoretical Physics. Though dated,
it still contains a wealth of information and physical insight.

220 CHAPTER 5. NONINTERACTING QUANTUM SYSTEMS

5.2 Statistical Mechanics of Noninteracting Quantum Systems

5.2.1 Bose and Fermi systems in the grand canonical ensemble

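A noninteracting many-particle quantum Hamiltonian may be written as¹
$$ \hat H = \sum_\alpha \varepsilon_\alpha\, \hat n_\alpha \ , \tag{5.1} $$
where n̂_α is the number of particles in the quantum state α with energy ε_α. This form is called the
second quantized representation of the Hamiltonian. The number eigenbasis is therefore also an energy
eigenbasis. Any eigenstate of Ĥ may be labeled by the integer eigenvalues of the n̂_α number operators,
and written as $|n_1, n_2, \ldots\rangle \equiv |\vec n\,\rangle$. We then have
$$ \hat n_\alpha\, \big|\vec n\,\big\rangle = n_\alpha\, \big|\vec n\,\big\rangle \tag{5.2} $$
and
$$ \hat H\, \big|\vec n\,\big\rangle = \sum_\alpha n_\alpha \varepsilon_\alpha\, \big|\vec n\,\big\rangle \ . \tag{5.3} $$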
The eigenvalues n_α take on different possible values depending on whether the constituent particles are
bosons or fermions, viz.
\begin{align}
\text{bosons :}\quad & n_\alpha \in \{0, 1, 2, 3, \ldots\} \nonumber \\
\text{fermions :}\quad & n_\alpha \in \{0, 1\} \ . \tag{5.4}
\end{align}
In other words, for bosons, the occupation numbers are nonnegative integers. For fermions, the occupation
numbers are either 0 or 1 due to the Pauli principle, which says that at most one fermion can occupy
any single particle quantum state. There is no Pauli principle for bosons.
The N-particle partition function Z_N is then
$$ Z_N = \sum_{\{n_\alpha\}} e^{-\beta \sum_\alpha n_\alpha \varepsilon_\alpha}\ \delta_{N,\,\sum_\alpha n_\alpha} \ , \tag{5.5} $$

where the sum is over all allowed values of the set {nα }, which depends on the statistics of the particles.
Bosons satisfy Bose-Einstein (BE) statistics, in which nα ∈ {0 , 1 , 2 , . . .}. Fermions satisfy Fermi-Dirac
(FD) statistics, in which nα ∈ {0 , 1}.
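The OCE partition sum is difficult to perform, owing to the constraint Σ_α n_α = N on the total number
of particles. This constraint is relaxed in the GCE, where
\begin{align}
\Xi &= \sum_N e^{\beta\mu N} Z_N \nonumber \\
&= \sum_{\{n_\alpha\}} e^{-\beta \sum_\alpha n_\alpha \varepsilon_\alpha}\ e^{\beta\mu \sum_\alpha n_\alpha} \tag{5.6} \\
&= \prod_\alpha \bigg( \sum_{n_\alpha} e^{-\beta(\varepsilon_\alpha - \mu)\, n_\alpha} \bigg) \ . \nonumber
\end{align}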
¹ For a review of the formalism of second quantization, see the appendix in §5.9.
5.2. STATISTICAL MECHANICS OF NONINTERACTING QUANTUM SYSTEMS 221

Note that the grand partition function Ξ takes the form of a product over contributions from the
individual single particle states.
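We now perform the single particle sums:
\begin{align}
\sum_{n=0}^{\infty} e^{-\beta(\varepsilon-\mu)\, n} &= \frac{1}{1 - e^{-\beta(\varepsilon-\mu)}} \qquad \text{(bosons)} \tag{5.7} \\
\sum_{n=0}^{1} e^{-\beta(\varepsilon-\mu)\, n} &= 1 + e^{-\beta(\varepsilon-\mu)} \qquad \text{(fermions)} \ . \tag{5.8}
\end{align}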

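Therefore we have
\begin{align}
\Xi_{\rm BE} &= \prod_\alpha \frac{1}{1 - e^{-(\varepsilon_\alpha-\mu)/k_BT}} \nonumber \\
\Omega_{\rm BE} &= k_B T \sum_\alpha \ln\!\Big(1 - e^{-(\varepsilon_\alpha-\mu)/k_BT}\Big) \tag{5.9}
\end{align}
and
\begin{align}
\Xi_{\rm FD} &= \prod_\alpha \Big(1 + e^{-(\varepsilon_\alpha-\mu)/k_BT}\Big) \nonumber \\
\Omega_{\rm FD} &= -k_B T \sum_\alpha \ln\!\Big(1 + e^{-(\varepsilon_\alpha-\mu)/k_BT}\Big) \ . \tag{5.10}
\end{align}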
α

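We can combine these expressions into one, writing
$$ \Omega(T,V,\mu) = \pm k_B T \sum_\alpha \ln\!\Big(1 \mp e^{-(\varepsilon_\alpha-\mu)/k_BT}\Big) \ , \tag{5.11} $$
where we take the upper sign for Bose-Einstein statistics and the lower sign for Fermi-Dirac statistics.
Note that the average occupancy of single particle state α is
$$ \langle \hat n_\alpha \rangle = \frac{\partial\Omega}{\partial\varepsilon_\alpha} = \frac{1}{e^{(\varepsilon_\alpha-\mu)/k_BT} \mp 1} \ , \tag{5.12} $$
and the total particle number is then
$$ N(T,V,\mu) = \sum_\alpha \frac{1}{e^{(\varepsilon_\alpha-\mu)/k_BT} \mp 1} \ . \tag{5.13} $$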

We will henceforth write n_α(µ, T) = ⟨n̂_α⟩ for the thermodynamic average of this occupancy.
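Eqn. 5.12 can be checked numerically for a single level. In the minimal Python sketch below (k_BT = 1 and the level energy are arbitrary illustrative choices), the occupancy obtained by differentiating the single-level term of eqn. 5.11 with respect to ε is compared both with the closed form 1/(e^{(ε−µ)/k_BT} ∓ 1) and with a direct thermal average over the allowed occupations, n = 0, 1, 2, ... for bosons (truncated at a large cutoff) or n = 0, 1 for fermions. For bosons the geometric sum converges only if ε > µ.

```python
import math


def omega_level(eps, mu, kT, stats):
    """Single-level term of eqn. 5.11 (upper signs: 'BE', lower signs: 'FD')."""
    x = math.exp(-(eps - mu) / kT)
    if stats == "BE":
        return kT * math.log(1.0 - x)       # requires eps > mu
    return -kT * math.log(1.0 + x)


def occupancy(eps, mu, kT, stats):
    """<n> = 1 / (exp((eps - mu)/kT) -/+ 1), eqn. 5.12."""
    sign = 1.0 if stats == "BE" else -1.0
    return 1.0 / (math.exp((eps - mu) / kT) - sign)


def brute_force_occupancy(eps, mu, kT, stats, n_cutoff=2000):
    """Direct average of n with Boltzmann weights exp(-(eps - mu) n / kT)."""
    n_values = range(n_cutoff) if stats == "BE" else range(2)
    weights = [math.exp(-(eps - mu) * n / kT) for n in n_values]
    return sum(n * w for n, w in zip(n_values, weights)) / sum(weights)


eps, mu, kT, h = 0.5, 0.0, 1.0, 1e-6
for stats in ("BE", "FD"):
    dOmega = (omega_level(eps + h, mu, kT, stats) - omega_level(eps - h, mu, kT, stats)) / (2 * h)
    print(stats, dOmega, occupancy(eps, mu, kT, stats), brute_force_occupancy(eps, mu, kT, stats))
```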

5.2.2 Quantum statistics and the Maxwell-Boltzmann limit

Consider a system composed of N noninteracting particles. The Hamiltonian is
$$ \hat H = \sum_{j=1}^{N} \hat h_j \ . \tag{5.14} $$

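The single particle Hamiltonian ĥ has eigenstates |α⟩ with corresponding energy eigenvalues ε_α. What is
the partition function? Is it
$$ Z \stackrel{?}{=} \sum_{\alpha_1} \cdots \sum_{\alpha_N} e^{-\beta(\varepsilon_{\alpha_1} + \varepsilon_{\alpha_2} + \ldots + \varepsilon_{\alpha_N})} = \zeta^N \ , \tag{5.15} $$
where ζ is the single particle partition function,
$$ \zeta = \sum_\alpha e^{-\beta\varepsilon_\alpha} \ . \tag{5.16} $$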
α

For systems where the individual particles are distinguishable, such as spins on a lattice which have fixed
positions, this is indeed correct. But for particles free to move in a gas, this equation is wrong. The
reason is that for indistinguishable particles the many particle quantum mechanical states are specified
by a collection of occupation numbers n_α, which tell us how many particles are in the single-particle state
|α⟩. The energy is
$$ E = \sum_\alpha n_\alpha \varepsilon_\alpha \tag{5.17} $$
and the total number of particles is
$$ N = \sum_\alpha n_\alpha \ . \tag{5.18} $$
That is, each collection of occupation numbers {n_α} labels a unique many particle state |{n_α}⟩. In the
product ζ^N, the collection {n_α} occurs many times. We have therefore overcounted the contribution to
Z_N due to this state. By what factor have we overcounted? It is easy to see that the overcounting factor
is
$$ \text{degree of overcounting} = \frac{N!}{\prod_\alpha n_\alpha!} \ , $$
which is the number of ways we can rearrange the labels α_j to arrive at the same collection {n_α}. This
follows from the multinomial theorem,
$$ \bigg(\sum_{\alpha=1}^K x_\alpha\bigg)^{\!N} = \sum_{n_1}\sum_{n_2}\cdots\sum_{n_K} \frac{N!}{n_1!\, n_2! \cdots n_K!}\ x_1^{n_1} x_2^{n_2} \cdots x_K^{n_K}\ \delta_{N,\, n_1 + \ldots + n_K} \ . \tag{5.19} $$

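Thus, the correct expression for Z_N is
\begin{align}
Z_N &= \sum_{\{n_\alpha\}} e^{-\beta\sum_\alpha n_\alpha \varepsilon_\alpha}\ \delta_{N,\sum_\alpha n_\alpha} \nonumber \\
&= \sum_{\alpha_1}\sum_{\alpha_2}\cdots\sum_{\alpha_N} \bigg(\frac{\prod_\alpha n_\alpha!}{N!}\bigg)\, e^{-\beta(\varepsilon_{\alpha_1} + \varepsilon_{\alpha_2} + \ldots + \varepsilon_{\alpha_N})} \ . \tag{5.20}
\end{align}
In the high temperature limit, almost all the n_α are either 0 or 1, hence
$$ Z_N \approx \frac{\zeta^N}{N!} \ . \tag{5.21} $$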

This is the classical Maxwell-Boltzmann limit of quantum statistical mechanics. We now see the origin
of the 1/N ! term which is so important in the thermodynamics of entropy of mixing.
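The overcounting argument and the Maxwell-Boltzmann limit can both be checked by brute force on a toy spectrum. In the minimal Python sketch below (the equally spaced levels and the parameter values are arbitrary illustrative assumptions), `overcounting_ok` enumerates all ordered label sequences (α1, ..., αN) and confirms that each occupation set {n_α} appears N!/∏n_α! times, as the multinomial theorem (eqn. 5.19) requires. `exact_ZN` then evaluates eqn. 5.5 directly, since each occupation set corresponds to a multiset of levels for bosons and to a subset of levels for fermions, and compares with ζ^N/N!.

```python
import itertools
import math
from collections import Counter


def overcounting_ok(K, N):
    """Check that each occupation set arises N!/prod(n_alpha!) times among ordered labels."""
    counts = Counter()
    for labels in itertools.product(range(K), repeat=N):
        occupations = tuple(labels.count(a) for a in range(K))
        counts[occupations] += 1
    return all(c == math.factorial(N) // math.prod(math.factorial(n) for n in occ)
               for occ, c in counts.items())


def exact_ZN(energies, N, beta, stats):
    """Eqn. 5.5: one term per occupation set {n_alpha} with sum n_alpha = N."""
    if stats == "BE":
        sets = itertools.combinations_with_replacement(energies, N)   # any n_alpha >= 0
    else:
        sets = itertools.combinations(energies, N)                    # n_alpha in {0, 1}
    return sum(math.exp(-beta * sum(chosen)) for chosen in sets)


def mb_ZN(energies, N, beta):
    """Maxwell-Boltzmann approximation zeta^N / N!, eqn. 5.21."""
    zeta = sum(math.exp(-beta * e) for e in energies)
    return zeta**N / math.factorial(N)


levels = [0.01 * a for a in range(400)]    # many thermally accessible states at beta = 1
for stats in ("BE", "FD"):
    print(stats, exact_ZN(levels, 2, 1.0, stats) / mb_ZN(levels, 2, 1.0))
```

For N = 2 the exact results are ½(ζ(β)² ± ζ(2β)), so the ratio to ζ²/2 differs from unity only by ζ(2β)/ζ(β)², which is small when many levels are thermally accessible.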
Finally, starting with the expressions for the grand partition function for Bose-Einstein or Fermi-Dirac
particles, and working in the low density limit where n_α(µ, T) ≪ 1, we have ε_α − µ ≫ k_BT, and
consequently
\begin{align}
\Omega_{\rm BE/FD} &= \pm k_B T \sum_\alpha \ln\!\Big(1 \mp e^{-(\varepsilon_\alpha-\mu)/k_BT}\Big) \nonumber \\
&\longrightarrow -k_B T \sum_\alpha e^{-(\varepsilon_\alpha-\mu)/k_BT} \equiv \Omega_{\rm MB} \ . \tag{5.22}
\end{align}
This is the Maxwell-Boltzmann limit of quantum statistical mechanics. The occupation number average
in the Maxwell-Boltzmann limit is then
$$ \langle \hat n_\alpha \rangle = e^{-(\varepsilon_\alpha-\mu)/k_BT} \ . \tag{5.23} $$

5.2.3 Single particle density of states

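The single particle density of states per unit volume g(ε) is defined as
$$ g(\varepsilon) = \frac{1}{V} \sum_\alpha \delta(\varepsilon - \varepsilon_\alpha) \ . \tag{5.24} $$
We can then write
$$ \Omega(T,V,\mu) = \pm V k_B T \int_{-\infty}^{\infty}\!\! d\varepsilon\ g(\varepsilon)\, \ln\!\Big(1 \mp e^{-(\varepsilon-\mu)/k_BT}\Big) \ . \tag{5.25} $$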

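For particles with a dispersion ε(k), with p = ħk, we have
\begin{align}
g(\varepsilon) &= g \int\! \frac{d^dk}{(2\pi)^d}\ \delta\big(\varepsilon - \varepsilon(\boldsymbol{k})\big) \nonumber \\
&= \frac{g\,\Omega_d}{(2\pi)^d}\, \frac{k^{d-1}}{d\varepsilon/dk} \ , \tag{5.26}
\end{align}
where g = 2S+1 is the spin degeneracy, and where we assume that ε(k) is both isotropic and a
monotonically increasing function of k. Thus, we have
$$ g(\varepsilon) = \frac{g\,\Omega_d}{(2\pi)^d}\, \frac{k^{d-1}}{d\varepsilon/dk} = \begin{cases} \dfrac{g}{\pi}\,\dfrac{dk}{d\varepsilon} & d = 1 \\[2ex] \dfrac{g}{2\pi}\, k\,\dfrac{dk}{d\varepsilon} & d = 2 \\[2ex] \dfrac{g}{2\pi^2}\, k^2\,\dfrac{dk}{d\varepsilon} & d = 3 \ . \end{cases} \tag{5.27} $$
In order to obtain g(ε) as a function of the energy ε one must invert the dispersion relation ε = ε(k) to
obtain k = k(ε).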

Note that we can equivalently write
$$ g(\varepsilon)\, d\varepsilon = g\, \frac{d^dk}{(2\pi)^d} = \frac{g\,\Omega_d}{(2\pi)^d}\, k^{d-1}\, dk \tag{5.28} $$
to derive g(ε).
For a spin-S particle with ballistic dispersion ε(k) = ħ²k²/2m, we have
$$ g(\varepsilon) = \frac{2S+1}{\Gamma(d/2)} \bigg(\frac{m}{2\pi\hbar^2}\bigg)^{\!d/2} \varepsilon^{\frac{d}{2}-1}\, \Theta(\varepsilon) \ , \tag{5.29} $$
where Θ(ε) is the step function, which takes the value 0 for ε < 0 and 1 for ε ≥ 0. The appearance of Θ(ε)
simply says that all the single particle energy eigenvalues are nonnegative. Note that we are assuming a
box of volume V but we are ignoring the quantization of kinetic energy, and assuming that the difference
between successive quantized single particle energy eigenvalues is negligible so that g(ε) can be replaced
by the average in the above expression. Note that
$$ n(\varepsilon,T,\mu) = \frac{1}{e^{(\varepsilon-\mu)/k_BT} \mp 1} \ . \tag{5.30} $$
This result holds true independent of the form of g(ε). The average total number of particles is then
$$ N(T,V,\mu) = V \int_{-\infty}^{\infty}\!\! d\varepsilon\ g(\varepsilon)\, \frac{1}{e^{(\varepsilon-\mu)/k_BT} \mp 1} \ , \tag{5.31} $$
which does depend on g(ε).
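The replacement of the discrete box spectrum by a smooth density of states can be illustrated directly. The minimal Python sketch below (units ħ = m = 1 and spin S = 0 are illustrative assumptions) counts the plane-wave states k = 2πn/L of a periodic box of side L with ħ²k²/2m < ε, and compares the count with V times the integral of eqn. 5.29 from 0 to ε, namely V (m/2πħ²)^{d/2} ε^{d/2}/Γ(d/2 + 1).

```python
import itertools
import math


def count_states(eps, L, d, hbar=1.0, m=1.0):
    """Number of k = 2 pi n / L (periodic box, n integer) with hbar^2 k^2 / 2m < eps."""
    n2_max = 2.0 * m * eps * L**2 / (hbar**2 * (2.0 * math.pi) ** 2)   # bound on |n|^2
    n_max = math.isqrt(int(n2_max)) + 1
    return sum(1 for n in itertools.product(range(-n_max, n_max + 1), repeat=d)
               if sum(c * c for c in n) < n2_max)


def smooth_count(eps, L, d, hbar=1.0, m=1.0):
    """V * integral_0^eps g(e) de with g from eqn. 5.29 (spin degeneracy 2S+1 = 1)."""
    return L**d * (m / (2.0 * math.pi * hbar**2)) ** (d / 2) * eps ** (d / 2) / math.gamma(d / 2 + 1)


for d, L in ((1, 1000), (2, 400), (3, 100)):
    exact, smooth = count_states(1.0, L, d), smooth_count(1.0, L, d)
    print(f"d = {d}: lattice count = {exact}, smooth estimate = {smooth:.1f}, ratio = {exact / smooth:.4f}")
```

The relative discrepancy shrinks as L grows, which is the sense in which the quantization of kinetic energy can be ignored in a large box.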

5.3 Quantum Ideal Gases : Low Density Expansions

5.3.1 Expansion in powers of the fugacity

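From eqn. 5.31, we have that the number density n = N/V is
\begin{align}
n(T,z) &= \int_{-\infty}^{\infty}\!\! d\varepsilon\ \frac{g(\varepsilon)}{z^{-1} e^{\varepsilon/k_BT} \mp 1} \nonumber \\
&= \sum_{j=1}^\infty (\pm 1)^{j-1}\, C_j(T)\, z^j \ , \tag{5.32}
\end{align}
where z = exp(µ/k_BT) is the fugacity and
$$ C_j(T) = \int_{-\infty}^{\infty}\!\! d\varepsilon\ g(\varepsilon)\, e^{-j\varepsilon/k_BT} \ . \tag{5.33} $$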
5.3. QUANTUM IDEAL GASES : LOW DENSITY EXPANSIONS 225

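From Ω = −pV and our expression above for Ω(T, V, µ), we have
\begin{align}
p(T,z) &= \mp\, k_B T \int_{-\infty}^{\infty}\!\! d\varepsilon\ g(\varepsilon)\, \ln\!\Big(1 \mp z\, e^{-\varepsilon/k_BT}\Big) \nonumber \\
&= k_B T \sum_{j=1}^\infty (\pm 1)^{j-1}\, j^{-1}\, C_j(T)\, z^j \ . \tag{5.34}
\end{align}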

5.3.2 Virial expansion of the equation of state

Eqns. 5.32 and 5.34 express n(T, z) and p(T, z) as power series in the fugacity z, with T-dependent
coefficients. In principle, we can eliminate z using eqn. 5.32, writing z = z(T, n) as a power series in the
number density n, and substitute this into eqn. 5.34 to obtain an equation of state p = p(T, n) of the
form
$$ p(T,n) = n k_B T\, \Big(1 + B_2(T)\, n + B_3(T)\, n^2 + \ldots\Big) \ . \tag{5.35} $$

Note that the low density limit n → 0 yields the ideal gas law independent of the density of states g(ε).
This follows from expanding n(T, z) and p(T, z) to lowest order in z, yielding n = C_1 z + O(z²) and
p = k_BT C_1 z + O(z²). Dividing the second of these equations by the first yields p = n k_BT + O(n²),
which is the ideal gas law. Note that z = n/C_1 + O(n²) can formally be written as a power series in n.
Unfortunately, there is no general analytic expression for the virial coefficients B_j(T) in terms of the
expansion coefficients C_j(T). The only way is to grind things out order by order in our expansions. Let's
roll up our sleeves and see how this is done. We start by formally writing z(T, n) as a power series in the
density n with T-dependent coefficients A_j(T):
$$ z = A_1 n + A_2 n^2 + A_3 n^3 + \ldots \ . \tag{5.36} $$

We then insert this into the series for n(T, z):
\begin{align}
n &= C_1 z \pm C_2 z^2 + C_3 z^3 + \ldots \nonumber \\
&= C_1 \big(A_1 n + A_2 n^2 + A_3 n^3 + \ldots\big) \pm C_2 \big(A_1 n + A_2 n^2 + A_3 n^3 + \ldots\big)^2 \tag{5.37} \\
&\qquad + C_3 \big(A_1 n + A_2 n^2 + A_3 n^3 + \ldots\big)^3 + \ldots \ . \nonumber
\end{align}
Let's expand the RHS to order n³. Collecting terms, we have
$$ n = C_1 A_1\, n + \big(C_1 A_2 \pm C_2 A_1^2\big)\, n^2 + \big(C_1 A_3 \pm 2 C_2 A_1 A_2 + C_3 A_1^3\big)\, n^3 + \ldots \ . \tag{5.38} $$
In order for this equation to be true we require that the coefficient of n on the RHS be unity, and that
the coefficients of n^j for all j > 1 must vanish. Thus,
\begin{align}
C_1 A_1 &= 1 \nonumber \\
C_1 A_2 \pm C_2 A_1^2 &= 0 \tag{5.39} \\
C_1 A_3 \pm 2 C_2 A_1 A_2 + C_3 A_1^3 &= 0 \ . \nonumber
\end{align}

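The first of these yields A_1:
$$ A_1 = \frac{1}{C_1} \ . \tag{5.40} $$
We now insert this into the second equation to obtain A_2:
$$ A_2 = \mp \frac{C_2}{C_1^3} \ . \tag{5.41} $$
Next, insert the expressions for A_1 and A_2 into the third equation to obtain A_3:
$$ A_3 = \frac{2 C_2^2}{C_1^5} - \frac{C_3}{C_1^4} \ . \tag{5.42} $$
This procedure rapidly gets tedious!
And we're only half way done. We still must express p in terms of n:
\begin{align}
\frac{p}{k_BT} &= C_1\big(A_1 n + A_2 n^2 + A_3 n^3 + \ldots\big) \pm \tfrac12 C_2 \big(A_1 n + A_2 n^2 + A_3 n^3 + \ldots\big)^2 \nonumber \\
&\qquad + \tfrac13 C_3 \big(A_1 n + A_2 n^2 + A_3 n^3 + \ldots\big)^3 + \ldots \nonumber \\
&= C_1 A_1\, n + \big(C_1 A_2 \pm \tfrac12 C_2 A_1^2\big)\, n^2 + \big(C_1 A_3 \pm C_2 A_1 A_2 + \tfrac13 C_3 A_1^3\big)\, n^3 + \ldots \tag{5.43} \\
&= n + B_2\, n^2 + B_3\, n^3 + \ldots \nonumber
\end{align}
We can now write
\begin{align}
B_2 &= C_1 A_2 \pm \tfrac12 C_2 A_1^2 = \mp \frac{C_2}{2 C_1^2} \tag{5.44} \\
B_3 &= C_1 A_3 \pm C_2 A_1 A_2 + \tfrac13 C_3 A_1^3 = \frac{C_2^2}{C_1^4} - \frac{2\, C_3}{3\, C_1^3} \ . \nonumber
\end{align}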

It is easy to derive the general result that $B_j^{\rm F} = (-1)^{j-1} B_j^{\rm B}$, where the superscripts denote Fermi (F) or
Bose (B) statistics.
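The order-by-order bookkeeping of eqns. 5.36 through 5.44 is easy to automate. The Python sketch below is one way to do it (a simple fixed-point series reversion, not necessarily the most efficient method, and the function names are hypothetical): it works with exact rational coefficients via `fractions.Fraction`, inverts n(z) for z(n) to a given order, substitutes into p(z)/k_BT, and returns B_2, ..., B_M. The values C_1 = 2, C_2 = 3, C_3 = 5 used below are arbitrary.

```python
from fractions import Fraction


def series_mul(a, b, order):
    """Product of two power series (lists of coefficients), truncated at z^order."""
    c = [Fraction(0)] * (order + 1)
    for i, ai in enumerate(a):
        if ai == 0:
            continue
        for j, bj in enumerate(b[: order + 1 - i]):
            c[i + j] += ai * bj
    return c


def series_compose(coeffs, inner, order):
    """sum_j coeffs[j] * inner(n)^j as a power series in n, truncated at n^order."""
    result = [Fraction(0)] * (order + 1)
    power = [Fraction(1)] + [Fraction(0)] * order          # inner^0
    for cj in coeffs:
        result = [r + cj * pw for r, pw in zip(result, power)]
        power = series_mul(power, inner, order)
    return result


def virial_coefficients(C, sign, order):
    """Return [B_2, ..., B_order] from C = [C_1, C_2, ...]; sign = +1 (Bose) or -1 (Fermi)."""
    # n(z) = sum_j sign^(j-1) C_j z^j  and  p(z)/kBT = sum_j sign^(j-1) C_j z^j / j
    n_of_z = [Fraction(0)] + [sign ** (j - 1) * Fraction(C[j - 1]) for j in range(1, order + 1)]
    p_of_z = [Fraction(0)] + [sign ** (j - 1) * Fraction(C[j - 1]) / j for j in range(1, order + 1)]
    # Revert n(z): iterate z <- (n - sum_{j>=2} a_j z^j) / a_1; each pass fixes one more order.
    n_series = [Fraction(0), Fraction(1)] + [Fraction(0)] * (order - 1)
    higher_terms = [Fraction(0), Fraction(0)] + n_of_z[2:]
    z = [Fraction(0)] * (order + 1)
    for _ in range(order):
        rest = series_compose(higher_terms, z, order)
        z = [(ni - ri) / n_of_z[1] for ni, ri in zip(n_series, rest)]
    p = series_compose(p_of_z, z, order)                    # = n + B_2 n^2 + B_3 n^3 + ...
    return p[2:]


# Bose: B_2 = -C_2/(2 C_1^2) = -3/8 and B_3 = C_2^2/C_1^4 - 2 C_3/(3 C_1^3) = 7/48
print(virial_coefficients([2, 3, 5], +1, 3))
print(virial_coefficients([2, 3, 5], -1, 3))
```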
We remark that the equation of state for classical (and quantum) interacting systems also can be expanded
in terms of virial coefficients. Consider, for example, the van der Waals equation of state,
$$ \bigg(p + \frac{aN^2}{V^2}\bigg)\big(V - Nb\big) = N k_B T \ . \tag{5.45} $$
This may be recast as
\begin{align}
p &= \frac{n k_B T}{1 - bn} - a n^2 \nonumber \\
&= n k_B T + \big(b\, k_B T - a\big)\, n^2 + k_B T\, b^2 n^3 + k_B T\, b^3 n^4 + \ldots \ , \tag{5.46}
\end{align}
where n = N/V. Thus, comparing with eqn. 5.35, for the van der Waals system we have B_2 = b − a/k_BT and
B_k = b^{k−1} for all k ≥ 3.
5.4. ENTROPY AND COUNTING STATES 227

5.3.3 Ballistic dispersion

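For the ballistic dispersion ε(p) = p²/2m we computed the density of states in eqn. 5.29. One finds
$$ C_j(T) = \frac{g_S\, \lambda_T^{-d}}{\Gamma(d/2)} \int_0^\infty\!\! dt\ t^{\frac{d}{2}-1}\, e^{-jt} = g_S\, \lambda_T^{-d}\, j^{-d/2} \ . \tag{5.47} $$
We then have
\begin{align}
B_2(T) &= \mp\, 2^{-(\frac{d}{2}+1)}\, g_S^{-1}\, \lambda_T^d \tag{5.48} \\
B_3(T) &= \Big(2^{-(d+1)} - 3^{-(\frac{d}{2}+1)}\Big)\cdot 2\, g_S^{-2}\, \lambda_T^{2d} \ . \nonumber
\end{align}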

Note that B_2(T) is negative for bosons and positive for fermions. This is because bosons have a tendency
to bunch and under certain circumstances may exhibit a phenomenon known as Bose-Einstein
condensation (BEC). Fermions, on the other hand, obey the Pauli principle, which results in an extra positive
correction to the pressure in the low density limit.
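We may also write
$$ n(T,z) = \pm\, g_S\, \lambda_T^{-d}\, \mathrm{Li}_{\frac{d}{2}}(\pm z) \tag{5.49} $$
and
$$ p(T,z) = \pm\, g_S\, k_B T\, \lambda_T^{-d}\, \mathrm{Li}_{\frac{d}{2}+1}(\pm z) \ , \tag{5.50} $$
where
$$ \mathrm{Li}_q(z) \equiv \sum_{n=1}^\infty \frac{z^n}{n^q} \tag{5.51} $$
is the polylogarithm function². Note that Li_q(z) obeys a recursion relation in its index, viz.
$$ z\, \frac{\partial}{\partial z}\, \mathrm{Li}_q(z) = \mathrm{Li}_{q-1}(z) \ , \tag{5.52} $$
and that
$$ \mathrm{Li}_q(1) = \sum_{n=1}^\infty \frac{1}{n^q} = \zeta(q) \ . \tag{5.53} $$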
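Eqn. 5.49 and the recursion 5.52 are easy to verify numerically. In the minimal Python sketch below, `polylog` sums the series 5.51 directly (adequate for |z| ≤ 1/2, an assumption of this illustration), and `density_integral` evaluates n λ_T^d/g_S by doing the integral in eqn. 5.32 with the ballistic density of states 5.29, substituting ε = k_BT u² to remove the square-root behavior at small ε in d = 3.

```python
import math


def polylog(q, z, terms=400):
    """Li_q(z) = sum_{n>=1} z^n / n^q, eqn. 5.51 (direct series; fine for |z| <= 1/2)."""
    return sum(z**n / n**q for n in range(1, terms + 1))


def density_integral(z, d, sign, u_max=8.0, steps=4000):
    """n lambda_T^d / g_S from eqn. 5.32 with the ballistic g(eps) of eqn. 5.29.

    With t = eps/kBT = u^2 the integral becomes (1/Gamma(d/2)) int 2 u^(d-1) / (e^(u^2)/z - sign) du,
    where sign = +1 for bosons and -1 for fermions.  Trapezoid rule on [0, u_max].
    """
    h = u_max / steps
    total = 0.0
    for i in range(1, steps + 1):
        u = i * h
        weight = 0.5 if i == steps else 1.0     # the u = 0 end-point contributes zero for d > 1
        total += weight * 2.0 * u ** (d - 1) / (math.exp(u * u) / z - sign)
    return total * h / math.gamma(d / 2)


z = 0.5
print("BE:", density_integral(z, 3, +1), polylog(1.5, z))      # n = +g lambda^-d Li_{3/2}(+z)
print("FD:", density_integral(z, 3, -1), -polylog(1.5, -z))    # n = -g lambda^-d Li_{3/2}(-z)
```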

5.4 Entropy and Counting States

Suppose we are to partition N particles among J possible distinct single particle states. How many ways
Ω are there of accomplishing this task? The answer depends on the statistics of the particles. If the
particles are fermions, the answer is easy: $\Omega_{\rm FD} = \binom{J}{N}$. For bosons, the number of possible partitions
can be evaluated via the following argument. Imagine that we line up all the N particles in a row, and
we place J − 1 barriers among the particles, as shown below in Fig. 5.1. The number of partitions is
² Several texts, such as Pathria and Reichl, write g_q(z) for Li_q(z). I adopt the latter notation since we are already using
the symbol g for the density of states function g(ε) and for the internal degeneracy g.

then the total number of ways of placing the N particles among these N + J − 1 objects (particles plus
barriers), hence we have $\Omega_{\rm BE} = \binom{N+J-1}{N}$. For Maxwell-Boltzmann statistics, we take Ω_MB = J^N/N!. Note
that Ω_MB is not necessarily an integer, so Maxwell-Boltzmann statistics does not represent any actual
state counting. Rather, it manifests itself as a common limit of the Bose and Fermi distributions, as we
have seen and shall see again shortly.

Figure 5.1: Partitioning N bosons into J possible states (N = 14 and J = 5 shown). The N black dots
represent bosons, while the J − 1 white dots represent markers separating the different single particle
populations. Here n1 = 3, n2 = 1, n3 = 4, n4 = 2, and n5 = 4.

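The entropy in each case is simply S = k_B ln Ω. We assume N ≫ 1 and J ≫ 1, with n ≡ N/J finite.
Then using Stirling's approximation, ln(K!) = K ln K − K + O(ln K), we have
\begin{align}
S_{\rm MB} &= -J k_B\, \big[\, n \ln n - n \,\big] \nonumber \\
S_{\rm BE} &= -J k_B\, \big[\, n \ln n - (1+n) \ln(1+n) \,\big] \tag{5.54} \\
S_{\rm FD} &= -J k_B\, \big[\, n \ln n + (1-n) \ln(1-n) \,\big] \ . \nonumber
\end{align}
In the Maxwell-Boltzmann limit, n ≪ 1, and all three expressions agree, since (1 ± n) ln(1 ± n) = ±n + O(n²).
Note that
\begin{align}
\bigg(\frac{\partial S_{\rm MB}}{\partial N}\bigg)_{\!J} &= -k_B \ln n \nonumber \\
\bigg(\frac{\partial S_{\rm BE}}{\partial N}\bigg)_{\!J} &= k_B \ln\!\big(n^{-1} + 1\big) \tag{5.55} \\
\bigg(\frac{\partial S_{\rm FD}}{\partial N}\bigg)_{\!J} &= k_B \ln\!\big(n^{-1} - 1\big) \ . \nonumber
\end{align}
Now let's imagine grouping the single particle spectrum into intervals of J consecutive energy states.
If J is finite and the spectrum is continuous and we are in the thermodynamic limit, then these states
will all be degenerate. Therefore, using α as a label for the energies, we have that the grand potential
Ω = E − TS − µN is given in each case by
\begin{align}
\Omega_{\rm MB} &= J \sum_\alpha \Big[ (\varepsilon_\alpha - \mu)\, n_\alpha + k_B T\, \big(n_\alpha \ln n_\alpha - n_\alpha\big) \Big] \nonumber \\
\Omega_{\rm BE} &= J \sum_\alpha \Big[ (\varepsilon_\alpha - \mu)\, n_\alpha + k_B T\, n_\alpha \ln n_\alpha - k_B T\, (1+n_\alpha) \ln(1+n_\alpha) \Big] \tag{5.56} \\
\Omega_{\rm FD} &= J \sum_\alpha \Big[ (\varepsilon_\alpha - \mu)\, n_\alpha + k_B T\, n_\alpha \ln n_\alpha + k_B T\, (1-n_\alpha) \ln(1-n_\alpha) \Big] \ . \nonumber
\end{align}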

Now, lo and behold, treating Ω as a function of the distribution {n_α} and extremizing in each case,
subject to the constraint of total particle number N = J Σ_α n_α, one obtains the Maxwell-Boltzmann,
5.5. PHOTON STATISTICS 229

Bose-Einstein, and Fermi-Dirac distributions, respectively:



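$$ \frac{\delta}{\delta n_\alpha} \bigg( \Omega - \lambda J \sum_{\alpha'} n_{\alpha'} \bigg) = 0 \quad\Longrightarrow\quad \begin{cases} n_\alpha^{\rm MB} = e^{(\mu-\varepsilon_\alpha)/k_BT} \\[1ex] n_\alpha^{\rm BE} = \big(e^{(\varepsilon_\alpha-\mu)/k_BT} - 1\big)^{-1} \\[1ex] n_\alpha^{\rm FD} = \big(e^{(\varepsilon_\alpha-\mu)/k_BT} + 1\big)^{-1} \ . \end{cases} \tag{5.57} $$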

As long as J is finite, so the states in each block all remain at the same energy, the results are independent
of J.
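The extremization can also be carried out numerically, level by level. The minimal Python sketch below (the golden-section search and all parameter values are illustrative choices, not the author's method) minimizes ω(n) = (ε − µ)n − k_BT s(n) for a single state, where s(n) is the per-state entropy from the Stirling-approximated counts: s = k_B(n − n ln n) from ln(J^N/N!) ≈ N ln J − N ln N + N for Maxwell-Boltzmann, and the expressions from ln C(N+J−1, N) and ln C(J, N) for Bose and Fermi counting. The minimizers should reproduce the three distributions of eqn. 5.57.

```python
import math


def entropy_per_state(n, stats):
    """s(n)/k_B per single-particle state, from Stirling applied to the counts Omega."""
    if stats == "MB":
        return n - n * math.log(n)                              # ln(J^N / N!) / J
    if stats == "BE":
        return (1.0 + n) * math.log(1.0 + n) - n * math.log(n)  # ln C(N+J-1, N) / J
    return -n * math.log(n) - (1.0 - n) * math.log(1.0 - n)     # FD: ln C(J, N) / J


def minimize_omega(eps, mu, kT, stats, iterations=200):
    """Golden-section search for the n minimizing omega(n) = (eps - mu) n - kT s(n)."""
    def omega(n):
        return (eps - mu) * n - kT * entropy_per_state(n, stats)

    a, b = 1e-12, (1.0 - 1e-12 if stats == "FD" else 50.0)      # search bracket
    r = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - r * (b - a), a + r * (b - a)
    for _ in range(iterations):
        if omega(c) < omega(d):          # minimum lies in [a, d]
            b, d = d, c
            c = b - r * (b - a)
        else:                            # minimum lies in [c, b]
            a, c = c, d
            d = a + r * (b - a)
    return 0.5 * (a + b)


eps, mu, kT = 1.0, 0.0, 1.0
print("MB:", minimize_omega(eps, mu, kT, "MB"), math.exp((mu - eps) / kT))
print("BE:", minimize_omega(eps, mu, kT, "BE"), 1.0 / (math.exp((eps - mu) / kT) - 1.0))
print("FD:", minimize_omega(eps, mu, kT, "FD"), 1.0 / (math.exp((eps - mu) / kT) + 1.0))
```

Since each s(n) is concave, ω(n) is convex and the one-dimensional search is guaranteed to find the unique minimum inside the bracket.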

5.5 Photon Statistics

5.5.1 Thermodynamics of the photon gas

There exists a certain class of particles, including photons and certain elementary excitations in solids
such as phonons (i.e. lattice vibrations) and magnons (i.e. spin waves) which obey bosonic statistics
but with zero chemical potential. This is because their overall number is not conserved (under typical
conditions): photons can be emitted and absorbed by the atoms in the wall of a container, phonon and
magnon number is also not conserved due to various processes, etc. In such cases, the free energy attains
its minimum value with respect to particle number when
$$ \mu = \bigg(\frac{\partial F}{\partial N}\bigg)_{\!T,V} = 0 \ . \tag{5.58} $$
The number distribution, from eqn. 5.12, is then
$$ n(\varepsilon) = \frac{1}{e^{\beta\varepsilon} - 1} \ . \tag{5.59} $$
The grand potential for a system of particles with µ = 0 is
$$ \Omega(T,V) = V k_B T \int_{-\infty}^{\infty}\!\! d\varepsilon\ g(\varepsilon)\, \ln\!\Big(1 - e^{-\varepsilon/k_BT}\Big) \ , \tag{5.60} $$
where g(ε) is the density of states per unit volume.


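Suppose the particle dispersion is ε(p) = A|p|^σ. We can compute the density of states g(ε):
\begin{align}
g(\varepsilon) &= g \int\! \frac{d^dp}{h^d}\ \delta\big(\varepsilon - A|\boldsymbol{p}|^\sigma\big) = \frac{g\,\Omega_d}{h^d} \int_0^\infty\!\! dp\ p^{d-1}\, \delta(\varepsilon - A p^\sigma) \nonumber \\
&= \frac{g\,\Omega_d}{\sigma h^d}\, A^{-d/\sigma} \int_0^\infty\!\! dx\ x^{\frac{d}{\sigma}-1}\, \delta(\varepsilon - x) = \frac{2g}{\sigma\,\Gamma(d/2)} \bigg(\frac{\sqrt{\pi}}{h A^{1/\sigma}}\bigg)^{\!d} \varepsilon^{\frac{d}{\sigma}-1}\, \Theta(\varepsilon) \ , \tag{5.61}
\end{align}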

where g is the internal degeneracy, due, for example, to different polarization states of the photon. We
have used the result $\Omega_d = 2\pi^{d/2}/\Gamma(d/2)$ for the solid angle in d dimensions. The step function Θ(ε) is
perhaps overly formal, but it reminds us that the energy spectrum is bounded from below by ε = 0, i.e.
there are no negative energy states.
For the photon, we have ε(p) = cp, hence σ = 1 and
$$ g(\varepsilon) = \frac{2g\,\pi^{d/2}}{\Gamma(d/2)}\, \frac{\varepsilon^{d-1}}{(hc)^d}\, \Theta(\varepsilon) \ . \tag{5.62} $$
In d = 3 dimensions the degeneracy is g = 2, the number of independent polarization states. The pressure
p(T ) is then obtained using Ω = −pV . We have
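\begin{align}
p(T) &= -k_B T \int_{-\infty}^{\infty}\!\! d\varepsilon\ g(\varepsilon)\, \ln\!\Big(1 - e^{-\varepsilon/k_BT}\Big) \nonumber \\
&= -\frac{2g\,\pi^{d/2}}{\Gamma(d/2)}\, (hc)^{-d}\, k_B T \int_0^\infty\!\! d\varepsilon\ \varepsilon^{d-1} \ln\!\Big(1 - e^{-\varepsilon/k_BT}\Big) \tag{5.63} \\
&= -\frac{2g\,\pi^{d/2}}{\Gamma(d/2)}\, \frac{(k_BT)^{d+1}}{(hc)^d} \int_0^\infty\!\! dt\ t^{d-1} \ln\!\big(1 - e^{-t}\big) \ . \nonumber
\end{align}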

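We can make some progress with the dimensionless integral:
\begin{align}
I_d &\equiv -\int_0^\infty\!\! dt\ t^{d-1} \ln\!\big(1 - e^{-t}\big) \nonumber \\
&= \sum_{n=1}^\infty \frac{1}{n} \int_0^\infty\!\! dt\ t^{d-1}\, e^{-nt} \tag{5.64} \\
&= \Gamma(d) \sum_{n=1}^\infty \frac{1}{n^{d+1}} = \Gamma(d)\, \zeta(d+1) \ . \nonumber
\end{align}
Finally, we invoke a result from the mathematics of the gamma function known as the doubling formula,
$$ \Gamma(z) = \frac{2^{z-1}}{\sqrt{\pi}}\ \Gamma\big(\tfrac{z}{2}\big)\, \Gamma\big(\tfrac{z+1}{2}\big) \ . \tag{5.65} $$
Putting it all together, we find
$$ p(T) = g\, \pi^{-\frac12(d+1)}\, \Gamma\big(\tfrac{d+1}{2}\big)\, \zeta(d+1)\, \frac{(k_BT)^{d+1}}{(\hbar c)^d} \ . \tag{5.66} $$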
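The number density is found to be
\begin{align}
n(T) &= \int_{-\infty}^{\infty}\!\! d\varepsilon\ \frac{g(\varepsilon)}{e^{\varepsilon/k_BT} - 1} \nonumber \\
&= g\, \pi^{-\frac12(d+1)}\, \Gamma\big(\tfrac{d+1}{2}\big)\, \zeta(d) \bigg(\frac{k_BT}{\hbar c}\bigg)^{\!d} \ . \tag{5.67}
\end{align}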

For photons in d = 3 dimensions, we have g = 2 and thus
$$ n(T) = \frac{2\,\zeta(3)}{\pi^2} \bigg(\frac{k_BT}{\hbar c}\bigg)^{\!3} \quad,\quad p(T) = \frac{2\,\zeta(4)}{\pi^2}\, \frac{(k_BT)^4}{(\hbar c)^3} \ . \tag{5.68} $$
It turns out that ζ(4) = π⁴/90.

Note that ħc/k_B = 0.22899 cm·K, so
$$ \frac{k_BT}{\hbar c} = 4.3670\ T[{\rm K}]\ {\rm cm}^{-1} \quad\Longrightarrow\quad n(T) = 20.29 \times T^3[{\rm K}^3]\ {\rm cm}^{-3} \ . \tag{5.69} $$
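The numerical constants in eqn. 5.69 follow from CODATA values of ħ, c and k_B. The short Python sketch below recomputes them, with ζ(3) and ζ(4) summed directly (the truncation at 10⁵ terms is an arbitrary but ample choice), and also confirms ζ(4) = π⁴/90.

```python
import math

# CODATA 2018 values (SI units)
hbar = 1.054571817e-34   # J s
c = 299792458.0          # m / s
kB = 1.380649e-23        # J / K


def zeta(s, terms=100000):
    """Riemann zeta function by direct summation (adequate for s >= 3)."""
    return sum(k ** -s for k in range(1, terms + 1))


hbarc_over_kB_cm = 100.0 * hbar * c / kB                    # in cm K
kT_over_hbarc = 1.0 / hbarc_over_kB_cm                      # cm^-1 per kelvin
n_coeff = 2.0 * zeta(3) / math.pi**2 * kT_over_hbarc**3     # n(T) = n_coeff * T^3 cm^-3

print(f"hbar c / kB   = {hbarc_over_kB_cm:.5f} cm K")
print(f"kB T / hbar c = {kT_over_hbarc:.4f} T[K] cm^-1")
print(f"n(T)          = {n_coeff:.3f} T^3 cm^-3")
print(f"zeta(4) = {zeta(4):.12f}, pi^4/90 = {math.pi**4 / 90:.12f}")
```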
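To find the entropy, we use Gibbs-Duhem:
$$ d\mu = 0 = -s\, dT + v\, dp \quad\Longrightarrow\quad s = v\, \frac{dp}{dT} \ , \tag{5.70} $$
where s is the entropy per particle and v = n⁻¹ is the volume per particle. We then find
$$ s(T) = (d+1)\, \frac{\zeta(d+1)}{\zeta(d)}\, k_B \ . \tag{5.71} $$
The entropy per particle is constant. The internal energy is
$$ E = -\frac{\partial \ln \Xi}{\partial\beta} = -\frac{\partial}{\partial\beta}\big(\beta p V\big) = d\cdot pV \ , \tag{5.72} $$
and hence the energy per particle is
$$ \varepsilon = \frac{E}{N} = d\cdot pv = \frac{d\cdot \zeta(d+1)}{\zeta(d)}\, k_BT \ . \tag{5.73} $$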
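For a numerical check of eqn. 5.64, the Python sketch below integrates −t^{d−1} ln(1 − e^{−t}) with a plain trapezoid rule (the cutoff and step size are arbitrary choices adequate for d ≥ 2) and compares with Γ(d)ζ(d+1). It then evaluates eqns. 5.71 and 5.73 in d = 3, giving an entropy of about 3.60 k_B and a mean energy of about 2.70 k_BT per photon.

```python
import math


def zeta(s, terms=100000):
    """Riemann zeta function by direct summation (adequate for s >= 3)."""
    return sum(k ** -s for k in range(1, terms + 1))


def I_numeric(d, t_max=60.0, steps=60000):
    """I_d = -int_0^inf t^(d-1) ln(1 - e^-t) dt by the trapezoid rule (for d >= 2)."""
    h = t_max / steps
    total = 0.0
    for i in range(1, steps + 1):
        t = i * h
        weight = 0.5 if i == steps else 1.0       # integrand vanishes at t = 0 for d >= 2
        total -= weight * t ** (d - 1) * math.log1p(-math.exp(-t))
    return total * h


for d in (2, 3):
    print(d, I_numeric(d), math.gamma(d) * zeta(d + 1))

s_per_photon = 4 * zeta(4) / zeta(3)        # eqn. 5.71 with d = 3, in units of k_B
e_per_photon = 3 * zeta(4) / zeta(3)        # eqn. 5.73 with d = 3, in units of k_B T
print(f"s = {s_per_photon:.4f} kB, energy = {e_per_photon:.4f} kB T per photon")
```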

5.5.2 Classical arguments for the photon gas

A number of thermodynamic properties of the photon gas can be determined from purely classical argu-
ments. Here we recapitulate a few important ones.

1. Suppose our photon gas is confined to a rectangular box of dimensions Lx ×Ly ×Lz . Suppose further
that the dimensions are all expanded by a factor λ1/3 , i.e. the volume is isotropically expanded by
a factor of λ. The cavity modes of the electromagnetic radiation have quantized wavevectors, even
within classical electromagnetic theory, given by
 
$$ \boldsymbol{k} = \bigg(\frac{2\pi n_x}{L_x}\,,\ \frac{2\pi n_y}{L_y}\,,\ \frac{2\pi n_z}{L_z}\bigg) \ . \tag{5.74} $$

Since the energy for a given mode is ε(k) = ~c|k|, we see that the energy changes by a factor λ−1/3
under an adiabatic volume expansion V → λV , where the distribution of different electromagnetic
mode occupancies remains fixed. Thus,
   
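$$ V \bigg(\frac{\partial E}{\partial V}\bigg)_{\!S} = \lambda \bigg(\frac{\partial E}{\partial \lambda}\bigg)_{\!S} = -\tfrac13 E \ . \tag{5.75} $$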

Thus,
$$ p = -\bigg(\frac{\partial E}{\partial V}\bigg)_{\!S} = \frac{E}{3V} \ , \tag{5.76} $$
as we found in eqn. 5.72. Since E = E(T, V) is extensive, we must have p = p(T) alone.

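2. Since p = p(T) alone, we have
\begin{align}
\bigg(\frac{\partial E}{\partial V}\bigg)_{\!T} &= \bigg(\frac{\partial E}{\partial V}\bigg)_{\!p} = 3p \nonumber \\
&= T \bigg(\frac{\partial p}{\partial T}\bigg)_{\!V} - p \ , \tag{5.77}
\end{align}
where the second line follows from the Maxwell relation $(\partial S/\partial V)_T = (\partial p/\partial T)_V$, after invoking the First Law
dE = T dS − p dV. Thus,
$$ T\, \frac{dp}{dT} = 4p \quad\Longrightarrow\quad p(T) = A\, T^4 \ , \tag{5.78} $$
where A is a constant. Thus, we recover the temperature dependence found microscopically in eqn.
5.66.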

3. Given an energy density E/V, the differential energy flux emitted in a direction θ relative to a
surface normal is
$$ dj_\varepsilon = c \cdot \frac{E}{V} \cdot \cos\theta \cdot \frac{d\Omega}{4\pi} \ , \tag{5.79} $$
where dΩ is the differential solid angle. Thus, the power emitted per unit area is
$$ \frac{dP}{dA} = \frac{cE}{4\pi V} \int_0^{\pi/2}\!\! d\theta \int_0^{2\pi}\!\! d\phi\ \sin\theta \cdot \cos\theta = \frac{cE}{4V} = \tfrac34\, c\, p(T) \equiv \sigma\, T^4 \ , \tag{5.80} $$
where σ = ¾cA, with p(T) = AT⁴ as we found above. From quantum statistical mechanical
considerations, we have
$$ \sigma = \frac{\pi^2 k_B^4}{60\, c^2 \hbar^3} = 5.67 \times 10^{-8}\ \frac{\rm W}{{\rm m}^2\, {\rm K}^4} \tag{5.81} $$
is Stefan's constant.
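Eqn. 5.81 and the relation σ = ¾cA can be checked with CODATA constants. In the Python sketch below, A is taken from the d = 3 photon pressure p = (2ζ(4)/π²)(k_BT)⁴/(ħc)³ = (π²/45)(k_BT)⁴/(ħc)³, which is eqn. 5.68 with ζ(4) = π⁴/90.

```python
import math

hbar = 1.054571817e-34   # J s   (CODATA 2018)
c = 299792458.0          # m / s
kB = 1.380649e-23        # J / K

A = math.pi**2 * kB**4 / (45.0 * (hbar * c) ** 3)              # p(T) = A T^4, in Pa / K^4
sigma_from_A = 0.75 * c * A                                    # sigma = (3/4) c A, eqn. 5.80
sigma_closed = math.pi**2 * kB**4 / (60.0 * c**2 * hbar**3)    # eqn. 5.81

print(f"A = {A:.4e} Pa/K^4")
print(f"sigma = {sigma_from_A:.5e} = {sigma_closed:.5e} W m^-2 K^-4")
```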

5.5.3 Surface temperature of the earth

We derived the result P = σT⁴·A, where σ = 5.67 × 10⁻⁸ W/m²K⁴, for the power emitted by an
electromagnetic ‘black body’. Let's apply this result to the earth-sun system. We'll need three lengths:
the radius of the sun R_⊙ = 6.96 × 10⁸ m, the radius of the earth R_e = 6.38 × 10⁶ m, and the radius of the
earth's orbit a_e = 1.50 × 10¹¹ m. Let's assume that the earth has achieved a steady state temperature
of T_e. We balance the total power incident upon the earth with the power radiated by the earth. The
power incident upon the earth is
$$ P_{\rm incident} = \frac{\pi R_e^2}{4\pi a_e^2} \cdot \sigma T_\odot^4 \cdot 4\pi R_\odot^2 = \frac{R_e^2\, R_\odot^2}{a_e^2} \cdot \pi\sigma\, T_\odot^4 \ . \tag{5.82} $$
4πa2e a2e
5.5. PHOTON STATISTICS 233

Figure 5.2: Spectral density ρε (ν, T ) for blackbody radiation at three temperatures.

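The power radiated by the earth is
$$
P_\text{radiated} = \sigma T_e^4 \cdot 4\pi R_e^2 . \tag{5.83}
$$
Setting $P_\text{incident} = P_\text{radiated}$, we obtain
$$
T_e = \bigg( \frac{R_\odot}{2 a_e} \bigg)^{\!1/2} T_\odot . \tag{5.84}
$$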
Thus, we find $T_e = 0.04817\, T_\odot$, and with $T_\odot = 5780\,$K, we obtain $T_e = 278.4\,$K. The mean surface temperature of the earth is $\bar T_e = 287\,$K, which is only about 10 K higher. The difference is due to the fact that the earth is not a perfect blackbody, i.e. an object which absorbs all radiation incident upon it and emits radiation according to Stefan’s law. As you know, the earth’s atmosphere traps a fraction of the emitted radiation – a phenomenon known as the greenhouse effect.
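The arithmetic of eqns. 5.82 through 5.84 can be reproduced with a few lines of Python. This is a minimal sketch using only the numbers quoted above; it also confirms that the earth radius cancels, since the incident and radiated powers balance at $T_e$ whatever value of $R_e$ is used.

```python
import math

# Values quoted in the text (SI units / kelvin)
R_SUN = 6.96e8      # radius of the sun, m
R_EARTH = 6.38e6    # radius of the earth, m
A_ORBIT = 1.50e11   # radius of the earth's orbit, m
T_SUN = 5780.0      # solar surface temperature, K
SIGMA = 5.67e-8     # Stefan's constant, W m^-2 K^-4


def incident_power(T_sun, R_sun, R_e, a_e):
    """Eqn. 5.82: the earth intercepts a fraction pi R_e^2 / (4 pi a_e^2) of the solar output."""
    return (R_e**2 * R_sun**2 / a_e**2) * math.pi * SIGMA * T_sun**4


def earth_temperature(T_sun, R_sun, a_e):
    """Eqn. 5.84: T_e = (R_sun / 2 a_e)^{1/2} T_sun (R_e has cancelled out)."""
    return math.sqrt(R_sun / (2.0 * a_e)) * T_sun


T_e = earth_temperature(T_SUN, R_SUN, A_ORBIT)
print(f"T_e / T_sun = {T_e / T_SUN:.5f}")  # T_e / T_sun = 0.04817
print(f"T_e = {T_e:.1f} K")                # T_e = 278.4 K

# Consistency check: at T_e the radiated power (eqn. 5.83) balances the incident power
P_rad = SIGMA * T_e**4 * 4.0 * math.pi * R_EARTH**2
print(P_rad / incident_power(T_SUN, R_SUN, R_EARTH, A_ORBIT))  # 1.0 up to round-off
```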

5.5.4 Distribution of blackbody radiation

Recall that the frequency of an electromagnetic wave of wavevector $\mathbf{k}$ is $\nu = c/\lambda = ck/2\pi$, where here $\lambda$ denotes the wavelength. Therefore the number of photons $N(\nu,T)$ per unit frequency in thermodynamic equilibrium is (recall there are two polarization states)
$$
N(\nu,T)\, d\nu = \frac{2V}{8\pi^3} \cdot \frac{d^3\!k}{e^{\hbar c k/k_B T} - 1} = \frac{V}{\pi^2} \cdot \frac{k^2\, dk}{e^{\hbar c k/k_B T} - 1} . \tag{5.85}
$$
We therefore have
$$
N(\nu,T) = \frac{8\pi V}{c^3} \cdot \frac{\nu^2}{e^{h\nu/k_B T} - 1} . \tag{5.86}
$$
Since a photon of frequency $\nu$ carries energy $h\nu$, the energy per unit frequency $E(\nu,T)$ is
$$
E(\nu,T) = \frac{8\pi h V}{c^3} \cdot \frac{\nu^3}{e^{h\nu/k_B T} - 1} . \tag{5.87}
$$

Note what happens if Planck’s constant $h$ vanishes, as it does in the classical limit. The denominator can then be written
$$
e^{h\nu/k_B T} - 1 = \frac{h\nu}{k_B T} + \mathcal{O}(h^2) \tag{5.88}
$$
and
$$
E_\text{CL}(\nu,T) = \lim_{h \to 0} E(\nu) = V \cdot \frac{8\pi k_B T}{c^3}\, \nu^2 . \tag{5.89}
$$
In classical electromagnetic theory, then, the total energy integrated over all frequencies diverges. This is known as the ultraviolet catastrophe, since the divergence comes from the large $\nu$ part of the integral, which in the optical spectrum is the ultraviolet portion. With quantization, the Bose-Einstein factor imposes an effective ultraviolet cutoff $k_B T/h$ on the frequency integral, and the total energy, as we found above, is finite:
$$
E(T) = \int_0^\infty\! d\nu\, E(\nu) = 3pV = V \cdot \frac{\pi^2}{15} \frac{(k_B T)^4}{(\hbar c)^3} . \tag{5.90}
$$
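The finite result in eqn. 5.90 rests on the integral $\int_0^\infty dx\, x^3/(e^x - 1) = \pi^4/15$, obtained by substituting $x = h\nu/k_B T$ in eqn. 5.87. The sketch below checks that integral with Simpson’s rule and compares the resulting energy density with the closed form. The quadrature routine and the truncation at $x = 60$ are choices made for this illustration, not something taken from the notes.

```python
import math

HBAR = 1.054571817e-34  # J s (CODATA 2018)
C_LIGHT = 2.99792458e8  # m / s
K_B = 1.380649e-23      # J / K


def planck_integrand(x: float) -> float:
    """x^3/(e^x - 1); it behaves like x^2 near x = 0, so its value there is 0."""
    return 0.0 if x == 0.0 else x**3 / math.expm1(x)


def simpson(f, a: float, b: float, n: int = 4000) -> float:
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


# The integrand decays like x^3 e^{-x}, so truncating at x = 60 is harmless.
planck_number = simpson(planck_integrand, 0.0, 60.0)


def energy_density(T: float) -> float:
    """E/V = (k_B T)^4 / (pi^2 (hbar c)^3) * int_0^inf x^3/(e^x - 1) dx, in J/m^3."""
    return (K_B * T) ** 4 / (math.pi**2 * (HBAR * C_LIGHT) ** 3) * planck_number


def energy_density_closed_form(T: float) -> float:
    """Eqn. 5.90 divided by V: pi^2 (k_B T)^4 / (15 (hbar c)^3)."""
    return math.pi**2 * (K_B * T) ** 4 / (15.0 * (HBAR * C_LIGHT) ** 3)


print(planck_number, math.pi**4 / 15)
print(energy_density(300.0), energy_density_closed_form(300.0))
```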

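We can define the spectral density $\rho_\varepsilon(\nu)$ of the radiation as
$$
\rho_\varepsilon(\nu,T) \equiv \frac{E(\nu,T)}{E(T)} = \frac{15}{\pi^4} \frac{h}{k_B T} \frac{(h\nu/k_B T)^3}{e^{h\nu/k_B T} - 1} \tag{5.91}
$$
so that $\rho_\varepsilon(\nu,T)\, d\nu$ is the fraction of the electromagnetic energy, under equilibrium conditions, between frequencies $\nu$ and $\nu + d\nu$, i.e. $\int_0^\infty d\nu\, \rho_\varepsilon(\nu,T) = 1$. In fig. 5.2 we plot this for three different temperatures. The maximum occurs when $s \equiv h\nu/k_B T$ satisfies
$$
\frac{d}{ds} \bigg( \frac{s^3}{e^s - 1} \bigg) = 0 \qquad \Longrightarrow \qquad \frac{s}{1 - e^{-s}} = 3 \qquad \Longrightarrow \qquad s = 2.82144 . \tag{5.92}
$$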
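The transcendental condition in eqn. 5.92, rewritten as $s = 3(1 - e^{-s})$, has no closed-form elementary solution. A simple fixed-point iteration converges quickly because the map has slope $3e^{-s} \approx 0.18$ near the root. This is one convenient root-finder among many, chosen for this sketch; the solar temperature used for the peak frequency is the 5780 K quoted in §5.5.3.

```python
import math


def wien_peak(tol: float = 1e-13, max_iter: int = 500) -> float:
    """Solve s = 3 (1 - e^{-s}), the condition in eqn. 5.92, by fixed-point iteration."""
    s = 3.0
    for _ in range(max_iter):
        s_next = 3.0 * (1.0 - math.exp(-s))
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    raise RuntimeError("fixed-point iteration did not converge")


s_max = wien_peak()
print(f"s_max = {s_max:.5f}")  # s_max = 2.82144

# Peak frequency of the blackbody spectrum at T = 5780 K
H = 6.62607015e-34  # Planck's constant, J s (exact SI value)
K_B = 1.380649e-23  # J / K
nu_peak = s_max * K_B * 5780.0 / H
print(f"nu_peak = {nu_peak:.3e} Hz")
```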

5.5.5 What if the sun emitted ferromagnetic spin waves?

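We saw in eqn. 5.80 that the power emitted per unit surface area by a blackbody is $\sigma T^4$. The power law here follows from the ultrarelativistic dispersion $\varepsilon = \hbar c k$ of the photons. Suppose that we replace this dispersion with the general form $\varepsilon = \varepsilon(\mathbf{k})$. Now consider a large box in equilibrium at temperature $T$. The energy current incident on a differential area $dA$ of surface normal to $\hat{\mathbf{z}}$ is
$$
dP = dA \cdot g \int\!\! \frac{d^3\!k}{(2\pi)^3}\, \Theta(\cos\theta) \cdot \varepsilon(\mathbf{k}) \cdot \frac{1}{\hbar} \frac{\partial \varepsilon(\mathbf{k})}{\partial k_z} \cdot \frac{1}{e^{\varepsilon(\mathbf{k})/k_B T} - 1} , \tag{5.93}
$$
where $g$ is the number of internal (e.g. polarization) states per wavevector. Let us assume an isotropic power law dispersion of the form $\varepsilon(\mathbf{k}) = C k^\alpha$. Then after a straightforward calculation we obtain
$$
\frac{dP}{dA} = \sigma\, T^{2 + \frac{2}{\alpha}} , \tag{5.94}
$$
where
$$
\sigma = \zeta\big(2 + \tfrac{2}{\alpha}\big)\, \Gamma\big(2 + \tfrac{2}{\alpha}\big) \cdot \frac{g\, k_B^{2 + \frac{2}{\alpha}}\, C^{-\frac{2}{\alpha}}}{8\pi^2 \hbar} . \tag{5.95}
$$
One can check that for $g = 2$, $C = \hbar c$, and $\alpha = 1$ this result reduces to that of eqn. 5.81.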
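The reduction claimed for eqn. 5.95 can be verified numerically. Python’s `math` module has `gamma` but no Riemann zeta function, so this sketch supplies a small `zeta` helper of its own (a partial sum plus an Euler-Maclaurin tail correction). That helper is an assumption of the example, not a library routine.

```python
import math

HBAR = 1.054571817e-34  # J s (CODATA 2018)
C_LIGHT = 2.99792458e8  # m / s
K_B = 1.380649e-23      # J / K


def zeta(s: float, n_terms: int = 1000) -> float:
    """Riemann zeta for s > 1: partial sum plus Euler-Maclaurin tail correction."""
    N = n_terms
    partial = sum(m ** (-s) for m in range(1, N + 1))
    return partial + N ** (1.0 - s) / (s - 1.0) - 0.5 * N ** (-s) + s * N ** (-s - 1.0) / 12.0


def generalized_sigma(g: float, C: float, alpha: float) -> float:
    """Coefficient of T^(2 + 2/alpha) in dP/dA for eps(k) = C k^alpha, eqn. 5.95."""
    s = 2.0 + 2.0 / alpha
    return zeta(s) * math.gamma(s) * g * K_B**s * C ** (-2.0 / alpha) / (8.0 * math.pi**2 * HBAR)


sigma_general = generalized_sigma(g=2, C=HBAR * C_LIGHT, alpha=1)
sigma_stefan = math.pi**2 * K_B**4 / (60.0 * C_LIGHT**2 * HBAR**3)
print(sigma_general / sigma_stefan)  # should equal 1 up to round-off
```

For a quadratic dispersion ($\alpha = 2$) the same function gives the prefactor of a $T^3$ law.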
5.6. LATTICE VIBRATIONS : EINSTEIN AND DEBYE MODELS 235

5.6 Lattice Vibrations : Einstein and Debye Models

Crystalline solids support propagating waves called phonons, which are quantized vibrations of the lattice. Recall that the quantum mechanical Hamiltonian for a single harmonic oscillator, $\hat H = \frac{p^2}{2m} + \frac{1}{2} m \omega_0^2 q^2$, may be written as $\hat H = \hbar\omega_0 (a^\dagger a + \frac{1}{2})$, where $a$ and $a^\dagger$ are ‘ladder operators’ satisfying commutation relations $[a, a^\dagger] = 1$.

5.6.1 One-dimensional chain

Consider the linear chain of masses and springs depicted in fig. 5.3. We assume that our system consists of $N$ mass points on a large ring of circumference $L$. In equilibrium, the masses are spaced evenly by a distance $b = L/N$. That is, $x_n^0 = nb$ is the equilibrium position of particle $n$. We define $u_n = x_n - x_n^0$ to be the difference between the position of mass $n$ and its equilibrium position. The Hamiltonian is then
$$
\begin{aligned}
\hat H &= \sum_n \bigg[ \frac{p_n^2}{2m} + \frac{1}{2} \kappa\, (x_{n+1} - x_n - a)^2 \bigg] \\
&= \sum_n \bigg[ \frac{p_n^2}{2m} + \frac{1}{2} \kappa\, (u_{n+1} - u_n)^2 \bigg] + \tfrac{1}{2} N \kappa\, (b - a)^2 ,
\end{aligned} \tag{5.96}
$$
where $a$ is the unstretched length of each spring, $m$ is the mass of each mass point, $\kappa$ is the force constant of each spring, and $N$ is the total number of mass points. If $b \ne a$ the springs are under tension in equilibrium, but since the cross term $\kappa (b - a) \sum_n (u_{n+1} - u_n)$ vanishes on the ring, this only leads to an additive constant in the Hamiltonian, and hence does not enter the equations of motion.
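The classical equations of motion are
$$
\begin{aligned}
\dot u_n &= \frac{\partial \hat H}{\partial p_n} = \frac{p_n}{m} \\
\dot p_n &= -\frac{\partial \hat H}{\partial u_n} = \kappa\, \big( u_{n+1} + u_{n-1} - 2u_n \big) .
\end{aligned} \tag{5.97}
$$
Taking the time derivative of the first equation and substituting into the second yields
$$
\ddot u_n = \frac{\kappa}{m} \big( u_{n+1} + u_{n-1} - 2u_n \big) . \tag{5.98}
$$
We now write
$$
u_n = \frac{1}{\sqrt{N}} \sum_k \tilde u_k\, e^{ikna} , \tag{5.99}
$$
where periodicity $u_{N+n} = u_n$ requires that the $k$ values are quantized so that $e^{ikNa} = 1$, i.e. $k = 2\pi j/Na$ where $j \in \{0, 1, \ldots, N-1\}$. The inverse of this discrete Fourier transform is
$$
\tilde u_k = \frac{1}{\sqrt{N}} \sum_n u_n\, e^{-ikna} . \tag{5.100}
$$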

Figure 5.3: A linear chain of masses and springs. The black circles represent the equilibrium positions
of the masses. The displacement of mass n relative to its equilibrium value is un .

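Note that $\tilde u_k$ is in general complex, but that $\tilde u_k^* = \tilde u_{-k}$. In terms of the $\tilde u_k$, the equations of motion take the form
$$
\ddot{\tilde u}_k = -\frac{2\kappa}{m} \big( 1 - \cos(ka) \big)\, \tilde u_k \equiv -\omega_k^2\, \tilde u_k . \tag{5.101}
$$
Thus, each $\tilde u_k$ is a normal mode, and the normal mode frequencies are
$$
\omega_k = 2 \sqrt{\frac{\kappa}{m}}\, \big| \sin\!\big( \tfrac{1}{2} ka \big) \big| . \tag{5.102}
$$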
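Eqn. 5.102 can be checked by brute force: write eqn. 5.98 as $\ddot u = -(\kappa/m) K u$ for a periodic $N \times N$ matrix $K$, diagonalize $K$ numerically, and compare with the analytic frequencies at the allowed wavevectors $k = 2\pi j/Na$. The sketch below uses NumPy’s `eigvalsh`; the parameter values are arbitrary illustrative choices.

```python
import numpy as np


def chain_frequencies_numeric(N: int, kappa: float = 1.0, m: float = 1.0) -> np.ndarray:
    """Normal-mode frequencies of N masses on a periodic ring, from eqn. 5.98.

    Eqn. 5.98 reads u_n'' = -(kappa/m) (K u)_n with (K u)_n = 2u_n - u_{n+1} - u_{n-1}.
    """
    K = 2.0 * np.eye(N)
    n = np.arange(N)
    K[n, (n + 1) % N] -= 1.0
    K[n, (n - 1) % N] -= 1.0
    omega_sq = (kappa / m) * np.linalg.eigvalsh(K)  # ascending order
    return np.sqrt(np.clip(omega_sq, 0.0, None))  # clip tiny negative round-off at k = 0


def chain_frequencies_analytic(N: int, kappa: float = 1.0, m: float = 1.0, a: float = 1.0) -> np.ndarray:
    """Eqn. 5.102 evaluated at the allowed wavevectors k = 2 pi j / (N a)."""
    k = 2.0 * np.pi * np.arange(N) / (N * a)
    return 2.0 * np.sqrt(kappa / m) * np.abs(np.sin(0.5 * k * a))


N = 12
numeric = chain_frequencies_numeric(N, kappa=2.0, m=0.5)
analytic = np.sort(chain_frequencies_analytic(N, kappa=2.0, m=0.5))
print(np.max(np.abs(numeric**2 - analytic**2)))  # agreement of omega^2 to round-off
```

Comparing $\omega^2$ rather than $\omega$ avoids amplifying the round-off in the zero mode at $k = 0$ through the square root.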
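The density of states for this band of phonon excitations is
$$
\begin{aligned}
g(\varepsilon) &= \int_{-\pi/a}^{\pi/a} \frac{dk}{2\pi}\, \delta(\varepsilon - \hbar\omega_k) \\
&= \frac{2}{\pi a} \big( J^2 - \varepsilon^2 \big)^{-1/2}\, \Theta(\varepsilon)\, \Theta(J - \varepsilon) ,
\end{aligned} \tag{5.103}
$$
where $J = 2\hbar \sqrt{\kappa/m}$ is the phonon bandwidth. The step functions require $0 \le \varepsilon \le J$; outside this range there are no phonon energy levels and the density of states accordingly vanishes.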
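A direct way to test eqn. 5.103 is through its integral. Integrating $g(\varepsilon)$ from 0 to $\varepsilon$ gives $(2/\pi a)\arcsin(\varepsilon/J)$ states per unit length. This should match a plain count of the allowed wavevectors whose energy $J|\sin(ka/2)|$ lies below $\varepsilon$, divided by the ring length $L = Na$. The sketch below performs that count; units with $J = a = 1$ are an arbitrary choice.

```python
import numpy as np


def integrated_dos(eps, J=1.0, a=1.0):
    """Integral of eqn. 5.103 from 0 to eps: (2 / pi a) arcsin(eps / J), states per unit length."""
    return (2.0 / (np.pi * a)) * np.arcsin(eps / J)


def counted_states(eps, N=200_000, J=1.0, a=1.0):
    """Count the N allowed k = 2 pi j / (N a) with J |sin(ka/2)| < eps, divided by L = N a.

    k in [0, 2 pi / a) covers the same states as the first Brillouin zone, because
    |sin(ka/2)| has period 2 pi / a.
    """
    k = 2.0 * np.pi * np.arange(N) / (N * a)
    energies = J * np.abs(np.sin(0.5 * k * a))
    return np.count_nonzero(energies < eps) / (N * a)


for eps in (0.1, 0.5, 0.9, 0.99):
    print(eps, counted_states(eps), integrated_dos(eps))
```

The two columns agree to within the discreteness of the $k$ grid, of order $1/(Na)$. The agreement includes the approach to the band edge, where the inverse-square-root singularity of $g(\varepsilon)$ lives.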
 
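The entire theory can be quantized, taking $[p_n, u_{n'}] = -i\hbar\, \delta_{nn'}$. We then define
$$
p_n = \frac{1}{\sqrt{N}} \sum_k \tilde p_k\, e^{ikna} \quad , \quad \tilde p_k = \frac{1}{\sqrt{N}} \sum_n p_n\, e^{-ikna} , \tag{5.104}
$$
in which case $[\tilde p_k, \tilde u_{k'}] = -i\hbar\, \delta_{k,-k'}$, or equivalently $[\tilde p_k, \tilde u_{k'}^\dagger] = -i\hbar\, \delta_{kk'}$. Note that $\tilde u_k^\dagger = \tilde u_{-k}$ and $\tilde p_k^\dagger = \tilde p_{-k}$. We then define the ladder operator
$$
a_k = \bigg( \frac{1}{2m\hbar\omega_k} \bigg)^{\!1/2} \tilde p_k - i \bigg( \frac{m\omega_k}{2\hbar} \bigg)^{\!1/2} \tilde u_k \tag{5.105}
$$
and its Hermitean conjugate $a_k^\dagger$, in terms of which the Hamiltonian is
$$
\hat H = \sum_k \hbar\omega_k \big( a_k^\dagger a_k + \tfrac{1}{2} \big) , \tag{5.106}
$$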

which is a sum over independent harmonic oscillator modes. Note that the sum over $k$ is restricted to an interval of width $2\pi/a$, e.g. $k \in \big[ -\frac{\pi}{a}, \frac{\pi}{a} \big)$, which is the first Brillouin zone for the one-dimensional chain structure. The state at wavevector $k + \frac{2\pi}{a}$ is identical to that at $k$, as we see from eqn. 5.100.

Figure 5.4: A crystal structure with an underlying square Bravais lattice and a three element basis.

5.6.2 General theory of lattice vibrations

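The most general model of a harmonic solid is described by a Hamiltonian of the form
$$
\hat H = \sum_{\mathbf{R},i} \frac{\mathbf{p}_i^2(\mathbf{R})}{2M_i} + \frac{1}{2} \sum_{i,j} \sum_{\alpha,\beta} \sum_{\mathbf{R},\mathbf{R}'} u_i^\alpha(\mathbf{R})\, \Phi_{ij}^{\alpha\beta}(\mathbf{R} - \mathbf{R}')\, u_j^\beta(\mathbf{R}') , \tag{5.107}
$$
where the dynamical matrix is
$$
\Phi_{ij}^{\alpha\beta}(\mathbf{R} - \mathbf{R}') = \frac{\partial^2 U}{\partial u_i^\alpha(\mathbf{R})\, \partial u_j^\beta(\mathbf{R}')} , \tag{5.108}
$$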
where $U$ is the potential energy of interaction among all the atoms. Here we have simply expanded the potential to second order in the local displacements $u_i^\alpha(\mathbf{R})$. The lattice sites $\mathbf{R}$ are elements of a Bravais lattice. The indices $i$ and $j$ specify basis elements with respect to this lattice, and the indices $\alpha$ and $\beta$ range over $\{1, \ldots, d\}$, the number of possible directions in space. The subject of crystallography is beyond the scope of these notes, but, very briefly, a Bravais lattice in $d$ dimensions is specified by a set of $d$ linearly independent primitive direct lattice vectors $\mathbf{a}_l$, such that any point in the Bravais lattice may be written as a sum over the primitive vectors with integer coefficients: $\mathbf{R} = \sum_{l=1}^d n_l\, \mathbf{a}_l$. The set of all such vectors $\{\mathbf{R}\}$ is called the direct lattice. The direct lattice is closed under the operation of vector addition: if $\mathbf{R}$ and $\mathbf{R}'$ are points in a Bravais lattice, then so is $\mathbf{R} + \mathbf{R}'$.
A crystal is a periodic arrangement of lattice sites. The fundamental repeating unit is called the unit cell. Not every crystal is a Bravais lattice, however. Indeed, Bravais lattices are special crystals in which there is only one atom per unit cell. Consider, for example, the structure in fig. 5.4. The blue dots form a square Bravais lattice with primitive direct lattice vectors $\mathbf{a}_1 = a\,\hat{\mathbf{x}}$ and $\mathbf{a}_2 = a\,\hat{\mathbf{y}}$, where $a$ is the lattice constant, which is the distance between any neighboring pair of blue dots. The red squares and green triangles, along with the blue dots, form a basis for the crystal structure, each basis element labeling one sublattice. Our crystal in fig. 5.4 is formally classified as a square Bravais lattice with a three element basis. To specify an arbitrary site in the crystal, we must specify both a direct lattice vector $\mathbf{R}$ as well as a basis index $j \in \{1, \ldots, r\}$, so that the location is $\mathbf{R} + \boldsymbol{\eta}_j$. The vectors $\{\boldsymbol{\eta}_j\}$ are the basis vectors for our crystal structure. We see that a general crystal structure consists of a repeating unit, known as a unit cell. The centers (or corners, if one prefers) of the unit cells form a Bravais lattice. Within a given unit cell, the individual sublattice sites are located at positions $\boldsymbol{\eta}_j$ with respect to the unit cell position $\mathbf{R}$.
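Upon diagonalization, the Hamiltonian of eqn. 5.107 takes the form
$$
\hat H = \sum_{\mathbf{k},a} \hbar\omega_a(\mathbf{k}) \Big( A_a^\dagger(\mathbf{k})\, A_a(\mathbf{k}) + \tfrac{1}{2} \Big) , \tag{5.109}
$$
where
$$
\big[ A_a(\mathbf{k}) \,,\, A_b^\dagger(\mathbf{k}') \big] = \delta_{ab}\, \delta_{\mathbf{k}\mathbf{k}'} . \tag{5.110}
$$
The eigenfrequencies are solutions to the eigenvalue equation
$$
\sum_{j,\beta} \tilde\Phi_{ij}^{\alpha\beta}(\mathbf{k})\, e_{j\beta}^{(a)}(\mathbf{k}) = M_i\, \omega_a^2(\mathbf{k})\, e_{i\alpha}^{(a)}(\mathbf{k}) , \tag{5.111}
$$
where
$$
\tilde\Phi_{ij}^{\alpha\beta}(\mathbf{k}) = \sum_{\mathbf{R}} \Phi_{ij}^{\alpha\beta}(\mathbf{R})\, e^{-i\mathbf{k}\cdot\mathbf{R}} . \tag{5.112}
$$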

Here, $\mathbf{k}$ lies within the first Brillouin zone, which is the unit cell of the reciprocal lattice of points $\mathbf{G}$ satisfying $e^{i\mathbf{G}\cdot\mathbf{R}} = 1$ for all $\mathbf{G}$ and $\mathbf{R}$. The reciprocal lattice is also a Bravais lattice, with primitive reciprocal lattice vectors $\mathbf{b}_l$, such that any point on the reciprocal lattice may be written $\mathbf{G} = \sum_{l=1}^d m_l\, \mathbf{b}_l$. One also has that $\mathbf{a}_l \cdot \mathbf{b}_{l'} = 2\pi\delta_{ll'}$. The index $a$ ranges from 1 to $d \cdot r$ and labels the mode of oscillation at wavevector $\mathbf{k}$. The vector $e_{i\alpha}^{(a)}(\mathbf{k})$ is the polarization vector for the $a^\text{th}$ phonon branch. In solids of high symmetry, phonon modes can be classified as longitudinal or transverse excitations.
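The condition $\mathbf{a}_l \cdot \mathbf{b}_{l'} = 2\pi\delta_{ll'}$ fixes the reciprocal vectors. If the $\mathbf{a}_l$ are stored as the rows of a matrix $A$, the $\mathbf{b}_l$ are the rows of $2\pi (A^{-1})^{\mathsf T}$. The Python sketch below applies this to a triangular lattice, a hypothetical example chosen only for illustration, and checks that $e^{i\mathbf{G}\cdot\mathbf{R}} = 1$ for an arbitrary pair of lattice vectors.

```python
import numpy as np


def reciprocal_vectors(direct):
    """Given primitive vectors a_l as the rows of `direct`, return the rows b_l
    satisfying a_l . b_l' = 2 pi delta_{l l'}."""
    A = np.asarray(direct, dtype=float)
    return 2.0 * np.pi * np.linalg.inv(A).T


# Hypothetical example: a triangular Bravais lattice with unit lattice constant
a_vecs = np.array([[1.0, 0.0], [0.5, np.sqrt(3.0) / 2.0]])
b_vecs = reciprocal_vectors(a_vecs)
print(a_vecs @ b_vecs.T / (2.0 * np.pi))  # identity matrix, up to round-off

# Any G = m1 b1 + m2 b2 and R = n1 a1 + n2 a2 satisfy exp(i G.R) = 1
G = 3 * b_vecs[0] - 2 * b_vecs[1]
R = -1 * a_vecs[0] + 4 * a_vecs[1]
print(np.exp(1j * (G @ R)))  # approximately 1
```

The same function works unchanged in $d = 3$, where the rows of the result are the familiar cross-product expressions for $\mathbf{b}_l$.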
For a crystalline lattice with an $r$-element basis, there are then $d \cdot r$ phonon modes for each wavevector $\mathbf{k}$ lying in the first Brillouin zone. If we impose periodic boundary conditions, then the $\mathbf{k}$ points within the first Brillouin zone are themselves quantized, as in the $d = 1$ case where we found $k = 2\pi j/Na$. There are $N$ distinct $\mathbf{k}$ points in the first Brillouin zone – one for every direct lattice site. The total number of modes is then $d \cdot r \cdot N$, which is the total number of translational degrees of freedom in our system: $rN$ total atoms ($N$ unit cells each with an $r$ atom basis) each free to vibrate in $d$ dimensions. Of the $d \cdot r$ branches of phonon excitations, $d$ of them will be acoustic modes whose frequency vanishes as $\mathbf{k} \to 0$. The remaining $d(r-1)$ branches are optical modes and oscillate at finite frequencies. Basically, in an acoustic mode, for $\mathbf{k}$ close to the (Brillouin) zone center $\mathbf{k} = 0$, all the atoms in each unit cell move together in the same direction at any moment of time. In an optical mode, the different basis atoms move in different directions.
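The acoustic/optical distinction can be seen in the simplest case with a basis, $d = 1$ and $r = 2$. The model here is a hypothetical chain of alternating masses $M_1$ and $M_2$ joined by identical springs $\kappa$, not a system treated in the notes. For it, eqn. 5.112 gives $\tilde\Phi_{11} = \tilde\Phi_{22} = 2\kappa$ and $\tilde\Phi_{12}(k) = -\kappa\,(1 + e^{-ika})$. Rescaling $e \to M^{1/2} e$ turns the generalized eigenproblem of eqn. 5.111 into an ordinary Hermitian one. The sketch finds $d \cdot r = 2$ branches: $d = 1$ acoustic branch that vanishes at $k = 0$, and $d(r-1) = 1$ optical branch at $\omega = [2\kappa(1/M_1 + 1/M_2)]^{1/2}$.

```python
import numpy as np


def diatomic_chain_frequencies(k, M1, M2, kappa=1.0, a=1.0):
    """Phonon frequencies at wavevector k for a hypothetical 1D chain with a
    two-atom basis (masses M1, M2) and identical nearest-neighbour springs kappa;
    a is the length of the unit cell.

    Eqn. 5.111 is the generalized eigenproblem Phi~(k) e = M omega^2 e; rescaling
    e -> M^{1/2} e turns it into an ordinary Hermitian eigenproblem.
    """
    phi_12 = -kappa * (1.0 + np.exp(-1j * k * a))  # Phi~_12(k) from eqn. 5.112
    phi = np.array([[2.0 * kappa, phi_12],
                    [np.conj(phi_12), 2.0 * kappa]])
    m_inv_sqrt = np.diag([1.0 / np.sqrt(M1), 1.0 / np.sqrt(M2)])
    omega_sq = np.linalg.eigvalsh(m_inv_sqrt @ phi @ m_inv_sqrt)
    return np.sqrt(np.clip(omega_sq, 0.0, None))  # [acoustic, optical]


M1, M2, kappa = 1.0, 3.0, 1.0
print(diatomic_chain_frequencies(0.0, M1, M2, kappa))    # acoustic ~ 0, optical finite
print(diatomic_chain_frequencies(np.pi, M1, M2, kappa))  # zone boundary k = pi/a (a = 1)
```

At the zone boundary the two sublattices decouple and the frequencies become $(2\kappa/M_2)^{1/2}$ and $(2\kappa/M_1)^{1/2}$.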
There is no number conservation law for phonons – they may be freely created or destroyed in anharmonic processes, where two phonons with wavevectors $\mathbf{k}$ and $\mathbf{q}$ can combine into a single phonon with wavevector $\mathbf{k} + \mathbf{q}$, and vice versa. Therefore the chemical potential for phonons is $\mu = 0$. We define the density of states $g_a(\omega)$ for the $a^\text{th}$ phonon mode as
$$
g_a(\omega) = \frac{1}{N} \sum_{\mathbf{k}} \delta\big( \omega - \omega_a(\mathbf{k}) \big) = V_0 \int\limits_{\text{BZ}} \frac{d^d k}{(2\pi)^d}\, \delta\big( \omega - \omega_a(\mathbf{k}) \big) , \tag{5.113}
$$

where $N$ is the number of unit cells, $V_0$ is the unit cell volume of the direct lattice, and the $\mathbf{k}$ sum and integral are over the first Brillouin zone only. Note that $\omega$ here has dimensions of frequency. Each function $g_a(\omega)$ is normalized to unity:
$$
\int_0^\infty\! d\omega\, g_a(\omega) = 1 . \tag{5.114}
$$
The total phonon density of states per unit cell is given by³
$$
g(\omega) = \sum_{a=1}^{dr} g_a(\omega) . \tag{5.115}
$$

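The grand potential for the phonon gas is
$$
\begin{aligned}
\Omega(T,V) &= -k_B T \ln \prod_{\mathbf{k},a} \sum_{n_a(\mathbf{k})=0}^\infty e^{-\beta\hbar\omega_a(\mathbf{k}) \big( n_a(\mathbf{k}) + \frac{1}{2} \big)} \\
&= k_B T \sum_{\mathbf{k},a} \ln \bigg[ 2 \sinh\!\bigg( \frac{\hbar\omega_a(\mathbf{k})}{2 k_B T} \bigg) \bigg] \\
&= N k_B T \int_0^\infty\! d\omega\, g(\omega)\, \ln \bigg[ 2 \sinh\!\bigg( \frac{\hbar\omega}{2 k_B T} \bigg) \bigg] .
\end{aligned} \tag{5.116}
$$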

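Note that $V = N V_0$ since there are $N$ unit cells, each of volume $V_0$. The entropy is given by $S = -\big( \frac{\partial\Omega}{\partial T} \big)_V$ and thus the heat capacity is
$$
C_V = -T \frac{\partial^2 \Omega}{\partial T^2} = N k_B \int_0^\infty\! d\omega\, g(\omega) \bigg( \frac{\hbar\omega}{2 k_B T} \bigg)^{\!2} \operatorname{csch}^2\!\bigg( \frac{\hbar\omega}{2 k_B T} \bigg) . \tag{5.117}
$$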
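The step from eqn. 5.116 to eqn. 5.117 is easy to check numerically for a single mode. Take the mode’s contribution $k_B T \ln[2\sinh(\hbar\omega/2k_B T)]$, differentiate it twice in $T$ by finite differences, and compare $-T\,\partial^2\Omega/\partial T^2$ with the closed form $(x \operatorname{csch} x)^2$, where $x = \hbar\omega/2k_B T$. Units with $k_B = \hbar\omega = 1$ and the step size `dT` are choices made for this sketch.

```python
import math


def mode_grand_potential(T, hw=1.0):
    """One mode of eqn. 5.116: k_B T ln[2 sinh(hbar w / 2 k_B T)], in units with k_B = 1."""
    return T * math.log(2.0 * math.sinh(hw / (2.0 * T)))


def mode_heat_capacity(T, hw=1.0):
    """Integrand of eqn. 5.117 for a single mode: (x csch x)^2 with x = hbar w / 2 k_B T."""
    x = hw / (2.0 * T)
    return (x / math.sinh(x)) ** 2


def heat_capacity_by_differencing(T, hw=1.0, dT=1e-4):
    """C = -T d^2(Omega)/dT^2, using a centred second difference."""
    second = (mode_grand_potential(T + dT, hw) - 2.0 * mode_grand_potential(T, hw)
              + mode_grand_potential(T - dT, hw)) / dT**2
    return -T * second


for T in (0.1, 0.5, 2.0, 10.0):
    print(T, heat_capacity_by_differencing(T), mode_heat_capacity(T))
```

At high temperature each mode contributes $k_B$, which is the origin of the Dulong-Petit limit discussed next.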


Note that as $T \to \infty$ we have $\operatorname{csch}\big( \frac{\hbar\omega}{2 k_B T} \big) \to \frac{2 k_B T}{\hbar\omega}$, and therefore
$$
\lim_{T \to \infty} C_V(T) = N k_B \int_0^\infty\! d\omega\, g(\omega) = r d N k_B . \tag{5.118}
$$
This is the classical Dulong-Petit limit of $\frac{1}{2} k_B$ per quadratic degree of freedom; there are $rN$ atoms moving in $d$ dimensions, hence $d \cdot rN$ positions and an equal number of momenta, resulting in a high temperature limit of $C_V = r d N k_B$.
³ Note the dimensions of $g(\omega)$ are $(\text{frequency})^{-1}$. By contrast, the dimensions of $g(\varepsilon)$ in eqn. 5.29 are $(\text{energy})^{-1} \cdot (\text{volume})^{-1}$. The difference lies in a factor of $V_0 \cdot \hbar$, where $V_0$ is the unit cell volume.

Figure 5.5: Upper panel: phonon spectrum in elemental rhodium (Rh) at T = 297 K measured by high
precision inelastic neutron scattering (INS) by A. Eichler et al., Phys. Rev. B 57, 324 (1998). Note the
three acoustic branches and no optical branches, corresponding to d = 3 and r = 1. Lower panel: phonon
spectrum in gallium arsenide (GaAs) at T = 12 K, comparing theoretical lattice-dynamical calculations
with INS results of D. Strauch and B. Dorner, J. Phys.: Condens. Matter 2, 1457 (1990). Note the three
acoustic branches and three optical branches, corresponding to d = 3 and r = 2. The Greek letters along
the x-axis indicate points of high symmetry in the Brillouin zone.

5.6.3 Einstein and Debye models

Historically, two models of lattice vibrations have received wide attention. First is the so-called Einstein model, in which there is no dispersion to the individual phonon modes. We approximate $g_a(\omega) \approx \delta(\omega - \omega_a)$, in which case
$$
C_V(T) = N k_B \sum_a \bigg( \frac{\hbar\omega_a}{2 k_B T} \bigg)^{\!2} \operatorname{csch}^2\!\bigg( \frac{\hbar\omega_a}{2 k_B T} \bigg) . \tag{5.119}
$$
At low temperatures, the contribution from each branch vanishes exponentially, since $\operatorname{csch}^2\big( \frac{\hbar\omega_a}{2 k_B T} \big) \simeq 4\, e^{-\hbar\omega_a/k_B T} \to 0$. Real solids don’t behave this way.
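The exponential freeze-out in eqn. 5.119 can be made concrete. For one branch, with $t = k_B T/\hbar\omega_a$, the exact expression is compared below against the low-temperature form $t^{-2} e^{-1/t}$ that follows from $\operatorname{csch}^2 x \simeq 4e^{-2x}$. This is a minimal sketch; the sample temperatures are arbitrary.

```python
import math


def einstein_cv(t):
    """C_V / (N k_B) for a single Einstein branch, eqn. 5.119, with t = k_B T / hbar w_a."""
    x = 1.0 / (2.0 * t)
    return (x / math.sinh(x)) ** 2


def einstein_cv_low_t(t):
    """Low-T form from csch^2(x) ~ 4 e^{-2x}: (1/t)^2 exp(-1/t)."""
    return math.exp(-1.0 / t) / t**2


for t in (0.02, 0.05, 0.1, 1.0, 10.0):
    print(f"t = {t:5.2f}   C/Nk_B = {einstein_cv(t):.4e}   low-T form = {einstein_cv_low_t(t):.4e}")
```

At $t = 0.05$ the heat capacity per branch is already below $10^{-5}$, whereas measured solids show only a power-law decrease.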
A more realistic model, due to Debye, accounts for the low-lying acoustic phonon branches. Since the acoustic phonon dispersion vanishes linearly with $|\mathbf{k}|$ as $\mathbf{k} \to 0$, there is no temperature at which the acoustic phonons ‘freeze out’ exponentially, as in the case of Einstein phonons. Indeed, the Einstein model is appropriate in describing the $d(r-1)$ optical phonon branches, though it fails miserably for the acoustic branches.
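In the vicinity of the zone center $\mathbf{k} = 0$ (also called $\Gamma$ in crystallographic notation) the $d$ acoustic modes obey a linear dispersion, with $\omega_a(\mathbf{k}) = c_a(\hat{\mathbf{k}})\, k$. This results in an acoustic phonon density of states in $d = 3$ dimensions of
$$
\begin{aligned}
\tilde g(\omega) &= \frac{V_0\, \omega^2}{2\pi^2} \sum_a \int\! \frac{d\hat{\mathbf{k}}}{4\pi} \frac{1}{c_a^3(\hat{\mathbf{k}})}\, \Theta(\omega_D - \omega) \\
&= \frac{3 V_0}{2\pi^2 \bar c^3}\, \omega^2\, \Theta(\omega_D - \omega) ,
\end{aligned} \tag{5.120}
$$
where $\bar c$ is an average acoustic phonon velocity (i.e. speed of sound) defined by
$$
\frac{3}{\bar c^3} = \sum_a \int\! \frac{d\hat{\mathbf{k}}}{4\pi} \frac{1}{c_a^3(\hat{\mathbf{k}})} \tag{5.121}
$$
and $\omega_D$ is a cutoff known as the Debye frequency. The cutoff is necessary because the phonon branch does not extend forever, but only to the boundaries of the Brillouin zone. Thus, $\omega_D$ should roughly equal the frequency of a zone boundary phonon. Alternatively, we can define $\omega_D$ by the normalization condition
$$
\int_0^\infty\! d\omega\, \tilde g(\omega) = 3 \qquad \Longrightarrow \qquad \omega_D = (6\pi^2/V_0)^{1/3}\, \bar c . \tag{5.122}
$$
This allows us to write $\tilde g(\omega) = (9\omega^2/\omega_D^3)\, \Theta(\omega_D - \omega)$.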
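The specific heat due to the acoustic phonons is then
$$
\begin{aligned}
C_V(T) &= \frac{9 N k_B}{\omega_D^3} \int_0^{\omega_D}\! d\omega\, \omega^2 \bigg( \frac{\hbar\omega}{2 k_B T} \bigg)^{\!2} \operatorname{csch}^2\!\bigg( \frac{\hbar\omega}{2 k_B T} \bigg) \\
&= 9 N k_B \bigg( \frac{2T}{\Theta_D} \bigg)^{\!3} \phi\big( \Theta_D/2T \big) ,
\end{aligned} \tag{5.123}
$$
where $\Theta_D = \hbar\omega_D/k_B$ is the Debye temperature and
$$
\phi(x) = \int_0^x\! dt\, t^4 \operatorname{csch}^2 t = \begin{cases} \frac{1}{3} x^3 & x \to 0 \\[4pt] \frac{\pi^4}{30} & x \to \infty . \end{cases} \tag{5.124}
$$
Therefore,
$$
C_V(T) = \begin{cases} \dfrac{12\pi^4}{5}\, N k_B \bigg( \dfrac{T}{\Theta_D} \bigg)^{\!3} & T \ll \Theta_D \\[10pt] 3 N k_B & T \gg \Theta_D . \end{cases} \tag{5.125}
$$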
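The Debye function of eqn. 5.124 has no elementary closed form at intermediate $x$, but it is easy to evaluate by quadrature. The sketch below computes $C_V/Nk_B$ from eqn. 5.123 and compares it with the $T^3$ law and the Dulong-Petit value of eqn. 5.125. Simpson’s rule and the cutoff at $t = 60$ are choices made for this illustration.

```python
import math


def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def phi(x):
    """phi(x) = int_0^x t^4 csch^2 t dt, eqn. 5.124.

    The integrand ~ t^2 near t = 0 and ~ 4 t^4 e^{-2t} at large t, so cutting the
    range off at t = 60 changes nothing at double precision.
    """
    upper = min(x, 60.0)
    return simpson(lambda t: 0.0 if t == 0.0 else t**4 / math.sinh(t) ** 2, 0.0, upper)


def debye_cv(t):
    """C_V / (N k_B) from eqn. 5.123, with t = T / Theta_D."""
    return 9.0 * (2.0 * t) ** 3 * phi(1.0 / (2.0 * t))


for t in (0.01, 0.05, 0.2, 1.0, 5.0):
    low_T = 12.0 * math.pi**4 / 5.0 * t**3
    print(f"T/Theta_D = {t:4.2f}  C/Nk_B = {debye_cv(t):.5f}  T^3 law = {low_T:.5f}")
```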

Element Ag Al Au C Cd Cr Cu Fe Mn
ΘD (K) 227 433 162 2250 210 606 347 477 409
Tmelt (K) 962 660 1064 3500 321 1857 1083 1535 1245
Element Ni Pb Pt Si Sn Ta Ti W Zn
ΘD (K) 477 105 237 645 199 246 420 383 329
Tmelt (K) 1453 327 1772 1410 232 2996 1660 3410 420

Table 5.1: Debye temperatures (at T = 0) and melting points for some common elements (carbon is
assumed to be diamond and not graphite). (Source: the internet!)

Thus, the heat capacity due to acoustic phonons obeys the Dulong-Petit rule in that CV (T → ∞) = 3N kB ,
corresponding to the three acoustic degrees of freedom per unit cell. The remaining contribution of
3(r − 1)N kB to the high temperature heat capacity comes from the optical modes not considered in the
Debye model. The low temperature T 3 behavior of the heat capacity of crystalline solids is a generic
feature, and its detailed description is a triumph of the Debye model.

5.6.4 Melting and the Lindemann criterion

Atomic fluctuations in a crystal

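For the one-dimensional chain, eqn. 5.105 gives
$$
\tilde u_k = i \bigg( \frac{\hbar}{2 m\omega_k} \bigg)^{\!1/2} \big( a_k - a_{-k}^\dagger \big) . \tag{5.126}
$$
Therefore the mean square fluctuations at each site are given by
$$
\begin{aligned}
\langle u_n^2 \rangle &= \frac{1}{N} \sum_k \langle \tilde u_k\, \tilde u_{-k} \rangle \\
&= \frac{1}{N} \sum_k \frac{\hbar}{m\omega_k} \Big( n(k) + \tfrac{1}{2} \Big) ,
\end{aligned} \tag{5.127}
$$
where $n(k,T) = \big[ \exp(\hbar\omega_k/k_B T) - 1 \big]^{-1}$ is the Bose occupancy function.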
Let us now generalize this expression to the case of a $d$-dimensional solid. The appropriate expression for the mean square position fluctuations of the $i^\text{th}$ basis atom in each unit cell is
$$
\langle \mathbf{u}_i^2(\mathbf{R}) \rangle = \frac{1}{N} \sum_{\mathbf{k}} \sum_{a=1}^{dr} \frac{\hbar}{M_{ia}(\mathbf{k})\, \omega_a(\mathbf{k})} \Big( n_a(\mathbf{k}) + \tfrac{1}{2} \Big) . \tag{5.128}
$$
Here we sum over all wavevectors $\mathbf{k}$ in the first Brillouin zone, and over all normal modes $a$. There are $dr$ normal modes per unit cell, i.e. $dr$ branches of the phonon dispersion $\omega_a(\mathbf{k})$. (For the one-dimensional chain with $d = 1$ and $r = 1$ there was only one such branch to consider). Note also the quantity $M_{ia}(\mathbf{k})$, which has units of mass and is defined in terms of the polarization vectors $e_{i\alpha}^{(a)}(\mathbf{k})$ as
$$
\frac{1}{M_{ia}(\mathbf{k})} = \sum_{\mu=1}^d \big| e_{i\mu}^{(a)}(\mathbf{k}) \big|^2 . \tag{5.129}
$$

The dimensions of the polarization vector are $[\text{mass}]^{-1/2}$, since the generalized orthonormality condition on the normal modes is
$$
\sum_{i,\mu} M_i\, e_{i\mu}^{(a)*}(\mathbf{k})\, e_{i\mu}^{(b)}(\mathbf{k}) = \delta_{ab} , \tag{5.130}
$$
where $M_i$ is the mass of the atom of species $i$ within the unit cell ($i \in \{1, \ldots, r\}$). For our purposes we can replace $M_{ia}(\mathbf{k})$ by an appropriately averaged quantity which we call $M_i$; this ‘effective mass’ is then independent of the mode index $a$ as well as the wavevector $\mathbf{k}$. We may then write
$$
\langle \mathbf{u}_i^2 \rangle \approx \int_0^\infty\! d\omega\, g(\omega) \cdot \frac{\hbar}{M_i\, \omega} \bigg\{ \frac{1}{e^{\hbar\omega/k_B T} - 1} + \frac{1}{2} \bigg\} , \tag{5.131}
$$
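where we have dropped the site label $\mathbf{R}$ since translational invariance guarantees that the fluctuations are the same from one unit cell to the next. Note that the fluctuations $\langle \mathbf{u}_i^2 \rangle$ can be divided into a temperature-dependent part $\langle \mathbf{u}_i^2 \rangle_\text{th}$ and a temperature-independent quantum contribution $\langle \mathbf{u}_i^2 \rangle_\text{qu}$, where
$$
\begin{aligned}
\langle \mathbf{u}_i^2 \rangle_\text{th} &= \frac{\hbar}{M_i} \int_0^\infty\! d\omega\, \frac{g(\omega)}{\omega} \cdot \frac{1}{e^{\hbar\omega/k_B T} - 1} \\
\langle \mathbf{u}_i^2 \rangle_\text{qu} &= \frac{\hbar}{2 M_i} \int_0^\infty\! d\omega\, \frac{g(\omega)}{\omega} .
\end{aligned} \tag{5.132}
$$
Let’s evaluate these contributions within the Debye model, where we replace $g(\omega)$ by
$$
\bar g(\omega) = \frac{d^2\, \omega^{d-1}}{\omega_D^d}\, \Theta(\omega_D - \omega) . \tag{5.133}
$$
We then find
$$
\begin{aligned}
\langle \mathbf{u}_i^2 \rangle_\text{th} &= \frac{d^2\, \hbar}{M_i\, \omega_D} \bigg( \frac{k_B T}{\hbar\omega_D} \bigg)^{\!d-1} F_d(\hbar\omega_D/k_B T) \\
\langle \mathbf{u}_i^2 \rangle_\text{qu} &= \frac{d^2}{d-1} \cdot \frac{\hbar}{2 M_i\, \omega_D} ,
\end{aligned} \tag{5.134}
$$
where
$$
F_d(x) = \int_0^x\! ds\, \frac{s^{d-2}}{e^s - 1} = \begin{cases} \dfrac{x^{d-2}}{d-2} & x \to 0 \\[8pt] \Gamma(d-1)\, \zeta(d-1) & x \to \infty . \end{cases} \tag{5.135}
$$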
We can now extract from these expressions several important conclusions:



1) The $T = 0$ contribution to the fluctuations, $\langle \mathbf{u}_i^2 \rangle_\text{qu}$, diverges in $d = 1$ dimensions. Therefore there are no one-dimensional quantum solids.

2) The thermal contribution to the fluctuations, $\langle \mathbf{u}_i^2 \rangle_\text{th}$, diverges for any $T > 0$ whenever $d \le 2$. This is because the integrand of $F_d(x)$ goes as $s^{d-3}$ as $s \to 0$. Therefore, there are no two-dimensional classical solids.

3) Both the above conclusions are valid in the thermodynamic limit. Finite size imposes a cutoff on the frequency integrals, because there is a smallest wavevector $k_\text{min} \sim 2\pi/L$, where $L$ is the (finite) linear dimension of the system. This leads to a low frequency cutoff $\omega_\text{min} = 2\pi\bar c/L$, where $\bar c$ is the appropriately averaged acoustic phonon velocity from eqn. 5.121, which mitigates any divergences.
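The contrast between $d = 3$ and $d = 2$ in points 2 and 3 can be made quantitative. For $d \ge 3$ the sketch below evaluates $F_d(x)$ of eqn. 5.135 by quadrature and recovers both limits, including $\Gamma(d-1)\zeta(d-1)$ at large $x$. For $d = 2$ the integral $\int ds/(e^s - 1) = \ln(1 - e^{-s})$ is elementary, so a lower cutoff $s_\text{min}$ can be imposed exactly; the helper `F2_with_cutoff` is hypothetical, introduced here to stand in for the finite-size cutoff of point 3. Its result grows like $\ln(1/s_\text{min})$ without bound.

```python
import math


def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def F(d, x):
    """F_d(x) = int_0^x s^{d-2} / (e^s - 1) ds from eqn. 5.135, for d >= 3."""
    if d < 3:
        raise ValueError("for d <= 2 the integral diverges at s = 0; use F2_with_cutoff")

    def integrand(s):
        if s == 0.0:
            return 1.0 if d == 3 else 0.0  # s^{d-2}/(e^s - 1) -> s^{d-3}
        return s ** (d - 2) / math.expm1(s)

    return simpson(integrand, 0.0, min(x, 60.0))


def F2_with_cutoff(x, s_min):
    """Hypothetical helper for d = 2 with an infrared cutoff s_min > 0:
    int_{s_min}^x ds / (e^s - 1) = ln(1 - e^{-x}) - ln(1 - e^{-s_min})."""
    return math.log(-math.expm1(-x)) - math.log(-math.expm1(-s_min))


print(F(3, 0.01), F(3, 100.0), math.pi**2 / 6)  # small x: ~x ; large x: zeta(2)
for s_min in (1e-2, 1e-4, 1e-6):
    print(s_min, F2_with_cutoff(1.0, s_min))    # grows by about ln(100) per step
```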

Lindemann melting criterion

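An old phenomenological theory of melting due to Lindemann says that a crystalline solid melts when the RMS fluctuations in the atomic positions exceed a certain fraction $\eta$ of the lattice constant $a$. We therefore define the ratios
$$
\begin{aligned}
x_{i,\text{th}}^2 &\equiv \frac{\langle \mathbf{u}_i^2 \rangle_\text{th}}{a^2} = d^2 \cdot \frac{\hbar^2}{M_i\, a^2\, k_B} \cdot \frac{T^{d-1}}{\Theta_D^d} \cdot F_d(\Theta_D/T) \\
x_{i,\text{qu}}^2 &\equiv \frac{\langle \mathbf{u}_i^2 \rangle_\text{qu}}{a^2} = \frac{d^2}{2(d-1)} \cdot \frac{\hbar^2}{M_i\, a^2\, k_B} \cdot \frac{1}{\Theta_D} ,
\end{aligned} \tag{5.136}
$$
with $x_i = \sqrt{x_{i,\text{th}}^2 + x_{i,\text{qu}}^2} = \sqrt{\langle \mathbf{u}_i^2 \rangle} \big/ a$.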

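Let’s now work through an example of a three-dimensional solid. We’ll assume a single element basis ($r = 1$). We have that
$$
\frac{9\hbar^2/4 k_B}{1\,\text{amu}\; \text{Å}^2} = 109\,\text{K} . \tag{5.137}
$$
According to table 5.1, the melting temperature always exceeds the Debye temperature, and often by a great amount. We therefore assume $T \gg \Theta_D$, which puts us in the small $x$ limit of $F_d(x)$. We then find
$$
x_\text{qu}^2 = \frac{\Theta^\star}{\Theta_D} \quad , \quad x_\text{th}^2 = \frac{\Theta^\star}{\Theta_D} \cdot \frac{4T}{\Theta_D} \quad , \quad x = \sqrt{\frac{\Theta^\star}{\Theta_D} \bigg( 1 + \frac{4T}{\Theta_D} \bigg)} , \tag{5.138}
$$
where
$$
\Theta^\star = \frac{109\,\text{K}}{M[\text{amu}] \cdot a[\text{Å}]^2} . \tag{5.139}
$$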
The total position fluctuation is of course the sum $x^2 = x_{i,\text{th}}^2 + x_{i,\text{qu}}^2$. Consider for example the case of copper, with $M = 56$ amu and $a = 2.87$ Å. The Debye temperature is $\Theta_D = 347$ K. From this we find $x_\text{qu} = 0.026$, which says that at $T = 0$ the RMS fluctuations of the atomic positions are not quite three percent of the lattice spacing (i.e. the distance between neighboring copper atoms). At room temperature, $T = 293$ K, one finds $x_\text{th} = 0.048$, which is about twice as large as the quantum contribution. How big are the atomic position fluctuations at the melting point? According to our table, $T_\text{melt} = 1083$ K for copper, and from our formulae we obtain $x_\text{melt} = 0.096$. The Lindemann criterion says that solids melt when $x(T) \approx 0.1$.
We were very lucky to hit the magic number $x_\text{melt} = 0.1$ with copper. Let’s try another example. Lead has $M = 208$ amu and $a = 4.95$ Å. The Debye temperature is $\Theta_D = 105$ K (‘soft phonons’), and the melting point is $T_\text{melt} = 327$ K. From these data we obtain $x(T = 0) = 0.014$, $x(293\,\text{K}) = 0.050$ and $x(T = 327\,\text{K}) = 0.053$. Same ballpark.
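The copper and lead numbers above follow from eqns. 5.138 and 5.139 alone. The sketch below reproduces them, taking the parameters for each solid exactly as quoted in the text. Note that the copper room-temperature figure of 0.048 quoted above is the thermal part $x_\text{th}$, while the lead figures are the total $x$; the code prints both. The T = 0 line uses the same formula with $T = 0$, which reduces to $x_\text{qu}$.

```python
import math

THETA_UNIT = 109.0  # K; (9 hbar^2 / 4 k_B) / (1 amu * 1 Angstrom^2), eqn. 5.137


def theta_star(M_amu, a_angstrom):
    """Eqn. 5.139."""
    return THETA_UNIT / (M_amu * a_angstrom**2)


def lindemann_x(T, M_amu, a_angstrom, theta_D):
    """Total x = sqrt(x_qu^2 + x_th^2) from eqn. 5.138 (d = 3, r = 1, high-T form of F_3)."""
    r = theta_star(M_amu, a_angstrom) / theta_D
    return math.sqrt(r * (1.0 + 4.0 * T / theta_D))


def lindemann_x_thermal(T, M_amu, a_angstrom, theta_D):
    """Thermal part x_th alone, from eqn. 5.138."""
    r = theta_star(M_amu, a_angstrom) / theta_D
    return math.sqrt(r * 4.0 * T / theta_D)


# Parameters as quoted in the text: (M [amu], a [Angstrom], Theta_D [K], T_melt [K])
solids = {
    "copper": (56.0, 2.87, 347.0, 1083.0),
    "lead": (208.0, 4.95, 105.0, 327.0),
}

for name, (M, a, theta_D, T_melt) in solids.items():
    print(f"{name:6s}  x_qu = {lindemann_x(0.0, M, a, theta_D):.3f}"
          f"  x_th(293 K) = {lindemann_x_thermal(293.0, M, a, theta_D):.3f}"
          f"  x(293 K) = {lindemann_x(293.0, M, a, theta_D):.3f}"
          f"  x(T_melt) = {lindemann_x(T_melt, M, a, theta_D):.3f}")
```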
We can turn the analysis around and predict a melting temperature based on the Lindemann criterion $x(T_\text{melt}) = \eta$, where $\eta \approx 0.1$. We obtain
$$
T_L = \bigg( \frac{\eta^2\, \Theta_D}{\Theta^\star} - 1 \bigg) \cdot \frac{\Theta_D}{4} . \tag{5.140}
$$

We call $T_L$ the Lindemann temperature. Most treatments of the Lindemann criterion ignore the quantum correction, which gives the $-1$ contribution inside the above parentheses. But if we are more careful and include it, we see that it may be possible to have $T_L < 0$. This occurs for any crystal where $\Theta_D < \Theta^\star/\eta^2$.
Consider for example the case of ⁴He, which at atmospheric pressure condenses into a liquid at $T_c = 4.2$ K and remains in the liquid state down to absolute zero. At $p = 1$ atm, it never solidifies! Why? The number density of liquid ⁴He at $p = 1$ atm and $T = 0$ K is $2.2 \times 10^{22}\,\text{cm}^{-3}$. Let’s say the helium atoms want to form a crystalline lattice. We don’t know a priori what the lattice structure will be, so let’s for the sake of simplicity assume a simple cubic lattice. From the number density we obtain a lattice spacing of $a = 3.57$ Å. OK now what do we take for the Debye temperature? Theoretically this should depend on the microscopic force constants which enter the small oscillations problem (i.e. the spring constants between pairs of helium atoms in equilibrium). We’ll use the expression we derived for the Debye frequency, $\omega_D = (6\pi^2/V_0)^{1/3}\, \bar c$, where $V_0$ is the unit cell volume. We’ll take $\bar c = 238$ m/s, which is the speed of sound in liquid helium at $T = 0$. This gives $\Theta_D = 19.8$ K. We find $\Theta^\star = 2.13$ K, and if we take $\eta = 0.1$ this gives $\Theta^\star/\eta^2 = 213$ K, which significantly exceeds $\Theta_D$. Thus, the solid should melt because the RMS fluctuations in the atomic positions at absolute zero are huge: $x_\text{qu} = (\Theta^\star/\Theta_D)^{1/2} = 0.33$. By applying pressure, one can get ⁴He to crystallize above $p_c = 25$ atm (at absolute zero). Under pressure, the unit cell volume $V_0$ decreases and the phonon velocity $\bar c$ increases, so the Debye temperature itself increases.
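The helium estimate chains together several formulas: the lattice spacing from the density, $\omega_D$ from eqn. 5.122, $\Theta^\star$ from eqn. 5.139 and $T_L$ from eqn. 5.140. The sketch below collects these steps. The density, sound speed and $\eta$ are the values used in the text; the atomic mass 4.0026 amu and the SI constants are standard values supplied here as assumptions.

```python
import math

HBAR = 1.054571817e-34  # J s (CODATA 2018)
K_B = 1.380649e-23      # J / K
THETA_UNIT = 109.0      # K, eqn. 5.137

# Inputs used in the text
n = 2.2e22 * 1e6   # number density of liquid 4He, converted from cm^-3 to m^-3
c_bar = 238.0      # speed of sound in liquid helium at T = 0, m/s
eta = 0.1          # Lindemann parameter
M_amu = 4.0026     # atomic mass of 4He (standard value, not quoted in the text)

V0 = 1.0 / n                                               # simple cubic: one atom per cell
a_angstrom = V0 ** (1.0 / 3.0) * 1e10                      # lattice spacing in Angstrom
omega_D = (6.0 * math.pi**2 / V0) ** (1.0 / 3.0) * c_bar   # eqn. 5.122
theta_D = HBAR * omega_D / K_B
theta_star = THETA_UNIT / (M_amu * a_angstrom**2)          # eqn. 5.139
x_qu = math.sqrt(theta_star / theta_D)
T_L = (eta**2 * theta_D / theta_star - 1.0) * theta_D / 4.0  # eqn. 5.140

print(f"a = {a_angstrom:.2f} A, Theta_D = {theta_D:.1f} K, Theta* = {theta_star:.2f} K")
print(f"x_qu = {x_qu:.2f}, T_L = {T_L:.1f} K")  # T_L < 0: no solid at p = 1 atm
```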
It is important to recognize that the Lindemann criterion does not provide us with a theory of melting
per se. Rather it provides us with a heuristic which allows us to predict roughly when a solid should
melt.

5.6.5 Goldstone bosons

The vanishing of the acoustic phonon dispersion at $\mathbf{k} = 0$ is a consequence of Goldstone’s theorem, which says that for every broken generator of a continuous symmetry there is an associated bosonic gapless excitation (i.e. one whose frequency $\omega$ vanishes in the long wavelength limit). In the case of phonons, the ‘broken generators’ are the symmetries under spatial translation in the $x$, $y$, and $z$ directions. The crystal selects a particular location for its center-of-mass, which breaks this symmetry. There are, accordingly, three gapless acoustic phonons.
Magnetic materials support another branch of elementary excitations known as spin waves, or magnons. In isotropic magnets, there is a global symmetry associated with rotations in internal spin space, described
by the group SU(2). If the system spontaneously magnetizes, meaning there is long-ranged ferromagnetic
order (↑↑↑ · · · ), or long-ranged antiferromagnetic order (↑↓↑↓ · · · ), then global spin rotation symmetry is
broken. Typically a particular direction is chosen for the magnetic moment (or staggered moment, in the
case of an antiferromagnet). Symmetry under rotations about this axis is then preserved, but rotations
which do not preserve the selected axis are ‘broken’. In the most straightforward case, that of the
antiferromagnet, there are two such rotations for SU(2), and concomitantly two gapless magnon branches,
with linearly vanishing dispersions ωa (k). The situation is more subtle in the case of ferromagnets, because
the total magnetization is conserved by the dynamics (unlike the total staggered magnetization in the
case of antiferromagnets). Another wrinkle arises if there are long-ranged interactions present.
For our purposes, we can safely ignore the deep physical reasons underlying the gaplessness of Goldstone bosons and simply posit a gapless dispersion relation of the form $\omega(\mathbf{k}) = A\, |\mathbf{k}|^\sigma$. The density of states for this excitation branch is then
$$
g(\omega) = C\, \omega^{\frac{d}{\sigma} - 1}\, \Theta(\omega_c - \omega) , \tag{5.141}
$$
where $C$ is a constant and $\omega_c$ is the cutoff, which is the bandwidth for this excitation branch.⁴ Normalizing the density of states for this branch results in the identification $\omega_c = (d/\sigma C)^{\sigma/d}$.
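The heat capacity is then found to be
$$
\begin{aligned}
C_V &= N k_B\, C \int_0^{\omega_c}\! d\omega\, \omega^{\frac{d}{\sigma} - 1} \bigg( \frac{\hbar\omega}{2 k_B T} \bigg)^{\!2} \operatorname{csch}^2\!\bigg( \frac{\hbar\omega}{2 k_B T} \bigg) \\
&= \frac{d}{\sigma}\, N k_B \bigg( \frac{2T}{\Theta} \bigg)^{\!d/\sigma} \phi\big( \Theta/2T \big) ,
\end{aligned} \tag{5.142}
$$
where $\Theta = \hbar\omega_c/k_B$ and
$$
\phi(x) = \int_0^x\! dt\, t^{\frac{d}{\sigma} + 1} \operatorname{csch}^2 t = \begin{cases} \dfrac{\sigma}{d}\, x^{d/\sigma} & x \to 0 \\[8pt] 2^{-d/\sigma}\, \Gamma\big( 2 + \tfrac{d}{\sigma} \big)\, \zeta\big( 1 + \tfrac{d}{\sigma} \big) & x \to \infty , \end{cases} \tag{5.143}
$$
which is a generalization of our earlier results. Once again, we recover Dulong-Petit for $k_B T \gg \hbar\omega_c$, with $C_V(T \gg \hbar\omega_c/k_B) = N k_B$.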
In an isotropic ferromagnet, i.e. a ferromagnetic material where there is full SU(2) symmetry in internal ‘spin’ space, the magnons have a $k^2$ dispersion. Thus ($d/\sigma = 3/2$), a bulk three-dimensional isotropic ferromagnet will exhibit a heat capacity due to spin waves which behaves as $T^{3/2}$ at low temperatures. For sufficiently low temperatures this will overwhelm the phonon contribution, which behaves as $T^3$.
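The large-$x$ limit in eqn. 5.143 fixes the coefficient of the low temperature power law $C_V \propto T^{d/\sigma}$. The sketch below compares that closed form with a direct quadrature for Debye phonons ($d = 3$, $\sigma = 1$, where it must reduce to $\pi^4/30$) and for isotropic ferromagnetic magnons ($d = 3$, $\sigma = 2$). The small `zeta` helper, the substitution $t = u^2$ and the truncation at $t = 60$ are all choices made for this illustration, not part of the notes.

```python
import math


def zeta(s, n_terms=1000):
    """Riemann zeta for s > 1 via partial sum + Euler-Maclaurin tail."""
    N = n_terms
    partial = sum(m ** (-s) for m in range(1, N + 1))
    return partial + N ** (1.0 - s) / (s - 1.0) - 0.5 * N ** (-s) + s * N ** (-s - 1.0) / 12.0


def phi_infinity_closed(d, sigma):
    """Large-x limit of eqn. 5.143: 2^{-d/sigma} Gamma(2 + d/sigma) zeta(1 + d/sigma)."""
    r = d / sigma
    return 2.0 ** (-r) * math.gamma(2.0 + r) * zeta(1.0 + r)


def phi_infinity_numeric(d, sigma, n=4000):
    """int_0^inf t^{d/sigma + 1} csch^2 t dt by Simpson's rule after substituting t = u^2.

    The new integrand 2 u^{2p+1} csch^2(u^2), with p = d/sigma + 1, behaves like
    2 u^{2p-3} near u = 0, which is smooth for the cases evaluated below.
    """
    p = d / sigma + 1.0
    u_max = math.sqrt(60.0)  # t = 60 is deep in the exponential tail

    def f(u):
        if u == 0.0:
            return 0.0
        t = u * u
        return 2.0 * u * t**p / math.sinh(t) ** 2

    h = u_max / n
    total = f(0.0) + f(u_max)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0


for d, sigma, label in [(3, 1, "acoustic phonons"), (3, 2, "ferromagnetic magnons")]:
    closed = phi_infinity_closed(d, sigma)
    numeric = phi_infinity_numeric(d, sigma)
    print(f"{label:22s} C_V ~ T^{d / sigma:.1f}  phi(inf) = {closed:.6f} (quadrature {numeric:.6f})")
```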

⁴ If $\omega(\mathbf{k}) = A k^\sigma$, then $C = 2^{1-d}\, \pi^{-d/2}\, \sigma^{-1}\, A^{-d/\sigma}\, g \big/ \Gamma(d/2)$.
5.7. THE IDEAL BOSE GAS 247

5.7 The Ideal Bose Gas

5.7.1 General formulation for noninteracting systems

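Recall that the grand partition function for noninteracting bosons is given by
$$
\Xi = \prod_\alpha \bigg( \sum_{n_\alpha=0}^\infty e^{\beta(\mu - \varepsilon_\alpha) n_\alpha} \bigg) = \prod_\alpha \Big( 1 - e^{\beta(\mu - \varepsilon_\alpha)} \Big)^{-1} . \tag{5.144}
$$
In order for the sum to converge to the RHS above, we must have $\mu < \varepsilon_\alpha$ for all single-particle states $|\alpha\rangle$.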
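The density of particles is then
$$
n(T,\mu) = -\frac{1}{V} \bigg( \frac{\partial\Omega}{\partial\mu} \bigg)_{\!T,V} = \frac{1}{V} \sum_\alpha \frac{1}{e^{\beta(\varepsilon_\alpha - \mu)} - 1} = \int_{\varepsilon_0}^\infty\! d\varepsilon\, \frac{g(\varepsilon)}{e^{\beta(\varepsilon - \mu)} - 1} , \tag{5.145}
$$
where $g(\varepsilon) = \frac{1}{V} \sum_\alpha \delta(\varepsilon - \varepsilon_\alpha)$ is the density of single particle states per unit volume. We assume that $g(\varepsilon) = 0$ for $\varepsilon < \varepsilon_0$; typically $\varepsilon_0 = 0$, as is the case for any dispersion of the form $\varepsilon(\mathbf{k}) = A|\mathbf{k}|^r$, for example. However, in the presence of a magnetic field, we could have $\varepsilon(\mathbf{k},\sigma) = A|\mathbf{k}|^r - g\mu_0 H \sigma$, in which case $\varepsilon_0 = -g\mu_0 |H|$.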
Clearly $n(T,\mu)$ is an increasing function of both $T$ and $\mu$. At fixed $T$, the maximum possible value for $n(T,\mu)$, called the critical density $n_c(T)$, is achieved for $\mu = \varepsilon_0$, i.e.
$$
n_c(T) = \int_{\varepsilon_0}^\infty\! d\varepsilon\, \frac{g(\varepsilon)}{e^{\beta(\varepsilon - \varepsilon_0)} - 1} . \tag{5.146}
$$
The above integral converges provided $g(\varepsilon_0) = 0$, assuming $g(\varepsilon)$ is continuous⁵. If $g(\varepsilon_0) > 0$, the integral diverges, and $n_c(T) = \infty$. In this latter case, one can always invert the equation for $n(T,\mu)$ to obtain the chemical potential $\mu(T,n)$. In the former case, where $n_c(T)$ is finite, we have a problem – what happens if $n > n_c(T)$?
In the former case, where $n_c(T)$ is finite, we can equivalently restate the problem in terms of a critical temperature $T_c(n)$, defined by the equation $n_c(T_c) = n$. For $T < T_c$, we apparently can no longer invert to obtain $\mu(T,n)$, so clearly something has gone wrong. The remedy is to recognize that the single particle energy levels are discrete, and separate out the contribution from the lowest energy state $\varepsilon_0$. I.e. we write
$$
n(T,\mu) = \overbrace{\frac{1}{V} \frac{g_0}{e^{\beta(\varepsilon_0 - \mu)} - 1}}^{n_0} + \overbrace{\int_{\varepsilon_0}^\infty\! d\varepsilon\, \frac{g(\varepsilon)}{e^{\beta(\varepsilon - \mu)} - 1}}^{n'} , \tag{5.147}
$$
where $g_0$ is the degeneracy of the single particle state with energy $\varepsilon_0$. We assume that $n_0$ is finite, which means that $N_0 = V n_0$ is extensive. We say that the particles have condensed into the state with energy $\varepsilon_0$. The quantity $n_0$ is the condensate density. The remaining particles, with density $n'$, are said to comprise the overcondensate. With the total density $n$ fixed, we have $n = n_0 + n'$. Note that $n_0$ finite means that $\mu$ is infinitesimally close to $\varepsilon_0$:
$$
\mu = \varepsilon_0 - k_B T \ln\bigg( 1 + \frac{g_0}{V n_0} \bigg) \approx \varepsilon_0 - \frac{g_0\, k_B T}{V n_0} . \tag{5.148}
$$
Note also that if $\varepsilon_0 - \mu$ is finite, then $n_0 \propto V^{-1}$ is infinitesimal.

⁵ OK, that isn’t quite true. For example, if $g(\varepsilon) \sim 1/\ln(1/\varepsilon)$, then the integral has a very weak $\ln\ln(1/\eta)$ divergence, where $\eta$ is the lower cutoff. But for any power law density of states $g(\varepsilon) \propto \varepsilon^r$ with $r > 0$, the integral converges.


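Thus, for $T < T_c(n)$, we have $\mu = \varepsilon_0$ with $n_0 > 0$, and
$$
n(T, n_0) = n_0 + \int_{\varepsilon_0}^\infty\! d\varepsilon\, \frac{g(\varepsilon)}{e^{(\varepsilon - \varepsilon_0)/k_B T} - 1} . \tag{5.149}
$$
For $T > T_c(n)$, we have $n_0 = 0$ and
$$
n(T,\mu) = \int_{\varepsilon_0}^\infty\! d\varepsilon\, \frac{g(\varepsilon)}{e^{(\varepsilon - \mu)/k_B T} - 1} . \tag{5.150}
$$
The equation for $T_c(n)$ is
$$
n = \int_{\varepsilon_0}^\infty\! d\varepsilon\, \frac{g(\varepsilon)}{e^{(\varepsilon - \varepsilon_0)/k_B T_c} - 1} . \tag{5.151}
$$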

For another take on ideal Bose gas condensation see the appendix in §5.10.

5.7.2 Ballistic dispersion

We already derived, in §5.3.3, expressions for $n(T,z)$ and $p(T,z)$ for the ideal Bose gas (IBG) with ballistic dispersion $\varepsilon(\mathbf{p}) = \mathbf{p}^2/2m$. We found
$$
\begin{aligned}
n(T,z) &= g\, \lambda_T^{-d}\, \mathrm{Li}_{\frac{d}{2}}(z) \\
p(T,z) &= g\, k_B T\, \lambda_T^{-d}\, \mathrm{Li}_{\frac{d}{2}+1}(z) ,
\end{aligned} \tag{5.152}
$$
where $g$ is the internal (e.g. spin) degeneracy of each single particle energy level. Here $z = e^{\mu/k_B T}$ is the fugacity and
$$
\mathrm{Li}_s(z) = \sum_{m=1}^\infty \frac{z^m}{m^s} \tag{5.153}
$$
is the polylogarithm function. For bosons with a spectrum bounded below by $\varepsilon_0 = 0$, the fugacity takes values on the interval $z \in [0,1]$⁶.

⁶ It is easy to see that the chemical potential for noninteracting bosons can never exceed the minimum value $\varepsilon_0$ of the single particle dispersion.

Figure 5.6: The polylogarithm function $\mathrm{Li}_s(z)$ versus z for s = 1/2, s = 3/2, and s = 5/2. Note that $\mathrm{Li}_s(1) = \zeta(s)$ diverges for s ≤ 1.
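The series in eqn. 5.153 is easy to evaluate directly for 0 ≤ z < 1. The sketch below uses only the Python standard library. It sums the series term by term, and at z = 1 it returns ζ(s) from a partial sum plus an Euler-Maclaurin tail estimate. The function names and the cutoff M = 1000 are illustrative choices, not anything defined in these notes. The code is meant for checking numbers like those in fig. 5.6, not as a production polylogarithm.

```python
import math

def zeta(s, M=1000):
    """Riemann zeta(s) for s > 1: partial sum up to M-1 plus an
    Euler-Maclaurin estimate of the tail sum_{m >= M} m^(-s)."""
    if s <= 1.0:
        return math.inf  # zeta(s) diverges for s <= 1
    partial = math.fsum(m ** -s for m in range(1, M))
    tail = M ** (1 - s) / (s - 1) + 0.5 * M ** -s + s * M ** (-s - 1) / 12
    return partial + tail

def polylog(s, z, rel_tol=1e-16):
    """Li_s(z) = sum_{m >= 1} z^m / m^s (eqn. 5.153), for 0 <= z <= 1."""
    if not 0.0 <= z <= 1.0:
        raise ValueError("this sketch only covers 0 <= z <= 1")
    if z == 1.0:
        return zeta(s)  # Li_s(1) = zeta(s)
    total, zm, m = 0.0, 1.0, 0
    while True:
        m += 1
        zm *= z                      # z^m, built up incrementally
        term = zm / m ** s
        total += term
        if term <= rel_tol * total:  # stop once further terms are negligible
            return total

# Li_s(1) for the three values of s plotted in fig. 5.6
for s in (0.5, 1.5, 2.5):
    print(s, polylog(s, 1.0))
```

For example, polylog(1.5, 1.0) reproduces ζ(3/2) ≈ 2.612, the constant that sets $T_c$ in three dimensions.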

Clearly $n(T,z) = g\,\lambda_T^{-d}\,\mathrm{Li}_{d/2}(z)$ is an increasing function of z for fixed T. In fig. 5.6 we plot the function $\mathrm{Li}_s(z)$ versus z for three different values of s. We note that the maximum value $\mathrm{Li}_s(z=1) = \zeta(s)$ is finite if s > 1. Thus, for d > 2, there is a maximum density $n_{\rm max}(T) = g\,\zeta(\frac{d}{2})\,\lambda_T^{-d}$, which is an increasing function of temperature T. Put another way, if we fix the density n, then there is a critical temperature $T_c$ below which there is no solution to the equation n = n(T, z). The critical temperature $T_c(n)$ is then determined by the relation

$$
n = g\,\zeta\big(\tfrac{d}{2}\big)\Big(\frac{mk_BT_c}{2\pi\hbar^2}\Big)^{d/2} \quad\Longrightarrow\quad k_BT_c = \frac{2\pi\hbar^2}{m}\bigg(\frac{n}{g\,\zeta(\frac{d}{2})}\bigg)^{2/d}\ . \tag{5.154}
$$
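As a numerical check of eqn. 5.154 in d = 3, the following sketch evaluates $k_BT_c = (2\pi\hbar^2/m)\,(n/g\zeta(3/2))^{2/3}$ in SI units. It borrows the ⁴He mass and liquid density quoted in §5.7.4 and assumes g = 1. The CODATA values of ħ and $k_B$ and the function name are our own choices.

```python
import math

HBAR = 1.054571817e-34         # J s
K_B = 1.380649e-23             # J/K
ZETA_3_2 = 2.612375348685488   # zeta(3/2)

def bose_tc_3d(n, m, g=1):
    """T_c from eqn. 5.154 with d = 3:
    k_B T_c = (2 pi hbar^2 / m) * (n / (g zeta(3/2)))^(2/3).
    n in m^-3, m in kg; returns T_c in kelvin."""
    return (2 * math.pi * HBAR ** 2 / (m * K_B)) * (n / (g * ZETA_3_2)) ** (2 / 3)

# 4He values quoted in Sec. 5.7.4: m = 6.65e-24 g, n = 2.2e22 cm^-3 (converted to SI)
tc_he = bose_tc_3d(n=2.2e28, m=6.65e-27)
print(f"T_c(4He, ideal Bose gas) = {tc_he:.2f} K")
```

With these inputs it gives $T_c$ ≈ 3.15 K, in line with the value of about 3.16 K quoted for ⁴He in §5.7.4.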
What happens for $T < T_c$?
As shown above in §5.7, we must separate out the contribution from the lowest energy single particle mode, which for ballistic dispersion lies at $\varepsilon_0 = 0$. Thus writing

$$
n = \frac{1}{V}\,\frac{1}{z^{-1}-1} + \frac{1}{V}\sum_{\alpha\,(\varepsilon_\alpha>0)}\frac{1}{z^{-1}e^{\varepsilon_\alpha/k_BT}-1}\ , \tag{5.155}
$$
where we have taken g = 1. Now $V^{-1}$ is of course very small, since V is thermodynamically large, but if µ → 0 then $z^{-1} - 1$ is also very small and their ratio can be finite, as we have seen. Indeed, if the density of k = 0 bosons $n_0$ is finite, then their total number $N_0$ satisfies

$$
N_0 = V n_0 = \frac{1}{z^{-1}-1} \quad\Longrightarrow\quad z = \frac{1}{1+N_0^{-1}}\ . \tag{5.156}
$$

The chemical potential is then

$$
\mu = k_BT\ln z = -k_BT\ln\big(1+N_0^{-1}\big) \approx -\frac{k_BT}{N_0} \to 0^-\ . \tag{5.157}
$$

In other words, the chemical potential is infinitesimally negative, because $N_0$ is assumed to be thermodynamically large.

According to eqn. 5.11, the contribution to the pressure from the k = 0 states is

$$
p_0 = -\frac{k_BT}{V}\ln(1-z) = \frac{k_BT}{V}\ln(1+N_0) \to 0^+\ . \tag{5.158}
$$

So the k = 0 bosons, which we identify as the condensate, contribute nothing to the pressure.
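Having separated out the k = 0 mode, we can now replace the remaining sum over α by the usual integral over k. We then have

$$
T<T_c:\quad
\begin{aligned}
n &= n_0 + g\,\zeta\big(\tfrac{d}{2}\big)\,\lambda_T^{-d} \\
p &= g\,\zeta\big(\tfrac{d}{2}+1\big)\,k_BT\,\lambda_T^{-d}
\end{aligned} \tag{5.159}
$$

and

$$
T>T_c:\quad
\begin{aligned}
n &= g\,\mathrm{Li}_{d/2}(z)\,\lambda_T^{-d} \\
p &= g\,\mathrm{Li}_{d/2+1}(z)\,k_BT\,\lambda_T^{-d}\ .
\end{aligned} \tag{5.160}
$$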
The condensate fraction $n_0/n$ is unity at T = 0, when all particles are in the condensate with k = 0, and decreases with increasing T until $T = T_c$, at which point it vanishes identically. Explicitly, we have

$$
\frac{n_0(T)}{n} = 1 - \frac{g\,\zeta(\frac{d}{2})}{n\,\lambda_T^d} = 1 - \bigg(\frac{T}{T_c(n)}\bigg)^{d/2}\ . \tag{5.161}
$$
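Eqn. 5.161 can be checked directly. The sketch below works in d = 3 with g = 1 and SI units. It computes $n_0/n$ from the thermal wavelength and compares the result with $1 - (T/T_c)^{3/2}$. The helper names are hypothetical, the ⁴He-like numbers are simply those quoted in §5.7.4, and the fraction is clipped to zero above $T_c$, where $n_0$ vanishes.

```python
import math

HBAR = 1.054571817e-34         # J s
K_B = 1.380649e-23             # J/K
ZETA_3_2 = 2.612375348685488   # zeta(3/2)

def thermal_wavelength(T, m):
    """lambda_T = sqrt(2 pi hbar^2 / (m k_B T)), in metres."""
    return math.sqrt(2 * math.pi * HBAR ** 2 / (m * K_B * T))

def critical_temperature(n, m, g=1):
    """T_c(n) from eqn. 5.154 with d = 3."""
    return 2 * math.pi * HBAR ** 2 / (m * K_B) * (n / (g * ZETA_3_2)) ** (2 / 3)

def condensate_fraction(T, n, m, g=1):
    """n0/n = 1 - g zeta(3/2) / (n lambda_T^3) (eqn. 5.161 with d = 3),
    clipped at zero for T > T_c where the condensate is absent."""
    return max(0.0, 1.0 - g * ZETA_3_2 / (n * thermal_wavelength(T, m) ** 3))

n, m = 2.2e28, 6.65e-27   # 4He-like density (m^-3) and mass (kg) from Sec. 5.7.4
tc = critical_temperature(n, m)
for ratio in (0.25, 0.5, 0.75, 1.0):
    print(ratio, condensate_fraction(ratio * tc, n, m), 1 - ratio ** 1.5)
```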

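Let us compute the internal energy E for the ideal Bose gas. We have

$$
\frac{\partial}{\partial\beta}(\beta\Omega) = \Omega + \beta\,\frac{\partial\Omega}{\partial\beta} = \Omega - T\,\frac{\partial\Omega}{\partial T} = \Omega + TS \tag{5.162}
$$

and therefore

$$
\begin{aligned}
E &= \Omega + TS + \mu N = \mu N + \frac{\partial}{\partial\beta}(\beta\Omega) \\
&= V\Big(\mu n - \frac{\partial}{\partial\beta}(\beta p)\Big) \\
&= \tfrac{1}{2}\,d\,g\,V k_BT\,\lambda_T^{-d}\,\mathrm{Li}_{d/2+1}(z)\ .
\end{aligned} \tag{5.163}
$$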
This expression is valid at all temperatures, both above and below $T_c$. Note that the condensate particles do not contribute to E, because the k = 0 condensate particles carry no energy.

We now investigate the heat capacity $C_{V,N} = (\partial E/\partial T)_{V,N}$. Since we have been working in the GCE, it is very important to note that N is held constant when computing $C_{V,N}$. We’ll also restrict our attention to the case d = 3, since the ideal Bose gas does not condense at finite T for d ≤ 2, and d > 3 is unphysical. While we’re at it, we’ll also set g = 1.

Figure 5.7: Molar heat capacity of the ideal Bose gas (units of R). Note the cusp at $T = T_c$.

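The number of particles is

$$
N = \begin{cases} N_0 + \zeta(\frac{3}{2})\,V\lambda_T^{-3} & (T<T_c) \\[4pt] V\lambda_T^{-3}\,\mathrm{Li}_{3/2}(z) & (T>T_c)\ , \end{cases} \tag{5.164}
$$

and the energy is

$$
E = \frac{3}{2}\,k_BT\,\frac{V}{\lambda_T^3}\,\mathrm{Li}_{5/2}(z)\ . \tag{5.165}
$$

For $T < T_c$, we have z = 1 and

$$
C_{V,N} = \Big(\frac{\partial E}{\partial T}\Big)_{V,N} = \tfrac{15}{4}\,\zeta\big(\tfrac{5}{2}\big)\,\frac{V}{\lambda_T^3}\,k_B\ . \tag{5.166}
$$

The molar heat capacity is therefore

$$
c_{V,N}(T,n) = N_A\cdot\frac{C_{V,N}}{N} = \tfrac{15}{4}\,\zeta\big(\tfrac{5}{2}\big)\,R\cdot\big(n\lambda_T^3\big)^{-1}\ . \tag{5.167}
$$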
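For $T > T_c$, we have

$$
dE\big|_V = \tfrac{15}{4}\,k_BT\,\mathrm{Li}_{5/2}(z)\,\frac{V}{\lambda_T^3}\cdot\frac{dT}{T} + \tfrac{3}{2}\,k_BT\,\mathrm{Li}_{3/2}(z)\,\frac{V}{\lambda_T^3}\cdot\frac{dz}{z}\ , \tag{5.168}
$$

where we have invoked eqn. 5.52. Taking the differential of N, we have

$$
dN\big|_V = \tfrac{3}{2}\,\mathrm{Li}_{3/2}(z)\,\frac{V}{\lambda_T^3}\cdot\frac{dT}{T} + \mathrm{Li}_{1/2}(z)\,\frac{V}{\lambda_T^3}\cdot\frac{dz}{z}\ . \tag{5.169}
$$

We set dN = 0, which fixes dz in terms of dT, resulting in

$$
c_{V,N}(T,z) = \tfrac{3}{2}\,R\cdot\Bigg[\frac{\frac{5}{2}\,\mathrm{Li}_{5/2}(z)}{\mathrm{Li}_{3/2}(z)} - \frac{\frac{3}{2}\,\mathrm{Li}_{3/2}(z)}{\mathrm{Li}_{1/2}(z)}\Bigg]\ . \tag{5.170}
$$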

To obtain $c_{V,N}(T,n)$, we must invert the relation

$$
n(T,z) = \lambda_T^{-3}\,\mathrm{Li}_{3/2}(z) \tag{5.171}
$$

in order to obtain z(T, n), and then insert this into eqn. 5.170. The results are shown in fig. 5.7. There are several noteworthy features of this plot. First of all, by dimensional analysis the function $c_{V,N}(T,n)$ is R times a function of the dimensionless ratio $T/T_c(n) \propto T\,n^{-2/3}$. Second, the high temperature limit is $\frac{3}{2}R$, which is the classical value. Finally, there is a cusp at $T = T_c(n)$.
For another example, see §5.11.
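To reproduce the shape of fig. 5.7, one can invert eqn. 5.171 numerically and insert z into eqn. 5.170. The sketch below works in the reduced temperature $t = T/T_c(n)$, using $n\lambda_T^3 = \zeta(3/2)\,t^{-3/2}$ at fixed n. It sums the polylogarithm series directly and finds z by bisection. All names are illustrative. The direct series becomes slow as t → 1⁺, where z → 1 and $\mathrm{Li}_{1/2}(z)$ diverges.

```python
ZETA_3_2 = 2.612375348685488   # zeta(3/2)
ZETA_5_2 = 1.341487257250917   # zeta(5/2)

def li(s, z):
    """Polylogarithm Li_s(z) by direct series, assuming 0 <= z < 1."""
    total, zm, m = 0.0, 1.0, 0
    while True:
        m += 1
        zm *= z
        term = zm / m ** s
        total += term
        if term <= 1e-16 * total:
            return total

def fugacity(t):
    """Invert eqn. 5.171 at fixed n for t = T/T_c > 1:
    Li_{3/2}(z) = n lambda_T^3 = zeta(3/2) t^(-3/2), solved by bisection."""
    target = ZETA_3_2 * t ** -1.5
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if li(1.5, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def cv_over_R(t):
    """Molar heat capacity c_{V,N}/R of the 3D ideal Bose gas (g = 1)."""
    if t <= 1.0:
        # eqn. 5.167 with n lambda_T^3 = zeta(3/2) t^(-3/2)
        return 3.75 * ZETA_5_2 / ZETA_3_2 * t ** 1.5
    z = fugacity(t)
    # eqn. 5.170
    return 1.5 * (2.5 * li(2.5, z) / li(1.5, z) - 1.5 * li(1.5, z) / li(0.5, z))

for t in (0.5, 1.0, 1.5, 3.0, 10.0):
    print(t, cv_over_R(t))
```

Evaluating cv_over_R on a grid of t reproduces the cusp. The value at t = 1 is $(15/4)\,\zeta(5/2)/\zeta(3/2) \approx 1.93$, and the curve decays toward the classical value 3/2 at large t.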

5.7.3 Isotherms for the ideal Bose gas

Let a be some length scale and define

$$
v_a = a^3\ ,\qquad p_a = \frac{2\pi\hbar^2}{ma^5}\ ,\qquad T_a = \frac{2\pi\hbar^2}{ma^2k_B}\ . \tag{5.172}
$$

Then we have

$$
\frac{v_a}{v} = \Big(\frac{T}{T_a}\Big)^{3/2}\mathrm{Li}_{3/2}(z) + v_a n_0 \tag{5.173}
$$

$$
\frac{p}{p_a} = \Big(\frac{T}{T_a}\Big)^{5/2}\mathrm{Li}_{5/2}(z)\ , \tag{5.174}
$$

where v = V/N is the volume per particle⁷ and $n_0$ is the condensate number density; $n_0$ vanishes for $T \ge T_c$, while z = 1 for $T \le T_c$. One identifies a critical volume $v_c(T)$ by setting z = 1 and $n_0 = 0$, leading to $v_c(T) = v_a\,(T_a/T)^{3/2}/\zeta(\frac{3}{2})$. For $v < v_c(T)$, we set z = 1 in eqn. 5.173 to find a relation between v, T, and $n_0$. For $v > v_c(T)$, we set $n_0 = 0$ in eqn. 5.173 to relate v, T, and z. Note that the pressure is independent of volume for $T < T_c$. The isotherms in the (p, v) plane are then flat for $v < v_c$. This resembles the coexistence region familiar from our study of the thermodynamics of the liquid-gas transition. The situation is depicted in fig. 5.8. In the (T, p) plane, we identify $p_c(T) = \zeta(\frac{5}{2})\,p_a\,(T/T_a)^{5/2}$ as the critical pressure at which condensation starts to occur.
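The isotherms of fig. 5.8 follow from eqns. 5.173 and 5.174. The minimal sketch below measures volumes in units of $v_a$, pressures in units of $p_a$, and temperature as τ = T/T_a. For v below the critical volume, obtained by setting z = 1 and $n_0 = 0$ in eqn. 5.173, the pressure is pinned at its z = 1 value. Otherwise z is found by bisection. The function names and the bisection approach are our own choices.

```python
ZETA_3_2 = 2.612375348685488   # zeta(3/2)
ZETA_5_2 = 1.341487257250917   # zeta(5/2)

def li(s, z):
    """Polylogarithm Li_s(z) by direct series, assuming 0 <= z < 1."""
    total, zm, m = 0.0, 1.0, 0
    while True:
        m += 1
        zm *= z
        term = zm / m ** s
        total += term
        if term <= 1e-16 * total:
            return total

def critical_volume(tau):
    """v_c / v_a = tau^(-3/2) / zeta(3/2), from eqn. 5.173 with z = 1 and n0 = 0."""
    return tau ** -1.5 / ZETA_3_2

def isotherm_pressure(v, tau):
    """p / p_a at volume per particle v (units of v_a) and tau = T / T_a."""
    if v <= critical_volume(tau):
        return ZETA_5_2 * tau ** 2.5      # z = 1: flat, independent of v
    target = tau ** -1.5 / v              # Li_{3/2}(z), eqn. 5.173 with n0 = 0
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if li(1.5, mid) < target:
            lo = mid
        else:
            hi = mid
    z = 0.5 * (lo + hi)
    return tau ** 2.5 * li(2.5, z)        # eqn. 5.174

for v in (0.1, 0.3, 1.0, 3.0, 10.0):
    print(v, isotherm_pressure(v, 1.0))
```

At large v the result approaches the classical isotherm $p/p_a \approx \tau\,v_a/v$, since $p_a v_a = k_BT_a$. For $v < v_c$ it is flat.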
Recall the Gibbs-Duhem equation,

$$
d\mu = -s\,dT + v\,dp\ . \tag{5.175}
$$

Along a coexistence curve, we have the Clausius-Clapeyron relation,

$$
\Big(\frac{dp}{dT}\Big)_{\rm coex} = \frac{s_2-s_1}{v_2-v_1} = \frac{\ell}{T\,\Delta v}\ , \tag{5.176}
$$

where $\ell = T(s_2 - s_1)$ is the latent heat per particle, and $\Delta v = v_2 - v_1$. For ideal gas Bose condensation, the coexistence curve resembles the red curve in the right hand panel of fig. 5.8. There is no meaning to the shaded region where $p > p_c(T)$. Nevertheless, it is tempting to associate the curve $p = p_c(T)$ with the coexistence of the k = 0 condensate and the remaining uncondensed (k ≠ 0) bosons⁸.
⁷ Note that in the thermodynamics chapter we used v to denote the molar volume, $N_A V/N$.
⁸ The k ≠ 0 particles are sometimes called the overcondensate.

Figure 5.8: Phase diagrams for the ideal Bose gas. Left panel: (p, v) plane. The solid blue curves are isotherms, and the green hatched region denotes $v < v_c(T)$, where the system is partially condensed. Right panel: (p, T) plane. The solid red curve is the coexistence curve $p_c(T)$, along which Bose condensation occurs. No distinct thermodynamic phase exists in the yellow hatched region above $p = p_c(T)$.

The entropy in the coexistence region is given by

$$
s = -\frac{1}{N}\Big(\frac{\partial\Omega}{\partial T}\Big)_V = \tfrac{5}{2}\,\zeta\big(\tfrac{5}{2}\big)\,k_B\,v\,\lambda_T^{-3} = \frac{5}{2}\,\frac{\zeta(\frac{5}{2})}{\zeta(\frac{3}{2})}\,k_B\Big(1 - \frac{n_0}{n}\Big)\ . \tag{5.177}
$$

All the entropy is thus carried by the uncondensed bosons, and the condensate carries zero entropy. The Clausius-Clapeyron relation can then be interpreted as describing a phase equilibrium between the condensate, for which $s_0 = v_0 = 0$, and the uncondensed bosons, for which $s' = s(T)$ and $v' = v_c(T)$. So this identification forces us to conclude that the specific volume of the condensate is zero. This is certainly false in an interacting Bose gas!
While one can identify, by analogy, a ‘latent heat’ ℓ = T ∆s = T s in the Clapeyron equation, it is
important to understand that there is no distinct thermodynamic phase associated with the region p >
pc (T ). Ideal Bose gas condensation is a second order transition, and not a first order transition.
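The Clausius-Clapeyron interpretation can be checked numerically. Work in units with $k_B = 1$, $v_a = 1$ and $T_a = 1$, so that $p_a = k_BT_a/v_a = 1$ by eqn. 5.172. Setting z = 1 and $n_0 = 0$ in eqns. 5.173 and 5.174 then gives $v_c = \tau^{-3/2}/\zeta(3/2)$ and $p_c = \zeta(5/2)\,\tau^{5/2}$. Eqn. 5.177 at $n_0 = 0$ gives the entropy per uncondensed boson. The sketch below compares a finite-difference slope $dp_c/d\tau$ with Δs/Δv, taking $s_0 = v_0 = 0$ for the condensate as in the text. The names are illustrative.

```python
ZETA_3_2 = 2.612375348685488   # zeta(3/2)
ZETA_5_2 = 1.341487257250917   # zeta(5/2)

def p_c(tau):
    """Coexistence pressure p_c / p_a = zeta(5/2) tau^(5/2), tau = T / T_a."""
    return ZETA_5_2 * tau ** 2.5

def v_c(tau):
    """Critical volume per particle v_c / v_a = tau^(-3/2) / zeta(3/2)."""
    return tau ** -1.5 / ZETA_3_2

def clapeyron_slope(tau):
    """Delta s / Delta v between uncondensed bosons (s' from eqn. 5.177 at
    n0 = 0, v' = v_c) and the condensate (s0 = v0 = 0), units k_B = v_a = T_a = 1."""
    delta_s = 2.5 * ZETA_5_2 / ZETA_3_2
    return delta_s / v_c(tau)

def numerical_slope(tau, h=1e-5):
    """Central finite difference of p_c(tau)."""
    return (p_c(tau + h) - p_c(tau - h)) / (2 * h)

for tau in (0.5, 1.0, 2.0):
    print(tau, numerical_slope(tau), clapeyron_slope(tau))
```

The two columns agree, so the coexistence curve does satisfy eqn. 5.176 with a condensate of zero entropy and zero specific volume.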

5.7.4 The λ-transition in Liquid ⁴He

Helium has two stable isotopes. ⁴He is a boson, consisting of two protons, two neutrons, and two electrons (hence an even number of fermions). ³He is a fermion, with one less neutron than ⁴He. Each ⁴He atom can be regarded as a tiny hard sphere of mass m = 6.65 × 10⁻²⁴ g and diameter a = 2.65 Å. A sketch of the phase diagram is shown in fig. 5.9. At atmospheric pressure, helium liquefies at $T_l$ = 4.2 K. The gas-liquid transition is first order, as usual. However, as one continues to cool, a second transition sets in at $T = T_\lambda$ = 2.17 K (at p = 1 atm). The λ-transition, so named for the λ-shaped anomaly in the specific heat in the vicinity of the transition, as shown in fig. 5.10, is continuous (i.e. second order).
Figure 5.9: Phase diagram of ⁴He. All phase boundaries are first order transition lines, with the exception of the normal liquid-superfluid transition, which is second order. (Source: University of Helsinki)

If we pretend that ⁴He is a noninteracting Bose gas, then from the density of the liquid n = 2.2 × 10²² cm⁻³, we obtain a Bose-Einstein condensation temperature $T_c = \frac{2\pi\hbar^2}{mk_B}\big(n/\zeta(\frac{3}{2})\big)^{2/3}$ = 3.16 K, which is in the right ballpark. The specific heat $C_p(T)$ is found to be singular at $T = T_\lambda$, with

$$
C_p(T) = A\,\big|T - T_\lambda(p)\big|^{-\alpha}\ . \tag{5.178}
$$

α is an example of a critical exponent. We shall study the physics of critical phenomena later on in this course. For now, note that a cusp singularity of the type found in fig. 5.7 corresponds to α = −1. The behavior of $C_p(T)$ in ⁴He is very nearly logarithmic in $|T - T_\lambda|$. In fact, both theory (renormalization group on the O(2) model) and experiment concur that α is almost zero but in fact slightly negative, with α = −0.0127 ± 0.0003 in the best experiments (Lipa et al., 2003). The λ transition is most definitely not an ideal Bose gas condensation. Theoretically, in the parlance of critical phenomena, IBG condensation and the λ-transition in ⁴He lie in different universality classes⁹. Unlike the IBG, the condensed phase in ⁴He is a distinct thermodynamic phase, known as a superfluid.

Note that $C_p(T < T_c)$ for the IBG is not even defined, since for $T < T_c$ we have p = p(T) and therefore dp = 0 requires dT = 0.

5.7.5 Fountain effect in superfluid ⁴He

At temperatures $T < T_\lambda$, liquid ⁴He has a superfluid component which is a type of Bose condensate. In fact, there is an important difference between the condensate fraction $N_{k=0}/N$ and the superfluid density, which is denoted by the symbol $\rho_s$. In ⁴He, for example, at T = 0 the condensate fraction is only about 8%, while the superfluid fraction $\rho_s/\rho$ = 1. The distinction between $N_0$ and $\rho_s$ is very interesting but lies beyond the scope of this course.
⁹ IBG condensation is in the universality class of the spherical model. The λ-transition is in the universality class of the XY model.

Figure 5.10: Specific heat of liquid ⁴He in the vicinity of the λ-transition. Data from M. J. Buckingham and W. M. Fairbank, in Progress in Low Temperature Physics, C. J. Gorter, ed. (North-Holland, 1961). Inset at upper right: more recent data of J. A. Lipa et al., Phys. Rev. B 68, 174518 (2003), from an experiment performed in zero-gravity Earth orbit, to within ΔT = 2 nK of the transition.

One aspect of the superfluid state is its complete absence of viscosity. For this reason, superfluids can flow through tiny cracks called microleaks that will not pass normal fluid. Consider then a porous plug which permits the passage of superfluid but not of normal fluid. The key feature of the superfluid component is that it has zero energy density. Therefore, even though there is a transfer of particles across the plug, there is no energy exchange, and a temperature gradient across the plug can be maintained¹⁰.
The elementary excitations in the superfluid state are sound waves called phonons. They are compressional waves, just like longitudinal phonons in a solid, but here in a liquid. Their dispersion is acoustic, given by ω(k) = ck where c = 238 m/s.¹¹ They have no internal degrees of freedom, hence g = 1. Like phonons in a solid, the phonons in liquid helium are not conserved. Hence their chemical potential vanishes and these excitations are described by photon statistics (Bose-Einstein statistics with µ = 0). We can now compute the height difference Δh in a U-tube experiment.

¹⁰ Recall that two bodies in thermal equilibrium will have identical temperatures if they are free to exchange energy.
¹¹ The phonon velocity c is slightly temperature dependent.

Figure 5.11: The fountain effect. In each case, a temperature gradient is maintained across a porous
plug through which only superfluid can flow. This results in a pressure gradient which can result in a
fountain or an elevated column in a U-tube.

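Clearly Δh = Δp/ρg, so we must find p(T) for the helium. In the grand canonical ensemble, we have

$$
\begin{aligned}
p = -\Omega/V &= -k_BT\int\!\frac{d^3k}{(2\pi)^3}\,\ln\big(1 - e^{-\hbar ck/k_BT}\big) \\
&= -\frac{(k_BT)^4}{(\hbar c)^3}\,\frac{4\pi}{8\pi^3}\int_0^\infty\! du\;u^2\ln\big(1-e^{-u}\big) = \frac{\pi^2}{90}\,\frac{(k_BT)^4}{(\hbar c)^3}\ .
\end{aligned} \tag{5.179}
$$

Let’s assume T = 1 K. We’ll need the density of liquid helium, ρ = 148 kg/m³.

$$
\begin{aligned}
\frac{dh}{dT} &= \frac{2\pi^2}{45}\Big(\frac{k_BT}{\hbar c}\Big)^3\frac{k_B}{\rho g} \\
&= \frac{2\pi^2}{45}\bigg(\frac{(1.38\times10^{-23}\,{\rm J/K})(1\,{\rm K})}{(1.055\times10^{-34}\,{\rm J\cdot s})(238\,{\rm m/s})}\bigg)^3\times\frac{(1.38\times10^{-23}\,{\rm J/K})}{(148\,{\rm kg/m^3})(9.8\,{\rm m/s^2})} \simeq 69\ {\rm cm/K}\ ,
\end{aligned} \tag{5.180}
$$

a very noticeable effect!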
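The arithmetic in eqn. 5.180 is easy to script. The sketch below uses the same rounded constants that appear there: $k_B$ = 1.38 × 10⁻²³ J/K, ħ = 1.055 × 10⁻³⁴ J·s, c = 238 m/s, ρ = 148 kg/m³ and g = 9.8 m/s². It also cross-checks the closed form for dp/dT against a finite-difference derivative of the phonon pressure in eqn. 5.179. The function names are hypothetical.

```python
import math

K_B = 1.38e-23     # J/K, rounded value used in eqn. 5.180
HBAR = 1.055e-34   # J s
C_SOUND = 238.0    # m/s, phonon velocity in He II
RHO = 148.0        # kg/m^3, density of liquid helium
G_ACC = 9.8        # m/s^2

def phonon_pressure(T):
    """p = (pi^2 / 90) (k_B T)^4 / (hbar c)^3, eqn. 5.179 (SI units)."""
    return math.pi ** 2 / 90 * (K_B * T) ** 4 / (HBAR * C_SOUND) ** 3

def dh_dT(T):
    """dh/dT = (dp/dT) / (rho g), with dp/dT = (2 pi^2 / 45) k_B (k_B T / hbar c)^3."""
    dp_dT = 2 * math.pi ** 2 / 45 * K_B * (K_B * T / (HBAR * C_SOUND)) ** 3
    return dp_dT / (RHO * G_ACC)

rate = dh_dT(1.0)
print(f"dh/dT at 1 K: {100 * rate:.0f} cm/K")
```

With these inputs dh_dT(1.0) comes out at roughly 0.69 m/K, i.e. about 69 cm/K.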

5.7.6 Bose condensation in optical traps

The 2001 Nobel Prize in Physics was awarded to Wieman, Cornell, and Ketterle for the experimental observation of Bose condensation in dilute atomic gases. The experimental techniques required to trap and cool such systems are a true tour de force, and we shall not enter into a discussion of the details here¹².
¹² Many reliable descriptions may be found on the web. Check Wikipedia, for example.
The optical trapping of neutral bosonic atoms, such as ⁸⁷Rb, results in a confining potential V(r) which is quadratic in the atomic positions. Thus, the single particle Hamiltonian for a given atom is written

$$
\hat H = -\frac{\hbar^2}{2m}\nabla^2 + \tfrac{1}{2}\,m\big(\omega_1^2x^2 + \omega_2^2y^2 + \omega_3^2z^2\big)\ , \tag{5.181}
$$

where $\omega_{1,2,3}$ are the angular frequencies of the trap. This is an anisotropic three-dimensional harmonic oscillator, the solution of which is separable into a product of one-dimensional harmonic oscillator wavefunctions. The eigenspectrum is then given by a sum of one-dimensional spectra, viz.

$$
E_{n_1,n_2,n_3} = \big(n_1+\tfrac{1}{2}\big)\hbar\omega_1 + \big(n_2+\tfrac{1}{2}\big)\hbar\omega_2 + \big(n_3+\tfrac{1}{2}\big)\hbar\omega_3\ . \tag{5.182}
$$
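According to eqn. 5.13, the number of particles in the system is

$$
\begin{aligned}
N &= \sum_{n_1=0}^{\infty}\sum_{n_2=0}^{\infty}\sum_{n_3=0}^{\infty}\Big[y^{-1}e^{n_1\hbar\omega_1/k_BT}\,e^{n_2\hbar\omega_2/k_BT}\,e^{n_3\hbar\omega_3/k_BT} - 1\Big]^{-1} \\
&= \sum_{k=1}^{\infty} y^k\,\bigg(\frac{1}{1-e^{-k\hbar\omega_1/k_BT}}\bigg)\bigg(\frac{1}{1-e^{-k\hbar\omega_2/k_BT}}\bigg)\bigg(\frac{1}{1-e^{-k\hbar\omega_3/k_BT}}\bigg)\ ,
\end{aligned} \tag{5.183}
$$

where we’ve defined

$$
y \equiv e^{\mu/k_BT}\,e^{-\hbar\omega_1/2k_BT}\,e^{-\hbar\omega_2/2k_BT}\,e^{-\hbar\omega_3/2k_BT}\ . \tag{5.184}
$$

Note that y ∈ [0, 1].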
Let’s assume that the trap is approximately isotropic, which entails that the frequency ratios $\omega_1/\omega_2$ etc. are all numbers on the order of one. Let us further assume that $k_BT \gg \hbar\omega_{1,2,3}$. Then

$$
\frac{1}{1-e^{-k\hbar\omega_j/k_BT}} \approx \begin{cases} \dfrac{k_BT}{k\hbar\omega_j} & k \lesssim k^*(T) \\[8pt] 1 & k > k^*(T) \end{cases} \tag{5.185}
$$

where $k^*(T) = k_BT/\hbar\bar\omega \gg 1$, with

$$
\bar\omega = \big(\omega_1\,\omega_2\,\omega_3\big)^{1/3}\ . \tag{5.186}
$$

We then have

$$
N(T,y) \approx \frac{y^{k^*+1}}{1-y} + \Big(\frac{k_BT}{\hbar\bar\omega}\Big)^3\sum_{k=1}^{k^*}\frac{y^k}{k^3}\ , \tag{5.187}
$$

where the first term on the RHS is due to $k > k^*$ and the second term from $k \le k^*$ in the previous sum. Since $k^* \gg 1$ and since the sum of inverse cubes is convergent, we may safely extend the limit on the above sum to infinity. To help make more sense of the first term, write $N_0 = (y^{-1} - 1)^{-1}$ for the number of particles in the $(n_1, n_2, n_3) = (0, 0, 0)$ state. Then

$$
y = \frac{N_0}{N_0+1}\ . \tag{5.188}
$$

This is true always. The issue vis-à-vis Bose-Einstein condensation is whether $N_0 \gg 1$. At any rate, we now see that we can write

$$
N \approx N_0\big(1+N_0^{-1}\big)^{-k^*} + \Big(\frac{k_BT}{\hbar\bar\omega}\Big)^3\,\mathrm{Li}_3(y)\ . \tag{5.189}
$$
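As for the first term, we have

$$
N_0\big(1+N_0^{-1}\big)^{-k^*} = \begin{cases} 0 & N_0 \ll k^* \\ N_0 & N_0 \gg k^* \end{cases} \tag{5.190}
$$

Thus, as in the case of IBG condensation of ballistic particles, we identify the critical temperature by the condition $y = N_0/(N_0+1) \approx 1$, and we have

$$
T_c = \frac{\hbar\bar\omega}{k_B}\Big(\frac{N}{\zeta(3)}\Big)^{1/3} = 4.5\,\Big(\frac{\bar\nu}{100\ {\rm Hz}}\Big)\,N^{1/3}\ [\,{\rm nK}\,]\ , \tag{5.191}
$$

where $\bar\nu = \bar\omega/2\pi$. We see that $k_BT_c \gg \hbar\bar\omega$ if the number of particles in the trap is large: N ≫ 1. In this regime, we have

$$
\begin{aligned}
T<T_c:&\quad N = N_0 + \zeta(3)\Big(\frac{k_BT}{\hbar\bar\omega}\Big)^3 \\
T>T_c:&\quad N = \Big(\frac{k_BT}{\hbar\bar\omega}\Big)^3\,\mathrm{Li}_3(y)\ .
\end{aligned} \tag{5.192}
$$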

It is interesting to note that BEC can also occur in two-dimensional traps, which is to say traps which are very anisotropic, with oblate equipotential surfaces V(r) = V₀. This happens when $\hbar\omega_3 \gg k_BT \gg \hbar\omega_{1,2}$. We then have

$$
T_c^{(d=2)} = \frac{\hbar\bar\omega}{k_B}\cdot\Big(\frac{6N}{\pi^2}\Big)^{1/2} \tag{5.193}
$$

with $\bar\omega = (\omega_1\omega_2)^{1/2}$. The particle number then obeys a set of equations like those in eqns. 5.192, mutatis mutandis¹³.
For extremely prolate traps, with $\omega_3 \ll \omega_{1,2}$, the situation is different because $\mathrm{Li}_1(y)$ diverges for y = 1. We then have

$$
N = N_0 + \frac{k_BT}{\hbar\omega_3}\,\ln\big(1+N_0\big)\ . \tag{5.194}
$$

Here we have simply replaced y by the equivalent expression $N_0/(N_0+1)$. If our criterion for condensation is that $N_0 = \alpha N$, where α is some fractional value, then we have

$$
T_c(\alpha) = (1-\alpha)\cdot\frac{\hbar\omega_3}{k_B}\,\frac{N}{\ln N}\ . \tag{5.195}
$$

¹³ Explicitly, one replaces ζ(3) with ζ(2) = π²/6, $\mathrm{Li}_3(y)$ with $\mathrm{Li}_2(y)$, and $(k_BT/\hbar\bar\omega)^3$ with $(k_BT/\hbar\bar\omega)^2$.
5.8. THE IDEAL FERMI GAS 259

5.8 The Ideal Fermi Gas

5.8.1 Grand potential and particle number

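The grand potential of the ideal Fermi gas is, per eqn. 5.11,

$$
\begin{aligned}
\Omega(T,V,\mu) &= -k_BT\sum_\alpha\ln\Big(1 + e^{\mu/k_BT}\,e^{-\varepsilon_\alpha/k_BT}\Big) \\
&= -V k_BT\int_{-\infty}^{\infty}\! d\varepsilon\;g(\varepsilon)\,\ln\Big(1 + e^{(\mu-\varepsilon)/k_BT}\Big)\ .
\end{aligned} \tag{5.196}
$$

The average number of particles in a state with energy ε is

$$
n(\varepsilon) = \frac{1}{e^{(\varepsilon-\mu)/k_BT}+1}\ , \tag{5.197}
$$

hence the total number of particles is

$$
N = V\int_{-\infty}^{\infty}\! d\varepsilon\;g(\varepsilon)\,\frac{1}{e^{(\varepsilon-\mu)/k_BT}+1}\ . \tag{5.198}
$$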

5.8.2 The Fermi distribution

We define the function

$$
f(\epsilon) \equiv \frac{1}{e^{\epsilon/k_BT}+1}\ , \tag{5.199}
$$

known as the Fermi distribution. In the T → ∞ limit, f(ε) → ½ for all finite values of ε. As T → 0, f(ε) approaches a step function Θ(−ε). The average number of particles in a state of energy ε in a system at temperature T and chemical potential µ is n(ε) = f(ε − µ). In fig. 5.12 we plot f(ε − µ) versus ε for three representative temperatures.

5.8.3 T = 0 and the Fermi surface

At T = 0, we therefore have n(ε) = Θ(µ − ε), which says that all single particle energy states up to ε = µ are filled, and all energy states above ε = µ are empty. We call µ(T = 0) the Fermi energy: $\varepsilon_F = \mu(T=0)$. If the single particle dispersion ε(k) depends only on the wavevector k, then the locus of points in k-space for which $\varepsilon(\boldsymbol{k}) = \varepsilon_F$ is called the Fermi surface. For isotropic systems, the dispersion depends only on the magnitude k = |k| of the wavevector, and the Fermi surface is a sphere in d = 3 or a circle in d = 2. The radius of this sphere or circle is the Fermi wavevector, $k_F$. When there is an internal (e.g. spin) degree of freedom, there is a Fermi surface and Fermi wavevector (for isotropic systems) for each polarization state of the internal degree of freedom.

Figure 5.12: The Fermi distribution, $f(\epsilon) = \big[\exp(\epsilon/k_BT) + 1\big]^{-1}$. Here we have set $k_B = 1$ and taken µ = 2, with T = 1/20 (blue), T = 3/4 (green), and T = 2 (red). In the T → 0 limit, f(ε) approaches a step function Θ(−ε).

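Let’s compute the Fermi wavevector $k_F$ and Fermi energy $\varepsilon_F$ for the IFG with a ballistic dispersion $\varepsilon(\boldsymbol{k}) = \hbar^2\boldsymbol{k}^2/2m$. The number density is

$$
n = g\int\!\frac{d^dk}{(2\pi)^d}\,\Theta(k_F - k) = \frac{g\,\Omega_d}{(2\pi)^d}\cdot\frac{k_F^d}{d} = \begin{cases} g\,k_F/\pi & (d=1) \\ g\,k_F^2/4\pi & (d=2) \\ g\,k_F^3/6\pi^2 & (d=3)\ , \end{cases} \tag{5.200}
$$

where $\Omega_d = 2\pi^{d/2}/\Gamma(d/2)$ is the area of the unit sphere in d space dimensions. Note that the form of $n(k_F)$ is independent of the dispersion relation, so long as it remains isotropic. Inverting the above expressions, we obtain $k_F(n)$:

$$
k_F = 2\pi\Big(\frac{d\,n}{g\,\Omega_d}\Big)^{1/d} = \begin{cases} \pi n/g & (d=1) \\ (4\pi n/g)^{1/2} & (d=2) \\ (6\pi^2 n/g)^{1/3} & (d=3)\ . \end{cases} \tag{5.201}
$$

The Fermi energy in each case, for ballistic dispersion, is therefore

$$
\varepsilon_F = \frac{\hbar^2k_F^2}{2m} = \frac{2\pi^2\hbar^2}{m}\Big(\frac{d\,n}{g\,\Omega_d}\Big)^{2/d} = \begin{cases} \dfrac{\pi^2\hbar^2n^2}{2g^2m} & (d=1) \\[8pt] \dfrac{2\pi\hbar^2n}{gm} & (d=2) \\[8pt] \dfrac{\hbar^2}{2m}\Big(\dfrac{6\pi^2n}{g}\Big)^{2/3} & (d=3)\ . \end{cases} \tag{5.202}
$$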

Another useful result for the ballistic dispersion, which follows from the above, is that the density of states at the Fermi level is given by

$$
g(\varepsilon_F) = \frac{g\,\Omega_d}{(2\pi)^d}\cdot\frac{mk_F^{d-2}}{\hbar^2} = \frac{d}{2}\cdot\frac{n}{\varepsilon_F}\ . \tag{5.203}
$$

For the electron gas, we have g = 2. In a metal, one typically has $k_F$ ∼ 0.5 Å⁻¹ to 2 Å⁻¹, and $\varepsilon_F$ ∼ 1 eV to 10 eV. Due to the effects of the crystalline lattice, electrons in a solid behave as if they had an effective mass $m^*$, which is typically on the order of the electron mass but very often about an order of magnitude smaller, particularly in semiconductors.
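The general-d formulas in eqns. 5.200 to 5.203 translate directly into code. The sketch below uses the Python standard library and computes $\Omega_d$, $k_F$, $\varepsilon_F$ and $g(\varepsilon_F)$ for a ballistic dispersion. Units are left to the caller (for instance ħ = m = 1), and the function names are our own.

```python
import math

def unit_sphere_area(d):
    """Omega_d = 2 pi^(d/2) / Gamma(d/2): 2, 2 pi, 4 pi for d = 1, 2, 3."""
    return 2 * math.pi ** (d / 2) / math.gamma(d / 2)

def fermi_wavevector(n, d, g=2):
    """k_F = 2 pi (d n / (g Omega_d))^(1/d), eqn. 5.201."""
    return 2 * math.pi * (d * n / (g * unit_sphere_area(d))) ** (1 / d)

def fermi_energy(n, d, m=1.0, hbar=1.0, g=2):
    """eps_F = hbar^2 k_F^2 / (2 m), eqn. 5.202 (ballistic dispersion)."""
    return (hbar * fermi_wavevector(n, d, g)) ** 2 / (2 * m)

def dos_at_fermi_level(n, d, m=1.0, hbar=1.0, g=2):
    """g(eps_F) = g Omega_d m k_F^(d-2) / ((2 pi)^d hbar^2), eqn. 5.203."""
    kf = fermi_wavevector(n, d, g)
    return g * unit_sphere_area(d) * m * kf ** (d - 2) / ((2 * math.pi) ** d * hbar ** 2)

n = 0.37  # arbitrary density, in units with hbar = m = 1
for d in (1, 2, 3):
    print(d, fermi_wavevector(n, d), fermi_energy(n, d), dos_at_fermi_level(n, d))
```

Evaluating the functions for d = 1, 2, 3 reproduces the explicit cases listed in eqn. 5.201 and the identity $g(\varepsilon_F) = (d/2)\,n/\varepsilon_F$.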
Nonisotropic dispersions ε(k) are more interesting in that they give rise to non-spherical Fermi surfaces. The simplest example is that of a two-dimensional ‘tight-binding’ model of electrons hopping on a square lattice, as may be appropriate in certain layered materials. The dispersion relation is then

$$
\varepsilon(k_x,k_y) = -2t\cos(k_xa) - 2t\cos(k_ya)\ , \tag{5.204}
$$

where $k_x$ and $k_y$ are confined to the interval $\big[-\frac{\pi}{a}, \frac{\pi}{a}\big]$. The quantity t has dimensions of energy and is known as the hopping integral. The Fermi surface is the set of points $(k_x, k_y)$ which satisfies $\varepsilon(k_x,k_y) = \varepsilon_F$. When $\varepsilon_F$ achieves its minimum value of $\varepsilon_F^{\rm min} = -4t$, the Fermi surface collapses to a point at $(k_x, k_y) = (0, 0)$. For energies just above this minimum value, we can expand the dispersion in a power series, writing

$$
\varepsilon(k_x,k_y) = -4t + ta^2\big(k_x^2 + k_y^2\big) - \tfrac{1}{12}\,ta^4\big(k_x^4 + k_y^4\big) + \ldots\ . \tag{5.205}
$$

If we only work to quadratic order in $k_x$ and $k_y$, the dispersion is isotropic, and the Fermi surface is a circle, with $k_F^2 = (\varepsilon_F + 4t)/ta^2$. As the energy increases further, the continuous O(2) rotational invariance is broken down to the discrete group of rotations of the square, $C_{4v}$. The Fermi surfaces distort and eventually, at $\varepsilon_F = 0$, the Fermi surface is itself a square. As $\varepsilon_F$ increases further, the square turns back into a circle, but centered about the point $\big(\frac{\pi}{a}, \frac{\pi}{a}\big)$. Note that everything is periodic in $k_x$ and $k_y$ modulo $\frac{2\pi}{a}$. The Fermi surfaces for this model are depicted in the upper right panel of fig. 5.13.
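One simple way to see how the Fermi sea fills the zone for the square-lattice dispersion of eqn. 5.204 is to count k-points. The sketch below sets the lattice constant a = 1, uses a midpoint grid, and estimates the fraction of the Brillouin zone (per spin state) with $\varepsilon(\boldsymbol{k}) < \varepsilon_F$. The names and the grid size are illustrative. Near the band bottom the fraction should approach the circular result $\pi k_F^2/(2\pi)^2 = (\varepsilon_F + 4t)/4\pi t$ implied by eqn. 5.205. At $\varepsilon_F = 0$, where the Fermi surface is a square, it should be one half.

```python
import math

def filling_fraction(eps_f, t=1.0, N=400):
    """Fraction of the Brillouin zone [-pi, pi)^2 (lattice constant a = 1) with
    eps(kx, ky) = -2t cos kx - 2t cos ky < eps_F (eqn. 5.204), estimated by
    counting an N x N grid of k-points placed at cell midpoints."""
    h = 2 * math.pi / N
    cosines = [math.cos(-math.pi + (i + 0.5) * h) for i in range(N)]
    inside = sum(1 for cx in cosines for cy in cosines if -2 * t * (cx + cy) < eps_f)
    return inside / N ** 2

for eps_f in (-3.9, -2.0, 0.0, 2.0):
    print(eps_f, filling_fraction(eps_f))
```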

Fermi surfaces in three dimensions can be very interesting indeed, and of great importance in understanding the electronic properties of solids. Two examples are shown in the bottom panels of fig. 5.13. The electronic configuration of cesium (Cs) is [Xe] 6s¹. The 6s electrons ‘hop’ from site to site on a body centered cubic (BCC) lattice, a generalization of the simple two-dimensional square lattice hopping model discussed above. The elementary unit cell in k space, known as the first Brillouin zone, turns out to be a rhombic dodecahedron. In yttrium, the electronic structure is [Kr] 5s² 4d¹, and there are two electronic energy bands at the Fermi level, meaning two Fermi surfaces. Yttrium forms a hexagonal close packed (HCP) crystal structure, and its first Brillouin zone is shaped like a hexagonal pillbox.

Spin-split Fermi surfaces

Consider an electron gas in an external magnetic field H. The single particle Hamiltonian is then

$$
\hat H = \frac{\boldsymbol{p}^2}{2m} + \mu_B H\sigma\ , \tag{5.206}
$$

Figure 5.13: Fermi surfaces for two and three-dimensional structures. Upper left: free particles in two
dimensions. Upper right: ‘tight binding’ electrons on a square lattice. Lower left: Fermi surface for
cesium, which is predominantly composed of electrons in the 6s orbital shell. Lower right: the Fermi
surface of yttrium has two parts. One part (yellow) is predominantly due to 5s electrons, while the other
(pink) is due to 4d electrons. (Source: www.phys.ufl.edu/fermisurface/)

where $\mu_B$ is the Bohr magneton,

$$
\begin{aligned}
\mu_B &= \frac{e\hbar}{2mc} = 5.788\times10^{-9}\ {\rm eV/G} \\
\mu_B/k_B &= 6.717\times10^{-5}\ {\rm K/G}\ ,
\end{aligned} \tag{5.207}
$$

where m is the electron mass. What happens at T = 0 to a noninteracting electron gas in a magnetic field?
Electrons of each spin polarization form their own Fermi surfaces. That is, there is an up spin Fermi surface, with Fermi wavevector $k_{F\uparrow}$, and a down spin Fermi surface, with Fermi wavevector $k_{F\downarrow}$. The Fermi energy (i.e. the chemical potential), on the other hand, must be the same for both spin species, hence

$$
\frac{\hbar^2k_{F\uparrow}^2}{2m} + \mu_BH = \frac{\hbar^2k_{F\downarrow}^2}{2m} - \mu_BH\ , \tag{5.208}
$$

which says

$$
k_{F\downarrow}^2 - k_{F\uparrow}^2 = \frac{2eH}{\hbar c}\ . \tag{5.209}
$$

The total density is

$$
n = \frac{k_{F\uparrow}^3}{6\pi^2} + \frac{k_{F\downarrow}^3}{6\pi^2} \quad\Longrightarrow\quad k_{F\uparrow}^3 + k_{F\downarrow}^3 = 6\pi^2n\ . \tag{5.210}
$$

Clearly the down spin Fermi surface grows and the up spin Fermi surface shrinks with increasing H. Eventually, the minority spin Fermi surface vanishes altogether. This happens for the up spins when $k_{F\uparrow} = 0$. Solving for the critical field, we obtain

$$
H_c = \frac{\hbar c}{2e}\cdot\big(6\pi^2n\big)^{2/3}\ . \tag{5.211}
$$

In real magnetic solids, like cobalt and nickel, the spin-split Fermi surfaces are not spheres, just like the case of the (spin degenerate) Fermi surfaces for Cs and Y shown in fig. 5.13.
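Eqns. 5.209 and 5.210 can be solved numerically for any field below the critical one. Measure wavevectors in units of $K = (6\pi^2n)^{1/3}$ and the field as $h = 2eH/\hbar cK^2$. The two conditions then read $k_\downarrow^2 - k_\uparrow^2 = h$ and $k_\uparrow^3 + k_\downarrow^3 = 1$, and the up-spin Fermi sea empties exactly at h = 1. The minimal sketch below solves these by bisection on $k_\uparrow$ and returns the fully polarized state for h ≥ 1. Names and tolerances are illustrative.

```python
def spin_split_kf(h, tol=1e-13):
    """Return (k_up, k_down) in units of K = (6 pi^2 n)^(1/3), for the reduced
    field h = 2 e H / (hbar c K^2). In these units eqns. 5.209-5.210 become
    k_down^2 - k_up^2 = h and k_up^3 + k_down^3 = 1."""
    if h < 0:
        raise ValueError("h must be non-negative")
    if h >= 1.0:
        return 0.0, 1.0  # up-spin Fermi sea is empty: fully polarized

    def excess(x):
        # k_up^3 + k_down^3 - 1 with k_down^2 = x^2 + h; increasing in x
        return x ** 3 + (x ** 2 + h) ** 1.5 - 1.0

    lo, hi = 0.0, 1.0  # excess(0) < 0 < excess(1) for 0 <= h < 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            hi = mid
        else:
            lo = mid
    k_up = 0.5 * (lo + hi)
    return k_up, (k_up ** 2 + h) ** 0.5

for h in (0.0, 0.25, 0.5, 0.75, 0.99, 1.0):
    print(h, spin_split_kf(h))
```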

5.8.4 The Sommerfeld expansion

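In dealing with the ideal Fermi gas, we will repeatedly encounter integrals of the form

$$
I(T,\mu) \equiv \int_{-\infty}^{\infty}\! d\varepsilon\;f(\varepsilon-\mu)\,\phi(\varepsilon)\ . \tag{5.212}
$$

The Sommerfeld expansion provides a systematic way of expanding these expressions in powers of T and is an important analytical tool in analyzing the low temperature properties of the ideal Fermi gas (IFG). We start by defining

$$
\Phi(\varepsilon) \equiv \int_{-\infty}^{\varepsilon}\! d\varepsilon'\;\phi(\varepsilon') \tag{5.213}
$$

so that φ(ε) = Φ′(ε). We then have

$$
\begin{aligned}
I &= \int_{-\infty}^{\infty}\! d\varepsilon\;f(\varepsilon-\mu)\,\frac{d\Phi}{d\varepsilon} \\
&= -\int_{-\infty}^{\infty}\! d\varepsilon\;f'(\varepsilon)\,\Phi(\mu+\varepsilon)\ ,
\end{aligned} \tag{5.214}
$$

where we assume Φ(−∞) = 0. Next, we invoke Taylor’s theorem, to write

$$
\begin{aligned}
\Phi(\mu+\varepsilon) &= \sum_{n=0}^{\infty}\frac{\varepsilon^n}{n!}\,\frac{d^n\Phi}{d\mu^n} \\
&= \exp\Big(\varepsilon\,\frac{d}{d\mu}\Big)\,\Phi(\mu)\ .
\end{aligned} \tag{5.215}
$$

This last expression involving the exponential of a differential operator may appear overly formal but it proves extremely useful. Since

$$
f'(\varepsilon) = -\frac{1}{k_BT}\,\frac{e^{\varepsilon/k_BT}}{\big(e^{\varepsilon/k_BT}+1\big)^2}\ , \tag{5.216}
$$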

Figure 5.14: Deformation of the complex integration contour in eqn. 5.219.

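we can write

$$
I = \int_{-\infty}^{\infty}\! dv\;\frac{e^{vD}}{(e^v+1)(e^{-v}+1)}\,\Phi(\mu)\ , \tag{5.217}
$$

with $v = \varepsilon/k_BT$, where

$$
D = k_BT\,\frac{d}{d\mu} \tag{5.218}
$$

is a dimensionless differential operator. The integral can now be done using the methods of complex integration:¹⁴

$$
\begin{aligned}
\int_{-\infty}^{\infty}\! dv\;\frac{e^{vD}}{(e^v+1)(e^{-v}+1)} &= 2\pi i\sum_{n=0}^{\infty}\operatorname*{Res}_{v=(2n+1)i\pi}\bigg[\frac{e^{vD}}{(e^v+1)(e^{-v}+1)}\bigg] \\
&= -2\pi i\sum_{n=0}^{\infty} D\,e^{(2n+1)i\pi D} \\
&= -\frac{2\pi i D\,e^{i\pi D}}{1 - e^{2\pi iD}} = \pi D\csc\pi D\ .
\end{aligned} \tag{5.219}
$$

Thus,

$$
I(T,\mu) = \pi D\csc(\pi D)\,\Phi(\mu)\ , \tag{5.220}
$$

which is to be understood as the differential operator πD csc(πD) = πD/sin(πD) acting on the function Φ(µ). Appealing once more to Taylor’s theorem, we have

$$
\pi D\csc(\pi D) = 1 + \frac{\pi^2}{6}(k_BT)^2\frac{d^2}{d\mu^2} + \frac{7\pi^4}{360}(k_BT)^4\frac{d^4}{d\mu^4} + \ldots\ . \tag{5.221}
$$

¹⁴ Note that writing v = (2n + 1)iπ + ϵ we have $e^{\pm v} = -1 \mp \epsilon - \frac{1}{2}\epsilon^2 + \ldots$, so $(e^v + 1)(e^{-v} + 1) = -\epsilon^2 + \ldots$. We then expand $e^{vD} = e^{(2n+1)i\pi D}(1 + \epsilon D + \ldots)$ to find the residue: $\mathrm{Res} = -D\,e^{(2n+1)i\pi D}$.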
Thus,

$$
\begin{aligned}
I(T,\mu) &= \int_{-\infty}^{\infty}\! d\varepsilon\;f(\varepsilon-\mu)\,\phi(\varepsilon) \\
&= \int_{-\infty}^{\mu}\! d\varepsilon\;\phi(\varepsilon) + \frac{\pi^2}{6}(k_BT)^2\,\phi'(\mu) + \frac{7\pi^4}{360}(k_BT)^4\,\phi'''(\mu) + \ldots\ .
\end{aligned} \tag{5.222}
$$

If φ(ε) is a polynomial function of its argument, then each derivative effectively reduces the order of the polynomial by one degree, and the dimensionless parameter of the expansion is (T/µ)². This procedure is known as the Sommerfeld expansion.
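The Sommerfeld series can be tested against direct quadrature. The sketch below takes φ(ε) = ε²Θ(ε), for which eqn. 5.222 stops after the T² term apart from corrections of order $e^{-\mu/k_BT}$. It compares a Simpson-rule integral with $\mu^3/3 + (\pi^2/3)(k_BT)^2\mu$, in units with $k_B = 1$. It also extracts the quartic coefficient of πx csc(πx) numerically to confirm the 7π⁴/360 in eqn. 5.221. The helper names, grid size, and parameter values are our own choices.

```python
import math

def fermi(x):
    """1 / (e^x + 1), arranged to avoid overflow for large |x|."""
    if x > 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (math.exp(x) + 1.0)

def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + (2 * i - 1) * h) for i in range(1, n // 2 + 1))
    s += 2 * sum(f(a + 2 * i * h) for i in range(1, n // 2))
    return s * h / 3

def sommerfeld_check(mu=1.0, T=0.05):
    """Return (quadrature, Sommerfeld) values of I = int f(eps - mu) eps^2 Theta(eps) deps
    with k_B = 1. Here Phi(mu) = mu^3 / 3 and phi'(mu) = 2 mu, while phi''' = 0 for
    mu > 0, so eqn. 5.222 stops at the T^2 term."""
    quad = simpson(lambda e: e * e * fermi((e - mu) / T), 0.0, mu + 40 * T)
    series = mu ** 3 / 3 + math.pi ** 2 / 6 * T ** 2 * (2 * mu)
    return quad, series

def quartic_coefficient(x=1e-2):
    """Estimate the x^4 coefficient of pi x / sin(pi x) = 1 + (pi^2 / 6) x^2 + ...,
    which eqn. 5.221 gives as 7 pi^4 / 360 (the estimate has an O(x^2) error)."""
    px = math.pi * x
    return (px / math.sin(px) - 1 - px ** 2 / 6) / x ** 4

print(sommerfeld_check())
print(quartic_coefficient(), 7 * math.pi ** 4 / 360)
```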

Chemical potential shift

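As our first application of the Sommerfeld expansion formalism, let us compute µ(n, T) for the ideal Fermi gas. The number density n(T, µ) is

$$
\begin{aligned}
n &= \int_{-\infty}^{\infty}\! d\varepsilon\;g(\varepsilon)\,f(\varepsilon-\mu) \\
&= \int_{-\infty}^{\mu}\! d\varepsilon\;g(\varepsilon) + \frac{\pi^2}{6}(k_BT)^2\,g'(\mu) + \ldots\ .
\end{aligned} \tag{5.223}
$$

Let us write $\mu = \varepsilon_F + \delta\mu$, where $\varepsilon_F = \mu(T=0, n)$ is the Fermi energy, which is the chemical potential at T = 0. We then have

$$
\begin{aligned}
n &= \int_{-\infty}^{\varepsilon_F+\delta\mu}\! d\varepsilon\;g(\varepsilon) + \frac{\pi^2}{6}(k_BT)^2\,g'(\varepsilon_F+\delta\mu) + \ldots \\
&= \int_{-\infty}^{\varepsilon_F}\! d\varepsilon\;g(\varepsilon) + g(\varepsilon_F)\,\delta\mu + \frac{\pi^2}{6}(k_BT)^2\,g'(\varepsilon_F) + \ldots\ .
\end{aligned} \tag{5.224}
$$

Since n is held fixed, and the first integral on the right is just the T = 0 density n, the remaining terms must cancel, from which we derive

$$
\delta\mu = -\frac{\pi^2}{6}(k_BT)^2\,\frac{g'(\varepsilon_F)}{g(\varepsilon_F)} + \mathcal{O}(T^4)\ . \tag{5.225}
$$

Note that g′/g = (ln g)′. For a ballistic dispersion, assuming g = 2,

$$
g(\varepsilon) = 2\int\!\frac{d^3k}{(2\pi)^3}\,\delta\Big(\varepsilon - \frac{\hbar^2k^2}{2m}\Big) = \frac{m\,k(\varepsilon)}{\pi^2\hbar^2}\bigg|_{k(\varepsilon)=\sqrt{2m\varepsilon}/\hbar}\ . \tag{5.226}
$$

Thus, $g(\varepsilon) \propto \varepsilon^{1/2}$ and $(\ln g)' = \frac{1}{2}\varepsilon^{-1}$, so

$$
\mu(n,T) = \varepsilon_F - \frac{\pi^2}{12}\,\frac{(k_BT)^2}{\varepsilon_F} + \ldots\ , \tag{5.227}
$$

where $\varepsilon_F(n) = \frac{\hbar^2}{2m}(3\pi^2n)^{2/3}$.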
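As a check on eqn. 5.227, the sketch below solves $n(T,\mu) = n(0,\varepsilon_F)$ numerically for a three-dimensional density of states $g(\varepsilon) \propto \varepsilon^{1/2}$. It works in units where $\varepsilon_F = k_B = 1$ and drops the constant prefactor of g, so the fixed density is 2/3. Substituting ε = u² keeps the integrand smooth at the band bottom. The names and numerical parameters are illustrative.

```python
import math

def fermi(x):
    """1 / (e^x + 1), arranged to avoid overflow for large |x|."""
    if x > 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (math.exp(x) + 1.0)

def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + (2 * i - 1) * h) for i in range(1, n // 2 + 1))
    s += 2 * sum(f(a + 2 * i * h) for i in range(1, n // 2))
    return s * h / 3

def density(mu, T):
    """int_0^inf sqrt(eps) f(eps - mu) deps, written with eps = u^2 so that the
    integrand 2 u^2 f(u^2 - mu) is smooth at u = 0 (units eps_F = k_B = 1)."""
    umax = math.sqrt(max(mu, 0.0) + 40.0 * T)
    return simpson(lambda u: 2 * u * u * fermi((u * u - mu) / T), 0.0, umax)

def chemical_potential(T, n_fixed=2.0 / 3.0, lo=0.5, hi=1.5, iters=50):
    """Solve density(mu, T) = n_fixed by bisection; n_fixed = 2/3 is the
    T = 0 density for eps_F = 1, so mu -> 1 as T -> 0."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if density(mid, T) < n_fixed:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

T = 0.05
mu_numeric = chemical_potential(T)
mu_sommerfeld = 1.0 - math.pi ** 2 / 12 * T ** 2   # eqn. 5.227 with eps_F = 1
print(mu_numeric, mu_sommerfeld)
```

At $k_BT = 0.05\,\varepsilon_F$ the numerical root agrees with eqn. 5.227 to within about $10^{-5}\varepsilon_F$. The small residual is consistent with an $\mathcal{O}(T^4)$ correction.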

Specific heat

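The energy of the electron gas is

$$
\begin{aligned}
\frac{E}{V} &= \int_{-\infty}^{\infty}\! d\varepsilon\;g(\varepsilon)\,\varepsilon\,f(\varepsilon-\mu) \\
&= \int_{-\infty}^{\mu}\! d\varepsilon\;g(\varepsilon)\,\varepsilon + \frac{\pi^2}{6}(k_BT)^2\,\frac{d}{d\mu}\big(\mu\,g(\mu)\big) + \ldots \\
&= \int_{-\infty}^{\varepsilon_F}\! d\varepsilon\;g(\varepsilon)\,\varepsilon + g(\varepsilon_F)\,\varepsilon_F\,\delta\mu + \frac{\pi^2}{6}(k_BT)^2\,\varepsilon_F\,g'(\varepsilon_F) + \frac{\pi^2}{6}(k_BT)^2\,g(\varepsilon_F) + \ldots \\
&= \varepsilon_0 + \frac{\pi^2}{6}(k_BT)^2\,g(\varepsilon_F) + \ldots\ ,
\end{aligned} \tag{5.228}
$$

where, in the last step, the δµ term cancels the $\varepsilon_F\,g'(\varepsilon_F)$ term by eqn. 5.225, and $\varepsilon_0 = \int_{-\infty}^{\varepsilon_F} d\varepsilon\,g(\varepsilon)\,\varepsilon$ is the ground state energy density (i.e. ground state energy per unit volume). Thus, to order T²,

$$
C_{V,N} = \Big(\frac{\partial E}{\partial T}\Big)_{V,N} = \frac{\pi^2}{3}\,V k_B^2\,T\,g(\varepsilon_F) \equiv V\gamma T\ , \tag{5.229}
$$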
where $\gamma(n) = \frac{\pi^2}{3}\,k_B^2\,g\big(\varepsilon_F(n)\big)$. Note that the molar heat capacity is

$$
c_V = \frac{N_A}{N}\cdot C_V = \frac{\pi^2}{3}\,R\cdot\frac{k_BT\,g(\varepsilon_F)}{n} = \frac{\pi^2}{2}\,\frac{k_BT}{\varepsilon_F}\,R\ , \tag{5.230}
$$

where in the last expression on the RHS we have assumed a ballistic dispersion, for which

$$
\frac{g(\varepsilon_F)}{n} = \frac{g\,mk_F}{2\pi^2\hbar^2}\cdot\frac{6\pi^2}{g\,k_F^3} = \frac{3}{2\,\varepsilon_F}\ . \tag{5.231}
$$

The molar heat capacity in eqn. 5.230 is to be compared with the classical ideal gas value of $\frac{3}{2}R$. Relative to the classical ideal gas, the IFG value is reduced by a fraction of $(\pi^2/3)\times(k_BT/\varepsilon_F)$, which in most metals is very small and even at room temperature is only on the order of 10⁻². Most of the heat capacity of metals at room temperature is due to the energy stored in lattice vibrations.
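The linear-in-T heat capacity of eqn. 5.230 can also be checked numerically. The sketch below keeps the same conventions: $g(\varepsilon) \propto \varepsilon^{1/2}$, $\varepsilon_F = k_B = 1$, and hypothetical names. It solves for µ(T) at fixed density, computes the energy density, and differentiates it by a central finite difference. Dividing by the density gives $c_V/R$, which should be close to $(\pi^2/2)\,k_BT/\varepsilon_F$ at low T.

```python
import math

def fermi(x):
    """1 / (e^x + 1), arranged to avoid overflow for large |x|."""
    if x > 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (math.exp(x) + 1.0)

def simpson(f, a, b, n=8000):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + (2 * i - 1) * h) for i in range(1, n // 2 + 1))
    s += 2 * sum(f(a + 2 * i * h) for i in range(1, n // 2))
    return s * h / 3

def moment(p, mu, T):
    """int_0^inf eps^p sqrt(eps) f(eps - mu) deps via eps = u^2 (units eps_F = k_B = 1)."""
    umax = math.sqrt(max(mu, 0.0) + 40.0 * T)
    return simpson(lambda u: 2 * u ** (2 * p + 2) * fermi((u * u - mu) / T), 0.0, umax)

N_FIXED = 2.0 / 3.0   # density at T = 0 for eps_F = 1 (prefactor of g dropped)

def mu_at_fixed_density(T, lo=0.5, hi=1.5, iters=50):
    """Bisection for mu such that moment(0, mu, T) = N_FIXED."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if moment(0, mid, T) < N_FIXED:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def energy_density(T):
    """E/V = int eps g(eps) f(eps - mu) deps, with mu chosen to keep n fixed."""
    return moment(1, mu_at_fixed_density(T), T)

def molar_cv_over_R(T, dT=2e-3):
    """c_V / R = (1/n) d(E/V)/dT, by central difference (k_B = 1)."""
    return (energy_density(T + dT) - energy_density(T - dT)) / (2 * dT) / N_FIXED

T = 0.02
cv_numeric = molar_cv_over_R(T)
print(cv_numeric, math.pi ** 2 / 2 * T)   # numerical result vs eqn. 5.230
```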
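A niftier way to derive the heat capacity¹⁵: Starting with eqn. 5.225 for $\delta\mu(T) \equiv \mu(T) - \varepsilon_F$, note that $g(\varepsilon_F) = dn/d\varepsilon_F$, so we may write $\delta\mu = -\frac{\pi^2}{6}(k_BT)^2\,(dg/dn) + \mathcal{O}(T^4)$. Next, use the Maxwell relation $(\partial S/\partial N)_{T,V} = -(\partial\mu/\partial T)_{N,V}$ to arrive at

$$
\Big(\frac{\partial s}{\partial n}\Big)_T = \frac{\pi^2}{3}\,k_B^2\,T\,\frac{\partial g(\varepsilon_F)}{\partial n} + \mathcal{O}(T^3)\ , \tag{5.232}
$$

where s = S/V is the entropy per unit volume. Now use S(T = 0) = 0 and integrate with respect to the density n to arrive at S(T, V, N) = VγT, where γ(n) is defined above. Since $C_{V,N} = T(\partial S/\partial T)_{V,N}$, this reproduces eqn. 5.229.
¹⁵ I thank my colleague Tarun Grover for this observation.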

5.8.5 Magnetic susceptibility

Pauli paramagnetism

Magnetism has two origins: (i) orbital currents of charged particles, and (ii) intrinsic magnetic moment. The intrinsic magnetic moment m of a particle is related to its quantum mechanical spin via

$$
\boldsymbol{m} = g\mu_0\,\boldsymbol{S}/\hbar\ ,\qquad \mu_0 = \frac{q\hbar}{2mc} = \text{magneton}\ , \tag{5.233}
$$

where g is the particle’s g-factor, $\mu_0$ its magnetic moment, and S is the vector of quantum mechanical spin operators satisfying $[S^\alpha, S^\beta] = i\hbar\,\epsilon^{\alpha\beta\gamma}S^\gamma$, i.e. SU(2) commutation relations. The Hamiltonian for a single particle is then

$$
\begin{aligned}
\hat H &= \frac{1}{2m^*}\Big(\boldsymbol{p} - \frac{q}{c}\boldsymbol{A}\Big)^2 - \boldsymbol{H}\cdot\boldsymbol{m} \\
&= \frac{1}{2m^*}\Big(\boldsymbol{p} + \frac{e}{c}\boldsymbol{A}\Big)^2 + \frac{g}{2}\,\mu_BH\sigma\ ,
\end{aligned} \tag{5.234}
$$

where in the last line we’ve restricted our attention to the electron, for which q = −e. The g-factor for an electron is g = 2 at tree level, and when radiative corrections are accounted for using quantum electrodynamics (QED) one finds g = 2.0023193043617(15). For our purposes we can take g = 2, although we can always absorb the small difference into the definition of $\mu_B$, writing $\mu_B \to \tilde\mu_B = ge\hbar/4mc$. We’ve chosen the ẑ-axis in spin space to point in the direction of the magnetic field, and we wrote the eigenvalues of $S^z$ as ½ħσ, where σ = ±1. The quantity $m^*$ is the effective mass of the electron, which we mentioned earlier. An important distinction is that it is $m^*$ which enters into the kinetic energy term $p^2/2m^*$, but it is the electron mass m itself (mc² = 511 keV) which enters into the definition of the Bohr magneton. We shall discuss the consequences of this further below.
In the absence of orbital magnetic coupling, the single particle dispersion is

$$
\varepsilon_\sigma(\boldsymbol{k}) = \frac{\hbar^2\boldsymbol{k}^2}{2m^*} + \tilde\mu_BH\sigma\ . \tag{5.235}
$$

At T = 0, we have the results of §5.8.3. At finite T, we once again use the Sommerfeld expansion. We then have
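$$
\begin{aligned}
n &= \int_{-\infty}^{\infty}\! d\varepsilon\;g_\uparrow(\varepsilon)\,f(\varepsilon-\mu) + \int_{-\infty}^{\infty}\! d\varepsilon\;g_\downarrow(\varepsilon)\,f(\varepsilon-\mu) \\
&= \tfrac{1}{2}\int_{-\infty}^{\infty}\! d\varepsilon\,\Big\{g(\varepsilon-\tilde\mu_BH) + g(\varepsilon+\tilde\mu_BH)\Big\}\,f(\varepsilon-\mu) \\
&= \int_{-\infty}^{\infty}\! d\varepsilon\,\Big\{g(\varepsilon) + \tfrac{1}{2}(\tilde\mu_BH)^2\,g''(\varepsilon) + \ldots\Big\}\,f(\varepsilon-\mu)\ .
\end{aligned} \tag{5.236}
$$

Figure 5.15: Fermi distributions in the presence of an external Zeeman-coupled magnetic field.

We now invoke the Sommerfeld expansion to find the temperature dependence:

$$
\begin{aligned}
n &= \int_{-\infty}^{\mu}\! d\varepsilon\;g(\varepsilon) + \frac{\pi^2}{6}(k_BT)^2\,g'(\mu) + \tfrac{1}{2}(\tilde\mu_BH)^2\,g'(\mu) + \ldots \\
&= \int_{-\infty}^{\varepsilon_F}\! d\varepsilon\;g(\varepsilon) + g(\varepsilon_F)\,\delta\mu + \frac{\pi^2}{6}(k_BT)^2\,g'(\varepsilon_F) + \tfrac{1}{2}(\tilde\mu_BH)^2\,g'(\varepsilon_F) + \ldots\ .
\end{aligned} \tag{5.237}
$$

Note that the density of states for spin species σ is

$$
g_\sigma(\varepsilon) = \tfrac{1}{2}\,g(\varepsilon - \tilde\mu_BH\sigma)\ , \tag{5.238}
$$

where g(ε) is the total density of states per unit volume, for both spin species, in the absence of a magnetic field. We conclude that the chemical potential shift in an external field is

$$
\delta\mu(T,n,H) = -\Big\{\frac{\pi^2}{6}(k_BT)^2 + \tfrac{1}{2}(\tilde\mu_BH)^2\Big\}\,\frac{g'(\varepsilon_F)}{g(\varepsilon_F)} + \ldots\ . \tag{5.239}
$$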

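We next compute the difference $n_\uparrow - n_\downarrow$ in the densities of up and down spin electrons:

$$
\begin{aligned}
n_\uparrow - n_\downarrow &= \int_{-\infty}^{\infty}\!d\varepsilon\,\Big\{ g_\uparrow(\varepsilon) - g_\downarrow(\varepsilon)\Big\}\, f(\varepsilon-\mu) \\
&= \tfrac{1}{2}\int_{-\infty}^{\infty}\!d\varepsilon\,\Big\{ g(\varepsilon-\tilde\mu_B H) - g(\varepsilon+\tilde\mu_B H)\Big\}\, f(\varepsilon-\mu) \\
&= -\tilde\mu_B H\cdot\pi\mathcal{D}\csc(\pi\mathcal{D})\, g(\mu) + \mathcal{O}(H^3) ,
\end{aligned}
\tag{5.240}
$$

where $\mathcal{D} \equiv k_B T\,\partial/\partial\mu$ is the operator appearing in the Sommerfeld expansion.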
5.8. THE IDEAL FERMI GAS 269

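At low temperatures we needn't go beyond the lowest order term in the Sommerfeld expansion, since the thermal corrections are of relative order $(k_B T/\varepsilon_F)^2$; and since $H$ is assumed to be small, we keep only the term linear in $H$. Thus, the magnetization density is

$$ M = -\tilde\mu_B\,(n_\uparrow - n_\downarrow) = \tilde\mu_B^2\, g(\varepsilon_F)\, H , \tag{5.241} $$

and the magnetic susceptibility is

$$ \chi = \Big(\frac{\partial M}{\partial H}\Big)_{T,N} = \tilde\mu_B^2\, g(\varepsilon_F) . \tag{5.242} $$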

This is called the Pauli paramagnetic susceptibility.
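A quick numerical check of eqns. 5.240 to 5.242 can be sketched in Python. The sketch assumes a free-electron-like density of states $g(\varepsilon) = \sqrt{\varepsilon}$ for $\varepsilon > 0$ (overall normalization dropped), units with $k_B = 1$, and writes $a \equiv \tilde\mu_B H$. The helper names `spin_imbalance` and `sommerfeld_estimate` are illustrative and do not come from the text. The first helper integrates $\frac{1}{2}\int d\varepsilon\,[g(\varepsilon - a) - g(\varepsilon + a)]\,f(\varepsilon - \mu)$ directly. The second keeps the first two terms of $-a\,\pi\mathcal{D}\csc(\pi\mathcal{D})\,g(\mu)$.

```python
import math

def fermi(y, T):
    """Fermi function 1/(exp(y/T) + 1), written with tanh so it never overflows."""
    return 0.5 * (1.0 - math.tanh(y / (2.0 * T)))

def spin_imbalance(a, mu, T, u_max=2.0, n_steps=20000):
    """n_up - n_down = (1/2) * integral dx g(x) [f(x + a - mu) - f(x - a - mu)], g(x) = sqrt(x).

    This is eqn. 5.240 after shifting x = eps -/+ a. Substituting x = u**2 turns
    sqrt(x) dx into 2 u**2 du, which removes the square-root kink at the band edge.
    Composite Simpson rule on [0, u_max]; n_steps must be even.
    """
    def integrand(u):
        x = u * u
        return 2.0 * u * u * (fermi(x + a - mu, T) - fermi(x - a - mu, T))

    h = u_max / n_steps
    total = integrand(0.0) + integrand(u_max)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return 0.5 * total * h / 3.0

def sommerfeld_estimate(a, mu, T):
    """-a [g(mu) + (pi^2/6) T^2 g''(mu)] for g = sqrt, i.e. pi D csc(pi D) truncated at D^2."""
    g = math.sqrt(mu)
    g2 = -0.25 * mu ** (-1.5)
    return -a * (g + (math.pi ** 2 / 6.0) * T ** 2 * g2)

a, mu, T = 1e-3, 1.0, 0.05          # a = mu_B~ H; all energies in the same assumed units
exact = spin_imbalance(a, mu, T)
approx = sommerfeld_estimate(a, mu, T)
print(exact, approx, -exact / a)    # -exact/a approximates g(mu), so chi ~ mu_B~^2 g
```

With these inputs the two results should agree to well under $10^{-4}$ relative error. The residual is dominated by the next ($T^4$) Sommerfeld term. Because $M = -\tilde\mu_B(n_\uparrow - n_\downarrow) \approx \tilde\mu_B a\, g(\mu)$, the ratio printed last is the Pauli factor $\chi/\tilde\mu_B^2$.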

Landau diamagnetism

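When orbital effects are included, the single particle energy levels are given by

$$ \varepsilon(n,k_z,\sigma) = (n+\tfrac{1}{2})\hbar\omega_c + \frac{\hbar^2 k_z^2}{2m^*} + \tilde\mu_B H\sigma . \tag{5.243} $$

Here $n$ is a Landau level index, and $\omega_c = eH/m^*c$ is the cyclotron frequency. Note that

$$ \frac{\tilde\mu_B H}{\hbar\omega_c} = \frac{ge\hbar H}{4mc}\cdot\frac{m^* c}{\hbar e H} = \frac{g}{4}\cdot\frac{m^*}{m} . \tag{5.244} $$

Accordingly, we define the ratio $r \equiv (g/2)\times(m^*/m)$, so that $\tilde\mu_B H = \frac{1}{2}r\,\hbar\omega_c$. We can then write

$$ \varepsilon(n,k_z,\sigma) = \big(n + \tfrac{1}{2} + \tfrac{1}{2}r\sigma\big)\hbar\omega_c + \frac{\hbar^2 k_z^2}{2m^*} . \tag{5.245} $$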
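The grand potential is then given by

$$ \Omega = -\frac{HA}{\phi_0}\cdot L_z\cdot k_B T\int_{-\infty}^{\infty}\!\frac{dk_z}{2\pi}\sum_{n=0}^{\infty}\sum_{\sigma=\pm 1}\ln\Big[1 + e^{\mu/k_B T}\, e^{-(n+\frac{1}{2}+\frac{1}{2}r\sigma)\hbar\omega_c/k_B T}\, e^{-\hbar^2 k_z^2/2m^* k_B T}\Big] . \tag{5.246} $$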

A few words are in order here regarding the prefactor. In the presence of a uniform magnetic field, the
energy levels of a two-dimensional ballistic charged particle collapse into Landau levels. The number of
states per Landau level scales with the area of the system, and is equal to the number of flux quanta
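through the system: $N_\phi = HA/\phi_0$, where $\phi_0 = hc/e$ is the Dirac flux quantum. Note that

$$ \frac{HA}{\phi_0}\cdot L_z\cdot k_B T = \hbar\omega_c\cdot\frac{V}{\lambda_T^2} , \tag{5.247} $$

where $\lambda_T = (2\pi\hbar^2/m^* k_B T)^{1/2}$ is the thermal wavelength (defined with the effective mass), hence we can write

$$ \Omega(T,V,\mu,H) = \hbar\omega_c\sum_{n=0}^{\infty}\sum_{\sigma=\pm 1} Q\big((n+\tfrac{1}{2}+\tfrac{1}{2}r\sigma)\hbar\omega_c - \mu\big) , \tag{5.248} $$

where

$$ Q(\varepsilon) = -\frac{V}{\lambda_T^2}\int_{-\infty}^{\infty}\!\frac{dk_z}{2\pi}\,\ln\Big[1 + e^{-\varepsilon/k_B T}\, e^{-\hbar^2 k_z^2/2m^* k_B T}\Big] . \tag{5.249} $$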
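The identity in eqn. 5.247 is easy to check numerically. Doing so confirms that the thermal wavelength $\lambda_T = (2\pi\hbar^2/m^*k_BT)^{1/2}$ enters squared, which is also what makes $Q(\varepsilon)$ dimensionless. The sketch below uses Gaussian (CGS) values for the constants and arbitrary, assumed values for $H$, $A$, $L_z$, $T$ and $m^*$. The helper name `prefactor_ratio` is hypothetical.

```python
import math

# Gaussian (CGS) constants
hbar = 1.054571817e-27      # erg s
h = 2.0 * math.pi * hbar    # erg s
c = 2.99792458e10           # cm/s
e = 4.80320471e-10          # statC
k_B = 1.380649e-16          # erg/K
m_e = 9.1093837015e-28      # g

def prefactor_ratio(H, A, L_z, T, m_star):
    """Ratio of the two sides of eqn. 5.247: (H A/phi0) L_z k_B T over hbar*omega_c*V/lambda_T**2."""
    phi0 = h * c / e                                          # flux quantum hc/e
    omega_c = e * H / (m_star * c)                            # cyclotron frequency
    lambda_T = math.sqrt(2.0 * math.pi * hbar ** 2 / (m_star * k_B * T))
    V = A * L_z
    lhs = (H * A / phi0) * L_z * k_B * T
    rhs = hbar * omega_c * V / lambda_T ** 2
    return lhs / rhs

print(prefactor_ratio(H=1.0e4, A=1.0, L_z=2.0, T=10.0, m_star=0.067 * m_e))
```

Any positive inputs give a ratio of 1 up to rounding. Replacing $\lambda_T^2$ by $\lambda_T^3$ would leave a stray factor with dimensions of inverse length.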

We now invoke the Euler-Maclaurin formula,

$$ \sum_{n=0}^{\infty} F(n) = \int_0^{\infty}\!dx\, F(x) + \tfrac{1}{2}F(0) - \tfrac{1}{12}F'(0) + \ldots , \tag{5.250} $$
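One way to see how well the truncated formula in eqn. 5.250 works is to test it on a function whose sum is known in closed form. The choice $F(x) = e^{-ax}$ below is purely illustrative and is not taken from the text. For this $F$, the exact sum is $1/(1 - e^{-a})$, and the three retained terms give $1/a + \frac{1}{2} + a/12$. The first neglected term, $F'''(0)/720 = -a^3/720$, should account for almost all of the difference.

```python
import math

def euler_maclaurin_check(a):
    """Exact sum of F(n) = exp(-a n) over n >= 0 versus the three terms kept in eqn. 5.250.

    Integral of F from 0 to infinity = 1/a, F(0)/2 = 1/2, -F'(0)/12 = +a/12.
    The first neglected term, +F'''(0)/720, equals -a**3/720.
    """
    exact = 1.0 / (1.0 - math.exp(-a))
    approx = 1.0 / a + 0.5 + a / 12.0
    return exact, approx

for a in (0.1, 0.05):
    exact, approx = euler_maclaurin_check(a)
    print(a, exact - approx, -a ** 3 / 720.0)
```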

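resulting in

$$ \Omega = \sum_{\sigma=\pm 1}\bigg\{\int_{\frac{1}{2}(1+r\sigma)\hbar\omega_c}^{\infty}\!d\varepsilon\, Q(\varepsilon-\mu) + \tfrac{1}{2}\hbar\omega_c\, Q\big(\tfrac{1}{2}(1+r\sigma)\hbar\omega_c - \mu\big) - \tfrac{1}{12}(\hbar\omega_c)^2\, Q'\big(\tfrac{1}{2}(1+r\sigma)\hbar\omega_c - \mu\big) + \ldots\bigg\} \tag{5.251} $$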

We next expand in powers of the magnetic field $H$ to obtain

$$ \Omega(T,V,\mu,H) = 2\int_0^{\infty}\!d\varepsilon\, Q(\varepsilon-\mu) - \big(\tfrac{1}{4}r^2 - \tfrac{1}{12}\big)(\hbar\omega_c)^2\, Q'(-\mu) + \ldots . \tag{5.252} $$

Thus, the magnetic susceptibility is

$$
\begin{aligned}
\chi = -\frac{1}{V}\frac{\partial^2\Omega}{\partial H^2} &= \big(r^2 - \tfrac{1}{3}\big)\cdot\big(\mu_B\, m/m^*\big)^2\cdot\frac{2}{V}\,Q'(-\mu) \\
&= \bigg(\frac{g^2}{4} - \frac{m^2}{3m^{*2}}\bigg)\cdot\mu_B^2\cdot n^2\kappa_T ,
\end{aligned}
\tag{5.253}
$$

where $\mu_B = e\hbar/2mc$ is the bare Bohr magneton and $\kappa_T$ is the isothermal compressibility¹⁶. In most metals we have $m^* \approx m$ and the term in brackets is positive (recall $g \approx 2$). In semiconductors, however, we can have $m^* \ll m$; for example in GaAs we have $m^* = 0.067\,m$. Thus, semiconductors can have a diamagnetic response. If we take $g = 2$ and $m^* = m$, we see that the orbital currents give rise to a diamagnetic contribution to the magnetic susceptibility which is exactly $-\frac{1}{3}$ times as large as the contribution arising from Zeeman coupling. The net result is then paramagnetic ($\chi > 0$) and $\frac{2}{3}$ as large as the Pauli susceptibility. The orbital currents can be understood within the context of Lenz's law.

Exercise: Show that $\frac{2}{V}\,Q'(-\mu) = n^2\kappa_T$.

¹⁶ We've used $\frac{2}{V}\,Q'(-\mu) = -\frac{1}{V}\frac{\partial^2\Omega}{\partial\mu^2} = n^2\kappa_T$.

5.8.6 Moment formation in interacting itinerant electron systems

The Hubbard model

A noninteracting electron gas exhibits paramagnetism or diamagnetism, depending on the sign of $\chi$, but never develops a spontaneous magnetic moment: $M(H=0) = 0$. What gives rise to magnetism in solids? Overwhelmingly, the answer is that Coulomb repulsion between electrons is responsible for
magnetism, in those instances in which magnetism arises. At first thought this might seem odd, since
the Coulomb interaction is spin-independent. How then can it lead to a spontaneous magnetic moment?
To understand how Coulomb repulsion leads to magnetism, it is useful to consider a model interacting
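system, described by the Hamiltonian

$$ \hat H = -t\sum_{\langle ij\rangle,\sigma}\big(c^\dagger_{i\sigma}c_{j\sigma} + c^\dagger_{j\sigma}c_{i\sigma}\big) + U\sum_i n_{i\uparrow}n_{i\downarrow} + \mu_B\boldsymbol{H}\cdot\sum_{i,\alpha,\beta} c^\dagger_{i\alpha}\,\boldsymbol{\sigma}_{\alpha\beta}\,c_{i\beta} . \tag{5.254} $$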

This is none other than the famous Hubbard model, which has served as a kind of Rosetta stone for interacting electron systems. The first term describes hopping of electrons along the links of some regular lattice (the symbol $\langle ij\rangle$ denotes a link between sites $i$ and $j$). The second term describes the local (on-site) repulsion of electrons. This is a single orbital model, so the repulsion arises when one tries to put two electrons, necessarily with opposite spin polarization, in the same orbital. Typically the Hubbard $U$ parameter is on the order of electron volts. The last term is the Zeeman interaction of the electron spins with an external magnetic field. Orbital effects can be modeled by associating a phase $\exp(iA_{ij})$ to the hopping matrix element $t$ between sites $i$ and $j$, where the directed sum of $A_{ij}$ around a plaquette yields $2\pi$ times the total magnetic flux through the plaquette in units of $\phi_0 = hc/e$. We will ignore orbital effects here. Note that the interaction term is short-ranged, whereas the Coulomb interaction falls off as $1/|\boldsymbol{R}_i - \boldsymbol{R}_j|$. The Hubbard model is thus unrealistic, although screening effects in metals do effectively render the interaction short-ranged.
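Within the Hubbard model, the interaction term is local and written as $U n_\uparrow n_\downarrow$ on any given site. This term favors a local moment. Since $n_\uparrow n_\downarrow = \frac{1}{4}(n_\uparrow + n_\downarrow)^2 - \frac{1}{4}(n_\uparrow - n_\downarrow)^2$, and the chemical potential fixes the mean value of the total occupancy $n_\uparrow + n_\downarrow$, it always pays to maximize the difference $|n_\uparrow - n_\downarrow|$.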
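The tendency toward local moments is already visible in the atomic limit. The sketch below drops the hopping ($t = 0$), an assumption made only for this illustration. It computes grand canonical averages for a single site at chemical potential $\mu = U/2$, where $\langle n\rangle = 1$ exactly. The helper `hubbard_atom` is hypothetical. The local moment $\langle(n_\uparrow - n_\downarrow)^2\rangle$ grows from $\frac{1}{2}$ at $U = 0$ toward 1 as $U/k_BT$ increases, because empty and doubly occupied sites are suppressed.

```python
import math
from itertools import product

def hubbard_atom(U, mu, T):
    """Grand canonical <n> and <(n_up - n_dn)**2> for one Hubbard site with t = 0, H = 0 (k_B = 1)."""
    Z = n_avg = m2_avg = 0.0
    for n_up, n_dn in product((0, 1), repeat=2):      # the four Fock states of the site
        energy = U * n_up * n_dn - mu * (n_up + n_dn)
        weight = math.exp(-energy / T)
        Z += weight
        n_avg += weight * (n_up + n_dn)
        m2_avg += weight * (n_up - n_dn) ** 2
    return n_avg / Z, m2_avg / Z

for U in (0.0, 1.0, 2.0, 4.0, 8.0):
    n, m2 = hubbard_atom(U, U / 2, 1.0)              # mu = U/2 pins <n> = 1
    print(f"U = {U:3.1f}   <n> = {n:.6f}   <(n_up - n_dn)^2> = {m2:.6f}")
```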

Stoner mean field theory

There are no general methods available to solve exactly for even the ground state of an interacting many-body Hamiltonian. We'll solve this problem using a mean field theory due to Stoner. The idea is to write the occupancy $n_{i\sigma}$ as a sum of average and fluctuating terms:

$$ n_{i\sigma} = \langle n_{i\sigma}\rangle + \delta n_{i\sigma} . \tag{5.255} $$

Here, $\langle n_{i\sigma}\rangle$ is the thermodynamic average; the above equation may then be taken as a definition of the fluctuating piece, $\delta n_{i\sigma}$. We assume that the average is site-independent. This is a significant assumption, for while we understand why each site should favor developing a moment, it is not clear that all these local moments should want to line up parallel to each other. Indeed, on a bipartite lattice, it is possible that the individual local moments on neighboring sites will be antiparallel, corresponding to an antiferromagnetic order of the spins. Our mean field theory will be one for ferromagnetic states.
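We now write the interaction term as

$$
\begin{aligned}
n_{i\uparrow}n_{i\downarrow} &= \langle n_\uparrow\rangle\langle n_\downarrow\rangle + \langle n_\uparrow\rangle\,\delta n_{i\downarrow} + \langle n_\downarrow\rangle\,\delta n_{i\uparrow} + \overbrace{\delta n_{i\uparrow}\,\delta n_{i\downarrow}}^{(\text{flucts})^2} \\
&= -\langle n_\uparrow\rangle\langle n_\downarrow\rangle + \langle n_\uparrow\rangle\, n_{i\downarrow} + \langle n_\downarrow\rangle\, n_{i\uparrow} + \mathcal{O}\big((\delta n)^2\big) \\
&= \tfrac{1}{4}(m^2 - n^2) + \tfrac{1}{2}n\,(n_{i\uparrow} + n_{i\downarrow}) + \tfrac{1}{2}m\,(n_{i\uparrow} - n_{i\downarrow}) + \mathcal{O}\big((\delta n)^2\big) ,
\end{aligned}
\tag{5.256}
$$

where $n$ and $m$ are the average total occupancy (both spin species) and the average spin polarization, each per unit cell:

$$ n = \langle n_\downarrow\rangle + \langle n_\uparrow\rangle , \qquad m = \langle n_\downarrow\rangle - \langle n_\uparrow\rangle , \tag{5.257} $$

i.e. $\langle n_\sigma\rangle = \frac{1}{2}(n - \sigma m)$. The mean field grand canonical Hamiltonian $K = \hat H - \mu\hat N$ may then be written as

$$
\begin{aligned}
K_{\rm MF} = &-\tfrac{1}{2}\sum_{i,j,\sigma} t_{ij}\big(c^\dagger_{i\sigma}c_{j\sigma} + c^\dagger_{j\sigma}c_{i\sigma}\big) - \big(\mu - \tfrac{1}{2}Un\big)\sum_{i\sigma} c^\dagger_{i\sigma}c_{i\sigma} \\
&+ \big(\mu_B H + \tfrac{1}{2}Um\big)\sum_{i\sigma}\sigma\, c^\dagger_{i\sigma}c_{i\sigma} + \tfrac{1}{4}N_{\rm sites}\,U(m^2 - n^2) ,
\end{aligned}
\tag{5.258}
$$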

where we've quantized spins along the direction of $\boldsymbol{H}$, defined as $\hat{\boldsymbol{z}}$. You should take note of two things here. First, the chemical potential is shifted downward (or the electron energies shifted upward) by an amount $\frac{1}{2}Un$, corresponding to the average energy of repulsion with the background. Second, the effective magnetic field has been shifted by an amount $\frac{1}{2}Um/\mu_B$, so the effective field is

$$ H_{\rm eff} = H + \frac{Um}{2\mu_B} . \tag{5.259} $$

The bare single particle dispersions are given by $\varepsilon_\sigma(\boldsymbol{k}) = -\hat t(\boldsymbol{k}) + \sigma\mu_B H$, where

$$ \hat t(\boldsymbol{k}) = \sum_{\boldsymbol{R}} t(\boldsymbol{R})\, e^{-i\boldsymbol{k}\cdot\boldsymbol{R}} , \tag{5.260} $$

and $t_{ij} = t(\boldsymbol{R}_i - \boldsymbol{R}_j)$. For nearest neighbor hopping on a $d$-dimensional cubic lattice, $\hat t(\boldsymbol{k}) = 2t\sum_{\mu=1}^{d}\cos(k_\mu a)$, where $a$ is the lattice constant. Including the mean field effects, the effective single particle dispersions become

$$ \tilde\varepsilon_\sigma(\boldsymbol{k}) = -\hat t(\boldsymbol{k}) + \tfrac{1}{2}Un + \big(\mu_B H + \tfrac{1}{2}Um\big)\sigma . \tag{5.261} $$
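The nearest neighbor form of $\hat t(\boldsymbol{k})$ follows from eqn. 5.260 by summing over the $2d$ neighbor vectors $\boldsymbol{R} = \pm a\hat{\boldsymbol{e}}_\mu$, each with $t(\boldsymbol{R}) = t$. The sketch below does that sum numerically and compares it with $2t\sum_\mu\cos(k_\mu a)$. The helper name `t_hat` is illustrative. The resulting band $\varepsilon(\boldsymbol{k}) = -\hat t(\boldsymbol{k})$ runs from $-2dt$ to $+2dt$.

```python
import cmath
import math

def t_hat(k, t=1.0, a=1.0):
    """t_hat(k) = sum_R t(R) exp(-i k.R), with t(R) = t on the 2d vectors R = +/- a e_mu (eqn. 5.260)."""
    total = 0j
    for mu in range(len(k)):
        for sign in (+1, -1):
            k_dot_R = k[mu] * sign * a          # only the mu-th component of R is nonzero
            total += t * cmath.exp(-1j * k_dot_R)
    return total

k = (0.3, -1.1)
print(t_hat(k), 2.0 * sum(math.cos(q) for q in k))
```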

We now solve the mean field theory by obtaining the free energy per site, $\varphi(n,T,H)$. First, note that $\varphi = \omega + \mu n$, where $\omega = \Omega/N_{\rm sites}$ is the Landau, or grand canonical, free energy per site. This follows from the general relation $\Omega = F - \mu N$; note that the total electron number is $N = nN_{\rm sites}$, since $n$ is the electron number per unit cell (including both spin species). If $g(\varepsilon)$ is the density of states per unit cell (rather than per unit volume), then we have¹⁷

$$ \varphi = \tfrac{1}{4}U(m^2 + n^2) + \bar\mu n - \tfrac{1}{2}k_B T\int_{-\infty}^{\infty}\!d\varepsilon\, g(\varepsilon)\Big[\ln\big(1 + e^{(\bar\mu-\varepsilon-\Delta)/k_B T}\big) + \ln\big(1 + e^{(\bar\mu-\varepsilon+\Delta)/k_B T}\big)\Big] , \tag{5.262} $$

¹⁷ Note that we have written $\mu n = \bar\mu n + \frac{1}{2}Un^2$, which explains the sign of the coefficient of $n^2$.

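where $\bar\mu \equiv \mu - \frac{1}{2}Un$ and $\Delta \equiv \mu_B H + \frac{1}{2}Um$. From this free energy we derive two self-consistent equations for $\mu$ and $m$. The first comes from demanding that $\varphi$ be a function of $n$ and not of $\mu$, i.e. $\partial\varphi/\partial\mu = 0$, which leads to

$$ n = \tfrac{1}{2}\int_{-\infty}^{\infty}\!d\varepsilon\, g(\varepsilon)\Big\{ f(\varepsilon - \Delta - \bar\mu) + f(\varepsilon + \Delta - \bar\mu)\Big\} , \tag{5.263} $$

where $f(y) = \big[\exp(y/k_B T) + 1\big]^{-1}$ is the Fermi function. The second equation comes from minimizing $\varphi$ with respect to the average moment $m$:

$$ m = \tfrac{1}{2}\int_{-\infty}^{\infty}\!d\varepsilon\, g(\varepsilon)\Big\{ f(\varepsilon - \Delta - \bar\mu) - f(\varepsilon + \Delta - \bar\mu)\Big\} . \tag{5.264} $$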

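Here, we will solve the first equation, eqn. 5.263, and use the results to generate a Landau expansion of the free energy $\varphi$ in powers of $m^2$. We assume that $\Delta$ is small, in which case we may write

$$ n = \int_{-\infty}^{\infty}\!d\varepsilon\, g(\varepsilon)\Big\{ f(\varepsilon - \bar\mu) + \tfrac{1}{2}\Delta^2 f''(\varepsilon - \bar\mu) + \tfrac{1}{24}\Delta^4 f''''(\varepsilon - \bar\mu) + \ldots\Big\} . \tag{5.265} $$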

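We write $\bar\mu(\Delta) = \bar\mu_0 + \delta\bar\mu$ and expand in $\delta\bar\mu$. Since $n$ is fixed in our (canonical) ensemble, we have

$$ n = \int_{-\infty}^{\infty}\!d\varepsilon\, g(\varepsilon)\, f(\varepsilon - \bar\mu_0) , \tag{5.266} $$

which defines $\bar\mu_0(n,T)$.¹⁸ The remaining terms in the $\delta\bar\mu$ expansion of eqn. 5.265 must sum to zero. This yields

$$ D(\bar\mu_0)\,\delta\bar\mu + \tfrac{1}{2}\Delta^2 D'(\bar\mu_0) + \tfrac{1}{2}(\delta\bar\mu)^2 D'(\bar\mu_0) + \tfrac{1}{2}D''(\bar\mu_0)\,\Delta^2\,\delta\bar\mu + \tfrac{1}{24}D'''(\bar\mu_0)\,\Delta^4 + \mathcal{O}(\Delta^6) = 0 , \tag{5.267} $$

where

$$ D(\mu) = -\int_{-\infty}^{\infty}\!d\varepsilon\, g(\varepsilon)\, f'(\varepsilon - \mu) \tag{5.268} $$

is the thermally averaged bare density of states at energy $\mu$. Note that the $k^{\rm th}$ derivative is

$$ D^{(k)}(\mu) = -\int_{-\infty}^{\infty}\!d\varepsilon\, g^{(k)}(\varepsilon)\, f'(\varepsilon - \mu) . \tag{5.269} $$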

Solving for $\delta\bar\mu$, we obtain

$$ \delta\bar\mu = -\tfrac{1}{2}a_1\Delta^2 - \tfrac{1}{24}\big(3a_1^3 - 6a_1 a_2 + a_3\big)\Delta^4 + \mathcal{O}(\Delta^6) , \tag{5.270} $$

where

$$ a_k \equiv \frac{D^{(k)}(\bar\mu_0)}{D(\bar\mu_0)} . \tag{5.271} $$

¹⁸ The Gibbs-Duhem relation guarantees that such an equation of state exists, relating any three intensive thermodynamic quantities.
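The series inversion leading to eqn. 5.270 can be checked numerically. Dividing eqn. 5.267 by $D(\bar\mu_0)$ leaves only $a_1$, $a_2$, $a_3$. The sketch below solves that truncated equation for $\delta\bar\mu$ by Newton iteration and compares the result with the two-term series. The values of $a_1$, $a_2$, $a_3$ are arbitrary, assumed test inputs rather than properties of any particular density of states, and the helper names are illustrative. If the series is right, the residual should shrink like $\Delta^6$.

```python
def delta_mu_exact(Delta, a1, a2, a3, tol=1e-15, max_iter=50):
    """Newton solution for delta_mu_bar of eqn. 5.267 divided by D(mu_bar_0), truncated as written."""
    d = 0.0
    for _ in range(max_iter):
        F = (d + 0.5 * a1 * Delta ** 2 + 0.5 * a1 * d ** 2
             + 0.5 * a2 * Delta ** 2 * d + a3 * Delta ** 4 / 24.0)
        dF = 1.0 + a1 * d + 0.5 * a2 * Delta ** 2
        step = F / dF
        d -= step
        if abs(step) < tol:
            break
    return d

def delta_mu_series(Delta, a1, a2, a3):
    """The two-term series of eqn. 5.270."""
    return -0.5 * a1 * Delta ** 2 - (3 * a1 ** 3 - 6 * a1 * a2 + a3) * Delta ** 4 / 24.0

a1, a2, a3 = 0.7, -0.3, 1.1      # arbitrary test values
for Delta in (0.1, 0.05):
    err = abs(delta_mu_exact(Delta, a1, a2, a3) - delta_mu_series(Delta, a1, a2, a3))
    print(Delta, err)            # should drop by roughly 2**6 = 64 when Delta is halved
```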

After integrating by parts and inserting this result for $\delta\bar\mu$ into our expression for the free energy $\varphi$, we obtain the expansion

$$ \varphi(n,T,m) = \varphi_0(n,T) + \tfrac{1}{4}Um^2 - \tfrac{1}{2}D(\bar\mu_0)\,\Delta^2 + \tfrac{1}{8}\bigg(\frac{D'(\bar\mu_0)^2}{D(\bar\mu_0)} - \tfrac{1}{3}D''(\bar\mu_0)\bigg)\Delta^4 + \ldots , \tag{5.272} $$

where prime denotes differentiation with respect to argument, at $m = 0$, and

$$ \varphi_0(n,T) = \tfrac{1}{4}Un^2 + n\bar\mu_0 - \int_{-\infty}^{\infty}\!d\varepsilon\, N(\varepsilon)\, f(\varepsilon - \bar\mu_0) , \tag{5.273} $$

where $g(\varepsilon) = N'(\varepsilon)$, so $N(\varepsilon)$ is the integrated bare density of states per unit cell in the absence of any magnetic field (including both spin species).
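We assume that $H$ and $m$ are small, in which case

$$ \varphi = \varphi_0 + \tfrac{1}{2}am^2 + \tfrac{1}{4}bm^4 - \tfrac{1}{2}\chi_0 H^2 - \frac{U\chi_0}{2\mu_B}\,Hm + \ldots , \tag{5.274} $$

where $\chi_0 = \mu_B^2 D(\bar\mu_0)$ is the Pauli susceptibility, and

$$ a = \tfrac{1}{2}U\big(1 - \tfrac{1}{2}UD\big) , \qquad b = \tfrac{1}{32}\bigg(\frac{(D')^2}{D} - \tfrac{1}{3}D''\bigg)U^4 , \tag{5.275} $$

where the argument of each $D^{(k)}$ above is $\bar\mu_0(n,T)$. The magnetization density (per unit cell, rather than per unit volume) is given by

$$ M = -\frac{\partial\varphi}{\partial H} = \chi_0 H + \frac{U\chi_0}{2\mu_B}\, m . \tag{5.276} $$

Minimizing with respect to $m$ yields

$$ am + bm^3 - \frac{U\chi_0}{2\mu_B}\,H = 0 , \tag{5.277} $$

which gives, for small $m$,

$$ m = \frac{\chi_0 H}{\mu_B}\cdot\frac{1}{1 - \frac{1}{2}UD} . \tag{5.278} $$

We therefore obtain $M = \chi H$ with

$$ \chi = \frac{\chi_0}{1 - U/U_c} , \tag{5.279} $$

where

$$ U_c = \frac{2}{D(\bar\mu_0)} . \tag{5.280} $$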

Figure 5.16: A graduate student experiences the Stoner enhancement.

The denominator of χ increases the susceptibility above the bare Pauli value χ0 , and is referred to as – I
kid you not – the Stoner enhancement (see Fig. 5.16).
It is worth emphasizing that the magnetization per unit cell is given by

$$ M = -\frac{1}{N_{\rm sites}}\frac{\delta\hat H}{\delta H} = \mu_B m . \tag{5.281} $$

This is an operator identity and is valid for any value of $m$, and not only small $m$.
When $H = 0$ we can still get a magnetic moment, provided $U > U_c$. This is a consequence of the simple Landau theory we have derived. Solving for $m$ when $H = 0$ gives $m = 0$ when $U < U_c$ and

$$ m(U) = \pm\Big(\frac{U}{2bU_c}\Big)^{1/2}\sqrt{U - U_c} , \tag{5.282} $$

when $U > U_c$, assuming $b > 0$. Thus we have the usual mean field order parameter exponent of $\beta = \frac{1}{2}$.

Antiferromagnetic solution

In addition to ferromagnetism, there may be other ordered states which solve the mean field theory. One
such example is antiferromagnetism. On a bipartite lattice, the antiferromagnetic mean field theory is
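obtained from

$$ \langle n_{i\sigma}\rangle = \tfrac{1}{2}n - \tfrac{1}{2}\sigma\, e^{i\boldsymbol{Q}\cdot\boldsymbol{R}_i}\, m , \tag{5.283} $$

(the sign convention matches $\langle n_\sigma\rangle = \frac{1}{2}(n - \sigma m)$ used for the ferromagnet),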


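where $\boldsymbol{Q} = (\pi/a, \pi/a, \ldots, \pi/a)$ is the antiferromagnetic ordering wavevector. The grand canonical Hamiltonian is then

$$
\begin{aligned}
K_{\rm MF} &= -\tfrac{1}{2}\sum_{i,j,\sigma} t_{ij}\big(c^\dagger_{i\sigma}c_{j\sigma} + c^\dagger_{j\sigma}c_{i\sigma}\big) - \big(\mu - \tfrac{1}{2}Un\big)\sum_{i\sigma} c^\dagger_{i\sigma}c_{i\sigma} \\
&\qquad + \tfrac{1}{2}Um\sum_{i\sigma} e^{i\boldsymbol{Q}\cdot\boldsymbol{R}_i}\,\sigma\, c^\dagger_{i\sigma}c_{i\sigma} + \tfrac{1}{4}N_{\rm sites}\,U(m^2 - n^2)
\end{aligned}
\tag{5.284}
$$

$$ K_{\rm MF} = \tfrac{1}{2}\sum_{\boldsymbol{k},\sigma}\begin{pmatrix} c^\dagger_{\boldsymbol{k},\sigma} & c^\dagger_{\boldsymbol{k}+\boldsymbol{Q},\sigma}\end{pmatrix}\begin{pmatrix} \varepsilon(\boldsymbol{k}) - \mu + \frac{1}{2}Un & \frac{1}{2}\sigma Um \\ \frac{1}{2}\sigma Um & \varepsilon(\boldsymbol{k}+\boldsymbol{Q}) - \mu + \frac{1}{2}Un \end{pmatrix}\begin{pmatrix} c_{\boldsymbol{k},\sigma} \\ c_{\boldsymbol{k}+\boldsymbol{Q},\sigma}\end{pmatrix} + \tfrac{1}{4}N_{\rm sites}\,U(m^2 - n^2) , \tag{5.285} $$

where $\varepsilon(\boldsymbol{k}) = -\hat t(\boldsymbol{k})$, as before. On a bipartite lattice, with nearest neighbor hopping only, we have $\varepsilon(\boldsymbol{k}+\boldsymbol{Q}) = -\varepsilon(\boldsymbol{k})$. The above matrix is diagonalized by a unitary transformation, yielding the eigenvalues

$$ \lambda_\pm = \pm\sqrt{\varepsilon^2(\boldsymbol{k}) + \Delta^2} - \bar\mu \tag{5.286} $$

with $\Delta = \frac{1}{2}Um$ and $\bar\mu = \mu - \frac{1}{2}Un$ as before. The free energy per unit cell is then

$$ \varphi = \tfrac{1}{4}U(m^2 + n^2) + \bar\mu n - \tfrac{1}{2}k_B T\int_{-\infty}^{\infty}\!d\varepsilon\, g(\varepsilon)\Big[\ln\big(1 + e^{(\bar\mu - \sqrt{\varepsilon^2+\Delta^2})/k_B T}\big) + \ln\big(1 + e^{(\bar\mu + \sqrt{\varepsilon^2+\Delta^2})/k_B T}\big)\Big] . \tag{5.287} $$

The mean field equations are then

$$ n = \tfrac{1}{2}\int_{-\infty}^{\infty}\!d\varepsilon\, g(\varepsilon)\Big\{ f\big(-\sqrt{\varepsilon^2+\Delta^2} - \bar\mu\big) + f\big(\sqrt{\varepsilon^2+\Delta^2} - \bar\mu\big)\Big\} \tag{5.288} $$

$$ \frac{1}{U} = \tfrac{1}{4}\int_{-\infty}^{\infty}\!d\varepsilon\,\frac{g(\varepsilon)}{\sqrt{\varepsilon^2+\Delta^2}}\Big\{ f\big(-\sqrt{\varepsilon^2+\Delta^2} - \bar\mu\big) - f\big(\sqrt{\varepsilon^2+\Delta^2} - \bar\mu\big)\Big\} . \tag{5.289} $$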

As in the case of the ferromagnet, a paramagnetic solution with m = 0 always exists, in which case the
second of the above equations is no longer valid.

Mean field phase diagram of the Hubbard model

Let us compare the mean field theories for the ferromagnetic and antiferromagnetic states at $T = 0$ and $H = 0$. Due to particle-hole symmetry (for $g(-\varepsilon) = g(\varepsilon)$), we may assume $0 \le n \le 1$ without loss of generality. (The solutions repeat themselves under $n \to 2 - n$.) For the paramagnet, we have

$$ n = \int_{-\infty}^{\bar\mu}\!d\varepsilon\, g(\varepsilon) \tag{5.290} $$

$$ \varphi = \tfrac{1}{4}Un^2 + \int_{-\infty}^{\bar\mu}\!d\varepsilon\, g(\varepsilon)\,\varepsilon , \tag{5.291} $$

where $\bar\mu = \mu - \frac{1}{2}Un$ is the 'renormalized' Fermi energy and $g(\varepsilon)$ is the density of states per unit cell in the absence of any explicit ($H$) or implicit ($m$) symmetry breaking, including both spin polarizations.
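For the ferromagnet,

$$ n = \tfrac{1}{2}\int_{-\infty}^{\bar\mu-\Delta}\!d\varepsilon\, g(\varepsilon) + \tfrac{1}{2}\int_{-\infty}^{\bar\mu+\Delta}\!d\varepsilon\, g(\varepsilon) \tag{5.292} $$

$$ \frac{4\Delta}{U} = \int_{\bar\mu-\Delta}^{\bar\mu+\Delta}\!d\varepsilon\, g(\varepsilon) \tag{5.293} $$

$$ \varphi = \tfrac{1}{4}Un^2 - \frac{\Delta^2}{U} + \tfrac{1}{2}\int_{-\infty}^{\bar\mu-\Delta}\!d\varepsilon\, g(\varepsilon)\,\varepsilon + \tfrac{1}{2}\int_{-\infty}^{\bar\mu+\Delta}\!d\varepsilon\, g(\varepsilon)\,\varepsilon . \tag{5.294} $$

Here, $\Delta = \frac{1}{2}Um$ is nonzero in the ordered phase.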


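Finally, the antiferromagnetic mean field equations are

$$ n_{\bar\mu<0} = \int_{\varepsilon_0}^{\infty}\!d\varepsilon\, g(\varepsilon) \quad ; \quad n_{\bar\mu>0} = 2 - \int_{\varepsilon_0}^{\infty}\!d\varepsilon\, g(\varepsilon) \tag{5.295} $$

$$ \frac{2}{U} = \int_{\varepsilon_0}^{\infty}\!d\varepsilon\,\frac{g(\varepsilon)}{\sqrt{\varepsilon^2+\Delta^2}} \tag{5.296} $$

$$ \varphi = \tfrac{1}{4}Un^2 + \frac{\Delta^2}{U} - \int_{\varepsilon_0}^{\infty}\!d\varepsilon\, g(\varepsilon)\sqrt{\varepsilon^2+\Delta^2} , \tag{5.297} $$

where $\varepsilon_0 = \sqrt{\bar\mu^2 - \Delta^2}$ and $\Delta = \frac{1}{2}Um$ as before. Note that $|\bar\mu| \ge \Delta$ for these solutions. Exactly at half-filling, we have $n = 1$ and $\bar\mu = 0$. We then set $\varepsilon_0 = 0$.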
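At half filling ($\varepsilon_0 = 0$), eqn. 5.296 is a single equation for $\Delta$, which is easy to solve numerically. The sketch below assumes the semicircular density of states $g(\varepsilon) = (4/\pi W^2)\sqrt{W^2 - \varepsilon^2}$. This is normalized to 2 so that it counts both spin species per unit cell. The sketch substitutes $\varepsilon = W\sin\theta$ to tame the square root and finds $\Delta$ by bisection; the function names are illustrative. Since the integral is bounded above by $1/\Delta$, the solution always satisfies $\Delta < U/2$, i.e. $m < 1$.

```python
import math

def semicircle_gap_integral(Delta, W=1.0, n_steps=2000):
    """Right side of eqn. 5.296 at half filling (eps_0 = 0) for g(eps) = (4/(pi W^2)) sqrt(W^2 - eps^2).

    With eps = W sin(theta) it becomes (4/pi) * integral_0^{pi/2} cos^2 / sqrt(W^2 sin^2 + Delta^2),
    evaluated with the composite Simpson rule (n_steps even).
    """
    h = (math.pi / 2.0) / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        theta = i * h
        f = math.cos(theta) ** 2 / math.sqrt((W * math.sin(theta)) ** 2 + Delta ** 2)
        weight = 1 if i in (0, n_steps) else (4 if i % 2 else 2)
        total += weight * f
    return (4.0 / math.pi) * total * h / 3.0

def af_gap_half_filling(U, W=1.0, n_iter=60):
    """Solve 2/U = integral (eqn. 5.296 with eps_0 = 0) for Delta by bisection."""
    target = 2.0 / U
    lo, hi = 1e-8, 10.0 * (U + W)        # integral(hi) < 1/hi < target; integral(lo) is huge
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if semicircle_gap_integral(mid, W) > target:
            lo = mid                     # the integral decreases with Delta, so the root lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for U in (1.0, 2.0, 4.0, 20.0):
    Delta = af_gap_half_filling(U)
    print(f"U = {U:5.1f}   Delta = {Delta:.5f}   m = 2 Delta / U = {2 * Delta / U:.5f}")
```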
The paramagnet to ferromagnet transition may be first or second order, depending on the details of $g(\varepsilon)$. If second order, it occurs at $U_c^F = 2/g(\bar\mu_P)$, where $\bar\mu_P(n)$ is the paramagnetic solution for $\bar\mu$. The paramagnet to antiferromagnet transition is always second order in this mean field theory, since the RHS of eqn. 5.296 is a monotonic function of $\Delta$. This transition occurs at $U_c^A = 2\Big/\!\int_{|\bar\mu_P|}^{\infty}\!d\varepsilon\, g(\varepsilon)\,\varepsilon^{-1}$. Note that $U_c^A \to 0$ logarithmically for $n \to 1$, since $\bar\mu_P = 0$ at half-filling.
For large $U$, the ferromagnetic solution always has the lowest energy, and therefore if $U_c^A < U_c^F$, there will be a first-order antiferromagnet to ferromagnet transition at some value $U^* > U_c^F$. In fig. 5.17, I plot the phase diagram obtained by solving the mean field equations assuming a semicircular density of states $g(\varepsilon) = \frac{4}{\pi W^2}\sqrt{W^2 - \varepsilon^2}$, normalized so that $\int d\varepsilon\, g(\varepsilon) = 2$. Also shown is the phase diagram for the $d = 2$ square lattice Hubbard model obtained by J. Hirsch (1985).
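The two second-order instability lines for the semicircular model are simple enough to evaluate directly. The sketch below uses hypothetical helper names and $W = 1$ by default. It first finds the paramagnetic $\bar\mu_P(n)$ from $N(\bar\mu_P) = n$ using the closed-form integrated density of states. It then evaluates $U_c^F = 2/g(\bar\mu_P)$, the $\Delta \to 0$ limit of eqn. 5.293. It also evaluates $U_c^A = 2/\int_{|\bar\mu_P|}^{W} d\varepsilon\, g(\varepsilon)/\varepsilon$, the $\Delta \to 0$ limit of eqn. 5.296, using $\int_a^W d\varepsilon\,\sqrt{W^2 - \varepsilon^2}/\varepsilon = W\ln\big[(W + \sqrt{W^2 - a^2})/a\big] - \sqrt{W^2 - a^2}$. It does not attempt the first-order F/A boundary.

```python
import math

def N_semicircle(eps, W=1.0):
    """Integrated DOS N(eps) for g(eps) = (4/(pi W^2)) sqrt(W^2 - eps^2); N(-W) = 0, N(W) = 2."""
    x = max(-1.0, min(1.0, eps / W))
    return 1.0 + (2.0 / math.pi) * (x * math.sqrt(1.0 - x * x) + math.asin(x))

def g_semicircle(eps, W=1.0):
    return (4.0 / (math.pi * W * W)) * math.sqrt(max(0.0, W * W - eps * eps))

def mu_bar_P(n, W=1.0):
    """Paramagnetic renormalized Fermi energy: solve N(mu_bar) = n by bisection."""
    lo, hi = -W, W
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if N_semicircle(mid, W) < n:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def Uc_ferro(n, W=1.0):
    """Second-order Stoner instability U_c^F = 2 / g(mu_bar_P)."""
    return 2.0 / g_semicircle(mu_bar_P(n, W), W)

def Uc_antiferro(n, W=1.0):
    """U_c^A = 2 / integral_{|mu_bar_P|}^{W} d eps g(eps)/eps, via the closed-form integral."""
    a = abs(mu_bar_P(n, W))
    if a == 0.0:
        return 0.0                                  # logarithmic divergence exactly at half filling
    s = math.sqrt(W * W - a * a)
    integral = (4.0 / (math.pi * W * W)) * (W * math.log((W + s) / a) - s)
    return 2.0 / integral

for n in (0.2, 0.5, 0.8, 0.95):
    print(f"n = {n:4.2f}   U_c^F = {Uc_ferro(n):.4f}   U_c^A = {Uc_antiferro(n):.4f}")
```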
How well does Stoner theory describe the physics of the Hubbard model? Quantum Monte Carlo calcula-
tions by J. Hirsch (1985) found that the actual phase diagram of the d = 2 square lattice Hubbard Model

Figure 5.17: Mean field phase diagram of the Hubbard model, including paramagnetic (P), ferromagnetic
(F), and antiferromagnetic (A) phases. Left panel: results using a semicircular density of states function
of half-bandwidth W . Right panel: results using a two-dimensional square lattice density of states with
nearest neighbor hopping t, from J. E. Hirsch, Phys. Rev. B 31, 4403 (1985). The phase boundary
between F and A phases is first order.

exhibits no ferromagnetism for any $n$ up to $U = 10$. Furthermore, he found the antiferromagnetic phase to be entirely confined to the vertical line $n = 1$. For $n \ne 1$ and $0 \le U \le 10$, the system is a paramagnet¹⁹. These results were state of the art at the time, but both computing power as well as numerical
algorithms for interacting quantum systems have advanced considerably since 1985. Yet as of 2018, we
still don’t have a clear understanding of the d = 2 Hubbard model’s T = 0 phase diagram! There is
an emerging body of numerical evidence20 that in the underdoped (n < 1) regime, there are portions of
the phase diagram which exhibit a stripe ordering, in which antiferromagnetic order is interrupted by a
parallel array of line defects containing excess holes (i.e. the absence of an electron)21 . This problem has
turned out to be unexpectedly rich, complex, and numerically difficult to resolve due to the presence of
competing ordered states, such as d-wave superconductivity and spiral magnetic phases, which lie nearby
in energy with respect to the putative stripe ground state.
In order to achieve a ferromagnetic solution, it appears necessary to introduce geometric frustration,
either by including a next-nearest-neighbor hopping amplitude t′ or by defining the model on non-bipartite
lattices. Numerical work by M. Ulmke (1997) showed the existence of a ferromagnetic phase at $T = 0$ on the FCC lattice Hubbard model for $U = 6$ and $n \in [0.15, 0.87]$ (approximately).

¹⁹ A theorem due to Nagaoka establishes that the ground state is ferromagnetic for the case of a single hole in the $U = \infty$ system on bipartite lattices.

²⁰ See J. P. F. LeBlanc et al., Phys. Rev. X 5, 041041 (2015) and B. Zheng et al., Science 358, 1155 (2017).

²¹ The best case for stripe order has been made at $T = 0$, $U/t = 8$, and hole doping $x = \frac{1}{8}$ (i.e. $n = \frac{7}{8}$).

5.8.7 White dwarf stars

There is a nice discussion of this material in R. K. Pathria, Statistical Mechanics. As a model, consider a mass $M \sim 10^{33}\,$g of helium at a density of $\rho \sim 10^7\,{\rm g/cm^3}$ and temperature $T \sim 10^7\,$K. This temperature is much larger than the ionization energy of $^4$He, hence we may safely assume that all helium atoms are ionized. If there are $N$ electrons, then the number of $\alpha$ particles (i.e. $^4$He nuclei) must be $\frac{1}{2}N$. The mass of the $\alpha$ particle is $m_\alpha \approx 4m_p$. The total stellar mass $M$ is almost completely due to $\alpha$ particle cores.
The electron density is then

$$ n = \frac{N}{V} = \frac{2\cdot M/4m_p}{V} = \frac{\rho}{2m_p} \approx 10^{30}\,{\rm cm^{-3}} , \tag{5.298} $$

since $M = N\cdot m_e + \frac{1}{2}N\cdot 4m_p$. From the number density $n$ we find for the electrons

$$
\begin{aligned}
k_F &= (3\pi^2 n)^{1/3} = 2.14\times 10^{10}\,{\rm cm^{-1}} \\
p_F &= \hbar k_F = 2.26\times 10^{-17}\,{\rm g\,cm/s} \\
mc &= (9.1\times 10^{-28}\,{\rm g})(3\times 10^{10}\,{\rm cm/s}) = 2.7\times 10^{-17}\,{\rm g\,cm/s} .
\end{aligned}
\tag{5.299}
$$

Since $p_F \sim mc$, we conclude that the electrons are relativistic. The Fermi temperature will then be $T_F \sim mc^2 \sim 10^6\,$eV $\sim 10^{10}\,$K. Thus, $T \ll T_F$, which says that the electron gas is degenerate and may be considered to be at $T \sim 0$. So we need to understand the ground state properties of the relativistic electron gas.
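The kinetic energy is given by

$$ \varepsilon(p) = \sqrt{p^2c^2 + m^2c^4} - mc^2 . \tag{5.300} $$

The velocity is

$$ v = \frac{\partial\varepsilon}{\partial p} = \frac{pc^2}{\sqrt{p^2c^2 + m^2c^4}} . \tag{5.301} $$

The pressure in the ground state is

$$
\begin{aligned}
p_0 &= \tfrac{1}{3}\,n\,\langle p\cdot v\rangle = \frac{1}{3\pi^2\hbar^3}\int_0^{p_F}\!dp\; p^2\cdot\frac{p^2c^2}{\sqrt{p^2c^2 + m^2c^4}} \\
&= \frac{m^4c^5}{3\pi^2\hbar^3}\int_0^{\theta_F}\!d\theta\,\sinh^4\theta \\
&= \frac{m^4c^5}{96\pi^2\hbar^3}\Big(\sinh(4\theta_F) - 8\sinh(2\theta_F) + 12\,\theta_F\Big) ,
\end{aligned}
\tag{5.302}
$$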
96π ~
Figure 5.18: Mass-radius relationship for white dwarf stars. (Source: Wikipedia).

where we use the substitution

$$ p = mc\sinh\theta , \qquad v = c\tanh\theta \quad\Longrightarrow\quad \theta = \tfrac{1}{2}\ln\Big(\frac{c+v}{c-v}\Big) . \tag{5.303} $$
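The closed form in the last line of eqn. 5.302 rests on $\int_0^\theta dx\,\sinh^4 x = \frac{1}{32}\big[\sinh 4\theta - 8\sinh 2\theta + 12\theta\big]$. The sketch below checks that identity against Simpson-rule quadrature. It also checks the two limiting forms quoted in eqn. 5.309. The function names and test values of $\theta$ are illustrative choices.

```python
import math

def sinh4_closed_form(theta):
    """sinh(4 theta) - 8 sinh(2 theta) + 12 theta, i.e. 32 * integral_0^theta sinh^4."""
    return math.sinh(4 * theta) - 8 * math.sinh(2 * theta) + 12 * theta

def sinh4_numeric(theta, n_steps=2000):
    """32 * integral_0^theta sinh^4(x) dx by the composite Simpson rule (n_steps even)."""
    h = theta / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        w = 1 if i in (0, n_steps) else (4 if i % 2 else 2)
        total += w * math.sinh(i * h) ** 4
    return 32.0 * total * h / 3.0

for theta in (0.3, 1.0, 2.5):
    print(theta, sinh4_closed_form(theta), sinh4_numeric(theta))
```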

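Note that $p_F = \hbar k_F = \hbar(3\pi^2 n)^{1/3}$, and that

$$ n = \frac{M}{2m_p V} \quad\Longrightarrow\quad 3\pi^2 n = \frac{9\pi}{8}\,\frac{M}{R^3 m_p} . \tag{5.304} $$

Now in equilibrium the pressure $p_0$ is balanced by gravitational pressure. We have

$$ dE_0 = -p_0\, dV = -p_0(R)\cdot 4\pi R^2\, dR . \tag{5.305} $$

This must be balanced by gravity:

$$ dE_g = \gamma\cdot\frac{GM^2}{R^2}\, dR , \tag{5.306} $$

where $\gamma$ depends on the radial mass distribution. Equilibrium then implies

$$ p_0(R) = \frac{\gamma}{4\pi}\,\frac{GM^2}{R^4} . \tag{5.307} $$

To find the relation $R = R(M)$, we must solve

$$ \frac{\gamma}{4\pi}\,\frac{GM^2}{R^4} = \frac{m^4c^5}{96\pi^2\hbar^3}\Big(\sinh(4\theta_F) - 8\sinh(2\theta_F) + 12\,\theta_F\Big) . \tag{5.308} $$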
5.9. APPENDIX I : SECOND QUANTIZATION 281

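Note that

$$ \sinh(4\theta_F) - 8\sinh(2\theta_F) + 12\,\theta_F = \begin{cases} \frac{96}{15}\,\theta_F^5 & \theta_F \to 0 \\[4pt] \frac{1}{2}\,e^{4\theta_F} & \theta_F \to \infty . \end{cases} \tag{5.309} $$

Thus, we may write

$$ p_0(R) = \frac{\gamma}{4\pi}\,\frac{GM^2}{R^4} = \begin{cases} \dfrac{\hbar^2}{15\pi^2 m}\Big(\dfrac{9\pi}{8}\,\dfrac{M}{R^3 m_p}\Big)^{5/3} & \theta_F \to 0 \\[12pt] \dfrac{\hbar c}{12\pi^2}\Big(\dfrac{9\pi}{8}\,\dfrac{M}{R^3 m_p}\Big)^{4/3} & \theta_F \to \infty . \end{cases} \tag{5.310} $$

In the limit $\theta_F \to 0$, we solve for $R(M)$ and find

$$ R = \frac{3\,(9\pi)^{2/3}}{40\gamma}\,\frac{\hbar^2}{G\, m_p^{5/3}\, m\, M^{1/3}} \propto M^{-1/3} . \tag{5.311} $$

In the opposite limit $\theta_F \to \infty$, the $R$ factors divide out and we obtain

$$ M = M_0 = \frac{9}{64}\Big(\frac{3\pi}{\gamma^3}\Big)^{1/2}\Big(\frac{\hbar c}{G}\Big)^{3/2}\frac{1}{m_p^2} . \tag{5.312} $$

To find the $R$ dependence, we must go beyond the lowest order expansion of eqn. 5.309, in which case we find

$$ R = \Big(\frac{9\pi}{8}\Big)^{1/3}\Big(\frac{\hbar}{mc}\Big)\Big(\frac{M}{m_p}\Big)^{1/3}\bigg[1 - \Big(\frac{M}{M_0}\Big)^{2/3}\bigg]^{1/2} . \tag{5.313} $$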
The value $M_0$ is the limiting mass for a white dwarf. It is called the Chandrasekhar limit.
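Eqn. 5.312 is straightforward to evaluate. The sketch below uses CGS constants and treats $\gamma$ as an input. The choice $\gamma = 3/5$ corresponds to the gravitational energy $-\frac{3}{5}GM^2/R$ of a uniform-density sphere and is an assumption for illustration only. It gives $M_0 \approx 1.7$ solar masses. That is somewhat above the accepted Chandrasekhar limit of roughly 1.4 solar masses, which requires a self-consistent, non-uniform density profile.

```python
import math

# CGS constants
hbar = 1.054571817e-27    # erg s
c = 2.99792458e10         # cm/s
G = 6.67430e-8            # cm^3 g^-1 s^-2
m_p = 1.67262192e-24      # g
M_sun = 1.989e33          # g

def chandrasekhar_M0(gamma):
    """Eqn. 5.312: M0 = (9/64) (3 pi / gamma^3)^(1/2) (hbar c / G)^(3/2) / m_p^2, in grams."""
    return (9.0 / 64.0) * math.sqrt(3.0 * math.pi / gamma ** 3) * (hbar * c / G) ** 1.5 / m_p ** 2

M0 = chandrasekhar_M0(0.6)          # gamma = 3/5: uniform-density sphere (assumption)
print(f"M0 = {M0:.3e} g = {M0 / M_sun:.2f} M_sun")
```

Because $M_0 \propto \gamma^{-3/2}$, any more centrally concentrated profile (larger $\gamma$) lowers the estimate.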

5.9 Appendix I : Second Quantization

5.9.1 Basis states and creation/annihilation operators

Second quantization is a convenient scheme to label basis states of a many particle quantum system. We are ultimately interested in solutions of the many-body Schrödinger equation,

$$ \hat H\,\Psi(x_1,\ldots,x_N) = E\,\Psi(x_1,\ldots,x_N) \tag{5.314} $$

where the Hamiltonian is

$$ \hat H = -\frac{\hbar^2}{2m}\sum_{i=1}^{N}\nabla_i^2 + \sum_{j<k}^{N} V(x_j - x_k) . \tag{5.315} $$

To the coordinate labels $\{x_1,\ldots,x_N\}$ we may also append labels for internal degrees of freedom, such as spin polarization, denoted $\{\zeta_1,\ldots,\zeta_N\}$. Since $[\hat H, \sigma] = 0$ for all permutations $\sigma \in S_N$, the many-body wavefunctions may be chosen to transform according to irreducible representations of the symmetric group $S_N$. Thus, for any $\sigma \in S_N$,

$$ \Psi\big(x_{\sigma(1)},\ldots,x_{\sigma(N)}\big) = \begin{Bmatrix} 1 \\ {\rm sgn}(\sigma) \end{Bmatrix}\Psi(x_1,\ldots,x_N) , \tag{5.316} $$

where the upper choice is for Bose-Einstein statistics and the lower sign for Fermi-Dirac statistics. Here $x_j$ may include not only the spatial coordinates of particle $j$, but its internal quantum number(s) as well, such as $\zeta_j$.

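A convenient basis for the many body states is obtained from the single-particle eigenstates $|\alpha\rangle$ of some single-particle Hamiltonian $\hat H_0$, with $\langle x\,|\,\alpha\rangle = \varphi_\alpha(x)$ and $\hat H_0|\alpha\rangle = \varepsilon_\alpha|\alpha\rangle$. The basis may be taken as orthonormal, i.e. $\langle\alpha\,|\,\alpha'\rangle = \delta_{\alpha\alpha'}$. Now define

$$ \Psi_{\alpha_1,\ldots,\alpha_N}(x_1,\ldots,x_N) = \frac{1}{\sqrt{N!\prod_\alpha n_\alpha!}}\sum_{\sigma\in S_N}\begin{Bmatrix} 1 \\ {\rm sgn}(\sigma)\end{Bmatrix}\varphi_{\alpha_{\sigma(1)}}(x_1)\cdots\varphi_{\alpha_{\sigma(N)}}(x_N) . \tag{5.317} $$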

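Here $n_\alpha$ is the number of times the index $\alpha$ appears among the set $\{\alpha_1,\ldots,\alpha_N\}$. For BE statistics, $n_\alpha \in \{0,1,2,\ldots\}$, whereas for FD statistics, $n_\alpha \in \{0,1\}$. Note that the above states are normalized²²:

$$
\begin{aligned}
\int\!d^dx_1\cdots\int\!d^dx_N\,\big|\Psi_{\alpha_1\cdots\alpha_N}(x_1,\ldots,x_N)\big|^2 &= \frac{1}{N!\prod_\alpha n_\alpha!}\sum_{\sigma,\mu\in S_N}\begin{Bmatrix} 1 \\ {\rm sgn}(\sigma\mu)\end{Bmatrix}\prod_{j=1}^{N}\int\!d^dx_j\,\varphi^*_{\alpha_{\sigma(j)}}(x_j)\,\varphi_{\alpha_{\mu(j)}}(x_j) \\
&= \frac{1}{\prod_\alpha n_\alpha!}\sum_{\sigma\in S_N}\prod_{j=1}^{N}\delta_{\alpha_j,\alpha_{\sigma(j)}} = 1 .
\end{aligned}
\tag{5.318}
$$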

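Note that

$$
\begin{aligned}
\sum_{\sigma\in S_N}\varphi_{\alpha_{\sigma(1)}}(x_1)\cdots\varphi_{\alpha_{\sigma(N)}}(x_N) &\equiv {\rm per}\big\{\varphi_{\alpha_i}(x_j)\big\} \\
\sum_{\sigma\in S_N}{\rm sgn}(\sigma)\,\varphi_{\alpha_{\sigma(1)}}(x_1)\cdots\varphi_{\alpha_{\sigma(N)}}(x_N) &\equiv \det\big\{\varphi_{\alpha_i}(x_j)\big\} ,
\end{aligned}
\tag{5.319}
$$

which stand for permanent and determinant, respectively. We may now write

$$ \Psi_{\alpha_1\cdots\alpha_N}(x_1,\ldots,x_N) = \big\langle x_1,\cdots,x_N\,\big|\,\alpha_1\cdots\alpha_N\big\rangle , \tag{5.320} $$

where

$$ \big|\,\alpha_1\cdots\alpha_N\big\rangle = \frac{1}{\sqrt{N!\prod_\alpha n_\alpha!}}\sum_{\sigma\in S_N}\begin{Bmatrix} 1 \\ {\rm sgn}(\sigma)\end{Bmatrix}\big|\alpha_{\sigma(1)}\big\rangle\otimes\big|\alpha_{\sigma(2)}\big\rangle\otimes\cdots\otimes\big|\alpha_{\sigma(N)}\big\rangle . \tag{5.321} $$

Note that $|\,\alpha_{\sigma(1)}\cdots\alpha_{\sigma(N)}\rangle = (\pm 1)^\sigma\,|\,\alpha_1\cdots\alpha_N\rangle$, where by $(\pm 1)^\sigma$ we mean 1 in the case of BE statistics and ${\rm sgn}(\sigma)$ in the case of FD statistics.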
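The normalization factor in eqn. 5.321 and the sign rule just stated can be checked by brute force for a few particles. In the sketch below, a product state $|\alpha_{\sigma(1)}\rangle\otimes\cdots\otimes|\alpha_{\sigma(N)}\rangle$ is represented simply by the tuple of its labels. This is a convenience assumption: the single-particle states are orthonormal, so distinct tuples are orthogonal. The (anti)symmetrized state is then a dictionary of amplitudes. The helper names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations

def perm_sign(p):
    """Sign of a permutation of 0..N-1, from the parity of its inversion count."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def symmetrized_state(alphas, fermions):
    """Eqn. 5.321 as {(label_1, ..., label_N): amplitude}, one key per product basis state."""
    N = len(alphas)
    occupations = Counter(alphas).values()                     # the n_alpha
    norm = 1.0 / math.sqrt(math.factorial(N) * math.prod(math.factorial(n) for n in occupations))
    state = defaultdict(float)
    for p in permutations(range(N)):
        sign = perm_sign(p) if fermions else 1
        state[tuple(alphas[p[k]] for k in range(N))] += sign * norm
    return state

def norm_squared(state):
    """<psi|psi>, using orthonormality of the single-particle labels."""
    return sum(amp * amp for amp in state.values())

print(norm_squared(symmetrized_state((0, 0, 1), fermions=False)))   # bosons with n_0 = 2
print(norm_squared(symmetrized_state((0, 0, 1), fermions=True)))    # Pauli: the state vanishes
```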
²² In the normalization integrals, each $\int d^dx$ implicitly includes a sum $\sum_\zeta$ over any internal indices that may be present.

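We may express $|\,\alpha_1\cdots\alpha_N\rangle$ as a product of creation operators acting on a vacuum $|\,0\,\rangle$ in Fock space. For bosons,

$$ \big|\,\alpha_1\cdots\alpha_N\big\rangle = \prod_\alpha\frac{(b^\dagger_\alpha)^{n_\alpha}}{\sqrt{n_\alpha!}}\,\big|\,0\,\big\rangle \equiv \big|\,\{n_\alpha\}\big\rangle , \tag{5.322} $$

with

$$ \big[b_\alpha, b_\beta\big] = 0 , \qquad \big[b^\dagger_\alpha, b^\dagger_\beta\big] = 0 , \qquad \big[b_\alpha, b^\dagger_\beta\big] = \delta_{\alpha\beta} , \tag{5.323} $$

where $[\,\bullet\,,\bullet\,]$ is the commutator. For fermions,

$$ \big|\,\alpha_1\cdots\alpha_N\big\rangle = c^\dagger_{\alpha_1}c^\dagger_{\alpha_2}\cdots c^\dagger_{\alpha_N}\,\big|\,0\,\big\rangle \equiv \big|\,\{n_\alpha\}\big\rangle , \tag{5.324} $$

with

$$ \big\{c_\alpha, c_\beta\big\} = 0 , \qquad \big\{c^\dagger_\alpha, c^\dagger_\beta\big\} = 0 , \qquad \big\{c_\alpha, c^\dagger_\beta\big\} = \delta_{\alpha\beta} , \tag{5.325} $$

where $\{\,\bullet\,,\bullet\,\}$ is the anticommutator.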
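The anticommutation relations of eqn. 5.325 can be verified mechanically on a small Fock space. The sketch below represents a basis state $|\{n_\alpha\}\rangle$ by its tuple of occupation numbers. It lets $c^\dagger_\alpha$ and $c_\alpha$ act with the sign $(-1)$ raised to the number of occupied modes $\beta < \alpha$. This is the convention implied by eqn. 5.324 if the labels there are listed in increasing order, which is an assumption of the sketch. The helper names are illustrative.

```python
from itertools import product

def apply(op, alpha, state):
    """Apply c^dagger_alpha (op='cdag') or c_alpha (op='c') to {occupation tuple: amplitude}.

    Sign convention (assumed): (-1)**(number of occupied modes beta < alpha).
    """
    out = {}
    for occ, amp in state.items():
        if (op == 'cdag') == (occ[alpha] == 1):
            continue                          # Pauli blocking, or annihilating an empty mode
        sign = -1 if sum(occ[:alpha]) % 2 else 1
        new = list(occ)
        new[alpha] = 1 - occ[alpha]
        key = tuple(new)
        out[key] = out.get(key, 0) + sign * amp
    return {k: v for k, v in out.items() if v != 0}

def add(s1, s2):
    out = dict(s1)
    for k, v in s2.items():
        out[k] = out.get(k, 0) + v
    return {k: v for k, v in out.items() if v != 0}

def anticommutator(op1, a, op2, b, state):
    """{O1_a, O2_b}|state> = O1_a O2_b |state> + O2_b O1_a |state>."""
    return add(apply(op1, a, apply(op2, b, state)), apply(op2, b, apply(op1, a, state)))

def check_car(n_modes=3):
    """Check eqn. 5.325 on every Fock basis state of n_modes fermionic modes."""
    for occ in product((0, 1), repeat=n_modes):
        s = {occ: 1}
        for a in range(n_modes):
            for b in range(n_modes):
                if anticommutator('c', a, 'cdag', b, s) != (s if a == b else {}):
                    return False
                if anticommutator('cdag', a, 'cdag', b, s) or anticommutator('c', a, 'c', b, s):
                    return False
    return True

print(check_car(3))
```

For example, applying $c^\dagger_1$ and then $c^\dagger_0$ to the vacuum gives the opposite sign from applying them in the reverse order, as $\{c^\dagger_0, c^\dagger_1\} = 0$ requires.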

5.9.2 Second quantized operators

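Now consider the action of permutation-symmetric first quantized operators such as $\hat T = -\frac{\hbar^2}{2m}\sum_{i=1}^N\nabla_i^2$ and $\hat V = \sum_{i<j}^N \hat v(x_i - x_j)$. For a one-body operator such as $\hat T$, we have

$$
\begin{aligned}
\big\langle\,\alpha_1\cdots\alpha_N\big|\,\hat T\,\big|\,\alpha'_1\cdots\alpha'_N\big\rangle &= \int\!d^dx_1\cdots\int\!d^dx_N\,\Big(\prod_\alpha n_\alpha!\Big)^{-1/2}\Big(\prod_\alpha n'_\alpha!\Big)^{-1/2}\,\times \\
&\qquad\sum_{\sigma\in S_N}(\pm 1)^\sigma\,\varphi^*_{\alpha_1}(x_1)\cdots\varphi^*_{\alpha_N}(x_N)\sum_{i=1}^{N}\hat T_i\,\varphi_{\alpha'_{\sigma(1)}}(x_1)\cdots\varphi_{\alpha'_{\sigma(N)}}(x_N) \\
&= \sum_{\sigma\in S_N}(\pm 1)^\sigma\Big(\prod_\alpha n_\alpha!\Big)^{-1/2}\Big(\prod_\alpha n'_\alpha!\Big)^{-1/2}\sum_{i=1}^{N}\Big(\prod_{j\ne i}\delta_{\alpha_j,\alpha'_{\sigma(j)}}\Big)\int\!d^dx_1\,\varphi^*_{\alpha_i}(x_1)\,\hat T_1\,\varphi_{\alpha'_{\sigma(i)}}(x_1) .
\end{aligned}
\tag{5.326}
$$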

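One may verify that any permutation-symmetric one-body operator such as $\hat T$ is faithfully represented by the second quantized expression,

$$ \hat T = \sum_{\alpha,\beta}\big\langle\,\alpha\,\big|\,\hat T\,\big|\,\beta\,\big\rangle\,\psi^\dagger_\alpha\psi_\beta , \tag{5.327} $$

where $\psi^\dagger_\alpha$ is $b^\dagger_\alpha$ or $c^\dagger_\alpha$ as the application determines, and

$$ \big\langle\,\alpha\,\big|\,\hat T\,\big|\,\beta\,\big\rangle = \int\!d^dx_1\,\varphi^*_\alpha(x_1)\,\hat T_1\,\varphi_\beta(x_1) . \tag{5.328} $$

Similarly, two-body operators such as $\hat V$ are represented as

$$ \hat V = \tfrac{1}{2}\sum_{\alpha,\beta,\gamma,\delta}\big\langle\,\alpha\beta\,\big|\,\hat V\,\big|\,\gamma\delta\,\big\rangle\,\psi^\dagger_\alpha\psi^\dagger_\beta\psi_\delta\psi_\gamma , \tag{5.329} $$

where

$$ \big\langle\,\alpha\beta\,\big|\,\hat V\,\big|\,\gamma\delta\,\big\rangle = \int\!d^dx_1\int\!d^dx_2\,\varphi^*_\alpha(x_1)\,\varphi^*_\beta(x_2)\,v(x_1 - x_2)\,\varphi_\delta(x_2)\,\varphi_\gamma(x_1) . \tag{5.330} $$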

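The general form for an $n$-body operator is then

$$ \hat R = \frac{1}{n!}\sum_{\substack{\alpha_1\cdots\alpha_n \\ \beta_1\cdots\beta_n}}\big\langle\,\alpha_1\cdots\alpha_n\,\big|\,\hat R\,\big|\,\beta_1\cdots\beta_n\,\big\rangle\,\psi^\dagger_{\alpha_1}\cdots\psi^\dagger_{\alpha_n}\,\psi_{\beta_n}\cdots\psi_{\beta_1} . \tag{5.331} $$

Finally, if the Hamiltonian is noninteracting, consisting solely of one-body operators $\hat H = \sum_{i=1}^N \hat h_i$, then in the basis of eigenstates of $\hat h$,

$$ \hat H = \sum_\alpha \varepsilon_\alpha\,\psi^\dagger_\alpha\psi_\alpha , \tag{5.332} $$

where $\{\varepsilon_\alpha\}$ is the spectrum of each single particle Hamiltonian $\hat h_i$.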

5.10 Appendix II : Ideal Bose Gas Condensation

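We begin with the grand canonical Hamiltonian $K = H - \mu N$ for the ideal Bose gas,

$$ K = \sum_{\boldsymbol{k}}(\varepsilon_{\boldsymbol{k}} - \mu)\, b^\dagger_{\boldsymbol{k}} b_{\boldsymbol{k}} - \sqrt{N}\sum_{\boldsymbol{k}}\big(\nu_{\boldsymbol{k}}\, b^\dagger_{\boldsymbol{k}} + \bar\nu_{\boldsymbol{k}}\, b_{\boldsymbol{k}}\big) . \tag{5.333} $$

Here $b^\dagger_{\boldsymbol{k}}$ is the creation operator for a boson in a state of wavevector $\boldsymbol{k}$, hence $[b_{\boldsymbol{k}}, b^\dagger_{\boldsymbol{k}'}] = \delta_{\boldsymbol{k}\boldsymbol{k}'}$. The dispersion relation is given by the function $\varepsilon_{\boldsymbol{k}}$, which is the energy of a particle with wavevector $\boldsymbol{k}$. We must have $\varepsilon_{\boldsymbol{k}} - \mu \ge 0$ for all $\boldsymbol{k}$, lest the spectrum of $K$ be unbounded from below. The fields $\{\nu_{\boldsymbol{k}}, \bar\nu_{\boldsymbol{k}}\}$ break a global O(2) symmetry.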
Students who have not taken a course in solid state physics can skip the following paragraph, and be aware that $N = V/v_0$ is the total volume of the system in units of a fundamental "unit cell" volume. The thermodynamic limit is then $N \to \infty$. Note that $N$ is not the boson particle number, which we'll call $N_b$.
Solid state physics boilerplate: We presume a setting in which the real space Hamiltonian is defined by some boson hopping model on a Bravais lattice. The wavevectors $\boldsymbol{k}$ are then restricted to the first Brillouin zone, $\hat\Omega$, and assuming periodic boundary conditions are quantized according to the condition $\exp\big(iN_l\,\boldsymbol{k}\cdot\boldsymbol{a}_l\big) = 1$ for all $l \in \{1,\ldots,d\}$, where $\boldsymbol{a}_l$ is the $l^{\rm th}$ fundamental direct lattice vector and $N_l$ is the size of the system in the $\boldsymbol{a}_l$ direction; $d$ is the dimension of space. The total number of unit cells is $N \equiv \prod_l N_l$. Thus, quantization entails $\boldsymbol{k} = \sum_l (n_l/N_l)\,\boldsymbol{b}_l$, where $\boldsymbol{b}_l$ is the $l^{\rm th}$ elementary reciprocal lattice vector ($\boldsymbol{a}_l\cdot\boldsymbol{b}_{l'} = 2\pi\delta_{ll'}$) and $n_l$ ranges over $N_l$ distinct integers such that the allowed $\boldsymbol{k}$ points form a discrete approximation to $\hat\Omega$.
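To solve, we first shift the boson creation and annihilation operators, writing

$$ K = \sum_{\boldsymbol{k}}(\varepsilon_{\boldsymbol{k}} - \mu)\,\beta^\dagger_{\boldsymbol{k}}\beta_{\boldsymbol{k}} - N\sum_{\boldsymbol{k}}\frac{|\nu_{\boldsymbol{k}}|^2}{\varepsilon_{\boldsymbol{k}} - \mu} , \tag{5.334} $$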
5.10. APPENDIX II : IDEAL BOSE GAS CONDENSATION 285

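where

$$ \beta_{\boldsymbol{k}} = b_{\boldsymbol{k}} - \frac{\sqrt{N}\,\nu_{\boldsymbol{k}}}{\varepsilon_{\boldsymbol{k}} - \mu} , \qquad \beta^\dagger_{\boldsymbol{k}} = b^\dagger_{\boldsymbol{k}} - \frac{\sqrt{N}\,\bar\nu_{\boldsymbol{k}}}{\varepsilon_{\boldsymbol{k}} - \mu} . \tag{5.335} $$

Note that $[\beta_{\boldsymbol{k}}, \beta^\dagger_{\boldsymbol{k}'}] = \delta_{\boldsymbol{k}\boldsymbol{k}'}$, so the above transformation is canonical. The Landau free energy $\Omega = -k_B T\ln\Xi$, where $\Xi = {\rm Tr}\, e^{-K/k_B T}$, is given by

$$ \Omega = N k_B T\int_{-\infty}^{\infty}\!d\varepsilon\, g(\varepsilon)\,\ln\big(1 - e^{(\mu-\varepsilon)/k_B T}\big) - N\sum_{\boldsymbol{k}}\frac{|\nu_{\boldsymbol{k}}|^2}{\varepsilon_{\boldsymbol{k}} - \mu} , \tag{5.336} $$

where $g(\varepsilon)$ is the density of energy states per unit cell,

$$ g(\varepsilon) = \frac{1}{N}\sum_{\boldsymbol{k}}\delta(\varepsilon - \varepsilon_{\boldsymbol{k}}) \;\xrightarrow[N\to\infty]{}\; v_0\int_{\hat\Omega}\!\frac{d^dk}{(2\pi)^d}\,\delta(\varepsilon - \varepsilon_{\boldsymbol{k}}) . \tag{5.337} $$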

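Note that

$$ \psi_{\boldsymbol{k}} \equiv \frac{1}{\sqrt{N}}\,\langle b_{\boldsymbol{k}}\rangle = -\frac{1}{N}\frac{\partial\Omega}{\partial\bar\nu_{\boldsymbol{k}}} = \frac{\nu_{\boldsymbol{k}}}{\varepsilon_{\boldsymbol{k}} - \mu} . \tag{5.338} $$

In the condensed phase, $\psi_{\boldsymbol{k}}$ is nonzero.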


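The Landau free energy (grand potential) is a function $\Omega(T,N,\mu,\nu,\bar\nu)$. We now make a Legendre transformation,

$$ Y(T,N,\mu,\psi,\bar\psi) = \Omega(T,N,\mu,\nu,\bar\nu) + N\sum_{\boldsymbol{k}}\big(\nu_{\boldsymbol{k}}\,\bar\psi_{\boldsymbol{k}} + \bar\nu_{\boldsymbol{k}}\,\psi_{\boldsymbol{k}}\big) . \tag{5.339} $$

Note that

$$ \frac{\partial Y}{\partial\bar\nu_{\boldsymbol{k}}} = \frac{\partial\Omega}{\partial\bar\nu_{\boldsymbol{k}}} + N\psi_{\boldsymbol{k}} = 0 , \tag{5.340} $$

by the definition of $\psi_{\boldsymbol{k}}$. Similarly, $\partial Y/\partial\nu_{\boldsymbol{k}} = 0$. We now have

$$ Y(T,N,\mu,\psi,\bar\psi) = N k_B T\int_{-\infty}^{\infty}\!d\varepsilon\, g(\varepsilon)\,\ln\big(1 - e^{(\mu-\varepsilon)/k_B T}\big) + N\sum_{\boldsymbol{k}}(\varepsilon_{\boldsymbol{k}} - \mu)\,|\psi_{\boldsymbol{k}}|^2 . \tag{5.341} $$

Therefore, the boson particle number per unit cell is given by the dimensionless density,

$$ n = \frac{N_b}{N} = -\frac{1}{N}\frac{\partial Y}{\partial\mu} = \sum_{\boldsymbol{k}}|\psi_{\boldsymbol{k}}|^2 + \int_{-\infty}^{\infty}\!d\varepsilon\,\frac{g(\varepsilon)}{e^{(\varepsilon-\mu)/k_B T} - 1} , \tag{5.342} $$

and the external field conjugate to the condensate amplitude $\psi_{\boldsymbol{k}}$ is

$$ \nu_{\boldsymbol{k}} = \frac{1}{N}\frac{\partial Y}{\partial\bar\psi_{\boldsymbol{k}}} = (\varepsilon_{\boldsymbol{k}} - \mu)\,\psi_{\boldsymbol{k}} . \tag{5.343} $$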

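Recall that $\nu_{\boldsymbol{k}}$ acts as an external field. Let the dispersion $\varepsilon_{\boldsymbol{k}}$ be minimized at $\boldsymbol{k} = \boldsymbol{K}$. Without loss of generality, we may assume this minimum value is $\varepsilon_{\boldsymbol{K}} = 0$. We see that if $\nu_{\boldsymbol{k}} = 0$ then one of two things must be true:

(i) $\psi_{\boldsymbol{k}} = 0$ for all $\boldsymbol{k}$

(ii) $\mu = \varepsilon_{\boldsymbol{K}}$, in which case $\psi_{\boldsymbol{K}}$ can be nonzero.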

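Thus, for $\nu = \bar\nu = 0$ and $\mu < 0$, we have the usual equation of state,

$$ n(T,\mu) = \int_{-\infty}^{\infty}\!d\varepsilon\,\frac{g(\varepsilon)}{e^{(\varepsilon-\mu)/k_B T} - 1} , \tag{5.344} $$

which relates the intensive variables $n$, $T$, and $\mu$. When $\mu = 0$, the equation of state becomes

$$ n(T,\mu=0) = \overbrace{\sum_{\boldsymbol{K}}|\psi_{\boldsymbol{K}}|^2}^{n_0} + \overbrace{\int_{-\infty}^{\infty}\!d\varepsilon\,\frac{g(\varepsilon)}{e^{\varepsilon/k_B T} - 1}}^{n_>(T)} , \tag{5.345} $$

where now the sum is over only those $\boldsymbol{K}$ for which $\varepsilon_{\boldsymbol{K}} = 0$. Typically this set has only one member, $\boldsymbol{K} = 0$, but it is quite possible, due to symmetry reasons, that there are more such $\boldsymbol{K}$ values. This last equation of state is one which relates the intensive variables $n$, $T$, and $n_0$, where

$$ n_0 = \sum_{\boldsymbol{K}}|\psi_{\boldsymbol{K}}|^2 \tag{5.346} $$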

is the dimensionless condensate density. If the integral $n_>(T)$ in eqn. 5.345 is finite, then for $n > n_>(T)$ we must have $n_0 > 0$. Note that, for any $T$, $n_>(T)$ diverges logarithmically whenever $g(0)$ is finite and nonzero. This means that eqn. 5.344 can always be inverted to yield a finite $\mu(n,T)$, no matter how large the value of $n$, in which case there is no condensation and $n_0 = 0$. If $g(\varepsilon) \propto \varepsilon^\alpha$ with $\alpha > 0$, the integral converges and $n_>(T)$ is finite and monotonically increasing for all $T$. Thus, for fixed dimensionless density $n$, there will be a critical temperature $T_c$ for which $n = n_>(T_c)$. For $T < T_c$, eqn. 5.344 has no solution for any $\mu$ and we must appeal to eqn. 5.345. The condensate density, given by $n_0(n,T) = n - n_>(T)$, is then positive for $T < T_c$, and vanishes for $T \ge T_c$.
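For a power-law density of states $g(\varepsilon) = C\varepsilon^\alpha$ with $\alpha > 0$, the integral is $n_>(T) = C\,\Gamma(\alpha+1)\,\zeta(\alpha+1)\,(k_BT)^{\alpha+1}$. Eqn. 5.345 then gives $n_0/n = 1 - (T/T_c)^{\alpha+1}$ below $T_c$. The sketch below sets $k_B = 1$ and uses hypothetical helper names. It evaluates $\zeta(s)$ by a direct sum plus an Euler-Maclaurin estimate of the tail. It refuses $\alpha \le 0$, where $n_>(T)$ diverges and there is no condensation. Setting $\alpha = 5$ gives the density of states for the dispersion $\varepsilon \propto k^{1/2}$ in three dimensions, and the sketch then reproduces the $\frac{63}{64}$ of the example problem below.

```python
import math

def zeta(s, K=1000):
    """Riemann zeta for s > 1: direct sum up to K plus an Euler-Maclaurin estimate of the tail."""
    head = sum(k ** (-s) for k in range(1, K + 1))
    tail = K ** (1 - s) / (s - 1) - 0.5 * K ** (-s) + s * K ** (-s - 1) / 12.0
    return head + tail

def n_thermal(T, alpha, C=1.0):
    """n_>(T) of eqn. 5.345 for g(eps) = C eps**alpha (k_B = 1): C Gamma(alpha+1) zeta(alpha+1) T**(alpha+1)."""
    if alpha <= 0:
        raise ValueError("n_>(T) diverges for alpha <= 0, so there is no condensation")
    return C * math.gamma(alpha + 1) * zeta(alpha + 1) * T ** (alpha + 1)

def critical_temperature(n, alpha, C=1.0):
    """T_c defined by n = n_>(T_c)."""
    return (n / n_thermal(1.0, alpha, C)) ** (1.0 / (alpha + 1))

def condensate_fraction(T, n, alpha, C=1.0):
    """n_0 / n = 1 - n_>(T)/n below T_c, and zero above."""
    return max(0.0, 1.0 - n_thermal(T, alpha, C) / n)

for alpha in (0.5, 5.0):
    Tc = critical_temperature(1.0, alpha)
    print(alpha, condensate_fraction(0.5 * Tc, 1.0, alpha))
```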
In the condensed phase, the order parameter $\psi$ inherits its phase from the external field $\nu$, which is taken to zero, in the same way the magnetization in the symmetry-broken phase of an Ising ferromagnet inherits its direction from an applied field $h$ which is taken to zero. The important feature is that in both cases the applied field is taken to zero after the approach to the thermodynamic limit.

5.11 Appendix III : Example Bose Condensation Problem

PROBLEM: A three-dimensional gas of noninteracting bosonic particles obeys the dispersion relation $\varepsilon(\boldsymbol{k}) = A\,k^{1/2}$.

(a) Obtain an expression for the density $n(T,z)$ where $z = \exp(\mu/k_B T)$ is the fugacity. Simplify your expression as best you can, adimensionalizing any integral or infinite sum which may appear. You
5.11. APPENDIX III : EXAMPLE BOSE CONDENSATION PROBLEM 287

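may find it convenient to define

$$ {\rm Li}_\nu(z) \equiv \frac{1}{\Gamma(\nu)}\int_0^{\infty}\!dt\,\frac{t^{\nu-1}}{z^{-1}e^t - 1} = \sum_{k=1}^{\infty}\frac{z^k}{k^\nu} . \tag{5.347} $$

Note ${\rm Li}_\nu(1) = \zeta(\nu)$, the Riemann zeta function.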

(b) Find the critical temperature for Bose condensation, Tc (n). Your expression should only include
the density n, the constant A, physical constants, and numerical factors (which may be expressed
in terms of integrals or infinite sums).
(c) What is the condensate density $n_0$ when $T = \frac{1}{2} T_c$?

(d) Do you expect the second virial coefficient to be positive or negative? Explain your reasoning. (You
don’t have to do any calculation.)

SOLUTION: We work in the grand canonical ensemble, using Bose-Einstein statistics.

(a) The density for Bose-Einstein particles is given by
$$
\begin{aligned}
n(T,z) &= \int \! \frac{d^3k}{(2\pi)^3} \, \frac{1}{z^{-1} \exp(A k^{1/2}/k_B T) - 1} \\
&= \frac{1}{\pi^2} \bigg(\frac{k_B T}{A}\bigg)^{\!6} \int_0^\infty \! ds \, \frac{s^5}{z^{-1} e^s - 1} \\
&= \frac{120}{\pi^2} \bigg(\frac{k_B T}{A}\bigg)^{\!6} \mathrm{Li}_6(z) ,
\end{aligned} \tag{5.348}
$$
where we have changed integration variables from $k$ to $s = A k^{1/2}/k_B T$, and we have used the functions $\mathrm{Li}_\nu(z)$ defined in eqn. 5.347.
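As a quick check of the change of variables in eqn. 5.348, one can compare a direct numerical quadrature of the dimensionless integral $\int_0^\infty ds\, s^5/(z^{-1} e^s - 1)$ with $\Gamma(6)\,\mathrm{Li}_6(z) = 120\,\mathrm{Li}_6(z)$ evaluated from the series in eqn. 5.347. The following is a minimal Python sketch; the Simpson rule, the cutoff `s_max`, and the grid size are arbitrary choices made for this illustration.

```python
import math

def polylog_series(nu, z, terms=2000):
    """Truncated series Li_nu(z) = sum_{k>=1} z^k / k^nu (eqn. 5.347), for 0 <= z <= 1."""
    return sum(z**k / k**nu for k in range(1, terms + 1))

def bose_integral(z, s_max=60.0, n_steps=20000):
    """Composite Simpson estimate of int_0^inf ds s^5 / (z^{-1} e^s - 1).

    The integrand behaves like s^4 near s = 0 (for z = 1) and decays like
    s^5 e^{-s} at large s, so truncating at s_max = 60 is harmless.
    """
    def integrand(s):
        return s**5 / (math.exp(s) / z - 1.0) if s > 0 else 0.0

    h = s_max / n_steps  # n_steps must be even for Simpson's rule
    total = integrand(0.0) + integrand(s_max)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0

for z in (0.1, 0.5, 1.0):
    print(z, bose_integral(z), math.gamma(6) * polylog_series(6, z))
```

At $z = 1$ both columns should agree with $120\,\zeta(6) = 120\,\pi^6/945$.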

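(b) Bose condensation sets in for $z = 1$, i.e. $\mu = 0$. Thus, the critical temperature $T_c$ and the density $n$ are related by
$$
n = \frac{120\,\zeta(6)}{\pi^2} \bigg(\frac{k_B T_c}{A}\bigg)^{\!6} , \tag{5.349}
$$
or
$$
T_c(n) = \frac{A}{k_B} \bigg(\frac{\pi^2 n}{120\,\zeta(6)}\bigg)^{\!1/6} . \tag{5.350}
$$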

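(c) For $T < T_c$, we have
$$
\begin{aligned}
n &= n_0 + \frac{120\,\zeta(6)}{\pi^2} \bigg(\frac{k_B T}{A}\bigg)^{\!6} \\
&= n_0 + \bigg(\frac{T}{T_c}\bigg)^{\!6} n ,
\end{aligned} \tag{5.351}
$$
where $n_0$ is the condensate density. Thus, at $T = \frac{1}{2} T_c$,
$$
n_0\big(T = \tfrac{1}{2} T_c\big) = \tfrac{63}{64}\, n . \tag{5.352}
$$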
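Parts (b) and (c) are easily checked numerically. The minimal Python sketch below works in units with $A = k_B = 1$ (an assumption made only for this example), uses $\zeta(6) = \pi^6/945$, and evaluates $T_c(n)$ from eqn. 5.350 together with the condensate fraction $n_0/n = 1 - (T/T_c)^6$ implied by eqn. 5.351. The function names are illustrative.

```python
import math

ZETA6 = math.pi**6 / 945  # Riemann zeta(6)

def critical_temperature(n, A=1.0, kB=1.0):
    """T_c(n) = (A/kB) * (pi^2 n / (120 zeta(6)))^(1/6), eqn. 5.350."""
    return (A / kB) * (math.pi**2 * n / (120 * ZETA6)) ** (1 / 6)

def thermal_density_at_z1(T, A=1.0, kB=1.0):
    """Noncondensed density at z = 1: (120 zeta(6) / pi^2) (kB T / A)^6."""
    return 120 * ZETA6 / math.pi**2 * (kB * T / A) ** 6

def condensate_fraction(T, Tc):
    """n_0/n = 1 - (T/Tc)^6 for T < Tc (eqn. 5.351), and 0 otherwise."""
    return 1.0 - (T / Tc) ** 6 if T < Tc else 0.0

n = 2.5
Tc = critical_temperature(n)
print(Tc, thermal_density_at_z1(Tc))       # second value reproduces n up to rounding
print(condensate_fraction(0.5 * Tc, Tc))   # 0.984375 = 63/64
```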

(d) The virial expansion of the equation of state is
$$
p = n k_B T \Big( 1 + B_2(T)\, n + B_3(T)\, n^2 + \ldots \Big) . \tag{5.353}
$$

We expect B2 (T ) < 0 for noninteracting bosons, reflecting the tendency of the bosons to condense.
(Correspondingly, for noninteracting fermions we expect B2 (T ) > 0.)
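For the curious, we compute $B_2(T)$ by eliminating the fugacity $z$ from the equations for $n(T,z)$ and $p(T,z)$. First, we find $p(T,z)$:
$$
\begin{aligned}
p(T,z) &= -k_B T \int \! \frac{d^3k}{(2\pi)^3} \, \ln\!\Big(1 - z \exp(-A k^{1/2}/k_B T)\Big) \\
&= -\frac{k_B T}{\pi^2} \bigg(\frac{k_B T}{A}\bigg)^{\!6} \int_0^\infty \! ds \, s^5 \ln\!\big(1 - z e^{-s}\big) \\
&= \frac{120\, k_B T}{\pi^2} \bigg(\frac{k_B T}{A}\bigg)^{\!6} \mathrm{Li}_7(z) .
\end{aligned} \tag{5.354}
$$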

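Expanding in powers of the fugacity, we have
$$
\begin{aligned}
n &= \frac{120}{\pi^2} \bigg(\frac{k_B T}{A}\bigg)^{\!6} \bigg\{ z + \frac{z^2}{2^6} + \frac{z^3}{3^6} + \ldots \bigg\} \\
\frac{p}{k_B T} &= \frac{120}{\pi^2} \bigg(\frac{k_B T}{A}\bigg)^{\!6} \bigg\{ z + \frac{z^2}{2^7} + \frac{z^3}{3^7} + \ldots \bigg\} .
\end{aligned} \tag{5.355}
$$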

Solving for $z(n)$ using the first equation, we obtain, to order $n^2$,
$$
z = \frac{\pi^2 A^6 n}{120\,(k_B T)^6} - \frac{1}{2^6} \bigg(\frac{\pi^2 A^6 n}{120\,(k_B T)^6}\bigg)^{\!2} + \mathcal{O}(n^3) . \tag{5.356}
$$

Plugging this into the equation for $p(T,z)$, we obtain the first nontrivial term in the virial expansion, with
$$
B_2(T) = -\frac{\pi^2}{15360} \bigg(\frac{A}{k_B T}\bigg)^{\!6} , \tag{5.357}
$$
which is negative, as expected. Note that the ideal gas law is recovered for $T \to \infty$, for fixed $n$.
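The value of $B_2$ in eqn. 5.357 can also be checked numerically without doing the series inversion by hand. Writing $c \equiv (120/\pi^2)(k_B T/A)^6$, eqns. 5.348 and 5.354 give $n/c = \mathrm{Li}_6(z)$ and $p/(c\,k_B T) = \mathrm{Li}_7(z)$, so that $c\,B_2 = \lim_{z \to 0} (\mathrm{Li}_7 - \mathrm{Li}_6)/\mathrm{Li}_6^2$, which should equal $-1/128$. The minimal Python sketch below evaluates this ratio at a few small fugacities; truncating the series at 60 terms is an arbitrary choice.

```python
import math

def polylog(nu, z, terms=60):
    """Truncated series Li_nu(z) = sum_{k>=1} z^k / k^nu; adequate for small z."""
    return sum(z**k / k**nu for k in range(1, terms + 1))

def reduced_b2(z):
    """Estimate of c*B_2 from p/kT = n + B_2 n^2 + ..., in units where n/c = Li_6(z)."""
    n_red = polylog(6, z)  # n / c
    p_red = polylog(7, z)  # p / (c k_B T)
    return (p_red - n_red) / n_red**2

for z in (1e-2, 1e-3, 1e-4):
    print(z, reduced_b2(z))   # approaches -1/128 = -0.0078125 as z -> 0

# B_2 = -(1/128)/c = -(pi^2 / (128 * 120)) (A / k_B T)^6
print(128 * 120, math.pi**2 / (128 * 120))
```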
Chapter 6

Classical Interacting Systems

6.1 References

– M. Kardar, Statistical Physics of Particles (Cambridge, 2007)
A superb modern text, with many insightful presentations of key concepts.

– L. E. Reichl, A Modern Course in Statistical Physics (2nd edition, Wiley, 1998)
A comprehensive graduate level text with an emphasis on nonequilibrium phenomena.

– M. Plischke and B. Bergersen, Equilibrium Statistical Physics (3rd edition, World Scientific, 2006)
An excellent graduate level text. Less insightful than Kardar but still a good modern treatment of the subject. Good discussion of mean field theory.

– E. M. Lifshitz and L. P. Pitaevskii, Statistical Physics (part I, 3rd edition, Pergamon, 1980)
This is volume 5 in the famous Landau and Lifshitz Course of Theoretical Physics. Though dated, it still contains a wealth of information and physical insight.

– J.-P. Hansen and I. R. McDonald, Theory of Simple Liquids (Academic Press, 1990)
An advanced, detailed discussion of liquid state physics.


6.2 Ising Model

6.2.1 Definition

The simplest model of an interacting system consists of a lattice $\mathcal{L}$ of sites, each of which contains a spin $\sigma_i$ which may be either up ($\sigma_i = +1$) or down ($\sigma_i = -1$). The Hamiltonian is
$$
\hat H = -J \sum_{\langle ij \rangle} \sigma_i\, \sigma_j - \mu_0 H \sum_i \sigma_i . \tag{6.1}
$$

When J > 0, the preferred (i.e. lowest energy) configuration of neighboring spins is that they are aligned,
i.e. σi σj = +1. The interaction is then called ferromagnetic. When J < 0 the preference is for anti-
alignment, i.e. σi σj = −1, which is antiferromagnetic.
This model is not exactly solvable in general. In one dimension, the solution is quite straightforward.
In two dimensions, Onsager’s solution of the model (with H = 0) is among the most celebrated results
in statistical physics. In higher dimensions the system has been studied by numerical simulations (the
Monte Carlo method) and by field theoretic calculations (renormalization group), but no exact solutions
exist.

6.2.2 Ising model in one dimension

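Consider a one-dimensional ring of $N$ sites. The ordinary canonical partition function is then
$$
\begin{aligned}
Z_{\rm ring} &= \mathrm{Tr}\, e^{-\beta \hat H} = \sum_{\{\sigma_n\}} \prod_{n=1}^N e^{\beta J \sigma_n \sigma_{n+1}}\, e^{\beta \mu_0 H \sigma_n} \\
&= \mathrm{Tr}\, R^N ,
\end{aligned} \tag{6.2}
$$
where $\sigma_{N+1} \equiv \sigma_1$ owing to periodic (ring) boundary conditions, and where $R$ is a $2 \times 2$ transfer matrix,
$$
\begin{aligned}
R_{\sigma\sigma'} &= e^{\beta J \sigma\sigma'}\, e^{\beta \mu_0 H (\sigma + \sigma')/2} \\
&= \begin{pmatrix} e^{\beta J}\, e^{\beta \mu_0 H} & e^{-\beta J} \\ e^{-\beta J} & e^{\beta J}\, e^{-\beta \mu_0 H} \end{pmatrix} \\
&= e^{\beta J} \cosh(\beta \mu_0 H) + e^{\beta J} \sinh(\beta \mu_0 H)\, \tau^z + e^{-\beta J}\, \tau^x ,
\end{aligned} \tag{6.3}
$$
where $\tau^\alpha$ are the Pauli matrices. Since the trace of a matrix is invariant under a similarity transformation, we have
$$
Z(T,H,N) = \lambda_+^N + \lambda_-^N , \tag{6.4}
$$
where
$$
\lambda_\pm(T,H) = e^{\beta J} \cosh(\beta \mu_0 H) \pm \sqrt{e^{2\beta J} \sinh^2(\beta \mu_0 H) + e^{-2\beta J}} \tag{6.5}
$$
are the eigenvalues of $R$. In the thermodynamic limit, $N \to \infty$, and the $\lambda_+^N$ term dominates exponentially. We therefore have
$$
F(T,H,N) = -N k_B T \ln \lambda_+(T,H) . \tag{6.6}
$$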
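From the free energy, we can compute the magnetization,
$$
M = -\bigg(\frac{\partial F}{\partial H}\bigg)_{\!T,N} = \frac{N \mu_0 \sinh(\beta \mu_0 H)}{\sqrt{\sinh^2(\beta \mu_0 H) + e^{-4\beta J}}} \tag{6.7}
$$
and the zero field isothermal susceptibility,
$$
\chi(T) = \frac{1}{N} \frac{\partial M}{\partial H} \bigg|_{H=0} = \frac{\mu_0^2}{k_B T}\, e^{2J/k_B T} . \tag{6.8}
$$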

Note that in the noninteracting limit J → 0 we recover the familiar result for a free spin. The effect of the
interactions at low temperature is to vastly increase the susceptibility. Rather than a set of independent
single spins, the system effectively behaves as if it were composed of large blocks of spins, where the block
size ξ is the correlation length, to be derived below.
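The transfer matrix results in eqns. 6.4 through 6.8 are easy to test on a small ring. The minimal Python sketch below sets $k_B = \mu_0 = 1$ and writes $h \equiv \mu_0 H$ (conventions adopted only for this example). It compares $\lambda_+^N + \lambda_-^N$ with a brute-force sum over all $2^N$ configurations, compares a finite-difference derivative of $\ln \lambda_+$ with eqn. 6.7, and checks the zero-field slope against eqn. 6.8. The test parameters are arbitrary.

```python
import itertools
import math

def eigenvalues(beta, J, h):
    """lambda_+ and lambda_- of the 2x2 transfer matrix, eqn. 6.5 (h = mu0*H)."""
    a = math.exp(beta * J) * math.cosh(beta * h)
    b = math.sqrt(math.exp(2 * beta * J) * math.sinh(beta * h) ** 2
                  + math.exp(-2 * beta * J))
    return a + b, a - b

def z_ring_transfer(beta, J, h, N):
    """Z_ring = lambda_+^N + lambda_-^N, eqn. 6.4."""
    lam_p, lam_m = eigenvalues(beta, J, h)
    return lam_p**N + lam_m**N

def z_ring_brute(beta, J, h, N):
    """Direct sum of exp(-beta*H) over all 2^N configurations of the ring."""
    total = 0.0
    for s in itertools.product((1, -1), repeat=N):
        bonds = sum(s[i] * s[(i + 1) % N] for i in range(N))
        total += math.exp(beta * (J * bonds + h * sum(s)))
    return total

def m_formula(beta, J, h):
    """Magnetization per site M/N from eqn. 6.7, with mu0 = 1."""
    sh = math.sinh(beta * h)
    return sh / math.sqrt(sh**2 + math.exp(-4 * beta * J))

def m_from_free_energy(beta, J, h, dh=1e-5):
    """M/N = (1/beta) d(ln lambda_+)/dh, by a central difference of eqn. 6.6."""
    up = math.log(eigenvalues(beta, J, h + dh)[0])
    down = math.log(eigenvalues(beta, J, h - dh)[0])
    return (up - down) / (2 * dh * beta)

beta, J = 0.7, 1.0
print(z_ring_transfer(beta, J, 0.3, 10), z_ring_brute(beta, J, 0.3, 10))
print(m_formula(beta, J, 0.3), m_from_free_energy(beta, J, 0.3))
chi_numeric = m_formula(beta, J, 1e-6) / 1e-6      # slope of M/N at H = 0
print(chi_numeric, beta * math.exp(2 * beta * J))  # eqn. 6.8 with mu0 = kB = 1
```

Since the brute-force sum has $2^N$ terms it is only practical for small rings; the transfer matrix reduces the work to a $2 \times 2$ eigenvalue problem for any $N$.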
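The physical properties of the system are often elucidated by evaluation of various correlation functions. In this case, we define
$$
\begin{aligned}
C(n) \equiv \big\langle \sigma_1\, \sigma_{n+1} \big\rangle &= \frac{\mathrm{Tr}\big( \sigma_1\, R_{\sigma_1 \sigma_2} \cdots R_{\sigma_n \sigma_{n+1}}\, \sigma_{n+1}\, R_{\sigma_{n+1} \sigma_{n+2}} \cdots R_{\sigma_N \sigma_1} \big)}{\mathrm{Tr}\big( R^N \big)} \\
&= \frac{\mathrm{Tr}\big( \Sigma\, R^n\, \Sigma\, R^{N-n} \big)}{\mathrm{Tr}\big( R^N \big)} ,
\end{aligned} \tag{6.9}
$$
where $0 < n < N$, and where
$$
\Sigma = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} . \tag{6.10}
$$
To compute this ratio, we decompose $R$ in terms of its eigenvectors, writing
$$
R = \lambda_+ |+\rangle\langle +| + \lambda_- |-\rangle\langle -| . \tag{6.11}
$$
Then
$$
C(n) = \frac{\lambda_+^N\, \Sigma_{++}^2 + \lambda_-^N\, \Sigma_{--}^2 + \big( \lambda_+^{N-n} \lambda_-^n + \lambda_+^n \lambda_-^{N-n} \big)\, \Sigma_{+-}\, \Sigma_{-+}}{\lambda_+^N + \lambda_-^N} , \tag{6.12}
$$
where
$$
\Sigma_{\mu\mu'} = \langle \mu \,|\, \Sigma \,|\, \mu' \rangle . \tag{6.13}
$$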

6.2.3 Zero external field

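Consider the case $H = 0$, where $R = e^{\beta J} + e^{-\beta J} \tau^x$, with $\tau^x$ the Pauli matrix. Then
$$
|\pm\rangle = \tfrac{1}{\sqrt{2}} \big( |\!\uparrow\rangle \pm |\!\downarrow\rangle \big) , \tag{6.14}
$$
i.e. the eigenvectors of $R$ are
$$
\psi_\pm = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ \pm 1 \end{pmatrix} , \tag{6.15}
$$
and $\Sigma_{++} = \Sigma_{--} = 0$, while $\Sigma_{+-} = \Sigma_{-+} = 1$. The corresponding eigenvalues are
$$
\lambda_+ = 2\cosh(\beta J) , \qquad \lambda_- = 2\sinh(\beta J) . \tag{6.16}
$$
The correlation function is then found to be
$$
\begin{aligned}
C(n) \equiv \big\langle \sigma_1\, \sigma_{n+1} \big\rangle &= \frac{\lambda_+^{N-|n|} \lambda_-^{|n|} + \lambda_+^{|n|} \lambda_-^{N-|n|}}{\lambda_+^N + \lambda_-^N} \\
&= \frac{\tanh^{|n|}(\beta J) + \tanh^{N-|n|}(\beta J)}{1 + \tanh^N(\beta J)} \\
&\approx \tanh^{|n|}(\beta J) \qquad (N \to \infty) .
\end{aligned} \tag{6.17}
$$
This result is also valid for $n < 0$, provided $|n| \le N$. We see that we may write
$$
C(n) = e^{-|n|/\xi(T)} , \tag{6.18}
$$
where the correlation length is
$$
\xi(T) = \frac{1}{\ln \mathrm{ctnh}(J/k_B T)} . \tag{6.19}
$$
Note that $\xi(T)$ grows as $T \to 0$ as $\xi \approx \frac{1}{2}\, e^{2J/k_B T}$.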
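Eqns. 6.17 and 6.19 can likewise be checked by exact enumeration on a small ring. In the minimal Python sketch below, $K \equiv \beta J$, $H = 0$, and the ring size and coupling are arbitrary test values.

```python
import itertools
import math

def corr_brute(K, N, n):
    """<sigma_1 sigma_{n+1}> on an N-site ring at H = 0 by exact enumeration; K = beta*J."""
    num = den = 0.0
    for s in itertools.product((1, -1), repeat=N):
        w = math.exp(K * sum(s[i] * s[(i + 1) % N] for i in range(N)))
        num += s[0] * s[n] * w
        den += w
    return num / den

def corr_exact(K, N, n):
    """Finite-ring result, second line of eqn. 6.17."""
    t = math.tanh(K)
    return (t ** abs(n) + t ** (N - abs(n))) / (1 + t**N)

def correlation_length(K):
    """xi = 1 / ln coth(K), eqn. 6.19."""
    return 1.0 / math.log(1.0 / math.tanh(K))

for n in range(1, 5):
    print(n, corr_brute(0.6, 10, n), corr_exact(0.6, 10, n))
print(correlation_length(5.0), 0.5 * math.exp(10.0))   # low temperature asymptote
```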

6.2.4 Chain with free ends

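When the chain has free ends, there are $(N-1)$ links, and the partition function is
$$
\begin{aligned}
Z_{\rm chain} &= \sum_{\sigma,\sigma'} \big( R^{N-1} \big)_{\sigma\sigma'} \\
&= \sum_{\sigma,\sigma'} \Big\{ \lambda_+^{N-1}\, \psi_+(\sigma)\, \psi_+(\sigma') + \lambda_-^{N-1}\, \psi_-(\sigma)\, \psi_-(\sigma') \Big\} ,
\end{aligned} \tag{6.20}
$$
where $\psi_\pm(\sigma) = \langle \sigma \,|\, \pm \rangle$. When $H = 0$, we make use of eqn. 6.15 to obtain
$$
R^{N-1} = \frac{(2\cosh\beta J)^{N-1}}{2} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} + \frac{(2\sinh\beta J)^{N-1}}{2} \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} , \tag{6.21}
$$
and therefore
$$
Z_{\rm chain} = 2^N \cosh^{N-1}(\beta J) . \tag{6.22}
$$
There's a nifty trick to obtaining the partition function for the Ising chain which amounts to a change of variables. We define
$$
\nu_n \equiv \sigma_n\, \sigma_{n+1} \qquad (n = 1, \ldots, N-1) . \tag{6.23}
$$
Thus, $\nu_1 = \sigma_1 \sigma_2$, $\nu_2 = \sigma_2 \sigma_3$, etc. Note that each $\nu_j$ takes the values $\pm 1$. The Hamiltonian for the chain is
$$
H_{\rm chain} = -J \sum_{n=1}^{N-1} \sigma_n\, \sigma_{n+1} = -J \sum_{n=1}^{N-1} \nu_n . \tag{6.24}
$$
The state of the system is defined by the $N$ Ising variables $\{\sigma_1, \nu_1, \ldots, \nu_{N-1}\}$. Note that $\sigma_1$ doesn't appear in the Hamiltonian. Thus, the interacting model is recast as $N-1$ noninteracting Ising spins, and the partition function is
$$
\begin{aligned}
Z_{\rm chain} &= \mathrm{Tr}\, e^{-\beta H_{\rm chain}} = \sum_{\sigma_1} \sum_{\nu_1} \cdots \sum_{\nu_{N-1}} e^{\beta J \nu_1}\, e^{\beta J \nu_2} \cdots e^{\beta J \nu_{N-1}} \\
&= \sum_{\sigma_1} \bigg( \sum_\nu e^{\beta J \nu} \bigg)^{\!N-1} = 2^N \cosh^{N-1}(\beta J) .
\end{aligned} \tag{6.25}
$$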
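A brute-force check of eqns. 6.22 and 6.25, as a minimal Python sketch with $K \equiv \beta J$ and arbitrary test values. The second part confirms that the change of variables in eqn. 6.23 is one-to-one on the $2^N$ spin configurations, which is what allows the trace to be taken over $\{\sigma_1, \nu_1, \ldots, \nu_{N-1}\}$ instead.

```python
import itertools
import math

def z_chain_brute(K, N):
    """Open chain of N spins and N-1 bonds at H = 0, with K = beta*J."""
    return sum(
        math.exp(K * sum(s[i] * s[i + 1] for i in range(N - 1)))
        for s in itertools.product((1, -1), repeat=N)
    )

def to_bond_variables(s):
    """(sigma_1, ..., sigma_N) -> (sigma_1, nu_1, ..., nu_{N-1}), nu_n = sigma_n sigma_{n+1}."""
    return (s[0],) + tuple(s[i] * s[i + 1] for i in range(len(s) - 1))

K, N = 0.8, 9
print(z_chain_brute(K, N), 2**N * math.cosh(K) ** (N - 1))   # eqn. 6.22

# The map of eqn. 6.23 sends distinct spin configurations to distinct tuples.
configs = list(itertools.product((1, -1), repeat=6))
print(len(set(map(to_bond_variables, configs))) == len(configs))
```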

6.2.5 Ising model in two dimensions : Peierls’ argument

We have just seen how in one dimension, the Ising model never achieves long-ranged spin order. That is, the spin-spin correlation function decays asymptotically as an exponential function of the distance with a correlation length $\xi(T)$ which is finite for all $T > 0$. Only for $T = 0$ does the correlation length diverge. At $T = 0$, there are two ground states, $|\!\uparrow\uparrow\uparrow\uparrow \cdots \uparrow\rangle$ and $|\!\downarrow\downarrow\downarrow\downarrow \cdots \downarrow\rangle$. To choose between these ground states, we can specify a boundary condition at the ends of our one-dimensional chain, where we demand that the spins are up. Equivalently, we can apply a magnetic field $H$ of order $1/N$, which vanishes in the thermodynamic limit, but which at zero temperature will select the ‘all up’ ground state. At finite temperature, there is always a finite probability for any consecutive pair of sites $(n, n+1)$ to be in a high energy state, i.e. either $|\!\uparrow\downarrow\rangle$ or $|\!\downarrow\uparrow\rangle$. Such a configuration is called a domain wall, and in one-dimensional systems domain walls live on individual links. Relative to the configurations $|\!\uparrow\uparrow\rangle$ and $|\!\downarrow\downarrow\rangle$, a domain wall costs energy $2J$. For a system with $M = xN$ domain walls, the free energy is
$$
\begin{aligned}
F &= 2MJ - k_B T \ln \binom{N}{M} \\
&\simeq N \cdot \Big\{ 2Jx + k_B T \big[ x \ln x + (1-x) \ln(1-x) \big] \Big\} .
\end{aligned} \tag{6.26}
$$
Minimizing the free energy with respect to $x$, one finds $x = 1\big/\big(e^{2J/k_B T} + 1\big)$, so the equilibrium concentration of domain walls is finite, meaning there can be no long-ranged spin order. In one dimension, entropy wins and there is always a thermodynamically large number of domain walls in equilibrium. And since the correlation length for $T > 0$ is finite, any boundary conditions imposed at spatial infinity will have no thermodynamic consequences since they will only be ‘felt’ over a finite range.
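Minimizing eqn. 6.26 numerically reproduces the domain wall concentration quoted above. The minimal Python sketch below applies a golden-section search (our choice of minimizer, not something specified in the notes) to $F/N$ with $k_B = 1$, restricting the search to $0 < x \le \frac{1}{2}$ since the minimum lies there for $J > 0$, and compares the result with $x = 1/(e^{2J/k_B T} + 1)$.

```python
import math

def free_energy_per_site(x, J, T):
    """F/N from eqn. 6.26 with k_B = 1; x is the fraction of links carrying a domain wall."""
    return 2 * J * x + T * (x * math.log(x) + (1 - x) * math.log(1 - x))

def golden_section_min(f, lo, hi, iters=100):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    g = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        c = hi - g * (hi - lo)
        d = lo + g * (hi - lo)
        if f(c) < f(d):
            hi = d
        else:
            lo = c
    return (lo + hi) / 2

J = 1.0
for T in (0.5, 1.0, 3.0):
    x_num = golden_section_min(lambda x: free_energy_per_site(x, J, T), 1e-12, 0.5)
    x_exact = 1 / (math.exp(2 * J / T) + 1)
    print(T, x_num, x_exact)
```

Because $F/N$ is convex in $x$, the golden-section search is guaranteed to converge to the unique minimum.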
As we shall discuss in the following chapter, this consideration is true for any system with sufficiently short-ranged interactions and a discrete global symmetry. Another example is the $q$-state Potts model,
$$
\hat H = -J \sum_{\langle ij \rangle} \delta_{\sigma_i, \sigma_j} - h \sum_i \delta_{\sigma_i, 1} . \tag{6.27}
$$
Figure 6.1: Clusters and boundaries for the square lattice Ising model. Left panel: a configuration $\Gamma$ where the central spin is up. Right panel: a configuration $C_\gamma \circ \Gamma$ where the interior spins of a new loop $\gamma$ containing the central spin have been flipped.

Here, the spin variables σi take values in the set {1, 2, . . . , q} on each site. The equivalent of an external
magnetic field in the Ising case is a field h which prefers a particular value of σ (σ = 1 in the above
Hamiltonian). See the appendix in §6.8 for a transfer matrix solution of the one-dimensional Potts
model.
What about higher dimensions? A nifty argument due to R. Peierls shows that there will be a finite temperature phase transition for the Ising model on the square lattice¹. Consider the Ising model, in zero magnetic field, on an $N_x \times N_y$ square lattice, with $N_{x,y} \to \infty$ in the thermodynamic limit. Along the perimeter of the system we impose the boundary condition $\sigma_i = +1$. Any configuration of the spins may then be represented uniquely in the following manner. Start with a configuration in which all spins are up. Next, draw a set of closed loops on the lattice. By definition, the loops cannot share any links along their boundaries, i.e. each link on the lattice is associated with at most one such loop. Now flip all the spins inside each loop from up to down. Identify each such loop configuration with a label $\Gamma$. The partition function is
$$
Z = \mathrm{Tr}\, e^{-\beta \hat H} = \sum_\Gamma e^{-2\beta J L_\Gamma} , \tag{6.28}
$$
where $L_\Gamma$ is the total perimeter of the loop configuration $\Gamma$. The domain walls are now loops, rather than individual links, but as in the one-dimensional case, each link of each domain wall contributes an energy $+2J$ relative to the ground state.

¹ Here we modify slightly the discussion in chapter 5 of the book by L. Peliti.

Now we wish to compute the average magnetization of the central site (assume $N_x$ and $N_y$ are both odd, so there is a unique central site). This is given by the difference $P_+(0) - P_-(0)$, where $P_\mu(0) = \langle \delta_{\sigma_0, \mu} \rangle$ is the probability that the central spin has spin polarization $\mu$. If $P_+(0) > P_-(0)$, then the magnetization per site $m = P_+(0) - P_-(0)$ is finite in the thermodynamic limit, and the system is ordered. Clearly
$$
P_+(0) = \frac{1}{Z} \sum_{\Gamma \in \Sigma_+} e^{-2\beta J L_\Gamma} , \tag{6.29}
$$
where the restriction on the sum indicates that only those configurations where the central spin is up ($\sigma_0 = +1$) are to be included (see fig. 6.1a). Similarly,
$$
P_-(0) = \frac{1}{Z} \sum_{\tilde\Gamma \in \Sigma_-} e^{-2\beta J L_{\tilde\Gamma}} , \tag{6.30}
$$
where only configurations in which $\sigma_0 = -1$ are included in the sum. Here we have defined
$$
\Sigma_\pm = \Big\{ \Gamma \;\Big|\; \sigma_0 = \pm \Big\} . \tag{6.31}
$$
I.e. $\Sigma_+$ ($\Sigma_-$) is the set of configurations $\Gamma$ in which the central spin is always up (down). Consider now the construction in fig. 6.1b. Any loop configuration $\tilde\Gamma \in \Sigma_-$ may be associated with a unique loop configuration $\Gamma \in \Sigma_+$ by reversing all the spins within the loop of $\tilde\Gamma$ which contains the origin. Note that the map from $\tilde\Gamma$ to $\Gamma$ is many-to-one. That is, we can write $\tilde\Gamma = C_\gamma \circ \Gamma$, where $C_\gamma$ overturns the spins within the loop $\gamma$, with the conditions that (i) $\gamma$ contains the origin, and (ii) none of the links in the perimeter of $\gamma$ coincide with any of the links from the constituent loops of $\Gamma$. Let us denote this set of loops as $\Upsilon_\Gamma$:
$$
\Upsilon_\Gamma = \Big\{ \gamma : 0 \in \mathrm{int}(\gamma) \ \text{and}\ \gamma \cap \Gamma = \emptyset \Big\} . \tag{6.32}
$$
Then
$$
m = P_+(0) - P_-(0) = \frac{1}{Z} \sum_{\Gamma \in \Sigma_+} e^{-2\beta J L_\Gamma} \bigg( 1 - \sum_{\gamma \in \Upsilon_\Gamma} e^{-2\beta J L_\gamma} \bigg) . \tag{6.33}
$$

If we can prove that $\sum_{\gamma \in \Upsilon_\Gamma} e^{-2\beta J L_\gamma} < 1$, then we will have established that $m > 0$. Let us ask: how many loops $\gamma$ are there in $\Upsilon_\Gamma$ with perimeter $L$? We cannot answer this question exactly, but we can derive a rigorous upper bound for this number, which, following Peliti, we call $g(L)$. We claim that
$$
g(L) < \frac{2}{3L} \cdot 3^L \cdot \bigg(\frac{L}{4}\bigg)^{\!2} = \frac{L}{24} \cdot 3^L . \tag{6.34}
$$
To establish this bound, consider any site on such a loop $\gamma$. Initially we have 4 possible directions to proceed to the next site, but thereafter there are only 3 possibilities for each subsequent step, since the loop cannot run into itself. This gives $4 \cdot 3^{L-1}$ possibilities. But we are clearly overcounting, since any point on the loop could have been chosen as the initial point, and moreover we could have started by proceeding either clockwise or counterclockwise. So we are justified in dividing this by $2L$. We are still overcounting, because we have not accounted for the constraint that $\gamma$ is a closed loop, nor that $\gamma \cap \Gamma = \emptyset$. We won't bother trying to improve our estimate to account for these constraints. However, we are clearly undercounting due to the fact that a given loop can be translated in space so long as the origin remains within it. To account for this, we multiply by the area of a square of side length $L/4$, which is the maximum area that can be enclosed by a loop of perimeter $L$. We therefore arrive at eqn. 6.34.

$$
\begin{array}{ccccccccc}
-32 & -33 & -34 & -35 & & & & & \\
-31 & 18 & 19 & 20 & 21 & 22 & 23 & 24 & 25 \\
-30 & 17 & -8 & -9 & -10 & -11 & -12 & -13 & 26 \\
-29 & 16 & -7 & 2 & 3 & 4 & 5 & -14 & 27 \\
-28 & 15 & -6 & 1 & 0 & -1 & 6 & -15 & 28 \\
-27 & 14 & -5 & -4 & -3 & -2 & 7 & -16 & 29 \\
-26 & 13 & 12 & 11 & 10 & 9 & 8 & -17 & 30 \\
-25 & -24 & -23 & -22 & -21 & -20 & -19 & -18 & 31 \\
 & & & & & 35 & 34 & 33 & 32
\end{array}
$$
Figure 6.2: A two-dimensional square lattice mapped onto a one-dimensional chain.

Finally, we note that the smallest possible value of $L$ is $L = 4$, corresponding to a square enclosing the central site alone. Therefore
$$
\sum_{\gamma \in \Upsilon_\Gamma} e^{-2\beta J L_\gamma} < \frac{1}{12} \sum_{k=2}^\infty k \cdot \big(3\, e^{-2\beta J}\big)^{2k} = \frac{x^4\, (2 - x^2)}{12\, (1 - x^2)^2} \equiv r , \tag{6.35}
$$
where $x = 3\, e^{-2\beta J}$. Note that we have accounted for the fact that the perimeter $L$ of each loop $\gamma$ must be an even integer. The sum is smaller than unity provided $x < x_0 = 0.869756\ldots$, hence the system is ordered provided
$$
\frac{k_B T}{J} < \frac{2}{\ln(3/x_0)} = 1.61531 . \tag{6.36}
$$
The exact result is $k_B T_c = 2J/\sinh^{-1}(1) = 2.26918\ldots\, J$. The Peierls argument has been generalized to higher dimensional lattices as well².
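With a little more work we can derive a bound for the magnetization. We have shown that
$$
P_-(0) = \frac{1}{Z} \sum_{\Gamma \in \Sigma_+} e^{-2\beta J L_\Gamma} \sum_{\gamma \in \Upsilon_\Gamma} e^{-2\beta J L_\gamma} < r \cdot \frac{1}{Z} \sum_{\Gamma \in \Sigma_+} e^{-2\beta J L_\Gamma} = r\, P_+(0) . \tag{6.37}
$$
Thus,
$$
1 = P_+(0) + P_-(0) < (1 + r)\, P_+(0) \tag{6.38}
$$
and therefore
$$
m = P_+(0) - P_-(0) > (1 - r)\, P_+(0) > \frac{1 - r}{1 + r} , \tag{6.39}
$$
where $r(T)$ is given in eqn. 6.35.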
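The numbers in eqns. 6.35, 6.36 and 6.39 can be reproduced with a few lines of Python. The minimal sketch below solves $r(x_0) = 1$ by bisection (using the fact that $r$ increases monotonically on $0 < x < 1$), converts $x_0$ into the bound on $k_B T/J$, and evaluates the magnetization bound $(1-r)/(1+r)$. The function names are ours, and the bound is reported as `None` wherever the Peierls estimate says nothing.

```python
import math

def peierls_r(x):
    """r = x^4 (2 - x^2) / (12 (1 - x^2)^2) from eqn. 6.35, valid for 0 <= x < 1."""
    return x**4 * (2 - x**2) / (12 * (1 - x**2) ** 2)

def solve_x0(tol=1e-14):
    """Bisection for r(x0) = 1; r increases monotonically on 0 < x < 1."""
    lo, hi = 0.0, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if peierls_r(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def magnetization_lower_bound(kT_over_J):
    """(1 - r)/(1 + r) from eqn. 6.39, or None where the Peierls bound is not useful."""
    x = 3.0 * math.exp(-2.0 / kT_over_J)   # x = 3 exp(-2 beta J)
    if x >= 1.0:
        return None                         # the sum in eqn. 6.35 diverges
    r = peierls_r(x)
    return (1 - r) / (1 + r) if r < 1.0 else None

x0 = solve_x0()
print("x0 =", x0)
print("Peierls bound on kT/J:", 2 / math.log(3 / x0))
print("exact kTc/J:", 2 / math.asinh(1))
for kT in (0.5, 1.0, 1.5, 1.7):
    print(kT, magnetization_lower_bound(kT))
```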
² See, e.g., J. L. Lebowitz and A. E. Mazel, J. Stat. Phys. 90, 1051 (1998).

6.2.6 Two dimensions or one?

We showed that the one-dimensional Ising model has no finite temperature phase transition, and is disordered at any finite temperature $T$, but in two dimensions on the square lattice there is a finite critical temperature $T_c$ below which there is long-ranged order. Consider now the construction depicted in fig. 6.2, where the sites of a two-dimensional square lattice are mapped onto those of a linear chain³. Clearly we can construct a one-to-one mapping between the sites of a two-dimensional square lattice and those of a one-dimensional chain. That is, the two-dimensional square lattice Ising model may be written as a one-dimensional Ising model, i.e.
$$
\hat H = -J \sum_{\langle ij \rangle}^{\text{square lattice}} \sigma_i\, \sigma_j = -\sum_{n,n'}^{\text{linear chain}} J_{nn'}\, \sigma_n\, \sigma_{n'} . \tag{6.40}
$$

How can this be consistent with the results we have just proven?
The fly in the ointment here is that the interaction along the chain $J_{n,n'}$ is long-ranged. This is apparent from inspecting the site labels in fig. 6.2. Note that site $n = 15$ is linked to sites $n' = 14$ and $n' = 16$, but also to sites $n' = -6$ and $n' = -28$. With each turn of the concentric spirals in the figure, the range of the interaction increases. To complicate matters further, the interactions are no longer translationally invariant, i.e. $J_{nn'} \ne J(n - n')$. But it is the long-ranged nature of the interactions on our contrived one-dimensional chain which spoils our previous energy-entropy argument, because now the domain walls themselves interact via a long-ranged potential. Consider for example the linear chain with $J_{n,n'} = J\, |n - n'|^{-\alpha}$, where $\alpha > 0$. Let us compute the energy of a domain wall configuration where $\sigma_n = +1$ if $n > 0$ and $\sigma_n = -1$ if $n \le 0$. The domain wall energy is then
$$
\Delta = \sum_{m=0}^\infty \sum_{n=1}^\infty \frac{2J}{|m + n|^\alpha} . \tag{6.41}
$$

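Here we have written one of the sums in terms of $m = -n'$. For asymptotically large $m$ and $n$, we can write $\mathbf{R} = (m, n)$ and we obtain an integral over the upper right quadrant of the plane:
$$
\int_1^\infty \! dR\, R \int_0^{\pi/2} \! d\phi\, \frac{2J}{R^\alpha\, (\cos\phi + \sin\phi)^\alpha} = 2^{-\alpha/2} \cdot 2J \int_{-\pi/4}^{\pi/4} \! \frac{d\phi}{\cos^\alpha \phi} \int_1^\infty \! \frac{dR}{R^{\alpha - 1}} . \tag{6.42}
$$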

The $\phi$ integral is convergent, but the $R$ integral diverges for $\alpha \le 2$. For a finite system, the upper bound on the $R$ integral becomes the system size $L$. For $\alpha > 2$ the domain wall energy is finite in the thermodynamic limit $L \to \infty$. In this case, entropy again wins: the entropy associated with a single domain wall is $k_B \ln L$, and therefore the free energy $F = E - TS \approx \Delta - k_B T \ln L$ is always lowered by having a finite density of domain walls. For $\alpha < 2$, the energy of a single domain wall scales as $L^{2-\alpha}$. It was first proven by F. J. Dyson in 1969 that this model has a finite temperature phase transition provided $1 < \alpha < 2$. There is no transition for $\alpha < 1$ or $\alpha > 2$. The case $\alpha = 2$ is special, and is treated in the beautiful renormalization group analysis by J. M. Kosterlitz in Phys. Rev. Lett. 37, 1577 (1976).
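The contrast between $\alpha > 2$ and $\alpha < 2$ is easy to see numerically. Restricting eqn. 6.41 to a finite chain ($0 \le m \le L-1$, $1 \le n \le L$, a truncation chosen here for illustration) and grouping pairs by $s = m + n$, there are $\min(s, 2L - s)$ pairs at each $s$. In the infinite system each $s$ occurs $s$ times, so $\Delta = 2J\, \zeta(\alpha - 1)$ is finite for $\alpha > 2$, while for $\alpha < 2$ the minimal Python sketch below should show growth roughly like $L^{2-\alpha}$.

```python
import math

def domain_wall_energy(alpha, L, J=1.0):
    """Eqn. 6.41 truncated to 0 <= m <= L-1, 1 <= n <= L.

    The number of pairs (m, n) with m + n = s is min(s, 2L - s) for 1 <= s <= 2L - 1.
    """
    return sum(2 * J * min(s, 2 * L - s) / s**alpha for s in range(1, 2 * L))

# alpha = 3: converges to 2J zeta(2) = J pi^2 / 3
print(domain_wall_energy(3.0, 100_000), math.pi**2 / 3)

# alpha = 1.5: quadrupling L should roughly double Delta, since 4^(2 - alpha) = 2
for L in (1000, 4000, 16000):
    print(L, domain_wall_energy(1.5, L))
```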
³ A corresponding mapping can be found between a cubic lattice and the linear chain as well.

6.2.7 High temperature expansion

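Consider once again the ferromagnetic Ising model in zero field ($H = 0$), but on an arbitrary lattice. The partition function is
$$
Z = \mathrm{Tr}\, e^{\beta J \sum_{\langle ij \rangle} \sigma_i \sigma_j} = \big(\cosh\beta J\big)^{N_L}\, \mathrm{Tr} \bigg\{ \prod_{\langle ij \rangle} \big(1 + x\, \sigma_i\, \sigma_j\big) \bigg\} , \tag{6.43}
$$
where $x = \tanh\beta J$ and $N_L$ is the number of links. For regular lattices, $N_L = \frac{1}{2} z N$, where $N$ is the number of lattice sites and $z$ is the lattice coordination number, i.e. the number of nearest neighbors for each site. We have used
$$
e^{\beta J \sigma\sigma'} = \cosh\beta J \cdot \big\{ 1 + \sigma\sigma' \tanh\beta J \big\} = \begin{cases} e^{+\beta J} & \text{if } \sigma\sigma' = +1 \\ e^{-\beta J} & \text{if } \sigma\sigma' = -1 . \end{cases} \tag{6.44}
$$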

We expand eqn. 6.43 in powers of $x$, resulting in a sum of $2^{N_L}$ terms, each of which can be represented graphically in terms of so-called lattice animals. A lattice animal is a distinct (including reflections and rotations) arrangement of adjacent plaquettes on a lattice. In order that the trace not vanish, only such configurations and their compositions are permitted. This is because each $\sigma_i$ for every given site $i$ must occur an even number of times in order for a given term in the sum not to vanish. For all such terms, the trace is $2^N$. Let $\Gamma$ represent a collection of lattice animals, and $g_\Gamma$ the multiplicity of $\Gamma$. Then
$$
Z = 2^N \big(\cosh\beta J\big)^{N_L} \sum_\Gamma g_\Gamma \big(\tanh\beta J\big)^{L_\Gamma} , \tag{6.45}
$$
where $L_\Gamma$ is the total number of links in the diagram $\Gamma$. Since $x$ vanishes as $T \to \infty$, this procedure is known as the high temperature expansion (HTE).
For the square lattice, the enumeration of all lattice animals up to order eight is given in fig. 6.3. For the diagram represented as a single elementary plaquette, there are $N$ possible locations for the lower left vertex. For the $2 \times 1$ plaquette animal, one has $g = 2N$, because there are two inequivalent orientations as well as $N$ translations. For two disjoint elementary squares, one has $g = \frac{1}{2} N (N - 5)$, which arises from subtracting $5N$ ‘illegal’ configurations involving double lines (remember each link in the partition sum appears only once!), shown in the figure, and finally dividing by two because the individual squares are identical. Note that $N(N - 5)$ is always even for any integer value of $N$. Thus, to lowest interesting order on the square lattice,
$$
Z = 2^N \big(\cosh\beta J\big)^{2N} \Big\{ 1 + N x^4 + 2 N x^6 + \big(7 - \tfrac{5}{2}\big) N x^8 + \tfrac{1}{2} N^2 x^8 + \mathcal{O}(x^{10}) \Big\} . \tag{6.46}
$$

The free energy is therefore
$$
\begin{aligned}
F &= -N k_B T \ln 2 + N k_B T \ln(1 - x^2) - N k_B T \Big[ x^4 + 2 x^6 + \tfrac{9}{2} x^8 + \mathcal{O}(x^{10}) \Big] \\
&= -N k_B T \ln 2 - N k_B T \Big\{ x^2 + \tfrac{3}{2} x^4 + \tfrac{7}{3} x^6 + \tfrac{19}{4} x^8 + \mathcal{O}(x^{10}) \Big\} ,
\end{aligned} \tag{6.47}
$$
again with $x = \tanh\beta J$. Note that we've substituted $\cosh^2\beta J = 1/(1 - x^2)$ to write the final result as a power series in $x$. Notice that the $\mathcal{O}(N^2)$ term in $Z$ has cancelled upon taking the logarithm, so the free energy is properly extensive.
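The cancellation of the $\mathcal{O}(N^2)$ term and the coefficients in eqn. 6.47 can be verified with exact rational arithmetic. The minimal Python sketch below takes the bracket in eqn. 6.46 as input, forms $\ln\{\cdots\}$ through order $x^8$ using $\ln(1 + a) = a - \frac{1}{2} a^2 + \ldots$ (adequate here because $a = \mathcal{O}(x^4)$), adds $-\ln(1 - x^2)$ from the $\cosh^{2N}\beta J$ prefactor, and divides by $N$. The helper names are ours; the output should not depend on $N$.

```python
from fractions import Fraction

DEG = 8  # keep terms through x^8

def series_mul(p, q):
    """Product of two power series in x (lists of coefficients), truncated at x^DEG."""
    out = [Fraction(0)] * (DEG + 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j <= DEG:
                out[i + j] += a * b
    return out

def log1p_series(a):
    """ln(1 + a) through x^DEG, assuming a = O(x^4) so that a^3 = O(x^12) can be dropped."""
    a2 = series_mul(a, a)
    return [ai - a2i / 2 for ai, a2i in zip(a, a2)]

def reduced_free_energy_coeffs(N):
    """Coefficients of -F/(N k_B T) - ln 2 as a power series in x = tanh(beta J)."""
    N = Fraction(N)
    brace = [Fraction(0)] * (DEG + 1)   # the curly bracket in eqn. 6.46, minus 1
    brace[4] = N
    brace[6] = 2 * N
    brace[8] = (7 - Fraction(5, 2)) * N + N**2 / 2
    ln_brace = log1p_series(brace)
    coeffs = [Fraction(0)] * (DEG + 1)  # -ln(1 - x^2) = sum_k x^(2k) / k
    for k in range(1, DEG // 2 + 1):
        coeffs[2 * k] += Fraction(1, k)
    return [c + l / N for c, l in zip(coeffs, ln_brace)]

for N in (10, 1000):
    print(N, [str(c) for c in reduced_free_energy_coeffs(N)])
```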

Figure 6.3: HTE diagrams on the square lattice and their multiplicities.

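Note that the high temperature expansion for the one-dimensional Ising chain yields
$$
Z_{\rm chain}(T,N) = 2^N \cosh^{N-1}\!\beta J , \qquad Z_{\rm ring}(T,N) = 2^N \cosh^N\!\beta J \, \big(1 + \tanh^N\!\beta J\big) , \tag{6.48}
$$
in agreement with the transfer matrix calculations; for the ring, the $\tanh^N\!\beta J$ term comes from the single closed loop that wraps once around the ring. In higher dimensions, where there is a finite temperature phase transition, one typically computes the specific heat $c(T)$ and tries to extract its singular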
behavior in the vicinity of $T_c$, where $c(T) \sim A\, (T - T_c)^{-\alpha}$. Since $x(T) = \tanh(J/k_B T)$ is analytic in $T$, we have $c(x) \sim A' (x - x_c)^{-\alpha}$, where $x_c = x(T_c)$. One assumes $x_c$ is the singularity closest to the origin and corresponds to the radius of convergence of the high temperature expansion. If we write
$$
c(x) = \sum_{n=0}^\infty a_n\, x^n \sim A'' \bigg( 1 - \frac{x}{x_c} \bigg)^{\!-\alpha} , \tag{6.49}
$$
then according to the binomial theorem we should expect
$$
\frac{a_n}{a_{n-1}} = \frac{1}{x_c} \bigg( 1 - \frac{1 - \alpha}{n} \bigg) . \tag{6.50}
$$
Thus, by plotting $a_n/a_{n-1}$ versus $1/n$, one extracts $1/x_c$ as the intercept, and $(\alpha - 1)/x_c$ as the slope.
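To illustrate the ratio method, the minimal Python sketch below fits $a_n/a_{n-1}$ against $1/n$ by least squares for a synthetic series whose coefficients are generated exactly from $(1 - x/x_c)^{-\alpha}$, with $x_c$ and $\alpha$ chosen arbitrarily. For a real high temperature series the ratios are only asymptotically linear in $1/n$, so in practice one would restrict the fit to the largest available $n$.

```python
def synthetic_coeffs(xc, alpha, n_max):
    """Taylor coefficients of (1 - x/xc)^(-alpha), via a_n = a_{n-1} (n + alpha - 1) / (n xc)."""
    a = [1.0]
    for n in range(1, n_max + 1):
        a.append(a[-1] * (n + alpha - 1) / (n * xc))
    return a

def ratio_fit(a, n_min=5):
    """Fit a_n/a_{n-1} = u + v/n by least squares; return (xc, alpha) using eqn. 6.50."""
    pts = [(1.0 / n, a[n] / a[n - 1]) for n in range(n_min, len(a))]
    count = len(pts)
    sx = sum(p for p, _ in pts)
    sy = sum(q for _, q in pts)
    sxx = sum(p * p for p, _ in pts)
    sxy = sum(p * q for p, q in pts)
    slope = (count * sxy - sx * sy) / (count * sxx - sx * sx)
    intercept = (sy - slope * sx) / count
    xc = 1.0 / intercept          # intercept = 1/xc
    alpha = 1.0 + slope * xc      # slope = (alpha - 1)/xc
    return xc, alpha

coeffs = synthetic_coeffs(xc=0.3, alpha=0.125, n_max=40)
print(ratio_fit(coeffs))
```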

Figure 6.4: HTE diagrams for the numerator Ykl of the correlation function Ckl . The blue path connecting
sites k and l is the string. The remaining red paths are all closed loops.

High temperature expansion for correlation functions

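Can we also derive a high temperature expansion for the spin-spin correlation function $C_{kl} = \langle \sigma_k\, \sigma_l \rangle$? Yes we can. We have
$$
C_{kl} = \frac{\mathrm{Tr}\Big[ \sigma_k\, \sigma_l\, e^{\beta J \sum_{\langle ij \rangle} \sigma_i \sigma_j} \Big]}{\mathrm{Tr}\Big[ e^{\beta J \sum_{\langle ij \rangle} \sigma_i \sigma_j} \Big]} \equiv \frac{Y_{kl}}{Z} . \tag{6.51}
$$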
Recall our analysis of the partition function Z. We concluded that in order for the trace not to vanish,
the spin variable σi on each site i must occur an even number of times in the expansion of the product.
Similar considerations hold for Ykl , except now due to the presence of σk and σl , those variables now
must occur an odd number of times when expanding the product. It is clear that the only nonvanishing
diagrams will be those in which there is a finite string connecting sites k and l, in addition to the usual
closed HTE loops. See fig. 6.4 for an instructive sketch. One then expands both Ykl as well as Z in
powers of x = tanh βJ, taking the ratio to obtain the correlator Ckl . At high temperatures (x → 0),
both numerator and denominator are dominated by the configurations Γ with the shortest possible total
perimeter. For Z, this means the trivial path Γ = {∅}, while for Ykl this means finding the shortest
length path from k to l. (If there is no straight line path from k to l, there will in general be several
such minimizing paths.) Note, however, that the presence of the string between sites k and l complicates
the analysis of gΓ for the closed loops, since none of the links of Γ can intersect the string. It is worth
stressing that this does not mean that the string and the closed loops cannot intersect at isolated sites,
but only that they share no common links; see once again fig. 6.4.

6.3 Nonideal Classical Gases

Let's switch gears now and return to the study of continuous classical systems described by a Hamiltonian $\hat H\big(\{\mathbf{x}_i\}, \{\mathbf{p}_i\}\big)$. In the next chapter, we will see how the critical properties of classical fluids can in fact be modeled by an appropriate lattice gas Ising model, and we'll derive methods for describing the liquid-gas phase transition in such a model.

6.3.1 The configuration integral

Consider the ordinary canonical partition function for a nonideal system of identical point particles interacting via a central two-body potential $u(r)$. The $N$-particle partition function is
$$
\begin{aligned}
Z(T,V,N) &= \frac{1}{N!} \int \prod_{i=1}^N \frac{d^d\!p_i\, d^d\!x_i}{h^d}\, e^{-\hat H/k_B T} \\
&= \frac{\lambda_T^{-Nd}}{N!} \int \prod_{i=1}^N d^d\!x_i\, \exp\bigg( -\frac{1}{k_B T} \sum_{i<j} u\big(|\mathbf{x}_i - \mathbf{x}_j|\big) \bigg) .
\end{aligned} \tag{6.52}
$$
Here, we have assumed a many body Hamiltonian of the form
$$
\hat H = \sum_{i=1}^N \frac{\mathbf{p}_i^2}{2m} + \sum_{i<j} u\big(|\mathbf{x}_i - \mathbf{x}_j|\big) , \tag{6.53}
$$
in which massive nonrelativistic particles interact via a two-body central potential. As before, $\lambda_T = \sqrt{2\pi\hbar^2/m k_B T}$ is the thermal wavelength. We can now write
$$
Z(T,V,N) = \lambda_T^{-Nd}\, Q_N(T,V) , \tag{6.54}
$$
where the configuration integral $Q_N(T,V)$ is given by
$$
Q_N(T,V) = \frac{1}{N!} \int \! d^d\!x_1 \cdots \int \! d^d\!x_N \prod_{i<j} e^{-\beta u(r_{ij})} . \tag{6.55}
$$

There are no general methods for evaluating the configurational integral exactly.

6.3.2 One-dimensional Tonks gas

The Tonks gas is a one-dimensional analog of the hard sphere gas. Consider a one-dimensional gas of indistinguishable particles of mass $m$ interacting via the potential
$$
u(x - x') = \begin{cases} \infty & \text{if } |x - x'| < a \\ 0 & \text{if } |x - x'| \ge a . \end{cases} \tag{6.56}
$$
Thus, the Tonks gas may be considered to be a gas of hard rods. The above potential guarantees that the portion of configuration space in which any rods overlap is forbidden in this model⁴. Let the gas be placed in a finite volume $L$. The hard rod nature of the particles means that no particle can get within a distance $\frac{1}{2}a$ of the ends at $x = 0$ and $x = L$. That is, there is a one-body potential $v(x)$ acting as well, where
$$
v(x) = \begin{cases} \infty & \text{if } x < \frac{1}{2}a \\ 0 & \text{if } \frac{1}{2}a \le x \le L - \frac{1}{2}a \\ \infty & \text{if } x > L - \frac{1}{2}a . \end{cases} \tag{6.57}
$$
The configuration integral of the 1D Tonks gas is given by
$$
Q_N(T,L) = \frac{1}{N!} \int_0^L \! dx_1 \cdots \int_0^L \! dx_N\; \chi(x_1, \ldots, x_N) , \tag{6.58}
$$
where $\chi = e^{-U/k_B T}$ is zero if any two ‘rods’ (of length $a$) overlap, or if any rod overlaps with either boundary at $x = 0$ and $x = L$, and $\chi = 1$ otherwise. Note that $\chi$ does not depend on temperature. Without loss of generality, we can integrate over the subspace where $x_1 < x_2 < \cdots < x_N$ and then multiply the result by $N!$. Clearly $x_j$ must lie to the right of $x_{j-1} + a$ and to the left of $Y_j \equiv L - (N - j)\,a - \frac{1}{2}a$. Thus, the configurational integral is
$$
\begin{aligned}
Q_N(T,L) &= \int_{a/2}^{Y_1} \! dx_1 \int_{x_1+a}^{Y_2} \! dx_2 \cdots \int_{x_{N-1}+a}^{Y_N} \! dx_N \\
&= \int_{a/2}^{Y_1} \! dx_1 \int_{x_1+a}^{Y_2} \! dx_2 \cdots \int_{x_{N-2}+a}^{Y_{N-1}} \! dx_{N-1}\, \big(Y_{N-1} - x_{N-1}\big) \\
&= \int_{a/2}^{Y_1} \! dx_1 \int_{x_1+a}^{Y_2} \! dx_2 \cdots \int_{x_{N-3}+a}^{Y_{N-2}} \! dx_{N-2}\, \tfrac{1}{2} \big(Y_{N-2} - x_{N-2}\big)^2 = \cdots \\
&= \frac{1}{N!} \big(Y_1 - \tfrac{1}{2}a\big)^N = \frac{1}{N!} \big(L - Na\big)^N .
\end{aligned} \tag{6.59}
$$
At each step we have used $Y_j - a = Y_{j-1}$.

The partition function is $Z(T,L,N) = \lambda_T^{-N}\, Q_N(T,L)$, and so the free energy is
$$
F = -k_B T \ln Z = -N k_B T \bigg\{ -\ln \lambda_T + 1 + \ln\!\bigg( \frac{L}{N} - a \bigg) \bigg\} , \tag{6.60}
$$
where we have used Stirling's rule to write $\ln N! \approx N \ln N - N$. The pressure is
$$
p = -\bigg(\frac{\partial F}{\partial L}\bigg)_{\!T,N} = \frac{k_B T}{\frac{L}{N} - a} = \frac{n k_B T}{1 - n a} , \tag{6.61}
$$
where $n = N/L$ is the one-dimensional density. Note that the pressure diverges as $n$ approaches $1/a$. The usual one-dimensional ideal gas law, $pL = N k_B T$, is replaced by $p L_{\rm eff} = N k_B T$, where $L_{\rm eff} = L - Na$ is the ‘free’ volume obtained by subtracting the total ‘excluded volume’ $Na$ from the original volume $L$. Note the similarity here to the van der Waals equation of state, $(p + a v^{-2})(v - b) = RT$, where $v = N_A V/N$ is the molar volume (here $a$ and $b$ are the van der Waals constants, not the rod length). Defining $\tilde a \equiv a/N_A^2$ and $\tilde b \equiv b/N_A$, we have
$$
p + \tilde a\, n^2 = \frac{n k_B T}{1 - \tilde b\, n} , \tag{6.62}
$$
where $n = N_A/v$ is the number density. The term involving the constant $\tilde a$ is due to the long-ranged attraction of atoms due to their mutual polarizability. The term involving $\tilde b$ is an excluded volume effect. The Tonks gas models only the latter.

⁴ Not that I personally think there's anything wrong with that.
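Eqns. 6.59 and 6.61 can be checked by simulation. Since $N!\, Q_N = (L - Na)^N$, the fraction of uniformly random configurations in $[0, L]^N$ that respect the hard-rod and wall constraints should be $\big((L - Na)/L\big)^N$. The minimal Python sketch below estimates that fraction by Monte Carlo and also compares a finite-difference derivative of eqn. 6.60 with eqn. 6.61; sample sizes, seeds, and parameter values are arbitrary, and we set $k_B = \lambda_T = 1$.

```python
import math
import random

def tonks_acceptance(N, L, a, samples=200_000, seed=1):
    """Monte Carlo fraction of points in [0, L]^N with no rod-rod or rod-wall overlap."""
    rng = random.Random(seed)
    accepted = 0
    for _ in range(samples):
        xs = sorted(rng.uniform(0.0, L) for _ in range(N))
        if xs[0] < a / 2 or xs[-1] > L - a / 2:
            continue  # a rod overlaps a wall, eqn. 6.57
        if all(xs[i + 1] - xs[i] >= a for i in range(N - 1)):
            accepted += 1
    return accepted / samples

def tonks_free_energy(L, N, T, a):
    """Eqn. 6.60 with k_B = lambda_T = 1."""
    return -N * T * (1 + math.log(L / N - a))

def tonks_pressure(n, T, a):
    """Eqn. 6.61 with k_B = 1."""
    return n * T / (1 - n * a)

N, L, a = 3, 10.0, 1.0
print(tonks_acceptance(N, L, a), ((L - N * a) / L) ** N)

N, L, a, T, h = 100, 250.0, 1.0, 2.0, 1e-4
p_fd = -(tonks_free_energy(L + h, N, T, a) - tonks_free_energy(L - h, N, T, a)) / (2 * h)
print(p_fd, tonks_pressure(N / L, T, a))
```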

6.3.3 Mayer cluster expansion

Let us return to the general problem of computing the configuration integral. Consider the function $e^{-\beta u_{ij}}$, where $u_{ij} \equiv u(|\mathbf{x}_i - \mathbf{x}_j|)$. We assume that at very short distances there is a strong repulsion between particles, i.e. $u_{ij} \to \infty$ as $r_{ij} = |\mathbf{x}_i - \mathbf{x}_j| \to 0$, and that $u_{ij} \to 0$ as $r_{ij} \to \infty$. Thus, $e^{-\beta u_{ij}}$ vanishes as $r_{ij} \to 0$ and approaches unity as $r_{ij} \to \infty$. For our purposes, it will prove useful to define the function
$$
f(r) = e^{-\beta u(r)} - 1 , \tag{6.63}
$$
called the Mayer function after Joseph Mayer. We may now write
$$
Q_N(T,V) = \frac{1}{N!} \int \! d^d\!x_1 \cdots \int \! d^d\!x_N \prod_{i<j} \big(1 + f_{ij}\big) . \tag{6.64}
$$
A typical potential we might consider is the semi-phenomenological Lennard-Jones potential,
$$
u(r) = 4\epsilon \bigg[ \bigg(\frac{\sigma}{r}\bigg)^{\!12} - \bigg(\frac{\sigma}{r}\bigg)^{\!6}\, \bigg] . \tag{6.65}
$$
This accounts for a long-distance attraction due to mutually induced electric dipole fluctuations, and a strong short-ranged repulsion, phenomenologically modelled with an $r^{-12}$ potential, which mimics a hard core due to overlap of the atomic electron distributions. Setting $u'(r) = 0$ we obtain $r^* = 2^{1/6} \sigma \approx 1.12246\, \sigma$ at the minimum, where $u(r^*) = -\epsilon$. In contrast to the Boltzmann weight $e^{-\beta u(r)}$, the Mayer function $f(r)$ vanishes as $r \to \infty$, behaving as $f(r) \sim -\beta u(r)$. The Mayer function also depends on temperature. Sketches of $u(r)$ and $f(r)$ for the Lennard-Jones model are shown in fig. 6.5.
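A minimal Python sketch of eqns. 6.63 and 6.65, in units $\epsilon = \sigma = k_B = 1$ (a choice made only for this example), evaluating the Mayer function at the three temperatures used in fig. 6.5. It uses `math.expm1` so that the long-distance tail, where $f(r) \approx -\beta u(r)$ is tiny, is computed without cancellation error; the sample radii are arbitrary.

```python
import math

def lennard_jones(r, eps=1.0, sigma=1.0):
    """u(r) = 4 eps [(sigma/r)^12 - (sigma/r)^6], eqn. 6.65."""
    x = sigma / r
    return 4 * eps * (x**12 - x**6)

def mayer_f(r, kT, eps=1.0, sigma=1.0):
    """Mayer function f(r) = exp(-u(r)/kT) - 1, eqn. 6.63."""
    return math.expm1(-lennard_jones(r, eps, sigma) / kT)

r_star = 2 ** (1 / 6)                  # location of the minimum of u(r)
print(r_star, lennard_jones(r_star))   # u(r*) = -eps
for kT in (0.8, 1.5, 5.0):
    # hard core (f -> -1), well (f > 0), and weak tail (f ~ -u/kT)
    print(kT, mayer_f(0.5, kT), mayer_f(r_star, kT), mayer_f(5.0, kT), -lennard_jones(5.0) / kT)
```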
The Lennard-Jones potential⁵ is realistic for certain simple fluids, but it leads to a configuration integral which is in general impossible to evaluate. Indeed, even a potential as simple as that of the hard sphere gas is intractable in more than one space dimension. We can however make progress by deriving a series expansion for the equation of state in powers of the particle density. This is known as the virial expansion. As was the case when we investigated noninteracting quantum statistics, it is convenient to work in the grand canonical ensemble and to derive series expansions for the density $n(T,z)$ and the pressure $p(T,z)$ in terms of the fugacity $z$, then solve for $z(T,n)$ to obtain $p(T,n)$. These expansions in terms of fugacity have a nifty diagrammatic interpretation, due to Mayer.

Figure 6.5: Bottom panel: Lennard-Jones potential $u(r) = 4\epsilon\, \big(x^{-12} - x^{-6}\big)$, with $x = r/\sigma$ and $\epsilon = 1$. Note the weak attractive tail and the strong repulsive core. Top panel: Mayer function $f(r,T) = e^{-u(r)/k_B T} - 1$ for $k_B T = 0.8\,\epsilon$ (blue), $k_B T = 1.5\,\epsilon$ (green), and $k_B T = 5\,\epsilon$ (red).

⁵ Disambiguation footnote: Take care not to confuse Philipp Lenard (Hungarian-German, cathode ray tubes, Nazi), Alfred-Marie Liénard (French, Liénard-Wiechert potentials, not a Nazi), John Lennard-Jones (British, molecular structure, definitely not a Nazi), and Lynyrd Skynyrd (American, "Free Bird", possibly killed by Nazis in 1977 plane crash). I thank my colleague Oleg Shpyrko for setting me straight on this.
We begin by expanding the product in eqn. 6.64 as
$$ \prod_{i<j} \big(1 + f_{ij}\big) = 1 + \sum_{i<j} f_{ij} + \sum_{\substack{i<j,\ k<l \\ (ij)\neq(kl)}} f_{ij}\, f_{kl} + \ldots . \qquad (6.66) $$

As there are $\frac{1}{2}N(N-1)$ possible pairings, there are $2^{N(N-1)/2}$ terms in the expansion of the above product. Each such term may be represented by a graph, as shown in fig. 6.7. For each such term, we draw a connection between dots representing different particles $i$ and $j$ if the factor $f_{ij}$ appears in the term under consideration. The contribution for any given graph may be written as a product over contributions from each of its disconnected component clusters. For example, in the case of the term in fig. 6.7, the contribution to the configurational integral would be
$$ \begin{aligned} \Delta Q = \frac{V^{N-11}}{N!} &\int\! d^dx_1\, d^dx_4\, d^dx_7\, d^dx_9\; f_{1,4}\, f_{4,7}\, f_{4,9}\, f_{7,9} \\ &\times \int\! d^dx_2\, d^dx_5\, d^dx_6\; f_{2,5}\, f_{2,6} \times \int\! d^dx_3\, d^dx_{10}\; f_{3,10} \times \int\! d^dx_8\, d^dx_{11}\; f_{8,11} . \end{aligned} \qquad (6.67) $$

We will refer to a given product of Mayer functions which arises from this expansion as a term.
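To make the counting concrete, here is a brute-force sketch (Python, standard library only; all function names are ours) that enumerates every term of the product expansion 6.66 for small $N$, encodes each term as its set of links, and checks which terms form a single connected cluster. It reproduces the $2^{N(N-1)/2}$ count quoted above.

```python
from itertools import combinations

def labeled_graphs(n):
    """Yield every term of eqn (6.66) for n particles as a frozenset of links (i, j), i < j."""
    pairs = list(combinations(range(n), 2))
    for mask in range(1 << len(pairs)):          # one bit per possible factor f_ij
        yield frozenset(p for k, p in enumerate(pairs) if mask >> k & 1)

def is_connected(n, links):
    """Depth-first search from vertex 0; True if every vertex is reached."""
    adj = {v: set() for v in range(n)}
    for i, j in links:
        adj[i].add(j)
        adj[j].add(i)
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n

def count_terms(n):
    """Return (total number of terms, number of terms forming one connected cluster)."""
    total = connected = 0
    for links in labeled_graphs(n):
        total += 1
        connected += is_connected(n, links)
    return total, connected

for n in range(1, 6):
    print(n, count_terms(n))
```

For $N = 4$, 38 of the 64 terms are connected; the remaining 26 factor into products over disconnected clusters, as in eqn. 6.67.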
6.3. NONIDEAL CLASSICAL GASES 305

Figure 6.6: Left: John Lennard-Jones. Center: Catherine Zeta-Jones. Right: James Earl Jones.

The particular labels we assign to each vertex of a given graph don't affect the overall value of the graph. Now a given unlabeled graph consists of a certain number of connected subgraphs. For a system with $N$ particles, we may then write
$$ N = \sum_\gamma m_\gamma\, n_\gamma , \qquad (6.68) $$
where $\gamma$ ranges over all possible connected subgraphs, and

$m_\gamma$ = number of connected subgraphs of type $\gamma$ in the unlabeled graph

$n_\gamma$ = number of vertices in the connected subgraph $\gamma$ .

Note that the single vertex • counts as a connected subgraph, with $n_\bullet = 1$. We now ask: how many ways are there of assigning the $N$ labels to the $N$ vertices of a given unlabeled graph? One might first think the answer is simply $N!$; however, this is too big, because different assignments of the labels to the vertices may not result in a distinct graph. To see this, consider the examples in fig. 6.8. In the first example, an unlabeled graph with four vertices consists of two identical connected subgraphs. Given any assignment of labels to the vertices, then, we can simply exchange the two subgraphs and get the same term. So we should divide $N!$ by the product $\prod_\gamma m_\gamma!$. But even this is not enough, because within each connected subgraph $\gamma$ there may be permutations which leave the integrand unchanged, as shown in the second and third examples in fig. 6.8. We define the symmetry factor $s_\gamma$ as the number of permutations of the labels which leave a given connected subgraph $\gamma$ invariant. Examples of symmetry factors are shown in fig. 6.9. Consider, for example, the third subgraph in the top row. Clearly one can rotate the figure about its horizontal symmetry axis to obtain a new labeling which represents the same term. This twofold axis is the only symmetry the diagram possesses, hence $s_\gamma = 2$. For the first diagram in the second row, one can rotate either of the triangles about the horizontal symmetry axis. One can also rotate the figure in the plane by $180^\circ$ so as to exchange the two triangles. Thus, there are $2 \times 2 \times 2 = 8$ symmetry operations which result in the same term, and $s_\gamma = 8$. Finally, the last subgraph in the second row consists of five vertices each of which is connected to the other four. Therefore any permutation of the labels results in the same term, and $s_\gamma = 5! = 120$. In addition to dividing by the product $\prod_\gamma m_\gamma!$, we must then also divide by $\prod_\gamma s_\gamma^{m_\gamma}$.

Figure 6.7: Diagrammatic interpretation of a term involving a product of eight Mayer functions.

Figure 6.8: Different assignments of labels to vertices may not result in a distinct term in the expansion of the configuration integral.
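The symmetry factors quoted above can be checked by brute force: $s_\gamma$ is the number of permutations in $S_{n_\gamma}$ that map the link set of $\gamma$ onto itself. The sketch below (Python; names are ours) does this for a few clusters. The two-triangle diagram is entered here as two triangles sharing one vertex, which is an assumed reading of fig. 6.9; two triangles joined by a single link give the same $s_\gamma = 8$.

```python
from itertools import permutations

def symmetry_factor(n, links):
    """Number of permutations of the labels 0..n-1 that map the link set onto itself."""
    target = {frozenset(link) for link in links}
    count = 0
    for perm in permutations(range(n)):
        image = {frozenset((perm[i], perm[j])) for i, j in links}
        if image == target:
            count += 1
    return count

clusters = {
    "dimer": (2, [(0, 1)]),
    "wedge": (3, [(0, 1), (0, 2)]),
    "triangle": (3, [(0, 1), (0, 2), (1, 2)]),
    # assumed drawing: two triangles sharing vertex 0
    "bowtie": (5, [(0, 1), (0, 2), (1, 2), (0, 3), (0, 4), (3, 4)]),
    # five vertices, each linked to the other four
    "K5": (5, [(i, j) for i in range(5) for j in range(i + 1, 5)]),
}

for name, (n, links) in clusters.items():
    print(name, symmetry_factor(n, links))
```

Brute force costs $n_\gamma!$ steps, which is fine for the small clusters that enter the first few virial coefficients.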
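We can now write the partition function as
$$ \begin{aligned} Z &= \frac{\lambda_T^{-Nd}}{N!} \sum_{\{m_\gamma\}} \frac{N!}{\prod_\gamma m_\gamma!\; s_\gamma^{m_\gamma}} \prod_\gamma \bigg( \int\! d^dx_1 \cdots d^dx_{n_\gamma} \prod_{i<j}^{\gamma} f_{ij} \bigg)^{\!m_\gamma} \cdot \delta_{N,\,\sum_\gamma m_\gamma n_\gamma} \\ &= \lambda_T^{-Nd} \sum_{\{m_\gamma\}} \prod_\gamma \frac{\big(V b_\gamma(T)\big)^{m_\gamma}}{m_\gamma!} \cdot \delta_{N,\,\sum_\gamma m_\gamma n_\gamma} , \end{aligned} \qquad (6.69) $$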

where the product $\prod^\gamma_{i<j} f_{ij}$ is over all links in the subgraph $\gamma$. The final Kronecker delta enforces the constraint $N = \sum_\gamma m_\gamma n_\gamma$. We have defined the cluster integrals $b_\gamma$ as
$$ b_\gamma(T) \equiv \frac{1}{s_\gamma} \cdot \frac{1}{V} \int\! d^dx_1 \cdots d^dx_{n_\gamma} \prod_{i<j}^{\gamma} f_{ij} , \qquad (6.70) $$
where we assume the limit $V \to \infty$. Since $f_{ij} = f\big(|x_i - x_j|\big)$, the product $\prod^\gamma_{i<j} f_{ij}$ is invariant under simultaneous translation of all the coordinate vectors by any constant vector, and hence the integral over the $n_\gamma$ position variables contains exactly one factor of the volume, which cancels with the prefactor in the above definition of $b_\gamma$. Thus, each cluster integral is intensive⁶, scaling as $V^0$.
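If we compute the grand partition function, then the fixed $N$ constraint is relaxed, and we can do the sums:
$$ \begin{aligned} \Xi = e^{-\beta\Omega} &= \sum_{\{m_\gamma\}} \big(e^{\beta\mu}\lambda_T^{-d}\big)^{\sum_\gamma m_\gamma n_\gamma} \prod_\gamma \frac{1}{m_\gamma!} \big(V b_\gamma\big)^{m_\gamma} \\ &= \prod_\gamma \sum_{m_\gamma=0}^{\infty} \frac{1}{m_\gamma!} \Big[ \big(e^{\beta\mu}\lambda_T^{-d}\big)^{n_\gamma}\, V b_\gamma \Big]^{m_\gamma} \\ &= \exp\bigg( V \sum_\gamma \big(e^{\beta\mu}\lambda_T^{-d}\big)^{n_\gamma}\, b_\gamma \bigg) . \end{aligned} \qquad (6.71) $$
Thus,
$$ \Omega(T,V,\mu) = -V k_BT \sum_\gamma \big(e^{\beta\mu}\lambda_T^{-d}\big)^{n_\gamma}\, b_\gamma(T) , \qquad (6.72) $$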

and we can write
$$ \begin{aligned} p &= k_BT \sum_\gamma \big(z\lambda_T^{-d}\big)^{n_\gamma}\, b_\gamma(T) \\ n &= \sum_\gamma n_\gamma \big(z\lambda_T^{-d}\big)^{n_\gamma}\, b_\gamma(T) , \end{aligned} \qquad (6.73) $$
where $z = \exp(\beta\mu)$ is the fugacity, and where $b_\bullet \equiv 1$. As in the case of ideal quantum gas statistical mechanics, we can systematically invert the relation $n = n(z,T)$ to obtain $z = z(n,T)$, and then insert this into the equation for $p(z,T)$ to obtain the equation of state $p = p(n,T)$. This yields the virial expansion of the equation of state,
$$ p = n k_BT \Big\{ 1 + B_2(T)\, n + B_3(T)\, n^2 + \ldots \Big\} . \qquad (6.74) $$

6.3.4 Lowest order expansion

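We have
$$ b_{-}(T) = \frac{1}{2V} \int\! d^dx_1 \int\! d^dx_2\; f\big(|x_1 - x_2|\big) = \frac{1}{2} \int\! d^dr\; f(r) \qquad (6.75) $$
and
$$ b_{\wedge}(T) = \frac{1}{2V} \int\! d^dx_1 \int\! d^dx_2 \int\! d^dx_3\; f\big(|x_1 - x_2|\big)\, f\big(|x_1 - x_3|\big) = \frac{1}{2} \int\! d^dr \int\! d^dr'\; f(r)\, f(r') = 2\, b_{-}^2 \qquad (6.76) $$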

⁶ We assume that the long-ranged behavior of $f(r) \approx -\beta u(r)$ is integrable.

Figure 6.9: The symmetry factor $s_\gamma$ for a connected subgraph $\gamma$ is the number of permutations of its indices which leaves the term $\prod_{(ij)\in\gamma} f_{ij}$ invariant.

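and
$$ b_{\triangle}(T) = \frac{1}{6V} \int\! d^dx_1 \int\! d^dx_2 \int\! d^dx_3\; f\big(|x_1 - x_2|\big)\, f\big(|x_1 - x_3|\big)\, f\big(|x_2 - x_3|\big) = \frac{1}{6} \int\! d^dr \int\! d^dr'\; f(r)\, f(r')\, f\big(|r - r'|\big) . \qquad (6.77) $$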

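We may now write
$$ \begin{aligned} p &= k_BT \Big\{ z\lambda_T^{-d} + \big(z\lambda_T^{-d}\big)^2\, b_{-}(T) + \big(z\lambda_T^{-d}\big)^3 \big(b_{\wedge} + b_{\triangle}\big) + O(z^4) \Big\} \\ n &= z\lambda_T^{-d} + 2\big(z\lambda_T^{-d}\big)^2\, b_{-}(T) + 3\big(z\lambda_T^{-d}\big)^3 \big(b_{\wedge} + b_{\triangle}\big) + O(z^4) . \end{aligned} \qquad (6.78) $$
We invert by writing
$$ z\lambda_T^{-d} = n + \alpha_2 n^2 + \alpha_3 n^3 + \ldots \qquad (6.79) $$
and substituting into the equation for $n(z,T)$, yielding
$$ n = \big(n + \alpha_2 n^2 + \alpha_3 n^3\big) + 2\big(n + \alpha_2 n^2\big)^2 b_{-} + 3n^3 \big(b_{\wedge} + b_{\triangle}\big) + O(n^4) . \qquad (6.80) $$
Thus,
$$ 0 = \big(\alpha_2 + 2b_{-}\big)\, n^2 + \big(\alpha_3 + 4\alpha_2 b_{-} + 3b_{\wedge} + 3b_{\triangle}\big)\, n^3 + \ldots . \qquad (6.81) $$
We therefore conclude
$$ \begin{aligned} \alpha_2 &= -2b_{-} \\ \alpha_3 &= -4\alpha_2 b_{-} - 3b_{\wedge} - 3b_{\triangle} = 8b_{-}^2 - 6b_{-}^2 - 3b_{\triangle} = 2b_{-}^2 - 3b_{\triangle} . \end{aligned} \qquad (6.82) $$
We now insert eqn. 6.79 with the determined values of $\alpha_{2,3}$ into the equation for $p(z,T)$, obtaining
$$ \begin{aligned} \frac{p}{k_BT} &= n - 2b_{-} n^2 + \big(2b_{-}^2 - 3b_{\triangle}\big)\, n^3 + \big(n - 2b_{-} n^2\big)^2\, b_{-} + n^3 \big(2b_{-}^2 + b_{\triangle}\big) + O(n^4) \\ &= n - b_{-}\, n^2 - 2b_{\triangle}\, n^3 + O(n^4) . \end{aligned} \qquad (6.83) $$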
Thus,
$$ B_2(T) = -b_{-}(T) , \qquad B_3(T) = -2\, b_{\triangle}(T) . \qquad (6.84) $$
Note that $b_{\wedge}$ does not contribute to $B_3$; only $b_{\triangle}$ appears. As we shall see, this is because the virial coefficients $B_j$ involve only cluster integrals $b_\gamma$ for one-particle irreducible clusters, i.e. those clusters which remain connected if any of the vertices plus all its links are removed.
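The inversion in eqns. 6.79 to 6.84 is mechanical enough to hand to a computer. The sketch below (Python, exact rational arithmetic from `fractions`; helper names are ours) sets $x = z\lambda_T^{-d}$, reverts the series $n(x)$ of eqn. 6.78 by the fixed-point iteration $x \to x - [n(x) - n]$, substitutes into $\beta p(x)$, and returns the coefficients $\alpha_{2,3}$ together with $B_2$ and $B_3$. With $b_{\wedge} = 2b_{-}^2$ it reproduces eqn. 6.84 for arbitrary numerical inputs.

```python
from fractions import Fraction

def series_mul(a, b, order):
    """Product of two power series given as coefficient lists, truncated at x**order."""
    out = [Fraction(0)] * (order + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= order:
                out[i + j] += ai * bj
    return out

def series_compose(outer, inner, order):
    """outer(inner(x)) truncated at x**order; inner must have zero constant term."""
    result = [Fraction(0)] * (order + 1)
    power = [Fraction(1)] + [Fraction(0)] * order    # inner(x)**0
    for coeff in outer:
        for i in range(order + 1):
            result[i] += coeff * power[i]
        power = series_mul(power, inner, order)
    return result

def virial_b2_b3(b_dimer, b_wedge, b_tri):
    """Revert n(x) from eqn (6.78), x = z * lambda_T**(-d); return (x(n) coefficients, B2, B3)."""
    order = 3
    c3 = b_wedge + b_tri
    n_of_x = [0, 1, 2 * b_dimer, 3 * c3]    # n      = x + 2 b x^2 + 3 (b_wedge + b_tri) x^3
    p_of_x = [0, 1, b_dimer, c3]            # beta p = x +   b x^2 +   (b_wedge + b_tri) x^3
    identity = [0, 1, 0, 0]
    x_of_n = identity[:]                    # zeroth guess: x = n
    for _ in range(order):                  # x <- x - (n(x) - n); each pass gains an order
        residual = series_compose(n_of_x, x_of_n, order)
        x_of_n = [x_of_n[i] - (residual[i] - identity[i]) for i in range(order + 1)]
    beta_p = series_compose(p_of_x, x_of_n, order)
    return x_of_n, beta_p[2], beta_p[3]

b = Fraction(3, 2)
alphas, B2, B3 = virial_b2_b3(b, 2 * b**2, Fraction(1, 5))
print(alphas, B2, B3)
```

Passing a value of `b_wedge` other than $2b_{-}^2$ shows that $b_{\wedge}$ would otherwise leak into $B_3$; its absence relies on the factorization in eqn. 6.76.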

6.3.5 One-particle irreducible clusters and the virial expansion

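We start with eqn. 6.73 for $p(T,z)$ and $n(T,z)$,
$$ \begin{aligned} p &= k_BT \sum_\gamma \big(z\lambda_T^{-d}\big)^{n_\gamma}\, b_\gamma(T) \\ n &= \sum_\gamma n_\gamma \big(z\lambda_T^{-d}\big)^{n_\gamma}\, b_\gamma(T) , \end{aligned} \qquad (6.85) $$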

where $b_\gamma(T)$ for the connected cluster $\gamma$ is given by
$$ b_\gamma(T) \equiv \frac{1}{s_\gamma} \cdot \frac{1}{V} \int\! d^dx_1 \cdots d^dx_{n_\gamma} \prod_{i<j}^{\gamma} f_{ij} . \qquad (6.86) $$

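It is convenient to work with dimensionless quantities, using $\lambda_T^d$ as the unit of volume. To this end, define
$$ \nu \equiv n\lambda_T^d , \qquad \pi \equiv p\lambda_T^d , \qquad c_\gamma(T) \equiv b_\gamma(T)\, \lambda_T^{-d(n_\gamma - 1)} , \qquad (6.87) $$
so that
$$ \beta\pi = \sum_\gamma c_\gamma\, z^{n_\gamma} = \sum_{\ell=1}^\infty d_\ell\, z^\ell , \qquad \nu = \sum_\gamma n_\gamma\, c_\gamma\, z^{n_\gamma} = \sum_{\ell=1}^\infty \ell\, d_\ell\, z^\ell , \qquad (6.88) $$
where
$$ d_\ell = \sum_\gamma c_\gamma\, \delta_{n_\gamma,\ell} \qquad (6.89) $$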

is the sum over all connected clusters with $\ell$ vertices. Here and henceforth, the functional dependence on $T$ is implicit; $\pi$ and $\nu$ are regarded here as explicit functions of $z$. We can, in principle, invert to obtain $z(\nu)$. Let us write this inverse as
$$ z(\nu) = \nu \exp\bigg( -\sum_{k=1}^\infty \beta_k\, \nu^k \bigg) . \qquad (6.90) $$
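Ultimately we need to obtain expressions for the coefficients $\beta_k$, but let us first assume the above form and use it to write $\pi$ in terms of $\nu$. We have
$$ \begin{aligned} \beta\pi &= \sum_{\ell=1}^\infty d_\ell\, z^\ell = \int_0^z\! d\tilde z \sum_{\ell=1}^\infty \ell\, d_\ell\, \tilde z^{\ell-1} = \int_0^\nu\! d\tilde\nu\; \frac{\tilde\nu}{\tilde z}\, \frac{d\tilde z}{d\tilde\nu} = \int_0^\nu\! d\tilde\nu\; \frac{d\ln\tilde z}{d\ln\tilde\nu} \\ &= \int_0^\nu\! d\tilde\nu\, \bigg( 1 - \sum_{k=1}^\infty k\,\beta_k\, \tilde\nu^k \bigg) = \nu - \sum_{k=1}^\infty \frac{k\,\beta_k}{k+1}\, \nu^{k+1} \equiv \sum_{k=1}^\infty \mathcal{B}_k\, \nu^k , \end{aligned} \qquad (6.91) $$
where $\mathcal{B}_k = B_k\, \lambda_T^{-d(k-1)}$ is the dimensionless $k$th virial coefficient. Thus, $\mathcal{B}_1 = 1$ and
$$ \mathcal{B}_k = -\frac{k-1}{k}\, \beta_{k-1} \qquad (6.92) $$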

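for $k > 1$. We may also obtain the cluster integrals $d_\ell$ in terms of the $\beta_k$. To this end, note that $\ell^2 d_\ell$ is the coefficient of $z^\ell$ in the function $z\, d\nu/dz$, hence
$$ \begin{aligned} \ell^2 d_\ell &= \oint \frac{dz}{2\pi i z}\, \frac{1}{z^\ell} \Big( z\, \frac{d\nu}{dz} \Big) = \oint \frac{d\nu}{2\pi i}\, z^{-\ell} = \oint \frac{d\nu}{2\pi i}\, \frac{1}{\nu^\ell} \prod_{k=1}^\infty e^{\ell\beta_k\nu^k} \\ &= \oint \frac{d\nu}{2\pi i}\, \frac{1}{\nu^\ell} \sum_{\{m_k\}} \prod_{k=1}^\infty \frac{(\ell\beta_k)^{m_k}}{m_k!}\, \nu^{k m_k} = \sum_{\{m_k\}} \delta_{\sum_k k m_k,\, \ell-1} \prod_{k=1}^\infty \frac{(\ell\beta_k)^{m_k}}{m_k!} . \end{aligned} \qquad (6.93) $$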
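Eqn. 6.93 is a finite sum over the ways of writing $\ell - 1 = \sum_k k\,m_k$. A short sketch (Python, exact fractions; names are ours) that enumerates those multiplicities and evaluates $d_\ell$ for given $\beta_k$. Since $c_\bullet = b_\bullet = 1$, one needs $d_1 = 1$; a direct hand reversion of eqn. 6.90 also gives $d_2 = \frac{1}{2}\beta_1$ and $d_3 = \frac{1}{2}\beta_1^2 + \frac{1}{3}\beta_2$, which serve as checks.

```python
from fractions import Fraction
from math import factorial

def multiplicities(total, max_part=None):
    """Yield dicts {k: m_k} (all m_k >= 1) with sum_k k * m_k == total."""
    if max_part is None:
        max_part = total
    if total == 0:
        yield {}
        return
    for k in range(min(total, max_part), 0, -1):      # largest part first: no duplicates
        for m in range(1, total // k + 1):
            for rest in multiplicities(total - k * m, k - 1):
                yield {k: m, **rest}

def d_ell(ell, beta):
    """Eqn (6.93): ell^2 d_ell = sum over {m_k} with sum_k k m_k = ell - 1
    of prod_k (ell beta_k)^{m_k} / m_k!.  `beta` maps k -> beta_k; absent keys count as 0."""
    total = Fraction(0)
    for mult in multiplicities(ell - 1):
        term = Fraction(1)
        for k, m in mult.items():
            term *= Fraction(ell * beta.get(k, 0)) ** m / factorial(m)
        total += term
    return total / ell**2

beta = {1: Fraction(1, 3), 2: Fraction(1, 7)}
print([d_ell(ell, beta) for ell in range(1, 5)])
```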

Irreducible clusters

The clusters which contribute to $d_\ell$ are all connected, by definition. However, it is useful to make a further distinction based on the topology of connected clusters and define a connected cluster $\gamma$ to be irreducible if, upon removing any site in $\gamma$ and all the links connected to that site, the remaining sites of the cluster are still connected. The situation is depicted in Fig. 6.10. For a reducible cluster $\gamma$, the integral $c_\gamma$ is proportional to a product of cluster integrals over its irreducible components. Let us define the set $\Gamma_\ell$ as the set of all irreducible clusters of $\ell$ vertices. It turns out that
$$ \beta_k(T) = \frac{1}{V\lambda_T^{kd}}\, \frac{1}{k!} \sum_{\gamma\in\Gamma_{k+1}} \int\! d^dx_1 \cdots d^dx_{k+1} \prod_{\langle ij\rangle}^{\gamma} f_{ij} . \qquad (6.94) $$
Thus, the virial coefficients $B_k(T)$ are obtained by summing a restricted set of cluster integrals, viz.
$$ B_k(T) = -\frac{k-1}{k}\, \beta_{k-1}(T)\, \lambda_T^{(k-1)d} . \qquad (6.95) $$
In the end, it turns out we don't need the symmetry factors at all!

Figure 6.10: Connected versus irreducible clusters. Clusters (a) through (d) are irreducible in that they remain connected if any component site and its connecting links are removed. Cluster (e) is connected, but is reducible. Its integral $c_\gamma$ may be reduced to a product over its irreducible components, each shown in a unique color.

6.3.6 Cookbook recipe

Just follow these simple steps!

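• The pressure and number density are written as an expansion over unlabeled connected clusters $\gamma$, viz.
$$ \begin{aligned} \beta p &= \sum_\gamma \big(z\lambda_T^{-d}\big)^{n_\gamma}\, b_\gamma \\ n &= \sum_\gamma n_\gamma \big(z\lambda_T^{-d}\big)^{n_\gamma}\, b_\gamma . \end{aligned} \qquad (6.96) $$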
γ

• For each term in each of these sums, draw the unlabeled connected cluster γ.

• Assign labels $1, 2, \ldots, n_\gamma$ to the vertices, where $n_\gamma$ is the total number of vertices in the cluster $\gamma$. It doesn't matter how you assign the labels.

• Write down the product $\prod^\gamma_{i<j} f_{ij}$. The factor $f_{ij}$ appears in the product if there is a link in your (now labeled) cluster between sites $i$ and $j$.

• The symmetry factor $s_\gamma$ is the number of elements of the symmetric group $S_{n_\gamma}$ which leave the product $\prod^\gamma_{i<j} f_{ij}$ invariant. The identity permutation leaves the product invariant, so $s_\gamma \geq 1$.

• The cluster integral is
$$ b_\gamma(T) \equiv \frac{1}{s_\gamma} \cdot \frac{1}{V} \int\! d^dx_1 \cdots d^dx_{n_\gamma} \prod_{i<j}^{\gamma} f_{ij} . \qquad (6.97) $$
Due to translation invariance, $b_\gamma(T) \propto V^0$. One can therefore set $x_{n_\gamma} \equiv 0$, eliminate the volume factor from the denominator, and perform the integral over the remaining $n_\gamma - 1$ coordinates.

• This procedure generates expansions for $p(T,z)$ and $n(T,z)$ in powers of the fugacity $z = e^{\beta\mu}$. To obtain something useful like $p(T,n)$, we invert the equation $n = n(T,z)$ to find $z = z(T,n)$, and then substitute into the equation $p = p(T,z)$ to obtain $p = p\big(T, z(T,n)\big) = p(T,n)$. The result is the virial expansion,
$$ p = n k_BT \Big\{ 1 + B_2(T)\, n + B_3(T)\, n^2 + \ldots \Big\} , \qquad (6.98) $$
where
$$ B_k(T) = -\frac{1}{k\,(k-2)!} \sum_{\gamma\in\Gamma_k} \int\! d^dx_1 \cdots d^dx_{k-1} \prod_{\langle ij\rangle}^{\gamma} f_{ij} \qquad (6.99) $$
with $x_k \equiv 0$, and with $\Gamma_k$ the set of all one-particle irreducible $k$-site clusters.

6.3.7 Hard sphere gas in three dimensions

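The hard sphere potential is given by
$$ u(r) = \begin{cases} \infty & \text{if } r \leq a \\ 0 & \text{if } r > a . \end{cases} \qquad (6.100) $$
Here $a$ is the diameter of the spheres. The corresponding Mayer function is then temperature independent, and given by
$$ f(r) = \begin{cases} -1 & \text{if } r \leq a \\ 0 & \text{if } r > a . \end{cases} \qquad (6.101) $$
We then have
$$ b_{-}(T) = \tfrac{1}{2} \int\! d^3r\; f(r) = -\tfrac{2}{3}\pi a^3 . \qquad (6.102) $$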

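The calculation of $b_{\triangle}$ is more challenging. We have
$$ b_{\triangle} = \tfrac{1}{6} \int\! d^3\rho \int\! d^3r\; f(\rho)\, f(r)\, f\big(|r - \rho|\big) . \qquad (6.103) $$
We must first compute the volume of overlap for spheres of radius $a$ (recall $a$ is the diameter of the constituent hard sphere particles) centered at $0$ and at $\rho$:
$$ \mathcal{V} = \int\! d^3r\; f(r)\, f\big(|r - \rho|\big) = 2 \int_{\rho/2}^{a}\! dz\; \pi\big(a^2 - z^2\big) = \frac{4\pi}{3}\, a^3 - \pi a^2 \rho + \frac{\pi}{12}\, \rho^3 . \qquad (6.104) $$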

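We then integrate over the region $|\rho| < a$, to obtain
$$ b_{\triangle} = -\tfrac{1}{6} \cdot 4\pi \int_0^a\! d\rho\; \rho^2 \cdot \Big\{ \frac{4\pi}{3}\, a^3 - \pi a^2 \rho + \frac{\pi}{12}\, \rho^3 \Big\} = -\frac{5\pi^2}{36}\, a^6 . \qquad (6.105) $$
Thus,
$$ p = n k_BT \Big\{ 1 + \frac{2\pi}{3}\, a^3 n + \frac{5\pi^2}{18}\, a^6 n^2 + O(n^3) \Big\} . \qquad (6.106) $$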
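A numerical check of eqns. 6.104 and 6.105, sketched in Python with the standard library only (the Simpson-rule helper and the function names are ours; units with $a = 1$). It integrates the overlap volume of eqn. 6.104 over $|\rho| < a$ and compares $B_3 = -2b_{\triangle}$ with $5\pi^2 a^6/18$.

```python
import math

def overlap_volume(rho, a=1.0):
    """Eqn (6.104): common volume of two spheres of radius a with centres rho apart (0 <= rho <= 2a)."""
    return 4 * math.pi / 3 * a**3 - math.pi * a**2 * rho + math.pi / 12 * rho**3

def simpson(g, lo, hi, n=1000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3

def b_triangle_hard_spheres(a=1.0):
    """Eqn (6.105): b_tri = -(1/6) * 4 pi * int_0^a d rho rho^2 V(rho)."""
    return -(4 * math.pi / 6) * simpson(lambda rho: rho**2 * overlap_volume(rho, a), 0.0, a)

a = 1.0
B2 = 2 * math.pi / 3 * a**3              # -b_dimer, from eqn (6.102)
B3 = -2 * b_triangle_hard_spheres(a)     # compare with 5 pi^2 a^6 / 18
print(B2, B3, 5 * math.pi**2 / 18)
```

The integrand is a fifth-degree polynomial in $\rho$, so Simpson's rule with a thousand intervals is accurate far beyond plotting precision.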

Figure 6.11: The overlap of hard sphere Mayer functions. The shaded volume is V.

6.3.8 Weakly attractive tail

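Suppose
$$ u(r) = \begin{cases} \infty & \text{if } r \leq a \\ -u_0(r) & \text{if } r > a . \end{cases} \qquad (6.107) $$
Then the corresponding Mayer function is
$$ f(r) = \begin{cases} -1 & \text{if } r \leq a \\ e^{\beta u_0(r)} - 1 & \text{if } r > a . \end{cases} \qquad (6.108) $$
Thus,
$$ b_{-}(T) = \tfrac{1}{2} \int\! d^3r\; f(r) = -\frac{2\pi}{3}\, a^3 + 2\pi \int_a^\infty\! dr\; r^2 \Big[ e^{\beta u_0(r)} - 1 \Big] . \qquad (6.109) $$
Thus, the second virial coefficient is
$$ B_2(T) = -b_{-}(T) \approx \frac{2\pi}{3}\, a^3 - \frac{2\pi}{k_BT} \int_a^\infty\! dr\; r^2\, u_0(r) , \qquad (6.110) $$
where we have assumed $k_BT \gg u_0(r)$, so that $e^{\beta u_0(r)} - 1 \approx \beta u_0(r)$. We see that the second virial coefficient changes sign at some temperature $T_0$, from a negative low temperature value to a positive high temperature value.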

6.3.9 Spherical potential well

Consider an attractive spherical well potential with an infinitely repulsive core,
$$ u(r) = \begin{cases} \infty & \text{if } r \leq a \\ -\epsilon & \text{if } a < r < R \\ 0 & \text{if } r > R . \end{cases} \qquad (6.111) $$
Then the corresponding Mayer function is
$$ f(r) = \begin{cases} -1 & \text{if } r \leq a \\ e^{\beta\epsilon} - 1 & \text{if } a < r < R \\ 0 & \text{if } r > R . \end{cases} \qquad (6.112) $$

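Writing $s \equiv R/a$, we have
$$ \begin{aligned} B_2(T) = -b_{-}(T) &= -\tfrac{1}{2} \int\! d^3r\; f(r) \\ &= -\tfrac{1}{2} \Big\{ (-1) \cdot \frac{4\pi}{3}\, a^3 + \big(e^{\beta\epsilon} - 1\big) \cdot \frac{4\pi}{3}\, a^3 \big(s^3 - 1\big) \Big\} \\ &= \frac{2\pi}{3}\, a^3 \Big\{ 1 - \big(s^3 - 1\big)\big(e^{\beta\epsilon} - 1\big) \Big\} . \end{aligned} \qquad (6.113) $$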

To find the temperature $T_0$ where $B_2(T)$ changes sign, we set $B_2(T_0) = 0$ and obtain
$$ k_BT_0 = \frac{\epsilon}{\ln\!\big[ s^3 / (s^3 - 1) \big]} . \qquad (6.114) $$
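Eqn. 6.114 can be checked by locating the zero of eqn. 6.113 numerically. A sketch in Python (standard library only); the function names, the bisection helper and the units $k_B = \epsilon = a = 1$ are our choices.

```python
import math

def b2_square_well(kT, s, eps=1.0, a=1.0):
    """Eqn (6.113): B2 = (2 pi/3) a^3 [1 - (s^3 - 1)(e^{eps/kT} - 1)], with kB = 1."""
    return 2 * math.pi / 3 * a**3 * (1 - (s**3 - 1) * math.expm1(eps / kT))

def t0_closed_form(s, eps=1.0):
    """Eqn (6.114): kB T0 = eps / ln[s^3 / (s^3 - 1)]."""
    return eps / math.log(s**3 / (s**3 - 1))

def bisect(g, lo, hi, iterations=200):
    """Root of g on [lo, hi]; assumes g(lo) and g(hi) differ in sign."""
    g_lo = g(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if (g_mid < 0) == (g_lo < 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for s in (1.2, 1.5, 2.0):
    t0 = bisect(lambda kT, s=s: b2_square_well(kT, s), 0.05, 1000.0)
    print(s, t0, t0_closed_form(s))
```

$B_2$ is strongly negative at $k_BT = 0.05\,\epsilon$ and positive at $k_BT = 1000\,\epsilon$ for these well widths, so the bracket is valid.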

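Recall in our study of the thermodynamics of the Joule-Thomson effect in §1.10.6 that the throttling process is isenthalpic. The temperature change, when a gas is pushed (or escapes) through a porous plug from a high pressure region to a low pressure one, is
$$ \Delta T = \int_{p_1}^{p_2}\! dp\; \Big( \frac{\partial T}{\partial p} \Big)_{\!H} , \qquad (6.115) $$
where
$$ \Big( \frac{\partial T}{\partial p} \Big)_{\!H} = \frac{1}{C_p} \bigg[ T \Big( \frac{\partial V}{\partial T} \Big)_{\!p} - V \bigg] . \qquad (6.116) $$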

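Appealing to the virial expansion, and working to lowest order in corrections to the ideal gas law, we have
$$ p = \frac{N}{V}\, k_BT + \frac{N^2}{V^2}\, k_BT\, B_2(T) + \ldots \qquad (6.117) $$
and we compute $(\partial V/\partial T)_p$ by setting
$$ 0 = dp = -\frac{N k_BT}{V^2}\, dV + \frac{N k_B}{V}\, dT - \frac{2N^2}{V^3}\, k_BT\, B_2(T)\, dV + \frac{N^2}{V^2}\, d\big(k_BT\, B_2(T)\big) + \ldots . \qquad (6.118) $$
Dividing by $dT$, we find
$$ T \Big( \frac{\partial V}{\partial T} \Big)_{\!p} - V = N \bigg[ T\, \frac{\partial B_2}{\partial T} - B_2 \bigg] . \qquad (6.119) $$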
Figure 6.12: An attractive spherical well with a repulsive core $u(r)$ and its associated Mayer function $f(r)$.

The temperature where $(\partial T/\partial p)_H$ changes sign is called the inversion temperature $T^*$. To find the inversion point, we set $T^* B_2'(T^*) = B_2(T^*)$, i.e.
$$ \frac{d\ln B_2}{d\ln T}\bigg|_{T^*} = 1 . \qquad (6.120) $$
If we approximate $B_2(T) \approx A - \dfrac{B}{T}$, then the inversion temperature follows simply:
$$ \frac{B}{T^*} = A - \frac{B}{T^*} \quad\Longrightarrow\quad T^* = \frac{2B}{A} . \qquad (6.121) $$
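For the square well of eqn. 6.113 the inversion condition can also be solved without the linear approximation. Writing $x = \epsilon/k_BT$ and $B_2 = A\big[1 - (s^3-1)(e^x - 1)\big]$, the condition $T B_2' = B_2$ becomes $(s^3-1)(e^x - 1 + x e^x) = 1$. Expanding eqn. 6.113 to first order in $x$ gives $B_2 \approx A - B/T$ with $A = \frac{2\pi}{3}a^3$ and $B = A(s^3-1)\epsilon/k_B$, so eqn. 6.121 predicts $k_BT^* \approx 2(s^3-1)\epsilon$. The sketch below (Python; names and the units $k_B = \epsilon = 1$ are ours) compares the two.

```python
import math

def inversion_temperature_square_well(s, eps=1.0):
    """Solve (s^3 - 1)(e^x - 1 + x e^x) = 1 for x = eps/kT (kB = 1) and return kT*."""
    c = s**3 - 1
    g = lambda x: c * (math.expm1(x) + x * math.exp(x)) - 1.0   # increasing, g(0) = -1
    lo, hi = 0.0, 1.0
    while g(hi) < 0:                  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):              # bisection
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return eps / (0.5 * (lo + hi))

def inversion_temperature_linearized(s, eps=1.0):
    """Eqn (6.121) with B2 ~ A - B/T from linearizing (6.113): kT* = 2 (s^3 - 1) eps."""
    return 2 * (s**3 - 1) * eps

for s in (1.5, 2.0, 5.0):
    print(s, inversion_temperature_square_well(s), inversion_temperature_linearized(s))
```

The linearized estimate improves as $s^3 - 1$ grows, because $T^*$ then lies well above $\epsilon/k_B$, where $e^x - 1 \approx x$.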

6.3.10 Hard spheres with a hard wall

Consider a hard sphere gas in three dimensions in the presence of a hard wall at $z = 0$. The gas is confined to the region $z > 0$. The total potential energy is now
$$ W(x_1, \ldots, x_N) = \sum_i v(x_i) + \sum_{i<j} u(x_i - x_j) , \qquad (6.122) $$
where
$$ v(r) = v(z) = \begin{cases} \infty & \text{if } z \leq \frac{1}{2}a \\ 0 & \text{if } z > \frac{1}{2}a , \end{cases} \qquad (6.123) $$

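and $u(r)$ is given in eqn. 6.100. The grand partition function is written as a series in the total particle number $N$, and is given by
$$ \Xi = e^{-\beta\Omega} = 1 + \xi \int\! d^3r\; e^{-\beta v(z)} + \tfrac{1}{2}\xi^2 \int\! d^3r \int\! d^3r'\; e^{-\beta v(z)}\, e^{-\beta v(z')}\, e^{-\beta u(r - r')} + \ldots , \qquad (6.124) $$
where $\xi = z\lambda_T^{-3}$, with $z = e^{\mu/k_BT}$ the fugacity. Taking the logarithm, and invoking the Taylor series $\ln(1+\delta) = \delta - \frac{1}{2}\delta^2 + \frac{1}{3}\delta^3 - \ldots$, we obtain
$$ -\beta\Omega = \xi \int\limits_{z > a/2}\! d^3r + \tfrac{1}{2}\xi^2 \int\limits_{z > a/2}\! d^3r \int\limits_{z' > a/2}\! d^3r'\; \Big[ e^{-\beta u(r - r')} - 1 \Big] + \ldots \qquad (6.125) $$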
Figure 6.13: In the presence of a hard wall, the Mayer sphere is cut off on the side closest to the wall. The resulting density $n(z)$ vanishes for $z < \frac{1}{2}a$ since the center of each sphere must be at least one radius ($\frac{1}{2}a$) away from the wall. Between $z = \frac{1}{2}a$ and $z = \frac{3}{2}a$ there is a density enhancement. If the calculation were carried out to higher order, $n(z)$ would exhibit damped spatial oscillations with wavelength $\lambda \sim a$.

The volume is $V = \int_{z>0} d^3r$. Dividing by $V$, we have, in the thermodynamic limit,
$$ \begin{aligned} -\frac{\beta\Omega}{V} = \beta p &= \xi + \tfrac{1}{2}\xi^2\, \frac{1}{V} \int\limits_{z > a/2}\! d^3r \int\limits_{z' > a/2}\! d^3r'\; \Big[ e^{-\beta u(r - r')} - 1 \Big] + \ldots \\ &= \xi - \tfrac{2}{3}\pi a^3\, \xi^2 + O(\xi^3) . \end{aligned} \qquad (6.126) $$
The number density is
$$ n = \xi\, \frac{\partial}{\partial\xi}\, (\beta p) = \xi - \tfrac{4}{3}\pi a^3\, \xi^2 + O(\xi^3) , \qquad (6.127) $$
and inverting to obtain $\xi(n)$ and then substituting into the pressure equation, we obtain the lowest order virial expansion for the equation of state,
$$ p = k_BT \Big\{ n + \tfrac{2}{3}\pi a^3\, n^2 + \ldots \Big\} . \qquad (6.128) $$

As expected, the presence of the wall does not affect a bulk property such as the equation of state.
Next, let us compute the number density $n(z)$, given by
$$ n(z) = \Big\langle \sum_i \delta(r - r_i) \Big\rangle . \qquad (6.129) $$

Due to translational invariance in the (x, y) plane, we know that the density must be a function of z
alone. The presence of the wall at z = 0 breaks translational symmetry in the z direction. The number
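density is
$$ \begin{aligned} n(z) &= \operatorname{Tr}\Big[ e^{\beta(\mu\hat N - \hat H)} \sum_{i=1}^N \delta(r - r_i) \Big] \Big/ \operatorname{Tr} e^{\beta(\mu\hat N - \hat H)} \\ &= \Xi^{-1} \Big\{ \xi\, e^{-\beta v(z)} + \xi^2\, e^{-\beta v(z)} \int\! d^3r'\; e^{-\beta v(z')}\, e^{-\beta u(r - r')} + \ldots \Big\} \\ &= \xi\, e^{-\beta v(z)} + \xi^2\, e^{-\beta v(z)} \int\! d^3r'\; e^{-\beta v(z')} \Big[ e^{-\beta u(r - r')} - 1 \Big] + \ldots . \end{aligned} \qquad (6.130) $$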


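Note that the term in square brackets in the last line is the Mayer function $f(r - r') = e^{-\beta u(r - r')} - 1$. Consider the function
$$ e^{-\beta v(z)}\, e^{-\beta v(z')}\, f(r - r') = \begin{cases} 0 & \text{if } z < \frac{1}{2}a \text{ or } z' < \frac{1}{2}a \\ 0 & \text{if } |r - r'| > a \\ -1 & \text{if } z > \frac{1}{2}a \text{ and } z' > \frac{1}{2}a \text{ and } |r - r'| < a . \end{cases} \qquad (6.131) $$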

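Now consider the integral of the above function with respect to $r'$. Clearly the result depends on the value of $z$. If $z > \frac{3}{2}a$, then there is no excluded region in $r'$ and the integral is $(-1)$ times the full Mayer sphere volume, i.e. $-\frac{4}{3}\pi a^3$. If $z < \frac{1}{2}a$ the integral vanishes due to the $e^{-\beta v(z)}$ factor. For $z$ infinitesimally larger than $\frac{1}{2}a$, the integral is $(-1)$ times half the Mayer sphere volume, i.e. $-\frac{2}{3}\pi a^3$. For $z \in \big(\frac{a}{2}, \frac{3a}{2}\big)$ the integral interpolates between $-\frac{2}{3}\pi a^3$ and $-\frac{4}{3}\pi a^3$. Explicitly, one finds by elementary integration,
$$ \int\! d^3r'\; e^{-\beta v(z)}\, e^{-\beta v(z')}\, f(r - r') = \begin{cases} 0 & \text{if } z < \frac{1}{2}a \\ \Big[ -1 - \frac{3}{2}\Big(\frac{z}{a} - \frac{1}{2}\Big) + \frac{1}{2}\Big(\frac{z}{a} - \frac{1}{2}\Big)^3 \Big] \cdot \frac{2}{3}\pi a^3 & \text{if } \frac{1}{2}a < z < \frac{3}{2}a \\ -\frac{4}{3}\pi a^3 & \text{if } z > \frac{3}{2}a . \end{cases} \qquad (6.132) $$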
3

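After substituting $\xi = n + \frac{4}{3}\pi a^3 n^2 + O(n^3)$ to relate $\xi$ to the bulk density $n = n_\infty$, we obtain the desired result:
$$ n(z) = \begin{cases} 0 & \text{if } z < \frac{1}{2}a \\ n + \Big[ 1 - \frac{3}{2}\Big(\frac{z}{a} - \frac{1}{2}\Big) + \frac{1}{2}\Big(\frac{z}{a} - \frac{1}{2}\Big)^3 \Big] \cdot \frac{2}{3}\pi a^3\, n^2 & \text{if } \frac{1}{2}a < z < \frac{3}{2}a \\ n & \text{if } z > \frac{3}{2}a . \end{cases} \qquad (6.133) $$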

A sketch is provided in the right hand panel of fig. 6.13. Note that the density $n(z)$ vanishes identically for $z < \frac{1}{2}a$ due to the exclusion of the hard spheres by the wall. For $z$ between $\frac{1}{2}a$ and $\frac{3}{2}a$, there is a density enhancement, the origin of which has a simple physical interpretation. Since the wall excludes particles from the region $z < \frac{1}{2}a$, there is an empty slab of thickness $\frac{1}{2}a$ coating the interior of the wall. There are then no particles in this region to exclude neighbors to their right, hence the density builds up just on the other side of this slab. The effect vanishes to the order of the calculation past $z = \frac{3}{2}a$, where $n(z) = n$ returns to its bulk value. Had we calculated to higher order, we'd have found damped oscillations with spatial period $\lambda \sim a$.
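The closed form in eqn. 6.132 can be checked by slicing the truncated Mayer sphere into discs of constant $z'$ and integrating numerically. A sketch in Python (standard library only; function names are ours, units with $a = 1$), which also encodes the density profile of eqn. 6.133:

```python
import math

def restricted_mayer_integral(z, a=1.0, n_slices=2000):
    """int d^3r' e^{-beta v(z)} e^{-beta v(z')} f(r - r') for hard spheres at a hard wall,
    computed by summing discs of area pi (a^2 - (z' - z)^2) over the allowed z' > a/2."""
    if z < 0.5 * a:
        return 0.0                                   # e^{-beta v(z)} = 0
    lo, hi = max(z - a, 0.5 * a), z + a
    h = (hi - lo) / n_slices
    total = 0.0
    for i in range(n_slices):                        # midpoint rule in z'
        zp = lo + (i + 0.5) * h
        total += math.pi * (a * a - (zp - z) ** 2) * h
    return -total                                    # f = -1 inside the Mayer sphere

def restricted_mayer_closed_form(z, a=1.0):
    """Eqn (6.132)."""
    if z < 0.5 * a:
        return 0.0
    if z > 1.5 * a:
        return -4 * math.pi / 3 * a**3
    t = z / a - 0.5
    return (-1 - 1.5 * t + 0.5 * t**3) * 2 * math.pi / 3 * a**3

def density_profile(z, n, a=1.0):
    """Eqn (6.133): n(z) to order n^2, where n is the bulk density."""
    if z < 0.5 * a:
        return 0.0
    if z > 1.5 * a:
        return n
    t = z / a - 0.5
    return n + (1 - 1.5 * t + 0.5 * t**3) * 2 * math.pi / 3 * a**3 * n * n

for z in (0.6, 1.0, 1.4, 2.0):
    print(z, restricted_mayer_integral(z), restricted_mayer_closed_form(z), density_profile(z, 0.1))
```

The $O(n^2)$ term of `density_profile` is the sum of the $\frac{4}{3}\pi a^3 n^2$ shift from $\xi(n)$ and the restricted Mayer integral, as in the derivation above.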

6.4 Lee-Yang Theory

6.4.1 Analytic properties of the partition function

How can statistical mechanics describe phase transitions? This question was addressed in some beautiful mathematical analysis by Lee and Yang⁷. Consider the grand partition function $\Xi$,
$$ \Xi(T,V,z) = \sum_{N=0}^\infty z^N\, Q_N(T,V)\, \lambda_T^{-dN} , \qquad (6.134) $$
where
$$ Q_N(T,V) = \frac{1}{N!} \int\! d^dx_1 \cdots \int\! d^dx_N\; e^{-U(x_1, \ldots, x_N)/k_BT} \qquad (6.135) $$
is the contribution to the $N$-particle partition function from the potential energy $U$ (assuming no momentum-dependent potentials). For two-body central potentials, we have
$$ U(x_1, \ldots, x_N) = \sum_{i<j} v\big(|x_i - x_j|\big) . \qquad (6.136) $$

Suppose further that these classical particles have hard cores. Then for any finite volume, there must be some maximum number $N_V$ such that $Q_N(T,V)$ vanishes for $N > N_V$. This is because if $N > N_V$ at least two spheres must overlap, in which case the potential energy is infinite. The theoretical maximum packing density for hard spheres is achieved for a hexagonal close packed (HCP) lattice⁸, for which $f_{\rm HCP} = \frac{\pi}{3\sqrt{2}} = 0.74048$. If the spheres have radius $r_0$, then $N_V = V/\big(4\sqrt{2}\, r_0^3\big)$ is the maximum particle number.
Thus, if $V$ itself is finite, then $\Xi(T,V,z)$ is a finite degree polynomial in $z$, and may be factorized as
$$ \Xi(T,V,z) = \sum_{N=0}^{N_V} z^N\, Q_N(T,V)\, \lambda_T^{-dN} = \prod_{k=1}^{N_V} \Big( 1 - \frac{z}{z_k} \Big) , \qquad (6.137) $$
where $z_k(T,V)$ is one of the $N_V$ zeros of the grand partition function. Note that the $O(z^0)$ term is fixed to be unity. Note also that since the configuration integrals $Q_N(T,V)$ are all positive, $\Xi(z)$ is an increasing function along the positive real $z$ axis. In addition, since the coefficients of $z^N$ in the polynomial $\Xi(z)$ are all real, $\Xi(z) = 0$ implies $\overline{\Xi(z)} = \Xi(\bar z) = 0$, so the zeros of $\Xi(z)$ are either real and negative or else come in complex conjugate pairs.
For finite $N_V$, the situation is roughly as depicted in the left panel of fig. 6.14, with a set of $N_V$ zeros arranged in complex conjugate pairs (or negative real values). The zeros aren't necessarily distributed along a circle as shown in the figure, though. They could be anywhere, so long as they are symmetrically distributed about the $\mathrm{Re}(z)$ axis, and no zeros occur for $z$ real and nonnegative.
⁷ See C. N. Yang and T. D. Lee, Phys. Rev. 87, 404 (1952) and ibid, p. 410.
⁸ See e.g. http://en.wikipedia.org/wiki/Close-packing. For randomly close-packed hard spheres, one finds, from numerical simulations, $f_{\rm RCP} = 0.644$.
6.4. LEE-YANG THEORY 319

Figure 6.14: In the thermodynamic limit, the grand partition function can develop a singularity at
positive real fugacity z. The set of discrete zeros fuses into a branch cut.

Lee and Yang proved the existence of the limits
$$ \begin{aligned} \frac{p}{k_BT} &= \lim_{V\to\infty} \frac{1}{V} \ln \Xi(T,V,z) \\ n &= \lim_{V\to\infty} z\, \frac{\partial}{\partial z} \Big( \frac{1}{V} \ln \Xi(T,V,z) \Big) , \end{aligned} \qquad (6.138) $$
and notably the result
$$ n = z\, \frac{\partial}{\partial z} \Big( \frac{p}{k_BT} \Big) , \qquad (6.139) $$
which amounts to the commutativity of the thermodynamic limit $V \to \infty$ with the differential operator $z\,\partial_z$. In particular, $p(T,z)$ is a smooth function of $z$ in regions free of roots. If the roots do coalesce and pinch the positive real axis, then the density $n$ can be discontinuous, as in a first order phase transition, or a higher derivative $\partial^j p/\partial n^j$ can be discontinuous or divergent, as in a second order phase transition.

6.4.2 Electrostatic analogy

There is a beautiful analogy to the theory of two-dimensional electrostatics. We write
$$ \frac{p}{k_BT} = \frac{1}{V} \sum_{k=1}^{N_V} \ln\Big( 1 - \frac{z}{z_k} \Big) = -\sum_{k=1}^{N_V} \Big[ \phi(z - z_k) - \phi(0 - z_k) \Big] , \qquad (6.140) $$
where
$$ \phi(z) = -\frac{1}{V} \ln(z) \qquad (6.141) $$
is the complex potential due to a line charge of linear density $\lambda = V^{-1}$ located at the origin. The number density is then
$$ n = z\, \frac{\partial}{\partial z} \Big( \frac{p}{k_BT} \Big) = -z\, \frac{\partial}{\partial z} \sum_{k=1}^{N_V} \phi(z - z_k) , \qquad (6.142) $$
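to be evaluated for physical values of $z$, i.e. $z \in \mathbb{R}_+$. Since $\phi(z)$ is analytic,
$$ \frac{\partial\phi}{\partial\bar z} = \frac{1}{2}\, \frac{\partial\phi}{\partial x} + \frac{i}{2}\, \frac{\partial\phi}{\partial y} = 0 . \qquad (6.143) $$
If we decompose the complex potential $\phi = \phi_1 + i\phi_2$ into real and imaginary parts, the condition of analyticity is recast as the Cauchy-Riemann equations,
$$ \frac{\partial\phi_1}{\partial x} = \frac{\partial\phi_2}{\partial y} , \qquad \frac{\partial\phi_1}{\partial y} = -\frac{\partial\phi_2}{\partial x} . \qquad (6.144) $$
Thus,
$$ \begin{aligned} -\frac{\partial\phi}{\partial z} &= -\frac{1}{2}\, \frac{\partial\phi}{\partial x} + \frac{i}{2}\, \frac{\partial\phi}{\partial y} \\ &= -\frac{1}{2} \Big( \frac{\partial\phi_1}{\partial x} + \frac{\partial\phi_2}{\partial y} \Big) + \frac{i}{2} \Big( \frac{\partial\phi_1}{\partial y} - \frac{\partial\phi_2}{\partial x} \Big) \\ &= -\frac{\partial\phi_1}{\partial x} + i\, \frac{\partial\phi_1}{\partial y} = E_x - iE_y , \end{aligned} \qquad (6.145) $$
where $\boldsymbol{E} = -\boldsymbol{\nabla}\phi_1$ is the electric field. Suppose, then, that as $V \to \infty$ a continuous charge distribution develops, which crosses the positive real $z$ axis at a point $x \in \mathbb{R}_+$. Then
$$ \frac{n_+ - n_-}{x} = E_x(x^+) - E_x(x^-) = 4\pi\sigma(x) , \qquad (6.146) $$
where $\sigma$ is the linear charge density (assuming logarithmic two-dimensional potentials), or the two-dimensional charge density (if we extend the distribution along a third axis).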

6.4.3 Example

As an example, consider the function
$$ \Xi(z) = \frac{(1+z)^M\, \big(1 - z^M\big)}{1 - z} = (1+z)^M \big(1 + z + z^2 + \ldots + z^{M-1}\big) . \qquad (6.147) $$
The $(2M-1)$ degree polynomial has an $M$th order zero at $z = -1$ and $(M-1)$ simple zeros at $z = e^{2\pi ik/M}$, where $k \in \{1, \ldots, M-1\}$. Since $M$ serves as the maximum particle number $N_V$, we may assume that $V = M v_0$, and the $V \to \infty$ limit may be taken as $M \to \infty$. We then have
$$ \begin{aligned} \frac{p}{k_BT} &= \lim_{V\to\infty} \frac{1}{V} \ln \Xi(z) = \frac{1}{v_0} \lim_{M\to\infty} \frac{1}{M} \ln \Xi(z) \\ &= \frac{1}{v_0} \lim_{M\to\infty} \frac{1}{M} \Big[ M \ln(1+z) + \ln\big(1 - z^M\big) - \ln(1 - z) \Big] . \end{aligned} \qquad (6.148) $$

Figure 6.15: Fugacity z and pv0 /kB T versus dimensionless specific volume v/v0 for the example problem
discussed in the text.

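The limit depends on whether $|z| > 1$ or $|z| < 1$, and we obtain
$$ \frac{p\, v_0}{k_BT} = \begin{cases} \ln(1+z) & \text{if } |z| < 1 \\ \ln(1+z) + \ln z & \text{if } |z| > 1 . \end{cases} \qquad (6.149) $$
Thus,
$$ n = z\, \frac{\partial}{\partial z} \Big( \frac{p}{k_BT} \Big) = \begin{cases} \dfrac{1}{v_0} \cdot \dfrac{z}{1+z} & \text{if } |z| < 1 \\[2ex] \dfrac{1}{v_0} \cdot \Big[ \dfrac{z}{1+z} + 1 \Big] & \text{if } |z| > 1 . \end{cases} \qquad (6.150) $$
If we solve for $z(v)$, where $v = n^{-1}$, we find
$$ z = \begin{cases} \dfrac{v_0}{v - v_0} & \text{if } v > 2v_0 \\[2ex] \dfrac{v_0 - v}{2v - v_0} & \text{if } \frac{1}{2}v_0 < v < \frac{2}{3}v_0 . \end{cases} \qquad (6.151) $$
We then obtain the equation of state,
$$ \frac{p\, v_0}{k_BT} = \begin{cases} \ln\Big( \dfrac{v}{v - v_0} \Big) & \text{if } v > 2v_0 \\[2ex] \ln 2 & \text{if } \frac{2}{3}v_0 < v < 2v_0 \\[2ex] \ln\Big( \dfrac{v\,(v_0 - v)}{(2v - v_0)^2} \Big) & \text{if } \frac{1}{2}v_0 < v < \frac{2}{3}v_0 . \end{cases} \qquad (6.152) $$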
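A sketch of the toy model in Python (standard library only; names are ours). `log_xi_per_site` evaluates $(1/M)\ln\Xi$ at finite $M$ on the positive real axis, working with logarithms term by term so that large $M$ does not overflow, and `pressure` encodes the limiting equation of state 6.152 as $pv_0/k_BT$. The first shows the approach to the two branches of eqn. 6.149; the second exhibits the flat coexistence segment at $pv_0/k_BT = \ln 2$.

```python
import math

def log_xi_per_site(z, M):
    """(1/M) ln Xi(z) for Xi = (1+z)^M (1 - z^M)/(1 - z), real z > 0, z != 1.

    Logarithms are taken term by term so that large M does not overflow."""
    if z < 1:
        ln_geom = math.log1p(-z**M) - math.log1p(-z)          # ln[(1 - z^M)/(1 - z)]
    else:
        ln_geom = M * math.log(z) + math.log1p(-z**(-M)) - math.log(z - 1)
    return (M * math.log1p(z) + ln_geom) / M

def pressure(v, v0=1.0):
    """Limiting equation of state (6.152): returns p v0 / kB T for specific volume v."""
    if v > 2 * v0:
        return math.log(v / (v - v0))
    if v >= 2 * v0 / 3:
        return math.log(2.0)                                  # flat segment, z = 1
    if v > v0 / 2:
        return math.log(v * (v0 - v) / (2 * v - v0) ** 2)
    raise ValueError("v must exceed v0/2, the maximal density of the model")

for M in (10, 100, 10_000):
    print(M, log_xi_per_site(0.5, M), log_xi_per_site(2.0, M))
```

For $z < 1$ the finite-$M$ correction decays like $1/M$, and the three branches of `pressure` join continuously at $v = 2v_0$ and $v = \frac{2}{3}v_0$.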

6.5 Liquid State Physics

6.5.1 The many-particle distribution function

The virial expansion is typically applied to low-density systems. When the density is high, i.e. when $na^3 \sim 1$, where $a$ is a typical molecular or atomic length scale, the virial expansion is impractical. There are too many terms to compute, and to make progress one must use sophisticated resummation techniques to investigate the high density regime.
To elucidate the physics of liquids, it is useful to consider the properties of various correlation functions. These objects are derived from the general $N$-body Boltzmann distribution,
$$ f(x_1, \ldots, x_N;\, p_1, \ldots, p_N) = \begin{cases} Z_N^{-1} \cdot \dfrac{1}{N!}\, e^{-\beta \hat H_N(p,x)} & \text{OCE} \\[2ex] \Xi^{-1} \cdot \dfrac{1}{N!}\, e^{\beta\mu N}\, e^{-\beta \hat H_N(p,x)} & \text{GCE} . \end{cases} \qquad (6.153) $$
We assume a Hamiltonian of the form
$$ \hat H_N = \sum_{i=1}^N \frac{p_i^2}{2m} + W(x_1, \ldots, x_N) . \qquad (6.154) $$
i=1

The quantity
$$ f(x_1, \ldots, x_N;\, p_1, \ldots, p_N)\; \frac{d^dx_1\, d^dp_1}{h^d} \cdots \frac{d^dx_N\, d^dp_N}{h^d} \qquad (6.155) $$
is the probability of finding $N$ particles in the system, with particle #1 lying within $d^dx_1$ of $x_1$ and having momentum within $d^dp_1$ of $p_1$, etc. If we compute averages of quantities which only depend on the positions $\{x_j\}$ and not on the momenta $\{p_j\}$, then we may integrate out the momenta to obtain, in the OCE,
$$ P(x_1, \ldots, x_N) = Q_N^{-1} \cdot \frac{1}{N!}\, e^{-\beta W(x_1, \ldots, x_N)} , \qquad (6.156) $$
where $W$ is the total potential energy,
$$ W(x_1, \ldots, x_N) = \sum_i v(x_i) + \sum_{i<j} u(x_i - x_j) + \sum_{i<j<k} w(x_i - x_j,\, x_j - x_k) + \ldots , \qquad (6.157) $$
and $Q_N$ is the configuration integral,
$$ Q_N(T,V) = \frac{1}{N!} \int\! d^dx_1 \cdots \int\! d^dx_N\; e^{-\beta W(x_1, \ldots, x_N)} . \qquad (6.158) $$
We will, for the most part, consider only two-body central potentials as contributing to $W$, which is to say we will only retain the middle term on the RHS. Note that $P(x_1, \ldots, x_N)$ is invariant under any permutation of the particle labels.
6.5. LIQUID STATE PHYSICS 323

6.5.2 Averages over the distribution

To compute an average, one integrates over the distribution:
$$ \big\langle F(x_1, \ldots, x_N) \big\rangle = \int\! d^dx_1 \cdots \int\! d^dx_N\; P(x_1, \ldots, x_N)\, F(x_1, \ldots, x_N) . \qquad (6.159) $$
The overall $N$-particle probability density is normalized according to
$$ \int\! d^dx_1 \cdots \int\! d^dx_N\; P(x_1, \ldots, x_N) = 1 . \qquad (6.160) $$

The average local density is
$$ n_1(r) = \Big\langle \sum_i \delta(r - x_i) \Big\rangle = N \int\! d^dx_2 \cdots \int\! d^dx_N\; P(r, x_2, \ldots, x_N) . \qquad (6.161) $$
Note that the local density obeys the sum rule
$$ \int\! d^dr\; n_1(r) = N . \qquad (6.162) $$
In a translationally invariant system, $n_1 = n = N/V$ is a constant independent of position. The boundaries of a system will in general break translational invariance, so in order to maintain the notion of a translationally invariant system of finite total volume, one must impose periodic boundary conditions.
The two-particle density matrix $n_2(r_1, r_2)$ is defined by
$$ n_2(r_1, r_2) = \Big\langle \sum_{i\neq j} \delta(r_1 - x_i)\, \delta(r_2 - x_j) \Big\rangle = N(N-1) \int\! d^dx_3 \cdots \int\! d^dx_N\; P(r_1, r_2, x_3, \ldots, x_N) . \qquad (6.163) $$
As in the case of the one-particle density matrix, i.e. the local density $n_1(r)$, the two-particle density matrix satisfies a sum rule:
$$ \int\! d^dr_1 \int\! d^dr_2\; n_2(r_1, r_2) = N(N-1) . \qquad (6.164) $$

Generalizing further, one defines the $k$-particle density matrix as
$$ n_k(r_1, \ldots, r_k) = \Big\langle \sum_{i_1 \cdots i_k}^{\prime} \delta(r_1 - x_{i_1}) \cdots \delta(r_k - x_{i_k}) \Big\rangle = \frac{N!}{(N-k)!} \int\! d^dx_{k+1} \cdots \int\! d^dx_N\; P(r_1, \ldots, r_k, x_{k+1}, \ldots, x_N) , \qquad (6.165) $$
where the prime on the sum indicates that all the indices $i_1, \ldots, i_k$ are distinct. The corresponding sum rule is then
$$ \int\! d^dr_1 \cdots \int\! d^dr_k\; n_k(r_1, \ldots, r_k) = \frac{N!}{(N-k)!} . \qquad (6.166) $$

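The average potential energy can be expressed in terms of the distribution functions. Assuming only two-body interactions, we have
$$ \begin{aligned} \langle W \rangle &= \Big\langle \sum_{i<j} u(x_i - x_j) \Big\rangle \\ &= \tfrac{1}{2} \int\! d^dr_1 \int\! d^dr_2\; u(r_1 - r_2)\, \Big\langle \sum_{i\neq j} \delta(r_1 - x_i)\, \delta(r_2 - x_j) \Big\rangle \\ &= \tfrac{1}{2} \int\! d^dr_1 \int\! d^dr_2\; u(r_1 - r_2)\, n_2(r_1, r_2) . \end{aligned} \qquad (6.167) $$
As the separations $r_{ij} = |r_i - r_j|$ get large, we expect the correlations to vanish, in which case
$$ \begin{aligned} n_k(r_1, \ldots, r_k) &= \Big\langle \sum_{i_1 \cdots i_k}^{\prime} \delta(r_1 - x_{i_1}) \cdots \delta(r_k - x_{i_k}) \Big\rangle \\ &\xrightarrow[r_{ij}\to\infty]{} \sum_{i_1 \cdots i_k}^{\prime} \big\langle \delta(r_1 - x_{i_1}) \big\rangle \cdots \big\langle \delta(r_k - x_{i_k}) \big\rangle \\ &= \frac{N!}{(N-k)!} \cdot \frac{1}{N^k}\; n_1(r_1) \cdots n_1(r_k) \\ &= \Big(1 - \frac{1}{N}\Big)\Big(1 - \frac{2}{N}\Big) \cdots \Big(1 - \frac{k-1}{N}\Big)\; n_1(r_1) \cdots n_1(r_k) . \end{aligned} \qquad (6.168) $$
The $k$-particle distribution function is defined as the ratio
$$ g_k(r_1, \ldots, r_k) \equiv \frac{n_k(r_1, \ldots, r_k)}{n_1(r_1) \cdots n_1(r_k)} . \qquad (6.169) $$
For large separations, then,
$$ g_k(r_1, \ldots, r_k) \xrightarrow[r_{ij}\to\infty]{} \prod_{j=1}^{k-1} \Big( 1 - \frac{j}{N} \Big) . \qquad (6.170) $$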

For isotropic systems, the two-particle distribution function $g_2(r_1, r_2)$ depends only on the magnitude $|r_1 - r_2|$. As a function of this scalar separation, the function is known as the radial distribution function:
$$ g(r) \equiv g_2(r) = \frac{1}{n^2} \Big\langle \sum_{i\neq j} \delta(r - x_i)\, \delta(x_j) \Big\rangle = \frac{1}{V n^2} \Big\langle \sum_{i\neq j} \delta(r - x_i + x_j) \Big\rangle . \qquad (6.171) $$

The radial distribution function is of great importance in the physics of liquids because

• thermodynamic properties of the system can be related to g(r)

• g(r) is directly measurable by scattering experiments


Figure 6.16: Pair distribution functions for hard spheres of diameter $a$ at filling fraction $\eta = \frac{\pi}{6} a^3 n = 0.49$ (left) and for liquid Argon at $T = 85\,$K (right). Molecular dynamics data for hard spheres (points) is compared with the result of the Percus-Yevick approximation (see below in §6.5.8). Reproduced (without permission) from J.-P. Hansen and I. R. McDonald, Theory of Simple Liquids, fig 5.5. Experimental data on liquid argon are from the neutron scattering work of J. L. Yarnell et al., Phys. Rev. A 7, 2130 (1973). The data (points) are compared with molecular dynamics calculations by Verlet (1967) for a Lennard-Jones fluid.

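For example, in an isotropic system the average potential energy is given by
$$ \begin{aligned} \langle W \rangle &= \tfrac{1}{2} \int\! d^d r_1 \int\! d^d r_2 \; u(\mathbf{r}_1 - \mathbf{r}_2)\, n_2(\mathbf{r}_1, \mathbf{r}_2) \\ &= \tfrac{1}{2}\, n^2 \int\! d^d r_1 \int\! d^d r_2 \; u(\mathbf{r}_1 - \mathbf{r}_2)\, g\big(|\mathbf{r}_1 - \mathbf{r}_2|\big) \\ &= \frac{N^2}{2V} \int\! d^d r \; u(r)\, g(r)\,. \end{aligned} \tag{6.172} $$
For a three-dimensional system, the average internal (i.e. potential) energy per particle is
$$ \frac{\langle W \rangle}{N} = 2\pi n \int_0^\infty\! dr\, r^2 g(r)\, u(r)\,. \tag{6.173} $$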

Intuitively, $f(r)\,dr \equiv 4\pi r^2 n\, g(r)\, dr$ is the average number of particles lying at a radial distance between $r$ and $r + dr$ from a given reference particle. The total potential energy of interaction with the reference particle is then $\int_0^\infty dr\, f(r)\, u(r)$. Integrating over all $r$ and dividing by two to avoid double-counting recovers eqn. 6.173.
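As a concrete check of eqn. 6.173, here is a minimal sketch in reduced units. It assumes, purely for illustration, a Lennard-Jones potential $u(r) = 4\epsilon[(\sigma/r)^{12} - (\sigma/r)^6]$ and the crude model $g(r) = \Theta(r - \sigma)$, which keeps only the core exclusion and ignores all liquid structure. For this choice the integral can be done by hand, $\langle W \rangle / N = 2\pi n \cdot 4\epsilon\sigma^3 \big(\frac{1}{9} - \frac{1}{3}\big) = -\frac{16}{9}\pi n \epsilon \sigma^3$, which the quadrature should reproduce.

```python
import math

def energy_per_particle(u, g, n, r_min, r_max, steps=200_000):
    """<W>/N = 2*pi*n * int dr r^2 g(r) u(r)  (eqn. 6.173), evaluated with the
    composite trapezoid rule on [r_min, r_max]; g(r) is assumed to vanish below r_min."""
    h = (r_max - r_min) / steps
    total = 0.0
    for i in range(steps + 1):
        r = r_min + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * r * r * g(r) * u(r)
    return 2.0 * math.pi * n * total * h

# Hypothetical inputs in reduced units (epsilon = sigma = 1)
eps, sigma, n = 1.0, 1.0, 0.5
lj = lambda r: 4 * eps * ((sigma / r) ** 12 - (sigma / r) ** 6)
step_g = lambda r: 1.0 if r >= sigma else 0.0   # crude model: g(r) = Theta(r - sigma)

numeric = energy_per_particle(lj, step_g, n, r_min=sigma, r_max=60.0)
exact = -16 * math.pi * n * eps * sigma**3 / 9
print(numeric, exact)
```

Replacing `step_g` by a tabulated $g(r)$ from simulation or scattering data would give the potential energy per particle of an actual liquid; the step model is only a stand-in that makes the answer checkable.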
In the OCE, $g(r)$ obeys the sum rule
$$ \int\! d^d r\; g(r) = \frac{V}{N^2} \cdot N(N-1) = V - \frac{V}{N}\,, \tag{6.174} $$
hence
$$ n \int\! d^d r\, \big[ g(r) - 1 \big] = -1 \qquad \text{(OCE)}\,. \tag{6.175} $$
The function $h(r) \equiv g(r) - 1$ is called the pair correlation function.


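In the grand canonical formulation, we have
$$ \begin{aligned} n \int\! d^3 r\; h(r) &= \frac{\langle N \rangle}{V} \cdot \bigg[ \frac{\langle N(N-1) \rangle}{\langle N \rangle^2} \cdot V - V \bigg] \\ &= \frac{\langle N^2 \rangle - \langle N \rangle^2}{\langle N \rangle} - 1 \\ &= n k_B T \kappa_T - 1 \qquad \text{(GCE)}\,, \end{aligned} \tag{6.176} $$
where $\kappa_T$ is the isothermal compressibility. Note that in an ideal gas we have $h(r) = 0$ and $\kappa_T = \kappa_T^0 \equiv 1/n k_B T$. Self-condensed systems, such as liquids and solids far from criticality, are nearly incompressible, hence $0 < n k_B T \kappa_T \ll 1$, and therefore $n \int d^3 r\; h(r) \approx -1$. For incompressible systems, where $\kappa_T = 0$, this becomes an equality.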
As we shall see below in §6.5.4, the function $h(r)$, or rather its Fourier transform $\hat h(\mathbf{q})$, is directly measured in a scattering experiment. The question then arises as to which result applies: the OCE result from eqn. 6.175 or the GCE result from eqn. 6.176. The answer is that under almost all experimental conditions it is
the GCE result which applies. The reason for this is that the scattering experiment typically illuminates
only a subset of the entire system. This subsystem is in particle equilibrium with the remainder of the
system, hence it is appropriate to use the grand canonical ensemble. The OCE results would only apply
if the scattering experiment were to measure the entire system.
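The distinction between eqns. 6.175 and 6.176 shows up already in a toy calculation. Assume $N$ non-interacting particles in a box (so the only correlation is the fixed total number), and count the number $N_A$ found in a subvolume occupying a fraction $f$ of the box. Then $N_A$ is binomially distributed with mean $Nf$ and variance $Nf(1-f)$, so the combination $(\langle N_A^2 \rangle - \langle N_A \rangle^2)/\langle N_A \rangle - 1$ appearing in eqn. 6.176 equals $-f$. The sketch below evaluates it directly from the binomial distribution for a hypothetical $N = 1000$.

```python
import math

def subvolume_fluctuation(N, f):
    """For N independent particles, each found in a subvolume of fractional size f,
    return (<N_A^2> - <N_A>^2)/<N_A> - 1, the combination appearing in eqn. 6.176."""
    probs = [math.comb(N, k) * f**k * (1.0 - f) ** (N - k) for k in range(N + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    second = sum(k * k * p for k, p in enumerate(probs))
    return (second - mean**2) / mean - 1.0

for f in (0.001, 0.01, 0.1, 0.5, 1.0):
    print(f, subvolume_fluctuation(1000, f))   # equals -f for this binomial model
```

For a small illuminated region ($f \to 0$) the result approaches 0, the grand canonical ideal-gas value $n k_B T \kappa_T - 1$; only when the whole system is observed ($f = 1$) does one recover the canonical value $-1$ of eqn. 6.175.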

6.5.3 Virial equation of state

The virial of a mechanical system is defined to be
$$ G = \sum_i \mathbf{x}_i \cdot \mathbf{F}_i\,, \tag{6.177} $$
where $\mathbf{F}_i$ is the total force acting on particle $i$. If we average $G$ over time, we obtain
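$$ \begin{aligned} \langle G \rangle &= \lim_{\mathcal{T} \to \infty} \frac{1}{\mathcal{T}} \int_0^{\mathcal{T}}\! dt \sum_i \mathbf{x}_i \cdot \mathbf{F}_i \\ &= -\lim_{\mathcal{T} \to \infty} \frac{1}{\mathcal{T}} \int_0^{\mathcal{T}}\! dt \sum_i m\, \dot{\mathbf{x}}_i^2 \\ &= -3 N k_B T\,. \end{aligned} \tag{6.178} $$
Here, we have made use of
$$ \mathbf{x}_i \cdot \mathbf{F}_i = m\, \mathbf{x}_i \cdot \ddot{\mathbf{x}}_i = -m\, \dot{\mathbf{x}}_i^2 + \frac{d}{dt}\big( m\, \mathbf{x}_i \cdot \dot{\mathbf{x}}_i \big)\,, \tag{6.179} $$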

Figure 6.17: Monte Carlo pair distribution functions for liquid water. From A. K. Soper, Chem. Phys. 202, 295 (1996).

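as well as the fact that the time average of the total time derivative in eqn. 6.179 vanishes for bounded motion, ergodicity, and equipartition of kinetic energy. We have also assumed three space dimensions.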
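The step from the first to the second line of eqn. 6.178 relies on the total time derivative in eqn. 6.179 averaging to zero for bounded motion. A minimal sketch of this, assuming a single one-dimensional harmonic oscillator (an illustration only; no thermal averaging or equipartition is involved), checks that the time averages over one period satisfy $\langle x F \rangle = -\langle m \dot x^2 \rangle$.

```python
import math

def virial_time_averages(m=1.0, k=4.0, amplitude=0.7, samples=100_000):
    """Time averages of x*F and m*v^2 over one period of a 1D harmonic oscillator,
    x(t) = A cos(omega t) with F = -k x, sampled uniformly over a full period."""
    omega = math.sqrt(k / m)
    period = 2.0 * math.pi / omega
    avg_xF = avg_mv2 = 0.0
    for i in range(samples):
        t = period * i / samples
        x = amplitude * math.cos(omega * t)
        v = -amplitude * omega * math.sin(omega * t)
        avg_xF += x * (-k * x) / samples
        avg_mv2 += m * v * v / samples
    return avg_xF, avg_mv2

xF, mv2 = virial_time_averages()
print(xF, mv2)   # <x F> = -<m v^2> = -k A^2 / 2
```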
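In a bounded system, there are two contributions to the force $\mathbf{F}_i$. One contribution is from the surfaces which enclose the system. This is given by⁹
$$ \langle G \rangle_{\text{surfaces}} = \Big\langle \sum_i \mathbf{x}_i \cdot \mathbf{F}_i^{(\text{surf})} \Big\rangle = -3pV\,. \tag{6.180} $$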

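The remaining contribution is due to the interparticle forces. Thus,
$$ \frac{p}{k_B T} = \frac{N}{V} - \frac{1}{3 V k_B T} \Big\langle \sum_i \mathbf{x}_i \cdot \boldsymbol{\nabla}_i W \Big\rangle\,. \tag{6.181} $$
Invoking the definition of $g(r)$, we have
$$ p = n k_B T \left\{ 1 - \frac{2\pi n}{3 k_B T} \int_0^\infty\! dr\, r^3 g(r)\, u'(r) \right\}. \tag{6.182} $$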

As an alternate derivation, consider the First Law of Thermodynamics,
$$ d\Omega = -S\, dT - p\, dV - N\, d\mu\,, \tag{6.183} $$
from which we derive
$$ p = -\left(\frac{\partial \Omega}{\partial V}\right)_{T,\mu} = -\left(\frac{\partial F}{\partial V}\right)_{T,N}. \tag{6.184} $$
⁹ To derive this expression, note that $\mathbf{F}^{(\text{surf})}$ is directed inward and vanishes away from the surface. Each Cartesian direction $\alpha = (x, y, z)$ then contributes $-F_\alpha^{(\text{surf})} L_\alpha$, where $L_\alpha$ is the corresponding linear dimension. But $F_\alpha^{(\text{surf})} = p A_\alpha$, where $A_\alpha$ is the area of the corresponding face and $p$ is the pressure. Since $A_\alpha L_\alpha = V$, summing over the three possibilities for $\alpha$ one obtains eqn. 6.180.
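Now let $V \to \ell^3 V$, where $\ell$ is a scale parameter. Then
$$ p = -\frac{\partial \Omega}{\partial V} = -\frac{1}{3V} \frac{\partial}{\partial \ell}\, \Omega(T, \ell^3 V, \mu) \bigg|_{\ell = 1}. \tag{6.185} $$
Now
$$ \begin{aligned} \Xi(T, \ell^3 V, \mu) &= \sum_{N=0}^\infty \frac{1}{N!}\, e^{\beta\mu N} \lambda_T^{-3N} \int_{\ell^3 V}\!\! d^3 x_1 \cdots \int_{\ell^3 V}\!\! d^3 x_N \; e^{-\beta W(\mathbf{x}_1, \ldots, \mathbf{x}_N)} \\ &= \sum_{N=0}^\infty \frac{1}{N!} \big( e^{\beta\mu} \lambda_T^{-3} \ell^3 \big)^N \int_V\! d^3 x_1 \cdots \int_V\! d^3 x_N \; e^{-\beta W(\ell \mathbf{x}_1, \ldots, \ell \mathbf{x}_N)}\,. \end{aligned} \tag{6.186} $$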

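Thus,
$$ \begin{aligned} p &= -\frac{1}{3V} \frac{\partial \Omega(\ell^3 V)}{\partial \ell} \bigg|_{\ell=1} = \frac{k_B T}{3V} \frac{1}{\Xi} \frac{\partial \Xi(\ell^3 V)}{\partial \ell} \bigg|_{\ell=1} \\ &= \frac{k_B T}{3V} \frac{1}{\Xi} \sum_{N=0}^\infty \frac{1}{N!} \big( z \lambda_T^{-3} \big)^N \int_V\! d^3 x_1 \cdots \int_V\! d^3 x_N \; e^{-\beta W(\mathbf{x}_1, \ldots, \mathbf{x}_N)} \bigg[ 3N - \beta \sum_i \mathbf{x}_i \cdot \frac{\partial W}{\partial \mathbf{x}_i} \bigg] \\ &= n k_B T - \frac{1}{3V} \Big\langle \frac{\partial W}{\partial \ell} \Big\rangle_{\ell = 1}. \end{aligned} \tag{6.187} $$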
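Finally, from $W = \sum_{i<j} u(\ell \mathbf{x}_{ij})$ we have
$$ \Big\langle \frac{\partial W}{\partial \ell} \Big\rangle_{\ell=1} = \Big\langle \sum_{i<j} \mathbf{x}_{ij} \cdot \boldsymbol{\nabla} u(\mathbf{x}_{ij}) \Big\rangle = \frac{2\pi N^2}{V} \int_0^\infty\! dr\, r^3 g(r)\, u'(r)\,, \tag{6.188} $$
and hence
$$ p = n k_B T - \tfrac{2}{3} \pi n^2 \int_0^\infty\! dr\, r^3 g(r)\, u'(r)\,. \tag{6.189} $$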
Note that the density $n$ enters the equation of state explicitly on the RHS of the above equation, but also implicitly through the pair distribution function $g(r)$, which itself depends on both $n$ and $T$.
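At low density one may approximate $g(r) \approx e^{-\beta u(r)}$, in which case eqn. 6.189 reduces to $p / n k_B T = 1 + B_2\, n$ with $B_2 = -\frac{2\pi\beta}{3} \int_0^\infty dr\, r^3 e^{-\beta u(r)} u'(r)$. An integration by parts turns this into the familiar Mayer-function form $B_2 = -2\pi \int_0^\infty dr\, r^2 \big(e^{-\beta u(r)} - 1\big)$. The sketch below checks numerically that the two forms agree; the Lennard-Jones potential, the temperature, and the low-density approximation for $g(r)$ are all illustrative assumptions.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with an even number n of intervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

def lj(r):
    """Lennard-Jones potential in reduced units (epsilon = sigma = 1)."""
    return 4.0 * (r**-12 - r**-6)

def lj_prime(r):
    """du/dr for the Lennard-Jones potential."""
    return 4.0 * (-12.0 * r**-13 + 6.0 * r**-7)

def b2_virial_route(T, r0=0.5, r_max=50.0, n=100_000):
    """B2 from eqn. 6.189 with g(r) ~ exp(-u/T):  B2 = -(2 pi / 3T) int dr r^3 e^{-u/T} u'(r).
    Below r0 the Boltzmann factor underflows to zero, so the integral starts there."""
    integrand = lambda r: r**3 * math.exp(-lj(r) / T) * lj_prime(r)
    return -(2.0 * math.pi / (3.0 * T)) * simpson(integrand, r0, r_max, n)

def b2_mayer(T, r0=0.5, r_max=50.0, n=100_000):
    """Mayer-function form B2 = -2 pi int dr r^2 (e^{-u/T} - 1); for r < r0 the
    Mayer function equals -1 and that piece is integrated analytically."""
    integrand = lambda r: r * r * (math.exp(-lj(r) / T) - 1.0)
    return -2.0 * math.pi * (-r0**3 / 3.0 + simpson(integrand, r0, r_max, n))

T = 2.0   # k_B T / epsilon, an arbitrary illustrative temperature
print(b2_virial_route(T), b2_mayer(T))
```

The two numbers differ only by the truncation at `r_max`, where the boundary term of the integration by parts is not exactly zero.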

6.5.4 Correlations and scattering

Consider the scattering of a light or particle beam (e.g. photons or neutrons) from a liquid. We label the states of the beam particles by their wavevector $\mathbf{k}$ and we assume a general dispersion $\varepsilon_\mathbf{k}$. For photons, $\varepsilon_\mathbf{k} = \hbar c |\mathbf{k}|$, while for neutrons $\varepsilon_\mathbf{k} = \hbar^2 \mathbf{k}^2 / 2 m_n$. We assume a single scattering process with the liquid, during which the total momentum and energy of the liquid plus beam are conserved. We write
$$ \begin{aligned} \mathbf{k}' &= \mathbf{k} + \mathbf{q} \\ \varepsilon_{\mathbf{k}'} &= \varepsilon_\mathbf{k} + \hbar\omega\,, \end{aligned} \tag{6.190} $$

Figure 6.18: In a scattering experiment, a beam of particles interacts with a sample and the beam particles scatter off the sample particles. A momentum $\hbar\mathbf{q}$ and energy $\hbar\omega$ are transferred to the beam particle during such a collision. If $\omega = 0$, the scattering is said to be elastic. For $\omega \neq 0$, the scattering is inelastic.

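where $\mathbf{k}'$ is the final state of the scattered beam particle. Thus, the fluid transfers momentum $\Delta\mathbf{p} = \hbar\mathbf{q}$ and energy $\hbar\omega$ to the beam.
Now consider the scattering process between an initial state $|\, i, \mathbf{k} \,\rangle$ and a final state $|\, j, \mathbf{k}' \,\rangle$, where these states describe both the beam and the liquid. According to Fermi's Golden Rule, the scattering rate is
$$ \Gamma_{i\mathbf{k} \to j\mathbf{k}'} = \frac{2\pi}{\hbar} \big| \langle\, j, \mathbf{k}' \,|\, \mathcal{V} \,|\, i, \mathbf{k} \,\rangle \big|^2\, \delta(E_j - E_i + \hbar\omega)\,, \tag{6.191} $$
where $\mathcal{V}$ is the scattering potential and $E_i$ is the initial internal energy of the liquid. If $\mathbf{r}$ is the position of the beam particle and $\{\mathbf{x}_l\}$ are the positions of the liquid particles, then
$$ \mathcal{V}(\mathbf{r}) = \sum_{l=1}^N v(\mathbf{r} - \mathbf{x}_l)\,. \tag{6.192} $$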

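The differential scattering cross section (per unit frequency per unit solid angle) is
$$ \frac{\partial^2 \sigma}{\partial\Omega\, \partial\omega} = \frac{\hbar\, g(\varepsilon_{\mathbf{k}'})}{4\pi\, |\mathbf{v}_\mathbf{k}|} \sum_{i,j} P_i\, \Gamma_{i\mathbf{k} \to j\mathbf{k}'}\,, \tag{6.193} $$
where
$$ g(\varepsilon) = \int\! \frac{d^d k}{(2\pi)^d}\, \delta(\varepsilon - \varepsilon_\mathbf{k}) \tag{6.194} $$
is the density of states for the beam particle, $\mathbf{v}_\mathbf{k} = \hbar^{-1} \boldsymbol{\nabla}_\mathbf{k} \varepsilon_\mathbf{k}$ is its group velocity, and
$$ P_i = \frac{1}{Z}\, e^{-\beta E_i}\,. \tag{6.195} $$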

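Consider now the matrix element
$$ \begin{aligned} \langle\, j, \mathbf{k}' \,|\, \mathcal{V} \,|\, i, \mathbf{k} \,\rangle &= \frac{1}{V} \sum_{l=1}^N \Big\langle\, j \,\Big|\, \int\! d^d r\; e^{i(\mathbf{k} - \mathbf{k}') \cdot \mathbf{r}}\, v(\mathbf{r} - \mathbf{x}_l) \,\Big|\, i \,\Big\rangle \\ &= \frac{1}{V}\, \hat v(\mathbf{q}) \sum_{l=1}^N \langle\, j \,|\, e^{-i\mathbf{q} \cdot \mathbf{x}_l} \,|\, i \,\rangle\,, \end{aligned} \tag{6.196} $$
where we have assumed that the incident and scattered beams are plane waves. We then have
$$ \begin{aligned} \frac{\partial^2 \sigma}{\partial\Omega\, \partial\omega} &= \frac{\hbar\, g(\varepsilon_{\mathbf{k}+\mathbf{q}})\, |\hat v(\mathbf{q})|^2}{2\, |\boldsymbol{\nabla}_\mathbf{k} \varepsilon_\mathbf{k}|\, V^2} \sum_i \sum_j P_i\, \Big| \langle\, j \,|\, \sum_{l=1}^N e^{-i\mathbf{q} \cdot \mathbf{x}_l} \,|\, i \,\rangle \Big|^2\, \delta(E_j - E_i + \hbar\omega) \\ &= \frac{g(\varepsilon_{\mathbf{k}+\mathbf{q}})\, N}{4\pi\, |\boldsymbol{\nabla}_\mathbf{k} \varepsilon_\mathbf{k}|\, V^2}\, |\hat v(\mathbf{q})|^2\, S(\mathbf{q}, \omega)\,, \end{aligned} \tag{6.197} $$
where $S(\mathbf{q}, \omega)$ is the dynamic structure factor,
$$ S(\mathbf{q}, \omega) = \frac{2\pi\hbar}{N} \sum_i \sum_j P_i\, \Big| \langle\, j \,|\, \sum_{l=1}^N e^{-i\mathbf{q} \cdot \mathbf{x}_l} \,|\, i \,\rangle \Big|^2\, \delta(E_j - E_i + \hbar\omega)\,. \tag{6.198} $$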

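Note that for an arbitrary operator $A$,
$$ \begin{aligned} \sum_j \big| \langle\, j \,|\, A \,|\, i \,\rangle \big|^2\, \delta(E_j - E_i + \hbar\omega) &= \frac{1}{2\pi\hbar} \sum_j \int_{-\infty}^{\infty}\! dt\; e^{i(E_j - E_i + \hbar\omega)t/\hbar}\, \langle\, i \,|\, A^\dagger \,|\, j \,\rangle \langle\, j \,|\, A \,|\, i \,\rangle \\ &= \frac{1}{2\pi\hbar} \sum_j \int_{-\infty}^{\infty}\! dt\; e^{i\omega t}\, \langle\, i \,|\, A^\dagger \,|\, j \,\rangle \langle\, j \,|\, e^{i\hat H t/\hbar} A\, e^{-i\hat H t/\hbar} \,|\, i \,\rangle \\ &= \frac{1}{2\pi\hbar} \int_{-\infty}^{\infty}\! dt\; e^{i\omega t}\, \langle\, i \,|\, A^\dagger(0)\, A(t) \,|\, i \,\rangle\,. \end{aligned} \tag{6.199} $$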

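Thus,
$$ \begin{aligned} S(\mathbf{q}, \omega) &= \frac{1}{N} \int_{-\infty}^{\infty}\! dt\; e^{i\omega t} \sum_i P_i \sum_{l,l'} \langle\, i \,|\, e^{i\mathbf{q} \cdot \mathbf{x}_l(0)}\, e^{-i\mathbf{q} \cdot \mathbf{x}_{l'}(t)} \,|\, i \,\rangle \\ &= \frac{1}{N} \int_{-\infty}^{\infty}\! dt\; e^{i\omega t} \sum_{l,l'} \big\langle e^{i\mathbf{q} \cdot \mathbf{x}_l(0)}\, e^{-i\mathbf{q} \cdot \mathbf{x}_{l'}(t)} \big\rangle\,, \end{aligned} \tag{6.200} $$
where the angular brackets in the last line denote a thermal expectation value of a quantum mechanical operator. If we integrate over all frequencies, we obtain the equal time correlator,
$$ \begin{aligned} S(\mathbf{q}) = \int_{-\infty}^{\infty}\! \frac{d\omega}{2\pi}\, S(\mathbf{q}, \omega) &= \frac{1}{N} \sum_{l,l'} \big\langle e^{i\mathbf{q} \cdot (\mathbf{x}_l - \mathbf{x}_{l'})} \big\rangle \\ &= N \delta_{\mathbf{q},0} + 1 + n \int\! d^d r\; e^{-i\mathbf{q} \cdot \mathbf{r}}\, \big[ g(r) - 1 \big]\,, \end{aligned} \tag{6.201} $$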

Figure 6.19: Comparison of the static structure factor as determined by neutron scattering work of J.
L. Yarnell et al., Phys. Rev. A 7, 2130 (1973) with molecular dynamics calculations by Verlet (1967) for
a Lennard-Jones fluid.

known as the static structure factor¹⁰. Note that $S(\mathbf{q} = 0) = N$, since all the phases $e^{i\mathbf{q} \cdot (\mathbf{x}_i - \mathbf{x}_j)}$ are then unity. As $q \to \infty$, the phases oscillate rapidly with changes in the distances $|\mathbf{x}_i - \mathbf{x}_j|$, and average out to zero. However, the 'diagonal' terms in the sum, i.e. those with $i = j$, always contribute a total of 1 to $S(\mathbf{q})$. Therefore in the $q \to \infty$ limit we have $S(q \to \infty) = 1$.
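To see the two limits $S(\mathbf{q} = 0) = N$ and $S(q \to \infty) = 1$ concretely, the sketch below evaluates the single-configuration estimator $S(q) = N^{-1} \big| \sum_l e^{iqx_l} \big|^2$ for a hypothetical one-dimensional snapshot of $N = 50$ uncorrelated points (an ideal-gas-like configuration, not a liquid). At any single large $q$ this estimator fluctuates strongly about 1 (for random phases it is roughly exponentially distributed), so the code averages over many values of $q$.

```python
import cmath
import random

def structure_factor(positions, q):
    """Single-configuration estimate S(q) = (1/N) |sum_l exp(i q x_l)|^2, cf. eqn. 6.201."""
    amplitude = sum(cmath.exp(1j * q * x) for x in positions)
    return abs(amplitude) ** 2 / len(positions)

rng = random.Random(12345)                 # fixed seed for reproducibility
xs = [rng.random() for _ in range(50)]     # N = 50 uncorrelated points in [0, 1)

s0 = structure_factor(xs, 0.0)             # every phase equals 1, so S(0) = N
qs = [100.0 + 0.9 * j for j in range(10_000)]
avg = sum(structure_factor(xs, q) for q in qs) / len(qs)
print(s0, avg)                             # s0 = 50.0; avg should be close to 1
```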
In general, the detectors used in a scattering experiment are sensitive to the energy of the scattered beam
particles, although there is always a finite experimental resolution, both in q and ω. This means that
what is measured is actually something like
$$ S_{\text{meas}}(\mathbf{q}, \omega) = \int\! d^d q' \int\! d\omega'\; F(\mathbf{q} - \mathbf{q}')\, G(\omega - \omega')\, S(\mathbf{q}', \omega')\,, \tag{6.202} $$
where $F$ and $G$ are essentially Gaussian functions of their arguments, with widths given by the experimental resolution. If one integrates over all frequencies $\omega$, i.e. if one simply counts scattered particles as a function of $\mathbf{q}$ but without any discrimination of their energies, then one measures the static structure factor $S(\mathbf{q})$. Elastic scattering is determined by $S(\mathbf{q}, \omega = 0)$, i.e. no energy transfer.

6.5.5 Correlation and response

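Suppose an external potential $v(\mathbf{x})$ is also present. Then
$$ P(\mathbf{x}_1, \ldots, \mathbf{x}_N) = \frac{1}{Q_N[v]} \cdot \frac{1}{N!}\, e^{-\beta W(\mathbf{x}_1, \ldots, \mathbf{x}_N)}\, e^{-\beta \sum_i v(\mathbf{x}_i)}\,, \tag{6.203} $$
where
$$ Q_N[v] = \frac{1}{N!} \int\! d^d x_1 \cdots \int\! d^d x_N\; e^{-\beta W(\mathbf{x}_1, \ldots, \mathbf{x}_N)}\, e^{-\beta \sum_i v(\mathbf{x}_i)}\,. \tag{6.204} $$
The Helmholtz free energy is then
$$ F = -\frac{1}{\beta} \ln \Big[ \lambda_T^{-dN}\, Q_N[v] \Big]\,. \tag{6.205} $$
¹⁰ We may write $\delta_{\mathbf{q},0} = \frac{1}{V} (2\pi)^d \delta(\mathbf{q})$.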

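Now consider the functional derivative
$$ \frac{\delta F}{\delta v(\mathbf{r})} = -\frac{1}{\beta} \cdot \frac{1}{Q_N} \cdot \frac{\delta Q_N}{\delta v(\mathbf{r})}\,. \tag{6.206} $$
Using
$$ \sum_i v(\mathbf{x}_i) = \int\! d^d r\; v(\mathbf{r}) \sum_i \delta(\mathbf{r} - \mathbf{x}_i)\,, \tag{6.207} $$
we obtain
$$ \frac{\delta F}{\delta v(\mathbf{r})} = \int\! d^d x_1 \cdots \int\! d^d x_N\; P(\mathbf{x}_1, \ldots, \mathbf{x}_N) \sum_i \delta(\mathbf{r} - \mathbf{x}_i) = n_1(\mathbf{r})\,, \tag{6.208} $$
which is the local density at $\mathbf{r}$.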


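Next, consider the response function,
$$ \begin{aligned} \chi(\mathbf{r}, \mathbf{r}') \equiv \frac{\delta n_1(\mathbf{r})}{\delta v(\mathbf{r}')} &= \frac{\delta^2 F[v]}{\delta v(\mathbf{r})\, \delta v(\mathbf{r}')} \\ &= \frac{1}{\beta} \cdot \frac{1}{Q_N^2}\, \frac{\delta Q_N}{\delta v(\mathbf{r})}\, \frac{\delta Q_N}{\delta v(\mathbf{r}')} - \frac{1}{\beta} \cdot \frac{1}{Q_N}\, \frac{\delta^2 Q_N}{\delta v(\mathbf{r})\, \delta v(\mathbf{r}')} \\ &= \beta\, n_1(\mathbf{r})\, n_1(\mathbf{r}') - \beta\, n_1(\mathbf{r})\, \delta(\mathbf{r} - \mathbf{r}') - \beta\, n_2(\mathbf{r}, \mathbf{r}')\,. \end{aligned} \tag{6.209} $$
In an isotropic system, $\chi(\mathbf{r}, \mathbf{r}') = \chi(\mathbf{r} - \mathbf{r}')$ is a function of the coordinate separation, and
$$ \begin{aligned} -k_B T\, \chi(\mathbf{r} - \mathbf{r}') &= -n^2 + n\, \delta(\mathbf{r} - \mathbf{r}') + n^2 g\big(|\mathbf{r} - \mathbf{r}'|\big) \\ &= n^2 h\big(|\mathbf{r} - \mathbf{r}'|\big) + n\, \delta(\mathbf{r} - \mathbf{r}')\,. \end{aligned} \tag{6.210} $$
Taking the Fourier transform,
$$ -k_B T\, \hat\chi(\mathbf{q}) = n + n^2 \hat h(\mathbf{q}) = n\, S(\mathbf{q})\,. \tag{6.211} $$
We may also write
$$ \frac{\kappa_T}{\kappa_T^0} = 1 + n\, \hat h(0) = -\frac{k_B T}{n}\, \hat\chi(0)\,, \tag{6.212} $$
i.e. $\kappa_T = -\hat\chi(0)/n^2$.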
What does this all mean? Suppose we have an isotropic system which is subjected to a weak, spatially
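inhomogeneous potential $v(\mathbf{r})$. We expect the density $n(\mathbf{r})$ in the presence of the inhomogeneous potential itself to be inhomogeneous. The first corrections to the $v = 0$ value $n = n_0$ are linear in $v$, and given by
$$ \begin{aligned} \delta n(\mathbf{r}) &= \int\! d^d r'\; \chi(\mathbf{r}, \mathbf{r}')\, v(\mathbf{r}') \\ &= -\beta n_0\, v(\mathbf{r}) - \beta n_0^2 \int\! d^d r'\; h(\mathbf{r} - \mathbf{r}')\, v(\mathbf{r}')\,. \end{aligned} \tag{6.213} $$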

Note that if v(r) > 0 it becomes energetically more costly for a particle to be at r. Accordingly, the
density response is negative, and proportional to the ratio v(r)/kB T – this is the first term in the above
equation. If there were no correlations between the particles, then h = 0 and this would be the entire
story. However, the particles in general are correlated. Consider, for example, the case of hard spheres
of diameter a, and let there be a repulsive potential at r = 0. This means that it is less likely for a
particle to be centered anywhere within a distance a of the origin. But then it will be more likely to find
a particle in the next ‘shell’ of radial thickness a.

6.5.6 BBGKY hierarchy

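The distribution functions satisfy a hierarchy of integro-differential equations known as the BBGKY hierarchy¹¹. In homogeneous systems, we have
$$ g_k(\mathbf{r}_1, \ldots, \mathbf{r}_k) = \frac{N!}{(N-k)!}\, \frac{1}{n^k} \int\! d^d x_{k+1} \cdots \int\! d^d x_N\; P(\mathbf{r}_1, \ldots, \mathbf{r}_k, \mathbf{x}_{k+1}, \ldots, \mathbf{x}_N)\,, \tag{6.214} $$
where
$$ P(\mathbf{x}_1, \ldots, \mathbf{x}_N) = \frac{1}{Q_N} \cdot \frac{1}{N!}\, e^{-\beta W(\mathbf{x}_1, \ldots, \mathbf{x}_N)}\,. \tag{6.215} $$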

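Taking the gradient with respect to $\mathbf{r}_1$, we have
$$ \begin{aligned} \frac{\partial}{\partial \mathbf{r}_1}\, g_k(\mathbf{r}_1, \ldots, \mathbf{r}_k) &= \frac{1}{Q_N} \cdot \frac{n^{-k}}{(N-k)!} \int\! d^d x_{k+1} \cdots \int\! d^d x_N\; e^{-\beta \sum_{k<i<j} u(\mathbf{x}_{ij})} \\ &\qquad \times \frac{\partial}{\partial \mathbf{r}_1} \Big( e^{-\beta \sum_{i<j\le k} u(\mathbf{r}_{ij})} \cdot e^{-\beta \sum_{i\le k<j} u(\mathbf{r}_i - \mathbf{x}_j)} \Big)\,, \end{aligned} \tag{6.216} $$
where $\sum_{k<i<j}$ means to sum on indices $i$ and $j$ such that $i < j$ and $k < i$, i.e.
$$ \begin{aligned} \sum_{k<i<j} u(\mathbf{x}_{ij}) &\equiv \sum_{i=k+1}^{N-1} \sum_{j=i+1}^{N} u\big(\mathbf{x}_i - \mathbf{x}_j\big) \\ \sum_{i<j\le k} u(\mathbf{r}_{ij}) &\equiv \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} u\big(\mathbf{r}_i - \mathbf{r}_j\big) \\ \sum_{i\le k<j} u(\mathbf{r}_i - \mathbf{x}_j) &\equiv \sum_{i=1}^{k} \sum_{j=k+1}^{N} u(\mathbf{r}_i - \mathbf{x}_j)\,. \end{aligned} $$
¹¹ So named after Bogoliubov, Born, Green, Kirkwood, and Yvon.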

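Now
$$ \begin{aligned} \frac{\partial}{\partial \mathbf{r}_1} \Big( e^{-\beta \sum_{i<j\le k} u(\mathbf{r}_{ij})} \cdot e^{-\beta \sum_{i\le k<j} u(\mathbf{r}_i - \mathbf{x}_j)} \Big) &= -\beta \bigg\{ \sum_{1<j\le k} \frac{\partial u(\mathbf{r}_1 - \mathbf{r}_j)}{\partial \mathbf{r}_1} + \sum_{j>k} \frac{\partial u(\mathbf{r}_1 - \mathbf{x}_j)}{\partial \mathbf{r}_1} \bigg\} \\ &\qquad \times e^{-\beta \sum_{i<j\le k} u(\mathbf{r}_{ij})} \cdot e^{-\beta \sum_{i\le k<j} u(\mathbf{r}_i - \mathbf{x}_j)}\,, \end{aligned} \tag{6.217} $$
hence, using the permutation symmetry of the $\mathbf{x}_j$ with $j > k$,
$$ \begin{aligned} \frac{\partial}{\partial \mathbf{r}_1}\, g_k(\mathbf{r}_1, \ldots, \mathbf{r}_k) &= -\beta\, g_k(\mathbf{r}_1, \ldots, \mathbf{r}_k) \sum_{j=2}^k \frac{\partial u(\mathbf{r}_1 - \mathbf{r}_j)}{\partial \mathbf{r}_1} \\ &\qquad - \beta\, (N-k)\, \frac{N!}{(N-k)!\, n^k} \int\! d^d x_{k+1} \cdots \int\! d^d x_N\; P(\mathbf{r}_1, \ldots, \mathbf{r}_k, \mathbf{x}_{k+1}, \ldots, \mathbf{x}_N)\, \frac{\partial u(\mathbf{r}_1 - \mathbf{x}_{k+1})}{\partial \mathbf{r}_1} \\ &= -\beta\, g_k(\mathbf{r}_1, \ldots, \mathbf{r}_k) \sum_{j=2}^k \frac{\partial u(\mathbf{r}_1 - \mathbf{r}_j)}{\partial \mathbf{r}_1} - \beta n \int\! d^d x_{k+1}\; g_{k+1}(\mathbf{r}_1, \ldots, \mathbf{r}_k, \mathbf{x}_{k+1})\, \frac{\partial u(\mathbf{r}_1 - \mathbf{x}_{k+1})}{\partial \mathbf{r}_1}\,. \end{aligned} \tag{6.218} $$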

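Thus, we obtain the BBGKY hierarchy:
$$ -k_B T\, \frac{\partial g_k(\mathbf{r}_1, \ldots, \mathbf{r}_k)}{\partial \mathbf{r}_1} = g_k(\mathbf{r}_1, \ldots, \mathbf{r}_k) \sum_{j=2}^k \frac{\partial u(\mathbf{r}_1 - \mathbf{r}_j)}{\partial \mathbf{r}_1} + n \int\! d^d r'\; g_{k+1}(\mathbf{r}_1, \ldots, \mathbf{r}_k, \mathbf{r}')\, \frac{\partial u(\mathbf{r}_1 - \mathbf{r}')}{\partial \mathbf{r}_1}\,. \tag{6.219} $$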

The BBGKY hierarchy is an infinite tower of coupled integro-differential equations, relating gk to gk+1
for all k. If we approximate gk at some level k in terms of equal or lower order distributions, then we
obtain a closed set of equations which in principle can be solved, at least numerically. For example, the
Kirkwood approximation closes the hierarchy at order k = 2 by imposing the condition

$$ g_3(\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3) \equiv g(\mathbf{r}_1 - \mathbf{r}_2)\, g(\mathbf{r}_1 - \mathbf{r}_3)\, g(\mathbf{r}_2 - \mathbf{r}_3)\,. \tag{6.220} $$

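This results in the single integro-differential equation
$$ -k_B T\, \boldsymbol{\nabla} g(\mathbf{r}) = g(\mathbf{r})\, \boldsymbol{\nabla} u(\mathbf{r}) + n \int\! d^d r'\; g(\mathbf{r})\, g(\mathbf{r}')\, g(\mathbf{r} - \mathbf{r}')\, \boldsymbol{\nabla} u(\mathbf{r} - \mathbf{r}')\,. \tag{6.221} $$
This is known as the Born-Green-Yvon (BGY) equation. In practice, the BGY equation, which must be solved numerically, gives adequate results only at low densities.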

6.5.7 Ornstein-Zernike theory

The direct correlation function c(r) is defined by the equation


$$ h(r) = c(r) + n \int\! d^3 r'\; h(|\mathbf{r} - \mathbf{r}'|)\, c(r')\,, \tag{6.222} $$

where h(r) = g(r) − 1 and we assume an isotropic system. This is called the Ornstein-Zernike equation.
The first term, c(r), accounts for local correlations, which are then propagated in the second term to
account for long-ranged correlations.
The OZ equation is an integral equation, but it becomes a simple algebraic one upon Fourier transforming:

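$$ \hat h(q) = \hat c(q) + n\, \hat h(q)\, \hat c(q)\,, \tag{6.223} $$
the solution of which is
$$ \hat h(q) = \frac{\hat c(q)}{1 - n\, \hat c(q)}\,. \tag{6.224} $$
The static structure factor is then
$$ S(q) = 1 + n\, \hat h(q) = \frac{1}{1 - n\, \hat c(q)}\,. \tag{6.225} $$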

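In the grand canonical ensemble, we can write
$$ \kappa_T = \frac{1 + n\, \hat h(0)}{n k_B T} = \frac{1}{n k_B T} \cdot \frac{1}{1 - n\, \hat c(0)} \quad \Longrightarrow \quad n\, \hat c(0) = 1 - \frac{\kappa_T^0}{\kappa_T}\,, \tag{6.226} $$
where $\kappa_T^0 = 1/n k_B T$ is the ideal gas isothermal compressibility.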


At this point, we have merely substituted one unknown function, h(r), for another, namely c(r). To close
the system, we need to relate c(r) to h(r) again in some way. There are various approximation schemes
which do just this.

6.5.8 Percus-Yevick equation

In the Percus-Yevick approximation, we take
$$ c(r) = \big(1 - e^{\beta u(r)}\big) \cdot g(r)\,. \tag{6.227} $$

Note that $c(r)$ vanishes whenever the potential $u(r)$ itself vanishes. This results in the following integral equation for the pair distribution function $g(r)$:
$$ g(r) = e^{-\beta u(r)} + n\, e^{-\beta u(r)} \int\! d^3 r'\; \big[ g(|\mathbf{r} - \mathbf{r}'|) - 1 \big] \cdot \big[ 1 - e^{\beta u(r')} \big]\, g(r')\,. \tag{6.228} $$
This is the Percus-Yevick equation. Remarkably, the Percus-Yevick (PY) equation can be solved analytically for the case of hard spheres, where $u(r) = \infty$ for $r \le a$ and $u(r) = 0$ for $r > a$, where $a$ is the hard sphere diameter. Define the function $y(r) = e^{\beta u(r)} g(r)$, in which case
$$ c(r) = y(r)\, f(r) = \begin{cases} -y(r) & r \le a \\ 0 & r > a\,. \end{cases} \tag{6.229} $$
Here, $f(r) = e^{-\beta u(r)} - 1$ is the Mayer function. We remark that the definition of $y(r)$ may cause some concern for the hard sphere system, because of the $e^{\beta u(r)}$ term, which diverges severely for $r \le a$. However, $g(r)$ vanishes in this limit, and their product $y(r)$ is in fact finite! The PY equation may then be written for the function $y(r)$ as
$$ y(r) = 1 + n \int\limits_{r' < a}\! d^3 r'\; y(r') - n \int\limits_{\substack{r' < a \\ |\mathbf{r} - \mathbf{r}'| > a}}\! d^3 r'\; y(r')\, y(|\mathbf{r} - \mathbf{r}'|)\,. \tag{6.230} $$
This has been solved using Laplace transform methods by M. S. Wertheim, J. Math. Phys. 5, 643 (1964).
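The final result for $c(r)$ is
$$ c(r) = -\Big[ \lambda_1 + 6\eta\, \lambda_2 \Big(\frac{r}{a}\Big) + \tfrac{1}{2}\eta\, \lambda_1 \Big(\frac{r}{a}\Big)^3 \Big] \cdot \Theta(a - r)\,, \tag{6.231} $$
where $\eta = \frac{1}{6}\pi a^3 n$ is the packing fraction and
$$ \lambda_1 = \frac{(1 + 2\eta)^2}{(1 - \eta)^4}\,, \qquad \lambda_2 = -\frac{(1 + \frac{1}{2}\eta)^2}{(1 - \eta)^4}\,. \tag{6.232} $$
This leads to the equation of state
$$ p = n k_B T \cdot \frac{1 + \eta + \eta^2}{(1 - \eta)^3}\,. \tag{6.233} $$
This gets $B_2$ and $B_3$ exactly right. The accuracy of the PY approximation for higher order virial coefficients is shown in table 6.1.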
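Since $p/n k_B T = 1 + \sum_{k \ge 1} B_{k+1} n^k$ and $\eta = v_0 n$ with $v_0 = \pi a^3/6$, expanding eqn. 6.233 as $\sum_k c_k \eta^k$ gives $B_{k+1} = c_k v_0^k$, in particular $B_2 = 4 v_0 = \frac{2}{3}\pi a^3$. The ratios $B_{k+1}/B_2^k = c_k/4^k$ are therefore pure numbers. The short sketch below computes them exactly from the binomial series of $(1-\eta)^{-3}$; the printed values can be compared directly with the PY column of table 6.1.

```python
from math import comb

def py_series(kmax):
    """Coefficients c_k of (1 + eta + eta^2) / (1 - eta)^3 = sum_k c_k eta^k (eqn. 6.233)."""
    base = [comb(k + 2, 2) for k in range(kmax + 1)]   # (1 - eta)^-3 = sum_k C(k+2, 2) eta^k
    return [sum(base[k - j] for j in range(3) if k >= j) for k in range(kmax + 1)]

c = py_series(6)
print(c)   # [1, 4, 10, 19, 31, 46, 64]
for k in range(3, 7):   # B_{k+1} / B_2^k = c_k / 4^k, since B_{k+1} = c_k v0^k and B_2 = 4 v0
    print(f"B_{k + 1}/B_2^{k} = {c[k]}/{4**k} = {c[k] / 4**k:.4f}")
```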
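To obtain the equation of state from eqn. 6.231, we invoke the compressibility equation,
$$ n k_B T \kappa_T = k_B T \left(\frac{\partial n}{\partial p}\right)_T = \frac{1}{1 - n\, \hat c(0)}\,. \tag{6.234} $$
We therefore need
$$ \begin{aligned} \hat c(0) &= \int\! d^3 r\; c(r) \\ &= -4\pi a^3 \int_0^1\! dx\; x^2 \big( \lambda_1 + 6\eta\, \lambda_2\, x + \tfrac{1}{2}\eta\, \lambda_1\, x^3 \big) \\ &= -4\pi a^3 \big( \tfrac{1}{3} \lambda_1 + \tfrac{3}{2} \eta\, \lambda_2 + \tfrac{1}{12} \eta\, \lambda_1 \big)\,. \end{aligned} \tag{6.235} $$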

quantity          exact      PY       HNC
$B_4/B_2^3$       0.28695    0.2969   0.2092
$B_5/B_2^4$       0.1103     0.1211   0.0493
$B_6/B_2^5$       0.0386     0.0449   0.0281
$B_7/B_2^6$       0.0138     0.0156   –

Table 6.1: Comparison of exact (Monte Carlo) results to those of the Percus-Yevick (PY) and hypernetted chains (HNC) approximations for hard spheres in three dimensions. Sources: Hansen and McDonald (1990) and Reichl (1998)

With $\eta = \frac{1}{6}\pi a^3 n$ and using the definitions of $\lambda_{1,2}$ in eqn. 6.232, one finds
$$ 1 - n\, \hat c(0) = \frac{1 + 4\eta + 4\eta^2}{(1 - \eta)^4}\,. \tag{6.236} $$
We then have, from the compressibility equation,
$$ \frac{\pi a^3}{6 k_B T}\, \frac{\partial p}{\partial \eta} = \frac{1 + 4\eta + 4\eta^2}{(1 - \eta)^4}\,. \tag{6.237} $$
Integrating, we obtain $p(\eta)$ up to a constant. The constant is set so that $p = 0$ when $n = 0$. The result is eqn. 6.233.
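The algebra leading from eqn. 6.235 to eqns. 6.236 and 6.233 is easy to get wrong, so here is a minimal numerical cross-check (a sketch, not part of the original derivation). It integrates Wertheim's $c(r)$ of eqn. 6.231 over the core to obtain $\hat c(0)$, compares $1 - n\hat c(0)$ with $(1 + 2\eta)^2/(1 - \eta)^4$, and integrates the compressibility relation in the form $\frac{\pi a^3}{6 k_B T} \frac{\partial p}{\partial \eta} = 1 - n\hat c(0)$ from $\eta = 0$, comparing with $\beta p \cdot \frac{\pi a^3}{6} = \eta (1 + \eta + \eta^2)/(1 - \eta)^3$. The diameter is set to $a = 1$ and the sample packing fractions are arbitrary.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

def py_lambdas(eta):
    """lambda_1, lambda_2 of eqn. 6.232."""
    lam1 = (1 + 2 * eta) ** 2 / (1 - eta) ** 4
    lam2 = -((1 + 0.5 * eta) ** 2) / (1 - eta) ** 4
    return lam1, lam2

def one_minus_n_chat0(eta, a=1.0):
    """1 - n c_hat(0), with c_hat(0) = int d^3r c(r) computed by quadrature of eqn. 6.231."""
    lam1, lam2 = py_lambdas(eta)
    n = 6 * eta / (math.pi * a**3)
    def integrand(r):                      # 4 pi r^2 c(r) inside the core, r <= a
        x = r / a
        return -4 * math.pi * r * r * (lam1 + 6 * eta * lam2 * x + 0.5 * eta * lam1 * x**3)
    return 1 - n * simpson(integrand, 0.0, a)

def reduced_pressure(eta):
    """beta p (pi a^3 / 6), obtained by integrating the compressibility relation from 0 to eta."""
    return simpson(lambda s: (1 + 2 * s) ** 2 / (1 - s) ** 4, 0.0, eta)

for eta in (0.1, 0.3, 0.45):
    closed_form = eta * (1 + eta + eta**2) / (1 - eta) ** 3   # eqn. 6.233 times pi a^3 / 6
    print(eta, one_minus_n_chat0(eta), (1 + 2 * eta) ** 2 / (1 - eta) ** 4,
          reduced_pressure(eta), closed_form)
```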
Another commonly used scheme is the hypernetted chains (HNC) approximation, for which
$$ c(r) = -\beta u(r) + h(r) - \ln\big[ 1 + h(r) \big]\,. \tag{6.238} $$
The rationale behind the HNC and other such approximation schemes is rooted in diagrammatic approaches, which are extensions of the Mayer cluster expansion to the computation of correlation functions. For details and references to their application in the literature, see Hansen and McDonald (1990) and Reichl (1998).

6.5.9 Ornstein-Zernike approximation at long wavelengths

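Let's expand the direct correlation function $\hat c(q)$ in powers of the wavevector $q$, viz.
$$ \hat c(q) = \hat c(0) + c_2\, q^2 + c_4\, q^4 + \ldots\,. \tag{6.239} $$
Here we have assumed spatial isotropy. Then
$$ \begin{aligned} \frac{1}{S(q)} = 1 - n\, \hat c(q) &= 1 - n\, \hat c(0) - n\, c_2\, q^2 + \ldots \\ &\equiv \xi^{-2} R^2 + q^2 R^2 + \mathcal{O}(q^4)\,, \end{aligned} \tag{6.240} $$
where
$$ R^2 = -n\, c_2 = \frac{2\pi n}{3} \int_0^\infty\! dr\; r^4\, c(r) \tag{6.241} $$
(the factor $\frac{1}{3}$ arises from the angular average $\langle (\mathbf{q} \cdot \mathbf{r})^2 \rangle = \frac{1}{3} q^2 r^2$ in the expansion of $e^{-i\mathbf{q} \cdot \mathbf{r}}$) and
$$ \xi^{-2} = \frac{1 - n\, \hat c(0)}{R^2} = \frac{1 - 4\pi n \int_0^\infty dr\, r^2\, c(r)}{\frac{2\pi n}{3} \int_0^\infty dr\, r^4\, c(r)}\,. \tag{6.242} $$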
The quantity R(T ) tells us something about the effective range of the interactions, while ξ(T ) is the
correlation length. As we approach a critical point, the correlation length diverges as a power law:

$$ \xi(T) \sim A\, |T - T_c|^{-\nu}\,. \tag{6.243} $$
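In three dimensions the $q^2$ coefficient in eqn. 6.239 follows from expanding $e^{-i\mathbf{q} \cdot \mathbf{r}}$ to second order and using the angular average $\langle (\mathbf{q} \cdot \mathbf{r})^2 \rangle = \frac{1}{3} q^2 r^2$, which gives $c_2 = -\frac{2\pi}{3} \int_0^\infty dr\, r^4 c(r)$. The sketch below checks this coefficient numerically for a hypothetical step-shaped direct correlation function $c(r) = -\Theta(a - r)$, for which $c_2 = 2\pi a^5/15$ exactly, using the finite-difference estimate $[\hat c(q) - \hat c(0)]/q^2$ at small $q$.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

def c_hat(c, q, r_max):
    """3D Fourier transform of an isotropic c(r): 4 pi int_0^r_max dr r^2 c(r) sin(qr)/(qr)."""
    def sinc(x):
        return 1.0 - x * x / 6.0 if abs(x) < 1e-6 else math.sin(x) / x
    return simpson(lambda r: 4 * math.pi * r * r * c(r) * sinc(q * r), 0.0, r_max)

a = 1.0
c_step = lambda r: -1.0 if r <= a else 0.0   # hypothetical c(r), nonzero only inside the core

q = 1e-2
c2_numeric = (c_hat(c_step, q, a) - c_hat(c_step, 0.0, a)) / q**2
c2_formula = -(2 * math.pi / 3) * simpson(lambda r: r**4 * c_step(r), 0.0, a)
print(c2_numeric, c2_formula, 2 * math.pi / 15)
```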

The susceptibility is given by
$$ \hat\chi(q) = -n\beta\, S(q) = -\frac{n\beta R^{-2}}{\xi^{-2} + q^2 + \mathcal{O}(q^4)}\,. \tag{6.244} $$
In the Ornstein-Zernike approximation, one drops the $\mathcal{O}(q^4)$ terms in the denominator and retains only the long wavelength behavior of the direct correlation function. Thus,
$$ \hat\chi_{\text{OZ}}(q) = -\frac{n\beta R^{-2}}{\xi^{-2} + q^2}\,. \tag{6.245} $$
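We now apply the inverse Fourier transform back to real space to obtain $\chi_{\text{OZ}}(r)$. In $d = 1$ dimension the result can be obtained exactly:
$$ \begin{aligned} \chi^{\text{OZ}}_{d=1}(x) &= -\frac{n}{k_B T R^2} \int_{-\infty}^{\infty}\! \frac{dq}{2\pi}\, \frac{e^{iqx}}{\xi^{-2} + q^2} \\ &= -\frac{n\xi}{2 k_B T R^2}\, e^{-|x|/\xi}\,. \end{aligned} \tag{6.246} $$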
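The one-dimensional result rests on the standard integral $\int \frac{dq}{2\pi}\, \frac{e^{iqx}}{\xi^{-2} + q^2} = \frac{\xi}{2}\, e^{-|x|/\xi}$. The sketch below checks it by direct quadrature. It drops the overall prefactor $-n/k_B T R^2$ and truncates the $q$ integral at a large cutoff `q_max`; both are choices of the sketch, and the truncation error falls off like $1/q_{\max}^2$.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

def chi_1d_numeric(x, xi, q_max=2000.0, n=200_000):
    """(1/2 pi) int dq e^{iqx} / (xi^-2 + q^2), truncated at |q| = q_max. The imaginary part
    is odd in q and cancels, leaving (1/pi) int_0^q_max cos(qx)/(xi^-2 + q^2) dq."""
    f = lambda q: math.cos(q * x) / (xi**-2 + q * q)
    return simpson(f, 0.0, q_max, n) / math.pi

def chi_1d_exact(x, xi):
    """Closed form (xi / 2) e^{-|x|/xi}, i.e. the 1D result without the prefactor -n / k_B T R^2."""
    return 0.5 * xi * math.exp(-abs(x) / xi)

xi = 1.5   # arbitrary correlation length for the check
for x in (0.5, 2.0):
    print(x, chi_1d_numeric(x, xi), chi_1d_exact(x, xi))
```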

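In higher dimensions $d > 1$ we can obtain the result asymptotically in two limits:

• Take $r \to \infty$ with $\xi$ fixed. Then
$$ \chi^{\text{OZ}}_d(r) \simeq -C_d\, \frac{n}{k_B T R^2} \cdot \frac{\xi^{(3-d)/2}\, e^{-r/\xi}}{r^{(d-1)/2}} \cdot \bigg[ 1 + \mathcal{O}\Big( \frac{d-3}{r/\xi} \Big) \bigg]\,, \tag{6.247} $$
where the $C_d$ are dimensionless constants.

• Take $\xi \to \infty$ with $r$ fixed; this is the limit $T \to T_c$ at fixed $r$. In dimensions $d > 2$ we obtain
$$ \chi^{\text{OZ}}_d(r) \simeq -\frac{C'_d\, n}{k_B T R^2} \cdot \frac{e^{-r/\xi}}{r^{d-2}} \cdot \bigg[ 1 + \mathcal{O}\Big( \frac{d-3}{r/\xi} \Big) \bigg]\,. \tag{6.248} $$
In $d = 2$ dimensions we obtain
$$ \chi^{\text{OZ}}_{d=2}(r) \simeq -\frac{C'_2\, n}{k_B T R^2} \cdot \ln\Big( \frac{r}{\xi} \Big)\, e^{-r/\xi} \cdot \bigg[ 1 + \mathcal{O}\Big( \frac{1}{\ln(r/\xi)} \Big) \bigg]\,, \tag{6.249} $$
where the $C'_d$ are dimensionless constants.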



At criticality, ξ → ∞, and clearly our results in d = 1 and d = 2 dimensions are nonsensical, as they are
divergent. To correct this behavior, M. E. Fisher in 1963 suggested that the OZ correlation functions in
the r ≪ ξ limit be replaced by
$$ \chi(r) \simeq -C''_d\, \frac{n}{k_B T R^2} \cdot \frac{\xi^{\eta}\, e^{-r/\xi}}{r^{d-2+\eta}}\,, \tag{6.250} $$
a result known as anomalous scaling. Here, η is the anomalous scaling exponent.
Recall that the isothermal compressibility is given by $\kappa_T = -\hat\chi(0)/n^2$. Near criticality, the integral in $\hat\chi(0)$ is dominated by the $r \ll \xi$ part, since $\xi \to \infty$. Thus, using Fisher's anomalous scaling,
$$ \kappa_T = -\frac{\hat\chi(0)}{n^2} = -\frac{1}{n^2} \int\! d^d r\; \chi(r) \sim A \int\! d^d r\; \frac{e^{-r/\xi}}{r^{d-2+\eta}} \sim B\, \xi^{2-\eta} \sim C\, \big| T - T_c \big|^{-(2-\eta)\nu}\,, \tag{6.251} $$
where $A$, $B$, and $C$ are temperature-dependent constants which are nonsingular at $T = T_c$. Thus, since $\kappa_T \propto |T - T_c|^{-\gamma}$, we conclude
$$ \gamma = (2 - \eta)\, \nu\,, \tag{6.252} $$
a result known as Fisher's scaling relation.
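The relation $\gamma = (2 - \eta)\nu$ can be checked against standard exponent sets. The values below are quoted from the literature rather than derived in these notes: mean-field (Ornstein-Zernike) exponents $\eta = 0$, $\nu = \frac{1}{2}$; the exact two-dimensional Ising exponents $\eta = \frac{1}{4}$, $\nu = 1$; and commonly cited numerical estimates $\eta \approx 0.0363$, $\nu \approx 0.630$ for the three-dimensional Ising universality class.

```python
def gamma_fisher(eta, nu):
    """gamma = (2 - eta) * nu, eqn. 6.252."""
    return (2.0 - eta) * nu

exponent_sets = {                                   # (eta, nu)
    "mean field / Ornstein-Zernike": (0.0, 0.5),
    "2D Ising (exact)": (0.25, 1.0),
    "3D Ising (numerical estimates)": (0.0363, 0.6300),
}
for name, (eta, nu) in exponent_sets.items():
    print(f"{name}: gamma = {gamma_fisher(eta, nu):.4f}")
```

The first two sets give $\gamma = 1$ and $\gamma = \frac{7}{4}$, the known mean-field and 2D Ising values; the third gives $\gamma \approx 1.237$, in line with numerical estimates for the 3D Ising class.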

6.6 Coulomb Systems : Plasmas and the Electron Gas

6.6.1 Electrostatic potential

Coulomb systems are particularly interesting in statistical mechanics because of their long-ranged forces,
which result in the phenomenon of screening. Long-ranged forces wreak havoc with the Mayer cluster
expansion, since the Mayer function is no longer integrable. Thus, the virial expansion fails, and new
techniques need to be applied to reveal the physics of plasmas.
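The potential energy of a Coulomb system is
$$ U = \tfrac{1}{2} \int\! d^d r \int\! d^d r'\; \rho(\mathbf{r})\, u(\mathbf{r} - \mathbf{r}')\, \rho(\mathbf{r}')\,, \tag{6.253} $$
where $\rho(\mathbf{r})$ is the charge density and $u(\mathbf{r})$, which has the dimensions of (energy)/(charge)², satisfies
$$ \nabla^2 u(\mathbf{r} - \mathbf{r}') = -4\pi\, \delta(\mathbf{r} - \mathbf{r}')\,. \tag{6.254} $$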

Thus,
$$ u(\mathbf{r}) = \begin{cases} -2\pi\, |\mathbf{r}| & d = 1 \\ -2 \ln |\mathbf{r}| & d = 2 \\ |\mathbf{r}|^{-1} & d = 3\,. \end{cases} \tag{6.255} $$

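For discrete particles, the charge density $\rho(\mathbf{r})$ is given by
$$ \rho(\mathbf{r}) = \sum_i q_i\, \delta(\mathbf{r} - \mathbf{x}_i)\,, \tag{6.256} $$
where $q_i$ is the charge of the $i^{\text{th}}$ particle. We will assume two types of charges: $q = \pm e$, with $e > 0$. The electric potential is
$$ \phi(\mathbf{r}) = \int\! d^d r'\; u(\mathbf{r} - \mathbf{r}')\, \rho(\mathbf{r}') = \sum_i q_i\, u(\mathbf{r} - \mathbf{x}_i)\,. \tag{6.257} $$
This satisfies the Poisson equation,
$$ \nabla^2 \phi(\mathbf{r}) = -4\pi \rho(\mathbf{r})\,. \tag{6.258} $$
The total potential energy can be written as
$$ U = \tfrac{1}{2} \int\! d^d r\; \phi(\mathbf{r})\, \rho(\mathbf{r}) = \tfrac{1}{2} \sum_i q_i\, \phi(\mathbf{x}_i)\,. \tag{6.259} $$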

6.6.2 Debye-Hückel theory

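We now write the grand partition function:
$$ \begin{aligned} \Xi(T, V, \mu_+, \mu_-) &= \sum_{N_+=0}^\infty \sum_{N_-=0}^\infty \frac{1}{N_+!}\, e^{\beta\mu_+ N_+} \lambda_+^{-N_+ d} \cdot \frac{1}{N_-!}\, e^{\beta\mu_- N_-} \lambda_-^{-N_- d} \\ &\qquad \cdot \int\! d^d r_1 \cdots \int\! d^d r_{N_+ + N_-}\; e^{-\beta U(\mathbf{r}_1, \ldots, \mathbf{r}_{N_+ + N_-})}\,. \end{aligned} \tag{6.260} $$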

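We now adopt a mean field approach, known as Debye-Hückel theory, writing
$$ \begin{aligned} \rho(\mathbf{r}) &= \rho^{\text{av}}(\mathbf{r}) + \delta\rho(\mathbf{r}) \\ \phi(\mathbf{r}) &= \phi^{\text{av}}(\mathbf{r}) + \delta\phi(\mathbf{r})\,. \end{aligned} \tag{6.261} $$
We then have
$$ \begin{aligned} U &= \tfrac{1}{2} \int\! d^d r\; \big[ \rho^{\text{av}}(\mathbf{r}) + \delta\rho(\mathbf{r}) \big] \cdot \big[ \phi^{\text{av}}(\mathbf{r}) + \delta\phi(\mathbf{r}) \big] \\ &= \underbrace{-\tfrac{1}{2} \int\! d^d r\; \rho^{\text{av}}(\mathbf{r})\, \phi^{\text{av}}(\mathbf{r})}_{\equiv\, U_0} + \int\! d^d r\; \phi^{\text{av}}(\mathbf{r})\, \rho(\mathbf{r}) + \underbrace{\tfrac{1}{2} \int\! d^d r\; \delta\rho(\mathbf{r})\, \delta\phi(\mathbf{r})}_{\text{ignore fluctuation term}}\,. \end{aligned} \tag{6.262} $$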

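We apply the mean field approximation in each region of space, which leads to
$$ \Omega(T, V, \mu_+, \mu_-) = -k_B T\, \lambda_+^{-d} z_+ \int\! d^d r\; \exp\!\Big( -\frac{e\, \phi^{\text{av}}(\mathbf{r})}{k_B T} \Big) - k_B T\, \lambda_-^{-d} z_- \int\! d^d r\; \exp\!\Big( +\frac{e\, \phi^{\text{av}}(\mathbf{r})}{k_B T} \Big)\,, \tag{6.263} $$
where
$$ \lambda_\pm = \Big( \frac{2\pi\hbar^2}{m_\pm k_B T} \Big)^{1/2}, \qquad z_\pm = \exp\Big( \frac{\mu_\pm}{k_B T} \Big)\,. \tag{6.264} $$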

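The charge density is therefore
$$ \rho(\mathbf{r}) = \frac{\delta\Omega}{\delta\phi^{\text{av}}(\mathbf{r})} = e\, \lambda_+^{-d} z_+ \exp\Big( -\frac{e\, \phi(\mathbf{r})}{k_B T} \Big) - e\, \lambda_-^{-d} z_- \exp\Big( +\frac{e\, \phi(\mathbf{r})}{k_B T} \Big)\,, \tag{6.265} $$
where we have now dropped the superscript on $\phi^{\text{av}}(\mathbf{r})$ for convenience. At $r \to \infty$, we assume charge neutrality and $\phi(\infty) = 0$. Thus
$$ \lambda_+^{-d} z_+ = n_+(\infty) = \lambda_-^{-d} z_- = n_-(\infty) \equiv n_\infty\,, \tag{6.266} $$
where $n_\infty$ is the ionic density of either species at infinity. Therefore,
$$ \rho(\mathbf{r}) = -2e\, n_\infty \sinh\Big( \frac{e\, \phi(\mathbf{r})}{k_B T} \Big)\,. \tag{6.267} $$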
kB T

We now invoke Poisson's equation,
$$ \nabla^2 \phi = 8\pi e\, n_\infty \sinh(\beta e \phi) - 4\pi \rho_{\text{ext}}\,, \tag{6.268} $$
where $\rho_{\text{ext}}$ is an externally imposed charge density.


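If $e\phi \ll k_B T$, we can expand the sinh function and obtain
$$ \nabla^2 \phi = \kappa_D^2\, \phi - 4\pi \rho_{\text{ext}}\,, \tag{6.269} $$
where
$$ \kappa_D = \Big( \frac{8\pi n_\infty e^2}{k_B T} \Big)^{1/2}, \qquad \lambda_D = \Big( \frac{k_B T}{8\pi n_\infty e^2} \Big)^{1/2}. \tag{6.270} $$
The quantity $\lambda_D$ is known as the Debye screening length. Consider, for example, a point charge $Q$ located at the origin. We then solve Poisson's equation in the weak field limit,
$$ \nabla^2 \phi = \kappa_D^2\, \phi - 4\pi Q\, \delta(\mathbf{r})\,. \tag{6.271} $$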

Fourier transforming, we obtain
$$ -q^2\, \hat\phi(\mathbf{q}) = \kappa_D^2\, \hat\phi(\mathbf{q}) - 4\pi Q \quad \Longrightarrow \quad \hat\phi(\mathbf{q}) = \frac{4\pi Q}{q^2 + \kappa_D^2}\,. \tag{6.272} $$
Transforming back to real space, we obtain, in three dimensions, the Yukawa potential,
$$ \phi(\mathbf{r}) = \int\! \frac{d^3 q}{(2\pi)^3}\, \frac{4\pi Q\, e^{i\mathbf{q} \cdot \mathbf{r}}}{q^2 + \kappa_D^2} = \frac{Q}{r} \cdot e^{-\kappa_D r}\,. \tag{6.273} $$
This solution must break down sufficiently close to $r = 0$, since the assumption $e\phi(\mathbf{r}) \ll k_B T$ is no longer valid there. However, for larger $r$, the Yukawa form is increasingly accurate.

For another example, consider an electrolyte held between two conducting plates, one at potential $\phi(x = 0) = 0$ and the other at potential $\phi(x = L) = V$, where $\hat{\mathbf{x}}$ is normal to the plane of the plates. Again assuming a weak field $e\phi \ll k_B T$, we solve $\nabla^2 \phi = \kappa_D^2\, \phi$ and obtain
$$ \phi(x) = A\, e^{\kappa_D x} + B\, e^{-\kappa_D x}\,. \tag{6.274} $$
We fix the constants $A$ and $B$ by invoking the boundary conditions, which results in
$$ \phi(x) = V \cdot \frac{\sinh(\kappa_D x)}{\sinh(\kappa_D L)}\,. \tag{6.275} $$
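The plate solution can be checked by discretizing $\phi'' = \kappa_D^2 \phi$ on a uniform grid and solving the resulting tridiagonal linear system with the Thomas algorithm (a standard numerical method swapped in here for illustration, not something used in these notes). The values of $\kappa_D$, $L$ and $V$ below are arbitrary.

```python
import math

def solve_plates(kappa, L, V, M=1000):
    """Second-order finite differences for phi'' = kappa^2 phi on (0, L) with
    phi(0) = 0, phi(L) = V, solved with the Thomas (tridiagonal) algorithm."""
    h = L / (M + 1)
    diag = -(2.0 + (kappa * h) ** 2)   # phi_{i-1} + diag * phi_i + phi_{i+1} = 0
    rhs = [0.0] * M
    rhs[-1] = -V                       # known boundary value phi(L) = V moved to the RHS
    c_prime = [0.0] * M
    d_prime = [0.0] * M
    c_prime[0] = 1.0 / diag
    d_prime[0] = rhs[0] / diag
    for i in range(1, M):              # forward sweep (off-diagonal entries are all 1)
        denom = diag - c_prime[i - 1]
        c_prime[i] = 1.0 / denom
        d_prime[i] = (rhs[i] - d_prime[i - 1]) / denom
    phi = [0.0] * M
    phi[-1] = d_prime[-1]
    for i in range(M - 2, -1, -1):     # back substitution
        phi[i] = d_prime[i] - c_prime[i] * phi[i + 1]
    xs = [h * (i + 1) for i in range(M)]
    return xs, phi

kappa, L, V = 2.0, 3.0, 1.0            # arbitrary illustrative values
xs, phi = solve_plates(kappa, L, V)
exact = [V * math.sinh(kappa * x) / math.sinh(kappa * L) for x in xs]
max_err = max(abs(p - e) for p, e in zip(phi, exact))
print(max_err)
```

The discretization error scales as the square of the grid spacing, so doubling `M` should reduce `max_err` by roughly a factor of four.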

Debye-Hückel theory is valid provided n∞ λ3D ≫ 1, so that the statistical assumption of many charges in
a screening volume is justified.

6.6.3 The electron gas : Thomas-Fermi screening

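Assuming $k_B T \ll \varepsilon_F$, thermal fluctuations are unimportant and we may assume $T = 0$. In the same spirit as the Debye-Hückel approach, we assume a slowly varying mean electrostatic potential $\phi(\mathbf{r})$. Locally, we can write
$$ \varepsilon_F = \frac{\hbar^2 k_F^2}{2m} - e\phi(\mathbf{r})\,. \tag{6.276} $$
Thus, the Fermi wavevector $k_F$ is spatially varying, according to the relation
$$ k_F(\mathbf{r}) = \Big[ \frac{2m}{\hbar^2} \big( \varepsilon_F + e\phi(\mathbf{r}) \big) \Big]^{1/2}. \tag{6.277} $$
The local electron number density is
$$ n(\mathbf{r}) = \frac{k_F^3(\mathbf{r})}{3\pi^2} = n_\infty \Big( 1 + \frac{e\phi(\mathbf{r})}{\varepsilon_F} \Big)^{3/2}. \tag{6.278} $$
In the presence of a uniform compensating positive background charge $\rho_+ = e n_\infty$, Poisson's equation takes the form
$$ \nabla^2 \phi = 4\pi e\, n_\infty \cdot \bigg[ \Big( 1 + \frac{e\phi(\mathbf{r})}{\varepsilon_F} \Big)^{3/2} - 1 \bigg] - 4\pi \rho_{\text{ext}}(\mathbf{r})\,. \tag{6.279} $$
If $e\phi \ll \varepsilon_F$, we may expand in powers of the ratio, obtaining
$$ \nabla^2 \phi = \frac{6\pi n_\infty e^2}{\varepsilon_F}\, \phi - 4\pi \rho_{\text{ext}}(\mathbf{r}) \equiv \kappa_{\text{TF}}^2\, \phi - 4\pi \rho_{\text{ext}}(\mathbf{r})\,. \tag{6.280} $$
Here, $\kappa_{\text{TF}}$ is the Thomas-Fermi wavevector,
$$ \kappa_{\text{TF}} = \Big( \frac{6\pi n_\infty e^2}{\varepsilon_F} \Big)^{1/2}. \tag{6.281} $$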

Thomas-Fermi theory is valid provided $n_\infty \lambda_{\text{TF}}^3 \gg 1$, where $\lambda_{\text{TF}} = \kappa_{\text{TF}}^{-1}$, so that the statistical assumption of many electrons in a screening volume is justified.

One important application of Thomas-Fermi screening is to the theory of metals. In a metal, the outer,
valence electrons of each atom are stripped away from the positively charged ionic core and enter into
itinerant, plane-wave-like states. These states disperse with some ε(k) function (that is periodic in the
Brillouin zone, i.e. under k → k+G, where G is a reciprocal lattice vector), and at T = 0 this energy band
is filled up to the Fermi level εF , as Fermi statistics dictates. (In some cases, there may be several bands
at the Fermi level, as we saw in the case of yttrium.) The set of ionic cores then acts as a neutralizing
positive background. In a perfect crystal, the ionic cores are distributed periodically, and the positive
background is approximately uniform. A charged impurity in a metal, such as a zinc atom in a copper
matrix, has a different nuclear charge and a different valency than the host. The charge of the ionic core,
when valence electrons are stripped away, differs from that of the host ions, and therefore the impurity
acts as a local charge impurity. For example, copper has an electronic configuration of $[\text{Ar}]\,3d^{10}\,4s^1$. The 4s electron forms an energy band which contains the Fermi surface. Zinc has a configuration of $[\text{Ar}]\,3d^{10}\,4s^2$, and in a Cu matrix the Zn gives up its two 4s electrons into the 4s conduction band, leaving
behind a charge +2 ionic core. The Cu cores have charge +1 since each copper atom contributed only
one 4s electron to the conduction band. The conduction band electrons neutralize the uniform positive
background of the Cu ion cores. What is left is an extra Q = +e nuclear charge at the Zn site, and one
extra 4s conduction band electron. The Q = +e impurity is, however, screened by the electrons, and at
distances greater than an atomic radius the potential that a given electron sees due to the Zn core is of
the Yukawa form,
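$$ \phi(r) = \frac{Q}{r} \cdot e^{-\kappa_{\text{TF}} r}\,. \tag{6.282} $$
We should bear in mind, however, that the dispersion $\varepsilon(\mathbf{k})$ for the conduction band in a metal is not necessarily of the free electron form $\varepsilon(\mathbf{k}) = \hbar^2 \mathbf{k}^2 / 2m$. To linear order in the potential, however, the change in the local electronic density is
$$ \delta n(\mathbf{r}) = e\phi(\mathbf{r})\, g(\varepsilon_F)\,, \tag{6.283} $$
where $g(\varepsilon_F)$ is the density of states at the Fermi energy. Thus, in a metal, we should write
$$ \begin{aligned} \nabla^2 \phi &= (-4\pi)(-e\, \delta n) \\ &= 4\pi e^2 g(\varepsilon_F)\, \phi = \kappa_{\text{TF}}^2\, \phi\,, \end{aligned} \tag{6.284} $$
where
$$ \kappa_{\text{TF}} = \sqrt{4\pi e^2 g(\varepsilon_F)}\,. \tag{6.285} $$
The value of $g(\varepsilon_F)$ will depend on the form of the dispersion. For free electrons, $g(\varepsilon_F) = 3n_\infty / 2\varepsilon_F$ and eqn. 6.285 reduces to eqn. 6.281. For ballistic bands with an effective mass $m^*$, the formula in eqn. 6.280 still applies, with $\varepsilon_F$ computed using $m^*$.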
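To attach numbers to eqns. 6.281 and 6.285, here is a minimal sketch in Gaussian units with $e^2 \approx 14.40\,$eV·Å and $\hbar^2/2m \approx 3.810\,$eV·Å². It assumes free electrons with the tabulated conduction-electron density of copper, $n \approx 8.47 \times 10^{22}\,$cm$^{-3}$ (one electron per atom); both the free-electron model and this input are assumptions of the example. The two routes, via $\varepsilon_F$ and via $g(\varepsilon_F) = 3n/2\varepsilon_F$, must agree identically.

```python
import math

E2 = 14.3996               # e^2 in Gaussian units, eV * Angstrom
HBAR2_OVER_2M = 3.80998    # hbar^2 / (2 m_e), eV * Angstrom^2
n = 8.47e22 * 1e-24        # assumed conduction-electron density of Cu: cm^-3 -> Angstrom^-3

k_F = (3.0 * math.pi**2 * n) ** (1.0 / 3.0)   # from n = k_F^3 / 3 pi^2
eps_F = HBAR2_OVER_2M * k_F**2                 # free-electron Fermi energy, eV

kappa_eq6281 = math.sqrt(6.0 * math.pi * n * E2 / eps_F)   # eqn. 6.281
g_F = 1.5 * n / eps_F                                       # free-electron DOS at eps_F (both spins)
kappa_eq6285 = math.sqrt(4.0 * math.pi * E2 * g_F)          # eqn. 6.285

print(f"eps_F = {eps_F:.2f} eV, kappa_TF = {kappa_eq6281:.3f} 1/A, "
      f"lambda_TF = {1 / kappa_eq6281:.3f} A")
```

The screening length comes out to roughly half an Ångstrom, i.e. shorter than the interatomic spacing, which is why a charged impurity in a good metal is screened within about one lattice constant.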

The Thomas-Fermi atom

Consider an ion formed of a nucleus of charge $+Ze$ and an electron cloud of charge $-Ne$. The net ionic charge is then $(Z - N)e$. Since we will be interested in atomic scales, we can no longer assume a weak field limit and we must retain the full nonlinear screening theory, for which
$$ \nabla^2 \phi(\mathbf{r}) = 4\pi e \cdot \frac{(2m)^{3/2}}{3\pi^2 \hbar^3} \big( \varepsilon_F + e\phi(\mathbf{r}) \big)^{3/2} - 4\pi Z e\, \delta(\mathbf{r})\,. \tag{6.286} $$

Figure 6.20: The Thomas-Fermi atom consists of a nuclear charge $+Ze$ surrounded by $N$ electrons distributed in a cloud. The electric potential $\phi(r)$ felt by any electron at position $\mathbf{r}$ is screened by the electrons within this radius, resulting in a self-consistent potential $\phi(r) = \phi_0 + (Ze/r)\, \chi(r/r_0)$.

We assume an isotropic solution. It is then convenient to define

$$ \varepsilon_F + e\phi(r) = \frac{Z e^2}{r} \cdot \chi(r/r_0)\,, \tag{6.287} $$
where $r_0$ is yet to be determined. As $r \to 0$ we expect $\chi \to 1$ since the nuclear charge is then unscreened. We then have
$$ \nabla^2 \Big\{ \frac{Z e^2}{r} \cdot \chi(r/r_0) \Big\} = \frac{1}{r_0^2}\, \frac{Z e^2}{r}\, \chi''(r/r_0)\,, \tag{6.288} $$
thus we arrive at the Thomas-Fermi equation,
$$ \chi''(t) = \frac{1}{\sqrt{t}}\, \chi^{3/2}(t)\,, \tag{6.289} $$
with $r = t\, r_0$, provided we take
$$ r_0 = \frac{\hbar^2}{2 m e^2} \Big( \frac{3\pi}{4 \sqrt{Z}} \Big)^{2/3} = 0.885\, Z^{-1/3} a_B\,, \tag{6.290} $$
where $a_B = \frac{\hbar^2}{m e^2} = 0.529\,$Å is the Bohr radius. The TF equation is subject to the following boundary conditions:

• At short distances, the nucleus is unscreened, i.e.
$$ \chi(0) = 1\,. \tag{6.291} $$

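• For positive ions, with $N < Z$, there is perfect screening at the ionic boundary $R = t^* r_0$, where $\chi(t^*) = 0$. This requires
$$ \mathbf{E} = -\boldsymbol{\nabla}\phi = \frac{Ze}{R^2} \Big[ \chi(R/r_0) - \frac{R}{r_0}\, \chi'(R/r_0) \Big]\, \hat{\mathbf{r}} = \frac{(Z - N)\, e}{R^2}\, \hat{\mathbf{r}}\,. \tag{6.292} $$
Since $\chi(t^*) = 0$, this requires
$$ -t^* \chi'(t^*) = 1 - \frac{N}{Z}\,. \tag{6.293} $$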

For an atom, with $N = Z$, the asymptotic solution to the TF equation is a power law, and by inspection is found to be $\chi(t) \sim C\, t^{-3}$, where $C$ is a constant. The constant follows from the TF equation, which yields $12\, C = C^{3/2}$, hence $C = 144$. Thus, a neutral TF atom has a density with a power law tail, with $\rho \sim r^{-6}$. TF ions with $N > Z$ are unstable.
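For the neutral atom ($N = Z$) the solution of eqn. 6.289 must satisfy $\chi(0) = 1$ and decay to zero only as $t \to \infty$, which fixes the initial slope $\chi'(0)$. A common way to find it is shooting: too steep a slope makes $\chi$ cross zero at finite $t$ (an ion), while too shallow a slope makes $\chi$ turn upward. The sketch below bisects on $\chi'(0)$ using a fourth-order Runge-Kutta integrator in the variable $u = \sqrt{t}$, a change of variables introduced here to remove the $1/\sqrt{t}$ singularity; the cutoff `t_max` and step size `h` are assumptions of the sketch. The result should come out close to $\chi'(0) \approx -1.588$, the value usually quoted for the Thomas-Fermi atom. The code also evaluates the numerical coefficient $\frac{1}{2}(3\pi/4)^{2/3} \approx 0.885$ appearing in eqn. 6.290.

```python
import math

def shoot(slope, t_max=10.0, h=2e-3):
    """Integrate chi'' = chi^{3/2} / sqrt(t) with chi(0) = 1, chi'(0) = slope.
    Uses u = sqrt(t) as the independent variable, which removes the 1/sqrt(t)
    singularity:  d chi/du = 2 u p,  dp/du = 2 chi^{3/2},  where p = d chi/dt.
    Returns -1 if chi reaches zero (slope too steep), +1 if chi starts to rise
    (slope too shallow), and 0 if neither happens before t_max."""
    def rhs(u, chi, p):
        return 2.0 * u * p, 2.0 * max(chi, 0.0) ** 1.5
    u, chi, p = 0.0, 1.0, slope
    u_max = math.sqrt(t_max)
    while u < u_max:                    # classical fourth-order Runge-Kutta step
        k1 = rhs(u, chi, p)
        k2 = rhs(u + h / 2, chi + h / 2 * k1[0], p + h / 2 * k1[1])
        k3 = rhs(u + h / 2, chi + h / 2 * k2[0], p + h / 2 * k2[1])
        k4 = rhs(u + h, chi + h * k3[0], p + h * k3[1])
        chi += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        p += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        u += h
        if chi <= 0.0:
            return -1
        if p > 0.0:
            return 1
    return 0

def neutral_atom_slope(lo=-2.0, hi=-1.0, iterations=40):
    """Bisection on chi'(0): 'lo' drives chi through zero, 'hi' makes it turn upward."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if shoot(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

slope = neutral_atom_slope()
r0_coefficient = 0.5 * (3 * math.pi / 4) ** (2 / 3)   # r0 / (a_B Z^{-1/3}), eqn. 6.290
print(f"chi'(0) = {slope:.4f}, r0 = {r0_coefficient:.4f} a_B Z^(-1/3)")
```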

6.7 Polymers

6.7.1 Basic concepts

Linear chain polymers are repeating structures with the chemical formula $(\text{A})_x$, where A is the formula unit and $x$ is the degree of polymerization. In many cases (e.g. polystyrene), values $x \gtrsim 10^5$ are not uncommon.
For a very readable introduction to the subject, see P. G. de Gennes, Scaling Concepts in Polymer Physics.
Quite often a given polymer solution will contain a distribution of x values; this is known as polydisper-
sity. Various preparation techniques, such as chromatography, can mitigate the degree of polydispersity.
Another morphological feature of polymers is branching, in which the polymers do not form linear chains.
Polymers exhibit a static flexibility which can be understood as follows. Consider a long chain hydrocarbon with a −C−C−C− backbone. The angle between successive C−C bonds is fixed at θ ≈ 68°, but the azimuthal angle ϕ can take one of three possible low-energy values, as shown in the right panel of fig. 6.22. Thus, the relative probabilities of gauche and trans orientations are

$$ \frac{\text{Prob}\,(\text{gauche})}{\text{Prob}\,(\text{trans})} = 2\,e^{-\Delta\varepsilon/k_{\rm B}T} \,, \tag{6.294} $$

where the factor of 2 counts the two gauche states ϕ = ±ϕ0.
Here Δε is the energy difference between trans and gauche configurations. This means that the polymer chain is in fact a random coil with a persistence length

$$ \ell_p = \ell_0\,e^{\Delta\varepsilon/k_{\rm B}T} \,, \tag{6.295} $$

where $\ell_0$ is a microscopic length scale, roughly given by the length of a formula unit, which is approximately a few Ångstroms (see fig. 6.23). Let L be the total length of the polymer when it is stretched into a straight line. If $\ell_p > L$, the polymer is rigid. If $\ell_p \ll L$, the polymer is rigid on the length scale $\ell_p$ but flexible on longer scales. We have

$$ \frac{\ell_p}{L} = \frac{1}{N}\,e^{\Delta\varepsilon/k_{\rm B}T} \,, \tag{6.296} $$

Figure 6.21: Some examples of linear chain polymers.

where we now use N (rather than x) for the degree of polymerization.


In the time domain, the polymer exhibits a dynamical flexibility on scales longer than a persistence time. The persistence time $\tau_p$ is the time required for a trans-gauche transition. The rate for such transitions is set by the energy barrier B separating trans from gauche configurations:

$$ \tau_p = \tau_0\,e^{B/k_{\rm B}T} \,, \tag{6.297} $$

where $\tau_0 \sim 10^{-11}$ s. On frequency scales $\omega \ll \tau_p^{-1}$ the polymer is dynamically flexible. If $\Delta\varepsilon \sim k_{\rm B}T \ll B$, the polymer is flexible from a static point of view, but dynamically rigid. That is, there are many gauche orientations of successive carbon bonds which reflect a quenched disorder. The polymer then forms a frozen random coil, like a twisted coat hanger.
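To get a feel for eqns. 6.294, 6.295 and 6.297, the sketch below evaluates them for one set of inputs. The energies Δε = 0.025 eV and B = 0.15 eV, the length ℓ0 = 3 Å and the temperature T = 300 K are hypothetical illustration values, not data from the text; only τ0 ∼ 10⁻¹¹ s is taken from above. A useful sanity check is that the gauche/trans ratio times ℓp equals 2ℓ0 at any temperature.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def chain_flexibility(d_eps, barrier, T, ell0=3.0, tau0=1e-11):
    """Return (gauche/trans ratio, persistence length, persistence time).

    d_eps, barrier : trans-gauche energy difference and barrier B, in eV
    T              : temperature in K
    ell0           : microscopic length in Angstrom (assumed, ~ one formula unit)
    tau0           : attempt time in seconds (the ~1e-11 s quoted in the text)
    """
    kT = K_B_EV * T
    ratio = 2.0 * math.exp(-d_eps / kT)      # eqn. 6.294
    ell_p = ell0 * math.exp(d_eps / kT)      # eqn. 6.295
    tau_p = tau0 * math.exp(barrier / kT)    # eqn. 6.297
    return ratio, ell_p, tau_p


# Hypothetical inputs chosen so that d_eps ~ kT << B (statically flexible, dynamically stiff).
ratio, ell_p, tau_p = chain_flexibility(d_eps=0.025, barrier=0.15, T=300.0)
print(f"gauche/trans = {ratio:.2f}, ell_p = {ell_p:.1f} A, tau_p = {tau_p:.1e} s")
```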

6.7.2 Polymers as random walks

A polymer can be modeled by a self-avoiding random walk (SAW). That is, on scales longer than ℓp ,
it twists about randomly in space subject to the constraint that it doesn’t overlap itself. Before we
consider the mathematics of SAWs, let’s first recall some aspects of ordinary random walks which are not
self-avoiding.
Figure 6.22: Left: trans and gauche orientations in carbon chains. Right: energy as a function of azimuthal angle ϕ. There are three low energy states: trans (ϕ = 0) and gauche (ϕ = ±ϕ0).

We'll simplify matters further by considering random walks on a hypercubic lattice of dimension d. Such a lattice has coordination number 2d, i.e. there are 2d nearest neighbor separation vectors, given by $\boldsymbol\delta = \pm a\,\hat{\mathbf e}_1, \pm a\,\hat{\mathbf e}_2, \ldots, \pm a\,\hat{\mathbf e}_d$, where a is the lattice spacing. Consider now a random walk of N steps starting at the origin. After N steps the position is

$$ \mathbf R_N = \sum_{j=1}^N \boldsymbol\delta_j \,, \tag{6.298} $$

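where $\boldsymbol\delta_j$ takes on one of 2d possible values. Now N is no longer the degree of polymerization, but something approximating $L/\ell_p$, which is the number of persistence lengths in the chain. We assume each step is independent, hence $\langle\delta_j^\alpha\,\delta_{j'}^\beta\rangle = (a^2/d)\,\delta_{jj'}\,\delta_{\alpha\beta}$ and $\langle\mathbf R_N^2\rangle = Na^2$. The full distribution $P_N(\mathbf R)$ is given by

$$ \begin{aligned} P_N(\mathbf R) &= (2d)^{-N}\sum_{\boldsymbol\delta_1}\cdots\sum_{\boldsymbol\delta_N}\delta_{\mathbf R,\sum_j\boldsymbol\delta_j} \\ &= a^d\int_{-\pi/a}^{\pi/a}\!\frac{dk_1}{2\pi}\cdots\int_{-\pi/a}^{\pi/a}\!\frac{dk_d}{2\pi}\;e^{-i\mathbf k\cdot\mathbf R}\left[\frac{1}{d}\sum_{\mu=1}^d\cos(k_\mu a)\right]^N \\ &= a^d\int_{\hat\Omega}\!\frac{d^dk}{(2\pi)^d}\;e^{-i\mathbf k\cdot\mathbf R}\,\exp\!\left[N\ln\!\left(1 - \frac{1}{2d}\,\mathbf k^2a^2 + \ldots\right)\right] \\ &\approx \left(\frac{a}{2\pi}\right)^{\!d}\int d^dk\;e^{-N\mathbf k^2a^2/2d}\,e^{-i\mathbf k\cdot\mathbf R} = \left(\frac{d}{2\pi N}\right)^{\!d/2}e^{-d\mathbf R^2/2Na^2} \,. \end{aligned} \tag{6.299} $$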

This is a simple Gaussian, with width $\langle\mathbf R^2\rangle = d\cdot(Na^2/d) = Na^2$, as we have already computed. The quantity R defined here is the end-to-end vector of the chain. The RMS end-to-end distance is then $\langle\mathbf R^2\rangle^{1/2} = \sqrt{N}\,a \equiv R_0$.
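The result ⟨R_N²⟩ = Na² is easy to confirm by direct sampling. This Monte Carlo sketch (our own helper, standard library only) draws walks on the hypercubic lattice by picking one of the 2d nearest-neighbor vectors at each step; the chain length, sample size and seed are arbitrary choices.

```python
import random


def mean_square_end_to_end(N, d=3, a=1.0, samples=10000, seed=1):
    """Monte Carlo estimate of <R_N^2> for N-step walks on a d-dimensional hypercubic lattice."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    total = 0.0
    for _ in range(samples):
        R = [0] * d
        for _ in range(N):
            mu = rng.randrange(d)               # choose one of the d axes ...
            R[mu] += rng.choice((-1, 1))        # ... and step by +a or -a along it
        total += a * a * sum(x * x for x in R)
    return total / samples


N = 50
print(mean_square_end_to_end(N), "vs N a^2 =", N)
```

With 10⁴ samples the statistical error on ⟨R²⟩ is below one percent of Na² for this N.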

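A related figure of merit is the radius of gyration, $R_g$, defined by

$$ R_g^2 = \frac{1}{N}\left\langle\sum_{n=1}^N\big(\mathbf R_n - \mathbf R_{\rm CM}\big)^2\right\rangle \,, \tag{6.300} $$

where $\mathbf R_{\rm CM} = \frac{1}{N}\sum_{j=1}^N\mathbf R_j$ is the center of mass position. Writing $R_g^2 = \frac{1}{2N^2}\sum_{m,n}\big\langle(\mathbf R_m - \mathbf R_n)^2\big\rangle$ and using $\big\langle(\mathbf R_m - \mathbf R_n)^2\big\rangle = |m-n|\,a^2$, a brief calculation yields

$$ R_g^2 = \frac{N^2 - 1}{6N}\,a^2 \sim \frac{Na^2}{6} \,, \tag{6.301} $$

in all dimensions.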
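The pair-sum identity behind eqn. 6.301 can be checked exactly with rational arithmetic. The sketch assumes, as in eqn. 6.300, that the chain consists of the N positions R_1, …, R_N, uses ⟨(R_m − R_n)²⟩ = |m − n| a², and compares the resulting sum with (N² − 1)/6N; the function name is ours.

```python
from fractions import Fraction


def rg2_from_pair_sum(N):
    """R_g^2 / a^2 for an ideal chain of N positions.

    Uses R_g^2 = (1 / 2N^2) sum_{m,n} <(R_m - R_n)^2> with <(R_m - R_n)^2> = |m - n| a^2.
    """
    pair_sum = sum(abs(m - n) for m in range(1, N + 1) for n in range(1, N + 1))
    return Fraction(pair_sum, 2 * N * N)


for N in (1, 2, 10, 100):
    exact = rg2_from_pair_sum(N)
    print(N, exact, float(exact), "closed form:", Fraction(N * N - 1, 6 * N))
```

For N = 2 this gives a²/4, the squared half-distance of two points a apart, and for large N the ratio approaches N/6.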
The total number of random walk configurations with end-to-end vector R is then $(2d)^N P_N(\mathbf R)$, so the entropy of a chain at fixed elongation is

$$ S(\mathbf R, N) = k_{\rm B}\ln\!\big[(2d)^N P_N(\mathbf R)\big] = S(0,N) - \frac{d\,k_{\rm B}\,\mathbf R^2}{2Na^2} \,. \tag{6.302} $$

If we assume that the energy of the chain is conformation independent, then $E = E_0(N)$ and

$$ F(\mathbf R, N) = F(0,N) + \frac{d\,k_{\rm B}T\,\mathbf R^2}{2Na^2} \,. \tag{6.303} $$

In the presence of an external force $\mathbf F_{\rm ext}$, the Gibbs free energy is the Legendre transform

$$ G(\mathbf F_{\rm ext}, N) = F(\mathbf R, N) - \mathbf F_{\rm ext}\cdot\mathbf R \,, \tag{6.304} $$

and $\partial G/\partial\mathbf R = 0$ then gives the relation

$$ \mathbf R(\mathbf F_{\rm ext}, N) = \frac{Na^2}{d\,k_{\rm B}T}\,\mathbf F_{\rm ext} \,. \tag{6.305} $$

This may be considered an equation of state for the polymer.


Following de Gennes, consider a chain with charges ±e at each end, placed in an external electric field of magnitude E = 30,000 V/cm. Let $N = 10^4$, a = 2 Å, and d = 3. What is the elongation? From the above formula, with $F_{\rm ext} = eE$, we have

$$ \frac{R}{R_0} = \frac{eER_0}{3k_{\rm B}T} = 0.8 \,, \tag{6.306} $$

with $R_0 = \sqrt{N}\,a$ as before.
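Plugging de Gennes' numbers into eqn. 6.305 reproduces the quoted elongation. The text does not state the temperature; the sketch below assumes room temperature, T = 300 K, and uses the SI values of e and k_B.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
K_B = 1.380649e-23           # Boltzmann constant, J/K


def elongation_ratio(E_field, N, a, T, d=3):
    """R / R0 from eqn. 6.305 with F_ext = e E and R0 = sqrt(N) a (SI units)."""
    R0 = math.sqrt(N) * a
    return E_CHARGE * E_field * R0 / (d * K_B * T)


# E = 30,000 V/cm = 3e6 V/m, N = 1e4, a = 2 Angstrom; T = 300 K is our assumption.
print(round(elongation_ratio(3.0e6, 1.0e4, 2.0e-10, 300.0), 2))   # ~0.77, i.e. the 0.8 above
```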

Structure factor

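We can also compute the structure factor,

$$ S(\mathbf k) = \frac{1}{N}\left\langle\sum_{m=1}^N\sum_{n=1}^N e^{i\mathbf k\cdot(\mathbf R_m - \mathbf R_n)}\right\rangle = 1 + \frac{2}{N}\sum_{m=1}^N\sum_{n=1}^{m-1}\left\langle e^{i\mathbf k\cdot(\mathbf R_m - \mathbf R_n)}\right\rangle \,. \tag{6.307} $$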

Figure 6.23: The polymer chain as a random coil.

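For averages with respect to a Gaussian distribution,

$$ \left\langle e^{i\mathbf k\cdot(\mathbf R_m - \mathbf R_n)}\right\rangle = \exp\!\left(-\frac{1}{2}\Big\langle\big(\mathbf k\cdot(\mathbf R_m - \mathbf R_n)\big)^2\Big\rangle\right) \,. \tag{6.308} $$

Now for m > n we have $\mathbf R_m - \mathbf R_n = \sum_{j=n+1}^m\boldsymbol\delta_j$, and therefore

$$ \Big\langle\big(\mathbf k\cdot(\mathbf R_m - \mathbf R_n)\big)^2\Big\rangle = \sum_{j=n+1}^m\big\langle(\mathbf k\cdot\boldsymbol\delta_j)^2\big\rangle = \frac{1}{d}\,(m-n)\,\mathbf k^2a^2 \,, \tag{6.309} $$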

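since $\langle\delta_j^\alpha\,\delta_{j'}^\beta\rangle = (a^2/d)\,\delta_{jj'}\,\delta_{\alpha\beta}$. We then have

$$ S(\mathbf k) = 1 + \frac{2}{N}\sum_{m=1}^N\sum_{n=1}^{m-1}e^{-(m-n)\,\mathbf k^2a^2/2d} = \frac{N\big(e^{2\mu_k} - 1\big) - 2\,e^{\mu_k}\big(1 - e^{-N\mu_k}\big)}{N\big(e^{\mu_k} - 1\big)^2} \,, \tag{6.310} $$

where $\mu_k = \mathbf k^2a^2/2d$. In the limit where N → ∞ and a → 0 with $Na^2 = R_0^2$ constant, the structure factor has a scaling form, $S(\mathbf k) = N f(N\mu_k) = (R_0/a)^2 f(\mathbf k^2R_0^2/2d)$, where

$$ f(x) = \frac{2}{x^2}\big(e^{-x} - 1 + x\big) = 1 - \frac{x}{3} + \frac{x^2}{12} + \ldots \,. \tag{6.311} $$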
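The closed form in eqn. 6.310 and its scaling limit, eqn. 6.311, can be checked numerically. In this sketch `S_double_sum` evaluates the double sum directly, `S_closed_form` evaluates the closed form (rewritten with `math.expm1` to avoid cancellation at small μ_k), and `debye` is the scaling function f(x); all three names are ours.

```python
import math


def S_double_sum(N, mu):
    """First form of eqn. 6.310: 1 + (2/N) sum_{m>n} exp(-(m - n) mu)."""
    total = sum(math.exp(-(m - n) * mu) for m in range(1, N + 1) for n in range(1, m))
    return 1.0 + 2.0 * total / N


def S_closed_form(N, mu):
    """Closed form of eqn. 6.310, using expm1 so that small mu does not lose precision."""
    num = N * math.expm1(2.0 * mu) + 2.0 * math.exp(mu) * math.expm1(-N * mu)
    return num / (N * math.expm1(mu) ** 2)


def debye(x):
    """Scaling function f(x) = (2/x^2)(e^{-x} - 1 + x) of eqn. 6.311."""
    return 2.0 * (math.exp(-x) - 1.0 + x) / x ** 2


print(S_double_sum(50, 0.1), S_closed_form(50, 0.1))
N, x = 20000, 2.0                      # large N at fixed x = N mu_k
print(S_closed_form(N, x / N) / N, debye(x))
```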

Rouse model

Consider next a polymer chain subjected to stochastic forcing. We model the chain as a collection of mass points connected by springs, with a potential energy $U = \frac{1}{2}k\sum_n\big(\mathbf x_{n+1} - \mathbf x_n\big)^2$. This reproduces the distribution of eqn. 6.299 (for d = 3) if we take the spring constant to be $k = 3k_{\rm B}T/a^2$ and set the equilibrium length of each spring to zero. The equations of motion are then

$$ M\ddot{\mathbf x}_n + \gamma\,\dot{\mathbf x}_n = -k\big(2\mathbf x_n - \mathbf x_{n-1} - \mathbf x_{n+1}\big) + \mathbf f_n(t) \,, \tag{6.312} $$

where $n \in \{1,\ldots,N\}$ and $\{f_n^\mu(t)\}$ is a set of Gaussian white noise forcings, each with zero mean, and

$$ \big\langle f_n^\mu(t)\,f_{n'}^\nu(t')\big\rangle = 2\gamma k_{\rm B}T\,\delta_{nn'}\,\delta_{\mu\nu}\,\delta(t-t') \,. \tag{6.313} $$

We define $\mathbf x_0 \equiv \mathbf x_1$ and $\mathbf x_{N+1} \equiv \mathbf x_N$ so that the end mass points n = 1 and n = N experience a restoring force from only one neighbor. We assume the chain is overdamped and set M → 0. We then have

$$ \gamma\,\dot{\mathbf x}_n = -k\sum_{n'=1}^N A_{nn'}\,\mathbf x_{n'} + \mathbf f_n(t) \,, \tag{6.314} $$

where

$$ A = \begin{pmatrix} 1 & -1 & 0 & 0 & \cdots & 0 \\ -1 & 2 & -1 & 0 & \cdots & 0 \\ 0 & -1 & 2 & -1 & \cdots & 0 \\ 0 & 0 & -1 & \ddots & & \vdots \\ \vdots & & & \ddots & 2 & -1 \\ 0 & \cdots & \cdots & 0 & -1 & 1 \end{pmatrix} \,. \tag{6.315} $$

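The matrix A is real and symmetric. Its eigenfunctions are labeled $\psi_j(n)$, with $j \in \{0,\ldots,N-1\}$:

$$ \begin{aligned} \psi_0(n) &= \frac{1}{\sqrt N} \\ \psi_j(n) &= \sqrt{\frac{2}{N}}\,\cos\!\left(\frac{(2n-1)\,j\pi}{2N}\right) \,, \qquad j \in \{1,\ldots,N-1\} \end{aligned} \tag{6.316} $$

The completeness and orthonormality relations are

$$ \sum_{j=0}^{N-1}\psi_j(n)\,\psi_j(n') = \delta_{nn'} \,, \qquad \sum_{n=1}^N\psi_j(n)\,\psi_{j'}(n) = \delta_{jj'} \,, \tag{6.317} $$

with eigenvalues $\lambda_j = 4\sin^2(\pi j/2N)$. Note that $\lambda_0 = 0$.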
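The spectrum quoted above is easy to verify numerically: build A from eqn. 6.315, apply it to the ψ_j of eqn. 6.316, and compare with λ_j ψ_j. The sketch below does this in plain Python (no linear algebra library) for a small chain; the helper names are ours.

```python
import math


def rouse_matrix(N):
    """The N x N matrix A of eqn. 6.315 (free ends: diagonal 1, 2, ..., 2, 1)."""
    A = [[0.0] * N for _ in range(N)]
    for n in range(N - 1):          # one spring between beads n and n+1 (0-indexed)
        A[n][n] += 1.0
        A[n + 1][n + 1] += 1.0
        A[n][n + 1] -= 1.0
        A[n + 1][n] -= 1.0
    return A


def psi(j, n, N):
    """Normalized eigenfunction psi_j(n) of eqn. 6.316, with n = 1..N."""
    if j == 0:
        return 1.0 / math.sqrt(N)
    return math.sqrt(2.0 / N) * math.cos((2 * n - 1) * j * math.pi / (2 * N))


def max_eigen_residual(N):
    """max over j, n of |(A psi_j)(n) - lambda_j psi_j(n)| with lambda_j = 4 sin^2(pi j / 2N)."""
    A = rouse_matrix(N)
    worst = 0.0
    for j in range(N):
        lam = 4.0 * math.sin(math.pi * j / (2 * N)) ** 2
        vec = [psi(j, n, N) for n in range(1, N + 1)]
        for row in range(N):
            Av = sum(A[row][col] * vec[col] for col in range(N))
            worst = max(worst, abs(Av - lam * vec[row]))
    return worst


print(max_eigen_residual(12))   # should be at round-off level
```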
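We now work in the basis of normal modes $\{\eta_j^\mu\}$, where

$$ \eta_j^\mu(t) = \sum_{n=1}^N\psi_j(n)\,x_n^\mu(t) \,, \qquad x_n^\mu(t) = \sum_{j=0}^{N-1}\psi_j(n)\,\eta_j^\mu(t) \,. \tag{6.318} $$

We then have

$$ \frac{d\eta_j}{dt} = -\frac{1}{\tau_j}\,\eta_j + g_j(t) \,, \tag{6.319} $$

where the j-th relaxation time is

$$ \tau_j = \frac{\gamma}{4k\sin^2(\pi j/2N)} \tag{6.320} $$

and

$$ g_j^\mu(t) = \gamma^{-1}\sum_{n=1}^N\psi_j(n)\,f_n^\mu(t) \,. \tag{6.321} $$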

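Note that

$$ \big\langle g_j^\mu(t)\,g_{j'}^\nu(t')\big\rangle = 2\gamma^{-1}k_{\rm B}T\,\delta_{jj'}\,\delta_{\mu\nu}\,\delta(t-t') \,. \tag{6.322} $$

Integrating eqn. 6.319, we have for j = 0,

$$ \eta_0(t) = \eta_0(0) + \int_0^t\!dt'\,g_0(t') \,. \tag{6.323} $$

For the j > 0 modes,

$$ \eta_j(t) = \eta_j(0)\,e^{-t/\tau_j} + \int_0^t\!dt'\,g_j(t')\,e^{(t'-t)/\tau_j} \,. \tag{6.324} $$

Thus,

$$ \begin{aligned} \big\langle\eta_0^\mu(t)\,\eta_0^\nu(t')\big\rangle_c &= 2\gamma^{-1}k_{\rm B}T\,\delta_{\mu\nu}\,\min(t,t') \\ \big\langle\eta_j^\mu(t)\,\eta_j^\nu(t')\big\rangle_c &= \gamma^{-1}k_{\rm B}T\,\delta_{\mu\nu}\,\tau_j\Big(e^{-|t-t'|/\tau_j} - e^{-(t+t')/\tau_j}\Big) \,, \end{aligned} \tag{6.325} $$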
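where the 'connected average' is defined to be $\langle A(t)\,B(t')\rangle_c \equiv \langle A(t)\,B(t')\rangle - \langle A(t)\rangle\langle B(t')\rangle$. Transforming back to the original real space basis, we then have

$$ \big\langle x_n^\mu(t)\,x_{n'}^\nu(t')\big\rangle_c = \frac{2k_{\rm B}T}{N\gamma}\,\delta_{\mu\nu}\min(t,t') + \frac{k_{\rm B}T}{\gamma}\,\delta_{\mu\nu}\sum_{j=1}^{N-1}\tau_j\,\psi_j(n)\,\psi_j(n')\Big(e^{-|t-t'|/\tau_j} - e^{-(t+t')/\tau_j}\Big) \,. \tag{6.326} $$

In particular, the 'connected variance' of $\mathbf x_n(t)$ is

$$ \text{CVar}\big[\mathbf x_n(t)\big] \equiv \big\langle\mathbf x_n(t)^2\big\rangle_c = \frac{6k_{\rm B}T}{N\gamma}\,t + \frac{3k_{\rm B}T}{\gamma}\sum_{j=1}^{N-1}\tau_j\,\big[\psi_j(n)\big]^2\Big(1 - e^{-2t/\tau_j}\Big) \,. \tag{6.327} $$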

From this we see that at long times, i.e. when t ≫ τ1, the motion of $\mathbf x_n(t)$ is diffusive, with diffusion constant $D = k_{\rm B}T/N\gamma \propto N^{-1}$, which is inversely proportional to the chain length. Recall the Stokes result $\gamma = 6\pi\eta R/M$ for a sphere of radius R and mass M moving in a fluid of dynamical viscosity η. From $D = k_{\rm B}T/\gamma M$, shouldn't we expect the diffusion constant to be $D = k_{\rm B}T/6\pi\eta R \propto N^{-1/2}$, since the radius of gyration of the polymer is $R_g \propto N^{1/2}$? This argument smuggles in the assumption that the only dissipation is taking place at the outer surface of the polymer, modeled as a ball of radius $R_g$. In fact, for a Gaussian random walk in three space dimensions, the density for $r < R_g$ is $\rho \propto N^{-1/2}$, since there are N monomers inside a region of volume $(\sqrt N)^3 = N^{3/2}$. Accounting for Flory swelling due to steric interactions (see below), the density is $\rho \sim N^{-4/5}$, which is even smaller. So as N → ∞, the density within the $r = R_g$ effective sphere gets small, which means water molecules can easily penetrate. In that case the entire polymer chain should be considered to be in a dissipative environment, which is what the Rouse model says: each monomer executes overdamped motion.
 
A careful analysis of eqn. 6.327 reveals that there is a subdiffusive regime¹² where $\text{CVar}[\mathbf x_n(t)] \propto t^{1/2}$. To see this, first take the N ≫ 1 limit, in which case we may write $\tau_j = N^2\tau_0/j^2$, where $\tau_0 \equiv \gamma/\pi^2k$ and $j \in \{1,\ldots,N-1\}$. Let $s \equiv (n - \frac{1}{2})/N \in [0,1]$ be the scaled coordinate along the chain. The second term in eqn. 6.327 is then

$$ S(s,t) \equiv \frac{6k_{\rm B}T}{\gamma}\cdot\frac{\tau_1}{N}\sum_{j=1}^{N-1}\frac{\cos^2(\pi js)}{j^2}\Big(1 - e^{-2j^2t/\tau_1}\Big) \,. \tag{6.328} $$

¹² I am grateful to Jonathan Lam and Olga Dudko for explaining this to me.

Let $\sigma \equiv (t/\tau_1)^{1/2}$. When t ≪ τ1, i.e. σ ≪ 1, we have

$$ S(s,t) \simeq \frac{6k_{\rm B}T}{\gamma}\cdot\frac{\tau_1}{N}\,\sigma\int_0^{N\sigma}\!du\;\frac{\cos^2(\pi us/\sigma)}{u^2}\Big(1 - e^{-2u^2}\Big) \,. \tag{6.329} $$

Since s/σ ≫ 1, we may replace the cosine squared term by its average ½. If we further assume Nσ ≫ 1, which means we are in the regime $1 \ll t/\tau_0 \ll N^2$, then after performing the integral, using $\int_0^\infty du\,(1 - e^{-2u^2})/u^2 = \sqrt{2\pi}$, we obtain the result

$$ S(s,t) = \frac{3k_{\rm B}T}{\gamma}\sqrt{2\pi\tau_0\,t} \,, \tag{6.330} $$

provided s = O(1), i.e. the site n is not on either end of the chain. The result in eqn. 6.330 dominates the first term on the RHS of eqn. 6.327 since τ0 ≪ t ≪ τ1. This is the subdiffusive regime.
When $t \gg \tau_1 = N^2\tau_0$, the exponential on the RHS of eqn. 6.328 is negligible, and if we again approximate $\cos^2(\pi js) \simeq \frac{1}{2}$ and extend the upper limit on the sum to infinity, we find $S(t) = (3k_{\rm B}T/\gamma)(\tau_1/N)(\pi^2/6) \propto t^0$, which is dominated by the leading term on the RHS of eqn. 6.327. This is the diffusive regime, with $D = k_{\rm B}T/N\gamma$.
Finally, when t ≪ τ0, the factor $1 - \exp(-2t/\tau_j)$ may be expanded to first order in t. Using the completeness relation, which gives $\sum_{j=1}^{N-1}[\psi_j(n)]^2 = 1 - N^{-1}$, one then obtains $\text{CVar}[\mathbf x_n(t)] = (6k_{\rm B}T/\gamma)\,t$, which is independent of the force constant k. In this regime, the monomers don't have time to respond to the force from their neighbors, hence they each diffuse independently. On such short time scales, however, one should check to make sure that inertial effects can be ignored, i.e. that t ≫ M/γ.
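The three regimes can be seen directly by evaluating the mode sum in eqn. 6.327 and computing its logarithmic slope d ln CVar/d ln t. The sketch below sets k_BT = γ = k = 1 (an arbitrary choice of units), takes N = 200 and an interior bead, and uses the exact τ_j of eqn. 6.320 rather than the large-N approximation; the helper names are ours. The slope should be close to 1 for t ≪ τ0, roughly ½ for τ0 ≪ t ≪ τ1, and close to 1 again for t ≫ τ1.

```python
import math


def cvar(n, t, N, kT=1.0, gamma=1.0, k=1.0):
    """Connected variance <x_n(t)^2>_c of eqn. 6.327 (three Cartesian components)."""
    total = 6.0 * kT * t / (N * gamma)                       # j = 0: centre-of-mass diffusion
    for j in range(1, N):
        tau_j = gamma / (4.0 * k * math.sin(math.pi * j / (2 * N)) ** 2)    # eqn. 6.320
        psi_sq = (2.0 / N) * math.cos((2 * n - 1) * j * math.pi / (2 * N)) ** 2
        total += 3.0 * kT / gamma * tau_j * psi_sq * (-math.expm1(-2.0 * t / tau_j))
    return total


def local_exponent(n, t, N, r=1.1):
    """Logarithmic slope d ln CVar / d ln t, from a centred difference in ln t."""
    return (math.log(cvar(n, t * r, N)) - math.log(cvar(n, t / r, N))) / (2.0 * math.log(r))


N, n = 200, 67                     # an interior bead, s = (n - 1/2)/N ~ 1/3
tau0 = 1.0 / math.pi ** 2          # gamma / (pi^2 k) with gamma = k = 1
tau1 = N ** 2 * tau0
for label, t in [("t << tau0", 1e-3 * tau0),
                 ("tau0 << t << tau1", math.sqrt(tau0 * tau1)),
                 ("t >> tau1", 100.0 * tau1)]:
    print(f"{label:18s} slope = {local_exponent(n, t, N):.3f}")
```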

One serious defect of the Rouse model is its prediction of the relaxation time of the j = 1 mode, $\tau_1 \propto N^2$. The experimentally observed result is $\tau_1 \propto N^{3/2}$. We should stress here that the Rouse model applies to ideal chains. In the theory of polymer solutions, a theta solvent is one in which polymer coils act as ideal chains. An extension of the Rouse model, due to my former UCSD colleague Bruno Zimm, accounts for hydrodynamically-mediated interactions between any pair of 'beads' along the chain. Specifically, the Zimm model is given by

$$ \frac{dx_n^\mu}{dt} = \sum_{n'}H^{\mu\nu}(\mathbf x_n - \mathbf x_{n'})\Big[k\big(x_{n'+1}^\nu + x_{n'-1}^\nu - 2x_{n'}^\nu\big) + f_{n'}^\nu(t)\Big] \,, \tag{6.331} $$

where

$$ H^{\mu\nu}(\mathbf R) = \frac{1}{8\pi\eta R}\Big(\delta^{\mu\nu} + \hat R^\mu\hat R^\nu\Big) \tag{6.332} $$

is known as the Oseen hydrodynamic tensor (1927) and arises when computing the velocity in a fluid at position R when a point force $\mathbf F = \mathbf f\,\delta(\mathbf r)$ is applied at the origin. Typically one replaces $H(\mathbf R)$ by its average over the equilibrium distribution of polymer configurations. Zimm's model more correctly reproduces the behavior of polymers in θ-solvents.

6.7.3 Flory theory of self-avoiding walks

What is missing from the random walk free energy is the effect of steric interactions. An argument due to Flory takes these interactions into account in a mean field treatment. Suppose we have a chain of radius R. Then the average monomer density within the chain is $c = N/R^d$. Assuming short-ranged interactions, we should then add a term to the free energy which effectively counts the number of near self-intersections of the chain. This number should be roughly Nc. Thus, we write

$$ F(R,N) = F_0 + u(T)\,\frac{N^2}{R^d} + \frac{1}{2}\,d\,k_{\rm B}T\,\frac{R^2}{Na^2} \,. \tag{6.333} $$

The effective interaction u(T) is positive in the case of a so-called 'good solvent'.
The free energy is minimized when

$$ 0 = \frac{\partial F}{\partial R} = -\frac{d\,u\,N^2}{R^{d+1}} + d\,k_{\rm B}T\,\frac{R}{Na^2} \,, \tag{6.334} $$

which yields the result

$$ R_F(N) = \left(\frac{ua^2}{k_{\rm B}T}\right)^{1/(d+2)}N^{3/(d+2)} \propto N^\nu \,. \tag{6.335} $$
Thus, we obtain ν = 3/(d + 2). In d = 1 this says ν = 1, which is exactly correct because a SAW in d = 1 has no option but to keep going in the same direction. In d = 2, Flory theory predicts ν = 3/4, which is also exact. In d = 3, we have $\nu_{d=3} = \frac{3}{5}$, which is extremely close to the numerical value ν = 0.5880. Flory theory is again exact at the SAW upper critical dimension, which is d = 4, where ν = ½, corresponding to a Gaussian random walk¹³. Best. Mean. Field. Theory. Ever.
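As a check on the algebra leading to eqn. 6.335, one can minimize eqn. 6.333 numerically and read off ν from two chain lengths. The sketch below drops the constant F0, sets u = a = k_BT = 1 unless told otherwise, and uses a ternary search in ln R, which is safe because the free energy is convex, hence unimodal, in R. Function names are ours.

```python
import math


def flory_free_energy(R, N, d, u=1.0, a=1.0, kT=1.0):
    """R-dependent part of eqn. 6.333: u N^2 / R^d + (d/2) kT R^2 / (N a^2)."""
    return u * N ** 2 / R ** d + 0.5 * d * kT * R ** 2 / (N * a ** 2)


def flory_radius_numeric(N, d, **params):
    """Minimize eqn. 6.333 over R by ternary search in ln R."""
    lo, hi = math.log(1e-3), math.log(1e6)
    F = lambda lnR: flory_free_energy(math.exp(lnR), N, d, **params)
    for _ in range(200):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if F(m1) < F(m2):
            hi = m2
        else:
            lo = m1
    return math.exp(0.5 * (lo + hi))


def flory_radius_closed(N, d, u=1.0, a=1.0, kT=1.0):
    """Closed form, eqn. 6.335."""
    return (u * a ** 2 / kT) ** (1.0 / (d + 2)) * N ** (3.0 / (d + 2))


for d in (1, 2, 3, 4):
    nu = math.log(flory_radius_numeric(2000, d) / flory_radius_numeric(1000, d)) / math.log(2.0)
    print(d, round(nu, 4), 3.0 / (d + 2))
```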
How well are polymers described as SAWs? Fig. 6.24 shows the radius of gyration Rg versus molecular
weight M for polystyrene chains in a toluene and benzene solvent. The slope is ν = d ln Rg /d ln M =
0.5936. Experimental results can vary with concentration and temperature, but generally confirm the
validity of the SAW model.
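For a SAW under an external force, we compute the Gibbs partition function,

$$ Y(\mathbf F_{\rm ext}, N) = \int d^dR\;P_N(\mathbf R)\,e^{\mathbf F_{\rm ext}\cdot\mathbf R/k_{\rm B}T} = \int d^dx\;f(\mathbf x)\,e^{s\,\hat{\mathbf n}\cdot\mathbf x} \,, \tag{6.336} $$

where $\mathbf x = \mathbf R/R_F$, $s = R_F F_{\rm ext}/k_{\rm B}T$, and $\hat{\mathbf n} = \hat{\mathbf F}_{\rm ext}$. One then has $R(F_{\rm ext}) = R_F\,\Phi(R_F/\xi)$, where $\xi = k_{\rm B}T/F_{\rm ext}$. For small values of its argument one has Φ(u) ∝ u, hence $R(F_{\rm ext}) \propto F_{\rm ext}R_F^2/k_{\rm B}T$. For large u it can be shown that $\Phi(u) \propto u^{2/3}$, i.e. $R(F_{\rm ext}) \propto R_F\,(F_{\rm ext}R_F/k_{\rm B}T)^{2/3}$.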
On a lattice of coordination number z, the number of N-step random walks starting from the origin is $\Omega_N = z^N$. If we constrain our random walks to be self-avoiding, the number is reduced to

$$ \Omega_N^{\rm SAW} = C\,N^{\gamma-1}\,y^N \,, \tag{6.337} $$

where C and γ are dimension-dependent constants, and we expect $y \lesssim z - 1$, since at the very least a SAW cannot immediately double back on itself. In fact, on the cubic lattice one has z = 6 but y = 4.68, slightly less than z − 1. One finds $\gamma_{d=2} \simeq \frac{4}{3}$ and $\gamma_{d=3} \simeq \frac{7}{6}$. The RMS end-to-end distance of the SAW is

$$ R_F = a\,N^\nu \,, \tag{6.338} $$

where a and ν are d-dependent constants, with $\nu_{d=1} = 1$, $\nu_{d=2} \simeq \frac{3}{4}$, and $\nu_{d=3} \simeq \frac{3}{5}$. The distribution $P_N(\mathbf R)$ has a scaling form,

$$ P_N(\mathbf R) = \frac{1}{R_F^d}\,f\!\left(\frac{R}{R_F}\right) \qquad (a \ll R \ll Na) \,. \tag{6.339} $$

One finds

$$ f(x) \sim \begin{cases} x^g & x \ll 1 \\ \exp(-x^\delta) & x \gg 1 \,, \end{cases} \tag{6.340} $$

with g = (γ − 1)/ν and δ = 1/(1 − ν).

¹³ There are logarithmic corrections to the SAW result exactly at d = 4, but for all d > 4 one has ν = ½.

[Plot: log-log axes, Rg / nm (10¹ to 10³) versus M / (g/mol) (10⁵ to 10⁸).]

Figure 6.24: Radius of gyration Rg of polystyrene in a toluene and benzene solvent, plotted as a function of molecular weight of the polystyrene. The best fit corresponds to a power law Rg ∝ M^ν with ν = 0.5936. From J. Des Cloizeaux and G. Jannink, Polymers in Solution: Their Modeling and Structure (Oxford, 1990).
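Exact enumeration gives a feel for eqn. 6.337 and for the exponent ν at small N. The sketch below is not Flory theory: it enumerates every self-avoiding walk of up to 10 steps on the square lattice (z = 4) by depth-first search. The ratio c_N/c_{N−1} is a crude estimate of y (the accepted square-lattice value is about 2.638, below z − 1 = 3), and the effective exponent extracted from ⟨R²⟩ should land in the neighborhood of ν = 3/4, with sizable finite-N corrections at these short lengths.

```python
import math


def enumerate_saws(n_max):
    """Exactly enumerate self-avoiding walks on the square lattice (z = 4).

    Returns (counts, mean_r2): counts[n] is the number of n-step SAWs from the
    origin, and mean_r2[n] is their mean squared end-to-end distance (a = 1).
    """
    counts = [0] * (n_max + 1)
    r2_sum = [0] * (n_max + 1)
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    visited = {(0, 0)}

    def extend(x, y, n):
        counts[n] += 1
        r2_sum[n] += x * x + y * y
        if n == n_max:
            return
        for dx, dy in steps:
            site = (x + dx, y + dy)
            if site not in visited:            # the self-avoidance constraint
                visited.add(site)
                extend(site[0], site[1], n + 1)
                visited.remove(site)

    extend(0, 0, 0)
    return counts, [r2_sum[n] / counts[n] for n in range(n_max + 1)]


counts, r2 = enumerate_saws(10)
print(counts)                                       # compare with 4**n for ordinary walks
print("c_10 / c_9 =", counts[10] / counts[9])        # crude estimate of y
print("nu_eff =", math.log(r2[10] / r2[6]) / (2 * math.log(10 / 6)))
```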

6.7.4 Polymers and solvents

Consider a solution of monodisperse polymers of length N in a solvent. Let φ be the dimensionless monomer concentration, so φ/N is the dimensionless polymer concentration and $\phi_s = 1 - \phi$ is the dimensionless solvent concentration. (Dimensionless concentrations are obtained by dividing the corresponding dimensionful concentration by the overall density.) The entropy of mixing for such a system is given by eqn. 2.352. We have

$$ S_{\rm mix} = -\frac{Vk_{\rm B}}{v_0}\cdot\left[\frac{1}{N}\,\phi\ln\phi + (1-\phi)\ln(1-\phi)\right] \,, \tag{6.341} $$

where $v_0 \propto a^3$ is the volume per monomer. Accounting for an interaction between the monomer and the solvent, we have that the free energy of mixing is

$$ \frac{v_0\,F_{\rm mix}}{Vk_{\rm B}T} = \frac{1}{N}\,\phi\ln\phi + (1-\phi)\ln(1-\phi) + \chi\,\phi(1-\phi) \,, \tag{6.342} $$

where χ is the dimensionless polymer-solvent interaction, called the Flory parameter. This provides a mean field theory of the polymer-solvent system.
The osmotic pressure Π is defined by

$$ \Pi = -\frac{\partial F_{\rm mix}}{\partial V}\bigg|_{N_p} \,, \tag{6.343} $$

which is the variation of the free energy of mixing with respect to volume holding the number of polymers constant. The monomer concentration is $\phi = NN_pv_0/V$, so

$$ \frac{\partial}{\partial V}\bigg|_{N_p} = -\frac{\phi^2}{NN_pv_0}\,\frac{\partial}{\partial\phi}\bigg|_{N_p} \,. \tag{6.344} $$

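Now we have

$$ F_{\rm mix} = NN_pk_{\rm B}T\left[\frac{1}{N}\ln\phi + \big(\phi^{-1} - 1\big)\ln(1-\phi) + \chi\,(1-\phi)\right] \,, \tag{6.345} $$

and therefore

$$ \Pi = \frac{k_{\rm B}T}{v_0}\Big[\big(N^{-1} - 1\big)\,\phi - \ln(1-\phi) - \chi\,\phi^2\Big] \,. \tag{6.346} $$

In the limit of vanishing monomer concentration φ → 0, we recover

$$ \Pi = \frac{\phi\,k_{\rm B}T}{Nv_0} \,, \tag{6.347} $$

which is the ideal gas law for polymers.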
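Eqn. 6.346 can be checked against a direct numerical derivative of eqn. 6.342 with respect to V at fixed N_p, and its dilute limit against eqn. 6.347. The parameter values below (N = 100, N_p = 50, χ = 0.3, v0 = k_BT = 1) are arbitrary illustrations, and the function names are ours.

```python
import math


def f_mix(V, Np, N, chi, v0=1.0, kT=1.0):
    """Free energy of mixing, eqn. 6.342, with phi = N Np v0 / V."""
    phi = N * Np * v0 / V
    return (V * kT / v0) * (phi / N * math.log(phi)
                            + (1.0 - phi) * math.log1p(-phi)
                            + chi * phi * (1.0 - phi))


def osmotic_pressure(phi, N, chi, v0=1.0, kT=1.0):
    """Closed form, eqn. 6.346."""
    return kT / v0 * ((1.0 / N - 1.0) * phi - math.log1p(-phi) - chi * phi ** 2)


# Pi = -(dF_mix/dV) at fixed Np (eqn. 6.343), by a central difference.
N, Np, chi = 100, 50, 0.3
V = 50000.0                      # so that phi = N Np v0 / V = 0.1
h = 1.0
pi_numeric = -(f_mix(V + h, Np, N, chi) - f_mix(V - h, Np, N, chi)) / (2.0 * h)
print(pi_numeric, osmotic_pressure(N * Np / V, N, chi))
```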


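For $N^{-1} \ll \phi \ll 1$, we expand the logarithm and obtain

$$ \begin{aligned} \frac{v_0\Pi}{k_{\rm B}T} &= \frac{1}{N}\,\phi + \frac{1}{2}(1-2\chi)\,\phi^2 + \mathcal O(\phi^3) \\ &\approx \frac{1}{2}(1-2\chi)\,\phi^2 \,. \end{aligned} \tag{6.348} $$

Note that Π > 0 only if χ < ½, which is the condition for a 'good solvent'.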
In fact, eqn. 6.348 is only qualitatively correct. In the limit where χ ≪ ½, Flory showed that the individual polymer coils behave much as hard spheres of radius $R_F$. The osmotic pressure then satisfies something analogous to a virial equation of state:

$$ \begin{aligned} \frac{\Pi}{k_{\rm B}T} &= \frac{\phi}{Nv_0} + A\left(\frac{\phi}{Nv_0}\right)^2R_F^3 + \ldots \\ &= \frac{\phi}{Nv_0}\,h(\phi/\phi^*) \,. \end{aligned} \tag{6.349} $$

This is generalized to a scaling form in the second line, where h(x) is a scaling function, and $\phi^* = Nv_0/R_F^3 \propto N^{-4/5}$, assuming d = 3 and ν = 3/5 from Flory theory. As x = φ/φ* → 0, we must recover the ideal gas law, so h(x) = 1 + O(x) in this limit. For x → ∞, we require that the result be independent of the degree of polymerization N. This means $h(x) \propto x^p$ with $\frac{4}{5}p = 1$, i.e. $p = \frac{5}{4}$. The result is known as the des Cloizeaux law:

$$ \frac{v_0\Pi}{k_{\rm B}T} = C\,\phi^{9/4} \,, \tag{6.350} $$

where C is a constant. This is valid for what is known as semi-dilute solutions, where φ* ≪ φ ≪ 1. In the dense limit φ ∼ 1, the results do not exhibit this universality, and we must appeal to liquid state theory, which is no fun at all.
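The exponent bookkeeping in the des Cloizeaux argument can be done with exact fractions. The function name below is ours; for general ν and d the same argument gives Π ∝ φ^{dν/(dν−1)}, which is 9/4 for the Flory value ν = 3/5 in d = 3 and about 2.31 for the numerical value ν ≈ 0.588.

```python
from fractions import Fraction


def semidilute_exponents(nu, d=3):
    """Exponents (p, 1 + p) in the des Cloizeaux scaling argument.

    phi* = N v0 / R_F^d with R_F ~ N^nu gives phi* ~ N^(1 - d nu).  Then
    Pi ~ (phi/N) (phi/phi*)^p ~ phi^(1+p) N^(-1 - p (1 - d nu)), and requiring
    the N-dependence to cancel fixes p.
    """
    star = 1 - d * nu                  # exponent of N in phi*
    p = Fraction(-1) / star
    return p, 1 + p


print(semidilute_exponents(Fraction(3, 5)))                      # Flory nu in d = 3
print(float(semidilute_exponents(Fraction(588, 1000))[1]))        # nu ~ 0.588
```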

6.8 Appendix I : Potts Model in One Dimension

6.8.1 Definition

The Potts model is defined by the Hamiltonian

$$ H = -J\sum_{\langle ij\rangle}\delta_{\sigma_i,\sigma_j} - h\sum_i\delta_{\sigma_i,1} \,. \tag{6.351} $$

Here, the spin variables σi take values in the set {1, 2, …, q} on each site. The equivalent of an external magnetic field in the Ising case is a field h which prefers a particular value of σ (σ = 1 in the above Hamiltonian). Once again, it is not possible to compute the partition function on general lattices; however, in one dimension we may find Z using the transfer matrix method.

6.8.2 Transfer matrix

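On a ring of N sites, we have

$$ \begin{aligned} Z &= \text{Tr}\,e^{-\beta H} \\ &= \sum_{\{\sigma_n\}}e^{\beta h\delta_{\sigma_1,1}}\,e^{\beta J\delta_{\sigma_1,\sigma_2}}\cdots e^{\beta h\delta_{\sigma_N,1}}\,e^{\beta J\delta_{\sigma_N,\sigma_1}} \\ &= \text{Tr}\,R^N \,, \end{aligned} \tag{6.352} $$

where the q × q transfer matrix R is given by

$$ R_{\sigma\sigma'} = e^{\beta J\delta_{\sigma\sigma'}}\,e^{\frac{1}{2}\beta h\delta_{\sigma,1}}\,e^{\frac{1}{2}\beta h\delta_{\sigma',1}} = \begin{cases} e^{\beta(J+h)} & \text{if } \sigma = \sigma' = 1 \\ e^{\beta J} & \text{if } \sigma = \sigma' \neq 1 \\ e^{\beta h/2} & \text{if } \sigma = 1 \text{ and } \sigma' \neq 1 \\ e^{\beta h/2} & \text{if } \sigma \neq 1 \text{ and } \sigma' = 1 \\ 1 & \text{if } \sigma \neq 1 \text{ and } \sigma' \neq 1 \text{ and } \sigma \neq \sigma' \,. \end{cases} \tag{6.353} $$

In matrix form,

$$ R = \begin{pmatrix} e^{\beta(J+h)} & e^{\beta h/2} & e^{\beta h/2} & \cdots & e^{\beta h/2} \\ e^{\beta h/2} & e^{\beta J} & 1 & \cdots & 1 \\ e^{\beta h/2} & 1 & e^{\beta J} & \cdots & 1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ e^{\beta h/2} & 1 & 1 & \cdots & e^{\beta J} \end{pmatrix} \tag{6.354} $$

The matrix R has q eigenvalues $\lambda_j$, with j = 1, …, q. The partition function for the Potts chain is then

$$ Z = \sum_{j=1}^q\lambda_j^N \,. \tag{6.355} $$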

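We can actually find the eigenvalues of R analytically. To this end, consider the vectors

$$ |\,\phi\,\rangle = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \,, \qquad |\,\psi\,\rangle = \big(q - 1 + e^{\beta h}\big)^{-1/2}\begin{pmatrix} e^{\beta h/2} \\ 1 \\ \vdots \\ 1 \end{pmatrix} \,. \tag{6.356} $$

Then R may be written as

$$ R = \big(e^{\beta J} - 1\big)\,\mathbb I + \big(q - 1 + e^{\beta h}\big)\,|\,\psi\,\rangle\langle\,\psi\,| + \big(e^{\beta J} - 1\big)\big(e^{\beta h} - 1\big)\,|\,\phi\,\rangle\langle\,\phi\,| \,, \tag{6.357} $$

where $\mathbb I$ is the q × q identity matrix. When h = 0, we have a simpler form,

$$ R = \big(e^{\beta J} - 1\big)\,\mathbb I + q\,|\,\psi\,\rangle\langle\,\psi\,| \,. \tag{6.358} $$

From this we can read off the eigenvalues:

$$ \begin{aligned} \lambda_1 &= e^{\beta J} + q - 1 \\ \lambda_j &= e^{\beta J} - 1 \,, \qquad j \in \{2,\ldots,q\} \,, \end{aligned} \tag{6.359} $$

since $|\,\psi\,\rangle$ is an eigenvector with eigenvalue $\lambda = e^{\beta J} + q - 1$, and any vector orthogonal to $|\,\psi\,\rangle$ has eigenvalue $\lambda = e^{\beta J} - 1$. The partition function is then

$$ Z = \big(e^{\beta J} + q - 1\big)^N + (q-1)\big(e^{\beta J} - 1\big)^N \,. \tag{6.360} $$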

In the thermodynamic limit N → ∞, only the λ1 eigenvalue contributes, and we have

$$ F(T, N, h=0) = -Nk_{\rm B}T\,\ln\!\big(e^{J/k_{\rm B}T} + q - 1\big) \qquad \text{for } N \to \infty \,. \tag{6.361} $$
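The h = 0 eigenvalues in eqn. 6.359 and the partition function in eqn. 6.360 can be confirmed by brute force for a small ring. The sketch below builds R from eqn. 6.353 with plain Python lists (helper names are ours), applies it to |ψ⟩ and to one vector orthogonal to |ψ⟩, and compares Tr R^N with the closed form; q = 4, βJ = 0.8 and N = 10 are arbitrary test values.

```python
import math


def potts_transfer_matrix(q, K, H):
    """Transfer matrix of eqn. 6.353 with K = beta*J, H = beta*h; Potts state 1 is index 0."""
    return [[math.exp(K * (s == t) + 0.5 * H * (s == 0) + 0.5 * H * (t == 0))
             for t in range(q)] for s in range(q)]


def mat_vec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def trace_of_power(M, N):
    """Tr M^N by repeated multiplication (adequate for small q and N)."""
    n = len(M)
    P = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(N):
        P = [[sum(P[i][k] * M[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return sum(P[i][i] for i in range(n))


q, K, N = 4, 0.8, 10
R = potts_transfer_matrix(q, K, H=0.0)
psi = [1.0 / math.sqrt(q)] * q            # |psi> of eqn. 6.356 at h = 0
orth = [0.0, 1.0, -1.0, 0.0]              # one vector orthogonal to |psi>
lam1, lam_rest = math.exp(K) + q - 1, math.exp(K) - 1

print(mat_vec(R, psi), [lam1 * c for c in psi])          # R|psi> = lambda_1 |psi>
print(mat_vec(R, orth), [lam_rest * c for c in orth])    # eigenvalue e^{beta J} - 1
print(trace_of_power(R, N), lam1 ** N + (q - 1) * lam_rest ** N)   # eqn. 6.360
```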

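When h is nonzero, the calculation becomes somewhat more tedious, but still relatively easy. The problem is that $|\,\psi\,\rangle$ and $|\,\phi\,\rangle$ are not orthogonal, so we define

$$ |\,\chi\,\rangle = \frac{|\,\phi\,\rangle - |\,\psi\,\rangle\langle\,\psi\,|\,\phi\,\rangle}{\sqrt{1 - \langle\,\phi\,|\,\psi\,\rangle^2}} \,, \tag{6.362} $$

where

$$ x \equiv \langle\,\phi\,|\,\psi\,\rangle = \left(\frac{e^{\beta h}}{q - 1 + e^{\beta h}}\right)^{1/2} . \tag{6.363} $$

Now we have $\langle\,\chi\,|\,\psi\,\rangle = 0$, with $\langle\,\chi\,|\,\chi\,\rangle = 1$ and $\langle\,\psi\,|\,\psi\,\rangle = 1$, with

$$ |\,\phi\,\rangle = \sqrt{1 - x^2}\;|\,\chi\,\rangle + x\,|\,\psi\,\rangle \,, \tag{6.364} $$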

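and the transfer matrix is then

$$ \begin{aligned} R &= \big(e^{\beta J} - 1\big)\,\mathbb I + \big(q - 1 + e^{\beta h}\big)\,|\,\psi\,\rangle\langle\,\psi\,| \\ &\quad + \big(e^{\beta J} - 1\big)\big(e^{\beta h} - 1\big)\Big[(1 - x^2)\,|\,\chi\,\rangle\langle\,\chi\,| + x^2\,|\,\psi\,\rangle\langle\,\psi\,| + x\sqrt{1 - x^2}\,\Big(|\,\chi\,\rangle\langle\,\psi\,| + |\,\psi\,\rangle\langle\,\chi\,|\Big)\Big] \\ &= \big(e^{\beta J} - 1\big)\,\mathbb I + \left[q - 1 + e^{\beta h} + \big(e^{\beta J} - 1\big)\big(e^{\beta h} - 1\big)\frac{e^{\beta h}}{q - 1 + e^{\beta h}}\right]|\,\psi\,\rangle\langle\,\psi\,| \\ &\quad + \big(e^{\beta J} - 1\big)\big(e^{\beta h} - 1\big)\,\frac{q - 1}{q - 1 + e^{\beta h}}\,|\,\chi\,\rangle\langle\,\chi\,| \\ &\quad + \big(e^{\beta J} - 1\big)\big(e^{\beta h} - 1\big)\,\frac{\big((q - 1)\,e^{\beta h}\big)^{1/2}}{q - 1 + e^{\beta h}}\,\Big(|\,\chi\,\rangle\langle\,\psi\,| + |\,\psi\,\rangle\langle\,\chi\,|\Big) \,, \end{aligned} \tag{6.365} $$

which in the two-dimensional subspace spanned by $|\,\chi\,\rangle$ and $|\,\psi\,\rangle$ is of the form

$$ R = \begin{pmatrix} a & c \\ c & b \end{pmatrix} \,. \tag{6.366} $$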

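Recall that for any 2 × 2 Hermitian matrix,

$$ M = a_0\,\mathbb I + \mathbf a\cdot\boldsymbol\tau = \begin{pmatrix} a_0 + a_3 & a_1 - ia_2 \\ a_1 + ia_2 & a_0 - a_3 \end{pmatrix} \,, \tag{6.367} $$

the characteristic polynomial is

$$ P(\lambda) = \det\big(\lambda\,\mathbb I - M\big) = (\lambda - a_0)^2 - a_1^2 - a_2^2 - a_3^2 \,, \tag{6.368} $$

and hence the eigenvalues are

$$ \lambda_\pm = a_0 \pm \sqrt{a_1^2 + a_2^2 + a_3^2} \,. \tag{6.369} $$

For the transfer matrix of eqn. 6.365, we obtain, after a little work,

$$ \begin{aligned} \lambda_{1,2} &= e^{\beta J} - 1 + \tfrac{1}{2}\Big[q - 1 + e^{\beta h} + \big(e^{\beta J} - 1\big)\big(e^{\beta h} - 1\big)\Big] \\ &\quad \pm \tfrac{1}{2}\sqrt{\Big[q - 1 + e^{\beta h} + \big(e^{\beta J} - 1\big)\big(e^{\beta h} - 1\big)\Big]^2 - 4(q-1)\big(e^{\beta J} - 1\big)\big(e^{\beta h} - 1\big)} \,. \end{aligned} \tag{6.370} $$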

There are q − 2 other eigenvalues, however, associated with the (q − 2)-dimensional subspace orthogonal to $|\,\chi\,\rangle$ and $|\,\psi\,\rangle$. Clearly all these eigenvalues are given by

$$ \lambda_j = e^{\beta J} - 1 \,, \qquad j \in \{3,\ldots,q\} \,. \tag{6.371} $$

The partition function is then

$$ Z = \lambda_1^N + \lambda_2^N + (q-2)\,\lambda_3^N \,, \tag{6.372} $$

and in the thermodynamic limit N → ∞ the maximum eigenvalue λ1 dominates. Note that we recover the correct limit as h → 0.
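For h ≠ 0 the eigenvalues of eqns. 6.370 and 6.371 can likewise be checked against a brute-force trace of R^N. The sketch below uses arbitrary test values (q = 3, βJ = 0.7, βh = 0.4, N = 8); the helper names are ours, and `math.expm1` is used for the factors e^{βJ} − 1 and e^{βh} − 1.

```python
import math


def potts_transfer_matrix(q, K, H):
    """Transfer matrix of eqn. 6.353 with K = beta*J, H = beta*h; Potts state 1 is index 0."""
    return [[math.exp(K * (s == t) + 0.5 * H * (s == 0) + 0.5 * H * (t == 0))
             for t in range(q)] for s in range(q)]


def trace_of_power(M, N):
    """Tr M^N by repeated multiplication (adequate for small q and N)."""
    n = len(M)
    P = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(N):
        P = [[sum(P[i][k] * M[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return sum(P[i][i] for i in range(n))


def potts_eigenvalues(q, K, H):
    """lambda_1, lambda_2 of eqn. 6.370 and the (q - 2)-fold lambda_3 of eqn. 6.371."""
    eK1 = math.expm1(K)                    # e^{beta J} - 1
    D = eK1 * math.expm1(H)                # (e^{beta J} - 1)(e^{beta h} - 1)
    S = q - 1 + math.exp(H) + D
    root = math.sqrt(S * S - 4.0 * (q - 1) * D)
    return eK1 + 0.5 * (S + root), eK1 + 0.5 * (S - root), eK1


q, K, H, N = 3, 0.7, 0.4, 8
l1, l2, l3 = potts_eigenvalues(q, K, H)
print(trace_of_power(potts_transfer_matrix(q, K, H), N), l1 ** N + l2 ** N + (q - 2) * l3 ** N)
```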
Chapter 7

Mean Field Theory of Phase Transitions

7.1 References

– M. Kardar, Statistical Physics of Particles (Cambridge, 2007)


A superb modern text, with many insightful presentations of key concepts.

– M. Plischke and B. Bergersen, Equilibrium Statistical Physics (3rd edition, World Scientific, 2006)
An excellent graduate level text. Less insightful than Kardar but still a good modern treatment of
the subject. Good discussion of mean field theory.

– G. Parisi, Statistical Field Theory (Addison-Wesley, 1988)


An advanced text focusing on field theoretic approaches, covering mean field and Landau-Ginzburg
theories before moving on to renormalization group and beyond.

– J. P. Sethna, Entropy, Order Parameters, and Complexity (Oxford, 2006)


An excellent introductory text with a very modern set of topics and exercises. Available online at
http://www.physics.cornell.edu/sethna/StatMech


7.2 The van der Waals system

7.2.1 Equation of state

Recall the van der Waals equation of state,

$$ \left(p + \frac{a}{v^2}\right)(v - b) = RT \,, \tag{7.1} $$

where $v = N_{\rm A}V/N$ is the molar volume. Solving for p(v, T), we have

$$ p = \frac{RT}{v - b} - \frac{a}{v^2} \,. \tag{7.2} $$

Let us fix the temperature T and examine the function p(v). Clearly p(v) is a decreasing function of volume for v just above the minimum allowed value v = b, as well as for v → ∞. But is p(v) a monotonic function for all v ∈ [b, ∞]?
We can answer this by computing the derivative,

$$ \left(\frac{\partial p}{\partial v}\right)_T = \frac{2a}{v^3} - \frac{RT}{(v-b)^2} \,. \tag{7.3} $$

Setting this expression to zero for finite v, we obtain the equation¹

$$ \frac{2a}{bRT} = \frac{u^3}{(u-1)^2} \,, \tag{7.4} $$

where u ≡ v/b is dimensionless. It is easy to see that the function $f(u) = u^3/(u-1)^2$ has a unique minimum for u > 1. Setting $f'(u^*) = 0$ yields $u^* = 3$, and so $f_{\rm min} = f(3) = \frac{27}{4}$. Thus, for $T > T_c = 8a/27bR$, the LHS of eqn. 7.4 lies below the minimum value of the RHS, and there is no solution. This means that $p(v, T > T_c)$ is a monotonically decreasing function of v.
At $T = T_c$ there is a saddle-node bifurcation. Setting $v_c = bu^* = 3b$ and evaluating $p_c = p(v_c, T_c)$, we have that the location of the critical point for the van der Waals system is

$$ p_c = \frac{a}{27\,b^2} \,, \qquad v_c = 3b \,, \qquad T_c = \frac{8a}{27\,bR} \,. \tag{7.5} $$
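A short numerical check of eqn. 7.5: at (v_c, T_c) both (∂p/∂v)_T and (∂²p/∂v²)_T should vanish, and p(v_c, T_c) should equal p_c. The sketch uses the argon constants listed in Table 7.1 and R = 0.0831446 L·bar/(mol·K); it also prints the critical compressibility factor p_c v_c/RT_c, which eqn. 7.5 fixes at 3/8 for any a and b. Function names are ours.

```python
import math

R_GAS = 0.0831446  # gas constant in L·bar/(mol·K)


def vdw_critical_point(a, b, R=R_GAS):
    """Eqn. 7.5: (p_c, v_c, T_c) from the van der Waals constants a and b."""
    return a / (27.0 * b * b), 3.0 * b, 8.0 * a / (27.0 * b * R)


def dp_dv(v, T, a, b, R=R_GAS):
    """(dp/dv)_T, eqn. 7.3."""
    return 2.0 * a / v ** 3 - R * T / (v - b) ** 2


def d2p_dv2(v, T, a, b, R=R_GAS):
    """(d^2 p/dv^2)_T, from differentiating eqn. 7.3 once more."""
    return 2.0 * R * T / (v - b) ** 3 - 6.0 * a / v ** 4


a, b = 1.363, 0.03219        # argon: a in L^2 bar/mol^2, b in L/mol
pc, vc, Tc = vdw_critical_point(a, b)
p_at_c = R_GAS * Tc / (vc - b) - a / vc ** 2        # eqn. 7.2 evaluated at (v_c, T_c)
print(pc, p_at_c, dp_dv(vc, Tc, a, b), d2p_dv2(vc, Tc, a, b))
print("p_c v_c / R T_c =", pc * vc / (R_GAS * Tc))
```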

For $T < T_c$, there are two solutions to eqn. 7.4, corresponding to a local minimum and a local maximum of the function p(v). The locus of points in the (v, p) plane for which $(\partial p/\partial v)_T = 0$ is obtained by setting eqn. 7.3 to zero and solving for T, then substituting this into eqn. 7.2. The result is

$$ p^*(v) = \frac{a}{v^2} - \frac{2ab}{v^3} \,. \tag{7.6} $$

Expressed in terms of dimensionless quantities $\bar p = p/p_c$ and $\bar v = v/v_c$, this equation becomes

$$ \bar p^*(\bar v) = \frac{3}{\bar v^2} - \frac{2}{\bar v^3} \,. \tag{7.7} $$

Figure 7.1: Pressure versus molar volume for the van der Waals gas at temperatures in equal intervals
from T = 1.10 Tc (red) to T = 0.85 Tc (blue). The purple curve is p̄∗ (v̄).


Along the curve $p = p^*(v)$, the isothermal compressibility, $\kappa_T = -\frac{1}{v}\left(\frac{\partial v}{\partial p}\right)_T$, diverges, heralding a thermodynamic instability. To understand this better, let us compute the free energy of the van der Waals system, F = E − TS. Regarding the energy E, we showed back in chapter 2 that

$$ \left(\frac{\partial\varepsilon}{\partial v}\right)_T = T\left(\frac{\partial p}{\partial T}\right)_V - p = \frac{a}{v^2} \,, \tag{7.8} $$

which entails

$$ \varepsilon(T, v) = \tfrac{1}{2}fRT - \frac{a}{v} \,, \tag{7.9} $$

where ε = E/ν is the molar internal energy. The first term is the molar energy of an ideal gas, where f is the number of molecular degrees of freedom, which is the appropriate low density limit. The molar specific heat is then $c_V = \left(\frac{\partial\varepsilon}{\partial T}\right)_v = \frac{f}{2}R$, which means that the molar entropy is

$$ s(T, v) = \int^T\!dT'\,\frac{c_V}{T'} = \tfrac{f}{2}R\ln(T/T_c) + s_1(v) \,. \tag{7.10} $$

We then write f = ε − Ts, and we fix the function $s_1(v)$ by demanding that $p = -\left(\frac{\partial f}{\partial v}\right)_T$. This yields $s_1(v) = R\ln(v - b) + s_0$, where $s_0$ is a constant. Thus²,

$$ f(T, v) = \tfrac{f}{2}RT\big(1 - \ln(T/T_c)\big) - \frac{a}{v} - RT\ln(v - b) - Ts_0 \,. \tag{7.11} $$

¹ There is always a solution to $(\partial p/\partial v)_T = 0$ at v = ∞.
² Don't confuse the molar free energy (f) with the number of molecular degrees of freedom (f)!

  
Gas               a (L²·bar/mol²)   b (L/mol)   pc (bar)   Tc (K)   vc (L/mol)
Acetone           14.09             0.0994      52.82      505.1    0.2982
Argon             1.363             0.03219     48.72      150.9    0.0966
Carbon dioxide    3.640             0.04267     74.04      304.0    0.1280
Ethanol           12.18             0.08407     63.83      516.3    0.2522
Freon             10.78             0.0998      40.09      384.9    0.2994
Helium            0.03457           0.0237      2.279      5.198    0.0711
Hydrogen          0.2476            0.02661     12.95      33.16    0.0798
Mercury           8.200             0.01696     1055       1723     0.0509
Methane           2.283             0.04278     46.20      190.2    0.1283
Nitrogen          1.408             0.03913     34.06      128.2    0.1174
Oxygen            1.378             0.03183     50.37      154.3    0.0955
Water             5.536             0.03049     220.6      647.0    0.0915

Table 7.1: van der Waals parameters for some common gases. (Source: Wikipedia)
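As a consistency check on Table 7.1, the critical constants can be recomputed from a and b using eqn. 7.5. In the sketch below the CO₂ critical pressure is entered as 74.04 bar, the value implied by its a and b, and the function reports the largest relative mismatch over all rows. The gas constant R = 0.0831446 L·bar/(mol·K) is our input; the table values are copied from above.

```python
R_GAS = 0.0831446  # L·bar/(mol·K)

# (gas, a [L^2 bar/mol^2], b [L/mol], p_c [bar], T_c [K], v_c [L/mol]) from Table 7.1
TABLE_7_1 = [
    ("Acetone", 14.09, 0.0994, 52.82, 505.1, 0.2982),
    ("Argon", 1.363, 0.03219, 48.72, 150.9, 0.0966),
    ("Carbon dioxide", 3.640, 0.04267, 74.04, 304.0, 0.1280),
    ("Ethanol", 12.18, 0.08407, 63.83, 516.3, 0.2522),
    ("Freon", 10.78, 0.0998, 40.09, 384.9, 0.2994),
    ("Helium", 0.03457, 0.0237, 2.279, 5.198, 0.0711),
    ("Hydrogen", 0.2476, 0.02661, 12.95, 33.16, 0.0798),
    ("Mercury", 8.200, 0.01696, 1055.0, 1723.0, 0.0509),
    ("Methane", 2.283, 0.04278, 46.20, 190.2, 0.1283),
    ("Nitrogen", 1.408, 0.03913, 34.06, 128.2, 0.1174),
    ("Oxygen", 1.378, 0.03183, 50.37, 154.3, 0.0955),
    ("Water", 5.536, 0.03049, 220.6, 647.0, 0.0915),
]


def worst_relative_deviation(rows, R=R_GAS):
    """Largest relative mismatch between tabulated (p_c, T_c, v_c) and eqn. 7.5."""
    worst = 0.0
    for name, a, b, pc, Tc, vc in rows:
        predicted = (a / (27.0 * b * b), 8.0 * a / (27.0 * b * R), 3.0 * b)
        for tab, pred in zip((pc, Tc, vc), predicted):
            worst = max(worst, abs(pred - tab) / tab)
    return worst


print(f"worst relative deviation: {worst_relative_deviation(TABLE_7_1):.1e}")
```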

We know that under equilibrium conditions, f is driven to a minimum by spontaneous processes. Now suppose that $\left(\frac{\partial^2 f}{\partial v^2}\right)_T < 0$ over some range of v at a given temperature T. This would mean that one mole of the system at volume v and temperature T could lower its free energy by rearranging into two half-moles, with respective molar volumes v ± δv, each at temperature T. The total volume and temperature thus remain fixed, but the free energy changes by an amount $\Delta f = \frac{1}{2}\left(\frac{\partial^2 f}{\partial v^2}\right)_T(\delta v)^2 < 0$. This means that the system is unstable: it can lower its free energy by dividing up into two subsystems, each with a different density (i.e. molar volume). Note that the onset of instability occurs when

$$ \left(\frac{\partial^2 f}{\partial v^2}\right)_T = -\left(\frac{\partial p}{\partial v}\right)_T = \frac{1}{v\,\kappa_T} = 0 \,, \tag{7.12} $$

which is to say when $\kappa_T = \infty$. As we saw, this occurs at $p = p^*(v)$, given in eqn. 7.6.

However, the condition $\left(\frac{\partial^2 f}{\partial v^2}\right)_T < 0$ is in fact too strong: it is sufficient, but not necessary, for instability. That is, the system can be unstable even at molar volumes where $\left(\frac{\partial^2 f}{\partial v^2}\right)_T > 0$. The reason is shown graphically in fig. 7.2. At the fixed temperature T, for any molar volume v between $v_{\rm liquid} \equiv v_1$ and $v_{\rm gas} \equiv v_2$, the system can lower its free energy by phase separating into regions of different molar volumes. In general we can write

$$ v = (1 - x)\,v_1 + x\,v_2 \,, \tag{7.13} $$

so $v = v_1$ when x = 0 and $v = v_2$ when x = 1. The free energy upon phase separation is simply

$$ f = (1 - x)\,f_1 + x\,f_2 \,, \tag{7.14} $$

where $f_j = f(v_j, T)$. This function is given by the straight black line connecting the points at volumes $v_1$ and $v_2$ in fig. 7.2.
The two equations which give us $v_1$ and $v_2$ are

$$ \frac{\partial f}{\partial v}\bigg|_{v_1,T} = \frac{\partial f}{\partial v}\bigg|_{v_2,T} = \frac{f(T, v_2) - f(T, v_1)}{v_2 - v_1} \,. \tag{7.15} $$

Figure 7.2: Molar free energy f(T, v) of the van der Waals system at T = 0.85 Tc, with dot-dashed black line showing the Maxwell construction connecting molar volumes v1,2 on opposite sides of the coexistence curve.


In terms of the pressure, $p = -\left(\frac{\partial f}{\partial v}\right)_T$, these equations are equivalent to

$$ p(T, v_1) = p(T, v_2) = \frac{1}{v_2 - v_1}\int_{v_1}^{v_2}\!dv\;p(T, v) \,. \tag{7.16} $$

This procedure is known as the Maxwell construction, and is depicted graphically in Fig. 7.3. When the Maxwell construction is enforced, the isotherms resemble the curves in Fig. 7.4. In this figure, all points within the purple shaded region have $\partial^2 f/\partial v^2 < 0$, hence this region is unstable to infinitesimal fluctuations. The boundary of this region is called the spinodal, and the spontaneous phase separation into two phases is a process known as spinodal decomposition. The dot-dashed orange curve, called the coexistence curve, marks the instability boundary for nucleation. In a nucleation process, an energy barrier must be overcome in order to achieve the lower free energy state. There is no energy barrier for spinodal decomposition: it is a spontaneous process.
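The Maxwell construction of eqn. 7.16 is straightforward to carry out numerically in reduced units, where the van der Waals isotherm reads p̄ = 8T̄/(3v̄ − 1) − 3/v̄². The sketch below (our own helpers, plain bisection) locates the two spinodal points, finds the liquid and gas roots of p̄(v̄) = P outside them, and adjusts P until the net area ∫(p̄ − P) dv̄ vanishes, using the antiderivative (8T̄/3) ln(3v̄ − 1) + 3/v̄. It is restricted to 27/32 < T̄ < 1, where the local minimum of the isotherm is positive. Near T_c the width v̄_G − v̄_L should approach 4(1 − T̄)^{1/2}, the value eqn. 7.24 gives for this equation of state.

```python
import math


def p_reduced(v, T):
    """Reduced van der Waals isotherm: p = 8T/(3v - 1) - 3/v^2 (units of p_c, v_c, T_c)."""
    return 8.0 * T / (3.0 * v - 1.0) - 3.0 / v ** 2


def bisect(f, lo, hi, tol=1e-13, max_iter=300):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0.0) == (f_lo > 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def maxwell_construction(T):
    """Return (P, v_L, v_G) on the coexistence curve at reduced temperature T."""
    if not 27.0 / 32.0 < T < 1.0:
        raise ValueError("this sketch assumes 27/32 < T < 1")
    v_min = 1.0 / 3.0 + 1e-9                          # just above the excluded volume
    dpdv = lambda v: -24.0 * T / (3.0 * v - 1.0) ** 2 + 6.0 / v ** 3
    v_a = bisect(dpdv, v_min, 1.0)                     # spinodal: local minimum of p
    v_b = bisect(dpdv, 1.0, 1e3)                       # spinodal: local maximum of p

    def outer_roots(P):
        v1 = bisect(lambda v: p_reduced(v, T) - P, v_min, v_a)    # liquid branch
        v2 = bisect(lambda v: p_reduced(v, T) - P, v_b, 1e9)      # gas branch
        return v1, v2

    def net_area(P):
        v1, v2 = outer_roots(P)
        integral = (8.0 * T / 3.0) * math.log((3.0 * v2 - 1.0) / (3.0 * v1 - 1.0)) + 3.0 / v2 - 3.0 / v1
        return integral - P * (v2 - v1)               # decreases monotonically with P

    P = bisect(net_area, p_reduced(v_a, T) + 1e-12, p_reduced(v_b, T) - 1e-12)
    v1, v2 = outer_roots(P)
    return P, v1, v2


P, vL, vG = maxwell_construction(0.85)
print(f"T = 0.85: p = {P:.4f}, v_L = {vL:.4f}, v_G = {vG:.4f}")
P2, vL2, vG2 = maxwell_construction(0.999)
print((vG2 - vL2) / (4.0 * math.sqrt(1.0 - 0.999)))     # close to 1 near T_c
```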

7.2.2 Analytic form of the coexistence curve near the critical point

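We write $v_L = v_c + w_L$ and $v_G = v_c + w_G$. One of our equations is $p(v_c + w_L, T) = p(v_c + w_G, T)$. Taylor expanding in powers of $w_L$ and $w_G$, we have

$$ 0 = p_v(v_c, T)\,(w_G - w_L) + \tfrac{1}{2}\,p_{vv}(v_c, T)\,\big(w_G^2 - w_L^2\big) + \tfrac{1}{6}\,p_{vvv}(v_c, T)\,\big(w_G^3 - w_L^3\big) + \ldots \,, \tag{7.17} $$

where

$$ p_v \equiv \frac{\partial p}{\partial v} \,, \quad p_{vv} \equiv \frac{\partial^2 p}{\partial v^2} \,, \quad p_{vvv} \equiv \frac{\partial^3 p}{\partial v^3} \,, \quad p_{vT} \equiv \frac{\partial^2 p}{\partial v\,\partial T} \,, \quad \text{etc.} \tag{7.18} $$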

Figure 7.3: Maxwell construction in the (v, p) plane. The system is absolutely unstable between volumes vd and ve. For v ∈ [va, vd] or v ∈ [ve, vc], the solution is unstable with respect to phase separation. Source: Wikipedia.

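The second equation we write as

$$ \int_{w_L}^{w_G}\!dw\;p(v_c + w, T) = \tfrac{1}{2}(w_G - w_L)\Big[p(v_c + w_L, T) + p(v_c + w_G, T)\Big] \,. \tag{7.19} $$

Expanding in powers of $w_L$ and $w_G$, this becomes

$$ \begin{aligned} & p(v_c, T)\,(w_G - w_L) + \tfrac{1}{2}\,p_v(v_c, T)\,\big(w_G^2 - w_L^2\big) + \tfrac{1}{6}\,p_{vv}(v_c, T)\,\big(w_G^3 - w_L^3\big) \\ &\qquad + \tfrac{1}{24}\,p_{vvv}(v_c, T)\,\big(w_G^4 - w_L^4\big) + \tfrac{1}{120}\,p_{vvvv}(v_c, T)\,\big(w_G^5 - w_L^5\big) + \ldots \\ &= \tfrac{1}{2}(w_G - w_L)\Big\{2\,p(v_c, T) + p_v(v_c, T)\,(w_G + w_L) + \tfrac{1}{2}\,p_{vv}(v_c, T)\,\big(w_G^2 + w_L^2\big) \\ &\qquad + \tfrac{1}{6}\,p_{vvv}(v_c, T)\,\big(w_G^3 + w_L^3\big) + \tfrac{1}{24}\,p_{vvvv}(v_c, T)\,\big(w_G^4 + w_L^4\big) + \ldots\Big\} \end{aligned} \tag{7.20} $$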

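Subtracting the LHS from the RHS, we find that we can then divide by $\frac{1}{12}(w_G - w_L)^3$, resulting in

$$ 0 = p_{vv}(v_c, T) + \tfrac{1}{2}\,p_{vvv}(v_c, T)\,(w_G + w_L) + \tfrac{1}{20}\,p_{vvvv}(v_c, T)\,\big(3w_G^2 + 4w_Gw_L + 3w_L^2\big) + \mathcal O\big(w_{G,L}^3\big) \,. \tag{7.21} $$

We now define $w_\pm \equiv w_G \pm w_L$. In terms of these variables, eqns. 7.17 (after dividing by $w_G - w_L$) and 7.21 become

$$ \begin{aligned} 0 &= p_v(v_c, T) + \tfrac{1}{2}\,p_{vv}(v_c, T)\,w_+ + \tfrac{1}{8}\,p_{vvv}(v_c, T)\,\big(w_+^2 + \tfrac{1}{3}w_-^2\big) + \mathcal O\big(w_\pm^3\big) \\ 0 &= p_{vv}(v_c, T) + \tfrac{1}{2}\,p_{vvv}(v_c, T)\,w_+ + \tfrac{1}{8}\,p_{vvvv}(v_c, T)\,\big(w_+^2 + \tfrac{1}{5}w_-^2\big) + \mathcal O\big(w_\pm^3\big) \,. \end{aligned} \tag{7.22} $$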

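We now evaluate $w_\pm$ to leading order in the temperature deviation $\Theta$, defined by $T = T_c + \Theta$. Note that $p_v(v_c, T_c) = p_{vv}(v_c, T_c) = 0$, since the critical point is an inflection point in the $(v,p)$ plane. Thus, we have $p_v(v_c,T) = p_{vT}\,\Theta + \mathcal{O}(\Theta^2)$, where $p_{vT} \equiv p_{vT}(v_c,T_c)$; likewise $p_{vv}(v_c,T) = p_{vvT}\,\Theta + \mathcal{O}(\Theta^2)$, and all remaining derivatives are evaluated at $(v_c, T_c)$. We can then see that $w_-$ is of leading order $\sqrt{-\Theta}$, while $w_+$ is of leading order $\Theta$. Since both equations in 7.22 are symmetric under $w_G \leftrightarrow w_L$, only even powers of $w_-$ appear, and the neglected terms are $\mathcal{O}(\Theta^2)$. This allows us to write

$$
\begin{aligned}
0 &= p_{vT}\,\Theta + \tfrac{1}{24}\, p_{vvv}\, w_-^2 + \mathcal{O}(\Theta^2) \\
0 &= p_{vvT}\,\Theta + \tfrac{1}{2}\, p_{vvv}\, w_+ + \tfrac{1}{40}\, p_{vvvv}\, w_-^2 + \mathcal{O}(\Theta^2)\,.
\end{aligned} \tag{7.23}
$$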

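Thus,

$$
\begin{aligned}
w_- &= \bigg(\frac{24\, p_{vT}}{p_{vvv}}\bigg)^{\!1/2} \sqrt{-\Theta} + \ldots \\
w_+ &= \bigg(\frac{6\, p_{vT}\, p_{vvvv} - 10\, p_{vvv}\, p_{vvT}}{5\, p_{vvv}^2}\bigg)\, \Theta + \ldots\,.
\end{aligned} \tag{7.24}
$$

We then have

$$
\begin{aligned}
w_L &= -\bigg(\frac{6\, p_{vT}}{p_{vvv}}\bigg)^{\!1/2} \sqrt{-\Theta} + \bigg(\frac{3\, p_{vT}\, p_{vvvv} - 5\, p_{vvv}\, p_{vvT}}{5\, p_{vvv}^2}\bigg)\, \Theta + \mathcal{O}\big((-\Theta)^{3/2}\big) \\
w_G &= +\bigg(\frac{6\, p_{vT}}{p_{vvv}}\bigg)^{\!1/2} \sqrt{-\Theta} + \bigg(\frac{3\, p_{vT}\, p_{vvvv} - 5\, p_{vvv}\, p_{vvT}}{5\, p_{vvv}^2}\bigg)\, \Theta + \mathcal{O}\big((-\Theta)^{3/2}\big)\,.
\end{aligned} \tag{7.25}
$$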

Suppose we follow along an isotherm starting from the high molar volume (gas) phase. If $T > T_c$, the volume $v$ decreases continuously as the pressure $p$ increases. If $T < T_c$, then at the instant the isotherm first intersects the orange boundary curve in Fig. 7.4, there is a discontinuous change in the molar volume from high (gas) to low (liquid). This discontinuous change is the hallmark of a first order phase transition. Note that the volume discontinuity is $\Delta v = w_- \propto (T_c - T)^{1/2}$. This is an example of critical behavior in which the order parameter $\phi$, which in this case may be taken to be the difference $\phi = v_G - v_L$, behaves as a power law in $T - T_c$, where $T_c$ is the critical temperature. In this case, we have $\phi(T) \propto (T_c - T)_+^\beta$, where $\beta = \tfrac{1}{2}$ is the exponent, and where $(T_c - T)_+$ is defined to be $T_c - T$ if $T < T_c$ and $0$ otherwise. The isothermal compressibility is $\kappa_T = -1/\big[v\, p_v(v,T)\big]$. This is finite along the coexistence curve; it diverges only along the spinodal, where $p_v = 0$. It therefore diverges at the critical point, which lies at the intersection of the spinodal and the coexistence curve.
It is convenient to express the equation of state and the coexistence curve in terms of dimensionless variables. Write

$$
\bar p = \frac{p}{p_c}\,, \qquad \bar v = \frac{v}{v_c}\,, \qquad \bar T = \frac{T}{T_c}\,. \tag{7.26}
$$

The van der Waals equation of state then becomes

$$
\bar p = \frac{8 \bar T}{3 \bar v - 1} - \frac{3}{\bar v^2}\,. \tag{7.27}
$$

Further expressing these dimensionless quantities in terms of distance from the critical point, we write

$$
\bar p = 1 + \pi\,, \qquad \bar v = 1 + \epsilon\,, \qquad \bar T = 1 + t\,. \tag{7.28}
$$

Figure 7.4: Pressure-volume isotherms for the van der Waals system, as in Fig. 7.1, but corrected to account for the Maxwell construction. The boundary of the purple shaded region is the spinodal line $\bar p^*(\bar v)$. The boundary of the orange shaded region is the stability boundary with respect to phase separation.

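Thus,

$$
\pi(\epsilon, t) = \frac{8(1 + t)}{2 + 3\epsilon} - \frac{3}{(1 + \epsilon)^2} - 1\,. \tag{7.29}
$$

Note that both sides of this equation vanish at $(\pi, \epsilon, t) = (0, 0, 0)$. Since the reduced variables are linear rescalings and shifts of the original ones, eqn. 7.25 applies with $p \to \pi$, $v \to \epsilon$ and $\Theta \to t$, and we can then write

$$
\epsilon_{L,G} = \mp \bigg(\frac{6\, \pi_{\epsilon t}}{\pi_{\epsilon\epsilon\epsilon}}\bigg)^{\!1/2} (-t)^{1/2} + \bigg(\frac{3\, \pi_{\epsilon t}\, \pi_{\epsilon\epsilon\epsilon\epsilon} - 5\, \pi_{\epsilon\epsilon\epsilon}\, \pi_{\epsilon\epsilon t}}{5\, \pi_{\epsilon\epsilon\epsilon}^2}\bigg)\, t + \mathcal{O}\big((-t)^{3/2}\big)\,. \tag{7.30}
$$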
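As a sanity check on eqn. 7.30, one can specialize it to the van der Waals form 7.29. Differentiating eqn. 7.29 by hand (our own calculation, worth re-deriving) gives, at the origin, $\pi_{\epsilon t} = -6$, $\pi_{\epsilon\epsilon t} = 18$, $\pi_{\epsilon\epsilon\epsilon} = -9$ and $\pi_{\epsilon\epsilon\epsilon\epsilon} = 126$, so that eqn. 7.30 becomes $\epsilon_{L,G} \approx \mp 2(-t)^{1/2} - \tfrac{18}{5}\, t$. The Python sketch below compares this with a direct numerical Maxwell construction that imposes equal pressures and equal areas, the two conditions behind eqns. 7.17 and 7.19. It assumes plain bisection is adequate; the helper names are ours.

```python
import math

def p_red(v, T):
    """Reduced van der Waals pressure, eqn. 7.27 (valid for v > 1/3)."""
    return 8.0 * T / (3.0 * v - 1.0) - 3.0 / v**2

def dp_dv(v, T):
    """Derivative of p_red with respect to v."""
    return -24.0 * T / (3.0 * v - 1.0) ** 2 + 6.0 / v**3

def antiderivative(v, T):
    """An antiderivative of p_red with respect to v, used for the equal-area integral."""
    return (8.0 * T / 3.0) * math.log(3.0 * v - 1.0) + 3.0 / v

def bisect(f, lo, hi, iters=200):
    """Root of f in (lo, hi), assuming f > 0 left of a single root and f < 0 right of it.
    Only interior midpoints are evaluated, so f may be singular at lo."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid == lo or mid == hi:
            break
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def coexistence(t):
    """Reduced liquid and gas volumes (v_L, v_G) at reduced temperature 1 + t, with -1 < t < 0."""
    T = 1.0 + t
    # Spinodal volumes: p_red has a local minimum at v_sp1 < 1 and a local maximum at v_sp2 > 1.
    v_sp1 = bisect(lambda v: -dp_dv(v, T), 1.0 / 3.0, 1.0)
    v_big = 2.0
    while dp_dv(v_big, T) >= 0.0:
        v_big *= 2.0
    v_sp2 = bisect(lambda v: dp_dv(v, T), 1.0, v_big)

    def volumes(P):
        # Liquid root on the branch left of v_sp1, gas root on the branch right of v_sp2.
        vL = bisect(lambda v: p_red(v, T) - P, 1.0 / 3.0, v_sp1)
        v_hi = 2.0 * v_sp2
        while p_red(v_hi, T) >= P:
            v_hi *= 2.0
        vG = bisect(lambda v: p_red(v, T) - P, v_sp2, v_hi)
        return vL, vG

    def area_mismatch(P):
        # Integral of p dv from v_L to v_G minus P (v_G - v_L); this decreases with P.
        vL, vG = volumes(P)
        return antiderivative(vG, T) - antiderivative(vL, T) - P * (vG - vL)

    P_lo = max(p_red(v_sp1, T), 0.0)  # the gas root only exists for P > 0
    P_coex = bisect(area_mismatch, P_lo, p_red(v_sp2, T))
    return volumes(P_coex)

def eps_asymptotic(t):
    """Eqn. 7.30 with the hand-computed vdW derivatives: eps_{L,G} = -/+ 2 (-t)^(1/2) - (18/5) t."""
    s = math.sqrt(-t)
    return -2.0 * s - 3.6 * t, 2.0 * s - 3.6 * t

for t in (-1e-2, -1e-3, -1e-4):
    vL, vG = coexistence(t)
    eL, eG = eps_asymptotic(t)
    print(f"t = {t:.0e}: eps_L = {vL - 1:+.6f} (eqn. 7.30: {eL:+.6f}), "
          f"eps_G = {vG - 1:+.6f} (eqn. 7.30: {eG:+.6f})")
```

The agreement should improve as $t \to 0^-$, with residual differences of order $(-t)^{3/2}$, the accuracy claimed by eqn. 7.30.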

7.2.3 History of the van der Waals equation

The van der Waals equation of state first appears in van der Waals’ 1873 PhD thesis³, “Over de Continuïteit van den Gas- en Vloeistoftoestand” (“On the continuity of the gas and liquid state”). In his Nobel lecture⁴, van der Waals writes of how he was inspired by Rudolf Clausius’ 1857 treatise on the nature of heat, where it is posited that a gas in fact consists of microscopic particles whizzing around at high velocities. van der Waals reasoned that liquids, which result when gases are compressed, also consist of ‘small moving particles’: “Thus I conceived the idea that there is no essential difference between the gaseous and the liquid state of matter. . . ”

³ Johannes Diderik van der Waals, the eldest of ten children, was the son of a carpenter. As a child he received only a primary school education. He worked for a living until age 25, and was able to enroll in a three-year industrial evening school for working-class youth. Afterward he continued his studies independently, in his spare time, working as a teacher. By the time he obtained his PhD, he was 36 years old. He received the Nobel Prize for Physics in 1910.
⁴ See http://www.nobelprize.org/nobel_prizes/physics/laureates/1910/waals-lecture.pdf

Figure 7.5: ‘Universality’ of the liquid-gas transition for eight different atomic and molecular fluids, from E. A. Guggenheim, J. Chem. Phys. 13, 253 (1945). Dimensionless temperature $T/T_c$ versus dimensionless density $\rho/\rho_c = v_c/v$ is shown. The van der Waals / mean field theory gives $\Delta v = v_{\rm gas} - v_{\rm liquid} \propto (-t)^{1/2}$, while experiments show a result closer to $\Delta v \propto (-t)^{1/3}$. Here $t \equiv \bar T - 1 = (T - T_c)/T_c$ is the dimensionless temperature deviation with respect to the critical point.
Clausius’ treatise showed how his kinetic theory of heat was consistent with Boyle’s law for gases (pV =
constant at fixed temperature). van der Waals pondered why this might fail for the non-dilute liquid
phase, and he reasoned that there were two principal differences: inter-particle attraction and excluded
volume. These considerations prompted him to posit his famous equation,

$$
p = \frac{RT}{v - b} - \frac{a}{v^2}\,. \tag{7.31}
$$

The first term on the RHS accounts for excluded volume effects, and the second for mutual attractions. In the limiting case of $p \to \infty$, the molar volume approaches $v = b$. On physical grounds, one might expect $b = v_0/\zeta$, where $v_0 = N_A\, \omega_0$ is $N_A$ times the volume $\omega_0$ of a single molecule, and the packing fraction is $\zeta = N \omega_0 / V = v_0 / v$, which is the ratio of the total molecular volume to the total system volume. In three dimensions, the maximum possible packing fraction is for fcc and hcp lattices, each of which has coordination number 12, with $\zeta_{\rm max} = \pi/(3\sqrt{2}) = 0.74048$. Dense random packing results in $\zeta_{\rm drp} = 0.634$. Expanding the vdW equation of state in inverse powers of $v$ yields

$$
p = \frac{RT}{v} + \Big(b - \frac{a}{RT}\Big) \cdot \frac{RT}{v^2} + \mathcal{O}\big(v^{-3}\big)\,, \tag{7.32}
$$

and we read off the second virial coefficient $B_2 = \big(b - \frac{a}{RT}\big)/N_A$. For hard spheres, $a = 0$, and the result $B_2 = 4\omega_0$ from the Mayer cluster expansion corresponds to $b_{\rm Mayer} = 4 v_0$, which is larger than the result from even the loosest regular sphere packing, i.e. that for a cubic lattice, with $\zeta_{\rm cub} = \pi/6$, for which $b = v_0/\zeta_{\rm cub} \approx 1.91\, v_0$.
Another of van der Waals’ great achievements was his articulation of the law of corresponding states. Recall that the van der Waals equation of state, when written in terms of dimensionless quantities $\bar p = p/p_c$, $\bar v = v/v_c$, and $\bar T = T/T_c$, takes the form of eqn. 7.27. Thus, while the $a$ and $b$ parameters are specific to each fluid – see Tab. 7.1 – when written in terms of these scaled dimensionless variables, the equation of state and all its consequent properties (e.g. the liquid-gas phase transition) are universal.
The van der Waals equation is best viewed as semi-phenomenological. Interaction and excluded volume effects surely are present, but the van der Waals equation itself only captures them in a very approximate way. It is applicable to gases, where it successfully predicts features that are not present in ideal systems (e.g. throttling). It is of only qualitative and pedagogical use in the study of dense fluids, the essential physics of which lies in the behavior of quantities like the pair distribution function $g(r)$. As we saw in chapter 6, any adequate first-principles derivation of $g(r)$, a function which can be measured in scattering experiments, involves rather complicated approximation schemes to close the BBGKY hierarchy. Otherwise one must resort to numerical simulations such as the Monte Carlo method. Nevertheless, the lessons learned from the van der Waals system are invaluable, and they provide us with a first glimpse of what is going on in the vicinity of a phase transition, and of how nonanalytic behavior, such as $v_G - v_L \propto (T_c - T)^\beta$ with noninteger exponent $\beta$, may result from singularities in the free energy at the critical point.

7.3 Fluids, Magnets, and the Ising Model

7.3.1 Lattice gas description of a fluid

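The usual description of a fluid follows from a continuum Hamiltonian of the form

$$
\hat H(\boldsymbol{p}, \boldsymbol{x}) = \sum_{i=1}^N \frac{\boldsymbol{p}_i^2}{2m} + \sum_{i<j} u(\boldsymbol{x}_i - \boldsymbol{x}_j)\,. \tag{7.33}
$$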

The potential $u(r)$ is typically central, depending only on the magnitude $|r|$, and short-ranged. Now consider a discretized version of the fluid, in which we divide up space into cells (cubes, say), each of which can accommodate at most one fluid particle (due to excluded volume effects). That is, each cube has a volume on the order of $a^3$, where $a$ is the diameter of the fluid particles. In the cube centered at $R$ we set the occupancy $n_R = 1$ if a fluid particle is present and $n_R = 0$ if there is no fluid particle present. We then have that the potential energy is

$$
U = \sum_{i<j} u(x_i - x_j) = \tfrac{1}{2} \sum_{R \neq R'} V_{RR'}\, n_R\, n_{R'}\,, \tag{7.34}
$$
7.3. FLUIDS, MAGNETS, AND THE ISING MODEL 371

Figure 7.6: The lattice gas model. An occupied cell corresponds to n = 1 (σ = +1), and a vacant cell to
n = 0 (σ = −1).

where $V_{RR'} \approx u(R - R')$, and where $R_k$ is the position at the center of cube $k$. The grand partition function is then approximated as

$$
\Xi(T, V, \mu) \approx \sum_{\{n_R\}} \Big( \prod_R \xi^{n_R} \Big) \exp\Big( -\tfrac{1}{2} \beta \sum_{R \neq R'} V_{RR'}\, n_R\, n_{R'} \Big)\,, \tag{7.35}
$$

where

$$
\xi = e^{\beta\mu}\, \lambda_T^{-d}\, a^d\,, \tag{7.36}
$$

where $a$ is the side length of each cube (chosen to be on the order of the hard sphere diameter). The $\lambda_T^{-d}$ factor arises from the integration over the momenta. Note $\sum_R n_R = N$ is the total number of fluid particles, so

$$
\prod_R \xi^{n_R} = \xi^N = e^{\beta\mu N}\, \lambda_T^{-Nd}\, a^{Nd}\,. \tag{7.37}
$$

Thus, we can write a lattice Hamiltonian,

$$
\begin{aligned}
\hat H &= \tfrac{1}{2} \sum_{R \neq R'} V_{RR'}\, n_R\, n_{R'} - k_B T \ln \xi \sum_R n_R \\
&= -\tfrac{1}{2} \sum_{R \neq R'} J_{RR'}\, \sigma_R\, \sigma_{R'} - H \sum_R \sigma_R + E_0\,,
\end{aligned} \tag{7.38}
$$

where $\sigma_R \equiv 2 n_R - 1$ is a spin variable taking the possible values $\{-1, +1\}$, and

$$
\begin{aligned}
J_{RR'} &= -\tfrac{1}{4} V_{RR'} \\
H &= \tfrac{1}{2} k_B T \ln \xi - \tfrac{1}{4} {\sum_{R'}}' V_{RR'}\,,
\end{aligned} \tag{7.39}
$$

where the prime on the sum indicates that $R' = R$ is to be excluded. For the Lennard-Jones system, $V_{RR'} = u(R - R') < 0$ is due to the attractive tail of the potential, hence $J_{RR'}$ is positive, which prefers alignment of the spins $\sigma_R$ and $\sigma_{R'}$. This interaction is therefore ferromagnetic. The spin Hamiltonian in eqn. 7.38 is known as the Ising model.
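The substitution $n_R = \tfrac{1}{2}(1 + \sigma_R)$ behind eqns. 7.38 and 7.39 is easy to verify by brute force. The sketch below uses a hypothetical example, a periodic ring of six cells with nearest-neighbor attraction $V_{RR'} = -u_0$, and compares the two forms of $\hat H$ for every configuration. The text does not spell out $E_0$; the value used here, $E_0 = \tfrac{1}{8} \sum_{R \neq R'} V_{RR'} - \tfrac{1}{2} N_S k_B T \ln \xi$, follows from the same substitution and is our own bookkeeping.

```python
import itertools

N_S = 6        # number of cells on a periodic ring (hypothetical small example)
u0 = 1.0       # depth of the nearest-neighbor attraction, V = -u0 (hypothetical)
kT = 0.7       # k_B T in the same energy units (hypothetical)
ln_xi = 0.3    # ln(xi); the identity holds for any value

def V(R, Rp):
    """Pair interaction V_{RR'}: -u0 between nearest-neighbor cells on the ring, else 0."""
    return -u0 if (R - Rp) % N_S in (1, N_S - 1) else 0.0

pairs = [(R, Rp) for R in range(N_S) for Rp in range(N_S) if R != Rp]
J = {(R, Rp): -0.25 * V(R, Rp) for (R, Rp) in pairs}                      # eqn. 7.39
H = 0.5 * kT * ln_xi - 0.25 * sum(V(0, Rp) for Rp in range(1, N_S))        # eqn. 7.39, same for every R
E0 = 0.125 * sum(V(R, Rp) for (R, Rp) in pairs) - 0.5 * N_S * kT * ln_xi   # our own bookkeeping

def H_lattice_gas(n):
    """First line of eqn. 7.38, in terms of occupancies n_R in {0, 1}."""
    return 0.5 * sum(V(R, Rp) * n[R] * n[Rp] for (R, Rp) in pairs) - kT * ln_xi * sum(n)

def H_ising(sigma):
    """Second line of eqn. 7.38, in terms of spins sigma_R in {-1, +1}."""
    return -0.5 * sum(J[R, Rp] * sigma[R] * sigma[Rp] for (R, Rp) in pairs) - H * sum(sigma) + E0

max_diff = max(
    abs(H_lattice_gas(n) - H_ising([2 * nR - 1 for nR in n]))
    for n in itertools.product((0, 1), repeat=N_S)
)
print("J between neighbors:", J[0, 1], " H:", H, " largest discrepancy:", max_diff)
```

With an attractive $V_{RR'}$ the printed $J$ is positive, i.e. ferromagnetic, as stated above.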

7.3.2 Phase diagrams and critical exponents

The physics of the liquid-gas transition in fact has a great deal in common with that of the transition between a magnetized and unmagnetized state of a magnetic system. The correspondences are⁵

$$
p \longleftrightarrow H\,, \qquad v \longleftrightarrow m\,,
$$

where $m$ is the magnetization density, defined here to be the total magnetization $M$ divided by the number of lattice sites $N_S$:⁶

$$
m = \frac{M}{N_S} = \frac{1}{N_S} \sum_R \langle \sigma_R \rangle\,. \tag{7.40}
$$

Sketches of the phase diagrams are reproduced in fig. 7.7. Of particular interest is the critical point, which occurs at $(T_c, p_c)$ in the fluid system and $(T_c, H_c)$ in the magnetic system, with $H_c = 0$ by symmetry.
In the fluid, the coexistence curve in the $(p, T)$ plane separates high density (liquid) and low density (vapor) phases. The specific volume $v$ (or the density $n = v^{-1}$) jumps discontinuously across the coexistence curve. In the magnet, the coexistence curve in the $(H, T)$ plane separates positive magnetization and negative magnetization phases. The magnetization density $m$ jumps discontinuously across the coexistence curve. For $T > T_c$, the latter system is a paramagnet, in which the magnetization varies smoothly as a function of $H$. This behavior is most apparent in the bottom panel of the figure, where $v(p)$ and $m(H)$ curves are shown.
For $T < T_c$ and a specific volume between those of the coexisting liquid and vapor, the fluid exists in a two phase region, which is spatially inhomogeneous, supporting local regions of high and low density. There is no stable homogeneous thermodynamic phase for $(T, v)$ within the two phase region shown in the middle left panel. Similarly, for the magnet, there is no stable homogeneous thermodynamic phase at fixed temperature $T$ and magnetization $m$ if $(T, m)$ lies within the coexistence region. Rather, the system consists of blobs where the spin is predominantly up, and blobs where the spin is predominantly down.
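Note also the analogy between the isothermal compressibility $\kappa_T$ and the isothermal susceptibility $\chi_T$:

$$
\begin{aligned}
\kappa_T &= -\frac{1}{v} \Big(\frac{\partial v}{\partial p}\Big)_T\,, &\qquad \kappa_T(T_c, p_c) &= \infty \\
\chi_T &= \Big(\frac{\partial m}{\partial H}\Big)_T\,, &\qquad \chi_T(T_c, H_c) &= \infty\,.
\end{aligned}
$$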

The ‘order parameter’ for a second order phase transition is a quantity which vanishes in the disordered phase and is finite in the ordered phase. For the fluid, the order parameter can be chosen to be $\Psi \propto (v_{\rm vap} - v_{\rm liq})$, the difference in the specific volumes of the vapor and liquid phases. In the vicinity of the critical point, the system exhibits power law behavior in many physical quantities, viz.

$$
\begin{aligned}
m(T, H_c) &\sim (T_c - T)_+^\beta \\
\chi(T, H_c) &\sim |T - T_c|^{-\gamma} \\
C_M(T, H_c) &\sim |T - T_c|^{-\alpha} \\
m(T_c, H) &\sim \pm |H|^{1/\delta}\,.
\end{aligned} \tag{7.41}
$$

⁵ One could equally well identify the second correspondence as $n \longleftrightarrow m$ between density (rather than specific volume) and magnetization. One might object that $H$ is more properly analogous to $\mu$. However, since $\mu = \mu(p, T)$ it can equally be regarded as analogous to $p$. Note also that $\beta p = z \lambda_T^{-d}$ for the ideal gas, in which case $\xi = z (a/\lambda_T)^d$ is proportional to $p$.
⁶ Note the distinction between the number of lattice sites $N_S$ and the number of occupied cells $N$. According to our definitions, $N = \tfrac{1}{2}(M + N_S)$.

Figure 7.7: Comparison of the liquid-gas phase diagram with that of the Ising ferromagnet.

The quantities $\alpha$, $\beta$, $\gamma$, and $\delta$ are the critical exponents associated with the transition. These exponents satisfy certain equalities, such as the Rushbrooke and Griffiths relations and hyperscaling,⁷

$$
\begin{aligned}
\alpha + 2\beta + \gamma &= 2 && \text{(Rushbrooke)} \\
\beta + \gamma &= \beta\delta && \text{(Griffiths)} \\
2 - \alpha &= d\nu && \text{(hyperscaling)}\,.
\end{aligned} \tag{7.42}
$$

Originally such relations were derived as inequalities, and only after the advent of scaling and renormalization group theories was it realized that they hold as equalities. We shall have much more to say about critical behavior later on, when we discuss scaling and renormalization.

7.3.3 Gibbs-Duhem relation for magnetic systems

Homogeneity of $E(S, M, N_S)$ means $E = TS + HM + \mu N_S$, and, after invoking the First Law $dE = T\,dS + H\,dM + \mu\,dN_S$, we have

$$
S\,dT + M\,dH + N_S\,d\mu = 0\,. \tag{7.43}
$$

Now consider two magnetic phases in coexistence. We must have $d\mu_1 = d\mu_2$, hence

$$
d\mu_1 = -s_1\,dT - m_1\,dH = -s_2\,dT - m_2\,dH = d\mu_2\,, \tag{7.44}
$$

where $m = M/N_S$ is the magnetization per site and $s = S/N_S$ is the specific entropy. Thus, we obtain the Clapeyron equation for magnetic systems,

$$
\Big(\frac{dH}{dT}\Big)_{\rm coex} = -\frac{s_1 - s_2}{m_1 - m_2}\,. \tag{7.45}
$$

Thus, if $m_1 \neq m_2$ and $(dH/dT)_{\rm coex} = 0$, then we must have $s_1 = s_2$, which says that there is no latent heat associated with the transition. This absence of latent heat is a consequence of the symmetry which guarantees that $F(T, H, N_S) = F(T, -H, N_S)$: this symmetry pins the coexistence curve to $H = 0$ for all $T < T_c$, so that $(dH/dT)_{\rm coex} = 0$.

7.3.4 Order-disorder transitions

Another application of the Ising model lies in the theory of order-disorder transitions in alloys. Examples include Cu₃Au, CuZn, and other compounds. In CuZn, the Cu and Zn atoms occupy sites of a body-centered cubic (BCC) lattice, forming an alloy known as β-brass. Below $T_c \simeq 740$ K, the atoms are ordered, with the Cu preferentially occupying one simple cubic sublattice and the Zn preferentially occupying the other.
The energy is a sum of pairwise interactions, with a given link contributing $\varepsilon_{AA}$, $\varepsilon_{BB}$, or $\varepsilon_{AB}$, depending on whether it is an A-A, B-B, or A-B/B-A link. Here A and B represent Cu and Zn, respectively. Thus, we can write the energy of the link $\langle ij \rangle$ as

$$
E_{ij} = \varepsilon_{AA}\, P_i^A P_j^A + \varepsilon_{BB}\, P_i^B P_j^B + \varepsilon_{AB}\, \big( P_i^A P_j^B + P_i^B P_j^A \big)\,, \tag{7.46}
$$

⁷ In the third of the exponent equalities in eqn. 7.42, $d$ is the dimension of space and $\nu$ is the correlation length exponent.

Figure 7.8: Order-disorder transition on the square lattice. Below $T = T_c$, order develops spontaneously on the two $\sqrt{2} \times \sqrt{2}$ sublattices. There is perfect sublattice order at $T = 0$ (left panel).

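where

$$
P_i^A = \tfrac{1}{2}(1 + \sigma_i) = \begin{cases} 1 & \text{if site } i \text{ contains Cu} \\ 0 & \text{if site } i \text{ contains Zn} \end{cases}
$$

$$
P_i^B = \tfrac{1}{2}(1 - \sigma_i) = \begin{cases} 1 & \text{if site } i \text{ contains Zn} \\ 0 & \text{if site } i \text{ contains Cu}\,. \end{cases}
$$

The Hamiltonian is then

$$
\begin{aligned}
\hat H &= \sum_{\langle ij \rangle} E_{ij} \\
&= \sum_{\langle ij \rangle} \Big[ \tfrac{1}{4} \big(\varepsilon_{AA} + \varepsilon_{BB} - 2\varepsilon_{AB}\big)\, \sigma_i \sigma_j + \tfrac{1}{4} \big(\varepsilon_{AA} - \varepsilon_{BB}\big) (\sigma_i + \sigma_j) + \tfrac{1}{4} \big(\varepsilon_{AA} + \varepsilon_{BB} + 2\varepsilon_{AB}\big) \Big] \\
&= -J \sum_{\langle ij \rangle} \sigma_i \sigma_j - H \sum_i \sigma_i + E_0\,,
\end{aligned} \tag{7.47}
$$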

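where the exchange constant $J$ and the magnetic field $H$ are given by

$$
\begin{aligned}
J &= \tfrac{1}{4} \big(2\varepsilon_{AB} - \varepsilon_{AA} - \varepsilon_{BB}\big) \\
H &= \tfrac{1}{4}\, z \big(\varepsilon_{BB} - \varepsilon_{AA}\big)\,,
\end{aligned} \tag{7.48}
$$

and $E_0 = \tfrac{1}{8} N z \big(\varepsilon_{AA} + \varepsilon_{BB} + 2\varepsilon_{AB}\big)$, where $N$ is the total number of lattice sites and $z = 8$ is the lattice coordination number, which is the number of nearest neighbors of any given site. (The factor of $z$ in $H$ arises because each site belongs to $z$ links, so that $\sum_{\langle ij \rangle} (\sigma_i + \sigma_j) = z \sum_i \sigma_i$.)
Note that

$$
\begin{aligned}
2\varepsilon_{AB} > \varepsilon_{AA} + \varepsilon_{BB} \quad &\Longrightarrow \quad J > 0 \quad \text{(ferromagnetic)} \\
2\varepsilon_{AB} < \varepsilon_{AA} + \varepsilon_{BB} \quad &\Longrightarrow \quad J < 0 \quad \text{(antiferromagnetic)}\,.
\end{aligned} \tag{7.49}
$$

The antiferromagnetic case is depicted in fig. 7.8.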
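The conversion from eqn. 7.46 to eqn. 7.47 can be checked by brute force over all configurations of a small lattice. One point worth making explicit: each site belongs to $z$ links, so $\sum_{\langle ij \rangle} (\sigma_i + \sigma_j) = z \sum_i \sigma_i$ and the uniform field must be $H = \tfrac{1}{4} z (\varepsilon_{BB} - \varepsilon_{AA})$. The sketch below assumes hypothetical bond energies and a 3 × 3 periodic square lattice ($z = 4$) purely to keep the enumeration small; the BCC case with $z = 8$ works the same way.

```python
import itertools

L = 3                                  # 3 x 3 periodic square lattice (hypothetical, kept small)
z = 4                                  # coordination number of the square lattice
e_AA, e_BB, e_AB = -1.0, -0.4, -1.5    # hypothetical bond energies, A = Cu, B = Zn

sites = [(x, y) for x in range(L) for y in range(L)]
# Every link <ij> counted once: each site is joined to its +x and +y neighbors.
links = [((x, y), ((x + 1) % L, y)) for (x, y) in sites] + \
        [((x, y), (x, (y + 1) % L)) for (x, y) in sites]

J = 0.25 * (2 * e_AB - e_AA - e_BB)
H = 0.25 * z * (e_BB - e_AA)           # note the factor z
E0 = 0.125 * len(sites) * z * (e_AA + e_BB + 2 * e_AB)

def E_link(si, sj):
    """Eqn. 7.46 with P^A = (1 + sigma)/2 and P^B = (1 - sigma)/2."""
    PA_i, PB_i = (1 + si) / 2, (1 - si) / 2
    PA_j, PB_j = (1 + sj) / 2, (1 - sj) / 2
    return e_AA * PA_i * PA_j + e_BB * PB_i * PB_j + e_AB * (PA_i * PB_j + PB_i * PA_j)

max_diff = 0.0
for spins in itertools.product((-1, 1), repeat=len(sites)):
    s = dict(zip(sites, spins))
    direct = sum(E_link(s[i], s[j]) for i, j in links)
    ising = -J * sum(s[i] * s[j] for i, j in links) - H * sum(spins) + E0
    max_diff = max(max_diff, abs(direct - ising))
print("J =", J, " H =", H, " largest discrepancy:", max_diff)
```

With these bond energies $2\varepsilon_{AB} < \varepsilon_{AA} + \varepsilon_{BB}$, so $J < 0$ and the model is antiferromagnetic, the β-brass situation.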

7.4 Mean Field Theory

Consider the Ising model Hamiltonian,

$$
\hat H = -J \sum_{\langle ij \rangle} \sigma_i \sigma_j - H \sum_i \sigma_i\,, \tag{7.50}
$$

where the first sum on the RHS is over all links of the lattice. Each spin can be either ‘up’ ($\sigma = +1$) or ‘down’ ($\sigma = -1$). We further assume that the spins are located on a Bravais lattice⁸ and that the coupling $J_{ij} = J\big(|R_i - R_j|\big)$, where $R_i$ is the position of the $i$th spin.
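On each site $i$ we decompose $\sigma_i$ into a contribution from its thermodynamic average and a fluctuation term, i.e.

$$
\sigma_i = \langle \sigma_i \rangle + \delta\sigma_i\,. \tag{7.51}
$$

We will write $\langle \sigma_i \rangle \equiv m$, the local magnetization (dimensionless), and assume that $m$ is independent of position $i$. Then

$$
\begin{aligned}
\sigma_i \sigma_j &= (m + \delta\sigma_i)(m + \delta\sigma_j) \\
&= m^2 + m\,(\delta\sigma_i + \delta\sigma_j) + \delta\sigma_i\, \delta\sigma_j \\
&= -m^2 + m\,(\sigma_i + \sigma_j) + \delta\sigma_i\, \delta\sigma_j\,.
\end{aligned} \tag{7.52}
$$

The last term on the RHS of the last two lines above is quadratic in the fluctuations, and we assume this to be negligibly small. Thus, we obtain the mean field Hamiltonian

$$
\hat H_{\rm MF} = \tfrac{1}{2} N z J m^2 - \big(H + zJm\big) \sum_i \sigma_i\,, \tag{7.53}
$$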

where $N$ is the total number of lattice sites. The first term is a constant, although the value of $m$ is yet to be determined. The Boltzmann weights are then completely determined by the second term, which is just what we would write down for a Hamiltonian of noninteracting spins in an effective ‘mean field’

$$
H_{\rm eff} = H + zJm\,. \tag{7.54}
$$

In other words, $H_{\rm eff} = H_{\rm ext} + H_{\rm int}$, where the external field is the applied field $H_{\rm ext} = H$, and the ‘internal field’ is $H_{\rm int} = zJm$. The internal field accounts for the interaction with the average values of all other spins coupled to a spin at a given site, hence it is often called the ‘mean field’. Since the spins are noninteracting, we have

$$
m = \frac{e^{\beta H_{\rm eff}} - e^{-\beta H_{\rm eff}}}{e^{\beta H_{\rm eff}} + e^{-\beta H_{\rm eff}}} = \tanh\!\Big(\frac{H + zJm}{k_B T}\Big)\,. \tag{7.55}
$$

It is a simple matter to solve for the free energy, given the noninteracting Hamiltonian $\hat H_{\rm MF}$. The partition function is

$$
Z = {\rm Tr}\, e^{-\beta \hat H_{\rm MF}} = e^{-\frac{1}{2} \beta N z J m^2} \Big( \sum_{\sigma = \pm 1} e^{\beta (H + zJm) \sigma} \Big)^{\! N} = e^{-\beta F}\,. \tag{7.56}
$$
⁸ A Bravais lattice is one in which any site is equivalent to any other site through an appropriate discrete translation. Examples of Bravais lattices include the linear chain, square, triangular, simple cubic, face-centered cubic, etc. lattices. The honeycomb lattice is not a Bravais lattice, because there are two sets of inequivalent sites – those in the center of a Y and those in the center of an upside down Y.
7.4. MEAN FIELD THEORY 377

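We now define dimensionless variables:

$$
f \equiv \frac{F}{NzJ}\,, \qquad \theta \equiv \frac{k_B T}{zJ}\,, \qquad h \equiv \frac{H}{zJ}\,, \tag{7.57}
$$

and obtain the dimensionless free energy

$$
f(m, h, \theta) = \tfrac{1}{2} m^2 - \theta \ln\!\Big( e^{(m+h)/\theta} + e^{-(m+h)/\theta} \Big)\,. \tag{7.58}
$$

Differentiating with respect to $m$ gives the mean field equation,

$$
m = \tanh\!\Big(\frac{m + h}{\theta}\Big)\,, \tag{7.59}
$$

which is equivalent to the self-consistency requirement, $m = \langle \sigma_i \rangle$.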
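Eqn. 7.59 is also easy to solve numerically. A minimal sketch (our own approach, not from the text): for $h \ge 0$ the equilibrium magnetization is the largest root of $g(m) = \tanh\big((m + h)/\theta\big) - m$ on $[0, 1]$. Because tanh is concave for positive argument, $g$ is concave there, positive below that root and negative above it, so plain bisection finds it; negative $h$ then follows from the symmetry $m(\theta, -h) = -m(\theta, h)$. The function name and iteration count are arbitrary.

```python
import math

def mean_field_m(theta, h, iters=200):
    """Solve m = tanh((m + h)/theta), eqn. 7.59, returning the root with the sign of h.

    For h >= 0 this is the largest root of g(m) = tanh((m + h)/theta) - m on [0, 1];
    g is concave there, positive below that root and negative above it, so bisection
    converges to it (to m = 0 when h = 0 and theta >= 1).
    """
    if h < 0:
        return -mean_field_m(theta, -h, iters)
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if math.tanh((mid + h) / theta) - mid > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for theta in (0.5, 0.9, 0.99, 1.5):
    print(f"theta = {theta}: m = {mean_field_m(theta, 0.0):.6f}")
```

For $\theta$ just below 1 the output should approach $\sqrt{3(1 - \theta)}$ of eqn. 7.63, and at $\theta = 1$ it should follow $(3h)^{1/3}$ of eqn. 7.72.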

7.4.1 h=0

When $h = 0$ the mean field equation becomes

$$
m = \tanh\!\Big(\frac{m}{\theta}\Big)\,. \tag{7.60}
$$

This nonlinear equation can be solved graphically, as in the top panel of fig. 7.9. The RHS is a tanh function which gets steeper with decreasing $\theta$. If, at $m = 0$, the slope of $\tanh(m/\theta)$ is smaller than unity, then the curve $y = \tanh(m/\theta)$ will intersect $y = m$ only at $m = 0$. However, if the slope is larger than unity, there will be three such intersections. Since the slope is $1/\theta$, we identify $\theta_c = 1$ as the mean field transition temperature.
In the low temperature phase $\theta < 1$, there are three solutions to the mean field equations. One solution is always at $m = 0$. The other two solutions must be related by the $m \leftrightarrow -m$ symmetry of the free energy (when $h = 0$). The exact free energies are plotted in the bottom panel of fig. 7.9, but it is possible to make analytical progress by assuming $m$ is small and Taylor expanding the free energy $f(m, \theta)$ in powers of $m$:

$$
\begin{aligned}
f(m, \theta) &= \tfrac{1}{2} m^2 - \theta \ln 2 - \theta \ln \cosh\!\Big(\frac{m}{\theta}\Big) \\
&= -\theta \ln 2 + \tfrac{1}{2} \big(1 - \theta^{-1}\big)\, m^2 + \frac{m^4}{12\, \theta^3} - \frac{m^6}{45\, \theta^5} + \ldots\,.
\end{aligned} \tag{7.61}
$$

Note that the sign of the quadratic term is positive for $\theta > 1$ and negative for $\theta < 1$. Thus, the shape of the free energy $f(m, \theta)$ as a function of $m$ qualitatively changes at this point, $\theta_c = 1$, the mean field transition temperature, also known as the critical temperature.
For $\theta > \theta_c$, the free energy $f(m, \theta)$ has a single minimum at $m = 0$. Below $\theta_c$, the curvature at $m = 0$ reverses, and $m = 0$ becomes a local maximum. There are then two equivalent minima symmetrically displaced on either side of $m = 0$. Differentiating the quartic truncation of eqn. 7.61 with respect to $m$, we find these local minima. For $\theta < \theta_c$, the local minima are found at

$$
m^2 = 3\theta^2 (1 - \theta) = 3(1 - \theta) + \mathcal{O}\big((1 - \theta)^2\big)\,. \tag{7.62}
$$

Figure 7.9: Results for $h = 0$. Upper panels: graphical solution to self-consistency equation $m = \tanh(m/\theta)$ at temperatures $\theta = 0.65$ (blue) and $\theta = 1.5$ (dark red). Lower panel: mean field free energy, with energy shifted by $\theta \ln 2$ so that $f(m = 0, \theta) = 0$.

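Thus, we find for $|\theta - 1| \ll 1$,

$$
m(\theta, h = 0) = \pm \sqrt{3}\, \big(1 - \theta\big)_+^{1/2}\,, \tag{7.63}
$$

where the $+$ subscript indicates that this solution is only for $1 - \theta > 0$. For $\theta > 1$ the only solution is $m = 0$. The exponent with which $m(\theta)$ vanishes as $\theta \to \theta_c^-$ is denoted $\beta$, i.e. $m(\theta, h = 0) \propto (\theta_c - \theta)_+^\beta$. Mean field theory thus gives $\beta = \tfrac{1}{2}$.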

7.4.2 Specific heat

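We can now expand the free energy $f(\theta, h = 0)$. We find

$$
f(\theta, h = 0) = \begin{cases} -\theta \ln 2 & \text{if } \theta > \theta_c \\ -\theta \ln 2 - \tfrac{3}{4} (1 - \theta)^2 + \mathcal{O}\big((1 - \theta)^3\big) & \text{if } \theta < \theta_c\,. \end{cases} \tag{7.64}
$$

Thus, if we compute the heat capacity, we find in the vicinity of $\theta = \theta_c$

$$
c_V = -\theta\, \frac{\partial^2 f}{\partial \theta^2} = \begin{cases} 0 & \text{if } \theta > \theta_c \\ \tfrac{3}{2} & \text{if } \theta < \theta_c\,. \end{cases} \tag{7.65}
$$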

Figure 7.10: Results for $h = 0.1$. Upper panels: graphical solution to self-consistency equation $m = \tanh\big((m + h)/\theta\big)$ at temperatures $\theta = 0.65$ (blue), $\theta = 0.9$ (dark green), and $\theta = 1.5$ (dark red). Lower panel: mean field free energy, with energy shifted by $\theta \ln 2$ so that $f(m = 0, \theta) = 0$.

Thus, the specific heat is discontinuous at $\theta = \theta_c$. We emphasize that this is only valid near $\theta = \theta_c = 1$. The general result valid for all $\theta$ is⁹

$$
c_V(\theta) = \frac{1}{\theta} \cdot \frac{m^2(\theta) - m^4(\theta)}{\theta - 1 + m^2(\theta)}\,. \tag{7.66}
$$

With this expression one can check both limits $\theta \to 0$ and $\theta \to \theta_c$. As $\theta \to 0$ the magnetization saturates and one has $m^2(\theta) \simeq 1 - 4\, e^{-2/\theta}$. The numerator then vanishes as $e^{-2/\theta}$, which overwhelms the denominator that itself vanishes as $\theta^2$. As a result, $c_V(\theta \to 0) = 0$, as expected. As $\theta \to 1$, invoking $m^2 \simeq 3(1 - \theta)$ we recover $c_V(\theta_c^-) = \tfrac{3}{2}$.
In the theory of critical phenomena, $c_V(\theta) \propto |\theta - \theta_c|^{-\alpha}$ as $\theta \to \theta_c$. We see that mean field theory yields $\alpha = 0$.
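Eqn. 7.66 can be cross-checked against a direct numerical second derivative of the equilibrium free energy $f(\theta) = \tfrac{1}{2}m^2 - \theta \ln\big[2\cosh(m/\theta)\big]$, i.e. eqn. 7.58 at $h = 0$. The sketch below does this with a centered finite difference; the step size and sample temperatures are arbitrary choices, and the bisection for $m(\theta)$ relies on $\tanh(m/\theta) - m$ being concave on $[0, 1]$.

```python
import math

def m_eq(theta, iters=200):
    """Largest root of m = tanh(m/theta) on [0, 1] (eqn. 7.60); it is 0 for theta >= 1."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if math.tanh(mid / theta) - mid > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def f_eq(theta):
    """Eqn. 7.58 at h = 0, evaluated at the equilibrium magnetization."""
    m = m_eq(theta)
    x = abs(m / theta)
    # ln(2 cosh x) = x + ln(1 + e^{-2x}), which avoids overflow at small theta
    return 0.5 * m * m - theta * (x + math.log1p(math.exp(-2.0 * x)))

def cv_formula(theta):
    """Eqn. 7.66, intended for 0 < theta < 1."""
    m2 = m_eq(theta) ** 2
    return (m2 - m2 * m2) / (theta * (theta - 1.0 + m2))

def cv_numeric(theta, d=1e-3):
    """c_V = -theta d^2 f / d theta^2 from a centered second difference of f_eq."""
    return -theta * (f_eq(theta + d) - 2.0 * f_eq(theta) + f_eq(theta - d)) / d**2

for theta in (0.3, 0.6, 0.9, 0.99):
    print(f"theta = {theta}: eqn. 7.66 gives {cv_formula(theta):.6f}, "
          f"finite difference gives {cv_numeric(theta):.6f}")
```

Both columns should agree to several digits, approach $\tfrac{3}{2}$ as $\theta \to 1^-$, and become exponentially small as $\theta \to 0$.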

⁹ To obtain this result, one writes $f = f\big(\theta, m(\theta)\big)$ and then differentiates twice with respect to $\theta$, using the chain rule. Along the way, any naked (i.e. undifferentiated) term proportional to $\partial f / \partial m$ may be dropped, since this vanishes at any $\theta$ by the mean field equation.

7.4.3 h ≠ 0

Consider without loss of generality the case $h > 0$. The minimum of the free energy $f(m, h, \theta)$ now lies at $m > 0$ for any $\theta$. At low temperatures, the double well structure we found in the $h = 0$ case is tilted so that the right well lies lower in energy than the left well. This is depicted in fig. 7.10. As the temperature is raised, the local minimum at $m < 0$ vanishes, annihilating with the local maximum in a saddle-node bifurcation. To find where this happens, one sets $\partial f / \partial m = 0$ and $\partial^2 f / \partial m^2 = 0$ simultaneously, resulting in

$$
h^*(\theta) = \sqrt{1 - \theta} - \frac{\theta}{2} \ln\!\bigg( \frac{1 + \sqrt{1 - \theta}}{1 - \sqrt{1 - \theta}} \bigg)\,. \tag{7.67}
$$

The solutions lie at $h = \pm h^*(\theta)$. For $\theta < \theta_c = 1$ and $-h^*(\theta) < h < +h^*(\theta)$, there are three solutions to the mean field equation. Equivalently we could in principle invert the above expression to obtain $\theta^*(h)$. For $\theta > \theta^*(h)$, there is only a single global minimum in the free energy $f(m)$ and there is no local minimum. Note $\theta^*(h = 0) = 1$.
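Eqn. 7.67 can be checked directly. At $h = +h^*(\theta)$ the metastable minimum and the maximum merge at $m = -(1 - \theta)^{1/2}$, where both $\partial f/\partial m = m - \tanh\big((m + h)/\theta\big)$ and $\partial^2 f/\partial m^2 = 1 - \theta^{-1}\,{\rm sech}^2\big((m + h)/\theta\big)$ should vanish. The sketch below verifies this at $\theta = 0.5$ and counts sign changes of $\partial f/\partial m$ on a grid, expecting three solutions for $|h| < h^*(\theta)$ and one for $|h| > h^*(\theta)$. Grid size and test values are arbitrary. It also compares with $h^* \approx \tfrac{2}{3}(1 - \theta)^{3/2}$, which is our own expansion of eqn. 7.67 as $\theta \to 1^-$.

```python
import math

def h_star(theta):
    """Spinodal field h*(theta) of eqn. 7.67, for 0 < theta < 1."""
    s = math.sqrt(1.0 - theta)
    return s - 0.5 * theta * math.log((1.0 + s) / (1.0 - s))

def df_dm(m, h, theta):
    """First derivative of eqn. 7.58 with respect to m."""
    return m - math.tanh((m + h) / theta)

def d2f_dm2(m, h, theta):
    """Second derivative of eqn. 7.58 with respect to m."""
    return 1.0 - 1.0 / (theta * math.cosh((m + h) / theta) ** 2)

def count_solutions(h, theta, n=20001):
    """Number of sign changes of df/dm on a uniform grid over [-1, 1]."""
    ms = [-1.0 + 2.0 * k / (n - 1) for k in range(n)]
    vals = [df_dm(m, h, theta) for m in ms]
    return sum(1 for a, b in zip(vals, vals[1:]) if (a > 0.0) != (b > 0.0))

theta = 0.5
hs = h_star(theta)
m_merge = -math.sqrt(1.0 - theta)   # where the metastable minimum meets the maximum at h = +h*
print("h* =", hs, " f' =", df_dm(m_merge, hs, theta), " f'' =", d2f_dm2(m_merge, hs, theta))
print("solutions at h = h*/2:", count_solutions(0.5 * hs, theta),
      " at h = 3h*/2:", count_solutions(1.5 * hs, theta))
print("h*(0.9999) =", h_star(0.9999), " vs (2/3)(1 - theta)^(3/2) =", (2.0 / 3.0) * (1e-4) ** 1.5)
```

For $\theta = 0.9$ the same function returns $h^* \approx 0.0215$, the value quoted in fig. 7.11.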
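Assuming $h \ll |\theta - 1| \ll 1$, the mean field solution for $m(\theta, h)$ will also be small, and we expand the free energy in $m$, and to linear order in $h$:

$$
\begin{aligned}
f(m, h, \theta) &= -\theta \ln 2 + \tfrac{1}{2} \big(1 - \theta^{-1}\big)\, m^2 + \frac{m^4}{12\, \theta^3} - \frac{hm}{\theta} \\
&= f_0 + \tfrac{1}{2} (\theta - 1)\, m^2 + \tfrac{1}{12}\, m^4 - hm + \ldots\,.
\end{aligned} \tag{7.68}
$$

Setting $\partial f / \partial m = 0$, we obtain

$$
\tfrac{1}{3} m^3 + (\theta - 1) \cdot m - h = 0\,. \tag{7.69}
$$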
If $\theta > 1$ then we have a solution $m = h/(\theta - 1)$. The $m^3$ term can be ignored because it is higher order in $h$, and we have assumed $h \ll |\theta - 1| \ll 1$. This is known as the Curie-Weiss law¹⁰. The magnetic susceptibility behaves as

$$
\chi(\theta) = \frac{\partial m}{\partial h} = \frac{1}{\theta - 1} \propto |\theta - 1|^{-\gamma}\,, \tag{7.70}
$$

where the susceptibility critical exponent $\gamma$ is $\gamma = 1$. If $\theta < 1$ then while there is still a solution at $m = h/(\theta - 1)$, it lies at a local maximum of the free energy, as shown in fig. 7.10. The minimum of the free energy occurs close to the $h = 0$ solution $m = m_0(\theta) \equiv \sqrt{3(1 - \theta)}$. Writing $m = m_0 + \delta m$ and using $\partial^2 f / \partial m^2 = (\theta - 1) + m_0^2 = 2(1 - \theta)$ at $m_0$, we find $\delta m$ to linear order in $h$ as $\delta m(\theta, h) = h / \big[2(1 - \theta)\big]$. Thus,

$$
m(\theta, h) = \sqrt{3}\, (1 - \theta)^{1/2} + \frac{h}{2(1 - \theta)}\,. \tag{7.71}
$$

¹⁰ Pierre Curie was a pioneer in the fields of crystallography, magnetism, and radiation physics. In 1880, Pierre and his older brother Jacques discovered piezoelectricity. He was 21 years old at the time. It was in 1895 that Pierre made the first systematic studies of the effects of temperature on magnetic materials, and he formulated what is known as Curie’s Law, $\chi = C/T$, where $C$ is a constant. Curie married Marie Sklodowska in the same year. Their research turned toward radiation, recently discovered by Becquerel and Röntgen. In 1898, Pierre and Marie Curie discovered radium. They shared the 1903 Nobel Prize in Physics with Becquerel. Marie went on to win the 1911 Nobel Prize in Chemistry and was the first person ever awarded two Nobel Prizes. Their daughter Irène Joliot-Curie shared the 1935 Prize in Chemistry (with her husband), also for work on radioactivity. Pierre Curie met an untimely and unfortunate end in the Spring of 1906. Walking across the Place Dauphine, he slipped and fell under a heavy horse-drawn wagon carrying military uniforms. His skull was crushed by one of the wagon wheels, killing him instantly. Later on that year, Pierre-Ernest Weiss proposed a modification of Curie’s Law to account for ferromagnetism. This became known as the Curie-Weiss law, $\chi = C/(T - T_c)$.

$$
\begin{array}{c|c|c|c|c}
\text{Exponent} & \text{MFT} & \text{2D Ising (exact)} & \text{3D Ising (numerical)} & \text{CO}_2\ \text{(expt.)} \\ \hline
\alpha & 0 & 0 & 0.125 & \lesssim 0.1 \\
\beta & 1/2 & 1/8 & 0.313 & 0.35 \\
\gamma & 1 & 7/4 & 1.25 & 1.26 \\
\delta & 3 & 15 & 5 & 4.2
\end{array}
$$

Table 7.2: Critical exponents from mean field theory as compared with exact results for the two-dimensional Ising model, numerical results for the three-dimensional Ising model, and experiments on the liquid-gas transition in CO₂. Source: H. E. Stanley, Phase Transitions and Critical Phenomena.

Once again, we find that $\chi(\theta)$ diverges as $|\theta - 1|^{-\gamma}$ with $\gamma = 1$. The exponent $\gamma$ on either side of the transition is the same.
Finally, we can set $\theta = \theta_c$ and examine $m(h)$. We find, from eqn. 7.69,

$$
m(\theta = \theta_c, h) = (3h)^{1/3} \propto h^{1/\delta}\,, \tag{7.72}
$$

where $\delta$ is a new critical exponent. Mean field theory gives $\delta = 3$. Note that at $\theta = \theta_c = 1$ we have $m = \tanh(m + h)$, and inverting we find

$$
h(m, \theta = \theta_c) = \tfrac{1}{2} \ln\!\Big(\frac{1 + m}{1 - m}\Big) - m = \frac{m^3}{3} + \frac{m^5}{5} + \ldots\,, \tag{7.73}
$$

which is consistent with what we just found for $m(h, \theta = \theta_c)$.
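Although $\gamma = 1$ on both sides, eqns. 7.70 and 7.71 give different amplitudes: $\chi \simeq 1/(\theta - 1)$ above $\theta_c$ but $\chi \simeq 1/[2(1 - \theta)]$ below, so at equal $|\theta - 1|$ the susceptibility above $\theta_c$ is about twice as large. The sketch below checks this by finite-differencing the numerical solution of eqn. 7.59 at small positive $h$; the bisection solver is repeated so the block runs on its own, and the field step is an arbitrary choice.

```python
import math

def mean_field_m(theta, h, iters=200):
    """Root of m = tanh((m + h)/theta) with the sign of h (largest root on [0, 1] for h >= 0)."""
    if h < 0:
        return -mean_field_m(theta, -h, iters)
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if math.tanh((mid + h) / theta) - mid > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def susceptibility(theta, dh=1e-8):
    """chi = dm/dh near h = 0+, from a centered difference about the small field h0 = 2 dh."""
    h0 = 2.0 * dh
    return (mean_field_m(theta, h0 + dh) - mean_field_m(theta, h0 - dh)) / (2.0 * dh)

for tau in (1e-1, 1e-2, 1e-3):
    above, below = susceptibility(1.0 + tau), susceptibility(1.0 - tau)
    print(f"|theta - 1| = {tau:g}: chi*tau above = {above * tau:.4f}, "
          f"chi*2tau below = {below * 2 * tau:.4f}, ratio = {above / below:.4f}")
```

The ratio in the last column should tend to 2 as $|\theta - 1| \to 0$, with corrections of order $|\theta - 1|$.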


How well does mean field theory do in describing the phase transition of the Ising model? In table 7.2 we
compare our mean field results for the exponents α, β, γ, and δ with exact values for the two-dimensional
Ising model, numerical work on the three-dimensional Ising model, and experiments on the liquid-gas
transition in CO2 . The first thing to note is that the exponents are dependent on the dimension of space,
and this is something that mean field theory completely misses. In fact, it turns out that the mean field
exponents are exact provided d > du , where du is the upper critical dimension of the theory. For the
Ising model, du = 4, and above four dimensions (which is of course unphysical) the mean field exponents
are in fact exact. We see that all in all the MFT results compare better with the three dimensional
exponent values than with the two-dimensional ones – this makes sense since MFT does better in higher
dimensions. The reason for this is that higher dimensions mean more nearest neighbors, which has the
effect of reducing the relative importance of the fluctuations we neglected to include.
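The exponent equalities of eqn. 7.42 can be checked against Table 7.2 with exact rational arithmetic. Hyperscaling also needs the correlation length exponent $\nu$, which the table does not list; the sketch assumes the standard values $\nu = \tfrac{1}{2}$ (mean field) and $\nu = 1$ (two-dimensional Ising). With these, the mean field exponents satisfy hyperscaling only at $d = 4$, matching the upper critical dimension $d_u = 4$ quoted above.

```python
from fractions import Fraction as Fr

# Exponents from Table 7.2; nu is not in the table and is assumed (standard values).
exponents = {
    "MFT": dict(alpha=Fr(0), beta=Fr(1, 2), gamma=Fr(1), delta=Fr(3), nu=Fr(1, 2)),
    "2D Ising": dict(alpha=Fr(0), beta=Fr(1, 8), gamma=Fr(7, 4), delta=Fr(15), nu=Fr(1)),
}

def rushbrooke(e):
    """Left side of alpha + 2 beta + gamma = 2."""
    return e["alpha"] + 2 * e["beta"] + e["gamma"]

def griffiths_gap(e):
    """beta + gamma - beta delta, which should vanish."""
    return e["beta"] + e["gamma"] - e["beta"] * e["delta"]

def hyperscaling_dimension(e):
    """The dimension d for which 2 - alpha = d nu holds."""
    return (2 - e["alpha"]) / e["nu"]

for name, e in exponents.items():
    print(f"{name}: alpha+2beta+gamma = {rushbrooke(e)}, "
          f"beta+gamma-beta*delta = {griffiths_gap(e)}, hyperscaling d = {hyperscaling_dimension(e)}")
```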

7.4.4 Magnetization dynamics

Dissipative processes drive physical systems to minimum free energy states. We can crudely model the dissipative dynamics of a magnet by writing the phenomenological equation

$$
\frac{dm}{ds} = -\frac{\partial f}{\partial m}\,, \tag{7.74}
$$

Figure 7.11: Dissipative magnetization dynamics $\dot m = -f'(m)$. Bottom panel shows $h^*(\theta)$ from eqn. 7.67. For $(\theta, h)$ within the blue shaded region, the free energy $f(m)$ has a global minimum plus a local minimum and a local maximum. Otherwise $f(m)$ has only a single global minimum. Top panels show an imperfect bifurcation in the magnetization dynamics at $h = 0.0215$, for which $\theta^* = 0.90$. Temperatures shown: $\theta = 0.65$ (blue), $\theta = \theta^*(h) = 0.90$ (green), and $\theta = 1.2$. The rightmost stable fixed point corresponds to the global minimum of the free energy. The bottom of the middle two upper panels shows $h = 0$, where both of the attractive fixed points and the repulsive fixed point coalesce into a single attractive fixed point (supercritical pitchfork bifurcation).

where $s$ is a dimensionless time variable. Under these dynamics, the free energy is never increasing:

$$
\frac{df}{ds} = \frac{\partial f}{\partial m}\, \frac{\partial m}{\partial s} = -\Big(\frac{\partial f}{\partial m}\Big)^2 \le 0\,. \tag{7.75}
$$

Clearly the fixed points of these dynamics, where $\dot m = 0$, are solutions to the mean field equation $\partial f / \partial m = 0$.
The phase flow for the equation $\dot m = -f'(m)$ is shown in fig. 7.11. As we have seen, for any value of $h$ there is a temperature $\theta^*$ below which the free energy $f(m)$ has two local minima and one local maximum. When $h = 0$ the minima are degenerate, but at finite $h$ one of the minima is a global minimum. Thus, for $\theta < \theta^*(h)$ there are three solutions to the mean field equations. In the language of dynamical systems, under the dynamics of eqn. 7.74, minima of $f(m)$ correspond to attractive fixed points and maxima to repulsive fixed points. If $h > 0$, the rightmost of these fixed points corresponds to the global minimum of the free energy. As $\theta$ is increased, this fixed point evolves smoothly. At $\theta = \theta^*$, the (metastable) local minimum and the local maximum coalesce and annihilate in a saddle-node bifurcation. However at $h = 0$ all three fixed points coalesce at $\theta = \theta_c$ and the bifurcation is a supercritical pitchfork. As a function of $\theta$ at finite $h$, the dynamics are said to exhibit an imperfect bifurcation, which is a deformed supercritical pitchfork.

Figure 7.12: Top panel: hysteresis as a function of ramping the dimensionless magnetic field $h$ at $\theta = 0.40$. Dark red arrows below the curve follow evolution of the magnetization on slow increase of $h$. Dark grey arrows above the curve follow evolution of the magnetization on slow decrease of $h$. Bottom panel: solution set for $m(\theta, h)$ as a function of $h$ at temperatures $\theta = 0.40$ (blue), $\theta = \theta_c = 1.0$ (dark green), and $\theta = 1.25$ (red).
The solution set for the mean field equation is simply expressed by inverting the tanh function to obtain
h(θ, m). One readily finds  
θ 1+m
h(θ, m) = ln −m . (7.76)
2 1−m
 
As we see in the bottom panel of fig. 7.12, $m(h)$ becomes multivalued for $h \in \big[-h^*(\theta), +h^*(\theta)\big]$, where $h^*(\theta)$ is given in eqn. 7.67. Now imagine that $\theta < \theta_c$ and we slowly ramp the field $h$ from a large negative value to a large positive value, and then slowly back down to its original value. On the time scale of the magnetization dynamics, we can regard $h(s)$ as a constant. (Remember the time variable is $s$ here.) Thus, $m(s)$ will flow to the nearest stable fixed point. Initially the system starts with $m = -1$ and $h$ large and negative, and there is only one fixed point, at $m^* \approx -1$. As $h$ slowly increases, the fixed point value $m^*$ also slowly increases. As $h$ exceeds $-h^*(\theta)$, a saddle-node bifurcation occurs, and two new fixed points are created at positive $m$, one stable and one unstable. The global minimum of the free energy still lies at the fixed point with $m^* < 0$. However, when $h$ crosses $h = 0$, the global minimum of the free energy lies at the most positive fixed point $m^*$. The dynamics, however, keep the system stuck in what is now a metastable phase. This persists until $h = +h^*(\theta)$, at which point another saddle-node bifurcation occurs, and the attractive fixed point at $m^* < 0$ annihilates with the repulsive fixed point. The dynamics then act quickly to drive $m$ to the only remaining fixed point. This process is depicted in the top panel of fig. 7.12. As one can see from the figure, the system follows a stable fixed point until the fixed point disappears, even though that fixed point may not always correspond to a global minimum of the free energy. The resulting $m(h)$ curve is then not reversible as a function of time, and it possesses a characteristic shape known as a hysteresis loop. Etymologically, the word hysteresis derives from the Greek ὑστέρησις, which means 'lagging behind'. Systems which are hysteretic exhibit a history-dependence to their status, which is not uniquely determined by external conditions. Hysteresis may be exhibited with respect to changes in applied magnetic field, changes in temperature, or changes in other externally determined parameters.

384 CHAPTER 7. MEAN FIELD THEORY OF PHASE TRANSITIONS
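The ramp described above can be reproduced numerically. The sketch below is illustrative only: since eqn. 7.74 is not reproduced here, it assumes the relaxational dynamics $dm/ds = \tanh\big((m+h)/\theta\big) - m$. This flow has the same fixed points as the mean field equation, and its linearized stability at each fixed point has the same sign as that of the gradient flow $dm/ds = -\partial f/\partial m$. It is chosen only because a simple Euler step with `ds < 1` keeps $|m| \le 1$. The jump fields are compared with $h^*(\theta)$, computed here directly from the turning point of eqn. 7.76 at $m = -\sqrt{1-\theta}$. The helper names (`relax`, `hysteresis_loop`), step size, and ramp schedule are all hypothetical choices.

```python
import math
import numpy as np

def h_star(theta):
    """Spinodal field: turning point of h(theta, m) of eqn. 7.76, located at m = -sqrt(1 - theta)."""
    m0 = math.sqrt(1.0 - theta)
    return m0 - theta * math.atanh(m0)

def relax(m, h, theta, ds=0.05, steps=400):
    """Euler-integrate the assumed dynamics dm/ds = tanh((m + h)/theta) - m at fixed h."""
    for _ in range(steps):
        m += ds * (math.tanh((m + h) / theta) - m)
    return m

def hysteresis_loop(theta=0.4, h_max=1.0, n=401):
    """Ramp h slowly from -h_max up to +h_max and back down, recording m after each step."""
    h_up = np.linspace(-h_max, h_max, n)
    h_down = h_up[::-1]
    m = -1.0                              # start polarized along the large negative field
    m_up, m_down = [], []
    for h in h_up:
        m = relax(m, h, theta)
        m_up.append(m)
    for h in h_down:
        m = relax(m, h, theta)
        m_down.append(m)
    return h_up, np.array(m_up), h_down, np.array(m_down)

theta = 0.4
h_up, m_up, h_down, m_down = hysteresis_loop(theta)
jump_up = h_up[np.argmax(m_up > 0)]        # first field on the up-sweep where m > 0
jump_down = h_down[np.argmax(m_down < 0)]  # first field on the down-sweep where m < 0
print(f"h*(theta) = {h_star(theta):.4f}; jumps at h = {jump_up:.3f} and {jump_down:.3f}")
```

The jumps land slightly beyond $\pm h^*(\theta)$ because the flow slows down near a saddle-node bifurcation, and the ramp is only finitely slow.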

7.4.5 Beyond nearest neighbors

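Suppose we had started with the more general model,
$$ \begin{aligned} \hat H &= -\sum_{i<j} J_{ij}\,\sigma_i\,\sigma_j - H\sum_i \sigma_i \\ &= -\tfrac{1}{2}\sum_{i\ne j} J_{ij}\,\sigma_i\,\sigma_j - H\sum_i \sigma_i \ , \end{aligned} \tag{7.77} $$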

where $J_{ij}$ is the coupling between spins on sites $i$ and $j$. In the top equation above, each pair $(ij)$ is counted once in the interaction term; this may be replaced by a sum over all $i$ and $j$ if we include a factor of $\tfrac{1}{2}$.¹¹ The resulting mean field Hamiltonian is then
$$ \hat H_{\rm MF} = \tfrac{1}{2} N \hat J(0)\, m^2 - \big(H + \hat J(0)\, m\big) \sum_i \sigma_i \ . \tag{7.78} $$
Here, $\hat J(\mathbf{q})$ is the Fourier transform of the interaction matrix $J_{ij}$:¹²
$$ \hat J(\mathbf{q}) = \sum_{\mathbf{R}} J(\mathbf{R})\, e^{-i\mathbf{q}\cdot\mathbf{R}} \ . \tag{7.79} $$
For nearest neighbor interactions only, one has $\hat J(0) = zJ$, where $z$ is the lattice coordination number, i.e. the number of nearest neighbors of any given site. The scaled free energy is as in eqn. 7.58, with $f = F/N\hat J(0)$, $\theta = k_B T/\hat J(0)$, and $h = H/\hat J(0)$. The analysis proceeds precisely as before, and we conclude $\theta_c = 1$, i.e. $k_B T_c^{\rm MF} = \hat J(0)$.
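As a concrete illustration of eqn. 7.79 and of $k_B T_c^{\rm MF} = \hat J(0)$, here is a minimal sketch for a square lattice. It assumes couplings stored as a dictionary from lattice vector $\mathbf{R}$ to $J(\mathbf{R})$. The next-nearest-neighbor coupling `J2 = 0.25` is a hypothetical value, added only to show that $\hat J(0)$, and hence the mean field $T_c$, sums all couplings. Units have $k_B = 1$.

```python
import numpy as np

def J_hat(q, couplings):
    """Discrete Fourier transform of eqn. 7.79: sum over R of J(R) exp(-i q.R).

    `couplings` maps each Bravais lattice vector R (a tuple) to J(R)."""
    q = np.asarray(q, dtype=float)
    return sum(J * np.exp(-1j * np.dot(q, np.asarray(R, dtype=float)))
               for R, J in couplings.items())

J = 1.0
# Square lattice with nearest-neighbor couplings only (z = 4)
square_nn = {(1, 0): J, (-1, 0): J, (0, 1): J, (0, -1): J}
# Hypothetical extension: add a diagonal next-nearest-neighbor coupling J2
J2 = 0.25
square_nnn = dict(square_nn)
square_nnn.update({(1, 1): J2, (1, -1): J2, (-1, 1): J2, (-1, -1): J2})

for name, c in [("nn", square_nn), ("nn + nnn", square_nnn)]:
    Jq0 = J_hat((0.0, 0.0), c).real
    print(f"{name}: J_hat(0) = {Jq0}, so mean field k_B T_c = {Jq0}")
```

With nearest neighbors only this reproduces $\hat J(0) = zJ = 4J$. At $\mathbf{q} = (\pi,\pi)$ the same function gives $-4J$, the wavevector favored by antiferromagnetic order.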

7.4.6 Ising model with long-ranged forces

Consider an Ising model where $J_{ij} = J/N$ for all $i$ and $j$, so that there is a very weak interaction between every pair of spins. The Hamiltonian is then
$$ \hat H = -\frac{J}{2N}\Big(\sum_i \sigma_i\Big)^{\!2} - H\sum_k \sigma_k \ . \tag{7.80} $$
¹¹ The self-interaction terms with $i = j$ contribute a constant to $\hat H$ and may be either included or excluded. However, this property only pertains to the $\sigma_i = \pm 1$ model. For higher spin versions of the Ising model, say where $S_i \in \{-1, 0, +1\}$, $S_i^2$ is not constant and we should explicitly exclude the self-interaction terms.
¹² The sum in the discrete Fourier transform is over all 'direct Bravais lattice vectors' and the wavevector $\mathbf{q}$ may be restricted to the 'first Brillouin zone'. These terms are familiar from elementary solid state physics.
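The partition function is
$$ Z = \mathop{\rm Tr}_{\{\sigma_i\}} \exp\!\left[\frac{\beta J}{2N}\Big(\sum_i\sigma_i\Big)^{\!2} + \beta H \sum_i \sigma_i\right] . \tag{7.81} $$
We now invoke the Gaussian integral,
$$ \int_{-\infty}^{\infty}\! dx\; e^{-\alpha x^2 - \beta x} = \sqrt{\frac{\pi}{\alpha}}\; e^{\beta^2/4\alpha} \ . \tag{7.82} $$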

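Thus,
$$ \exp\!\left[\frac{\beta J}{2N}\Big(\sum_i\sigma_i\Big)^{\!2}\right] = \left(\frac{N\beta J}{2\pi}\right)^{\!1/2} \int_{-\infty}^{\infty}\! dm\; e^{-\frac{1}{2}N\beta J m^2 + \beta J m \sum_i \sigma_i} \ , \tag{7.83} $$
and we can write the partition function as
$$ \begin{aligned} Z &= \left(\frac{N\beta J}{2\pi}\right)^{\!1/2} \int_{-\infty}^{\infty}\! dm\; e^{-\frac{1}{2}N\beta J m^2} \left(\sum_{\sigma} e^{\beta(H+Jm)\sigma}\right)^{\!N} \\ &= \left(\frac{N}{2\pi\theta}\right)^{\!1/2} \int_{-\infty}^{\infty}\! dm\; e^{-N A(m)/\theta} \ , \end{aligned} \tag{7.84} $$
where $\theta = k_B T/J$, $h = H/J$, and
$$ A(m) = \tfrac{1}{2} m^2 - \theta \ln\!\left[2\cosh\!\left(\frac{h+m}{\theta}\right)\right] . \tag{7.85} $$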

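Since $N \to \infty$, we can perform the integral using the method of steepest descents. Thus, we must set
$$ \frac{dA}{dm}\bigg|_{m^*} = 0 \quad\Longrightarrow\quad m^* = \tanh\!\left(\frac{m^*+h}{\theta}\right) . \tag{7.86} $$
Expanding about $m = m^*$, we write
$$ A(m) = A(m^*) + \tfrac{1}{2} A''(m^*)\,(m-m^*)^2 + \tfrac{1}{6} A'''(m^*)\,(m-m^*)^3 + \ldots \ . \tag{7.87} $$
Writing $\nu \equiv m - m^*$ and performing the integrations, we obtain
$$ \begin{aligned} Z &= \left(\frac{N}{2\pi\theta}\right)^{\!1/2} e^{-NA(m^*)/\theta} \int_{-\infty}^{\infty}\! d\nu\; \exp\!\left[-\frac{N A''(m^*)}{2\theta}\,\nu^2 - \frac{N A'''(m^*)}{6\theta}\,\nu^3 + \ldots\right] \\ &= \frac{1}{\sqrt{A''(m^*)}}\; e^{-NA(m^*)/\theta} \cdot \Big\{1 + \mathcal{O}(N^{-1})\Big\} \ . \end{aligned} \tag{7.88} $$
The corresponding free energy per site is
$$ f = \frac{F}{NJ} = A(m^*) + \frac{\theta}{2N} \ln A''(m^*) + \mathcal{O}(N^{-2}) \ , \tag{7.89} $$
where $m^*$ is the solution to the mean field equation which minimizes $A(m)$. In the thermodynamic limit, mean field theory is exact for this model!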
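Eqn. 7.89 can be checked against a brute-force evaluation of $Z$ for the model of eqn. 7.80. In that model the energy depends only on $M = \sum_i \sigma_i$, so the trace reduces to a sum over $M$ weighted by binomial coefficients. The sketch below is a minimal illustration under assumed parameter values $N = 2000$, $\theta = 1.5$, $h = 0.1$. These are chosen above $\theta_c = 1$ so that $A(m)$ has a single minimum, and the fixed-point iteration for $m^*$ is a contraction. The function names are hypothetical.

```python
import numpy as np
from math import lgamma, log

def f_exact(N, theta, h):
    """Exact -(theta/N) ln Z for eqn. 7.80, summing over the total spin M = sum_i sigma_i."""
    k = np.arange(N + 1)                     # number of up spins
    M = 2 * k - N
    log_binom = np.array([lgamma(N + 1) - lgamma(j + 1) - lgamma(N - j + 1)
                          for j in range(N + 1)])
    expo = log_binom + M**2 / (2.0 * N * theta) + h * M / theta
    top = expo.max()
    lnZ = top + np.log(np.exp(expo - top).sum())   # log-sum-exp avoids overflow
    return -theta * lnZ / N

def mean_field(theta, h, iters=500):
    """Solve m* = tanh((m* + h)/theta) by iteration (a contraction when theta > 1)."""
    m = 0.0
    for _ in range(iters):
        m = np.tanh((m + h) / theta)
    A = 0.5 * m**2 - theta * np.log(2.0 * np.cosh((h + m) / theta))   # eqn. 7.85
    A2 = 1.0 - 1.0 / (theta * np.cosh((h + m) / theta) ** 2)          # A''(m*)
    return m, A, A2

N, theta, h = 2000, 1.5, 0.1
fN = f_exact(N, theta, h)
m_star, A, A2 = mean_field(theta, h)
f_corrected = A + theta / (2 * N) * log(A2)          # eqn. 7.89 without the O(N^-2) term
print(fN - A, fN - f_corrected)                      # O(1/N) versus O(1/N^2) discrepancy
```

The leading term $A(m^*)$ alone misses by an amount of order $1/N$. Adding the $(\theta/2N)\ln A''(m^*)$ correction brings the discrepancy down to order $1/N^2$.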
7.5 Variational Density Matrix Method

7.5.1 The variational principle

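Suppose we are given a Hamiltonian $\hat H$. From this we construct the free energy, $F$:
$$ \begin{aligned} F &= E - TS \\ &= \mathrm{Tr}\,(\varrho\,\hat H) + k_B T\, \mathrm{Tr}\,(\varrho \ln \varrho) \ . \end{aligned} \tag{7.90} $$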

Here, $\varrho$ is the density matrix.¹³ A physical density matrix must be (i) normalized (i.e. $\mathrm{Tr}\,\varrho = 1$), (ii) Hermitian, and (iii) non-negative definite (i.e. all the eigenvalues of $\varrho$ must be non-negative).
Our goal is to extremize the free energy subject to the various constraints on $\varrho$. Let us assume that $\varrho$ is diagonal in the basis of eigenstates of $\hat H$, i.e.
$$ \varrho = \sum_\gamma P_\gamma\, \big|\,\gamma\,\big\rangle\big\langle\,\gamma\,\big| \ , \tag{7.91} $$
where $P_\gamma$ is the probability that the system is in state $|\,\gamma\,\rangle$. Then
$$ F = \sum_\gamma E_\gamma\, P_\gamma + k_B T \sum_\gamma P_\gamma \ln P_\gamma \ . \tag{7.92} $$

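Thus, the free energy is a function of the set $\{P_\gamma\}$. We now extremize $F$ subject to the normalization constraint. This means we form the extended function
$$ F^*\big(\{P_\gamma\},\lambda\big) = F\big(\{P_\gamma\}\big) + \lambda\Big(\sum_\gamma P_\gamma - 1\Big) \ , \tag{7.93} $$
and then freely extremize over both the probabilities $\{P_\gamma\}$ as well as the Lagrange multiplier $\lambda$. Setting $\partial F^*/\partial P_\gamma = E_\gamma + k_B T\,(\ln P_\gamma + 1) + \lambda = 0$ gives $P_\gamma \propto e^{-E_\gamma/k_B T}$, and the constraint fixes the normalization. This yields the Boltzmann distribution,
$$ P_\gamma^{\rm eq} = \frac{1}{Z}\, \exp(-E_\gamma/k_B T) \ , \tag{7.94} $$
where $Z = \sum_\gamma e^{-E_\gamma/k_B T} = \mathrm{Tr}\, e^{-\hat H/k_B T}$ is the canonical partition function, which is related to $\lambda$ through
$$ \lambda = k_B T\,(\ln Z - 1) \ . \tag{7.95} $$
Note that the Boltzmann weights are, appropriately, all positive.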


If the spectrum of $\hat H$ is bounded from below, our extremum should in fact yield a minimum for the free energy $F$. Furthermore, since we have freely minimized over all the probabilities, subject to the single normalization constraint, any distribution $\{P_\gamma\}$ other than the equilibrium one must yield a greater value of $F$.
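The variational statement above is easy to test numerically. This minimal sketch uses a hypothetical five-level spectrum `E` and temperature `kT`; both are arbitrary illustrative values. It checks three things. First, the Boltzmann weights of eqn. 7.94 make $\partial F^*/\partial P_\gamma$ vanish with $\lambda$ from eqn. 7.95. Second, they give $F = -k_B T \ln Z$. Third, randomly drawn normalized distributions never do better.

```python
import numpy as np

rng = np.random.default_rng(0)
kT = 0.7
E = np.array([0.0, 0.3, 0.3, 1.1, 2.0])      # hypothetical spectrum E_gamma

def free_energy(P, E, kT):
    """F({P_gamma}) of eqn. 7.92; states with P = 0 contribute nothing (x ln x -> 0)."""
    P = np.asarray(P, dtype=float)
    nz = P > 0
    return float(np.dot(E, P) + kT * np.sum(P[nz] * np.log(P[nz])))

Z = np.sum(np.exp(-E / kT))
P_eq = np.exp(-E / kT) / Z                   # Boltzmann distribution, eqn. 7.94
lam = kT * (np.log(Z) - 1.0)                 # Lagrange multiplier, eqn. 7.95
F_eq = free_energy(P_eq, E, kT)

# Stationarity of F*: E_gamma + kT (ln P_gamma + 1) + lambda should vanish for every gamma
grad = E + kT * (np.log(P_eq) + 1.0) + lam

# Any other normalized distribution should give a larger F
trials = rng.dirichlet(np.ones(len(E)), size=1000)
F_trials = np.array([free_energy(P, E, kT) for P in trials])
print(F_eq, -kT * np.log(Z), F_trials.min())
```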
Alas, the Boltzmann distribution, while exact, is often intractable to evaluate. For one-dimensional systems, there are general methods such as the transfer matrix approach which do permit an exact evaluation of the free energy. However, beyond one dimension the situation is in general hopeless. A family of solvable ("integrable") models exists in two dimensions, but their solutions require specialized techniques and are extremely difficult. The idea behind the variational density matrix approximation is to construct a tractable trial density matrix $\varrho$ which depends on a set of variational parameters $\{x_\alpha\}$, and to minimize with respect to this (finite) set.

¹³ How do we take the logarithm of a matrix? The rule is this: $A = \ln B$ if $B = \exp(A)$. The exponential of a matrix may be evaluated via its Taylor expansion.

7.5.2 Variational density matrix for the Ising model

Consider once again the Ising model Hamiltonian,
$$ \hat H = -\sum_{i<j} J_{ij}\,\sigma_i\,\sigma_j - H\sum_i \sigma_i \ . \tag{7.96} $$

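The states of the system $|\,\gamma\,\rangle$ may be labeled by the values of the spin variables: $|\,\gamma\,\rangle \longleftrightarrow |\,\sigma_1,\sigma_2,\ldots\,\rangle$. We assume the density matrix is diagonal in this basis, i.e.
$$ \big\langle\,\gamma\,\big|\,\varrho_N\,\big|\,\gamma'\,\big\rangle \equiv \varrho(\gamma)\,\delta_{\gamma,\gamma'} \ , \tag{7.97} $$
where
$$ \delta_{\gamma,\gamma'} = \prod_i \delta_{\sigma_i,\sigma_i'} \ . \tag{7.98} $$
Indeed, this is the case for the exact density matrix, which is to say the Boltzmann weight,
$$ \varrho_N(\sigma_1,\sigma_2,\ldots) = \frac{1}{Z}\, e^{-\beta \hat H(\sigma_1,\ldots,\sigma_N)} \ . \tag{7.99} $$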

We now write a trial density matrix which is a product over contributions from independent single sites:
$$ \varrho_N(\sigma_1,\sigma_2,\ldots) = \prod_i \varrho(\sigma_i) \ , \tag{7.100} $$
where
$$ \varrho(\sigma) = \frac{1+m}{2}\,\delta_{\sigma,1} + \frac{1-m}{2}\,\delta_{\sigma,-1} \ . \tag{7.101} $$
Note that we've changed our notation slightly. We are denoting by $\varrho(\sigma)$ the corresponding diagonal element of the matrix
$$ \varrho = \begin{pmatrix} \frac{1+m}{2} & 0 \\ 0 & \frac{1-m}{2} \end{pmatrix} , \tag{7.102} $$
and the full density matrix is a tensor product over the single site matrices:
$$ \varrho_N = \varrho \otimes \varrho \otimes \cdots \otimes \varrho \ . \tag{7.103} $$
Note that $\varrho$ and hence $\varrho_N$ are appropriately normalized. The variational parameter here is $m$, which, if $\varrho$ is to be non-negative definite, must satisfy $-1 \le m \le 1$. The quantity $m$ has the physical interpretation of the average spin on any given site, since
$$ \langle \sigma_i \rangle = \sum_\sigma \varrho(\sigma)\, \sigma = m \ . \tag{7.104} $$
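We may now evaluate the average energy:
$$ \begin{aligned} E = \mathrm{Tr}\,(\varrho_N \hat H) &= -\sum_{i<j} J_{ij}\, m^2 - H\sum_i m \\ &= -\tfrac{1}{2} N \hat J(0)\, m^2 - NHm \ , \end{aligned} \tag{7.105} $$
where once again $\hat J(0)$ is the discrete Fourier transform of $J(\mathbf{R})$ at wavevector $\mathbf{q} = 0$. The entropy is given by
$$ \begin{aligned} S &= -k_B\, \mathrm{Tr}\,(\varrho_N \ln \varrho_N) = -N k_B\, \mathrm{Tr}\,(\varrho \ln \varrho) \\ &= -N k_B \left[ \frac{1+m}{2} \ln\!\left(\frac{1+m}{2}\right) + \frac{1-m}{2} \ln\!\left(\frac{1-m}{2}\right) \right] . \end{aligned} \tag{7.106} $$
We now define the dimensionless free energy per site: $f \equiv F/N\hat J(0)$. We have
$$ f(m,h,\theta) = -\tfrac{1}{2} m^2 - hm + \theta \left[ \frac{1+m}{2} \ln\!\left(\frac{1+m}{2}\right) + \frac{1-m}{2} \ln\!\left(\frac{1-m}{2}\right) \right] , \tag{7.107} $$
where $\theta \equiv k_B T/\hat J(0)$ is the dimensionless temperature, and $h \equiv H/\hat J(0)$ the dimensionless magnetic field, as before. We extremize $f(m)$ by setting
$$ \frac{\partial f}{\partial m} = 0 = -m - h + \frac{\theta}{2} \ln\!\left(\frac{1+m}{1-m}\right) . \tag{7.108} $$
Solving for $m$, we obtain
$$ m = \tanh\!\left(\frac{m+h}{\theta}\right) , \tag{7.109} $$
which is precisely what we found in eqn. 7.59.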
Note that the optimal value of $m$ indeed satisfies the requirement $|m| \le 1$ of non-negative probability.
This nonlinear equation may be solved graphically. For $h = 0$, the unmagnetized solution $m = 0$ always applies. However, for $\theta < 1$ there are two additional solutions at $m = \pm m_A(\theta)$, with $m_A(\theta) = \sqrt{3(1-\theta)} + \mathcal{O}\big((1-\theta)^{3/2}\big)$ for $\theta$ close to (but less than) one. These solutions, which are related by the $\mathbb{Z}_2$ symmetry of the $h = 0$ model, are in fact the low free energy solutions. This is shown clearly in figure 7.13, where the variational free energy $f(m,h,\theta)$ is plotted as a function of $m$ for a range of temperatures interpolating between 'high' and 'low' values. At the critical temperature $\theta_c = 1$, the lowest free energy state changes from being unmagnetized (high temperature) to magnetized (low temperature).
For $h > 0$, there is no longer a $\mathbb{Z}_2$ symmetry (i.e. $\sigma_i \to -\sigma_i\ \forall\, i$). The high temperature solution now has $m > 0$ (or $m < 0$ if $h < 0$), and this smoothly varies as $\theta$ is lowered, approaching the completely polarized limit $m = 1$ as $\theta \to 0$. At very high temperatures, the argument of the tanh function is small, and we may approximate $\tanh(x) \simeq x$, in which case
$$ m(h,\theta) = \frac{h}{\theta - \theta_c} \ . \tag{7.110} $$
Figure 7.13: Variational field free energy ∆f = f (m, h, θ) + θ ln 2 versus magnetization m at six equally
spaced temperatures interpolating between ‘high’ (θ = 1.25, red) and ‘low’ (θ = 0.75, blue) values. Top
panel: h = 0. Bottom panel: h = 0.06.

This is called the Curie-Weiss law. One can infer $\theta_c$ from the high temperature susceptibility $\chi(\theta) = (\partial m/\partial h)_{h=0}$ by plotting $\chi^{-1}$ versus $\theta$ and extrapolating to obtain the $\theta$-intercept. In our case, $\chi(\theta) = (\theta - \theta_c)^{-1}$. For low $\theta$ and weak $h$, there are two inequivalent minima in the free energy.
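The graphical solution of eqn. 7.109 can also be done numerically. This minimal sketch uses plain bisection, an illustrative choice of method. It compares the spontaneous magnetization just below $\theta_c$ with $m_A(\theta) \approx \sqrt{3(1-\theta)}$, and the weak-field response above $\theta_c$ with the Curie-Weiss form of eqn. 7.110. The function name `solve_mf` and the tolerances are hypothetical.

```python
import math

def solve_mf(theta, h=0.0, tol=1e-13):
    """Positive root of g(m) = m - tanh((m + h)/theta), eqn. 7.109, by bisection (h >= 0).

    g is convex for m + h > 0, so a sign change on (0, 1) brackets a unique root.
    For h = 0 and theta >= 1 only m = 0 exists, and 0.0 is returned."""
    g = lambda m: m - math.tanh((m + h) / theta)
    lo, hi = 1e-12, 1.0
    if g(lo) >= 0.0:
        return 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Just below theta_c = 1: compare with m_A(theta) ~ sqrt(3 (1 - theta))
theta = 0.99
print(solve_mf(theta), math.sqrt(3 * (1 - theta)))
# High temperature, weak field: compare with the Curie-Weiss law m = h/(theta - theta_c)
theta, h = 2.0, 1e-4
print(solve_mf(theta, h), h / (theta - 1.0))
```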
When $m$ is small, it is appropriate to expand $f(m,h,\theta)$, obtaining
$$ f(m,h,\theta) = -\theta\ln 2 - hm + \tfrac{1}{2}(\theta-1)\, m^2 + \tfrac{\theta}{12}\, m^4 + \tfrac{\theta}{30}\, m^6 + \tfrac{\theta}{56}\, m^8 + \ldots \ . \tag{7.111} $$

This is known as the Landau expansion of the free energy in terms of the order parameter m. An order
parameter is a thermodynamic variable φ which distinguishes ordered and disordered phases. Typically
φ = 0 in the disordered (high temperature) phase, and φ 6= 0 in the ordered (low temperature) phase.
When the order sets in continuously, i.e. when φ is continuous across θc , the phase transition is said to
be second order. When φ changes abruptly, the transition is first order. It is also quite commonplace to
observe phase transitions between two ordered states. For example, a crystal, which is an ordered state,
may change its lattice structure, say from a high temperature tetragonal phase to a low temperature
orthorhombic phase. When the high T phase possesses the same symmetries as the low T phase, as in
the tetragonal-to-orthorhombic example, the transition may be second order. When the two symmetries
are completely unrelated, for example in a hexagonal-to-tetragonal transition, or in a transition between
a ferromagnet and an antiferromagnet, the transition is in general first order.
Throughout this discussion, we have assumed that the interactions $J_{ij}$ are predominantly ferromagnetic, i.e. $J_{ij} > 0$, so that all the spins prefer to align. When $J_{ij} < 0$, the interaction is said to be antiferromagnetic and prefers anti-alignment of the spins (i.e. $\sigma_i \sigma_j = -1$). Clearly not every pair of spins can
be anti-aligned: there are two possible spin states and a thermodynamically extensive number of spins. But on the square lattice, for example, if the only interactions $J_{ij}$ are between nearest neighbors and the interactions are antiferromagnetic, then the lowest energy configuration ($T = 0$ ground state) will be one in which spins on opposite sublattices are anti-aligned. The square lattice is bipartite: it breaks up into two interpenetrating sublattices A and B (which are themselves square lattices, rotated by $45^\circ$ with respect to the original, and with a larger lattice constant by a factor of $\sqrt{2}$), such that any site in A has nearest neighbors in B, and vice versa. The honeycomb lattice is another example of a bipartite lattice. So is the simple cubic lattice. The triangular lattice, however, is not bipartite (it is tripartite). Consequently, with nearest neighbor antiferromagnetic interactions, the triangular lattice Ising model is highly frustrated. The moral of the story is this: antiferromagnetic interactions can give rise to complicated magnetic ordering, and, when frustrated by the lattice geometry, may have a nonzero entropy per site even at $T = 0$.
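The contrast between bipartite and frustrated lattices can be seen by brute force on tiny periodic clusters. This minimal sketch assumes nearest-neighbor couplings $J_{ij} = -1$, so that $\hat H = +\sum_{\langle ij\rangle} \sigma_i \sigma_j$. It uses a $4\times 4$ square cluster and a $3\times 3$ triangular cluster; the latter is built on a rhombic grid with neighbor offsets $(1,0)$, $(0,1)$, $(1,1)$. Both cluster sizes are illustrative choices. A finite cluster cannot show an extensive entropy. It can show, however, that the square lattice has only the two Néel ground states, with every bond satisfied, while the triangular lattice leaves one bond per triangle unsatisfied and has many degenerate ground states.

```python
import numpy as np

def afm_ground_states(L, offsets):
    """Enumerate all 2**(L*L) states of H = +sum_<ij> s_i s_j on an L x L periodic cluster.

    `offsets` lists one representative of each nearest-neighbor displacement.
    Returns (number of bonds, ground state energy, ground state degeneracy)."""
    N = L * L
    bonds = set()
    for i in range(L):
        for j in range(L):
            a = i * L + j
            for di, dj in offsets:
                b = ((i + di) % L) * L + (j + dj) % L
                bonds.add((min(a, b), max(a, b)))
    bonds = np.array(sorted(bonds))
    # Row n holds the spins of configuration n, decoded from the bits of n as +-1
    configs = 1 - 2 * ((np.arange(2 ** N)[:, None] >> np.arange(N)) & 1)
    energy = (configs[:, bonds[:, 0]] * configs[:, bonds[:, 1]]).sum(axis=1)
    e_min = int(energy.min())
    return len(bonds), e_min, int(np.count_nonzero(energy == e_min))

square = afm_ground_states(4, [(1, 0), (0, 1)])              # bipartite
triangular = afm_ground_states(3, [(1, 0), (0, 1), (1, 1)])  # tripartite, frustrated
print(square, triangular)
```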

7.5.3 Mean Field Theory of the Potts Model

The Hamiltonian for the Potts model is
$$ \hat H = -\sum_{i<j} J_{ij}\, \delta_{\sigma_i,\sigma_j} - H\sum_i \delta_{\sigma_i,1} \ . \tag{7.112} $$

Here, $\sigma_i \in \{1,\ldots,q\}$, with integer $q$. This is the so-called '$q$-state Potts model'. The quantity $H$ is analogous to an external magnetic field, and preferentially aligns (for $H > 0$) the local spins in the $\sigma = 1$ direction. We will assume $H \ge 0$.
The $q$-component set is conveniently taken to be the integers from 1 to $q$, but it could be anything, such as
$$ \sigma_i \in \{\text{tomato},\ \text{penny},\ \text{ostrich},\ \text{Grateful Dead ticket from 1987},\ \ldots\} \ . \tag{7.113} $$
The interaction energy is $-J_{ij}$ if sites $i$ and $j$ contain the same object ($q$ possibilities), and $0$ if $i$ and $j$ contain different objects ($q^2 - q$ possibilities).
The two-state Potts model is equivalent to the Ising model. Let the allowed values of $\sigma$ be $\pm 1$. Then the quantity
$$ \delta_{\sigma,\sigma'} = \tfrac{1}{2} + \tfrac{1}{2}\,\sigma\sigma' \tag{7.114} $$
equals 1 if $\sigma = \sigma'$, and is zero otherwise. The three-state Potts model cannot be written as a simple three-state Ising model, i.e. one with a bilinear interaction $\sigma\sigma'$ where $\sigma \in \{-1,0,+1\}$. However, it is straightforward to verify the identity
$$ \delta_{\sigma,\sigma'} = 1 + \tfrac{1}{2}\,\sigma\sigma' + \tfrac{3}{2}\,\sigma^2\sigma'^2 - (\sigma^2 + \sigma'^2) \ . \tag{7.115} $$
Thus, the $q = 3$-state Potts model is equivalent to an $S = 1$ (three-state) Ising model which includes both bilinear ($\sigma\sigma'$) and biquadratic ($\sigma^2\sigma'^2$) interactions, as well as a local field term which couples to the square of the spin, $\sigma^2$. In general one can find such correspondences for higher $q$ Potts models, but, as should be expected, the interactions become increasingly complex, with bi-cubic, bi-quartic, bi-quintic, etc. terms. Such a formulation, however, obscures the beautiful $S_q$ symmetry inherent in the model, where $S_q$ is the permutation group on $q$ symbols, which has $q!$ elements.
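The identity of eqn. 7.115 can be verified by checking all nine pairs of spin values. This sketch does exactly that, with hypothetical helper names. Every coefficient is an exact binary fraction, so the floating point comparison is exact.

```python
from itertools import product

def potts_delta(s, t):
    """Kronecker delta between two spin values."""
    return 1 if s == t else 0

def spin1_form(s, t):
    """Right-hand side of eqn. 7.115 for S = 1 spins s, t in {-1, 0, +1}."""
    return 1 + 0.5 * s * t + 1.5 * s**2 * t**2 - (s**2 + t**2)

# Collect every (s, t) pair on which the identity fails; none should remain
mismatches = [(s, t) for s, t in product((-1, 0, 1), repeat=2)
              if spin1_form(s, t) != potts_delta(s, t)]
print(mismatches)   # an empty list means the identity holds
```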
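Getting back to the mean field theory, we write the single site variational density matrix $\varrho$ as a diagonal matrix with entries
$$ \varrho(\sigma) = x\,\delta_{\sigma,1} + \left(\frac{1-x}{q-1}\right)\big(1 - \delta_{\sigma,1}\big) \ , \tag{7.116} $$
with $\varrho_N(\sigma_1,\ldots,\sigma_N) = \varrho(\sigma_1)\cdots\varrho(\sigma_N)$. Note that $\mathrm{Tr}\,(\varrho) = 1$. The variational parameter is $x$. When $x = q^{-1}$, all states are equally probable. But for $x > q^{-1}$, the state $\sigma = 1$ is preferred, and the other $(q-1)$ states have identical but smaller probabilities. It is a simple matter to compute the energy and entropy:
$$ \begin{aligned} E &= \mathrm{Tr}\,(\varrho_N \hat H) = -\tfrac{1}{2} N \hat J(0) \left[x^2 + \frac{(1-x)^2}{q-1}\right] - NHx \\ S &= -k_B\, \mathrm{Tr}\,(\varrho_N \ln \varrho_N) = -N k_B \left[x\ln x + (1-x)\ln\!\left(\frac{1-x}{q-1}\right)\right] . \end{aligned} \tag{7.117} $$
The dimensionless free energy per site is then
$$ f(x,\theta,h) = -\tfrac{1}{2}\left[x^2 + \frac{(1-x)^2}{q-1}\right] + \theta\left[x\ln x + (1-x)\ln\!\left(\frac{1-x}{q-1}\right)\right] - hx \ , \tag{7.118} $$
where $h = H/\hat J(0)$. We now extremize with respect to $x$ to obtain the mean field equation,
$$ \frac{\partial f}{\partial x} = 0 = -x + \frac{1-x}{q-1} + \theta\ln x - \theta\ln\!\left(\frac{1-x}{q-1}\right) - h \ . \tag{7.119} $$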
Note that for $h = 0$, $x = q^{-1}$ is a solution, corresponding to a disordered state in which all states are equally probable. At high temperatures, for small $h$, we expect $x - q^{-1} \propto h$. Indeed, using Mathematica one can set
$$ x \equiv q^{-1} + s \ , \tag{7.120} $$
and expand the mean field equation in powers of $s$. One obtains
$$ h = \frac{q\,(q\theta-1)}{q-1}\, s - \frac{q^3\,(q-2)\,\theta}{2\,(q-1)^2}\, s^2 + \mathcal{O}(s^3) \ . \tag{7.121} $$
For weak fields, $|h| \ll 1$, we have
$$ s(\theta) = \frac{(q-1)\,h}{q\,(q\theta-1)} + \mathcal{O}(h^2) \ , \tag{7.122} $$
which again is of the Curie-Weiss form. The difference $s = x - q^{-1}$ is the order parameter for the transition.
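Finally, one can expand the free energy in powers of $s$, obtaining the Landau expansion,
$$ \begin{aligned} f(s,\theta,h) = &-\frac{2h+1}{2q} - \theta\ln q - hs + \frac{q\,(q\theta-1)}{2\,(q-1)}\, s^2 - \frac{(q-2)\,q^3\,\theta}{6\,(q-1)^2}\, s^3 \\ &+ \frac{q^3\theta}{12}\Big[1 + (q-1)^{-3}\Big] s^4 - \frac{q^4\theta}{20}\Big[1 - (q-1)^{-4}\Big] s^5 \\ &+ \frac{q^5\theta}{30}\Big[1 + (q-1)^{-5}\Big] s^6 + \ldots \ . \end{aligned} \tag{7.123} $$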
Note that, for $q = 2$, the coefficients of $s^3$, $s^5$, and higher order odd powers of $s$ vanish in the Landau expansion. This is consistent with what we found for the Ising model, and is related to the $\mathbb{Z}_2$ symmetry of that model. For $q \ge 3$, there is a cubic term in the mean field free energy, and thus we generically expect a first order transition, as we shall see below when we discuss Landau theory.
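The expected first order transition for $q = 3$ can be located numerically from eqn. 7.118 at $h = 0$. The sketch finds the temperature at which the best ordered state, $x > q^{-1}$, becomes degenerate with the disordered state $x = q^{-1}$. The target value is not stated in the text. For $q = 3$, setting $f(2/3) = f(1/3)$ in eqn. 7.118 gives $\theta = 1/(4\ln 2) \approx 0.3607$, and one can check that $x = 2/3$ then satisfies eqn. 7.119. This lies above the temperature $\theta = q^{-1}$ at which the quadratic coefficient in eqn. 7.123 changes sign. The grid minimization and bisection are illustrative choices, and the function names are hypothetical.

```python
import numpy as np

def f_potts(x, theta, q, h=0.0):
    """Dimensionless mean field free energy per site, eqn. 7.118."""
    y = (1.0 - x) / (q - 1)
    energy = -0.5 * (x**2 + (1.0 - x)**2 / (q - 1))
    entropy = x * np.log(x) + (1.0 - x) * np.log(y)
    return energy + theta * entropy - h * x

def ordered_gap(theta, q, n=200001):
    """min over x in (1/q, 1) of f(x) - f(1/q); negative means the ordered state wins."""
    xs = np.linspace(1.0 / q + 1e-6, 1.0 - 1e-9, n)
    return f_potts(xs, theta, q).min() - f_potts(1.0 / q, theta, q)

def theta_transition(q, lo, hi, iters=45):
    """Bisect on theta for the first order transition at h = 0."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ordered_gap(mid, q) < 0.0:
            lo = mid     # ordered phase still wins: the transition lies at higher theta
        else:
            hi = mid
    return 0.5 * (lo + hi)

q = 3
theta_1 = theta_transition(q, lo=1.0 / q + 1e-3, hi=0.5)
print(theta_1, 1.0 / (4.0 * np.log(2.0)), 1.0 / q)
```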

7.5.4 Mean Field Theory of the XY Model

Consider the so-called XY model, in which each site contains a continuous planar spin, represented by an angular variable $\phi_i \in [-\pi,\pi]$:
$$ \hat H = -\frac{1}{2}\sum_{i\ne j} J_{ij} \cos\big(\phi_i - \phi_j\big) - H\sum_i \cos\phi_i \ . \tag{7.124} $$
We write the (diagonal elements of the) full density matrix once again as a product:
$$ \varrho_N(\phi_1,\phi_2,\ldots) = \prod_i \varrho(\phi_i) \ . \tag{7.125} $$

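Our goal will be to extremize the free energy with respect to the function $\varrho(\phi)$. To this end, we compute
$$ E = \mathrm{Tr}\,(\varrho_N \hat H) = -\tfrac{1}{2} N \hat J(0)\, \Big|\mathrm{Tr}\,\big(\varrho\, e^{i\phi}\big)\Big|^2 - NH\, \mathrm{Tr}\,\big(\varrho \cos\phi\big) \ . \tag{7.126} $$
The entropy is
$$ S = -N k_B\, \mathrm{Tr}\,(\varrho \ln \varrho) \ . \tag{7.127} $$
Note that for any function $A(\phi)$, we have¹⁴
$$ \mathrm{Tr}\,\big(\varrho\, A\big) \equiv \int_{-\pi}^{\pi}\! \frac{d\phi}{2\pi}\; \varrho(\phi)\, A(\phi) \ . \tag{7.128} $$
We now extremize the functional $F\big[\varrho(\phi)\big] = E - TS$ with respect to $\varrho(\phi)$, under the condition that $\mathrm{Tr}\,\varrho = 1$. We therefore use Lagrange's method of undetermined multipliers, writing
$$ F^* = F - N k_B T\, \lambda\, \Big(\mathrm{Tr}\,\varrho - 1\Big) \ . \tag{7.129} $$
Note that $F^*$ is a function of the Lagrange multiplier $\lambda$ and a functional of the density matrix $\varrho(\phi)$. The prefactor $N k_B T$ which multiplies $\lambda$ is of no mathematical consequence: we could always redefine the multiplier to be $\lambda' \equiv N k_B T \lambda$. It is present only to maintain homogeneity and proper dimensionality of $F^*$ with $\lambda$ itself dimensionless and of order $N^0$. We now have
$$ \begin{aligned} \frac{\delta F^*}{\delta\varrho(\phi)} = \frac{\delta}{\delta\varrho(\phi)} \bigg\{ &-\tfrac{1}{2} N \hat J(0)\, \Big|\mathrm{Tr}\,\big(\varrho\, e^{i\phi}\big)\Big|^2 - NH\, \mathrm{Tr}\,\big(\varrho\cos\phi\big) \\ &+ N k_B T\, \mathrm{Tr}\,\big(\varrho\ln\varrho\big) - N k_B T\,\lambda\, \Big(\mathrm{Tr}\,\varrho - 1\Big) \bigg\} \ . \end{aligned} \tag{7.130} $$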

¹⁴ The denominator of $2\pi$ in the measure is not necessary, and in fact it is even slightly cumbersome. It divides out whenever we take a ratio to compute a thermodynamic average. I introduce this factor to preserve the relation $\mathrm{Tr}\,1 = 1$. I personally find unnormalized traces to be profoundly unsettling on purely aesthetic grounds.
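To this end, we note that
$$ \frac{\delta}{\delta\varrho(\phi)}\, \mathrm{Tr}\,(\varrho A) = \frac{\delta}{\delta\varrho(\phi)} \int_{-\pi}^{\pi}\! \frac{d\phi'}{2\pi}\; \varrho(\phi')\, A(\phi') = \frac{1}{2\pi}\, A(\phi) \ . \tag{7.131} $$
Thus, we have
$$ \begin{aligned} \frac{\delta F^*}{\delta\varrho(\phi)} = &-\tfrac{1}{2} N \hat J(0) \cdot \frac{1}{2\pi} \Big[ \mathop{\rm Tr}_{\phi'}\big(\varrho\, e^{i\phi'}\big)\, e^{-i\phi} + \mathop{\rm Tr}_{\phi'}\big(\varrho\, e^{-i\phi'}\big)\, e^{i\phi} \Big] - NH \cdot \frac{\cos\phi}{2\pi} \\ &+ N k_B T \cdot \frac{1}{2\pi} \Big[\ln\varrho(\phi) + 1\Big] - N k_B T \cdot \frac{\lambda}{2\pi} \ . \end{aligned} \tag{7.132} $$
Now let us define
$$ \mathop{\rm Tr}_{\phi}\big(\varrho\, e^{i\phi}\big) = \int_{-\pi}^{\pi}\! \frac{d\phi}{2\pi}\; \varrho(\phi)\, e^{i\phi} \equiv m\, e^{i\phi_0} \ . \tag{7.133} $$
Setting $\delta F^*/\delta\varrho(\phi) = 0$, we then have
$$ \ln\varrho(\phi) = \frac{\hat J(0)}{k_B T}\, m \cos(\phi - \phi_0) + \frac{H}{k_B T}\cos\phi + \lambda - 1 \ . \tag{7.134} $$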
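Clearly the free energy will be reduced if $\phi_0 = 0$ so that the mean field is maximal and aligns with the external field, which prefers $\phi = 0$. Thus, we conclude
$$ \varrho(\phi) = C \exp\!\left(\frac{H_{\rm eff}}{k_B T}\cos\phi\right) , \tag{7.135} $$
where
$$ H_{\rm eff} = \hat J(0)\, m + H \tag{7.136} $$
and $C = e^{\lambda-1}$. The value of $\lambda$ is then determined by invoking the constraint,
$$ \mathrm{Tr}\,\varrho = 1 = C \int_{-\pi}^{\pi}\! \frac{d\phi}{2\pi}\, \exp\!\left(\frac{H_{\rm eff}}{k_B T}\cos\phi\right) = C\, I_0(H_{\rm eff}/k_B T) \ , \tag{7.137} $$
where $I_0(z)$ is the modified Bessel function of the first kind. We are free to define $\varepsilon \equiv H_{\rm eff}/k_B T$, and treat $\varepsilon$ as our single variational parameter. We then have the normalized single site density matrix
$$ \varrho(\phi) = \frac{\exp(\varepsilon\cos\phi)}{\int_{-\pi}^{\pi}\frac{d\phi'}{2\pi}\, \exp(\varepsilon\cos\phi')} = \frac{\exp(\varepsilon\cos\phi)}{I_0(\varepsilon)} \ . \tag{7.138} $$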
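We next compute the following averages:
$$ \big\langle e^{\pm i\phi} \big\rangle = \int_{-\pi}^{\pi}\! \frac{d\phi}{2\pi}\; \varrho(\phi)\, e^{\pm i\phi} = \frac{I_1(\varepsilon)}{I_0(\varepsilon)} \tag{7.139} $$
$$ \big\langle \cos(\phi - \phi') \big\rangle = \mathrm{Re}\, \big\langle e^{i\phi} \big\rangle \big\langle e^{-i\phi'} \big\rangle = \left(\frac{I_1(\varepsilon)}{I_0(\varepsilon)}\right)^{\!2} , \tag{7.140} $$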
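as well as
$$ \mathrm{Tr}\,(\varrho\ln\varrho) = \int_{-\pi}^{\pi}\! \frac{d\phi}{2\pi}\; \frac{e^{\varepsilon\cos\phi}}{I_0(\varepsilon)} \Big\{\varepsilon\cos\phi - \ln I_0(\varepsilon)\Big\} = \varepsilon\, \frac{I_1(\varepsilon)}{I_0(\varepsilon)} - \ln I_0(\varepsilon) \ . \tag{7.141} $$
The dimensionless free energy per site is therefore
$$ f(\varepsilon,h,\theta) = -\frac{1}{2}\left(\frac{I_1(\varepsilon)}{I_0(\varepsilon)}\right)^{\!2} + (\theta\varepsilon - h)\, \frac{I_1(\varepsilon)}{I_0(\varepsilon)} - \theta\ln I_0(\varepsilon) \ , \tag{7.142} $$
with $\theta = k_B T/\hat J(0)$, $h = H/\hat J(0)$, and $f = F/N\hat J(0)$ as before. Note that the mean field equation is $m = \theta\varepsilon - h = \langle e^{i\phi} \rangle$, i.e.
$$ \theta\varepsilon - h = \frac{I_1(\varepsilon)}{I_0(\varepsilon)} \ . \tag{7.143} $$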

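For small $\varepsilon$, we may expand the Bessel functions, using
$$ I_\nu(z) = \big(\tfrac{1}{2} z\big)^\nu \sum_{k=0}^\infty \frac{\big(\tfrac{1}{4} z^2\big)^k}{k!\,\Gamma(k+\nu+1)} \ , \tag{7.144} $$
to obtain
$$ f(\varepsilon,h,\theta) = \tfrac{1}{4}\big(\theta - \tfrac{1}{2}\big)\,\varepsilon^2 + \tfrac{1}{64}\big(2 - 3\theta\big)\,\varepsilon^4 - \tfrac{1}{2}\,h\varepsilon + \tfrac{1}{16}\,h\varepsilon^3 + \ldots \ . \tag{7.145} $$
This predicts a second order phase transition at $\theta_c = \tfrac{1}{2}$.¹⁵ Note also the Curie-Weiss form of the susceptibility at high $\theta$:
$$ \frac{\partial f}{\partial \varepsilon} = 0 \quad\Longrightarrow\quad \varepsilon = \frac{h}{\theta - \theta_c} + \ldots \ . \tag{7.146} $$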
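The XY mean field equation can be solved numerically without a special-function library. The sketch below swaps in a simple quadrature: $I_1(\varepsilon)/I_0(\varepsilon) = \langle\cos\phi\rangle$ under $\varrho(\phi) \propto e^{\varepsilon\cos\phi}$ is evaluated by the periodic trapezoid rule on a uniform grid. The grid size, the bisection solver, and all names are illustrative assumptions. Two comparison formulas are derived here rather than taken from the text. Expanding $I_1/I_0 \approx \varepsilon/2 - \varepsilon^3/16$ in $m = I_1/I_0$ at $h = 0$ gives $m^2 \approx 8\theta^2(1-2\theta)$ just below $\theta_c = \tfrac{1}{2}$. Eqn. 7.146 together with $m = \theta\varepsilon - h$ gives $m \approx h/(2\theta - 1)$ above $\theta_c$.

```python
import numpy as np

PHI = np.linspace(-np.pi, np.pi, 2048, endpoint=False)   # uniform grid on the circle

def bessel_ratio(eps):
    """I_1(eps)/I_0(eps) = <cos phi> for rho(phi) ~ exp(eps cos phi), cf. eqn. 7.139.

    Periodic trapezoid rule; the factor exp(-eps) cancels in the ratio and only
    prevents overflow at large eps."""
    w = np.exp(eps * (np.cos(PHI) - 1.0))
    return float(np.sum(w * np.cos(PHI)) / np.sum(w))

def solve_xy(theta, h=0.0, tol=1e-13):
    """Positive root of m = I_1((h + m)/theta) / I_0((h + m)/theta), eqn. 7.143 (h >= 0).

    Magnetizations below 1e-8 are treated as zero."""
    g = lambda m: m - bessel_ratio((h + m) / theta)
    lo, hi = 1e-8, 1.0
    if g(lo) >= 0.0:          # h = 0 and theta >= 1/2: only m = 0
        return 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

theta = 0.495
print(solve_xy(theta), np.sqrt(8 * theta**2 * (1 - 2 * theta)))  # just below theta_c = 1/2
print(solve_xy(0.6))                                              # disordered phase: 0.0
print(solve_xy(1.0, 1e-3))                                        # Curie-Weiss: ~ h/(2 theta - 1)
```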

7.5.5 XY model via neglect of fluctuations method

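Consider again the Hamiltonian of eqn. 7.124. Define $z_i \equiv \exp(i\phi_i)$ and write
$$ z_i = w + \delta z_i \ , \tag{7.147} $$
where $w \equiv \langle z_i \rangle$ and $\delta z_i \equiv z_i - w$. Of course we also have the complex conjugate relations $z_i^* = w^* + \delta z_i^*$ and $w^* = \langle z_i^* \rangle$. Writing $\cos(\phi_i - \phi_j) = \mathrm{Re}\,(z_i^* z_j)$, by neglecting the terms proportional to $\delta z_i^*\, \delta z_j$ in $\hat H$ we arrive at the mean field Hamiltonian,
$$ \hat H^{\rm MF} = \tfrac{1}{2} N \hat J(0)\, |w|^2 - \tfrac{1}{2} \hat J(0) \sum_i \big(w^* z_i + w\, z_i^*\big) - \tfrac{1}{2} H \sum_i \big(z_i + z_i^*\big) \ . \tag{7.148} $$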
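The neglect-of-fluctuations step behind eqn. 7.148 can be spelled out term by term. This is a worked restatement of the reasoning above, not an additional approximation. It assumes translation invariance, so that $\sum_j J_{ij} = \hat J(0)$ for every $i$, and it uses the exact relation $-H\sum_i \cos\phi_i = -\tfrac{1}{2}H\sum_i (z_i + z_i^*)$.

```latex
\begin{aligned}
z_i^* z_j &= (w^* + \delta z_i^*)(w + \delta z_j)
           = |w|^2 + w^*\,\delta z_j + w\,\delta z_i^* + \delta z_i^*\,\delta z_j \\
          &\approx |w|^2 + w^*(z_j - w) + w\,(z_i^* - w^*)
           = w^* z_j + w\, z_i^* - |w|^2 \ , \\
-\tfrac{1}{2}\sum_{i\ne j} J_{ij}\, \mathrm{Re}\,(z_i^* z_j)
          &\approx \tfrac{1}{2} N \hat J(0)\, |w|^2
           - \tfrac{1}{2}\hat J(0) \sum_i \big(w^* z_i + w\, z_i^*\big) \ .
\end{aligned}
```

The combination $w^* z_i + w\, z_i^*$ is already real, so taking the real part changes nothing in the last line.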

It is clear that the free energy will be minimized if the mean field $w$ breaks the $O(2)$ symmetry in the same direction as the external field $H$, which means $w \in \mathbb{R}$ and
$$ \hat H^{\rm MF} = \tfrac{1}{2} N \hat J(0)\, |w|^2 - \big(H + \hat J(0)\, |w|\big) \sum_i \cos\phi_i \ . \tag{7.149} $$
¹⁵ Note that the coefficient of the quartic term in $\varepsilon$ is negative for $\theta > \tfrac{2}{3}$. At $\theta = \theta_c = \tfrac{1}{2}$, the coefficient is positive, but for larger $\theta$ one must include higher order terms in the Landau expansion.
The dimensionless free energy per site is then
$$ f = \tfrac{1}{2}\,|w|^2 - \theta \ln I_0\!\left(\frac{h + |w|}{\theta}\right) . \tag{7.150} $$
Differentiating with respect to $|w|$, one obtains
$$ |w| \equiv m = \frac{I_1\!\left(\frac{h+m}{\theta}\right)}{I_0\!\left(\frac{h+m}{\theta}\right)} \ , \tag{7.151} $$
which is the same equation as eqn. 7.143. The two mean field theories yield the same results in every detail (see §7.10).

7.6 Landau Theory of Phase Transitions

Landau's theory of phase transitions is based on an expansion of the free energy of a thermodynamic system in terms of an order parameter, which is nonzero in an ordered phase and zero in a disordered phase. For example, the magnetization $M$ of a ferromagnet in zero external field but at finite temperature typically vanishes for temperatures $T > T_c$, where $T_c$ is the critical temperature, also called the Curie temperature in a ferromagnet. A low order expansion in powers of the order parameter is appropriate sufficiently close to the phase transition, i.e. at temperatures such that the order parameter, if nonzero, is still small.

7.6.1 Quartic free energy with Ising symmetry

The simplest example is the quartic free energy,
$$ f(m, h=0, \theta) = f_0 + \tfrac{1}{2} a m^2 + \tfrac{1}{4} b m^4 \ , \tag{7.152} $$
where $f_0 = f_0(\theta)$, $a = a(\theta)$, and $b = b(\theta)$. Here, $\theta$ is a dimensionless measure of the temperature. If for example the local exchange energy in the ferromagnet is $J$, then we might define $\theta = k_B T/zJ$, as before. Let us assume $b > 0$, which is necessary if the free energy is to be bounded from below.¹⁶ The equation of state,
$$ \frac{\partial f}{\partial m} = 0 = am + bm^3 \ , \tag{7.153} $$
has three solutions in the complex $m$ plane: (i) $m = 0$, (ii) $m = \sqrt{-a/b}$, and (iii) $m = -\sqrt{-a/b}$. The latter two solutions lie along the (physical) real axis if $a < 0$. We assume that there exists a unique temperature $\theta_c$ where $a(\theta_c) = 0$. Minimizing $f$, we find
$$ \begin{aligned} \theta < \theta_c &: \quad f(\theta) = f_0 - \frac{a^2}{4b} \\ \theta > \theta_c &: \quad f(\theta) = f_0 \ . \end{aligned} \tag{7.154} $$
¹⁶ It is always the case that $f$ is bounded from below, on physical grounds. Were $b$ negative, we'd have to consider higher order terms in the Landau expansion.
Figure 7.14: Phase diagram for the quartic Landau free energy $f = f_0 + \tfrac{1}{2} a m^2 + \tfrac{1}{4} b m^4 - hm$, with $b > 0$. There is a first order line at $h = 0$ extending from $a = -\infty$ and terminating in a critical point at $a = 0$. For $|h| < h^*(a)$ (dashed red line) there are three solutions to the mean field equation, corresponding to one global minimum, one local minimum, and one local maximum. Insets show behavior of the free energy $f(m)$.

The free energy is continuous at $\theta_c$ since $a(\theta_c) = 0$. The specific heat, however, is discontinuous across the transition, with
$$ c\big(\theta_c^+\big) - c\big(\theta_c^-\big) = -\theta_c\, \frac{\partial^2}{\partial\theta^2}\left(\frac{a^2}{4b}\right)_{\!\theta=\theta_c} = -\frac{\theta_c\, \big[a'(\theta_c)\big]^2}{2\, b(\theta_c)} \ . \tag{7.155} $$
The presence of a magnetic field $h$ breaks the $\mathbb{Z}_2$ symmetry of $m \to -m$. The free energy becomes
$$ f(m,h,\theta) = f_0 + \tfrac{1}{2} a m^2 + \tfrac{1}{4} b m^4 - hm \ , \tag{7.156} $$
and the mean field equation is
$$ bm^3 + am - h = 0 \ . \tag{7.157} $$
This is a cubic equation for $m$ with real coefficients, and as such it can either have three real solutions or one real solution and two complex solutions related by complex conjugation. Clearly we must have $a < 0$ in order to have three real roots, since $bm^3 + am$ is monotonically increasing otherwise. The boundary between these two classes of solution sets occurs when two roots coincide, which means $f''(m) = 0$ as well as $f'(m) = 0$. Simultaneously solving these two equations, we find
$$ h^*(a) = \pm \frac{2}{3^{3/2}}\, \frac{(-a)^{3/2}}{b^{1/2}} \ , \tag{7.158} $$
or, equivalently,
$$ a^*(h) = -\frac{3}{2^{2/3}}\, b^{1/3}\, |h|^{2/3} \ . \tag{7.159} $$
If, for fixed $h$, we have $a < a^*(h)$, then there will be three real solutions to the mean field equation $f'(m) = 0$, one of which is a global minimum (the one for which $m \cdot h > 0$). For $a > a^*(h)$ there is only a single global minimum, at which $m$ also has the same sign as $h$. If we solve the mean field equation perturbatively in $h/a$, we find
$$ \begin{aligned} m(a,h) &= \frac{h}{a} - \frac{b}{a^4}\, h^3 + \mathcal{O}(h^5) && (a > 0) \\ &= \pm\frac{|a|^{1/2}}{b^{1/2}} + \frac{h}{2\,|a|} \mp \frac{3\, b^{1/2}}{8\, |a|^{5/2}}\, h^2 + \mathcal{O}(h^3) && (a < 0) \ . \end{aligned} \tag{7.160} $$

7.6.2 Cubic terms in Landau theory : first order transitions

Next, consider a free energy with a cubic term,
$$ f = f_0 + \tfrac{1}{2} a m^2 - \tfrac{1}{3} y m^3 + \tfrac{1}{4} b m^4 \ , \tag{7.161} $$
with $b > 0$ for stability. Without loss of generality, we may assume $y > 0$ (else send $m \to -m$). Note that we no longer have $m \to -m$ (i.e. $\mathbb{Z}_2$) symmetry. The cubic term favors positive $m$. What is the phase diagram in the $(a,y)$ plane?
Extremizing the free energy with respect to $m$, we obtain
$$ \frac{\partial f}{\partial m} = 0 = am - ym^2 + bm^3 \ . \tag{7.162} $$
This cubic equation factorizes into a linear and quadratic piece, and hence may be solved simply. The three solutions are $m = 0$ and
$$ m = m_\pm \equiv \frac{y}{2b} \pm \sqrt{\left(\frac{y}{2b}\right)^{\!2} - \frac{a}{b}} \ . \tag{7.163} $$
We now see that for $y^2 < 4ab$ there is only one real solution, at $m = 0$, while for $y^2 > 4ab$ there are three real solutions. Which solution has lowest free energy? To find out, we compare the energy $f(0)$ with $f(m_+)$.¹⁷ Thus, we set
$$ f(m) = f(0) \quad\Longrightarrow\quad \tfrac{1}{2} a m^2 - \tfrac{1}{3} y m^3 + \tfrac{1}{4} b m^4 = 0 \ , \tag{7.164} $$
and we now have two quadratic equations to solve simultaneously:
$$ \begin{aligned} 0 &= a - ym + bm^2 \\ 0 &= \tfrac{1}{2} a - \tfrac{1}{3} ym + \tfrac{1}{4} bm^2 \ . \end{aligned} \tag{7.165} $$

¹⁷ We needn't waste our time considering the $m = m_-$ solution, since the cubic term prefers positive $m$.
Figure 7.15: Behavior of the quartic free energy $f(m) = \tfrac{1}{2} a m^2 - \tfrac{1}{3} y m^3 + \tfrac{1}{4} b m^4$. A: $y^2 < 4ab$; B: $4ab < y^2 < \tfrac{9}{2} ab$; C and D: $y^2 > \tfrac{9}{2} ab$. The thick black line denotes a line of first order transitions, where the order parameter is discontinuous across the transition.

Eliminating the quadratic term gives $m = 3a/y$. Finally, substituting $m = m_+$ gives us a relation between $a$, $b$, and $y$:
$$ y^2 = \tfrac{9}{2}\, ab \ . \tag{7.166} $$
Thus, we have the following:
$$ \begin{aligned} a > \frac{y^2}{4b} &: \quad \text{1 real root } m = 0 \\ \frac{y^2}{4b} > a > \frac{2y^2}{9b} &: \quad \text{3 real roots; minimum at } m = 0 \\ \frac{2y^2}{9b} > a &: \quad \text{3 real roots; minimum at } m = \frac{y}{2b} + \sqrt{\left(\frac{y}{2b}\right)^{\!2} - \frac{a}{b}} \end{aligned} \tag{7.167} $$
The solution $m = 0$ lies at a local minimum of the free energy for $a > 0$ and at a local maximum for $a < 0$. Over the range $\frac{y^2}{4b} > a > \frac{2y^2}{9b}$, then, there is a global minimum at $m = 0$, a local minimum at $m = m_+$, and a local maximum at $m = m_-$, with $m_+ > m_- > 0$. For $\frac{2y^2}{9b} > a > 0$, there is a local minimum at $m = 0$, a global minimum at $m = m_+$, and a local maximum at $m = m_-$, again with $m_+ > m_- > 0$. For $a < 0$, there is a local maximum at $m = 0$, a local minimum at $m = m_-$, and a global minimum at $m = m_+$, with $m_+ > 0 > m_-$. See fig. 7.15.
With $y = 0$, we have a second order transition at $a = 0$. With $y \ne 0$, there is a discontinuous (first order) transition at $a_c = 2y^2/9b > 0$ and $m_c = 2y/3b$. This occurs before $a$ reaches the value $a = 0$ where the curvature at $m = 0$ turns negative. If we write $a = \alpha(T - T_0)$, then the expected second order transition at $T = T_0$ is preempted by a first order transition at $T_c = T_0 + 2y^2/9\alpha b$.
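The location of the first order transition can be confirmed numerically. This minimal sketch bisects on $a$ for the point where $f(m_+) = f(0)$, with $f_0 = 0$ and illustrative values $y = b = 1$. Bisection is valid here because $df(m_+)/da = \tfrac{1}{2}m_+^2 > 0$ along the stationary branch, so the free energy difference is monotonic in $a$. The helper names are hypothetical.

```python
import numpy as np

def f_cubic(m, a, y, b):
    """Landau free energy of eqn. 7.161 with f0 = 0."""
    return 0.5 * a * m**2 - y * m**3 / 3.0 + 0.25 * b * m**4

def m_plus(a, y, b):
    """Largest stationary point m_+ of eqn. 7.163 (requires y^2 >= 4ab)."""
    return y / (2 * b) + np.sqrt((y / (2 * b)) ** 2 - a / b)

def a_transition(y, b, iters=100):
    """Bisect on a in (0, y^2/4b) for f(m_+) = f(0)."""
    lo, hi = 0.0, y**2 / (4 * b)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f_cubic(m_plus(mid, y, b), mid, y, b) < 0.0:
            lo = mid      # m_+ is still the global minimum: transition at larger a
        else:
            hi = mid
    return 0.5 * (lo + hi)

y, b = 1.0, 1.0
ac = a_transition(y, b)
print(ac, 2 * y**2 / (9 * b), m_plus(ac, y, b), 2 * y / (3 * b))
```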
7.6.3 Magnetization dynamics

Suppose we now impose some dynamics on the system, of the simple relaxational type
$$ \frac{\partial m}{\partial t} = -\Gamma\, \frac{\partial f}{\partial m} \ , \tag{7.168} $$
where $\Gamma$ is a phenomenological kinetic coefficient. Assuming $y > 0$ and $b > 0$, it is convenient to adimensionalize by writing
$$ m \equiv \frac{y}{b}\cdot u \quad , \quad a \equiv \frac{y^2}{b}\cdot r \quad , \quad t \equiv \frac{b}{\Gamma y^2}\cdot s \ . \tag{7.169} $$
Then we obtain
$$ \frac{\partial u}{\partial s} = -\frac{\partial\varphi}{\partial u} \ , \tag{7.170} $$
where the dimensionless free energy function is
$$ \varphi(u) = \tfrac{1}{2} r u^2 - \tfrac{1}{3} u^3 + \tfrac{1}{4} u^4 \ . \tag{7.171} $$
We see that there is a single control parameter, $r$. The fixed points of the dynamics are then the stationary points of $\varphi(u)$, where $\varphi'(u) = 0$, with
$$ \varphi'(u) = u\,(r - u + u^2) \ . \tag{7.172} $$
The solutions to $\varphi'(u) = 0$ are then given by
$$ u^* = 0 \quad , \quad u^* = \tfrac{1}{2} \pm \sqrt{\tfrac{1}{4} - r} \ . \tag{7.173} $$
For $r > \tfrac{1}{4}$ there is one fixed point at $u = 0$, which is attractive under the dynamics $\dot u = -\varphi'(u)$ since $\varphi''(0) = r$. At $r = \tfrac{1}{4}$ there occurs a saddle-node bifurcation and a pair of fixed points is generated, one stable and one unstable. As we see from fig. 7.16, the interior fixed point is always unstable and the two exterior fixed points are always stable. At $r = 0$ there is a transcritical bifurcation where two fixed points of opposite stability collide and bounce off one another (metaphorically speaking).
At the saddle-node bifurcation, $r = \tfrac{1}{4}$ and $u = \tfrac{1}{2}$, and we find $\varphi(u = \tfrac{1}{2}; r = \tfrac{1}{4}) = \tfrac{1}{192}$, which is positive. Thus, the thermodynamic state of the system remains at $u = 0$ until the value of $\varphi(u_+)$ crosses zero. This occurs when $\varphi(u) = 0$ and $\varphi'(u) = 0$, the simultaneous solution of which yields $r = \tfrac{2}{9}$ and $u = \tfrac{2}{3}$.
Suppose we slowly ramp the control parameter $r$ up and down as a function of the dimensionless time $s$. Under the dynamics of eqn. 7.170, $u(s)$ flows to the first stable fixed point encountered; this is always the case for a dynamical system with a one-dimensional phase space. Then, as $r$ is further varied, $u$ follows the position of whatever locally stable fixed point it initially encountered. Thus, $u\big(r(s)\big)$ evolves smoothly until a bifurcation is encountered. The situation is depicted by the arrows in fig. 7.16. The equilibrium thermodynamic value for $u(r)$ is discontinuous; there is a first order phase transition at $r=\frac{2}{9}$, as we've already seen. As $r$ is increased, $u(r)$ follows a trajectory indicated by the magenta arrows. For a negative initial value of $u$, the evolution as a function of $r$ will be reversible. However, if $u(0)$ is initially positive, then the system exhibits hysteresis, as shown. Starting with a large positive value of $r$, $u(s)$ quickly evolves to $u=0^+$, which means a positive infinitesimal value. Then as $r$ is decreased, the system remains at $u=0^+$ even through the first order transition, because $u=0$ is an attractive fixed point. However, once $r$ begins to go negative, the $u=0$ fixed point becomes repulsive, and $u(s)$ quickly flows to the stable fixed point $u_+=\frac{1}{2}+\sqrt{\frac{1}{4}-r}$. Further decreasing $r$, the system remains on this branch. If $r$ is later increased, then $u(s)$ remains on the upper branch past $r=0$, until, at $r=\frac{1}{4}$, the $u_+$ fixed point annihilates with the unstable fixed point at $u_-=\frac{1}{2}-\sqrt{\frac{1}{4}-r}$, at which time $u(s)$ quickly flows down to $u=0^+$ again.

400 CHAPTER 7. MEAN FIELD THEORY OF PHASE TRANSITIONS

Figure 7.16: Fixed points for $\phi(u)=\frac{1}{2}ru^2-\frac{1}{3}u^3+\frac{1}{4}u^4$ and flow under the dynamics $\dot u=-\phi'(u)$. Solid curves represent stable fixed points and dashed curves unstable fixed points. Magenta arrows show behavior under slowly increasing control parameter $r$ and dark blue arrows show behavior under slowly decreasing $r$. For $u>0$ there is a hysteresis loop. The thick black curve shows the equilibrium thermodynamic value of $u(r)$, i.e. that value which minimizes the free energy $\phi(u)$. There is a first order phase transition at $r=\frac{2}{9}$, where the thermodynamic value of $u$ jumps from $u=0$ to $u=\frac{2}{3}$.
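The hysteresis loop can be reproduced by integrating $du/ds=-\phi'(u)$ while $r(s)$ is swept slowly down and then back up. This is a minimal sketch under stated assumptions: forward-Euler steps of size $ds=0.05$, a sweep rate $|dr/ds|=10^{-4}$, and a tiny positive bias $h=10^{-6}$ added to the flow as a stand-in for the "$0^+$" of the text. None of these parameter values come from the notes.

```python
import math

def dphi(u, r):
    """phi'(u) for phi(u) = r u^2/2 - u^3/3 + u^4/4."""
    return u * (r - u + u * u)

def sweep(r_start, r_stop, u, rate=1e-4, ds=0.05, h=1e-6):
    """Forward-Euler integration of du/ds = -phi'(u) + h while r moves linearly
    from r_start to r_stop at |dr/ds| = rate. Returns the (r, u) history and final u.

    h is a tiny positive bias (an assumption of this sketch) that stands in for the
    "0+" of the text; with h = 0 the decaying u underflows to exactly 0.0 and never escapes.
    """
    n = int(round(abs(r_stop - r_start) / (rate * ds)))
    dr = (r_stop - r_start) / n
    r, path = r_start, []
    for _ in range(n):
        u += ds * (-dphi(u, r) + h)
        r += dr
        path.append((r, u))
    return path, u

def u_near(path, r_target):
    """u at the recorded point whose r is closest to r_target."""
    return min(path, key=lambda p: abs(p[0] - r_target))[1]

down, u_end = sweep(0.5, -0.1, u=0.01)    # r slowly decreased, starting from u > 0
up, _ = sweep(-0.1, 0.5, u=u_end)         # r slowly increased again
u_plus = 0.5 + math.sqrt(0.25 - 0.1)      # upper stable branch u_+ at r = 0.1
print(u_near(down, 0.1), u_near(up, 0.1), u_plus)
```

At the same value $r=0.1$ the two sweeps give different states. On the way down, $u$ is still pinned near $0^+$. On the way up, it sits on the $u_+$ branch. This history dependence is the hysteresis described above.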

7.6.4 Sixth order Landau theory : tricritical point

Finally, consider a model with $Z_2$ symmetry, with the Landau free energy
$$ f = f_0 + \tfrac{1}{2}am^2 + \tfrac{1}{4}bm^4 + \tfrac{1}{6}cm^6 , \tag{7.174} $$
with $c>0$ for stability. We seek the phase diagram in the $(a,b)$ plane. Extremizing $f$ with respect to $m$, we obtain
$$ \frac{\partial f}{\partial m} = 0 = m\,\big(a + bm^2 + cm^4\big) , \tag{7.175} $$
which is a quintic with five solutions over the complex $m$ plane. One solution is obviously $m=0$. The other four are
$$ m = \pm\sqrt{-\frac{b}{2c} \pm \sqrt{\left(\frac{b}{2c}\right)^{2} - \frac{a}{c}}} . \tag{7.176} $$
For each $\pm$ symbol in the above equation, there are two options, hence four roots in all.

7.6. LANDAU THEORY OF PHASE TRANSITIONS 401

Figure 7.17: Behavior of the sextic free energy $f(m)=\frac{1}{2}am^2+\frac{1}{4}bm^4+\frac{1}{6}cm^6$. A: $a>0$ and $b>0$; B: $a<0$ and $b>0$; C: $a<0$ and $b<0$; D: $a>0$ and $b<-\frac{4}{\sqrt{3}}\sqrt{ac}$; E: $a>0$ and $-\frac{4}{\sqrt{3}}\sqrt{ac}<b<-2\sqrt{ac}$; F: $a>0$ and $-2\sqrt{ac}<b<0$. The thick dashed line is a line of second order transitions, which meets the thick solid line of first order transitions at the tricritical point, $(a,b)=(0,0)$.
If $a>0$ and $b>0$, then the four nonzero roots are all non-real and there is a unique minimum at $m=0$. For $a<0$, there are only three solutions to $f'(m)=0$ for real $m$, since the $-$ choice for the $\pm$ sign under the radical leads to imaginary roots. One of the solutions is $m=0$. The other two are
$$ m = \pm\sqrt{-\frac{b}{2c} + \sqrt{\left(\frac{b}{2c}\right)^{2} - \frac{a}{c}}} . \tag{7.177} $$

Figure 7.18: Free energy $\phi(u)=\frac{1}{2}ru^2-\frac{1}{4}u^4+\frac{1}{6}u^6$ for several different values of the control parameter $r$.


The most interesting situation is $a>0$ and $b<0$. If $a>0$ and $b<-2\sqrt{ac}$, all five roots are real. There must be three minima, separated by two local maxima. Clearly if $m^*$ is a solution, then so is $-m^*$. Thus, the only question is whether the outer minima are of lower energy than the minimum at $m=0$. We assess this by demanding $f(m^*)=f(0)$, where $m^*$ is the position of the largest root (i.e. the rightmost minimum). After dividing by $m^{*2}$, this gives a second equation, quadratic in $m^2$,
$$ 0 = \tfrac{1}{2}a + \tfrac{1}{4}bm^2 + \tfrac{1}{6}cm^4 , \tag{7.178} $$
which together with eqn. 7.175 gives (eliminating $cm^4$ between the two yields $m^2=-4a/b$, and substituting back)
$$ b = -\frac{4}{\sqrt{3}}\sqrt{ac} . \tag{7.179} $$

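Thus, we have the following, for fixed $a>0$:
$$
\begin{array}{lll}
b > -2\sqrt{ac} &: & \text{1 real root } m=0 \\
-2\sqrt{ac} > b > -\frac{4}{\sqrt{3}}\sqrt{ac} &: & \text{5 real roots; minimum at } m=0 \\
-\frac{4}{\sqrt{3}}\sqrt{ac} > b &: & \text{5 real roots; minima at } m=\pm\sqrt{-\frac{b}{2c}+\sqrt{\left(\frac{b}{2c}\right)^{2}-\frac{a}{c}}}
\end{array}
\tag{7.180}
$$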

The point (a, b) = (0, 0), which lies at the confluence of a first order line and a second order line, is known
as a tricritical point.
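The case analysis of eqn. 7.180 can be cross-checked against a brute-force minimization of $f(m)$. The sketch below is illustrative only. It takes $a=c=1$, so the boundaries sit at $b=-2$ and $b=-4/\sqrt{3}\approx-2.309$, and it assumes a uniform grid of 30001 points on $[0,3]$ for the brute-force search. The helper names are hypothetical.

```python
import math

def f(m, a, b, c):
    """Sextic Landau free energy of eqn 7.174, with f_0 = 0."""
    return a * m**2 / 2 + b * m**4 / 4 + c * m**6 / 6

def classify(a, b, c):
    """Case analysis of eqn 7.180 for a > 0, c > 0.
    Returns (number of real roots of f'(m) = 0, |m| at the global minimum)."""
    assert a > 0 and c > 0
    root_ac = math.sqrt(a * c)
    if b > -2 * root_ac:
        return 1, 0.0
    m_star = math.sqrt(-b / (2 * c) + math.sqrt((b / (2 * c))**2 - a / c))
    if b > -4 * root_ac / math.sqrt(3):
        return 5, 0.0
    return 5, m_star

def brute_force_min(a, b, c, m_max=3.0, n=30001):
    """|m| of the lowest f(m) on a uniform grid over [0, m_max]; f is even in m."""
    grid = [m_max * k / (n - 1) for k in range(n)]
    return min(grid, key=lambda m: f(m, a, b, c))

for b in (1.0, -2.2, -2.5):   # one value of b in each row of eqn 7.180
    print(b, classify(1.0, b, 1.0), round(brute_force_min(1.0, b, 1.0), 3))
```

For $b=-2.2$ there are five real extrema, but the global minimum is still at $m=0$. Only below $b=-4/\sqrt{3}$ do the outer minima win. For $b=-2.5$ they sit at $m=\pm\sqrt{2}$.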

7.6.5 Hysteresis for the sextic potential

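Once again, we consider the dissipative dynamics $\dot m=-\Gamma f'(m)$. We adimensionalize by writing
$$ m \equiv \sqrt{\frac{|b|}{c}}\cdot u , \qquad a \equiv \frac{b^2}{c}\cdot r , \qquad t \equiv \frac{c}{\Gamma b^2}\cdot s . \tag{7.181} $$
Then we obtain once again the dimensionless equation
$$ \frac{\partial u}{\partial s} = -\frac{\partial\phi}{\partial u} , \tag{7.182} $$
where
$$ \phi(u) = \tfrac{1}{2}ru^2 \pm \tfrac{1}{4}u^4 + \tfrac{1}{6}u^6 . \tag{7.183} $$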
In the above equation, the coefficient of the quartic term is positive if $b>0$ and negative if $b<0$. That is, the coefficient is $\mathrm{sgn}(b)$. When $b>0$ we can ignore the sextic term for sufficiently small $u$, and we recover the quartic free energy studied earlier. There is then a second order transition at $r=0$.
New and interesting behavior occurs for $b<0$. The fixed points of the dynamics are obtained by setting $\phi'(u)=0$. We have
$$ \begin{aligned} \phi(u) &= \tfrac{1}{2}ru^2 - \tfrac{1}{4}u^4 + \tfrac{1}{6}u^6 \\ \phi'(u) &= u\,\big(r - u^2 + u^4\big) . \end{aligned} \tag{7.184} $$

Figure 7.19: Fixed points $\phi'(u^*)=0$ for the sextic potential $\phi(u)=\frac{1}{2}ru^2-\frac{1}{4}u^4+\frac{1}{6}u^6$, and corresponding dynamical flow (arrows) under $\dot u=-\phi'(u)$. Solid curves show stable fixed points and dashed curves show unstable fixed points. The thick solid black and solid grey curves indicate the equilibrium thermodynamic values for $u$; note the overall $u\to-u$ symmetry. Within the region $r\in\big[0,\frac{1}{4}\big]$ the dynamics are irreversible and the system exhibits the phenomenon of hysteresis. There is a first order phase transition at $r=\frac{3}{16}$.

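Thus, the equation $\phi'(u)=0$ factorizes into a linear factor $u$ and a quartic factor $u^4-u^2+r$ which is quadratic in $u^2$. Thus, we can easily obtain the roots:
$$
\begin{array}{ll}
r<0 : & u^*=0 , \quad u^*=\pm\sqrt{\tfrac{1}{2}+\sqrt{\tfrac{1}{4}-r}} \\
0<r<\tfrac{1}{4} : & u^*=0 , \quad u^*=\pm\sqrt{\tfrac{1}{2}+\sqrt{\tfrac{1}{4}-r}} , \quad u^*=\pm\sqrt{\tfrac{1}{2}-\sqrt{\tfrac{1}{4}-r}} \\
r>\tfrac{1}{4} : & u^*=0 .
\end{array}
\tag{7.185}
$$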

In fig. 7.19, we plot the fixed points and the hysteresis loops for this system. At $r=\frac{1}{4}$, there are two symmetrically located saddle-node bifurcations at $u=\pm\frac{1}{\sqrt{2}}$. We find $\phi(u=\pm\frac{1}{\sqrt{2}},\,r=\frac{1}{4})=\frac{1}{48}$, which is positive, indicating that the stable fixed point $u=0$ remains the thermodynamic minimum for the free energy $\phi(u)$ as $r$ is decreased through $r=\frac{1}{4}$. Setting $\phi(u)=0$ and $\phi'(u)=0$ simultaneously, we obtain $r=\frac{3}{16}$ and $u=\pm\frac{\sqrt{3}}{2}$. The thermodynamic value for $u$ therefore jumps discontinuously from $u=0$ to $u=\pm\frac{\sqrt{3}}{2}$ (either branch) at $r=\frac{3}{16}$; this is a first order transition.
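Because $\phi$ depends on $u$ only through $w=u^2$, the saddle-node and first order values above can be verified exactly with rational arithmetic. This is a minimal sketch. Writing everything in terms of $w$ is a convenience of the sketch, not of the notes, and the helper names are illustrative.

```python
from fractions import Fraction as Fr

def phi(w, r):
    """phi of eqn 7.184 written in terms of w = u^2: r w/2 - w^2/4 + w^3/6."""
    return r * w / 2 - w**2 / 4 + w**3 / 6

def bracket(w, r):
    """phi'(u)/u = r - u^2 + u^4, in terms of w = u^2."""
    return r - w + w**2

# Saddle-node at r = 1/4, u^2 = 1/2 (double root of w^2 - w + r = 0).
print(phi(Fr(1, 2), Fr(1, 4)), bracket(Fr(1, 2), Fr(1, 4)))   # 1/48 0

# First order point: bracket = 0 gives r = w - w^2; phi = 0 gives r/2 = w/4 - w^2/6.
# Together they give w/4 = w^2/3, i.e. w = 3/4 (u = ±sqrt(3)/2).
w_c = Fr(3, 4)
r_c = w_c - w_c**2
print(r_c, phi(w_c, r_c))                                      # 3/16 0
```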
Under the dissipative dynamics considered here, the system exhibits hysteresis, as indicated in the figure, where the arrows show the evolution of $u(s)$ for very slowly varying $r(s)$. When the control parameter $r$ is large and positive, the flow is toward the sole fixed point at $u^*=0$. At $r=\frac{1}{4}$, two simultaneous saddle-node bifurcations take place at $u^*=\pm\frac{1}{\sqrt{2}}$; the outer branch is stable and the inner branch unstable in both cases. At $r=0$ there is a subcritical pitchfork bifurcation, and the fixed point at $u^*=0$ becomes unstable.
Suppose one starts off with $r\gg\frac{1}{4}$ with some value $u>0$. The flow $\dot u=-\phi'(u)$ then rapidly results in $u\to 0^+$. This is the 'high temperature phase' in which there is no magnetization. Now let $r$ decrease slowly, using $s$ as the dimensionless time variable. The scaled magnetization $u(s)=u\big(r(s)\big)$ will remain pinned at the fixed point $u^*=0^+$. As $r$ passes through $r=\frac{1}{4}$, two new stable values of $u^*$ appear, but our system remains at $u=0^+$, since $u^*=0$ is a stable fixed point. But after the subcritical pitchfork at $r=0$, $u^*=0$ becomes unstable. The magnetization $u(s)$ then flows rapidly to the stable fixed point at $u^*=1$, and follows the curve $u^*(r)=\big(\frac{1}{2}+(\frac{1}{4}-r)^{1/2}\big)^{1/2}$ for all $r<0$.
Now suppose we start increasing $r$ (i.e. increasing temperature). The magnetization follows the stable fixed point $u^*(r)=\big(\frac{1}{2}+(\frac{1}{4}-r)^{1/2}\big)^{1/2}$ past $r=0$, beyond the first order phase transition point at $r=\frac{3}{16}$, and all the way up to $r=\frac{1}{4}$, at which point this fixed point is annihilated at a saddle-node bifurcation. The flow then rapidly takes $u\to u^*=0^+$, where it remains as $r$ continues to be increased further.
Within the region $r\in\big[0,\frac{1}{4}\big]$ of control parameter space, the dynamics are said to be irreversible and the behavior of $u(s)$ is said to be hysteretic.
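A slow numerical sweep shows where the two jumps of the sextic hysteresis loop occur. This is a minimal sketch, assuming forward-Euler steps $ds=0.05$, a sweep rate $|dr/ds|=10^{-4}$, and a small bias $h=10^{-6}$ that plays the role of $0^+$; all three are choices of the sketch. A jump is detected as the first crossing of the level $u=\frac{1}{2}$.

```python
def dphi(u, r):
    """phi'(u) = u (r - u^2 + u^4) for phi(u) = r u^2/2 - u^4/4 + u^6/6 (eqn 7.184)."""
    return u * (r - u**2 + u**4)

def sweep(r_start, r_stop, u, rate=1e-4, ds=0.05, h=1e-6):
    """Forward-Euler steps of du/ds = -phi'(u) + h with r ramped linearly, |dr/ds| = rate.
    The small bias h > 0 is an assumption of this sketch, standing in for '0+'."""
    n = int(round(abs(r_stop - r_start) / (rate * ds)))
    dr = (r_stop - r_start) / n
    r, path = r_start, []
    for _ in range(n):
        u += ds * (-dphi(u, r) + h)
        r += dr
        path.append((r, u))
    return path, u

def first_crossing(path, level):
    """Value of r at the first step where u crosses `level`, or None."""
    for (_, u0), (r1, u1) in zip(path, path[1:]):
        if (u0 - level) * (u1 - level) <= 0:
            return r1
    return None

down, u_end = sweep(0.4, -0.2, u=0.01)     # r decreasing from the 'high temperature' side
up, u_final = sweep(-0.2, 0.4, u=u_end)    # r increasing again
print(first_crossing(down, 0.5), first_crossing(up, 0.5), u_final)
```

On the downward sweep the jump happens only after the subcritical pitchfork, at small negative $r$. On the upward sweep it happens just above $r=\frac{1}{4}$. In both cases the delay grows with the sweep rate. Neither jump occurs at the thermodynamic transition $r=\frac{3}{16}$.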
7.7. MEAN FIELD THEORY OF FLUCTUATIONS 405

7.7 Mean Field Theory of Fluctuations

7.7.1 Correlation and response in mean field theory

Consider the Ising model,
$$ \hat H = -\tfrac{1}{2}\sum_{i,j} J_{ij}\,\sigma_i\sigma_j - \sum_k H_k\,\sigma_k , \tag{7.186} $$

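where the local magnetic field on site $k$ is now $H_k$. We assume without loss of generality that the diagonal terms vanish: $J_{ii}=0$. Now consider the partition function $Z=\mathrm{Tr}\,e^{-\beta\hat H}$ as a function of the temperature $T$ and the local field values $\{H_i\}$. We have
$$ \begin{aligned} \frac{\partial Z}{\partial H_i} &= \beta\,\mathrm{Tr}\big[\sigma_i\,e^{-\beta\hat H}\big] = \beta Z\cdot\langle\sigma_i\rangle \\ \frac{\partial^2 Z}{\partial H_i\,\partial H_j} &= \beta^2\,\mathrm{Tr}\big[\sigma_i\sigma_j\,e^{-\beta\hat H}\big] = \beta^2 Z\cdot\langle\sigma_i\sigma_j\rangle . \end{aligned} \tag{7.187} $$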

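Thus,
$$ \begin{aligned} m_i &= -\frac{\partial F}{\partial H_i} = \langle\sigma_i\rangle \\ \chi_{ij} &= \frac{\partial m_i}{\partial H_j} = -\frac{\partial^2 F}{\partial H_i\,\partial H_j} = \frac{1}{k_B T}\Big\{\langle\sigma_i\sigma_j\rangle - \langle\sigma_i\rangle\langle\sigma_j\rangle\Big\} . \end{aligned} \tag{7.188} $$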

Expressions such as $\langle\sigma_i\rangle$, $\langle\sigma_i\sigma_j\rangle$, etc. are in general called correlation functions. For example, we define the spin-spin correlation function $C_{ij}$ as
$$ C_{ij} \equiv \langle\sigma_i\sigma_j\rangle - \langle\sigma_i\rangle\langle\sigma_j\rangle . \tag{7.189} $$
Expressions such as $\frac{\partial F}{\partial H_i}$ and $\frac{\partial^2 F}{\partial H_i\,\partial H_j}$ are called response functions. The above relation between correlation functions and response functions, $C_{ij}=k_B T\,\chi_{ij}$, is valid only for the equilibrium distribution. In particular, this relationship is invalid if one uses an approximate distribution, such as the variational density matrix formalism of mean field theory.
The question then arises: within mean field theory, which is more accurate, correlation functions or response functions? A simple argument suggests that the response functions are more accurate representations of the real physics. To see this, let's write the variational density matrix $\varrho_{\rm var}$ as the sum of the exact equilibrium (Boltzmann) distribution $\varrho_{\rm eq}=Z^{-1}\exp(-\beta\hat H)$ plus a deviation $\delta\varrho$:
$$ \varrho_{\rm var} = \varrho_{\rm eq} + \delta\varrho . \tag{7.190} $$
Then if we calculate a correlator using the variational distribution, we have
$$ \begin{aligned} \langle\sigma_i\sigma_j\rangle_{\rm var} &= \mathrm{Tr}\big[\varrho_{\rm var}\,\sigma_i\sigma_j\big] \\ &= \mathrm{Tr}\big[\varrho_{\rm eq}\,\sigma_i\sigma_j\big] + \mathrm{Tr}\big[\delta\varrho\,\sigma_i\sigma_j\big] . \end{aligned} \tag{7.191} $$

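Thus, the correlator computed from the variational density matrix is in error at first order in $\delta\varrho$. On the other hand, the free energy is given by
$$ F^{\rm var} = F^{\rm eq} + \sum_\sigma \frac{\partial F}{\partial\varrho_\sigma}\bigg|_{\varrho_{\rm eq}}\delta\varrho_\sigma + \frac{1}{2}\sum_{\sigma,\sigma'}\frac{\partial^2 F}{\partial\varrho_\sigma\,\partial\varrho_{\sigma'}}\bigg|_{\varrho_{\rm eq}}\delta\varrho_\sigma\,\delta\varrho_{\sigma'} + \ldots . \tag{7.192} $$
Here $\sigma$ denotes a state of the system, i.e. $|\,\sigma\,\rangle=|\,\sigma_1,\ldots,\sigma_N\,\rangle$, where every spin polarization is specified. Since the free energy is an extremum (and in fact an absolute minimum) with respect to the distribution, subject to the normalization constraint $\sum_\sigma\delta\varrho_\sigma=0$, the second term on the RHS vanishes. This means that the error in the free energy is only of second order in the deviation $\delta\varrho$.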
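A toy version of this argument can be run for a single Ising spin in a field $H$ (with $k_B=1$), where the exact distribution is known. This is an illustrative sketch only. We perturb the Boltzmann probability $p_{\rm eq}$ of $\sigma=+1$ by $\delta$ and compare two errors: the error in $\langle\sigma\rangle$, the analogue of the correlator in eqn. 7.191, and the error in $F=E-TS$. The values $H=0.3$, $T=1$ are arbitrary choices.

```python
import math

def free_energy(p, H, T):
    """Variational F = E - T S for a single Ising spin in a field H, with P(+1) = p (k_B = 1)."""
    energy = -H * (2 * p - 1)
    entropy = -(p * math.log(p) + (1 - p) * math.log(1 - p))
    return energy - T * entropy

H, T = 0.3, 1.0
p_eq = 1 / (1 + math.exp(-2 * H / T))        # Boltzmann probability of sigma = +1
F_eq = free_energy(p_eq, H, T)               # equals -T ln(2 cosh(H/T))
for delta in (1e-2, 5e-3, 2.5e-3):
    p = p_eq + delta                         # deliberately perturbed "variational" distribution
    sigma_error = 2 * delta                  # <sigma>_var - <sigma>_eq: first order in delta
    F_error = free_energy(p, H, T) - F_eq    # F_var - F_eq: second order in delta
    print(delta, sigma_error, F_error)
```

Halving $\delta$ halves the error in $\langle\sigma\rangle$ but cuts the (always positive) free energy error by about a factor of four. This is the quadratic behavior implied by eqn. 7.192.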

7.7.2 Calculation of the response functions

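Consider the variational density matrix
$$ \varrho(\sigma) = \prod_i \varrho_i(\sigma_i) , \tag{7.193} $$
where
$$ \varrho_i(\sigma_i) = \left(\frac{1+m_i}{2}\right)\delta_{\sigma_i,1} + \left(\frac{1-m_i}{2}\right)\delta_{\sigma_i,-1} . \tag{7.194} $$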
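The variational energy $E=\mathrm{Tr}\,(\varrho\,\hat H)$ is
$$ E = -\tfrac{1}{2}\sum_{ij} J_{ij}\,m_i m_j - \sum_i H_i\,m_i \tag{7.195} $$
and the entropy $S=-k_B\,\mathrm{Tr}\,(\varrho\ln\varrho)$ is
$$ S = -k_B\sum_i\left\{\left(\frac{1+m_i}{2}\right)\ln\left(\frac{1+m_i}{2}\right) + \left(\frac{1-m_i}{2}\right)\ln\left(\frac{1-m_i}{2}\right)\right\} . \tag{7.196} $$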
Setting the variation $\frac{\partial F}{\partial m_i}=0$, with $F=E-TS$, we obtain the mean field equations,
$$ m_i = \tanh\big(\beta J_{ij}\,m_j + \beta H_i\big) , \tag{7.197} $$
where we use the summation convention: $J_{ij}\,m_j\equiv\sum_j J_{ij}\,m_j$. Suppose $T>T_c$ and $m_i$ is small. Then we can expand the RHS of the above mean field equations, obtaining
$$ \big(\delta_{ij} - \beta J_{ij}\big)\,m_j = \beta H_i . \tag{7.198} $$
Thus, the susceptibility tensor $\chi$ is the inverse of the matrix $(k_B T\cdot\mathbb{I}-J)$:
$$ \chi_{ij} = \frac{\partial m_i}{\partial H_j} = \big(k_B T\cdot\mathbb{I} - J\big)^{-1}_{ij} , \tag{7.199} $$
where $\mathbb{I}$ is the identity. Note also that so-called connected averages of the kind in eqn. 7.189 vanish identically for $i\neq j$ if we compute them using our variational density matrix, since all the sites are independent, hence
$$ \langle\sigma_i\sigma_j\rangle = \mathrm{Tr}\big(\varrho_{\rm var}\,\sigma_i\sigma_j\big) = \mathrm{Tr}\big(\varrho_i\,\sigma_i\big)\cdot\mathrm{Tr}\big(\varrho_j\,\sigma_j\big) = \langle\sigma_i\rangle\cdot\langle\sigma_j\rangle \qquad (i\neq j) , \tag{7.200} $$
and therefore $\chi_{ij}=0$ for $i\neq j$ if we compute the correlation functions themselves from the variational density matrix, rather than from the free energy $F$. As we have argued above, the latter approximation is more accurate.
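Eqn. 7.199 can be checked by solving the full nonlinear mean field equations 7.197 in a weak field and comparing with the matrix inverse. This is a minimal NumPy sketch under illustrative assumptions. We take $k_B=1$, a 16-site periodic ring with nearest-neighbour coupling $J=1$ (so the mean field $k_B T_c=\hat J(0)=2J$), a temperature $k_B T=3$, and random local fields of size $10^{-4}$. The function names are hypothetical.

```python
import numpy as np

def ring_couplings(n, J=1.0):
    """J_ij for nearest neighbours on a periodic ring of n sites (J_ii = 0)."""
    Jmat = np.zeros((n, n))
    for i in range(n):
        Jmat[i, (i + 1) % n] = Jmat[(i + 1) % n, i] = J
    return Jmat

def solve_mean_field(Jmat, H, kT, n_iter=500):
    """Fixed-point iteration of m_i = tanh(beta (sum_j J_ij m_j + H_i)), eqn 7.197.
    For kT above the largest eigenvalue of J this map is a contraction."""
    m = np.zeros_like(H)
    for _ in range(n_iter):
        m = np.tanh((Jmat @ m + H) / kT)
    return m

n, kT = 16, 3.0
Jmat = ring_couplings(n)
H = 1e-4 * np.random.default_rng(0).standard_normal(n)   # weak random local fields
chi = np.linalg.inv(kT * np.eye(n) - Jmat)                # eqn 7.199
m = solve_mean_field(Jmat, H, kT)
print(np.max(np.abs(m - chi @ H)))    # tiny: the nonlinear corrections are O(H^3)
```

Because the ring is translation invariant, the diagonal element $\chi_{ii}$ also equals the Brillouin-zone average of $1/(k_B T-2J\cos q)$. This anticipates the Fourier-space form given next.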
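Assuming $J_{ij}=J(\boldsymbol{R}_i-\boldsymbol{R}_j)$, where $\boldsymbol{R}_i$ is a Bravais lattice site, we can Fourier transform the above equation, resulting in
$$ \hat m(\boldsymbol{q}) = \frac{\hat H(\boldsymbol{q})}{k_B T - \hat J(\boldsymbol{q})} \equiv \hat\chi(\boldsymbol{q})\,\hat H(\boldsymbol{q}) . \tag{7.201} $$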

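Once again, our definition of the lattice Fourier transform of a function $\varphi(\boldsymbol{R})$ is
$$ \begin{aligned} \hat\varphi(\boldsymbol{q}) &\equiv \sum_{\boldsymbol{R}} \varphi(\boldsymbol{R})\,e^{-i\boldsymbol{q}\cdot\boldsymbol{R}} \\ \varphi(\boldsymbol{R}) &= \Omega\int_{\hat\Omega}\!\frac{d^dq}{(2\pi)^d}\,\hat\varphi(\boldsymbol{q})\,e^{i\boldsymbol{q}\cdot\boldsymbol{R}} , \end{aligned} \tag{7.202} $$
where $\Omega$ is the unit cell in real space, called the Wigner-Seitz cell, and $\hat\Omega$ is the first Brillouin zone, which is the unit cell in reciprocal space. Similarly, we have
$$ \begin{aligned} \hat J(\boldsymbol{q}) &= \sum_{\boldsymbol{R}} J(\boldsymbol{R})\Big(1 - i\boldsymbol{q}\cdot\boldsymbol{R} - \tfrac{1}{2}(\boldsymbol{q}\cdot\boldsymbol{R})^2 + \ldots\Big) \\ &= \hat J(0)\cdot\Big\{1 - q^2 R_*^2 + \mathcal{O}(q^4)\Big\} , \end{aligned} \tag{7.203} $$
where
$$ R_*^2 = \frac{\sum_{\boldsymbol{R}} R^2\,J(\boldsymbol{R})}{2d\,\sum_{\boldsymbol{R}} J(\boldsymbol{R})} . \tag{7.204} $$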
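Here we have assumed inversion symmetry for the lattice, which eliminates the term linear in $\boldsymbol{q}$, as well as (hyper)cubic symmetry, in which case
$$ \sum_{\boldsymbol{R}} R^\mu R^\nu\,J(\boldsymbol{R}) = \frac{1}{d}\cdot\delta^{\mu\nu}\sum_{\boldsymbol{R}} R^2\,J(\boldsymbol{R}) . \tag{7.205} $$
On cubic lattices with nearest neighbor interactions only, one has $R_*=a/\sqrt{2d}$, where $a$ is the lattice constant and $d$ is the dimension of space.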
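The value $R_*^2=a^2/2d$ follows either from eqn. 7.204 directly or from the small-$q$ expansion in eqn. 7.203. The NumPy sketch below evaluates both for a $d=3$ hypercubic lattice with nearest-neighbour coupling. The test wavevector and the helper name `J_hat` are illustrative choices.

```python
import numpy as np

def J_hat(q, J=1.0, a=1.0):
    """J(q) for nearest-neighbour coupling J on a d-dimensional hypercubic lattice:
    the sum over delta = ±a e_mu of J exp(-i q.delta) equals 2 J sum_mu cos(q_mu a)."""
    return 2 * J * np.sum(np.cos(np.asarray(q) * a))

d, a = 3, 1.0
# Route 1: eqn 7.204 with J(R) = J on the 2d nearest-neighbour vectors.
deltas = [s * a * np.eye(d)[mu] for mu in range(d) for s in (1, -1)]
R_star_sq_direct = sum(dl @ dl for dl in deltas) / (2 * d * len(deltas))
# Route 2: read R_*^2 off the small-q behaviour J(q) = J(0) (1 - q^2 R_*^2 + ...), eqn 7.203.
q = 1e-3 * np.array([1.0, 0.4, -0.7])           # a small wavevector
R_star_sq = (1 - J_hat(q, a=a) / J_hat(np.zeros(d), a=a)) / (q @ q)
print(R_star_sq_direct, R_star_sq, a**2 / (2 * d))
```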
Thus, with the identification $k_B T_c=\hat J(0)$, we have
$$ \begin{aligned} \hat\chi(\boldsymbol{q}) &= \frac{1}{k_B(T-T_c) + k_B T_c R_*^2\,q^2 + \mathcal{O}(q^4)} \\ &= \frac{1}{k_B T_c R_*^2}\cdot\frac{1}{\xi^{-2} + q^2 + \mathcal{O}(q^4)} , \end{aligned} \tag{7.206} $$
where
$$ \xi = R_*\cdot\left(\frac{T-T_c}{T_c}\right)^{-1/2} \tag{7.207} $$
is the correlation length. With the definition
$$ \xi(T) \propto |T-T_c|^{-\nu} \tag{7.208} $$
as $T\to T_c$, we obtain the mean field correlation length exponent $\nu=\frac{1}{2}$. The exact result for the two-dimensional Ising model is $\nu=1$, whereas $\nu\approx 0.6$ for the $d=3$ Ising model. Note that $\hat\chi(\boldsymbol{q}=0,T)$ diverges as $(T-T_c)^{-1}$ for $T>T_c$.
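In real space, we have
$$ m_i = \sum_j \chi_{ij}\,H_j , \tag{7.209} $$
where
$$ \chi_{ij} = \Omega\int_{\hat\Omega}\!\frac{d^dq}{(2\pi)^d}\,\hat\chi(\boldsymbol{q})\,e^{i\boldsymbol{q}\cdot(\boldsymbol{R}_i-\boldsymbol{R}_j)} . \tag{7.210} $$
Note that $\hat\chi(\boldsymbol{q})$ is properly periodic under $\boldsymbol{q}\to\boldsymbol{q}+\boldsymbol{G}$, where $\boldsymbol{G}$ is a reciprocal lattice vector, which satisfies $e^{i\boldsymbol{G}\cdot\boldsymbol{R}}=1$ for any direct Bravais lattice vector $\boldsymbol{R}$. Indeed, we have
$$ \begin{aligned} \hat\chi^{-1}(\boldsymbol{q}) &= k_B T - \hat J(\boldsymbol{q}) \\ &= k_B T - J\sum_{\boldsymbol{\delta}} e^{i\boldsymbol{q}\cdot\boldsymbol{\delta}} , \end{aligned} \tag{7.211} $$
where $\boldsymbol{\delta}$ is a nearest neighbor separation vector, and where in the second line we have assumed nearest neighbor interactions only. On cubic lattices in $d$ dimensions, there are $2d$ nearest neighbor separation vectors, $\boldsymbol{\delta}=\pm a\,\hat{\boldsymbol{e}}_\mu$, where $\mu\in\{1,\ldots,d\}$. The real space susceptibility is then
$$ \chi(\boldsymbol{R}) = \int_{-\pi}^{\pi}\!\frac{d\theta_1}{2\pi}\cdots\int_{-\pi}^{\pi}\!\frac{d\theta_d}{2\pi}\,\frac{e^{in_1\theta_1}\cdots e^{in_d\theta_d}}{k_B T - (2J\cos\theta_1 + \ldots + 2J\cos\theta_d)} , \tag{7.212} $$
where $\boldsymbol{R}=a\sum_{\mu=1}^d n_\mu\,\hat{\boldsymbol{e}}_\mu$ is a general direct lattice vector for the cubic Bravais lattice in $d$ dimensions, and the $\{n_\mu\}$ are integers.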
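In $d=1$ the integral of eqn. 7.212 is easy to evaluate numerically. Its decay length can then be compared with the mean field correlation length of eqn. 7.207, using $R_*=a/\sqrt{2}$. This is a minimal sketch, assuming $a=1$, $J=1$, $k_B T$ measured in units of $J$, and an $M=4096$-point rectangle rule, which converges exponentially for this periodic analytic integrand.

```python
import numpy as np

def chi_1d(n, kT, J=1.0, M=4096):
    """Eqn 7.212 for d = 1 (a = 1), evaluated with the M-point periodic rectangle rule."""
    theta = -np.pi + 2 * np.pi * np.arange(M) / M
    return np.mean(np.exp(1j * n * theta) / (kT - 2 * J * np.cos(theta))).real

kTc = 2.0                        # mean field: k_B T_c = J(0) = 2J for the chain
delta = 1e-3                     # reduced temperature (T - T_c)/T_c
kT = kTc * (1 + delta)
xi_lattice = -1 / np.log(chi_1d(11, kT) / chi_1d(10, kT))   # decay length of chi(R)
xi_mf = (1 / np.sqrt(2)) * delta**-0.5                       # eqn 7.207 with R_* = a/sqrt(2)
print(xi_lattice, xi_mf)
```

On the lattice the decay is exactly geometric, $\chi(n)\propto e^{-|n|/\xi}$ with $\xi^{-1}=\cosh^{-1}(k_B T/2J)$. Close to $T_c$ this reduces to the continuum result $\xi=R_*\,\big((T-T_c)/T_c\big)^{-1/2}$.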
The long distance behavior was discussed in chapter 6 (see §6.5.9 on Ornstein-Zernike theory¹⁸). For convenience we reiterate those results:

• In $d=1$,
$$ \chi_{d=1}(x) = \left(\frac{\xi}{2k_B T_c R_*^2}\right)e^{-|x|/\xi} . \tag{7.213} $$

• In $d>1$, with $r\to\infty$ and $\xi$ fixed,
$$ \chi^{\rm OZ}_d(r) \simeq C_d\cdot\frac{\xi^{(3-d)/2}}{k_B T R_*^2}\cdot\frac{e^{-r/\xi}}{r^{(d-1)/2}}\cdot\left\{1 + \mathcal{O}\!\left(\frac{d-3}{r/\xi}\right)\right\} , \tag{7.214} $$
where the $C_d$ are dimensionless constants.

• In $d>2$, with $\xi\to\infty$ and $r$ fixed (i.e. $T\to T_c$ at fixed separation $r$),
$$ \chi_d(r) \simeq \frac{C'_d}{k_B T R_*^2}\cdot\frac{e^{-r/\xi}}{r^{d-2}}\cdot\left\{1 + \mathcal{O}\!\left(\frac{d-3}{r/\xi}\right)\right\} . \tag{7.215} $$
In $d=2$ dimensions we obtain
$$ \chi_{d=2}(r) \simeq \frac{C'_2}{k_B T R_*^2}\cdot\ln\!\left(\frac{r}{\xi}\right)e^{-r/\xi}\cdot\left\{1 + \mathcal{O}\!\left(\frac{1}{\ln(r/\xi)}\right)\right\} , \tag{7.216} $$
where the $C'_d$ are dimensionless constants.

¹⁸ There is a sign difference between the particle susceptibility defined in chapter 6 and the spin susceptibility defined here. The origin of the difference is that the single particle potential $v$ as defined was repulsive for $v>0$, meaning the local density response $\delta n$ should be negative, while in the current discussion a positive magnetic field $H$ prefers $m>0$.

7.7.3 Beyond the Ising model

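Consider a general spin model, and a variational density matrix $\varrho_{\rm var}$ which is a product of single site density matrices:
$$ \varrho_{\rm var}\big(\{\boldsymbol{S}_i\}\big) = \prod_i \varrho^{(i)}_1(\boldsymbol{S}_i) , \tag{7.217} $$
where $\mathrm{Tr}\,(\varrho_{\rm var}\,\boldsymbol{S}_i)=\boldsymbol{m}_i$ is the local magnetization and $\boldsymbol{S}_i$, which may be a scalar (e.g., $\sigma_i$ in the Ising model previously discussed), is the local spin operator. Note that $\varrho^{(i)}_1(\boldsymbol{S}_i)$ depends parametrically on the variational parameter(s) $\boldsymbol{m}_i$. Let the Hamiltonian be
$$ \hat H = -\tfrac{1}{2}\sum_{i,j} J^{\mu\nu}_{ij}\,S^\mu_i S^\nu_j + \sum_i h(\boldsymbol{S}_i) - \sum_i \boldsymbol{H}_i\cdot\boldsymbol{S}_i . \tag{7.218} $$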

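The variational free energy is then
$$ F_{\rm var} = -\tfrac{1}{2}\sum_{i,j} J^{\mu\nu}_{ij}\,m^\mu_i m^\nu_j + \sum_i \phi(\boldsymbol{m}_i,T) - \sum_i H^\mu_i\,m^\mu_i , \tag{7.219} $$
where the single site free energy $\phi(\boldsymbol{m}_i,T)$ in the absence of an external field is given by
$$ \phi(\boldsymbol{m}_i,T) = \mathrm{Tr}\Big[\varrho^{(i)}_1(\boldsymbol{S})\,h(\boldsymbol{S})\Big] + k_B T\,\mathrm{Tr}\Big[\varrho^{(i)}_1(\boldsymbol{S})\ln\varrho^{(i)}_1(\boldsymbol{S})\Big] . \tag{7.220} $$
We then have
$$ \frac{\partial F_{\rm var}}{\partial m^\mu_i} = -\sum_j J^{\mu\nu}_{ij}\,m^\nu_j - H^\mu_i + \frac{\partial\phi(\boldsymbol{m}_i,T)}{\partial m^\mu_i} . \tag{7.221} $$
For the noninteracting system, we have $J^{\mu\nu}_{ij}=0$, and the weak field response must be linear. In this limit we may write $m^\mu_i=\chi^0_{\mu\nu}(T)\,H^\nu_i+\mathcal{O}(H_i^3)$, and we conclude
$$ \frac{\partial\phi(\boldsymbol{m}_i,T)}{\partial m^\mu_i} = \big[\chi^0(T)^{-1}\big]_{\mu\nu}\,m^\nu_i + \mathcal{O}(m_i^3) . \tag{7.222} $$
Note that this entails the following expansion for the single site free energy in zero field:
$$ \phi(\boldsymbol{m}_i,T) = \phi(0,T) + \tfrac{1}{2}\big[\chi^0(T)^{-1}\big]_{\mu\nu}\,m^\mu_i m^\nu_i + \mathcal{O}(m^4) . \tag{7.223} $$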

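Finally, we restore the interaction term and extremize $F_{\rm var}$ by setting $\partial F_{\rm var}/\partial m^\mu_i=0$. To linear order, then,
$$ m^\mu_i = \chi^0_{\mu\nu}(T)\Big(H^\nu_i + \sum_j J^{\nu\lambda}_{ij}\,m^\lambda_j\Big) . \tag{7.224} $$
Typically the local susceptibility is a scalar in the internal spin space, i.e. $\chi^0_{\mu\nu}(T)=\chi^0(T)\,\delta_{\mu\nu}$, in which case we obtain
$$ \Big(\delta_{\mu\nu}\,\delta_{ij} - \chi^0(T)\,J^{\mu\nu}_{ij}\Big)\,m^\nu_j = \chi^0(T)\,H^\mu_i . \tag{7.225} $$
In Fourier space, then,
$$ \hat\chi_{\mu\nu}(\boldsymbol{q},T) = \chi^0(T)\Big[\big(1 - \chi^0(T)\,\hat{\mathbf{J}}(\boldsymbol{q})\big)^{-1}\Big]_{\mu\nu} , \tag{7.226} $$
where $\hat{\mathbf{J}}(\boldsymbol{q})$ is the matrix whose elements are $\hat J^{\mu\nu}(\boldsymbol{q})$. If $\hat J^{\mu\nu}(\boldsymbol{q})=\hat J(\boldsymbol{q})\,\delta_{\mu\nu}$, then the susceptibility is isotropic in spin space, with
$$ \hat\chi(\boldsymbol{q},T) = \frac{1}{\chi^0(T)^{-1} - \hat J(\boldsymbol{q})} . \tag{7.227} $$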

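Consider now the following illustrative examples:

(i) Quantum spin $S$ with $h(\boldsymbol{S})=0$ : We take the $\hat{\boldsymbol{z}}$ axis to be that of the local external magnetic field, i.e. $\hat{\boldsymbol{H}}_i$. Write $\varrho_1(\boldsymbol{S})=z^{-1}\exp(uS^z/k_B T)$, where $u=u(m,T)$ is obtained implicitly from the relation $m(u,T)=\mathrm{Tr}\,(\varrho_1 S^z)$. The normalization constant is
$$ z = \mathrm{Tr}\,e^{uS^z/k_B T} = \sum_{j=-S}^{S} e^{ju/k_B T} = \frac{\sinh\big[(S+\frac{1}{2})\,u/k_B T\big]}{\sinh\big[u/2k_B T\big]} . \tag{7.228} $$
The relation between $m$, $u$, and $T$ is then given by
$$ \begin{aligned} m = \langle S^z\rangle = k_B T\,\frac{\partial\ln z}{\partial u} &= \big(S+\tfrac{1}{2}\big)\,\mathrm{ctnh}\big[(S+\tfrac{1}{2})\,u/k_B T\big] - \tfrac{1}{2}\,\mathrm{ctnh}\big[u/2k_B T\big] \\ &= \frac{S(S+1)}{3k_B T}\,u + \mathcal{O}(u^3) . \end{aligned} \tag{7.229} $$
The field-free single-site free energy is then
$$ \phi(m,T) = k_B T\,\mathrm{Tr}\,\varrho_1\ln\varrho_1 = um - k_B T\ln z , \tag{7.230} $$
whence
$$ \frac{\partial\phi}{\partial m} = u + m\,\frac{\partial u}{\partial m} - k_B T\,\frac{\partial\ln z}{\partial u}\,\frac{\partial u}{\partial m} = u \equiv \chi_0^{-1}(T)\,m + \mathcal{O}(m^3) , \tag{7.231} $$
and we thereby obtain the result
$$ \chi_0(T) = \frac{S(S+1)}{3k_B T} , \tag{7.232} $$
which is the Curie susceptibility.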

(ii) Classical spin $\boldsymbol{S}=S\hat{\boldsymbol{n}}$ with $h=0$ and $\hat{\boldsymbol{n}}$ an $N$-component unit vector : We take the single site density matrix to be $\varrho_1(\boldsymbol{S})=z^{-1}\exp(\boldsymbol{u}\cdot\boldsymbol{S}/k_B T)$. The single site field-free partition function is then
$$ z = \int\!\frac{d\hat{\boldsymbol{n}}}{\Omega_N}\,\exp(\boldsymbol{u}\cdot\boldsymbol{S}/k_B T) = 1 + \frac{S^2 u^2}{2N(k_B T)^2} + \mathcal{O}(u^4) \tag{7.233} $$
and therefore
$$ \boldsymbol{m} = k_B T\,\frac{\partial\ln z}{\partial\boldsymbol{u}} = \frac{S^2\,\boldsymbol{u}}{N k_B T} + \mathcal{O}(u^3) , \tag{7.234} $$
from which we read off $\chi_0(T)=S^2/Nk_B T$. Note that this agrees in the classical ($S\to\infty$) limit, for $N=3$, with our previous result.

(iii) Quantum spin $S$ with $h(\boldsymbol{S})=\Delta\,(S^z)^2$ : This corresponds to so-called easy plane anisotropy, meaning that the single site energy $h(\boldsymbol{S})$ is minimized when the local spin vector $\boldsymbol{S}$ lies in the $(x,y)$ plane. As in example (i), we write $\varrho_1(\boldsymbol{S})=z^{-1}\exp(uS^z/k_B T)$, yielding the same expression for $z$ and the same relation between $m$ and $u$. What is different is that we must evaluate the local energy,
$$ \begin{aligned} e(u,T) = \mathrm{Tr}\,\big[\varrho_1\,h(\boldsymbol{S})\big] &= \Delta\,(k_B T)^2\,\frac{\partial^2\ln z}{\partial u^2} \\ &= \frac{\Delta}{4}\left[\frac{1}{\sinh^2\big(u/2k_B T\big)} - \frac{(2S+1)^2}{\sinh^2\big((2S+1)u/2k_B T\big)}\right] = \frac{S(S+1)\,\Delta\,u^2}{6(k_B T)^2} + \mathcal{O}(u^4) . \end{aligned} \tag{7.235} $$
We now have $\phi=e+um-k_B T\ln z$, from which we obtain the susceptibility
$$ \chi_0(T) = \frac{S(S+1)}{3(k_B T+\Delta)} . \tag{7.236} $$
Note that the local susceptibility no longer diverges as $T\to 0$, because there is always a gap in the spectrum of $h(\boldsymbol{S})$.
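Examples (i) and (ii) can be checked numerically. This is a minimal NumPy sketch with $k_B T=1$. It compares the closed form of eqn. 7.229 (with $\mathrm{ctnh}=1/\tanh$) against a direct sum over $j=-S,\ldots,S$, and it compares the small-$u$ slope $dm/du$ with the Curie result, eqn. 7.232. For the classical case it uses the fact that, for $N=3$, the sphere average of $e^{y\cos\theta}$ is $\sinh(y)/y$, with $y=Su/k_B T$. The coefficient of $y^2$ there is $\frac{1}{6}=\frac{1}{2N}$, which is the coefficient needed to reproduce eqn. 7.234. The test values of $u$ and $S$ are arbitrary.

```python
import numpy as np

def m_direct(u, S, kT=1.0):
    """<S^z> for rho_1 ∝ exp(u S^z / k_B T), summing over j = -S, ..., S."""
    j = np.arange(-S, S + 1)          # valid for integer and half-integer S
    w = np.exp(j * u / kT)
    return np.sum(j * w) / np.sum(w)

def m_eqn_7229(u, S, kT=1.0):
    """Closed form of eqn 7.229, with ctnh(x) = 1/tanh(x)."""
    x = u / kT
    return (S + 0.5) / np.tanh((S + 0.5) * x) - 0.5 / np.tanh(0.5 * x)

for S in (0.5, 1, 2.5):
    slope = m_direct(1e-4, S) / 1e-4              # dm/du at small u, with k_B T = 1
    print(S, m_direct(0.7, S), m_eqn_7229(0.7, S), slope, S * (S + 1) / 3)

# Classical N = 3 unit vector: averaging exp(y cos(theta)) over the sphere gives
# z = sinh(y)/y, where y = S u / k_B T; its y^2 coefficient should be 1/(2N) = 1/6.
y = 1e-3
print((np.sinh(y) / y - 1) / y**2)
```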

7.8 Global Symmetries

7.8.1 Symmetries and symmetry groups

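Interacting systems can be broadly classified according to their global symmetry group. Consider the following five examples:
$$
\begin{array}{ll}
\hat H_{\rm Ising} = -\sum_{i<j} J_{ij}\,\sigma_i\,\sigma_j & \sigma_i\in\{-1,+1\} \\
\hat H_{p\text{-clock}} = -\sum_{i<j} J_{ij}\cos\!\Big(\frac{2\pi(n_i-n_j)}{p}\Big) & n_i\in\{1,2,\ldots,p\} \\
\hat H_{q\text{-Potts}} = -\sum_{i<j} J_{ij}\,\delta_{\sigma_i,\sigma_j} & \sigma_i\in\{1,2,\ldots,q\} \\
\hat H_{\rm XY} = -\sum_{i<j} J_{ij}\cos(\varphi_i-\varphi_j) & \varphi_i\in\big[0,2\pi\big) \\
\hat H_{O(n)} = -\sum_{i<j} J_{ij}\,\hat\Omega_i\cdot\hat\Omega_j & \hat\Omega_i\in S^{n-1} .
\end{array}
\tag{7.237}
$$
The Ising Hamiltonian is left invariant by the global symmetry group $Z_2$, which has two elements, $I$ and $\eta$, with
$$ \eta\,\sigma_i = -\sigma_i . \tag{7.238} $$
$I$ is the identity, and $\eta^2=I$. By simultaneously reversing all the spins $\sigma_i\to-\sigma_i$, the interactions remain invariant.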
The degrees of freedom of the $p$-state clock model are integer variables $n_i$ each of which ranges from 1 to $p$. The Hamiltonian is invariant under the discrete group $Z_p$, whose $p$ elements are generated by the single operation $\eta$, where
$$ \eta\,n_i = \begin{cases} n_i+1 & \text{if } n_i\in\{1,2,\ldots,p-1\} \\ 1 & \text{if } n_i=p . \end{cases} \tag{7.239} $$
Think of a clock with one hand and $p$ 'hour' markings consecutively spaced by an angle $2\pi/p$. On each site $i$, a hand points to one of the $p$ hour marks; this determines $n_i$. The operation $\eta$ simply advances all the hours by one tick, with hour $p$ advancing to hour 1, just as 23:00 military time is followed one hour later by 00:00. The interaction $\cos\big(2\pi(n_i-n_j)/p\big)$ is invariant under such an operation. The $p$ elements of the group $Z_p$ are then
$$ I\,,\ \eta\,,\ \eta^2\,,\ \ldots\,,\ \eta^{p-1} . \tag{7.240} $$

We've already met up with the $q$-state Potts model, where each site supports a 'spin' $\sigma_i$ which can be in any of $q$ possible states, which we may label by integers $\{1,\ldots,q\}$. The energy of two interacting sites $i$ and $j$ is $-J_{ij}$ if $\sigma_i=\sigma_j$ and zero otherwise. This energy function is invariant under global operations of the symmetric group on $q$ characters, $S_q$, which is the group of permutations of the sequence $\{1,2,3,\ldots,q\}$. The group $S_q$ has $q!$ elements. Note the difference between a $Z_q$ symmetry and an $S_q$ symmetry. In the former case, the Hamiltonian is invariant only under the $q$-element cyclic permutations, e.g.
$$ \eta \equiv \begin{pmatrix} 1 & 2 & \cdots & q-1 & q \\ 2 & 3 & \cdots & q & 1 \end{pmatrix} $$
and its powers $\eta^l$ with $l=0,\ldots,q-1$.

7.8. GLOBAL SYMMETRIES 413

Figure 7.20: A domain wall in a one-dimensional Ising model.
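The difference between $S_q$ and $Z_q$ invariance can be seen by brute force on a small ring. This sketch is illustrative, assuming a 5-site periodic ring, $J=1$, and all $4^5$ configurations. We take $q=p=4$ because for $q=3$ the clock interaction takes only the values $1$ and $-\frac{1}{2}$, so the 3-state clock and Potts models coincide up to constants. The code checks that the Potts energy is unchanged under every relabelling in $S_4$. It also checks that the clock energy is unchanged under the cyclic shift $\eta$ but not under a transposition.

```python
import itertools
import math

def potts_energy(spins, J=1.0):
    """q-state Potts energy on a periodic ring: -J sum_i delta(s_i, s_{i+1})."""
    n = len(spins)
    return -J * sum(spins[i] == spins[(i + 1) % n] for i in range(n))

def clock_energy(spins, p, J=1.0):
    """p-state clock energy on a periodic ring: -J sum_i cos(2 pi (n_i - n_{i+1}) / p)."""
    n = len(spins)
    return -J * sum(math.cos(2 * math.pi * (spins[i] - spins[(i + 1) % n]) / p) for i in range(n))

def invariant(energy, perm, configs):
    """True if relabelling every spin s -> perm[s] leaves the energy of every config unchanged."""
    return all(math.isclose(energy(c), energy([perm[s] for s in c]), abs_tol=1e-12) for c in configs)

q = 4
labels = range(1, q + 1)
configs = list(itertools.product(labels, repeat=5))            # all 4^5 ring configurations
perms = [dict(zip(labels, image)) for image in itertools.permutations(labels)]
eta = {k: k % q + 1 for k in labels}                           # cyclic shift 1 -> 2 -> ... -> q -> 1
swap12 = {1: 2, 2: 1, 3: 3, 4: 4}                              # a transposition

print(all(invariant(potts_energy, P, configs) for P in perms))   # Potts: all q! relabellings
print(invariant(lambda c: clock_energy(c, q), eta, configs))     # clock: the Z_q generator
print(invariant(lambda c: clock_energy(c, q), swap12, configs))  # clock: not a transposition
```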
All these models (the Ising, $p$-state clock, and $q$-state Potts models) possess a global symmetry group which is discrete. That is, each of the symmetry groups $Z_2$, $Z_p$, $S_q$ is a discrete group, with a finite number of elements. The XY Hamiltonian $\hat H_{\rm XY}$ on the other hand is invariant under a continuous group of transformations $\varphi_i\to\varphi_i+\alpha$, where $\varphi_i$ is the angle variable on site $i$. More to the point, we could write the interaction term $\cos(\varphi_i-\varphi_j)$ as $\frac{1}{2}\big(z_i^* z_j + z_i z_j^*\big)$, where $z_i=e^{i\varphi_i}$ is a phase which lives on the unit circle, and $z_i^*$ is the complex conjugate of $z_i$. The model is then invariant under the global transformation $z_i\to e^{i\alpha}z_i$. The phases $e^{i\alpha}$ form a group under multiplication, called U(1), which is isomorphic to the group SO(2) of rotations in the plane. Adding the reflection $z_i\to z_i^*$, which also leaves $\hat H_{\rm XY}$ invariant, yields the full group O(2). Equivalently, we could write the interaction as $\hat\Omega_i\cdot\hat\Omega_j$, where $\hat\Omega_i=(\cos\varphi_i,\sin\varphi_i)$, which explains the O(2) symmetry, since the symmetry operations are global rotations and reflections in the plane, which is to say the two-dimensional orthogonal group. This last representation generalizes nicely to unit vectors in $n$ dimensions, where
$$ \hat\Omega = (\Omega^1,\Omega^2,\ldots,\Omega^n) \tag{7.241} $$
with $\hat\Omega^2=1$. The dot product $\hat\Omega_i\cdot\hat\Omega_j$ is then invariant under global rotations and reflections in this $n$-dimensional space, which form the group O(n).

7.8.2 Lower critical dimension

Depending on whether the global symmetry group of a model is discrete or continuous, there exists a lower critical dimension $d_\ell$ at or below which no phase transition may take place at finite temperature. That is, for $d\le d_\ell$, the critical temperature is $T_c=0$. Owing to its neglect of fluctuations, mean field theory generally overestimates the value of $T_c$ because it overestimates the stability of the ordered phase¹⁹. Indeed, there are many examples where mean field theory predicts a finite $T_c$ when the actual critical temperature is $T_c=0$. This happens whenever $d\le d_\ell$.

¹⁹ It is a simple matter to concoct models for which the mean field transition temperature underestimates the actual critical temperature. Consider for example an Ising model with interaction $u(\sigma,\sigma')=-\epsilon^{-1}\ln(1+\epsilon\,\sigma\sigma')$, where the spins take values $\sigma,\sigma'=\pm 1$, and where $0<\epsilon<1$. If we write $\sigma=\langle\sigma\rangle+\delta\sigma$ at each site and neglect terms quadratic in fluctuations, the resulting mean field Hamiltonian is equivalent to a set of decoupled spins in an external field $h=zm/(1+\epsilon m^2)$. The mean field transition temperature is $T_c^{\rm MF}=z$, the lattice coordination number, independent of $\epsilon$. On the other hand, we may also write $u(\sigma,\sigma')=u_\epsilon-J_\epsilon\,\sigma\sigma'$, where $u_\epsilon=-\ln(1-\epsilon^2)/2\epsilon$ and $J_\epsilon=\epsilon^{-1}\tanh^{-1}(\epsilon)$. On the square lattice, where $z=4$, one has the exact result $T_c(\epsilon)=2J_\epsilon/\sinh^{-1}(1)$, which diverges as $\epsilon\to 1$, while $T_c^{\rm MF}=4$ remains finite. For $\epsilon>0.9265$, one has $T_c(\epsilon)>T_c^{\rm MF}$.

Figure 7.21: Domain walls in the two-dimensional (left) and three-dimensional (right) Ising model.

Let's test the stability of the ordered (ferromagnetic) state of the one-dimensional Ising model at low temperatures. We consider order-destroying domain wall excitations which interpolate between regions of degenerate, symmetry-related ordered phases, i.e. ↑↑↑↑↑ and ↓↓↓↓↓. For a system with a discrete symmetry at low temperatures, the domain wall is abrupt, on the scale of a single lattice spacing. If the exchange energy is $J$, then the energy of a single domain wall is $2J$, since a link of energy $-J$ is replaced with one of energy $+J$. However, there are $N$ possible locations for the domain wall, hence its entropy is $k_B\ln N$. For a system with $M$ domain walls, the free energy is
$$ \begin{aligned} F &= 2MJ - k_B T\ln\binom{N}{M} \\ &= N\cdot\Big\{2Jx + k_B T\Big[x\ln x + (1-x)\ln(1-x)\Big]\Big\} , \end{aligned} \tag{7.242} $$
where $x=M/N$ is the density of domain walls, and where we have used Stirling's approximation for $k!$ when $k$ is large. Extremizing with respect to $x$, we find
$$ \frac{x}{1-x} = e^{-2J/k_B T} \quad\Longrightarrow\quad x = \frac{1}{e^{2J/k_B T}+1} . \tag{7.243} $$
The average distance between domain walls is $x^{-1}$ (in units of the lattice spacing), which is finite for finite $T$. Thus, the thermodynamic state of the system is disordered, with no net average magnetization.
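The domain-wall density of eqn. 7.243 can be obtained by minimizing $F(x)/N$ from eqn. 7.242 numerically. This is a minimal sketch in units with $J=1$, using bisection on the monotonic derivative $f'(x)=2J+k_B T\ln\big(x/(1-x)\big)$. It also compares the result with the exact zero-field solution of the open chain. There the bonds are independent and $\langle\sigma_i\sigma_{i+1}\rangle=\tanh(J/k_B T)$, so the fraction of antiparallel bonds is $\frac{1}{2}\big(1-\tanh(J/k_B T)\big)$, which is algebraically identical to eqn. 7.243.

```python
import math

def wall_density_minimizer(J, kT, tol=1e-14):
    """Minimise f(x) = 2 J x + k_B T [x ln x + (1-x) ln(1-x)] on 0 < x < 1/2 by bisection
    on f'(x) = 2 J + k_B T ln(x / (1 - x)), which increases monotonically with x."""
    lo, hi = 1e-300, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 2 * J + kT * math.log(mid / (1 - mid)) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

J = 1.0
for kT in (0.5, 1.0, 2.0):
    x_min = wall_density_minimizer(J, kT)
    x_formula = 1 / (math.exp(2 * J / kT) + 1)     # eqn 7.243
    x_exact = (1 - math.tanh(J / kT)) / 2          # exact fraction of antiparallel bonds
    print(kT, x_min, x_formula, x_exact, 1 / x_formula)   # last column: mean wall spacing
```

The mean spacing $x^{-1}$ stays finite at every $T>0$. It grows only exponentially, as $e^{2J/k_B T}$, when $T\to 0$.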
Consider next an Ising domain wall in $d$ dimensions. Let the linear dimension of the system be $L\cdot a$, where $L$ is a real number and $a$ is the lattice constant. Then the energy of a single domain wall which partitions the entire system is $2J\cdot L^{d-1}$. The domain wall entropy is difficult to compute, because the wall can fluctuate significantly, but for a single domain wall we have $S\gtrsim k_B\ln L$. Thus, the free energy $F=2JL^{d-1}-k_B T\ln L$ is dominated by the energy term if $d>1$, suggesting that the system may be ordered. We can do a slightly better job in $d=2$ by writing
$$ Z \approx \exp\!\Big(L^d\sum_P N_P\,e^{-2PJ/k_B T}\Big) , \tag{7.244} $$
where the sum is over all closed loops of perimeter $P$, and $N_P$ is the number of such loops. An example of such a loop circumscribing a domain is depicted in the left panel of fig. 7.21. It turns out that
$$ N_P \simeq \kappa^P\,P^{-\theta}\cdot\Big\{1+\mathcal{O}(P^{-1})\Big\} , \tag{7.245} $$
where $\kappa\approx z-1$ with $z$ the lattice coordination number, and $\theta$ is some exponent. We can understand the $\kappa^P$ factor in the following way. At each step along the perimeter of the loop, there are at most $\kappa=z-1$ possible directions to go (since one doesn't backtrack). The fact that the loop must avoid overlapping itself and must return to its original position to be closed leads to the power law term $P^{-\theta}$ (self-avoidance also lowers the effective $\kappa$ somewhat below $z-1$). The power law is subleading since $\kappa^P P^{-\theta}=\exp(P\ln\kappa-\theta\ln P)$ and $P\gg\ln P$ for $P\gg 1$. Thus,
$$ F \approx -\frac{1}{\beta}\,L^d\sum_P P^{-\theta}\,e^{(\ln\kappa-2\beta J)P} , \tag{7.246} $$
which diverges if $\ln\kappa>2\beta J$, i.e. if $T>2J/\big[k_B\ln(z-1)\big]$. We identify this singularity with the phase transition. The high temperature phase involves a proliferation of such loops. The excluded volume effects between the loops, which we have not taken into account, then enter in an essential way so that the sum converges. Thus, we have the following picture:
$$
\begin{array}{lll}
\ln\kappa<2\beta J &: & \text{large loops suppressed; ordered phase} \\
\ln\kappa>2\beta J &: & \text{large loops proliferate; disordered phase} .
\end{array}
$$
On the square lattice, we obtain
$$ \begin{aligned} k_B T_c^{\rm approx} &= \frac{2J}{\ln 3} = 1.82\,J \\ k_B T_c^{\rm exact} &= \frac{2J}{\sinh^{-1}(1)} = 2.27\,J . \end{aligned} $$
The agreement is better than we should reasonably expect from such a crude argument.
Nota bene : Beware of arguments which allegedly prove the existence of an ordered phase. Generally
speaking, any approximation will underestimate the entropy, and thus will overestimate the stability of
the putative ordered phase.
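The two square-lattice numbers above are quick to reproduce. The same exact (Onsager) formula, $k_B T_c=2J/\sinh^{-1}(1)$, also fixes the crossover value $\epsilon\approx 0.9265$ quoted in footnote 19, where $T_c(\epsilon)=2J_\epsilon/\sinh^{-1}(1)$ with $J_\epsilon=\epsilon^{-1}\tanh^{-1}(\epsilon)$ first exceeds $T_c^{\rm MF}=4$. The bisection below is an illustrative check of that number, using energies in units of $J$ and $k_B=1$. It relies on $\tanh^{-1}(\epsilon)/\epsilon$ increasing monotonically in $\epsilon$.

```python
import math

# k_B T_c / J on the square lattice (z = 4): loop-counting estimate vs exact value.
tc_loops = 2 / math.log(3)           # 2J / ln(z - 1)
tc_exact = 2 / math.asinh(1)         # 2J / sinh^{-1}(1) = 2J / ln(1 + sqrt 2)
print(round(tc_loops, 2), round(tc_exact, 2))     # 1.82 2.27

def tc_footnote(eps):
    """Exact square-lattice T_c for the footnote-19 model: 2 J_eps / sinh^{-1}(1)."""
    return 2 * (math.atanh(eps) / eps) / math.asinh(1)

lo, hi = 0.5, 0.999                  # tc_footnote increases with eps; bracket T_c(eps) = 4
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if tc_footnote(mid) < 4 else (lo, mid)
eps_star = 0.5 * (lo + hi)
print(eps_star)                      # about 0.9265, as quoted in footnote 19
```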

7.8.3 Continuous symmetries

When the global symmetry group is continuous, the domain walls interpolate smoothly between ordered
phases. The energy generally involves a stiffness term,
$$ E = \tfrac{1}{2}\rho_s\int\! d^dr\,(\nabla\theta)^2 , \tag{7.247} $$
where $\theta(\boldsymbol{r})$ is the angle of a local rotation about a single axis and where $\rho_s$ is the spin stiffness. Of course, in O(n) models, the rotations can be with respect to several different axes simultaneously.
In the ordered phase, we have $\theta(\boldsymbol{r})=\theta_0$, a constant. Now imagine a domain wall in which $\theta(\boldsymbol{r})$ rotates by $2\pi$ across the width of the sample. We write $\theta(\boldsymbol{r})=2\pi nx/L$, where $L$ is the linear size of the sample (here with dimensions of length) and $n$ is an integer telling us how many complete twists the order parameter field makes. The domain wall then resembles that in fig. 7.22. The gradient energy is
$$ E = \tfrac{1}{2}\rho_s\,L^{d-1}\int_0^L\! dx\,\left(\frac{2\pi n}{L}\right)^2 = 2\pi^2 n^2\rho_s\,L^{d-2} . \tag{7.248} $$
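The $L^{d-2}$ scaling of eqn. 7.248 can be checked on a lattice. This is a minimal sketch under stated assumptions. It uses a nearest-neighbour XY model $-J\sum\cos(\theta_i-\theta_j)$ on a periodic $L^d$ hypercubic lattice with $a=1$, for which the continuum stiffness is $\rho_s=J$. Only the $L^d$ bonds along $x$ carry the phase difference $2\pi n/L$.

```python
import math

def twist_energy(L, d, n=1, J=1.0):
    """Energy above the uniform state of a nearest-neighbour XY model on a periodic
    L^d hypercubic lattice (a = 1) with theta = 2 pi n x / L.
    Only the L^d bonds along x carry a phase difference 2 pi n / L."""
    dtheta = 2 * math.pi * n / L
    return J * L**d * 2 * math.sin(dtheta / 2) ** 2      # 1 - cos(t) = 2 sin^2(t/2)

for d in (1, 2, 3):
    L = 1000
    print(d, twist_energy(L, d), 2 * math.pi**2 * L ** (d - 2))   # eqn 7.248 with rho_s = J
```

For $d=2$ the twist energy tends to the $L$-independent constant $2\pi^2 n^2\rho_s$. In $d=1$ it even decreases with $L$. In both cases the wall energy cannot beat an entropy that grows like $\ln L$.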

Figure 7.22: A domain wall in an XY ferromagnet.

Recall that in the case of discrete symmetry, the domain wall energy scaled as $E\propto L^{d-1}$. Thus, with $S\gtrsim k_B\ln L$ for a single wall, we see that the entropy term dominates if $d\le 2$, in which case there is no finite temperature phase transition. Thus, the lower critical dimension $d_\ell$ depends on whether the global symmetry is discrete or continuous, with
$$ \begin{aligned} \text{discrete global symmetry} &\quad\Longrightarrow\quad d_\ell = 1 \\ \text{continuous global symmetry} &\quad\Longrightarrow\quad d_\ell = 2 . \end{aligned} $$
Note that all along we have assumed local, short-ranged interactions. Long-ranged interactions can enhance order and thereby lower $d_\ell$.
Thus, we expect that for models with discrete symmetries, dℓ = 1 and there is no finite temperature
phase transition for d ≤ 1. For models with continuous symmetries, dℓ = 2, and we expect Tc = 0 for
d ≤ 2. In this context we should emphasize that the two-dimensional XY model does exhibit a phase
transition at finite temperature, called the Kosterlitz-Thouless transition. However, this phase transition
is not associated with the breaking of the continuous global O(2) symmetry and rather has to do with
the unbinding of vortices and antivortices. So there is still no true long-ranged order below the critical
temperature TKT , even though there is a phase transition!

7.8.4 Random systems : Imry-Ma argument

Oftentimes, particularly in condensed matter systems, intrinsic randomness exists due to quenched impurities, grain boundaries, immobile vacancies, etc. How does this quenched randomness affect a system's attempt to order at $T=0$? This question was taken up in a beautiful and brief paper by Y. Imry and S.-K. Ma, Phys. Rev. Lett. 35, 1399 (1975). Imry and Ma considered models in which there are short-ranged interactions and a random local field coupling to the local order parameter:
$$ \hat H_{\rm RFI} = -J\sum_{\langle ij\rangle}\sigma_i\,\sigma_j - \sum_i H_i\,\sigma_i \tag{7.249} $$
$$ \hat H_{{\rm RF}O(n)} = -J\sum_{\langle ij\rangle}\hat\Omega_i\cdot\hat\Omega_j - \sum_i H^\alpha_i\,\Omega^\alpha_i , \tag{7.250} $$
where
$$ \langle\!\langle H^\alpha_i\rangle\!\rangle = 0 , \qquad \langle\!\langle H^\alpha_i H^\beta_j\rangle\!\rangle = \Gamma\,\delta_{\alpha\beta}\,\delta_{ij} , \tag{7.251} $$

Figure 7.23: Left panel: Imry-Ma domains for an O(2) model. The arrows point in the direction of the local order parameter field $\langle\hat\Omega(\boldsymbol{r})\rangle$. Right panel: free energy density as a function of domain size $L_d$. Keep in mind that the minimum possible value for $L_d$ is the lattice spacing $a$.

where $\langle\!\langle\cdot\rangle\!\rangle$ denotes a configurational average over the disorder. Imry and Ma reasoned that a system could try to lower its free energy by forming domains in which the order parameter takes advantage of local fluctuations in the random field. The size of these domains is assumed to be $L_d$, a length scale to be determined. See the sketch in the left panel of fig. 7.23.
There are two contributions to the energy of a given domain: bulk and surface terms. The bulk energy is
$$ E_{\rm bulk} = -H_{\rm rms}\,(L_d/a)^{d/2} , \tag{7.252} $$
where $a$ is the lattice spacing. This is because when we add together $(L_d/a)^d$ random fields, the magnitude of the result is proportional to the square root of the number of terms, i.e. to $(L_d/a)^{d/2}$. The quantity $H_{\rm rms}=\sqrt{\Gamma}$ is the root-mean-square fluctuation in the random field at a given site. The surface energy is
$$ E_{\rm surface} \propto \begin{cases} J\,(L_d/a)^{d-1} & \text{(discrete symmetry)} \\ J\,(L_d/a)^{d-2} & \text{(continuous symmetry)} . \end{cases} \tag{7.253} $$

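We compute the critical dimension $d_c$ by balancing the bulk and surface energies,
$$
\begin{aligned}
d-1 &= \tfrac12 d \quad\Longrightarrow\quad d_c = 2 \qquad \text{(discrete)} \\
d-2 &= \tfrac12 d \quad\Longrightarrow\quad d_c = 4 \qquad \text{(continuous)}\ .
\end{aligned}
$$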

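The total free energy is $F=(V/L_d^d)\cdot\Delta E$, where $\Delta E=E_{\rm bulk}+E_{\rm surface}$. Thus, the free energy per unit cell is
$$
f = \frac{F}{V/a^d} \approx J\left(\frac{a}{L_d}\right)^{\frac12 d_c} - H_{\rm rms}\left(\frac{a}{L_d}\right)^{\frac12 d}\ . \tag{7.254}
$$
If $d<d_c$, the surface term dominates for small $L_d$ and the bulk term dominates for large $L_d$. There is a global minimum at
$$
\frac{L_d}{a} = \left(\frac{d_c}{d}\cdot\frac{J}{H_{\rm rms}}\right)^{\frac{2}{d_c-d}}\ . \tag{7.255}
$$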
418 CHAPTER 7. MEAN FIELD THEORY OF PHASE TRANSITIONS

For $d>d_c$, the relative dominance of the bulk and surface terms is reversed, and there is a global maximum at this value of $L_d$.
Sketches of the free energy $f(L_d)$ in both cases are provided in the right panel of fig. 7.23. We must keep in mind that the domain size $L_d$ cannot become smaller than the lattice spacing $a$. Hence we should draw a vertical line on the graph at $L_d=a$ and discard the portion $L_d<a$ as unphysical. For $d<d_c$, we see that the state with $L_d=\infty$, i.e. the ordered state, is never the state of lowest free energy: in dimensions $d<d_c$, the ordered state is always unstable to domain formation in the presence of a random field.
For $d>d_c$, there are two possibilities, depending on the relative size of $J$ and $H_{\rm rms}$. We can see this by evaluating $f(L_d=a)=J-H_{\rm rms}$ and $f(L_d=\infty)=0$. Thus, if $J>H_{\rm rms}$, the minimum energy state occurs for $L_d=\infty$. In this case, the system has an ordered ground state, and we expect a finite temperature transition to a disordered state at some critical temperature $T_c>0$. If, on the other hand, $J<H_{\rm rms}$, then the fluctuations in $H$ overwhelm the exchange energy at $T=0$, and the ground state is disordered down to the very smallest length scale (i.e. the lattice spacing $a$).
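As a quick numerical sanity check of eqn. 7.255, the following Python sketch evaluates the free energy per unit cell of eqn. 7.254 at the domain size predicted by eqn. 7.255 and at nearby sizes. It is a minimal illustration, not part of the original notes; the values $J=1$ and $H_{\rm rms}=0.1$ are arbitrary assumptions, and lengths are measured in units of the lattice spacing $a$.

```python
def f_domain(L, d, dc, J, H):
    """Free energy per unit cell, eqn (7.254), with L = L_d / a."""
    x = 1.0 / L                                   # x = a / L_d
    return J * x ** (dc / 2.0) - H * x ** (d / 2.0)

def L_star(d, dc, J, H):
    """Extremal domain size L_d / a, eqn (7.255)."""
    return ((dc / d) * (J / H)) ** (2.0 / (dc - d))

J, H = 1.0, 0.1                                   # illustrative values of J and H_rms
for d, dc in [(1, 2), (2, 4), (3, 4)]:            # all with d < d_c
    L = L_star(d, dc, J, H)
    f0 = f_domain(L, d, dc, J, H)
    # for d < d_c the extremum is a minimum: slightly smaller or larger domains cost more
    print(d, dc, L, f0 < f_domain(0.9 * L, d, dc, J, H) and f0 < f_domain(1.1 * L, d, dc, J, H))
```

For instance, $d=1$ with a discrete symmetry gives $L_d/a=(2J/H_{\rm rms})^2=400$ for these values.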
Please read the essay, Memories of Shang-Keng Ma.

7.9 Ginzburg-Landau Theory

7.9.1 Ginzburg-Landau free energy

Including gradient terms in the free energy, we write
$$
F\big[m(\boldsymbol x),h(\boldsymbol x)\big] = \int\! d^d\!x\,\Big[f_0 + \tfrac12 a\,m^2 + \tfrac14 b\,m^4 + \tfrac16 c\,m^6 - h\,m + \tfrac12\kappa\,(\nabla m)^2 + \ldots\Big]\ . \tag{7.256}
$$

In principle, any term which does not violate the appropriate global symmetry will turn up in such an expansion of the free energy, with some coefficient. Examples include $hm^3$ (both $m$ and $h$ are odd under time reversal), $m^2(\nabla m)^2$, etc. We now ask: what function $m(\boldsymbol x)$ extremizes the free energy functional $F\big[m(\boldsymbol x),h(\boldsymbol x)\big]$? The answer is that $m(\boldsymbol x)$ must satisfy the corresponding Euler-Lagrange equation, which for the above functional is
$$
a\,m + b\,m^3 + c\,m^5 - h - \kappa\,\nabla^2 m = 0\ . \tag{7.257}
$$
If $a>0$ and $h$ is small (we assume $b>0$ and $c>0$), we may neglect the $m^3$ and $m^5$ terms and write
$$
\big(a-\kappa\,\nabla^2\big)\,m = h\ , \tag{7.258}
$$
whose solution is obtained by Fourier transform as
$$
\hat m(\boldsymbol q) = \frac{\hat h(\boldsymbol q)}{a+\kappa q^2}\ , \tag{7.259}
$$
which, with $h(\boldsymbol x)$ appropriately defined, recapitulates the result in eqn. 7.201. Thus, we conclude that
$$
\hat\chi(\boldsymbol q) = \frac{1}{a+\kappa q^2}\ , \tag{7.260}
$$
7.9. GINZBURG-LANDAU THEORY 419

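which should be compared with eqn. 7.206. For continuous functions, we have
$$
\hat m(\boldsymbol q) = \int\! d^d\!x\; m(\boldsymbol x)\,e^{-i\boldsymbol q\cdot\boldsymbol x}\ ,\qquad m(\boldsymbol x) = \int\!\frac{d^d\!q}{(2\pi)^d}\;\hat m(\boldsymbol q)\,e^{i\boldsymbol q\cdot\boldsymbol x}\ . \tag{7.261}
$$
We can then derive the result
$$
m(\boldsymbol x) = \int\! d^d\!x'\;\chi(\boldsymbol x-\boldsymbol x')\,h(\boldsymbol x')\ , \tag{7.262}
$$
where
$$
\chi(\boldsymbol x-\boldsymbol x') = \frac{1}{\kappa}\int\!\frac{d^d\!q}{(2\pi)^d}\;\frac{e^{i\boldsymbol q\cdot(\boldsymbol x-\boldsymbol x')}}{q^2+\xi^{-2}}\ , \tag{7.263}
$$
where the correlation length is $\xi=\sqrt{\kappa/a}\propto(T-T_c)^{-1/2}$, as before.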
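For $d=1$ the integral in eqn. 7.263 can be evaluated by contour integration, giving the standard result $\chi(x)=(\xi/2\kappa)\,e^{-|x|/\xi}$. The Python sketch below (an illustration added here, with arbitrary assumed values of $a$ and $\kappa$) checks numerically that this function satisfies $(a-\kappa\,\partial_x^2)\chi=\delta(x)$, i.e. eqn. 7.258 with a point source: away from $x=0$ the left side vanishes, and the slope jumps by $-1/\kappa$ at the origin.

```python
import math

def chi_1d(x, a, kappa):
    """Candidate d = 1 response function (xi / 2 kappa) exp(-|x|/xi), xi = sqrt(kappa/a)."""
    xi = math.sqrt(kappa / a)
    return xi / (2.0 * kappa) * math.exp(-abs(x) / xi)

a, kappa = 0.3, 1.2      # illustrative (assumed) Landau coefficients, a > 0
step = 1e-4              # finite-difference step

# Away from the source, (a - kappa d^2/dx^2) chi should vanish.
x = 0.8
d2 = (chi_1d(x + step, a, kappa) - 2.0 * chi_1d(x, a, kappa) + chi_1d(x - step, a, kappa)) / step**2
residual = a * chi_1d(x, a, kappa) - kappa * d2

# Integrating across x = 0, the delta-function source requires chi'(0+) - chi'(0-) = -1/kappa.
right = (chi_1d(step, a, kappa) - chi_1d(0.0, a, kappa)) / step
left = (chi_1d(0.0, a, kappa) - chi_1d(-step, a, kappa)) / step
jump = right - left

print(residual, jump, -1.0 / kappa)
```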
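If $a<0$ then there is a spontaneous magnetization and we write $m(\boldsymbol x)=m_0+\delta m(\boldsymbol x)$. Assuming $h$ is weak, we then have two equations
$$
\begin{aligned}
a + b\,m_0^2 + c\,m_0^4 &= 0 \\
\big(a + 3b\,m_0^2 + 5c\,m_0^4 - \kappa\,\nabla^2\big)\,\delta m &= h\ .
\end{aligned} \tag{7.264}
$$
If $-a>0$ is small, we have $m_0^2\approx-a/b$ and
$$
\delta\hat m(\boldsymbol q) = \frac{\hat h(\boldsymbol q)}{-2a+\kappa q^2}\ . \tag{7.265}
$$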

7.9.2 Domain wall profile

A particularly interesting use of Ginzburg-Landau theory is in modeling the spatial profile of defects such as vortices and domain walls. Consider, for example, the case of Ising ($\mathbb{Z}_2$) symmetry with $h=0$. We expand the free energy density to order $m^4$:
$$
F\big[m(\boldsymbol x)\big] = \int\! d^d\!x\,\Big[f_0 + \tfrac12 a\,m^2 + \tfrac14 b\,m^4 + \tfrac12\kappa\,(\nabla m)^2\Big]\ . \tag{7.266}
$$

We assume $a<0$, corresponding to $T<T_c$. Consider now a domain wall, where $m(x\to-\infty)=-m_0$ and $m(x\to+\infty)=+m_0$, where $m_0$ is the equilibrium magnetization, which we obtain from the Euler-Lagrange equation,
$$
a\,m + b\,m^3 - \kappa\,\nabla^2 m = 0\ , \tag{7.267}
$$
assuming a uniform solution where $\nabla m=0$. This gives $m_0=\sqrt{|a|/b}$. It is useful to scale $m(\boldsymbol x)$ by $m_0$, writing $m(\boldsymbol x)=m_0\,\phi(\boldsymbol x)$. The scaled order parameter function $\phi(\boldsymbol x)$ interpolates between $\phi(-\infty)=-1$ and $\phi(+\infty)=1$.
Thus, we have
$$
\xi^2\,\nabla^2\phi = -\phi + \phi^3\ , \tag{7.268}
$$
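where $\xi=\sqrt{\kappa/|a|}$. We assume $\phi(\boldsymbol x)=\phi(x_1)$ is only a function of the first coordinate. Then the Euler-Lagrange equation becomes
$$
\xi^2\,\frac{d^2\phi}{dx_1^2} = -\phi + \phi^3 \equiv -\frac{dU}{d\phi}\ , \tag{7.269}
$$
where
$$
U(\phi) = -\tfrac14\big(\phi^2-1\big)^2\ . \tag{7.270}
$$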

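The 'potential' $U(\phi)$ is an inverted double well, with maxima at $\phi=\pm1$. Defining $\zeta\equiv x_1/\xi$, the equation $\ddot\phi=-U'(\phi)$, where the dot denotes differentiation with respect to $\zeta$, is simply Newton's second law with time replaced by space. In order to have a stationary solution at $\zeta\to\pm\infty$ where $\phi=\pm1$, the total 'energy' $E=\frac12\dot\phi^2+U(\phi)$ must equal $U(\phi=\pm1)=0$. This leads to the first order differential equation
$$
\xi\,\frac{d\phi}{dx_1} = \frac{1}{\sqrt2}\,\big(1-\phi^2\big)\ , \tag{7.271}
$$
with solution
$$
\phi(\boldsymbol x) = \tanh\!\big(x_1/\sqrt2\,\xi\big) \qquad\Longrightarrow\qquad m(\boldsymbol x) = m_0\tanh\!\big(x_1/\sqrt2\,\xi\big)\ . \tag{7.272}
$$
Note that the correlation length $\xi$, and with it the width of the domain wall, diverges at the Ising transition.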
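The kink profile can be checked against eqn. 7.269 directly. The Python sketch below (an illustration added here; $\xi=0.7$ is an arbitrary choice) evaluates the finite-difference residual of $\xi^2\phi''+\phi-\phi^3$ for trial profiles $\phi=\tanh\big(x_1/(c\,\xi)\big)$: with $c=\sqrt2$ the residual is at the level of discretization error, while $c=2$ leaves a residual of order $0.2$.

```python
import math

def residual(x, xi, c, step=1e-4):
    """Finite-difference value of xi^2 phi'' + phi - phi^3 for phi(x) = tanh(x / (c xi))."""
    phi = lambda y: math.tanh(y / (c * xi))
    d2 = (phi(x + step) - 2.0 * phi(x) + phi(x - step)) / step**2
    p = phi(x)
    return xi**2 * d2 + p - p**3

xi = 0.7                                      # arbitrary illustrative correlation length
xs = [-2.0 + 0.1 * k for k in range(41)]      # sample points across the wall
max_sqrt2 = max(abs(residual(x, xi, math.sqrt(2.0))) for x in xs)
max_two = max(abs(residual(x, xi, 2.0)) for x in xs)
print(max_sqrt2, max_two)
```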

7.9.3 Derivation of Ginzburg-Landau free energy

We can make some progress in systematically deriving the Ginzburg-Landau free energy. Consider the Ising model,
$$
\frac{\hat H}{k_{\rm B}T} = -\tfrac12\sum_{i,j} K_{ij}\,\sigma_i\,\sigma_j - \sum_i h_i\,\sigma_i + \tfrac12\sum_i K_{ii}\ , \tag{7.273}
$$
where now $K_{ij}=J_{ij}/k_{\rm B}T$ and $h_i=H_i/k_{\rm B}T$ are the interaction energies and local magnetic fields in units of $k_{\rm B}T$. The last term on the RHS above cancels out any contribution from diagonal elements of $K_{ij}$.
Our derivation makes use of a generalization of the Gaussian integral,
$$
\int_{-\infty}^{\infty}\! dx\; e^{-\frac12 a x^2 - b x} = \left(\frac{2\pi}{a}\right)^{1/2} e^{b^2/2a}\ . \tag{7.274}
$$

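The generalization is
$$
\int_{-\infty}^{\infty}\!dx_1\cdots\int_{-\infty}^{\infty}\!dx_N\; e^{-\frac12 A_{ij}x_ix_j - b_ix_i} = \frac{(2\pi)^{N/2}}{\sqrt{\det A}}\;e^{\frac12 A^{-1}_{ij}b_ib_j}\ , \tag{7.275}
$$
where we use the Einstein convention of summing over repeated indices, and where we assume that the matrix $A$ is positive definite (else the integral diverges). This allows us to write
$$
\begin{aligned}
Z &= e^{-\frac12 K_{ii}}\;{\rm Tr}\; e^{\frac12 K_{ij}\sigma_i\sigma_j}\, e^{h_i\sigma_i} \\
&= {\det}^{-1/2}(2\pi K)\; e^{-\frac12 K_{ii}} \int_{-\infty}^{\infty}\! d\phi_1\cdots\int_{-\infty}^{\infty}\! d\phi_N\; e^{-\frac12 K^{-1}_{ij}\phi_i\phi_j}\;{\rm Tr}\; e^{(\phi_i+h_i)\sigma_i} \\
&= {\det}^{-1/2}(2\pi K)\; e^{-\frac12 K_{ii}} \int_{-\infty}^{\infty}\! d\phi_1\cdots\int_{-\infty}^{\infty}\! d\phi_N\; e^{-\frac12 K^{-1}_{ij}\phi_i\phi_j}\; e^{\sum_i \ln[2\cosh(\phi_i+h_i)]} \\
&\equiv \int_{-\infty}^{\infty}\! d\phi_1\cdots\int_{-\infty}^{\infty}\! d\phi_N\; e^{-\Phi(\phi_1,\ldots,\phi_N)}\ ,
\end{aligned} \tag{7.276}
$$
where
$$
\Phi = \tfrac12\sum_{i,j} K^{-1}_{ij}\,\phi_i\,\phi_j - \sum_i\ln\cosh(\phi_i+h_i) + \tfrac12\ln\det(2\pi K) + \tfrac12\,{\rm Tr}\,K - N\ln2\ . \tag{7.277}
$$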
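As a concrete check of eqns. 7.274 through 7.276, the Python sketch below compares the exact two-site partition function with its Gaussian (Hubbard-Stratonovich) representation, evaluated by brute-force quadrature. The $2\times2$ matrix $K$ and the fields $h_i$ are arbitrary illustrative assumptions; $K$ only needs to be positive definite so that the Gaussian integral converges.

```python
import itertools
import math

# Illustrative two-site example (assumed values): K must be symmetric and positive definite.
K = [[1.0, 0.4], [0.4, 0.8]]
h = [0.2, -0.1]

def z_exact(K, h):
    """Z = e^{-Tr K/2} Tr exp(K_ij s_i s_j / 2 + h_i s_i), first line of eqn (7.276)."""
    total = 0.0
    for s in itertools.product((-1, 1), repeat=2):
        quad = sum(K[i][j] * s[i] * s[j] for i in range(2) for j in range(2))
        total += math.exp(0.5 * quad + h[0] * s[0] + h[1] * s[1])
    return math.exp(-0.5 * (K[0][0] + K[1][1])) * total

def z_hs(K, h, L=15.0, step=0.1):
    """Gaussian-integral form of Z (third line of eqn 7.276), summed on a square grid."""
    det = K[0][0] * K[1][1] - K[0][1] * K[1][0]
    Ki = [[K[1][1] / det, -K[0][1] / det], [-K[1][0] / det, K[0][0] / det]]  # K^{-1}
    n = int(round(2 * L / step))
    grid = [-L + step * k for k in range(n + 1)]
    acc = 0.0
    for p1 in grid:
        for p2 in grid:
            quad = Ki[0][0] * p1 * p1 + 2.0 * Ki[0][1] * p1 * p2 + Ki[1][1] * p2 * p2
            acc += (math.exp(-0.5 * quad)
                    * 2.0 * math.cosh(p1 + h[0]) * 2.0 * math.cosh(p2 + h[1]))
    acc *= step * step   # integrand is negligible at the edges, so this equals the trapezoid rule
    pref = math.exp(-0.5 * (K[0][0] + K[1][1])) / math.sqrt((2.0 * math.pi) ** 2 * det)
    return pref * acc

print(z_exact(K, h), z_hs(K, h))
```

The grid spacing and cutoff are chosen generously; because the integrand is smooth and decays like a Gaussian, the simple sum converges far faster than its step size would suggest.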

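We assume the model is defined on a Bravais lattice, in which case we can write $\phi_i=\phi_{\boldsymbol R_i}$. We can then define the Fourier transforms,
$$
\phi_{\boldsymbol R} = \frac{1}{\sqrt N}\sum_{\boldsymbol q}\hat\phi_{\boldsymbol q}\,e^{i\boldsymbol q\cdot\boldsymbol R}\ ,\qquad \hat\phi_{\boldsymbol q} = \frac{1}{\sqrt N}\sum_{\boldsymbol R}\phi_{\boldsymbol R}\,e^{-i\boldsymbol q\cdot\boldsymbol R} \tag{7.278}
$$
and
$$
\hat K(\boldsymbol q) = \sum_{\boldsymbol R} K(\boldsymbol R)\,e^{-i\boldsymbol q\cdot\boldsymbol R}\ . \tag{7.279}
$$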

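A few remarks about the lattice structure and periodic boundary conditions are in order. For a Bravais lattice, we can write each direct lattice vector $\boldsymbol R$ as a sum over $d$ basis vectors with integer coefficients, viz.
$$
\boldsymbol R = \sum_{\mu=1}^d n_\mu\,\boldsymbol a_\mu\ , \tag{7.280}
$$
where $d$ is the dimension of space. The reciprocal lattice vectors $\boldsymbol b_\mu$ satisfy
$$
\boldsymbol a_\mu\cdot\boldsymbol b_\nu = 2\pi\,\delta_{\mu\nu}\ , \tag{7.281}
$$
and any wavevector $\boldsymbol q$ may be expressed as
$$
\boldsymbol q = \frac{1}{2\pi}\sum_{\mu=1}^d\theta_\mu\,\boldsymbol b_\mu\ . \tag{7.282}
$$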
We can impose periodic boundary conditions on a system of size $M_1\times M_2\times\cdots\times M_d$ by requiring
$$
\phi_{\boldsymbol R+\sum_{\mu=1}^d l_\mu M_\mu\boldsymbol a_\mu} = \phi_{\boldsymbol R}\ . \tag{7.283}
$$
This leads to the quantization of the wavevectors, which must then satisfy
$$
e^{iM_\mu\boldsymbol q\cdot\boldsymbol a_\mu} = e^{iM_\mu\theta_\mu} = 1\ , \tag{7.284}
$$
and therefore $\theta_\mu=2\pi m_\mu/M_\mu$, where $m_\mu$ is an integer. There are then $M_1M_2\cdots M_d=N$ independent values of $\boldsymbol q$, which can be taken to be those corresponding to $m_\mu\in\{1,\ldots,M_\mu\}$.

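Let's now expand the function $\Phi(\vec\phi\,)$ in powers of the $\phi_i$, and to first order in the external fields $h_i$. We obtain
$$
\begin{aligned}
\Phi &= \tfrac12\sum_{\boldsymbol q}\Big(\hat K^{-1}(\boldsymbol q)-1\Big)\,|\hat\phi_{\boldsymbol q}|^2 + \tfrac{1}{12}\sum_{\boldsymbol R}\phi_{\boldsymbol R}^4 - \sum_{\boldsymbol R}h_{\boldsymbol R}\,\phi_{\boldsymbol R} + \mathcal{O}\big(\phi^6,h^2\big) \\
&\qquad + \tfrac12\,{\rm Tr}\,K + \tfrac12\,{\rm Tr}\ln(2\pi K) - N\ln2\ .
\end{aligned} \tag{7.285}
$$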

On a $d$-dimensional lattice, for a model with nearest neighbor interactions $K_1$ only, we have $\hat K(\boldsymbol q)=K_1\sum_{\boldsymbol\delta}e^{i\boldsymbol q\cdot\boldsymbol\delta}$, where $\boldsymbol\delta$ is a nearest neighbor separation vector. These are the eigenvalues of the matrix $K_{ij}$. We note that $K_{ij}$ is then not positive definite, since there are negative eigenvalues$^{20}$. To fix this, we can add a term $K_0$ everywhere along the diagonal. We then have
$$
\hat K(\boldsymbol q) = K_0 + K_1\sum_{\boldsymbol\delta}\cos(\boldsymbol q\cdot\boldsymbol\delta)\ . \tag{7.286}
$$
Here we have used the inversion symmetry of the Bravais lattice to eliminate the imaginary term. The eigenvalues are all positive so long as $K_0>zK_1$, where $z$ is the lattice coordination number. We can therefore write $\hat K(\boldsymbol q)=\hat K(0)-\alpha q^2$ for small $q$, with $\alpha>0$. Thus, we can write
$$
\hat K^{-1}(\boldsymbol q) - 1 = a + \kappa\,q^2 + \ldots\ . \tag{7.287}
$$
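The following Python sketch enumerates the $N=M^d$ allowed wavevectors of eqn. 7.284 for a simple cubic lattice (lattice constant set to 1, so that $\boldsymbol q\cdot\boldsymbol a_\mu=\theta_\mu$ and $\sum_{\boldsymbol\delta}\cos(\boldsymbol q\cdot\boldsymbol\delta)=2\sum_\mu\cos\theta_\mu$) and evaluates the eigenvalues of eqn. 7.286. It is an illustrative check with arbitrary assumed values of $M$, $K_0$ and $K_1$: with $K_0=0$ the smallest eigenvalue is $-2dK_1$, at $\theta_\mu=\pi$, and all eigenvalues become positive once $K_0>zK_1=2dK_1$.

```python
import itertools
import math

def khat_values(d, M, K0, K1):
    """K0 + K1 * sum_delta cos(q . delta) for all M**d wavevectors of a periodic
    simple cubic lattice with unit lattice constant, eqns (7.284) and (7.286)."""
    vals = []
    for ms in itertools.product(range(1, M + 1), repeat=d):
        thetas = [2.0 * math.pi * m / M for m in ms]     # theta_mu = 2 pi m_mu / M
        # the 2d neighbours +/- a_mu contribute 2 cos(theta_mu) per direction mu
        vals.append(K0 + K1 * sum(2.0 * math.cos(t) for t in thetas))
    return vals

d, M, K1 = 2, 8, 1.0
vals = khat_values(d, M, 0.0, K1)
print(len(vals), min(vals), max(vals))        # 64 wavevectors, eigenvalues from -4 to 4
print(min(khat_values(d, M, 4.5, K1)) > 0.0)  # K0 > z*K1 = 4 makes K positive definite
```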

To lowest order in $q$ the RHS is isotropic if the lattice has cubic symmetry, but anisotropy will enter in higher order terms. We'll assume isotropy at this level. This is not necessary, but it makes the discussion somewhat less involved. We can now write down our Ginzburg-Landau free energy density:
$$
\mathcal{F} = \tfrac12 a\,\phi^2 + \tfrac12\kappa\,|\nabla\phi|^2 + \tfrac{1}{12}\phi^4 - h\,\phi\ , \tag{7.288}
$$
valid to lowest nontrivial order in derivatives, and up to corrections of sixth order in $\phi$.


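One might wonder what we have gained over the inhomogeneous variational density matrix treatment, where we found
$$
F = -\tfrac12\sum_{\boldsymbol q}\hat J(\boldsymbol q)\,|\hat m(\boldsymbol q)|^2 - \sum_{\boldsymbol q}\hat H(-\boldsymbol q)\,\hat m(\boldsymbol q) + k_{\rm B}T\sum_i\left\{\left(\frac{1+m_i}{2}\right)\ln\left(\frac{1+m_i}{2}\right) + \left(\frac{1-m_i}{2}\right)\ln\left(\frac{1-m_i}{2}\right)\right\}\ . \tag{7.289}
$$
$^{20}$ To evoke a negative eigenvalue on a $d$-dimensional cubic lattice, set $q_\mu=\pi/a$ for all $\mu$. The eigenvalue is then $-2dK_1$.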
Surely we could expand $\hat J(\boldsymbol q)=\hat J(0)-\frac12 a q^2+\ldots$ and obtain a similar expression for $\mathcal{F}$. However, such a derivation using the variational density matrix is only approximate. The method outlined in this section is exact.
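Let's return to our complete expression for $\Phi$:
$$
\Phi\big[\vec\phi\,\big] = \Phi_0\big[\vec\phi\,\big] + \sum_{\boldsymbol R}v(\phi_{\boldsymbol R})\ , \tag{7.290}
$$
where
$$
\Phi_0\big[\vec\phi\,\big] = \tfrac12\sum_{\boldsymbol q}G^{-1}(\boldsymbol q)\,\big|\hat\phi(\boldsymbol q)\big|^2 + \tfrac12\,{\rm Tr}\,\frac{1}{1+G^{-1}} + \tfrac12\,{\rm Tr}\ln\!\left(\frac{2\pi}{1+G^{-1}}\right) - N\ln2\ . \tag{7.291}
$$
Here we have defined
$$
v(\phi) = \tfrac12\phi^2 - \ln\cosh\phi = \tfrac{1}{12}\phi^4 - \tfrac{1}{45}\phi^6 + \tfrac{17}{2520}\phi^8 + \ldots \tag{7.292}
$$
and
$$
G(\boldsymbol q) = \frac{\hat K(\boldsymbol q)}{1-\hat K(\boldsymbol q)}\ . \tag{7.293}
$$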
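The series in eqn. 7.292 can be spot-checked numerically. In the Python sketch below (an illustrative check, not part of the original derivation), the difference between $v(\phi)$ and its three-term truncation should shrink like $\phi^{10}$, the order of the first omitted term, so halving $\phi$ should reduce it by roughly $2^{10}\approx10^3$.

```python
import math

def v(phi):
    """v(phi) = phi^2/2 - ln cosh(phi), eqn (7.292)."""
    return 0.5 * phi * phi - math.log(math.cosh(phi))

def v_series(phi):
    """Three-term truncation phi^4/12 - phi^6/45 + 17 phi^8/2520 from eqn (7.292)."""
    return phi**4 / 12.0 - phi**6 / 45.0 + 17.0 * phi**8 / 2520.0

for phi in (0.4, 0.2, 0.1):
    err = abs(v(phi) - v_series(phi))
    print(phi, err, err / phi**10)   # the last column should be roughly constant
```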
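We now want to compute
$$
Z = \int\! D\vec\phi\; e^{-\Phi_0(\vec\phi)}\; e^{-\sum_{\boldsymbol R}v(\phi_{\boldsymbol R})} \tag{7.294}
$$
where
$$
D\vec\phi \equiv d\phi_1\,d\phi_2\cdots d\phi_N\ . \tag{7.295}
$$
We expand the second exponential factor in a Taylor series, allowing us to write
$$
Z = Z_0\left(1 - \sum_{\boldsymbol R}\big\langle v(\phi_{\boldsymbol R})\big\rangle + \tfrac12\sum_{\boldsymbol R}\sum_{\boldsymbol R'}\big\langle v(\phi_{\boldsymbol R})\,v(\phi_{\boldsymbol R'})\big\rangle + \ldots\right)\ , \tag{7.296}
$$
where
$$
Z_0 = \int\! D\vec\phi\; e^{-\Phi_0(\vec\phi)}\ ,\qquad \ln Z_0 = \tfrac12\,{\rm Tr}\left[\ln(1+G) - \frac{G}{1+G}\right] + N\ln2 \tag{7.297}
$$
and
$$
\big\langle F(\vec\phi\,)\big\rangle = \frac{\int\! D\vec\phi\; F\,e^{-\Phi_0}}{\int\! D\vec\phi\; e^{-\Phi_0}}\ . \tag{7.298}
$$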

To evaluate the various terms in the expansion of eqn. 7.296, we invoke Wick's theorem, which says
$$
\big\langle x_{i_1}x_{i_2}\cdots x_{i_{2L}}\big\rangle = \frac{\int dx_1\cdots dx_N\; e^{-\frac12 G^{-1}_{ij}x_ix_j}\;x_{i_1}x_{i_2}\cdots x_{i_{2L}}}{\int dx_1\cdots dx_N\; e^{-\frac12 G^{-1}_{ij}x_ix_j}} = \sum_{\substack{\text{all distinct}\\ \text{pairings}}} G_{j_1j_2}\,G_{j_3j_4}\cdots G_{j_{2L-1}j_{2L}}\ , \tag{7.299}
$$
where the sets $\{j_1,\ldots,j_{2L}\}$ are all permutations of the set $\{i_1,\ldots,i_{2L}\}$. In particular, we have
$$
\big\langle x_i^4\big\rangle = 3\,G_{ii}^2\ . \tag{7.300}
$$
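Wick's theorem is easy to implement by brute-force enumeration of pairings. In the Python sketch below (illustrative; the covariance matrix $G$ is an arbitrary positive definite example), `wick_moment` sums the products of eqn. 7.299 over all distinct pairings of positions; it reproduces eqn. 7.300 and the count $(2L-1)!!$ of pairings.

```python
def pairings(items):
    """Yield every way of splitting a list of even length into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for k in range(len(rest)):
        remaining = rest[:k] + rest[k + 1:]
        for tail in pairings(remaining):
            yield [(first, rest[k])] + tail

def wick_moment(indices, G):
    """<x_{i1} ... x_{i2L}> for a zero-mean Gaussian with covariance G, via eqn (7.299)."""
    total = 0.0
    for pairing in pairings(list(indices)):
        term = 1.0
        for a, b in pairing:
            term *= G[a][b]
        total += term
    return total

G = [[2.0, 0.5], [0.5, 1.0]]                      # illustrative covariance matrix
print(wick_moment([0, 0, 0, 0], G))               # 3 * G_00^2 = 12.0, eqn (7.300)
print(wick_moment([0, 0, 1, 1], G))               # G_00 G_11 + 2 G_01^2 = 2.5
print(sum(1 for _ in pairings(list(range(6)))))   # (2L-1)!! = 15 pairings for L = 3
```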

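In our case, we have
$$
\big\langle\phi_{\boldsymbol R}^4\big\rangle = 3\left(\frac{1}{N}\sum_{\boldsymbol q}G(\boldsymbol q)\right)^2\ . \tag{7.301}
$$
Thus, if we write $v(\phi)\approx\frac{1}{12}\phi^4$ and retain only the quartic term in $v(\phi)$, we obtain
$$
\begin{aligned}
\frac{F}{k_{\rm B}T} = -\ln Z &= \tfrac12\,{\rm Tr}\left[\frac{G}{1+G} - \ln(1+G)\right] + \frac{1}{4N}\big({\rm Tr}\,G\big)^2 - N\ln2 \\
&= -N\ln2 + \frac{1}{4N}\big({\rm Tr}\,G\big)^2 - \tfrac14\,{\rm Tr}\,G^2 + \mathcal{O}\big(G^3\big)\ .
\end{aligned} \tag{7.302}
$$
Note that if we set $K_{ij}$ to be diagonal, then $\hat K(\boldsymbol q)$ and hence $G(\boldsymbol q)$ are constant functions of $\boldsymbol q$. The $\mathcal{O}(G^2)$ term then vanishes, which is required since the free energy cannot depend on the diagonal elements of $K_{ij}$.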

7.9.4 Ginzburg criterion

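Let us define $A(T,H,V,N)$ to be the usual (i.e. thermodynamic) Helmholtz free energy. Then
$$
e^{-\beta A} = \int\! Dm\; e^{-\beta F[m(\boldsymbol x)]}\ , \tag{7.303}
$$
where the functional $F[m(\boldsymbol x)]$ is of the Ginzburg-Landau form, given in eqn. 7.266. The integral above is a functional integral. We can give it a more precise meaning by defining its measure in the case of periodic functions $m(\boldsymbol x)$ confined to a rectangular box. Then we can expand
$$
m(\boldsymbol x) = \frac{1}{\sqrt V}\sum_{\boldsymbol q}\hat m_{\boldsymbol q}\,e^{i\boldsymbol q\cdot\boldsymbol x}\ , \tag{7.304}
$$
and we define the measure
$$
Dm \equiv d\hat m_0\prod_{\substack{\boldsymbol q\\ q_x>0}} d\,{\rm Re}\,\hat m_{\boldsymbol q}\;d\,{\rm Im}\,\hat m_{\boldsymbol q}\ . \tag{7.305}
$$
Note that the fact that $m(\boldsymbol x)\in\mathbb{R}$ means that $\hat m_{-\boldsymbol q}=\hat m^*_{\boldsymbol q}$. We'll assume $T>T_c$ and $H=0$, and we'll explore the limit $T\to T_c^+$ to analyze the properties of the critical region close to $T_c$. In this limit we can ignore all but the quadratic terms in $m$, and we have
$$
e^{-\beta A} = \int\! Dm\;\exp\!\Big(-\tfrac12\beta\sum_{\boldsymbol q}\big(a+\kappa q^2\big)\,|\hat m_{\boldsymbol q}|^2\Big) = \prod_{\boldsymbol q}\left(\frac{\pi k_{\rm B}T}{a+\kappa q^2}\right)^{1/2}\ . \tag{7.306}
$$
Thus,
$$
A = \tfrac12 k_{\rm B}T\sum_{\boldsymbol q}\ln\!\left(\frac{a+\kappa q^2}{\pi k_{\rm B}T}\right)\ . \tag{7.307}
$$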

We now assume that $a(T)=\alpha t$, where $t$ is the dimensionless quantity
$$
t = \frac{T-T_c}{T_c}\ , \tag{7.308}
$$
known as the reduced temperature.
We now compute the heat capacity $C_V=-T\,\partial^2\!A/\partial T^2$. We are really only interested in the singular contributions to $C_V$, which means that we're only interested in differentiating with respect to $T$ as it appears in $a(T)$. We divide by $N_{\rm S}k_{\rm B}$, where $N_{\rm S}$ is the number of unit cells of our system, which we presume is a lattice-based model. Note $N_{\rm S}\sim V/a^d$, where $V$ is the volume and $a$ the lattice constant. The dimensionless heat capacity per lattice site is then
$$
c \equiv \frac{C_V}{N_{\rm S}k_{\rm B}} = \frac{\alpha^2a^d}{2\kappa^2}\int^{\Lambda}\!\frac{d^d\!q}{(2\pi)^d}\;\frac{1}{\big(\xi^{-2}+q^2\big)^2}\ , \tag{7.309}
$$
where $\xi=(\kappa/\alpha t)^{1/2}\propto|t|^{-1/2}$ is the correlation length, and where $\Lambda\sim a^{-1}$ is an ultraviolet cutoff. We define $R_*\equiv(\kappa/\alpha)^{1/2}$, in which case
$$
c = R_*^{-4}\,a^d\,\xi^{4-d}\cdot\tfrac12\int^{\Lambda\xi}\!\frac{d^d\!\bar q}{(2\pi)^d}\;\frac{1}{\big(1+\bar q^2\big)^2}\ , \tag{7.310}
$$
where $\bar q\equiv q\xi$. Thus,
$$
c(t) \sim \begin{cases} \text{const.} & \text{if } d>4 \\ -\ln t & \text{if } d=4 \\ t^{\frac d2-2} & \text{if } d<4\ . \end{cases} \tag{7.311}
$$

For $d>4$, mean field theory is qualitatively accurate, with finite corrections. In dimensions $d\le4$, the mean field result is overwhelmed by fluctuation contributions as $t\to0^+$ (i.e. as $T\to T_c^+$). We see that MFT is sensible provided the fluctuation contributions are small, i.e. provided
$$
R_*^{-4}\,a^d\,\xi^{4-d} \ll 1\ , \tag{7.312}
$$
which entails $t\gg t_{\rm G}$, where
$$
t_{\rm G} = \left(\frac{a}{R_*}\right)^{\frac{2d}{4-d}} \tag{7.313}
$$
is the Ginzburg reduced temperature. The criterion for the sufficiency of mean field theory, namely $t\gg t_{\rm G}$, is known as the Ginzburg criterion. The region $|t|<t_{\rm G}$ is known as the critical region.
In a lattice ferromagnet, as we have seen, $R_*\sim a$ is on the scale of the lattice spacing itself, hence $t_{\rm G}\sim1$ and the critical regime is very large. Mean field theory then fails quickly as $T\to T_c$. In a (conventional) three-dimensional superconductor, $R_*$ is on the order of the Cooper pair size, and $R_*/a\sim10^2$ to $10^3$, hence $t_{\rm G}=(a/R_*)^6\sim10^{-18}$ to $10^{-12}$ is negligibly narrow. The mean field theory of the superconducting transition (BCS theory) is then valid essentially all the way to $T=T_c$.
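Eqn. 7.313 is simple enough to tabulate directly. The Python sketch below (illustrative; the ratios $R_*/a$ are the ones quoted in the text) reproduces $t_{\rm G}\sim1$ when $R_*\sim a$ and $t_{\rm G}=(a/R_*)^6$ of order $10^{-12}$ to $10^{-18}$ for a three-dimensional superconductor.

```python
import math

def ginzburg_t(d, a_over_R):
    """Ginzburg reduced temperature t_G = (a/R*)^(2d/(4-d)) of eqn (7.313), for d < 4."""
    if d >= 4:
        raise ValueError("eqn (7.313) applies only for d < 4")
    return a_over_R ** (2.0 * d / (4.0 - d))

for R_over_a in (1.0, 1e2, 1e3):     # lattice ferromagnet, then a superconductor
    print(R_over_a, ginzburg_t(3, 1.0 / R_over_a))
```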
7.10 Appendix I : Equivalence of the Mean Field Descriptions

In both the variational density matrix and mean field Hamiltonian methods as applied to the Ising model, we obtained the same result $m=\tanh\big((m+h)/\theta\big)$. What is perhaps not obvious is whether these theories are in fact the same, i.e. if their respective free energies agree. Indeed, the two free energy functions,
$$
\begin{aligned}
f_A(m,h,\theta) &= -\tfrac12 m^2 - hm + \theta\left[\left(\frac{1+m}{2}\right)\ln\left(\frac{1+m}{2}\right) + \left(\frac{1-m}{2}\right)\ln\left(\frac{1-m}{2}\right)\right] \\
f_B(m,h,\theta) &= +\tfrac12 m^2 - \theta\ln\!\Big(e^{+(m+h)/\theta}+e^{-(m+h)/\theta}\Big)\ ,
\end{aligned} \tag{7.314}
$$
where $f_A$ is the variational density matrix result and $f_B$ is the mean field Hamiltonian result, clearly are different functions of their arguments. However, it turns out that upon minimizing with respect to $m$ in each case, the resulting free energies obey $f_A(h,\theta)=f_B(h,\theta)$. This agreement may seem surprising. The first method utilizes an approximate (variational) density matrix applied to the exact Hamiltonian $\hat H$. The second method approximates the Hamiltonian as $\hat H_{\rm MF}$, but otherwise treats it exactly. The two Landau expansions seem hopelessly different:
$$
\begin{aligned}
f_A(m,h,\theta) &= -\theta\ln2 - hm + \tfrac12(\theta-1)\,m^2 + \frac{\theta}{12}\,m^4 + \frac{\theta}{30}\,m^6 + \ldots \\
f_B(m,h,\theta) &= -\theta\ln2 + \tfrac12 m^2 - \frac{(m+h)^2}{2\theta} + \frac{(m+h)^4}{12\,\theta^3} - \frac{(m+h)^6}{45\,\theta^5} + \ldots\ .
\end{aligned} \tag{7.315}
$$
We shall now prove that these two methods, the variational density matrix and the mean field approach,
are in fact equivalent, and yield the same free energy f (h, θ).
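The equality $f_A=f_B$ at the self-consistent $m$ can be checked numerically before following the general proof. The Python sketch below (illustrative, with arbitrarily chosen $(h,\theta)$) solves $m=\tanh\big((m+h)/\theta\big)$ by fixed-point iteration and evaluates both functions of eqn. 7.314 there; away from the solution the two functions differ.

```python
import math

def f_A(m, h, theta):
    """Variational density matrix free energy, first line of eqn (7.314)."""
    p, q = 0.5 * (1.0 + m), 0.5 * (1.0 - m)
    return -0.5 * m * m - h * m + theta * (p * math.log(p) + q * math.log(q))

def f_B(m, h, theta):
    """Mean field Hamiltonian free energy, second line of eqn (7.314)."""
    x = (m + h) / theta
    return 0.5 * m * m - theta * math.log(math.exp(x) + math.exp(-x))

def solve_m(h, theta, m=0.999, iters=5000):
    """Fixed-point iteration of m = tanh((m + h)/theta), started from a positive m."""
    for _ in range(iters):
        m = math.tanh((m + h) / theta)
    return m

for h, theta in [(0.0, 0.5), (0.1, 0.8), (0.2, 1.5)]:
    m = solve_m(h, theta)
    print(h, theta, m, f_A(m, h, theta) - f_B(m, h, theta))   # difference is zero up to rounding

print(f_A(0.3, 0.1, 0.8) - f_B(0.3, 0.1, 0.8))   # away from the solution they differ
```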
Let us generalize the Ising model and write
$$
\hat H = -\sum_{i<j}J_{ij}\,\varepsilon(\sigma_i,\sigma_j) - \sum_i\Phi(\sigma_i)\ . \tag{7.316}
$$
Here, each 'spin' $\sigma_i$ may take on any of $K$ possible values, $\{s_1,\ldots,s_K\}$. For the $S=1$ Ising model, we would have $K=3$ possibilities, with $s_1=-1$, $s_2=0$, and $s_3=+1$. But the set $\{s_\alpha\}$, with $\alpha\in\{1,\ldots,K\}$, is completely arbitrary$^{21}$. The 'local field' term $\Phi(\sigma)$ is also a completely arbitrary function. It may be linear, with $\Phi(\sigma)=H\sigma$, for example, but it could also contain terms quadratic in $\sigma$, or whatever one desires.
The symmetric, dimensionless interaction function $\varepsilon(\sigma,\sigma')=\varepsilon(\sigma',\sigma)$ is a real symmetric $K\times K$ matrix. According to the spectral theorem (the symmetric counterpart of the singular value decomposition), any such matrix may be written in the form
$$
\varepsilon(\sigma,\sigma') = \sum_{p=1}^{N_s}A_p\,\lambda_p(\sigma)\,\lambda_p(\sigma')\ , \tag{7.317}
$$
where the $\{A_p\}$ are coefficients (the eigenvalues, which may have either sign), and the $\lambda_p(\sigma)$ are the corresponding eigenvectors. The number of terms $N_s$ in this decomposition is such that $N_s\le K$. This treatment can be generalized to account for continuous $\sigma$.
$^{21}$ It needn't be an equally spaced sequence, for example.
7.10. APPENDIX I : EQUIVALENCE OF THE MEAN FIELD DESCRIPTIONS 427

7.10.1 Variational Density Matrix

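The most general single-site variational density matrix is written
$$
\varrho(\sigma) = \sum_{\alpha=1}^K x_\alpha\,\delta_{\sigma,s_\alpha}\ . \tag{7.318}
$$
Thus, $x_\alpha$ is the probability for a given site to be in state $\alpha$, with $\sigma=s_\alpha$. The $\{x_\alpha\}$ are the $K$ variational parameters, subject to the single normalization constraint, $\sum_\alpha x_\alpha=1$. We now have
$$
\begin{aligned}
f &= \frac{1}{N\hat J(0)}\Big[{\rm Tr}\,(\varrho\hat H) + k_{\rm B}T\,{\rm Tr}\,(\varrho\ln\varrho)\Big] \\
&= -\tfrac12\sum_p\sum_{\alpha,\alpha'}A_p\,\lambda_p(s_\alpha)\,\lambda_p(s_{\alpha'})\,x_\alpha\,x_{\alpha'} - \sum_\alpha\varphi(s_\alpha)\,x_\alpha + \theta\sum_\alpha x_\alpha\ln x_\alpha\ ,
\end{aligned} \tag{7.319}
$$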
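where $\varphi(\sigma)=\Phi(\sigma)/\hat J(0)$. We extremize in the usual way, introducing a Lagrange undetermined multiplier $\zeta$ to enforce the constraint. This means we extend the function $f\big(\{x_\alpha\}\big)$, writing
$$
f^*(x_1,\ldots,x_K,\zeta) = f(x_1,\ldots,x_K) + \zeta\left(\sum_{\alpha=1}^K x_\alpha - 1\right)\ , \tag{7.320}
$$
and freely extremizing with respect to the $(K+1)$ parameters $\{x_1,\ldots,x_K,\zeta\}$. This yields $K$ nonlinear equations,
$$
0 = \frac{\partial f^*}{\partial x_\alpha} = -\sum_p\sum_{\alpha'}A_p\,\lambda_p(s_\alpha)\,\lambda_p(s_{\alpha'})\,x_{\alpha'} - \varphi(s_\alpha) + \theta\ln x_\alpha + \zeta + \theta\ , \tag{7.321}
$$
for each $\alpha$, and one linear equation, which is the normalization condition,
$$
0 = \frac{\partial f^*}{\partial\zeta} = \sum_\alpha x_\alpha - 1\ . \tag{7.322}
$$
We cannot solve these nonlinear equations analytically, but they may be recast, by exponentiating them, as
$$
x_\alpha = \frac{1}{Z}\exp\left\{\frac{1}{\theta}\Big[\sum_p\sum_{\alpha'}A_p\,\lambda_p(s_\alpha)\,\lambda_p(s_{\alpha'})\,x_{\alpha'} + \varphi(s_\alpha)\Big]\right\}\ , \tag{7.323}
$$
with
$$
Z = e^{(\zeta/\theta)+1} = \sum_\alpha\exp\left\{\frac{1}{\theta}\Big[\sum_p\sum_{\alpha'}A_p\,\lambda_p(s_\alpha)\,\lambda_p(s_{\alpha'})\,x_{\alpha'} + \varphi(s_\alpha)\Big]\right\}\ . \tag{7.324}
$$
From the logarithm of $x_\alpha$, we may compute the entropy, and, finally, the free energy:
$$
f(\theta) = \tfrac12\sum_p\sum_{\alpha,\alpha'}A_p\,\lambda_p(s_\alpha)\,\lambda_p(s_{\alpha'})\,x_\alpha\,x_{\alpha'} - \theta\ln Z\ , \tag{7.325}
$$
which is to be evaluated at the solution of eqn. 7.321, $x^*_\alpha(h,\theta)$.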
7.10.2 Mean Field Approximation

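We now derive a mean field approximation in the spirit of that used in the Ising model above. We write
$$
\lambda_p(\sigma) = \big\langle\lambda_p(\sigma)\big\rangle + \delta\lambda_p(\sigma)\ , \tag{7.326}
$$
and abbreviate $\bar\lambda_p=\langle\lambda_p(\sigma)\rangle$, the thermodynamic average of $\lambda_p(\sigma)$ on any given site. We then have
$$
\begin{aligned}
\lambda_p(\sigma)\,\lambda_p(\sigma') &= \bar\lambda_p^2 + \bar\lambda_p\,\delta\lambda_p(\sigma) + \bar\lambda_p\,\delta\lambda_p(\sigma') + \delta\lambda_p(\sigma)\,\delta\lambda_p(\sigma') \\
&= -\bar\lambda_p^2 + \bar\lambda_p\big[\lambda_p(\sigma)+\lambda_p(\sigma')\big] + \delta\lambda_p(\sigma)\,\delta\lambda_p(\sigma')\ .
\end{aligned} \tag{7.327}
$$
The product $\delta\lambda_p(\sigma)\,\delta\lambda_p(\sigma')$ is of second order in fluctuations, and we neglect it. This leads us to the mean field Hamiltonian,
$$
\hat H_{\rm MF} = +\tfrac12 N\hat J(0)\sum_p A_p\,\bar\lambda_p^2 - \sum_i\Big[\hat J(0)\sum_p A_p\,\bar\lambda_p\,\lambda_p(\sigma_i) + \Phi(\sigma_i)\Big]\ . \tag{7.328}
$$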

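The free energy is then
$$
f\big(\{\bar\lambda_p\},\theta\big) = \tfrac12\sum_p A_p\,\bar\lambda_p^2 - \theta\ln\sum_\alpha\exp\left\{\frac{1}{\theta}\Big[\sum_p A_p\,\bar\lambda_p\,\lambda_p(s_\alpha) + \varphi(s_\alpha)\Big]\right\}\ . \tag{7.329}
$$
The variational parameters are the mean field values $\bar\lambda_p$.
The single site probabilities $\{x_\alpha\}$ are then
$$
x_\alpha = \frac{1}{Z}\exp\left\{\frac{1}{\theta}\Big[\sum_p A_p\,\bar\lambda_p\,\lambda_p(s_\alpha) + \varphi(s_\alpha)\Big]\right\}\ , \tag{7.330}
$$
with $Z$ implied by the normalization $\sum_\alpha x_\alpha=1$. These results reproduce exactly what we found in eqn. 7.321, since the mean field equation here, $\partial f/\partial\bar\lambda_p=0$, yields
$$
\bar\lambda_p = \sum_{\alpha=1}^K\lambda_p(s_\alpha)\,x_\alpha\ . \tag{7.331}
$$
The free energy is immediately found to be
$$
f(\theta) = \tfrac12\sum_p A_p\,\bar\lambda_p^2 - \theta\ln Z\ , \tag{7.332}
$$
which again agrees with what we found using the variational density matrix.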
Thus, whether one extremizes with respect to the set $\{x_1,\ldots,x_K,\zeta\}$, or with respect to the set $\{\bar\lambda_p\}$, the results are the same, in terms of all these parameters, as well as the free energy $f(\theta)$. Generically, both approaches may be termed 'mean field theory' since the variational density matrix corresponds to a mean field which acts on each site independently$^{22}$.
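The equivalence argued above can be illustrated numerically. The Python sketch below uses hypothetical helper functions (not from the notes) that iterate eqns. 7.330 and 7.331 to self-consistency for an arbitrary set of states $s_\alpha$, coefficients $A_p$, functions $\lambda_p(\sigma)$ and scaled local field $\varphi(\sigma)$, and then evaluate the free energy both as eqn. 7.319 (variational density matrix) and as eqn. 7.329 (mean field Hamiltonian). The spin-1 example with $\varepsilon(\sigma,\sigma')=\sigma\sigma'$ and $\varphi(\sigma)=h\sigma-\delta\sigma^2$ uses arbitrary illustrative parameter values, and plain undamped iteration is assumed to converge, which holds for these values but not in general.

```python
import math

def mean_field(states, A, lam, phi, theta, lbar, iters=2000):
    """Iterate lbar_p -> sum_alpha lam_p(s_alpha) x_alpha, eqns (7.330)-(7.331).

    states: allowed spin values s_alpha; A: coefficients A_p;
    lam: functions lam_p(sigma); phi: scaled local field phi(sigma);
    lbar: initial guess for the mean fields. Returns (lbar, x).
    """
    P = len(A)
    x = []
    for _ in range(iters):
        w = [math.exp((sum(A[p] * lbar[p] * lam[p](s) for p in range(P)) + phi(s)) / theta)
             for s in states]
        Z = sum(w)
        x = [wi / Z for wi in w]
        lbar = [sum(lam[p](s) * xa for s, xa in zip(states, x)) for p in range(P)]
    return lbar, x

def f_variational(states, A, lam, phi, theta, x):
    """Eqn (7.319): energy plus entropy term for single-site probabilities x_alpha."""
    P = len(A)
    lb = [sum(lam[p](s) * xa for s, xa in zip(states, x)) for p in range(P)]
    return (-0.5 * sum(A[p] * lb[p] ** 2 for p in range(P))
            - sum(phi(s) * xa for s, xa in zip(states, x))
            + theta * sum(xa * math.log(xa) for xa in x))

def f_meanfield(states, A, lam, phi, theta, lbar):
    """Eqn (7.329) evaluated at the mean fields lbar_p."""
    P = len(A)
    Z = sum(math.exp((sum(A[p] * lbar[p] * lam[p](s) for p in range(P)) + phi(s)) / theta)
            for s in states)
    return 0.5 * sum(A[p] * lbar[p] ** 2 for p in range(P)) - theta * math.log(Z)

# Spin-1 example (illustrative values): eps(s, s') = s s', i.e. A_1 = 1, lam_1(s) = s,
# and phi(s) = h s - delta s^2.
states, A, lam = [-1, 0, 1], [1.0], [lambda s: s]
h, delta, theta = 0.05, 0.1, 0.5
phi = lambda s: h * s - delta * s * s
lbar, x = mean_field(states, A, lam, phi, theta, lbar=[1.0])
print(lbar, f_variational(states, A, lam, phi, theta, x),
      f_meanfield(states, A, lam, phi, theta, lbar))
```

With `states = [-1, 1]` and `phi = lambda s: h * s`, the same routine returns a mean field satisfying $m=\tanh\big((m+h)/\theta\big)$, the two-state Ising result.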
$^{22}$ The function $\Phi(\sigma)$ may involve one or more adjustable parameters which could correspond, for example, to an external magnetic field $h$. We suppress these parameters when we write the free energy as $f(\theta)$.
7.11. APPENDIX II : ADDITIONAL EXAMPLES 429

7.11 Appendix II : Additional Examples

7.11.1 Blume-Capel model

The Blume-Capel model provides a simple and convenient way to model systems with vacancies. The simplest version of the model is written
$$
\hat H = -\tfrac12\sum_{i,j}J_{ij}\,S_i\,S_j + \Delta\sum_i S_i^2\ . \tag{7.333}
$$
The spin variables $S_i$ range over the values $\{-1,0,+1\}$, so this is an extension of the $S=1$ Ising model. We explicitly separate out the diagonal terms, writing $J_{ii}\equiv0$, and placing them in the second term on the RHS above. We say that site $i$ is occupied if $S_i=\pm1$ and vacant if $S_i=0$, and we identify $-\Delta$ as the vacancy creation energy, which may be positive or negative, depending on whether vacancies are disfavored or favored in our system.
We make the mean field Ansatz, writing $S_i=m+\delta S_i$. This results in the mean field Hamiltonian,
$$
\hat H_{\rm MF} = \tfrac12 N\hat J(0)\,m^2 - \hat J(0)\,m\sum_i S_i + \Delta\sum_i S_i^2\ . \tag{7.334}
$$
Once again, we adimensionalize, writing $f\equiv F/N\hat J(0)$, $\theta=k_{\rm B}T/\hat J(0)$, and $\delta=\Delta/\hat J(0)$. We assume $\hat J(0)>0$. The free energy per site is then
$$
f(\theta,\delta,m) = \tfrac12 m^2 - \theta\ln\!\Big(1+2e^{-\delta/\theta}\cosh(m/\theta)\Big)\ . \tag{7.335}
$$

Extremizing with respect to $m$, we obtain the mean field equation,
$$
m = \frac{2\sinh(m/\theta)}{\exp(\delta/\theta)+2\cosh(m/\theta)}\ . \tag{7.336}
$$
Note that $m=0$ is always a solution. Finding the slope of the RHS at $m=0$ and setting it to unity gives us the critical temperature:
$$
\theta_c = \frac{2}{\exp(\delta/\theta_c)+2}\ . \tag{7.337}
$$
This is an implicit equation for $\theta_c$ in terms of the vacancy energy $\delta$.
Let's now expand the free energy in terms of the magnetization $m$. We find, to fourth order,
$$
\begin{aligned}
f &= -\theta\ln\!\big(1+2e^{-\delta/\theta}\big) + \frac{1}{2\theta}\left(\theta - \frac{2}{2+\exp(\delta/\theta)}\right)m^2 \\
&\qquad + \frac{1}{12\,\big(2+\exp(\delta/\theta)\big)\,\theta^3}\left(\frac{6}{2+\exp(\delta/\theta)} - 1\right)m^4 + \ldots\ .
\end{aligned} \tag{7.338}
$$
Note that setting the coefficient of the $m^2$ term to zero yields the equation for $\theta_c$. However, upon further examination, we see that the coefficient of the $m^4$ term can also vanish. As we have seen, when both the

Figure 7.24: Mean field phase diagram for the Blume-Capel model. The black dot signifies a tricritical
point, where the coefficients of m2 and m4 in the Landau free energy expansion both vanish. The dashed
curve denotes a first order transition, and the solid curve a second order transition. The thin dotted line
is the continuation of the θc (δ) relation to zero temperature.

coefficients of the $m^2$ and the $m^4$ terms vanish, we have a tricritical point$^{23}$. Setting both coefficients to zero, we obtain
$$
\theta_t = \tfrac13\ ,\qquad \delta_t = \tfrac23\ln2\ . \tag{7.339}
$$

At $\theta=0$, it is easy to see we have a first order transition, simply by comparing the energies of the paramagnetic ($S_i=0$) and ferromagnetic ($S_i=+1$ or $S_i=-1$) states. We have
$$
\frac{E_{\rm MF}}{N\hat J(0)} = \begin{cases} 0 & \text{if } m=0 \\ -\tfrac12+\delta & \text{if } m=\pm1\ , \end{cases} \tag{7.340}
$$
so the two states are degenerate at $\delta=\frac12$. These results are in fact exact, and not only valid for the mean field theory. Mean field theory is approximate because it neglects fluctuations, but at zero temperature, there are no fluctuations to neglect!
The phase diagram is shown in fig. 7.24. Note that for δ large and negative, vacancies are strongly
disfavored, hence the only allowed states on each site have Si = ±1, which is our old friend the two-state
Ising model. Accordingly, the phase boundary there approaches the vertical line θc = 1, which is the
mean field transition temperature for the two-state Ising model.
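The Python sketch below (an illustration, not from the original notes) evaluates the $m^2$ and $m^4$ coefficients of eqn. 7.338, confirms that both vanish at the tricritical point of eqn. 7.339, and spot-checks the expansion against the full mean field free energy of eqn. 7.335 at an arbitrarily chosen point $(\theta,\delta)=(0.5,0)$.

```python
import math

def landau_coeffs(theta, delta):
    """Coefficients of m^2 and m^4 in eqn (7.338)."""
    p = 2.0 / (2.0 + math.exp(delta / theta))
    c2 = (theta - p) / (2.0 * theta)
    c4 = p * (3.0 * p - 1.0) / (24.0 * theta ** 3)
    return c2, c4

def f_mf(m, theta, delta):
    """Mean field free energy per site, eqn (7.335)."""
    return 0.5 * m * m - theta * math.log(1.0 + 2.0 * math.exp(-delta / theta) * math.cosh(m / theta))

theta_t, delta_t = 1.0 / 3.0, (2.0 / 3.0) * math.log(2.0)
print(landau_coeffs(theta_t, delta_t))   # both vanish, up to rounding, at the point of eqn (7.339)

theta, delta, m = 0.5, 0.0, 0.01         # illustrative point for checking the expansion
c2, c4 = landau_coeffs(theta, delta)
print(f_mf(m, theta, delta) - f_mf(0.0, theta, delta), c2 * m**2 + c4 * m**4)
```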

7.11.2 Ising antiferromagnet in an external field

Consider the following model:
$$
\hat H = J\sum_{\langle ij\rangle}\sigma_i\,\sigma_j - H\sum_i\sigma_i\ , \tag{7.341}
$$
with $J>0$ and $\sigma_i=\pm1$. We've solved for the mean field phase diagram of the Ising ferromagnet; what happens if the interactions are antiferromagnetic?
$^{23}$ We should really check that the coefficient of the sixth order term is positive, but that is left as an exercise to the eager student.
It turns out that under certain circumstances, the ferromagnet and the antiferromagnet behave exactly
the same in terms of their phase diagram, response functions, etc. This occurs when H = 0, and when
the interactions are between nearest neighbors on a bipartite lattice. A bipartite lattice is one which can
be divided into two sublattices, which we call A and B, such that an A site has only B neighbors, and a B
site has only A neighbors. The square, honeycomb, and body centered cubic (BCC) lattices are bipartite.
The triangular and face centered cubic lattices are non-bipartite. Now if the lattice is bipartite and the interaction matrix $J_{ij}$ is nonzero only when $i$ and $j$ are from different sublattices (they needn't be nearest neighbors only), then we can simply redefine the spin variables such that
$$
\sigma'_j = \begin{cases} +\sigma_j & \text{if } j\in A \\ -\sigma_j & \text{if } j\in B\ . \end{cases} \tag{7.342}
$$
Then $\sigma'_i\sigma'_j=-\sigma_i\sigma_j$, and in terms of the new spin variables the exchange constant has reversed. The thermodynamic properties are invariant under such a redefinition of the spin variables.
This trick fails in the presence of a uniform magnetic field, because the field $H$ would have to be reversed on the B sublattice. In other words, the thermodynamics of an Ising ferromagnet on a bipartite lattice in a uniform applied field is identical to that of the Ising antiferromagnet, with the same exchange constant (in magnitude), in the presence of a staggered field $H_A=+H$ and $H_B=-H$.
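We treat this problem using the variational density matrix method, using two independent variational parameters $m_A$ and $m_B$ for the two sublattices:
$$
\varrho_A(\sigma) = \frac{1+m_A}{2}\,\delta_{\sigma,1} + \frac{1-m_A}{2}\,\delta_{\sigma,-1}\ ,\qquad \varrho_B(\sigma) = \frac{1+m_B}{2}\,\delta_{\sigma,1} + \frac{1-m_B}{2}\,\delta_{\sigma,-1}\ . \tag{7.343}
$$
With the usual adimensionalization, $f=F/NzJ$, $\theta=k_{\rm B}T/zJ$, and $h=H/zJ$, we have the free energy
$$
f(m_A,m_B) = \tfrac12 m_Am_B - \tfrac12 h\,(m_A+m_B) - \tfrac12\theta\,s(m_A) - \tfrac12\theta\,s(m_B)\ , \tag{7.344}
$$
where the entropy function is
$$
s(m) = -\left[\frac{1+m}{2}\ln\left(\frac{1+m}{2}\right) + \frac{1-m}{2}\ln\left(\frac{1-m}{2}\right)\right]\ . \tag{7.345}
$$
Note that
$$
\frac{ds}{dm} = -\tfrac12\ln\left(\frac{1+m}{1-m}\right)\ ,\qquad \frac{d^2s}{dm^2} = -\frac{1}{1-m^2}\ . \tag{7.346}
$$
Differentiating $f(m_A,m_B)$ with respect to the variational parameters, we obtain two coupled mean field equations:
$$
\begin{aligned}
\frac{\partial f}{\partial m_A} = 0 \quad&\Longrightarrow\quad m_B = h - \frac{\theta}{2}\ln\left(\frac{1+m_A}{1-m_A}\right) \\
\frac{\partial f}{\partial m_B} = 0 \quad&\Longrightarrow\quad m_A = h - \frac{\theta}{2}\ln\left(\frac{1+m_B}{1-m_B}\right)\ .
\end{aligned} \tag{7.347}
$$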
Figure 7.25: Graphical solution to the mean field equations for the Ising antiferromagnet in an external
field, here for θ = 0.6. Clockwise from upper left: (a) h = 0.1, (b) h = 0.5, (c) h = 1.1, (d) h = 1.4.

Recognizing $\tanh^{-1}(x)=\frac12\ln\big[(1+x)/(1-x)\big]$, we may write these equations in an equivalent but perhaps more suggestive form:
$$
m_A = \tanh\left(\frac{h-m_B}{\theta}\right)\ ,\qquad m_B = \tanh\left(\frac{h-m_A}{\theta}\right)\ . \tag{7.348}
$$
In other words, the A sublattice sites see an internal field $H_{A,{\rm int}}=-zJm_B$ from their B neighbors, and the B sublattice sites see an internal field $H_{B,{\rm int}}=-zJm_A$ from their A neighbors.
We can solve these equations graphically, as in fig. 7.25. Note that there is always a paramagnetic solution with $m_A=m_B=m$, where
$$
m = h - \frac{\theta}{2}\ln\left(\frac{1+m}{1-m}\right) \quad\Longleftrightarrow\quad m = \tanh\left(\frac{h-m}{\theta}\right)\ . \tag{7.349}
$$
However, we can see from the figure that there will be three solutions to the mean field equations provided that $\partial m_A/\partial m_B<-1$ at the point of the solution where $m_A=m_B=m$. This gives us two equations with which to eliminate $m_A$ and $m_B$, resulting in the curve
$$
h^*(\theta) = m + \frac{\theta}{2}\ln\left(\frac{1+m}{1-m}\right)\qquad\text{with}\qquad m=\sqrt{1-\theta}\ . \tag{7.350}
$$
Figure 7.26: Mean field phase diagram for the Ising antiferromagnet in an external field. The phase
diagram is symmetric under reflection in the h = 0 axis.

Thus, for $\theta<1$ and $|h|<h^*(\theta)$ there are three solutions to the mean field equations. It is usually the case that the broken symmetry solutions, i.e. those for which $m_A\ne m_B$, are of lower energy than the symmetric solution(s). We show the curve $h^*(\theta)$ in fig. 7.26.
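The boundary $h^*(\theta)$ of eqn. 7.350 is easy to evaluate. The Python sketch below (an illustrative check, not from the notes) computes it at $\theta=0.6$, the value used in fig. 7.25, and verifies that on this curve the symmetric solution obeys eqn. 7.349 and the slope condition $\theta/(1-m^2)=1$.

```python
import math

def h_star(theta):
    """Edge of the three-solution region, eqn (7.350), for 0 < theta <= 1."""
    m = math.sqrt(1.0 - theta)
    return m + 0.5 * theta * math.log((1.0 + m) / (1.0 - m))

theta = 0.6                                  # the temperature used in fig. 7.25
m = math.sqrt(1.0 - theta)
hs = h_star(theta)
print(hs)                                    # roughly 1.08
print(m - math.tanh((hs - m) / theta))       # eqn (7.349) holds on the curve (zero up to rounding)
print(theta / (1.0 - m * m))                 # slope condition, equal to 1 up to rounding
```

Since $h^*(0.6)\approx1.08$, the fields $h=0.1$ and $h=0.5$ lie inside the three-solution region while $h=1.1$ and $h=1.4$ lie outside it. The limits $h^*(1)=0$ and $h^*(\theta\to0)\to1$ agree with the continuous transition at $\theta=1$, $h=0$ and with the zero temperature transition field $h=1$ obtained from eqn. 7.354.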
We can make additional progress by defining the average and staggered magnetizations $m$ and $m_s$,
$$
m \equiv \tfrac12(m_A+m_B)\ ,\qquad m_s \equiv \tfrac12(m_A-m_B)\ . \tag{7.351}
$$
We expand the free energy in terms of $m_s$:
$$
\begin{aligned}
f(m,m_s) &= \tfrac12 m^2 - \tfrac12 m_s^2 - hm - \tfrac12\theta\,s(m+m_s) - \tfrac12\theta\,s(m-m_s) \\
&= \tfrac12 m^2 - hm - \theta\,s(m) - \tfrac12\big[1+\theta\,s''(m)\big]\,m_s^2 - \tfrac{1}{24}\theta\,s''''(m)\,m_s^4 + \ldots\ .
\end{aligned} \tag{7.352}
$$

The term quadratic in $m_s$ vanishes when $\theta\,s''(m)=-1$, i.e. when $m=\sqrt{1-\theta}$. It is easy to obtain
$$
\frac{d^3s}{dm^3} = -\frac{2m}{(1-m^2)^2}\ ,\qquad \frac{d^4s}{dm^4} = -\frac{2\,(1+3m^2)}{(1-m^2)^3}\ , \tag{7.353}
$$
from which we learn that the coefficient of the quartic term, $-\frac{1}{24}\theta\,s''''(m)$, never vanishes. Therefore the transition remains second order down to $\theta=0$, where it finally becomes first order.
We can confirm the $\theta\to0$ limit directly. The two competing states are the ferromagnet, with $m_A=m_B=\pm1$, and the antiferromagnet, with $m_A=-m_B=\pm1$. The free energies of these states are
$$
f^{\rm FM} = \tfrac12 - h\ ,\qquad f^{\rm AFM} = -\tfrac12\ . \tag{7.354}
$$
There is a first order transition when $f^{\rm FM}=f^{\rm AFM}$, which yields $h=1$.


7.11.3 Canted quantum antiferromagnet

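Consider the following model for quantum $S=\frac12$ spins:
$$
\hat H = \sum_{\langle ij\rangle}\Big[-J\big(\sigma_i^x\sigma_j^x+\sigma_i^y\sigma_j^y\big) + \Delta\,\sigma_i^z\sigma_j^z\Big] + \tfrac14 K\sum_{\langle ijkl\rangle}\sigma_i^z\,\sigma_j^z\,\sigma_k^z\,\sigma_l^z\ , \tag{7.355}
$$
where $\boldsymbol\sigma_i$ is the vector of Pauli matrices on site $i$. The spins live on a square lattice. The second sum is over all square plaquettes. All the constants $J$, $\Delta$, and $K$ are positive.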
Let’s take a look at the Hamiltonian for a moment. The J term clearly wants the spins to align ferro-
magnetically in the (x, y) plane (in internal spin space). The ∆ term prefers antiferromagnetic alignment
along the ẑ axis. The K term discourages any kind of moment along ẑ and works against the ∆ term.
We’d like our mean field theory to capture the physics behind this competition.
Accordingly, we break up the square lattice into two interpenetrating $\sqrt2\times\sqrt2$ square sublattices (each rotated by $45^\circ$ with respect to the original), in order to be able to describe an antiferromagnetic state. In addition, we include a parameter $\alpha$ which describes the canting angle that the spins on these sublattices make with respect to the $\hat{\boldsymbol z}$-axis. That is, we write
$$
\varrho_A = \tfrac12 + \tfrac12 m\,\big(\sin\alpha\,\sigma^x + \cos\alpha\,\sigma^z\big)\ ,\qquad \varrho_B = \tfrac12 + \tfrac12 m\,\big(\sin\alpha\,\sigma^x - \cos\alpha\,\sigma^z\big)\ . \tag{7.356}
$$
Note that ${\rm Tr}\,\varrho_A={\rm Tr}\,\varrho_B=1$, so these density matrices are normalized. Note also that the mean direction for a spin on the A and B sublattices is given by
$$
\boldsymbol m_{A,B} = {\rm Tr}\,(\varrho_{A,B}\,\boldsymbol\sigma) = \pm m\cos\alpha\,\hat{\boldsymbol z} + m\sin\alpha\,\hat{\boldsymbol x}\ . \tag{7.357}
$$
Thus, when $\alpha=0$, the system is an antiferromagnet with its staggered moment lying along the $\hat{\boldsymbol z}$ axis. When $\alpha=\frac12\pi$, the system is a ferromagnet with its moment lying along the $\hat{\boldsymbol x}$ axis.
Finally, the eigenvalues of $\varrho_{A,B}$ are still $\lambda_\pm=\frac12(1\pm m)$, hence
$$
s(m) \equiv -{\rm Tr}\,(\varrho_A\ln\varrho_A) = -{\rm Tr}\,(\varrho_B\ln\varrho_B) = -\left[\frac{1+m}{2}\ln\left(\frac{1+m}{2}\right) + \frac{1-m}{2}\ln\left(\frac{1-m}{2}\right)\right]\ . \tag{7.358}
$$
Note that we have taken $m_A=m_B=m$, unlike the case of the antiferromagnet in a uniform field. The reason is that there remains in our model a symmetry between A and B sublattices.
The free energy is now easily calculated:
$$
\begin{aligned}
F &= {\rm Tr}\,(\varrho\hat H) + k_{\rm B}T\,{\rm Tr}\,(\varrho\ln\varrho) \\
&= -2N\big(J\sin^2\!\alpha + \Delta\cos^2\!\alpha\big)\,m^2 + \tfrac14 NK\,m^4\cos^4\!\alpha - Nk_{\rm B}T\,s(m)\ .
\end{aligned} \tag{7.359}
$$
Figure 7.27: Mean field phase diagram for the model of eqn. 7.355 for the case κ = 1.

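We can adimensionalize by defining $\delta\equiv\Delta/J$, $\kappa\equiv K/4J$, and $\theta\equiv k_{\rm B}T/4J$. Then the free energy per site, $f\equiv F/4NJ$, is
$$
f(m,\alpha) = -\tfrac12 m^2 + \tfrac12\big(1-\delta\big)\,m^2\cos^2\!\alpha + \tfrac14\kappa\,m^4\cos^4\!\alpha - \theta\,s(m)\ . \tag{7.360}
$$
There are two variational parameters: $m$ and $\alpha$. We thus obtain two coupled mean field equations,
$$
\begin{aligned}
\frac{\partial f}{\partial m} = 0 \quad&\Longrightarrow\quad 0 = -m + \big(1-\delta\big)\,m\cos^2\!\alpha + \kappa\,m^3\cos^4\!\alpha + \tfrac12\theta\ln\left(\frac{1+m}{1-m}\right) \\
\frac{\partial f}{\partial\alpha} = 0 \quad&\Longrightarrow\quad 0 = \Big(1-\delta+\kappa\,m^2\cos^2\!\alpha\Big)\,m^2\sin\alpha\cos\alpha\ .
\end{aligned} \tag{7.361}
$$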

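Let's start with the second of the mean field equations. Assuming $m\ne0$, it is clear from eqn. 7.360 that
$$
\cos^2\!\alpha = \begin{cases} 0 & \text{if } \delta<1 \\ (\delta-1)/\kappa m^2 & \text{if } 1\le\delta\le1+\kappa m^2 \\ 1 & \text{if } \delta\ge1+\kappa m^2\ . \end{cases} \tag{7.362}
$$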
Suppose δ < 1. Then we have cos α = 0 and the first mean field equation yields the familiar result

m = tanh m/θ . (7.363)

Along the θ axis, then, we have the usual ferromagnet-paramagnet transition at θc = 1.
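The self-consistency condition m = tanh(m/θ) of eqn. 7.363 can be solved by simple fixed-point iteration. The sketch below is our own illustration (the function name, seed value, and tolerance are arbitrary choices): starting from a nonzero seed, it converges to the nontrivial root when one exists and to m = 0 above θ_c = 1. Convergence slows as θ → 1, where the slope of the iteration map approaches unity.

```python
import math


def solve_tanh(theta, m_start=0.9, tol=1e-12, max_iter=100_000):
    """Solve m = tanh(m/theta) (eqn. 7.363) by fixed-point iteration."""
    m = m_start
    for _ in range(max_iter):
        m_new = math.tanh(m / theta)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m


for theta in (0.5, 0.9, 0.99, 1.1):
    print(f"theta = {theta:5.2f}   m = {solve_tanh(theta):.6f}")
```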


For 1 < δ < 1 + κm² we have canting with an angle

$$
\alpha = \alpha^*(m) = \cos^{-1}\sqrt{\frac{\delta-1}{\kappa m^2}} . \tag{7.364}
$$
436 CHAPTER 7. MEAN FIELD THEORY OF PHASE TRANSITIONS


Substituting this into the first mean field equation, the terms proportional to 1 − δ and to κ cancel, and we once again obtain the relation m = tanh(m/θ). However, eventually, as θ is increased, the magnetization will dip below the value $m_0 \equiv \sqrt{(\delta-1)/\kappa}$. This occurs at a dimensionless temperature

$$
\theta_0 = \frac{m_0}{\tanh^{-1}(m_0)} < 1 \quad ; \quad m_0 = \sqrt{\frac{\delta-1}{\kappa}} . \tag{7.365}
$$
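As a quick numerical illustration of eqn. 7.365 (a sketch; δ = 1.5 and κ = 1 are arbitrary values), θ₀ follows directly from m₀. The formula requires 1 < δ < 1 + κ, so that 0 < m₀ < 1, and the check at the end confirms that the ferromagnetic solution of m = tanh(m/θ) passes through m₀ at θ = θ₀.

```python
import math


def theta0(delta, kappa):
    """theta_0 = m0 / artanh(m0), with m0 = sqrt((delta - 1)/kappa) (eqn. 7.365)."""
    m0 = math.sqrt((delta - 1.0) / kappa)
    if not 0.0 < m0 < 1.0:
        raise ValueError("eqn. 7.365 needs 1 < delta < 1 + kappa")
    return m0 / math.atanh(m0)


delta, kappa = 1.5, 1.0                   # illustrative values
t0 = theta0(delta, kappa)
m0 = math.sqrt((delta - 1.0) / kappa)
print(f"m0 = {m0:.4f}, theta0 = {t0:.4f}")
# At theta0 the ferromagnetic solution m = tanh(m/theta) passes through m0:
print(math.tanh(m0 / t0))
```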

For θ > θ₀, we have δ > 1 + κm², and we must take cos²α = 1. The first mean field equation then becomes

$$
\delta m - \kappa m^3 = \frac{\theta}{2}\ln\!\left(\frac{1+m}{1-m}\right) , \tag{7.366}
$$

or, equivalently, m = tanh((δm − κm³)/θ). A simple graphical analysis shows that a nontrivial solution exists provided θ < δ. Since cos α = ±1, this solution describes an antiferromagnet, with $\boldsymbol m_A = \pm m\hat{\boldsymbol z}$ and $\boldsymbol m_B = \mp m\hat{\boldsymbol z}$. The resulting mean field phase diagram is then as depicted in fig. 7.27.
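The four regimes of fig. 7.27 can also be located by brute force. The sketch below is our own (it is not how the figure in the notes was produced): it evaluates f(m, α) of eqn. 7.360 on a grid in m and c = cos²α, takes the global minimum, and labels it paramagnet (m ≈ 0), ferromagnet (c ≈ 0), antiferromagnet (c ≈ 1), or canted. The grid size and the thresholds 0.02 and 0.98 are arbitrary choices.

```python
import numpy as np


def entropy(m):
    """s(m) of eqn. 7.358."""
    a, b = (1 + m) / 2, (1 - m) / 2
    return -(a * np.log(a) + b * np.log(b))


def free_energy(m, c, delta, kappa, theta):
    """f(m, alpha) of eqn. 7.360 written in terms of c = cos^2(alpha)."""
    return (-0.5 * m**2 + 0.5 * (1 - delta) * m**2 * c
            + 0.25 * kappa * m**4 * c**2 - theta * entropy(m))


def classify(delta, kappa, theta, n=401):
    """Label the global minimum of f on an n x n grid in (m, cos^2 alpha)."""
    m = np.linspace(0.0, 0.999, n)[:, None]
    c = np.linspace(0.0, 1.0, n)[None, :]
    f = free_energy(m, c, delta, kappa, theta)
    i, j = np.unravel_index(np.argmin(f), f.shape)
    m_min, c_min = m[i, 0], c[0, j]
    if m_min < 0.02:
        return "paramagnet"
    if c_min < 0.02:
        return "ferromagnet"
    if c_min > 0.98:
        return "antiferromagnet"
    return "canted"


for delta, theta in [(0.5, 0.5), (0.5, 1.5), (1.5, 0.3), (1.5, 1.2), (1.5, 2.0)]:
    print(delta, theta, classify(delta, 1.0, theta))
```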

7.11.4 Coupled order parameters

Consider the Landau free energy

$$
f(m,\phi) = \tfrac12 a_m m^2 + \tfrac14 b_m m^4 + \tfrac12 a_\phi\phi^2 + \tfrac14 b_\phi\phi^4 + \tfrac12\Lambda\,m^2\phi^2 . \tag{7.367}
$$

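We write

$$
a_m \equiv \alpha_m\theta_m \quad , \quad a_\phi \equiv \alpha_\phi\theta_\phi , \tag{7.368}
$$

where

$$
\theta_m = \frac{T - T_{c,m}}{T_0} \quad , \quad \theta_\phi = \frac{T - T_{c,\phi}}{T_0} , \tag{7.369}
$$

where $T_0$ is some temperature scale. We assume without loss of generality that $T_{c,m} > T_{c,\phi}$. We begin by rescaling:

$$
m \equiv \left(\frac{\alpha_m}{b_m}\right)^{1/2}\tilde m \quad , \quad \phi \equiv \left(\frac{\alpha_\phi}{b_\phi}\right)^{1/2}\tilde\phi . \tag{7.370}
$$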
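We then have

$$
f = \varepsilon_0\Big[\, r\big(\tfrac12\theta_m\tilde m^2 + \tfrac14\tilde m^4\big) + r^{-1}\big(\tfrac12\theta_\phi\tilde\phi^2 + \tfrac14\tilde\phi^4\big) + \tfrac12\lambda\,\tilde m^2\tilde\phi^2\Big] , \tag{7.371}
$$

where

$$
\varepsilon_0 = \frac{\alpha_m\alpha_\phi}{(b_m b_\phi)^{1/2}} \quad , \quad r = \frac{\alpha_m}{\alpha_\phi}\left(\frac{b_\phi}{b_m}\right)^{1/2} \quad , \quad \lambda = \frac{\Lambda}{(b_m b_\phi)^{1/2}} . \tag{7.372}
$$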
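It proves convenient to perform one last rescaling, writing

$$
\tilde m \equiv r^{-1/4}\,m \quad , \quad \tilde\phi \equiv r^{1/4}\,\varphi . \tag{7.373}
$$

Then

$$
f = \varepsilon_0\Big[\tfrac12 q\,\theta_m m^2 + \tfrac14 m^4 + \tfrac12 q^{-1}\theta_\phi\,\varphi^2 + \tfrac14\varphi^4 + \tfrac12\lambda\,m^2\varphi^2\Big] , \tag{7.374}
$$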

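where

$$
q = \sqrt{r} = \left(\frac{\alpha_m}{\alpha_\phi}\right)^{1/2}\left(\frac{b_\phi}{b_m}\right)^{1/4} . \tag{7.375}
$$

Note that we may write

$$
f(m,\varphi) = \frac{\varepsilon_0}{4}\begin{pmatrix} m^2 & \varphi^2\end{pmatrix}\begin{pmatrix}1 & \lambda \\ \lambda & 1\end{pmatrix}\begin{pmatrix} m^2 \\ \varphi^2\end{pmatrix}
+ \frac{\varepsilon_0}{2}\begin{pmatrix} m^2 & \varphi^2\end{pmatrix}\begin{pmatrix} q\,\theta_m \\ q^{-1}\theta_\phi\end{pmatrix} . \tag{7.376}
$$

The eigenvalues of the above 2 × 2 matrix are 1 ± λ, with corresponding eigenvectors (1, ±1)ᵀ. Since m² and ϕ² are both nonnegative, we are only interested in the first eigenvector (1, 1)ᵀ, corresponding to the eigenvalue 1 + λ. Clearly when λ < −1 the free energy is unbounded from below, which is unphysical.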
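A short numerical check of the statement about the 2 × 2 matrix (an illustration; the values of λ and the angular resolution are our choices): the sketch confirms the eigenvalues 1 ± λ and scans directions in the quadrant m², ϕ² ≥ 0 to test whether the quartic part m⁴ + 2λm²ϕ² + ϕ⁴ can become negative, which happens only for λ < −1.

```python
import numpy as np


def quartic_eigenvalues(lam):
    """Eigenvalues of [[1, lam], [lam, 1]] in ascending order."""
    return np.linalg.eigvalsh(np.array([[1.0, lam], [lam, 1.0]]))


def quartic_is_bounded(lam, n=2001):
    """True if x^2 + 2*lam*x*y + y^2 >= 0 for all x, y >= 0 (x = m^2, y = phi^2)."""
    t = np.linspace(0.0, np.pi / 2, n)        # directions in the first quadrant
    x, y = np.cos(t), np.sin(t)
    return bool(np.all(x**2 + 2 * lam * x * y + y**2 >= 0.0))


for lam in (0.5, -0.9, -1.1):
    print(lam, quartic_eigenvalues(lam), quartic_is_bounded(lam))
```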
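We now set

$$
\frac{\partial f}{\partial m} = 0 \quad , \quad \frac{\partial f}{\partial\varphi} = 0 , \tag{7.377}
$$

and identify four possible phases:

• Phase I : m = 0, ϕ = 0. The free energy is $f_{\rm I} = 0$.

• Phase II : m ≠ 0 with ϕ = 0. The free energy is

$$
f = \frac{\varepsilon_0}{2}\big(q\,\theta_m m^2 + \tfrac12 m^4\big) , \tag{7.378}
$$

hence we require $\theta_m < 0$ in this phase, in which case

$$
m_{\rm II} = \sqrt{-q\,\theta_m} \quad , \quad f_{\rm II} = -\frac{\varepsilon_0}{4}\,q^2\theta_m^2 . \tag{7.379}
$$

• Phase III : m = 0 with ϕ ≠ 0. The free energy is

$$
f = \frac{\varepsilon_0}{2}\big(q^{-1}\theta_\phi\,\varphi^2 + \tfrac12\varphi^4\big) , \tag{7.380}
$$

hence we require $\theta_\phi < 0$ in this phase, in which case

$$
\varphi_{\rm III} = \sqrt{-q^{-1}\theta_\phi} \quad , \quad f_{\rm III} = -\frac{\varepsilon_0}{4}\,q^{-2}\theta_\phi^2 . \tag{7.381}
$$

• Phase IV : m ≠ 0 and ϕ ≠ 0. Varying f yields

$$
\begin{pmatrix}1 & \lambda \\ \lambda & 1\end{pmatrix}\begin{pmatrix} m^2 \\ \varphi^2\end{pmatrix} = -\begin{pmatrix} q\,\theta_m \\ q^{-1}\theta_\phi\end{pmatrix} , \tag{7.382}
$$

with solution

$$
m^2 = \frac{q\,\theta_m - q^{-1}\theta_\phi\,\lambda}{\lambda^2-1} \quad , \quad
\varphi^2 = \frac{q^{-1}\theta_\phi - q\,\theta_m\,\lambda}{\lambda^2-1} . \tag{7.383}
$$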

Since m² and ϕ² must each be nonnegative, phase IV exists only over a yet-to-be-determined subset of the entire parameter space. The free energy is

$$
f_{\rm IV} = \frac{\varepsilon_0\,\big(q^2\theta_m^2 + q^{-2}\theta_\phi^2 - 2\lambda\,\theta_m\theta_\phi\big)}{4(\lambda^2-1)} . \tag{7.384}
$$

We now define $\theta \equiv \theta_m$ and $\tau \equiv \theta_\phi - \theta_m = (T_{c,m} - T_{c,\phi})/T_0$. Note that τ > 0. There are three possible temperature ranges to consider.

(1) $\theta_\phi > \theta_m > 0$. The only possible phases are I and IV. For phase IV, we must impose the conditions m² > 0 and ϕ² > 0. If λ² > 1, then the numerators in eqns. 7.383 must each be positive:

$$
\lambda < \frac{q^2\theta_m}{\theta_\phi} \quad , \quad \lambda < \frac{\theta_\phi}{q^2\theta_m} \quad \Rightarrow \quad \lambda < \min\!\left(\frac{q^2\theta_m}{\theta_\phi}\,,\,\frac{\theta_\phi}{q^2\theta_m}\right) . \tag{7.385}
$$

But since either $q^2\theta_m/\theta_\phi$ or its inverse must be less than or equal to unity, this requires λ < −1, which is unphysical.
If on the other hand we assume λ² < 1, the nonnegativity of m² and ϕ² requires

$$
\lambda > \frac{q^2\theta_m}{\theta_\phi} \quad , \quad \lambda > \frac{\theta_\phi}{q^2\theta_m} \quad \Rightarrow \quad \lambda > \max\!\left(\frac{q^2\theta_m}{\theta_\phi}\,,\,\frac{\theta_\phi}{q^2\theta_m}\right) \ge 1 . \tag{7.386}
$$

Thus, λ > 1 and we have a contradiction.
Therefore, the only allowed phase for θ > 0 is phase I.

(2) $\theta_\phi > 0 > \theta_m$. Now the possible phases are I, II, and IV. We can immediately rule out phase I because $f_{\rm II} < f_{\rm I}$. To compare phases II and IV, we compute

$$
\Delta f = f_{\rm IV} - f_{\rm II} = \frac{\varepsilon_0\,\big(q\lambda\theta_m - q^{-1}\theta_\phi\big)^2}{4(\lambda^2-1)} . \tag{7.387}
$$

Thus, phase II has the lower energy if λ² > 1. For λ² < 1, phase IV has the lower energy, but the conditions m² > 0 and ϕ² > 0 then entail

$$
\frac{q^2\theta_m}{\theta_\phi} < \lambda < \frac{\theta_\phi}{q^2\theta_m} \quad \Rightarrow \quad q^2|\theta_m| > \theta_\phi > 0 . \tag{7.388}
$$

Thus, λ is restricted to the range

$$
\lambda \in \left(-1\,,\,-\frac{\theta_\phi}{q^2|\theta_m|}\right) . \tag{7.389}
$$

With $\theta_m \equiv \theta < 0$ and $\theta_\phi \equiv \theta + \tau > 0$, the condition $q^2|\theta_m| > \theta_\phi$ is found to be

$$
-\tau < \theta < -\frac{\tau}{q^2+1} . \tag{7.390}
$$

Thus, phase IV exists and has lower energy when

$$
-\tau < \theta < -\frac{\tau}{r+1} \quad \text{and} \quad -1 < \lambda < \frac{\theta+\tau}{r\theta} = -\frac{\theta+\tau}{r|\theta|} , \tag{7.391}
$$

where r = q².

Figure 7.28: Phase diagram for τ = 0.5, r = 1.5 (top) and τ = 0.5, r = 0.25 (bottom). The hatched purple region is unphysical, with a free energy unbounded from below. The blue lines denote second order transitions. The thick red line separating phases II and III is a first order line.

(3) $0 > \theta_\phi > \theta_m$. In this regime, any phase is possible; however, once again phase I can be ruled out since phases II and III are of lower free energy. The condition that phase II have lower free energy than phase III is

$$
f_{\rm II} - f_{\rm III} = \frac{\varepsilon_0}{4}\big(q^{-2}\theta_\phi^2 - q^2\theta_m^2\big) < 0 , \tag{7.392}
$$

i.e. $|\theta_\phi| < r|\theta_m|$, which means $r|\theta| > |\theta| - \tau$. If r > 1 this is true throughout this regime, while if r < 1 phase II is lower in energy only for $|\theta| < \tau/(1-r)$.
We next need to test whether phase IV has an even lower energy than the lower of phases II and III. We have

$$
\begin{aligned}
f_{\rm IV} - f_{\rm II} &= \frac{\varepsilon_0\,\big(q\lambda\theta_m - q^{-1}\theta_\phi\big)^2}{4(\lambda^2-1)} \\
f_{\rm IV} - f_{\rm III} &= \frac{\varepsilon_0\,\big(q\theta_m - q^{-1}\lambda\theta_\phi\big)^2}{4(\lambda^2-1)} .
\end{aligned} \tag{7.393}
$$

In both cases, phase IV can only be the true thermodynamic phase if λ² < 1. We then require m² > 0 and ϕ² > 0, which fixes

$$
\lambda \in \left(-1\,,\,\min\!\left(\frac{q^2\theta_m}{\theta_\phi}\,,\,\frac{\theta_\phi}{q^2\theta_m}\right)\right] . \tag{7.394}
$$

The upper limit will be the first term inside the rounded brackets if $q^2|\theta_m| < |\theta_\phi|$, i.e. if $r|\theta| < |\theta| - \tau$. This is impossible if r > 1, hence the upper limit is given by the second term in the rounded brackets:

$$
r > 1 : \quad \lambda \in \left(-1\,,\,\frac{\theta+\tau}{r\theta}\right] \qquad \text{(condition for phase IV)} . \tag{7.395}
$$

If r < 1, then the upper limit will be $q^2\theta_m/\theta_\phi = r\theta/(\theta+\tau)$ if $|\theta| > \tau/(1-r)$, and will be $\theta_\phi/q^2\theta_m = (\theta+\tau)/r\theta$ if $|\theta| < \tau/(1-r)$:

$$
\begin{aligned}
r<1\ ,\ \ -\frac{\tau}{1-r} < \theta < -\tau &: \quad \lambda \in \left(-1\,,\,\frac{\theta+\tau}{r\theta}\right] \qquad \text{(phase IV)} \\
r<1\ ,\ \ \theta < -\frac{\tau}{1-r} &: \quad \lambda \in \left(-1\,,\,\frac{r\theta}{\theta+\tau}\right] \qquad \text{(phase IV)} .
\end{aligned} \tag{7.396}
$$
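The phase IV formulas can be checked by solving the linear system of eqn. 7.382 numerically and evaluating eqn. 7.374 at the solution. This is an illustrative sketch with ε₀ = 1 and arbitrary parameter values (chosen so that both m² and ϕ² come out positive); it compares the result with the closed form of eqn. 7.384 and with the difference $f_{\rm IV} - f_{\rm II}$ of eqn. 7.387.

```python
import numpy as np


def landau_f(m2, p2, q, th_m, th_p, lam, eps0=1.0):
    """Eqn. 7.374 in terms of x = m^2 and y = phi^2."""
    return eps0 * (0.5 * q * th_m * m2 + 0.25 * m2**2
                   + 0.5 * th_p / q * p2 + 0.25 * p2**2
                   + 0.5 * lam * m2 * p2)


def phase_iv(q, th_m, th_p, lam):
    """Solve [[1, lam], [lam, 1]] (m^2, phi^2) = -(q th_m, th_p/q), eqn. 7.382."""
    M = np.array([[1.0, lam], [lam, 1.0]])
    b = -np.array([q * th_m, th_p / q])
    return np.linalg.solve(M, b)


q, th_m, th_p, lam = 1.2, -0.5, -0.3, 0.4          # illustrative values
m2, p2 = phase_iv(q, th_m, th_p, lam)
f_iv = landau_f(m2, p2, q, th_m, th_p, lam)
f_iv_closed = (q**2 * th_m**2 + th_p**2 / q**2 - 2 * lam * th_m * th_p) / (4 * (lam**2 - 1))
f_ii = -0.25 * q**2 * th_m**2                      # eqn. 7.379 with eps0 = 1
print(m2, p2, f_iv, f_iv_closed, f_iv - f_ii)
```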

Representative phase diagrams for the cases r > 1 and r < 1 are shown in fig. 7.28.
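The case analysis can be cross-checked by brute force: for given (θ, λ, τ, r), evaluate the free energies of every phase that exists (eqns. 7.379, 7.381, and 7.383 with ε₀ = 1) and keep the lowest. The sketch below is our own; it treats λ ≤ −1 as the unphysical region and, following the argument in the text, admits phase IV only when λ² < 1. The sample points are chosen inside the regions derived in eqns. 7.391, 7.395, and 7.396.

```python
import math


def coupled_phase(theta, lam, tau, r):
    """Lowest free-energy phase of eqn. 7.374 (eps0 = 1), with
    theta_m = theta, theta_phi = theta + tau, and q = sqrt(r)."""
    if lam <= -1.0:
        return "unbounded"
    q = math.sqrt(r)
    a, b = q * theta, (theta + tau) / q        # q*theta_m and theta_phi/q
    energies = {"I": 0.0}
    if a < 0:
        energies["II"] = -0.25 * a**2          # eqn. 7.379
    if b < 0:
        energies["III"] = -0.25 * b**2         # eqn. 7.381
    if lam**2 < 1.0:
        det = 1.0 - lam**2
        x = (lam * b - a) / det                # m^2 from eqn. 7.383
        y = (lam * a - b) / det                # phi^2 from eqn. 7.383
        if x > 0 and y > 0:
            energies["IV"] = 0.25 * (x**2 + 2 * lam * x * y + y**2) + 0.5 * (a * x + b * y)
    return min(energies, key=energies.get)


print(coupled_phase(-0.3, -0.6, 0.5, 1.5))    # regime (2), inside eqn. 7.391
print(coupled_phase(-1.0, 0.0, 0.5, 1.5))     # regime (3), r > 1
print(coupled_phase(-1.0, 0.8, 0.5, 0.25))    # regime (3), r < 1, outside phase IV
```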
Chapter 8

Nonequilibrium Phenomena

8.1 References

– H. Smith and H. H. Jensen, Transport Phenomena (Oxford, 1989)
An outstanding, thorough, and pellucid presentation of the theory of Boltzmann transport in classical and quantum systems.

– P. L. Krapivsky, S. Redner, and E. Ben-Naim, A Kinetic View of Statistical Physics (Cambridge, 2010)
Superb, modern discussion of a broad variety of issues and models in nonequilibrium statistical physics.

– E. M. Lifshitz and L. P. Pitaevskii, Physical Kinetics (Pergamon, 1981)
Volume 10 in the famous Landau and Lifshitz Course of Theoretical Physics. Surprisingly readable, and with many applications (some advanced).

– M. Kardar, Statistical Physics of Particles (Cambridge, 2007)
A superb modern text, with many insightful presentations of key concepts. Includes a very instructive derivation of the Boltzmann equation starting from the BBGKY hierarchy.

– J. A. McLennan, Introduction to Non-equilibrium Statistical Mechanics (Prentice-Hall, 1989)
Though narrow in scope, this book is a good resource on the Boltzmann equation.

– F. Reif, Fundamentals of Statistical and Thermal Physics (McGraw-Hill, 1987)
This has been perhaps the most popular undergraduate text since it first appeared in 1967, and with good reason. The later chapters discuss transport phenomena at an undergraduate level.

– N. G. Van Kampen, Stochastic Processes in Physics and Chemistry (3rd edition, North-Holland, 2007)
This is a very readable and useful text. A relaxed but meaty presentation.

442 CHAPTER 8. NONEQUILIBRIUM PHENOMENA

8.2 Equilibrium, Nonequilibrium and Local Equilibrium

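Classical equilibrium statistical mechanics is described by the full N-body distribution,

$$
f^0(x_1,\ldots,x_N;p_1,\ldots,p_N) =
\begin{cases}
Z_N^{-1}\cdot\dfrac{1}{N!}\,e^{-\beta\hat H_N(p,x)} & \text{OCE} \\[8pt]
\Xi^{-1}\cdot\dfrac{1}{N!}\,e^{\beta\mu N}\,e^{-\beta\hat H_N(p,x)} & \text{GCE} .
\end{cases} \tag{8.1}
$$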
We assume a Hamiltonian of the form

$$
\hat H_N = \sum_{i=1}^N \frac{p_i^2}{2m} + \sum_{i=1}^N v(x_i) + \sum_{i<j}^N u(x_i - x_j) , \tag{8.2}
$$

typically with v = 0, i.e. only two-body interactions. The quantity

$$
f^0(x_1,\ldots,x_N;p_1,\ldots,p_N)\,\frac{d^dx_1\,d^dp_1}{h^d}\cdots\frac{d^dx_N\,d^dp_N}{h^d} \tag{8.3}
$$

is the probability, under equilibrium conditions, of finding N particles in the system, with particle #1 lying within $d^dx_1$ of $x_1$ and having momentum within $d^dp_1$ of $p_1$, etc. The temperature T and chemical potential μ are constants, independent of position. Note that $f(\{x_i\},\{p_i\})$ is dimensionless.
Nonequilibrium statistical mechanics seeks to describe thermodynamic systems which are out of equilibrium, meaning that the distribution function is not given by the Boltzmann distribution above. For a general nonequilibrium setting, it is hopeless to make progress – we'd have to integrate the equations of motion for all the constituent particles. However, typically we are concerned with situations where external forces or constraints are imposed over some macroscopic scale. Examples would include the imposition of a voltage drop across a metal, or a temperature differential across any thermodynamic sample. In such cases, scattering at microscopic length and time scales, described by the mean free path ℓ and the collision time τ, works to establish local equilibrium throughout the system. A local equilibrium
is a state described by a space and time varying temperature T (r, t) and chemical potential µ(r, t). As
we will see, the Boltzmann distribution with T = T (r, t) and µ = µ(r, t) will not be a solution to the
evolution equation governing the distribution function. Rather, the distribution for systems slightly out
of equilibrium will be of the form f = f 0 + δf , where f 0 describes a state of local equilibrium.
We will mainly be interested in the one-body distribution

$$
\begin{aligned}
f(r,p;t) &= \Big\langle \sum_{i=1}^N \delta\big(x_i(t) - r\big)\,\delta\big(p_i(t) - p\big)\Big\rangle \\
&= N\!\int\!\prod_{i=2}^N d^dx_i\,d^dp_i\; f(r,x_2,\ldots,x_N;p,p_2,\ldots,p_N;t) .
\end{aligned} \tag{8.4}
$$

In this chapter, we will drop the $1/h^d$ normalization for phase space integration. Thus, f(r, p, t) has dimensions of $h^{-d}$, and $f(r,p,t)\,d^3r\,d^3p$ is the average number of particles found within d³r of r and d³p of p at time t.
8.2. EQUILIBRIUM, NONEQUILIBRIUM AND LOCAL EQUILIBRIUM 443

In the GCE, we sum the RHS above over N. Assuming v = 0 so that there is no one-body potential to break translational symmetry, the equilibrium distribution is time-independent and space-independent:

$$
f^0(r,p) = n\,(2\pi m k_BT)^{-3/2}\,e^{-p^2/2mk_BT} , \tag{8.5}
$$

where n = N/V or n = n(T, μ) is the particle density in the OCE or GCE. From the one-body distribution we can compute things like the particle current, j, and the energy current, $j_\varepsilon$:

$$
\boldsymbol j(r,t) = \int\! d^dp\; f(r,p;t)\,\frac{\boldsymbol p}{m} \tag{8.6}
$$

$$
\boldsymbol j_\varepsilon(r,t) = \int\! d^dp\; f(r,p;t)\,\varepsilon(p)\,\frac{\boldsymbol p}{m} , \tag{8.7}
$$

where ε(p) = p²/2m. Clearly these currents both vanish in equilibrium, when f = f⁰, since f⁰(r, p) depends only on p² and not on the direction of p. In a steady state nonequilibrium situation, the above quantities are time-independent.
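A numerical sanity check of these statements, as a sketch in units m = k_BT = 1 and n = 1 (our choices): integrating eqn. 8.5 on a momentum grid recovers the density n and gives vanishing particle and energy currents, eqns. 8.6 and 8.7. For contrast, the same code applied to a Maxwellian centred on momentum mV, for a hypothetical drift velocity V, gives j = nV.

```python
import numpy as np

m, kT, n = 1.0, 1.0, 1.0                      # illustrative units
p1d = np.linspace(-8.0, 8.0, 81)              # momentum grid per axis
dp = p1d[1] - p1d[0]
px, py, pz = np.meshgrid(p1d, p1d, p1d, indexing="ij")


def maxwell(V=(0.0, 0.0, 0.0)):
    """Eqn. 8.5, optionally centred on momentum m*V (hypothetical drifting gas)."""
    qx, qy, qz = px - m * V[0], py - m * V[1], pz - m * V[2]
    q2 = qx**2 + qy**2 + qz**2
    return n * (2 * np.pi * m * kT) ** -1.5 * np.exp(-q2 / (2 * m * kT))


def moments(f):
    """Density and the currents of eqns. 8.6-8.7 by a simple Riemann sum."""
    d3p = dp**3
    eps = (px**2 + py**2 + pz**2) / (2 * m)
    density = f.sum() * d3p
    j = np.array([(c * f).sum() for c in (px, py, pz)]) * d3p / m
    j_eps = np.array([(c * eps * f).sum() for c in (px, py, pz)]) * d3p / m
    return density, j, j_eps


print(moments(maxwell()))
print(moments(maxwell(V=(0.5, 0.0, 0.0)))[1])
```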
Thermodynamics says that

$$
dq = T\,ds = d\varepsilon - \mu\,dn , \tag{8.8}
$$

where s, ε, and n are entropy density, energy density, and particle density, respectively, and dq is the differential heat density. This relation may be recast as one among the corresponding current densities:

$$
\boldsymbol j_q = T\,\boldsymbol j_s = \boldsymbol j_\varepsilon - \mu\,\boldsymbol j . \tag{8.9}
$$

Thus, in a system with no particle flow, j = 0 and the heat current $j_q$ is the same as the energy current $j_\varepsilon$.
When the individual particles are not point particles, they possess angular momentum as well as linear momentum. Following Lifshitz and Pitaevskii, we abbreviate Γ = (p, L) for these two variables for the case of diatomic molecules, and Γ = (p, L, n̂·L) in the case of symmetric top molecules, where n̂ is the symmetry axis of the top. We then have, in d = 3 dimensions,

$$
d\Gamma =
\begin{cases}
d^3p & \text{point particles} \\
d^3p\;L\,dL\,d\Omega_L & \text{diatomic molecules} \\
d^3p\;L^2\,dL\,d\Omega_L\,d\cos\vartheta & \text{symmetric tops} ,
\end{cases} \tag{8.10}
$$

where $\vartheta = \cos^{-1}(\hat n\cdot\hat L)$. We will call the set Γ the 'kinematic variables'. The instantaneous number density at r is then

$$
n(r,t) = \int\! d\Gamma\; f(r,\Gamma;t) . \tag{8.11}
$$

One might ask why we do not also keep track of the angular orientation of the individual molecules. There
are two reasons. First, the rotations of the molecules are generally extremely rapid, so we are justified
in averaging over these motions. Second, the orientation of, say, a rotor does not enter into its energy.
While the same can be said of the spatial position in the absence of external fields, (i) in the presence
of external fields one must keep track of the position coordinate r since there is physical transport of
particles from one region of space to another, and (ii) the collision process, which as we shall see enters
the dynamics of the distribution function, takes place in real space.

8.3 Boltzmann Transport Theory

8.3.1 Derivation of the Boltzmann equation

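For simplicity of presentation, we assume point particles. Recall that

$$
f(r,p,t)\,d^3r\,d^3p \equiv \left\{\begin{array}{l}
\text{\# of particles with positions within } d^3r \text{ of } r \\
\text{and momenta within } d^3p \text{ of } p \text{ at time } t .
\end{array}\right. \tag{8.12}
$$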
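We now ask how the distribution function f(r, p, t) evolves in time. It is clear that in the absence of collisions, the distribution function must satisfy the continuity equation,

$$
\frac{\partial f}{\partial t} + \nabla\cdot(uf) = 0 . \tag{8.13}
$$

This is just the condition of number conservation for particles. Take care to note that ∇ and u are six-dimensional phase space vectors:

$$
u = (\dot x\,,\,\dot y\,,\,\dot z\,,\,\dot p_x\,,\,\dot p_y\,,\,\dot p_z) \tag{8.14}
$$

$$
\nabla = \left(\frac{\partial}{\partial x}\,,\,\frac{\partial}{\partial y}\,,\,\frac{\partial}{\partial z}\,,\,\frac{\partial}{\partial p_x}\,,\,\frac{\partial}{\partial p_y}\,,\,\frac{\partial}{\partial p_z}\right) . \tag{8.15}
$$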
The continuity equation describes a distribution in which each constituent particle evolves according to a prescribed dynamics, which for a mechanical system is specified by

$$
\frac{dr}{dt} = \frac{\partial H}{\partial p} = v(p) \quad , \quad \frac{dp}{dt} = -\frac{\partial H}{\partial r} = F_{\rm ext} , \tag{8.16}
$$

where $F_{\rm ext}$ is an external applied force. Here,

$$
H(p,r) = \varepsilon(p) + U_{\rm ext}(r) . \tag{8.17}
$$

For example, if the particles are under the influence of gravity, then $U_{\rm ext}(r) = m\boldsymbol g\cdot r$ and $F_{\rm ext} = -\nabla U_{\rm ext} = -m\boldsymbol g$.
Note that as a consequence of the dynamics, we have ∇·u = 0, i.e. phase space flow is incompressible, provided that ε(p) is a function of p alone, and not of r. Thus, in the absence of collisions, we have

$$
\frac{\partial f}{\partial t} + u\cdot\nabla f = 0 . \tag{8.18}
$$
The differential operator Dt ≡ ∂t + u ·∇ is sometimes called the ‘convective derivative’, because Dt f is
the time derivative of f in a comoving frame of reference.
Next we must consider the effect of collisions, which are not accounted for by the semiclassical dynamics. In a collision process, a particle with momentum p and one with momentum p̃ can instantaneously convert into a pair with momenta p′ and p̃′, provided total momentum is conserved: p + p̃ = p′ + p̃′. This means that $D_t f \ne 0$. Rather, we should write

$$
\frac{\partial f}{\partial t} + \dot r\cdot\frac{\partial f}{\partial r} + \dot p\cdot\frac{\partial f}{\partial p} = \left(\frac{\partial f}{\partial t}\right)_{\rm coll} \tag{8.19}
$$
8.3. BOLTZMANN TRANSPORT THEORY 445

where the right side is known as the collision integral. The collision integral is in general a function of
r, p, and t and a functional of the distribution f .
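After a trivial rearrangement of terms, we can write the Boltzmann equation as

$$
\frac{\partial f}{\partial t} = \left(\frac{\partial f}{\partial t}\right)_{\rm str} + \left(\frac{\partial f}{\partial t}\right)_{\rm coll} , \tag{8.20}
$$

where

$$
\left(\frac{\partial f}{\partial t}\right)_{\rm str} \equiv -\dot r\cdot\frac{\partial f}{\partial r} - \dot p\cdot\frac{\partial f}{\partial p} \tag{8.21}
$$

is known as the streaming term. Thus, there are two contributions to ∂f/∂t : streaming and collisions.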

8.3.2 Collisionless Boltzmann equation

In the absence of collisions, the Boltzmann equation is given by

$$
\frac{\partial f}{\partial t} + \frac{\partial\varepsilon}{\partial p}\cdot\frac{\partial f}{\partial r} - \nabla U_{\rm ext}\cdot\frac{\partial f}{\partial p} = 0 . \tag{8.22}
$$

In order to gain some intuition about how the streaming term affects the evolution of the distribution f(r, p, t), consider a case where $F_{\rm ext} = 0$. We then have

$$
\frac{\partial f}{\partial t} + \frac{p}{m}\cdot\frac{\partial f}{\partial r} = 0 . \tag{8.23}
$$

Clearly, then, any function of the form

$$
f(r,p,t) = \varphi\big(r - v(p)\,t\,,\,p\big) \tag{8.24}
$$

will be a solution to the collisionless Boltzmann equation, where $v(p) = \partial\varepsilon/\partial p$. One possible solution would be the Boltzmann distribution,

$$
f(r,p,t) = e^{\mu/k_BT}\,e^{-p^2/2mk_BT} , \tag{8.25}
$$

which is time-independent¹. Here we have assumed a ballistic dispersion, ε(p) = p²/2m.
For a slightly less trivial example, let the initial distribution be $\varphi(r,p) = A\,e^{-r^2/2\sigma^2}\,e^{-p^2/2\kappa^2}$, so that

$$
f(r,p,t) = A\,e^{-(r - pt/m)^2/2\sigma^2}\,e^{-p^2/2\kappa^2} . \tag{8.26}
$$

Consider the one-dimensional version, and rescale position, momentum, and time so that

$$
f(x,p,t) = A\,e^{-\frac12(\bar x - \bar p\,\bar t)^2}\,e^{-\frac12\bar p^2} . \tag{8.27}
$$

Consider the level sets of f, where $f(x,p,t) = A\,e^{-\frac12\alpha^2}$. The equation for these sets is

$$
\bar x = \bar p\,\bar t \pm \sqrt{\alpha^2 - \bar p^2} . \tag{8.28}
$$
¹ Indeed, any arbitrary function of p alone would be a solution. Ultimately, we require some energy exchanging processes, such as collisions, in order for any initial nonequilibrium distribution to converge to the Boltzmann distribution.

Figure 8.1: Level sets for a sample $f(\bar x,\bar p,\bar t) = A\,e^{-\frac12(\bar x - \bar p\bar t)^2}\,e^{-\frac12\bar p^2}$, for values $f = A\,e^{-\frac12\alpha^2}$ with α in equally spaced intervals from α = 0.2 (red) to α = 1.2 (blue). The time variable t̄ is taken to be t̄ = 0.0 (upper left), 0.2 (upper right), 0.8 (lower right), and 1.3 (lower left).

For fixed t̄, these level sets describe the loci in phase space of equal probability densities, with the probability density decreasing exponentially in the parameter α². For t̄ = 0, the initial distribution describes a Gaussian cloud of particles with a Gaussian momentum distribution. As t̄ increases, the distribution widens in x̄ but not in p̄ – each particle moves with a constant momentum, so the set of momentum values never changes. However, the level sets in the (x̄, p̄) plane become elliptical, with a semimajor axis oriented at an angle $\theta = \frac12\,\mathrm{ctn}^{-1}(\bar t/2)$ with respect to the x̄ axis, which rotates from π/4 toward zero as t̄ grows. For t̄ > 0, the particles at the outer edges of the cloud are more likely to be moving away from the center. See the sketches in fig. 8.1.
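The orientation of the level sets can be checked by diagonalizing the quadratic form in the exponent of eqn. 8.27, $(\bar x - \bar p\bar t)^2 + \bar p^2 = \alpha^2$. In the sketch below (our own illustration), the eigenvector belonging to the smallest eigenvalue points along the long axis; its angle with the x̄ axis satisfies $\tan\theta = (\sqrt{\bar t^2+4} - \bar t)/2$, i.e. $\theta = \frac12\,\mathrm{ctn}^{-1}(\bar t/2)$, which tends to π/4 as t̄ → 0⁺ and to zero as t̄ → ∞.

```python
import numpy as np


def semimajor_angle(t):
    """Angle between the x-bar axis and the long axis of the ellipse
    (x - p*t)^2 + p^2 = alpha^2, computed numerically for t > 0."""
    M = np.array([[1.0, -t], [-t, 1.0 + t**2]])   # quadratic form in (x, p)
    evals, evecs = np.linalg.eigh(M)              # eigenvalues in ascending order
    vx, vp = evecs[:, 0]                          # smallest eigenvalue -> longest axis
    return np.arctan2(vp, vx) % np.pi             # fold the sign ambiguity into [0, pi)


def closed_form(t):
    """theta = (1/2) arccot(t/2) = (1/2) arctan(2/t) for t > 0."""
    return 0.5 * np.arctan2(2.0, t)


for t in (0.2, 0.8, 1.3):                          # the times used in fig. 8.1
    print(t, semimajor_angle(t), closed_form(t))
```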
Suppose we add in a constant external force $F_{\rm ext}$. Then it is easy to show (and left as an exercise for the reader) that any function of the form

$$
f(r,p,t) = A\,\varphi\!\left(r - \frac{p\,t}{m} + \frac{F_{\rm ext}\,t^2}{2m}\,,\; p - F_{\rm ext}\,t\right) \tag{8.29}
$$

satisfies the collisionless Boltzmann equation (ballistic dispersion assumed).
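The claim about eqn. 8.29 can be spot-checked numerically in one dimension. The sketch below is our own; the Gaussian choice of ϕ and the values m = 2 and F = 0.7 are arbitrary. It evaluates ∂f/∂t + (p/m) ∂f/∂x + F ∂f/∂p by centred finite differences, and the residual comes out at the level of the discretization error.

```python
import numpy as np


def phi(x, p):
    """An arbitrary smooth initial distribution (hypothetical choice)."""
    return np.exp(-0.5 * x**2 - 0.5 * (p - 0.3)**2)


def f_sol(x, p, t, m, F):
    """Eqn. 8.29 in one dimension (with A = 1)."""
    return phi(x - p * t / m + F * t**2 / (2 * m), p - F * t)


def residual(x, p, t, m=2.0, F=0.7, h=1e-5):
    """Centred-difference estimate of df/dt + (p/m) df/dx + F df/dp."""
    df_dt = (f_sol(x, p, t + h, m, F) - f_sol(x, p, t - h, m, F)) / (2 * h)
    df_dx = (f_sol(x + h, p, t, m, F) - f_sol(x - h, p, t, m, F)) / (2 * h)
    df_dp = (f_sol(x, p + h, t, m, F) - f_sol(x, p - h, t, m, F)) / (2 * h)
    return df_dt + (p / m) * df_dx + F * df_dp


print(residual(0.4, -0.2, 1.1))
```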



8.3.3 Collisional invariants

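Consider a function A(r, p) of position and momentum. Its average value at time t is

$$
\bar A(t) = \int\! d^3r\,d^3p\; A(r,p)\,f(r,p,t) . \tag{8.30}
$$

Taking the time derivative,

$$
\begin{aligned}
\frac{d\bar A}{dt} &= \int\! d^3r\,d^3p\; A(r,p)\,\frac{\partial f}{\partial t} \\
&= \int\! d^3r\,d^3p\; A(r,p)\left\{-\frac{\partial}{\partial r}\cdot(\dot r f) - \frac{\partial}{\partial p}\cdot(\dot p f) + \left(\frac{\partial f}{\partial t}\right)_{\rm coll}\right\} \\
&= \int\! d^3r\,d^3p\;\left\{\left(\frac{\partial A}{\partial r}\cdot\frac{dr}{dt} + \frac{\partial A}{\partial p}\cdot\frac{dp}{dt}\right) f + A(r,p)\left(\frac{\partial f}{\partial t}\right)_{\rm coll}\right\} .
\end{aligned} \tag{8.31}
$$

Hence, if A is preserved by the dynamics between collisions, then²

$$
\frac{dA}{dt} = \frac{\partial A}{\partial r}\cdot\frac{dr}{dt} + \frac{\partial A}{\partial p}\cdot\frac{dp}{dt} = 0 . \tag{8.32}
$$

We therefore have that the rate of change of Ā is determined wholly by the collision integral

$$
\frac{d\bar A}{dt} = \int\! d^3r\,d^3p\; A(r,p)\left(\frac{\partial f}{\partial t}\right)_{\rm coll} . \tag{8.33}
$$

Quantities which are then conserved in the collisions satisfy $d\bar A/dt = 0$. Such quantities are called collisional invariants. Examples of collisional invariants include the particle number (A = 1), the components of the total momentum (A = p_μ) (in the absence of broken translational invariance, due e.g. to the presence of walls), and the total energy (A = ε(p)).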

8.3.4 Scattering processes

What sort of processes contribute to the collision integral? There are two broad classes to consider. The first involves potential scattering, where a particle in state |Γ⟩ scatters, in the presence of an external potential, to a state |Γ′⟩. Recall that Γ is an abbreviation for the set of kinematic variables, e.g. Γ = (p, L) in the case of a diatomic molecule. For point particles, Γ = (p_x, p_y, p_z) and dΓ = d³p.

We now define the function w(Γ′|Γ) such that

$$
w(\Gamma'|\Gamma)\,f(r,\Gamma;t)\,d\Gamma\,d\Gamma' = \left\{\begin{array}{l}
\text{rate at which a particle within } d\Gamma \text{ of } (r,\Gamma) \\
\text{scatters to within } d\Gamma' \text{ of } (r,\Gamma') \text{ at time } t .
\end{array}\right. \tag{8.34}
$$
² Recall from classical mechanics the definition of the Poisson bracket, $\{A,B\} = \frac{\partial A}{\partial r}\cdot\frac{\partial B}{\partial p} - \frac{\partial B}{\partial r}\cdot\frac{\partial A}{\partial p}$. Then from Hamilton's equations $\dot r = \partial H/\partial p$ and $\dot p = -\partial H/\partial r$, where H(p, r, t) is the Hamiltonian, we have $dA/dt = \{A,H\}$. Invariants have zero Poisson bracket with the Hamiltonian.

Figure 8.2: Left: single particle scattering process |Γ⟩ → |Γ′⟩. Right: two-particle scattering process |ΓΓ₁⟩ → |Γ′Γ₁′⟩.

The units of w dΓ are therefore inverse time. The differential scattering cross section for particle scattering is then

$$
d\sigma = \frac{w(\Gamma'|\Gamma)}{n\,|v|}\,d\Gamma' , \tag{8.35}
$$

where v = p/m is the particle's velocity and n the density.
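The second class is that of two-particle scattering processes, i.e. |ΓΓ₁⟩ → |Γ′Γ₁′⟩. We define the scattering function w(Γ′Γ₁′ | ΓΓ₁) by

$$
w(\Gamma'\Gamma_1'|\Gamma\Gamma_1)\,f_2(r,\Gamma;r,\Gamma_1;t)\,d\Gamma\,d\Gamma_1\,d\Gamma'\,d\Gamma_1' = \left\{\begin{array}{l}
\text{rate at which two particles within } d\Gamma \text{ of } (r,\Gamma) \\
\text{and within } d\Gamma_1 \text{ of } (r,\Gamma_1) \text{ scatter into states within} \\
d\Gamma' \text{ of } (r,\Gamma') \text{ and } d\Gamma_1' \text{ of } (r,\Gamma_1') \text{ at time } t ,
\end{array}\right. \tag{8.36}
$$

where

$$
f_2(r,p\,;r',p'\,;t) = \Big\langle\sum_{i,j}\delta\big(x_i(t) - r\big)\,\delta\big(p_i(t) - p\big)\,\delta\big(x_j(t) - r'\big)\,\delta\big(p_j(t) - p'\big)\Big\rangle \tag{8.37}
$$

is the nonequilibrium two-particle distribution for point particles. The differential scattering cross section is

$$
d\sigma = \frac{w(\Gamma'\Gamma_1'|\Gamma\Gamma_1)}{|v - v_1|}\,d\Gamma'\,d\Gamma_1' . \tag{8.38}
$$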
We assume, in both cases, that any scattering occurs locally, i.e. the particles attain their asymptotic
kinematic states on distance scales small compared to the mean interparticle separation. In this case we
can treat each scattering process independently. This assumption is particular to rarefied systems, i.e.
gases, and is not appropriate for dense liquids. The two types of scattering processes are depicted in fig.
8.2.
In computing the collision integral for the state |r, Γ⟩, we must take care to sum over contributions from transitions out of this state, i.e. |Γ⟩ → |Γ′⟩, which reduce f(r, Γ), and transitions into this state, i.e. |Γ′⟩ → |Γ⟩, which increase f(r, Γ). Thus, for one-body scattering, we have

$$
\frac{D}{Dt}f(r,\Gamma;t) = \left(\frac{\partial f}{\partial t}\right)_{\rm coll} = \int\! d\Gamma'\,\Big\{w(\Gamma|\Gamma')\,f(r,\Gamma';t) - w(\Gamma'|\Gamma)\,f(r,\Gamma;t)\Big\} . \tag{8.39}
$$
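Eqn. 8.39 has the structure of a master equation. As an illustration (our own toy discretization, not taken from the notes), take a handful of discrete kinematic states with a hypothetical symmetric rate matrix W[i, j] = w(Γᵢ | Γⱼ), as would hold, for instance, for elastic scattering among states of equal energy. A forward Euler step then conserves the total number of particles exactly, and the distribution relaxes to a uniform one.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 6                                       # number of discrete states (toy model)
W = 0.5 + 0.5 * rng.random((K, K))          # hypothetical rates in [0.5, 1)
W = 0.5 * (W + W.T)                         # symmetric: w(G_i|G_j) = w(G_j|G_i)
np.fill_diagonal(W, 0.0)


def collision_term(f):
    """Discrete eqn. 8.39: gain sum_j W[i,j] f[j] minus loss f[i] sum_j W[j,i]."""
    return W @ f - W.sum(axis=0) * f


f = rng.random(K) + 0.1                     # arbitrary positive initial distribution
N0 = f.sum()
dt = 0.01
for _ in range(2000):                       # forward Euler in time
    f = f + dt * collision_term(f)

print(f.sum() - N0, f)
```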

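For two-body scattering, we have

$$
\begin{aligned}
\frac{D}{Dt}f(r,\Gamma;t) &= \left(\frac{\partial f}{\partial t}\right)_{\rm coll} \\
&= \int\! d\Gamma_1\!\int\! d\Gamma'\!\int\! d\Gamma_1'\,\Big\{w(\Gamma\Gamma_1|\Gamma'\Gamma_1')\,f_2(r,\Gamma';r,\Gamma_1';t) - w(\Gamma'\Gamma_1'|\Gamma\Gamma_1)\,f_2(r,\Gamma;r,\Gamma_1;t)\Big\} .
\end{aligned} \tag{8.40}
$$

Unlike the one-body scattering case, the kinetic equation for two-body scattering does not close, since the LHS involves the one-body distribution f ≡ f₁ and the RHS involves the two-body distribution f₂. To close the equations, we make the approximation (the assumption of molecular chaos)

$$
f_2(r,\Gamma;\tilde r,\tilde\Gamma;t) \approx f(r,\Gamma;t)\,f(\tilde r,\tilde\Gamma;t) . \tag{8.41}
$$

We then have

$$
\frac{D}{Dt}f(r,\Gamma;t) = \int\! d\Gamma_1\!\int\! d\Gamma'\!\int\! d\Gamma_1'\,\Big\{w(\Gamma\Gamma_1|\Gamma'\Gamma_1')\,f(r,\Gamma';t)\,f(r,\Gamma_1';t) - w(\Gamma'\Gamma_1'|\Gamma\Gamma_1)\,f(r,\Gamma;t)\,f(r,\Gamma_1;t)\Big\} . \tag{8.42}
$$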

8.3.5 Detailed balance



Classical mechanics places some restrictions on the form of the kernel w(ΓΓ₁ | Γ′Γ₁′). In particular, if Γ^T = (−p, −L) denotes the kinematic variables under time reversal, then

$$
w(\Gamma'\Gamma_1'|\Gamma\Gamma_1) = w(\Gamma^T\Gamma_1^T|\Gamma'^T\Gamma_1'^T) . \tag{8.43}
$$

This is because the time reverse of the process |ΓΓ₁⟩ → |Γ′Γ₁′⟩ is |Γ′^TΓ₁′^T⟩ → |Γ^TΓ₁^T⟩.
In equilibrium, we must have

$$
w(\Gamma'\Gamma_1'|\Gamma\Gamma_1)\,f^0(\Gamma)\,f^0(\Gamma_1)\,d^4\Gamma = w(\Gamma^T\Gamma_1^T|\Gamma'^T\Gamma_1'^T)\,f^0(\Gamma'^T)\,f^0(\Gamma_1'^T)\,d^4\Gamma^T \tag{8.44}
$$

where

$$
d^4\Gamma \equiv d\Gamma\,d\Gamma_1\,d\Gamma'\,d\Gamma_1' \quad , \quad d^4\Gamma^T \equiv d\Gamma^T\,d\Gamma_1^T\,d\Gamma'^T\,d\Gamma_1'^T . \tag{8.45}
$$

Since dΓ = dΓ^T etc., we may cancel the differentials above, and after invoking eqn. 8.43 and suppressing the common r label, we find

$$
f^0(\Gamma)\,f^0(\Gamma_1) = f^0(\Gamma'^T)\,f^0(\Gamma_1'^T) . \tag{8.46}
$$

This is the condition of detailed balance. For the Boltzmann distribution, we have

$$
f^0(\Gamma) = A\,e^{-\varepsilon/k_BT} , \tag{8.47}
$$

where A is a constant and where ε = ε(Γ) is the kinetic energy, e.g. ε(Γ) = p²/2m in the case of point particles. Note that ε(Γ^T) = ε(Γ). Detailed balance is satisfied because the kinematics of the collision requires energy conservation:

$$
\varepsilon + \varepsilon_1 = \varepsilon' + \varepsilon_1' . \tag{8.48}
$$

Since momentum is also kinematically conserved, i.e.

$$
p + p_1 = p' + p_1' , \tag{8.49}
$$

any distribution of the form

$$
f^0(\Gamma) = A\,e^{-(\varepsilon - p\cdot V)/k_BT} \tag{8.50}
$$

also satisfies detailed balance, for any velocity parameter V. This distribution is appropriate for gases which are flowing with average velocity V.
In addition to time-reversal, parity is also a symmetry of the microscopic mechanical laws. Under the parity operation P, we have r → −r and p → −p. Note that a pseudovector such as L = r × p is unchanged under P. Thus, Γ^P = (−p, L). Under the combined operation of C = PT, we have Γ^C = (p, −L). If the microscopic Hamiltonian is invariant under C, then we must have

$$
w(\Gamma'\Gamma_1'|\Gamma\Gamma_1) = w(\Gamma^C\Gamma_1^C|\Gamma'^C\Gamma_1'^C) . \tag{8.51}
$$
For point particles, invariance under T and P then means

$$
w(p',p_1'\,|\,p,p_1) = w(p,p_1\,|\,p',p_1') , \tag{8.52}
$$

and therefore the collision integral takes the simplified form,

$$
\frac{Df(p)}{Dt} = \left(\frac{\partial f}{\partial t}\right)_{\rm coll} = \int\! d^3p_1\!\int\! d^3p'\!\int\! d^3p_1'\; w(p',p_1'\,|\,p,p_1)\,\Big\{f(p')\,f(p_1') - f(p)\,f(p_1)\Big\} , \tag{8.53}
$$

where we have suppressed both r and t variables.

The most general statement of detailed balance is

$$
\frac{f^0(\Gamma')\,f^0(\Gamma_1')}{f^0(\Gamma)\,f^0(\Gamma_1)} = \frac{w(\Gamma'\Gamma_1'|\Gamma\Gamma_1)}{w(\Gamma\Gamma_1|\Gamma'\Gamma_1')} . \tag{8.54}
$$

Under this condition, the collision term vanishes for f = f⁰, which is the equilibrium distribution.

8.3.6 Kinematics and cross section

We can rewrite eqn. 8.53 in the form

$$
\frac{Df(p)}{Dt} = \int\! d^3p_1\!\int\! d\Omega\;\frac{\partial\sigma}{\partial\Omega}\,|v - v_1|\,\Big\{f(p')\,f(p_1') - f(p)\,f(p_1)\Big\} , \tag{8.55}
$$

where ∂σ/∂Ω is the differential scattering cross section. If we recast the scattering problem in terms of center-of-mass and relative coordinates, we conclude that the total momentum is conserved by the collision, and furthermore that the energy in the CM frame is conserved, which means that the magnitude of the relative momentum is conserved. Thus, we may write $p' - p_1' = |p - p_1|\,\hat\Omega$, where Ω̂ is a unit vector. Then p′ and p₁′ are determined to be

$$
\begin{aligned}
p' &= \tfrac12\big(p + p_1 + |p - p_1|\,\hat\Omega\big) \\
p_1' &= \tfrac12\big(p + p_1 - |p - p_1|\,\hat\Omega\big) .
\end{aligned} \tag{8.56}
$$
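A quick check of eqn. 8.56, as a sketch assuming equal masses, units m = k_BT = 1, and random illustrative momenta: the outgoing momenta conserve total momentum and kinetic energy, and consequently any distribution of the form of eqn. 8.50 satisfies f⁰(p) f⁰(p₁) = f⁰(p′) f⁰(p₁′) for the collision, whatever the drift velocity V (here an arbitrary vector).

```python
import numpy as np


def outgoing_momenta(p, p1, omega):
    """Eqn. 8.56 for equal masses; omega is normalized to a unit vector."""
    omega = omega / np.linalg.norm(omega)
    P = p + p1                                 # total momentum
    g = np.linalg.norm(p - p1)                 # |relative momentum| is conserved
    return 0.5 * (P + g * omega), 0.5 * (P - g * omega)


def f0(p, V, m=1.0, kT=1.0):
    """Unnormalized distribution of eqn. 8.50 with eps = p^2/2m."""
    return np.exp(-(p @ p / (2 * m) - p @ V) / kT)


rng = np.random.default_rng(1)
p, p1, omega = rng.normal(size=3), rng.normal(size=3), rng.normal(size=3)
pp, pp1 = outgoing_momenta(p, p1, omega)
V = np.array([0.3, -0.2, 0.5])                 # arbitrary drift velocity
print(pp + pp1 - (p + p1))
print(pp @ pp + pp1 @ pp1 - (p @ p + p1 @ p1))
print(f0(p, V) * f0(p1, V), f0(pp, V) * f0(pp1, V))
```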

8.3.7 H-theorem

Let's consider the Boltzmann equation with two particle collisions. We define the local (i.e. r-dependent) quantity

$$
\rho_\varphi(r,t) \equiv \int\! d\Gamma\;\varphi(\Gamma,f)\,f(\Gamma,r,t) . \tag{8.57}
$$

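At this point, ϕ(Γ, f) is arbitrary. Note that the ϕ(Γ, f) factor has r and t dependence through its dependence on f, which itself is a function of r, Γ, and t. We now compute

$$
\begin{aligned}
\frac{\partial\rho_\varphi}{\partial t} &= \int\! d\Gamma\;\frac{\partial(\varphi f)}{\partial t} = \int\! d\Gamma\;\frac{\partial(\varphi f)}{\partial f}\,\frac{\partial f}{\partial t} \\
&= -\int\! d\Gamma\;u\cdot\nabla(\varphi f) + \int\! d\Gamma\;\frac{\partial(\varphi f)}{\partial f}\left(\frac{\partial f}{\partial t}\right)_{\rm coll} \\
&= -\oint\! d\Sigma\;\hat n\cdot(u\,\varphi f) + \int\! d\Gamma\;\frac{\partial(\varphi f)}{\partial f}\left(\frac{\partial f}{\partial t}\right)_{\rm coll} .
\end{aligned} \tag{8.58}
$$

The first term on the last line follows from the divergence theorem, and vanishes if we assume f = 0 for infinite values of the kinematic variables, which is the only physical possibility. Thus, the rate of change of ρ_ϕ is entirely due to the collision term:

$$
\begin{aligned}
\frac{\partial\rho_\varphi}{\partial t} &= \int\! d\Gamma\!\int\! d\Gamma_1\!\int\! d\Gamma'\!\int\! d\Gamma_1'\;\Big\{w(\Gamma\Gamma_1|\Gamma'\Gamma_1')\,f'f_1'\,\chi - w(\Gamma'\Gamma_1'|\Gamma\Gamma_1)\,f f_1\,\chi\Big\} \\
&= \int\! d\Gamma\!\int\! d\Gamma_1\!\int\! d\Gamma'\!\int\! d\Gamma_1'\;w(\Gamma'\Gamma_1'|\Gamma\Gamma_1)\,f f_1\,(\chi' - \chi) ,
\end{aligned} \tag{8.59}
$$

where f ≡ f(Γ), f′ ≡ f(Γ′), f₁ ≡ f(Γ₁), f₁′ ≡ f(Γ₁′), χ = χ(Γ), with

$$
\chi = \frac{\partial(\varphi f)}{\partial f} = \varphi + f\,\frac{\partial\varphi}{\partial f} . \tag{8.60}
$$

(The second line of eqn. 8.59 follows from relabeling (Γ, Γ₁) ↔ (Γ′, Γ₁′) in the first term.) We now invoke the symmetry

$$
w(\Gamma'\Gamma_1'|\Gamma\Gamma_1) = w(\Gamma_1'\Gamma'|\Gamma_1\Gamma) , \tag{8.61}
$$

which allows us to write

$$
\frac{\partial\rho_\varphi}{\partial t} = \tfrac12\int\! d\Gamma\!\int\! d\Gamma_1\!\int\! d\Gamma'\!\int\! d\Gamma_1'\;w(\Gamma'\Gamma_1'|\Gamma\Gamma_1)\,f f_1\,(\chi' + \chi_1' - \chi - \chi_1) . \tag{8.62}
$$

This shows that ρ_ϕ is preserved by the collision term if χ(Γ) is a collisional invariant.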

Now let us consider ϕ(f) = ln f. We define $h \equiv \rho_{\varphi = \ln f}$. We then have

$$
\frac{\partial h}{\partial t} = -\tfrac12\int\! d\Gamma\!\int\! d\Gamma_1\!\int\! d\Gamma'\!\int\! d\Gamma_1'\;w\,f'f_1'\cdot x\ln x , \tag{8.63}
$$

where w ≡ w(Γ′Γ₁′ | ΓΓ₁) and x ≡ ff₁/f′f₁′. We next invoke the result
$$
\int\! d\Gamma'\!\int\! d\Gamma_1'\;w(\Gamma'\Gamma_1'|\Gamma\Gamma_1) = \int\! d\Gamma'\!\int\! d\Gamma_1'\;w(\Gamma\Gamma_1|\Gamma'\Gamma_1') \tag{8.64}
$$

which is a statement of unitarity of the scattering matrix³. Multiplying both sides by f(Γ) f(Γ₁), then integrating over Γ and Γ₁, and finally changing variables (Γ, Γ₁) ↔ (Γ′, Γ₁′), we find

$$
0 = \int\! d\Gamma\!\int\! d\Gamma_1\!\int\! d\Gamma'\!\int\! d\Gamma_1'\;w\,\big(f f_1 - f'f_1'\big) = \int\! d\Gamma\!\int\! d\Gamma_1\!\int\! d\Gamma'\!\int\! d\Gamma_1'\;w\,f'f_1'\,(x - 1) . \tag{8.65}
$$

Multiplying this result by ½ and adding it to the previous equation for ḣ, we arrive at our final result,

$$
\frac{\partial h}{\partial t} = -\tfrac12\int\! d\Gamma\!\int\! d\Gamma_1\!\int\! d\Gamma'\!\int\! d\Gamma_1'\;w\,f'f_1'\,(x\ln x - x + 1) . \tag{8.66}
$$

Note that w, f′, and f₁′ are all nonnegative. It is then easy to prove that the function g(x) = x ln x − x + 1 is nonnegative for all positive x values⁴, which therefore entails the important result

$$
\frac{\partial h(r,t)}{\partial t} \le 0 . \tag{8.67}
$$
Boltzmann's H function is the space integral of the h density: $H = \int d^3r\;h$.
Thus, everywhere in space, the function h(r, t) is monotonically decreasing or constant, due to collisions. In equilibrium, ḣ = 0 everywhere, which requires x = 1, i.e.

$$
f^0(\Gamma)\,f^0(\Gamma_1) = f^0(\Gamma')\,f^0(\Gamma_1') , \tag{8.68}
$$

or, taking the logarithm,

$$
\ln f^0(\Gamma) + \ln f^0(\Gamma_1) = \ln f^0(\Gamma') + \ln f^0(\Gamma_1') . \tag{8.69}
$$

But this means that ln f⁰ is itself a collisional invariant, and if 1, p, and ε are the only collisional invariants, then ln f⁰ must be expressible in terms of them. Thus,

$$
\ln f^0 = \frac{\mu}{k_BT} + \frac{V\cdot p}{k_BT} - \frac{\varepsilon}{k_BT} , \tag{8.70}
$$

where μ, V, and T are constants which parameterize the equilibrium distribution f⁰(p), corresponding to the chemical potential, flow velocity, and temperature, respectively.
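The H-theorem can be watched in action in a toy model. The sketch below uses a spatially homogeneous planar discrete-velocity model of Broadwell type (our choice of illustration, not the notes'): particles of unit speed with velocities ±x̂ and ±ŷ, and the single collision (+x̂, −x̂) ↔ (+ŷ, −ŷ) at rate σ. The collision term conserves number, momentum, and energy; the discrete $h = \sum_i f_i\ln f_i$ decreases monotonically, and the final state satisfies $f_{+x}f_{-x} = f_{+y}f_{-y}$, the discrete analogue of eqn. 8.68. The initial values and time step are arbitrary.

```python
import numpy as np

# Velocities: index 0 -> +x, 1 -> -x, 2 -> +y, 3 -> -y (all of unit speed)
VEL = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]], dtype=float)


def collision_term(f, sigma=1.0):
    """Pairs (+x, -x) <-> (+y, -y): gain minus loss for each velocity."""
    rate = sigma * (f[2] * f[3] - f[0] * f[1])
    return np.array([rate, rate, -rate, -rate])


f = np.array([0.9, 0.5, 0.1, 0.2])          # arbitrary nonequilibrium start
N0, P0 = f.sum(), VEL.T @ f                 # particle number and momentum
dt, steps = 1e-3, 20_000
h = []
for _ in range(steps):
    h.append(np.sum(f * np.log(f)))          # discrete h = sum_i f_i ln f_i
    f = f + dt * collision_term(f)           # forward Euler step

print(f, f[0] * f[1] - f[2] * f[3])
```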
³ See Lifshitz and Pitaevskii, Physical Kinetics, §2.
⁴ The function g(x) = x ln x − x + 1 satisfies g′(x) = ln x, hence g′(x) < 0 on the interval x ∈ (0, 1) and g′(x) > 0 on x ∈ (1, ∞). Thus, g(x) monotonically decreases from g(0) = 1 to g(1) = 0, and then monotonically increases to g(∞) = ∞, never becoming negative.
8.4. WEAKLY INHOMOGENEOUS GAS 453

8.4 Weakly Inhomogeneous Gas

Consider a gas which is only weakly out of equilibrium. We follow the treatment in Lifshitz and Pitaevskii, §6. As the gas is only slightly out of equilibrium, we seek a solution to the Boltzmann equation of the form f = f⁰ + δf, where f⁰ describes a local equilibrium. Recall that such a distribution function is annihilated by the collision term in the Boltzmann equation but not by the streaming term, hence a correction δf must be added in order to obtain a solution.
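The most general form of local equilibrium is described by the distribution

$$
f^0(r,\Gamma) = C\exp\!\left(\frac{\mu - \varepsilon(\Gamma) + V\cdot p}{k_BT}\right) , \tag{8.71}
$$

where μ = μ(r, t), T = T(r, t), and V = V(r, t) vary in both space and time. Note that

$$
\begin{aligned}
df^0 &= \left\{d\mu + p\cdot dV + (\varepsilon - \mu - V\cdot p)\,\frac{dT}{T} - d\varepsilon\right\}\left(-\frac{\partial f^0}{\partial\varepsilon}\right) \\
&= \left\{\frac{1}{n}\,dp + p\cdot dV + (\varepsilon - h)\,\frac{dT}{T} - d\varepsilon\right\}\left(-\frac{\partial f^0}{\partial\varepsilon}\right)
\end{aligned} \tag{8.72}
$$

where we have assumed V = 0 on average, and used

$$
d\mu = \left(\frac{\partial\mu}{\partial T}\right)_{\!p} dT + \left(\frac{\partial\mu}{\partial p}\right)_{\!T} dp = -s\,dT + \frac{1}{n}\,dp , \tag{8.73}
$$

where s is the entropy per particle and n is the number density. We have further written h = μ + Ts, which is the enthalpy per particle. Note that in dp/n the symbol p denotes the pressure, whereas in p·dV it is the momentum. Here, c_p is the heat capacity per particle at constant pressure⁵. Finally, note that when f⁰ is the Maxwell-Boltzmann distribution, we have

$$
-\frac{\partial f^0}{\partial\varepsilon} = \frac{f^0}{k_BT} . \tag{8.74}
$$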

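The Boltzmann equation is written

$$
\left(\frac{\partial}{\partial t} + \frac{p}{m}\cdot\frac{\partial}{\partial r} + F\cdot\frac{\partial}{\partial p}\right)\big(f^0 + \delta f\big) = \left(\frac{\partial f}{\partial t}\right)_{\rm coll} . \tag{8.75}
$$

The RHS of this equation must be of order δf because the local equilibrium distribution f⁰ is annihilated by the collision integral. We therefore wish to evaluate one of the contributions to the LHS of this equation,

$$
\begin{aligned}
\frac{\partial f^0}{\partial t} + \frac{p}{m}\cdot\frac{\partial f^0}{\partial r} + F\cdot\frac{\partial f^0}{\partial p} = \left(-\frac{\partial f^0}{\partial\varepsilon}\right)\bigg\{ &\frac{1}{n}\frac{\partial p}{\partial t} + \frac{\varepsilon - h}{T}\frac{\partial T}{\partial t} + m\,v\cdot\big[(v\cdot\nabla)V\big] \\
&+ v\cdot\left(m\,\frac{\partial V}{\partial t} + \frac{1}{n}\nabla p\right) + \frac{\varepsilon - h}{T}\,v\cdot\nabla T - F\cdot v\bigg\} .
\end{aligned} \tag{8.76}
$$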
⁵ In the chapter on thermodynamics, we adopted a slightly different definition of c_p as the heat capacity per mole. In this chapter c_p is the heat capacity per particle.

To simplify this, first note that Newton’s laws applied to an ideal fluid give ρV̇ = −∇p, where ρ = mn
is the mass density. Corrections to this result, e.g. viscosity and nonlinearity in V , are of higher order.
Next, continuity for particle number means ṅ + ∇·(nV ) = 0. We assume V is zero on average and that
all derivatives are small, hence ∇·(nV ) = V ·∇n + n ∇·V ≈ n ∇·V . Thus,

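$$
\frac{\partial\ln n}{\partial t} = \frac{\partial\ln p}{\partial t} - \frac{\partial\ln T}{\partial t} = -\boldsymbol{\nabla}\cdot\boldsymbol{V} , \tag{8.77}
$$
where we have invoked the ideal gas law $n = p/k_B T$ above.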
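Next, we invoke conservation of entropy. If $s$ is the entropy per particle, then $ns$ is the entropy per unit volume, in which case we have the continuity equation
$$
\frac{\partial(ns)}{\partial t} + \boldsymbol{\nabla}\cdot(ns\boldsymbol{V}) = n\left(\frac{\partial s}{\partial t} + \boldsymbol{V}\cdot\boldsymbol{\nabla} s\right) + s\left(\frac{\partial n}{\partial t} + \boldsymbol{\nabla}\cdot(n\boldsymbol{V})\right) = 0 . \tag{8.78}
$$
The second bracketed term on the RHS vanishes because of particle continuity, leaving us with $\dot s + \boldsymbol{V}\cdot\boldsymbol{\nabla} s \approx \dot s = 0$ (since $\boldsymbol{V} = 0$ on average, and any gradient is first order in smallness). Now thermodynamics says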
   
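$$
ds = \left(\frac{\partial s}{\partial T}\right)_{\!p} dT + \left(\frac{\partial s}{\partial p}\right)_{\!T} dp = \frac{c_p}{T}\,dT - \frac{k_B}{p}\,dp , \tag{8.79}
$$
since $T\left(\partial s/\partial T\right)_p = c_p$ and, by a Maxwell relation, $\left(\partial s/\partial p\right)_T = -\left(\partial v/\partial T\right)_p = -k_B/p$, where $v = V/N$ is the volume per particle (here $V$ denotes the volume, not the flow velocity). Thus,
$$
\frac{c_p}{k_B}\frac{\partial\ln T}{\partial t} - \frac{\partial\ln p}{\partial t} = 0 . \tag{8.80}
$$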
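We now have in eqns. 8.77 and 8.80 two equations in the two unknowns $\partial\ln T/\partial t$ and $\partial\ln p/\partial t$. Solving them, and using $c_p - c_V = k_B$ for the ideal gas, yields
$$
\frac{\partial\ln T}{\partial t} = -\frac{k_B}{c_V}\,\boldsymbol{\nabla}\cdot\boldsymbol{V} \tag{8.81}
$$
$$
\frac{\partial\ln p}{\partial t} = -\frac{c_p}{c_V}\,\boldsymbol{\nabla}\cdot\boldsymbol{V} . \tag{8.82}
$$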
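The elimination leading to eqns. 8.81 and 8.82 is simple enough to check with exact rational arithmetic. The following is a minimal Python sketch. It assumes an ideal gas, so that $c_p - c_V = k_B$, works in units with $k_B = 1$, and expresses both rates per unit $\boldsymbol{\nabla}\cdot\boldsymbol{V}$. The function name `log_rates` is our own illustrative choice.

```python
from fractions import Fraction

def log_rates(cp_over_kB, div_V=Fraction(1)):
    """Solve eqns. 8.77 and 8.80 for x = d(ln T)/dt and y = d(ln p)/dt.

    Eqn. 8.77:  y - x = -div_V
    Eqn. 8.80:  (cp/kB) * x - y = 0
    Adding the two eliminates y, leaving (cp/kB - 1) * x = -div_V.
    """
    x = -div_V / (cp_over_kB - 1)
    y = cp_over_kB * x
    return x, y

cp = Fraction(5, 2)   # monatomic ideal gas: cp / kB
cV = cp - 1           # ideal gas: cp - cV = kB
x, y = log_rates(cp)
print(x, y)           # -2/3 -5/3
# Agreement with the closed forms 8.81 and 8.82:
assert x == -1 / cV and y == -cp / cV
```

For a diatomic gas with $c_p = \tfrac{7}{2}k_B$, the same call returns $-2/5$ and $-7/5$.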

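Thus eqn. 8.76 becomes
$$
\frac{\partial f^0}{\partial t} + \frac{\boldsymbol{p}}{m}\cdot\frac{\partial f^0}{\partial\boldsymbol{r}} + \boldsymbol{F}\cdot\frac{\partial f^0}{\partial\boldsymbol{p}} = \left(-\frac{\partial f^0}{\partial\varepsilon}\right)\left\{\frac{\varepsilon(\Gamma) - h}{T}\,\boldsymbol{v}\cdot\boldsymbol{\nabla} T + m\,v_\alpha v_\beta\, Q_{\alpha\beta} + \frac{h - T c_p - \varepsilon(\Gamma)}{c_V/k_B}\,\boldsymbol{\nabla}\cdot\boldsymbol{V} - \boldsymbol{F}\cdot\boldsymbol{v}\right\} , \tag{8.83}
$$
where
$$
Q_{\alpha\beta} = \frac{1}{2}\left(\frac{\partial V_\alpha}{\partial x_\beta} + \frac{\partial V_\beta}{\partial x_\alpha}\right) . \tag{8.84}
$$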

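Therefore, the Boltzmann equation takes the form
$$
\left\{\frac{\varepsilon(\Gamma) - h}{T}\,\boldsymbol{v}\cdot\boldsymbol{\nabla} T + m\,v_\alpha v_\beta\, Q_{\alpha\beta} - \frac{\varepsilon(\Gamma) - h + T c_p}{c_V/k_B}\,\boldsymbol{\nabla}\cdot\boldsymbol{V} - \boldsymbol{F}\cdot\boldsymbol{v}\right\}\frac{f^0}{k_B T} + \frac{\partial\,\delta f}{\partial t} = \left(\frac{\partial f}{\partial t}\right)_{\rm coll} . \tag{8.85}
$$
Notice we have dropped the terms $\boldsymbol{v}\cdot\partial\,\delta f/\partial\boldsymbol{r}$ and $\boldsymbol{F}\cdot\partial\,\delta f/\partial\boldsymbol{p}$, since $\delta f$ must already be first order in smallness, and both the $\partial/\partial\boldsymbol{r}$ operator as well as $\boldsymbol{F}$ add a second order of smallness, which is negligible. Typically $\partial\,\delta f/\partial t$ is nonzero if the applied force $\boldsymbol{F}(t)$ is time-dependent. We use the convention of summing over repeated indices. Note that $\delta_{\alpha\beta}Q_{\alpha\beta} = Q_{\alpha\alpha} = \boldsymbol{\nabla}\cdot\boldsymbol{V}$. For ideal gases in which only translational and rotational degrees of freedom are excited, $h = c_p T$.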

8.5 Relaxation Time Approximation

8.5.1 Approximation of collision integral

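We now consider a very simple model of the collision integral,
$$
\left(\frac{\partial f}{\partial t}\right)_{\rm coll} = -\frac{f - f^0}{\tau} = -\frac{\delta f}{\tau} . \tag{8.86}
$$
This model is known as the relaxation time approximation. Here, $f^0 = f^0(\boldsymbol{r},\boldsymbol{p},t)$ is a distribution function which describes a local equilibrium at each position $\boldsymbol{r}$ and time $t$. The quantity $\tau$ is the relaxation time, which can in principle be momentum-dependent, but which we shall first consider to be constant. In the absence of streaming terms, we have
$$
\frac{\partial\,\delta f}{\partial t} = -\frac{\delta f}{\tau} \quad\Longrightarrow\quad \delta f(\boldsymbol{r},\boldsymbol{p},t) = \delta f(\boldsymbol{r},\boldsymbol{p},0)\, e^{-t/\tau} . \tag{8.87}
$$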
The distribution f then relaxes to the equilibrium distribution f 0 on a time scale τ . We note that this
approximation is obviously flawed in that all quantities – even the collisional invariants – relax to their
equilibrium values on the scale τ . In the Appendix, we consider a model for the collision integral in which
the collisional invariants are all preserved, but everything else relaxes to local equilibrium at a single rate.

8.5.2 Computation of the scattering time

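Consider two particles with velocities $\boldsymbol{v}$ and $\boldsymbol{v}'$. The average of their relative speed is
$$
\langle\,|\boldsymbol{v} - \boldsymbol{v}'|\,\rangle = \int\! d^3v \int\! d^3v'\, P(\boldsymbol{v})\, P(\boldsymbol{v}')\, |\boldsymbol{v} - \boldsymbol{v}'| , \tag{8.88}
$$
where $P(\boldsymbol{v})$ is the Maxwell velocity distribution,
$$
P(\boldsymbol{v}) = \left(\frac{m}{2\pi k_B T}\right)^{3/2} \exp\!\left(-\frac{m\boldsymbol{v}^2}{2k_B T}\right) , \tag{8.89}
$$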

Figure 8.3: Graphic representation of the equation n σ v̄rel τ = 1, which yields the scattering time τ in
terms of the number density n, average particle pair relative velocity v̄rel , and two-particle total scattering
cross section σ. The equation says that on average there must be one particle within the tube.

which follows from the Boltzmann form of the equilibrium distribution $f^0(\boldsymbol{p})$. It is left as an exercise for the student to verify that
$$
\bar v_{\rm rel} \equiv \langle\,|\boldsymbol{v} - \boldsymbol{v}'|\,\rangle = \frac{4}{\sqrt{\pi}}\left(\frac{k_B T}{m}\right)^{1/2} . \tag{8.90}
$$
Note that $\bar v_{\rm rel} = \sqrt{2}\,\bar v$, where $\bar v$ is the average particle speed. Let $\sigma$ be the total scattering cross section, which for hard spheres is $\sigma = \pi d^2$, where $d$ is the hard sphere diameter. Then the rate at which particles scatter is
$$
\frac{1}{\tau} = n\,\bar v_{\rm rel}\,\sigma . \tag{8.91}
$$
The particle mean free path is simply
$$
\ell = \bar v\,\tau = \frac{1}{\sqrt{2}\,n\sigma} . \tag{8.92}
$$
While the scattering length is not temperature-dependent within this formalism, the scattering time is $T$-dependent, with
$$
\tau(T) = \frac{1}{n\,\bar v_{\rm rel}\,\sigma} = \frac{\sqrt{\pi}}{4n\sigma}\left(\frac{m}{k_B T}\right)^{1/2} . \tag{8.93}
$$
As $T \to 0$ at fixed density $n$, the collision time diverges as $\tau \propto T^{-1/2}$, because the particles on average move more slowly at lower temperatures. The mean free path, however, is independent of $T$, and is given by $\ell = 1/(\sqrt{2}\,n\sigma)$.
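Eqn. 8.90 and the relation $\bar v_{\rm rel} = \sqrt{2}\,\bar v$ can be checked by Monte Carlo sampling of Maxwellian velocities. This is a sketch in reduced units $m = k_B T = 1$, so that each velocity component is a unit-variance Gaussian. The sample size and seed are arbitrary choices, and the last lines evaluate eqns. 8.91 and 8.92 for the illustrative values $n = \sigma = 1$.

```python
import math
import random

def sample_speeds(n_samples=100_000, seed=1):
    """Monte Carlo estimates of <|v|> and <|v - v'|> for a Maxwellian.

    Reduced units m = k_B T = 1, so each Cartesian velocity component is
    an independent Gaussian of unit variance (eqn. 8.89).
    """
    rng = random.Random(seed)
    sum_v = sum_rel = 0.0
    for _ in range(n_samples):
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        w = [rng.gauss(0.0, 1.0) for _ in range(3)]
        sum_v += math.sqrt(sum(c * c for c in v))
        sum_rel += math.sqrt(sum((a - b) ** 2 for a, b in zip(v, w)))
    return sum_v / n_samples, sum_rel / n_samples

vbar, vrel = sample_speeds()
print(vbar, math.sqrt(8.0 / math.pi))   # mean speed vs sqrt(8 k_B T / pi m)
print(vrel, 4.0 / math.sqrt(math.pi))   # eqn. 8.90
print(vrel / vbar, math.sqrt(2.0))

# Scattering time and mean free path, eqns. 8.91 and 8.92, for n = sigma = 1:
n, sigma = 1.0, 1.0
tau = 1.0 / (n * vrel * sigma)
ell = vbar * tau
print(tau, ell, 1.0 / (math.sqrt(2.0) * n * sigma))
```

Since $\ell = \bar v/(n\bar v_{\rm rel}\sigma)$, the temperature drops out of the mean free path, which is why the last line compares against $1/(\sqrt{2}\,n\sigma)$ alone.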

8.5.3 Thermal conductivity

We consider a system with a temperature gradient $\boldsymbol{\nabla} T$ and seek a steady state (i.e. time-independent) solution to the Boltzmann equation. We assume $F_\alpha = Q_{\alpha\beta} = 0$. Appealing to eqn. 8.85, and using the relaxation time approximation for the collision integral, we have
$$
\delta f = -\frac{\tau\,(\varepsilon - c_p T)}{k_B T^2}\,(\boldsymbol{v}\cdot\boldsymbol{\nabla} T)\, f^0 . \tag{8.94}
$$

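We are now ready to compute the energy and particle currents. In order to compute the local density of any quantity $A(\boldsymbol{r},\boldsymbol{p})$, we multiply by the distribution $f(\boldsymbol{r},\boldsymbol{p})$ and integrate over momentum:
$$
\rho_A(\boldsymbol{r},t) = \int\! d^3p\, A(\boldsymbol{r},\boldsymbol{p})\, f(\boldsymbol{r},\boldsymbol{p},t) . \tag{8.95}
$$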

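For the energy (thermal) current, we let $A = \varepsilon v_\alpha = \varepsilon p_\alpha/m$, in which case $\rho_A = j_\varepsilon^\alpha$. Note that $\int\! d^3p\; \boldsymbol{p}\, f^0 = 0$ since $f^0$ is isotropic in $\boldsymbol{p}$ even when $\mu$ and $T$ depend on $\boldsymbol{r}$. Thus, only $\delta f$ enters into the calculation of the various currents, and the energy (thermal) current is
$$
j_\varepsilon^\alpha(\boldsymbol{r}) = \int\! d^3p\; \varepsilon\, v^\alpha\, \delta f = -\frac{n\tau}{k_B T^2}\,\big\langle v^\alpha v^\beta\, \varepsilon\,(\varepsilon - c_p T)\big\rangle\,\frac{\partial T}{\partial x^\beta} , \tag{8.96}
$$
where the repeated index $\beta$ is summed over, and where momentum averages are defined relative to the equilibrium distribution, i.e.
$$
\langle\phi(\boldsymbol{p})\rangle = \int\! d^3p\; \phi(\boldsymbol{p})\, f^0(\boldsymbol{p}) \bigg/ \int\! d^3p\; f^0(\boldsymbol{p}) = \int\! d^3v\; P(\boldsymbol{v})\, \phi(m\boldsymbol{v}) . \tag{8.97}
$$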

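In this context, it is useful to point out the identity
$$
d^3p\; f^0(\boldsymbol{p}) = n\, d^3v\; P(\boldsymbol{v}) , \tag{8.98}
$$
where
$$
P(\boldsymbol{v}) = \left(\frac{m}{2\pi k_B T}\right)^{3/2} e^{-m(\boldsymbol{v}-\boldsymbol{V})^2/2k_B T} \tag{8.99}
$$
is the Maxwell velocity distribution.
Note that if $\phi = \phi(\varepsilon)$ is a function of the energy, and if $\boldsymbol{V} = 0$, then (after integrating over the direction of $\boldsymbol{v}$)
$$
d^3p\; f^0(\boldsymbol{p}) = n\, d^3v\; P(\boldsymbol{v}) = n\, \tilde P(\varepsilon)\, d\varepsilon , \tag{8.100}
$$
where
$$
\tilde P(\varepsilon) = \frac{2}{\sqrt{\pi}}\,(k_B T)^{-3/2}\, \varepsilon^{1/2}\, e^{-\varepsilon/k_B T} \tag{8.101}
$$
is the Maxwellian distribution of single particle energies. This distribution is normalized with $\int_0^\infty\! d\varepsilon\; \tilde P(\varepsilon) = 1$. Averages with respect to this distribution are given by
$$
\langle\phi(\varepsilon)\rangle = \int_0^\infty\! d\varepsilon\; \phi(\varepsilon)\, \tilde P(\varepsilon) = \frac{2}{\sqrt{\pi}}\,(k_B T)^{-3/2}\int_0^\infty\! d\varepsilon\; \varepsilon^{1/2}\, \phi(\varepsilon)\, e^{-\varepsilon/k_B T} . \tag{8.102}
$$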

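For a power of the energy, $\phi(\varepsilon) = \varepsilon^\alpha$, we have, for any $\alpha > -\tfrac{3}{2}$,
$$
\langle\varepsilon^\alpha\rangle = \frac{2}{\sqrt{\pi}}\,\Gamma\!\left(\alpha + \tfrac{3}{2}\right)(k_B T)^\alpha . \tag{8.103}
$$
Due to spatial isotropy, it is clear that we can replace
$$
v^\alpha v^\beta \to \tfrac{1}{3}\,v^2\,\delta_{\alpha\beta} = \frac{2\varepsilon}{3m}\,\delta_{\alpha\beta} \tag{8.104}
$$
in eqn. 8.96. We then have $\boldsymbol{j}_\varepsilon = -\kappa\,\boldsymbol{\nabla} T$, with
$$
\kappa = \frac{2n\tau}{3mk_B T^2}\,\big\langle\varepsilon^2(\varepsilon - c_p T)\big\rangle = \frac{5n\tau k_B^2 T}{2m} = \frac{\pi}{8}\, n\ell\bar v\, c_p , \tag{8.105}
$$
where we have used $c_p = \tfrac{5}{2}k_B$ and $\bar v^2 = 8k_B T/\pi m$. The quantity $\kappa$ is called the thermal conductivity. Note that $\kappa \propto T^{1/2}$, since $\tau \propto T^{-1/2}$.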
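The moments in eqn. 8.103 and the energy average in eqn. 8.105 can be verified by direct quadrature. This is a minimal sketch in reduced units $k_B T = 1$, with $n = \tau = m = k_B = 1$ in the last step. Two choices are ours rather than anything in the text: Simpson's rule, and the substitution $\varepsilon = k_B T x^2$, which removes the $\varepsilon^{1/2}$ behaviour at the origin.

```python
import math

def maxwell_energy_average(phi, kT=1.0, x_max=10.0, n=2000):
    """<phi(eps)> over the Maxwellian energy distribution, eqn. 8.102.

    With eps = kT * x**2 the average becomes
    (4 / sqrt(pi)) * integral_0^inf x^2 phi(kT x^2) exp(-x^2) dx,
    evaluated here by composite Simpson's rule (n must be even).
    """
    h = x_max / n
    total = 0.0
    for i in range(n + 1):
        x = i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * x * x * phi(kT * x * x) * math.exp(-x * x)
    return 4.0 / math.sqrt(math.pi) * total * h / 3.0

def moment_exact(alpha, kT=1.0):
    """Closed form of eqn. 8.103 (valid for alpha > -3/2)."""
    return 2.0 / math.sqrt(math.pi) * math.gamma(alpha + 1.5) * kT ** alpha

for alpha in (0, 1, 2, 3):
    print(alpha, maxwell_energy_average(lambda e: e ** alpha), moment_exact(alpha))

# Energy average entering kappa (eqn. 8.105), monatomic gas with cp T = (5/2) kB T:
cpT = 2.5
avg = maxwell_energy_average(lambda e: e * e * (e - cpT))   # (15/4) (kB T)^3
kappa = 2.0 / 3.0 * avg                                     # 2 n tau / (3 m kB T^2) * avg
print(avg, kappa)                                           # kappa -> 5/2
```

The average $\langle\varepsilon^2(\varepsilon - c_pT)\rangle = \tfrac{105}{8} - \tfrac{5}{2}\cdot\tfrac{15}{4} = \tfrac{15}{4}$ (in units of $(k_BT)^3$) is what turns the prefactor $2n\tau/3mk_BT^2$ into $5n\tau k_B^2T/2m$.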

Figure 8.4: Gedankenexperiment to measure shear viscosity η in a fluid. The lower plate is fixed. The
viscous drag force per unit area on the upper plate is Fdrag /A = −ηV /d. This must be balanced by an
applied force F .

8.5.4 Viscosity

Consider the situation depicted in fig. 8.4. A fluid filling the space between two large flat plates at $z = 0$ and $z = d$ is set in motion by a force $\boldsymbol{F} = F\hat{\boldsymbol{x}}$ applied to the upper plate; the lower plate is fixed. It is assumed that the fluid's velocity locally matches that of the plates. Fluid particles at the top have an average $x$-component of their momentum $\langle p_x\rangle = mV$. As these particles move downward toward lower $z$ values, they bring their $x$-momenta with them. Therefore there is a downward ($-\hat{\boldsymbol{z}}$-directed) flow of $\langle p_x\rangle$. Since $x$-momentum is constantly being drawn away from the $z = d$ plane, there is a $-\hat{\boldsymbol{x}}$-directed viscous drag on the upper plate. The viscous drag force per unit area is given by $F_{\rm drag}/A = -\eta V/d$, where $V/d = \partial V_x/\partial z$ is the velocity gradient and $\eta$ is the shear viscosity. In steady state, the applied force balances the drag force, i.e. $F + F_{\rm drag} = 0$. Clearly in the steady state the net momentum density of the fluid does not change, and is given by $\tfrac{1}{2}\rho V\hat{\boldsymbol{x}}$, where $\rho$ is the fluid mass density. The momentum per unit time injected into the fluid by the upper plate at $z = d$ is then extracted by the lower plate at $z = 0$. The momentum flux density $\Pi_{xz} = n\langle p_x v_z\rangle$ is the drag force on the upper surface per unit area: $\Pi_{xz} = -\eta\,\partial V_x/\partial z$. The units of viscosity are $[\eta] = M/LT$.

We now provide some formal definitions of viscosity. As we shall see presently, there is in fact a second
type of viscosity, called second viscosity or bulk viscosity, which is measurable although not by the type
of experiment depicted in fig. 8.4.
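The momentum flux tensor $\Pi_{\alpha\beta} = n\langle p_\alpha v_\beta\rangle$ is defined to be the current of momentum component $p_\alpha$ in the direction of increasing $x_\beta$. For a gas in motion with average velocity $\boldsymbol{V}$, we have
$$
\begin{aligned}
\Pi_{\alpha\beta} &= nm\,\big\langle (V_\alpha + v'_\alpha)(V_\beta + v'_\beta)\big\rangle \\
&= nm\,V_\alpha V_\beta + nm\,\langle v'_\alpha v'_\beta\rangle \\
&= nm\,V_\alpha V_\beta + \tfrac{1}{3}\,nm\,\langle \boldsymbol{v}'^2\rangle\,\delta_{\alpha\beta} \\
&= \rho\,V_\alpha V_\beta + p\,\delta_{\alpha\beta} ,
\end{aligned} \tag{8.106}
$$
where $\boldsymbol{v}'$ is the particle velocity in a frame moving with velocity $\boldsymbol{V}$, and where we have invoked the ideal gas law $p = nk_B T$. The mass density is $\rho = nm$.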

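When $\boldsymbol{V}$ is spatially varying,
$$
\Pi_{\alpha\beta} = p\,\delta_{\alpha\beta} + \rho\,V_\alpha V_\beta - \tilde\sigma_{\alpha\beta} , \tag{8.107}
$$
where $\tilde\sigma_{\alpha\beta}$ is the viscosity stress tensor. Any symmetric tensor, such as $\tilde\sigma_{\alpha\beta}$, can be decomposed into a sum of (i) a traceless component, and (ii) a component proportional to the identity matrix. Since $\tilde\sigma_{\alpha\beta}$ should be, to first order, linear in the spatial derivatives of the components of the velocity field $\boldsymbol{V}$, there is a unique two-parameter decomposition:
$$
\begin{aligned}
\tilde\sigma_{\alpha\beta} &= \eta\left(\frac{\partial V_\alpha}{\partial x_\beta} + \frac{\partial V_\beta}{\partial x_\alpha} - \frac{2}{3}\,\boldsymbol{\nabla}\cdot\boldsymbol{V}\,\delta_{\alpha\beta}\right) + \zeta\,\boldsymbol{\nabla}\cdot\boldsymbol{V}\,\delta_{\alpha\beta} \\
&= 2\eta\left(Q_{\alpha\beta} - \tfrac{1}{3}\,{\rm Tr}(Q)\,\delta_{\alpha\beta}\right) + \zeta\,{\rm Tr}(Q)\,\delta_{\alpha\beta} .
\end{aligned} \tag{8.108}
$$
The coefficient of the traceless component is $\eta$, known as the shear viscosity. The coefficient of the component proportional to the identity is $\zeta$, known as the bulk viscosity. The full stress tensor $\sigma_{\alpha\beta}$ contains a contribution from the pressure: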

$$
\sigma_{\alpha\beta} = -p\,\delta_{\alpha\beta} + \tilde\sigma_{\alpha\beta} . \tag{8.109}
$$
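The decomposition in eqn. 8.108 is easy to implement for a given velocity-gradient matrix. This is a minimal sketch. It assumes the convention `grad_v[a][b]` $= \partial V_a/\partial x_b$; the function names and the numerical values of $\eta$, $\zeta$, and the gradients are illustrative only.

```python
def kronecker(a, b):
    return 1.0 if a == b else 0.0

def viscous_stress(grad_v, eta, zeta):
    """Viscous stress tensor of eqn. 8.108, with grad_v[a][b] = dV_a/dx_b."""
    Q = [[0.5 * (grad_v[a][b] + grad_v[b][a]) for b in range(3)] for a in range(3)]
    tr_q = Q[0][0] + Q[1][1] + Q[2][2]            # Tr Q = div V
    return [[2.0 * eta * (Q[a][b] - tr_q / 3.0 * kronecker(a, b))   # traceless (shear) part
             + zeta * tr_q * kronecker(a, b)                          # isotropic (bulk) part
             for b in range(3)] for a in range(3)]

def total_stress(grad_v, p, eta, zeta):
    """Full stress tensor of eqn. 8.109: sigma = -p * identity + viscous stress."""
    s = viscous_stress(grad_v, eta, zeta)
    return [[s[a][b] - p * kronecker(a, b) for b in range(3)] for a in range(3)]

# Simple shear V_x = G z, as in the parallel-plate geometry of fig. 8.4:
G = 2.0
shear = [[0.0, 0.0, G], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
s = viscous_stress(shear, eta=1.5, zeta=0.7)
print(s[0][2], s[2][0])        # eta * G in both slots, so Pi_xz = -eta dV_x/dz

# Uniform dilation V = c r: only the bulk viscosity contributes.
c = 0.1
dil = viscous_stress([[c, 0, 0], [0, c, 0], [0, 0, c]], eta=1.5, zeta=0.7)
print(dil[0][0], 3 * c * 0.7)  # zeta * div V on the diagonal
```

Simple shear probes only $\eta$ (it has ${\rm Tr}\,Q = 0$), while a pure dilation probes only $\zeta$, which is why the parallel-plate experiment of fig. 8.4 cannot measure the bulk viscosity.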

The differential force $dF_\alpha$ that a fluid exerts on a surface element $\hat{\boldsymbol{n}}\,dA$ is
$$
dF_\alpha = -\sigma_{\alpha\beta}\,n_\beta\,dA , \tag{8.110}
$$

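where we are using the Einstein summation convention and summing over the repeated index $\beta$. We will now compute the shear viscosity $\eta$ using the Boltzmann equation in the relaxation time approximation. Appealing again to eqn. 8.85, with $\boldsymbol{F} = 0$ and $h = c_p T$, we find
$$
\delta f = -\frac{\tau}{k_B T}\left\{m\,v_\alpha v_\beta\, Q_{\alpha\beta} + \frac{\varepsilon - c_p T}{T}\,\boldsymbol{v}\cdot\boldsymbol{\nabla} T - \frac{\varepsilon}{c_V/k_B}\,\boldsymbol{\nabla}\cdot\boldsymbol{V}\right\} f^0 . \tag{8.111}
$$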

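We assume $\boldsymbol{\nabla} T = \boldsymbol{\nabla}\cdot\boldsymbol{V} = 0$, and we compute the momentum flux:
$$
\begin{aligned}
\Pi_{xz} &= \int\! d^3p\; p_x v_z\, \delta f \\
&= -\frac{nm^2\tau}{k_B T}\, Q_{\alpha\beta}\,\langle v_x v_z v_\alpha v_\beta\rangle \\
&= -\frac{n\tau}{k_B T}\left(\frac{\partial V_x}{\partial z} + \frac{\partial V_z}{\partial x}\right)\langle m v_x^2 \cdot m v_z^2\rangle \\
&= -n\tau k_B T\left(\frac{\partial V_z}{\partial x} + \frac{\partial V_x}{\partial z}\right) .
\end{aligned} \tag{8.112}
$$
Only the $(\alpha,\beta) = (x,z)$ and $(z,x)$ terms survive the average in the second line, and in the last line we used $\langle m v_x^2\rangle = \langle m v_z^2\rangle = k_B T$ for independent Gaussian velocity components.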

Thus, if $V_x = V_x(z)$, we have
$$
\Pi_{xz} = -n\tau k_B T\,\frac{\partial V_x}{\partial z} , \tag{8.113}
$$
from which we read off the viscosity,
$$
\eta = nk_B T\tau = \frac{\pi}{8}\, nm\ell\bar v . \tag{8.114}
$$
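The angular average in eqn. 8.112 can be checked by sampling. The sketch below uses reduced units $n = \tau = m = k_B T = 1$, an arbitrary seed, and an illustrative shear rate. It estimates the tensor $M_{\alpha\beta} = \langle v_x v_z v_\alpha v_\beta\rangle$ and contracts it with $Q_{\alpha\beta}$. Only the $(x,z)$ and $(z,x)$ entries survive, each equal to $\langle v_x^2 v_z^2\rangle = (k_B T/m)^2$, which is what produces $\eta = nk_B T\tau$.

```python
import random

def fourth_moment_xz(n_samples=100_000, seed=2):
    """Monte Carlo estimate of M[a][b] = <v_x v_z v_a v_b> (units m = k_B T = 1)."""
    rng = random.Random(seed)
    M = [[0.0] * 3 for _ in range(3)]
    for _ in range(n_samples):
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        pref = v[0] * v[2]
        for a in range(3):
            for b in range(3):
                M[a][b] += pref * v[a] * v[b]
    return [[entry / n_samples for entry in row] for row in M]

M = fourth_moment_xz()
dVx_dz = 0.3                             # illustrative shear rate, with dV_z/dx = 0
q = 0.5 * dVx_dz                         # Q_xz = Q_zx
Q = [[0.0, 0.0, q], [0.0, 0.0, 0.0], [q, 0.0, 0.0]]
contraction = sum(Q[a][b] * M[a][b] for a in range(3) for b in range(3))
Pi_xz = -contraction                     # eqn. 8.112 with n m^2 tau / k_B T = 1
eta_estimate = -Pi_xz / dVx_dz           # compare eta = n k_B T tau = 1
print(M[0][2], M[2][0], eta_estimate)
```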



Figure 8.5: Left: thermal conductivity ($\lambda$ in figure) of Ar between $T = 800$ K and $T = 2600$ K. The best fit to a single power law $\lambda = aT^b$ results in $b = 0.651$. Source: G. S. Springer and E. W. Wingeier, J. Chem. Phys. 59, 1747 (1972). Right: log-log plot of shear viscosity ($\mu$ in figure) of He between $T \approx 15$ K and $T \approx 1000$ K. The red line has slope $\tfrac{1}{2}$. The slope of the data is approximately 0.633. Source: J. Kestin and W. Leidenfrost, Physica 25, 537 (1959).

Note that $\eta(T) \propto T^{1/2}$.


How well do these predictions hold up? In fig. 8.5, we plot data for the thermal conductivity of argon
and the shear viscosity of helium. Both show a clear sublinear behavior as a function of temperature, but
the slope d ln κ/d ln T is approximately 0.65 and d ln η/d ln T is approximately 0.63. Clearly the simple
model is not even getting the functional dependence on T right, let alone its coefficient. Still, our crude
theory is at least qualitatively correct.
Why do both κ(T ) as well as η(T ) decrease at low temperatures? The reason is that the heat current
which flows in response to ∇T as well as the momentum current which flows in response to ∂Vx /∂z
are due to the presence of collisions, which result in momentum and energy transfer between particles.
This is true even when total energy and momentum are conserved, which they are not in the relaxation
time approximation. Intuitively, we might think that the viscosity should increase as the temperature
is lowered, since common experience tells us that fluids ‘gum up’ as they get colder – think of honey
as an extreme example. But of course honey is nothing like an ideal gas, and the physics behind the
crystallization or glass transition which occurs in real fluids when they get sufficiently cold is completely
absent from our approach. In our calculation, viscosity results from collisions, and with no collisions
there is no momentum transfer and hence no viscosity. If, for example, the gas particles were to simply
pass through each other, as though they were ghosts, then there would be no opposition to maintaining
an arbitrary velocity gradient.

8.5.5 Oscillating external force

Suppose a uniform oscillating external force $\boldsymbol{F}_{\rm ext}(t) = \boldsymbol{F} e^{-i\omega t}$ is applied. For a system of charged particles, this force would arise from an external electric field $\boldsymbol{F}_{\rm ext} = q\boldsymbol{E}\, e^{-i\omega t}$, where $q$ is the charge of each particle. We'll assume $\boldsymbol{\nabla} T = 0$. The Boltzmann equation is then written

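$$
\frac{\partial f}{\partial t} + \frac{\boldsymbol{p}}{m}\cdot\frac{\partial f}{\partial\boldsymbol{r}} + \boldsymbol{F} e^{-i\omega t}\cdot\frac{\partial f}{\partial\boldsymbol{p}} = -\frac{f - f^0}{\tau} . \tag{8.115}
$$
We again write $f = f^0 + \delta f$, and we assume $\delta f$ is spatially constant. Thus,
$$
\frac{\partial\,\delta f}{\partial t} + \boldsymbol{F} e^{-i\omega t}\cdot\boldsymbol{v}\,\frac{\partial f^0}{\partial\varepsilon} = -\frac{\delta f}{\tau} . \tag{8.116}
$$
If we assume $\delta f(t) = \delta f(\omega)\, e^{-i\omega t}$ then the above differential equation is converted to an algebraic equation, with solution
$$
\delta f(t) = -\frac{\tau\, e^{-i\omega t}}{1 - i\omega\tau}\,\frac{\partial f^0}{\partial\varepsilon}\,\boldsymbol{F}\cdot\boldsymbol{v} . \tag{8.117}
$$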
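We now compute the particle current:
$$
\begin{aligned}
j_\alpha(\boldsymbol{r},t) &= \int\! d^3p\; v_\alpha\, \delta f \\
&= \frac{\tau\, e^{-i\omega t}}{1 - i\omega\tau}\cdot\frac{F_\beta}{k_B T}\int\! d^3p\; f^0(\boldsymbol{p})\, v_\alpha v_\beta \\
&= \frac{\tau\, e^{-i\omega t}}{1 - i\omega\tau}\cdot\frac{nF_\alpha}{3k_B T}\int\! d^3v\; P(\boldsymbol{v})\, \boldsymbol{v}^2 \\
&= \frac{n\tau}{m}\cdot\frac{F_\alpha\, e^{-i\omega t}}{1 - i\omega\tau} .
\end{aligned} \tag{8.118}
$$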

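If the particles are electrons, with charge $q = -e$, then the electrical current is $(-e)$ times the particle current. We then obtain
$$
j_\alpha^{\rm (elec)}(t) = \frac{ne^2\tau}{m}\cdot\frac{E_\alpha\, e^{-i\omega t}}{1 - i\omega\tau} \equiv \sigma_{\alpha\beta}(\omega)\, E_\beta\, e^{-i\omega t} , \tag{8.119}
$$
where
$$
\sigma_{\alpha\beta}(\omega) = \frac{ne^2\tau}{m}\cdot\frac{1}{1 - i\omega\tau}\,\delta_{\alpha\beta} \tag{8.120}
$$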
is the frequency-dependent electrical conductivity tensor. Of course for fermions such as electrons, we
should be using the Fermi distribution in place of the Maxwell-Boltzmann distribution for f 0 (p). This
affects the relation between n and µ only, and the final result for the conductivity tensor σαβ (ω) is
unchanged.
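Eqn. 8.120 is the Drude form of the conductivity. The short sketch below evaluates it with Python's built-in complex numbers, in illustrative units $n = e = m = \tau = 1$ so that the DC value is 1. The helper name `drude_sigma` is ours. The sketch also checks ${\rm Re}\,\sigma(\omega) = \sigma_0/(1 + \omega^2\tau^2)$, which follows from multiplying the numerator and denominator of eqn. 8.120 by $1 + i\omega\tau$.

```python
def drude_sigma(omega, n=1.0, e=1.0, tau=1.0, m=1.0):
    """Scalar part of sigma_ab(omega) in eqn. 8.120 (the delta_ab factor is dropped)."""
    sigma0 = n * e**2 * tau / m          # DC conductivity
    return sigma0 / (1 - 1j * omega * tau)

for omega in (0.0, 1.0, 10.0):
    s = drude_sigma(omega)
    # Real part: current in phase with E; imaginary part: current out of phase with E.
    print(omega, s.real, s.imag, 1.0 / (1.0 + omega**2))
```

At $\omega\tau = 1$ the in-phase and out-of-phase parts are equal, each $\sigma_0/2$; for $\omega\tau \gg 1$ the response becomes predominantly reactive.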

8.5.6 Quick and Dirty Treatment of Transport

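Suppose we have some averaged intensive quantity $\phi$ which is spatially dependent through $T(\boldsymbol{r})$ or $\mu(\boldsymbol{r})$ or $\boldsymbol{V}(\boldsymbol{r})$. For simplicity we will write $\phi = \phi(z)$. We wish to compute the current of $\phi$ across a surface of constant $z$. If the mean free path is $\ell$, then the value of $\phi$ for particles crossing this surface in the $+\hat{\boldsymbol{z}}$ direction is $\phi(z - \ell\cos\theta)$, where $\theta$ is the angle the particle's velocity makes with respect to $\hat{\boldsymbol{z}}$, i.e. $\cos\theta = v_z/v$. We perform the same analysis for particles moving in the $-\hat{\boldsymbol{z}}$ direction, for which $\phi = \phi(z + \ell\cos\theta)$, with $\theta$ now measured from $-\hat{\boldsymbol{z}}$, so that $\cos\theta = |v_z|/v$. The current of $\phi$ through this surface is then
$$
\begin{aligned}
\boldsymbol{j}_\phi &= n\hat{\boldsymbol{z}}\int_{v_z>0}\! d^3v\; P(\boldsymbol{v})\, v_z\, \phi(z - \ell\cos\theta) + n\hat{\boldsymbol{z}}\int_{v_z<0}\! d^3v\; P(\boldsymbol{v})\, v_z\, \phi(z + \ell\cos\theta) \\
&= -n\ell\,\hat{\boldsymbol{z}}\int\! d^3v\; P(\boldsymbol{v})\,\frac{v_z^2}{v}\,\frac{\partial\phi}{\partial z} = -\tfrac{1}{3}\,n\bar v\ell\,\hat{\boldsymbol{z}}\,\frac{\partial\phi}{\partial z} ,
\end{aligned} \tag{8.121}
$$
where $\bar v = \sqrt{8k_B T/\pi m}$ is the average particle speed. If the $z$-dependence of $\phi$ comes through the dependence of $\phi$ on the local temperature $T$, then we have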
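$$
\boldsymbol{j}_\phi = -\tfrac{1}{3}\,n\ell\bar v\,\frac{\partial\phi}{\partial T}\,\boldsymbol{\nabla} T \equiv -K\,\boldsymbol{\nabla} T , \tag{8.122}
$$
where
$$
K = \tfrac{1}{3}\,n\ell\bar v\,\frac{\partial\phi}{\partial T} \tag{8.123}
$$
is the transport coefficient. If $\phi = \langle\varepsilon\rangle$, then $\partial\phi/\partial T = c_p$, where $c_p$ is the heat capacity per particle at constant pressure. We then find $\boldsymbol{j}_\varepsilon = -\kappa\,\boldsymbol{\nabla} T$ with thermal conductivity
$$
\kappa = \tfrac{1}{3}\,n\ell\bar v\, c_p . \tag{8.124}
$$
Our Boltzmann equation calculation yielded the same result, but with a prefactor of $\pi/8$ instead of $\tfrac{1}{3}$.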
We can make a similar argument for the viscosity. In this case $\phi = \langle p_x\rangle$ is spatially varying through its dependence on the flow velocity $\boldsymbol{V}(\boldsymbol{r})$. Clearly $\partial\phi/\partial V_x = m$, hence
$$
j^z_{p_x} = \Pi_{xz} = -\tfrac{1}{3}\,nm\ell\bar v\,\frac{\partial V_x}{\partial z} , \tag{8.125}
$$
from which we identify the viscosity, $\eta = \tfrac{1}{3}\,nm\ell\bar v$. Once again, this agrees in its functional dependences with the Boltzmann equation calculation in the relaxation time approximation. Only the coefficients differ. The ratio of the coefficients is $K_{\rm QDC}/K_{\rm BRT} = 8/3\pi = 0.849$ in both cases.⁶
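The two numerical ingredients of this section can be checked directly: the isotropic average $\langle v_z^2/v\rangle = \tfrac{1}{3}\bar v$ used in eqn. 8.121, and the coefficient ratio $8/3\pi$. This is a Monte Carlo sketch in reduced units $m = k_B T = 1$, with an arbitrary sample size and seed.

```python
import math
import random

def vz2_over_v_ratio(n_samples=100_000, seed=3):
    """Estimate <v_z^2 / v> / <v> for a Maxwellian; isotropy predicts 1/3."""
    rng = random.Random(seed)
    acc_vz2_over_v = acc_v = 0.0
    for _ in range(n_samples):
        vx, vy, vz = (rng.gauss(0.0, 1.0) for _ in range(3))
        v = math.sqrt(vx * vx + vy * vy + vz * vz)
        acc_vz2_over_v += vz * vz / v
        acc_v += v
    return acc_vz2_over_v / acc_v

ratio = vz2_over_v_ratio()
print(ratio)                                  # close to 1/3
qdc_over_brt = (1.0 / 3.0) / (math.pi / 8.0)  # prefactor 1/3 (QDC) over pi/8 (BRT)
print(qdc_over_brt, 8.0 / (3.0 * math.pi))    # both about 0.849
```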

8.5.7 Thermal diffusivity, kinematic viscosity, and Prandtl number

Suppose, under conditions of constant pressure, we add heat $q$ per unit volume to an ideal gas. We know from thermodynamics that its temperature will then increase by an amount $\Delta T = q/nc_p$. If a heat current $\boldsymbol{j}_q$ flows, then the continuity equation for energy flow requires
$$
nc_p\,\frac{\partial T}{\partial t} + \boldsymbol{\nabla}\cdot\boldsymbol{j}_q = 0 . \tag{8.126}
$$
In a system where there is no net particle current, the heat current $\boldsymbol{j}_q$ is the same as the energy current $\boldsymbol{j}_\varepsilon$, and since $\boldsymbol{j}_\varepsilon = -\kappa\,\boldsymbol{\nabla} T$, we obtain a diffusion equation for temperature,
$$
\frac{\partial T}{\partial t} = \frac{\kappa}{nc_p}\,\nabla^2 T . \tag{8.127}
$$
⁶ Here we abbreviate QDC for 'quick and dirty calculation' and BRT for 'Boltzmann equation in the relaxation time approximation'.

Gas    η (µPa·s)    κ (mW/(m·K))    c_p/k_B    Pr
He     19.5         149             2.50       0.682
Ar     22.3         17.4            2.50       0.666
Xe     22.7         5.46            2.50       0.659
H2     8.67         179             3.47       0.693
N2     17.6         25.5            3.53       0.721
O2     20.3         26.0            3.50       0.711
CH4    11.2         33.5            4.29       0.74
CO2    14.8         18.1            4.47       0.71
NH3    10.1         24.6            4.50       0.90

Table 8.1: Viscosities, thermal conductivities, and Prandtl numbers for some common gases at T = 293 K and p = 1 atm. (Source: Table 1.1 of Smith and Jensen, with data for triatomic gases added.)

The combination
$$
a \equiv \frac{\kappa}{nc_p} \tag{8.128}
$$
is known as the thermal diffusivity. Our Boltzmann equation calculation in the relaxation time approximation yielded the result $\kappa = nk_B T\tau c_p/m$. Thus, we find $a = k_B T\tau/m$ via this method. Note that the dimensions of $a$ are the same as for any diffusion constant $D$, namely $[a] = L^2/T$.
Another quantity with dimensions of $L^2/T$ is the kinematic viscosity, $\nu = \eta/\rho$, where $\rho = nm$ is the mass density. We found $\eta = nk_B T\tau$ from the relaxation time approximation calculation, hence $\nu = k_B T\tau/m$. The ratio $\nu/a$, called the Prandtl number, ${\rm Pr} = \eta c_p/m\kappa$, is dimensionless. According to our calculations, ${\rm Pr} = 1$. According to table 8.1, the monatomic gases have ${\rm Pr} \approx \tfrac{2}{3}$.
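Because $n$ and $T$ cancel in ${\rm Pr} = \eta c_p/m\kappa$, the last column of Table 8.1 can be reproduced from the other columns alone. The sketch below does this for four of the gases. It also evaluates $a$ and $\nu$ at the stated $T = 293$ K and $p = 1$ atm, assuming the ideal-gas density $n = p/k_B T$. The molar masses are standard values that we supply (they are not in the table), and the helper name `transport_numbers` is illustrative.

```python
KB = 1.380649e-23          # Boltzmann constant, J/K (exact SI value)
AMU = 1.66053906660e-27    # atomic mass unit, kg
P_ATM = 101325.0           # 1 atm in Pa
T = 293.0                  # K, as in Table 8.1

# name: (eta [uPa s], kappa [mW/(m K)], cp/kB, molar mass [g/mol], tabulated Pr)
GASES = {
    "He": (19.5, 149.0, 2.50, 4.0026, 0.682),
    "Ar": (22.3, 17.4, 2.50, 39.948, 0.666),
    "Xe": (22.7, 5.46, 2.50, 131.293, 0.659),
    "N2": (17.6, 25.5, 3.53, 28.014, 0.721),
}

def transport_numbers(eta_uPa_s, kappa_mW, cp_over_kB, molar_mass):
    """Return (a, nu, Pr) in SI units for an ideal gas at T and 1 atm."""
    eta = eta_uPa_s * 1e-6
    kappa = kappa_mW * 1e-3
    m = molar_mass * AMU
    n = P_ATM / (KB * T)       # ideal-gas number density
    cp = cp_over_kB * KB       # heat capacity per particle
    a = kappa / (n * cp)       # thermal diffusivity, eqn. 8.128
    nu = eta / (n * m)         # kinematic viscosity, eta / rho
    return a, nu, nu / a       # nu / a = eta cp / (m kappa) = Pr

for name, (eta, kappa, cpk, M, pr_table) in GASES.items():
    a, nu, pr = transport_numbers(eta, kappa, cpk, M)
    print(f"{name}: a = {a:.3g} m^2/s, nu = {nu:.3g} m^2/s, Pr = {pr:.3f} (table {pr_table})")
```

Each computed Pr lands within about 0.003 of the tabulated value: close to $\tfrac{2}{3}$ for the monatomic gases, in contrast with ${\rm Pr} = 1$ from the relaxation time approximation.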

8.6 Diffusion and the Lorentz model

8.6.1 Failure of the relaxation time approximation

As we remarked above, the relaxation time approximation fails to conserve any of the collisional invariants. It is therefore unsuitable for describing hydrodynamic phenomena such as diffusion. To see this, let $f(\boldsymbol{r},\boldsymbol{v},t)$ be the distribution function, here written in terms of position, velocity, and time rather than position, momentum, and time as before.⁷ In the absence of external forces, the Boltzmann equation in the relaxation time approximation is
$$
\frac{\partial f}{\partial t} + \boldsymbol{v}\cdot\frac{\partial f}{\partial\boldsymbol{r}} = -\frac{f - f^0}{\tau} . \tag{8.129}
$$
The density of particles in velocity space is given by
$$
\tilde n(\boldsymbol{v},t) = \int\! d^3r\; f(\boldsymbol{r},\boldsymbol{v},t) . \tag{8.130}
$$

⁷ The difference is trivial, since $\boldsymbol{p} = m\boldsymbol{v}$.

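In equilibrium, this is the Maxwell distribution times the total number of particles: $\tilde n^0(\boldsymbol{v}) = N P_M(\boldsymbol{v})$. The number of particles as a function of time, $N(t) = \int\! d^3v\; \tilde n(\boldsymbol{v},t)$, should be a constant.
Integrating the Boltzmann equation over $\boldsymbol{r}$, one has
$$
\frac{\partial\tilde n}{\partial t} = -\frac{\tilde n - \tilde n^0}{\tau} . \tag{8.131}
$$
Thus, with $\delta\tilde n(\boldsymbol{v},t) = \tilde n(\boldsymbol{v},t) - \tilde n^0(\boldsymbol{v})$, we have
$$
\delta\tilde n(\boldsymbol{v},t) = \delta\tilde n(\boldsymbol{v},0)\, e^{-t/\tau} . \tag{8.132}
$$
Thus, $\delta\tilde n(\boldsymbol{v},t)$ decays exponentially to zero with time constant $\tau$, from which it follows that the total particle number exponentially relaxes to $N_0$ regardless of its initial value. This is physically incorrect; local density perturbations can't just vanish. Rather, they diffuse.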

8.6.2 Modified Boltzmann equation and its solution

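To remedy this unphysical aspect, consider the modified Boltzmann equation,
$$
\frac{\partial f}{\partial t} + \boldsymbol{v}\cdot\frac{\partial f}{\partial\boldsymbol{r}} = \frac{1}{\tau}\left[-f + \int\!\frac{d\hat{\boldsymbol{v}}}{4\pi}\, f\right] \equiv \frac{P - 1}{\tau}\, f , \tag{8.133}
$$
where $P$ is a projector onto a space of isotropic functions of $\boldsymbol{v}$: $PF = \int\!\frac{d\hat{\boldsymbol{v}}}{4\pi}\, F(\boldsymbol{v})$ for any function $F(\boldsymbol{v})$. Note that $PF$ is a function of the speed $v = |\boldsymbol{v}|$. For this modified equation one finds $\partial_t (P\tilde n) = 0$, since $P(P - 1) = 0$; the number of particles at each speed, and hence the total particle number, is conserved.
The model in eqn. 8.133 is known as the Lorentz model.⁸ To solve it, we consider the Laplace transform,
$$
\hat f(\boldsymbol{k},\boldsymbol{v},s) = \int_0^\infty\! dt\; e^{-st}\int\! d^3r\; e^{-i\boldsymbol{k}\cdot\boldsymbol{r}}\, f(\boldsymbol{r},\boldsymbol{v},t) . \tag{8.134}
$$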

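Taking the Laplace transform of eqn. 8.133, we find
$$
\left(s + i\boldsymbol{v}\cdot\boldsymbol{k} + \tau^{-1}\right)\hat f(\boldsymbol{k},\boldsymbol{v},s) = \tau^{-1} P\hat f(\boldsymbol{k},\boldsymbol{v},s) + f(\boldsymbol{k},\boldsymbol{v},t=0) . \tag{8.135}
$$
We now solve for $P\hat f(\boldsymbol{k},\boldsymbol{v},s)$:
$$
\hat f(\boldsymbol{k},\boldsymbol{v},s) = \frac{\tau^{-1}}{s + i\boldsymbol{v}\cdot\boldsymbol{k} + \tau^{-1}}\, P\hat f(\boldsymbol{k},\boldsymbol{v},s) + \frac{f(\boldsymbol{k},\boldsymbol{v},t=0)}{s + i\boldsymbol{v}\cdot\boldsymbol{k} + \tau^{-1}} , \tag{8.136}
$$
which entails
$$
P\hat f(\boldsymbol{k},\boldsymbol{v},s) = \left[\int\!\frac{d\hat{\boldsymbol{v}}}{4\pi}\,\frac{\tau^{-1}}{s + i\boldsymbol{v}\cdot\boldsymbol{k} + \tau^{-1}}\right] P\hat f(\boldsymbol{k},\boldsymbol{v},s) + \int\!\frac{d\hat{\boldsymbol{v}}}{4\pi}\,\frac{f(\boldsymbol{k},\boldsymbol{v},t=0)}{s + i\boldsymbol{v}\cdot\boldsymbol{k} + \tau^{-1}} . \tag{8.137}
$$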
⁸ See the excellent discussion in the book by Krapivsky, Redner, and Ben-Naim, cited in §8.1.

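Now we have
$$
\int\!\frac{d\hat{\boldsymbol{v}}}{4\pi}\,\frac{\tau^{-1}}{s + i\boldsymbol{v}\cdot\boldsymbol{k} + \tau^{-1}} = \frac{1}{2}\int_{-1}^{1}\! dx\;\frac{\tau^{-1}}{s + ivkx + \tau^{-1}} = \frac{1}{vk\tau}\,\tan^{-1}\!\left(\frac{vk\tau}{1 + \tau s}\right) , \tag{8.138}
$$
where $x = \hat{\boldsymbol{v}}\cdot\hat{\boldsymbol{k}}$. Thus,
$$
P\hat f(\boldsymbol{k},\boldsymbol{v},s) = \left[1 - \frac{1}{vk\tau}\,\tan^{-1}\!\left(\frac{vk\tau}{1 + \tau s}\right)\right]^{-1}\int\!\frac{d\hat{\boldsymbol{v}}}{4\pi}\,\frac{f(\boldsymbol{k},\boldsymbol{v},t=0)}{s + i\boldsymbol{v}\cdot\boldsymbol{k} + \tau^{-1}} . \tag{8.139}
$$
We now have the solution to Lorentz's modified Boltzmann equation:
$$
\hat f(\boldsymbol{k},\boldsymbol{v},s) = \frac{\tau^{-1}}{s + i\boldsymbol{v}\cdot\boldsymbol{k} + \tau^{-1}}\left[1 - \frac{1}{vk\tau}\,\tan^{-1}\!\left(\frac{vk\tau}{1 + \tau s}\right)\right]^{-1}\int\!\frac{d\hat{\boldsymbol{v}}}{4\pi}\,\frac{f(\boldsymbol{k},\boldsymbol{v},t=0)}{s + i\boldsymbol{v}\cdot\boldsymbol{k} + \tau^{-1}} + \frac{f(\boldsymbol{k},\boldsymbol{v},t=0)}{s + i\boldsymbol{v}\cdot\boldsymbol{k} + \tau^{-1}} . \tag{8.140}
$$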

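Let us assume an initial distribution which is perfectly localized in both $\boldsymbol{r}$ and $\boldsymbol{v}$:
$$
f(\boldsymbol{r},\boldsymbol{v},t=0) = \delta(\boldsymbol{r})\,\delta(\boldsymbol{v} - \boldsymbol{v}_0) , \tag{8.141}
$$
so that $f(\boldsymbol{k},\boldsymbol{v},t=0) = \delta(\boldsymbol{v} - \boldsymbol{v}_0)$. For these initial conditions, we find
$$
\int\!\frac{d\hat{\boldsymbol{v}}}{4\pi}\,\frac{f(\boldsymbol{k},\boldsymbol{v},t=0)}{s + i\boldsymbol{v}\cdot\boldsymbol{k} + \tau^{-1}} = \frac{1}{s + i\boldsymbol{v}_0\cdot\boldsymbol{k} + \tau^{-1}}\cdot\frac{\delta(v - v_0)}{4\pi v_0^2} . \tag{8.142}
$$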

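We further have that
$$
1 - \frac{1}{vk\tau}\,\tan^{-1}\!\left(\frac{vk\tau}{1 + \tau s}\right) = s\tau + \tfrac{1}{3}\,k^2 v^2\tau^2 + \ldots , \tag{8.143}
$$
and therefore
$$
\hat f(\boldsymbol{k},\boldsymbol{v},s) = \frac{\tau^{-1}}{s + i\boldsymbol{v}\cdot\boldsymbol{k} + \tau^{-1}}\cdot\frac{\tau^{-1}}{s + i\boldsymbol{v}_0\cdot\boldsymbol{k} + \tau^{-1}}\cdot\frac{1}{s + \tfrac{1}{3}v_0^2 k^2\tau + \ldots}\cdot\frac{\delta(v - v_0)}{4\pi v_0^2} + \frac{\delta(\boldsymbol{v} - \boldsymbol{v}_0)}{s + i\boldsymbol{v}_0\cdot\boldsymbol{k} + \tau^{-1}} . \tag{8.144}
$$
We are interested in the long time limit $t \gg \tau$ for $f(\boldsymbol{r},\boldsymbol{v},t)$. This is dominated by $s \sim t^{-1}$, and we assume that $\tau^{-1}$ is dominant over $s$ and $i\boldsymbol{v}\cdot\boldsymbol{k}$. We then have
$$
\hat f(\boldsymbol{k},\boldsymbol{v},s) \approx \frac{1}{s + \tfrac{1}{3}v_0^2 k^2\tau}\cdot\frac{\delta(v - v_0)}{4\pi v_0^2} . \tag{8.145}
$$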

Performing the inverse Laplace and Fourier transforms, we obtain
$$
f(\boldsymbol{r},\boldsymbol{v},t) = (4\pi Dt)^{-3/2}\, e^{-\boldsymbol{r}^2/4Dt}\cdot\frac{\delta(v - v_0)}{4\pi v_0^2} , \tag{8.146}
$$

where the diffusion constant is
$$
D = \tfrac{1}{3}\,v_0^2\tau . \tag{8.147}
$$
The units are $[D] = L^2/T$. Integrating over velocities, we have the density
$$
n(\boldsymbol{r},t) = \int\! d^3v\; f(\boldsymbol{r},\boldsymbol{v},t) = (4\pi Dt)^{-3/2}\, e^{-\boldsymbol{r}^2/4Dt} . \tag{8.148}
$$
Note that
$$
\int\! d^3r\; n(\boldsymbol{r},t) = 1 \tag{8.149}
$$
for all time. Total particle number is conserved!
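The diffusive long-time behaviour of the Lorentz model can also be seen in a direct particle simulation. The sketch below rests on our own stochastic reading of eqn. 8.133: each particle moves at fixed speed $v_0$, and at Poisson rate $1/\tau$ its direction is redrawn isotropically, which is the action of $P$. Under this reading the velocity autocorrelation is $v_0^2 e^{-|t-t'|/\tau}$. The mean square displacement should therefore be $\langle\boldsymbol{r}^2\rangle = 2v_0^2\tau\,[\,t - \tau(1 - e^{-t/\tau})\,]$, which approaches $6Dt$ with $D = \tfrac{1}{3}v_0^2\tau$ for $t \gg \tau$. Parameter values, particle number, and seed are illustrative.

```python
import math
import random

def random_direction(rng):
    """Uniform unit vector: cos(theta) uniform on [-1, 1], azimuth uniform."""
    c = rng.uniform(-1.0, 1.0)
    s = math.sqrt(1.0 - c * c)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return (s * math.cos(phi), s * math.sin(phi), c)

def lorentz_msd(v0, tau, t_final, n_particles=2000, seed=4):
    """Mean square displacement at t_final for constant-speed isotropic scattering."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_particles):
        pos = [0.0, 0.0, 0.0]
        t = 0.0
        u = random_direction(rng)
        while True:
            flight = rng.expovariate(1.0 / tau)       # free-flight time, mean tau
            last = t + flight >= t_final
            if last:
                flight = t_final - t                  # truncate the final flight
            for i in range(3):
                pos[i] += v0 * u[i] * flight
            if last:
                break
            t += flight
            u = random_direction(rng)                 # isotropic scattering event
        total += sum(x * x for x in pos)
    return total / n_particles

def msd_exact(v0, tau, t):
    """<r^2(t)> implied by the velocity autocorrelation v0^2 exp(-|t - t'|/tau)."""
    return 2.0 * v0**2 * tau * (t - tau * (1.0 - math.exp(-t / tau)))

v0, tau, t_final = 1.0, 1.0, 50.0
D = v0**2 * tau / 3.0                                 # eqn. 8.147
msd = lorentz_msd(v0, tau, t_final)
print(msd, msd_exact(v0, tau, t_final), 6.0 * D * t_final)
```

For $t \ll \tau$ the same formula gives ballistic motion, $\langle\boldsymbol{r}^2\rangle \approx v_0^2 t^2$; the crossover to diffusion happens after a few collision times.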

8.7 Linearized Boltzmann Equation

8.7.1 Linearizing the collision integral

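We now return to the classical Boltzmann equation and consider a more formal treatment of the collision term in the linear approximation. We will assume time-reversal symmetry, in which case
$$
\left(\frac{\partial f}{\partial t}\right)_{\rm coll} = \int\! d^3p_1\int\! d^3p'\int\! d^3p'_1\; w(\boldsymbol{p}',\boldsymbol{p}'_1\,|\,\boldsymbol{p},\boldsymbol{p}_1)\Big\{f(\boldsymbol{p}')\, f(\boldsymbol{p}'_1) - f(\boldsymbol{p})\, f(\boldsymbol{p}_1)\Big\} . \tag{8.150}
$$
The collision integral is nonlinear in the distribution $f$. We linearize by writing
$$
f(\boldsymbol{p}) = f^0(\boldsymbol{p}) + f^0(\boldsymbol{p})\,\psi(\boldsymbol{p}) , \tag{8.151}
$$
where we assume $\psi(\boldsymbol{p})$ is small. We then have, to first order in $\psi$,
$$
\left(\frac{\partial f}{\partial t}\right)_{\rm coll} = f^0(\boldsymbol{p})\,\hat L\psi + \mathcal{O}(\psi^2) , \tag{8.152}
$$
where the action of the linearized collision operator is given by
$$
\begin{aligned}
\hat L\psi &= \int\! d^3p_1\int\! d^3p'\int\! d^3p'_1\; w(\boldsymbol{p}',\boldsymbol{p}'_1\,|\,\boldsymbol{p},\boldsymbol{p}_1)\, f^0(\boldsymbol{p}_1)\Big\{\psi(\boldsymbol{p}') + \psi(\boldsymbol{p}'_1) - \psi(\boldsymbol{p}) - \psi(\boldsymbol{p}_1)\Big\} \\
&= \int\! d^3p_1\int\! d\Omega\; |\boldsymbol{v} - \boldsymbol{v}_1|\,\frac{\partial\sigma}{\partial\Omega}\, f^0(\boldsymbol{p}_1)\Big\{\psi(\boldsymbol{p}') + \psi(\boldsymbol{p}'_1) - \psi(\boldsymbol{p}) - \psi(\boldsymbol{p}_1)\Big\} ,
\end{aligned} \tag{8.153}
$$
where we have invoked eqn. 8.55 to write the RHS in terms of the differential scattering cross section. In deriving the above result, we have made use of the detailed balance relation,
$$
f^0(\boldsymbol{p})\, f^0(\boldsymbol{p}_1) = f^0(\boldsymbol{p}')\, f^0(\boldsymbol{p}'_1) . \tag{8.154}
$$
We have also suppressed the $\boldsymbol{r}$ dependence in writing $f(\boldsymbol{p})$, $f^0(\boldsymbol{p})$, and $\psi(\boldsymbol{p})$.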
From eqn. 8.85, we then have the linearized equation
$$
\left(\hat L - \frac{\partial}{\partial t}\right)\psi = Y , \tag{8.155}
$$

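where, for point particles,
$$
Y = \frac{1}{k_B T}\left\{\frac{\varepsilon(\boldsymbol{p}) - c_p T}{T}\,\boldsymbol{v}\cdot\boldsymbol{\nabla} T + m\,v_\alpha v_\beta\, Q_{\alpha\beta} - \frac{k_B\,\varepsilon(\boldsymbol{p})}{c_V}\,\boldsymbol{\nabla}\cdot\boldsymbol{V} - \boldsymbol{F}\cdot\boldsymbol{v}\right\} . \tag{8.156}
$$
Eqn. 8.155 is an inhomogeneous linear equation, which can be solved by inverting the operator $\hat L - \partial/\partial t$.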

8.7.2 Linear algebraic properties of L̂

Although $\hat L$ is an integral operator, it shares many properties with other linear operators with which you are familiar, such as matrices and differential operators. We can define an inner product,⁹
$$
\langle\psi_1|\psi_2\rangle \equiv \int\! d^3p\; f^0(\boldsymbol{p})\,\psi_1(\boldsymbol{p})\,\psi_2(\boldsymbol{p}) . \tag{8.157}
$$
Note that this is not the usual Hilbert space inner product from quantum mechanics, since the factor $f^0(\boldsymbol{p})$ is included in the metric. This is necessary in order that $\hat L$ be self-adjoint:
$$
\langle\psi_1|\hat L\psi_2\rangle = \langle\hat L\psi_1|\psi_2\rangle . \tag{8.158}
$$
We can now define the spectrum of normalized eigenfunctions of $\hat L$, which we write as $\phi_n(\boldsymbol{p})$. The eigenfunctions satisfy the eigenvalue equation,
$$
\hat L\phi_n = -\lambda_n\phi_n , \tag{8.159}
$$
and may be chosen to be orthonormal,
$$
\langle\phi_m|\phi_n\rangle = \delta_{mn} . \tag{8.160}
$$
Of course, in order to obtain the eigenfunctions $\phi_n$ we must have detailed knowledge of the function $w(\boldsymbol{p}',\boldsymbol{p}'_1\,|\,\boldsymbol{p},\boldsymbol{p}_1)$.
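Recall that there are five collisional invariants, which are the particle number, the three components of the total particle momentum, and the particle energy. To each collisional invariant, there is an associated eigenfunction $\phi_n$ with eigenvalue $\lambda_n = 0$. One can check that these normalized eigenfunctions are
$$
\phi_n(\boldsymbol{p}) = \frac{1}{\sqrt{n}} \tag{8.161}
$$
$$
\phi_{p_\alpha}(\boldsymbol{p}) = \frac{p_\alpha}{\sqrt{nmk_B T}} \tag{8.162}
$$
$$
\phi_\varepsilon(\boldsymbol{p}) = \sqrt{\frac{2}{3n}}\left(\frac{\varepsilon(\boldsymbol{p})}{k_B T} - \frac{3}{2}\right) . \tag{8.163}
$$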
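Checking that the energy-dependent zero modes of eqns. 8.161 and 8.163 are normalized and mutually orthogonal under the inner product 8.157 reduces to the moments of eqn. 8.103. The sketch below represents functions of $x = \varepsilon/k_B T$ as polynomial coefficient lists; the density value and the helper names are arbitrary illustrative choices. The momentum modes of eqn. 8.162 are normalized by the Gaussian moment $\langle p_\alpha^2\rangle = mk_B T$.

```python
import math

def x_moment(k):
    """<x^k> with x = eps / (k_B T), from eqn. 8.103."""
    return 2.0 / math.sqrt(math.pi) * math.gamma(k + 1.5)

def poly_mul(a, b):
    """Product of two polynomials in x given as coefficient lists [c0, c1, ...]."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def inner(a, b, n):
    """<a|b> = integral d^3p f0 a b = n <a b>  (eqn. 8.157, using integral d^3p f0 = n)."""
    return n * sum(c * x_moment(k) for k, c in enumerate(poly_mul(a, b)))

n = 2.7                                   # arbitrary density; results must not depend on it
phi_n = [1.0 / math.sqrt(n)]              # eqn. 8.161
c = math.sqrt(2.0 / (3.0 * n))
phi_eps = [-1.5 * c, c]                   # eqn. 8.163: c * (x - 3/2)
print(inner(phi_n, phi_n, n), inner(phi_eps, phi_eps, n), inner(phi_n, phi_eps, n))
```

The normalization of $\phi_\varepsilon$ works because $\langle x^2\rangle - \langle x\rangle^2 = \tfrac{15}{4} - \tfrac{9}{4} = \tfrac{3}{2}$, exactly cancelled by the prefactor $2/3n$ times $n$.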
⁹ The requirements of an inner product $\langle f|g\rangle$ are symmetry, linearity, and non-negative definiteness.

If there are no temperature, chemical potential, or bulk velocity gradients, and there are no external forces, then $Y = 0$ and the only changes to the distribution are from collisions. The linearized Boltzmann equation becomes
$$
\frac{\partial\psi}{\partial t} = \hat L\psi . \tag{8.164}
$$
We can therefore write the most general solution in the form
$$
\psi(\boldsymbol{p},t) = {\sum_n}'\, C_n\,\phi_n(\boldsymbol{p})\, e^{-\lambda_n t} , \tag{8.165}
$$

where the prime on the sum reminds us that collisional invariants are to be excluded. All the eigenvalues
λn , aside from the five zero eigenvalues for the collisional invariants, must be positive. Any negative
eigenvalue would cause ψ(p, t) to increase without bound, and an initial nonequilibrium distribution
would not relax to the equilibrium f 0 (p), which we regard as unphysical. Henceforth we will drop the
prime on the sum but remember that Cn = 0 for the five collisional invariants.
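Recall also the particle, energy, and thermal (heat) currents,
$$
\begin{aligned}
\boldsymbol{j} &= \int\! d^3p\; \boldsymbol{v}\, f(\boldsymbol{p}) = \int\! d^3p\; f^0(\boldsymbol{p})\,\boldsymbol{v}\,\psi(\boldsymbol{p}) = \langle\boldsymbol{v}|\psi\rangle \\
\boldsymbol{j}_\varepsilon &= \int\! d^3p\; \boldsymbol{v}\,\varepsilon\, f(\boldsymbol{p}) = \int\! d^3p\; f^0(\boldsymbol{p})\,\boldsymbol{v}\,\varepsilon\,\psi(\boldsymbol{p}) = \langle\boldsymbol{v}\,\varepsilon|\psi\rangle \\
\boldsymbol{j}_q &= \int\! d^3p\; \boldsymbol{v}\,(\varepsilon - \mu)\, f(\boldsymbol{p}) = \int\! d^3p\; f^0(\boldsymbol{p})\,\boldsymbol{v}\,(\varepsilon - \mu)\,\psi(\boldsymbol{p}) = \langle\boldsymbol{v}\,(\varepsilon - \mu)|\psi\rangle .
\end{aligned} \tag{8.166}
$$
Note $\boldsymbol{j}_q = \boldsymbol{j}_\varepsilon - \mu\,\boldsymbol{j}$.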

8.7.3 Steady state solution to the linearized Boltzmann equation

Under steady state conditions, there is no time dependence, and the linearized Boltzmann equation takes the form
$$
\hat L\psi = Y . \tag{8.167}
$$
We may expand $\psi$ in the eigenfunctions $\phi_n$ and write $\psi = \sum_n C_n\phi_n$. Applying $\hat L$ and taking the inner product with $\phi_j$, we have
$$
C_j = -\frac{1}{\lambda_j}\,\langle\phi_j|Y\rangle . \tag{8.168}
$$
Thus, the formal solution to the linearized Boltzmann equation is
$$
\psi(\boldsymbol{p}) = -\sum_n \frac{1}{\lambda_n}\,\langle\phi_n|Y\rangle\,\phi_n(\boldsymbol{p}) . \tag{8.169}
$$
This solution is applicable provided $|Y\rangle$ is orthogonal to the five collisional invariants.
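Eqns. 8.168 and 8.169 are ordinary eigenvector expansions, so their logic can be illustrated on a finite-dimensional toy. The sketch below is entirely hypothetical: a 3×3 symmetric matrix stands in for $\hat L$, and the plain dot product stands in for the $f^0$-weighted inner product 8.157. One eigenvector has eigenvalue zero and plays the role of a collisional invariant. The source term $Y$ is chosen orthogonal to it, as the solvability condition requires.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Orthonormal eigenvectors: phi0 is the zero mode (a stand-in for a collisional invariant).
s2, s3, s6 = math.sqrt(2.0), math.sqrt(3.0), math.sqrt(6.0)
phi0 = [1 / s3, 1 / s3, 1 / s3]
modes = [[1 / s2, -1 / s2, 0.0], [1 / s6, 1 / s6, -2 / s6]]
lams = [2.0, 5.0]                          # decay rates lambda_n > 0

# L = -sum_n lambda_n |phi_n><phi_n|, so L phi_n = -lambda_n phi_n and L phi0 = 0.
L = [[-sum(lam * p[i] * p[j] for lam, p in zip(lams, modes)) for j in range(3)]
     for i in range(3)]

Y = [1.0, -3.0, 2.0]                       # components sum to zero, so <phi0|Y> = 0
psi = [0.0, 0.0, 0.0]
for lam, p in zip(lams, modes):
    coeff = -dot(p, Y) / lam               # eqn. 8.168
    psi = [x + coeff * y for x, y in zip(psi, p)]

L_psi = [dot(row, psi) for row in L]
print(L_psi)                               # reproduces Y up to rounding
```

If $Y$ had a component along the zero mode, no choice of coefficients could reproduce it, which is the finite-dimensional version of the orthogonality requirement stated above.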



Thermal conductivity

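For the thermal conductivity, we take $\boldsymbol{\nabla} T = \partial_x T\,\hat{\boldsymbol{x}}$, and
$$
Y = \frac{1}{k_B T^2}\,\frac{\partial T}{\partial x}\cdot X_\kappa , \tag{8.170}
$$
where $X_\kappa \equiv (\varepsilon - c_p T)\, v_x$. Under the conditions of no particle flow ($\boldsymbol{j} = 0$), we have $\boldsymbol{j}_q = -\kappa\,\partial_x T\,\hat{\boldsymbol{x}}$. Then we have
$$
\langle X_\kappa|\psi\rangle = -\kappa\,\frac{\partial T}{\partial x} . \tag{8.171}
$$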

Viscosity

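For the viscosity, we take
$$
Y = \frac{m}{k_B T}\,\frac{\partial V_x}{\partial y}\cdot X_\eta , \tag{8.172}
$$
with $X_\eta = v_x v_y$. We then have
$$
\Pi_{xy} = \langle m\,v_x v_y|\psi\rangle = -\eta\,\frac{\partial V_x}{\partial y} . \tag{8.173}
$$
Thus,
$$
\langle X_\eta|\psi\rangle = -\frac{\eta}{m}\,\frac{\partial V_x}{\partial y} . \tag{8.174}
$$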

8.7.4 Variational approach

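Following the treatment in chapter 1 of Smith and Jensen, define $\hat H \equiv -\hat L$. We have that $\hat H$ is a positive semidefinite operator, whose only zero eigenvalues correspond to the collisional invariants. We then have the Schwarz inequality,
$$
\langle\psi|\hat H|\psi\rangle\cdot\langle\phi|\hat H|\phi\rangle \ge \langle\phi|\hat H|\psi\rangle^2 , \tag{8.175}
$$
for any two Hilbert space vectors $|\psi\rangle$ and $|\phi\rangle$. Consider now the above calculation of the thermal conductivity. We have
$$
\hat H\psi = -\frac{1}{k_B T^2}\,\frac{\partial T}{\partial x}\, X_\kappa \tag{8.176}
$$
and therefore
$$
\kappa = \frac{k_B T^2}{(\partial T/\partial x)^2}\,\langle\psi|\hat H|\psi\rangle \ge \frac{1}{k_B T^2}\,\frac{\langle\phi|X_\kappa\rangle^2}{\langle\phi|\hat H|\phi\rangle} . \tag{8.177}
$$
Similarly, for the viscosity, we have
$$
\hat H\psi = -\frac{m}{k_B T}\,\frac{\partial V_x}{\partial y}\, X_\eta , \tag{8.178}
$$
from which we derive
$$
\eta = \frac{k_B T}{(\partial V_x/\partial y)^2}\,\langle\psi|\hat H|\psi\rangle \ge \frac{m^2}{k_B T}\,\frac{\langle\phi|X_\eta\rangle^2}{\langle\phi|\hat H|\phi\rangle} . \tag{8.179}
$$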
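The content of eqns. 8.175 to 8.179 is a Rayleigh-Ritz type lower bound. If $\hat H\psi = X$ with $\hat H$ positive definite, then $\langle\psi|X\rangle = \langle\psi|\hat H|\psi\rangle \ge \langle\phi|X\rangle^2/\langle\phi|\hat H|\phi\rangle$ for every trial $\phi$, with equality at $\phi = \psi$. The sketch below checks this on a hypothetical 3×3 symmetric positive-definite matrix with the ordinary dot product. The prefactors and signs in eqns. 8.176 and 8.178 only rescale both sides, and all numbers are illustrative.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(A, x):
    return [dot(row, x) for row in A]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

H = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]   # symmetric, positive definite
X = [1.0, 2.0, 0.5]                                        # plays the role of X_kappa or X_eta

psi = solve(H, X)
exact = dot(psi, X)                   # = <psi|H|psi>, the "transport coefficient"

def bound(phi):
    """Variational lower bound <phi|X>^2 / <phi|H|phi>."""
    return dot(phi, X) ** 2 / dot(phi, matvec(H, phi))

print(exact, bound(X), bound(psi))    # bound(X) <= exact; bound(psi) equals exact
rng = random.Random(5)
trials = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(1000)]
print(all(bound(t) <= exact + 1e-12 for t in trials))
```

Taking $\phi = X$, as the text does next with $\phi = X_\eta$, gives a bound that is usually close to the exact value whenever $X$ lies near an eigenvector of $\hat H$.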

In order to get a good lower bound, we want $\phi$ in each case to have a good overlap with $X_{\kappa,\eta}$. One approach then is to take $\phi = X_{\kappa,\eta}$, which guarantees that the overlap will be finite (and not zero due to symmetry, for example). We illustrate this method with the viscosity calculation. We have
$$
\eta \ge \frac{m^2}{k_B T}\,\frac{\langle v_x v_y|v_x v_y\rangle^2}{\langle v_x v_y|\hat H|v_x v_y\rangle} . \tag{8.180}
$$

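Now the operator $\hat H = -\hat L$ acts as
$$
\langle\phi|\hat H|\psi\rangle = \int\! d^3p\; f^0(\boldsymbol{p})\,\phi(\boldsymbol{p})\int\! d^3p_1\int\! d\Omega\;\frac{\partial\sigma}{\partial\Omega}\,|\boldsymbol{v} - \boldsymbol{v}_1|\, f^0(\boldsymbol{p}_1)\Big\{\psi(\boldsymbol{p}) + \psi(\boldsymbol{p}_1) - \psi(\boldsymbol{p}') - \psi(\boldsymbol{p}'_1)\Big\} . \tag{8.181}
$$
Here the kinematics of the collision guarantee total energy and momentum conservation, so $\boldsymbol{p}'$ and $\boldsymbol{p}'_1$ are determined as in eqn. 8.56.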
Now we have
$$
d\Omega = \sin\chi\, d\chi\, d\varphi , \tag{8.182}
$$
where $\chi$ is the scattering angle depicted in Fig. 8.6 and $\varphi$ is the azimuthal angle of the scattering. The differential scattering cross section is obtained by elementary mechanics and is known to be
$$
\frac{\partial\sigma}{\partial\Omega} = \left|\frac{d(b^2/2)}{d\cos\chi}\right| , \tag{8.183}
$$
where $b$ is the impact parameter. The scattering angle is
$$
\chi(b,u) = \pi - 2\int_{r_p}^\infty\! dr\;\frac{b}{\sqrt{r^4 - b^2 r^2 - \dfrac{2U(r)\,r^4}{\tilde m u^2}}} , \tag{8.184}
$$
where $\tilde m = \tfrac{1}{2}m$ is the reduced mass, and $r_p$ is the relative coordinate separation at periapsis, i.e. the distance of closest approach, which occurs when $\dot r = 0$, i.e.
$$
\tfrac{1}{2}\tilde m u^2 = \frac{\ell^2}{2\tilde m r_p^2} + U(r_p) , \tag{8.185}
$$
where $\ell = \tilde m u b$ is the relative coordinate angular momentum.


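We work in center-of-mass coordinates, so the velocities are
$$
\boldsymbol{v} = \boldsymbol{V} + \tfrac{1}{2}\boldsymbol{u} \qquad \boldsymbol{v}' = \boldsymbol{V} + \tfrac{1}{2}\boldsymbol{u}' \tag{8.186}
$$
$$
\boldsymbol{v}_1 = \boldsymbol{V} - \tfrac{1}{2}\boldsymbol{u} \qquad \boldsymbol{v}'_1 = \boldsymbol{V} - \tfrac{1}{2}\boldsymbol{u}' , \tag{8.187}
$$
with $|\boldsymbol{u}| = |\boldsymbol{u}'|$ and $\hat{\boldsymbol{u}}\cdot\hat{\boldsymbol{u}}' = \cos\chi$. Then if $\psi(\boldsymbol{p}) = v_x v_y$, we have
$$
\Delta(\psi) \equiv \psi(\boldsymbol{p}) + \psi(\boldsymbol{p}_1) - \psi(\boldsymbol{p}') - \psi(\boldsymbol{p}'_1) = \tfrac{1}{2}\left(u_x u_y - u'_x u'_y\right) . \tag{8.188}
$$
We may write
$$
\boldsymbol{u}' = u\left(\sin\chi\cos\varphi\,\hat{\boldsymbol{e}}_1 + \sin\chi\sin\varphi\,\hat{\boldsymbol{e}}_2 + \cos\chi\,\hat{\boldsymbol{e}}_3\right) , \tag{8.189}
$$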

Figure 8.6: Scattering in the CM frame. O is the force center and P is the point of periapsis. The impact
parameter is b, and χ is the scattering angle. φ0 is the angle through which the relative coordinate moves
between periapsis and infinity.

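where $\hat{\boldsymbol{e}}_3 = \hat{\boldsymbol{u}}$. With this parameterization, we have
$$
\int_0^{2\pi}\! d\varphi\;\tfrac{1}{2}\left(u_\alpha u_\beta - u'_\alpha u'_\beta\right) = -\frac{\pi}{2}\,\sin^2\!\chi\left(u^2\delta_{\alpha\beta} - 3u_\alpha u_\beta\right) . \tag{8.190}
$$
Note that we have used here the relation
$$
e_{1\alpha}e_{1\beta} + e_{2\alpha}e_{2\beta} + e_{3\alpha}e_{3\beta} = \delta_{\alpha\beta} , \tag{8.191}
$$
which holds since the LHS is the projector $\sum_{i=1}^3 |\hat{\boldsymbol{e}}_i\rangle\langle\hat{\boldsymbol{e}}_i|$ onto all of three-dimensional space, i.e. the identity.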
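Eqn. 8.190 can be confirmed by carrying out the $\varphi$ integral numerically. The sketch below builds $\hat{\boldsymbol{e}}_1, \hat{\boldsymbol{e}}_2$ perpendicular to $\hat{\boldsymbol{e}}_3 = \hat{\boldsymbol{u}}$ and parameterizes $\boldsymbol{u}'$ as in eqn. 8.189. It then compares a midpoint-rule integral with the closed form. The midpoint rule is exact, up to rounding, for the low-degree trigonometric polynomials involved. The test vector $\boldsymbol{u}$ and the angle $\chi$ are arbitrary.

```python
import math

def orthonormal_frame(u_vec):
    """Return (e1, e2, e3) with e3 = u/|u| and e1, e2 spanning the plane normal to it."""
    u = math.sqrt(sum(c * c for c in u_vec))
    e3 = [c / u for c in u_vec]
    helper = [1.0, 0.0, 0.0] if abs(e3[0]) < 0.9 else [0.0, 1.0, 0.0]
    proj = sum(h * e for h, e in zip(helper, e3))
    e1 = [h - proj * e for h, e in zip(helper, e3)]          # Gram-Schmidt step
    norm1 = math.sqrt(sum(c * c for c in e1))
    e1 = [c / norm1 for c in e1]
    e2 = [e3[1] * e1[2] - e3[2] * e1[1],                      # e2 = e3 x e1
          e3[2] * e1[0] - e3[0] * e1[2],
          e3[0] * e1[1] - e3[1] * e1[0]]
    return e1, e2, e3

def phi_integral(u_vec, chi, n_phi=720):
    """Midpoint-rule value of int_0^{2 pi} dphi (1/2)(u_a u_b - u'_a u'_b), u' from eqn. 8.189."""
    u = math.sqrt(sum(c * c for c in u_vec))
    e1, e2, e3 = orthonormal_frame(u_vec)
    dphi = 2.0 * math.pi / n_phi
    out = [[0.0] * 3 for _ in range(3)]
    for k in range(n_phi):
        phi = (k + 0.5) * dphi
        up = [u * (math.sin(chi) * math.cos(phi) * a + math.sin(chi) * math.sin(phi) * b
                   + math.cos(chi) * c) for a, b, c in zip(e1, e2, e3)]
        for i in range(3):
            for j in range(3):
                out[i][j] += 0.5 * (u_vec[i] * u_vec[j] - up[i] * up[j]) * dphi
    return out

def closed_form(u_vec, chi):
    """Right-hand side of eqn. 8.190."""
    u2 = sum(c * c for c in u_vec)
    return [[-0.5 * math.pi * math.sin(chi) ** 2
             * (u2 * (1.0 if i == j else 0.0) - 3.0 * u_vec[i] * u_vec[j])
             for j in range(3)] for i in range(3)]

u_vec, chi = [0.3, -1.2, 0.7], 1.1
num, exact = phi_integral(u_vec, chi), closed_form(u_vec, chi)
print(max(abs(num[i][j] - exact[i][j]) for i in range(3) for j in range(3)))  # rounding level
```

For $\alpha \ne \beta$ the $\delta_{\alpha\beta}$ term drops out and the result is $\tfrac{3\pi}{2}\sin^2\chi\, u_\alpha u_\beta$, which is the factor that appears once $\sin^2\chi$ is absorbed into $R(u)$.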
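It is convenient to define the following integral:
$$
R(u) \equiv \int_0^\infty\! db\; b\,\sin^2\!\chi(b,u) . \tag{8.192}
$$
Since the Jacobian
$$
\left|\det\frac{\partial(\boldsymbol{v},\boldsymbol{v}_1)}{\partial(\boldsymbol{V},\boldsymbol{u})}\right| = 1 , \tag{8.193}
$$
we have
$$
\langle v_x v_y|\hat H|v_x v_y\rangle = n^2\left(\frac{m}{2\pi k_B T}\right)^{3}\int\! d^3V\int\! d^3u\; e^{-mV^2/k_B T}\, e^{-mu^2/4k_B T}\cdot u\cdot\tfrac{3\pi}{2}\,u_x u_y\cdot R(u)\cdot v_x v_y . \tag{8.194}
$$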
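This yields
$$
\langle v_x v_y|\hat H|v_x v_y\rangle = \frac{\pi}{40}\, n^2\,\langle u^5 R(u)\rangle , \tag{8.195}
$$
where
$$
\langle F(u)\rangle \equiv \int_0^\infty\! du\; u^2\, e^{-mu^2/4k_B T}\, F(u)\bigg/\int_0^\infty\! du\; u^2\, e^{-mu^2/4k_B T} . \tag{8.196}
$$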

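It is easy to compute the term in the numerator of eqn. 8.180:
$$
\langle v_x v_y|v_x v_y\rangle = n\left(\frac{m}{2\pi k_B T}\right)^{3/2}\int\! d^3v\; e^{-mv^2/2k_B T}\, v_x^2 v_y^2 = n\left(\frac{k_B T}{m}\right)^2 . \tag{8.197}
$$
Putting it all together, we find
$$
\eta \ge \frac{40}{\pi}\,\frac{(k_B T)^3}{m^2\,\langle u^5 R(u)\rangle} . \tag{8.198}
$$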

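The computation for $\kappa$ is a bit more tedious. One has $\psi(\boldsymbol{p}) = (\varepsilon - c_p T)\, v_x$, in which case
$$
\Delta(\psi) = \tfrac{1}{2}m\left[(\boldsymbol{V}\cdot\boldsymbol{u})\,u_x - (\boldsymbol{V}\cdot\boldsymbol{u}')\,u'_x\right] . \tag{8.199}
$$
Ultimately, one obtains the lower bound
$$
\kappa \ge \frac{150\,k_B}{\pi}\,\frac{(k_B T)^3}{m^3\,\langle u^5 R(u)\rangle} . \tag{8.200}
$$
Thus, independent of the potential, this variational calculation yields a Prandtl number of
$$
{\rm Pr} = \frac{\nu}{a} = \frac{\eta\, c_p}{m\kappa} = \frac{2}{3} , \tag{8.201}
$$
which is very close to what is observed in dilute monatomic gases (see Tab. 8.1).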
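While the variational expressions for $\eta$ and $\kappa$ are complicated functions of the potential, for hard sphere scattering the calculation is simple, because $b = d\sin\phi_0 = d\cos(\tfrac{1}{2}\chi)$, where $d$ is the hard sphere diameter. Thus, the relation between the impact parameter $b$ and the scattering angle $\chi$ is independent of the relative speed $u$, and one finds $R(u) = \tfrac{1}{3}d^2$. Then
$$
\langle u^5 R(u)\rangle = \tfrac{1}{3}\,d^2\,\langle u^5\rangle = \frac{128}{\sqrt{\pi}}\left(\frac{k_B T}{m}\right)^{5/2} d^2 \tag{8.202}
$$
and one finds
$$
\eta \ge \frac{5}{16\sqrt{\pi}}\,\frac{(mk_B T)^{1/2}}{d^2} , \qquad \kappa \ge \frac{75}{64\sqrt{\pi}}\,\frac{k_B}{d^2}\left(\frac{k_B T}{m}\right)^{1/2} . \tag{8.203}
$$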
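For hard spheres the ingredients of eqns. 8.202 and 8.203 can all be computed by quadrature. The sketch below works in reduced units $m = k_B T = k_B = d = 1$. It evaluates $R(u)$ from eqn. 8.192, using $\chi = 2\cos^{-1}(b/d)$ for $b < d$ and $\chi = 0$ otherwise. It then computes $\langle u^5\rangle$ with the weight of eqn. 8.196 and finally the bounds 8.198 and 8.200. Simpson's rule and the cutoff $u_{\max} = 30$ are our numerical choices.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

d = 1.0
# Eqn. 8.192 for hard spheres: b = d cos(chi/2), so chi = 2 arccos(b/d) for b <= d.
R = simpson(lambda b: b * math.sin(2.0 * math.acos(b / d)) ** 2, 0.0, d)

def weight(u):
    """Weight u^2 exp(-m u^2 / 4 k_B T) of eqn. 8.196, with m = k_B T = 1."""
    return u * u * math.exp(-u * u / 4.0)

u5 = simpson(lambda u: u ** 5 * weight(u), 0.0, 30.0, 6000) / simpson(weight, 0.0, 30.0, 6000)

u5R = u5 * R                             # R is independent of u for hard spheres
eta_bound = 40.0 / math.pi / u5R         # eqn. 8.198
kappa_bound = 150.0 / math.pi / u5R      # eqn. 8.200
prandtl = eta_bound * 2.5 / kappa_bound  # eta cp / (m kappa) with cp = 5/2
print(R, d * d / 3.0)
print(u5, 384.0 / math.sqrt(math.pi))
print(eta_bound, 5.0 / (16.0 * math.sqrt(math.pi)))
print(kappa_bound, 75.0 / (64.0 * math.sqrt(math.pi)))
print(prandtl)                           # 2/3
```

Analytically, $b\sin^2\chi = 4b^3(1 - b^2)/d^2$ for $b < d$ (in units $d = 1$, $4b^3 - 4b^5$), whose integral over $[0, d]$ is $d^2/3$. Note that $R(u)$ carries dimensions of area.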

8.8 The Equations of Hydrodynamics

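We now derive the equations governing fluid flow. The equations of mass and momentum balance are
$$\frac{\partial\rho}{\partial t} + \nabla\!\cdot(\rho\,{\bf V}) = 0\tag{8.204}$$
$$\frac{\partial(\rho\,V_\alpha)}{\partial t} + \frac{\partial\Pi_{\alpha\beta}}{\partial x_\beta} = 0\ ,\tag{8.205}$$
where
$$\Pi_{\alpha\beta} = \rho\,V_\alpha V_\beta + p\,\delta_{\alpha\beta} - \overbrace{\bigg[\eta\,\Big(\frac{\partial V_\alpha}{\partial x_\beta} + \frac{\partial V_\beta}{\partial x_\alpha} - \frac23\,\nabla\!\cdot{\bf V}\,\delta_{\alpha\beta}\Big) + \zeta\,\nabla\!\cdot{\bf V}\,\delta_{\alpha\beta}\bigg]}^{\tilde\sigma_{\alpha\beta}}\ .\tag{8.206}$$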
8.9. NONEQUILIBRIUM QUANTUM TRANSPORT 473

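Substituting the continuity equation into the momentum balance equation (and taking η and ζ to be spatially uniform), one arrives at
$$\rho\,\frac{\partial{\bf V}}{\partial t} + \rho\,({\bf V}\cdot\nabla){\bf V} = -\nabla p + \eta\,\nabla^2{\bf V} + \big(\zeta + \tfrac13\eta\big)\,\nabla(\nabla\!\cdot{\bf V})\ ,\tag{8.207}$$
which, together with continuity, are known as the Navier-Stokes equations. These equations are supplemented by an equation describing the conservation of energy,
$$T\,\frac{\partial s}{\partial t} + T\,\nabla\!\cdot(s\,{\bf V}) = \tilde\sigma_{\alpha\beta}\,\frac{\partial V_\alpha}{\partial x_\beta} + \nabla\!\cdot(\kappa\,\nabla T)\ .\tag{8.208}$$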

Note that the LHS of eqn. 8.207 is ρ DV /Dt, where D/Dt is the convective derivative. Multiplying by
a differential volume, this gives the mass times the acceleration of a differential local fluid element. The
RHS, multiplied by the same differential volume, gives the differential force on this fluid element in a
frame instantaneously moving with constant velocity V . Thus, this is Newton’s Second Law for the fluid.
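The passage from eqns. 8.204 to 8.206 to eqn. 8.207 can be made explicit. Reading eqn. 8.206 as $\Pi_{\alpha\beta} = \rho V_\alpha V_\beta + p\,\delta_{\alpha\beta} - \tilde\sigma_{\alpha\beta}$ and assuming uniform η and ζ, the advective terms combine with continuity, and the divergence of the viscous stress produces the two viscous terms:

$$\frac{\partial(\rho V_\alpha)}{\partial t} + \frac{\partial(\rho V_\alpha V_\beta)}{\partial x_\beta} = \rho\,\Big(\frac{\partial V_\alpha}{\partial t} + V_\beta\,\frac{\partial V_\alpha}{\partial x_\beta}\Big) + V_\alpha\,\underbrace{\Big(\frac{\partial\rho}{\partial t} + \frac{\partial(\rho V_\beta)}{\partial x_\beta}\Big)}_{=\,0\ \text{by eqn. 8.204}}$$

$$\frac{\partial\tilde\sigma_{\alpha\beta}}{\partial x_\beta} = \eta\,\Big(\nabla^2V_\alpha + \frac{\partial(\nabla\!\cdot{\bf V})}{\partial x_\alpha} - \frac23\,\frac{\partial(\nabla\!\cdot{\bf V})}{\partial x_\alpha}\Big) + \zeta\,\frac{\partial(\nabla\!\cdot{\bf V})}{\partial x_\alpha} = \eta\,\nabla^2V_\alpha + \big(\zeta + \tfrac13\eta\big)\,\frac{\partial(\nabla\!\cdot{\bf V})}{\partial x_\alpha}$$

so that eqn. 8.205 becomes $\rho\,DV_\alpha/Dt = -\partial p/\partial x_\alpha + \partial\tilde\sigma_{\alpha\beta}/\partial x_\beta$, which is eqn. 8.207 written in components.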

8.9 Nonequilibrium Quantum Transport

8.9.1 Boltzmann equation for quantum systems

Almost everything we have derived thus far can be applied, mutatis mutandis, to quantum systems. The main difference is that the distribution $f^0$ corresponding to local equilibrium is no longer of the Maxwell-Boltzmann form, but rather of the Bose-Einstein or Fermi-Dirac form,
$$f^0({\bf r},{\bf k},t) = \bigg\{\exp\bigg(\frac{\varepsilon({\bf k}) - \mu({\bf r},t)}{k_BT({\bf r},t)}\bigg)\mp1\bigg\}^{-1}\ ,\tag{8.209}$$
where the top sign applies to bosons and the bottom sign to fermions. Here we shift to the more common notation for quantum systems in which we write the distribution in terms of the wavevector ${\bf k} = {\bf p}/\hbar$ rather than the momentum p. The quantum distributions satisfy detailed balance with respect to the quantum collision integral
$$\Big(\frac{\partial f}{\partial t}\Big)_{\!\rm coll} = \int\!\!\frac{d^3k_1}{(2\pi)^3}\int\!\!\frac{d^3k'}{(2\pi)^3}\int\!\!\frac{d^3k_1'}{(2\pi)^3}\;w\,\Big\{f'f_1'\,(1\pm f)\,(1\pm f_1) - f f_1\,(1\pm f')\,(1\pm f_1')\Big\}\tag{8.210}$$
where $w = w({\bf k},{\bf k}_1\,|\,{\bf k}',{\bf k}_1')$, $f = f({\bf k})$, $f_1 = f({\bf k}_1)$, $f' = f({\bf k}')$, and $f_1' = f({\bf k}_1')$, and where we have assumed time-reversal and parity symmetry. Detailed balance requires
$$\frac{f}{1\pm f}\cdot\frac{f_1}{1\pm f_1} = \frac{f'}{1\pm f'}\cdot\frac{f_1'}{1\pm f_1'}\ ,\tag{8.211}$$
where $f = f^0$ is the equilibrium distribution. One can check that
$$f = \frac{1}{e^{\beta(\varepsilon-\mu)}\mp1}\quad\Longrightarrow\quad\frac{f}{1\pm f} = e^{\beta(\mu-\varepsilon)}\ ,\tag{8.212}$$

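which is of the Boltzmann form; since energy is conserved in each collision ($\varepsilon + \varepsilon_1 = \varepsilon' + \varepsilon_1'$), eqn. 8.211 is then satisfied, just as for the Maxwell-Boltzmann distribution, which we have already shown to satisfy detailed balance. For the streaming term, we have
$$\begin{aligned}df^0 &= k_BT\,\frac{\partial f^0}{\partial\varepsilon}\;d\Big(\frac{\varepsilon-\mu}{k_BT}\Big)\\ &= k_BT\,\frac{\partial f^0}{\partial\varepsilon}\,\Big[-\frac{d\mu}{k_BT} - \frac{(\varepsilon-\mu)\,dT}{k_BT^2} + \frac{d\varepsilon}{k_BT}\Big]\\ &= -\frac{\partial f^0}{\partial\varepsilon}\,\Big[\frac{\partial\mu}{\partial{\bf r}}\cdot d{\bf r} + \frac{\varepsilon-\mu}{T}\,\frac{\partial T}{\partial{\bf r}}\cdot d{\bf r} - \frac{\partial\varepsilon}{\partial{\bf k}}\cdot d{\bf k}\Big]\ ,\end{aligned}\tag{8.213}$$
from which we read off
$$\frac{\partial f^0}{\partial{\bf r}} = -\frac{\partial f^0}{\partial\varepsilon}\,\Big[\frac{\partial\mu}{\partial{\bf r}} + \frac{\varepsilon-\mu}{T}\,\frac{\partial T}{\partial{\bf r}}\Big]\qquad,\qquad\frac{\partial f^0}{\partial{\bf k}} = \hbar{\bf v}\,\frac{\partial f^0}{\partial\varepsilon}\ .\tag{8.214}$$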
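The identity in eqn. 8.212, and hence the detailed-balance condition of eqn. 8.211 for energy-conserving collisions, is easy to confirm numerically. The sketch below is illustrative only: the function names, the energies, and the chemical potentials (μ below all energies for bosons, so every occupancy is finite) are arbitrary assumptions, with energies measured in units of $k_BT$.

```python
import math

def occupancy(eps, mu, kT, bosons):
    """Equilibrium occupancy of eqn. 8.209: 1/(e^x - 1) for bosons, 1/(e^x + 1) for fermions."""
    x = (eps - mu) / kT
    return 1.0 / (math.exp(x) - 1.0) if bosons else 1.0 / (math.exp(x) + 1.0)

def blocking_ratio(f, bosons):
    """f / (1 +- f): the upper sign (+) for bosons, the lower sign (-) for fermions."""
    return f / (1.0 + f) if bosons else f / (1.0 - f)

def detailed_balance_residual(e, e1, e_prime, mu, kT, bosons):
    """LHS - RHS of eqn. 8.211 for a collision with e + e1 = e' + e1'."""
    e1_prime = e + e1 - e_prime  # energy conservation fixes the fourth energy

    def r(energy):
        return blocking_ratio(occupancy(energy, mu, kT, bosons), bosons)

    return r(e) * r(e1) - r(e_prime) * r(e1_prime)

# Both residuals vanish up to floating-point rounding.
print(detailed_balance_residual(1.0, 2.0, 0.3, mu=-0.5, kT=1.0, bosons=True))
print(detailed_balance_residual(0.2, 1.7, 0.9, mu=1.0, kT=1.0, bosons=False))
```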
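The most important application is to the theory of electron transport in metals and semiconductors, in which case $f^0$ is the Fermi distribution. In this case, the quantum collision integral also receives a contribution from one-body scattering in the presence of an external potential U(r), which is given by Fermi’s Golden Rule:
$$\begin{aligned}\Big(\frac{\partial f({\bf k})}{\partial t}\Big)_{\!\rm coll} &= \frac{2\pi}{\hbar}\sum_{{\bf k}'\in\hat\Omega}\big|\langle{\bf k}'\,|\,U\,|\,{\bf k}\rangle\big|^2\,\big[f({\bf k}') - f({\bf k})\big]\,\delta\big(\varepsilon({\bf k}) - \varepsilon({\bf k}')\big)\\ &= \frac{2\pi}{\hbar V}\int_{\hat\Omega}\!\frac{d^3k'}{(2\pi)^3}\,\big|\hat U({\bf k}-{\bf k}')\big|^2\,\big[f({\bf k}') - f({\bf k})\big]\,\delta\big(\varepsilon({\bf k}) - \varepsilon({\bf k}')\big)\ .\end{aligned}\tag{8.215}$$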

The wavevectors are now restricted to the first Brillouin zone, and the dispersion ε(k) is no longer the ballistic form $\varepsilon = \hbar^2{\bf k}^2/2m$ but rather the dispersion for electrons in a particular energy band (typically the valence band) of a solid¹⁰. Note that $f = f^0$ satisfies detailed balance with respect to one-body collisions as well¹¹.
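In the presence of a weak electric field E and a (not necessarily weak) magnetic field B, we have, within the relaxation time approximation, $f = f^0 + \delta f$ with
$$\frac{\partial\,\delta f}{\partial t} - \frac{e}{\hbar c}\,{\bf v}\times{\bf B}\cdot\frac{\partial\,\delta f}{\partial{\bf k}} - {\bf v}\cdot\Big[e\,\boldsymbol{\mathcal E} + \frac{\varepsilon-\mu}{T}\,\nabla T\Big]\,\frac{\partial f^0}{\partial\varepsilon} = -\frac{\delta f}{\tau}\ ,\tag{8.216}$$
where $\boldsymbol{\mathcal E} = -\nabla(\phi - \mu/e) = {\bf E} + e^{-1}\nabla\mu$ is minus the gradient of the ‘electrochemical potential’ $\phi - e^{-1}\mu$. In deriving the above equation, we have worked to lowest order in small quantities. This entails dropping terms like ${\bf v}\cdot\partial\,\delta f/\partial{\bf r}$ (higher order in spatial derivatives) and $\boldsymbol{\mathcal E}\cdot\partial\,\delta f/\partial{\bf k}$ (both $\boldsymbol{\mathcal E}$ and δf are assumed small). Typically τ is energy-dependent, i.e. $\tau = \tau\big(\varepsilon({\bf k})\big)$.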
10
We neglect interband scattering here, which can be important in practical applications, but which is beyond the scope
of these notes.
11
The transition rate from |k′⟩ to |k⟩ is proportional to the squared matrix element and to the product f′(1 − f). The reverse process is proportional to f(1 − f′). Subtracting these factors, one obtains f′ − f, and therefore the nonlinear terms felicitously cancel in eqn. 8.215.

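We can use eqn. 8.216 to compute the electrical current j and the thermal current ${\bf j}_q$,
$${\bf j} = -2e\int_{\hat\Omega}\!\frac{d^3k}{(2\pi)^3}\;{\bf v}\,\delta f\tag{8.217}$$
$${\bf j}_q = 2\int_{\hat\Omega}\!\frac{d^3k}{(2\pi)^3}\;(\varepsilon-\mu)\,{\bf v}\,\delta f\ .\tag{8.218}$$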

Here the factor of 2 is from spin degeneracy of the electrons (we neglect Zeeman splitting).
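In the presence of a time-independent temperature gradient and electric field (and with B = 0), the linearized Boltzmann equation in the relaxation time approximation has the solution
$$\delta f = -\tau(\varepsilon)\,{\bf v}\cdot\Big[e\,\boldsymbol{\mathcal E} + \frac{\varepsilon-\mu}{T}\,\nabla T\Big]\Big(-\frac{\partial f^0}{\partial\varepsilon}\Big)\ .\tag{8.219}$$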

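We now consider both the electrical current¹² j as well as the thermal current density ${\bf j}_q$. One readily obtains
$${\bf j} = -2e\int_{\hat\Omega}\!\frac{d^3k}{(2\pi)^3}\;{\bf v}\,\delta f\equiv L_{11}\,\boldsymbol{\mathcal E} - L_{12}\,\nabla T\tag{8.220}$$
$${\bf j}_q = 2\int_{\hat\Omega}\!\frac{d^3k}{(2\pi)^3}\;(\varepsilon-\mu)\,{\bf v}\,\delta f\equiv L_{21}\,\boldsymbol{\mathcal E} - L_{22}\,\nabla T\tag{8.221}$$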

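where the transport coefficients $L_{11}$ etc. are matrices:
$$L^{\alpha\beta}_{11} = \frac{e^2}{4\pi^3\hbar}\int\! d\varepsilon\;\tau(\varepsilon)\,\Big(-\frac{\partial f^0}{\partial\varepsilon}\Big)\!\int\! dS_\varepsilon\;\frac{v_\alpha v_\beta}{|{\bf v}|}\tag{8.222}$$
$$L^{\alpha\beta}_{21} = T\,L^{\alpha\beta}_{12} = -\frac{e}{4\pi^3\hbar}\int\! d\varepsilon\;\tau(\varepsilon)\,(\varepsilon-\mu)\,\Big(-\frac{\partial f^0}{\partial\varepsilon}\Big)\!\int\! dS_\varepsilon\;\frac{v_\alpha v_\beta}{|{\bf v}|}\tag{8.223}$$
$$L^{\alpha\beta}_{22} = \frac{1}{4\pi^3\hbar T}\int\! d\varepsilon\;\tau(\varepsilon)\,(\varepsilon-\mu)^2\,\Big(-\frac{\partial f^0}{\partial\varepsilon}\Big)\!\int\! dS_\varepsilon\;\frac{v_\alpha v_\beta}{|{\bf v}|}\ .\tag{8.224}$$
Here $dS_\varepsilon$ is an element of area on the constant-energy surface $\varepsilon({\bf k}) = \varepsilon$, and we have used $d^3k = d\varepsilon\,dS_\varepsilon/\hbar|{\bf v}|$. If we define the hierarchy of integral expressions
$$J^{\alpha\beta}_n\equiv\frac{1}{4\pi^3\hbar}\int\! d\varepsilon\;\tau(\varepsilon)\,(\varepsilon-\mu)^n\,\Big(-\frac{\partial f^0}{\partial\varepsilon}\Big)\!\int\! dS_\varepsilon\;\frac{v_\alpha v_\beta}{|{\bf v}|}\tag{8.225}$$
then we may write
$$L^{\alpha\beta}_{11} = e^2\,J^{\alpha\beta}_0\qquad,\qquad L^{\alpha\beta}_{21} = T\,L^{\alpha\beta}_{12} = -e\,J^{\alpha\beta}_1\qquad,\qquad L^{\alpha\beta}_{22} = \frac1T\,J^{\alpha\beta}_2\ .\tag{8.226}$$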

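The linear relations in eqns. 8.220 and 8.221 may be recast in the following form:
$$\boldsymbol{\mathcal E} = \rho\,{\bf j} + Q\,\nabla T\qquad,\qquad{\bf j}_q = \sqcap\,{\bf j} - \kappa\,\nabla T\ ,\tag{8.227}$$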
12
In this section we use j to denote electrical current, rather than particle number current as before.

Figure 8.7: A thermocouple is a junction formed of two dissimilar metals. With no electrical current
passing, an electric field is generated in the presence of a temperature gradient, resulting in a voltage
V = VA − VB .

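where the matrices ρ, Q, ⊓, and κ are given by
$$\rho = L_{11}^{-1}\qquad,\qquad Q = L_{11}^{-1}\,L_{12}\tag{8.228}$$
$$\sqcap = L_{21}\,L_{11}^{-1}\qquad,\qquad\kappa = L_{22} - L_{21}\,L_{11}^{-1}\,L_{12}\ ,\tag{8.229}$$
or, in terms of the $J_n$,
$$\rho = \frac{1}{e^2}\,J_0^{-1}\qquad,\qquad Q = -\frac{1}{eT}\,J_0^{-1}J_1\tag{8.230}$$
$$\sqcap = -\frac1e\,J_1J_0^{-1}\qquad,\qquad\kappa = \frac1T\,\big(J_2 - J_1J_0^{-1}J_1\big)\ .\tag{8.231}$$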

These equations describe a wealth of transport phenomena:

• Electrical resistance (∇T = B = 0)


An electrical current j will generate an electric field E = ρj, where ρ is the electrical resistivity.

• Peltier effect (∇T = B = 0)


An electrical current j will generate a heat current jq = ⊓j, where ⊓ is the Peltier coefficient.

• Thermal conduction (j = B = 0)
A temperature gradient ∇T gives rise to a heat current jq = −κ∇T , where κ is the thermal
conductivity.

• Seebeck effect (j = B = 0)
A temperature gradient ∇T gives rise to an electric field E = Q∇T , where Q is the Seebeck
coefficient.

Figure 8.8: A sketch of a Peltier effect refrigerator. An electrical current I is passed through a junction
between two dissimilar metals. If the dotted line represents the boundary of a thermally well-insulated
body, then the body cools when ⊓B > ⊓A , in order to maintain a heat current balance at the junction.

One practical way to measure the thermopower is to form a junction between two dissimilar metals, A
and B. The junction is held at temperature T1 and the other ends of the metals are held at temperature
T0 . One then measures a voltage difference between the free ends of the metals – this is known as the
Seebeck effect. Integrating the electric field from the free end of A to the free end of B gives
$$V_A - V_B = -\int_A^B\!\boldsymbol{\mathcal E}\cdot d{\bf l} = (Q_B - Q_A)(T_1 - T_0)\ .\tag{8.232}$$

What one measures here is really the difference in thermopowers of the two metals. For an absolute
measurement of QA , replace B by a superconductor (Q = 0 for a superconductor). A device which
converts a temperature gradient into an emf is known as a thermocouple.
The Peltier effect has practical applications in refrigeration technology. Suppose an electrical current I
is passed through a junction between two dissimilar metals, A and B. Due to the difference in Peltier
coefficients, there will be a net heat current into the junction of W = (⊓A − ⊓B ) I. Note that this is
proportional to I, rather than the familiar I 2 result from Joule heating. The sign of W depends on the
direction of the current. If a second junction is added, to make an ABA configuration, then heat absorbed at the first junction will be liberated at the second.¹³

13
To create a refrigerator, stick the cold junction inside a thermally insulated box and the hot junction outside the box.

8.9.2 The Heat Equation

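We begin with the continuity equations for charge density ρ and energy density ε:
$$\frac{\partial\rho}{\partial t} + \nabla\!\cdot{\bf j} = 0\tag{8.233}$$
$$\frac{\partial\varepsilon}{\partial t} + \nabla\!\cdot{\bf j}_\varepsilon = {\bf j}\cdot{\bf E}\ ,\tag{8.234}$$
where E is the electric field¹⁴. Now we invoke local thermodynamic equilibrium and write
$$\frac{\partial\varepsilon}{\partial t} = \frac{\partial\varepsilon}{\partial n}\,\frac{\partial n}{\partial t} + \frac{\partial\varepsilon}{\partial T}\,\frac{\partial T}{\partial t} = -\frac{\mu}{e}\,\frac{\partial\rho}{\partial t} + c_V\,\frac{\partial T}{\partial t}\ ,\tag{8.235}$$
where n is the electron number density (n = −ρ/e) and $c_V$ is the specific heat. We may now write
$$c_V\,\frac{\partial T}{\partial t} = \frac{\partial\varepsilon}{\partial t} + \frac{\mu}{e}\,\frac{\partial\rho}{\partial t} = {\bf j}\cdot{\bf E} - \nabla\!\cdot{\bf j}_\varepsilon - \frac{\mu}{e}\,\nabla\!\cdot{\bf j} = {\bf j}\cdot{\bf E} - \nabla\!\cdot{\bf j}_q\ .\tag{8.236}$$
Invoking ${\bf j}_q = \sqcap\,{\bf j} - \kappa\,\nabla T$, we see that if there is no electrical current (j = 0), we obtain the heat equation
$$c_V\,\frac{\partial T}{\partial t} = \kappa_{\alpha\beta}\,\frac{\partial^2T}{\partial x_\alpha\,\partial x_\beta}\ .\tag{8.237}$$
This results in a time scale $\tau_T$ for temperature diffusion, $\tau_T = CL^2c_V/\kappa$, where L is a typical length scale and C is a numerical constant. For a cube of side L subjected to a sudden external temperature change, $C = 1/(3\pi^2)$: solving by separation of variables, the slowest-decaying mode, $\sin(\pi x/L)\,\sin(\pi y/L)\,\sin(\pi z/L)$, decays at the rate $3\pi^2\kappa/c_VL^2$.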
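The value $C = 1/(3\pi^2)$ can be checked against the full separation-of-variables series. The sketch below assumes, for illustration, that the faces of the cube are suddenly held at the bath temperature; it writes the midplane solution of a slab as a sine series in the thermal diffusivity $\kappa/c_V$, uses the fact that the cube solution is a product of three slab solutions, and extracts the late-time decay rate numerically. The values of κ, $c_V$, and L are hypothetical.

```python
import math

def slab_midplane_excess(t, diffusivity, L, n_terms=200):
    """Fractional temperature excess at the midplane of a slab of width L whose faces are
    suddenly held at the bath temperature (sine series; only odd modes contribute)."""
    total = 0.0
    for n in range(1, 2 * n_terms, 2):
        k = n * math.pi / L
        total += 4.0 / (n * math.pi) * math.sin(k * L / 2.0) * math.exp(-k * k * diffusivity * t)
    return total

def cube_center_excess(t, diffusivity, L):
    """The cube solution factorizes into three identical slab solutions."""
    return slab_midplane_excess(t, diffusivity, L) ** 3

def late_time_rate(diffusivity, L):
    """Decay rate of the center excess, estimated from two late times."""
    t1, t2 = L * L / diffusivity, 1.5 * L * L / diffusivity
    ratio = cube_center_excess(t1, diffusivity, L) / cube_center_excess(t2, diffusivity, L)
    return math.log(ratio) / (t2 - t1)

kappa, c_V, L = 4.0, 2.0, 0.5                          # hypothetical values
tau_T = L ** 2 * c_V / (3 * math.pi ** 2 * kappa)      # tau_T = C L^2 c_V / kappa, C = 1/(3 pi^2)
print(late_time_rate(kappa / c_V, L) * tau_T)          # ~1
```

At late times every mode except the slowest has died out, so the product of rate and $\tau_T$ approaches unity.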

8.9.3 Calculation of Transport Coefficients

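We will henceforth assume that sufficient crystalline symmetry exists (e.g. cubic symmetry) to render all the transport coefficients multiples of the identity matrix. Under such conditions, the surface integral of $v_\alpha v_\beta/|{\bf v}|$ may be replaced by $\frac13\delta_{\alpha\beta}$ times that of $|{\bf v}|$, and we may write $J^{\alpha\beta}_n = J_n\,\delta_{\alpha\beta}$ with
$$J_n = \frac{1}{12\pi^3\hbar}\int\! d\varepsilon\;\tau(\varepsilon)\,(\varepsilon-\mu)^n\,\Big(-\frac{\partial f^0}{\partial\varepsilon}\Big)\!\int\! dS_\varepsilon\;|{\bf v}|\ .\tag{8.238}$$
The low-temperature behavior is extracted using the Sommerfeld expansion,
$$I\equiv\int_{-\infty}^\infty\!\! d\varepsilon\;H(\varepsilon)\,\Big(-\frac{\partial f^0}{\partial\varepsilon}\Big) = \pi D\csc(\pi D)\,H(\varepsilon)\Big|_{\varepsilon=\mu}\tag{8.239}$$
$$\qquad = H(\mu) + \frac{\pi^2}{6}\,(k_BT)^2\,H''(\mu) + \ldots\tag{8.240}$$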
14
Note that it is E · j and not ℰ · j which is the source term in the energy continuity equation.


where $D\equiv k_BT\,\partial/\partial\varepsilon$ is a dimensionless differential operator.¹⁵
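The two-term Sommerfeld expansion of eqn. 8.240 can be checked against direct quadrature. In this sketch (an illustration; the test functions, grid, and parameter values are arbitrary choices), $-\partial f^0/\partial\varepsilon$ is written as $(4k_BT)^{-1}\,{\rm sech}^2[(\varepsilon-\mu)/2k_BT]$ to avoid overflow. For $H(\varepsilon) = (\varepsilon-\mu)^2$ the expansion terminates and gives exactly $\frac{\pi^2}{3}(k_BT)^2$, while for $H(\varepsilon) = \varepsilon^{3/2}$ the omitted terms are of order $(k_BT)^4$.

```python
import math

def minus_dfde(eps, mu, kT):
    """-df0/de for the Fermi function, written as sech^2 to avoid overflow."""
    return 1.0 / (4.0 * kT * math.cosh((eps - mu) / (2.0 * kT)) ** 2)

def fermi_window_integral(H, mu, kT, width=40.0, n=20001):
    """Trapezoidal estimate of I = int de H(e) (-df0/de) over mu +- width*kT (eqn. 8.239)."""
    a = mu - width * kT
    h = 2.0 * width * kT / (n - 1)
    total = 0.0
    for i in range(n):
        e = a + i * h
        w = 0.5 if i in (0, n - 1) else 1.0
        total += w * H(e) * minus_dfde(e, mu, kT)
    return total * h

def sommerfeld_two_terms(H_mu, d2H_mu, kT):
    """H(mu) + (pi^2/6) (kT)^2 H''(mu), the first two terms of eqn. 8.240."""
    return H_mu + math.pi ** 2 / 6.0 * kT ** 2 * d2H_mu

mu, kT = 1.0, 0.01
quad = fermi_window_integral(lambda e: (e - mu) ** 2, mu, kT)
print(quad, sommerfeld_two_terms(0.0, 2.0, kT))          # equal: pi^2/3 * kT^2
quad32 = fermi_window_integral(lambda e: e ** 1.5, mu, kT)
print(quad32, sommerfeld_two_terms(mu ** 1.5, 0.75 * mu ** -0.5, kT))
```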
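Let us now perform some explicit calculations in the case of a parabolic band with an energy-independent scattering time τ. In this case, one readily finds
$$J_n = \frac{\sigma_0}{e^2}\,\varepsilon_F^{-3/2}\;\pi D\csc(\pi D)\;\varepsilon^{3/2}\,(\varepsilon-\mu)^n\Big|_{\varepsilon=\mu}\ ,\tag{8.241}$$
where $\sigma_0 = ne^2\tau/m^*$. Note that
$$n = \frac{1}{3\pi^2}\,\Big(\frac{2m^*\varepsilon_F}{\hbar^2}\Big)^{\!3/2}\tag{8.242}$$
and that $\varepsilon_F$ and μ are related by
$$\varepsilon_F^{3/2} = \pi D\csc(\pi D)\;\varepsilon^{3/2}\Big|_{\varepsilon=\mu}\ .\tag{8.243}$$
Thus,
$$J_0 = \frac{\sigma_0}{e^2}\qquad,\qquad J_1 = \frac{\sigma_0}{e^2}\,\frac{\pi^2}{2}\,\frac{(k_BT)^2}{\mu} + \ldots\qquad,\qquad J_2 = \frac{\sigma_0}{e^2}\,\frac{\pi^2}{3}\,(k_BT)^2 + \ldots\ ,\tag{8.244}$$
from which we obtain the low-T results $\rho = \sigma_0^{-1}$,
$$Q = -\frac{\pi^2}{2}\,\frac{k_B^2T}{e\,\varepsilon_F}\qquad,\qquad\kappa = \frac{\pi^2}{3}\,\frac{n\tau}{m^*}\,k_B^2T\ ,\tag{8.245}$$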

and of course ⊓ = TQ. The predicted universal ratio
$$\frac{\kappa}{\sigma T} = \frac{\pi^2}{3}\,(k_B/e)^2 = 2.44\times10^{-8}\ {\rm V^2\,K^{-2}}\ ,\tag{8.246}$$
is known as the Wiedemann-Franz law. Note also that our result for the thermopower is unambiguously
negative. In actuality, several nearly free electron metals have positive low-temperature thermopowers
(Cs and Li, for example). What went wrong? We have neglected electron-phonon scattering!
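The low-temperature results can be strung together numerically. The sketch below builds $J_0$, $J_1$, $J_2$ from eqn. 8.244, combines them via eqns. 8.230 and 8.231, and checks the Wiedemann-Franz ratio and the relation ⊓ = TQ. The inputs (a copper-like free-electron metal with n = 8.5 × 10²⁸ m⁻³, τ = 2.5 × 10⁻¹⁴ s, m* = m_e, ε_F = 7 eV, T = 300 K) are hypothetical, SI units are assumed, and μ is replaced by ε_F, which is correct to leading order in T.

```python
import math

K_B = 1.380649e-23          # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19  # elementary charge, C

def free_electron_transport(n, tau, m_eff, eps_F, T):
    """Low-T coefficients of a parabolic band with energy-independent tau: J_n from
    eqn. 8.244, combined as in eqns. 8.230 and 8.231 (scalars, cubic symmetry)."""
    sigma0 = n * E_CHARGE ** 2 * tau / m_eff
    kT = K_B * T
    mu = eps_F  # leading order; the shift mu - eps_F is O(T^2)
    J0 = sigma0 / E_CHARGE ** 2
    J1 = J0 * (math.pi ** 2 / 2.0) * kT ** 2 / mu
    J2 = J0 * (math.pi ** 2 / 3.0) * kT ** 2
    return {
        "rho": 1.0 / (E_CHARGE ** 2 * J0),
        "Q": -J1 / (E_CHARGE * T * J0),
        "Pi": -J1 / (E_CHARGE * J0),
        "kappa": (J2 - J1 ** 2 / J0) / T,
    }

T = 300.0
res = free_electron_transport(8.5e28, 2.5e-14, 9.109e-31, 7.0 * E_CHARGE, T)
L0 = math.pi ** 2 / 3 * (K_B / E_CHARGE) ** 2          # Lorenz number, V^2/K^2
print(L0, res["kappa"] * res["rho"] / T)               # nearly equal
print(res["Q"], res["Pi"] / T)                         # thermopower (V/K), equal values
```

The small difference between the two Lorenz numbers comes from the $J_1J_0^{-1}J_1$ term in κ, which is of relative order $(k_BT/\varepsilon_F)^2$.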

8.9.4 Onsager Relations

Transport phenomena are described in general by a set of linear relations,
$$J_i = L_{ik}\,F_k\ ,\tag{8.247}$$
15
Remember that physically the fixed quantities are temperature and total carrier number density (or charge density, in
the case of electron and hole bands), and not temperature and chemical potential. An equation of state relating n, µ, and
T is then inverted to obtain µ(n, T ), so that all results ultimately may be expressed in terms of n and T .

where the $\{F_k\}$ are generalized forces and the $\{J_i\}$ are generalized currents. Moreover, to each force $F_i$ corresponds a unique conjugate current $J_i$, such that the rate of internal entropy production is
$$\dot S = \sum_i F_i\,J_i\quad\Longrightarrow\quad F_i = \frac{\partial\dot S}{\partial J_i}\ .\tag{8.248}$$

The Onsager relations (also known as Onsager reciprocity) state that
$$L_{ik}({\bf B}) = \eta_i\,\eta_k\,L_{ki}(-{\bf B})\ ,\tag{8.249}$$
where $\eta_i$ describes the parity of $J_i$ under time reversal:
$$J_i^T = \eta_i\,J_i\ ,\tag{8.250}$$
where $J_i^T$ is the time reverse of $J_i$. To justify the Onsager relations requires a microscopic description of our nonequilibrium system.
The Onsager relations have some remarkable consequences. For example, they require, for B = 0, that the
thermal conductivity tensor κij of any crystal must be symmetric, independent of the crystal structure.
In general, this result does not follow from considerations of crystalline symmetry. It also requires that for
every ‘off-diagonal’ transport phenomenon, e.g. the Seebeck effect, there exists a distinct corresponding
phenomenon, e.g. the Peltier effect.
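For the transport coefficients studied, Onsager reciprocity means that in the presence of an external magnetic field,
$$\rho_{\alpha\beta}({\bf B}) = \rho_{\beta\alpha}(-{\bf B})\tag{8.251}$$
$$\kappa_{\alpha\beta}({\bf B}) = \kappa_{\beta\alpha}(-{\bf B})\tag{8.252}$$
$$\sqcap_{\alpha\beta}({\bf B}) = T\,Q_{\beta\alpha}(-{\bf B})\ .\tag{8.253}$$
Let’s consider an isotropic system in a weak magnetic field, and expand the transport coefficients to first order in B:
$$\rho_{\alpha\beta}({\bf B}) = \rho\,\delta_{\alpha\beta} + \nu\,\epsilon_{\alpha\beta\gamma}\,B_\gamma\tag{8.254}$$
$$\kappa_{\alpha\beta}({\bf B}) = \kappa\,\delta_{\alpha\beta} + \varpi\,\epsilon_{\alpha\beta\gamma}\,B_\gamma\tag{8.255}$$
$$Q_{\alpha\beta}({\bf B}) = Q\,\delta_{\alpha\beta} + \zeta\,\epsilon_{\alpha\beta\gamma}\,B_\gamma\tag{8.256}$$
$$\sqcap_{\alpha\beta}({\bf B}) = \sqcap\,\delta_{\alpha\beta} + \theta\,\epsilon_{\alpha\beta\gamma}\,B_\gamma\ .\tag{8.257}$$
Onsager reciprocity requires ⊓ = TQ and θ = Tζ. We can now write
$$\boldsymbol{\mathcal E} = \rho\,{\bf j} + \nu\,{\bf j}\times{\bf B} + Q\,\nabla T + \zeta\,\nabla T\times{\bf B}\tag{8.258}$$
$${\bf j}_q = \sqcap\,{\bf j} + \theta\,{\bf j}\times{\bf B} - \kappa\,\nabla T - \varpi\,\nabla T\times{\bf B}\ .\tag{8.259}$$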
There are several new phenomena lurking:

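• Hall effect ($\partial T/\partial x = \partial T/\partial y = j_y = 0$)
An electrical current ${\bf j} = j_x\,\hat{\bf x}$ and a field ${\bf B} = B_z\,\hat{\bf z}$ yield an electric field $\boldsymbol{\mathcal E}$. The Hall coefficient is $R_H = \mathcal E_y/j_xB_z = -\nu$.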
8.10. STOCHASTIC PROCESSES 481

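• Ettingshausen effect ($\partial T/\partial x = j_y = j_{q,y} = 0$)
An electrical current ${\bf j} = j_x\,\hat{\bf x}$ and a field ${\bf B} = B_z\,\hat{\bf z}$ yield a temperature gradient $\partial T/\partial y$. The Ettingshausen coefficient is $P = \big(\partial T/\partial y\big)\big/j_xB_z = -\theta/\kappa$.

• Nernst effect ($j_x = j_y = \partial T/\partial y = 0$)
A temperature gradient $\nabla T = (\partial T/\partial x)\,\hat{\bf x}$ and a field ${\bf B} = B_z\,\hat{\bf z}$ yield an electric field $\boldsymbol{\mathcal E}$. The Nernst coefficient is $\Lambda = \mathcal E_y\big/\big(\partial T/\partial x\big)B_z = -\zeta$.

• Righi-Leduc effect ($j_x = j_y = \mathcal E_y = 0$)
A temperature gradient $\nabla T = (\partial T/\partial x)\,\hat{\bf x}$ and a field ${\bf B} = B_z\,\hat{\bf z}$ yield an orthogonal temperature gradient $\partial T/\partial y$. The Righi-Leduc coefficient is $L = \big(\partial T/\partial y\big)\big/\big(\partial T/\partial x\big)B_z = \zeta/Q$.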
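The four coefficients in the list above can be read off mechanically by imposing the stated constraints on eqns. 8.258 and 8.259. The sketch below does this numerically; the coefficient values (and the temperature used to set ⊓ = TQ and θ = Tζ) are arbitrary hypothetical numbers, and the helper names are illustrative. Because the relations are linear, each unknown temperature gradient is found from two evaluations.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def responses(j, grad_T, B, c):
    """Electric field (eqn. 8.258) and heat current (eqn. 8.259)."""
    jxB, gxB = cross(j, B), cross(grad_T, B)
    E = tuple(c["rho"] * j[i] + c["nu"] * jxB[i] + c["Q"] * grad_T[i] + c["zeta"] * gxB[i]
              for i in range(3))
    jq = tuple(c["Pi"] * j[i] + c["theta"] * jxB[i] - c["kappa"] * grad_T[i] - c["varpi"] * gxB[i]
               for i in range(3))
    return E, jq

def root_of_linear(fun):
    """Zero of a function known to be linear in its argument."""
    f0 = fun(0.0)
    return -f0 / (fun(1.0) - f0)

T = 2.0  # hypothetical temperature
c = {"rho": 2.0, "nu": 0.3, "Q": -1.5, "zeta": 0.4, "kappa": 5.0, "varpi": 0.7}
c["Pi"], c["theta"] = T * c["Q"], T * c["zeta"]      # Onsager: Pi = T Q, theta = T zeta

jx, Bz, gx = 1.0, 0.5, 3.0
B = (0.0, 0.0, Bz)
zero = (0.0, 0.0, 0.0)

# Hall: current along x, no temperature gradient
R_H = responses((jx, 0.0, 0.0), zero, B, c)[0][1] / (jx * Bz)
# Ettingshausen: current along x, dT/dy fixed by j_q,y = 0
gy = root_of_linear(lambda g: responses((jx, 0.0, 0.0), (0.0, g, 0.0), B, c)[1][1])
P = gy / (jx * Bz)
# Nernst: no current, dT/dx applied
Lam = responses(zero, (gx, 0.0, 0.0), B, c)[0][1] / (gx * Bz)
# Righi-Leduc: no current, dT/dy fixed by E_y = 0
gy = root_of_linear(lambda g: responses(zero, (gx, g, 0.0), B, c)[0][1])
L_RL = gy / (gx * Bz)

print(R_H, -c["nu"])
print(P, -c["theta"] / c["kappa"])
print(Lam, -c["zeta"])
print(L_RL, c["zeta"] / c["Q"])
```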

8.10 Stochastic Processes

A stochastic process is one which is partially random, i.e. it is not wholly deterministic. Typically the randomness is due to phenomena at the microscale, such as the effect of fluid molecules on a small particle, e.g. a piece of dust in the air. The resulting motion (called Brownian motion in the case of particles moving in a fluid) can be described only in a statistical sense. That is, the full motion of the system is a functional of one or more independent random variables. The motion is then described by its averages with respect to the various random distributions.

8.10.1 Langevin equation and Brownian motion

Consider a particle of mass M subjected to dissipative and random forcing. We’ll examine this system in one dimension to gain an understanding of the essential physics. We write
$$\dot p + \gamma\,p = F + \eta(t)\ .\tag{8.260}$$
Here, γ is the damping rate due to friction, F is a constant external force, and η(t) is a stochastic random force. This equation, known as the Langevin equation, describes a ballistic particle being buffeted by random forcing events. Think of a particle of dust as it moves in the atmosphere; F would then represent the external force due to gravity and η(t) the random forcing due to interaction with the air molecules.
For a sphere of radius a moving with velocity v in a fluid, the Stokes drag is given by $F_{\rm drag} = -6\pi\eta a v$, where here η denotes the viscosity of the fluid rather than the random force. Thus,
$$\gamma_{\rm Stokes} = \frac{6\pi\eta a}{M}\ ,\tag{8.261}$$
where M is the mass of the particle. It is illustrative to compute γ in some setting. Consider a micron-sized droplet ($a = 10^{-4}$ cm) of some liquid of density $\rho\sim1.0\ {\rm g/cm^3}$ moving in air at T = 20 °C. The viscosity of air is $\eta = 1.8\times10^{-4}$ g/cm·s at this temperature (the cgs unit of viscosity is the Poise (P), with 1 P = 1 g/cm·s). If the droplet density is constant, then $M = \frac43\pi a^3\rho$ and $\gamma = 9\eta/2\rho a^2 = 8.1\times10^4\ {\rm s^{-1}}$, hence the time scale for viscous relaxation of the particle is $\tau = \gamma^{-1} = 12\ \mu{\rm s}$. We should stress that the viscous damping on the particle is of course due to the fluid molecules, in some average ‘coarse-grained’ sense. The random component to the force η(t) would then represent the fluctuations with respect to this average.
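The droplet numbers above follow from a two-line calculation. The sketch below (cgs units, with the same assumed inputs a = 10⁻⁴ cm, ρ = 1.0 g/cm³, η = 1.8 × 10⁻⁴ P; the function name is illustrative) reproduces γ and τ = γ⁻¹.

```python
import math

def stokes_damping_rate(eta, a, rho):
    """gamma = 6 pi eta a / M for a sphere of radius a and density rho (cgs units)."""
    M = 4.0 / 3.0 * math.pi * a ** 3 * rho   # droplet mass
    return 6.0 * math.pi * eta * a / M       # equals 9 eta / (2 rho a^2)

gamma = stokes_damping_rate(eta=1.8e-4, a=1e-4, rho=1.0)
tau = 1.0 / gamma
print(f"gamma = {gamma:.3g} 1/s, tau = {tau * 1e6:.3g} microseconds")
```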
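We can easily integrate this equation:
$$\frac{d}{dt}\big(p\,e^{\gamma t}\big) = F\,e^{\gamma t} + \eta(t)\,e^{\gamma t}$$
$$p(t) = p(0)\,e^{-\gamma t} + \frac{F}{\gamma}\,\big(1 - e^{-\gamma t}\big) + \int_0^t\!\! ds\;\eta(s)\,e^{\gamma(s-t)}\tag{8.262}$$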

Note that p(t) is indeed a functional of the random function η(t). We can therefore only compute averages
in order to describe the motion of the system.
The first average we will compute is that of p itself. In so doing, we assume that η(t) has zero mean: $\langle\eta(t)\rangle = 0$. Then
$$\langle p(t)\rangle = p(0)\,e^{-\gamma t} + \frac{F}{\gamma}\,\big(1 - e^{-\gamma t}\big)\ .\tag{8.263}$$
On the time scale $\gamma^{-1}$, the initial conditions p(0) are effectively forgotten, and asymptotically for $t\gg\gamma^{-1}$ we have $\langle p(t)\rangle\to F/\gamma$, which is the terminal momentum.
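Next, consider
$$\langle p^2(t)\rangle = \langle p(t)\rangle^2 + \int_0^t\!\! ds_1\!\int_0^t\!\! ds_2\;e^{\gamma(s_1-t)}\,e^{\gamma(s_2-t)}\,\langle\eta(s_1)\,\eta(s_2)\rangle\ .\tag{8.264}$$
We now need to know the two-time correlator $\langle\eta(s_1)\,\eta(s_2)\rangle$. We assume that the correlator is a function only of the time difference $\Delta s = s_1 - s_2$, so that the random force η(s) satisfies
$$\langle\eta(s)\rangle = 0\tag{8.265}$$
$$\langle\eta(s_1)\,\eta(s_2)\rangle = \phi(s_1 - s_2)\ .\tag{8.266}$$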

The function φ(s) is the autocorrelation function of the random force. A macroscopic object moving in a fluid is constantly buffeted by fluid particles over its entire surface. These different fluid particles are almost completely uncorrelated, hence φ(s) is essentially zero except for |s| smaller than a very short time scale $\tau_\phi$, which is the time a single fluid particle spends interacting with the object. We can take $\tau_\phi\to0$ and approximate
$$\phi(s)\approx\Gamma\,\delta(s)\ .\tag{8.267}$$
We shall determine the value of Γ from equilibrium thermodynamic considerations below.
With this form for φ(s), we can easily calculate the equal time momentum autocorrelation:

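$$\langle p^2(t)\rangle = \langle p(t)\rangle^2 + \Gamma\!\int_0^t\!\! ds\;e^{2\gamma(s-t)} = \langle p(t)\rangle^2 + \frac{\Gamma}{2\gamma}\,\big(1 - e^{-2\gamma t}\big)\ .\tag{8.268}$$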


Consider the case where F = 0 and the limit $t\gg\gamma^{-1}$. We demand that the object thermalize at temperature T. Thus, we impose the condition
$$\Big\langle\frac{p^2(t)}{2M}\Big\rangle = \tfrac12 k_BT\quad\Longrightarrow\quad\Gamma = 2\gamma Mk_BT\ ,\tag{8.269}$$
where M is the particle’s mass. This determines the value of Γ.


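We can now compute the general momentum autocorrelator:
$$\begin{aligned}\langle p(t)\,p(t')\rangle - \langle p(t)\rangle\langle p(t')\rangle &= \int_0^t\!\! ds\int_0^{t'}\!\! ds'\;e^{\gamma(s-t)}\,e^{\gamma(s'-t')}\,\langle\eta(s)\,\eta(s')\rangle\\ &= Mk_BT\,e^{-\gamma|t-t'|}\qquad(t,t'\to\infty\ ,\ |t-t'|\ \text{finite})\ .\end{aligned}\tag{8.270}$$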

The full expressions for this and subsequent expressions, including subleading terms, are contained in an
appendix, §8.14.
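Let’s now compute the position x(t). We find
$$x(t) = \langle x(t)\rangle + \frac1M\int_0^t\!\! ds\int_0^s\!\! ds_1\;\eta(s_1)\,e^{\gamma(s_1-s)}\ ,\tag{8.271}$$
where
$$\langle x(t)\rangle = x(0) + \frac{Ft}{\gamma M} + \frac1\gamma\,\Big(v(0) - \frac{F}{\gamma M}\Big)\big(1 - e^{-\gamma t}\big)\ .\tag{8.272}$$
Note that for γt ≪ 1 we have $\langle x(t)\rangle = x(0) + v(0)\,t + \frac12\big(F/M - \gamma\,v(0)\big)\,t^2 + O(t^3)$, which for γ → 0 reduces to the result for ballistic particles moving under the influence of a constant force. For γt ≫ 1, ⟨x(t)⟩ grows linearly in time with slope F/γM, in agreement with our earlier evaluation of the terminal velocity, $v_\infty = \langle p(\infty)\rangle/M = F/\gamma M$. We next compute the position autocorrelation: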

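$$\begin{aligned}\langle x(t)\,x(t')\rangle - \langle x(t)\rangle\langle x(t')\rangle &= \frac{1}{M^2}\int_0^t\!\! ds\int_0^{t'}\!\! ds'\;e^{-\gamma(s+s')}\int_0^s\!\! ds_1\int_0^{s'}\!\! ds'_1\;e^{\gamma(s_1+s'_1)}\,\langle\eta(s_1)\,\eta(s'_1)\rangle\\ &= \frac{2k_BT}{\gamma M}\,\min(t,t') + O(1)\ .\end{aligned}$$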

In particular, the equal time autocorrelator is
$$\langle x^2(t)\rangle - \langle x(t)\rangle^2 = \frac{2k_BT\,t}{\gamma M}\equiv2Dt\ ,\tag{8.273}$$
at long times, up to terms of order unity. Here,
$$D = \frac{k_BT}{\gamma M}\tag{8.274}$$

is the diffusion constant. For a liquid droplet of radius a = 1 µm moving in air at T = 293 K, for which $\eta = 1.8\times10^{-4}$ P, we have
$$D = \frac{k_BT}{6\pi\eta a} = \frac{(1.38\times10^{-16}\ {\rm erg/K})\,(293\ {\rm K})}{6\pi\,(1.8\times10^{-4}\ {\rm P})\,(10^{-4}\ {\rm cm})} = 1.19\times10^{-7}\ {\rm cm^2/s}\ .\tag{8.275}$$
This result presumes that the droplet is large enough compared to the intermolecular distance in the fluid that one can adopt a continuum approach and use the Navier-Stokes equations, and that the flow around the droplet is laminar.
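The results $\langle p^2\rangle\to Mk_BT$ (eqn. 8.269) and $\langle x^2\rangle - \langle x\rangle^2\approx2Dt$ (eqns. 8.273 and 8.274) can be checked by integrating eqn. 8.260 directly. The sketch below uses a minimal Euler-Maruyama discretization, one standard (not unique) way to simulate a Langevin equation: over a step dt the white-noise impulse is a Gaussian of variance Γ dt with Γ = 2γMk_BT. Units with γ = M = k_BT = 1, F = 0, zero initial conditions, the step size, the ensemble size, and the seed are all illustrative assumptions. For these initial conditions the exact finite-time variance is $2D\,[t - 2\gamma^{-1}(1-e^{-\gamma t}) + (2\gamma)^{-1}(1-e^{-2\gamma t})]$, which approaches 2Dt up to the order-unity correction mentioned above.

```python
import math
import random

def langevin_ensemble(gamma, M, kT, t_final, dt, n_particles, seed=0):
    """Euler-Maruyama for dp = -gamma p dt + dW, dx = (p/M) dt (eqn. 8.260 with F = 0),
    where dW is Gaussian with variance Gamma*dt and Gamma = 2 gamma M kT (eqn. 8.269).
    Every particle starts at x = 0, p = 0. Returns the final momenta and positions."""
    rng = random.Random(seed)
    kick_sd = math.sqrt(2.0 * gamma * M * kT * dt)
    n_steps = int(round(t_final / dt))
    ps, xs = [], []
    for _ in range(n_particles):
        p = x = 0.0
        for _ in range(n_steps):
            x += p / M * dt
            p += -gamma * p * dt + kick_sd * rng.gauss(0.0, 1.0)
        ps.append(p)
        xs.append(x)
    return ps, xs

def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

gamma, M, kT, t = 1.0, 1.0, 1.0, 20.0
ps, xs = langevin_ensemble(gamma, M, kT, t_final=t, dt=0.02, n_particles=2000, seed=1)
D = kT / (gamma * M)
exact_var_x = 2 * D * (t - 2 / gamma * (1 - math.exp(-gamma * t))
                       + (1 - math.exp(-2 * gamma * t)) / (2 * gamma))
print(sample_variance(ps) / (M * kT))     # close to 1 (equipartition)
print(sample_variance(xs) / exact_var_x)  # close to 1
```

Both ratios fluctuate at the level of a few percent because of the finite ensemble; the Euler step adds a bias of order γ dt to the momentum variance.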
If we consider molecular diffusion, the situation is quite a bit different. As we shall derive below in §8.10.3, the molecular diffusion constant is $D = \ell^2/2\tau$, where ℓ is the mean free path and τ is the collision time. As we found in eqn. 8.91, the mean free path ℓ, collision time τ, number density n, and total scattering cross section σ are related by
$$\ell = \bar v\tau = \frac{1}{\sqrt2\,n\sigma}\ ,\tag{8.276}$$
where $\bar v = \sqrt{8k_BT/\pi m}$ is the average particle speed. Approximating the particles as hard spheres, we have $\sigma = 4\pi a^2$, where a is the hard sphere radius. At T = 293 K and p = 1 atm, we have $n = p/k_BT = 2.51\times10^{19}\ {\rm cm^{-3}}$. Since air is predominantly composed of N₂ molecules, we take $a = 1.90\times10^{-8}$ cm and m = 28.0 amu = 4.65 × 10⁻²³ g, which are appropriate for N₂. We find an average speed of $\bar v = 471$ m/s and a mean free path of $\ell = 6.21\times10^{-6}$ cm. Thus, $D = \frac12\ell\bar v = 0.146\ {\rm cm^2/s}$. Though much larger than the diffusion constant for large droplets, this is still too small to explain common experiences. Suppose we set the characteristic distance scale at d = 10 cm and we ask how much time a point source would take to diffuse out to this radius. The answer is $\Delta t = d^2/2D\approx342$ s, which is between five and six minutes. Yet if someone in the next seat emits a foul odor, you sense the offending emission within a second or so. What this tells us is that diffusion isn’t the only transport process involved in these and like phenomena. More important are convection currents which distribute the scent much more rapidly.
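The two estimates of D above, and the resulting diffusion time, are reproduced by the sketch below (cgs units). The physical constants are standard values; the inputs (T = 293 K, p = 1 atm, a = 1.90 × 10⁻⁸ cm and m = 28.0 amu for N₂; η = 1.8 × 10⁻⁴ P and a = 10⁻⁴ cm for the droplet) are the assumptions stated in the text, and the function names are illustrative. Since n is not rounded here, the outputs agree with the quoted values to within rounding.

```python
import math

K_B = 1.380649e-16      # Boltzmann constant, erg/K
AMU = 1.66053907e-24    # atomic mass unit, g
ATM = 1.01325e6         # 1 atm in dyn/cm^2

def stokes_einstein_D(T, eta, a):
    """D = kT / (6 pi eta a), eqn. 8.275."""
    return K_B * T / (6.0 * math.pi * eta * a)

def hard_sphere_gas_D(T, p, a, m):
    """Kinetic estimate D = ell * vbar / 2 with ell = 1/(sqrt(2) n sigma), sigma = 4 pi a^2."""
    n = p / (K_B * T)
    sigma = 4.0 * math.pi * a ** 2
    ell = 1.0 / (math.sqrt(2.0) * n * sigma)
    vbar = math.sqrt(8.0 * K_B * T / (math.pi * m))
    return ell * vbar / 2.0, ell, vbar

D_drop = stokes_einstein_D(293.0, 1.8e-4, 1e-4)
D_gas, ell, vbar = hard_sphere_gas_D(293.0, ATM, 1.90e-8, 28.0 * AMU)
delta_t = 10.0 ** 2 / (2.0 * D_gas)   # time to diffuse d = 10 cm, from <x^2> = 2 D t
print(D_drop, D_gas, ell, vbar / 100.0, delta_t)   # vbar printed in m/s
```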

8.10.2 Langevin equation for a particle in a harmonic well

Consider next the equation
$$M\ddot X + \gamma M\dot X + M\omega_0^2X = F_0 + \eta(t)\ ,\tag{8.277}$$
where $F_0$ is a constant force. We write $X = F_0/M\omega_0^2 + x$ and measure x relative to the potential minimum, yielding
$$\ddot x + \gamma\,\dot x + \omega_0^2\,x = \frac1M\,\eta(t)\ .\tag{8.278}$$
At this point there are several ways to proceed.
Perhaps the most straightforward is by use of the Laplace transform. Recall:
$$\hat x(\nu) = \int_0^\infty\!\! dt\;e^{-\nu t}\,x(t)\tag{8.279}$$
$$x(t) = \int_{\mathcal C}\!\frac{d\nu}{2\pi i}\;e^{+\nu t}\,\hat x(\nu)\ ,\tag{8.280}$$

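where the contour $\mathcal C$ proceeds from a − i∞ to a + i∞ such that all poles of the integrand lie to the left of $\mathcal C$. We then have
$$\frac1M\int_0^\infty\!\! dt\;e^{-\nu t}\,\eta(t) = \int_0^\infty\!\! dt\;e^{-\nu t}\,\big(\ddot x + \gamma\,\dot x + \omega_0^2\,x\big) = -(\nu+\gamma)\,x(0) - \dot x(0) + \big(\nu^2 + \gamma\nu + \omega_0^2\big)\,\hat x(\nu)\ .\tag{8.281}$$
Thus, we have
$$\hat x(\nu) = \frac{(\nu+\gamma)\,x(0) + \dot x(0)}{\nu^2 + \gamma\nu + \omega_0^2} + \frac1M\cdot\frac{1}{\nu^2 + \gamma\nu + \omega_0^2}\int_0^\infty\!\! dt\;e^{-\nu t}\,\eta(t)\ .\tag{8.282}$$
Now we may write
$$\nu^2 + \gamma\nu + \omega_0^2 = (\nu - \nu_+)(\nu - \nu_-)\ ,\tag{8.283}$$
where
$$\nu_\pm = -\tfrac12\gamma\pm\sqrt{\tfrac14\gamma^2 - \omega_0^2}\ .\tag{8.284}$$
Note that ${\rm Re}(\nu_\pm)\le0$ and that $\gamma + \nu_\pm = -\nu_\mp$.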
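Performing the inverse Laplace transform, we obtain
$$x(t) = \frac{x(0)}{\nu_+ - \nu_-}\,\big(\nu_+\,e^{\nu_-t} - \nu_-\,e^{\nu_+t}\big) + \frac{\dot x(0)}{\nu_+ - \nu_-}\,\big(e^{\nu_+t} - e^{\nu_-t}\big) + \int_0^\infty\!\! ds\;K(t-s)\,\eta(s)\ ,\tag{8.285}$$
where
$$K(t-s) = \frac{\Theta(t-s)}{M(\nu_+ - \nu_-)}\,\big(e^{\nu_+(t-s)} - e^{\nu_-(t-s)}\big)\tag{8.286}$$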
is the response kernel and Θ(t − s) is the step function which is unity for t > s and zero otherwise. The
response is causal, i.e. x(t) depends on η(s) for all previous times s < t, but not for future times s > t.
Note that K(τ ) decays exponentially for τ → ∞, if Re(ν± ) < 0. The marginal case where ω0 = 0 and
ν+ = 0 corresponds to the diffusion calculation we performed in the previous section.
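The homogeneous part of eqn. 8.285 (η = 0) can be compared with a direct numerical integration of eqn. 8.278. The sketch below uses complex arithmetic so that one formula covers both the underdamped (complex ν±) and overdamped (real ν±) cases, and uses classical fourth-order Runge-Kutta as an independent check; the parameter values are arbitrary. The closed form is singular at critical damping, γ = 2ω₀, where ν₊ = ν₋ and the limit has to be taken separately.

```python
import cmath

def x_homogeneous(t, x0, v0, gamma, omega0):
    """Deterministic part of eqn. 8.285 (eta = 0), using the roots nu_+- of eqn. 8.284."""
    disc = cmath.sqrt(0.25 * gamma ** 2 - omega0 ** 2)
    nu_p = -0.5 * gamma + disc
    nu_m = -0.5 * gamma - disc
    val = (x0 * (nu_p * cmath.exp(nu_m * t) - nu_m * cmath.exp(nu_p * t))
           + v0 * (cmath.exp(nu_p * t) - cmath.exp(nu_m * t))) / (nu_p - nu_m)
    return val.real  # the imaginary part vanishes up to rounding

def x_rk4(t_final, x0, v0, gamma, omega0, n_steps=20000):
    """Integrate x'' + gamma x' + omega0^2 x = 0 with classical fourth-order Runge-Kutta."""
    h = t_final / n_steps

    def rhs(x, v):
        return v, -gamma * v - omega0 ** 2 * x

    x, v = x0, v0
    for _ in range(n_steps):
        k1x, k1v = rhs(x, v)
        k2x, k2v = rhs(x + 0.5 * h * k1x, v + 0.5 * h * k1v)
        k3x, k3v = rhs(x + 0.5 * h * k2x, v + 0.5 * h * k2v)
        k4x, k4v = rhs(x + h * k3x, v + h * k3v)
        x += h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
        v += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
    return x

for gamma, omega0 in [(0.3, 2.0), (5.0, 1.0)]:   # underdamped, overdamped
    print(x_homogeneous(5.0, 1.0, 0.5, gamma, omega0), x_rk4(5.0, 1.0, 0.5, gamma, omega0))
```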

8.10.3 Discrete random walk

Consider an object moving on a one-dimensional lattice in such a way that every time step it moves either one unit to the right or left, at random. If the lattice spacing is ℓ, then after n time steps the position will be
$$x_n = \ell\sum_{j=1}^n\sigma_j\ ,\tag{8.287}$$
where
$$\sigma_j = \begin{cases}+1 & \text{if motion is one unit to right at time step } j\\ -1 & \text{if motion is one unit to left at time step } j\ .\end{cases}\tag{8.288}$$

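Clearly $\langle\sigma_j\rangle = 0$, so $\langle x_n\rangle = 0$. Now let us compute
$$\langle x_n^2\rangle = \ell^2\sum_{j=1}^n\sum_{j'=1}^n\langle\sigma_j\,\sigma_{j'}\rangle = n\ell^2\ ,\tag{8.289}$$
where we invoke
$$\langle\sigma_j\,\sigma_{j'}\rangle = \delta_{jj'}\ .\tag{8.290}$$
If the length of each time step is τ, then we have, with t = nτ,
$$\langle x^2(t)\rangle = \frac{\ell^2}{\tau}\,t\ ,\tag{8.291}$$
and, comparing with $\langle x^2(t)\rangle = 2Dt$, we identify the diffusion constant
$$D = \frac{\ell^2}{2\tau}\ .\tag{8.292}$$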

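Suppose, however, the random walk is biased, so that the probability for each independent step is given by
$$P(\sigma) = p\,\delta_{\sigma,1} + q\,\delta_{\sigma,-1}\ ,\tag{8.293}$$
where p + q = 1. Then
$$\langle\sigma_j\rangle = p - q = 2p - 1\tag{8.294}$$
and
$$\langle\sigma_j\,\sigma_{j'}\rangle = (p-q)^2\,\big(1 - \delta_{jj'}\big) + \delta_{jj'} = (2p-1)^2 + 4p\,(1-p)\,\delta_{jj'}\ .\tag{8.295}$$
Then
$$\langle x_n\rangle = (2p-1)\,\ell n\tag{8.296}$$
$$\langle x_n^2\rangle - \langle x_n\rangle^2 = 4p\,(1-p)\,\ell^2n\ .\tag{8.297}$$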
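Because the number k of rightward steps is binomially distributed, eqns. 8.296 and 8.297 can be confirmed exactly rather than by sampling, using $x_n = \ell(2k - n)$. The sketch below (illustrative values of n and p; it needs Python 3.8+ for `math.comb`) sums over the binomial distribution.

```python
import math

def biased_walk_moments(n, p, ell=1.0):
    """Exact mean and variance of x_n = ell * sum(sigma_j), from the binomial distribution
    of the number k of rightward steps, so that x_n = ell * (2k - n)."""
    mean = 0.0
    second = 0.0
    for k in range(n + 1):
        prob = math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
        x = ell * (2 * k - n)
        mean += prob * x
        second += prob * x * x
    return mean, second - mean ** 2

n, p = 50, 0.7
mean, var = biased_walk_moments(n, p)
print(mean, (2 * p - 1) * n)        # eqn. 8.296 with ell = 1
print(var, 4 * p * (1 - p) * n)     # eqn. 8.297 with ell = 1
```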

8.10.4 Fokker-Planck equation

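Suppose x(t) is a stochastic variable. We define the quantity
$$\delta x(t)\equiv x(t+\delta t) - x(t)\ ,\tag{8.298}$$
and we assume
$$\big\langle\delta x(t)\big\rangle = F_1\big(x(t)\big)\,\delta t\tag{8.299}$$
$$\big\langle[\delta x(t)]^2\big\rangle = F_2\big(x(t)\big)\,\delta t\tag{8.300}$$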

Figure 8.9: Interpretive sketch of the mathematics behind the Chapman-Kolmogorov equation.

but $\big\langle[\delta x(t)]^n\big\rangle = O\big((\delta t)^2\big)$ for n > 2. The n = 1 term is due to drift and the n = 2 term is due to diffusion.
Now consider the conditional probability density, $P(x,t\,|\,x_0,t_0)$, defined to be the probability distribution for x ≡ x(t) given that $x(t_0) = x_0$. For a Markov process, the conditional probability density satisfies the composition rule,
$$P(x_2,t_2\,|\,x_0,t_0) = \int_{-\infty}^\infty\!\! dx_1\;P(x_2,t_2\,|\,x_1,t_1)\,P(x_1,t_1\,|\,x_0,t_0)\ ,\tag{8.301}$$
for any value of $t_1\in[t_0,t_2]$. This is also known as the Chapman-Kolmogorov equation. In words, what it says is that the probability density for a particle being at $x_2$ at time $t_2$, given that it was at $x_0$ at time $t_0$, is given by the product of the probability density for being at $x_2$ at time $t_2$ given that it was at $x_1$ at $t_1$ and that for being at $x_1$ at $t_1$ given it was at $x_0$ at $t_0$, integrated over $x_1$. This should be intuitively obvious, since if we pick any time $t_1\in[t_0,t_2]$, then the particle had to be somewhere at that time. Indeed, one wonders how Chapman and Kolmogorov got their names attached to a result that is so obvious. At any rate, a picture is worth a thousand words: see fig. 8.9.
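The composition rule can be checked numerically for a concrete Markov process. The sketch below uses, as an assumed example, the Gaussian drift-diffusion propagator K(x, t) of eqn. 8.314 (derived in §8.10.5), with $P(x',t'\,|\,x,t) = K(x'-x,\,t'-t)$, and performs the $x_1$ integral of eqn. 8.301 by the trapezoidal rule; the parameter values are arbitrary.

```python
import math

def kernel(x, t, D, u):
    """Drift-diffusion propagator K(x, t) of eqn. 8.314."""
    return math.exp(-(x - u * t) ** 2 / (4.0 * D * t)) / math.sqrt(4.0 * math.pi * D * t)

def composed(x0, t0, x2, t2, t1, D, u, half_width=30.0, n=6001):
    """Right-hand side of eqn. 8.301 with P(x', t' | x, t) = K(x' - x, t' - t),
    integrated over x1 with the trapezoidal rule on [x0 - half_width, x0 + half_width]."""
    a = x0 - half_width
    h = 2.0 * half_width / (n - 1)
    total = 0.0
    for i in range(n):
        x1 = a + i * h
        w = 0.5 if i in (0, n - 1) else 1.0
        total += w * kernel(x2 - x1, t2 - t1, D, u) * kernel(x1 - x0, t1 - t0, D, u)
    return total * h

D, u = 0.5, 0.3
lhs = kernel(1.0 - 0.0, 2.5 - 0.0, D, u)                      # direct propagation 0 -> 2.5
rhs = composed(0.0, 0.0, 1.0, 2.5, t1=1.0, D=D, u=u)          # via an intermediate time
print(lhs, rhs)
```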
Proceeding, we may write
$$P(x,t+\delta t\,|\,x_0,t_0) = \int_{-\infty}^\infty\!\! dx'\;P(x,t+\delta t\,|\,x',t)\,P(x',t\,|\,x_0,t_0)\ .\tag{8.302}$$

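Now
$$\begin{aligned}P(x,t+\delta t\,|\,x',t) &= \big\langle\delta\big(x - \delta x(t) - x'\big)\big\rangle\\ &= \Big\{1 + \big\langle\delta x(t)\big\rangle\,\frac{d}{dx'} + \tfrac12\big\langle[\delta x(t)]^2\big\rangle\,\frac{d^2}{dx'^2} + \ldots\Big\}\,\delta(x-x')\\ &= \delta(x-x') + F_1(x')\,\frac{d\,\delta(x-x')}{dx'}\,\delta t + \tfrac12F_2(x')\,\frac{d^2\delta(x-x')}{dx'^2}\,\delta t + O\big((\delta t)^2\big)\ ,\end{aligned}\tag{8.303}$$
where the average is over the random variables. We now insert this result into eqn. 8.302, integrate by parts, divide by δt, and then take the limit δt → 0. The result is the Fokker-Planck equation,
$$\frac{\partial P}{\partial t} = -\frac{\partial}{\partial x}\big[F_1(x)\,P(x,t)\big] + \frac12\,\frac{\partial^2}{\partial x^2}\big[F_2(x)\,P(x,t)\big]\ .\tag{8.304}$$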
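Eqn. 8.304 is also straightforward to integrate numerically for given F₁ and F₂. The sketch below is one simple choice among many: an explicit, conservative central-difference scheme with P held at zero on the boundaries. As an assumed test case it takes the Ornstein-Uhlenbeck drift F₁(x) = −kx and constant F₂ = 2D₀ (hypothetical values k = D₀ = 1), whose stationary solution of eqn. 8.304 is a Gaussian of variance D₀/k; the time step respects the usual explicit stability bound dt ≤ dx²/F₂.

```python
import math

def evolve_fokker_planck(F1, F2, P0, x_min, x_max, n, dt, n_steps):
    """Explicit conservative finite differences for eqn. 8.304,
    dP/dt = -d/dx [F1 P] + (1/2) d^2/dx^2 [F2 P], with P = 0 at both ends."""
    dx = (x_max - x_min) / (n - 1)
    xs = [x_min + i * dx for i in range(n)]
    a = [F1(x) for x in xs]
    b = [F2(x) for x in xs]
    P = [P0(x) for x in xs]
    P[0] = P[-1] = 0.0
    for _ in range(n_steps):
        new = [0.0] * n
        for i in range(1, n - 1):
            drift = (a[i + 1] * P[i + 1] - a[i - 1] * P[i - 1]) / (2.0 * dx)
            diffusion = (b[i + 1] * P[i + 1] - 2.0 * b[i] * P[i] + b[i - 1] * P[i - 1]) / (2.0 * dx * dx)
            new[i] = P[i] + dt * (diffusion - drift)
        P = new
    return xs, P, dx

def moments(xs, P, dx):
    """Total probability, mean, and variance of a gridded density."""
    mass = sum(P) * dx
    mean = sum(x * p for x, p in zip(xs, P)) * dx / mass
    var = sum((x - mean) ** 2 * p for x, p in zip(xs, P)) * dx / mass
    return mass, mean, var

k, D0 = 1.0, 1.0

def initial(x):
    # Gaussian centered at x = 1 with variance 0.25
    return math.exp(-(x - 1.0) ** 2 / 0.5) / math.sqrt(0.5 * math.pi)

xs, P, dx = evolve_fokker_planck(lambda x: -k * x, lambda x: 2.0 * D0, initial,
                                 x_min=-6.0, x_max=6.0, n=121, dt=0.002, n_steps=3000)
mass, mean, var = moments(xs, P, dx)
print(mass, mean, var)   # mass ~1, mean ~0, var ~D0/k
```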

8.10.5 Brownian motion redux

Let’s apply our Fokker-Planck equation to a description of Brownian motion. From our earlier results, we have
$$F_1(x) = \frac{F}{\gamma M}\qquad,\qquad F_2(x) = 2D\ .\tag{8.305}$$
A formal proof of these results is left as an exercise for the reader. The Fokker-Planck equation is then
$$\frac{\partial P}{\partial t} = -u\,\frac{\partial P}{\partial x} + D\,\frac{\partial^2P}{\partial x^2}\ ,\tag{8.306}$$
where u = F/γM is the average terminal velocity. If we make a Galilean transformation and define
$$y = x - ut\qquad,\qquad s = t\tag{8.307}$$
then our Fokker-Planck equation takes the form
$$\frac{\partial P}{\partial s} = D\,\frac{\partial^2P}{\partial y^2}\ .\tag{8.308}$$
This is known as the diffusion equation. Eqn. 8.306 is also a diffusion equation, rendered in a moving frame.
While the Galilean transformation is illuminating, we can easily solve eqn. 8.306 without it. Let’s take a look at this equation after Fourier transforming from x to q:
$$P(x,t) = \int_{-\infty}^\infty\!\frac{dq}{2\pi}\;e^{iqx}\,\hat P(q,t)\tag{8.309}$$
$$\hat P(q,t) = \int_{-\infty}^\infty\!\! dx\;e^{-iqx}\,P(x,t)\ .\tag{8.310}$$


Then as should be well known to you by now, we can replace the operator $\partial_x$ with multiplication by iq, resulting in
$$\frac{\partial}{\partial t}\,\hat P(q,t) = -\big(Dq^2 + iqu\big)\,\hat P(q,t)\ ,\tag{8.311}$$
with solution
$$\hat P(q,t) = e^{-Dq^2t}\,e^{-iqut}\,\hat P(q,0)\ .\tag{8.312}$$
We now apply the inverse transform to get back to x-space:
$$\begin{aligned}P(x,t) &= \int_{-\infty}^\infty\!\frac{dq}{2\pi}\;e^{iqx}\,e^{-Dq^2t}\,e^{-iqut}\int_{-\infty}^\infty\!\! dx'\;e^{-iqx'}\,P(x',0)\\ &= \int_{-\infty}^\infty\!\! dx'\;P(x',0)\int_{-\infty}^\infty\!\frac{dq}{2\pi}\;e^{-Dq^2t}\,e^{iq(x-ut-x')}\\ &= \int_{-\infty}^\infty\!\! dx'\;K(x-x',t)\,P(x',0)\ ,\end{aligned}\tag{8.313}$$
where
$$K(x,t) = \frac{1}{\sqrt{4\pi Dt}}\;e^{-(x-ut)^2/4Dt}\tag{8.314}$$
is the diffusion kernel. We now have a recipe for obtaining P(x,t) given the initial conditions P(x,0). If P(x,0) = δ(x), describing a particle confined to an infinitesimal region about the origin, then P(x,t) = K(x,t) is the probability distribution for finding the particle at x at time t. There are two aspects to K(x,t) which merit comment. The first is that the center of the distribution moves with velocity u. This is due to the presence of the external force. The second is that the standard deviation $\sigma = \sqrt{2Dt}$ is increasing in time, so the distribution is not only shifting its center but it is also getting broader as time evolves. This movement of the center and broadening are what we have called drift and diffusion, respectively.
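The drift and broadening of K(x, t) can also be seen by simulating the underlying stochastic process directly. Assuming the update dx = u dt + √(2D) dW, whose drift and diffusion moments are F₁ = u and F₂ = 2D as in eqn. 8.305, the sketch below evolves an ensemble from x = 0 and compares the sample mean and variance with ut and 2Dt. All numerical values and the seed are illustrative.

```python
import math
import random

def simulate_drift_diffusion(u, D, t, n_particles=10000, n_steps=40, seed=1):
    """Euler-Maruyama integration of dx = u dt + sqrt(2 D) dW for independent particles
    starting at x = 0; returns the sample mean and variance of x(t)."""
    rng = random.Random(seed)
    dt = t / n_steps
    step_sd = math.sqrt(2.0 * D * dt)
    xs = []
    for _ in range(n_particles):
        x = 0.0
        for _ in range(n_steps):
            x += u * dt + step_sd * rng.gauss(0.0, 1.0)
        xs.append(x)
    mean = sum(xs) / n_particles
    var = sum((x - mean) ** 2 for x in xs) / (n_particles - 1)
    return mean, var

u, D, t = 0.5, 0.2, 4.0
mean, var = simulate_drift_diffusion(u, D, t)
print(mean, u * t)      # center of K(x, t) moves with velocity u
print(var, 2 * D * t)   # width grows as 2 D t
```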

8.10.6 Master Equation

Another way to model stochastic processes is via the master equation, which was discussed in chapter 3. Recall that if $P_i(t)$ is the probability for a system to be in state | i ⟩ at time t and $W_{ij}$ is the transition rate from state | j ⟩ to state | i ⟩, then
$$\frac{dP_i}{dt} = \sum_j\big(W_{ij}\,P_j - W_{ji}\,P_i\big)\ .\tag{8.315}$$
Consider a birth-death process in which the states | n ⟩ are labeled by nonnegative integers. Let $\alpha_n$ denote the rate of transitions from | n ⟩ → | n + 1 ⟩ and let $\beta_n$ denote the rate of transitions from | n ⟩ → | n − 1 ⟩. The master equation then takes the form¹⁷
$$\frac{dP_n}{dt} = \alpha_{n-1}\,P_{n-1} + \beta_{n+1}\,P_{n+1} - \big(\alpha_n + \beta_n\big)\,P_n\ .\tag{8.316}$$
17
We further demand βn=0 = 0 and P−1 (t) = 0 at all times.

Let us assume we can write $\alpha_n = K\,\bar\alpha(n/K)$ and $\beta_n = K\,\bar\beta(n/K)$, where K ≫ 1. We assume the distribution $P_n(t)$ has a time-dependent maximum at $n = K\phi(t)$ and a width proportional to $\sqrt K$. We expand relative to this maximum, writing $n\equiv K\phi(t) + \sqrt K\,\xi$, and we define $P_n(t)\equiv\Pi(\xi,t)$. We now rewrite the master equation in eqn. 8.316 in terms of Π(ξ,t). Since n is an independent variable, we set
$$dn = K\dot\phi\,dt + \sqrt K\,d\xi\quad\Rightarrow\quad d\xi\big|_n = -\sqrt K\,\dot\phi\,dt\ .\tag{8.317}$$
Therefore
$$\frac{dP_n}{dt} = -\sqrt K\,\dot\phi\,\frac{\partial\Pi}{\partial\xi} + \frac{\partial\Pi}{\partial t}\ .\tag{8.318}$$
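Next, we write, for any function of the form $f_n = K f(n/K)$,
$$f_n = K f\big(\phi + K^{-1/2}\xi\big) = Kf(\phi) + K^{1/2}\,\xi\,f'(\phi) + \tfrac12\,\xi^2 f''(\phi) + \ldots\ .\tag{8.319}$$
Similarly,
$$f_{n\pm1} = K f\big(\phi + K^{-1/2}\xi\pm K^{-1}\big) = Kf(\phi) + K^{1/2}\,\xi\,f'(\phi)\pm f'(\phi) + \tfrac12\,\xi^2f''(\phi) + \ldots\ .\tag{8.320}$$
Dividing both sides of eqn. 8.316 by $\sqrt K$, we have
$$-\dot\phi\,\frac{\partial\Pi}{\partial\xi} + K^{-1/2}\,\frac{\partial\Pi}{\partial t} = (\bar\beta - \bar\alpha)\,\frac{\partial\Pi}{\partial\xi} + K^{-1/2}\,\Big[(\bar\beta' - \bar\alpha')\,\xi\,\frac{\partial\Pi}{\partial\xi} + \tfrac12(\bar\alpha + \bar\beta)\,\frac{\partial^2\Pi}{\partial\xi^2} + (\bar\beta' - \bar\alpha')\,\Pi\Big] + \ldots\ .\tag{8.321}$$
Equating terms of order $K^0$ yields the equation
$$\dot\phi = f(\phi)\equiv\bar\alpha(\phi) - \bar\beta(\phi)\ .\tag{8.322}$$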

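Equating terms of order $K^{-1/2}$ yields the Fokker-Planck equation,
$$\frac{\partial\Pi}{\partial t} = -f'\big(\phi(t)\big)\,\frac{\partial}{\partial\xi}\big(\xi\,\Pi\big) + \tfrac12\,g\big(\phi(t)\big)\,\frac{\partial^2\Pi}{\partial\xi^2}\ ,\tag{8.323}$$
where $g(\phi)\equiv\bar\alpha(\phi) + \bar\beta(\phi)$. If in the limit t → ∞, eqn. 8.322 evolves to a stable fixed point φ*, then the stationary solution of the Fokker-Planck eqn. 8.323, $\Pi_{\rm eq}(\xi) = \Pi(\xi,t=\infty)$, must satisfy
$$-f'(\phi^*)\,\frac{\partial}{\partial\xi}\big(\xi\,\Pi_{\rm eq}\big) + \tfrac12\,g(\phi^*)\,\frac{\partial^2\Pi_{\rm eq}}{\partial\xi^2} = 0\quad\Rightarrow\quad\Pi_{\rm eq}(\xi) = \frac{1}{\sqrt{2\pi\sigma^2}}\;e^{-\xi^2/2\sigma^2}\ ,\tag{8.324}$$
where
$$\sigma^2 = -\frac{g(\phi^*)}{2f'(\phi^*)}\ .\tag{8.325}$$
Now both α and β are rates, hence both are positive and thus g(φ) > 0. We see that the condition σ² > 0, which is necessary for a normalizable equilibrium distribution, requires f′(φ*) < 0, which is saying that the fixed point in eqn. 8.322 is stable.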
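The linear-noise result can be tested against the exact stationary solution of the master equation. For a one-step process the stationary state carries zero net flux between neighbouring n, so $P_{n+1}/P_n = \alpha_n/\beta_{n+1}$. The sketch below takes hypothetical rates $\alpha_n = K\lambda$ (so $\bar\alpha(\phi) = \lambda$) and $\beta_n = K(\phi + \phi^2)$ with φ = n/K; for λ = 2 one finds φ* = 1, f′(φ*) = −3, g(φ*) = 4, and eqn. 8.325 gives σ² = 2/3. It compares the exact mean and variance of n with Kφ* and Kσ²; agreement is expected only up to corrections that are small for K ≫ 1.

```python
import math

def stationary_birth_death(alpha, beta, n_max):
    """Exact stationary P_n of eqn. 8.316 from zero net flux, P_{n+1}/P_n = alpha_n / beta_{n+1}.
    Log-weights are accumulated to avoid overflow, then normalized."""
    logs = [0.0]
    for n in range(n_max):
        logs.append(logs[-1] + math.log(alpha(n)) - math.log(beta(n + 1)))
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]

K, lam = 200, 2.0

def alpha(n):
    return K * lam                  # hypothetical: alpha_bar(phi) = lam

def beta(n):
    phi = n / K
    return K * (phi + phi ** 2)     # hypothetical: beta_bar(phi) = phi + phi^2

P = stationary_birth_death(alpha, beta, n_max=800)
mean = sum(n * p for n, p in enumerate(P))
var = sum((n - mean) ** 2 * p for n, p in enumerate(P))

phi_star = (-1.0 + math.sqrt(1.0 + 4.0 * lam)) / 2.0   # stable root of lam - phi - phi^2 = 0
f_prime = -1.0 - 2.0 * phi_star                          # f'(phi*), f = alpha_bar - beta_bar
g = lam + phi_star + phi_star ** 2                       # g(phi*) = alpha_bar + beta_bar
sigma2 = -g / (2.0 * f_prime)                            # eqn. 8.325
print(mean, K * phi_star)
print(var, K * sigma2)
```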
8.11. APPENDIX I : BOLTZMANN EQUATION AND COLLISIONAL INVARIANTS 491

8.11 Appendix I : Boltzmann Equation and Collisional Invariants

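Problem : The linearized Boltzmann operator $\hat L\psi$ is a complicated functional. Suppose we replace $\hat L$ by $\mathcal L$, where
$$\mathcal L\psi = -\gamma\,\psi({\bf v},t) + \gamma\,\Big(\frac{m}{2\pi k_BT}\Big)^{\!3/2}\!\int\!\! d^3u\;\exp\Big(-\frac{mu^2}{2k_BT}\Big)\bigg\{1 + \frac{m}{k_BT}\,{\bf u}\cdot{\bf v} + \frac23\,\Big(\frac{mu^2}{2k_BT} - \frac32\Big)\Big(\frac{mv^2}{2k_BT} - \frac32\Big)\bigg\}\,\psi({\bf u},t)\ .\tag{8.326}$$
Show that $\mathcal L$ shares all the important properties of $\hat L$. What is the meaning of γ? Expand ψ(v,t) in spherical harmonics and Sonine polynomials,
$$\psi({\bf v},t) = \sum_{r\ell m}a_{r\ell m}(t)\;S^r_{\ell+\frac12}(x)\;x^{\ell/2}\;Y^\ell_m(\hat{\bf n})\ ,\tag{8.327}$$
with $x = mv^2/2k_BT$, and thus express the action of the linearized Boltzmann operator algebraically on the expansion coefficients $a_{r\ell m}(t)$.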
The Sonine polynomials $S^n_\alpha(x)$ are a complete, orthogonal set which are convenient to use in the calculation of transport coefficients. They are defined as
$$S^n_\alpha(x) = \sum_{m=0}^{n} \frac{\Gamma(\alpha+n+1)\,(-x)^m}{\Gamma(\alpha+m+1)\,(n-m)!\,m!}\,, \tag{8.328}$$
and satisfy the generalized orthogonality relation
$$\int_0^\infty\! dx\; e^{-x}\, x^\alpha\, S^n_\alpha(x)\, S^{n'}_\alpha(x) = \frac{\Gamma(\alpha+n+1)}{n!}\,\delta_{nn'}\,. \tag{8.329}$$
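Eqns. 8.328 and 8.329 are easy to check exactly, because the weighted integral of any monomial is $\int_0^\infty dx\, e^{-x} x^{\alpha+k} = \Gamma(\alpha+k+1)$. The sketch below (plain Python; the function names are our own) builds the polynomial coefficients from eqn. 8.328 and evaluates the inner products term by term. These $S^n_\alpha$ coincide with the generalized Laguerre polynomials $L^{(\alpha)}_n$.

```python
from math import factorial, gamma

def sonine_coeffs(n, alpha):
    """Coefficients c_m of S^n_alpha(x) = sum_m c_m x^m, read off from eqn. 8.328."""
    return [gamma(alpha + n + 1) * (-1) ** m
            / (gamma(alpha + m + 1) * factorial(n - m) * factorial(m))
            for m in range(n + 1)]

def sonine(n, alpha, x):
    return sum(c * x ** m for m, c in enumerate(sonine_coeffs(n, alpha)))

def weighted_inner(n1, n2, alpha):
    """int_0^inf dx e^{-x} x^alpha S^{n1}_alpha(x) S^{n2}_alpha(x), done term by term
    using int_0^inf dx e^{-x} x^(alpha + k) = Gamma(alpha + k + 1)."""
    c1, c2 = sonine_coeffs(n1, alpha), sonine_coeffs(n2, alpha)
    return sum(a * b * gamma(alpha + i + j + 1)
               for i, a in enumerate(c1) for j, b in enumerate(c2))

# S^1_{1/2}(x) = 3/2 - x, as used later in the solution
print(sonine_coeffs(1, 0.5))               # approximately [1.5, -1.0]

# check eqn. 8.329 for the two values of alpha that appear below
for alpha in (0.5, 1.5):
    for n1 in range(4):
        for n2 in range(4):
            expected = gamma(alpha + n1 + 1) / factorial(n1) if n1 == n2 else 0.0
            assert abs(weighted_inner(n1, n2, alpha) - expected) < 1e-8
```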

Solution : The ‘important properties’ of $L$ are that it annihilates the five collisional invariants, i.e. 1, $\mathbf{v}$, and $v^2$, and that all its other eigenvalues are negative. That this is true for $\mathcal{L}$ can be verified by an explicit calculation.
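Plugging the conveniently parameterized form of $\psi(\mathbf{v},t)$ into $\mathcal{L}$, we have
$$\begin{aligned}\mathcal{L}\psi &= -\gamma\sum_{r\ell m} a_{r\ell m}(t)\, S^r_{\ell+\frac12}(x)\,x^{\ell/2}\,Y^\ell_m(\hat{\mathbf{n}}) + \frac{\gamma}{2\pi^{3/2}}\sum_{r\ell m} a_{r\ell m}(t)\int_0^\infty\! dx_1\, x_1^{1/2}\, e^{-x_1}\\ &\qquad\times\int\! d\hat{\mathbf{n}}_1 \Big[1 + 2\,x^{1/2}x_1^{1/2}\,\hat{\mathbf{n}}\cdot\hat{\mathbf{n}}_1 + \tfrac23\big(x-\tfrac32\big)\big(x_1-\tfrac32\big)\Big]\, S^r_{\ell+\frac12}(x_1)\,x_1^{\ell/2}\,Y^\ell_m(\hat{\mathbf{n}}_1)\,,\end{aligned}\tag{8.330}$$
where we've used
$$u = \sqrt{\frac{2k_BT}{m}}\;x_1^{1/2}\,,\qquad du = \sqrt{\frac{k_BT}{2m}}\;x_1^{-1/2}\,dx_1\,. \tag{8.331}$$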
492 CHAPTER 8. NONEQUILIBRIUM PHENOMENA

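Now recall $Y^0_0(\hat{\mathbf{n}}) = 1/\sqrt{4\pi}$ and
$$Y^1_1(\hat{\mathbf{n}}) = -\sqrt{\frac{3}{8\pi}}\,\sin\theta\,e^{i\varphi}\,,\qquad Y^1_0(\hat{\mathbf{n}}) = \sqrt{\frac{3}{4\pi}}\,\cos\theta\,,\qquad Y^1_{-1}(\hat{\mathbf{n}}) = +\sqrt{\frac{3}{8\pi}}\,\sin\theta\,e^{-i\varphi}\,,$$
$$S^0_{1/2}(x) = 1\,,\qquad S^0_{3/2}(x) = 1\,,\qquad S^1_{1/2}(x) = \tfrac32 - x\,,$$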

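which allows us to write
$$1 = 4\pi\, Y^0_0(\hat{\mathbf{n}})\, Y^0_0(\hat{\mathbf{n}}_1)^* \tag{8.332}$$
$$\hat{\mathbf{n}}\cdot\hat{\mathbf{n}}_1 = \frac{4\pi}{3}\Big[Y^1_0(\hat{\mathbf{n}})\,Y^1_0(\hat{\mathbf{n}}_1)^* + Y^1_1(\hat{\mathbf{n}})\,Y^1_1(\hat{\mathbf{n}}_1)^* + Y^1_{-1}(\hat{\mathbf{n}})\,Y^1_{-1}(\hat{\mathbf{n}}_1)^*\Big]\,. \tag{8.333}$$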
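Eqn. 8.333 is the $\ell = 1$ case of the spherical-harmonic addition theorem. A quick numerical spot-check using the explicit $Y^1_m$ quoted above (plain Python; the random test directions and the seed are arbitrary choices):

```python
import cmath
import math
import random

def Y1(m, theta, phi):
    """Explicit l = 1 spherical harmonics, in the phase convention quoted above."""
    if m == 1:
        return -math.sqrt(3 / (8 * math.pi)) * math.sin(theta) * cmath.exp(1j * phi)
    if m == 0:
        return math.sqrt(3 / (4 * math.pi)) * math.cos(theta) + 0j
    if m == -1:
        return math.sqrt(3 / (8 * math.pi)) * math.sin(theta) * cmath.exp(-1j * phi)
    raise ValueError("m must be -1, 0 or 1")

def unit_vector(theta, phi):
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def dot_from_harmonics(th1, ph1, th2, ph2):
    """Right-hand side of eqn. 8.333."""
    s = sum(Y1(m, th1, ph1) * Y1(m, th2, ph2).conjugate() for m in (-1, 0, 1))
    return 4 * math.pi / 3 * s

rng = random.Random(0)
max_err = 0.0
for _ in range(1000):
    th1, th2 = rng.uniform(0, math.pi), rng.uniform(0, math.pi)
    ph1, ph2 = rng.uniform(0, 2 * math.pi), rng.uniform(0, 2 * math.pi)
    direct = sum(a * b for a, b in zip(unit_vector(th1, ph1), unit_vector(th2, ph2)))
    max_err = max(max_err, abs(dot_from_harmonics(th1, ph1, th2, ph2) - direct))
print(max_err)                              # at the level of floating-point rounding
```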
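We can do the integrals by appealing to the orthogonality relations for the spherical harmonics and Sonine polynomials:
$$\int\! d\hat{\mathbf{n}}\; Y^\ell_m(\hat{\mathbf{n}})^*\, Y^{\ell'}_{m'}(\hat{\mathbf{n}}) = \delta_{\ell\ell'}\,\delta_{mm'} \tag{8.334}$$
$$\int_0^\infty\! dx\; e^{-x}\,x^\alpha\, S^n_\alpha(x)\, S^{n'}_\alpha(x) = \frac{\Gamma(n+\alpha+1)}{\Gamma(n+1)}\,\delta_{nn'}\,. \tag{8.335}$$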

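Substituting eqns. 8.332 and 8.333 and the expressions for the Sonine polynomials above into eqn. 8.330, we have
$$\begin{aligned} \mathcal{L}\psi &= -\gamma\sum_{r\ell m} a_{r\ell m}(t)\,S^r_{\ell+\frac12}(x)\,x^{\ell/2}\,Y^\ell_m(\hat{\mathbf{n}})\\ &\quad + \frac{2\gamma}{\sqrt\pi}\sum_{r\ell m} a_{r\ell m}(t)\int_0^\infty\! dx_1\,x_1^{1/2}\,e^{-x_1}\int\! d\hat{\mathbf{n}}_1\,\Big[\,Y^0_0(\hat{\mathbf{n}})\,Y^0_0(\hat{\mathbf{n}}_1)^*\,S^0_{1/2}(x)\,S^0_{1/2}(x_1)\\ &\qquad + \tfrac23\,x^{1/2}x_1^{1/2}\sum_{m'=-1}^{1} Y^1_{m'}(\hat{\mathbf{n}})\,Y^1_{m'}(\hat{\mathbf{n}}_1)^*\,S^0_{3/2}(x)\,S^0_{3/2}(x_1)\\ &\qquad + \tfrac23\,Y^0_0(\hat{\mathbf{n}})\,Y^0_0(\hat{\mathbf{n}}_1)^*\,S^1_{1/2}(x)\,S^1_{1/2}(x_1)\Big]\,S^r_{\ell+\frac12}(x_1)\,x_1^{\ell/2}\,Y^\ell_m(\hat{\mathbf{n}}_1)\,. \end{aligned}\tag{8.336}$$
Integrating over the direction vector $\hat{\mathbf{n}}_1$ with the help of eqn. 8.334, we obtain the intermediate result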


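$$\begin{aligned} \mathcal{L}\psi &= -\gamma\sum_{r\ell m} a_{r\ell m}(t)\,S^r_{\ell+\frac12}(x)\,x^{\ell/2}\,Y^\ell_m(\hat{\mathbf{n}})\\ &\quad + \frac{2\gamma}{\sqrt\pi}\sum_{r\ell m} a_{r\ell m}(t)\int_0^\infty\! dx_1\,x_1^{1/2}\,e^{-x_1}\Big[\,Y^0_0(\hat{\mathbf{n}})\,\delta_{\ell0}\,\delta_{m0}\,S^0_{1/2}(x)\,S^0_{1/2}(x_1)\\ &\qquad + \tfrac23\,x^{1/2}x_1^{1/2}\sum_{m'=-1}^{1} Y^1_{m'}(\hat{\mathbf{n}})\,\delta_{\ell1}\,\delta_{mm'}\,S^0_{3/2}(x)\,S^0_{3/2}(x_1)\\ &\qquad + \tfrac23\,Y^0_0(\hat{\mathbf{n}})\,\delta_{\ell0}\,\delta_{m0}\,S^1_{1/2}(x)\,S^1_{1/2}(x_1)\Big]\,S^r_{\ell+\frac12}(x_1)\,x_1^{\ell/2}\,. \end{aligned}\tag{8.337}$$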

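Appealing now to the orthogonality of the Sonine polynomials, and recalling that
$$\Gamma(\tfrac12) = \sqrt\pi\,,\qquad \Gamma(1) = 1\,,\qquad \Gamma(z+1) = z\,\Gamma(z)\,, \tag{8.338}$$
we integrate over $x_1$. For the first term in brackets, we invoke the orthogonality relation with $n = 0$ and $\alpha = \frac12$, giving $\Gamma(\frac32) = \frac12\sqrt\pi$. For the second bracketed term, we have $n = 0$ but $\alpha = \frac32$, and we obtain $\Gamma(\frac52) = \frac32\,\Gamma(\frac32)$, while the third bracketed term leads to $n = 1$ and $\alpha = \frac12$, also yielding $\Gamma(\frac52) = \frac32\,\Gamma(\frac32)$. In each case the numerical prefactor works out to exactly one, since $\frac{2}{\sqrt\pi}\cdot\frac{\sqrt\pi}{2} = 1$ and $\frac23\cdot\frac{2}{\sqrt\pi}\cdot\frac{3\sqrt\pi}{4} = 1$, so each surviving gain term exactly cancels the $-\gamma$ loss term with the same $(r,\ell,m)$. Thus, we obtain the simple and pleasing result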

$$\mathcal{L}\psi = -\gamma\,{\sum_{r\ell m}}'\; a_{r\ell m}(t)\,S^r_{\ell+\frac12}(x)\,x^{\ell/2}\,Y^\ell_m(\hat{\mathbf{n}})\,, \tag{8.339}$$

where the prime on the sum indicates that the set
$$\mathrm{CI} = \big\{(0,0,0)\,,\,(1,0,0)\,,\,(0,1,1)\,,\,(0,1,0)\,,\,(0,1,-1)\big\} \tag{8.340}$$
is to be excluded from the sum. But these are just the functions which correspond to the five collisional invariants! Thus, we learn that

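$$\psi_{r\ell m}(\mathbf{v}) = N_{r\ell m}\,S^r_{\ell+\frac12}(x)\,x^{\ell/2}\,Y^\ell_m(\hat{\mathbf{n}}) \tag{8.341}$$
is an eigenfunction of $\mathcal{L}$ with eigenvalue $-\gamma$ if $(r,\ell,m)$ does not correspond to one of the five collisional invariants. In the latter case, the eigenvalue is zero. Thus, the algebraic action of $\mathcal{L}$ on the coefficients $a_{r\ell m}$ is
$$(\mathcal{L}a)_{r\ell m} = \begin{cases} -\gamma\, a_{r\ell m} & \text{if } (r,\ell,m)\notin \mathrm{CI}\\ 0 & \text{if } (r,\ell,m)\in\mathrm{CI}\,.\end{cases} \tag{8.342}$$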

The quantity $\tau = \gamma^{-1}$ is the relaxation time.
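The cancellations leading to eqn. 8.342 can be checked numerically by computing, in the radial Sonine basis, the matrix of the integral (gain) part of eqn. 8.337 for each $\ell$, using exact Gamma-function moments. In the sketch below the prefactors $2/\sqrt\pi$ and $\frac23\cdot 2/\sqrt\pi$ and the radial weight $x_1^{\ell+1/2}e^{-x_1}$ are read off from eqn. 8.337; the function names are our own. The model operator of eqn. 8.326, divided by γ, then acts in each $(r,\ell)$ block as $-1 + G$, so we expect $G = \mathrm{diag}(1,1,0,\ldots)$ for $\ell = 0$, $G = \mathrm{diag}(1,0,\ldots)$ for $\ell = 1$, and $G = 0$ for $\ell \ge 2$.

```python
from math import factorial, gamma, pi, sqrt

def sonine_coeffs(n, alpha):
    # coefficients of S^n_alpha(x) = sum_m c_m x^m (eqn. 8.328)
    return [gamma(alpha + n + 1) * (-1) ** m
            / (gamma(alpha + m + 1) * factorial(n - m) * factorial(m))
            for m in range(n + 1)]

def inner(p, q, alpha):
    # int_0^inf dx e^{-x} x^alpha p(x) q(x) for coefficient lists p, q
    return sum(a * b * gamma(alpha + i + j + 1)
               for i, a in enumerate(p) for j, b in enumerate(q))

def gain_matrix(ell, nmax):
    """G[r_out][r_in] for the gain part of eqn. 8.337 in the (r, ell) block.
    Each entry of `terms` is (output Sonine index, prefactor, projecting Sonine index)."""
    alpha = ell + 0.5                      # radial weight x1^(ell + 1/2) e^{-x1}
    if ell == 0:
        terms = [(0, 2 / sqrt(pi), 0), (1, (2 / 3) * (2 / sqrt(pi)), 1)]
    elif ell == 1:
        terms = [(0, (2 / 3) * (2 / sqrt(pi)), 0)]
    else:
        terms = []                         # the kernel contains no l >= 2 harmonics
    G = [[0.0] * nmax for _ in range(nmax)]
    for r_out, pref, k in terms:
        if r_out >= nmax:
            continue
        proj = sonine_coeffs(k, alpha)
        for r_in in range(nmax):
            G[r_out][r_in] += pref * inner(proj, sonine_coeffs(r_in, alpha), alpha)
    return G

for ell in range(3):
    G = gain_matrix(ell, 4)
    # diagonal of (operator / gamma) = -1 + G: zero for collisional invariants, -1 otherwise
    print(ell, [round(-1 + G[r][r], 12) for r in range(4)])
```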


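It is easy to see that $\mathcal{L}$ is self-adjoint: the kernel in the second term below is symmetric under $\mathbf{u}\leftrightarrow\mathbf{v}$, so
$$\begin{aligned} \langle\phi\,|\,\mathcal{L}\psi\rangle &\equiv \int\! d^3v\; f^0(\mathbf{v})\,\phi(\mathbf{v})\,\mathcal{L}[\psi(\mathbf{v})]\\ &= -\gamma n\left(\frac{m}{2\pi k_BT}\right)^{3/2}\!\int\! d^3v\,\exp\!\left(-\frac{mv^2}{2k_BT}\right)\phi(\mathbf{v})\,\psi(\mathbf{v})\\ &\quad + \gamma n\left(\frac{m}{2\pi k_BT}\right)^{3}\!\int\! d^3v\!\int\! d^3u\,\exp\!\left(-\frac{mu^2}{2k_BT}\right)\exp\!\left(-\frac{mv^2}{2k_BT}\right)\\ &\qquad\times\phi(\mathbf{v})\left[1+\frac{m}{k_BT}\,\mathbf{u}\cdot\mathbf{v} + \frac23\left(\frac{mu^2}{2k_BT}-\frac32\right)\left(\frac{mv^2}{2k_BT}-\frac32\right)\right]\psi(\mathbf{u})\\ &= \langle\mathcal{L}\phi\,|\,\psi\rangle\,, \end{aligned}\tag{8.343}$$
where $n$ is the bulk number density and $f^0(\mathbf{v})$ is the Maxwellian velocity distribution.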

8.12 Appendix II : Distributions and Functionals

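Let $x\in\mathbb{R}$ be a random variable, and $P(x)$ a probability distribution for $x$. The average of any function $\phi(x)$ is then
$$\big\langle\phi(x)\big\rangle = \int_{-\infty}^{\infty}\! dx\; P(x)\,\phi(x) \bigg/ \int_{-\infty}^{\infty}\! dx\; P(x)\,. \tag{8.344}$$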
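Let $\eta(t)$ be a random function of $t$, with $\eta(t)\in\mathbb{R}$, and let $P\big[\eta(t)\big]$ be the probability distribution functional for $\eta(t)$. Then if $\Phi\big[\eta(t)\big]$ is a functional of $\eta(t)$, the average of Φ is given by
$$\Big\langle\Phi\big[\eta(t)\big]\Big\rangle = \int\! D\eta\; P\big[\eta(t)\big]\,\Phi\big[\eta(t)\big] \bigg/ \int\! D\eta\; P\big[\eta(t)\big]\,. \tag{8.345}$$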

The expression $\int\! D\eta\; P[\eta]\,\Phi[\eta]$ is a functional integral. A functional integral is a continuum limit of a multivariable integral. Suppose $\eta(t)$ were defined on a set of $t$ values $t_n = n\tau$. A functional of $\eta(t)$ becomes a multivariable function of the values $\eta_n \equiv \eta(t_n)$. The measure then becomes
$$D\eta \longrightarrow \prod_n d\eta_n\,. \tag{8.346}$$

In fact, for our purposes we will not need to know any details about the functional measure $D\eta$; we will finesse this delicate issue$^{18}$. Consider the generating functional,
$$Z\big[J(t)\big] = \int\! D\eta\; P[\eta]\,\exp\!\left(\int_{-\infty}^{\infty}\! dt\; J(t)\,\eta(t)\right). \tag{8.347}$$

It is clear that
$$\frac{1}{Z[J]}\,\frac{\delta^n Z[J]}{\delta J(t_1)\cdots\delta J(t_n)}\bigg|_{J(t)=0} = \big\langle\eta(t_1)\cdots\eta(t_n)\big\rangle\,. \tag{8.348}$$

The function J(t) is an arbitrary source function. We differentiate with respect to it in order to find the
η-field correlators.
Let’s compute the generating function for a class of distributions of the Gaussian form,
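$$P[\eta] = \exp\!\left(-\frac{1}{2\Gamma}\int_{-\infty}^{\infty}\! dt\,\big(\tau^2\dot\eta^2 + \eta^2\big)\right) \tag{8.349}$$
$$\qquad = \exp\!\left(-\frac{1}{2\Gamma}\int_{-\infty}^{\infty}\!\frac{d\omega}{2\pi}\,\big(1+\omega^2\tau^2\big)\,\big|\hat\eta(\omega)\big|^2\right). \tag{8.350}$$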

$^{18}$A discussion of measure for functional integrals is found in R. P. Feynman and A. R. Hibbs, Quantum Mechanics and Path Integrals.

Figure 8.10: Discretization of a continuous function $\eta(t)$. Upon discretization, a functional $\Phi\big[\eta(t)\big]$ becomes an ordinary multivariable function $\Phi(\{\eta_j\})$.

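Fourier transforming the source function $J(t)$ and completing the square in the Gaussian integral, one finds
$$Z[J] = Z[0]\cdot\exp\!\left(\frac{\Gamma}{2}\int_{-\infty}^{\infty}\!\frac{d\omega}{2\pi}\,\frac{\big|\hat J(\omega)\big|^2}{1+\omega^2\tau^2}\right). \tag{8.351}$$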

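Note that with $\eta(t)\in\mathbb{R}$ and $J(t)\in\mathbb{R}$ we have $\hat\eta^*(\omega) = \hat\eta(-\omega)$ and $\hat J^*(\omega) = \hat J(-\omega)$. Transforming back to real time, we have
$$Z[J] = Z[0]\cdot\exp\!\left(\frac12\int_{-\infty}^{\infty}\! dt\int_{-\infty}^{\infty}\! dt'\; J(t)\,G(t-t')\,J(t')\right), \tag{8.352}$$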
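where
$$G(s) = \frac{\Gamma}{2\tau}\,e^{-|s|/\tau}\,,\qquad \hat G(\omega) = \frac{\Gamma}{1+\omega^2\tau^2} \tag{8.353}$$
is the Green's function, in real and Fourier space. Note that
$$\int_{-\infty}^{\infty}\! ds\; G(s) = \hat G(0) = \Gamma\,. \tag{8.354}$$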
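The Green's function 8.353 can be recovered directly from the discretized form of eqn. 8.349, in the spirit of eqn. 8.346. On a lattice $t_n = n\Delta$ the exponent becomes a quadratic form $\frac12\eta^{\rm T}A\,\eta$ with tridiagonal $A$, and the covariance of a Gaussian is $A^{-1}$, so $\langle\eta_n\eta_0\rangle$ is the solution $g$ of $A g = e_0$. The sketch below assumes $\eta = 0$ outside a window of $\pm 30\tau$ (wide enough that the truncation is negligible) and solves the system with the standard Thomas algorithm; parameter values are illustrative.

```python
import math

def discrete_green(Gamma, tau, dt, N):
    """Solve A g = e_0 on the lattice t_n = n*dt, n = -N..N, where
    (1/2) eta^T A eta = (1/(2 Gamma)) sum_n dt [tau^2 ((eta_{n+1}-eta_n)/dt)^2 + eta_n^2]
    and eta is set to zero beyond the ends. Returns g with g[N + n] = <eta_n eta_0>."""
    size = 2 * N + 1
    diag = (dt / Gamma) * (2 * tau ** 2 / dt ** 2 + 1)
    off = -(dt / Gamma) * tau ** 2 / dt ** 2
    rhs = [0.0] * size
    rhs[N] = 1.0
    # Thomas algorithm: forward sweep ...
    c = [0.0] * size
    d = [0.0] * size
    c[0] = off / diag
    d[0] = rhs[0] / diag
    for i in range(1, size):
        denom = diag - off * c[i - 1]
        c[i] = off / denom
        d[i] = (rhs[i] - off * d[i - 1]) / denom
    # ... then back substitution
    g = [0.0] * size
    g[-1] = d[-1]
    for i in range(size - 2, -1, -1):
        g[i] = d[i] - c[i] * g[i + 1]
    return g

Gamma, tau, dt, N = 2.0, 1.0, 0.01, 3000   # illustrative; window is +-30 tau
g = discrete_green(Gamma, tau, dt, N)
for n in (0, 50, 100, 200):
    s = n * dt
    # lattice covariance versus G(s) of eqn. 8.353
    print(s, g[N + n], Gamma / (2 * tau) * math.exp(-abs(s) / tau))
```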

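We can now compute
$$\big\langle\eta(t_1)\,\eta(t_2)\big\rangle = G(t_1-t_2) \tag{8.355}$$
$$\begin{aligned}\big\langle\eta(t_1)\,\eta(t_2)\,\eta(t_3)\,\eta(t_4)\big\rangle &= G(t_1-t_2)\,G(t_3-t_4) + G(t_1-t_3)\,G(t_2-t_4)\\ &\quad + G(t_1-t_4)\,G(t_2-t_3)\,.\end{aligned}\tag{8.356}$$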
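Because $G(s)$ is exponential, the Gaussian measure 8.349 describes a stationary Ornstein-Uhlenbeck process, which can be sampled exactly on a time grid with the one-step update $\eta_{k+1} = \rho\,\eta_k + \sqrt{G(0)(1-\rho^2)}\,z_k$, where $\rho = e^{-\Delta/\tau}$ and the $z_k$ are independent unit normals. The sketch below (illustrative parameters, fixed seed) estimates $\langle\eta(t)\,\eta(t+\tau)\rangle$ and the equal-time fourth moment, for which eqn. 8.356 with all four times equal gives $\langle\eta^4\rangle = 3\,G(0)^2$. The estimates carry statistical errors of order a percent.

```python
import math
import random

def sample_stationary_ou(n_steps, dt, tau, Gamma, seed=0):
    """Samples eta(k dt) of the stationary Gaussian process with covariance
    G(s) = (Gamma / 2 tau) exp(-|s| / tau), using the exact one-step update."""
    rng = random.Random(seed)
    var = Gamma / (2 * tau)                    # G(0)
    rho = math.exp(-dt / tau)                  # G(dt) / G(0)
    kick = math.sqrt(var * (1 - rho ** 2))     # keeps the variance equal to G(0)
    eta = rng.gauss(0.0, math.sqrt(var))       # start in the stationary state
    samples = []
    for _ in range(n_steps):
        samples.append(eta)
        eta = rho * eta + kick * rng.gauss(0.0, 1.0)
    return samples

def G(s, tau, Gamma):
    return Gamma / (2 * tau) * math.exp(-abs(s) / tau)

tau, Gamma, dt = 1.0, 2.0, 0.1                 # illustrative values; G(0) = 1
eta = sample_stationary_ou(400_000, dt, tau, Gamma, seed=1)
lag = round(tau / dt)                          # time separation s = tau
N = len(eta)
c0 = sum(e * e for e in eta) / N
c_lag = sum(eta[i] * eta[i + lag] for i in range(N - lag)) / (N - lag)
m4 = sum(e ** 4 for e in eta) / N

print(c0, G(0, tau, Gamma))                    # eqn. 8.355 at equal times
print(c_lag, G(tau, tau, Gamma))               # eqn. 8.355 at separation tau
print(m4, 3 * G(0, tau, Gamma) ** 2)           # eqn. 8.356 with t1 = t2 = t3 = t4
```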
The generalization is now easy to prove, and is known as Wick's theorem:
$$\big\langle\eta(t_1)\cdots\eta(t_{2n})\big\rangle = \sum_{\rm contractions} G(t_{i_1}-t_{i_2})\cdots G(t_{i_{2n-1}}-t_{i_{2n}})\,, \tag{8.357}$$

where the sum is over all distinct contractions of the sequence 1-2 ⋯ 2n into products of pairs. How many terms are there? Some simple combinatorics answers this question. Choose the index 1. There are (2n − 1) other time indices with which it can be contracted. Now choose another index which has not yet been contracted. There are (2n − 3) remaining indices with which that index can be contracted. And so on. We thus obtain
$$C(n) \equiv \#\text{ of contractions of } 1\text{-}2\text{-}3\cdots 2n = (2n-1)(2n-3)\cdots 3\cdot 1 = \frac{(2n)!}{2^n\, n!}\,. \tag{8.358}$$
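The counting argument translates directly into a recursive enumeration: pair the first index with each remaining index in turn, then pair up whatever is left. The sketch below (function names are our own) lists the contractions and compares their number with $(2n)!/2^n n!$.

```python
from math import factorial

def contractions(indices):
    """Yield every partition of `indices` (even length) into unordered pairs."""
    if not indices:
        yield []
        return
    first, rest = indices[0], indices[1:]
    for k, partner in enumerate(rest):         # 2n - 1 choices for the first index
        remaining = rest[:k] + rest[k + 1:]
        for tail in contractions(remaining):
            yield [(first, partner)] + tail

def C(n):
    return factorial(2 * n) // (2 ** n * factorial(n))   # eqn. 8.358

print(list(contractions([1, 2, 3, 4])))
# [[(1, 2), (3, 4)], [(1, 3), (2, 4)], [(1, 4), (2, 3)]]  (the three terms of eqn. 8.356)
for n in range(1, 6):
    print(n, sum(1 for _ in contractions(list(range(1, 2 * n + 1)))), C(n))
# 1 1 1
# 2 3 3
# 3 15 15
# 4 105 105
# 5 945 945
```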

8.13 Appendix III : General Linear Autonomous Inhomogeneous ODEs

We can also solve general autonomous linear inhomogeneous ODEs of the form
$$\frac{d^nx}{dt^n} + a_{n-1}\frac{d^{n-1}x}{dt^{n-1}} + \ldots + a_1\frac{dx}{dt} + a_0\,x = \xi(t)\,. \tag{8.359}$$
We can write this as
$$L_t\, x(t) = \xi(t)\,, \tag{8.360}$$
where $L_t$ is the $n^{\rm th}$ order differential operator
$$L_t = \frac{d^n}{dt^n} + a_{n-1}\frac{d^{n-1}}{dt^{n-1}} + \ldots + a_1\frac{d}{dt} + a_0\,. \tag{8.361}$$
dt dt dt

The general solution to the inhomogeneous equation is given by
$$x(t) = x_h(t) + \int_{-\infty}^{\infty}\! dt'\; G(t,t')\,\xi(t')\,, \tag{8.362}$$
where $G(t,t')$ is the Green's function. Note that $L_t\,x_h(t) = 0$. Thus, in order for eqns. 8.360 and 8.362 to be true, we must have
$$L_t\,x(t) = \underbrace{L_t\,x_h(t)}_{\text{this vanishes}} + \int_{-\infty}^{\infty}\! dt'\; L_t\,G(t,t')\,\xi(t') = \xi(t)\,, \tag{8.363}$$
which means that
$$L_t\,G(t,t') = \delta(t-t')\,, \tag{8.364}$$
where $\delta(t-t')$ is the Dirac δ-function.

If the differential equation Lt x(t) = ξ(t) is defined over some finite or semi-infinite t interval with
prescribed boundary conditions on x(t) at the endpoints, then G(t, t′ ) will depend on t and t′ separately.
For the case we are now considering, let the interval be the entire real line t ∈ (−∞, ∞). Then G(t, t′ ) =
G(t − t′ ) is a function of the single variable t − t′ .

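Note that $L_t = L\big(\frac{d}{dt}\big)$ may be considered a function of the differential operator $\frac{d}{dt}$. If we now Fourier transform the equation $L_t\,x(t) = \xi(t)$ and integrate by parts (assuming $x(t)$ and its derivatives vanish as $t\to\pm\infty$), we obtain
$$\begin{aligned}\int_{-\infty}^{\infty}\! dt\; e^{i\omega t}\,\xi(t) &= \int_{-\infty}^{\infty}\! dt\; e^{i\omega t}\left(\frac{d^n}{dt^n} + a_{n-1}\frac{d^{n-1}}{dt^{n-1}} + \ldots + a_1\frac{d}{dt} + a_0\right)x(t)\\ &= \int_{-\infty}^{\infty}\! dt\; e^{i\omega t}\Big\{(-i\omega)^n + a_{n-1}(-i\omega)^{n-1} + \ldots + a_1(-i\omega) + a_0\Big\}\,x(t)\,.\end{aligned}\tag{8.365}$$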

Thus, if we define
$$\hat L(\omega) = \sum_{k=0}^{n} a_k\,(-i\omega)^k\,, \tag{8.366}$$
where $a_n\equiv 1$, then we have
$$\hat L(\omega)\,\hat x(\omega) = \hat\xi(\omega)\,. \tag{8.367}$$
According to the Fundamental Theorem of Algebra, the $n^{\rm th}$ degree polynomial $\hat L(\omega)$ may be uniquely factored over the complex ω plane into a product over $n$ roots:
$$\hat L(\omega) = (-i)^n\,(\omega-\omega_1)(\omega-\omega_2)\cdots(\omega-\omega_n)\,. \tag{8.368}$$
If the $\{a_k\}$ are all real, then $\big[\hat L(\omega)\big]^* = \hat L(-\omega^*)$, hence if Ω is a root then so is $-\Omega^*$. Thus, the roots are either purely imaginary or appear in pairs which are symmetric about the imaginary axis, i.e. if $\Omega = a + ib$ is a root, then so is $-\Omega^* = -a + ib$.
The general solution to the homogeneous equation is (assuming the roots are distinct)
$$x_h(t) = \sum_{\sigma=1}^{n} A_\sigma\,e^{-i\omega_\sigma t}\,, \tag{8.369}$$
which involves $n$ arbitrary complex constants $A_\sigma$. The susceptibility, or Green's function in Fourier space, $\hat G(\omega)$ is then
$$\hat G(\omega) = \frac{1}{\hat L(\omega)} = \frac{i^n}{(\omega-\omega_1)(\omega-\omega_2)\cdots(\omega-\omega_n)}\,. \tag{8.370}$$
Note that $\big[\hat G(\omega)\big]^* = \hat G(-\omega)$ for real ω, which is equivalent to the statement that $G(t-t')$ is a real function of its argument. The general solution to the inhomogeneous equation is then
$$x(t) = x_h(t) + \int_{-\infty}^{\infty}\! dt'\; G(t-t')\,\xi(t')\,, \tag{8.371}$$

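where $x_h(t)$ is the solution to the homogeneous equation, i.e. with zero forcing, and where
$$\begin{aligned} G(t-t') &= \int_{-\infty}^{\infty}\!\frac{d\omega}{2\pi}\; e^{-i\omega(t-t')}\,\hat G(\omega)\\ &= i^n\!\int_{-\infty}^{\infty}\!\frac{d\omega}{2\pi}\;\frac{e^{-i\omega(t-t')}}{(\omega-\omega_1)(\omega-\omega_2)\cdots(\omega-\omega_n)}\\ &= \sum_{\sigma=1}^{n}\frac{e^{-i\omega_\sigma(t-t')}}{i\,\hat L'(\omega_\sigma)}\;\Theta(t-t')\,, \end{aligned}\tag{8.372}$$
where we assume that $\mathrm{Im}\,\omega_\sigma < 0$ for all σ. For $t > t'$ the contour is closed in the lower half plane and picks up every pole, while for $t < t'$ it is closed in the upper half plane, which contains none. This guarantees causality: the response $x(t)$ to the influence $\xi(t')$ is nonzero only for $t > t'$.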
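As an example, consider the familiar case
$$\hat L(\omega) = -\omega^2 - i\gamma\omega + \omega_0^2 = -(\omega-\omega_+)(\omega-\omega_-)\,, \tag{8.373}$$
with $\omega_\pm = -\frac{i}{2}\gamma \pm \beta$, and $\beta = \sqrt{\omega_0^2 - \frac14\gamma^2}$. This yields
$$\hat L'(\omega_\pm) = \mp(\omega_+ - \omega_-) = \mp 2\beta\,. \tag{8.374}$$
Then according to equation 8.372,
$$\begin{aligned} G(s) &= \left\{\frac{e^{-i\omega_+ s}}{i\hat L'(\omega_+)} + \frac{e^{-i\omega_- s}}{i\hat L'(\omega_-)}\right\}\Theta(s)\\ &= \left\{\frac{e^{-\gamma s/2}\,e^{-i\beta s}}{-2i\beta} + \frac{e^{-\gamma s/2}\,e^{i\beta s}}{2i\beta}\right\}\Theta(s)\\ &= \beta^{-1}\,e^{-\gamma s/2}\,\sin(\beta s)\,\Theta(s)\,. \end{aligned}\tag{8.375}$$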
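Eqn. 8.375 can be checked in two independent ways: by summing the residues in eqn. 8.372 with complex arithmetic, and by integrating $L_t G = \delta(t)$ directly, since a unit impulse at $s = 0$ is equivalent to the initial conditions $G(0^+) = 0$, $\dot G(0^+) = 1$. The sketch below does both, using a fixed-step RK4 integrator; parameter values are illustrative, and the residue formula assumes $\omega_+ \ne \omega_-$, i.e. no critical damping.

```python
import cmath
import math

def green_residues(s, gamma, omega0):
    """G(s) from the residue sum of eqn. 8.372 for L(w) = -w^2 - i gamma w + omega0^2."""
    if s <= 0:
        return 0.0
    beta = cmath.sqrt(omega0 ** 2 - gamma ** 2 / 4)
    roots = (-0.5j * gamma + beta, -0.5j * gamma - beta)   # omega_+, omega_-

    def L_prime(w):
        return -2 * w - 1j * gamma

    total = sum(cmath.exp(-1j * w * s) / (1j * L_prime(w)) for w in roots)
    return total.real                                       # imaginary part is rounding noise

def green_closed(s, gamma, omega0):
    """Last line of eqn. 8.375 (underdamped case)."""
    beta = math.sqrt(omega0 ** 2 - gamma ** 2 / 4)
    return math.exp(-gamma * s / 2) * math.sin(beta * s) / beta if s > 0 else 0.0

def green_rk4(s, gamma, omega0, steps=20_000):
    """Integrate x'' + gamma x' + omega0^2 x = 0 from x(0+) = 0, x'(0+) = 1."""
    h = s / steps
    x, v = 0.0, 1.0

    def rhs(x, v):
        return v, -gamma * v - omega0 ** 2 * x

    for _ in range(steps):
        k1 = rhs(x, v)
        k2 = rhs(x + 0.5 * h * k1[0], v + 0.5 * h * k1[1])
        k3 = rhs(x + 0.5 * h * k2[0], v + 0.5 * h * k2[1])
        k4 = rhs(x + h * k3[0], v + h * k3[1])
        x += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        v += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return x

gamma, omega0 = 0.5, 2.0
for s in (0.5, 1.0, 3.0):
    print(s, green_residues(s, gamma, omega0), green_closed(s, gamma, omega0),
          green_rk4(s, gamma, omega0))
```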
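Now let us evaluate the two-point correlation function $\langle x(t)\,x(t')\rangle$, assuming the noise is correlated according to $\langle\xi(s)\,\xi(s')\rangle = \phi(s-s')$. We assume $t,t'\to\infty$ so the transient contribution $x_h$ is negligible. We then have
$$\begin{aligned}\big\langle x(t)\,x(t')\big\rangle &= \int_{-\infty}^{\infty}\! ds\int_{-\infty}^{\infty}\! ds'\; G(t-s)\,G(t'-s')\,\big\langle\xi(s)\,\xi(s')\big\rangle\\ &= \int_{-\infty}^{\infty}\!\frac{d\omega}{2\pi}\;\hat\phi(\omega)\,\big|\hat G(\omega)\big|^2\,e^{i\omega(t-t')}\,.\end{aligned}\tag{8.376}$$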
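As an illustration, take white noise, $\phi(s) = \Gamma\,\delta(s)$ so that $\hat\phi(\omega) = \Gamma$ (our illustrative choice). The equal-time variance from eqn. 8.376 is then $\Gamma\int\frac{d\omega}{2\pi}|\hat G(\omega)|^2$, which by Parseval's theorem equals $\Gamma\int ds\,G(s)^2$. For the damped oscillator of eqn. 8.373 both integrals evaluate to $1/(2\gamma\omega_0^2)$, so $\langle x^2\rangle = \Gamma/(2\gamma\omega_0^2)$. The sketch below checks this with the trapezoidal rule; the integration ranges and grids are illustrative.

```python
import math

gamma, omega0 = 0.5, 2.0                      # illustrative, underdamped
target = 1 / (2 * gamma * omega0 ** 2)

def G(s):
    """Eqn. 8.375."""
    beta = math.sqrt(omega0 ** 2 - gamma ** 2 / 4)
    return math.exp(-gamma * s / 2) * math.sin(beta * s) / beta if s > 0 else 0.0

def abs_G_hat_sq(w):
    """|G(w)|^2 = 1 / |L(w)|^2 with L(w) from eqn. 8.373."""
    return 1.0 / ((omega0 ** 2 - w ** 2) ** 2 + (gamma * w) ** 2)

def trapezoid(f, a, b, n):
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + k * h) for k in range(1, n)) + 0.5 * f(b))

time_side = trapezoid(lambda s: G(s) ** 2, 0.0, 80.0, 200_000)
freq_side = trapezoid(abs_G_hat_sq, -200.0, 200.0, 400_000) / (2 * math.pi)
print(time_side, freq_side, target)           # <x^2> / Gamma, computed three ways
```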

Higher order ODEs

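Note that any $n^{\rm th}$ order ODE, of the general form
$$\frac{d^nx}{dt^n} = F\!\left(x,\frac{dx}{dt},\ldots,\frac{d^{n-1}x}{dt^{n-1}}\right), \tag{8.377}$$
may be represented by the first order system $\dot{\boldsymbol\varphi} = \boldsymbol V(\boldsymbol\varphi)$. To see this, define $\varphi_k = d^{k-1}x/dt^{k-1}$, with $k = 1,\ldots,n$. Thus, for $k<n$ we have $\dot\varphi_k = \varphi_{k+1}$, and $\dot\varphi_n = F$. In other words,
$$\overbrace{\frac{d}{dt}\begin{pmatrix}\varphi_1\\ \vdots\\ \varphi_{n-1}\\ \varphi_n\end{pmatrix}}^{\dot{\boldsymbol\varphi}} = \overbrace{\begin{pmatrix}\varphi_2\\ \vdots\\ \varphi_n\\ F\big(\varphi_1,\ldots,\varphi_n\big)\end{pmatrix}}^{\boldsymbol V(\boldsymbol\varphi)}. \tag{8.378}$$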

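An inhomogeneous linear $n^{\rm th}$ order ODE,
$$\frac{d^nx}{dt^n} + a_{n-1}\frac{d^{n-1}x}{dt^{n-1}} + \ldots + a_1\frac{dx}{dt} + a_0\,x = \xi(t) \tag{8.379}$$
may be written in matrix form, as
$$\frac{d}{dt}\begin{pmatrix}\varphi_1\\ \varphi_2\\ \vdots\\ \varphi_n\end{pmatrix} = \overbrace{\begin{pmatrix}0 & 1 & 0 & \cdots & 0\\ 0 & 0 & 1 & \cdots & 0\\ \vdots & \vdots & \vdots & & \vdots\\ -a_0 & -a_1 & -a_2 & \cdots & -a_{n-1}\end{pmatrix}}^{Q}\begin{pmatrix}\varphi_1\\ \varphi_2\\ \vdots\\ \varphi_n\end{pmatrix} + \overbrace{\begin{pmatrix}0\\ 0\\ \vdots\\ \xi(t)\end{pmatrix}}^{\boldsymbol\xi}. \tag{8.380}$$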

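Thus,
$$\dot{\boldsymbol\varphi} = Q\,\boldsymbol\varphi + \boldsymbol\xi\,, \tag{8.381}$$
where $Q$ is a constant matrix if the coefficients $a_k$ are time-independent, i.e. if the ODE is autonomous. For the homogeneous case where $\boldsymbol\xi(t) = 0$, the solution is obtained by exponentiating the constant matrix $Qt$:
$$\boldsymbol\varphi(t) = \exp(Qt)\,\boldsymbol\varphi(0)\,; \tag{8.382}$$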
the exponential of a matrix may be given meaning by its Taylor series expansion. If the ODE is not
autonomous, then Q = Q(t) is time-dependent, and the solution is given by the path-ordered exponential,
$$\boldsymbol\varphi(t) = \mathcal{P}\exp\!\left\{\int_0^t\! dt'\; Q(t')\right\}\boldsymbol\varphi(0)\,, \tag{8.383}$$

where $\mathcal{P}$ is the path ordering operator which places earlier times to the right. As defined, the equation $\dot{\boldsymbol\varphi} = \boldsymbol V(\boldsymbol\varphi)$ is autonomous, since the $t$-advance mapping $g_t$ depends only on $t$ and on no other time variable. However, by extending the phase space $\mathcal{M}\ni\boldsymbol\varphi$ from $\mathcal{M}\to\mathcal{M}\times\mathbb{R}$, which is of dimension $n+1$, one can describe arbitrary time-dependent ODEs.
In general, path ordered exponentials are difficult to compute analytically. We will henceforth consider the autonomous case where $Q$ is a constant matrix in time. We will assume the matrix $Q$ is real, but other than that it has no helpful symmetries. Assuming its eigenvalues are distinct, we can however decompose it into left and right eigenvectors:
$$Q_{ij} = \sum_{\sigma=1}^{n}\nu_\sigma\,R_{\sigma,i}\,L_{\sigma,j}\,. \tag{8.384}$$
Or, in bra-ket notation, $Q = \sum_\sigma \nu_\sigma\,|R_\sigma\rangle\langle L_\sigma|$, where $\nu_\sigma$ are the eigenvalues of $Q$. The normalization condition we use is
$$\langle L_\sigma\,|\,R_{\sigma'}\rangle = \delta_{\sigma\sigma'}\,. \tag{8.385}$$
The eigenvalues may be real or complex. Since the characteristic polynomial $P(\nu) = \det(\nu\,\mathbb{I} - Q)$ has real coefficients, we know that the eigenvalues of $Q$ are either real or come in complex conjugate pairs.
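Consider, for example, the $n = 2$ system we studied earlier. Then
$$Q = \begin{pmatrix} 0 & 1\\ -\omega_0^2 & -\gamma\end{pmatrix}. \tag{8.386}$$
The eigenvalues are as before: $\nu_\pm = -\frac12\gamma \pm \sqrt{\frac14\gamma^2 - \omega_0^2}$. The left and right eigenvectors are
$$L_\pm = \frac{\pm 1}{\nu_+ - \nu_-}\begin{pmatrix}-\nu_\mp & 1\end{pmatrix},\qquad R_\pm = \begin{pmatrix}1\\ \nu_\pm\end{pmatrix}. \tag{8.387}$$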

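The utility of working in a left-right eigenbasis is apparent once we reflect upon the result
$$f(Q) = \sum_{\sigma=1}^{n} f(\nu_\sigma)\,\big|R_\sigma\big\rangle\big\langle L_\sigma\big| \tag{8.388}$$
for any function $f$. Thus, the solution to the general autonomous homogeneous case is
$$\boldsymbol\varphi(t) = \sum_{\sigma=1}^{n} e^{\nu_\sigma t}\,\big|R_\sigma\big\rangle\big\langle L_\sigma\,\big|\,\boldsymbol\varphi(0)\big\rangle \tag{8.389}$$
$$\varphi_i(t) = \sum_{\sigma=1}^{n}\sum_{j=1}^{n} e^{\nu_\sigma t}\,R_{\sigma,i}\,L_{\sigma,j}\,\varphi_j(0)\,. \tag{8.390}$$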
If $\mathrm{Re}\,(\nu_\sigma) < 0$ for all σ, then the initial conditions $\boldsymbol\varphi(0)$ are forgotten on time scales $\tau_\sigma = 1/|\mathrm{Re}\,\nu_\sigma|$. Physicality demands that this is the case.
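As a check on eqns. 8.386 through 8.389, the sketch below evaluates $\exp(Qt)$ for the $n = 2$ system in two ways: by summing its Taylor series, as suggested after eqn. 8.382, and from the spectral form $\sum_\sigma e^{\nu_\sigma t}|R_\sigma\rangle\langle L_\sigma|$ with the eigenvectors of eqn. 8.387. It uses plain-Python 2×2 arithmetic; parameter values are illustrative, and critical damping ($\nu_+ = \nu_-$) is excluded because the eigenvectors then degenerate.

```python
import cmath

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def expm_taylor(Q, t, terms=80):
    """exp(Qt) = sum_k (Qt)^k / k!, truncated after `terms` terms."""
    n = len(Q)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x * t / k for x in row] for row in matmul(term, Q)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def expm_spectral(omega0, gamma, t):
    """sum_sigma exp(nu_sigma t) |R_sigma><L_sigma| using eqn. 8.387."""
    root = cmath.sqrt(gamma ** 2 / 4 - omega0 ** 2)
    nu = {+1: -gamma / 2 + root, -1: -gamma / 2 - root}
    gap = nu[+1] - nu[-1]
    out = [[0j, 0j], [0j, 0j]]
    for s in (+1, -1):
        R = [1.0, nu[s]]
        L = [s * (-nu[-s]) / gap, s / gap]
        for i in range(2):
            for j in range(2):
                out[i][j] += cmath.exp(nu[s] * t) * R[i] * L[j]
    return out

omega0, gamma, t = 2.0, 0.5, 2.0
Q = [[0.0, 1.0], [-omega0 ** 2, -gamma]]
A = expm_taylor(Q, t)
B = expm_spectral(omega0, gamma, t)
for i in range(2):
    print([round(A[i][j], 10) for j in range(2)], [round(B[i][j].real, 10) for j in range(2)])
```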
Now let's consider the inhomogeneous case where $\boldsymbol\xi(t) \ne 0$. We begin by recasting eqn. 8.381 in the form
$$\frac{d}{dt}\Big(e^{-Qt}\,\boldsymbol\varphi\Big) = e^{-Qt}\,\boldsymbol\xi(t)\,. \tag{8.391}$$

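We can integrate this directly:
$$\boldsymbol\varphi(t) = e^{Qt}\,\boldsymbol\varphi(0) + \int_0^t\! ds\; e^{Q(t-s)}\,\boldsymbol\xi(s)\,. \tag{8.392}$$
In component notation,
$$\varphi_i(t) = \sum_{\sigma=1}^{n} e^{\nu_\sigma t}\,R_{\sigma,i}\,\big\langle L_\sigma\,\big|\,\boldsymbol\varphi(0)\big\rangle + \sum_{\sigma=1}^{n} R_{\sigma,i}\int_0^t\! ds\; e^{\nu_\sigma(t-s)}\,\big\langle L_\sigma\,\big|\,\boldsymbol\xi(s)\big\rangle\,. \tag{8.393}$$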

Note that the first term on the RHS is the solution to the homogeneous equation, as must be the case
when ξ(s) = 0.
The solution in eqn. 8.393 holds for general $Q$ and $\boldsymbol\xi(s)$. For the particular form of $Q$ and $\boldsymbol\xi(s)$ in eqn. 8.380, we can proceed further. For starters, $\langle L_\sigma\,|\,\boldsymbol\xi(s)\rangle = L_{\sigma,n}\,\xi(s)$. We can further exploit a special feature of the $Q$ matrix to analytically determine all its left and right eigenvectors. Applying $Q$ to the right eigenvector $|R_\sigma\rangle$, we obtain
$$R_{\sigma,j} = \nu_\sigma\,R_{\sigma,j-1}\qquad (j>1)\,. \tag{8.394}$$
We are free to choose $R_{\sigma,1} = 1$ for all σ and defer the issue of normalization to the derivation of the left eigenvectors. Thus, we obtain the pleasingly simple result,
$$R_{\sigma,k} = \nu_\sigma^{k-1}\,. \tag{8.395}$$

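Applying $Q$ to the left eigenvector $\langle L_\sigma|$, we obtain
$$-a_0\,L_{\sigma,n} = \nu_\sigma\,L_{\sigma,1} \tag{8.396}$$
$$L_{\sigma,j-1} - a_{j-1}\,L_{\sigma,n} = \nu_\sigma\,L_{\sigma,j}\qquad(j>1)\,. \tag{8.397}$$
From these equations we may derive
$$L_{\sigma,k} = -L_{\sigma,n}\sum_{j=0}^{k-1} a_j\,\nu_\sigma^{\,j-k} = L_{\sigma,n}\sum_{j=k}^{n} a_j\,\nu_\sigma^{\,j-k}\,. \tag{8.398}$$
The second equality in the above equation is derived using the result $P(\nu_\sigma) = \sum_{j=0}^{n} a_j\,\nu_\sigma^j = 0$. Recall also that $a_n\equiv 1$. We now impose the normalization condition,
$$\sum_{k=1}^{n} L_{\sigma,k}\,R_{\sigma,k} = 1\,. \tag{8.399}$$
This condition determines our last remaining unknown quantity (for a given σ), $L_{\sigma,n}$:
$$\big\langle L_\sigma\,\big|\,R_\sigma\big\rangle = L_{\sigma,n}\sum_{k=1}^{n} k\,a_k\,\nu_\sigma^{k-1} = P'(\nu_\sigma)\,L_{\sigma,n}\,, \tag{8.400}$$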

where $P'(\nu)$ is the first derivative of the characteristic polynomial. Thus, we obtain another neat result,
$$L_{\sigma,n} = \frac{1}{P'(\nu_\sigma)}\,. \tag{8.401}$$
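These closed-form eigenvectors can be verified for any companion matrix. The sketch below picks a set of roots (illustrative, with a complex-conjugate pair so that the $a_k$ are real), builds the $a_k$ and the matrix $Q$ of eqn. 8.380, forms $R_{\sigma,k} = \nu_\sigma^{k-1}$, $L_{\sigma,n} = 1/P'(\nu_\sigma)$ and the remaining components $L_{\sigma,k} = L_{\sigma,n}\sum_{j=k}^{n} a_j\,\nu_\sigma^{j-k}$ obtained by iterating eqns. 8.396 and 8.397, and then checks the normalization 8.385 and the decomposition 8.384. Function names are our own.

```python
def poly_from_roots(roots):
    """Coefficients [a_0, ..., a_n] (a_n = 1) of P(nu) = prod_sigma (nu - nu_sigma)."""
    a = [1.0 + 0j]
    for r in roots:
        # multiply by (nu - r): new a_k = a_{k-1} - r a_k
        a = [(a[k - 1] if k > 0 else 0) - r * (a[k] if k < len(a) else 0)
             for k in range(len(a) + 1)]
    return a

def companion(a):
    """The matrix Q of eqn. 8.380."""
    n = len(a) - 1
    Q = [[0j] * n for _ in range(n)]
    for i in range(n - 1):
        Q[i][i + 1] = 1.0
    Q[n - 1] = [-a[j] for j in range(n)]
    return Q

def eigenvectors(a, nu):
    n = len(a) - 1
    dP = sum(k * a[k] * nu ** (k - 1) for k in range(1, n + 1))   # P'(nu)
    R = [nu ** (k - 1) for k in range(1, n + 1)]                   # eqn. 8.395
    L_n = 1 / dP                                                   # eqn. 8.401
    L = [L_n * sum(a[j] * nu ** (j - k) for j in range(k, n + 1))  # iterated 8.396-8.397
         for k in range(1, n + 1)]
    return R, L

roots = [-1.0 + 0j, -0.5 + 2j, -0.5 - 2j]
a = poly_from_roots(roots)
Q = companion(a)
vecs = [eigenvectors(a, nu) for nu in roots]
n = len(roots)

# <L_sigma | R_sigma'> should be the identity (eqn. 8.385)
overlap = [[sum(vecs[s][1][k] * vecs[sp][0][k] for k in range(n)) for sp in range(n)]
           for s in range(n)]
# sum_sigma nu_sigma R_{sigma,i} L_{sigma,j} should rebuild Q (eqn. 8.384)
rebuilt = [[sum(roots[s] * vecs[s][0][i] * vecs[s][1][j] for s in range(n))
            for j in range(n)] for i in range(n)]
err_overlap = max(abs(overlap[s][sp] - (s == sp)) for s in range(n) for sp in range(n))
err_rebuild = max(abs(rebuilt[i][j] - Q[i][j]) for i in range(n) for j in range(n))
print(err_overlap, err_rebuild)               # both at rounding level
```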

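Now let us evaluate the general two-point correlation function,
$$C_{jj'}(t,t') \equiv \big\langle\varphi_j(t)\,\varphi_{j'}(t')\big\rangle - \big\langle\varphi_j(t)\big\rangle\big\langle\varphi_{j'}(t')\big\rangle\,. \tag{8.402}$$
We write
$$\big\langle\xi(s)\,\xi(s')\big\rangle = \phi(s-s') = \int_{-\infty}^{\infty}\!\frac{d\omega}{2\pi}\;\hat\phi(\omega)\,e^{-i\omega(s-s')}\,. \tag{8.403}$$
When $\hat\phi(\omega)$ is constant, we have $\langle\xi(s)\,\xi(s')\rangle = \hat\phi\,\delta(s-s')$. This is the case of so-called white noise, when all frequencies contribute equally. The more general case when $\hat\phi(\omega)$ is frequency-dependent is known as colored noise. Appealing to eqn. 8.393, we have
$$C_{jj'}(t,t') = \sum_{\sigma,\sigma'}\frac{\nu_\sigma^{j-1}\,\nu_{\sigma'}^{j'-1}}{P'(\nu_\sigma)\,P'(\nu_{\sigma'})}\int_0^t\! ds\; e^{\nu_\sigma(t-s)}\int_0^{t'}\! ds'\; e^{\nu_{\sigma'}(t'-s')}\,\phi(s-s') \tag{8.404}$$
$$\qquad = \sum_{\sigma,\sigma'}\frac{\nu_\sigma^{j-1}\,\nu_{\sigma'}^{j'-1}}{P'(\nu_\sigma)\,P'(\nu_{\sigma'})}\int_{-\infty}^{\infty}\!\frac{d\omega}{2\pi}\;\frac{\hat\phi(\omega)\,\big(e^{-i\omega t} - e^{\nu_\sigma t}\big)\big(e^{i\omega t'} - e^{\nu_{\sigma'}t'}\big)}{(\omega - i\nu_\sigma)(\omega + i\nu_{\sigma'})}\,. \tag{8.405}$$
In the limit $t,t'\to\infty$, assuming $\mathrm{Re}\,(\nu_\sigma) < 0$ for all σ (i.e. no diffusion), the exponentials $e^{\nu_\sigma t}$ and $e^{\nu_{\sigma'}t'}$ may be neglected, and we then have
$$C_{jj'}(t,t') = \sum_{\sigma,\sigma'}\frac{\nu_\sigma^{j-1}\,\nu_{\sigma'}^{j'-1}}{P'(\nu_\sigma)\,P'(\nu_{\sigma'})}\int_{-\infty}^{\infty}\!\frac{d\omega}{2\pi}\;\frac{\hat\phi(\omega)\,e^{-i\omega(t-t')}}{(\omega-i\nu_\sigma)(\omega+i\nu_{\sigma'})}\,. \tag{8.406}$$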

8.14 Appendix IV : Correlations in the Langevin formalism

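As shown above, integrating the Langevin equation $\dot p + \gamma p = F + \eta(t)$ yields
$$p(t) = p(0)\,e^{-\gamma t} + \frac{F}{\gamma}\big(1 - e^{-\gamma t}\big) + \int_0^t\! ds\;\eta(s)\,e^{\gamma(s-t)}\,. \tag{8.407}$$
Thus, the momentum autocorrelator is
$$\begin{aligned}\big\langle p(t)\,p(t')\big\rangle - \big\langle p(t)\big\rangle\big\langle p(t')\big\rangle &= \int_0^t\! ds\int_0^{t'}\! ds'\; e^{\gamma(s-t)}\,e^{\gamma(s'-t')}\,\big\langle\eta(s)\,\eta(s')\big\rangle\\ &= \Gamma\,e^{-\gamma(t+t')}\int_0^{t_{\rm min}}\! ds\; e^{2\gamma s} = Mk_BT\Big(e^{-\gamma|t-t'|} - e^{-\gamma(t+t')}\Big)\,,\end{aligned}\tag{8.408}$$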

Figure 8.11: Regions for some of the double integrals encountered in the text.

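where
$$t_{\rm min} = \min(t,t') = \begin{cases} t & \text{if } t < t'\\ t' & \text{if } t' < t\end{cases} \tag{8.409}$$
is the lesser of $t$ and $t'$. Here we have used the result
$$\begin{aligned}\int_0^t\! ds\int_0^{t'}\! ds'\; e^{\gamma(s+s')}\,\delta(s-s') &= \int_0^{t_{\rm min}}\! ds\int_0^{t_{\rm min}}\! ds'\; e^{\gamma(s+s')}\,\delta(s-s')\\ &= \int_0^{t_{\rm min}}\! ds\; e^{2\gamma s} = \frac{1}{2\gamma}\Big(e^{2\gamma t_{\rm min}} - 1\Big)\,.\end{aligned}\tag{8.410}$$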

One way to intuitively understand this result is as follows. The double integral over s and s′ is over a
rectangle of dimensions t × t′ . Since the δ-function can only be satisfied when s = s′ , there can be no
contribution to the integral from regions where s > t′ or s′ > t. Thus, the only contributions can arise
from integration over the square of dimensions tmin × tmin . Note also

t + t′ − 2 min(t, t′ ) = |t − t′ | . (8.411)

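Let's now compute the position $x(t)$. We have
$$\begin{aligned} x(t) &= x(0) + \frac{1}{M}\int_0^t\! ds\; p(s)\\ &= x(0) + \int_0^t\! ds\left[\Big(v(0) - \frac{F}{\gamma M}\Big)e^{-\gamma s} + \frac{F}{\gamma M}\right] + \frac{1}{M}\int_0^t\! ds\int_0^s\! ds_1\;\eta(s_1)\,e^{\gamma(s_1-s)}\\ &= \big\langle x(t)\big\rangle + \frac{1}{M}\int_0^t\! ds\int_0^s\! ds_1\;\eta(s_1)\,e^{\gamma(s_1-s)}\,,\end{aligned}\tag{8.412}$$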
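with $v = p/M$. Since $\langle\eta(t)\rangle = 0$, we have
$$\begin{aligned}\big\langle x(t)\big\rangle &= x(0) + \int_0^t\! ds\left[\Big(v(0) - \frac{F}{\gamma M}\Big)e^{-\gamma s} + \frac{F}{\gamma M}\right]\\ &= x(0) + \frac{Ft}{\gamma M} + \frac{1}{\gamma}\Big(v(0) - \frac{F}{\gamma M}\Big)\big(1 - e^{-\gamma t}\big)\,.\end{aligned}\tag{8.413}$$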


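Note that for $\gamma t \ll 1$ we have $\langle x(t)\rangle = x(0) + v(0)\,t + \frac12 M^{-1}\big(F - \gamma M v(0)\big)\,t^2 + \mathcal{O}(t^3)$, as is appropriate for a particle accelerated by the net force $F - \gamma M v(0)$ acting at $t = 0$; for $v(0) = 0$ this is ballistic motion under the constant force $F$. In the opposite limit $\gamma t \gg 1$, $\langle x(t)\rangle$ grows linearly in time with velocity $F/\gamma M$, which of course agrees with our earlier evaluation for the terminal velocity, $v_\infty = \langle p(\infty)\rangle/M = F/\gamma M$.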
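We next compute the position autocorrelation:
$$\begin{aligned}\big\langle x(t)\,x(t')\big\rangle - \big\langle x(t)\big\rangle\big\langle x(t')\big\rangle &= \frac{1}{M^2}\int_0^t\! ds\int_0^{t'}\! ds'\; e^{-\gamma(s+s')}\int_0^s\! ds_1\int_0^{s'}\! ds_1'\; e^{\gamma(s_1+s_1')}\,\big\langle\eta(s_1)\,\eta(s_1')\big\rangle\\ &= \frac{\Gamma}{2\gamma M^2}\int_0^t\! ds\int_0^{t'}\! ds'\,\Big(e^{-\gamma|s-s'|} - e^{-\gamma(s+s')}\Big)\,.\end{aligned}\tag{8.414}$$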

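We have to be careful in computing the double integral of the first term in brackets on the RHS. We can assume, without loss of generality, that $t \ge t'$. Then
$$\begin{aligned}\int_0^t\! ds\int_0^{t'}\! ds'\; e^{-\gamma|s-s'|} &= \int_0^{t'}\! ds'\; e^{\gamma s'}\int_{s'}^{t}\! ds\; e^{-\gamma s} + \int_0^{t'}\! ds'\; e^{-\gamma s'}\int_0^{s'}\! ds\; e^{\gamma s}\\ &= 2\gamma^{-1}t' + \gamma^{-2}\Big(e^{-\gamma t} + e^{-\gamma t'} - 1 - e^{-\gamma(t-t')}\Big)\,.\end{aligned}\tag{8.415}$$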

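We then find, for $t > t'$,
$$\big\langle x(t)\,x(t')\big\rangle - \big\langle x(t)\big\rangle\big\langle x(t')\big\rangle = \frac{2k_BT}{\gamma M}\,t' + \frac{k_BT}{\gamma^2 M}\Big(2e^{-\gamma t} + 2e^{-\gamma t'} - 2 - e^{-\gamma(t-t')} - e^{-\gamma(t+t')}\Big)\,. \tag{8.416}$$
In particular, the equal time autocorrelator is
$$\big\langle x^2(t)\big\rangle - \big\langle x(t)\big\rangle^2 = \frac{2k_BT}{\gamma M}\,t + \frac{k_BT}{\gamma^2 M}\Big(4e^{-\gamma t} - 3 - e^{-2\gamma t}\Big)\,. \tag{8.417}$$
We see that for long times
$$\big\langle x^2(t)\big\rangle - \big\langle x(t)\big\rangle^2 \sim 2Dt\,, \tag{8.418}$$
where $D = k_BT/\gamma M$ is the diffusion constant.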
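Eqn. 8.416 can be checked against a direct numerical evaluation of the double integral in eqn. 8.414. The sketch assumes white noise of strength $\Gamma = 2\gamma M k_BT$ (the value that makes the two forms of eqn. 8.408 agree), so that the prefactor $\Gamma/2\gamma M^2$ becomes $k_BT/M$, and uses the midpoint rule on an $n\times n$ grid. Units with $M = k_BT = 1$ and the other parameter values are illustrative.

```python
import math

def closed_form(t, tp, gamma, M=1.0, kT=1.0):
    """Eqn. 8.416, valid for t >= tp."""
    return (2 * kT / (gamma * M)) * tp + (kT / (gamma ** 2 * M)) * (
        2 * math.exp(-gamma * t) + 2 * math.exp(-gamma * tp) - 2
        - math.exp(-gamma * (t - tp)) - math.exp(-gamma * (t + tp)))

def quadrature(t, tp, gamma, M=1.0, kT=1.0, n=400):
    """Midpoint rule for the second line of eqn. 8.414, with Gamma = 2 gamma M kT
    so that the prefactor Gamma / (2 gamma M^2) equals kT / M."""
    hs, hp = t / n, tp / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * hs
        for j in range(n):
            sp = (j + 0.5) * hp
            total += math.exp(-gamma * abs(s - sp)) - math.exp(-gamma * (s + sp))
    return (kT / M) * total * hs * hp

t, tp, gamma = 3.0, 1.5, 0.7
print(quadrature(t, tp, gamma), closed_form(t, tp, gamma))

# equal times, long-time limit: the ratio to 2 D t (with D = kT / gamma M) tends to 1
for T in (10.0, 100.0, 1000.0):
    print(T, closed_form(T, T, gamma) / (2 * T / gamma))
```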



8.15 Appendix V : Kramers-Kronig Relations

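Suppose $\hat\chi(\omega) \equiv \hat G(\omega)$ is analytic in the UHP$^{19}$. Then for all real ω, we must have
$$\int_{-\infty}^{\infty}\!\frac{d\nu}{2\pi}\;\frac{\hat\chi(\nu)}{\nu - \omega + i\epsilon} = 0\,, \tag{8.419}$$
where ε is a positive infinitesimal. The reason is simple: the only pole of the integrand, at $\nu = \omega - i\epsilon$, lies in the lower half plane, so we may close the contour in the UHP, assuming $\hat\chi(\omega)$ vanishes sufficiently rapidly that Jordan's lemma can be applied. Clearly this is an extremely weak restriction on $\hat\chi(\omega)$, given the fact that the denominator already causes the integrand to vanish as $|\omega|^{-1}$.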
Let us examine the function
$$\frac{1}{\nu - \omega + i\epsilon} = \frac{\nu-\omega}{(\nu-\omega)^2 + \epsilon^2} - \frac{i\epsilon}{(\nu-\omega)^2 + \epsilon^2}\,, \tag{8.420}$$

which we have separated into real and imaginary parts. Under an integral sign, the first term, in the limit $\epsilon\to 0$, is equivalent to taking a principal part of the integral. That is, for any function $F(\nu)$ which is regular at $\nu = \omega$,
$$\lim_{\epsilon\to 0}\int_{-\infty}^{\infty}\!\frac{d\nu}{2\pi}\;\frac{\nu-\omega}{(\nu-\omega)^2+\epsilon^2}\,F(\nu) \equiv \wp\!\int_{-\infty}^{\infty}\!\frac{d\nu}{2\pi}\;\frac{F(\nu)}{\nu-\omega}\,. \tag{8.421}$$
The principal part symbol ℘ means that the singularity at $\nu = \omega$ is elided, either by smoothing out the function $1/(\nu-\omega)$ as above, or by simply cutting out a region of integration of width ε on either side of $\nu = \omega$.
The imaginary part is more interesting. Let us write
$$h(u) \equiv \frac{\epsilon}{u^2 + \epsilon^2}\,. \tag{8.422}$$
For $|u| \gg \epsilon$, $h(u) \simeq \epsilon/u^2$, which vanishes as $\epsilon\to 0$. For $u = 0$, $h(0) = 1/\epsilon$, which diverges as $\epsilon\to 0$. Thus, $h(u)$ has a huge peak at $u = 0$ and rapidly decays to 0 as one moves off the peak in either direction a distance greater than ε. Finally, note that
$$\int_{-\infty}^{\infty}\! du\; h(u) = \pi\,, \tag{8.423}$$
a result which itself is easy to show using contour integration. Putting it all together, this tells us that
$$\lim_{\epsilon\to 0}\frac{\epsilon}{u^2+\epsilon^2} = \pi\,\delta(u)\,. \tag{8.424}$$
Thus, for positive infinitesimal ε,
$$\frac{1}{u \pm i\epsilon} = \frac{\wp}{u} \mp i\pi\,\delta(u)\,, \tag{8.425}$$
$^{19}$In this section, we use the notation $\hat\chi(\omega)$ for the susceptibility, rather than $\hat G(\omega)$.

a most useful result.


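We now return to our initial result 8.419, and we separate $\hat\chi(\omega)$ into real and imaginary parts:
$$\hat\chi(\omega) = \hat\chi'(\omega) + i\hat\chi''(\omega)\,. \tag{8.426}$$
(In this equation, the primes do not indicate differentiation with respect to argument.) We therefore have, for every real value of ω,
$$0 = \int_{-\infty}^{\infty}\!\frac{d\nu}{2\pi}\Big[\hat\chi'(\nu) + i\hat\chi''(\nu)\Big]\Big[\frac{\wp}{\nu-\omega} - i\pi\,\delta(\nu-\omega)\Big]\,. \tag{8.427}$$
Taking the real and imaginary parts of this equation, we derive the Kramers-Kronig relations:
$$\hat\chi'(\omega) = +\wp\!\int_{-\infty}^{\infty}\!\frac{d\nu}{\pi}\;\frac{\hat\chi''(\nu)}{\nu-\omega} \tag{8.428}$$
$$\hat\chi''(\omega) = -\wp\!\int_{-\infty}^{\infty}\!\frac{d\nu}{\pi}\;\frac{\hat\chi'(\nu)}{\nu-\omega}\,. \tag{8.429}$$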
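The Kramers-Kronig relations can be checked numerically for the damped oscillator susceptibility $\hat\chi(\omega) = 1/(\omega_0^2 - \omega^2 - i\gamma\omega)$ of eqn. 8.373, whose poles lie in the lower half plane. The sketch evaluates each principal value by subtracting $F(\omega)$ from the numerator, which leaves a smooth integrand, on a window symmetric about ω (so the subtracted piece has zero principal value), with midpoints that never land on $\nu = \omega$. The window width, grid and parameters are illustrative assumptions; the neglected tails are far below the tolerance used here.

```python
import math

omega0, gamma = 2.0, 0.5                      # illustrative oscillator parameters

def chi(w):
    """chi(w) = 1 / (omega0^2 - w^2 - i gamma w); both poles lie in the LHP."""
    return 1.0 / complex(omega0 ** 2 - w ** 2, -gamma * w)

def chi_re(nu):
    return chi(nu).real

def chi_im(nu):
    return chi(nu).imag

def principal_value(F, w, half_width=200.0, n=40_000):
    """PV of int dnu F(nu) / (nu - w) over [w - half_width, w + half_width].
    F(w) is subtracted (its PV over a symmetric window vanishes), leaving a
    smooth integrand; midpoints never hit nu = w."""
    h = half_width / n
    Fw = F(w)
    total = 0.0
    for k in range(-n, n):
        nu = w + (k + 0.5) * h
        total += (F(nu) - Fw) / (nu - w)
    return total * h

w = 1.3
kk_re = principal_value(chi_im, w) / math.pi  # eqn. 8.428
kk_im = -principal_value(chi_re, w) / math.pi # eqn. 8.429
print(kk_re, chi(w).real)
print(kk_im, chi(w).imag)
```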
