Lesson 1: Conservation of information
Introduction
dynamics course and so forth, in the collection The Theoretical Minimum, those topics are all about perfect predictability.
So we can see that while the basic laws of physics are extremely powerful in their predictability, they can also be in many cases totally useless for actually analyzing what is really going on.
We cannot predict the position of every molecule. We cannot predict when there might be a fluctuation. Fluctuations are phenomena which happen, which don't really violate probability theory, but are sort of tails of the probability distributions. They are things which are unlikely but not impossible. Fluctuations happen from time to time. In a sealed room, every so often a density of molecules bigger than the average will appear in some small region. Someplace else molecules may be briefly less dense. Fluctuations like that are hard to predict.
contain elements either too small, or too numerous, or for any reason too difficult to keep track of. That is when we use statistical mechanics, in other words probability theory.
Let's start this course with a brief review of the main concepts of elementary probability theory.
Elementary probability theory
is considered as likely as any other. Thus when we spin the wheel on the right part of figure 1, and wait until it has stopped and the fixed index points to a color, the five colors are said to be equiprobable.
Another example, of impossibility this time, is when we say "let's pick a point at random uniformly over the entire line". It is actually impossible. There is no such thing as a density of probability with the same value from −∞ to +∞⁴. So we have to be careful.
X(ω)   (1)

X : Ω → A   (2)
4. But a sequence of densities can become more and more flat, and therefore with almost the same value, necessarily close to zero, everywhere. Similarly, there is no such thing as a function whose value is zero everywhere except at one point and whose integral is one. But distributions, which are limits of functions, can be like that.
The set A can be finite, infinite and countable, or infinite and continuous.
A = {1, 2, 3, 4, 5, 6} (3)
P is defined such that any subset of Ω, called an event, has a probability. In the case of the die, it is particularly simple. Each ω is itself an interesting event. There are six of them. And if the die is well balanced they are equiprobable. Thus we write
P{X = 5} = 1/6   (4)
after the performance of E, and a probability P.
framework = [ E, Ω, P ] (6)
Let's now simplify the setting and the notations a bit. For the time being, the set of possible states will be

Ω = { ω1, ω2, ω3, ... ωn }   (7)
P (i) (8)
5. In maths manuals, the reader will usually see the framework described as [ Ω, A, P ], the experiment E not being mentioned, which in our teaching experience is regrettable. And the extra A, not to be confused with the target set of any random variable, is the collection of subsets of Ω, but not quite all of them, only the "measurable" ones. We won't be concerned here with those subtleties.
6. In this last case probabilities will be replaced by densities of probability. Instead of considering P{X = x}, which will usually be equal to 0, we will consider P{ X ∈ [ x, x + dx ] } = p(x)dx. And, following the custom, we will often still denote it P(x)dx, keeping in mind that each random variable has its own density.
They must satisfy

P(i) ≥ 0
Σ_{i=1}^{n} P(i) = 1   (9)

If N_i denotes the number of times the outcome i occurs in N repetitions of the experiment, then

lim_{N→∞} N_i/N = P(i)   (10)
times, the frequency will be even closer to 1/2. In each case, it is only a probabilistic statement. There can be, and in fact most of the time there will be, a discrepancy. That discrepancy is itself a random variable. But it will have a distribution more and more concentrated, relatively to its range, around 0.

Let's see why, in the case of tossing coins, the law is a simple counting result. Consider the experiment F which consists in tossing the coin 1000 times, i.e. repeating E 1000 times. The space Ω_F attached to F has 2^1000 elements, which is a huge number. Each of them is equiprobable. When we perform F once, i.e. when we repeat E a thousand times, we pick one element in Ω_F.
To try to shed even more light on the phenomenon, rather than do some combinatorics, consider the 16 possible results, displayed below, of throwing the coin four times. The reader can check that there is only one result with zero heads, four results with 1 head, six results with 2 heads, four results with 3 heads, and one result with 4 heads.
T, T, T, T
T, T, T, H
T, T, H, T
T, T, H, H
T, H, T, T
T, H, T, H
T, H, H, T
T, H, H, H
H, T, T, T
H, T, T, H
H, T, H, T
H, T, H, H
H, H, T, T
H, H, T, H
H, H, H, T
H, H, H, H
(a + b)^N. The concentration about half and half is more marked, of course, when N is larger than 4, and it grows more and more marked as N increases.
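For readers who want to see this concentration numerically, here is a small Python sketch (not part of the original text). It uses binomial coefficients, the entries of Pascal's triangle, to compute the fraction of the 2^N equiprobable sequences whose proportion of heads falls within 5% of one half:

```python
from math import comb

# Fraction of the 2**N equiprobable sequences of N tosses whose
# proportion of heads lies between 45% and 55%.
def concentrated_fraction(N, half_width=0.05):
    lo = (0.5 - half_width) * N
    hi = (0.5 + half_width) * N
    favorable = sum(comb(N, k) for k in range(N + 1) if lo <= k <= hi)
    return favorable / 2**N

for N in [4, 100, 1000]:
    print(N, concentrated_fraction(N))
```

The fraction climbs toward 1 as N grows, which is the law of large numbers seen through simple counting.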
7. Named after Pafnuty Chebyshev (1821 - 1894), Russian mathematician.
8. Another beautiful and useful result is the Central Limit Theorem, which shows in essence that the Pascal triangle looks more and more like a bell-shaped curve called a Gaussian. And it is true in a much more general setting than just flipping a coin many times.
F can be some meaningful physical quantity. We can also make it up. For example, if our system is heads and tails, and nothing but heads and tails, we could assign

F(heads) = +1
F(tails) = −1   (11)

F : Ω → A   (12)

E(i)   (13)
Then an important quantity is the average of F(i). We will use the quantum mechanical notation for it, even though we are not doing quantum mechanics. It is a nice notation. Physicists tend to use it all over the place. Mathematicians hate it. We just put a pair of brackets around F to mean its average. It is defined as follows

⟨F⟩ = Σ_{i=1}^{n} F(i) P(i)   (14)
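As a small illustration (a sketch, not from the text), formula (14) applied to the coin variable F of equation (11), with a fair coin, gives ⟨F⟩ = (+1)(1/2) + (−1)(1/2) = 0:

```python
# Average <F> = sum over states of F(i) P(i), formula (14),
# for the coin random variable of equation (11).
F = {"heads": +1, "tails": -1}
P = {"heads": 0.5, "tails": 0.5}

average = sum(F[s] * P[s] for s in P)
print(average)  # 0.0 for a fair coin
```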
what a random variable or random measurement is, and what an average is, because we will use them over and over.
Probability, symmetry
and systems evolving with time
Let's start with coin flipping again. For the usual coin, the probability for heads is usually deemed to be 1/2, and the probability for tails is usually also deemed to be 1/2. But why do we think that? Why is it 1/2 and 1/2? What is the logic there?
Another example besides coin tossing would be die throwing. Now the space Ω, instead of having two states, has six states. When we throw the die, and it has finished rolling, the state it is in is the face showing up. To stress that states don't have to be numerical values, let's consider a die with colored faces, figure 2.

What is the probability that after a throw the die turns up, for instance, blue like in figure 2? It is 1/6. Why? Because there are six possibilities. They are all symmetric with respect to each other. We use the principle of symmetry that tells us that the P(i)'s are all equal, therefore
P(i) = 1/6 for all i   (17)
Let's take our six-sided polyhedron, and assume that it is not a nice symmetric cube. Furthermore suppose this object has the following funny habit: when it is in one state, it then jumps to another definite state, then to another definite state, etc. This rule is called the law of motion of the system.
R → B
B → G
G → Y
Y → O
O → P
P → R
We might have no idea when and in which state the system began. But suppose our job is to catch it at a particular instant and ask what the color is. Even though we don't know the starting point, we can still say that the probability for each state is 1/6.
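A quick simulation makes this concrete. The sketch below (illustrative code, not from the text) follows the cyclic law R → B → G → Y → O → P → R given above: it starts from an arbitrary unknown color, lets the system run for a random number of steps, and records the color caught; each color comes out with frequency close to 1/6:

```python
import random

# Deterministic cyclic law of motion: each color jumps to the next one.
law = {"R": "B", "B": "G", "G": "Y", "Y": "O", "O": "P", "P": "R"}

trials = 60000
counts = {c: 0 for c in law}
for _ in range(trials):
    state = random.choice(list(law))        # unknown starting color
    for _ in range(random.randrange(100)):  # catch it at a random instant
        state = law[state]
    counts[state] += 1

print({c: round(n / trials, 3) for c, n in counts.items()})  # all near 1/6
```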
cycle of events in which we pass through each color once before we cycle around.

Are there possible laws for the system which will not give us 1/6? Yes. Let's write another law, figure 5.
Notice in this case, when we are on one of the two cycles, we stay on it forever.

In fact, let's give the cycles labels. The upper cycle we call +1. And the lower triangle we call −1. It is just attaching to them a number, a numerical value. If we are on the upper cycle, something or other is called +1, and below it is −1.
Now we have to append something that we get from some place else. It doesn't follow from symmetry. And it doesn't follow from cycling through the system. It is the probability¹⁰ that we are on cycle +1, or its complement to one that we are on cycle −1.

P{ cycle = +1 }
P{ cycle = −1 }   (18)
10. Again, talking about a probability means talking about a random experiment, at least an implicit one, that can be repeated. This touches on the debate between Frequentists and Bayesians in statistics. The latter are willing to consider probabilities of events in experiments that can be performed only once, whereas the former require that there exist a reproducible experiment. We refer the reader to the statistics literature on the subject.
where P{blue} is the overall probability to get blue, P{+1} is an abbreviation for P{ cycle = +1 }, and P{ blue | +1 } is the standard notation for the conditional probability to get blue given that we already know that we are on the first cycle.
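Numerically, the overall probability is then obtained from the law of total probability. Here is a minimal sketch (the cycle probabilities are made-up values, and we assume for illustration that blue is one of the three colors of cycle +1 and absent from cycle −1):

```python
# Law of total probability for the two-cycle system of figure 5:
#   P{blue} = P{+1} P{blue | +1} + P{-1} P{blue | -1}
# The numbers below are assumed values, for illustration only.
P_plus, P_minus = 0.7, 0.3
P_blue_given_plus = 1 / 3    # blue assumed to be one of 3 colors on cycle +1
P_blue_given_minus = 0.0     # blue assumed absent from cycle -1

P_blue = P_plus * P_blue_given_plus + P_minus * P_blue_given_minus
print(P_blue)  # 0.2333...
```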
doesn't change. That is what a conservation law is.
comes from is part of the study of statistical mechanics.
Answer: Right. That comes from the fact that the time
spent in each state is the same.
at which we take the measurement of the color.

A.: Oh yes. If there are several distinct cycles, like for instance in figure 5, and we take two pictures one after the other, the two colors we shall obtain will be on the same cycle.
the probabilities for the various possible configurations of the system.
Figure 8: Two closed systems.
such value, given some overall piece of information.
12. Defense Of Marriage Act.
Conservation of information is not a standard conservation
law like those we just studied, saying that there are certain
quantities which are conserved when a system evolves over
time through a cycle or any other kind of trajectory.
When we are at red, where would the law say we should go? It wouldn't say, because there are several arrows pointing at red in figure 10, and therefore when we reverse time several arrows would leave red for different states. That would not be a law at all, because in physics a law of evolution must be deterministic¹³.

A good law of motion is one such that for each state, one and only one arrow arrives at it, and one and only one arrow leaves it.
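In other words, a good law of motion is a permutation of the set of states. A minimal sketch (hypothetical helper code, not from the text) that checks this condition for a law given as a table:

```python
# A law of motion given as a dict state -> next state is deterministic
# by construction; it is also reversible iff each state receives exactly
# one arrow, i.e. the values are a permutation of the keys.
def is_reversible(law):
    return sorted(law.values()) == sorted(law.keys())

good = {"R": "B", "B": "G", "G": "R"}   # one arrow in, one arrow out
bad = {"R": "B", "B": "R", "G": "R"}    # two arrows arrive at R

print(is_reversible(good))  # True
print(is_reversible(bad))   # False: this law loses information
```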
The law in figure 10 loses information. That is exactly the kind of thing that classical physics doesn't allow. Quantum physics also doesn't allow laws of motion that are not deterministic in the future, or not deterministic in the past, i.e. not reversible.
at the foundation of what knowledge means, as opposed to
phantasmagoric beliefs.
d²x_n/dt² = −γ dx_n/dt   (21)
15. Named after Joseph Liouville (1809 - 1882), French mathematician.
The left-hand side is acceleration. And the right-hand side corresponds to a frictional force opposing motion. The factor γ is the viscous drag. We could put masses on the left-hand side multiplying the second derivative, but that would not change anything for our purpose, so let's take them all equal to 1.
We start with a random bunch of particles moving in random directions, and we let it run, and they all come to rest. What we end up with is simpler and requires less information to describe than what we started with. It is very much like, in figure 10, everything going to red.
other three colors have probability zero. Where did we get that from? It doesn't matter. We got it from somewhere. Somebody secretly told us in our ear: "It is either red, yellow or blue, but I'm not going to tell you which."¹⁶

They may get reshuffled, which ones are probable and which ones are improbable. But after a certain time, there will continue to be three states which have each probability 1/3
16. The upper face of the die actually shows one color, that is the die is in one specific state, but for some reason we don't know it. We only know the partial information given by the chap.
We could also imagine the die having three faces with three shades of green, and the others with three shades of purple. And, because we did not take a good enough look, we only got the general color of the state, not the precise shade.
And to avoid being in a Bayesian setting, where probabilities don't correspond to any reproducible experiment and therefore, to many of us, would be meaningless, let's imagine that at least theoretically we could play the game many times and those three colors would turn up randomly with three equal probabilities, because the die is loaded or for any other reason.
and the rest probability zero.
have non-zero probability will remain constant and their probabilities will remain equal to 1/M.
17. As already explained, if we could reproduce the random experiment generating these probabilities, and we had omnipotent powers to measure time and states, we would see that at time 0 a certain set of M states occur more or less equally frequently, the others having 0 probability. And at another time t, it would be another set of states which would be possible, the others having 0 probability. This other set would have the same number M of states, as a consequence of the conservation of information, and they would still be equiprobable.
If M is equal to N/2, that means we know that the system
is in one out of half the states. We are still pretty ignorant,
but we are not that ignorant. We are less ignorant.
Entropy
S = log M (23)
observer, perhaps there are too many degrees of freedom to keep track of.
It depends on two things. It depends on characteristics of
the system, and it also depends on our state of knowledge
of the system. The reader should keep that in mind.
20. Note on terminology: sometimes the term configuration space refers only to positions, as we did in volume 1; and sometimes it refers to positions and momenta, in which case it is another name for the phase space. Anyway, phase space is the usual and unambiguous name for the space of positions and momenta.
that the phase space is the space of positions and velocities.
this axis is a stand-in for all of the momentum degrees of freedom, that is all the p_i's. If there are 10^23 particles there are 10^23 p's. And each of them is actually a vector of three values. But we can't draw more than one of them. Well, we could draw two of them but then we wouldn't have any room for the q's, that is for the x's.
In the sub-region shown in figure 12, all the states have equal probability. The time is fixed for the moment. That is, at that time t, the system can be in any of the states in the patch, equally probably for each of them. And the probability that the system be outside is zero.

Let's note, to start with, that knowing that the particles are in the room puts some boundaries on where x is in figure 12. Remember that we now call x the entire collection of positions of the particles. x is a unique multi-dimensional variable attached to the state of the system. And for convenience we represent it as a simple number on the horizontal axis.
Now the system evolves over time. As the system evolves, x and p change. Over a period ∆t, each point in the sub-region in figure 12 will go to another point elsewhere in the figure, not necessarily in the initial region. Points nearby will go to points nearby. The whole sub-region, also called the occupied patch, will flow to another region with a possibly different shape.

And just like there was equal probability for the system to be at any point in the initial patch in figure 12, after ∆t there is equal probability for the system to be at any point in the new patch, wherever it is and whatever its new shape is now.
by small volumes, but the reasoning is the same.
Hint: Start with the discrete case corresponding to formulas (22), and then go to the continuous case of figure 13.

Let's stress that Liouville's theorem not only says that the volume of the occupied region will stay the same as it flows, but also, a little bit better, that if we start with a uniform probability distribution, it will remain uniform.
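A numerical illustration may help (a sketch under simple assumptions, not from the text). For equation (21), in one dimension and for a single particle, the time-t map can be written exactly: with γ = 0 it is the frictionless shear (x, p) → (x + pt, p), which preserves phase-space area, while with γ > 0 the area of any patch shrinks by the factor e^(−γt). The code evolves the corners of a unit square of initial conditions and measures the image's area with the shoelace formula:

```python
from math import exp

def flow(x, p, t, gamma):
    # Exact time-t solution of x'' = -gamma x' (equation (21), mass 1).
    if gamma == 0.0:
        return x + p * t, p          # frictionless: Liouville applies
    return x + p * (1 - exp(-gamma * t)) / gamma, p * exp(-gamma * t)

def area(corners):
    # Shoelace formula; the image of a square under a linear map
    # is a parallelogram, so its four corners suffice.
    s = 0.0
    for (x1, p1), (x2, p2) in zip(corners, corners[1:] + corners[:1]):
        s += x1 * p2 - x2 * p1
    return abs(s) / 2

square = [(0, 0), (1, 0), (1, 1), (0, 1)]  # unit patch of occupied states
for gamma in [0.0, 0.5]:
    image = [flow(x, p, t=2.0, gamma=gamma) for (x, p) in square]
    print(gamma, area(image))  # 1.0 when gamma = 0, exp(-gamma t) otherwise
```

With friction the patch shrinks, which is precisely the kind of information loss that a fundamental law is not allowed to produce.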
What Liouville's result says in fact is that if the blob squeezes in one direction, it must expand in another direction, like a drop of water trapped beneath a plastic film coating a surface, which we can move around but not remove.

For the case of the eraser sliding on the table, there is really a very high dimensional phase space. And as the eraser may come to rest, or almost rest, so that the phase space squeezes one way, it spreads out in another direction having to do with the other hidden microscopic degrees of freedom.

As the reader may be aware, we will see that it can also go up. That will be the subject of the second law of thermodynamics.
vation of information, is expressed in the context of statistical mechanics and imperfect knowledge of all the degrees of freedom describing a system.
thermal equilibrium with C.

dE/dt = 0   (24)

Now this is the law of energy conservation for a closed system.
Figure 14: Two systems forming altogether one closed system.

dE_A/dt = −dE_B/dt   (25)

We could have written that the sum of the two derivatives is equal to zero, but equation (25) emphasizes that what energy we lose on one side we gain on the other.

If you have two systems and they interact with each other, there may be, for example, forces between the two parts. So there might be a potential energy that is a function of both of the coordinates.
System made of the Sun and the Earth: the energy consists of the kinetic energy of one body, plus the kinetic energy of the other body, plus a term which doesn't belong to either body. It belongs to both of them in a sense. It is the potential energy of interaction between the Sun and the Earth.

On the other hand, there are many contexts where the interaction energies between subsystems are negligible compared to the energy that the subsystems themselves have.
and which we omit. In those circumstances, the first law of thermodynamics is often expressed with equation (25).
More on entropy
Figure 15: Equiprobable distribution over the set of occupied states.

On the horizontal axis are laid out the different states labelled by their index i. And vertically we plot their probability. The probability distributions we considered are those such that inside a subset of occupied states all the probabilities have the same positive value, and outside the probabilities are zero. If there are M occupied states, the probability of each of them is 1/M.

S = log M   (26)
be anything as long as they are positive and all add up to 1.
Let's draw another one, and for simplicity, even though we
are in a discrete case, let's draw it as a continuous curve.
out distribution will correspond to a high entropy, while a narrow distribution over a small number of states will correspond to a small entropy.

Let's see what this yields in the special case where the probability distribution is 1/M, over M states.

For the unoccupied states the term P(i) log P(i) looks like it is going to cause a problem. It is zero times logarithm of zero. Of course logarithm of zero is not defined. The limit when P tends to 0 of log P is minus infinity. But the logarithm grows, in the negative numbers, to infinity much more slowly than P goes to zero. So P log P goes to 0. It is left as a little exercise for the reader to show that. So in general the contribution from states with very small probability will be very small, and by continuity the contribution of unoccupied states will be zero.
S = − Σ_{i=1}^{M} (1/M) log(1/M)

S = log M

Notice that the minus sign in the general definition (27) for the entropy is there simply because P(i) is always less than one, therefore all the log P(i) are negative. If we want the entropy to be a positive number, we have to introduce a minus sign in its definition.
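As a quick check (a small sketch, not from the text), here is definition (27) in code, with the convention, justified above, that terms with P(i) = 0 contribute zero; for the uniform distribution over M occupied states it returns log M:

```python
from math import log

# Entropy S = -sum_i P(i) log P(i), definition (27).
# Terms with P(i) = 0 contribute 0, since P log P -> 0 as P -> 0.
def entropy(P):
    return -sum(p * log(p) for p in P if p > 0)

M, n = 3, 6                        # 3 occupied states out of 6
P = [1 / M] * M + [0.0] * (n - M)
print(entropy(P), log(M))          # both equal log 3 = 1.0986...
```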
should check that it is consistent with formula (27).
state it is in, it offers new and stunning explanations of what entropy and temperature are, and their relations to energy and pressure. We will discover this beauty little by little as we progress in our study. The present chapter is devoted to the probabilistic framework and to the introduction of entropy. Temperature will be treated in chapter 2, pressure in chapter 5. And we will study many other subjects in statistical mechanics.
Figure 17: Collection of n coins, each showing heads or tails.

N = 2^n   (28)

There are two states for the first coin, two states for the second coin, etc. That makes 2^n possible states for the collection of n coins.
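Indeed, with no further knowledge all N = 2^n states are occupied and equiprobable, so formula (26) gives in one line

S = log N = log(2^n) = n log 2

which is equation (29) below.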
S = n log 2 (29)
So complete knowledge corresponds to zero entropy. And,
generally speaking, the more you know about the system,
the smaller the entropy is.
S = log n (30)
Nevertheless log 2 is a good unit of entropy. And it is called
the bit.
Questions / answers session (2)
In this book, when we write log we mean logarithm to the base e. But very little would change if we used some other base for the logarithms. For us physicists the maximum entropy of the collection of n coins is approximately 0.7 × n. But anyway we just write it n log 2.

Figure 19: Region, or blob, in the phase space formed by the occupied states.
Then the definition of the entropy should be simple: we take the logarithm of the number of states in the blob, except that we are in the continuous case and the number of occupied states is infinite.

S = log V_PS   (31)
becomes now the integral of a probability density over the phase space

∫ P(p, x) dp dx = 1   (33)

The density P(p, x) is non-zero only over the blob of occupied states, but that doesn't make any difference in formula (33).
equilibrium yet but we will.
To say, like in the first example, that all states are equally probable is closely related to saying that there are no correlations.
28. Abbreviation for random variable.
We can also define and compute a joint probability distribution for (X, Y). It is denoted P_XY. For instance P_XY(H, H) = 1/2, P_XY(H, T) = 1/4, etc.

P_XY = P_X P_Y   (37)
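In code (a sketch; the (H, H) and (H, T) joint values are those just quoted, the remaining two are filled in hypothetically so that the distribution sums to 1), the marginals are obtained by summing the joint over the other variable, and independence, equation (37), means the joint factorizes:

```python
from itertools import product

# Joint distribution of (X, Y). The (H,H) and (H,T) values are those
# quoted in the text; the other two are assumed for illustration.
joint = {("H", "H"): 1/2, ("H", "T"): 1/4, ("T", "H"): 1/4, ("T", "T"): 0.0}

# Marginals: P_X(x) = sum over y of P_XY(x, y), and symmetrically for Y.
P_X = {x: sum(joint[(x, y)] for y in "HT") for x in "HT"}
P_Y = {y: sum(joint[(x, y)] for x in "HT") for y in "HT"}

# Independence, equation (37): P_XY(x, y) = P_X(x) P_Y(y) for all x, y.
independent = all(abs(joint[(x, y)] - P_X[x] * P_Y[y]) < 1e-12
                  for x, y in product("HT", "HT"))
print(independent)  # False: these two random variables are correlated
```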
variables are uncorrelated iff the expectation of their product is the product of their expectations²⁹, that is iff
29. We use in formulas (38) and (39) the standard notations of probabilists.
Now let's look at the case where what we know is: (n − 1) are heads, and one is tails, but we don't know which one. If we look at the first coin and it is heads, does it give us any information on the second one? Well, not much, but actually a little bit. Instead of having probability 1/n of being tails, it is now 1/(n − 1). So the coin tosses cannot have been independent. And in fact if the first coin is tails, then the second coin is surely heads, etc.
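This can be verified by brute-force enumeration (a small sketch, not from the text): list the n equiprobable configurations with exactly one tails, and compare the unconditional and conditional probabilities that the second coin is tails:

```python
n = 10

# The n equiprobable configurations: the single tails is at position i.
configs = [["T" if j == i else "H" for j in range(n)] for i in range(n)]

p_tails = sum(c[1] == "T" for c in configs) / len(configs)
seen_heads = [c for c in configs if c[0] == "H"]   # first coin observed heads
p_tails_given = sum(c[1] == "T" for c in seen_heads) / len(seen_heads)

print(p_tails)        # 1/n = 0.1
print(p_tails_given)  # 1/(n-1) = 0.111..., so the coins are not independent
```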
Concerning entropy we have to be much more careful.
bors.
is due to Boltzmann.
S = k log W (41)
81
Figure 20: Boltzmann's tombstone.
entropy.
A.: No. These are two separate issues. S doesn't have to do with the quantum mechanical uncertainty which we encounter when measuring an observable, if the state of the system is not one of the eigenstates of the observable.

c = h̄ = G = k_B = 1
But the energy of a molecule for example is approximately equal to its temperature in certain units. Those units contain a conversion factor k_B.