Probability Handbook Revised
Guido Bacciagaluppi†
14 February 2014
to its angular momentum in the same direction (i.e. proportional to how
fast it is spinning along that axis). In quantum mechanics one observes a
similar effect, except that the object is deflected only by discrete amounts,
as if its classical angular momentum could take only certain values (varying
by units of Planck’s constant ħ). Some particles even seem to possess an
intrinsic such ‘spin’ (i.e. not derived from any rotational motion), e.g. so-
called spin-1/2 systems, such as electrons, which get deflected by amounts
corresponding to spin values ±ħ/2. Experiments of this kind are known as
Stern–Gerlach experiments, and they can be used to illustrate some of the
most important features of quantum mechanics.
Imagine a beam of electrons, say, moving along the y-axis and encoun-
tering a region with a magnetic field inhomogeneous along the x-axis. The
beam will split in two, as can be ascertained by placing a screen on the other
side of the experiment, and observing that particle detections are localised
around two distinct spots on the screen (needless to say, real experiments
are a little messier than this).1
The first thing to point out is that if we send identically prepared elec-
trons one by one through such an apparatus, each of them will trigger only
one detection, either in the upper half or the lower half of the screen, with
probabilities depending on the initial preparation of the incoming electrons.
The second thing to point out is that the same is true whatever the
direction in which the inhomogeneous magnetic field is laid, whether along
the x-axis, the z-axis, or any other direction: the beam of electrons will
always be split in two components, corresponding to a spin value ±ħ/2 along
the direction of inhomogeneity of the field. If the incoming beam happens
to be prepared by selecting one of the deflected beams in a previous Stern–
Gerlach experiment (say, a beam of ‘spin-x up’ electrons), then the probabilities
for detection in a further Stern–Gerlach experiment in a direction x′
depend only on the angle ϑ between x and x′ (and are given by cos²(ϑ/2)
and sin²(ϑ/2)). So, for example, the probability of measuring spin-z ‘up’ or
‘down’ in a beam of spin-x ‘up’ electrons is 1/2.
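As a hedged numerical illustration of the cos²(ϑ/2) and sin²(ϑ/2) rule, the following minimal Python sketch evaluates the two detection probabilities for a beam prepared as spin ‘up’ along one direction and measured along another. The function name `detection_probabilities` is ours and purely illustrative.

```python
import math

def detection_probabilities(theta):
    """Return (P_up, P_down) along x' for a beam prepared spin-x up.

    theta is the angle (in radians) between the directions x and x'.
    """
    p_up = math.cos(theta / 2) ** 2
    p_down = math.sin(theta / 2) ** 2
    return p_up, p_down

# Spin-z measured on a spin-x up beam: the two directions are orthogonal.
p_up, p_down = detection_probabilities(math.pi / 2)
```

For orthogonal directions both values come out as 1/2 (up to rounding), and for ϑ = 0 the beam is not split at all.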
be the joint distribution for the outcomes of different experiments, because
performing one kind of experiment (say, measuring spin-z) disturbs the prob-
abilities relating to other subsequent experiments (say, spin-x). Indeed, if we
imagine performing a spin-z followed by a spin-x measurement on electrons
originally prepared in a spin-x up state, we shall get a 50–50 distribution
for the results of the last measurement (whether we previously got spin-z
up or down), although the original beam was 100% spin-x up. At least in
this sense, different measurements in general are incompatible.
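The disturbance described above can be checked with a small calculation. The sketch below is only illustrative: it assumes the standard representation of the spin-z states as equal-weight combinations of the spin-x states, restricts itself to real amplitudes, and uses helper names (`prob`, `final_x_up`) of our own choosing.

```python
import math

S = 1 / math.sqrt(2)
PLUS_X, MINUS_X = (1.0, 0.0), (0.0, 1.0)   # spin-x states, written in the spin-x basis
PLUS_Z, MINUS_Z = (S, S), (S, -S)          # spin-z states in the same basis (assumed form)

def prob(outcome, state):
    """Probability |<outcome|state>|^2 for real two-component unit vectors."""
    return (outcome[0] * state[0] + outcome[1] * state[1]) ** 2

def final_x_up(initial):
    """Probability of spin-x up after an intermediate spin-z measurement.

    Sums over both spin-z outcomes, each followed by collapse to that outcome.
    """
    return sum(prob(z, initial) * prob(PLUS_X, z) for z in (PLUS_Z, MINUS_Z))

direct = prob(PLUS_X, PLUS_X)        # no intermediate measurement
disturbed = final_x_up(PLUS_X)       # spin-z measured first
```

Without the intermediate measurement the beam is certain to come out spin-x up; with it, the final distribution is 50–50 whichever spin-z result was obtained.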
There is thus a genuine puzzle (one of many!) about whether and how
the probability measures defined over the outcome spaces of the different ex-
periments can be combined. This will be one of the main questions discussed
in this article.
complex vector space (with the usual scalar product, which we denote ⟨ϕ|ψ⟩).
Vectors are usually normalised to unit length, since any two vectors |ψ⟩ and
α|ψ⟩ that are multiples of each other are considered physically equivalent.
Each pair of up and down states is taken to correspond to an orthonormal
basis, e.g. the spin-x and spin-z states, related by
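\[ |{+}_z\rangle = \frac{1}{\sqrt{2}}\bigl(|{+}_x\rangle + |{-}_x\rangle\bigr) , \qquad |{-}_z\rangle = \frac{1}{\sqrt{2}}\bigl(|{+}_x\rangle - |{-}_x\rangle\bigr) . \tag{2} \]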
Thus, while each spin-z state can be split into both up and down components
in the spin-x basis, the down components, say, can cancel out again if the
two spin-z states are appropriately combined:
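\[ \frac{1}{\sqrt{2}}\bigl(|{+}_z\rangle + |{-}_z\rangle\bigr) = \frac{1}{\sqrt{2}}\left[\frac{1}{\sqrt{2}}\bigl(|{+}_x\rangle + |{-}_x\rangle\bigr) + \frac{1}{\sqrt{2}}\bigl(|{+}_x\rangle - |{-}_x\rangle\bigr)\right] = |{+}_x\rangle . \tag{3} \]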
These operators are in fact unitary, i.e. preserve the length of vectors (and
more generally scalar products between them). In a Stern–Gerlach measure-
ment, the relevant unitary evolution is generated by an operator containing
a term proportional to a ‘spin operator’, e.g. the z-spin operator S_z, written
\[ S_z = \frac{\hbar}{2}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \tag{5} \]
in the spin-z basis, which thus simply multiplies by a scalar the spin-z
vectors:
\[ S_z\,|{\pm}_z\rangle = \pm\frac{\hbar}{2}\,|{\pm}_z\rangle \tag{6} \]
(the spin-z states are eigenstates of the operator with corresponding eigenvalues
±ħ/2). During the measurement, this term couples the spin-z eigenstates
(which it leaves invariant) to the spatial degrees of freedom of the
electron, thus deflecting the motion of the electron:
(where we assume that |ψ±⟩ are states of the spatial degrees of freedom of the
electron in which the electron is localised in two non-overlapping regions).
Measurable quantities in quantum mechanics (usually called ‘observ-
ables’) are traditionally associated with such operators, the corresponding
eigenvalues being the values the observable can take.3 Two observables thus
understood will be compatible if they have all eigenvectors in common, or
equivalently if the associated operators commute, i.e. AB|ψ⟩ = BA|ψ⟩ for all
states |ψ⟩. Incompatibility of quantum mechanical observables is intuitively
related to the idea that measurements of non-commuting observables gen-
erally require mutually exclusive experimental arrangements (implemented
through appropriate unitary operators).
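The commutation criterion is easy to test in the spin-1/2 case. The following sketch works in units of ħ/2 and in the spin-z basis; the matrix used for S_x is the standard one and is an assumption here, since only S_z is written out in the text. The helper names are ours.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def commute(a, b):
    """True iff AB = BA (exact for integer entries)."""
    return matmul(a, b) == matmul(b, a)

SZ = [[1, 0], [0, -1]]        # S_z in units of hbar/2, spin-z basis
SX = [[0, 1], [1, 0]]         # S_x in the same units and basis (standard form, assumed)
IDENTITY = [[1, 0], [0, 1]]

sz_sx_compatible = commute(SZ, SX)
sz_id_compatible = commute(SZ, IDENTITY)
```

The two spin operators fail to commute, in line with the incompatibility of spin-z and spin-x measurements, whereas every operator trivially commutes with the identity.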
eigenvalues 1 and 0 (or 0 and 1). Thus, for instance,
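\[ |{+}_x\rangle = \frac{1}{\sqrt{2}}\bigl(|{+}_z\rangle + |{-}_z\rangle\bigr) \mapsto P_{+z}\,\frac{1}{\sqrt{2}}\bigl(|{+}_z\rangle + |{-}_z\rangle\bigr) = \frac{1}{\sqrt{2}}\,|{+}_z\rangle \tag{9} \]
or
\[ |{+}_x\rangle \mapsto P_{-z}\,\frac{1}{\sqrt{2}}\bigl(|{+}_z\rangle + |{-}_z\rangle\bigr) = \frac{1}{\sqrt{2}}\,|{-}_z\rangle . \tag{10} \]
(The final state is then thought of as renormalised, i.e. rescaled to the unit
vector |+_z⟩ or |−_z⟩, respectively.)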
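The projection-then-renormalisation step can be sketched as follows. This is a minimal illustration for real two-component vectors in the spin-z basis; the function name `collapse` and the variable names are ours, and the code assumes the outcome has non-zero probability.

```python
import math

S = 1 / math.sqrt(2)
PLUS_Z, MINUS_Z = (1.0, 0.0), (0.0, 1.0)   # spin-z basis vectors
PLUS_X = (S, S)                            # spin-x up, written in the spin-z basis

def collapse(target, state):
    """Project `state` onto the unit vector `target`.

    Returns (probability, renormalised state). Real amplitudes only;
    raises ZeroDivisionError if the outcome has probability zero.
    """
    amplitude = sum(t * s for t, s in zip(target, state))
    projected = [amplitude * t for t in target]
    probability = sum(c * c for c in projected)
    norm = math.sqrt(probability)
    return probability, [c / norm for c in projected]

p_up, state_up = collapse(PLUS_Z, PLUS_X)
p_down, state_down = collapse(MINUS_Z, PLUS_X)
```

The squared length of the projected vector gives the probability 1/2 of each outcome, and the rescaled vector is the corresponding spin-z state.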
Equivalently, since every (σ-)field of sets is a Boolean (σ-)algebra, and by
Stone’s theorem every Boolean (σ-)algebra is representable as a (σ-)field
of sets, one can take the event space B to be a Boolean (σ-)algebra, and
re-express (C1)–(C3) accordingly:
where we use 0 and 1 to denote also the zero and unit elements of the
algebra. In the following, we shall take the set Ω to be discrete (finite or
denumerable), and we shall take for simplicity all singletons {ω} ⊂ Ω to be
measurable.
and for any subset J of I:
\[ p\Bigl(\sum_{i\in J} e_i\Bigr) := \sum_{i\in J}\sum_{\omega\in\Omega} e_i(\omega)\,p(\{\omega\}) . \tag{13} \]
(This is correctly normalised because of (11) and (C3′).) In the special case
in which all e_i are ‘sharp’ (e_i(1 − e_i) = 0, where 0 is the random variable that
is identically 0), i.e. characteristic functions, we see that the probabilities
are just the measures of the sets defined by the characteristic functions
Σ_{i∈J} e_i, so that the ‘sharp observables’ are in bijective correspondence with
the (finite or denumerable) partitions of Ω, and ‘measuring’ a sharp observable
is simply a procedure for distinguishing between the events forming
such a partition.
(we can call this the finest sharp observable). Now, every effect e can be
written as
\[ \sum_{\omega\in\Omega} e(\omega)\,\chi_{\{\omega\}} . \tag{15} \]
Since for any observable {e_i}, in particular also for (14), the probability of
each e_i is given by (12), we have that
\[ p(\chi_{\{\omega\}}) = \sum_{\omega'\in\Omega} \chi_{\{\omega\}}(\omega')\,p(\{\omega'\}) = \chi_{\{\omega\}}(\omega)\,p(\{\omega\}) = p(\{\omega\}) , \tag{16} \]
and thus
\[ p(e_i) = \sum_{\omega\in\Omega} e_i(\omega)\,p(\chi_{\{\omega\}}) . \tag{17} \]
And we see that e_i(ω) can be interpreted as the conditional probability for
the response e_i in the experiment {e_i}, given that a (counterfactual) measurement
of the finest sharp observable would have yielded ω.
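A toy example may help to fix ideas. The sketch below computes the probabilities of the responses of an unsharp classical observable as weighted sums over the points of Ω, reading each e_i(ω) as a conditional response probability. The three-point space, the measure and the response functions are all hypothetical values chosen by us for illustration.

```python
from fractions import Fraction as F

# Hypothetical probability measure on a three-point sample space.
p = {"a": F(1, 2), "b": F(1, 3), "c": F(1, 6)}

# Hypothetical two-outcome observable: response functions e_i(omega),
# which sum to 1 at every point omega.
observable = {
    "yes": {"a": F(1), "b": F(1, 2), "c": F(0)},
    "no":  {"a": F(0), "b": F(1, 2), "c": F(1)},
}

def prob(effect):
    """p(e) = sum over omega of e(omega) * p({omega})."""
    return sum(effect[w] * p[w] for w in p)

probs = {name: prob(e) for name, e in observable.items()}
```

Here the response ‘yes’ is certain at a, impossible at c and a coin toss at b, so its overall probability is 1/2 + 1/6 = 2/3; the two response probabilities add up to 1 because the response functions do so pointwise.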
only the Boolean algebras associated with measurements of sharp observ-
ables that correspond to Boolean subalgebras of B. (As we shall have again
occasion to remark in Section 4, sharp observables have a number of useful
properties not shared by general observables.)
There are two further notions we wish to introduce with an eye to the
analogy with quantum probability. One is a notion of compatibility of ob-
servables. To this end, we first introduce the coarse-graining of observables:
the observable {e_i}_{i∈I} is a coarse-graining of the observable {g_k}_{k∈K} iff there
is a partition of the index set K = ∪_{i∈I} K_i such that for all i ∈ I,
\[ e_i = \sum_{k\in K_i} g_k \tag{18} \]
(C2″) p(1) = 1 ,
e : p ↦ p(e) . (19)
States are associated in the first place with (unit) vectors in the space.7
As in our example of a Stern–Gerlach measurement and the associated ‘col-
lapse’ of the state (9–10), experiments are generally and abstractly associ-
ated with probabilistic transformations of the states, corresponding to the
6
The Hilbert spaces used in standard quantum mechanics are over the complex num-
bers, and they are separable (i. e. they always have either a finite or a denumerable basis).
7
The notation we use is the so-called Dirac notation, in which scalar products are
denoted by angle brackets, ⟨ϕ|ψ⟩, and vectors (‘kets’) are denoted by right half-brackets,
|ψ⟩ (a left half-bracket, ⟨ϕ|, denotes the linear functional (‘bra’) assigning to each vector
|ψ⟩ the complex number ⟨ϕ|ψ⟩).
idea that ‘measurements’ induce an irreducible disturbance of a quantum
system:
|ψ⟩ ↦ A|ψ⟩ (20)
(where the right-hand side should be thought of as suitably renormalised).8
The probability for a transition of the form (20) is given by
which is the scalar product of the vector A|ψ⟩ with itself.⁹ The operator
A in (20) is arbitrary (in particular not necessarily unitary or self-adjoint),
the only restriction being that (21) be no greater than 1. Note that the
probability (21) depends only on the product A*A and not on the specific
transformation A (indeed, one can construct infinitely many other operators
B such that B*B = A*A). Operators of the form E = A*A for some
transformation A are called ‘effects’.¹⁰
This is required to hold for all possible unit vectors |ψ⟩, so in fact we have
\[ \sum_i A_i^{*} A_i = 1 , \tag{24} \]
8
As we point out at the end of this section, one can consider even more general trans-
formations, but that does not in the least affect the generality of what follows.
9
The operator A*, called the adjoint of A, is the operator defined by ⟨ϕ|A*ψ⟩ = ⟨Aϕ|ψ⟩
(ignoring niceties about domains of definition, which become vacuous if the Hilbert space
is finite-dimensional). Self-adjoint operators are operators with A* = A. Note that
(AB)* = B*A*, so that in particular an operator of the form A*A is self-adjoint. Note
also that for A self-adjoint, the mapping A ↦ ⟨ψ|Aψ⟩ (normally written A ↦ ⟨ψ|A|ψ⟩) is
a linear functional onto the positive reals that is normalised, in the sense that ⟨ψ|1|ψ⟩ = 1
(with 1 the identity operator), and is continuous with respect to the so-called operator
norm (again vacuously so in finite dimensions).
10
More explicitly, effects are operators E such that both E = A*A and 1 − E = B*B
for some operators A, B (thus ensuring that the expression (21) is between 0 and 1).
or, writing E_i := A_i*A_i,
\[ \sum_i E_i = 1 , \tag{25} \]
Note that the probability of an effect (in any given state) is independent
of which observable the effect is part of, i. e. which family of alternative
transformations is being implemented in a particular experiment. We shall
return to this ‘non-contextuality’ of probabilities in Sections 5 and 6 below.12
Suffice it to say now that it is a non-trivial feature because, unlike the
classical case, the same effect could be part of two mutually incompatible
observables.
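To make the notions of effect and of resolution of the identity concrete, here is a small exact check for a pair of real 2×2 transformations. The matrices A1 and A2 and the state are hypothetical examples of ours (for real matrices the adjoint is just the transpose); nothing here is specific to the constructions used in the text.

```python
from fractions import Fraction as F

def transpose(a):
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def effect(a):
    """E = A*A; for a real matrix the adjoint A* is the transpose."""
    return matmul(transpose(a), a)

def expectation(e, psi):
    """<psi|E|psi> for a real vector psi."""
    return sum(psi[i] * e[i][j] * psi[j] for i in range(2) for j in range(2))

# Hypothetical pair of transformations (neither unitary nor a projection).
A1 = [[F(1), F(0)], [F(0), F(4, 5)]]
A2 = [[F(0), F(3, 5)], [F(0), F(0)]]
E1, E2 = effect(A1), effect(A2)

psi = (F(3, 5), F(4, 5))                      # a unit vector
p1, p2 = expectation(E1, psi), expectation(E2, psi)
```

The two effects add up to the identity, so the two transition probabilities sum to 1 for any unit vector, and each probability depends on the transformation only through the corresponding effect.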
Any experiment that measures {G_k} also measures {E_i}. As in the classical
case, we call two observables {E_i} and {F_j} compatible iff there is an observable
{G_k} such that {E_i} and {F_j} are both coarse-grainings of {G_k}. The
observable {G_k} is called a joint observable for (or a joint fine-graining of)
{E_i} and {F_j}. Compatibility of two observables can be easily generalised
to joint compatibility of arbitrary sets of observables.
11
As in the classical case, one could consider more general observables (even in finite
dimensions), in which the sum in (25) is replaced or supplemented by an integral. For
simplicity, however, we consider only discrete resolutions of the identity.
12
More generally, the term non-contextuality is used to denote independence not only
of the observable measured but also of any details of the measurement context (Shimony
(1984) calls these, respectively, ‘algebraic’ and ‘environmental’ contextuality). Unless
we have a very weird theory, we can presumably assume that the observable measured
is fixed by the details of the experiment (which in quantum mechanics also determine
which transformation implements any particular effect). Cf. also our distinction between
‘observables’ and ‘experiments’ in Section 4.
The definition of an observable in any (older) textbook on quantum
mechanics is as a self-adjoint operator A, i.e. an operator with A* = A. But
this traditional definition corresponds to a special case of the one above.
Self-adjoint operators are diagonalisable, in the sense that A = Σ_i a_i P_i
with real a_i (the eigenvalues) and {P_i} a family of projections (self-adjoint
operators with P_i² = P_i, or P_i(1 − P_i) = 0, where 0 is the zero operator)
that are mutually orthogonal (P_i P_j = 0 for i ≠ j).¹³ Thus, each self-adjoint
operator is associated with a unique ‘projection-valued observable’,
i.e. a resolution of the identity (25), in which the effects E_i are in fact
projections (they are ‘sharp’, meaning E_i(1 − E_i) = 0), and which is finite
if the Hilbert space is finite-dimensional. Note also that a measurement
of such a ‘sharp observable’ can be implemented by taking A_i = P_i, since
P_i*P_i = P_i, i.e. each state is transformed to an eigenstate of the measured
observable. This is the usual ‘collapse postulate’ or ‘projection postulate’
of textbook quantum mechanics, corresponding to a ‘minimally disturbing’
measurement of a sharp observable.¹⁴
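The diagonal decomposition can be verified by hand in the simplest case. The sketch below takes the spin-x operator in units of ħ/2, written in the spin-z basis, and checks that its two eigenprojections are idempotent, mutually orthogonal and sum to the identity. The explicit matrices are the standard ones and are assumed here rather than taken from the text.

```python
from fractions import Fraction as F

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def combine(terms):
    """Matrix sum of coeff * matrix over (coeff, matrix) pairs."""
    return [[sum(c * m[i][j] for c, m in terms) for j in range(2)]
            for i in range(2)]

HALF = F(1, 2)
P_PLUS = [[HALF, HALF], [HALF, HALF]]       # projection onto spin-x up (assumed form)
P_MINUS = [[HALF, -HALF], [-HALF, HALF]]    # projection onto spin-x down
ZERO = [[0, 0], [0, 0]]
IDENTITY = [[1, 0], [0, 1]]

# A = sum_i a_i P_i with eigenvalues +1 and -1 (units of hbar/2).
SX = combine([(1, P_PLUS), (-1, P_MINUS)])
RESOLUTION = combine([(1, P_PLUS), (1, P_MINUS)])
```

The weighted sum reproduces the familiar off-diagonal spin-x matrix, while the unweighted sum of the projections is the identity, i.e. a projection-valued resolution of the identity in the sense of (25).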
sharp observables possess a joint projection-valued resolution of the iden-
tity.15 Indeed, since in finite dimensions all diagonal decompositions are
discrete, one can generalise it further to arbitrary sets of pairwise commut-
ing operators.
where all the eigenvalues e_k are positive and lie in the interval [0, 1].¹⁶ Now
suppose all effects E_i in an observable (25) commute: there will then exist a
single projection-valued resolution of the identity {R_k}, such that every E_i
can be written as
\[ E_i = \sum_k e^i_k R_k . \tag{30} \]
and we can again at least formally identify the coefficient e^i_k as the conditional
probability that the measurement of {E_i} yields i given that a measurement
of {R_k} would have yielded k.¹⁷ Thus, we can think of a commutative
15
This property of the sharp observables in quantum mechanics is known as ‘coherence’.
See also Section 5 below.
16
Indeed, an equivalent definition of an effect is as a self-adjoint operator with spectrum
in the interval [0, 1].
17
We have the correct normalisation because the E_i form a resolution of the identity
(25), so that
\[ \sum_i \sum_k e^i_k \langle\psi|R_k|\psi\rangle = \sum_i \langle\psi|E_i|\psi\rangle = \langle\psi|\sum_i E_i|\psi\rangle = 1 , \]
and choosing |ψ⟩ to be an eigenstate with eigenvalue 1 of, say, R_{k′}, we get
\[ 1 = \sum_i \sum_k e^i_k \langle\psi|R_k|\psi\rangle = \sum_i \sum_k e^i_k\,\delta_{kk'} = \sum_i e^i_{k'} , \]
effect-valued observable as (at least probabilistically equivalent to) a ‘noisy’
or ‘fuzzy’ or ‘unsharp’ measurement of an associated projection-valued ob-
servable.
(Q1) For all E, p_ψ(E) = ⟨ψ|E|ψ⟩ ∈ [0, 1] ,
sphere corresponds to the so-called maximally mixed state, which assigns
equal probabilities 1/2 to spin up or down in any spatial direction r.
Note that, in turn, effects can be identified with the mappings from the
states into [0, 1] defined by
E : |ψ⟩ ↦ p_ψ(E) . (32)
These mappings are also affine, i. e. map convex combinations of states to
the same convex combination of the corresponding probabilities.22
probability. The main lines of research in this tradition stem, respectively,
from the classic paper by Birkhoff and von Neumann (1936), which inau-
gurated the lattice-theoretic version of quantum logic (with its emphasis
on weakening the distributive law, originally in favour of modularity and
then of orthomodularity), and from the work on partial Boolean algebras by
Specker (1960) and Kochen and Specker (1965a,b, 1967) (with its emphasis
on partial operations).23
that each effect e defines an affine mapping from the states to the unit
interval:
e : P → [0, 1], p ↦ p(e) . (34)
It may be convenient to require that every such mapping corresponds to an
effect, but for our limited purposes we shall not do so.
0_A ∼ 0_B and 1_A ∼ 1_B (35)
(where 0_A and 1_A are the 0 and 1 elements of the Boolean algebra B_A, and
similarly for B_B). Indeed, p(0_A) = 0 and p(1_A) = 1 for all p, independently
of A. We can thus define an effect 0 and an effect 1 as
0 := [0_A] independently of A , (36)
and
1 := [1_A] independently of A . (37)
Similarly, for any A, B ∈ A, if e_A ∼ e_B then ¬e_A ∼ ¬e_B (where ¬ denotes
negation in the relevant Boolean algebra). Indeed, if p(e_A) = p(e_B) for all
p, then
Clearly, for any e, we have e⊥⊥ = e. Note, however, that it is perfectly possible
for some e that e⊥ = e (this is the case for both the response function
(1/2)1 in classical probability and the effect (1/2)1 in quantum probability).
with results of laboratory procedures, such as letting an electron pass through an inho-
mogeneous magnetic field and observing it hit the upper (or lower) half of a screen; or
opening a box and finding (or failing to find) a gem inside it. But this will clearly not
do. Despite the lip service to operationalism, such procedures will always abstract away
from aspects of the experimental setting deemed irrelevant, e. g. whether or not I am per-
forming the experiment standing on one leg. But to deem any detail of the experimental
arrangement irrelevant is already to make a theoretical decision. To quote two suggestive
examples: it might be important whether the magnetic field is stronger at the north or
the south pole of the magnet (this detail is irrelevant in standard quantum mechanics,
but makes all the difference in the description of spin measurements in de Broglie and
Bohm’s pilot-wave theory, as discussed e. g. by Albert (1992)); or it might be important
whether one opens box A together with box B or together with box C (we shall discuss
this example explicitly in Section 6).
The states naturally induce a partial ordering on the effects — which
will be a useful tool in the following — defined as
Note that
e ≤ f ⇔ f ⊥ ≤ e⊥ . (41)
We now introduce two important notions: compatibility and orthogonality
of effects.
Two effects e and f are compatible, written e ↔ f, iff there is an experiment
A and outcomes e_A ∈ B_A and f_A ∈ B_A such that e = [e_A] and f = [f_A].
That is, two effects are compatible iff they can be measured in a single
experiment. The definition of compatibility can be trivially extended to
finite sets of effects.
By the requirement above we also have that orthogonal effects are com-
patible. Indeed, e ⊥ f means that e and f ⊥ are comparable, and thus
compatible. But if e_A ∈ e and ¬f_A ∈ f⊥ are in the same Boolean algebra
B_A, then so are e_A and f_A, thus e and f are compatible. It also follows
that if e ⊥ f , i. e. p(e) + p(f ) ≤ 1 for all p, there is an effect e ⊕ f jointly
compatible with e and f , such that p(e ⊕ f ) = p(e) + p(f ) for all p. Indeed,
given the above, we can define
f = e ⊕ (f ⊖ e) , (44)
or
f = e ⊕ (e ⊕ f⊥)⊥ (45)
(the so-called ‘effect algebra orthomodular identity’).
If ordered chains are jointly compatible it follows that also jointly orthogonal
sets of effects are jointly compatible. Indeed, given a jointly orthogonal
set of effects {e_i}_{i=1}^N, the sequence of effects
e_1 , e_1 ⊕ e_2 , (e_1 ⊕ e_2) ⊕ e_3 , . . . (46)
is an ordered chain, and so is a jointly compatible set. But if e_{1A}, e_{1A} ∨ e_{2A}, . . .
are in the same Boolean algebra B_A, so are e_{1A}, e_{2A}, . . ., and the original set
{e_i} is jointly compatible.
(G1) For all e ∈ E, p(e) ∈ [0, 1] ,
(G2) p(1) = 1 ,
Since jointly orthogonal sets are jointly compatible, to each such observable
on E there corresponds at least one experiment A ∈ A. Coarse-graining
and compatibility of observables can be defined as above, and compatibility
of two effects e and f is trivially equivalent to compatibility of the two
observables {e, 1 − e} and {f, 1 − f }.
Proof :
Define ⊕ as above. Since the relation ⊥ is symmetric, if e ⊕ f is defined, so
is f ⊕ e, and (E1) follows because
Next, assume that e ⊕ f and (e ⊕ f ) ⊕ g are defined, i. e.
Then also
Further, for each e the unique element satisfying (E3) is the element e⊥
defined by (39): clearly e ⊕ e⊥ is defined and e ⊕ e⊥ = 1; and because of
(43), if there are two effects f and f′ both satisfying
Note that in any effect algebra, we can abstractly define a partial order
e ≤ f as: there is a g such that e ⊕ g = f . Given our definition of ⊕, it
follows from (43) and (44) above that our previous definition of the partial
order on E coincides with the abstract one.
It is easy to see that also our definition of compatibility for E coincides
with the abstract one. For instance (and similarly for finitely many effects),
if two effects e and f are compatible in our sense above, there is an experiment
A ∈ A and experimental outcomes e_A ∈ e and f_A ∈ f in B_A. In this
case the effects g := [e_A ∧ f_A], h := [e_A ∧ ¬f_A] and i := [¬e_A ∧ f_A] form a
jointly orthogonal set with e = g ⊕ h and f = g ⊕ i (such a ‘minimal’ orthogonal
decomposition is called a Mackey decomposition).²⁶ Conversely, if two
effects e and f are compatible in the abstract sense, they have an orthogonal
decomposition (in fact a Mackey decomposition). But orthogonality in the
abstract sense coincides with orthogonality in our sense above, and we have
already seen that this implies compatibility also in our sense.
Using the partial order, we can finally define sharp elements of the ef-
fect algebra (‘sharp effects’ or ‘projections’), as those satisfying e ∧ e⊥ = 0
(meaning that the greatest lower bound of e and e⊥ exists and is 0). Since
0 is the minimal element of the partially ordered set (poset) E, this in turn
means that every lower bound of e and e⊥ is 0, i. e.
Proof :
Let e be sharp, and let e ⊕ e be defined. Then, by (43),
Note that the sharp elements of an arbitrary effect algebra need not form
an orthoalgebra in general, so this last result depends in fact on how we have
constructed E.
These are all properties of the orthoalgebras of sharp effects in both quantum
and classical probability (indeed, we have seen this explicitly in the case of
coherence).27
27
For good discussions, see Hardegree and Frazer (1981) and Hughes (1985). Note that
the effect algebras of quantum and classical probability are less well-behaved. For instance,
ef and min(e, f ) are alternative elements g defining non-unique Mackey decompositions
for any two response functions e and f (and analogously for any two commuting quantum
effects E and F). And (1/4)1, (1/3)1, (1/2)1 are pairwise orthogonal quantum effects that fail to be
jointly orthogonal (and analogously for multiples of the classical response function 1).
A well-known problem, however, relates to the existence of tensor prod-
ucts, i. e. the possibility of composing generalised probability structures.
If enough states exist (in a well-defined sense), one can construct tensor
products of orthoalgebras, but forming tensor products tends to destroy
orthocoherence. In this respect, the theory still needs to be investigated
further.28
Indeed, if two sharp effects e and f are not compatible, the joint probability
of e ∧ f might not be experimentally meaningful, but by the same token
no experiment would constrain us in choosing a measure that specified also
these joint probabilities (e. g. by considering e and f as independent) — or
so it might seem. And if the orthoalgebra of sharp events is rich enough,
one might get back all the original observables by considering unsharp reali-
sations of sharp observables, as in the classical case of Section 2. Even if we
considered this a purely formal construction, it would still mean that our
‘generalised probability spaces’ are indeed embeddable into classical prob-
ability spaces, and thus provide only a fairly trivial generalisation of the
formalism of classical probability.
Where this argument goes wrong, however, is in failing to realise the rich-
ness of the compatibility structure in a general orthoalgebra. The relation
of compatibility is clearly reflexive and symmetric, but it is not transitive,
so that observables do not fall neatly into different equivalence classes of
mutually compatible ones. Put slightly differently, if compatibility is not
transitive, it is possible for the same observable to be a coarse-graining of
two mutually incompatible observables — which in this sense can be said
to be partially compatible. And the interlocking structure of partially com-
patible observables (and the corresponding partially overlapping probability
measures) can be surprisingly rich. In such cases the question of whether
general probabilistic states might be induced by classical probability mea-
sures becomes non-trivial.
Imagine that we have three boxes, A, B and C. We can open any box,
and find (or fail to find) a gem in it. Let e, f, g be the outcomes ‘finding a
gem in the box’ in each of the three experiments. We further imagine that
we can open any two but not all three boxes simultaneously. We finally
imagine that we have probabilistic states specifying for any pair of boxes
the probabilities for finding gems in neither, either or both of the boxes.30
29
In fact our example generalises one given by Albert (1992) for the purpose of illus-
trating the Bell inequalities.
30
Cf. Specker’s tale of the Sage of Nineveh in Section 6 below.
Now let us take a special state, such that for any ordered pair (x, y) with
x, y ∈ {e, f, g}:
p(x|y) = p(¬x|¬y) = α , (55)
with α ∈ [0, 1]. For α = 1, such a state obviously has the form
\[ \begin{aligned} p(x \wedge y) &= a , \\ p(\neg x \wedge \neg y) &= 1 - a , \\ p(x \wedge \neg y) &= 0 , \\ p(\neg x \wedge y) &= 0 \end{aligned} \tag{56} \]
for all pairs (x, y), for some a ∈ [0, 1]. And it is equally obvious that for any
a, the state is (uniquely) induced by the probability measure defined by
and
Lemma:
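A state satisfying (55) with α ≠ 1 is uniquely given by
\[ p(x \wedge y) = p(\neg x \wedge \neg y) = \frac{\alpha}{2} , \tag{59} \]
and
\[ p(x \wedge \neg y) = p(\neg x \wedge y) = \frac{1-\alpha}{2} \tag{60} \]
for any (x, y). Thus in particular,
\[ p(x) = p(\neg x) = p(y) = p(\neg y) = \frac{1}{2} . \tag{61} \]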
Proposition:
Under the assumptions of the Lemma, p is induced by a joint probability
measure on the Boolean algebra (formally) generated by {e, f, g} if and only
if α ≥ 1/3.
The proofs are left for the Appendix. Intuitively, the case α = 1 corresponds
to perfect correlations between finding or not finding gems in any
two boxes, and it is indeed obvious that this state can be extended to a
probability measure in which there are perfect correlations between finding
gems in all three boxes. To take an intermediate case, α = 1/2 is the uncorrelated
case, in which any two boxes are independent, and is again obviously
extendable to the case in which all three boxes are independent. The case
α = 0 instead is the perfectly anti-correlated case, and is clearly not classically
reproducible: if whenever there is a gem in the first box there is no
gem in the second, and whenever there is no gem in the second there is one
in the third, then whenever there is a gem in the first box there also is one in
the third, contradicting the hypothesis.³¹ In fact, every state with α < 1 is
a convex combination of the (unique) perfectly anti-correlated state and the
(special) perfectly correlated state with p(e) = p(f) = p(g) = 1/2. While all
states with positive correlations, the uncorrelated state, and even some with
negative correlations are reproducible classically (all states with α ≥ 1/3), if
the perfectly anti-correlated component comes to dominate too strongly, the
negative correlations can no longer be reproduced by a classical probability
measure.
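The threshold α = 1/3 can be made plausible with a direct construction. The sketch below is not the proof given in the Appendix: it only tries a symmetric candidate for the joint measure, assigning weight a to the two atoms in which all three boxes agree and weight b to each of the six mixed atoms, with a and b fixed by the pairwise probabilities α/2 and (1 − α)/2 of the Lemma. The function names and the symmetric ansatz are ours.

```python
from fractions import Fraction as F
from itertools import product

def symmetric_joint(alpha):
    """Symmetric candidate joint measure on the 8 atoms (e, f, g) in {0,1}^3."""
    a = (3 * alpha - 1) / 4      # weight of (1,1,1) and of (0,0,0)
    b = (1 - alpha) / 4          # weight of each of the six mixed atoms
    return {atom: (a if len(set(atom)) == 1 else b)
            for atom in product((0, 1), repeat=3)}

def pair_marginal(q, i, j, vi, vj):
    """Marginal weight of 'box i has value vi and box j has value vj'."""
    return sum(w for atom, w in q.items() if atom[i] == vi and atom[j] == vj)

def is_probability_measure(q):
    return all(w >= 0 for w in q.values()) and sum(q.values()) == 1

uncorrelated = symmetric_joint(F(1, 2))   # alpha = 1/2
borderline = symmetric_joint(F(1, 3))     # alpha = 1/3
too_negative = symmetric_joint(F(1, 4))   # alpha = 1/4
```

The candidate always reproduces the pairwise marginals, but the weight a = (3α − 1)/4 is non-negative only for α ≥ 1/3; for α = 1/2 it reduces to the uniform measure, i.e. three independent boxes.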
Take two spin-1/2 systems in the so-called singlet state,
\[ \frac{1}{\sqrt{2}}\bigl(|{+}\rangle|{-}\rangle - |{-}\rangle|{+}\rangle\bigr) . \tag{62} \]
2
In this state, results of spin measurements in the same direction on the two
electrons are perfectly anti-correlated. The singlet state (62) is also rotationally
symmetric, so that perfect anti-correlations are obtained for pairs of
parallel measurements in whatever direction.³³ Pairs of spin measurements
on different particles are always compatible, and the joint probability for
spin up in direction r on the left and −r′ on the right is equal to cos²(ϑ/2),
where ϑ is the angle between r and r′. Note that taking the two directions
r, r″ on the left and the two directions −r′, −r on the right, one obtains four
compatible pairs comprising each one direction on the left and one on the
right.
It is obvious that one can have three spatial directions pairwise spanning
the same angle ϑ iff this angle is between 0 (when the three directions
are collinear) and 120 degrees (when they are coplanar). This corresponds
exactly to values of cos²(ϑ/2) between 1 and 1/4. And it gives us a quantum
model for states satisfying (55) with 1/4 ≤ α ≤ 1.³⁴
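The geometric claim can be checked numerically. The sketch below builds three unit vectors arranged symmetrically around a common axis at polar angle β (a parametrisation chosen by us for illustration) and evaluates cos²(ϑ/2) for each pair; as β runs from 0 to 90 degrees the common pairwise angle ϑ runs from 0 to 120 degrees.

```python
import math

def three_directions(beta):
    """Three unit vectors at polar angle beta, spaced 120 degrees apart in azimuth."""
    return [(math.sin(beta) * math.cos(phi),
             math.sin(beta) * math.sin(phi),
             math.cos(beta))
            for phi in (0.0, 2 * math.pi / 3, 4 * math.pi / 3)]

def angle(u, v):
    """Angle between two unit vectors, with the dot product clamped to [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))

def pairwise_alphas(beta):
    """cos^2(theta/2) for each of the three pairs of directions."""
    r = three_directions(beta)
    return [math.cos(angle(r[i], r[j]) / 2) ** 2
            for i, j in ((0, 1), (0, 2), (1, 2))]

collinear = pairwise_alphas(0.0)           # theta = 0
coplanar = pairwise_alphas(math.pi / 2)    # theta = 120 degrees
```

In the collinear limit all three values equal 1, and in the coplanar limit they equal 1/4, which lies below the classical bound 1/3.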
non-local non-quantum states (‘Popescu–Rohrlich boxes’) needed for its simulation (see
e. g. Cerf et al. 2005).
33
Here we are going slightly beyond the brief description of entangled states given in Section 1.
The point to grasp is that measurements on different subsystems are always compatible
with each other, so we can consider the probabilities for such joint measurements
in an entangled state, and it turns out that some such states provide perfect examples of
correlations that cannot be reproduced classically.
34
A related quantum model of the same correlations is the following. Take a single spin-1/2
system and take spin states corresponding to spin up in three directions r, r′, r″ pairwise
spanning the same angle ϑ. The transition probability |⟨ψ|ϕ⟩|² between any two such spin
states is also equal to cos²(ϑ/2). Of course, sharp spin observables in non-collinear directions
are not compatible, because the projections on the spin states in different directions
do not commute. So these transition probabilities only correspond to conditional probabilities
for outcomes of sequential minimally disturbing spin measurements. But if we
assume the initial state of the electron to be the maximally mixed state (which assigns
We can now make explicit the connection with classic results in quantum
mechanics about ruling out various kinds of ‘hidden variables’ models, in
particular the Bell inequalities and the Kochen–Specker theorem.
It was Fine (1982) who first pointed out that (63) is also the necessary
and sufficient condition for the existence of a joint probability measure for
the observables A, A′, B, B′ when the marginals for the four compatible pairs
are given. Such a joint probability measure is known as a ‘non-contextual
hidden variables’ model of the experimental situation, since the same mea-
sure returns the correct marginals irrespective of how an observable is paired
with an observable on the other side, or indeed on how an observable is as-
sumed to be measured.35 Pitowsky (1989a,b, 1991, 1994) then gave a general
probability 1/2 to spin in any direction), one can see that the statistics of such sequential
measurements on any two spin observables are independent of the order of measurement
(and the marginal statistics are the same as the statistics for single measurements). We
can thus say that any two spin observables are compatible in the maximally mixed state
(with either sequential observable playing the role of a joint fine-graining of the two single
observables). With this understanding, we can reproduce the example also using a single
spin system. (It is a general fact that using the perfect correlations of the singlet state,
one can always translate between results about the (im)possibility of modelling certain
correlations in two systems (‘non-locality’ results) and results about the (im)possibility of
modelling certain correlations in a single system (‘non-contextuality’ results).)
35
More generally (and, indeed, in the case of Bell’s derivation), one could consider
and systematic treatment of necessary and sufficient conditions for the ex-
istence of joint probability measures in terms of such inequalities, further
pointing out that these results had already been anticipated more than a
century earlier by George Boole (1862).36
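As a small illustration of the 'necessary' direction of Fine's result, the following sketch assumes that (63) has the standard CHSH form E(AB) + E(AB′) + E(A′B) − E(A′B′). It enumerates all assignments of values ±1 to the four observables (the extreme points of any joint probability measure) and compares the resulting bound with the singlet-state value. The function names and the choice of measurement angles are illustrative assumptions, not taken from the text.

```python
from itertools import product
from math import cos, pi, sqrt, isclose

def chsh(e_ab, e_abp, e_apb, e_apbp):
    """Assumed CHSH combination E(AB) + E(AB') + E(A'B) - E(A'B')."""
    return e_ab + e_abp + e_apb - e_apbp

def deterministic_chsh_values():
    """CHSH values for all 16 assignments of +/-1 to A, A', B, B'."""
    return sorted({chsh(a * b, a * bp, ap * b, ap * bp)
                   for a, ap, b, bp in product((-1, 1), repeat=4)})

def singlet_correlation(theta_a, theta_b):
    """Singlet-state spin correlation for two coplanar directions."""
    return -cos(theta_a - theta_b)

def singlet_chsh(a=0.0, ap=pi / 2, b=pi / 4, bp=-pi / 4):
    """CHSH value in the singlet state for one (assumed) choice of angles."""
    return chsh(singlet_correlation(a, b), singlet_correlation(a, bp),
                singlet_correlation(ap, b), singlet_correlation(ap, bp))

if __name__ == "__main__":
    print(deterministic_chsh_values())
    print(abs(singlet_chsh()))
```

Since any joint probability measure is a mixture of such deterministic assignments, its CHSH value cannot exceed the largest deterministic value in absolute terms, whereas the singlet value for these angles does.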
for any three two-valued observables. Interpreting any two of them as ‘find-
ing or not finding a gem in the box’, and substituting the probabilities (55)
into (64), we have
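\[ E(XY) = 2 \cdot \frac{\alpha}{2} - 2 \cdot \frac{1-\alpha}{2} = \alpha - (1 - \alpha) = 2\alpha - 1 \tag{66} \]
for any distinct X, Y, and (65) becomes
\[ -2 \leq 6\alpha - 3 - 1 \leq 2 , \tag{67} \]
that is
\[ 2 \leq 6\alpha \leq 6 , \tag{68} \]
thus α ∈ [1/3, 1], as above.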
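The arithmetic in (66)–(68) can be checked with a short sketch. Inequality (65) is not reproduced here; the code assumes, reading back from (67), that it bounds E(XY) + E(YZ) + E(ZX) − 1 between −2 and 2, and it confirms that these are exactly the extreme values reached by assignments of ±1 to three observables. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def deterministic_sums():
    """E(XY) + E(YZ) + E(ZX) for every assignment of +/-1 to X, Y, Z."""
    return sorted({x * y + y * z + z * x
                   for x, y, z in product((-1, 1), repeat=3)})

def correlation(alpha):
    """Equation (66): E(XY) = 2*alpha - 1 for any distinct X, Y."""
    return 2 * alpha - 1

def satisfies_67(alpha):
    """Equation (67): -2 <= 6*alpha - 3 - 1 <= 2."""
    return -2 <= 3 * correlation(alpha) - 1 <= 2

if __name__ == "__main__":
    print(deterministic_sums())
    for alpha in (Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(1)):
        print(alpha, satisfies_67(alpha))
```

Exact fractions are used so that the boundary case α = 1/3 is not affected by floating-point rounding.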
‘contextual hidden variables’ models of the correlations, i. e. allow the use of a different
classical probability measure in each experimental context, thus treating for instance the
outcomes of a measurement of A as different events in the two cases in which A is measured
together with B or together with B′. The relation between Bell’s and Fine’s derivations
of the CHSH inequalities can then be seen as follows. Take the case of the singlet state,
and assume that the two systems are spatially (or space-like) separated, call them the
left-hand and right-hand system. Assume that the probabilities for the outcomes of a spin
measurement on the left-hand system depend on the experimental context on the left.
But then, given the perfect (anti-)correlations, also the probabilities for outcomes of spin
measurements in the same direction on the right-hand side will (non-locally!) depend on
the experimental context on the left. Thus, a local hidden variables model that reproduces
the correlations for the singlet state must be non-contextual, and so by Fine’s theorem
necessarily obey the CHSH inequality. (We shall return to the idea of contextual classical
models of general probabilistic structures in our final Section 6.)
36
See also Beltrametti and Bugajski (1996) for further discussion of the case in which
probabilities fail to be induced by a joint probability measure (which they call the ‘Bell
phenomenon’).
The Kochen–Specker theorem instead takes finite sets of projection-
valued observables in a Hilbert space of dimension at least 3, that may
pairwise share a projection (partially compatible observables), and consid-
ers the question of whether values 1 and 0 may be assigned to the projections
in such a way that exactly one projection from each observable is assigned
the value 1. (The theorem was first announced in Specker (1960), and its
proof was published in Kochen and Specker (1967).)
The analogy goes both ways: any Kochen–Specker construction (of finite
sets of orthonormal bases that cannot be assigned values 1 and 0 in such a
way that exactly one vector in each basis is assigned 1) is equivalent to the
non-existence of trivial probability measures satisfying suitable constraints
(in three dimensions, these are p(x ∨ y|z) = 0 and p(¬x|¬y ∧ ¬z) = 0 for
all orthonormal triples). But the existence of trivial probability measures
satisfying such constraints is in fact equivalent to the existence of non-trivial
probability measures satisfying the same constraints. Thus, indeed, every
Kochen–Specker theorem can be translated into the violation of some Bell–
Pitowsky inequality.38
37
More economical proofs are now available even in 3 dimensions, using as few as 31
one-dimensional projections (in an unpublished proof by Conway and Kochen), and in
4 dimensions, using as few as 18 (Cabello, Estebaranz and García Alcaine, 1996). For
details and references, see e. g. Bub (1997) and Held (2013).
38
If I am not mistaken, this is the intuitive way of understanding the ‘non-contextuality’
6 Is probability empirical (and quantum)?
The title of this section recalls (tongue-in-cheek) the title of the classic paper
by Putnam (1968) in which he notoriously argued that quantum mechan-
ics requires a fundamental revision of logic. Empirical considerations alone
presumably cannot decide the question of whether logic is an empirical or
an a priori discipline (as forcefully pointed out in another classic paper by
Dummett (1976)). But if one is already sympathetic to the idea that logic
is an empirical discipline, then it does make sense to ask what kind of em-
pirical evidence might suggest adopting this or that logic, and in particular
whether the evidence we have for quantum mechanics suggests adopting a
non-classical one (e. g. one based on Kochen and Specker’s partial Boolean
algebras). Essentially, the question boils down to whether quantum logic
should be seen as a derivative construct that is definable in terms of and
alongside classical logic, or whether classical logic should be seen as an in-
stance of quantum logic restricted to certain special ‘well-behaved’ cases.39
There is a sense in which the latter question can indeed be trivially an-
swered in the affirmative, by taking seriously the idea that a general proba-
bilistic state should be seen as a family of classical probability measures, but
denying that they in fact overlap. For instance, in our master example above
(say, with α = 0), this means simply that instead of describing the relevant
inequalities first introduced in Cabello (2008), and shown to describe any Kochen–Specker
contradiction in Badzia̧g et al. (2008).
39
This would be analogous to, say, the application of intuitionistic logic to finitary
problems in mathematics, for which also tertium non datur becomes an intuitionistically
valid principle. For a recent review, emphasising that the answer might depend rather
sensitively on the interpretation of quantum mechanics, see Bacciagaluppi (2009).
probabilistic structure using a single state that assigns the probabilities
\[ p(e) = p(f) = p(g) = \frac{1}{2} \tag{69} \]
to the outcomes e, f and g (and the appropriate joint probabilities to pairs),
we describe it using three different classical probability measures, which
are to be applied respectively to the experiments in which we measure e
and f together, or f and g together, or g and e together, and that assign,
respectively, the probabilities
\[ p_{ef}(e) = p_{ef}(f) = \frac{1}{2} , \quad p_{fg}(f) = p_{fg}(g) = \frac{1}{2} , \quad p_{ge}(g) = p_{ge}(e) = \frac{1}{2} , \tag{70} \]
to the single outcomes (and the appropriate joint probabilities to the three
pairs). We see now that these three probability measures can be derived
from a single classical probability measure if we assign probabilities also to
performing each of the experiments ef, fg and ge. This is just the ‘naïve’
argument we rehearsed at the beginning of Section 5, but which is now no
longer blocked, because we resist identifying the two events e_{ef} and e_{ge} as
one (and similarly for f and for g). In this (formal) sense, a ‘contextual
hidden variables theory’ is always possible.40 (Note, however, that if we
then imagine performing the three joint measurements ef, fg and ge in
sequence, then the measured value of at least one observable, say e, must
be different in the two measurements containing it, in this case ef and ge.
Thus we have some mysterious form of ‘disturbance through measurement’,
much like in our simple discussion of spin in Section 1.)
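The formal construction just described can be sketched as follows for α = 0. The sketch builds one classical measure over triples (context, first outcome, second outcome) and recovers the three context-specific measures of (70) by conditionalising on the context. The equal weights 1/3 for performing each experiment, and all function names, are assumptions made only for illustration.

```python
from fractions import Fraction

CONTEXTS = ("ef", "fg", "ge")

def contextual_measure(context_weights=None):
    """Single classical measure over (context, first, second) for alpha = 0."""
    if context_weights is None:
        # Assumed: each experiment is performed with probability 1/3.
        context_weights = {c: Fraction(1, 3) for c in CONTEXTS}
    measure = {}
    for context, weight in context_weights.items():
        # alpha = 0: the two measured outcomes always differ,
        # each of the two possibilities having probability 1/2.
        measure[(context, True, False)] = weight * Fraction(1, 2)
        measure[(context, False, True)] = weight * Fraction(1, 2)
    return measure

def conditional(measure, context, observable):
    """Probability of `observable` given that `context` is performed."""
    position = context.index(observable)
    p_context = sum(p for (c, _, _), p in measure.items() if c == context)
    p_joint = sum(p for (c, first, second), p in measure.items()
                  if c == context and (first, second)[position])
    return p_joint / p_context

if __name__ == "__main__":
    m = contextual_measure()
    for context in CONTEXTS:
        for observable in context:
            print(context, observable, conditional(m, context, observable))
```

Note that the event 'e in context ef' and the event 'e in context ge' are distinct elements of the underlying sample space, which is precisely what removes the obstruction to a single classical measure.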
quantum system, which are then measured in various ways, then projections
in common to different observables (resolutions of the identity) should, in-
deed, be identified. If instead the properties of the system are some ‘hidden
variables’, which in the context of specific experimental arrangements lead
to certain experimental outcomes (perhaps with certain probabilities), then
projections no longer represent intrinsic properties of the system in general,
but only aspects of how systems can be probed in the context of specific
experimental situations.
These brief remarks suggest that the question of whether different experi-
mental outcomes ought to be identified should not be decided abstractly, but
rather in relation to specific theoretical commitments. We shall not attempt
a general discussion of this point, nor even an exhaustive one of hidden vari-
ables theories in quantum mechanics. What we shall do instead is illustrate
the point in some concrete implementations of our master example (mainly
for α = 0), which will enable us to see the possibility of underlying mecha-
nisms providing us with a rationale for deciding when different experimental
outcomes should be treated as different events.
no way worthy of her. And so that each should himself be assured of
his unworthiness, he promised her hand to the one who could perform
a set prophecy task. The suitor was led in front of a table on which
stood three boxes in a row, and urged to say which boxes contained a
gem and which were empty. Yet, as many as would try it, it appeared
impossible to perform the task. After his prophecy, each suitor was
in fact urged by the father to open two boxes that he had named as
both empty or as both not empty: it always proved to be that one
contained a gem and the other did not, and actually the gem lay now
in the first, now in the second of the opened boxes. But how should
it be possible, out of three boxes, to name no two as empty or as not
empty? Thus indeed the daughter would have remained unmarried
until her father’s death, had she not upon the prophecy of a prophet’s
son swiftly opened two boxes herself, namely one named as full and
one named as empty — which they yet truly turned out to be. Upon
the father’s weak protest that he wanted to have two different boxes
opened, she tried to open also the third box, which however proved
to be impossible, upon which the father, grumbling, let the unfalsified
prophecy count as successful.
What the father wants to establish is whether any of the suitors are
better prophets than himself (only then would he willingly surrender his
daughter’s hand in marriage). Whenever a suitor is set the task, the father
predicts which two boxes will be opened, and places exactly one gem at
random in one of the two boxes. (Note that, in this form, the example bears
some analogy to Newcomb’s paradox!) If we now assume that the father
possesses a genuine gift for clairvoyant prophecy, the action of opening boxes
A and B, or of opening A and C, has a retrocausal effect on whether the
father has placed the gem in either A or B, or has placed it in either A or
C.
single probabilistic state irrespective of the experimental situation. (Note
that should probabilities be defined also for which two of the boxes will be
opened, one could again introduce a single classical measure from which the
three probability measures arise through conditionalisation.)
At the centre of the box sits a firefly, which is attracted to the light of
our lantern, and thus enters at random one of the two chambers on the side
from which we are approaching. And mistaking our lantern for a potential
mating partner, the firefly starts to glow!
developed in detail, but retrocausality has long been recognised as a possi-
ble strategy to deal with the puzzles of quantum mechanics, in particular
in the face of the Bell inequalities.43 The firefly model instead more closely
resembles a theory like de Broglie and Bohm’s pilot-wave theory, in which
experimental outcomes depend on both the initial configuration of the sys-
tem (e.g. the position of an electron) and the details of the experimental
arrangement. This last point can probably best be seen in another slight
variant of the example.
Imagine that instead of the firefly we have a small metal ball in the cen-
tre of the box, and that each experiment consists of tilting the box towards
one of the sides, say AB. The ball rolls towards the side AB and bounces
off a metal pin either into chamber A or chamber B, depending on its exact
initial location to the left or the right of the symmetry axis perpendicular to
the side AB. It is now clear that the same initial position of the ball might
lead it to fall or not to fall into, say, chamber A, depending on whether the
whole box is tilted towards the side AB or the side CA (namely if the ball
is on the left of the symmetry axis through AB as well as to the left of
the symmetry axis through CA). Thus, depending on which way the box is
tilted, the ball ending up in A corresponds to a different random variable
on the probability space of initial positions of the ball. If the initial position
of the ball is uniformly distributed in a symmetric neighbourhood of the
centre of the triangle, the equal probabilities of the non-classical state are
reproduced. But if the initial position is not in such an ‘equilibrium’ distri-
bution, deviations from the probabilities in the Lemma can occur — so that
if one allows also such ‘disequilibrium’ hidden states, different experimental
outcomes are in fact no longer equiprobable in all states.44
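A minimal sketch of the rolling-ball variant may help make the contextuality explicit. It assumes a particular geometry (chambers A, B, C at the corners of an equilateral triangle centred on the origin) and models the pin by sending the ball to the nearer of the two chambers on the side towards which the box is tilted. The coordinates, the sampling of initial positions on a ring, and the function names are all illustrative assumptions.

```python
from fractions import Fraction
from math import cos, sin, radians

# Assumed geometry: chambers A, B, C at the corners of an equilateral
# triangle centred on the origin (polar angles in degrees).
CORNERS = {"A": 90.0, "B": 210.0, "C": 330.0}

def outcome(position, side):
    """Chamber the ball reaches when the box is tilted towards `side`.

    The ball ends up in whichever of the two chambers on that side lies on
    its own side of the symmetry axis, i.e. the nearer of the two corners.
    """
    x, y = position

    def closeness(corner):
        angle = radians(CORNERS[corner])
        return x * cos(angle) + y * sin(angle)

    first, second = side
    return first if closeness(first) > closeness(second) else second

def ring(n=360, radius=0.1):
    """n evenly spaced initial positions on a circle around the centre."""
    angles = (radians((k + 0.5) * 360.0 / n) for k in range(n))
    return [(radius * cos(t), radius * sin(t)) for t in angles]

def frequency(positions, side, chamber):
    """Relative frequency with which `chamber` is reached for tilt `side`."""
    hits = sum(1 for p in positions if outcome(p, side) == chamber)
    return Fraction(hits, len(positions))

if __name__ == "__main__":
    start = (0.1 * cos(radians(10.0)), 0.1 * sin(radians(10.0)))
    print(outcome(start, "AB"), outcome(start, "CA"))
    print(frequency(ring(), "AB", "A"))
    upper_half = [p for p in ring() if p[1] > 0]
    print(frequency(upper_half, "AB", "A"))
```

In this sketch the same initial position can lead the ball into A under one tilt and away from A under another, the symmetric ring plays the role of an 'equilibrium' distribution, and restricting the sample to one half of the ring gives a simple 'disequilibrium' distribution whose frequencies deviate from 1/2.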
deed to include cases with α > 0. For the latter, we need a cubical firefly
box, which we approach from any of the six faces (counting opposite faces
as equivalent). On each face, the four corners correspond to, say, e ∧ f
and ¬e ∧ ¬f across one diagonal, and e ∧ ¬f and ¬e ∧ f across the other,
and similarly with f and g, or g and e, on the other faces. The classical
cases can be obtained if the firefly just sits somewhere in the box (maybe
preferentially along one spatial diagonal — where food might be provided),
and starts to glow when it sees the light from our lantern. We then observe
the projections of the firefly’s position on the face from which we approach.
The non-classical cases can be obtained if the firefly moves towards the side
from which we are approaching, and through various obstacles is channelled
preferentially (although not always) along the planar diagonal correspond-
ing to the opposite outcomes for that face (say e ∧ ¬f and ¬e ∧ f ). We can
thus construct classical but contextual models that violate (our special case
of) the Bell inequalities, reproducing the quantum violations, or even the
non-quantum violations (reducing to the equilateral triangle in the limit).
Appendix
We give the proofs of the Lemma and the Proposition from Section 5.
assumption, and if p(¬y) ≠ 0 but p(x) = 0, then p(¬x) = 1, and p(¬x|¬y) =
1, also contrary to assumption. Thus we can write
\[ \frac{p(x \wedge \neg y)}{p(\neg y)} = p(x|\neg y) = 1 - p(\neg x|\neg y) = 1 - \alpha = 1 - p(y|x) = p(\neg y|x) = \frac{p(\neg y \wedge x)}{p(x)} . \tag{71} \]
But if
p(e) = p(¬f ) , p(f ) = p(¬g) , p(g) = p(¬e) , (73)
it follows that
\[ p(e) = p(\neg e) = p(f) = p(\neg f) = p(g) = p(\neg g) = \frac{1}{2} . \tag{74} \]
Finally, by (74) and assumption (55), we have
\[ p(x \wedge y) = p(\neg x \wedge \neg y) = \frac{\alpha}{2} , \tag{75} \]
and
\[ p(x \wedge \neg y) = p(\neg x \wedge y) = \frac{1 - \alpha}{2} \tag{76} \]
for any (x, y). QED.
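As a consistency check on the Lemma, the following sketch builds the pairwise table (75)–(76) for a given α and recomputes the marginals (74) and the conditional probabilities. Assumption (55) is not reproduced here; the code takes it, as suggested by (71), to state that p(y|x) = p(¬y|¬x) = α. The function names are illustrative.

```python
from fractions import Fraction

def pair_table(alpha):
    """Joint probabilities (75)-(76) for a pair (x, y), keyed by truth values."""
    return {
        (True, True): alpha / 2,
        (False, False): alpha / 2,
        (True, False): (1 - alpha) / 2,
        (False, True): (1 - alpha) / 2,
    }

def marginal_x(table, value=True):
    """Marginal probability that x takes the given truth value."""
    return sum(p for (x, _), p in table.items() if x == value)

def conditional_y_given_x(table, y_value, x_value):
    """Conditional probability p(y = y_value | x = x_value)."""
    return table[(x_value, y_value)] / marginal_x(table, x_value)

if __name__ == "__main__":
    alpha = Fraction(2, 5)
    table = pair_table(alpha)
    print(marginal_x(table))
    print(conditional_y_given_x(table, True, True))
    print(conditional_y_given_x(table, False, False))
```

Passing α as a `Fraction` keeps all quantities exact, so equalities such as p(x) = 1/2 can be tested directly.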
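\[ p(e \wedge f \wedge g) + p(\neg e \wedge f \wedge g) = \frac{\alpha}{2} \tag{79} \]
\[ p(e \wedge \neg f \wedge \neg g) + p(\neg e \wedge \neg f \wedge \neg g) = \frac{\alpha}{2} , \tag{80} \]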
and
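\[ p(e \wedge f \wedge g) + p(e \wedge \neg f \wedge g) = \frac{\alpha}{2} \tag{81} \]
\[ p(\neg e \wedge f \wedge \neg g) + p(\neg e \wedge \neg f \wedge \neg g) = \frac{\alpha}{2} , \tag{82} \]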
respectively. From (77), (79) and (81),
Thus α ≥ 1/3. QED.
‘If’ implication: let 1/3 ≤ α < 1 (note that this construction works also with
α = 1), and let
\[ a := \frac{3\alpha - 1}{4} \quad \text{and} \quad b := \frac{1 - \alpha}{4} . \tag{86} \]
We have a, b ∈ [0, 1]. Set
and
Then the probability measure p induces a state satisfying both (74) (because
a + 3b = 1/2) and (77)–(82) (because a + b = α/2). Thus the state satisfies (55).
QED.
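The two sums quoted in the proof can be verified with a short sketch. The explicit definition of the measure p is not reproduced here, so the code makes an assumption: weight a on the two atoms e ∧ f ∧ g and ¬e ∧ ¬f ∧ ¬g, and weight b on each of the six remaining atoms. This choice is a guess that is consistent with a + 3b = 1/2 and a + b = α/2, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def construct_measure(alpha):
    """Assumed measure on the eight atoms (e, f, g), using a and b from (86)."""
    a = (3 * alpha - 1) / 4
    b = (1 - alpha) / 4
    measure = {}
    for atom in product((True, False), repeat=3):
        all_equal = len(set(atom)) == 1   # e, f, g all true or all false
        measure[atom] = a if all_equal else b
    return measure

def prob(measure, event):
    """Probability of an event given as a predicate on (e, f, g)."""
    return sum(p for atom, p in measure.items() if event(*atom))

if __name__ == "__main__":
    alpha = Fraction(1, 2)
    m = construct_measure(alpha)
    print(sum(m.values()))
    print(prob(m, lambda e, f, g: e))
    print(prob(m, lambda e, f, g: f and g))
```

With this assignment every single outcome has probability a + 3b and every pair of agreeing outcomes has probability a + b, which is how the two sums in the proof arise.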
Note that this is not the unique probability measure inducing the given
state. As in the case α = 1, one need not have p(e∧f ∧g) = p(¬e∧¬f ∧¬g).
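Indeed, for any ε ∈ [−min((3α − 1)/4, (1 − α)/4), min((3α − 1)/4, (1 − α)/4)], one can set
\[ p(e \wedge f \wedge g) = \frac{3\alpha - 1}{4} + \varepsilon \tag{89} \]
and
\[ p(\neg e \wedge \neg f \wedge \neg g) = \frac{3\alpha - 1}{4} - \varepsilon , \tag{90} \]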
and extend via (83)–(84) to a probability measure inducing the same state.
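The non-uniqueness can be illustrated numerically. Equations (83)–(84) are not reproduced here, so the sketch below assumes one way of extending (89)–(90): atoms with exactly two true outcomes receive (1 − α)/4 − ε, and atoms with exactly one true outcome receive (1 − α)/4 + ε. Under this assumption all pairwise joint probabilities are independent of ε. The function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations, product

def shifted_measure(alpha, eps=Fraction(0)):
    """Assumed one-parameter family of measures on the atoms (e, f, g)."""
    a = (3 * alpha - 1) / 4
    b = (1 - alpha) / 4
    if abs(eps) > min(a, b):
        raise ValueError("eps must lie in [-min(a, b), min(a, b)]")
    # Weight of an atom depends only on how many of e, f, g are true.
    weight = {3: a + eps, 2: b - eps, 1: b + eps, 0: a - eps}
    return {atom: weight[sum(atom)]
            for atom in product((True, False), repeat=3)}

def pair_state(measure):
    """All pairwise joint probabilities induced by the measure."""
    state = {}
    for i, j in combinations(range(3), 2):
        for vi, vj in product((True, False), repeat=2):
            state[(i, j, vi, vj)] = sum(
                p for atom, p in measure.items()
                if atom[i] == vi and atom[j] == vj)
    return state

if __name__ == "__main__":
    alpha = Fraction(1, 2)
    base = shifted_measure(alpha)
    shifted = shifted_measure(alpha, Fraction(1, 16))
    print(pair_state(base) == pair_state(shifted))
```

The bound on ε is exactly what keeps all eight atomic weights non-negative in this family.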
Acknowledgements
I would like to thank Alan Hájek and Chris Hitchcock for their invitation to contribute
a version of this article to the Oxford Handbook of Probability and Philosophy
and the opportunity to write on this topic, and for very helpful feedback on a previ-
ous draft. I am further grateful to Alex Wilce for some extremely useful discussions
and hard-to-find references, to Jennifer Bailey for some stylistic advice, and to the
audience of the Philosophy of Physics seminar at the University of Aberdeen, who
heard preliminary versions of this material.
References
Bacciagaluppi, G., and Wilce, A. (in preparation), ‘Specker’s Principle and (Or-
tho)coherence’.
Badzia̧g, P., Bengtsson, I., Cabello, A., and Pitowsky, I. (2008), ‘Universality of
State-independent Violation of Correlation Inequalities for Noncontextual Theories’,
Physical Review Letters 101, 210401, http://arxiv.org/abs/0809.0430.
Birkhoff, G., and von Neumann, J. (1936), ‘The Logic of Quantum Mechanics’,
Annals of Mathematics 37, 823–843. Reprinted in Hooker (1975), pp. 1–26.
Busch, P., Lahti, P. J. and Mittelstaedt, P. (1991), The Quantum Theory of Mea-
surement (Berlin: Springer).
Busch, P., Grabowski, M., and Lahti, P. J. (1995), Operational Quantum Physics
(Berlin: Springer).
Cabello, A., Severini, S., and Winter, A. (2010), ‘(Non-)Contextuality of Physical
Theories as an Axiom’, http://arxiv.org/abs/1010.2163.
Cattaneo, G., Marsico, T., Nisticò, G., and Bacciagaluppi, G. (1997), ‘A Concrete
Procedure for Obtaining Sharp Reconstructions of Unsharp Observables in Finite-
Dimensional Quantum Mechanics’, Foundations of Physics 27, 1323–1343.
Cerf, N. J., Gisin, N., Massar, S., and Popescu, S. (2005), ‘Simulating Maximal
Quantum Entanglement Without Communication’, Physical Review Letters 94(22),
220403.
Clauser, J. F., Horne, M. A., Shimony, A., and Holt, R. A. (1969), ‘Proposed Experiment
to Test Local Hidden-Variable Theories’, Physical Review Letters 23(15),
880–884.
Coecke, B., Moore, D. and Wilce, A. (eds.) (2000), Current Research in Operational
Quantum Logic (Dordrecht: Kluwer).
Dakić, B., and Brukner, Č. (2011), ‘Quantum Theory and Beyond: Is Entanglement
Special?’, in H. Halvorson (ed.), Deep Beauty: Understanding the Quantum World
through Mathematical Innovation (Cambridge: CUP), pp. 365–392,
http://arxiv.org/abs/0911.0695.
Fine, A. (1982), ‘Hidden Variables, Joint Probability and Bell Inequalities’, Physical
Review Letters 48, 291–295.
Foulis, D. J., and Bennett, M. K. (1994), ‘Effect Algebras and Unsharp Quantum
Logics’, Foundations of Physics 24, 1331–1352.
Foulis, D. J., and Randall, C. H. (1974), ‘Empirical Logic and Quantum Mechanics’,
Synthese 29, 81–111.
Foulis, D. J., and Randall, C. H. (1981), ‘Empirical Logic and Tensor Products’, in
H. Neumann (ed.), Interpretations and Foundations of Quantum Theory (Mannheim:
Bibliographisches Institut), pp. 9–20.
Ghirardi, G.C. (1997), Un’occhiata alle carte di Dio (Milano: Il Saggiatore). Transl.
by G. Malsbary as Sneaking a Look at God’s Cards (Princeton: Princeton University
Press, 2005).
Giuntini, R., and Greuling, H. (1989), ‘Toward a Formal Language for Unsharp
Properties’, Foundations of Physics 19, 931–945.
Held, C. (2013), ‘The Kochen–Specker Theorem’, in E. N. Zalta (ed.), The Stanford
Encyclopedia of Philosophy (Spring 2013 Edition),
http://plato.stanford.edu/archives/spr2013/entries/kochen-specker/.
Kochen, S., and Specker, E. P. (1967), ‘The Problem of Hidden Variables in Quan-
tum Mechanics’, Journal of Mathematics and Mechanics 17, 59–88. Reprinted in
Hooker (1975), pp. 293–328.
Masanes, L., and Müller, M. (2011), ‘A Derivation of Quantum Theory from Physical
Requirements’, New Journal of Physics 13, 063001, http://arxiv.org/abs/1004.1483.
Pitowsky, I. (1989a), ‘From George Boole to John Bell: The Origins of Bell’s
Inequality’, in M. Kafatos (ed.), Bell’s Theorem, Quantum Theory and the
Conceptions of the Universe (Dordrecht: Kluwer), pp. 37–49.
Price, H. (1996), Time’s Arrow and Archimedes’ Point: New Directions for the
Physics of Time (Oxford: OUP).
Randall, C. H., and Foulis, D. J. (1970), ‘An Approach to Empirical Logic’, Amer-
ican Mathematical Monthly 77, 363–374.
Towler, M. D., Russell, N. J., and Valentini, A. (2012), ‘Timescales for Dynamical
Relaxation to the Born Rule’, Proceedings of the Royal Society A: Mathematical,
Physical and Engineering Science 468(2140), 990–1013,
http://arxiv.org/abs/1103.1589.
ters A 332(3), 187–193, http://arxiv.org/abs/quant-ph/0309107.
Wilce, A. (2000), ‘Test Spaces and Orthoalgebras’, in Coecke, Moore and Wilce
(2000), pp. 81–114.
Wilce, A. (2012), ‘Quantum Logic and Probability Theory’, in E. N. Zalta (ed.), The
Stanford Encyclopedia of Philosophy (Fall 2012 Edition),
http://plato.stanford.edu/archives/fall2012/entries/qt-quantlog/.