Probability Handbook Revised

1) The document discusses quantum probability and whether it requires a generalization of classical probability. It focuses on finite-dimensional quantum mechanics as an easier case to understand. 2) Key features of quantum mechanics are illustrated using the Stern-Gerlach experiment, where electron beams split depending on their spin. While individual measurements have well-defined probabilities, it is unclear how to define joint probabilities for incompatible measurements. 3) Interference phenomena suggest the probabilities cannot be explained by a disturbance model, posing a puzzle about how to combine probability measures over different experiments.


Quantum Probability: An Introduction∗

Guido Bacciagaluppi†

14 February 2014

The topic of probability in quantum mechanics is rather vast, and in
this article, we shall choose to discuss it from the perspective of whether
and in what sense quantum mechanics requires a generalisation of the usual
(Kolmogorovian) concept of probability. We shall focus on the case of finite-
dimensional quantum mechanics (which is analogous to that of discrete prob-
ability spaces), partly for simplicity and partly for ease of generalisation.
While we shall largely focus on formal aspects of quantum probability (in
particular the non-existence of joint distributions for incompatible observ-
ables), our discussion will relate also to notorious issues in the interpretation
of quantum mechanics. Indeed, whether quantum probability can or cannot
be ultimately reduced to classical probability connects rather nicely to the
question of ‘hidden variables’ in quantum mechanics.

1 Quantum mechanics (once over gently)

If a spinning charged object flies through an appropriately inhomogeneous
magnetic field, then according to the laws of classical physics it will
experience a deflection, in the direction of the gradient of the field,
proportional to its angular momentum in the same direction (i.e. proportional
to how fast it is spinning along that axis). In quantum mechanics one
observes a similar effect, except that the object is deflected only by
discrete amounts, as if its classical angular momentum could take only
certain values (varying by units of Planck’s constant ℏ). Some particles
even seem to possess an intrinsic ‘spin’ of this kind (i.e. not derived from
any rotational motion), e.g. so-called spin-1/2 systems, such as electrons,
which get deflected by amounts corresponding to spin values ±ℏ/2.
Experiments of this kind are known as Stern–Gerlach experiments, and they
can be used to illustrate some of the most important features of quantum
mechanics.

∗ An abridged version of this essay will appear in A. Hájek and C. Hitchcock (eds.),
The Oxford Handbook of Probability and Philosophy, Oxford University Press. The present
version includes in particular many more footnotes and references and the Appendix with
the proofs of the Lemma and Proposition of Section 5.

† Department of Philosophy, University of Aberdeen; Institut d’Histoire et de
Philosophie des Sciences et des Techniques (CNRS, Paris 1, ENS); and Collegio Dottorale in
Filosofia, Epistemologia e Storia della Cultura, Università di Cagliari. Address for
correspondence: Department of Philosophy, University of Aberdeen, Old Brewery, High Street,
Aberdeen AB24 3UB, Scotland, U.K. (e-mail: [email protected]).

Imagine a beam of electrons, say, moving along the y-axis and encoun-
tering a region with a magnetic field inhomogeneous along the x-axis. The
beam will split in two, as can be ascertained by placing a screen on the other
side of the experiment, and observing that particle detections are localised
around two distinct spots on the screen (needless to say, real experiments
are a little messier than this).¹

The first thing to point out is that if we send identically prepared elec-
trons one by one through such an apparatus, each of them will trigger only
one detection, either in the upper half or the lower half of the screen, with
probabilities depending on the initial preparation of the incoming electrons.

The second thing to point out is that the same is true whatever the
direction in which the inhomogeneous magnetic field is laid, whether along
the x-axis, the z-axis, or any other direction: the beam of electrons will
always be split in two components, corresponding to a spin value ±ℏ/2 along
the direction of inhomogeneity of the field. If the incoming beam happens
to be prepared by selecting one of the deflected beams in a previous
Stern–Gerlach experiment (say, a beam of ‘spin-x up’ electrons), then the
probabilities for detection in a further Stern–Gerlach experiment in a
direction x′ depend only on the angle ϑ between x and x′ (and are given by
cos²(ϑ/2) and sin²(ϑ/2)). So, for example, the probability of measuring
spin-z ‘up’ or ‘down’ in a beam of spin-x ‘up’ electrons is 1/2.
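
The angular dependence just described can be tabulated with a few lines of Python. This is only an illustrative sketch: the function name `stern_gerlach_probabilities` is an invented label, and the angle is assumed to be given in radians.

```python
import math

def stern_gerlach_probabilities(theta):
    """Return (p_up, p_down) for a spin measurement along an axis that makes
    an angle theta (in radians) with the axis along which the beam was
    prepared 'up'. The function name is illustrative, not from the text."""
    p_up = math.cos(theta / 2) ** 2
    p_down = math.sin(theta / 2) ** 2
    return p_up, p_down

# Tabulate a few angles; 90 degrees corresponds to measuring spin-z
# on a beam prepared as spin-x up.
for degrees in (0, 60, 90, 180):
    p_up, p_down = stern_gerlach_probabilities(math.radians(degrees))
    print(f"{degrees:3d} deg: up={p_up:.3f} down={p_down:.3f}")
```

For any angle the two values sum to 1, and at a right angle both equal 1/2, as in the spin-x/spin-z example above.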

For each given preparation procedure (each prepared ‘state’) we thus
have well-defined probability measures over the outcome spaces of various
experiments. It is not obvious, however, what in general (if anything) the
joint distribution for the outcomes of different experiments should be,
because performing one kind of experiment (say, measuring spin-z) disturbs
the probabilities relating to other subsequent experiments (say, spin-x).
Indeed, if we imagine performing a spin-z followed by a spin-x measurement
on electrons originally prepared in a spin-x up state, we shall get a 50–50
distribution for the results of the last measurement (whether we previously
got spin-z up or down), although the original beam was 100% spin-x up. At
least in this sense, different measurements in general are incompatible.

¹ Note also that reversing either the sign of the gradient or the polarity of the field will
produce a deflection in the opposite direction (in both the classical and quantum case).

What is truly remarkable, however, and makes a straightforward hypothesis
of disturbance untenable, is that such a putative disturbance can
be undone if the spin-z up and spin-z down beams are brought together
again before the spin-x measurement. In this case one obtains again spin-x
up with probability 1, and this even if the whole experiment is performed
on one electron at a time. Thus this case cannot be explained by interaction
between different electrons when the two beams are brought together again.
It reminds one rather of typically wave-like phenomena: the up components
of the two beams appear to have interfered constructively, and the down
components of the two beams appear to have interfered destructively, al-
though it is each individual electron that displays this wave-like interference
behaviour.

There is thus a genuine puzzle (one of many!) about whether and how
the probability measures defined over the outcome spaces of the different ex-
periments can be combined. This will be one of the main questions discussed
in this article.

Let us very briefly review some standard bits of formalism (mainly to
ease the transition into the more abstract setting of Section 3).² The natural
mathematical framework for describing interference phenomena is a vector
space, where any two elements, call them ‘states’, $|\psi\rangle$ and
$|\varphi\rangle$, can be linearly superposed,

$$\alpha|\psi\rangle + \beta|\varphi\rangle . \tag{1}$$

The spin degree of freedom of an electron is described by a two-dimensional
complex vector space (with the usual scalar product, which we denote
$\langle\varphi|\psi\rangle$). Vectors are usually normalised to unit length,
since any two vectors $|\psi\rangle$ and $\alpha|\psi\rangle$ that are multiples
of each other are considered physically equivalent. Each pair of up and down
states is taken to correspond to an orthonormal basis, e.g. the spin-x and
spin-z states, related by

$$|{+_z}\rangle = \frac{1}{\sqrt{2}}\bigl(|{+_x}\rangle + |{-_x}\rangle\bigr) , \qquad |{-_z}\rangle = \frac{1}{\sqrt{2}}\bigl(|{+_x}\rangle - |{-_x}\rangle\bigr) . \tag{2}$$

Thus, while each spin-z state can be split into both up and down components
in the spin-x basis, the down components, say, can cancel out again if the
two spin-z states are appropriately combined:

$$\frac{1}{\sqrt{2}}\bigl(|{+_z}\rangle + |{-_z}\rangle\bigr) = \frac{1}{\sqrt{2}}\Bigl[\frac{1}{\sqrt{2}}\bigl(|{+_x}\rangle + |{-_x}\rangle\bigr) + \frac{1}{\sqrt{2}}\bigl(|{+_x}\rangle - |{-_x}\rangle\bigr)\Bigr] = |{+_x}\rangle . \tag{3}$$

² For more comprehensive but still relatively gentle introductions to quantum
mechanics (and its philosophy), see e.g. Albert (1992), Ghirardi (1997), Wallace (2008),
Bacciagaluppi (2013), and the relevant articles in the Stanford Encyclopedia of Philosophy
(http://plato.stanford.edu/). For treatments emphasising the modern notion of
effect-valued observables, used below in Section 3, see Busch, Lahti and Mittelstaedt (1991)
and Busch, Grabowski and Lahti (1995).
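
The contrast between adding probabilities and adding amplitudes can be checked numerically. The sketch below is an illustration rather than part of the formalism: it stores each state as a pair of coefficients in the spin-x basis (a representation chosen here for convenience) and compares the case in which the path through the spin-z apparatus is recorded with the case in which the two beams are recombined.

```python
import math

S = 1 / math.sqrt(2)

# States are stored as (coefficient of |+x>, coefficient of |-x>).
PLUS_Z = (S, S)    # |+z> = (|+x> + |-x>)/sqrt(2)
MINUS_Z = (S, -S)  # |-z> = (|+x> - |-x>)/sqrt(2)

def prob_x_up(state):
    """Probability of 'spin-x up': squared modulus of the |+x> coefficient."""
    return abs(state[0]) ** 2

# Case 1: the spin-z result is recorded, so probabilities add:
# P(x up) = P(+z) P(x up | +z) + P(-z) P(x up | -z).
p_measured = 0.5 * prob_x_up(PLUS_Z) + 0.5 * prob_x_up(MINUS_Z)

# Case 2: the beams are recombined without recording the path, so the
# amplitudes add first; the |-x> components cancel.
recombined = tuple(S * (a + b) for a, b in zip(PLUS_Z, MINUS_Z))
p_recombined = prob_x_up(recombined)

print(p_measured, p_recombined)
```

Up to floating-point rounding, the first number is 1/2 and the second is 1, which is the numerical counterpart of the cancellation of the down components.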

Temporal evolution is given by the action of an appropriate group of
linear operators (i.e. of linear mappings, which map superpositions into the
corresponding superposition) on the states:

$$|\psi\rangle \mapsto U_{t_2 t_1}|\psi\rangle . \tag{4}$$

These operators are in fact unitary, i.e. preserve the length of vectors (and
more generally scalar products between them). In a Stern–Gerlach measurement,
the relevant unitary evolution is generated by an operator containing
a term proportional to a ‘spin operator’, e.g. the z-spin operator $S_z$, written

$$S_z = \frac{\hbar}{2}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \tag{5}$$

in the spin-z basis, which thus simply multiplies by a scalar the spin-z
vectors:

$$S_z|{\pm_z}\rangle = \pm\frac{\hbar}{2}|{\pm_z}\rangle \tag{6}$$

(the spin-z states are eigenstates of the operator with corresponding
eigenvalues ±ℏ/2). During the measurement, this term couples the spin-z
eigenstates (which it leaves invariant) to the spatial degrees of freedom of the
electron, thus deflecting the motion of the electron:

$$|{\pm_z}\rangle|\psi\rangle \mapsto U|{\pm_z}\rangle|\psi\rangle = |{\pm_z}\rangle|\psi_\pm\rangle \tag{7}$$

(where we assume that $|\psi_\pm\rangle$ are states of the spatial degrees of freedom of the
electron in which the electron is localised in two non-overlapping regions).

Measurable quantities in quantum mechanics (usually called ‘observables’)
are traditionally associated with such operators, the corresponding
eigenvalues being the values the observable can take.³ Two observables thus
understood will be compatible if they have all eigenvectors in common, or
equivalently if the associated operators commute, i.e. $AB|\psi\rangle = BA|\psi\rangle$
for all states $|\psi\rangle$. Incompatibility of quantum mechanical observables is
intuitively related to the idea that measurements of non-commuting observables
generally require mutually exclusive experimental arrangements (implemented
through appropriate unitary operators).
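
As a concrete check of the commutation criterion, the following sketch computes commutators of 2×2 matrices in plain Python. It assumes the standard matrix form of the x-spin operator in the spin-z basis (in units of ℏ/2), which follows from the relation between the spin-x and spin-z bases; the helper names are illustrative.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def commutator(a, b):
    """Return AB - BA for 2x2 matrices."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]

# Spin operators in the spin-z basis, in units of hbar/2.
SZ = [[1, 0], [0, -1]]
SX = [[0, 1], [1, 0]]   # assumed standard form of the x-spin operator
IDENTITY = [[1, 0], [0, 1]]

print(commutator(SZ, SX))        # non-zero: spin-z and spin-x are incompatible
print(commutator(SZ, IDENTITY))  # zero: every operator commutes with the identity
```

A non-vanishing commutator signals that the two operators have no common basis of eigenvectors, which is the case for spin along two different axes.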

Note that if the initial state of the electron is a superposition of spin-z
states, e.g. the spin-x up state (3), the linearity of the evolution will preserve
the superposition:

$$\frac{1}{\sqrt{2}}\bigl(|{+_z}\rangle + |{-_z}\rangle\bigr)|\psi\rangle \mapsto \frac{1}{\sqrt{2}}\bigl(|{+_z}\rangle|\psi_+\rangle + |{-_z}\rangle|\psi_-\rangle\bigr) . \tag{8}$$

The state (8) no longer has product form, unlike the states (7). Indeed, in
quantum mechanics the composition of degrees of freedom (or of different
systems) proceeds by taking the tensor product of the vector spaces describing
the different degrees of freedom (or systems), which is the space of all
linear superpositions of product states. Non-product states are called
entangled, and are the source of some of the most peculiar features of quantum
mechanics. We can ignore them for now, however, until we make contact
with the discussion of the Bell inequalities in Section 5.

The last thing we need to recall from elementary treatments of quantum
mechanics is what happens upon measurement. Namely, when a certain
observable is measured, say $S_z$, the state of the system undergoes a stochastic
transformation (a ‘collapse’, or ‘reduction’, or ‘projection’), given
mathematically by the projection onto one of the eigenstates of the measured
observable, i.e. by the application of the projection operator
$P_+^z = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$ (or
$P_-^z = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$), that is, the operator with
eigenvectors $|{\pm_z}\rangle$ and corresponding eigenvalues 1 and 0 (or 0 and 1).
Thus, for instance,

$$|{+_x}\rangle = \frac{1}{\sqrt{2}}\bigl(|{+_z}\rangle + |{-_z}\rangle\bigr) \mapsto P_+^z\,\frac{1}{\sqrt{2}}\bigl(|{+_z}\rangle + |{-_z}\rangle\bigr) = \frac{1}{\sqrt{2}}|{+_z}\rangle \tag{9}$$

or

$$|{+_x}\rangle \mapsto P_-^z\,\frac{1}{\sqrt{2}}\bigl(|{+_z}\rangle + |{-_z}\rangle\bigr) = \frac{1}{\sqrt{2}}|{-_z}\rangle . \tag{10}$$

(The final state is then thought of as renormalised, i.e. rescaled to the unit
vector $|{+_z}\rangle$ or $|{-_z}\rangle$, respectively.)

³ In order for such an operator to generate a unitary group, it has to be self-adjoint,
which in finite dimensions simply means that the corresponding matrix is conjugate
symmetric, e.g.
$\begin{pmatrix} \alpha^* & \gamma^* \\ \beta^* & \delta^* \end{pmatrix} = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$.
This implies further that all its eigenvalues are
real and that the vector space has an orthonormal basis composed of eigenvectors of the
operator.

The probability for such a transformation is given by the so-called ‘Born
rule’: it is the modulus squared of the coefficient of the corresponding
component in the initial state, or equivalently the squared norm of the
(unnormalised) collapsed state. In the case of both (9) and (10) this equals
1/2. (Note that in a Stern–Gerlach experiment what we measure is in fact
whether the electron impinges on the upper or lower half of the screen, thus
collapsing the state (8) to one of $\frac{1}{\sqrt{2}}|{\pm_z}\rangle|\psi_\pm\rangle$, each with probability 1/2.)
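
The collapse and the Born rule can be traced step by step in code. The following is a minimal sketch, assuming states are written as two-component lists in the spin-z basis; the helper names `project` and `squared_norm` are invented for the illustration.

```python
import math

def project(projector, state):
    """Apply a 2x2 projector to a state vector (both in the spin-z basis)."""
    return [sum(projector[i][k] * state[k] for k in range(2)) for i in range(2)]

def squared_norm(vec):
    """Squared length of a vector with real or complex components."""
    return sum(abs(c) ** 2 for c in vec)

P_PLUS_Z = [[1, 0], [0, 0]]
P_MINUS_Z = [[0, 0], [0, 1]]
plus_x = [1 / math.sqrt(2), 1 / math.sqrt(2)]  # |+x> in the spin-z basis

for label, projector in (("+z", P_PLUS_Z), ("-z", P_MINUS_Z)):
    collapsed = project(projector, plus_x)   # unnormalised collapsed state
    probability = squared_norm(collapsed)    # Born rule
    norm = math.sqrt(probability)
    renormalised = [c / norm for c in collapsed]
    print(label, round(probability, 3), renormalised)
```

Both outcomes come out with probability 1/2, and the renormalised vectors are the spin-z basis vectors (up to rounding), matching the two collapse cases described above.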

2 Classical probability (with an eye to quantum mechanics)

We shall now look at the usual Kolmogorovian notion of probability,
formulated with an emphasis on aspects relevant for the analogy with quantum
probability in the next section.

In the standard formulation of Kolmogorov’s axioms, a probability space
is a triple (Ω, B, p), where Ω is a set, the event space B is a (σ-)field of
subsets of Ω (i.e. closed under complements, (denumerable) intersections and
unions, and containing Ω) and p is a normalised (σ-)additive real measure
on B, in the sense that

(C1) For all b ∈ B, p(b) ∈ [0, 1],

(C2) p(Ω) = 1,

(C3) For all finite (or denumerable) families $\{b_i\}$ of mutually disjoint sets
in B,

$$p\Bigl(\bigcup_i b_i\Bigr) = \sum_i p(b_i) .$$
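
For a finite sample space the three conditions can be verified exhaustively. The sketch below uses an invented three-point space with made-up weights and takes the full power set as the field B; in the finite case pairwise additivity is enough to establish (C3).

```python
from fractions import Fraction
from itertools import combinations

# A hypothetical three-point sample space with illustrative weights.
OMEGA = frozenset({"a", "b", "c"})
WEIGHTS = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}

def p(event):
    """Measure of an event (a subset of OMEGA)."""
    return sum((WEIGHTS[w] for w in event), Fraction(0))

def all_events(omega):
    """The power set of omega, used here as the field B."""
    items = sorted(omega)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

B = all_events(OMEGA)
c1 = all(0 <= p(b) <= 1 for b in B)               # (C1)
c2 = p(OMEGA) == 1                                # (C2)
c3 = all(p(b1 | b2) == p(b1) + p(b2)              # (C3), pairwise
         for b1 in B for b2 in B if not (b1 & b2))
print(c1, c2, c3)
```

Exact rational arithmetic (`Fraction`) is used so that the additivity check is not affected by floating-point rounding.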

Equivalently, since every (σ-)field of sets is a Boolean (σ-)algebra, and by
Stone’s theorem every Boolean (σ-)algebra is representable as a (σ-)field
of sets, one can take the event space B to be a Boolean (σ-)algebra, and
re-express (C1)–(C3) accordingly:

(C1′) For all b ∈ B, p(b) ∈ [0, 1],

(C2′) p(1) = 1,

(C3′) For all finite (or denumerable) families $\{b_i\}$ of mutually disjoint
events (i.e. with $b_i \wedge b_j = 0$ for $i \neq j$),

$$p\Bigl(\bigvee_i b_i\Bigr) = \sum_i p(b_i) ,$$

where we use 0 and 1 to denote also the zero and unit elements of the
algebra. In the following, we shall take the set Ω to be discrete (finite or
denumerable), and we shall take for simplicity all singletons {ω} ⊂ Ω to be
measurable.

We shall now introduce a (suitably general) notion of observable.
Consider first the random variables with values in the unit interval
$e : \Omega \to [0, 1]$ (so-called response functions or effects), and as a special
case the response functions $\chi : \Omega \to \{0, 1\}$ (these are identical with the
characteristic functions of measurable subsets Σ ⊂ Ω, i.e. functions that take
the value 1 on Σ and 0 on Ω \ Σ). We now define an observable as a (finite or
denumerable) family of response functions $\{e_i\}_{i\in I}$ such that

$$\sum_{i\in I} e_i = 1 \tag{11}$$

(where 1 is the random variable that is identically 1).⁴

For each such observable, the probability measure p induces a probability
measure on (the Boolean algebra generated by) the family of functions $\{e_i\}$
(or on the index set I), which we also denote by p:

$$p(e_i) := \sum_{\omega\in\Omega} e_i(\omega)\,p(\{\omega\}) , \tag{12}$$

and for any subset J of I:

$$p\Bigl(\sum_{i\in J} e_i\Bigr) := \sum_{i\in J}\sum_{\omega\in\Omega} e_i(\omega)\,p(\{\omega\}) . \tag{13}$$

⁴ One could consider also continuous or partially continuous families of response
functions as defining observables (even though the probability space itself is discrete), but we
shall ignore them for simplicity, and keep everything discrete.

(This is correctly normalised because of (11) and (C3′).) In the special case
in which all $e_i$ are ‘sharp’ ($e_i(1 - e_i) = 0$, where 0 is the random variable that
is identically 0), i.e. characteristic functions, we see that the probabilities
are just the measures of the sets defined by the characteristic functions
$\sum_{i\in J} e_i$, so that the ‘sharp observables’ are in bijective correspondence with
the (finite or denumerable) partitions of Ω, and ‘measuring’ a sharp
observable is simply a procedure for distinguishing between the events forming
such a partition.

General observables (at least in the classical case) can be interpreted as
noisy or fuzzy or unsharp versions of sharp observables. Indeed, take the
following observable, given by the resolution of the identity

$$\sum_{\omega\in\Omega} \chi_{\{\omega\}} = 1 \tag{14}$$

(we can call this the finest sharp observable). Now, every effect e can be
written as

$$\sum_{\omega\in\Omega} e(\omega)\,\chi_{\{\omega\}} . \tag{15}$$

Since for any observable $\{e_i\}$, in particular also for (14), the probability of
each $e_i$ is given by (12), we have that

$$p(\chi_{\{\omega\}}) = \sum_{\omega'\in\Omega} \chi_{\{\omega\}}(\omega')\,p(\{\omega'\}) = \chi_{\{\omega\}}(\omega)\,p(\{\omega\}) = p(\{\omega\}) , \tag{16}$$

and thus

$$p(e_i) = \sum_{\omega\in\Omega} e_i(\omega)\,p(\chi_{\{\omega\}}) . \tag{17}$$

And we see that $e_i(\omega)$ can be interpreted as the conditional probability for
the response $e_i$ in the experiment $\{e_i\}$, given that a (counterfactual)
measurement of the finest sharp observable would have yielded ω.
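
A small numerical example may help fix ideas. The sketch below sets up an invented three-point probability space and a two-outcome unsharp observable (all numbers are made up for illustration), checks that the response functions sum to 1 pointwise, and computes the outcome probabilities by weighting each response value with the probability of the corresponding point.

```python
from fractions import Fraction

# Hypothetical probability measure on a three-point space: p({omega}).
P = {"w1": Fraction(1, 2), "w2": Fraction(1, 4), "w3": Fraction(1, 4)}

# A two-outcome unsharp observable: the values e(omega) act as conditional
# response probabilities given omega.
OBSERVABLE = {
    "yes": {"w1": Fraction(9, 10), "w2": Fraction(1, 2), "w3": Fraction(0)},
    "no":  {"w1": Fraction(1, 10), "w2": Fraction(1, 2), "w3": Fraction(1)},
}

def is_observable(effects, omega):
    """Check that the response functions sum to 1 at every point."""
    return all(sum(e[w] for e in effects.values()) == 1 for w in omega)

def prob(effect, p):
    """p(e) = sum over omega of e(omega) * p({omega})."""
    return sum(effect[w] * p[w] for w in p)

print(is_observable(OBSERVABLE, P))
print({name: prob(e, P) for name, e in OBSERVABLE.items()})
```

Here the ‘yes’ response is certain to be absent at w3 but only likely (9/10) at w1, which is the sense in which the observable is a noisy version of a sharp one.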

It is important to note that while each experiment has a Boolean
structure (the subsets of the index set I form a Boolean algebra), in general these
Boolean algebras do not correspond to the Boolean subalgebras of B. It is
only the Boolean algebras associated with measurements of sharp
observables that correspond to Boolean subalgebras of B. (As we shall have again
occasion to remark in Section 4, sharp observables have a number of useful
properties not shared by general observables.)

There are two further notions we wish to introduce with an eye to the
analogy with quantum probability. One is a notion of compatibility of ob-
servables. To this end, we first introduce the coarse-graining of observables:
the observable $\{e_i\}_{i\in I}$ is a coarse-graining of the observable $\{g_k\}_{k\in K}$ iff there
is a partition of the index set $K = \bigcup_{i\in I} K_i$ such that for all $i \in I$,

$$e_i = \sum_{k\in K_i} g_k \tag{18}$$

(note that every sharp observable is indeed a coarse-graining of the finest
sharp observable (14)). Clearly, any experiment that measures $\{g_k\}$ also
measures $\{e_i\}$. We now call two observables $\{e_i\}$ and $\{f_j\}$ compatible iff
there is an observable $\{g_k\}$ such that $\{e_i\}$ and $\{f_j\}$ are both coarse-grainings
of $\{g_k\}$. The observable $\{g_k\}$ is called a joint observable for (or a joint
fine-graining of) $\{e_i\}$ and $\{f_j\}$.

Obviously, any two classical observables are compatible. Indeed, given
any two observables $\{e_i\}$ and $\{f_j\}$, we can define a joint observable $\{g_k\}$
simply as $\{e_i f_j\}_{(i,j)}$ (the indices ranging over those pairs (i, j) for which the
product $e_i f_j \neq 0$). In the special case of sharp observables, the product
$e_i f_j$ is of course the characteristic function of the intersection of the sets
defined by $e_i$ and $f_j$, and the joint observable corresponds simply to the
Boolean sub-algebra of B generated by the union of the two subalgebras
corresponding to $\{e_i\}$ and $\{f_j\}$.⁵
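
The product construction is easy to carry out explicitly. The following sketch builds the joint observable for two invented classical observables on a three-point space and checks that summing over one index recovers the other observable; the names `E_OBS`, `F_OBS` and the numerical values are assumptions made for the example.

```python
from fractions import Fraction

OMEGA = ("w1", "w2", "w3")

# Two hypothetical unsharp observables; each sums to 1 at every point.
E_OBS = {"e1": {"w1": Fraction(1), "w2": Fraction(1, 2), "w3": Fraction(0)},
         "e2": {"w1": Fraction(0), "w2": Fraction(1, 2), "w3": Fraction(1)}}
F_OBS = {"f1": {"w1": Fraction(1, 3), "w2": Fraction(1), "w3": Fraction(1, 4)},
         "f2": {"w1": Fraction(2, 3), "w2": Fraction(0), "w3": Fraction(3, 4)}}

def joint(obs_e, obs_f, omega):
    """Joint observable {e_i f_j}, dropping products that vanish identically."""
    g = {}
    for i, e in obs_e.items():
        for j, f in obs_f.items():
            product = {w: e[w] * f[w] for w in omega}
            if any(product.values()):
                g[(i, j)] = product
    return g

def coarse_grain_first(g, i, omega):
    """Sum the joint effects over the second index to recover e_i."""
    return {w: sum(v[w] for (a, _), v in g.items() if a == i) for w in omega}

G = joint(E_OBS, F_OBS, OMEGA)
print(coarse_grain_first(G, "e1", OMEGA) == E_OBS["e1"])
print(all(sum(v[w] for v in G.values()) == 1 for w in OMEGA))
```

The recovery of each $e_i$ works because the $f_j$ sum to 1 at every point, so summing $e_i f_j$ over j leaves $e_i$ unchanged.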

The other notion we introduce with an eye to quantum probability is
that of a state, defined as a family of overlapping probability measures over
the outcomes of all possible experiments, i.e. a mapping p from the response
functions to the reals, such that:

(C1″) For all e, p(e) ∈ [0, 1],

(C2″) p(1) = 1,

(C3″) For all (finite or denumerable) families $\{e_i\}$ of effects with
$\sum_i e_i \leq 1$,

$$p\Bigl(\sum_i e_i\Bigr) = \sum_i p(e_i) .$$

⁵ The definition of the joint of any two observables $\{e_i\}$ and $\{f_j\}$ extends to that for
finite sets of observables. In the case of the sharp observables it is easy to see that arbitrary
sets of observables are jointly compatible, because they are all Boolean subalgebras of the
same Boolean algebra B (corresponding to the finest sharp observable (14)).

We have already seen that a classical probability measure p induces a
probability measure on every classical observable and thus defines a state in this
sense. Conversely, the family of the probability measures on all observables
fixes the original probability measure uniquely, because the original
probability measure is nothing other than the probability measure associated with
the finest sharp observable (14). Note, finally, that if we consider all possible
states p, we can identify the response functions e with the affine mappings
from the states into [0, 1] defined by

$$e : p \mapsto p(e) . \tag{19}$$

3 Quantum mechanics (with an eye to probability)

As mentioned in Section 1, the formalism of quantum mechanics is based
on the fairly familiar structure of a vector space with scalar product, or
technically a Hilbert space (because it is complete in the norm induced by
the scalar product, vacuously so in finite dimensions).⁶ What interests
us in particular (with an eye to highlighting the probabilistic structure of
quantum mechanics) are the notions of ‘state’ and ‘observable’ we find in
the formalism.

States are associated in the first place with (unit) vectors in the space.⁷
As in our example of a Stern–Gerlach measurement and the associated
‘collapse’ of the state (9–10), experiments are generally and abstractly
associated with probabilistic transformations of the states, corresponding to the
idea that ‘measurements’ induce an irreducible disturbance of a quantum
system:

$$|\psi\rangle \mapsto A|\psi\rangle \tag{20}$$

(where the right-hand side should be thought of as suitably renormalised).⁸
The probability for a transition of the form (20) is given by

$$\langle\psi|A^*A|\psi\rangle , \tag{21}$$

which is the scalar product of the vector $A|\psi\rangle$ with itself.⁹ The operator
A in (20) is arbitrary (in particular not necessarily unitary or self-adjoint),
the only restriction being that (21) be no greater than 1. Note that the
probability (21) depends only on the product $A^*A$ and not on the specific
transformation A (indeed, one can construct infinitely many other operators
B such that $B^*B = A^*A$). Operators of the form $E = A^*A$ for some
transformation A are called ‘effects’.¹⁰

⁶ The Hilbert spaces used in standard quantum mechanics are over the complex
numbers, and they are separable (i.e. they always have either a finite or a denumerable basis).

⁷ The notation we use is the so-called Dirac notation, in which scalar products are
denoted by angle brackets, $\langle\varphi|\psi\rangle$, and vectors (‘kets’) are denoted by right half-brackets,
$|\psi\rangle$ (a left half-bracket, $\langle\varphi|$, denotes the linear functional (‘bra’) assigning to each vector
$|\psi\rangle$ the complex number $\langle\varphi|\psi\rangle$).
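
A simple worked example, offered here as an illustration rather than taken from the text, shows two different transformations implementing the same effect. Take A to be the projection onto spin-z up and let B be A followed by a unitary spin flip U (both the choice of U and the matrix forms in the spin-z basis are assumptions of this example):

```latex
\begin{gather*}
A = P_+^z = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad
U = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad
B = UA = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \\
B^*B = A^*U^*UA = A^*A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \\
A|{+_z}\rangle = |{+_z}\rangle , \qquad B|{+_z}\rangle = |{-_z}\rangle .
\end{gather*}
```

Both transformations occur with the same probability in every state, since they share the effect $A^*A$, but they leave the system in different final states. The same argument works for any unitary U, which is one way to see that infinitely many transformations correspond to a given effect.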

Each experiment will include a number of alternative transformations
that could possibly take place:

$$|\psi\rangle \mapsto A_i|\psi\rangle , \tag{22}$$

whose combined probability equals 1:

$$\sum_i \langle\psi|A_i^*A_i|\psi\rangle = 1 . \tag{23}$$

This is required to hold for all possible unit vectors $|\psi\rangle$, so in fact we have

$$\sum_i A_i^*A_i = 1 , \tag{24}$$

or, writing $E_i := A_i^*A_i$,

$$\sum_i E_i = 1 , \tag{25}$$

where 1 is the identity operator. Each such (finite or denumerable) family
of effects $\{E_i\}_{i\in I}$, or ‘resolution of the identity’ (25), is called an observable,
quantum mechanical effects being the formal analogue of classical response
functions.¹¹

⁸ As we point out at the end of this section, one can consider even more general
transformations, but that does not in the least affect the generality of what follows.

⁹ The operator $A^*$, called the adjoint of A, is the operator defined by
$\langle\varphi|A^*\psi\rangle = \langle A\varphi|\psi\rangle$
(ignoring niceties about domains of definition, which become vacuous if the Hilbert space
is finite-dimensional). Self-adjoint operators are operators with $A^* = A$. Note that
$(AB)^* = B^*A^*$, so that in particular an operator of the form $A^*A$ is self-adjoint. Note
also that for A self-adjoint, the mapping $A \mapsto \langle\psi|A\psi\rangle$ (normally written
$A \mapsto \langle\psi|A|\psi\rangle$) is
a linear functional onto the positive reals that is normalised, in the sense that
$\langle\psi|1|\psi\rangle = 1$
(with 1 the identity operator), and is continuous with respect to the so-called operator
norm (again vacuously so in finite dimensions).

¹⁰ More explicitly, effects are operators E such that both $E = A^*A$ and $1 - E = B^*B$
for some operators A, B (thus ensuring that the expression (21) is between 0 and 1).

Note that the probability of an effect (in any given state) is independent
of which observable the effect is part of, i. e. which family of alternative
transformations is being implemented in a particular experiment. We shall
return to this ‘non-contextuality’ of probabilities in Sections 5 and 6 below.¹²
Suffice it to say now that it is a non-trivial feature because, unlike the
classical case, the same effect could be part of two mutually incompatible
observables.

There is more than one definition of (in)compatibility in the literature,
but the following one (on which we have modelled the definition of Section 2)
is the most suited to our purposes (see e.g. Cattaneo et al. 1997). Define
an observable $\{E_i\}_{i\in I}$ to be a coarse-graining of the observable $\{G_k\}_{k\in K}$ iff
there is a partition of the index set $K = \bigcup_{i\in I} K_i$ such that for all $i \in I$,

$$E_i = \sum_{k\in K_i} G_k . \tag{26}$$

Any experiment that measures $\{G_k\}$ also measures $\{E_i\}$. As in the classical
case, we call two observables $\{E_i\}$ and $\{F_j\}$ compatible iff there is an
observable $\{G_k\}$ such that $\{E_i\}$ and $\{F_j\}$ are both coarse-grainings of $\{G_k\}$. The
observable $\{G_k\}$ is called a joint observable for (or a joint fine-graining of)
$\{E_i\}$ and $\{F_j\}$. Compatibility of two observables can be easily generalised
to joint compatibility of arbitrary sets of observables.
¹¹ As in the classical case, one could consider more general observables (even in finite
dimensions), in which the sum in (25) is replaced or supplemented by an integral. For
simplicity, however, we consider only discrete resolutions of the identity.

¹² More generally, the term non-contextuality is used to denote independence not only
of the observable measured but also of any details of the measurement context (Shimony
(1984) calls these, respectively, ‘algebraic’ and ‘environmental’ contextuality). Unless
we have a very weird theory, we can presumably assume that the observable measured
is fixed by the details of the experiment (which in quantum mechanics also determine
which transformation implements any particular effect). Cf. also our distinction between
‘observables’ and ‘experiments’ in Section 4.
Any (older) textbook on quantum mechanics defines an observable as a
self-adjoint operator A, i.e. an operator with $A^* = A$. But
this traditional definition corresponds to a special case of the one above.
Self-adjoint operators are diagonalisable, in the sense that $A = \sum_i a_i P_i$
with real $a_i$ (the eigenvalues) and $\{P_i\}$ a family of projections (self-adjoint
operators with $P_i^2 = P_i$, or $P_i(1 - P_i) = 0$, where 0 is the zero operator)
that are mutually orthogonal ($P_i P_j = 0$ for $i \neq j$).¹³ Thus, each
self-adjoint operator is associated with a unique ‘projection-valued observable’,
i.e. a resolution of the identity (25), in which the effects $E_i$ are in fact
projections (they are ‘sharp’, meaning $E_i(1 - E_i) = 0$), and which is finite
if the Hilbert space is finite-dimensional. Note also that a measurement
of such a ‘sharp observable’ can be implemented by taking $A_i = P_i$, since
$P_i^* P_i = P_i$, i.e. each state is transformed to an eigenstate of the measured
observable. This is the usual ‘collapse postulate’ or ‘projection postulate’
of textbook quantum mechanics, corresponding to a ‘minimally disturbing’
measurement of a sharp observable.¹⁴

Compatibility of two sharp observables A and B is traditionally defined
as their commutativity, i.e. AB = BA. This is equivalent to the
commutativity of the elements of the respective resolutions of the identity,
$\sum_i P_i = 1$ and $\sum_j Q_j = 1$, i.e.

$$P_i Q_j = Q_j P_i \quad \text{for all } i, j . \tag{27}$$

In this case, a joint (projection-valued) observable $\{R_k\}$ is given by $\{P_i Q_j\}_{(i,j)}$
(the indices (i, j) ranging over those pairs for which $P_i Q_j \neq 0$). Indeed,
trivially,

$$P_i = \sum_n P_i Q_n \quad \text{and} \quad Q_j = \sum_m P_m Q_j . \tag{28}$$

This argument generalises to show that finite sets of pairwise compatible
sharp observables possess a joint projection-valued resolution of the
identity.¹⁵ Indeed, since in finite dimensions all diagonal decompositions are
discrete, one can generalise it further to arbitrary sets of pairwise
commuting operators.

¹³ This is the so-called spectral theorem. In infinite dimensions the sum is generally to
be replaced or supplemented by an integral. The set of (generalised) eigenvalues is called
the spectrum of the operator. Each discrete eigenvalue $a_i$ has corresponding eigenvectors,
i.e. vectors such that $A|\psi\rangle = a_i|\psi\rangle$, and if the whole spectrum is discrete there exists an
orthonormal basis of the Hilbert space consisting of eigenvectors of A. In finite dimensions,
the number of terms in the diagonal decomposition is bounded by the dimension n of
the Hilbert space, because there cannot be a set of more than n mutually orthogonal
(eigen)vectors. Note that the spectrum of a projection operator is just the set {0, 1}.

¹⁴ Note that the corresponding transition probabilities reduce to the form
$\langle\psi|P_i|\psi\rangle$, i.e.
the scalar product between the initial state $|\psi\rangle$ and final state $P_i|\psi\rangle$. The value of the
scalar product $\langle\psi|\varphi\rangle$ is often referred to as the transition probability between the vectors
$|\psi\rangle$ and $|\varphi\rangle$.

Effect-valued and projection-valued observables are the analogues,
respectively, of the general and sharp classical observables of Section 2, and,
at least in some cases, also effect-valued observables can be interpreted as
‘unsharp’ versions of projection-valued ones. Since every effect E is a
self-adjoint operator, it is itself diagonalisable as

$$E = \sum_k e_k P_k , \tag{29}$$

where all the eigenvalues $e_k$ are positive and lie in the interval [0, 1].¹⁶ Now
suppose all effects $E_i$ in an observable (25) commute: there will then exist a
single projection-valued resolution of the identity $\{R_k\}$, such that every $E_i$
can be written as

$$E_i = \sum_k e^i_k R_k , \tag{30}$$

with suitable coefficients $e^i_k$. The probability of $E_i$ in the state $|\psi\rangle$ is

$$\langle\psi|E_i|\psi\rangle = \sum_k e^i_k \langle\psi|R_k|\psi\rangle , \tag{31}$$

and we can again at least formally identify the coefficient $e^i_k$ as the
conditional probability that the measurement of $\{E_i\}$ yields i given that a
measurement of $\{R_k\}$ would have yielded k.¹⁷ Thus, we can think of a commutative
effect-valued observable as (at least probabilistically equivalent to) a ‘noisy’
or ‘fuzzy’ or ‘unsharp’ measurement of an associated projection-valued
observable.

¹⁵ This property of the sharp observables in quantum mechanics is known as ‘coherence’.
See also Section 5 below.

¹⁶ Indeed, an equivalent definition of an effect is as a self-adjoint operator with spectrum
in the interval [0, 1].

¹⁷ We have the correct normalisation because the $E_i$ form a resolution of the identity
(25), so that

$$\sum_i \sum_k e^i_k \langle\psi|R_k|\psi\rangle = \sum_i \langle\psi|E_i|\psi\rangle = \langle\psi|\sum_i E_i|\psi\rangle = 1 ,$$

and choosing $|\psi\rangle$ to be an eigenstate with eigenvalue 1 of, say, $R_{k_0}$, we get

$$1 = \sum_i \sum_k e^i_k \langle\psi|R_k|\psi\rangle = \sum_i \sum_k e^i_k \delta_{k k_0} = \sum_i e^i_{k_0} ,$$

independently of the choice of $k_0$.

Further, given our definition of joint observables, we can also understand
the general case of a non-commutative effect-valued observable $\{E_i\}_{i\in I}$ as
a joint observable for the generally denumerably many commutative
effect-valued observables $\{E_i, 1 - E_i\}$, one for each $i \in I$.¹⁸ This gives us a further
insight into compatibility and incompatibility, namely that incompatible
observables can be made compatible if one is willing to introduce enough
‘noise’ in one’s measurements.¹⁹

While in a sense we can thus reduce the effect-valued observables to
the projection-valued ones (and the question of incompatibility essentially
to that of incompatibility for projection-valued observables), it makes very
good sense to work with the more general effect-valued ones. To quote three
reasons: effect-valued observables are needed for modelling realistic experi-
ments; the concatenation of two experiments is clearly an experiment, but
cannot generally be represented by a projection-valued observable; and no
measurement of a projection-valued observable can fully determine a quan-
tum state, while — precisely because effect-valued observables can combine
probabilistic information from incompatible projection-valued ones, even if
noisily — there are so-called ‘informationally complete’ effect-valued ob-
servables, whose measurement statistics completely determine the quantum
state.20

Let us now return to the quantum states themselves. We can think of a
quantum state as defining a family of overlapping probability measures over
the outcomes of all possible experiments. More precisely, we can identify
states $|\psi\rangle$ with mappings $p_\psi$ from the effects to the reals, such that:
18
See Cattaneo et al. (1997) for a fuller discussion of both the commutative and the
non-commutative case (in finite dimensions).
19
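Indeed, let $\{E_i\}$ and $\{F_j\}$ be incompatible observables. Then the observable
$\{\tfrac{1}{2}E_1, \tfrac{1}{2}F_1, \tfrac{1}{2}E_2, \tfrac{1}{2}F_2, \ldots\}$ is trivially a joint observable for $\{\tfrac{1}{2}\mathbf{1}, \tfrac{1}{2}E_1, \tfrac{1}{2}E_2, \ldots\}$ and
$\{\tfrac{1}{2}\mathbf{1}, \tfrac{1}{2}F_1, \tfrac{1}{2}F_2, \ldots\}$, which are ‘noisy’ versions of $\{E_i\}$ and $\{F_j\}$.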
20
Any effect-valued observable can in fact be implemented by letting the system interact
appropriately with an ancillary system and then performing an appropriate measurement
of a projection-valued observable on the ancillary system (so-called Naimark dilation),
much like one measures spin by coupling it to the spatial degrees of freedom and then
measuring position. But we shall not need this in the following.

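(Q1) For all E, $p_\psi(E) = \langle\psi|E|\psi\rangle \in [0, 1]$ ,

(Q2) $p_\psi(\mathbf{1}) = \langle\psi|\mathbf{1}|\psi\rangle = 1$ ,

(Q3) For all (finite or denumerable) families $\{E_i\}$ of effects with $\sum_i E_i \le \mathbf{1}$,

$$ p_\psi\Big(\sum_i E_i\Big) = \sum_i p_\psi(E_i) . $$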

This definition can of course be restricted to projection operators (with
(Q3) restricted to families of mutually orthogonal projections). If one does
so, a famous theorem due to Gleason (1957) (valid for Hilbert spaces of
dimension n ≥ 3) shows that the most general state ρ on the projections is
an arbitrary convex combination (i. e. a weighted average) of vector states pψ .
A fortiori this is true for states defined on effects (and the direct proof for
this case is much simpler (Busch 2003)). The most general quantum states
thus form a convex set, with the vector states as its extremal points.21

Perhaps surprisingly, these general quantum states cannot uniquely be
decomposed as convex combinations of vector states. (We shall not have
the space to develop this point, but it is a very important disanalogy with
the classical case.) This can be seen very easily in the special case of a spin-
1/2 system (described by a 2-dimensional Hilbert space), using a rather
beautiful geometric representation, the so-called Poincaré sphere (or Bloch
sphere). The unit vectors form a 2-dimensional complex sphere, and this is
affinely isomorphic to a 3-dimensional real sphere. That is, one can map the
two bijectively in a way that preserves convex combinations. This allows
one to associate the abstract ‘spin’ states with directions in 3-dimensional
space: an electron has spin ‘up’ in the direction r iff its abstract spin state is
mapped to the spatial vector r under this affine mapping, and ‘down’ iff it is
mapped to the spatial vector −r. One can then see directly that any point
in the interior of the sphere (corresponding to a general quantum state)
can be written in infinitely many ways as a convex combination of points
on the surface of the sphere (corresponding to vector states). Indeed, any
straight line through a point in the interior will intersect the surface of the
sphere in two points, thus defining a convex decomposition of the interior
point, but there are infinitely many such straight lines. The centre of the
21
General quantum states are formally represented by so-called ‘density operators’ ρ,
and the probabilities they define on the effects E can be expressed using the ‘trace func-
tional’, Tr(ρE) (see footnote 2 for standard references).

sphere corresponds to the so-called maximally mixed state, which assigns
equal probabilities $\tfrac{1}{2}$ to spin up or down in any spatial direction r.
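The non-uniqueness of convex decompositions can be made concrete with a short Python sketch. It takes a point `m` in the interior of the unit ball and a direction `d`, finds the two points where the straight line through `m` along `d` meets the sphere, and computes the convex weights. The function names and the sample point are illustrative assumptions.

```python
import math

def chord_decomposition(m, d):
    """Decompose an interior point m of the unit ball along direction d.

    Returns (w_plus, s_plus, w_minus, s_minus) with s_plus and s_minus on the
    unit sphere, w_plus + w_minus = 1, and m = w_plus*s_plus + w_minus*s_minus.
    """
    norm_d = math.sqrt(sum(x * x for x in d))
    u = [x / norm_d for x in d]
    md = sum(a * b for a, b in zip(m, u))
    mm = sum(a * a for a in m)
    # |m + t u| = 1 is the quadratic t^2 + 2 t (m.u) + |m|^2 - 1 = 0.
    root = math.sqrt(md * md + 1.0 - mm)
    t_plus, t_minus = -md + root, -md - root
    s_plus = [a + t_plus * b for a, b in zip(m, u)]
    s_minus = [a + t_minus * b for a, b in zip(m, u)]
    w_plus = -t_minus / (t_plus - t_minus)
    w_minus = t_plus / (t_plus - t_minus)
    return w_plus, s_plus, w_minus, s_minus

def recombine(w_plus, s_plus, w_minus, s_minus):
    """Form the convex combination of the two surface points."""
    return [w_plus * a + w_minus * b for a, b in zip(s_plus, s_minus)]

m = [0.2, 0.0, 0.3]                          # hypothetical mixed state
dec_x = chord_decomposition(m, [1.0, 0.0, 0.0])
dec_y = chord_decomposition(m, [0.0, 1.0, 0.0])
print(recombine(*dec_x), recombine(*dec_y))  # both reproduce m
```

The two calls use different lines through the same interior point, so they return different pairs of vector states that mix to the same general state.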

Note that, in turn, effects can be identified with the mappings from the
states into [0, 1] defined by
$$ E : |\psi\rangle \mapsto p_\psi(E) . \tag{32} $$
These mappings are also affine, i. e. map convex combinations of states to
the same convex combination of the corresponding probabilities.22

We are now ready to sketch a generalisation of the notion of probability
encompassing both classical and quantum probability (Section 4), and to
discuss whether it is indeed a non-trivial generalisation of the classical notion
(Sections 5 and 6).

4 Generalised probability (a sketch)

There are different (but largely convergent) approaches to defining a theory
of probability generalising both classical and quantum probability. The
seminal work in this area is due to Mackey (1957), who inaugurated what
is known today as the ‘convex set’ approach to quantum and generalised
probability, which axiomatises directly pairs of state–observable structures
(Mackey 1963, Varadarajan 1968, Beltrametti and Cassinelli 1981).

Alternative more concrete routes to generalised probability have been
pursued in particular by Ludwig (1954, 1985) and his group, and by Foulis
and Randall with their work on test spaces (starting with Randall and Foulis
(1970, 1973), and Foulis and Randall (1972, 1974); see also the review by
Wilce (2000)).

Research in quantum logic opened yet further routes into generalised
probability by providing various generalisations of Boolean algebras (orthomodular
lattices, orthomodular posets, partial Boolean algebras, orthoalgebras,
effect algebras etc.), thus generalising the event spaces of classical
22
Once the notion of state has been generalised, it makes sense to generalise also the
notion of state transformation, which in (20) was restricted to those transformations that
map vector states to vector states. This does not lead, however, to a further generalisation
of the notion of observable: effect-valued observables are still the most general ones, and
can be used to completely determine a general quantum state.

probability. The main lines of research in this tradition stem, respectively,
from the classic paper by Birkhoff and von Neumann (1936), which inau-
gurated the lattice-theoretic version of quantum logic (with its emphasis
on weakening the distributive law, originally in favour of modularity and
then of orthomodularity), and from the work on partial Boolean algebras by
Specker (1960) and Kochen and Specker (1965a,b, 1967) (with its emphasis
on partial operations).23

I shall deliberately ignore these distinctions and sketch instead a somewhat
pedagogical version of generalised probability theory — drawing from
elements of these various approaches, and having the advantage of being
fairly simple and of leading rather naturally to the abstract notions of ef-
fect algebras and orthoalgebras (possibly the best current candidates for
providing an abstract setting for a generalised probability theory). While
much of what follows can be generalised or ought to be generalisable to the
denumerable case, in this section I shall focus exclusively on the finite case.

Let us start with the quasi-operational idea of a set $\mathcal{A}$ of experiments.
Each experiment $A \in \mathcal{A}$ is characterised by a Boolean algebra $\mathcal{B}_A$ of experimental
outcomes $e_A \in \mathcal{B}_A$. We further consider states $p \in \mathcal{P}$, which
define probability measures over the outcomes of all possible experiments,
p(eA ). If an outcome eA of one experiment can be somehow identified with
an outcome eB of a different experiment, one will obviously require that
p(eA ) = p(eB ) for all states p, i. e. the probability measures induced by the
states can generally overlap. Less obviously, we will identify all pairs of
experimental outcomes that are equiprobable in all states. This will be the
first of only very few substantial requirements. Thus we will define effects as
equivalence classes e = [eA ] of experimental outcomes under the equivalence
relation ∼ of equiprobability for all p:

eA ∼ eB :⇔ p(eA ) = p(eB ) for all p . (33)

We denote the set of all effects by E. In Section 6 we shall return to the
question of identifying outcomes of different experiments. Suffice it to say
at this stage that identifying equiprobable outcomes means we are thinking
of effects as characterised by what they can tell us about the states.24 Note
23
For useful discussions and reviews of aspects of quantum logic and generalised prob-
ability, see also Hardegree and Frazer (1981), Hughes (1985), Coecke, Moore and Wilce
(2001), Wilce (2012), Darrigol (forthcoming), and the collections edited by Hooker (1975),
Marlow (1978) and Coecke, Moore and Wilce (2000).
24
To take a fully operationalist stance on this question would mean to identify events

that each effect e defines an affine mapping from the states to the unit
interval:
$$ e : \mathcal{P} \to [0, 1], \quad p \mapsto p(e) . \tag{34} $$
It may be convenient to require that every such mapping corresponds to an
effect, but for our limited purposes we shall not do so.
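The passage from experimental outcomes to effects via (33) can be sketched in Python on a toy model. Everything in it is an assumption made for illustration: two hypothetical experiments `A` and `B`, two hypothetical states given as tables of probabilities, and a `profile` function that plays the role of the mapping (34) restricted to the listed states.

```python
from fractions import Fraction as F
from itertools import combinations

# Hypothetical experiments, each with its own atomic outcomes.
experiments = {"A": ["a1", "a2"], "B": ["b1", "b2", "b3"]}

# Hypothetical states: each assigns probabilities to the atoms of every experiment.
states = [
    {"a1": F(1, 2), "a2": F(1, 2), "b1": F(1, 2), "b2": F(1, 4), "b3": F(1, 4)},
    {"a1": F(1, 3), "a2": F(2, 3), "b1": F(1, 3), "b2": F(1, 3), "b3": F(1, 3)},
]

def outcomes(atoms):
    """All elements of the Boolean algebra generated by the atoms (as frozensets)."""
    return [frozenset(c) for r in range(len(atoms) + 1)
            for c in combinations(atoms, r)]

def profile(event):
    """The probabilities p(event), one entry per listed state."""
    return tuple(sum(p[atom] for atom in event) for p in states)

# Equation (33): outcomes are identified iff they are equiprobable in all states.
effects = {}
for name, atoms in experiments.items():
    for event in outcomes(atoms):
        effects.setdefault(profile(event), []).append((name, sorted(event)))

for prof, members in effects.items():
    print(prof, members)
```

In this toy model the twelve outcomes collapse to six effects. For example, `a1` and `b1` are identified, and so are the distinct outcomes `b2` and `b3` of the same experiment, since the listed states cannot tell them apart.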

Now, for any two experiments $A, B \in \mathcal{A}$,

0A ∼ 0B and 1A ∼ 1B (35)

(where 0A and 1A are the 0 and 1 elements of the Boolean algebra BA , and
similarly for BB ). Indeed, p(0A ) = 0 and p(1A ) = 1 for all p, independently
of A. We can thus define an effect 0 and an effect 1 as

0 := [0A ] independently of A (36)

and
1 := [1A ] independently of A . (37)
Similarly, for any $A, B \in \mathcal{A}$, if eA ∼ eB then ¬eA ∼ ¬eB (where ¬ denotes
negation in the relevant Boolean algebra). Indeed, if p(eA ) = p(eB ) for all
p, then

p(¬eA ) = 1 − p(eA ) = 1 − p(eB ) = p(¬eB ) for all p , (38)

and for any effect e we can define a unique effect e⊥ as

e⊥ := [¬eA ] independently of A . (39)

Clearly, for any e, we have $e^{\perp\perp} = e$. Note, however, that it is perfectly possible
for some e that $e^\perp = e$ (this is the case for both the response function
$\tfrac{1}{2}\mathbf{1}$ in classical probability and the effect $\tfrac{1}{2}\mathbf{1}$ in quantum probability).

with results of laboratory procedures, such as letting an electron pass through an inho-
mogeneous magnetic field and observing it hit the upper (or lower) half of a screen; or
opening a box and finding (or failing to find) a gem inside it. But this will clearly not
do. Despite the lip service to operationalism, such procedures will always abstract away
from aspects of the experimental setting deemed irrelevant, e. g. whether or not I am per-
forming the experiment standing on one leg. But to deem any detail of the experimental
arrangement irrelevant is already to make a theoretical decision. To quote two suggestive
examples: it might be important whether the magnetic field is stronger at the north or
the south pole of the magnet (this detail is irrelevant in standard quantum mechanics,
but makes all the difference in the description of spin measurements in de Broglie and
Bohm’s pilot-wave theory, as discussed e. g. by Albert (1992)); or it might be important
whether one opens box A together with box B or together with box C (we shall discuss
this example explicitly in Section 6).

The states naturally induce a partial ordering on the effects — which
will be a useful tool in the following — defined as

e≤f :⇔ p(e) ≤ p(f ) for all p . (40)

Note that
e ≤ f ⇔ f ⊥ ≤ e⊥ . (41)
We now introduce two important notions: compatibility and orthogonality
of effects.

Two effects e and f are compatible, written e ↔ f , iff there is an experiment
A and outcomes eA ∈ BA and fA ∈ BA such that e = [eA ] and f = [fA ].
That is, two effects are compatible iff they can be measured in a single
experiment. The definition of compatibility can be trivially extended to
finite sets of effects.

Two effects e and f are orthogonal (or disjoint), written e ⊥ f , iff
$e \le f^\perp$, i. e. p(e) ≤ 1 − p(f ) for all p. Note that this relation is symmetric,
but generally not irreflexive (since it is possible that $e^\perp = e$). We can
generalise also orthogonality to finite sets of effects, by defining a family
$\{e_i\}$ of effects to be jointly orthogonal iff $\sum_i p(e_i) \le 1$ for all p.

Note that if there are experimental outcomes $e_A$ and $f_A$ in some $A \in \mathcal{A}$
such that $e_A \le f_A$ with respect to the partial order on the Boolean algebra
BA , then p(eA ) ≤ p(fA ) for all p, and therefore [eA ] ≤ [fA ] with respect to
the partial order on the effects.

We shall require, conversely, that if e ≤ f for two effects e and f , then
there exist at least one experiment A and experimental outcomes eA ∈ e and
fA ∈ f , such that eA ≤ fA in the Boolean algebra BA . This is the second of
our substantive requirements.

It follows in particular that comparable effects are compatible, that is,
that e ≤ f implies that e and f are in fact compatible. Since $e_A \le f_A$ for
some A means that $p(f_A \wedge \neg e_A) = p(f_A) - p(e_A)$ for all p, it also follows
that if e ≤ f , there is an effect $f \ominus e := [f_A \wedge \neg e_A]$ (independently of any
particular A with $e_A \le f_A$), which is jointly compatible with e and f , and
such that $p(f \ominus e) = p(f) - p(e)$ for all p.

By the requirement above we also have that orthogonal effects are com-
patible. Indeed, e ⊥ f means that e and f ⊥ are comparable, and thus

compatible. But if eA ∈ e and ¬fA ∈ f ⊥ are in the same Boolean algebra
BA , then so are eA and fA , thus e and f are compatible. It also follows
that if e ⊥ f , i. e. p(e) + p(f ) ≤ 1 for all p, there is an effect e ⊕ f jointly
compatible with e and f , such that p(e ⊕ f ) = p(e) + p(f ) for all p. Indeed,
given the above, we can define

$$ e \oplus f := (f^\perp \ominus e)^\perp = [e_A \vee f_A] \tag{42} $$

(independently of any particular A with $e_A \le \neg f_A$), which is jointly compatible
with e and f , and such that

p(e ⊕ f ) = p(e) + p(f ) for all p . (43)

Note that for e ≤ f we thus have

$$ f = e \oplus (f \ominus e) , \tag{44} $$

or

$$ f = e \oplus (e \oplus f^\perp)^\perp \tag{45} $$

(the so-called ‘effect algebra orthomodular identity’).
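The partial operations introduced so far can be checked on a toy model in Python. The sketch below assumes, purely for illustration, that an effect is represented by its tuple of probabilities in three hypothetical states and that every such [0, 1]-valued tuple counts as an effect (as for classical response functions), so that the required sums and differences exist automatically. The names `perp`, `leq`, `oplus` and `ominus` are chosen here and are not from the text.

```python
from fractions import Fraction as F

def perp(e):
    """Orthocomplement: p(e_perp) = 1 - p(e) in every state."""
    return tuple(1 - x for x in e)

def leq(e, f):
    """Partial order (40): p(e) <= p(f) in every state."""
    return all(x <= y for x, y in zip(e, f))

def orthogonal(e, f):
    """e is orthogonal to f iff e <= perp(f)."""
    return leq(e, perp(f))

def oplus(e, f):
    """Partial sum, defined only for orthogonal effects."""
    if not orthogonal(e, f):
        raise ValueError("e (+) f is undefined: e and f are not orthogonal")
    return tuple(x + y for x, y in zip(e, f))

def ominus(f, e):
    """Partial difference f (-) e, defined only for e <= f."""
    if not leq(e, f):
        raise ValueError("f (-) e is undefined: e is not below f")
    return tuple(y - x for x, y in zip(e, f))

# Two hypothetical effects with e <= f.
e = (F(1, 4), F(0), F(1, 2))
f = (F(1, 2), F(1, 3), F(1, 2))

print(oplus(e, ominus(f, e)) == f)                  # identity (44)
print(oplus(e, perp(oplus(e, perp(f)))) == f)       # identity (45)
print(oplus(e, perp(f)) == perp(ominus(f, e)))      # definition (42), with f_perp in place of f
```

In the general framework the existence of these sums and differences is what the second substantive requirement secures; the toy model simply builds it in.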

As our third and last substantive requirement, we shall strengthen the
above so that for (finite) ordered chains of effects e1 ≤ e2 ≤ e3 ≤ . . .,
there exist an experiment A and experimental outcomes e1A ∈ e1 , e2A ∈ e2 ,
etc., such that e1A ≤ e2A ≤ e3A ≤ . . . in the Boolean algebra BA , so that in
particular the effects in the chain are jointly compatible. (In the rest of
this section, whenever we write ‘ordered chain’ we shall mean ‘finite ordered
chain’, and similarly for ‘jointly orthogonal set’.)

If ordered chains are jointly compatible it follows that also jointly orthogonal
sets of effects are jointly compatible. Indeed, given a jointly orthogonal
set of effects $\{e_i\}_{i=1}^N$, the sequence of effects

e1 , e1 ⊕ e2 , (e1 ⊕ e2 ) ⊕ e3 , . . . (46)

is an ordered chain, and so is a jointly compatible set. But if e1A , e1A ∨ e2A , . . .
are in the same Boolean algebra BA , so are e1A , e2A , . . ., and the original set
{ei } is jointly compatible.

A (finite) observable on E can now be defined simply as a jointly orthogonal
set (for which one automatically has $\bigoplus_i e_i \le 1$), and a state p ∈ P can
be identified with a mapping from the effects to the reals, such that:

(G1) For all e ∈ E, p(e) ∈ [0, 1] ,

(G2) p(1) = 1 ,

(G3) For all jointly orthogonal sets $\{e_i\}$ of effects,

$$ p\Big(\bigoplus_i e_i\Big) = \sum_i p(e_i) . $$

Since jointly orthogonal sets are jointly compatible, to each such observable
on E there corresponds at least one experiment $A \in \mathcal{A}$. Coarse-graining
and compatibility of observables can be defined as above, and compatibility
of two effects e and f is trivially equivalent to compatibility of the two
observables {e, 1 − e} and {f, 1 − f }.

We are now in a position to show that the structure (E, 0, 1, ⊕) is an
effect algebra, that is, a structure with two distinguished elements 0 and
1 and a partial operation ⊕ (defined on a subset of E × E), satisfying the
following axioms:25

(E1) The partial operation ⊕ is commutative, i. e. if e ⊕ f is defined, so is
f ⊕ e, and e ⊕ f = f ⊕ e.

(E2) The partial operation ⊕ is associative, i. e. if e ⊕ f and (e ⊕ f ) ⊕ g are
defined, so are f ⊕ g and e ⊕ (f ⊕ g), and (e ⊕ f ) ⊕ g = e ⊕ (f ⊕ g).

(E3) For any e, there is a unique element e⊥ such that e ⊕ e⊥ = 1.

(E4) If e ⊕ 1 is defined, then e = 0.

(The elements of an abstract effect algebra are also called effects.)

Proof :
Define ⊕ as above. Since the relation ⊥ is symmetric, if e ⊕ f is defined, so
is f ⊕ e, and (E1) follows because

p(e) + p(f ) = p(f ) + p(e) for all p . (47)


25
There are some slightly different but equivalent axiomatisations of effect algebras,
which were introduced more or less independently by a number of authors: by Giuntini
and Greuling (1989) under the name ‘weak orthoalgebras’ or ‘generalised orthoalgebras’,
by Kôpka (1992) under the name ‘D-posets’, and by Foulis and Bennett (1994) under the
name ‘effect algebras’. Here I follow the latter.

Next, assume that e ⊕ f and (e ⊕ f ) ⊕ g are defined, i. e.

p(e) ≤ 1 − p(f ) and p(e) + p(f ) ≤ 1 − p(g) for all p . (48)

Then also

p(f ) ≤ 1 − p(g) and p(f ) + p(g) ≤ 1 − p(e) for all p , (49)

i. e. also f ⊕ g and e ⊕ (f ⊕ g) are defined, and (E2) follows because

$$ \big(p(e) + p(f)\big) + p(g) = p(e) + \big(p(f) + p(g)\big) . \tag{50} $$

(Associativity of ⊕ was already implicit when we showed above that orthogonal
sets are jointly compatible, rather than only orthogonal sequences.)

Further, for each e the unique element satisfying (E3) is the element $e^\perp$
defined by (39): clearly $e \oplus e^\perp$ is defined and $e \oplus e^\perp = 1$; and because of
(43), if there are two effects f and f′ both satisfying

$$ p(e \oplus f) = p(e \oplus f') = 1 \text{ for all } p , \tag{51} $$

then p(f ) = p(f′) for all p, hence f = f′, and (E3) follows.

Finally, if e ⊕ 1 is defined then p(e ⊕ 1) = p(e) + p(1) = p(e) + 1 for all p,
but since 0 ≤ p(e), p(e ⊕ 1) ≤ 1, we have p(e) = 0 for all p, and (E4) follows.
QED.
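For a finite example the axioms (E1)–(E4) can also be confirmed by brute force. The Python sketch below uses a hypothetical nine-element model, namely response functions on a two-point sample space taking values in {0, 1/2, 1}, with ⊕ defined as pointwise addition whenever the sum stays within [0, 1]. The model and the name `check_axioms` are assumptions for illustration only.

```python
from fractions import Fraction as F
from itertools import product

VALUES = (F(0), F(1, 2), F(1))
EFFECTS = list(product(VALUES, repeat=2))   # nine hypothetical effects
ZERO, ONE = (F(0), F(0)), (F(1), F(1))

def oplus(e, f):
    """Partial sum; returns None when e (+) f is undefined."""
    s = tuple(x + y for x, y in zip(e, f))
    return s if all(x <= 1 for x in s) else None

def check_axioms():
    # (E1): e (+) f and f (+) e are defined together and agree.
    for e, f in product(EFFECTS, repeat=2):
        if oplus(e, f) != oplus(f, e):
            return False
    # (E2): if (e (+) f) (+) g is defined, so is e (+) (f (+) g), and they agree.
    for e, f, g in product(EFFECTS, repeat=3):
        ef = oplus(e, f)
        if ef is not None and oplus(ef, g) is not None:
            fg = oplus(f, g)
            if fg is None or oplus(e, fg) != oplus(ef, g):
                return False
    for e in EFFECTS:
        # (E3): exactly one f with e (+) f = 1.
        if len([f for f in EFFECTS if oplus(e, f) == ONE]) != 1:
            return False
        # (E4): e (+) 1 is defined only for e = 0.
        if oplus(e, ONE) is not None and e != ZERO:
            return False
    return True

print(check_axioms())
```

The check is exhaustive only for this small model; the proof in the text is what establishes the axioms in general.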

Note that in any effect algebra, we can abstractly define a partial order
e ≤ f as: there is a g such that e ⊕ g = f . Given our definition of ⊕, it
follows from (43) and (44) above that our previous definition of the partial
order on E coincides with the abstract one.

Similarly, in any effect algebra one can abstractly define relations of
orthogonality and compatibility. Two effects e and f are orthogonal in the
abstract sense iff e ⊕ f is defined, and a finite set of effects $\{e_i\}$ is jointly
orthogonal iff $\bigoplus_i e_i$ is defined. Our definition of orthogonality for E clearly
coincides with the abstract one.

As for compatibility, two or finitely many effects $\{f_j\}$ are compatible in
the abstract sense iff there is an orthogonal set $\{e_i\}_{i \in I}$ and subsets $I_j \subset I$
such that $f_j = \bigoplus_{i \in I_j} e_i$ for all j (we shall say the family of effects $\{f_j\}$ has
an orthogonal decomposition).

It is easy to see that also our definition of compatibility for E coincides
with the abstract one. For instance (and similarly for finitely many effects),
if two effects e and f are compatible in our sense above, there is an experiment
$A \in \mathcal{A}$ and experimental outcomes eA ∈ e and fA ∈ f in BA . In this
case the effects g := [eA ∧ fA ], h := [eA ∧ ¬fA ] and i := [¬eA ∧ fA ] form a
jointly orthogonal set with e = g ⊕ h and f = g ⊕ i (such a ‘minimal’ orthog-
onal decomposition is called a Mackey decomposition).26 Conversely, if two
effects e and f are compatible in the abstract sense, they have an orthogonal
decomposition (in fact a Mackey decomposition). But orthogonality in the
abstract sense coincides with orthogonality in our sense above, and we have
already seen that this implies compatibility also in our sense.

Using the partial order, we can finally define sharp elements of the ef-
fect algebra (‘sharp effects’ or ‘projections’), as those satisfying e ∧ e⊥ = 0
(meaning that the greatest lower bound of e and e⊥ exists and is 0). Since
0 is the minimal element of the partially ordered set (poset) E, this in turn
means that every lower bound of e and e⊥ is 0, i. e.

p(f ) ≤ min[p(e), 1 − p(e)] for all p ⇒ f =0. (52)

We can now show that the sharp elements of E form an orthoalgebra L, i. e.
in addition to (E1)–(E4) they satisfy also:

(E5) If e ⊕ e is defined, then e = 0

(in this case, (E4) in fact becomes redundant).

Proof :
Let e be sharp, and let e ⊕ e be defined. Then, by (43),

$$ p(e \oplus e) = 2p(e) \text{ for all } p , \tag{53} $$

and thus $p(e) \le \tfrac{1}{2}$ for all p. But then $1 - p(e) \ge \tfrac{1}{2}$ for all p, therefore

p(e) = min[p(e), 1 − p(e)] for all p . (54)

Since e is sharp, e = 0, and (E5) follows. QED.
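The role of sharpness in (E5) can be seen in a small Python sketch. It assumes a hypothetical model in which effects are response functions on a two-point sample space with values in {0, 1/2, 1}, ordered pointwise; the names `is_sharp` and `self_orthogonal` are introduced here for illustration.

```python
from fractions import Fraction as F
from itertools import product

VALUES = (F(0), F(1, 2), F(1))
EFFECTS = list(product(VALUES, repeat=2))   # nine hypothetical effects
ZERO = (F(0), F(0))

def perp(e):
    return tuple(1 - x for x in e)

def leq(e, f):
    return all(x <= y for x, y in zip(e, f))

def is_sharp(e):
    """Condition (52): the only lower bound of e and its orthocomplement is 0."""
    return all(f == ZERO for f in EFFECTS if leq(f, e) and leq(f, perp(e)))

def self_orthogonal(e):
    """e (+) e is defined iff e <= perp(e)."""
    return leq(e, perp(e))

sharp = [e for e in EFFECTS if is_sharp(e)]
e5_on_sharp = all(e == ZERO for e in sharp if self_orthogonal(e))
e5_counterexamples = [e for e in EFFECTS if self_orthogonal(e) and e != ZERO]

print(sharp)                # the {0, 1}-valued response functions
print(e5_on_sharp)          # (E5) holds on the sharp elements
print(e5_counterexamples)   # unsharp effects for which e (+) e is defined
```

In this model the sharp elements are exactly the four {0, 1}-valued functions, while unsharp effects such as (1/2, 1/2) violate (E5).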


26
Note that, once g is given, h and i are uniquely defined by $h = e \ominus g$ and $i = f \ominus g$.
However, g = [eA ∧ fA ] itself is generally not independent of the choice of A, i. e. Mackey
decompositions are generally not unique, so that one cannot define a partial operation ∧
in this way. (Some pairs of effects may nevertheless have greatest lower bounds in the
sense of the partial order.)

Note that the sharp elements of an arbitrary effect algebra need not form
an orthoalgebra in general, so this last result depends in fact on how we have
constructed E.

Without further requirements on the experiments and the states, however,
we cannot guarantee that general observables can be somehow reduced
to sharp observables. (Indeed, nothing forces the orthoalgebra of sharp ele-
ments of E to be an interestingly rich structure — there might even be no
sharp effects besides 0 and 1!)

One could ask further what conditions one might impose on E or L in
order to recover classical or quantum probability. In the case of classical
probability, it is obvious that L needs to be a Boolean algebra. In the case
of quantum probability, there are some classic partial results going some way
towards ensuring that the orthoalgebra of sharp effects be isomorphic to the
projections on a complex Hilbert space. For instance, one might impose the
following conditions on an orthoalgebra, in increasing order of strength:

(α) Unique Mackey Decomposition (UMD), i. e. compatible pairs are required
to have unique Mackey decompositions. This ensures that the
Boolean structure of sharp experiments coincides where experiments
overlap, in particular allowing conjunction and disjunction to be de-
fined globally as partial operations on the orthoalgebra (thereby turn-
ing an orthoalgebra into a so-called Boolean manifold).

(β) Orthocoherence, i. e. pairwise orthogonal sets are jointly orthogonal.
This ensures that an orthoalgebra is an orthomodular poset.

(γ) Coherence, i. e. pairwise compatible sets are jointly compatible. This
ensures that an orthoalgebra is a (transitive) partial Boolean algebra.

These are all properties of the orthoalgebras of sharp effects in both quantum
and classical probability (indeed, we have seen this explicitly in the case of
coherence).27
27
For good discussions, see Hardegree and Frazer (1981) and Hughes (1985). Note that
the effect algebras of quantum and classical probability are less well-behaved. For instance,
ef and min(e, f ) are alternative elements g defining non-unique Mackey decompositions
for any two response functions e and f (and analogously for any two commuting quantum
effects E and F ). And $\tfrac{1}{4}\mathbf{1}, \tfrac{1}{3}\mathbf{1}, \tfrac{1}{2}\mathbf{1}$ are pairwise orthogonal quantum effects that fail to be
jointly orthogonal (and analogously for multiples of the classical response function 1).
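Both counterexamples in this footnote are easy to verify numerically. The Python sketch below does so for classical response functions on a hypothetical two-point sample space; the sample values of `e` and `f` and the helper `is_mackey_decomposition` are assumptions made for illustration.

```python
from fractions import Fraction as F

def is_mackey_decomposition(e, f, g):
    """Check that g, h = e - g, i = f - g are effects and jointly orthogonal.

    By construction e = g + h and f = g + i pointwise.
    """
    h = tuple(x - z for x, z in zip(e, g))
    i = tuple(y - z for y, z in zip(f, g))
    parts_are_effects = all(0 <= v <= 1 for part in (g, h, i) for v in part)
    jointly_orthogonal = all(a + b + c <= 1 for a, b, c in zip(g, h, i))
    return parts_are_effects and jointly_orthogonal

# Two hypothetical response functions.
e = (F(1, 2), F(1, 3))
f = (F(1, 2), F(3, 4))
g_product = tuple(x * y for x, y in zip(e, f))       # g = ef
g_minimum = tuple(min(x, y) for x, y in zip(e, f))   # g = min(e, f)

print(is_mackey_decomposition(e, f, g_product),
      is_mackey_decomposition(e, f, g_minimum),
      g_product != g_minimum)

# Multiples 1/4, 1/3, 1/2 of the unit response function.
multiples = [F(1, 4), F(1, 3), F(1, 2)]
pairwise_orthogonal = all(a + b <= 1
                          for k, a in enumerate(multiples)
                          for b in multiples[k + 1:])
jointly_orthogonal = sum(multiples) <= 1
print(pairwise_orthogonal, jointly_orthogonal)
```

The two choices of g give different valid decompositions of the same pair, and the three multiples sum pairwise to at most 1 but jointly to 13/12.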

A well-known problem, however, relates to the existence of tensor prod-
ucts, i. e. the possibility of composing generalised probability structures.
If enough states exist (in a well-defined sense), one can construct tensor
products of orthoalgebras, but forming tensor products tends to destroy
orthocoherence. In this respect, the theory still needs to be investigated
further.28

Fortunately, these are not questions we need to address in order to discuss
whether a generalisation of the notion of probability such as the above is
indeed necessary, or whether generalised probabilities might after all be
embeddable in some suitable classical probability space. This is what we
discuss in the next section, and we can do it by looking at very simple
examples.

5 Non-embeddability (and no-hidden-variables)

In Section 4, we have sketched a generalised theory of probability that is
non-classical in the sense that it allows for incompatibility of observables.
None of what we have said so far, however, shows that it is impossible to
embed such a generalised probabilistic structure into some larger classical
probability space, at least under some minimal assumptions.

For instance, if we require that the orthoalgebra L of sharp effects
have the UMD property, then we can define partial Boolean operations
on the sharp effects. One might now naïvely imagine formally extending
the Boolean operations even to pairs of incompatible ones, and defining a
probability measure on the thus enlarged event space that should return
the original measures as marginals on each of the original Boolean algebras.
28
See in particular Foulis and Randall (1979, 1981). Other programmes for reconstruct-
ing quantum mechanics have been developed in more recent years, some of which have
had striking successes (Comte 1996, Hardy 2002, Goyal 2008a,b, Dakić and Brukner 2011,
Masanes and Müller 2011, Chiribella, D’Ariano and Perinotti 2011), the most influen-
tial of these arguably being Hardy (2002). (See Darrigol (forthcoming) for analysis and
comparison of most or all of these approaches.) The same is true of programmes for re-
constructing specific striking aspects of quantum mechanics, such as the bounds in the
Bell inequalities (perhaps of particular note among these is the research by Cabello and
co-workers (Cabello, Severini and Winter 2010, Cabello 2012, 2013)). Making connec-
tions between quantum logic and quantum probability on the one hand and some of these
newer approaches on the other might eventually prove very fruitful. For a suggestion in
this direction, see Bacciagaluppi and Wilce (in preparation).

Indeed, if two sharp effects e and f are not compatible, the joint probability
of e ∧ f might not be experimentally meaningful, but by the same token
no experiment would constrain us in choosing a measure that specified also
these joint probabilities (e. g. by considering e and f as independent) — or
so it might seem. And if the orthoalgebra of sharp events is rich enough,
one might get back all the original observables by considering unsharp reali-
sations of sharp observables, as in the classical case of Section 2. Even if we
considered this a purely formal construction, it would still mean that our
‘generalised probability spaces’ are indeed embeddable into classical prob-
ability spaces, and thus provide only a fairly trivial generalisation of the
formalism of classical probability.

Where this argument goes wrong, however, is in failing to realise the rich-
ness of the compatibility structure in a general orthoalgebra. The relation
of compatibility is clearly reflexive and symmetric, but it is not transitive,
so that observables do not fall neatly into different equivalence classes of
mutually compatible ones. Put slightly differently, if compatibility is not
transitive, it is possible for the same observable to be a coarse-graining of
two mutually incompatible observables — which in this sense can be said
to be partially compatible. And the interlocking structure of partially com-
patible observables (and the corresponding partially overlapping probability
measures) can be surprisingly rich. In such cases the question of whether
general probabilistic states might be induced by classical probability mea-
sures becomes non-trivial.

We shall now construct a simple master example showing explicitly that
probabilistic states cannot in general be induced by classical probability
measures. We shall then see how the example relates to classic impossibility
theorems ruling out various kinds of ‘hidden variables’ theories in quantum
mechanics (in particular Bell’s theorem and the Kochen–Specker theorem).29

Imagine that we have three boxes, A, B and C. We can open any box,
and find (or fail to find) a gem in it. Let e, f, g be the outcomes ‘finding a
gem in the box’ in each of the three experiments. We further imagine that
we can open any two but not all three boxes simultaneously. We finally
imagine that we have probabilistic states specifying for any pair of boxes
the probabilities for finding gems in neither, either or both of the boxes.30
29
In fact our example generalises one given by Albert (1992) for the purpose of illus-
trating the Bell inequalities.
30
Cf. Specker’s tale of the Sage of Nineveh in Section 6 below.

Now let us take a special state, such that for any ordered pair (x, y) with
x, y ∈ {e, f, g}:
p(x|y) = p(¬x|¬y) = α , (55)
with α ∈ [0, 1]. For α = 1, such a state obviously has the form

$$ \begin{aligned}
p(x \wedge y) &= a , \\
p(\neg x \wedge \neg y) &= 1 - a , \\
p(x \wedge \neg y) &= 0 , \\
p(\neg x \wedge y) &= 0
\end{aligned} \tag{56} $$

for all pairs (x, y), for some a ∈ [0, 1]. And it is equally obvious that for any
a, the state is (uniquely) induced by the probability measure defined by

p(e ∧ f ∧ g) = a , p(¬e ∧ ¬f ∧ ¬g) = 1 − a , (57)

and

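$$ p(e \wedge f \wedge \neg g) = p(e \wedge \neg f \wedge g) = p(e \wedge \neg f \wedge \neg g) = p(\neg e \wedge f \wedge g) = p(\neg e \wedge f \wedge \neg g) = p(\neg e \wedge \neg f \wedge g) = 0 . \tag{58} $$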

For the case α ≠ 1 we have instead:

Lemma:
A state satisfying (55) with α ≠ 1 is uniquely given by

$$ p(x \wedge y) = p(\neg x \wedge \neg y) = \frac{\alpha}{2} , \tag{59} $$

and

$$ p(x \wedge \neg y) = p(\neg x \wedge y) = \frac{1 - \alpha}{2} \tag{60} $$

for any (x, y). Thus in particular,

$$ p(x) = p(\neg x) = p(y) = p(\neg y) = \frac{1}{2} . \tag{61} $$

We then have a rather striking result:

Proposition:
Under the assumptions of the Lemma, p is induced by a joint probability
measure on the Boolean algebra (formally) generated by {e, f, g} if and only
if $\alpha \ge \tfrac{1}{3}$.

The proofs are left for the Appendix. Intuitively, the case α = 1 cor-
responds to perfect correlations between finding or not finding gems in any
two boxes, and it is indeed obvious that this state can be extended to a
probability measure in which there are perfect correlations between finding
gems in all three boxes. To take an intermediate case, $\alpha = \tfrac{1}{2}$ is the uncorrelated
case, in which any two boxes are independent, and is again obviously
extendable to the case in which all three boxes are independent. The case
α = 0 instead is the perfectly anti-correlated case, and is clearly not clas-
sically reproducible: if whenever there is a gem in the first box there is no
gem in the second, and whenever there is no gem in the second there is one
in the third, then whenever there is a gem in the first box there also is one in
the third, contradicting the hypothesis.31 In fact, every state with α < 1 is
a convex combination of the (unique) perfectly anti-correlated state and the
(special) perfectly correlated state with $p(e) = p(f) = p(g) = \tfrac{1}{2}$. While all
states with positive correlations, the uncorrelated state, and even some with
negative correlations are reproducible classically (all states with $\alpha \ge \tfrac{1}{3}$), if
the perfectly anti-correlated component comes to dominate too strongly, the
negative correlations can no longer be reproduced by a classical probability
measure.
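The threshold α = 1/3 can be explored with a short Python sketch. This is not the Appendix proof: it merely builds one symmetric candidate for the joint measure on the eight atoms of the Boolean algebra generated by {e, f, g} and checks it against (59) and (60). The function names and the choice of a symmetric candidate are assumptions made for illustration.

```python
from fractions import Fraction as F
from itertools import product

def symmetric_joint(alpha):
    """Candidate joint weights on the 8 atoms, indexed by truth values of (e, f, g).

    The two constant atoms get (3*alpha - 1)/4, the six others (1 - alpha)/4.
    """
    x = (3 * alpha - 1) / 4
    y = (1 - alpha) / 4
    return {atom: (x if len(set(atom)) == 1 else y)
            for atom in product((True, False), repeat=3)}

def pair_marginal(joint, i, j, vi, vj):
    """Total weight of the atoms in which box i has value vi and box j has value vj."""
    return sum(w for atom, w in joint.items() if atom[i] == vi and atom[j] == vj)

def reproduces_state(joint, alpha):
    """Check (59) and (60) for every pair of boxes."""
    for i, j in ((0, 1), (0, 2), (1, 2)):
        for vi, vj in product((True, False), repeat=2):
            target = alpha / 2 if vi == vj else (1 - alpha) / 2
            if pair_marginal(joint, i, j, vi, vj) != target:
                return False
    return True

def is_probability_measure(joint):
    return all(w >= 0 for w in joint.values()) and sum(joint.values()) == 1

for alpha in (F(1, 2), F(1, 3), F(1, 4)):
    joint = symmetric_joint(alpha)
    print(alpha, reproduces_state(joint, alpha), is_probability_measure(joint))
```

The candidate always has the right pair marginals, but its weights are non-negative only for α ≥ 1/3. That no other candidate can do better follows from a counting argument: every atom in which the three boxes do not all agree disagrees on exactly two of the three pairs, so twice the total weight of such atoms equals 3(1 − α), which cannot exceed 2.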

As we shall now see, a larger set of states can, however, be reproduced
quantum mechanically (all states with $\alpha \ge \tfrac{1}{4}$). This shows, indeed, that
already the case of quantum probabilities requires generalised probabilities
that cannot be embedded in classical probability spaces. It also shows explicitly
that quantum probabilities are only a subset of all possible generalised
probabilities.32
³¹ More precisely: the Proposition states the non-existence of any joint probability measure exhibiting perfect anti-correlations between the outcomes of three binary experiments, and this obviously implies the non-existence of trivial probability measures (i. e. ones assigning the probabilities 0 or 1 to all events) exhibiting the same anti-correlations; but the converse is also true, because if a probability measure did exist, it would at least formally be a convex combination of trivial probability measures, all of which would have to exhibit perfect anti-correlations (since any weaker ones would spoil the perfect anti-correlations of any of their convex combinations); the argument in the text establishes the non-existence of such trivial probability measures.
³² Note that by the remark on convex combinations in the previous paragraph, the states that are specifically quantum (1/4 ≤ α < 1/3) can be simulated by a convex combination of a classical and a non-classical but non-quantum state. Such techniques can be put to use for instance to analyse the non-locality of a quantum state in terms of the maximally non-local non-quantum states (‘Popescu–Rohrlich boxes’) needed for its simulation (see e. g. Cerf et al. 2005).

Take two spin-1/2 systems in the so-called singlet state,

(1/√2) (|+⟩|−⟩ − |−⟩|+⟩) . (62)

In this state, results of spin measurements in the same direction on the two electrons are perfectly anti-correlated. The singlet state (62) is also rotationally symmetric, so that perfect anti-correlations are obtained for pairs of parallel measurements in whatever direction.³³ Pairs of spin measurements on different particles are always compatible, and the conditional probability for spin up in direction −r′ on the right, given spin up in direction r on the left, is equal to cos²(ϑ/2), where ϑ is the angle between r and r′ (since each outcome has marginal probability 1/2, the corresponding joint probability is cos²(ϑ/2)/2). Note that taking the two directions r, r″ on the left and the two directions −r′, −r on the right, one obtains four compatible pairs comprising each one direction on the left and one on the right.

Now, if one is attempting to construct a joint probability measure for all four spin observables, then, given the perfect correlations for the pair of measurements in the directions (r, −r), the constraint (55) with α = cos²(ϑ/2) will hold of the joint probabilities for (r, −r′) and (−r′, r″) (in both cases defined on different sides) and the putative joint probabilities for (r″, r) (defined on the same side).

It is obvious that one can have three spatial directions pairwise spanning the same angle ϑ iff this angle is between 0 (when the three directions are collinear) and 120 degrees (when they are coplanar). This corresponds exactly to values of cos²(ϑ/2) between 1 and 1/4. And it gives us a quantum model for states satisfying (55) with 1/4 ≤ α ≤ 1.³⁴
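The singlet-state probabilities used in this quantum model can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original argument: it assumes the standard antisymmetric singlet state (|+⟩|−⟩ − |−⟩|+⟩)/√2, represents spin-up states by two-component spinors in the z basis, and uses hypothetical helper names (spin_up, joint_up_up, alpha_for_angle). The joint probability of ‘up along r on the left, up along −r′ on the right’ comes out as cos²(ϑ/2)/2, so that the conditional probability, which plays the role of α in constraint (55), is cos²(ϑ/2).

```python
import cmath
import math

def spin_up(n):
    """Spin-up spinor (in the z basis) along the direction n = (x, y, z)."""
    x, y, z = n
    norm = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(max(-1.0, min(1.0, z / norm)))  # polar angle
    phi = math.atan2(y, x)                            # azimuthal angle
    return (complex(math.cos(theta / 2)),
            cmath.exp(1j * phi) * math.sin(theta / 2))

def joint_up_up(a, b):
    """Singlet-state probability of spin up along a (left) and along b (right)."""
    a0, a1 = spin_up(a)
    b0, b1 = spin_up(b)
    # Amplitude <a+|<b+| applied to (|+>|-> - |->|+>) / sqrt(2)
    amplitude = (a0.conjugate() * b1.conjugate()
                 - a1.conjugate() * b0.conjugate()) / math.sqrt(2)
    return abs(amplitude) ** 2

def alpha_for_angle(theta):
    """Conditional probability of up along -r' (right) given up along r (left),
    where theta is the angle between r and r'."""
    r = (0.0, 0.0, 1.0)
    minus_r_prime = (-math.sin(theta), 0.0, -math.cos(theta))
    return joint_up_up(r, minus_r_prime) / 0.5  # each marginal equals 1/2

# Three coplanar directions at 120 degrees give the quantum bound alpha = 1/4.
assert abs(alpha_for_angle(2 * math.pi / 3) - 0.25) < 1e-12
```

Sweeping the angle from 0 to 120 degrees traces out exactly the interval 1/4 ≤ α ≤ 1 discussed above.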
³³ Here we are going slightly beyond the brief description of entangled states given in Section 1. The point to grasp is that measurements on different subsystems are always compatible with each other, so we can consider the probabilities for such joint measurements in an entangled state, and it turns out that some such states provide perfect examples of correlations that cannot be reproduced classically.
³⁴ A related quantum model of the same correlations is the following. Take a single spin-1/2 system and take spin states corresponding to spin up in three directions r, r′, r″ pairwise spanning the same angle ϑ. The transition probability |⟨ψ|ϕ⟩|² between any two such spin states is also equal to cos²(ϑ/2). Of course, sharp spin observables in non-collinear directions are not compatible, because the projections on the spin states in different directions do not commute. So these transition probabilities only correspond to conditional probabilities for outcomes of sequential minimally disturbing spin measurements. But if we assume the initial state of the electron to be the maximally mixed state (which assigns probability 1/2 to spin in any direction), one can see that the statistics of such sequential measurements on any two spin observables are independent of the order of measurement (and the marginal statistics are the same as the statistics for single measurements). We can thus say that any two spin observables are compatible in the maximally mixed state (with either sequential observable playing the role of a joint fine-graining of the two single observables). With this understanding, we can reproduce the example also using a single spin system. (It is a general fact that using the perfect correlations of the singlet state, one can always translate between results about the (im)possibility of modelling certain correlations in two systems (‘non-locality’ results) and results about the (im)possibility of modelling certain correlations in a single system (‘non-contextuality’ results).)

We can now make explicit the connection with classic results in quantum mechanics about ruling out various kinds of ‘hidden variables’ models, in particular the Bell inequalities and the Kochen–Specker theorem.

The best-known Bell inequality is the Clauser–Horne–Shimony–Holt (CHSH) inequality (Clauser et al. 1969),

−2 ≤ E(AB) − E(AB′) + E(A′B) + E(A′B′) ≤ 2 , (63)

where A, A′, B, B′ are two-valued observables with the values ±1, each of A, A′ is compatible with each of B, B′, and

E(XY) = p(X = 1, Y = 1) − p(X = −1, Y = 1) − p(X = 1, Y = −1) + p(X = −1, Y = −1) (64)

is the correlation coefficient of X and Y. As is well-known, the CHSH inequality (63) can be derived from the assumption of a local hidden variables model when A, A′ and B, B′ are interpreted, respectively, as pairs of observables pertaining to two (space-like) separated systems, e. g. spin-1/2 observables in various directions for two different particles (Bell 1971). (One readily recognises that the quantum version of our example above uses the same set-up, with a special choice of directions.)

It was Fine (1982) who first pointed out that (63) is also the necessary and sufficient condition for the existence of a joint probability measure for the observables A, A′, B, B′ when the marginals for the four compatible pairs are given. Such a joint probability measure is known as a ‘non-contextual hidden variables’ model of the experimental situation, since the same measure returns the correct marginals irrespective of how an observable is paired with an observable on the other side, or indeed of how an observable is assumed to be measured.³⁵ Pitowsky (1989a,b, 1991, 1994) then gave a general and systematic treatment of necessary and sufficient conditions for the existence of joint probability measures in terms of such inequalities, further pointing out that these results had already been anticipated more than a century earlier by George Boole (1862).³⁶

In this sense, our discussion of the master example above must be a special case of a Bell inequality, and in fact it is a special case of (63). Setting A = B′ in (63) we get

−2 ≤ E(AB) − E(AA) + E(A′B) + E(A′A) = E(AB) − 1 + E(A′B) + E(A′A) ≤ 2 (65)

for any three two-valued observables. Interpreting any two of them as ‘finding or not finding a gem in the box’, and substituting the probabilities (55) into (64), we have

E(XY) = 2 (α/2) − 2 ((1 − α)/2) = α − (1 − α) = 2α − 1 (66)

for any distinct X, Y, and (65) becomes

−2 ≤ 6α − 3 − 1 ≤ 2 , (67)

that is

2 ≤ 6α ≤ 6 , (68)

thus α ∈ [1/3, 1], as above.

³⁵ More generally (and, indeed, in the case of Bell’s derivation), one could consider ‘contextual hidden variables’ models of the correlations, i. e. allow the use of a different classical probability measure in each experimental context, thus treating for instance the outcomes of a measurement of A as different events in the two cases in which A is measured together with B or together with B′. The relation between Bell’s and Fine’s derivations of the CHSH inequalities can then be seen as follows. Take the case of the singlet state, and assume that the two systems are spatially (or space-like) separated, call them the left-hand and right-hand system. Assume that the probabilities for the outcomes of a spin measurement on the left-hand system depend on the experimental context on the left. But then, given the perfect (anti-)correlations, also the probabilities for outcomes of spin measurements in the same direction on the right-hand side will (non-locally!) depend on the experimental context on the left. Thus, a local hidden variables model that reproduces the correlations for the singlet state must be non-contextual, and so by Fine’s theorem necessarily obey the CHSH inequality. (We shall return to the idea of contextual classical models of general probabilistic structures in our final Section 6.)
³⁶ See also Beltrametti and Bugajski (1996) for further discussion of the case in which probabilities fail to be induced by a joint probability measure (which they call the ‘Bell phenomenon’).
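The reduction of the CHSH inequality to the three-observable case can be replayed with exact arithmetic. The sketch below is illustrative only: it assumes the correlation E(XY) = 2α − 1 of (66) for any two distinct observables and evaluates the middle expression of (65); the function names are hypothetical.

```python
from fractions import Fraction

def correlation(alpha):
    """E(XY) = 2*alpha - 1 for any two distinct observables, as in (66)."""
    return 2 * Fraction(alpha) - 1

def chsh_special_case(alpha):
    """Middle expression of (65): E(AB) - 1 + E(A'B) + E(A'A)."""
    e = correlation(alpha)
    return e - 1 + e + e

def satisfies_inequality(alpha):
    """True if the special case (65) of the CHSH inequality holds."""
    return -2 <= chsh_special_case(alpha) <= 2

# The classical bound alpha = 1/3 saturates the lower limit of (65).
assert chsh_special_case(Fraction(1, 3)) == -2
# The quantum bound alpha = 1/4 already violates it.
assert not satisfies_inequality(Fraction(1, 4))
```

The upper limit of (65) is saturated at α = 1, so the inequality holds exactly on the interval [1/3, 1].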

The Kochen–Specker theorem instead takes finite sets of projection-
valued observables in a Hilbert space of dimension at least 3, that may
pairwise share a projection (partially compatible observables), and consid-
ers the question of whether values 1 and 0 may be assigned to the projections
in such a way that exactly one projection from each observable is assigned
the value 1. (The theorem was first announced in Specker (1960), and its
proof was published in Kochen and Specker (1967).)

One already knows that making such assignments to all projection-valued observables in a Hilbert space of dimension at least 3 must lead to a contradiction. Indeed, such assignments are simply trivial probability measures over the projections in Hilbert space, and by Gleason’s theorem (which we discussed in Section 3) the most general such probability measures are the quantum mechanical states, which in fact always assign non-trivial probabilities to some observables. By the compactness theorem of first-order logic, one then knows also that there must be a finite set of observables for which such an assignment leads to a contradiction. But Kochen and Specker offer a constructive proof that such a finite set of observables exists (the original proof involved 117 one-dimensional projections in 3 dimensions³⁷).

The case α = 0 in our example can now be seen as a ‘two-dimensional’ analogue of the Kochen–Specker theorem. Indeed, it can be seen as comprised of three interlocking pairs of projections, such that exactly one element in each pair is assigned the value 1, and the other one the value 0.
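That no such value assignment exists for the case α = 0 can be confirmed by brute force over the eight possible assignments. This is a minimal sketch; the labels e, f, g for the three boxes follow the text, while the function name is hypothetical.

```python
from itertools import product

BOXES = ("e", "f", "g")
PAIRS = (("e", "f"), ("f", "g"), ("g", "e"))

def consistent_assignments():
    """All 0/1 assignments in which every pair of boxes holds exactly one gem."""
    found = []
    for values in product((0, 1), repeat=len(BOXES)):
        assignment = dict(zip(BOXES, values))
        if all(assignment[x] + assignment[y] == 1 for x, y in PAIRS):
            found.append(assignment)
    return found

# No assignment of values 1 and 0 satisfies all three pairwise constraints.
assert consistent_assignments() == []
```

The obstruction is a parity one: three pairwise ‘exactly one’ constraints would require the three values to alternate around an odd cycle.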

The analogy goes both ways: any Kochen–Specker construction (of finite
sets of orthonormal bases that cannot be assigned values 1 and 0 in such a
way that exactly one vector in each basis is assigned 1) is equivalent to the
non-existence of trivial probability measures satisfying suitable constraints
(in three dimensions, these are p(x ∨ y|z) = 0 and p(¬x|¬y ∧ ¬z) = 0 for
all orthonormal triples). But the existence of trivial probability measures
satisfying such constraints is in fact equivalent to the existence of non-trivial
probability measures satisfying the same constraints. Thus, indeed, every
Kochen–Specker theorem can be translated into the violation of some Bell–Pitowsky inequality.³⁸
³⁷ More economical proofs are now available even in 3 dimensions, using as few as 31 one-dimensional projections (in an unpublished proof by Conway and Kochen), and in 4 dimensions, using as few as 18 (Cabello, Estebaranz and García Alcaine, 1996). For details and references, see e. g. Bub (1997) and Held (2013).
³⁸ If I am not mistaken, this is the intuitive way of understanding the ‘non-contextuality’ inequalities first introduced in Cabello (2008), and shown to describe any Kochen–Specker contradiction in Badziąg et al. (2008).

6 Is probability empirical (and quantum)?

The title of this section recalls (tongue-in-cheek) the title of the classic paper by Putnam (1968) in which he notoriously argued that quantum mechanics requires a fundamental revision of logic. Empirical considerations alone presumably cannot decide the question of whether logic is an empirical or an a priori discipline (as forcefully pointed out in another classic paper by Dummett (1976)). But if one is already sympathetic to the idea that logic is an empirical discipline, then it does make sense to ask what kind of empirical evidence might suggest adopting this or that logic, and in particular whether the evidence we have for quantum mechanics suggests adopting a non-classical one (e. g. one based on Kochen and Specker’s partial Boolean algebras). Essentially, the question boils down to whether quantum logic should be seen as a derivative construct that is definable in terms of and alongside classical logic, or whether classical logic should be seen as an instance of quantum logic restricted to certain special ‘well-behaved’ cases.³⁹

³⁹ This would be analogous to, say, the application of intuitionistic logic to finitary problems in mathematics, for which also tertium non datur becomes an intuitionistically valid principle. For a recent review, emphasising that the answer might depend rather sensitively on the interpretation of quantum mechanics, see Bacciagaluppi (2009).

A somewhat similar question might be asked with regard to probability. We have seen in Sections 3–4 that quantum mechanics suggests introducing a notion of probabilistic state generalising that of a probability measure, and the non-embeddability result we have derived in Section 5 states that in general the joint probability distributions defined by a state (in particular a quantum mechanical state) for certain pairs of observables cannot be recovered as marginals of a single classical probability measure. In this final section, we shall discuss whether these results should compel us to see classical probabilities as a special case of generalised probabilities (for the case in which all observables are compatible), or whether generalised probability theory could after all be derivative of classical probability.

There is a sense in which the latter question can indeed be trivially answered in the affirmative, by taking seriously the idea that a general probabilistic state should be seen as a family of classical probability measures, but denying that they in fact overlap. For instance, in our master example above (say, with α = 0), this means simply that instead of describing the relevant probabilistic structure using a single state that assigns the probabilities

p(e) = p(f) = p(g) = 1/2 (69)

to the outcomes e, f and g (and the appropriate joint probabilities to pairs), we describe it using three different classical probability measures, which are to be applied respectively to the experiments in which we measure e and f together, or f and g together, or g and e together, and that assign, respectively, the probabilities

p_ef(e) = p_ef(f) = 1/2 , p_fg(f) = p_fg(g) = 1/2 , p_ge(g) = p_ge(e) = 1/2 , (70)

to the single outcomes (and the appropriate joint probabilities to the three pairs). We see now that these three probability measures can be derived from a single classical probability measure if we assign probabilities also to performing each of the experiments ef, fg and ge. This is just the ‘naïve’ argument we rehearsed at the beginning of Section 5, but which is now no longer blocked, because we resist identifying the two events e_ef and e_ge as one (and similarly for f and for g). In this (formal) sense, a ‘contextual hidden variables theory’ is always possible.⁴⁰ (Note, however, that if we then imagine performing the three joint measurements ef, fg and ge in sequence, then the measured value of at least one observable, say e, must be different in the two measurements containing it, in this case ef and ge. Thus we have some mysterious form of ‘disturbance through measurement’, much like in our simple discussion of spin in Section 1.)
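The formal construction of a single classical measure over contexts and outcomes can be sketched as follows for the case α = 0. The sketch is illustrative: the uniform weight 1/3 for each of the experiments ef, fg and ge is an assumption made here (the text only requires that some probabilities be assigned to performing each experiment), and the function names are hypothetical. Events are labelled by their context, so that ‘e in ef’ and ‘e in ge’ are kept distinct.

```python
from fractions import Fraction

CONTEXTS = ("ef", "fg", "ge")

def contextual_measure(context_weights=None):
    """Single classical measure over (context, outcome) pairs for alpha = 0.
    An outcome is (value of first observable, value of second observable)."""
    if context_weights is None:
        # Assumption: each experiment is performed with probability 1/3.
        context_weights = {c: Fraction(1, 3) for c in CONTEXTS}
    measure = {}
    for context, weight in context_weights.items():
        # Perfect anti-correlation: exactly one of the two outcomes is 1.
        for outcome in ((1, 0), (0, 1)):
            measure[(context, outcome)] = weight * Fraction(1, 2)
    return measure

def conditional(measure, context, observable):
    """p_context(observable = 1), obtained by conditionalising on the context."""
    index = context.index(observable)
    total = sum(p for (c, _), p in measure.items() if c == context)
    hits = sum(p for (c, o), p in measure.items()
               if c == context and o[index] == 1)
    return hits / total

m = contextual_measure()
assert sum(m.values()) == 1
# Conditionalising on the context recovers the probabilities in (70).
assert conditional(m, "ef", "e") == conditional(m, "ge", "e") == Fraction(1, 2)
```

Any other strictly positive choice of context weights yields the same conditional probabilities, which is why the construction is always available in the formal sense described above.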

What is crucial here is that instead of insisting that experimental outcomes belonging to the same effect be identified as the same event, we rather insist that experimental outcomes belonging to different experiments are different events. This suggestion should not be too hastily dismissed. Identifying experimental outcomes that are equiprobable in all states might after all be thought of only as a convenient book-keeping device of no fundamental importance. (Even in quantum mechanics, as we pointed out in Section 3, the same effect can correspond to different physical transformations of the state in different experiments, so one could very well argue that these be considered different events.)

In quantum mechanics, these questions are played out in the context of the debate on hidden variables theories (see e. g. Shimony 1984). If effects (or at least projections) correspond directly to physical properties of a quantum system, which are then measured in various ways, then projections in common to different observables (resolutions of the identity) should, indeed, be identified. If instead the properties of the system are some ‘hidden variables’, which in the context of specific experimental arrangements lead to certain experimental outcomes (perhaps with certain probabilities), then projections no longer represent intrinsic properties of the system in general, but only aspects of how systems can be probed in the context of specific experimental situations.

⁴⁰ For an explicit construction in the case of quantum mechanics, see Gudder (1970).

These brief remarks suggest that the question of whether different experi-
mental outcomes ought to be identified should not be decided abstractly, but
rather in relation to specific theoretical commitments. We shall not attempt
a general discussion of this point, nor even an exhaustive one of hidden vari-
ables theories in quantum mechanics. What we shall do instead is illustrate
the point in some concrete implementations of our master example (mainly
for α = 0), which will enable us to see the possibility of underlying mecha-
nisms providing us with a rationale for deciding when different experimental
outcomes should be treated as different events.

The original setting of our case α = 0 (but without the probabilistic structure) is the tale of the Sage of Nineveh from Specker (1960) (my translation):⁴¹

⁴¹ This is the paper that first (informally) introduced the notion of partial Boolean algebras, later developed in detail by Kochen and Specker (1965a,b). The tale can in fact be analysed as an example of a (non-transitive, hence non-quantum) partial Boolean algebra. Specker was a great story-teller, and I personally heard him tell this particular story I think in the spring of 1985.

At the Assyrian school for prophets in Arba’ilu, there taught, in the age of king Asarhaddon, a sage from Nineveh. He was an outstanding representative of his discipline (solar and lunar eclipses), who, except for the heavenly bodies, had thoughts almost only for his daughter. His teaching success was modest; the discipline was seen as dry, and did furthermore require previous mathematical knowledge that was rarely available. If in his teaching he thus failed to gain the interest he would have wanted from the students, he received it overabundantly in a different field: no sooner had his daughter reached the marriageable age, than he was flooded with requests for her hand from students and young graduates. And even though he did not imagine wishing to keep her with him forever, yet she was still far too young, and the suitors in no way worthy of her. And so that each should himself be assured of his unworthiness, he promised her hand to the one who could perform a set prophecy task. The suitor was led in front of a table on which stood three boxes in a row, and urged to say which boxes contained a gem and which were empty. Yet, as many as would try it, it appeared impossible to perform the task. After his prophecy, each suitor was in fact urged by the father to open two boxes that he had named as both empty or as both not empty: it always proved to be that one contained a gem and the other did not, and actually the gem lay now in the first, now in the second of the opened boxes. But how should it be possible, out of three boxes, to name no two as empty or as not empty? Thus indeed the daughter would have remained unmarried until her father’s death, had she not upon the prophecy of a prophet’s son swiftly opened two boxes herself, namely one named as full and one named as empty, which they yet truly turned out to be. Upon the father’s weak protest that he wanted to have two different boxes opened, she tried to open also the third box, which however proved to be impossible, upon which the father, grumbling, let the unfalsified prophecy count as successful.

The question we wish to address is whether opening box A in the context of also opening box B is the same event as opening box A in the context of also opening box C. Given our usual intuitions, i. e. background theoretical assumptions, it would seem that we do have to identify the two events. But we can also imagine the situation as follows.

What the father wants to establish is whether any of the suitors are
better prophets than himself (only then would he willingly surrender his
daughter’s hand in marriage). Whenever a suitor is set the task, the father
predicts which two boxes will be opened, and places exactly one gem at
random in one of the two boxes. (Note that, in this form, the example bears
some analogy to Newcomb’s paradox!) If we now assume that the father
possesses a genuine gift for clairvoyant prophecy, the action of opening boxes
A and B, or of opening A and C, has a retrocausal effect on whether the
father has placed the gem in either A or B, or has placed it in either A or
C.

Now we have an explanation of why to each of the three experimental situations corresponds a different classical probability measure, and there appears to be no longer a motivation for describing the situation using a single probabilistic state irrespective of the experimental situation. (Note that should probabilities be defined also for which two of the boxes will be opened, one could again introduce a single classical measure from which the three probability measures arise through conditionalisation.)

We can imagine a different mechanism (and arrive at the same conclusion) by considering another classic illustration of our example, the so-called ‘firefly box’.⁴² Imagine that the three boxes are in fact chambers at the corners of a single box in the shape of an equilateral triangle, all three chambers being accessible from the centre. Assume that we are in darkness, and hold up a lantern to any one of the sides of the triangle. And assume that what we observe is always that (at random) one chamber on the illuminated side faintly starts to glow. Probabilistically, this is exactly the same example as above. But we can now imagine a different explanation for this phenomenon, as follows.

At the centre of the box sits a firefly, which is attracted to the light of
our lantern, and thus enters at random one of the two chambers on the side
from which we are approaching. And mistaking our lantern for a potential
mating partner, the firefly starts to glow!

We have again a mechanism explaining the statistics of our experiments, and we can give the same classical probabilistic model of the situation as before, i. e. we have three different experiments, each of which is described by a different classical probability measure. And if we so wish, we can again introduce probabilities for our approaching from any particular side.
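The firefly mechanism is easy to simulate. The sketch below is a toy model under stated assumptions: chambers are labelled A, B, C, a side is named by the two chambers at its ends, and the firefly picks one of the two chambers on the approached side with equal probability. The names approach and glow_frequency are hypothetical.

```python
import random

SIDES = {"AB": ("A", "B"), "BC": ("B", "C"), "CA": ("C", "A")}

def approach(side, rng):
    """Hold the lantern to one side; report which chambers on it are glowing."""
    chosen = rng.choice(SIDES[side])  # the firefly enters one chamber at random
    return {chamber: chamber == chosen for chamber in SIDES[side]}

def glow_frequency(side, chamber, trials, seed=0):
    """Relative frequency with which a chamber glows when approaching a side."""
    rng = random.Random(seed)
    hits = sum(approach(side, rng)[chamber] for _ in range(trials))
    return hits / trials

# In every single run exactly one of the two observed chambers glows,
# i.e. the two outcomes on a side are perfectly anti-correlated.
rng = random.Random(1)
for side in SIDES:
    assert sum(approach(side, rng).values()) == 1
```

Each side is described by its own classical measure, and the outcome for, say, chamber A is generated by a different random draw depending on whether we approach side AB or side CA.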

Quantum mechanics provides us with non-classical, non-contextual probabilistic models of various phenomena, and several impossibility theorems show that there fail to be any classical non-contextual probabilistic models reproducing the quantum mechanical statistics (so-called non-contextual hidden variables theories). In these examples instead we see the analogues of various strategies used in quantum mechanics to introduce classical but contextual probabilistic models (so-called contextual hidden variables theories).

⁴² It appears that this illustration was devised by Dave Foulis to explain the work of his group to Eugene Wigner, who happened to be visiting. (Thanks to Alex Wilce for relating the anecdote.)

Indeed, only few retrocausal models of quantum mechanics have been developed in detail, but retrocausality has long been recognised as a possible strategy to deal with the puzzles of quantum mechanics, in particular in the face of the Bell inequalities.⁴³ The firefly model instead more closely resembles a theory like de Broglie and Bohm’s pilot-wave theory, in which experimental outcomes depend on both the initial configuration of the system (e. g. the position of an electron) and the details of the experimental arrangement. This last point can probably best be seen in another slight variant of the example.

Imagine that instead of the firefly we have a small metal ball in the cen-
tre of the box, and that each experiment consists of tilting the box towards
one of the sides, say AB. The ball rolls towards the side AB and bounces
off a metal pin either into chamber A or chamber B, depending on its exact
initial location to the left or the right of the symmetry axis perpendicular to
the side AB. It is now clear that the same initial position of the ball might
lead it to fall or not to fall into, say, chamber A, depending on whether the
whole box is tilted towards the side AB or the side CA (namely if the ball
is on the left of the symmetry axis through AB as well as to the left of
the symmetry axis through CA). Thus, depending on which way the box is
tilted, the ball ending up in A corresponds to a different random variable
on the probability space of initial positions of the ball. If the initial position
of the ball is uniformly distributed in a symmetric neighbourhood of the
centre of the triangle, the equal probabilities of the non-classical state are
reproduced. But if the initial position is not in such an ‘equilibrium’ distribution, deviations from the probabilities in the Lemma can occur, so that if one allows also such ‘disequilibrium’ hidden states, different experimental outcomes are in fact no longer equiprobable in all states.⁴⁴

⁴³ This has been recognised at least since the work of Costa de Beauregard (1977). For a detailed retrocausal model, see Sutherland (2008), and for in-depth discussion of both retrocausality and its possible role in quantum mechanics, see Price (1996).

⁴⁴ As remarked already, in a measurement of spin in pilot-wave theory the same initial position of a particle can lead to a spot on the screen corresponding to spin up or spin down depending on the relative orientation of the polarity and the gradient of the magnetic field that deflects the particle (this is a case of ‘environmental’ contextuality going beyond the ‘algebraic’ one). For the notion of disequilibrium in pilot-wave theory, see e. g. Valentini (2004) and Towler, Russell and Valentini (2012).

Our examples can be easily generalised to include e. g. the original Kochen–Specker example, for which we need a spherical firefly box with 117 subchambers, only three of which are made accessible to the firefly every time we approach the box (depending on how exactly we approach it). Or indeed to include cases with α > 0. For the latter, we need a cubical firefly box, which we approach from any of the six faces (counting opposite faces as equivalent). On each face, the four corners correspond to, say, e ∧ f and ¬e ∧ ¬f across one diagonal, and e ∧ ¬f and ¬e ∧ f across the other, and similarly with f and g, or g and e, on the other faces. The classical cases can be obtained if the firefly just sits somewhere in the box (maybe preferentially along one spatial diagonal, where food might be provided), and starts to glow when it sees the light from our lantern. We then observe the projections of the firefly’s position on the face from which we approach. The non-classical cases can be obtained if the firefly moves towards the side from which we are approaching, and through various obstacles is channelled preferentially (although not always) along the planar diagonal corresponding to the opposite outcomes for that face (say e ∧ ¬f and ¬e ∧ f). We can thus construct classical but contextual models that violate (our special case of) the Bell inequalities, reproducing the quantum violations, or even the non-quantum violations (reducing to the equilateral triangle in the limit).

In conclusion, while the results of Section 5 show that a generalised probabilistic model as introduced in Section 4 cannot always be embedded in a single classical probability space, the examples in this section indicate that it can always be reproduced using a family of classical probability measures indexed by different experimental contexts, if indeed we have a reason to resist the temptation to identify experimental outcomes across different experiments. Identifying experimental outcomes that are equiprobable in all states may be completely natural once a theoretical setting is given; but whether two events are to be judged the same is not a formal question, nor can it be decided purely on operational grounds. Instead it depends on the choice of theoretical setting. In the specific case of quantum probabilities, this question is closely related to the notorious question of the interpretation of quantum mechanics.

Appendix

We give the proofs of the Lemma and the Proposition from Section 5.

Proof of the Lemma:

Note first that if α ≠ 1, then for any pair (x, y) we have p(¬y) ≠ 0 and p(x) ≠ 0. Indeed, if p(¬y) = 0, then p(¬x|¬y) is ill-defined, contrary to assumption, and if p(¬y) ≠ 0 but p(x) = 0, then p(¬x) = 1, and p(¬x|¬y) = 1, also contrary to assumption. Thus we can write

p(x ∧ ¬y) / p(¬y) = p(x|¬y) = 1 − p(¬x|¬y) = 1 − α = 1 − p(y|x) = p(¬y|x) = p(¬y ∧ x) / p(x) . (71)

Since α ≠ 1, also the numerators are non-zero, and we have

p(x) = p(¬y) . (72)

But if

p(e) = p(¬f) , p(f) = p(¬g) , p(g) = p(¬e) , (73)

it follows that

p(e) = p(¬e) = p(f) = p(¬f) = p(g) = p(¬g) = 1/2 . (74)

Finally, by (74) and assumption (55), we have

p(x ∧ y) = p(¬x ∧ ¬y) = α/2 , (75)

and

p(x ∧ ¬y) = p(¬x ∧ y) = (1 − α)/2 (76)

for any (x, y). QED.

Proof of the Proposition:

‘Only if’ implication: Assume p is induced by a joint probability measure (also denoted by p). Then, by repeatedly applying the Lemma to the Boolean algebras generated by {e, f}, {f, g} and {g, e}, we have:

p(e ∧ f ∧ g) + p(e ∧ f ∧ ¬g) = α/2 (77)
p(¬e ∧ ¬f ∧ g) + p(¬e ∧ ¬f ∧ ¬g) = α/2 , (78)

p(e ∧ f ∧ g) + p(¬e ∧ f ∧ g) = α/2 (79)
p(e ∧ ¬f ∧ ¬g) + p(¬e ∧ ¬f ∧ ¬g) = α/2 , (80)

and

p(e ∧ f ∧ g) + p(e ∧ ¬f ∧ g) = α/2 (81)
p(¬e ∧ f ∧ ¬g) + p(¬e ∧ ¬f ∧ ¬g) = α/2 , (82)

respectively. From (77), (79) and (81),

p(e ∧ f ∧ ¬g) = p(¬e ∧ f ∧ g) = p(e ∧ ¬f ∧ g) = α/2 − p(e ∧ f ∧ g) ≤ α/2 , (83)

and from (78), (80) and (82),

p(¬e ∧ ¬f ∧ g) = p(e ∧ ¬f ∧ ¬g) = p(¬e ∧ f ∧ ¬g) = α/2 − p(¬e ∧ ¬f ∧ ¬g) ≤ α/2 . (84)

But now, from (77), (78), (83) and (84),

1 = p(e ∧ f ∧ g) + p(e ∧ f ∧ ¬g) + p(e ∧ ¬f ∧ g) + p(e ∧ ¬f ∧ ¬g) + p(¬e ∧ f ∧ g) + p(¬e ∧ f ∧ ¬g) + p(¬e ∧ ¬f ∧ g) + p(¬e ∧ ¬f ∧ ¬g) ≤ 3α . (85)

Thus α ≥ 1/3. QED.
‘If’ implication: let 1/3 ≤ α < 1 (note that this construction works also with α = 1), and let

a := (3α − 1)/4 and b := (1 − α)/4 . (86)

We have a, b ∈ [0, 1]. Set

p(e ∧ f ∧ g) = p(¬e ∧ ¬f ∧ ¬g) = a (87)

and

p(e ∧ f ∧ ¬g) = p(e ∧ ¬f ∧ g) = p(e ∧ ¬f ∧ ¬g) = p(¬e ∧ f ∧ g) = p(¬e ∧ f ∧ ¬g) = p(¬e ∧ ¬f ∧ g) = b . (88)

Then the probability measure p induces a state satisfying both (74) (because a + 3b = 1/2) and (77)–(82) (because a + b = α/2). Thus the state satisfies (55). QED.

Note that this is not the unique probability measure inducing the given
state. As in the case α = 1, one need not have p(e∧f ∧g) = p(¬e∧¬f ∧¬g).
Indeed, for any ε ∈ [−min((3α − 1)/4, (1 − α)/4), min((3α − 1)/4, (1 − α)/4)], one can set
p(e ∧ f ∧ g) = (3α − 1)/4 + ε (89)
and
p(¬e ∧ ¬f ∧ ¬g) = (3α − 1)/4 − ε , (90)
and extend via (83)–(84) to a probability measure inducing the same state.
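
The one-parameter family can be checked in the same way. The sketch below is illustrative only. It builds the measure from (89)–(90), extends it to the remaining six atoms via (83)–(84), and compares the induced marginals and pairwise probabilities with those of the symmetric case ε = 0. The names `perturbed_measure` and `induced_state`, and the sample values of α and ε, are assumptions made for this example.

```python
from fractions import Fraction
from itertools import product


def perturbed_measure(alpha, eps):
    """Measure given by (89)-(90), extended to all atoms via (83)-(84)."""
    a = (3 * alpha - 1) / 4
    b = (1 - alpha) / 4
    if abs(eps) > min(a, b):
        raise ValueError("eps lies outside the admissible interval")
    weights = {}
    for atom in product((True, False), repeat=3):
        n_true = sum(atom)
        if n_true == 3:
            weights[atom] = a + eps                 # (89)
        elif n_true == 0:
            weights[atom] = a - eps                 # (90)
        elif n_true == 2:
            weights[atom] = alpha / 2 - (a + eps)   # (83)
        else:
            weights[atom] = alpha / 2 - (a - eps)   # (84)
    return weights


def induced_state(p):
    """Marginals and pairwise joint probabilities induced by p."""
    def prob(event):
        return sum(w for atom, w in p.items() if event(atom))

    marginals = tuple(prob(lambda atom, i=i: atom[i]) for i in range(3))
    pairwise = tuple(
        prob(lambda atom, i=i, j=j, x=x, y=y: atom[i] == x and atom[j] == y)
        for i, j in ((0, 1), (1, 2), (2, 0))
        for x, y in product((True, False), repeat=2)
    )
    return marginals, pairwise


alpha = Fraction(2, 3)
symmetric = perturbed_measure(alpha, Fraction(0))
shifted = perturbed_measure(alpha, Fraction(1, 12))
print(induced_state(symmetric) == induced_state(shifted))
```

For α = 2/3 the admissible interval is [−1/12, 1/12], so ε = 1/12 is the extreme case in which the three atoms of (83) receive weight zero.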

We see that we can construct a classical model of the given probabilistic
state if and only if 1/3 ≤ α ≤ 1. In the case of the non-unique states with
α = 1, this is given uniquely by (57)–(58), and in the case of the unique
states with α < 1, it is given non-uniquely by (89)–(90) and (83)–(84).

Acknowledgements

I would like to thank Alan Hájek and Chris Hitchcock for their invitation to
contribute a version of this article to the Oxford Handbook of Probability and
Philosophy and the opportunity to write on this topic, and for very helpful
feedback on a previous draft. I am further grateful to Alex Wilce for some
extremely useful discussions and hard-to-find references, to Jennifer Bailey for
some stylistic advice, and to the audience of the Philosophy of Physics seminar
at the University of Aberdeen, who heard preliminary versions of this material.

References

Albert, D. (1992), Quantum Mechanics and Experience (Cambridge, Mass.: Har-


vard University Press).

Bacciagaluppi, G. (2009), ‘Is Logic Empirical?’, in D. Gabbay, D. Lehmann and K.


Engesser (eds.), Handbook of Quantum Logic (Amsterdam: Elsevier), pp. 49–78,
http://philsci-archive.pitt.edu/3380/.

Bacciagaluppi, G. (2013), ‘Measurement and Classical Regime in Quantum Me-


chanics’, in R. Batterman (ed.), Oxford Handbook of Philosophy of Physics (Oxford:
OUP), pp. 416–459 http://philsci-archive.pitt.edu/8770/.

Bacciagaluppi, G., and Wilce, A. (in preparation), ‘Specker’s Principle and (Or-
tho)coherence’.

43
Badzia̧g, P., Bengtsson, I., Cabello, A., and Pitowsky, I. (2008), ‘Universality of
State-independent Violation of Correlation Inequalities for Noncontextual Theo-
ries’, Physical Review Letters 101, 210401, http://arxiv.org/abs/0809.0430.

Bell, J. S. (1971), ‘Introduction to the Hidden-Variable Question’, Proceedings of the


International School of Physics ‘Enrico Fermi’, Course IL, Foundations of Quan-
tum Mechanics (New York: Academic), pp. 171–181. Reprinted in J. S. Bell,
Speakable and Unspeakable in Quantum Mechanics (Cambridge: CUP), pp. 29–39.

Beltrametti, E. G., and Bugajski, S. (1996), ‘The Bell Phenomenon in Classical


Frameworks’, Journal of Physics A: Mathematical and General 29(2), 247–261.

Beltrametti, E. G., and Cassinelli, G. (1981), The Logic of Quantum Mechanics


(Reading, Mass.: Addison-Wesley).

Birkhoff, G., and von Neumann, J. (1936), ‘The Logic of Quantum Mechanics’,
Annals of Mathematics 37, 823–843. Reprinted in Hooker (1975), pp. 1–26.

Boole, G. (1862), ‘On the Theory of Probabilities’, Philosophical Transactions of


the Royal Society of London 152, 225–252.

Bub, J. (1997), Interpreting the Quantum World (Cambridge: CUP).

Busch, P. (2003), ‘Quantum States and Generalized Observables: A Simple Proof


of Gleason’s Theorem’, Physical Review Letters 91(12), 120403, http://arxiv.
org/abs/quant-ph/9909073.

Busch, P., Lahti, P. J. and Mittelstaedt, P. (1991), The Quantum Theory of Mea-
surement (Berlin: Springer).

Busch, P., Grabowski, M., and Lahti, P. J. (1995), Operational Quantum Physics
(Berlin: Springer).

Cabello, A. (2008), ‘Experimentally Testable State-Independent Quantum Contex-


tuality’, Physical Review Letters 101, 210401, http://arxiv.org/abs/0808.2456.

Cabello, A. (2012), ‘Specker’s Fundamental Principle of Quantum Mechanics’,


http://arxiv.org/abs/1212.1756.

Cabello, A. (2013), ‘Simple Explanation of the Quantum Violation of a Funda-


mental Inequality’, Physical Review Letters 110, 060402, http://arxiv.org/abs/
1210.2988.

Cabello, A., Estebaranz, J. M., and Garcı́a Alcaine, G. (1996), ‘Bell–Kochen–


Specker Theorem: A Proof with 18 Vectors’, Physics Letters A 212(4), 183–187,
http://arxiv.org/abs/quant-ph/9706009.

44
Cabello, A., Severini, S., and Winter, A. (2010), ‘(Non-)Contextuality of Physical
Theories as an Axiom’, http://arxiv.org/abs/1010.2163.

Cattaneo, G., Marsico, T., Nisticò, G., and Bacciagaluppi, G. (1997) ‘A Concrete
Procedure for Obtaining Sharp Reconstructions of Unsharp Observables in Finite-
Dimensional Quantum Mechanics’, Foundations of Physics 27, 1323–1343.

Cerf, N. J., Gisin, N., Massar, S., and Popescu, S. (2005), ‘Simulating Maximal
Quantum Entanglement Without Communication’, Physical Review Letters 94(22),
220403.

Chiribella, G., D’Ariano, G. M., and Perinotti, P. (2011), ‘Informational Derivation


of Quantum Theory’, Physical Review A 84, 012311, http://arxiv.org/abs/
1011.6451.

Clauser, J. F., Horne, M. A., Shimony, A., and Holt, R. A. (1969), ‘Proposed Ex-
periment to Test Local Hidden-Variable Theories’ Physical Review Letters 23(15),
880–884.

Coecke, B., Moore, D. and Wilce, A. (eds.) (2000), Current Research in Operational
Quantum Logic (Dordrecht: Kluwer).

Coecke, B., Moore, D. and Wilce, A. (2001), ‘Operational Quantum Logic: An


Overview’, http://arxiv.org/abs/quant-ph/0008019.

Comte, C. (1996), ‘Symmetry, Relativity and Quantum Mechanics’, Il Nuovo Ci-


mento B 111(8), 937–956.

Costa de Beauregard, O. (1977), ‘Time Symmetry and the Einstein Paradox’, Il


Nuovo Cimento B 42(1), 41–64.

Dakić, B., and Brukner, Č. (2011), ‘Quantum Theory and Beyond: Is Entanglement
Special?’, in H. Halvorson (ed.), Deep Beauty: Understanding the Quantum World
through Mathematical Innovation (Cambridge: CUP), pp. 365–392, http://arxiv.
org/abs/0911.0695.

Darrigol, O. (forthcoming), Physics and Necessity: Rationalist Pursuits from the


Cartesian Past to the Quantum Present (Oxford: OUP).

Dummett, M. (1976), ‘Is logic empirical?’, in H. D. Lewis (ed.), Contemporary


British Philosophy, 4th series (London: Allen and Unwin), pp. 45–68. Reprinted in
M. Dummett, Truth and other Enigmas (London: Duckworth, 1978), pp. 269–289.

Fine, A. (1982), ‘Hidden Variables, Joint Probability and Bell Inequalities’, Physical
Review Letters 48, 291–295.

45
Foulis, D. J., and Bennett, M. K. (1994), ‘Effect Algebras and Unsharp Quantum
Logics’, Foundations of Physics 24, 1331–1352.

Foulis, D. J., and Randall, C. H. (1972), ‘Operational Statistics. I. Basic Concepts’,


Journal of Mathematical Physics 13, 1667–1675.

Foulis, D. J., and Randall, C. H. (1974), ‘Empirical Logic and Quantum Mechanics’,
Synthese 29, 81–111.

Foulis, D. J., and Randall, C. H. (1979), ‘Tensor Products of Quantum Logics do


not Exist’, Notices of the American Mathematical Society 26, 557.

Foulis, D. J., and Randall, C. H. (1981), ‘Empirical Logic and Tensor Products’, in
H. Neumann (ed.), Interpretations and Foundations of Quantum Theory (Mannheim:
Bibliographisches Institut), pp. 9–20.

Ghirardi, G.C. (1997), Un’occhiata alle carte di Dio (Milano: Il Saggiatore). Transl.
by G. Malsbary as Sneaking a Look at God’s Cards (Princeton: Princeton University
Press, 2005).

Giuntini, R., and Greuling, H. (1989), ‘Toward a Formal Language for Unsharp
Properties’, Foundations of Physics 19, 931–945.

Gleason, A. M. (1957), ‘Measures on the Closed Subspaces of a Hilbert Space’,


Journal of Mathematics and Mechanics 6, 885–893. Reprinted in Hooker (1975),
pp. 123–133.

Goyal, P. (2008a), ‘An Information-Geometric Reconstruction of Quantum Theory,


I: The Abstract Quantum Formalism’, Physical Review A 78, 052120, http://
arxiv.org/abs/0805.2761.

Goyal, P. (2008b), ‘An Information-Geometric Reconstruction of Quantum Theory,


II: The Correspondence Rules of Quantum Theory’, http://arxiv.org/abs/0805.
2765.

Gudder, S. P. (1970), ‘On Hidden-Variable Theories’, Journal of Mathematical


Physics 11, 431–436.

Hardegree, G. M., and Frazer, P. J. (1981), ‘Charting the Labyrinth of Quantum


Logics: A Progress Report’, in E. G. Beltrametti and B. C. van Fraassen (eds.),
Current Issues in Quantum Logic (New York: Plenum), pp. 53–76.

Hardy, L. (2002), ‘Why Quantum Theory?’, in T. Placek and J. Butterfield (eds.),


Non-locality and Modality, NATO Science Series II, Vol. 64 (Dordrecht: Kluwer),
pp. 61–73, http://arxiv.org/abs/quant-ph/0111068.

46
Held, C. (2013), ‘The Kochen–Specker Theorem’, in E. N. Zalta (ed.), The Stanford
Encyclopedia of Philosophy (Spring 2013 Edition), http://plato.stanford.edu/
archives/spr2013/entries/kochen-specker/.

Hooker, C. A. (1975), The Logico-Algebraic Approach to Quantum Mechanics, Vol. 1


(Dordrecht: Reidel).

Hughes, R. I. G. (1985), ‘Review of S. Kochen and E. P. Specker, “Logical Structures


Arising in Quantum Theory”, etc.’, Journal of Symbolic Logic 50(2), 558–566.

Kochen, S., and Specker, E. P. (1965a), ‘Logical Structures Arising in Quantum


Theory’, in L. Addison, L. Henkin and A. Tarski (eds.), The Theory of Models
(Amsterdam: North-Holland), pp. 177–189. Reprinted in Hooker (1975), pp. 263–
276.

Kochen, S., and Specker, E. P. (1965b), ‘The Calculus of Partial Propositional


Functions’, in Y. Bar-Hillel (ed.), Logic, Methodology, and Philosophy of Science
(Amsterdam: North-Holland), pp. 45–57. Reprinted in Hooker (1975), pp. 277–292.

Kochen, S., and Specker, E. P. (1967), ‘The Problem of Hidden Variables in Quan-
tum Mechanics’, Journal of Mathematics and Mechanics 17, 59–88. Reprinted in
Hooker (1975), pp. 293–328.

Kôpka, F. (1992), ‘D-posets of Fuzzy Sets’, Tatra Mountains Mathematical Publi-


cations 1, 83–87.

Ludwig, G. (1954), Die Grundlagen der Quantenmechanik (Berlin: Springer). Transl.


by C. A. Hein as Foundations of Quantum Mechanics (Berlin: Springer, 1983).

Ludwig, G. (1985), An Axiomatic Basis of Quantum Mechanics. 1. Derivation of


Hilbert Space (Berlin: Springer).

Mackey, G. W. (1957), ‘Quantum Mechanics and Hilbert Space’, American Math-


ematical Monthly 64(2), 45–57.

Mackey, G. W. (1963), The Mathematical Foundations of Quantum Mechanics


(New York: W. A. Benjamin).

Marlow, A. R. (ed.) (1978), Mathematical Foundations of Quantum Physics (New


York: Academic).

Masanes, L., and Müller, M. (2011), ‘A Derivation of Quantum Theory from Phys-
ical Requirements’, New Journal of Physics 13, 063001, http://arxiv.org/abs/
1004.1483.

Pitowsky, I. (1989a), ‘From George Boole to John Bell: The Origins of Bell’s In-

47
equality’, in M. Kafatos (ed.), Bell’s Theorem, Quantum Theory and the Concep-
tions of the Universe (Dordrecht: Kluwer), pp. 37–49.

Pitowsky, I. (1989b), Quantum Probability, Quantum Logic, Lecture Notes in Physics,


Vol. 321 (Berlin: Springer).

Pitowsky, I. (1991), ‘Correlation Polytopes, their Geometry and Complexity’, Math-


ematical Programming A 50, 395–414.

Pitowsky, I. (1994), ‘George Boole’s “Conditions of Possible Experience” and the


Quantum Puzzle’, The British Journal for the Philosophy of Science 45(1), 95–125.

Price, H. (1996), Time’s Arrow and Archimedes’ Point: New Directions for the
Physics of Time (Oxford: OUP).

Putnam, H. (1968), ‘Is Logic Empirical?’ in R. Cohen and M. Wartofsky (eds.),


Boston Studies in the Philosophy of Science, Vol. 5 (Dordrecht: Reidel), pp. 216–
241. Reprinted as ‘The Logic of Quantum Mechanics’ in H. Putnam, Mathemat-
ics, Matter, and Method. Philosophical Papers, Vol. 1 (Cambridge: CUP, 1975),
pp. 174–197.

Randall, C. H., and Foulis, D. J. (1970), ‘An Approach to Empirical Logic’, Amer-
ican Mathematical Monthly 77, 363–374.

Randall, C. H., and Foulis, D. J. (1973), ‘Operational Statistics. II. Manuals of


Operations and their Logics’, Journal of Mathematical Physics 14, 1472–1480.

Shimony, A. (1984), ‘Contextual Hidden Variables Theories and Bell’s Inequalities’,


The British Journal for the Philosophy of Science 35(1), 25–45.

Specker, E. P. (1960), ‘Die Logik nicht gleichzeitig entscheidbarer Aussagen’, Di-


alectica 14, 239–246. Transl. by A. Stairs as ‘The Logic of Propositions which are
not Simultaneously Decidable’ in Hooker (1975), pp 135–140, and by M. P. Seevinck
as ‘The Logic of Non-simultaneously Decidable Propositions’, http://arxiv.org/
abs/1103.4537.

Sutherland, R. I. (2008), ‘Causally Symmetric Bohm Model’, Studies in History and


Philosophy of Modern Physics 39(4), 782–805, http://arxiv.org/abs/quant-ph/
0601095.

Towler, M. D., Russell, N. J., and Valentini, A. (2012), ‘Timescales for Dynamical
Relaxation to the Born Rule’, Proceedings of the Royal Society A: Mathematical,
Physical and Engineering Science 468(2140), 990-1013, http://arxiv.org/abs/
1103.1589.

Valentini, A. (2004), ‘Universal Signature of Non-quantum Systems’, Physics Let-

48
ters A 332(3), 187–193, http://arxiv.org/abs/quant-ph/0309107.

Varadarajan, V. S. (1968), The Geometry of Quantum Theory, Vols. I–II (New


York: Van Nostrand).

Wallace, D. (2008), ‘Philosophy of Quantum Mechanics’, in D. Rickles (ed.), The


Ashgate Companion to Contemporary Philosophy of Physics (Aldershot: Ashgate),
pp. 16–98. (Preliminary version available as ‘The Quantum Measurement Problem:
State of Play (December 2007)’, http://philsci-archive.pitt.edu/3420/.)

Wilce, A. (2000), ‘Test Spaces and Orthoalgebras’, in Coecke, Moore and Wilce
(2000), pp. 81–114.

Wilce, A. (2012), ‘Quantum Logic and Probability Theory’, in E. N. Zalta (ed.), The
Stanford Encyclopedia of Philosophy (Fall 2012 Edition), http://plato.stanford.
edu/archives/fall2012/entries/qt-quantlog/.

49
