Lecture Notes For Physics 229: Quantum Information and Computation
can be implemented in an ion trap with altogether 5 laser pulses. The conditional excitation of a phonon, Eq. (1.35), has been demonstrated experimentally, for a single trapped ion, by the NIST group.
One big drawback of the ion trap computer is that it is an intrinsically slow device. Its speed is ultimately limited by the energy-time uncertainty relation. Since the uncertainty in the energy of the laser photons should be small compared to the characteristic vibrational splitting ν, each laser pulse should last a time long compared to ν⁻¹. In practice, ν is likely to be of order 100 kHz.
1.9.3 NMR
A third (dark horse) hardware scheme has sprung up in the past year, and has leapfrogged over the ion trap and cavity QED to take the current lead in coherent quantum processing. The new scheme uses nuclear magnetic resonance (NMR) technology. Now qubits are carried by certain nuclear spins in a particular molecule. Each spin can either be aligned (|↑⟩ = |0⟩) or antialigned (|↓⟩ = |1⟩) with an applied constant magnetic field. The spins take a long time to relax or decohere, so the qubits can be stored for a reasonable time.
We can also turn on a pulsed rotating magnetic field with frequency ω (where ω is the energy splitting between the spin-up and spin-down states), and induce Rabi oscillations of the spin. By timing the pulse suitably, we can perform a desired unitary transformation on a single spin (just as in our discussion of the ion trap). All the spins in the molecule are exposed to the rotating magnetic field, but only those on resonance respond.
Furthermore, the spins have dipole-dipole interactions, and this coupling can be exploited to perform a gate. The splitting between |↑⟩ and |↓⟩ for one spin actually depends on the state of neighboring spins. So whether a driving pulse is on resonance to tip the spin over is conditioned on the state of another spin.
All this has been known to chemists for decades. Yet it was only in the past year that Gershenfeld and Chuang, and independently Cory, Fahmy, and Havel, pointed out that NMR provides a useful implementation of quantum computation. This was not obvious for several reasons. Most importantly, NMR systems are very hot. The typical temperature of the spins (room temperature, say) might be of order a million times larger than the energy splitting between |0⟩ and |1⟩. This means that the quantum state of our computer (the spins in a single molecule) is very noisy; it is subject to strong random thermal fluctuations. This noise will disguise the quantum information. Furthermore, we actually perform our processing not on a single molecule, but on a macroscopic sample containing of order 10^23 "computers," and the signal we read out of this device is actually averaged over this ensemble. But quantum algorithms are probabilistic, because of the randomness of quantum measurement. Hence averaging over the ensemble is not equivalent to running the computation on a single device; averaging may obscure the results.
Gershenfeld and Chuang and Cory, Fahmy, and Havel, explained how to overcome these difficulties. They described how "effective pure states" can be prepared, manipulated, and monitored by performing suitable operations on the thermal ensemble. The idea is to arrange for the fluctuating properties of the molecule to average out when the signal is detected, so that only the underlying coherent properties are measured. They also pointed out that some quantum algorithms (including Shor's factoring algorithm) can be cast in a deterministic form (so that at least a large fraction of the computers give the same answer); then averaging over many computations will not spoil the result.
Quite recently, NMR methods have been used to prepare a maximally
entangled state of three qubits, which had never been achieved before.
Clearly, quantum computing hardware is in its infancy. Existing hardware
will need to be scaled up by many orders of magnitude (both in the number of
stored qubits, and the number of gates that can be applied) before ambitious
computations can be attempted. In the case of the NMR method, there is
a particularly serious limitation that arises as a matter of principle, because
the ratio of the coherent signal to the background declines exponentially with
the number of spins per molecule. In practice, it will be very challenging to
perform an NMR quantum computation with more than of order 10 qubits.
Probably, if quantum computers are eventually to become practical devices, new ideas about how to construct quantum hardware will be needed.
1.10 Summary
This concludes our introductory overview to quantum computation. We
have seen that three converging factors have combined to make this subject
exciting.
1. Quantum computers can solve hard problems. It seems that a new classification of complexity has been erected, a classification better founded on the fundamental laws of physics than traditional complexity theory. (But it remains to characterize more precisely the class of problems for which quantum computers have a big advantage over classical computers.)
2. Quantum errors can be corrected. With suitable coding methods, we can protect a complicated quantum system from the debilitating effects of decoherence. We may never see an actual cat that is half dead and half alive, but perhaps we can prepare and preserve an encoded cat that is half dead and half alive.
3. Quantum hardware can be constructed. We are privileged to be
witnessing the dawn of the age of coherent manipulation of quantum
information in the laboratory.
Our aim, in this course, will be to deepen our understanding of points
(1), (2), and (3).
Chapter 2
Foundations I: States and
Ensembles
2.1 Axioms of quantum mechanics
For a few lectures I have been talking about quantum this and that, but I have never defined what quantum theory is. It is time to correct that omission.
Quantum theory is a mathematical model of the physical world. To characterize the model, we need to specify how it will represent: states, observables, measurements, dynamics.
1. States. A state is a complete description of a physical system. In quantum mechanics, a state is a ray in a Hilbert space.
What is a Hilbert space?
a) It is a vector space over the complex numbers C. Vectors will be denoted |ψ⟩ (Dirac's ket notation).
b) It has an inner product ⟨ψ|φ⟩ that maps an ordered pair of vectors to C, defined by the properties
(i) Positivity: ⟨ψ|ψ⟩ > 0 for |ψ⟩ ≠ 0
(ii) Linearity: ⟨φ|(a|ψ₁⟩ + b|ψ₂⟩) = a⟨φ|ψ₁⟩ + b⟨φ|ψ₂⟩
(iii) Skew symmetry: ⟨φ|ψ⟩ = ⟨ψ|φ⟩*
c) It is complete in the norm ‖ψ‖ = ⟨ψ|ψ⟩^(1/2)
(Completeness is an important proviso in infinite-dimensional function spaces, since it will ensure the convergence of certain eigenfunction expansions; e.g., Fourier analysis. But mostly we'll be content to work with finite-dimensional inner product spaces.)
What is a ray? It is an equivalence class of vectors that differ by multiplication by a nonzero complex scalar. We can choose a representative of this class (for any nonvanishing vector) to have unit norm
⟨ψ|ψ⟩ = 1. (2.1)
We will also say that |ψ⟩ and e^(iα)|ψ⟩ describe the same physical state, where |e^(iα)| = 1.
(Note that every ray corresponds to a possible state, so that given two states |φ⟩, |ψ⟩, we can form another as a|φ⟩ + b|ψ⟩ (the "superposition principle"). The relative phase in this superposition is physically significant; we identify a|φ⟩ + b|ψ⟩ with e^(iα)(a|φ⟩ + b|ψ⟩) but not with a|φ⟩ + e^(iα)b|ψ⟩.)
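The distinction between the (unobservable) overall phase and the (observable) relative phase is easy to check numerically. A minimal sketch, assuming numpy; the particular amplitudes, phase, and measurement basis below are arbitrary choices for illustration:

```python
import numpy as np

# Basis states and a superposition a|0> + b|1>
ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
a, b = 1/np.sqrt(2), 1/np.sqrt(2)

psi = a*ket0 + b*ket1                            # a|0> + b|1>
psi_overall = np.exp(1j*0.7) * psi               # e^{i alpha}(a|0> + b|1>)
psi_relative = a*ket0 + np.exp(1j*0.7)*b*ket1    # a|0> + e^{i alpha} b|1>

# Outcome probability in a rotated basis, |+> = (|0> + |1>)/sqrt(2)
plus = (ket0 + ket1)/np.sqrt(2)
p = abs(np.vdot(plus, psi))**2
p_overall = abs(np.vdot(plus, psi_overall))**2
p_relative = abs(np.vdot(plus, psi_relative))**2

# An overall phase changes no probability ...
assert np.isclose(p, p_overall)
# ... but a relative phase changes measurable probabilities.
assert abs(p - p_relative) > 1e-3
```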
2. Observables. An observable is a property of a physical system that in principle can be measured. In quantum mechanics, an observable is a self-adjoint operator. An operator is a linear map taking vectors to vectors
A : |ψ⟩ → A|ψ⟩ ,  A(a|ψ⟩ + b|φ⟩) = aA|ψ⟩ + bA|φ⟩. (2.2)
The adjoint of the operator A is defined by
⟨φ|Aψ⟩ = ⟨A†φ|ψ⟩, (2.3)
for all vectors |φ⟩, |ψ⟩ (where here I have denoted A|ψ⟩ as |Aψ⟩). A is self-adjoint if A = A†.
If A and B are self-adjoint, then so is A + B (because (A + B)† = A† + B†), but (AB)† = B†A†, so AB is self-adjoint only if A and B commute. Note that AB + BA and i(AB − BA) are always self-adjoint if A and B are.
A self-adjoint operator in a Hilbert space H has a spectral representation; its eigenstates form a complete orthonormal basis in H. We can express a self-adjoint operator A as
A = Σ_n a_n P_n . (2.4)
Here each a_n is an eigenvalue of A, and P_n is the corresponding orthogonal projection onto the space of eigenvectors with eigenvalue a_n. (If a_n is nondegenerate, then P_n = |n⟩⟨n|; it is the projection onto the corresponding eigenvector.) The P_n's satisfy
P_n P_m = δ_{n,m} P_n ,
P_n† = P_n . (2.5)
(For unbounded operators in an infinite-dimensional space, the definition of self-adjoint and the statement of the spectral theorem are more subtle, but this need not concern us.)
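In finite dimensions the spectral representation eq. (2.4) and the projector algebra eq. (2.5) can be verified with a concrete matrix. A sketch assuming numpy; the particular self-adjoint matrix is an arbitrary choice:

```python
import numpy as np

# An arbitrary 3x3 self-adjoint matrix
A = np.array([[2, 1j, 0],
              [-1j, 2, 0],
              [0, 0, 5]], dtype=complex)
assert np.allclose(A, A.conj().T)

evals, evecs = np.linalg.eigh(A)

# Projectors P_n = |n><n| (the eigenvalues here happen to be
# nondegenerate, so each projector is rank one)
projs = [np.outer(evecs[:, n], evecs[:, n].conj()) for n in range(3)]

# Spectral representation: A = sum_n a_n P_n
assert np.allclose(A, sum(a * P for a, P in zip(evals, projs)))

# P_n P_m = delta_{nm} P_n  and  P_n^dagger = P_n
for n, Pn in enumerate(projs):
    assert np.allclose(Pn, Pn.conj().T)
    for m, Pm in enumerate(projs):
        expected = Pn if n == m else np.zeros((3, 3))
        assert np.allclose(Pn @ Pm, expected)
```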
3. Measurement. In quantum mechanics, the numerical outcome of a measurement of the observable A is an eigenvalue of A; right after the measurement, the quantum state is an eigenstate of A with the measured eigenvalue. If the quantum state just prior to the measurement is |ψ⟩, then the outcome a_n is obtained with probability
Prob(a_n) = ‖P_n|ψ⟩‖² = ⟨ψ|P_n|ψ⟩. (2.6)
If the outcome a_n is attained, then the (normalized) quantum state becomes
P_n|ψ⟩ / (⟨ψ|P_n|ψ⟩)^(1/2). (2.7)
(Note that if the measurement is immediately repeated, then according to this rule the same outcome is attained again, with probability one.)
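The measurement rule, eqs. (2.6) and (2.7), is straightforward to simulate. A sketch assuming numpy; the state and observable (σ₃ on a qubit) are arbitrary choices for illustration:

```python
import numpy as np

# Projectors onto the eigenspaces of sigma_3 (eigenvalues +1, -1)
P_up = np.array([[1, 0], [0, 0]], dtype=complex)    # onto |0>
P_down = np.array([[0, 0], [0, 1]], dtype=complex)  # onto |1>

# State a|0> + b|1>
psi = np.array([0.6, 0.8j], dtype=complex)

# Prob(a_n) = <psi|P_n|psi>, eq. (2.6)
p_up = np.vdot(psi, P_up @ psi).real
p_down = np.vdot(psi, P_down @ psi).real
assert np.isclose(p_up + p_down, 1.0)
assert np.isclose(p_up, 0.36)

# Post-measurement state P_n|psi>/<psi|P_n|psi>^{1/2}, eq. (2.7)
psi_after = P_up @ psi / np.sqrt(p_up)

# Repeating the measurement gives the same outcome with probability one
assert np.isclose(np.vdot(psi_after, P_up @ psi_after).real, 1.0)
```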
4. Dynamics. Time evolution of a quantum state is unitary; it is generated by a self-adjoint operator, called the Hamiltonian of the system. In the Schrodinger picture of dynamics, the vector describing the system moves in time as governed by the Schrodinger equation
(d/dt)|ψ(t)⟩ = −iH|ψ(t)⟩, (2.8)
where H is the Hamiltonian. We may reexpress this equation, to first order in the infinitesimal quantity dt, as
|ψ(t + dt)⟩ = (1 − iH dt)|ψ(t)⟩. (2.9)
The operator U(dt) ≡ 1 − iH dt is unitary; because H is self-adjoint it satisfies U†U = 1 to linear order in dt. Since a product of unitary operators is unitary, time evolution over a finite interval is also unitary
|ψ(t)⟩ = U(t)|ψ(0)⟩. (2.10)
In the case where H is t-independent, we may write U = e^(−itH).
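For a t-independent H, the evolution operator U = e^(−itH) can be built from the spectral decomposition of H. A sketch assuming numpy; the Hamiltonian and initial state are arbitrary choices:

```python
import numpy as np

def evolution_operator(H, t):
    """U(t) = exp(-i t H) for self-adjoint H, via eigendecomposition."""
    evals, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(-1j * t * evals)) @ V.conj().T

# An arbitrary 2x2 self-adjoint Hamiltonian
H = np.array([[1.0, 0.5], [0.5, -1.0]])
U = evolution_operator(H, t=2.3)

# U is unitary: U^dagger U = 1
assert np.allclose(U.conj().T @ U, np.eye(2))

# Evolution preserves the norm (total probability)
psi0 = np.array([0.6, 0.8], dtype=complex)
psi_t = U @ psi0
assert np.isclose(np.linalg.norm(psi_t), 1.0)

# Infinitesimal form, eq. (2.9): U(dt) = 1 - i H dt to linear order
dt = 1e-6
assert np.allclose(evolution_operator(H, dt), np.eye(2) - 1j*H*dt, atol=1e-10)
```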
This completes the mathematical formulation of quantum mechanics. We immediately notice some curious features. One oddity is that the Schrodinger equation is linear, while we are accustomed to nonlinear dynamical equations in classical physics. This property seems to beg for an explanation. But far more curious is the mysterious dualism; there are two quite distinct ways for a quantum state to change. On the one hand there is unitary evolution, which is deterministic. If we specify |ψ(0)⟩, the theory predicts the state |ψ(t)⟩ at a later time.
But on the other hand there is measurement, which is probabilistic. The theory does not make definite predictions about the measurement outcomes; it only assigns probabilities to the various alternatives. This is troubling, because it is unclear why the measurement process should be governed by different physical laws than other processes.
Beginning students of quantum mechanics, when first exposed to these rules, are often told not to ask "why?" There is much wisdom in this advice. But I believe that it can be useful to ask why. In future lectures, we will return to this disconcerting dualism between unitary evolution and measurement, and will seek a resolution.
2.2.1 Spin-1/2
First of all, the coefficients a and b in eq. (2.11) encode more than just the probabilities of the outcomes of a measurement in the {|0⟩, |1⟩} basis. In particular, the relative phase of a and b also has physical significance.
For a physicist, it is natural to interpret eq. (2.11) as the spin state of an object with spin-1/2 (like an electron). Then |0⟩ and |1⟩ are the spin up (|↑⟩) and spin down (|↓⟩) states along a particular axis such as the z-axis. The two real numbers characterizing the qubit (the complex numbers a and b, modulo the normalization and overall phase) describe the orientation of the spin in three-dimensional space (the polar angle θ and the azimuthal angle φ).
We cannot go deeply here into the theory of symmetry in quantum mechanics, but we will briefly recall some elements of the theory that will prove useful to us. A symmetry is a transformation that acts on a state of a system, yet leaves all observable properties of the system unchanged. In quantum mechanics, observations are measurements of self-adjoint operators. If A is measured in the state |ψ⟩, then the outcome |a⟩ (an eigenvector of A) occurs with probability |⟨a|ψ⟩|². A symmetry should leave these probabilities unchanged (when we "rotate" both the system and the apparatus).
A symmetry, then, is a mapping of vectors in Hilbert space
|ψ⟩ → |ψ′⟩, (2.12)
that preserves the absolute values of inner products
|⟨φ|ψ⟩| = |⟨φ′|ψ′⟩|, (2.13)
for all |φ⟩ and |ψ⟩. According to a famous theorem due to Wigner, a mapping with this property can always be chosen (by adopting suitable phase conventions) to be either unitary or antiunitary. The antiunitary alternative, while important for discrete symmetries, can be excluded for continuous symmetries. Then the symmetry acts as
|ψ⟩ → |ψ′⟩ = U|ψ⟩, (2.14)
where U is unitary (and in particular, linear).
Symmetries form a group: a symmetry transformation can be inverted, and the product of two symmetries is a symmetry. For each symmetry operation R acting on our physical system, there is a corresponding unitary transformation U(R). Multiplication of these unitary operators must respect the group multiplication law of the symmetries; applying R₁ ∘ R₂ should be equivalent to first applying R₂ and subsequently R₁. Thus we demand
U(R₁)U(R₂) = Phase(R₁, R₂) U(R₁ ∘ R₂). (2.15)
The phase is permitted in eq. (2.15) because quantum states are rays; we need only demand that U(R₁ ∘ R₂) act the same way as U(R₁)U(R₂) on rays, not on vectors. U(R) provides a unitary representation (up to a phase) of the symmetry group.
So far, our concept of symmetry has no connection with dynamics. Usually, we demand of a symmetry that it respect the dynamical evolution of the system. This means that it should not matter whether we first transform the system and then evolve it, or first evolve it and then transform it. In other words, the diagram
              dynamics
   Initial ------------> Final
      |                    |
   rotation             rotation
      |                    |
      v       dynamics     v
  New Initial ---------> New Final
is commutative. This means that the time evolution operator e^(−itH) should commute with the symmetry transformation U(R):
U(R) e^(−itH) = e^(−itH) U(R), (2.16)
and expanding to linear order in t we obtain
U(R)H = HU(R). (2.17)
For a continuous symmetry, we can choose R infinitesimally close to the identity, R = I + εT, and then U is close to 1,
U = 1 − iεQ + O(ε²). (2.18)
From the unitarity of U (to order ε) it follows that Q is an observable, Q = Q†. Expanding eq. (2.17) to linear order in ε we find
[Q, H] = 0; (2.19)
the observable Q commutes with the Hamiltonian.
Eq. (2.19) is a conservation law. It says, for example, that if we prepare an eigenstate of Q, then time evolution governed by the Schrodinger equation will preserve the eigenstate. We have seen that symmetries imply conservation laws. Conversely, given a conserved quantity Q satisfying eq. (2.19) we can construct the corresponding symmetry transformations. Finite transformations can be built as a product of many infinitesimal ones
R = (1 + (θ/N) T)^N  ⇒  U(R) = (1 + i(θ/N) Q)^N → e^(iθQ), (2.20)
(taking the limit N → ∞). Once we have decided how infinitesimal symmetry transformations are represented by unitary operators, then it is also determined how finite transformations are represented, for these can be built as a product of infinitesimal transformations. We say that Q is the generator of the symmetry.
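The limit in eq. (2.20) can be checked numerically: repeatedly applying the infinitesimal transformation 1 + i(θ/N)Q reproduces e^(iθQ) as N grows. A sketch assuming numpy; the generator (here σ₃) and angle are arbitrary choices:

```python
import numpy as np

# Generator Q (chosen to be sigma_3, which is diagonal) and a finite angle
Q = np.array([[1, 0], [0, -1]], dtype=complex)
theta = 1.2

# Exact finite transformation e^{i theta Q} (trivial for diagonal Q)
U_exact = np.diag(np.exp(1j * theta * np.diag(Q)))

# Product of N infinitesimal transformations (1 + i (theta/N) Q)^N
N = 100000
step = np.eye(2) + 1j * (theta / N) * Q
U_approx = np.linalg.matrix_power(step, N)

# The product converges to the exact exponential in the large-N limit
assert np.allclose(U_approx, U_exact, atol=1e-4)
```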
Let us briefly recall how this general theory applies to spatial rotations and angular momentum. An infinitesimal rotation by dθ about the axis specified by the unit vector n̂ = (n₁, n₂, n₃) can be expressed as
R(n̂, dθ) = I − i dθ n̂·J⃗, (2.21)
where (J₁, J₂, J₃) are the components of the angular momentum. A finite rotation is expressed as
R(n̂, θ) = exp(−iθ n̂·J⃗). (2.22)
Rotations about distinct axes don't commute. From elementary properties of rotations, we find the commutation relations
[J_k, J_ℓ] = iε_{kℓm} J_m , (2.23)
where ε_{kℓm} is the totally antisymmetric tensor with ε₁₂₃ = 1, and repeated indices are summed. To implement rotations on a quantum system, we find self-adjoint operators J₁, J₂, J₃ in Hilbert space that satisfy these relations.
The "defining" representation of the rotation group is three dimensional, but the simplest nontrivial irreducible representation is two dimensional, given by
J_k = (1/2) σ_k , (2.24)
where
σ₁ = ( 0  1 )    σ₂ = ( 0  −i )    σ₃ = ( 1   0 )
     ( 1  0 ) ,       ( i   0 ) ,       ( 0  −1 )    (2.25)
are the Pauli matrices. This is the unique two-dimensional irreducible representation, up to a unitary change of basis. Since the eigenvalues of J_k are ±1/2, we call this the spin-1/2 representation. (By identifying J⃗ as the angular momentum, we have implicitly chosen units with ℏ = 1.)
The Pauli matrices also have the properties of being mutually anticommuting and squaring to the identity,
σ_k σ_ℓ + σ_ℓ σ_k = 2δ_{kℓ} 1 ; (2.26)
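Both the anticommutation relations eq. (2.26) and the angular-momentum commutation relations eq. (2.23), with J_k = σ_k/2, can be verified directly:

```python
import numpy as np

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
sigma = [s1, s2, s3]
I2 = np.eye(2)

# sigma_k sigma_l + sigma_l sigma_k = 2 delta_{kl} 1, eq. (2.26)
for k in range(3):
    for l in range(3):
        anti = sigma[k] @ sigma[l] + sigma[l] @ sigma[k]
        assert np.allclose(anti, 2 * (k == l) * I2)

# [J_k, J_l] = i epsilon_{klm} J_m with J_k = sigma_k / 2, eq. (2.23)
J = [s / 2 for s in sigma]
eps = np.zeros((3, 3, 3))
eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1
eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1
for k in range(3):
    for l in range(3):
        comm = J[k] @ J[l] - J[l] @ J[k]
        expected = sum(1j * eps[k, l, m] * J[m] for m in range(3))
        assert np.allclose(comm, expected)
```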
So we see that (n̂·σ⃗)² = n_k n_ℓ σ_k σ_ℓ = n_k n_k 1 = 1. By expanding the exponential series, we see that finite rotations are represented as
U(n̂, θ) = e^(−i(θ/2) n̂·σ⃗) = 1 cos(θ/2) − i n̂·σ⃗ sin(θ/2). (2.27)
The most general 2 × 2 unitary matrix with determinant 1 can be expressed in this form. Thus, we are entitled to think of a qubit as the state of a spin-1/2 object, and an arbitrary unitary transformation acting on the state (aside from a possible rotation of the overall phase) is a rotation of the spin.
A peculiar property of the representation U(n̂, θ) is that it is double-valued. In particular a rotation by 2π about any axis is represented nontrivially:
U(n̂, θ = 2π) = −1. (2.28)
Our representation of the rotation group is really a representation "up to a sign"
U(R₁)U(R₂) = ±U(R₁ ∘ R₂). (2.29)
But as already noted, this is acceptable, because the group multiplication is respected on rays, though not on vectors. These double-valued representations of the rotation group are called spinor representations. (The existence of spinors follows from a topological property of the group: it is not simply connected.)
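Eqs. (2.27) and (2.28) are easy to check numerically; the rotation axis below is an arbitrary choice:

```python
import numpy as np

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
s3 = np.array([[1, 0], [0, -1]], dtype=complex)

def U(n, theta):
    """U(n, theta) = cos(theta/2) 1 - i sin(theta/2) n.sigma, eq. (2.27)."""
    ns = n[0]*s1 + n[1]*s2 + n[2]*s3
    return np.cos(theta/2)*np.eye(2) - 1j*np.sin(theta/2)*ns

n = np.array([1, 2, 2]) / 3.0   # an arbitrary unit vector

# (n.sigma)^2 = 1
ns = n[0]*s1 + n[1]*s2 + n[2]*s3
assert np.allclose(ns @ ns, np.eye(2))

# A rotation by 2*pi is represented nontrivially: U = -1, eq. (2.28)
assert np.allclose(U(n, 2*np.pi), -np.eye(2))

# Only a rotation by 4*pi returns to the identity
assert np.allclose(U(n, 4*np.pi), np.eye(2))
```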
While it is true that a rotation by 2π has no detectable effect on a spin-1/2 object, it would be wrong to conclude that the spinor property has no observable consequences. Suppose I have a machine that acts on a pair of spins. If the first spin is up, it does nothing, but if the first spin is down, it rotates the second spin by 2π. Now let the machine act when the first spin is in a superposition of up and down. Then
(1/√2)(|↑⟩₁ + |↓⟩₁) |↑⟩₂ → (1/√2)(|↑⟩₁ − |↓⟩₁) |↑⟩₂ . (2.30)
While there is no detectable effect on the second spin, the state of the first has flipped to an orthogonal state, which is very much observable.
In a rotated frame of reference, a rotation R(n̂, θ) becomes a rotation through the same angle but about a rotated axis. It follows that the three components of angular momentum transform under rotations as a vector:
U(R) J_k U(R)† = R_{kℓ} J_ℓ . (2.31)
Thus, if a state |m⟩ is an eigenstate of J₃,
J₃|m⟩ = m|m⟩, (2.32)
then U(R)|m⟩ is an eigenstate of RJ₃ with the same eigenvalue:
RJ₃ (U(R)|m⟩) = U(R) J₃ U(R)† U(R)|m⟩
= U(R) J₃|m⟩ = m (U(R)|m⟩). (2.33)
Therefore, we can construct eigenstates of angular momentum along the axis n̂ = (sin θ cos φ, sin θ sin φ, cos θ) by applying a rotation through θ, about the axis n̂′ = (−sin φ, cos φ, 0), to a J₃ eigenstate. For our spin-1/2 representation, this rotation is
exp(−i(θ/2) n̂′·σ⃗) = exp[ (θ/2) ( 0        −e^(−iφ) )
                                 ( e^(iφ)    0       ) ]
= ( cos(θ/2)          −e^(−iφ) sin(θ/2) )
  ( e^(iφ) sin(θ/2)    cos(θ/2)         ) , (2.34)
and applying it to (1 0)ᵀ, the J₃ eigenstate with eigenvalue 1/2, we obtain
|ψ(θ, φ)⟩ = ( e^(−iφ/2) cos(θ/2) )
            ( e^(iφ/2) sin(θ/2)  ) , (2.35)
(up to an overall phase). We can check directly that this is an eigenstate of
n̂·σ⃗ = ( cos θ           e^(−iφ) sin θ )
       ( e^(iφ) sin θ   −cos θ         ) , (2.36)
with eigenvalue one. So we have seen that eq. (2.11) with a = e^(−iφ/2) cos(θ/2), b = e^(iφ/2) sin(θ/2), can be interpreted as a spin pointing in the (θ, φ) direction.
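That |ψ(θ, φ)⟩ of eq. (2.35) is the +1 eigenstate of n̂·σ⃗ in eq. (2.36) is a quick numerical check; the angles below are arbitrary:

```python
import numpy as np

theta, phi = 1.1, 0.4   # an arbitrary direction (theta, phi)

# |psi(theta, phi)> from eq. (2.35)
psi = np.array([np.exp(-1j*phi/2) * np.cos(theta/2),
                np.exp( 1j*phi/2) * np.sin(theta/2)])

# n.sigma from eq. (2.36)
n_sigma = np.array([[np.cos(theta), np.exp(-1j*phi)*np.sin(theta)],
                    [np.exp(1j*phi)*np.sin(theta), -np.cos(theta)]])

# Eigenvalue equation: (n.sigma)|psi> = +|psi>
assert np.allclose(n_sigma @ psi, psi)

# And eq. (2.37): <psi|sigma_3|psi> = cos(theta)
s3 = np.diag([1.0, -1.0])
assert np.isclose(np.vdot(psi, s3 @ psi).real, np.cos(theta))
```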
We noted that we cannot determine a and b with a single measurement. Furthermore, even with many identical copies of the state, we cannot completely determine the state by measuring each copy only along the z-axis. This would enable us to estimate |a| and |b|, but we would learn nothing about the relative phase of a and b. Equivalently, we would find the component of the spin along the z-axis
⟨ψ(θ, φ)|σ₃|ψ(θ, φ)⟩ = cos²(θ/2) − sin²(θ/2) = cos θ, (2.37)
but we would not learn about the component in the x-y plane. The problem of determining |ψ⟩ by measuring the spin is equivalent to determining the unit vector n̂ by measuring its components along various axes. Altogether, measurements along three different axes are required. E.g., from ⟨σ₃⟩ and ⟨σ₁⟩ we can determine n₃ and n₁, but the sign of n₂ remains undetermined. Measuring ⟨σ₂⟩ would remove this remaining ambiguity.
Of course, if we are permitted to rotate the spin, then only measurements along the z-axis will suffice. That is, measuring a spin along the n̂ axis is equivalent to first applying a rotation that rotates the n̂ axis to the axis ẑ, and then measuring along ẑ.
In the special case θ = π/2 and φ = 0 (the x̂-axis) our spin state is
|↑_x⟩ = (1/√2)(|↑_z⟩ + |↓_z⟩) (2.38)
("spin-up along the x-axis"). The orthogonal state ("spin down along the x-axis") is
|↓_x⟩ = (1/√2)(|↑_z⟩ − |↓_z⟩). (2.39)
For either of these states, if we measure the spin along the z-axis, we will obtain |↑_z⟩ with probability 1/2 and |↓_z⟩ with probability 1/2.
Now consider the combination
(1/√2)(|↑_x⟩ + |↓_x⟩). (2.40)
This state has the property that, if we measure the spin along the x-axis, we obtain |↑_x⟩ or |↓_x⟩, each with probability 1/2. Now we may ask, what if we measure the state in eq. (2.40) along the z-axis?
If these were probabilistic classical bits, the answer would be obvious. The state in eq. (2.40) is in one of two states, and for each of the two, the probability is 1/2 for pointing up or down along the z-axis. So of course we should find up with probability 1/2 when we measure along the z-axis.
But not so for qubits! By adding eq. (2.38) and eq. (2.39), we see that the state in eq. (2.40) is really |↑_z⟩ in disguise. When we measure along the z-axis, we always find |↑_z⟩, never |↓_z⟩.
We see that for qubits, as opposed to probabilistic classical bits, probabilities can add in unexpected ways. This is, in its simplest guise, the phenomenon called "quantum interference," an important feature of quantum information.
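The interference argument can be made concrete by adding the amplitudes of eqs. (2.38) and (2.39) directly:

```python
import numpy as np

up_z = np.array([1, 0], dtype=complex)
down_z = np.array([0, 1], dtype=complex)

up_x = (up_z + down_z) / np.sqrt(2)     # eq. (2.38)
down_x = (up_z - down_z) / np.sqrt(2)   # eq. (2.39)

# Each of |up_x>, |down_x> separately gives z-outcomes with probability 1/2
for state in (up_x, down_x):
    assert np.isclose(abs(np.vdot(up_z, state))**2, 0.5)

# But their superposition, eq. (2.40), is |up_z> in disguise
psi = (up_x + down_x) / np.sqrt(2)
assert np.allclose(psi, up_z)

# Measuring along z always yields up, never down
assert np.isclose(abs(np.vdot(up_z, psi))**2, 1.0)
assert np.isclose(abs(np.vdot(down_z, psi))**2, 0.0)
```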
It should be emphasized that, while this formal equivalence with a spin-1/2 object applies to any two-level quantum system, of course not every two-level system transforms as a spinor under rotations!
with eigenvalues ±1. Because the eigenvalues are ±1 (not ±1/2) we say that the photon has spin-1.
In this context, the quantum interference phenomenon can be described this way: Suppose that we have a polarization analyzer that allows only one of the two linear photon polarizations to pass through. Then an x or y polarized photon has probability 1/2 of getting through a 45° rotated polarizer, and a 45° polarized photon has probability 1/2 of getting through an x or y analyzer. But an x photon never passes through a y analyzer. If we put a 45° rotated analyzer in between an x and y analyzer, then 1/2 the photons make it through each analyzer. But if we remove the analyzer in the middle no photons make it through the y analyzer.
A device can be constructed easily that rotates the linear polarization of a photon, and so applies the transformation Eq. (2.41) to our qubit. As noted, this is not the most general possible unitary transformation. But if we also have a device that alters the relative phase of the two orthogonal linear polarization states
|x⟩ → e^(iω/2)|x⟩ ,
|y⟩ → e^(−iω/2)|y⟩ , (2.45)
the two devices can be employed together to apply an arbitrary 2 × 2 unitary transformation (of determinant 1) to the photon polarization state.
where 0 < p_a ≤ 1 and Σ_a p_a = 1. If the state is not pure, there are two or more terms in this sum, and ρ² ≠ ρ; in fact, tr ρ² = Σ p_a² < Σ p_a = 1. We say that ρ is an incoherent superposition of the states {|ψ_a⟩}; incoherent meaning that the relative phases of the |ψ_a⟩ are experimentally inaccessible.
Since the expectation value of any observable M acting on the subsystem can be expressed as
⟨M⟩ = tr Mρ = Σ_a p_a ⟨ψ_a|M|ψ_a⟩, (2.61)
ρ = (1/2)(1 + n̂·σ⃗) (2.68)
where n̂ = (sin θ cos φ, sin θ sin φ, cos θ). One nice property of the Bloch parametrization of the pure states is that while |ψ(θ, φ)⟩ has an arbitrary overall phase that has no physical significance, there is no phase ambiguity in the density matrix ρ(θ, φ) = |ψ(θ, φ)⟩⟨ψ(θ, φ)|; all the parameters in ρ have a physical meaning.
From the property
(1/2) tr σ_i σ_j = δ_ij (2.69)
we see that
⟨n̂·σ⃗⟩_P⃗ = tr(n̂·σ⃗ ρ(P⃗)) = n̂·P⃗ . (2.70)
Thus the vector P⃗ in Eq. (2.62) parametrizes the polarization of the spin. If there are many identically prepared systems at our disposal, we can determine P⃗ (and hence the complete density matrix ρ(P⃗)) by measuring ⟨n̂·σ⃗⟩ along each of three linearly independent axes.
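Reconstructing P⃗ from expectation values along three axes, via eq. (2.70), can be sketched as follows; the polarization vector below is an arbitrary choice (a mixed state, since |P⃗| < 1):

```python
import numpy as np

s = [np.array([[0, 1], [1, 0]], dtype=complex),
     np.array([[0, -1j], [1j, 0]], dtype=complex),
     np.array([[1, 0], [0, -1]], dtype=complex)]

def rho(P):
    """rho(P) = (1/2)(1 + P.sigma), eq. (2.68)."""
    return 0.5 * (np.eye(2) + sum(P[i] * s[i] for i in range(3)))

P_true = np.array([0.3, -0.2, 0.5])   # arbitrary, |P| <= 1
r = rho(P_true)

# "Measure" <sigma_i> = tr(sigma_i rho) along the three coordinate axes;
# by eqs. (2.69)-(2.70) this returns P_i directly
P_est = np.array([np.trace(s[i] @ r).real for i in range(3)])
assert np.allclose(P_est, P_true)

# The reconstructed density matrix matches the original
assert np.allclose(rho(P_est), r)
```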
2.4.1 Entanglement
With any bipartite pure state |ψ⟩_AB we may associate a positive integer, the Schmidt number, which is the number of nonzero eigenvalues in ρ_A (or ρ_B) and hence the number of terms in the Schmidt decomposition of |ψ⟩_AB. In terms of this quantity, we can define what it means for a bipartite pure state to be entangled: |ψ⟩_AB is entangled (or nonseparable) if its Schmidt number is greater than one; otherwise, it is separable (or unentangled). Thus, a separable bipartite pure state is a direct product of pure states in H_A and H_B,
|ψ⟩_AB = |φ⟩_A ⊗ |χ⟩_B ; (2.92)
then the reduced density matrices ρ_A = |φ⟩_A A⟨φ| and ρ_B = |χ⟩_B B⟨χ| are pure. Any state that cannot be expressed as such a direct product is entangled; then ρ_A and ρ_B are mixed states.
One of our main goals this term will be to understand better the significance of entanglement. It is not strictly correct to say that subsystems A and B are uncorrelated if |ψ⟩_AB is separable; after all, the two spins in the separable state
|↑⟩_A |↑⟩_B , (2.93)
are surely correlated; they are both pointing in the same direction. But the correlations between A and B in an entangled state have a different character than those in a separable state. Perhaps the critical difference is that entanglement cannot be created locally. The only way to entangle A and B is for the two subsystems to directly interact with one another.
We can prepare the state eq. (2.93) without allowing spins A and B to ever come into contact with one another. We need only send a (classical!) message to two preparers (Alice and Bob) telling both of them to prepare a spin pointing along the z-axis. But the only way to turn the state eq. (2.93) into an entangled state like
(1/√2)(|↑⟩_A |↑⟩_B + |↓⟩_A |↓⟩_B) , (2.94)
is to apply a collective unitary transformation to the state. Local unitary transformations of the form U_A ⊗ U_B, and local measurements performed by Alice or Bob, cannot increase the Schmidt number of the two-qubit state, no matter how much Alice and Bob discuss what they do. To entangle two qubits, we must bring them together and allow them to interact.
As we will discuss later, it is also possible to make the distinction between entangled and separable bipartite mixed states. We will also discuss various ways in which local operations can modify the form of entanglement, and some ways that entanglement can be put to use.
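For a two-qubit pure state, the Schmidt number is the rank of the 2 × 2 matrix of amplitudes, which a singular value decomposition exposes (the singular values are the √p_i of the Schmidt decomposition). A sketch assuming numpy, comparing the separable state eq. (2.93) with the entangled state eq. (2.94):

```python
import numpy as np

def schmidt_number(c, tol=1e-12):
    """Number of nonzero Schmidt coefficients of a two-qubit pure state.

    c is the amplitude matrix c[i, j] in |psi> = sum_ij c_ij |i>_A |j>_B;
    its singular values are the sqrt(p_i) of the Schmidt decomposition."""
    sv = np.linalg.svd(c, compute_uv=False)
    return int(np.sum(sv > tol))

# Separable state |up>_A |up>_B, eq. (2.93): Schmidt number 1
c_sep = np.array([[1, 0], [0, 0]], dtype=complex)
assert schmidt_number(c_sep) == 1

# Entangled state (|up up> + |down down>)/sqrt(2), eq. (2.94): Schmidt number 2
c_ent = np.array([[1, 0], [0, 1]], dtype=complex) / np.sqrt(2)
assert schmidt_number(c_ent) == 2

# A local unitary U_A (acting on A alone) cannot change the Schmidt number
theta = 0.9
UA = np.array([[np.cos(theta), -np.sin(theta)],
               [np.sin(theta),  np.cos(theta)]], dtype=complex)
assert schmidt_number(UA @ c_ent) == 2
```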
2.6 Summary
Axioms. The arena of quantum mechanics is a Hilbert space H. The
fundamental assumptions are:
(1) A state is a ray in H.
(2) An observable is a self-adjoint operator on H.
(3) A measurement is an orthogonal projection.
(4) Time evolution is unitary.
Density operator. But if we confine our attention to only a portion of a larger quantum system, assumptions (1)-(4) need not be satisfied. In particular, a quantum state is described not by a ray, but by a density operator ρ, a nonnegative operator with unit trace. The density operator is pure (and the state can be described by a ray) if ρ² = ρ; otherwise, the state is mixed. An observable M has expectation value tr(Mρ) in this state.
Qubit. A quantum system with a two-dimensional Hilbert space is called a qubit. The general density matrix of a qubit is
ρ(P⃗) = (1/2)(1 + P⃗·σ⃗) (2.120)
where P⃗ is a three-component vector of length |P⃗| ≤ 1. Pure states have |P⃗| = 1.
Schmidt decomposition. For any quantum system divided into two parts A and B (a bipartite system), the Hilbert space is a tensor product H_A ⊗ H_B. For any pure state |ψ⟩_AB of a bipartite system, there are orthonormal bases {|i⟩_A} for H_A and {|i′⟩_B} for H_B such that
|ψ⟩_AB = Σ_i √p_i |i⟩_A |i′⟩_B ; (2.121)
Eq. (2.121) is called the Schmidt decomposition of |ψ⟩_AB. In a bipartite pure state, subsystems A and B separately are described by density operators ρ_A and ρ_B; it follows from eq. (2.121) that ρ_A and ρ_B have the same nonvanishing eigenvalues (the p_i's). The number of nonvanishing eigenvalues is called the Schmidt number of |ψ⟩_AB. A bipartite pure state is said to be entangled if its Schmidt number is greater than one.
Ensembles. The density operators on a Hilbert space form a convex set, and the pure states are the extremal points of the set. A mixed state of a system A can be prepared as an ensemble of pure states in many different ways, all of which are experimentally indistinguishable if we observe system A alone. Given any mixed state ρ_A of system A, any preparation of ρ_A as an ensemble of pure states can be realized in principle by performing a measurement in another system B with which A is entangled. In fact, given many such preparations of ρ_A, there is a single entangled state of A and B such that any one of these preparations can be realized by measuring a suitable observable in B (the GHJW theorem). By measuring in system B and reporting the measurement outcome to system A, we can extract from the mixture a pure state chosen from one of the ensembles.
2.7 Exercises
2.1 Fidelity of a random guess
A single qubit (spin-1/2 object) is in an unknown pure state |ψ⟩, selected at random from an ensemble uniformly distributed over the Bloch sphere. We guess at random that the state is |φ⟩. On the average, what is the fidelity F of our guess, defined by
F ≡ |⟨φ|ψ⟩|² . (2.122)
2.2 Fidelity after measurement
After randomly selecting a one-qubit pure state as in the previous problem, we perform a measurement of the spin along the ẑ-axis. This measurement prepares a state described by the density matrix
ρ = P↑ ⟨ψ|P↑|ψ⟩ + P↓ ⟨ψ|P↓|ψ⟩ (2.123)
(where P↑,↓ denote the projections onto the spin-up and spin-down states along the ẑ-axis). On the average, with what fidelity
F ≡ ⟨ψ|ρ|ψ⟩ (2.124)
does this density matrix represent the initial state |ψ⟩? (The improvement in F compared to the answer to the previous problem is a crude measure of how much we learned by making the measurement.)
2.3 Schmidt decomposition
For the two-qubit state
|ψ⟩ = (1/√2) |↑⟩_A ( (√3/2)|↑⟩_B + (1/2)|↓⟩_B ) + (1/√2) |↓⟩_A ( (1/2)|↑⟩_B + (√3/2)|↓⟩_B ) , (2.125)
a. Compute ρ_A = tr_B(|ψ⟩⟨ψ|) and ρ_B = tr_A(|ψ⟩⟨ψ|).
b. Find the Schmidt decomposition of |ψ⟩.
2.4 Tripartite pure state
Is there a Schmidt decomposition for an arbitrary tripartite pure state? That is, if |ψ⟩_ABC is an arbitrary vector in H_A ⊗ H_B ⊗ H_C, can we find orthonormal bases {|i⟩_A}, {|i⟩_B}, {|i⟩_C} such that
|ψ⟩_ABC = Σ_i √p_i |i⟩_A ⊗ |i⟩_B ⊗ |i⟩_C ? (2.126)
Explain your answer.
2.5 Quantum correlations in a mixed state

Consider a density matrix for two qubits

    ρ = (1/8) 1 + (1/2) |ψ⟩⟨ψ| ,   (2.127)

where 1 denotes the 4×4 unit matrix, and

    |ψ⟩ = (1/√2) ( |↑⟩|↓⟩ − |↓⟩|↑⟩ ) .   (2.128)

Suppose we measure the first spin along the n̂ axis and the second spin
along the m̂ axis, where n̂·m̂ = cos θ. What is the probability that
both spins are "spin-up" along their respective axes?
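A numerical sketch of the setup (not part of the notes): since both terms of ρ are invariant under u ⊗ u rotations, only n̂·m̂ = cos θ matters, so we may put n̂ = ẑ and tilt m̂ by θ in the x-z plane.

```python
import numpy as np

I2 = np.eye(2)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

# rho = (1/8) 1 + (1/2)|psi><psi|, |psi> the singlet (basis uu, ud, du, dd)
psi = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)
rho = np.eye(4) / 8 + 0.5 * np.outer(psi, psi.conj())

def prob_both_up(theta):
    """P(up, up), measuring spin 1 along z and spin 2 along an axis
    tilted from z by theta."""
    n_proj = (I2 + sz) / 2
    m_proj = (I2 + np.cos(theta) * sz + np.sin(theta) * sx) / 2
    return np.real(np.trace(rho @ np.kron(n_proj, m_proj)))

for t in (0.0, np.pi / 2, np.pi):
    print(t, prob_both_up(t))
```

At θ = 0 only the 𝟙/8 term contributes (the singlet is perfectly anticorrelated), which is a useful sanity check on any closed-form answer.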
Chapter 3
Foundations II: Measurement
and Evolution
3.1 Orthogonal Measurement and Beyond
3.1.1 Orthogonal Measurements
We would like to examine the properties of the generalized measurements
that can be realized on system A by performing orthogonal measurements
on a larger system that contains A. But first we will briefly consider how
(orthogonal) measurements of an arbitrary observable can be achieved in
principle, following the classic treatment of Von Neumann.

To measure an observable M, we will modify the Hamiltonian of the world
by turning on a coupling between that observable and a "pointer" variable
that will serve as the apparatus. The coupling establishes entanglement
between the eigenstates of the observable and the distinguishable states of the
pointer, so that we can prepare an eigenstate of the observable by "observing"
the pointer.
Of course, this is not a fully satisfying model of measurement because we
have not explained how it is possible to measure the pointer. Von Neumann's
attitude was that one can see that it is possible in principle to correlate
the state of a microscopic quantum system with the value of a macroscopic
classical variable, and we may take it for granted that we can perceive the
value of the classical variable. A more complete explanation is desirable and
possible; we will return to this issue later.
We may think of the pointer as a particle that propagates freely apart
from its tunable coupling to the quantum system being measured. Since we
intend to measure the position of the pointer, it should be prepared initially
in a wavepacket state that is narrow in position space -- but not too narrow,
because a very narrow wave packet will spread too rapidly. If the initial
width of the wave packet is Δx, then the uncertainty in its velocity will be
of order Δv = Δp/m ∼ ℏ/(m Δx), so that after a time t, the wavepacket will
spread to a width

    Δx(t) ∼ Δx + ℏt/(m Δx) ,   (3.1)

which is minimized for [Δx(t)]² ∼ [Δx]² ∼ ℏt/m. Therefore, if the experi-
ment takes a time t, the resolution we can achieve for the final position of
the pointer is limited by

    Δx ≳ (Δx)_SQL ∼ √(ℏt/m) ,   (3.2)

the "standard quantum limit." We will choose our pointer to be sufficiently
heavy that this limitation is not serious.
The Hamiltonian describing the coupling of the quantum system to the
pointer has the form

    H = H₀ + (1/2m) P² + λ(t) M P ,   (3.3)

where P²/2m is the Hamiltonian of the free pointer particle (which we will
henceforth ignore on the grounds that the pointer is so heavy that spreading
of its wavepacket may be neglected), H₀ is the unperturbed Hamiltonian of
the system to be measured, and λ(t) is a coupling constant that we are able to
turn on and off as desired. The observable to be measured, M, is coupled to
the momentum P of the pointer.

If M does not commute with H₀, then we have to worry about how the
observable evolves during the course of the measurement. To simplify the
analysis, let us suppose that either [M, H₀] = 0, or else the measurement
is carried out quickly enough that the free evolution of the system can be
neglected during the measurement procedure. Then the Hamiltonian can be
approximated as H ≃ λ(t) M P (where of course [M, P] = 0 because M is an
observable of the system and P is an observable of the pointer), and the time
evolution operator is

    U(t) ≃ exp[ −iλt M P ] .   (3.4)
Expanding in the basis in which M is diagonal,

    M = Σ_a |a⟩ M_a ⟨a| ,   (3.5)

we express U(t) as

    U(t) = Σ_a |a⟩ exp[ −iλt M_a P ] ⟨a| .   (3.6)

Now we recall that P generates a translation of the position of the pointer:
P = −i(d/dx) in the position representation, so that e^{−ix₀P} = exp[ −x₀ (d/dx) ],
and by Taylor expanding,

    e^{−ix₀P} ψ(x) = ψ(x − x₀) ;   (3.7)

in other words, e^{−ix₀P} acting on a wavepacket translates the wavepacket by x₀.
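The translation property (3.7) is easy to verify numerically (a sketch, not part of the notes): on a periodic grid, P is diagonal in the Fourier representation, so exp(−ix₀P) is just a momentum-space phase.

```python
import numpy as np

# Discretized pointer wavefunction on a periodic grid.  Applying
# exp(-i x0 P) in the momentum (Fourier) representation should shift the
# wavepacket by x0; choosing x0 to be a whole number of grid steps makes
# the comparison with a circular shift exact up to rounding error.
N, L = 1024, 40.0
x = np.linspace(-L / 2, L / 2, N, endpoint=False)
dx = x[1] - x[0]
k = 2 * np.pi * np.fft.fftfreq(N, d=dx)      # momentum grid (hbar = 1)

psi = np.exp(-((x + 5.0) ** 2))              # Gaussian packet centered at -5
psi = psi / np.sqrt(np.sum(np.abs(psi) ** 2) * dx)

x0 = 256 * dx                                # shift by 256 grid steps
shifted = np.fft.ifft(np.exp(-1j * x0 * k) * np.fft.fft(psi))

expected = np.roll(psi, 256)                 # psi(x - x0) on the grid
err = np.max(np.abs(shifted - expected))
print(err)
```

The error is at the level of floating-point roundoff, confirming eq. (3.7) on the grid.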
We see that if our quantum system starts in a superposition of M eigenstates,
initially unentangled with the position-space wavepacket |ψ(x)⟩ of the pointer,
then after time t the quantum state has evolved to

    U(t) ( Σ_a α_a |a⟩ ⊗ |ψ(x)⟩ )
        = Σ_a α_a |a⟩ ⊗ |ψ(x − λt M_a)⟩ ;   (3.8)

the position of the pointer is now correlated with the value of the observable
M. If the pointer wavepacket is narrow enough for us to resolve all values of
the M_a that occur (Δx ≲ λt ΔM_a), then when we observe the position of the
pointer (never mind how!) we will prepare an eigenstate of the observable.
With probability |α_a|², we will detect that the pointer has shifted its position
by λt M_a, in which case we will have prepared the M eigenstate |a⟩. In the
end, then, we conclude that the initial state |φ⟩ of the quantum system is
projected to |a⟩ with probability |⟨a|φ⟩|². This is Von Neumann's model of
orthogonal measurement.
The classic example is the Stern-Gerlach apparatus. To measure σ₃ for a
spin-½ object, we allow the object to pass through a region of inhomogeneous
magnetic field

    B₃ = λz .   (3.9)

The magnetic moment of the object is μ, and the coupling induced by the
magnetic field is

    H = −λμ z σ₃ .   (3.10)

In this case σ₃ is the observable to be measured, coupled to the position
z rather than the momentum of the pointer; but that's all right, because z
generates a translation of P_z, and so the coupling imparts an impulse to the
pointer. We can perceive whether the object is pushed up or down, and so
project out the spin state |↑_z⟩ or |↓_z⟩. Of course, by rotating the magnet,
we can measure the observable n̂·σ⃗ instead.
Our discussion of the quantum eraser has cautioned us that establishing
the entangled state eq. (3.8) is not sufficient to explain why the measurement
procedure prepares an eigenstate of M. In principle, the measurement of the
pointer could project out a peculiar superposition of position eigenstates,
and so prepare the quantum system in a superposition of M eigenstates. To
achieve a deeper understanding of the measurement process, we will need to
explain why the position-eigenstate basis of the pointer enjoys a privileged
status over other possible bases.
If indeed we can couple any observable to a pointer as just described, and
we can observe the pointer, then we can perform any conceivable orthogonal
projection in Hilbert space. Given a set of operators {E_a} such that

    E_a = E_a† ,   E_a E_b = δ_ab E_a ,   Σ_a E_a = 1 ,   (3.11)

we can carry out a measurement procedure that will take a pure state |ψ⟩⟨ψ|
to

    E_a |ψ⟩⟨ψ| E_a / ⟨ψ|E_a|ψ⟩   (3.12)

with probability

    Prob(a) = ⟨ψ|E_a|ψ⟩ .   (3.13)

The measurement outcomes can be described by a density matrix obtained
by summing over all possible outcomes weighted by the probability of that
outcome (rather than by choosing one particular outcome), in which case the
measurement modifies the initial pure state according to

    |ψ⟩⟨ψ| → Σ_a E_a |ψ⟩⟨ψ| E_a .   (3.14)

This is the ensemble of pure states describing the measurement outcomes
-- it is the description we would use if we knew a measurement had been
performed, but we did not know the result. Hence, the initial pure state has
become a mixed state unless the initial state happened to be an eigenstate
of the observable being measured. If the initial state before the measure-
ment were a mixed state with density matrix ρ, then by expressing ρ as an
ensemble of pure states we find that the effect of the measurement is

    ρ → Σ_a E_a ρ E_a .   (3.15)
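As an illustrative sketch (not part of the notes), here is eq. (3.15) for the simplest case, an orthogonal measurement of σ₃ on a single qubit; the off-diagonal terms of ρ are destroyed, as expected for the ensemble of outcomes:

```python
import numpy as np

# Projectors onto |up> and |down>: an orthogonal set obeying eq. (3.11)
E = [np.diag([1.0, 0.0]).astype(complex),
     np.diag([0.0, 1.0]).astype(complex)]
assert np.allclose(sum(E), np.eye(2))      # completeness

psi = np.array([np.cos(0.3), np.sin(0.3)], dtype=complex)
rho = np.outer(psi, psi.conj())            # pure initial state

rho_out = sum(P @ rho @ P for P in E)      # eq. (3.15): ensemble of outcomes
probs = [np.real(np.trace(P @ rho)) for P in E]  # eq. (3.13)
print(rho_out, probs)
```

The output density matrix is diagonal: the pure state has become a mixed state, unless it began as a σ₃ eigenstate.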
Now let's change our perspective on eq. (3.28). Interpret the (ψ_a)_i's not as
n ≥ N vectors in an N-dimensional space, but rather as N ≤ n vectors
(ψᵀ_i)_a in an n-dimensional space. Then eq. (3.28) becomes the statement
that these N vectors form an orthonormal set. Naturally, it is possible to
extend these vectors to an orthonormal basis for the n-dimensional space. In
other words, there is an n×n matrix u_ai, with u_ai = (ψ̃_a)_i for i = 1, 2, ..., N,
such that

    Σ_a u*_ai u_aj = δ_ij ,   (3.29)
    ρ'_AB(a) = E_a (ρ_A ⊗ ρ_B) E_a / tr_AB[ E_a (ρ_A ⊗ ρ_B) ] .   (3.41)

To an observer who has access only to system A, the new density matrix for
that system is given by the partial trace of the above, or

    ρ'_A(a) = tr_B[ E_a (ρ_A ⊗ ρ_B) E_a ] / tr_AB[ E_a (ρ_A ⊗ ρ_B) ] .   (3.42)

The expression eq. (3.40) for the probability of outcome a can also be written

    Prob(a) = tr_A[ tr_B( E_a (ρ_A ⊗ ρ_B) ) ] = tr_A( F_a ρ_A ) ;   (3.43)
if we introduce orthonormal bases {|i⟩_A} for H_A and {|μ⟩_B} for H_B, then

    Σ_{ij} Σ_{μν} (E_a)_{jμ,iν} (ρ_A)_{ij} (ρ_B)_{νμ} = Σ_{ij} (F_a)_{ji} (ρ_A)_{ij} ,   (3.44)

or

    (F_a)_{ji} = Σ_{μν} (E_a)_{jμ,iν} (ρ_B)_{νμ} .   (3.45)

It follows from eq. (3.45) that each F_a has the properties:

(1) Hermiticity:

    (F_a)*_{ij} = Σ_{μν} (E_a)*_{iμ,jν} (ρ_B)*_{νμ}
                = Σ_{μν} (E_a)_{jν,iμ} (ρ_B)_{μν} = (F_a)_{ji}

    (because E_a and ρ_B are hermitian);

(2) Positivity: in the basis that diagonalizes ρ_B = Σ_μ p_μ |μ⟩_B B⟨μ|,

    A⟨ψ| F_a |ψ⟩_A = Σ_μ p_μ ( A⟨ψ| ⊗ B⟨μ| ) E_a ( |ψ⟩_A ⊗ |μ⟩_B ) ≥ 0

    (because E_a is positive);

(3) Completeness:

    Σ_a F_a = Σ_μ p_μ B⟨μ| ( Σ_a E_a ) |μ⟩_B = 1_A

    (because Σ_a E_a = 1_AB and tr ρ_B = 1).
But the F_a's need not be mutually orthogonal. In fact, the number of F_a's
is limited only by the dimension of H_A ⊗ H_B, which is greater than (and
perhaps much greater than) the dimension of H_A.

There is no simple way, in general, to express the final density matrix
ρ'_A(a) in terms of ρ_A and F_a. But let us disregard how the POVM changes
the density matrix, and instead address this question: Suppose that H_A has
dimension N, and consider a POVM with n one-dimensional nonnegative
F_a's satisfying Σ_{a=1}^{n} F_a = 1_A. Can we choose the space H_B, density matrix
ρ_B in H_B, and projection operators E_a in H_A ⊗ H_B (where the number of
E_a's may exceed the number of F_a's) such³ that the probability of outcome
a of the orthogonal measurement satisfies

    tr[ E_a (ρ_A ⊗ ρ_B) ] = tr( F_a ρ_A ) ?   (3.46)

(Never mind how the orthogonal projections modify ρ_A!) We will consider
this to be a "realization" of the POVM by orthogonal measurement, because
we have no interest in what the state ρ'_A is for each measurement outcome;
we are only asking that the probabilities of the outcomes agree with those
defined by the POVM.
Such a realization of the POVM is indeed possible; to show this, we will
appeal once more to Neumark's theorem. Each one-dimensional F_a, a =
1, 2, ..., n, can be expressed as F_a = |ψ̃_a⟩⟨ψ̃_a|. According to Neumark,
there are n orthonormal n-component vectors |u_a⟩ such that

    |u_a⟩ = |ψ̃_a⟩ + |ψ̃_a^⊥⟩ .   (3.47)

Now consider, to start with, the special case n = rN, where r is a positive
integer. Then it is convenient to decompose |ψ̃_a^⊥⟩ as a direct sum of r − 1
N-component vectors:

    |ψ̃_a^⊥⟩ = |ψ̃^⊥_{1,a}⟩ ⊕ |ψ̃^⊥_{2,a}⟩ ⊕ ... ⊕ |ψ̃^⊥_{r−1,a}⟩ ;   (3.48)

here |ψ̃^⊥_{1,a}⟩ denotes the first N components of |ψ̃_a^⊥⟩, |ψ̃^⊥_{2,a}⟩ denotes the next
N components, etc. Then the orthonormality of the |u_a⟩'s implies that

    δ_ab = ⟨u_a|u_b⟩ = ⟨ψ̃_a|ψ̃_b⟩ + Σ_{μ=1}^{r−1} ⟨ψ̃^⊥_{μ,a}|ψ̃^⊥_{μ,b}⟩ .   (3.49)

³ If there are more E_a's than F_a's, all but n outcomes have probability zero.
Now we will choose H_B to have dimension r, and we will denote an orthonor-
mal basis for H_B by

    { |μ⟩_B } ,   μ = 0, 1, 2, ..., r − 1 .   (3.50)

Then it follows from eq. (3.49) that

    |Φ_a⟩_AB = |ψ̃_a⟩_A ⊗ |0⟩_B + Σ_{μ=1}^{r−1} |ψ̃^⊥_{μ,a}⟩_A ⊗ |μ⟩_B ,   a = 1, 2, ..., n,   (3.51)

is an orthonormal basis for H_A ⊗ H_B.

Now suppose that the state in H_A ⊗ H_B is

    ρ_AB = ρ_A ⊗ |0⟩_B B⟨0| ,   (3.52)

and that we perform an orthogonal projection onto the basis {|Φ_a⟩_AB} in
H_A ⊗ H_B. Then, since B⟨0|μ⟩_B = 0 for μ ≠ 0, the outcome |Φ_a⟩_AB occurs
with probability

    AB⟨Φ_a| ρ_AB |Φ_a⟩_AB = A⟨ψ̃_a| ρ_A |ψ̃_a⟩_A ,   (3.53)

and thus,

    AB⟨Φ_a| ρ_AB |Φ_a⟩_AB = tr( F_a ρ_A ) .   (3.54)

We have indeed succeeded in "realizing" the POVM {F_a} by performing
orthogonal measurement on H_A ⊗ H_B. This construction is just as efficient as
the "direct sum" construction described previously; we performed orthogonal
measurement in a space of dimension n = Nr.
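As a concrete sketch (not part of the notes), the "direct sum" flavor of Neumark's construction can be checked numerically for the symmetric "trine" POVM on a qubit, F_a = (2/3)|ψ_a⟩⟨ψ_a| with three real states 120° apart; the QR-based extension of the ⟨ψ̃_a|'s to an orthonormal basis is a choice of this example:

```python
import numpy as np

# Trine POVM: F_a = (2/3)|psi_a><psi_a|, three real qubit states 120 deg
# apart (N = 2, n = 3), which satisfy sum_a F_a = 1_A.
angles = np.array([0.0, 2 * np.pi / 3, 4 * np.pi / 3])
psi_t = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # rows |psi_a>
tilde = np.sqrt(2 / 3) * psi_t                # rows <psi~_a|

# The two columns of 'tilde' are orthonormal vectors in a 3-dim space ...
assert np.allclose(tilde.T @ tilde, np.eye(2))

# ... so they extend to an orthonormal basis (Neumark); QR does the job.
rng = np.random.default_rng(2)
q, _ = np.linalg.qr(np.hstack([tilde, rng.normal(size=(3, 1))]))
for j in range(2):                            # fix QR sign ambiguity
    if not np.allclose(q[:, j], tilde[:, j]):
        q[:, j] *= -1.0
u = q                                         # rows of u are the |u_a>
assert np.allclose(u[:, :2], tilde)

# Orthogonal measurement of |phi> (+) 0 in the basis {|u_a>} reproduces
# the POVM probabilities tr(F_a |phi><phi|).
phi = np.array([np.cos(0.7), np.sin(0.7)])
probs_basis = (u @ np.concatenate([phi, [0.0]])) ** 2
probs_povm = (2 / 3) * (psi_t @ phi) ** 2
print(probs_basis, probs_povm)
```

The orthogonal measurement in the 3-dimensional space assigns each outcome exactly the probability demanded by the POVM.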
If outcome a occurs, then the state

    ρ'_AB = |Φ_a⟩_AB AB⟨Φ_a| ,   (3.55)

is prepared by the measurement. The density matrix seen by an observer
who can probe only system A is obtained by performing a partial trace over
H_B,

    ρ'_A = tr_B ( |Φ_a⟩_AB AB⟨Φ_a| )
         = |ψ̃_a⟩_A A⟨ψ̃_a| + Σ_{μ=1}^{r−1} |ψ̃^⊥_{μ,a}⟩_A A⟨ψ̃^⊥_{μ,a}| ,   (3.56)

which isn't quite the same thing as what we obtained in our "direct sum"
construction. In any case, there are many possible ways to realize a POVM
by orthogonal measurement, and eq. (3.56) applies only to the particular
construction we have chosen here.

Nevertheless, this construction really is perfectly adequate for realizing
the POVM in which the state |ψ_a⟩_A A⟨ψ_a| is prepared in the event that
outcome a occurs. The hard part of implementing a POVM is assuring that
outcome a arises with the desired probability. It is then easy to arrange that
the result in the event of outcome a is the state |ψ_a⟩_A A⟨ψ_a|; if we like, once
the measurement is performed and outcome a is found, we can simply throw
A away and proceed to prepare the desired state! In fact, in the case of the
projection onto the basis {|Φ_a⟩_AB}, we can complete the construction of the
POVM by projecting system B onto the {|μ⟩_B} basis, and communicating
the result to system A. If the outcome is |0⟩_B, then no action need be taken.
If the outcome is |μ⟩_B, μ > 0, then the state |ψ̃^⊥_{μ,a}⟩_A has been prepared,
which can then be rotated to |ψ_a⟩_A.

So far, we have discussed only the special case n = rN. But if actually
n = rN − c, 0 < c < N, then we need only choose the final c components of
|ψ̃^⊥_{r−1,a}⟩_A to be zero, and the states |Φ_a⟩_AB will still be mutually orthogonal.
To complete the orthonormal basis, we may add the c states
3.2 Superoperators
3.2.1 The operator-sum representation
We now proceed to the next step of our program of understanding the be-
havior of one part of a bipartite quantum system. We have seen that a pure
state of the bipartite system may behave like a mixed state when we observe
subsystem A alone, and that an orthogonal measurement of the bipartite
system may be a (nonorthogonal) POVM on A alone. Next we ask, if a state
of the bipartite system undergoes unitary evolution, how do we describe the
evolution of A alone?
Suppose that the initial density matrix of the bipartite system is a tensor
product state of the form

    ρ_A ⊗ |0⟩_B B⟨0| ;   (3.64)

system A has density matrix ρ_A, and system B is assumed to be in a pure
state that we have designated |0⟩_B. The bipartite system evolves for a finite
time, governed by the unitary time evolution operator

    U_AB ( ρ_A ⊗ |0⟩_B B⟨0| ) U_AB† .   (3.65)

Now we perform the partial trace over H_B to find the final density matrix of
system A,

    ρ'_A = tr_B [ U_AB ( ρ_A ⊗ |0⟩_B B⟨0| ) U_AB† ]
         = Σ_μ B⟨μ| U_AB |0⟩_B  ρ_A  B⟨0| U_AB† |μ⟩_B ,   (3.66)

where {|μ⟩_B} is an orthonormal basis for H_B, and B⟨μ|U_AB|0⟩_B is an operator
acting on H_A. (If {|i⟩_A ⊗ |μ⟩_B} is an orthonormal basis for H_A ⊗ H_B, then
B⟨μ|U_AB|ν⟩_B denotes the operator whose matrix elements are

    A⟨i| ( B⟨μ| U_AB |ν⟩_B ) |j⟩_A
        = ( A⟨i| ⊗ B⟨μ| ) U_AB ( |j⟩_A ⊗ |ν⟩_B ) .)   (3.67)

If we denote

    M_μ = B⟨μ| U_AB |0⟩_B ,   (3.68)

then we may express ρ'_A as

    $(ρ_A) ≡ ρ'_A = Σ_μ M_μ ρ_A M_μ† .   (3.69)

It follows from the unitarity of U_AB that the M_μ's satisfy the property

    Σ_μ M_μ† M_μ = Σ_μ B⟨0| U_AB† |μ⟩_B B⟨μ| U_AB |0⟩_B
                 = B⟨0| U_AB† U_AB |0⟩_B = 1_A .   (3.70)

Eq. (3.69) defines a linear map $ that takes linear operators to linear
operators. Such a map, if the property in eq. (3.70) is satisfied, is called a
superoperator, and eq. (3.69) is called the operator-sum representation (or
Kraus representation) of the superoperator. A superoperator can be regarded
as a linear map that takes density operators to density operators, because it
follows from eq. (3.69) and eq. (3.70) that ρ'_A is a density matrix if ρ_A is:

(1) ρ'_A is hermitian:  ρ'_A† = Σ_μ M_μ ρ_A† M_μ† = ρ'_A .

(2) ρ'_A has unit trace:  tr ρ'_A = Σ_μ tr( ρ_A M_μ† M_μ ) = tr ρ_A = 1 .

(3) ρ'_A is positive:  A⟨ψ| ρ'_A |ψ⟩_A = Σ_μ ( ⟨ψ| M_μ ) ρ_A ( M_μ† |ψ⟩ ) ≥ 0 .
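The passage from the unitary representation to the Kraus operators, eqs. (3.66)-(3.70), can be checked numerically (a sketch, not part of the notes; the dimensions and the random unitary are arbitrary choices of this example):

```python
import numpy as np

rng = np.random.default_rng(3)

# Random unitary U_AB on a qubit (A) times a 3-dimensional environment (B)
dA, dB = 2, 3
g = rng.normal(size=(dA * dB, dA * dB)) + 1j * rng.normal(size=(dA * dB, dA * dB))
U, _ = np.linalg.qr(g)                       # random unitary via QR

# Kraus operators M_mu = <mu|_B U |0>_B, as in eq. (3.68): reshape U so
# indices read (i, mu; j, nu) and take the nu = 0 slice.
U4 = U.reshape(dA, dB, dA, dB)
M = [U4[:, mu, :, 0] for mu in range(dB)]

S = sum(m.conj().T @ m for m in M)           # completeness, eq. (3.70)

rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])  # a valid state
rho_out = sum(m @ rho @ m.conj().T for m in M)           # eq. (3.69)
print(np.allclose(S, np.eye(dA)), np.trace(rho_out))
```

The output is again hermitian, unit-trace, and positive, illustrating properties (1)-(3).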
We showed that the operator-sum representation in eq. (3.69) follows from
the "unitary representation" in eq. (3.66). But furthermore, given the oper-
ator-sum representation of a superoperator, it is always possible to construct
a corresponding unitary representation. We choose H_B to be a Hilbert space
whose dimension is at least as large as the number of terms in the operator
sum. If |φ⟩_A is any vector in H_A, the {|μ⟩_B} are orthonormal states in H_B,
and |0⟩_B is some normalized state in H_B, define the action of U_AB by

    U_AB ( |φ⟩_A ⊗ |0⟩_B ) = Σ_μ M_μ |φ⟩_A ⊗ |μ⟩_B .   (3.71)

This action is inner product preserving:

    ( Σ_μ A⟨φ₂| M_μ† ⊗ B⟨μ| ) ( Σ_ν M_ν |φ₁⟩_A ⊗ |ν⟩_B )
        = A⟨φ₂| Σ_μ M_μ† M_μ |φ₁⟩_A = A⟨φ₂|φ₁⟩_A ;   (3.72)

therefore, U_AB can be extended to a unitary operator acting on all of H_A ⊗ H_B.
3.2.2 Linearity
Now we will broaden our viewpoint a bit and consider the essential properties
that should be satisfied by any "reasonable" time evolution law for density
matrices. We will see that any such law admits an operator-sum representa-
tion, so in a sense the dynamical behavior we extracted by considering part
of a bipartite system is actually the most general possible.

A mapping $ : ρ → ρ' that takes an initial density matrix ρ to a final
density matrix ρ' is a mapping of operators to operators that satisfies

(1) $ preserves hermiticity: ρ' hermitian if ρ is.

(2) $ is trace preserving: tr ρ' = 1 if tr ρ = 1.

(3) $ is positive: ρ' is nonnegative if ρ is.

It is also customary to assume

(0) $ is linear.

While (1), (2), and (3) really are necessary if ρ' is to be a density matrix,
(0) is more open to question. Why linearity?

One possible answer is that nonlinear evolution of the density matrix
would be hard to reconcile with any ensemble interpretation. If

    $( ρ(λ) ) ≡ $( λρ₁ + (1 − λ)ρ₂ ) = λ $(ρ₁) + (1 − λ) $(ρ₂) ,   (3.75)

then time evolution is faithful to the probabilistic interpretation of ρ(λ):
either (with probability λ) ρ₁ was initially prepared and evolved to $(ρ₁), or
(with probability 1 − λ) ρ₂ was initially prepared and evolved to $(ρ₂). But
a nonlinear $ typically has consequences that are seemingly paradoxical.
Consider, for example, a single qubit evolving according to

    $(ρ) = exp[ iπσ₁ tr(σ₁ρ) ] ρ exp[ −iπσ₁ tr(σ₁ρ) ] .   (3.76)
One can easily check that $ is positive and trace-preserving. Suppose that
the initial density matrix is ρ = ½ 1, realized as the ensemble

    ρ = ½ |↑_z⟩⟨↑_z| + ½ |↓_z⟩⟨↓_z| .   (3.77)

Since tr(σ₁ρ) = 0, the evolution of ρ is trivial, and both representatives of
the ensemble are unchanged. If the spin was prepared as |↑_z⟩, it remains in
the state |↑_z⟩.

But now imagine that, immediately after preparing the ensemble, we do
nothing if the state has been prepared as |↑_z⟩, but we rotate it to |↑_x⟩ if it
has been prepared as |↓_z⟩. The density matrix is now

    ρ' = ½ |↑_z⟩⟨↑_z| + ½ |↑_x⟩⟨↑_x| ,   (3.78)

so that tr(σ₁ρ') = ½. Under evolution governed by $, this becomes $(ρ') =
σ₁ρ'σ₁. In this case, then, if the spin was prepared as |↑_z⟩, it evolves to the
orthogonal state |↓_z⟩.

The state initially prepared as |↑_z⟩ evolves differently under these two
scenarios. But what is the difference between the two cases? The difference
was that if the spin was initially prepared as |↓_z⟩, we took different actions:
doing nothing in case (1), but rotating the spin in case (2). Yet we have found
that the spin behaves differently in the two cases, even if it was initially
prepared as |↑_z⟩!

We are accustomed to saying that ρ describes two (or more) different
alternative pure state preparations, only one of which is actually realized
each time we prepare a qubit. But we have found that what happens if we
prepare |↑_z⟩ actually depends on what we would have done if we had prepared
|↓_z⟩ instead. It is no longer sensible, apparently, to regard the two possible
preparations as mutually exclusive alternatives. Evolution of the alternatives
actually depends on the other alternatives that supposedly were not realized.
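Taking eq. (3.76) to read $(ρ) = exp[iπσ₁tr(σ₁ρ)] ρ exp[−iπσ₁tr(σ₁ρ)] (the factor of π is a reconstruction consistent with this discussion), the two scenarios can be evaluated directly (a sketch, not part of the notes); note that exp(iθσ₁) = cos θ 1 + i sin θ σ₁:

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)

def S(rho):
    """Nonlinear map of eq. (3.76): conjugation by exp(i pi sigma1 tr(sigma1 rho))."""
    theta = np.pi * np.real(np.trace(sx @ rho))
    U = np.cos(theta) * np.eye(2) + 1j * np.sin(theta) * sx
    return U @ rho @ U.conj().T

proj = lambda v: np.outer(v, v.conj())
up_z = np.array([1, 0], dtype=complex)
dn_z = np.array([0, 1], dtype=complex)
up_x = (up_z + dn_z) / np.sqrt(2)

rho1 = 0.5 * proj(up_z) + 0.5 * proj(dn_z)   # eq. (3.77): tr(sx rho1) = 0
rho2 = 0.5 * proj(up_z) + 0.5 * proj(up_x)   # eq. (3.78): tr(sx rho2) = 1/2

print(np.allclose(S(rho1), rho1))            # ensemble 1 evolves trivially
print(np.allclose(S(rho2), sx @ rho2 @ sx))  # ensemble 2: sigma1 rho' sigma1
```

In the second scenario the conjugation flips |↑_z⟩ to |↓_z⟩, even though |↑_z⟩ on its own (tr σ₁ρ = 0) would not evolve at all.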
Joe Polchinski has called this phenomenon the "Everett phone," because the
different "branches of the wave function" seem to be able to "communicate"
with one another.

Nonlinear evolution of the density matrix, then, can have strange, perhaps
even absurd, consequences. Even so, the argument that nonlinear evolution
should be excluded is not completely compelling. Indeed Jim Hartle has
argued that there are versions of "generalized quantum mechanics" in which
nonlinear evolution is permitted, yet a consistent probability interpretation
can be salvaged. Nevertheless, we will follow tradition here and demand that
$ be linear.
(where q_μ > 0, Σ_μ q_μ = 1, and each |Φ̃_μ⟩, like |ψ̃⟩_AB, is normalized so that
⟨Φ̃_μ|Φ̃_μ⟩ = N). Invoking the relative-state method, we have

    $_A ( |φ⟩_A A⟨φ| ) = B⟨φ*| ( $_A ⊗ I_B ) ( |ψ̃⟩_AB AB⟨ψ̃| ) |φ*⟩_B
        = Σ_μ q_μ B⟨φ*|Φ̃_μ⟩_AB AB⟨Φ̃_μ|φ*⟩_B .   (3.100)

Now we are almost done; we define an operator M_μ on H_A by

    M_μ : |φ⟩_A → √q_μ B⟨φ*|Φ̃_μ⟩_AB .   (3.101)

We can check that:

1. M_μ is linear, because the map |φ⟩_A → |φ*⟩_B is antilinear.

2. $_A ( |φ⟩_A A⟨φ| ) = Σ_μ M_μ ( |φ⟩_A A⟨φ| ) M_μ† , for any pure state |φ⟩_A ∈ H_A.

3. $_A (ρ_A) = Σ_μ M_μ ρ_A M_μ† for any density matrix ρ_A, because ρ_A can be
   expressed as an ensemble of pure states, and $_A is linear.

4. Σ_μ M_μ† M_μ = 1_A , because $_A is trace preserving for any ρ_A.

Thus, we have constructed an operator-sum representation of $_A.

Put succinctly, the argument went as follows. Because $_A is completely
positive, $_A ⊗ I_B takes a maximally entangled density matrix on H_A ⊗ H_B to
another density matrix. This density matrix can be expressed as an ensemble
of pure states. With each of these pure states in H_A ⊗ H_B, we may associate
(via the relative-state method) a term in the operator sum.
Viewing the operator-sum representation this way, we may quickly estab-
lish two important corollaries:

How many Kraus operators? Each M_μ is associated with a state
|Φ̃_μ⟩ in the ensemble representation of ρ̃'_AB. Since ρ̃'_AB has a rank at most
N² (where N = dim H_A), $_A always has an operator-sum representation with
at most N² Kraus operators.

How ambiguous? We remarked earlier that the Kraus operators

    N_a = Σ_μ M_μ U_μa ,   (3.102)

(where U_μa is unitary) represent the same superoperator $ as the M_μ's. Now
we can see that any two Kraus representations of $ must always be related
in this way. (If there are more N_a's than M_μ's, then it is understood that
some zero operators are added to the M_μ's so that the two operator sets
have the same cardinality.) This property may be viewed as a consequence
of the GHJW theorem.
The relative-state construction described above established a one-to-one corre-
spondence between ensemble representations of the (unnormalized) density
matrix ($_A ⊗ I_B)( |ψ̃⟩_AB AB⟨ψ̃| ) and operator-sum representations of $_A. (We
explicitly described how to proceed from the ensemble representation to the
operator sum, but we can clearly go the other way, too: If

    $_A ( |i⟩_A A⟨j| ) = Σ_μ M_μ |i⟩_A A⟨j| M_μ† ,   (3.103)

then

    ($_A ⊗ I_B)( |ψ̃⟩_AB AB⟨ψ̃| )
        = Σ_μ Σ_{i,j} ( M_μ |i⟩_A ⊗ |i'⟩_B ) ( A⟨j| M_μ† ⊗ B⟨j'| )
        = Σ_μ q_μ |Φ̃_μ⟩_AB AB⟨Φ̃_μ| ,   (3.104)

where

    √q_μ |Φ̃_μ⟩_AB = Σ_i M_μ |i⟩_A ⊗ |i'⟩_B . )   (3.105)
Now consider two such ensembles (or, correspondingly, two operator-sum rep-
resentations of $_A), { √q_μ |Φ̃_μ⟩_AB } and { √p_a |Γ̃_a⟩_AB }. For each ensemble,
there is a corresponding "purification" in H_AB ⊗ H_C:

    Σ_μ √q_μ |Φ̃_μ⟩_AB ⊗ |μ⟩_C ,

    Σ_a √p_a |Γ̃_a⟩_AB ⊗ |a'⟩_C ,   (3.106)

where { |μ⟩_C } and { |a'⟩_C } are two different orthonormal sets in H_C. The
GHJW theorem asserts that these two purifications are related by 1_AB ⊗ U'_C,
a unitary transformation on H_C. Therefore,

    Σ_a √p_a |Γ̃_a⟩_AB ⊗ |a'⟩_C
        = Σ_μ √q_μ |Φ̃_μ⟩_AB ⊗ U'_C |μ⟩_C
        = Σ_{μ,a} √q_μ |Φ̃_μ⟩_AB U_μa ⊗ |a'⟩_C ,   (3.107)

where, to establish the second equality, we note that the orthonormal bases
{ |μ⟩_C } and { |a'⟩_C } are related by a unitary transformation, and that a
product of unitary transformations is unitary. We conclude that

    √p_a |Γ̃_a⟩_AB = Σ_μ √q_μ |Φ̃_μ⟩_AB U_μa ,   (3.108)

(where U_μa is unitary), from which follows

    N_a = Σ_μ M_μ U_μa .   (3.109)
Remark: Since we have already established that we can proceed from an
operator-sum representation of $ to a unitary representation, we have now
found that any "reasonable" evolution law for density operators on H_A can
be realized by a unitary transformation U_AB that acts on H_A ⊗ H_B according
to

    U_AB : |ψ⟩_A ⊗ |0⟩_B → Σ_μ M_μ |ψ⟩_A ⊗ |μ⟩_B .   (3.110)

Is this result surprising? Perhaps it is. We may interpret a superoperator as
describing the evolution of a system (A) that interacts with its environment
(B). The general states of system plus environment are entangled states.
But in eq. (3.110), we have assumed an initial state of A and B that is
unentangled. Apparently, though a real system is bound to be entangled
with its surroundings, for the purpose of describing the evolution of its density
matrix there is no loss of generality if we imagine that there is no pre-existing
entanglement when we begin to track the evolution!

Remark: The operator-sum representation provides a very convenient
way to express any completely positive $. But a positive $ does not admit
such a representation if it is not completely positive. As far as I know, there
is no convenient way, comparable to the Kraus representation, to express the
most general positive $.
If an error occurs, then |ψ⟩ evolves to an ensemble of the three states
σ₁|ψ⟩, σ₂|ψ⟩, σ₃|ψ⟩, all occurring with equal likelihood.

Unitary representation

The depolarizing channel can be represented by a unitary operator acting on
H_A ⊗ H_E, where H_E has dimension 4. (I am calling it H_E here to encour-
age you to think of the auxiliary system as the environment.) The unitary
operator U_AE acts as

    U_AE : |ψ⟩_A ⊗ |0⟩_E
        → √(1 − p) |ψ⟩ ⊗ |0⟩_E
          + √(p/3) [ σ₁|ψ⟩ ⊗ |1⟩_E + σ₂|ψ⟩ ⊗ |2⟩_E + σ₃|ψ⟩ ⊗ |3⟩_E ] .   (3.111)
Bloch-sphere representation
This will be worked out in a homework exercise.
Interpretation

We might interpret the phase-damping channel as describing a heavy "clas-
sical" particle (e.g., an interstellar dust grain) interacting with a background
gas of light particles (e.g., the 3 K microwave photons). We can imagine
that the dust is initially prepared in a superposition of position eigenstates
|ψ⟩ = (1/√2)(|x⟩ + |−x⟩) (or, more generally, a superposition of position-space
wavepackets with little overlap). We might be able to monitor the behavior
of the dust particle, but it is hopeless to keep track of the quantum state of
all the photons that scatter from the particle; for our purposes, the quantum
state of the particle is described by the density matrix ρ obtained by tracing
over the photon degrees of freedom.

Our analysis of the phase-damping channel indicates that if photons are
scattered by the dust particle at a rate Γ, then the off-diagonal terms in ρ
decay like exp(−Γt), and so become completely negligible for t ≫ 1/Γ.
At that point, the coherence of the superposition of position eigenstates is
completely lost -- there is no chance that we can recombine the wavepackets
and induce them to interfere. (If we attempt to do a double-slit interference
experiment with dust grains, we will not see any interference pattern if it takes
a time t ≫ 1/Γ for the grain to travel from the source to the screen.)
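This exponential loss of coherence can be illustrated with the standard single-qubit phase-damping Kraus operators, M₀ = √(1−p) 1, M₁ = √p |0⟩⟨0|, M₂ = √p |1⟩⟨1| (a sketch, not part of the notes; identifying one application of the channel with one scattering interval, so that n applications multiply the off-diagonal terms by (1−p)ⁿ ≈ e^{−Γt}, is the assumption of this example):

```python
import numpy as np

p = 0.02                       # scattering probability per time step
M = [np.sqrt(1 - p) * np.eye(2),
     np.sqrt(p) * np.diag([1.0, 0.0]),
     np.sqrt(p) * np.diag([0.0, 1.0])]

def step(rho):
    # Kraus operators here are real, so dagger = transpose
    return sum(m @ rho @ m.T for m in M)

rho = np.full((2, 2), 0.5)     # equal superposition, e.g. (|x> + |-x>)/sqrt(2)
coh = []
for n in range(200):
    coh.append(rho[0, 1])
    rho = step(rho)

coh = np.array(coh)
print(coh[:3], rho[0, 0])      # off-diagonal decays; diagonal untouched
```

The populations never change; only the coherence between the two positions dies away, which is the defining feature of phase damping.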
The dust grain is heavy. Because of its large inertia, its state of motion is
little affected by the scattered photons. Thus, there are two disparate time
scales relevant to its dynamics. On the one hand, there is a damping time
scale, the time for a significant amount of the particle's momentum to be
transferred to the photons; this is a long time if the particle is heavy. On the
other hand, there is the decoherence time scale. In this model, the time scale
for decoherence is of order Γ⁻¹, the time for a single photon to be scattered
by the dust grain, which is far shorter than the damping time scale. For a
macroscopic object, decoherence is fast.

As we have already noted, the phase-damping channel picks out a pre-
ferred basis for decoherence, which in our "interpretation" we have assumed
to be the position-eigenstate basis. Physically, decoherence prefers the spa-
tially localized states of the dust grain because the interactions of photons
and grains are localized in space. Grains in distinguishable positions tend to
scatter the photons of the environment into mutually orthogonal states.
Even if the separation between the "grains" were so small that it could
not be resolved very well by the scattered photons, the decoherence process
would still work in a similar way. Perhaps photons that scatter off grains at
positions x and −x are not mutually orthogonal, but instead have an overlap

    ⟨γ₊|γ₋⟩ = 1 − ε ,   ε ≪ 1 .   (3.128)

The phase-damping channel would still describe this situation, but with p
replaced by pε (if p is still the probability of a scattering event). Thus, the
decoherence rate would become Γ_dec = ε Γ_scat, where Γ_scat is the scattering
rate (see the homework).

The intuition we distill from this simple model applies to a vast variety
of physical situations. A coherent superposition of macroscopically distin-
guishable states of a "heavy" object decoheres very rapidly compared to its
damping rate. The spatial locality of the interactions of the system with its
environment gives rise to a preferred "local" basis for decoherence. Presum-
ably, the same principles would apply to the decoherence of a "cat state"
(1/√2)( |dead⟩ + |alive⟩ ), since "deadness" and "aliveness" can be distinguished
by localized probes.
⁹ The nth level of excitation of the oscillator may be interpreted as a state of n nonin-
teracting particles; the rate is nΓ because any one of the n particles can decay.

¹⁰ This model extends our discussion of the amplitude-damping channel to a damped
oscillator rather than a damped qubit.
    = Γ tr( ½ [a†, a] a ρ_I ) = −(Γ/2) tr( a ρ_I ) = −(Γ/2) ⟨ã⟩ .   (3.161)

Integrating this equation, we obtain

    ⟨ã(t)⟩ = e^{−Γt/2} ⟨ã(0)⟩ .   (3.162)

Similarly, the occupation number of the oscillator n ≡ a†a = ã†ã decays
according to

    (d/dt)⟨n⟩ = (d/dt)⟨ã†ã⟩ = tr( a†a ρ̇_I )
        = Γ tr( a†a a ρ_I a† − ½ a†a a†a ρ_I − ½ a†a ρ_I a†a )
        = Γ tr( a† [a†, a] a ρ_I ) = −Γ tr( a†a ρ_I ) = −Γ ⟨n⟩ ,   (3.163)

which integrates to

    ⟨n(t)⟩ = e^{−Γt} ⟨n(0)⟩ .   (3.164)

Thus Γ is the damping rate of the oscillator. We can interpret the nth
excitation state of the oscillator as a state of n noninteracting particles,
each with a decay probability Γ per unit time; hence eq. (3.164) is just the
exponential law satisfied by the population of decaying particles.
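The decay law (3.164) can be reproduced by integrating the master equation directly (a sketch, not part of the notes; the Fock-space truncation, the initial Fock state, and the simple Euler integrator are choices of this example):

```python
import numpy as np

# Damped-oscillator master equation (interaction picture),
#   rho_dot = Gamma (a rho a^dag - 1/2 a^dag a rho - 1/2 rho a^dag a),
# integrated with Euler steps on a truncated Fock space.
D = 10                                        # Fock-space truncation
a = np.diag(np.sqrt(np.arange(1, D)), k=1)    # annihilation operator
ad = a.T                                      # real matrix: dagger = transpose
num = ad @ a

Gamma, dt, T = 1.0, 1e-3, 1.0
rho = np.zeros((D, D))
rho[3, 3] = 1.0                               # start in the Fock state |3>

n_of_t = []
for _ in range(int(T / dt)):
    n_of_t.append(np.trace(num @ rho).real)
    drho = Gamma * (a @ rho @ ad - 0.5 * (num @ rho + rho @ num))
    rho = rho + dt * drho

n0, nT = n_of_t[0], np.trace(num @ rho).real
print(n0, nT, 3 * np.exp(-Gamma * T))         # <n> decays like e^{-Gamma t}
```

Since decay only moves population downward in the Fock ladder, the truncation is harmless here, and the simulated ⟨n(t)⟩ tracks e^{−Γt}⟨n(0)⟩ up to the small Euler discretization error.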
More interesting is what the master equation tells us about decoherence.
The details of that analysis will be a homework exercise. But we will analyze
here a simpler problem: an oscillator undergoing phase damping.
    |No decay⟩_atom ⊗ |Alive⟩_cat ⊗ |Know it's Alive⟩_me ,   Prob = ½ .   (3.178)

This describes two alternatives, but for either alternative, I am certain
about the health of the cat. I never see a cat that is half alive and half dead.
(I am in an eigenstate of the "certainty operator," in accord with experience.)
By assuming that the wave function describes reality and that all evo-
lution is unitary, we are led to the "many-worlds interpretation" of quan-
tum theory. In this picture, each time there is a "measurement," the wave
function of the universe "splits" into two branches, corresponding to the
two possible outcomes. After many measurements, there are many branches
(many worlds), all with an equal claim to describing reality. This prolifera-
tion of worlds seems like an ironic consequence of our program to develop the
most economical possible description. But we ourselves follow one particular
branch, and for the purpose of predicting what we will see in the next instant,
the many other branches are of no consequence. The proliferation of worlds
comes at no cost to us. The "many worlds" may seem weird, but should
we be surprised if a complete description of reality, something completely
foreign to our experience, seems weird to us?
By including ourselves in the reality described by the wave function, we
have understood why we perceive a definite outcome to a measurement, but
there is still a further question: how does the concept of probability enter
into this (deterministic) formalism? This question remains troubling, for to
answer it we must be prepared to state what is meant by "probability."

The word "probability" is used in two rather different senses. Sometimes
probability means frequency. We say the probability of a coin coming up
heads is 1/2 if we expect, as we toss the coin many times, the number of
heads divided by the total number of tosses to converge to 1/2. (This is a
tricky concept, though; even if the probability is 1/2, the coin still might come
up heads a trillion times in a row.) In rigorous mathematical discussions,
probability theory often seems to be a branch of measure theory -- it concerns
the properties of infinite sequences.
But in everyday life, and also in quantum theory, probabilities typically
are not frequencies. When we make a measurement, we do not repeat it
an infinite number of times on identically prepared systems. In the Everett
viewpoint, or in cosmology, there is just one universe, not many identically
prepared ones.
So what is a probability? In practice, it is a number that quantifies the
plausibility of a proposition given a state of knowledge. Perhaps
surprisingly, this view can be made the basis of a well-defined mathematical
theory, sometimes called the "Bayesian" view of probability. The term
"Bayesian" reflects the way probability theory is typically used (both in
science and in everyday life) – to test a hypothesis given some observed
data. Hypothesis testing is carried out using Bayes's rule for conditional
probability
P(A_0|B) = P(B|A_0) P(A_0) / P(B) .   (3.179)
For example – suppose that A_0 is the preparation of a particular quantum
state, and B is a particular outcome of a measurement of the state. We
have made the measurement (obtaining B) and now we want to infer how
the state was prepared (compute P(A_0|B)). Quantum mechanics allows us to
compute P(B|A_0). But it does not tell us P(A_0) (or P(B)). We have to make
a guess of P(A_0), which is possible if we adopt a "principle of indifference"
– if we have no knowledge that A_i is more or less likely than A_j we assume
P(A_i) = P(A_j). Once an ensemble of preparations is chosen, we can compute
P(B) = Σ_i P(B|A_i) P(A_i) ,   (3.180)
and so obtain P(A_0|B) by applying Bayes's rule.
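As a concrete illustration (a sketch with assumed numbers, not taken from the text), suppose the two equally likely preparations are A_0 = |↑_z⟩ and A_1 = |↑_x⟩, and B is the outcome "spin up" in a measurement of σ_3; the likelihoods P(B|A_i) then follow from quantum mechanics, and eqs. (3.179)–(3.180) give the posterior:

```python
import numpy as np

# Assumed setup: principle of indifference gives equal priors; the
# likelihoods are the quantum probabilities for outcome "up_z".
priors = {"A0": 0.5, "A1": 0.5}        # assumed P(A_i)
likelihood = {"A0": 1.0, "A1": 0.5}    # P(B|A_i): |up_z> always, |up_x> half

# P(B) = sum_i P(B|A_i) P(A_i)   -- eq. (3.180)
p_B = sum(likelihood[a] * priors[a] for a in priors)

# P(A_i|B) = P(B|A_i) P(A_i) / P(B)   -- Bayes's rule, eq. (3.179)
posterior = {a: likelihood[a] * priors[a] / p_B for a in priors}

print(posterior)   # preparation A0 becomes twice as plausible as A1
```

Observing "spin up" shifts the plausibility toward the preparation that predicts it with certainty, which is the whole content of hypothesis testing via Bayes's rule.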
But if our attitude will be that probability theory quantifies plausibility
given a state of knowledge, we are obligated to ask "whose state of
knowledge?" To recover an objective theory, we must interpret probability in
3.6. WHAT IS THE PROBLEM? (IS THERE A PROBLEM?) 129
quantum theory not as a prediction based on our actual state of knowledge,
but rather as a prediction based on the most complete possible knowledge
about the quantum state. If we prepare |↑_x⟩ and measure σ_3, then we say
that the result is |↑_z⟩ with probability 1/2, not because that is the best
prediction we can make based on what we know, but because it is the best
prediction anyone can make, no matter how much they know. It is in this
sense that the outcome is truly random; it cannot be predicted with certainty
even when our knowledge is complete (in contrast to the pseudo-randomness
that arises in classical physics because our knowledge is incomplete).
So how, now, are we to extract probabilities from Everett's deterministic
universe? Probabilities arise because we (a part of the system) cannot predict
our future with certainty. I know the formalism, I know the Hamiltonian and
wave function of the universe, I know my branch of the wave function. Now
I am about to look at the cat. A second from now, I will either be certain
that the cat is dead or I will be certain that it is alive. Yet even with all I
know, I cannot predict the future. Even with complete knowledge about the
present, I cannot say what my state of knowledge will be after I look at the
cat. The best I can do is assign probabilities to the outcomes. So, while the
wave function of the universe is deterministic I, as a part of the system, can
do no better than making probabilistic predictions.
Of course, as already noted, decoherence is a crucial part of this story.
We may consistently assign probabilities to the alternatives Dead and Alive
only if there is no (or at least negligible) possibility of interference among the
alternatives. Probabilities make sense only when we can identify an
exhaustive set of mutually exclusive alternatives. Since the issue is really
whether interference might arise at a later time, we cannot decide whether
probability theory applies by considering a quantum state at a fixed time;
we must examine a set of mutually exclusive (coarse-grained) histories, or
sequences of events. There is a sophisticated technology ("decoherence
functionals") for adjudicating whether the various histories decohere to a
sufficient extent for probabilities to be sensibly assigned.
So the Everett viewpoint can be reconciled with the quantum indeter-
minism that we observe, but there is still a troubling gap in the picture, at
least as far as I can tell. I am about to look at the cat, and I know that the
density matrix a second from now will be
We can compute

⟨ψ_x^(N) | σ̄_3 | ψ_x^(N)⟩ = 0 ,

⟨ψ_x^(N) | σ̄_3² | ψ_x^(N)⟩
  = (1/N²) Σ_{ij} ⟨ψ_x^(N) | σ_3^(i) σ_3^(j) | ψ_x^(N)⟩
  = (1/N²) Σ_{ij} δ_{ij} = N/N² = 1/N .
(3.186)
3.7 Summary
POVM. If we restrict our attention to a subspace of a larger Hilbert space,
then an orthogonal (von Neumann) measurement performed on the larger
space cannot in general be described as an orthogonal measurement on the
subspace. Rather, it is a generalized measurement or POVM – the outcome
a occurs with a probability

Prob(a) = tr(F_a ρ) ,   (3.191)

where ρ is the density matrix of the subsystem, each F_a is a positive
hermitian operator, and the F_a's satisfy

Σ_a F_a = 1 .   (3.192)
A POVM in H_A can be realized as a unitary transformation on the tensor
product H_A ⊗ H_B , followed by an orthogonal measurement in H_B .
Superoperator. Unitary evolution on H_A ⊗ H_B will not in general
appear to be unitary if we restrict our attention to H_A alone. Rather,
evolution in H_A will be described by a superoperator (which can be inverted
by another superoperator only if it is unitary). A general superoperator $
has an operator-sum (Kraus) representation:

$ : ρ → $(ρ) = Σ_μ M_μ ρ M_μ† ,   (3.193)

where

Σ_μ M_μ† M_μ = 1 .   (3.194)

In fact, any reasonable (linear and completely positive) mapping of density
matrices to density matrices has unitary and operator-sum representations.
Decoherence. Decoherence – the decay of quantum information due to
the interaction of a system with its environment – can be described by a
superoperator. If the environment frequently "scatters" off the system, and
the state of the environment is not monitored, then off-diagonal terms in the
density matrix of the system decay rapidly in a preferred basis (typically a
spatially localized basis selected by the nature of the coupling of the system
to the environment). The time scale for decoherence is set by the scattering
rate, which may be much larger than the damping rate for the system.
Master Equation. When the relevant dynamical time scale of an open
quantum system is long compared to the time for the environment to
"forget" quantum information, the evolution of the system is effectively
local in time (the Markovian approximation). Much as general unitary
evolution is generated by a Hamiltonian, a general Markovian superoperator
is generated by a Lindbladian L as described by the master equation:

ρ̇ = L[ρ] = −i[H, ρ] + Σ_μ ( L_μ ρ L_μ† − ½ L_μ† L_μ ρ − ½ ρ L_μ† L_μ ) .
(3.195)

Here each Lindblad operator L_μ (or quantum jump operator) represents a
"quantum jump" that could in principle be detected if we monitored the
environment faithfully. By solving the master equation, we can compute the
decoherence rate of an open system.
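As a numerical sketch (my own illustration, with an assumed rate and step size), the master equation (3.195) can be integrated by a simple Euler step; taking H = 0 and a single jump operator L = √Γ σ_3 gives pure phase damping, and the off-diagonal element of ρ should decay as e^(−2Γt) while the populations stay put:

```python
import numpy as np

sigma3 = np.diag([1.0, -1.0])
Gamma, dt, steps = 0.5, 1e-4, 20000          # assumed rate and step size

def lindblad_rhs(rho, L):
    # rho_dot = L rho L^dag - (1/2) L^dag L rho - (1/2) rho L^dag L
    LdL = L.conj().T @ L
    return L @ rho @ L.conj().T - 0.5 * (LdL @ rho + rho @ LdL)

L = np.sqrt(Gamma) * sigma3
rho = 0.5 * np.ones((2, 2), dtype=complex)   # |+><+|: maximal coherence
for _ in range(steps):
    rho = rho + dt * lindblad_rhs(rho, L)

t = steps * dt
# Off-diagonal term should match 0.5 * exp(-2 Gamma t); diagonals stay 0.5.
print(abs(rho[0, 1]), 0.5 * np.exp(-2 * Gamma * t))
```

This is the simplest possible integrator; the point is only that the Lindblad form (3.195) directly yields the exponential decay of coherence discussed above.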
3.8 Exercises
3.1 Realization of a POVM
Consider the POVM defined by the four positive operators

P_1 = ½ |↑_z⟩⟨↑_z| ,   P_2 = ½ |↓_z⟩⟨↓_z| ,
P_3 = ½ |↑_x⟩⟨↑_x| ,   P_4 = ½ |↓_x⟩⟨↓_x| .
(3.196)

Show how this POVM can be realized as an orthogonal measurement
in a two-qubit Hilbert space, if one ancilla spin is introduced.
3.2 Invertibility of superoperators
The purpose of this exercise is to show that a superoperator is invertible
only if it is unitary. Recall that any superoperator has an operator-sum
representation; it acts on a pure state as

M(|ψ⟩⟨ψ|) = Σ_μ M_μ |ψ⟩⟨ψ| M_μ† ,   (3.197)

where Σ_μ M_μ† M_μ = 1. Another superoperator N is said to be the
inverse of M if N ∘ M = I , or

Σ_{μ,a} N_a M_μ |ψ⟩⟨ψ| M_μ† N_a† = |ψ⟩⟨ψ| ,   (3.198)

for any |ψ⟩. It follows that

Σ_{μ,a} |⟨ψ| N_a M_μ |ψ⟩|² = 1 .   (3.199)
M_2 = √p · ½ (1 − σ_3) .   (3.203)
a) Find an alternative representation using only two Kraus operators
N_0 , N_1 .
b) Find a unitary 3 × 3 matrix U_{μa} such that your Kraus operators
found in (a) (augmented by N_2 = 0) are related to M_{0,1,2} by

M_μ = Σ_a U_{μa} N_a .   (3.204)
c) Consider a single-qubit channel with a unitary representation

|0⟩_A |0⟩_E → √(1−p) |0⟩_A |0⟩_E + √p |0⟩_A |γ_0⟩_E ,
|1⟩_A |0⟩_E → √(1−p) |1⟩_A |0⟩_E + √p |1⟩_A |γ_1⟩_E ,
(3.205)

where |γ_0⟩_E and |γ_1⟩_E are normalized states, both orthogonal to
|0⟩_E , that satisfy

_E⟨γ_0 | γ_1⟩_E = 1 − ε ,   0 < ε < 1 .   (3.206)

Show that this is again the phase-damping channel, and find its
operator-sum representation with two Kraus operators.
d) Suppose that the channel in (c) describes what happens to the qubit
when a single photon scatters from it. Find the decoherence rate
Γ_decoh in terms of the scattering rate Γ_scatt .
3.6 Decoherence on the Bloch sphere
Parametrize the density matrix of a single qubit as

ρ = ½ ( 1 + P⃗ · σ⃗ ) .   (3.207)

a) Describe what happens to P⃗ under the action of the phase-damping
channel.
b) Describe what happens to P⃗ under the action of the amplitude-
damping channel defined by the Kraus operators

M_0 = ( 1  0 ; 0  √(1−p) ) ,   M_1 = ( 0  √p ; 0  0 ) .
(3.208)
c) The same for the "two-Pauli channel":

M_0 = √(1−p) 1 ,   M_1 = √(p/2) σ_1 ,   M_2 = √(p/2) σ_3 .
(3.209)
3.7 Decoherence of the damped oscillator
We saw in class that, for an oscillator that can emit quanta into a zero-
temperature reservoir, the interaction picture density matrix ρ_I(t) of
the oscillator obeys the master equation

ρ̇_I = Γ ( a ρ_I a† − ½ a† a ρ_I − ½ ρ_I a† a ) ,   (3.210)

where a is the annihilation operator of the oscillator.
a) Consider the quantity

X(λ, t) = tr [ ρ_I(t) e^{λ a†} e^{−λ* a} ]   (3.211)

(where λ is a complex number). Use the master equation to derive
and solve a differential equation for X(λ, t). You should find

X(λ, t) = X(λ′, 0) ,   (3.212)

where λ′ is a function of λ, Γ, and t. What is this function
λ′(λ, Γ, t)?
b) Now suppose that a "cat state" of the oscillator is prepared at t = 0:

|cat⟩ = (1/√2) ( |α_1⟩ + |α_2⟩ ) ,   (3.213)

where |α⟩ denotes the coherent state

|α⟩ = e^{−|α|²/2} e^{α a†} |0⟩ .   (3.214)

Use the result of (a) to infer the density matrix at a later time
t. Assuming Γt ≪ 1, at what rate do the off-diagonal terms in ρ
decay (in this coherent state basis)?
Chapter 4
Quantum Entanglement
4.1 Nonseparability of EPR pairs
4.1.1 Hidden quantum information
The deep ways that quantum information differs from classical information
involve the properties, implications, and uses of quantum entanglement.
Recall from §2.4.1 that a bipartite pure state is entangled if its Schmidt
number is greater than one. Entangled states are interesting because they
exhibit correlations that have no classical analog. We will begin the study
of these correlations in this chapter.
Recall, for example, the maximally entangled state of two qubits defined
in §3.4.1:

|φ+⟩_AB = (1/√2) ( |00⟩_AB + |11⟩_AB ) .   (4.1)
"Maximally entangled" means that when we trace over qubit B to find the
density operator ρ_A of qubit A, we obtain a multiple of the identity operator

ρ_A = tr_B ( |φ+⟩_AB _AB⟨φ+| ) = ½ 1_A   (4.2)

(and similarly ρ_B = ½ 1_B). This means that if we measure spin A along
any axis, the result is completely random – we find spin up with probability
1/2 and spin down with probability 1/2. Therefore, if we perform any local
measurement of A or B, we acquire no information about the preparation of
the state; instead we merely generate a random bit. This situation contrasts
sharply with the case of a single qubit in a pure state; there we can store
a bit by preparing, say, either |↑_n̂⟩ or |↓_n̂⟩, and we can recover that bit
reliably by measuring along the n̂-axis. With two qubits, we ought to be
able to store two bits, but in the state |φ+⟩_AB this information is hidden;
at least, we can't acquire it by measuring A or B.
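The claim in eq. (4.2) is easy to check numerically; the following sketch traces out qubit B from |φ+⟩ and confirms that the reduced density matrix of A is the maximally mixed state:

```python
import numpy as np

# |phi+> = (|00> + |11>)/sqrt(2), in the basis |00>, |01>, |10>, |11>
phi_plus = np.array([1, 0, 0, 1]) / np.sqrt(2)
rho_AB = np.outer(phi_plus, phi_plus.conj())

# Partial trace over B: reshape indices to (a, b, a', b') and sum over b = b'.
rho_A = np.trace(rho_AB.reshape(2, 2, 2, 2), axis1=1, axis2=3)
print(rho_A)   # (1/2) * identity: any local measurement is a coin flip
```

Since ρ_A is proportional to the identity, every local observable on A has the same statistics regardless of which Bell state was prepared, which is exactly why the two encoded bits are locally invisible.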
In fact, |φ+⟩ is one member of a basis of four mutually orthogonal states
for the two qubits, all of which are maximally entangled – the basis

|φ±⟩ = (1/√2) ( |00⟩ ± |11⟩ ) ,
|ψ±⟩ = (1/√2) ( |01⟩ ± |10⟩ ) ,   (4.3)
introduced in §3.4.1. We can choose to prepare one of these four states, thus
encoding two bits in the state of the two-qubit system. One bit is the parity
bit (|φ⟩ or |ψ⟩) – are the two spins aligned or antialigned? The other is
the phase bit (+ or −) – what superposition was chosen of the two states
of like parity. Of course, we can recover the information by performing
an orthogonal measurement that projects onto the {|φ+⟩, |φ−⟩, |ψ+⟩, |ψ−⟩}
basis. But if the two qubits are distantly separated, we cannot acquire this
information locally; that is, by measuring A or measuring B.
What we can do locally is manipulate this information. Suppose that
Alice has access to qubit A, but not qubit B. She may apply σ_3 to her
qubit, flipping the relative phase of |0⟩_A and |1⟩_A . This action flips the
phase bit stored in the entangled state:

|φ+⟩ ↔ |φ−⟩ ,
|ψ+⟩ ↔ |ψ−⟩ .   (4.4)

On the other hand, she can apply σ_1, which flips her spin (|0⟩_A ↔ |1⟩_A),
and also flips the parity bit of the entangled state:

|φ+⟩ ↔ |ψ+⟩ ,
|φ−⟩ ↔ |ψ−⟩ .   (4.5)
Bob can manipulate the entangled state similarly. In fact, as we discussed
in §2.4, either Alice or Bob can perform a local unitary transformation that
changes one maximally entangled state to any other maximally entangled
state.¹ What their local unitary transformations cannot do is alter ρ_A =
ρ_B = ½ 1 – the information they are manipulating is information that
neither one can read.
But now suppose that Alice and Bob are able to exchange (classical)
messages about their measurement outcomes; together, then, they can learn
about how their measurements are correlated. The entangled basis states are
conveniently characterized as the simultaneous eigenstates of two commuting
observables:

σ_1^(A) σ_1^(B) ,
σ_3^(A) σ_3^(B) ;   (4.6)

the eigenvalue of σ_3^(A) σ_3^(B) is the parity bit, and the eigenvalue of
σ_1^(A) σ_1^(B) is the phase bit. Since these operators commute, they can
in principle be measured simultaneously. But they cannot be measured
simultaneously if Alice and Bob perform localized measurements. Alice and
Bob could both choose to measure their spins along the z-axis, preparing
a simultaneous eigenstate of σ_3^(A) and σ_3^(B). Since σ_3^(A) and σ_3^(B)
both commute with the parity operator σ_3^(A) σ_3^(B), their orthogonal
measurements do not disturb the parity bit, and they can combine their
results to infer the parity bit. But σ_3^(A) and σ_3^(B) do not commute with
the phase operator σ_1^(A) σ_1^(B), so their measurement disturbs the phase
bit. On the other hand, they could both choose to measure their spins along
the x-axis; then they would learn the phase bit at the cost of disturbing the
parity bit. But they can't have it both ways. To have hope of acquiring the
parity bit without disturbing the phase bit, they would need to learn about
the product σ_3^(A) σ_3^(B) without finding out anything about σ_3^(A) and
σ_3^(B) separately. That cannot be done locally.
Now let us bring Alice and Bob together, so that they can operate on
their qubits jointly. How might they acquire both the parity bit and the
phase bit of their pair? By applying an appropriate unitary transformation,
they can rotate the entangled basis {|φ±⟩, |ψ±⟩} to the unentangled basis
{|00⟩, |01⟩, |10⟩, |11⟩}. Then they can measure qubits A and B separately to
acquire the bits they seek. How is this transformation constructed?
¹But of course, this does not suffice to perform an arbitrary unitary
transformation on the four-dimensional space H_A ⊗ H_B , which contains
states that are not maximally entangled. The maximally entangled states
are not a subspace – a superposition of maximally entangled states typically
is not maximally entangled.
This is a good time to introduce notation that will be used heavily later in
the course, the quantum circuit notation. Qubits are denoted by horizontal
lines, and the single-qubit unitary transformation U is denoted:

[diagram: a qubit line passing through a box labeled U]
A particular single-qubit unitary we will find useful is the Hadamard
transform

H = (1/√2) ( 1  1 ; 1  −1 ) = (1/√2) ( σ_1 + σ_3 ) ,   (4.7)

which has the properties

H² = 1 ,   (4.8)

and

H σ_1 H = σ_3 ,
H σ_3 H = σ_1 .   (4.9)
(We can envision H (up to an overall phase) as a θ = π rotation about the
axis n̂ = (1/√2)(n̂_1 + n̂_3) that rotates x̂ to ẑ and vice versa.) The circuit

[circuit diagram: H on the first qubit, followed by CNOT]
(to be read from left to right) represents the product of H applied to the
first qubit followed by CNOT with the first bit as the source and the second
bit as the target. It is straightforward to see that this circuit transforms
the standard basis to the entangled basis,

|00⟩ → (1/√2)(|0⟩ + |1⟩)|0⟩ → |φ+⟩ ,
|01⟩ → (1/√2)(|0⟩ + |1⟩)|1⟩ → |ψ+⟩ ,
|10⟩ → (1/√2)(|0⟩ − |1⟩)|0⟩ → |φ−⟩ ,
|11⟩ → (1/√2)(|0⟩ − |1⟩)|1⟩ → |ψ−⟩ ,   (4.13)

so that the first bit becomes the phase bit in the entangled basis, and the
second bit becomes the parity bit.
Similarly, we can invert the transformation by running the circuit back-
wards (since both CNOT and H square to the identity); if we apply the
inverted circuit to an entangled state, and then measure both bits, we can
learn the value of both the phase bit and the parity bit.
Of course, H acts on only one of the qubits; the "nonlocal" part of our
circuit is the controlled-NOT gate – this is the operation that establishes or
removes entanglement. If we could only perform an "interstellar CNOT,"
we would be able to create entanglement among distantly separated pairs, or
extract the information encoded in entanglement. But we can't. To do its
job, the CNOT gate must act on its target without revealing the value of
its source. Local operations and classical communication will not suffice.
4.1.4 Photons
Experiments that test the Bell inequality are done with entangled photons,
not with spin-1/2 objects. What are the quantum-mechanical predictions for
photons?
Suppose, for example, that an excited atom emits two photons that come
out back to back, with vanishing angular momentum and even parity. If |x⟩
and |y⟩ are horizontal and vertical linear polarization states of the photon,
then we have seen that

|+⟩ = (1/√2) ( |x⟩ + i|y⟩ ) ,
|−⟩ = (1/√2) ( i|x⟩ + |y⟩ )   (4.23)

are the eigenstates of helicity (angular momentum along the axis of
propagation, ẑ). For two photons, one propagating in the +ẑ direction, and
the other in the −ẑ direction, the states

|+⟩_A |−⟩_B ,
|−⟩_A |+⟩_B   (4.24)

are invariant under rotations about ẑ. (The photons have opposite values
of J_z , but the same helicity, since they are propagating in opposite
directions.)
Under a reflection in the y–z plane, the polarization states are modified
according to

|x⟩ → −|x⟩ ,   |+⟩ → +i|−⟩ ,
|y⟩ → |y⟩ ,   |−⟩ → −i|+⟩ ;   (4.25)

therefore, the parity eigenstates are the entangled states

(1/√2) ( |+⟩_A |−⟩_B ± |−⟩_A |+⟩_B ) .   (4.26)

The state with J_z = 0 and even parity, then, expressed in terms of the
linear polarization states, is

(−i/√2) ( |+⟩_A |−⟩_B + |−⟩_A |+⟩_B )
  = (1/√2) ( |xx⟩_AB + |yy⟩_AB ) ≡ |φ+⟩_AB .   (4.27)
Because of invariance under rotations about ẑ, the state has this form
irrespective of how we orient the x and y axes.
We can use a polarization analyzer to measure the linear polarization of
either photon along any axis in the x–y plane. Let |x(θ)⟩ and |y(θ)⟩ denote
the linear polarization eigenstates along axes rotated by angle θ relative to
the canonical x and y axes. We may define an operator (the analog of σ⃗ · n̂)

τ(θ) = |x(θ)⟩⟨x(θ)| − |y(θ)⟩⟨y(θ)| ,   (4.28)

which has these polarization states as eigenstates with respective
eigenvalues ±1. Since
|x(θ)⟩ = ( cos θ ; sin θ ) ,   |y(θ)⟩ = ( −sin θ ; cos θ )   (4.29)

in the |x⟩, |y⟩ basis, we can easily compute the expectation value

_AB⟨φ+ | τ^(A)(θ_1) τ^(B)(θ_2) | φ+⟩_AB .   (4.30)

Using rotational invariance:

= _AB⟨φ+ | τ^(A)(0) τ^(B)(θ_2 − θ_1) | φ+⟩_AB
= ½ _B⟨x| τ^(B)(θ_2 − θ_1) |x⟩_B − ½ _B⟨y| τ^(B)(θ_2 − θ_1) |y⟩_B
= cos²(θ_2 − θ_1) − sin²(θ_2 − θ_1) = cos[2(θ_2 − θ_1)] .   (4.31)
(For spin-1/2 objects, we would obtain

_AB⟨φ+ | ( σ⃗^(A) · n̂_1 )( σ⃗^(B) · n̂_2 ) | φ+⟩_AB = n̂_1 · n̂_2 = cos(θ_2 − θ_1) ;   (4.32)

the argument of the cosine is different than in the case of photons, because
the half angle θ/2 appears in the formula analogous to eq. (4.29).)
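The photon correlation cos[2(θ_2 − θ_1)] of eq. (4.31) can be checked numerically; the following sketch builds the polarization observable τ(θ) of eq. (4.28) and evaluates its expectation in |φ+⟩ = (|xx⟩ + |yy⟩)/√2:

```python
import numpy as np

def tau(theta):
    # tau(theta) = |x(theta)><x(theta)| - |y(theta)><y(theta)|, eq. (4.28)
    x = np.array([np.cos(theta), np.sin(theta)])
    y = np.array([-np.sin(theta), np.cos(theta)])
    return np.outer(x, x) - np.outer(y, y)

# |phi+> = (|xx> + |yy>)/sqrt(2) in the basis |xx>, |xy>, |yx>, |yy>
phi_plus = np.array([1, 0, 0, 1]) / np.sqrt(2)

def correlation(t1, t2):
    obs = np.kron(tau(t1), tau(t2))
    return phi_plus @ obs @ phi_plus

t1, t2 = 0.3, 1.1
print(correlation(t1, t2), np.cos(2 * (t2 - t1)))   # the two values agree
```

The doubled angle (compared with cos(θ_2 − θ_1) for spin-1/2) is visible directly in the numerics: rotating one analyzer by π/2 already flips the correlation from +1 to −1.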
b = σ⃗^(B) · n̂_2 = ( cos θ_2  sin θ_2 ; sin θ_2  −cos θ_2 ) ,   (4.61)

so that quantum mechanics predicts

⟨ab⟩ = ⟨φ| ab |φ⟩ = cos θ_1 cos θ_2 + 2αβ sin θ_1 sin θ_2   (4.62)

(and we recover cos(θ_1 − θ_2) in the maximally entangled case α = β = 1/√2).
Now let us consider, for simplicity, the (nonoptimal!) special case

θ_A = 0 ,   θ_A′ = π/2 ,   θ_B′ = −θ_B ,   (4.63)

so that the quantum predictions are:

⟨ab⟩ = cos θ_B = ⟨ab′⟩ ,
⟨a′b⟩ = 2αβ sin θ_B = −⟨a′b′⟩ .   (4.64)
Plugging into the CHSH inequality, we obtain

| cos θ_B − 2αβ sin θ_B | ≤ 1 ,   (4.65)

and we easily see that violations occur for θ_B close to 0 or π. Expanding
to linear order in θ_B , the left hand side is

≃ 1 − 2αβ θ_B ,   (4.66)

which surely exceeds 1 for θ_B negative and small.
We have shown, then, that any entangled pure state of two qubits violates
some Bell inequality. It is not hard to generalize the argument to an arbitrary
bipartite pure state. For bipartite pure states, then, \entangled" is equivalent
to \Bell-inequality violating." For bipartite mixed states, however, we will
see shortly that the situation is more subtle.
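The violation in eqs. (4.63)–(4.66) is easy to exhibit numerically; this sketch (with assumed Schmidt coefficients α, β and a small negative θ_B) evaluates the correlations of eq. (4.62) at the special angle choices and forms the CHSH combination, which classically would be bounded by 1:

```python
import numpy as np

alpha, beta = 0.8, 0.6     # assumed nonmaximal entanglement, alpha^2+beta^2=1

def E(t1, t2):
    # <ab> for |phi> = alpha|00> + beta|11>, eq. (4.62) (real alpha, beta)
    return np.cos(t1) * np.cos(t2) + 2 * alpha * beta * np.sin(t1) * np.sin(t2)

tB = -0.1                  # theta_B negative and small, per eq. (4.66)
# Angles of eq. (4.63): a = 0, a' = pi/2, b = tB, b' = -tB.
# CHSH combination (three + signs, one -); reduces to |cos tB - 2ab sin tB|.
chsh = abs(E(0, tB) + E(0, -tB) - E(np.pi / 2, tB) + E(np.pi / 2, -tB)) / 2
print(chsh)                # exceeds 1: a Bell-inequality violation
```

Even this far-from-optimal setting beats the classical bound, illustrating the claim that every entangled pure state of two qubits violates some Bell inequality.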
4.2 Uses of Entanglement
After Bell's work, quantum entanglement became a subject of intensive study
among those interested in the foundations of quantum theory. But more
recently (starting less than ten years ago), entanglement has come to be
viewed not just as a tool for exposing the weirdness of quantum mechanics,
but as a potentially valuable resource. By exploiting entangled quantum
states, we can perform tasks that are otherwise difficult or impossible.
= H(X, Y) − H(Y) ,   (5.16)

and similarly

H(Y|X) ≡ ⟨ −log p(y|x) ⟩
       = ⟨ −log ( p(x, y) / p(x) ) ⟩ = H(X, Y) − H(X) .   (5.17)

We may interpret H(X|Y), then, as the number of additional bits per letter
needed to specify both x and y once y is known. Obviously, then, this
quantity cannot be negative.
The information about X that I gain when I learn Y is quantified by how
much the number of bits per letter needed to specify X is reduced when Y
is known. Thus

I(X; Y) ≡ H(X) − H(X|Y)
        = H(X) + H(Y) − H(X, Y)
        = H(Y) − H(Y|X) .   (5.18)

I(X; Y) is called the mutual information. It is obviously symmetric under
interchange of X and Y; I find out as much about X by learning Y as about Y
5.1. SHANNON FOR DUMMIES 173
by learning X. Learning Y can never reduce my knowledge of X, so I(X; Y)
is obviously nonnegative. (The inequalities H(X) ≥ H(X|Y) ≥ 0 are easily
proved using the convexity of the log function; see for example Elements of
Information Theory by T. Cover and J. Thomas.)
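A quick numerical sketch (with an assumed joint distribution p(x, y)) makes eq. (5.18) concrete: compute the marginal entropies and the joint entropy, and combine them into the mutual information:

```python
import numpy as np

# Assumed joint distribution over two binary variables; rows index x, cols y.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

def H(p):
    # Shannon entropy in bits, skipping zero-probability outcomes.
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

H_X = H(p_xy.sum(axis=1))          # marginal entropy of X
H_Y = H(p_xy.sum(axis=0))          # marginal entropy of Y
H_XY = H(p_xy.ravel())             # joint entropy
I = H_X + H_Y - H_XY               # mutual information, eq. (5.18)
print(H_X, H_Y, H_XY, I)
```

Here the correlation between X and Y makes I(X; Y) strictly positive; replacing p_xy with a product distribution p(x)p(y) would drive it to zero.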
Of course, if X and Y are completely uncorrelated, we have p(x, y) =
p(x)p(y), and
C is called the channel capacity and depends only on the conditional
probabilities p(y|x) that define the channel.
We have now shown that any rate R < C is attainable, but is it possible
for R to exceed C (with the error probability still approaching 0 for large
n)? To show that C is an upper bound on the rate may seem more subtle
in the general case than for the binary symmetric channel – the probability
of error is different for different letters, and we are free to exploit this in
the design of our code. However, we may reason as follows:
Suppose we have chosen 2^{nR} strings of n letters as our codewords.
Consider a probability distribution (denoted X̃^n) in which each codeword
occurs with equal probability (= 2^{−nR}). Evidently, then,
H(X̃^n) = nR .   (5.33)
Sending the codewords through the channel we obtain a probability
distribution Ỹ^n of output states.
Because we assume that the channel acts on each letter independently,
the conditional probability for a string of n letters factorizes:

p(y_1 y_2 ⋯ y_n | x_1 x_2 ⋯ x_n) = p(y_1|x_1) p(y_2|x_2) ⋯ p(y_n|x_n) ,   (5.34)

and it follows that the conditional entropy satisfies

H(Ỹ^n | X̃^n) = ⟨ −log p(y^n | x^n) ⟩ = Σ_i ⟨ −log p(y_i|x_i) ⟩
             = Σ_i H(Ỹ_i | X̃_i) ,   (5.35)
178 CHAPTER 5. QUANTUM INFORMATION THEORY
where X̃_i and Ỹ_i are the marginal probability distributions for the ith
letter determined by our distribution on the codewords. Recall that we also
know that H(X, Y) ≤ H(X) + H(Y), or

H(Ỹ^n) ≤ Σ_i H(Ỹ_i) .   (5.36)
It follows that

I(Ỹ^n; X̃^n) = H(Ỹ^n) − H(Ỹ^n | X̃^n)
            ≤ Σ_i ( H(Ỹ_i) − H(Ỹ_i | X̃_i) )
            = Σ_i I(Ỹ_i; X̃_i) ≤ nC ;   (5.37)

the mutual information of the messages sent and received is bounded above
by the sum of the mutual information per letter, and the mutual information
for each letter is bounded above by the capacity (because C is defined as the
maximum of I(X; Y)).
Recalling the symmetry of mutual information, we have

I(X̃^n; Ỹ^n) = H(X̃^n) − H(X̃^n | Ỹ^n)
            = nR − H(X̃^n | Ỹ^n) ≤ nC .   (5.38)

Now, if we can decode reliably as n → ∞, this means that the input
codeword is completely determined by the signal received, or that the
conditional entropy of the input (per letter) must get small:

(1/n) H(X̃^n | Ỹ^n) → 0 .   (5.39)
If errorless transmission is possible, then, eq. (5.38) becomes

R ≤ C ,   (5.40)

in the limit n → ∞. The rate cannot exceed the capacity. (Remember that
the conditional entropy, unlike the mutual information, is not symmetric.
Indeed (1/n) H(Ỹ^n | X̃^n) does not become small, because the channel
introduces uncertainty about what message will be received. But if we can
decode accurately, there is no uncertainty about what codeword was sent,
once the signal has been received.)
5.2. VON NEUMANN ENTROPY 179
We have now shown that the capacity C is the highest rate of communication
through the noisy channel that can be attained, where the probability
of error goes to zero as the number of letters in the message goes to
infinity. This is Shannon's noisy channel coding theorem.
Of course the method we have used to show that R = C is asymptotically
attainable (averaging over random codes) is not very constructive. Since a
random code has no structure or pattern, encoding and decoding would be
quite unwieldy (we require an exponentially large code book). Nevertheless,
the theorem is important and useful, because it tells us what is in principle
attainable, and furthermore, what is not attainable, even in principle. Also,
since I(X; Y) is a concave function of X = {x, p(x)} (with {p(y|x)} fixed),
it has a unique local maximum, and C can often be computed (at least
numerically) for channels of interest.
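As a sketch of such a numerical computation, consider the binary symmetric channel with flip probability p, whose capacity is known in closed form to be C = 1 − H_2(p); maximizing I(X; Y) over the input distribution on a grid reproduces it:

```python
import numpy as np

def H2(q):
    # Binary entropy in bits, with the convention H2(0) = H2(1) = 0.
    return 0.0 if q in (0.0, 1.0) else -q * np.log2(q) - (1 - q) * np.log2(1 - q)

def mutual_info(px0, p):
    # I(X;Y) = H(Y) - H(Y|X) for the binary symmetric channel
    py0 = px0 * (1 - p) + (1 - px0) * p
    return H2(py0) - H2(p)

p = 0.1
C_numeric = max(mutual_info(px0, p) for px0 in np.linspace(0, 1, 1001))
print(C_numeric, 1 - H2(p))   # maximum attained at the uniform input
```

Concavity of I(X; Y) in the input distribution is what guarantees this simple grid search (or any ascent method) finds the global maximum.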
where E′ denotes the orthogonal projection onto the subspace Λ′. The
average fidelity therefore obeys

F̄ = Σ_i p_i F_i ≤ Σ_i p_i ⟨φ_i| E′ |φ_i⟩ = tr(ρ^n E′) .   (5.102)

But since E′ projects onto a space of dimension 2^{n(S−δ)}, tr(ρ^n E′) can
be no larger than the sum of the 2^{n(S−δ)} largest eigenvalues of ρ^n. It
follows from the properties of typical subspaces that this sum becomes as
small as we please; for n large enough,

F̄ ≤ tr(ρ^n E′) < ε .   (5.103)
Thus we have shown that, if we attempt to compress to S − δ qubits per
letter, then the fidelity inevitably becomes poor for n sufficiently large.
We conclude, then, that S(ρ) qubits per letter is the optimal compression
of the quantum information that can be attained if we are to obtain good
fidelity as n goes to infinity. This is Schumacher's noiseless quantum
coding theorem.
The above argument applies to any conceivable encoding scheme, but only
to a restricted class of decoding schemes (unitary decodings). A more
general decoding scheme can certainly be contemplated, described by a
superoperator. More technology is then required to prove that better
compression than S − δ qubits per letter is not possible. But the conclusion
is the same. The point is that n(S − δ) qubits are not sufficient to
distinguish all of the typical states.
To summarize, there is a close analogy between Shannon's noiseless coding
theorem and Schumacher's noiseless quantum coding theorem. In the
classical case, nearly all long messages are typical sequences, so we can
code only these and still have a small probability of error. In the quantum
case, nearly all long messages have nearly unit overlap with the typical
subspace, so we can code only the typical subspace and still achieve good
fidelity.
In fact, Alice could send effectively classical information to Bob – the
string x_1 x_2 ⋯ x_n encoded in mutually orthogonal quantum states – and
Bob could then follow these classical instructions to reconstruct Alice's
state. By this means, they could achieve high-fidelity compression to H(X)
bits – or qubits – per letter. But if the letters are drawn from an ensemble
of nonorthogonal pure states, this amount of compression is not optimal;
some of the classical information about the preparation of the state has
become redundant, because the nonorthogonal states cannot be perfectly
distinguished. Thus Schumacher coding can go further, achieving optimal
compression to S(ρ) qubits per letter. The information has been packaged
more efficiently, but at a price – Bob has received what Alice intended, but
Bob can't know what he has. In contrast to the classical case, Bob can't
make any measurement that is certain to decipher Alice's message correctly.
An attempt to read the message will unavoidably disturb it.
0 and it carries no information; Bob can reconstruct the message
perfectly without receiving anything from Alice. Therefore, the message can
be compressed to zero qubits per letter, which is less than S(ρ) > 0.
To construct a slightly less trivial example, recall that for an ensemble of
³See M. Horodecki, quant-ph/9712035.
5.3. QUANTUM DATA COMPRESSION 195
mutually orthogonal pure states, the Shannon entropy of the ensemble equals
the von Neumann entropy

H(X) = S(ρ) ,   (5.104)

so that the classical and quantum compressibility coincide. This makes
sense, since the orthogonal states are perfectly distinguishable. In fact, if
Alice wants to send the message

|φ_{x_1}⟩ |φ_{x_2}⟩ ⋯ |φ_{x_n}⟩   (5.105)

to Bob, she can send the classical message x_1 … x_n to Bob, who can
reconstruct the state with perfect fidelity.
But now suppose that the letters are drawn from an ensemble of mutually
orthogonal mixed states {ρ_x, p_x},

tr ρ_x ρ_y = 0   for x ≠ y ;   (5.106)

that is, ρ_x and ρ_y have support on mutually orthogonal subspaces of the
Hilbert space. These mixed states are also perfectly distinguishable, so
again the messages are essentially classical, and therefore can be compressed
to H(X) qubits per letter. For example, we can extend the Hilbert space H_A
of our letters to the larger space H_A ⊗ H_B , and choose a purification of
each ρ_x , a pure state |φ_x⟩_AB ∈ H_A ⊗ H_B such that

tr_B ( |φ_x⟩_AB _AB⟨φ_x| ) = (ρ_x)_A .   (5.107)
These pure states are mutually orthogonal, and the ensemble {|φ_x⟩_AB, p_x}
has von Neumann entropy H(X); hence we may Schumacher compress a
message

|φ_{x_1}⟩_AB ⋯ |φ_{x_n}⟩_AB   (5.108)

to H(X) qubits per letter (asymptotically). Upon decompressing this state,
Bob can perform the partial trace by "throwing away" subsystem B, and so
reconstruct Alice's message.
To make a reasonable guess about what expression characterizes the
compressibility of a message constructed from a mixed-state alphabet, we
might seek a formula that reduces to S(ρ) for an ensemble of pure states,
and to
H(X) for an ensemble of mutually orthogonal mixed states. Choosing a basis
in which

ρ = Σ_x p_x ρ_x   (5.109)

is block diagonalized, we see that

S(ρ) = −tr ρ log ρ = −Σ_x tr (p_x ρ_x) log (p_x ρ_x)
     = −Σ_x p_x log p_x − Σ_x p_x tr ρ_x log ρ_x
     = H(X) + Σ_x p_x S(ρ_x)   (5.110)

(recalling that tr ρ_x = 1 for each x). Therefore we may write the Shannon
entropy as

H(X) = S(ρ) − Σ_x p_x S(ρ_x) ≡ χ(E) .   (5.111)
The quantity (E ) is called the Holevo information of the ensemble E =
fx; pxg. Evidently, it depends not just on the density matrix , but also
on the particular way that is realized as an ensemble of mixed states. We
have found that, for either an ensemble of pure states, or for an ensemble of
mutually orthogonal mixed states, the Holevo information $\chi(\mathcal{E})$ is the optimal
number of qubits per letter that can be attained if we are to compress the
messages while retaining good fidelity for large $n$.
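Both special cases can be checked numerically. The sketch below (the helper names `vn_entropy` and `holevo_chi` are ours, not from the notes) uses base-2 logarithms, so entropies are measured in qubits:

```python
import numpy as np

def vn_entropy(rho):
    """Von Neumann entropy S(rho) = -tr(rho log2 rho), in qubits."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]          # convention: 0 log 0 = 0
    return float(-np.sum(evals * np.log2(evals)))

def holevo_chi(probs, rhos):
    """chi(E) = S(sum_x p_x rho_x) - sum_x p_x S(rho_x)."""
    rho = sum(p * r for p, r in zip(probs, rhos))
    return vn_entropy(rho) - sum(p * vn_entropy(r) for p, r in zip(probs, rhos))

# Ensemble of two pure states: chi reduces to S(rho).
ket0 = np.array([1.0, 0.0]); plus = np.array([1.0, 1.0]) / np.sqrt(2)
pure = [np.outer(v, v) for v in (ket0, plus)]
chi_pure = holevo_chi([0.5, 0.5], pure)
assert abs(chi_pure - vn_entropy(0.5 * pure[0] + 0.5 * pure[1])) < 1e-9

# Two mixed states with support on orthogonal subspaces of a 4-dim space:
# chi equals the Shannon entropy H(X) = 1 bit, whatever the S(rho_x) are.
r1 = np.diag([0.7, 0.3, 0.0, 0.0])   # support on span{|0>, |1>}
r2 = np.diag([0.0, 0.0, 0.5, 0.5])   # support on span{|2>, |3>}
chi_orth = holevo_chi([0.5, 0.5], [r1, r2])
print(round(chi_orth, 6))  # 1.0
```

For the orthogonal ensemble the block-diagonal decomposition of eq. (5.110) guarantees $\chi(\mathcal{E}) = H(X) = 1$ bit, which is what the final line prints.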
The Holevo information can be regarded as a generalization of Von Neu-
mann entropy, reducing to $S(\rho)$ for an ensemble of pure states. It also bears a
close resemblance to the mutual information of classical information theory:
$I(Y; X) = H(Y) - H(Y|X)$   (5.112)
tells us how much, on the average, the Shannon entropy of Y is reduced once
we learn the value of X ; similarly,
$\chi(\mathcal{E}) = S(\rho) - \sum_x p_x S(\rho_x)$   (5.113)
tells us how much, on the average, the Von Neumann entropy of an ensemble
is reduced when we know which preparation was chosen. Like the classical
5.3. QUANTUM DATA COMPRESSION 197
mutual information, the Holevo information is always nonnegative, as follows
from the concavity property of $S(\rho)$,
$S\Big(\sum_x p_x \rho_x\Big) \geq \sum_x p_x S(\rho_x).$   (5.114)
Now we wish to explore the connection between the Holevo information and
the compressibility of messages constructed from an alphabet of nonorthogonal mixed states. In fact, it can be shown that, in general, high-fidelity
compression to fewer than $\chi$ qubits per letter is not possible.
To establish this result we use a "monotonicity" property of $\chi$ that was
proved by Lindblad and by Uhlmann: A superoperator cannot increase the
Holevo information. That is, if \$ is any superoperator, let it act on an
ensemble of mixed states according to
$\$ : \mathcal{E} = \{\rho_x, p_x\} \to \mathcal{E}' = \{\$(\rho_x), p_x\};$   (5.115)
then
$\chi(\mathcal{E}') \leq \chi(\mathcal{E}).$   (5.116)
Lindblad–Uhlmann monotonicity is closely related to the strong subadditivity of the Von Neumann entropy, as you will show in a homework exercise.
The monotonicity of $\chi$ provides a further indication that $\chi$ quantifies
an amount of information encoded in a quantum system. The decoherence
described by a superoperator can only retain or reduce this quantity of information; it can never increase it. Note that, in contrast, the Von Neumann
entropy is not monotonic. A superoperator might take an initial pure state
to a mixed state, increasing $S(\rho)$. But another superoperator takes every
mixed state to the "ground state" $|0\rangle\langle 0|$, and so reduces the entropy of an
initial mixed state to zero. It would be misleading to interpret this reduction
of $S$ as an "information gain," in that our ability to distinguish the different possible preparations has been completely destroyed. Correspondingly,
decay to the ground state reduces the Holevo information to zero, reflecting
that we have lost the ability to reconstruct the initial state.
We now consider messages of $n$ letters, each drawn independently from
the ensemble $\mathcal{E} = \{\rho_x, p_x\}$; the ensemble of all such input messages is denoted
$\mathcal{E}^{(n)}$. A code is constructed that compresses the messages so that they all
occupy a Hilbert space $\tilde{\mathcal{H}}^{(n)}$; the ensemble of compressed messages is denoted
$\tilde{\mathcal{E}}^{(n)}$. Then decompression is performed with a superoperator \$,
$\$ : \tilde{\mathcal{E}}^{(n)} \to \mathcal{E}'^{(n)},$   (5.117)
to obtain an ensemble $\mathcal{E}'^{(n)}$ of output messages.
Now suppose that this coding scheme has high fidelity. To minimize
technicalities, let us not specify in detail how the fidelity of $\mathcal{E}'^{(n)}$ relative to
$\mathcal{E}^{(n)}$ should be quantified. Let us just accept that if $\mathcal{E}'^{(n)}$ has high fidelity,
then for any $\delta$ and $n$ sufficiently large
$\frac{1}{n}\chi(\mathcal{E}^{(n)}) - \delta \leq \frac{1}{n}\chi(\mathcal{E}'^{(n)}) \leq \frac{1}{n}\chi(\mathcal{E}^{(n)}) + \delta;$   (5.118)
the Holevo information per letter of the output approaches that of the input.
Since the input messages are product states, it follows from the additivity of
$S(\rho)$ that
$\chi(\mathcal{E}^{(n)}) = n\chi(\mathcal{E}),$   (5.119)
and we also know from Lindblad–Uhlmann monotonicity that
$\chi(\mathcal{E}'^{(n)}) \leq \chi(\tilde{\mathcal{E}}^{(n)}).$   (5.120)
By combining eqs. (5.118)–(5.120), we find that
$\frac{1}{n}\chi(\tilde{\mathcal{E}}^{(n)}) \geq \chi(\mathcal{E}) - \delta.$   (5.121)
Finally, $\chi(\tilde{\mathcal{E}}^{(n)})$ is bounded above by $S(\tilde{\rho}^{(n)})$, which is in turn bounded above
by $\log \dim \tilde{\mathcal{H}}^{(n)}$. Since $\delta$ may be as small as we please, we conclude that,
asymptotically as $n \to \infty$,
$\frac{1}{n}\log \dim \tilde{\mathcal{H}}^{(n)} \geq \chi(\mathcal{E});$   (5.122)
high-fidelity compression to fewer than $\chi(\mathcal{E})$ qubits per letter is not possible.
One is sorely tempted to conjecture that compression to $\chi(\mathcal{E})$ qubits per
letter is asymptotically attainable. As of mid-January, 1998, this conjecture
still awaits proof or refutation.
5.7 Exercises
5.1 Distinguishing nonorthogonal states.
Alice has prepared a single qubit in one of the two (nonorthogonal)
states
$|u\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad |v\rangle = \begin{pmatrix} \cos\frac{\theta}{2} \\ \sin\frac{\theta}{2} \end{pmatrix},$   (5.222)
where $0 < \theta < \pi$. Bob knows the value of $\theta$, but he has no idea whether
Alice prepared jui or jvi, and he is to perform a measurement to learn
what he can about Alice's preparation.
Bob considers three possible measurements:
a) An orthogonal measurement with
$E_1 = |u\rangle\langle u|, \qquad E_2 = \mathbf{1} - |u\rangle\langle u|.$   (5.223)
(In this case, if Bob obtains outcome 2, he knows that Alice must have
prepared $|v\rangle$.)
b) A three-outcome POVM with
$F_1 = A(\mathbf{1} - |u\rangle\langle u|), \qquad F_2 = A(\mathbf{1} - |v\rangle\langle v|),$
$F_3 = (1 - 2A)\mathbf{1} + A(|u\rangle\langle u| + |v\rangle\langle v|),$   (5.224)
where $A$ has the largest value consistent with positivity of $F_3$. (In
this case, Bob determines the preparation unambiguously if he obtains
outcome 1 or 2, but learns nothing from outcome 3.)
c) An orthogonal measurement with
$E_1 = |w\rangle\langle w|, \qquad E_2 = \mathbf{1} - |w\rangle\langle w|,$   (5.225)
where
$|w\rangle = \begin{pmatrix} \cos\frac{1}{2}\big(\frac{\theta}{2} + \frac{\pi}{2}\big) \\ \sin\frac{1}{2}\big(\frac{\theta}{2} + \frac{\pi}{2}\big) \end{pmatrix}.$   (5.226)
(In this case $E_1$ and $E_2$ are projections onto the spin states that are oriented in the $x$-$z$ plane normal to the axis that bisects the orientations
of $|u\rangle$ and $|v\rangle$.)
Find Bob's average information gain $I(\theta)$ (the mutual information of
the preparation and the measurement outcome) in all three cases, and
plot all three as a function of $\theta$. Which measurement should Bob
choose?
5.2 Relative entropy.
The relative entropy $S(\rho|\sigma)$ of two density matrices $\rho$ and $\sigma$ is defined
by
$S(\rho|\sigma) = \mathrm{tr}\,\rho\,(\log \rho - \log \sigma).$   (5.227)
You will show that $S(\rho|\sigma)$ is nonnegative, and derive some consequences of this property.
a) A differentiable real-valued function of a real variable is concave if
$f(y) - f(x) \leq (y - x) f'(x),$   (5.228)
for all $x$ and $y$. Show that if $a$ and $b$ are observables, and $f$ is concave,
then
$\mathrm{tr}\,(f(b) - f(a)) \leq \mathrm{tr}\,[(b - a) f'(a)].$   (5.229)
b) Show that $f(x) = -x \log x$ is concave for $x > 0$.
c) Use (a) and (b) to show $S(\rho|\sigma) \geq 0$ for any two density matrices $\rho$ and
$\sigma$.
d) Use nonnegativity of $S(\rho|\sigma)$ to show that if $\rho$ has its support on a space
of dimension $D$, then
$S(\rho) \leq \log D.$   (5.230)
e) Use nonnegativity of relative entropy to prove the subadditivity of entropy
$S(\rho_{AB}) \leq S(\rho_A) + S(\rho_B).$   (5.231)
[Hint: Consider the relative entropy of $\rho_A \otimes \rho_B$ and $\rho_{AB}$.]
f) Use subadditivity to prove the concavity of the entropy:
$S\Big(\sum_i \lambda_i \rho_i\Big) \geq \sum_i \lambda_i S(\rho_i),$   (5.232)
where the $\lambda_i$'s are positive real numbers summing to one. [Hint: Apply
subadditivity to
$\rho_{AB} = \sum_i \lambda_i\, (\rho_i)_A \otimes (|e_i\rangle\langle e_i|)_B.$ ]   (5.233)
g) Use subadditivity to prove the triangle inequality (also called the Araki–Lieb inequality):
$S(\rho_{AB}) \geq |S(\rho_A) - S(\rho_B)|.$   (5.234)
[Hint: Consider a purification of $\rho_{AB}$; that is, construct a pure state
$|\psi\rangle$ such that $\rho_{AB} = \mathrm{tr}_C\, |\psi\rangle\langle\psi|$. Then apply subadditivity to $\rho_{BC}$.]
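It can be instructive to test parts (c) and (d) numerically before proving them. A minimal sketch (the function names are ours), checking nonnegativity of the relative entropy on random density matrices:

```python
import numpy as np

def mat_log2(rho):
    """log2 of a positive definite Hermitian matrix via eigendecomposition."""
    w, v = np.linalg.eigh(rho)
    return (v * np.log2(w)) @ v.conj().T

def rel_entropy(rho, sigma):
    """S(rho|sigma) = tr rho (log2 rho - log2 sigma)."""
    return float(np.real(np.trace(rho @ (mat_log2(rho) - mat_log2(sigma)))))

rng = np.random.default_rng(0)
def random_density(d):
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = a @ a.conj().T                  # positive definite
    return rho / np.real(np.trace(rho))

for _ in range(100):
    rho, sigma = random_density(3), random_density(3)
    assert rel_entropy(rho, sigma) >= -1e-9     # part (c): S(rho|sigma) >= 0
    assert abs(rel_entropy(rho, rho)) < 1e-9    # and it vanishes for rho = sigma

# Part (d): since log2(1) = 0, S(rho|1) = -S(rho), and S(rho|1/D) >= 0
# gives S(rho) <= log2 D.
rho = random_density(4)
S = -rel_entropy(rho, np.eye(4))
assert S <= 2.0 + 1e-9                          # log2 4 = 2
```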
5.3 Lindblad–Uhlmann monotonicity.
According to a theorem proved by Lindblad and by Uhlmann, relative
entropy on $\mathcal{H}_A \otimes \mathcal{H}_B$ has a property called monotonicity:
$S(\rho_A|\sigma_A) \leq S(\rho_{AB}|\sigma_{AB});$   (5.235)
the relative entropy of two density matrices on a system $AB$ cannot
be less than the induced relative entropy on the subsystem $A$.
a) Use Lindblad–Uhlmann monotonicity to prove the strong subadditivity
property of the Von Neumann entropy. [Hint: On a tripartite system
$ABC$, consider the relative entropy of $\rho_{ABC}$ and $\rho_A \otimes \rho_{BC}$.]
b) Use Lindblad–Uhlmann monotonicity to show that the action of a superoperator cannot increase relative entropy; that is,
$S(\$\rho|\$\sigma) \leq S(\rho|\sigma),$   (5.236)
where \$ is any superoperator (completely positive map). [Hint: Recall
that any superoperator has a unitary representation.]
c) Show that it follows from (b) that a superoperator cannot increase the
Holevo information of an ensemble $\mathcal{E} = \{\rho_x, p_x\}$ of mixed states:
$\chi(\$(\mathcal{E})) \leq \chi(\mathcal{E}),$   (5.237)
where
$\chi(\mathcal{E}) = S\Big(\sum_x p_x \rho_x\Big) - \sum_x p_x S(\rho_x).$   (5.238)
5.4 The Peres–Wootters POVM.
Consider the Peres–Wootters information source described in §5.4.2 of
the lecture notes. It prepares one of the three states
$|\Phi_a\rangle = |\varphi_a\rangle|\varphi_a\rangle, \quad a = 1, 2, 3,$   (5.239)
each occurring with a priori probability $\frac{1}{3}$, where the $|\varphi_a\rangle$'s are defined
in eq. (5.149).
a) Express the density matrix
$\rho = \frac{1}{3}\Big(\sum_a |\Phi_a\rangle\langle\Phi_a|\Big)$   (5.240)
in terms of the Bell basis of maximally entangled states $\{|\phi^\pm\rangle, |\psi^\pm\rangle\}$,
and compute $S(\rho)$.
b) For the three vectors $|\Phi_a\rangle,\ a = 1, 2, 3$, construct the "pretty good measurement" defined in eq. (5.162). (Again, expand the $|\Phi_a\rangle$'s in the Bell
basis.) In this case, the PGM is an orthogonal measurement. Express
the elements of the PGM basis in terms of the Bell basis.
c) Compute the mutual information of the PGM outcome and the preparation.
5.5 Teleportation with mixed states.
An operational way to define entanglement is that an entangled state
can be used to teleport an unknown quantum state with better fidelity
than could be achieved with local operations and classical communication only. In this exercise, you will show that there are mixed states
that are entangled in this sense, yet do not violate any Bell inequality.
Hence, for mixed states (in contrast to pure states) "entangled" and
"Bell-inequality-violating" are not equivalent.
Consider a "noisy" entangled pair with density matrix
$\rho(\lambda) = (1 - \lambda)\,|\psi^-\rangle\langle\psi^-| + \lambda\,\tfrac{1}{4}\mathbf{1}.$   (5.241)
a) Find the fidelity $F$ that can be attained if the state $\rho(\lambda)$ is used to teleport
a qubit from Alice to Bob. [Hint: Recall that you showed in an earlier
exercise that a "random guess" has fidelity $F = \frac{1}{2}$.]
b) For what values of $\lambda$ is the fidelity found in (a) better than what can be
achieved if Alice measures her qubit and sends a classical message to
Bob? [Hint: Earlier, you showed that $F = 2/3$ can be achieved if Alice
measures her qubit. In fact this is the best possible $F$ attainable with
classical communication.]
c) Compute
$\mathrm{Prob}(\uparrow_{\hat{n}} \uparrow_{\hat{m}}) \equiv \mathrm{tr}\,\big(E_A(\hat{n})\, E_B(\hat{m})\, \rho(\lambda)\big),$   (5.242)
where $E_A(\hat{n})$ is the projection of Alice's qubit onto $|\uparrow_{\hat{n}}\rangle$ and $E_B(\hat{m})$
is the projection of Bob's qubit onto $|\uparrow_{\hat{m}}\rangle$.
d) Consider the case $\lambda = 1/2$. Show that in this case the state $\rho(\lambda)$ violates
no Bell inequalities. Hint: It suffices to construct a local hidden variable
model that correctly reproduces the spin correlations found in (c), for
$\lambda = 1/2$. Suppose that the hidden variable $\hat{\lambda}$ is uniformly distributed
on the unit sphere, and that there are functions $f_A$ and $f_B$ such that
$\mathrm{Prob}_A(\uparrow_{\hat{n}}) = f_A(\hat{\lambda} \cdot \hat{n}), \qquad \mathrm{Prob}_B(\uparrow_{\hat{m}}) = f_B(\hat{\lambda} \cdot \hat{m}).$   (5.243)
The problem is to find $f_A$ and $f_B$ (where $0 \leq f_{A,B} \leq 1$) with the
properties
$\int_{\hat{\lambda}} f_A(\hat{\lambda} \cdot \hat{n}) = 1/2, \qquad \int_{\hat{\lambda}} f_B(\hat{\lambda} \cdot \hat{m}) = 1/2,$
$\int_{\hat{\lambda}} f_A(\hat{\lambda} \cdot \hat{n})\, f_B(\hat{\lambda} \cdot \hat{m}) = \mathrm{Prob}(\uparrow_{\hat{n}} \uparrow_{\hat{m}}).$   (5.244)
Chapter 6
Quantum Computation
6.1 Classical Circuits
The concept of a quantum computer was introduced in Chapter 1. Here we
will specify our model of quantum computation more precisely, and we will
point out some basic properties of the model. But before we explain what a
quantum computer does, perhaps we should say what a classical computer
does.
Then
$f(x) = f^{(1)}(x) \vee f^{(2)}(x) \vee f^{(3)}(x) \vee \ldots;$   (6.5)
$f$ is the logical OR ($\vee$) of all the $f^{(a)}$'s. In binary arithmetic the $\vee$ operation
of two bits may be represented
$x \vee y = x + y - x \cdot y;$   (6.6)
it has the value 0 if $x$ and $y$ are both zero, and the value 1 otherwise.
Now consider the evaluation of $f^{(a)}$. In the case where $x^{(a)} = 111\ldots1$,
we may write
$f^{(a)}(x) = x_1 \wedge x_2 \wedge x_3 \ldots \wedge x_n;$   (6.7)
it is the logical AND ($\wedge$) of all $n$ bits. In binary arithmetic, the AND is the
product
$x \wedge y = x \cdot y.$   (6.8)
For any other $x^{(a)}$, $f^{(a)}$ is again obtained as the AND of $n$ bits, but where the
NOT ($\neg$) operation is first applied to each $x_i$ such that $x_i^{(a)} = 0$; for example
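The binary-arithmetic representations of OR, AND, and NOT above are easy to verify. A quick sketch (the function names are ours); it also evaluates a sample minterm $f^{(a)}$ for the illustrative choice $x^{(a)} = 101$:

```python
# Binary-arithmetic forms of the Boolean operations:
# OR: x v y = x + y - x*y;  AND: x ^ y = x*y;  NOT: 1 - x.
def OR(x, y):  return x + y - x * y
def AND(x, y): return x * y
def NOT(x):    return 1 - x

for x in (0, 1):
    for y in (0, 1):
        assert OR(x, y)  == (x | y)
        assert AND(x, y) == (x & y)
    assert NOT(x) == 1 - x

# Minterm f^(a) for x^(a) = 101: AND of the bits, with NOT applied
# at the position where x^(a) has a 0.
def f_a(x1, x2, x3):            # equals 1 only on input (1, 0, 1)
    return AND(AND(x1, NOT(x2)), x3)

print(f_a(1, 0, 1), f_a(1, 1, 1))  # 1 0
```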
x ──●── x
y ──⊕── x ⊕ y
This gate flips the second bit if the first is 1, and does nothing if the first bit
is 0 (hence the name controlled-NOT). Its square is trivial; that is, the gate
is its own inverse. Of course, this gate performs a NOT on the second bit if the first bit
is set to 1, and it performs the copy operation if $y$ is initially set to zero:
$\mathrm{XOR} : (x, 0) \mapsto (x, x).$   (6.34)
With the circuit
x ──●──⊕──●── y
y ──⊕──●──⊕── x
we can swap two bits.
it flips the third bit if the first two are 1 and does nothing otherwise. Like
the XOR gate, it is its own inverse.
Unlike the reversible 2-bit gates, the Toffoli gate serves as a universal gate
for Boolean logic, if we can provide fixed input bits and ignore output bits.
If $z$ is initially 1, then $x \uparrow y = 1 - xy$ appears in the third output; we can
perform NAND. If we fix $x = 1$, the Toffoli gate functions like an XOR gate,
and we can use it to copy.
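A quick check of these claims, modeling the Toffoli gate as a map on classical bits (the function name is ours):

```python
def toffoli(x, y, z):
    """Toffoli gate: (x, y, z) -> (x, y, z XOR (x AND y))."""
    return x, y, z ^ (x & y)

# Fix z = 1: the third output is 1 - xy = NAND(x, y).
for x in (0, 1):
    for y in (0, 1):
        assert toffoli(x, y, 1)[2] == 1 - x * y      # NAND

# Fix x = 1, z = 0: the gate copies y into the third bit.
for y in (0, 1):
    assert toffoli(1, y, 0) == (1, y, y)

# Like the XOR gate, the Toffoli gate is its own inverse.
for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            assert toffoli(*toffoli(x, y, z)) == (x, y, z)
print("ok")
```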
The Toffoli gate $\theta^{(3)}$ is universal in the sense that we can build a circuit to
compute any reversible function using Toffoli gates alone (if we can fix input
bits and ignore output bits). It will be instructive to show this directly,
without relying on our earlier argument that NAND/NOT is universal for
Boolean functions. In fact, we can show the following: From the NOT gate
and the Toffoli gate $\theta^{(3)}$, we can construct any invertible function on $n$ bits,
provided we have one extra bit of scratchpad space available.
The first step is to show that from the three-bit Toffoli gate $\theta^{(3)}$ we can
construct an $n$-bit Toffoli gate $\theta^{(n)}$ that acts as
$(x_1, x_2, \ldots, x_{n-1}, y) \to (x_1, x_2, \ldots, x_{n-1}, y \oplus x_1 x_2 \ldots x_{n-1}).$   (6.40)
The construction requires one extra bit of scratch space. For example, we
construct $\theta^{(4)}$ from $\theta^{(3)}$'s with the circuit
x1 ──●──────●── x1
x2 ──●──────●── x2
0  ──⊕──●──⊕── 0
x3 ─────●────── x3
y  ─────⊕────── y ⊕ x1x2x3
The purpose of the last $\theta^{(3)}$ gate is to reset the scratch bit back to its original
value zero. Actually, with one more gate we can obtain an implementation
of $\theta^{(4)}$ that works irrespective of the initial value of the scratch bit:
x1 ──●──────●────── x1
x2 ──●──────●────── x2
w  ──⊕──●──⊕──●── w
x3 ─────●──────●── x3
y  ─────⊕──────⊕── y ⊕ x1x2x3
Again, we can eliminate the last gate if we don't mind flipping the value of
the scratch bit.
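The four-gate circuit above is small enough to verify exhaustively. A sketch (function names ours) that applies the four $\theta^{(3)}$'s in the order shown and checks the result for every input, including both initial values of the scratch bit $w$:

```python
from itertools import product

def theta3(a, b, c):
    """Three-bit Toffoli: flips c iff a = b = 1."""
    return a, b, c ^ (a & b)

def theta4(x1, x2, x3, y, w=0):
    """theta^(4) built from four theta^(3)'s with scratch bit w;
    works for either initial value of w and restores it."""
    x1, x2, w = theta3(x1, x2, w)   # w ^= x1 x2
    w, x3, y  = theta3(w, x3, y)    # y ^= w x3
    x1, x2, w = theta3(x1, x2, w)   # restore w
    w, x3, y  = theta3(w, x3, y)    # cancel the spurious flip when w began at 1
    return x1, x2, x3, y, w

for x1, x2, x3, y, w in product((0, 1), repeat=5):
    out = theta4(x1, x2, x3, y, w)
    assert out == (x1, x2, x3, y ^ (x1 & x2 & x3), w)
print("ok")
```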
We can see that the scratch bit really is necessary, because $\theta^{(4)}$ is an odd
permutation (in fact a transposition) of the $2^4$ 4-bit strings; it transposes
1111 and 1110. But $\theta^{(3)}$ acting on any three of the four bits is an even
permutation; e.g., acting on the last three bits it transposes 0111 with 0110,
and 1111 with 1110. Since a product of even permutations is also even, we
cannot obtain $\theta^{(4)}$ as a product of $\theta^{(3)}$'s that act on four bits only.
The construction of $\theta^{(4)}$ from four $\theta^{(3)}$'s generalizes immediately to the
construction of $\theta^{(n)}$ from two $\theta^{(n-1)}$'s and two $\theta^{(3)}$'s (just expand $x_1$ to several
control bits in the above diagram). Iterating the construction, we obtain $\theta^{(n)}$
from a circuit with $2^{n-2} + 2^{n-3} - 2$ $\theta^{(3)}$'s. Furthermore, just one bit of scratch
space is sufficient.² (When we need to construct $\theta^{(k)}$, any available extra
bit will do, since the circuit returns the scratch bit to its original value.) The
next step is to note that, by conjugating $\theta^{(n)}$ with NOT gates, we can in
effect modify the value of the control string that "triggers" the gate. For
example, the circuit
x1 ──X──●──X── x1
x2 ──────●────── x2
x3 ──X──●──X── x3
y  ──────⊕────── y
flips the value of $y$ if $x_1 x_2 x_3 = 010$, and it acts trivially otherwise. Thus
this circuit transposes the two strings 0100 and 0101. In like fashion, with
$\theta^{(n)}$ and NOT gates, we can devise a circuit that transposes any two $n$-bit
strings that differ in only one bit. (The location of the bit where they differ
is chosen to be the target of the $\theta^{(n)}$ gate.)
But in fact a transposition that exchanges any two $n$-bit strings can be
expressed as a product of transpositions that interchange strings that differ
in only one bit. If $a_0$ and $a_s$ are two strings that are Hamming distance $s$
apart (differ in $s$ places), then there is a chain
$a_0, a_1, a_2, a_3, \ldots, a_s,$   (6.41)
such that each string in the chain is Hamming distance one from its neighbors.
Therefore, each of the transpositions
$(a_0 a_1), (a_1 a_2), (a_2 a_3), \ldots, (a_{s-1} a_s)$   (6.42)
² With more scratch space, we can build $\theta^{(n)}$ from $\theta^{(3)}$'s much more efficiently; see
the exercises.
can be implemented as a $\theta^{(n)}$ gate conjugated by NOT gates. By composing
transpositions we find
$(a_0 a_s) = (a_{s-1} a_s)(a_{s-2} a_{s-1}) \ldots (a_2 a_3)(a_1 a_2)(a_0 a_1)(a_1 a_2)(a_2 a_3) \ldots (a_{s-2} a_{s-1})(a_{s-1} a_s);$   (6.43)
we can construct the Hamming-distance-$s$ transposition from $2s - 1$ Hamming-distance-one transpositions. It follows that we can construct $(a_0 a_s)$ from
$\theta^{(n)}$'s and NOT gates.
Finally, since every permutation is a product of transpositions, we have
shown that every invertible function on $n$ bits (every permutation on $n$-bit
strings) is a product of $\theta^{(3)}$'s and NOTs, using just one bit of scratch space.
Of course, a NOT can be performed with a $\theta^{(3)}$ gate if we fix two input
bits at 1. Thus the Toffoli gate $\theta^{(3)}$ is universal for reversible computation,
if we can fix input bits and discard output bits.
builds the Fredkin gate from four switch gates (two running forward and two
running backward). Time delays needed to maintain synchronization are not
explicitly shown.
In the billiard ball computer, the switch gate is constructed with two
reflectors, such that (in the case $x = y = 1$) two moving balls collide twice.
The trajectories of the balls in this case are:
A ball labeled $x$ emerges from the gate along the same trajectory (and at the
same time) regardless of whether the other ball is present. But for $x = 1$, the
position of the other ball (if present) is shifted down compared to its final
position for $x = 0$; this is a switch gate. Since we can perform a switch
gate, we can construct a Fredkin gate, and implement universal reversible
logic with a billiard ball computer.
An evident weakness of the billiard-ball scheme is that initial errors in the
positions and velocities of the balls will accumulate rapidly, and the computer
will eventually fail. As we noted in Chapter 1 (and Landauer has insistently
pointed out), a similar problem will afflict any proposed scheme for dissipationless computation. To control errors we must be able to compress the
phase space of the device, which will necessarily be a dissipative process.
6.2.1 Accuracy
Let's discuss the issue of accuracy. We imagine that we wish to implement
a computation in which the quantum gates $U_1, U_2, \ldots, U_T$ are applied sequentially to the initial state $|\varphi_0\rangle$. The state prepared by our ideal quantum
circuit is
$|\varphi_T\rangle = U_T U_{T-1} \ldots U_2 U_1 |\varphi_0\rangle.$   (6.60)
But in fact our gates do not have perfect accuracy. When we attempt to apply the unitary transformation $U_t$, we instead apply some "nearby" unitary
transformation $\tilde{U}_t$. (Of course, this is not the most general type of error that
we might contemplate; the unitary $U_t$ might be replaced by a superoperator.
Considerations similar to those below would apply in that case, but for now
we confine our attention to "unitary errors.")
The errors cause the actual state of the computer to wander away from
the ideal state. How far does it wander? Let $|\varphi_t\rangle$ denote the ideal state after
$t$ quantum gates are applied, so that
$|\varphi_t\rangle = U_t |\varphi_{t-1}\rangle.$   (6.61)
But if we apply the actual transformation $\tilde{U}_t$, then
$\tilde{U}_t |\varphi_{t-1}\rangle = |\varphi_t\rangle + |E_t\rangle,$   (6.62)
where
$|E_t\rangle = (\tilde{U}_t - U_t) |\varphi_{t-1}\rangle$   (6.63)
is an unnormalized vector. If $|\tilde{\varphi}_t\rangle$ denotes the actual state after $t$ steps, then
we have
$|\tilde{\varphi}_1\rangle = |\varphi_1\rangle + |E_1\rangle,$
$|\tilde{\varphi}_2\rangle = \tilde{U}_2 |\tilde{\varphi}_1\rangle = |\varphi_2\rangle + |E_2\rangle + \tilde{U}_2 |E_1\rangle,$   (6.64)
and so forth; we ultimately obtain
$|\tilde{\varphi}_T\rangle = |\varphi_T\rangle + |E_T\rangle + \tilde{U}_T |E_{T-1}\rangle + \tilde{U}_T \tilde{U}_{T-1} |E_{T-2}\rangle + \ldots + \tilde{U}_T \tilde{U}_{T-1} \ldots \tilde{U}_2 |E_1\rangle.$   (6.65)
Thus we have expressed the difference between $|\tilde{\varphi}_T\rangle$ and $|\varphi_T\rangle$ as a sum of $T$
remainder terms. The worst case, yielding the largest deviation of $|\tilde{\varphi}_T\rangle$ from
$|\varphi_T\rangle$, occurs if all remainder terms line up in the same direction, so that the
errors interfere constructively. Therefore, we conclude that
$\| |\tilde{\varphi}_T\rangle - |\varphi_T\rangle \| \leq \| |E_T\rangle \| + \| |E_{T-1}\rangle \| + \ldots + \| |E_2\rangle \| + \| |E_1\rangle \|,$   (6.66)
where we have used the property $\| U |E_i\rangle \| = \| |E_i\rangle \|$ for any unitary $U$.
Let $\| A \|_{\sup}$ denote the sup norm of the operator $A$; that is, the
maximum modulus of an eigenvalue of $A$. We then have
$\| |E_t\rangle \| = \| (\tilde{U}_t - U_t) |\varphi_{t-1}\rangle \| \leq \| \tilde{U}_t - U_t \|_{\sup}$   (6.67)
(since $|\varphi_{t-1}\rangle$ is normalized). Now suppose that, for each value of $t$, the error
in our quantum gate is bounded by
$\| \tilde{U}_t - U_t \|_{\sup} < \varepsilon.$   (6.68)
Then after $T$ quantum gates are applied, we have
$\| |\tilde{\varphi}_T\rangle - |\varphi_T\rangle \| < T\varepsilon;$   (6.69)
in this sense, the accumulated error in the state grows linearly with the length
of the computation.
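The linear error-accumulation bound of eq. (6.69) can be observed in a small simulation. Here is a sketch (helper names ours) that perturbs each of $T$ random unitaries by a unitary within sup-norm distance $\varepsilon$ and checks that the final state deviates by less than $T\varepsilon$:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_unitary(d):
    """Haar-ish random unitary from the QR decomposition of a Ginibre matrix."""
    q, r = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
    return q * (np.diag(r) / np.abs(np.diag(r)))

def perturbed(U, eps):
    """A 'nearby' unitary exp(iH) U with ||H||_sup <= eps/2,
    which guarantees ||U~ - U||_sup < eps."""
    h = rng.normal(size=U.shape) + 1j * rng.normal(size=U.shape)
    h = (h + h.conj().T) / 2
    h *= (0.5 * eps) / np.max(np.abs(np.linalg.eigvalsh(h)))
    w, v = np.linalg.eigh(h)
    return (v * np.exp(1j * w)) @ v.conj().T @ U

d, T, eps = 4, 50, 1e-3
gates = [random_unitary(d) for _ in range(T)]
psi_ideal = psi_actual = np.eye(d)[0].astype(complex)
for U in gates:
    psi_ideal = U @ psi_ideal
    psi_actual = perturbed(U, eps) @ psi_actual
err = np.linalg.norm(psi_actual - psi_ideal)
print(err < T * eps)  # True: accumulated error stays below T*eps
```

In practice the errors do not all line up, so the observed deviation is usually well below the worst-case $T\varepsilon$.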
The distance bounded in eq. (6.68) can equivalently be expressed as
$\| W_t - \mathbf{1} \|_{\sup}$, where $W_t = \tilde{U}_t U_t^\dagger$. Since $W_t$ is unitary, each of its eigenvalues
is a phase $e^{i\theta}$, and the corresponding eigenvalue of $W_t - \mathbf{1}$ has modulus
$|e^{i\theta} - 1| = (2 - 2\cos\theta)^{1/2},$   (6.70)
so that eq. (6.68) is the requirement that each eigenvalue satisfies
$\cos\theta > 1 - \varepsilon^2/2$   (6.71)
(or $|\theta| \lesssim \varepsilon$, for $\varepsilon$ small). The origin of eq. (6.69) is clear. In each time step,
$|\tilde{\varphi}\rangle$ rotates relative to $|\varphi\rangle$ by (at worst) an angle of order $\varepsilon$, and the distance
between the vectors increases by at most of order $\varepsilon$.
How much accuracy is good enough? In the final step of our computation,
we perform an orthogonal measurement, and the probability of outcome $a$,
in the ideal case, is
$P(a) = |\langle a|\varphi_T\rangle|^2.$   (6.72)
Because of the errors, the actual probability is
$\tilde{P}(a) = |\langle a|\tilde{\varphi}_T\rangle|^2.$   (6.73)
If the actual vector is close to the ideal vector, then the probability distributions are close, too. If we sum over an orthonormal basis $\{|a\rangle\}$, we have
$\sum_a |\tilde{P}(a) - P(a)| \leq 2\, \| |\tilde{\varphi}_T\rangle - |\varphi_T\rangle \|,$   (6.74)
as you will show in a homework exercise. Therefore, if we keep $T\varepsilon$ fixed (and
small) as $T$ gets large, the error in the probability distribution also remains
fixed. In particular, if we have designed a quantum algorithm that solves a
decision problem correctly with probability greater than $\frac{1}{2} + \delta$ (in the ideal case),
then we can achieve success probability greater than $\frac{1}{2}$ with our noisy gates,
if we can perform the gates with an accuracy $T\varepsilon < O(\delta)$. A quantum circuit
family in the BQP class can really solve hard problems, as long as we can
improve the accuracy of the gates linearly with the computation size $T$.
$U' = P\, U\, P^{-1}$
that applies $R$ to the third qubit if the first two qubits have the value 1;
otherwise it acts trivially. Here
$R = -i R_x(\theta) = (-i)\exp\left(i\frac{\theta}{2}\sigma_x\right) = (-i)\left(\cos\frac{\theta}{2} + i\sigma_x \sin\frac{\theta}{2}\right)$   (6.89)
is, up to a phase, a rotation by $\theta$ about the $x$-axis, where $\theta$ is a particular
angle incommensurate with $\pi$.
The $n$th power of the Deutsch gate is the controlled-controlled-$R^n$. In
particular, $R^4 = R_x(4\theta)$, so that all one-qubit transformations generated by
$\sigma_x$ are reachable by integer powers of $R$. Furthermore the $(4n + 1)$st power
is
$(-i)\left[\cos\frac{(4n+1)\theta}{2} + i\sigma_x \sin\frac{(4n+1)\theta}{2}\right],$   (6.90)
[circuit diagram: a construction from $\theta^{(3)}$ gates, an ancilla $|0\rangle$, and controlled-$R$ and controlled-$U$ gates]
denotes the controlled-$U$ gate (the $2 \times 2$ unitary $U$ is applied to the second
qubit if the first qubit is 1; otherwise the gate acts trivially), then a controlled-controlled-$U^2$ gate is obtained from the circuit
x ──●──         x ──────●──────●──●──
y ──●──    =    y ──●──⊕──●──⊕──┼──
t ─[U²]─        t ─[U]────[U†]────[U]─
(for $\varepsilon$ sufficiently small) we can reach any $e^{i\theta A}$ to within distance $\varepsilon$ with
$e^{in\lambda A}$, for some integer $n$ of order $\varepsilon^{-2^k}$. We also know that we can obtain transformations $\{e^{iA_a}\}$ where the $A_a$'s span the full $U(2^k)$ Lie algebra,
using circuits of fixed size (independent of $\varepsilon$). We may then approach any
$\exp\left(i \sum_a \theta_a A_a\right)$ as in eq. (6.87), also with polynomial convergence.
In principle, we should be able to do much better, reaching a desired
$k$-qubit unitary within distance $\varepsilon$ using just poly($\log(\varepsilon^{-1})$) quantum gates.
Since the number of size-$T$ circuits that we can construct acting on $k$ qubits
is exponential in $T$, and the circuits fill $U(2^k)$ roughly uniformly, there should
be a size-$T$ circuit reaching within a distance of order $e^{-T}$ of any point in
$U(2^k)$. However, it might be a computationally hard problem classically
to work out the circuit that comes exponentially close to the unitary we are
trying to reach. Therefore, it would be dishonest to rely on this more efficient
construction in an asymptotic analysis of quantum complexity.
|0⟩ ──H──●────H── Measure
|1⟩ ──H──U_f─────
⁷ The term "oracle" signifies that the box responds to a query immediately; that is, the
time it takes the box to operate is not included in the complexity analysis.
Here $H$ denotes the Hadamard transform
$H : |x\rangle \to \frac{1}{\sqrt{2}} \sum_y (-1)^{xy} |y\rangle,$   (6.107)
or
$H : |0\rangle \to \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle),$
$\quad\;\;\; |1\rangle \to \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle);$   (6.108)
that is, $H$ is the $2 \times 2$ matrix
$H = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix}.$   (6.109)
The circuit takes the input $|0\rangle|1\rangle$ to
$|0\rangle|1\rangle \to \frac{1}{2}(|0\rangle + |1\rangle)(|0\rangle - |1\rangle)$
$\to \frac{1}{2}\left((-1)^{f(0)}|0\rangle + (-1)^{f(1)}|1\rangle\right)(|0\rangle - |1\rangle)$
$\to \frac{1}{2}\left[\left((-1)^{f(0)} + (-1)^{f(1)}\right)|0\rangle + \left((-1)^{f(0)} - (-1)^{f(1)}\right)|1\rangle\right]\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle).$   (6.110)
Then when we measure the first qubit, we find the outcome $|0\rangle$ with probability one if $f(0) = f(1)$ (constant function) and the outcome $|1\rangle$ with
probability one if $f(0) \neq f(1)$ (balanced function).
A quantum computer enjoys an advantage over a classical computer be-
cause it can invoke quantum parallelism. Because we input a superposition
of j0i and j1i, the output is sensitive to both the values of f (0) and f (1),
even though we ran the box just once.
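Deutsch's circuit is small enough to simulate exactly. A sketch (function names ours) representing the oracle $U_f : |x\rangle|y\rangle \to |x\rangle|y \oplus f(x)\rangle$ as a $4 \times 4$ permutation matrix:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
H2 = np.kron(H, H)

def U_f(f):
    """Oracle |x>|y> -> |x>|y XOR f(x)> as a 4x4 permutation matrix."""
    U = np.zeros((4, 4))
    for x in (0, 1):
        for y in (0, 1):
            U[2 * x + (y ^ f(x)), 2 * x + y] = 1
    return U

def deutsch(f):
    """Returns 0 if f is constant, 1 if f is balanced, using one query."""
    psi = np.zeros(4); psi[0b01] = 1           # input |0>|1>
    psi = H2 @ psi                             # Hadamard on both qubits
    psi = U_f(f) @ psi                         # one oracle query
    psi = np.kron(H, np.eye(2)) @ psi          # Hadamard on the first qubit
    p0 = psi[0b00] ** 2 + psi[0b01] ** 2       # prob(first qubit reads 0)
    return 0 if p0 > 0.5 else 1

for f, kind in [(lambda x: 0, 0), (lambda x: 1, 0),
                (lambda x: x, 1), (lambda x: 1 - x, 1)]:
    assert deutsch(f) == kind
print("ok")
```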
Deutsch–Jozsa problem. Now we'll consider some generalizations of
Deutsch's problem. We will continue to assume that we are to analyze a
quantum black box ("quantum oracle"). But in the hope of learning something about complexity, we will imagine that we have a family of black boxes,
with variable input size. We are interested in how the time needed to find
out what is inside the box scales with the size of the input (where "time" is
measured by how many times we query the box).
In the Deutsch–Jozsa problem, we are presented with a quantum black
box that computes a function taking $n$ bits to 1,
$f : \{0,1\}^n \to \{0,1\},$   (6.111)
and we have it on good authority that $f$ is either constant ($f(x) = c$ for all
$x$) or balanced ($f(x) = 0$ for exactly half of the possible input values). We are
to solve the decision problem: Is $f$ constant or balanced?
In fact, we can solve this problem, too, accessing the box only once, using
the same circuit as for Deutsch's problem (but with $x$ expanded from one
bit to $n$ bits). We note that if we apply $n$ Hadamard gates in parallel to
$n$ qubits,
$H^{(n)} = H \otimes H \otimes \ldots \otimes H,$   (6.112)
then the $n$-qubit state transforms as
$H^{(n)} : |x\rangle \to \prod_{i=1}^{n}\left(\frac{1}{\sqrt{2}} \sum_{y_i \in \{0,1\}} (-1)^{x_i y_i} |y_i\rangle\right) \equiv \frac{1}{2^{n/2}} \sum_{y=0}^{2^n - 1} (-1)^{x \cdot y} |y\rangle,$   (6.113)
where $x, y$ represent $n$-bit strings, and $x \cdot y$ denotes the bitwise AND (or mod
2 scalar product)
$x \cdot y = (x_1 \wedge y_1) \oplus (x_2 \wedge y_2) \oplus \ldots \oplus (x_n \wedge y_n).$   (6.114)
Acting on the input $(|0\rangle)^n |1\rangle$, the action of the circuit is
$(|0\rangle)^n |1\rangle \to \frac{1}{2^{n/2}} \sum_{x=0}^{2^n - 1} |x\rangle\, \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$
$\to \frac{1}{2^{n/2}} \sum_{x=0}^{2^n - 1} (-1)^{f(x)} |x\rangle\, \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$
$\to \left(\sum_{y=0}^{2^n - 1} \frac{1}{2^n} \sum_{x=0}^{2^n - 1} (-1)^{f(x)} (-1)^{x \cdot y} |y\rangle\right) \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle).$   (6.115)
Now let us evaluate the sum
$\frac{1}{2^n} \sum_{x=0}^{2^n - 1} (-1)^{f(x)} (-1)^{x \cdot y}.$   (6.116)
If $f$ is a constant function, the sum is
$(-1)^{f(x)} \left(\frac{1}{2^n} \sum_{x=0}^{2^n - 1} (-1)^{x \cdot y}\right) = (-1)^{f(x)}\, \delta_{y,0};$   (6.117)
it vanishes unless $y = 0$. Hence, when we measure the $n$-bit register, we
obtain the result $|y = 0\rangle \equiv (|0\rangle)^n$ with probability one. But if the function
is balanced, then for $y = 0$, the sum becomes
$\frac{1}{2^n} \sum_{x=0}^{2^n - 1} (-1)^{f(x)} = 0$   (6.118)
(because half of the terms are $(+1)$ and half are $(-1)$). Therefore, the probability of obtaining the measurement outcome $|y = 0\rangle$ is zero.
We conclude that one query of the quantum oracle suffices to distinguish
constant and balanced functions with 100% confidence. The measurement
result $y = 0$ means constant; any other result means balanced.
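The Deutsch–Jozsa circuit is also easy to simulate; since the second register only contributes the phase $(-1)^{f(x)}$, we can track the $n$-qubit register alone. A sketch (function names ours) that checks every balanced function on $n = 3$ bits:

```python
import numpy as np
from itertools import combinations

def deutsch_jozsa(fvals):
    """One-query Deutsch-Jozsa on the n-qubit register; the oracle plus
    the borrowed |-> qubit acts as the phase (-1)^f(x). Returns 'constant'
    iff the y = 0 outcome occurs with probability 1."""
    N = len(fvals)                        # N = 2^n
    psi = np.full(N, 1 / np.sqrt(N))      # H^(n) |0...0>
    psi *= (-1.0) ** np.array(fvals)      # phase oracle
    amp0 = psi.sum() / np.sqrt(N)         # final H^(n): amplitude of |y = 0>
    return "constant" if abs(amp0) ** 2 > 0.99 else "balanced"

N = 8                                     # n = 3
assert deutsch_jozsa([0] * N) == "constant"
assert deutsch_jozsa([1] * N) == "constant"
for ones in combinations(range(N), N // 2):   # every balanced function
    f = [1 if x in ones else 0 for x in range(N)]
    assert deutsch_jozsa(f) == "balanced"
print("ok")
```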
So quantum computation solves this problem neatly, but is the problem
really hard classically? If we are restricted to classical input states $|x\rangle$, we
can query the oracle repeatedly, choosing the input $x$ at random (without
replacement) each time. Once we obtain distinct outputs for two different
queries, we have determined that the function is balanced (not constant).
But if the function is in fact constant, we will not be certain it is constant
until we have submitted $2^{n-1} + 1$ queries and have obtained the same response
every time. In contrast, the quantum computation gives a definite response
in only one go. So in this sense (if we demand absolute certainty) the classical
calculation requires a number of queries exponential in $n$, while the quantum
computation does not, and we might therefore claim an exponential quantum
speedup.
But perhaps it is not reasonable to demand absolute certainty of the
classical computation (particularly since any real quantum computer will be
susceptible to errors, so that the quantum computer will also be unable to
attain absolute certainty). Suppose we are satisfied to guess balanced or
constant, with a probability of success
$P(\text{success}) > 1 - \varepsilon.$   (6.119)
If the function is actually balanced, then if we make $k$ queries, the probability
of getting the same response every time is $p = 2^{-(k-1)}$. If after receiving the
same response $k$ consecutive times we guess that the function is constant,
then a quick Bayesian analysis shows that the probability that our guess is
wrong is $\frac{1}{2^{k-1}+1}$ (assuming that balanced and constant are a priori equally
probable). So if we guess after $k$ queries, the probability of a wrong guess is
$1 - P(\text{success}) = \frac{1}{2^{k-1}(2^{k-1} + 1)}.$   (6.120)
Therefore, we can achieve success probability $1 - \varepsilon$ for $\varepsilon^{-1} = 2^{k-1}(2^{k-1} + 1)$, or
$k \simeq \frac{1}{2}\log\frac{1}{\varepsilon}$. Since we can reach an exponentially good success probability
with a polynomial number of trials, it is not really fair to say that the problem
is hard.
Bernstein–Vazirani problem. Exactly the same circuit can be used
to solve another variation on the Deutsch–Jozsa problem. Let's suppose that
our quantum black box computes one of the functions $f_a$, where
$f_a(x) = a \cdot x,$   (6.121)
and $a$ is an $n$-bit string. Our job is to determine $a$.
The quantum algorithm can solve this problem with certainty, given just
one ($n$-qubit) quantum query. For this particular function, the quantum
state in eq. (6.115) becomes
$\frac{1}{2^n} \sum_{x=0}^{2^n - 1} \sum_{y=0}^{2^n - 1} (-1)^{a \cdot x} (-1)^{x \cdot y}\, |y\rangle.$   (6.122)
But in fact
$\frac{1}{2^n} \sum_{x=0}^{2^n - 1} (-1)^{a \cdot x} (-1)^{x \cdot y} = \delta_{a,y},$   (6.123)
so this state is $|a\rangle$. We can execute the circuit once and measure the $n$-qubit
register, finding the $n$-bit string $a$ with probability one.
If only classical queries are allowed, we acquire only one bit of information
from each query, and it takes $n$ queries to determine the value of $a$. Therefore,
we have a clear separation between the quantum and classical difficulty of
the problem. Even so, this example does not probe the relation of BPP
to BQP, because the classical problem is not hard. The number of queries
required classically is only linear in the input size, not exponential.
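A sketch of the Bernstein–Vazirani query (function names ours), confirming that the final Hadamard transform maps the register state of eq. (6.122) exactly onto $|a\rangle$:

```python
import numpy as np

def bernstein_vazirani(a, n):
    """Recover the hidden string a from one simulated quantum query:
    phase oracle (-1)^(a.x) on the uniform superposition, then H^(n)."""
    N = 1 << n
    x = np.arange(N)
    dot = np.array([bin(a & xi).count("1") & 1 for xi in x])   # a.x mod 2
    psi = (-1.0) ** dot / np.sqrt(N)       # state after H^(n) and oracle
    # final H^(n): amplitude of |y> is (1/2^n) sum_x (-1)^(a.x) (-1)^(x.y)
    Hn = np.array([[(-1) ** (bin(xi & yi).count("1") & 1) for xi in x]
                   for yi in x]) / np.sqrt(N)
    out = Hn @ psi
    return int(np.argmax(np.abs(out)))     # the state is exactly |a>

for a in range(8):
    assert bernstein_vazirani(a, 3) == a
print("ok")
```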
Simon's problem. Bernstein and Vazirani managed to formulate a variation on the above problem that is hard classically, and so established for the
first time a "relativized" separation between quantum and classical complexity. We will find it more instructive to consider a simpler example proposed
somewhat later by Daniel Simon.
Once again we are presented with a quantum black box, and this time we
are assured that the box computes a function
$f : \{0,1\}^n \to \{0,1\}^n$   (6.124)
that is 2-to-1. Furthermore, the function has a "period" given by the $n$-bit
string $a$; that is,
$f(x) = f(y) \iff y = x \oplus a,$   (6.125)
where here $\oplus$ denotes the bitwise XOR operation. (So $a$ is the period if we
regard $x$ as taking values in $(\mathbb{Z}_2)^n$ rather than $\mathbb{Z}_{2^n}$.) This is all we know
about $f$. Our job is to determine the value of $a$.
Classically this problem is hard. We need to query the oracle an exponentially large number of times to have any reasonable probability of finding $a$.
We don't learn anything until we are fortunate enough to choose two queries
$x$ and $y$ that happen to satisfy $x \oplus y = a$. Suppose, for example, that we
choose $2^{n/4}$ queries. The number of pairs of queries is less than $(2^{n/4})^2$, and
for each pair $\{x, y\}$, the probability that $x \oplus y = a$ is $2^{-n}$. Therefore, the
probability of successfully finding $a$ is less than
$2^{-n} (2^{n/4})^2 = 2^{-n/2};$   (6.126)
even with exponentially many queries, the success probability is exponentially
small.
If we wish, we can frame the question as a decision problem: Either $f$
is a 1-to-1 function, or it is 2-to-1 with some randomly chosen period $a$, each
occurring with a priori probability $\frac{1}{2}$. We are to determine whether the
function is 1-to-1 or 2-to-1. Then, after $2^{n/4}$ classical queries, our probability
of making a correct guess is
$P(\text{success}) < \frac{1}{2} + \frac{1}{2^{n/2}},$   (6.127)
which does not remain bounded away from $\frac{1}{2}$ as $n$ gets large.
274 CHAPTER 6. QUANTUM COMPUTATION
But with quantum queries the problem is easy! The circuit we use is
essentially the same as above, but now both registers are expanded to n
qubits. We prepare the equally weighted superposition of all n-bit strings
(by acting on |0⟩ with H^{(n)}), and then we query the oracle:

U_f : (1/2^{n/2}) Σ_{x=0}^{2^n−1} |x⟩ |0⟩ → (1/2^{n/2}) Σ_{x=0}^{2^n−1} |x⟩ |f(x)⟩.  (6.128)
Now we measure the second register. (This step is not actually necessary,
but I include it here for the sake of pedagogical clarity.) The measurement
outcome is selected at random from the 2^{n−1} possible values of f(x), each
occurring equiprobably. Suppose the outcome is f(x₀). Then because both
x₀ and x₀ ⊕ a, and only these values, are mapped by f to f(x₀), we have
prepared the state

(1/√2)(|x₀⟩ + |x₀ ⊕ a⟩)  (6.129)

in the first register.
Now we want to extract some information about a. Clearly it would
do us no good to measure the register (in the computational basis) at this
point. We would obtain either the outcome x₀ or x₀ ⊕ a, each occurring with
probability 1/2, but neither outcome would reveal anything about the value of
a.
But suppose we apply the Hadamard transform H^{(n)} to the register before
we measure:

H^{(n)} : (1/√2)(|x₀⟩ + |x₀ ⊕ a⟩)
  → (1/2^{(n+1)/2}) Σ_{y=0}^{2^n−1} [(−1)^{x₀·y} + (−1)^{(x₀⊕a)·y}] |y⟩
  = (1/2^{(n−1)/2}) Σ_{a·y=0} (−1)^{x₀·y} |y⟩.  (6.130)
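Each run of the circuit therefore yields a random y satisfying a·y = 0 (mod 2); once the collected y's span an (n−1)-dimensional space, a is the unique nonzero solution of the resulting linear system over GF(2). A minimal classical sketch of this post-processing (the helper names are ours, and for determinism we enumerate every allowed outcome rather than sampling repeated runs):

```python
def dot2(x, y):
    # bitwise inner product mod 2
    return bin(x & y).count("1") % 2

def recover_a(n, samples):
    # Gaussian elimination over GF(2) via an XOR basis, then brute-force
    # the one-dimensional nullspace (fine for small n)
    basis = []
    for y in samples:
        for b in basis:
            y = min(y, y ^ b)          # reduce y against the basis
        if y:
            basis.append(y)
            basis.sort(reverse=True)
    candidates = [c for c in range(1, 2 ** n)
                  if all(dot2(c, y) == 0 for y in basis)]
    return candidates[0] if len(candidates) == 1 else None

n, a = 5, 0b10110
# stand-in for repeated runs of Simon's circuit: every y with a.y = 0
samples = [y for y in range(2 ** n) if dot2(y, a) == 0]
print(recover_a(n, samples) == a)   # prints True
```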
so that |s⟩ is rotated by θ from the axis |ω⊥⟩ normal to |ω⟩ in the plane. U_ω
reflects a vector in the plane about the axis |ω⊥⟩, and U_s reflects a vector
about the axis |s⟩. Together, the two reflections rotate the vector by 2θ:
θ ≃ 1/√N,  (6.149)

for N large; if we choose

T = (π/4) √N (1 + O(N^{−1/2})),  (6.150)
U_s = 2|s⟩⟨s| − 1,  (6.155)

that reflects a vector about the axis defined by the vector |s⟩. How do
we build this transformation efficiently from quantum gates? Since |s⟩ =
H^{(n)}|0⟩, where H^{(n)} is the bitwise Hadamard transformation, we may write

U_s = H^{(n)} (2|0⟩⟨0| − 1) H^{(n)},  (6.156)

so it will suffice to construct a reflection about the axis |0⟩. We can easily
build this reflection from an n-bit Toffoli gate Λ^{(n)}.
Recall that
H σ_x H = σ_z;  (6.157)
[Circuit diagram: an n-bit Toffoli gate Λ^{(n)} with its target qubit conjugated by Hadamard gates, equivalent to a controlled^{(n−1)}-Z gate.]
after conjugating the last bit by H, Λ^{(n)} becomes controlled^{(n−1)}-σ_z, which
flips the phase of |11…1⟩ and acts trivially on all other computational
basis states. Conjugating by NOT^{(n)}, we obtain U_s, aside from an irrelevant
overall minus sign.
You will show in an exercise that the n-bit Toffoli gate Λ^{(n)} can be con-
structed from 2n − 5 3-bit Toffoli gates Λ^{(3)} (if sufficient scratch space is
available). Therefore, the circuit that constructs U_s has a size linear in
n = log N. Grover's database search (assuming the oracle answers a query
instantaneously) takes a time of order √N log N. If we regard the oracle as
a subroutine that performs a function evaluation in polylog time, then the
search takes time of order √N poly(log N).
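The two reflections are easy to simulate numerically. A toy sketch (using numpy; the marked index w is an arbitrary choice), verifying that T ≈ (π/4)√N iterations of the oracle reflection followed by U_s drive nearly all the amplitude onto the marked state:

```python
import numpy as np

# Toy Grover iteration on N = 2**n states: the oracle flips the sign of
# the marked state (reflection about |w_perp>), then U_s = 2|s><s| - 1
# reflects about the equal superposition |s>.
n = 10
N = 2 ** n
w = 137                              # marked item (arbitrary choice)
s = np.full(N, 1 / np.sqrt(N))       # equal superposition |s>

psi = s.copy()
T = int(round(np.pi / 4 * np.sqrt(N)))   # eq. (6.150): T ~ (pi/4) sqrt(N)
for _ in range(T):
    psi[w] = -psi[w]                  # oracle: reflect about |w_perp>
    psi = 2 * s * (s @ psi) - psi     # U_s: reflect about |s>

print(T, psi[w] ** 2)   # success probability close to 1
```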
hence, for T < N/2, we are just as likely to guess "even" when the actual
PARITY(X⃗) is odd as when it is even (on average). Our quantum algorithm
6.8. DISTRIBUTED DATABASE SEARCH 293
fails to tell us anything about the value of PARITY(X⃗); that is, averaged
over the (a priori equally likely) possible values of X_i, we are just as likely
to be right as wrong.
We can also show, by exhibiting an explicit algorithm (exercise), that
N/2 queries (assuming N even) are sufficient to determine PARITY (either
probabilistically or deterministically). In a sense, then, we can achieve a
factor of 2 speedup compared to classical queries. But that is the best we
can do.
6.9 Periodicity
So far, the one case for which we have found an exponential separation be-
tween the speed of a quantum algorithm and the speed of the corresponding
¹⁵R. Cleve et al., "Quantum Entanglement and the Communication Complexity of the
Inner Product Function," quant-ph/9708019; W. van Dam et al., "Multiparty Quantum
Communication Complexity," quant-ph/9710054.
6.9. PERIODICITY 297
classical algorithm is the case of Simon's problem. Simon's algorithm exploits
quantum parallelism to speed up the search for the period of a function. Its
success encourages us to seek other quantum algorithms designed for other
kinds of period finding.
Simon studied periodic functions taking values in (Z2)n . For that purpose
the n-bit Hadamard transform H^{(n)} was a powerful tool. If we wish instead to
study periodic functions taking values in Z_{2^n}, the (discrete) Fourier transform
will be a tool of comparable power.
The moral of Simon's problem is that, while finding needles in a haystack
may be difficult, finding periodically spaced needles in a haystack can be far
easier. For example, if we scatter a photon off a periodic array of needles,
the photon is likely to be scattered in one of a set of preferred directions,
where the Bragg scattering condition is satisfied. These preferred directions
depend on the spacing between the needles, so by scattering just one photon,
we can already collect some useful information about the spacing. We should
further explore the implications of this metaphor for the construction of
efficient quantum algorithms.
So imagine a quantum oracle that computes a function
f : {0,1}^n → {0,1}^m,  (6.192)
that has an unknown period r, where r is a positive integer satisfying
1 ≤ r ≤ 2^n.  (6.193)
That is,
f(x) = f(x + mr),  (6.194)
where m is any integer such that x and x + mr lie in {0, 1, 2, …, 2^n − 1}.
We are to find the period r. Classically, this problem is hard. If r is, say,
of order 2^{n/2}, we will need to query the oracle of order 2^{n/4} times before we
are likely to find two values of x that are mapped to the same value of f(x),
and hence learn something about r. But we will see that there is a quantum
algorithm that finds r in time poly(n).
Even if we know how to compute efficiently the function f(x), it may
be a hard problem to determine its period. Our quantum algorithm can
be applied to finding, in poly(n) time, the period of any function that we
can compute in poly(n) time. Efficient period finding allows us to efficiently
solve a variety of (apparently) hard problems, such as factoring an integer,
or evaluating a discrete logarithm.
The key idea underlying quantum period finding is that the Fourier trans-
form can be evaluated by an efficient quantum circuit (as discovered by Peter
Shor). The quantum Fourier transform (QFT) exploits the power of quantum
parallelism to achieve an exponential speedup of the well-known (classical)
fast Fourier transform (FFT). Since the FFT has such a wide variety of
applications, perhaps the QFT will also come into widespread use someday.
[Circuit diagram: 3-qubit QFT. |x₂⟩ passes through H, then controlled-R₁ and controlled-R₂; |x₁⟩ through H, then controlled-R₁; |x₀⟩ through H; the controls are supplied by the less significant qubits.]
does the job (but note that the order of the bits has been reversed in the
output). Each Hadamard gate acts as
H : |x_k⟩ → (1/√2)(|0⟩ + e^{2πi(.x_k)} |1⟩).  (6.221)
The other contributions to the relative phase of j0i and j1i in the kth qubit
are provided by the two-qubit conditional rotations, where
R_d = ( 1        0
        0   e^{iπ/2^d} ),  (6.222)

and d = (k − j) is the "distance" between the qubits.
In the case n = 3, the QFT is constructed from three H gates and three
controlled-R gates. For general n, the obvious generalization of this circuit
requires n H gates and (n choose 2) = ½ n(n − 1) controlled-R's. A two-qubit gate
is applied to each pair of qubits, again with controlled relative phase π/2^d,
where d is the "distance" between the qubits. Thus the circuit family that
implements the QFT has a size of order (log N)².
We can reduce the circuit complexity to linear in log N if we are will-
ing to settle for an implementation of fixed accuracy, because the two-qubit
gates acting on distantly separated qubits contribute only exponentially small
phases. If we drop the gates acting on pairs with distance greater than m,
then each term in eq. (6.217) is replaced by an approximation to m bits of
accuracy; the total error in xy/2^n is certainly no worse than n 2^{−m}, so we
can achieve accuracy ε in xy/2^n with m ∼ log(n/ε). If we retain only the
gates acting on qubit pairs with distance m or less, then the circuit size is
mn ∼ n log(n/ε).
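This error bound can be checked directly with a small classical computation. In the bitwise decomposition xy/2^n = Σ_{j,k} x_j y_k 2^{j+k−n}, the term with bits j and k is supplied by a gate of distance d = n − 1 − j − k (our bookkeeping convention); dropping the gates with d > m drops exactly those terms. A sketch of the bookkeeping, not of a quantum circuit:

```python
# Fixed-accuracy QFT phases: drop conditional rotations between qubit
# pairs with distance d = n-1-j-k greater than m, and check that the
# worst-case error in the phase xy/2^n stays below the bound n * 2^-m.
n, m = 6, 3

def trunc_phase(x, y, m):
    # sum bitwise contributions x_j * y_k / 2^(d+1), keeping only the
    # terms supplied by gates of distance 0 <= d <= m
    total = 0.0
    for j in range(n):
        for k in range(n):
            d = n - 1 - j - k
            if 0 <= d <= m:
                bit = ((x >> j) & 1) * ((y >> k) & 1)
                total += bit / 2 ** (d + 1)
    return total

worst = 0.0
for x in range(2 ** n):
    for y in range(2 ** n):
        exact = (x * y / 2 ** n) % 1.0
        err = abs(exact - trunc_phase(x, y, m) % 1.0)
        worst = max(worst, min(err, 1 - err))  # phase distance mod 1

print(worst, n * 2 ** -m)  # worst-case error vs. the bound n * 2^-m
```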
In fact, if we are going to measure in the computational basis immedi-
ately after implementing the QFT (or its inverse), a further simplification
is possible: no two-qubit gates are needed at all! We first remark that the
controlled-R_d gate acts symmetrically on the two qubits: it acts trivially
on |00⟩, |01⟩, and |10⟩, and modifies the phase of |11⟩ by e^{iπ/2^d}. Thus, we
can interchange the "control" and "target" bits without modifying the gate.
With this change, our circuit for the 3-qubit QFT can be redrawn as:
[Circuit diagram: the 3-qubit QFT redrawn with controls and targets exchanged. |x₂⟩: H first, then serving as control; |x₁⟩: controlled-R₁, then H; |x₀⟩: controlled-R₂, controlled-R₁, then H.]
Once we have measured |y₀⟩, we know the value of the control bit in the
controlled-R₁ gate that acted on the first two qubits. Therefore, we will
obtain the same probability distribution of measurement outcomes if, instead
of applying controlled-R₁ and then measuring, we instead measure y₀ first,
and then apply (R₁)^{y₀} to the next qubit, conditioned on the outcome of the
measurement of the first qubit. Similarly, we can replace the controlled-R₁
and controlled-R₂ gates acting on the third qubit by the single-qubit rotation

(R₂)^{y₀} (R₁)^{y₁}  (6.223)

(that is, a rotation with relative phase π(.y₁y₀)) after the values of y₁ and y₀
have been measured.
Altogether then, if we are going to measure after performing the QFT,
only n Hadamard gates and n − 1 single-qubit rotations are needed to im-
plement it. The QFT is remarkably simple!
6.10 Factoring
6.10.1 Factoring as period nding
What does the factoring problem (finding the prime factors of a large com-
posite positive integer) have to do with periodicity? There is a well-known
6.10. FACTORING 305
(randomized) reduction of factoring to determining the period of a function.
Although this reduction is not directly related to quantum computing, we
will discuss it here for completeness, and because the prospect of using a
quantum computer as a factoring engine has generated so much excitement.
Suppose we want to find a factor of the n-bit number N. Select pseudo-
randomly a < N, and compute the greatest common divisor GCD(a, N),
which can be done efficiently (in a time of order (log N)³) using the Euclidean
algorithm. If GCD(a, N) ≠ 1, then the GCD is a nontrivial factor of N, and
we are done. So suppose GCD(a, N) = 1.
[Aside: The Euclidean algorithm. To compute GCD(N₁, N₂) (for N₁ >
N₂) first divide N₁ by N₂, obtaining remainder R₁. Then divide N₂ by
R₁, obtaining remainder R₂. Divide R₁ by R₂, etc., until the remainder
is 0. The last nonzero remainder is R = GCD(N₁, N₂). To see that the
algorithm works, just note that (1) R divides all previous remainders
and hence also N₁ and N₂, and (2) any number that divides N₁ and
N₂ will also divide all remainders, including R. A number that divides
both N₁ and N₂, and also is divided by any number that divides both
N₁ and N₂, must be GCD(N₁, N₂). To see how long the Euclidean
algorithm takes, note that

R_j = q R_{j+1} + R_{j+2},  (6.224)

where q ≥ 1 and R_{j+2} < R_{j+1}; therefore R_{j+2} < ½ R_j. Two divisions
reduce the remainder by at least a factor of 2, so no more than 2 log N₁
divisions are required, with each division using O((log N)²) elementary
operations; the total number of operations is O((log N)³).]
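The aside translates directly into a few lines of code. A sketch (function name is ours), with a division counter so the 2 log₂ N₁ bound can be checked against an example:

```python
import math

def euclid_gcd(n1, n2):
    # repeatedly divide and keep the remainder; the last nonzero
    # remainder is GCD(n1, n2)
    divisions = 0
    while n2 != 0:
        n1, n2 = n2, n1 % n2   # one division step
        divisions += 1
    return n1, divisions

N1, N2 = 2 ** 30 + 1, 12345
g, steps = euclid_gcd(N1, N2)
print(g, steps)   # g matches math.gcd; steps well under 2*log2(N1)
```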
The numbers a < N coprime to N (having no common factor with N)
form a finite group under multiplication mod N. [Why? We need to establish
that each element a has an inverse. But for given a < N coprime to N, each
ab (mod N) is distinct, as b ranges over all b < N coprime to N.¹⁶ Therefore,
for some b, we must have ab ≡ 1 (mod N); hence the inverse of a exists.]
Each element a of this finite group has a finite order r, the smallest positive
integer such that

a^r ≡ 1 (mod N).  (6.225)

¹⁶If N divides ab − ab′, it must divide b − b′.
The order of a mod N is the period of the function

f_{N,a}(x) = a^x (mod N).  (6.226)

We know there is an efficient quantum algorithm that can find the period of
a function; therefore, if we can compute f_{N,a} efficiently, we can find the order
of a efficiently.
Computing f_{N,a} may look difficult at first, since the exponent x can be
very large. But if x < 2^m and we express x as a binary expansion

x = x_{m−1} 2^{m−1} + x_{m−2} 2^{m−2} + … + x₀,  (6.227)

we have

a^x (mod N) = (a^{2^{m−1}})^{x_{m−1}} (a^{2^{m−2}})^{x_{m−2}} … (a)^{x₀} (mod N).  (6.228)

Each a^{2^j} has a large exponent, but can be computed efficiently by a classical
computer, using repeated squaring:

a^{2^j} (mod N) = (a^{2^{j−1}})² (mod N).  (6.229)

So only m − 1 (classical) mod N multiplications are needed to assemble a
table of all the a^{2^j}'s.
The computation of a^x (mod N) is carried out by executing a routine:

INPUT 1
For j = 0 to m − 1: if x_j = 1, MULTIPLY by a^{2^j}.

This routine requires at most m mod N multiplications, each requiring of
order (log N)² elementary operations.¹⁷ Since r < N, we will have a rea-
sonable chance of success at extracting the period if we choose m ∼ 2 log N.
Hence, the computation of f_{N,a} can be carried out by a circuit family of size
O((log N)³). Schematically, the circuit has the structure:
¹⁷Using tricks for performing efficient multiplication of very large numbers, the number
of elementary operations can be reduced to O(log N log log N log log log N); thus, asymp-
totically for large N, a circuit family with size O(log² N log log N log log log N) can com-
pute f_{N,a}.
[Circuit diagram: modular exponentiation. The register initialized to |1⟩ is multiplied by a, a², a⁴, …, each multiplication controlled by the corresponding bit x₀, x₁, x₂ of the exponent.]
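The routine above is one loop in code. A sketch of repeated squaring, mirroring eqs. (6.228) and (6.229) (Python's built-in pow(a, x, N) does the same thing):

```python
def mod_exp(a, x, N):
    # precompute a^(2^j) mod N by repeated squaring (eq. (6.229)), and
    # multiply in the factors selected by the binary digits of x
    result = 1
    power = a % N                           # a^(2^0)
    while x:
        if x & 1:
            result = (result * power) % N   # MULTIPLY by a^(2^j)
        power = (power * power) % N         # square: a^(2^j) -> a^(2^(j+1))
        x >>= 1
    return result

print(mod_exp(7, 123456789, 1000003) == pow(7, 123456789, 1000003))  # prints True
```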
6.10.2 RSA
Does anyone care whether factoring is easy or hard? Well, yes, some people
do.
The presumed difficulty of factoring is the basis of the security of the
widely used RSA18 scheme for public key cryptography, which you may have
used yourself if you have ever sent your credit card number over the internet.
The idea behind public key cryptography is to avoid the need to exchange
a secret key (which might be intercepted and copied) between the parties
that want to communicate. The enciphering key is public knowledge. But
using the enciphering key to infer the deciphering key involves a prohibitively
difficult computation. Therefore, Bob can send the enciphering key to Alice
and everyone else, but only Bob will be able to decode the message that Alice
(or anyone else) encodes using the key. Encoding is a \one-way function"
that is easy to compute but very hard to invert.
¹⁸For Rivest, Shamir, and Adleman.
(Of course, Alice and Bob could have avoided the need to exchange the
public key if they had decided on a private key in their previous clandestine
meeting. For example, they could have agreed to use a long random string
as a one-time pad for encoding and decoding. But perhaps Alice and Bob
never anticipated that they would someday need to communicate privately.
Or perhaps they did agree in advance to use a one-time pad, but they have
now used up their private key, and they are loath to reuse it for fear that an
eavesdropper might then be able to break their code. Now they are too far
apart to safely exchange a new private key; public key cryptography appears
to be their most secure option.)
To construct the public key Bob chooses two large prime numbers p and
q. But he does not publicly reveal their values. Instead he computes the
product
N = pq.  (6.239)
Since Bob knows the prime factorization of N, he also knows the value of the
Euler function φ(N), the number of numbers less than N that are coprime
to N. In the case of a product of two primes it is

φ(N) = N − p − q + 1 = (p − 1)(q − 1)  (6.240)

(only multiples of p and q share a factor with N). It is easy to find φ(N) if
you know the prime factorization of N, but it is hard if you know only N.
Bob then pseudo-randomly selects e < φ(N) that is coprime with φ(N).
He reveals to Alice (and anyone else who is listening) the value of N and e,
but nothing else.
Alice converts her message to ASCII, a number a < N. She encodes the
message by computing

b = f(a) = a^e (mod N),  (6.241)
which she can do quickly by repeated squaring. How does Bob decode the
message?
Suppose that a is coprime to N (which is overwhelmingly likely if p and
q are very large; anyway, Alice can check before she encodes). Then

a^{φ(N)} ≡ 1 (mod N)  (6.242)

(Euler's theorem). This is so because the numbers less than N and coprime
to N form a group (of order φ(N)) under mod N multiplication. The order of
any group element must divide the order of the group (the powers of a form
a subgroup). Since GCD(e, φ(N)) = 1, we know that e has a multiplicative
inverse d = e^{−1} mod φ(N):

ed ≡ 1 (mod φ(N)).  (6.243)
The value of d is Bob's closely guarded secret; he uses it to decode by com-
puting:

f^{−1}(b) = b^d (mod N)
  = a^{ed} (mod N)
  = a · (a^{φ(N)})^{integer} (mod N)
  = a (mod N).  (6.244)
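The whole protocol fits in a few lines with toy-sized numbers (the primes and message below are illustrative only; real RSA moduli are hundreds of digits long):

```python
# Toy RSA round trip with artificially small primes.
p, q = 61, 53
N = p * q                  # public modulus, eq. (6.239)
phi = (p - 1) * (q - 1)    # Euler function, eq. (6.240)
e = 17                     # public exponent, coprime to phi
d = pow(e, -1, phi)        # Bob's secret d = e^-1 mod phi (Python >= 3.8)

a = 1234                   # Alice's message, a < N and coprime to N
b = pow(a, e, N)           # encode: b = a^e mod N, eq. (6.241)
print(pow(b, d, N) == a)   # decode: b^d = a mod N, eq. (6.244); prints True
```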
[Aside: How does Bob compute d = e^{−1}? The multiplicative inverse is a
byproduct of carrying out the Euclidean algorithm to compute GCD(e, φ(N)) =
1. Tracing the chain of remainders from the bottom up, starting with
R_n = 1:

1 = R_n = R_{n−2} − q_{n−1} R_{n−1}
R_{n−1} = R_{n−3} − q_{n−2} R_{n−2}
R_{n−2} = R_{n−4} − q_{n−3} R_{n−3}
etc. …  (6.245)
(where the q_j's are the quotients), so that

1 = (1 + q_{n−1} q_{n−2}) R_{n−2} − q_{n−1} R_{n−3}
1 = (−q_{n−1} − q_{n−3}(1 + q_{n−1} q_{n−2})) R_{n−3}
  + (1 + q_{n−1} q_{n−2}) R_{n−4},
etc. …  (6.246)

Continuing, we can express 1 as a linear combination of any two suc-
cessive remainders; eventually we work our way up to

1 = d · e + q · φ(N),  (6.247)

and identify d as e^{−1} (mod φ(N)).]
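Tracing the remainders from the bottom up is equivalent to carrying one coefficient forward alongside the Euclidean recursion. An iterative sketch of the aside (function name is ours):

```python
def inverse_mod(e, phi):
    # extended Euclidean algorithm: track the coefficient of e so that
    # at the end 1 = d*e + q*phi (eq. (6.247)); return d mod phi
    old_r, r = e, phi
    old_s, s = 1, 0
    while r != 0:
        quotient = old_r // r
        old_r, r = r, old_r - quotient * r      # Euclidean step
        old_s, s = s, old_s - quotient * s      # carry the coefficient
    assert old_r == 1, "e and phi must be coprime"
    return old_s % phi

phi = 3120                      # = (61-1)*(53-1), the toy value above
d = inverse_mod(17, phi)
print(d, (17 * d) % phi)        # prints 2753 1
```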
Of course, if Eve has a superfast factoring engine, the RSA scheme is
insecure. She factors N, finds φ(N), and quickly computes d. In fact, she
does not really need to factor N; it is sufficient to compute the order modulo
N of the encoded message a^e (mod N). Since e is coprime with φ(N), the
order of a^e (mod N) is the same as the order of a (both elements generate
the same orbit, or cyclic subgroup). Once the order Ord(a) is known, Eve
computes d̃ such that

d̃ e ≡ 1 (mod Ord(a)),  (6.248)

so that

(a^e)^{d̃} ≡ a · (a^{Ord(a)})^{integer} (mod N) ≡ a (mod N),  (6.249)

and Eve can decipher the message. If our only concern is to defeat RSA,
we run the Shor algorithm to find r = Ord(a^e), and we needn't worry about
whether we can use r to extract a factor of N or not.
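Eve's strategy can be checked end to end on toy numbers, with a brute-force loop standing in for Shor's order-finding step (all key values here are illustrative):

```python
# Eve's attack without factoring: find r = Ord of the ciphertext mod N,
# compute d~ = e^-1 mod r, and decode.
N, e = 3233, 17            # toy public key (N = 61 * 53)
a = 1234                   # Alice's plaintext
b = pow(a, e, N)           # intercepted ciphertext a^e mod N

def order(x, N):
    # smallest r > 0 with x^r = 1 mod N; brute force stands in here
    # for the quantum period-finding step
    y, r = x % N, 1
    while y != 1:
        y = (y * x) % N
        r += 1
    return r

r = order(b, N)            # Ord(a^e) = Ord(a), since GCD(e, phi) = 1
d_tilde = pow(e, -1, r)    # eq. (6.248): d~ e = 1 mod Ord(a)
print(pow(b, d_tilde, N) == a)  # eq. (6.249): Eve recovers a; prints True
```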
How important are such prospective cryptographic applications of quan-
tum computing? When fast quantum computers are readily available, con-
cerned parties can stop using RSA, or can use longer keys to stay a step
ahead of contemporary technology. However, people with secrets sometimes
want their messages to remain confidential for a while (30 years?). They may
not be satisfied by longer keys if they are not confident about the pace of
future technological advances.
And if they shun RSA, what will they use instead? Not so many suitable
one-way functions are known, and others besides RSA are (or may be) vul-
nerable to a quantum attack. So there really is a lot at stake. If fast large
scale quantum computers become available, the cryptographic implications
may be far reaching.
But while quantum theory taketh away, quantum theory also giveth;
quantum computers may compromise public key schemes, but also offer an
alternative: secure quantum key distribution, as discussed in Chapter 4.
→ α₁ |λ₁⟩ ⊗ ( ½(1 + λ₁)|0⟩ + ½(1 − λ₁)|1⟩ )^{⊗n}
  + α₂ |λ₂⟩ ⊗ ( ½(1 + λ₂)|0⟩ + ½(1 − λ₂)|1⟩ )^{⊗n}.  (6.257)
If λ₁ ≠ λ₂, the overlap between the two states of the n control bits is ex-
ponentially small for large n; by measuring the control bits, we can perform
the orthogonal projection onto the {|λ₁⟩, |λ₂⟩} basis, at least to an excellent
approximation.
If we use enough control bits, we have a large enough sample to measure
Prob(0) = ½(1 + cos 2πλ) with reasonable statistical confidence. By execut-
ing a controlled-(iU), we can also measure ½(1 + sin 2πλ), which suffices to
determine λ modulo an integer.
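The inversion of the two measured probabilities is a one-liner. A classical simulation of the control-bit statistics (the phase value and sample size are arbitrary test choices): sample the control bit with Prob(0) = ½(1 + cos 2πλ), and from the controlled-(iU) circuit with ½(1 + sin 2πλ); together the two estimates fix λ modulo an integer:

```python
import math, random

random.seed(1)
lam = 0.3125            # the phase to be estimated (arbitrary test value)
trials = 200_000

# simulated control-bit statistics from the two circuits
p_cos = sum(random.random() < (1 + math.cos(2 * math.pi * lam)) / 2
            for _ in range(trials)) / trials
p_sin = sum(random.random() < (1 + math.sin(2 * math.pi * lam)) / 2
            for _ in range(trials)) / trials

# invert: cos(2 pi lam) = 2*p_cos - 1, sin(2 pi lam) = 2*p_sin - 1
est = math.atan2(2 * p_sin - 1, 2 * p_cos - 1) / (2 * math.pi) % 1.0
print(est)    # close to lam = 0.3125
```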
6.11. PHASE ESTIMATION 315
However, in the factoring algorithm, we need to measure the phase of
e^{2πik/r} to exponential accuracy, which seems to require an exponential number
of trials. Suppose, though, that we can efficiently compute high powers of U
(as is the case for U_a), such as

U^{2^j}.  (6.258)

By applying the above procedure to measurement of U^{2^j}, we determine

exp(2πi 2^j λ),  (6.259)

where e^{2πiλ} is an eigenvalue of U. Hence, measuring U^{2^j} to one bit of accu-
racy is equivalent to measuring the jth bit of the eigenvalue of U.
We can use this phase estimation procedure for order finding, and hence
factorization. We invert eq. (6.253) to obtain

|x₀⟩ = (1/√r) Σ_{k=0}^{r−1} |λ_k⟩;  (6.260)

each computational basis state (for x₀ ≠ 0) is an equally weighted superpo-
sition of r eigenstates of U_a.
Measuring the eigenvalue, we obtain λ_k = e^{2πik/r}, with k selected from
{0, 1, …, r − 1} equiprobably. If r < 2^n, we measure to 2n bits of precision to
determine k/r. In principle, we can carry out this procedure in a computer
that stores fewer qubits than we would need to evaluate the QFT, because
we can attack just one bit of k/r at a time.
But it is instructive to imagine that we incorporate the QFT into this
phase estimation procedure. Suppose the circuit
[Circuit diagram: control qubits, each prepared with a Hadamard, conditionally apply U, U², U⁴ to the eigenstate |λ⟩.]
acts on the eigenstate |λ⟩ of the unitary transformation U. The conditional
U prepares (1/√2)(|0⟩ + λ|1⟩), the conditional U² prepares (1/√2)(|0⟩ + λ²|1⟩), the
conditional U⁴ prepares (1/√2)(|0⟩ + λ⁴|1⟩), and so on. We could perform a
Hadamard and measure each of these qubits to sample the probability dis-
tribution governed by the jth bit of θ, where λ = e^{2πiθ}. But a more efficient
method is to note that the state prepared by the circuit is

(1/2^{m/2}) Σ_{y=0}^{2^m−1} e^{2πiθy} |y⟩.  (6.261)
A better way to learn the value of θ is to perform the QFT^{(m)}, not the
Hadamard H^{(m)}, before we measure.
Considering the case m = 3 for clarity, the circuit that prepares and then
Fourier analyzes the state

(1/√8) Σ_{y=0}^{7} e^{2πiθy} |y⟩  (6.262)
is
[Circuit diagram: three control qubits, each prepared by H, conditionally apply U, U², U⁴ to |λ⟩; a final stage of Hadamards and conditional phase rotations (the Fourier-analysis step) then yields the outcomes ỹ₀, ỹ₁, ỹ₂.]
This circuit very nearly carries out our strategy for phase estimation out-
lined above, but with a significant modification. Before we execute the final
Hadamard transformation and measurement of ỹ₁ and ỹ₂, some conditional
phase rotations are performed. It is those phase rotations that distinguish
the QFT^{(3)} from the Hadamard transform H^{(3)}, and they strongly enhance the
reliability with which we can extract the value of θ.
We can understand better what the conditional rotations are doing if we
suppose that θ = k/8, for k ∈ {0, 1, 2, …, 7}; in that case, we know that the
Fourier transform will generate the output ỹ = k with probability one. We
may express k as the binary expansion

k = k₂k₁k₀ ≡ k₂ · 4 + k₁ · 2 + k₀.  (6.263)
6.12. DISCRETE LOG 317
In fact, the circuit for the least significant bit ỹ₀ of the Fourier transform
is precisely Kitaev's measurement circuit applied to the unitary U⁴, whose
eigenvalue is

(e^{2πiθ})⁴ = e^{iπk} = e^{iπk₀} = ±1.  (6.264)

The measurement circuit distinguishes eigenvalues ±1 perfectly, so that ỹ₀ =
k₀.
The circuit for the next bit ỹ₁ is almost the measurement circuit for U²,
with eigenvalue

(e^{2πiθ})² = e^{iπk/2} = e^{iπ(k₁.k₀)},  (6.265)

except that the conditional phase rotation has been inserted, which multi-
plies the phase by exp[−iπ(.k₀)], resulting in e^{iπk₁}. Again, applying a Hadamard
followed by measurement, we obtain the outcome ỹ₁ = k₁ with certainty.
Similarly, the circuit for ỹ₂ measures the eigenvalue

e^{2πiθ} = e^{iπk/4} = e^{iπ(k₂.k₁k₀)},  (6.266)

except that the conditional rotation removes e^{iπ(.k₁k₀)}, so that the outcome
is ỹ₂ = k₂ with certainty.
Thus, the QFT implements the phase estimation routine with maximal
cleverness. We measure the less significant bits of θ first, and we exploit
the information gained in the measurements to improve the reliability of our
estimate of the more significant bits. Keeping this interpretation in mind,
you will find it easy to remember the circuit for the QFT^{(n)}!
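The claim that θ = k/8 yields ỹ = k with probability one is easy to verify by linear algebra (numpy sketch; we take the Fourier-analysis step to be the inverse QFT matrix, the sign convention that recovers y from the state of eq. (6.262)):

```python
import numpy as np

# For each theta = k/8, prepare 8^{-1/2} sum_y e^{2 pi i theta y} |y>
# (eq. (6.262)), apply the inverse QFT, and check that the measurement
# outcome is y~ = k with certainty.
m = 3
M = 2 ** m
ok = True
for k in range(M):
    theta = k / M
    state = np.exp(2j * np.pi * theta * np.arange(M)) / np.sqrt(M)
    # inverse QFT matrix: entries e^{-2 pi i x y / M} / sqrt(M)
    Finv = np.exp(-2j * np.pi * np.outer(np.arange(M), np.arange(M)) / M) / np.sqrt(M)
    probs = np.abs(Finv @ state) ** 2
    ok = ok and np.argmax(probs) == k and abs(probs[k] - 1) < 1e-9
print(ok)   # prints True
```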
c) Suppose that ρ = |ψ⟩⟨ψ| and ρ̃ = |ψ̃⟩⟨ψ̃| are pure states. Use (b) to show
that

Σ_a |P_a − P̃_a| ≤ 2 ‖ |ψ⟩ − |ψ̃⟩ ‖.  (6.275)