Introduction and Overview: 1.1 Physics of Information
2 CHAPTER 1. INTRODUCTION AND OVERVIEW
has two input bits and one output bit, and we can’t recover a unique input
from the output bit. According to Landauer’s principle, since about one
bit is erased by the gate (averaged over its possible inputs), work of at least
W = kT ln 2 is needed to operate the gate. If we have a finite supply of
batteries, there appears to be a theoretical limit to how long a computation
we can perform.
But Charles Bennett found in 1973 that any computation can be per-
formed using only reversible steps, and so in principle requires no dissipation
and no power expenditure. We can actually construct a reversible version
of the NAND gate that preserves all the information about the input: For
example, the (Toffoli) gate
$$(a, b, c) \to (a, b, c \oplus (a \wedge b))$$
is a reversible 3-bit gate that flips the third bit if the first two both take
the value 1 and does nothing otherwise. The third output bit becomes the
NAND of a and b if c = 1. We can transform an irreversible computation
cost).
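The properties claimed for the Toffoli gate above can be checked by brute force over its eight inputs. The following is a minimal sketch; the function names `toffoli` and `nand` are illustrative choices, not notation from the text.

```python
from itertools import product

def toffoli(a, b, c):
    """Flip the third bit if and only if the first two bits are both 1."""
    return a, b, c ^ (a & b)

def nand(a, b):
    """Ordinary (irreversible) NAND of two bits."""
    return 1 - (a & b)

inputs = list(product((0, 1), repeat=3))

# Reversibility: applying the gate twice returns the input,
# so the gate permutes the 8 possible 3-bit strings.
assert all(toffoli(*toffoli(*bits)) == bits for bits in inputs)
assert len({toffoli(*bits) for bits in inputs}) == 8

# With c = 1 the third output bit is NAND(a, b),
# while a and b are carried along unchanged.
for a, b in product((0, 1), repeat=2):
    print(a, b, "->", toffoli(a, b, 1), "NAND =", nand(a, b))
```

Because the first two bits are preserved, no information about the input is erased, which is why Landauer's bound does not force any dissipation here.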
These examples illustrate that work at the interface of physics and infor-
mation has generated noteworthy results of interest to both physicists and
computer scientists.
where $c = (64/9)^{1/3} \approx 1.9$. The current state of the art is that the 65-digit
factors of a 130-digit number can be found in about one month by a
network of hundreds of workstations. Using this to estimate the prefactor
in Eq. 1.3, we can estimate that factoring a 400-digit number would take
about $10^{10}$ years, the age of the universe. So even with vast improvements
in technology, factoring a 400-digit number will be out of reach for a while.
The factoring problem is interesting from the perspective of complexity
theory, as an example of a problem presumed to be intractable; that is, a
problem that can’t be solved in a time bounded by a polynomial in the size
of the input, in this case log n. But it is also of practical importance, because
the difficulty of factoring is the basis of schemes for public key cryptography,
such as the widely used RSA scheme.
The exciting new result that Shor found is that a quantum computer can
factor in polynomial time, e.g., in time $O[(\ln n)^3]$. So if we had a quantum
computer that could factor a 130-digit number in one month (of course we
don’t, at least not yet!), running Shor’s algorithm it could factor that 400-digit
number in less than 3 years. The harder the problem, the greater the
advantage enjoyed by the quantum computer.
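The two time estimates quoted above can be reproduced with a few lines of arithmetic. This sketch rests on one assumption: Eq. 1.3 is not reproduced in this excerpt, so the classical running time is taken to have the usual heuristic number-field-sieve form $\exp[c (\ln n)^{1/3} (\ln \ln n)^{2/3}]$ with the constant $c$ quoted in the text. Both estimates are calibrated to the benchmark of one month for a 130-digit number; the function names are illustrative.

```python
import math

C = (64 / 9) ** (1 / 3)  # the constant c quoted in the text, about 1.9

def nfs_exponent(digits):
    """Exponent c (ln n)^(1/3) (ln ln n)^(2/3) for an n with the given digit count.

    Assumed functional form; Eq. 1.3 itself is not shown in this excerpt.
    """
    ln_n = digits * math.log(10)
    return C * ln_n ** (1 / 3) * math.log(ln_n) ** (2 / 3)

def classical_years(digits, ref_digits=130, ref_months=1.0):
    """Classical estimate, scaled from the 130-digit, one-month benchmark."""
    ratio = math.exp(nfs_exponent(digits) - nfs_exponent(ref_digits))
    return ref_months * ratio / 12

def shor_years(digits, ref_digits=130, ref_months=1.0):
    """Quantum estimate, assuming time grows as (ln n)^3, i.e. as digits^3."""
    return ref_months * (digits / ref_digits) ** 3 / 12

print(f"classical, 400 digits: {classical_years(400):.1e} years")
print(f"quantum,   400 digits: {shor_years(400):.2f} years")
```

Under these assumptions the classical figure comes out at the order of $10^{10}$ years, while the cubic scaling gives a little under 3 years, in line with the numbers in the text.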
Shor’s result spurred my own interest in quantum information (were it
not for Shor, I don’t suppose I would be teaching this course). It’s fascinating
to contemplate the implications for complexity theory, for quantum theory,
for technology.
where we have associated with each string the number that it represents in
binary notation, ranging in value from 0 to $2^N - 1$. Here the $a_x$’s are complex
numbers satisfying $\sum_x |a_x|^2 = 1$. If we measure all $N$ qubits by projecting
each onto the $\{|0\rangle, |1\rangle\}$ basis, the probability of obtaining the outcome $|x\rangle$ is
$|a_x|^2$.
Now, a quantum computation can be described this way. We assemble N
qubits, and prepare them in a standard initial state such as $|0\rangle|0\rangle \cdots |0\rangle$, or
$|x = 0\rangle$. We then apply a unitary transformation $U$ to the $N$ qubits. (The
transformation $U$ is constructed as a product of standard quantum gates,
unitary transformations that act on just a few qubits at a time.) After $U$ is
applied, we measure all of the qubits by projecting onto the $\{|0\rangle, |1\rangle\}$ basis.
The measurement outcome is the output of the computation. So the final
1
We cannot make copies of an unknown quantum state ourselves, but we can ask a
friend to prepare many identical copies of the state (he can do it because he knows what
the state is), and not tell us what he did.
But we don’t have that much time; we need the answer in 24 hours, not
48. And it turns out that we would be satisfied to know whether $f(x)$ is
constant ($f(0) = f(1)$) or balanced ($f(0) \neq f(1)$). Even so, it takes 48 hours
to get the answer.
Now suppose we have a quantum black box that computes $f(x)$. Of course
$f(x)$ might not be invertible, while the action of our quantum computer is
unitary and must be invertible, so we’ll need a transformation $U_f$ that takes
two qubits to two:
$$U_f : |x\rangle|y\rangle \to |x\rangle|y \oplus f(x)\rangle . \tag{1.8}$$
(This machine flips the second qubit if $f$ acting on the first qubit is 1, and
doesn’t do anything if $f$ acting on the first qubit is 0.) We can determine if
$f(x)$ is constant or balanced by using the quantum black box twice. But it
still takes a day for it to produce one output, so that won’t do. Can we get
the answer (in 24 hours) by running the quantum black box just once? (This
is “Deutsch’s problem.”)
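Because the black box is a quantum computer, we can choose the input
state to be a superposition of $|0\rangle$ and $|1\rangle$. If the second qubit is initially
prepared in the state $\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$, then
$$U_f : |x\rangle \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle) \to |x\rangle \frac{1}{\sqrt{2}}(|f(x)\rangle - |1 \oplus f(x)\rangle) = |x\rangle (-1)^{f(x)} \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle), \tag{1.9}$$
so the value of $f(x)$ appears only as a phase. If the first qubit is prepared
in the state $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$, linearity then gives
$$U_f : \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle) \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle) \to \frac{1}{\sqrt{2}}\left[(-1)^{f(0)}|0\rangle + (-1)^{f(1)}|1\rangle\right] \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle) . \tag{1.10}$$
Finally, we can perform a measurement that projects the first qubit onto the
basis
$$|\pm\rangle = \frac{1}{\sqrt{2}}(|0\rangle \pm |1\rangle). \tag{1.11}$$
Evidently, we will always obtain $|+\rangle$ if the function is constant, and $|-\rangle$ if
it is balanced.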
We could proceed to measure the output register to find the value of $f(x_0)$.
But because the state in Eq. (1.14) has been destroyed by the measurement, the intricate
correlations among the registers have been lost, and we get no opportunity
to determine $f(y_0)$ for any $y_0 \neq x_0$ by making further measurements. In this
case, then, the quantum computation provided no advantage over a classical
one.
The lesson of the solution to Deutsch’s problem is that we can sometimes
be more clever in exploiting the correlations encoded in Eq. (1.14). Much
of the art of designing quantum algorithms involves finding ways to make
efficient use of the nonlocal correlations.
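The solution to Deutsch’s problem can be simulated directly for all four possible one-bit functions. The sketch below stores a two-qubit state as a dictionary from basis labels $(x, y)$ to amplitudes; that representation and the function names are illustrative choices rather than anything prescribed by the text.

```python
import math

def apply_uf(state, f):
    """Apply U_f |x>|y> = |x>|y XOR f(x)> to a dict {(x, y): amplitude}."""
    out = {}
    for (x, y), amp in state.items():
        key = (x, y ^ f(x))
        out[key] = out.get(key, 0.0) + amp
    return out

def prob_plus(state):
    """Probability of finding the first qubit in |+> = (|0> + |1>)/sqrt(2)."""
    return sum(
        abs(state.get((0, y), 0.0) + state.get((1, y), 0.0)) ** 2 / 2
        for y in (0, 1)
    )

def deutsch(f):
    """Decide 'constant' or 'balanced' with a single application of U_f."""
    s = 1 / math.sqrt(2)
    # First qubit (|0> + |1>)/sqrt(2), second qubit (|0> - |1>)/sqrt(2).
    state = {(x, y): s * s * (-1) ** y for x in (0, 1) for y in (0, 1)}
    state = apply_uf(state, f)
    return "constant" if prob_plus(state) > 0.5 else "balanced"

for name, f in [("f=0", lambda x: 0), ("f=1", lambda x: 1),
                ("f=x", lambda x: x), ("f=1-x", lambda x: 1 - x)]:
    print(name, "->", deutsch(f))
```

In this simulation the probability of the $|+\rangle$ outcome is either 1 or 0 (up to rounding), so one query to the black box settles the question with certainty.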
distinctions between hard and easy without specifying the hardware we will
be using? A problem might be hard on the PC, but perhaps I could design
a special purpose machine that could solve that problem much faster. Or
maybe in the future a much better general purpose computer will be available
that solves the problem far more efficiently. Truly meaningful distinctions
between hard and easy should be universal — they ought not to depend on
which machine we are using.
Much of complexity theory focuses on the distinction between “polynomial
time” and “exponential time” algorithms. For any algorithm $A$, which
can act on an input of variable length, we may associate a complexity function
$T_A(N)$, where $N$ is the length of the input in bits. $T_A(N)$ is the longest
“time” (that is, number of elementary steps) it takes for the algorithm to
run to completion, for any $N$-bit input. (For example, if $A$ is a factoring
algorithm, $T_A(N)$ is the time needed to factor an $N$-bit number in the worst
possible case.) We say that $A$ is polynomial time if
$$T_A(N) \le \mathrm{Poly}(N), \tag{1.16}$$
where $\mathrm{Poly}(N)$ denotes a polynomial of $N$. Hence, polynomial time means
that the time needed to solve the problem does not grow faster than a power
of the number of input bits.
If the problem is not polynomial time, we say it is exponential time
(though this is really a misnomer, because of course there are superpolynomial
functions like $N^{\log N}$ that actually increase much more slowly than
an exponential). This is a reasonable way to draw the line between easy and
hard. But the truly compelling reason to make the distinction this way is
that it is machine-independent: it does not matter what computer we are
using. The universality of the distinction between polynomial and exponen-
tial follows from one of the central results of computer science: one universal
(classical) computer can simulate another with at worst “polynomial over-
head.” This means that if an algorithm runs on your computer in polynomial
time, then I can always run it on my computer in polynomial time. If I can’t
think of a better way to do it, I can always have my computer emulate how
yours operates; the cost of running the emulation is only polynomial time.
Similarly, your computer can emulate mine, so we will always agree on which
algorithms are polynomial time.³
³ To make this statement precise, we need to be a little careful. For example, we
should exclude certain kinds of “unreasonable” machines, like a parallel computer with an
unlimited number of nodes.
1.7. WHAT ABOUT ERRORS? 15
To put it another way, contact between the computer and the environ-
ment (decoherence) causes errors that degrade the quantum information. To
operate a quantum computer reliably, we must find some way to prevent or
correct these errors.
Actually, decoherence is not our only problem. Even if we could achieve
perfect isolation from the environment, we could not expect to operate a
quantum computer with perfect accuracy. The quantum gates that the ma-
chine executes are unitary transformations that operate on a few qubits at a
time, let’s say 4 × 4 unitary matrices acting on two qubits. Of course, these
unitary matrices form a continuum. We may have a protocol for applying
$U_0$ to 2 qubits, but our execution of the protocol will not be flawless, so the
actual transformation
$$U = U_0 (1 + O(\varepsilon)) \tag{1.18}$$
will differ from the intended $U_0$ by some amount of order $\varepsilon$. After about $1/\varepsilon$
gates are applied, these errors will accumulate and induce a serious failure.
Classical analog devices suffer from a similar problem, but small errors are
much less of a problem for devices that perform discrete logic.
In fact, modern digital circuits are remarkably reliable. They achieve
such high accuracy with help from dissipation. We can envision a classical
gate that acts on a bit, encoded as a ball residing at one of the two minima
of a double-lobed potential. The gate may push the ball over the intervening
barrier to the other side of the potential. Of course, the gate won’t be
implemented perfectly; it may push the ball a little too hard. Over time,
these imperfections might accumulate, causing an error.
To improve the performance, we cool the bit (in effect) after each gate.
This is a dissipative process that releases heat to the environment and com-
presses the phase space of the ball, bringing it close to the local minimum
of the potential. So the small errors that we may make wind up heating the
environment rather than compromising the performance of the device.
But we can’t cool a quantum computer this way. Contact with the en-
vironment may enhance the reliability of classical information, but it would
destroy encoded quantum information. More generally, accumulation of er-
ror will be a problem for classical reversible computation as well. To prevent
errors from building up we need to discard the information about the errors,
and throwing away information is always a dissipative process.
Still, let’s not give up too easily. A sophisticated machinery has been
developed to contend with errors in classical information, the theory of
error-correcting codes. To what extent can we coopt this wisdom to protect
quantum information as well?
How does classical error correction work? The simplest example of a
classical error-correcting code is a repetition code: we replace the bit we
wish to protect by 3 copies of the bit,
0 → (000),
1 → (111). (1.19)
Now an error may occur that causes one of the three bits to flip; if it’s the
first bit, say,
(000) → (100),
(111) → (011). (1.20)
Now in spite of the error, we can still decode the bit correctly, by majority
voting.
Of course, if the probability of error in each bit were $p$, it would be
possible for two of the three bits to flip, or even for all three to flip. A double
flip can happen in three different ways, so the probability of a double flip is
$3p^2(1 - p)$, while the probability of a triple flip is $p^3$. Altogether, then, the
probability that majority voting fails is $3p^2(1 - p) + p^3 = 3p^2 - 2p^3$. But for
$$3p^2 - 2p^3 < p \quad \text{or} \quad p < \frac{1}{2}, \tag{1.21}$$
the code improves the reliability of the information.
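The failure probability of the 3-bit repetition code can be confirmed by enumerating all eight flip patterns. This is a minimal sketch that assumes each bit flips independently with probability $p$; the helper names are illustrative.

```python
from itertools import product

def encode(bit):
    """Replace one bit by three copies."""
    return (bit, bit, bit)

def decode(block):
    """Majority vote over the three received bits."""
    return 1 if sum(block) >= 2 else 0

def failure_probability(p):
    """Sum the probabilities of all flip patterns that defeat majority voting."""
    total = 0.0
    for flips in product((0, 1), repeat=3):
        prob = 1.0
        for f in flips:
            prob *= p if f else (1 - p)
        received = tuple(b ^ f for b, f in zip(encode(0), flips))
        if decode(received) != 0:
            total += prob
    return total

for p in (0.01, 0.1, 0.4, 0.5):
    print(p, failure_probability(p), 3 * p**2 - 2 * p**3)
```

The enumerated value agrees with $3p^2 - 2p^3$, and it is smaller than $p$ for the sampled values below $\frac{1}{2}$.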
We can improve the reliability further by using a longer code. One such
code (though far from the most efficient) is an $N$-bit repetition code. The
probability distribution for the average value of the bit, by the central limit
theorem, approaches a Gaussian with width $1/\sqrt{N}$ as $N \to \infty$. If $P = \frac{1}{2} + \varepsilon$
is the probability that each bit has the correct value, then the probability
that the majority vote fails (for large $N$) is
$$P_{\rm error} \sim e^{-N\varepsilon^2}, \tag{1.22}$$
arising from the tail of the Gaussian. Thus, for any $\varepsilon > 0$, by introducing
enough redundancy we can achieve arbitrarily good reliability. Even for
$\varepsilon < 0$, we’ll be okay if we always assume that majority voting gives the
wrong result. Only for $P = \frac{1}{2}$ is the cause lost, for then our block of $N$ bits
will be random, and encode no information.
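To see how quickly redundancy helps, one can compare the exact binomial failure probability of majority voting with the rough scaling $e^{-N\varepsilon^2}$ quoted above. This sketch assumes odd $N$ and independent bit errors; the name `majority_failure` is an illustrative choice.

```python
import math

def majority_failure(n, p_correct):
    """Exact probability that more than half of n bits (n odd) are wrong."""
    q = 1 - p_correct
    return sum(
        math.comb(n, k) * q**k * p_correct ** (n - k)
        for k in range((n + 1) // 2, n + 1)
    )

eps = 0.1
for n in (11, 101, 1001):
    exact = majority_failure(n, 0.5 + eps)
    estimate = math.exp(-n * eps**2)
    print(n, exact, estimate)
```

For the values tried here the exact failure probability falls off exponentially in $N$ and stays below $e^{-N\varepsilon^2}$, while at $P = \frac{1}{2}$ it remains pinned at $\frac{1}{2}$ however large $N$ is.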
In the 1950s, John von Neumann showed that a classical computer with
noisy components can work reliably, by employing sufficient redundancy. He
pointed out that, if necessary, we can compute each logic gate many times,
and accept the majority result. (Von Neumann was especially interested in
how his brain was able to function so well, in spite of the unreliability of
neurons. He was pleased to explain why he was so smart.)
But now we want to use error correction to keep a quantum computer on
track, and we can immediately see that there are difficulties:
1. Phase errors. With quantum information, more things can go wrong.
In addition to bit-flip errors
$$|0\rangle \to |1\rangle, \quad |1\rangle \to |0\rangle, \tag{1.23}$$
there can also be phase errors
$$|0\rangle \to |0\rangle, \quad |1\rangle \to -|1\rangle. \tag{1.24}$$
A phase error is serious, because it makes the state $\frac{1}{\sqrt{2}}[|0\rangle + |1\rangle]$ flip to
the orthogonal state $\frac{1}{\sqrt{2}}[|0\rangle - |1\rangle]$. But the classical coding provided no
protection against phase errors.
2. Small errors. As already noted, quantum information is continuous.
If a qubit is intended to be in the state
$$a|0\rangle + b|1\rangle, \tag{1.25}$$
an error might change $a$ and $b$ by an amount of order $\varepsilon$, and these small
errors can accumulate over time. The classical method is designed to
correct large (bit flip) errors.
3. Measurement causes disturbance. In the majority voting scheme,
it seemed that we needed to measure the bits in the code to detect and
correct the errors. But we can’t measure qubits without disturbing the
quantum information that they encode.
4. No cloning. With classical coding, we protected information by mak-
ing extra copies of it. But we know that quantum information cannot
be copied with perfect fidelity.
We would like to be able to correct a bit flip error without destroying this
superposition.
Of course, it won’t do to measure a single qubit. If I measure the first
qubit and get the result $|0\rangle$, then I have prepared the state $|\bar{0}\rangle$ of all three
qubits, and we have lost the quantum information encoded in the coefficients
$a$ and $b$.
But there is no need to restrict our attention to single-qubit measurements.
I could also perform collective measurements on two qubits at once,
and collective measurements suffice to diagnose a bit-flip error. For a 3-qubit
state $|x, y, z\rangle$ I could measure, say, the two-qubit observables $y \oplus z$, or $x \oplus z$
(where $\oplus$ denotes addition modulo 2). For both $|x, y, z\rangle = |000\rangle$ and $|111\rangle$
these would be 0, but if any one bit flips, then at least one of these quantities
will be 1. In fact, if there is a single bit flip, the two bits
$$(y \oplus z, x \oplus z), \tag{1.28}$$
1.8. QUANTUM ERROR-CORRECTING CODES 21
just designate in binary notation the position (1, 2, or 3) of the bit that flipped.
These two bits constitute a syndrome that diagnoses the error that occurred.
For example, if the first bit flips,
$$a|000\rangle + b|111\rangle \to a|100\rangle + b|011\rangle, \tag{1.29}$$
then the measurement of $(y \oplus z, x \oplus z)$ yields the result $(0, 1)$, which instructs
us to flip the first bit; this indeed repairs the error.
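The syndrome procedure just described can be traced on an encoded superposition. The sketch below represents a three-qubit state as a dictionary from bit strings to amplitudes and assumes the encoded state has the form $a|000\rangle + b|111\rangle$; the representation and the helper names are illustrative.

```python
def flip(state, position):
    """Apply a bit flip to qubit `position` (1, 2 or 3) of every basis string."""
    out = {}
    for bits, amp in state.items():
        flipped = list(bits)
        flipped[position - 1] ^= 1
        out[tuple(flipped)] = amp
    return out

def syndrome(state):
    """Return (y XOR z, x XOR z); it must agree on every basis string present."""
    values = {(y ^ z, x ^ z) for (x, y, z) in state}
    assert len(values) == 1, "state is not a syndrome eigenstate"
    return values.pop()

def correct(state):
    """Read the syndrome as a binary number and undo the indicated flip."""
    s = syndrome(state)
    position = 2 * s[0] + s[1]  # 0 means no error was detected
    return flip(state, position) if position else state

a, b = 0.6, 0.8
encoded = {(0, 0, 0): a, (1, 1, 1): b}
for position in (1, 2, 3):
    damaged = flip(encoded, position)
    print(position, syndrome(damaged), correct(damaged) == encoded)
```

Note that `syndrome` never looks at the amplitudes: both branches of the superposition give the same pair of parities, which is why learning the syndrome reveals nothing about $a$ and $b$.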
Of course, instead of a (large) bit flip there could be a small error:
But even in this case the above procedure would work fine. In measuring
$(y \oplus z, x \oplus z)$, we would project out an eigenstate of this observable. Most
of the time (probability $1 - |\varepsilon|^2$) we obtain the result $(0, 0)$ and project the
damaged state back to the original state, and so correct the error. Occasionally
(probability $|\varepsilon|^2$) we obtain the result $(0, 1)$ and project the state onto
Eq. 1.29. But then the syndrome instructs us to flip the first bit, which restores
the original state. Similarly, if there is an amplitude of order $\varepsilon$ for each
of the three qubits to flip, then with a probability of order $|\varepsilon|^2$ the syndrome
measurement will project the state to one in which one of the three bits is
flipped, and the syndrome will tell us which one.
So we have already overcome 3 of the 4 obstacles cited earlier. We see
that it is possible to make a measurement that diagnoses the error without
damaging the information (answering (3)), and that a quantum measurement
can project a state with a small error to either a state with no error or a state
with a large discrete error that we know how to correct (answering (2)). As
for (4), the issue didn’t come up, because the state $a|\bar{0}\rangle + b|\bar{1}\rangle$ is not obtained
by cloning – it is not the same as $(a|0\rangle + b|1\rangle)^{\otimes 3}$; that is, it differs from three
copies of the unencoded state.
Only one challenge remains: (1) phase errors. Our code does not yet
provide any protection against phase errors, for if any one of the three qubits
undergoes a phase error then our encoded state $a|\bar{0}\rangle + b|\bar{1}\rangle$ is transformed
to $a|\bar{0}\rangle - b|\bar{1}\rangle$, and the encoded quantum information is damaged. In fact,
phase errors have become three times more likely than if we hadn’t used the
code. But with the methods in hand that conquered problems (2)-(4), we can
approach problem (1) with new confidence. Having protected against bit-flip
and a phase flip occur. If we prepare an encoded state $a|\bar{0}\rangle + b|\bar{1}\rangle$, allow
the unitary errors to occur on each qubit, and then measure the bit-flip and
phase-flip syndromes, then most of the time we will project the state back
to its original form, but with a probability of order $|\varepsilon|^2$, one qubit will have
a large error: a bit flip, a phase flip, or both. From the syndrome, we learn
which bit flipped, and which cluster had a phase error, so we can apply the
suitable one-qubit unitary operator to fix the error.
Error recovery will fail if, after the syndrome measurement, there are
two bit flip errors in each of two clusters (which induces a phase error in
the encoded data) or if phase errors occur in two different clusters (which
induces a bit-flip error in the encoded data). But the probability of such a
double phase error is of order $|\varepsilon|^4$. So for $|\varepsilon|$ small enough, coding improves
the reliability of the quantum information.
The code also protects against decoherence. By restoring the quantum
state irrespective of the nature of the error, our procedure removes any en-
tanglement between the quantum state and the environment.
Here as always, error correction is a dissipative process, since information
about the nature of the errors is flushed out of the quantum system. In this
case, that information resides in our recorded measurement results, and heat
will be dissipated when that record is erased.
Further developments in quantum error correction will be discussed later
in the course, including:
• As with classical coding it turns out that there are “good” quantum
codes that allow us to achieve arbitrarily high reliability as long as the error
rate per qubit is small enough.
• We’ve assumed that the error recovery procedure is itself executed flaw-
lessly. But the syndrome measurement was complicated – we needed to mea-
sure two-qubit and six-qubit collective observables to diagnose the errors – so
we actually might further damage the data when we try to correct it. We’ll
show, though, that error correction can be carried out so that it still works
effectively even if we make occasional errors during the recovery process.
• To operate a quantum computer we’ll want not only to store quantum
information reliably, but also to process it. We’ll show that it is possible to
apply quantum gates to encoded information.
Let’s summarize the essential ideas that underlie our quantum error cor-
rection scheme:
3. The errors are local, and the encoded information is nonlocal. It is im-
portant to emphasize the central assumption underlying the construc-
tion of the code – that errors affecting different qubits are, to a good
approximation, uncorrelated. We have tacitly assumed that an event
that causes errors in two qubits is much less likely than an event caus-
ing an error in a single qubit. It is of course a physics question whether
this assumption is justified or not – we can easily envision processes
that will cause errors in two qubits at once. If such correlated errors
are common, coding will fail to improve reliability.
The code takes advantage of the presumed local nature of the errors by
encoding the information in a nonlocal way; that is, the information is stored
in correlations involving several qubits. There is no way to distinguish $|\bar{0}\rangle$
and $|\bar{1}\rangle$ by measuring a single qubit of the nine. If we measure one qubit
we will find $|0\rangle$ with probability $\frac{1}{2}$ and $|1\rangle$ with probability $\frac{1}{2}$ irrespective of
the value of the encoded qubit. To access the encoded information we need
to measure a 3-qubit observable (the operator that flips all three qubits in a
cluster can distinguish $|000\rangle + |111\rangle$ from $|000\rangle - |111\rangle$).
The environment might occasionally kick one of the qubits, in effect “mea-
suring” it. But the encoded information cannot be damaged by disturbing
that one qubit, because a single qubit, by itself, actually carries no informa-
tion at all. Nonlocally encoded information is invulnerable to local influences
– this is the central principle on which quantum error-correcting codes are
founded.
1. Storage: We’ll need to store qubits for a long time, long enough to
complete an interesting computation.
can survive for a time comparable to the lifetime of the excited state (though
of course the relative phase oscillates as shown because of the energy splitting
$\hbar\omega$ between the levels). The ions are so well isolated that spontaneous decay
can be the dominant form of decoherence.
It is easy to read out the ions by performing a measurement that projects
onto the $\{|g\rangle, |e\rangle\}$ basis. A laser is tuned to a transition from the state $|g\rangle$
to a short-lived excited state $|e'\rangle$. When the laser illuminates the ions, each
qubit with the value $|0\rangle$ repeatedly absorbs and reemits the laser light, so
that it glows visibly (fluoresces). Qubits with the value $|1\rangle$ remain dark.
Because of their mutual Coulomb repulsion, the ions are sufficiently well
separated that they can be individually addressed by pulsed lasers. If a laser
is tuned to the frequency ω of the transition and is focused on the nth ion,
then Rabi oscillations are induced between $|0\rangle$ and $|1\rangle$. By timing the laser
pulse properly and choosing the phase of the laser appropriately, we can
apply any one-qubit unitary transformation. In particular, acting on $|0\rangle$, the
laser pulse can prepare any desired linear combination of $|0\rangle$ and $|1\rangle$.
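The effect of pulse timing and laser phase can be illustrated with the standard two-level model of a resonant drive. This is a sketch under stated assumptions that the text does not spell out: the rotating frame and rotating-wave approximation are used, `theta` stands for the pulse area (Rabi frequency times pulse duration), `phi` is the laser phase, and the sign convention is one common choice among several.

```python
import cmath
import math

def resonant_pulse(theta, phi):
    """2x2 unitary cos(theta/2) I - i sin(theta/2) (cos(phi) X + sin(phi) Y)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [
        [c, -1j * cmath.exp(-1j * phi) * s],
        [-1j * cmath.exp(1j * phi) * s, c],
    ]

def apply(u, state):
    """Multiply a 2x2 matrix into a 2-component state vector."""
    return [
        u[0][0] * state[0] + u[0][1] * state[1],
        u[1][0] * state[0] + u[1][1] * state[1],
    ]

def populations(state):
    """Measurement probabilities in the {|0>, |1>} basis."""
    return [abs(amp) ** 2 for amp in state]

ket0 = [1.0, 0.0]
for theta in (0.0, math.pi / 2, math.pi):
    print(theta, populations(apply(resonant_pulse(theta, 0.3), ket0)))
```

In this model the pulse area sets the populations $\cos^2(\theta/2)$ and $\sin^2(\theta/2)$, and the laser phase sets the relative phase of the two amplitudes, so together they reach any superposition of $|0\rangle$ and $|1\rangle$ up to an overall phase.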
But the most difficult part of designing and building quantum computing
hardware is getting two qubits to interact with one another. In the ion
trap, interactions arise because of the Coulomb repulsion between the ions.
Because of the mutual Coulomb repulsion, there is a spectrum of coupled
normal modes of vibration for the trapped ions. When the ion absorbs or
emits a laser photon, the center of mass of the ion recoils. But if the laser
is properly tuned, then when a single ion absorbs or emits, a normal mode
involving many ions will recoil coherently (the Mössbauer effect).
The vibrational mode of lowest frequency (frequency ν) is the center-of-
mass (cm) mode, in which the ions oscillate in lockstep in the harmonic well
of the trap. The ions can be laser cooled to a temperature much less than ν,
so that each vibrational mode is very likely to occupy its quantum-mechanical
ground state. Now imagine that a laser tuned to the frequency $\omega - \nu$ shines
on the $n$th ion. For a properly timed pulse the state $|e\rangle_n$ will rotate to $|g\rangle_n$,
while the cm oscillator makes a transition from its ground state $|0\rangle_{\rm cm}$ to its
first excited state $|1\rangle_{\rm cm}$ (a cm “phonon” is produced). However, the state
$|g\rangle_n |0\rangle_{\rm cm}$ is not on resonance for any transition and so is unaffected by the
pulse. Thus the laser pulse induces a unitary transformation acting as
the internal state of one of the ions. The procedure should be designed so
that the cm mode always returns to its ground state $|0\rangle_{\rm cm}$ at the conclusion
of the gate implementation. For example, Cirac and Zoller showed that the
quantum XOR (or controlled not) gate
can be implemented in an ion trap with altogether 5 laser pulses. The condi-
tional excitation of a phonon, Eq. (1.35) has been demonstrated experimen-
tally, for a single trapped ion, by the NIST group.
One big drawback of the ion trap computer is that it is an intrinsically
slow device. Its speed is ultimately limited by the energy-time uncertainty
relation. Since the uncertainty in the energy of the laser photons should be
small compared to the characteristic vibrational splitting ν, each laser pulse
should last a time long compared to $\nu^{-1}$. In practice, $\nu$ is likely to be of
order 100 kHz.
where $|L\rangle$, $|R\rangle$ denote photon states with left and right circular polarization.
To achieve this interaction, one photon is stored in the cavity, where the $|L\rangle$
polarization does not couple to the atom, but the $|R\rangle$ polarization couples
strongly. A second photon traverses the cavity, and for the second photon
as well, one polarization interacts with the atom preferentially. The second
photon wave packet acquires a particular phase shift $e^{i\Delta}$ only if both photons
have $|R\rangle$ polarization. Because the phase shift is conditioned on the
polarization of both photons, this is a nontrivial two-qubit quantum gate.
1.9.3 NMR
A third (dark horse) hardware scheme has sprung up in the past year, and
has leapfrogged over the ion trap and cavity QED to take the current lead
in coherent quantum processing. The new scheme uses nuclear magnetic
resonance (NMR) technology. Now qubits are carried by certain nuclear
spins in a particular molecule. Each spin can either be aligned ($|\uparrow\rangle = |0\rangle$)
or antialigned ($|\downarrow\rangle = |1\rangle$) with an applied constant magnetic field. The
spins take a long time to relax or decohere, so the qubits can be stored for a
reasonable time.
We can also turn on a pulsed rotating magnetic field with frequency
$\omega$ (where $\omega$ is the energy splitting between the spin-up and spin-down
states), and induce Rabi oscillations of the spin. By timing the pulse suitably,
we can perform a desired unitary transformation on a single spin (just as in
our discussion of the ion trap). All the spins in the molecule are exposed to
the rotating magnetic field but only those on resonance respond.
Furthermore, the spins have dipole-dipole interactions, and this coupling
can be exploited to perform a gate. The splitting between $|\uparrow\rangle$ and $|\downarrow\rangle$ for
one spin actually depends on the state of neighboring spins. So whether a
driving pulse is on resonance to tip the spin over is conditioned on the state
of another spin.
1.9. QUANTUM HARDWARE 29
All this has been known to chemists for decades. Yet it was only in the
past year that Gershenfeld and Chuang, and independently Cory, Fahmy, and
Havel, pointed out that NMR provides a useful implementation of quantum
computation. This was not obvious for several reasons. Most importantly,
NMR systems are very hot. The typical temperature of the spins (room
temperature, say) might be of order a million times larger than the energy
splitting between $|0\rangle$ and $|1\rangle$. This means that the quantum state of our
computer (the spins in a single molecule) is very noisy – it is subject to
strong random thermal fluctuations. This noise will disguise the quantum
information. Furthermore, we actually perform our processing not on a single
molecule, but on a macroscopic sample containing of order $10^{23}$ “computers,”
and the signal we read out of this device is actually averaged over this ensem-
ble. But quantum algorithms are probabilistic, because of the randomness of
quantum measurement. Hence averaging over the ensemble is not equivalent
to running the computation on a single device; averaging may obscure the
results.
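To put a number on how hot the NMR spins are, one can compute the equilibrium population difference of a single two-level spin. This sketch assumes a Boltzmann distribution over the two levels, for which the difference between the lower-level and upper-level populations is $\tanh(\Delta E / 2kT)$; the function name and its argument are illustrative.

```python
import math

def polarization(kt_over_splitting):
    """Equilibrium population difference of a two-level system.

    The argument is kT divided by the energy splitting between the levels.
    """
    return math.tanh(1.0 / (2.0 * kt_over_splitting))

# Temperature a million times the splitting, as in the text.
print(polarization(1e6))
# For comparison, a cold spin with kT much smaller than the splitting.
print(polarization(0.01))
```

With $kT$ a million times the splitting, the two populations differ by less than one part in a million, which is the sense in which the state of a single molecule is dominated by thermal noise.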
Gershenfeld and Chuang, and Cory, Fahmy, and Havel, explained how to
overcome these difficulties. They described how “effective pure states” can
be prepared, manipulated, and monitored by performing suitable operations
on the thermal ensemble. The idea is to arrange for the fluctuating properties
of the molecule to average out when the signal is detected, so that only the
underlying coherent properties are measured. They also pointed out that
some quantum algorithms (including Shor’s factoring algorithm) can be cast
in a deterministic form (so that at least a large fraction of the computers give
the same answer); then averaging over many computations will not spoil the
result.
Quite recently, NMR methods have been used to prepare a maximally
entangled state of three qubits, which had never been achieved before.
Clearly, quantum computing hardware is in its infancy. Existing hardware
will need to be scaled up by many orders of magnitude (both in the number of
stored qubits, and the number of gates that can be applied) before ambitious
computations can be attempted. In the case of the NMR method, there is
a particularly serious limitation that arises as a matter of principle, because
the ratio of the coherent signal to the background declines exponentially with
the number of spins per molecule. In practice, it will be very challenging to
perform an NMR quantum computation with more than of order 10 qubits.
Probably, if quantum computers are eventually to become practical de-
vices, new ideas about how to construct quantum hardware will be needed.
1.10 Summary
This concludes our introductory overview to quantum computation. We
have seen that three converging factors have combined to make this subject
exciting.
John Preskill
California Institute of Technology
2
Foundations I: States and Ensembles
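for all vectors $|\varphi\rangle$, $|\psi\rangle$ (where here $A|\psi\rangle$ has been denoted as $|A\psi\rangle$). $A$
is self-adjoint if $A = A^\dagger$, or in other words, if $\langle\varphi|A|\psi\rangle = \langle\psi|A|\varphi\rangle^*$ for all
vectors $|\varphi\rangle$ and $|\psi\rangle$. If $A$ and $B$ are self-adjoint, then so is $A + B$ (because
$(A + B)^\dagger = A^\dagger + B^\dagger$), but $(AB)^\dagger = B^\dagger A^\dagger$, so that $AB$ is self-adjoint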
2.1 Axioms of quantum mechanics 5
$$E_n E_m = \delta_{n,m} E_n, \qquad E_n^\dagger = E_n. \tag{2.5}$$
where $a$, $b$ are complex numbers that satisfy $|a|^2 + |b|^2 = 1$, and the overall
phase is physically irrelevant. A qubit is a quantum system described by
a two-dimensional Hilbert space, whose state can take any value of the
form eq.(2.15).
We can perform a measurement that projects the qubit onto the basis
$\{|0\rangle, |1\rangle\}$. Then we will obtain the outcome $|0\rangle$ with probability $|a|^2$, and
the outcome $|1\rangle$ with probability $|b|^2$. Furthermore, except in the cases
$a = 0$ and $b = 0$, the measurement irrevocably disturbs the state. If the
value of the qubit is initially unknown, then there is no way to determine
$a$ and $b$ with that single measurement, or any other conceivable measurement.
However, after the measurement, the qubit has been prepared in a
known state – either $|0\rangle$ or $|1\rangle$ – that differs (in general) from its previous
state.
In this respect, a qubit differs from a classical bit; we can measure a
classical bit without disturbing it, and we can decipher all of the infor-
mation that it encodes. But suppose we have a classical bit that really
does have a definite value (either 0 or 1), but where that value is initially
unknown to us. Based on the information available to us we can only say
that there is a probability p0 that the bit has the value 0, and a probability
p1 that the bit has the value 1, where p0 + p1 = 1. When we measure
the bit, we acquire additional information; afterwards we know the value
with 100% confidence.
An important question is: what is the essential difference between a
qubit and a probabilistic classical bit? In fact they are not the same, for
several reasons that we will explore. To summarize the difference in brief:
there is only one way to look at a bit, but there is more than one way to
look at a qubit.
2.2.1 Spin-1/2
First of all, the coefficients a and b in eq.(2.15) encode more than just the
probabilities of the outcomes of a measurement in the {|0⟩, |1⟩} basis. In
particular, the relative phase of a and b also has physical significance.
The properties of a qubit are easier to grasp if we appeal to a geometrical
interpretation of its state. For a physicist, it is natural to interpret
eq.(2.15) as the spin state of an object with spin-1/2 (like an electron).
Then |0⟩ and |1⟩ are the spin up (|↑⟩) and spin down (|↓⟩) states along
a particular axis such as the z-axis. The two real numbers characterizing
the qubit (the complex numbers a and b, modulo the normalization and
overall phase) describe the orientation of the spin in three-dimensional
space (the polar angle θ and the azimuthal angle ϕ).
We will not go deeply here into the theory of symmetry in quantum
mechanics, but we will briefly recall some elements of the theory that
will prove useful to us. A symmetry is a transformation that acts on
a state of a system, yet leaves all observable properties of the system
unchanged. In quantum mechanics, observations are measurements of
self-adjoint operators. If A is measured in the state |ψ⟩, then the outcome
|a⟩ (an eigenvector of A) occurs with probability |⟨a|ψ⟩|². A symmetry
should leave these probabilities unchanged, when we “rotate” both the
system and the apparatus.
A symmetry, then, is a mapping of vectors in Hilbert space
$$ |\psi\rangle \mapsto |\psi'\rangle, \tag{2.16} $$
that preserves the absolute values of inner products
$$ |\langle\varphi|\psi\rangle| = |\langle\varphi'|\psi'\rangle|, \tag{2.17} $$
for all |ϕ⟩ and |ψ⟩. According to a famous theorem due to Wigner, a
mapping with this property can always be chosen (by adopting suitable
phase conventions) to be either unitary or antiunitary. The antiunitary
alternative, while important for discrete symmetries, can be excluded for
continuous symmetries. Then the symmetry acts as
$$ |\psi\rangle \mapsto |\psi'\rangle = U|\psi\rangle, \tag{2.18} $$
where U is unitary (and in particular, linear).
Symmetries form a group: a symmetry transformation can be inverted,
and the product of two symmetries is a symmetry. For each symmetry op-
eration R acting on our physical system, there is a corresponding unitary
transformation U(R). Multiplication of these unitary operators must respect
the group multiplication law of the symmetries – applying R_1 ∘ R_2
should be equivalent to first applying R_2 and subsequently R_1. Thus we
demand
$$ U(R_1)U(R_2) = \mathrm{Phase}(R_1, R_2)\cdot U(R_1\circ R_2). \tag{2.19} $$
A phase depending on R_1 and R_2 is permitted in eq.(2.19) because quantum
states are rays; we need only demand that U(R_1 ∘ R_2) act the same
way as U(R_1)U(R_2) on rays, not on vectors. We say that U(R) provides
a unitary representation, up to a phase, of the symmetry group.
So far, our concept of symmetry has no connection with dynamics.
Usually, we demand of a symmetry that it respect the dynamical evolu-
tion of the system. This means that it should not matter whether we
first transform the system and then evolve it, or first evolve it and then
transform it. In other words, the diagram
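$$
\begin{array}{ccc}
\text{Initial} & \xrightarrow{\ \text{dynamics}\ } & \text{Final} \\[2pt]
\big\downarrow\,\text{transformation} & & \big\downarrow\,\text{transformation} \\[2pt]
\text{New Initial} & \xrightarrow{\ \text{dynamics}\ } & \text{New Final}
\end{array}
$$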
[Q, H] = 0 ; (2.23)
The most general 2×2 unitary matrix with determinant 1 can be expressed
in this form. Thus, we are entitled to think of a qubit as a spin-1/2 object,
and an arbitrary unitary transformation acting on the qubit’s state (aside
from a possible physically irrelevant rotation of the overall phase) is a
rotation of the spin.
A peculiar property of the representation U(n̂, θ) is that it is double-valued.
In particular a rotation by 2π about any axis is represented nontrivially:
$$ U(\hat n, \theta = 2\pi) = -I. \tag{2.32} $$
Our representation of the rotation group is really a representation “up to
a sign”
$$ U(R_1)U(R_2) = \pm U(R_1\circ R_2). \tag{2.33} $$
But as already noted, this is acceptable, because the group multiplication
is respected on rays, though not on vectors. These double-valued repre-
sentations of the rotation group are called spinor representations. (The
existence of spinors follows from a topological property of the group —
that it is not simply connected.)
While it is true that a rotation by 2π has no detectable effect on a
spin-1/2 object, it would be wrong to conclude that the spinor property
has no observable consequences. Suppose I have a machine that acts on
a pair of spins. If the first spin is up, it does nothing, but if the first spin
is down, it rotates the second spin by 2π. Now let the machine act when
the first spin is in a superposition of up and down. Then
$$ \frac{1}{\sqrt{2}}\left(|\uparrow\rangle_1 + |\downarrow\rangle_1\right)|\uparrow\rangle_2 \mapsto \frac{1}{\sqrt{2}}\left(|\uparrow\rangle_1 - |\downarrow\rangle_1\right)|\uparrow\rangle_2. \tag{2.34} $$
While there is no detectable effect on the second spin, the state of the
first has flipped to an orthogonal state, which is very much observable.
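The sign in eq.(2.32) and its consequence in eq.(2.34) can be checked numerically. The sketch below assumes the standard spin-1/2 rotation matrix U(n̂, θ) = cos(θ/2) I − i sin(θ/2) n̂·σ; the function names and the ordering of the two-spin basis are choices made for this illustration only.

```python
import math

def rotation(n, theta):
    """Return cos(theta/2) I - i sin(theta/2) n.sigma as a nested 2x2 list.

    Assumes n = (nx, ny, nz) is a unit vector.
    """
    nx, ny, nz = n
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [
        [c - 1j * s * nz, (-1j * nx - ny) * s],
        [(-1j * nx + ny) * s, c + 1j * s * nz],
    ]

def controlled_rotation(state, u):
    """Apply u to spin 2 only when spin 1 is down.

    `state` lists the amplitudes of |up,up>, |up,down>, |down,up>, |down,down>.
    """
    uu, ud, du, dd = state
    return [
        uu,
        ud,
        u[0][0] * du + u[0][1] * dd,
        u[1][0] * du + u[1][1] * dd,
    ]

r = 1 / math.sqrt(2)
u_2pi = rotation((0.0, 0.0, 1.0), 2 * math.pi)  # numerically close to -I
before = [r, 0.0, r, 0.0]                        # (|up> + |down>)/sqrt(2) on spin 1, |up> on spin 2
after = controlled_rotation(before, u_2pi)       # close to [r, 0, -r, 0]
overlap = sum(x.conjugate() * y for x, y in zip(before, after))
print(u_2pi, after, abs(overlap))
```

The vanishing overlap between `before` and `after` is the numerical counterpart of the statement that the first spin has flipped to an orthogonal state.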
In a rotated frame of reference, a rotation R(n̂, θ) becomes a rotation
through the same angle but about a rotated axis. It follows that the three
components of angular momentum transform under rotations as a vector:
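$$ |\psi(\theta,\varphi)\rangle = \begin{pmatrix} e^{-i\varphi/2}\cos\frac{\theta}{2} \\ e^{i\varphi/2}\sin\frac{\theta}{2} \end{pmatrix}, \tag{2.39} $$
(up to an overall phase). We can check directly that this is an eigenstate
of
$$ \hat n\cdot\vec\sigma = \begin{pmatrix} \cos\theta & e^{-i\varphi}\sin\theta \\ e^{i\varphi}\sin\theta & -\cos\theta \end{pmatrix}, \tag{2.40} $$
with eigenvalue one. We now see that eq.(2.15) with a = e^{−iϕ/2} cos(θ/2),
b = e^{iϕ/2} sin(θ/2), can be interpreted as a spin pointing in the (θ, ϕ) direction.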
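The eigenvalue claim can be verified directly. This is a small numerical check of eqs.(2.39) and (2.40) at one arbitrarily chosen pair of angles; the helper names are assumptions of this sketch.

```python
import cmath
import math

def spin_state(theta, phi):
    """Column vector of eq.(2.39) as a list [a, b]."""
    return [
        cmath.exp(-1j * phi / 2) * math.cos(theta / 2),
        cmath.exp(1j * phi / 2) * math.sin(theta / 2),
    ]

def n_dot_sigma(theta, phi):
    """Matrix of eq.(2.40) for the unit vector with polar angles (theta, phi)."""
    return [
        [math.cos(theta), cmath.exp(-1j * phi) * math.sin(theta)],
        [cmath.exp(1j * phi) * math.sin(theta), -math.cos(theta)],
    ]

def apply(matrix, vector):
    """Multiply a 2x2 matrix by a 2-component vector."""
    return [
        matrix[0][0] * vector[0] + matrix[0][1] * vector[1],
        matrix[1][0] * vector[0] + matrix[1][1] * vector[1],
    ]

theta, phi = 1.1, 2.3  # arbitrary test angles
psi = spin_state(theta, phi)
image = apply(n_dot_sigma(theta, phi), psi)

# If psi is an eigenvector with eigenvalue one, image and psi coincide.
residual = max(abs(image[k] - psi[k]) for k in range(2))
print(residual)
```

Analytically, the first component works out to e^{−iϕ/2}[cos θ cos(θ/2) + sin θ sin(θ/2)] = e^{−iϕ/2} cos(θ/2), which is what the residual test confirms.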
We noted that we cannot determine a and b with a single measurement.
Furthermore, even with many identical copies of the state, we cannot
completely determine the state by measuring each copy only along the
z-axis. This would enable us to estimate |a| and |b|, but we would learn
nothing about the relative phase of a and b. Equivalently, we would find
the component of the spin along the z-axis
$$ \langle\psi(\theta,\varphi)|\sigma_3|\psi(\theta,\varphi)\rangle = \cos^2\frac{\theta}{2} - \sin^2\frac{\theta}{2} = \cos\theta, \tag{2.41} $$
but we would not learn about the component in the x-y plane. The problem
of determining |ψ⟩ by measuring the spin is equivalent to determining
the unit vector n̂ by measuring its components along various axes. Altogether,
measurements along three different axes are required. E.g., from
⟨σ_3⟩ and ⟨σ_1⟩ we can determine n_3 and n_1, but the sign of n_2 remains
undetermined. Measuring ⟨σ_2⟩ would remove this remaining ambiguity.
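To make the sign ambiguity concrete, the sketch below computes the three expectation values for the state of eq.(2.39) and for its partner with ϕ replaced by −ϕ. The formulas ⟨σ_1⟩ = 2 Re(a*b), ⟨σ_2⟩ = 2 Im(a*b), ⟨σ_3⟩ = |a|² − |b|² follow from the standard Pauli matrices; the function names and angles are illustrative assumptions.

```python
import cmath
import math

def spin_state(theta, phi):
    """Amplitudes (a, b) of eq.(2.39)."""
    return (
        cmath.exp(-1j * phi / 2) * math.cos(theta / 2),
        cmath.exp(1j * phi / 2) * math.sin(theta / 2),
    )

def pauli_expectations(a, b):
    """Return (<sigma_1>, <sigma_2>, <sigma_3>) for the normalized state a|0> + b|1>."""
    cross = a.conjugate() * b
    return 2 * cross.real, 2 * cross.imag, abs(a) ** 2 - abs(b) ** 2

theta, phi = 1.0, 0.7  # arbitrary test angles
n_plus = pauli_expectations(*spin_state(theta, phi))
n_minus = pauli_expectations(*spin_state(theta, -phi))

# Both states share <sigma_1> and <sigma_3>; only <sigma_2> tells them apart.
print(n_plus, n_minus)
```

The two states give the same n_1 and n_3 but opposite n_2, so a third measurement axis is what resolves the ambiguity.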
If we are permitted to rotate the spin, then measurements along the z-axis
alone will suffice. That is, measuring a spin along the n̂ axis is
equivalent to first applying a rotation that rotates the n̂ axis to the axis
ẑ, and then measuring along ẑ.
In the special case θ = π/2 and ϕ = 0 (the x̂-axis) our spin state is
$$ |\uparrow_x\rangle = \frac{1}{\sqrt{2}}\left(|\uparrow_z\rangle + |\downarrow_z\rangle\right) \tag{2.42} $$
(“spin-up along the x-axis”). The orthogonal state (“spin down along the
x-axis”) is
$$ |\downarrow_x\rangle = \frac{1}{\sqrt{2}}\left(|\uparrow_z\rangle - |\downarrow_z\rangle\right). \tag{2.43} $$
For either of these states, if we measure the spin along the z-axis, we will
obtain |↑_z⟩ with probability 1/2 and |↓_z⟩ with probability 1/2.
Now consider the combination
$$ \frac{1}{\sqrt{2}}\left(|\uparrow_x\rangle + |\downarrow_x\rangle\right). \tag{2.44} $$
This state has the property that, if we measure the spin along the x-axis,
we obtain |↑_x⟩ or |↓_x⟩, each with probability 1/2. Now we may ask, what
if we measure the state in eq.(2.44) along the z-axis?
If these were probabilistic classical bits, the answer would be obvious.
The state in eq.(2.44) is in one of two states, and for each of the two,
the probability is 1/2 for pointing up or down along the z-axis. So of
course we should find up with probability 1/2 when we measure the state
(1/√2)(|↑_x⟩ + |↓_x⟩) along the z-axis.
But not so for qubits! By adding eq.(2.42) and eq.(2.43), we see that
the state in eq.(2.44) is really |↑_z⟩ in disguise. When we measure along
the z-axis, we always find |↑_z⟩, never |↓_z⟩.
We see that for qubits, as opposed to probabilistic classical bits, proba-
bilities can add in unexpected ways. This is, in its simplest guise, the phe-
nomenon called “quantum interference,” an important feature of quantum
information.
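The contrast between adding amplitudes and adding probabilities can be made explicit in a few lines. This is an illustrative sketch of eqs.(2.42) to (2.44) in the z basis; the variable names are assumptions of the example.

```python
import math

r = 1 / math.sqrt(2)
up_x = (r, r)      # eq.(2.42): components along (up_z, down_z)
down_x = (r, -r)   # eq.(2.43)

# Quantum case: add the amplitudes first, then square.
combined = tuple(r * (u + d) for u, d in zip(up_x, down_x))
p_up_quantum = abs(combined[0]) ** 2

# Probabilistic classical bit: square each amplitude first, then average.
p_up_classical = 0.5 * abs(up_x[0]) ** 2 + 0.5 * abs(down_x[0]) ** 2

print(p_up_quantum, p_up_classical)
```

The down_z amplitudes cancel in `combined`, so the quantum probability of finding the spin up along z is one (up to rounding), whereas the classical mixture gives one half.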
To summarize the geometrical interpretation of a qubit: we may think
of a qubit as a spin-1/2 object, and its quantum state is characterized
by a unit vector n̂ in three dimensions, the spin’s direction. A unitary
transformation rotates the spin, and a measurement of an observable has
two possible outcomes: the spin is either up or down along a specified
axis.
It should be emphasized that, while this formal equivalence with a spin-1/2
object applies to any two-level quantum system, not every two-level
system transforms as a spinor under spatial rotations!
with eigenvalues e^{iθ} and e^{−iθ}, the states of right and left circular polarization.
That is, these are the eigenstates of the rotation generator
$$ J = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} = \sigma_2, \tag{2.48} $$
are mutually orthogonal and can be obtained by rotating the states |x⟩
and |y⟩ by 45°. Suppose that we have a polarization analyzer that allows
only one of two orthogonal linear photon polarizations to pass through,
absorbing the other. Then an x or y polarized photon has probability 1/2
of getting through a 45° rotated polarizer, and a 45° polarized photon
has probability 1/2 of getting through an x or y analyzer. But an x photon
never passes through a y analyzer.
Suppose that a photon beam is directed at an x analyzer, with a y
analyzer placed further downstream. Then about half of the photons will
pass through the first analyzer, but every one of these will be stopped
by the second analyzer. But now suppose that we place a 45°-rotated
analyzer between the x and y analyzers. Then about half of the photons
pass through each analyzer, and about one in eight will manage to pass all
three without being absorbed. Because of this interference effect, there
is no consistent interpretation in which each photon carries one classical
bit of polarization information. Qubits are different than probabilistic
classical bits.
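The three-analyzer arithmetic can be sketched as follows. The example assumes an unpolarized incoming beam (so that half of the photons pass the first analyzer, as stated above) and uses the rule that a photon linearly polarized at angle α passes an analyzer at angle β with probability cos²(β − α); the function name is hypothetical.

```python
import math

def transmitted_fraction(angles):
    """Fraction of an unpolarized beam surviving analyzers at the given angles (radians).

    Assumption: the first analyzer passes half of the unpolarized photons, and each
    photon leaves an analyzer polarized along that analyzer's axis.
    """
    fraction = 0.5
    for previous, current in zip(angles, angles[1:]):
        fraction *= math.cos(current - previous) ** 2
    return fraction

x_then_y = transmitted_fraction([0.0, math.pi / 2])
x_45_y = transmitted_fraction([0.0, math.pi / 4, math.pi / 2])
print(x_then_y, x_45_y)
```

With only the x and y analyzers essentially nothing gets through, while inserting the 45° analyzer raises the transmitted fraction to about one in eight.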
A device can be constructed that rotates the linear polarization of a
photon, and so applies the transformation Eq. (2.45) to our qubit; it
functions by “turning on” a Hamiltonian for which the circular polarization
states |L⟩ and |R⟩ are nondegenerate energy eigenstates. This
is not the most general possible unitary transformation. But if we also
have a device that alters the relative phase of the two orthogonal linear
polarization states
$$ |x\rangle \to e^{-i\varphi/2}|x\rangle, \qquad |y\rangle \to e^{i\varphi/2}|y\rangle \tag{2.50} $$
(by turning on a Hamiltonian whose nondegenerate energy eigenstates are
the linear polarization states), then the two devices can be employed to-
gether to apply an arbitrary 2 × 2 unitary transformation (of determinant
1) to the photon polarization state.
axioms appear to be violated. The trouble is that our axioms are intended
to characterize the quantum behavior of a closed system that does not
interact with its surroundings. In practice, closed quantum systems do
not exist; the observations we make are always limited to a small part of
a much larger quantum system.
When we study open systems, that is, when we limit our attention to
just part of a larger system, then (contrary to the axioms):
with probability |b|², we obtain the result |1⟩_A and prepare the state
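$$
\begin{aligned}
\langle M_A\rangle &= {}_{AB}\langle\psi|\, M_A\otimes I_B\, |\psi\rangle_{AB} \\
&= \sum_{j,\nu} a^*_{j\nu}\left({}_A\langle j|\otimes{}_B\langle\nu|\right)\left(M_A\otimes I_B\right)\sum_{i,\mu} a_{i\mu}\left(|i\rangle_A\otimes|\mu\rangle_B\right) \\
&= \sum_{i,j,\mu} a^*_{j\mu}a_{i\mu}\langle j|M_A|i\rangle = \mathrm{tr}\left(M_A\rho_A\right),
\end{aligned}
\tag{2.62}
$$
where
$$ \rho_A = \mathrm{tr}_B\left(|\psi\rangle\langle\psi|\right) \equiv \sum_{i,j,\mu} a_{i\mu}a^*_{j\mu}|i\rangle\langle j| \tag{2.63} $$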
dual vector (or bra) _B⟨μ| as a linear map that takes vectors in H_A ⊗ H_B
to vectors of H_A, defined through its action on a basis:
similarly, the ket |μ⟩_B defines a map from the H_A ⊗ H_B dual basis to the
H_A dual basis, via
$$ {}_{AB}\langle i\nu|\mu\rangle_B = \delta_{\mu\nu}\,{}_A\langle i|. \tag{2.65} $$
The partial trace operation is a linear map that takes an operator M_AB
on H_A ⊗ H_B to an operator on H_A defined as
$$ \mathrm{tr}_B M_{AB} = \sum_\mu {}_B\langle\mu|M_{AB}|\mu\rangle_B. \tag{2.66} $$
From the definition eq.(2.63), we can immediately infer that ρ_A has the
following properties:
1. ρ_A is self-adjoint: ρ_A = ρ_A†.
2. ρ_A is positive: For any |ϕ⟩, ⟨ϕ|ρ_A|ϕ⟩ = Σ_μ |Σ_i a_{iμ}⟨ϕ|i⟩|² ≥ 0.
3. tr(ρ_A) = 1: We have tr(ρ_A) = Σ_{i,μ} |a_{iμ}|² = 1, since |ψ⟩_AB is
normalized.
It follows that ρA can be diagonalized in an orthonormal basis, that the
eigenvalues are all real and nonnegative, and that the eigenvalues sum to
one.
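The three properties can be checked on a small example. The sketch below builds ρ_A from the amplitudes a_{iμ} exactly as in eq.(2.63); the function names, the example state, and the test vector are assumptions made for illustration.

```python
def reduced_density_matrix(a):
    """Return rho_A[i][j] = sum_mu a[i][mu] * conj(a[j][mu]), as in eq.(2.63)."""
    dim_a, dim_b = len(a), len(a[0])
    return [
        [sum(a[i][mu] * complex(a[j][mu]).conjugate() for mu in range(dim_b))
         for j in range(dim_a)]
        for i in range(dim_a)
    ]

def expectation(rho, phi):
    """Return <phi|rho|phi> for a vector phi given as a list of amplitudes."""
    n = len(phi)
    return sum(
        complex(phi[i]).conjugate() * rho[i][j] * phi[j]
        for i in range(n) for j in range(n)
    )

# Hypothetical example state: (|00> + |01> + |10> - |11>)/2, stored as a[i][mu].
a = [[0.5, 0.5], [0.5, -0.5]]
rho_a = reduced_density_matrix(a)

trace = sum(rho_a[i][i] for i in range(2))   # property 3: unit trace
value = expectation(rho_a, [0.6, 0.8j])      # property 2: nonnegative for any |phi>
print(rho_a, trace, value)
```

For this particular state the off-diagonal elements of ρ_A cancel, leaving ρ_A = I/2, so subsystem A is in a mixed state even though the joint state is a ray.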
If we are looking at a subsystem of a larger quantum system, then, even
if the state of the larger system is a ray, the state of the subsystem need
not be; in general, the state is represented by a density operator. In the
case where the state of the subsystem is a ray, we say that the state is
pure. Otherwise the state is mixed. If the state is a pure state |ψ⟩_A, then
the density matrix ρ_A = |ψ⟩⟨ψ| is the projection onto the one-dimensional
space spanned by |ψ⟩_A. Hence a pure density matrix has the property
ρ² = ρ. A general density matrix, expressed in the basis {|a⟩} in which
it is diagonal, has the form
$$ \rho_A = \sum_a p_a |a\rangle\langle a|, \tag{2.68} $$
where 0 < p_a ≤ 1 and Σ_a p_a = 1. If the state is not pure, there are two
or more terms in this sum, and ρ² ≠ ρ; in fact, tr ρ² = Σ_a p_a² < Σ_a p_a = 1.
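The quantity tr ρ² gives a quick numerical test of purity. The following sketch evaluates it for one pure and two mixed single-qubit density matrices; the example matrices and the function name are assumptions chosen for illustration.

```python
def purity(rho):
    """Return tr(rho^2) for a square matrix given as nested lists."""
    n = len(rho)
    return sum(rho[i][j] * rho[j][i] for i in range(n) for j in range(n)).real

# Projector onto 0.6|0> + 0.8|1> (a pure state).
pure = [[0.36, 0.48], [0.48, 0.64]]
# Two diagonal mixed states with eigenvalues (0.5, 0.5) and (0.9, 0.1).
maximally_mixed = [[0.5, 0.0], [0.0, 0.5]]
slightly_mixed = [[0.9, 0.0], [0.0, 0.1]]

print(purity(pure), purity(maximally_mixed), purity(slightly_mixed))
```

The pure state gives tr ρ² equal to one up to rounding, while both mixed states give a value strictly below one, in line with the inequality above.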
density matrices must have the eigenvalues 0 and 1; they are one-dimensional
projectors, and hence pure states. We have already seen
that any pure state of a single qubit is of the form |ψ(θ, ϕ)⟩ and can be
envisioned as a spin pointing in the (θ, ϕ) direction. Indeed using the
property
$$ (\hat n\cdot\vec\sigma)^2 = I, \tag{2.71} $$
where n̂ is a unit vector, we can easily verify that the pure-state density
matrix
$$ \rho(\hat n) = \frac{1}{2}\left(I + \hat n\cdot\vec\sigma\right) \tag{2.72} $$
satisfies the property
$$ (\hat n\cdot\vec\sigma)\,\rho(\hat n) = \rho(\hat n)\,(\hat n\cdot\vec\sigma) = \rho(\hat n), \tag{2.73} $$
and, therefore is the projector
$$ \rho(\hat n) = |\psi(\hat n)\rangle\langle\psi(\hat n)|; \tag{2.74} $$
that is, n̂ is the direction along which the spin is pointing up. Alternatively,
from the expression
$$ |\psi(\theta,\varphi)\rangle = \begin{pmatrix} e^{-i\varphi/2}\cos(\theta/2) \\ e^{i\varphi/2}\sin(\theta/2) \end{pmatrix}, \tag{2.75} $$
we may compute directly that
$$
\begin{aligned}
\rho(\theta,\varphi) &= |\psi(\theta,\varphi)\rangle\langle\psi(\theta,\varphi)| \\
&= \begin{pmatrix} \cos^2(\theta/2) & \cos(\theta/2)\sin(\theta/2)e^{-i\varphi} \\ \cos(\theta/2)\sin(\theta/2)e^{i\varphi} & \sin^2(\theta/2) \end{pmatrix} \\
&= \frac{1}{2}I + \frac{1}{2}\begin{pmatrix} \cos\theta & \sin\theta\, e^{-i\varphi} \\ \sin\theta\, e^{i\varphi} & -\cos\theta \end{pmatrix} = \frac{1}{2}\left(I + \hat n\cdot\vec\sigma\right)
\end{aligned}
\tag{2.76}
$$
where n̂ = (sin θ cos ϕ, sin θ sin ϕ, cos θ). One nice property of the Bloch
parametrization of the pure states is that while |ψ(θ, ϕ)⟩ has an arbitrary
overall phase that has no physical significance, there is no phase ambiguity
in the density matrix ρ(θ, ϕ) = |ψ(θ, ϕ)⟩⟨ψ(θ, ϕ)|; all the parameters in ρ
have a physical meaning.
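From the property
$$ \frac{1}{2}\,\mathrm{tr}\,\sigma_i\sigma_j = \delta_{ij} \tag{2.77} $$
we see that
$$ \langle \hat n\cdot\vec\sigma\rangle_{\vec P} = \mathrm{tr}\left(\hat n\cdot\vec\sigma\,\rho(\vec P)\right) = \hat n\cdot\vec P. \tag{2.78} $$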
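Eq.(2.78) can be confirmed numerically for a sample Bloch vector. The sketch below assumes the parametrization ρ(P) = ½(I + P·σ) with |P| ≤ 1, consistent with eq.(2.72); the specific vectors and helper names are illustrative choices.

```python
def bloch_density_matrix(p):
    """Return (I + p.sigma)/2 for a Bloch vector p = (p1, p2, p3) with |p| <= 1."""
    p1, p2, p3 = p
    return [
        [(1 + p3) / 2, (p1 - 1j * p2) / 2],
        [(p1 + 1j * p2) / 2, (1 - p3) / 2],
    ]

def n_dot_sigma(n):
    """Return n.sigma for n = (n1, n2, n3), using the standard Pauli matrices."""
    n1, n2, n3 = n
    return [[n3, n1 - 1j * n2], [n1 + 1j * n2, -n3]]

def trace_of_product(x, y):
    """Return tr(x y) for 2x2 matrices."""
    return sum(x[i][j] * y[j][i] for i in range(2) for j in range(2))

p = (0.3, 0.4, 0.5)   # a mixed state, since |p| < 1
n = (0.6, 0.0, 0.8)   # unit vector defining the measured spin component
expectation = trace_of_product(n_dot_sigma(n), bloch_density_matrix(p))
dot = sum(ni * pi for ni, pi in zip(n, p))
print(expectation, dot)
```

The trace reproduces the ordinary dot product n̂·P, so the Bloch vector can be read as the expected spin polarization along each axis.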
Here {|i⟩_A} and {|μ⟩_B} are orthonormal bases for H_A and H_B respectively,
but to obtain the second equality in eq.(2.79) we have defined
$$ |\tilde i\rangle_B \equiv \sum_\mu \psi_{i\mu}|\mu\rangle_B. \tag{2.80} $$
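$$ \rho_A = \mathrm{tr}_B\left(|\psi\rangle\langle\psi|\right) = \mathrm{tr}_B\Big(\sum_{i,j}|i\rangle\langle j|\otimes|\tilde i\rangle\langle\tilde j|\Big) = \sum_{i,j}\langle\tilde j|\tilde i\rangle\left(|i\rangle\langle j|\right). \tag{2.82} $$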
Hence, it turns out that the {|ĩ⟩_B} are orthogonal after all. We obtain
orthonormal vectors by rescaling,
$$ |i'\rangle_B = p_i^{-1/2}\,|\tilde i\rangle_B \tag{2.85} $$
(we may assume p_i ≠ 0, because we will need eq.(2.85) only for i appearing
in the sum eq.(2.81)), and therefore obtain the expansion
$$ |\psi\rangle_{AB} = \sum_i \sqrt{p_i}\,|i\rangle_A\otimes|i'\rangle_B, \tag{2.86} $$
The orthonormal bases {|a⟩_A} and {|μ⟩_B} are related to the Schmidt
bases {|i⟩_A} and {|i'⟩_B} by unitary transformations U_A and U_B, hence
$$ |i\rangle_A = \sum_a |a\rangle_A (U_A)_{ai}, \qquad |i'\rangle_B = \sum_\mu |\mu\rangle_B (U_B)_{\mu i'}. \tag{2.88} $$
and |i'⟩_B’s, and then we pair up the eigenstates of ρ_A and ρ_B with the
same eigenvalue to obtain eq.(2.86). We have chosen the phases of our
basis states so that no phases appear in the coefficients in the sum; the
only remaining freedom is to redefine |i⟩_A and |i'⟩_B by multiplying by
opposite phases (which leaves the expression eq.(2.86) unchanged).
But if ρ_A has degenerate nonzero eigenvalues, then we need more information
than that provided by ρ_A and ρ_B to determine the Schmidt
decomposition; we need to know which |i'⟩_B gets paired with each |i⟩_A.
For example, if both H_A and H_B are d-dimensional and U_ij is any d × d
unitary matrix, then
$$ |\psi\rangle_{AB} = \frac{1}{\sqrt{d}}\sum_{i,j=1}^{d} |i\rangle_A\, U_{ij}\otimes|j'\rangle_B, \tag{2.91} $$
2.4.1 Entanglement
With any bipartite pure state |ψ⟩_AB we may associate a positive integer,
the Schmidt number, which is the number of nonzero eigenvalues in ρ_A
(or ρ_B) and hence the number of terms in the Schmidt decomposition
of |ψ⟩_AB. In terms of this quantity, we can define what it means for a
bipartite pure state to be entangled: |ψ⟩_AB is entangled (or nonseparable)
if its Schmidt number is greater than one; otherwise, it is separable (or
unentangled). Thus, a separable bipartite pure state is a direct product
of pure states in H_A and H_B,
$$ |\psi\rangle_{AB} = |\varphi\rangle_A\otimes|\chi\rangle_B; \tag{2.94} $$
then the reduced density matrices ρ_A = |ϕ⟩⟨ϕ| and ρ_B = |χ⟩⟨χ| are pure.
Any state that cannot be expressed as such a direct product is entangled;
then ρ_A and ρ_B are mixed states.
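For two qubits the Schmidt number can be computed with elementary algebra. The sketch below is an illustration under stated assumptions: the state is written as Σ a[i][μ] |i⟩_A|μ⟩_B and is normalized, so ρ_A has unit trace and determinant |det a|², and its eigenvalues follow from the quadratic formula. The function name and tolerance are hypothetical choices.

```python
import math

def schmidt_number(a, tol=1e-12):
    """Count the nonzero eigenvalues of rho_A for a normalized two-qubit pure state.

    `a` is the 2x2 amplitude matrix a[i][mu] of sum_{i,mu} a[i][mu] |i>_A |mu>_B.
    """
    det_a = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    det_rho = abs(det_a) ** 2                    # det(rho_A) = |det a|^2
    gap = math.sqrt(max(0.0, 1 - 4 * det_rho))   # tr(rho_A) = 1 for a normalized state
    eigenvalues = [(1 + gap) / 2, (1 - gap) / 2]
    return sum(1 for value in eigenvalues if value > tol)

r = 1 / math.sqrt(2)
product_state = [[1.0, 0.0], [0.0, 0.0]]              # |up>_A |up>_B
tilted_product = [[0.6 * r, 0.6 * r], [0.8 * r, 0.8 * r]]  # (0.6|0> + 0.8|1>) (|0> + |1>)/sqrt(2)
bell_state = [[r, 0.0], [0.0, r]]                     # (|00> + |11>)/sqrt(2)

print(schmidt_number(product_state), schmidt_number(tilted_product), schmidt_number(bell_state))
```

Both product states give Schmidt number one, while the maximally entangled state gives two, matching the separable versus entangled distinction drawn above.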
When |ψ⟩_AB is entangled we say that A and B have quantum correlations.
It is not strictly correct to say that subsystems A and B are
uncorrelated if |ψ⟩_AB is separable; after all, the two spins in the separable
state
$$ |\uparrow\rangle_A|\uparrow\rangle_B, \tag{2.95} $$
are surely correlated – they are both pointing in the same direction. But
the correlations between A and B in an entangled state have a different
character than those in a separable state. One crucial difference is that
entanglement cannot be created locally. The only way to entangle A and
B is for the two subsystems to directly interact with one another.
We can prepare the state eq.(2.95) without allowing spins A and B to
ever come into contact with one another. We need only send a (classical!)
message to two preparers (Alice and Bob) telling both of them to prepare
a spin pointing along the z-axis. But the only way to turn the state
eq.(2.95) into an entangled state like
$$ \frac{1}{\sqrt{2}}\left(|\uparrow\rangle_A|\uparrow\rangle_B + |\downarrow\rangle_A|\downarrow\rangle_B\right), \tag{2.96} $$
is to apply a collective unitary transformation to the state. Local unitary
transformations of the form UA ⊗ UB , and local measurements performed
by Alice or Bob, cannot increase the Schmidt number of the two-qubit
state, no matter how much Alice and Bob discuss what they do. To
entangle two qubits, we must bring them together and allow them to
interact.
As we will discuss in Chapters 4 and 10, it is also possible to make the
distinction between entangled and separable bipartite mixed states. We
will also discuss various ways in which local operations can modify the
form of entanglement, and some ways that entanglement can be put to
use.
(2) ρ is nonnegative.
(3) tr(ρ) = 1.
It follows immediately that, given two density matrices ρ_1 and ρ_2, we can
always construct another density matrix as a convex linear combination
of the two:
$$ \rho(\lambda) = \lambda\rho_1 + (1-\lambda)\rho_2 \tag{2.97} $$
is a density matrix for any real λ satisfying 0 ≤ λ ≤ 1. We easily see that
ρ(λ) satisfies (1) and (3) if ρ1 and ρ2 do. To check (2), we evaluate
Since the right hand side is a sum of two nonnegative terms, and the
sum vanishes, both terms must vanish. If λ is not 0 or 1, we conclude
that ρ_1 and ρ_2 are orthogonal to |ψ⊥⟩. But since |ψ⊥⟩ can be any vector
orthogonal to |ψ⟩, we see that ρ_1 = ρ_2 = ρ.
The vectors in a convex set that cannot be expressed as a convex linear
combination of other vectors in the set are called the extremal points of the
set. We have just shown that the pure states are extremal points of the
set of density matrices. Furthermore, only the pure states are extremal,
because any mixed state can be written ρ = Σ_i p_i |i⟩⟨i| in the basis in
which it is diagonal, and so is a convex sum of pure states.
We have already encountered this structure in our discussion of the
special case of the Bloch sphere. We saw that the density operators are a
(unit) ball in the three-dimensional set of 2 × 2 Hermitian matrices with
unit trace. The ball is convex, and its extremal points are the points on
the boundary. Similarly, the d × d density operators are a convex subset
of the (d² − 1)-dimensional set of d × d Hermitian matrices with unit trace,
and the extremal points of the set are the pure states.
However, the 2 × 2 case is atypical in one respect: for d > 2, the points
on the boundary of the set of density matrices are not necessarily pure
states. The boundary of the set consists of all density matrices with
at least one vanishing eigenvalue (since there are nearby matrices with
negative eigenvalues). Such a density matrix need not be pure, for d > 2,
since the number of nonvanishing eigenvalues can exceed one.
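$$
\begin{aligned}
\langle M\rangle &= \lambda\langle M\rangle_1 + (1-\lambda)\langle M\rangle_2 \\
&= \lambda\,\mathrm{tr}(M\rho_1) + (1-\lambda)\,\mathrm{tr}(M\rho_2) \\
&= \mathrm{tr}\left(M\rho(\lambda)\right).
\end{aligned}
\tag{2.100}
$$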
to the Andromeda galaxy and Bob keeps all of the B qubits on earth.
When Bob wants to send a one-bit message to Alice, he chooses to measure
either σ_1 or σ_3 for all his spins, thus preparing Alice’s spins in either
the {|↑_z⟩_A, |↓_z⟩_A} or {|↑_x⟩_A, |↓_x⟩_A} ensembles. (V is real in this case, so
V = V* and n̂ = n̂'.) To read the message, Alice immediately measures
her spins to see which ensemble has been prepared.
This scheme has a flaw. Though the two preparation methods are
surely different, both ensembles are described by precisely the same den-
sity matrix ρA . Thus, there is no conceivable measurement Alice can
make that will distinguish the two ensembles, and no way for Alice to tell
what action Bob performed. The “message” is unreadable.
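The flaw can be seen by computing both density matrices. The sketch below assumes that each of Bob's outcomes occurs with probability 1/2, so that Alice's spins form an equal mixture of up and down along z in one case and along x in the other; the helper name is an assumption of this example.

```python
import math

def ensemble_density_matrix(ensemble):
    """Return sum_k p_k |psi_k><psi_k| for a list of (p_k, psi_k) pairs."""
    rho = [[0j, 0j], [0j, 0j]]
    for probability, psi in ensemble:
        for i in range(2):
            for j in range(2):
                rho[i][j] += probability * psi[i] * complex(psi[j]).conjugate()
    return rho

r = 1 / math.sqrt(2)
z_ensemble = [(0.5, (1.0, 0.0)), (0.5, (0.0, 1.0))]  # spin up or down along z
x_ensemble = [(0.5, (r, r)), (0.5, (r, -r))]         # spin up or down along x

rho_z = ensemble_density_matrix(z_ensemble)
rho_x = ensemble_density_matrix(x_ensemble)
difference = max(abs(rho_z[i][j] - rho_x[i][j]) for i in range(2) for j in range(2))
print(rho_z, rho_x, difference)
```

Both ensembles yield ρ_A = I/2 to within rounding, so no measurement statistics on Alice's side alone can reveal which axis Bob chose.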
Why, then, do we confidently state that “the two preparation methods
are surely different?” To ease any concerns about that, imagine that Bob
either (1) measures all of his spins along the ẑ-axis, or (2) measures
all of his spins along the x̂-axis, and then calls Alice on the intergalactic
telephone. He does not tell Alice whether he did (1) or (2), but he does
tell her the results of all his measurements: “the first spin was up, the
second was down,” etc. Now Alice performs either (1) or (2) on her
spins. If both Alice and Bob measured along the same axis, Alice will
find that every single one of her measurement outcomes agrees with what
Bob found. But if Alice and Bob measured along different (orthogonal)
axes, then Alice will find no correlation between her results and Bob’s.
About half of her measurements agree with Bob’s and about half disagree.
If Bob promises to do either (1) or (2), and assuming no preparation or
measurement errors, then Alice will know that Bob’s action was different
than hers (even though Bob never told her this information) as soon as
one of her measurements disagrees with what Bob found. If all their
measurements agree, then if many spins are measured, Alice will have
very high statistical confidence that she and Bob measured along the
same axis. (Even with occasional measurement errors, the statistical test
will still be highly reliable if the error rate is low enough.) So Alice does
have a way to distinguish Bob’s two preparation methods, but in this case
there is certainly no faster-than-light communication, because Alice had
to receive Bob’s phone call before she could perform her test.
in the case of a coherent superposition, the relative phase of the two states
has observable consequences (distinguishes |↑_x⟩ from |↓_x⟩). In the case
of an incoherent mixture, the relative phase is completely unobservable.
The superposition becomes incoherent if spin A becomes entangled with
another spin B, and spin B is inaccessible.
Heuristically, the states |↑_z⟩_A and |↓_z⟩_A can interfere (the relative
phase of these states can be observed) only if we have no information
about whether the spin state is |↑_z⟩_A or |↓_z⟩_A. More than that, interference
can occur only if there is in principle no possible way to find
out whether the spin is up or down along the z-axis. Entangling spin A
with spin B destroys interference (causes spin A to decohere), because it
is possible in principle for us to determine if spin A is up or down along
ẑ by performing a suitable measurement of spin B.
But we have now seen that the statement that entanglement causes
decoherence requires a qualification. Suppose that Bob measures spin B
along the x̂-axis, obtaining either the result |↑_x⟩_B or |↓_x⟩_B, and that he
sends his measurement result to Alice. Now Alice’s spin is a pure state
(either |↑_x⟩_A or |↓_x⟩_A) and in fact a coherent superposition of |↑_z⟩_A and
|↓_z⟩_A. We have managed to recover the purity of Alice’s spin before the
jaws of decoherence could close!
Suppose that Bob allows his spin to pass through a Stern–Gerlach appa-
ratus oriented along the ẑ-axis. Well, of course, Alice’s spin can’t behave
like a coherent superposition of |↑_z⟩_A and |↓_z⟩_A; all Bob has to do is
look to see which way his spin moved, and he will know whether Al-
ice’s spin is up or down along ẑ. But suppose that Bob does not look.
Instead, he carefully refocuses the two beams without maintaining any
record of whether his spin moved up or down, and then allows the spin to
pass through a second Stern–Gerlach apparatus oriented along the x̂-axis.
This time he looks, and communicates the result of his σ_1 measurement
to Alice. Now the coherence of Alice’s spin has been restored!
This situation has been called a quantum eraser. Entangling the two
spins creates a “measurement situation” in which the coherence of |↑_z⟩_A
and |↓_z⟩_A is lost because we can find out if spin A is up or down along ẑ by
observing spin B. But when we measure spin B along x̂, this information
is “erased.” Whether the result is |↑_x⟩_B or |↓_x⟩_B does not tell us anything
about whether spin A is up or down along ẑ, because Bob has been careful
not to retain the “which way” information that he might have acquired
by looking at the first Stern–Gerlach apparatus. Therefore, it is possible
again for spin A to behave like a coherent superposition of |↑_z⟩_A and
|↓_z⟩_A (and it does, after Alice hears about Bob’s result).
We can best understand the quantum eraser from the ensemble view-
point. Alice has many spins selected from an ensemble described by
ρ_A = (1/2) I, and there is no way for her to observe interference between
The results are the same, irrespective of whether Bob “prepares” the spins
before or after Alice measures them.
We have claimed that the density matrix ρA provides a complete phys-
ical description of the state of subsystem A, because it characterizes all
possible measurements that can be performed on A. One might object
that the quantum eraser phenomenon demonstrates otherwise. Since the
information received from Bob enables Alice to recover a pure state from
the mixture, how can we hold that everything Alice can know about A is
encoded in ρA ?
I prefer to say that quantum erasure illustrates the principle that “in-
formation is physical.” The state ρA of system A is not the same thing
as ρA accompanied by the information that Alice has received from Bob.
This information (which attaches labels to the subensembles) changes the
physical description. That is, we should include Alice’s “state of knowl-
edge” in our description of her system. An ensemble of spins for which
Alice has no information about whether each spin is up or down is a dif-
ferent physical state than an ensemble in which Alice knows which spins
are up and which are down. This “state of knowledge” need not really
be the state of a human mind; any (inanimate) record that labels the
subensemble will suffice.
Here the states {|ϕ_i⟩_A} are all normalized vectors, but we do not assume
that they are mutually orthogonal. Nevertheless, ρ_A can be realized as
an ensemble, in which each pure state |ϕ_i⟩⟨ϕ_i| occurs with probability p_i.
For any such ρ_A, we can construct a “purification” of ρ_A, a bipartite
pure state |Φ_1⟩_AB that yields ρ_A when we perform a partial trace over
H_B. One such purification is of the form
$$ |\Phi_1\rangle_{AB} = \sum_i \sqrt{p_i}\,|\varphi_i\rangle_A\otimes|\alpha_i\rangle_B, \tag{2.112} $$
where again the |β_μ⟩_B’s are orthonormal vectors in H_B. So in the state
|Φ_2⟩_AB, we can realize the ensemble by performing a measurement in H_B
that projects onto the {|β_μ⟩_B} basis.
Now, how are |Φ_1⟩_AB and |Φ_2⟩_AB related? In fact, we can easily show
that
$$ |\Phi_1\rangle_{AB} = \left(I_A\otimes U_B\right)|\Phi_2\rangle_{AB}; \tag{2.117} $$
the two states differ by a unitary change of basis acting in H_B alone, or
$$ |\Phi_1\rangle_{AB} = \sum_\mu \sqrt{q_\mu}\,|\psi_\mu\rangle_A\otimes|\gamma_\mu\rangle_B, \tag{2.118} $$
where
$$ |\gamma_\mu\rangle_B = U_B|\beta_\mu\rangle_B, \tag{2.119} $$
is yet another orthonormal basis for H_B. We see, then, that there is a single
purification |Φ_1⟩_AB of ρ_A, such that we can realize either the {|ϕ_i⟩_A}
ensemble or {|ψ_μ⟩_A} ensemble by choosing to measure the appropriate
observable in system B!
Similarly, we may consider many ensembles that all realize ρA , where
the maximum number of pure states appearing in any of the ensembles
is d. Then we may choose a Hilbert space HB of dimension d, and a
pure state |Φ⟩_AB ∈ H_A ⊗ H_B, such that any one of the ensembles can
be realized by measuring a suitable observable of B. This is the HJW
theorem (for Hughston, Jozsa, and Wootters); it expresses the quantum
eraser phenomenon in its most general form.
In fact, the HJW theorem is an easy corollary to the Schmidt decomposition.
Both |Φ_1⟩_AB and |Φ_2⟩_AB have Schmidt decompositions, and
because both yield the same ρ_A when we take the partial trace over B,
these decompositions must have the form
$$
\begin{aligned}
|\Phi_1\rangle_{AB} &= \sum_k \sqrt{\lambda_k}\,|k\rangle_A\otimes|k_1'\rangle_B, \\
|\Phi_2\rangle_{AB} &= \sum_k \sqrt{\lambda_k}\,|k\rangle_A\otimes|k_2'\rangle_B,
\end{aligned}
\tag{2.120}
$$
where the λ_k’s are the eigenvalues of ρ_A and the |k⟩_A’s are the corresponding
eigenvectors. But since {|k_1'⟩_B} and {|k_2'⟩_B} are both orthonormal
bases for H_B, there is a unitary U_B such that
$$ |k_1'\rangle_B = U_B|k_2'\rangle_B, \tag{2.121} $$
(Some authors use the name “fidelity” for the square root of this quantity.)
The fidelity is nonnegative, vanishes if ρ and σ have support on mutually
orthogonal subspaces, and attains its maximum value 1 if and only if the
two states are identical. If ρ = |ψ⟩⟨ψ| is a pure state, then the fidelity is
where
$$ \|A\|_1 = \mathrm{tr}\sqrt{A^\dagger A}. \tag{2.125} $$
The L1 norm is also sometimes called the trace norm. (For Hermitian
A, ‖A‖_1 is just the sum of the absolute values of its eigenvalues.) The
fidelity F(ρ, σ) is actually symmetric in its two arguments, although the
symmetry is not manifest in eq. (2.122). To verify the symmetry, note
that for any Hermitian A and B, the L1 norm obeys
This holds because BAAB and ABBA have the same eigenvalues:
if |ψ⟩ is an eigenstate of ABBA with eigenvalue λ, then BA|ψ⟩ is an
eigenstate of BAAB with eigenvalue λ.
It is useful to know how the fidelity of two density operators is related
to the overlap of their purifications. We say that |Φ⟩_AB is a purification
of the density operator ρ_A if
If
$$ \rho = \sum_i p_i |i\rangle\langle i|, \tag{2.128} $$
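Noting that
$$ U\otimes I|\tilde\Phi\rangle = \sum_{i,j}|j\rangle\otimes|i\rangle\,U_{ji} = \sum_{i,j}|j\rangle\otimes|i\rangle\,U^T_{ij} = I\otimes U^T|\tilde\Phi\rangle, \tag{2.134} $$
we have
$$ \langle\Phi_\sigma(W)|\Phi_\rho(V)\rangle = \langle\tilde\Phi|\,\sigma^{1/2}\rho^{1/2}U\otimes I\,|\tilde\Phi\rangle = \mathrm{tr}\left(\sigma^{1/2}\rho^{1/2}U\right), \tag{2.135} $$
where U = (W†V)^T.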
Now we may use the polar decomposition
$$ A = U'\sqrt{A^\dagger A}, \tag{2.136} $$
where U' is unitary, to rewrite the inner product as
$$ \langle\Phi_\sigma(W)|\Phi_\rho(V)\rangle = \mathrm{tr}\left(UU'\sqrt{\rho^{1/2}\sigma\rho^{1/2}}\right) = \sum_a \lambda_a\langle a|UU'|a\rangle, \tag{2.137} $$
where {λ_a} are the nonnegative eigenvalues of √(ρ^{1/2} σ ρ^{1/2}) and {|a⟩} are
the corresponding eigenvectors. It is now evident that the inner product
has the largest possible absolute value when we choose U = U'⁻¹, and
hence we conclude
$$ F(\rho,\sigma) = \left(\mathrm{tr}\sqrt{\rho^{1/2}\sigma\rho^{1/2}}\right)^2 = \max_{V,W}\left|\langle\Phi_\sigma(W)|\Phi_\rho(V)\rangle\right|^2. \tag{2.138} $$
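For single qubits the fidelity can be evaluated without computing matrix square roots. The sketch below uses a shortcut that is not taken from the text: for a positive 2 × 2 matrix X with eigenvalues x_1, x_2 one has (tr √X)² = (√x_1 + √x_2)² = tr X + 2√(det X), and with X = ρ^{1/2} σ ρ^{1/2} this gives F = tr(ρσ) + 2√(det ρ · det σ). The function names and example states are assumptions of this illustration, and the formula is valid only for 2 × 2 density matrices.

```python
import math

def trace_of_product(x, y):
    """Return tr(x y) for 2x2 matrices."""
    return sum(x[i][j] * y[j][i] for i in range(2) for j in range(2))

def determinant(x):
    """Return the determinant of a 2x2 matrix."""
    return x[0][0] * x[1][1] - x[0][1] * x[1][0]

def qubit_fidelity(rho, sigma):
    """Fidelity (tr sqrt(rho^{1/2} sigma rho^{1/2}))^2, qubit-only closed form."""
    overlap = complex(trace_of_product(rho, sigma)).real
    dets = complex(determinant(rho) * determinant(sigma)).real
    return overlap + 2 * math.sqrt(max(0.0, dets))

zero = [[1.0, 0.0], [0.0, 0.0]]    # |0><0|
one = [[0.0, 0.0], [0.0, 1.0]]     # |1><1|
plus = [[0.5, 0.5], [0.5, 0.5]]    # projector onto (|0> + |1>)/sqrt(2)
mixed = [[0.7, 0.0], [0.0, 0.3]]   # a diagonal mixed state

print(qubit_fidelity(zero, one), qubit_fidelity(zero, plus),
      qubit_fidelity(mixed, mixed), qubit_fidelity(mixed, zero))
```

The four values illustrate the properties quoted earlier: the fidelity vanishes for orthogonal states, equals one for identical states, and reduces to ⟨ψ|σ|ψ⟩ when one argument is the pure state |ψ⟩⟨ψ|.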
(For Hermitian A, ‖A‖_2 is the square root of the sum of the squares
of its eigenvalues.) The L1 distance is a particularly natural measure
of state distinguishability, because (as shown in Exercise 2.5) it can be
interpreted as the distance between the corresponding probability distributions
achieved by the optimal measurement for distinguishing the
states. Although the fidelity, L1 distance, and L2 distance are not simply
related to one another in general, there are useful inequalities relating
these different measures.
If {|λ_i|, i = 0, 1, 2, . . . , d−1} denotes the eigenvalues of √(A†A), then
$$ \|A\|_1 = \sum_{i=0}^{d-1}|\lambda_i|; \qquad \|A\|_2 = \sqrt{\sum_{i=0}^{d-1}|\lambda_i|^2}. \tag{2.142} $$
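and note that the absolute value of this difference may be written as
$$|\sqrt{\rho} - \sqrt{\sigma}| = \sum_i |\lambda_i|\, |i\rangle\langle i| = \left(\sqrt{\rho} - \sqrt{\sigma}\right) U = U \left(\sqrt{\rho} - \sqrt{\sigma}\right), \tag{2.146}$$
where
$$U = \sum_i \mathrm{sign}(\lambda_i)\, |i\rangle\langle i|. \tag{2.147}$$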
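Using
$$\rho - \sigma = \frac{1}{2}\left(\sqrt{\rho} - \sqrt{\sigma}\right)\left(\sqrt{\rho} + \sqrt{\sigma}\right) + \frac{1}{2}\left(\sqrt{\rho} + \sqrt{\sigma}\right)\left(\sqrt{\rho} - \sqrt{\sigma}\right) \tag{2.148}$$
and the cyclicity of the trace, we find
$$\mathrm{tr}\,(\rho - \sigma)\, U = \mathrm{tr}\,|\sqrt{\rho} - \sqrt{\sigma}|\left(\sqrt{\rho} + \sqrt{\sigma}\right) = \sum_i |\lambda_i|\, \langle i|\sqrt{\rho} + \sqrt{\sigma}|i\rangle$$
$$\ge \sum_i |\lambda_i|\, \left|\langle i|\sqrt{\rho} - \sqrt{\sigma}|i\rangle\right| = \sum_i |\lambda_i|^2 = \|\sqrt{\rho} - \sqrt{\sigma}\|_2^2. \tag{2.149}$$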
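A quick numerical check of eqs. (2.146) through (2.149) may help make the chain of inequalities concrete. This is a sketch under stated assumptions: NumPy is available, the helper names are illustrative, and the final comparison with $\|\rho-\sigma\|_1$ uses the standard fact that $|\mathrm{tr}(AU)| \le \|A\|_1$ for unitary $U$, which is not spelled out in the lines above.

```python
import numpy as np

def psd_sqrt(a):
    """Square root of a positive semidefinite Hermitian matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def l1_norm(a):
    """Sum of the singular values of a."""
    return float(np.sum(np.linalg.svd(a, compute_uv=False)))

def l2_norm(a):
    """Square root of the sum of squared singular values of a."""
    return float(np.sqrt(np.sum(np.linalg.svd(a, compute_uv=False) ** 2)))

def random_density_matrix(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(3)
rho = random_density_matrix(4, rng)
sigma = random_density_matrix(4, rng)

# Diagonalize sqrt(rho) - sqrt(sigma) and build U = sum_i sign(lambda_i)|i><i|.
delta = psd_sqrt(rho) - psd_sqrt(sigma)
lam, vecs = np.linalg.eigh(delta)
U = (vecs * np.sign(lam)) @ vecs.conj().T

lhs = np.trace((rho - sigma) @ U).real      # tr (rho - sigma) U
rhs = l2_norm(delta) ** 2                   # || sqrt(rho) - sqrt(sigma) ||_2^2
trace_distance = l1_norm(rho - sigma)       # || rho - sigma ||_1

print(trace_distance, lhs, rhs)             # expected ordering: first >= second >= third
```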
Finally, using
and
$$|\langle\varphi|\psi\rangle|^2 = \sin^2\theta. \tag{2.157}$$
Therefore,
2.7 Summary
Axioms. The arena of quantum mechanics is a Hilbert space H. The
fundamental assumptions are:
(1) A state is a ray in H.
(2) An observable is a self-adjoint operator on H.
(3) A measurement is an orthogonal projection.
2.8 Exercises
2.1 Fidelity of measurement
a) What do you think of Alice’s idea? Hint: What does the unitarity
of $U$ tell you about how the states $|\beta'\rangle_B$ and $|\tilde\beta'\rangle_B$ are
related to one another?
b) Would you feel differently if the states $|\varphi\rangle_A$ and $|\tilde\varphi\rangle_A$ were
orthogonal?
doesn’t want to tell him who the winner will be. But after the Series
is over, Alice wants to be able to convince Bob that she knew the
outcome all along. What to do?
Bob suggests that Alice write down her choice (0 if the Yankees will
win, 1 if the Dodgers will win) on a piece of paper, and lock the
paper in a box. She is to give the box to Bob, but she will keep the
key for herself. Then, when she is ready to reveal her choice, she
will send the key to Bob, allowing him to open the box and read
the paper.
Alice rejects this proposal. She doesn’t trust Bob, and she knows
that he is a notorious safe cracker. Who’s to say whether he will be
able to open the box and sneak a look, even if he doesn’t have the
key?
Instead, Alice proposes to certify her honesty in another way, using
quantum information. To commit to a value a ∈ {0, 1} of her bit, she
prepares one of two distinguishable density operators (ρ0 or ρ1 ) of
the bipartite system AB, sends system B to Bob, and keeps system
A for herself. Later, to unveil her bit, Alice sends system A to Bob,
and he performs a measurement to determine whether the state of
AB is ρ0 or ρ1 . This protocol is called quantum bit commitment.
We say that the protocol is binding if, after commitment, Alice is
unable to change the value of her bit. We say that the protocol is
concealing if, after commitment and before unveiling, Bob is unable
to discern the value of the bit. The protocol is secure if it is both
binding and concealing.
Show that if a quantum bit commitment protocol is concealing, then
it is not binding. Thus quantum bit commitment is insecure.
Hint: First argue that without loss of generality, we may assume
that the states ρ0 and ρ1 are both pure. Then apply the HJW
Theorem.
Remark: Note that the conclusion that quantum bit commitment
is insecure still applies even if the shared bipartite state (ρ0 or ρ1 ) is
prepared during many rounds of quantum communication between
Alice and Bob, where in each round one party performs a quantum
operation on his/her local system and on a shared message system,
and then sends the message system to the other party.
2.4 Completeness of subsystem correlations
Consider a bipartite system AB. Suppose that many copies of the
(not necessarily pure) state ρAB have been prepared. An observer
Alice with access only to subsystem A can measure the expectation
Remark: It follows from (c) alone that the correlations of the “lo-
cal” observables determine the expectation values of all the observ-
ables. Parts (a) and (b) serve to establish that ρ is completely
determined by the expectation values of a complete set of observ-
ables.
this distance is zero if the two distributions are identical, and attains
its maximum value two if the two distributions have support on
disjoint sets.
a) Show that
$$d(p, q) \le \sum_{i=0}^{d-1} |\lambda_i| = \|\rho - \sigma\|_1 \equiv d(\rho, \sigma), \tag{2.173}$$
A matrix is doubly stochastic if its entries are nonnegative real numbers
such that $\sum_\mu D_{\mu i} = \sum_i D_{\mu i} = 1$. That the columns sum to one
assures that D maps probability vectors to probability vectors (i.e.,
is stochastic). That the rows sum to one assures that D maps the
uniform distribution to itself. Applied repeatedly, D takes any input
distribution closer and closer to the uniform distribution (unless D
is a permutation, with one nonzero entry in each row and column).
Thus we can view majorization as a partial order on probability
vectors such that q ≺ p means that q is more nearly uniform than p
(or equally close to uniform, in the case where D is a permutation).
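The statements above about doubly stochastic matrices can be illustrated with a small numerical example. This is a sketch, assuming NumPy; the function names are hypothetical. The majorization test uses the standard partial-sum characterization (the sorted partial sums of $q$ never exceed those of $p$), which is equivalent to $q = Dp$ for some doubly stochastic $D$ but is not the definition used in the text.

```python
import numpy as np

def is_doubly_stochastic(D, tol=1e-12):
    """Nonnegative entries, with every row and every column summing to one."""
    D = np.asarray(D, dtype=float)
    return bool(np.all(D >= -tol)
                and np.allclose(D.sum(axis=0), 1.0, atol=tol)
                and np.allclose(D.sum(axis=1), 1.0, atol=tol))

def is_majorized_by(q, p, tol=1e-12):
    """True if q is majorized by p (partial-sum criterion on sorted vectors)."""
    qs = np.sort(q)[::-1]
    ps = np.sort(p)[::-1]
    same_total = abs(qs.sum() - ps.sum()) <= tol
    return bool(same_total and np.all(np.cumsum(qs) <= np.cumsum(ps) + tol))

# A convex mixture of two permutations is doubly stochastic.
P = np.roll(np.eye(3), 1, axis=0)          # cyclic permutation matrix
D = 0.6 * np.eye(3) + 0.4 * P

p = np.array([0.7, 0.2, 0.1])
q = D @ p                                  # more nearly uniform than p
q_many = np.linalg.matrix_power(D, 50) @ p # repeated application of D

print(q, q_many)
```

Here `q` is majorized by `p` but not the other way around, and `q_many` is very close to the uniform distribution, in line with the remark that repeated application of $D$ drives any input toward uniformity unless $D$ is a permutation.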
Show that normalized pure states $\{|\varphi_\mu\rangle\}$ exist such that eq.(2.176)
is satisfied if and only if $q \prec p$, where $p$ is the vector of eigenvalues
of $\rho$.
Hint: Recall that, according to the HJW Theorem, if eq.(2.175)
and eq.(2.176) are both satisfied then there is a unitary matrix $V_{\mu i}$
such that
$$\sqrt{q_\mu}\, |\varphi_\mu\rangle = \sum_i \sqrt{p_i}\, V_{\mu i} |\alpha_i\rangle. \tag{2.178}$$
You may also use (but need not prove) Horn’s Lemma: if $q \prec p$,
then there exists a unitary (in fact, orthogonal) matrix $U_{\mu i}$ such
that $q = Dp$ and $D_{\mu i} = |U_{\mu i}|^2$.
$$|\Psi\rangle \mapsto |\Psi_a\rangle = \frac{I \otimes E^B_a |\Psi\rangle}{\left(\langle\Psi| I \otimes E^B_a |\Psi\rangle\right)^{1/2}}. \tag{2.180}$$
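$$\begin{array}{ccc} X \otimes I & I \otimes X & X \otimes X \\ I \otimes Y & Y \otimes I & Y \otimes Y \\ X \otimes Y & Y \otimes X & Z \otimes Z \end{array} \tag{2.184}$$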
The three observables in each row and in each column are mutu-
ally commuting, and so can be simultaneously diagonalized. In fact
the simultaneous eigenstates of any two operators in a row or col-
umn (the third operator is not independent of the other two) are
a complete basis for the four-dimensional Hilbert space of the two
qubits. Thus we can regard the array eq.(2.184) as a way of present-
ing six different ways to choose a complete set of one-dimensional
projectors for two qubits.
Each of these observables has eigenvalues ±1, so that in a determin-
istic and noncontextual model of measurement (for a fixed value of
the hidden variables), each can be assigned a definite value, either
+1 or −1.
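The properties claimed for the array in eq. (2.184) are easy to confirm numerically. The sketch below assumes NumPy and the standard Pauli matrices for $X$, $Y$, $Z$. It checks that the operators in each row and each column commute pairwise and that every entry has eigenvalues $\pm 1$. It also computes the product of each row and each column, a fact not stated in the lines above but which follows by direct multiplication.

```python
import numpy as np
from itertools import combinations

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# The 3x3 array of two-qubit observables from eq. (2.184).
square = [
    [np.kron(X, I2), np.kron(I2, X), np.kron(X, X)],
    [np.kron(I2, Y), np.kron(Y, I2), np.kron(Y, Y)],
    [np.kron(X, Y), np.kron(Y, X), np.kron(Z, Z)],
]
rows = square
cols = [[square[r][c] for r in range(3)] for c in range(3)]

def mutually_commuting(ops):
    """True if every pair of operators in the list commutes."""
    return all(np.allclose(a @ b, b @ a) for a, b in combinations(ops, 2))

all_commute = all(mutually_commuting(triple) for triple in rows + cols)
all_pm_one = all(
    np.allclose(np.linalg.eigvalsh(op), [-1, -1, 1, 1])
    for row in square for op in row
)

# Products along rows and columns (each is +I or -I).
row_products = [t[0] @ t[1] @ t[2] for t in rows]
col_products = [t[0] @ t[1] @ t[2] for t in cols]
signs = ([int(round(m[0, 0].real)) for m in row_products],
         [int(round(m[0, 0].real)) for m in col_products])
print(all_commute, all_pm_one, signs)
```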
where the $\lambda_i$’s are real and nonnegative. (We’re not assuming here
that the vector has unit norm, so the sum of the $\lambda_i$’s is not constrained.)
Eq.(2.185) is called the Schmidt decomposition of the
vector $|\psi\rangle_{AB}$. Of course, the bases in which the vector has the
Schmidt form depend on which vector $|\psi\rangle_{AB}$ is being decomposed.
A unitary transformation acting on $\mathcal{H}_{AB}$ is called a local unitary
if it is a tensor product $U_A \otimes U_B$, where $U_A$, $U_B$ are unitary
transformations acting on $\mathcal{H}_A$, $\mathcal{H}_B$ respectively. The word “local”
is used because if the two parts $A$ and $B$ of the system are widely
separated from one another, so that Alice can access only part $A$
and Bob can access only part $B$, then Alice and Bob can apply this
transformation by each acting locally on her or his part.
a) Now suppose that Alice and Bob choose standard fixed bases
$\{|i\rangle_A\}$ and $\{|i\rangle_B\}$ for their respective Hilbert spaces, and are
presented with a vector $|\psi\rangle_{AB}$ that is not necessarily in the
Schmidt form when expressed in the standard bases. Show
that there is a local unitary $U_A \otimes U_B$ that Alice and Bob can
apply so that the resulting vector
John Preskill
California Institute of Technology
3 Foundations II: Measurement and Evolution
3.1 Orthogonal measurement and beyond
we have not explained how to measure the pointer. Von Neumann’s atti-
tude was that it is possible in principle to correlate the state of a micro-
scopic quantum system with the value of a macroscopic classical variable,
and we may take it for granted that we can perceive the value of the
classical variable. A quantum measurement, then, is a procedure for am-
plifying a property of the microscopic world, making it manifest in the
macroscopic world.
We may think of the pointer as a particle of mass $m$ that propagates
freely apart from its tunable coupling to the quantum system being measured.
Since we intend to measure the position of the pointer, it should be
prepared initially in a wavepacket state that is narrow in position space,
but not too narrow, because a very narrow wavepacket will spread too
rapidly. If the initial width of the wavepacket is $\Delta x$, then the uncertainty
in its velocity will be of order $\Delta v = \Delta p/m \sim \hbar/(m\Delta x)$, so that after a time
$t$, the wavepacket will spread to a width
$$\Delta x(t) \sim \Delta x + \frac{\hbar t}{m\,\Delta x}, \tag{3.1}$$
which is minimized for $(\Delta x(t))^2 \sim (\Delta x)^2 \sim \hbar t/m$. Therefore, if the
experiment takes a time $t$, the resolution we can achieve for the final
position of the pointer is limited by
$$\Delta x \gtrsim (\Delta x)_{\rm SQL} \sim \sqrt{\frac{\hbar t}{m}}, \tag{3.2}$$
the “standard quantum limit.” We will choose our pointer to be suffi-
ciently heavy that this limitation is not serious.
The Hamiltonian describing the coupling of the quantum system to the
pointer has the form
$$H = H_0 + \frac{1}{2m} P^2 + \lambda(t)\, M \otimes P, \tag{3.3}$$
where $P^2/2m$ is the Hamiltonian of the free pointer particle (which we
will henceforth ignore on the grounds that the pointer is so heavy that
spreading of its wavepacket may be neglected), $H_0$ is the unperturbed
Hamiltonian of the system to be measured, and $\lambda$ is a coupling constant
that we are able to turn on and off as desired. The observable to be
measured, $M$, is coupled to the momentum $P$ of the pointer.
If $M$ does not commute with $H_0$, then we have to worry about how
the observable $M$ evolves during the course of the measurement. To
simplify the analysis, let us suppose that either $[M, H_0] = 0$, or else the
measurement is carried out quickly enough that the free evolution of the
system can be neglected during the measurement procedure. Then the
we express $U(T)$ as
$$U(T) = \sum_a |a\rangle \exp\left(-i\lambda T M_a P\right) \langle a|. \tag{3.6}$$
the position of the pointer has become correlated with the value of the
observable $M$. If the pointer wavepacket is narrow enough for us to
resolve all values of the $M_a$ that occur (that is, the width $\Delta x$ of the packet
is small compared to $\lambda T \Delta M_a$, where $\Delta M_a$ is the minimal gap between
eigenvalues of $M$), then when we observe the position of the pointer
(never mind how!) we will prepare an eigenstate of the observable. With
probability $|\alpha_a|^2$, we will detect that the pointer has shifted its position
by $\lambda T M_a$, in which case we will have prepared the $M$ eigenstate $|a\rangle$. We
conclude that the initial state $|\varphi\rangle$ of the quantum system is projected to
$|a\rangle$ with probability $|\langle a|\varphi\rangle|^2$. This is Von Neumann’s model of orthogonal
measurement.
The classic example is the Stern–Gerlach apparatus. To measure $\sigma_3$
for a spin-$\frac{1}{2}$ object, we allow the object to pass through a region of inhomogeneous
magnetic field
$$B_3 = \lambda z. \tag{3.9}$$
We see that if, by coupling the system to our pointer, we can execute
suitable unitary transformations correlating the system and the pointer,
and if we can observe the pointer in its fiducial basis, then we are empow-
ered to perform any conceivable orthogonal measurement on the system.
Measuring the pointer by projecting onto the basis $\{|0\rangle, |1\rangle\}$ would induce
an orthogonal measurement of the system, also in the $\{|0\rangle, |1\rangle\}$ basis. But
suppose that we measure the pointer in a different basis instead, such as
$\{|\pm\rangle = \frac{1}{\sqrt{2}}(|0\rangle \pm |1\rangle)\}$. Then the measurement postulate dictates that the
two outcomes $+$ and $-$ occur equiprobably, and that the corresponding
post-measurement states of the system are
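where
$$M_+ = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \frac{1}{\sqrt{2}}\, I, \qquad M_- = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = \frac{1}{\sqrt{2}}\, \sigma_3. \tag{3.22}$$
Evidently, by measuring $B$ in the basis $\{|\pm\rangle\}$, we prepare $A$ in one of the
states $M_\pm |\psi\rangle$, up to a normalization factor.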
Now let’s generalize this idea to the case where the pointer system
$B$ is $N$-dimensional, and the measurement of the pointer projects onto
an orthonormal basis $\{|a\rangle,\ a = 0, 1, 2, \ldots, N-1\}$. Again we’ll assume
that the system $A$ and pointer $B$ are initially in a product state, then
an entangling unitary transformation $U$ correlates the system with the
pointer. By expanding the action of $U$ in the basis for $B$ we obtain
$$U : |\psi\rangle_A \otimes |0\rangle_B \mapsto \sum_a M_a |\psi\rangle_A \otimes |a\rangle_B. \tag{3.23}$$
Since $U$ is unitary, it preserves the norm of any input, which means that
$$1 = \left\| \sum_a M_a |\psi\rangle \otimes |a\rangle \right\|^2 = \sum_{a,b} \langle\psi| M_a^\dagger M_b |\psi\rangle \langle a|b\rangle = \sum_a \langle\psi| M_a^\dagger M_a |\psi\rangle \tag{3.24}$$
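The construction in eqs. (3.23) and (3.24) can be sketched numerically: pick any unitary on system plus pointer, read off the operators $M_a$ as the blocks $\langle a|_B\, U\, |0\rangle_B$, and confirm that they satisfy the completeness relation. The example below assumes NumPy; the dimensions, the random unitary, and the variable names are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
dA, N = 2, 3                      # system dimension, pointer dimension

# A random unitary on A (x) B: the Q factor of a complex Gaussian matrix.
g = rng.normal(size=(dA * N, dA * N)) + 1j * rng.normal(size=(dA * N, dA * N))
U, _ = np.linalg.qr(g)

# View U as U[i, a, j, b] = <i|_A <a|_B U |j>_A |b>_B and fix b = 0.
U4 = U.reshape(dA, N, dA, N)
kraus = [U4[:, a, :, 0] for a in range(N)]      # M_a = <a|_B U |0>_B

# Completeness, eq. (3.24): sum_a M_a^dagger M_a = I.
completeness = sum(M.conj().T @ M for M in kraus)

# Check eq. (3.23) on a random normalized system state.
psi = rng.normal(size=dA) + 1j * rng.normal(size=dA)
psi /= np.linalg.norm(psi)
pointer = np.eye(N)
lhs = U @ np.kron(psi, pointer[0])
rhs = sum(np.kron(M @ psi, pointer[a]) for a, M in enumerate(kraus))

# Squared norms of the terms in eq. (3.23); they sum to one.
probs = [np.vdot(M @ psi, M @ psi).real for M in kraus]
print(np.round(completeness, 12), sum(probs))
```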
3.2.2 Reversibility
A unitary transformation U has a unitary inverse U † . Thus if today’s
quantum state was obtained by applying U to yesterday’s state, we can
in principle recover yesterday’s state by applying U † to today’s state.
Unitary time evolution is reversible.
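Is the same true for general quantum channels? If channel $\mathcal{E}_1$ with
Kraus operators $\{M_a\}$ is inverted by channel $\mathcal{E}_2$ with Kraus operators
$\{N_\mu\}$, then for any pure state $|\psi\rangle$ we have
$$\mathcal{E}_2 \circ \mathcal{E}_1 (|\psi\rangle\langle\psi|) = \sum_{\mu,a} N_\mu M_a |\psi\rangle\langle\psi| M_a^\dagger N_\mu^\dagger = |\psi\rangle\langle\psi|. \tag{3.36}$$
Since the left-hand side is a sum of positive terms, eq.(3.36) can hold only
if each of these terms is proportional to $|\psi\rangle\langle\psi|$, hence
$$N_\mu M_a = \lambda_{\mu a} I \tag{3.37}$$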
so that
$$\mathrm{tr}\left(A\, \mathcal{E}(\rho)\right) = \mathrm{tr}\left(\mathcal{E}^*(A)\, \rho\right). \tag{3.48}$$
We say that $\mathcal{E}^*$ is the dual or adjoint of $\mathcal{E}$.
Note that the dual of a channel need not be a channel, that is, might
not be trace preserving. Instead, the completeness property of the Kraus
operators $\{M_a\}$ implies that
$$\mathcal{E}^*(I) = I \tag{3.49}$$
if $\mathcal{E}$ is a channel. We say that a map is unital if it preserves the identity
operator, and conclude that the dual of a channel is a unital map.
Not all quantum channels are unital, but some are. If the Kraus operators
of $\mathcal{E}$ satisfy
$$\sum_a M_a^\dagger M_a = I = \sum_a M_a M_a^\dagger, \tag{3.50}$$
then $\mathcal{E}$ is unital and its dual $\mathcal{E}^*$ is also a unital channel. A unital quantum
channel maps a maximally mixed density operator to itself; it is the
quantum version of a doubly stochastic classical map, which maps proba-
bility distributions to probability distributions and preserves the uniform
distribution.
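The duality relation (3.48) and the distinction between unital and non-unital channels can be illustrated with two qubit examples. This is a sketch, assuming NumPy. The Kraus operators below (an amplitude-damping-type channel and a phase-flip channel with parameter `p`) are illustrative choices made for this example, not definitions taken from the lines above.

```python
import numpy as np

def channel(kraus, rho):
    """E(rho) = sum_a M_a rho M_a^dagger."""
    return sum(M @ rho @ M.conj().T for M in kraus)

def dual(kraus, A):
    """E*(A) = sum_a M_a^dagger A M_a."""
    return sum(M.conj().T @ A @ M for M in kraus)

p = 0.3
I2 = np.eye(2, dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)

# Illustrative channels (assumed forms, chosen only for this example).
amp_damp = [np.array([[1, 0], [0, np.sqrt(1 - p)]], dtype=complex),
            np.array([[0, np.sqrt(p)], [0, 0]], dtype=complex)]
phase_flip = [np.sqrt(1 - p) * I2, np.sqrt(p) * Z]

rng = np.random.default_rng(2)
g = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho = g @ g.conj().T
rho /= np.trace(rho).real
h = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
A = h + h.conj().T                      # a random Hermitian observable

lhs = np.trace(A @ channel(amp_damp, rho))
rhs = np.trace(dual(amp_damp, A) @ rho)

print(abs(lhs - rhs))                   # duality, eq. (3.48)
print(dual(amp_damp, I2).real)          # the dual of a channel is unital
print(channel(amp_damp, I2).real)       # this channel itself is not unital
print(channel(phase_flip, I2).real)     # this one satisfies eq. (3.50)
```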
3.2.5 Linearity
A quantum channel specifies how an initial density operator evolves to a
final density operator. Why on general grounds should we expect evolu-
tion of a quantum state to be described by a linear map? One possible
answer is that nonlinear evolution would be incompatible with interpret-
ing the density operator as an ensemble of possible states.
Suppose that $\mathcal{E}$ maps an initial state at time $t = 0$ to a final state at
time $t = T$, and suppose that at time $t = 0$ the initial state $\rho_i$ is prepared
with probability $p_i$. Then the time-evolved state at $t = T$ will be $\mathcal{E}(\rho_i)$
with probability $p_i$.
On the other hand we argued in Chapter 2 that an ensemble in which
$\sigma_i$ is prepared with probability $q_i$ can be described by the convex combination
of density operators
$$\sigma = \sum_i q_i \sigma_i. \tag{3.57}$$
Therefore the initial state is described by $\sum_i p_i \rho_i$, which evolves to
$$\rho' = \mathcal{E}\left(\sum_i p_i \rho_i\right). \tag{3.58}$$
But we can also apply eq.(3.57) to the ensemble of final states, concluding
that the final state may alternatively be described by
$$\rho' = \sum_i p_i\, \mathcal{E}(\rho_i). \tag{3.59}$$
Equating the two expressions for $\rho'$ we find that $\mathcal{E}$ must act linearly, at
least on convex combinations of states.
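and hence
$$T : \rho \mapsto \rho^T. \tag{3.65}$$
The map $T$ is evidently positive because
$$\langle\psi|\rho^T|\psi\rangle = \sum_{i,j} \psi_j^* \left(\rho^T\right)_{ji} \psi_i = \sum_{i,j} \psi_i\, (\rho)_{ij}\, \psi_j^* = \langle\psi^*|\rho|\psi^*\rangle \tag{3.66}$$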
that is, it maps $|\tilde\Phi\rangle\langle\tilde\Phi|$ to the SWAP operator which interchanges the
systems $A$ and $B$:
$$\mathrm{SWAP} : |\psi\rangle_A \otimes |\varphi\rangle_B = \sum_{i,j} \psi_i \varphi_j |i, j\rangle \mapsto \sum_{i,j} \varphi_j \psi_i |j, i\rangle = |\varphi\rangle_A \otimes |\psi\rangle_B \tag{3.69}$$
Since the square of SWAP is the identity, its eigenvalues are $\pm 1$. States
which are symmetric under interchange of $A$ and $B$ have eigenvalue $1$,
while antisymmetric states have eigenvalue $-1$. Thus SWAP has negative
eigenvalues, which means that $T \otimes I$ is not positive and therefore $T$ is not
completely positive.
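The claim that the partial transpose of $|\tilde\Phi\rangle\langle\tilde\Phi|$ is the SWAP operator can be verified directly. The following sketch assumes NumPy and uses the unnormalized vector $|\tilde\Phi\rangle = \sum_i |i\rangle \otimes |i\rangle$; the dimension `d = 3` is an arbitrary choice.

```python
import numpy as np

d = 3
phi = np.eye(d).reshape(d * d)            # unnormalized |Phi~> = sum_i |i>|i>
proj = np.outer(phi, phi.conj())          # |Phi~><Phi~|

# Partial transpose on the first factor: exchange its row and column indices.
pt = proj.reshape(d, d, d, d).transpose(2, 1, 0, 3).reshape(d * d, d * d)

# SWAP|i, j> = |j, i>
swap = np.zeros((d * d, d * d))
for i in range(d):
    for j in range(d):
        swap[j * d + i, i * d + j] = 1.0

eigenvalues = np.linalg.eigvalsh(pt)
print(np.allclose(pt, swap), eigenvalues.min(), eigenvalues.max())
```

The smallest eigenvalue is $-1$, so $T \otimes I$ maps a positive operator to one that is not positive, even though $T$ alone preserves positivity.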
(This scheme for extracting the action on $|\varphi\rangle_A$ using the dual vector ${}_R\langle\varphi^*|$
is called the relative-state method.) Given a vector $|\tilde\Psi_a\rangle_{RA'}$, where $R$ is $d$
dimensional, we may define an operator $M_a$ mapping $\mathcal{H}_A$ to $\mathcal{H}_{A'}$ (where
$A$ is $d$ dimensional) by
$$M_a |\varphi\rangle_A = {}_R\langle\varphi^*|\tilde\Psi_a\rangle_{RA'}; \tag{3.74}$$
it is easy to check that $M_a$ is linear. Thus eq.(3.73) provides an operator-sum
representation of $\mathcal{E}$ acting on the pure state $(|\varphi\rangle\langle\varphi|)_A$ (and hence by
linearity acting on any density operator):
$$\mathcal{E}(\rho) = \sum_a M_a \rho M_a^\dagger. \tag{3.75}$$
We have now established the desired isomorphism between states and
CP maps: Eq.(3.71) tells us how to obtain a state on $RA'$ from the channel
$\mathcal{E}_{A \to A'}$, while eq.(3.74) and eq.(3.75) tell us how to recover the CP map
from the state. Furthermore, the $\{M_a\}$ must obey the completeness
relation $\sum_a M_a^\dagger M_a = I$ if $\mathcal{E}$ is trace-preserving.
Put succinctly, the argument went as follows. Because $\mathcal{E}_{A \to A'}$ is completely
positive, $I \otimes \mathcal{E}$ takes a maximally entangled state on $RA$ to a
density operator on $RA'$, up to normalization. This density operator can
be expressed as an ensemble of pure states, and each of these pure states
is associated with a Kraus operator in the operator-sum representation of
$\mathcal{E}$.
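The isomorphism summarized above can be traced through numerically: build the operator $(I \otimes \mathcal{E})(|\tilde\Phi\rangle\langle\tilde\Phi|)$, decompose it into unnormalized pure states $|\tilde\Psi_a\rangle$ by diagonalizing, and turn each one into a Kraus operator via the relative-state rule (3.74). This is a sketch, assuming NumPy; the starting channel (amplitude-damping-type Kraus operators with parameter `p`) is an illustrative assumption, and the eigenvector ensemble is just one of the many possible ensemble realizations.

```python
import numpy as np

def channel(kraus, rho):
    return sum(M @ rho @ M.conj().T for M in kraus)

d, p = 2, 0.3
# Illustrative channel used as input (assumed form, chosen for this example).
kraus = [np.array([[1, 0], [0, np.sqrt(1 - p)]], dtype=complex),
         np.array([[0, np.sqrt(p)], [0, 0]], dtype=complex)]

# (I (x) E)(|Phi~><Phi~|) with |Phi~> = sum_i |i>_R |i>_A.
choi = np.zeros((d * d, d * d), dtype=complex)
for i in range(d):
    for j in range(d):
        e_ij = np.zeros((d, d), dtype=complex)
        e_ij[i, j] = 1.0
        choi += np.kron(e_ij, channel(kraus, e_ij))

# Ensemble of unnormalized pure states from the spectral decomposition.
w, v = np.linalg.eigh(choi)
recovered = []
for a in range(d * d):
    if w[a] > 1e-12:
        psi_a = np.sqrt(w[a]) * v[:, a]            # |Psi~_a>, indices (R, A')
        recovered.append(psi_a.reshape(d, d).T)    # (M_a)_{k i} = Psi_{i k}

# The recovered operators should reproduce the channel on any input.
rng = np.random.default_rng(5)
g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = g @ g.conj().T
rho /= np.trace(rho).real
completeness = sum(M.conj().T @ M for M in recovered)
print(len(recovered), np.allclose(channel(recovered, rho), channel(kraus, rho)))
```

The recovered operators generally differ from the ones we started with, yet they define the same map, which is the Kraus-operator freedom discussed in the text.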
From this viewpoint, we see that the freedom to choose the Kraus
operators representing a channel in many different ways is really the same
thing as the freedom to choose the ensemble of pure states representing a
density operator in many different ways. According to the HJW theorem,
two different ensemble realizations of the same density operator,
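$$(I \otimes \mathcal{E}) \left(|\tilde\Phi\rangle\langle\tilde\Phi|\right)_{RA} = \sum_a \left(|\tilde\Psi_a\rangle\langle\tilde\Psi_a|\right)_{RA'} = \sum_\mu \left(|\tilde\gamma_\mu\rangle\langle\tilde\gamma_\mu|\right)_{RA'}, \tag{3.76}$$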
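If the initial state is $|\phi^+\rangle_{RA}$, then when the depolarizing channel acts on
qubit $A$, the entangled state evolves as
$$|\phi^+\rangle\langle\phi^+| \mapsto (1 - p)\, |\phi^+\rangle\langle\phi^+| + \frac{p}{3} \left( |\psi^+\rangle\langle\psi^+| + |\psi^-\rangle\langle\psi^-| + |\phi^-\rangle\langle\phi^-| \right). \tag{3.89}$$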
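The “worst possible” quantum channel has $p = 3/4$, for in that case the
initial entangled state evolves as
$$|\phi^+\rangle\langle\phi^+| \mapsto \frac{1}{4} \left( |\phi^+\rangle\langle\phi^+| + |\phi^-\rangle\langle\phi^-| + |\psi^+\rangle\langle\psi^+| + |\psi^-\rangle\langle\psi^-| \right) = \frac{1}{4}\, I; \tag{3.90}$$
it becomes the maximally mixed density matrix on $RA$. By the relative-state
method, then, we see that a pure state $|\psi\rangle$ of qubit $A$ evolves as
$$(|\psi\rangle\langle\psi|)_A \mapsto {}_R\langle\psi^*|\, 2 \left( \frac{1}{4}\, I_{RA} \right) |\psi^*\rangle_R = \frac{1}{2}\, I_A, \tag{3.91}$$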
where the factor of two has been inserted because here we have used the
standard normalization of the entangled states, instead of the unconven-
tional normalization used in our earlier discussion of the relative-state
method. We see that, for p = 3/4, the qubit is mapped to the maximally
mixed density operator on $A$, irrespective of the value of the initial state
$|\psi\rangle_A$. It is as though the channel threw away the initial quantum state,
and replaced it by completely random junk.
An alternative way to express the evolution of the maximally entangled
state is
$$|\phi^+\rangle\langle\phi^+| \mapsto \left( 1 - \frac{4}{3}\, p \right) |\phi^+\rangle\langle\phi^+| + \frac{4}{3}\, p \left( \frac{1}{4}\, I_{RA} \right). \tag{3.92}$$
Thus instead of saying that an error occurs with probability $p$, with errors
of three types all equally likely, we could instead say that an error occurs
with probability $\frac{4}{3} p$, where the error completely “randomizes” the state
(at least we can say that for $p \le 3/4$). The existence of two natural ways
to define an “error probability” for this channel can sometimes cause
confusion.
One useful measure of how well the channel preserves the original quantum
information is called the “entanglement fidelity” $F_e$. It quantifies how
“close” the final density matrix is to the original maximally entangled
state $|\phi^+\rangle$ after the action of $I \otimes \mathcal{E}$:
$$F_e = \langle\phi^+|\rho'|\phi^+\rangle. \tag{3.93}$$
For the depolarizing channel, we have $F_e = 1 - p$, and we can interpret
$F_e$ as the probability that no error occurred.
where
$$\vec{P}' = \left( 1 - \frac{4}{3}\, p \right) \vec{P} \tag{3.96}$$
Hence the Bloch sphere contracts uniformly under the action of the channel
(for $p \le 3/4$); the spin polarization shrinks by the factor $1 - \frac{4}{3} p$ (which
is why we call it the depolarizing channel).
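The entanglement fidelity $F_e = 1 - p$ and the Bloch-vector contraction of eq. (3.96) can both be checked with a few lines of code. This sketch assumes NumPy and assumes the usual Kraus form of the depolarizing channel, $\sqrt{1-p}\, I$ together with $\sqrt{p/3}$ times each Pauli matrix, which matches the description of three equally likely error types; the function names are illustrative.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def depolarizing_kraus(p):
    """Assumed Kraus form: no error with prob. 1 - p, each Pauli with prob. p/3."""
    return [np.sqrt(1 - p) * I2] + [np.sqrt(p / 3) * s for s in (X, Y, Z)]

def channel(kraus, rho):
    return sum(K @ rho @ K.conj().T for K in kraus)

def evolve_bell_state(p):
    """Apply I (x) E to |phi+><phi+|; return (F_e, output state)."""
    phi_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
    rho_in = np.outer(phi_plus, phi_plus.conj())
    extended = [np.kron(I2, K) for K in depolarizing_kraus(p)]
    rho_out = channel(extended, rho_in)
    return np.vdot(phi_plus, rho_out @ phi_plus).real, rho_out

def bloch_vector(rho):
    return np.array([np.trace(rho @ s).real for s in (X, Y, Z)])

p = 0.2
fe, _ = evolve_bell_state(p)

P = np.array([0.3, -0.4, 0.5])
rho = 0.5 * (I2 + P[0] * X + P[1] * Y + P[2] * Z)
P_out = bloch_vector(channel(depolarizing_kraus(p), rho))

_, rho_worst = evolve_bell_state(0.75)
print(fe, P_out / P, np.round(rho_worst.real, 12))
```

With `p = 0.2` the fidelity is `1 - p` and every Bloch component shrinks by the same factor `1 - 4p/3`; at `p = 3/4` the two-qubit output is the maximally mixed state.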
In this case, unlike the depolarizing channel, qubit $A$ does not make any
transitions in the $\{|0\rangle, |1\rangle\}$ basis. Instead, the environment “scatters” off
of the qubit occasionally (with probability $p$), being kicked into the state
$|1\rangle_E$ if $A$ is in the state $|0\rangle_A$ and into the state $|2\rangle_E$ if $A$ is in the state
$|1\rangle_A$. Furthermore, also unlike the depolarizing channel, the channel picks
out a preferred basis for qubit $A$; the basis $\{|0\rangle, |1\rangle\}$ is the only basis in
which bit flips never occur.
time scales relevant to its dynamics. On the one hand, there is a damping
time scale, the time for a significant amount of the particle’s momentum
to be transferred to the photons, which is a long time for such a heavy
particle. On the other hand, there is the decoherence time scale. In this
model, the time scale for decoherence is of order $\Gamma^{-1}$, the time for a single
photon to be scattered by the dust grain, which is far shorter than the
damping time scale. For a macroscopic object, decoherence is fast.
As we have already noted, the phase-damping channel picks out a pre-
ferred basis for decoherence, which in our “interpretation” we have as-
sumed to be the position-eigenstate basis. Physically, decoherence prefers
the spatially localized states of the dust grain because the interactions
of photons and grains are localized in space. Grains in distinguishable
positions tend to scatter the photons of the environment into mutually
orthogonal states.
Even if the separation between the “grains” were so small that it could
not be resolved very well by the scattered photons, the decoherence process
would still work in a similar way. Perhaps photons that scatter off
grains at positions $x$ and $-x$ are not mutually orthogonal, but instead
have an overlap
$$\langle\gamma+|\gamma-\rangle = 1 - \varepsilon, \qquad \varepsilon \ll 1. \tag{3.106}$$
The phase-damping channel would still describe this situation, but with $p$
replaced by $p\varepsilon$ (if $p$ is still the probability of a scattering event). Thus, the
decoherence rate would become $\Gamma_{\rm dec} = \varepsilon\, \Gamma_{\rm scat}$, where $\Gamma_{\rm scat}$ is the scattering
rate.
The intuition we distill from this simple model applies to a wide va-
riety of physical situations. A coherent superposition of macroscopically
distinguishable states of a “heavy” object decoheres very rapidly com-
pared to its damping rate. The spatial locality of the interactions of the
system with its environment gives rise to a preferred “local” basis for de-
coherence. The same principle applies to Schrödinger’s unfortunate cat,
delicately prepared in a coherent superposition of its dead state and its
alive state, two states that are easily distinguished by spatially localized
probes. The cat quickly interacts with its environment, which is “scattered”
into one of two mutually orthogonal states perfectly correlated
with the cat’s state in the $\{|{\rm dead}\rangle, |{\rm alive}\rangle\}$ basis, thus transforming the
cat into an incoherent mixture of those two basis states.
Visibility. On the other hand, for microscopic systems the time scale for
decoherence need not be short compared to dynamical time scales. Consider
for example a single atom, initially prepared in a uniform superposition
of its ground state $|0\rangle$ and an excited state $|1\rangle$ with energy $\hbar\omega$ above
the ground state energy. Neglecting decoherence, after time $t$ the atom’s
state will be
$$|\psi(t)\rangle = \frac{1}{\sqrt{2}} \left( |0\rangle + e^{-i\omega t} |1\rangle \right). \tag{3.107}$$
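If dephasing occurs in the $\{|0\rangle, |1\rangle\}$ basis with rate $\Gamma$, the off-diagonal
terms in the density operator decay, yielding the density operator
$$\rho(t) = \frac{1}{2} \begin{pmatrix} 1 & e^{i\omega t} e^{-\Gamma t} \\ e^{-i\omega t} e^{-\Gamma t} & 1 \end{pmatrix}. \tag{3.108}$$
If after time $t$ we measure the atom in the basis
$$|\pm\rangle = \frac{1}{\sqrt{2}} \left( |0\rangle \pm |1\rangle \right), \tag{3.109}$$
the probability of the $+$ outcome is
$$\mathrm{Prob}(+, t) = \langle +|\rho(t)|+\rangle = \frac{1}{2} \left( 1 + e^{-\Gamma t} \cos\omega t \right). \tag{3.110}$$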
In principle this time dependence of the probability can be measured by
varying the time t between the preparation and measurement, and by re-
peating the experiment many times for each t to estimate the probability
with high statistical confidence. The decoherence rate Γ can be deter-
mined experimentally by fitting the declining visibility of the coherent
oscillations of Prob(+, t) to a decaying exponential function of t.
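The fitting procedure described above can be sketched with synthetic, noise-free data. The example assumes NumPy; the values of `omega` and `gamma_true` are arbitrary, and sampling exactly at the oscillation maxima is an idealization that a real experiment would replace by a fit to the full decaying oscillation.

```python
import numpy as np

def rho_t(t, omega, gamma):
    """Density operator of eq. (3.108)."""
    c = np.exp(1j * omega * t) * np.exp(-gamma * t)
    return 0.5 * np.array([[1, c], [np.conj(c), 1]])

def prob_plus(t, omega, gamma):
    """Prob(+, t) = <+|rho(t)|+>, as in eq. (3.110)."""
    plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
    return np.vdot(plus, rho_t(t, omega, gamma) @ plus).real

omega, gamma_true = 2 * np.pi * 5.0, 0.7

# Sample at the oscillation maxima t_k = 2 pi k / omega, where cos(omega t) = 1,
# so that the visibility 2 Prob(+) - 1 equals exp(-gamma t).
times = 2 * np.pi * np.arange(1, 11) / omega
visibility = np.array([2 * prob_plus(t, omega, gamma_true) - 1 for t in times])

# A straight-line fit to log(visibility) versus t has slope -gamma.
slope, intercept = np.polyfit(times, np.log(visibility), 1)
gamma_fit = -slope
print(gamma_fit)
```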
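$$\rho \mapsto \mathcal{E}(\rho) = M_0 \rho M_0^\dagger + M_1 \rho M_1^\dagger$$
$$= \begin{pmatrix} \rho_{00} & \sqrt{1-p}\, \rho_{01} \\ \sqrt{1-p}\, \rho_{10} & (1-p)\, \rho_{11} \end{pmatrix} + \begin{pmatrix} p\, \rho_{11} & 0 \\ 0 & 0 \end{pmatrix}$$
$$= \begin{pmatrix} \rho_{00} + p\, \rho_{11} & \sqrt{1-p}\, \rho_{01} \\ \sqrt{1-p}\, \rho_{10} & (1-p)\, \rho_{11} \end{pmatrix}. \tag{3.114}$$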
Time dependence. If $\Gamma$ is the spontaneous decay rate per unit time, then
the decay occurs with probability $p = \Gamma\Delta t \ll 1$ in a small time interval
$\Delta t$. We find the density operator after time $t = n\Delta t$ by applying the
channel $n$ times in succession. The $\rho_{11}$ matrix element then decays as
the expected exponential decay law, while the off-diagonal entries decay
by the factor $(1 - p)^{n/2} = e^{-\Gamma t/2}$; hence we find
It is customary to use “$T_1$” to denote the exponential decay time for the
excited population, and to use “$T_2$” to denote the exponential decay time
for the off-diagonal terms in the density operator. In some systems where
dephasing is very rapid $T_2$ is much shorter than $T_1$, but we see that for the
amplitude-damping channel these two times are related and comparable:
By the time that $t \gg T_1$, the atom is in its ground state with high
probability ($\rho_{00}(t) \approx 1$).
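The repeated application of the channel can be simulated directly from the matrix form in eq. (3.114). The sketch below assumes NumPy; the initial state `rho0`, the rate `gamma`, and the step count `n` are arbitrary illustrative values. It shows the excited population decaying by $e^{-\Gamma t}$ and the off-diagonal entry by $e^{-\Gamma t/2}$, up to discretization error of relative order $\Gamma^2 t\, \Delta t$.

```python
import numpy as np

def amplitude_damping_step(rho, p):
    """One application of the map written out in eq. (3.114)."""
    s = np.sqrt(1 - p)
    return np.array([[rho[0, 0] + p * rho[1, 1], s * rho[0, 1]],
                     [s * rho[1, 0], (1 - p) * rho[1, 1]]])

gamma, t, n = 1.0, 2.0, 20000
p = gamma * t / n                      # p = Gamma * dt << 1

rho0 = np.array([[0.25, 0.3], [0.3, 0.75]], dtype=complex)
rho = rho0.copy()
for _ in range(n):
    rho = amplitude_damping_step(rho, p)

population_ratio = (rho[1, 1] / rho0[1, 1]).real      # compare with exp(-gamma t)
coherence_ratio = abs(rho[0, 1]) / abs(rho0[0, 1])    # compare with exp(-gamma t / 2)
print(population_ratio, np.exp(-gamma * t))
print(coherence_ratio, np.exp(-gamma * t / 2))
```

Reading off the two exponents gives decay times $1/\Gamma$ for the population and $2/\Gamma$ for the coherence in this model.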
been $|1\rangle$, or no photon is detected, in which case the initial state must
have been $|0\rangle$. It’s odd but true: we can project out the state $|0\rangle$ of the
atom by not detecting anything.
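we find
$$\dot\rho = \mathcal{L}(\rho), \tag{3.127}$$
where the linear map $\mathcal{L}$ generating time evolution is called the Liouvillian
or Lindbladian. This evolution equation has the formal solution
$$\rho(t) = \lim_{n \to \infty} \left( 1 + \frac{\mathcal{L} t}{n} \right)^n (\rho(0)) = e^{\mathcal{L} t} (\rho(0)) \tag{3.128}$$
if $\mathcal{L}$ is time independent.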
The channel has an operator-sum representation
$$\rho(t + dt) = \mathcal{E}_{dt}(\rho(t)) = \sum_a M_a \rho(t) M_a^\dagger = \rho(t) + O(dt), \tag{3.129}$$
$$M_0 = I + (-iH + K)\, dt, \qquad M_a = \sqrt{dt}\, L_a, \tag{3.130}$$
where $H$ and $K$ are both hermitian and $L_a$, $H$, and $K$ are all zeroth
order in $dt$. In fact, we can determine $K$ by invoking the Kraus-operator
completeness relation; keeping terms up to linear order in $dt$, we find
$$I = \sum_a M_a^\dagger M_a = I + dt \left( 2K + \sum_{a>0} L_a^\dagger L_a \right) + \cdots, \tag{3.131}$$
or
$$K = -\frac{1}{2} \sum_{a>0} L_a^\dagger L_a. \tag{3.132}$$
This is the general Markovian evolution law for quantum states in the
Schrödinger picture, assuming time evolution is a trace-preserving completely
positive linear map. The first term in $\mathcal{L}(\rho)$ is the familiar Hamiltonian
term generating unitary evolution. The other terms describe the possible
transitions that the system may undergo due to interactions with the
environment. The operators $L_a$ are called Lindblad operators or quantum
jump operators. Each $L_a \rho L_a^\dagger$ term induces one of the possible quantum
jumps, while the terms $-\frac{1}{2} L_a^\dagger L_a \rho - \frac{1}{2} \rho L_a^\dagger L_a$ are needed to normalize
properly the case in which no jumps occur.
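The structure of the generator described above (a Hamiltonian commutator, jump terms $L_a \rho L_a^\dagger$, and the normalizing terms) can be written as a short function. This is a sketch, assuming NumPy and units with $\hbar = 1$; the qubit Hamiltonian and the single jump operator $\sqrt{\Gamma}\, |0\rangle\langle 1|$ are illustrative choices, and the crude first-order time stepping mirrors the product formula in eq. (3.128) rather than being a recommended integrator.

```python
import numpy as np

def lindblad_rhs(rho, H, jump_ops):
    """-i[H, rho] + sum_a (L rho L^dagger - 1/2 L^dagger L rho - 1/2 rho L^dagger L)."""
    out = -1j * (H @ rho - rho @ H)
    for L in jump_ops:
        LdL = L.conj().T @ L
        out += L @ rho @ L.conj().T - 0.5 * (LdL @ rho + rho @ LdL)
    return out

gamma, omega = 1.0, 3.0
H = omega * np.array([[0, 0], [0, 1]], dtype=complex)            # |1> is the excited state
L = np.sqrt(gamma) * np.array([[0, 1], [0, 0]], dtype=complex)   # jump |1> -> |0>

rho_init = 0.5 * np.ones((2, 2), dtype=complex)                  # uniform superposition
rho = rho_init.copy()
t_final, n = 2.0, 20000
dt = t_final / n
for _ in range(n):                                               # rho -> (1 + L dt) rho
    rho = rho + dt * lindblad_rhs(rho, H, [L])

generator_trace = abs(np.trace(lindblad_rhs(rho_init, H, [L])))  # should vanish
print(rho[1, 1].real, 0.5 * np.exp(-gamma * t_final))
print(abs(rho[0, 1]), 0.5 * np.exp(-gamma * t_final / 2))
```

The generator is traceless, so the trace of $\rho$ is conserved, and the populations and coherences decay at the rates found for the amplitude-damping channel.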
Alternatively, we can describe the evolution using the Heisenberg
picture. Then instead of eq.(3.129), the density operator is time-independent,
while an operator $A$ evolves according to
$$A(t + dt) = \mathcal{E}^*_{dt}(A(t)) = \sum_a M_a^\dagger A(t) M_a, \tag{3.134}$$
and hence
$$\dot{A} = \mathcal{L}^*(A) = i[H, A] + \sum_{a>0} \left( L_a^\dagger A L_a - \frac{1}{2} L_a^\dagger L_a A - \frac{1}{2} A L_a^\dagger L_a \right). \tag{3.135}$$
Heisenberg-picture time evolution is unital rather than trace preserving;
the identity operator I does not evolve.
As for any nonunitary quantum channel, we have the freedom to redefine
the Kraus operators in the operator-sum representation of $\mathcal{E}_{dt}$, replacing
$\{M_a\}$ by operators $\{N_\mu\}$ which differ by a unitary change of basis.
In particular, invoking this freedom for the jump operators (while leaving
$M_0$ untouched), we may replace $\{L_a\}$ by $\{L'_\mu\}$ where
$$L'_\mu = \sum_a V_{\mu a} L_a \tag{3.136}$$
and $V_{\mu a}$ is a unitary matrix. We say that these two ways of choosing
the jump operators are two different unravelings of the same Markovian
dynamics.
The master equation describes what happens when the system interacts
with an unobserved environment, but we may also consider what happens
if the environment is continuously monitored. In that case each quantum
jump is detected; we update the quantum state of the system whenever
a jump occurs, and an initial pure state remains pure at all later times.
Specifically, a jump of type $a$ occurs during the interval $(t, t + dt)$ with
probability
$$\mathrm{Prob}(a) = dt\, \langle\psi(t)| L_a^\dagger L_a |\psi(t)\rangle, \tag{3.137}$$
and when a type-$a$ jump is detected the updated state is
$$|\psi(t + dt)\rangle = \frac{L_a |\psi(t)\rangle}{\| L_a |\psi(t)\rangle \|}, \tag{3.138}$$
while when no jump occurs the state evolves as
$$|\psi(t + dt)\rangle = \frac{M_0 |\psi(t)\rangle}{\| M_0 |\psi(t)\rangle \|}. \tag{3.139}$$
This stochastic Schrödinger evolution can be numerically simulated; each
simulated quantum trajectory is different, but averaging over a sample of
many such trajectories reproduces the evolution of the density operator as
described by the master equation. Simulating the stochastic Schrödinger
equation may have advantages over simulating the master equation, since
it is less costly to follow the evolution of a d-dimensional state vector than
a d × d density matrix.
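The stochastic evolution defined by eqs. (3.137) through (3.139) can be simulated in a few lines. The sketch below assumes NumPy and a toy model chosen for illustration: a qubit with $H = 0$ and a single jump operator $\sqrt{\Gamma}\, |0\rangle\langle 1|$, starting in a uniform superposition. The step size, the number of trajectories, and the random seed are arbitrary, so the sample average agrees with the master-equation result only up to statistical and discretization error.

```python
import numpy as np

def simulate_trajectory(psi0, L, M0, dt, n_steps, rng):
    """Follow one quantum trajectory with a single jump operator L."""
    psi = psi0.copy()
    for _ in range(n_steps):
        jumped = L @ psi
        p_jump = dt * np.vdot(jumped, jumped).real       # eq. (3.137)
        if rng.random() < p_jump:
            psi = jumped / np.linalg.norm(jumped)        # eq. (3.138)
        else:
            no_jump = M0 @ psi
            psi = no_jump / np.linalg.norm(no_jump)      # eq. (3.139)
    return psi

gamma, dt, n_steps, n_traj = 1.0, 0.01, 100, 500
L = np.sqrt(gamma) * np.array([[0, 1], [0, 0]], dtype=complex)
K = -0.5 * (L.conj().T @ L)                 # eq. (3.132)
M0 = np.eye(2, dtype=complex) + K * dt      # eq. (3.130) with H = 0
psi0 = np.array([1, 1], dtype=complex) / np.sqrt(2)

rng = np.random.default_rng(7)
rho_avg = np.zeros((2, 2), dtype=complex)
for _ in range(n_traj):
    psi = simulate_trajectory(psi0, L, M0, dt, n_steps, rng)
    rho_avg += np.outer(psi, psi.conj()) / n_traj

t = dt * n_steps
rho11_expected = 0.5 * np.exp(-gamma * t)       # population from the master equation
rho01_expected = 0.5 * np.exp(-gamma * t / 2)   # coherence from the master equation
print(rho_avg[1, 1].real, rho11_expected)
print(abs(rho_avg[0, 1]), rho01_expected)
```

Each trajectory stores only a two-component state vector, which is the cost advantage mentioned in the text; the price is the sampling noise visible in the comparison.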
$$H' = \sum_k g_k \left( a b_k^\dagger + a^\dagger b_k \right), \tag{3.140}$$
Here $\Gamma$ is the rate for the oscillator to decay from the first excited ($n = 1$)
state to the ground ($n = 0$) state, which can be computed as $\Gamma = \sum_k \Gamma_k$,
where $\Gamma_k$ is the rate for emission into mode $k$. The rate for the decay
from level n to n−1 is nΓ. (The nth level of excitation of the oscillator
where $e^{i\phi(\alpha,\beta)}$ is a phase factor. Thus the off-diagonal terms decay exponentially
with time, at a rate
$$\Gamma_{\rm decohere} = \frac{1}{2}\, \Gamma\, |\alpha - \beta|^2 \tag{3.153}$$
proportional to $|\alpha - \beta|^2$, the square of the separation of the two coherent
states in phase space; the decoherence rate is much larger than the
damping rate $\Gamma$ for $|\alpha - \beta|^2 \gg 1$. This behavior is easy to interpret.
The expectation value of the occupation number $n$ in a coherent state
is $\langle n \rangle = \langle\alpha| a^\dagger a |\alpha\rangle = |\alpha|^2$. Therefore, if $\alpha$, $\beta$ have comparable moduli
but significantly different phases (as for a superposition of minimum un-
certainty wave packets centered at positions x and −x), the decoherence
rate is comparable to Γhni, the rate for emission of a single photon. This
rate is very large compared to the rate for a significant fraction of the
oscillator energy to be dissipated, if n is large.
We can also consider an oscillator coupled to an environment with a
nonzero temperature. Again, the decoherence rate is roughly the rate for a
single photon to be emitted or absorbed, but the rate may be much faster
than at zero temperature. Because the photon modes with frequency
comparable to the oscillator frequency $\omega$ have a thermal occupation number
$$n_\gamma \approx \frac{T}{\hbar\omega}, \tag{3.154}$$
(for $T \gg \hbar\omega$), the interaction rate is further enhanced by the factor $n_\gamma$.
For an oscillator with energy $E = \hbar\omega\, n_{\rm osc}$, we have
$$\frac{\Gamma_{\rm decohere}}{\Gamma_{\rm damp}} \sim n_{\rm osc}\, n_\gamma \sim \frac{E}{\hbar\omega} \frac{T}{\hbar\omega} \sim \frac{m\omega^2 x^2}{\hbar\omega} \frac{T}{\hbar\omega} \sim x^2\, \frac{mT}{\hbar^2} \sim \frac{x^2}{\lambda_T^2}, \tag{3.155}$$
where $x$ is the amplitude of oscillation and $\lambda_T$ is the thermal de Broglie
wavelength of the oscillating object. For macroscopic objects, decoherence
is really fast.
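The energy eigenstates $|0\rangle\langle 0|$ or $|1\rangle\langle 1|$ are not affected, but using
eq.(3.157) we see that the coefficients of the off-diagonal entries $|0\rangle\langle 1|$
and $|1\rangle\langle 0|$ decay by the factor
$$\exp\left( -\frac{1}{2} \int_0^T dt \int_0^T dt'\, K(t - t') \right) = \exp\left( -\frac{1}{2} \int_0^T dt \int_0^T dt' \int_{-\infty}^{\infty} \frac{d\omega}{2\pi}\, e^{-i\omega(t - t')}\, \tilde{K}(\omega) \right); \tag{3.159}$$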
where $J(t)$ is a modulating function that expresses the effect of the spin
echo pulse sequence. For example, if we flip the spin at time $T/2$, then
$J(t)$ is $+1$ in the interval $[0, T/2]$ and $-1$ in the interval $[T/2, T]$; therefore
$$W_T(\omega) = \frac{1}{\omega^2} \left| 1 - 2 e^{i\omega T/2} + e^{i\omega T} \right|^2 = \frac{1}{\omega^2} \left| \frac{1 - e^{i\omega T/2}}{1 + e^{i\omega T/2}} \right|^2 \left| 1 - e^{i\omega T} \right|^2 = \frac{4}{\omega^2} \tan^2(\omega T/4) \cdot \sin^2(\omega T/2). \tag{3.165}$$
In effect, the spin echo modifies $\tilde{K}(\omega)$ by the multiplicative factor
$\tan^2(\omega T/4)$, which suppresses the low frequency noise.
The suppression can be improved further by using more pulses. In prac-
tice, pulses have bounded strength and nonzero duration, which places
limitations on the effectiveness of this strategy.
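The closed form for the single-pulse window function can be checked numerically. The sketch below assumes that the window function is $W_T(\omega) = |\int_0^T dt\, J(t)\, e^{i\omega t}|^2$ (the defining equation is not reproduced here, so this form is an assumption), and the function names are illustrative only.

```python
import numpy as np

def window_numeric(omega, T, n_steps=20000):
    """Midpoint-rule estimate of |int_0^T J(t) exp(i omega t) dt|^2 for one echo pulse at T/2.

    Assumes the window function is defined by this integral (an assumption of this sketch).
    n_steps is even, so the sign flip of J(t) falls on a cell boundary.
    """
    dt = T / n_steps
    t = (np.arange(n_steps) + 0.5) * dt
    J = np.where(t < T / 2, 1.0, -1.0)  # +1 before the flip, -1 after
    amplitude = np.sum(J * np.exp(1j * omega * t)) * dt
    return abs(amplitude) ** 2

def window_closed_form(omega, T):
    """Right-hand side of eq. (3.165)."""
    return (4.0 / omega**2) * np.tan(omega * T / 4) ** 2 * np.sin(omega * T / 2) ** 2

def window_no_echo(omega, T):
    """Window function without the echo pulse, J(t) = +1 throughout."""
    return (4.0 / omega**2) * np.sin(omega * T / 2) ** 2

if __name__ == "__main__":
    T = 2.0
    for omega in (0.1, 0.3, 1.0, 2.5):
        print(omega, window_numeric(omega, T), window_closed_form(omega, T),
              window_no_echo(omega, T))
```

At small $\omega$ the echo window is smaller than the no-echo window by the factor $\tan^2(\omega T/4)$, which is the low-frequency suppression described above.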
now the center of the window function has been shifted to the frequency
ω01 of the transition.
As before, if we consider the observation time T to be large compared
to the autocorrelation time τc of the noise, then the support of the window
function is narrow, and K̃(ω) is approximately constant in the window.
Thus, after a suitable coarse-graining of the time evolution, we may iden-
tify a rate for the decay of the qubit
Γ↓ = K̃(ω = ω01 ). (3.168)
Similarly, for the transition from ground state to excited state, we find
Γ↑ = K̃(ω = −ω01 ). (3.169)
Thus negative frequency noise transfers energy from the noise reservoir
to the qubit, exciting the qubit, while positive frequency noise transfers
energy from qubit to the noise reservoir, returning the excited qubit to
the ground state. (Dephasing of a qubit, in contrast, involves a negligible
exchange of energy and therefore is controlled by low frequency noise.) We
conclude that an experimentalist capable of varying the energy splitting
ω01 and measuring the qubit’s transition rate can determine how the noise
power depends on the frequency.
For the case we have considered in which the noise source is classical,
$f(t)$ and $f(t')$ are real commuting variables; therefore $K(t)$ is an even
function of t and correspondingly K̃(ω) is an even function of ω. Classical
noise is spectrally symmetric, and the rates for excitation and decay are
equal.
On the other hand, noise driven by a quantized thermal “bath” can
be spectrally asymmetric. When the qubit comes to thermal equilibrium
with the bath, up and down transitions occur at equal rates. If $p_0$ denotes
the probability that the qubit is in the ground state $|0\rangle$ and $p_1$ denotes the
probability that the qubit is in the excited state $|1\rangle$, then in equilibrium
\[ p_0\Gamma_\uparrow = p_1\Gamma_\downarrow \;\Rightarrow\; \frac{\tilde K(-\omega_{01})}{\tilde K(\omega_{01})} = \frac{p_1}{p_0} = e^{-\beta\omega_{01}}; \tag{3.170} \]
quantized bath. The noise will still be Gaussian if the bath is a system of
harmonic oscillators, uncoupled to one another and each coupled linearly
to the dephasing qubit. The Hamiltonian for the system A and bath B is
\[ H_A + H_B + H_{AB} = -\frac{1}{2}\omega_{01}\sigma_3 + \sum_k \omega_k a_k^\dagger a_k - \frac{1}{2}\sigma_3 \sum_k \left(g_k a_k^\dagger + g_k^* a_k\right), \tag{3.171} \]
which is called the spin-boson model, as it describes a single spin-$\frac{1}{2}$ particle
coupled to many bosonic variables. This is a model of dephasing
because the coupling of the spin to the bath is diagonal in the spin’s
energy eigenstate basis. (Otherwise the physics of the model would be
harder to analyze.) Despite its simplicity, the spin-boson model provides
a reasonably realistic description of dephasing for a qubit weakly coupled
to many degrees of freedom in the environment.
If there are many oscillators, the sum over k can be approximated by
a frequency integral:
\[ \sum_k |g_k|^2 \approx \int_0^\infty d\omega\, J(\omega), \tag{3.172} \]
where J(ω) is said to be the spectral function of the oscillator bath. Let’s
assume that the bath is in thermal equilibrium at temperature $\beta^{-1}$. In
principle, the coupling to the system could tweak the equilibrium distri-
bution of the bath, but we assume that this effect is negligible, because
the bath is much bigger than the system. The fluctuations of the bath
are Gaussian, and the average over the ensemble of classical functions in
our previous analysis can be replaced by the thermal expectation value:
\[ [f(t)f(0)] \mapsto \langle f(t)f(0)\rangle_\beta \equiv \mathrm{tr}\left(e^{-\beta H_B} f(t)f(0)\right), \tag{3.173} \]
We see that
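\[ K_\beta(t) \equiv \langle f(t)f(0)\rangle_\beta = \sum_k |g_k|^2 \left\langle e^{-i\omega_k t} a_k a_k^\dagger + e^{i\omega_k t} a_k^\dagger a_k \right\rangle_\beta . \tag{3.175} \]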
assuming that this limit exists. The noise is said to be Ohmic if $J(\omega) \approx A\omega$
is linear in $\omega$ at low frequency, and in that case the dephasing rate
becomes $\Gamma_2 = 2\pi A\beta^{-1}$ in the limit of long time $T$.
3.7 Summary
POVM. If we restrict our attention to a subspace of a larger Hilbert
space, then an orthogonal (Von Neumann) measurement performed on the
larger space cannot in general be described as an orthogonal measurement
48 3 Foundations II: Measurement and Evolution
where
\[ \sum_a M_a^\dagger M_a = I. \tag{3.185} \]
In fact, any reasonable (linear, trace preserving, and completely positive)
mapping of density operators to density operators has such an operator-
sum representation.
Decoherence. Decoherence — the decay of quantum information due
to the interaction of a system with its environment — can be described
by a quantum channel. If the environment frequently “scatters” off the
system, and the state of the environment is not monitored, then off-
diagonal terms in the density operator of the system decay rapidly in a
preferred basis (typically a spatially localized basis selected by the nature
of the coupling of the system to the environment). The time scale for
decoherence is set by the scattering rate, which may be much larger than
the damping rate for the system.
Master Equation. When the relevant dynamical time scale of an
open quantum system is long compared to the time for the environment
to “forget” quantum information, the evolution of the system is effectively
local in time (the Markovian approximation). Much as general unitary
evolution is generated by a Hamiltonian, a general Markovian superoper-
ator is generated by a Liouvillian L as described by the master equation:
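\[ \dot\rho \equiv \mathcal{L}(\rho) = -i[H,\rho] + \sum_a \left( L_a \rho L_a^\dagger - \frac{1}{2} L_a^\dagger L_a \rho - \frac{1}{2}\rho L_a^\dagger L_a \right). \tag{3.186} \]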
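As an illustration, the right-hand side of the master equation can be evaluated directly for small matrices. The following is a minimal sketch; the function name `lindblad_rhs` and the example qubit (Hamiltonian $-\frac{1}{2}\omega\sigma_3$ with a single decay operator $\sqrt{\gamma}\,|0\rangle\langle 1|$) are choices made here for illustration, not taken from the text.

```python
import numpy as np

def lindblad_rhs(rho, H, jump_ops):
    """Evaluate -i[H, rho] + sum_a (L rho L^dag - (1/2){L^dag L, rho})."""
    drho = -1j * (H @ rho - rho @ H)
    for L in jump_ops:
        LdL = L.conj().T @ L
        drho = drho + L @ rho @ L.conj().T - 0.5 * (LdL @ rho + rho @ LdL)
    return drho

# Hypothetical example: a qubit whose excited state |1> decays to |0> at rate gamma.
omega, gamma = 1.0, 0.2
sigma3 = np.diag([1.0, -1.0]).astype(complex)
H = -0.5 * omega * sigma3
lowering = np.sqrt(gamma) * np.array([[0, 1], [0, 0]], dtype=complex)  # sqrt(gamma)|0><1|
rho = np.array([[0.3, 0.2 + 0.1j], [0.2 - 0.1j, 0.7]], dtype=complex)

drho = lindblad_rhs(rho, H, [lowering])
print(np.trace(drho))   # the generator preserves the trace
print(drho[1, 1].real)  # rate of change of the excited-state population
```

The generator preserves both the trace and the Hermiticity of $\rho$, which is what one expects of a map that takes density operators to density operators.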
3.8 Exercises
3.1 Which state did Alice make?
Consider a game in which Alice prepares one of two possible states:
either ρ1 with a priori probability p1 , or ρ2 with a priori probability
p2 = 1 − p1 . Bob is to perform a measurement and on the basis of
the outcome to guess which state Alice prepared. If Bob’s guess is
right, he wins; if he guesses wrong, Alice wins.
In this exercise you will find Bob’s best strategy, and determine his
optimal probability of error.
Let’s suppose (for now) that Bob performs a POVM with two pos-
sible outcomes, corresponding to the two nonnegative Hermitian
operators E 1 and E 2 = I − E 1 . If Bob’s outcome is E 1 , he guesses
that Alice’s state was ρ1 , and if it is E 2 , he guesses ρ2 . Then the
probability that Bob guesses wrong is
a) Show that
\[ p_{\rm error} = p_1 + \sum_i \lambda_i \langle i|E_1|i\rangle, \tag{3.188} \]
where 0 < α < π/4. Suppose that Alice decides at random to send
either $|u\rangle$ or $|v\rangle$ to Bob, and Bob is to make a measurement to
determine what she sent. Since the two states are not orthogonal,
Bob cannot distinguish the states perfectly.
a) Bob realizes that he can’t expect to be able to identify Alice’s
qubit every time, so he settles for a procedure that is successful
only some of the time. He performs a POVM with three pos-
sible outcomes: ¬u, ¬v, or DON’T KNOW. If he obtains the
result ¬u, he is certain that $|v\rangle$ was sent, and if he obtains ¬v,
he is certain that $|u\rangle$ was sent. If the result is DON’T KNOW,
then his measurement is inconclusive. This POVM is defined
by the operators
But now let’s suppose that Eve wants to eavesdrop on the state as it
travels from Alice to Bob. Like Bob, she wishes to extract optimal
information that distinguishes $|\psi\rangle$ from $|\tilde\psi\rangle$, and she also wants to
minimize the disturbance introduced by her eavesdropping, so that
Alice and Bob are not likely to notice that anything is amiss.
Eve realizes that the optimal POVM can be achieved by measure-
ment operators
where the vectors $|\phi_0\rangle$ and $|\phi_1\rangle$ are arbitrary. If Eve performs this
measurement, then Bob receives the state
where
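\[ A = \begin{pmatrix} 1 - 2\cos^2\alpha\sin^2\alpha & \cos\alpha\sin\alpha \\ \cos\alpha\sin\alpha & 2\cos^2\alpha\sin^2\alpha \end{pmatrix}, \qquad B = \begin{pmatrix} 2\cos^2\alpha\sin^2\alpha & \cos\alpha\sin\alpha \\ \cos\alpha\sin\alpha & 1 - 2\cos^2\alpha\sin^2\alpha \end{pmatrix}. \tag{3.210} \]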
b) Show that if $|\phi_0\rangle$ and $|\phi_1\rangle$ are chosen optimally, the minimal
disturbance that can be attained is
\[ D_{\min}(\cos^2\theta) = \frac{1}{2}\left(1 - \sqrt{1 - \cos^2\theta + \cos^4\theta}\right). \tag{3.211} \]
[Hint: We can choose $|\phi_0\rangle$ and $|\phi_1\rangle$ to maximize the two terms
in eq. (3.209) independently. The maximal value is the maximal
eigenvalue of $A$, which, since the eigenvalues sum to 1, can
be expressed as $\lambda_{\max} = \frac{1}{2}\left(1 + \sqrt{1 - 4\cdot\det A}\right)$.] Of course,
Eve could reduce the disturbance further were she willing to
settle for a less than optimal probability of guessing Alice’s
state correctly.
c) Sketch a plot of the function $D_{\min}(\cos^2\theta)$. Interpret its value for
$\cos\theta = 1$ and $\cos\theta = 0$. For what value of $\theta$ is $D_{\min}$ largest?
Find $D_{\min}$ and $(p_{\rm error})_{\rm optimal}$ for this value of $\theta$.
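A quick numerical check of the hint in part (b) may help before attempting the algebra. The sketch below compares the largest eigenvalue of the matrix $A$ of eq. (3.210) with the trace-and-determinant formula, and then compares $1 - \lambda_{\max}$ with eq. (3.211). Two things are assumed here rather than taken from the text: that the overlap of the two states is $\cos\theta = \sin 2\alpha$, and that the minimal disturbance equals $1 - \lambda_{\max}$. The function names are illustrative.

```python
import numpy as np

def matrix_A(alpha):
    """The matrix A of eq. (3.210)."""
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([[1 - 2 * c**2 * s**2, c * s],
                     [c * s, 2 * c**2 * s**2]])

def lambda_max_formula(alpha):
    """Largest eigenvalue of a 2x2 matrix with unit trace, from its determinant."""
    return 0.5 * (1 + np.sqrt(1 - 4 * np.linalg.det(matrix_A(alpha))))

def d_min(cos_theta):
    """Right-hand side of eq. (3.211)."""
    return 0.5 * (1 - np.sqrt(1 - cos_theta**2 + cos_theta**4))

if __name__ == "__main__":
    for alpha in np.linspace(0.05, np.pi / 4 - 0.05, 5):
        lam_numeric = np.linalg.eigvalsh(matrix_A(alpha)).max()
        # Assumption of this sketch: cos(theta) = sin(2 alpha).
        print(alpha, lam_numeric, lambda_max_formula(alpha),
              1 - lam_numeric, d_min(np.sin(2 * alpha)))
```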
a) Since Eve does not know the secret key, to her the encrypted
state is indistinguishable from
\[ \mathcal{E}(\rho) = \frac{1}{2^{2n}} \sum_x \sigma(x)\,\rho\,\sigma(x). \tag{3.213} \]
where
\[ \sum_a M_a^\dagger M_a = I. \tag{3.217} \]
E(ρ) ≺ ρ . (3.219)
It turns out that the minimal overlap that can be achieved by any
POVM is related to the fidelity $F(\rho,\tilde\rho) = \left\| \tilde\rho^{1/2}\rho^{1/2} \right\|_1^2$:
\[ \min_{\{E_i\}} \left[\mathrm{Overlap}(\rho,\tilde\rho;\{E_i\})\right] = \sqrt{F(\rho,\tilde\rho)}. \tag{3.224} \]
In this exercise, you will show that the square root of the fidelity
is a lower bound on the overlap, but not that the bound can be
saturated.
b) The space of linear operators acting on a Hilbert space is itself a
Hilbert space, where the inner product (A, B) of two operators
A and B is
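\[ (A, B) \equiv \mathrm{tr}\, A^\dagger B. \tag{3.225} \]
For this inner product, the Schwarz inequality becomes
\[ |\mathrm{tr}\, A^\dagger B| \le \left(\mathrm{tr}\, A^\dagger A\right)^{1/2} \left(\mathrm{tr}\, B^\dagger B\right)^{1/2}. \tag{3.226} \]
Choosing $A = \rho^{1/2} E_i^{1/2}$ and $B = U\tilde\rho^{1/2} E_i^{1/2}$ (for an arbitrary unitary
$U$), use this form of the Schwarz inequality to show that
\[ \mathrm{Overlap}(\rho,\tilde\rho;\{E_i\}) \ge |\mathrm{tr}\, \rho^{1/2} U \tilde\rho^{1/2}|. \tag{3.227} \]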
c) Now use the polar decomposition
\[ A = V\sqrt{A^\dagger A} \tag{3.228} \]
(where $V$ is unitary) to write
\[ \tilde\rho^{1/2}\rho^{1/2} = V\sqrt{\rho^{1/2}\,\tilde\rho\,\rho^{1/2}}, \tag{3.229} \]
and by choosing the unitary $U$ in eq. (3.227) to be $U = V^{-1}$,
show that
\[ \mathrm{Overlap}(\rho,\tilde\rho;\{E_i\}) \ge \sqrt{F(\rho,\tilde\rho)}. \tag{3.230} \]
d) We can obtain an explicit formula for the fidelity in the case of
two states of a single qubit. Using the Bloch parametrization
\[ \rho(\vec P) = \frac{1}{2}\left(I + \vec\sigma\cdot\vec P\right), \tag{3.231} \]
show that the fidelity of two single-qubit states with polarization
vectors $\vec P$ and $\vec Q$ is
\[ F(\vec P, \vec Q) = \frac{1}{2}\left(1 + \vec P\cdot\vec Q + \sqrt{(1 - \vec P^{\,2})(1 - \vec Q^{\,2})}\right). \tag{3.232} \]
Hint: First note that the eigenvalues of a 2 × 2 matrix can be
expressed in terms of the trace and determinant of the matrix.
Then evaluate the determinant and trace of $\rho^{1/2}\,\tilde\rho\,\rho^{1/2}$, and
calculate the fidelity using the corresponding expression for
the eigenvalues.
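Before working through part (d) analytically, it can be reassuring to test eq. (3.232) numerically. The sketch below evaluates the fidelity as $\left(\mathrm{tr}\sqrt{\rho^{1/2}\tilde\rho\,\rho^{1/2}}\right)^2$, which is the squared trace norm of $\tilde\rho^{1/2}\rho^{1/2}$, and compares it with the Bloch-vector formula for randomly chosen mixed states. The helper names are illustrative and the random sampling is an arbitrary choice.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PAULIS = [np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]], dtype=complex),
          np.array([[1, 0], [0, -1]], dtype=complex)]

def bloch_state(P):
    """Density matrix (I + sigma . P) / 2 for a polarization vector P with |P| <= 1."""
    return 0.5 * (I2 + sum(p * s for p, s in zip(P, PAULIS)))

def psd_sqrt(M):
    """Square root of a positive semidefinite Hermitian matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(M)
    w = np.clip(w, 0.0, None)
    return (v * np.sqrt(w)) @ v.conj().T

def fidelity(rho, rho_tilde):
    """F = (tr sqrt(rho^{1/2} rho_tilde rho^{1/2}))^2."""
    s = psd_sqrt(rho)
    eigenvalues = np.clip(np.linalg.eigvalsh(s @ rho_tilde @ s), 0.0, None)
    return np.sum(np.sqrt(eigenvalues)) ** 2

def fidelity_bloch(P, Q):
    """Right-hand side of eq. (3.232)."""
    P, Q = np.asarray(P), np.asarray(Q)
    return 0.5 * (1 + P @ Q + np.sqrt((1 - P @ P) * (1 - Q @ Q)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for _ in range(5):
        P = rng.uniform(-0.5, 0.5, 3)
        Q = rng.uniform(-0.5, 0.5, 3)
        print(fidelity(bloch_state(P), bloch_state(Q)), fidelity_bloch(P, Q))
```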
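Because the states $\{|0\rangle, |1\rangle\}$ of the electromagnetic field are orthogonal,
the quantum state of the oscillator may decohere. Summing
over these basis states, we see that the initial pure state $|\psi\rangle\langle\psi|$ of
the oscillator evolves in time $dt$ as
\[ |\psi\rangle\langle\psi| \mapsto \langle 0|\Psi(dt)\rangle\langle\Psi(dt)|0\rangle + \langle 1|\Psi(dt)\rangle\langle\Psi(dt)|1\rangle = \Gamma dt\, a|\psi\rangle\langle\psi|a^\dagger + \left(I - \frac{1}{2}\Gamma dt\, a^\dagger a\right)|\psi\rangle\langle\psi|\left(I - \frac{1}{2}\Gamma dt\, a^\dagger a\right); \]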
Note that there are two things to check in eq.(3.240): that the
value of α decays with time, and that the normalization of the
state decays with time.
c) Verify that, also to linear order in dt,
We already know from part (c) how the “diagonal” terms $|\alpha\rangle\langle\alpha|$ and
$|\beta\rangle\langle\beta|$ evolve, but what about the “off-diagonal” terms $|\alpha\rangle\langle\beta|$ and
$|\beta\rangle\langle\alpha|$?
e) Using arguments similar to those used in parts (b) and (c), show
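that in time $t$, the operator $|\alpha\rangle\langle\beta|$ evolves as
\[ |\alpha\rangle\langle\beta| \mapsto e^{i\phi(\alpha,\beta)}\, e^{-\Gamma t|\alpha-\beta|^2/2}\, |\alpha e^{-\Gamma t/2}\rangle\langle\beta e^{-\Gamma t/2}|, \tag{3.246} \]
and find the phase factor $e^{i\phi(\alpha,\beta)}$. Thus the off-diagonal terms
decay exponentially with time, at a rate
\[ \Gamma_{\rm decohere} = \frac{1}{2}\Gamma|\alpha-\beta|^2 \tag{3.247} \]
proportional to the distance squared $|\alpha-\beta|^2$.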
f) Consider an oscillator with mass $m = 1$ g, circular frequency
$\omega = 1\ \mathrm{s}^{-1}$ and (very good) quality factor $Q \equiv \omega/\Gamma = 10^9$. Thus
the damping time is very long: over 30 years. A superposition
of minimum uncertainty wavepackets is prepared, centered at
positions x = ±1 cm. Estimate the decoherence rate. (Wow!
For macroscopic objects, decoherence is really fast!)
where the $E_{\mu\nu}$’s are complex numbers satisfying $E_{\mu\nu} = E_{\nu\mu}^*$.
Hint: The operation E has an operator-sum representation
with operation elements $\{M_a\}$. Each $M_a$ can be expanded in
the basis $\{\sigma_\mu,\ \mu = 0, 1, 2, 3\}$.
b) Find four independent conditions that must be satisfied by the
Eµν ’s in order that the operation E be trace-preserving (a chan-
nel).
c) A Hermitian 2 × 2 operator can be expressed as
\[ \rho(P) = \frac{1}{2}\sum_{\mu=0}^{3} P_\mu \sigma_\mu, \tag{3.249} \]
\[ \vec P\,' = M\vec P + \vec v, \tag{3.251} \]
a) Show that
\[ \int d^2\alpha\, E_\alpha = I, \tag{3.253} \]
where
\[ E_\alpha = \frac{1}{\pi}|\alpha\rangle\langle\alpha|. \tag{3.254} \]
Hint: Evaluate matrix elements of both sides of the equation
between coherent states.
b) Since the E α ’s provide a partition of unity, they define a POVM
(an “ideal heterodyne measurement” of the oscillator). Suppose
that a coherent state $|\beta\rangle$ is prepared, and that an ideal
heterodyne measurement is performed, so that the coherent
state $|\alpha\rangle$ is obtained with probability distribution
$P(\alpha)\, d^2\alpha = \langle\beta|E_\alpha|\beta\rangle\, d^2\alpha$. With what fidelity does the measurement
outcome $|\alpha\rangle$ approximate the initial coherent state $|\beta\rangle$, averaged
over the possible outcomes?
John Preskill
California Institute of Technology
November 2, 2001
Contents
4 Quantum Entanglement
4.1 Nonseparability of EPR pairs
4.1.1 Hidden quantum information
4.1.2 Einstein locality and hidden variables
4.2 The Bell inequality
4.2.1 Three quantum coins
4.2.2 Quantum entanglement vs. Einstein locality
4.3 More Bell inequalities
4.3.1 CHSH inequality
4.3.2 Maximal violation
4.3.3 Quantum strategies outperform classical strategies
4.3.4 All entangled pure states violate Bell inequalities
4.3.5 Photons
4.3.6 Experiments and loopholes
4.4 Using entanglement
4.4.1 Dense coding
4.4.2 Quantum teleportation
4.4.3 Quantum teleportation and maximal entanglement
4.4.4 Quantum software
4.5 Quantum cryptography
4.5.1 EPR quantum key distribution
4.5.2 No cloning
4.6 Mixed-state entanglement
4.6.1 Positive-partial-transpose criterion for separability
4.7 Nonlocality without entanglement
4.8 Multipartite entanglement
4.8.1 Three quantum boxes
4.8.2 Cat states
4.8.3 Entanglement-enhanced communication
\[ |\phi^+\rangle_{AB} = \frac{1}{\sqrt{2}}\left(|00\rangle_{AB} + |11\rangle_{AB}\right). \tag{4.1} \]
“Maximally entangled” means that when we trace over qubit B to find
the density operator ρA of qubit A, we obtain a multiple of the identity
operator
\[ \rho_A = \mathrm{tr}_B\left(|\phi^+\rangle\langle\phi^+|\right) = \frac{1}{2} I_A, \tag{4.2} \]
(and similarly $\rho_B = \frac{1}{2} I_B$). This means that if we measure spin A along
any axis, the result is completely random — we find spin up with proba-
bility 1/2 and spin down with probability 1/2. Therefore, if we perform
any local measurement of A or B, we acquire no information about the
preparation of the state; instead we merely generate a random bit. This
situation contrasts sharply with the case of a single qubit in a pure state;
there we can store a bit by preparing, say, either $|\uparrow_{\hat n}\rangle$ or $|\downarrow_{\hat n}\rangle$, and we
can recover that bit reliably by measuring along the $\hat n$-axis. With two
4.1 Nonseparability of EPR pairs 5
qubits, we ought to be able to store two bits, but in the state $|\phi^+\rangle_{AB}$ this
information is hidden; at least, we can’t acquire it by measuring A or B.
In fact, $|\phi^+\rangle$ is one member of a basis of four mutually orthogonal states
for the two qubits, all of which are maximally entangled: the basis
\[ |\phi^\pm\rangle = \frac{1}{\sqrt{2}}\left(|00\rangle \pm |11\rangle\right), \qquad |\psi^\pm\rangle = \frac{1}{\sqrt{2}}\left(|01\rangle \pm |10\rangle\right), \tag{4.3} \]
introduced in §3.4.1. Imagine that Alice and Bob play a game with Charlie.
Charlie prepares one of these four states, thus encoding two bits in
the state of the two-qubit system. One bit is the parity bit ($|\phi\rangle$ or $|\psi\rangle$):
are the two spins aligned or antialigned? The other is the phase bit ($+$ or
$-$): what superposition was chosen of the two states of like parity? Then
Charlie sends qubit A to Alice and qubit B to Bob. To win the game,
Alice (or Bob) has to identify which of the four states Charlie prepared.
Of course, if Alice and Bob bring their qubits together, they can iden-
tify the state by performing an orthogonal measurement that projects
onto the $\{|\phi^+\rangle, |\phi^-\rangle, |\psi^+\rangle, |\psi^-\rangle\}$ basis. But suppose that Alice and Bob
are in different cities, and that they are unable to communicate at all.
Acting locally, neither Alice nor Bob can collect any information about
the identity of the state.
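The claim that local measurements reveal nothing can be made concrete by computing the reduced density operators of all four states. In the sketch below the basis ordering $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ (Alice's qubit first) and the helper names are conventions chosen here for illustration.

```python
import numpy as np

def bell_states():
    """The four states of eq. (4.3) as vectors in the basis |00>, |01>, |10>, |11>."""
    s = 1 / np.sqrt(2)
    k00, k01, k10, k11 = np.eye(4)
    return {"phi+": s * (k00 + k11), "phi-": s * (k00 - k11),
            "psi+": s * (k01 + k10), "psi-": s * (k01 - k10)}

def reduced_A(psi):
    """Density operator of qubit A after tracing over qubit B."""
    m = psi.reshape(2, 2)      # m[a, b] is the amplitude of |a>_A |b>_B
    return m @ m.conj().T

def reduced_B(psi):
    """Density operator of qubit B after tracing over qubit A."""
    m = psi.reshape(2, 2)
    return m.T @ m.conj()

if __name__ == "__main__":
    for name, psi in bell_states().items():
        print(name)
        print(reduced_A(psi))
        print(reduced_B(psi))
```

All four states give the same reduced density operators for both qubits, so no local measurement statistics can distinguish them. A product state such as $|00\rangle$, by contrast, has a pure reduced state.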
What they can do locally is manipulate this information. Alice may
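apply $\sigma_3$ to qubit A, flipping the relative phase of $|0\rangle_A$ and $|1\rangle_A$. This
action flips the phase bit stored in the entangled state:
\[ |\phi^+\rangle \leftrightarrow |\phi^-\rangle, \qquad |\psi^+\rangle \leftrightarrow |\psi^-\rangle. \tag{4.4} \]
On the other hand, she can apply $\sigma_1$, which flips her spin ($|0\rangle_A \leftrightarrow |1\rangle_A$),
and also flips the parity bit of the entangled state:
\[ |\phi^+\rangle \leftrightarrow |\psi^+\rangle, \qquad |\phi^-\rangle \leftrightarrow -|\psi^-\rangle. \tag{4.5} \]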
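These transformation rules, including the minus sign in the last one, are easy to confirm with explicit matrices. The sketch below uses the basis ordering $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ with Alice's qubit first; the constant names are illustrative.

```python
import numpy as np

S = 1 / np.sqrt(2)
K00, K01, K10, K11 = np.eye(4)
PHI_PLUS, PHI_MINUS = S * (K00 + K11), S * (K00 - K11)
PSI_PLUS, PSI_MINUS = S * (K01 + K10), S * (K01 - K10)

I2 = np.eye(2)
SIGMA_1 = np.array([[0.0, 1.0], [1.0, 0.0]])
SIGMA_3 = np.array([[1.0, 0.0], [0.0, -1.0]])

# Operators acting on Alice's qubit only.
SIGMA_3_A = np.kron(SIGMA_3, I2)
SIGMA_1_A = np.kron(SIGMA_1, I2)

if __name__ == "__main__":
    print(np.allclose(SIGMA_3_A @ PHI_PLUS, PHI_MINUS))   # phase bit flipped
    print(np.allclose(SIGMA_3_A @ PSI_PLUS, PSI_MINUS))
    print(np.allclose(SIGMA_1_A @ PHI_PLUS, PSI_PLUS))    # parity bit flipped
    print(np.allclose(SIGMA_1_A @ PHI_MINUS, -PSI_MINUS))
```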
Bob can manipulate the entangled state similarly. In fact, as we discussed
in §2.4, either Alice or Bob can perform a local unitary transformation
that changes one maximally entangled state to any other maximally en-
tangled state.∗ What their local unitary transformations cannot do is alter
∗
But of course, this does not suffice to perform an arbitrary unitary transformation on
the four-dimensional space HA ⊗ HB , which contains states that are not maximally
entangled. The maximally entangled states are not a subspace — a superposition of
maximally entangled states typically is not maximally entangled.
6 4 Quantum Entanglement
the eigenvalue of $\sigma_3^{(A)} \otimes \sigma_3^{(B)}$ is the parity bit, and the eigenvalue of
$\sigma_1^{(A)} \otimes \sigma_1^{(B)}$ is the phase bit. Since these operators commute, they can in
principle be measured simultaneously. But they cannot be measured simultaneously
if Alice and Bob perform localized measurements. Alice and
Bob could both choose to measure their spins along the z-axis, preparing
a simultaneous eigenstate of $\sigma_3^{(A)}$ and $\sigma_3^{(B)}$. Since $\sigma_3^{(A)}$ and $\sigma_3^{(B)}$ both
commute with the parity operator $\sigma_3^{(A)} \otimes \sigma_3^{(B)}$, their orthogonal measurements
do not disturb the parity bit, and they can combine their results
to infer the parity bit. But $\sigma_3^{(A)}$ and $\sigma_3^{(B)}$ do not commute with the phase
operator $\sigma_1^{(A)} \otimes \sigma_1^{(B)}$, so their measurement disturbs the phase bit. On
the other hand, they could both choose to measure their spins along the
x-axis; then they would learn the phase bit at the cost of disturbing the
parity bit. But they can’t have it both ways. To have hope of acquiring
the parity bit without disturbing the phase bit, they would need to learn
about the product $\sigma_3^{(A)} \otimes \sigma_3^{(B)}$ without finding out anything about $\sigma_3^{(A)}$
and $\sigma_3^{(B)}$ separately. That cannot be done locally.
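The commutation relations behind this argument can be verified directly. In the sketch below the operator names (`PARITY`, `PHASE`, and so on) are labels chosen here for illustration.

```python
import numpy as np

I2 = np.eye(2)
SIGMA_1 = np.array([[0.0, 1.0], [1.0, 0.0]])
SIGMA_3 = np.array([[1.0, 0.0], [0.0, -1.0]])

def commutator(X, Y):
    return X @ Y - Y @ X

PARITY = np.kron(SIGMA_3, SIGMA_3)   # sigma_3^(A) tensor sigma_3^(B)
PHASE = np.kron(SIGMA_1, SIGMA_1)    # sigma_1^(A) tensor sigma_1^(B)
SIGMA_3_A = np.kron(SIGMA_3, I2)     # local observable of Alice
SIGMA_3_B = np.kron(I2, SIGMA_3)     # local observable of Bob

if __name__ == "__main__":
    print(np.allclose(commutator(PARITY, PHASE), 0))      # joint observables commute
    print(np.allclose(commutator(SIGMA_3_A, PARITY), 0))  # local z measurement keeps parity
    print(np.allclose(commutator(SIGMA_3_A, PHASE), 0))   # but it disturbs the phase bit
```

The two joint observables commute with each other, yet each local $\sigma_3$ fails to commute with the phase operator, which is why the local z-axis measurements spoil the phase bit.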
Now let us bring Alice and Bob together, so that they can operate on
their qubits jointly. How might they acquire both the parity bit and the
phase bit of their pair? By applying an appropriate unitary transformation,
they can rotate the entangled basis $\{|\phi^\pm\rangle, |\psi^\pm\rangle\}$ to the unentangled
basis $\{|00\rangle, |01\rangle, |10\rangle, |11\rangle\}$. Then they can measure qubits A and
B separately to acquire the bits they seek. How is this transformation
constructed?
This is a good time to introduce notation that will be used heavily
in later chapters, the quantum circuit notation. Qubits are denoted by
horizontal lines, and the single-qubit unitary transformation U is denoted:
[Figure: circuit diagram of a single qubit line passing through the gate U.]
[Figure: circuit diagram of the CNOT gate; the target line carries b in and a ⊕ b out.]
Thus this transformation flips the second bit if the first is 1, and acts
trivially if the first bit is 0; it has the property
\[ (\mathrm{CNOT})^2 = I \otimes I. \tag{4.12} \]
We call a the control (or source) bit of the CNOT, and b the target bit.
By composing these “primitive” transformations, or quantum gates, we
can build other unitary transformations. For example, the “circuit”
[Figure: circuit diagram composed of a Hadamard gate H and a CNOT gate.]
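A small matrix computation shows how composing gates rotates between the unentangled and entangled bases. The sketch below assumes the circuit consists of a Hadamard gate on the first qubit followed by a CNOT with the first qubit as control; that reading of the diagram, the basis ordering $|00\rangle, |01\rangle, |10\rangle, |11\rangle$, and the constant names are assumptions made here.

```python
import numpy as np

S = 1 / np.sqrt(2)
I2 = np.eye(2)
HADAMARD = S * np.array([[1.0, 1.0], [1.0, -1.0]])
CNOT = np.array([[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0, 1.0],
                 [0.0, 0.0, 1.0, 0.0]])   # flips the second bit if the first is 1

# Assumed circuit: H on the first qubit, then CNOT (operators compose right to left).
CIRCUIT = CNOT @ np.kron(HADAMARD, I2)

K00, K01, K10, K11 = np.eye(4)
PHI_PLUS, PHI_MINUS = S * (K00 + K11), S * (K00 - K11)
PSI_PLUS, PSI_MINUS = S * (K01 + K10), S * (K01 - K10)

if __name__ == "__main__":
    print(np.allclose(CNOT @ CNOT, np.eye(4)))            # eq. (4.12)
    print(np.allclose(CIRCUIT @ K00, PHI_PLUS))
    print(np.allclose(CIRCUIT @ K01, PSI_PLUS))
    print(np.allclose(CIRCUIT @ K10, PHI_MINUS))
    print(np.allclose(CIRCUIT @ K11, PSI_MINUS))
    # Running the circuit backwards rotates the entangled basis to the unentangled one.
    print(np.allclose(CIRCUIT.T @ PSI_MINUS, K11))
```

Because the circuit is unitary, its inverse maps each of the four entangled states back to a distinct product state, after which the two qubits can be measured separately.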
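\[ \begin{aligned} &|\uparrow_{\hat n}\rangle, && \text{for } 0 \le \lambda \le \cos^2\frac{\theta}{2}, \\ &|\downarrow_{\hat n}\rangle, && \text{for } \cos^2\frac{\theta}{2} < \lambda \le 1. \end{aligned} \tag{4.14} \]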
coins, after all! Let’s say you would like to uncover coin 1 and coin
2. Well, I’ll uncover my coin 2 here in Chicago, and I’ll call you to
tell you what I found, let’s say it’s an H. We know, then, that you
are certain to find an H if you uncover your coin 2 also. There’s
absolutely no doubt about that, because we’ve checked it a million
times. Right?
Alice: Right . . .
Bob: But now there’s no reason for you to uncover your coin 2; you
know what you’ll find anyway. You can uncover coin 1 instead.
And then you’ll know the value of both coins.
Alice: Hmmm . . . yeah, maybe. But we won’t be sure, will we? I mean,
yes, it always worked when we uncovered the same coin before,
but this time you uncovered your coin 2, and your coins 1 and 3
disappeared, and I uncovered my coin 1, and my coins 2 and 3
disappeared. There’s no way we’ll ever be able to check anymore
what would have happened if we had both uncovered coin 2.
Bob: We don’t have to check that anymore, Alice; we’ve already checked
it a million times. Look, your coins are in Pasadena and mine are in
Chicago. Clearly, there’s just no way that my decision to uncover
my coin 2 can have any influence on what you’ll find when you
uncover your coin 2. That’s not what’s happening. It’s just that
when I uncover my coin 2 we’re collecting the information we need
to predict with certainty what will happen when you uncover your
coin 2. Since we’re already certain about it, why bother to do it!
Alice: Okay, Bob, I see what you mean. Why don’t we do an experiment
to see what really happens when you and I uncover different coins?
Bob: I don’t know, Alice. We’re not likely to get any funding to do
such a dopey experiment. I mean, does anybody really care what
happens when I uncover coin 2 and you uncover coin 1?
Alice: I’m not sure, Bob. But I’ve heard about a theorist named Bell.
They say that he has some interesting ideas about the coins. He
might have a theory that makes a prediction about what we’ll find.
Maybe we should talk to him.
Bob: Good idea! And it doesn’t really matter whether his theory makes
any sense or not. We can still propose an experiment to test his
prediction, and they’ll probably fund us.
So Alice and Bob travel to CERN to have a chat with Bell. They tell
Bell about the experiment they propose to do. Bell listens closely, but for
a long time he remains silent, with a faraway look in his eyes. Alice and
Bob are not bothered by his silence, as they rarely understand anything
that theorists say anyway. But finally Bell speaks.
Bell: I think I have an idea . . . . When Bob uncovers his coin in Chicago,
that can’t exert any influence on Alice’s coin in Pasadena. Instead,
what Bob finds out by uncovering his coin reveals some information
about what will happen when Alice uncovers her coin.
Bob: Well, that’s what I’ve been saying . . .
Bell: Right. Sounds reasonable. So let’s assume that Bob is right about
that. Now Bob can uncover any one of his coins, and know for sure
what Alice will find when she uncovers the corresponding coin. He
isn’t disturbing her coin in any way; he’s just finding out about it.
We’re forced to conclude that there must be some hidden variables
that specify the condition of Alice’s coins. And if those variables
are completely known, then the value of each of Alice’s coins can be
unambiguously predicted.
Bob: [Impatient with all this abstract stuff] Yeah, but so what?
Bell: When your correlated coin sets are prepared, the values of the hid-
den variables are not completely specified; that’s why any one coin
is as likely to be an H as a T . But there must be some probabil-
ity distribution P (x, y, z) (with x, y, z ∈ {H, T }) that characterizes
the preparation and governs Alice’s three coins. These probabilities
must be nonnegative, and they sum to one:
\[ \sum_{x,y,z\in\{H,T\}} P(x,y,z) = 1. \tag{4.15} \]
Alice can’t uncover all three of her coins, so she can’t measure
P (x, y, z) directly. But with Bob’s help, she can in effect uncover
any two coins of her choice. Let’s denote by $P_{\rm same}(i,j)$ the probability
that coins $i$ and $j$ ($i, j = 1, 2, 3$) have the same value, either
both H or both T . Then we see that
\[ \begin{aligned} P_{\rm same}(1,2) &= P(HHH) + P(HHT) + P(TTH) + P(TTT), \\ P_{\rm same}(2,3) &= P(HHH) + P(THH) + P(HTT) + P(TTT), \\ P_{\rm same}(1,3) &= P(HHH) + P(HTH) + P(THT) + P(TTT), \end{aligned} \tag{4.16} \]
and it immediately follows from eq. (4.15) that
\[ P_{\rm same}(1,2) + P_{\rm same}(2,3) + P_{\rm same}(1,3) = 1 + 2P(HHH) + 2P(TTT) \ge 1. \tag{4.17} \]
4.2 The Bell inequality 13
You can test it by doing your experiment that “uncovers” two coins
at a time.
Bob: Well, I guess the math looks right. But I don’t really get it. Why
does it work?
Alice: I think I see . . . . Bell is saying that if there are three coins on a
table, and each one is either an H or a T , then at least two of the
three have to be the same, either both H or both T . Isn’t that it,
Bell?
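Alice's counting argument can be checked by brute force. The sketch below enumerates all eight assignments of H or T to three coins and evaluates the sum of the three "same" probabilities for an arbitrary probability distribution over them; the function names and the random test distribution are illustrative choices.

```python
from itertools import product
import random

COINS = list(product("HT", repeat=3))   # all eight assignments (x, y, z)
PAIRS = [(0, 1), (1, 2), (0, 2)]        # coin pairs (1,2), (2,3), (1,3), zero-indexed

def p_same(dist, i, j):
    """Probability that coins i and j agree, for a distribution over assignments."""
    return sum(p for coins, p in dist.items() if coins[i] == coins[j])

def bell_sum(dist):
    """Left-hand side of eq. (4.17)."""
    return sum(p_same(dist, i, j) for i, j in PAIRS)

def random_distribution(rng):
    weights = [rng.random() for _ in COINS]
    total = sum(weights)
    return {coins: w / total for coins, w in zip(COINS, weights)}

if __name__ == "__main__":
    # Every definite assignment has at least one matching pair.
    for coins in COINS:
        print("".join(coins), bell_sum({coins: 1.0}))
    dist = random_distribution(random.Random(0))
    print(bell_sum(dist), 1 + 2 * dist[("H", "H", "H")] + 2 * dist[("T", "T", "T")])
```

Since every definite assignment contributes at least 1 to the sum, so does any probabilistic mixture of assignments.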
Bell stares at Alice, a surprised look on his face. His eyes glaze, and
for a long time he is speechless. Finally, he speaks:
Bell: Yes
So Alice and Bob are amazed and delighted to find that Bell is that
rarest of beasts — a theorist who makes sense. With Bell’s help, their pro-
posal is approved and they do the experiment, only to obtain a shocking
result. After many careful trials, they conclude, to very good statistical
accuracy, that
\[ P_{\rm same}(1,2) \simeq P_{\rm same}(2,3) \simeq P_{\rm same}(1,3) \simeq \frac{1}{4}, \tag{4.19} \]
and hence
\[ P_{\rm same}(1,2) + P_{\rm same}(2,3) + P_{\rm same}(1,3) \simeq 3\cdot\frac{1}{4} = \frac{3}{4} < 1. \tag{4.20} \]
The correlations found by Alice and Bob flagrantly violate Bell’s inequal-
ity!
Alice and Bob are good experimenters, but dare not publish so dis-
turbing a result unless they can find a plausible theoretical interpreta-
tion. Finally, they become so desperate that they visit the library to see
if quantum mechanics can offer any solace . . .
along one of three possible axes, no two of which are orthogonal. Since
the measurements don’t commute, Alice can uncover only one of her three
coins. Similarly, when Bob uncovers his coin, he measures his member
of the entangled pair along any one of three axes, so he too is limited to
uncovering just one of his three coins. But since Alice’s measurements
commute with Bob’s, they can uncover one coin each, and study how
Alice’s coins are correlated with Bob’s coins.
To help Alice and Bob interpret their experiment, let’s see what quantum
mechanics predicts about these correlations. The state $|\psi^-\rangle$ has the
convenient property that it remains invariant if Alice and Bob each apply
the same unitary transformation,
\[ \left(\vec\sigma^{(A)} + \vec\sigma^{(B)}\right)|\psi^-\rangle = 0 \tag{4.22} \]
(the state has vanishing total angular momentum, as you can easily check
by an explicit computation). Now consider the expectation value
\[ \langle\psi^-|\left(\vec\sigma^{(A)}\cdot\hat a\right)\left(\vec\sigma^{(B)}\cdot\hat b\right)|\psi^-\rangle, \tag{4.23} \]
where $\hat a$ and $\hat b$ are unit 3-vectors. Acting on $|\psi^-\rangle$, we can replace $\vec\sigma^{(B)}$ by
$-\vec\sigma^{(A)}$; therefore, the expectation value can be expressed as a property of
Alice’s system, which has density operator $\rho_A = \frac{1}{2}I$:
\[ -\langle\psi^-|\left(\vec\sigma^{(A)}\cdot\hat a\right)\left(\vec\sigma^{(A)}\cdot\hat b\right)|\psi^-\rangle = -a_i b_j\,\mathrm{tr}\left(\rho_A\,\sigma_i^{(A)}\sigma_j^{(A)}\right) = -a_i b_j\,\delta_{ij} = -\hat a\cdot\hat b = -\cos\theta, \tag{4.24} \]
where θ is the angle between the axes â and b̂. Thus we find that the
measurement outcomes are always perfectly anticorrelated when we mea-
sure both spins along the same axis â, and we have also obtained a more
general result that applies when the two axes are different.
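The projection operator onto the spin up (spin down) states along $\hat n$ is
$E(\hat n, \pm) = \frac{1}{2}\left(I \pm \hat n\cdot\vec\sigma\right)$; we therefore obtain
\[ \begin{aligned} P(++) &= \langle\psi^-|E^{(A)}(\hat a,+)E^{(B)}(\hat b,+)|\psi^-\rangle = \frac{1}{4}(1 - \cos\theta), \\ P(--) &= \langle\psi^-|E^{(A)}(\hat a,-)E^{(B)}(\hat b,-)|\psi^-\rangle = \frac{1}{4}(1 - \cos\theta), \\ P(+-) &= \langle\psi^-|E^{(A)}(\hat a,+)E^{(B)}(\hat b,-)|\psi^-\rangle = \frac{1}{4}(1 + \cos\theta), \\ P(-+) &= \langle\psi^-|E^{(A)}(\hat a,-)E^{(B)}(\hat b,+)|\psi^-\rangle = \frac{1}{4}(1 + \cos\theta); \end{aligned} \tag{4.25} \]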
here P (++) is the probability that Alice and Bob both obtain the spin-
up outcome when Alice measures along â and Bob measures along b̂, etc.
The probability that their outcomes are the same is
\[ P_{\rm same} = P(++) + P(--) = \frac{1}{2}(1 - \cos\theta), \tag{4.26} \]
and the probability that their outcomes are opposite is
\[ P_{\rm opposite} = P(+-) + P(-+) = \frac{1}{2}(1 + \cos\theta). \tag{4.27} \]
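Now suppose that Alice measures her spin along one of the three symmetrically
distributed axes in the x-z plane,
\[ \hat a_1 = (0, 0, 1), \qquad \hat a_2 = \left(\frac{\sqrt{3}}{2}, 0, -\frac{1}{2}\right), \qquad \hat a_3 = \left(-\frac{\sqrt{3}}{2}, 0, -\frac{1}{2}\right), \tag{4.28} \]
so that
\[ \hat a_1\cdot\hat a_2 = \hat a_2\cdot\hat a_3 = \hat a_3\cdot\hat a_1 = -\frac{1}{2}. \tag{4.29} \]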
And suppose that Bob measures along one of three axes that are diamet-
rically opposed to Alice’s:
When Alice and Bob choose opposite axes, then $\theta = 180^\circ$ and $P_{\rm same} = 1$.
But otherwise $\theta = \pm 60^\circ$ so that $\cos\theta = 1/2$ and $P_{\rm same} = 1/4$. This is just
the behavior that Alice and Bob found in their experiment, in violation
of Bell’s prediction.
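The quantum prediction can be reproduced with explicit projectors on the state $|\psi^-\rangle$. The sketch below takes Bob's three axes to be the negatives of Alice's axes in eq. (4.28), as "diametrically opposed" suggests; the helper names are illustrative.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PAULIS = [np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]], dtype=complex),
          np.array([[1, 0], [0, -1]], dtype=complex)]

PSI_MINUS = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)  # (|01> - |10>)/sqrt(2)

def projector(axis, sign):
    """E(n, +/-) = (I +/- n . sigma) / 2."""
    n_dot_sigma = sum(c * s for c, s in zip(axis, PAULIS))
    return 0.5 * (I2 + sign * n_dot_sigma)

def p_same(axis_a, axis_b):
    """Probability that Alice (axis_a) and Bob (axis_b) obtain the same outcome."""
    total = 0.0
    for sign in (+1, -1):
        joint = np.kron(projector(axis_a, sign), projector(axis_b, sign))
        total += np.real(PSI_MINUS.conj() @ joint @ PSI_MINUS)
    return total

ALICE_AXES = [np.array([0.0, 0.0, 1.0]),
              np.array([np.sqrt(3) / 2, 0.0, -0.5]),
              np.array([-np.sqrt(3) / 2, 0.0, -0.5])]
BOB_AXES = [-a for a in ALICE_AXES]   # assumed: Bob's axes are opposite to Alice's

if __name__ == "__main__":
    for i in range(3):
        print([round(p_same(ALICE_AXES[i], BOB_AXES[j]), 6) for j in range(3)])
    pairs = [(0, 1), (1, 2), (0, 2)]
    print(sum(p_same(ALICE_AXES[i], BOB_AXES[j]) for i, j in pairs))
```

Matching coins always agree, while different coins agree only a quarter of the time, so the sum over the three pairs falls below the bound of eq. (4.17).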
\[ C \equiv (a + a')b + (a - a')b' = \pm 2. \tag{4.31} \]
so that
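\[ a = \vec\sigma^{(A)}\cdot\hat a, \qquad a' = \vec\sigma^{(A)}\cdot\hat a', \tag{4.34} \]
acting on a qubit in Alice’s possession, where $\hat a$, $\hat a'$ are unit 3-vectors.
Similarly, let $b$, $b'$ denote
\[ b = \vec\sigma^{(B)}\cdot\hat b, \qquad b' = \vec\sigma^{(B)}\cdot\hat b', \tag{4.35} \]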
where $\theta$ is the angle between $\hat a$ and $\hat b$. Consider the case where $\hat a'$, $\hat b$, $\hat a$, $\hat b'$
are coplanar and separated by successive $45^\circ$ angles, so that the
quantum-mechanical predictions are
\[ \langle ab\rangle = \langle a'b\rangle = \langle ab'\rangle = -\cos\frac{\pi}{4} = -\frac{1}{\sqrt{2}}, \qquad \langle a'b'\rangle = -\cos\frac{3\pi}{4} = \frac{1}{\sqrt{2}}. \tag{4.37} \]
The CHSH inequality then becomes
\[ 4\cdot\frac{1}{\sqrt{2}} = 2\sqrt{2} \le 2, \tag{4.38} \]
which is clearly violated by the quantum-mechanical prediction.
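The value $2\sqrt{2}$ can be reproduced by direct computation on the state $|\psi^-\rangle$. In the sketch below the four axes are placed in the x-z plane at angles $0^\circ$, $45^\circ$, $90^\circ$, $135^\circ$ from the z-axis, in the order $\hat a'$, $\hat b$, $\hat a$, $\hat b'$; the choice of plane and the helper names are made here for illustration.

```python
import numpy as np

PAULIS = [np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]], dtype=complex),
          np.array([[1, 0], [0, -1]], dtype=complex)]
PSI_MINUS = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)

def axis(angle):
    """Unit vector in the x-z plane at the given angle from the z-axis."""
    return np.array([np.sin(angle), 0.0, np.cos(angle)])

def spin(n):
    return sum(c * s for c, s in zip(n, PAULIS))

def correlator(n_a, n_b):
    """<psi-| (sigma^(A) . n_a)(sigma^(B) . n_b) |psi->."""
    return np.real(PSI_MINUS.conj() @ np.kron(spin(n_a), spin(n_b)) @ PSI_MINUS)

A_PRIME, B, A, B_PRIME = (axis(k * np.pi / 4) for k in range(4))

chsh = (correlator(A, B) + correlator(A_PRIME, B)
        + correlator(A, B_PRIME) - correlator(A_PRIME, B_PRIME))

if __name__ == "__main__":
    print(correlator(A, B), correlator(A_PRIME, B_PRIME))
    print(abs(chsh), 2 * np.sqrt(2))
```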
Defining
\[ C = ab + a'b + ab' - a'b', \tag{4.41} \]
we evaluate
\[ \begin{aligned} C^2 = {} & I + aa' + bb' - aa'bb' \\ & + a'a + I + a'abb' - bb' \\ & + b'b + aa'b'b + I - aa' \\ & - a'ab'b - b'b - a'a + I, \end{aligned} \tag{4.42} \]
using eq. (4.39). All the quadratic terms cancel pairwise, so that we are
left with
and therefore
\[ \| C \|_{\sup} \le 2\sqrt{2}, \tag{4.48} \]
where ⊕ denotes the sum modulo 2 (the XOR gate) and ∧ denotes the
product (the AND gate). Can Alice and Bob find a strategy that enables
them to win the game every time, no matter how Charlie chooses the
input bits?
No, it is easy to see that there is no such strategy. Let $a_0, a_1$ denote the
value of Alice’s output if her input is $x = 0, 1$ and let $b_0, b_1$ denote Bob’s
output if his input is $y = 0, 1$. For Alice and Bob to win for all possible
inputs, their output bits must satisfy
\[ a_0 \oplus b_0 = 0, \qquad a_0 \oplus b_1 = 0, \qquad a_1 \oplus b_0 = 0, \qquad a_1 \oplus b_1 = 1. \tag{4.51} \]
4.3 More Bell inequalities 21
\[ a = (-1)^{a_0}, \quad a' = (-1)^{a_1}, \quad b = (-1)^{b_0}, \quad b' = (-1)^{b_1}. \tag{4.52} \]
Then the CHSH inequality says that for any joint probability distribution
governing $a, a', b, b' \in \{\pm 1\}$, the expectation values satisfy
\[ \langle ab\rangle = 2p_{00} - 1, \quad \langle ab'\rangle = 2p_{01} - 1, \quad \langle a'b\rangle = 2p_{10} - 1, \quad \langle a'b'\rangle = 1 - 2p_{11}; \tag{4.54} \]
or
\[ \langle p\rangle \equiv \frac{1}{4}\left(p_{00} + p_{01} + p_{10} + p_{11}\right) \le \frac{3}{4}, \tag{4.56} \]
where $\langle p\rangle$ denotes the probability of winning averaged over a uniform
ensemble for the input bits. Thus, if the input bits are random, Alice and
Bob cannot attain a probability of winning higher than 3/4.
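The classical bound can be confirmed by exhaustive search over deterministic strategies. The sketch below assumes, consistently with eq. (4.51), that Alice and Bob win when the XOR of their outputs equals the AND of their inputs; the function name is illustrative.

```python
from itertools import product

def win_probability(a0, a1, b0, b1):
    """Fraction of the four input pairs (x, y) won by a deterministic strategy.

    Alice outputs a0 or a1 for input x = 0 or 1; Bob outputs b0 or b1 for y = 0 or 1.
    Assumed winning condition: (Alice's output) XOR (Bob's output) == x AND y.
    """
    alice, bob = (a0, a1), (b0, b1)
    wins = sum((alice[x] ^ bob[y]) == (x & y) for x, y in product((0, 1), repeat=2))
    return wins / 4

strategies = list(product((0, 1), repeat=4))
best = max(win_probability(*s) for s in strategies)

if __name__ == "__main__":
    print(best)
    print([s for s in strategies if win_probability(*s) == best])
```

A strategy that consults shared randomness is a probabilistic mixture of these sixteen deterministic strategies, so its average success probability cannot exceed the best deterministic value.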
It is worthwhile to consider how the assumption that Alice and Bob
take actions governed by “local hidden variables” limits their success in
playing the game. Although Alice and Bob do not share any quantum
entanglement, they are permitted to share a table of random numbers that
they may consult to produce their output bits. Thus we may imagine that
hidden variables drawn from an ensemble of possible values guide Alice
and Bob to make correlated decisions. These correlations are limited
by locality — Alice does not know Bob’s input and Bob does not know
Alice’s. In fact, we have learned that for playing this game their shared
randomness is of no value — their best strategy does not use the shared
randomness at all.
But if Alice and Bob share quantum entanglement, they can devise
a better strategy. Based on the value of her input bit, Alice decides to
measure one of two Hermitian observables with eigenvalues ±1: $a$ if $x = 0$
and $a'$ if $x = 1$. Similarly, Bob measures $b$ if $y = 0$ and $b'$ if $y = 1$. Then
the quantum-mechanical expectation values of these observables satisfy
the Cirel’son inequality
\[ \langle ab\rangle + \langle ab'\rangle + \langle a'b\rangle - \langle a'b'\rangle \le 2\sqrt{2}, \tag{4.57} \]
and the probability that Alice and Bob win the game is constrained by
\[ 2\left(p_{00} + p_{01} + p_{10} + p_{11}\right) - 4 \le 2\sqrt{2}, \tag{4.58} \]
or
\[ \langle p\rangle \equiv \frac{1}{4}\left(p_{00} + p_{01} + p_{10} + p_{11}\right) \le \frac{1}{2} + \frac{1}{2\sqrt{2}} \approx .853. \tag{4.59} \]
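One concrete strategy that reaches this bound can be written down explicitly. The particular choice below is an assumption of this sketch rather than something specified above: Alice and Bob share $|\phi^+\rangle$, Alice measures $\sigma_3$ or $\sigma_1$, and Bob measures $(\sigma_3 \pm \sigma_1)/\sqrt{2}$. An outcome $+1$ is read as output bit 0 and $-1$ as output bit 1, and the winning condition is taken to be that the XOR of the outputs equals the AND of the inputs.

```python
import numpy as np

SIGMA_1 = np.array([[0.0, 1.0], [1.0, 0.0]])
SIGMA_3 = np.array([[1.0, 0.0], [0.0, -1.0]])
PHI_PLUS = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)

# Assumed measurement choices: index 0 for input bit 0, index 1 for input bit 1.
ALICE_OBS = [SIGMA_3, SIGMA_1]
BOB_OBS = [(SIGMA_3 + SIGMA_1) / np.sqrt(2), (SIGMA_3 - SIGMA_1) / np.sqrt(2)]

def win_probability(x, y):
    """Probability of winning for inputs (x, y) with the strategy above."""
    corr = PHI_PLUS @ np.kron(ALICE_OBS[x], BOB_OBS[y]) @ PHI_PLUS
    p_equal = (1 + corr) / 2            # probability that the two output bits agree
    return p_equal if (x & y) == 0 else 1 - p_equal

average = sum(win_probability(x, y) for x in (0, 1) for y in (0, 1)) / 4

if __name__ == "__main__":
    print([[win_probability(x, y) for y in (0, 1)] for x in (0, 1)])
    print(average, 0.5 + 1 / (2 * np.sqrt(2)))
```

With this choice every input pair is won with the same probability, so the average saturates the right-hand side of eq. (4.59).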
Any pure state of two qubits can be expressed this way in the Schmidt
basis; with suitable phase conventions, α and β are real and nonnegative.
Suppose that Alice and Bob both measure along an axis in the x-z
plane, so that their observables are
\[ a = \sigma_3^{(A)}\cos\theta_A + \sigma_1^{(A)}\sin\theta_A, \qquad b = \sigma_3^{(B)}\cos\theta_B + \sigma_1^{(B)}\sin\theta_B. \tag{4.62} \]
(and we recover $\cos(\theta_A - \theta_B)$ in the maximally entangled case $\alpha = \beta = 1/\sqrt{2}$).
Now let us consider, for simplicity, the (nonoptimal!) special case
\[ \theta_A = 0, \qquad \theta_A' = \frac{\pi}{2}, \qquad \theta_B' = -\theta_B, \tag{4.65} \]
so that the quantum predictions are:
4.3.5 Photons
Experiments that test the Bell inequality usually are done with entangled
photons, not with spin-$\frac{1}{2}$ objects. What are the quantum-mechanical
predictions for photons?
Recall from §2.2.2 that for a photon traveling in the ẑ direction, we use
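the notation $|x\rangle$, $|y\rangle$ for the states that are linearly polarized along the x
and y axes respectively. In terms of these basis states, the states that are
linearly polarized along “horizontal” and “vertical” axes that are rotated
by angle θ relative to the x and y axes can be expressed as
$$|H(\theta)\rangle = \begin{pmatrix}\cos\theta\\ \sin\theta\end{pmatrix} , \quad |V(\theta)\rangle = \begin{pmatrix}-\sin\theta\\ \cos\theta\end{pmatrix} . \tag{4.69}$$
$$\tau(\theta) \equiv |H(\theta)\rangle\langle H(\theta)| - |V(\theta)\rangle\langle V(\theta)| = \begin{pmatrix}\cos 2\theta & \sin 2\theta\\ \sin 2\theta & -\cos 2\theta\end{pmatrix} . \tag{4.70}$$
$$|+\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\ i\end{pmatrix} , \quad |-\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix}i\\ 1\end{pmatrix} . \tag{4.71}$$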
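The closed form in eq. (4.70) can be checked directly from the definitions in eq. (4.69). The sketch below builds |H(θ)⟩⟨H(θ)| − |V(θ)⟩⟨V(θ)| entry by entry and compares it with the matrix of cos 2θ and sin 2θ; the function names and the test angle are illustrative assumptions.

```python
import math

def polarization_states(theta):
    """Return |H(theta)> and |V(theta)> as 2-component lists, following eq. (4.69)."""
    h = [math.cos(theta), math.sin(theta)]
    v = [-math.sin(theta), math.cos(theta)]
    return h, v

def tau(theta):
    """|H><H| - |V><V| computed from the state vectors (all entries are real)."""
    h, v = polarization_states(theta)
    return [[h[r] * h[c] - v[r] * v[c] for c in range(2)] for r in range(2)]

def tau_closed_form(theta):
    """The matrix on the right-hand side of eq. (4.70)."""
    c2, s2 = math.cos(2 * theta), math.sin(2 * theta)
    return [[c2, s2], [s2, -c2]]

theta = 0.3  # arbitrary test angle (an assumption for illustration)
difference = max(abs(tau(theta)[r][c] - tau_closed_form(theta)[r][c])
                 for r in range(2) for c in range(2))
```

The quantity `difference` should vanish up to rounding error for any angle, which is just the double-angle identities at work.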
Suppose that an excited atom emits two photons that come out back to
back, with vanishing angular momentum and even parity. The two-photon
states
$$|+\rangle_A |-\rangle_B , \qquad |-\rangle_A |+\rangle_B \tag{4.72}$$
are invariant under rotations about ẑ. The photons have opposite val-
ues of Jz , but the same helicity (angular-momentum along the axis of
propagation), since they are propagating in opposite directions. Under a
or
Recall that for the measurement of qubits on the Bloch sphere, we found
the similar expression cos θ, where θ is the angle between Alice’s polariza-
tion axis and Bob’s. Here we have cos 2θ instead, because photons have
spin-1 rather than spin-$\frac{1}{2}$.
If Alice measures one of the two observables $a = \tau^{(A)}(\theta_A)$ or $a' = \tau^{(A)}(\theta_A')$
and Bob measures either $b = \tau^{(B)}(\theta_B)$ or $b' = \tau^{(B)}(\theta_B')$, then
$$\frac{1}{\sqrt{2}} = \cos 2(\theta_B - \theta_A) = \cos 2(\theta_B' - \theta_A) = \cos 2(\theta_B - \theta_A') = -\cos 2(\theta_B' - \theta_A') . \tag{4.80}$$
separated. The results were consistent with the quantum predictions, and
violated the CHSH inequality by five standard deviations. Since Aspect,
many other experiments have confirmed this finding, including ones in
which detectors A and B are kilometers apart.
As we have seen, by doing so, she transforms $|\phi^+\rangle_{AB}$ to one of 4 mutually
orthogonal states:
1) $|\phi^+\rangle_{AB}$,
2) $|\psi^+\rangle_{AB}$,
3) $|\psi^-\rangle_{AB}$,
4) $|\phi^-\rangle_{AB}$.
Now, she sends her qubit to Bob, who receives it and then performs
an orthogonal collective measurement on the pair that projects onto the
maximally entangled basis. The measurement outcome unambiguously
distinguishes the four possible actions that Alice could have performed.
Therefore the single qubit sent from Alice to Bob has successfully carried
2 bits of classical information! Hence this procedure is called “dense
coding.”
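A small simulation makes the counting concrete. In this sketch Alice encodes a two-bit message by acting on her half of the shared pair, and Bob decodes by finding which Bell state he holds. The assignment of messages to the operations I, σ1, iσ2, σ3 is an assumption made for illustration (the standard choice); the names `encode` and `decode` are hypothetical.

```python
import math

S = 1 / math.sqrt(2)
PHI_PLUS = [S, 0.0, 0.0, S]  # shared pair, amplitudes indexed by 2*a + b

# Alice's local operation for each two-bit message (assumed labeling)
ALICE_OPS = {
    "00": [[1, 0], [0, 1]],    # identity
    "01": [[0, 1], [1, 0]],    # sigma_1
    "10": [[0, 1], [-1, 0]],   # i sigma_2
    "11": [[1, 0], [0, -1]],   # sigma_3
}

# The maximally entangled basis that Bob's collective measurement projects onto
BELL_BASIS = {
    "00": [S, 0.0, 0.0, S],    # phi+
    "01": [0.0, S, S, 0.0],    # psi+
    "10": [0.0, S, -S, 0.0],   # psi-
    "11": [S, 0.0, 0.0, -S],   # phi-
}

def encode(message):
    """Apply Alice's operation to qubit A of the shared pair."""
    op = ALICE_OPS[message]
    return [sum(op[a_out][a_in] * PHI_PLUS[2 * a_in + b] for a_in in range(2))
            for a_out in range(2) for b in range(2)]

def decode(state):
    """Return the label of the Bell state with the largest outcome probability."""
    def probability(bell):
        return sum(x * y for x, y in zip(bell, state)) ** 2
    return max(BELL_BASIS, key=lambda label: probability(BELL_BASIS[label]))

decoded = {message: decode(encode(message)) for message in ALICE_OPS}
```

Because the four encoded states are mutually orthogonal, each message is recovered with certainty: `decoded` maps every two-bit string to itself.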
A nice feature of this protocol is that, if the message is highly con-
fidential, Alice need not worry that an eavesdropper will intercept the
transmitted qubit and decipher her message. The transmitted qubit has
density matrix $\rho_A = \frac{1}{2} I_A$, and so carries no information at all. All the
information is in the correlations between qubits A and B, and this infor-
mation is inaccessible unless the adversary is able to obtain both members
of the entangled pair. (Of course, the adversary can “jam” the channel,
preventing the information from reaching Bob.)
From one point of view, Alice and Bob really did need to use the channel
twice to exchange two bits of information. For example, we can imagine
that Alice prepared the state $|\phi^+\rangle$ herself. Last year, she sent half of the
state to Bob, and now she sends him the other half. So in effect, Alice
has sent two qubits to Bob in one of four mutually orthogonal states, to
convey two classical bits of information as the Holevo bound allows.
Still, dense coding is rather weird, for several reasons. First, Alice sent
the first qubit to Bob long before she knew what her message was going
to be. Second, each qubit by itself carries no information at all; all the
information is encoded in the correlations between the qubits. Third, it
would work just as well for Bob to prepare the entangled pair and send
half to Alice; then two classical bits are transmitted from Alice to Bob
by sending a single qubit from Bob to Alice and back again.
Anyway, when an emergency arose and two bits had to be sent immedi-
ately while only one use of the channel was possible, Alice and Bob could
exploit the pre-existing entanglement to communicate more efficiently.
They used entanglement as a resource.
$$F = |\langle\varphi|\psi\rangle|^2 = \frac{2}{3} , \tag{4.81}$$
This fidelity is better than could have been achieved if Bob had merely
chosen a state at random ($F = \frac{1}{2}$), but it is not nearly as good as the
fidelity that Bob requires. Furthermore, as we will see in Chapter 5,
there is no protocol in which Alice measures the qubit and sends classical
information to Bob that achieves a fidelity better than 2/3.
Fortunately, Alice and Bob recall that they share the maximally entangled
state $|\phi^+\rangle_{AB}$, which they prepared last year. Why not use the
entanglement as a resource? If they are willing to consume the shared
entanglement and communicate classically, can Alice send her qubit to
Bob with fidelity better than 2/3?
In fact they can achieve fidelity F = 1, by carrying out the following
protocol: Alice unites the unknown qubit $|\psi\rangle_C$ she wants to send to Bob
with her half of the $|\phi^+\rangle_{AB}$ pair that she shares with Bob. She measures
the two commuting observables
This action transforms Bob’s qubit (his member of the entangled pair that
he initially shared with Alice) into a perfect copy of $|\psi\rangle_C$. This magic
trick is called quantum teleportation.
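How does it work? We merely note that for $|\psi\rangle = a|0\rangle + b|1\rangle$, we may
write
$$\begin{aligned}
|\psi\rangle_C |\phi^+\rangle_{AB} &= \left(a|0\rangle_C + b|1\rangle_C\right) \frac{1}{\sqrt{2}} \left(|00\rangle_{AB} + |11\rangle_{AB}\right)\\
&= \frac{1}{\sqrt{2}} \left(a|000\rangle_{CAB} + a|011\rangle_{CAB} + b|100\rangle_{CAB} + b|111\rangle_{CAB}\right)\\
&= \frac{1}{2}\, a \left(|\phi^+\rangle_{CA} + |\phi^-\rangle_{CA}\right)|0\rangle_B + \frac{1}{2}\, a \left(|\psi^+\rangle_{CA} + |\psi^-\rangle_{CA}\right)|1\rangle_B\\
&\quad + \frac{1}{2}\, b \left(|\psi^+\rangle_{CA} - |\psi^-\rangle_{CA}\right)|0\rangle_B + \frac{1}{2}\, b \left(|\phi^+\rangle_{CA} - |\phi^-\rangle_{CA}\right)|1\rangle_B\\
&= \frac{1}{2} |\phi^+\rangle_{CA} \left(a|0\rangle_B + b|1\rangle_B\right) + \frac{1}{2} |\psi^+\rangle_{CA} \left(a|1\rangle_B + b|0\rangle_B\right)\\
&\quad + \frac{1}{2} |\psi^-\rangle_{CA} \left(a|1\rangle_B - b|0\rangle_B\right) + \frac{1}{2} |\phi^-\rangle_{CA} \left(a|0\rangle_B - b|1\rangle_B\right)\\
&= \frac{1}{2} |\phi^+\rangle_{CA} |\psi\rangle_B + \frac{1}{2} |\psi^+\rangle_{CA}\, \sigma_1 |\psi\rangle_B\\
&\quad + \frac{1}{2} |\psi^-\rangle_{CA} (-i\sigma_2) |\psi\rangle_B + \frac{1}{2} |\phi^-\rangle_{CA}\, \sigma_3 |\psi\rangle_B .
\end{aligned} \tag{4.84}$$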
Thus we see that when Alice performs the Bell measurement on qubits
C and A, all four outcomes are equally likely. Once Bob learns Alice’s
measurement outcome, he possesses the pure state $\sigma|\psi\rangle$, where $\sigma$ is a
known Pauli operator, one of $\{I, \sigma_1, \sigma_2, \sigma_3\}$. The action prescribed in
eq. (4.83) restores Bob’s qubit to the initial state $|\psi\rangle$.
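The decomposition in eq. (4.84) can be checked numerically. The sketch below projects qubits C and A onto each Bell state, reads off Bob's conditional state, and undoes the Pauli factor. The correction table is inferred from eq. (4.84) by inverting each factor; it is an assumption standing in for the prescription of eq. (4.83), and the function name `teleport` and the test amplitudes are illustrative.

```python
import math

S = 1 / math.sqrt(2)

# Bell states on qubits (C, A), amplitudes indexed by 2*c + a
BELL = {
    "phi+": [S, 0, 0, S],
    "psi+": [0, S, S, 0],
    "psi-": [0, S, -S, 0],
    "phi-": [S, 0, 0, -S],
}

# Bob's correction for each outcome: the inverse of the Pauli factor in eq. (4.84)
CORRECTION = {
    "phi+": [[1, 0], [0, 1]],    # undo the identity
    "psi+": [[0, 1], [1, 0]],    # undo sigma_1
    "psi-": [[0, 1], [-1, 0]],   # undo -i sigma_2
    "phi-": [[1, 0], [0, -1]],   # undo sigma_3
}

def teleport(psi):
    """Return {outcome: (probability, Bob's corrected state)} for psi = [a, b]."""
    # |psi>_C |phi+>_AB, amplitudes indexed by 4*c + 2*a + b
    full = [psi[c] * (S if a == b else 0)
            for c in range(2) for a in range(2) for b in range(2)]
    results = {}
    for name, bell in BELL.items():
        # Project C and A onto the Bell state; what is left is Bob's unnormalized state
        bob = [sum(bell[2 * c + a] * full[4 * c + 2 * a + b]
                   for c in range(2) for a in range(2)) for b in range(2)]
        prob = sum(abs(x) ** 2 for x in bob)
        norm = math.sqrt(prob)
        bob = [x / norm for x in bob]
        fix = CORRECTION[name]
        fixed = [fix[r][0] * bob[0] + fix[r][1] * bob[1] for r in range(2)]
        results[name] = (prob, fixed)
    return results

psi = [0.6, 0.8j]          # an arbitrary normalized test state
outcomes = teleport(psi)
```

Each of the four outcomes should occur with probability 1/4, and after the correction Bob's state should agree with `psi` component by component.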
Quantum teleportation is a curious procedure. Initially, Bob’s qubit is
completely uncorrelated with the unknown qubit $|\psi\rangle_C$, but Alice’s Bell
meaning that the resources on the left suffice to simulate the resources on
the right. Entanglement is essential in these protocols. Without ebits, a
qubit is worth only one cbit, and without ebits, a “teleported” qubit has
fidelity F ≤ 2/3.
$$|\Phi\rangle = \frac{1}{\sqrt{N}} \sum_{i=0}^{N-1} |i\rangle \otimes |i\rangle . \tag{4.86}$$
Here we have defined the transfer operator $T_{BC}$ which has the property
$$T_{BC} |\varphi\rangle_C = T_{BC} \left(\sum_i a_i |i\rangle_C\right) = \sum_i a_i |i\rangle_B = |\varphi\rangle_B ; \tag{4.88}$$
it maps a state in C to the identical state in B. This property has no
invariant meaning independent of the choice of basis in B and C; rather
$T_{BC}$ just describes an arbitrary way to relate the orthonormal bases of
the two systems. Of course, Alice and Bob would need to align their bases
in some way to verify that teleportation has really succeeded.
Now recall that any other N × N maximally entangled state has a
Schmidt decomposition of the form
$$\frac{1}{\sqrt{N}} \sum_{i=0}^{N-1} |i'\rangle \otimes |i\rangle , \tag{4.89}$$
where
$$U|i\rangle = |i'\rangle = \sum_j |j\rangle\, U_{ji} . \tag{4.91}$$
Writing
$$|\Phi(U)\rangle_{AB} = \frac{1}{\sqrt{N}} \sum_{i,j} |j\rangle_A \otimes |i\rangle_B\, U_{ji} , \tag{4.92}$$
$${}_{CA}\langle\Phi(U)|\Phi(V)\rangle_{AB} = \frac{1}{N} \left(V^T U^{-1}\right)_B T_{BC} , \tag{4.93}$$
for some unitary $U_a$. Also, we can easily see how the teleportation protocol
should be modified if the initial maximally entangled state that Alice
and Bob share is not $|\Phi\rangle_{AB}$ but rather
If Alice’s measurement outcome is $|\Phi(U_a)\rangle_{CA}$, then eq. (4.93) tells us that
the state Bob receives is
$$V U_a^{-1} |\psi\rangle_B . \tag{4.98}$$
Therefore, Eve will not be able to learn anything about Alice’s and Bob’s
measurement results by measuring her qubits. The random key is secure.
To verify the properties $\sigma_1^{(A)} \sigma_1^{(B)} = 1 = \sigma_3^{(A)} \sigma_3^{(B)}$, Alice and Bob
can sacrifice a portion of their shared key, and publicly compare their
measurement outcomes. They should find that their results are indeed
perfectly correlated. If so they will have high statistical confidence that
Eve is unable to intercept the key. If not, they have detected Eve’s nefar-
ious activity. They may then discard the key, and make a fresh attempt
to establish a secure key.
As I have just presented it, the quantum key distribution protocol seems
to require entangled pairs shared by Alice and Bob, but this is not really
so. We might imagine that Alice prepares the $|\phi^+\rangle$ pairs herself, and then
measures one qubit in each pair before sending the other to Bob. This is
completely equivalent to a scheme in which Alice prepares one of the four
states
$$|\uparrow_z\rangle , \quad |\downarrow_z\rangle , \quad |\uparrow_x\rangle , \quad |\downarrow_x\rangle , \tag{4.103}$$
(chosen at random, each occurring with probability 1/4) and sends the
qubit to Bob. Bob’s measurement and the verification are then carried
out as before. This scheme (known as the BB84 quantum key distribution
protocol) is just as secure as the entanglement-based scheme.†
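The prepare-and-measure version is easy to simulate classically when the channel is ideal and no eavesdropper is present. The sketch below is a toy model under those assumptions: Alice picks a bit and an axis at random, Bob picks a measurement axis at random, and only the rounds where the axes agree are kept (the usual public comparison of bases, assumed here to match the procedure described earlier). The function name `bb84_sift` and the seed are illustrative.

```python
import random

def bb84_sift(n_qubits, seed=0):
    """Return Alice's and Bob's sifted keys for an ideal channel with no eavesdropper."""
    rng = random.Random(seed)
    alice_key, bob_key = [], []
    for _ in range(n_qubits):
        bit = rng.randint(0, 1)          # 0 means "up", 1 means "down"
        alice_basis = rng.choice("zx")   # axis along which Alice prepares the spin
        bob_basis = rng.choice("zx")     # axis along which Bob measures
        if bob_basis == alice_basis:
            outcome = bit                # same axis: Bob's result is certain
        else:
            outcome = rng.randint(0, 1)  # other axis: either result with probability 1/2
        if bob_basis == alice_basis:     # bases are compared publicly; mismatches are dropped
            alice_key.append(bit)
            bob_key.append(outcome)
    return alice_key, bob_key

alice_key, bob_key = bb84_sift(1000)
```

On average about half of the rounds survive sifting, and in this noiseless model the two sifted keys agree exactly; any disagreement found in the publicly compared portion would signal a disturbance.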
Another intriguing variation is called the “time-reversed EPR” scheme.
Here both Alice and Bob prepare one of the four states in eq. (4.103),
and they both send their qubits to Charlie. Then Charlie performs a Bell
measurement on the pair; that is, he measures $\sigma_1^{(A)} \sigma_1^{(B)}$ and $\sigma_3^{(A)} \sigma_3^{(B)}$,
orthogonally projecting out one of $|\phi^\pm\rangle$, $|\psi^\pm\rangle$, and he publicly announces
the result. Since all four of these states are simultaneous eigenstates of
$\sigma_1^{(A)} \sigma_1^{(B)}$ and $\sigma_3^{(A)} \sigma_3^{(B)}$, when Alice and Bob both prepared their spins
along the same axis (as they do about half the time) they share a single
bit.‡ Of course, Charlie could be allied with Eve, but Alice and Bob
can verify that Charlie and Eve have acquired no information as before,
by comparing a portion of their key. This scheme has the advantage
that Charlie could operate a central switching station by storing qubits
received from many parties, and then perform his Bell measurement when
two of the parties request a secure communication link. (Here we assume
that Charlie has a stable quantum memory in which qubits can be stored
†
Except that in the EPR scheme, Alice and Bob can wait until just before they need
to talk to generate the key, thus reducing the risk that Eve might at some point
burglarize Alice’s safe to learn what states Alice prepared (and so infer the key).
‡
Until Charlie makes his measurement, the states prepared by Bob and Alice are
totally uncorrelated. A definite correlation (or anti-correlation) is established after
Charlie performs his measurement.
4.5.2 No cloning
The security of quantum key distribution is based on an essential differ-
ence between quantum information and classical information. It is not
possible to acquire information that distinguishes between nonorthogonal
quantum states without disturbing the states.
For example, in the BB84 protocol, Alice sends to Bob any one of the
four states $|\uparrow_z\rangle$, $|\downarrow_z\rangle$, $|\uparrow_x\rangle$, $|\downarrow_x\rangle$, and Alice and Bob are able to verify that
This is not the state $|\psi\rangle \otimes |\psi\rangle$ (a tensor product of the original and the
copy); rather it is something very different: an entangled state of the
two qubits.
To consider the most general possible quantum Xerox machine, we allow
the full Hilbert space to be larger than the tensor product of the space of
the original and the space of the copy. Then the most general “copying”
unitary transformation acts as
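therefore, if $\langle\psi|\varphi\rangle \neq 0$, then
$$1 = \langle\psi|\varphi\rangle \langle e|f\rangle . \tag{4.110}$$
Since normalized states satisfy $|\langle\psi|\varphi\rangle| \le 1$ and $|\langle e|f\rangle| \le 1$, this requires
$$|\langle\psi|\varphi\rangle| = 1 , \tag{4.111}$$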
so that $|\psi\rangle$ and $|\varphi\rangle$ actually represent the same ray. No unitary machine
can make a copy of both $|\varphi\rangle$ and $|\psi\rangle$ if $|\varphi\rangle$ and $|\psi\rangle$ are distinct,
nonorthogonal states. This result is called the no-cloning theorem.
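To see the theorem at work in the simplest setting, consider a machine that copies the two basis states perfectly. The sketch below models such a machine as a CNOT gate acting on the original and a blank qubit prepared in |0⟩; this choice of machine is an assumption made for illustration, and the helper names are hypothetical. Applied to a superposition, the machine outputs an entangled state rather than two copies.

```python
import math

S = 1 / math.sqrt(2)

def cnot_copy(psi):
    """Apply CNOT (control: original, target: blank |0>) to psi tensor |0>.

    Amplitudes are indexed by 2*original + copy.
    """
    a, b = psi
    start = [a, 0.0, b, 0.0]                          # a|00> + b|10>
    return [start[0], start[1], start[3], start[2]]   # CNOT exchanges |10> and |11>

def product_copy(psi):
    """The state psi tensor psi that a perfect cloner would have to produce."""
    a, b = psi
    return [a * a, a * b, b * a, b * b]

def overlap(u, v):
    """Magnitude of the inner product <u|v>."""
    return abs(sum(x.conjugate() * y for x, y in zip(u, v)))

zero, one, plus = [1.0, 0.0], [0.0, 1.0], [S, S]

fidelity_zero = overlap(cnot_copy(zero), product_copy(zero)) ** 2
fidelity_one = overlap(cnot_copy(one), product_copy(one)) ** 2
fidelity_plus = overlap(cnot_copy(plus), product_copy(plus)) ** 2
```

The two basis states are copied with unit fidelity, while for the equal superposition the output (|00⟩ + |11⟩)/√2 overlaps the desired product state with fidelity only 1/2.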
where the $p_i$’s are positive and sum to one. Alternatively, we may say
that $\rho_{AB}$ is separable if and only if it can be expressed as
$$\rho_{AB} = \sum_{i,j} p_{ij}\, \rho_{A,i} \otimes \rho_{B,j} , \tag{4.113}$$
where each $\rho_{A,i}$ and $\rho_{B,j}$ is a density operator, and the $p_{ij}$’s are positive
and sum to one. Thus if a state is separable, the correlations between the
state of part A and the state of part B are entirely classical, and embodied
by the joint probability distribution $p_{ij}$. The two criteria eq. (4.112) and
eq. (4.113) are equivalent because each $\rho_{A,i}$ and $\rho_{B,j}$ can be realized as an
ensemble of pure states.
Of course, it may be possible to realize a separable mixed state as an
ensemble of entangled pure states as well. A simple example is that the
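random state $\rho = \frac{1}{4} I \otimes I$ of two qubits can be expressed as either
$$\rho = \frac{1}{4} \left(|00\rangle\langle 00| + |01\rangle\langle 01| + |10\rangle\langle 10| + |11\rangle\langle 11|\right) \tag{4.114}$$
or
$$\rho = \frac{1}{4} \left(|\phi^+\rangle\langle\phi^+| + |\phi^-\rangle\langle\phi^-| + |\psi^+\rangle\langle\psi^+| + |\psi^-\rangle\langle\psi^-|\right) . \tag{4.115}$$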
$$|\Phi\rangle = \frac{1}{\sqrt{N}} \sum_{i=0}^{N-1} |i\rangle_A \otimes |i\rangle_B \tag{4.118}$$
has density operator
$$\rho = \frac{1}{N} \sum_{i,j} |ii\rangle\langle jj| . \tag{4.119}$$
We will say that a bipartite density operator is PPT (for “positive partial
transpose”) if its partial transpose is nonnegative.
Thus, if we are presented with a density operator $\rho_{AB}$, we may compute
the eigenvalues of $\rho_{AB}^{PT}$; if negative eigenvalues are found, then $\rho_{AB}$ is
known to be inseparable. But because the PPT condition is necessary
but not sufficient for separability, if $\rho_{AB}^{PT}$ is found to be nonnegative,
This operator has a negative eigenvalue if λ > 1/3, and we conclude that
the Werner state is inseparable for λ > 1/3. Therefore, if half of the
maximally entangled state $|\phi^+\rangle$ is subjected to the depolarizing channel
with error probability p < 1/2, the resulting state remains entangled.
Although we won’t prove it here, it turns out that for the case of two-
qubit states, the PPT criterion is both necessary and sufficient for sepa-
rability. Thus the Werner state with λ < 1/3 (or F < 1/2) is separable.
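The threshold λ = 1/3 can be checked with a few lines of arithmetic. The sketch below assumes the Werner state has the form ρ(λ) = λ|φ+⟩⟨φ+| + (1 − λ)/4 · I, takes the partial transpose on Bob's qubit, and verifies that |ψ−⟩ is an eigenvector of the result with eigenvalue (1 − 3λ)/4. Both the assumed form of ρ(λ) and the helper names are illustrative.

```python
import math

S = 1 / math.sqrt(2)
PHI_PLUS = [S, 0.0, 0.0, S]      # amplitudes indexed by 2*a + b
PSI_MINUS = [0.0, S, -S, 0.0]

def werner_state(lam):
    """Assumed form: lam |phi+><phi+| + (1 - lam)/4 times the identity."""
    return [[lam * PHI_PLUS[r] * PHI_PLUS[c] + ((1 - lam) / 4 if r == c else 0.0)
             for c in range(4)] for r in range(4)]

def partial_transpose(rho):
    """Transpose Bob's index: <a b|rho^PT|a' b'> = <a b'|rho|a' b>."""
    out = [[0.0] * 4 for _ in range(4)]
    for a in range(2):
        for b in range(2):
            for a2 in range(2):
                for b2 in range(2):
                    out[2 * a + b][2 * a2 + b2] = rho[2 * a + b2][2 * a2 + b]
    return out

def eigenvalue_on(matrix, vector):
    """Return mu after checking that matrix times vector equals mu times vector."""
    image = [sum(matrix[r][c] * vector[c] for c in range(4)) for r in range(4)]
    mu = sum(v * w for v, w in zip(vector, image))
    assert all(abs(w - mu * v) < 1e-12 for v, w in zip(vector, image))
    return mu

def ppt_eigenvalue(lam):
    """Eigenvalue of the partially transposed Werner state on |psi->."""
    return eigenvalue_on(partial_transpose(werner_state(lam)), PSI_MINUS)
```

For example, `ppt_eigenvalue(0.5)` is negative while `ppt_eigenvalue(0.2)` is positive, and the sign changes at λ = 1/3, in agreement with the criterion stated above.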
While we found that a bipartite pure state is entangled if and only if
it violates some Bell inequality, this equivalence does not hold for mixed
states. You will show in Exercise 4.?? that for a Werner state with
λ = 1/2 (or any smaller value of λ) there is a local hidden-variable theory
that fully accounts for the correlations between measurements of Alice’s
qubit and Bob’s. Thus, Werner states with 1/3 < λ < 1/2 are inseparable
states that violate no Bell inequality.
Oddly, though a Werner state with 1/3 < λ < 1/2 is not Bell-inequality
violating, it is nonetheless a shared resource more powerful than classical
randomness. You will also show in Exercise 4.?? that by consuming a
Werner state Alice and Bob can teleport a qubit in an unknown state
with fidelity
$$F_{\rm teleport} = \frac{1}{2}\,(1 + \lambda) . \tag{4.127}$$
This fidelity exceeds the maximal fidelity $F_{\rm teleport} = 2/3$ that can be
achieved without any shared entanglement, for any λ > 1/3 — that is,
for any inseparable Werner state, whether Bell-inequality violating or not.
Even if well described by local hidden variables, an entangled mixed state
can be useful.
It seems rather strange that shared entangled states described by local
hidden-variable theory should be a more powerful resource than classical
shared randomness. Further observations to be discussed in §5.?? will
deepen our grasp of the situation. There we will find that if Alice and
Bob share many copies of the Werner state ρ(λ) with 1/3 < λ < 1/2,
then while local hidden variables provide an adequate description of the
correlations if Alice and Bob are restricted to measuring the pairs one at
a time, violations of Bell inequalities still arise if they are permitted to
perform more general kinds of measurements. These observations illus-
trate that mixed-state entanglement is a surprisingly subtle and elusive
concept.
Now, since Alice has a pure state, and so does Bob, we might expect them
to be able to devise a winning strategy. But on further reflection, this
is not so obvious. Though the states $\{|\psi_i\rangle_{AB}\}$ in Charlie’s ensemble are
mutually orthogonal, the states $\{|\alpha_i\rangle_A\}$ that Alice could receive need not
be mutually orthogonal, and the same is true of the states $\{|\beta_i\rangle_B\}$ that
Bob could receive.
Indeed, even under the new rules, there is no winning strategy for Alice
and Bob in general. Though Charlie sends a pure state to Alice and a
pure state to Bob, there is no way for Alice and Bob, using LOCC, to fully
decipher the message that Charlie has sent to them. This phenomenon is
called nonlocality without entanglement.
The best way to understand nonlocality without entanglement is to con-
sider an example. Suppose that Alice and Bob share a pair of qutrits (3-
level quantum systems), and denote the three elements of an orthonormal
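basis for the qutrit by $\{|0\rangle, |1\rangle, |2\rangle\}$. In a streamlined notation, Charlie’s
ensemble of nine mutually orthogonal states is
$$\begin{aligned}
|\psi_{1,2}\rangle &= |0, 0 \pm 1\rangle ,\\
|\psi_{3,4}\rangle &= |0 \pm 1, 2\rangle ,\\
|\psi_{5,6}\rangle &= |2, 1 \pm 2\rangle ,\\
|\psi_{7,8}\rangle &= |1 \pm 2, 0\rangle ,\\
|\psi_9\rangle &= |1, 1\rangle .
\end{aligned} \tag{4.129}$$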
(Here, $|0, 0 \pm 1\rangle$ denotes $|0\rangle_A \otimes \frac{1}{\sqrt{2}} (|0\rangle_B \pm |1\rangle_B)$, etc.) For ease of visualization,
it is very convenient to represent this basis pictorially, as a tiling
of a square by rectangles:
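[Figure: a 3 × 3 square whose rows are labeled by Alice’s basis states 0, 1, 2 and whose columns are labeled by Bob’s. $|\psi_{1,2}\rangle$ covers row 0, columns 0 and 1; $|\psi_{3,4}\rangle$ covers column 2, rows 0 and 1; $|\psi_{5,6}\rangle$ covers row 2, columns 1 and 2; $|\psi_{7,8}\rangle$ covers column 0, rows 1 and 2; $|\psi_9\rangle$ is the central cell.]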
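[Figure: the tiling restricted to Alice’s rows 0 and 1, with columns labeled by Bob’s basis states 0, 1, 2. It contains $|\psi_{1,2}\rangle$, $|\psi_{3,4}\rangle$, $|\psi_9\rangle$, and the row-1 cell of $|\psi_{7,8}\rangle$.]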
Once again, Alice and Bob have lost any hope of distinguishing $|\psi_7\rangle$ from
$|\psi_8\rangle$, but in a few more rounds of LOCC, they can successfully identify
any of the other five states. Bob projects onto $|2\rangle$ or its complement; if he
finds $|2\rangle\langle 2|$, then Alice projects onto $|0 \pm 1\rangle$ to complete the protocol. If
Bob’s outcome is $|0\rangle\langle 0| + |1\rangle\langle 1|$, then Alice projects onto $\{|0\rangle, |1\rangle\}$; finally
Bob measures in either the $|0 \pm 1\rangle$ basis (if Alice found $|0\rangle$) or the $\{|0\rangle, |1\rangle\}$
basis (if Alice found $|1\rangle$).
By choosing one of nine mutually orthogonal product states, Charlie
has sent two trits of classical information to Alice and Bob. But their
LOCC protocol, which fails to distinguish $|\psi_7\rangle$ from $|\psi_8\rangle$, has not been
able to recover all of the information in Charlie’s message. Of course, this
is just one possible protocol, but one can prove (we won’t here) that no
LOCC protocol can extract two trits of classical information. The trouble
is that with LOCC, Alice and Bob cannot fully “dissect” the square into
nonoverlapping rectangles. This is nonlocality without entanglement.
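The two facts that make this example work can be verified directly: the nine product states of eq. (4.129) are mutually orthogonal, yet the states on Alice's side alone are not. The sketch below represents each state as a pair (Alice's qutrit, Bob's qutrit) and computes the full Gram matrix; the helper names are illustrative assumptions.

```python
import math

S = 1 / math.sqrt(2)
E = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # qutrit basis |0>, |1>, |2>

def combo(j, k, sign):
    """The state (|j> + sign |k>) / sqrt(2)."""
    return [S * (E[j][n] + sign * E[k][n]) for n in range(3)]

# Each entry is (Alice's state, Bob's state), following eq. (4.129)
STATES = (
    [(E[0], combo(0, 1, s)) for s in (+1, -1)]     # psi_1, psi_2
    + [(combo(0, 1, s), E[2]) for s in (+1, -1)]   # psi_3, psi_4
    + [(E[2], combo(1, 2, s)) for s in (+1, -1)]   # psi_5, psi_6
    + [(combo(1, 2, s), E[0]) for s in (+1, -1)]   # psi_7, psi_8
    + [(E[1], E[1])]                               # psi_9
)

def dot(u, v):
    """Inner product of two real vectors."""
    return sum(x * y for x, y in zip(u, v))

def inner(p, q):
    """The inner product of two product states factorizes into Alice's and Bob's parts."""
    return dot(p[0], q[0]) * dot(p[1], q[1])

gram = [[inner(p, q) for q in STATES] for p in STATES]

# Alice's local states for psi_1 and psi_3 are |0> and (|0> + |1>)/sqrt(2)
alice_overlap_1_3 = dot(STATES[0][0], STATES[2][0])
```

The Gram matrix is the 9 × 9 identity, so a joint measurement distinguishes the states perfectly, while `alice_overlap_1_3` equals 1/√2, so Alice acting alone cannot tell those two of her possible states apart with certainty.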
Alice: You know, Bob, we really ought to help Charlie. Can you think
of a neat experiment that the three of us can do together?
Bob: Well, I dunno, Alice. There are a lot of experiments I’d like to do
with our entangled pairs of qubits. But in each experiment, there’s
one qubit for me and one for you. It looks like Charlie’s the odd
man out.
Alice: [Long pause] Bob . . . . Have you ever thought of doing an exper-
iment with three qubits?
Bob’s jaw drops and his pulse races. In a sudden epiphany, his whole
future career seems mapped out before him. Truth be told, Bob was
beginning to wonder if pairs of qubits were getting to be old hat. Now
he knows that for the next five years, he will devote himself slavishly to
performing the definitive three-qubit experiment. By that time, he, Alice,
and Charlie will have trained another brilliant student, and will be ready
for a crack at four qubits. Then another student, and another qubit. And
so on to retirement.
Here is the sort of three-qubit experiment that Alice and Bob decide
to try: Alice instructs her technician in her lab at Caltech to prepare
carefully a state of three quantum boxes. (But Alice doesn’t know exactly
how the technician does it.) She keeps one box for herself, and she ships
the other two by quantum express, one to Bob and one to Charlie. Each
box has a ball inside that can be either black or white, but the box is
sealed tight shut. The only way to find out what is inside is to open the
box, but there are two different ways to open it — the box has two doors,
clearly marked X and Y . When either door opens, a ball pops out whose
color can be observed. It isn’t possible to open both doors at once.
Alice, Bob, and Charlie decide to study how the boxes are correlated.
They conduct many carefully controlled trials. Each time, one of the
three, chosen randomly, opens door X, while the other two open door
Y. Lucky as ever, Alice, Bob, and Charlie make an astonishing discovery.
They find that every single time they open the boxes this way, the number
of black balls they find is always odd.
That is, Alice, Bob and Charlie find that when they open door X on
one box and door Y on the other two, the colors of the balls in the boxes
are guaranteed to be one of
$$0_A 0_B 1_C , \quad 0_A 1_B 0_C , \quad 1_A 0_B 0_C , \quad 1_A 1_B 1_C , \tag{4.132}$$
and never one of
$$1_A 1_B 0_C , \quad 1_A 0_B 1_C , \quad 0_A 1_B 1_C , \quad 0_A 0_B 0_C . \tag{4.133}$$
Even after all the acclaim showered upon the three-coin experiment,
Alice, Bob, and Charlie have never quite shaken their attachment to Ein-
stein locality. One day they are having a three-way conference call:
Alice: You know, guys, sometimes I just can’t decide whether to open
door X or door Y of my box. I know I have to choose carefully . . .
If I open door X, that’s sure to disturb the box; so I’ll never know
what would have happened if I had opened door Y instead. And
if I open door Y , I’ll never know what I would have found if I had
opened door X. It’s frustrating!
Bob: Alice, you’re so wrong! Our experiment shows that you can have
it both ways. Don’t you see? Let’s say that you want to know what
will happen when you open door X. Then just ask Charlie and me
to open door Y of our boxes and to tell you what we find. You’ll
know absolutely for sure, without a doubt, what’s going to happen
when you open door X. We’ve tested that over and over again, and
it always works. So why bother to open door X? You can go ahead
and open door Y instead, and see what you find. That way, you
really do know the result of opening both doors!
Charlie: But how can you be sure? If Alice opens door Y , she passes
up the opportunity to open door X. She can’t really ever have it
both ways. After she opens door Y , we can never check whether
opening door X would have given the result we expected.
Bob: Oh come on, how can it be otherwise? Look, you don’t really
believe that what you do to your box in Princeton and I do to mine
in Chicago can exert any influence on what Alice finds when she
opens her box in Pasadena, do you? When we open our boxes, we
can’t be changing anything in Alice’s box; we’re just finding the
information we need to predict with certainty what Alice is going
to find.
Charlie: Well, maybe we should do some more experiments to find out
if you’re right about that.
Indeed, the discovery of the three-box correlation has made Alice and
Bob even more famous than before, but Charlie hasn’t gotten the credit
he deserves — he still doesn’t have tenure. No wonder he wants to do
more experiments! He continues:
Like, maybe we should see what happens if we open the same door
on all three boxes. We could try opening three X doors.
Bob: Oh, come on! I’m tired of three boxes. We already know all about
three boxes. It’s time to move on, and I think Diane is ready to
help out. Let’s do four boxes!
Alice: No, I think Charlie’s right. We can’t really say that we know
everything there is to know about three boxes until we’ve experi-
mented with other ways of opening the doors.
Bob: Forget it. They’ll never fund us! After we’ve put all that effort
into opening two Y ’s and an X, now we’re going to say we want to
open three X’s? They’ll say we’ve done whiffnium and now we’re
proposing whaffnium . . . We’ll sound ridiculous!
Alice: Bob has a point. I think that the only way we can get funding
to do this experiment is if we can make a prediction about what
will happen. Then we can say that we’re doing the experiment to
test the prediction. Now, I’ve heard about some theorists named
Greenberger, Horne, Zeilinger, and Mermin (GHZM). They’ve been
thinking a lot about our three-box experiments; maybe they’ll be
able to suggest something.
Bob: Well, these boxes are my life, and they’re just a bunch of theorists.
I doubt that they’ll have anything interesting or useful to say. But
I suppose it doesn’t really matter whether their theory makes any
sense . . . If we can test it, then even I will accept that we have a
reason for doing another three-box experiment.
And so it happens that Alice, Bob, and Charlie make the pilgrimage
to see GHZM. And despite Bob’s deep skepticism, GHZM make a very
interesting suggestion indeed:
GHZM: Bob says that opening a box in Princeton and a box in Chicago
can’t possibly have any influence on what will happen when Alice
opens a box in Pasadena. Well, let’s suppose that he’s right. Now
you guys are going to do an experiment in which you all open your
X doors. No one can say what’s going to happen, but we can reason
this way: Let’s just assume that if you had opened three Y doors,
you would have found three white balls. Then we can use Bob’s
argument to see that if you open three X doors instead, you will
have to find three black balls. It goes like this: if Alice opens X,
Bob opens Y , and Charlie opens Y , then you know for certain that
the number of black balls has to be odd. So, if we know that Bob
and Charlie both would find white when they open door Y , then
Alice has to find black when she opens door X. Similarly, if Alice
and Charlie both would find white when they open Y , then Bob has
to find black when he opens X, and if Alice and Bob both would
find white when they open Y , then Charlie must find black when
he opens X. So we see that§
YA YB YC = 000 −→ XA XB XC = 111 . (4.134)
Don’t you agree?
Bob: Well, maybe that’s logical enough, but what good is it? We don’t
know what we’re going to find inside a box until we open it. You’ve
assumed that we know YA YB YC = 000, but we never know that
ahead of time.
GHZM: Sure, but wait. Yes, you’re right that we can’t know ahead of
time what we would find if we opened door Y on each box. But
there are only eight possibilities for three boxes, and we can easily
list them all. And for each of those eight possibilities for YA YB YC we
can use the same reasoning as before to infer the value of XA XB XC .
We obtain a table, like this:
YA YB YC = 000 −→ XA XB XC = 111
YA YB YC = 001 −→ XA XB XC = 001
YA YB YC = 010 −→ XA XB XC = 010
YA YB YC = 100 −→ XA XB XC = 100
YA YB YC = 011 −→ XA XB XC = 100
YA YB YC = 101 −→ XA XB XC = 010
YA YB YC = 110 −→ XA XB XC = 001
YA YB YC = 111 −→ XA XB XC = 111 (4.135)
§
Here 0 stands for white and 1 stands for black; YA is what Alice finds when she opens
door Y on her box, and so on.
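GHZM's table can be regenerated mechanically from the single rule that opening door X on one box and door Y on the other two always yields an odd number of black balls. In the sketch below (with 0 for white and 1 for black, as in the footnote), each X value is inferred from the two Y values on the other boxes; the function name `infer_x` is an illustrative assumption.

```python
from itertools import product

def infer_x(y_a, y_b, y_c):
    """Infer (X_A, X_B, X_C) from assumed Y values, with 0 = white and 1 = black.

    Opening X on one box and Y on the other two gives an odd number of black
    balls, so each X bit equals 1 XOR the two Y bits of the other boxes.
    """
    x_a = 1 ^ y_b ^ y_c
    x_b = 1 ^ y_a ^ y_c
    x_c = 1 ^ y_a ^ y_b
    return (x_a, x_b, x_c)

# One row per possible assignment of Y_A Y_B Y_C, as in eq. (4.135)
table = {ys: infer_x(*ys) for ys in product((0, 1), repeat=3)}

# Number of black balls predicted when all three X doors are opened
black_counts = {sum(xs) for xs in table.values()}
```

Every row of `table` reproduces the corresponding line of eq. (4.135), and `black_counts` contains only odd numbers: under this local reasoning, opening three X doors must always reveal one or three black balls.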
generously funded. Finally the long awaited day arrives when they are to
carry out the experiment for the first time. And when Alice, Bob, and
Charlie each open door X on their boxes, can you guess what they find?
Three white balls. Whaaaa??!!
Suspecting an error, Alice and Bob and Charlie repeat the experiment,
very carefully, over and over and over again. And in every trial, every
single time, they find an even number of black balls when they open door
X on all three boxes. Sometimes none, sometimes two, but never one
and never three. What they find, every single time, is just the opposite
of what GHZM had predicted would follow from the principle of Einstein
locality!
Desperation once again drives Alice, Bob, and Charlie into the library,
seeking enlightenment. After some study of a quantum mechanics text-
book, and a thorough interrogation of Alice’s lab technician, they realize
that their three boxes had been prepared in a GHZM quantum state
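$$|\psi\rangle_{ABC} = \frac{1}{\sqrt{2}} \left(|000\rangle_{ABC} + |111\rangle_{ABC}\right) , \tag{4.136}$$
a simultaneous eigenstate with eigenvalue one of the three observables
$$Z_A \otimes Z_B \otimes I_C , \quad I_A \otimes Z_B \otimes Z_C , \quad X_A \otimes X_B \otimes X_C . \tag{4.137}$$
And since $ZX = iY$, they realize that this state has the properties
$$\begin{aligned}
Y_A \otimes Y_B \otimes X_C &= -1 ,\\
X_A \otimes Y_B \otimes Y_C &= -1 ,\\
Y_A \otimes X_B \otimes Y_C &= -1 ,\\
X_A \otimes X_B \otimes X_C &= 1 .
\end{aligned} \tag{4.138}$$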
In opening the box through door X or door Y , Alice, Bob, and Charlie
are measuring the observable X or Y , where the outcome 1 signifies a
white ball, and the outcome −1 a black ball. Thus if the three qubit state
eq. (4.136) is prepared, eq. (4.138) says that an odd number of black balls
will be found if door Y is opened on two boxes and door X on the third,
while an even number of black balls will be found if door X is opened
on all three boxes. This behavior, unambiguously predicted by quantum
mechanics, is just what had seemed so baffling to Alice, Bob, and Charlie,
and to their fellow die-hard advocates of Einstein locality.
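The four properties in eq. (4.138) can be confirmed by brute force on the eight-component state vector. The sketch below applies a tensor product of single-qubit operators to the state of eq. (4.136) and computes the expectation value, which equals the eigenvalue because the state is an eigenstate. The helper names `apply` and `expectation` are illustrative assumptions.

```python
import math

X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]

def apply(ops, state):
    """Apply ops[0] tensor ops[1] tensor ops[2] to a 3-qubit state (index 4a + 2b + c)."""
    out = [0j] * 8
    for idx, amp in enumerate(state):
        old_bits = [(idx >> 2) & 1, (idx >> 1) & 1, idx & 1]
        for new in range(8):
            new_bits = [(new >> 2) & 1, (new >> 1) & 1, new & 1]
            coeff = amp
            for op, n, o in zip(ops, new_bits, old_bits):
                coeff *= op[n][o]      # matrix element <n|op|o> for this qubit
            out[new] += coeff
    return out

def expectation(ops, state):
    """Real part of <state| ops |state>."""
    image = apply(ops, state)
    return sum(s.conjugate() * t for s, t in zip(state, image)).real

S = 1 / math.sqrt(2)
ghz = [S, 0, 0, 0, 0, 0, 0, S]         # (|000> + |111>)/sqrt(2)

results = {
    "YYX": expectation([Y, Y, X], ghz),
    "XYY": expectation([X, Y, Y], ghz),
    "YXY": expectation([Y, X, Y], ghz),
    "XXX": expectation([X, X, X], ghz),
}
```

The three mixed products come out at −1 (an odd number of black balls) and the product of three X's at +1 (an even number), exactly the pattern that no assignment of predetermined values can reproduce.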
After much further study of the quantum mechanics textbook, Alice,
Bob, and Charlie gradually come to recognize the flaw in their reasoning.
They learn of Bohr’s principle of complementarity, of the irreconcilable
incompatibility of noncommuting observables. And they recognize that
ZZIII . . . I ,
IZZII . . . I ,
IIZZI . . . I ,
...
III . . . IZZ ,
XX . . . XX . (4.139)
• Each qubit is maximally entangled with the rest. That is, if we trace
over the other n − 1 qubits, the qubit’s density operator is $\rho = \frac{1}{2} I$.
For this reason, it is sometimes said that a cat state is a maximally
entangled state of n qubits.
• But this is a rather misleading locution. Because its parity and phase
bits are treated quite asymmetrically, the cat is not so profoundly
entangled as some other multiqubit states that we will encounter in
Chapter 7. For example, for the cat state with $x = 000\ldots0$, if we
trace over n − 2 qubits, the density operator of the remaining two
is
$$\rho_{\text{2-qubit}} = \frac{1}{2} \left(|00\rangle\langle 00| + |11\rangle\langle 11|\right) , \tag{4.141}$$
which has rank two rather than four. Correspondingly, we can ac-
quire a bit of information about a cat state (one of its parity bits)
by observing only two of the qubits in the state. Other multiqubit
states, which might be regarded as more highly entangled than cat
states, have the property that the density operator of two (or more)
qubits is proportional to the identity, if we trace over the rest.
• Suppose that Charlie prepares one of the $2^n$ possible cat states and
distributes it to n parties. Using LOCC, the parties can determine
all n − 1 parity bits of the state: each party measures Z and all
broadcast their results. But by measuring Z they destroy the phase
bit. Alternatively, they can all measure X to determine the phase
bit, but at the cost of destroying all the parity bits.
• Each party, by applying one of {I, X, Y , Z} can transform a given
cat state to any one of four other cat states; that is, the party
can modify the phase bit and one of the n − 1 parity bits. All n
parties, working together, can transform one cat state to any one of
the $2^n$ mutually orthogonal cat states; for example, one party can
manipulate the phase bit while each of the other n − 1 parties
controls a parity bit.
• If the parties unite, the phase bit and all parity bits can be simulta-
neously measured.
If the parties start out with a product state, the three-qubit cat state
(for example) can be prepared by executing the quantum circuit:
For the n-party case, a similar circuit with n − 1 CNOT gates does the
job. Thus, to prepare the state, it suffices for the first party to visit each
of the other n − 1 parties. By running the circuit in reverse, a cat state
can be transformed to a product state that can be measured locally.
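A short state-vector simulation illustrates the preparation. The sketch assumes the circuit consists of a Hadamard gate on the first qubit followed by n − 1 CNOT gates, each controlled by the first qubit and targeting one of the others; that gate sequence and the function names are assumptions for illustration, consistent with the count of n − 1 CNOT gates quoted above.

```python
import math

def hadamard(state, qubit, n):
    """Apply a Hadamard gate to `qubit` (0 is the leftmost) of an n-qubit state."""
    mask = 1 << (n - 1 - qubit)
    s = 1 / math.sqrt(2)
    out = [0.0] * len(state)
    for idx, amp in enumerate(state):
        if idx & mask:
            out[idx ^ mask] += s * amp   # |1> goes to (|0> - |1>)/sqrt(2)
            out[idx] -= s * amp
        else:
            out[idx] += s * amp          # |0> goes to (|0> + |1>)/sqrt(2)
            out[idx ^ mask] += s * amp
    return out

def cnot(state, control, target, n):
    """Flip `target` in every basis state where `control` is 1."""
    cmask = 1 << (n - 1 - control)
    tmask = 1 << (n - 1 - target)
    out = [0.0] * len(state)
    for idx, amp in enumerate(state):
        out[idx ^ tmask if idx & cmask else idx] += amp
    return out

def prepare_cat(n):
    """Start from |00...0> and return the n-qubit cat state (|00...0> + |11...1>)/sqrt(2)."""
    state = [0.0] * (2 ** n)
    state[0] = 1.0
    state = hadamard(state, 0, n)
    for target in range(1, n):           # n - 1 CNOT gates, all controlled by the first qubit
        state = cnot(state, 0, target, n)
    return state

cat3 = prepare_cat(3)
```

For n = 3 the only nonzero amplitudes of `cat3` are those of |000⟩ and |111⟩, each equal to 1/√2. Applying the same gates in reverse order returns the product state.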
CCC[f] = the minimum number of bits that must be broadcast (in the worst
case) for all the parties to know the value of $f(x_1, x_2, x_3, \ldots, x_n)$.
Here “in the worst case” means that we maximize the number of bits
of communication required over all possible values for the input strings
$x_1, x_2, x_3, \ldots, x_n$.
Except that they have been promised that the answer is either 0 or $2^{m-1}$;
therefore, their function has just a one-bit output.
First consider what strategy the parties should play if they share no
entanglement. Suppose that parties 2 through n broadcast their data,
and that the first party computes f and broadcasts the result. But note
that it is not necessary for the parties to broadcast all of their bits, since
some of the bits cannot affect the answer. Indeed, the k least significant
bits are irrelevant as long as
$(n - 1)(2^k - 1) < 2^{m-1}$ , (4.145)
m − k ≥ log2 (n − 1) + 1 ; (4.147)
4.9 Manipulating entanglement 59
including one more bit for the first party to broadcast the answer, we
conclude that
$\mathrm{CCC}[f] \le (n - 1)\left(\log_2 (n - 1) + 1\right) + 1$ . (4.148)
$|0\rangle \to |0\rangle$ ,
$|1\rangle \to e^{2\pi i (x_i/2^m)} |1\rangle$ . (4.150)
Thus the fu
Bob (in Boston) and Claire (in Chicago) share many identically
prepared copies of the two-qubit state
$|\psi\rangle = \sqrt{1 - 2x}\,|00\rangle + \sqrt{x}\,|01\rangle + \sqrt{x}\,|10\rangle$ , (4.154)
a) Express the basis {|ϕ⟩, |ϕ⊥⟩} in terms of the basis {|0⟩, |1⟩}.
Bob and Claire now wonder what will happen if they both measure
in the basis {|ϕ⟩, |ϕ⊥⟩}. Their friend Albert, a firm believer in
local realism, predicts that it is impossible for both to obtain the
outcome ϕ⊥ (a prediction known as Hardy’s theorem). Albert argues
as follows:
When both Bob and Claire measure in the basis {|ϕ⟩, |ϕ⊥⟩}, it
is reasonable to consider what might have happened if one or
the other had measured in the basis {|0⟩, |1⟩} instead.
So suppose that Bob and Claire both measure in the basis
{|ϕ⟩, |ϕ⊥⟩}, and that they both obtain the outcome ϕ⊥. Now if
Bob had measured in the basis {|0⟩, |1⟩} instead, we can be
certain that his outcome would have been 1, since experiment has
shown that if Bob had obtained 0 then Claire could not have
obtained ϕ⊥. Similarly, if Claire had measured in the basis
{|0⟩, |1⟩}, then she certainly would have obtained the outcome
1. We conclude that if Bob and Claire both measured in the
basis {|0⟩, |1⟩}, both would have obtained the outcome 1. But
this is a contradiction, for experiment has shown that it is not
possible for both Bob and Claire to obtain the outcome 1 if
they both measure in the basis {|0⟩, |1⟩}.
We are therefore forced to conclude that if Bob and Claire
both measure in the basis {|ϕ⟩, |ϕ⊥⟩}, it is impossible for both
to obtain the outcome ϕ⊥.
4.12 Exercises 61
$P_{++}(ab') = N^{-1} \sum_{i=1}^{N} x_i y'_i$ ,
$P_{++}(a'b') = N^{-1} \sum_{i=1}^{N} x'_i y'_i$ , (4.156)
where N is the total number of pairs tested. Here, e.g., $P_{++}(ab)$
is the probability that both detectors click when Alice and Bob
orient their detectors along a and b (including the effects of detector
inefficiency).
a) If $x, x', y, y' \in \{0, 1\}$, show that
$xy + xy' + x'y - x'y' \le x + y$ . (4.157)
b) Show that
$P_{++}(ab) + P_{++}(a'b) + P_{++}(ab') - P_{++}(a'b') \le P_{+\cdot}(a) + P_{\cdot+}(b)$ ;
(4.158)
here P+· (a) denotes the probability that Alice’s detector clicks
if oriented along a, and P·+ (b) denotes the probability that
Bob’s detector clicks if oriented along b.
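As a sanity check on inequality (4.157), and not a substitute for the argument the exercise asks for, the sixteen possible assignments can simply be enumerated. The function name `lhs` is a label chosen here for illustration.

```python
from itertools import product

def lhs(x, xp, y, yp):
    """Left-hand side of (4.157): xy + xy' + x'y - x'y'."""
    return x * y + x * yp + xp * y - xp * yp

# Collect every assignment of (x, x', y, y') in {0,1}^4 that violates the bound.
violations = [
    (x, xp, y, yp)
    for x, xp, y, yp in product((0, 1), repeat=4)
    if lhs(x, xp, y, yp) > x + y
]
print(violations)
```

An empty list means the bound holds case by case; averaging it over the N tested pairs is the step that part (b) then requires.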
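$|u\rangle = \begin{pmatrix} \cos\alpha \\ \sin\alpha \end{pmatrix}$ , $|v\rangle = \begin{pmatrix} \sin\alpha \\ \cos\alpha \end{pmatrix}$ , (4.167)
where 0 < α < π/4. Alice decides at random to send either |u⟩
or |v⟩ to Bob, and Bob is to make a measurement to determine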
what she sent. Since the two states are not orthogonal, Bob cannot
distinguish the states perfectly.
a) Bob realizes that he can’t expect to be able to identify Alice’s
qubit every time, so he settles for a procedure that is successful
only some of the time. He performs a POVM with three pos-
sible outcomes: ¬u, ¬v, or DON’T KNOW. If he obtains the
result ¬u, he is certain that |v⟩ was sent, and if he obtains ¬v,
he is certain that |u⟩ was sent. If the result is DON’T KNOW,
then his measurement is inconclusive. This POVM is defined
by the operators
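$|\psi\rangle = \begin{pmatrix} \cos\alpha \\ \sin\alpha \end{pmatrix}$ , or $|\tilde\psi\rangle = \begin{pmatrix} \sin\alpha \\ \cos\alpha \end{pmatrix}$ ,
(4.169)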
But now let’s suppose that Eve wants to eavesdrop on the state as it
travels from Alice to Bob. Like Bob, she wishes to extract optimal
information that distinguishes |ψ⟩ from |ψ̃⟩, and she also wants to
minimize the disturbance introduced by her eavesdropping, so that
Alice and Bob are not likely to notice that anything is amiss.
Eve realizes that the optimal POVM can be achieved by measure-
ment operators
where the vectors |φ₀⟩ and |φ₁⟩ are arbitrary. If Eve performs this
measurement, then Bob receives the state
The purpose of this exercise is to examine how effectively Eve can re-
duce the disturbance by choosing her measurement operators prop-
erly.
where
$A = \begin{pmatrix} 1 - 2\cos^2\alpha \sin^2\alpha & \cos\alpha \sin\alpha \\ \cos\alpha \sin\alpha & 2\cos^2\alpha \sin^2\alpha \end{pmatrix}$ ,
b) Show that if |φ₀⟩ and |φ₁⟩ are chosen optimally, the minimal
disturbance that can be attained is
$D_{\min}(\cos^2\theta) = \frac{1}{2}\left(1 - \sqrt{1 - \cos^2\theta + \cos^4\theta}\right)$ . (4.179)
[Hint: We can choose |φ₀⟩ and |φ₁⟩ to maximize the two terms
in eq. (4.177) independently. The maximal value is the maximal
eigenvalue of A, which since the eigenvalues sum to 1, can
be expressed as $\lambda_{\max} = \frac{1}{2}\left(1 + \sqrt{1 - 4 \cdot \det A}\right)$.] Of course,
Eve could reduce the disturbance further were she willing to
settle for a less than optimal probability of guessing Alice’s
state correctly.
c) Sketch a plot of the function $D_{\min}(\cos^2\theta)$. Interpret its value for
cos θ = 1 and cos θ = 0. For what value of θ is $D_{\min}$ largest?
Find $D_{\min}$ and $(p_{\rm error})_{\rm optimal}$ for this value of θ.
σ3 ⊗ σ3 ⊗ I ⊗ I ⊗ · · · ⊗ I ⊗ I ⊗ I
I ⊗ σ3 ⊗ σ3 ⊗ I ⊗ · · · ⊗ I ⊗ I ⊗ I
...
I ⊗ I ⊗ I ⊗ I ⊗ · · · ⊗ I ⊗ σ3 ⊗ σ3
σ1 ⊗ σ1 ⊗ σ1 ⊗ · · · ⊗ σ1 ⊗ σ1 ⊗ σ1 (4.182)
$\rho_{AB}^{PT} \equiv (I_A \otimes T_B)(\rho_{AB})$ (4.186)
We saw in class that the partial transpose of the Werner state ρ(λ)
is negative for λ > 1/3; therefore, by the Peres-Horodecki criterion,
the Werner state is inseparable for λ > 1/3.
Show that
$(|\Phi\rangle\langle\Phi|)^{PT} = \frac{1}{d}\left(I - 2E_{\rm antisym}\right)$ , (4.190)
where $E_{\rm antisym}$ is the projector onto the space that is antisymmetric
under interchange of the two systems A and B.
b) For what values of λ does the state ρΦ (λ) have a negative partial
transpose?
c) If the Werner state for two qubits is chosen to be
$\rho(\lambda) = \lambda |\psi^-\rangle\langle\psi^-| + \frac{1}{4}(1 - \lambda) I$ , (4.191)
then another natural way to generalize the Werner state to a
pair of d-dimensional systems is to consider
$\rho_{\rm anti}(\lambda) = \lambda \, \frac{1}{\frac{1}{2} d(d - 1)} E_{\rm antisym} + \frac{1}{d^2}(1 - \lambda) I$ . (4.192)
John Preskill
California Institute of Technology
5
Classical and quantum circuits
the logical OR (∨) of all the $f^{(a)}$’s. In binary arithmetic the ∨ operation
of two bits may be represented
x ∨ y = x + y − x · y; (5.5)
it has the value 0 if x and y are both zero, and the value 1 otherwise.
Now consider the evaluation of $f^{(a)}$. We express the n-bit string x as
it is the logical AND (∧) of all n bits. In binary arithmetic, the AND is
the product
x ∧ y = x · y. (5.8)
For any other $x^{(a)}$, $f^{(a)}$ is again obtained as the AND of n bits, but where
the NOT (¬) operation is first applied to each $x_i$ such that $x_i^{(a)} = 0$; for
example
$f^{(a)}(x) = \ldots (\neg x_3) \wedge x_2 \wedge x_1 \wedge (\neg x_0)$ (5.9)
if
$x^{(a)} = \ldots 0110$. (5.10)
The NOT operation is represented in binary arithmetic as
¬x = 1 − x. (5.11)
We have now constructed the function f (x) from three elementary log-
ical connectives: NOT, AND, OR. The expression we obtained is called
the “disjunctive normal form” (DNF) of f (x). We have also implicitly
used another operation INPUT(xi ), which inputs the ith bit of x.
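The DNF construction can be sketched in a few lines of Python, using the arithmetic forms (5.5), (5.8), and (5.11) for OR, AND, and NOT. This is only an illustration: the helper `dnf` and the test function `majority` are names introduced here, and the accepted strings are found by brute-force enumeration of the truth table.

```python
from itertools import product

def NOT(x):
    return 1 - x            # eq. (5.11)

def AND(x, y):
    return x * y            # eq. (5.8)

def OR(x, y):
    return x + y - x * y    # eq. (5.5)

def dnf(f, n):
    """Return an evaluator for f built only from NOT, AND, OR."""
    # The accepted strings x^(a) are the inputs with f = 1.
    accepted = [a for a in product((0, 1), repeat=n) if f(a) == 1]

    def evaluate(x):
        result = 0
        for a in accepted:
            # f^(a)(x): AND of all bits, with NOT applied where a_i = 0.
            term = 1
            for xi, ai in zip(x, a):
                term = AND(term, xi if ai == 1 else NOT(xi))
            result = OR(result, term)   # OR over all accepted strings
        return result

    return evaluate

def majority(bits):
    return 1 if sum(bits) >= 2 else 0

majority_dnf = dnf(majority, 3)
print([majority_dnf(x) for x in product((0, 1), repeat=3)])
```

The evaluator loops over one AND term per accepted string, which is why the gate count grows like the number of accepted strings times n.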
5.1 Classical Circuits 5
Our DNF construction shows that any Boolean function with an n-bit
input can be evaluated using no more than $2^n$ OR gates, $n2^n$ AND gates,
$n2^n$ NOT gates, and $n2^n$ INPUT gates, a total of $(3n + 1)2^n$ gates. Of
course, some functions can be computed using much smaller circuits, but
for most Boolean functions the smallest circuit that evaluates the function
really does have an exponentially large (in n) number of gates. The point
is that if the circuit size (i.e., number of gates) is subexponential in n,
then there are many, many more functions than circuits.
How many circuits are there with G gates acting on an n-bit input?
Consider the gate set from which we constructed the DNF, where we will
also allow inputting of a constant bit (either 0 or 1) in case we want to
use some scratch space when we compute. Then there are n + 5 different
gates: NOT, AND, OR, INPUT(0), INPUT(1), and INPUT(xi ) for i =
0, 1, 2, . . . n − 1. Each two-bit gate acts on a pair of bits which are
outputs from preceding gates; this pair can be chosen in fewer than $G^2$
ways. Therefore the total number of size-G circuits can be bounded as
$N_{\rm circuit}(G) \le \left((n + 5)G^2\right)^G$ . (5.12)
If $G = c\,\frac{2^n}{2n}$, where c is a constant independent of n, then
where the second inequality holds for n sufficiently large. Comparing with
the number of Boolean functions, $N_{\rm function}(n) = 2^{2^n}$, we find
$\log_2 \left( \frac{N_{\rm circuit}(G)}{N_{\rm function}(n)} \right) \le (c - 1)2^n$ (5.14)
for n sufficiently large. Therefore, for any c < 1, the number of circuits
is smaller than the number of functions by a huge factor. We did this
analysis for one particular universal gate set, but the counting would
not have been substantially different if we had used a different gate set
instead.
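To get a feel for the numbers, the counting bound can be evaluated directly. The sketch below works with base-2 logarithms to avoid astronomically large integers; the function names are chosen here for illustration, and G is treated as a real number rather than rounded to an integer.

```python
import math

def log2_circuit_bound(n, G):
    """log2 of the bound ((n + 5) G^2)^G from eq. (5.12)."""
    return G * (math.log2(n + 5) + 2 * math.log2(G))

def log2_function_count(n):
    """log2 of the number 2^(2^n) of Boolean functions on n bits."""
    return 2 ** n

def log2_ratio(n, c):
    """log2(N_circuit / N_function) for G = c * 2^n / (2n)."""
    G = c * 2 ** n / (2 * n)
    return log2_circuit_bound(n, G) - log2_function_count(n)

# With c = 1/2 the circuits are outnumbered by a doubly exponential factor.
for n in (10, 20, 30):
    print(n, log2_ratio(n, 0.5), (0.5 - 1) * 2 ** n)
```

For each n printed, the first number should lie below the second, which is the content of eq. (5.14).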
We conclude that, for any positive ε, most Boolean functions require
circuits with at least $(1 - \varepsilon)\frac{2^n}{2n}$ gates. The circuit size is so large
because most functions have no structure that can be exploited to con-
struct a more compact circuit. We can’t do much better than consulting
a “look-up table” that stores a list of all accepted strings, which is essen-
tially what the DNF does.
This concept that a problem may be hard to solve, but that a solution
can be easily verified once found, can be formalized. The complexity class
of decision problems for which the answer can be checked efficiently, called
NP, is defined as
Definition. NP. A language L is in NP iff there is a polynomial-size
verifier V (x, y) such that
If x ∈ L, then there exists y such that V (x, y) = 1 (completeness),
If x ∉ L, then, for all y, V (x, y) = 0 (soundness).
The verifier V is the uniform circuit family that checks the answer. Com-
pleteness means that for each input in the language (for which the answer
is YES), there is an appropriate “witness” such that the verifier accepts
the input if that witness is provided. Soundness means that for each input
not in the language (for which the answer is NO) the verifier rejects the
input no matter what witness is provided. It is implicit that the witness
is of polynomial length, |y| = poly(|x|); since the verifier has a polynomial
number of gates, including input gates, it cannot make use of more than
a polynomial number of bits of the witness. NP stands for “nondeter-
ministic polynomial time;” this name is used for historical reasons, but
it is a bit confusing since the verifier is actually a deterministic circuit
(evaluates a function).
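As an illustration of a verifier, here is a minimal sketch for the Hamiltonian path problem. It is written as an ordinary function rather than as a circuit family, and the encoding is an assumption made for this example: the input x is a vertex count together with an undirected edge list, and the witness y is a proposed ordering of the vertices.

```python
def verify_hamiltonian_path(graph, witness):
    """Return 1 if witness is a Hamiltonian path in graph, else 0.

    graph   = (n, edges), with vertices labeled 0..n-1 (assumed encoding)
    witness = a list of vertices, the proposed path
    """
    n, edges = graph
    edge_set = {frozenset(e) for e in edges}
    # The path must visit every vertex exactly once.
    if sorted(witness) != list(range(n)):
        return 0
    # Consecutive vertices on the path must be joined by an edge.
    for u, v in zip(witness, witness[1:]):
        if frozenset((u, v)) not in edge_set:
            return 0
    return 1

example = (4, [(0, 1), (1, 2), (2, 3), (0, 2)])
print(verify_hamiltonian_path(example, [0, 1, 2, 3]))
print(verify_hamiltonian_path(example, [0, 2, 1, 3]))
```

The check runs in time polynomial in the size of the graph, while no comparably fast method is known for finding a valid witness in the first place.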
It is obvious that P ⊆ NP; if the problem is in P then the polynomial-time
verifier can decide whether to accept x on its own, without any
help from the witness. But some problems in NP seem to be hard, and
are believed not to be in P. Much of complexity theory is built on a
fundamental conjecture:
For NP the witness y testifies that x is in the language while for co-NP
the witness testifies that x is not in the language. Thus if language L is
in NP, then its complement L̄ is in co-NP and vice-versa. We see that
whether we consider a problem to be in NP or in co-NP depends on how
we choose to frame the question — while “Is there a Hamiltonian path?”
is in NP, the complementary question “Is there no Hamiltonian path?” is
in co-NP.
Though the distinction between NP and co-NP may seem arbitrary, it
is nevertheless interesting to ask whether a problem is in both NP and
co-NP. If so, then we can easily verify the answer (once a suitable witness
is in hand) regardless of whether the answer is YES or NO. It is believed
(though not proved) that NP ≠ co-NP. For example, we can show that
a graph has a Hamiltonian path by exhibiting an example, but we don’t
know how to show that it has no Hamiltonian path that way!
If we assume that P ≠ NP, it is known that there exist problems in
NP of intermediate difficulty (the class NPI), which are neither in P nor
NP-complete. Furthermore, assuming that NP ≠ co-NP, it is known
that no co-NP problems are NP-complete. Therefore, problems in the
intersection of NP and co-NP, if not in P, are good candidates for inclusion
in NPI.
In fact, a problem in NP ∩ co-NP believed not to be in P is the FAC-
TORING problem. As already noted, FACTORING is in NP because,
if we are offered a factor of x, we can easily check its validity. But it is
also in co-NP, because it is known that if we are given a prime number
we can efficiently verify its primality. Thus, if someone tells us the prime
factors of x, we can efficiently check that the prime factorization is right,
and can exclude that any integer less than y is a divisor of x. Therefore,
it seems likely that FACTORING is in NPI.
We are led to a crude (conjectured) picture of the structure of NP ∪ co-
NP. NP and co-NP do not coincide, but they have a nontrivial intersection.
P lies in NP ∩ co-NP but the intersection also contains problems not in
P (like FACTORING). No NP-complete or co-NP-complete problems lie
in NP ∩ co-NP.
$N \ge \frac{1}{2\delta^2} \ln (1/\varepsilon)$ . (5.25)
5.2 Reversible computation 13
dissipative process. His insight is that erasure always involves the com-
pression of phase space, and so is thermodynamically, as well as logically,
irreversible.
For example, I can store one bit of information by placing a single
molecule in a box, either on the left side or the right side of a partition that
divides the box. Erasure means that we move the molecule to the right
side (say) irrespective of whether it started out on the left or right. I can
suddenly remove the partition, and then slowly compress the one-molecule
“gas” with a piston until the molecule is definitely on the right side. This
procedure changes the entropy of the gas by ∆S = −k ln 2 (where k is
Boltzmann’s constant) and there is an associated flow of heat from the
box to its environment. If the process is quasi-static and isothermal at
temperature T , then work W = −T ∆S = kT ln 2 is performed on the
box, work that I have to provide. If I erase information, someone has to
pay the power bill.
Landauer also observed that, because irreversible logic elements erase
information, they too are necessarily dissipative, and therefore require an
unavoidable expenditure of energy. For example, an AND gate maps two
input bits to one output bit, with 00, 01, and 10 all mapped to 0, while
11 is mapped to one. If the input is destroyed and we can read only the
output, then if the output is 0 we don’t know for sure what the input was
— there are three possibilities. If the input bits are chosen uniformly at
random, then on average the AND gate destroys (3/4) log2 3 ≈ 1.189 bits of
information. Indeed, if the input bits are uniformly random any 2-to-1
gate must “erase” at least one bit on average. According to Landauer’s
principle, then, we need to do an amount of work at least W = kT ln 2 to
operate a 2-to-1 logic gate at temperature T .
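The figure quoted for the AND gate is the average uncertainty about the input once only the output is known. A short sketch under that reading (uniformly random inputs, with `erased_bits` a name chosen here) reproduces it and checks the claim about general 2-to-1 gates by enumerating all sixteen of them.

```python
import math
from collections import Counter
from itertools import product

def erased_bits(gate, n_inputs):
    """Average input uncertainty (in bits) given the output, for uniform inputs."""
    inputs = list(product((0, 1), repeat=n_inputs))
    preimage_sizes = Counter(gate(*x) for x in inputs)
    total = len(inputs)
    # An output with m preimages occurs with probability m/total
    # and leaves log2(m) bits of uncertainty about the input.
    return sum((m / total) * math.log2(m) for m in preimage_sizes.values())

and_loss = erased_bits(lambda a, b: a & b, 2)
xor_loss = erased_bits(lambda a, b: a ^ b, 2)

# Every 2-to-1 gate is a truth table with four entries; find the smallest loss.
min_loss = min(
    erased_bits(lambda a, b, t=table: t[2 * a + b], 2)
    for table in product((0, 1), repeat=4)
)
print(and_loss, xor_loss, min_loss)
```

Balanced gates such as XOR attain the minimum of one bit, while AND, with its three-to-one preimage for output 0, erases somewhat more.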
But if a computer operates reversibly, then in principle there need be
no dissipation and no power requirement. We can compute for free! At
present this idea is not of great practical importance, because the power
consumed in today’s integrated circuits exceeds kT per logic gate by at
least three orders of magnitude. As the switches on chips continue to
get smaller, though, reversible computing might eventually be invoked to
reduce dissipation in classical computing hardware.
Here y is a bit and ⊕ denotes the XOR gate (addition mod 2) — the
n-bit input x is preserved and the last bit flips iff f (x) = 1. Applying f̃
a second time undoes this bit flip; hence f̃ is invertible, and equal to its
own inverse. If we set y = 0 initially and apply f̃, we can read out the
value of f (x) in the last output bit.
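A minimal sketch of this embedding, assuming the map described above acts as (x, y) → (x, y ⊕ f(x)); the wrapper name `reversible_embedding` and the example function are illustrative choices.

```python
from itertools import product

def reversible_embedding(f):
    """Wrap an n-bit-to-1-bit function f as the invertible map (x, y) -> (x, y XOR f(x))."""
    def f_tilde(x, y):
        return x, y ^ f(x)
    return f_tilde

def all_ones(x):
    """Example irreversible function: the AND of all input bits."""
    return int(all(x))

f_tilde = reversible_embedding(all_ones)

for x in product((0, 1), repeat=3):
    # Setting y = 0 reads out f(x) in the last bit.
    assert f_tilde(x, 0) == (x, all_ones(x))
    # Applying the map twice restores the original (x, y).
    for y in (0, 1):
        assert f_tilde(*f_tilde(x, y)) == (x, y)
print("f_tilde is its own inverse on all 16 inputs")
```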
Just as for Boolean functions, we can ask whether a complicated re-
versible computation can be executed by a circuit built from simple com-
ponents — are there universal reversible gates? It is easy to see that
one-bit and two-bit reversible gates do not suffice; we will need three-bit
gates for universal reversible computation.
Of the four 1-bit → 1-bit gates, two are reversible; the trivial gate and
the NOT gate. Of the $(2^4)^2 = 256$ possible 2-bit → 2-bit gates, 4! = 24
are reversible. One of special interest is the controlled-NOT (CNOT) or
reversible XOR gate that we already encountered in Chapter 4:
x ──●── x
y ──⊕── x ⊕ y
This gate flips the second bit if the first is 1, and does nothing if the first
bit is 0 (hence the name controlled-NOT). Its square is trivial; hence it
inverts itself. Anticipating the notation that will be convenient for our
discussion of quantum gates, we will sometimes use Λ(X) to denote the
CNOT gate. More generally, by Λ(G) we mean a gate that applies the
operation G to a “target” conditioned on the value of a “control bit;” G
is applied if the control bit is 1 and the identity is applied if the control
bit is 0. In the case of the CNOT gate, G is the Pauli operator X, a bit
flip.
The CNOT gate performs a NOT on the second bit if the first bit x is
set to 1, and it performs the copy operation if y is initially set to zero:
x ──●──⊕──●── y
y ──⊕──●──⊕── x
With these swaps we can shuffle bits around in a circuit, bringing them
together if we want to act on them with a “local gate” at a fixed location.
To see that the one-bit and two-bit gates are nonuniversal, we observe
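that all these gates are linear. Each reversible two-bit gate has an action
of the form
$\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} x' \\ y' \end{pmatrix} = M \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} a \\ b \end{pmatrix}$ ; (5.31)
the pair of bits $(a, b)$ can take any one of four possible values, and the
matrix M is one of the six invertible matrices with binary entries
$M = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$ . (5.32)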
(All addition is performed modulo 2.) Combining the six choices for M
with the four possible constants, we obtain 24 distinct gates, exhausting
all the reversible 2 → 2 gates.
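This count is small enough to confirm by exhaustive enumeration. The sketch below (with names chosen here for illustration) lists every map of the form (5.31) with M invertible mod 2 and compares the resulting truth tables with all 4! permutations of the four two-bit strings.

```python
from itertools import permutations, product

INPUTS = list(product((0, 1), repeat=2))   # (0,0), (0,1), (1,0), (1,1)

def affine_gate(M, c):
    """Truth table of (x, y) -> M (x, y) + c over GF(2), listed in INPUTS order."""
    (m00, m01), (m10, m11) = M
    a, b = c
    return tuple(
        ((m00 * x + m01 * y + a) % 2, (m10 * x + m11 * y + b) % 2)
        for x, y in INPUTS
    )

# All 2x2 binary matrices whose determinant is 1 mod 2.
invertible = [
    M for M in product(INPUTS, repeat=2)
    if (M[0][0] * M[1][1] - M[0][1] * M[1][0]) % 2 == 1
]

affine_tables = {affine_gate(M, c) for M in invertible for c in INPUTS}
reversible_tables = set(permutations(INPUTS))

print(len(invertible), len(affine_tables), affine_tables == reversible_tables)
```

Agreement of the two sets confirms that no reversible two-bit gate falls outside the linear family.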
Since the linear transformations are closed under composition, any cir-
cuit composed from reversible 2 → 2 (and 1 → 1) gates will compute a
linear function
$x \mapsto M x + a$. (5.33)
But for n ≥ 3, there are invertible functions on n bits that are nonlinear.
An important example is the 3-bit Toffoli gate (or controlled-controlled-
NOT) $\Lambda^2(X)$
$\Lambda^2(X) : (x, y, z) \to (x, y, z \oplus xy)$; (5.34)
x ──●── x
y ──●── y
z ──⊕── z ⊕ xy
it flips the third bit if the first two are 1 and does nothing otherwise, thus
invoking the (nonlinear) multiplication of the two bits x and y. The $\Lambda^2(\cdot)$
notation indicates that the operation acting on the target bit is triggered
only if both control bits are set to 1. Like the CNOT gate Λ(X), $\Lambda^2(X)$
is its own inverse.
Unlike the reversible 2-bit gates, the Toffoli gate serves as a universal
gate for Boolean logic, if we can provide constant input bits and ignore
output bits. If we fix x = y = 1, then the Toffoli gate performs NOT
acting on the third bit, and if z is set to zero initially, then the Toffoli
gate outputs z = x ∧ y in the third bit. Since NOT/AND/OR are a
universal gate set, and we can construct OR from NOT and AND (x∨y =
¬(¬x ∧ ¬y)), this is already enough to establish that the Toffoli gate is
universal. Note also that if we fix x = 1 the Toffoli gate functions like a
CNOT gate acting on y and z; we can use it to copy.
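These reductions are easy to check explicitly. In the sketch below the Toffoli gate is the only primitive; NOT, AND, OR, and COPY are obtained by fixing some inputs to constants and ignoring some outputs, as described above. The helper names are chosen here for illustration.

```python
def toffoli(x, y, z):
    """The Toffoli gate: (x, y, z) -> (x, y, z XOR xy)."""
    return x, y, z ^ (x & y)

def NOT(a):
    return toffoli(1, 1, a)[2]        # fix x = y = 1

def AND(a, b):
    return toffoli(a, b, 0)[2]        # set z = 0 initially

def OR(a, b):
    return NOT(AND(NOT(a), NOT(b)))   # x OR y = NOT(NOT x AND NOT y)

def COPY(a):
    return toffoli(1, a, 0)[1:]       # fix x = 1: acts as a CNOT on (y, z)

for a in (0, 1):
    assert NOT(a) == 1 - a
    assert COPY(a) == (a, a)
    for b in (0, 1):
        assert AND(a, b) == (a & b)
        assert OR(a, b) == (a | b)
print("NOT, AND, OR and COPY all recovered from the Toffoli gate")
```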
The Toffoli gate $\Lambda^2(X)$ is also universal in the sense that we can build a
circuit to compute any reversible function using Toffoli gates alone (if we
can fix input bits and ignore output bits). It will be instructive to show
this directly, without relying on our earlier argument that NOT/AND/OR
is universal for Boolean functions. Specifically, we can show the following:
From the NOT gate and the Toffoli gate $\Lambda^2(X)$, we can construct any
invertible function on n bits, provided we have one extra bit of scratchpad
space available.
The first step is to show that from the three-bit Toffoli gate $\Lambda^2(X)$ we
can construct an n-bit Toffoli gate $\Lambda^{n-1}(X)$ that acts as
using one extra bit of scratch space. For example, we construct $\Lambda^3(X)$
from $\Lambda^2(X)$’s with the circuit
x1 ──●───────●── x1
x2 ──●───────●── x2
0  ──⊕───●───⊕── 0
x3 ──────●────── x3
y  ──────⊕────── y ⊕ x1 x2 x3
The purpose of the last $\Lambda^2(X)$ gate is to reset the scratch bit back to
its original value 0. Actually, with one more gate we can obtain an
implementation of $\Lambda^3(X)$ that works irrespective of the initial value of the
scratch bit:
x1 ──●───────●────── x1
x2 ──●───────●────── x2
w  ──⊕───●───⊕───●── w
x3 ──────●───────●── x3
y  ──────⊕───────⊕── y ⊕ x1 x2 x3
We can see that the scratch bit really is necessary, because $\Lambda^3(X)$
is an odd permutation (in fact a transposition) of the 4-bit strings:
it transposes 1111 and 1110. But $\Lambda^2(X)$ acting on any three of the
four bits is an even permutation; e.g., acting on the last three bits it
transposes both 0111 with 0110 and 1111 with 1110. Since a product of
even permutations is also even, we cannot obtain $\Lambda^3(X)$ as a product of
$\Lambda^2(X)$’s that act only on the four bits.
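Both claims can be checked by brute force. The sketch below simulates the four-Toffoli circuit on all 32 inputs, including both values of the scratch bit w, and then computes the parity of the relevant permutations of 4-bit strings. The wire ordering, the bit-numbering convention, and all function names are choices made for this illustration.

```python
from itertools import product

def apply_toffoli(bits, c1, c2, t):
    """Flip bits[t] if bits[c1] and bits[c2] are both 1."""
    out = list(bits)
    out[t] ^= out[c1] & out[c2]
    return out

def lambda3_circuit(x1, x2, w, x3, y):
    """Four Toffoli gates on wires (x1, x2, w, x3, y); w is the scratch bit."""
    bits = [x1, x2, w, x3, y]
    for c1, c2, t in [(0, 1, 2), (2, 3, 4), (0, 1, 2), (2, 3, 4)]:
        bits = apply_toffoli(bits, c1, c2, t)
    return tuple(bits)

def controlled_not_perm(n_bits, controls, target):
    """Permutation of n-bit strings (as integers) induced by a multiply-controlled NOT."""
    perm = []
    for s in range(2 ** n_bits):
        fire = all((s >> c) & 1 for c in controls)
        perm.append(s ^ (1 << target) if fire else s)
    return perm

def permutation_parity(perm):
    """Return 0 for an even permutation and 1 for an odd one."""
    seen, swaps = set(), 0
    for start in range(len(perm)):
        length, j = 0, start
        while j not in seen:
            seen.add(j)
            j = perm[j]
            length += 1
        swaps += max(length - 1, 0)   # a cycle of length L is L - 1 transpositions
    return swaps % 2

circuit_ok = all(
    lambda3_circuit(x1, x2, w, x3, y) == (x1, x2, w, x3, y ^ (x1 & x2 & x3))
    for x1, x2, w, x3, y in product((0, 1), repeat=5)
)
lambda3_parity = permutation_parity(controlled_not_perm(4, (1, 2, 3), 0))
toffoli_parity = permutation_parity(controlled_not_perm(4, (1, 2), 0))
print(circuit_ok, lambda3_parity, toffoli_parity)
```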
This construction of $\Lambda^3(X)$ from four $\Lambda^2(X)$’s generalizes immediately
to the construction of $\Lambda^{n-1}(X)$ from two $\Lambda^{n-2}(X)$’s and two $\Lambda^2(X)$’s
(just expand x1 to several control bits in the above diagram). Iterating
the construction, we obtain $\Lambda^{n-1}(X)$ from a circuit with $2^{n-2} + 2^{n-3} - 2$
$\Lambda^2(X)$’s. Furthermore, just one bit of scratch suffices. (With more scratch
space, we can build $\Lambda^{n-1}(X)$ from $\Lambda^2(X)$’s much more efficiently; see
Exercise 5.1.)
The next step is to note that, by conjugating $\Lambda^{n-1}(X)$ with NOT gates,
we can in effect modify the value of the control string that “triggers” the
gate. For example, the circuit
x1 ──NOT──●──NOT──
x2 ───────●───────
x3 ──NOT──●──NOT──
y  ───────⊕───────
a0 , a1 , a2 , a3 , . . . , as , (5.36)
such that each string in the sequence is Hamming distance one from its
neighbors. Therefore, each of the transpositions
from step k − 1.
To save space in our simulation we want to minimize at all times the
number of steps that have already been computed but have not yet been
uncomputed. The challenge we face can be likened to a game — the
reversible pebble game. The steps to be executed form a one-dimension
directed graph with sites labeled 1, 2, 3, . . . , T . Execution of step k is
modeled by placing a pebble on the kth site of the graph, and executing
step k in reverse is modeled as removal of a pebble from site k. At the
beginning of the game, no sites are covered by pebbles, and in each turn
we add or remove a pebble. But we cannot place a pebble at site k (except
for k = 1) unless site k − 1 is already covered by a pebble, and we cannot
remove a pebble from site k (except for k = 1) unless site k − 1 is covered.
The object is to cover site T (complete the computation) without using
more pebbles than necessary (generating a minimal amount of garbage).
We can construct a recursive procedure that enables us to reach site
$t = 2^n$ using n + 1 pebbles and leaving only one pebble in play. Let $F_1(k)$
denote placing a pebble at site k, and $F_1(k)^{-1}$ denote removing a pebble
from site k. Then
$F_2(1, 2) = F_1(1)F_1(2)F_1(1)^{-1}$ , (5.39)
leaves a pebble at site k = 2, using a maximum of two pebbles at intermediate
stages. Similarly
$F_3(1, 4) = F_2(1, 2)F_2(3, 4)F_2(1, 2)^{-1}$ , (5.40)
reaches site k = 4 using three pebbles, and
$F_4(1, 8) = F_3(1, 4)F_3(5, 8)F_3(1, 4)^{-1}$ , (5.41)
reaches k = 8 using four pebbles. Proceeding this way we construct
$F_{n+1}(1, 2^n)$, which uses a maximum of n + 1 pebbles and leaves a single
pebble in play.
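The recursion can be played out directly. In this sketch a move is a pair (+1, k) for placing a pebble at site k or (−1, k) for removing it; `pebble_moves(n)` generates the move sequence reaching site 2^n, and `play` checks the rules of the game while tracking the peak number of pebbles. Both function names and the move encoding are choices made here.

```python
def pebble_moves(n, start=1):
    """Moves that leave one pebble on site start + 2**n - 1.

    Assumes site start - 1 is covered (or start == 1) throughout.
    """
    if n == 0:
        return [(+1, start)]
    half = 2 ** (n - 1)
    first = pebble_moves(n - 1, start)             # reach the midpoint
    second = pebble_moves(n - 1, start + half)     # reach the endpoint
    undo = [(-sign, k) for sign, k in reversed(first)]   # clear the midpoint
    return first + second + undo

def play(moves):
    """Run the moves, enforcing the rules; return (covered sites, peak pebble count)."""
    covered, peak = set(), 0
    for sign, k in moves:
        # Adding or removing at site k requires site k - 1 to be covered (except k = 1).
        assert k == 1 or (k - 1) in covered, "illegal move"
        if sign > 0:
            covered.add(k)
        else:
            covered.remove(k)
        peak = max(peak, len(covered))
    return covered, peak

for n in range(1, 6):
    moves = pebble_moves(n)
    covered, peak = play(moves)
    print(n, len(moves), peak, sorted(covered))
```

Each level of recursion triples the number of moves while adding one pebble to the peak, which is the trade-off between time and space quantified next.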
Interpreted as a routine for simulating $T_{\rm irr} = 2^n$ steps of an irreversible
computation, this strategy for playing the pebble game represents a reversible
simulation requiring space $S_{\rm rev}$ scaling like
$S_{\rm rev} \approx S_{\rm step} \log_2 (T_{\rm irr}/T_{\rm step})$ , (5.42)
where $T_{\rm step}$ is the number of gates in a single step, and $S_{\rm step}$ is the amount
of memory used in a single step. How long does the simulation take?
At each level of the recursive procedure described above, two steps forward
are replaced by two steps forward and one step back. Therefore,
an irreversible computation with $T_{\rm irr}/T_{\rm step} = 2^n$ steps is simulated in
$T_{\rm rev}/T_{\rm step} = 3^n$ steps, or
$T_{\rm rev} = T_{\rm step} (T_{\rm irr}/T_{\rm step})^{\log 3/\log 2} \approx T_{\rm step} (T_{\rm irr}/T_{\rm step})^{1.58}$ , (5.43)
for any ε > 0. Instead of replacing two steps forward with two forward
and one back, we replace $\ell$ forward with $\ell$ forward and $\ell - 1$ back. A
recursive procedure with n levels reaches site $\ell^n$ using a maximum of
$n(\ell - 1) + 1$ pebbles. Now we have $T_{\rm irr} \propto \ell^n$ and $T_{\rm rev} \propto (2\ell - 1)^n$, so that
$S_{\rm rev}/S_{\rm step} \approx \ell n \approx 2^{1/\varepsilon} \log_\ell (T_{\rm irr}/T_{\rm step}) \approx \varepsilon\, 2^{1/\varepsilon} \log_2 (T_{\rm irr}/T_{\rm step})$ , (5.47)
where $1/\varepsilon = \log_2 \ell$. The required space still scales as $S_{\rm rev} \sim \log T_{\rm irr}$, yet
the slowdown is no worse than $T_{\rm rev} \sim (T_{\rm irr})^{1+\varepsilon}$. By using more than the
minimal number of pebbles, we can reach the last step faster.
You might have worried that, because reversible computation is
“harder” than irreversible computation, the classification of complexity
depends on whether we compute reversibly or irreversibly. But don’t
worry — we’ve now seen that a reversible computer can simulate an irre-
versible computer pretty easily.
projecting each measured qubit onto the basis {|0⟩, |1⟩}. The outcome of
this measurement is the result of the computation.
Several features of this model invite comment:
(1) Preferred decomposition into subsystems. It is implicit but impor-
tant that the Hilbert space of the device has a preferred decomposition
into a tensor product of low-dimensional subsystems, in this case the
qubits. Of course, we could have considered a tensor product of, say,
qutrits instead. But anyway we assume there is a natural decomposition
into subsystems that is respected by the quantum gates — the gates act
on only a few subsystems at a time. Mathematically, this feature of the
gates is crucial for establishing a clearly defined notion of quantum com-
plexity. Physically, the fundamental reason for a natural decomposition
into subsystems is locality; feasible quantum gates must act in a bounded
spatial region, so the computer decomposes into subsystems that interact
only with their neighbors.
(2) Finite instruction set. Since unitary transformations form a contin-
uum, it may seem unnecessarily restrictive to postulate that the machine
can execute only those quantum gates chosen from a discrete set. We
nevertheless accept such a restriction, because we do not want to invent a
new physical implementation each time we are faced with a new computa-
tion to perform. (When we develop the theory of fault-tolerant quantum
computing we will see that only a discrete set of quantum gates can be
well protected from error, and we’ll be glad that we assumed a finite gate
set in our formulation of the quantum circuit model.)
(3) Unitary gates and orthogonal measurements. We might have allowed
our quantum gates to be trace-preserving completely positive maps, and
our final measurement to be a POVM. But since we can easily simulate
a TPCP map by performing a unitary transformation on an extended
system, or a POVM by performing an orthogonal measurement on an
extended system, the model as formulated is of sufficient generality.
(4) Simple preparations. Choosing the initial state of the n input qubits
to be |00 . . . 0⟩ is merely a convention. We might want the input to be
some nontrivial classical bit string instead, and in that case we would just
include NOT gates in the first computational step of the circuit to flip
some of the input bits from 0 to 1. What is important, though, is that
the initial state is easy to prepare. If we allowed the input state to be a
complicated entangled state of the n qubits, then we might be hiding the
difficulty of executing the quantum algorithm in the difficulty of preparing
the input state. We start with a product state instead, regarding it as
uncontroversial that preparation of a product state is easy.
(5) Simple measurements. We might allow the final measurement to be
a collective measurement, or a projection onto a different basis. But
x → y = f (x), (5.48)
$\{|x_i\rangle, \; i = 0, 1, \ldots, 2^k - 1\}$ (5.49)
5.3.1 Accuracy
Let’s discuss the issue of accuracy. We imagine that we wish to implement
a computation in which the quantum gates $U_1, U_2, \ldots, U_T$ are applied
sequentially to the initial state $|\varphi_0\rangle$. The state prepared by our ideal
quantum circuit is
But in fact our gates do not have perfect accuracy. When we attempt
to apply the unitary transformation $U_t$, we instead apply some “nearby”
unitary transformation $\tilde U_t$. If we wish to include environmental decoherence
in our model of how the actual unitary deviates from the ideal
5.3 Quantum Circuits 27
Thus we have expressed the difference between $|\tilde\varphi_T\rangle$ and $|\varphi_T\rangle$ as a sum of
T remainder terms. The worst case yielding the largest deviation of $|\tilde\varphi_T\rangle$
from $|\varphi_T\rangle$ occurs if all remainder terms line up in the same direction, so
that the errors interfere constructively. Therefore, we conclude that
where we have used the property $\| U |E_t\rangle \| = \| |E_t\rangle \|$ for any unitary U.
Let $\| A \|_{\sup}$ denote the sup norm of the operator A, that is, the
largest eigenvalue of $\sqrt{A^\dagger A}$. We then have
$\| |E_t\rangle \| = \| (\tilde U_t - U_t) |\varphi_{t-1}\rangle \| \le \| \tilde U_t - U_t \|_{\sup}$ (5.61)
(since $|\varphi_{t-1}\rangle$ is normalized). Now suppose that, for each value of t, the
error in our quantum gate is bounded by
$\| \tilde U_t - U_t \|_{\sup} \le \varepsilon$; (5.62)
in this sense, the accumulated error in the state grows linearly with the
length of the computation.
The distance bounded in eq.(5.62) can equivalently be expressed as
$\| W_t - I \|_{\sup}$, where $W_t = \tilde U_t U_t^\dagger$. Since $W_t$ is unitary, each of its
eigenvalues is a phase $e^{i\theta}$, and the corresponding eigenvalue of $W_t - I$
has modulus
$|e^{i\theta} - 1| = (2 - 2 \cos\theta)^{1/2}$ , (5.64)
so that eq.(5.62) is the requirement that each eigenvalue satisfies
(or $|\theta| \lesssim \varepsilon$, for ε small). The origin of eq.(5.63) is clear. In each time
step, $|\tilde\varphi\rangle$ rotates relative to $|\varphi\rangle$ by (at worst) an angle of order ε, and the
distance between the vectors increases by at most of order ε.
How much accuracy is good enough? In the final step of our compu-
tation, we perform an orthogonal measurement, and the probability of
outcome a, in the ideal case, is
It is shown in Exercise 2.5 that the L1 distance between the ideal and
actual probability distributions satisfies
$\frac{1}{2}\|\tilde p - p\|_1 = \frac{1}{2}\sum_a |\tilde p(a) - p(a)| \le \| |\tilde\varphi_T\rangle - |\varphi_T\rangle \| \le T\varepsilon$ . (5.68)
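The linear growth of the error can be seen in a toy model. The sketch below is an assumption-laden illustration, not the general argument: each ideal gate is a rotation of a real two-component state vector, and each actual gate over-rotates by the same small angle η, so that the per-gate error in the sup norm is |e^{iη} − 1| as in eq. (5.64). Because every error points the same way, this is close to the worst case.

```python
import math

def rotate(state, theta):
    """Apply a 2x2 rotation by angle theta to a real two-component state."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = state
    return (c * x - s * y, s * x + c * y)

def gate_error(eta):
    """Sup-norm distance between rotations differing by eta: |e^{i eta} - 1|."""
    return math.sqrt(2 - 2 * math.cos(eta))

def final_deviation(angles, eta):
    """Norm of the difference between actual and ideal states after all gates."""
    ideal = actual = (1.0, 0.0)
    for theta in angles:
        ideal = rotate(ideal, theta)
        actual = rotate(actual, theta + eta)   # systematic over-rotation
    return math.hypot(ideal[0] - actual[0], ideal[1] - actual[1])

angles = [0.3 * t for t in range(50)]
eta = 1e-3
T = len(angles)
deviation = final_deviation(angles, eta)
print(deviation, T * gate_error(eta))
```

The first printed number should sit just below the second, the bound Tε.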
and the value $\langle y|U_t|x\rangle$ can be determined (to the required accuracy) by
a simple circuit requiring little memory.
For example, in the case of a single-qubit gate acting on the first qubit,
we have
A simple circuit can compare x1 with y1 , x2 with y2 , etc., and output zero
if the equality is not satisfied. In the event of equality, the circuit outputs
one of the four complex numbers
are not, we can’t reach arbitrary unitaries with finite-size circuits. We’ll
be satisfied to accurately approximate an arbitrary unitary.
As noted in our discussion of quantum circuit accuracy, to ensure that
we have a good approximation in the L1 norm to the probability distribution
for any measurement performed after applying a unitary transformation,
it suffices for the actual unitary $\tilde U$ to be close to the ideal
unitary U in the sup norm. Therefore we will say that $\tilde U$ is δ-close to U
if $\|\tilde U - U\|_{\sup} \le \delta$. How large should the circuit size T be if we want to
approximate any n-qubit unitary to accuracy δ?
If we imagine drawing a ball of radius δ (in the sup norm) centered at
each unitary achieved by some circuit with T gates, we want the balls to
cover the unitary group U(N), where $N = 2^n$. The number $N_{\rm balls}$ of balls
needed satisfies
$N_{\rm balls} \ge \frac{{\rm Vol}(U(N))}{{\rm Vol}(\delta\text{-ball})}$ , (5.76)
where Vol(U(N)) means the total volume of the unitary group and
Vol(δ-ball) means the volume of a single ball with radius δ. The geometry
of U(N) is actually curved, but we may safely disregard that
subtlety; all we need to know is that U(N) contains a ball centered
at the identity element with a small but constant radius C (independent
of N). Ignoring the curvature, because U(N) has real dimension $N^2$, the
volume of this ball (a lower bound on the volume of U(N)) is $\Omega_{N^2} C^{N^2}$,
where $\Omega_{N^2}$ denotes the volume of a unit ball in flat space; likewise, the
volume of a δ-ball is $\Omega_{N^2} \delta^{N^2}$. We conclude that
$N_{\rm balls} \ge \left(\frac{C}{\delta}\right)^{N^2}$ . (5.77)
On the other hand, if our universal set contains a constant number of
quantum gates (independent of n), and each gate acts on no more than
k qubits, where k is a constant, then the number of ways to choose the
quantum gate at step $t$ of a circuit is no more than constant $\times\, n^k$ =
poly($n$). Therefore the number $N_T$ of quantum circuits with $T$ gates
acting on $n$ qubits is
$$N_T \le (\mathrm{poly}(n))^T. \tag{5.78}$$
We conclude that if we want to reach every element of $U(N)$ to accuracy
$\delta$ with circuits of size $T$, hence $N_T \ge N_{\rm balls}$, we require
$$T \ge 2^{2n}\, \frac{\log(C/\delta)}{\log(\mathrm{poly}(n))}; \tag{5.79}$$
the circuit size must be exponential. With polynomial-size quantum circuits,
we can achieve a good approximation to unitaries that occupy only
an exponentially small fraction of the volume of $U(2^n)$!
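To get a feel for the numbers in eq. (5.79), here is a minimal Python sketch of the counting bound. The values of the radius `C`, the number of gate types, and the gate arity `k` are illustrative assumptions, not values taken from the text; the hypothetical function `min_circuit_size` simply evaluates the right-hand side of (5.79) with poly(n) modeled as (number of gate types) × n^k.

```python
import math

def min_circuit_size(n, delta, C=0.5, num_gate_types=3, k=2):
    """Evaluate the lower bound T >= 2^(2n) * log(C/delta) / log(poly(n)).

    Assumptions (illustrative only): poly(n) is modeled as
    num_gate_types * n**k, and C is the constant ball radius.
    Requires n >= 2 so that the denominator is positive.
    """
    choices_per_step = num_gate_types * n**k   # stand-in for poly(n)
    dimension = 4**n                           # N^2 = 2^(2n)
    return dimension * math.log(C / delta) / math.log(choices_per_step)

if __name__ == "__main__":
    for n in (2, 5, 10, 20):
        print(n, f"{min_circuit_size(n, delta=0.01):.3e}")
```

Whatever constants are chosen, the factor $2^{2n}$ dominates: each additional qubit multiplies the bound by roughly four.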
5.4 Universal quantum gates 33
$$G = \{U_1, U_2, \dots, U_m\}, \tag{5.80}$$
then perform the gate, and finally perform SWAP gates to return the
qubits to their original positions.
When we say the gate set G is universal we mean that the unitary
transformations that can be constructed as quantum circuits using this
gate set are dense in the unitary group $U(2^n)$, up to an overall phase.
That is, for any $V \in U(2^n)$ and any $\delta > 0$, there is a unitary $\tilde V$ achieved
by a finite circuit such that
$$\|\tilde V - e^{i\phi} V\|_{\rm sup} \le \delta \tag{5.81}$$
for some phase $e^{i\phi}$. (It is natural to use the sup norm to define the
deviation of the circuit from the target unitary, but we would reach similar
conclusions using any reasonable topology on $U(2^n)$.) Sometimes it is
useful to relax this definition of universality; for example we might settle
for encoded universality, meaning that the circuits are dense not in $U(2^n)$
but rather some subgroup $U(N)$, where $N$ is exponential (or at least
superpolynomial) in n.
There are several variations on the notion of universality that are note-
worthy, because they illuminate the general theory or are useful in appli-
cations.
(1) Exact universality. If we are willing to allow uncountable gate sets,
then we can assert that for certain gate sets we can construct a circuit
that achieves an arbitrary unitary transformation exactly. We will see
that two-qubit gates are exactly universal: any element of $U(2^n)$ can be
constructed as a finite circuit of two-qubit gates. Another example is that
the two-qubit CNOT gate, combined with arbitrary single-qubit gates, is
exactly universal (Exercise 5.2).
In fact the CNOT gate is not special in this respect. Any “entangling”
two-qubit gate, when combined with arbitrary single-qubit gates, is uni-
versal (Exercise 5.6). We say a two-qubit gate is entangling if it maps
some product state to a state which is not a product state.
An example of a two-qubit gate which is not entangling is a “local gate”:
a product unitary V = A ⊗ B; another example is the SWAP gate, or
any gate “locally equivalent” to SWAP, i.e., of the form
V = (A ⊗ B) (SWAP) (C ⊗ D) . (5.82)
In fact these are the only non-entangling two-qubit gates. Every two-qubit
unitary which is not local or locally equivalent to SWAP is entangling,
and hence universal when combined with arbitrary single-qubit gates.
(2) Generic universality. Gates acting on two or more qubits which are
not local are typically universal. For example, almost any two-qubit gate
is universal, if the gate can be applied to any pair of the n qubits. By
“almost any” we mean except for a set of measure zero in U (4).
etc.
That particular finite gate sets are universal is especially important
in the theory of quantum fault tolerance, in which highly accurate logi-
cal gates acting on encoded quantum states are constructed from noisy
physical gates. As we’ll discuss in Chapter 8, only a discrete set of log-
ical gates can be well protected against noise, where the set depends on
how the quantum information is encoded. The goal of fault-tolerant gate
constructions is to achieve a universal set of such protected gates.
(4) Efficient circuits of universal gates. The above results concern only
the “reachability” of arbitrary n-qubit unitaries; they say nothing about
the circuit size needed for a good approximation. Yet the circuit size is
highly relevant if we want to approximate one universal gate set by using
another one, or if we want to approximate the steps in an ideal quantum
algorithm to acceptable accuracy.
We already know that circuits with size exponential in n are needed to
approximate arbitrary n-qubit unitaries using a finite gate set. However,
we will see that, for any fixed k, a k-qubit unitary can be approximated to
accuracy ε using a circuit whose size scales with the error like polylog(1/ε).
This result, the Solovay-Kitaev theorem, holds for any universal gate set
which is “closed under inverse” — that is, such that the inverse of each
gate in the set can be constructed exactly using a finite circuit.
36 5 Classical and quantum circuits
$$T\, \mathrm{polylog}(T/\delta) \ge 2^{2n}\, \frac{\log(C/2\delta)}{\log n} \tag{5.87}$$
if we use circuits constructed from arbitrary $k$-qubit gates. The required
circuit size is smaller than exponential by only a poly($n$) factor. The group
$U(2^n)$ is unimaginably vast not because we are limited to a discrete set
of gates, but rather because we are unable to manipulate more than a
constant number of qubits at a time.
as a direct sum
$$U = U^{(2)} \oplus I^{(N-2)}, \tag{5.88}$$
where $U^{(2)}$ is a $2 \times 2$ unitary matrix acting on the span of $|i\rangle$ and $|j\rangle$,
and $I^{(N-2)}$ is the identity matrix acting on the complementary
$(N-2)$-dimensional subspace.
We should be careful not to confuse a 2 × 2 unitary with a two-qubit
unitary acting on the $n$-qubit space of dimension $N = 2^n$. A two-qubit
unitary $U$ decomposes as a tensor product
$$U = U^{(4)} \otimes I^{(2^{n-2})}, \tag{5.89}$$
where $U^{(4)}$ is a $4 \times 4$ unitary matrix acting on a pair of qubits, and $I^{(2^{n-2})}$
is the identity matrix acting on the remaining $n-2$ qubits. We can regard
the two-qubit unitary as a direct sum of $2^{n-2}$ $4 \times 4$ blocks, with each block
labeled by a basis state of the $(n-2)$-qubit Hilbert space, and $U^{(4)}$ acting
on each block.
Let’s see how to express U ∈ U (N ) as a product of 2 × 2 unitaries.
Consider the action of $U$ on the basis state $|0\rangle$:
$$U|0\rangle = \sum_{i=0}^{N-1} a_i |i\rangle. \tag{5.90}$$
a product of $(N-1) + (N-2) + \cdots + 2 + 1 = \frac{1}{2}N(N-1)$ $2 \times 2$ unitaries.
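The counting above can be checked numerically. The following is a minimal NumPy sketch, not the text's own construction: it reduces a unitary to the identity column by column using two-level ($2 \times 2$) unitaries, and the helper names `two_level_decompose` and `random_unitary` are hypothetical. One assumption is made explicit in the code: the final $2 \times 2$ block is inverted outright so that the leftover phase is absorbed and exactly $\frac{1}{2}N(N-1)$ factors result.

```python
import numpy as np

def random_unitary(N, seed=0):
    """Hypothetical helper: a random unitary from the QR factorization."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
    Q, _ = np.linalg.qr(Z)
    return Q

def two_level_decompose(U, tol=1e-12):
    """Return two-level unitaries G_1, ..., G_m with G_m ... G_1 U = I.

    Each G acts nontrivially only on a pair of basis states (col, row).
    Consequently U = G_1^dagger G_2^dagger ... G_m^dagger.
    """
    N = U.shape[0]
    W = np.array(U, dtype=complex)
    factors = []
    for col in range(N - 1):
        for row in range(col + 1, N):
            idx = np.ix_([col, row], [col, row])
            G = np.eye(N, dtype=complex)
            if col == N - 2:
                # Last 2x2 block: invert it, absorbing the remaining phase.
                G[idx] = W[idx].conj().T
            else:
                a, b = W[col, col], W[row, col]
                r = np.sqrt(abs(a) ** 2 + abs(b) ** 2)
                if r > tol:
                    # Maps the pair (a, b) to (r, 0), zeroing W[row, col].
                    G[idx] = np.array([[np.conj(a), np.conj(b)],
                                       [b, -a]]) / r
            W = G @ W
            factors.append(G)
    return factors

if __name__ == "__main__":
    U = random_unitary(4)
    Gs = two_level_decompose(U)
    W = U.copy()
    for G in Gs:
        W = G @ W
    print(len(Gs), np.allclose(W, np.eye(4)))
```

For $N = 4$ the sketch produces six factors, matching $\frac{1}{2}N(N-1)$.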
Now it remains to show that we can construct any 2 × 2 unitary as a
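circuit of two-qubit unitaries. It will be helpful to notice that the
three-qubit gate $\Lambda^2(U^2)$ can be constructed as a circuit of $\Lambda(U)$, $\Lambda(U^\dagger)$, and
$\Lambda(X)$ gates. Using the notation
[Circuit diagram: on qubits $x$, $y$, and a target, $\Lambda^2(U^2)$ equals the sequence $\Lambda(U)$ controlled by $y$, $\Lambda(X)$ from $x$ onto $y$, $\Lambda(U^\dagger)$ controlled by $y$, $\Lambda(X)$ from $x$ onto $y$, and $\Lambda(U)$ controlled by $x$.]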
does the job. We can check that the power of U applied to the third qubit
is
y − (x ⊕ y) + x = y − (x + y − 2xy) + x = 2xy. (5.94)
That is, $U^2$ is applied if $x = y = 1$, and the identity is applied otherwise;
hence this circuit achieves the $\Lambda^2(U^2)$ gate. Since every unitary $V$ has
a square root $U$ such that $V = U^2$, the construction shows that, using
two-qubit gates, we can achieve $\Lambda^2(V)$ for any single-qubit $V$.
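The five-gate construction can be verified by direct matrix multiplication. This is a small NumPy sketch under stated assumptions: qubits are ordered $(x, y, \text{target})$ with $x$ as the most significant bit, and the helper names `controlled` and `five_gate_circuit` are illustrative rather than taken from the text.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
P0 = np.diag([1, 0]).astype(complex)   # projector onto |0>
P1 = np.diag([0, 1]).astype(complex)   # projector onto |1>

def kron_all(ops):
    out = np.eye(1, dtype=complex)
    for op in ops:
        out = np.kron(out, op)
    return out

def controlled(control, target, G, n=3):
    """Lambda(G): apply G to `target` when `control` is 1 (qubit 0 is leftmost)."""
    off = [I2] * n
    on = [I2] * n
    off[control] = P0
    on[control] = P1
    on[target] = G
    return kron_all(off) + kron_all(on)

def five_gate_circuit(U):
    """Multiply out the circuit; gates are listed in the order they act."""
    Ud = U.conj().T
    gates = [
        controlled(1, 2, U),    # Lambda(U) controlled by y
        controlled(0, 1, X),    # CNOT: y becomes x XOR y
        controlled(1, 2, Ud),   # Lambda(U^dagger) controlled by x XOR y
        controlled(0, 1, X),    # CNOT: restores y
        controlled(0, 2, U),    # Lambda(U) controlled by x
    ]
    M = np.eye(8, dtype=complex)
    for g in gates:
        M = g @ M
    return M

if __name__ == "__main__":
    U = 0.5 * np.array([[1 + 1j, 1 - 1j], [1 - 1j, 1 + 1j]])  # a square root of X
    expected = np.eye(8, dtype=complex)
    expected[6:, 6:] = U @ U          # acts only when x = y = 1
    print(np.allclose(five_gate_circuit(U), expected))
```

With $U$ chosen as a square root of $X$, the circuit reproduces the Toffoli gate, in line with the exponent $2xy$ in eq. (5.94).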
Generalizing this construction, we can find a circuit that constructs
$\Lambda^m(U^2)$ using $\Lambda^{m-1}(U)$, $\Lambda^{m-1}(X)$, $\Lambda(U)$, and $\Lambda(U^\dagger)$ gates. If we replace
the $\Lambda(X)$ gates in the previous circuit by $\Lambda^{m-1}(X)$ gates, and replace
the last $\Lambda(U)$ gate by $\Lambda^{m-1}(U)$, then, if we denote the $m$ control bits by
$x_1, x_2, x_3, \dots, x_m$, the power of $U$ applied to the last qubit is
$$\begin{aligned}
& x_m + x_1 x_2 x_3 \cdots x_{m-1} - (x_m \oplus x_1 x_2 x_3 \cdots x_{m-1}) \\
&\quad = x_m + x_1 x_2 x_3 \cdots x_{m-1} - (x_m + x_1 x_2 x_3 \cdots x_{m-1} - 2 x_1 x_2 x_3 \cdots x_{m-1} x_m) \\
&\quad = 2 x_1 x_2 x_3 \cdots x_{m-1} x_m,
\end{aligned} \tag{5.95}$$
constructing
$$\Sigma^{-1} \circ \Lambda^{n-1}(V) \circ \Sigma. \tag{5.98}$$
This is to be read from right to left, with $\Sigma$ acting first and $\Sigma^{-1}$ acting
last. But we have already seen in §5.2.2 how to construct an arbitrary
permutation of computational basis states using $\Lambda^{n-1}(X)$ gates and
(single-qubit) NOT gates, and we now know how to construct $\Lambda^{n-1}(X)$ (a special
case of $\Lambda^{n-1}(U)$) from two-qubit gates. Therefore, using two-qubit gates,
we have constructed the general $2 \times 2$ unitary (in the computational
basis) as in eq. (5.98). That completes the proof that any element of $U(2^n)$
can be achieved by a circuit of two-qubit gates. In fact we have proven a
somewhat stronger result: that the two-qubit gates $\{\Lambda(U)\}$, where $U$ is
an arbitrary single-qubit gate, constitute an exactly universal gate set.
We claim that the positive integer powers of U (4πα) are dense in U (1) if
α ∈ [0, 1) is irrational. Equivalently, the points
Since rational numbers are countable and real numbers are not, for a
generic U (that is, for all elements of U (N ) except for a set of measure
zero) each $\theta_i/\pi$ and $\theta_i/\theta_j$ is an irrational number. For each positive integer
$k$, the eigenvalues $\{e^{-ik\theta_i/2},\ i = 1, 2, \dots, N\}$ of $U^k$ define a point on the
$N$-dimensional torus (the product of $N$ circles), and as $k$ ranges over all
positive integers, these points densely fill the whole $N$-torus. We conclude
that for any generic $U$, the elements $\{U^k,\ k = 1, 2, 3, \dots\}$ are dense in
the group $U(1)^N$, i.e., come as close as we please to every unitary matrix
which is diagonal in the same basis as $U$.
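The single-circle version of this density statement is easy to probe numerically. The sketch below is illustrative only: it uses the phase convention $e^{ik\theta}$ rather than the $e^{-ik\theta/2}$ of the text, and the function name `first_power_near` and the search cutoff `k_max` are assumptions made for the example.

```python
import math

def first_power_near(theta, target, eps, k_max=10**6):
    """Smallest k >= 1 with |exp(i*k*theta) - exp(i*target)| <= eps, else None.

    The chord distance between two points on the unit circle separated by
    angle d is 2*sin(d/2), with d taken in [0, pi].
    """
    for k in range(1, k_max + 1):
        d = (k * theta - target) % (2 * math.pi)
        d = min(d, 2 * math.pi - d)
        if 2 * math.sin(d / 2) <= eps:
            return k
    return None

if __name__ == "__main__":
    irrational = 2 * math.pi * math.sqrt(2)   # theta / (2*pi) is irrational
    rational = math.pi / 2                    # powers visit only four points
    for target in (0.5, 1.0, 2.0, 3.0):
        print(target,
              first_power_near(irrational, target, 0.01),
              first_power_near(rational, target, 0.01, k_max=1000))
```

For the irrational angle every target is eventually approached, while for the rational angle the search fails for targets away from the four accessible points. The required $k$ grows as `eps` shrinks, which is the point made in the following paragraph about the absence of a useful upper bound.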
Note that this argument does not provide any upper bound on how
large $k$ must be for $U^k$ to be $\varepsilon$-close to any specified element of $U(1)^N$.
In fact, the required value of $k$ could be extremely large if, for some $m$
and $i$, $|m\theta_i\ (\mathrm{mod}\ 4\pi)| \ll \varepsilon$. It might be hard (that is, require many
gates) to approximate a specified unitary transformation with circuits of
commuting quantum gates, because the unitary achieved by the circuit
only depends on how many times each gate is applied, not on the order
in which the gates are applied. It is much easier (requires fewer gates)
to achieve a good approximation using circuits of noncommuting gates.
If the gates are noncommuting, then the order in which the gates are
applied matters, and many more unitaries can be reached by circuits of
specified size than if the gates are commuting.
Reaching the full Lie algebra. Suppose we can construct the two gates
$U = \exp(iA)$, $V = \exp(iB) \in U(N)$, where $A$ and $B$ are $N \times N$
Hermitian matrices. If these are generic gates, positive powers of $U$ come
as close as we please to $e^{i\alpha A}$ for any real $\alpha$ and positive powers of $V$ come
as close as we please to $e^{i\beta B}$ for any real $\beta$. That is enough to ensure
that there is a finite circuit constructed from $U$ and $V$ gates that comes
as close as we please to $e^{iC}$, where $C$ is any Hermitian element of the Lie
algebra generated by $A$ and $B$.
We say that a unitary transformation U is reachable if for any ε > 0
there is a finite circuit achieving Ũ which is ε-close to U in the sup norm.
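Noting that
$$\lim_{n\to\infty} \left(e^{i\alpha A/n} e^{i\beta B/n}\right)^n = \lim_{n\to\infty} \left(1 + \frac{i}{n}(\alpha A + \beta B) + O\!\left(\frac{1}{n^2}\right)\right)^n = e^{i(\alpha A + \beta B)}, \tag{5.102}$$
we see that any $e^{i(\alpha A + \beta B)}$ is reachable if each $e^{i\alpha A/n}$ and $e^{i\beta B/n}$ is
reachable. Furthermore, because
$$\lim_{n\to\infty} \left(e^{iA/\sqrt{n}}\, e^{iB/\sqrt{n}}\, e^{-iA/\sqrt{n}}\, e^{-iB/\sqrt{n}}\right)^n = \lim_{n\to\infty} \left(1 - \frac{1}{n}(AB - BA) + O\!\left(\frac{1}{n^{3/2}}\right)\right)^n = e^{-[A,B]}, \tag{5.103}$$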
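Both limits, eqs. (5.102) and (5.103), can be checked numerically for small matrices. The following NumPy sketch is only an illustration: the choice $A = X$, $B = Z$ (Pauli matrices) with $\alpha = \beta = 1$ and the helper names `expi`, `trotter`, and `group_commutator` are assumptions made for the example.

```python
import numpy as np

def expi(H):
    """exp(iH) for a Hermitian matrix H, via its eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(1j * w)) @ V.conj().T

def trotter(A, B, n):
    """(e^{iA/n} e^{iB/n})^n, which should approach e^{i(A+B)}."""
    step = expi(A / n) @ expi(B / n)
    return np.linalg.matrix_power(step, n)

def group_commutator(A, B, n):
    """(e^{iA/s} e^{iB/s} e^{-iA/s} e^{-iB/s})^n with s = sqrt(n),
    which should approach e^{-[A,B]}."""
    s = np.sqrt(n)
    step = expi(A / s) @ expi(B / s) @ expi(-A / s) @ expi(-B / s)
    return np.linalg.matrix_power(step, n)

def sup_norm(M):
    return np.linalg.norm(M, 2)   # largest singular value

if __name__ == "__main__":
    A = np.array([[0, 1], [1, 0]], dtype=complex)    # Pauli X
    B = np.array([[1, 0], [0, -1]], dtype=complex)   # Pauli Z
    sum_target = expi(A + B)
    # -[A,B] = i * (i[A,B]), and i[A,B] is Hermitian, so expi applies.
    comm_target = expi(1j * (A @ B - B @ A))
    for n in (10, 100, 1000, 10000):
        print(n,
              sup_norm(trotter(A, B, n) - sum_target),
              sup_norm(group_commutator(A, B, n) - comm_target))
```

The first error column should shrink roughly like $1/n$ and the second roughly like $1/\sqrt{n}$, consistent with the orders of the remainder terms in the two formulas.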
(the same gate applied to the same two qubits, but in the opposite order)
is another two-qubit gate not commuting with $U$. Positive powers of $U$
reach one $U(1)^3$ subgroup of $SU(4)$ while positive powers of $V$ reach a
different $U(1)^3$, so that circuits built from $U$ and $V$ reach all of $SU(4)$.
Even nongeneric universal gates, in particular gates whose eigenvalues
are all rational multiples of π, can suffice for universality. One example
discussed in the homework is the gate set $\{\mathrm{CNOT}, H, T\}$, where $H$
rotates the Bloch sphere by the angle $\pi$ about the axis $\frac{1}{\sqrt{2}}(\hat x + \hat z)$, and $T$
rotates the Bloch sphere by the angle $\pi/4$ about the $\hat z$ axis. If we replaced
$T$ with the $\pi/2$ rotation $T^2$, then the gate set would not be universal; in
that case the only achievable single-qubit rotations would be those in a
finite subgroup of SU (2), the symmetry group of the cube. But SU (2)
has few such finite nonabelian subgroups (the only finite nonabelian sub-
groups of the rotation group SO(3) are the symmetry groups of regular
polygons and of regular three-dimensional polyhedra, the platonic solids).
If the gate set reaches beyond these finite subgroups it will reach either a
U (1) subgroup of SU (2) or all of SU (2).
circuit is needed to construct $\tilde U$ such that $\|\tilde U - e^{i\phi} U\|_{\rm sup} \le \varepsilon$? (2) How
large a classical circuit is needed to find the quantum circuit that achieves
Ũ ? We will see that, for any universal set of gates (closed under inverse)
used to approximate elements of a unitary group of constant dimension,
the answer to both questions is polylog(1/ε). We care about the answer
to the second question because it would not be very useful to know that
U can be well approximated by small quantum circuits if these circuits
are very hard to find.
We will prove this result by devising a recursive algorithm which
achieves successively better and better approximations. We say that a
finite repertoire of unitary transformations R is an “ε-net” in U (N ) if
every element of U (N ) is no more than distance ε away (in the sup norm)
from some element of R, and we say that R is “closed under inverse” if
the inverse of every element of R is also in R. The key step of the recur-
sive algorithm is to show that if R is an ε-net, closed under inverse, then
we can construct a new repertoire $R'$, also closed under inverse, with the
following properties: (1) each element of $R'$ is achieved by a circuit of at
most 5 gates from $R$. (2) $R'$ is an $\varepsilon'$-net, where
$$\varepsilon' = C\varepsilon^{3/2}, \tag{5.105}$$
and $C$ is a constant.
Before explaining how this step works, let’s see why it ensures that we
can approximate any unitary using a quantum circuit with size polylog-
arithmic in the accuracy. Suppose to start with that we have found an
$\varepsilon_0$-net $R_0$, closed under inverse, where each element of $R_0$ can be achieved
by a circuit with no more than $L_0$ gates chosen from our universal gate
set. If $\varepsilon_0 < 1/C^2$, then we can invoke the recursive step to find an $\varepsilon_1$-net
$R_1$, where $\varepsilon_1 < \varepsilon_0$, and each element of $R_1$ can be achieved by a circuit
of $L_1 = 5L_0$ gates. By repeating this step $k$ times, we can make the error
$\varepsilon_k$ much smaller than the level-0 error $\varepsilon_0$. Iterating the relation
$$C^2\varepsilon_k = \left(C^2\varepsilon_{k-1}\right)^{3/2} \tag{5.106}$$
$k$ times we obtain
$$C^2\varepsilon_k = \left(C^2\varepsilon_0\right)^{(3/2)^k}, \tag{5.107}$$
and by taking logs of both sides we find
$$\left(\frac{3}{2}\right)^k = \frac{\log\left(1/C^2\varepsilon_k\right)}{\log\left(1/C^2\varepsilon_0\right)}. \tag{5.108}$$
After $k$ recursive steps the circuit size for each unitary in the $\varepsilon_k$-net $R_k$
Thus the circuit size scales with the accuracy $\varepsilon_k$ as $[\log(1/\varepsilon_k)]^{3.97}$.
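The recursion can be tabulated directly. This is a minimal Python sketch of the bookkeeping only (it does not construct any circuits); the function name `solovay_kitaev_levels` and the sample values of $C$, $\varepsilon_0$, and $L_0$ are illustrative assumptions. It iterates $\varepsilon_k = C\varepsilon_{k-1}^{3/2}$ together with a fivefold growth in circuit size per level, as described above.

```python
import math

def solovay_kitaev_levels(eps0, C, L0, target_eps):
    """Iterate eps_k = C * eps_{k-1}**1.5 and L_k = 5 * L_{k-1}
    until eps_k <= target_eps. Returns (k, eps_k, L_k)."""
    if not eps0 < 1 / C**2:
        raise ValueError("need eps0 < 1/C^2 for the error to shrink")
    eps, L, k = eps0, L0, 0
    while eps > target_eps:
        eps = C * eps**1.5
        L *= 5
        k += 1
    return k, eps, L

if __name__ == "__main__":
    # Exponent relating circuit size to log(1/eps): log 5 / log(3/2).
    print(math.log(5) / math.log(1.5))
    for target in (1e-4, 1e-8, 1e-16):
        print(target, solovay_kitaev_levels(eps0=0.1, C=1.0, L0=10,
                                            target_eps=target))
```

Because the error exponent grows by a factor $3/2$ per level while the size grows by a factor 5, the size is polynomial in $\log(1/\varepsilon_k)$ with exponent $\log 5/\log(3/2) \approx 3.97$.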
Now let’s see how the $\varepsilon'$-net $R'$ is constructed from the $\varepsilon$-net $R$. For
any $U \in SU(N)$ there is an element $\tilde U \in R$ such that $\|U - \tilde U\|_{\rm sup} \le \varepsilon$,
or equivalently $\|U\tilde U^{-1} - I\|_{\rm sup} \le \varepsilon$. Now we will find $W$, constructed
as a circuit of 4 elements of $R$, such that $\|U\tilde U^{-1} - W\|_{\rm sup} \le \varepsilon'$, or
equivalently $\|U - W\tilde U\|_{\rm sup} \le \varepsilon'$. Thus $U$ is approximated to accuracy $\varepsilon'$
by $W\tilde U$, which is achieved by a circuit of 5 elements of $R$.
We may write $U\tilde U^{-1} = e^{iA}$, where $A = O(\varepsilon)$. (By $A = O(\varepsilon)$ we mean
$\|A\|_{\rm sup} = O(\varepsilon)$, i.e., $\|A\|_{\rm sup}$ is bounded above by a constant times $\varepsilon$ for $\varepsilon$
sufficiently small.) It is possible to find Hermitian $B$, $C$, both $O(\varepsilon^{1/2})$,
such that $[B, C] = -iA$. Furthermore, because $R$ is an $\varepsilon$-net, there is an
element $e^{i\tilde B}$ of $R$ which is $\varepsilon$-close to $e^{iB}$, and an element $e^{i\tilde C}$ of $R$ which
is $\varepsilon$-close to $e^{iC}$. It follows that $B - \tilde B = O(\varepsilon)$ and $C - \tilde C = O(\varepsilon)$.
Now we consider the circuit
the remainder term is cubic order in $\tilde B$ and $\tilde C$, hence $O(\varepsilon^{3/2})$. First note
that the inverse of this circuit, $e^{i\tilde C} e^{i\tilde B} e^{-i\tilde C} e^{-i\tilde B}$, can also be constructed
as a size-4 circuit of gates from $R$. Furthermore,
of tasks that do not scale with ε, such as finding the matrices B and C
satisfying $[B, C] = -iA$. After $k$ iterations, the classical cost scales like
$$O(3^k) = O\!\left([\log(1/\varepsilon_k)]^{\log 3/\log(3/2)}\right) = O\!\left([\log(1/\varepsilon_k)]^{2.71}\right), \tag{5.112}$$
5.5 Summary
Classical circuits. The complexity of a problem can be characterized
by the size of a uniform family of logic circuits that solve the problem:
The problem is hard if the size of the circuit is a superpolynomial func-
tion of the size of the input, and easy otherwise. One classical universal
computer can simulate another efficiently, so the classification of com-
plexity is machine independent. The 3-bit Toffoli gate is universal for
classical reversible computation. A reversible computer can simulate an
irreversible computer without a significant slowdown and without unrea-
sonable memory resources.
Quantum Circuits. Although there is no proof, it seems likely
that polynomial-size quantum circuits cannot be simulated in general
by polynomial-size randomized classical circuits (BQP ≠ BPP); however,
polynomial space is sufficient (BQP ⊆ PSPACE). A noisy quantum circuit
can simulate an ideal quantum circuit of size T to acceptable accuracy
if each quantum gate has an accuracy of order 1/T . Any n-qubit uni-
tary transformation can be constructed from two-qubit gates. A generic
two-qubit quantum gate, if it can act on any two qubits in a device, is
sufficient for universal quantum computation. One universal quantum
computer can simulate another to accuracy ε with a polylog(1/ε) overhead
cost; therefore the complexity class BQP is machine independent.
Do the Exercises to learn more about universal sets of quantum gates.
5.6 Exercises
5.1 Linear simulation of Toffoli gate.
In §5.2.2 we constructed the $n$-bit Toffoli gate $\Lambda^{n-1}(X)$ from 3-bit
Toffoli gates ($\Lambda^2(X)$’s). The circuit required only one bit of scratch
space, but the number of gates was exponential in n. With more
scratch, we can substantially reduce the number of gates.
a) Find a circuit family with $2n - 5$ $\Lambda^2(X)$’s that evaluates
$\Lambda^{n-1}(X)$. (Here $n - 3$ scratch bits are used, which are set
to 0 at the beginning of the computation and return to the
value 0 at the end.)
b) Find a circuit family with $4n - 12$ $\Lambda^2(X)$’s that evaluates
$\Lambda^{n-1}(X)$, which works irrespective of the initial values of the
scratch bits. (Again the n−3 scratch bits return to their initial
values, but they don’t need to be set to zero at the beginning.)
5.2 An exactly universal quantum gate set.
The purpose of this exercise is to complete the demonstration that
the controlled-NOT gate Λ(X) and arbitrary single-qubit gates con-
stitute an exactly universal set.
Since the argument in §5.4.2 shows that the gate set {Λ(U )} is
exactly universal, we have shown that Λ(X) together with single-
qubit gates are an exactly universal set.
5.3 Universal quantum gates I
In this exercise and the two that follow, we will establish that several
simple sets of gates are universal for quantum computation.
The Hadamard transformation H is the single-qubit gate that acts
in the standard basis $\{|0\rangle, |1\rangle\}$ as
$$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}; \tag{5.117}$$
in quantum circuit notation, we denote the Hadamard gate as
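and is denoted
[Circuit symbol: a control dot on one qubit joined to a box labeled S on the other qubit.]
Despite this misleading notation, the gate Λ(S) actually acts
symmetrically on the two qubits:
[Circuit identity: the control dot and the S box may be exchanged between the two qubits without changing the gate.]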
We will see that the two gates H and Λ(S) comprise a universal gate
set – any unitary transformation can be approximated to arbitrary
accuracy by a quantum circuit built out of these gates.
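[Circuit diagrams: $U_1$ applies $H$ to the upper qubit, then $\Lambda(S)$ with the control dot on the upper qubit and the $S$ box on the lower qubit, then $H$ to the upper qubit again; $U_2$ is the same circuit with the upper and lower qubits exchanged.]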
Let $|ab\rangle$ denote the element of the standard basis where $a$ labels
the upper qubit in the circuit diagram and $b$ labels the lower
qubit. Write out $U_1$ and $U_2$ as $4 \times 4$ matrices in the standard
basis. Show that $U_1$ and $U_2$ both act trivially on the states
$$|00\rangle, \quad \frac{1}{\sqrt{3}}\left(|01\rangle + |10\rangle + |11\rangle\right). \tag{5.120}$$
and
$$U_2 = \frac{1}{4} \begin{pmatrix} 3+i & \sqrt{3}(1-i) \\ \sqrt{3}(1-i) & 1+3i \end{pmatrix}. \tag{5.123}$$
The property that $e^{i\theta/2}$ is not a root of unity follows from the result
(a) and the
b) By “long division” we can prove that if $A(x)$ and $B(x)$ are
rational polynomials, then there exist rational polynomials $Q(x)$
and $R(x)$ such that
$$A(x) = B(x)Q(x) + R(x),$$
where the “remainder” $R(x)$ has degree less than the degree
of $B(x)$. Suppose that $a^n = 1$, and that $P(x)$ is a rational
polynomial of minimal degree such that $P(a) = 0$. Show that
there is a rational polynomial $Q(x)$ such that
$$x^n - 1 = P(x)Q(x). \tag{5.130}$$
c) Show that if $A(x)$ and $B(x)$ are both primitive integral
polynomials, then so is their product $C(x) = A(x)B(x)$. Hint: If
$C(x) = \sum_k c_k x^k$ is not primitive, then there is a prime
number $p$ that divides all of the $c_k$’s. Write $A(x) = \sum_l a_l x^l$, and
$B(x) = \sum_m b_m x^m$, let $a_r$ denote the coefficient of lowest order
in $A(x)$ that is not divisible by $p$ (which must exist if $A(x)$ is
primitive), and let $b_s$ denote the coefficient of lowest order in
$B(x)$ that is not divisible by $p$. Express the product $a_r b_s$ in
terms of $c_{r+s}$ and the other $a_l$’s and $b_m$’s, and reach a
contradiction.
d) Suppose that a monic integral polynomial P (x) can be factored
into a product of two monic rational polynomials, P (x) =
A(x)B(x). Show that A(x) and B(x) are integral. Hint:
First note that we may write A(x) = (1/r) · Ã(x), and
B(x) = (1/s) · B̃(x), where r, s are positive integers, and Ã(x)
and B̃(x) are primitive integral; then use (c) to show that
r = s = 1.
e) Combining (b) and (d), prove the theorem.
$U(1)$ subgroups are dense in the $SU(2)$ subgroup that contains both
$U_1$ and $U_2$.
Furthermore, products of
$$\Lambda(S)\, U_2^{-1} U_1\, \Lambda(S)^{-1} \quad \text{and} \quad \Lambda(S)\, U_1 U_2^{-1}\, \Lambda(S)^{-1} \tag{5.131}$$
and
Thus, the classical Toffoli gate does not need much help to unleash
the power of quantum computing. In fact, any nonclassical single-
qubit gate (one that does not preserve the computational basis),
combined with the Toffoli gate, is sufficient.
John Preskill
Institute for Quantum Information and Matter
California Institute of Technology
Preface
6 Quantum Algorithms
6.1 Some Quantum Algorithms
6.2 Periodicity
6.2.1 Finding the period
6.2.2 From FFT to QFT
6.3 Factoring
6.3.1 Factoring as period finding
6.3.2 RSA
6.4 Phase Estimation
6.5 Hidden Subgroup Problem
6.5.1 Discrete Log Problem
6.5.2 Diffie-Hellman key exchange
6.5.3 Finding abelian hidden subgroups
6.6 Quantum Searching
6.6.1 Generalized Search
6.7 The Grover Algorithm Is Optimal
6.8 Using quantum computers to simulate quantum physics
6.8.1 Simulating time evolution of local Hamiltonians
6.8.2 Estimating energy eigenvalues and preparing energy eigenstates
6.9 Classical simulation of slightly entangling quantum computations
6.10 QMA-completeness of the local Hamiltonian problem
6.10.1 3-SAT is NP-complete
6.10.2 Frustrated spin glasses
6.10.3 The quantum k-local Hamiltonian problem
6.10.4 Constructing and analyzing the Hamiltonian
This article forms one chapter of Quantum Information which will be first published by
Cambridge University Press.
© in the Work, John Preskill, 2020
NB: The copy of the Work, as displayed on this website, is a draft, pre-publication copy
only. The final, published version of the Work can be purchased through Cambridge
University Press and other standard distribution channels. This draft copy is made
available for personal use only and must not be sold or re-distributed.
Preface
This is the 6th chapter of my book Quantum Information, based on the course I have
been teaching at Caltech since 1997. An early version of this chapter has been available
on the course website since 1998, but this version is substantially revised and expanded.
This is a working draft of Chapter 6, which I will continue to update. See the URL
on the title page for further updates and drafts of other chapters. Please send an email
to [email protected] if you notice errors.
Eventually, the complete book will be published by Cambridge University Press. I
hesitate to predict the publication date — they have been far too patient with me.
6
Quantum Algorithms
(1) Nonexponential speedup. We can find quantum algorithms that are demonstra-
bly faster than the best classical algorithm, but not exponentially faster. These
algorithms shed no light on the conventional classification of complexity. But
they do demonstrate a type of separation between tasks that classical and quan-
tum computers can perform. Example: Grover’s quantum speedup of the search
of an unsorted data base.
(2) “Relativized” exponential speedup. We can consider the problem of analyzing
the contents of a “quantum black box.” The box performs an (a priori unknown)
unitary transformation. We can prepare an input for the box, and we can measure
its output; our task is to find out what the box does. It is possible to prove that
quantum black boxes (computer scientists call them oracles¹) exist with this
property: By feeding quantum superpositions to the box, we can learn what is
inside with an exponential speedup, compared to how long it would take if we
were only allowed classical inputs. A computer scientist would say that BPP ≠
BQP “relative to the oracle.” Example: Simon’s exponential quantum speedup
for finding the period of a 2-to-1 function.
(3) Exponential speedup for “apparently” hard problems. We can exhibit a
quantum algorithm that solves a problem in polynomial time, where the problem
appears to be hard classically, so that it is strongly suspected (though not proved)
that the problem is not in BPP. Example: Shor’s factoring algorithm.
Deutsch’s problem. We will discuss examples from all three approaches. But first,
we’ll warm up by recalling an example of a simple quantum algorithm that was previously
discussed in §1.5: Deutsch’s algorithm for distinguishing between constant and balanced
functions f : {0, 1} → {0, 1}. We are presented with a quantum black box that computes
$f(x)$; that is, it enacts the two-qubit unitary transformation
$$U_f : |x\rangle|y\rangle \to |x\rangle|y \oplus f(x)\rangle,$$
which flips the second qubit iff $f(\text{first qubit}) = 1$. Our assignment is to determine
whether $f(0) = f(1)$. If we are restricted to the “classical” inputs $|0\rangle$ and $|1\rangle$, we need
to access the box twice ($x = 0$ and $x = 1$) to get the answer. But if we are allowed to
input a coherent superposition of these “classical” states, then once is enough.
¹ The term “oracle” signifies that the box responds to a query immediately; that is, the time it takes the box
to operate is not included in the complexity analysis.
The quantum circuit that solves the problem (discussed in §1.5) is:
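[Circuit diagram: the first qubit starts in $|0\rangle$, passes through $H$, acts as the control input $x$ of $U_f$, passes through $H$ again, and is measured; the second qubit starts in $|1\rangle$ and passes through $H$ before entering $U_f$ as the target.]
Then when we measure the first qubit, we find the outcome $|0\rangle$ with probability one if
$f(0) = f(1)$ (constant function) and the outcome $|1\rangle$ with probability one if $f(0) \ne f(1)$
(balanced function).
A quantum computer enjoys an advantage over a classical computer because it can
invoke quantum parallelism. Because we input a superposition of $|0\rangle$ and $|1\rangle$, the output
is sensitive to both the values of $f(0)$ and $f(1)$, even though we ran the box just once.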
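Deutsch's circuit is small enough to simulate with explicit $4 \times 4$ matrices. The following NumPy sketch assumes the basis ordering $|x\rangle|y\rangle \mapsto$ index $2x + y$; the function names `oracle` and `deutsch` are illustrative.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)

def oracle(f):
    """U_f |x>|y> = |x>|y XOR f(x)> as a 4x4 permutation matrix."""
    U = np.zeros((4, 4))
    for x in (0, 1):
        for y in (0, 1):
            U[2 * x + (y ^ f(x)), 2 * x + y] = 1
    return U

def deutsch(f):
    """Probability that the first qubit is measured to be 1."""
    state = np.kron([1, 0], [0, 1])     # |0>|1>
    state = np.kron(H, H) @ state       # Hadamard on both qubits
    state = oracle(f) @ state           # a single query to the box
    state = np.kron(H, I2) @ state      # Hadamard on the first qubit
    return float(np.sum(np.abs(state[2:]) ** 2))

if __name__ == "__main__":
    functions = {
        "f(x) = 0": lambda x: 0,
        "f(x) = 1": lambda x: 1,
        "f(x) = x": lambda x: x,
        "f(x) = 1 - x": lambda x: 1 - x,
    }
    for name, f in functions.items():
        print(name, round(deutsch(f), 12))
```

The two constant functions give probability 0 for the outcome 1, and the two balanced functions give probability 1, using one call to `oracle` in each case.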
Deutsch–Jozsa problem. Now we’ll consider some generalizations of Deutsch’s
problem. We will continue to assume that we are to analyze a quantum black box
(“quantum oracle”). But in the hope of learning something about complexity, we will
imagine that we have a family of black boxes, with variable input size. We are interested
in how the time needed to find out what is inside the box scales with the size of the
input (where “time” is measured by how many times we query the box).
6.1 Some Quantum Algorithms 3
In the Deutsch–Jozsa problem, we are presented with a quantum black box that
computes a function taking $n$ bits to 1,
$$f : \{0,1\}^n \to \{0,1\}, \tag{6.6}$$
and we have it on good authority that $f$ is either constant ($f(x) = c$ for all $x$) or
balanced ($f(x) = 0$ for exactly $\frac{1}{2}$ of the possible input values). We are to solve the
decision problem: Is $f$ constant or balanced?
In fact, we can solve this problem, too, accessing the box only once, using the same
circuit as for Deutsch’s problem (but with $x$ expanded from one bit to $n$ bits). We note
that if we apply $n$ Hadamard gates in parallel to $n$ qubits,
$$H^{(n)} = H \otimes H \otimes \cdots \otimes H, \tag{6.7}$$
then the $n$-qubit state transforms as
$$H^{(n)} : |x\rangle \to \prod_{i=1}^{n} \left(\frac{1}{\sqrt{2}} \sum_{y_i \in \{0,1\}} (-1)^{x_i y_i} |y_i\rangle\right) \equiv \frac{1}{2^{n/2}} \sum_{y=0}^{2^n-1} (-1)^{x\cdot y} |y\rangle, \tag{6.8}$$
where $x$, $y$ represent $n$-bit strings, and $x \cdot y$ denotes the bitwise AND (or mod 2 scalar
product)
$$x \cdot y = (x_1 \wedge y_1) \oplus (x_2 \wedge y_2) \oplus \cdots \oplus (x_n \wedge y_n). \tag{6.9}$$
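Acting on the input $(|0\rangle)^n |1\rangle$, the action of the circuit is
$$\begin{aligned}
(|0\rangle)^n |1\rangle &\to \left(\frac{1}{2^{n/2}} \sum_{x=0}^{2^n-1} |x\rangle\right) \frac{1}{\sqrt{2}}\left(|0\rangle - |1\rangle\right) \\
&\to \left(\frac{1}{2^{n/2}} \sum_{x=0}^{2^n-1} (-1)^{f(x)} |x\rangle\right) \frac{1}{\sqrt{2}}\left(|0\rangle - |1\rangle\right) \\
&\to \left(\frac{1}{2^{n}} \sum_{x=0}^{2^n-1} \sum_{y=0}^{2^n-1} (-1)^{f(x)} (-1)^{x\cdot y} |y\rangle\right) \frac{1}{\sqrt{2}}\left(|0\rangle - |1\rangle\right)
\end{aligned} \tag{6.10}$$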
it vanishes unless $y = 0$. Hence, when we measure the $n$-bit register, we obtain the result
$|y = 0\rangle \equiv (|0\rangle)^n$ with probability one. But if the function is balanced, then for $y = 0$,
the sum becomes
$$\frac{1}{2^n} \sum_{x=0}^{2^n-1} (-1)^{f(x)} = 0, \tag{6.13}$$
(because half of the terms are $(+1)$ and half are $(-1)$). Therefore, the probability of
obtaining the measurement outcome $|y = 0\rangle$ is zero.
We conclude that one query of the quantum oracle suffices to distinguish constant and
balanced functions with 100% confidence. The measurement result $y = 0$ means constant;
any other result means balanced.
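The amplitude calculation in eq. (6.10) can be reproduced numerically. The sketch below tracks only the $n$-qubit query register and treats the oracle as the phase $(-1)^{f(x)}$, which is what the second line of (6.10) says the query does when the last qubit is in the state $\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$. The function names are illustrative, and $f$ is passed as a list of its $2^n$ values.

```python
import numpy as np
from functools import reduce

def hadamard_n(n):
    """The n-fold tensor product H^(n) as a 2^n x 2^n matrix."""
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    return reduce(np.kron, [H] * n)

def output_distribution(f_values):
    """Measurement probabilities over y after H^(n), one query, H^(n)."""
    N = len(f_values)
    n = N.bit_length() - 1            # assumes N is a power of two
    Hn = hadamard_n(n)
    state = np.zeros(N)
    state[0] = 1.0                    # register starts in (|0>)^n
    state = Hn @ state
    state = (-1.0) ** np.asarray(f_values) * state   # phase (-1)^f(x)
    state = Hn @ state
    return state ** 2                 # amplitudes are real here

if __name__ == "__main__":
    constant = [1] * 8
    balanced = [0, 1, 1, 0, 1, 0, 0, 1]
    print("P(y = 0), constant:", round(output_distribution(constant)[0], 12))
    print("P(y = 0), balanced:", round(output_distribution(balanced)[0], 12))
```

The constant function gives $P(y = 0) = 1$ and any balanced function gives $P(y = 0) = 0$, so a single query decides between the two cases.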
So quantum computation solves this problem neatly, but is the problem really hard
classically? If we are restricted to classical input states |xi, we can query the oracle
repeatedly, choosing the input x at random (without replacement) each time. Once we
obtain distinct outputs for two different queries, we have determined that the function
is balanced (not constant). But if the function is in fact constant, we will not be certain
it is constant until we have submitted $2^{n-1} + 1$ queries and have obtained the same
response every time. In contrast, the quantum computation gives a definite response in
only one go. So in this sense (if we demand absolute certainty) the classical calculation
requires a number of queries exponential in n, while the quantum computation does not,
and we might therefore claim an exponential quantum speedup.
But perhaps it is not reasonable to demand absolute certainty of the classical com-
putation (particularly since any real quantum computer will be susceptible to errors, so
that the quantum computer will also be unable to attain absolute certainty.) Suppose
we are satisfied to guess balanced or constant, with a probability of success
P (success) > 1 − ε. (6.14)
If the function is actually balanced, then if we make $k$ queries, the probability of getting
the same response every time is $p = 2^{-(k-1)}$. If after receiving the same response $k$
consecutive times we guess that the function is constant, then a quick Bayesian analysis
shows that the probability that our guess is wrong is $\frac{1}{2^{k-1}+1}$ (assuming that balanced
and constant are a priori equally probable). So if we guess after $k$ queries, the probability
of a wrong guess is
$$1 - P(\text{success}) = \frac{1}{2^{k-1}\left(2^{k-1}+1\right)}. \tag{6.15}$$
Therefore, we can achieve success probability $1 - \varepsilon$ for $\varepsilon^{-1} = 2^{k-1}(2^{k-1}+1)$ or
$k \sim \frac{1}{2}\log\frac{1}{\varepsilon}$. Since we can reach an exponentially good success probability with a polynomial
number of trials, it is not really fair to say that the problem is hard.
Bernstein–Vazirani problem. Exactly the same circuit can be used to solve another
variation on the Deutsch–Jozsa problem. Let’s suppose that our quantum black box
computes one of the functions $f_a$, where
$$f_a(x) = a \cdot x, \tag{6.16}$$
and a is an n-bit string. Our job is to determine a.
The quantum algorithm can solve this problem with certainty, given just one (n-qubit)
quantum query. For this particular function, the quantum state in eq. (6.10) becomes
$$\frac{1}{2^n} \sum_{x=0}^{2^n-1} \sum_{y=0}^{2^n-1} (-1)^{a\cdot x} (-1)^{x\cdot y} |y\rangle. \tag{6.17}$$
But in fact
$$\frac{1}{2^n} \sum_{x=0}^{2^n-1} (-1)^{a\cdot x} (-1)^{x\cdot y} = \delta_{a,y}, \tag{6.18}$$
so this state is $|a\rangle$. We can execute the circuit once and measure the $n$-qubit register,
finding the $n$-bit string $a$ with probability one.
If only classical queries are allowed, we acquire only one bit of information from each
query, and it takes n queries to determine the value of a. Therefore, we have a clear
separation between the quantum and classical difficulty of the problem. Even so, this
example does not probe the relation of BPP to BQP, because the classical problem is
not hard. The number of queries required classically is only linear in the input size, not
exponential.
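The contrast between one quantum query and $n$ classical queries can be made concrete. This Python sketch evaluates the amplitudes of eq. (6.17) by brute force and, separately, recovers $a$ classically by querying the unit strings; bit strings are encoded as integers, and the function names are illustrative assumptions.

```python
import numpy as np

def dot_mod2(a, x):
    """Mod 2 scalar product of two bit strings encoded as integers."""
    return bin(a & x).count("1") % 2

def bernstein_vazirani(a, n):
    """Amplitudes (1/2^n) sum_x (-1)^{a.x} (-1)^{x.y} for every y,
    and the most likely measurement outcome."""
    N = 2 ** n
    amps = np.array([
        sum((-1) ** (dot_mod2(a, x) ^ dot_mod2(x, y)) for x in range(N)) / N
        for y in range(N)
    ])
    return int(np.argmax(np.abs(amps))), amps

def classical_recover(f, n):
    """Recover a with n classical queries: f(e_i) is the i-th bit of a."""
    return sum(f(1 << i) << i for i in range(n))

if __name__ == "__main__":
    n, a = 4, 0b1011
    guess, amps = bernstein_vazirani(a, n)
    print("quantum outcome:", bin(guess), "amplitude:", amps[guess])
    print("classical result:", bin(classical_recover(lambda x: dot_mod2(a, x), n)))
```

The amplitude array is the Kronecker delta of eq. (6.18), so the single measurement returns $a$ with certainty, whereas `classical_recover` needs one query per bit.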
Simon’s problem. Bernstein and Vazirani managed to formulate a variation on the
above problem that is hard classically, and so establish for the first time a “relativized”
separation between quantum and classical complexity. We will find it more instructive
to consider a simpler example proposed somewhat later by Daniel Simon.
Once again we are presented with a quantum black box, and this time we are assured
that the box computes a function
$$f : \{0,1\}^n \to \{0,1\}^n, \tag{6.19}$$
that is 2-to-1. Furthermore, the function has a “period” given by the n-bit string a; that
is
$$f(x) = f(y) \quad \text{iff} \quad y = x \oplus a, \tag{6.20}$$
where here ⊕ denotes the bitwise XOR operation. (So a is the period if we regard x
as taking values in $(\mathbb{Z}_2)^n$ rather than $\mathbb{Z}_{2^n}$.) This is all we know about f. Our job is to
determine the value of a.
Classically this problem is hard. We need to query the oracle an exponentially large
number of times to have any reasonable probability of finding a. We don’t learn anything
until we are fortunate enough to choose two queries x and y that happen to satisfy
x ⊕ y = a. Suppose, for example, that we choose $2^{n/4}$ queries. The number of pairs of
queries is less than $(2^{n/4})^2$, and for each pair {x, y}, the probability that x ⊕ y = a is
$2^{-n}$. Therefore, the probability of successfully finding a is less than
$$2^{-n}(2^{n/4})^2 = 2^{-n/2}; \tag{6.21}$$
even with exponentially many queries, the success probability is exponentially small.
If we wish, we can frame the question as a decision problem: Either f is a 1-to-1 function,
or it is 2-to-1 with some randomly chosen period a, each occurring with an a priori
probability $\frac{1}{2}$. We are to determine whether the function is 1-to-1 or 2-to-1. Then, after
$2^{n/4}$ classical queries, our probability of making a correct guess is
$$P(\text{success}) < \frac{1}{2} + \frac{1}{2^{n/2}}, \tag{6.22}$$
which does not remain bounded away from $\frac{1}{2}$ as n gets large.
But with quantum queries the problem is easy! The circuit we use is essentially the
same as above, but now both registers are expanded to n qubits. We prepare the equally
weighted superposition of all n-bit strings (by acting on $|0\rangle$ with $H^{(n)}$), and then we
query the oracle:
$$U_f : \left(\sum_{x=0}^{2^n-1}|x\rangle\right)|0\rangle \to \sum_{x=0}^{2^n-1}|x\rangle|f(x)\rangle. \tag{6.23}$$
Now we measure the second register. (This step is not actually necessary, but I include
it here for the sake of pedagogical clarity.) The measurement outcome is selected at
random from the $2^{n-1}$ possible values of f(x), each occurring equiprobably. Suppose the
outcome is $f(x_0)$. Then because both $x_0$ and $x_0 \oplus a$, and only these values, are mapped
by f to $f(x_0)$, we have prepared the state
$$\frac{1}{\sqrt{2}}\left(|x_0\rangle + |x_0 \oplus a\rangle\right) \tag{6.24}$$
in the first register.
Now we want to extract some information about a. Clearly it would do us no good to
measure the register (in the computational basis) at this point. We would obtain either
the outcome $x_0$ or $x_0 \oplus a$, each occurring with probability $\frac{1}{2}$, but neither outcome would
reveal anything about the value of a.
But suppose we apply the Hadamard transform $H^{(n)}$ to the register before we measure:
$$\begin{aligned}
H^{(n)} : \frac{1}{\sqrt{2}}\left(|x_0\rangle + |x_0 \oplus a\rangle\right)
&\to \frac{1}{2^{(n+1)/2}}\sum_{y=0}^{2^n-1}\left[(-1)^{x_0\cdot y} + (-1)^{(x_0\oplus a)\cdot y}\right]|y\rangle \\
&= \frac{1}{2^{(n-1)/2}}\sum_{a\cdot y=0}(-1)^{x_0\cdot y}\,|y\rangle.
\end{aligned} \tag{6.25}$$
If a · y = 1, then the terms in the coefficient of $|y\rangle$ interfere destructively. Hence only
states $|y\rangle$ with a · y = 0 survive in the sum over y. The measurement outcome, then, is
selected at random from all possible values of y such that a · y = 0, each occurring with
probability $2^{-(n-1)}$.
We run this algorithm repeatedly, each time obtaining another value of y satisfying
y · a = 0. Once we have found n − 1 such linearly independent values $\{y_1, y_2, \ldots, y_{n-1}\}$ (that
is, linearly independent over $\mathbb{Z}_2$; the constraint y · a = 0 allows no more than n − 1), we can
solve the equations
$$\begin{aligned}
y_1 \cdot a &= 0 \\
y_2 \cdot a &= 0 \\
&\ \,\vdots \\
y_{n-1} \cdot a &= 0,
\end{aligned} \tag{6.26}$$
to determine a unique nonzero value of a, and our problem is solved. It is easy to see that with
O(n) repetitions, we can attain a success probability that is exponentially close to 1.
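The classical post-processing in Simon's algorithm is linear algebra over $\mathbb{Z}_2$. The sketch below is a classical stand-in, not a quantum simulation: `sample_y` simply draws a uniform y with a · y = 0, which is what the text says each run of the circuit returns. Because these y fill an (n − 1)-dimensional subspace, n − 1 independent samples pin down the nonzero a. All function names and the choice to store bit strings as Python integers are assumptions for illustration.

```python
import random

def dot2(a, y):
    """Bitwise inner product mod 2."""
    return bin(a & y).count("1") % 2

def sample_y(a, n, rng):
    """Classical stand-in for one run of the circuit: uniform y with a.y = 0."""
    while True:
        y = rng.randrange(2 ** n)
        if dot2(a, y) == 0:
            return y

def add_to_basis(basis, y):
    """Keep a GF(2) basis as {pivot bit: vector}; return True if y was independent."""
    for pivot in sorted(basis, reverse=True):
        if (y >> pivot) & 1:
            y ^= basis[pivot]
    if y == 0:
        return False
    basis[y.bit_length() - 1] = y
    return True

def solve_for_a(basis, n):
    """Given n-1 independent constraints y.a = 0, return the nonzero solution a."""
    rows = dict(basis)
    pivots = sorted(rows)
    for p in pivots:                       # bring the rows to reduced echelon form
        for q in pivots:
            if q != p and (rows[q] >> p) & 1:
                rows[q] ^= rows[p]
    free = next(b for b in range(n) if b not in rows)   # the single free column
    a = 1 << free
    for p in pivots:
        if (rows[p] >> free) & 1:
            a |= 1 << p
    return a

def simon_postprocess(a_hidden, n, seed=0):
    """Repeat the sampling until n-1 independent y are found, then solve for a."""
    rng = random.Random(seed)
    basis, runs = {}, 0
    while len(basis) < n - 1:
        add_to_basis(basis, sample_y(a_hidden, n, rng))
        runs += 1
    return solve_for_a(basis, n), runs
```

For example, `simon_postprocess(0b101101, 6)` returns the hidden string together with the number of simulated runs, which is typically only slightly larger than n.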
So we finally have found an example where, given a particular type of quantum oracle,
we can solve a problem in polynomial time by exploiting quantum superpositions, while
exponential time is required if we are limited to classical queries. As a computer scientist
might put it:
6.2 Periodicity
So far, the one case for which we have found an exponential separation between the speed
of a quantum algorithm and the speed of the corresponding classical algorithm is the
case of Simon’s problem. Simon’s algorithm exploits quantum parallelism to speed up
the search for the period of a function. Its success encourages us to seek other quantum
algorithms designed for other kinds of period finding.
Simon studied periodic functions taking values in $(\mathbb{Z}_2)^n$. For that purpose the n-bit
Hadamard transform $H^{(n)}$ was a powerful tool. If we wish instead to study periodic functions
taking values in $\mathbb{Z}_{2^n}$, the (discrete) Fourier transform will be a tool of comparable
power.
The moral of Simon’s problem is that, while finding needles in a haystack may be
difficult, finding periodically spaced needles in a haystack can be far easier. For example,
if we scatter a photon off of a periodic array of needles, the photon is likely to be scattered
in one of a set of preferred directions, where the Bragg scattering condition is satisfied.
These preferred directions depend on the spacing between the needles, so by scattering
just one photon, we can already collect some useful information about the spacing. We
should further explore the implications of this metaphor for the construction of efficient
quantum algorithms.
So imagine a quantum oracle that computes a function f with an unknown period r,
where r is a positive integer satisfying
$$1 \ll r \ll 2^n. \tag{6.28}$$
That is,
$$f(x) = f(x + mr),$$
where m is any integer such that x and x + mr lie in $\{0, 1, 2, \ldots, 2^n - 1\}$. We are to find
the period r. Classically, this problem is hard. If r is, say, of order $2^{n/2}$, we will need to
query the oracle of order $2^{n/4}$ times before we are likely to find two values of x that are
mapped to the same value of f(x), and hence learn something about r. But we will see
that there is a quantum algorithm that finds r in time poly(n).
Even if we know how to compute efficiently the function f (x), it may be a hard
problem to determine its period. Our quantum algorithm can be applied to finding, in
poly(n) time, the period of any function that we can compute in poly(n) time. Efficient
period finding allows us to efficiently solve a variety of (apparently) hard problems, such
as factoring an integer, or evaluating a discrete logarithm.
The key idea underlying quantum period finding is that the Fourier transform can
be evaluated by an efficient quantum circuit (as discovered by Peter Shor). The quan-
tum Fourier transform (QFT) exploits the power of quantum parallelism to achieve an
exponential speedup of the well-known (classical) fast Fourier transform (FFT). Since
the FFT has such a wide variety of applications, perhaps the QFT will also come into
widespread use someday.
where $N = 2^n$. For now let’s suppose that we can perform the QFT efficiently, and see
how it enables us to extract the period of f(x).
Emulating Simon’s algorithm, we first query the oracle with the input $\frac{1}{\sqrt{N}}\sum_x |x\rangle$.
Then we measure the output register, obtaining the result $|f(x_0)\rangle$ for some $0 \le x_0 < r$.
This measurement prepares in the input register the coherent superposition of the A
values of x that are mapped to $f(x_0)$:
$$\frac{1}{\sqrt{A}}\sum_{j=0}^{A-1}|x_0 + jr\rangle, \tag{6.32}$$
where
$$N - r \le x_0 + (A-1)r < N, \tag{6.33}$$
or
$$A - 1 < \frac{N}{r} < A + 1. \tag{6.34}$$
Actually, the measurement of the output register is unnecessary. If it is omitted, the state
of the input register is an incoherent superposition (summed over $x_0 \in \{0, 1, \ldots, r-1\}$)
of states of the form eq. (6.32). The rest of the algorithm works just as well acting on
this initial state.
Now our task is to extract the value of r from the state eq. (6.32) that we have
prepared. Were we to measure the input register by projecting onto the computational
basis at this point, we would learn nothing about r. Instead (cf. Simon’s algorithm), we
should Fourier transform first and then measure.
By applying the QFT to the state eq. (6.32) we obtain
$$\frac{1}{\sqrt{NA}}\sum_{y=0}^{N-1} e^{2\pi i x_0 y/N}\sum_{j=0}^{A-1} e^{2\pi i jry/N}\,|y\rangle. \tag{6.35}$$
If we now measure in the computational basis, the probability of obtaining the outcome
y is
$$\text{Prob}(y) = \frac{A}{N}\left|\frac{1}{A}\sum_{j=0}^{A-1} e^{2\pi i jry/N}\right|^2. \tag{6.36}$$
This distribution strongly favors values of y such that yr/N is close to an integer. For
example, if N/r happened to be an integer (and therefore equal to A), we would have
$$\text{Prob}(y) = \frac{1}{r}\left|\frac{1}{A}\sum_{j=0}^{A-1} e^{2\pi i jy/A}\right|^2 =
\begin{cases} \frac{1}{r} & y = A \cdot (\text{integer}) \\ 0 & \text{otherwise.} \end{cases} \tag{6.37}$$
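More generally, the sum over j in eq. (6.36) is a geometric series in $e^{i\theta_y}$, where
$$\theta_y = \frac{2\pi\, yr\ (\mathrm{mod}\ N)}{N}. \tag{6.39}$$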
There are precisely r values of y in $\{0, 1, \ldots, N-1\}$ that satisfy
$$-\frac{r}{2} \le yr\ (\mathrm{mod}\ N) \le \frac{r}{2}. \tag{6.40}$$
(To see this, imagine marking the multiples of r and N on a number line ranging from
0 to rN − 1. For each multiple of N, there is a multiple of r no more than distance r/2
away.) For each of these values, the corresponding $\theta_y$ satisfies
$$-\pi\frac{r}{N} \le \theta_y \le \pi\frac{r}{N}. \tag{6.41}$$
Now, since $A - 1 < \frac{N}{r}$, for these values of $\theta_y$ all of the terms in the sum over j in eq.
(6.38) lie in the same half-plane, so that the terms interfere constructively and the sum
is substantial.
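We know that
$$|1 - e^{i\theta}| \le |\theta|, \tag{6.42}$$
because the straight-line distance from the origin is less than the arc length along the
circle, and for $A|\theta| \le \pi$, we know that
$$|1 - e^{iA\theta}| \ge \frac{2A|\theta|}{\pi}, \tag{6.43}$$
because we can see (either graphically or by evaluating its derivative) that this distance
is a convex function. We actually have $A < \frac{N}{r} + 1$, and hence $A\theta_y < \pi\left(1 + \frac{r}{N}\right)$, but by
applying the above bound to
$$\left|\frac{e^{i(A-1)\theta} - 1}{e^{i\theta} - 1} + e^{i(A-1)\theta}\right| \ge \left|\frac{e^{i(A-1)\theta} - 1}{e^{i\theta} - 1}\right| - 1, \tag{6.44}$$
we can still conclude, up to a correction of relative order 1/A, that
$$\text{Prob}(y) \ge \frac{4}{\pi^2}\,\frac{1}{r}$$
for each of the r values of y that satisfy eq. (6.40). Therefore, with a probability of at
least $4/\pi^2$, the measured value of y will satisfy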
$$k\frac{N}{r} - \frac{1}{2} \le y \le k\frac{N}{r} + \frac{1}{2}, \tag{6.47}$$
or
$$\frac{k}{r} - \frac{1}{2N} \le \frac{y}{N} \le \frac{k}{r} + \frac{1}{2N}, \tag{6.48}$$
where k is an integer chosen from $\{0, 1, \ldots, r-1\}$. The output of the computation is
reasonably likely to be within distance 1/2 of an integer multiple of N/r.
Suppose that we know that $r < M \ll N$. Thus N/r is a rational number with a
denominator less than M. Two distinct rational numbers, each with denominator less
than M, can be no closer together than $1/M^2$, since $\frac{a}{b} - \frac{c}{d} = \frac{ad - bc}{bd}$. If the measurement
outcome y satisfies eq. (6.47), then there is a unique value of k/r (with r < M) determined
by y/N, provided that $N \ge M^2$. This value of k/r can be efficiently extracted
from the measured y/N, by the continued fraction method.
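A small classical experiment can make the last two steps concrete: evaluate Prob(y) from eq. (6.36) on a toy instance, then recover k/r from a measured y. In this sketch the standard-library method `Fraction.limit_denominator` stands in for the continued fraction method (it returns the closest fraction with a bounded denominator). The function names and the values N = 2048, r = 12, M = 45 are hypothetical choices made only for illustration.

```python
import cmath
from fractions import Fraction
from math import pi

def prob(y, N, r, x0=0):
    """Prob(y) of eq. (6.36) for a function of period r on {0, ..., N-1}."""
    A = len(range(x0, N, r))              # number of x = x0 + j*r below N
    total = sum(cmath.exp(2j * pi * j * r * y / N) for j in range(A))
    return abs(total) ** 2 / (N * A)      # (A/N) * |total / A|^2

def candidate_period(y, N, M):
    """Denominator of the fraction closest to y/N with denominator < M."""
    return Fraction(y, N).limit_denominator(M - 1).denominator

N, r, M = 2048, 12, 45                    # toy values with r < M and M*M <= N
peaks = [round(k * N / r) for k in range(r)]     # the y closest to each k*N/r
peak_weight = sum(prob(y, N, r) for y in peaks)  # compare with 4 / pi^2
```

Here y = 853 is the integer nearest to 5N/r, and `candidate_period(853, N, M)` recovers the denominator 12 because 5 and 12 are coprime. When k and r share a factor, the same call returns only a divisor of r, as the text goes on to discuss.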
Now, with probability exceeding $4/\pi^2$, we have found a value of k/r where k is selected
(roughly equiprobably) from $\{0, 1, 2, \ldots, r-1\}$. It is reasonably likely that k and r are
relatively prime (have no common factor), so that we have succeeded in finding r. With
a query of the oracle, we may check whether f(x) = f(x + r). But if GCD(k, r) ≠ 1, we
have found only a factor ($r_1$) of r.
If we did not succeed, we could test some nearby values of y (the measured value
might have been close to the range −r/2 ≤ yr (mod N) ≤ r/2 without actually lying
inside), or we could try a few multiples of $r_1$ (the value of GCD(k, r), if not 1, is probably
not large). That failing, we resort to a repetition of the quantum circuit, this time (with
probability at least $4/\pi^2$) obtaining a value $k'/r$. Now $k'$, too, may have a common
factor with r, in which case our procedure again determines a factor ($r_2$) of r. But it is
reasonably likely that GCD($k$, $k'$) = 1, in which case r = LCM($r_1$, $r_2$). Indeed, we can
estimate the probability that randomly selected $k$ and $k'$ are relatively prime as follows:
Since a prime number p divides a fraction 1/p of all numbers, the probability that p
divides both $k$ and $k'$ is $1/p^2$. And $k$ and $k'$ are coprime if and only if there is no prime
p that divides both. Therefore,
$$\text{Prob}(k, k'\ \text{coprime}) = \prod_{\text{prime}\ p}\left(1 - \frac{1}{p^2}\right) = \frac{1}{\zeta(2)} = \frac{6}{\pi^2} \simeq .607 \tag{6.49}$$
(where ζ(z) denotes the Riemann zeta function). Therefore, we are likely to succeed in
finding the period r after some constant number (independent of N) of repetitions of
the algorithm.
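The estimate in eq. (6.49) is easy to test empirically. The following sketch counts coprime pairs among the first few integers; the function name and the cutoff of 100 are arbitrary choices for illustration, and the agreement is only approximate at finite cutoff.

```python
from math import gcd, pi

def coprime_fraction(limit: int) -> float:
    """Fraction of pairs (k, k') with 1 <= k, k' <= limit that are coprime."""
    hits = sum(
        1
        for k in range(1, limit + 1)
        for kp in range(1, limit + 1)
        if gcd(k, kp) == 1
    )
    return hits / limit ** 2

estimate = coprime_fraction(100)      # compare with 6 / pi^2
```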
well-known and very useful (classical) procedure that reduces the number of operations
to O(N log N). Assuming $N = 2^n$, we express x and y as binary expansions
$$\begin{aligned}
x &= x_{n-1}\cdot 2^{n-1} + x_{n-2}\cdot 2^{n-2} + \ldots + x_1\cdot 2 + x_0 \\
y &= y_{n-1}\cdot 2^{n-1} + y_{n-2}\cdot 2^{n-2} + \ldots + y_1\cdot 2 + y_0.
\end{aligned} \tag{6.51}$$
In the product of x and y, we may discard any terms containing n or more powers of 2,
as these make no contribution to $e^{2\pi i xy/2^n}$. Hence
$$\begin{aligned}
\frac{xy}{2^n} \equiv{}& y_{n-1}(.x_0) + y_{n-2}(.x_1x_0) + y_{n-3}(.x_2x_1x_0) + \ldots \\
&+ y_1(.x_{n-2}x_{n-3}\ldots x_0) + y_0(.x_{n-1}x_{n-2}\ldots x_0),
\end{aligned} \tag{6.52}$$
where the factors in parentheses are binary expansions; e.g.,
$$.x_2x_1x_0 = \frac{x_2}{2} + \frac{x_1}{2^2} + \frac{x_0}{2^3}. \tag{6.53}$$
We can now evaluate
$$\tilde{f}(x) = \frac{1}{\sqrt{N}}\sum_y e^{2\pi i xy/N} f(y), \tag{6.54}$$
for each of the N values of x. But the sum over y factors into n sums over $y_k = 0, 1$,
which can be done sequentially in a time of order n.
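With quantum parallelism, we can do far better. From eq. (6.52) we obtain
$$\begin{aligned}
\text{QFT} : |x\rangle &\to \frac{1}{\sqrt{N}}\sum_y e^{2\pi i xy/N}\,|y\rangle \\
&= \frac{1}{\sqrt{2^n}}\left(|0\rangle + e^{2\pi i(.x_0)}|1\rangle\right)\left(|0\rangle + e^{2\pi i(.x_1x_0)}|1\rangle\right) \\
&\qquad \ldots \left(|0\rangle + e^{2\pi i(.x_{n-1}x_{n-2}\ldots x_0)}|1\rangle\right).
\end{aligned} \tag{6.55}$$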
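The factorization in eq. (6.55) can be verified numerically for small n. In this sketch (function names are hypothetical), the factor attached to bit $y_k$ carries the binary fraction $(x \bmod 2^{n-k})/2^{n-k}$, which is the expansion $.x_{n-1-k}\ldots x_0$ appearing in eq. (6.52).

```python
import cmath
from math import pi, sqrt

def qft_direct(x: int, n: int) -> list:
    """Amplitudes of QFT|x> from the defining sum over y."""
    N = 2 ** n
    return [cmath.exp(2j * pi * x * y / N) / sqrt(N) for y in range(N)]

def qft_product(x: int, n: int) -> list:
    """Amplitudes of QFT|x> from the product form of eq. (6.55)."""
    N = 2 ** n
    amps = []
    for y in range(N):
        amp = 1 / sqrt(N)
        for k in range(n):
            y_k = (y >> k) & 1                            # k-th bit of y
            frac = (x % 2 ** (n - k)) / 2 ** (n - k)      # (.x_{n-1-k} ... x_0)
            amp *= cmath.exp(2j * pi * y_k * frac)
        amps.append(amp)
    return amps
```

Because each bit of y contributes an independent phase factor, the output state is a product of n single-qubit states, which is the observation the circuit construction relies on.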
The QFT takes each computational basis state to an unentangled state of n qubits;
thus we anticipate that it can be efficiently implemented. Indeed, let’s consider the case
n = 3. We can readily see that the circuit

|x_2⟩ --H--R_1--R_2-------------- |y_0⟩
|x_1⟩ ------●--------H--R_1------ |y_1⟩
|x_0⟩ -----------●-------●----H-- |y_2⟩

does the job (but note that the order of the bits has been reversed in the output). Each
Hadamard gate acts as
$$H : |x_k\rangle \to \frac{1}{\sqrt{2}}\left(|0\rangle + e^{2\pi i(.x_k)}|1\rangle\right). \tag{6.56}$$
The other contributions to the relative phase of $|0\rangle$ and $|1\rangle$ in the kth qubit are provided
by the two-qubit conditional rotations, where
$$R_d = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/2^d} \end{pmatrix}, \tag{6.57}$$

|x_2⟩ --H---●----●--------------- |y_0⟩
|x_1⟩ -----R_1-------H---●------- |y_1⟩
|x_0⟩ ----------R_2-----R_1---H-- |y_2⟩

Once we have measured $|y_0\rangle$, we know the value of the control bit in the controlled-$R_1$
gate that acted on the first two qubits. Therefore, we will obtain the same probability
distribution of measurement outcomes if, instead of applying controlled-$R_1$ and then
measuring, we instead measure $y_0$ first, and then apply $(R_1)^{y_0}$ to the next qubit, conditioned
on the outcome of the measurement of the first qubit. Similarly, we can replace
the controlled-$R_1$ and controlled-$R_2$ gates acting on the third qubit by the single qubit
rotation
$$(R_2)^{y_0}(R_1)^{y_1}, \tag{6.58}$$
(that is, a rotation with relative phase $\pi(.y_1y_0)$) after the values of $y_1$ and $y_0$ have been
measured.
Altogether then, if we are going to measure after performing the QFT, only n
Hadamard gates and n − 1 single-qubit rotations are needed to implement it. The QFT
is remarkably simple!
6.3 Factoring
6.3.1 Factoring as period finding
What does the factoring problem (finding the prime factors of a large composite positive
integer) have to do with periodicity? There is a well-known (randomized) reduction of
factoring to determining the period of a function. Although this reduction is not directly
related to quantum computing, we will discuss it here for completeness, and because the
prospect of using a quantum computer as a factoring engine has generated so much
excitement.
Suppose we want to find a factor of the n-bit number N. Select pseudo-randomly
a < N, and compute the greatest common divisor GCD(a, N), which can be done
efficiently (in a time of order $(\log N)^3$) using the Euclidean algorithm. If GCD(a, N) ≠ 1
then the GCD is a nontrivial factor of N, and we are done. So suppose GCD(a, N) = 1.
[Aside: The Euclidean algorithm. To compute GCD($N_1$, $N_2$) (for $N_1 > N_2$) first
divide $N_1$ by $N_2$ obtaining remainder $R_1$. Then divide $N_2$ by $R_1$, obtaining
remainder $R_2$. Divide $R_1$ by $R_2$, etc. until the remainder is 0. The last nonzero
remainder is R = GCD($N_1$, $N_2$). To see that the algorithm works, just note that
(1) R divides all previous remainders and hence also $N_1$ and $N_2$, and (2) any
number that divides $N_1$ and $N_2$ will also divide all remainders, including R. A
number that divides both $N_1$ and $N_2$, and also is divided by any number that
divides both $N_1$ and $N_2$ must be GCD($N_1$, $N_2$). To see how long the Euclidean
algorithm takes, note that
$$R_j = qR_{j+1} + R_{j+2}, \tag{6.59}$$
where q ≥ 1 and $R_{j+2} < R_{j+1}$; therefore $R_{j+2} < \frac{1}{2}R_j$. Two divisions reduce
the remainder by at least a factor of 2, so no more than $2\log N_1$ divisions are
required, with each division using $O((\log N)^2)$ elementary operations; the total
number of operations is $O((\log N)^3)$.]
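The procedure in the aside translates directly into a short loop. This is a minimal sketch; the function name and the division counter are additions for illustration, so that the bound of $2\log N_1$ divisions (taking the logarithm base 2) can be observed on an example.

```python
from math import log2

def gcd_euclid(n1: int, n2: int):
    """Return (GCD(n1, n2), number of divisions performed)."""
    divisions = 0
    while n2 != 0:
        n1, n2 = n2, n1 % n2   # replace the pair by (divisor, remainder)
        divisions += 1
    return n1, divisions

g, steps = gcd_euclid(1071, 462)   # remainders: 147, 21, 0
```

In this example the last nonzero remainder is 21 and three divisions are used, well below the bound.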
The numbers a < N coprime to N (having no common factor with N) form a finite
group under multiplication mod N. [Why? We need to establish that each element a has
an inverse. But for given a < N coprime to N, each ab (mod N) is distinct, as b ranges
over all b < N coprime to N. (If N divides $ab - ab'$, it must divide $b - b'$.) Therefore,
for some b, we must have ab ≡ 1 (mod N); hence the inverse of a exists.] Each element
a of this finite group has a finite order r, the smallest positive integer such that
$$a^r \equiv 1 \pmod{N}. \tag{6.60}$$
The order of a mod N is the period of the function
$$f_{N,a}(x) = a^x \pmod{N}. \tag{6.61}$$
We know there is an efficient quantum algorithm that can find the period of a function;
therefore, if we can compute $f_{N,a}$ efficiently, we can find the order of a efficiently.
Computing $f_{N,a}$ may look difficult at first, since the exponent x can be very large.
But if $x < 2^m$ and we express x as a binary expansion
$$x = x_{m-1}\cdot 2^{m-1} + x_{m-2}\cdot 2^{m-2} + \ldots + x_0, \tag{6.62}$$
we have
$$a^x \pmod{N} = (a^{2^{m-1}})^{x_{m-1}}(a^{2^{m-2}})^{x_{m-2}}\ldots(a)^{x_0} \pmod{N}. \tag{6.63}$$
Each $a^{2^j}$ has a large exponent, but can be computed efficiently by a classical computer,
using repeated squaring
$$a^{2^j} \pmod{N} = (a^{2^{j-1}})^2 \pmod{N}. \tag{6.64}$$
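Equations (6.63) and (6.64) amount to the familiar square-and-multiply loop. The sketch below is a classical illustration with a hypothetical function name; Python's built-in three-argument `pow` computes the same quantity and is used only as a cross-check.

```python
def mod_exp(a: int, x: int, N: int) -> int:
    """Compute a^x (mod N) by scanning the bits x_0, x_1, ... of the exponent."""
    result = 1
    power = a % N                 # holds a^(2^j) mod N, starting with j = 0
    while x:
        if x & 1:                 # bit x_j = 1: multiply by a^(2^j), eq. (6.63)
            result = result * power % N
        power = power * power % N  # repeated squaring, eq. (6.64)
        x >>= 1
    return result
```

The loop runs once per bit of x, so an m-bit exponent costs about m squarings and at most m multiplications mod N.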
INPUT 1
For j = 0 to m − 1, if $x_j = 1$, MULTIPLY $a^{2^j}$.

|x_2⟩ ---------------●---
|x_1⟩ ---------●---------
|x_0⟩ ---●---------------
|1⟩   ---a-----a^2---a^4-

Multiplication by $a^{2^j}$ is performed if the control qubit $x_j$ has the value 1.
Suppose we have found the period r of a mod N. Then if r is even, we have
$$N \ \text{divides}\ \left(a^{r/2} + 1\right)\left(a^{r/2} - 1\right). \tag{6.65}$$
We know that N does not divide $a^{r/2} - 1$; if it did, the order of a would be ≤ r/2. Thus,
if it is also the case that N does not divide $a^{r/2} + 1$, or
$$a^{r/2} \not\equiv -1 \pmod{N}, \tag{6.66}$$
then N must have a nontrivial common factor with each of $a^{r/2} \pm 1$. Therefore,
GCD(N, $a^{r/2} + 1$) ≠ 1 is a factor (that we can find efficiently by a classical computation),
and we are done.
We see that, once we have found r, we succeed in factoring N unless either (1) r is
odd or (2) r is even and $a^{r/2} \equiv -1 \pmod{N}$. How likely is success?
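The classical part of the reduction can be sketched as follows. The brute-force `order` function is only a stand-in for the quantum period-finding step and takes time exponential in the number of bits of N; the names `order` and `factor_from_period` are hypothetical.

```python
from math import gcd

def order(a: int, N: int) -> int:
    """Smallest r > 0 with a^r = 1 (mod N); brute-force stand-in for period finding."""
    r, v = 1, a % N
    while v != 1:
        v = v * a % N
        r += 1
    return r

def factor_from_period(a: int, N: int):
    """Return a nontrivial factor of N, or None if this choice of a fails."""
    g = gcd(a, N)
    if g != 1:
        return g                    # lucky: a already shares a factor with N
    r = order(a, N)
    if r % 2 == 1:
        return None                 # failure case (1): r is odd
    half = pow(a, r // 2, N)
    if half == N - 1:
        return None                 # failure case (2): a^(r/2) = -1 (mod N)
    return gcd(half + 1, N)
```

For N = 15 and a = 7 the order is 4, $a^{r/2} = 4$, and GCD(5, 15) = 5 is returned; for a = 14 the order is 2 and $a^{r/2} \equiv -1$, so the attempt fails and another a must be tried.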
Let’s suppose that N is a product of two prime factors $p_1 \ne p_2$,
$$N = p_1p_2 \tag{6.67}$$
(this is actually the least favorable case). For each $a < p_1p_2$, there exist unique $a_1 < p_1$
and $a_2 < p_2$ such that
$$\begin{aligned}
a &\equiv a_1 \pmod{p_1} \\
a &\equiv a_2 \pmod{p_2}.
\end{aligned} \tag{6.68}$$
Choosing a random a < N is, therefore, equivalent to choosing random $a_1 < p_1$ and
$a_2 < p_2$.
[Aside: We’re using the Chinese Remainder Theorem. The a solving eq. (6.68) is
unique because if a and b are both solutions, then both $p_1$ and $p_2$ must divide
a − b. The solution exists because every $a < p_1p_2$ solves eq. (6.68) for some $a_1$
and $a_2$. Since there are exactly $p_1p_2$ ways to choose $a_1$ and $a_2$, and exactly $p_1p_2$
ways to choose a, uniqueness implies that there is an a corresponding to each
pair $a_1$, $a_2$.]
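Now let $r_1$ denote the order of $a_1$ mod $p_1$ and $r_2$ denote the order of $a_2$ mod $p_2$. The
Chinese remainder theorem tells us that $a^r \equiv 1 \pmod{p_1p_2}$ is equivalent to
$$\begin{aligned}
a_1^r &\equiv 1 \pmod{p_1} \\
a_2^r &\equiv 1 \pmod{p_2}.
\end{aligned} \tag{6.69}$$
Therefore r = LCM($r_1$, $r_2$). If $r_1$ and $r_2$ are both odd, then so is r, and we lose.
But if either $r_1$ or $r_2$ is even, then so is r, and we are still in the game. If
$$\begin{aligned}
a^{r/2} &\equiv -1 \pmod{p_1} \\
a^{r/2} &\equiv -1 \pmod{p_2},
\end{aligned} \tag{6.70}$$
then $a^{r/2} \equiv -1 \pmod{p_1p_2}$ and we lose. But if either
$$\begin{aligned}
a^{r/2} &\equiv -1 \pmod{p_1} \\
a^{r/2} &\equiv 1 \pmod{p_2},
\end{aligned} \tag{6.71}$$
or
$$\begin{aligned}
a^{r/2} &\equiv 1 \pmod{p_1} \\
a^{r/2} &\equiv -1 \pmod{p_2},
\end{aligned} \tag{6.72}$$
then $a^{r/2} \not\equiv -1 \pmod{p_1p_2}$ and we win. (Of course, $a^{r/2} \equiv 1 \pmod{p_1}$ and $a^{r/2} \equiv
1 \pmod{p_2}$ is not possible, for that would imply $a^{r/2} \equiv 1 \pmod{p_1p_2}$, and r could not
be the order of a.)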
Suppose that
$$\begin{aligned}
r_1 &= 2^{c_1}\cdot \text{odd} \\
r_2 &= 2^{c_2}\cdot \text{odd},
\end{aligned} \tag{6.73}$$
where $c_1 > c_2$. Then $r = \text{LCM}(r_1, r_2) = 2r_2 \cdot \text{integer}$, so that $a^{r/2} \equiv 1 \pmod{p_2}$ and
eq. (6.71) is satisfied: we win! Similarly $c_2 > c_1$ implies eq. (6.72), and again we win. But
for $c_1 = c_2$, $r = r_1 \cdot (\text{odd}) = r_2 \cdot (\text{odd}')$ so that eq. (6.70) is satisfied; in that case we
lose.
Okay, it comes down to: for $c_1 = c_2$ we lose, for $c_1 \ne c_2$ we win. How likely is $c_1 \ne c_2$?
It helps to know that the multiplicative group mod p is cyclic: it contains a primitive
element of order p − 1, so that all elements are powers of the primitive element. [Why?
The integers mod p are a finite field. If the group were not cyclic, the maximum order
of the elements would be q < p − 1, so that $x^q \equiv 1 \pmod{p}$ would have p − 1 solutions.
But that can’t be: in a finite field there are no more than q qth roots of unity.]
Suppose that $p - 1 = 2^k \cdot s$, where s is odd, and consider the orders of all the elements
of the cyclic group of order p − 1. For brevity, we’ll discuss only the case k = 1, which
is the least favorable case for us. Then if b is a primitive (order 2s) element, the even
powers of b have odd order, and the odd powers of b have order 2·(odd). In this case,
then, $r = 2^c \cdot (\text{odd})$ where c ∈ {0, 1}, each occurring equiprobably. Therefore, if $p_1$ and
$p_2$ are both of this (unfavorable) type, and $a_1$, $a_2$ are chosen randomly, the probability
that $c_1 \ne c_2$ is $\frac{1}{2}$. Hence, once we have found r, our probability of successfully finding
a factor is at least $\frac{1}{2}$, if N is a product of two distinct primes. If N has more than two
distinct prime factors, our odds are even better. The method fails if N is a prime power,
$N = p^\alpha$, but prime powers can be efficiently factored by other methods.
6.3.2 RSA
Does anyone care whether factoring is easy or hard? Well, yes, some people do.
The presumed difficulty of factoring is the basis of the security of the widely used
RSA (for Rivest, Shamir, and Adleman) scheme for public key cryptography, which you
may have used yourself if you have ever sent your credit card number over the internet.
The idea behind public key cryptography is to avoid the need to exchange a secret key
(which might be intercepted and copied) between the parties that want to communicate.
The enciphering key is public knowledge. But using the enciphering key to infer the
deciphering key involves a prohibitively difficult computation. Therefore, Bob can send
the enciphering key to Alice and everyone else, but only Bob will be able to decode
the message that Alice (or anyone else) encodes using the key. Encoding is a “one-way
function” that is easy to compute but very hard to invert.
(Of course, Alice and Bob could have avoided the need to exchange the public key if
they had decided on a private key in their previous clandestine meeting. For example,
they could have agreed to use a long random string as a one-time pad for encoding and
decoding. But perhaps Alice and Bob never anticipated that they would someday need
to communicate privately. Or perhaps they did agree in advance to use a one-time pad,
but they have now used up their private key, and they are loath to reuse it for fear that
an eavesdropper might then be able to break their code. Now they are too far apart to
safely exchange a new private key; public key cryptography appears to be their most
secure option.)
To construct the public key Bob chooses two large prime numbers p and q. But he
does not publicly reveal their values. Instead he computes the product
$$N = pq. \tag{6.74}$$
Since Bob knows the prime factorization of N, he also knows the value of the Euler
function φ(N), the number of numbers less than N that are coprime with N. In the
case of a product of two primes it is
$$\varphi(N) = N - p - q + 1 = (p-1)(q-1) \tag{6.75}$$
(only multiples of p and q share a factor with N). It is easy to find φ(N) if you know
the prime factorization of N, but it is hard if you know only N.
Bob then pseudo-randomly selects e < φ(N) that is coprime with φ(N). He reveals
to Alice (and anyone else who is listening) the value of N and e, but nothing else.
Alice converts her message to ASCII, a number a < N. She encodes the message by
computing
$$b = f(a) = a^e \pmod{N}, \tag{6.76}$$
which she can do quickly by repeated squaring. How does Bob decode the message?
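A toy numerical instance may help fix the roles of N, φ(N), e, a, and b. The primes and message below are illustrative values far too small for real use. The decoding step in this sketch is an assumption: it uses the standard RSA rule that Bob applies an exponent d with ed ≡ 1 (mod φ(N)), which is not spelled out in the passage above.

```python
from math import gcd

p, q = 61, 53                 # toy primes; real keys use primes hundreds of digits long
N = p * q
phi = (p - 1) * (q - 1)       # Euler function for a product of two distinct primes
e = 17                        # public exponent, chosen coprime with phi(N)
assert gcd(e, phi) == 1

a = 65                        # Alice's message, a < N
b = pow(a, e, N)              # encoding, eq. (6.76), via fast modular exponentiation

# Assumption (standard RSA, not shown in the passage above):
# Bob decodes with d satisfying e*d = 1 (mod phi(N)).
d = pow(e, -1, phi)           # modular inverse; requires Python 3.8+
decoded = pow(b, d, N)        # recovers a
```

Computing d requires φ(N), which is easy for Bob, who knows p and q, and is presumed hard for anyone who knows only N.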
and Eve can decipher the message. If our only concern is to defeat RSA, we run the
Shor algorithm to find r = Ord($a^e$), and we needn’t worry about whether we can use r
to extract a factor of N or not.
How important are such prospective cryptographic applications of quantum comput-
ing? When fast quantum computers are readily available, concerned parties can stop
using RSA, or can use longer keys to stay a step ahead of contemporary technology.
However, people with secrets sometimes want their messages to remain confidential for
a while (30 years?). They may not be satisfied by longer keys if they are not confident
about the pace of future technological advances.
And if they shun RSA, what will they use instead? Not so many suitable one-way
functions are known, and others besides RSA are (or may be) vulnerable to a quantum
attack. So there really is a lot at stake. If fast large scale quantum computers become
available, the cryptographic implications may be far reaching.
But while quantum theory taketh away, quantum theory also giveth; quantum com-
puters may compromise public key schemes, but also offer an alternative: secure quantum
key distribution, as discussed in Chapter 4.
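
|0⟩ --H--●--H-- Measure
|λ⟩ -----U----- |λ⟩

Here $|\lambda\rangle$ denotes an eigenstate of U with eigenvalue λ ($U|\lambda\rangle = \lambda|\lambda\rangle$). Then the action
of the circuit on the control bit is
$$\begin{aligned}
|0\rangle \to \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle) &\to \frac{1}{\sqrt{2}}(|0\rangle + \lambda|1\rangle) \\
&\to \frac{1}{2}(1 + \lambda)|0\rangle + \frac{1}{2}(1 - \lambda)|1\rangle.
\end{aligned} \tag{6.89}$$
Then the outcome of the measurement of the control qubit has probability distribution
$$\begin{aligned}
\text{Prob}(0) &= \left|\frac{1}{2}(1 + \lambda)\right|^2 = \cos^2(\pi\phi) \\
\text{Prob}(1) &= \left|\frac{1}{2}(1 - \lambda)\right|^2 = \sin^2(\pi\phi),
\end{aligned} \tag{6.90}$$
where $\lambda = e^{2\pi i\phi}$.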
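The probabilities in eq. (6.90) follow from a one-line computation, checked numerically here. The function name is hypothetical, and φ is treated as an ordinary floating-point number.

```python
import cmath
from math import cos, sin, pi

def control_probabilities(phi: float):
    """Return (Prob(0), Prob(1)) for the control qubit when lambda = exp(2 pi i phi)."""
    lam = cmath.exp(2j * pi * phi)
    return abs((1 + lam) / 2) ** 2, abs((1 - lam) / 2) ** 2

p0, p1 = control_probabilities(0.3)   # compare with cos^2 and sin^2 of pi * 0.3
```

For φ = 0 the outcome 0 is certain and for φ = 1/2 the outcome 1 is certain; intermediate values of φ give intermediate statistics.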
As we have discussed previously (for example in connection with Deutsch’s problem),
this procedure distinguishes with certainty between the eigenvalues λ = 1 (φ = 0) and
λ = −1 (φ = 1/2). But other possible values of λ can also be distinguished, albeit
with less statistical confidence. For example, suppose the state on which U acts is a
superposition of U eigenstates
$$\alpha_1|\lambda_1\rangle + \alpha_2|\lambda_2\rangle. \tag{6.91}$$
And suppose we execute the above circuit n times, with n distinct control bits. We thus
prepare the state
$$\begin{aligned}
&\alpha_1|\lambda_1\rangle\left(\frac{1+\lambda_1}{2}|0\rangle + \frac{1-\lambda_1}{2}|1\rangle\right)^{\otimes n} \\
&+ \alpha_2|\lambda_2\rangle\left(\frac{1+\lambda_2}{2}|0\rangle + \frac{1-\lambda_2}{2}|1\rangle\right)^{\otimes n}.
\end{aligned} \tag{6.92}$$
If $\lambda_1 \ne \lambda_2$, the overlap between the two states of the n control bits is exponentially small
for large n; by measuring the control bits, we can perform the orthogonal projection onto
the $\{|\lambda_1\rangle, |\lambda_2\rangle\}$ basis, at least to an excellent approximation.
If we use enough control bits, we have a large enough sample to measure Prob(0) =
$\frac{1}{2}(1 + \cos 2\pi\phi)$ with reasonable statistical confidence. By executing a controlled-(iU),
we can also measure $\frac{1}{2}(1 + \sin 2\pi\phi)$ which suffices to determine φ modulo an integer.
However, in the factoring algorithm, we need to measure the phase of $e^{2\pi ik/r}$ to exponential
accuracy, which seems to require an exponential number of trials. Suppose,
though, that we can efficiently compute high powers of U (as is the case for $U_a$) such
as
$$U^{2^j}. \tag{6.93}$$
By applying the above procedure to measurement of $U^{2^j}$, we determine
$$\exp(2\pi i 2^j\phi), \tag{6.94}$$
where $e^{2\pi i\phi}$ is an eigenvalue of U. Hence, measuring $U^{2^j}$ to one bit of accuracy is
equivalent to measuring the jth bit of the eigenvalue of U.
We can use this phase estimation procedure for order finding, and hence factorization.
We invert eq. (6.88) to obtain
$$|x_0\rangle = \frac{1}{\sqrt{r}}\sum_{k=0}^{r-1}|\lambda_k\rangle; \tag{6.95}$$

|0⟩ --H--●--  $\frac{1}{\sqrt{2}}(|0\rangle + \lambda^4|1\rangle)$
|0⟩ --H--●--  $\frac{1}{\sqrt{2}}(|0\rangle + \lambda^2|1\rangle)$
|λ⟩ --U--U^2--U^4--

acts on the eigenstate $|\lambda\rangle$ of the unitary transformation U. The conditional U prepares
$\frac{1}{\sqrt{2}}(|0\rangle + \lambda|1\rangle)$, the conditional $U^2$ prepares $\frac{1}{\sqrt{2}}(|0\rangle + \lambda^2|1\rangle)$, the conditional $U^4$ prepares
$\frac{1}{\sqrt{2}}(|0\rangle + \lambda^4|1\rangle)$, and so on. We could perform a Hadamard and measure each of these
qubits to sample the probability distribution governed by the jth bit of φ, where $\lambda =
e^{2\pi i\phi}$. But a more efficient method is to note that the state prepared by the circuit is
$$\frac{1}{\sqrt{2^m}}\sum_{y=0}^{2^m-1} e^{2\pi i\phi y}\,|y\rangle. \tag{6.96}$$
A better way to learn the value of φ is to perform the QFT$^{(m)}$, not the Hadamard $H^{(m)}$,
before we measure.
Considering the case m = 3 for clarity, the circuit that prepares and then Fourier
analyzes the state
$$\frac{1}{\sqrt{8}}\sum_{y=0}^{7} e^{2\pi i\phi y}\,|y\rangle \tag{6.97}$$
is

|0⟩ --H--●--H--●--●---------- |ỹ_0⟩
|0⟩ --H--●--R_1--H--●-------- |ỹ_1⟩
|0⟩ --H--●--R_2--R_1--H------ |ỹ_2⟩
         U   U^2   U^4

This circuit very nearly carries out our strategy for phase estimation outlined above, but
with a significant modification. Before we execute the final Hadamard transformation
and measurement of $\tilde{y}_1$ and $\tilde{y}_2$, some conditional phase rotations are performed. It is
those phase rotations that distinguish the QFT$^{(3)}$ from Hadamard transform $H^{(3)}$, and
they strongly enhance the reliability with which we can extract the value of φ.
We can understand better what the conditional rotations are doing if we suppose
that φ = k/8, for $k \in \{0, 1, 2, \ldots, 7\}$; in that case, we know that the Fourier transform
will generate the output $\tilde{y} = k$ with probability one. We may express k as the binary
expansion
$$k = k_2k_1k_0 \equiv k_2\cdot 4 + k_1\cdot 2 + k_0. \tag{6.98}$$
In fact, the circuit for the least significant bit $\tilde{y}_0$ of the Fourier transform is precisely
Kitaev’s measurement circuit applied to the unitary $U^4$, whose eigenvalue is
$$(e^{2\pi i\phi})^4 = e^{i\pi k} = e^{i\pi k_0} = \pm 1. \tag{6.99}$$
The measurement circuit distinguishes eigenvalues ±1 perfectly, so that $\tilde{y}_0 = k_0$.
The circuit for the next bit $\tilde{y}_1$ is almost the measurement circuit for $U^2$, with eigenvalue
$$(e^{2\pi i\phi})^2 = e^{i\pi k/2} = e^{i\pi(k_1.k_0)}, \tag{6.100}$$
except that the conditional phase rotation has been inserted, which multiplies the phase
by $\exp[-i\pi(.k_0)]$, resulting in $e^{i\pi k_1}$. Again, applying a Hadamard followed by measurement,
we obtain the outcome $\tilde{y}_1 = k_1$ with certainty. Similarly, the circuit for $\tilde{y}_2$ measures
the eigenvalue
$$e^{2\pi i\phi} = e^{i\pi k/4} = e^{i\pi(k_2.k_1k_0)}, \tag{6.101}$$
except that the conditional rotation removes $e^{i\pi(.k_1k_0)}$, so that the outcome is $\tilde{y}_2 = k_2$
with certainty.
Thus, the QFT implements the phase estimation routine with maximal cleverness.
We measure the less significant bits of φ first, and we exploit the information gained in
the measurements to improve the reliability of our estimate of the more significant bits.
Keeping this interpretation in mind, you will find it easy to remember the circuit for
the QFT$^{(n)}$!
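The claim that φ = k/8 yields $\tilde{y} = k$ with probability one can be checked with a direct classical simulation of the m-qubit control register. In this sketch the Fourier analysis uses the kernel $e^{-2\pi i y\tilde{y}/2^m}$; that sign convention is an assumption made so that the peak lands on $\tilde{y} = k$ (many references call this the inverse QFT). The function name is hypothetical.

```python
import cmath
from math import pi, sqrt

def phase_estimation_distribution(phi: float, m: int) -> list:
    """Outcome probabilities after Fourier-analyzing the state of eq. (6.96)."""
    size = 2 ** m
    # State prepared by the controlled powers of U acting on an eigenstate.
    state = [cmath.exp(2j * pi * phi * y) / sqrt(size) for y in range(size)]
    # Fourier transform with kernel exp(-2 pi i y yt / 2^m) (sign is an assumption).
    out = [
        sum(state[y] * cmath.exp(-2j * pi * y * yt / size) for y in range(size))
        / sqrt(size)
        for yt in range(size)
    ]
    return [abs(c) ** 2 for c in out]

dist = phase_estimation_distribution(3 / 8, 3)    # all weight on outcome 3
```

When φ is not a multiple of 1/8 the distribution is no longer a single spike, but it still peaks at the integer nearest to 8φ.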
y                         =  0  1  2  3  4  5
$x = 5^y \pmod{7}$        =  1  5  4  6  2  3

The inverse function is

x                         =  1  2  3  4  5  6
$y = \text{dlog}_{7,5}(x)$ =  0  4  5  2  1  3

The modular exponential is easy to compute classically (by repeated squaring), but the
discrete log seems to be hard to compute; the modular exponential is a candidate one-way
function. It is hard to invert because $a^x$ seems to jump about in $\mathbb{Z}_q^*$ haphazardly as
x varies (for at least some values of q).
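For a modulus as small as q = 7 the discrete log can simply be tabulated, which reproduces the two tables above. This is a brute-force sketch with a hypothetical function name; its running time grows linearly with q, which is exponential in the number of bits of q.

```python
def dlog_table(a: int, q: int) -> dict:
    """Map each x = a^y (mod q) to its exponent y, for y = 0, ..., q-2."""
    table = {}
    x = 1
    for y in range(q - 1):
        table[x] = y          # record dlog_{q,a}(x) = y
        x = x * a % q         # advance to a^(y+1) mod q
    return table

table = dlog_table(5, 7)
```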
There are applications of this one-way function in cryptography; for example:
• Alice computes $(a^y)^x = a^{xy} \pmod{q}$. Bob computes $(a^x)^y = a^{xy} \pmod{q}$. This is their
final shared key.
Alice and Bob can both compute the key because the modular exponential can be
evaluated efficiently. The protocol is expected to be secure because even when $a^x$ and
$a^y$ (but not x or y) are known, it is hard to compute $a^{xy}$. Of course, if Eve can compute
the discrete log, she could break the protocol. E.g. knowing $a^x$ and $a^y$ she would be able
to compute x and then compute $(a^y)^x$.
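The key agreement and Eve's attack can be sketched on toy numbers. It is assumed here, consistent with the remark that $a^x$ and $a^y$ are known, that Alice and Bob each publish their modular exponential. The function names and the values q = 1019, a = 2, x = 345, y = 678 are hypothetical, and Eve's brute-force search is feasible only because q is tiny.

```python
def shared_keys(a: int, q: int, x: int, y: int):
    """Return (Alice's key, Bob's key, published a^x, published a^y)."""
    pub_alice = pow(a, x, q)              # Alice publishes a^x (mod q)
    pub_bob = pow(a, y, q)                # Bob publishes a^y (mod q)
    key_alice = pow(pub_bob, x, q)        # (a^y)^x
    key_bob = pow(pub_alice, y, q)        # (a^x)^y
    return key_alice, key_bob, pub_alice, pub_bob

def eve_attack(a: int, q: int, pub_alice: int, pub_bob: int) -> int:
    """Brute-force a discrete log of a^x, then compute (a^y)^x."""
    x, v = 0, 1
    while v != pub_alice:                 # exponential in the bit length of q
        v = v * a % q
        x += 1
    return pow(pub_bob, x, q)

k_alice, k_bob, pub_a, pub_b = shared_keys(2, 1019, 345, 678)
k_eve = eve_attack(2, 1019, pub_a, pub_b)
```

Both parties arrive at the same key, and so does Eve once she has a discrete log of the published value.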
But a quantum computer can evaluate a discrete log by solving a HSP! Here is how.
We would like to find
$$r = \text{dlog}_{q,a}(x) \tag{6.114}$$
where the value of r is such that $x = a^r \pmod{q}$. We consider the function
$$f : \mathbb{Z} \times \mathbb{Z} \to \mathbb{Z}_q^*, \qquad f(y_1, y_2) = a^{y_1}x^{-y_2} \pmod{q}. \tag{6.115}$$
When does f map two different inputs to the same output?
$$f(y_1, y_2) = a^{y_1 - ry_2} \pmod{q} = f(z_1, z_2) = a^{z_1 - rz_2} \pmod{q} \tag{6.116}$$
iff $(y_1 - z_1) - r(y_2 - z_2) = 0 \pmod{q-1}$. This means that we may think of the input
to f as an element of the additive group $G = \mathbb{Z} \times \mathbb{Z}$ where f is constant and distinct on
the cosets of
$$H = \{(y_1, y_2)\,|\,y_1 = ry_2 \pmod{q-1}\}. \tag{6.117}$$
H is generated by the elements (r, 1), (q − 1, 0) so if we find generators, we determine r.
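The coset structure can be checked on the toy case q = 7, a = 5. In this sketch the hidden subgroup is found by brute force rather than by Fourier sampling, so it illustrates only the definition of f and H, not the quantum speedup. The function names are hypothetical, and the negative modular exponent relies on Python 3.8 or later.

```python
q, a = 7, 5                # toy modulus and primitive element

def f(y1: int, y2: int, x: int) -> int:
    """f(y1, y2) = a^{y1} x^{-y2} (mod q), as in eq. (6.115)."""
    return pow(a, y1, q) * pow(x, -y2, q) % q   # negative exponent: Python 3.8+

def dlog_from_hidden_subgroup(x: int) -> int:
    """Find r with (r, 1) in H, i.e. f(r, 1) = f(0, 0), by brute force."""
    return next(r for r in range(q - 1) if f(r, 1, x) == f(0, 0, x))

r = dlog_from_hidden_subgroup(6)    # 5^3 = 6 (mod 7)
```

Shifting the input by either generator, (r, 1) or (q − 1, 0), leaves f unchanged, which is what it means for f to be constant on the cosets of H.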
We may construct an n × n generator matrix for the lattice H whose rows are the
generating vectors:
$$M = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} \quad \text{and} \quad H = \{x = \alpha M,\ \alpha \in \mathbb{Z}^n\}, \tag{6.119}$$
where α is the row vector $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$. For fixed H, the generator matrix is not
unique. We may make the replacement $M \mapsto RM$ where R is an invertible integral
matrix with det R = ±1 (so that $R^{-1}$ is also integral). Both M and RM are generators
of the same lattice.
The quotient space G/H may be called the unit cell of the lattice. It contains all the
distinct ways to shift the lattice H by an element of G. We may say that |G/H| is the
volume of the unit cell, the number of points it contains. Note that
$$|G/H| = \det M$$
(the linear transformation $M$ inflates the cube $\{0, 1\}^n$ to a region of volume $\det M$).
Corresponding to the integral lattice $H$ is its dual lattice, denoted $H^\perp$. The elements
of $H^\perp$ are points in $\mathbb{R}^n$ that are orthogonal to all the vectors in $H$, modulo integers:
$$H^\perp = \{k \in \mathbb{R}^n : k \cdot x \in \mathbb{Z} \text{ for all } x \in H\}.$$
Let $M^\perp$ denote a generator matrix for $H^\perp$, with rows $u_a$.
We can choose the basis for the dual lattice such that $u_a v_b^T = \delta_{ab}$, in which case
$M^\perp M^T = I$. That means that, once we have found $M^\perp$, an easy computation
determines $M$ (matrix inversion of transpose of $M^\perp$). In the quantum algorithm for the
abelian HSP, the quantum computation determines the generators of $H^\perp$ (i.e. the matrix
$M^\perp$) and then finding the generators of $H$ is easy (by matrix inversion).
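The relation $M^\perp M^T = I$ is symmetric between the lattice and its dual, so it can be illustrated in either direction. The sketch below starts from an assumed generator matrix for the discrete-log lattice with $q = 7$, $r = 4$ (rows $(r,1)$ and $(q-1,0)$) and computes the dual generators exactly; the helper names are illustrative.

```python
from fractions import Fraction


def inverse_2x2(m):
    """Exact inverse of a 2x2 integer matrix, returned with Fraction entries."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(p, s):
    return [[sum(p[i][k] * s[k][j] for k in range(len(s)))
             for j in range(len(s[0]))] for i in range(len(p))]


q, r = 7, 4
M = [[r, 1], [q - 1, 0]]                 # rows generate H
det_M = M[0][0] * M[1][1] - M[0][1] * M[1][0]
M_dual = inverse_2x2(transpose(M))       # rows generate the dual lattice

print(abs(det_M))                        # 6, the number of cosets |G/H|
print(matmul(M_dual, transpose(M)))      # identity matrix (as Fractions)
```

Every entry of `M_dual` is an integer divided by $\det M$, which is why an upper bound on $|G/H|$ controls how precisely the sampled dual vectors must be known.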
The efficient solution to the problem makes use of “Fourier sampling”: after evaluating
the function f on a coherent superposition of values of G, we perform (an approximation
to) the Fourier transform over the group G, and then measure. This procedure enables
us to sample nearly uniformly from H ⊥ . Only a modest number of samples are needed
to determine H ⊥ , and hence H, which solves the problem.
For example, in the case of period finding, we have $G = \mathbb{Z}$, $H = r\mathbb{Z} = \{x = r\alpha, \alpha \in \mathbb{Z}\}$
and $H^\perp = \{k = \beta/r, \beta \in \mathbb{Z}\}$. In the quantum algorithm, we are promised that $r \le R$;
thus we can sample from $H^\perp$ using the Fourier transform for the finite group $\mathbb{Z}_N$ rather
than the infinite group $\mathbb{Z}$, where $N \ge R^2$. Fourier sampling then provides sufficient
accuracy to determine an element $\beta/r$ of $H^\perp$ with high success probability. After a few
samples we can determine $1/r$, the generator of $H^\perp$, and hence $r$, the generator of $H$.
We want to extend this idea from subgroups of $\mathbb{Z}$ to subgroups of $\mathbb{Z}^n$.
So, to solve the general abelian HSP, we Fourier transform over $\mathbb{Z}_N^n$ instead of $\mathbb{Z}^n$, for
some sufficiently large $N$. And to keep the discussion simple at first, let’s suppose that
$H$ is actually a subgroup of the finite group $\mathbb{Z}_N^n$, rather than of $\mathbb{Z}^n$.
H-invariant coset state and Fourier sampling from the dual lattice
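As in the period finding algorithm we query the black box with
$$\frac{1}{\sqrt{|G|}} \sum_{x \in G} |x\rangle \quad \text{to obtain} \quad \frac{1}{\sqrt{|G|}} \sum_{x \in G} |x\rangle \otimes |f(x)\rangle, \tag{6.123}$$
and measuring the output register then leaves the input register in the state
$|H, x_0\rangle = \frac{1}{\sqrt{|H|}} \sum_{x \in H} |x + x_0\rangle$ for some $x_0 \in G$.
This “coset state” has an important property: it is $H$-invariant. We may consider the
unitary transformation $U_y$ associated with an element $y \in G$ whose action is
$$U_y |H, x_0\rangle = \frac{1}{\sqrt{|H|}} \sum_{x \in H} |x + x_0 + y\rangle, \tag{6.125}$$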
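To appreciate the significance of $H$-invariance, note that if $|\psi\rangle$ obeys $U|\psi\rangle = |\psi\rangle$, then
$\tilde U |\tilde\psi\rangle = V U V^{-1} |\tilde\psi\rangle = |\tilde\psi\rangle$ where $|\tilde\psi\rangle = V|\psi\rangle$. Now apply this identity to $U = U_y$ where
$V$ is the Fourier transform
$$V : |x\rangle \mapsto \frac{1}{\sqrt{|G|}} \sum_{k \in G^\perp} e^{2\pi i k \cdot x / N} |k\rangle, \tag{6.127}$$
$$V^{-1} : |k\rangle \mapsto \frac{1}{\sqrt{|G|}} \sum_{x \in G} e^{-2\pi i k \cdot x / N} |x\rangle. \tag{6.128}$$
Therefore, the state $|k\rangle$ is an eigenstate of $\tilde U_y$ with eigenvalue 1 iff $k \cdot y / N$ is an integer;
this holds for all $y \in H$ iff $k/N \in H^\perp$. Thus, if a state is $H$-invariant, then in the Fourier basis its
expansion contains the state $|k\rangle$ with a non-zero coefficient only if $k/N \in H^\perp$.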
More explicitly, we compute
$$V : |H, x_0\rangle = \frac{1}{\sqrt{|H|}} \sum_{x \in H} |x + x_0\rangle \tag{6.133}$$
$$\mapsto \frac{1}{\sqrt{|H||G|}} \sum_{x \in H} \sum_{k \in G^\perp} e^{2\pi i k \cdot (x + x_0)/N} |k\rangle. \tag{6.134}$$
Because of $H$-invariance, only $k/N \in H^\perp$ survives in the sum over $G^\perp$, and for such $k$,
$e^{2\pi i (k \cdot x)/N} = 1$ so we obtain
$$\frac{1}{\sqrt{|H^\perp|}} \sum_{k \in H^\perp} e^{2\pi i k \cdot x_0 / N} |k\rangle. \tag{6.136}$$
Therefore if we Fourier sample — i.e. Fourier transform and then measure — the prob-
ability distribution that governs the outcome is the uniform distribution on H ⊥ . Once
we have sampled from H ⊥ enough times, with high probability a generating set for H ⊥
will be found.
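The claim that Fourier sampling a coset state yields a uniform distribution supported on the dual of $H$ can be checked numerically in the simplest case. The sketch below assumes $G = \mathbb{Z}_{12}$ and $H = \{0, 4, 8\}$ with an arbitrary offset $x_0 = 3$; these values are illustrative only.

```python
import cmath

N, r, x0 = 12, 4, 3
H = [x for x in range(N) if x % r == 0]      # hidden subgroup {0, 4, 8}

# Amplitudes of the coset state |H, x0>.
state = [0.0] * N
for x in H:
    state[(x + x0) % N] = 1 / len(H) ** 0.5

# Fourier transform V: |x> -> N^{-1/2} sum_k exp(2 pi i k x / N) |k>.
ft = [
    sum(state[x] * cmath.exp(2j * cmath.pi * k * x / N) for x in range(N)) / N ** 0.5
    for k in range(N)
]
probs = [abs(amp) ** 2 for amp in ft]
support = [k for k, p in enumerate(probs) if p > 1e-9]

print(support)                              # [0, 3, 6, 9]
print([round(probs[k], 6) for k in support])  # [0.25, 0.25, 0.25, 0.25]
```

The offset $x_0$ only changes the phases of the surviving amplitudes, so the measured distribution is the same for every coset: the outcomes are exactly those $k$ with $k \cdot 4 / 12$ an integer.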
Query complexity
How many samples are enough (assuming now that $G$ is finite, e.g. $G = \mathbb{Z}_N^n$, and
$H \subseteq G$)? Suppose $K$ is a group (either abelian or not), and $m$ elements of $K$ are
chosen uniformly at random. If these $m$ elements do not generate $K$, then they must be
contained in some maximal proper subgroup $S \subset K$. (Proper means $S$ is smaller than
$K$, and maximal means we cannot add another element of $K$ to $S$ without generating all
of $K$.) Any proper subgroup has order $|S| \le |K|/2$, because the order of the subgroup
must divide the order of $K$, and the probability that all $m$ elements are in $S$ is
$$\mathrm{Prob}(\text{all } m \text{ in } S) = \left(\frac{|S|}{|K|}\right)^m; \tag{6.137}$$
therefore the probability that the $m$ elements generate $K$ is
$$\mathrm{Prob}(m \text{ elements generate } K) \ge 1 - \sum_{S \in \max} \left(\frac{|S|}{|K|}\right)^m \ge 1 - (\#\max)\, 2^{-m}, \tag{6.138}$$
where the sum is over maximal proper subgroups $\{S\}$, and where $(\#\max)$ denotes the
total number of maximal proper subgroups.
If $K$ is abelian, we can count the maximal proper subgroups. $S$ is a sublattice of $K$
and if $S$ is a maximal proper subgroup, then its dual lattice $S^\perp$ contains a vector not
in $K^\perp$. There is only one such (linearly independent) vector if $S$ is maximal, for if there
were two then we could remove one, obtaining a smaller $S^\perp$ and hence a larger proper
subgroup $S$. Any nontrivial vector not in $K^\perp$ determines such a subgroup, so there are
$|G/K^\perp| - 1$ choices (where e.g. $G = \mathbb{Z}_N^n$), and therefore
$$\mathrm{Prob}(m \text{ elements generate } K) \ge 1 - 2^{-m} |G/K^\perp|. \tag{6.139}$$
In the case of the hidden subgroup problem where $H \subseteq G = \mathbb{Z}_N^n$, we are sampling
$K = H^\perp$ and $|G/K^\perp|$ becomes $|G/H|$, the number of cosets. To have constant success
probability, then, we choose $m$ such that e.g.
$$2^{-m} |G/K^\perp| < \frac{1}{2}, \tag{6.140}$$
or $m - 1 > \log_2 |G/H|$ (compare this with the conclusion for Simon’s problem). Since
$|G/H| < N^n$, it suffices if $m = O(n \log N)$.
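The bound $1 - (\#\max)\,2^{-m}$ can be compared with the exact generation probability in a case small enough to enumerate. The sketch assumes $K = \mathbb{Z}_2^3$, whose maximal proper subgroups are its $2^3 - 1 = 7$ subgroups of index 2 (a standard fact brought in here, not stated in the text); group elements are encoded as 3-bit integers.

```python
from itertools import product


def rank_gf2(vectors):
    """Rank over GF(2) of bit-vectors encoded as integers (XOR-basis reduction)."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)      # clear the leading bit of b from v if present
        if v:
            basis.append(v)
    return len(basis)


n = 3
num_max = 2**n - 1                 # index-2 subgroups of Z_2^n (assumed count)
results = {}
for m in range(3, 6):
    # Enumerate every m-tuple of group elements and count those that generate K.
    hits = sum(1 for vs in product(range(2**n), repeat=m) if rank_gf2(vs) == n)
    prob = hits / (2**n) ** m
    bound = 1 - num_max * 2.0 ** (-m)
    results[m] = (prob, bound)
    print(m, prob, bound)
```

For each $m$ the exact probability sits above the union bound, and both approach 1 once $m$ exceeds $\log_2$ of the number of maximal subgroups by a few units.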
How large should $N$ be? For period finding with $r \le R$, choosing $N \ge R^2$ provided
adequate precision for finding $r$. For an integral lattice with generator matrix $M$, its
inverse matrix (transpose of $M^\perp$) has entries that can be expressed as integer$/\det M$,
where $\det M = |G/H|$, the number of cosets. In the formulation of the HSP, we are
provided with an upper bound $|G/H| \le R$, and $N$ needs to be large enough to point to
a unique rational number with denominator $\le R$ with reasonable success probability.
In our discussion of period finding ($H^\perp = \mathbb{Z}/r$) we noted that Fourier sampling yields
a rational number $y/N$ close to integer$/r$ with high probability:
$$\sum_k \mathrm{Prob}\left( \left| \frac{y}{N} - \frac{k}{r} \right| \le \frac{1}{2N} \right) \ge \frac{4}{\pi^2}, \tag{6.141}$$
so that choosing $N \ge R^2$ was good enough to determine a rational number with
denominator $< R$. If we fix the desired accuracy $\delta$, then the part of the distribution lying
outside the peaks decreases as $N$ increases (an exercise):
$$\mathrm{Prob}\left( \forall k\ \left| \frac{y}{N} - \frac{k}{r} \right| > \delta \right) \le \frac{1}{N\delta}. \tag{6.142}$$
The peak of the Fourier transform sharpens with increasing $N$, so that the probability of lying
outside all peaks with half width $\delta$ scales like $1/N$.
When we sample $H^\perp$, we find an $n$-component vector, where each component should
be determined to accuracy $1/R^2$ (where $|G/H| < R$). The probability of success in
finding all $n$ components to this accuracy is
$$\mathrm{Prob(success)} \ge \left( 1 - \frac{1}{N\delta} \right)^n \tag{6.143}$$
and the probability of being successful in each of $m$ consecutive samplings is
$$\mathrm{Prob(success\ } m \mathrm{\ times)} \ge \left( 1 - \frac{1}{N\delta} \right)^{nm}. \tag{6.144}$$
For $\delta = 1/R^2$, the success probability is a constant for
$$\frac{mnR^2}{N} < \text{constant} \quad \text{or} \quad N = O(mnR^2). \tag{6.145}$$
Since $m = O(n \log N)$ samples are sufficient to find generators of $H^\perp$, we conclude it
suffices to choose $N$ to be
$$N = O(n^2 R^2 \log N) = O(n^2 R^2 \log(nR)). \tag{6.146}$$
This is good enough to determine $H^\perp$ in $m = O(n \log N) = O(n \log(nR))$ queries, and
the generators of $H$ are found by inverting the matrix $M^\perp$ that generates $H^\perp$.
The algorithm is efficient: both the number of queries and the number of steps in the
quantum Fourier transform are polylog in (the upper bound on) the number of cosets
$|G/H|$.
We can model this situation in the black box setting. Suppose we are promised that
the function evaluated by the box has the form
$$f_w(x) = \begin{cases} 0 & x \ne w, \\ 1 & x = w, \end{cases} \tag{6.148}$$
where $x \in \{0, 1, 2, \dots, N-1\}$ for some unknown $w$. Our task is to find $w$, the marked string.
Classically, we’ll need to query the box more than N/2 times to find w with success
probability above 1/2. This is a black-box version of an NP-hard problem, where there
is a unique witness accepted by a circuit, but the problem has no structure, so there is
no better option than exhaustive searching.
Now we ask, can exhaustive search be done faster on a quantum computer? The
answer is yes, using Grover’s algorithm. With quantum queries, we can find the marked
string using $O(\sqrt{N})$ queries. Thus we can solve NP-hard problems by exhaustive search
in time $O(\sqrt{N}\, \mathrm{polylog}\, N)$.
We say that Grover’s algorithm achieves a quadratic speedup relative to exhaustive
search on a classical computer. Though the speedup is only quadratic rather than
exponential, Grover’s algorithm is interesting because of its broad applicability. And it is
rather remarkable: in effect we can interrogate $N$ potential witnesses by asking $O(\sqrt{N})$
questions.
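In the quantum setting, the black box applies the unitary
$$U_w : |x\rangle \otimes |y\rangle \mapsto |x\rangle \otimes |y \oplus f_w(x)\rangle, \tag{6.149}$$
where $x \in \{0, 1\}^n$ (with $N = 2^n$) and $y \in \{0, 1\}$. By the standard trick, $U_w$ becomes a phase oracle:
$$U_w : |x\rangle \otimes |-\rangle \mapsto (-1)^{f_w(x)} |x\rangle \otimes |-\rangle \tag{6.150}$$
where
$$(-1)^{f_w(x)} = \begin{cases} 1 & x \ne w, \\ -1 & x = w. \end{cases} \tag{6.151}$$
Ignoring the output register (which is unaffected by $U_w$), we can express $U_w$ acting on
input as
$$U_w = I - 2|w\rangle\langle w|. \tag{6.152}$$
We can express the action of $U_w$ on a general $n$-qubit state $|\psi\rangle$ as
$$U_w : |\psi\rangle = a|w\rangle + b|\psi^\perp\rangle \mapsto -a|w\rangle + b|\psi^\perp\rangle, \tag{6.153}$$
where $\langle w|\psi^\perp\rangle = 0$. That is, we resolve $|\psi\rangle$ into a component along $|w\rangle$ and a component
in the hyperplane orthogonal to $|w\rangle$. $U_w$ induces a reflection of the vector $|\psi\rangle$ about this
hyperplane.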
The first step in Grover’s algorithm is to prepare the uniform superposition of all
values of $x$:
$$|s\rangle = H^{\otimes n} |0\rangle = \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} |x\rangle; \tag{6.154}$$
this state $|s\rangle$ has overlap with the marked string $|w\rangle$
$$\langle w|s\rangle = \frac{1}{\sqrt{N}}. \tag{6.155}$$
The next step is to apply the Grover iteration many times in succession, where each
iteration enhances the overlap of the quantum superposition with the marked state $|w\rangle$,
while suppressing the amplitude for each $|x\rangle$ with $x \ne w$. This iteration is
$$U_{\rm Grover} = U_s U_w, \tag{6.156}$$
where $U_s = 2|s\rangle\langle s| - I$
reflects a vector about the axis determined by $|s\rangle$. Note that $U_s$ is easy to construct as
a quantum circuit. It can be expressed as
$$U_s = H^{\otimes n} \left( 2|0\rangle\langle 0| - I \right) H^{\otimes n},$$
since $H^{\otimes n}$ maps $|s\rangle$ to $|0\rangle$, where $H$ is the single qubit Hadamard gate. Furthermore a
multiply controlled $Z$ gate can be formed from a multiply controlled-not gate $\Lambda^{n-1}(X)$
where the target qubit has a Hadamard before and after the gate, and we know that
$\Lambda^{n-1}(X)$ can be constructed from $O(n)$ Toffoli gates. Finally, we can conjugate by $X^{\otimes n}$
so the phase gate is triggered by $|00..0\rangle$ rather than $|11..1\rangle$. Thus $U_s$ is realized by a
circuit of size $O(\log N)$.
What does $U_{\rm Grover}$ do? It preserves the plane spanned by $|s\rangle$ and $|w\rangle$, so we may
confine our attention to that plane. First, $U_w$ reflects $|s\rangle$ about the axis $|w^\perp\rangle$ (the vector
$\perp$ to $|w\rangle$ in the span of $|s\rangle$ and $|w\rangle$). Then $U_s$ reflects $U_w|s\rangle$ about the axis $|s\rangle$. The net
effect of $U_{\rm Grover}$, then, is a counterclockwise rotation in the plane by the angle $2\theta$, where
$\theta$ is the initial angle between $|s\rangle$ and $|w^\perp\rangle$. Each time we repeat the Grover iteration the state
vector rotates further in the counterclockwise direction by $2\theta$.
The initial angle $\theta$ between $|s\rangle$ and $|w^\perp\rangle$ is given by $\sin\theta = \langle w|s\rangle = 1/\sqrt{N}$. For
$N \gg 1$, then, we have $\theta = 1/\sqrt{N} + O(1/N^{3/2})$. If we repeat the Grover iteration $T$
times, then the vector is rotated away from the $|w^\perp\rangle$ axis by $(2T + 1)\theta$.
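We may choose $T$ such that $(2T + 1)\theta = \pi/2 + \delta$ where $|\delta| \le \theta \approx 1/\sqrt{N}$ (the angle
advances in steps of $2\theta$, so some odd multiple of $\theta$ lies within $\theta$ of $\pi/2$). Then
if we measure in the computational basis, we find the outcome $|w\rangle$ with probability
$\mathrm{Prob}(w) = \cos^2\delta \ge 1 - \delta^2 = 1 - O(1/N)$. Thus we find $|w\rangle$ with success probability close
to 1 using $T \approx \frac{\pi}{4\theta} \approx \frac{\pi}{4}\sqrt{N}$ Grover iterations, making use of $T$ quantum queries to the
black box. This is Grover’s quadratic speedup.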
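The rotation picture can be checked with a direct state-vector simulation. In this sketch $N = 64$ and the marked item $w = 37$ are arbitrary assumed values, and the function name is illustrative; the reflection $U_s = 2|s\rangle\langle s| - I$ is implemented as "inversion about the mean" of the amplitudes.

```python
import math


def grover_success_probability(N, w, T):
    """Apply T Grover iterations U_s U_w to |s> and return Prob(w)."""
    amp = [1 / math.sqrt(N)] * N           # uniform superposition |s>
    for _ in range(T):
        amp[w] = -amp[w]                   # U_w = I - 2|w><w|
        mean = sum(amp) / N
        amp = [2 * mean - a for a in amp]  # U_s = 2|s><s| - I
    return amp[w] ** 2


N, w = 64, 37
theta = math.asin(1 / math.sqrt(N))
T = round(math.pi / (4 * theta) - 0.5)     # makes (2T+1)*theta closest to pi/2
p = grover_success_probability(N, w, T)

print(T)                                   # 6
print(p, math.sin((2 * T + 1) * theta) ** 2)
```

The simulated probability agrees with $\sin^2((2T+1)\theta)$, and six queries already give a success probability above 0.99, compared with the roughly $N/2 = 32$ classical queries needed for success probability 1/2.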
Suppose now that there are $r$ marked states, where $r$ is known. Classically, with each
query the probability of finding a solution ($w_i$ such that $f(w_i) = 1$) is $r/N$, so we need
$O(N/r)$ queries to find a solution with constant success probability. Quantumly, the
uniform superposition of the marked states
$$|\mathrm{marked}\rangle = \frac{1}{\sqrt{r}} \sum_{i=1}^{r} |w_i\rangle \tag{6.159}$$
has overlap with $|s\rangle = \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} |x\rangle$
$$\langle \mathrm{marked}|s\rangle = \sqrt{\frac{r}{N}} = \sin\theta \tag{6.160}$$
and the Grover iteration again rotates by $2\theta$ in the plane spanned by $|s\rangle$ and $|\mathrm{marked}\rangle$
(because the query reflects about the axis perpendicular to $|\mathrm{marked}\rangle$).
As above, then, for $N/r \gg 1$, we achieve success probability $\mathrm{Prob} = 1 - O(r/N)$
in $T \approx \frac{\pi}{4}\sqrt{N/r}$ queries. Again, the speedup is quadratic: The number of quantum
queries needed to find a solution is
$$\#\text{quantum queries} = O\left( \sqrt{\#\text{classical queries}} \right). \tag{6.161}$$
What if $r$ is not known a priori? As a function of the number of queries the success
probability oscillates, where the period of the oscillation is $T \approx \frac{\pi}{2}\sqrt{N/r}$. If we choose
$T$ uniformly at random in the interval $T \in \{0, 1, 2, \dots, T_{\max} \approx \frac{\pi}{4}\sqrt{N}\}$, then if there is
a solution ($r > 0$), a solution will be found with $\mathrm{Prob} \ge 1/2 + O(1/N)$. If we repeat $m$
times, sampling a different random value of $T$ each time, we will find a solution apart
from a small failure probability $\approx 2^{-m}$. Therefore, we can use Grover’s algorithm to
solve a decision problem in NP with high success probability, in
$$\text{time} = O(\sqrt{N}\, \mathrm{polylog}\, N), \tag{6.162}$$
since we can compute the circuit that evaluates $f(x)$ in (classical or quantum) time
$O(\mathrm{polylog}\, N)$.
Thus $\tilde U_s$ reflects in the axis $U|0\rangle$ rather than $|s\rangle$. The analysis of the algorithm is the
same as before, and in the case where there exists a unique solution, we can find it with
high probability in $T \approx \frac{\pi}{4\theta} < \frac{\pi}{4}\sqrt{N}$ queries.
Specifically, because of the structure of the function, we might be able to exclude all
except $M < N$ inputs as potential solutions. Then, classically, we could find the solution
in $O(M)$ queries, while quantumly only $O(\sqrt{M})$ queries suffice, if we can construct $U$
such that
$$U|0\rangle = \frac{1}{\sqrt{M}} \sum_{i=1}^{M} |x_i\rangle, \tag{6.165}$$
classical heuristic, that is, a function $g$ that takes a randomly generated seed $r$ in a
set $R$ to a trial solution:
$$g : r \mapsto g(r) \quad \text{where } r \in R. \tag{6.166}$$
The heuristic is useful if trial solutions generated by the heuristic are more likely to be
accepted than trial solutions chosen uniformly at random
$$\left\langle \frac{\#\text{ of soln. in } g(R)}{|R|} \right\rangle > \left\langle \frac{\text{total } \#\text{ of soln.}}{N} \right\rangle, \tag{6.167}$$
where the bracket $\langle \cdot \rangle$ indicates the expectation value evaluated for a probability
distribution on black-box functions. Then the number of classical queries to find a solution,
using the heuristic, with constant success probability is
$$T_{\rm classical} = O\left( \frac{|R|}{\#\text{ of soln. in } g(R)} \right). \tag{6.168}$$
To exploit the heuristic in quantum searching, we apply Grover’s algorithm to searching
in the space of seeds instead of the full search space. The heuristic is realized as an
efficiently computable unitary:
$$|r\rangle \otimes |0\rangle \mapsto |r\rangle \otimes |g(r)\rangle. \tag{6.169}$$
We can query the box with $|g(r)\rangle$ and then run the evaluation of $g$ backwards to erase
garbage:
$$|r\rangle \otimes |0\rangle \otimes |y\rangle \mapsto |r\rangle \otimes |g(r)\rangle \otimes |y\rangle \mapsto |r\rangle \otimes |g(r)\rangle \otimes |y \oplus f(g(r))\rangle \mapsto |r\rangle \otimes |0\rangle \otimes |y \oplus f(g(r))\rangle. \tag{6.170}$$
This composite oracle can be consulted to search $R$ for a state marked by the function
$f \circ g$ (i.e. for a state marked by $f$ in $g(R)$, the range of $g$). The number of quantum
queries used is
$$T_{\rm quantum} = O\left( \left\langle \sqrt{\frac{|R|}{\#\text{ of soln. in } g(R)}} \right\rangle \right) \tag{6.171}$$
that calls the oracle T times. We place no restriction on the circuit aside from specifying
the number of queries; in particular, we place no limit on the number of quantum gates.
This circuit is applied to an initial state $|\psi(0)\rangle$, producing a final state
(The sum is minimized if $|\varphi\rangle$ is the equally weighted superposition of all the basis
elements, $|\varphi\rangle = \frac{1}{\sqrt{N}} \sum_\omega |\psi_\omega\rangle$, as you can show by invoking a Lagrange multiplier to
perform the constrained extremization.) Our strategy will be to choose the state $|\varphi\rangle$
suitably so that we can use this inequality to learn something about the number $T$ of
oracle calls.
Our circuit with $T$ queries builds a unitary transformation
$$U(\omega, T) = U_\omega U_T U_\omega U_{T-1} \cdots U_\omega U_1, \tag{6.175}$$
where $U_\omega$ is the oracle transformation, and the $U_t$’s are arbitrary non-oracle
transformations. For our state $|\varphi(T)\rangle$ we will choose the result of applying $U(\omega, T)$ to $|\psi(0)\rangle$, except
with each $U_\omega$ replaced by $I$; that is, the same circuit, but with all queries submitted to
the “empty oracle.” Hence,
while
To compare $|\varphi(T)\rangle$ and $|\psi_\omega(T)\rangle$, we appeal to our previous analysis of the effect of errors
on the accuracy of a circuit, regarding the $\omega$ oracle as an “erroneous” implementation
of the empty oracle. The norm of the error vector in the $t$-th step is
$$\big\| |\psi_\omega(T)\rangle - |\varphi(T)\rangle \big\| \le 2 \sum_{t=1}^{T} |\langle \omega|\varphi(t)\rangle|. \tag{6.179}$$
the quadratic speedup is the best possible if we rely on the power of sheer quantum
parallelism, that is, if we don’t design our quantum algorithm to exploit the specific
structure of the problem we wish to solve.
The optimality of the Grover algorithm might be construed as evidence that BQP $\not\supseteq$
NP. At least, if it turns out that NP $\subseteq$ BQP and P $\ne$ NP, then the NP problems
must share a deeply hidden structure (for which there is currently no evidence) that is
well-matched to the peculiar capabilities of quantum circuits.
Even the quadratic speedup may prove useful for a variety of NP-complete optimiza-
tion problems. But a quadratic speedup, unlike an exponential one, does not really move
the frontier between solvability and intractability. Quantum computers may someday
outperform classical computers in performing exhaustive search, but only if the clock
speed of quantum devices does not lag too far behind that of their classical counterparts.
where each term $H_a$ acts non-trivially on at most $k$ qubits, i.e. $H_a = \tilde H_a \otimes I^{n-k}$, and
$\tilde H_a$ acts on some set of at most $k$ qubits. (Of course, we may use a similar definition for
a system of $d$-dimensional subsystems for constant $d > 2$, rather than qubits.) We say
that $H$ is local if it is $k$-local for some constant $k$.
There is a stronger notion of locality we sometimes use, which can be called geometrical
locality or spatial locality. A k-local Hamiltonian is geometrically local in D dimensions
if the qubits can be arranged in (flat) D-dimensional space with a bounded number
of qubits per unit volume, and the k qubits upon which Ha acts non-trivially are all
contained in a ball of constant radius. In this sense there are no long-range interactions
among the qubits. H is geometrically local if it is geometrically k-local in D dimensions
for some constant D and k.
If we write $H = \sum_a H_a$ where there is a unique $H_a$ for each set of $k$ qubits, then
the expansion of a $k$-local $H$ contains at most $\binom{n}{k} = O(n^k)$ terms, and the expansion
of a geometrically local $H$ contains $O(n)$ terms (each of the $n$ qubits is contained in a
constant number of interacting sets). Let us also assume that each $H_a$ is bounded
$$\|H_a\|_\infty \le h \text{ for all } a, \text{ where } h \text{ is a constant.} \tag{6.188}$$
Physicists are interested in geometrically local Hamiltonians because they seem to pro-
vide an accurate description of Nature. Therefore, it is noteworthy that quantum circuits
can simulate quantum evolution governed by a local Hamiltonian efficiently: evolution
of n qubits for time t can be simulated to constant accuracy using a circuit whose size
is polynomial in n and t.
We can formulate the problem this way: suppose we are given an initial quantum state
|ψ(0)i, or a classical description of a quantum circuit that prepares the state. Our goal
is to construct
$$|\psi(t)\rangle = U(t)|\psi(0)\rangle \tag{6.189}$$
where $U(t)$ satisfies $\frac{d}{dt} U(t) = -iH(t)U(t)$ and the boundary condition $U(0) = I$. (Thus
$U(t) = e^{-iHt}$ in the case where $H$ is time independent.) We will settle for computing
$|\psi(t)\rangle$ to accuracy $\delta$, i.e. constructing $|\tilde\psi(t)\rangle$ where
where Ũ is the simulated unitary and U is the ideal unitary. Hence the error per time
step should be less than δ∆/t.
Suppose $H = \sum_a H_a$ is a sum of $M$ $k$-local terms, and let’s consider the geometrically
local case, where $M = O(n)$. We will show below that a single time step can be simulated
by a product of $M$ local “gates” (unitary transformations that act on a constant number
of qubits) where each such “gate” has an error $O(\Delta^2 h^2)$. Therefore the simulation of
evolution for time $t$ uses all together $Mt/\Delta$ gates where we require
$$\frac{Mt}{\Delta} \Delta^2 h^2 \approx \delta \implies \Delta = O\left( \frac{\delta}{h^2 M t} \right). \tag{6.193}$$
Therefore the total number of gates is
$$L = O\left( \frac{h^2 (Mt)^2}{\delta} \right). \tag{6.194}$$
Furthermore each “gate” can be simulated to accuracy $O(\Delta^2 h^2)$ with a universal gate
set using $\mathrm{polylog}\left( \frac{1}{\Delta^2 h^2} \right) = \mathrm{polylog}\left( \frac{h^2 (Mt)^2}{\delta^2} \right)$ gates, according to the Solovay-Kitaev
theorem. We conclude that the simulation can be done with a quantum circuit of size
$$L = O\left( \frac{h^2 (Mt)^2}{\delta}\, \mathrm{polylog}\left( \frac{h^2 (Mt)^2}{\delta^2} \right) \right). \tag{6.195}$$
In the case where $H$ is geometrically local, $M = O(n) = O(V)$, where $V$ is the spatial
volume of the system. Since $h$ is a constant, we find that the cost of simulating time
evolution with fixed accuracy scales like
$$L = O(\Omega^2\, \mathrm{polylog}\, \Omega), \tag{6.196}$$
where $\Omega = Vt$ is the simulated volume of spacetime.
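Now we need to explain how to simulate a single time step. We’ll use the idea that
$\exp\left( \sum_a A_a \right)$ can be approximated by $\prod_a \exp(A_a)$ if $\|A\| \ll 1$. To check the accuracy
we expand the exponentials:
$$\begin{aligned}
&\exp\Big( \sum_a A_a \Big) - \prod_a \exp(A_a) \\
&= \Big( 1 + \sum_a A_a + \frac{1}{2} \sum_{a,b} A_a A_b + \dots \Big) - \prod_a \Big( 1 + A_a + \frac{1}{2} A_a^2 + \dots \Big) \\
&= \Big( 1 + \sum_a A_a + \frac{1}{2} \sum_{a,b} A_a A_b + \dots \Big) - \Big( 1 + \sum_a A_a + \sum_a \frac{1}{2} A_a^2 + \sum_{a<b} A_a A_b + \dots \Big) \\
&= \frac{1}{2} \Big( \sum_{a<b} A_a A_b + \sum_{a<b} A_b A_a \Big) - \sum_{a<b} A_a A_b + \dots \\
&= -\frac{1}{2} \sum_{a<b} [A_a, A_b] + \dots
\end{aligned} \tag{6.197}$$
(where $+ \dots$ denotes terms higher order in $A_a$). Writing $H = \sum_a H_a$, then, we find that
$$e^{-iH\Delta} - \prod_a e^{-iH_a\Delta} = \frac{1}{2} \Delta^2 \sum_{a<b} [H_a, H_b] + \text{h.o.} \tag{6.198}$$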
Now, how many non-vanishing commutators $\{[H_a, H_b]\}$ can occur in this sum? Let’s
suppose the Hamiltonian is geometrically local, in which case there are $O(n)$ terms
in $H$, and each term fails to commute with a constant number of terms. So, there are
$O(n) = O(M)$ non-vanishing commutators. We conclude that (in the geometrically local
case)
$$\Big\| e^{-iH\Delta} - \prod_a e^{-iH_a\Delta} \Big\| = O(M \Delta^2 h^2). \tag{6.199}$$
Since $\prod_a e^{-iH_a\Delta}$ is a product of $M$ “gates,” we have verified that the accuracy per gate
is $O(\Delta^2 h^2)$. (Note that terms arising from the higher-order terms in the expansion of
the exponential are of order $M \Delta^3 h^3$, and therefore systematically suppressed by another
factor of $\Delta h = O(\delta / hMt) = O((\delta/L)^{1/2})$.)
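The quadratic scaling of the single-step error with $\Delta$ can be checked numerically on a small system. The sketch below assumes a toy Hamiltonian on 4 qubits with nearest-neighbour $ZZ$ couplings and single-qubit $X$ fields (this model is not from the text), and the helper names `expm_herm` and `on_site` are illustrative.

```python
import numpy as np


def expm_herm(H, t):
    """exp(-i H t) for Hermitian H, computed via eigendecomposition."""
    vals, vecs = np.linalg.eigh(H)
    return (vecs * np.exp(-1j * vals * t)) @ vecs.conj().T


X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def on_site(op, site, n):
    """Tensor `op` acting on qubit `site` (0-based) with identities elsewhere."""
    out = np.array([[1.0 + 0j]])
    for k in range(n):
        out = np.kron(out, op if k == site else I2)
    return out


n = 4
# Assumed toy model: nearest-neighbour ZZ couplings plus transverse X fields.
terms = [on_site(Z, k, n) @ on_site(Z, k + 1, n) for k in range(n - 1)]
terms += [on_site(X, k, n) for k in range(n)]
H = sum(terms)


def trotter_error(dt):
    """Operator-norm distance between exp(-iH dt) and the product of local gates."""
    product = np.eye(2**n, dtype=complex)
    for Ha in terms:
        product = product @ expm_herm(Ha, dt)
    return np.linalg.norm(expm_herm(H, dt) - product, ord=2)


errors = [trotter_error(dt) for dt in (0.01, 0.005, 0.0025)]
ratios = [errors[i] / errors[i + 1] for i in range(2)]
print(ratios)  # each ratio should be close to 4
```

Halving the step reduces the error by about a factor of 4, as expected when the leading term is the $\Delta^2$ commutator sum.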
So far we have shown that, for a geometrically local H that is a sum of bounded terms,
evolution in a spacetime volume Ω can be achieved with a quantum circuit of size
$$L = O(\Omega^2\, \mathrm{polylog}\, \Omega). \tag{6.200}$$
The resources needed for the simulation scale like the square of the simulated volume
(up to a log factor). Can this be improved?
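An improved approximation to $\exp\left( \sum_a A_a \right)$ is the subject of an exercise. Instead
of $\prod_a e^{A_a}$ we use $\prod_{a\rightarrow} e^{\frac{1}{2} A_a} \prod_{a\leftarrow} e^{\frac{1}{2} A_a}$, where $\prod_{a\rightarrow}$ denotes the product in ascending
order, and $\prod_{a\leftarrow}$ denotes the product in descending order. The exercise shows that, for
geometrically local $H$,
$$\Big\| e^{-iH\Delta} - \prod_{a\rightarrow} e^{-\frac{i}{2} H_a \Delta} \prod_{a\leftarrow} e^{-\frac{i}{2} H_a \Delta} \Big\| = O(M \Delta^3 h^3), \tag{6.201}$$
i.e. the error per gate is $O(h^3 \Delta^3)$ instead of $O(h^2 \Delta^2)$. For an accurate simulation, we
choose
$$\frac{Mt}{\Delta} (\Delta^3 h^3) \approx \delta \implies \Delta \approx \left( \frac{\delta}{h^3 M t} \right)^{1/2}; \tag{6.202}$$
the number of gates is
$$\frac{Mt}{\Delta} \approx \frac{h^{3/2} (Mt)^{3/2}}{\delta^{1/2}}, \tag{6.203}$$
and the Solovay-Kitaev blowup factor is $\mathrm{polylog}\left( \frac{1}{\Delta^3 h^3} \right) = \mathrm{polylog}\left( \frac{h^{3/2} (Mt)^{3/2}}{\delta^{3/2}} \right) = \mathrm{polylog}\left( \frac{hMt}{\delta} \right)$.
We conclude that spacetime volume $\Omega$ can be simulated with a circuit of
size
$$L = O(\Omega^{3/2}\, \mathrm{polylog}\, \Omega). \tag{6.204}$$
The improved approximation in each time step increases the circuit size by only a factor
of 2.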
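The cubic error of the symmetrized product can be checked in the same spirit. This sketch assumes a two-qubit toy Hamiltonian with terms $Z \otimes Z$, $X \otimes I$ and $I \otimes X$ (chosen for illustration only), with the factors $-i\Delta$ absorbed into the exponents.

```python
import numpy as np


def expm_herm(H, t):
    """exp(-i H t) for Hermitian H, computed via eigendecomposition."""
    vals, vecs = np.linalg.eigh(H)
    return (vecs * np.exp(-1j * vals * t)) @ vecs.conj().T


X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

# Assumed toy model on two qubits.
terms = [np.kron(Z, Z), np.kron(X, I2), np.kron(I2, X)]
H = sum(terms)


def symmetric_step(dt):
    """Half steps in ascending order followed by half steps in descending order."""
    step = np.eye(4, dtype=complex)
    for Ha in terms:
        step = step @ expm_herm(Ha, dt / 2)
    for Ha in reversed(terms):
        step = step @ expm_herm(Ha, dt / 2)
    return step


def error(dt):
    return np.linalg.norm(expm_herm(H, dt) - symmetric_step(dt), ord=2)


ratio = error(0.02) / error(0.01)
print(ratio)  # should be close to 2**3 = 8
```

Halving the step now reduces the error by about a factor of 8, while the step uses $2M$ gates instead of $M$, consistent with the factor-of-2 overhead.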
The accuracy of this Trotter-Suzuki approximation to $e^{-iH\Delta}$ can be improved further,
so that
$$\| e^{-iH\Delta} - \text{approximation} \| = O\left( c_p M (h\Delta)^{p+1} \right) \tag{6.205}$$
where $p$ is any power, the constant $c_p$ depends on $p$, and the improved approximation
increases the number of gates by a factor that depends on $p$. With this approximation,
we choose
$$\frac{Mt}{\Delta} (h\Delta)^{p+1} \approx \delta \implies \Delta \approx \frac{\delta^{1/p}}{h^{(p+1)/p} (Mt)^{1/p}}, \tag{6.206}$$
so that the number of gates is
$$\frac{Mt}{\Delta} \approx \frac{h^{(p+1)/p} (Mt)^{(p+1)/p}}{\delta^{1/p}}. \tag{6.207}$$
Including the Solovay-Kitaev log factor, we can do the simulation with a circuit of size
$$L = O(\Omega^{1+\epsilon}\, \mathrm{polylog}\, \Omega), \tag{6.208}$$
where $\epsilon = 1/p$. In fact, though, the factor $c_p$ grows exponentially with $p = 1/\epsilon$; this
happens because each time we increase the value of $p$ by 1, we pay the price of
increasing the circuit size by a constant factor. But anyway, as we systematically improve the
approximation, the circuit size comes closer and closer to scaling linearly with the
simulated volume, though it never quite makes it. Somehow, Nature manages to simulate
herself with “resources” scaling like $\Omega$ (and without any error!), but this approximation
method using a universal quantum computer does not do quite as well. There are other
simulation methods that more nearly match the scaling Nature achieves.
Whether the quantum circuit model can simulate Nature efficiently is an important
issue, because it addresses whether this model is the right one for studying what problems
can be solved with reasonable resources by computers that are in principle physically
realizable. We believe that the classical Turing machine model is not the right model,
because it seems to be incapable of efficiently simulating general quantum systems, even
ones for which the Hamiltonian is geometrically local. The quantum circuit model is
presumably stronger, but is it really strong enough?
Particle physicists model fundamental physics using quantum field theory (QFT).
Some predictions of QFT can be computed classically, and the success of such predictions
provides the evidence that QFT is a good model. But we don’t know how to efficiently
simulate on a classical computer the real-time evolution of, for example, nuclear matter
governed by quantum chromodynamics. Can we do such simulations efficiently on a
quantum computer?
The question is subtle because the number of qubits per unit volume is formally
infinite in QFT (there are degrees of freedom at arbitrarily small distances) and al-
though the Hamiltonian is local, the terms in the Hamiltonian do not necessarily have
bounded norm. On the other hand, in a physical process of interest, we can usually
assume that the energy density per unit volume is bounded above. In that case, we
expect that the very-short-distance degrees of freedom are not so relevant, so that we
can attain reasonable accuracy by approximating continuous space with a finite den-
sity of lattice points per unit volume, and that furthermore the terms in the local
Hamiltonian can be well approximated by operators with bounded norm. A reasonable
expectation is that the complexity of a simulation of the dynamics scales polynomially
in $\Omega \rho_{\max} = (V t \rho_{\max}) = E_{\max} t$ where $\rho_{\max}$ is the maximal energy per unit volume, and
$E_{\max}$ is an upper bound on the total energy. But there is no rigorous theorem
establishing such scaling. Though physicists have a pretty good grasp of the properties of QFT
(as indicated by the agreement between theory and experimental data), rigorous
mathematical results pertaining to QFT are still quite technical and incomplete, particularly
in three spatial dimensions (where we live).
Our understanding is even less satisfactory for physical processes in which gravita-
tional effects are important. Can the quantum circuit model simulate quantum gravity
efficiently? If so, quantum computers may turn out to be a powerful tool for deepening
our understanding of quantum gravity. If not, then we still have not found the computa-
tional model that fully captures the computational power that is potentially realizable
in Nature.
if the matrix is very sparse and has a simple structure, but in some physically relevant
cases efficient classical algorithms are not known.
Sometimes, if we express H in a “natural” basis we find that all the off-diagonal terms
in the matrix are non-positive, i.e.
$$H = cI - h \tag{6.209}$$
where $I$ is the identity and $h$ has only non-negative entries. In that case, the ground state
$|\psi_0\rangle$ of $H$ (the eigenvector with the lowest eigenvalue) can be expressed as $|\psi_0\rangle = \sum_i c_i |i\rangle$
where $|i\rangle$ is a basis element and all $c_i$ can be chosen non-negative. That is because $|\psi_0\rangle$
maximizes
$$\langle \psi_0|h|\psi_0\rangle = \sum_{i,j} c_i^* c_j h_{ij} \tag{6.210}$$
and so it is optimal to choose $c_i^* c_j \ge 0$ for all $i$ and $j$. When all the $c_i$ are nonnegative,
there are quantum Monte Carlo sampling algorithms that can find the $\{c_i\}$ efficiently
and accurately in practice. But if the off-diagonal terms in $H$ have both $+$ and $-$ signs,
then sampling algorithms might not work well because there can be delicate cancellations
between positive and negative terms contributing to $\langle \psi_0|h|\psi_0\rangle$. Computational physicists
call this the sign problem.
But on a quantum computer, we can estimate eigenvalues of a unitary matrix U using
the phase estimation algorithm. To obtain m bits of accuracy, we prepare an m-qubit
register in a uniform superposition state, and execute this circuit:
-figure-
-figure-
The location of each narrow peak estimates an energy eigenvalue $E_a$, modulo $2\pi/T$.
The height of the peak estimates $|\langle E_a|\psi\rangle|^2$, the overlap of $|\psi\rangle$ with the corresponding
energy eigenstate $|E_a\rangle$. To compute the energy eigenvalue to accuracy polynomial in $n$,
we choose
$$\delta \approx 2^{-m} \approx 1/n^c \implies m = c \log_2 n. \tag{6.212}$$
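The link between the register size $m$ and the resolution $2^{-m}$ can be illustrated by computing the outcome distribution of phase estimation for a single eigenvalue $e^{2\pi i \phi}$. The sketch assumes the standard form of the distribution after the inverse Fourier transform (sign conventions may differ from the circuit in the figure), and the values $m = 5$, $\phi = 0.3$ are arbitrary.

```python
import cmath


def phase_estimation_distribution(phi, m):
    """Outcome probabilities when estimating eigenvalue exp(2 pi i phi) with m qubits."""
    size = 2**m
    probs = []
    for y in range(size):
        # Amplitude for outcome y: a geometric sum peaked where y/size is near phi.
        amp = sum(cmath.exp(2j * cmath.pi * x * (phi - y / size))
                  for x in range(size)) / size
        probs.append(abs(amp) ** 2)
    return probs


m, phi = 5, 0.3
probs = phase_estimation_distribution(phi, m)
best = max(range(2**m), key=probs.__getitem__)

print(best, best / 2**m)   # 10 0.3125
print(probs[best])         # exceeds 4 / pi**2, about 0.405
```

The most likely outcome is the $m$-bit fraction closest to $\phi$, here $10/32$, so each additional qubit in the register halves the width of the peak.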
electronic ground state of a molecule with atomic nuclei at fixed positions to accuracy
$\approx 10^{-6}$ (chemical accuracy). They also claim that it is an adequate approximation to
express
$$H = \sum_{a=1}^{M} H_a \tag{6.218}$$
and each $H_a$ acts on $\le 4$ basis functions. Hence this Hamiltonian is local.
Furthermore, chemists assert (without proof) that it is possible to evolve adiabatically from
the Hartree-Fock Hamiltonian (which they can solve classically) to the full configuration
interaction (FCI) Hamiltonian (which they want to solve, but don’t know how to solve
classically in general) while the gap satisfies $\Delta \ge$ constant. If that is true, quantum
computers are likely to be a powerful tool for studying molecular chemistry.
let us use the Schmidt number. Recall that the Schmidt number is the number of terms
in the Schmidt expansion of a bipartite pure state; equivalently it is the rank of the
density operator for either part.
While in principle the Schmidt number could be as large as $d^{n/2}$ when the system is
cut into two subsystems of equal size, let us assume that
$$\text{Schmidt number} \le D = \text{constant}. \tag{6.219}$$
We claim that under this assumption:
1. There is a succinct classical description of the pure state of $n$ qudits.
2. A computation in which the $n$ qudits have bounded Schmidt number at all times can
be efficiently simulated on a classical computer.
First, let’s construct the succinct description. The $n$-qudit state $|\psi\rangle$ can be expanded
in the standard basis as
$$|\psi\rangle = \sum c_{i_1 i_2 .. i_n} |i_1 i_2 .. i_n\rangle \tag{6.220}$$
in terms of the $d^n$ complex numbers $\{c_{i_1 i_2 .. i_n}\}$. This is not succinct in general, but if the
Schmidt rank is bounded then each $c_{i_1 i_2 .. i_n}$ can be expressed as a contraction of tensors,
where each tensor has 3 indices, and each index has a constant number of values:
-figure-
Here each $(P_k)^{i_k}_{\alpha_{k-1} \alpha_k}$ has $dD^2$ entries (except for the sites at the ends of the chain,
which have $dD$ entries). Therefore, the state is described by $ndD^2$ complex parameters,
a number that is linear rather than exponential in $n$. (In the case of a product state,
this becomes $nd$ parameters.)
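To obtain this compressed description of the $n$-site state $|\psi\rangle$ we perform the Schmidt
decomposition repeatedly ($n - 1$ times in succession). In the first step we divide the
system between qudits 1 and 2:
$$|\psi\rangle = \sum_{\alpha_1} \lambda_{1\alpha_1} |\psi_{1\alpha_1}\rangle |\phi_{2\alpha_1}\rangle. \tag{6.221}$$
Here the $\{\lambda_{1\alpha_1}\}$ are Schmidt coefficients, the $\{|\psi_{1\alpha_1}\rangle\}$ are elements of an orthonormal
basis for the first qudit, and $\{|\phi_{2\alpha_1}\rangle\}$ are elements of an orthonormal basis for qudits
$2, 3, .., n$. We can expand $|\psi_{1\alpha_1}\rangle$ in the standard basis, obtaining
$$|\psi\rangle = \sum_{\alpha_1} \sum_{i_1} \lambda_{1\alpha_1} \psi^{i_1}_{1\alpha_1} |i_1\rangle |\phi_{2\alpha_1}\rangle = \sum_{\alpha_1} \sum_{i_1} (P_1^{i_1})_{\alpha_1} |i_1\rangle |\phi_{2\alpha_1}\rangle, \tag{6.222}$$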
Note that there is no $\alpha_1$ label on the states $\{|\phi_{3\alpha_2}\rangle\}$. That is because these are the states
occurring in the Schmidt decomposition across the 2-3 cut that describe the right side of
44 Quantum Algorithms
the cut. These states have nothing to do with $\alpha_1$, which labels the states in the Schmidt
decomposition across the 1-2 cut.
Repeating the Schmidt decomposition $n-1$ times we find
$$|\psi\rangle = \sum_{\alpha_1,\dots,\alpha_{n-1}}\ \sum_{i_1,\dots,i_n} (P_1^{i_1})_{\alpha_1} (P_2^{i_2})_{\alpha_1\alpha_2} (P_3^{i_3})_{\alpha_2\alpha_3} \cdots (P_n^{i_n})_{\alpha_{n-1}}\, |i_1 i_2 \dots i_n\rangle. \tag{6.224}$$
This is the decomposition in terms of contracted tensors that we sought. It is called the
matrix product state (MPS) decomposition of $|\psi\rangle$. It exists for any $|\psi\rangle$, but for generic
$|\psi\rangle$, the matrices in the middle of the chain of $n$ qudits become exponentially large
in $n$. Under the assumption of bounded Schmidt rank, however, $P_1^{i_1}$ and $P_n^{i_n}$ are
$D$-component vectors and $P_k^{i_k}$ for $k = 2, 3, \dots, n-1$ are $D \times D$ matrices.
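To make the contraction in (6.224) concrete, here is a minimal sketch that evaluates amplitudes of an n-qubit GHZ state $(|00\dots0\rangle + |11\dots1\rangle)/\sqrt{2}$ from an MPS with $D = 2$. The GHZ example and the names `ghz_mps` and `amplitude` are illustrative assumptions, not taken from the text.

```python
import math
from itertools import product


def ghz_mps(n):
    """Return (first, bulk_list, last) tensors of a D = 2 MPS for the GHZ state, n >= 2."""
    s = 1 / math.sqrt(2)
    first = [[s, 0.0], [0.0, s]]            # first[i][alpha]: vector for each physical index i
    bulk = [[[1.0, 0.0], [0.0, 0.0]],       # bulk[i][alpha][beta]: one 2x2 matrix per i
            [[0.0, 0.0], [0.0, 1.0]]]
    last = [[1.0, 0.0], [0.0, 1.0]]         # last[i][alpha]
    return first, [bulk] * (n - 2), last


def amplitude(mps, bits):
    """Contract vector * matrix * ... * matrix * vector for one basis string."""
    first, bulks, last = mps
    vec = list(first[bits[0]])
    for tensor, i in zip(bulks, bits[1:-1]):
        mat = tensor[i]
        vec = [sum(vec[a] * mat[a][b] for a in range(len(vec)))
               for b in range(len(mat[0]))]
    return sum(v * w for v, w in zip(vec, last[bits[-1]]))


if __name__ == "__main__":
    mps = ghz_mps(4)
    for bits in product((0, 1), repeat=4):
        amp = amplitude(mps, bits)
        if amp != 0.0:
            print(bits, amp)
```

Each amplitude costs a number of operations linear in $n$ (for fixed $D$), even though the state has $2^n$ amplitudes in total.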
Therefore, we have expressed $|\psi\rangle = \sum_{i_1,\dots,i_n} P_1^{i_1} P_2^{i_2} \cdots P_n^{i_n}\, |i_1 i_2 \dots i_n\rangle$ where the coefficient
This description looks like the MPS locally, but with the ends of the chain now glued
together.
Suppose that the qudits are arranged in a one-dimensional chain, and that a circuit of
geometrically local quantum gates acts on the state, such that each gate acts on a pair of
neighboring qudits. We want to see that the MPS description can be efficiently updated
as this circuit is executed. If a unitary transformation U acts on a pair of neighboring
qudits on the chain as in
-figure-
-figure-
We regard $M^{ij}_{\alpha\gamma}$ as a $dD \times dD$ matrix, and we perform its singular value decomposition
(SVD). Recall that for any matrix $M$, there are unitary matrices $V_L$ and $V_R$ such that
$M = V_L^\dagger \Delta V_R$, where $\Delta$ is diagonal. This Schmidt decomposes the state
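$$\sum_{ac} M_{ac}\, |ac\rangle = \sum_b \Delta_b\, |\psi_{Lb}\rangle \otimes |\psi_{Rb}\rangle \tag{6.227}$$
where $|\psi_{Lb}\rangle = \sum_a |a\rangle (V_L^\dagger)_{ab}$ and $|\psi_{Rb}\rangle = \sum_c |c\rangle (V_R)_{bc}$. Here $|ac\rangle$ is shorthand for $|\alpha i, \gamma j\rangle$,
where $\alpha$ labels the states in the Schmidt decomposition across the cut just to the left of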
6.9 Classical simulation of slightly entangling quantum computations 45
the two neighboring sites where $U$ acts, and $\gamma$ labels states in the Schmidt decomposition
across the cut just to the right.
The SVD of an $N \times N$ matrix can be computed in time $O(N^3)$ on a classical computer.
In the case of $M^{ij}_{\alpha\gamma}$, for which $N = dD$, this is time $O(d^3 D^3)$. In general, the middle index
$b$ would be summed over $N$ $(= dD)$ values. But if the Schmidt rank stays bounded by $D$,
the matrix $\Delta$ has at most $D$ nonzero diagonal entries (singular values). We may identify
$(P'^{\,i})_{\alpha\beta} = (V_L^\dagger \sqrt{\Delta})_{\alpha i,\beta}$ and $(Q'^{\,j})_{\beta\gamma} = (\sqrt{\Delta}\, V_R)_{\beta,\gamma j}$, completing the update of the MPS. The point is that when
$U$ acts on neighboring sites labeled $a$, $a+1$, we need not update the Schmidt states to
the left of site $a$ or to the right of site $a+1$.
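The update step can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the tensor layout `P[i, alpha, beta]`, the gate layout `U[i', j', i, j]`, the truncation tolerance `tol`, and the function names are all choices made for this example. In `numpy.linalg.svd`, the returned `u` plays the role of $V_L^\dagger$ and `vh` the role of $V_R$.

```python
import numpy as np


def apply_two_site_gate(P, Q, U, tol=1e-12):
    """Apply a two-qudit gate to neighboring MPS tensors and re-split by SVD.

    P has shape (d, Dl, Dm), Q has shape (d, Dm, Dr), U has shape (d, d, d, d).
    """
    d, Dl, _ = P.shape
    Dr = Q.shape[2]
    # theta[alpha, i, j, gamma] = sum_beta P[i, alpha, beta] * Q[j, beta, gamma]
    theta = np.einsum("iab,jbc->aijc", P, Q)
    # M[alpha, i', j', gamma] = sum_{i,j} U[i', j', i, j] * theta[alpha, i, j, gamma]
    M = np.einsum("klij,aijc->aklc", U, theta)
    # Rows are labeled by (alpha, i'), columns by (j', gamma).
    u, s, vh = np.linalg.svd(M.reshape(Dl * d, d * Dr), full_matrices=False)
    keep = max(1, int(np.sum(s > tol)))      # discard vanishing singular values
    root = np.sqrt(s[:keep])
    P_new = (u[:, :keep] * root).reshape(Dl, d, keep).transpose(1, 0, 2)
    Q_new = (root[:, None] * vh[:keep]).reshape(keep, d, Dr).transpose(1, 0, 2)
    return P_new, Q_new


def bell_demo():
    """Start from |00>, apply CNOT (H x I), and return the updated tensors."""
    zero = np.zeros((2, 1, 1))
    zero[0, 0, 0] = 1.0
    hadamard = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    cnot = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float)
    U = (cnot @ np.kron(hadamard, np.eye(2))).reshape(2, 2, 2, 2)
    return apply_two_site_gate(zero, zero.copy(), U)


if __name__ == "__main__":
    P_new, Q_new = bell_demo()
    print(P_new.shape, Q_new.shape)          # the shared bond index has grown to 2
```

Only the two tensors at the sites where the gate acts are touched, which is what keeps each update at cost $O(d^3 D^3)$.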
Acting on a product state, any 2-qudit gate acting across a given cut can produce a
state with Schmidt number at most $d^2$ across that cut.
-figure-
This can be achieved by a SWAP gate acting on two maximally entangled pairs, one
on each side of the cut. Likewise, acting on a state with Schmidt number $D$, the
two-qudit gate can increase the Schmidt number to at most $d^2 D$. This means that, starting
with a product state, a circuit in which no more than $G$ gates act across any particular
cut creates a state with Schmidt number at most $D = (d^2)^G$ across that cut. (We can
simulate a 2-qudit gate $U$ acting across the cut by swapping the qudits until they are at
adjacent sites on either side of the cut, applying $U$, and then swapping back. Only $U$,
not the swaps, increases the Schmidt number, and it increases it by at most the factor
$d^2$.) The quantum state admits an efficient MPS classical description provided that the
Schmidt number $D$ is bounded above by poly(n) across every cut, and furthermore this
description can be efficiently updated each time a quantum gate is applied. It follows that
a quantum computation acting on $n$ qudits can be simulated efficiently by a classical
computer if the initial state of the quantum computer is a product state, and the number
of gates that act across each cut (for some ordering of the qudits) is $O(\log n)$, since in
that case we have $D \le (d^2)^{O(\log n)} = n^{O(\log d^2)}$.
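The growth bound can be tabulated with a short sketch. The helper names, the base-2 logarithm, and the constant `c` in $G = \lfloor c \log_2 n \rfloor$ are assumptions made only for illustration.

```python
import math


def schmidt_bound(d: int, gates_across_cut: int) -> int:
    """Upper bound (d**2)**G on the Schmidt number across a cut crossed by G gates."""
    return (d * d) ** gates_across_cut


def log_depth_bound(n: int, d: int, c: float = 1.0) -> int:
    """Bound on D when at most G = floor(c * log2(n)) gates cross each cut."""
    G = math.floor(c * math.log2(n))
    return schmidt_bound(d, G)


if __name__ == "__main__":
    # For qubits (d = 2) and c = 1 the bound is n**2, polynomial in n.
    for n in (16, 256, 1024):
        print(n, log_depth_bound(n, 2), n ** 2)
```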
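We should also note that, for an MPS, local measurements of the qudits in the chain
are easy to simulate. First let’s impose the proper normalization condition on the state
$|\psi\rangle$:
$$|\psi\rangle = \sum_{i_1,\dots,i_n} \mathrm{tr}\left(P_1^{i_1} P_2^{i_2} \cdots P_n^{i_n}\right) |i_1 i_2 \dots i_n\rangle \tag{6.228}$$
$$\implies \langle\psi|\psi\rangle = \sum_{i_1,\dots,i_n} \left|\mathrm{tr}\left(P_1^{i_1} P_2^{i_2} \cdots P_n^{i_n}\right)\right|^2,$$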
the trace of a product of n constant-size matrices in time O(n). This completes the proof
that slightly entangled quantum computations can be classically simulated in polynomial
time.
Finally, let’s return to the problem of finding the energy of the ground state of a local
Hamiltonian. In many one-dimensional systems studied in physics, the ground state and
low-lying excited states can be well approximated by an MPS with a reasonable matrix
size. That is, there is a good approximation to e.g. the ground state with maximal
Schmidt number across any cut
D = poly(n). (6.231)
However there are local Hamiltonians (even translation-invariant ones) for which finding
the ground state energy seems to be hard not only for classical computers but also for
quantum computers! In these cases, either an accurate MPS representation requires
matrices super-polynomial in size, or, if there is a good MPS approximation with D =
poly(n), it must be a hard computational task to find that approximation.
quantum analogue of NP: the class of problems such that the solution can be verified
efficiently with a quantum computer if a suitable “quantum witness” is provided.)
For the classical case, we’ll recall the notion of a “reduction” of one computational
problem to another (B reduces to A if a machine that solves A can be used to solve B),
and then we’ll consider this sequence of reductions:
1. Any problem in NP reduces to CIRCUIT-SAT (already discussed previously); i.e.,
CIRCUIT-SAT is NP-complete.
2. CIRCUIT-SAT reduces to 3-SAT (3-SAT is NP-complete).
3. 3-SAT can be formulated as the problem of finding the ground state energy of a
classical 3-local Hamiltonian to constant accuracy.
4. MAX 2-SAT is also NP-hard and can be formulated as the problem of finding the
ground state energy of a classical 2-local Hamiltonian to constant accuracy.
5. The classical 2-local Hamiltonian problem is still hard in the case where the Hamil-
tonian is geometrically local, in three or even in two dimensions (cases of interest for
describing real spin glasses).
(5) implies that a spin glass will not be able to relax to its ground state efficiently in any
realistic physical process (which is part of what physicists mean by the word “glass”).
each variable is a bit. The formula is a conjunction of clauses, and the formula is true if
and only if every clause is true. In the k-SAT problem, each clause depends on at most
k of the variables, where k is a constant. (In some formulations of k-SAT, each clause is
required to be a disjunction of k literals (variables or their negations), but that is not
an important requirement, since any formula, and in particular any k-bit formula, can
be expressed in conjunctive normal form.) If f is a Boolean formula, the SAT function
is:
Now we’ll show that CIRCUIT-SAT reduces to 3-SAT. For a given circuit C (the input
to CIRCUIT-SAT), how do we construct the corresponding Boolean formula R(C) (the
input to 3-SAT)?
Suppose that the gates in the circuit C are chosen from the universal set (AND, OR,
NOT), or any other gate set such that each gate has at most two input bits and one
output bit. We introduce a variable for the output of each gate, and we include in the
formula R(C) a clause corresponding to each gate.
Here the three-variable clause $C_g(x, y, z)$ is true iff $z$ is a valid output of the gate $g$ when
the inputs are $(x, y)$. The circuit may also have inputs that are constants rather than
variables. Then, e.g., a gate with two input bits, one of which is a constant, becomes a
two-variable clause, determined by the gate and the value of the constant. Or equivalently,
we can regard input of a constant as a gate with a one-bit output; the corresponding
one-bit clause is true if $x$ has the right value. The circuit also has an answer bit, which
becomes a 1-bit clause $C_g(x)$ which is true iff $x = 1$ (that is, if and only if the answer is
YES).
The formula R(C) has as variables the input x to the circuit C, and also additional
variables corresponding to the outputs of all gates in the circuit C. R(C) has been
constructed so that an assignment that satisfies every clause in R(C) corresponds to a
valid history of the computation of the circuit C acting on input x, where the input is
accepted. If there is an input x that is accepted by the circuit C, then there will be a
satisfying assignment for the 3-SAT formula R(C), and conversely if there is no input
that C accepts, then there will be no satisfying assignment for R(C). Thus we have
obtained the desired reduction of CIRCUIT-SAT to 3-SAT.
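The construction of R(C) can be sketched in a few lines. This is a toy illustration: the gate-list format, the names `reduce_circuit` and `satisfying_assignments`, and the brute-force search (exponential in the number of variables) are assumptions made for this example only.

```python
from itertools import product

# Each gate maps its input bits to one output bit.
GATES = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "NOT": lambda x: 1 - x,
}


def reduce_circuit(gates, answer):
    """Build R(C): one clause per gate, plus a 1-bit clause forcing the answer bit to 1.

    `gates` is a list of (gate name, input variable names, output variable name).
    Each clause is a pair (variables, predicate) touching at most 3 variables.
    """
    clauses = []
    for op, inputs, out in gates:
        fn = GATES[op]
        # True iff the last variable equals the gate applied to the other variables.
        clauses.append((tuple(inputs) + (out,),
                        lambda *v, fn=fn: v[-1] == fn(*v[:-1])))
    clauses.append(((answer,), lambda z: z == 1))
    return clauses


def satisfying_assignments(clauses):
    """Enumerate all assignments that satisfy every clause (brute force)."""
    variables = sorted({v for vs, _ in clauses for v in vs})
    for values in product((0, 1), repeat=len(variables)):
        a = dict(zip(variables, values))
        if all(pred(*(a[v] for v in vs)) for vs, pred in clauses):
            yield a


# XOR built from AND, OR, NOT: (x OR y) AND NOT (x AND y).
XOR_CIRCUIT = [
    ("OR", ("x", "y"), "g1"),
    ("AND", ("x", "y"), "g2"),
    ("NOT", ("g2",), "g3"),
    ("AND", ("g1", "g3"), "out"),
]
# A circuit that never accepts: x AND NOT x.
NEVER_CIRCUIT = [("NOT", ("x",), "n"), ("AND", ("x", "n"), "out")]

if __name__ == "__main__":
    for a in satisfying_assignments(reduce_circuit(XOR_CIRCUIT, "out")):
        print(a)
```

Each satisfying assignment records the value on every wire, which is the "valid history" that serves as the witness.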
The key idea we have exploited to reduce CIRCUIT-SAT to 3-SAT is that the witness
for 3-SAT is a valid history of the whole computation C that accepts the input x. We can
check the history efficiently because the circuit C has polynomial size and it is easy to
check each of the poly(n) gates in the execution of the circuit. Later on, we will extend
this idea — that a valid history of the computation is an efficiently checkable witness
— to the quantum setting.
Notice that we may think of the clauses in the formula f as the terms in a 3-local
classical Hamiltonian
$$H(x) = \sum_c H_c(x_{c_1}, x_{c_2}, x_{c_3}), \tag{6.234}$$
where $H_c = 0$ if clause $c$ is satisfied and $H_c = 1$ if it is violated, so that $H(x)$ counts the violated clauses.
Then $\min_x H(x) = 0$ if there is an assignment that satisfies every clause, while
$\min_x H(x) \ge 1$ if there is no satisfying assignment (the number of violated clauses
is $\ge 1$ for any assignment). We conclude that finding the minimum value of a 3-local
classical Hamiltonian is NP-hard: if we could do it we could solve 3-SAT, and hence any
problem in NP. This conclusion implies, as asserted earlier, that finding the ground state
energy of a 3-local classical Hamiltonian to constant accuracy must be hard in general
for quantum computers, too, unless NP ⊆ BQP.
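A small sketch of this clause-counting Hamiltonian follows. The literal encoding `(variable index, required value)` and the brute-force minimization are illustrative assumptions; the exhaustive search is of course exponential in the number of variables.

```python
from itertools import product


def energy(clauses, x):
    """H(x): number of violated clauses.

    A clause is a tuple of literals (v, want); the literal holds when x[v] == want,
    and the clause is violated only if none of its literals holds.
    """
    return sum(0 if any(x[v] == want for v, want in clause) else 1
               for clause in clauses)


def ground_energy(clauses, n):
    """Minimum of H over all 2**n assignments."""
    return min(energy(clauses, x) for x in product((0, 1), repeat=n))


# All 8 three-literal clauses on 3 variables: every assignment violates exactly one.
ALL_CLAUSES = [tuple((v, signs[v]) for v in range(3))
               for signs in product((0, 1), repeat=3)]

if __name__ == "__main__":
    print(ground_energy(ALL_CLAUSES, 3))        # unsatisfiable instance
    print(ground_energy(ALL_CLAUSES[:-1], 3))   # satisfiable after dropping one clause
```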
Although 2-SAT (deciding whether a 2-SAT formula can be satisfied) is easy (there is a
poly-time algorithm), MAX 2-SAT is an NP-hard problem. MAX 2-SAT is the problem of
finding the minimum number of violated clauses over all assignments, which is equivalent
to minimizing the Hamiltonian function H.
Furthermore, we can make the 2-local Hamiltonian geometrically local without losing
hardness. An example is the Ising spin-glass model in three dimensions. Suppose the
binary variables are spins sitting at the sites of a cubic lattice, taking values $Z_i \in \{\pm 1\}$
at site $i$ in the lattice. Consider the Hamiltonian
$$H = -\sum_{\langle ij\rangle} J_{ij} Z_i Z_j. \tag{6.237}$$
Here $\langle ij\rangle$ labels the edge in the lattice that connects two nearest-neighbor sites with
labels $i$ and $j$.
$J_{ij} \in \{\pm 1\}$ encodes the instance of the problem. If $J_{ij} = +1$, then we say the edge
$\langle ij\rangle$ is ferromagnetic; it is energetically favorable for the neighboring spins to align
(both $+1$ or both $-1$). If $J_{ij} = -1$, then we say the edge $\langle ij\rangle$ is anti-ferromagnetic; it is
energetically favorable for the neighboring spins to anti-align (either $+1$ and $-1$ or $-1$
and $+1$). If all edges were ferromagnetic, it would be easy to minimize the energy: all
spins would align. But anti-ferromagnetic edges can generate frustration. This means it
is not possible to minimize $-J_{ij} Z_i Z_j$ for all edges simultaneously.
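Frustration is already visible on a single square of four spins. The sketch below (the plaquette example and function names are assumptions for illustration) minimizes $H = -\sum J_{ij} Z_i Z_j$ by brute force, first with all couplings ferromagnetic and then with one anti-ferromagnetic edge.

```python
from itertools import product


def ising_energy(J, spins):
    """H = -sum over edges of J_ij * Z_i * Z_j, with J given as {(i, j): +1 or -1}."""
    return -sum(j * spins[a] * spins[b] for (a, b), j in J.items())


def min_energy(J, n):
    """Ground state energy by exhaustive search over the 2**n spin configurations."""
    return min(ising_energy(J, s) for s in product((1, -1), repeat=n))


SQUARE = [(0, 1), (1, 2), (2, 3), (3, 0)]
FERRO = {edge: 1 for edge in SQUARE}
FRUSTRATED = {**FERRO, (3, 0): -1}     # one anti-ferromagnetic edge

if __name__ == "__main__":
    print(min_energy(FERRO, 4))        # every edge can be satisfied
    print(min_energy(FRUSTRATED, 4))   # at least one edge must stay excited
```

With an odd number of $J = -1$ edges around the loop, no configuration satisfies all four edges, so the minimum rises from $-4$ to $-2$.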
For purposes of visualization, it is convenient to represent spins by lattice cells, i.e.
by squares on the 2D square lattice or by cubes in the 3D cubic lattice. There is a
$-J_{ij} Z_i Z_j$ term coupling neighboring spins associated with each edge where two squares
meet in 2D, or with each face where two cubes meet in 3D.
Consider a site in 2D where 4 edges meet. If one of these edges is antiferromagnetic and
the other three are ferromagnetic, then there must be an odd number of excited edges
meeting at this site. More generally, if the number of antiferromagnetic edges is odd, then
there must be an odd number of excited edges and if the number of antiferromagnetic
edges is even, then there must be an even number of excited edges. If the number of
J = −1 edges meeting at a site is odd, we say that there is an Ising vortex at that site.
For any spin configuration, there are domain walls of excited edges, where the walls end
on Ising vortices.
Minimizing the energy, then, is equivalent in 2D to finding the minimum “chain” of
excited edges with boundary points at the positions of the Ising vortices. There is a poly-
time classical algorithm that finds the minimal chain. In 3D, there is an Ising vortex
on an edge if there are an odd number of J = −1 faces that meet at that edge. These
vortices form closed loops, and each spin configuration has a domain wall of excited faces
bounded by the vortex loops. To minimize the energy, then, we find the minimum area
surface with a specified 1D boundary; this problem is NP-hard. Finding the minimum
energy configuration is hard because there are many ways for the domain walls to be
“pinned” — stuck at local minima in the energy, such that many spins need to flip at
once to find a lower energy configuration. Local searching for the global energy minimum
fails.
In fact, there are also NP-hard spin glass problems in 2D, if we introduce local
magnetic field terms in $H$ as well as antiferromagnetic terms. For example, on a square
lattice consider
$$H = -\sum_{\langle ij\rangle} J_{ij} Z_i Z_j - \sum_i h_i Z_i \tag{6.238}$$
where $J_{ij} \in \{1, 0, -1\}$ and $h_i \in \{1, 0, -1\}$. The local magnetic field $\{h_i\}$ compounds
the frustration: Each spin wants to align with the local field, but by doing so the edge
connecting the spins might become excited.
So we see that minimizing the energy of a geometrically 2-local classical Hamiltonian
can be NP-hard because of frustration: there is no way to satisfy all the clauses and
there are many local minima of the energy that are not global minima. In the quantum
local Hamiltonian problem, we have $H = \sum_a H_a$ where the $\{H_a\}$ might not be mutually
commuting; hence we might expect the problem to be even harder. That the $H_a$’s
cannot be simultaneously diagonalized compounds the frustration even further. Indeed
the ground state could be highly entangled, with no succinct classical description. Let’s
try to characterize the hardness of the quantum problem.
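exists a poly-size uniform quantum circuit family (the verifier $V$) and a single-qubit
measurement $\{\Pi_0, \Pi_1\}$ such that:
1. If $x \in L$ there is a witness $|\psi_x\rangle$ such that:
$$\langle\Psi|\Pi_1|\Psi\rangle \ge 2/3 \quad\text{where}\quad |\Psi\rangle = V\left(|\psi_x\rangle \otimes |x\rangle \otimes |0\rangle^{*}\right). \tag{6.240}$$
2. If $x \notin L$, then $\langle\Psi|\Pi_1|\Psi\rangle \le 1/3$ for all $|\psi_x\rangle$.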
We claim: the k-local Hamiltonian problem is QMA-complete for k ≥ 5 (famously
proved by Kitaev: see Chap. 14 of KSV). The result can be improved to geometrically
2-local H in 2D (for qubits) or geometrically 2-local H in 1D (for higher dimensional
qudits, with the local dimension d sufficiently large).
To verify this claim, we need to show:
1. the k-local Hamiltonian problem is in QMA.
2. Any problem in QMA is reducible to the k-local Hamiltonian problem. We’ll show
this for k = 5 and without geometrical constraints, as in Kitaev’s original discovery.
We have already shown part (1). We have seen that if the ground state $|\psi_0\rangle$ is provided as
a witness, then we can compute $E_0$ to 1/poly(n) accuracy using a circuit of size poly(n)
that performs the phase estimation algorithm. But how to achieve the reduction (2)?
For the reduction, we’ll follow the strategy used to show that 3-SAT is NP-complete:
For any problem in QMA, we’ll construct a witness that encodes the whole history of
the computation performed by the verifier, and a Hamiltonian H such that computing
the ground state energy of H amounts to checking that each step in the computation is
valid.
For a given problem in QMA, suppose that the verifier circuit $V_T$ has $T$ gates
$U_1, U_2, \dots, U_T$ chosen from a universal set, where each $U_t$ acts on at most two qubits. For
the corresponding k-local Hamiltonian problem, we’ll suppose that Merlin provides the
history state encoding the computation performed by the verifier:
$$|\eta\rangle = \frac{1}{\sqrt{T+1}} \sum_{t=0}^{T} |\psi(t)\rangle \otimes |t\rangle, \tag{6.241}$$
where $|\psi(t)\rangle = (U_t U_{t-1} \cdots U_1)|\psi_{\text{initial}}\rangle$; here $|\psi_{\text{initial}}\rangle = |x\rangle \otimes |0\rangle^{*} \otimes |\psi_x\rangle$, where $|x\rangle$
specifies the instance of the QMA problem, $|0\rangle^{*}$ is the initial state of scratch space used
by the verifier, $|\psi_x\rangle$ is the quantum witness testifying that $x$ should be accepted, and
$|\psi(t)\rangle$ is the state of the quantum computer obtained after the first $t$ steps of the verifier
circuit have been executed. The state $|t\rangle$ is the state of a clock register that records the
time $t \in \{0, 1, \dots, T\}$; since $\langle t|s\rangle = \delta_{ts}$, the $T+1$ states appearing in superposition in
state $|\eta\rangle$ are mutually orthogonal, so $|\eta\rangle$ is properly normalized: $\langle\eta|\eta\rangle = 1$.
witness itself) are set to the right initial values. E.g. for each scratch qubit labeled $j$
that should be in the state $|0\rangle$ at time $t = 0$, we include in $H_{\text{in}}$ the term
$$H_{\text{in}}^{(j)} = (|1\rangle\langle 1|)^{(j)} \otimes I^{(\text{else})} \otimes (|0\rangle\langle 0|)^{(\text{clock})}. \tag{6.243}$$
There is an energy penalty of 1 if the scratch qubit is set to $|1\rangle$ rather than $|0\rangle$ as the
execution of the verifier circuit begins (time $t = 0$). Similar terms in $H_{\text{in}}$ enforce that the
input register is set to the desired value $x$. The purpose of $H_{\text{out}}$ is to impose a penalty
if the verifier circuit for the QMA problem fails to accept:
$$H_{\text{out}} = (|0\rangle\langle 0|)^{(\text{output})} \otimes I^{(\text{else})} \otimes (|T\rangle\langle T|)^{(\text{clock})}. \tag{6.244}$$
There is an energy cost of 1 if the output qubit has the value $|0\rangle$ rather than $|1\rangle$ after
the verifier circuit is fully executed (time $t = T$). The purpose of $H_{\text{clock}}$ is to impose a
penalty if the clock register is not properly encoded (we’ll return to this issue later).
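The purpose of $H_{\text{prop}}$ is to impose a penalty if the state $|\psi(t)\rangle$ does not have the
form $U_t|\psi(t-1)\rangle$; i.e., was not obtained by faithfully executing step $t$ of the verifier
circuit. Hence we write:
$$H_{\text{prop}} = \sum_{t=1}^{T} H_{\text{prop}}(t), \tag{6.245}$$
where
$$H_{\text{prop}}(t) = \frac{1}{2}\left( I \otimes |t\rangle\langle t| + I \otimes |t-1\rangle\langle t-1| - U_t \otimes |t\rangle\langle t-1| - U_t^\dagger \otimes |t-1\rangle\langle t| \right). \tag{6.246}$$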
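The action of $H_{\text{prop}}(t)$ on the relevant part of the valid history state $|\eta\rangle$ is
$$|\psi(t-1)\rangle \otimes |t-1\rangle \mapsto \frac{1}{2}\left( |\psi(t-1)\rangle \otimes |t-1\rangle - U_t|\psi(t-1)\rangle \otimes |t\rangle + \dots \right),$$
$$|\psi(t)\rangle \otimes |t\rangle \mapsto \frac{1}{2}\left( |\psi(t)\rangle \otimes |t\rangle - U_t^\dagger|\psi(t)\rangle \otimes |t-1\rangle + \dots \right). \tag{6.247}$$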
Acting on $|\eta\rangle$, the terms $I \otimes |t\rangle\langle t|$ and $-U_t \otimes |t\rangle\langle t-1|$ in $H_{\text{prop}}$ give canceling contributions.
Hence
$$H_{\text{prop}}|\eta\rangle = 0 \quad\text{if } |\eta\rangle \text{ is a valid history state.} \tag{6.248}$$
Therefore, a valid history state (where the state at time $t$ is obtained from the state at
time $t-1$ by applying the proper gate), such that the initial state is also valid, is a null
vector of both $H_{\text{prop}}$ and $H_{\text{in}}$. Furthermore, if the quantum verifier accepts the input
with probability $1 - \varepsilon$, then
$$E_0 \le \langle\eta|H_{\text{out}}|\eta\rangle = \frac{\varepsilon}{T+1}. \tag{6.249}$$
(If the history is valid, the only term in the Hamiltonian that makes a nonzero
contribution to $\langle H\rangle$ is the term that penalizes an incorrect final readout.) Note also that it
is possible to amplify the success probability by repeating the verification of multiple
copies of the witness. Actually, the amplification is a little bit subtle: Merlin might try
to fool Arthur: Instead of sending a product state $|\psi_x\rangle^{\otimes m}$ ($m$ copies of the witness) he
might send an entangled state instead. But the amplification still works even in that
case. Each copy sent by Merlin may be a mixed state (obtained from the partial trace
over the other copies), a mixture of a state the verifier accepts with probability ≥ 2/3
6.10 QMA-completeness of the local Hamiltonian problem 53
and of another state that it might reject. But Merlin cannot fool Arthur into accepting
after many trials unless there is some state occurring in the ensemble of pure states that
is accepted with high probability in each trial.
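The null-vector property (6.248) and the value of $\langle\eta|H_{\text{out}}|\eta\rangle$ in (6.249) can be checked numerically on a toy case. The sketch below assumes a single system qubit that also serves as the output qubit, no witness or scratch registers, a clock stored as a $(T+1)$-dimensional register, and a three-gate "verifier" H, S, H chosen only for illustration.

```python
import numpy as np


def history_state(gates, psi0):
    """|eta> = (T+1)^(-1/2) * sum_t |psi(t)> (x) |t>, ordered as system (x) clock."""
    T = len(gates)
    states = [psi0]
    for U in gates:
        states.append(U @ states[-1])
    clock = np.eye(T + 1)
    eta = sum(np.kron(psi, clock[t]) for t, psi in enumerate(states))
    return eta / np.sqrt(T + 1)


def h_prop(gates):
    """Sum over t of the propagation terms H_prop(t) of (6.246)."""
    T = len(gates)
    dim = gates[0].shape[0]
    clock = np.eye(T + 1)
    I = np.eye(dim)
    H = np.zeros((dim * (T + 1), dim * (T + 1)), dtype=complex)
    for t, U in enumerate(gates, start=1):
        now, before = clock[t], clock[t - 1]
        H += 0.5 * (np.kron(I, np.outer(now, now))
                    + np.kron(I, np.outer(before, before))
                    - np.kron(U, np.outer(now, before))
                    - np.kron(U.conj().T, np.outer(before, now)))
    return H


def demo():
    """Return (norm of H_prop|eta>, <eta|H_out|eta>, T) for the toy circuit."""
    hadamard = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
    s_gate = np.diag([1, 1j])
    gates = [hadamard, s_gate, hadamard]
    T = len(gates)
    eta = history_state(gates, np.array([1, 0], dtype=complex))
    clock_T = np.eye(T + 1)[T]
    # Penalize output qubit |0> at clock time T.
    h_out = np.kron(np.diag([1.0, 0.0]), np.outer(clock_T, clock_T))
    return (np.linalg.norm(h_prop(gates) @ eta),
            float(np.vdot(eta, h_out @ eta).real),
            T)


if __name__ == "__main__":
    print(demo())
```

For this circuit the final state has probability 1/2 of reading 0, so the expectation of the output penalty is $\varepsilon/(T+1)$ with $\varepsilon = 1/2$ and $T = 3$.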
So now we have seen that for a problem in QMA such that the verifier accepts with
high probability, the corresponding Hamiltonian $H$ has an eigenvector with eigenvalue
close to zero. There are two things left to show:
• If the verifier rejects with high probability, then
$$E_0 \ge 1/\text{poly}(n) \tag{6.250}$$
(then we can choose the promise gap of size 1/poly(n), such that $E_0 \le E_{\text{low}}$ for a
YES answer and $E_0 > E_{\text{high}}$ for a NO answer)
• So far $H_{\text{prop}}$ is not local! It acts on the clock register, which is a $(T+1)$-dimensional
system, and $T = \text{poly}(n)$. We need to show we can encode the clock using qubits such
that $H_{\text{prop}} + H_{\text{clock}}$ is k-local.
We’ll come back to the issue of encoding the clock later. First let’s try to understand
better the spectrum of $H = H_{\text{in}} + H_{\text{out}} + H_{\text{prop}}$.
Diagonalizing Hprop
Let’s start by considering $H_{\text{prop}}$. We’ve seen that a valid history state is a null vector of
$H_{\text{prop}}$ (has eigenvalue zero). What are the other eigenspaces and eigenvalues of $H_{\text{prop}}$?
It’s easier to compute the spectrum of $H_{\text{prop}}$ by transforming to a rotating frame basis
that freezes the motion of the state $|\psi(t)\rangle$. That is, let
$$V_t = U_t U_{t-1} \cdots U_1 \tag{6.251}$$
(the unitary applied after the first $t$ steps of the circuit, with $V_0 = I$), and consider
$V = \sum_{t=0}^{T} V_t \otimes |t\rangle\langle t|$. Then the history state $|\eta\rangle = \frac{1}{\sqrt{T+1}} \sum_{t=0}^{T} V_t|\psi(0)\rangle \otimes |t\rangle$ is mapped by $V^\dagger$ to the
“history” state for the case where $|\psi(t)\rangle$ does not depend on $t$ at all:
$$V^\dagger|\eta\rangle = \frac{1}{\sqrt{T+1}} \sum_{t=0}^{T} V_t^\dagger V_t|\psi(0)\rangle \otimes |t\rangle = \frac{1}{\sqrt{T+1}} \sum_{t=0}^{T} |\psi(0)\rangle \otimes |t\rangle. \tag{6.252}$$
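where the subscript means the matrix acts on the space spanned by $\{|t-1\rangle, |t\rangle\}$. Because
of the overlaps, $H'_{\text{prop}}$ is actually $\begin{pmatrix} 1 & -\frac12 \\ -\frac12 & 1 \end{pmatrix}$ in each block, except in the spaces
spanned by $\{|0\rangle, |1\rangle\}$ and $\{|T-1\rangle, |T\rangle\}$, where it is
$$\begin{pmatrix} \frac12 & -\frac12 \\ -\frac12 & 1 \end{pmatrix}_{0,1} \quad\text{and}\quad \begin{pmatrix} 1 & -\frac12 \\ -\frac12 & \frac12 \end{pmatrix}_{T-1,T}. \tag{6.255}$$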
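That is, $H'_{\text{prop}}$ acts on the clock as a $(T+1) \times (T+1)$ matrix
$$H'_{\text{prop}} = \begin{pmatrix}
\frac12 & -\frac12 & 0 & \cdots & 0 \\
-\frac12 & 1 & -\frac12 & \ddots & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & \ddots & -\frac12 & 1 & -\frac12 \\
0 & \cdots & 0 & -\frac12 & \frac12
\end{pmatrix}; \tag{6.256}$$
its entries are $(-\frac12, -\frac12, \dots, -\frac12)$ just above and below the diagonal, $(\frac12, 1, 1, \dots, 1, \frac12)$ on the
diagonal, and 0 elsewhere.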
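We may express it as $H'_{\text{prop}} = I - \frac12 M$, where
$$M = \begin{pmatrix}
1 & 1 & 0 & \cdots & 0 \\
1 & 0 & 1 & \ddots & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & \ddots & 1 & 0 & 1 \\
0 & \cdots & 0 & 1 & 1
\end{pmatrix}. \tag{6.257}$$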
That is,
Thus the action on $|t\rangle$ is multiplication by $2\cos\omega$, which does not depend on the sign
of $\omega$. To construct an eigenstate of $M$, then, we may consider a linear combination of
$|\omega\rangle$ and $|-\omega\rangle$, where the value of $\omega$ is chosen so that $M$ acts properly on the states at
the boundary, i.e. on $|t = 0\rangle$ and $|t = T\rangle$.
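One way to satisfy the boundary conditions can be checked numerically. The sketch below assumes the standing-wave ansatz $v_t = \cos\big(\omega(t + \tfrac12)\big)$ with $\omega = \pi k/(T+1)$ for $k = 0, 1, \dots, T$; this specific form is an assumption introduced here for the check, and the code verifies that each such vector is an eigenvector of the tridiagonal clock matrix $I - \tfrac12 M$ with eigenvalue $1 - \cos\omega$.

```python
import math


def h_prop_clock(T):
    """(T+1) x (T+1) matrix with diagonal (1/2, 1, ..., 1, 1/2) and off-diagonals -1/2."""
    n = T + 1
    H = [[0.0] * n for _ in range(n)]
    for t in range(n):
        H[t][t] = 0.5 if t in (0, T) else 1.0
        if t + 1 < n:
            H[t][t + 1] = H[t + 1][t] = -0.5
    return H


def mode(T, k):
    """Frequency omega = pi*k/(T+1) and the candidate eigenvector cos(omega*(t + 1/2))."""
    w = math.pi * k / (T + 1)
    return w, [math.cos(w * (t + 0.5)) for t in range(T + 1)]


def residual(T, k):
    """Largest entry of |H v - (1 - cos omega) v|; zero up to rounding for an eigenvector."""
    H = h_prop_clock(T)
    w, v = mode(T, k)
    lam = 1 - math.cos(w)
    Hv = [sum(H[t][s] * v[s] for s in range(T + 1)) for t in range(T + 1)]
    return max(abs(a - lam * b) for a, b in zip(Hv, v))


if __name__ == "__main__":
    T = 6
    for k in range(T + 1):
        w, _ = mode(T, k)
        print(k, 1 - math.cos(w), residual(T, k))
```

Under this ansatz the smallest nonzero eigenvalue is $1 - \cos\frac{\pi}{T+1} = 2\sin^2\frac{\pi}{2(T+1)}$, which scales like $1/T^2$.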
there is a simultaneous null eigenvector of $H_{\text{prop}}$ and of $H_{\text{in}} + H_{\text{out}}$. That is, there is a valid
history with a valid input where the output is accepted. But if the acceptance probability
is small, that means the angle between these two null spaces cannot be too small, since
there is no vector that comes close to lying within both null spaces. We can relate this
angle to the ground state energy of the full Hamiltonian $H = H_{\text{in}} + H_{\text{out}} + H_{\text{prop}}$.
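In general, suppose that $H_1$ and $H_2$ are two Hermitian operators, each with lowest
eigenvalue zero, and eigenvalue gap at least $\Delta$. Then
$$H_1 \ge \Delta(I - \Pi_1) \quad\text{where } \Pi_1 \text{ is the projection onto the null space of } H_1, \tag{6.272}$$
$$H_2 \ge \Delta(I - \Pi_2) \quad\text{where } \Pi_2 \text{ is the projection onto the null space of } H_2. \tag{6.273}$$
Thus $H_1 + H_2 \ge \Delta(2I - \Pi_1 - \Pi_2)$ and $\langle H_1 + H_2\rangle \ge 2\Delta - \Delta\langle\Pi_1 + \Pi_2\rangle$. Now suppose
$|\psi_1\rangle$ and $|\psi_2\rangle$ are two vectors such that $|\langle\psi_1|\psi_2\rangle| = \cos\theta$ for $0 \le \theta \le \pi/2$. With suitable
phase conventions we may choose a basis in the two-dimensional space spanned by $|\psi_1\rangle$
and $|\psi_2\rangle$ such that
$$|\psi_1\rangle = \begin{pmatrix} \cos\theta/2 \\ \sin\theta/2 \end{pmatrix}, \quad |\psi_2\rangle = \begin{pmatrix} \cos\theta/2 \\ -\sin\theta/2 \end{pmatrix} \implies \tag{6.274}$$
$$|\psi_1\rangle\langle\psi_1| + |\psi_2\rangle\langle\psi_2| = \begin{pmatrix} 2\cos^2\theta/2 & 0 \\ 0 & 2\sin^2\theta/2 \end{pmatrix}, \tag{6.275}$$
and therefore, in any state
$$\big\langle\, |\psi_1\rangle\langle\psi_1| + |\psi_2\rangle\langle\psi_2| \,\big\rangle \le 2\cos^2\theta/2 = 1 + \cos\theta. \tag{6.276}$$
It follows that, if $\Pi_1$ and $\Pi_2$ are projectors, where the maximum overlap between spaces
projected by $\Pi_1$ and $\Pi_2$ is $|\langle\psi_1|\psi_2\rangle| = \cos\theta$, then $\langle\Pi_1 + \Pi_2\rangle \le 1 + \cos\theta$. Thus
$$\langle H_1 + H_2\rangle \ge 2\Delta - \Delta\langle\Pi_1 + \Pi_2\rangle \tag{6.277}$$
$$\ge \Delta(1 - \cos\theta) = 2\Delta\sin^2\theta/2. \tag{6.278}$$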
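The bound $\langle\Pi_1 + \Pi_2\rangle \le 1 + \cos\theta$ can be checked for two rank-one projectors with a short power iteration. The particular vectors and the function name are assumptions chosen for this illustration; the iteration converges here because the starting vector overlaps the top eigenvector and the two nonzero eigenvalues $1 \pm \cos\theta$ are distinct.

```python
import math


def top_eigenvalue(u, v, iterations=200):
    """Largest eigenvalue of |u><u| + |v><v| for real unit vectors, by power iteration."""
    x = [1.0] * len(u)
    lam = 0.0
    for _ in range(iterations):
        cu = sum(a * b for a, b in zip(u, x))      # <u|x>
        cv = sum(a * b for a, b in zip(v, x))      # <v|x>
        y = [cu * a + cv * b for a, b in zip(u, v)]
        lam = math.sqrt(sum(c * c for c in y))     # norm of (Pi_1 + Pi_2) x
        x = [c / lam for c in y]
    return lam


U_VEC = [1 / 3, 2 / 3, 2 / 3]
V_VEC = [2 / 3, 2 / 3, 1 / 3]
COS_THETA = sum(a * b for a, b in zip(U_VEC, V_VEC))   # overlap of the two unit vectors

if __name__ == "__main__":
    print(top_eigenvalue(U_VEC, V_VEC), 1 + COS_THETA)
```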
Having related the expectation value of a sum of nonnegative Hermitian operators to
the angle between their null spaces, we now need to estimate the angle between null
spaces of $H_1 = H_{\text{in}} + H_{\text{out}}$ and $H_2 = H_{\text{prop}}$. That is, we want to find
$$\cos^2\theta = \max |\langle\eta_1|\eta_2\rangle|^2 \tag{6.279}$$
$$= \max \langle\eta_2|\Pi_1|\eta_2\rangle \tag{6.280}$$
where we maximize over $\eta_1$ in the null space of $H_1$ and $\eta_2$ in the null space of $H_2$, and $\Pi_1$
is the projector onto the null space of $H_1$.
A vector in the null space of $H_2 = H_{\text{prop}}$ is a valid history state
$|\eta_2\rangle = \frac{1}{\sqrt{T+1}} \sum_{t=0}^{T} |\psi(t)\rangle \otimes |t\rangle$. The projector onto the null space of $H_1$ acts trivially on states
with $t \in \{1, 2, \dots, T-1\}$ so after transforming to the rotating frame
$$\langle\eta_2'|\Pi_1|\eta_2'\rangle = \frac{T-1}{T+1} + \frac{1}{T+1} \langle\tilde\eta_2|\Pi_{\text{in}} + \Pi'_{\text{out}}|\tilde\eta_2\rangle, \tag{6.281}$$
where $\tilde\eta_2$ is a state of the non-clock variables, $\Pi_{\text{in}}$ projects onto a valid input state, and
$\Pi'_{\text{out}}$ projects onto
$$V_T^\dagger \left( |1\rangle_{\text{out}} \otimes (\text{anything else}) \right) \tag{6.282}$$
(because we have transformed to the rotating frame, and the history state is valid). We
know that $\langle\tilde\eta_2|\Pi_{\text{in}} + \Pi'_{\text{out}}|\tilde\eta_2\rangle \le (1 + \cos\phi)$ where $\phi$ is the angle between the spaces that
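$\Pi_{\text{in}}$ and $\Pi'_{\text{out}}$ project onto, as we showed above. This angle $\phi$ is given by $\cos^2\phi = \varepsilon$ where
$\varepsilon$ is the maximum acceptance probability; that is, if the input is valid (in the support of
$\Pi_{\text{in}}$) and the history is valid, then the probability of $|1\rangle_{\text{out}}$ is at most $\varepsilon$.
Via eq. (6.281), this upper bound $\langle\tilde\eta_2|\Pi_{\text{in}} + \Pi'_{\text{out}}|\tilde\eta_2\rangle \le 1 + \sqrt{\varepsilon}$ provides an upper bound
on the expectation value of $\Pi_1$ in a valid history state, and hence a lower bound on
the angle $\theta$ between the $H_1$ and $H_2$ null eigenspaces:
$$\cos^2\theta \le \frac{T-1}{T+1} + \frac{1}{T+1}\left(1 + \sqrt{\varepsilon}\right) = 1 - \frac{1 - \sqrt{\varepsilon}}{T+1}, \tag{6.283}$$
where $\varepsilon$ is the maximum acceptance probability. Now, $\Delta = 2\sin^2\frac{\pi}{2(T+1)}$ is a lower
bound for the gap of $H_1$ or $H_2$ and $\langle H_1 + H_2\rangle \ge 2\Delta\sin^2\frac{\theta}{2}$, where $\sin^2\theta = 1 - \cos^2\theta \ge \frac{1 - \sqrt{\varepsilon}}{T+1}$.
Using $\sin^2\frac{\theta}{2} = \frac{\sin^2\theta}{4\cos^2\frac{\theta}{2}} \ge \frac14 \sin^2\theta$, we have
$$E_0 \ge 4\sin^2\frac{\pi}{2(T+1)} \times \frac14\, \frac{1 - \sqrt{\varepsilon}}{T+1} \ge \sin^2\frac{\pi}{2(T+1)} \times \frac{1 - \sqrt{\varepsilon}}{T+1} \ge \text{constant} \times \frac{1 - \sqrt{\varepsilon}}{(T+1)^3} = \frac{1 - \sqrt{\varepsilon}}{\text{poly}(n)}. \tag{6.284}$$
To summarize, we have shown that if the acceptance probability is $\ge 1 - \varepsilon$, then $E_0 \le \frac{\varepsilon}{T+1}$,
and if the acceptance probability is $\le \varepsilon$, then $E_0 \ge \text{constant} \times (1 - \sqrt{\varepsilon})\frac{1}{(T+1)^3}$. With suitable
amplification to make $\varepsilon$ small (compared to $1/(T+1)^3$) in the case where the answer
is YES, we have reduced the QMA problem to an instance of the local Hamiltonian
problem.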
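The two bounds can be compared numerically. In the sketch below the function names and the use of separate error parameters for the YES and NO cases are assumptions for illustration. The final check uses the elementary inequality $\sin x \ge 2x/\pi$ on $[0, \pi/2]$, which gives $\sin^2\frac{\pi}{2(T+1)} \ge \frac{1}{(T+1)^2}$, so the constant in the last step can be taken to be 1.

```python
import math


def yes_upper_bound(eps_yes: float, T: int) -> float:
    """E_0 <= eps/(T+1) when the verifier accepts with probability >= 1 - eps."""
    return eps_yes / (T + 1)


def no_lower_bound(eps_no: float, T: int) -> float:
    """E_0 >= sin^2(pi / (2(T+1))) * (1 - sqrt(eps)) / (T+1) when acceptance is <= eps."""
    return math.sin(math.pi / (2 * (T + 1))) ** 2 * (1 - math.sqrt(eps_no)) / (T + 1)


def cubic_lower_bound(eps_no: float, T: int) -> float:
    """Weaker but simpler form (1 - sqrt(eps)) / (T+1)^3."""
    return (1 - math.sqrt(eps_no)) / (T + 1) ** 3


if __name__ == "__main__":
    T = 100
    print(yes_upper_bound(1e-6, T))      # YES case after amplification
    print(no_lower_bound(1 / 3, T))      # NO case
    print(cubic_lower_bound(1 / 3, T))
```

For these sample values the YES bound lies below the NO bound, so a promise gap separates the two cases.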
$$|t = 0\rangle = |000\ldots0\rangle,$$
$$|t = 1\rangle = |100\ldots0\rangle,$$
$$|t = 2\rangle = |110\ldots0\rangle,$$
$$\text{etc.} \tag{6.285}$$
We can add a term to the Hamiltonian that imposes an energy penalty on the clock if
its state is not validly encoded. The encoding is valid if a 0 is never followed by a 1.
Therefore we choose
$$H_{\text{clock}} = \sum_{i=1}^{T-1} (|01\rangle\langle 01|)_{i,i+1}, \tag{6.286}$$
so that the only null vectors of $H_{\text{clock}}$ are valid clock states.
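The unary clock encoding and the penalty counted by $H_{\text{clock}}$ on basis strings can be sketched as follows; the function names are hypothetical and the bit strings stand for computational basis states of the $T$ clock qubits.

```python
from itertools import product


def encode_clock(t: int, T: int) -> str:
    """Unary encoding |t> = |1...1 0...0> with t ones followed by T - t zeros."""
    if not 0 <= t <= T:
        raise ValueError("t must lie in 0..T")
    return "1" * t + "0" * (T - t)


def clock_penalty(bits: str) -> int:
    """Value of H_clock on a basis string: the number of adjacent '01' pairs."""
    return sum(1 for a, b in zip(bits, bits[1:]) if a + b == "01")


def decode_clock(bits: str) -> int:
    """Recover t from a validly encoded string (position of the 1/0 domain wall)."""
    if clock_penalty(bits) != 0:
        raise ValueError("not a valid clock state")
    return bits.count("1")


if __name__ == "__main__":
    T = 5
    valid = ["".join(b) for b in product("01", repeat=T)
             if clock_penalty("".join(b)) == 0]
    print(valid)   # exactly the T + 1 strings encode_clock(0..T, T)
```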
The term $H_{\text{prop}}$ in the Hamiltonian contains operators $|t\rangle\langle t|$, $|t\rangle\langle t-1|$, $|t-1\rangle\langle t|$ acting
on the clock register. If only validly encoded clock states are allowed, we can project
onto the state $|t\rangle$ by acting on just two neighboring qubits:
The operators that advance or retard the time act on three adjacent qubits:
$$|t\rangle\langle t-1| = (|110\rangle\langle 100|)_{t-1,t,t+1},$$
$$|t-1\rangle\langle t| = (|100\rangle\langle 110|)_{t-1,t,t+1} \tag{6.288}$$
(for $1 \le t \le T-1$). Acting on three qubits suffices to locate the position of the “domain
wall” between the 0’s and 1’s in the bit string, and to move the wall one position to the
left or to the right; for $t = 0$ action on just two qubits suffices to move the wall one step
to the right, and for $t = T$ action on two qubits suffices to move the wall one step to the
left. Since these clock terms act on at most three qubits, the term $U_t \otimes |t\rangle\langle t-1|$ acts on
at most 5 qubits, if $U_t$ is a 2-qubit gate. Thus with this clock encoding, the Hamiltonian
$$H = H_{\text{in}} + H_{\text{out}} + H_{\text{prop}} + H_{\text{clock}} \tag{6.289}$$
is 5-local. Because the clock Hamiltonian annihilates validly encoded clock states, it
trivially commutes with the rest of the terms in the Hamiltonian $H$, and has an energy
gap of 1. Therefore, our previous analysis of the ground state energy of $H$ still stands.
This completes the demonstration that any QMA problem is reducible to the problem
of estimating the ground state energy (with a 1/poly(n) promise gap) of a 5-local
Hamiltonian.
Comments
Thus we have shown that the 5-local Hamiltonian problem is a “natural” QMA-complete
problem, much as 3-SAT is a natural NP-complete problem. But while in the classical
case many practical problems have been shown to be NP-complete, the family of problems
that have been shown to be QMA-complete is still rather small, and the problems
seem relatively “artificial.” In any case, it is interesting to see that quantum local
Hamiltonian problems seem to be harder than classical ones (if QMA ≠ NP).
I won’t discuss the tricks for reducing the QMA-complete problem to k = 2 (which
involves clever use of perturbation theory) or for making H geometrically local (which
involves encoding the clock more cleverly, among other things).
Another interesting direction to pursue using these ideas is to show that any problem
in BQP can be solved using adiabatic quantum computing. The idea is to replace
$$H_{\text{prop}} \to H_{\text{prop}}(s) = (1 - s)H_{\text{clock-init}} + sH_{\text{prop}} \tag{6.290}$$
where the null space of $H_{\text{clock-init}}$ fixes the clock at $|t = 0\rangle$ and $s$ varies in $[0, 1]$. Then
the ground state of $H(s = 0)$ is easy to construct, and the ground state of $H(s = 1)$ is
the valid history state. The eigenvalue gap of $H(s)$ stays $> 1/\text{poly}(n)$ for $s \in [0, 1]$, so
the history state can be prepared in polynomial time by adiabatically varying $s$. Once
we have prepared the history state, we can measure the clock, projecting out $|t = T\rangle$
with probability $1/(T + 1) = 1/\text{poly}(n)$. And once we have $|\psi(T)\rangle$ we can measure the
output qubit to find out if the circuit accepts. At a modest additional cost, we can
substantially improve the probability of preparing $|\psi(T)\rangle$, by adding a long “runway”
after time $t = T$, so that the state of the computer remains fixed for $S$ additional time
steps; then measuring the clock prepares $|\psi(T)\rangle$ with probability $(S + 1)/(S + T + 1)$.
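The effect of the runway is simple arithmetic, sketched here with exact fractions; the function name and the sample values of $T$ and $S$ are illustrative assumptions.

```python
from fractions import Fraction


def final_state_probability(T: int, S: int = 0) -> Fraction:
    """Probability that a clock measurement on the history state yields |psi(T)>.

    With a runway of S idle steps after t = T, the S + 1 clock values T..T+S
    all carry |psi(T)>, out of S + T + 1 clock values in total.
    """
    return Fraction(S + 1, S + T + 1)


if __name__ == "__main__":
    T = 99
    for S in (0, 99, 900):
        print(S, final_state_probability(T, S))
```

A runway several times longer than the circuit already makes the success probability close to 1.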
Chapter 7
2 CHAPTER 7. QUANTUM ERROR CORRECTION
damage (if the noise is not too severe). Then in Chapter 8, we will extend the
theory in two important ways. We will see that the recovery procedure can
work effectively even if occasional errors occur during recovery. And we will
learn how to process encoded information, so that a quantum computation
can be executed successfully despite the debilitating effects of decoherence
and faulty quantum gates.
A quantum error-correcting code (QECC) can be viewed as a mapping
of $k$ qubits (a Hilbert space of dimension $2^k$) into $n$ qubits (a Hilbert space
of dimension $2^n$), where $n > k$. The $k$ qubits are the “logical qubits” or
“encoded qubits” that we wish to protect from error. The additional $n - k$
qubits allow us to store the $k$ logical qubits in a redundant fashion, so that
the encoded information is not easily damaged.
We can better understand the concept of a QECC by revisiting an example
that was introduced in Chapter 1, Shor’s code with $n = 9$ and $k = 1$.
We can characterize the code by specifying two basis states for the code subspace;
we will refer to these basis states as $|\bar 0\rangle$, the “logical zero” and $|\bar 1\rangle$, the
“logical one.” They are
$$|\bar 0\rangle = \left[ \frac{1}{\sqrt 2}\left(|000\rangle + |111\rangle\right) \right]^{\otimes 3},$$
$$|\bar 1\rangle = \left[ \frac{1}{\sqrt 2}\left(|000\rangle - |111\rangle\right) \right]^{\otimes 3}; \tag{7.1}$$
each basis state is a 3-qubit cat state, repeated three times. As you will
recall from the discussion of cat states in Chapter 4, the two basis states
can be distinguished by the 3-qubit observable σ (1) (2)
x ⊗ σx ⊗ σx
(3)
(where
(i)
σ x denotes the Pauli matrix σ x acting on the ith qubit); we will use the
notation X 1 X 2 X 3 for this operator. (There is an implicit I ⊗ I ⊗ · · · ⊗ I
acting on the remaining qubits that is suppressed in this notation.) The
states |0̄i and |1̄i are eigenstates of X 1 X 2 X 3 with eigenvalues +1 and −1
respectively. But there is no way to distinguish |0̄i from |1̄i (to gather any
information about the value of the logical qubit) by observing any one or two
of the qubits in the block of nine. In this sense, the logical qubit is encoded
nonlocally; it is written in the nature of the entanglement among the qubits
in the block. This nonlocal property of the encoded information provides
protection against noise, if we assume that the noise is local (that it acts
independently, or nearly so, on the different qubits in the block).
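The properties just described can be checked numerically. The following is a minimal sketch in plain Python, assuming a dictionary-of-amplitudes representation of the nine-qubit state; the helper names (`logical_state`, `apply_x`, `reduced_density`) are illustrative and not part of the text. It confirms that the two codewords are orthogonal, that they are eigenstates of $X_1X_2X_3$ with eigenvalues $+1$ and $-1$, and that the reduced density matrix of any pair of qubits is the same for both codewords.

```python
from itertools import product
from math import sqrt

N_QUBITS = 9

def logical_state(sign):
    """Threefold tensor product of the cat state (ket000 + sign*ket111)/sqrt(2).

    Returns a dict mapping 9-bit tuples to real amplitudes;
    sign=+1 gives the logical zero, sign=-1 the logical one.
    """
    block = {(0, 0, 0): 1 / sqrt(2), (1, 1, 1): sign / sqrt(2)}
    state = {}
    for b1, b2, b3 in product(block, repeat=3):
        state[b1 + b2 + b3] = block[b1] * block[b2] * block[b3]
    return state

def apply_x(state, qubits):
    """Apply X to the listed qubit positions (0-based) by flipping those bits."""
    out = {}
    for bits, amp in state.items():
        flipped = tuple(b ^ 1 if i in qubits else b for i, b in enumerate(bits))
        out[flipped] = out.get(flipped, 0.0) + amp
    return out

def inner(a, b):
    """Inner product of two real-amplitude states."""
    return sum(amp * b.get(bits, 0.0) for bits, amp in a.items())

def reduced_density(state, keep):
    """Reduced density matrix on the qubits in `keep`, as {(row, col): value}."""
    rho = {}
    for bits_a, amp_a in state.items():
        for bits_b, amp_b in state.items():
            # Only terms whose traced-out qubits agree contribute to the partial trace.
            if all(bits_a[i] == bits_b[i] for i in range(N_QUBITS) if i not in keep):
                row = tuple(bits_a[i] for i in keep)
                col = tuple(bits_b[i] for i in keep)
                rho[(row, col)] = rho.get((row, col), 0.0) + amp_a * amp_b
    return rho

zero_bar = logical_state(+1)
one_bar = logical_state(-1)
```

Comparing `reduced_density(zero_bar, (0, 3))` with `reduced_density(one_bar, (0, 3))` (or any other pair of positions) gives identical matrices, which is the sense in which the logical qubit is stored nonlocally.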
Suppose that an unknown quantum state has been prepared and encoded
as $a|\bar{0}\rangle + b|\bar{1}\rangle$. Now an error occurs; we are to diagnose the error and reverse
7.1. A QUANTUM ERROR-CORRECTING CODE 3
it. How do we proceed? Let us suppose, to begin with, that a single bit flip
occurs acting on one of the first three qubits. Then, as discussed in Chapter
1, the location of the bit flip can be determined by measuring the two-qubit
operators
$$Z_1Z_2, \quad Z_2Z_3. \tag{7.2}$$
The logical basis states $|\bar{0}\rangle$ and $|\bar{1}\rangle$ are eigenstates of these operators with
eigenvalue 1. But flipping any of the three qubits changes these eigenvalues.
For example, if $Z_1Z_2 = -1$ and $Z_2Z_3 = 1$, then we infer that the first
qubit has flipped relative to the other two. We may recover from the error
by flipping that qubit back.
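The diagnosis can be written as a lookup table from the pair of eigenvalues to the qubit that should be flipped back. This is a classical sketch acting on 3-bit strings (computational basis states of one cluster); the names `syndrome`, `RECOVERY`, and `recover` are illustrative assumptions.

```python
def syndrome(bits):
    """Return the eigenvalues (Z1Z2, Z2Z3), each +1 or -1, for a 3-bit string."""
    z12 = 1 if bits[0] == bits[1] else -1
    z23 = 1 if bits[1] == bits[2] else -1
    return z12, z23

# Qubit (0-based) to flip back for each syndrome; None means no flip is needed.
RECOVERY = {(1, 1): None, (-1, 1): 0, (-1, -1): 1, (1, -1): 2}

def recover(bits):
    """Flip back the single qubit singled out by the syndrome."""
    target = RECOVERY[syndrome(bits)]
    bits = list(bits)
    if target is not None:
        bits[target] ^= 1
    return tuple(bits)
```

A single flip is always undone, while two flips in the same cluster are misdiagnosed: `recover((1, 1, 0))` returns `(1, 1, 1)` even if the cluster started as `(0, 0, 0)`.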
It is crucial that our measurement to diagnose the bit flip is a collective
measurement on two qubits at once: we learn the value of $Z_1Z_2$, but we
must not find out about the separate values of $Z_1$ and $Z_2$, for to do so
would damage the encoded state. How can such a collective measurement
be performed? In fact we can carry out collective measurements if we have
a quantum computer that can execute controlled-NOT gates. We first introduce
an additional “ancilla” qubit prepared in the state $|0\rangle$, then execute the
quantum circuit
– Figure –
and finally measure the ancilla qubit. If the qubits 1 and 2 are in a state
with $Z_1Z_2 = -1$ (either $|0\rangle_1|1\rangle_2$ or $|1\rangle_1|0\rangle_2$), then the ancilla qubit will flip
once and the measurement outcome will be $|1\rangle$. But if qubits 1 and 2 are
in a state with $Z_1Z_2 = 1$ (either $|0\rangle_1|0\rangle_2$ or $|1\rangle_1|1\rangle_2$), then the ancilla qubit
will flip either twice or not at all, and the measurement outcome will be $|0\rangle$.
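A small simulation illustrates why this measurement is collective. The sketch below assumes the circuit consists of two controlled-NOT gates, one from each data qubit into the ancilla (the figure is not reproduced here, so this layout is inferred from the description of the ancilla flipping once or twice). States are dictionaries from bit tuples to amplitudes.

```python
def cnot(state, control, target):
    """Apply a controlled-NOT to a state given as {bit tuple: amplitude}."""
    out = {}
    for bits, amp in state.items():
        new = list(bits)
        new[target] ^= new[control]
        out[tuple(new)] = out.get(tuple(new), 0) + amp
    return out

def extract_parity(data_state):
    """Append an ancilla in state 0, then CNOT qubit 1 and qubit 2 into it."""
    joint = {bits + (0,): amp for bits, amp in data_state.items()}
    joint = cnot(joint, 0, 2)   # qubit 1 controls the ancilla
    joint = cnot(joint, 1, 2)   # qubit 2 controls the ancilla
    return joint

# A superposition with Z1Z2 = +1: every branch leaves the ancilla at 0,
# so measuring the ancilla does not disturb the amplitudes 0.6 and 0.8.
even = extract_parity({(0, 0): 0.6, (1, 1): 0.8})
# A superposition with Z1Z2 = -1: every branch sets the ancilla to 1.
odd = extract_parity({(1, 0): 0.6, (0, 1): 0.8})
```

Because the ancilla takes the same value in every branch of the superposition, reading it reveals the parity without revealing $Z_1$ or $Z_2$ separately.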
Similarly, the two-qubit operators
$$\begin{aligned} &Z_4Z_5, \quad Z_7Z_8, \\ &Z_5Z_6, \quad Z_8Z_9, \end{aligned} \tag{7.3}$$
can be measured to diagnose bit flip errors in the other two clusters of three
qubits.
A three-qubit code would suffice to protect against a single bit flip. The
reason the 3-qubit clusters are repeated three times is to protect against
occurs acting on one of the nine qubits. We can diagnose in which cluster
the phase error occurred by measuring the two six-qubit observables
$$\begin{aligned} &X_1X_2X_3X_4X_5X_6, \\ &X_4X_5X_6X_7X_8X_9. \end{aligned} \tag{7.5}$$
The logical basis states $|\bar{0}\rangle$ and $|\bar{1}\rangle$ are both eigenstates with eigenvalue one
of these observables. A phase error acting on any one of the qubits in a
particular cluster will change the value of $XXX$ in that cluster relative to
the other two; the location of the change can be identified by measuring the
observables in eq. (7.5). Once the affected cluster is identified, we can reverse
the error by applying $Z$ to one of the qubits in that cluster.
How do we measure the six-qubit observable $X_1X_2X_3X_4X_5X_6$? Notice
that if its control qubit is initially in the state $\tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$, and its target is
an eigenstate of $X$ (that is, NOT), then a controlled-NOT acts according to
$$\mathrm{CNOT}: \tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle) \otimes |x\rangle \to \tfrac{1}{\sqrt{2}}(|0\rangle + (-1)^x|1\rangle) \otimes |x\rangle; \tag{7.6}$$
– Figure –
and then measure the ancilla in the $\tfrac{1}{\sqrt{2}}(|0\rangle \pm |1\rangle)$ basis.
We see that a single error acting on any one of the nine qubits in the block
will cause no irrevocable damage. But if two bit flips occur in a single cluster
of three qubits, then the encoded information will be damaged. For example,
if the first two qubits in a cluster both flip, we will misdiagnose the error and
attempt to recover by flipping the third. In all, the errors, together with our
7.2. CRITERIA FOR QUANTUM ERROR CORRECTION 5
The encoded information will also be damaged if phase errors occur in two
different clusters. Then we will introduce a phase error into the third cluster
in our misguided attempt at recovery, so that altogether $Z_1Z_4Z_7$ will have
been applied, which flips the encoded qubit:
3 that there is no loss of generality (we may still represent the most general
superoperator acting on our qubit) if we assume that the initial state
of the environment is a pure state, which we will denote as $|0\rangle_E$. Then the
evolution of the qubit and its environment can be described by a unitary
transformation
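here the four $|e_{ij}\rangle_E$ are states of the environment that need not be normalized
or mutually orthogonal (though they do satisfy some constraints that follow
from the unitarity of $U$). Under $U$, an arbitrary state $|\psi\rangle = a|0\rangle + b|1\rangle$ of
the qubit evolves as
$$\begin{aligned}
{} = {} & (a|0\rangle + b|1\rangle) \otimes \tfrac{1}{2}(|e_{00}\rangle_E + |e_{11}\rangle_E) \\
{} + {} & (a|0\rangle - b|1\rangle) \otimes \tfrac{1}{2}(|e_{00}\rangle_E - |e_{11}\rangle_E) \\
{} + {} & (a|1\rangle + b|0\rangle) \otimes \tfrac{1}{2}(|e_{01}\rangle_E + |e_{10}\rangle_E) \\
{} + {} & (a|1\rangle - b|0\rangle) \otimes \tfrac{1}{2}(|e_{01}\rangle_E - |e_{10}\rangle_E)
\end{aligned}$$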
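The four terms above correspond to the qubit being acted on by the identity, $Z$, $X$, and a product of $X$ and $Z$. More generally, any $2 \times 2$ matrix can be expanded in such a basis. The sketch below is an illustration in plain Python; it assumes the real convention $Y = ZX$ for the fourth basis element, and the helper names are hypothetical. Coefficients are obtained from $c_P = \tfrac{1}{2}\mathrm{tr}(P^\dagger M)$.

```python
from math import sqrt

I = ((1, 0), (0, 1))
X = ((0, 1), (1, 0))
Z = ((1, 0), (0, -1))
Y = ((0, 1), (-1, 0))  # assumed convention: Y = Z X, a real matrix
PAULIS = {"I": I, "X": X, "Y": Y, "Z": Z}

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def dagger(m):
    """Conjugate transpose of a 2x2 matrix."""
    return tuple(
        tuple(complex(m[j][i]).conjugate() for j in range(2)) for i in range(2)
    )

def pauli_coefficients(m):
    """Coefficients c_P with m = sum_P c_P P, using tr(P^dagger Q) = 2 delta_PQ."""
    coeffs = {}
    for name, p in PAULIS.items():
        prod = matmul(dagger(p), m)
        coeffs[name] = (prod[0][0] + prod[1][1]) / 2
    return coeffs

def rebuild(coeffs):
    """Reassemble the matrix from its Pauli coefficients."""
    return tuple(
        tuple(sum(coeffs[n] * PAULIS[n][i][j] for n in PAULIS) for j in range(2))
        for i in range(2)
    )
```

For example, the Hadamard matrix has coefficients $1/\sqrt{2}$ on $X$ and $Z$ and zero on the other two basis elements.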
here the index $a$ ranges over $2^{2n}$ values. The $\{E_a\}$ are the linearly independent
Pauli operators acting on the $n$ qubits, and the $\{|e_a\rangle_E\}$ are the
corresponding states of the environment (which are not assumed to be normalized
or mutually orthogonal). A crucial feature of this expansion for what
follows is that each $E_a$ is a unitary operator.
Eq. (7.12) provides the conceptual foundation of quantum error correction.
In devising a quantum error-correcting code, we identify a subset $\mathcal{E}$ of
all the Pauli operators,
these are the errors that we wish to be able to correct. Our aim will be
to perform a collective measurement of the $n$ qubits in the code block that
will enable us to diagnose which error $E_a \in \mathcal{E}$ occurred. If $|\psi\rangle$ is a state
in the code subspace, then for some (but not all) codes this measurement
will prepare a state $E_a|\psi\rangle \otimes |e_a\rangle_E$, where the value of $a$ is known from the
measurement outcome. Since $E_a$ is unitary, we may proceed to apply $E_a^\dagger$ $(= E_a)$
to the code block, thus recovering the undamaged state $|\psi\rangle$.
Each Pauli operator can be assigned a weight, an integer $t$ with $0 \le t \le n$;
the weight is the number of qubits acted on by a nontrivial Pauli matrix
($X$, $Y$, or $Z$). Heuristically, then, we can interpret a term in the expansion
eq. (7.12) where $E_a$ has weight $t$ as an event in which errors occur on $t$
qubits (but again we cannot take this interpretation too literally if the states
$\{|e_a\rangle_E\}$ are not mutually orthogonal). Typically, we will take $\mathcal{E}$ to be the set
of all Pauli operators of weight up to and including $t$; then if we can recover
from any error superoperator with support on the set $\mathcal{E}$, we will say that the
code can correct t errors. In adopting such an error set, we are implicitly
assuming that the errors afflicting different qubits are only weakly correlated
with one another, so that the amplitude for more than t errors on the n
qubits is relatively small.
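To make the size of such an error set concrete, the following sketch computes the weight of a Pauli operator written as a string over `I`, `X`, `Y`, `Z` (an illustrative encoding, not notation from the text) and counts the Pauli operators of weight at most $t$ on $n$ qubits, ignoring overall signs: there are $\binom{n}{j} 3^j$ operators of weight exactly $j$.

```python
from math import comb

def weight(pauli):
    """Number of qubits on which a Pauli string acts nontrivially."""
    return sum(1 for p in pauli if p != "I")

def count_up_to_weight(n, t):
    """Number of n-qubit Pauli operators (up to sign) of weight at most t."""
    return sum(comb(n, j) * 3 ** j for j in range(t + 1))
```

For a nine-qubit block and $t = 1$ this gives 28 operators: the identity plus three single-qubit errors on each of the nine qubits.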
Given the set $\mathcal{E}$ of errors that are to be corrected, what are the necessary
and sufficient conditions to be satisfied by the code subspace in order that
recovery is possible? Let us denote by $\{|\bar{i}\rangle\}$ an orthonormal basis for the
code subspace. (We will refer to these basis elements as “codewords”.) It
will clearly be necessary that
where $E_{a,b} \in \mathcal{E}$. If this condition were not satisfied for some $i \ne j$, then
errors would be able to destroy the perfect distinguishability of orthogonal
codewords, and encoded quantum information could surely be damaged. (A
more explicit derivation of this necessary condition will be presented below.)
We can also easily see that a sufficient condition is
In this case the $E_a$’s take the code subspace to a set of mutually orthogonal
“error subspaces”
$$\mathcal{H}_a = E_a \mathcal{H}_{\rm code}. \tag{7.16}$$
Suppose, then, that an arbitrary state $|\psi\rangle$ in the code subspace is prepared,
and subjected to an error. The resulting state of code block and environment
is
$$\sum_{E_a \in \mathcal{E}} E_a|\psi\rangle \otimes |e_a\rangle_E, \tag{7.17}$$
where the sum is restricted to the errors in the set $\mathcal{E}$. We may then perform
an orthogonal measurement that projects the code block onto one of the
spaces $\mathcal{H}_a$, so that the state becomes
We finally apply the unitary operator $E_a^\dagger$ to the code block to complete the
recovery procedure.
where $E_{a,b} \in \mathcal{E}$, and $C_{ba} = \langle\bar{i}|E_b^\dagger E_a|\bar{i}\rangle$ is an arbitrary Hermitian matrix. The
nontrivial content of this condition that goes beyond the weaker necessary
condition eq. (7.14) is that $\langle\bar{i}|E_b^\dagger E_a|\bar{i}\rangle$ does not depend on $i$. The origin of
this condition is readily understood: were it otherwise, in identifying an
error subspace $\mathcal{H}_a$ we would acquire some information about the encoded
state, and so would inevitably disturb that state.
To prove that the condition eq. (7.19) is necessary and sufficient, we
invoke the theory of superoperators developed in Chapter 3. Errors acting
on the code block are described by a superoperator, and the issue is whether
another superoperator (the recovery procedure) can be constructed that will
reverse the effect of the error. In fact, we learned in Chapter 3 that the only
superoperators that can be inverted are unitary operators. But now we are
demanding a bit less. We are not required to be able to reverse the action of
the error superoperator on any state in the n-qubit code block; rather, it is
enough to be able to reverse the errors when the initial state resides in the
k-qubit encoded subspace.
An alternative way to express the action of an error on one of the code
basis states $|\bar{i}\rangle$ (and the environment) is
$$|\bar{i}\rangle \otimes |0\rangle_E \to \sum_\mu M_\mu |\bar{i}\rangle \otimes |\mu\rangle_E, \tag{7.20}$$
where now the states $|\mu\rangle_E$ are elements of an orthonormal basis for the
environment, and the matrices $M_\mu$ are linear combinations of the Pauli operators
and
$$\sum_{\mu,\nu} R_\nu M_\mu |\bar{i}\rangle \otimes |\mu\rangle_E \otimes |\nu\rangle_A$$
the superoperator defined by the $R_\nu$’s does indeed reverse the error. It only
remains to check that the $R_\nu$’s satisfy the normalization condition. We have
$$\sum_\nu R_\nu^\dagger R_\nu = \sum_{\nu,i} \frac{1}{C_\nu} M_\nu |\bar{i}\rangle\langle\bar{i}| M_\nu^\dagger, \tag{7.32}$$
which is the orthogonal projection onto the space of states that can be reached
by errors acting on codewords. Thus we can complete the specification of
the recovery superoperator by adding one more element to the operator sum:
the projection onto the complementary subspace.
In brief, eq. (7.19) is a sufficient condition for error recovery because it is
possible to choose a basis for the error operators (not necessarily the Pauli
operator basis) that diagonalizes the matrix $C_{ab}$, and in this basis we can
unambiguously diagnose the error by performing a suitable orthogonal
measurement. (The eigenmodes of $C_{ab}$ with eigenvalue zero, like $Z_1 - Z_2$ in the
case of the 9-qubit code, correspond to errors that occur with probability
zero.) We see that, once the set $\mathcal{E}$ of possible errors is specified, the recovery
operation is determined. In particular, no information is needed about
the states $|e_a\rangle_E$ of the environment that are associated with the errors $E_a$.
Therefore, the code works equally effectively to control unitary errors or
decoherence errors (as long as the amplitude for errors outside of the set $\mathcal{E}$ is
negligible). Of course, in the case of a nondegenerate code, $C_{ab}$ is already
diagonal in the Pauli basis, and we can express the recovery basis as
$$R_a = \sum_i |\bar{i}\rangle\langle\bar{i}| E_a^\dagger; \tag{7.33}$$
– Figure –
$$\langle\bar{i}|E_a|\bar{j}\rangle \ne C_a \delta_{ij}. \tag{7.34}$$
We will describe a quantum code with block size $n$, $k$ encoded qubits, and
distance $d$ as an “$[[n, k, d]]$ quantum code.” We use the double-bracket
notation for quantum codes, to distinguish from the $[n, k, d]$ notation used for
classical codes.
We say that a QECC can correct $t$ errors if the set $\mathcal{E}$ of $E_a$’s that allow
recovery includes all Pauli operators of weight $t$ or less. Our definition of
distance implies that the criterion for error correction
will be satisfied by all Pauli operators $E_{a,b}$ of weight $t$ or less, provided that
$d \ge 2t + 1$. Therefore, a QECC with distance $d = 2t + 1$ can correct $t$ errors.
If the good outcome occurs, we are assured that the quantum state is un-
damaged. If the bad outcome occurs, damage has been sustained, and the
state should be discarded.
If the error superoperator has its support on the set E of all Pauli op-
erators of weight up to t, and it is possible to make a measurement that
correctly diagnoses whether an error has occurred, then it is said that we can
detect t errors. Error detection is easier than error correction, so a given code
can detect more errors than it can correct. In fact, a QECC with distance
d = t + 1 can detect t errors.
Such a code has the property that
the density matrix of the $t$ qubits. Then this density matrix is totally random:
$$\rho^{(t)} = \frac{1}{2^t} I; \tag{7.40}$$
(In any distance-$(t + 1)$ code, we cannot acquire any information about the
encoded data by observing any $t$ qubits in the block; that is, $\rho^{(t)}$ is a constant,
independent of the codeword. But only if the code is nondegenerate will the
density matrix of the $t$ qubits be a multiple of the identity.)
To verify the property eq. (7.40), we note that for a nondegenerate
distance-$(t + 1)$ code,
$$\mathrm{tr}(\rho^{(t)} E_a) = 0, \tag{7.42}$$
for any $t$-qubit Pauli operator $E_a$ other than the identity. Now $\rho^{(t)}$, like any
Hermitian $2^t \times 2^t$ matrix, can be expanded in terms of Pauli operators:
$$\rho^{(t)} = \frac{1}{2^t} I + \sum_{E_a \ne I} \rho_a E_a. \tag{7.43}$$
$$\frac{1}{2^t} \mathrm{tr}(E_a E_b) = \delta_{ab}, \tag{7.44}$$
The recovery operation (a unitary acting on the data and the ancilla) then
maps $|{\rm GOOD}\rangle$ to a state $|{\rm GOOD}'\rangle$ of data, environment, and ancilla, and
$|{\rm BAD}\rangle$ to a state $|{\rm BAD}'\rangle$, so that after recovery we obtain the state
Let $\rho_{\rm rec}$ denote the density matrix of the recovered state, obtained by tracing
out the environment and ancilla, and let
$$F = \langle\psi|\rho_{\rm rec}|\psi\rangle \tag{7.49}$$
be its fidelity. Now, since $|{\rm BAD}'\rangle$ is orthogonal to $|{\rm GOOD}'\rangle$ (that is, $|{\rm BAD}'\rangle$
has no component along $|\psi\rangle|s\rangle_{EA}$), the fidelity will be
where
Of course, since both the error operation and the recovery operation are unitary
acting on data, environment, and ancilla, the complete state $|{\rm GOOD}'\rangle + |{\rm BAD}'\rangle$
is normalized, or
$$F \ge 1 - \big\| |{\rm BAD}'_\perp\rangle \big\|^2. \tag{7.57}$$
7.4. PROBABILITY OF FAILURE 19
Finally, the norm of $|{\rm BAD}'_\perp\rangle$ cannot exceed the norm of $|{\rm BAD}'\rangle$, and we
conclude that
$$1 - F \le \big\| |{\rm BAD}'\rangle \big\|^2 = \big\| |{\rm BAD}\rangle \big\|^2 \equiv \Big\| \sum_{E_b \notin \mathcal{E}} E_b|\psi\rangle \otimes |e_b\rangle_E \Big\|^2. \tag{7.58}$$
This is our general bound on the “failure probability” of the recovery operation.
The result eq. (7.53) then follows in the special case where $|{\rm GOOD}\rangle$
and $|{\rm BAD}\rangle$ are orthogonal states.
where $\$^{(1)}_{\rm error}$ is a one-qubit superoperator whose action (in its unitary
representation) has the form
$$|\psi\rangle \otimes |0\rangle_E \to |\psi\rangle \otimes |e_I\rangle_E + X|\psi\rangle \otimes |e_X\rangle_E + Y|\psi\rangle \otimes |e_Y\rangle_E + Z|\psi\rangle \otimes |e_Z\rangle_E. \tag{7.60}$$
The effect of the errors on encoded information is especially easy to analyze
if we suppose further that each of the three states of the environment $|e_{X,Y,Z}\rangle$
is orthogonal to the state $|e_I\rangle$. In that case, a record of whether or not an
error occurred for each qubit is permanently imprinted on the environment,
and it is sensible to speak of a probability of error $p_{\rm error}$ for each qubit, where
$$\langle e_I|e_I\rangle = 1 - p_{\rm error}. \tag{7.61}$$
If our quantum code can correct $t$ errors, then the “good” Pauli operators
have weight up to $t$, and the “bad” Pauli operators have weight greater than
$t$; recovery is certain to succeed unless at least $t + 1$ qubits are subjected to
errors. It follows that the fidelity obeys the bound
$$1 - F \le \sum_{s=t+1}^{n} \binom{n}{s} p_{\rm error}^{s} (1 - p_{\rm error})^{n-s} \le \binom{n}{t+1} p_{\rm error}^{t+1}. \tag{7.62}$$
(For each of the $\binom{n}{t+1}$ ways of choosing $t + 1$ locations, the probability that
errors occur at every one of those locations is $p_{\rm error}^{t+1}$, where we disregard
whether additional errors occur at the remaining $n - t - 1$ locations. Therefore,
the final expression in eq. (7.62) is an upper bound on the probability
that at least $t + 1$ errors occur in the block of $n$ qubits.) For $p_{\rm error}$ small and $t$
large, the fidelity of the encoded data is a substantial improvement over the
fidelity $F = 1 - O(p)$ maintained by an unprotected qubit.
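The two expressions in eq. (7.62) are easy to compare numerically. The sketch below evaluates the exact probability of at least $t + 1$ independent errors and the simpler upper bound; the function names are illustrative, and the parameter values in the usage note are arbitrary examples rather than figures from the text.

```python
from math import comb

def prob_at_least(n, t, p):
    """Probability that at least t+1 of n qubits suffer independent errors."""
    return sum(comb(n, s) * p ** s * (1 - p) ** (n - s) for s in range(t + 1, n + 1))

def union_bound(n, t, p):
    """Upper bound C(n, t+1) * p^(t+1) from the final expression in eq. (7.62)."""
    return comb(n, t + 1) * p ** (t + 1)
```

For instance, with $n = 7$, $t = 1$, and $p_{\rm error} = 0.01$, `union_bound(7, 1, 0.01)` is $21 \times 10^{-4}$, compared with an error probability of $10^{-2}$ for a single unprotected qubit, and `prob_at_least(7, 1, 0.01)` is slightly smaller still.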
For a general error superoperator acting on a single qubit, there is no clear
notion of an “error probability;” the state of the qubit and its environment
obtained when the Pauli operator I acts is not orthogonal to (and so cannot
be perfectly distinguished from) the state obtained when the Pauli operators
X, Y , and Z act. In the extreme case there is no decoherence at all — the
“errors” arise because unknown unitary transformations act on the qubits.
(If the unitary transformation U acting on a qubit were known, we could
recover from the “error” simply by applying U †.)
Consider uncorrelated unitary errors acting on the $n$ qubits in the code
block, each of the form (up to an irrelevant phase)
$$U^{(1)} = \sqrt{1 - p} + i\sqrt{p}\, W, \tag{7.63}$$
(7.64)
If a unitary error of the form eq. (7.63) acts on each of the $n$ qubits in the
code block, and the resulting state is expanded in terms of Pauli operators
as in eq. (7.45), then the state $|{\rm BAD}\rangle$ (which arises from terms in which $W$
acts on at least $t + 1$ qubits) has a norm of order $(\sqrt{p})^{t+1}$, and eq. (7.58)
becomes
$$1 - F = O(p^{t+1}). \tag{7.65}$$
where each $\alpha_i \in \{0, 1\}$, and addition is modulo 2. We may say that the
length-$n$ vector $v(\alpha_1 \ldots \alpha_k)$ encodes the $k$-bit message $\alpha = (\alpha_1, \ldots, \alpha_k)$.
called the generator matrix of the code. Then in matrix notation, eq. (7.66)
can be rewritten as
$$v(\alpha) = \alpha G; \tag{7.68}$$
$$Hv = 0 \tag{7.69}$$
for all those and only those vectors $v$ in the code $C$. This matrix $H$ is called
the parity check matrix of the code $C$. The rows of $H$ are $n - k$ linearly
independent vectors, and the code space is the space of vectors that are
orthogonal to all of these vectors. Orthogonality is defined with respect to
the mod 2 bitwise inner product; two length-$n$ binary strings are orthogonal
if they “collide” (both take the value 1) at an even number of locations. Note
that
$$HG^T = 0; \tag{7.70}$$
$$v \to v + e. \tag{7.71}$$
$$H(v + e) = Hv + He = He. \tag{7.72}$$
7.5. CLASSICAL LINEAR CODES 23
$$v + e \to (v + e) + e = v. \tag{7.73}$$
The recovered message $v + e_1 + e_2$ lies in the code, but it differs from the
intended message $v$; the encoded information has been damaged.
The distance $d$ of a code $C$ is the minimum weight of any nonzero vector $v \in C$,
where the weight is the number of 1’s in the string $v$. A linear code with
distance $d = 2t + 1$ can correct $t$ errors; the code assigns a distinct syndrome
to each $e \in \mathcal{E}$, where $\mathcal{E}$ contains all vectors of weight $t$ or less. This is so
because, if $He_1 = He_2$, then
$$H(e_1 + e_2) = He_1 + He_2 = 0,$$
and therefore $e_1 + e_2 \in C$. But if $e_1$ and $e_2$ are unequal and each has weight
no larger than $t$, then the weight of $e_1 + e_2$ is greater than zero and no larger
than $2t$. Since $d = 2t + 1$, there is no such vector in $C$. Hence $He_1$ and $He_2$
cannot be equal.
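The argument can be checked by brute force on a small example. The sketch below uses the 3-bit repetition code as an illustrative choice (its parity checks mirror the operators $Z_1Z_2$ and $Z_2Z_3$ used earlier); the matrix `H` and the helper names are assumptions made for this example. It computes the distance from the parity check matrix and confirms that every error of weight at most $t = 1$ has its own syndrome.

```python
from itertools import product

# Parity check matrix of the 3-bit repetition code (illustrative choice).
H = [(1, 1, 0), (0, 1, 1)]

def syndrome(H, e):
    """Compute He over the binary field."""
    return tuple(sum(h * x for h, x in zip(row, e)) % 2 for row in H)

def code_words(H, n):
    """All length-n binary vectors v with Hv = 0."""
    return [v for v in product((0, 1), repeat=n) if not any(syndrome(H, v))]

def distance(H, n):
    """Minimum weight of a nonzero codeword."""
    return min(sum(v) for v in code_words(H, n) if any(v))

def errors_up_to(n, t):
    """All error vectors of weight at most t."""
    return [e for e in product((0, 1), repeat=n) if sum(e) <= t]

syndromes = [syndrome(H, e) for e in errors_up_to(3, 1)]
```

Here `distance(H, 3)` is 3, so $t = 1$, and the four entries of `syndromes` are all different.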
A useful concept in classical coding theory is that of the dual code. We
have seen that the $k \times n$ generator matrix $G$ and the $(n - k) \times n$ parity check
matrix $H$ of a code $C$ are related by $HG^T = 0$. Taking the transpose, it
follows that $GH^T = 0$. Thus we may regard $H$ as the generator and $G$ as
the parity check of an $(n - k)$-dimensional code, which is denoted $C^\perp$ and
called the dual of $C$. In other words, $C^\perp$ is the orthogonal complement of
$C$ in $F_2^n$. A vector is self-orthogonal if it has even weight, so it is possible
for $C$ and $C^\perp$ to intersect. A code is contained in its dual if all of its codewords
have even weight and are mutually orthogonal. If $n = 2k$ it is possible that
$C = C^\perp$, in which case $C$ is said to be self-dual.
An identity relating the code $C$ and its dual $C^\perp$ will prove useful in the
following section:
$$\sum_{v \in C} (-1)^{v \cdot u} = \begin{cases} 2^k & u \in C^\perp \\ 0 & u \notin C^\perp \end{cases}. \tag{7.76}$$
The nontrivial content of the identity is the statement that the sum vanishes
for $u \notin C^\perp$. This readily follows from the familiar identity
$$\sum_{v \in \{0,1\}^k} (-1)^{v \cdot w} = 0, \quad w \ne 0, \tag{7.77}$$
$$v = \alpha G, \tag{7.78}$$
$$\sum_{v \in C} (-1)^{v \cdot u} = \sum_{\alpha \in \{0,1\}^k} (-1)^{\alpha \cdot Gu} = 0, \tag{7.79}$$
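The identity eq. (7.76) can be verified exhaustively for a small code. In the sketch below the generator matrix `G` is a hypothetical two-dimensional code of length 4 chosen only for illustration; the helpers `span`, `dot`, and `character_sum` are likewise assumptions of this example.

```python
from itertools import product

# Hypothetical generator matrix of a [4, 2] code, used only as an example.
G = [(1, 1, 0, 0), (0, 0, 1, 1)]

def dot(u, v):
    """Bitwise inner product mod 2."""
    return sum(a * b for a, b in zip(u, v)) % 2

def span(rows):
    """All binary linear combinations alpha * G of the given rows."""
    n = len(rows[0])
    words = set()
    for alpha in product((0, 1), repeat=len(rows)):
        words.add(tuple(sum(a * r[i] for a, r in zip(alpha, rows)) % 2 for i in range(n)))
    return sorted(words)

def in_dual(rows, u):
    """u lies in the dual code when it is orthogonal to every generator."""
    return all(dot(g, u) == 0 for g in rows)

def character_sum(code, u):
    """Left-hand side of eq. (7.76): sum over codewords of (-1)^(v.u)."""
    return sum((-1) ** dot(v, u) for v in code)

C = span(G)
```

For every length-4 string `u`, `character_sum(C, u)` equals $2^k = 4$ when `in_dual(G, u)` holds and 0 otherwise.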
$$= \frac{1}{\sqrt{2^{n-k_2}}} \sum_{u \in C_2^\perp} (-1)^{u \cdot w} |u\rangle; \tag{7.81}$$
$$\begin{aligned} d_1 &\ge 2t_F + 1, \\ d_2^\perp &\ge 2t_P + 1. \end{aligned} \tag{7.82}$$
Then we can see that the corresponding CSS code can correct $t_F$ bit flips
and $t_P$ phase flips. If $e$ is a binary string of length $n$, let $E_e^{\rm (flip)}$ denote the
Pauli operator with an $X$ acting at each location $i$ where $e_i = 1$; it acts on
the state $|v\rangle$ according to
$$E_e^{\rm (flip)}: |v\rangle \to |v + e\rangle. \tag{7.83}$$
$$E_e^{\rm (phase)}: |v\rangle \to (-1)^{v \cdot e} |v\rangle, \tag{7.84}$$
$$E_e^{\rm (phase)}: |u\rangle \to |u + e\rangle. \tag{7.85}$$
Now, in the original basis (the $F$ or “flip” basis), each basis state $|\bar{w}\rangle_F$ of
the CSS code is a superposition of words in the code $C_1$. To diagnose bit flip
error, we perform on data and ancilla the unitary transformation
and then measure the ancilla. The measurement result $H_1 e_F$ is the bit flip
syndrome. If the number of flips is $t_F$ or fewer, we may correctly infer from
this syndrome that bit flips have occurred at the locations labeled by $e_F$. We
recover by applying $X$ to the qubits at those locations.
To correct phase errors, we first perform the bitwise Hadamard transformation
to rotate from the $F$ basis to the $P$ (“phase”) basis. In the $P$ basis,
each basis state $|\bar{w}\rangle_P$ of the CSS code is a superposition of words in the code
$C_2^\perp$. To diagnose phase errors, we perform a unitary transformation
and measure the ancilla ($G_2$, the generator matrix of $C_2$, is also the parity
check matrix of $C_2^\perp$). The measurement result $G_2 e_P$ is the phase error syndrome.
If the number of phase errors is $t_P$ or fewer, we may correctly infer
from this syndrome that phase errors have occurred at locations labeled by
$e_P$. We recover by applying $X$ (in the $P$ basis) to the qubits at those locations.
Finally, we apply the bitwise Hadamard transformation once more
to rotate the codewords back to the original basis. (Equivalently, we may
recover from the phase errors by applying $Z$ to the affected qubits after the
rotation back to the $F$ basis.)
If $e_F$ has weight less than $d_1$ and $e_P$ has weight less than $d_2^\perp$, then
$$\langle\bar{w}| E_{e_P}^{\rm (phase)} E_{e_F}^{\rm (flip)} |\bar{w}'\rangle = 0 \tag{7.88}$$
phase error both afflicting the same qubit. So the distance $d$ of a CSS code
satisfies
$$d \ge \min(d_1, d_2^\perp). \tag{7.89}$$
CSS codes have the special property (not shared by more general QECC’s)
that the recovery procedure can be divided into two separate operations, one
to correct the bit flips and the other to correct the phase errors.
The unitary transformation eq. (7.86) (or eq. (7.87)) can be implemented
by executing a simple quantum circuit. Associated with each of the $n - k_1$
rows of the parity check matrix $H_1$ is a bit of the syndrome to be extracted.
To find the $a$th bit of the syndrome, we prepare an ancilla bit in the state
$|0\rangle_{A,a}$, and for each value of $\lambda$ with $(H_1)_{a\lambda} = 1$, we execute a controlled-NOT
gate with the ancilla bit as the target and qubit $\lambda$ in the data block as the
control. When measured, the ancilla qubit reveals the value of the parity
check bit $\sum_\lambda (H_1)_{a\lambda} v_\lambda$.
Schematically, the full error correction circuit for a CSS code has the
form:
– Figure –
Separate syndromes are measured to diagnose the bit flip errors and the phase
errors. An important special case of the CSS construction arises when a code
$C$ contains its dual $C^\perp$. Then we may choose $C_1 = C$ and $C_2 = C^\perp \subseteq C$; the
$C$ parity check is computed in both the $F$ basis and the $P$ basis to determine
the two syndromes.
0 0 0 1 1 1 1
To see that the distance of the code is $d = 3$, first note that the weight-3
string $(1110000)$ passes the parity check and is, therefore, in the code. Now
we need to show that there are no vectors of weight 1 or 2 in the code. If $e_1$
has weight 1, then $He_1$ is one of the columns of $H$. But no column of $H$ is
trivial (all zeros), so $e_1$ cannot be in the code. Any vector of weight 2 can be
expressed as $e_1 + e_2$, where $e_1$ and $e_2$ are distinct vectors of weight 1. But
$$G = \begin{pmatrix} 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 & 0 & 0 & 0 \end{pmatrix}; \tag{7.92}$$
the first three rows coincide with the rows of H, and the weight-3 codeword
(1110000) is appended as the fourth row.
The dual of the Hamming code is the $[7, 3, 4]$ code generated by $H$. In
this case the dual of the code is actually contained in the code; in fact, it
is the even subcode of the Hamming code, containing all those and only those
Hamming codewords that have even weight. The odd codeword $(1110000)$
is a representative of the nontrivial coset of the even subcode. For the CSS
construction, we will choose $C_1$ to be the Hamming code, and $C_2$ to be its
dual, the even subcode. Therefore, $C_2^\perp = C_1$ is again the Hamming code;
we will use the Hamming parity check both to detect bit flips in the $F$ basis
and to detect phase flips in the $P$ basis.
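These statements about the Hamming code can be confirmed by enumeration. The sketch below takes the rows of $H$ to be the first three rows of $G$ in eq. (7.92), as stated above, and uses illustrative helper names (`span`, `dot`). It builds the 16 Hamming codewords and the 8 words of the dual code and compares the dual with the even subcode.

```python
from itertools import product

# Rows of H are the first three rows of G in eq. (7.92).
H = [
    (1, 0, 1, 0, 1, 0, 1),
    (0, 1, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 1, 1, 1),
]
G = H + [(1, 1, 1, 0, 0, 0, 0)]

def dot(u, v):
    """Bitwise inner product mod 2."""
    return sum(a * b for a, b in zip(u, v)) % 2

def span(rows):
    """Set of all binary linear combinations of the given rows."""
    n = len(rows[0])
    words = set()
    for alpha in product((0, 1), repeat=len(rows)):
        words.add(tuple(sum(a * r[i] for a, r in zip(alpha, rows)) % 2 for i in range(n)))
    return words

hamming = span(G)                                   # the [7, 4, 3] code
dual = span(H)                                      # the code generated by H
even_subcode = {v for v in hamming if sum(v) % 2 == 0}

hamming_distance = min(sum(v) for v in hamming if any(v))
dual_distance = min(sum(v) for v in dual if any(v))
```

Running this gives `hamming_distance == 3`, `dual_distance == 4`, and `dual == even_subcode`, in agreement with the text.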
7.7. THE 7-QUBIT CODE 29
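In the $F$ basis, the two orthonormal codewords of this CSS code, each
associated with a distinct coset of the even subcode, can be expressed as
$$\begin{aligned}
|\bar{0}\rangle_F &= \frac{1}{\sqrt{8}} \sum_{\substack{\text{even } v \\ \in \text{ Hamming}}} |v\rangle, \\
|\bar{1}\rangle_F &= \frac{1}{\sqrt{8}} \sum_{\substack{\text{odd } v \\ \in \text{ Hamming}}} |v\rangle.
\end{aligned} \tag{7.93}$$
Since both $|\bar{0}\rangle$ and $|\bar{1}\rangle$ are superpositions of Hamming codewords, bit flips
can be diagnosed in this basis by performing an $H$ parity check. In the
Hadamard rotated basis, these codewords become
$$\begin{aligned}
H^{(7)}: |\bar{0}\rangle_F \to |\bar{0}\rangle_P &\equiv \frac{1}{4} \sum_{v \in \text{ Hamming}} |v\rangle = \frac{1}{\sqrt{2}} (|\bar{0}\rangle_F + |\bar{1}\rangle_F), \\
|\bar{1}\rangle_F \to |\bar{1}\rangle_P &\equiv \frac{1}{4} \sum_{v \in \text{ Hamming}} (-1)^{{\rm wt}(v)} |v\rangle = \frac{1}{\sqrt{2}} (|\bar{0}\rangle_F - |\bar{1}\rangle_F).
\end{aligned} \tag{7.94}$$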
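Eq. (7.94) can be checked by applying the bitwise Hadamard transformation directly to the amplitudes. This is a brute-force sketch over all 128 seven-bit strings, with the generator matrix taken from eq. (7.92); the function names `logical_F` and `hadamard_all` are assumptions of this example.

```python
from itertools import product
from math import sqrt

G = [
    (1, 0, 1, 0, 1, 0, 1),
    (0, 1, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 1, 1, 1),
    (1, 1, 1, 0, 0, 0, 0),
]

def dot(u, v):
    """Bitwise inner product mod 2."""
    return sum(a * b for a, b in zip(u, v)) % 2

def span(rows):
    """Sorted list of all binary linear combinations of the given rows."""
    n = len(rows[0])
    words = set()
    for alpha in product((0, 1), repeat=len(rows)):
        words.add(tuple(sum(a * r[i] for a, r in zip(alpha, rows)) % 2 for i in range(n)))
    return sorted(words)

HAMMING = span(G)

def logical_F(parity):
    """Uniform superposition of Hamming codewords of the given weight parity."""
    words = [v for v in HAMMING if sum(v) % 2 == parity]
    return {v: 1 / sqrt(len(words)) for v in words}

def hadamard_all(state, n=7):
    """Bitwise Hadamard: each basis string v maps to 2^(-n/2) sum_u (-1)^(u.v) u."""
    norm = 2 ** (-n / 2)
    out = {}
    for u in product((0, 1), repeat=n):
        amp = norm * sum(a * (-1) ** dot(u, v) for v, a in state.items())
        if abs(amp) > 1e-12:       # drop amplitudes that cancel
            out[u] = amp
    return out

zero_P = hadamard_all(logical_F(0))
one_P = hadamard_all(logical_F(1))
```

Both rotated states are supported on exactly the 16 Hamming codewords, with amplitude $1/4$ for `zero_P` and $(-1)^{{\rm wt}(v)}/4$ for `one_P`.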
$$H(e_1 + e_2 + e_3) = 0; \tag{7.95}$$
then, the effect of the two bit flip errors and our faulty attempt at recovery
will be to add $e_1 + e_2 + e_3$ (an odd-weight Hamming codeword) to the data,
which will induce a flip of the encoded qubit
Similarly, two phase flips in the $F$ basis are two bit flips in the $P$ basis, which
(after the botched recovery) induce on the encoded qubit
or equivalently
$$\begin{aligned} |\bar{0}\rangle_F &\to |\bar{0}\rangle_F \\ |\bar{1}\rangle_F &\to -|\bar{1}\rangle_F, \end{aligned} \tag{7.98}$$
a phase flip of the encoded qubit in the $F$ basis. If there is one bit flip and
one phase flip (either on the same qubit or different qubits) then recovery
will be successful.
$$1 + 3n \le 2^{n-1}, \tag{7.101}$$
– Figure –
If we append |00i to each of those two sub-blocks, then the original block
has spawned two offspring, each with two located errors. If we were able to
correct the two located errors in each of the offspring, we would obtain two
identical copies of the parent block — we would have cloned an unknown
quantum state, which is impossible. Therefore, no [[4, 1, 3]] quantum code
can exist. We conclude that n = 5 is the minimal block size of a quantum
code that corrects one error, whether the code is degenerate or not.
The same reasoning shows that an $[[n, k \ge 1, d]]$ code can exist only for
$$n - k \ge d - 1, \tag{7.104}$$
and so has been called the “quantum Singleton bound.” For a classical linear
code, the Singleton bound is a near triviality: the code can have distance $d$
only if any $d - 1$ columns of the parity check matrix are linearly independent.
Since the columns have length $n - k$, at most $n - k$ columns can be linearly
independent; therefore $d - 1$ cannot exceed $n - k$. The Singleton bound also
applies to nonlinear codes.
An elegant proof of the quantum Singleton bound can be found that
exploits the subadditivity of the Von Neumann entropy discussed in §5.2.
We begin by introducing a $k$-qubit ancilla, and constructing a pure state
that maximally entangles the ancilla with the $2^k$ codewords of the QECC:
$$|\Psi\rangle_{AQ} = \frac{1}{\sqrt{2^k}} \sum_x |x\rangle_A |\bar{x}\rangle_Q, \tag{7.105}$$
where $\{|x\rangle_A\}$ denotes an orthonormal basis for the $2^k$-dimensional Hilbert
space of the ancilla, and $\{|\bar{x}\rangle_Q\}$ denotes an orthonormal basis for the
$2^k$-dimensional code subspace. If we trace over the length-$n$ code block $Q$, the
density matrix $\rho_A$ of the ancilla is $\frac{1}{2^k}\mathbf{1}$, which has entropy
Now, if the code has distance d, then d − 1 located errors can be corrected;
or, as we have seen, no observable acting on d − 1 of the n qubits can reveal
any information about the encoded state. Equivalently, the observable can
reveal nothing about the state of the ancilla in the entangled state |Ψi.
Now, since we already know that $n > 2(d - 1)$ (if $k \ge 1$), let us imagine
dividing the code block $Q$ into three disjoint parts: a set of $d - 1$ qubits $Q^{(1)}_{d-1}$,
another disjoint set of $d - 1$ qubits $Q^{(2)}_{d-1}$, and the remaining qubits $Q^{(3)}_{n-2(d-1)}$.
If we trace out $Q^{(2)}$ and $Q^{(3)}$, the density matrix we obtain must contain no
correlations between $Q^{(1)}$ and the ancilla $A$. This means that the entropy of
system $AQ^{(1)}$ is additive:
$$S(Q^{(2)}Q^{(3)}) = S(AQ^{(1)}) = S(A) + S(Q^{(1)}). \tag{7.107}$$
Similarly,
$$S(Q^{(1)}Q^{(3)}) = S(AQ^{(2)}) = S(A) + S(Q^{(2)}). \tag{7.108}$$
Furthermore, in general, Von Neumann entropy is subadditive, so that
$$\begin{aligned} S(Q^{(1)}Q^{(3)}) &\le S(Q^{(1)}) + S(Q^{(3)}) \\ S(Q^{(2)}Q^{(3)}) &\le S(Q^{(2)}) + S(Q^{(3)}) \end{aligned} \tag{7.109}$$
Combining these inequalities with the equalities above, we find
$$\begin{aligned} S(A) + S(Q^{(2)}) &\le S(Q^{(1)}) + S(Q^{(3)}) \\ S(A) + S(Q^{(1)}) &\le S(Q^{(2)}) + S(Q^{(3)}). \end{aligned} \tag{7.110}$$
Both of these inequalities can be simultaneously satisfied only if (as we see by adding them)
$$S(A) \le S(Q^{(3)}). \tag{7.111}$$
Now $Q^{(3)}$ consists of $n - 2(d - 1)$ qubits, and its entropy is bounded above by
the number of qubits, so that
$$S(A) = k \le n - 2(d - 1), \tag{7.112}$$
which is the quantum Singleton bound.
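As a quick illustration of eq. (7.112), the sketch below tests the inequality $k \le n - 2(d - 1)$ for a few parameter sets that appear in this chapter; the function names are illustrative.

```python
def satisfies_singleton(n, k, d):
    """Quantum Singleton bound, eq. (7.112): k <= n - 2(d - 1)."""
    return k <= n - 2 * (d - 1)

def min_block_size(k, d):
    """Smallest n allowed by the bound for given k and d."""
    return k + 2 * (d - 1)
```

The parameters $[[9, 1, 3]]$ and $[[7, 1, 3]]$ satisfy the bound with room to spare, $[[5, 1, 3]]$ meets it with equality, and $[[4, 1, 3]]$ violates it, consistent with the cloning argument given earlier.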
The $[[5, 1, 3]]$ code saturates this bound, but for most values of $n$ and
$k$ the bound is not tight. Rains has obtained the stronger result that an
$[[n, k, 2t + 1]]$ code with $k \ge 1$ must satisfy
$$t \le \frac{n + 1}{6}, \tag{7.113}$$
are the elements of a group of order 8.$^1$ The $n$-fold tensor products of
single-qubit Pauli operators also form a group
of order $|G_n| = 2^{2n+1}$ (since there are $4^n$ possible tensor products, and another
factor of 2 for the $\pm$ sign); we will refer to $G_n$ as the $n$-qubit Pauli group.
(In fact, we will use the term “Pauli group” both to refer to the abstract
$^1$ It is not the quaternionic group but the other non-abelian group of order 8, the
symmetry group of the square. The element $Y$, of order 4, can be regarded as the $90^\circ$
rotation of the plane, while $X$ and $Z$ are reflections about two orthogonal axes.
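The group structure described in the footnote can be verified by generating the group from $X$ and $Z$. The sketch below assumes the real convention $Y = ZX$ (consistent with $Y$ having order 4); the helper names `closure` and `order` are illustrative.

```python
def matmul(a, b):
    """Product of two 2x2 integer matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

I = ((1, 0), (0, 1))
X = ((0, 1), (1, 0))
Z = ((1, 0), (0, -1))
Y = matmul(Z, X)  # assumed convention Y = Z X, a real matrix with Y^2 = -I

def closure(generators):
    """All matrices reachable from the identity by multiplying by generators."""
    group = {I}
    frontier = [I]
    while frontier:
        g = frontier.pop()
        for h in generators:
            p = matmul(g, h)
            if p not in group:
                group.add(p)
                frontier.append(p)
    return group

def order(g):
    """Smallest positive k with g^k equal to the identity."""
    k, p = 1, g
    while p != I:
        p = matmul(p, g)
        k += 1
    return k

def pauli_group_order(n):
    """Order of the n-qubit Pauli group, 2^(2n+1)."""
    return 2 ** (2 * n + 1)

G1 = closure([X, Z])
```

The group `G1` has 8 elements, of which only two (`Y` and its negative) have order 4; the quaternion group would have six elements of order 4, so this is the symmetry group of the square.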
7.9. STABILIZER CODES 35
We will use the Pauli group to characterize a QECC in the following way:
Let $S$ denote an abelian subgroup of the $n$-qubit Pauli group $G_n$. Thus all
elements of $S$ acting on $\mathcal{H}_{2^n}$ can be simultaneously diagonalized. Then the
stabilizer code $\mathcal{H}_S \subseteq \mathcal{H}_{2^n}$ associated with $S$ is the simultaneous eigenspace
with eigenvalue 1 of all elements of $S$. That is,
The group $S$ is called the stabilizer of the code, since it preserves all of the
codewords.
The group $S$ can be characterized by its generators. These are elements
$\{M_i\}$ that are independent (no one can be expressed as a product of others)
and such that each element of $S$ can be expressed as a product of elements
of $\{M_i\}$. If $S$ has $n - k$ generators, we can show that the code space $\mathcal{H}_S$ has
dimension $2^k$; there are $k$ encoded qubits.
To verify this, first note that each $M \in S$ must satisfy $M^2 = I$; if
$M^2 = -I$, then $M$ cannot have the eigenvalue $+1$. Furthermore, for each
$M \ne \pm I$ in $G_n$ that squares to one, the eigenvalues $+1$ and $-1$ have equal
so that the error flips the value of $M$, and the error can be detected by
measuring $M$.
For stabilizer generators $M_i$ and errors $E_a$, we may write
$$M_i E_a = (-1)^{s_{ia}} E_a M_i. \tag{7.123}$$
where $C_{ab}$ is independent of $|\psi\rangle$. We can see that this condition is satisfied
provided that, for each $E_a, E_b \in \mathcal{E}$, one of the following holds:
1) $E_a^\dagger E_b \in S$,
Proof: In case (1) $\langle\psi|E_a^\dagger E_b|\psi\rangle = \langle\psi|\psi\rangle = 1$, for $|\psi\rangle \in \mathcal{H}_S$. In case (2),
suppose $M \in S$ and $M E_a^\dagger E_b = -E_a^\dagger E_b M$. Then
Evidently we could also just as well choose the code subspace to be any
one of the $2^{n-k}$ simultaneous eigenspaces of $n - k$ independent commuting
elements of $G_n$. But in fact all of these codes are equivalent. We may regard
two stabilizer codes as equivalent if they differ only according to how the
qubits are labeled, and how the basis for each single-qubit Hilbert space is
chosen; that is, the stabilizer of one code is transformed to the stabilizer
of the other by a permutation of the qubits together with a tensor product
of single-qubit transformations. If we partition the stabilizer generators
into two sets $\{M_1, \ldots, M_j\}$ and $\{M_{j+1}, \ldots, M_{n-k}\}$, then there exists an
$N \in G_n$ that commutes with each member of the first set and anti-commutes
with each member of the second set. Applying $N$ to $|\psi\rangle \in \mathcal{H}_S$ preserves the
eigenvalues of the first set while flipping the eigenvalues of the second set.
Since $N$ is just a tensor product of single-qubit unitary transformations,
there is no loss of generality (up to equivalence) in choosing all of the
eigenvalues to be one. Furthermore, since minus signs don’t really matter when
the stabilizer is specified, we may just as well say that two codes are
equivalent if, up to phases, the stabilizers differ by a permutation of the $n$ qubits,
and permutations on each individual qubit of the operators $X$, $Y$, $Z$.
$$M = Z_M \cdot X_M \tag{7.126}$$
where $\alpha$ and $\beta$ are binary strings of length $n$. (Then $Y$ acts at the locations
where $\alpha$ and $\beta$ “collide.”) Multiplication in $\bar G_n$ maps to addition in $F_2^{2n}$:
$$(\alpha|\beta)(\alpha'|\beta') = (-1)^{\alpha'\cdot\beta}\,(\alpha+\alpha'|\beta+\beta') ; \tag{7.128}$$
Thus two Pauli operators commute if and only if the corresponding vectors
are orthogonal with respect to the “symplectic” inner product
$$\alpha\cdot\beta' + \alpha'\cdot\beta . \tag{7.130}$$
since $\alpha\cdot\beta$ counts the number of $Y$’s in the operator; it squares to the identity
if and only if
$$\alpha\cdot\beta = 0 . \tag{7.132}$$
Note that a closed subspace, where each element has this property, is
automatically self-orthogonal, since
$$\alpha\cdot\beta' + \alpha'\cdot\beta = (\alpha+\alpha')\cdot(\beta+\beta') - \alpha\cdot\beta - \alpha'\cdot\beta' = 0 ; \tag{7.133}$$
in the group language, that is, a subgroup of $G_n$ with each element squaring
to $I$ is automatically abelian.
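The commutation test above is easy to mechanize. The following is a minimal sketch, assuming Pauli operators are given as strings over I, X, Y, Z, that overall signs are ignored, and that $Y = ZX$ (so that $Y^2 = -I$, as the text implies); the function names are illustrative and not taken from the notes.

```python
def pauli_to_vec(label):
    """Map a Pauli string such as 'XZZXI' to (alpha, beta), meaning Z(alpha) X(beta)."""
    alpha = [1 if ch in "ZY" else 0 for ch in label]  # positions carrying a Z factor
    beta = [1 if ch in "XY" else 0 for ch in label]   # positions carrying an X factor
    return alpha, beta


def dot(u, v):
    """Binary inner product of two bit lists (mod 2)."""
    return sum(x * y for x, y in zip(u, v)) % 2


def symplectic(p, q):
    """Symplectic inner product alpha.beta' + alpha'.beta (mod 2)."""
    (a, b), (a2, b2) = pauli_to_vec(p), pauli_to_vec(q)
    return (dot(a, b2) + dot(a2, b)) % 2


def commute(p, q):
    """Two Pauli operators commute iff their vectors are symplectically orthogonal."""
    return symplectic(p, q) == 0


def squares_to_identity(p):
    """With Y = ZX, an operator squares to +I iff alpha.beta = 0 (an even number of Y's)."""
    a, b = pauli_to_vec(p)
    return dot(a, b) == 0
```

For instance, `commute("XZZXI", "IXZZX")` compares two of the five-qubit code generators that appear later in the chapter.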
Using the linear algebra language, some of the statements made earlier
about the Pauli group can be easily verified by counting linear constraints.
Elements are independent if the corresponding vectors are linearly independent
over $F_2^{2n}$, so we may think of the $n-k$ generators of the stabilizer
as a basis for a linear subspace of dimension $n-k$. We will use the notation
$S$ to denote both the linear space and the corresponding abelian group.
Then $S^\perp$ denotes the dimension-$(n+k)$ space of vectors that are orthogonal
to each vector in $S$ (with respect to the symplectic inner product). Note
that $S^\perp$ contains $S$, since all vectors in $S$ are mutually orthogonal. In the
group language, corresponding to $S^\perp$ is the normalizer (or centralizer) group
$N(S)$ ($\equiv S^\perp$) of $S$ in $G_n$, the subgroup of $G_n$ containing all elements that
commute with each element of $S$. Since $S$ is abelian, it is contained in its
own normalizer, which also contains other elements (to be further discussed
below). The stabilizer of a distance $d$ code has the property that each $(\alpha|\beta)$
whose weight $\sum_i (\alpha_i \vee \beta_i)$ is less than $d$ either lies in the stabilizer subspace
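Here each row is a Pauli operator, expressed in the $(\alpha|\beta)$ notation. The
syndrome of an error $E_a = (\alpha_a|\beta_a)$ is determined by its commutation properties
with the generators $M_i = (\alpha_i'|\beta_i')$; that is
$$\begin{aligned}
& Z_1Z_2,\ Z_2Z_3,\ Z_4Z_5,\ Z_5Z_6,\ Z_7Z_8,\ Z_8Z_9, \\
& X_1X_2X_3X_4X_5X_6,\ X_4X_5X_6X_7X_8X_9 .
\end{aligned} \tag{7.136}$$
$$\left(\begin{array}{c|c}
110\,000\,000 & 000\,000\,000 \\
011\,000\,000 & 000\,000\,000 \\
000\,110\,000 & 000\,000\,000 \\
000\,011\,000 & 000\,000\,000 \\
000\,000\,110 & 000\,000\,000 \\
000\,000\,011 & 000\,000\,000 \\
000\,000\,000 & 111\,111\,000 \\
000\,000\,000 & 000\,111\,111
\end{array}\right)$$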
(b) The seven-qubit code. This [[7, 1, 3]] code has six stabilizer generators,
which can be expressed as
$$\tilde H = \begin{pmatrix} H_{\rm ham} & 0 \\ 0 & H_{\rm ham} \end{pmatrix} , \tag{7.137}$$
$$\begin{aligned}
M_1 &= Z_1Z_3Z_5Z_7 \\
M_2 &= Z_2Z_3Z_6Z_7 \\
M_3 &= Z_4Z_5Z_6Z_7 ,
\end{aligned} \tag{7.138}$$
$$\begin{aligned}
M_4 &= X_1X_3X_5X_7 \\
M_5 &= X_2X_3X_6X_7 \\
M_6 &= X_4X_5X_6X_7 ,
\end{aligned} \tag{7.139}$$
(c) CSS codes. Recall that whenever an $[n, k, d]$ classical code $C$ contains its
dual code $C^\perp$, we can perform the CSS construction to obtain an
$[[n, 2k-n, d]]$ quantum code. The stabilizer of this code can be written
as
$$\tilde H = \begin{pmatrix} H & 0 \\ 0 & H \end{pmatrix} \tag{7.140}$$
where $w \in C^\perp$.
$$H_X H_Z^T = H_Z H_X^T = 0 . \tag{7.143}$$
But this is just the requirement that the dual $C_X^\perp$ of the code whose
parity check is $H_X$ be contained in the code $C_Z$ whose parity check is
$H_Z$. In other words, this QECC fits into the CSS framework, with
$$C_2 = C_X^\perp \subseteq C_1 = C_Z . \tag{7.144}$$
So we may characterize CSS codes as those and only those for which
the stabilizer has generators of the form eq. (7.142).
However there is a caveat. The code defined by eq. (7.142) will be non-
degenerate if errors are restricted to weight less than d = min(dZ , dX )
(where dZ is the distance of CZ , and dX the distance of CX ). But the
true distance of the QECC could exceed d. For example, the 9-qubit
code is in this generalized sense a CSS code. But in that case the
classical code CX is distance 1, reflecting that, e.g., Z 1 Z 2 is contained
in the stabilizer. Nevertheless, the distance of the CSS code is d = 3,
since no weight-2 Pauli operator lies in S ⊥ \ S.
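The orthogonality condition $H_X H_Z^T = 0$ can be checked numerically. The sketch below assumes the Hamming parity check rows read off from $M_1$, $M_2$, $M_3$ of the seven-qubit code, with the same matrix used for both the $X$-type and $Z$-type generators; the helper names are hypothetical.

```python
# Rows of the Hamming parity check matrix, read off from the Z-type generators.
H_HAM = [
    [1, 0, 1, 0, 1, 0, 1],  # Z1 Z3 Z5 Z7
    [0, 1, 1, 0, 0, 1, 1],  # Z2 Z3 Z6 Z7
    [0, 0, 0, 1, 1, 1, 1],  # Z4 Z5 Z6 Z7
]


def gram_mod2(h_x, h_z):
    """Return the matrix H_X H_Z^T with entries reduced mod 2."""
    return [[sum(a * b for a, b in zip(rx, rz)) % 2 for rz in h_z] for rx in h_x]


def is_css_pair(h_x, h_z):
    """True if every X-type generator commutes with every Z-type generator."""
    return all(entry == 0 for row in gram_mod2(h_x, h_z) for entry in row)
```

Calling `is_css_pair(H_HAM, H_HAM)` tests whether the six generators of the seven-qubit code mutually commute; the same function can be applied to the $Z$-type and $X$-type rows of the 9-qubit code.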
on the encoded qubit labeled by $i$. First, note that we can extend the $n-k$
stabilizer generators to a maximal set of $n$ commuting operators. The $k$
operators that we add to the set may be denoted $\bar Z_1, \ldots, \bar Z_k$. We can then
regard the simultaneous eigenstates of $\bar Z_1 \ldots \bar Z_k$ (in the code subspace $H_S$)
as the logical basis states $|\bar z_1, \ldots, \bar z_k\rangle$, with $\bar z_j = 0$ corresponding to $\bar Z_j = 1$
and $\bar z_j = 1$ corresponding to $\bar Z_j = -1$.
The remaining $k$ generators of the normalizer may be chosen to be mutually
commuting and to commute with the stabilizer, but then they will not
commute with any of the $\bar Z_i$’s. By invoking a Gram-Schmidt orthonormalization
procedure, we can choose these generators, denoted $\bar X_i$, to diagonalize
the symplectic form, so that
$$\bar Z_i \bar X_j = (-1)^{\delta_{ij}} \bar X_j \bar Z_i . \tag{7.145}$$
(a) The 9-qubit Code. As we have discussed previously, the logical operators
can be chosen to be
$$\bar Z = X_1X_2X_3 , \qquad \bar X = Z_1Z_4Z_7 . \tag{7.146}$$
$$\bar X = X_1X_2X_3 , \qquad \bar Z = Z_1Z_2Z_3 ; \tag{7.147}$$
M1 = XZZXI,
M2 = IXZZX,
M3 = XIXZZ,
M4 = ZXIXZ, (7.148)
$$\tilde H = \left(\begin{array}{c|c}
01100 & 10010 \\
00110 & 01001 \\
00011 & 10100 \\
10001 & 01010
\end{array}\right) \tag{7.149}$$
This matrix has a nice interpretation, as each of its columns can be regarded
as the syndrome of a single-qubit error. For example, the single-qubit bit flip
operator $X_j$ commutes with $M_i$ if $M_i$ has an $I$ or $X$ in position $j$, and
anti-commutes if $M_i$ has a $Z$ in position $j$. Thus the table
X1 X2 X3 X4 X5
M1 0 1 1 0 0
M2 0 0 1 1 0
M3 0 0 0 1 1
M4 1 0 0 0 1
lists the outcome of measuring $M_{1,2,3,4}$ in the event of a bit flip. (For example,
if the first bit flips, the measurement outcomes $M_1 = M_2 = M_3 = 1$, $M_4 = -1$
diagnose the error.) Similarly, the right half of $\tilde H$ can be regarded as the
syndrome table for the phase errors.
Z1 Z2 Z3 Z4 Z5
M1 1 0 0 1 0
M2 0 1 0 0 1
M3 1 0 1 0 0
M4 0 1 0 1 0
Since Y anti-commutes with both X and Z, we obtain the syndrome for the
error Y i by summing the ith columns of the X and Z tables:
Y1 Y2 Y3 Y4 Y5
M1 1 1 1 1 0
M2 0 1 1 1 1
M3 1 0 1 1 1
M4 1 1 0 1 1
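The three syndrome tables can be regenerated directly from the generators in eq. (7.148). This is a minimal sketch, assuming generators are given as Pauli strings and using the rule that two single-qubit Paulis anti-commute exactly when both are non-identity and different; the function names are illustrative.

```python
GENERATORS = ["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ"]  # M1..M4 of eq. (7.148)


def anticommutes(p, q):
    """Single-qubit Paulis anti-commute iff both are non-identity and different."""
    return p != "I" and q != "I" and p != q


def syndrome(error_type, position):
    """Syndrome bits (one per generator) for error X, Y or Z on a 1-based qubit position."""
    return tuple(1 if anticommutes(g[position - 1], error_type) else 0
                 for g in GENERATORS)


def syndrome_table(error_type):
    """Rows are labeled by M1..M4 and columns by qubits 1..5, as in the tables above."""
    columns = [syndrome(error_type, j) for j in range(1, 6)]
    return [[col[i] for col in columns] for i in range(len(GENERATORS))]


def all_single_qubit_syndromes():
    """Collect the 15 syndromes of all single-qubit X, Y, Z errors."""
    return [syndrome(e, j) for e in "XYZ" for j in range(1, 6)]
```

Since a one-error-correcting code needs every single-qubit error to have its own nonzero syndrome, comparing `len(set(all_single_qubit_syndromes()))` with 15 is a quick consistency check.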
M 3 M 4 = −Y XXY I, (7.150)
and its cyclic permutations. Evidently, all elements of the stabilizer are
weight-4 Pauli operators.
For our logical operators, we may choose
Z̄ = ZZZZZ,
X̄ = XXXXX; (7.152)
these commute with $M_{1,2,3,4}$, square to $I$, and anti-commute with one
another. Being weight 5, they are not themselves contained in the stabilizer.
Therefore if we don’t mind destroying the encoded state, we can determine
the value of $\bar Z$ for the encoded qubit by measuring $Z$ of each qubit and
evaluating the parity of the outcomes. In fact, since the code is distance three,
there are elements of $S^\perp \setminus S$ of weight three; alternate expressions for $\bar Z$ and
$\bar X$ can be obtained by multiplying by elements of the stabilizer. For example
we can choose
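(up to normalization)
$$\begin{aligned}
|\bar 0\rangle &= \sum_{M \in S} M\,|00000\rangle \\
&= |00000\rangle + (M_1 + \text{cyclic perms})\,|00000\rangle \\
&\quad + (M_3 M_4 + \text{cyclic perms})\,|00000\rangle + (M_2 M_5 + \text{cyclic perms})\,|00000\rangle \\
&= |00000\rangle + (|10010\rangle + \text{cyclic perms}) \\
&\quad - (|11110\rangle + \text{cyclic perms}) \\
&\quad - (|01100\rangle + \text{cyclic perms}) .
\end{aligned} \tag{7.156}$$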
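The sixteen terms of eq. (7.156) can be generated mechanically. The sketch below assumes the convention $Y = ZX$ and represents each stabilizer element as a sign times $Z(\alpha)X(\beta)$, multiplying with the rule of eq. (7.128); the representation and function names are illustrative choices, not the notes' own.

```python
from itertools import product

N = 5
GENERATORS = ["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ"]  # M1..M4 of eq. (7.148)


def to_zx(label):
    """Write a Y-free Pauli string as (sign, alpha, beta), meaning sign * Z(alpha) X(beta)."""
    alpha = tuple(int(c == "Z") for c in label)
    beta = tuple(int(c == "X") for c in label)
    return (1, alpha, beta)


def multiply(p, q):
    """Product p*q using (a|b)(a'|b') = (-1)^(a'.b) (a+a'|b+b')."""
    s1, a1, b1 = p
    s2, a2, b2 = q
    sign = s1 * s2 * (-1) ** sum(x * y for x, y in zip(a2, b1))
    alpha = tuple((x + y) % 2 for x, y in zip(a1, a2))
    beta = tuple((x + y) % 2 for x, y in zip(b1, b2))
    return (sign, alpha, beta)


def apply_to_zero(p):
    """sign * Z(alpha) X(beta) |00000> = sign * (-1)^(alpha.beta) |beta>."""
    sign, alpha, beta = p
    amplitude = sign * (-1) ** sum(x * y for x, y in zip(alpha, beta))
    return amplitude, "".join(map(str, beta))


def logical_zero():
    """Unnormalized |0bar>: sum over all 16 stabilizer elements M of M|00000>."""
    gens = [to_zx(g) for g in GENERATORS]
    amplitudes = {}
    for bits in product([0, 1], repeat=len(gens)):
        m = (1, (0,) * N, (0,) * N)  # start from the identity
        for use, g in zip(bits, gens):
            if use:
                m = multiply(m, g)
        amp, ket = apply_to_zero(m)
        amplitudes[ket] = amplitudes.get(ket, 0) + amp
    return amplitudes
```

The returned dictionary maps each basis string to its (unnormalized) amplitude, so `logical_zero()["11110"]` gives the sign of the $|11110\rangle$ term.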
We may then find $|\bar 1\rangle$ by applying $\bar X$ to $|\bar 0\rangle$, that is by flipping all 5 qubits:
– Figure –
The Hadamard rotations on the first and fourth qubits rotate $M_1$ to the
tensor product of $Z$’s $ZZZZI$, and the CNOT’s then imprint the value
of this operator on the ancilla. The final Hadamard rotations return the
encoded block to the standard code subspace. Circuits for measuring $M_{2,3,4}$
are obtained from the above by cyclically permuting the five qubits in the
code block.
What about encoding? We want to construct a unitary transformation
an initial state.
Since the generators are independent, each element of the stabilizer can be
expressed as a product of generators in a unique way, and we may therefore
rewrite the sum as
$$\sum_{M \in S} M = (I + M_4)(I + M_3)(I + M_2)(I + M_1) . \tag{7.160}$$
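$$\tilde H' = \left(\begin{array}{c|c}
11011 & 10001 \\
00110 & 01001 \\
11000 & 00101 \\
10111 & 00011
\end{array}\right) , \tag{7.161}$$
or
$$\begin{aligned}
M_1 &= YZIZY \\
M_2 &= IXZZX \\
M_3 &= ZZXIX \\
M_4 &= ZIZYY
\end{aligned} \tag{7.162}$$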
– Figure –
The Hadamard prepares $\frac{1}{\sqrt2}(|0\rangle + |1\rangle)$. If the first qubit is $|0\rangle$, the other
operations don’t do anything, so $I$ is applied. But if the first qubit is $|1\rangle$,
then $X$ has been applied to this qubit, and the other gates in the circuit apply
– Figure –
– Figure –
It then follows that m−1 erasures can be corrected, or that the other n−m+1
parties have all the information.
From these two observations we obtain the two inequalities
n − m < m ⇒ n < 2m ,
m − 1 < n − m + 1 ⇒ n > 2m − 2 . (7.166)
It follows that
n = 2m − 1 , (7.167)
in an ((m, n)) pure state quantum threshold scheme, where each party has
a single qubit. In other words, the threshold is reached as the number of
qubits in hand crosses over from the minority to the majority of all n qubits.
We see that if each share is a qubit, a quantum pure state threshold
scheme is a [[2m−1, k, m]] quantum code with k ≥ 1. But in fact the [[3, 1, 2]]
and [[7, 1, 4]] codes do not exist, and it follows from the Rains bound that the
m > 3 codes do not exist. In a sense, then, the [[5, 1, 3]] code is the unique
quantum threshold scheme.
There are a number of caveats — the restriction n = 2m − 1 continues to
apply if each share is a q-dimensional system rather than a qubit, but various
codes can be constructed for q > 2. (See the exercises for an example.)
Also, we might allow the shared information to be a mixed state (that
encodes a pure state). For example, if we discard one qubit of the five qubit
block, we have a ((3, 4)) scheme. Again, once we have three qubits, we can
correct two erasures, one arising because the fourth share is in the hands of
another party, the other arising because a qubit has been thrown away.
Finally, we have assumed that the shared information is quantum infor-
mation. But if we are only sharing classical information instead, then the
conditions for correcting erasures are less stringent. For example, a Bell pair
may be regarded as a kind of (2, 2) threshold scheme for two bits of classical
information, where the classical information is encoded by choosing one of
7.12. SOME OTHER STABILIZER CODES 53
where $|\bar 0\rangle$ and $|\bar 1\rangle$ are the $\bar Z$ eigenstates of the [[5, 1, 3]] code. You can verify
that this code has distance $d = 4$ (an exercise).
The [[6, 0, 4]] code is interesting because its code state is maximally
entangled. We may choose any three qubits from among the six. The density
matrix $\rho^{(3)}$ of those three, obtained by tracing over the other three, is totally
random, $\rho^{(3)} = \frac18 I$. In this sense, the [[6, 0, 4]] state is a natural
multiparticle analog of the two-qubit Bell states. It is far “more entangled” than the
six-qubit cat state $\frac{1}{\sqrt2}(|000000\rangle + |111111\rangle)$. If we measure any one of the six
qubits in the cat state, in the $\{|0\rangle, |1\rangle\}$ basis, we know everything about the
state we have prepared of the remaining five qubits. But we may measure
any observable we please acting on any three qubits in the [[6, 0, 4]] state, and
we learn nothing about the remaining three qubits, which are still described
by $\rho^{(3)} = \frac18 I$.
Our [[6, 0, 4]] state is all the more interesting in that it turns out (but is not
so simple to prove) that its generalizations to more qubits do not exist. That
is, there are no [[2n, 0, n + 1]] binary quantum codes for n > 3. You’ll see in
the exercises, though, that there are other, nonbinary, maximally entangled
states that can be constructed.
$$ZZ , \qquad XX . \tag{7.170}$$
The code has distance two because no weight-one Pauli operator commutes
with both generators (none of $X$, $Y$, $Z$ commute with both $X$ and $Z$).
Correspondingly, a bit flip ($X$) or a phase flip ($Z$), or both ($Y$) acting on either
qubit in $|\phi^+\rangle$, takes it to an orthogonal state (one of the other Bell states
$|\phi^-\rangle$, $|\psi^+\rangle$, $|\psi^-\rangle$).
One way to generalize the Bell states to more qubits is to consider the
$n = 4$, $k = 2$ code with stabilizer generators
$$ZZZZ , \qquad XXXX . \tag{7.171}$$
This is a distance $d = 2$ code for the same reason as before. The code
subspace is spanned by states of even parity ($ZZZZ$) that are invariant
under a simultaneous flip of all four qubits ($XXXX$). A basis is:
$$\begin{aligned}
& |0000\rangle + |1111\rangle , \\
& |0011\rangle + |1100\rangle , \\
& |0101\rangle + |1010\rangle , \\
& |0110\rangle + |1001\rangle .
\end{aligned} \tag{7.172}$$
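The two stabilizer conditions on these basis states can be checked term by term. This is a minimal sketch, assuming each basis state is stored as the pair of bit strings in its equal-weight superposition; the names are illustrative.

```python
# Each basis state of eq. (7.172) is an equal superposition of two bit strings.
BASIS = [("0000", "1111"), ("0011", "1100"), ("0101", "1010"), ("0110", "1001")]


def flip_all(bits):
    """Action of XXXX on a computational basis string."""
    return "".join("1" if b == "0" else "0" for b in bits)


def parity(bits):
    """ZZZZ has eigenvalue +1 on even-parity strings and -1 on odd-parity strings."""
    return bits.count("1") % 2


def is_stabilized(pair):
    """True if the superposition |x> + |y> is fixed by both ZZZZ and XXXX."""
    x, y = pair
    even = parity(x) == 0 and parity(y) == 0
    flip_invariant = {flip_all(x), flip_all(y)} == {x, y}
    return even and flip_invariant
```

Running `all(is_stabilized(pair) for pair in BASIS)` checks the four states; since there are eight even-parity strings of length four, pairing each with its complement gives exactly $2^k = 4$ basis states.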
(the length is required to be even so that the generators will commute). The
code subspace is spanned by our familiar friends the $2^{n-2}$ cat states
$$\frac{1}{\sqrt2}\,(|x\rangle + |\neg x\rangle) , \tag{7.174}$$
where $x$ is an even-weight string of length $n = 2m$.
can correct one error if: (1) the columns of H̃ are distinct (a distinct syndrome
for each X and Z error) and (2) each sum of a column of HZ with the
corresponding column of HX is distinct from each column of H̃ and distinct
from all other such sums (each Y error can be distinguished from all other
one-qubit errors).
We can readily construct a $5 \times 16$ matrix $\tilde H$ with this property, and so
derive the stabilizer of an [[8, 3, 3]] code; we choose
$$\tilde H = \left(\begin{array}{c|c}
H & H_\sigma \\
11111111 & 00000000 \\
00000000 & 11111111
\end{array}\right) . \tag{7.176}$$
0 0 0 1 1 1 1 0
whose columns are all the distinct binary strings of length 3, and H σ is ob-
tained from H by performing a suitable permutation of the columns. This
1 1 0 0 1 1 0 0
1 1 0 1 0 0 1 0
The last two rows of $\tilde H$ serve to distinguish each $X$ syndrome from each $Y$
syndrome or $Z$ syndrome, and the above mentioned property of $H_\sigma$ ensures
that all $Y$ syndromes are distinct. Therefore, we have constructed a length-8
code with $k = 8 - 5 = 3$ that can correct one error. It is actually the simplest
in an infinite class of $[[2^m, 2^m - m - 2, 3]]$ codes constructed by Gottesman,
with $m \ge 3$.
The [[8, 3, 3]] quantum code that we have just described is a close cousin
of the “extended Hamming code,” the self-dual [8,4,4] classical code that
is obtained from the [7,3,4] dual of the Hamming code by adding an extra
parity bit. Its parity check matrix (which is also its generator matrix) is
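$$H_{EH} = \begin{pmatrix}
1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 1 & 0 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 & 0 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1
\end{pmatrix} \tag{7.180}$$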
This matrix HEH has the property that, not only are its eight columns dis-
tinct, but also each sum of two columns is distinct from all columns; since
the sum of two columns has 0, not 1, as its fourth bit.
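Both column properties, and the self-duality claim, can be confirmed by brute force. The sketch below assumes the matrix of eq. (7.180) entered row by row; the helper names are hypothetical.

```python
from itertools import combinations

H_EH = [
    [1, 0, 1, 0, 1, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]


def columns(matrix):
    """Return the columns of a binary matrix as tuples."""
    return [tuple(row[j] for row in matrix) for j in range(len(matrix[0]))]


def columns_distinct(matrix):
    """True if no two columns coincide."""
    cols = columns(matrix)
    return len(set(cols)) == len(cols)


def pair_sums_avoid_columns(matrix):
    """True if the mod-2 sum of any two columns is never itself a column."""
    cols = columns(matrix)
    col_set = set(cols)
    return all(tuple((a + b) % 2 for a, b in zip(u, v)) not in col_set
               for u, v in combinations(cols, 2))


def rows_mutually_orthogonal(matrix):
    """True if every pair of rows (including a row with itself) has even overlap."""
    return all(sum(a * b for a, b in zip(r, s)) % 2 == 0
               for r in matrix for s in matrix)
```

The last function reflects the statement that the parity check matrix is also a generator matrix: every row must be orthogonal to every row, itself included.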
In fact, there is. Our suspicion that the [[5, 1, 3]] code might exist was
aroused by the observation that its parameters saturate the quantum
sphere-packing inequality for $t = 1$ codes:
$$1 + 3n = 2^{n-k} , \tag{7.181}$$
In fact, the perfect binary Hamming codes that saturate this bound for $q = 2$
with parameters
$$n = 2^m - 1 , \qquad k = n - m , \tag{7.183}$$
The field $GF(4)$ has four elements that may be denoted $0, 1, \omega, \bar\omega$, where
$$1 + 1 = \omega + \omega = \bar\omega + \bar\omega = 0 , \qquad 1 + \omega = \bar\omega , \tag{7.185}$$
$$\alpha\cdot\beta' + \alpha'\cdot\beta . \tag{7.187}$$
$$v * u = a + b\omega \in GF(4) , \tag{7.189}$$
3 Calderbank, Rains, Shor, and Sloane, “Quantum error correction via codes over
GF(4),” quant-ph/9608006.
7.13. CODES OVER GF (4) 59
which is M 3.
The [[5, 1, 3]] code is just one example of a quite general construction.
Consider a subcode $C$ of $GF(4)^n$ that is additive (closed under addition),
and self-orthogonal (contained in its dual) with respect to the symplectic
inner product. This $GF(4)$ code can be identified with the stabilizer of a
binary QECC with length $n$. If the $GF(4)$ code contains $2^{n-k}$ codewords,
then the QECC has $k$ encoded qubits. The distance $d$ of the QECC is the
minimum weight of a vector in $C^\perp \setminus C$.
Another example of a self-orthogonal linear $GF(4)$ code is the dual of the
$m = 3$ Hamming code with
$$n = \frac13 (4^3 - 1) = 21 . \tag{7.196}$$
The Hamming code has $4^{n-m}$ codewords, and its dual has $4^m = 2^6$ codewords.
We immediately obtain a QECC with parameters
and the outer code is the three-qubit “phase code” with stabilizer generators
where Ī = III and X̄ = XXX. You will recognize these as the eight
stabilizer generators of Shor’s code that we have described earlier. In this
case, the inner and outer codes both have distance 1 (e.g., $ZII$ commutes
with the stabilizer of the inner code), yet the concatenated code has distance
$3 > d_1 d_2 = 1$. This happens because the code has been cleverly constructed
so that the weight 1 and 2 encoded operations of the inner code do not
commute with the stabilizer of the outer code. (It would have been different
if we had concatenated the repetition code with itself rather than with the
phase code!)
7.15. SOME CODES THAT CORRECT MULTIPLE ERRORS 63
$$n = n_1 n_2 \ldots n_L , \tag{7.204}$$
and distance
$$d \ge d_1 d_2 \ldots d_L . \tag{7.205}$$
code in the case where each of the five qubits is independently subjected to
the depolarizing channel with error probability $p$ (that is, $X$, $Y$, $Z$ errors each
occur with probability $p/3$). Recovery is sure to succeed if fewer than two
errors occur in the block. Therefore, as in §7.4.2, we can bound the failure
probability by
$$p_{\rm fail} \equiv p^{(1)} \le \binom{5}{2} p^2 = 10 p^2 . \tag{7.208}$$
Now consider the performance of the concatenated [[25, 1, 9]] code. To
keep life easy, we will perform recovery in a simple (but nonoptimal) way:
First we perform recovery on each of the five subblocks, measuring $M_{1,2,3,4}$
to obtain an error syndrome for each subblock. After correcting the
subblocks, we then measure the stabilizer generators $\bar M_{1,2,3,4}$ of the outer code,
to obtain its syndrome, and apply an encoded $\bar X$, $\bar Y$, or $\bar Z$ to one of the
subblocks if the syndrome reveals an error.
For the outer code, recovery will succeed if at most one of the subblocks
is damaged, and the probability $p^{(1)}$ of damage to a subblock is bounded as
in eq. (7.208); we conclude that the probability of a botched recovery for the
[[25, 1, 9]] code is bounded above by
Our recovery procedure is clearly not the best possible, because four errors
can induce failure if there are two each in two different subblocks. Since the
code has distance nine, there is a better procedure that would always recover
successfully from four errors, so that $p^{(2)}$ would be of order $p^5$ rather than
$p^4$. Still, the suboptimal procedure has the advantage that it is very easily
generalized (and analyzed) if there are many levels of concatenation.
Indeed, if there are $L$ levels of concatenation, we begin recovery at the
innermost level and work our way up. Solving the recursion
We may write
$$p^{(L)} \le p_o \left(\frac{p}{p_o}\right)^{2^L} , \tag{7.212}$$
where $p_o = \frac{1}{10}$ is an estimate of the threshold error probability that can be
tolerated (we will obtain better codes and better estimates of this threshold
below). Note that to obtain
In principle, the concatenated code at a high level could fail with many
fewer than n/10 errors, but these would have to be distributed in a highly
conspiratorial fashion that is quite unlikely for n large.
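The doubly exponential suppression in eq. (7.212) can be reproduced by iterating the one-level bound. The sketch below assumes the recursion takes the form $p^{(\ell)} \le 10\,[p^{(\ell-1)}]^2$ with $p^{(0)} = p$, which is what eq. (7.208) suggests when applied level by level; the function names are illustrative.

```python
import math


def concatenated_bound(p, levels):
    """Iterate the assumed recursion p_l = 10 * p_{l-1}**2 starting from p_0 = p."""
    bound = p
    for _ in range(levels):
        bound = 10 * bound ** 2
    return bound


def closed_form(p, levels, p_o=0.1):
    """Right-hand side of eq. (7.212): p_o * (p / p_o) ** (2 ** levels)."""
    return p_o * (p / p_o) ** (2 ** levels)
```

For example, `concatenated_bound(0.01, 3)` and `closed_form(0.01, 3)` should agree up to rounding, while starting above the threshold estimate, say at `p = 0.2`, makes the bound grow with each level instead of shrinking.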
The concatenated encoding of an unknown quantum state can be carried
out level by level. For example, to encode $a|0\rangle + b|1\rangle$ in the [[25, 1, 9]] block,
we could first prepare the state $a|\bar 0\rangle + b|\bar 1\rangle$ in the five qubit block, using the
encoding circuit described earlier, and also prepare four five-qubit blocks in
the state $|\bar 0\rangle$. The state $a|\bar 0\rangle + b|\bar 1\rangle$ can be encoded at the next level by executing the
encoding circuit yet again, but this time with all gates replaced by encoded
gates acting on five-qubit blocks. We will see in the next chapter how these
encoded gates are constructed.
There are $2^{2^m}$ such functions forming what we may regard as a binary vector
space of dimension $2^m$. It will be useful to have a basis for this space. Recall
(§6.1) that any Boolean function has a disjunctive normal form. Since the
NOT of a bit $x$ is $1 - x$, and the OR of two bits $x$ and $y$ can be expressed as
$$x \vee y = x + y - xy , \tag{7.216}$$
$$1,\ x_i,\ x_i x_j,\ x_i x_j x_k,\ \ldots , \tag{7.217}$$
4 See, e.g., MacWilliams and Sloane, Chapter 13.
1 = (11111111)
x0 = (10101010)
x1 = (11001100)
x2 = (11110000)
x0 x1 = (10001000)
x0 x2 = (10100000)
x1 x2 = (11000000)
x0 x1 x2 = (10000000) . (7.218)
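The eight basis vectors of eq. (7.218) are the value tables of the monomials on all 3-bit inputs. The sketch below assumes an ordering inferred from the listed strings: inputs run from $2^m - 1$ down to 0, with $x_0$ the least significant bit. The function names are illustrative.

```python
from itertools import combinations


def monomial_vector(variables, m=3):
    """Evaluate the product of x_i (i in variables) on inputs 2^m - 1, ..., 1, 0."""
    bits = []
    for index in range(2 ** m - 1, -1, -1):
        # The monomial is 1 exactly when every listed variable is 1 in this input.
        value = all((index >> i) & 1 for i in variables)
        bits.append("1" if value else "0")
    return "".join(bits)


def reed_muller_basis(m=3):
    """Map each monomial name ('1', 'x0', 'x0 x1', ...) to its length-2^m value string."""
    basis = {}
    for degree in range(m + 1):
        for subset in combinations(range(m), degree):
            name = " ".join(f"x{i}" for i in subset) or "1"
            basis[name] = monomial_vector(subset, m)
    return basis
```

Keeping only the monomials of degree at most $r$ gives a generating set for the order-$r$ code; for instance, the entries of `reed_muller_basis(3)` with degree at most one span an 8-bit code of dimension four.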
• $R(m-1, m)$ is the dual of the repetition code, the space of all length-$2^m$
even-weight strings.
It is because of this nice duality property that Reed–Muller codes are well-
suited for the CSS construction of quantum codes.
In particular, the Reed–Muller code is weakly self-dual for $r \le m - r - 1$,
or $2r \le m - 1$, and self-dual for $2r = m - 1$. In the self-dual case, the
distance is
$$d = 2^{m-r} = 2^{\frac12 (m+1)} = \sqrt{2n} , \tag{7.226}$$
(The [8, 4, 4] code is the extended Hamming code as we have already noted.)
Associated with these self-dual codes are the k = 0 quantum codes with
parameters
and so forth.
One way to obtain a $k = 1$ quantum code is to puncture the self-dual
Reed–Muller code, that is, to delete one of the $n = 2^m$ bits from the code.
(It turns out not to matter which bit we delete.) The classical punctured
code has parameters $n = 2^m - 1$, $d = 2^{\frac12 (m+1)} - 1 = \sqrt{2(n+1)} - 1$, and
$k = \frac12 (n+1)$. Furthermore, the dual of the punctured code is its even
subcode. (The even subcode consists of those RM codewords for which the
bit removed by the puncture is zero, and it follows from the self-duality of
the RM code that these are orthogonal to all the words (both odd and even
weight) of the punctured code.) From these punctured codes, we obtain, via
the CSS construction, $k = 1$ quantum codes with parameters
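$$\begin{aligned}
n &= 2^m = 64 \\
d &= 2^{m-r} = 8 \\
k &= 1 + 6 + \binom{6}{2} + \binom{6}{3} = 1 + 6 + 15 + 20 = 42 ,
\end{aligned} \tag{7.231}$$
$$\begin{aligned}
n &= 2^m = 64 \\
d &= 2^{m-r} = 16 \\
k &= 1 + 6 + \binom{6}{2} = 1 + 6 + 15 = 22 ,
\end{aligned} \tag{7.232}$$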
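The parameter counts in eqs. (7.231) and (7.232) follow from counting monomials of degree at most $r$. This is a minimal sketch of that arithmetic; the function name is a hypothetical helper.

```python
from math import comb


def reed_muller_parameters(r, m):
    """Return (n, k, d) for the order-r Reed-Muller code on m variables."""
    n = 2 ** m                                   # one bit per input string
    k = sum(comb(m, j) for j in range(r + 1))    # monomials of degree <= r
    d = 2 ** (m - r)                             # minimum distance
    return n, k, d
```

Here `reed_muller_parameters(3, 6)` and `reed_muller_parameters(2, 6)` correspond to the two cases above, and `reed_muller_parameters(1, 3)` gives the parameters of the extended Hamming code.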
Many other weakly self-dual codes are known and can likewise be employed.
In fact, perfect codes that correct more than one error are extremely rare.
It can be shown$^5$ that the only perfect codes (linear or nonlinear) over any
finite field that can correct more than one error are the binary [23, 12, 7] code and
one other code discovered by Golay, a ternary code with parameters [11, 6, 5].
The [24, 12, 8] Golay code has a very intricate symmetry. The symmetry
is characterized by its automorphism group — the group of permutations of
the 24 bits that take codewords to codewords. This is the Mathieu group
M24, a sporadic simple group of order 244,823,040 that was discovered in the
19th century.
The $2^{12} = 4096$ codewords have the weight distribution (in an obvious
notation)
and it arises for this combinatorial reason: with each weight-8 codeword we
associate the eight-element set (“octad”) where the codeword has its support.
Each 5-element subset of the 24 bits is contained in exactly one octad (a
reflection of the code’s large symmetry).
What makes the Golay code important in mathematics? Its discovery
in 1949 set in motion a sequence of events that led, by around 1980, to a
complete classification of the finite simple groups. This classification is one
of the greatest achievements of 20th century mathematics.
(A group is simple if it contains no nontrivial normal subgroup. The finite
simple groups may be regarded as the building blocks of all finite groups in
5 MacWilliams and Sloane §6.10.
the sense that for any finite group $G$ there is a unique decomposition of the
form
$$G \equiv G_0 \supseteq G_1 \supseteq G_2 \supseteq \ldots \supseteq G_n , \tag{7.237}$$
Then $\vec x \in \Lambda$ if
(ii) The $x_{j2}$’s are an even parity 24-bit string if the $x_{j0}$’s are 0, and an odd
parity 24-bit string if the $x_{j0}$’s are 1.
(iii) The $x_{j1}$’s are a 24-bit string contained in the Golay code.
When these rules are applied, a negative number is represented by its binary
complement, e.g.
−1 = . . . 11111 ,
−2 = . . . 11110 ,
−3 = . . . 11101 ,
etc. (7.239)
We can easily check that $\Lambda$ is a lattice; that is, it is closed under addition.
(Bits other than the last three in the binary expansion of the $x_j$’s are
unrestricted.)
We can now count the number of nearest neighbors to the origin (or
the number of spheres that touch any given sphere). These points are all
$$\begin{aligned}
(\pm 2)^8 &: \ 2^7 \cdot 759 \\
(\pm 3)(\mp 1)^{23} &: \ 2^{12} \cdot 24 \\
(\pm 4)^2 &: \ 2^2 \cdot \binom{24}{2} .
\end{aligned} \tag{7.240}$$
That is, there are $759 \cdot 2^7$ neighbors that have eight components with the
values $\pm 2$: their support is on one of the 759 weight-8 Golay codewords,
and the number of $-$ signs must be even. There are $2^{12} \cdot 24$ neighbors that
have one component with value $\pm 3$ (this component can be chosen in 24
ways) and the remaining 23 components have the value $\mp 1$. If, say, $+3$ is
chosen, then the position of the $+3$, together with the position of the $-1$’s,
can be any of the $2^{11}$ Golay codewords with value 1 at the position of the
$+3$. There are $2^2 \cdot \binom{24}{2}$ neighbors with two components each taking the value
$\pm 4$ (the signs are unrestricted). Altogether, the coordination number of the
lattice is 196,560.
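The coordination number is the sum of the three counts in eq. (7.240). The following sketch just carries out that arithmetic; the function and variable names are illustrative.

```python
from math import comb


def leech_neighbor_counts():
    """Counts of nearest neighbors of the origin, by the three types in eq. (7.240)."""
    eight_twos = 2 ** 7 * 759            # support on an octad, even number of minus signs
    one_three = 2 ** 12 * 24             # one component +-3, the other 23 components -+1
    two_fours = 2 ** 2 * comb(24, 2)     # two components +-4, signs unrestricted
    return eight_twos, one_three, two_fours


def coordination_number():
    """Total number of spheres touching a given sphere."""
    return sum(leech_neighbor_counts())
```

Calling `coordination_number()` adds the three contributions and can be compared with the total quoted in the text.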
The Leech lattice has an extraordinary automorphism group discovered
by Conway in 1968. This is the finite subgroup of the 24-dimensional rotation
group SO(24) that preserves the lattice. The order of this finite group (known
as ·0, or “dot oh”) is
$$2^{22} \cdot 3^9 \cdot 5^4 \cdot 7^2 \cdot 11 \cdot 13 \cdot 23 = 8,315,553,613,086,720,000 \simeq 8.3 \times 10^{18} . \tag{7.241}$$
If its two element center is modded out, the sporadic simple group ·1 is
obtained. At the time of its discovery, ·1 was the largest of the sporadic
simple groups that had been constructed.
The Leech lattice and its automorphism group eventually (by a route
that won’t be explained here) led Griess in 1982 to the construction of the
most amazing sporadic simple group of all (whose existence had been inferred
earlier by Fischer and Griess). It is a finite subgroup of the rotation group in
196,883 dimensions, whose order is approximately $8.08 \times 10^{53}$. This behemoth,
known as $F_1$, has earned the nickname “the monster” (though Griess prefers
to call it “the friendly giant”). It is the largest of the sporadic simple groups,
and the last to be discovered.
Thus the classification of the finite simple groups owes much to (classical)
coding theory, and to the Golay code in particular. Perhaps the theory of
$$H^{(n)} = H \otimes \ldots \otimes H . \tag{7.242}$$
We would like to select a code subspace $H^{(n)}_{\rm code}$ of $H^{(n)}$ such that quantum
information residing in $H^{(n)}_{\rm code}$ can be subjected to the superoperator
$$\$^{(n)} = \$ \otimes \ldots \otimes \$ , \tag{7.243}$$
– Figure –
We observe that the erasure channel can be realized if Alice sends a qubit
to Bob, and a third party Charlie decides at random to either steal the
qubit (with probability p) or allow the qubit to pass unscathed to Bob (with
probability 1 − p).
If Alice sends a large number $n$ of qubits, then about $(1-p)n$ reach Bob,
and $pn$ are intercepted by Charlie. Hence for $p > \frac12$, Charlie winds up in
possession of more qubits than Bob, and if Bob can recover the quantum
information encoded by Alice, then certainly Charlie can as well. Therefore,
if $Q(p) > 0$ for $p > \frac12$, Bob and Charlie can clone the unknown encoded
quantum states sent by Alice, which is impossible. (Strictly speaking, they
can clone with fidelity $F = 1 - \varepsilon$, for any $\varepsilon > 0$.) We conclude that $Q(p) = 0$
for $p > \frac12$.
7.16. THE QUANTUM CHANNEL CAPACITY 77
$$r \le m + Qn . \tag{7.247}$$
We derive this inequality by noting that Alice and Bob can simulate the $m$
qubits sent over the perfect channel by sending $m/Q$ qubits over the noisy channel
and so achieve a rate
$$R = \frac{r}{m/Q + n} = \left(\frac{r}{m + Qn}\right) Q , \tag{7.248}$$
over the noisy channel. Were $r$ to exceed $m + Qn$, this rate $R$ would exceed
the capacity, a contradiction. Therefore eq. (7.247) is satisfied.
Now consider the erasure channel with error probability $p_1$, and suppose
$Q(p_1) > 0$. Then we can bound $Q(p_2)$ for $p_2 \le p_1$ by
$$Q(p_2) \le 1 - \frac{p_2}{p_1} + \frac{p_2}{p_1}\, Q(p_1) . \tag{7.249}$$
(In other words, if we plot $Q(p)$ in the $(p, Q)$ plane, and we draw a straight line
segment from any point $(p_1, Q_1)$ on the plot to the point $(p = 0, Q = 1)$, then
the curve $Q(p)$ must lie on or below the segment in the interval $0 \le p \le p_1$; if
$Q(p)$ is twice differentiable, then its second derivative cannot be positive.) To
obtain this bound, imagine that Alice sends $n$ qubits to Bob, knowing ahead
of time that $n(1 - p_2/p_1)$ specified qubits will arrive safely. The remaining
$n(p_2/p_1)$ qubits are erased with probability $p_1$. Therefore, Alice and Bob are
using both a perfect channel and an erasure channel with erasure probability
$p_1$; eq. (7.247) holds, and the rate $R$ they can attain is bounded by
$$R \le 1 - \frac{p_2}{p_1} + \frac{p_2}{p_1}\, Q(p_1) . \tag{7.250}$$
On the other hand, for n large, altogether about np2 qubits are erased, and
(1 − p2 )n arrive safely. Thus Alice and Bob have an erasure channel with
erasure probability p2 , except that they have the additional advantage of
knowing ahead of time that some of the qubits that Alice sends are invul-
nerable to erasure. With this information, they can be no worse off than
without it; eq. (7.249) then follows. The same bound applies to the depolar-
izing channel as well.
Now, the result $Q(p) = 0$ for $p > 1/2$ can be combined with eq. (7.249).
We conclude that the curve $Q(p)$ must be on or below the straight line
connecting the points $(p = 0, Q = 1)$ and $(p = 1/2, Q = 0)$, or
$$Q(p) \le 1 - 2p , \qquad 0 \le p \le \frac12 . \tag{7.251}$$
In fact, there are stabilizer codes that actually attain the rate $1 - 2p$ for
$0 \le p \le 1/2$. We can see this by borrowing an idea from Claude Shannon,
and averaging over random stabilizer codes. Imagine choosing, in succession,
altogether $n - k$ stabilizer generators. Each is selected from among the
$4^n$ Pauli operators, where all have equal a priori probability, except that
each generator is required to commute with all generators chosen in previous
rounds.
Now Alice uses this stabilizer code to encode an arbitrary quantum state
in the $2^k$-dimensional code subspace, and sends the $n$ qubits to Bob over an
erasure channel with erasure probability $p$. Will Bob be able to recover the
state sent by Alice?
Bob replaces each erased qubit by a qubit in the state $|0\rangle$, and then
proceeds to measure all $n - k$ stabilizer generators. From this syndrome
measurement, he hopes to infer the Pauli operator $E$ acting on the replaced
qubits. Once $E$ is known, he can apply $E^\dagger$ to recover a perfect duplicate
of the state sent by Alice. For $n$ large, the number of qubits that Bob must
replace is about $pn$, and he will recover successfully if there is a unique Pauli
operator $E$ that can produce the syndrome that he finds. If more than one
Pauli operator acting on the replaced qubits has this same syndrome, then
recovery may fail.
How likely is failure? Since there are about $pn$ replaced qubits, there are
about $4^{pn}$ Pauli operators with support on these qubits. Furthermore, for any
particular Pauli operator $E$, a random stabilizer code generates a random
syndrome: each stabilizer generator has probability 1/2 of commuting with
$E$, and probability 1/2 of anti-commuting with $E$. Therefore, the probability
that two Pauli operators have the same syndrome is $(1/2)^{n-k}$.
There is at least one particular Pauli operator acting on the replaced
qubits that has the syndrome found by Bob. But the probability that another
Pauli operator has this same syndrome (and hence the probability of
a recovery failure) is no worse than
$$P_{\rm fail} \le 4^{pn} \left(\frac12\right)^{n-k} = 2^{-n(1 - 2p - R)} , \tag{7.252}$$
where $R = k/n$ is the rate. Eq. (7.252) bounds the failure probability if
we average over all stabilizer codes with rate $R$; it follows that at least one
particular stabilizer code must exist whose failure probability also satisfies
the bound.
For that particular code, $P_{\rm fail}$ gets arbitrarily small as $n \to \infty$, for any rate
$R = 1 - 2p - \delta$ strictly less than $1 - 2p$. Therefore $R = 1 - 2p$ is asymptotically
attainable; combining this result with the inequality eq. (7.251) we obtain
the capacity of the quantum erasure channel:
$$Q(p) = 1 - 2p , \qquad 0 \le p \le \frac12 . \tag{7.253}$$
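The exponent in eq. (7.252) can be checked numerically by working with base-2 logarithms, which avoids overflow for large $n$. This is a sketch of the arithmetic only; the function names are illustrative, and the sample parameters in the usage note are arbitrary choices, not values from the notes.

```python
import math


def log2_failure_bound(n, k, p):
    """log2 of 4**(p*n) * (1/2)**(n - k), the union-bound estimate of P_fail."""
    return 2 * p * n - (n - k)


def log2_closed_form(n, k, p):
    """log2 of 2**(-n * (1 - 2p - R)) with rate R = k / n."""
    rate = k / n
    return -n * (1 - 2 * p - rate)


def bound_is_useful(n, k, p):
    """The bound is below 1 (negative log) exactly when R < 1 - 2p."""
    return log2_failure_bound(n, k, p) < 0
```

For instance, with $n = 1000$, $k = 300$, $p = 0.25$ the rate $R = 0.3$ lies below $1 - 2p = 0.5$, so both expressions give a large negative exponent; raising $k$ to 600 puts the rate above $1 - 2p$ and the bound becomes vacuous.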
If we wanted assurance that a distinct syndrome could be assigned to
all ways of damaging pn erased qubits, then we would require an [[n, k, d]]
quantum code with distance d > pn. Our Gilbert–Varshamov bound of §7.14
guarantees the existence of such a code for
This rate can be achieved by a code that recovers from any of the possible
ways of erasing up to pn qubits. It lies strictly below the capacity for p > 0,
because to achieve high average fidelity, it suffices to be able to correct the
typical erasures, rather than all possible erasures.
each stolen qubit with a maximally mixed qubit. For q > 1/2, Charlie steals
more than half the qubits and is in a better position than Bob to decode the
state sent by Alice. Therefore, to disallow cloning, the rate at which quan-
tum information is sent from Alice to Bob must be strictly zero for q > 1/2
or p > 3/8:
$$Q(p) = 0, \qquad p > \frac{3}{8}. \tag{7.255}$$
In fact we can obtain a stronger bound by noting that Charlie can choose
a better eavesdropping strategy – he can employ the optimal approximate
cloner that you studied in a homework problem. This device, applied to
each qubit sent by Alice, replaces it by two qubits that each approximate the
original with fidelity F = 5/6, or
$$|\psi\rangle\langle\psi| \to \left[(1-q)\,|\psi\rangle\langle\psi| + q\,\tfrac{1}{2}\mathbf{1}\right]^{\otimes 2}, \tag{7.256}$$
where F = 5/6 = 1 − q/2. By operating the cloner, both Charlie and
Bob can receive Alice’s state transmitted through the q = 1/3 depolarizing
channel. Therefore, the attainable rate must vanish; otherwise, by combin-
ing the approximate cloner with quantum error correction, Bob and Charlie
would be able to clone Alice’s unknown state exactly. We conclude that the
capacity vanishes for q > 1/3 or p > 1/4:
$$Q(p) = 0, \qquad p > \frac{1}{4}. \tag{7.257}$$
Invoking the bound eq. (7.249) we infer that
$$Q(p) \le 1 - 4p, \qquad 0 \le p \le \frac{1}{4}. \tag{7.258}$$
This result actually coincides with our bound on the rate of [[n, k, d]] codes
with k ≥ 1 and d ≥ 2pn + 1 found in §7.8. A bound on the capacity is not
the same thing as a bound on the allowable error probability for an [[n, k, d]]
code (and in the latter case the Rains bound is tighter). Still, the similarity
of the two results may not be a complete surprise, as both bounds are
derived from the no-cloning theorem.
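The arithmetic behind the no-cloning bounds can be checked with exact fractions. The sketch below assumes the relation p = 3q/4 between the depolarizing parameters, which is inferred from the paired thresholds quoted above (q > 1/2 with p > 3/8, and q > 1/3 with p > 1/4); the function names are illustrative only.

```python
from fractions import Fraction


def p_from_q(q):
    """Error probability p when the qubit is replaced by a maximally mixed
    one with probability q (assumed relation p = 3q/4)."""
    return Fraction(3, 4) * q


def clone_fidelity(q):
    """Fidelity of (1-q)|psi><psi| + q*1/2 with |psi>, namely 1 - q/2."""
    return 1 - q / 2


def capacity_upper_bound(p):
    """Upper bound of eqs. (7.257)-(7.258): max(0, 1 - 4p)."""
    return max(Fraction(0), 1 - 4 * p)


if __name__ == "__main__":
    q = Fraction(1, 3)
    print(clone_fidelity(q), p_from_q(q), capacity_upper_bound(p_from_q(q)))
```

With q = 1/3 the fidelity is 5/6 and p = 1/4, the point at which the bound 1 − 4p reaches zero.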
We can obtain a lower bound on the capacity by estimating the rate that
can be attained through random stabilizer coding, as we did for the erasure
7.16. THE QUANTUM CHANNEL CAPACITY 81
for any δ, ε > 0 and n sufficiently large. Bob’s attempt at recovery can fail if
another among these typical Pauli operators has the same syndrome as the
actual error operator. Since a random code assigns a random (n − k)-bit
syndrome to each Pauli operator, the failure probability can be bounded as
Here the second term bounds the probability of an atypical error, and the
first bounds the probability of an ambiguous syndrome in the case of a typical
error. We see that the failure probability, averaged over random stabilizer
codes, becomes arbitrarily small for large n, for any δ′ > 0 and rate R such
that
$$R \equiv \frac{k}{n} < 1 - H_2(p) - p\log_2 3 - \delta'. \tag{7.261}$$
If the failure probability, averaged over codes, is small, there is a particu-
lar code with small failure probability, and we conclude that the rate R is
attainable; the capacity of the depolarizing channel is bounded below as
$$Q(p) \ge 1 - H_2(p) - p\log_2 3. \tag{7.262}$$
Not coincidentally, the rate attainable by random coding agrees with the
asymptotic form of the quantum Hamming upper bound on the rate of nonde-
generate [[n, k, d]] codes with d > 2pn; we arrive at both results by assigning
a distinct syndrome to each of the typical errors. Of course, the Gilbert–
Varshamov lower bound on the rate of [[n, k, d]] codes lies below Q(p), as it
is obtained by demanding that the code can correct all the errors of weight
pn or less, not just the typical ones.
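To see where random coding stops giving a positive rate, one can locate the zero of 1 − H2(p) − p log2 3 numerically. The sketch below uses plain bisection; the function names and the tolerance are assumptions made for illustration.

```python
import math


def h2(p):
    """Binary Shannon entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def random_coding_rate(p):
    """Rate 1 - H2(p) - p log2(3) attainable by random stabilizer coding."""
    return 1 - h2(p) - p * math.log2(3)


def zero_crossing(lo=0.0, hi=0.5, tol=1e-12):
    """Bisection: the rate is positive at lo, negative at hi, and
    decreasing in between."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if random_coding_rate(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print(zero_crossing())
```

The rate crosses zero near p ≈ 0.1893, so random coding alone gives no positive rate for noisier depolarizing channels.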
This random coding argument can also be applied to a somewhat more
general channel, in which X, Y , and Z errors occur at different rates. (We’ll
82 CHAPTER 7. QUANTUM ERROR CORRECTION
if the rate R satisfies R < 1 − H, then again it is highly unlikely that a single
syndrome of a random stabilizer code will point to more than one typical
error operator.
code is a degenerate code with a relatively small block size. Their idea is
that the degeneracy of the inner code will allow enough typical errors to act
trivially in the code space that a higher rate can be attained than through
random coding alone.
To investigate this scheme, imagine that encoding and decoding are each
performed in two stages. In the first stage, using the (random) outer code
that she and Bob have agreed on, Alice encodes the state that she has selected
in a large n-qubit block. In the second stage, Alice encodes each of these
n qubits in a block of m qubits, using the inner code. Similarly, when Bob
receives the nm qubits, he first decodes each inner block of m, and then
subsequently decodes the block of n.
We can evidently describe this procedure in an alternative language —
Alice and Bob are using just the outer code, but the qubits are being trans-
mitted through a composite channel.
– Figure –
This modified channel consists (as shown) of: first the inner encoder, then
propagation through the original noisy channel, and finally inner decoding
and inner recovery. The rate that can be attained through the original chan-
nel, via concatenated coding, is the same as the rate that can be attained
through the modified channel, via random coding.
Specifically, suppose that the inner code is an m-qubit repetition code,
with stabilizer
$$Z_1 Z_2,\; Z_1 Z_3,\; Z_1 Z_4,\; \ldots,\; Z_1 Z_m. \tag{7.266}$$
– Figure –
where $ denotes the original noisy channel. (We have also suppressed the
final recovery step of the decoding; e.g., if the measured qubits both read
1, we should flip the data qubit. In fact, to simplify the analysis of the
composite channel, we will dispense with this step.)
Recalling that a CNOT propagates bit flips forward (from control
to target) and phase flips backward (from target to control), we see that for
each possible measurement outcome of the auxiliary qubits, the composite
channel is a Pauli channel. If we imagine that this measurement of the m − 1
inner block qubits is performed for each of the n qubits of the outer block,
then Pauli channels act independently on each of the n qubits, but the channels
acting on different qubits have different parameters (error probabilities
$p_I^{(i)}, p_X^{(i)}, p_Y^{(i)}, p_Z^{(i)}$ for the ith qubit). Now the number of typical error operators
acting on the n qubits is
$$2^{\sum_{i=1}^{n} H_i}, \tag{7.267}$$
where
$$H_i = H\left(p_I^{(i)}, p_X^{(i)}, p_Y^{(i)}, p_Z^{(i)}\right) \tag{7.268}$$
is the Shannon entropy of the Pauli channel acting on the ith qubit. By the
law of large numbers, we will have
$$\sum_{i=1}^{n} H_i = n\langle H\rangle \tag{7.269}$$
for large n, where $\langle H\rangle$ is the Shannon entropy, averaged over the $2^{m-1}$ possible
classical outcomes of the measurement of the extra qubits of the inner
code. Therefore, the rate that can be attained by the random outer code is
$$R = \frac{1 - \langle H\rangle}{m} \tag{7.270}$$
(we divide by m, because the concatenated code has a length m times longer
than the random code).
Shor and Smolin discovered that there are repetition codes (values of m)
for which, in a suitable range of p, $1-\langle H\rangle$ is positive while $1-H_2(p)-p\log_2 3$
is negative. In this range, then, the capacity Q(p) is nonzero, showing that
the lower bound eq. (7.262) is not tight.
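The averaged entropy ⟨H⟩ can be computed by brute force for small m. The sketch below is one way to model the composite channel, under stated assumptions: the depolarizing channel applies X, Y, Z with probability p/3 each; qubit 1 is the CNOT control, so the data qubit inherits the bit flip of qubit 1 and the parity of all phase flips; and the syndrome bits are the parities x_1 ⊕ x_j. The function names are hypothetical.

```python
import itertools
import math
from collections import defaultdict


def entropy(probabilities):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(q * math.log2(q) for q in probabilities if q > 0)


def average_entropy(p, m):
    """<H> for the composite channel built from an m-qubit repetition code."""
    # One channel use as (x, z) bit pairs: I=(0,0), X=(1,0), Y=(1,1), Z=(0,1).
    single = {(0, 0): 1 - p, (1, 0): p / 3, (1, 1): p / 3, (0, 1): p / 3}
    joint = defaultdict(float)     # (syndrome, data x, data z) -> probability
    marginal = defaultdict(float)  # syndrome -> probability
    for error in itertools.product(single, repeat=m):
        prob = math.prod(single[e] for e in error)
        xs = [e[0] for e in error]
        zs = [e[1] for e in error]
        syndrome = tuple(xs[0] ^ xj for xj in xs[1:])
        joint[(syndrome, xs[0], sum(zs) % 2)] += prob
        marginal[syndrome] += prob
    # H(data error | syndrome) = H(syndrome, data error) - H(syndrome)
    return entropy(joint.values()) - entropy(marginal.values())


if __name__ == "__main__":
    p = 0.19
    for m in (1, 3, 5):
        print(m, (1 - average_entropy(p, m)) / m)
```

For m = 1 the function reduces to H2(p) + p log2 3. At p = 0.19, in this model, the m = 5 code gives a slightly positive 1 − ⟨H⟩ while the m = 1 value is negative, which is the effect described above.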
7
In fact a very slight further improvement can be achieved by concatenating a random
code with the 25-qubit generalized Shor code described in the exercises – then a nonzero
rate is attainable for $p < p''_{\max} \simeq 0.19056$ (another 0.1% better than the maximum tolerable
error probability with repetition coding).
7.17 Summary
Quantum error-correcting codes: Quantum error correction can protect
quantum information from both decoherence and “unitary errors” due to
imperfect implementations of quantum gates. In a (binary) quantum error-correcting
code (QECC), the $2^k$-dimensional Hilbert space $\mathcal{H}_{\rm code}$ of k encoded
qubits is embedded in the $2^n$-dimensional Hilbert space of n qubits. Errors
acting on the n qubits are reversible provided that $\langle\psi|M_\nu^\dagger M_\mu|\psi\rangle/\langle\psi|\psi\rangle$ is
independent of $|\psi\rangle$ for any $|\psi\rangle \in \mathcal{H}_{\rm code}$ and any two Kraus operators $M_\mu$, $M_\nu$
occurring in the expansion of the error superoperator. The recovery superoperator
transforms entanglement of the environment with the code block into
entanglement of the environment with an ancilla that can then be discarded.
Quantum stabilizer codes: Most QECC’s that have been constructed
are stabilizer codes. A binary stabilizer code is characterized by its stabilizer
S, an abelian subgroup of the n-qubit Pauli group $G_n = \{I, X, Y, Z\}^{\otimes n}$
(where X, Y, Z are the single-qubit Pauli operators). The code subspace is
the simultaneous eigenspace with eigenvalue one of all elements of S; if S has
n − k independent generators, then there are k encoded qubits. A stabilizer
code can correct each error in a subset $\mathcal{E}$ of $G_n$ if for each $E_a, E_b \in \mathcal{E}$,
$E_a^\dagger E_b$ either lies in the stabilizer S or outside of the normalizer $S^\perp$ of the
stabilizer. If some $E_a^\dagger E_b$ is in S for $E_a, E_b \in \mathcal{E}$ the code is degenerate; otherwise
it is nondegenerate. Operators in $S^\perp \setminus S$ are “logical” operators that act on
encoded quantum information. The stabilizer S can be associated with an
additive code over the finite field GF(4) that is self-orthogonal with respect
to a symplectic inner product. The weight of a Pauli operator is the number
of qubits on which its action is nontrivial, and the distance d of a stabilizer
code is the minimum weight of an element of $S^\perp \setminus S$. A code with length n,
k encoded qubits, and distance d is called an [[n, k, d]] quantum code. If the
code enables recovery from any error superoperator with support on Pauli
operators of weight t or less, we say that the code “can correct t errors.” A
code with distance d can correct [(d−1)/2] errors in unknown locations or d−1 errors
in known locations. “Good” families of stabilizer codes can be constructed
in which d/n and k/n remain bounded away from zero as n → ∞.
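The definitions in this paragraph can be checked by brute force on a small code. The sketch below takes the four cyclic shifts of XZZXI as stabilizer generators of the five-qubit code (an assumption here, since this summary does not list them), represents each Pauli operator up to phase by its X and Z bit strings, and computes the distance as the minimum weight of an operator that commutes with every generator but is not in the stabilizer.

```python
from itertools import product

GENERATORS = ["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ"]  # assumed generators


def to_symplectic(pauli):
    """Map a Pauli string to (x bits, z bits); Y has both bits set."""
    x = tuple(int(c in "XY") for c in pauli)
    z = tuple(int(c in "ZY") for c in pauli)
    return x, z


def commute(a, b):
    """Two Paulis commute iff their symplectic inner product is 0 mod 2."""
    (xa, za), (xb, zb) = a, b
    s = sum(i * j for i, j in zip(xa, zb)) + sum(i * j for i, j in zip(za, xb))
    return s % 2 == 0


def multiply(a, b):
    """Product of two Paulis, ignoring the overall phase."""
    (xa, za), (xb, zb) = a, b
    return (tuple(i ^ j for i, j in zip(xa, xb)),
            tuple(i ^ j for i, j in zip(za, zb)))


def weight(a):
    """Number of qubits on which the operator acts nontrivially."""
    return sum(1 for i, j in zip(*a) if i or j)


def stabilizer_group(gens):
    n = len(gens[0][0])
    group = {((0,) * n, (0,) * n)}
    for g in gens:
        group |= {multiply(g, h) for h in group}
    return group


def distance(gens):
    """Minimum weight of an element of the normalizer outside the stabilizer."""
    n = len(gens[0][0])
    group = stabilizer_group(gens)
    weights = []
    for bits in product((0, 1), repeat=2 * n):
        e = (bits[:n], bits[n:])
        if e not in group and all(commute(e, g) for g in gens):
            weights.append(weight(e))
    return min(weights)


if __name__ == "__main__":
    gens = [to_symplectic(g) for g in GENERATORS]
    print(len(stabilizer_group(gens)), distance(gens))
```

Four independent commuting generators on five qubits give a stabilizer of 16 elements and k = 1 encoded qubit; the search reports distance 3.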
Examples: The code of minimal length that can correct one error is a
[[5, 1, 3]] quantum code associated with a classical GF(4) Hamming code.
Given a classical linear code $C_1$ and subcode $C_2 \subseteq C_1$, a Calderbank-Shor-
Steane (CSS) quantum code can be constructed with $k = \dim(C_1) - \dim(C_2)$
encoded qubits. The distance d of the CSS code satisfies $d \ge \min(d_1, d_2^\perp)$,
7.18 Exercises
7.1 Phase error-correcting code
a) Construct stabilizer generators for this code, and construct the log-
ical operations Z̄ and X̄ such that
b) These n−k−2 stabilizer generators that apply I to the last qubit will
still commute and are still independent if we drop the last qubit.
Hence they are the generators for a code with length n−1 and k+1
encoded qubits. Show that if the original code is nondegenerate,
then the distance of the shortened code is at least d − 1. (Hint:
First show that if there is a weight-t element of the (n − 1)-qubit
Pauli group that commutes with the stabilizer of the shortened
code, then there is an element of the n-qubit Pauli group of weight
at most t + 1 that commutes with the stabilizer of the original
code.)
c) Apply the code-shortening procedure of (a) and (b) to the [[5, 1, 3]]
QECC. Do you recognize the code that results? (Hint: It may
be helpful to exploit the freedom to perform a change of basis on
some of the qubits.)
$$E_{r,s} \equiv X^r Z^s, \qquad r, s = 0, 1, \ldots, d-1 \tag{7.276}$$
a) Are the $E_{r,s}$’s a basis for the space of operators acting on a qudit?
Are they unitary? Evaluate $\mathrm{tr}(E_{r,s}^\dagger E_{t,u})$.
b) The Pauli operators obey
The n-fold tensor products of these qudit Pauli operators form a group
$G_n^{(d)}$ of order $d^{2n+1}$ (and if we mod out its d-element center, we obtain
Z saa .
O
(7.279)
a
$$\begin{array}{ccccc}
X & Z & Z^{-1} & X^{-1} & I \\
I & X & Z & Z^{-1} & X^{-1} \\
X^{-1} & I & X & Z & Z^{-1} \\
Z^{-1} & X^{-1} & I & X & Z
\end{array} \tag{7.283}$$
(the second, third, and fourth generators are obtained from the first by
a cyclic permutation of the qudits).
a) Find the order of each generator. Are the generators really independent?
Do they commute? Is the fifth cyclic permutation
$Z\, Z^{-1} X^{-1} I\, X$ independent of the rest?
b) Find the distance of this code. Is the code nondegenerate?
c) Construct the encoded operations X̄ and Z̄, each expressed as an
operator of weight 3. (Be sure to check that these operators obey
the right commutation relations for any value of d.)
Lecture Notes for Physics 219:
Quantum Computation
John Preskill
California Institute of Technology
14 June 2004
Contents
Contents
References
9 Topological quantum computation
9.1 Anyons, anyone?
∗
Two interesting approaches to realizing nonabelian anyons — using superconduct-
ing junction arrays and using cold atoms trapped in optical lattices — have been
discussed in the recent literature.
9.2 Flux-charge composites 7
The relative sign in the superposition flips, but this has no detectable
physical effects, since all observables are block diagonal in the $(-1)^F$
basis.
Similarly, in two dimensions, the shift in the angular momentum spectrum
$e^{-2\pi i J} = e^{i\theta}$ has no unacceptable physical consequences if there is
9.3 Spin and statistics 9
phase generated when one of the two objects is rotated by 2π. Thus the
connection between spin and statistics continues to hold, in a form that
is a natural generalization of the connection that applies to bosons and
fermions.
The origin of this connection is fairly clear in our flux-charge composite
model, but in fact it holds much more generally. Why? Reading textbooks
on relativistic quantum field theory, one can easily get the impression that
the spin-statistics connection is founded on Lorentz invariance, and has
something to do with the properties of the complexified Lorentz group.
Actually, this impression is quite misleading. All that is essential for a
spin-statistics connection to hold is the existence of antiparticles. Special
relativity is not an essential ingredient.
Consider an anyon, characterized by the phase θ, and suppose that this
particle has a corresponding antiparticle. This means that the particle
and its antiparticle, when combined, have trivial quantum numbers (in
particular, zero angular momentum) and therefore that there are physical
processes in which particle-antiparticle pairs can be created and annihi-
lated. Draw a world line in spacetime that represents a process in which
two particle-antiparticle pairs are created (one pair on the left and the
other pair on the right), the particle from the pair on the right is ex-
changed in a counterclockwise sense with the particle from the pair on
the left, and then both pairs reannihilate. (The world line has an orien-
tation; if directed forward in time it represents a particle, and if directed
backward in time it represents an antiparticle.) Turning our diagram 90°,
we obtain a depiction of a process in which a single particle-antiparticle
pair is created, the particle and antiparticle are exchanged in a clockwise
sense, and then the pair reannihilates. Turning it 90° yet again, we
have a process in which two pairs are created and the antiparticle from
the pair on the right is exchanged, in a counterclockwise sense, with the
antiparticle from the pair on the left, before reannihilation.
$$R_{aa} = R_{a\bar a}^{-1} = R_{\bar a\bar a}. \tag{9.6}$$
9.4 Combining anyons 11
If a is an anyon with exchange phase $e^{i\theta}$, then its antiparticle ā also has
the same exchange phase. Furthermore, when a and ā are exchanged
counterclockwise, the phase acquired is $e^{-i\theta}$.
These conclusions are unsurprising when we interpret them from the
perspective of our flux-charge composite model of anyons. The antipar-
ticle of the object with flux Φ and charge q has flux −Φ and charge −q.
Hence, when we exchange two antiparticles, the minus signs cancel and
the effect is the same as though the particles were exchanged. But if we
exchange a particle and an antiparticle, then the relative sign of charge
and flux results in the exchange phase $e^{-iq\Phi} = e^{-i\theta}$.
But what is the connection between these observations about statistics
and the spin? Continuing to contemplate the same spacetime diagram, let
us consider its implications regarding the orientation of the particles. For
keeping track of the orientation, it is convenient to envision the particle
world line not as a thread but as a ribbon in spacetime. I claim that our
process can be smoothly deformed to one in which a particle-antiparticle
pair is created, the particle is rotated counterclockwise by 2π, and then
the pair reannihilates. A convenient way to verify this assertion is to take
off your belt (or borrow a friend’s). The buckle at one end specifies an
orientation; point your thumb toward the buckle, and following the right-
hand rule, twist the belt by 2π before rebuckling it. You should be able
to check that you can lay out the belt to match the spacetime diagram for
any of the exchange processes described earlier, and also for the process
in which the particle rotates by 2π.
Thus, in a topological sense, rotating a particle counterclockwise by 2π
is really the same thing as exchanging two particles in a counterclockwise
sense (or exchanging particle and antiparticle in a clockwise sense), which
provides a satisfying explanation for a general spin-statistics connection.†
I emphasize again that this argument invokes processes in which particle-
antiparticle pairs are created and annihilated, and therefore the existence
of antiparticles is an essential prerequisite for it to apply.
†
Actually, this discussion has been oversimplified. Though it is adequate for abelian
anyons, we will see that it must be amended for nonabelian anyons, because $R_{ab}$ has
more than one eigenvalue in the nonabelian case. Similarly, the discussion in the next
section of “combining anyons” will need to be elaborated because, in the nonabelian
case, more than one kind of composite anyon can be obtained when two anyons are
fused together.
Suppose that a is an anyon with exchange phase $e^{i\theta}$, and that we build
a “molecule” from n of these a anyons. What phase is acquired under a
counterclockwise exchange of two such molecules?
The answer is clear in our flux-charge composite model. Each of the n
charges in one molecule acquires a phase $e^{i\theta/2}$ when transported half way
around each of the n fluxes in the other molecule. Altogether then, $2n^2$
factors of the phase $e^{i\theta/2}$ are generated, resulting in the total phase
$$e^{i\theta_n} = e^{i n^2 \theta}. \tag{9.7}$$
Said another way, the phase $e^{i\theta}$ occurs altogether $n^2$ times because in
effect n anyons in one molecule are being exchanged with n anyons in
the other molecule. Contrary to what we might have naively expected, if
we split a fermion (say) into two identical constituents, the constituents
have, not an exchange phase of $\sqrt{-1} = i$, but rather $(e^{i\pi})^{1/4} = e^{i\pi/4}$.
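The n² scaling of eq. (9.7) is easy to check numerically. The sketch below is a minimal illustration with hypothetical function names: it evaluates the molecule's exchange phase and, going the other way, the angle each of n identical constituents needs so that the composite has a prescribed angle (taking the simplest branch, with no added multiple of 2π).

```python
import cmath
import math


def molecule_exchange_phase(theta, n):
    """Phase exp(i n^2 theta) for exchanging two n-anyon molecules."""
    return cmath.exp(1j * n * n * theta)


def constituent_angle(theta_total, n):
    """Angle of each of n identical constituents (simplest branch)."""
    return theta_total / (n * n)


if __name__ == "__main__":
    theta = constituent_angle(math.pi, 2)     # split a fermion in two
    print(theta / math.pi)                    # 0.25
    print(molecule_exchange_phase(theta, 2))  # approximately -1
```

Two constituents with θ = π/4 recombine into an object with exchange phase −1, a fermion.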
This behavior is compatible with the spin-statistics connection: the
angular momentum J of the n-anyon molecule satisfies
$$e^{-2\pi i J_n} = e^{-2\pi i n^2 J} = e^{i n^2 \theta}. \tag{9.8}$$
and this orbital angular momentum combines additively with the spin S
to produce the total angular momentum
−2πJ = −2πL−2πS = 2θ+2θ+ 2π(integer) = 4θ+ 2π(integer) . (9.12)
What if, on the other hand, we build a molecule āa from an anyon a
and its antiparticle ā? Then, as we’ve seen, the spin S has the same value
as for the aa molecule. But the exchange phase has the opposite value, so
that the noninteger part of the orbital angular momentum is −2πL = −2θ
instead of −2πL = 2θ, and the total angular momentum J = L + S is
an integer. This property is necessary, of course, if the āa pair is to be
able to annihilate without leaving behind an object that carries nontrivial
angular momentum.
which is sometimes called the Yang-Baxter relation. You can verify the
Yang-Baxter relation by drawing the two braids σ1 σ2 σ1 and σ2 σ1 σ2 on
a piece of paper, and observing that both describe a process in which
the particles initially in positions 1 and 3 are exchanged counterclockwise
about the particle labeled 2, which stays fixed — i.e., these are topologi-
cally equivalent braids.
– Figure –
length later on) that there is more to a model of anyons than a mere rep-
resentation of the braid group. In our flux tube model of abelian anyons,
we were able to describe not only the effects of an exchange of anyons, but
also the types of particles that can be obtained when two or more anyons
are combined together. Likewise, in a general anyon model, the anyons
are of various types, and the model incorporates “fusion rules” that spec-
ify what types can be obtained when two anyons of particular types are
combined. Nontrivial consistency conditions arise because fusion is associative
(fusing a with b and then fusing the result with c is equivalent to
fusing b with c and then fusing the result with a), and because the fusion
rules must be consistent with the braiding rules. Though these consis-
tency conditions are highly restrictive, many solutions exist, and hence
many different models of nonabelian anyons are realizable in principle.
θ = πp/q , (9.18)
where q and p (p < 2q) are positive integers with no common factor. Then
we conclude that T1 must have at least q distinct eigenvalues; T1 acting
on α generates an orbit with q distinct values:
$$\alpha + \frac{2\pi p}{q}\,k \pmod{2\pi}, \qquad k = 0, 1, 2, \ldots, q-1. \tag{9.19}$$
Since T1 commutes with H, on the torus the ground state of our anyonic
system (indeed, any energy eigenstate) must have a degeneracy that is an
integer multiple of q. Indeed, generically (barring further symmetries or
accidental degeneracies), the degeneracy is expected to be exactly q.
For a two-dimensional surface with genus g (a sphere with g “handles”),
the degree of this topological degeneracy becomes $q^g$, because there are
operators analogous to T1 and T2 associated with each of the g handles,
and all of the T1 -like operators can be simultaneously diagonalized. Fur-
thermore, we can apply a similar argument to a finite planar medium if
single anyons can be created and destroyed at the edges of the system. For
example, consider an annulus in which anyons can appear or disappear
at the inner and outer edges. Then we could define the unitary opera-
tor T1 as describing a process in which an anyon winds counterclockwise
around the annulus, and a unitary operator T2 as describing a process in
which an anyon appears at the outer edge, propagates to the inner edge,
and disappears. These operators T1 and T2 have the same commutator
as the corresponding operators defined on the torus, and so we conclude
as before that the ground state on the annulus is q-fold degenerate for
θ = πp/q. For a disk with h holes, there is an operator analogous to
T1 that winds an anyon counterclockwise around each of the holes, and
an operator analogous to T2 that propagates an anyon from the outer
boundary of the disk to the edge of the hole; thus the degeneracy is $q^h$.
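The counting argument can be made concrete by listing the orbit of eq. (9.19) with exact rational arithmetic. The sketch below works in units of 2π; the function names and the choice α = 0 are assumptions for illustration.

```python
from fractions import Fraction
from math import gcd


def t1_orbit(p, q, alpha=Fraction(0)):
    """Distinct values of alpha + 2*pi*p*k/q (mod 2*pi), in units of 2*pi."""
    return {(alpha + Fraction(p * k, q)) % 1 for k in range(q)}


def degeneracy(p, q, handles_or_holes=1):
    """q^g for genus g, or q^h for a disk with h holes, when theta = pi*p/q."""
    if gcd(p, q) != 1:
        raise ValueError("p and q must have no common factor")
    return q ** handles_or_holes


if __name__ == "__main__":
    print(sorted(t1_orbit(1, 3)))   # three distinct eigenvalue phases
    print(degeneracy(1, 3, 2))      # genus-2 surface: 9
```

Because p and q share no common factor, the q values in the orbit are all distinct, which is what forces at least a q-fold degeneracy per handle or hole.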
9.6 Topological degeneracy 19
‡
If you are familiar with Euclidean path integral methods, you’ll find it easy to verify
that in the leading semiclassical approximation the amplitude A for such a tunneling
process in which the anyon propagates a distance L has the form $A = Ce^{-L/L_0}$,
where C is a constant and $L_0 = \hbar\,(2m^*\Delta)^{-1/2}$; here $\hbar$ is Planck’s constant and $m^*$
is the effective mass of the anyon, defined so that the kinetic energy of an anyon
traveling at speed v is $\frac{1}{2}m^* v^2$.
both arising from processes in which world lines of charges and fluxon link
once with one another. Thus T1,S and T2,S can be diagonalized simulta-
neously, and can be regarded as the encoded Pauli operators Z̄1 and Z̄2
acting on two protected qubits. The operator T2,P , which commutes with
Z̄1 and anticommutes with Z̄2 , can be regarded as the encoded X̄1 , and
similarly T1,P is the encoded X̄2 .
On the torus, the degeneracy of the four ground states is exact for
the ideal Hamiltonian we constructed (the particles have infinite effective
masses). Weak local perturbations will break the degeneracy, but only
by an amount that gets exponentially small as the linear size L of the
torus increases. To be concrete, suppose the perturbation is a uniform
“magnetic field” pointing in the ẑ direction, coupling to the magnetic
moments of the qubits:
$$H' = -h\sum_\ell Z_\ell. \tag{9.22}$$
Because of the nonzero energy gap, for the purpose of computing in per-
turbation theory the leading contribution to the splitting of the degen-
eracy, it suffices to consider the effect of the perturbation in the four-
dimensional subspace spanned by the ground states of the unperturbed
system. In the toric code, the operators with nontrivial matrix elements
in this subspace are those such that $Z_\ell$’s act on links that form a closed
loop that wraps around the torus (or $X_\ell$’s act on links whose dual links
form a closed loop that wraps around the torus). For an L × L lattice on
the torus, the minimal length of such a closed loop is L; therefore nonvanishing
matrix elements do not arise in perturbation theory until the Lth
order, and are suppressed by $h^L$. Thus, for small h and large L, memory
errors due to quantum fluctuations occur only with exponentially small
amplitude.
The matrix elements $D^R_{ij}(a)$ are measurable in principle, for example by
conducting interference experiments in which a beam of calibrated charges
can pass on either side of the flux. (The phase of the complex number
$D^R_{ij}(a)$ determines the magnitude of the shift of the interference fringes,
and the modulus of $D^R_{ij}(a)$ determines the visibility of the fringes.) Thus
once we have chosen a standard basis for the charges, we can use the
charges to attach labels (elements of G) to all fluxes. The flux labels
are unambiguous as long as the representation R is faithful, and barring
any group automorphisms (which create ambiguities that we are free to
resolve however we please).
However, the group elements that we attach to the fluxes depend on our
conventions. Suppose I am presented with k fluxons (particles that carry
flux), and that I use my standard charges to measure the flux of each
particle. I assign group elements $a_1, a_2, \ldots, a_k \in G$ to the k fluxons. You
are then asked to measure the flux, to verify my assignments. But your
standard charges differ from mine, because they have been surreptitiously
transported around another flux (one that I would label with g ∈ G).
Therefore you will assign the group elements $g a_1 g^{-1}, g a_2 g^{-1}, \ldots, g a_k g^{-1}$
to the k fluxons; our assignments differ by an overall conjugation by g.
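The relabeling by conjugation can be tried out on the smallest nonabelian group. The sketch below uses G = S3 with permutations stored as tuples (an illustrative choice; the helper names are hypothetical) and confirms that two observers whose labels differ by conjugation still agree on the conjugacy class of every flux.

```python
from itertools import permutations


def compose(p, q):
    """Return the permutation 'apply q first, then p'."""
    return tuple(p[i] for i in q)


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def conjugate(g, a):
    """Return g a g^-1."""
    return compose(compose(g, a), inverse(g))


def conjugacy_class(a, group):
    return frozenset(conjugate(g, a) for g in group)


S3 = list(permutations(range(3)))

if __name__ == "__main__":
    my_labels = [(1, 0, 2), (1, 2, 0)]      # a two-cycle and a three-cycle
    g = (0, 2, 1)                           # flux your charges went around
    your_labels = [conjugate(g, a) for a in my_labels]
    print(your_labels)
    print([conjugacy_class(a, S3) == conjugacy_class(b, S3)
           for a, b in zip(my_labels, your_labels)])
```

The individual labels change under conjugation, but the class of each flux does not.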
The moral of this story is that the assignment of group elements to
fluxons is inherently ambiguous and has no invariant meaning. But be-
cause the valid assignments of group elements to fluxons differ only by
conjugation by some element g ∈ G, the conjugacy class of the flux in
G does have an invariant meaning on which all observers will agree. In-
deed, even if we fix our conventions at the charge bureau of standards, the
group element that we assign to a particular fluxon may change if that
fluxon takes part in a physical process in which it braids with other flux-
ons. For that reason, the fluxons belonging to the same conjugacy class
should all be regarded as indistinguishable particles, even though they
come in many varieties (one for each representative of the class) that can
be distinguished when we make measurements at a particular time and
place: The fluxons are nonabelian anyons.
$$\alpha\beta\alpha^{-1} \mapsto \alpha, \qquad \alpha \mapsto \beta. \tag{9.28}$$
– Figure –
It follows that the effect of transporting a charge around the path α, after
the exchange, is equivalent to the effect of transport around the path
$\alpha\beta\alpha^{-1}$, before the exchange; similarly, the effect of transport around β,
after the exchange, is the same as the effect of transport around α before.
We conclude that the braid operator R representing a counterclockwise
Thus, if the two fluxons are exchanged three times, they swap positions
(the number of exchanges is odd), yet the labeling of the state is unmod-
ified. This observation means that there can be quantum interference
between the “direct” and “exchange” scattering of two fluxons that carry
distinct labels in the same conjugacy class, reinforcing the notion that
fluxes carrying conjugate labels ought to be regarded as indistinguishable
particles.
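The three-exchange statement can be verified directly for the two-cycle fluxes of S3. The sketch below assumes the convention that a counterclockwise exchange acts on flux labels as |a, b⟩ → |a b a⁻¹, a⟩; that formula is not reproduced in the text above, so treat it as an assumption, and the helper names as hypothetical.

```python
def compose(p, q):
    """Return the permutation 'apply q first, then p'."""
    return tuple(p[i] for i in q)


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def braid(pair):
    """Assumed action of one counterclockwise exchange on flux labels."""
    a, b = pair
    return (compose(compose(a, b), inverse(a)), a)


TWO_CYCLES = [(1, 0, 2), (0, 2, 1), (2, 1, 0)]  # the transpositions of S3

if __name__ == "__main__":
    state = (TWO_CYCLES[0], TWO_CYCLES[1])
    for step in range(4):
        print(step, state)
        state = braid(state)
```

Starting from two distinct two-cycles, the labels cycle through three configurations and return to the original after the third exchange, while the total flux (the product of the two labels) stays fixed throughout.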
Since the braid operator acting on pairs of two-cycle fluxes satisfies
$R^3 = I$, its eigenvalues are third roots of unity. For example, by taking
linear combinations of the three states with total flux (123), we obtain
the R eigenstates
where $\omega = e^{2\pi i/3}$.
Although a pair of fluxes $|a, a^{-1}\rangle$ with trivial total flux has trivial braiding
properties, it is interesting for another reason: it carries charge. The
way to detect the charge of an object is to carry a flux b around the object
(counterclockwise); this modifies the object by the action of $D^R(b)$ for
some representation R of G. If the charge is zero then the representation
is trivial: D(b) = I for all b ∈ G. But if we carry flux b counterclockwise
around the state $|a, a^{-1}\rangle$, the state transforms as
where |α| denotes the order of α. A pair of fluxons in the class α that can
be created in a local process must not carry any conserved charges and
therefore must be in the state $|0; \alpha\rangle$. Other linear combinations orthogonal
to $|0; \alpha\rangle$ carry nonzero charge. This charge carried by a pair of fluxons can
be detected by other fluxons, yet oddly the charge cannot be localized on
the core of either particle in the pair. Rather it is a collective property of
the pair. If two fluxons with a nonzero total charge are brought together,
complete annihilation of the pair will be forbidden by charge conservation,
even though the total flux is zero.
where
$$\chi^R(a) = \sum_i D^R_{ii}(a) = \mathrm{tr}\, D^R(a) \tag{9.41}$$
is the character of the representation R, evaluated at a. In fact, the
character (a trace) is unchanged by conjugation — it takes the same value
for all a ∈ α. Therefore, eq. (9.40) is also the probability that the pair of
chargeons has zero total charge when one chargeon (initially a member
of a pair in the state $|0; R\rangle$) winds around one fluxon (initially a member
of a pair in the state $|0; \alpha\rangle$). Of course, since the total charge of all four
particles is zero and charge is conserved, after the winding the two pairs
have opposite charges: if the pair of chargeons has total charge R′, then
the pair of fluxons must have total charge R̄′, combined with R′ to give
trivial total charge. A pair of particles with zero total charge and flux can
annihilate, leaving no stable particle behind, while a pair with nonzero
charge will be unable to annihilate completely. We conclude, then, that
if the world lines of a fluxon pair and a chargeon pair link once, the
probability that both pairs will be able to annihilate is given by eq. (9.40).
This probability is less than one, provided that the representation R
is not one dimensional and the class α is not represented trivially. Thus
the linking of the world lines induces an exchange of charge between the
two pairs.
For example, in the case where α is the two-cycle class of G = S3 and
R = [2] (the two-dimensional irreducible representation of $S_3$), we see
from eq. (9.37) that $\chi^{[2]}(\alpha) = 0$. Therefore, charge is transferred with
certainty; after the winding, both the fluxon pair and the chargeon pair
transform as R′ = [2].
Since the sum over the dimension squared for all irreducible representa-
tions of a finite group is the order of the group, and the order of the
normalizer N (α) is |G|/|α|, we obtain
$$\mathcal{D}^2 = \sum_\alpha |\alpha| \cdot |G| = |G|^2; \tag{9.44}$$
9.10 Superselection sectors of a nonabelian superconductor 31
We have already noted that the fusion of two two-cycle fluxes can yield
either a trivial total flux or a three-cycle flux, and that the charge of the
composite with trivial total flux can be either [+] or [2]. If the total flux
is a three-cycle, then the charge eigenstates are just the braid operator
eigenstates that we constructed in eq. (9.33).
For a system of two anyons, why should the eigenstates of the total
charge also be eigenstates of the braid operator? We can understand this
connection more generally by thinking about the angular momentum of
the two-anyon composite object. The monodromy operator $R^2$ captures
the effect of winding one particle counterclockwise around another. This
winding is almost the same thing as rotating the composite system coun-
terclockwise by 2π, except that the rotation of the composite system also
rotates both of the constituents. We can compensate for the rotation of
the constituents by following the counterclockwise rotation of the compos-
ite by a clockwise rotation of the constituents. Therefore, the monodromy
operator can be expressed as
which is less than one if the flux ab^{-1} is not the identity (assuming that the
representation R is not one-dimensional and represents ab^{-1} nontrivially).
Thus, if annihilation of the chargeon pair does not occur, we know for sure
that a and b are distinct fluxes, and each time annihilation does occur,
it becomes increasingly likely that a and b are equal. By repeating this
procedure a modest number of times, we can draw a conclusion about
whether a and b are the same, with high statistical confidence.
This procedure allows us to sort the fluxon pairs into bins, where each
pair in a bin has the same flux. If a bin contains n pairs, its state is, in
general, a mixture of states of the form
\[ \sum_{a \in G} \psi_a \, |a, a^{-1}\rangle^{\otimes n} . \tag{9.50} \]
By discarding just one pair in the bin, each such state becomes a mixture
\[ \sum_{a \in G} \rho_a \left( |a, a^{-1}\rangle\langle a, a^{-1}| \right)^{\otimes (n-1)} ; \tag{9.51} \]
we may regard each bin as containing (n − 1) pairs, all with the same
definite flux, but where that flux is as yet unknown.
Which bin is which? We want to label the bins with elements of G. To
arrive at a consistent labeling, we withdraw fluxon pairs from three different
bins. Suppose the three pairs are |a, a^{-1}⟩, |b, b^{-1}⟩, and |c, c^{-1}⟩, and
that we want to check whether c = ab. We create a chargeon-antichargeon
pair, carry the chargeon around a closed path that encloses the first member
of the first fluxon pair, the first member of the second fluxon pair,
and the second member of the third fluxon pair, and observe whether the
reunited chargeon pair annihilates or not. Since the total flux enclosed
by the chargeon’s path is abc^{-1}, by repeating this procedure we can determine
with high statistical confidence whether ab and c are the same.
Such observations allow us to label the bins in some manner that is consis-
tent with the group composition rule. This labeling is unique apart from
group automorphisms (and ambiguities arising from any automorphisms
may be resolved arbitrarily).
Once the flux bureau of standards is established, we can use it to measure
the unknown flux of an unlabeled pair. If the state of the pair to
be measured is |d, d^{-1}⟩, we can withdraw the labeled pair |a, a^{-1}⟩ from
a bin, and use chargeon pairs to measure the flux ad^{-1}. By repeating
this procedure with other labeled fluxes, we can eventually determine the
value of the flux d, realizing a projective measurement of the flux.
For a simulation of a quantum circuit using fluxons, we will need to
perform logic gates that act upon the value of the flux. The basic gate we
will use is realized by winding counterclockwise a fluxon pair with state
9.11 Quantum computing with nonabelian fluxons 35
|a, a^{-1}⟩ around the first member of another fluxon pair with state |b, b^{-1}⟩.
Since the |a, a^{-1}⟩ pair has trivial total flux, the |b, b^{-1}⟩ pair is unaffected
by this procedure. But since in effect the flux b travels counterclockwise
about both members of the pair whose initial state was |a, a^{-1}⟩, this pair
is transformed as
\[ |a, a^{-1}\rangle \mapsto |bab^{-1}, ba^{-1}b^{-1}\rangle . \tag{9.52} \]
We will refer to this operation as the conjugation gate acting on the fluxon
pair.
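The conjugation gate of eq. (9.52) can be sketched in a few lines by modeling fluxes as permutations. This is only an illustration under stated assumptions: the group S_3, the tuple encoding of permutations, and the function name conjugation_gate are all hypothetical choices made for the example.

```python
def compose(p, q):
    """Return p∘q: apply q first, then p (permutations stored as tuples)."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def conjugation_gate(pair_a, pair_b):
    """Wind pair_a counterclockwise around the first member of pair_b.

    Each pair is (flux, inverse flux). pair_b is unaffected, so only the
    transformed pair_a is returned, following eq. (9.52).
    """
    a, a_inv = pair_a
    b = pair_b[0]
    b_inv = inverse(b)
    return (compose(compose(b, a), b_inv), compose(compose(b, a_inv), b_inv))

# Illustrative fluxes in S_3 (permutations of {0, 1, 2}).
a = (1, 2, 0)          # three-cycle 0 -> 1 -> 2 -> 0
b = (1, 0, 2)          # two-cycle swapping 0 and 1
identity = (0, 1, 2)

pair_a = (a, inverse(a))
pair_b = (b, inverse(b))
new_pair_a = conjugation_gate(pair_a, pair_b)

# The transformed pair still carries trivial total flux.
has_trivial_total_flux = compose(new_pair_a[0], new_pair_a[1]) == identity
print(new_pair_a, has_trivial_total_flux)
```

In this example the two-cycle conjugates the three-cycle to its inverse, so the gate exchanges the two members' fluxes.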
To summarize what has been said so far, our primitive and derived
capabilities allow us to: (1) Perform a projective flux measurement, (2)
perform a destructive measurement that determines whether or not the
flux and charge of a pair is trivial, and (3) execute a conjugation gate.
Now we must discuss how to simulate a quantum circuit using these ca-
pabilities.
The next step is to decide how to encode qubits using fluxons. Ap-
propriate encodings can be chosen in many ways; we will stick to one
particular choice that illustrates the key ideas — namely we will encode a
qubit by using a pair of fluxons, where the total flux of the pair is trivial.
We select two noncommuting elements a, b ∈ G, where b^2 = e, and choose
a computational basis for the qubit
\[ |\bar 0\rangle = |a, a^{-1}\rangle , \qquad |\bar 1\rangle = |bab^{-1}, ba^{-1}b^{-1}\rangle . \tag{9.53} \]
The crucial point is that a single isolated fluxon with flux a looks identical
to a fluxon with the conjugate flux bab^{-1}. Therefore, if the two
fluxons in a pair are kept far apart from one another, local interactions
with the environment will not cause a superposition of the states |0̄⟩ and
|1̄⟩ to decohere. The quantum information is protected from damage because
it is stored nonlocally, by exploiting a topological degeneracy of the
states where the fluxon and antifluxon are pinned to fixed and distantly
separated positions.
However, in contrast with the topological degeneracy that arises in
systems with abelian anyons, this protected qubit can be measured rela-
tively easily, without resorting to delicate interferometric procedures that
extract Aharonov-Bohm phases. We have already described how to mea-
sure flux using previously calibrated fluxons; therefore we can perform
a projective measurement of the encoded Pauli operator Z̄ (a projection
onto the basis {|0̄⟩, |1̄⟩}). We can also measure the complementary Pauli
operator X̄, albeit destructively and imperfectly. The X̄ eigenstates are
\[ |\pm\rangle = \frac{1}{\sqrt{2}} \left( |\bar 0\rangle \pm |\bar 1\rangle \right) \equiv \frac{1}{\sqrt{2}} \left( |a, a^{-1}\rangle \pm |bab^{-1}, ba^{-1}b^{-1}\rangle \right) ; \tag{9.54} \]
36 9 Topological quantum computation
where α is the conjugacy class that contains a. On the other hand, the
state |+⟩ has a nonzero overlap with |0; α⟩:
\[ \langle + | 0; \alpha \rangle = \sqrt{2/|\alpha|} . \tag{9.56} \]
Therefore, if the two members of the fluxon pair are brought together,
complete annihilation is impossible if the state of the pair is |−⟩, and
occurs with probability Prob(0) = 2/|α| if the state is |+⟩.
Note that it is also possible to prepare a fluxon pair in the state |+⟩.
One way to do that is to create a pair in the state |0; α⟩. If α contains
only the two elements a and bab^{-1} we are done. Otherwise, we compare
the newly created pair with calibrated pairs in each of the states |c, c^{-1}⟩,
where c ∈ α and c is distinct from both a and bab^{-1}. If the pair fails to
match any of these |c, c^{-1}⟩ pairs, its state must be |+⟩.
To go further, we need to characterize the computational power of the
conjugation gate. Let us use a more compact notation, in which the
state |x, x^{-1}⟩ of a fluxon pair is simply denoted |x⟩, and consider the
transformations of the state |x, y, z⟩ that can be built from conjugation
gates. By winding the third pair through the first, either counterclockwise
or clockwise, we can execute the gates
\[ |x, y, z\rangle \mapsto |x, y, xzx^{-1}\rangle , \qquad |x, y, z\rangle \mapsto |x, y, x^{-1}zx\rangle , \tag{9.57} \]
and by winding the third pair through the second, either counterclockwise
or clockwise, we can execute
\[ |x, y, z\rangle \mapsto |x, y, yzy^{-1}\rangle , \qquad |x, y, z\rangle \mapsto |x, y, y^{-1}zy\rangle ; \tag{9.58} \]
furthermore, by borrowing a pair with flux |c⟩ from the bureau of standards,
we can execute
\[ |x, y, z\rangle \mapsto |x, y, czc^{-1}\rangle , \qquad |x, y, z\rangle \mapsto |x, y, c^{-1}zc\rangle . \tag{9.59} \]
Composing these elementary operations, we can execute any gate of the form
\[ |x, y, z\rangle \mapsto |x, y, f z f^{-1}\rangle , \tag{9.60} \]
where the function f(x, y) can be expressed in product form, that is,
as a finite product of group elements, where the elements appearing in
the product may be the inputs x and y, their inverses x^{-1} and y^{-1}, or
constant elements of G, each of which may appear in the product any
number of times.
What are the functions f (x, y) that can be expressed in this form?
The answer depends on the structure of the group G, but the following
characterization will suffice for our purposes. Recall that a subgroup H
of a finite group G is normal if for any h ∈ H and any g ∈ G, ghg^{-1} ∈ H,
and recall that a finite group G is said to be simple if G has no normal
subgroups other than G itself and the trivial group {e}. It turns out that
if G is a simple nonabelian finite group, then any function f (x, y) can be
expressed in product form. In the computer science literature, a closely
related result is often called Barrington’s theorem.
In particular, then, if the group G is a nonabelian simple group, there
is a function f realizable in product form such that
\[ f(a, a) = f(a, bab^{-1}) = f(bab^{-1}, a) = e , \qquad f(bab^{-1}, bab^{-1}) = b . \tag{9.61} \]
Thus for x, y, z ∈ {a, bab^{-1}}, the action eq. (9.60) causes the flux of the
third pair to “flip” if and only if x = y = bab^{-1}; we have constructed
from our elementary operations a Toffoli gate in the computational basis.
Therefore, conjugation gates suffice for universal reversible classical
computation acting on the standard basis states.
The nonabelian simple group of minimal order is A_5, the group of even
permutations of five objects, with |A_5| = 60. Therefore, one concrete
realization of universal classical computation using conjugation gates is
obtained by choosing a to be the three-cycle element a = (345) ∈ A_5, and
b to be the product of two-cycles b = (12)(34) ∈ A_5, so that bab^{-1} = (435).
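The stated properties of a = (345) and b = (12)(34) can be checked mechanically. The sketch below is an illustration only; the helper names (from_cycles, compose, inverse, is_even) and the 0-indexed tuple encoding are assumptions made for this example.

```python
def from_cycles(cycles, n=5):
    """Build a permutation of {1..n} (stored 0-indexed) from disjoint cycles."""
    p = list(range(n))
    for cyc in cycles:
        for i, x in enumerate(cyc):
            p[x - 1] = cyc[(i + 1) % len(cyc)] - 1
    return tuple(p)

def compose(p, q):
    """Return p∘q: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def is_even(p):
    """A permutation is even when its number of inversions is even."""
    inversions = sum(1 for i in range(len(p))
                     for j in range(i + 1, len(p)) if p[i] > p[j])
    return inversions % 2 == 0

a = from_cycles([(3, 4, 5)])
b = from_cycles([(1, 2), (3, 4)])
identity = tuple(range(5))
conjugate = compose(compose(b, a), inverse(b))

checks = {
    "a and b are even (lie in A_5)": is_even(a) and is_even(b),
    "b^2 = e": compose(b, b) == identity,
    "a and b do not commute": compose(a, b) != compose(b, a),
    "b a b^-1 = (435)": conjugate == from_cycles([(4, 3, 5)]),
}
for name, ok in checks.items():
    print(name, ok)
```

Because b is an involution, the result does not depend on whether permutations are composed right-to-left or left-to-right.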
With this judicious choice of the group G, we achieve a topological
realization of universal classical computation, but how can we go still further,
to realize universal quantum computation? We have the ability to prepare
computational basis states, to measure in the computational basis, and
to execute Toffoli gates, but these tools are entirely classical. The only
nonclassical tricks at our disposal are the ability to prepare X̄ = 1
eigenstates, and the ability to perform an imperfect destructive measurement
of X̄. Fortunately, these additional capabilities are sufficient.
In our previous discussions of quantum fault tolerance, we have noted
that if we can do the classical gates Toffoli and CNOT, it suffices for
universal quantum computation to be able to apply each of the Pauli op-
erators X, Y , and Z, and to be able to perform projective measurements
of each of X, Y , and Z. We already know how to apply the classical
gate X and to measure Z (that is, project onto the computational basis).
Projective measurement of X and Y , and execution of Z, are still missing
from our repertoire. (Of course, if we can apply X and Z, we can also
apply their product ZX = iY .)
\[ \mathrm{CNOT} : XI \mapsto XX , \tag{9.62} \]
where the first qubit is the control and the second qubit is the target of
the CNOT. Therefore, CNOT gates, together with the ability to prepare
X = 1 eigenstates and to perform destructive measurements of X, suffice
to realize projective measurements of X. We can prepare an ancilla qubit
in the X = 1 eigenstate, perform a CNOT with the ancilla as control
and the data to be measured as target, and then measure the ancilla
destructively. The measurement prepares the data in an eigenstate of X,
whose eigenvalue matches the outcome of the measurement of the ancilla.
In our case, the destructive measurement is not fully reliable, but we
can repeat the measurement multiple times. Each time we prepare and
measure a fresh ancilla, and after a few repetitions, we have acceptable
statistical confidence in the inferred outcome of the measurement.
Now that we can measure X projectively, we can prepare X = −1
eigenstates as well as X = 1 eigenstates (for example, we follow a Z mea-
surement with an X measurement until we eventually obtain the outcome
X = −1). Then, by performing a CNOT gate whose target is an X = −1
eigenstate, we can realize the Pauli operator Z acting on the control qubit.
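Both facts used above, the propagation rule CNOT : XI ↦ XX and the realization of Z on the control by a CNOT whose target is an X = −1 eigenstate, can be verified with small matrices. The following is a minimal sketch in plain Python; the basis ordering |00⟩, |01⟩, |10⟩, |11⟩ (control first) and the sample amplitudes 0.6, 0.8 are assumptions chosen for the example.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def kron(A, B):
    """Kronecker product of two matrices stored as nested lists."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def apply(M, v):
    """Apply matrix M to vector v."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
# CNOT with the first qubit as control and the second as target.
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

# Conjugating X on the control by CNOT (which is its own inverse).
xi_image = matmul(matmul(CNOT, kron(X, I2)), CNOT)
propagates_to_xx = xi_image == kron(X, X)

# CNOT acting on |psi> (control) and the X = -1 eigenstate (target).
psi = [0.6, 0.8]
minus = [1 / math.sqrt(2), -1 / math.sqrt(2)]
state_in = [p * m for p in psi for m in minus]
state_out = apply(CNOT, state_in)
expected = [p * m for p in apply(Z, psi) for m in minus]
z_on_control = all(abs(x - y) < 1e-12 for x, y in zip(state_out, expected))

print(propagates_to_xx, z_on_control)
```

The second check is the usual phase-kickback argument: the target is left in the X = −1 eigenstate while the control acquires the sign of Z.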
It only remains to show that a measurement of Y can be realized.
Measurement of Y seems problematic at first, since our physical capa-
bilities have not provided any means to distinguish between Y = 1 and
Y = −1 eigenstates (that is, between a state ψ and its complex conjugate
ψ*). However, this ambiguity actually poses no serious difficulty, because
it makes no difference how the ambiguity is resolved. Were we to replace
measurement of Y by measurement of −Y in our simulation of a unitary
transformation U, the effect would be that U* is simulated instead; this
replacement would not alter the probability distributions of outcomes for
measurements in the standard computational basis.
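The claim that simulating U* instead of U leaves computational-basis statistics unchanged can be illustrated numerically. This is a small sketch under assumptions: the one-qubit circuit U = H T H (with T = diag(1, e^{iπ/4})) is just an arbitrary example with complex entries, and the helper names are hypothetical.

```python
import cmath
import math

def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def conj(A):
    """Entrywise complex conjugate of a matrix."""
    return [[complex(z).conjugate() for z in row] for row in A]

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
T = [[1, 0], [0, cmath.exp(1j * math.pi / 4)]]

U = matmul(matmul(H, T), H)   # example circuit with complex amplitudes
U_star = conj(U)              # what is simulated if Y is replaced by -Y

# Outcome probabilities when the input is |0> and we measure in the
# standard basis: |<x|U|0>|^2 versus |<x|U*|0>|^2.
probs = [abs(U[x][0]) ** 2 for x in range(2)]
probs_star = [abs(U_star[x][0]) ** 2 for x in range(2)]

same_statistics = all(abs(p - q) < 1e-12 for p, q in zip(probs, probs_star))
different_unitaries = any(abs(U[i][j] - U_star[i][j]) > 1e-6
                          for i in range(2) for j in range(2))
print(same_statistics, different_unitaries)
```

The argument relies on the basis states being real, so that conjugating every amplitude cannot change any modulus.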
To be explicit, we can formulate a protocol for measuring Y by noting
first that applying a Toffoli gate whose target qubit is an X = −1 eigen-
state realizes the controlled-phase gate Λ(Z) acting on the two control
qubits. By composing this gate with the CNOT gate Λ(X), we obtain
the gate Λ(iY ) acting as
where the first qubit is the control and the second is the target. Now
suppose that my trusted friend gives me just one qubit that he assures
me has been prepared in the state |Y = 1⟩. I know how to prepare
|X = 1⟩ states myself and I can execute Λ(iY) gates; therefore, since a
Λ(iY) gate with |Y = 1⟩ as its target transforms |X = 1⟩ to |Y = 1⟩, I
can make many copies of the |Y = 1⟩ state I obtained from my friend.
When I wish to measure Y, I apply the inverse of Λ(iY), whose target is
the qubit to be measured, and whose control is one of my Y = 1 states;
then I perform an X measurement of the ancilla to read out the result of
the Y measurement of the other qubit.
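The copying step can be checked directly: composing Λ(Z) with the CNOT Λ(X) gives Λ(iY), and acting on |X = 1⟩ (control) and |Y = 1⟩ (target) it returns two copies of |Y = 1⟩. The sketch below assumes the ordering Λ(Z)·Λ(X) (CNOT applied first), so that the controlled operation is ZX = iY, and uses the basis ordering |00⟩, |01⟩, |10⟩, |11⟩ with the control first.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def apply(M, v):
    """Apply matrix M to vector v."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

LAMBDA_Z = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

# Lambda(iY): identity when the control is 0, iY = [[0, 1], [-1, 0]] when it is 1.
LAMBDA_IY = matmul(LAMBDA_Z, CNOT)
is_controlled_iy = LAMBDA_IY == [[1, 0, 0, 0], [0, 1, 0, 0],
                                 [0, 0, 0, 1], [0, 0, -1, 0]]

s = 1 / math.sqrt(2)
x_plus = [s, s]          # |X = 1>
y_plus = [s, 1j * s]     # |Y = 1>

state_in = [c * t for c in x_plus for t in y_plus]      # control, target
state_out = apply(LAMBDA_IY, state_in)
two_copies = [c * t for c in y_plus for t in y_plus]

copied = all(abs(u - v) < 1e-12 for u, v in zip(state_out, two_copies))
print(is_controlled_iy, copied)
```

Note that Λ(iY) is a real matrix, consistent with its construction from the classical gates Toffoli and CNOT plus X = −1 ancillas.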
What if my friend lies to me, and gives me a copy of the state |Y = −1⟩
instead? Then I’ll make many copies of the |Y = −1⟩ state, and I will
be measuring −Y when I think I am measuring Y. My simulation will
work just the same as before; I’ll actually be simulating the complex
conjugate of the ideal circuit, but that won’t change the final outcome of
the quantum computation. If my friend flipped a coin to decide whether
to give me the |Y = 1⟩ state or the |Y = −1⟩ state, this too would have no
effect on the fidelity of my simulation. Therefore, it turns out I don’t
need my friend’s help at all; instead of using the |Y = 1⟩ state I would
have received from him, I may use the random state ρ = I/2 (an equally
weighted mixture of |Y = 1⟩ and |Y = −1⟩, which I know how to prepare
myself).
This completes the demonstration that we can simulate a quantum cir-
cuit efficiently and fault tolerantly using the fluxons and chargeons of
a nonabelian superconductor, at least in the case where G is a simple
nonabelian finite group.§ Viewed as a whole, including all state prepara-
tion and calibration of fluxes, the simulation can be described this way:
Many pairs of anyons (fluxons and chargeons) are prepared, the anyon
world lines follow a particular braid, and pairs of anyons are fused to see
whether they will annihilate. The simulation is nondeterministic in the
sense that the actual braid executed by the anyons depends on the out-
comes of measurements performed (via fusion) during the course of the
simulation. It is robust if the temperature is low compared to the energy
gap, and if particles are kept sufficiently far apart from one another (ex-
cept when pairs are being created and fused), to suppress the exchange
of virtual anyons. Small deformations in the world lines of the particles
have no effect on the outcome of the computation, as long as the braiding
of the particles is in the correct topological class.
§ Mochon has shown that universal quantum computation is possible for a larger class
of groups.
1. A list of particle types. The types are labels that specify the possible
values of the conserved charge that a particle can carry.
2. Rules for fusing and splitting, which specify the possible values of the
charge that can be obtained when two particles of known charge
are combined together, and the possible ways in which the charge
carried by a single particle can be split into two parts.
3. Rules for braiding, which specify what happens when two particles are
exchanged (or when one particle is rotated by 2π).
9.12.1 Labels
I will use Latin letters {a, b, c, . . .} for the labels that distinguish different
types of particles. (For the case of the nonabelian superconductor, the
label was (α, R(α)), specifying a conjugacy class and an irreducible rep-
resentation of the normalizer of the class, but now our notation will be
more compact). We will assume that the set of possible labels is finite.
The symbol a represents the value of the conserved charge carried by the
9.12 Anyon models generalized 41
where each N_{ab}^c is a nonnegative integer and the sum is over the complete
set of labels. Note that a, b and c are labels, not vector spaces; the
product on the left-hand side is not a tensor product and the sum on
the right-hand side is not a direct sum. Rather, the fusion rules can be
regarded as an abstract relation on the label set that maps the ordered
triple (a, b; c) to N_{ab}^c. This relation is symmetric in a and b (a × b = b × a)
— the possible charges of the composite do not depend on whether a is on
the left or the right. Read backwards, the fusion rules specify the possible
ways for the charge c to split into two parts with charges a and b.
If N_{ab}^c = 0, then charge c cannot be obtained when we combine a and
b. If N_{ab}^c = 1, then c can be obtained in a unique way. If N_{ab}^c > 1,
fusing two charges can yield a third charge in more than one possible way
should be familiar from group representation theory. For example, the
rule governing the fusion of two octet representations of SU(3) is
\[ 8 \times 8 = 1 + 8 + 8 + 10 + \overline{10} + 27 , \tag{9.66} \]
so that N_{88}^8 = 2. We emphasize again, however, that while the fusion
rules for group representations can be interpreted as a decomposition of a
tensor product of vector spaces as a direct sum of vector spaces, in general
the fusion rules in an anyon model have no such interpretation.
The N_{ab}^c distinguishable ways that c can arise by fusing a and b can
\[ \{ |ab; c, \mu\rangle , \ \mu = 1, 2, \ldots, N_{ab}^c \} . \tag{9.67} \]
It is quite convenient to introduce a graphical notation for the fusion basis
states:
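[Figure: diagrammatic notation for the fusion basis states |ab; c, µ⟩ and their duals ⟨ab; c, µ|, drawn as vertices labeled µ joining lines a, b, and c, together with relations involving sums over c and µ.]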
There are some natural isomorphisms among fusion spaces. First of all,
V_{ab}^c ≅ V_{ba}^c; these vector spaces are associated with different labelings of
the two particles (if a ≠ b) and so should be regarded as distinct, but they
are isomorphic spaces because fusion is symmetric. We may also “raise
and lower indices” of a fusion space by replacing a label by its conjugate,
e.g.,
\[ V_{ab}^{c} \cong V_{a\bar c}^{\bar b} \cong V_{ab\bar c}^{1} \cong V_{a}^{\bar b c} \cong V_{\bar c}^{\bar a \bar b} \cong \cdots ; \tag{9.70} \]
in the diagrammatic notation, we have the freedom to reverse the sense
of a line while conjugating the line’s label. The space V_{ab c̄}^1, represented
as a diagram with three incoming lines, is the space spanned by the
distinguishable ways to obtain the trivial total charge 1 when fusing three
particles with labels a, b, c̄.
The charge 1 deserves its name because it fuses trivially with other
particles:
a×1=a . (9.71)
Because of the isomorphism V_{a1}^a ≅ V_{aā}^1, we conclude that ā is the unique
label that can fuse with a to yield 1, and that this fusion can occur in
only one way. Similarly, V_{a1}^a ≅ V_1^{aā} means that pairs of particles created
out of the vacuum have conjugate charges.
An anyon model is nonabelian if
\[ \dim\left( \bigoplus_c V_{ab}^c \right) = \sum_c N_{ab}^c \ge 2 \tag{9.72} \]
for at least some pair of labels ab; otherwise the model is abelian. In an
abelian model, any two particles fuse in a unique way, but in a nonabelian
model, there are some pairs of particles that can fuse in more than one
way, and there is a Hilbert space of two or more dimensions spanned by
these distinguishable states. We will refer to this space as the “topological
If we choose canonical bases {|ba; c, µ⟩} and {|ab; c, µ′⟩} for these two
spaces, R can be expressed as the unitary matrix
\[ R : |ba; c, \mu\rangle \mapsto \sum_{\mu'} |ab; c, \mu'\rangle \left( R_{ab}^{c} \right)^{\mu'}_{\mu} ; \tag{9.74} \]
note that R may have a nontrivial action on the fusion states. When
we represent the action of R diagrammatically, it is convenient to fix the
positions of the labels a and b on the incoming lines, and twist the lines
counterclockwise as they move toward the fusion vertex (µ): the graph
with twisted lines represents the state in V_{ab}^c obtained by applying R to
|ba; c, µ⟩, which can be expanded in terms of the canonical basis for V_{ab}^c:
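[Figure: the fusion vertex µ with incoming lines a and b twisted counterclockwise, expanded as a sum over µ′ of R-matrix elements times the untwisted vertex µ′ fusing a and b to c.]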
Correspondingly, there are two natural orthonormal bases for V_{abc}^d, which
we may denote
\[ |(ab)c \to d; e\mu\nu\rangle \equiv |ab; e, \mu\rangle \otimes |ec; d, \nu\rangle , \]
\[ |a(bc) \to d; e'\mu'\nu'\rangle \equiv |ae'; d, \nu'\rangle \otimes |bc; e', \mu'\rangle , \tag{9.80} \]
and which are related by a unitary transformation F:
\[ |(ab)c \to d; e\mu\nu\rangle = \sum_{e'\mu'\nu'} |a(bc) \to d; e'\mu'\nu'\rangle \left( F_{abc}^{d} \right)^{e'\mu'\nu'}_{e\mu\nu} . \tag{9.81} \]
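[Figure: the F-move. The tree in which a and b fuse to e (vertex µ) and then e and c fuse to d (vertex ν) is expanded as a sum over e′, µ′, ν′ of F-matrix elements times the tree in which b and c fuse to e′ and then a and e′ fuse to d.]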
The unitary matrices F_{abc}^d are sometimes called fusion matrices; however,
rather than risk causing confusion between F and the fusion rules
N_{ab}^c, I will just call it the F-matrix.
Note that this space does not have a natural decomposition as a tensor
product of subsystems associated with the localized particles; rather, we
have expressed it as a direct sum of many tensor products. For nonabelian
anyons, its dimension
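\[ \dim V^{c}_{a_1 a_2 a_3 \cdots a_n} \equiv N^{c}_{a_1 a_2 a_3 \cdots a_n} = \sum_{b_1, b_2, b_3, \ldots, b_{n-2}} N^{b_1}_{a_1 a_2} N^{b_2}_{b_1 a_3} N^{b_3}_{b_2 a_4} \cdots N^{c}_{b_{n-2} a_n} \tag{9.83} \]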
\[ \{ |a_1 a_2; b_1, \mu_1\rangle |b_1 a_3; b_2, \mu_2\rangle \cdots |b_{n-3} a_{n-1}; b_{n-2}, \mu_{n-2}\rangle |b_{n-2} a_n; c, \mu_{n-1}\rangle \} , \tag{9.84} \]
or in diagrammatic notation:
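[Figure: the standard basis as a fusion tree. Particle a_1 enters from the left, particles a_2, a_3, ..., a_n attach at vertices µ_1, µ_2, ..., µ_{n−1}, the intermediate charges are b_1, b_2, ..., b_{n−2}, and the total charge is c.]
[Figure: a sequence of three moves labeled F, R, and F^{-1}.]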
To reduce the number of subscripts, we will call this space V_{acb}^d, which is
transformed by the exchange as
\[ B : V_{acb}^{d} \to V_{abc}^{d} . \tag{9.86} \]
Let us express the action of B in terms of the standard bases for the two
spaces V_{acb}^d and V_{abc}^d.
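[Figure: the B-move. The standard-basis tree for particles a, c, b with intermediate charge e and total charge d, with lines b and c braided, is expanded as a sum of B-matrix elements times standard-basis trees for a, b, c.]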
To avoid cluttering the equations, I suppress the labels for the fusion
space basis elements (it is obvious where they should go). Hence we write
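\[ B|(ac)b \to d; e\rangle = \sum_f B|a(cb) \to d; f\rangle \left( F_{acb}^{d} \right)^{f}_{e} \]
\[ = \sum_f |a(bc) \to d; f\rangle \, R_{bc}^{f} \left( F_{acb}^{d} \right)^{f}_{e} \]
\[ = \sum_{f,g} |(ab)c \to d; g\rangle \left[ \left( F^{-1} \right)_{abc}^{d} \right]^{g}_{f} R_{bc}^{f} \left( F_{acb}^{d} \right)^{f}_{e} , \tag{9.87} \]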
9.13 Simulating anyons with a quantum circuit 49
or
\[ B : |(ac)b \to d; e\rangle \mapsto \sum_g |(ab)c \to d; g\rangle \left( B_{abc}^{d} \right)^{g}_{e} , \tag{9.88} \]
where
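\[ \left( B_{abc}^{d} \right)^{g}_{e} = \sum_f \left[ \left( F^{-1} \right)_{abc}^{d} \right]^{g}_{f} R_{bc}^{f} \left( F_{acb}^{d} \right)^{f}_{e} . \tag{9.89} \]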
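\[ V^{1}_{a_1 a_2 a_3 \cdots a_n} \cong \bigoplus_{b_1, b_2, \ldots, b_{n-3}} V^{b_1}_{a_1 a_2} \otimes V^{b_2}_{b_1 a_3} \otimes \cdots \otimes V^{\bar a_n}_{b_{n-3} a_{n-1}} , \tag{9.90} \]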
where
\[ H_d = \bigoplus_{a,b,c} V^{1}_{abc} . \tag{9.92} \]
Here, a, b, c are summed over the complete label set of the model (which
we have assumed is finite), so that H_d contains all the possible fusion
states of three particles, and the dimension d of H_d is
\[ d = \sum_{a,b,c} N^{1}_{abc} . \tag{9.93} \]
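[Figure: circuit-simulation moves on neighboring fusion spaces with labels a, b, d, e, f, g: a B-move expanded as a sum over g, and an F-move expanded as a sum over g in which the pair (be) fuses to g.]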
we have separated the sum over g into the component for which (be) fuses
to 1, plus the remainder. After the F-move (which is just a particular
two-qudit unitary gate), we can sample the probability that (be) fuses to
1 by performing a projective measurement of the second qudit in the basis
{|b, ḡ, e⟩}, and recording whether g = 1.
This completes our demonstration that a quantum circuit can simulate
efficiently a topological quantum computer.
when two anyons are brought together they either annihilate, or fuse to
become a single anyon. The model is nonabelian because two anyons can
fuse in two distinguishable ways.
Consider the standard basis for the Hilbert space V^b_{1^n} of n anyons, where
each basis element describes a distinguishable way in which the n anyons
could fuse to give total charge b ∈ {0, 1}. If the two anyons furthest to
the left were fused first, the resulting charge could be 0 or 1; this charge
could then fuse with the third anyon, yielding a total charge of 0 or 1,
and so on. Finally, the last anyon fuses with the total charge of the first
n − 1 anyons to give the total charge b. Altogether n − 2 intermediate
charges b_1, b_2, b_3, ..., b_{n−2} appear in this description of the fusion process;
thus the corresponding basis element can be designated with a binary
string of length n − 2. If the total charge is 0, the result of fusing the
9.15 Quantum dimension 53
\[ N_n^0 = N_{n-1}^0 + N_{n-2}^0 . \tag{9.97} \]
n     = 1  2  3  4  5  6  7  8   9   ...
N_n^0 = 0  1  1  2  3  5  8  13  21  ...        (9.98)
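The table (9.98) can be reproduced in two independent ways: from the recursion (9.97), and by counting fusion paths directly with the Fibonacci fusion rules (0 × 1 = 1 and 1 × 1 = 0 + 1). The sketch below is illustrative; the function names are hypothetical.

```python
def count_by_recursion(n):
    """N_n^0 from N_n^0 = N_{n-1}^0 + N_{n-2}^0 with N_1^0 = 0, N_2^0 = 1."""
    prev, curr = 0, 1          # N_1^0, N_2^0
    if n == 1:
        return prev
    for _ in range(n - 2):
        prev, curr = curr, prev + curr
    return curr

def count_by_fusion_paths(n, total=0):
    """Count the ways n anyons of type 1 can fuse to the given total charge.

    counts[c] is the number of distinguishable ways the anyons fused so far
    can have charge c. Fusing one more anyon uses 0 x 1 = 1 and 1 x 1 = 0 + 1.
    """
    counts = {0: 0, 1: 1}      # a single anyon carries charge 1
    for _ in range(n - 1):
        counts = {0: counts[1], 1: counts[0] + counts[1]}
    return counts[total]

table_recursion = [count_by_recursion(n) for n in range(1, 10)]
table_paths = [count_by_fusion_paths(n) for n in range(1, 10)]
print(table_recursion)
print(table_paths)
```

Passing total=1 to count_by_fusion_paths gives the corresponding count for total charge 1.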
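[Figure: world-line diagrams for particles of type a, relating a zigzag in a world line to a straight line with a factor involving d_a.]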
If two pairs are created and then each pair annihilates immediately, the
world lines of the pairs form two closed loops, and |R| counts the number
of distinct “colors” that propagate around each loop. But if the particle
from each pair annihilates the antiparticle from the other pair, there is
only one closed loop and therefore one sum over colors; if we normalize
the process on the left to unity, the amplitude for the process on the right
is suppressed by a factor of 1/|R|. To say the same thing in an equation,
the normalized state of an RR̄ pair is
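\[ |R\bar R\rangle = \frac{1}{\sqrt{|R|}} \sum_i |i\rangle |\bar i\rangle , \tag{9.99} \]
where {|i⟩} denotes an orthonormal basis for R and {|ī⟩} is a basis for R̄.
Suppose that two pairs |RR̄⟩ and |R′R̄′⟩ are created; if the pairs are fused
after swapping partners, the amplitude for annihilation is
\[ \langle R\bar R, R'\bar R' | R\bar R', R'\bar R \rangle = \frac{1}{|R|^2} \sum_{i,i',j,j'} \langle j\bar j, j'\bar j' | i\bar i', i'\bar i \rangle \]
\[ = \frac{1}{|R|^2} \sum_{i,i',j,j'} \delta_{ji} \delta_{ji'} \delta_{j'i'} \delta_{j'i} = \frac{1}{|R|^2} \sum_i \delta_{ii} = \frac{1}{|R|} . \tag{9.100} \]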
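The index contraction in eq. (9.100) can be checked by brute force. In this sketch, slots 1 and 3 hold the particles and slots 2 and 4 the antiparticles, and a basis state of R̄ is labeled by the same index as its partner in R; the function name and this encoding are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import product

def swapped_partner_amplitude(dim):
    """Overlap of two created pairs with the partner-swapped pairing.

    Created pairs: (slot 1, slot 2) and (slot 3, slot 4).
    Fused pairs:   (slot 1, slot 4) and (slot 3, slot 2).
    Each normalized pair state contributes a factor 1/sqrt(dim), so each
    two-pair state carries an overall factor 1/dim.
    """
    amplitude = Fraction(0)
    for i1, i2, i3, i4 in product(range(dim), repeat=4):
        created = Fraction(int(i1 == i2) * int(i3 == i4), dim)
        fused = Fraction(int(i1 == i4) * int(i3 == i2), dim)
        amplitude += created * fused
    return amplitude

amplitudes = {dim: swapped_partner_amplitude(dim) for dim in range(1, 5)}
for dim, amp in amplitudes.items():
    print(dim, amp)
```

Exact rational arithmetic is used so that the comparison with 1/|R| involves no rounding.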
fusing the particle from each pair with the antiparticle from the neighboring
pair. However, each zigzag reduces the amplitude by another factor
of 1/d_a. We can compensate for these factors of 1/d_a if we weight each
pair creation or annihilation event by a factor of √d_a. With this new
convention, we can bend the world line of a particle forward or backward
in time without paying any penalty:
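[Figure: world lines of a particle of type a bent forward and backward in time, with factors of √d_a at each creation or annihilation event; and diagrams for particles a and b with loop weights d_a and d_b, expanded as sums over the fusion channel c and vertex label µ involving N_{ab}^c and d_c.]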
\[ N_a \vec d = d_a \vec d . \tag{9.102} \]
\[ N_a = |v\rangle d_a \langle v| + \cdots , \tag{9.104} \]
where
\[ |v\rangle = \frac{\vec d}{D} , \qquad D = \sqrt{\sum_c d_c^2} , \tag{9.105} \]
where the ellipsis represents terms that are exponentially suppressed for
large n. We see that the quantum dimension da controls the rate of growth
of the n-particle Hilbert space for anyons of type a.
Because the label 0 with trivial charge fuses trivially, we have d_0 = 1. In
the case of the Fibonacci model, it follows from the fusion rule 1 × 1 = 0 + 1
that d_1^2 = 1 + d_1, which is solved by d_1 = φ as we found earlier; therefore
D^2 = d_0^2 + d_1^2 = 1 + φ^2 = 2 + φ. Our formula becomes
\[ N^{0}_{111\cdots 1} = \frac{1}{2+\varphi} \, \varphi^{n} , \tag{9.107} \]
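A short numerical comparison shows how closely φ^n/(2 + φ) tracks the exact Fibonacci counts from the table (9.98). This is a sketch for illustration; the function name fusion_count and the sampled values of n are arbitrary choices.

```python
import math

phi = (1 + math.sqrt(5)) / 2           # golden ratio, solves d^2 = 1 + d

def fusion_count(n):
    """Exact N_n^0 from the recursion N_n^0 = N_{n-1}^0 + N_{n-2}^0."""
    prev, curr = 0, 1                  # N_1^0, N_2^0
    if n == 1:
        return prev
    for _ in range(n - 2):
        prev, curr = curr, prev + curr
    return curr

D_squared = 1 + phi ** 2               # d_0^2 + d_1^2

def estimate(n):
    """Leading-order estimate phi^n / D^2 for the number of fusion states."""
    return phi ** n / D_squared

for n in (5, 10, 20):
    print(n, fusion_count(n), estimate(n))
```

The difference between the exact count and the estimate shrinks as n grows, which is the sense in which the remaining terms are exponentially suppressed.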
[Figure: diagrammatic evaluation of the probability p(ab → c) that particles a and b fuse to c, expressed in terms of N_{ab}^c, d_a, d_b, and d_c.]
Using
\[ \sum_a N_{ab}^{c} d_a = \sum_a N_{b\bar c}^{\bar a} d_{\bar a} = d_b d_{\bar c} = d_b d_c , \tag{9.111} \]
we can easily verify that this condition is satisfied by
\[ p_a = \frac{d_a^2}{D^2} . \tag{9.112} \]
We conclude that if anyons are created in a random process, those carrying
labels with larger quantum dimension are more likely to be produced, in
keeping with the property that anyons with larger dimension have more
quantum states.
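For the Fibonacci model these statements are easy to test numerically. The sketch below checks eq. (9.111) and then checks that p_a = d_a^2/D^2 is reproduced when two independently drawn anyons are fused. Two things here are assumptions not shown in the surrounding text: the fusion probability p(ab → c) = N_{ab}^c d_c/(d_a d_b), and the form of the stationarity condition p_c = Σ_{a,b} p_a p_b p(ab → c).

```python
import math

phi = (1 + math.sqrt(5)) / 2
labels = (0, 1)
d = [1.0, phi]                          # quantum dimensions d_0, d_1
D_squared = sum(x * x for x in d)

# Fibonacci fusion multiplicities N[a][b][c]: 0 x a = a, 1 x 1 = 0 + 1.
N = [[[1, 0], [0, 1]],
     [[0, 1], [1, 1]]]

# Eq. (9.111): sum_a N_ab^c d_a = d_b d_c for every b, c.
identity_holds = all(
    abs(sum(N[a][b][c] * d[a] for a in labels) - d[b] * d[c]) < 1e-12
    for b in labels for c in labels)

p = [x * x / D_squared for x in d]      # eq. (9.112)

def fusion_probability(a, b, c):
    """Assumed form p(ab -> c) = N_ab^c d_c / (d_a d_b)."""
    return N[a][b][c] * d[c] / (d[a] * d[b])

# Distribution of the fusion product of two independently drawn anyons.
p_after = [sum(p[a] * p[b] * fusion_probability(a, b, c)
               for a in labels for b in labels) for c in labels]

stationary = all(abs(p_after[c] - p[c]) < 1e-12 for c in labels)
print(identity_holds, stationary, p)
```

With these assumptions the distribution is unchanged by fusion, and the label with the larger quantum dimension is the more probable one.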
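[Figure: the pentagon diagram. Five fusion trees for particles 1, 2, 3, 4 with total charge 5 are connected by F-moves: two F-moves across the top, through intermediate labels a, c, and three F-moves across the bottom, through intermediate labels b, d, e.]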
The basis shown furthest to the left in this pentagon diagram is the “left
standard basis” {|left; a, bi}, in which particles 1 and 2 are fused first,
the resulting charge a is fused with particle 3 to yield charge b, and then
finally b is fused with particle 4 to yield the total charge 5. The basis
shown furthest to the right is the “right standard basis” {|right; c, di}, in
which the particles are fused from right to left instead of left to right.
Across the top of the pentagon, these two bases are related by two F -
9.16 Pentagon and hexagon equations 59
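Across the bottom of the pentagon, the bases are related by three F-moves,
and we find
\[ |\text{left}; a, b\rangle = \sum_{c,d,e} |\text{right}; c, d\rangle \left( F_{234}^{d} \right)^{c}_{e} \left( F_{1e4}^{5} \right)^{d}_{b} \left( F_{123}^{b} \right)^{e}_{a} . \tag{9.114} \]
Equating our two expressions for |left; a, b⟩, we obtain the pentagon equation:
\[ \left( F_{12c}^{5} \right)^{d}_{a} \left( F_{a34}^{5} \right)^{c}_{b} = \sum_e \left( F_{234}^{d} \right)^{c}_{e} \left( F_{1e4}^{5} \right)^{d}_{b} \left( F_{123}^{b} \right)^{e}_{a} . \tag{9.115} \]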
Another nontrivial consistency condition is found by considering the
various ways that three particles can fuse:
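[Figure: the hexagon diagram. Six fusion trees for three particles with total charge 4, in the orderings 123, 213, and 231, are connected by the moves F, R, F across the top (through the intermediate label b) and R, F, R across the bottom (through the intermediate labels a and c).]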
The basis {|left; a⟩} furthest to the left in this hexagon diagram is obtained
if the particles are arranged in the order 123, and particles 1 and 2 are
fused first, while the basis {|right; c⟩} furthest to the right is obtained if
the particles are arranged in order 231, and particles 1 and 3 are fused
first. Across the top of the hexagon, the two bases are related by the
sequence of moves F RF:
\[ |\text{left}; a\rangle = \sum_{b,c} |\text{right}; c\rangle \left( F_{231}^{4} \right)^{c}_{b} R_{1b}^{4} \left( F_{123}^{4} \right)^{b}_{a} . \tag{9.116} \]
Across the bottom of the hexagon, the bases are related by the sequence
of moves RF R, and we find
\[ |\text{left}; a\rangle = \sum_{c} |\text{right}; c\rangle \, R_{13}^{c} \left( F_{213}^{4} \right)^{c}_{a} R_{12}^{a} . \tag{9.117} \]
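Equating our two expressions for |left; a⟩, we obtain the hexagon equation:
\[ R_{13}^{c} \left( F_{213}^{4} \right)^{c}_{a} R_{12}^{a} = \sum_{b} \left( F_{231}^{4} \right)^{c}_{b} R_{1b}^{4} \left( F_{123}^{4} \right)^{b}_{a} . \tag{9.118} \]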
A beautiful theorem, which I will not prove here, says that there are
no further conditions that must be imposed to ensure the consistency of
braiding and fusing. That is, for any choice of an initial and final basis
for n anyons, all sequences of R-moves and F -moves that take the initial
basis to the final basis yield the same isomorphism, provided that the
pentagon equation and hexagon equation are satisfied. This theorem is
an instance of the MacLane coherence theorem, a fundamental result in
category theory. The pentagon and hexagon equations together are called
the Moore-Seiberg polynomial equations — their relevance to physics was
first appreciated in studies of (1+1)-dimensional conformal field theory
during the 1980’s.
A solution to the polynomial equations defines a viable anyon model.
Therefore, there is a systematic procedure for constructing anyon models:
\[ (F_c)^{d}_{a} (F_a)^{c}_{b} = \sum_e (F_d)^{c}_{e} (F_e)^{d}_{b} (F_b)^{e}_{a} . \tag{9.121} \]
\[ \tau^2 + \tau = 1 . \tag{9.123} \]
The only other solution is the complex conjugate of this one; this second
solution really describes the same model, but with clockwise and coun-
terclockwise braiding interchanged. Therefore, an anyon model with the
Fibonacci fusion rule really does exist, and it is essentially unique.
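The condition τ^2 + τ = 1 of eq. (9.123) is what makes the nontrivial Fibonacci F-matrix unitary. The sketch below assumes the commonly quoted form F = [[τ, √τ], [√τ, −τ]]; that explicit matrix is an assumption here, since it does not appear in the surrounding text.

```python
import math

# Positive root of tau^2 + tau = 1, which is the inverse of the golden ratio.
tau = (math.sqrt(5) - 1) / 2
phi = (1 + math.sqrt(5)) / 2

# Assumed standard form of the nontrivial Fibonacci F-matrix.
F = [[tau, math.sqrt(tau)],
     [math.sqrt(tau), -tau]]

def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# F is real and symmetric, so unitarity is equivalent to F F = I.
FF = matmul(F, F)
identity = [[1.0, 0.0], [0.0, 1.0]]
is_unitary = all(abs(FF[i][j] - identity[i][j]) < 1e-12
                 for i in range(2) for j in range(2))

print(tau, is_unitary)
```

The diagonal entries of F F are τ^2 + τ, so they equal 1 exactly when eq. (9.123) holds.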
9.18 Epilogue
That is as far as I got in class. I will mention briefly here a few other
topics that I might have covered if I had not run out of time.
\[ j_1 \times j_2 = \sum_{j=|j_2 - j_1|} j . \tag{9.128} \]
9.18.2 S-matrix
The modular S-matrix of an anyon model can be defined in terms of two
anyon world lines that form a Hopf link:
[Figure: S_a^b is defined as 1/D times the value of a Hopf link formed by the world lines of a and b.]
Here D is the total quantum dimension of the model, and we have used
the normalization where unlinked loops would have the value d_a d_b; then
the matrix S_{ab} is symmetric and unitary. In abelian anyon models, the
Hopf link arose in our discussion of topological degeneracy, where we
characterized how the vacuum state of an anyon model on the torus is
affected when an anyon is transported around one of the cycles of the
torus. The S-matrix has a similar interpretation in the nonabelian case.
By elementary reasoning, S can be related to the fusion rules:
\[ (N_a)_b^{c} = \sum_d S_b^{d} \, \frac{S_a^{d}}{S_1^{d}} \left( S^{-1} \right)_d^{c} ; \tag{9.133} \]
9.19 Bibliographical notes 65
that is, the S-matrix simultaneously diagonalizes all the matrices {N_a}
(the Verlinde relation). Note that it follows from the definition that S_1^a =
d_a/D.
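The Verlinde relation can be tested on the Fibonacci model. The sketch below assumes the commonly quoted Fibonacci S-matrix S = (1/D) [[1, φ], [φ, −1]], which is not given in the surrounding text; index 0 in the code denotes the trivial label (written 1 in eq. (9.133)).

```python
import math

phi = (1 + math.sqrt(5)) / 2
D = math.sqrt(2 + phi)                  # total quantum dimension
d = [1.0, phi]                          # quantum dimensions

# Assumed Fibonacci S-matrix; index 0 is the trivial label.
S = [[1 / D, phi / D],
     [phi / D, -1 / D]]

def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# S is real and symmetric, so unitarity means S S = I and S^{-1} = S.
SS = matmul(S, S)
is_unitary = all(abs(SS[i][j] - (1.0 if i == j else 0.0)) < 1e-12
                 for i in range(2) for j in range(2))

def verlinde(a):
    """Fusion matrix (N_a)_b^c rebuilt from S via eq. (9.133)."""
    return [[sum(S[b][k] * (S[a][k] / S[0][k]) * S[k][c] for k in range(2))
             for c in range(2)] for b in range(2)]

N1 = verlinde(1)
expected_N1 = [[0, 1], [1, 1]]          # 1 x 0 = 1 and 1 x 1 = 0 + 1
matches_fusion_rules = all(abs(N1[b][c] - expected_N1[b][c]) < 1e-12
                           for b in range(2) for c in range(2))
first_row_is_d_over_D = all(abs(S[0][a] - d[a] / D) < 1e-12 for a in range(2))

print(is_unitary, matches_fusion_rules, first_row_is_d_over_D)
```

Calling verlinde(0) returns the identity matrix, as expected for fusion with the trivial label.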
where the sum is over the complete label set of the anyon model, and
e^{2πiJ_a} = R^1_{aā} is the topological spin of the label a. This expression
relates the quantity c_−, characteristic of the edge theory, to the quantum
dimensions and topological spins of the bulk theory, but determines c_−
only modulo 8. Therefore, at least in principle, there can be multiple edge
theories corresponding to a single theory of anyons in the bulk.
was discussed in [4, 5]. My discussion of the universal gate set is based
on [6], where more general models are also discussed. Other schemes,
that make more extensive use of electric charges and that are universal
for smaller groups (like S3 ) are described in [7].
Diagrammatic methods, like those I used in the discussion of the quan-
tum dimension, are extensively applied to derive properties of anyons in
[8]. The role of the polynomial equations (pentagon and hexagon equa-
tions) in (1+1)-dimensional conformal field theory is discussed in [9].
Simulation of anyons using a quantum circuit is discussed in [10].
Simulation of a universal quantum computer using the anyons of the SU(2)_{k=3}
Chern-Simons theory is discussed in [11]. That the Yang-Lee model is
also universal was pointed out in [12].
I did not discuss physical implementations in my lectures, but I list a
few relevant references here anyway: Ideas about realizing abelian and
nonabelian anyons using superconducting Josephson-junction arrays are
discussed in [13]. A spin model with nearest-neighbor interactions that
has nonabelian anyons (though not ones that are computationally univer-
sal) is proposed and solved in [14], and a proposal for realizing this model
using cold atoms trapped in an optical lattice is described in [15]. Some
ideas about realizing the (computationally universal) $SU(2)_{k=3}$ model in
a system of interacting electrons are discussed in [16].
Much of my understanding of the theory of computing with nonabelian
anyons was derived from many helpful discussions with Alexei Kitaev.
References
John Preskill
Institute for Quantum Information and Matter
California Institute of Technology
Contents
Preface
10 Quantum Shannon Theory
10.1 Shannon for Dummies
10.1.1 Shannon entropy and data compression
10.1.2 Joint typicality, conditional entropy, and mutual information
10.1.3 Distributed source coding
10.1.4 The noisy channel coding theorem
10.2 Von Neumann Entropy
10.2.1 Mathematical properties of H(ρ)
10.2.2 Mixing, measurement, and entropy
10.2.3 Strong subadditivity
10.2.4 Monotonicity of mutual information
10.2.5 Entropy and thermodynamics
10.2.6 Bekenstein’s entropy bound
10.2.7 Entropic uncertainty relations
10.3 Quantum Source Coding
10.3.1 Quantum compression: an example
10.3.2 Schumacher compression in general
10.4 Entanglement Concentration and Dilution
10.5 Quantifying Mixed-State Entanglement
10.5.1 Asymptotic irreversibility under LOCC
10.5.2 Squashed entanglement
10.5.3 Entanglement monogamy
10.6 Accessible Information
10.6.1 How much can we learn from a measurement?
10.6.2 Holevo bound
10.6.3 Monotonicity of Holevo χ
10.6.4 Improved distinguishability through coding: an example
10.6.5 Classical capacity of a quantum channel
10.6.6 Entanglement-breaking channels
10.7 Quantum Channel Capacities and Decoupling
10.7.1 Coherent information and the quantum channel capacity
10.7.2 The decoupling principle
10.7.3 Degradable channels
This is the 10th and final chapter of my book Quantum Information, based on the course
I have been teaching at Caltech since 1997. An early version of this chapter (originally
Chapter 5) has been available on the course website since 1998, but this version is
substantially revised and expanded.
The level of detail is uneven, as I’ve aimed to provide a gentle introduction, but I’ve
also tried to avoid statements that are incorrect or obscure. Generally speaking, I chose
to include topics that are both useful to know and relatively easy to explain; I had to
leave out a lot of good stuff, but on the other hand the chapter is already quite long.
My version of Quantum Shannon Theory is no substitute for the more careful treat-
ment in Wilde’s book [1], but it may be more suitable for beginners. This chapter
contains occasional references to earlier chapters in my book, but I hope it will be in-
telligible when read independently of other chapters, including the chapter on quantum
error-correcting codes.
This is a working draft of Chapter 10, which I will continue to update. See the URL
on the title page for further updates and drafts of other chapters. Please send an email
to [email protected] if you notice errors.
Eventually, the complete book will be published by Cambridge University Press. I
hesitate to predict the publication date — they have been far too patient with me.
10
Quantum Shannon Theory
A recurring theme unites these topics — the properties, interpretation, and applications
of Von Neumann entropy.
My goal is to introduce some of the main ideas and tools of quantum Shannon theory,
but there is a lot we won’t cover. For example, we will mostly consider information theory
in an asymptotic setting, where the same quantum channel or state is used arbitrarily
many times, thus focusing on issues of principle rather than more practical questions
about devising efficient protocols.
1. How much can a message be compressed; i.e., how redundant is the information?
This question is answered by the “source coding theorem,” also called the “noiseless
coding theorem.”
2. At what rate can we communicate reliably over a noisy channel; i.e., how much
redundancy must be incorporated into a message to protect against errors? This
question is answered by the “noisy channel coding theorem.”
Both questions concern redundancy – how unexpected is the next letter of the message,
on the average. One of Shannon’s key insights was that entropy provides a suitable way
to quantify redundancy.
I call this section “Shannon for Dummies” because I will try to explain Shannon’s ideas
quickly, minimizing distracting details. That way, I can compress classical information
theory to about 14 pages.
Since the letters are statistically independent, and each is produced by consulting the
same probability distribution X, we say that the letters are independent and identically
distributed, abbreviated i.i.d. We’ll use $X^n$ to denote the ensemble of n-letter messages in
which each letter is generated independently by sampling from X, and $\vec{x} = (x_1 x_2 \ldots x_n)$
to denote a string of bits.
Now consider long n-letter messages, $n \gg 1$. We ask: is it possible to compress the
message to a shorter string of letters that conveys essentially the same information? The
answer is: Yes, it’s possible, unless the distribution X is uniformly random.
If the alphabet is binary, then each letter is either 0 with probability 1 − p or 1 with
probability p, where 0 ≤ p ≤ 1. For n very large, the law of large numbers tells us that
typical strings will contain about n(1 − p) 0’s and about np 1’s.
The number of distinct strings of this form is of order the binomial coefficient $\binom{n}{np}$, and from the Stirling
approximation $\log n! = n \log n - n + O(\log n)$ we obtain
\[
\begin{aligned}
\log \binom{n}{np} &= \log \frac{n!}{(np)!\,(n(1-p))!} \\
&\approx n \log n - n - \bigl(np \log np - np + n(1-p) \log n(1-p) - n(1-p)\bigr) \\
&= nH(p),
\end{aligned} \tag{10.4}
\]
where
H(p) = −p log p − (1 − p) log(1 − p) (10.5)
is the entropy function.
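As a quick numerical check of eq. (10.4), the sketch below compares the exact per-letter logarithm of the binomial coefficient with H(p). It is only an illustration under stated assumptions: the function names are hypothetical, logarithms are taken base 2 so that both quantities are in bits, and np is rounded to the nearest integer.

```python
import math


def binary_entropy(p: float) -> float:
    """H(p) = -p log2 p - (1-p) log2(1-p), in bits, with 0 log 0 taken as 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def log2_binomial_per_letter(n: int, p: float) -> float:
    """(1/n) log2 C(n, round(n p)), using the exact integer binomial coefficient."""
    k = round(n * p)
    return math.log2(math.comb(n, k)) / n


# The gap between the two columns shrinks like O(log n / n), as Stirling suggests.
for n in (10, 100, 1000, 10000):
    print(n, log2_binomial_per_letter(n, 0.1), binary_entropy(0.1))
```

The exact value always sits slightly below H(p); the difference is the O(log n) correction that the Stirling approximation drops.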
In this derivation we used the Stirling approximation in the appropriate form for
natural logarithms. But from now on we will prefer to use logarithms with base 2, which
is the Shannon entropy (or simply entropy) of the ensemble X = {x, p(x)}. Adopting a
block code that assigns integers to the typical sequences, the information in a string of
n letters can be compressed to about nH(X) bits. In this sense a letter x chosen from
the ensemble carries, on the average, H(X) bits of information.
It is useful to restate this reasoning more carefully using the strong law of large
numbers, which asserts that a sample average for a random variable almost certainly
converges to its expected value in the limit of many trials. If we sample from the dis-
tribution Y = {y, p(y)} n times, let yi , i ∈ {1, 2, . . ., n} denote the ith sample, and
let
\[ \mu[Y] = \langle y \rangle = \sum_y y \, p(y) \tag{10.8} \]
denote the expected value of y. Then for any positive ε and δ there is a positive integer
N such that
\[ \left| \frac{1}{n} \sum_{i=1}^{n} y_i - \mu[Y] \right| \le \delta \tag{10.9} \]
with probability at least 1 − ε for all n ≥ N . We can apply this statement to the random
variable log2 p(x). Let us say that a sequence of n letters is δ-typical if
\[ H(X) - \delta \le -\frac{1}{n} \log_2 p(x_1 x_2 \ldots x_n) \le H(X) + \delta; \tag{10.10} \]
then the strong law of large numbers says that for any ε, δ > 0 and n sufficiently large,
an n-letter sequence will be δ-typical with probability ≥ 1 − ε.
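The approach of the typical-set probability to one can be seen numerically. The following sketch assumes a binary i.i.d. source with P(1) = p (an illustrative choice, as are the function names) and computes the probability of the δ-typical set exactly, by grouping strings according to their number of 1’s.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy of a binary source, in bits (requires 0 < p < 1)."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def typical_probability(n: int, p: float, delta: float) -> float:
    """Probability that an n-letter string from the binary source is delta-typical."""
    h = binary_entropy(p)
    total = 0.0
    for k in range(n + 1):
        # Every string with k ones has probability p^k (1-p)^(n-k),
        # hence the same value of -(1/n) log2 p(x).
        surprisal = -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n
        if abs(surprisal - h) <= delta:
            # Binomial weight evaluated in log space to avoid overflow.
            log_weight = (math.lgamma(n + 1) - math.lgamma(k + 1)
                          - math.lgamma(n - k + 1)
                          + k * math.log(p) + (n - k) * math.log(1 - p))
            total += math.exp(log_weight)
    return total


for n in (100, 500, 2000):
    print(n, typical_probability(n, 0.2, 0.05))
```

For fixed δ the printed probability climbs toward 1 as n grows, which is the content of the statement above.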
Since each δ-typical n-letter sequence $\vec{x}$ occurs with probability $p(\vec{x})$ satisfying
\[ p_{\min} = 2^{-n(H+\delta)} \le p(\vec{x}) \le 2^{-n(H-\delta)} = p_{\max}, \tag{10.11} \]
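we may infer upper and lower bounds on the number $N_{\rm typ}(\varepsilon, \delta, n)$ of typical sequences:
\[ N_{\rm typ}\, p_{\min} \le \sum_{\text{typical } x} p(x) \le 1, \qquad N_{\rm typ}\, p_{\max} \ge \sum_{\text{typical } x} p(x) \ge 1 - \varepsilon, \tag{10.12} \]
implies
\[ 2^{n(H+\delta)} \ge N_{\rm typ}(\varepsilon, \delta, n) \ge (1 - \varepsilon)\, 2^{n(H-\delta)}. \tag{10.13} \]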
Therefore, we can encode all typical sequences using a block code with length n(H + δ)
bits. That way, any message emitted by the source can be compressed and decoded
successfully as long as the message is typical; the compression procedure achieves a
success probability psuccess ≥ 1 − ε, no matter how the atypical sequences are decoded.
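The bounds in eq. (10.13) can be checked by direct counting for a small example. This is a minimal sketch assuming a binary source with P(1) = p; the function name and the parameter values are hypothetical, and 1 − ε is taken to be the exact probability of the typical set.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy of a binary source, in bits (requires 0 < p < 1)."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def typical_set_stats(n: int, p: float, delta: float):
    """Return (number of delta-typical strings, total probability of those strings)."""
    h = binary_entropy(p)
    count, prob = 0, 0.0
    for k in range(n + 1):
        surprisal = -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n
        if abs(surprisal - h) <= delta:
            c = math.comb(n, k)  # number of strings with exactly k ones
            count += c
            prob += math.exp(math.log(c) + k * math.log(p) + (n - k) * math.log(1 - p))
    return count, prob


n, p, delta = 200, 0.3, 0.05
count, prob = typical_set_stats(n, p, delta)
h = binary_entropy(p)
upper_bits = n * (h + delta)                    # log2 of the upper bound in (10.13)
lower_bits = math.log2(prob) + n * (h - delta)  # log2 of the lower bound, with 1 - eps = prob
print(lower_bits, math.log2(count), upper_bits)
```

A block code that simply numbers the typical strings therefore needs no more than about n(H + δ) bits, as stated above.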
What if we try to compress the message even further, say to $H(X) - \delta'$ bits per letter,
where $\delta'$ is a constant independent of the message length n? Then we’ll run into trouble,
because there won’t be enough codewords to cover all the typical messages, and we
won’t be able to decode the compressed message with negligible probability of error.
The probability $p_{\rm success}$ of successfully decoding the message will be bounded above by
\[ p_{\rm success} \le 2^{n(H-\delta')}\, 2^{-n(H-\delta)} + \varepsilon = 2^{-n(\delta'-\delta)} + \varepsilon; \tag{10.14} \]
we can correctly decode only $2^{n(H-\delta')}$ typical messages, each occurring with probability
no higher than $2^{-n(H-\delta)}$; we add ε, an upper bound on the probability of an atypical
message, allowing optimistically for the possibility that we somehow manage to decode
the atypical messages correctly. Since we may choose ε and δ as small as we please, this
success probability becomes small as n → ∞, if $\delta'$ is a positive constant.
The number of bits per letter encoding the compressed message is called the rate of
the compression code, and we say a rate R is achievable asymptotically (as n → ∞) if
there is a sequence of codes with rate at least R and error probability approaching zero
in the limit of large n. To summarize our conclusion, we have found that
Compression Rate = H(X) + o(1) is achievable,
Compression Rate = H(X) − Ω(1) is not achievable, (10.15)
where o(1) denotes a positive quantity which may be chosen as small as we please, and
Ω(1) denotes a positive constant. This is Shannon’s source coding theorem.
We have not discussed at all the details of the compression code. We might imagine
a huge lookup table which assigns a unique codeword to each message and vice versa,
but because such a table has size exponential in n it is quite impractical for compressing
and decompressing long messages. It is fascinating to study how to make the coding
and decoding efficient while preserving a near optimal rate of compression, and quite
important, too, if we really want to compress something. But this practical aspect of
classical compression theory is beyond the scope of this book.
and similarly for Y . If X and Y are correlated, then by reading a message generated
by $Y^n$ I reduce my ignorance about a message generated by $X^n$, which should make it
possible to compress the output of X further than if I did not have access to Y .
To make this idea more precise, we use the concept of jointly typical sequences. Sampling
from the distribution $X^n Y^n$, that is, sampling n times from the joint distribution
XY , produces a message $(\vec{x}, \vec{y}) = (x_1 x_2 \ldots x_n, y_1 y_2 \ldots y_n)$ with probability
Then, applying the strong law of large numbers simultaneously to the three distributions
$X^n$, $Y^n$, and $X^n Y^n$, we infer that for ε, δ > 0 and n sufficiently large, a sequence drawn
from $X^n Y^n$ will be δ-typical with probability ≥ 1 − ε. Using Bayes’ rule, we can then
obtain upper and lower bounds on the conditional probability $p(\vec{x}|\vec{y})$ for jointly typical
sequences:
\[ H(X|Y) = H(XY) - H(Y) = \langle -\log p(x, y) + \log p(y) \rangle = \langle -\log p(x|y) \rangle, \tag{10.20} \]
$H(X|Y)$ is the number of additional bits per letter needed to specify both $\vec{x}$ and $\vec{y}$ once
$\vec{y}$ is known. Similarly, $H(Y|X)$ is the number of additional bits per letter needed to
specify both $\vec{x}$ and $\vec{y}$ when $\vec{x}$ is known.
The information about X that I gain when I learn Y is quantified by how much the
number of bits per letter needed to specify X is reduced when Y is known. This reduction is
I(X; Y ) ≡ H(X) − H(X|Y )
= H(X) + H(Y ) − H(XY )
= H(Y ) − H(Y |X), (10.21)
which is called the mutual information. The mutual information I(X; Y ) quantifies how
X and Y are correlated, and is symmetric under interchange of X and Y : I find out
as much about X by learning Y as about Y by learning X. Learning Y never reduces
my knowledge of X, so I(X; Y ) is obviously nonnegative, and indeed the inequality
H(X) ≥ H(X|Y ) ≥ 0 follows easily from the concavity of the log function.
Of course, if X and Y are completely uncorrelated, we have p(x, y) = p(x)p(y), and
\[ I(X; Y) \equiv \left\langle \log \frac{p(x, y)}{p(x)p(y)} \right\rangle = 0; \tag{10.22} \]
we don’t find out anything about X by learning Y if there is no correlation between X
and Y .
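The entropic quantities of eqs. (10.20) to (10.22) are easy to evaluate for a small joint distribution. The sketch below is illustrative only: the joint distribution is represented as a dictionary keyed by (x, y) pairs, and the function names and example distributions are assumptions, not notation from the text.

```python
import math
from collections import defaultdict


def entropy(dist) -> float:
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(q * math.log2(q) for q in dist.values() if q > 0.0)


def marginals(joint):
    """Marginal distributions of X and Y from a joint {(x, y): probability}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), q in joint.items():
        px[x] += q
        py[y] += q
    return dict(px), dict(py)


def conditional_entropy(joint) -> float:
    """H(X|Y) = H(XY) - H(Y)."""
    _, py = marginals(joint)
    return entropy(joint) - entropy(py)


def mutual_information(joint) -> float:
    """I(X;Y) = H(X) + H(Y) - H(XY)."""
    px, py = marginals(joint)
    return entropy(px) + entropy(py) - entropy(joint)


# X is a fair bit and Y is a copy of X that is flipped with probability 0.1.
correlated = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
# X and Y independent: the joint distribution is a product.
independent = {(x, y): 0.5 * 0.5 for x in (0, 1) for y in (0, 1)}

print(mutual_information(correlated), conditional_entropy(correlated))
print(mutual_information(independent))
```

Swapping the roles of x and y in the dictionary keys leaves `mutual_information` unchanged, reflecting the symmetry I(X; Y) = I(Y; X).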
and thus
\[ N_{{\rm typ}|\vec{y}} \le 2^{n(H(X|Y)+2\delta)}. \tag{10.24} \]
Now, to estimate the probability of a decoding error, we need to specify how the bins
are chosen. Let’s assume the bins are chosen uniformly at random, or equivalently, let’s
consider averaging uniformly over all codes that divide the length-n strings into $2^{nR}$
bins of equal size. Then the probability that a particular bin contains a message jointly
typical with a specified $\vec{y}$ purely by accident is bounded above by
\[ 2^{-nR} N_{{\rm typ}|\vec{y}} \le 2^{-n(R-H(X|Y)-2\delta)}. \tag{10.25} \]
We conclude that if Alice sends R bits to Bob per letter of the message x, where
\[ R = H(X|Y) + o(1), \tag{10.26} \]
then the probability of a decoding error vanishes in the limit n → ∞, at least when we
average uniformly over all codes. Surely, then, there must exist a particular sequence of
codes Alice and Bob can use to achieve the rate R = H(X|Y ) + o(1), as we wanted to
show.
In this scenario, Alice and Bob jointly know (x, y), but initially neither Alice nor Bob
has access to all their shared information. The goal is to merge all the information on
Bob’s side with minimal communication from Alice to Bob, and we have found that
H(X|Y ) + o(1) bits of communication per letter suffice for this purpose. Similarly, the
information can be merged on Alice’s side using H(Y |X) + o(1) bits of communication
per letter from Bob to Alice.
We want to construct a family of codes with increasing block size n, such that the
probability of a decoding error goes to zero as n → ∞. For each n, the code contains
$2^k$ codewords among the $2^n$ possible strings of length n. The rate R of the code, the
number of encoded data bits transmitted per physical bit carried by the channel, is
\[ R = \frac{k}{n}. \tag{10.28} \]
To protect against errors, we should choose the code so that the codewords are as “far
apart” as possible. For given values of n and k, we want to maximize the number of bits
that must be flipped to change one codeword to another, the Hamming distance between
the two codewords. For any n-bit input message, we expect about np of the bits to flip;
the input diffuses into one of about $2^{nH(p)}$ typical output strings, occupying an “error
sphere” of “Hamming radius” np about the input string. To decode reliably, we want
to choose our input codewords so that the error spheres of two different codewords do
not overlap substantially. Otherwise, two different inputs will sometimes yield the same
output, and decoding errors will inevitably occur. To avoid such decoding ambiguities,
the total number of strings contained in all $2^k = 2^{nR}$ error spheres should not exceed
the total number $2^n$ of possible output strings; we therefore require
\[ 2^{nH(p)}\, 2^{nR} \le 2^n \tag{10.29} \]
or
\[ R \le 1 - H(p) := C(p). \tag{10.30} \]
If transmission is highly reliable, we cannot expect the rate of the code to exceed C(p).
But is the rate R = C(p) actually achievable asymptotically?
In fact transmission with R = C − o(1) and negligible decoding error probability is
possible. Perhaps Shannon’s most ingenious idea was that this rate can be achieved by
an average over “random codes.” Though choosing a code at random does not seem like
a clever strategy, rather surprisingly it turns out that random coding achieves as high
a rate as any other coding scheme in the limit n → ∞. Since C is the optimal rate for
reliable transmission of data over the noisy channel it is called the channel capacity.
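To get a feel for the numbers, the capacity C(p) = 1 − H(p) of eq. (10.30) can be tabulated for a few bit-flip probabilities. This is a minimal sketch; the function names are hypothetical and the entropy is measured in bits.

```python
import math


def binary_entropy(p: float) -> float:
    """H(p) in bits, with 0 log 0 taken as 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc_capacity(p: float) -> float:
    """C(p) = 1 - H(p) for a channel that flips each bit independently with probability p."""
    return 1.0 - binary_entropy(p)


for p in (0.0, 0.01, 0.05, 0.11, 0.5):
    print(p, bsc_capacity(p))
```

The capacity is symmetric under p → 1 − p and vanishes only at p = 1/2, where the output is independent of the input.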
Suppose that X is the uniformly random ensemble for a single bit (either 0 with $p = \frac{1}{2}$
or 1 with $p = \frac{1}{2}$), and that we sample from $X^n$ a total of $2^{nR}$ times to generate $2^{nR}$
“random codewords.” The resulting code is known by both Alice and Bob. To send nR
bits of information, Alice chooses one of the codewords and sends it to Bob by using
the channel n times. To decode the n-bit message he receives, Bob draws a “Hamming
sphere” with “radius” slightly larger than np, containing
\[ 2^{n(H(p)+\delta)} \tag{10.31} \]
strings. If this sphere contains a unique codeword, Bob decodes the message accordingly.
If the sphere contains more than one codeword, or no codewords, Bob decodes arbitrarily.
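The size of Bob’s decoding sphere can be compared with the entropy estimate by exact counting. The sketch below relies on the standard bound that a Hamming ball of radius r ≤ n/2 contains at most $2^{nH(r/n)}$ strings; the function names and the particular values of n, p, and the radius are illustrative assumptions.

```python
import math


def binary_entropy(p: float) -> float:
    """H(p) in bits (requires 0 < p < 1)."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def log2_ball_volume(n: int, radius: int) -> float:
    """log2 of the number of n-bit strings within Hamming distance `radius` of a fixed string."""
    return math.log2(sum(math.comb(n, k) for k in range(radius + 1)))


n, p = 1000, 0.1
radius = 110  # slightly larger than n p = 100
bits_per_letter = log2_ball_volume(n, radius) / n
print(binary_entropy(p), bits_per_letter, binary_entropy(radius / n))
```

The exact exponent lies between H(p) and H(radius/n), so taking the radius slightly larger than np corresponds to a small positive δ in eq. (10.31).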
How likely is a decoding error? For any positive δ, Bob’s decoding sphere is large
enough that it is very likely to contain the codeword sent by Alice when n is sufficiently
large. Therefore, we need only worry that the sphere might contain another codeword
just by accident. Since there are altogether $2^n$ possible strings, Bob’s sphere contains a
fraction
\[ f = \frac{2^{n(H(p)+\delta)}}{2^n} = 2^{-n(C(p)-\delta)}, \tag{10.32} \]
of all the strings. Because the codewords are uniformly random, the probability that
Bob’s sphere contains any particular codeword aside from the one sent by Alice is f ,
and the probability that the sphere contains any one of the $2^{nR} - 1$ invalid codewords
is no more than
\[ 2^{nR} f = 2^{-n(C(p)-R-\delta)}. \tag{10.33} \]
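Let $N_{2\varepsilon}$ denote the number of codewords with $p_i \ge 2\varepsilon$. Then we infer that
\[ \frac{1}{2^{nR}} (N_{2\varepsilon})\, 2\varepsilon \le \varepsilon \quad \text{or} \quad N_{2\varepsilon} \le 2^{nR-1}; \tag{10.35} \]
we see that we can throw away at most half of the codewords, to achieve $p_i < 2\varepsilon$ for
every codeword. The new code we have constructed has
\[ \text{Rate} = R - \frac{1}{n}, \tag{10.36} \]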
which approaches R as n → ∞. We have seen, then, that the rate R = C(p) − o(1) is
asymptotically achievable with negligible probability of error, where C(p) = 1 − H(p).
Bob decodes accordingly. If there is no $\vec{x}$ jointly typical with $\vec{y}$, or more than one such
$\vec{x}$, Bob decodes arbitrarily.
How likely is a decoding error? For any positive ε and δ, the $(\vec{x}, \vec{y})$ drawn from $X^n Y^n$
is jointly δ-typical with probability at least 1 − ε if n is sufficiently large. Therefore, we
need only worry that there might be more than one codeword jointly typical with $\vec{y}$.
Suppose that Alice samples $X^n$ to generate a codeword $\vec{x}$, which she sends to Bob
using the channel n times. Then Alice samples $X^n$ a second time, producing another
codeword $\vec{x}'$. With probability close to one, both $\vec{y}$ and $\vec{x}'$ are δ-typical. But what is the
probability that $\vec{x}'$ is jointly δ-typical with $\vec{y}$?
Because the samples are independent, the probability of drawing these two codewords
factorizes as $p(\vec{x}', \vec{x}) = p(\vec{x}')p(\vec{x})$, and likewise the channel output $\vec{y}$ when the first
codeword is sent is independent of the second channel input $\vec{x}'$, so $p(\vec{x}', \vec{y}) = p(\vec{x}')p(\vec{y})$.
From eq.(10.18) we obtain an upper bound on the number $N_{\rm j.t.}$ of jointly δ-typical $(\vec{x}, \vec{y})$:
\[ 1 \ge \sum_{\text{j.t. } (\vec{x},\vec{y})} p(\vec{x}, \vec{y}) \ge N_{\rm j.t.}\, 2^{-n(H(XY)+\delta)} \implies N_{\rm j.t.} \le 2^{n(H(XY)+\delta)}. \tag{10.37} \]
We also know that each δ-typical $\vec{x}'$ occurs with probability $p(\vec{x}') \le 2^{-n(H(X)-\delta)}$ and that
each δ-typical $\vec{y}$ occurs with probability $p(\vec{y}) \le 2^{-n(H(Y)-\delta)}$. Therefore, the probability
that $\vec{x}'$ and $\vec{y}$ are jointly δ-typical is bounded above by
\[ \sum_{\text{j.t. } (\vec{x}',\vec{y})} p(\vec{x}')p(\vec{y}) \le N_{\rm j.t.}\, 2^{-n(H(X)-\delta)}\, 2^{-n(H(Y)-\delta)} \]
have demonstrated that errorless transmission over the noisy channel is possible for any
rate R strictly less than
\[ C := \max_X I(X; Y). \tag{10.41} \]
This quantity C is called the channel capacity; it depends only on the conditional prob-
abilities p(y|x) that define the channel.
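where $\tilde{X}_i$ and $\tilde{Y}_i$ are the marginal probability distributions for the ith letter determined
by our distribution on the codewords. Because Shannon entropy is subadditive,
$H(XY) \le H(X) + H(Y)$, we have
\[ H(\tilde{Y}^n) \le \sum_i H(\tilde{Y}_i), \tag{10.45} \]
and therefore
\[
\begin{aligned}
I(\tilde{Y}^n; \tilde{X}^n) &= H(\tilde{Y}^n) - H(\tilde{Y}^n|\tilde{X}^n) \\
&\le \sum_i \bigl( H(\tilde{Y}_i) - H(\tilde{Y}_i|\tilde{X}_i) \bigr) \\
&= \sum_i I(\tilde{Y}_i; \tilde{X}_i) \le nC.
\end{aligned} \tag{10.46}
\]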
The mutual information of the messages sent and received is bounded above by the
sum of the mutual information per letter, and the mutual information for each letter is
bounded above by the capacity, because C is defined as the maximum of I(X; Y ) over
all input ensembles.
Recalling the symmetry of mutual information, we have
\[
\begin{aligned}
I(\tilde{X}^n; \tilde{Y}^n) &= H(\tilde{X}^n) - H(\tilde{X}^n|\tilde{Y}^n) \\
&= nR - H(\tilde{X}^n|\tilde{Y}^n) \le nC.
\end{aligned} \tag{10.47}
\]
Now, if we can decode reliably as n → ∞, this means that the input codeword is
completely determined by the signal received, or that the conditional entropy of the
input (per letter) must get small
\[ \frac{1}{n} H(\tilde{X}^n|\tilde{Y}^n) \to 0. \tag{10.48} \]
If errorless transmission is possible, then, eq. (10.47) becomes
R ≤ C + o(1), (10.49)
in the limit n → ∞. The asymptotic rate cannot exceed the capacity. In Exercise 10.9,
you will sharpen the statement eq.(10.48), showing that
\[ \frac{1}{n} H(\tilde{X}^n|\tilde{Y}^n) \le \frac{1}{n} H_2(p_e) + p_e R, \tag{10.50} \]
where $p_e$ denotes the decoding error probability, and
$H_2(p_e) = -p_e \log_2 p_e - (1 - p_e) \log_2 (1 - p_e)$.
We have now seen that the capacity C is the highest achievable rate of communication
through the noisy channel, where the probability of error goes to zero as the number of
letters in the message goes to infinity. This is Shannon’s noisy channel coding theorem.
What is particularly remarkable is that, although the capacity is achieved by messages
that are many letters in length, we have obtained a single-letter formula for the capacity,
expressed in terms of the optimal mutual information I(X; Y ) for just a single use of
the channel.
The method we used to show that R = C − o(1) is achievable, averaging over random
codes, is not constructive. Since a random code has no structure or pattern, encoding
and decoding are unwieldy, requiring an exponentially large code book. Nevertheless, the
theorem is important and useful, because it tells us what is achievable, and not achiev-
able, in principle. Furthermore, since I(X; Y ) is a concave function of X = {x, p(x)}
(with {p(y|x)} fixed), it has a unique local maximum, and C can often be computed
(at least numerically) for channels of interest. Finding codes which can be efficiently
encoded and decoded, and come close to achieving the capacity, is a very interesting
pursuit, but beyond the scope of our lightning introduction to Shannon theory.
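Because I(X; Y ) is concave in the input distribution, even a very simple search finds the capacity of a small channel numerically. The sketch below is one possible approach, not the method of the text: it assumes a binary-input channel specified by a table `channel[x][y]` = p(y|x), and runs a ternary search over the single input parameter a = p(x = 1). All names are hypothetical.

```python
import math


def entropy(probs) -> float:
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(q * math.log2(q) for q in probs if q > 0.0)


def mutual_information(a: float, channel) -> float:
    """I(X;Y) = H(Y) - H(Y|X) for input distribution (1 - a, a)."""
    px = (1.0 - a, a)
    n_out = len(channel[0])
    py = [sum(px[x] * channel[x][y] for x in range(2)) for y in range(n_out)]
    h_y_given_x = sum(px[x] * entropy(channel[x]) for x in range(2))
    return entropy(py) - h_y_given_x


def capacity_binary_input(channel, iterations: int = 200):
    """Maximize the concave function a -> I(X;Y) on [0, 1] by ternary search."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if mutual_information(m1, channel) < mutual_information(m2, channel):
            lo = m1
        else:
            hi = m2
    best = (lo + hi) / 2.0
    return mutual_information(best, channel), best


# Binary symmetric channel with flip probability 0.1.
bsc = [[0.9, 0.1], [0.1, 0.9]]
# "Z-channel": input 0 is transmitted perfectly, input 1 is flipped with probability 0.5.
z_channel = [[1.0, 0.0], [0.5, 0.5]]

print(capacity_binary_input(bsc))
print(capacity_binary_input(z_channel))
```

For the symmetric channel the optimum is the uniform input and the result agrees with 1 − H(p); for the asymmetric channel the optimal input is no longer uniform. For larger input alphabets one would replace the one-dimensional search by a general concave maximization.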
imagine a source that prepares messages of n letters, but where each letter is chosen
from an ensemble of quantum states. The signal alphabet consists of a set of quantum
states {ρ(x)}, each occurring with a specified a priori probability p(x).
As we discussed at length in Chapter 2, the probability of any outcome of any mea-
surement of a letter chosen from this ensemble, if the observer has no knowledge about
which letter was prepared, can be completely characterized by the density operator
\[ \rho = \sum_x p(x)\rho(x); \tag{10.52} \]
the vector of eigenvalues λ(ρ) is a probability distribution, and the Von Neumann en-
tropy of ρ is just the Shannon entropy of this distribution,
H(ρ) = H(λ(ρ)). (10.56)
If ρA is the density operator of system A, we will sometimes use the notation
H(A) := H(ρA ). (10.57)
Our convention is to denote quantum systems with A, B, C, . . . and classical probability
distributions with X, Y, Z, . . ..
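Eq. (10.56) translates directly into a computation: diagonalize ρ and take the Shannon entropy of the eigenvalues. The following sketch is restricted, as a simplifying assumption, to a single qubit, so that the eigenvalues of the 2×2 matrix follow from its trace and determinant; the function names and the example states are illustrative.

```python
import math


def qubit_eigenvalues(rho):
    """Eigenvalues of a 2x2 Hermitian matrix given as nested lists (real or complex entries)."""
    tr = (rho[0][0] + rho[1][1]).real
    det = (rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]).real
    disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    return ((tr + disc) / 2.0, (tr - disc) / 2.0)


def von_neumann_entropy(rho) -> float:
    """H(rho) = H(lambda(rho)) in bits; vanishing eigenvalues contribute nothing."""
    return -sum(lam * math.log2(lam) for lam in qubit_eigenvalues(rho) if lam > 1e-15)


maximally_mixed = [[0.5, 0.0], [0.0, 0.5]]
pure_plus = [[0.5, 0.5], [0.5, 0.5]]     # |+><+|, a pure state
# Equal mixture of the nonorthogonal pure states |0> and |+>.
mixture = [[0.75, 0.25], [0.25, 0.25]]

print(von_neumann_entropy(maximally_mixed))
print(von_neumann_entropy(pure_plus))
print(von_neumann_entropy(mixture))
```

The last example has two equally likely signal states, so H(X) = 1 bit, yet H(ρ) comes out smaller than 1 because the two states are not orthogonal.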
In the case where the signal alphabet $\{|\varphi(x)\rangle, p(x)\}$ consists of mutually orthogonal
pure states, the quantum source reduces to a classical one; all of the signal states can be
perfectly distinguished, and H(ρ) = H(X), where X is the classical ensemble {x, p(x)}.
The quantum source is more interesting when the signal states {ρ(x)} are not mutually
commuting. We will argue that the Von Neumann entropy quantifies the incompressible
information content of the quantum source (in the case where the signal states are pure)
much as the Shannon entropy quantifies the information content of a classical source.
Indeed, we will find that Von Neumann entropy plays multiple roles. It quantifies not
only the quantum information content per letter of the pure-state ensemble (the mini-
mum number of qubits per letter needed to reliably encode the information) but also its
classical information content (the maximum amount of information per letter, in bits
rather than qubits, that we can gain about the preparation by making the best possible
measurement). And we will see that Von Neumann entropy enters quantum information
in yet other ways; for example, quantifying the entanglement of a bipartite pure state.
Thus quantum information theory is largely concerned with the interpretation and uses
of Von Neumann entropy, much as classical information theory is largely concerned with
the interpretation and uses of Shannon entropy.
In fact, the mathematical machinery we need to develop quantum information theory
is very similar to Shannon’s mathematics (typical sequences, random coding, . . . ); so
similar as to sometimes obscure that the conceptual context is really quite different.
The central issue in quantum information theory is that nonorthogonal quantum states
cannot be perfectly distinguished, a feature with no classical analog.
The Shannon entropy of just part of a classical bipartite system cannot be greater
than the Shannon entropy of the whole system. Not so for the Von Neumann en-
tropy! For example, in the case of an entangled bipartite pure quantum state, we have
H(A) = H(B) > 0, while H(AB) = 0. The entropy of the global system vanishes be-
cause our ignorance is minimal — we know as much about AB as the laws of quantum
physics will allow. But we have incomplete knowledge of the parts A and B, with our
ignorance quantified by H(A) = H(B). For a quantum system, but not for a classical
one, information can be encoded in the correlations among the parts of the system, yet
be invisible when we look at the parts one at a time.
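The contrast between the entropy of the whole and the entropy of the parts can be made concrete for two qubits. In this sketch (all names are hypothetical) a pure state of AB is stored as a table of amplitudes `psi[a][b]`; since the global state is pure, H(AB) = 0 by construction, and only the reduced states carry entropy.

```python
import math


def entropy_2x2(rho) -> float:
    """Von Neumann entropy in bits of a 2x2 density matrix, via its closed-form eigenvalues."""
    tr = (rho[0][0] + rho[1][1]).real
    det = (rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]).real
    disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    eigenvalues = ((tr + disc) / 2.0, (tr - disc) / 2.0)
    return -sum(lam * math.log2(lam) for lam in eigenvalues if lam > 1e-15)


def reduced_states(psi):
    """Return (rho_A, rho_B) for a two-qubit pure state with amplitudes psi[a][b]."""
    # rho_A[a][c] = sum_b psi[a][b] * conj(psi[c][b])  (partial trace over B)
    rho_a = [[sum(psi[a][b] * psi[c][b].conjugate() for b in range(2))
              for c in range(2)] for a in range(2)]
    # rho_B[b][c] = sum_a psi[a][b] * conj(psi[a][c])  (partial trace over A)
    rho_b = [[sum(psi[a][b] * psi[a][c].conjugate() for a in range(2))
              for c in range(2)] for b in range(2)]
    return rho_a, rho_b


s = 1.0 / math.sqrt(2.0)
bell = [[s, 0.0], [0.0, s]]          # (|00> + |11>)/sqrt(2), maximally entangled
product = [[1.0, 0.0], [0.0, 0.0]]   # |00>, no entanglement

for name, psi in (("bell", bell), ("product", product)):
    rho_a, rho_b = reduced_states(psi)
    print(name, entropy_2x2(rho_a), entropy_2x2(rho_b))
```

For the entangled state each part is maximally mixed, H(A) = H(B) = 1 bit, even though nothing is unknown about the pair.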
Equivalently, a property that holds classically but not quantumly is
is no more than H(ρ) bits, so some of the information about which state was prepared
has been irretrievably lost if H(ρ) < H(X).
If we perform an orthogonal measurement on ρ by projecting onto the basis $\{|y\rangle\}$,
then outcome y occurs with probability
\[ q(y) = \langle y|\rho|y\rangle = \sum_a |\langle y|a\rangle|^2 \lambda_a, \quad \text{where} \quad \rho = \sum_a \lambda_a |a\rangle\langle a| \tag{10.69} \]
and $\{|a\rangle\}$ is the basis in which ρ is diagonal. Since $D_{ya} = |\langle y|a\rangle|^2$ is a doubly stochastic
matrix, $q \prec \lambda(\rho)$ and therefore $H(Y) \ge H(\rho)$, where equality holds only if the measurement
is in the basis $\{|a\rangle\}$. Mathematically, the conclusion is that for a nondiagonal and
nonnegative Hermitian matrix, the diagonal elements are more random than the eigen-
values. Speaking more physically, the outcome of an orthogonal measurement is easiest
to predict if we measure an observable which commutes with the density operator, and
becomes less predictable if we measure in a different basis.
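A one-qubit example shows the effect of the measurement basis. The sketch below assumes, for illustration, that the measurement basis is obtained from the eigenbasis of ρ by a real rotation through an angle θ, so that $D_{ya}$ has entries cos²θ and sin²θ; the names and the chosen eigenvalues are hypothetical.

```python
import math


def shannon_entropy(probs) -> float:
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(q * math.log2(q) for q in probs if q > 0.0)


def outcome_distribution(eigenvalues, theta: float):
    """q(y) = sum_a D_ya lambda_a for a qubit measured in a basis rotated by theta."""
    c2, s2 = math.cos(theta) ** 2, math.sin(theta) ** 2
    d = [[c2, s2], [s2, c2]]  # doubly stochastic: every row and column sums to 1
    return [sum(d[y][a] * eigenvalues[a] for a in range(2)) for y in range(2)]


eigenvalues = [0.9, 0.1]
h_rho = shannon_entropy(eigenvalues)
angles = [k * math.pi / 16 for k in range(9)]  # from 0 to pi/2
entropies = [shannon_entropy(outcome_distribution(eigenvalues, t)) for t in angles]

for t, h in zip(angles, entropies):
    print(round(t, 3), h, h >= h_rho - 1e-12)
```

The outcome entropy equals H(ρ) at θ = 0 and θ = π/2, where the measurement basis coincides with the eigenbasis up to relabeling, and reaches one full bit at θ = π/4.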
This majorization property has a further consequence, which will be useful for our discussion
of quantum compression. Suppose that ρ is a density operator of a d-dimensional
system, with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d$ and that $E' = \sum_{i=1}^{d'} |e_i\rangle\langle e_i|$ is a projector
where the inequality follows because the diagonal elements of ρ in the basis $\{|e_i\rangle\}$
are majorized by the eigenvalues of ρ. In other words, if we perform a two-outcome
orthogonal measurement, projecting onto either Λ or its orthogonal complement $\Lambda^\perp$, the
probability of projecting onto Λ is no larger than the sum of the $d'$ largest eigenvalues
of ρ (the Ky Fan dominance principle).
This is the eminently reasonable statement that the correlations of X with Y Z are at
least as strong as the correlations of X with Y alone.
There is another useful way to think about (classical) strong subadditivity. Recalling
the definition of mutual information we have
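\[
\begin{aligned}
I(X; YZ) - I(X; Y) &= \left\langle -\log \frac{p(x)p(y, z)}{p(x, y, z)} - \log \frac{p(x, y)}{p(x)p(y)} \right\rangle \\
&= \left\langle -\log \frac{p(x, y)}{p(y)} \frac{p(y, z)}{p(y)} \frac{p(y)}{p(x, y, z)} \right\rangle \\
&= \left\langle -\log \frac{p(x|y)p(z|y)}{p(x, z|y)} \right\rangle = \sum_y p(y) I(X; Z|y) \ge 0,
\end{aligned} \tag{10.72}
\]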
where in the last line we used p(x, y, z) = p(x, z|y)p(y). For each fixed y, p(x, z|y)
is a normalized probability distribution with nonnegative mutual information; hence
One might ask under what conditions strong subadditivity is satisfied as an equality;
that is, when does the conditional mutual information vanish? Since I(X; Z|Y ) is a sum
of nonnegative terms, each of these terms must vanish if I(X; Z|Y ) = 0. Therefore for
each y with p(y) > 0, we have I(X; Z|y) = 0. The mutual information vanishes only for
a product distribution; therefore
This means that the correlations between x and z arise solely from their shared corre-
lation with y, in which case we say that x and z are conditionally independent.
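Conditional independence is easy to test numerically using I(X; Z|Y ) = H(XY ) + H(Y Z) − H(Y ) − H(XY Z), which follows from writing I(X; Y Z) − I(X; Y ) in terms of entropies. The sketch below is illustrative: the joint distribution is a dictionary keyed by (x, y, z), and the function names and the two example distributions are assumptions.

```python
import math
from collections import defaultdict
from itertools import product


def entropy(dist) -> float:
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(q * math.log2(q) for q in dist.values() if q > 0.0)


def marginal(joint, keep):
    """Marginal distribution over the coordinates listed in `keep`."""
    out = defaultdict(float)
    for outcome, q in joint.items():
        out[tuple(outcome[i] for i in keep)] += q
    return dict(out)


def conditional_mutual_information(joint) -> float:
    """I(X;Z|Y) = H(XY) + H(YZ) - H(Y) - H(XYZ) for a joint distribution over (x, y, z)."""
    return (entropy(marginal(joint, (0, 1))) + entropy(marginal(joint, (1, 2)))
            - entropy(marginal(joint, (1,))) - entropy(joint))


def flip(bit: int, prob: float):
    """Output distribution of a bit-flip channel applied to `bit`."""
    return {bit: 1.0 - prob, 1 - bit: prob}


# Markov chain X -> Y -> Z: all correlation between x and z is mediated by y.
markov = {(x, y, z): 0.5 * flip(x, 0.1)[y] * flip(y, 0.2)[z]
          for x, y, z in product((0, 1), repeat=3)}
# X and Y independent fair bits, Z = X XOR Y: given y, z determines x.
xor = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}

print(conditional_mutual_information(markov))
print(conditional_mutual_information(xor))
```

The Markov chain saturates strong subadditivity, while in the second example x and z are far from conditionally independent.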
Correlations of quantum systems also obey strong subadditivity:
But while the proof is elementary in the classical case, in the quantum setting strong
subadditivity is a rather deep result with many important consequences. We will post-
pone the proof until §10.8.3, where we will be able to justify the quantum statement
by giving it a clear operational meaning. We’ll also see in Exercise 10.3 that strong
subadditivity follows easily from another deep property, the monotonicity of relative
entropy:
\[ D(\rho_A \| \sigma_A) \le D(\rho_{AB} \| \sigma_{AB}), \tag{10.76} \]
where
\[ D(\rho \| \sigma) := \mathrm{tr}\, \rho \left( \log \rho - \log \sigma \right). \tag{10.77} \]
The relative entropy of two density operators on a system AB cannot be less than
the induced relative entropy on the subsystem A. Insofar as we can regard the relative
entropy as a measure of the “distance” between density operators, monotonicity is the
reasonable statement that quantum states become no easier to distinguish when we look
at the subsystem A than when we look at the full system AB. It also follows (Exercise
10.3), that the action of a quantum channel N cannot increase relative entropy:
There are a few other ways of formulating strong subadditivity which are helpful
to keep in mind. By expressing the quantum mutual information in terms of the Von
Neumann entropy we find
While A, B, C are three disjoint quantum systems, we may view AB and BC as overlap-
ping systems with intersection B and union ABC; then strong subadditivity says that
the sum of the entropies of two overlapping systems is at least as large as the sum of the
entropies of their union and their intersection. In terms of conditional entropy, strong
subadditivity becomes
H(A|B) ≥ H(A|BC); (10.80)
loosely speaking, our ignorance about A when we know only B is no smaller than our
ignorance about A when we know both B and C, but with the proviso that for quantum
information “ignorance” can sometimes be negative!
As in the classical case, it is instructive to consider the condition for equality in strong
subadditivity. What does it mean for systems to have quantum conditional independence,
I(A; C|B) = 0? It is easy to formulate a sufficient condition. Suppose that system B has
a decomposition as a direct sum of tensor products of Hilbert spaces
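\[ \mathcal{H}_B = \bigoplus_j \mathcal{H}_{B_j} = \bigoplus_j \mathcal{H}_{B_j^L} \otimes \mathcal{H}_{B_j^R}, \tag{10.81} \]
and that the state of ABC has the block diagonal form
\[ \rho_{ABC} = \bigoplus_j p_j \, \rho_{A B_j^L} \otimes \rho_{B_j^R C}. \tag{10.82} \]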
In each block labeled by j the state is a tensor product, with conditional mutual infor-
mation
What is less obvious is that the converse is also true — any state with I(A; C|B) = 0
has a decomposition as in eq.(10.82). This is a useful fact, though we will not give the
proof here.
its Stinespring dilation $U^{B \to B'E}$, mapping B to $B'$ and a suitable environment system
E. Since the isometry U does not change the eigenvalues of the density operator, it
preserves the entropy of B and of AB,
which implies
where $E(\rho) = \langle H \rangle_\rho$ denotes the expectation value of the Hamiltonian in this state; for
this subsection we respect the conventions of thermodynamics by denoting Von Neumann
entropy by S(ρ) rather than H(ρ) (lest H be confused with the Hamiltonian H), and
by using natural logarithms. Expressing F (ρ) and the free energy $F(\rho_\beta)$ of the Gibbs
state as
with equality only for $\rho = \rho_\beta$. The nonnegativity of relative entropy implies that at a
given temperature $\beta^{-1}$, the Gibbs state $\rho_\beta$ has the lowest possible free energy. Our open
20 Quantum Shannon Theory
system, in contact with a thermal reservoir at temperature β −1 , will prefer the Gibbs
state if it wishes to minimize its free energy.
What can we say about the approach to thermal equilibrium of an open system?
We may anticipate that the joint unitary evolution of system and reservoir induces a
quantum channel N acting on the system alone, and we know that relative entropy is
monotonic — if
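$$\mathcal{N}: \rho \mapsto \rho', \qquad \mathcal{N}: \sigma \mapsto \sigma', \tag{10.91}$$
then
$$D(\rho'\|\sigma') \le D(\rho\|\sigma). \tag{10.92}$$
Furthermore, if the Gibbs state is an equilibrium state, we expect this channel to preserve
the Gibbs state
$$\mathcal{N}: \rho_\beta \mapsto \rho_\beta; \tag{10.93}$$
therefore,
$$D(\rho'\|\rho_\beta) = \beta\big(F(\rho') - F(\rho_\beta)\big) \le \beta\big(F(\rho) - F(\rho_\beta)\big) = D(\rho\|\rho_\beta), \tag{10.94}$$
and hence
$$F(\rho') \le F(\rho). \tag{10.95}$$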
Any channel that preserves the Gibbs state cannot increase the free energy; instead, the
free energy of an out-of-equilibrium state is monotonically nonincreasing under open-system
evolution. This statement is a version of the second law of thermodynamics.
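The monotonicity argument can be checked numerically in the simplest setting. The sketch below assumes a two-level system whose states are diagonal in the energy basis, so that entropies reduce to Shannon entropies, and it uses a hypothetical Gibbs-preserving channel (`partial_thermalize`, which mixes the input with the Gibbs state) chosen purely for illustration. The energies, temperature, and mixing parameter are arbitrary.

```python
import math

def entropy_nats(probs):
    """Entropy (natural logarithm) of a state diagonal in the energy basis."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def free_energy(probs, energies, beta):
    """F(rho) = E(rho) - S(rho)/beta for a diagonal state."""
    mean_energy = sum(p * e for p, e in zip(probs, energies))
    return mean_energy - entropy_nats(probs) / beta

def relative_entropy(p, q):
    """D(p||q) in nats for commuting (diagonal) states."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def gibbs(energies, beta):
    """Gibbs distribution exp(-beta E) / Z."""
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def partial_thermalize(probs, gibbs_probs, lam):
    """Hypothetical Gibbs-preserving channel: mix the input with the Gibbs state."""
    return [(1 - lam) * p + lam * g for p, g in zip(probs, gibbs_probs)]

energies, beta = [0.0, 1.0], 2.0
rho_beta = gibbs(energies, beta)
rho = [0.2, 0.8]                      # an out-of-equilibrium starting state

history = [rho]
for _ in range(5):
    history.append(partial_thermalize(history[-1], rho_beta, 0.3))

free_energies = [free_energy(r, energies, beta) for r in history]
f_gibbs = free_energy(rho_beta, energies, beta)
for step, f in enumerate(free_energies):
    print(step, round(f, 6), ">=", round(f_gibbs, 6))
```

Each application of the channel lowers the free energy toward its minimum at the Gibbs state, and the identity $D(\rho\|\rho_\beta) = \beta\big(F(\rho) - F(\rho_\beta)\big)$ can be verified directly with `relative_entropy`.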
We’ll have more to say about how quantum information theory illuminates thermo-
dynamics in §10.8.4.
That is, we regard the state as the thermal mixed state of K, with the temperature arbi-
trarily set to unity (which is just a normalization convention for K). Then by rewriting
eq.(10.90) we see that, for any state $\rho$, $D(\rho\|\rho_0) \ge 0$ implies
$$S(\rho) - S(\rho_0) \le \mathrm{tr}(\rho K) - \mathrm{tr}(\rho_0 K). \tag{10.97}$$
The left-hand side, the entropy with vacuum entropy subtracted, is not larger than
the right-hand side, the (modular) energy with vacuum energy subtracted. This is one
version of Bekenstein’s bound. Here K, which is dimensionless, can be loosely interpreted
as ER, where E is the energy contained in the region and R is its linear size.
While the bound follows easily from nonnegativity of relative entropy, the subtle part
of the argument is recognizing that the (suitably subtracted) expectation value of the
modular Hamiltonian is a reasonable way to define ER. The detailed justification for
this involves properties of relativistic quantum field theory that we won’t go into here.
Suffice it to say that, because we constructed K by regarding the marginal state of
the vacuum as the Gibbs state associated with the Hamiltonian K, we expect K to be
linear in the energy, and dimensional analysis then requires inclusion of the factor of R
(in units with $\hbar = c = 1$).
Bekenstein was led to conjecture such a bound by thinking about black hole thermo-
dynamics. Leaving out numerical factors, just to get a feel for the orders of magnitude
of things, the entropy of a black hole with circumference $\sim R$ is $S \sim R^2/G$, and its mass
(energy) is E ∼ R/G, where G is Newton’s gravitational constant; hence S ∼ ER for a
black hole. Bekenstein realized that unless S = O(ER) for arbitrary states and regions,
we could throw extra stuff into the region, making a black hole with lower entropy than
the initial state, thus violating the (generalized) second law of thermodynamics. Though
black holes provided the motivation, G drops out of the inequality, which holds even in
nongravitational relativistic quantum field theories.
where $\{|x\rangle\}$ is the orthonormal basis of eigenvectors of $A$ and $\{a(x)\}$ is the corresponding
vector of eigenvalues; if $A$ is measured in the state $\rho$, the outcome $a(x)$ occurs with
probability $p(x) = \langle x|\rho|x\rangle$. Thus $A$ has expectation value $\mathrm{tr}(\rho A)$ and variance
$$(\Delta A)^2 = \mathrm{tr}\left(\rho A^2\right) - \left(\mathrm{tr}\,\rho A\right)^2. \tag{10.99}$$
Using the Cauchy-Schwarz inequality, we can show that if $A$ and $B$ are two Hermitian
observables and $\rho = |\psi\rangle\langle\psi|$ is a pure state, then
$$\Delta A\,\Delta B \ge \frac{1}{2}\,\big|\langle\psi|[A,B]|\psi\rangle\big|. \tag{10.100}$$
Eq.(10.100) is a useful statement of the uncertainty principle, but has drawbacks. It
depends on the state $|\psi\rangle$ and for that reason does not fully capture the incompatibility
of the two observables. Furthermore, the variance does not characterize very well the
The second term on the right-hand side, which vanishes if ρ is a pure state, reminds us
that our uncertainty increases when the state is mixed. Like many good things in quan-
tum information theory, this entropic uncertainty relation follows from the monotonicity
of the quantum relative entropy.
For each measurement there is a corresponding quantum channel, realized by per-
forming the measurement and printing the outcome in a classical register,
$$M_X: \rho \mapsto \sum_x |x\rangle\langle x|\rho|x\rangle\langle x| =: \rho_X,$$
$$M_Z: \rho \mapsto \sum_z |z\rangle\langle z|\rho|z\rangle\langle z| =: \rho_Z. \tag{10.104}$$
The Shannon entropy of the measurement outcome distribution is also the Von Neumann
entropy of the corresponding channel’s output state,
H(X) = H(ρX ), H(Z) = H(ρZ ); (10.105)
the entropy of this output state can be expressed in terms of the relative entropy of
input and output, and the entropy of the channel input, as in
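$$H(X) = -\mathrm{tr}\,\rho_X \log \rho_X = -\mathrm{tr}\,\rho \log \rho_X = D(\rho\|\rho_X) + H(\rho). \tag{10.106}$$
Using the monotonicity of relative entropy under the action of the channel $M_Z$, we
have
$$D(\rho\|\rho_X) \ge D(\rho_Z\|M_Z(\rho_X)), \tag{10.107}$$
where
$$D(\rho_Z\|M_Z(\rho_X)) = -H(\rho_Z) - \mathrm{tr}\,\rho_Z \log M_Z(\rho_X), \tag{10.108}$$
and
$$M_Z(\rho_X) = \sum_{x,z} |z\rangle\langle z|x\rangle\langle x|\rho|x\rangle\langle x|z\rangle\langle z|. \tag{10.109}$$
Writing
$$\log M_Z(\rho_X) = \sum_z |z\rangle \log\left(\sum_x \langle z|x\rangle\langle x|\rho|x\rangle\langle x|z\rangle\right)\langle z|, \tag{10.110}$$
we see that
$$-\mathrm{tr}\,\rho_Z \log M_Z(\rho_X) = -\sum_z \langle z|\rho|z\rangle \log\left(\sum_x \langle z|x\rangle\langle x|\rho|x\rangle\langle x|z\rangle\right). \tag{10.111}$$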
Clearly this inequality is tight, as it is saturated by x-basis (or z-basis) states, for which
H(X) = 0 and H(Z) = log d.
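A quick numerical check for a single qubit makes the tightness claim concrete. The sketch assumes the relation has the standard Maassen-Uffink form $H(X) + H(Z) \ge \log(1/c) + H(\rho)$ with $c = \max_{x,z}|\langle x|z\rangle|^2$ (the displayed relation itself is not reproduced in this excerpt), and it restricts to pure states with real amplitudes, so $H(\rho) = 0$ and $c = 1/2$. The parametrization by `theta` is a choice made here for illustration.

```python
import math

def shannon_bits(probs):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def entropy_sum(theta):
    """H(X) + H(Z) for the pure qubit state cos(theta/2)|0> + sin(theta/2)|1>."""
    p_z = [math.cos(theta / 2) ** 2, math.sin(theta / 2) ** 2]
    # Outcome probabilities in the basis (|0> +/- |1>)/sqrt(2).
    p_x = [(1 + math.sin(theta)) / 2, (1 - math.sin(theta)) / 2]
    return shannon_bits(p_x) + shannon_bits(p_z)

bound = math.log2(1 / 0.5)            # log(1/c) with c = 1/2 for a qubit
angles = [k * math.pi / 16 for k in range(33)]
sums = [entropy_sum(t) for t in angles]

print("smallest H(X) + H(Z) on the grid:", min(sums))
print("bound log(1/c):", bound)
```

The minimum is attained at `theta = 0` (a z-basis state, with $H(Z) = 0$ and $H(X) = 1$) and at `theta = pi/2` (an x-basis state); every other angle on the grid gives a strictly larger sum.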
If the states of this ensemble are mutually orthogonal, then the message might as well
be classical; the interesting quantum case is where the states are not orthogonal and
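therefore not perfectly distinguishable. The density operator realized by this ensemble
is
$$\rho = \sum_x p(x)\,|\varphi(x)\rangle\langle\varphi(x)|, \tag{10.118}$$
and the density operator of the $n$-letter message is
$$\rho^{\otimes n} = \rho \otimes \cdots \otimes \rho. \tag{10.119}$$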
How redundant is the quantum information in this message? We would like to devise
a quantum code allowing us to compress the message to a smaller Hilbert space, but
without much compromising the fidelity of the message. Perhaps we have a quantum
memory device, and we know the statistical properties of the recorded data; specifically,
we know ρ. We want to conserve space on our (very expensive) quantum hard drive by
compressing the data.
The optimal compression that can be achieved was found by Schumacher. As you
might guess, the message can be compressed to a Hilbert space H with
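As is obvious from symmetry, the eigenstates of $\rho$ are qubits oriented up and down along
the axis $\hat n = \frac{1}{\sqrt{2}}(\hat x + \hat z)$,
$$|0'\rangle \equiv |\uparrow_{\hat n}\rangle = \begin{pmatrix} \cos\frac{\pi}{8} \\ \sin\frac{\pi}{8} \end{pmatrix}, \qquad
|1'\rangle \equiv |\downarrow_{\hat n}\rangle = \begin{pmatrix} \sin\frac{\pi}{8} \\ -\cos\frac{\pi}{8} \end{pmatrix}; \tag{10.124}$$
the eigenvalues are
$$\lambda(0') = \frac{1}{2} + \frac{1}{2\sqrt{2}} = \cos^2\frac{\pi}{8}, \qquad
\lambda(1') = \frac{1}{2} - \frac{1}{2\sqrt{2}} = \sin^2\frac{\pi}{8}; \tag{10.125}$$
evidently $\lambda(0') + \lambda(1') = 1$ and $\lambda(0')\lambda(1') = \frac{1}{8} = \det\rho$. The eigenstate $|0'\rangle$ has equal
(and relatively large) overlap with both signal states
$$|\langle 0'|\uparrow_z\rangle|^2 = |\langle 0'|\uparrow_x\rangle|^2 = \cos^2\frac{\pi}{8} = .8535, \tag{10.126}$$
while $|1'\rangle$ has equal (and relatively small) overlap with both,
$$|\langle 1'|\uparrow_z\rangle|^2 = |\langle 1'|\uparrow_x\rangle|^2 = \sin^2\frac{\pi}{8} = .1465. \tag{10.127}$$
Thus if we don’t know whether $|\uparrow_z\rangle$ or $|\uparrow_x\rangle$ was sent, the best guess we can make is
$|\psi\rangle = |0'\rangle$. This guess has the maximal fidelity with $\rho$
$$F = \frac{1}{2}|\langle\uparrow_z|\psi\rangle|^2 + \frac{1}{2}|\langle\uparrow_x|\psi\rangle|^2, \tag{10.128}$$
among all possible single-qubit states $|\psi\rangle$ ($F = .8535$).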
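The numbers quoted above can be reproduced with a few lines of arithmetic. The sketch below assumes the two signal states $|\uparrow_z\rangle$ and $|\uparrow_x\rangle$ are each sent with probability 1/2 (as the fidelity formula (10.128) indicates) and represents them as real two-component vectors; the helper names and the grid search over guesses $(\cos\theta, \sin\theta)$ are illustrative choices.

```python
import math

# Signal states as real 2-vectors in the z basis, each sent with probability 1/2.
up_z = (1.0, 0.0)
up_x = (1 / math.sqrt(2), 1 / math.sqrt(2))

def density_matrix(states, probs):
    """rho = sum_x p(x) |phi(x)><phi(x)| for real two-component states."""
    rho = [[0.0, 0.0], [0.0, 0.0]]
    for (a, b), p in zip(states, probs):
        rho[0][0] += p * a * a
        rho[0][1] += p * a * b
        rho[1][0] += p * b * a
        rho[1][1] += p * b * b
    return rho

def eigenvalues_2x2(m):
    """Eigenvalues of a real symmetric 2x2 matrix from its trace and determinant."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = math.sqrt(tr * tr / 4 - det)
    return tr / 2 + root, tr / 2 - root

def guess_fidelity(theta):
    """F of eq. (10.128) for the guess psi = (cos theta, sin theta)."""
    psi = (math.cos(theta), math.sin(theta))

    def overlap(state):
        return (state[0] * psi[0] + state[1] * psi[1]) ** 2

    return 0.5 * overlap(up_z) + 0.5 * overlap(up_x)

rho = density_matrix([up_z, up_x], [0.5, 0.5])
lam0, lam1 = eigenvalues_2x2(rho)

# Grid search over real guesses; the maximum should sit at theta = pi/8.
best_theta = max((k * math.pi / 800 for k in range(800)), key=guess_fidelity)

print("eigenvalues:", lam0, lam1)
print("best angle / (pi/8):", best_theta / (math.pi / 8))
print("best fidelity:", guess_fidelity(best_theta))
```

The search lands on $\theta = \pi/8$, the direction of $|0'\rangle$, and the resulting fidelity equals the larger eigenvalue $\cos^2(\pi/8)$.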
Now imagine that Alice needs to send three letters to Bob, but she can afford to send
only two qubits. Still, she wants Bob to reconstruct her state with the highest possible
fidelity. She could send Bob two of her three letters, and ask Bob to guess $|0'\rangle$ for the
third. Then Bob receives two letters with perfect fidelity, and his guess has $F = .8535$
for the third; hence $F = .8535$ overall. But is there a more clever procedure that achieves
higher fidelity?
Yes, there is. By diagonalizing $\rho$, we decomposed the Hilbert space of a single qubit
into a “likely” one-dimensional subspace (spanned by $|0'\rangle$) and an “unlikely” one-dimensional
subspace (spanned by $|1'\rangle$). In a similar way we can decompose the Hilbert
space of three qubits into likely and unlikely subspaces. If $|\psi\rangle = |\psi_1\rangle \otimes |\psi_2\rangle \otimes |\psi_3\rangle$ is any
signal state, where the state of each qubit is either $|\uparrow_z\rangle$ or $|\uparrow_x\rangle$, we have
$$\begin{aligned}
|\langle 0'0'0'|\psi\rangle|^2 &= \cos^6\frac{\pi}{8} = .6219, \\
|\langle 0'0'1'|\psi\rangle|^2 = |\langle 0'1'0'|\psi\rangle|^2 = |\langle 1'0'0'|\psi\rangle|^2 &= \cos^4\frac{\pi}{8}\,\sin^2\frac{\pi}{8} = .1067, \\
|\langle 0'1'1'|\psi\rangle|^2 = |\langle 1'0'1'|\psi\rangle|^2 = |\langle 1'1'0'|\psi\rangle|^2 &= \cos^2\frac{\pi}{8}\,\sin^4\frac{\pi}{8} = .0183, \\
|\langle 1'1'1'|\psi\rangle|^2 &= \sin^6\frac{\pi}{8} = .0031.
\end{aligned} \tag{10.129}$$
Thus, we may decompose the space into the likely subspace $\Lambda$ spanned by
$\{|0'0'0'\rangle, |0'0'1'\rangle, |0'1'0'\rangle, |1'0'0'\rangle\}$, and its orthogonal complement $\Lambda^\perp$. If we make an
incomplete orthogonal measurement that projects a signal state onto $\Lambda$ or $\Lambda^\perp$, the probability
of projecting onto the likely subspace $\Lambda$ is
To perform this measurement, Alice could, for example, first apply a unitary trans-
formation U that rotates the four high-probability basis states to
then Alice measures the third qubit to perform the projection. If the outcome is $|0\rangle$,
then Alice’s input state has in effect been projected onto $\Lambda$. She sends the remaining
two unmeasured qubits to Bob. When Bob receives this compressed two-qubit state
$|\psi_{\rm comp}\rangle$, he decompresses it by appending $|0\rangle$ and applying $U^{-1}$, obtaining
If Alice’s measurement of the third qubit yields $|1\rangle$, she has projected her input state
onto the low-probability subspace $\Lambda^\perp$. In this event, the best thing she can do is send
the state that Bob will decompress to the most likely state $|0'0'0'\rangle$; that is, she sends
the state $|\psi_{\rm comp}\rangle$ such that
Thus, if Alice encodes the three-qubit signal state $|\psi\rangle$, sends two qubits to Bob, and
Bob decodes as just described, then Bob obtains the state $\rho'$
This is indeed better than the naive procedure of sending two of the three qubits each
with perfect fidelity.
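The improvement over the naive procedure can be quantified from the overlaps in eq.(10.129). The sketch below assumes that the decoded state is the projected signal $E|\psi\rangle\langle\psi|E$ when Alice's measurement succeeds and the fallback $|0'0'0'\rangle$ (weighted by the failure probability) when it does not, which is the same structure the general-$n$ fidelity estimate uses; the variable names are illustrative.

```python
import math
from itertools import product

c2 = math.cos(math.pi / 8) ** 2   # squared overlap of a signal qubit with |0'>
s2 = math.sin(math.pi / 8) ** 2   # squared overlap of a signal qubit with |1'>

# Squared overlap of any three-letter signal state with each primed basis
# string, as in eq. (10.129); the key (0, 0, 1) stands for |0'0'1'>.
weights = {
    bits: math.prod(s2 if b else c2 for b in bits)
    for bits in product((0, 1), repeat=3)
}

# Likely subspace: primed strings containing at most one 1'.
p_likely = sum(w for bits, w in weights.items() if sum(bits) <= 1)

# Success branch contributes p_likely**2; in the failure branch Bob ends up
# with |0'0'0'>, whose overlap with the signal state is weights[(0, 0, 0)].
fidelity = p_likely ** 2 + (1 - p_likely) * weights[(0, 0, 0)]
naive_fidelity = c2               # send two letters, guess |0'> for the third

print("P(likely subspace):", round(p_likely, 4))
print("fidelity of the compression scheme:", round(fidelity, 4))
print("fidelity of the naive scheme:", round(naive_fidelity, 4))
```

Under these assumptions the projection succeeds with probability about .9419 and the scheme achieves fidelity about .9234, compared with .8535 for the naive procedure.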
As we consider longer messages with more letters, the fidelity of the compression
improves, as long as we don’t try to compress too much. The Von Neumann entropy of
the one-qubit ensemble is
$$H(\rho) = H\!\left(\cos^2\frac{\pi}{8}\right) = .60088\ldots \tag{10.138}$$
Therefore, according to Schumacher’s theorem, we can shorten a long message by the
factor, say, .6009, and still achieve very good fidelity.
Since the letters are drawn independently, the density operator of the entire message is
$$\rho^{\otimes n} \equiv \rho \otimes \cdots \otimes \rho. \tag{10.140}$$
We claim that, for $n$ large, this density matrix has nearly all of its support on a subspace
of the full Hilbert space of the messages, where the dimension of this subspace
asymptotically approaches $2^{nH(\rho)}$.
This claim follows directly from the corresponding classical statement, for we may
consider ρ to be realized by an ensemble of orthonormal pure states, its eigenstates,
where the probability assigned to each eigenstate is the corresponding eigenvalue. In
this basis our source of quantum information is effectively classical, producing messages
which are tensor products of ρ eigenstates, each with a probability given by the product
of the corresponding eigenvalues. For a specified $n$ and $\delta$, define the $\delta$-typical subspace
$\Lambda$ as the space spanned by the eigenvectors of $\rho^{\otimes n}$ with eigenvalues $\lambda$ satisfying
$$2^{-n(H-\delta)} \ge \lambda \ge 2^{-n(H+\delta)}. \tag{10.141}$$
Borrowing directly from Shannon’s argument, we infer that for any $\delta, \varepsilon > 0$ and $n$
sufficiently large, the sum of the eigenvalues of $\rho^{\otimes n}$ that obey this condition satisfies
$$\mathrm{tr}(\rho^{\otimes n} E) \ge 1 - \varepsilon, \tag{10.142}$$
where $E$ denotes the projection onto the typical subspace $\Lambda$, and the number $\dim(\Lambda)$ of
such eigenvalues satisfies
$$2^{n(H+\delta)} \ge \dim(\Lambda) \ge (1-\varepsilon)\,2^{n(H-\delta)}. \tag{10.143}$$
Our coding strategy is to send states in the typical subspace faithfully. We can make a
measurement that projects the input message onto either $\Lambda$ or $\Lambda^\perp$; the outcome will be
$\Lambda$ with probability $p_\Lambda = \mathrm{tr}(\rho^{\otimes n} E) \ge 1 - \varepsilon$. In that event, the projected state is coded
and sent. Asymptotically, the probability of the other outcome becomes negligible, so it
matters little what we do in that case.
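For a qubit source the typical subspace can be enumerated exactly, because an eigenvector of $\rho^{\otimes n}$ containing $m$ factors of the less likely eigenstate has eigenvalue $\lambda_0^{\,n-m}\lambda_1^{\,m}$ and multiplicity $\binom{n}{m}$. The sketch below applies the definition (10.141) to a source with the eigenvalues of eq.(10.125); the choices $n = 1000$ and $\delta = 0.1$, and the function names, are illustrative assumptions.

```python
import math

def binary_entropy(p):
    """H(p) in bits for 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def typical_subspace(n, lam1, delta):
    """Return (dimension, total weight) of the delta-typical subspace of the
    n-fold tensor power of a qubit state with eigenvalues (1 - lam1, lam1)."""
    h = binary_entropy(lam1)
    dim, weight = 0, 0.0
    for m in range(n + 1):            # m = number of less-likely eigenstates
        log2_eig = (n - m) * math.log2(1 - lam1) + m * math.log2(lam1)
        if -n * (h + delta) <= log2_eig <= -n * (h - delta):
            mult = math.comb(n, m)
            dim += mult
            # Work in log space so large binomials do not overflow a float.
            weight += 2.0 ** (math.log2(mult) + log2_eig)
    return dim, weight

lam1 = math.sin(math.pi / 8) ** 2
h = binary_entropy(lam1)
n, delta = 1000, 0.1
dim, weight = typical_subspace(n, lam1, delta)

print("H(rho) =", round(h, 5))
print("weight on the typical subspace:", weight)
print("log2(dim)/n =", math.log2(dim) / n)
```

With these parameters nearly all of the weight of $\rho^{\otimes n}$ lies on the typical subspace, and $\log_2\dim(\Lambda)/n$ falls between $H - \delta$ and $H + \delta$, consistent with eqs.(10.142) and (10.143).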
The coding of the projected state merely packages it so it can be carried by a minimal
number of qubits. For example, we apply a unitary change of basis U that takes each
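state $|\psi_{\rm typ}\rangle$ in $\Lambda$ to a state of the form
$$U|\psi_{\rm typ}\rangle = |\psi_{\rm comp}\rangle \otimes |0_{\rm rest}\rangle, \tag{10.144}$$
where $|\psi_{\rm comp}\rangle$ is a state of $n(H+\delta)$ qubits, and $|0_{\rm rest}\rangle$ denotes the state $|0\rangle \otimes \cdots \otimes |0\rangle$
of the remaining qubits. Alice sends $|\psi_{\rm comp}\rangle$ to Bob, who decodes by appending $|0_{\rm rest}\rangle$
and applying $U^{-1}$.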
Suppose that
$$|\varphi(\vec x)\rangle = |\varphi(x_1)\rangle \otimes \cdots \otimes |\varphi(x_n)\rangle \tag{10.145}$$
denotes any one of the $n$-letter pure state messages that might be sent. After coding,
transmission, and decoding are carried out as just described, Bob has reconstructed a
state
where $\rho_{\rm Junk}(\vec x)$ is the state we choose to send if the measurement yields the outcome
$\Lambda^\perp$. What can we say about the fidelity of this procedure?
The fidelity varies from message to message, so we consider the fidelity averaged over
the ensemble of possible messages:
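$$\begin{aligned}
\bar F &= \sum_{\vec x} p(\vec x)\,\langle\varphi(\vec x)|\rho'(\vec x)|\varphi(\vec x)\rangle \\
&= \sum_{\vec x} p(\vec x)\,\langle\varphi(\vec x)|E|\varphi(\vec x)\rangle\langle\varphi(\vec x)|E|\varphi(\vec x)\rangle \\
&\quad + \sum_{\vec x} p(\vec x)\,\langle\varphi(\vec x)|\rho_{\rm Junk}(\vec x)|\varphi(\vec x)\rangle\langle\varphi(\vec x)|I - E|\varphi(\vec x)\rangle \\
&\ge \sum_{\vec x} p(\vec x)\,\langle\varphi(\vec x)|E|\varphi(\vec x)\rangle^2,
\end{aligned} \tag{10.147}$$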
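where the last inequality holds because the “Junk” term is nonnegative. Since any real
number $z$ satisfies
$$(z-1)^2 \ge 0, \quad\text{or}\quad z^2 \ge 2z - 1, \tag{10.148}$$
we have, setting $z = \langle\varphi(\vec x)|E|\varphi(\vec x)\rangle$, that $\langle\varphi(\vec x)|E|\varphi(\vec x)\rangle^2 \ge 2\langle\varphi(\vec x)|E|\varphi(\vec x)\rangle - 1$,
and hence
$$\bar F \ge \sum_{\vec x} p(\vec x)\big(2\langle\varphi(\vec x)|E|\varphi(\vec x)\rangle - 1\big)
= 2\,\mathrm{tr}(\rho^{\otimes n}E) - 1 \ge 2(1-\varepsilon) - 1 = 1 - 2\varepsilon. \tag{10.150}$$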
Since ε and δ can be as small as we please, we have shown that it is possible to compress
the message to n(H + o(1)) qubits, while achieving an average fidelity that becomes
arbitrarily good as n gets large.
Is further compression possible? Let us suppose that Bob will decode the message
$\rho_{\rm comp}(\vec x)$ that he receives by appending qubits and applying a unitary transformation
$U^{-1}$, obtaining
$$\rho'(\vec x) = U^{-1}\big(\rho_{\rm comp}(\vec x) \otimes |0\rangle\langle 0|\big)U \tag{10.151}$$
(“unitary decoding”), and suppose that $\rho_{\rm comp}(\vec x)$ has been compressed to $n(H - \delta')$
qubits. Then, no matter how the input messages have been encoded, the decoded messages
are all contained in a subspace $\Lambda'$ of Bob’s Hilbert space with $\dim(\Lambda') = 2^{n(H-\delta')}$.
If the input message is $|\varphi(\vec x)\rangle$, then the density operator reconstructed by Bob can be
diagonalized as
$$\rho'(\vec x) = \sum_{a_{\vec x}} |a_{\vec x}\rangle\,\lambda_{a_{\vec x}}\,\langle a_{\vec x}|, \tag{10.152}$$
where the $|a_{\vec x}\rangle$’s are mutually orthogonal states in $\Lambda'$. The fidelity of the reconstructed
message is
where $E'$ denotes the orthogonal projection onto the subspace $\Lambda'$. The average fidelity
therefore obeys
But, according to the Ky Fan dominance principle discussed in §10.2.2, since $E'$ projects
onto a space of dimension $2^{n(H-\delta')}$, $\mathrm{tr}(\rho^{\otimes n}E')$ can be no larger than the sum of the
$2^{n(H-\delta')}$ largest eigenvalues of $\rho^{\otimes n}$. The $\delta$-typical eigenvalues of $\rho^{\otimes n}$ are no larger than
$2^{-n(H-\delta)}$, so the sum of the $2^{n(H-\delta')}$ largest eigenvalues can be bounded above:
$$\mathrm{tr}(\rho^{\otimes n}E') \le 2^{n(H-\delta')}\,2^{-n(H-\delta)} + \varepsilon = 2^{-n(\delta'-\delta)} + \varepsilon, \tag{10.155}$$
where the $+\,\varepsilon$ accounts for the contribution from the atypical eigenvalues. Since we
may choose $\varepsilon$ and $\delta$ as small as we please for sufficiently large $n$, we conclude that the
average fidelity $\bar F$ gets small as $n \to \infty$ if we compress to $H(\rho) - \Omega(1)$ qubits per letter.
We find, then, that H(ρ) qubits per letter is the optimal compression of the quantum
information that can be achieved if we are to obtain good fidelity as n goes to infinity.
This is Schumacher’s quantum source coding theorem.
The above argument applies to any conceivable encoding scheme, but only to a re-
stricted class of decoding schemes, unitary decodings. The extension of the argument to
general decoding schemes is sketched in §10.6.3. The conclusion is the same. The point
is that n(H − δ) qubits are too few to faithfully encode the typical subspace.
There is another useful way to think about Schumacher’s quantum compression protocol.
Suppose that Alice’s density operator $\rho_A^{\otimes n}$ has a purification $|\psi\rangle_{RA}$ which Alice
shares with Robert. Alice wants to convey her share of $|\psi\rangle_{RA}$ to Bob with high fidelity,
sending as few qubits to Bob as possible. To accomplish this task, Alice can use the same
procedure as described above, attempting to compress the state of $A$ by projecting onto
its typical subspace $\Lambda$. Alice’s projection succeeds with probability
$|\psi\rangle$ satisfying
$$F_e \ge \langle\psi|I \otimes E|\psi\rangle\langle\psi|I \otimes E|\psi\rangle = \big(\mathrm{tr}\,\rho^{\otimes n}E\big)^2 = P(E)^2 \ge (1-\varepsilon)^2 \ge 1 - 2\varepsilon. \tag{10.158}$$
We conclude that Alice can transfer her share of the pure state $|\psi\rangle_{RA}$ to Bob by sending
$nH(\rho) + o(n)$ qubits, achieving arbitrarily good entanglement fidelity $F_e$ as $n \to \infty$. In
§10.8.2 we’ll derive a more general version of this result.
To summarize, there is a close analogy between Shannon’s classical source coding
theorem and Schumacher’s quantum source coding theorem. In the classical case, nearly
all long messages are typical sequences, so we can code only these and still have a small
probability of error. In the quantum case, nearly all long messages have nearly perfect
overlap with the typical subspace, so we can code only the typical subspace and still
achieve good fidelity.
Alternatively, Alice could send classical information to Bob, the string x1 x2 · · · xn , and
Bob could follow these classical instructions to reconstruct Alice’s state $|\varphi(x_1)\rangle \otimes \cdots \otimes
|\varphi(x_n)\rangle$. By this means, they could achieve high-fidelity compression to $H(X) + o(1)$
bits (or qubits) per letter, where $X$ is the classical ensemble $\{x, p(x)\}$. But if
$\{|\varphi(x)\rangle, p(x)\}$ is an ensemble of nonorthogonal pure states, this classically achievable
amount of compression is not optimal; some of the classical information about the
preparation of the state is redundant, because the nonorthogonal states cannot be per-
fectly distinguished. Schumacher coding goes further, achieving optimal compression to
H(ρ) + o(1) qubits per letter. Quantum compression packages the message more effi-
ciently than classical compression, but at a price — Bob receives the quantum state
Alice intended to send, but Bob doesn’t know what he has. In contrast to the classical
case, Bob can’t fully decipher Alice’s quantum message accurately. An attempt to read
the message will unavoidably disturb it.
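For the two-letter ensemble used earlier in this section the gap between the two rates is easy to evaluate. The sketch below assumes the letters $|\uparrow_z\rangle$ and $|\uparrow_x\rangle$ are equally likely, so the classical description costs $H(X) = 1$ bit per letter, and takes the eigenvalues of $\rho$ from eq.(10.125); the variable names are illustrative.

```python
import math

def shannon_bits(probs):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Classical description of the source: which of the two letters was prepared.
classical_rate = shannon_bits([0.5, 0.5])          # H(X), bits per letter

# Eigenvalues of rho for the two-letter ensemble, eq. (10.125).
lam0 = math.cos(math.pi / 8) ** 2
quantum_rate = shannon_bits([lam0, 1 - lam0])      # H(rho), qubits per letter

print("H(X)   =", classical_rate)
print("H(rho) =", round(quantum_rate, 5))
print("fraction saved by Schumacher coding:", round(1 - quantum_rate / classical_rate, 4))
```

Sending the classical labels requires one bit per letter, while Schumacher coding needs only about .6009 qubits per letter for this ensemble.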
even if Alice and Bob exchange classical messages about their actions and measurement
outcomes. Therefore, any quantitative measure of entanglement should have the property
that LOCC cannot increase it, and it should also vanish for an unentangled product
state. An obvious candidate is the Schmidt number, but on reflection it does not seem
very satisfactory. Consider
$$|\psi_\varepsilon\rangle = \sqrt{1 - 2|\varepsilon|^2}\;|00\rangle + \varepsilon|11\rangle + \varepsilon|22\rangle, \tag{10.160}$$
which has Schmidt number 3 for any $|\varepsilon| > 0$. Do we really want to say that $|\psi_\varepsilon\rangle$ is
“more entangled” than $|\phi^+\rangle$? Entanglement, after all, can be regarded as a resource
(we might plan to use it for teleportation, for example), and it seems clear that $|\psi_\varepsilon\rangle$
(for $|\varepsilon| \ll 1$) is a less valuable resource than $|\phi^+\rangle$.
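The mismatch can be made quantitative by comparing the Schmidt number with the entropy of the squared Schmidt coefficients, which is the measure this section goes on to adopt for bipartite pure states. The sketch below takes $\varepsilon$ real and uses the illustrative value $\varepsilon = 0.05$; the function names are hypothetical.

```python
import math

def entanglement_entropy(schmidt_probs):
    """Entropy in bits of the squared Schmidt coefficients of a bipartite pure state."""
    return -sum(p * math.log2(p) for p in schmidt_probs if p > 0)

def schmidt_number(schmidt_probs, tol=1e-15):
    """Number of nonzero Schmidt coefficients."""
    return sum(1 for p in schmidt_probs if p > tol)

def psi_eps_probs(eps):
    """Squared Schmidt coefficients of the state in eq. (10.160), for real eps."""
    return [1 - 2 * eps ** 2, eps ** 2, eps ** 2]

bell = [0.5, 0.5]                 # squared Schmidt coefficients of a Bell pair
small = psi_eps_probs(0.05)

print("Bell pair: Schmidt number", schmidt_number(bell),
      "entropy", entanglement_entropy(bell))
print("psi_eps:   Schmidt number", schmidt_number(small),
      "entropy", round(entanglement_entropy(small), 4))
```

The state $|\psi_\varepsilon\rangle$ has the larger Schmidt number, yet its entropy is only a few hundredths of a bit, far below the one bit carried by a Bell pair.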
It turns out, though, that there is a natural and useful way to quantify the entangle-
ment of any bipartite pure state. To compare two states, we use LOCC to convert both
states to a common currency that can be compared directly. The common currency is
maximal entanglement, and the amount of shared entanglement can be expressed in units
of Bell pairs (maximally entangled two-qubit states), also called ebits of entanglement.
To quantify the entanglement of a particular bipartite pure state, $|\psi\rangle_{AB}$, imagine
preparing $n$ identical copies of that state. Alice and Bob share a large supply of maximally
entangled Bell pairs. Using LOCC, they are to convert $k$ Bell pairs $(|\phi^+\rangle_{AB})^{\otimes k}$
to $n$ high-fidelity copies of the desired state $(|\psi\rangle_{AB})^{\otimes n}$. What is the minimum number
$k_{\min}$ of Bell pairs with which they can perform this task?
To obtain a precise answer, we consider the asymptotic setting, requiring arbitrarily
high-fidelity conversion in the limit of large n. We say that a rate R of conversion from
$|\phi^+\rangle$ to $|\psi\rangle$ is asymptotically achievable if for any $\varepsilon, \delta > 0$, there is an LOCC protocol
with
$$\frac{k}{n} \le R + \delta, \tag{10.161}$$
which prepares the target state $|\psi\rangle^{\otimes n}$ with fidelity $F \ge 1 - \varepsilon$. We define the entanglement
cost $E_C$ of $|\psi\rangle$ as the infimum of achievable conversion rates:
$$E_C(|\psi\rangle) := \inf\,\{\text{achievable rate for creating } |\psi\rangle \text{ from Bell pairs}\}. \tag{10.162}$$
Asymptotically, we can create many copies of $|\psi\rangle$ by consuming $E_C$ Bell pairs per copy.
Now imagine that $n$ copies of $|\psi\rangle_{AB}$ are already shared by Alice and Bob. Using
LOCC, Alice and Bob are to convert $(|\psi\rangle_{AB})^{\otimes n}$ back to the standard currency: $k'$ Bell
pairs $|\phi^+\rangle_{AB}^{\otimes k'}$. What is the maximum number $k'_{\max}$ of Bell pairs they can extract from
$|\psi\rangle_{AB}^{\otimes n}$? In this case we say that a rate $R'$ of conversion from $|\psi\rangle$ to $|\phi^+\rangle$ is asymptotically
achievable if for any $\varepsilon, \delta > 0$, there is an LOCC protocol with
$$\frac{k'}{n} \ge R' - \delta, \tag{10.163}$$
which prepares the target state $|\phi^+\rangle^{\otimes k'}$ with fidelity $F \ge 1 - \varepsilon$. We define the distillable
entanglement $E_D$ of $|\psi\rangle$ as the supremum of achievable conversion rates:
$$E_D(|\psi\rangle) := \sup\,\{\text{achievable rate for distilling Bell pairs from } |\psi\rangle\}. \tag{10.164}$$
Asymptotically, we can convert many copies of $|\psi\rangle$ to Bell pairs, obtaining $E_D$ Bell pairs
per copy of $|\psi\rangle$ consumed.
otherwise Alice and Bob could increase their number of shared Bell pairs by converting
them to copies of |ψi and then back to Bell pairs. In fact the entanglement cost and
distillable entanglement are equal for bipartite pure states. (The story is more compli-
cated for bipartite mixed states; see §10.5.) Therefore, for pure states at least we may
drop the subscript, using $E(|\psi\rangle)$ to denote the entanglement of $|\psi\rangle$. We don’t need to
distinguish between entanglement cost and distillable entanglement because conversion
of entanglement from one form to another is an asymptotically reversible process. $E$
quantifies both what we have to pay in Bell pairs to create $|\psi\rangle$, and the value of $|\psi\rangle$ in Bell
pairs for performing tasks like quantum teleportation which consume entanglement.
But what is the value of E(|ψiAB )? Perhaps you can guess — it is
the Von Neumann entropy of Alice’s density operator ρA (or equivalently Bob’s density
operator ρB ). This is clearly the right answer in the case where |ψiAB is a product of k
Bell pairs. In that case $\rho_A$ (or $\rho_B$) is $\frac{1}{2}I$ for each qubit in Alice’s possession
$$\rho_A = \left(\tfrac{1}{2}I\right)^{\otimes k}, \tag{10.167}$$
and
$$H(\rho_A) = k\,H\!\left(\tfrac{1}{2}I\right) = k. \tag{10.168}$$
How do we see that E = H(ρA ) is the right answer for any bipartite pure state?
Though it is perfectly fine to use Bell pairs as the common currency for comparing
bipartite entangled states, in the asymptotic setting it is simpler and more natural to
allow fractions of a Bell pair, which is what we’ll do here. That is, we’ll consider a
maximally entangled state of two d-dimensional systems to be log2 d Bell pairs, even if
$d$ is not a power of two. So our goal will be to show that Alice and Bob can use LOCC
to convert shared maximal entanglement of systems with dimension $d = 2^{n(H(\rho_A)+\delta)}$
into $n$ copies of $|\psi\rangle$, for any positive $\delta$ and with arbitrarily good fidelity as $n \to \infty$, and
conversely that Alice and Bob can use LOCC to convert $n$ copies of $|\psi\rangle$ into a shared
maximally entangled state of $d$-dimensional systems with arbitrarily good fidelity, where
$d = 2^{n(H(\rho_A)-\delta)}$. This suffices to demonstrate that $E_C(|\psi\rangle) = E_D(|\psi\rangle) = H(\rho_A)$.
First let’s see that if Alice and Bob share $k = n(H(\rho_A) + \delta)$ Bell pairs, then they
can prepare $|\psi\rangle_{AB}^{\otimes n}$ with high fidelity using LOCC. They perform this task, called entanglement
dilution, by combining quantum teleportation with Schumacher compression.
To get started, Alice locally creates $n$ copies of $|\psi\rangle_{AC}$, where $A$ and $C$ are systems she
controls in her laboratory. Next she wishes to teleport the $C^n$ share of these copies to
Bob, but to minimize the consumption of Bell pairs, she should compress $C^n$ before
teleporting it.
If $A$ and $C$ are $d$-dimensional, then the bipartite state $|\psi\rangle_{AC}$ can be expressed in
terms of its Schmidt basis as
$$|\psi\rangle_{AC} = \sqrt{p_0}\,|00\rangle + \sqrt{p_1}\,|11\rangle + \cdots + \sqrt{p_{d-1}}\,|d{-}1, d{-}1\rangle, \tag{10.169}$$
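where $\sum_{\vec x} p(\vec x) = 1$. If Alice attempts to project onto the $\delta$-typical subspace of $C^n$, she
$$|\Psi\rangle_{A^nC^n} = P^{-1/2} \sum_{\delta\text{-typical } \vec x} \sqrt{p(\vec x)}\;|\vec x\rangle_{A^n} \otimes |\vec x\rangle_{C^n}, \tag{10.172}$$
such that
$$\langle\Psi|\psi^{\otimes n}\rangle = P^{-1/2} \sum_{\delta\text{-typical } \vec x} p(\vec x) = \sqrt{P} \ge \sqrt{1-\varepsilon}. \tag{10.173}$$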
Since the typical subspace has dimension at most $2^{n(H(\rho)+\delta)}$, Alice can teleport the
$C^n$ half of $|\Psi\rangle$ to Bob with perfect fidelity using no more than $n(H(\rho) + \delta)$ Bell pairs
shared by Alice and Bob. The teleportation uses LOCC: Alice’s entangled measurement,
classical communication from Alice to Bob to convey the measurement outcome, and
Bob’s unitary transformation conditioned on the outcome. Finally, after the teleportation,
Bob decompresses, so that Alice and Bob share a state which has high fidelity with
$|\psi\rangle_{AB}^{\otimes n}$. This protocol demonstrates that the entanglement cost $E_C$ of $|\psi\rangle$ is not more
than $H(\rho_A)$.
Now consider the distillable entanglement $E_D$. Suppose Alice and Bob share the state
$|\psi\rangle_{AB}^{\otimes n}$. Since $|\psi\rangle_{AB}$ is, in general, a partially entangled state, the entanglement that Alice
and Bob share is in a diluted form. They wish to concentrate their shared entanglement,
squeezing it down to the smallest possible Hilbert space; that is, they want to convert
it to maximally-entangled pairs. We will show that Alice and Bob can “distill” at least
$$k' = n(H(\rho_A) - \delta) \tag{10.174}$$
where $0 \le p \le 1$, when expressed in its Schmidt basis. That is, Alice and Bob share the
state
$$|\psi(p)\rangle^{\otimes n} = \left(\sqrt{1-p}\;|00\rangle + \sqrt{p}\;|11\rangle\right)^{\otimes n}. \tag{10.176}$$
When we expand this state in the $\{|0\rangle, |1\rangle\}$ basis, we find $2^n$ terms, in each of which
Alice and Bob hold exactly the same binary string of length $n$.
Now suppose Alice (or Bob) performs a local measurement on her (his) n qubits,
measuring the total spin along the z-axis
$$\sigma_3^{(\rm total)} = \sum_{i=1}^{n} \sigma_3^{(i)}. \tag{10.177}$$
Equivalently, the measurement determines the Hamming weight of Alice’s $n$ qubits, the
number of $|1\rangle$’s in Alice’s $n$-bit string; that is, the number of spins pointing up.
In the expansion of $|\psi(p)\rangle^{\otimes n}$ there are $\binom{n}{m}$ terms in which Alice’s string has Hamming
weight $m$, each occurring with the same amplitude: $(1-p)^{(n-m)/2}\,p^{m/2}$. Hence the
probability that Alice’s measurement finds Hamming weight $m$ is
$$p(m) = \binom{n}{m}(1-p)^{n-m}\,p^{m}. \tag{10.178}$$
Furthermore, because Alice is careful not to acquire any additional information besides
the Hamming weight when she conducts the measurement, by measuring the Hamming
weight $m$ she prepares a uniform superposition of all $\binom{n}{m}$ strings with $m$ up spins.
Because Alice and Bob have perfectly correlated strings, if Bob were to measure the
Hamming weight of his qubits he would find the same outcome as Alice. Alternatively,
Alice could report her outcome to Bob in a classical message, saving Bob the trouble of
doing the measurement himself. Thus, Alice and Bob share a maximally entangled state
$$\frac{1}{\sqrt{D}}\sum_{i=1}^{D} |i\rangle_A \otimes |i\rangle_B, \tag{10.179}$$
where the sum runs over the $D = \binom{n}{m}$ strings with Hamming weight $m$.
For $n$ large the binomial distribution $\{p(m)\}$ approaches a sharply peaked function
of $m$ with mean $\mu = np$ and variance $\sigma^2 = np(1-p)$. Hence the probability of a large
deviation from the mean,
$$|m - np| = \Omega(n), \tag{10.180}$$
with probability approaching one as $n \to \infty$, where $H(p) = -p\log_2 p - (1-p)\log_2(1-p)$
is the entropy function. Thus with high probability Alice and Bob share a maximally
entangled state of Hilbert spaces $\mathcal{H}_A$ and $\mathcal{H}_B$ with $\dim(\mathcal{H}_A) = \dim(\mathcal{H}_B) = D$ and
$\log_2 D \ge n(H(p) - \delta)$. In this sense Alice and Bob can distill $H(p) - \delta$ Bell pairs per
copy of $|\psi\rangle_{AB}$.
Though the number $m$ of up spins that Alice (or Bob) finds in her (his) measurement
is typically close to $np$, it can fluctuate about this value. Sometimes Alice and Bob will
be lucky, and then will manage to distill more than $H(p)$ Bell pairs per copy of $|\psi(p)\rangle_{AB}$.
But the probability of doing substantially better becomes negligible as $n \to \infty$.
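The yield of this Hamming-weight measurement can be tabulated directly from the binomial distribution (10.178), since outcome $m$ leaves Alice and Bob with $\log_2\binom{n}{m}$ ebits. The sketch below uses the illustrative values $n = 400$, $p = 0.2$, and $\delta = 0.1$; the function names are hypothetical.

```python
import math

def binary_entropy(p):
    """H(p) in bits for 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def hamming_weight_distribution(n, p):
    """p(m) of eq. (10.178): probability that Alice finds Hamming weight m."""
    return [math.comb(n, m) * (1 - p) ** (n - m) * p ** m for m in range(n + 1)]

def expected_ebits_per_copy(n, p):
    """Average over outcomes m of log2(D)/n, where D = C(n, m)."""
    dist = hamming_weight_distribution(n, p)
    return sum(pm * math.log2(math.comb(n, m)) for m, pm in enumerate(dist)) / n

def prob_at_least(n, p, delta):
    """Probability that the outcome yields log2(D) >= n (H(p) - delta)."""
    target = n * (binary_entropy(p) - delta)
    dist = hamming_weight_distribution(n, p)
    return sum(pm for m, pm in enumerate(dist)
               if math.log2(math.comb(n, m)) >= target)

n, p, delta = 400, 0.2, 0.1
h = binary_entropy(p)
average_yield = expected_ebits_per_copy(n, p)
success = prob_at_least(n, p, delta)

print("H(p) =", round(h, 4))
print("average ebits per copy:", round(average_yield, 4))
print("P[log2 D >= n(H - delta)]:", round(success, 4))
```

For these parameters the average yield per copy sits slightly below $H(p)$, and the measurement delivers at least $n(H(p) - \delta)$ ebits with probability close to one.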
The same idea applies to bipartite pure states in larger Hilbert spaces. If A and B are
$d$-dimensional systems, then $|\psi\rangle_{AB}$ has the Schmidt decomposition
$$|\psi(X)\rangle_{AB} = \sum_{x=0}^{d-1} \sqrt{p(x)}\;|x\rangle_A \otimes |x\rangle_B, \tag{10.182}$$
where $X$ is the classical ensemble $\{x, p(x)\}$, and $H(\rho_A) = H(\rho_B) = H(X)$. The Schmidt
decomposition of $n$ copies of $|\psi\rangle$ is
$$\sum_{x_1, x_2, \ldots, x_n = 0}^{d-1} \sqrt{p(x_1)p(x_2)\cdots p(x_n)}\;|x_1 x_2 \ldots x_n\rangle_{A^n} \otimes |x_1 x_2 \ldots x_n\rangle_{B^n}. \tag{10.183}$$
Now Alice (or Bob) can measure the total number of $|0\rangle$’s, the total number of $|1\rangle$’s, etc.
in her (his) possession. If she finds $m_0$ $|0\rangle$’s, $m_1$ $|1\rangle$’s, etc., then her measurement prepares
a maximally entangled state with Schmidt number
$$D(m_0, m_1, \ldots, m_{d-1}) = \frac{n!}{m_0!\,m_1!\cdots m_{d-1}!} \tag{10.184}$$
and this outcome occurs with probability
$$p(m) = D(m_0, m_1, \ldots, m_{d-1})\;p(0)^{m_0}\,p(1)^{m_1}\cdots p(d-1)^{m_{d-1}}. \tag{10.185}$$
For $n$ large, Alice will typically find $m_x \approx np(x)$, and again the probability of a large
deviation is small, so that, from Stirling’s approximation
$$2^{n(H(X)-o(1))} \le D \le 2^{n(H(X)+o(1))} \tag{10.186}$$
with high probability. Thus, asymptotically for $n \to \infty$, $n(H(\rho_A) - o(1))$ high-fidelity
Bell pairs can be distilled from $n$ copies of $|\psi\rangle$, establishing that $E_D(|\psi\rangle) \ge H(\rho_A)$, and
therefore $E_D(|\psi\rangle) = E_C(|\psi\rangle) = E(|\psi\rangle)$.
This entanglement concentration protocol uses local operations but does not require
any classical communication. When Alice and Bob do the same measurement they al-
ways get the same outcome, so there is no need for them to communicate. Classical
communication really is necessary, though, to perform entanglement dilution. The protocol
we described here, based on teleportation, requires two bits of classical one-way
communication per Bell pair consumed; in a more clever protocol this can be reduced
to $O(\sqrt{n})$ bits, but no further. Since the classical communication cost is sublinear in $n$,
the number of bits of classical communication needed per copy of $|\psi\rangle$ becomes negligible
in the limit $n \to \infty$.
Here we have discussed the entanglement cost and distillable entanglement for bipar-
tite pure states. An achievable rate for distilling Bell pairs from bipartite mixed states
will be derived in §10.8.2.
cost and zero distillable entanglement, a phenomenon called bound entanglement. This
irreversibility is not shocking; any bipartite operation which maps many copies of the
pure state $|\phi^+\rangle_{AB}$ to many copies of the mixed state $\rho_{AB}$ necessarily discards some
information to the environment, and we don’t normally expect a process that forgets
information to be reversible.
This separation between EC and ED raises the question, what is the preferred way to
quantify the amount of entanglement when two parties share a mixed quantum state?
The answer is, it depends. Many different measures of bipartite mixed-state entangle-
ment have been proposed, each with its own distinctive advantages and disadvantages.
Even though they do not always agree, both EC and ED are certainly valid measures.
A further distinction can be made between the rate ED1 at which entanglement can
be distilled with one-way communication between the parties, and the rate ED with
two-way communication. There are bipartite mixed states for which ED > ED1 , and
even states for which ED is nonzero while ED1 is zero. In contrast to the pure-state
case, we don’t have nice formulas for the values of the various entanglement measures,
though there are useful upper and lower bounds. We will derive a lower bound on ED1
in §10.8.2 (the hashing inequality).
There are certain properties that any reasonable measure of bipartite quantum en-
tanglement should have. The most important is that it must not increase under local
operations and classical communication, because quantum entanglement cannot be cre-
ated by LOCC alone. A function on bipartite states that is nonincreasing under LOCC
is called an entanglement monotone. Note that an entanglement monotone will also be
invariant under local unitary operations UAB = UA ⊗ UB , for if UAB can reduce the
entanglement for any state, its inverse can increase entanglement.
A second important property is that a bipartite entanglement measure must vanish
for separable states. Recall from Chapter 4 that a bipartite mixed state is separable if
it can be expressed as a convex combination of product states,
X
ρAB = p(x) |α(x)ihα(x)|A ⊗ |β(x)ihβ(x)|B . (10.187)
x
A separable state is not entangled, as it can be created using LOCC. Via classical com-
munication, Alice and Bob can establish a shared source of randomness, the distribution
X = {x, p(x)}. Then they may jointly sample from X; if the outcome is x, Alice prepares
|α(x)⟩ while Bob prepares |β(x)⟩.
A third desirable property for a bipartite entanglement measure is that it should
agree with E = EC = ED for bipartite pure states. Both the entanglement cost and the
distillable entanglement respect all three of these properties.
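The pure-state value E that these measures must reproduce is the entropy of either party's reduced state. As a minimal sketch (the function names and the two example states are our own illustrative choices, not part of the text), the following computes E for a two-qubit pure state from its amplitude matrix:

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits, skipping (numerically) zero probabilities."""
    return sum(-p * math.log2(p) for p in probs if p > 1e-15)

def pure_state_entanglement(c):
    """Entanglement entropy E = H(A) of a two-qubit pure state.

    c is a 2x2 nested list of amplitudes: |psi> = sum_ij c[i][j] |i>_A |j>_B.
    """
    # Reduced density matrix rho_A = C C^dagger (2x2, Hermitian).
    rho = [[sum(c[i][k] * complex(c[j][k]).conjugate() for k in range(2))
            for j in range(2)] for i in range(2)]
    tr = (rho[0][0] + rho[1][1]).real
    det = (rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]).real
    # Eigenvalues of a 2x2 Hermitian matrix from its trace and determinant.
    disc = math.sqrt(max(tr * tr / 4 - det, 0.0))
    return entropy_bits([tr / 2 + disc, tr / 2 - disc])

s = 1 / math.sqrt(2)
bell = [[s, 0], [0, s]]        # (|00> + |11>)/sqrt(2), maximally entangled
product = [[1, 0], [0, 0]]     # |00>, a product state

print(pure_state_entanglement(bell))     # one bit, up to rounding
print(pure_state_entanglement(product))  # zero
```

The Bell pair carries one bit of entanglement and the product state carries none, which is the normalization that E_C and E_D both reproduce on pure states.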
We remark in passing that, despite the irreversibility of entanglement dilution under
LOCC, there is a mathematically viable way to formulate a reversible theory of bipartite
entanglement which applies even to mixed states. In this formulation, we allow Alice
and Bob to perform arbitrary bipartite operations that are incapable of creating entan-
glement; these include LOCC as well as additional operations which cannot be realized
using LOCC. In this framework, dilution and concentration of entanglement become
asymptotically reversible even for mixed states, and a unique measure of entanglement
can be formulated characterizing the optimal rate of conversion between copies of ρAB
and Bell pairs using these non-entangling operations.
Irreversible bipartite entanglement theory under LOCC, and also the reversible theory
under non-entangling bipartite operations, are both examples of resource theories. In the
resource theory framework, one or more parties are able to perform some restricted class
of operations, and they are capable of preparing a certain restricted class of states using
these operations. In addition, the parties may also have access to resource states, which
are outside the class they can prepare on their own. Using their restricted operations,
they can transform resource states from one form to another, or consume resource states
to perform operations beyond what they could achieve with their restricted operations
alone. The name “resource state” conveys that such states are valuable because they
may be consumed to do useful things.
In a two-party setting, where LOCC is allowed or more general non-entangling oper-
ations are allowed, bipartite entangled states may be regarded as a valuable resource.
Resource theory also applies if the allowed operations are required to obey certain sym-
metries; then states breaking this symmetry become a resource. In thermodynamics,
states deviating from thermal equilibrium are a resource. Entanglement theory, as a par-
ticularly well developed resource theory, provides guidance and tools which are broadly
applicable to many different interesting situations.
where {|x⟩_C} is an orthonormal set; the state ρABC has the block-diagonal form
eq.(10.82) and hence I(A; B|C) = 0. Conversely, if ρAB has any extension ρABC with
I(A; B|C) = 0, then ρABC has the form eq.(10.82) and therefore ρAB is separable.
Esq is difficult to compute, because the infimum is to be evaluated over all possible
extensions, where the system C may have arbitrarily high dimension. This property
also raises the logical possibility that there are nonseparable states for which the infi-
mum vanishes; conceivably, though a nonseparable ρAB can have no finite-dimensional
extension for which I(A; B|C) = 0, perhaps I(A; B|C) can approach zero as the di-
mension of C increases. Fortunately, though this is not easy to show, it turns out that
Esq is strictly positive for any nonseparable state. In this sense, then, it is a faithful
entanglement measure, strictly positive if and only if the state is nonseparable.
One desirable property of Esq , not shared by EC and ED , is its additivity on tensor
Though, unlike EC and ED , squashed entanglement does not have an obvious operational
meaning, any additive entanglement monotone which matches E for bipartite pure states
is bounded above and below by EC and ED respectively,
EC ≥ Esq ≥ ED . (10.192)
In particular, in the case of a pure tripartite state, Esq = H(A) is the (pure-state)
entanglement shared between A and BC. The inequality is saturated if Alice’s system
is divided into subsystems A1 and A2 such that the tripartite pure state is
loosely speaking, the entanglement cost EC (A; BC) imposes a ceiling on Alice’s ability
to entangle with Bob and Claire individually, requiring her to trade in some distillable
entanglement with Bob to increase her distillable entanglement with Claire.
To prove the monogamy relation eq.(10.193), we note that mutual information obeys
a chain rule which is really just a restatement of the definition of conditional mutual
information:
A similar equation follows directly from the definition if we condition on a fourth system
D,
Now, Esq (A; BC) is the infimum of I(A; BC|D) over all possible extensions of ρABC to
ρABCD . But since ρABCD is also an extension of ρAB and ρAC , we have
for any such extension. Taking the infimum over all ρABCD yields eq.(10.193).
A further aspect of monogamy arises when we consider extending a quantum state to
more parties. We say that the bipartite state ρAB of systems A and B is k-extendable
if there is a (k+1)-part state ρAB1...Bk whose marginal state on ABj matches ρAB for
each j = 1, 2, ..., k, and such that ρAB1...Bk is invariant under permutations of the k
systems B1, B2, ..., Bk. Separable states are k-extendable for every k, and entangled pure
states are not even 2-extendable. Every entangled mixed state fails to be k-extendable
for some finite k, and we may regard the maximal value kmax for which such a symmetric
extension exists as a rough measure of how entangled the state is — bipartite entangled
states with larger and larger kmax are closer and closer to being separable.
Bob’s best strategy (his optimal measurement) maximizes this information gain. The
best information gain Bob can achieve,
Though there is no simple general formula for the accessible information of an ensem-
ble, we can derive a useful upper bound, called the Holevo bound. For the special case
of an ensemble of pure states E = {|ϕ(x)⟩, p(x)}, the Holevo bound becomes
\[ \mathrm{Acc}(\mathcal{E}) \le H(\rho), \quad \text{where } \rho = \sum_x p(x)\, |\varphi(x)\rangle\langle\varphi(x)| , \tag{10.201} \]
and a sharper statement is possible for an ensemble of mixed states, as we will see.
Since the entropy for a quantum system with dimension d can be no larger than log d,
the Holevo bound asserts that Alice, by sending n qubits to Bob (d = 2^n), can convey
no more than n bits of information. This is true even if Bob performs a sophisticated
collective measurement on all the qubits at once, rather than measuring them one at a
time.
Therefore, if Alice wants to convey classical information to Bob by sending qubits, she
can do no better than treating the qubits as though they were classical, sending each
qubit in one of the two orthogonal states {|0⟩, |1⟩} to transmit one bit. This statement is
not so obvious. Alice might try to stuff more classical information into a single qubit by
sending a state chosen from a large alphabet of pure single-qubit signal states, distributed
uniformly on the Bloch sphere. But the enlarged alphabet is to no avail, because as the
number of possible signals increases the signals also become less distinguishable, and
Bob is not able to extract the extra information Alice hoped to deposit in the qubit.
If we can send information more efficiently by using an alphabet of mutually orthog-
onal states, why should we be interested in the accessible information for an ensemble
of non-orthogonal states? There are many possible reasons. Perhaps Alice finds it eas-
ier to send signals, like coherent states, which are imperfectly distinguishable rather
than mutually orthogonal. Or perhaps Alice sends signals to Bob through a noisy chan-
nel, so that signals which are orthogonal when they enter the channel are imperfectly
distinguishable by the time they reach Bob.
The accessible information game also arises when an experimental physicist tries to
measure an unknown classical force using a quantum system as a probe. For example, to
measure the z-component of a magnetic field, we may prepare a spin-1/2 particle pointing
in the x-direction; the spin precesses for time t in the unknown field, producing an
ensemble of possible final states (which will be an ensemble of mixed states if the initial
preparation is imperfect, or if decoherence occurs during the experiment). The more
information we can gain about the final state of the spin, the more accurately we can
determine the value of the magnetic field.
Now we have
\[ I(X;Y)_{\rho'} \le I(X;AY)_{\rho'} \le I(X;A)_{\rho} , \tag{10.205} \]
where the subscript indicates the state in which the mutual information is evaluated;
the first inequality uses strong subadditivity in the state ρ′, and the second uses
monotonicity under the channel mapping ρ to ρ′.
The quantity I(X; A) is an intrinsic property of the ensemble E; it is denoted χ(E)
and called the Holevo chi of the ensemble. We have shown that however Bob chooses
his measurement his information gain is bounded above by the Holevo chi; therefore,
Acc(E) ≤ χ(E) := I(X; A)ρ. (10.206)
This is the Holevo bound.
Now let’s calculate I(X; A)ρ explicitly. We note that
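\[
\begin{aligned}
H(XA) &= -\mathrm{tr}_{XA}\left[\left(\sum_x p(x)\,|x\rangle\langle x| \otimes \rho(x)\right) \log\left(\sum_{x'} p(x')\,|x'\rangle\langle x'| \otimes \rho(x')\right)\right] \\
&= -\sum_x \mathrm{tr}_A\, p(x)\rho(x)\left(\log p(x) + \log \rho(x)\right) \\
&= H(X) + \sum_x p(x)\, H(\rho(x)),
\end{aligned}
\tag{10.207}
\]
and therefore
\[ H(A|X) = H(XA) - H(X) = \sum_x p(x)\, H(\rho(x)). \tag{10.208} \]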
For an ensemble of pure states, χ is just the entropy of the density operator arising from
the ensemble, but for an ensemble E of mixed states it is a strictly smaller quantity: the
difference between the entropy H(ρ_E) of the convex sum of signal states and the convex
sum ⟨H⟩_E of the signal state entropies; this difference is always nonnegative because of
the concavity of the entropy function (or because mutual information is nonnegative).
where
E = {ρ(x), p(x)} and E′ = {ρ′(x) = N(ρ(x)), p(x)}. (10.211)
A channel cannot increase the Holevo χ of an ensemble.
Its monotonicity provides a further indication that χ(E) is a useful measure of the
information encoded in an ensemble of quantum states; the decoherence described by
a quantum channel can reduce this quantity, but never increases it. In contrast, the
von Neumann entropy may either increase or decrease under the action of a channel.
Mapping pure states to mixed states can increase H, but a channel might instead map
the mixed states in an ensemble to a fixed pure state |0⟩⟨0|, decreasing H and improving
the purity of each signal state, but without improving the distinguishability of the states.
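To make the behavior of χ concrete, here is a small numerical sketch. It assumes real qubit density matrices and uses a depolarizing map as the example channel; the ensemble, the noise strength 0.3, and all function names are our own illustrative choices. It evaluates χ as the entropy of the average state minus the average entropy of the signal states, before and after the channel acts.

```python
import math

def entropy_2x2(rho):
    """Von Neumann entropy (bits) of a 2x2 real symmetric density matrix."""
    tr = rho[0][0] + rho[1][1]
    det = rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]
    disc = math.sqrt(max(tr * tr / 4 - det, 0.0))
    eigs = [tr / 2 + disc, tr / 2 - disc]
    return sum(-p * math.log2(p) for p in eigs if p > 1e-15)

def holevo_chi(ensemble):
    """chi = H(sum_x p(x) rho(x)) - sum_x p(x) H(rho(x)) for (p, rho) pairs."""
    avg = [[sum(p * rho[i][j] for p, rho in ensemble) for j in range(2)]
           for i in range(2)]
    return entropy_2x2(avg) - sum(p * entropy_2x2(rho) for p, rho in ensemble)

def projector(theta):
    """Pure state cos(theta)|0> + sin(theta)|1> as a density matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c, c * s], [c * s, s * s]]

def depolarize(rho, lam):
    """Example channel: (1 - lam) rho + lam I/2."""
    return [[(1 - lam) * rho[i][j] + (lam / 2 if i == j else 0.0)
             for j in range(2)] for i in range(2)]

pure = [(0.5, projector(0.0)), (0.5, projector(math.pi / 8))]
noisy = [(p, depolarize(rho, 0.3)) for p, rho in pure]

print(holevo_chi(pure))    # equals H(rho) for a pure-state ensemble
print(holevo_chi(noisy))   # smaller after the channel has acted
```

For the pure-state ensemble the second term vanishes, so χ reduces to H(ρ); after depolarization the signal states are mixed and χ drops, as monotonicity requires.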
We discussed the asymptotic limit H(ρ) on quantum compression per letter in §10.3.2.
There we considered unitary decoding; invoking the monotonicity of Holevo χ clarifies
why more general decoders cannot do better. Suppose we compress and decompress the
ensemble E ⊗n using an encoder Ne and a decoder Nd , where both maps are quantum
channels:
\[ \mathcal{E}^{\otimes n} \xrightarrow{\ \mathcal{N}_e\ } \tilde{\mathcal{E}}^{(n)} \xrightarrow{\ \mathcal{N}_d\ } \tilde{\mathcal{E}}'^{(n)} \approx \mathcal{E}^{\otimes n} . \tag{10.212} \]
The Holevo χ of the input pure-state product ensemble is additive, χ(E^{⊗n}) = H(ρ^{⊗n}) =
nH(ρ), and χ of a d-dimensional system is no larger than log₂ d; therefore if the ensemble
Ẽ^(n) is compressed to q qubits per letter, then because of the monotonicity of χ the
decompressed ensemble Ẽ′^(n) has Holevo chi per letter (1/n) χ(Ẽ′^(n)) ≤ q. If the decompressed
output ensemble has high fidelity with the input ensemble, its χ per letter should nearly
match the χ per letter of the input ensemble, hence
\[ q \ge \frac{1}{n}\,\chi\big(\tilde{\mathcal{E}}'^{(n)}\big) \ge H(\rho) - \delta \tag{10.213} \]
for any positive δ and sufficiently large n. We conclude that high-fidelity compression
to fewer than H(ρ) qubits per letter is impossible asymptotically, even when the com-
pression and decompression maps are arbitrary channels.
a spin-1/2 object points in one of three directions that are symmetrically distributed in
the xz-plane. Each state has a priori probability 1/3. Evidently, Alice’s signal states are
nonorthogonal:
\[ \langle\varphi_1|\varphi_2\rangle = \langle\varphi_1|\varphi_3\rangle = \langle\varphi_2|\varphi_3\rangle = -\frac{1}{2} . \tag{10.215} \]
Bob’s task is to find out as much as he can about what Alice prepared by making a
suitable measurement. The density matrix of Alice’s ensemble is
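\[ \rho = \frac{1}{3}\left(|\varphi_1\rangle\langle\varphi_1| + |\varphi_2\rangle\langle\varphi_2| + |\varphi_3\rangle\langle\varphi_3|\right) = \frac{1}{2} I , \tag{10.216} \]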
which has H(ρ) = 1. Therefore, the Holevo bound tells us that the mutual information
of Alice’s preparation and Bob’s measurement outcome cannot exceed 1 bit.
In fact, though, the accessible information is considerably less than the one bit allowed
by the Holevo bound. In this case, Alice’s ensemble has enough symmetry that it is not
hard to guess the optimal measurement. Bob may choose a POVM with three outcomes,
where
\[ E_a = \frac{2}{3}\left(I - |\varphi_a\rangle\langle\varphi_a|\right), \quad a = 1, 2, 3; \tag{10.217} \]
3
we see that
\[ p(a|b) = \langle\varphi_b|E_a|\varphi_b\rangle = \begin{cases} 0 & a = b, \\ \frac{1}{2} & a \ne b. \end{cases} \tag{10.218} \]
The measurement outcome a excludes the possibility that Alice prepared a, but leaves
equal a posteriori probabilities p = 1/2 for the other two states. Bob’s information gain
is
I = H(X) − H(X|Y ) = log2 3 − 1 = .58496. (10.219)
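The numbers in this example are easy to check numerically. The sketch below picks one concrete real representation of the three signal states with pairwise overlap −1/2 (this parametrization and the helper names are our own assumptions), builds the three POVM elements, and evaluates the mutual information between Alice's preparation and Bob's outcome.

```python
import math

# Three real state vectors at angles 0, 120 and 240 degrees; every pair has
# overlap -1/2, matching eq. (10.215).
states = [(math.cos(t), math.sin(t))
          for t in (0.0, 2 * math.pi / 3, 4 * math.pi / 3)]

def povm_element(phi):
    """E = (2/3)(I - |phi><phi|) as a 2x2 matrix."""
    return [[(2 / 3) * ((1.0 if i == j else 0.0) - phi[i] * phi[j])
             for j in range(2)] for i in range(2)]

def expectation(E, phi):
    """<phi|E|phi> for a real vector phi."""
    return sum(phi[i] * E[i][j] * phi[j] for i in range(2) for j in range(2))

povm = [povm_element(phi) for phi in states]

# Completeness check: the three elements sum to the identity.
total = [[sum(E[i][j] for E in povm) for j in range(2)] for i in range(2)]

# p_cond[a][b] = p(a|b): probability of outcome a when state b was prepared.
p_cond = [[expectation(E, phi) for phi in states] for E in povm]

prior = 1 / 3
joint = [[prior * p_cond[a][b] for b in range(3)] for a in range(3)]
p_out = [sum(row) for row in joint]
mutual_info = sum(
    joint[a][b] * math.log2(joint[a][b] / (p_out[a] * prior))
    for a in range(3) for b in range(3) if joint[a][b] > 1e-12
)
print(mutual_info, math.log2(3) - 1)   # the two values agree
```

Each outcome rules out one preparation and leaves the other two equally likely, so one bit of uncertainty remains out of the initial log₂ 3 bits.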
each with pab = 1/9. Then Bob’s best strategy is to perform the POVM eq. (10.217)
on each of the two qubits, achieving a mutual information of .58496 bits per qubit, as
before.
But, determined to do better, Alice and Bob decide on a different strategy. Alice will
prepare one of three two-qubit states
of the four-dimensional two-qubit Hilbert space. In Exercise 10.4, you will show that the
density operator
\[ \rho = \frac{1}{3}\left(\sum_{a=1}^{3} |\Phi_a\rangle\langle\Phi_a|\right) , \tag{10.223} \]
This is a positive operator on the space spanned by the |Φ̃_a⟩’s. Therefore, on that
subspace, G has an inverse G^{−1}, and that inverse has a positive square root G^{−1/2}.
Now we define
\[ E_a = G^{-1/2}\, |\tilde\Phi_a\rangle\langle\tilde\Phi_a|\, G^{-1/2} , \tag{10.227} \]
on the span of the |Φ̃_a⟩’s. If necessary, we can augment these E_a’s with one more positive
operator, the projection E_0 onto the orthogonal complement of the span of the |Φ̃_a⟩’s,
and so construct a POVM. This POVM is the PGM associated with the vectors |Φ̃_a⟩.
In the special case where the |Φ̃_a⟩’s are orthogonal,
\[ |\tilde\Phi_a\rangle = \sqrt{\lambda_a}\, |\phi_a\rangle , \tag{10.229} \]
= |φ_a⟩⟨φ_a|; (10.230)
this is the orthogonal measurement that perfectly distinguishes the |φ_a⟩’s and so clearly
is optimal. If the |Φ̃_a⟩’s are linearly independent but not orthogonal, then the PGM
is again an orthogonal measurement (because n one-dimensional operators in an n-
dimensional space can constitute a POVM only if mutually orthogonal — see Exercise
3.11), but in that case the measurement may not be optimal.
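In Exercise 10.4, you’ll construct the PGM for the vectors |Φ_a⟩ in eq. (10.222), and
you’ll show that
\[
\begin{aligned}
p(a|a) &= \langle\Phi_a|E_a|\Phi_a\rangle = \frac{1}{3}\left(1 + \frac{1}{\sqrt{2}}\right)^2 = .971405, \\
p(b|a) &= \langle\Phi_a|E_b|\Phi_a\rangle = \frac{1}{6}\left(1 - \frac{1}{\sqrt{2}}\right)^2 = .0142977
\end{aligned}
\tag{10.231}
\]
(for b ≠ a). It follows that the conditional entropy of the input is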
a mutual information of .684534 bits per qubit. Thus, the improved distinguishability
of Alice’s signals has indeed paid off – we have exceeded the .58496 bits that can be
extracted from a single qubit. We still didn’t saturate the Holevo bound (I ≤ 1.5 in this
case), but we came a lot closer than before.
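The quoted rate can be reproduced directly from the two probabilities in eq. (10.231). The short sketch below (variable names are ours) assumes the uniform prior over the three codewords and uses the symmetry of the conditional distribution, so that the conditional entropy is the entropy of a single column.

```python
import math

p_same = (1 + 1 / math.sqrt(2)) ** 2 / 3   # p(a|a) from eq. (10.231)
p_diff = (1 - 1 / math.sqrt(2)) ** 2 / 6   # p(b|a) for b != a

# The three outcome probabilities for a given codeword sum to one.
normalization = p_same + 2 * p_diff

# With a uniform prior and a symmetric conditional distribution, H(X|Y)
# is the entropy of one column of the conditional distribution.
cond_entropy = -(p_same * math.log2(p_same) + 2 * p_diff * math.log2(p_diff))
mutual_info = math.log2(3) - cond_entropy   # bits per two-qubit codeword

print(cond_entropy)      # about 0.2159 bits
print(mutual_info / 2)   # about 0.6845 bits per qubit
```

Dividing by two converts the information per two-qubit codeword into the per-qubit figure that is compared with .58496.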
This example, first described by Peres and Wootters, teaches some useful lessons.
First, Alice is able to convey more information to Bob by “pruning” her set of codewords.
She is better off choosing among fewer signals that are more distinguishable than more
signals that are less distinguishable. An alphabet of three letters encodes more than an
alphabet of nine letters.
Second, Bob is able to read more of the information if he performs a collective measure-
ment instead of measuring each qubit separately. His optimal orthogonal measurement
projects Alice’s signal onto a basis of entangled states.
is bounded above by the Holevo chi of the output ensemble E′. To convey as much
information through the channel as possible, Alice and Bob may choose the input ensemble
E that maximizes the Holevo chi of the output ensemble E′. The maximum value
of χ(E′) is a property of the channel, which we will call the Holevo chi of N.
As we’ve seen, Bob’s actual optimal information gain in this single-shot setting may
fall short of χ(E′) in general. But instead of using the channel just once, suppose that
Alice and Bob use the channel n ≫ 1 times, where Alice sends signal states chosen from
a code, and Bob performs an optimal measurement to decode the signals he receives.
Then an information gain of χ(N ) bits per letter really can be achieved asymptotically
as n → ∞.
Let’s denote Alice’s ensemble of encoded n-letter signal states by Ẽ^(n), denote the
ensemble of classical labels carried by the signals by X̃^n, and denote Bob’s ensemble of
measurement outcomes by Ỹ^n. Let’s say that the code has rate R if Alice may choose
from among 2^{nR} possible signals to send. If classical information can be sent through
the channel with rate R − o(1) such that Bob can decode the signal with negligible error
probability as n → ∞, then we say the rate R is achievable. The classical capacity C(N)
of the quantum channel N^{A→B} is the supremum of all achievable rates.
As in our discussion of the capacity of a classical channel in §10.1.4, we suppose
that X̃^n is the uniform ensemble over the 2^{nR} possible messages, so that H(X̃^n) = nR.
Furthermore, the conditional entropy per letter (1/n) H(X̃^n|Ỹ^n) approaches zero as n → ∞
if the error probability is asymptotically negligible; therefore,
\[
\begin{aligned}
R &\le \frac{1}{n} I(\tilde X^n; \tilde Y^n) + o(1) \\
&\le \max_{\mathcal{E}^{(n)}} \frac{1}{n} I(X^n; B^n) + o(1) = \frac{1}{n}\,\chi(\mathcal{N}^{\otimes n}) + o(1) ,
\end{aligned}
\tag{10.236}
\]
where we obtain the first inequality as in eq.(10.47) and the second inequality by invoking
the Holevo bound, optimized over all possible n-letter input ensembles. We therefore
infer that
\[ C(\mathcal{N}) \le \lim_{n\to\infty} \frac{1}{n}\,\chi\left(\mathcal{N}^{\otimes n}\right) ; \tag{10.237} \]
the classical capacity is bounded above by the asymptotic Holevo χ per letter of the
product channel N ⊗n .
In fact this upper bound is an achievable rate, and hence equal to the classical
capacity C(N). However, this formula for the classical capacity is not very useful as it
stands, because it requires that we optimize the Holevo χ over message ensembles of
arbitrary length; we say that the formula for capacity is regularized if, as in this case,
it involves taking a limit in which the number of channel uses tends to infinity. It would be
far preferable to reduce our expression for C(N ) to a single-letter formula involving just
one use of the channel. In the case of a classical channel, the reduction of the regularized
expression to a single-letter formula was possible, because the conditional entropy for n
uses of the channel is additive as in eq.(10.44).
For quantum channels the situation is more complicated, as channels are known to
exist such that the Holevo χ is strictly superadditive:
Therefore, at least for some channels, we are stuck with the not-very-useful regularized
formula for the classical capacity. But we can obtain a single-letter formula for the
optimal achievable communication rate if we put a restriction on the code used by Alice
and Bob. In general, Alice is entitled to choose input codewords which are entangled
across the many uses of the channel, and when such entangled codes are permitted
the computation of the classical channel capacity may be difficult. But suppose we
demand that all of Alice’s codewords are product states. With that proviso the Holevo
chi becomes subadditive, and we may express the optimal rate as
C1 (N ) = χ(N ). (10.239)
(Here E (n) is the output ensemble received by Bob when Alice sends product-state
codewords, but to simplify the notation we have dropped the prime (indicating output)
and tilde (indicating codewords) used earlier, e.g. in eq.(10.234) and eq.(10.236).) While
the Von Neumann entropy is subadditive,
\[ H(B^n) \le \sum_{i=1}^{n} H(B_i) ; \tag{10.242} \]
(see eq.(10.209)) is not subadditive in general. But for the product-state ensemble
eq.(10.240), since the entropy of a product is additive, we have
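\[
\begin{aligned}
H(B^n|X^n) &= \sum_{x_1, x_2, \ldots, x_n} p(x_1 x_2 \ldots x_n) \left(\sum_{i=1}^{n} H(\rho(x_i))\right) \\
&= \sum_{i=1}^{n} \sum_{x_i} p_i(x_i)\, H(\rho(x_i)) = \sum_{i=1}^{n} H(B_i|X_i) ,
\end{aligned}
\tag{10.244}
\]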
where Xi = {xi , pi(xi )} is the marginal probability distribution for the ith letter.
Eq.(10.244) is a quantum analog of eq.(10.44), which holds for product-state ensembles
but not in general for entangled ensembles. Combining eq.(10.241), (10.242), (10.244),
we have
\[ I(X^n; B^n) \le \sum_{i=1}^{n} \left(H(B_i) - H(B_i|X_i)\right) = \sum_i I(X_i; B_i) \le n\,\chi(\mathcal{N}). \tag{10.245} \]
is selected with probability p(\vec{x}) = p(x_1)p(x_2) ... p(x_n). (In fact Alice should choose each
ρ(\vec{x}) to be pure to optimize the communication rate.) This codeword is sent via n uses
of the channel N, and Bob receives the product state
\[ \mathcal{N}^{\otimes n}(\rho(\vec{x})) = \mathcal{N}(\rho(x_1)) \otimes \mathcal{N}(\rho(x_2)) \otimes \cdots \otimes \mathcal{N}(\rho(x_n)). \tag{10.247} \]
Averaged over codewords, the joint state of Alice’s classical register X^n and Bob’s system
B^n is
\[ \rho_{X^n B^n} = \sum_{\vec{x}} p(\vec{x})\, |\vec{x}\rangle\langle\vec{x}| \otimes \mathcal{N}^{\otimes n}(\rho(\vec{x})). \tag{10.248} \]
capacity theorems when quantum channels are used for other tasks besides sending
classical information. We’ll turn to that in §10.7.
which is mapped by N1 ⊗ N2 to
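\[ \rho'_{XB_1B_2} = (\mathcal{N}_1 \otimes \mathcal{N}_2)\left(\rho_{XA_1A_2}\right). \tag{10.252} \]
\[ \rho'_{XB_1} = (I \otimes \mathcal{N}_1)\left(\rho_{XA_1}\right) = \sum_x p(x)\, |x\rangle\langle x| \otimes \mathcal{N}_1\left(\rho(x)_{A_1}\right) , \tag{10.253} \]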
so that in particular
Eq.(10.256) holds for any channels N1 and N2 ; now to obtain eq.(10.250) it suffices to
show that
Therefore,
\[ \rho'_{XB_1B_2} = \sum_{x,y} p(x)\,p(y|x)\, |x\rangle\langle x| \otimes \sigma(x,y)_{B_1} \otimes \left[\mathcal{N}_2(\tau(x,y))\right]_{B_2} \tag{10.259} \]
\[ \omega'_{XYB_1B_2} = \sum_{x,y} p(x,y)\, |x,y\rangle\langle x,y| \otimes \sigma(x,y) \otimes \mathcal{N}_2(\tau(x,y)) . \tag{10.260} \]
Furthermore, because ω′ becomes a product state when conditioned on (x, y), we find
\[ I(B_1; B_2|XY)_{\omega'} = 0, \tag{10.261} \]
and using strong subadditivity together with the definition of conditional mutual
information we obtain
\[
\begin{aligned}
I(XB_1; B_2)_{\rho'} &= I(XB_1; B_2)_{\omega'} \le I(XYB_1; B_2)_{\omega'} \\
&= I(XY; B_2)_{\omega'} + I(B_1; B_2|XY)_{\omega'} = I(XY; B_2)_{\omega'} .
\end{aligned}
\tag{10.262}
\]
Finally, noting that
\[ \mathrm{tr}_{B_1}\, \omega'_{XYB_1B_2} = \sum_{x,y} p(x,y)\, |x,y\rangle\langle x,y| \otimes \mathcal{N}_2(\tau(x,y)) \tag{10.263} \]
and recalling the definition of χ(N2), we see that I(XY; B2)_{ω′} ≤ χ(N2), establishing
eq.(10.257), and therefore eq.(10.250).
An example of an entanglement-breaking channel is a classical-quantum channel, also
called a c-q channel, which acts according to
\[ \mathcal{N}^{A\to B} : \rho_A \mapsto \sum_x \langle x|\rho_A|x\rangle\, \sigma(x)_B , \tag{10.264} \]
where {|x⟩} is an orthonormal basis. In effect, the channel performs a complete
orthogonal measurement on the input state and then prepares an output state conditioned on
the measurement outcome. The measurement breaks the entanglement between system
A and any other system with which it was initially entangled. Therefore, c-q channels
are entanglement breaking and have additive Holevo chi.
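A minimal sketch of a qubit c-q channel follows. The two output states σ(0) and σ(1) are hypothetical choices made only for illustration, as is the function name. Because the map reads only the diagonal entries ⟨x|ρ|x⟩, a coherent superposition and the corresponding incoherent mixture produce the same output.

```python
def cq_channel(rho, sigma):
    """Apply rho -> sum_x <x|rho|x> sigma[x] in the computational basis."""
    dim_out = len(sigma[0])
    return [[sum(rho[x][x] * sigma[x][i][j] for x in range(len(sigma)))
             for j in range(dim_out)] for i in range(dim_out)]

# Hypothetical output states, chosen only for illustration.
sigma = [
    [[1.0, 0.0], [0.0, 0.0]],   # sigma(0) = |0><0|
    [[0.5, 0.5], [0.5, 0.5]],   # sigma(1) = |+><+|
]

plus = [[0.5, 0.5], [0.5, 0.5]]     # coherent superposition |+><+|
mixed = [[0.5, 0.0], [0.0, 0.5]]    # maximally mixed state I/2

# Both inputs have the same diagonal, so the outputs coincide.
print(cq_channel(plus, sigma))
print(cq_channel(mixed, sigma))
```

The off-diagonal coherences of the input never reach the output, which is the sense in which the measurement step destroys any entanglement the input shared with another system.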
Another pleasing feature of this formula is its robustness. For example, the capacity
does not increase if we allow the sender and receiver to share randomness, or if we
allow feedback from receiver to sender. But for quantum channels the story is more
complicated. We’ve seen already that no simple single-letter formula is known for the
classical capacity of a quantum channel, if we allow entanglement among the channel
inputs, and we’ll soon see that the same is true for the quantum capacity. In addition, it
turns out that entanglement shared between sender and receiver can boost the classical
and quantum capacities of some channels, and so can “backward” communication from
receiver to sender. There are a variety of different notions of capacity for quantum
channels, all reasonably natural, and all with different achievable rates.
While Shannon’s theory of classical communication over noisy classical channels is
pristine and elegant, the same cannot be said for the theory of communication over noisy
quantum channels, at least not in its current state. It’s still a work in progress. Perhaps
some day another genius like Shannon will construct a beautiful theory of quantum
capacities. For now, at least there are a lot of interesting things we can say about
achievable rates. Furthermore, the tools that have been developed to address questions
about quantum capacities have other applications beyond communication theory.
The most direct analog of the classical capacity of a classical channel is the quantum
capacity of a quantum channel, unassisted by shared entanglement or feedback. The
quantum channel N^{A→B} is a TPCP map from HA to HB, and Alice is to use the
channel n times to convey a quantum state to Bob with high fidelity. She prepares her
state |ψ⟩ in a code subspace
\[ \mathcal{H}^{(n)} \subseteq \mathcal{H}_A^{\otimes n} \tag{10.266} \]
and sends it to Bob, who applies a decoding map, attempting to recover |ψ⟩. The rate
R of the code is the number of encoded qubits sent per channel use,
\[ R = \frac{1}{n} \log_2 \dim \mathcal{H}^{(n)} . \tag{10.267} \]
We say that the rate R is achievable if there is a sequence of codes with increasing n
such that for any ε, δ > 0 and for sufficiently large n the rate is at least R − δ and Bob’s
recovered state ρ has fidelity F = ⟨ψ|ρ|ψ⟩ ≥ 1 − ε. The quantum channel capacity Q(N)
is the supremum of all achievable rates.
There is a regularized formula for Q(N ). To understand the formula we first need
to recall that any channel N^{A→B} has an isometric Stinespring dilation U^{A→BE} where
E is the channel’s “environment.” Furthermore, any input density operator ρA has a
purification; if we introduce a reference system R, for any ρA there is a pure state ψRA
such that ρA = tr_R(|ψ⟩⟨ψ|). (I will sometimes use ψ rather than the Dirac ket |ψ⟩
to denote a pure state vector, when the context makes the meaning clear and the ket
notation seems unnecessarily cumbersome.) Applying the channel’s dilation to ψRA, we
obtain an output pure state φRBE, which we represent graphically as:
[Diagram: the isometry U maps the input system A to the output B and the environment E.]
Here the maximum is taken over all possible input density operators {ρA }, and H(R|B)
is the quantum conditional entropy
where in the last equality we used H(RB) = H(E) in a pure state of RBE. The quantity
−H(R|B) has such a pivotal role in quantum communication theory that it deserves to
have its own special name. We call it the coherent information from R to B and denote
it
\[ I_c(R\rangle B)_\phi = -H(R|B)_\phi = H(B)_\phi - H(E)_\phi . \tag{10.270} \]
This quantity does not depend on how the purification φ of the density operator ρA is
chosen; any one purification can be obtained from any other by a unitary transformation
acting on R alone, which does not alter H(B) or H(E). Indeed, since the expression
H(B) − H(E) only depends on the marginal state of BE, for the purpose of computing
this quantity we could just as well consider the input to the channel to be the mixed state
ρA obtained from ψRA by tracing out the reference system R. Furthermore, the coherent
information does not depend on how we choose the dilation of the quantum channel;
given a purification of the input density operator ρA, I_c(R⟩B)_φ = H(B) − H(RB) is
determined by the output density operator of RB.
For a classical channel, H(R|B) is always nonnegative and the coherent information
is never positive. In the quantum setting, I_c(R⟩B) is positive if the reference system R
is more strongly correlated with the channel output B than with the environment E.
Indeed, an alternative way to express the coherent information is
\[ I_c(R\rangle B) = \frac{1}{2}\left(I(R;B) - I(R;E)\right) = H(B) - H(E), \tag{10.271} \]
where we note that (because φRBE is pure)
I(R; B) = H(R) + H(B) − H(RB) = H(R) + H(B) − H(E),
I(R; E) = H(R) + H(E) − H(RE) = H(R) + H(E) − H(B). (10.272)
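As a hedged illustration of H(B) − H(E), the sketch below evaluates the coherent information of a qubit dephasing channel with Kraus operators √(1−p) I and √p Z. The channel, the input ρA = I/2, and the helper names are our own example choices. It assumes real Kraus operators, so that transposition plays the role of the adjoint, and it uses the standard fact that for the dilation U = Σ_i K_i ⊗ |i⟩_E the environment state has matrix elements tr(K_i ρ K_j†).

```python
import math

def entropy_2x2(m):
    """Von Neumann entropy (bits) of a 2x2 real symmetric density matrix."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = math.sqrt(max(tr * tr / 4 - det, 0.0))
    return sum(-x * math.log2(x) for x in (tr / 2 + disc, tr / 2 - disc)
               if x > 1e-15)

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def coherent_information(rho, kraus):
    """I_c = H(B) - H(E) for input rho and two real 2x2 Kraus operators."""
    # Channel output: rho_B = sum_i K_i rho K_i^T.
    terms = [matmul(matmul(k, rho), transpose(k)) for k in kraus]
    rho_b = [[sum(t[i][j] for t in terms) for j in range(2)] for i in range(2)]
    # Environment state: (rho_E)_{ij} = tr(K_i rho K_j^T).
    rho_e = [[sum(matmul(matmul(ki, rho), transpose(kj))[d][d] for d in range(2))
              for kj in kraus] for ki in kraus]
    return entropy_2x2(rho_b) - entropy_2x2(rho_e)

def dephasing_kraus(p):
    """Kraus operators sqrt(1-p) I and sqrt(p) Z."""
    a, b = math.sqrt(1 - p), math.sqrt(p)
    return [[[a, 0.0], [0.0, a]], [[b, 0.0], [0.0, -b]]]

half_identity = [[0.5, 0.0], [0.0, 0.5]]   # A is half of a maximally entangled pair
for p in (0.0, 0.1, 0.5):
    print(p, coherent_information(half_identity, dephasing_kraus(p)))
```

With this input the result is one minus the binary entropy of p: a full bit when the channel is noiseless, and zero at p = 1/2, where the environment is as strongly correlated with R as the output B is.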
Now we can state the regularized formula for the quantum channel capacity — it is
the optimal asymptotic coherent information per letter
\[ Q(\mathcal{N}^{A\to B}) = \lim_{n\to\infty} \max_{A^n} \frac{1}{n}\, I_c(R^n\rangle B^n)_{\phi_{R^nB^nE^n}} , \tag{10.273} \]
where the input density operator ρAn is allowed to be entangled across the n channel
uses. If coherent information were subadditive, we could reduce this expression to a
single-letter quantity, the one-shot capacity Q1 (N ). But, unfortunately, for some chan-
nels the coherent information can be superadditive, in which case the regularized formula
is not very informative. At least we can say that Q1 (N ) is an achievable rate, and there-
fore a lower bound on the capacity.
and hence
\[ I_c(R\rangle A) \ge I_c(R\rangle B) \ge I_c(R\rangle C). \tag{10.276} \]
A quantum channel cannot increase the coherent information, which has been called the
quantum data-processing inequality.
Suppose now that ρA is a quantum code state, and that the two channels acting in
succession are a noisy channel N^{A→B} and the decoding map D^{B→B̂} applied by Bob to
the channel output in order to recover the channel input. Consider the action of the
dilation U^{A→BE} of N followed by the dilation V^{B→B̂B′} of D on the input purification
ψRA, under the assumption that Bob is able to recover perfectly:
\[ \psi_{RA} \xrightarrow{\ U\ } \phi_{RBE} \xrightarrow{\ V\ } \tilde\psi_{R\hat{B}B'E} = \psi_{R\hat{B}} \otimes \chi_{B'E} . \tag{10.277} \]
If the decoding is perfect, then after decoding Bob holds in system B̂ the purification
of the state of R, so that
\[ H(R) = I_c(R\rangle A)_\psi = I_c(R\rangle \hat{B})_{\tilde\psi} . \tag{10.278} \]
Since the initial and final states have the same coherent information, the quantum data
processing inequality implies that the same must be true for the intermediate state
φRBE :
\[
\begin{aligned}
H(R) &= I_c(R\rangle B) = H(B) - H(E) \\
\implies H(B) &= H(RE) = H(R) + H(E).
\end{aligned}
\tag{10.279}
\]
Thus the state of RE is a product state. We have found that if Bob is able to recover
perfectly from the action of the channel dilation U A→BE on the pure state ψRA , then,
in the resulting channel output pure state φRBE , the marginal state ρRE must be the
product ρR ⊗ ρE . Recall that we encountered this criterion for recoverability earlier,
when discussing quantum error-correcting codes in Chapter 7.
Conversely, suppose that ψRA is an entangled pure state, and Alice wishes to transfer
the purification of R to Bob by sending it through the noisy channel U^{A→BE}. And
suppose that in the resulting tripartite pure state φRBE , the marginal state of RE
factorizes as ρRE = ρR ⊗ ρE . Then B decomposes into subsystems B = B1 B2 such that
\[ \phi_{RBE} = W_B \left(\tilde\psi_{RB_1} \otimes \chi_{B_2E}\right) , \tag{10.280} \]
where W_B is some unitary change of basis in B. Now Bob can construct an isometric
decoder V^{B1→B̂} W_B^†, which extracts the purification of R into Bob’s preferred subsystem
B̂. Since all purifications of R differ by an isometry on Bob’s side, Bob can choose his
decoding map to output the state ψRB̂ ; then the input state of RA is successfully
transmitted to RB̂ as desired. Furthermore, we may choose the initial state to be a
maximally entangled state ΦRA of the reference system with the code space of a quantum
code; if the marginal state of RE factorizes in the resulting output pure state φRBE ,
then by the relative state method of Chapter 3 we conclude that any state in the code
space can be sent through the channel and decoded with perfect fidelity by Bob.
We have found that purified quantum information transmitted through the noisy
channel is exactly correctable if and only if the reference system is completely uncor-
related with the channel’s environment, or as we sometimes say, decoupled from the
environment. This is the decoupling principle, a powerful notion underlying many of the
key results in the theory of quantum channels.
So far we have shown that exact correctability corresponds to exact decoupling. But we
can likewise see that approximate correctability corresponds to approximate decoupling.
Suppose for example that the state of RE is close to a product state in the L1 norm:
\[ \|\rho_{RE} - \rho_R \otimes \rho_E\|_1 \le \varepsilon . \tag{10.281} \]
As we learned in Chapter 2, if two density operators are close together in this norm, that
means they also have fidelity close to one and hence purifications with a large overlap.
Any purification of the product state ρR ⊗ ρE has the form
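\[ \tilde\phi_{RBE} = W_B \left(\tilde\psi_{RB_1} \otimes \chi_{B_2E}\right) , \tag{10.282} \]
and since all purifications of ρRE can be transformed to one another by an isometry
acting on the purifying system B, there is a way to choose W_B such that
\[ F(\rho_{RE}, \rho_R \otimes \rho_E) = \left|\langle\phi_{RBE}|\tilde\phi_{RBE}\rangle\right|^2 \ge 1 - \|\rho_{RE} - \rho_R \otimes \rho_E\|_1 \ge 1 - \varepsilon . \tag{10.283} \]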
Furthermore, because fidelity is monotonic, both under tracing out E and under the
action of Bob’s decoding map, and because Bob can decode φ̃RBE perfectly, we conclude
that
\[ F\left(\mathcal{D}^{B\to\hat{B}}(\rho_{RB}),\, \psi_{R\hat{B}}\right) \ge 1 - \varepsilon \tag{10.284} \]
if Bob chooses the proper decoding map D. Thus approximate decoupling in the L1 norm
implies high-fidelity correctability. It is convenient to note that a similar argument still
works if ρRE is close in the L1 norm to ρ̃R ⊗ ρ̃E , where ρ̃R is not necessarily trE (ρRE )
and ρ̃E is not necessarily trR (ρRE ).
On the other hand, if (approximate) decoupling fails, the fidelity of Bob’s decoded
state will be seriously compromised. Suppose that in the state φRBE we have
I(R; E) = H(R) + H(E) − H(RE) = ε > 0. (10.285)
Then the coherent information of φ is
\[ I_c(R\rangle B)_\phi = H(B)_\phi - H(E)_\phi = H(RE)_\phi - H(E)_\phi = H(R)_\phi - \varepsilon. \tag{10.286} \]
By the quantum data processing inequality, we know that the coherent information of
Bob’s decoded state ψ̃RB̂ is no larger; hence
The deviation from perfect decoupling means that the decoded state of RB̂ has some
residual entanglement with the environment E, and is therefore impure.
Now we have the tools to derive an upper bound on the quantum channel capacity
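Q(N ). For n channel uses, let $\psi^{(n)}$ be a maximally entangled state of a reference system
$\mathcal{H}_R^{(n)} \subseteq \mathcal{H}_R^{\otimes n}$ with a code space $\mathcal{H}_A^{(n)} \subseteq \mathcal{H}_A^{\otimes n}$, where $\dim \mathcal{H}_A^{(n)} = 2^{n\bar R}$, so that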
These quantities are all evaluated in the state $\phi_{RB_1B_2E_1E_2}$. But notice that for the
evaluation of $H(B_1) - H(E_1)$, the isometry $U_2^{A_2 \to B_2E_2}$ is irrelevant. This quantity is really
the same as the coherent information $I_c(RA_2\rangle B_1)$, where now we regard $A_2$ as part of the
reference system for the input to channel $\mathcal{N}_1$. Similarly $H(B_2) - H(E_2) = I_c(RA_1\rangle B_2)$,
and therefore,
where in the last inequality we use the definition of the one-shot capacity as coher-
ent information maximized over all inputs. Since Q1 (N1 ⊗ N2 ) is likewise defined by
maximizing the coherent information $I_c(R\rangle B_1B_2)$, we find that
where the last inequality follows from eq.(10.297) assuming that N is degradable. We’ll
see that Q1 (N ) is actually an achievable rate, and therefore a single-letter formula for
the quantum capacity of a degradable channel.
As a concrete example of a degradable channel, consider the generalized dephasing
channel with dilation
\[ U^{A \to BE} : |x\rangle_A \mapsto |x\rangle_B \otimes |\alpha_x\rangle_E, \tag{10.299} \]
where $\{|x\rangle_A\}$, $\{|x\rangle_B\}$ are orthonormal bases for $\mathcal{H}_A$, $\mathcal{H}_B$ respectively, and the states
$\{|\alpha_x\rangle_E\}$ of the environment are normalized but not necessarily orthogonal. (We discussed
the special case where A and B are qubits in §3.4.2.) The corresponding channel is
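\[ \mathcal{N}^{A \to B} : \rho \mapsto \sum_{x,x'} |x\rangle\langle x| \rho |x'\rangle\langle x'| \, \langle \alpha_{x'} | \alpha_x \rangle, \tag{10.300} \]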
In the special case where the states $\{|\alpha_x\rangle_E = |x\rangle_E\}$ are orthonormal, we obtain the
completely dephasing channel
\[ \Delta^{A\to B} : \rho \mapsto \sum_x |x\rangle\langle x| \rho |x\rangle\langle x|, \tag{10.302} \]
whose complement ∆A→E has the same form as ∆A→B . (Here subscripts have been
suppressed to avoid cluttering the notation, but it should be clear from the context
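whether $|x\rangle$ denotes $|x\rangle_A$, $|x\rangle_B$, or $|x\rangle_E$ in the expressions for $\mathcal{N}^{A\to B}$, $\mathcal{N}_c^{A\to E}$, $\Delta^{A\to B}$,
and $\Delta^{A\to E}$.) We can easily check that
\[ \mathcal{N}_c^{A\to E} = \mathcal{N}_c^{C\to E} \circ \Delta^{B\to C} \circ \mathcal{N}^{A\to B}; \tag{10.303} \]
therefore $\mathcal{N}_c \circ \Delta$ degrades $\mathcal{N}$ to $\mathcal{N}_c$. Thus $\mathcal{N}$ is degradable and $Q(\mathcal{N}) = Q_1(\mathcal{N})$.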
Further examples of degradable channels are discussed in Exercise 10.12.
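The degradability of the generalized dephasing channel can be checked numerically in a small case. The following is a minimal sketch, not part of the original text: it assumes a qubit input, a two-dimensional environment with real states |α_0⟩, |α_1⟩ at an illustrative angle π/3, and it takes the complementary channel to be ρ ↦ Σ_x ⟨x|ρ|x⟩ |α_x⟩⟨α_x|, which follows from tracing out B in the dilation of eq.(10.299). The function names are hypothetical.

```python
import math

def inner(u, v):
    """Inner product <u|v> of two vectors given as sequences of numbers."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def dephase(rho, alphas):
    """Generalized dephasing channel: entry (x, x') is scaled by <alpha_x'|alpha_x>."""
    d = len(rho)
    return [[rho[x][y] * inner(alphas[y], alphas[x]) for y in range(d)]
            for x in range(d)]

def complement(rho, alphas):
    """Complementary channel (assumed form): sum_x rho[x][x] |alpha_x><alpha_x|."""
    d, m = len(rho), len(alphas[0])
    return [[sum(rho[x][x] * alphas[x][i] * complex(alphas[x][j]).conjugate()
                 for x in range(d)) for j in range(m)] for i in range(m)]

def completely_dephase(rho):
    """Completely dephasing channel: keep only the diagonal entries."""
    d = len(rho)
    return [[rho[x][y] if x == y else 0.0 for y in range(d)] for x in range(d)]

# Illustrative choice: environment states with overlap cos(pi/3) = 1/2.
theta = math.pi / 3
alphas = [(1.0, 0.0), (math.cos(theta), math.sin(theta))]
rho_plus = [[0.5, 0.5], [0.5, 0.5]]  # density operator of (|0> + |1>)/sqrt(2)

out = dephase(rho_plus, alphas)
degraded = complement(completely_dephase(out), alphas)  # N_c after Delta after N
direct = complement(rho_plus, alphas)                   # N_c applied directly
max_gap = max(abs(degraded[i][j] - direct[i][j])
              for i in range(2) for j in range(2))
print("off-diagonal of N(rho):", out[0][1])
print("largest mismatch between N_c and N_c.Delta.N:", max_gap)
```

The diagonal of ρ is untouched by the channel, and the complementary channel depends only on that diagonal, which is why the composition reproduces it.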
and (1/2) I(R; B) = H(R) = H(B) = log2 d is just the number of qubits in A. To remember
the factor of 1/2 in front of I(R; E), consider the case of a noiseless classical channel (what
we called the completely dephasing channel in §10.7.3), where the quantum information
completely decoheres in a preferred basis; in that case
\[ \phi_{RBE} = \frac{1}{\sqrt{d}} \sum_{x=0}^{d-1} |x\rangle_R \otimes |x\rangle_B \otimes |x\rangle_E, \tag{10.306} \]
and I(R; B) = I(R; E) = H(R) = H(B) = log2 d. Then the father inequality merely
expresses the power of quantum teleportation: we can transmit n/2 qubits by consuming
n/2 ebits and sending n bits through the noiseless classical channel.
Before proving the father resource inequality, we will first discuss a few of its inter-
esting consequences.
In this case there is a matching upper bound, so eq.(10.310) is really an equality, and
hence a single-letter formula for the entanglement-assisted classical capacity. Further-
more, eq.(10.309) tells us a rate of entanglement consumption which suffices to achieve
the capacity. If we disregard the cost of entanglement, the father protocol shows that a
rate can be achieved for entanglement-assisted quantum communication which is half the
entanglement-assisted classical capacity CE (N ) of the noisy channel N . That’s clearly
true, since by consuming entanglement we can use teleportation to convert n bits of
classical communication into n/2 qubits of quantum communication. We also note that
for the case where N is a noisy classical channel, eq.(10.310) matches Shannon’s classical
capacity; in that case, no consumption of entanglement is needed to reach the optimal
classical communication rate.
After repaying their debt, Alice and Bob retain a number of qubits of quantum commu-
nication per channel use
\[ \frac{1}{2} I(R;B) - \frac{1}{2} I(R;E) = H(B) - H(E) = I_c(R\rangle B), \tag{10.312} \]
the channel’s coherent information from R to B. We therefore obtain the achievable rate
for quantum communication
\[ \langle \mathcal{N}^{A\to B} : \rho_A \rangle \ge I_c(R\rangle B)\,[q \to q], \tag{10.313} \]
albeit in the catalyzed setting. It can actually be shown that this same rate is achievable
without invoking catalysis (see §10.9.4). As already discussed in §10.7.1, though, because
of the superadditivity of coherent information this resource inequality does not yield a
general single-letter formula for the quantum channel capacity Q(N ).
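As a numerical illustration of this bookkeeping, the sketch below evaluates the father protocol's cost (1/2) I(R; E), its yield (1/2) I(R; B), and their difference for one simple case. The case is an assumption chosen for illustration: a qubit generalized dephasing channel with a maximally mixed input, for which H(R) = H(B) = 1 and the environment's marginal state has eigenvalues (1 ± c)/2, where c = |⟨α_0|α_1⟩|. The function names are hypothetical.

```python
import math

def h2(p):
    """Binary Shannon entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def father_rates(overlap):
    """Return (ebits consumed, qubits sent, coherent information) per channel use.

    Assumes phi_RBE = (|0>|0>|alpha_0> + |1>|1>|alpha_1>)/sqrt(2) with
    |<alpha_0|alpha_1>| = overlap.
    """
    h_r = 1.0
    h_b = 1.0
    h_e = h2((1 + overlap) / 2)
    # For a pure tripartite state, H(RB) = H(E) and H(RE) = H(B).
    i_rb = h_r + h_b - h_e
    i_re = h_r + h_e - h_b
    return 0.5 * i_re, 0.5 * i_rb, h_b - h_e

for c in (0.0, 0.5, 1.0):
    cost, gain, coherent = father_rates(c)
    print(f"overlap {c}: cost {cost:.4f} ebits, gain {gain:.4f} qubits, "
          f"I_c {coherent:.4f}")
```

With overlap 0 the channel is completely dephasing, so cost and gain are both 1/2 and nothing is left after repaying the entanglement debt; with overlap 1 the channel is noiseless and one qubit is sent per use at no cost.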
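[Figure: the mother protocol in three stages. Alice, Bob, and the environment share φABE on A, B, E; Alice's system is split into A1 A2; at the end the systems are A2, B2, B1, and E.]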
In the i.i.d. version of the mother protocol, the initial state is $\phi_{ABE}^{\otimes n}$, and the task
achieved by the protocol is summarized by the mother resource inequality
\[ \langle \phi_{ABE} \rangle + \frac{1}{2} I(A;E)\,[q\to q] \ge \frac{1}{2} I(A;B)\,[qq] + \langle \phi'_{B_1E}\rangle, \tag{10.314} \]
where the resources on the left-hand side can be used to achieve the result on the right-hand
side, in an asymptotic i.i.d. setting, and the entropic quantities are evaluated in the
state φABE. That is, if $A_1^{(n)}$ denotes the state Alice sends and $A_2^{(n)}$ denotes the state she
keeps, then for any positive ε, the state of $A_2^{(n)} E^n$ is ε-close in the L1 norm to a product
state, where $\log |A_1^{(n)}| = \frac{n}{2} I(A;E) + o(n)$, while $A_2^{(n)} B_2^{(n)}$ contains $\frac{n}{2} I(A;B) - o(n)$
shared ebits of entanglement. Eq.(10.314) means that for any input pure state φABE
there is a way to choose the subsystem $A_2^{(n)}$ of the specified dimension such that $A_2^{(n)}$
and $E^n$ are nearly uncorrelated and the specified amount of entanglement is harvested
in $A_2^{(n)} B_2^{(n)}$.
The mother protocol is in a sense dual to the father protocol. While the father pro-
tocol consumes entanglement to achieve quantum communication, the mother protocol
consumes quantum communication and harvests entanglement. For the mother, I(A; B)
quantifies the correlation between Alice and Bob at the beginning of the protocol (some-
thing good), and I(A; E) quantifies the noise in the initial shared entanglement (some-
thing bad). The mother protocol can also be viewed as a quantum generalization of the
Slepian-Wolf distributed compression protocol discussed in §10.1.3. The mother proto-
col merges Alice’s and Bob’s shares of the purification of E by sending Alice’s share to
Bob, much as distributed source coding merges the classical correlations shared by Alice
and Bob by sending Alice’s classical information to Bob. For this reason the mother
protocol has been called the fully quantum Slepian-Wolf protocol; the modifier “fully”
will be clarified below, when we discuss state merging, a variant on quantum state trans-
fer in which classical communication is assumed to be freely available. For the mother
(or father) protocol, (1/2) I(A; E) (or (1/2) I(R; E)) quantifies the price we pay to execute the
protocol, while (1/2) I(A; B) (or (1/2) I(R; B)) quantifies the reward we receive.
We may also view the mother protocol as a generalization of the entanglement con-
centration protocol discussed in §10.4, extending that discussion in three ways:
1. The initial entangled state shared by Alice and Bob may be mixed rather than pure.
2. The communication from Alice to Bob is quantum rather than classical.
3. The amount of communication that suffices to execute the protocol is quantified by
the resource inequality.
Also note that if the state of AE is pure (uncorrelated with B), then the mother protocol
reduces to Schumacher compression. In that case (1/2) I(A; E) = H(A), and the mother
resource inequality states that the purification of E n can be transferred to Bob with
high fidelity using nH(A) + o(n) qubits of quantum communication.
Before proving the mother resource inequality, we will first discuss a few of its inter-
esting consequences.
Hashing inequality.
Suppose Alice and Bob wish to distill entanglement from many copies of the state φABE ,
using only local operations and classical communication (LOCC). In the catalytic set-
ting, they can borrow some quantum communication, use the mother protocol to distill
some shared entanglement, and then use classical communication and their harvested
entanglement to repay their debt via quantum teleportation. Using the teleportation
resource inequality
TP : [qq] + 2[c → c] ≥ [q → q] (10.315)
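(n/2) I(A; E) times, and combining with the mother resource inequality, we obtain
\[ \langle \phi_{ABE} \rangle + I(A;E)\,[c \to c] \ge I_c(A\rangle B)\,[qq] + \langle \phi'_{B_1 E} \rangle, \tag{10.316} \]
since the net amount of distilled entanglement is (1/2) I(A; B) per copy of φ achieved by
the mother minus the (1/2) I(A; E) per copy consumed by teleportation, and
\[ \frac{1}{2} I(A;B) - \frac{1}{2} I(A;E) = H(B) - H(E) = I_c(A\rangle B). \tag{10.317} \]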
Eq.(10.316) is the hashing inequality, which quantifies an achievable rate for distilling
ebits of entanglement shared by Alice and Bob from many copies of a mixed state ρAB ,
using one-way classical communication, assuming that I_c(A⟩B) = −H(A|B) is positive.
Furthermore, the hashing inequality tells us how much classical communication suffices
for this purpose.
In the case where the state ρAB is pure, I_c(A⟩B) = H(A) − H(AB) = H(A) and
there is no environment E; thus we recover our earlier conclusion about concentration
of pure-state bipartite entanglement — that H(A) Bell pairs can be extracted per copy,
with a negligible classical communication cost.
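To get a feel for the numbers, the sketch below evaluates the hashing rate H(B) − H(E) for a family of two-qubit states that is not discussed above and is used here only as an assumed example: Bell-diagonal states, i.e. mixtures of the four Bell states with probabilities p_0, ..., p_3. For such a state H(A) = H(B) = 1 and H(AB) = H(E) is the Shannon entropy of the mixing probabilities. The classical cost reported is I(A; E) bits per copy, as implied by teleporting (1/2) I(A; E) qubits at two bits each. The function names are hypothetical.

```python
import math

def shannon(probs):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def hashing_rate(bell_probs):
    """Coherent information H(B) - H(E) for a Bell-diagonal state (assumed example)."""
    h_b = 1.0                    # Bob's qubit is maximally mixed
    h_e = shannon(bell_probs)    # H(E) = H(AB) for the purified state
    return h_b - h_e

def classical_cost(bell_probs):
    """I(A;E) = H(A) + H(E) - H(AE), using H(AE) = H(B) for a pure state."""
    h_a, h_b = 1.0, 1.0
    h_e = shannon(bell_probs)
    return h_a + h_e - h_b

# A Bell pair mixed with equal weights on the three other Bell states.
fidelity = 0.9
noisy = [fidelity] + [(1 - fidelity) / 3] * 3
print("ebits per copy:", hashing_rate(noisy))
print("classical bits per copy:", classical_cost(noisy))
```

A pure Bell pair gives rate 1 at zero classical cost, while the maximally mixed state gives a negative value, in which case the hashing inequality does not apply.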
State merging.
Suppose Alice and Bob share the purification of Eve’s state, and Alice wants to transfer
her share of the purification to Bob, where now unlimited classical communication from
Alice to Bob is available at no cost. In contrast to the mother protocol, Alice wants to
achieve the transfer with as little one-way quantum communication as possible, even if
she needs to send more bits in order to send fewer qubits.
In the catalytic setting, Alice and Bob can borrow some quantum communication,
perform the mother protocol, then use teleportation and the entanglement extracted by
the mother protocol to repay some of the borrowed quantum communication. Combining
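teleportation of (n/2) I(A; B) qubits with the mother resource inequality, we obtain
\[ \langle\phi_{ABE}\rangle + H(A|B)\,[q\to q] + I(A;B)\,[c\to c] \ge \langle \phi'_{B_1E}\rangle, \tag{10.318} \]
using
\[ \frac{1}{2} I(A;E) - \frac{1}{2} I(A;B) = H(E) - H(B) = H(AB) - H(B) = H(A|B). \tag{10.319} \]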
Eq.(10.318) is the state-merging inequality, expressing how much quantum and classical
communication suffices to achieve the state transfer in an i.i.d. setting, assuming that
H(A|B) is nonnegative.
Like the mother protocol, this state merging protocol can be viewed as a (partially)
quantum version of the Slepian-Wolf protocol for merging classical correlations. In the
classical setting, H(X|Y ) quantifies Bob’s remaining ignorance about Alice’s information
X when Bob knows only Y ; correspondingly, Alice can reveal X to Bob by sending
H(X|Y ) bits per letter of X. Similarly, state merging provides an operational meaning
to the quantum conditional information H(A|B), as the number of qubits per copy of φ
that Alice sends to Bob to convey her share of the purification of E, assuming classical
communication is free. In this sense we may regard H(A|B) as a measure of Bob’s
remaining “ignorance” about the shared purification of E when he holds only B.
Classically, H(X|Y ) is nonnegative, and zero if and only if Bob is already certain about
XY , but quantumly H(A|B) can be negative. How can Bob have “negative uncertainty”
about the quantum state of AB? If H(A|B) < 0, or equivalently if I(A; E) < I(A; B),
then the mother protocol yields more quantum entanglement than the amount of quantum
communication it consumes. Therefore, when H(A|B) is negative (i.e. I_c(A⟩B) is
positive), the mother resource inequality implies the hashing inequality, asserting that
classical communication from Alice to Bob not only achieves state transfer, but also
distills −H(A|B) ebits of entanglement per copy of φ. These distilled ebits can be de-
posited in the entanglement bank, to be withdrawn as needed in future rounds of state
merging, thus reducing the quantum communication cost of those future rounds. Bob’s
“negative uncertainty” today reduces the quantum communication cost of tasks to be
performed tomorrow.
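The contrast between the classical and quantum conditional entropies can be made concrete with a short calculation. The sketch below is illustrative only: the classical function takes a joint distribution p(x, y), and the quantum function is restricted to the assumed special case of a pure bipartite state specified by its Schmidt probabilities, for which H(AB) = 0 and H(B) is the Shannon entropy of those probabilities. The function names are hypothetical.

```python
import math

def shannon(probs):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def conditional_entropy_classical(joint):
    """H(X|Y) = H(XY) - H(Y) for a joint distribution given as {(x, y): p}."""
    p_y = {}
    for (_, y), p in joint.items():
        p_y[y] = p_y.get(y, 0.0) + p
    return shannon(joint.values()) - shannon(p_y.values())

def conditional_entropy_pure_state(schmidt_probs):
    """H(A|B) = H(AB) - H(B) for a pure state of AB (assumed special case)."""
    h_ab = 0.0
    h_b = shannon(schmidt_probs)
    return h_ab - h_b

correlated = {(0, 0): 0.5, (1, 1): 0.5}              # Bob already knows X
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
print(conditional_entropy_classical(correlated))     # bits Alice must send
print(conditional_entropy_classical(independent))
print(conditional_entropy_pure_state([0.5, 0.5]))    # a shared Bell pair
```

The classical values are never negative, while the Bell pair gives H(A|B) = −1: state merging costs no qubits and leaves one ebit to spend later.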
[Figure: the cut separating AE from B, shown before the protocol (systems A, B, E) and after it (systems A2, B2, B1, E).]
In the hashing protocol, applied to n copies of φABE , the entanglement across this cut
at the beginning of the protocol is nH(B). By the end of the protocol $E^n$ has decoupled
from $A_2^{(n)}$ and has entanglement nH(E) with $B_1^{(n)}$, ignoring o(n) corrections. If k ebits
shared by Alice and Bob are distilled, the final entanglement across the AE-B cut is
\[ nH(E) + k \le nH(B) \implies \frac{k}{n} \le H(B) - H(E) = -H(A|B). \tag{10.321} \]
This inequality holds because LOCC cannot increase the entanglement across the cut,
and implies that no more than −H(A|B) ebits of entanglement per copy of φABE can
be distilled in the hashing protocol, asymptotically.
On the other hand, if H(A|B) is positive, at the conclusion of state merging $B_1^{(n)}$ is
entangled with $E^n$, and the entanglement across the AE-B cut is at least nH(E). To
achieve this increase in entanglement, the number of qubits sent from Alice to Bob must
be at least
\[ k \ge nH(E) - nH(B) \implies \frac{k}{n} \ge H(E) - H(B) = H(A|B). \tag{10.322} \]
This inequality holds because the entanglement across the cut cannot increase by more
than the quantum communication across the cut, and implies that at least H(A|B)
qubits must be sent per copy of φABE to achieve state merging.
To summarize, we have proven strong subadditivity, not by the traditional route of
sophisticated matrix analysis, but via a less direct method. This proof is built on two
cornerstones of quantum information theory — the decoupling principle and the theory
of typical subspaces — which are essential ingredients in the proof of the mother resource
inequality.
the state of the agent’s memory, which captures the agent’s ignorance before erasure
and therefore also the thermodynamic cost of erasing. Thus the minimal work needed
to erase system A should be expressed as
W (A|O) = H(A|O)kT ln 2, (10.323)
where O is the memory of the observer who performs the erasure, and H(A|O) quantifies
that observer’s ignorance about the state of A.
But what if A and O are quantum systems? We know that if A and O are entangled,
then the conditional entropy H(A|O) can be negative. Does that mean we can erase A
while extracting work rather than doing work?
Yes, we can! Suppose for example that A and O are qubits and their initial state is
maximally entangled. By controlling the contact between AO and the heat bath, the
observer can extract work W = 2kT ln 2 while transforming AO to a maximally mixed
state, using the same work extraction protocol as described above. Then she can do work
W = kT ln 2 to return A to the state |0⟩. The net effect is to erase A while extracting
work W = kT ln 2, satisfying the equality eq.(10.323).
To appreciate why this trick works, we should consider the joint state of AO rather
than the state of A alone. Although the marginal state of A is mixed at the beginning
of the protocol and pure at the end, the state of AO is pure at the beginning and mixed
at the end. Positive work is extracted by sacrificing the purity of AO.
To generalize this idea, let’s consider n ≫ 1 copies of the state ρAO of system A and
memory O. Our goal is to map the n copies of A to the erased state |000...0⟩ while
using or extracting the optimal amount of work. In fact, the optimal work per copy is
given by eq.(10.323) in the n → ∞ limit.
To achieve this asymptotic work per copy, the observer first projects A^n onto its
typical subspace, succeeding with probability 1 − o(1). A unitary transformation then
rotates the typical subspace to a subsystem Ā containing n(H(A) + o(1)) qubits, while
“erasing” the complementary qubits as in eq.(10.144). Now it only remains to erase Ā.
The mother resource inequality ensures that we may decompose Ā into subsystems
A1 A2 such that A2 contains (n/2)(I(A; O) − o(1)) qubits and is nearly maximally entangled
with a subsystem of O^n. What is important for the erasure protocol is that we may
identify a subsystem of ĀO^n containing n(I(A; O) − o(1)) qubits which is only distance
o(1) away from a pure state. By controlling the contact between this subsystem and the
heat bath, we may extract work W = n(I(A; O) − o(1))kT ln 2 while transforming the
subsystem to a maximally mixed state. We then proceed to erase Ā, expending work
kT ln |Ā| = n(H(A) + o(1))kT ln 2. The net work cost of the erasure, per copy of ρAO,
is therefore
\[ W = \left( H(A) - I(A;O) + o(1) \right) kT \ln 2 = \left( H(A|O) + o(1) \right) kT \ln 2, \tag{10.324} \]
and the erasure succeeds with probability 1 − o(1). A notable feature of the protocol is
that only the subsystem of O^n which is entangled with A2 is affected. Any correlation
of the memory O with other systems remains intact, and can be exploited in the future
to reduce the cost of erasure of those other systems.
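The work accounting of this protocol is easy to tabulate. The sketch below is a hedged illustration that ignores the o(1) corrections: it takes H(A) and I(A; O) in bits as inputs, subtracts the work extracted from the nearly pure subsystem from the work spent erasing Ā, and converts the result to joules. The temperature of 300 K and the function names are assumptions made for the example; the Boltzmann constant is the exact SI value.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def landauer_unit(temperature_kelvin):
    """Work kT ln 2 associated with one bit at the given temperature."""
    return K_B * temperature_kelvin * math.log(2)

def net_erasure_work(h_a, i_ao, temperature_kelvin):
    """Net work per copy: spend H(A) units erasing, recover I(A;O) units first.

    Equals H(A|O) kT ln 2, since H(A) - I(A;O) = H(A|O). Negative means
    that work is extracted overall.
    """
    unit = landauer_unit(temperature_kelvin)
    extracted = i_ao * unit
    spent = h_a * unit
    return spent - extracted

room = 300.0  # illustrative temperature in kelvin
print("kT ln 2 at 300 K:", landauer_unit(room), "J")
print("unknown bit, empty memory:", net_erasure_work(1.0, 0.0, room), "J")
print("bit recorded in memory:", net_erasure_work(1.0, 1.0, room), "J")
print("qubit maximally entangled with memory:",
      net_erasure_work(1.0, 2.0, room), "J")
```

The last case reproduces the two-qubit example: 2kT ln 2 is extracted, kT ln 2 is spent, and the erasure yields net work kT ln 2.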
As does the state merging protocol, this erasure protocol provides an operational
interpretation of strong subadditivity. For positive H(A|O), H(A|O) ≥ H(A|OO') means
that it is no harder to erase A if the observer has access to both O and O' than if she
has access to O alone. For negative H(A|O), −H(A|OO') ≥ −H(A|O) means that we
can extract at least as much work from AOO' as from its subsystem AO.
To carry out this protocol and extract the optimal amount of work while erasing A,
we need to know which subsystem of O^n provides the purification of A2. The decoupling
argument ensures that this subsystem exists, but does not provide a constructive
method for finding it, and therefore no concrete protocol for erasing at optimal cost.
This quandary is characteristic of Shannon theory; for example, Shannon’s noisy channel
coding theorem ensures the existence of a code that achieves the channel capacity, but
does not provide any explicit code construction.
[Figure: a unitary U acts on system A, whose output is divided into A1 and A2; A and E are initially in the state σAE.]
\[ \mathbb{E}_U[1] = 1, \qquad \mathbb{E}_U[f(U)] = \mathbb{E}_U[f(VU)] = \mathbb{E}_U[f(UV)]. \tag{10.325} \]
These conditions uniquely define EU [f (U )], which is sometimes described as the integral
over the unitary group using the invariant measure or Haar measure on the group.
If we apply the unitary transformation U to A, and then discard A1 , the marginal
state of A2 E is
\[ \sigma_{A_2E}(U) := \mathrm{tr}_{A_1} \left[ (U_A \otimes I_E)\, \sigma_{AE}\, (U_A^\dagger \otimes I_E) \right]. \tag{10.326} \]
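The decoupling inequality expresses how close (in the L1 norm) $\sigma_{A_2E}$ is to a product
state when we average over U:
\[ \left( \mathbb{E}_U \left\| \sigma_{A_2E}(U) - \sigma^{\max}_{A_2} \otimes \sigma_E \right\|_1 \right)^2 \le \frac{|A_2|\cdot|E|}{|A_1|}\, \mathrm{tr}\left( \sigma_{AE}^2 \right), \tag{10.327} \]
where
\[ \sigma^{\max}_{A_2} := \frac{1}{|A_2|}\, I \tag{10.328} \]
denotes the maximally mixed state on A2, and $\sigma_E$ is the marginal state $\mathrm{tr}_A\, \sigma_{AE}$.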
This inequality has interesting consequences even in the case where there is no system
E at all and σ A is pure, where it becomes
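\[ \mathbb{E}_U \left\| \sigma_{A_2}(U) - \sigma^{\max}_{A_2} \right\|_1 \le \sqrt{ \frac{|A_2|}{|A_1|}\, \mathrm{tr}\, \sigma_A^2 } = \sqrt{ \frac{|A_2|}{|A_1|} }. \tag{10.329} \]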
Eq.(10.329) implies that, for a randomly chosen pure state of the bipartite system
A = A1 A2, where |A2|/|A1| ≪ 1, the density operator on A2 is very nearly maximally
mixed with high probability. One can likewise show that the expectation value
of the entanglement entropy of A1 A2 is very close to the maximal value: E[H(A2)] ≥
log2 |A2| − |A2|/(2|A1| ln 2). Thus, if for example A2 is 50 qubits and A1 is 100 qubits,
the typical entropy deviates from maximal by only about 2^{−50} ≈ 10^{−15}.
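These two bounds are simple to evaluate for given qubit counts. The sketch below, with hypothetical function names, computes the right-hand side of eq.(10.329) and the entropy deficit |A2|/(2|A1| ln 2) for subsystems of n2 and n1 qubits, so that |A2| = 2^{n2} and |A1| = 2^{n1}.

```python
import math

def l1_distance_bound(n1_qubits, n2_qubits):
    """Right-hand side of eq.(10.329): sqrt(|A2| / |A1|)."""
    return 2.0 ** ((n2_qubits - n1_qubits) / 2)

def entropy_deficit_bound(n1_qubits, n2_qubits):
    """Upper bound on log2|A2| - E[H(A2)], namely |A2| / (2 |A1| ln 2), in bits."""
    return 2.0 ** (n2_qubits - n1_qubits) / (2 * math.log(2))

n1, n2 = 100, 50  # the example from the text: A1 is 100 qubits, A2 is 50 qubits
print("average L1 distance from maximally mixed is at most",
      l1_distance_bound(n1, n2))
print("average entropy deficit is at most",
      entropy_deficit_bound(n1, n2), "bits")
```

For these sizes the entropy deficit is below 10^{−15} bits, in line with the estimate quoted above, while the L1 bound 2^{−25} is much weaker but still tiny.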
because
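\[ \mathrm{tr}\left( \sigma^{\max}_{A_2} \right)^2 = \frac{1}{|A_2|}; \tag{10.334} \]
\[ \mathbb{E}_U\left[ \mathrm{tr}\, \sigma_{A_2E}^2(U) \right] \le \frac{1}{|A_2|}\, \mathrm{tr}\, \sigma_E^2 + \frac{1}{|A_1|}\, \mathrm{tr}\, \sigma_{AE}^2. \tag{10.335} \]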
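We can facilitate the computation of $\mathbb{E}_U\left[\mathrm{tr}\,\sigma_{A_2E}^2(U)\right]$ using a clever trick. For any
bipartite system BC, imagine introducing a second copy B'C' of the system. Then
(Exercise 10.17)
\[ \mathrm{tr}_C\, \sigma_C^2 = \mathrm{tr}_{BCB'C'} \left[ (I_{BB'} \otimes S_{CC'}) (\sigma_{BC} \otimes \sigma_{B'C'}) \right], \tag{10.336} \]
where $S_{CC'}$ denotes the swap operator, which exchanges the systems C and C'.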
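The swap trick of eq.(10.336) can be verified by brute force in a small case. The sketch below is illustrative, not a proof: it assumes B and C are qubits, takes S to be the operator that exchanges C and C', picks an arbitrary mixed state σ_BC with real entries, and compares the two sides using plain nested lists. All names are hypothetical.

```python
import math

def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    """Kronecker product; row index is (row of a, row of b)."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def outer(v):
    """|v><v| for a real vector v."""
    return [[x * y for y in v] for x in v]

DB, DC = 2, 2
s = 1 / math.sqrt(2)
bell = [s, 0.0, 0.0, s]        # (|00> + |11>)/sqrt(2), index b*DC + c
product = [0.0, 1.0, 0.0, 0.0]  # |0>_B |1>_C
sigma_bc = [[0.7 * p + 0.3 * q for p, q in zip(r1, r2)]
            for r1, r2 in zip(outer(bell), outer(product))]

# Left side: purity of the marginal state on C.
sigma_c = [[sum(sigma_bc[b * DC + c][b * DC + c2] for b in range(DB))
            for c2 in range(DC)] for c in range(DC)]
lhs = trace(matmul(sigma_c, sigma_c))

# Right side: trace of (I_BB' x S_CC') (sigma_BC x sigma_B'C') on B C B' C'.
def idx(b, c, b2, c2):
    return ((b * DC + c) * DB + b2) * DC + c2

dim = (DB * DC) ** 2
swap = [[0.0] * dim for _ in range(dim)]
for b in range(DB):
    for c in range(DC):
        for b2 in range(DB):
            for c2 in range(DC):
                swap[idx(b, c2, b2, c)][idx(b, c, b2, c2)] = 1.0
rhs = trace(matmul(swap, kron(sigma_bc, sigma_bc)))

print("tr sigma_C^2 =", lhs, " swap-trick value =", rhs)
```

Both sides should agree up to rounding; for this particular state the marginal on C is diagonal with entries 0.35 and 0.65.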
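In particular, then,
\[ \begin{aligned} \mathrm{tr}_{A_2E}\, \sigma_{A_2E}^2(U) &= \mathrm{tr}_{AEA'E'} \left[ \left( I_{A_1A_1'} \otimes S_{A_2A_2'} \otimes S_{EE'} \right) \left( \sigma_{AE}(U) \otimes \sigma_{A'E'}(U) \right) \right] \\ &= \mathrm{tr}_{AEA'E'} \left[ \left( M_{AA'}(U) \otimes S_{EE'} \right) \left( \sigma_{AE} \otimes \sigma_{A'E'} \right) \right], \end{aligned} \tag{10.338} \]
where
\[ M_{AA'}(U) = \left( U_A^\dagger \otimes U_{A'}^\dagger \right) \left( I_{A_1A_1'} \otimes S_{A_2A_2'} \right) \left( U_A \otimes U_{A'} \right). \tag{10.339} \]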
where
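\[ c_I = \frac{1}{|A_2|} \left( \frac{1 - 1/|A_1|^2}{1 - 1/|A|^2} \right) \le \frac{1}{|A_2|}, \qquad c_S = \frac{1}{|A_1|} \left( \frac{1 - 1/|A_2|^2}{1 - 1/|A|^2} \right) \le \frac{1}{|A_1|}. \tag{10.341} \]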
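\[ \begin{aligned} \mathbb{E}_U\left[ \mathrm{tr}_{A_2E}\, \sigma_{A_2E}^2(U) \right] &\le \mathrm{tr}_{AEA'E'} \left[ \left( \frac{1}{|A_2|} I_{AA'} + \frac{1}{|A_1|} S_{AA'} \right) \otimes S_{EE'} \left( \sigma_{AE} \otimes \sigma_{A'E'} \right) \right] \\ &= \frac{1}{|A_2|}\, \mathrm{tr}\, \sigma_E^2 + \frac{1}{|A_1|}\, \mathrm{tr}\, \sigma_{AE}^2, \end{aligned} \tag{10.342} \]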
In order to transfer the purification of E^n to Bob, Alice first projects A^n onto its
typical subspace, succeeding with probability 1 − o(1), and compresses the result. She
then divides her compressed system Ā into two parts Ā1 Ā2 , and applies a random unitary
to Ā before sending Ā1 to Bob. Quantum state transfer is achieved if Ā2 decouples from
Ē.
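Because $\phi'_{\bar A\bar B\bar E}$ is close to $\phi_{ABE}^{\otimes n}$, we can analyze whether the protocol is successful
by supposing the initial state is $\phi'_{\bar A\bar B\bar E}$ rather than $\phi_{ABE}^{\otimes n}$. According to the decoupling
inequality
\[ \begin{aligned} \left( \mathbb{E}_U \left\| \sigma_{\bar A_2\bar E}(U) - \sigma^{\max}_{\bar A_2} \otimes \sigma_{\bar E} \right\|_1 \right)^2 &\le \frac{|\bar A|\cdot|\bar E|}{|\bar A_1|^2}\, \mathrm{tr}\, \sigma_{\bar A\bar E}^2 \\ &= \frac{1}{|\bar A_1|^2}\, 2^{n(H(A)+H(E)+o(1))}\, \mathrm{tr}\, \sigma_{\bar A\bar E}^2 = \frac{1}{|\bar A_1|^2}\, 2^{n(H(A)+H(E)-H(B)+o(1))}; \end{aligned} \tag{10.344} \]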
here we have used properties of typical subspaces in the second line, as well as the
property that $\sigma_{\bar A\bar E}$ and $\sigma_{\bar B}$ have the same nonzero eigenvalues, because $\phi'_{\bar A\bar B\bar E}$ is pure.
Eq.(10.344) bounds the L1 distance of σ Ā2 Ē (U ) from a product state when averaged
over all unitaries, and therefore suffices to ensure the existence of at least one unitary
transformation U such that the L1 distance is bounded above by the right-hand side.
Therefore, by choosing this U, Alice can decouple Ā2 from E^n to o(1) accuracy in the
L1 norm by sending to Bob
\[ \log_2 |\bar A_1| = \frac{n}{2} \left( H(A) + H(E) - H(B) + o(1) \right) = \frac{n}{2} \left( I(A;E) + o(1) \right) \tag{10.345} \]
qubits, suitably chosen from the (compressed) typical subspace of A^n. Alice retains
log2 |Ā2| = nH(A) − (n/2) I(A; E) − o(n) qubits of her compressed system, which are nearly
maximally mixed and uncorrelated with E^n; hence at the end of the protocol she shares
with Bob this many qubit pairs, which have high fidelity with a maximally entangled
state. Since φABE is pure, and therefore H(A) = (1/2)(I(A; E) + I(A; B)), we conclude that
Alice and Bob distill (n/2) I(A; B) − o(n) ebits of entanglement, thus proving the mother
resource inequality.
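The rate bookkeeping in this argument can be checked with a few lines of arithmetic. The sketch below ignores all o(n) corrections and takes the marginal entropies H(A), H(B), H(E) of a pure state φABE as inputs; the specific values used are illustrative assumptions, not taken from the text. It returns the qubits sent and ebits harvested per copy, and the exponent of the right-hand side of eq.(10.344) when Alice sends a given number of qubits. The function names are hypothetical.

```python
def mother_rates(h_a, h_b, h_e):
    """Qubits sent and ebits harvested per copy, for pure phi_ABE.

    Uses H(AE) = H(B) and H(AB) = H(E), which hold for a pure tripartite state.
    """
    i_ae = h_a + h_e - h_b
    i_ab = h_a + h_b - h_e
    return i_ae / 2, i_ab / 2

def log2_squared_distance_bound(n, h_a, h_b, h_e, qubits_sent):
    """log2 of the right-hand side of eq.(10.344), dropping o(n) terms."""
    return n * (h_a + h_e - h_b) - 2 * qubits_sent

# Illustrative entropies (in bits) for some pure state of ABE.
h_a, h_b, h_e = 1.0, 1.0, 0.5
sent, harvested = mother_rates(h_a, h_b, h_e)
print("qubits sent per copy:", sent)
print("ebits harvested per copy:", harvested)
print("sum equals H(A):", sent + harvested)

# Sending 10 qubits beyond (n/2) I(A;E) suppresses the squared distance by 2^-20.
n = 1000
print(log2_squared_distance_bound(n, h_a, h_b, h_e, n * sent + 10))
```

Two limiting cases match the discussion of the mother protocol: if AE is pure and uncorrelated with B, then H(B) = 0 and H(E) = H(A), so H(A) qubits are sent and nothing is harvested (Schumacher compression); if AB is pure, nothing needs to be sent and H(A) ebits are harvested.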
We can check that this conclusion is plausible using a crude counting argument.
Disregarding the o(n) corrections in the exponent, the state $\phi_{ABE}^{\otimes n}$ is nearly maximally
mixed on a typical subspace of $A^nE^n$ with dimension $2^{nH(AE)}$, i.e. the marginal state
on ĀĒ can be realized as a nearly uniform ensemble of this many mutually orthogonal
states. If Ā1 is randomly chosen and sufficiently small, we expect that, for each state in
this ensemble, Ā1 is nearly maximally entangled with a subsystem of the much larger
system Ā2 Ē, and that the marginal states on Ā2 Ē arising from different states in the ĀĒ
ensemble have a small overlap. Therefore, we anticipate that tracing out Ā1 yields a state
on Ā2 Ē which is nearly maximally mixed on a subspace with dimension $|\bar A_1|\, 2^{nH(AE)}$.
Approximate decoupling occurs when this state attains full rank on Ā2 Ē, since in that
case it is close to maximally mixed on Ā2 Ē and therefore close to a product state on its
support. The state transfer succeeds, therefore, provided
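\[ |\bar A_1|\, 2^{nH(AE)} \approx |\bar A_2| \cdot |\bar E| = \frac{|\bar A|\cdot|\bar E|}{|\bar A_1|} \approx \frac{2^{n(H(A)+H(E))}}{|\bar A_1|} \implies |\bar A_1|^2 \approx 2^{nI(A;E)}, \tag{10.346} \]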
as in eq.(10.345).
Our derivation of the mother resource inequality, based on random coding, does not
exhibit any concrete protocol that achieves the claimed rate, nor does it guarantee the
existence of any protocol in which the required quantum processing can be executed ef-
ficiently. Concerning the latter point, it is notable that our derivation of the decoupling
inequality applies not just to the expectation value averaged uniformly over the unitary
group, but also to any average over unitary transformations which satisfies eq.(10.340).
In fact, this identity is satisfied by a uniform average over the Clifford group, which
means that there is some Clifford transformation on Ā which achieves the rates speci-
fied in the mother resource inequality. Any Clifford transformation on n qubits can be
reached by a circuit with O(n^2) gates. Since it is also known that Schumacher compression
can be achieved by a polynomial-time quantum computation, Alice’s encoding
operation can be carried out efficiently.
In fact, after compressing, Alice encodes the quantum information she sends to Bob
using a stabilizer code (with Clifford encoder U), and Bob’s task, after receiving Ā1, is
to correct the erasure of Ā2. Bob can replace each erased qubit by the standard state |0⟩,
and then measure the code’s check operators. With high probability, there is a unique
Pauli operator acting on the erased qubits that restores Bob’s state to the code space,
and the recovery operation can be efficiently computed using linear algebra. Hence,
Bob’s part of the mother protocol, like Alice’s, can be executed efficiently.
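[Figure: circuit for the one-shot father protocol, with the encoder V acting on A1 A2, the channel N with output B2, and the decoder D acting on B1 B2 with output Ã2.]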
We would like to know how much shared entanglement suffices for Alice and Bob to
succeed.
This question can be answered using the decoupling inequality. First we introduce
a reference system R' which is maximally entangled with A2; then Bob succeeds if his
decoder can extract the purification of R'. Because the system R'B1 is maximally entangled
with A1 A2, the encoding unitary V acting on A1 A2 can be replaced by its transpose
$V^T$ acting on R'B1. We may also replace N by its Stinespring dilation $U^{A_1A_2 \to B_2E}$, so
that the extended output state φ of R'B1 B2 E is pure:
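[Figure: two equivalent circuits. On the left, V acts on A1 A2 and is followed by the dilation U with outputs B2 and E; on the right, V^T acts on R'B1 instead, and U acts on A1 A2.]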
i.i.d. version.
In the i.i.d. version of the father protocol, Alice and Bob achieve high fidelity
entanglement-assisted quantum communication through n uses of the quantum channel
$\mathcal{N}^{A\to B}$. The code they use for this purpose can be described in the following way:
Consider an input density operator ρA of system A, which is purified by a reference
system R. Sending the purified input state ψRA through $U^{A\to BE}$, the isometric dilation
of $\mathcal{N}^{A\to B}$, generates the tripartite pure state φRBE. Evidently applying $(U^{A\to BE})^{\otimes n}$ to
$\psi_{RA}^{\otimes n}$ produces $\phi_{RBE}^{\otimes n}$.
But now suppose that before transmitting the state to Bob, Alice projects A^n onto
its typical subspace Ā, succeeding with probability 1 − o(1) in preparing a state of ĀR̄
that is nearly maximally entangled, where R̄ is the typical subspace of R^n. Imagine
dividing R̄ into a randomly chosen subsystem B1 and its complementary subsystem R';
then there is a corresponding decomposition of Ā = A1 A2 such that A1 is very nearly
maximally entangled with B1 and A2 is very nearly maximally entangled with R'.
If we interpret B1 as Bob’s half of an entangled state of A1 B1 shared with Alice,
this becomes the setting where the one-shot father protocol applies, if we ignore the
|0⟩ we need to renormalize the state by multiplying by |R1|, while on the other hand the
projection suppresses the expected distance squared from a product state by a factor
|R1 |.
In the i.i.d. setting where the noisy channel is used n times, we consider $\phi_{RBE}^{\otimes n}$, and
project onto the jointly typical subspaces R̄, B̄, Ē of R^n, B^n, E^n respectively, succeeding
with high probability. We choose a code by projecting R̄ onto a random subspace with
dimension |R2 |. Then, the right-hand side of eq.(10.352) becomes
and since the inequality holds when we average uniformly over V , it surely holds for
some particular V . That unitary defines a code which achieves decoupling and has the
rate
\[ \frac{1}{n} \log_2 |R_2| = H(B) - H(E) - o(1) = I_c(R\rangle B) - o(1). \tag{10.354} \]
Hence the coherent information is an achievable rate for high-fidelity quantum commu-
nication over the noisy channel.
system B' decouples from Alice’s reference system R_A. Let’s suppose that the qubits
emitted in the Hawking radiation are chosen randomly; that is, A' is a Haar-random
k'-qubit subsystem of the n-qubit system AB, as depicted here:
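[Figure: circuit diagram showing Bob's reference system R_B, the black hole B, Alice's system A with its reference system R_A, and a unitary U acting on AB with outputs A' and B'.]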
The double lines indicate the very large systems B and B', and single lines the smaller
systems A and A'. Because the radiated qubits are random, we can determine whether
R_A B' decouples using the decoupling inequality, which for this case becomes
\[ \mathbb{E}_U \left\| \sigma_{B'R_A}(U) - \sigma^{\max}_{B'} \otimes \sigma_{R_A} \right\|_1 \le \sqrt{ \frac{|ABR_A|}{|A'|^2}\, \mathrm{tr}\, \sigma_{ABR_A}^2 }. \tag{10.355} \]
Because the state of AR_A is pure, and B is maximally entangled with R_B, we have
$\mathrm{tr}\, \sigma_{ABR_A}^2 = 1/|B|$, and therefore the Haar-averaged L1 distance of $\sigma_{B'R_A}$ from a
Thus, if Bob waits for only k' = k + c qubits of Hawking radiation to be emitted after
Alice tosses in her k qubits, Bob can decode her qubits with excellent fidelity F ≥ 1 − 2^{−c}.
Alice made a serious mistake. Rather than waiting for Ω(n) qubits to emerge from
the black hole, Bob can already decode Alice’s secret quite well when he has collected
just a few more than k qubits. And Bob is an excellent physicist, who knows enough
about black hole dynamics to infer the encoding unitary transformation U , information
he uses to find the right decoding map.
We could describe the conclusion, more prosaically, by saying that the random unitary
U applied to AB encodes a good quantum error-correcting code, which achieves
high-fidelity entanglement-assisted transmission of quantum information through an erasure
channel with a high erasure probability. Of the n input qubits, only k' randomly
selected qubits are received by Bob; the rest remain inside the black hole and hence are
inaccessible. The input qubits, then, are erased with probability p = (n − k')/n, while
nearly error-free qubits are recovered from the input qubits at a rate
\[ R = \frac{k}{n} = 1 - p - \frac{k'-k}{n}; \tag{10.357} \]
in the limit n → ∞ with c = k' − k fixed, this rate approaches 1 − p, the entanglement-assisted
quantum capacity of the erasure channel.
So far, we’ve assumed that the emitted system A' is a randomly selected subsystem
of AB. That won’t be true for a real black hole. However, it is believed that the in-
ternal dynamics of actual black holes mixes quantum information quite rapidly (the
fast scrambling conjecture). For a black hole with temperature T , it takes time of order
ħ/kT for each qubit to be emitted in the Hawking radiation, and a time longer by only
a factor of log n for the dynamics to mix the black hole degrees of freedom sufficiently
for our decoupling estimate to hold with reasonable accuracy. For a solar mass black
hole, Alice’s qubits are revealed just a few milliseconds after she deposits them, much
faster than the 10^67 years she had hoped for! Because Bob holds the system R_B which
purifies B, and because he knows the right decoding map to apply to A'R_B, the black
hole behaves like an information mirror — Alice’s qubits bounce right back!
If Alice is more careful, she will dump her qubits into a young black hole instead. If
we assume that the initial black hole B is in a pure state, then $\sigma_{ABR_A}$ is also pure, and
the Haar-averaged L1 distance of $\sigma_{B'R_A}$ from a product state is bounded above by
\[ \sqrt{\frac{|ABR_A|}{|A'|^2}} = \sqrt{\frac{2^{n+k}}{2^{2k'}}} = \frac{1}{2^c} \tag{10.358} \]
after
\[ k' = \frac{1}{2}(n+k) + c \tag{10.359} \]
qubits are emitted. In this case, Bob needs to wait a long time, until more than half of
the qubits in AB are radiated away. Once Bob has acquired k + 2c more qubits than
the number still residing in the black hole, he is empowered to decode Alice’s k qubits
with fidelity F ≥ 1 − 2^{−c}. In fact, there is nothing special about Alice’s subsystem A;
by adjusting his decoding map appropriately, Bob can decode any k qubits he chooses
from among the n qubits in the initial black hole AB.
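The counting in eqs. (10.358) and (10.359) can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names and the sample values of n, k, and c are assumptions chosen for the example, not taken from the text.

```python
import math

def emitted_qubits_needed(n: int, k: int, c: int) -> int:
    """k' from eq. (10.359); requires n + k even so that k' is an integer."""
    if (n + k) % 2 != 0:
        raise ValueError("n + k must be even")
    return (n + k) // 2 + c

def decoupling_bound(n: int, k: int, k_prime: int) -> float:
    """Right-hand side of eq. (10.358): sqrt(2^(n+k) / 2^(2k'))."""
    return math.sqrt(2.0 ** (n + k - 2 * k_prime))

# Illustrative numbers only (hypothetical, not from the text).
n, k, c = 100, 4, 10
k_prime = emitted_qubits_needed(n, k, c)   # qubits Bob must collect
still_inside = n - k_prime                 # qubits remaining in the black hole
bound = decoupling_bound(n, k, k_prime)    # equals 2**(-c)
fidelity_floor = 1.0 - 2.0 ** (-c)         # F >= 1 - 2^(-c)
```

With these values Bob holds k + 2c more qubits than remain inside, which is the threshold quoted above.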
There is far more to learn about quantum information processing by black holes, an
active topic of current research, but we will not delve further into this fascinating topic
here. We can be confident, though, that the tools and concepts of quantum informa-
tion theory discussed in this book will be helpful for addressing the many unresolved
mysteries of quantum gravity, as well as many other open questions in the physical
sciences.
10.10 Summary
Shannon entropy and classical data compression. The Shannon entropy of an
ensemble X = {x, p(x)} is H(X) ≡ ⟨− log p(x)⟩; it quantifies the compressibility of
classical information. A message n letters long, where each letter is drawn independently
from X, can be compressed to H(X) bits per letter (and no further), yet can still be
decoded with arbitrarily good accuracy as n → ∞.
Conditional entropy and information merging. The conditional entropy
H(X|Y ) = H(XY ) − H(Y ) quantifies how much the information source X can be com-
pressed when Y is known. If n letters are drawn from XY , where Alice holds X and Bob
holds Y , Alice can convey X to Bob by sending H(X|Y ) bits per letter, asymptotically
as n → ∞.
Mutual information and classical channel capacity. The mutual information
I(X; Y ) = H(X) + H(Y ) − H(XY ) quantifies how information sources X and Y are
correlated; when we learn the value of y we acquire (on the average) I(X; Y ) bits of
information about x, and vice versa. The capacity of a memoryless noisy classical communication
channel is C = max_X I(X; Y). This is the highest number of bits per letter
that can be transmitted through n uses of the channel, using the best possible code,
with negligible error probability as n → ∞.
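As a concrete illustration of the three classical quantities summarized above, the following sketch computes H(X), H(X|Y), and I(X; Y) from a joint distribution. The helper names and the choice of a binary symmetric channel with a uniform input are assumptions made for the example.

```python
import math
from collections import defaultdict

def entropy(dist):
    """Shannon entropy in bits of a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginals(joint):
    """Marginal distributions of X and Y from a dict {(x, y): probability}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return dict(px), dict(py)

def conditional_entropy(joint):
    """H(X|Y) = H(XY) - H(Y)."""
    _, py = marginals(joint)
    return entropy(joint) - entropy(py)

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(XY)."""
    px, py = marginals(joint)
    return entropy(px) + entropy(py) - entropy(joint)

def bsc_joint(flip):
    """Joint distribution of a uniform input bit and its noisy copy."""
    return {(x, y): 0.5 * (flip if x != y else 1 - flip)
            for x in (0, 1) for y in (0, 1)}

joint = bsc_joint(0.1)
info = mutual_information(joint)      # roughly 0.531 bits per letter
cond = conditional_entropy(joint)     # roughly 0.469 bits per letter
```

For this channel the uniform input maximizes I(X; Y), so `info` is also the capacity C of the binary symmetric channel with flip probability 0.1.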
Von Neumann entropy and quantum data compression. The Von Neumann
entropy of a density operator ρ is
$$ H(\rho) = -\mathrm{tr}\,\rho \log \rho; \tag{10.360} $$
it quantifies the compressibility of an ensemble of pure quantum states. A message
n letters long, where each letter is drawn independently from the ensemble
{|ϕ(x)⟩, p(x)}, can be compressed to H(ρ) qubits per letter (and no further), where
ρ = Σ_x p(x)|ϕ(x)⟩⟨ϕ(x)|, yet can still be decoded with arbitrarily good fidelity as
n → ∞.
Entanglement concentration and dilution. The entanglement E of a bipartite
pure state |ψ⟩_AB is E = H(ρ_A), where ρ_A = tr_B (|ψ⟩⟨ψ|). With local operations and
classical communication, we can prepare n copies of |ψ⟩_AB from nE Bell pairs (but not
from fewer), and we can distill nE Bell pairs (but not more) from n copies of |ψ⟩_AB,
asymptotically as n → ∞.
Accessible information. The Holevo chi of an ensemble E = {ρ(x), p(x)} of quantum
states is
$$ \chi(\mathcal{E}) = H\left( \sum_x p(x) \rho(x) \right) - \sum_x p(x) H(\rho(x)). \tag{10.361} $$
This is the highest number of classical bits per letter that can be transmitted through
n uses of the quantum channel, with negligible error probability as n → ∞, assuming
that each codeword is a product state.
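For qubit ensembles, eq. (10.361) can be evaluated by hand because a qubit density operator ρ = (I + r·σ)/2 has eigenvalues (1 ± |r|)/2. The sketch below uses that Bloch-vector parametrization; the function names and the two sample ensembles are assumptions for illustration.

```python
import math

def h2(q):
    """Binary entropy in bits."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def qubit_entropy(r):
    """Von Neumann entropy (bits) of rho = (I + r.sigma)/2."""
    norm = math.sqrt(sum(c * c for c in r))
    return h2((1 + min(norm, 1.0)) / 2)

def holevo_chi(ensemble):
    """ensemble: list of (probability, bloch_vector) pairs."""
    avg = [sum(p * r[i] for p, r in ensemble) for i in range(3)]
    return qubit_entropy(avg) - sum(p * qubit_entropy(r) for p, r in ensemble)

# Orthogonal pure states |0>, |1>: chi is one full bit.
chi_orthogonal = holevo_chi([(0.5, (0, 0, 1)), (0.5, (0, 0, -1))])
# Non-orthogonal pure states |0>, |+>: chi is about 0.60 bits.
chi_overlapping = holevo_chi([(0.5, (0, 0, 1)), (0.5, (1, 0, 0))])
```

Since both ensembles consist of pure states, the second term of chi vanishes and chi reduces to the entropy of the average state.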
Decoupling and quantum communication. In a tripartite pure state φ_{RBE}, we
say that systems R and E decouple if the marginal density operator of RE is a product
state, in which case R is purified by a subsystem of B. A quantum state transmitted
through a noisy quantum channel N^{A→B} (with isometric dilation U^{A→BE}) can be
accurately decoded if a reference system R which purifies the channel’s input A nearly decouples
from the channel’s environment E.
Father and mother protocols. The father and mother resource inequalities specify
achievable rates for entanglement-assisted quantum communication and quantum state
transfer, respectively. Both follow from the decoupling inequality, which establishes a
sufficient condition for approximate decoupling in a tripartite mixed state. By com-
bining the father and mother protocols with superdense coding and teleportation, we
can derive achievable rates for other protocols, including entanglement-assisted classical
communication, quantum communication, entanglement distillation, and quantum state
merging.
Homage to Ben Schumacher:
Ben.
He rocks.
I remember
When
He showed me how to fit
A qubit
In a small box.
Or does it feel
At all, and if it does
Would I squeal
Or be just as I was?
If not undone
I’d become as I’d begun
And write a memorandum
On being random.
Had it felt like a belt
Of rum?
I’d crawl
To Ben again.
And call,
Put down your pen!
Don’t stall!
Make me small!
Christandl and Winter [18], and its monogamy discussed by Koashi and Winter [19].
Brandão, Christandl, and Yard [20] showed that squashed entanglement is positive for
any nonseparable bipartite state. Doherty, Parrilo, and Spedalieri [21] showed that every
nonseparable bipartite state fails to be k-extendable for some finite k.
The Holevo bound was derived in [22]. Peres-Wootters coding was discussed in [23].
The product-state capacity formula was derived by Holevo [24] and by Schumacher
and Westmoreland [25]. Hastings [26] showed that Holevo chi can be superadditive.
Horodecki, Shor, and Ruskai [27] introduced entanglement-breaking channels, and ad-
ditivity of Holevo chi for these channels was shown by Shor [28].
Necessary and sufficient conditions for quantum error correction were formulated in
terms of the decoupling principle by Schumacher and Nielsen [29]; that (regularized)
coherent information is an upper bound on quantum capacity was shown by Schumacher
[30], Schumacher and Nielsen [29], and Barnum et al. [31]. That coherent information
is an achievable rate for quantum communication was conjectured by Lloyd [32] and by
Schumacher [30], then proven by Shor [33] and by Devetak [34]. Devetak and Winter
[35] showed it is also an achievable rate for entanglement distillation. The quantum Fano
inequality was derived by Schumacher [30].
Approximate decoupling was analyzed by Schumacher and Westmoreland [36], and
used to prove capacity theorems by Devetak [34], by Horodecki et al. [37], by Hayden
et al. [38], and by Abeyesinghe et al. [39]. The entropy of Haar-random subsystems had
been discussed earlier, by Lubkin [40], Lloyd and Pagels [41], and Page [42]. Devetak,
Harrow, and Winter [43, 44] introduced the mother and father protocols and their
descendants. Devetak and Shor [45] introduced degradable quantum channels and proved
that coherent information is additive for these channels. Bennett et al. [46, 47] found
the single-letter formula for entanglement-assisted classical capacity. Superadditivity of
coherent information was discovered by Shor and Smolin [48] and by DiVincenzo et
al. [49]. Smith and Yard [50] found extreme examples of superadditivity, in which two
zero-capacity channels have nonzero capacity when used jointly. The achievable rate for
state merging was derived by Horodecki et al. [37], and used by them to prove strong
subadditivity of Von Neumann entropy.
Decoupling was applied to Landauer’s principle by Renner et al. [51], and to black
holes by Hayden and Preskill [52]. The fast scrambling conjecture was proposed by
Sekino and Susskind [53].
Exercises
10.1 Positivity of quantum relative entropy
a) Show that ln x ≤ x − 1 for all positive real x, with equality iff x = 1.
b) The (classical) relative entropy of a probability distribution {p(x)} relative to
{q(x)} is defined as
$$ D(p \| q) \equiv \sum_x p(x) \left( \log p(x) - \log q(x) \right). \tag{10.363} $$
Show that
$$ D(p \| q) \ge 0, \tag{10.364} $$
with equality iff the probability distributions are identical. Hint: Apply the
inequality from (a) to ln (q(x)/p(x)).
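A quick numerical sanity check of eq. (10.364), which is no substitute for the proof the exercise asks for, might look like the sketch below. The function name and the sample distributions are assumptions for illustration.

```python
import math

def relative_entropy(p, q):
    """D(p || q) in nats for two distributions given as equal-length lists.

    Returns infinity if q(x) = 0 somewhere that p(x) > 0.
    """
    total = 0.0
    for px, qx in zip(p, q):
        if px > 0:
            if qx == 0:
                return math.inf
            total += px * (math.log(px) - math.log(qx))
    return total

p = [0.5, 0.5]
q = [0.9, 0.1]
d_pq = relative_entropy(p, q)   # strictly positive, since p != q
d_pp = relative_entropy(p, p)   # exactly zero
```

Note that D(p‖q) and D(q‖p) generally differ, so the relative entropy is not a symmetric distance.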
Let {p_i} denote the eigenvalues of ρ and {q_a} denote the eigenvalues of σ.
Show that
$$ D(\rho \| \sigma) = \sum_i p_i \left( \log p_i - \sum_a D_{ia} \log q_a \right), \tag{10.366} $$
where D_{ia} is a doubly stochastic matrix. Express D_{ia} in terms of the
eigenstates of ρ and σ. (A matrix is doubly stochastic if its entries are
nonnegative real numbers, where each row and each column sums to one.)
d) Show that if D_{ia} is doubly stochastic, then (for each i)
$$ \log \left( \sum_a D_{ia} q_a \right) \ge \sum_a D_{ia} \log q_a, \tag{10.367} $$
Hint: Consider
$$ \rho_{AB} = \sum_x p_x (\rho_x)_A \otimes (|x\rangle\langle x|)_B, \tag{10.371} $$
in terms of the Bell basis of maximally entangled states {|φ^±⟩, |ψ^±⟩}, and
compute H(ρ).
b) For the three vectors |Φ_a⟩, a = 1, 2, 3, construct the “pretty good measurement”
defined in eq.(10.227). (Again, expand the |Φ_a⟩’s in the Bell basis.) In this
case, the PGM is an orthogonal measurement. Express the elements of the
PGM basis in terms of the Bell basis.
c) Compute the mutual information of the PGM outcome and the preparation.
10.5 Separability and majorization
The hallmark of entanglement is that in an entangled state the whole is less
random than its parts. But in a separable state the correlations are essentially
classical and so are expected to adhere to the classical principle that the parts
are less disordered than the whole. The objective of this problem is to make this
expectation precise by showing that if the bipartite (mixed) state ρ_AB is separable,
then
$$ \lambda(\rho_{AB}) \prec \lambda(\rho_A), \qquad \lambda(\rho_{AB}) \prec \lambda(\rho_B). \tag{10.378} $$
Here λ(ρ) denotes the vector of eigenvalues of ρ, and ≺ denotes majorization.
A separable state can be realized as an ensemble of pure product states, so that
if ρ_AB is separable, it may be expressed as
$$ \rho_{AB} = \sum_a p_a \, |\psi_a\rangle\langle\psi_a| \otimes |\varphi_a\rangle\langle\varphi_a|. \tag{10.379} $$
where {|e_j⟩} denotes an orthonormal basis for AB; then by the HJW theorem,
there is a unitary matrix V such that
$$ \sqrt{r_j}\, |e_j\rangle = \sum_a V_{ja} \sqrt{p_a}\, |\psi_a\rangle \otimes |\varphi_a\rangle. \tag{10.381} $$
here {|f_μ⟩} denotes an orthonormal basis for A, and by the HJW theorem, there
is a unitary matrix U such that
$$ \sqrt{p_a}\, |\psi_a\rangle = \sum_\mu U_{a\mu} \sqrt{s_\mu}\, |f_\mu\rangle. \tag{10.383} $$
That is, you must check that the entries of D_{jμ} are real and nonnegative, and
that Σ_j D_{jμ} = 1 = Σ_μ D_{jμ}. Thus we conclude that λ(ρ_AB) ≺ λ(ρ_A). Just by
interchanging A and B, the same argument also shows that λ(ρ_AB) ≺ λ(ρ_B).
Remark: Note that it follows from the Schur concavity of Shannon entropy that,
if ρAB is separable, then the von Neumann entropy has the properties H(AB) ≥
H(A) and H(AB) ≥ H(B). Thus, for separable states, conditional entropy is non-
negative: H(A|B) = H(AB) − H(B) ≥ 0 and H(B|A) = H(AB) − H(A) ≥ 0. In
contrast, if H(A|B) is negative, then according to the hashing inequality the state
of AB has positive distillable entanglement −H(A|B), and therefore is surely not
separable.
10.6 Additivity of squashed entanglement
Suppose that Alice holds systems A, A′ and Bob holds systems B, B′. How is the
entanglement of AA′ with BB′ related to the entanglement of A with B and A′ with
B′? In this problem we will show that the squashed entanglement is superadditive,
a) Use the chain rule for mutual information eq.(10.196) and eq.(10.197) and the
nonnegativity of quantum conditional mutual information to show that
b) Show that for any extension ρ_{ABC} ⊗ ρ_{A′B′C′} of the product state ρ_{AB} ⊗ ρ_{A′B′},
we have
$$ I(AA'; BB' | CC') \le I(A; B | C) + I(A'; B' | C'). \tag{10.388} $$
Conclude that
$$ E_{\mathrm{sq}}(\rho_{AB} \otimes \rho_{A'B'}) \le E_{\mathrm{sq}}(\rho_{AB}) + E_{\mathrm{sq}}(\rho_{A'B'}), \tag{10.389} $$
which, when combined with eq.(10.385), implies eq.(10.386).
10.7 The first law of Von Neumann entropy
Writing the density operator in terms of its modular Hamiltonian K as in §10.2.6,
$$ \rho = \frac{e^{-K}}{\mathrm{tr}\,(e^{-K})}, \tag{10.390} $$
consider how the entropy S(ρ) = −tr (ρ ln ρ) changes when the density operator is
perturbed slightly:
$$ \rho \to \rho' = \rho + \delta\rho. \tag{10.391} $$
Since ρ and ρ′ are both normalized density operators, we have tr (δρ) = 0. Show
that
$$ S(\rho') - S(\rho) = \mathrm{tr}\,(\rho' K) - \mathrm{tr}\,(\rho K) + O\left( (\delta\rho)^2 \right); \tag{10.392} $$
that is,
$$ \delta S = \delta\langle K \rangle \tag{10.393} $$
to first order in the small change in ρ. This statement generalizes the first law
of thermodynamics; for the case of a thermal density operator with K = T^{−1}H
(where H is the Hamiltonian and T is the temperature), it becomes the more
familiar statement
$$ \delta E = \delta\langle H \rangle = T\, \delta S. \tag{10.394} $$
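The first-order statement δS = δ⟨K⟩ can be checked numerically for a single qubit, where everything reduces to Bloch vectors. The sketch below assumes ρ = (I + r·σ)/2 and fixes the normalization K = −ln ρ (so that tr e^{−K} = 1); the function names, the reference state, and the perturbation are all illustrative choices, not part of the exercise.

```python
import math

def spectrum(r):
    """Eigenvalues (1 ± |r|)/2 of rho = (I + r.sigma)/2, plus |r|."""
    norm = math.sqrt(sum(c * c for c in r))
    return (1 + norm) / 2, (1 - norm) / 2, norm

def entropy_nats(r):
    """S(rho) = -tr(rho ln rho) for the qubit with Bloch vector r."""
    lam_plus, lam_minus, _ = spectrum(r)
    return -sum(l * math.log(l) for l in (lam_plus, lam_minus) if l > 0)

def modular_expectation(r_ref, r):
    """tr(rho(r) K) with K = -ln rho(r_ref).

    K = k0*I + k1*(n.sigma) with n = r_ref/|r_ref|, so the
    expectation value in the state rho(r) is k0 + k1 * (n.r).
    """
    lam_plus, lam_minus, norm = spectrum(r_ref)
    k0 = -(math.log(lam_plus) + math.log(lam_minus)) / 2
    k1 = -(math.log(lam_plus) - math.log(lam_minus)) / 2
    n_dot_r = sum(a * b for a, b in zip(r_ref, r)) / norm
    return k0 + k1 * n_dot_r

r = (0.3, 0.1, 0.5)                      # reference state (mixed, 0 < |r| < 1)
dr = (1.0e-4, -2.0e-4, 1.5e-4)           # small, non-parallel perturbation
r_new = tuple(a + b for a, b in zip(r, dr))

delta_S = entropy_nats(r_new) - entropy_nats(r)
delta_K = modular_expectation(r, r_new) - modular_expectation(r, r)
mismatch = abs(delta_S - delta_K)        # second order in |dr|
```

Because the perturbation is not parallel to r, ρ and δρ do not commute here, so the check is not limited to the classical (diagonal) case.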
10.8 Information gain for a quantum state drawn from the uniform ensemble
Suppose Alice prepares a quantum state drawn from the ensemble {ρ(x), p(x)}
and Bob performs a measurement {E(y)} yielding outcome y with probability
p(y|x) = tr (E(y)ρ(x)). As noted in §10.6.1, Bob’s information gain about Alice’s
preparation is the mutual information I(X; Y) = H(X) − H(X|Y). If x is a
continuous variable, while y is discrete, it is more convenient to use the symmetry of
mutual information to write I(X; Y) = H(Y) − H(Y|X), where
$$ H(Y|X) = -\sum_y \int dx \, p(x)\, p(y|x) \log p(y|x); \tag{10.395} $$
here p(x) is a probability density (that is, p(x)dx is the probability for x to lie in
the interval [x, x + dx]).
For example, suppose that Alice prepares an arbitrary pure state |ϕ⟩ chosen
from the uniform ensemble in a d-dimensional Hilbert space, and Bob performs an
orthogonal measurement projecting onto the basis {|e_y⟩}, hoping to learn something
about what Alice prepared. Then Bob obtains outcome y with probability
$$ p(y|\theta) = |\langle e_y | \varphi \rangle|^2 \equiv \cos^2\theta \tag{10.396} $$
where θ is the angle between |ϕ⟩ and |e_y⟩. Because Alice’s ensemble is uniform,
Bob’s outcomes are also uniformly distributed; hence H(Y) = log d. Furthermore,
the measurement outcome y reveals only information about θ; Bob learns nothing
else about |ϕ⟩. Therefore, eq.(10.395) implies that the information gain may be
expressed as
$$ I(X;Y) = \log d + d \int d\theta \, p(\theta) \cos^2\theta \, \log \cos^2\theta. \tag{10.397} $$
The integral is negative because log cos²θ ≤ 0, so that I(X; Y) ≤ log d.
Here p(θ)dθ is the probability for the vector |ϕ⟩ to point in a direction
making an angle between θ and θ + dθ with the axis |e_y⟩, where 0 ≤ θ ≤ π/2.
a) Show that
$$ p(\theta)\, d\theta = -(d-1) \left( 1 - \cos^2\theta \right)^{d-2} d\cos^2\theta. \tag{10.398} $$
Hint: Choose a basis in which the fixed axis |e_y⟩ is
$$ |e_y\rangle = (1, \vec{0}) \tag{10.399} $$
and write
$$ |\varphi\rangle = (e^{i\phi} \cos\theta, \psi^\perp), \tag{10.400} $$
where θ ∈ [0, π/2], and |ψ^⊥⟩ denotes a complex (d−1)-component vector
with length sin θ. Now note that the phase φ resides on a circle of radius cos θ
(and hence circumference 2π cos θ), while |ψ^⊥⟩ lies on a sphere of radius sin θ
(thus the volume of the sphere, up to a multiplicative numerical constant,
is sin^{2d−3} θ).
b) Now evaluate the integral eq. (10.397) to show that the information gain from
the measurement, in nats, is
$$ I(X;Y) = \ln d - \left( \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{d} \right). \tag{10.401} $$
(Information is expressed in nats if logarithms are natural logarithms; I
in nats is related to I in bits by I_bits = I_nats / ln 2.) Hint: To evaluate the
integral
$$ \int_0^1 dx \, (1-x)^p \, x \ln x, \tag{10.402} $$
observe that
$$ x \ln x = \left. \frac{d}{ds} x^s \right|_{s=1}, \tag{10.403} $$
and then calculate ∫_0^1 dx (1−x)^p x^s by integrating by parts repeatedly.
c) Show that in the limit of large d, the information gain, in bits, approaches
$$ I_{d=\infty} = \frac{1-\gamma}{\ln 2} = .60995\ldots, \tag{10.404} $$
where γ = .57721 . . . is Euler’s constant.
Our computed value of H(Y |X) may be interpreted in another way: Suppose
we fix an orthogonal measurement, choose a typical state, and perform the mea-
surement repeatedly on that chosen state. Then the measurement outcomes will
not be uniformly distributed. Instead the entropy of the outcomes will fall short of
maximal by .60995 bits, in the limit of large Hilbert space dimension.
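The closed form of part (b) and the limit of part (c) can be checked numerically. The sketch below evaluates the hinted integral with a simple midpoint rule, taking p = d − 2 and x = cos²θ, and compares d(d − 1) times the integral with minus the harmonic sum 1/2 + ··· + 1/d. The function names, step count, and sample dimensions are assumptions chosen for illustration.

```python
import math

def harmonic_tail(d):
    """1/2 + 1/3 + ... + 1/d."""
    return sum(1.0 / k for k in range(2, d + 1))

def info_gain_nats(d):
    """Information gain ln d - (1/2 + ... + 1/d), in nats."""
    return math.log(d) - harmonic_tail(d)

def hint_integral(p, steps=200_000):
    """Midpoint-rule estimate of the integral of (1-x)^p * x * ln(x) on [0, 1]."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += (1 - x) ** p * x * math.log(x)
    return total * h

d = 5
lhs = d * (d - 1) * hint_integral(d - 2)   # should equal -harmonic_tail(d)
gap = abs(lhs + harmonic_tail(d))

# Large-d behaviour, converted from nats to bits.
bits_large_d = info_gain_nats(100_000) / math.log(2)
```

For d = 2 the formula gives ln 2 − 1/2 nats, about 0.279 bits, already well below the single bit that a measurement in the right basis would reveal.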
Note that H(E|XY) = 0 because e is determined when x and y are known, and
that H(E|Y) ≤ H(E) because mutual information is nonnegative. Therefore,
b) Noting that
$$ F = \langle\psi|\rho|\psi\rangle = 1 - \varepsilon. \tag{10.410} $$
Show that
b) As in §10.7.2, suppose that the noisy channel N^{A→B} acts on the pure state ψ_{RA},
and is followed by the decoding map D^{B→C}. Show that
where
Therefore, if the decoder’s output (the state of RC) is almost pure, then the
coherent information of the channel N comes close to matching its input
entropy. Hint: Use the data processing inequality I_c(R⟩C)_σ ≤ I_c(R⟩B)_ρ
and the subadditivity of von Neumann entropy. It is convenient to consider
the joint pure state of the reference system, the output, and environments
of the dilations of N and D.
c) Suppose that the decoding map recovers the channel input with high fidelity,
Show that
b) Suppose that in the GHZ state Alice measures the Pauli operator X, gets the
outcome +1 and broadcasts her outcome to Bob and Eve. What state do
Bob and Eve then share? What if Alice gets the outcome −1 instead?
c) Suppose that Alice, Bob, and Eve share just one copy of the GHZ state φ_{ABE}.
Find a protocol such that, after one unit of coherent classical communication
from Alice to Bob, the shared state becomes |φ^+⟩_{AB} ⊗ |φ^+⟩_{BE}, where |φ^+⟩ =
(|00⟩ + |11⟩)/√2 is a maximally entangled Bell pair.
d) Now suppose that Alice, Bob, and Eve start out with two copies of the GHZ
state, and suppose that Alice and Bob can borrow an ebit of entanglement,
which will be repaid later, to catalyze the resource conversion. Use coher-
ent superdense coding to construct a protocol that achieves the (catalytic)
conversion eq. (10.417) perfectly.
10.12 Degradability of amplitude damping and erasure
The qubit amplitude damping channel N_{a.d.}^{A→B}(p) discussed in §3.4.3 has the
dilation U^{A→BE} such that
a qubit in its “ground state” |0⟩_A is unaffected by the channel, while a qubit in
the “excited state” |1⟩_A decays to the ground state with probability p, and the
decay process excites the environment. Note that U is invariant under interchange
of systems B and E accompanied by transformation p ↔ (1 − p). Thus the channel
complementary to N_{a.d.}^{A→B}(p) is N_{a.d.}^{A→E}(1 − p).
a) Show that N_{a.d.}^{A→B}(p) is degradable for p ≤ 1/2. Therefore, the quantum
capacity of the amplitude damping channel is its optimized one-shot coherent
information. Hint: It suffices to show that
$$ \mathcal{N}_{\mathrm{a.d.}}^{A \to E}(1-p) = \mathcal{N}_{\mathrm{a.d.}}^{B \to E}(q) \circ \mathcal{N}_{\mathrm{a.d.}}^{A \to B}(p), \tag{10.418} $$
where 0 ≤ q ≤ 1.
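The composition in eq. (10.418) can be spot-checked numerically before attempting the proof. The sketch below assumes the standard Kraus operators for amplitude damping (K0 = diag(1, √(1−p)), K1 = √p |0⟩⟨1|) and the candidate value q = (1 − 2p)/(1 − p), which lies in [0, 1] exactly when p ≤ 1/2. Both are assumptions brought in for illustration, as are the function names and the test state.

```python
import math

def dagger(m):
    """Conjugate transpose of a 2x2 matrix stored as nested lists."""
    return [[m[j][i].conjugate() for j in range(2)] for i in range(2)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def amplitude_damping(rho, p):
    """Apply the amplitude damping channel with decay probability p."""
    k0 = [[1 + 0j, 0j], [0j, complex(math.sqrt(1 - p))]]
    k1 = [[0j, complex(math.sqrt(p))], [0j, 0j]]
    out = [[0j, 0j], [0j, 0j]]
    for k in (k0, k1):
        term = matmul(matmul(k, rho), dagger(k))
        for i in range(2):
            for j in range(2):
                out[i][j] += term[i][j]
    return out

def degrading_parameter(p):
    """Candidate q for eq. (10.418); valid (0 <= q <= 1) only for p <= 1/2."""
    return (1 - 2 * p) / (1 - p)

def max_difference(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))

p = 0.3
q = degrading_parameter(p)
rho = [[0.3 + 0j, 0.2 + 0.1j], [0.2 - 0.1j, 0.7 + 0j]]   # sample input state

complementary = amplitude_damping(rho, 1 - p)              # left side of (10.418)
degraded = amplitude_damping(amplitude_damping(rho, p), q) # right side
error = max_difference(complementary, degraded)
```

The check works because successive dampings multiply survival probabilities: (1 − p)(1 − q) = p is the survival probability of the complementary channel.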
The erasure channel N_{erase}^{A→B}(p) has the dilation U^{A→BE} such that
$$ U : |\psi\rangle_A \mapsto \sqrt{1-p}\, |\psi\rangle_B \otimes |e\rangle_E + \sqrt{p}\, |e\rangle_B \otimes |\psi\rangle_E; \tag{10.419} $$
Alice’s system passes either to Bob (with probability 1−p) or to Eve (with
probability p), while the other party receives the “erasure symbol” |e⟩, which is orthogonal
to Alice’s Hilbert space. Because U is invariant under interchange of systems B
and E accompanied by transformation p ↔ (1 − p), the channel complementary to
N_{erase}^{A→B}(p) is N_{erase}^{A→E}(1 − p).
b) Show that N_{erase}^{A→B}(p) is degradable for p ≤ 1/2. Therefore, the quantum
capacity of the erasure channel is its optimized one-shot coherent
information. Hint: It suffices to show that
$$ \mathcal{N}_{\mathrm{erase}}^{A \to E}(1-p) = \mathcal{N}_{\mathrm{erase}}^{B \to E}(q) \circ \mathcal{N}_{\mathrm{erase}}^{A \to B}(p), \tag{10.420} $$
where 0 ≤ q ≤ 1.
c) Show that for p ≤ 1/2 the quantum capacity of the erasure channel is
$$ Q(\mathcal{N}_{\mathrm{erase}}^{A \to B}(p)) = (1 - 2p) \log_2 d, \tag{10.421} $$
Verify that this matches the standard noiseless superdense coding resource
inequality when φ is a maximally entangled state of AB.
b) By converting the entanglement achieved by the mother protocol into quantum
communication, prove the noisy teleportation resource inequality:
Verify that this matches the standard noiseless teleportation resource in-
equality when φ is a maximally entangled state of AB.
where S_{AA′} is the swap operator, and that the symmetric and antisymmetric
subspaces have dimension ½|A|(|A| + 1) and dimension ½|A|(|A| − 1) respectively.
Even if you are not familiar with group representation theory, you might
regard eq.(10.428) as obvious. We may write M_{AA′}(U) as a sum of two terms, one
symmetric and the other antisymmetric under the interchange of A and A′. The
expectation of the symmetric part must be symmetric, and the expectation value
of the antisymmetric part must be antisymmetric. Furthermore, averaging over the
unitary group ensures that no symmetric state is preferred over any other.
b) To evaluate the constant c_sym, multiply both sides of eq.(10.428) by Π^{(sym)}_{AA′} and
take the trace of both sides, thus finding
$$ c_{\mathrm{sym}} = \frac{|A_1| + |A_2|}{|A| + 1}. \tag{10.430} $$
c) To evaluate the constant c_anti, multiply both sides of eq.(10.428) by Π^{(anti)}_{AA′} and
take the trace of both sides, thus finding
$$ c_{\mathrm{anti}} = \frac{|A_1| - |A_2|}{|A| - 1}. \tag{10.431} $$
d) Using
$$ c_I = \frac{1}{2}\left( c_{\mathrm{sym}} + c_{\mathrm{anti}} \right), \qquad c_S = \frac{1}{2}\left( c_{\mathrm{sym}} - c_{\mathrm{anti}} \right) \tag{10.432} $$
prove eq.(10.341).
References
Last time we discussed the HSP for finitely generated abelian groups. For a black box function f: G -> X that is
constant and distinct on the cosets of H < G, classically it takes
Now a natural question is, what if the group G is not abelian? It is shown in the homework problem (5.4) that the
query complexity is still reasonable. In the algorithm analyzed there, the coset states are not measured individually
as in the algorithm for abelian G. Rather, m coset states are generated, and then a sequence of collective
measurements is performed on the m copies. If R is an upper bound on the number of candidates for the hidden
subgroup H, then the algorithm has success probability
Quantum searching
Ph/CS 219, 11 February 2009
Quantum lower bounds
18 February 2009
Studying quantum systems using a quantum computer
Ph/CS 219, 23 February 2009
Estimating energy eigenvalues and preparing energy eigenstates
Ph/CS 219, 2 March 2009
We have argued that a quantum computer can efficiently simulate the time evolution of a quantum system with a
local Hamiltonian; i.e., it can solve the time-dependent Schroedinger equation. Another thing that physicists and
chemists want to do is solve the time-independent Schroedinger equation; i.e., compute the energy eigenvalues
of a Hamiltonian. For example, chemists say that estimating the ground state energy of a molecule to "chemical
accuracy" (about one part in a million) is valuable for predicting the properties of chemical reactions.
[Actually, it is a subtle question whether the Hamiltonians typically studied by quantum chemists can be regarded
as local. The goal is to determine the quantum state of many electrons with the positively charged nuclei held at
fixed positions. The position of the electron is really a continuous quantum variable, but we can express the
electron's state in terms of a finite set of orbitals --- the issue is whether the electron orbitals couple with one
another only in clusters of constant size as the number of electrons increases.]
In general, finding the energy eigenvalues of a local Hamiltonian seems to be a hard problem classically
How hard is it to simulate a quantum computer?
Clearly, we would like to understand more deeply the classical and quantum complexity of solving the time-
dependent and time-independent Schroedinger equations. In particular, we wish to identify cases for which the
problem seems to be hard classically and easy quantumly, for these are cases where quantum computers will find a
useful niche.
More broadly, why do we believe that quantum computers are more powerful than classical ones, and what is the
source of the quantum computer's power? Roughly speaking, it seems to be hard to simulate a quantum system
with a classical computer because the Hilbert space of the quantum computer is exponentially large in the number of
qubits n, and that exponentially large Hilbert space is needed to accommodate and describe the quantum
correlations among the qubits in a many-body quantum system. From this perspective, it seems legitimate to claim
that quantum entanglement is the source of the quantum computer's power.
2) For mixed states, simulating a quantum computation classically might be hard even if the state remains separable
(that is, unentangled) at all times during the computation --- even if the correlations among the parts of the quantum
computer are "classical" they could still be hard to simulate. You examined an example for HW problem (6.1):
estimating the trace of an exponentially large unitary matrix using just "one clean qubit".
So let's ask the question this way: for quantum computation with pure states, if the qubits in the computer never
become highly entangled during the course of the computation, can we simulate the quantum computer efficiently
with a classical computer? As we'll see, the answer is yes.
Imagine that n qudits (d-dimensional systems) are arranged in a line, and consider cutting the systems into two parts
anywhere along the line: there are m qudits to the left of the cut and n-m qudits to the right of the cut. Suppose that
for any way of choosing where we cut the chain, the entanglement between the two parts is bounded above by a
constant (independent of the system size n). We could quantify the entanglement in various ways, and the
conclusion would be the same, but to keep the discussion simple let us use the Schmidt number. Recall that the
Schmidt number is the number of terms in the Schmidt expansion of a bipartite pure state --- equivalently it is the
rank of the density operator for either part.
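To make the Schmidt number concrete: if the amplitudes of a bipartite pure state are arranged as a matrix with one index for the left part and one for the right part, the Schmidt number is the rank of that matrix. The sketch below computes the rank by Gaussian elimination with a tolerance; a singular value decomposition would be the more robust choice numerically. The function name and the example states are assumptions for illustration.

```python
def schmidt_number(amplitudes, dim_left, dim_right, tol=1e-10):
    """Rank of the dim_left x dim_right coefficient matrix of a pure state.

    amplitudes[i * dim_right + j] is the amplitude of |i>_left |j>_right.
    """
    rows = [[complex(amplitudes[i * dim_right + j]) for j in range(dim_right)]
            for i in range(dim_left)]
    rank = 0
    for col in range(dim_right):
        # Pick the largest remaining entry in this column as the pivot.
        pivot = max(range(rank, dim_left),
                    key=lambda r: abs(rows[r][col]), default=None)
        if pivot is None or abs(rows[pivot][col]) < tol:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, dim_left):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank

s = 2 ** -0.5
product_state = [0.5, 0.5, 0.5, 0.5]        # |+>|+>
bell_state = [s, 0, 0, s]                   # (|00> + |11>)/sqrt(2)
ghz_state = [s, 0, 0, 0, 0, 0, 0, s]        # (|000> + |111>)/sqrt(2)

rank_product = schmidt_number(product_state, 2, 2)   # 1: no entanglement
rank_bell = schmidt_number(bell_state, 2, 2)         # 2
rank_ghz_cut = schmidt_number(ghz_state, 2, 4)       # 2, cutting 1 qubit | 2 qubits
```

For a chain of n qudits, the bounded-entanglement condition in the text says this rank stays below a constant for every position of the cut.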
Ph/CS 219, 4 March 2009
"Quantum Merlin-Arthur and the local Hamiltonian problem"
As we have discussed, we expect that quantum computers can compute the ground state energy of local
Hamiltonians in cases where the problem is hard classically; this may be an important application for quantum
computers. On the other hand, we believe that in some cases computing the ground state energy of a local
Hamiltonian is still hard even for quantum computers. Let us try to understand more deeply the reason for this
belief.
Physicists are often interested in translation-invariant geometrically local Hamiltonians, in which all qubits
interact in the same way with their neighbors (except for the qubits at the boundary of the sample), because such
Hamiltonians provide good models of some real materials. But Hamiltonians that are not translationally invariant
are also useful in physics (for example, when modeling a material with "disorder" due to dirt and other imperfections
in the sample). If the Hamiltonian is not translation invariant, then we can formulate an instance of an n-qubit local
Hamiltonian problem by specifying how the Hamiltonian varies from site to site in the system. Physicists sometimes
refer to such (not translationally-invariant) systems as "spin glasses".
Even in the classical case, where the variable at each site is a bit rather than a qubit, finding the ground
state energy of a spin glass to constant accuracy can be an NP-hard problem. Therefore, we don't expect classical
and quantum computers to be able to solve the problem in general (unless NP is contained in BQP, which seems
unlikely).
Let's first understand why the classical spin-glass problem can be NP-complete. Then we'll discuss the
hardness of the quantum problem. We'll see that in the quantum case finding the ground state energy of a local
Hamiltonian is QMA-complete. (Recall that QMA is the quantum analogue of NP: the class of problems such that
the solution can be verified efficiently with a quantum computer if a suitable "quantum witness" is provided.)
For the classical case, we'll recall the notion of a "reduction" of one computational problem to another (B
reduces to A if a machine that solves A can be used to solve B), and then we'll consider this sequence of
reductions:
1) Any problem in NP reduces to CIRCUIT-SAT (already discussed previously); i.e., CIRCUIT-SAT is NP-complete.
2) CIRCUIT-SAT reduces to 3-SAT (3-SAT is NP-complete).
3) 3-SAT can be formulated as the problem of finding the ground state energy of a classical 3-local Hamiltonian to
constant accuracy.
4) MAX 2-SAT is also NP-complete and can be formulated as the problem of finding the ground state energy of a
classical 2-local Hamiltonian to constant accuracy.
5) The classical 2-local Hamiltonian problem is still hard in the case where the Hamiltonian is geometrically local, in
three or even in two dimensions (cases of interest for describing real spin glasses).
(5) implies that a spin glass will not be able to relax to its ground state efficiently in any realistic physical process
(which is part of what physicists mean by the word "glass").
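Step (3) of the list above can be made concrete with a small sketch: assign energy 1 to each violated clause, so that the total energy is a sum of 3-local terms and the ground state energy is 0 exactly when the formula is satisfiable. The clause encoding (signed 1-based integers, as in the DIMACS convention), the function names, and the brute-force minimization are assumptions chosen for illustration; brute force takes time 2^n and is only usable for tiny instances.

```python
from itertools import product

def clause_energy(clause, bits):
    """Energy 1 if the clause is violated, else 0.

    clause: tuple of nonzero ints; +i means x_i, -i means NOT x_i (1-based).
    """
    satisfied = any(bits[abs(lit) - 1] == (1 if lit > 0 else 0)
                    for lit in clause)
    return 0 if satisfied else 1

def energy(clauses, bits):
    """Classical Hamiltonian: number of violated clauses."""
    return sum(clause_energy(c, bits) for c in clauses)

def ground_state_energy(clauses, n):
    """Minimum energy over all 2^n bit strings (exponential time)."""
    return min(energy(clauses, bits) for bits in product((0, 1), repeat=n))

# A satisfiable instance: ground state energy 0.
sat_instance = [(1, 2, -3), (-1, 3, 2), (-2, -3, 1)]
e_sat = ground_state_energy(sat_instance, 3)

# All eight sign patterns on three variables: every assignment violates
# exactly one clause, so the ground state energy is 1.
unsat_instance = [tuple(s * v for s, v in zip(signs, (1, 2, 3)))
                  for signs in product((1, -1), repeat=3)]
e_unsat = ground_state_energy(unsat_instance, 3)
```

Deciding satisfiability therefore only requires distinguishing energy 0 from energy at least 1, which is why constant accuracy suffices.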
Language: Recall (as discussed earlier) that if f is a uniform family of Boolean functions with variable input size,
f: {0,1}* -> {0,1}, then the set of input strings accepted by f is called a language:
L = { x in {0,1}* : f(x) =1 } .
NP: We say that a language is in NP if there is a polynomial-size uniform classical circuit family (the verifier V(x,y))
such that:
Reduction: We say that B reduces to A if there is a polynomial-size uniform classical circuit family R mapping x to
R(x) such that B accepts x if and only if A accepts R(x). This means we can hook up R to a machine that solves A
to construct a machine that solves B.
An important problem in NP is CIRCUIT-SAT. The input to the problem is a Boolean circuit C (with an n-
bit input and G = poly(n) gates), and we are to evaluate the Boolean function f(C), where
CIRCUIT-SAT is in NP because we can simulate the circuit C. Given as a witness the value of x that C accepts,
we can verify efficiently that C(x)=1.
Now we come to a further reduction that we did not discuss previously: CIRCUIT-SAT reduces to 3-SAT,
and therefore 3-SAT, too, is an NP-complete problem (the Cook-Levin theorem).
For the SAT problem, the input is a "Boolean formula" with n variables, where each variable is a bit. The formula
is a conjunction of clauses, and the formula is true if and only if every clause is true. In the k-SAT problem, each
clause depends on at most k of the variables, where k is a constant. (In some formulations of k-SAT, each clause
is required to be a disjunction of k "literals" (variables or their negations), but that is not an important requirement,
since any formula, and in particular any k-bit formula, can be expressed in conjunctive normal form.) If f is a
Boolean formula, the SAT function is:
Now we'll show that CIRCUIT-SAT reduces to 3-SAT. For a given circuit C (the input to CIRCUIT-SAT),
how do we construct the corresponding Boolean formula R(C) (the input to 3-SAT)?
Suppose that the gates in the circuit C are chosen from the universal set (AND, OR, NOT), or any other gate set
such that each gate has at most two input bits and one output bit. We introduce a variable for the output of each
gate, and we include in the formula R(C) a clause corresponding to each gate.
The formula R(C) has as variables the input x to the circuit C, and also additional variables corresponding to the
outputs of all gates in the circuit C. R(C) has been constructed so that an assignment that satisfies every clause in
R(C) corresponds to a valid history of the computation of the circuit C acting on input x, where the input is accepted. If
there is an input x that is accepted by the circuit C, then there will be a satisfying assignment for the 3-SAT formula
R(C), and conversely if there is no input that C accepts, then there will be no satisfying assignment for R(C).
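The construction of R(C) can be sketched in code. In the sketch below a circuit is a list of gates, each gate's output gets its own variable, each gate contributes one clause on at most three variables (the gate's inputs and its output), and a final clause demands that the last gate outputs 1. The data layout, the function names, and the brute-force satisfiability check are assumptions for illustration; clauses are represented as arbitrary predicates on at most three bits, matching the formulation of k-SAT used here.

```python
from itertools import product

GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "NOT": lambda a: 1 - a,
}

def circuit_to_clauses(n_inputs, gates):
    """Build R(C). gates[g] = (op, input_wires); its output is wire n_inputs + g.

    Returns a list of (wires, predicate) clauses, each on at most 3 wires.
    """
    clauses = []
    for g, (op, ins) in enumerate(gates):
        out = n_inputs + g
        fn = GATES[op]
        # Clause: the output variable equals the gate applied to its inputs.
        clauses.append((tuple(ins) + (out,),
                        lambda *v, fn=fn: v[-1] == fn(*v[:-1])))
    last = n_inputs + len(gates) - 1
    clauses.append(((last,), lambda z: z == 1))   # the circuit must accept
    return clauses

def satisfiable(n_wires, clauses):
    """Brute-force search for an assignment satisfying every clause."""
    for bits in product((0, 1), repeat=n_wires):
        if all(pred(*(bits[w] for w in wires)) for wires, pred in clauses):
            return True
    return False

def circuit_accepts_some_input(n_inputs, gates):
    """Directly simulate C on every input (exponential time)."""
    for x in product((0, 1), repeat=n_inputs):
        wires = list(x)
        for op, ins in gates:
            wires.append(GATES[op](*(wires[w] for w in ins)))
        if wires[-1] == 1:
            return True
    return False

# C1(x0, x1) = x0 AND (NOT x1): accepts x = (1, 0).
c1 = [("NOT", (1,)), ("AND", (0, 2))]
# C2(x0) = x0 AND (NOT x0): accepts nothing.
c2 = [("NOT", (0,)), ("AND", (0, 1))]

r1_sat = satisfiable(2 + len(c1), circuit_to_clauses(2, c1))
r2_sat = satisfiable(1 + len(c2), circuit_to_clauses(1, c2))
```

A satisfying assignment for R(C) lists the value of every wire, so it is exactly the "valid history" of an accepting computation.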
The key idea we have exploited to reduce CIRCUIT-SAT to 3-SAT is that the witness for 3-SAT is a valid history of
the whole computation C(x) that accepts the input x. We can check the history efficiently because the circuit C has
polynomial size and it is easy to check each of the poly(n) gates in the execution of the circuit. Later on, we will
extend this idea --- that a valid history of the computation is an efficiently checkable witness --- to the quantum
setting.
Ph/CS 219c, 24 January 2011
Last time we discussed the toric code. This is a CSS code, where the qubits are associated with the edges of an
L × L square lattice on a 2D torus (i.e., with periodic boundary conditions). The Z-type generators of the code
stabilizer are weight four, with support on the four edges making up an elementary square ("plaquette") of the
lattice, and the X-type generators are weight four with support on the four edges that meet at a site. There are two
encoded qubits, and the logical Pauli operators Z_1, Z_2 are weight L with support on a cycle winding around the
torus in the vertical and horizontal direction respectively. The logical Pauli operators X_1, X_2 are weight L with
support on a cycle of the dual lattice winding around the torus in the horizontal and vertical direction respectively.
The code has length n=2L^2, distance L, and k=2 encoded qubits.
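The stabilizer structure just described can be checked on a small lattice. The sketch below represents each generator as a bitmask over the 2L^2 edges, verifies that every site (X-type) generator overlaps every plaquette (Z-type) generator on an even number of edges, and counts encoded qubits as n minus the GF(2) ranks of the two generator sets. The edge-indexing convention and all function names are assumptions made for this illustration, and the code distance is not verified here.

```python
def h_edge(x, y, L):
    """Index of the horizontal edge leaving site (x, y) to the right."""
    return (y % L) * L + (x % L)

def v_edge(x, y, L):
    """Index of the vertical edge leaving site (x, y) upward."""
    return L * L + (y % L) * L + (x % L)

def as_mask(edges):
    mask = 0
    for e in edges:
        mask ^= 1 << e
    return mask

def star(x, y, L):
    """X-type generator: the four edges meeting at site (x, y)."""
    return as_mask([h_edge(x, y, L), h_edge(x - 1, y, L),
                    v_edge(x, y, L), v_edge(x, y - 1, L)])

def plaquette(x, y, L):
    """Z-type generator: the four edges around the square with corner (x, y)."""
    return as_mask([h_edge(x, y, L), h_edge(x, y + 1, L),
                    v_edge(x, y, L), v_edge(x + 1, y, L)])

def weight(mask):
    return bin(mask).count("1")

def gf2_rank(rows):
    """Rank over GF(2) of a list of bitmask rows."""
    rows, rank = list(rows), 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot          # lowest set bit of the pivot
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank

L = 4                                     # requires L >= 2
sites = [(x, y) for x in range(L) for y in range(L)]
stars = [star(x, y, L) for x, y in sites]
plaquettes = [plaquette(x, y, L) for x, y in sites]

n = 2 * L * L
commute = all(weight(s & p) % 2 == 0 for s in stars for p in plaquettes)
k = n - gf2_rank(stars) - gf2_rank(plaquettes)

# A Z-type loop winding once around the torus: weight L, commutes with all
# site generators, but is not a product of plaquette generators.
z_loop = as_mask(h_edge(x, 0, L) for x in range(L))
loop_commutes = all(weight(s & z_loop) % 2 == 0 for s in stars)
loop_is_logical = gf2_rank(plaquettes + [z_loop]) == gf2_rank(plaquettes) + 1
```

Each generator set has rank L^2 − 1 rather than L^2 because the product of all generators of one type is the identity, which is what leaves k = 2.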
The code is highly degenerate. A Z-type operator can be viewed as a 1-chain on the lattice, and it commutes with
the stabilizer if the 1-chain is a cycle (has a trivial boundary). The operator lies in the stabilizer if the cycle is
homologically trivial (is the boundary of a 2-chain); otherwise it is a nontrivial operation acting on the code space.
Similarly, an X-type operator can be viewed as a 1-chain on the dual lattice; it commutes with the stabilizer if the 1-
chain is a cycle, and is contained in the stabilizer if the cycle is homologically trivial.
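These statements can be checked on a small lattice. The following is a sketch under an assumed edge-labeling convention (h and v are hypothetical helpers for the horizontal and vertical edge leaving a site), with each operator represented only by the set of edges it acts on, stored as a bitmask.

```python
def gf2_rank(rows):
    """Rank over GF(2) of a list of integers read as bit vectors."""
    rows, rank = list(rows), 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot  # lowest set bit of the pivot row
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank

def h(x, y, L):  # horizontal edge leaving site (x, y), periodic in x and y
    return 1 << (2 * ((y % L) * L + (x % L)))

def v(x, y, L):  # vertical edge leaving site (x, y), periodic in x and y
    return 1 << (2 * ((y % L) * L + (x % L)) + 1)

def toric_code(L):
    """Return (n, stars, plaquettes); stars are X-type, plaquettes are Z-type."""
    sites = [(x, y) for x in range(L) for y in range(L)]
    stars = [h(x, y, L) | h(x - 1, y, L) | v(x, y, L) | v(x, y - 1, L)
             for x, y in sites]
    plaqs = [h(x, y, L) | h(x, y + 1, L) | v(x, y, L) | v(x + 1, y, L)
             for x, y in sites]
    return 2 * L * L, stars, plaqs

def overlap_parity(a, b):
    """An X-type and a Z-type operator commute iff they share an even number of edges."""
    return bin(a & b).count("1") % 2

L = 4
n, stars, plaqs = toric_code(L)
commute = all(overlap_parity(s, p) == 0 for s in stars for p in plaqs)
k = n - gf2_rank(stars) - gf2_rank(plaqs)

# A Z-type 1-chain winding once around the torus in the vertical direction.
loop = 0
for y in range(L):
    loop |= v(0, y, L)
loop_is_cycle = all(overlap_parity(s, loop) == 0 for s in stars)
loop_in_stabilizer = gf2_rank(plaqs + [loop]) == gf2_rank(plaqs)
```

The winding loop commutes with every site operator (it is a cycle) but is not a product of plaquette operators (it is homologically nontrivial), so it acts as a logical operator.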
Now consider recovery from Z-type errors (recovery from X-type errors can be described similarly, with lattice and
dual lattice interchanged). Suppose errors occur on a chain E.
As with the concatenated codes discussed last time, we find that the distance L of the toric code is too pessimistic an
indicator of the code's performance. It is possible for L/2 errors to cause a recovery failure if the locations of the
errors are strategically chosen, but such error patterns are highly atypical. For randomly distributed errors, recovery
succeeds with high probability even if the errors occur at a constant rate up to 2.7% (so that the total number of
errors in the code block is O(L^2), far greater than the code distance). Numerical simulations show that recovery
succeeds for errors occurring at an even higher rate, up to about 10.6%.
Fault-tolerant error recovery
Up until now we have discussed protecting quantum information using quantum error-correcting codes, where we
have assumed that the recovery procedure can be performed perfectly. Next we want to consider how to use
quantum error-correcting codes to protect a quantum computer from noise. There are two key issues that we need
to address:
1) Aside from just protecting quantum information, we will need to process it. So we must figure out how to perform
nontrivial quantum gates without leaving the code space, and so without losing the protection afforded by the code.
2) We will need to do the error recovery and the information processing using the imperfect gates that are
achievable in a realistic quantum computer.
We will discuss issue (2) first, and postpone issue (1) for later. Specifically, if we use a stabilizer code to protect
quantum information, we will need to measure the stabilizer generators (check operators) of the code. In principle
this can be done by executing a quantum circuit (including qubit measurements and preparations). But the gates
and measurements themselves will be prone to error (including the "identity gate" -- even qubits that are not being
processed may suffer "storage errors"). Will the error recovery work if the recovery procedure itself is noisy?
For now, let's not worry about processing the encoded information; we'll just try to use a QECC to operate a reliable
quantum memory. Furthermore, for now, let's suppose that we have an ideal encoder that we can use once to store
quantum information in the memory, and a decoder that can perform an idealized error-correction and decoding
step once when we are ready to retrieve a quantum state from the memory. But in between we are to protect the
information in the memory by repeatedly performing cycles of error recovery using noisy gates. If we store the
information for a total of T error correction cycles, with what fidelity will we be able to retrieve the quantum
information from the memory at the end?
How should we describe the noise in the gates? Though we could consider more general noise models, let's use this
simple model: each gate (or preparation or measurement) in a quantum circuit is either ideal with probability
1-epsilon, or faulty with probability epsilon. If the gate is faulty, it is replaced by an arbitrary TPCP map. With this
model we can hope to successfully use a QECC that corrects t errors. The rough idea is that if the fault rate epsilon
is small, the number of errors occurring during a cycle of error correction will only rarely exceed the number that the
code can protect against.
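To get a feeling for "only rarely", here is a rough sketch that assumes the faults at different circuit locations are independent, so that the number of faults is binomially distributed. The function name and the sample numbers are illustrative, not taken from the lecture.

```python
from math import comb

def prob_too_many_faults(num_locations, eps, t):
    """Probability that more than t of num_locations independent locations are faulty."""
    p_ok = sum(comb(num_locations, j) * eps**j * (1 - eps)**(num_locations - j)
               for j in range(t + 1))
    return 1.0 - p_ok

# Example: a cycle with 100 locations, fault rate 1e-3, code correcting t = 1 error.
p_fail = prob_too_many_faults(100, 1e-3, 1)
```

For small epsilon the result is bounded above by (N choose t+1) epsilon^(t+1), so with t=1 the failure probability per cycle scales like epsilon^2 rather than epsilon.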
Let's try to be more precise. What properties should an error correction "gadget" have to protect a quantum state
successfully? Two fundamental properties are needed, which I will call "Property 0" and "Property 1". Before
explaining the properties, we introduce some terminology.
We will say that a quantum circuit (possibly including state preparation steps and measurement steps) is "r-good" if
the circuit contains no more than r faults.
We will say that the quantum state of a code block is "s-deviated" from the code space if the state can be obtained
by acting on a codeword (a state in the code space) with an error whose weight is at most s. That is, the "error" can
be expanded in terms of Pauli operators of weight up to s.
Now, how can we build an error correction gadget that obeys Properties 0 and 1? To be concrete, consider
the t=1 case -- a code that corrects one error. In particular, consider Property 1b: we are to ensure that,
if the incoming state is a codeword, then a single fault in the EC will cause only a single (hence correctable) error
in the code block.
If we are not careful, this will not be true. A single faulty gate in the EC might have two effects.
-- an error in the code block.
-- a nontrivial and incorrect syndrome.
Then, guided by the incorrect syndrome, we will flip the wrong qubit in a misguided effort to correct the error; this
will introduce a second error in the block, overwhelming the code's error-correction capability.
A general approach to avoid being misled by incorrect syndrome information is to measure the syndrome
multiple times. For the t=1 case, it suffices to measure the complete error syndrome twice.
-- if the syndrome is trivial (indicates no error) the first time, then we need not repeat the syndrome
measurement, nor do we take any action to recover from error.
-- if the syndrome is nontrivial, we repeat the syndrome measurement. If we obtain the same error syndrome
twice in a row, then we trust the syndrome and apply the indicated recovery step. But if we obtain a different
syndrome when we measure a second time, then we do not trust the syndrome and we do not attempt to
recover.
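The decision rule for the t=1 case can be written down directly. In this sketch measure_syndrome is a hypothetical callable that runs one complete (possibly faulty) syndrome measurement and returns the syndrome bits; the three-bit syndrome is only a placeholder.

```python
def ec_round(measure_syndrome, trivial=(0, 0, 0)):
    """Return the syndrome to act on, or None if no recovery should be applied."""
    first = measure_syndrome()
    if first == trivial:
        return None        # trivial syndrome: do not repeat, do not recover
    second = measure_syndrome()
    if second == first:
        return first       # same syndrome twice in a row: trust it and recover
    return None            # syndromes disagree: do not attempt recovery
```

The caller applies the recovery operation indicated by the returned syndrome, and does nothing when the result is None.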
Let's check that this procedure really has the required properties, starting with property 1.
First, suppose that the full EC (including the repetition of the syndrome measurement) is 0-good (no faults). Then
if the incoming block is 0-deviated, the syndrome will be trivial, and if it is 1-deviated, the block will be projected
onto a state with a definite Pauli error that can be correctly inferred from the syndrome. The second syndrome
measurement (also with no faults) is guaranteed to give the same result, and recovery succeeds.
Next suppose that the EC is 1-good, and that the incoming state is a codeword. Then if the first syndrome
measurement has no faults, the trivial syndrome is measured correctly, and no action is taken to recover. If the
first syndrome measurement has a fault, then the syndrome measurement is repeated and the second syndrome
measurement has no fault. If the syndrome measurement procedure has the property that a single fault
introduces only a single error in the block, then the second syndrome measurement will identify this error
correctly, and it will be corrected successfully.
Now let's check property 0, in the case where the incoming state is arbitrary. If the EC has no faults, we will
obtain a valid syndrome, and we can correct the state to the codespace successfully. But what if the EC has one
fault? Again, if the first syndrome measurement has a fault and yields a nontrivial syndrome, then the second one
has no fault and yields a valid syndrome. Therefore we can correct to the codespace successfully.
It may be that the first syndrome measurement has a fault, and yields a trivial syndrome that is incorrect. In that
case the syndrome measurement will not be repeated. We need to be sure to design our syndrome measurement
procedure so that in this case the actual state of the block is only 1-deviated.
But suppose the first syndrome measurement has no fault, and yields a (correct) nontrivial syndrome. Then the
second syndrome measurement might have a fault. We need to be sure to design our syndrome measurement
procedure so that, although a fault in the second syndrome measurement might cause both an error in the block
and an invalid syndrome, that nevertheless after recovery the state of the block is only 1-deviated.
====================================
Now, note that we assumed above, in the case where the incoming state is a codeword, that a single fault in the
syndrome measurement results in just one error in the block. But how do we make sure this is true? If we are not
careful, even gates in the EC circuit that do not have faults might propagate errors from one qubit to another.
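Here is a small sketch of how an ideal CNOT propagates Pauli errors: X on the control is copied forward to the target, and Z on the target is copied back to the control. The example, in which a single ancilla qubit is the control of four CNOT gates onto data qubits, is a hypothetical layout chosen to show the danger; signs and phases are ignored.

```python
def apply_cnot(error, control, target):
    """Propagate a Pauli error through an ideal CNOT.

    error maps qubit index -> (x, z) bits: (1, 0) is X, (0, 1) is Z, (1, 1) is Y.
    """
    out = dict(error)
    xc, zc = out.get(control, (0, 0))
    xt, zt = out.get(target, (0, 0))
    out[control] = (xc, zc ^ zt)   # Z on the target copies back to the control
    out[target] = (xt ^ xc, zt)    # X on the control copies forward to the target
    return out

def weight(error):
    """Number of qubits on which the error acts nontrivially."""
    return sum(1 for bits in error.values() if bits != (0, 0))

error = {0: (1, 0)}                # a single X error on the ancilla (qubit 0)
for data_qubit in (1, 2, 3, 4):
    error = apply_cnot(error, control=0, target=data_qubit)
data_weight = weight({q: e for q, e in error.items() if q != 0})
```

One error on the shared ancilla becomes a weight-four error on the data even though every CNOT is ideal, which is why a single ancilla qubit should not interact with more than one qubit of the block.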
But, we are not done yet with designing a fault-tolerant procedure --- we need to worry about the preparation of the
ancilla cat state. Z errors in the ancilla state can cause the syndrome to be incorrect, a problem we can address by
repeating the syndrome measurement. More serious are X errors in the ancilla, because these can propagate to
the code block. Therefore, when we encode the cat state, we don't want to allow a single fault during the encoding
circuit to result in two X errors in the state.
1) prepare ancilla
2) verify ancilla
3) measure syndrome
4) repeat if necessary
5) recover
=================================
Here is another design of a fault-tolerant EC gadget for the [[7,1,3]] code. This time, instead of encoding the
ancilla using the repetition code, the ancilla is encoded using the same [[7,1,3]] code that corrects the data.
This gadget uses a general property of CSS codes which we will derive in the next lecture: an encoded CNOT
can be executed transversally. This means that the circuit for the encoded CNOT is
This procedure has the advantage that it is highly parallelized --- for both the X and Z syndrome, the interaction
between data block and ancilla block occurs in a single time step. A further advantage is that repetition of the
syndrome measurement is not necessary. A single faulty CNOT gate might cause an X error in the data and an X
error in the ancilla, or a Z error in the data and a Z error in the ancilla. But, the error occurs in the same "position"
in both the data block and the ancilla block. For example, if a single fault introduces an error in the first qubit of the
data block and the first qubit of the ancilla block, we can't obtain an incorrect syndrome that instructs us to flip
another qubit in the data block at a position other than the first position. Rather if the syndrome indicates there is
an error, the indicated error will be in the first position, and no weight-two error can occur.
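The position argument can be illustrated with the classical [7,4,3] Hamming code, from which the [[7,1,3]] code is built. This sketch treats the measured ancilla as seven classical bits and assumes the parity-check ordering in which column j is the binary expansion of j; that ordering, and the helper names, are assumptions made for illustration.

```python
# Parity checks of the Hamming code: column j (1-based) is j written in binary.
H = [[(j >> k) & 1 for j in range(1, 8)] for k in range(3)]

def syndrome(bits):
    return tuple(sum(h * b for h, b in zip(row, bits)) % 2 for row in H)

def indicated_position(s):
    """0 means 'no error'; otherwise the 1-based position the syndrome points to."""
    return s[0] + 2 * s[1] + 4 * s[2]

def flip(position):
    return [1 if j == position else 0 for j in range(1, 8)]

# One faulty CNOT at position i may hit the data qubit, the ancilla qubit, or both.
worst = 0
for i in range(1, 8):
    for data_hit in (0, 1):
        for ancilla_hit in (0, 1):
            measured = flip(i) if ancilla_hit else [0] * 7
            pos = indicated_position(syndrome(measured))
            data = flip(i) if data_hit else [0] * 7
            if pos:
                data[pos - 1] ^= 1   # apply the indicated recovery to the data
            worst = max(worst, sum(data))
```

In every case the data block ends up with an error of weight at most one, because the syndrome can only point at the position where the fault occurred.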
Of course, we need to encode and test the ancilla. A single fault in the encoding circuit might cause a high-weight
error, so when we try to prepare the encoded |0> we get the encoded |1> instead (or a state one-deviated from the
encoded |1>). We test the encoded ancilla by using yet another encoded ancilla, and reject the ancilla if it fails the
test.
If there is only one fault in the preparation and verification of the two ancilla states, then if there is a fault in the
preparation of one block (which might have a high-weight error) the other block has no errors. Therefore, if the
ancilla being tested has an encoded error, the error will be detected and the ancilla will be rejected. The procedure
checks only for X errors, not Z errors, but only the X error can propagate to the data when an accepted ancilla
interacts with the data.
-- It is not actually necessary to apply the Z or X recovery operations. An efficient classical computation suffices to
propagate the Z and X errors found in each syndrome measurement forward through the circuit, so we can
interpret the measurement of a logical Z or logical X at the end of the circuit.
-- We can make the syndrome measurement procedure deterministic (i.e. avoid throwing away ancillas that fail the
test) by preparing three ancilla blocks and consuming two of them in two tests of the third block. If both tests
indicate that the ancilla is an encoded |1> rather than an encoded |0>, then we either perform a logical X to correct
the ancilla, or we record that the ancilla is actually a |1> and propagate the logical error through the circuit by an
efficient classical computation.
Fault-tolerant quantum gates
Ph/CS 219
2 February 2011
Last time we considered the requirements for fault-tolerant quantum gates that act nontrivially on the codespace of a
quantum error-correcting code. In the special case of a code that corrects t=1 error, the requirements are:
-- if the gate gadget is ideal (has no faults) and its input is a codeword, then the gadget realizes the encoded
operation U acting on the code space.
-- if the gate gadget is ideal and its input has at most one error (is one-deviated from the codespace), then the
output has at most one error in each output block.
-- if the gate has one fault and its input has no errors, then the output has at most one error in each block (the errors
are correctable).
We considered the Clifford group, the finite subgroup of the m-qubit unitary group generated by the Hadamard gate
H, the phase gate P (rotation by Pi/2 about the z-axis) and the CNOT gate. For a special class of codes, the
generators of the Clifford group can be executed transversally (i.e., bitwise). The logical U can be done by applying a
product of n U (or inverse of U) gates in parallel (where n is the code's length). If we suppose that the number of
encoded qubits is k=1, then:
In particular, Steane's [[7,1,3]] code has all of these properties, so we can do Clifford group computation
transversally for that code. Transversal operations are fault tolerant: they don't propagate errors from one qubit in a
block to another qubit in the same block, and a single faulty gate damages at most one qubit in each block.
Of course, the Clifford group is discrete so that the Clifford generators are not a universal gate set for quantum
computing; in fact Clifford group computation can be simulated efficiently on a classical computer. So we need to
consider how to augment our fault-tolerant Clifford gates with another gate that completes a universal set. But before
we do that, let's generalize the observation that we can do Clifford group computation fault tolerantly. We will show
this is possible for any stabilizer code.
Specifically, we will show (an observation that has useful applications even beyond the study of fault-tolerance):
The following operations suffice for realizing the Clifford group:
-- preparing the Z eigenstate |0>.
-- applying the Pauli operators X, Z to any qubit.
-- measuring weight-1 Pauli operators X, Y and measuring weight-two Pauli operators XX, ZZ, ZX.
This observation is useful in the study of fault tolerance because, for any stabilizer code, any logical Pauli operator
can be realized as a Pauli operator (a tensor product of Pauli matrices) which is fault tolerant (r faults cause at most
r errors in the block). Furthermore, we have seen that Pauli operators can be measured fault tolerantly, e.g. by using
the cat-state method, repeating measurements, and doing majority voting on the observed outcomes. This is true
even for the measurement of the tensor product of two logical Pauli operators in the same code block, since the
weight-two logical Pauli operator is also just a tensor product of Pauli operators acting on qubits in the block.
As we already saw last time, since Clifford gates acting by conjugation take Pauli operators to Pauli operators, it is
quite convenient to describe these gates in the "Heisenberg picture" -- i.e., in terms of their action on operators
rather than states.
And ... if the state preparations and measurements can be done fault tolerantly, so can the CNOT gate, if
we insert an error correction step after each preparation or measurement. Furthermore, we can apply a
CNOT gate from any encoded qubit in the control block to any encoded qubit in the target block, as long as
we can measure the corresponding weight-two logical Pauli operators.
When we think about how to complete the fault-tolerant gate set, it is useful to keep in mind a hierarchical
classification of unitary gates --- the C_r classification.
"One-way" quantum computer (measurement based universal computation with cluster states)
An example is a cluster state, or more generally a graph state. Let's first consider the cluster state
associated with a one-dimensional lattice of qubits, and see how to execute a universal set of single-qubit
gates. Then we'll extend the discussion to a two-dimensional lattice, and see that we can do a CNOT gate
as well, completing a universal gate set.
A cluster state is a stabilizer state, a simultaneous eigenstate with eigenvalue 1 of a set of commuting Pauli
operators. In the one dimensional lattice, there is a stabilizer generator associated with the ith qubit, namely
ZXZ acting on qubits i-1, i, i+1, unless the qubit is at the end of the chain. For the first qubit the stabilizer
generator is XZ acting on qubits 1, 2, and for the nth (last) qubit it is ZX acting on qubits n-1 and n.
These stabilizers are mutually commuting. The generators i and i+1 collide on two qubits, and XZ
commutes with ZX. The generators i and i+2 collide on a single qubit, where each applies Z. Since there
are n qubits and n independent stabilizer generators, there is a unique state satisfying these conditions.
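The commutation check is easy to automate. In this sketch each generator is written as a Pauli string (one letter per qubit), and two strings anticommute exactly when they carry different non-identity letters on an odd number of sites. The function names are illustrative.

```python
def chain_generators(n):
    """Stabilizer generators of the n-qubit 1D cluster state as Pauli strings."""
    gens = []
    for i in range(n):
        ops = ["I"] * n
        ops[i] = "X"
        if i > 0:
            ops[i - 1] = "Z"
        if i < n - 1:
            ops[i + 1] = "Z"
        gens.append("".join(ops))
    return gens

def anticommute(p, q):
    """True if the Pauli strings p and q anticommute."""
    return sum(a != "I" and b != "I" and a != b for a, b in zip(p, q)) % 2 == 1

gens = chain_generators(5)
all_commute = not any(anticommute(p, q) for p in gens for q in gens)
```

For n=5 the generators are XZIII, ZXZII, IZXZI, IIZXZ, IIIZX, and every pair commutes.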
In order to have a single encoded qubit, let's eliminate the first stabilizer generator on the left edge of the
lattice. Now we have a k=1 code. We can choose its encoded Pauli operators (which commute with the
stabilizer and anticommute with one another) to be:
But first let's notice that it is very easy to prepare this 1D cluster state. Recall the action by conjugation of
the controlled-Z gate on Pauli operators.
Suppose that the initial state is a tensor product of n X-eigenstates |+>, and then controlled-Z is applied to each
pair of neighboring qubits. The controlled-Z gates acting on different pairs of qubits are mutually commuting, so all
can be applied in parallel in a single time step.
The controlled-Z gates transform the stabilizer IXI acting on three successive qubits to the stabilizer ZXZ
of the cluster state. We can apply the same construction to any graph: Starting with |+> at each vertex
of the graph, we apply controlled-Z to each pair of qubits connected by an edge. The corresponding
state is called a graph state. The term cluster state is used when the graph is a regular lattice, like the
1D chain, or a 2D square lattice.
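The graph-state construction can be sketched in a few lines using the conjugation rule for controlled-Z (X on one qubit picks up a Z on the other, while Z is unchanged). Generators are stored as pairs of bit lists (x, z); the function names and the sample graphs are illustrative, and the graph is assumed to have no self-loops.

```python
def graph_state_generators(num_vertices, edges):
    """Conjugate the generators X_v of |+>^n by controlled-Z on every edge.

    The generator for vertex v becomes X_v times Z on each neighbor of v.
    """
    gens = []
    for vertex in range(num_vertices):
        x = [0] * num_vertices
        z = [0] * num_vertices
        x[vertex] = 1
        for a, b in edges:
            if a == vertex:
                z[b] ^= 1
            elif b == vertex:
                z[a] ^= 1
        gens.append((x, z))
    return gens

def to_string(x, z):
    """Write an (x, z) bit pair per site as a Pauli letter."""
    return "".join("IXZY"[xi + 2 * zi] for xi, zi in zip(x, z))

chain = [to_string(x, z)
         for x, z in graph_state_generators(4, [(0, 1), (1, 2), (2, 3)])]
square = [to_string(x, z)
          for x, z in graph_state_generators(4, [(0, 1), (1, 2), (2, 3), (3, 0)])]
```

The chain reproduces the generators XZ, ZXZ, ..., ZX of the 1D cluster state, and on the four-vertex ring every generator has the form ZXZ.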
What happens if we measure one of the qubits in the Z basis? If we measure the first qubit, we are just
measuring the logical qubit in the Z basis. If we measure any other qubit, the stabilizer generator ZXZ is
replaced by a new stabilizer generator IZI. This has the effect of breaking the entangled cluster state
into the product of two cluster states, lying to the left and to the right of the measured qubit.
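The splitting can be seen with the standard update rule for measuring a Pauli operator on a stabilizer state. This sketch ignores signs (which depend on the measurement outcome) and drops phases when multiplying Pauli strings; the helper names are illustrative.

```python
BITS = {"I": (0, 0), "X": (1, 0), "Z": (0, 1), "Y": (1, 1)}
LETTER = {bits: letter for letter, bits in BITS.items()}

def anticommute(p, q):
    return sum(a != "I" and b != "I" and a != b for a, b in zip(p, q)) % 2 == 1

def multiply(p, q):
    """Product of two Pauli strings, dropping the overall phase."""
    return "".join(LETTER[(BITS[a][0] ^ BITS[b][0], BITS[a][1] ^ BITS[b][1])]
                   for a, b in zip(p, q))

def measure(gens, pauli):
    """Update the stabilizer generators (signs ignored) after measuring pauli."""
    idx = [i for i, g in enumerate(gens) if anticommute(g, pauli)]
    if not idx:
        return list(gens)      # pauli commutes with every generator: no update
    out = list(gens)
    for i in idx[1:]:          # make the other generators commute with pauli
        out[i] = multiply(gens[i], gens[idx[0]])
    out[idx[0]] = pauli        # the measured operator replaces one generator
    return out

gens = ["XZIII", "ZXZII", "IZXZI", "IIZXZ", "IIIZX"]
after = measure(gens, "IIZII")              # measure Z on the middle qubit
left = multiply(after[1], after[2])         # remove the now-trivial Z factor
right = multiply(after[3], after[2])
```

After the measurement the generators can be chosen as XZ and ZX on qubits 1, 2 and XZ and ZX on qubits 4, 5, i.e. two disconnected two-qubit cluster states.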
The effect of measuring in the X basis is more interesting: Instead of splitting the cluster state in two,
measuring X on a string of qubits glues together the chain to the left of the measured qubits with the
chain on the right, yielding a single cluster state. For the chain with a logical qubit at the left end,
suppose we measure the first qubit in the X basis. This does not measure the logical qubit; rather the
logical qubit of the length-n chain is transformed to the logical qubit on a chain of length n-1, and
furthermore a nontrivial rotation is applied to the logical qubit. Recall that, before the measurement, we
can multiply by a stabilizer element to obtain an equivalent form of the logical Z. Then if the
measurement of X yields outcome (-1)^a:
We can now act on the second qubit with X^a (that is, do nothing if the measurement outcome is +1
and apply X if the measurement outcome is -1). The result is that we have transformed the n-qubit
cluster state to the (n-1)-qubit cluster state, and at the same time have applied a logical Hadamard
transformation, which interchanges the logical X and Z.
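The smallest instance of this step can be simulated directly. The sketch below assumes a two-qubit chain in which the first qubit carries an arbitrary state |psi> and the second starts in |+>; after controlled-Z, the first qubit is projected onto the X eigenstate with eigenvalue (-1)^a. The function names are illustrative.

```python
from math import sqrt

def measure_x_on_first(psi, a):
    """State of qubit 2 after CZ on |psi>|+> and an X measurement of qubit 1
    with outcome (-1)**a. The result is normalized."""
    alpha, beta = psi
    # Amplitudes state[q1][q2] of CZ (|psi> tensor |+>).
    state = [[amp * (-1) ** (q1 * q2) / sqrt(2) for q2 in (0, 1)]
             for q1, amp in enumerate((alpha, beta))]
    out = [(state[0][q2] + (-1) ** a * state[1][q2]) / sqrt(2) for q2 in (0, 1)]
    norm = sqrt(sum(abs(c) ** 2 for c in out))
    return [c / norm for c in out]

def hadamard(psi):
    alpha, beta = psi
    return [(alpha + beta) / sqrt(2), (alpha - beta) / sqrt(2)]

def pauli_x(psi):
    return [psi[1], psi[0]]

psi = [0.6, 0.8j]
out_plus = measure_x_on_first(psi, 0)    # expected to equal H|psi>
out_minus = measure_x_on_first(psi, 1)   # expected to equal X H|psi>
```

Both outcomes occur with probability 1/2, and the output is X^a H |psi>, so applying X^a afterwards leaves exactly H |psi> on the remaining qubit.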
So ... if we measure two successive qubits, both in the X basis, we propagate the logical information two
sites to the right, apart from a Pauli operator which is determined by the measurement outcomes.
We can perform other nontrivial operations on the logical qubit by choosing other measurement bases.
For example, suppose that we measure Y.
For universal single-qubit computation, it would suffice to execute the T gate as well (the square root of
the S gate). Consider what happens when we measure X and the input state is T |psi> rather than
|psi>:
However ... remember we are executing the circuit using measurements only. We are not allowed to apply
Pauli operators to compensate for the Pauli operators resulting from the measurement outcomes. And if
we commute a T through an X the rotation angle flips.
If we want to execute a circuit of H, S, and T gates, up to a known Pauli operator which is determined by
the measurement outcomes, then each time we apply a T gate, we need to know whether the number of
X's applied previously has been even or odd.
In this sense, the execution of the circuit requires adaptive measurements --- each time we do a T gate,
the measurement basis depends on outcomes of previous measurements.
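The statement that commuting T through X flips the rotation angle can be verified numerically: X T X equals T^(-1) up to a phase. The bookkeeping function rotation_sign is a hypothetical helper; which measurement outcomes actually contribute an X byproduct depends on the circuit.

```python
import cmath

X = [[0, 1], [1, 0]]
T = [[1, 0], [0, cmath.exp(1j * cmath.pi / 4)]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

phase = cmath.exp(1j * cmath.pi / 4)
x_t_x = matmul(X, matmul(T, X))
flipped = [[phase * entry for entry in row] for row in dagger(T)]

def rotation_sign(previous_x_outcomes):
    """+1 to implement T, -1 to implement its inverse, given the outcome bits
    of the earlier measurements that each contributed an X byproduct."""
    return -1 if sum(previous_x_outcomes) % 2 else 1
```

Here close(x_t_x, flipped) holds, so an odd number of accumulated X's means the next measurement basis must be chosen to implement the inverse rotation.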
Now we want to see how to complete our universal gate set by adding an entangling two-qubit gate,
namely a CNOT gate, where we expand the cluster state to 2D. We can use two ideas already
discussed: (1) Z measurements eliminate qubits from the cluster state, so by doing such measurements
we can "carve out" a circuit that can be realized as a planar graph. (2) A pair of X measurements on
neighboring sites just propagates a qubit forward through the graph, up to a Pauli operator determined
by measurement outcomes. So a string of X measurements acts like a wire that carries a qubit. This
means it suffices to understand how the entangling gate works for a three-qubit cluster state with two
encoded qubits. And in fact all we have to do is measure X for one of the qubits to realize the gate.
It is a bit more convenient to consider a four-qubit cluster state instead, where we measure X for two of
the qubits. In fact only one of the measurements is needed for the entangling gate; the second
measurement just executes an H gate on the target qubit. But we'll consider this two step procedure
because this way we actually get an encoded CNOT gate, up to a Pauli operator.
So this is a CNOT gate, up to a two-qubit Pauli operator!
If someone is kind enough to provide us with a sufficiently large cluster state, just single-qubit
measurements suffice to do any quantum computation we please. The height of the cluster state we
need scales with the circuit width (number of qubits), and the length scales with the circuit depth
(number of time steps).
Furthermore, the cluster state does not have to be prepared all at once; it is good enough for qubits to
be added to the state just before these are needed to execute gates.
The 1D cluster state is perhaps the simplest example of a phenomenon much studied in
contemporary quantum condensed matter physics: it is a symmetry-protected topological phase (SPT
phase).
First we remark that any stabilizer state or code can be interpreted as the (perhaps degenerate)
ground space of a "commuting Hamiltonian". We take the Hamiltonian to be
We find the lowest eigenstate of H by minimizing each term separately. (If minimizing all terms
simultaneously is possible we say that H is "frustration free".) This enforces S_a = 1 for each a. The
ground space of H is the code. Other eigenstates of H have energy higher by at least 2 min_a
(alpha_a).
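A quick numerical illustration of the spectrum, assuming the Hamiltonian has the form H = -sum_a alpha_a S_a with positive coefficients alpha_a (consistent with the gap quoted above) and independent commuting generators S_a, so that every pattern of eigenvalues S_a = +1 or -1 occurs. The coefficients are arbitrary sample values.

```python
from itertools import product

def spectrum(alphas):
    """Energies of H = -sum_a alpha_a S_a, one per pattern of eigenvalues S_a = +1 or -1."""
    return sorted(-sum(a * s for a, s in zip(alphas, signs))
                  for signs in product((1, -1), repeat=len(alphas)))

alphas = [1.0, 0.5, 2.0]
energies = spectrum(alphas)
ground = energies[0]
gap = energies[1] - energies[0]
```

The ground energy is minus the sum of the alphas, and the cheapest excitation flips the sign of the generator with the smallest coefficient, costing 2 min_a (alpha_a).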
For the 1D cluster state, the Hamiltonian is "geometrically local" --- this means that each term in the
Hamiltonian acts on a set of qubits that are close to one another on the 1D lattice.
Let's consider an open line, where we omit the two stabilizer generators at the left and right ends of
the line. Then the code space is four dimensional --- there are two encoded qubits, one localized
near the left end and one near the right end. We call these "edge states" on the chain.
If the chain is n sites long, then the encoded Pauli operators acting on the left and right edges are
We also note that this Hamiltonian has symmetries: there are operators which commute with the
Hamiltonian, and hence map energy eigenstates (in particular ground states) to eigenstates with
the same eigenvalue. There are two Pauli operators that commute with H:
A and B commute and both square to one. They generate a Z2 X Z2 symmetry. A acts on the odd
sublattice and B acts on the even sublattice.
How do these symmetry operators act on the edge states? By multiplying by elements of the
stabilizer, we see that acting on the code space we have
So, acting on the ground space (but not on general states) A and B both factorize into a product of
operators, one supported on the left end of the chain which acts on the left edge states, and the
other supported on the right end of the chain which acts on the right edge states.
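The factorization can be checked on a short chain. This sketch assumes n=6, takes A to be the product of X over the odd sites and B the product of X over the even sites (sites numbered from 1), and keeps only the bulk generators ZXZ; Pauli strings are multiplied with phases dropped, which is harmless here because no X ever meets a Z on the same site.

```python
BITS = {"I": (0, 0), "X": (1, 0), "Z": (0, 1), "Y": (1, 1)}
LETTER = {bits: letter for letter, bits in BITS.items()}

def anticommute(p, q):
    return sum(a != "I" and b != "I" and a != b for a, b in zip(p, q)) % 2 == 1

def multiply(*paulis):
    """Product of Pauli strings, dropping the overall phase."""
    out = paulis[0]
    for q in paulis[1:]:
        out = "".join(LETTER[(BITS[a][0] ^ BITS[b][0], BITS[a][1] ^ BITS[b][1])]
                      for a, b in zip(out, q))
    return out

# Bulk generators g_2 .. g_5 of the open six-site chain (end generators omitted).
g = {2: "ZXZIII", 3: "IZXZII", 4: "IIZXZI", 5: "IIIZXZ"}
A = "XIXIXI"   # X on odd sites
B = "IXIXIX"   # X on even sites

symmetric = not any(anticommute(s, gen) for s in (A, B) for gen in g.values())
A_on_code = multiply(A, g[3], g[5])   # equivalent to A on the code space
B_on_code = multiply(B, g[2], g[4])   # equivalent to B on the code space
A_left, B_left = "XZIIII", "ZIIIII"
```

On the code space A acts as XZ on sites 1, 2 times Z on site 6, and B acts as Z on site 1 times ZX on sites 5, 6; the left-edge factors A_left and B_left anticommute.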
A single X acting on a site does not commute with the Hamiltonian. When it acts on an odd site (say),
it creates two localized excitations ("quasiparticles") on the neighboring even sites. But a string of X's
acting on successive odd sites (or successive even sites) creates an excitation only at the end of the
chain.
The two symmetry operators A and B are "string" operators stretching from one end of the chain to the other. If we
apply X to one site at a time, starting at the left edge and progressing toward the right edge, we view the string as
the description of a process in which an excitation is created on the left, propagates across the bulk, and
disappears on the right.
Note the difference between X and Z. If we apply Z to any site, an excitation is created at that site,
whether or not we apply Z to other sites as well. But if we apply X's, excitations appear only at the end
of the "string" of X's.
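A short sketch makes the contrast concrete. An excitation sits at site i when the operator anticommutes with the bulk generator ZXZ centered on i. The chain length n=8 and the helper names are illustrative; sites are numbered from 1.

```python
def pauli_on(n, letter, sites):
    """Pauli string of length n with the given letter on the listed 1-based sites."""
    return "".join(letter if i in sites else "I" for i in range(1, n + 1))

def anticommute(p, q):
    return sum(a != "I" and b != "I" and a != b for a, b in zip(p, q)) % 2 == 1

def bulk_generators(n):
    """Map site i -> generator Z_{i-1} X_i Z_{i+1} for i = 2 .. n-1."""
    gens = {}
    for i in range(2, n):
        ops = ["I"] * n
        ops[i - 2], ops[i - 1], ops[i] = "Z", "X", "Z"
        gens[i] = "".join(ops)
    return gens

def excitations(op, n):
    """Sites whose generator anticommutes with op."""
    return [i for i, gen in bulk_generators(n).items() if anticommute(op, gen)]

n = 8
single_x = excitations(pauli_on(n, "X", {3}), n)
string_x = excitations(pauli_on(n, "X", {3, 5}), n)
single_z = excitations(pauli_on(n, "Z", {3}), n)
double_z = excitations(pauli_on(n, "Z", {3, 4}), n)
```

A single X on site 3 excites sites 2 and 4, while X on sites 3 and 5 excites only sites 2 and 6, the ends of the string; each Z excites its own site regardless of the others.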
There are local operators which act on one of the two edges and preserve the ground space (the logical
operators X_L and Z_L for example), but these do not respect the symmetries (they fail to commute
with either A or B). For an operator to preserve the ground space and to act nontrivially on the left edge
(fail to commute with X_L or Z_L), and also to have the symmetry (commute with A and B), the operator
must be a nonlocal string operator, which actually acts on both edges at once.
If we consider only the action on the ground space, the symmetry operators A and B factorize into a
product of two operators, each with support on the left or right edge.
A_R and B_R act trivially on the excitation which is localized at the left edge, so it is really A_L and
B_L which determine how the Z2 X Z2 symmetry acts on the left edge excitation. Now notice
something interesting: While A and B commute, A_L and B_L anticommute instead --- they generate
the Pauli group. Because these two operators both preserve the ground space, yet do not commute
with one another, just the algebra of these symmetry operators is enough to inform us that the ground
space must be degenerate (more than one dimensional), because acting with B_L must flip the
eigenvalue of A_L:
What is happening here? Recall what it means for a quantum system to have a symmetry group G.
Each element g of G is represented by a unitary transformation U(g), and since applying first g1 and
then g2 must be physically equivalent to applying the product transformation g2 g1, we must have
U(g2) U(g1) = (phase) U(g2 g1).
Note that a nontrivial phase is allowed because quantum states are rays in Hilbert space. We might be
able to remove the phases in the multiplication rule just by redefining the phases of { U(g) }, but if that
is not possible we say the representation is projective. In fact, the Pauli matrices provide a projective
representation of Z2 X Z2. What we have found is that the degenerate edge states on the left (or right)
edge of the chain transform as a projective representation of the symmetry group G = Z2 X Z2 of the
Hamiltonian.
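The multiplication rule can be checked explicitly for the Pauli matrices. In this sketch the group element (a, b) of Z2 X Z2 is represented by U(a, b) = X^a Z^b; this particular labeling is one convenient choice, not the only one.

```python
from itertools import product

X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def U(g):
    """Representative X^a Z^b of the group element g = (a, b)."""
    a, b = g
    m = [[1, 0], [0, 1]]
    if a:
        m = matmul(m, X)
    if b:
        m = matmul(m, Z)
    return m

def phase(lhs, rhs):
    """Return c in {+1, -1} with lhs == c * rhs entrywise, or None if no such c."""
    for c in (1, -1):
        if all(lhs[i][j] == c * rhs[i][j] for i in range(2) for j in range(2)):
            return c
    return None

group = list(product((0, 1), repeat=2))
phases = {(g2, g1): phase(matmul(U(g2), U(g1)),
                          U(((g1[0] + g2[0]) % 2, (g1[1] + g2[1]) % 2)))
          for g1 in group for g2 in group}
```

Every product closes up to a sign, and the sign for (0,1) followed by... that is, U(0,1) U(1,0) = -U(1,1) while U(1,0) U(0,1) = +U(1,1). Since the two group elements commute but their representatives anticommute, no redefinition of the phases of U(g) can remove the sign.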
We can break the degeneracy of the left edge states by adding a term to the Hamiltonian which acts
on qubits 1 and 2, such as ZI (which fails to commute with A) or XZ (which fails to commute with B).
Note that to lift the degeneracy it suffices to break one Z2 or the other; it is not necessary to break
both. From a group theory perspective, the remaining Z2 symmetry does not suffice to enforce the
degeneracy because Z2 (unlike Z2 X Z2) does not have any projective representations.
Now comes the really interesting point. Suppose we add to the Hamiltonian a small local perturbation
(again a sum of geometrically local terms) which respects the Z2 X Z2 symmetry (commutes with both
A and B). To be concrete, we might turn on a weak uniform "magnetic field"
Now the terms in the Hamiltonian are no longer mutually commuting, and diagonalizing H is not so
easy. When the perturbation is weak, though, we can anticipate that
-- The low-lying states are still localized at the left and right edges.
-- The symmetry operators acting on the low-lying states still factorize into a product of operators
localized at the edges.
-- The states at one edge still transform as a projective representation of Z2 X Z2.
But as we have noted, transforming as a projective representation is already enough to enforce the
degeneracy of the states localized at the left (or right) edge. We conclude that the weak perturbation
cannot lift the degeneracy.
This argument is not precisely correct, because when we turn on the perturbation the factorization of
A and B into operators which act on just one edge is not exact. Rather A = A_L A_R, where A_L is
mostly supported near the left edge, but has action on the right edge which is exponentially
suppressed in n, the length of the chain. So the correct conclusion is that when we turn on the
perturbation the lifting of the edge state degeneracy is exponentially small in n.
We can understand how the degeneracy is lifted by thinking about doing a perturbation expansion in
powers of b (the strength of the magnetic field). Applying X to site i can create a pair of excitations at
sites i-1 and i+1, or it can move an excitation from site i-1 to i+1. In a sufficiently high order in
perturbation theory a process occurs in which an excitation propagates across the bulk from the left
edge to the right edge, but this process is suppressed by b^{O(n)}, where n is the length of the
chain. In effect, the nonlocal string order which acts on both edges arises in this order, and the
quantum "tunneling" of an excitation from one edge to the other breaks the degeneracy even though
the symmetry is exact. The exact degeneracy is restored in the limit of an infinitely long chain.
To justify this argument, it is important that no small energy denominators arise in the perturbation
expansion --- that is, the energy cost of creating an excitation in the bulk should be a positive
constant independent of n. When the perturbation is strong enough, this may no longer be true (the
bulk may become "gapless") and at that stage the edge-state degeneracy can be lifted substantially.