
Network Information Theory

This comprehensive treatment of network information theory and its applications pro-
vides the first unified coverage of both classical and recent results. With an approach that
balances the introduction of new models and new coding techniques, readers are guided
through Shannon’s point-to-point information theory, single-hop networks, multihop
networks, and extensions to distributed computing, secrecy, wireless communication,
and networking. Elementary mathematical tools and techniques are used throughout,
requiring only basic knowledge of probability, whilst unified proofs of coding theorems
are based on a few simple lemmas, making the text accessible to newcomers. Key topics
covered include successive cancellation and superposition coding, MIMO wireless com-
munication, network coding, and cooperative relaying. Also covered are feedback and
interactive communication, capacity approximations and scaling laws, and asynchronous
and random access channels. This book is ideal for use in the classroom, for self-study,
and as a reference for researchers and engineers in industry and academia.

Abbas El Gamal is the Hitachi America Chaired Professor in the School of Engineering
and the Director of the Information Systems Laboratory in the Department of Electri-
cal Engineering at Stanford University. In the field of network information theory, he is
best known for his seminal contributions to the relay, broadcast, and interference chan-
nels; multiple description coding; coding for noisy networks; and energy-efficient packet
scheduling and throughput–delay tradeoffs in wireless networks. He is a Fellow of IEEE
and the winner of the 2012 Claude E. Shannon Award, the highest honor in the field of
information theory.
Young-Han Kim is an Assistant Professor in the Department of Electrical and Com-
puter Engineering at the University of California, San Diego. His research focuses on
information theory and statistical signal processing. He is a recipient of the 2008 NSF
Faculty Early Career Development (CAREER) Award and the 2009 US–Israel Binational
Science Foundation Bergmann Memorial Award.
NETWORK
INFORMATION
THEORY

Abbas El Gamal
Stanford University

Young-Han Kim
University of California, San Diego
Cambridge University Press
Cambridge, New York, Melbourne, Madrid, Cape Town,
Singapore, São Paulo, Delhi, Tokyo, Mexico City
Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK
Published in the United States of America by Cambridge University Press, New York

www.cambridge.org
Information on this title: www.cambridge.org/9781107008731


© Cambridge University Press 2011

This publication is in copyright. Subject to statutory exception


and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.

First published 2011

Printed in the United Kingdom at the University Press, Cambridge

A catalogue record for this publication is available from the British Library

Library of Congress Cataloguing in Publication data

ISBN 978-1-107-00873-1 Hardback

Additional resources for this publication at www.cambridge.org/9781107008731

Cambridge University Press has no responsibility for the persistence or


accuracy of URLs for external or third-party internet websites referred to in
this publication, and does not guarantee that any content on such websites is,
or will remain, accurate or appropriate.
To our families
whose love and support
made this book possible
Contents

Preface xvii

Acknowledgments xxiii

Notation xxv

1 Introduction 1
1.1 Network Information Flow Problem 1
1.2 Max-Flow Min-Cut Theorem 2
1.3 Point-to-Point Information Theory 2
1.4 Network Information Theory 4

Part I Preliminaries

2 Information Measures and Typicality 17


2.1 Entropy 17
2.2 Differential Entropy 19
2.3 Mutual Information 22
2.4 Typical Sequences 25
2.5 Jointly Typical Sequences 27
Summary 30
Bibliographic Notes 31
Problems 32
Appendix 2A Proof of the Conditional Typicality Lemma 37

3 Point-to-Point Information Theory 38


3.1 Channel Coding 38
3.2 Packing Lemma 45

3.3 Channel Coding with Input Cost 47


3.4 Gaussian Channel 49
3.5 Lossless Source Coding 54
3.6 Lossy Source Coding 56
3.7 Covering Lemma 62
3.8 Quadratic Gaussian Source Coding 64
3.9 Joint Source–Channel Coding 66
Summary 68
Bibliographic Notes 69
Problems 71
Appendix 3A Proof of Lemma 3.2 77

Part II Single-Hop Networks

4 Multiple Access Channels 81


4.1 Discrete Memoryless Multiple Access Channel 81
4.2 Simple Bounds on the Capacity Region 82
4.3* Multiletter Characterization of the Capacity Region 84
4.4 Time Sharing 85
4.5 Single-Letter Characterization of the Capacity Region 86
4.6 Gaussian Multiple Access Channel 93
4.7 Extensions to More than Two Senders 98
Summary 98
Bibliographic Notes 99
Problems 99
Appendix 4A Cardinality Bound on Q 103

5 Degraded Broadcast Channels 104


5.1 Discrete Memoryless Broadcast Channel 104
5.2 Simple Bounds on the Capacity Region 106
5.3 Superposition Coding Inner Bound 107
5.4 Degraded DM-BC 112
5.5 Gaussian Broadcast Channel 117
5.6 Less Noisy and More Capable Broadcast Channels 121
5.7 Extensions 123
Summary 124

Bibliographic Notes 125


Problems 125

6 Interference Channels 131


6.1 Discrete Memoryless Interference Channel 132
6.2 Simple Coding Schemes 133
6.3 Strong Interference 135
6.4 Gaussian Interference Channel 137
6.5 Han–Kobayashi Inner Bound 143
6.6 Injective Deterministic IC 145
6.7 Capacity Region of the Gaussian IC within Half a Bit 148
6.8 Deterministic Approximation of the Gaussian IC 153
6.9 Extensions to More than Two User Pairs 157
Summary 158
Bibliographic Notes 159
Problems 160
Appendix 6A Proof of Lemma 6.2 164
Appendix 6B Proof of Proposition 6.1 165

7 Channels with State 168


7.1 Discrete Memoryless Channel with State 169
7.2 Compound Channel 169
7.3* Arbitrarily Varying Channel 172
7.4 Channels with Random State 173
7.5 Causal State Information Available at the Encoder 175
7.6 Noncausal State Information Available at the Encoder 178
7.7 Writing on Dirty Paper 184
7.8 Coded State Information 189
Summary 191
Bibliographic Notes 191
Problems 192

8 General Broadcast Channels 197


8.1 DM-BC with Degraded Message Sets 198
8.2 Three-Receiver Multilevel DM-BC 199
8.3 Marton’s Inner Bound 205
8.4 Marton’s Inner Bound with Common Message 212

8.5 Outer Bounds 214


8.6 Inner Bounds for More than Two Receivers 217
Summary 219
Bibliographic Notes 220
Problems 221
Appendix 8A Proof of the Mutual Covering Lemma 223
Appendix 8B Proof of the Nair–El Gamal Outer Bound 225

9 Gaussian Vector Channels 227


9.1 Gaussian Vector Point-to-Point Channel 227
9.2 Gaussian Vector Multiple Access Channel 232
9.3 Gaussian Vector Broadcast Channel 234
9.4 Gaussian Product Broadcast Channel 235
9.5 Vector Writing on Dirty Paper 241
9.6 Gaussian Vector BC with Private Messages 242
Summary 253
Bibliographic Notes 253
Problems 254
Appendix 9A Proof of the BC–MAC Duality Lemma 255
Appendix 9B Uniqueness of the Supporting Line 256

10 Distributed Lossless Compression 258


10.1 Distributed Lossless Source Coding for a -DMS 258
10.2 Inner and Outer Bounds on the Optimal Rate Region 259
10.3 Slepian–Wolf Theorem 260
10.4 Lossless Source Coding with a Helper 264
10.5 Extensions to More than Two Sources 269
Summary 270
Bibliographic Notes 270
Problems 271

11 Lossy Compression with Side Information 274


11.1 Simple Special Cases 275
11.2 Causal Side Information Available at the Decoder 275
11.3 Noncausal Side Information Available at the Decoder 280
11.4 Source Coding When Side Information May Be Absent 286
Summary 288

Bibliographic Notes 288


Problems 289
Appendix 11A Proof of Lemma 11.1 292

12 Distributed Lossy Compression 294


12.1 Berger–Tung Inner Bound 295
12.2 Berger–Tung Outer Bound 299
12.3 Quadratic Gaussian Distributed Source Coding 300
12.4 Quadratic Gaussian CEO Problem 308
12.5* Suboptimality of Berger–Tung Coding 312
Summary 313
Bibliographic Notes 313
Problems 314
Appendix 12A Proof of the Markov Lemma 315
Appendix 12B Proof of Lemma 12.3 317
Appendix 12C Proof of Lemma 12.4 317
Appendix 12D Proof of Lemma 12.6 318

13 Multiple Description Coding 320


13.1 Multiple Description Coding for a DMS 321
13.2 Simple Special Cases 322
13.3 El Gamal–Cover Inner Bound 323
13.4 Quadratic Gaussian Multiple Description Coding 327
13.5 Successive Refinement 330
13.6 Zhang–Berger Inner Bound 332
Summary 334
Bibliographic Notes 334
Problems 335

14 Joint Source–Channel Coding 336


14.1 Lossless Communication of a -DMS over a DM-MAC 336
14.2 Lossless Communication of a -DMS over a DM-BC 345
14.3 A General Single-Hop Network 351
Summary 355
Bibliographic Notes 355
Problems 356
Appendix 14A Proof of Lemma 14.1 358

Part III Multihop Networks

15 Graphical Networks 363


15.1 Graphical Multicast Network 364
15.2 Capacity of Graphical Unicast Network 366
15.3 Capacity of Graphical Multicast Network 368
15.4 Graphical Multimessage Network 373
Summary 377
Bibliographic Notes 377
Problems 379
Appendix 15A Proof of Lemma 15.1 381

16 Relay Channels 382


16.1 Discrete Memoryless Relay Channel 383
16.2 Cutset Upper Bound on the Capacity 384
16.3 Direct-Transmission Lower Bound 386
16.4 Decode–Forward Lower Bound 386
16.5 Gaussian Relay Channel 395
16.6 Partial Decode–Forward Lower Bound 396
16.7 Compress–Forward Lower Bound 399
16.8 RFD Gaussian Relay Channel 406
16.9 Lookahead Relay Channels 411
Summary 416
Bibliographic Notes 418
Problems 419
Appendix 16A Cutset Bound for the Gaussian RC 423
Appendix 16B Partial Decode–Forward for the Gaussian RC 424
Appendix 16C Equivalent Compress–Forward Lower Bound 425

17 Interactive Channel Coding 427


17.1 Point-to-Point Communication with Feedback 428
17.2 Multiple Access Channel with Feedback 434
17.3 Broadcast Channel with Feedback 443
17.4 Relay Channel with Feedback 444
17.5 Two-Way Channel 445
17.6 Directed Information 449
Summary 453

Bibliographic Notes 454


Problems 455
Appendix 17A Proof of Lemma 17.1 458

18 Discrete Memoryless Networks 459


18.1 Discrete Memoryless Multicast Network 459
18.2 Network Decode–Forward 462
18.3 Noisy Network Coding 466
18.4 Discrete Memoryless Multimessage Network 477
Summary 481
Bibliographic Notes 481
Problems 482

19 Gaussian Networks 484


19.1 Gaussian Multimessage Network 485
19.2 Capacity Scaling Laws 490
19.3 Gupta–Kumar Random Network 492
Summary 499
Bibliographic Notes 499
Problems 500
Appendix 19A Proof of Lemma 19.1 501
Appendix 19B Proof of Lemma 19.2 502

20 Compression over Graphical Networks 505


20.1 Distributed Lossless Source–Network Coding 505
20.2 Multiple Description Network Coding 508
20.3 Interactive Source Coding 512
Summary 519
Bibliographic Notes 520
Problems 520
Appendix 20A Proof of Lemma 20.1 525

Part IV Extensions

21 Communication for Computing 529


21.1 Coding for Computing with Side Information 530
21.2 Distributed Coding for Computing 533

21.3 Interactive Coding for Computing 537


21.4 Cascade Coding for Computing 539
21.5 Distributed Lossy Averaging 542
21.6 Computing over a MAC 544
Summary 545
Bibliographic Notes 546
Problems 547

22 Information Theoretic Secrecy 549


22.1 Wiretap Channel 550
22.2 Confidential Communication via Shared Key 557
22.3 Secret Key Agreement: Source Model 559
22.4 Secret Key Agreement: Channel Model 572
Summary 575
Bibliographic Notes 576
Problems 578
Appendix 22A Proof of Lemma 22.1 579
Appendix 22B Proof of Lemma 22.2 580
Appendix 22C Proof of Lemma 22.3 581

23 Wireless Fading Channels 583


23.1 Gaussian Fading Channel 583
23.2 Coding under Fast Fading 584
23.3 Coding under Slow Fading 586
23.4 Gaussian Vector Fading Channel 588
23.5 Gaussian Fading MAC 590
23.6 Gaussian Fading BC 595
23.7 Gaussian Fading IC 595
Summary 597
Bibliographic Notes 598
Problems 599

24 Networking and Information Theory 600


24.1 Random Data Arrivals 601
24.2 Random Access Channel 604
24.3 Asynchronous MAC 607
Summary 614

Bibliographic Notes 614


Problems 615
Appendix 24A Proof of Lemma 24.1 617
Appendix 24B Proof of Lemma 24.2 618

Appendices

A Convex Sets and Functions 623

B Probability and Estimation 625

C Cardinality Bounding Techniques 631

D Fourier–Motzkin Elimination 636

E Convex Optimization 640

Bibliography 643

Common Symbols 664

Author Index 666

Subject Index 671


Preface

Network information theory aims to establish the fundamental limits on information flow
in networks and the optimal coding schemes that achieve these limits. It extends Shan-
non’s fundamental theorems on point-to-point communication and the Ford–Fulkerson
max-flow min-cut theorem for graphical unicast networks to general networks with mul-
tiple sources and destinations and shared resources. Although the theory is far from com-
plete, many elegant results and techniques have been developed over the past forty years
with potential applications in real-world networks. This book presents these results in a
coherent and simplified manner that should make the subject accessible to graduate stu-
dents and researchers in electrical engineering, computer science, statistics, and related
fields, as well as to researchers and practitioners in industry.
The first paper on network information theory was on the two-way channel by Shannon (1961). This was followed a decade later by seminal papers on the broadcast channel by Cover (1972), the multiple access channel by Ahlswede (1971, 1974) and Liao (1972), and distributed lossless compression by Slepian and Wolf (1973a). These results spurred a flurry of research on network information theory from the mid 1970s to the early 1980s with many new results and techniques developed; see the survey papers by van der Meulen (1977) and El Gamal and Cover (1980), and the seminal book by Csiszár and Körner (1981b). However, many problems, including Shannon's two-way channel, remained open
and there was little interest in these results from communication theorists or practition-
ers. The period from the mid s to the mid s represents a “lost decade” for network
information theory during which very few papers were published and many researchers
shifted their focus to other areas. The advent of the Internet and wireless communica-
tion, fueled by advances in semiconductor technology, compression and error correction
coding, signal processing, and computer science, revived the interest in this subject and
there has been an explosion of activities in the field since the mid s. In addition to
progress on old open problems, recent work has dealt with new network models, new
approaches to coding for networks, capacity approximations and scaling laws, and topics
at the intersection of networking and information theory. Some of the techniques devel-
oped in network information theory, such as successive cancellation decoding, multiple
description coding, successive refinement of information, and network coding, are being
implemented in real-world networks.

Development of the Book


The idea of writing this book started a long time ago when Tom Cover and the first author
considered writing a monograph based on their aforementioned  survey paper. The
first author then put together a set of handwritten lecture notes and used them to teach
a course on multiple user information theory at Stanford University from  to .
In response to high demand from graduate students in communication and information
theory, he resumed teaching the course in  and updated the early lecture notes with
recent results. These updated lecture notes were used also in a course at EPFL in the
summer of . In  the second author, who was in the  class, started teaching a
similar course at UC San Diego and the authors decided to collaborate on expanding the
lecture notes into a textbook. Various versions of the lecture notes have been used since
then in courses at Stanford University, UC San Diego, the Chinese University of Hong
Kong, UC Berkeley, Tsinghua University, Seoul National University, University of Notre
Dame, and McGill University among others. The lecture notes were posted on the arXiv
in January . This book is based on these notes. Although we have made an effort to
provide a broad coverage of the results in the field, we do not claim to be all inclusive.
The explosion in the number of papers on the subject in recent years makes it almost
impossible to provide a complete coverage in a single textbook.

Organization of the Book


We considered several high-level organizations of the material in the book, from source
coding to channel coding or vice versa, from graphical networks to general networks, or
along historical lines. We decided on a pedagogical approach that balances the intro-
duction of new network models and new coding techniques. We first discuss single-hop
networks and then multihop networks. Within each type of network, we first study chan-
nel coding, followed by their source coding counterparts, and then joint source–channel
coding. There were several important topics that did not fit neatly into this organization,
which we grouped under Extensions. The book deals mainly with discrete memoryless
and Gaussian network models because little is known about the limits on information
flow for more complex models. Focusing on these models also helps us present the cod-
ing schemes and proof techniques in their simplest possible forms.
The first chapter provides a preview of network information theory using selected ex-
amples from the book. The rest of the material is divided into four parts and a set of
appendices.
Part I. Background (Chapters  and ). We present the needed basic information theory
background, introduce the notion of typicality and related lemmas used throughout the
book, and review Shannon’s point-to-point communication coding theorems.
Part II. Single-hop networks (Chapters  through ). We discuss networks with single-
round one-way communication. Here each node is either a sender or a receiver. The
material is divided into three types of communication settings.
∙ Independent messages over noisy channels (Chapters  through ). We discuss noisy

single-hop network building blocks, beginning with multiple access channels (many-to-one communication) in Chapter 4, followed by broadcast channels (one-to-many communication) in Chapters 5 and 8, and interference channels (multiple one-to-one communications) in Chapter 6. We split the discussion on broadcast channels for a pedagogical reason—the study of general broadcast channels in Chapter 8 requires techniques that are introduced more simply through the discussion of channels with state in Chapter 7. In Chapter 9, we study Gaussian vector channels, which model multiple-antenna (multiple-input multiple-output/MIMO) communication systems.
∙ Correlated sources over noiseless links (Chapters  through ). We discuss the source
coding counterparts of the noisy single-hop network building blocks, beginning with
distributed lossless source coding in Chapter , followed by lossy source coding with
side information in Chapter , distributed lossy source coding in Chapter , and mul-
tiple description coding in Chapter . Again we spread the discussion on distributed
coding over three chapters to help develop new ideas gradually.
∙ Correlated sources over noisy channels (Chapter ). We discuss the general setting of
sending uncompressed sources over noisy single-hop networks.

Part III. Multihop networks (Chapters  through ). We discuss networks with
relaying and multiple rounds of communication. Here some of the nodes can act as both
sender and receiver. In an organization parallel to Part II, the material is divided into
three types of settings.

∙ Independent messages over graphical networks (Chapter ). We discuss coding for net-
works modeled by graphs beyond simple routing.
∙ Independent messages over noisy networks (Chapters  through ). In Chapter , we
discuss the relay channel, which is a simple two-hop network with a sender, a receiver,
and a relay. We then discuss channels with feedback and the two-way channel in Chap-
ter . We extend results on the relay channel and the two-way channel to general noisy
networks in Chapter . We further discuss approximations and scaling laws for the
capacity of large wireless networks in Chapter .
∙ Correlated sources over graphical networks (Chapter ). We discuss source coding
counterparts of the channel coding problems in Chapters  through .

Part IV. Extensions (Chapters  through ). We study extensions of the theory
discussed in the first three parts of the book to communication for computing in Chap-
ter , communication with secrecy constraints in Chapter , wireless fading channels in
Chapter , and to problems at the intersection of networking and information theory in
Chapter .

Appendices. To make the book as self-contained as possible, Appendices A, B, and E


provide brief reviews of the necessary background on convex sets and functions, probabil-
ity and estimation, and convex optimization, respectively. Appendices C and D describe
techniques for bounding the cardinality of auxiliary random variables appearing in many

capacity and rate region characterizations, and the Fourier–Motzkin elimination proce-
dure, respectively.

Presentation of the Material


Each chapter typically contains both teaching material and advanced topics. Starred sec-
tions contain topics that are either too technical to be discussed in detail or are not essen-
tial to the main flow of the material. The chapter ends with a bulleted summary of key
points and open problems, bibliographic notes, and problems on missing proof steps in
the text followed by exercises around the key ideas. Some of the more technical and less
central proofs are delegated to appendices at the end of each chapter in order to help the
reader focus on the main ideas and techniques.
The book follows the adage “a picture is worth a thousand words.” We use illustra-
tions and examples to provide intuitive explanations of models and concepts. The proofs
follow the principle of making everything as simple as possible but not simpler. We use
elementary tools and techniques, requiring only basic knowledge of probability and some
level of mathematical maturity, for example, at the level of a first course on information
theory. The achievability proofs are based on joint typicality, which was introduced by
Shannon in his  paper and further developed in the s by Forney and Cover. We
take this approach one step further by developing a set of simple lemmas to reduce the
repetitiveness in the proofs. We show how the proofs for discrete memoryless networks
can be extended to their Gaussian counterparts by using a discretization procedure and
taking appropriate limits. Some of the proofs in the book are new and most of them are
simplified—and in some cases more rigorous—versions of published proofs.

Use of the Book in Courses


As mentioned earlier, the material in this book has been used in courses on network in-
formation theory at several universities over many years. We hope that the publication of
the book will help make such a course more widely adopted. One of our main motivations
for writing the book, however, is to broaden the audience for network information theory.
Current education of communication and networking engineers encompasses primarily
point-to-point communication and wired networks. At the same time, many of the inno-
vations in modern communication and networked systems concern more efficient use of
shared resources, which is the focus of network information theory. We believe that the
next generation of communication and networking engineers can benefit greatly from
having a working knowledge of network information theory. We have made every effort
to present some of the most relevant material to this audience as simply and clearly as
possible. In particular, the material on Gaussian channels, wireless fading channels, and
Gaussian networks can be readily integrated into an advanced course on wireless com-
munication.
The book can be used as a main text in a one-quarter/semester first course on infor-
mation theory with emphasis on communication or a one-quarter second course on in-
formation theory, or as a supplementary text in courses on communication, networking,

computer science, and statistics. Most of the teaching material in the book can be covered
in a two-quarter course sequence. Slides for such courses are posted at http://arxiv.org/abs/1001.3404.

Dependence Graphs
The following graphs depict the dependence of each chapter on its preceding chapters.
Each box contains the chapter number and lighter boxes represent dependence on pre-
vious parts. Solid edges represent required reading and dashed edges represent recom-
mended reading.

Part II.
2 Information measures and typicality
3 Point-to-point information theory
4 Multiple access channels
5 Degraded broadcast channels
6 Interference channels
7 Channels with state
8 General broadcast channels
9 Gaussian vector channels
10 Distributed lossless compression
11 Lossy compression with side information
12 Distributed lossy compression
13 Multiple description coding
14 Joint source–channel coding

Part III.
15 Graphical networks
16 Relay channels
17 Interactive channel coding
18 Discrete memoryless networks
19 Gaussian networks
20 Compression over graphical networks

Part IV.
21 Communication for computing
22 Information theoretic secrecy
23 Wireless fading channels
24 Networking and information theory

In addition to the dependence graphs for each part, we provide below some interest-
based dependence graphs.

Communication.

Data compression.
Abbas El Gamal Palo Alto, California


Young-Han Kim La Jolla, California
July 2011
Acknowledgments

The development of this book was truly a community effort. Many colleagues, teaching
assistants of our courses on network information theory, and our postdocs and PhD stu-
dents provided invaluable input on the content, organization, and exposition of the book,
and proofread earlier drafts.
First and foremost, we are indebted to Tom Cover. He taught us everything we know
about information theory, encouraged us to write this book, and provided several insight-
ful comments. We are also indebted to our teaching assistants—Ehsan Ardestanizadeh,
Chiao-Yi Chen, Yeow-Khiang Chia, Shirin Jalali, Paolo Minero, Haim Permuter, Han-I
Su, Sina Zahedi, and Lei Zhao—for their invaluable contributions to the development of
this book. In particular, we thank Sina Zahedi for helping with the first set of lecture notes
that ultimately led to this book. We thank Han-I Su for his contributions to the chapters
on quadratic Gaussian source coding and distributed computing and his thorough proof-
reading of the entire draft. Yeow-Khiang Chia made invaluable contributions to the chap-
ters on information theoretic secrecy and source coding over graphical networks, con-
tributed several problems, and proofread many parts of the book. Paolo Minero helped
with some of the material in the chapter on information theory and networking.
We are also grateful to our PhD students. Bernd Bandemer contributed to the chapter
on interference channels and proofread several parts of the book. Sung Hoon Lim con-
tributed to the chapters on discrete memoryless and Gaussian networks. James Mammen
helped with the first draft of the lecture notes on scaling laws. Lele Wang and Yu Xiang
also provided helpful comments on many parts of the book.
We benefited greatly from discussions with several colleagues. Chandra Nair con-
tributed many of the results and problems in the chapters on broadcast channels. David
Tse helped with the organization of the chapters on fading and interference channels.
Mehdi Mohseni helped with key proofs in the chapter on Gaussian vector channels. Amin
Gohari helped with the organization and several results in the chapter on information the-
oretic secrecy. Olivier Lévêque helped with some of the proofs in the chapter on Gauss-
ian networks. We often resorted to John Gill for stylistic and editorial advice. Jun Chen,
Sae-Young Chung, Amos Lapidoth, Prakash Narayan, Bobak Nazer, Alon Orlitsky, Ofer
Shayevitz, Yossi Steinberg, Aslan Tchamkerten, Dimitris Toumpakaris, Sergio Verdú, Mai
Vu, Michèle Wigger, Ram Zamir, and Ken Zeger provided helpful input during the writing
of this book. We would also like to thank Venkat Anantharam, François Baccelli, Stephen
Boyd, Max Costa, Paul Cuff, Suhas Diggavi, Massimo Franceschetti, Michael Gastpar,

Andrea Goldsmith, Bob Gray, Te Sun Han, Tara Javidi, Ashish Khisti, Gerhard Kramer,
Mohammad Maddah-Ali, Andrea Montanari, Balaji Prabhakar, Bixio Rimoldi, Anant Sa-
hai, Anand Sarwate, Devavrat Shah, Shlomo Shamai, Emre Telatar, Alex Vardy, Tsachy
Weissman, and Lin Zhang.
This book would not have been written without the enthusiasm, inquisitiveness, and
numerous contributions of the students who took our courses, some of whom we have
already mentioned. In addition, we would like to acknowledge Ekine Akuiyibo, Lorenzo
Coviello, Chan-Soo Hwang, Yashodhan Kanoria, Tae Min Kim, Gowtham Kumar, and
Moshe Malkin for contributions to some of the material. Himanshu Asnani, Yuxin Chen,
Aakanksha Chowdhery, Mohammad Naghshvar, Ryan Peng, Nish Sinha, and Hao Zou
provided many corrections to earlier drafts. Several graduate students from UC Berkeley,
MIT, Tsinghua, University of Maryland, Tel Aviv University, and KAIST also provided
valuable feedback.
We would like to thank our editor Phil Meyler and the rest of the Cambridge staff
for their exceptional support during the publication stage of this book. We also thank
Kelly Yilmaz for her wonderful administrative support. Finally, we acknowledge partial
support for the work in this book from the DARPA ITMANET and the National Science
Foundation.
Notation

We introduce the notation and terminology used throughout the book.

Sets, Scalars, and Vectors


We use lowercase letters x, y, . . . to denote constants and values of random variables. We use x_i^j = (x_i, x_{i+1}, . . . , x_j) to denote a (j − i + 1)-sequence/column vector for 1 ≤ i ≤ j. When i = 1, we always drop the subscript, i.e., x^j = (x_1, x_2, . . . , x_j). Sometimes we write
x, y, . . . for constant vectors with specified dimensions and x j for the j-th component of
x. Let x(i) be a vector indexed by time i and x j (i) be the j-th component of x(i). The
sequence of these vectors is denoted by xn = (x(1), x(2), . . . , x(n)). An all-one column
vector (1, . . . , 1) with a specified dimension is denoted by 1.
Let α, β ∈ [0, 1]. Then ᾱ = (1 − α) and α ∗ β = αβ̄ + ᾱβ.
Let x^n, y^n ∈ {0, 1}^n be binary n-vectors. Then x^n ⊕ y^n is the componentwise modulo-2 sum of the two vectors.


Calligraphic letters X , Y , . . . are used exclusively for finite sets and |X | denotes the
cardinality of the set X . The following notation is used for common sets:
∙ ℝ is the real line and ℝ d is the d-dimensional real Euclidean space.
∙ 𝔽q is the finite field GF(q) and 𝔽qd is the d-dimensional vector space over GF(q).

Script letters C , R, P , . . . are used for subsets of ℝ d .


For a pair of integers i ≤ j, we define the discrete interval [i : j] = {i, i + 1, . . . , j}.
More generally, for a ≥ 0 and integer i ≤ 2^a, we define
∙ [i : 2^a) = {i, i + 1, . . . , 2^⌊a⌋}, where ⌊a⌋ is the integer part of a, and
∙ [i : 2^a] = {i, i + 1, . . . , 2^⌈a⌉}, where ⌈a⌉ is the smallest integer ≥ a.

Probability and Random Variables


The probability of an event A is denoted by P(A) and the conditional probability of A
given B is denoted by P(A | B). We use uppercase letters X, Y , . . . to denote random vari-
ables. The random variables may take values from finite sets X , Y , . . . or from the real
line ℝ. By convention, X =  means that X is a degenerate random variable (unspecified
constant) regardless of its support. The probability of the event {X ∈ A} is denoted by
P{X ∈ A}.

In accordance with the notation for constant vectors, we use X_i^j = (X_i, . . . , X_j) to denote a (j − i + 1)-sequence/column vector of random variables for 1 ≤ i ≤ j. When i = 1, we always drop the subscript and use X^j = (X_1, . . . , X_j).
Let (X1 , . . . , Xk ) be a tuple of k random variables and J ⊆ [1 : k]. The subtuple of
random variables with indices from J is denoted by X(J ) = (X j : j ∈ J ). Similarly,
given k random vectors (X1n , . . . , Xkn ),

X n (J ) = (X nj : j ∈ J ) = (X1 (J ), . . . , Xn (J )).

Sometimes we write X, Y, . . . for random (column) vectors with specified dimensions


and X j for the j-th component of X. Let X(i) be a random vector indexed by time i
and X j (i) be the j-th component of X(i). We denote the sequence of these vectors by
Xn = (X(1), . . . , X(n)).
The following notation is used to specify random variables and random vectors.
∙ X^n ∼ p(x^n) means that p(x^n) is the probability mass function (pmf) of the discrete random vector X^n. The function p_{X^n}(x̃^n) denotes the pmf of X^n with argument x̃^n, i.e., p_{X^n}(x̃^n) = P{X^n = x̃^n} for all x̃^n ∈ X^n. The function p(x^n) without subscript is understood to be the pmf of the random vector X^n defined over X_1 × ⋅ ⋅ ⋅ × X_n.
∙ X n ∼ f (x n ) means that f (x n ) is the probability density function (pdf) of the contin-
uous random vector X n .
∙ X n ∼ F(x n ) means that F(x n ) is the cumulative distribution function (cdf) of X n .
∙ (X n , Y n ) ∼ p(x n , yn ) means that p(x n , y n ) is the joint pmf of X n and Y n .
∙ Y n | {X n ∈ A} ∼ p(y n | X n ∈ A) means that p(yn | X n ∈ A) is the conditional pmf of
Y n given {X n ∈ A}.
∙ Y n | {X n = x n } ∼ p(y n |x n ) means that p(y n |x n ) is the conditional pmf of Y n given
{X n = x n }.
∙ p(y n |x n ) is a collection of (conditional) pmfs on Y n , one for every x n ∈ X n .
f (y n |x n ) and F(yn |x n ) are similarly defined.
∙ Y^n ∼ p_{X^n}(y^n) means that Y^n has the same pmf as X^n, i.e., p(y^n) = p_{X^n}(y^n).
Similar notation is used for conditional probability distributions.
Given a random variable X, the expected value of its function д(X) is denoted by
E X (д(X)), or E(д(X)) in short. The conditional expectation of X given Y is denoted by
E(X|Y ). We use Var(X) = E[(X − E(X))2 ] to denote the variance of X and Var(X|Y) =
E[(X − E(X|Y))2 | Y] to denote the conditional variance of X given Y.
For random vectors X = X n and Y = Y k , KX = E[(X − E(X))(X − E(X))T ] denotes
the covariance matrix of X, KXY = E[(X − E(X))(Y − E(Y))T ] denotes the crosscovariance
matrix of (X, Y), and KX|Y = E[(X − E(X|Y))(X − E(X|Y))T ] = KX−E(X|Y) denotes the con-
ditional covariance matrix of X given Y, that is, the covariance matrix of the minimum
mean squared error (MMSE) for estimating X given Y.

We use the following notation for standard random variables and random vectors:
∙ X ∼ Bern(p): X is a Bernoulli random variable with parameter p ∈ [0, 1], i.e., X = 1 with probability p and X = 0 with probability 1 − p.

∙ X ∼ Binom(n, p): X is a binomial random variable with parameters n ≥ 1 and p ∈ [0, 1], i.e., p_X(k) = \binom{n}{k} p^k (1 − p)^{n−k}, k ∈ [0 : n].
∙ X ∼ Unif(A): X is a discrete uniform random variable over a finite set A.
X ∼ Unif[i : j] for integers j > i: X is a discrete uniform random variable over [i : j].
∙ X ∼ Unif[a, b] for b > a: X is a continuous uniform random variable over [a, b].
∙ X ∼ N(μ, σ 2 ): X is a Gaussian random variable with mean μ and variance σ 2 .
Q(x) = P{X > x}, x ∈ ℝ, where X ∼ N(0, 1).
∙ X = X^n ∼ N(μ, K): X is a Gaussian random vector with mean vector μ and covariance matrix K, i.e., f(x) = (1/\sqrt{(2π)^n |K|}) e^{−(1/2)(x−μ)^T K^{−1}(x−μ)}.

We use the notation {Xi } = (X1 , X2 , . . .) to denote a discrete-time random process.


The following notation is used for common random processes:
∙ {Xi } is a Bern(p) process means that (X1 , X2 , . . .) is a sequence of independent and
identically distributed (i.i.d.) Bern(p) random variables.
∙ {Xi } is a WGN(P) process means that (X1 , X2 , . . .) is a sequence of i.i.d. N(0, P) ran-
dom variables. More generally, {Xi , Yi } is a -WGN(P, ρ) process means that (X1 , Y1 ),
(X2 , Y2 ), . . . are i.i.d. jointly Gaussian random variable pairs with E(X1 ) = E(Y1 ) = 0,
E(X12 ) = E(Y12 ) = P, and correlation coefficient ρ = E(X1Y1 )/P.
We say that X → Y → Z form a Markov chain if p(x, y, z) = p(x)p(y|x)p(z|y). More
generally, we say that X_1 → X_2 → X_3 → ⋅ ⋅ ⋅ form a Markov chain if p(x_i | x^{i−1}) = p(x_i | x_{i−1})
for i ≥ 2.

Common Functions
The following functions are used frequently. The logarithm function log is assumed to be
base  unless specified otherwise.

∙ Binary entropy function: H(p) = −p log p − p̄ log p̄ for p ∈ [0, 1].


∙ Gaussian capacity function: C(x) = (1/2) log(1 + x) for x ≥ 0.
∙ Quadratic Gaussian rate function: R(x) = max{(1/2) log x, 0} = (1/2)[log x]+ .

є–δ Notation
We use є, є′ > 0 exclusively to denote "small" constants such that є′ < є. We use δ(є) > 0 to
denote a function of є that tends to zero as є → 0. When there are multiple such functions
δ1 (є), δ2 (є), . . . , δk (є), we denote them all by a generic function δ(є) that tends to zero as
є → 0 with the understanding that δ(є) = max{δ1 (є), δ2 (є), . . . , δk (є)}. Similarly, we use
єn ≥ 0 to denote a generic function of n that tends to zero as n → ∞.
We say that a_n ≐ 2^{nb} for some constant b if there exists some δ(є) (with є defined in the context) such that for n sufficiently large,

2^{n(b−δ(є))} ≤ a_n ≤ 2^{n(b+δ(є))}.

Matrices
We use uppercase letters A, B, . . . to denote matrices. The entry in the i-th row and the
j-th column of a matrix A is denoted by A(i, j) or A i j . A transpose of a matrix A is
denoted by AT , i.e., AT (i, j) = A( j, i). We use diag(a1 , a2 , . . . , ad ) to denote a d × d di-
agonal matrix with diagonal elements a1 , a2 , . . . , ad . The d × d identity matrix is denoted
by I d . The subscript d is omitted when it is clear from the context. For a square matrix A,
|A| = det(A) denotes the determinant of A and tr(A) denotes its trace.
A symmetric matrix A is said to be positive definite (denoted by A ≻ 0) if xTAx > 0
for all x ̸= 0. If instead xTAx ≥ 0 for all x ̸= 0, then the matrix A is said to be positive
semidefinite (denoted by A ⪰ 0). For symmetric matrices A and B of the same dimension,
A ≻ B means that A − B ≻ 0 and A ⪰ B means that A − B ⪰ 0.
A singular value decomposition of an r × t matrix G of rank d is given by G = ΦΓΨT ,
where Φ is an r × d matrix with ΦT Φ = Id , Ψ is a t × d matrix with ΨT Ψ = Id , and
Γ = diag(γ1 , . . . , γ d ) is a d × d positive diagonal matrix.
For a symmetric positive semidefinite matrix K with an eigenvalue decomposition K = ΦΛΦ^T, we define its symmetric square root as K^{1/2} = ΦΛ^{1/2}Φ^T, where Λ^{1/2} is a diagonal matrix with diagonal elements √Λ_ii. Note that K^{1/2} is symmetric positive semidefinite with K^{1/2}K^{1/2} = K. We define the symmetric square root inverse K^{−1/2} of a symmetric positive definite matrix K as the symmetric square root of K^{−1}.

Order Notation
Let g_1(N) and g_2(N) be nonnegative functions on natural numbers.
∙ g_1(N) = o(g_2(N)) means that g_1(N)/g_2(N) tends to zero as N → ∞.
∙ g_1(N) = O(g_2(N)) means that there exist a constant a and an integer n_0 such that g_1(N) ≤ a g_2(N) for all N > n_0.
∙ g_1(N) = Ω(g_2(N)) means that g_2(N) = O(g_1(N)).
∙ g_1(N) = Θ(g_2(N)) means that g_1(N) = O(g_2(N)) and g_2(N) = O(g_1(N)).
CHAPTER 1

Introduction

We introduce the general problem of optimal information flow in networks, which is the
focus of network information theory. We then give a preview of the book with pointers
to where the main results can be found.

1.1 NETWORK INFORMATION FLOW PROBLEM

A networked system consists of a set of information sources and communication nodes


connected by a network as depicted in Figure .. Each node observes one or more sources
and wishes to reconstruct other sources or to compute a function based on all the sources.
To perform the required task, the nodes communicate with each other over the network.
∙ What is the limit on the amount of communication needed?
∙ How can this limit be achieved?

Figure 1.1. Elements of a networked system. The information sources (shaded cir-
cles) may be data, video, sensor measurements, or biochemical signals; the nodes
(empty circles) may be computers, handsets, sensor nodes, or neurons; and the net-
work may be a wired network, a wireless cellular or ad-hoc network, or a biological
network.

These information flow questions have been answered satisfactorily for graphical uni-
cast (single-source single-destination) networks and for point-to-point communication
systems.

1.2 MAX-FLOW MIN-CUT THEOREM

Consider a graphical (wired) network, such as the Internet or a distributed storage system,
modeled by a directed graph (N , E) with link capacities C jk bits from node j to node k as
depicted in Figure .. Assume a unicast communication scenario in which source node 
wishes to communicate an R-bit message M to destination node N. What is the network
capacity C, that is, the maximum number of bits R that can be communicated reliably?
The answer is given by the max-flow min-cut theorem due to Ford and Fulkerson ()
and Elias, Feinstein, and Shannon (). They showed that the capacity (maximum flow)
is equal to the minimum cut capacity, i.e.,

C = min_{S ⊂ N : 1 ∈ S, N ∈ S^c} C(S),

where C(S) = ∑_{j∈S, k∈S^c} C_jk is the capacity of the cut (S, S^c). They also showed that the
capacity is achieved without errors using simple routing at the intermediate (relay) nodes,
that is, the incoming bits at each node are forwarded over its outgoing links. Hence, under
this networked system model, information can be treated as a commodity to be shipped
over a transportation network or electricity to be delivered over a power grid.
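To make the theorem concrete, the following Python sketch computes the maximum flow of a small hypothetical unicast network with the Edmonds–Karp augmenting-path procedure; the 4-node topology and link capacities are made-up values chosen so that the minimum cut capacity is easy to check by hand.

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp: repeatedly push flow along shortest augmenting paths.
    cap maps directed edges (u, v) to their capacities C_uv in bits."""
    res = dict(cap)                     # residual capacities
    for (u, v) in cap:
        res.setdefault((v, u), 0)       # reverse edges start at 0
    nodes = {u for edge in cap for u in edge}
    flow = 0
    while True:
        parent = {s: None}              # BFS for an augmenting path
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in nodes:
                if v not in parent and res.get((u, v), 0) > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow                 # no augmenting path is left
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(res[e] for e in path)
        for (u, v) in path:             # update residual capacities
            res[(u, v)] -= bottleneck
            res[(v, u)] += bottleneck
        flow += bottleneck

# Hypothetical network: node 1 is the source and node 4 the destination.
cap = {(1, 2): 3, (1, 3): 2, (2, 4): 2, (3, 4): 3, (2, 3): 1}
print(max_flow(cap, 1, 4))  # 5, which equals the cut capacity C({1}) = C12 + C13
```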

Figure 1.2. Graphical single-source single-destination network.

The max-flow min-cut theorem is discussed in more detail in Chapter .

1.3 POINT-TO-POINT INFORMATION THEORY

The graphical unicast network model captures the topology of a point-to-point network
with idealized source and communication link models. At the other extreme, Shannon

(1948, 1959) studied communication and compression over a single link with more com-
plex source and link (channel) models. He considered the communication system archi-
tecture depicted in Figure 1.3, where a sender wishes to communicate a k-symbol source
sequence U k to a receiver over a noisy channel. To perform this task, Shannon proposed a
general block coding scheme, where the source sequence is mapped by an encoder into an
n-symbol input sequence X n (U k ) and the received channel output sequence Y n is mapped
by a decoder into an estimate (reconstruction) sequence Û k (Y n ). He simplified the anal-
ysis of this system by proposing simple discrete memoryless models for the source and
the noisy channel, and by using an asymptotic approach to characterize the necessary and
sufficient condition for reliable communication.

Figure 1.3. Shannon's model of a point-to-point communication system.

Shannon’s ingenious formulation of the point-to-point communication problem led


to the following four fundamental theorems.
Channel coding theorem. Suppose that the source is a maximally compressed k-bit mes-
sage M as in the graphical network case and that the channel is discrete and memoryless
with input X, output Y, and conditional probability p(y|x) that specifies the probability
of receiving the symbol y when x is transmitted. The decoder wishes to find an estimate
M̂ of the message such that the probability of decoding error P{M̂ ≠ M} does not exceed
a prescribed value Pe . The general problem is to find the tradeoff between the number of
bits k, the block length n, and the probability of error Pe .
This problem is intractable in general. Shannon () realized that the difficulty lies
in analyzing the system for any given finite block length n and reformulated the problem
as one of finding the channel capacity C, which is the maximum communication rate
R = k/n in bits per channel transmission such that the probability of error can be made
arbitrarily small when the block length n is sufficiently large. He established a simple and
elegant characterization of the channel capacity C in terms of the maximum of the mutual
information I(X; Y) between the channel input X and output Y:
C = max_{p(x)} I(X; Y) bits/transmission.

(See Section . for the definition of mutual information and its properties.) Unlike the
graphical network case, however, capacity is achieved only asymptotically error-free and
using sophisticated coding.
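As a small numerical illustration of this maximization (a Python sketch; the binary symmetric channel with crossover probability 0.1 is an example of our choosing, not taken from this section), one can evaluate I(X; Y) over a grid of input pmfs:

```python
import numpy as np

def mutual_information(px, pyx):
    """I(X; Y) in bits for input pmf px and channel matrix pyx[x][y] = p(y|x)."""
    px = np.asarray(px, dtype=float)
    pyx = np.asarray(pyx, dtype=float)
    pxy = px[:, None] * pyx                 # joint pmf p(x, y)
    py = pxy.sum(axis=0)                    # output pmf p(y)
    mask = pxy > 0
    ratio = pxy[mask] / (px[:, None] * py[None, :])[mask]
    return float((pxy[mask] * np.log2(ratio)).sum())

# Binary symmetric channel with crossover probability 0.1.
p = 0.1
pyx = [[1 - p, p], [p, 1 - p]]

# Brute-force the maximization over input pmfs p(x) = (a, 1 - a).
grid = np.linspace(0.001, 0.999, 999)
C = max(mutual_information([a, 1 - a], pyx) for a in grid)
print(round(C, 3))  # 0.531 bits/transmission, attained by the uniform input
```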
Lossless source coding theorem. As a “dual” to channel coding, consider the following
lossless data compression setting. The sender wishes to communicate (store) a source
sequence losslessly to a receiver over a noiseless binary channel (memory) with the min-
imum number of bits. Suppose that the source U is discrete and memoryless, that is, it

generates an i.i.d. sequence U k . The sender encodes U k at rate R = n/k bits per source
symbol into an n-bit index M(U k ) and sends it over the channel. Upon receiving the index
M, the decoder finds an estimate Û k (M) of the source sequence such that the probability
of error P{Û k ̸= U k } is less than a prescribed value. Shannon again formulated the prob-
lem as one of finding the minimum lossless compression rate R∗ when the block length
is arbitrarily large, and showed that it is characterized by the entropy of U :

R∗ = H(U) bits/symbol.

(See Section . for the definition of entropy and its properties.)
Lossy source coding theorem. Now suppose U k is to be sent over the noiseless binary
channel such that the receiver can reconstruct it with some distortion instead of loss-
lessly. Shannon assumed the per-letter distortion (1/k) ∑_{i=1}^k E(d(U_i, Û_i)), where d(u, û)
is a measure of the distortion between the source symbol u and the reconstruction sym-
bol û . He characterized the rate–distortion function R(D), which is the optimal tradeoff
between the rate R = n/k and the desired distortion D, as the minimum of the mutual
information between U and Û :

R(D) = min_{p(û|u) : E(d(U, Û)) ≤ D} I(U; Û) bits/symbol.
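As a concrete instance (the standard example of a Bern(p) source with Hamming distortion, for which R(D) = H(p) − H(D) when 0 ≤ D ≤ min{p, 1 − p} and R(D) = 0 otherwise; the parameter values below are illustrative), the tradeoff can be evaluated directly:

```python
from math import log2

def Hb(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def R(p, D):
    """Rate-distortion function of a Bern(p) source under Hamming distortion."""
    return Hb(p) - Hb(D) if D < min(p, 1 - p) else 0.0

for D in (0.0, 0.05, 0.1, 0.2):
    print(D, round(R(0.2, D), 3))
# D = 0 recovers the lossless rate H(0.2) = 0.722; the required rate drops to 0 at D = 0.2
```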

Source–channel separation theorem. Now we return to the general point-to-point com-


munication system shown in Figure .. Let C be the capacity of the discrete memory-
less channel (DMC) and R(D) be the rate–distortion function of the discrete memoryless
source (DMS), and assume for simplicity that k = n. What is the necessary and sufficient
condition for communicating the DMS over the DMC with a prescribed distortion D?
Shannon () showed that R(D) ≤ C is necessary. Since R(D) < C is sufficient by the
lossy source coding and channel coding theorems, separate source coding and channel
coding achieves the fundamental limit. Although this result holds only when the code
block length is unbounded, it asserts that using bits as a “universal” interface between
sources and channels—the basis for digital communication—is essentially optimal.
We discuss the above results in detail in Chapter . Shannon’s asymptotic approach to
network performance analysis will be adopted throughout the book.

1.4 NETWORK INFORMATION THEORY

The max-flow min-cut theorem and Shannon’s point-to-point information theory have
had a major impact on communication and networking. However, the simplistic model
of a networked information processing system as a single source–destination pair com-
municating over a noisy channel or a graphical network does not capture many important
aspects of real-world networks:
∙ Networked systems have multiple sources and destinations.
∙ The task of the network is often to compute a function or to make a decision.

∙ Wireless communication uses a shared broadcast medium.


∙ Networked systems involve complex tradeoffs between competition for resources and
cooperation for the common good.
∙ Many networks allow for feedback and interactive communication.
∙ Source–channel separation does not hold for networks in general.
∙ Network security is often a primary concern.
∙ Data from the sources is often bursty and network topology evolves dynamically.
Network information theory aims to answer the aforementioned information flow ques-
tions while capturing some of these aspects of real-world networks. In the following, we
illustrate some of the achievements of this theory using examples from the book.

1.4.1 Multiple Sources and Destinations


Coding for networks with many sources and destinations requires techniques beyond
routing and point-to-point source/channel coding. Consider the following settings.
Graphical multicast network. Suppose we wish to send a movie over the Internet to mul-
tiple destinations (multicast). Unlike the unicast case, routing is not optimal in general
even if we model the Internet by a graphical network. Instead, we need to use coding of
incoming packets at the relay nodes.
We illustrate this fact via the famous "butterfly network" shown in Figure 1.4, where
source node 1 wishes to send a 2-bit message (M1 , M2 ) ∈ {0, 1}^2 to destination nodes 6
and 7. Assume link capacities C_jk = 1 for all edges (j, k). Note that using routing only,
both M1 and M2 must be sent over the edge (4, 5), and hence the message cannot be
communicated to both destination nodes.
However, if we allow the nodes to perform simple modulo- sum operations in ad-
dition to routing, the -bit message can be communicated to both destinations. As illus-
trated in Figure ., relay nodes , , and  forward multiple copies of their incoming bits,

Figure 1.4. Butterfly network. The 2-bit message (M1 , M2 ) cannot be sent using routing to both destination nodes 6 and 7.

and relay node  sends the modulo- sum of M1 and M2 . Using this simple scheme, both
destination nodes  and  can recover the message error-free.

Figure 1.5. The 2-bit message can be sent to destination nodes 6 and 7 using linear network coding.

In Chapter , we show that linear network coding, which is a generalization of this
simple scheme, achieves the capacity of an arbitrary graphical multicast network. Exten-
sions of this multicast setting to lossy source coding are discussed in Chapters  and .
Distributed compression. Suppose that a sensor network is used to measure the temper-
ature over a geographical area. The output from each sensor is compressed and sent to
a base station. Although compression is performed separately on each sensor output, it
turns out that using point-to-point compression is not optimal when the sensor outputs
are correlated, for example, because the sensors are located close to each other.
Consider the distributed lossless compression system depicted in Figure .. Two se-
quences X1n and X2n are drawn from correlated discrete memoryless sources (X1 , X2 ) ∼
p(x1 , x2 ) and compressed separately into an nR1 -bit index M1 and an nR2 -bit index M2 ,
respectively. A receiver (base station) wishes to recover the source sequences from the

index pair (M1 , M2 ). What is the minimum sum-rate Rsum , that is, the minimum over
R1 + R2 such that both sources can be reconstructed losslessly?
If each sender uses a point-to-point code, then by Shannon’s lossless source coding
theorem, the minimum lossless compression rates for the individual sources are R1∗ =
H(X1 ) and R2∗ = H(X2 ), respectively; hence the resulting sum-rate is H(X1 ) + H(X2 ).
If instead the two sources are jointly encoded, then again by the lossless source coding
theorem, the minimum lossless compression sum-rate is H(X1 , X2 ), which can be much
smaller than the sum of the individual entropies. For example, let X1 and X2 be binary-
valued sources with p X1 ,X2 (0, 0) = 0.495, p X1 ,X2 (0, 1) = 0.005, p X1 ,X2 (1, 0) = 0.005, and
p X1 ,X2 (1, 1) = 0.495; hence, the sources have the same outcome . of the time. From
the joint pmf, we see that X1 and X2 are both Bern(1/2) sources with entropy H(X1 ) =
H(X2 ) = 1 bit per symbol. By comparison, their joint entropy H(X1 , X2 ) = 1.0808 ≪ 2
bits per symbol pair.
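These entropies are easy to verify numerically; the sketch below recomputes H(X1), H(X2), and H(X1, X2) from the joint pmf given above.

```python
from math import log2

p = {(0, 0): 0.495, (0, 1): 0.005, (1, 0): 0.005, (1, 1): 0.495}  # joint pmf from the text

def H(pmf):
    return -sum(q * log2(q) for q in pmf if q > 0)

H1 = H([sum(v for (x1, _), v in p.items() if x1 == b) for b in (0, 1)])   # marginal of X1
H2 = H([sum(v for (_, x2), v in p.items() if x2 == b) for b in (0, 1)])   # marginal of X2
H12 = H(p.values())                                                       # joint entropy
print(H1, H2, round(H12, 4))  # 1.0  1.0  1.0808
```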

Slepian and Wolf (a) showed that Rsum = H(X1 , X2 ) and hence that the minimum

Figure 1.6. Distributed lossless compression system. Each source sequence X_j^n, j = 1, 2, is encoded into an index M_j(X_j^n) ∈ [1 : 2^{nR_j}), and the decoder wishes to reconstruct the sequences losslessly from (M1 , M2 ).

sum-rate for distributed compression is asymptotically the same as for centralized com-
pression! This result is discussed in Chapter . Generalizations to distributed lossy com-
pression are discussed in Chapters  and .
Communication for computing. Now suppose that the base station in the tempera-
ture sensor network wishes to compute the average temperature over the geographical
area instead of the individual temperature values. What is the amount of communication
needed?
While in some cases the rate requirement for computing a function of the sources is the
same as that for recovering the sources themselves, it is sometimes significantly smaller.
As an example, consider an n-round online game, where in each round Alice and Bob
each select one card without replacement from a virtual hat with three cards labeled 1, 2,
and 3. The one with the larger number wins. Let X n and Y n be the sequences of numbers
on Alice and Bob’s cards over the n rounds, respectively. Alice encodes her sequence X n
into an index M ∈ [1 : 2^{nR}] and sends it to Bob so that he can find out who won in each
round, that is, find an estimate Ẑ^n of the sequence Z_i = max{X_i , Y_i} for i ∈ [1 : n], as
shown in Figure 1.7. What is the minimum communication rate R needed?
By the aforementioned Slepian–Wolf result, the minimum rate needed for Bob to re-
construct X is the conditional entropy H(X|Y) = H(X, Y ) − H(Y) = 2/3 bit per round.
By exploiting the structure of the function Z = max{X, Y}, however, it can be shown that
only 0.5409 bit per round is needed.
This card game example as well as general results on communication for computing
are discussed in Chapter .

Figure 1.7. Online game setup. Alice has the card number sequence X^n and Bob has the card number sequence Y^n. Alice encodes her card number sequence into an index M ∈ [1 : 2^{nR}] and sends it to Bob, who wishes to losslessly reconstruct the winner sequence Z^n.

1.4.2 Wireless Networks


Perhaps the most important practical motivation for studying network information the-
ory is to deal with the special nature of wireless channels. We study models for wireless
communication throughout the book.
The first and simplest wireless channel model we consider is the point-to-point Gauss-
ian channel Y = дX + Z depicted in Figure ., where Z ∼ N(0, N0 /2) is the receiver noise
and д is the channel gain. Shannon showed that the capacity of this channel under a
prescribed average transmission power constraint P on X, i.e., ∑_{i=1}^n X_i^2 ≤ nP for each codeword X^n, has the simple characterization

C = (1/2) log(1 + S) = C(S),

where S = 2g^2 P/N0 is the received signal-to-noise ratio (SNR).
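For a feel for the numbers, here is a small sketch that evaluates S and C(S) for made-up values of the channel gain, power constraint, and noise level.

```python
from math import log2

# Illustrative (made-up) parameter values.
g, P, N0 = 0.1, 10.0, 0.01
S = 2 * g ** 2 * P / N0            # received SNR
C = 0.5 * log2(1 + S)              # capacity in bits per transmission
print(S, round(C, 2))              # 20.0 and about 2.2 bits/transmission
```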

Figure 1.8. Gaussian point-to-point channel.

A wireless network can be turned into a set of separate point-to-point Gaussian chan-
nels via time or frequency division. This traditional approach to wireless communication,
however, does not take full advantage of the broadcast nature of the wireless medium as
illustrated in the following example.
Gaussian broadcast channel. The downlink of a wireless system is modeled by the Gauss-
ian broadcast channel
Y1 = д1 X + Z1 ,
Y2 = д2 X + Z2 ,

as depicted in Figure .. Here Z1 ∼ N(0, N0 /2) and Z2 ∼ N(0, N0 /2) are the receiver noise
components, and g_1^2 > g_2^2, that is, the channel to receiver 1 is stronger than the channel to
receiver 2. Define the SNRs for receiver j = 1, 2 as S_j = 2g_j^2 P/N_0. Assume average power
constraint P on X.
The sender wishes to communicate a message M j at rate R j to receiver j for j = 1, 2.
What is the capacity region C of this channel, namely, the set of rate pairs (R1 , R2 ) such
that the probability of decoding error at both receivers can be made arbitrarily small as
the code block length becomes large?
If we send the messages M1 and M2 in different time intervals or frequency bands,
then we can reliably communicate at rate pairs in the “time-division region” R shown in
Figure .. Cover () showed that higher rates can be achieved by adding the code-
words for the two messages and sending this sum over the entire transmission block. The

stronger receiver 1 decodes for both codewords, while the weaker receiver 2 treats the
other codeword as noise and decodes only for its own codeword. Using this superposition
coding scheme, the sender can reliably communicate the messages at any rate pair in the
capacity region C shown in Figure .b, which is strictly larger than the time-division
region R.
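The gain of superposition coding over time division can be checked numerically. The sketch below evaluates the boundary of the capacity region, R1 ≤ C(αS1), R2 ≤ C((1 − α)S2/(αS2 + 1)) for α ∈ [0, 1], against simple time division; the SNR values are assumptions chosen only for illustration.

```python
import numpy as np

C = lambda s: 0.5 * np.log2(1 + s)

S1, S2 = 20.0, 5.0                 # assumed SNRs; receiver 1 is the stronger one

alpha = np.linspace(0, 1, 6)
R1_sc = C(alpha * S1)                              # superposition coding boundary
R2_sc = C((1 - alpha) * S2 / (alpha * S2 + 1))

beta = alpha                                       # time-sharing fractions
R1_td = beta * C(S1)                               # time-division inner bound
R2_td = (1 - beta) * C(S2)

for a, r1, r2, t1, t2 in zip(alpha, R1_sc, R2_sc, R1_td, R2_td):
    print(f"{a:.1f}: superposition ({r1:.2f}, {r2:.2f})  time division ({t1:.2f}, {t2:.2f})")
```

Sweeping α traces the capacity region boundary, which strictly contains the time-division region R shown in the figure below.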

Figure .. (a) Gaussian broadcast channel with SNRs S1 = 2g_1^2 P/N_0 > 2g_2^2 P/N_0 = S2. (b) The time-division inner bound R and the capacity region C.

This superposition scheme and related results are detailed in Chapter . Similar im-
provements in rates can be achieved for the uplink (multiple access channel) and the in-
tercell interference (interference channel), as discussed in Chapters  and , respectively.
Gaussian vector broadcast channel. Multiple transmitter and receiver antennas are com-
monly used to enhance the performance of wireless communication systems. Coding
for these multiple-input multiple-output (MIMO) channels, however, requires techniques
beyond single-antenna (scalar) channels. For example, consider the downlink of a MIMO
wireless system modeled by the Gaussian vector broadcast channel

Y1 = G1 X + Z1 ,
Y2 = G2 X + Z2 ,

where G1 , G2 are r-by-t channel gain matrices and Z1 ∼ N(0, Ir ) and Z2 ∼ N(0, Ir ) are
noise components. Assume average power constraint P on X. Note that unlike the single-
antenna broadcast channel shown in Figure ., in the vector case neither receiver is nec-
essarily stronger than the other. The optimum coding scheme is based on the following
writing on dirty paper result. Suppose we wish to communicate a message over a Gaussian
vector channel,
Y = GX + S + Z

where S ∼ N(0, KS ) is an interference signal, which is independent of the Gaussian noise



Z ∼ N(0, Ir ). Assume average power constraint P on X. When the interference sequence


Sn is available at the receiver, it can be simply subtracted from the received sequence and
hence the channel capacity is the same as when there is no interference. Now suppose that
the interference sequence is available only at the sender. Because of the power constraint,
it is not always possible to presubtract the interference from the transmitted codeword.
It turns out, however, that the effect of interference can still be completely canceled via
judicious precoding and hence the capacity is again the same as that with no interference!
This scheme is applied to the Gaussian vector broadcast channel as follows.
∙ To communicate the message M2 to receiver 2, consider the channel Y2 = G2 X2 + G2 X1 + Z2 with input X2, Gaussian interference G2 X1, and additive Gaussian noise Z2. Receiver 2 recovers M2 while treating the interference signal G2 X1 as part of the noise.
∙ To communicate the message M1 to receiver 1, consider the channel Y1 = G1 X1 + G1 X2 + Z1, with input X1, Gaussian interference G1 X2, and additive Gaussian noise Z1, where the interference sequence G1 X2^n(M2) is available at the sender. By the writing on dirty paper result, the transmission rate of M1 can be as high as that for the channel Y1′ = G1 X1 + Z1 without interference.
The writing on dirty paper result is discussed in detail in Chapter . The optimality of this
scheme for the Gaussian vector broadcast channel is established in Chapter .
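As a rough numerical illustration of this two-step scheme, the sketch below evaluates an achievable rate pair with the standard log-determinant expressions: receiver 2 treats G2 X1 as noise, and M1 is sent by writing on dirty paper against G1 X2, so it achieves the interference-free rate. The channel matrices and the equal power split are assumptions (and the split is not optimized).

```python
import numpy as np

def half_logdet_ratio(num, den):
    return 0.5 * np.log2(np.linalg.det(num) / np.linalg.det(den))

# Assumed 2x2 gain matrices and power budget.
G1 = np.array([[1.0, 0.3], [0.2, 0.8]])
G2 = np.array([[0.6, 0.5], [0.4, 1.1]])
P = 10.0
K1 = (P / 4) * np.eye(2)      # covariance of X1 (message M1)
K2 = (P / 4) * np.eye(2)      # covariance of X2 (message M2); tr(K1 + K2) <= P
I = np.eye(2)

# Receiver 2 decodes M2 treating G2 X1 as part of the noise.
R2 = half_logdet_ratio(I + G2 @ (K1 + K2) @ G2.T, I + G2 @ K1 @ G2.T)
# Dirty paper coding removes the known interference G1 X2 for message M1.
R1 = half_logdet_ratio(I + G1 @ K1 @ G1.T, I)

print(f"R1 = {R1:.3f}, R2 = {R2:.3f} bits/transmission")
```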
Gaussian relay channel. An ad-hoc or a mesh wireless network is modeled by a Gaussian
multihop network in which nodes can act as relays to help other nodes communicate their
messages. Again reducing such a network to a set of links using time or frequency division
does not take full advantage of the shared wireless medium, and the rate can be greatly
increased via node cooperation.
As a canonical example, consider the 3-node relay channel depicted in Figure .a. Here node 2 is located on the line between nodes 1 and 3 as shown in Figure .b. We assume that the channel gain from node k to node j is g_{jk} = r_{jk}^{−3/2}, where r_{jk} is the distance between nodes k and j. Hence g_{31} = r_{31}^{−3/2}, g_{21} = r_{21}^{−3/2}, and g_{32} = (r_{31} − r_{21})^{−3/2}. Assume average power constraint P on each of X1 and X2.
Suppose that sender node  wishes to communicate a message M to receiver node 
with the help of relay node . On the one extreme, the sender and the receiver can commu-
nicate directly without help from the relay. On the other extreme, we can use a multihop
scheme where the relay plays a pivotal role in the communication. In this commonly used
scheme, the sender transmits the message to the relay in the first hop and the relay recov-
ers the message and transmits it to the receiver concurrently in the second hop, causing
interference to the first-hop communication. If the receiver is far away from the sender,
that is, the distance r31 is large, this scheme performs well because the interference due to
the concurrent transmission is weak. However, when r31 is not large, the interference can
adversely affect the communication of the message.
In Chapter , we present several coding schemes that outperform both direct trans-
mission and multihop.

Figure .. (a) Gaussian relay channel. (b) Node placements: relay node 2 is placed on the line between sender node 1 and receiver node 3.

∙ Decode–forward. The direct transmission and multihop schemes are combined and
further enhanced via coherent transmission by the sender and the relay. The receiver
decodes for the signals from both hops instead of treating the transmission from the
first hop as interference. Decode–forward performs well when the relay is closer to
the sender, i.e., r21 < (1/2)r31 .
∙ Compress–forward. As an alternative to the “digital-to-digital” relay interface used
in multihop and decode–forward, the compress–forward scheme uses an “analog-to-
digital” interface in which the relay compresses the received signal and sends the com-
pression index to the receiver. Compress–forward performs well when the relay is
closer to the receiver.
∙ Amplify–forward. Decode–forward and compress–forward require sophisticated op-
erations at the nodes. The amplify–forward scheme provides a much simpler “analog-
to-analog” interface in which the relay scales the incoming signal and transmits it
to the receiver. In spite of its simplicity, amplify–forward can outperform decode–
forward when the relay is closer to the receiver.

The performance of the above relaying schemes is compared in Figure .. In general,
it can be shown that both decode–forward and compress–forward achieve rates within
1/2 bit of the capacity, while amplify–forward achieves rates within 1 bit of the capacity.
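Two of these curves are easy to reproduce. Under the parameters used in the figure below (N0/2 = 1, r31 = 1, P = 10), the following sketch evaluates the direct-transmission rate C(S31) and the standard full-duplex decode–forward lower bound max_{0≤ρ≤1} min{C((1 − ρ²)S21), C(S31 + S32 + 2ρ√(S31 S32))}; the remaining curves use the multihop, compress–forward, and amplify–forward expressions developed in later chapters.

```python
import numpy as np

C = lambda s: 0.5 * np.log2(1 + s)
r31, P = 1.0, 10.0                      # N0/2 = 1, so S_jk = g_jk^2 * P

def rates(r21):
    g21, g31, g32 = r21**-1.5, r31**-1.5, (r31 - r21)**-1.5
    S21, S31, S32 = g21**2 * P, g31**2 * P, g32**2 * P
    R_dt = C(S31)                       # direct transmission
    rho = np.linspace(0, 1, 1001)       # coherent combining parameter
    R_df = np.max(np.minimum(C((1 - rho**2) * S21),
                             C(S31 + S32 + 2 * rho * np.sqrt(S31 * S32))))
    return R_dt, R_df

for r21 in (0.2, 0.4, 0.6, 0.8):
    R_dt, R_df = rates(r21)
    print(f"r21 = {r21:.1f}: direct {R_dt:.2f}, decode-forward {R_df:.2f}")
```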
We extend the above coding schemes to general multihop networks in Chapters 
and . In particular, we show that extending compress–forward leads to a noisy network
coding scheme that includes network coding for graphical multicast networks as a special
case. When applied to Gaussian multihop multicast networks, this noisy network coding
scheme achieves within a constant gap of the capacity independent of network topology,

Figure .. Comparison of the achievable rates for the Gaussian relay channel using direct transmission (R_DT), multihop (R_MH), decode–forward (R_DF), compress–forward (R_CF), and amplify–forward (R_AF), plotted against the sender–relay distance r21, for N0/2 = 1, r31 = 1, and P = 10.

channel parameters, and power constraints, while extensions of the other schemes do not
yield such performance guarantees.
To study the effect of interference and path loss in large wireless networks, in Chap-
ter  we also investigate how capacity scales with the network size. We show that relaying
and spatial reuse of frequency/time can greatly increase the rates over naive direct trans-
mission with time division.
Wireless fading channels. Wireless channels are time varying due to scattering of signals
over multiple paths and user mobility. In Chapter , we study fading channel models
that capture these effects by allowing the gains in the Gaussian channels to vary randomly
with time. In some settings, channel capacity in the Shannon sense is not well defined.
We introduce different coding approaches and corresponding performance metrics that
are useful in practice.

1.4.3 Interactive Communication


Real-world networks allow for feedback and node interactions. Shannon () showed
that feedback does not increase the capacity of a memoryless channel. Feedback, how-
ever, can help simplify coding and improve reliability. This is illustrated in the following
example.
Binary erasure channel with feedback. The binary erasure channel is a DMC with binary
input X ∈ {0, 1} and ternary output Y ∈ {0, 1, e}. Each transmitted bit (0 or 1) is erased
(Y = e) with probability p. The capacity of this channel is 1 − p, and achieving it requires


sophisticated block coding. Now suppose that noiseless causal feedback from the receiver
to the sender is present, that is, the sender at each time i has access to all previous received
symbols Y i−1 . Then we can achieve the capacity simply by retransmitting each erased
bit. Using this simple feedback scheme, on average n = k/(1 − p) transmissions suffice to
reliably communicate k bits of information.
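The k/(1 − p) figure is easy to verify by simulation. The following sketch sends k bits over a BEC(p) with noiseless feedback, retransmitting each erased bit until it gets through, and compares the average number of channel uses with k/(1 − p); the parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
p, k, trials = 0.3, 1000, 200

uses = []
for _ in range(trials):
    n, remaining = 0, k
    while remaining > 0:
        n += remaining
        # Each transmitted bit is erased independently with probability p;
        # feedback tells the sender exactly which bits to resend.
        remaining = rng.binomial(remaining, p)
    uses.append(n)

print(np.mean(uses), k / (1 - p))      # both close to 1428.6
```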
Unlike point-to-point communication, feedback can achieve higher rates in networks
with multiple senders/receivers.
Binary erasure multiple access channel with feedback. Consider the multiple access
channel (MAC) with feedback depicted in Figure ., where the channel inputs X1 and
X2 are binary and the channel output Y = X1 + X2 is ternary, i.e., Y ∈ {0, 1, 2}. Suppose
that senders  and  wish to communicate independent messages M1 and M2 , respectively,
to the receiver at the same rate R. Without feedback, the symmetric capacity, which is the
maximum achievable rate R, is (1/2) max_{p(x1)p(x2)} H(Y) = 3/4 bit/transmission.
Noiseless causal feedback allows the senders to cooperate in communicating their mes-
sages and hence to achieve higher symmetric rates than with no feedback. To illustrate
such cooperation, suppose that each sender first transmits its k-bit message uncoded.
On average k/2 bits are “erased” (that is, Y = 0 + 1 = 1 + 0 = 1 is received). Since the
senders know through feedback the exact locations of the erasures as well as the corre-
sponding message bits from both messages, they can cooperate to send the erased bits
from the first message (which is sufficient to recover both messages). This cooperative
retransmission requires k/(2 log 3) transmissions. Hence we can increase the symmetric
rate to R = k/(k + k/(2 log 3)) = 0.7602. This rate can be further increased to 0.7911 by
using a more sophisticated coding scheme that sends new messages simultaneously with
cooperative retransmissions.
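Both rates quoted above follow from simple counting, which the following sketch verifies: without feedback the symmetric capacity is H(1/4, 1/2, 1/4)/2 = 3/4 bit, and the uncoded transmission followed by cooperative retransmission achieves 1/(1 + 1/(2 log2 3)) ≈ 0.7602 bit per transmission.

```python
from math import log2

def H(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# No feedback: Y = X1 + X2 with independent Bern(1/2) inputs.
sym_no_feedback = H([0.25, 0.5, 0.25]) / 2
# Feedback: k uncoded transmissions plus k/(2 log2 3) cooperative retransmissions.
sym_feedback = 1 / (1 + 1 / (2 * log2(3)))

print(sym_no_feedback, sym_feedback)   # 0.75 and ~0.7602
```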
In Chapter , we discuss the iterative refinement approach illustrated in the binary
erasure channel example; the cooperative feedback approach for multiuser channels il-
lustrated in the binary erasure MAC example; and the two-way channel. In Chapters 

Figure .. Feedback communication over a binary erasure MAC. The channel
inputs X1i and X2i at time i ∈ [1 : n] are functions of (M1 , Y i−1 ) and (M2 , Y i−1 ),
respectively.

through , we show that interaction can also help in distributed compression, distributed
computing, and secret communication.

1.4.4 Joint Source–Channel Coding


As we mentioned earlier, Shannon showed that separate source and channel coding is
asymptotically optimal for point-to-point communication. It turns out that such sepa-
ration does not hold in general for sending correlated sources over multiuser networks.
In Chapter , we demonstrate this breakdown of separation for lossless communication
of correlated sources over multiple access and broadcast channels. This discussion yields
natural definitions of various notions of common information between two sources.

1.4.5 Secure Communication


Confidentiality of information is a crucial requirement in networking applications such
as e-commerce. In Chapter , we discuss several coding schemes that allow a legitimate
sender (Alice) to communicate a message reliably to a receiver (Bob) while keeping it
secret (in a strong sense) from an eavesdropper (Eve). When the channel from Alice
to Bob is stronger than that to Eve, a confidential message with a positive rate can be
communicated reliably without a shared secret key between Alice and Bob. By contrast,
when the channel from Alice to Bob is weaker than that to Eve, no confidential message
can be communicated reliably. We show, however, that Alice and Bob can still agree on a
secret key through interactive communication over a public (nonsecure) channel that Eve
has complete access to. This key can then be used to communicate a confidential message
at a nonzero rate.

1.4.6 Network Information Theory and Networking


Many aspects of real-world networks such as bursty data arrivals, random access, asyn-
chrony, and delay constraints are not captured by the standard models of network infor-
mation theory. In Chapter , we present several examples for which such networking
issues have been successfully incorporated into the theory. We present a simple model for
random medium access control (used for example in the ALOHA network) and show that
a higher throughput can be achieved using a broadcasting approach instead of encoding
the packets at a fixed rate. In another example, we establish the capacity region of the
asynchronous multiple access channel.

1.4.7 Toward a Unified Network Information Theory


The above ideas and results illustrate some of the key ingredients of network information
theory. The book studies this fascinating subject in a systematic manner, with the ultimate
goal of developing a unified theory. We begin our journey with a review of Shannon’s
point-to-point information theory in the next two chapters.
PART I

PRELIMINARIES
CHAPTER 2

Information Measures and Typicality

We define entropy and mutual information and review their basic properties. We intro-
duce basic inequalities involving these information measures, including Fano’s inequality,
Mrs. Gerber’s lemma, the maximum differential entropy lemma, the entropy power in-
equality, the data processing inequality, and the Csiszár sum identity. We then introduce
the notion of typicality adopted throughout the book. We discuss properties of typical se-
quences and introduce the typical average lemma, the conditional typicality lemma, and
the joint typicality lemma. These lemmas as well as the aforementioned entropy and mu-
tual information inequalities will play pivotal roles in the proofs of the coding theorems
throughout the book.

2.1 ENTROPY

Let X be a discrete random variable with probability mass function (pmf) p(x) (in short
X ∼ p(x)). The uncertainty about the outcome of X is measured by its entropy defined as

H(X) = − ∑_{x∈X} p(x) log p(x) = − E_X(log p(X)).

For example, if X is a Bernoulli random variable with parameter p = P{X = 1} ∈ [0, 1] (in
short X ∼ Bern(p)), the entropy of X is

H(X) = −p log p − (1 − p) log(1 − p).

Since the Bernoulli random variable will be frequently encountered, we denote its entropy
by the binary entropy function H(p). The entropy function H(X) is a nonnegative and
concave function in p(x). Thus, by Jensen’s inequality (see Appendix B),

H(X) ≤ log |X |,

that is, the uniform pmf over X maximizes the entropy.


Let X be a discrete random variable and д(X) be a function of X. Then

H(д(X)) ≤ H(X),

where the inequality holds with equality if д is one-to-one over the support of X, i.e., the
set {x ∈ X : p(x) > 0}.

Conditional entropy. Let X ∼ F(x) be an arbitrary random variable and Y | {X = x} ∼


p(y|x) be discrete for every x. Since p(y|x) is a pmf, we can define the entropy function
H(Y | X = x) for every x. The conditional entropy (or equivocation) H(Y |X) of Y given X
is the average of H(Y | X = x) over X, i.e.,

H(Y | X) = ∫ H(Y | X = x) dF(x) = − E_{X,Y}(log p(Y | X)).

Conditional entropy is a measure of the remaining uncertainty about the outcome of Y


given the “observation” X. Again by Jensen’s inequality,

H(Y | X) ≤ H(Y ) (.)

with equality if X and Y are independent.


Joint entropy. Let (X, Y) ∼ p(x, y) be a pair of discrete random variables. The joint en-
tropy of X and Y is defined as

H(X, Y) = − E(log p(X, Y)).

Note that this is the same as the entropy of a single “large" random variable (X, Y). The
chain rule for pmfs, p(x, y) = p(x)p(y|x) = p(y)p(x|y), leads to a chain rule for joint
entropy
H(X, Y ) = H(X) + H(Y | X) = H(Y) + H(X |Y ).

By (.), it follows that


H(X, Y) ≤ H(X) + H(Y) (.)

with equality if X and Y are independent.


The definition of entropy extends to discrete random vectors. Let X n ∼ p(x n ). Then
again by the chain rule for pmfs,

H(X^n) = H(X1) + H(X2 | X1) + ⋅⋅⋅ + H(Xn | X1, . . . , Xn−1)
       = ∑_{i=1}^n H(Xi | X1, . . . , Xi−1)
       = ∑_{i=1}^n H(Xi | X^{i−1}).

Using induction and inequality (.), it follows that H(X^n) ≤ ∑_{i=1}^n H(Xi) with equality if
X1 , X2 , . . . , Xn are mutually independent.
Next, we consider the following two results that will be used in the converse proofs of
many coding theorems. The first result relates equivocation to the “probability of error.”

Fano’s Inequality. Let (X, Y) ∼ p(x, y) and Pe = P{X ̸= Y}. Then

H(X |Y) ≤ H(Pe ) + Pe log |X | ≤ 1 + Pe log |X |.
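As a quick numerical sanity check of Fano's inequality (an illustrative sketch, not part of the development), the following code draws a random joint pmf on a small alphabet and compares H(X|Y) with H(Pe) + Pe log|X|.

```python
import numpy as np

rng = np.random.default_rng(1)

def H(q):
    q = q[q > 0]
    return -np.sum(q * np.log2(q))

p = rng.random((3, 3))                  # random joint pmf on X = Y = {0, 1, 2}
p /= p.sum()

H_X_given_Y = sum(p[:, y].sum() * H(p[:, y] / p[:, y].sum()) for y in range(3))
Pe = 1 - np.trace(p)                    # P{X != Y}
bound = H(np.array([Pe, 1 - Pe])) + Pe * np.log2(3)

print(H_X_given_Y, bound)               # H(X|Y) <= H(Pe) + Pe log|X|
```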

The second result provides a lower bound on the entropy of the modulo- sum of two
binary random vectors.

Mrs. Gerber's Lemma (MGL). Let H^{−1} : [0, 1] → [0, 1/2] be the inverse of the binary entropy function, i.e., H(H^{−1}(v)) = v.
∙ Scalar MGL: Let X be a binary random variable and let U be an arbitrary random variable. If Z ∼ Bern(p) is independent of (X, U) and Y = X ⊕ Z, then

H(Y | U) ≥ H(H^{−1}(H(X | U)) ∗ p).

∙ Vector MGL: Let X^n be a binary random vector and U be an arbitrary random variable. If Z^n is a vector of independent and identically distributed Bern(p) random variables independent of (X^n, U) and Y^n = X^n ⊕ Z^n, then

H(Y^n | U)/n ≥ H(H^{−1}(H(X^n | U)/n) ∗ p).

The proof of this lemma follows by the convexity of the function H(H^{−1}(v) ∗ p) in v
and using induction; see Problem ..
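The scalar MGL can also be spot-checked numerically. The sketch below fixes an arbitrary joint pmf of (U, X) with binary X and a crossover probability p, and compares H(Y|U) with H(H^{−1}(H(X|U)) ∗ p), where a ∗ b = a(1 − b) + b(1 − a); all parameter values are arbitrary assumptions.

```python
from math import log2

def h(a):
    return 0.0 if a in (0.0, 1.0) else -a*log2(a) - (1 - a)*log2(1 - a)

def h_inv(v):
    lo, hi = 0.0, 0.5                  # bisection on the increasing branch of H
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if h(mid) < v else (lo, mid)
    return (lo + hi) / 2

def star(a, b):
    return a*(1 - b) + b*(1 - a)

p = 0.2                                # Z ~ Bern(p)
p_u = [0.3, 0.7]                       # assumed pmf of U
px1_u = [0.1, 0.45]                    # assumed P{X = 1 | U = u}

H_X_U = sum(pu * h(q) for pu, q in zip(p_u, px1_u))
# Y = X xor Z, so P{Y = 1 | U = u} = P{X = 1 | U = u} * p.
H_Y_U = sum(pu * h(star(q, p)) for pu, q in zip(p_u, px1_u))

print(H_Y_U, h(star(h_inv(H_X_U), p)), H_Y_U >= h(star(h_inv(H_X_U), p)))
```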
Entropy rate of a stationary random process. Let X = {Xi } be a stationary random pro-
cess with Xi taking values in a finite alphabet X . The entropy rate H(X) of the process X
is defined as
H(X) = lim_{n→∞} (1/n) H(X^n) = lim_{n→∞} H(Xn | X^{n−1}).

2.2 DIFFERENTIAL ENTROPY

Let X be a continuous random variable with probability density function (pdf) f (x) (in
short X ∼ f (x)). The differential entropy of X is defined as

h(X) = − ∫ f(x) log f(x) dx = − E_X(log f(X)).

For example, if X ∼ Unif[a, b], then

h(X) = log(b − a).

As another example, if X ∼ N(μ, σ 2 ), then


h(X) = (1/2) log(2πeσ²).

The differential entropy h(X) is a concave function of f (x). However, unlike entropy
it is not always nonnegative and hence should not be interpreted directly as a measure
of information. Roughly speaking, h(X) + n is the entropy of the quantized version of X
using equal-size intervals of length 2−n (Cover and Thomas , Section .).
The differential entropy is invariant under translation but not under scaling.
∙ Translation: For any constant a, h(X + a) = h(X).
∙ Scaling: For any nonzero constant a, h(aX) = h(X) + log |a|.
The maximum differential entropy of a continuous random variable X ∼ f (x) under
the average power constraint E(X 2 ) ≤ P is
max_{f(x): E(X²)≤P} h(X) = (1/2) log(2πeP)

and is attained when X is Gaussian with zero mean and variance P, i.e., X ∼ N(0, P); see
Remark . and Problem .. Thus, for any X ∼ f (x),
h(X) = h(X − E(X)) ≤ (1/2) log(2πe Var(X)). (.)
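The maximum-entropy property can be seen numerically by comparing distributions with a common variance; the sketch below (an illustrative check, not part of the text) contrasts a uniform and a Gaussian random variable.

```python
import numpy as np

P = 1.0                                    # common variance

a = np.sqrt(3 * P)                         # Unif[-a, a] has variance a^2/3 = P
h_uniform = np.log2(2 * a)                 # log(b - a) in bits

h_gauss = 0.5 * np.log2(2 * np.pi * np.e * P)

print(h_uniform, h_gauss)                  # ~1.79 < ~2.05 bits
```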

Conditional differential entropy. Let X ∼ F(x) be an arbitrary random variable and


Y | {X = x} ∼ f (y|x) be continuous for every x. The conditional differential entropy h(Y|X)
of Y given X is defined as

h(Y | X) = ∫ h(Y | X = x) dF(x) = − E_{X,Y}(log f(Y | X)).

As for the discrete case in (.), conditioning reduces entropy, i.e.,

h(Y | X) ≤ h(Y ) (.)

with equality if X and Y are independent.


We will often be interested in the sum of two random variables Y = X + Z, where X is
an arbitrary random variable and Z is an independent continuous random variable with
bounded pdf f (z), for example, a Gaussian random variable. It can be shown in this case
that the sum Y is a continuous random variable with well-defined density.
Joint differential entropy. The definition of differential entropy can be extended to a
continuous random vector X n with joint pdf f (x n ) as

h(X^n) = − E(log f(X^n)).

For example, if X n is a Gaussian random vector with mean μ and covariance matrix K,
i.e., X n ∼ N(μ, K), then
h(X^n) = (1/2) log((2πe)^n |K|).

By the chain rule for pdfs and (.), we have


h(X^n) = ∑_{i=1}^n h(Xi | X^{i−1}) ≤ ∑_{i=1}^n h(Xi) (.)

with equality if X1 , . . . , Xn are mutually independent. The translation and scaling prop-
erties of differential entropy continue to hold for the vector case.
∙ Translation: For any real-valued vector an , h(X n + an ) = h(X n ).
∙ Scaling: For any real-valued nonsingular n × n matrix A,

h(AX n ) = h(X n ) + log |det(A)| .

The following lemma will be used in the converse proofs of Gaussian source and chan-
nel coding theorems.

Maximum Differential Entropy Lemma. Let X ∼ f(x^n) be a random vector with covariance matrix K_X = E[(X − E(X))(X − E(X))^T] ≻ 0. Then

h(X) ≤ (1/2) log((2πe)^n |K_X|) ≤ (1/2) log((2πe)^n |E(XX^T)|), (.)

where E(XX^T) is the correlation matrix of X^n. The first inequality holds with equality if and only if X is Gaussian, and the second inequality holds with equality if and only if E(X) = 0. More generally, if (X, Y) = (X^n, Y^k) ∼ f(x^n, y^k) is a pair of random vectors and K_{X|Y} = E[(X − E(X|Y))(X − E(X|Y))^T] is the covariance matrix of the error vector of the minimum mean squared error (MMSE) estimate of X given Y, then

h(X|Y) ≤ (1/2) log((2πe)^n |K_{X|Y}|) (.)

with equality if (X, Y) is jointly Gaussian.

The proof of the upper bound in (.) is similar to the proof for the scalar case in (.);
see Problem .. The upper bound in (.) follows by applying (.) to h(X|Y = y) for
each y and Jensen’s inequality using the concavity of log |K| in K. The upper bound on
differential entropy in (.) can be further relaxed to
h(X^n) ≤ (n/2) log(2πe (1/n) ∑_{i=1}^n Var(Xi)) ≤ (n/2) log(2πe (1/n) ∑_{i=1}^n E(Xi²)). (.)

These inequalities can be proved using Hadamard’s inequality or more directly using (.),
(.), and Jensen's inequality.
The quantity 2^{2h(X^n)/n}/(2πe) is often referred to as the entropy power of the random vector X^n. The inequality in (.) shows that the entropy power is upper bounded by the average power. The following inequality shows that the entropy power of the sum of two

independent random vectors is lower bounded by the sum of their entropy powers. In a
sense, this inequality is the continuous analogue of Mrs. Gerber’s lemma.

Entropy Power Inequality (EPI).


∙ Scalar EPI: Let X ∼ f(x) and Z ∼ f(z) be independent random variables and Y = X + Z. Then

2^{2h(Y)} ≥ 2^{2h(X)} + 2^{2h(Z)}

with equality if both X and Z are Gaussian.
∙ Vector EPI: Let X^n ∼ f(x^n) and Z^n ∼ f(z^n) be independent random vectors and Y^n = X^n + Z^n. Then

2^{2h(Y^n)/n} ≥ 2^{2h(X^n)/n} + 2^{2h(Z^n)/n}

with equality if X^n and Z^n are Gaussian with K_X = aK_Z for some scalar a > 0.
∙ Conditional EPI: Let X^n and Z^n be conditionally independent given an arbitrary random variable U, with conditional pdfs f(x^n|u) and f(z^n|u), and Y^n = X^n + Z^n. Then

2^{2h(Y^n|U)/n} ≥ 2^{2h(X^n|U)/n} + 2^{2h(Z^n|U)/n}.

The scalar EPI can be proved, for example, using a sharp version of Young’s inequality
or de Bruijn’s identity; see Bibliographic Notes. The proofs of the vector and conditional
EPIs follow by the scalar EPI, the convexity of the function log(2^v + 2^w) in (v, w), and
induction.
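For Gaussian random variables the scalar EPI is tight, since each entropy power equals the corresponding variance; for most other distributions it is strict. The sketch below checks both cases using closed-form differential entropies (the uniform/triangular entropy is a standard formula; the variances are arbitrary).

```python
import numpy as np

two_pi_e = 2 * np.pi * np.e
h_gauss = lambda var: 0.5 * np.log2(two_pi_e * var)        # bits

# Gaussian case: entropy powers add up exactly.
P, N = 3.0, 2.0
lhs = 2 ** (2 * h_gauss(P + N))
rhs = 2 ** (2 * h_gauss(P)) + 2 ** (2 * h_gauss(N))
print(lhs / two_pi_e, rhs / two_pi_e)                      # both equal P + N = 5

# Uniform case: X, Z ~ Unif[0, 1], so h(X) = h(Z) = 0 and Y = X + Z is
# triangular on [0, 2] with h(Y) = 1/(2 ln 2) bits; the EPI is strict.
h_y = 0.5 / np.log(2)
print(2 ** (2 * h_y), 2 ** 0 + 2 ** 0)                     # ~2.72 > 2
```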
Differential entropy rate of a stationary random process. Let X = {Xi } be a stationary
continuous-valued random process. The differential entropy rate h(X) of the process X is
defined as
h(X) = lim_{n→∞} (1/n) h(X^n) = lim_{n→∞} h(Xn | X^{n−1}).

2.3 MUTUAL INFORMATION

Let (X, Y ) ∼ p(x, y) be a pair of discrete random variables. The information about X
obtained from the observation Y is measured by the mutual information between X and
Y defined as
I(X; Y) = ∑_{(x,y)∈X×Y} p(x, y) log [p(x, y)/(p(x)p(y))]
        = H(X) − H(X | Y)
        = H(Y) − H(Y | X)
        = H(X) + H(Y) − H(X, Y).

The mutual information I(X; Y) is a nonnegative function of p(x, y), and I(X; Y) = 0 if
and only if (iff) X and Y are independent. It is concave in p(x) for a fixed p(y|x), and
convex in p(y|x) for a fixed p(x). Mutual information can be defined also for a pair of
continuous random variables (X, Y ) ∼ f (x, y) as
I(X; Y) = ∫ f(x, y) log [f(x, y)/(f(x)f(y))] dx dy
        = h(X) − h(X | Y)
        = h(Y) − h(Y | X)
        = h(X) + h(Y) − h(X, Y).

Similarly, let X ∼ p(x) be a discrete random variable and Y | {X = x} ∼ f (y|x) be contin-


uous for every x. Then

I(X; Y ) = h(Y) − h(Y | X) = H(X) − H(X |Y).


In general, mutual information can be defined for an arbitrary pair of random variables
(Pinsker ) as
I(X; Y) = ∫ log [dμ(x, y)/d(μ(x) × μ(y))] dμ(x, y),

where dμ(x, y)/d(μ(x) × μ(y)) is the Radon–Nikodym derivative (see, for example, Royden ) of the joint probability measure μ(x, y) with respect to the product probability measure μ(x) × μ(y). Equivalently, it can be expressed as

I(X; Y) = sup_{x̂, ŷ} I(x̂(X); ŷ(Y)),

where x̂(x) and ŷ(y) are finite-valued functions, and the supremum is over all such functions. These definitions can be shown to include the above definitions of mutual information for discrete and continuous random variables as special cases (Gray , Section ) by considering

I(X; Y) = lim_{j,k→∞} I([X]_j; [Y]_k),

where [X] j = x̂ j (X) and [Y]k = ŷk (Y ) can be any sequences of finite quantizations of X
and Y , respectively, such that the quantization errors (x − x̂ j (x)) and (y − ŷk (y)) tend to
zero as j, k → ∞ for every x, y.
Remark . (Relative entropy). Mutual information is a special case of the relative en-
tropy (Kullback–Leibler divergence). Let P and Q be two probability measures such that P
is absolutely continuous with respect to Q, then the relative entropy is defined as
D(P‖Q) = ∫ log(dP/dQ) dP,
where dP/dQ is the Radon–Nikodym derivative. Thus, mutual information I(X; Y) is
the relative entropy between the joint and product measures of X and Y. Note that by the
convexity of log(1/x), D(P||Q) is nonnegative and D(P||Q) = 0 iff P = Q.

Conditional mutual information. Let (X, Y) | {Z = z} ∼ F(x, y|z) and Z ∼ F(z). De-
note the mutual information between X and Y given {Z = z} by I(X; Y |Z = z). Then the
conditional mutual information I(X; Y|Z) between X and Y given Z is defined as

I(X; Y | Z) = ∫ I(X; Y | Z = z) dF(z).

For (X, Y , Z) ∼ p(x, y, z),

I(X; Y |Z) = H(X |Z) − H(X |Y , Z)


= H(Y |Z) − H(Y | X, Z)
= H(X |Z) + H(Y |Z) − H(X, Y |Z).

The conditional mutual information I(X; Y|Z) is nonnegative and is equal to zero iff X
and Y are conditionally independent given Z, i.e., X → Z → Y form a Markov chain.
Note that unlike entropy, no general inequality relationship exists between the conditional
mutual information I(X; Y|Z) and the mutual information I(X; Y ). There are, however,
two important special cases.
∙ Independence: If p(x, y, z) = p(x)p(z)p(y|x, z), that is, if X and Z are independent,
then
I(X; Y | Z) ≥ I(X; Y ).

This follows by the convexity of I(X; Y) in p(y|x) for a fixed p(x).


∙ Conditional independence: If Z → X → Y form a Markov chain, then

I(X; Y |Z) ≤ I(X; Y ).

This follows by the concavity of I(X; Y) in p(x) for a fixed p(y|x).


The definition of mutual information can be extended to random vectors in a straight-
forward manner. In particular, we can establish the following useful identity.

Chain Rule for Mutual Information. Let (X n , Y ) ∼ F(x n , y). Then


I(X^n; Y) = ∑_{i=1}^n I(Xi; Y | X^{i−1}).

The following inequality shows that processing cannot increase information.

Data Processing Inequality. If X → Y → Z form a Markov chain, then

I(X; Z) ≤ I(X; Y).



Consequently, for any function д, I(X; д(Y)) ≤ I(X; Y), which implies the inequal-
ity in (.). To prove the data processing inequality, we use the chain rule to expand
I(X; Y , Z) in two ways as

I(X; Y , Z) = I(X; Y ) + I(X; Z |Y) = I(X; Y )


= I(X; Z) + I(X; Y |Z) ≥ I(X; Z).

The chain rule can be also used to establish the following identity, which will be used in
several converse proofs.

Csiszár Sum Identity. Let (U , X n , Y n ) ∼ F(u, x n , yn ). Then


∑_{i=1}^n I(X_{i+1}^n; Yi | Y^{i−1}, U) = ∑_{i=1}^n I(Y^{i−1}; Xi | X_{i+1}^n, U),

where X_{n+1}^n, Y^0 = ∅.

2.4 TYPICAL SEQUENCES

Let x n be a sequence with elements drawn from a finite alphabet X . Define the empirical
pmf of x n (also referred to as its type) as

π(x | x^n) = |{i : xi = x}| / n for x ∈ X.

For example, if x^n = (0, 1, 1, 0, 0, 1, 0), then π(x | x^n) = 4/7 for x = 0 and 3/7 for x = 1.

Let X1 , X2 , . . . be a sequence of independent and identically distributed (i.i.d.) random


variables with Xi ∼ p X (xi ). Then by the (weak) law of large numbers (LLN), for each
x ∈ X,
π(x | X n ) → p(x) in probability.

Thus, with high probability, the random empirical pmf π(x|X n ) does not deviate much
from the true pmf p(x). For X ∼ p(x) and є ∈ (0, 1), define the set of є-typical n-sequences
x n (or the typical set in short) as

Tє(n)(X) = {x^n : |π(x | x^n) − p(x)| ≤ єp(x) for all x ∈ X}.

When it is clear from the context, we will use Tє(n) instead of Tє(n) (X). The following
simple fact is a direct consequence of the definition of the typical set.

Typical Average Lemma. Let x^n ∈ Tє(n)(X). Then for any nonnegative function g(x) on X,

(1 − є) E(g(X)) ≤ (1/n) ∑_{i=1}^n g(xi) ≤ (1 + є) E(g(X)).

Typical sequences satisfy the following properties:


. Let p(x n ) = ∏ni=1 p X (xi ). Then for each x n ∈ Tє(n) (X),

2^{−n(H(X)+δ(є))} ≤ p(x^n) ≤ 2^{−n(H(X)−δ(є))},

where δ(є) = єH(X). This follows by the typical average lemma with д(x) = − log p(x).
. The cardinality of the typical set is upper bounded as

|Tє(n)(X)| ≤ 2^{n(H(X)+δ(є))}.

This can be shown by summing the lower bound in property  over the typical set.
. If X1 , X2 , . . . are i.i.d. with Xi ∼ p X (xi ), then by the LLN,

lim_{n→∞} P{X^n ∈ Tє(n)(X)} = 1.

. The cardinality of the typical set is lower bounded as

|Tє(n)(X)| ≥ (1 − є)2^{n(H(X)−δ(є))}

for n sufficiently large. This follows by property  and the upper bound in property .
The above properties are illustrated in Figure ..


Figure .. Properties of typical sequences. Here X n ∼ ∏ni=1 p X (xi ).



2.5 JOINTLY TYPICAL SEQUENCES

The notion of typicality can be extended to multiple random variables. Let (x n , yn ) be a


pair of sequences with elements drawn from a pair of finite alphabets (X , Y). Define their
joint empirical pmf (joint type) as

π(x, y | x^n, y^n) = |{i : (xi, yi) = (x, y)}| / n for (x, y) ∈ X × Y.
Let (X, Y ) ∼ p(x, y). The set of jointly є-typical n-sequences is defined as

Tє(n)(X, Y) = {(x^n, y^n) : |π(x, y | x^n, y^n) − p(x, y)| ≤ єp(x, y) for all (x, y) ∈ X × Y}.

Note that this is the same as the typical set for a single “large” random variable (X, Y), i.e., Tє(n)(X, Y) = Tє(n)((X, Y)). Also define the set of conditionally є-typical n-sequences as Tє(n)(Y | x^n) = {y^n : (x^n, y^n) ∈ Tє(n)(X, Y)}. The properties of typical sequences can be
extended to jointly typical sequences as follows.

. Let (x n , y n ) ∈ Tє(n) (X, Y) and p(x n , y n ) = ∏ni=1 p X,Y (xi , yi ). Then


(a) x n ∈ Tє(n) (X) and y n ∈ Tє(n) (Y ),
(b) p(x n ) ≐ 2−nH(X) and p(y n ) ≐ 2−nH(Y) ,
(c) p(x n |y n ) ≐ 2−nH(X|Y) and p(y n |x n ) ≐ 2−nH(Y|X) , and
(d) p(x n , y n ) ≐ 2−nH(X,Y) .
. |Tє(n) (X, Y)| ≐ 2nH(X,Y) .
. For every x^n ∈ X^n,

|Tє(n)(Y | x^n)| ≤ 2^{n(H(Y|X)+δ(є))},

where δ(є) = єH(Y|X).


. Let X ∼ p(x) and Y = д(X). Let x n ∈ Tє(n) (X). Then y n ∈ Tє(n) (Y|x n ) iff yi = д(xi ) for
i ∈ [1 : n].

The following property deserves special attention.

Conditional Typicality Lemma. Let (X, Y) ∼ p(x, y). Suppose that x^n ∈ Tє′(n)(X) and Y^n ∼ p(y^n | x^n) = ∏_{i=1}^n pY|X(yi | xi). Then, for every є > є′,

lim_{n→∞} P{(x^n, Y^n) ∈ Tє(n)(X, Y)} = 1.

The proof of this lemma follows by the LLN. The details are given in Appendix A.
Note that the condition є > є 󳰀 is crucial to applying the LLN because x n could otherwise
be on the boundary of Tє(n) (X); see Problem ..

The conditional typicality lemma implies the following additional property of jointly
typical sequences.

. If x^n ∈ Tє′(n)(X) and є′ < є, then for n sufficiently large,

|Tє(n)(Y | x^n)| ≥ (1 − є)2^{n(H(Y|X)−δ(є))}.

The above properties of jointly typical sequences are illustrated in two different ways
in Figure ..

Figure .. Properties of jointly typical sequences.



2.5.1 Joint Typicality for a Triple of Random Variables


Let (X, Y , Z) ∼ p(x, y, z). The set of jointly є-typical (x n , y n , z n ) sequences is defined as

Tє(n)(X, Y, Z) = {(x^n, y^n, z^n) : |π(x, y, z | x^n, y^n, z^n) − p(x, y, z)| ≤ єp(x, y, z) for all (x, y, z) ∈ X × Y × Z}.

Since this is equivalent to the typical set for a single “large” random variable (X, Y , Z) or
a pair of random variables ((X, Y), Z), the properties of (jointly) typical sequences con-
tinue to hold. For example, suppose that (x n , y n , z n ) ∈ Tє(n) (X, Y , Z) and p(x n , y n , z n ) =
∏ni=1 p X,Y ,Z (xi , yi , zi ). Then
. x n ∈ Tє(n) (X) and (yn , z n ) ∈ Tє(n) (Y , Z),
. p(x n , y n , z n ) ≐ 2−nH(X,Y ,Z) ,
. p(x n , y n |z n ) ≐ 2−nH(X,Y|Z) ,
. |Tє(n) (X|y n , z n )| ≤ 2n(H(X|Y ,Z)+δ(є)) , and
. if (y^n, z^n) ∈ Tє′(n)(Y, Z) and є′ < є, then for n sufficiently large, |Tє(n)(X | y^n, z^n)| ≥ 2^{n(H(X|Y,Z)−δ(є))}.
The following two-part lemma will be used in the achievability proofs of many coding
theorems.

Joint Typicality Lemma. Let (X, Y, Z) ∼ p(x, y, z) and є′ < є. Then there exists δ(є) > 0 that tends to zero as є → 0 such that the following statements hold:
. If (x̃^n, ỹ^n) is a pair of arbitrary sequences and Z̃^n ∼ ∏_{i=1}^n pZ|X(z̃i | x̃i), then

P{(x̃^n, ỹ^n, Z̃^n) ∈ Tє(n)(X, Y, Z)} ≤ 2^{−n(I(Y;Z|X)−δ(є))}.

. If (x^n, y^n) ∈ Tє′(n) and Z̃^n ∼ ∏_{i=1}^n pZ|X(z̃i | xi), then for n sufficiently large,

P{(x^n, y^n, Z̃^n) ∈ Tє(n)(X, Y, Z)} ≥ 2^{−n(I(Y;Z|X)+δ(є))}.

To prove the first statement, consider

P{(x̃^n, ỹ^n, Z̃^n) ∈ Tє(n)(X, Y, Z)} = ∑_{z̃^n ∈ Tє(n)(Z | x̃^n, ỹ^n)} p(z̃^n | x̃^n)
    ≤ |Tє(n)(Z | x̃^n, ỹ^n)| ⋅ 2^{−n(H(Z|X)−єH(Z|X))}
    ≤ 2^{n(H(Z|X,Y)+єH(Z|X,Y))} 2^{−n(H(Z|X)−єH(Z|X))}
    ≤ 2^{−n(I(Y;Z|X)−δ(є))}.

Similarly, for every n sufficiently large,

P{(x^n, y^n, Z̃^n) ∈ Tє(n)(X, Y, Z)} ≥ |Tє(n)(Z | x^n, y^n)| ⋅ 2^{−n(H(Z|X)+єH(Z|X))}
    ≥ (1 − є)2^{n(H(Z|X,Y)−єH(Z|X,Y))} 2^{−n(H(Z|X)+єH(Z|X))}
    ≥ 2^{−n(I(Y;Z|X)+δ(є))},

which proves the second statement.


Remark .. As an application of the joint typicality lemma, it can be easily shown that
if (X n , Y n ) ∼ ∏ni=1 p X,Y (xi , yi ) and Z̃ n | {X n = x n , Y n = y n } ∼ ∏ni=1 p Z|X (̃zi |xi ), then

P󶁁(X n , Y n , Z̃ n ) ∈ Tє(n) 󶁑 ≐ 2−nI(Y ;Z|X) .

Other applications of the joint typicality lemma are given in Problem ..

2.5.2 Multivariate Typical Sequences


Let (X1 , X2 , . . . , Xk ) ∼ p(x1 , x2 , . . . , xk ) and J be a nonempty subset of [1 : k]. Define the
subtuple of random variables X(J ) = (X j : j ∈ J ). For example, if k = 3 and J = {1, 3},
then X(J ) = (X1 , X3 ). The set of є-typical n-sequences (x1n , x2n , . . . , xkn ) is defined as
Tє(n) (X1 , X2 , . . . , Xk ) = Tє(n) ((X1 , X2 , . . . , Xk )), that is, as the typical set for a single ran-
dom variable (X1 , X2 , . . . , Xk ). We can similarly define Tє(n) (X(J )) for every J ⊆ [1 : k].
It can be easily checked that the properties of jointly typical sequences continue to
hold by considering X(J ) as a single random variable. For example, if (x1n , x2n , . . . , xkn ) ∈
Tє(n) (X1 , X2 , . . . , Xk ) and p(x1n , x2n , . . . , xkn ) = ∏ni=1 p X1 ,X2 ,...,X󰑘 (x1i , x2i , . . . , xki ), then for
all J, J′ ⊆ [1 : k],
. x^n(J) ∈ Tє(n)(X(J)),
. p(x^n(J) | x^n(J′)) ≐ 2^{−nH(X(J)|X(J′))},
. |Tє(n)(X(J) | x^n(J′))| ≤ 2^{n(H(X(J)|X(J′))+δ(є))}, and
. if x^n(J′) ∈ Tє′(n)(X(J′)) and є′ < є, then for n sufficiently large,

|Tє(n)(X(J) | x^n(J′))| ≥ 2^{n(H(X(J)|X(J′))−δ(є))}.

The conditional and joint typicality lemmas can be readily generalized to subsets J1 ,
J2 , and J3 and corresponding sequences x n (J1 ), x n (J2 ), and x n (J3 ) that satisfy similar
conditions.

SUMMARY

∙ Entropy as a measure of information


∙ Mutual information as a measure of information transfer

∙ Key inequalities and identities:


∙ Fano’s inequality
∙ Mrs. Gerber’s lemma
∙ Maximum differential entropy lemma
∙ Entropy power inequality
∙ Chain rules for entropy and mutual information
∙ Data processing inequality
∙ Csiszár sum identity
∙ Typical sequences:
∙ Typical average lemma
∙ Conditional typicality lemma
∙ Joint typicality lemma

BIBLIOGRAPHIC NOTES

Shannon () defined entropy and mutual information for discrete and continuous ran-
dom variables, and provided justifications of these definitions in both axiomatic and oper-
ational senses. Many of the simple properties of these quantities, including the maximum
entropy property of the Gaussian distribution, are also due to Shannon. Subsequently,
Kolmogorov () and Dobrushin (a) gave rigorous extensions of entropy and mu-
tual information to abstract probability spaces.
Fano’s inequality is due to Fano (). Mrs. Gerber’s lemma is due to Wyner and Ziv
(). Extensions of the MGL were given by Witsenhausen (), Witsenhausen and
Wyner (), and Shamai and Wyner ().
The entropy power inequality has a longer history. It first appeared in Shannon ()
without a proof. Full proofs were given subsequently by Stam () and Blachman ()
using de Bruijn’s identity (Cover and Thomas , Theorem ..). The EPI can be
rewritten in the following equivalent inequality (Costa and Cover ). For a pair of
independent random vectors X n ∼ f (x n ) and Z n ∼ f (z n ),

h(X n + Z n ) ≥ h( X̃ n + Z̃ n ), (.)

where X̃ n and Z̃ n are a pair of independent Gaussian random vectors with proportional
covariance matrices, chosen so that h(X n ) = h( X̃ n ) and h(Z n ) = h( Z̃ n ). Now (.) can be
proved by the strengthened version of Young’s inequality (Beckner , Brascamp and
Lieb ); see, for example, Lieb () and Gardner (). Recently, Verdú and Guo
() gave a simple proof by relating the minimum mean squared error (MMSE) and

mutual information in Gaussian channels; see Madiman and Barron () for a similar
proof from a different angle. Extensions of the EPI are given by Costa (), Zamir and
Feder (), and Artstein, Ball, Barthe, and Naor ().
There are several notions of typicality in the literature. Our notion of typicality is that
of robust typicality due to Orlitsky and Roche (). As is evident in the typical average
lemma, it is often more convenient than the more widely known notion of strong typicality
(Berger , Csiszár and Körner b) defined as

A∗є(n) = {x^n : |π(x | x^n) − p(x)| ≤ є/|X| if p(x) > 0, and π(x | x^n) = 0 otherwise}.
Another widely used notion is weak typicality (Cover and Thomas ) defined as

n 󵄨󵄨󵄨 1 󵄨󵄨
A(n) n 󵄨
є (X) = 󶁃x : 󵄨󵄨 − n log p(x ) − H(X)󵄨󵄨 ≤ є󶁓 , (.)
󵄨 󵄨
where p(x n ) = ∏ni=1 p X (xi ). This is a weaker notion than the one we use, since Tє(n) ⊆
A(n) 󳰀
δ for δ = єH(X), while in general for some є > 0 there is no δ > 0 such that Aδ 󳰀 ⊆
(n)

Tє(n) . For example, every binary n-sequence is weakly typical with respect to the Bern(1/2)
pmf, but not all of them are typical. Weak typicality is useful when dealing with discrete
or continuous stationary ergodic processes because it is tightly coupled to the Shannon–
McMillan–Breiman theorem (Shannon , McMillan , Breiman , Barron ),
commonly referred to as the asymptotic equipartition property (AEP), which states that for
a discrete stationary ergodic process X = {Xi },
lim_{n→∞} −(1/n) log p(X^n) = H(X).
However, we will encounter several coding schemes that require our notion of typicality.
Note, for example, that the conditional typicality lemma fails to hold under weak typical-
ity.

PROBLEMS

.. Prove Fano’s inequality.


.. Prove the Csiszár sum identity.
.. Prove the properties of jointly typical sequences with δ(є) terms explicitly speci-
fied.
.. Inequalities. Label each of the following statements with =, ≤, or ≥. Justify each
answer.
(a) H(X|Z) vs. H(X|Y ) + H(Y |Z).
(b) h(X + Y ) vs. h(X), if X and Y are independent continuous random variables.
(c) h(X + aY ) vs. h(X + Y), if Y ∼ N(0, 1) is independent of X and a ≥ 1.

(d) I(X1 , X2 ; Y1 , Y2 ) vs. I(X1 ; Y1) + I(X2 ; Y2 ), if p(y1 , y2 |x1 , x2 ) = p(y1|x1)p(y2 |x2 ).
(e) I(X1 , X2 ; Y1 , Y2 ) vs. I(X1 ; Y1 ) + I(X2 ; Y2 ), if p(x1 , x2 ) = p(x1 )p(x2 ).
(f) I(aX + Y ; bX) vs. I(X + Y /a; X), if a, b ̸= 0 and Y ∼ N(0, 1) is independent
of X.
.. Mrs. Gerber’s lemma. Let H −1 : [0, 1] → [0, 1/2] be the inverse of the binary en-
tropy function.
(a) Show that H(H^{−1}(v) ∗ p) is convex in v for every p ∈ [0, 1].
(b) Use part (a) to prove the scalar MGL

H(Y | U) ≥ H(H^{−1}(H(X | U)) ∗ p).

(c) Use part (b) and induction to prove the vector MGL

H(Y^n | U)/n ≥ H(H^{−1}(H(X^n | U)/n) ∗ p).
.. Maximum differential entropy. Let X ∼ f (x) be a zero-mean random variable with
finite variance and X ∗ ∼ f (x ∗ ) be a zero-mean Gaussian random variable with the
same variance as X.
(a) Show that

− ∫ f(x) log f_{X∗}(x) dx = − ∫ f_{X∗}(x) log f_{X∗}(x) dx = h(X∗).

(b) Using part (a) and the nonnegativity of relative entropy (see Remark .), conclude that

h(X) = −D(f_X ‖ f_{X∗}) − ∫ f(x) log f_{X∗}(x) dx ≤ h(X∗)

with equality iff X is Gaussian.


(c) Following similar steps, show that if X ∼ f (x) is a zero-mean random vector
and X∗ ∼ f (x∗ ) is a zero-mean Gaussian random vector with the same covari-
ance matrix, then
h(X) ≤ h(X∗ )

with equality iff X is Gaussian.


.. Maximum conditional differential entropy. Let (X, Y) = (X n , Y k ) ∼ f (x n , y k ) be a
pair of random vectors with covariance matrices KX = E[(X − E(X))(X − E(X))T ]
and KY = E[(Y − E(Y))(Y − E(Y))T ], and crosscovariance matrix KXY = E[(X −
E(X))(Y − E(Y))^T] = K_{YX}^T. Show that

h(X|Y) ≤ (1/2) log((2πe)^n |K_X − K_{XY} K_Y^{−1} K_{YX}|)
with equality if (X, Y) is jointly Gaussian.

.. Hadamard's inequality. Let Y^n ∼ N(0, K). Use the fact that

h(Y^n) ≤ (1/2) log((2πe)^n ∏_{i=1}^n Kii)

to prove Hadamard's inequality

det(K) ≤ ∏_{i=1}^n Kii.

.. Conditional entropy power inequality. Let X ∼ f (x) and Z ∼ f (z) be independent
random variables and Y = X + Z. Then by the EPI,

2^{2h(Y)} ≥ 2^{2h(X)} + 2^{2h(Z)}

with equality iff both X and Z are Gaussian.


(a) Show that log(2^v + 2^w) is convex in (v, w).
(b) Let X n and Z n be conditionally independent given an arbitrary random vari-
able U, with conditional densities f (x n |u) and f (z n |u), respectively. Use part
(a), the scalar EPI, and induction to prove the conditional EPI

2^{2h(Y^n|U)/n} ≥ 2^{2h(X^n|U)/n} + 2^{2h(Z^n|U)/n}.

.. Entropy rate of a stationary source. Let X = {Xi } be a discrete stationary random
process.
(a) Show that
H(X n ) H(X n−1 )
≤ for n = 2, 3, . . . .
n n−1
(b) Conclude that the entropy rate
H(X n )
H(X) = lim
n→∞ n
is well-defined.
(c) Show that for a continuous stationary process Y = {Yi },
h(Y n ) h(Y n−1 )
≤ for n = 2, 3, . . . .
n n−1
.. Worst noise for estimation. Let X ∼ N(0, P) and Z be independent of X with zero
mean and variance N. Show that the minimum mean squared error (MMSE) of
estimating X given X + Z is upper bounded as

E[(X − E(X | X + Z))²] ≤ PN/(P + N)
with equality if Z is Gaussian. Thus, Gaussian noise is the worst noise if the input
to the channel is Gaussian.

.. Worst noise for information. Let X and Z be independent, zero-mean random
variables with variances P and N, respectively.
(a) Show that

h(X | X + Z) ≤ (1/2) log(2πePN/(P + N))
with equality iff both X and Z are Gaussian. (Hint: Use the maximum differ-
ential entropy lemma or the EPI or Problem ..)
(b) Let X ∗ and Z ∗ be independent zero-mean Gaussian random variables with
variances P and N, respectively. Use part (a) to show that

I(X ∗ ; X ∗ + Z ∗ ) ≤ I(X ∗ ; X ∗ + Z)

with equality iff Z is Gaussian. Thus, Gaussian noise is the worst noise when
the input to an additive channel is Gaussian.
.. Joint typicality. Let (X, Y ) ∼ p(x, y) and є > є 󳰀 . Let X n ∼ p(x n ) be an arbitrary
random sequence and Y n | {X n = x n } ∼ ∏ni=1 pY|X (yi |xi ). Using the conditional
typicality lemma, show that

lim_{n→∞} P{(X^n, Y^n) ∈ Tє(n) | X^n ∈ Tє′(n)} = 1.

.. Variations on the joint typicality lemma. Let (X, Y , Z) ∼ p(x, y, z) and 0 < є 󳰀 < є.
Prove the following statements.
(a) Let (X n , Y n ) ∼ ∏ni=1 p X,Y (xi , yi ) and Z̃ n |{X n = x n , Y n = y n } ∼ ∏ni=1 pZ|X (̃zi |xi ),
conditionally independent of Y^n given X^n. Then

P{(X^n, Y^n, Z̃^n) ∈ Tє(n)(X, Y, Z)} ≐ 2^{−nI(Y;Z|X)}.

(b) Let (x^n, y^n) ∈ Tє′(n)(X, Y) and Z̃^n ∼ Unif(Tє(n)(Z | x^n)). Then

P{(x^n, y^n, Z̃^n) ∈ Tє(n)(X, Y, Z)} ≐ 2^{−nI(Y;Z|X)}.

(c) Let x^n ∈ Tє′(n)(X), ỹ^n be an arbitrary sequence, and Z̃^n ∼ p(z̃^n | x^n), where

p(z̃^n | x^n) = ∏_{i=1}^n pZ|X(z̃i | xi) / ∑_{z^n ∈ Tє(n)(Z|x^n)} ∏_{i=1}^n pZ|X(zi | xi) if z̃^n ∈ Tє(n)(Z | x^n), and p(z̃^n | x^n) = 0 otherwise. Then

P{(x^n, ỹ^n, Z̃^n) ∈ Tє(n)(X, Y, Z)} ≤ 2^{−n(I(Y;Z|X)−δ(є))}.

(d) Let (X̃^n, Ỹ^n, Z̃^n) ∼ ∏_{i=1}^n pX(x̃i) pY(ỹi) pZ|X,Y(z̃i | x̃i, ỹi). Then

P{(X̃^n, Ỹ^n, Z̃^n) ∈ Tє(n)(X, Y, Z)} ≐ 2^{−nI(X;Y)}.



.. Jointly typical triples. Given (X, Y, Z) ∼ p(x, y, z), let

An = {(x^n, y^n, z^n) : (x^n, y^n) ∈ Tє(n)(X, Y), (y^n, z^n) ∈ Tє(n)(Y, Z), (x^n, z^n) ∈ Tє(n)(X, Z)}.

(a) Show that

|An| ≤ 2^{n(H(X,Y)+H(Y,Z)+H(X,Z)+δ(є))/2}.

(Hint: First show that |An| ≤ 2^{n(H(X,Y)+H(Z|Y)+δ(є))}.)
(b) Does the corresponding lower bound hold in general? (Hint: Consider X = Y = Z.)
Remark: It can be shown that |An| ≐ 2^{n max H(X̃,Ỹ,Z̃)}, where the maximum is over all joint pmfs p(x̃, ỹ, z̃) such that p(x̃, ỹ) = pX,Y(x̃, ỹ), p(ỹ, z̃) = pY,Z(ỹ, z̃), and p(x̃, z̃) = pX,Z(x̃, z̃).
.. Multivariate typicality. Let (U , X, Y , Z) ∼ p(u, x, y, z). Prove the following state-
ments.
(a) If (Ũ^n, X̃^n, Ỹ^n, Z̃^n) ∼ ∏_{i=1}^n pU(ũi) pX(x̃i) pY(ỹi) pZ(z̃i), then

P{(Ũ^n, X̃^n, Ỹ^n, Z̃^n) ∈ Tє(n)(U, X, Y, Z)} ≐ 2^{−n(I(U;X)+I(U,X;Y)+I(U,X,Y;Z))}.

(b) If (Ũ^n, X̃^n, Ỹ^n, Z̃^n) ∼ ∏_{i=1}^n pU,X(ũi, x̃i) pY|X(ỹi | x̃i) pZ(z̃i), then

P{(Ũ^n, X̃^n, Ỹ^n, Z̃^n) ∈ Tє(n)(U, X, Y, Z)} ≐ 2^{−n(I(U;Y|X)+I(U,X,Y;Z))}.

.. Need for both є and є 󳰀 . Let (X, Y ) be a pair of independent Bern(1/2) random
variables. Let k = ⌊(n/2)(1 + є)⌋ and x n be a binary sequence with k ones followed
by (n − k) zeros.
(a) Check that x n ∈ Tє(n) (X).
(b) Let Y^n be an i.i.d. Bern(1/2) sequence, independent of x^n. Show that

P{(x^n, Y^n) ∈ Tє(n)(X, Y)} ≤ P{∑_{i=1}^k Yi < (k + 1)/2},

which converges to 1/2 as n → ∞. Thus, the fact that x^n ∈ Tє(n)(X) and Y^n ∼ ∏_{i=1}^n pY|X(yi | xi) does not necessarily imply that P{(x^n, Y^n) ∈ Tє(n)(X, Y)} tends to 1.
Remark: This problem illustrates that in general we need є > є 󳰀 in the conditional
typicality lemma.

APPENDIX 2A PROOF OF THE CONDITIONAL TYPICALITY LEMMA

We wish to show that

lim_{n→∞} P{|π(x, y | x^n, Y^n) − p(x, y)| > єp(x, y) for some (x, y) ∈ X × Y} = 0.

For x ∈ X such that p(x) ̸= 0, consider

P{|π(x, y | x^n, Y^n) − p(x, y)| > єp(x, y)}
  = P{|π(x, y | x^n, Y^n)/p(x) − p(y|x)| > єp(y|x)}
  = P{|π(x, y | x^n, Y^n)π(x | x^n)/(p(x)π(x | x^n)) − p(y|x)| > єp(y|x)}
  = P{|[π(x, y | x^n, Y^n)/(π(x | x^n)p(y|x))] ⋅ [π(x | x^n)/p(x)] − 1| > є}
  ≤ P{[π(x, y | x^n, Y^n)/(π(x | x^n)p(y|x))] ⋅ [π(x | x^n)/p(x)] > 1 + є}
    + P{[π(x, y | x^n, Y^n)/(π(x | x^n)p(y|x))] ⋅ [π(x | x^n)/p(x)] < 1 − є}.

Now, since x^n ∈ Tє′(n)(X), we have 1 − є′ ≤ π(x | x^n)/p(x) ≤ 1 + є′, and hence

P{[π(x, y | x^n, Y^n)/(π(x | x^n)p(y|x))] ⋅ [π(x | x^n)/p(x)] > 1 + є}
  ≤ P{π(x, y | x^n, Y^n)/π(x | x^n) > [(1 + є)/(1 + є′)] p(y|x)} (.)

and

P{[π(x, y | x^n, Y^n)/(π(x | x^n)p(y|x))] ⋅ [π(x | x^n)/p(x)] < 1 − є}
  ≤ P{π(x, y | x^n, Y^n)/π(x | x^n) < [(1 − є)/(1 − є′)] p(y|x)}. (.)

Since є′ < є, we have (1 + є)/(1 + є′) > 1 and (1 − є)/(1 − є′) < 1. Furthermore, since Y^n is generated according to the correct conditional pmf, by the LLN, for every y ∈ Y,

π(x, y | x^n, Y^n)/π(x | x^n) → p(y|x) in probability.

Hence, both upper bounds in (.) and (.) tend to zero as n → ∞, which, by the union
of events bound over all (x, y) ∈ X × Y, completes the proof of the conditional typicality
lemma.
CHAPTER 3

Point-to-Point Information Theory

We review Shannon’s basic theorems for point-to-point communication. Over the course
of the review, we introduce the techniques of random coding and joint typicality encoding
and decoding, and develop the packing and covering lemmas. These techniques will be
used in the achievability proofs for multiple sources and channels throughout the book.
We rigorously show how achievability for a discrete memoryless channel or source can be
extended to its Gaussian counterpart. We also show that under our definition of typical-
ity, the lossless source coding theorem is a corollary of the lossy source coding theorem.
This fact will prove useful in later chapters. Along the way, we point out some key differ-
ences between results for point-to-point communication and for the multiuser networks
discussed in subsequent chapters.

3.1 CHANNEL CODING

Consider the point-to-point communication system model depicted in Figure ., where
a sender wishes to reliably communicate a message M at a rate R bits per transmission to
a receiver over a noisy communication channel (or a noisy storage medium). Toward this
end, the sender encodes the message into a codeword X n and transmits it over the channel
in n time instances (also referred to as transmissions or channel uses). Upon receiving the
noisy sequence Y n , the receiver decodes it to obtain the estimate M̂ of the message. The
channel coding problem is to find the channel capacity, which is the highest rate R such
that the probability of decoding error can be made to decay asymptotically to zero with
the code block length n.
We first consider the channel coding problem for a simple discrete memoryless chan-
nel (DMC) model (X , p(y|x), Y) (in short p(y|x)) that consists of a finite input set (or
alphabet) X , a finite output set Y, and a collection of conditional pmfs p(y|x) on Y for
every x ∈ X . Thus, if an input symbol x ∈ X is transmitted, the probability of receiv-
ing an output symbol y ∈ Y is p(y|x). The channel is stationary and memoryless in the


Figure .. Point-to-point communication system.


3.1 Channel Coding 39

sense that when it is used n times with message M drawn from an arbitrary set and input
X n ∈ X n , the output Yi ∈ Y at time i ∈ [1 : n] given (M, X i , Y i−1 ) is distributed according
to p(yi |x i , y i−1 , m) = pY|X (yi |xi ). Throughout the book, the phrase “discrete memoryless
(DM)” will refer to “finite-alphabet and stationary memoryless”.
A (2nR , n) code for the DMC p(y|x) consists of
∙ a message set [1 : 2nR ] = {1, 2, . . . , 2⌈nR⌉ },
∙ an encoding function (encoder) x n : [1 : 2nR ] → X n that assigns a codeword x n (m) to
each message m ∈ [1 : 2nR ], and
∙ a decoding function (decoder) m̂ : Y^n → [1 : 2^{nR}] ∪ {e} that assigns an estimate m̂ ∈ [1 : 2^{nR}] or an error message e to each received sequence y^n.
nR

Note that under the above definition of a (2nR , n) code, the memoryless property im-
plies that
p(y^n | x^n, m) = ∏_{i=1}^n pY|X(yi | xi). (.)

The set C = {x n (1), x n (2), . . . , x n (2⌈nR⌉ )} is referred to as the codebook associated with the
(2nR , n) code. We assume that the message is uniformly distributed over the message set,
i.e., M ∼ Unif[1 : 2nR ].
The performance of a given code is measured by the probability that the estimate of
the message is different from the actual message sent. More precisely, let λm (C) = P{M ̂ ̸=
m | M = m} be the conditional probability of error given that message m is sent. Then, the
average probability of error for a (2^{nR}, n) code is defined as

Pe(n)(C) = P{M̂ ̸= M} = (1/2^⌈nR⌉) ∑_{m=1}^{2^⌈nR⌉} λm(C).

A rate R is said to be achievable if there exists a sequence of (2nR , n) codes such that
limn→∞ Pe(n) (C) = 0. The capacity C of a DMC is the supremum over all achievable rates.
Remark .. Although the message M depends on the block length n (through the mes-
sage set), we will not show this dependency explicitly. Also, from this point on, we will
not explicitly show the dependency of the probability of error Pe(n) on the codebook C.

3.1.1 Channel Coding Theorem


Shannon established a simple characterization of channel capacity.

Theorem . (Channel Coding Theorem). The capacity of the discrete memoryless
channel p(y|x) is given by the information capacity formula

C = max_{p(x)} I(X; Y).

In the following, we evaluate the information capacity formula for several simple but
important discrete memoryless channels.

Example 3.1 (Binary symmetric channel). Consider the binary symmetric channel with
crossover probability p (in short BSC(p)) depicted in Figure .. The channel input X and
output Y are binary and each binary input symbol is flipped with probability p. Equiva-
lently, we can specify the BSC as Y = X ⊕ Z, where the noise Z ∼ Bern(p) is independent
of the input X. The capacity is
C = max_{p(x)} I(X; Y)
  = max_{p(x)} (H(Y) − H(Y | X))
  = max_{p(x)} (H(Y) − H(X ⊕ Z | X))
  = max_{p(x)} (H(Y) − H(Z | X))
  (a)
  = max_{p(x)} H(Y) − H(Z)
  = 1 − H(p),
where (a) follows by the independence of X and Z. Note that the capacity is attained by
X ∼ Bern(1/2), which, in turn, results in Y ∼ Bern(1/2).

Figure .. Equivalent representations of the binary symmetric channel BSC(p).

Example 3.2 (Binary erasure channel). Consider the binary erasure channel with erasure
probability p (BEC(p)) depicted in Figure .. The channel input X and output Y are
binary and each binary input symbol is erased (mapped into an erasure symbol e) with
probability p. Thus, the receiver knows which transmissions are erased, but the sender
does not. The capacity is
C = max_{p(x)} (H(X) − H(X | Y))
  (a)
  = max_{p(x)} (H(X) − pH(X))
  = 1 − p,
where (a) follows since H(X |Y = y) = 0 if y = 0 or 1, and H(X |Y = e) = H(X). The
capacity is again attained by X ∼ Bern(1/2).

Figure .. Binary erasure channel BEC(p): each input symbol is received correctly with probability 1 − p and mapped to the erasure symbol e with probability p.

Example 3.3 (Product DMC). Let p(y1 |x1 ) and p(y2 |x2 ) be two DMCs with capacities
C1 and C2 , respectively. The product DMC is a DMC (X1 × X2 , p(y1 |x1 )p(y2 |x2 ), Y1 × Y2 )
in which the symbols x1 ∈ X1 and x2 ∈ X2 are sent simultaneously in parallel and the re-
ceived outputs Y1 and Y2 are distributed according to p(y1 , y2 |x1 , x2 ) = p(y1 |x1 )p(y2 |x2 ).
The capacity of the product DMC is
C = max_{p(x_1, x_2)} I(X_1, X_2; Y_1, Y_2)
  = max_{p(x_1, x_2)} (I(X_1, X_2; Y_1) + I(X_1, X_2; Y_2 | Y_1))
  (a)
  = max_{p(x_1, x_2)} (I(X_1; Y_1) + I(X_2; Y_2))
  = max_{p(x_1)} I(X_1; Y_1) + max_{p(x_2)} I(X_2; Y_2)
  = C_1 + C_2,
where (a) follows since Y1 → X1 → X2 → Y2 form a Markov chain, which implies that
I(X1 , X2 ; Y1 ) = I(X1 ; Y1 ) and I(X1 , X2 ; Y2 |Y1 ) ≤ I(Y1 , X1 , X2 ; Y2 ) = I(X2 ; Y2 ) with equal-
ity iff X1 and X2 are independent.
More generally, let p(y_j | x_j) be a DMC with capacity C_j for j ∈ [1 : d]. A product DMC consists of an input alphabet X = ⨉_{j=1}^d X_j, an output alphabet Y = ⨉_{j=1}^d Y_j, and a collection of conditional pmfs p(y_1, . . . , y_d | x_1, . . . , x_d) = ∏_{j=1}^d p(y_j | x_j). The capacity of the product DMC is

C = ∑_{j=1}^d C_j.

To prove the channel coding theorem, we need to show that the information capacity
in Theorem . is equal to the operational capacity defined in the channel coding setup.
This involves the verification of two statements.
∙ Achievability. For every rate R < C = max p(x) I(X; Y), there exists a sequence of
(2nR , n) codes with average probability of error Pe(n) that tends to zero as n → ∞. The
proof of achievability uses random coding and joint typicality decoding.
∙ Converse. For every sequence of (2nR , n) codes with probability of error Pe(n) that
tends to zero as n → ∞, the rate must satisfy R ≤ C = max p(x) I(X; Y ). The proof of
the converse uses Fano’s inequality and basic properties of mutual information.

We first prove achievability in the following subsection. The proof of the converse is
given in Section ...

3.1.2 Proof of Achievability


For simplicity of presentation, we assume throughout the proof that nR is an integer.
Random codebook generation. We use random coding. Fix the pmf p(x) that attains
the information capacity C. Randomly and independently generate 2nR sequences x n (m),
m ∈ [1 : 2nR ], each according to p(x n ) = ∏ni=1 p X (xi ). The generated sequences constitute
the codebook C. Thus
p(C) = ∏_{m=1}^{2^{nR}} ∏_{i=1}^n p_X(x_i(m)).

The chosen codebook C is revealed to both the encoder and the decoder before transmis-
sion commences.
Encoding. To send message m ∈ [1 : 2nR ], transmit x n (m).
Decoding. We use joint typicality decoding. Let y^n be the received sequence. The receiver declares that m̂ ∈ [1 : 2^{nR}] is sent if it is the unique message such that (x^n(m̂), y^n) ∈ T_є^{(n)}; otherwise—if there is none or more than one such message—it declares an error e.
Analysis of the probability of error. Assuming that message m is sent, the decoder
makes an error if (x n (m), yn ) ∉ Tє(n) or if there is another message m󳰀 ̸= m such that
(x n (m󳰀 ), y n ) ∈ Tє(n) . Consider the probability of error averaged over M and codebooks

P(E) = E_C[P_e^{(n)}]
     = E_C[(1/2^{nR}) ∑_{m=1}^{2^{nR}} λ_m(C)]
     = (1/2^{nR}) ∑_{m=1}^{2^{nR}} E_C(λ_m(C))
     (a)
     = E_C(λ_1(C))
     = P(E | M = 1),
where (a) follows by the symmetry of the random codebook generation. Thus, we assume
without loss of generality that M = 1 is sent. For brevity, we do not explicitly condition
on the event {M = 1} in probability expressions whenever it is clear from the context.
The decoder makes an error iff one or both of the following events occur:
E_1 = {(X^n(1), Y^n) ∉ T_є^{(n)}},
E_2 = {(X^n(m), Y^n) ∈ T_є^{(n)} for some m ≠ 1}.

Thus, by the union of events bound,

P(E) = P(E_1 ∪ E_2) ≤ P(E_1) + P(E_2).

We now bound each term. By the law of large numbers (LLN), the first term P(E1 )
tends to zero as n → ∞. For the second term, since for m ̸= 1,
(X^n(m), X^n(1), Y^n) ∼ ∏_{i=1}^n p_X(x_i(m)) p_{X,Y}(x_i(1), y_i),

we have (X^n(m), Y^n) ∼ ∏_{i=1}^n p_X(x_i(m)) p_Y(y_i). Thus, by the extension of the joint typicality lemma in Remark .,

P{(X^n(m), Y^n) ∈ T_є^{(n)}} ≤ 2^{−n(I(X;Y)−δ(є))} = 2^{−n(C−δ(є))}.

Again by the union of events bound,

P(E_2) ≤ ∑_{m=2}^{2^{nR}} P{(X^n(m), Y^n) ∈ T_є^{(n)}} ≤ ∑_{m=2}^{2^{nR}} 2^{−n(C−δ(є))} ≤ 2^{−n(C−R−δ(є))},

which tends to zero as n → ∞ if R < C − δ(є).


Note that since the probability of error averaged over codebooks, P(E), tends to zero
as n → ∞, there must exist a sequence of (2nR , n) codes such that limn→∞ Pe(n) = 0, which
proves that R < C − δ(є) is achievable. Finally, taking є → 0 completes the proof.

Remark 3.2. To bound the average probability of error P(E), we divided the error event
into two events, each of which comprises events with (X n (m), Y n ) having the same joint
pmf. This observation will prove useful when we analyze more complex error events in
later chapters.
Remark 3.3. By the Markov inequality, the probability of error for a random codebook,
that is, a codebook consisting of random sequences X n (m), m ∈ [1 : 2nR ], tends to zero as
n → ∞ in probability. Hence, most codebooks are good in terms of the error probability.
Remark 3.4. The capacity with the maximal probability of error λ∗ = maxm λm is equal
to that with the average probability of error Pe(n) . This can be shown by discarding the
worst half of the codewords (in terms of error probability) from each code in the sequence
of (2nR , n) codes with limn→∞ Pe(n) = 0. The maximal probability of error for each of the
codes with the remaining codewords is at most 2Pe(n) , which again tends to zero as n → ∞.
As we will see, the capacity with maximal probability of error is not always equal to that
with average probability of error for multiuser channels.
Remark 3.5. Depending on the structure of the channel, the rate R = C may or may not
be achievable. We will sometimes informally say that C is achievable to mean that every
R < C is achievable.

3.1.3 Achievability Using Linear Codes


Recall that in the achievability proof, we used only pairwise independence of codewords
X n (m), m ∈ [1 : 2nR ], rather than mutual independence among all of them. This observa-
tion has an interesting consequence—the capacity of a BSC can be achieved using linear
codes.

Consider a BSC(p). Let k = ⌈nR⌉ and (u1 , u2 , . . . , uk ) ∈ {0, 1}k be the binary expan-
sion of the message m ∈ [1 : 2k − 1]. Generate a random codebook such that each code-
word x n (uk ) is a linear function of uk (in binary field arithmetic). In particular, let

    [x_1]   [g_11  g_12  ...  g_1k] [u_1]
    [x_2] = [g_21  g_22  ...  g_2k] [u_2]
    [ ⋮ ]   [  ⋮     ⋮    ⋱    ⋮  ] [ ⋮ ]
    [x_n]   [g_n1  g_n2  ...  g_nk] [u_k],

where g_ij ∈ {0, 1}, i ∈ [1 : n], j ∈ [1 : k], are generated i.i.d. according to Bern(1/2).
Now we can easily check that X1 (uk ), . . . , Xn (uk ) are i.i.d. Bern(1/2) for each uk ̸= 0,
and X n (uk ) and X n (ũ k ) are independent for each uk ̸= ũ k . Therefore, using the same steps
as in the proof of achievability for the channel coding theorem, it can be shown that the
error probability of joint typicality decoding tends to zero as n → ∞ if R < 1 − H(p) −
δ(є). This shows that for a BSC there exists not only a good sequence of codes, but also a
good sequence of linear codes.
It can be similarly shown that random linear codes achieve the capacity of the binary
erasure channel, or more generally, channels for which the input alphabet is a finite field
and the information capacity is attained by the uniform pmf.
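
As a toy illustration of this construction (a minimal Python/NumPy sketch with hypothetical sizes n = 8 and k = 3), each message u^k is mapped to the codeword Gu^k over the binary field, where G is a randomly generated n × k generator matrix:

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 8, 3                              # toy block length and message length
    G = rng.integers(0, 2, size=(n, k))      # entries g_ij i.i.d. Bern(1/2)

    def encode(u):
        # Codeword x^n = G u^k in binary field arithmetic (mod-2 matrix product).
        return (G @ np.asarray(u)) % 2

    for m in range(2 ** k):                  # enumerate all messages
        u = [(m >> j) & 1 for j in range(k)]
        print(u, encode(u))

As noted above, codewords for distinct messages are pairwise independent over the random choice of G, which is all the joint typicality argument requires.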

3.1.4 Proof of the Converse


We need to show that for every sequence of (2nR , n) codes with limn→∞ Pe(n) = 0, we must
have R ≤ C = max p(x) I(X; Y). Again for simplicity of presentation, we assume that nR is
an integer. Every (2nR , n) code induces a joint pmf on (M, X n , Y n ) of the form
n
p(m, x n , y n ) = 2−nR p(x n |m) 󵠉 pY|X (yi |xi ).
i=1

By Fano’s inequality,

H(M | M̂) ≤ 1 + P_e^{(n)} nR = nє_n,

where єn tends to zero as n → ∞ by the assumption that limn→∞ Pe(n) = 0. Thus, by the
data processing inequality,

H(M | Y^n) ≤ H(M | M̂) ≤ nє_n.     (.)

Now consider

nR = H(M)
   = I(M; Y^n) + H(M | Y^n)
   (a)
   ≤ I(M; Y^n) + nє_n
   = ∑_{i=1}^n I(M; Y_i | Y^{i−1}) + nє_n
   ≤ ∑_{i=1}^n I(M, Y^{i−1}; Y_i) + nє_n
   (b)
   = ∑_{i=1}^n I(X_i, M, Y^{i−1}; Y_i) + nє_n
   (c)
   = ∑_{i=1}^n I(X_i; Y_i) + nє_n
   ≤ nC + nє_n,     (.)
where (a) follows from (.), (b) follows since Xi is a function of M, and (c) follows
since the channel is memoryless, which implies that (M, Y i−1 ) → Xi → Yi form a Markov
chain. The last inequality follows by the definition of the information capacity. Since єn
tends to zero as n → ∞, R ≤ C, which completes the proof of the converse.

3.1.5 DMC with Feedback


Consider the DMC with noiseless causal feedback depicted in Figure .. The encoder
assigns a symbol xi (m, y i−1 ) to each message m ∈ [1 : 2nR ] and past received output se-
quence y i−1 ∈ Y i−1 for i ∈ [1 : n]. Hence (.) does not hold in general and a (2nR , n)
feedback code induces a joint pmf of the form
(M, X^n, Y^n) ∼ p(m, x^n, y^n) = 2^{−nR} ∏_{i=1}^n p(x_i | m, y^{i−1}) p_{Y|X}(y_i | x_i).

Nonetheless, it can be easily shown that the chain of inequalities (.) continues to hold
in the presence of such causal feedback. Hence, feedback does not increase the capacity of
the DMC. In Chapter  we will discuss the role of feedback in communication in more
detail.

Figure .. DMC with noiseless causal feedback: the encoder maps (M, Y^{i−1}) to X_i, the channel p(y|x) produces Y_i, and the decoder forms M̂ from Y^n.

3.2 PACKING LEMMA

The packing lemma generalizes the bound on the probability of the decoding error event
E2 in the achievability proof of the channel coding theorem; see Section ... The lemma
will be used in the achievability proofs of many multiuser source and channel coding
theorems.
Recall that in the bound on P(E2 ), we had a fixed input pmf p(x) and a DMC p(y|x).
As illustrated in Figure ., we considered a set of (2nR − 1) i.i.d. codewords X n (m),
m ∈ [2 : 2nR ], each distributed according to ∏ni=1 p X (xi ), and an output sequence Ỹ n ∼


∏ni=1 pY ( ỹi ) generated by the codeword X n (1) ∼ ∏ni=1 p X (xi ), which is independent of
the set of codewords. We showed that the probability that (X n (m), Ỹ n ) ∈ Tє(n) for some
m ∈ [2 : 2nR ] tends to zero as n → ∞ if R < I(X; Y ) − δ(є).

Figure .. Illustration of the setup for the bound on P(E_2): the i.i.d. codewords X^n(2), . . . , X^n(2^{nR}) and the output sequence Ỹ^n, which is independent of them.

The following lemma extends this bound in three ways:


. The codewords that are independent of Ỹ n need not be mutually independent.
. The sequence Ỹ n can have an arbitrary pmf (not necessarily ∏ni=1 pY ( ỹi )).
. The sequence Ỹ n and the set of codewords are conditionally independent of a sequence
U n that has a general joint pmf with Ỹ n .

Lemma . (Packing lemma). Let (U , X, Y) ∼ p(u, x, y). Let (Ũ n , Ỹ n ) ∼ p(ũ n , ỹn ) be
a pair of arbitrarily distributed random sequences, not necessarily distributed accord-
ing to ∏ni=1 pU ,Y (ũ i , ỹi ). Let X n (m), m ∈ A, where |A| ≤ 2nR , be random sequences,
each distributed according to ∏ni=1 p X|U (xi | ũ i ). Further assume that X n (m), m ∈ A,
is pairwise conditionally independent of Ỹ n given Ũ n , but is arbitrarily dependent on
other X n (m) sequences. Then, there exists δ(є) that tends to zero as є → 0 such that

lim_{n→∞} P{(Ũ^n, X^n(m), Ỹ^n) ∈ T_є^{(n)} for some m ∈ A} = 0,

if R < I(X; Y | U) − δ(є).

Note that the packing lemma can be readily applied to the linear coding case where
the X n (m) sequences are only pairwise independent. We will later encounter cases for
which U ≠ ∅ and (Ũ^n, Ỹ^n) is not generated i.i.d.
Proof. Define the events

Ẽ_m = {(Ũ^n, X^n(m), Ỹ^n) ∈ T_є^{(n)}} for m ∈ A.


3.3 Channel Coding with Input Cost 47

By the union of events bound, the probability of the event of interest can be bounded as

P(⋃_{m∈A} Ẽ_m) ≤ ∑_{m∈A} P(Ẽ_m).

Now consider

P(Ẽ_m) = P{(Ũ^n, X^n(m), Ỹ^n) ∈ T_є^{(n)}(U, X, Y)}
       = ∑_{(ũ^n, ỹ^n)∈T_є^{(n)}} p(ũ^n, ỹ^n) P{(ũ^n, X^n(m), ỹ^n) ∈ T_є^{(n)}(U, X, Y) | Ũ^n = ũ^n, Ỹ^n = ỹ^n}
       (a)
       = ∑_{(ũ^n, ỹ^n)∈T_є^{(n)}} p(ũ^n, ỹ^n) P{(ũ^n, X^n(m), ỹ^n) ∈ T_є^{(n)}(U, X, Y) | Ũ^n = ũ^n}
       (b)
       ≤ ∑_{(ũ^n, ỹ^n)∈T_є^{(n)}} p(ũ^n, ỹ^n) 2^{−n(I(X;Y|U)−δ(є))}
       ≤ 2^{−n(I(X;Y|U)−δ(є))},

where (a) follows by the conditional independence of X n (m) and Ỹ n given Ũ n , and (b)
follows by the joint typicality lemma in Section . since (ũ n , ỹn ) ∈ Tє(n) and X n (m) | {Ũ n =
ũ n , Ỹ n = ỹn } ∼ ∏ni=1 p X|U (xi |ũ i ). Hence

∑_{m∈A} P(Ẽ_m) ≤ |A| 2^{−n(I(X;Y|U)−δ(є))} ≤ 2^{−n(I(X;Y|U)−R−δ(є))},

which tends to zero as n → ∞ if R < I(X; Y|U ) − δ(є). This completes the proof of the
packing lemma.

3.3 CHANNEL CODING WITH INPUT COST

Consider a DMC p(y|x). Suppose that there is a nonnegative cost function b(x) associated
with each input symbol x ∈ X . Assume without loss of generality that there exists a zero-
cost symbol x0 ∈ X , i.e., b(x0 ) = 0. We further assume an average input cost constraint
∑_{i=1}^n b(x_i(m)) ≤ nB for every m ∈ [1 : 2^{nR}],

(in short, average cost constraint B on X). Now, defining the channel capacity of the DMC
with cost constraint B, or the capacity–cost function, C(B) in a similar manner to capacity
without cost constraint, we can establish the following extension of the channel coding
theorem.

Theorem .. The capacity of the DMC p(y|x) with average cost constraint B on X is

C(B) = max_{p(x): E(b(X))≤B} I(X; Y).

Note that C(B) is nondecreasing, concave, and continuous in B.
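
To make this concrete, here is a small numerical sketch (Python/NumPy; the channel is a BSC(p), and the cost function b(0) = 0, b(1) = 1 is a hypothetical choice, so that E(b(X)) = P{X = 1}): C(B) is evaluated by a grid search over input pmfs Bern(q) with q ≤ B.

    import numpy as np

    def h2(q):
        # Binary entropy in bits.
        q = np.clip(q, 1e-12, 1 - 1e-12)
        return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

    def capacity_cost(p, B):
        # C(B) = max over q <= B of I(X; Y) = H(Y) - H(p) for X ~ Bern(q).
        qs = np.linspace(0, min(B, 1.0), 2001)
        return max(h2(q * (1 - p) + (1 - q) * p) - h2(p) for q in qs)

    p = 0.1
    for B in [0.05, 0.1, 0.25, 0.5, 1.0]:
        print(B, capacity_cost(p, B))   # nondecreasing in B; equals 1 - H(p) once B >= 1/2

The printed values exhibit the stated properties: the curve is nondecreasing and concave in B, and it saturates at the unconstrained capacity once the constraint becomes inactive.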


Proof of achievability. The proof involves a minor change to the proof of achievability
for the case with no cost constraint in Section .. to ensure that every codeword satisfies
the cost constraint.
Fix the pmf p(x) that attains C(B/(1 + є)). Randomly and independently generate 2nR
sequences x n (m), m ∈ [1 : 2nR ], each according to ∏ni=1 p X (xi ). To send message m, the
encoder transmits x n (m) if x n (m) ∈ Tє(n) , and consequently, by the typical average lemma
in Section ., the sequence satisfies the cost constraint ∑ni=1 b(xi (m)) ≤ nB. Otherwise,
it transmits (x0 , . . . , x0 ). The analysis of the average probability of error for joint typicality
decoding follows similar lines to the case without cost constraint. Assume M = 1. For the
probability of the first error event,

P(E_1) = P{(X^n(1), Y^n) ∉ T_є^{(n)}}
       = P{X^n(1) ∈ T_є^{(n)}, (X^n(1), Y^n) ∉ T_є^{(n)}} + P{X^n(1) ∉ T_є^{(n)}, (X^n(1), Y^n) ∉ T_є^{(n)}}
       ≤ ∑_{x^n∈T_є^{(n)}} ∏_{i=1}^n p_X(x_i) ∑_{y^n∉T_є^{(n)}(Y|x^n)} ∏_{i=1}^n p_{Y|X}(y_i | x_i) + P{X^n(1) ∉ T_є^{(n)}}
       ≤ ∑_{(x^n, y^n)∉T_є^{(n)}} ∏_{i=1}^n p_X(x_i) p_{Y|X}(y_i | x_i) + P{X^n(1) ∉ T_є^{(n)}}.

Thus, by the LLN for each term, P(E1 ) tends to zero as n → ∞. The probability of the
second error event, P(E2 ), is upper bounded in exactly the same manner as when there is
no cost constraint. Hence, every rate R < I(X; Y) = C(B/(1 + є)) is achievable. Finally,
by the continuity of C(B) in B, C(B/(1 + є)) converges to C(B) as є → 0, which implies
the achievability of every rate R < C(B).
Proof of the converse. Consider a sequence of (2nR , n) codes with limn→∞ Pe(n) = 0 such
that for every n, the cost constraint ∑ni=1 b(xi (m)) ≤ nB is satisfied for every m ∈ [1 : 2nR ]
and thus ∑ni=1 E[b(Xi )] = ∑ni=1 EM [b(xi (M))] ≤ nB. As before, by Fano’s inequality and
the data processing inequality,
nR ≤ ∑_{i=1}^n I(X_i; Y_i) + nє_n
   (a)
   ≤ ∑_{i=1}^n C(E[b(X_i)]) + nє_n
   (b)
   ≤ nC((1/n) ∑_{i=1}^n E[b(X_i)]) + nє_n     (.)
   (c)
   ≤ nC(B) + nє_n,

where (a) follows by the definition of C(B), (b) follows by the concavity of C(B), and (c)
follows by the monotonicity of C(B). This completes the proof of Theorem ..

3.4 GAUSSIAN CHANNEL

Consider the discrete-time additive white Gaussian noise channel model depicted in Fig-
ure .. The channel output corresponding to the input X is

Y = дX + Z, (.)

where д is the channel gain, or path loss, and Z ∼ N(0, N0 /2) is the noise. Thus, in trans-
mission time i ∈ [1 : n], the channel output is

Yi = дXi + Zi ,

where {Zi } is a white Gaussian noise process with average power N0 /2 (in short, {Zi } is
a WGN(N0 /2) process), independent of the channel input X n = x n (M). We assume an
average transmission power constraint
∑_{i=1}^n x_i^2(m) ≤ nP for every m ∈ [1 : 2^{nR}]

(in short, average power constraint P on X). The Gaussian channel is quite popular be-
cause it provides a simple model for several real-world communication channels, such as
wireless and digital subscriber line (DSL) channels. We will later study more sophisticated
models for these channels.

Figure .. Additive white Gaussian noise channel: Y = gX + Z with channel gain g and noise Z ∼ N(0, N_0/2).

We assume without loss of generality that N0 /2 = 1 (since one can define an equivalent
Gaussian channel by dividing both sides of (.) by 󵀄N0 /2 ) and label the received power
(which is now equal to the received signal-to-noise ratio (SNR)) д 2 P as S. Note that the
Gaussian channel is an example of the channel with cost discussed in the previous section,
but with continuous (instead of finite) alphabets. Nonetheless, its capacity under power
constraint P can be defined in the exact same manner as for the DMC with cost constraint.

Remark 3.6. If causal feedback from the receiver to the sender is present, then Xi de-
pends only on the message M and the past received symbols Y i−1 . In this case Xi is not in
general independent of the noise process. However, the message M and the noise process
{Zi } are always assumed to be independent.
Remark 3.7. Since we discuss mainly additive white Gaussian noise channels, for brevity
we will consistently use “Gaussian” in place of “additive white Gaussian noise.”

3.4.1 Capacity of the Gaussian Channel


The capacity of the Gaussian channel is a simple function of the received SNR S.

Theorem .. The capacity of the Gaussian channel is

C = sup_{F(x): E(X^2)≤P} I(X; Y) = C(S),

where C(x) = (1/2) log(1 + x), x ≥ 0, is the Gaussian capacity function.

For low SNR (small S), C grows linearly with S, while for high SNR, it grows logarith-
mically.
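
The following brief numerical sketch (Python/NumPy, added for illustration) compares C(S) = (1/2) log(1 + S) with its low-SNR linear approximation S/(2 ln 2) and its high-SNR logarithmic approximation (1/2) log S.

    import numpy as np

    def gaussian_capacity(snr):
        # C(S) = 0.5 * log2(1 + S) bits per transmission.
        return 0.5 * np.log2(1 + snr)

    for snr in [0.01, 0.1, 1.0, 10.0, 100.0]:
        print(snr,
              gaussian_capacity(snr),
              snr / (2 * np.log(2)),    # low-SNR approximation
              0.5 * np.log2(snr))       # high-SNR approximation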
Proof of the converse. First note that the proof of the converse for the DMC with input
cost constraint in Section . applies to arbitrary (not necessarily discrete) memoryless
channels. Therefore, continuing the chain of inequalities in (.) with b(x) = x 2 , we obtain

C ≤ sup_{F(x): E(X^2)≤P} I(X; Y).

Now for any X ∼ F(x) with E(X 2 ) ≤ P,

I(X; Y) = h(Y) − h(Y | X)
        = h(Y) − h(Z | X)
        = h(Y) − h(Z)
        (a)
        ≤ (1/2) log(2πe(S + 1)) − (1/2) log(2πe)
        = C(S),

where (a) follows by the maximum differential entropy lemma in Section . with E(Y 2 ) ≤
д 2 P + 1 = S + 1. Since this inequality becomes equality if X ∼ N(0, P), we have shown
that
C ≤ sup_{F(x): E(X^2)≤P} I(X; Y) = C(S).

This completes the proof of the converse.


Proof of achievability. We extend the achievability proof for the DMC with cost con-
straint to show that C ≥ C(S). Let X ∼ N(0, P). Then, I(X; Y) = C(S). For every j =
1, 2, . . . , let [X]_j ∈ {−jΔ, −(j − 1)Δ, . . . , −Δ, 0, Δ, . . . , (j − 1)Δ, jΔ}, Δ = 1/√j, be a quan-
tized version of X, obtained by mapping X to the closest quantization point [X] j = x̂ j (X)
such that |[X] j | ≤ |X|. Clearly, E([X]2j ) ≤ E(X 2 ) = P. Let Y j = д[X] j + Z be the output
corresponding to the input [X] j and let [Y j ]k = ŷk (Y j ) be a quantized version of Y j defined
in the same manner. Now, using the achievability proof for the DMC with cost constraint,
we can show that for each j, k, any rate R < I([X] j ; [Y j ]k ) is achievable for the channel
with input [X j ] and output [Y j ]k under power constraint P.

We now show that I([X] j ; [Y j ]k ) can be made as close to I(X; Y) as desired by taking
j, k sufficiently large. First, by the data processing inequality,

I([X] j ; [Y j ]k ) ≤ I([X] j ; Y j ) = h(Y j ) − h(Z).

Since Var(Y j ) ≤ S + 1, h(Y j ) ≤ h(Y) for all j. Thus, I([X] j ; [Y j ]k ) ≤ I(X; Y ). For the other
direction, we have the following.

Lemma .. lim inf j→∞ limk→∞ I([X] j ; [Y j ]k ) ≥ I(X; Y ).

The proof of this lemma is given in Appendix A. Combining both bounds, we have

lim_{j→∞} lim_{k→∞} I([X]_j; [Y_j]_k) = I(X; Y),

which completes the proof of Theorem ..


Remark .. This discretization procedure shows how to extend the coding theorem for
a DMC to a Gaussian or any other well-behaved continuous-alphabet channel. Similar
procedures can be used to extend coding theorems for finite-alphabet multiuser channels
to their Gaussian counterparts. Hence, in subsequent chapters we will not provide formal
proofs of such extensions.

3.4.2 Minimum Energy Per Bit


In the discussion of the Gaussian channel, we assumed average power constraint P on
each transmitted codeword and found the highest reliable transmission rate under this
constraint. A “dual” formulation of this problem is to assume a given transmission rate
R and determine the minimum energy per bit needed to achieve it. This formulation can
be viewed as more natural since it leads to a fundamental limit on the energy needed to
reliably communicate one bit of information over a Gaussian channel.
Consider a (2nR , n) code for the Gaussian channel. Define the average power for the
code as

P = (1/2^{nR}) ∑_{m=1}^{2^{nR}} (1/n) ∑_{i=1}^n x_i^2(m),

and the average energy per bit for the code as E = P/R (that is, the energy per transmission
divided by bits per transmission).
Following similar steps to the converse proof for the Gaussian channel in the previous
section, we can show that for every sequence of (2nR , n) codes with average power P and
limn→∞ Pe(n) = 0, we must have
R ≤ (1/2) log(1 + g^2 P).

Substituting P = ER, we obtain the lower bound on the energy per bit E ≥ (2^{2R} − 1)/(g^2 R).

We also know that if the average power of the code is P, then any rate R < C(g^2 P) is achievable. Therefore, reliable communication at rate R with energy per bit E > (2^{2R} − 1)/(g^2 R) is possible. Hence, the energy-per-bit–rate function, that is, the minimum energy per bit needed for reliable communication at rate R, is

E_b(R) = (1/(g^2 R))(2^{2R} − 1).

This is a monotonically increasing and strictly convex function of R (see Figure .). As R tends to zero, E_b(R) converges to E_b* = (2 ln 2)/g^2, which is the minimum energy per bit needed for reliable communication over a Gaussian channel with noise power N_0/2 = 1 and gain g.

Figure .. Minimum energy per bit E_b(R) versus transmission rate R: the curve is increasing in R and approaches E_b* as R → 0.

3.4.3 Gaussian Product Channel


The Gaussian product channel depicted in Figure . consists of a set of parallel Gaussian
channels
Y j = д j X j + Z j for j ∈ [1 : d],

where д j is the gain of the j-th channel component and Z1 , Z2 , . . . , Z d are independent
zero-mean Gaussian noise components with the same average power N0 /2 = 1. We as-
sume an average transmission power constraint

(1/n) ∑_{i=1}^n ∑_{j=1}^d x_{ji}^2(m) ≤ P for m ∈ [1 : 2^{nR}].

The Gaussian product channel is a model for continuous-time (waveform) additive


Gaussian noise channels; the parallel channels represent different frequency bands, time
slots, or more generally, orthogonal signal dimensions.

Figure .. Gaussian product channel: d parallel Gaussian channels Y_j = g_j X_j + Z_j, j ∈ [1 : d].

The capacity of the Gaussian product channel is


C = max_{P_1,...,P_d: ∑_{j=1}^d P_j ≤ P} ∑_{j=1}^d C(g_j^2 P_j).     (.)

The proof of the converse follows by noting that the capacity is upper bounded as
C ≤ sup_{F(x^d): ∑_{j=1}^d E(X_j^2)≤P} I(X^d; Y^d) = sup_{F(x^d): ∑_{j=1}^d E(X_j^2)≤P} ∑_{j=1}^d I(X_j; Y_j)

and that the supremum is attained by mutually independent X j ∼ N(0, P j ), j ∈ [1 : d].


For the achievability proof, note that this bound can be achieved by the discretization
procedure for each component Gaussian channel. The constrained optimization problem
in (.) is convex and can be solved by forming the Lagrangian; see Appendix E. The
solution yields
P_j^* = [λ − 1/g_j^2]^+ = max{λ − 1/g_j^2, 0},

where the Lagrange multiplier λ is chosen to satisfy the condition

∑_{j=1}^d [λ − 1/g_j^2]^+ = P.

This optimal power allocation has the water-filling interpretation illustrated in Figure ..

Figure .. Water-filling interpretation of the optimal power allocation: power is poured over the noise-to-gain profile g_j^{−2} up to a common water level λ.

Although this solution maximizes the mutual information and thus is optimal only in
the asymptotic sense, it has been proven effective in practical subcarrier bit-loading algo-
rithms for DSL and orthogonal frequency division multiplexing (OFDM) systems.
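
A compact water-filling sketch (Python/NumPy, using bisection on the water level λ; the gains and power budget are hypothetical values) that computes P_j* = max(λ − 1/g_j^2, 0) with ∑_j P_j* = P and the resulting capacity:

    import numpy as np

    def water_filling(gains, P, tol=1e-10):
        # Noise-to-gain levels 1/g_j^2; find the water level lam by bisection.
        noise = 1.0 / np.asarray(gains, dtype=float) ** 2
        lo, hi = 0.0, noise.max() + P
        while hi - lo > tol:
            lam = (lo + hi) / 2
            if np.maximum(lam - noise, 0.0).sum() > P:
                hi = lam
            else:
                lo = lam
        power = np.maximum(lo - noise, 0.0)
        capacity = 0.5 * np.log2(1 + power / noise).sum()
        return power, capacity

    powers, C = water_filling(gains=[1.0, 0.8, 0.3], P=2.0)   # hypothetical values
    print(powers, C)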

3.5 LOSSLESS SOURCE CODING

In the previous sections, we considered reliable communication of a maximally com-


pressed information source represented by a uniformly distributed message over a noisy
channel. In this section we consider the “dual” problem of communicating (or storing)
an uncompressed source over a noiseless link (or in a memory) as depicted in Figure ..
The source sequence X n is encoded (described or compressed) into an index M at rate
R bits per source symbol, and the receiver decodes (decompresses) the index to find the
estimate (reconstruction) X̂ n of the source sequence. The lossless source coding problem
is to find the lowest compression rate in bits per source symbol such that the probability
of decoding error decays asymptotically to zero with the code block length n.
We consider the lossless source coding problem for a discrete memoryless source (DMS)
model (X , p(x)), informally referred to as X, that consists of a finite alphabet X and a pmf
p(x) over X . The DMS (X , p(x)) generates an i.i.d. random process {Xi } with Xi ∼ p X (xi ).
For example, the Bern(p) source X for p ∈ [0, 1] has a binary alphabet and the Bern(p)
pmf. It generates a Bern(p) random process {Xi }.

Figure .. Point-to-point compression system: X^n → Encoder → M → Decoder → X̂^n.



A (2nR , n) lossless source code of rate R bits per source symbol consists of
∙ an encoding function (encoder) m : X n → [1 : 2nR ) = {1, 2, . . . , 2⌊nR⌋ } that assigns an
index m(x n ) (a codeword of length ⌊nR⌋ bits) to each source n-sequence x n , and
∙ a decoding function (decoder) x̂n : [1 : 2nR ) → X n ∪ {e} that assigns an estimate
x̂n (m) ∈ X n or an error message e to each index m ∈ [1 : 2nR ).
The probability of error for a (2nR , n) lossless source code is defined as Pe(n) = P{ X̂ n ̸= X n }.
A rate R is said to be achievable if there exists a sequence of (2nR , n) codes such that
limn→∞ Pe(n) = 0 (hence the coding is required to be only asymptotically error-free). The
optimal rate R∗ for lossless source coding is the infimum of all achievable rates.

3.5.1 Lossless Source Coding Theorem


The optimal compression rate is characterized by the entropy of the source.

Theorem . (Lossless Source Coding Theorem). The optimal rate for lossless
source coding of a discrete memoryless source X is

R∗ = H(X).

For example, the optimal lossless compression rate for a Bern(p) source X is R∗ =
H(X) = H(p). To prove this theorem, we again need to verify the following two state-
ments:
∙ Achievability. For every R > R∗ = H(X) there exists a sequence of (2nR , n) codes with
limn→∞ Pe(n) = 0. We prove achievability using properties of typical sequences. Two
alternative proofs will be given in Sections .. and ...
∙ Converse. For every sequence of (2nR , n) codes with limn→∞ Pe(n) = 0, the source cod-
ing rate R ≥ R∗ = H(X). The proof uses Fano’s inequality and basic properties of en-
tropy and mutual information.
We now prove each statement.

3.5.2 Proof of Achievability


For simplicity of presentation, assume nR is an integer. For є > 0, let R = H(X) + δ(є)
with δ(є) = єH(X). Hence, |Tє(n) | ≤ 2n(H(X)+δ(є)) = 2nR .
Encoding. Assign a distinct index m(x n ) to each x n ∈ Tє(n) . Assign m = 1 to all x n ∉ Tє(n) .
Decoding. Upon receiving the index m, the decoder declares x̂n = x n (m) for the unique
x n (m) ∈ Tє(n) .
Analysis of the probability of error. All typical sequences are recovered error-free. Thus,
the probability of error is Pe(n) = P󶁁X n ∉ Tє(n) 󶁑, which tends to zero as n → ∞. This com-
pletes the proof of achievability.
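
As a numerical sanity check on this counting argument (a Python sketch for a Bern(p) source; the values p = 0.2, є = 0.1, n = 200 are arbitrary choices), the rate needed to index the typical sequences is close to H(X):

    from math import comb, log2, ceil, floor

    p, eps, n = 0.2, 0.1, 200
    H = -p * log2(p) - (1 - p) * log2(1 - p)

    # Typical sequences have a fraction of ones within (1 +/- eps) * p.
    # (For p <= 1/2 this also guarantees the condition on the fraction of zeros.)
    lo, hi = ceil(n * p * (1 - eps)), floor(n * p * (1 + eps))
    size = sum(comb(n, k) for k in range(lo, hi + 1))

    print(log2(size) / n, H, (1 + eps) * H)   # indexing rate vs H(X) and H(X) + delta(eps)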

3.5.3 Proof of the Converse


Given a sequence of (2nR , n) codes with limn→∞ Pe(n) = 0, let M be the random variable
corresponding to the index generated by the encoder. By Fano’s inequality,

H(X n |M) ≤ H(X n | X̂ n ) ≤ 1 + nPe(n) log |X | = nєn ,

where єn tends to zero as n → ∞ by the assumption that limn→∞ Pe(n) = 0. Now consider

nR ≥ H(M)
= I(X n ; M)
= nH(X) − H(X n |M)
≥ nH(X) − nєn .

By taking n → ∞, we conclude that R ≥ H(X). This completes the converse proof of the
lossless source coding theorem.

3.6 LOSSY SOURCE CODING

Recall the compression system shown in Figure .. Suppose that the source alphabet is
continuous, for example, the source is a sensor that outputs an analog signal, then lossless
reconstruction of the source sequence would require an infinite transmission rate! This
motivates the lossy compression setup we study in this section, where the reconstruction
is only required to be close to the source sequence according to some fidelity criterion (or
distortion measure). In the scalar case, where each symbol is separately compressed, this
lossy compression setup reduces to scalar quantization (analog-to-digital conversion),
which often employs a mean squared error fidelity criterion. As in channel coding, how-
ever, it turns out that performing the lossy compression in blocks (vector quantization)
can achieve better performance.
Unlike the lossless source coding setup where there is an optimal compression rate, the
lossy source coding setup involves a tradeoff between the rate and the desired distortion.
The problem is to find the limit on such tradeoff, which we refer to as the rate–distortion
function. Note that this function is the source coding equivalent of the capacity–cost
function in channel coding.
Although the motivation for lossy compression comes from sources with continuous
alphabets, we first consider the problem for a DMS (X , p(x)) as defined in the previous
section. We assume the following per-letter distortion criterion. Let X̂ be a reconstruction
alphabet and define a distortion measure as a mapping

d : X × X̂ → [0, ∞).

This mapping measures the cost of representing the symbol x by the symbol x̂. The average
distortion between x n and x̂n is defined as
d(x^n, x̂^n) = (1/n) ∑_{i=1}^n d(x_i, x̂_i).

For example, when X = X̂ , the Hamming distortion measure (loss) is the indicator for an
error, i.e.,

d(x, x̂) = 1 if x ≠ x̂, and d(x, x̂) = 0 if x = x̂.

Thus, d(x̂n , x n ) is the fraction of symbols in error (bit error rate for the binary alphabet).
Formally, a (2nR , n) lossy source code consists of
∙ an encoder that assigns an index m(x n ) ∈ [1 : 2nR ) to each sequence x n ∈ X n , and
∙ a decoder that assigns an estimate x̂n (m) ∈ X̂ n to each index m ∈ [1 : 2nR ).
The set C = {x̂n (1), . . . , x̂n (2⌊nR⌋ )} constitutes the codebook.
The expected distortion associated with a (2nR , n) lossy source code is defined as

E(d(X^n, X̂^n)) = ∑_{x^n} p(x^n) d(x^n, x̂^n(m(x^n))).

A rate–distortion pair (R, D) is said to be achievable if there exists a sequence of (2nR , n)


codes with
lim sup_{n→∞} E(d(X^n, X̂^n)) ≤ D.     (.)

The rate–distortion function R(D) is the infimum of rates R such that (R, D) is achievable.

3.6.1 Lossy Source Coding Theorem


Shannon showed that mutual information is again the canonical quantity that character-
izes the rate–distortion function.

Theorem . (Lossy Source Coding Theorem). The rate–distortion function for a
̂ is
DMS X and a distortion measure d(x, x)

R(D) = min ̂
I(X; X)
̂
p(x̂|x):E(d(X, X))≤D

for D ≥ Dmin = minx̂(x) E[d(X, x̂(X))].

Similar to the capacity–cost function in Section ., the rate–distortion function R(D)
is nonincreasing, convex, and continuous in D ≥ Dmin (see Figure .). Unless noted
otherwise, we will assume throughout the book that Dmin = 0, that is, for every symbol
x ∈ X there exists a reconstruction symbol x̂ ∈ X̂ such that d(x, x̂) = 0.
Example . (Bernoulli source with Hamming distortion). The rate–distortion func-
tion for a Bern(p) source X, p ∈ [0, 1/2], and Hamming distortion measure is

R(D) = H(p) − H(D) for 0 ≤ D < p, and R(D) = 0 for D ≥ p.

Figure .. Graph of a typical rate–distortion function. Note that R(D) = 0 for D ≥ D_max = min_{x̂} E(d(X, x̂)) and R(D_min) ≤ H(X).

To show this, recall that

R(D) = min_{p(x̂|x): E(d(X,X̂))≤D} I(X; X̂).

If D ≥ p, R(D) = 0 by simply taking X̂ = 0. If D < p, we find a lower bound on R(D)


and then show that there exists a test channel p(x̂|x) that attains it. For any joint pmf that satisfies the distortion constraint E(d(X, X̂)) = P{X ≠ X̂} ≤ D, we have

I(X; X̂) = H(X) − H(X | X̂)
        = H(p) − H(X ⊕ X̂ | X̂)
        ≥ H(p) − H(X ⊕ X̂)
        (a)
        ≥ H(p) − H(D),

where (a) follows since P{X ≠ X̂} ≤ D. Thus

R(D) ≥ H(p) − H(D).

It can be easily shown that this bound is attained by the backward BSC (with X̂ and Z
independent) shown in Figure ., and the associated expected distortion is D.

Figure .. The backward BSC (test channel) that attains the rate–distortion function R(D): X = X̂ ⊕ Z, where X̂ ∼ Bern((p − D)/(1 − 2D)) and Z ∼ Bern(D) are independent, so that X ∼ Bern(p).
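
A small numerical sketch (Python, added for illustration with an arbitrary p = 0.3) of the Bernoulli–Hamming rate–distortion function:

    import math

    def h2(q):
        # Binary entropy in bits.
        return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

    def bernoulli_rd(p, D):
        # R(D) = H(p) - H(D) for 0 <= D < p, and 0 for D >= p.
        return h2(p) - h2(D) if D < p else 0.0

    p = 0.3
    for D in [0.0, 0.05, 0.1, 0.2, 0.3, 0.4]:
        print(D, bernoulli_rd(p, D))

At D = 0 this recovers the lossless rate H(p), and the required rate drops to zero once D reaches p.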

3.6.2 Proof of the Converse


The proof of the lossy source coding theorem again requires establishing achievability and
the converse. We first prove the converse.
We need to show that for any sequence of (2nR , n) codes with

lim sup_{n→∞} E(d(X^n, X̂^n)) ≤ D,     (.)

we must have R ≥ R(D). Consider

nR ≥ H(M)
   ≥ I(M; X^n)
   ≥ I(X̂^n; X^n)
   = ∑_{i=1}^n I(X_i; X̂^n | X^{i−1})
   (a)
   = ∑_{i=1}^n I(X_i; X̂^n, X^{i−1})
   ≥ ∑_{i=1}^n I(X_i; X̂_i)
   (b)
   ≥ ∑_{i=1}^n R(E[d(X_i, X̂_i)])
   (c)
   ≥ nR(E[d(X^n, X̂^n)]),

where (a) follows by the memoryless property of the source, (b) follows by the definition of R(D) = min I(X; X̂), and (c) follows by the convexity of R(D). Since R(D) is continuous and nonincreasing in D, it follows from the bound on distortion in (.) that

R ≥ lim sup_{n→∞} R(E[d(X^n, X̂^n)]) ≥ R(lim sup_{n→∞} E[d(X^n, X̂^n)]) ≥ R(D).

This completes the proof of the converse.

3.6.3 Proof of Achievability


The proof uses random coding and joint typicality encoding. Assume that nR is an integer.
Random codebook generation. Fix the conditional pmf p(x̂|x) that attains R(D/(1 + є)),
where D is the desired distortion, and let p(x̂) = ∑x p(x)p(x̂|x). Randomly and indepen-
dently generate 2nR sequences x̂n (m), m ∈ [1 : 2nR ], each according to ∏ni=1 p X̂ (x̂i ). These
sequences constitute the codebook C, which is revealed to the encoder and the decoder.
Encoding. We use joint typicality encoding. Given a sequence x n , find an index m such
that (x n , x̂n (m)) ∈ Tє(n) . If there is more than one such index, choose the smallest one
among them. If there is no such index, set m = 1.

Decoding. Upon receiving the index m, the decoder sets the reconstruction sequence
x̂n = x̂n (m).
Analysis of expected distortion. Let є 󳰀 < є and M be the index chosen by the encoder.
We bound the distortion averaged over the random choice of the codebook C. Define the
“encoding error” event
E = {(X^n, X̂^n(M)) ∉ T_є^{(n)}},

and consider the events

E_1 = {X^n ∉ T_{є′}^{(n)}},
E_2 = {X^n ∈ T_{є′}^{(n)}, (X^n, X̂^n(m)) ∉ T_є^{(n)} for all m ∈ [1 : 2^{nR}]}.

Then by the union of events bound,

P(E) ≤ P(E1 ) + P(E2 ).

We bound each term. By the LLN, the first term P(E1 ) tends to zero as n → ∞. Consider
the second term

P(E_2) = ∑_{x^n∈T_{є′}^{(n)}} p(x^n) P{(x^n, X̂^n(m)) ∉ T_є^{(n)} for all m | X^n = x^n}
       = ∑_{x^n∈T_{є′}^{(n)}} p(x^n) ∏_{m=1}^{2^{nR}} P{(x^n, X̂^n(m)) ∉ T_є^{(n)}}
       = ∑_{x^n∈T_{є′}^{(n)}} p(x^n) (P{(x^n, X̂^n(1)) ∉ T_є^{(n)}})^{2^{nR}}.

Since x^n ∈ T_{є′}^{(n)} and X̂^n(1) ∼ ∏_{i=1}^n p_{X̂}(x̂_i), it follows by the second part of the joint typicality lemma in Section . that for n sufficiently large

P{(x^n, X̂^n(1)) ∈ T_є^{(n)}} ≥ 2^{−n(I(X;X̂)+δ(є))},

where δ(є) tends to zero as є → 0. Since (1 − x)^k ≤ e^{−kx} for x ∈ [0, 1] and k ≥ 0, we have

∑_{x^n∈T_{є′}^{(n)}} p(x^n) (P{(x^n, X̂^n(1)) ∉ T_є^{(n)}})^{2^{nR}} ≤ (1 − 2^{−n(I(X;X̂)+δ(є))})^{2^{nR}}
        ≤ exp(−2^{nR} · 2^{−n(I(X;X̂)+δ(є))})
        = exp(−2^{n(R−I(X;X̂)−δ(є))}),

which tends to zero as n → ∞ if R > I(X; X̂) + δ(є).


Now, by the law of total expectation and the typical average lemma,

E_{C,X^n}[d(X^n, X̂^n(M))] = P(E) E_{C,X^n}[d(X^n, X̂^n(M)) | E] + P(E^c) E_{C,X^n}[d(X^n, X̂^n(M)) | E^c]
        ≤ P(E) d_max + P(E^c)(1 + є) E(d(X, X̂)),

where d_max = max_{(x,x̂)∈X×X̂} d(x, x̂). Hence, by the assumption on the conditional pmf p(x̂|x) that E(d(X, X̂)) ≤ D/(1 + є),

lim sup_{n→∞} E_{C,X^n}[d(X^n, X̂^n(M))] ≤ D

if R > I(X; X̂) + δ(є) = R(D/(1 + є)) + δ(є). Since the expected distortion (averaged over
codebooks) is asymptotically ≤ D, there must exist a sequence of codes with expected
distortion asymptotically ≤ D, which proves the achievability of the rate–distortion pair
(R(D/(1 + є)) + δ(є), D). Finally, by the continuity of R(D) in D, it follows that the achiev-
able rate R(D/(1 + є)) + δ(є) converges to R(D) as є → 0, which completes the proof of
achievability.
Remark .. The above proof can be extended to unbounded distortion measures, pro-
vided that there exists a symbol x̂0 such that d(x, x̂0 ) < ∞ for every x. In this case,
encoding is modified so that x̂n = (x̂0 , . . . , x̂0 ) whenever joint typicality encoding fails.
For example, for an erasure distortion measure with X = {0, 1} and X̂ = {0, 1, e}, where
d(0, 0) = d(1, 1) = 0, d(0, e) = d(1, e) = 1, and d(0, 1) = d(1, 0) = ∞, we have x̂0 = e.
When X ∼ Bern(1/2), it can be easily shown that R(D) = 1 − D.

3.6.4 Lossless Source Coding Revisited


We show that the lossless source coding theorem can be viewed as a corollary of the lossy
source coding theorem. This leads to an alternative random coding achievability proof of
the lossless source coding theorem. Consider the lossy source coding problem for a DMS
X, reconstruction alphabet X̂ = X , and Hamming distortion measure. Setting D = 0, we
obtain
R(0) = min_{p(x̂|x): E(d(X,X̂))=0} I(X; X̂) = I(X; X) = H(X),

which is equal to the optimal lossless source coding rate R∗ as we have already seen in the
lossless source coding theorem.
Here we prove that operationally R∗ = R(0) without resorting to the fact that R ∗ =
H(X). To prove the converse (R∗ ≥ R(0)), note that the converse for the lossy source
coding theorem under the above conditions implies that for any sequence of (2nR , n) codes
if the average symbol error probability
(1/n) ∑_{i=1}^n P{X̂_i ≠ X_i}

tends to zero as n → ∞, then R ≥ R(0). Since the average symbol error probability is
smaller than or equal to the block error probability P{ X̂ n ̸= X n }, this also establishes the
converse for the lossless case.
To prove achievability (R ∗ ≤ R(0)), we can still use random coding and joint typicality
encoding! We fix a test channel

1 if x = x̂,
p(x̂ |x) = 󶁇
0 otherwise,
and define T_є^{(n)}(X, X̂) in the usual way. Then, (x^n, x̂^n) ∈ T_є^{(n)} implies that x^n = x̂^n. Fol-
lowing the achievability proof of the lossy source coding theorem, we generate a random
code x̂n (m), m ∈ [1 : 2nR ], and use the same encoding and decoding procedures. Then,
the probability of decoding error averaged over codebooks is upper bounded as

P(E) ≤ P{(X^n, X̂^n) ∉ T_є^{(n)}},

which tends to zero as n → ∞ if R > I(X; X̂) + δ(є) = R(0) + δ(є). Thus there exists a sequence of (2^{nR}, n) lossless source codes with lim_{n→∞} P_e^{(n)} = 0.

Remark .. We already know how to construct a sequence of asymptotically optimal


lossless source codes by uniquely labeling each typical sequence. The above proof, how-
ever, shows that random coding can be used to establish all point-to-point communica-
tion coding theorems. Such unification shows the power of random coding and is aes-
thetically pleasing. More importantly, the technique of specializing a lossy source coding
theorem to the lossless case will prove crucial later in Chapters  and .

3.7 COVERING LEMMA

The covering lemma generalizes the bound on the probability of the encoding error event
E in the achievability proof of the lossy source coding theorem. The lemma will be used
in the achievability proofs of several multiuser source and channel coding theorems.
Recall that in the bound on P(E), we had a fixed conditional pmf p(x̂|x) and a source
X ∼ p(x). As illustrated in Figure ., we considered a set of 2nR i.i.d. reconstruction se-
quences X̂ n (m), m ∈ [1 : 2nR ], each distributed according to ∏ni=1 p X̂ (x̂i ) and an indepen-
dently generated source sequence X n ∼ ∏ni=1 p X (xi ). We showed that the probability that
(X^n, X̂^n(m)) ∈ T_є^{(n)} for some m ∈ [1 : 2^{nR}] tends to one as n → ∞ if R > I(X; X̂) + δ(є).
The following lemma extends this bound by assuming that X^n and the set of code-
words are conditionally independent given a sequence U n with the condition that U n and
X n are jointly typical with high probability. As such, the covering lemma is a dual to the
packing lemma in which we do not wish any of the untransmitted (independent) code-
words to be jointly typical with the received sequence given U n .

Lemma . (Covering Lemma). Let (U , X, X) ̂ ∼ p(u, x, x̂) and є 󳰀 < є. Let (U n , X n ) ∼
p(u , x ) be a pair of random sequences with limn→∞ P{(U n , X n ) ∈ Tє(n)
n n
󳰀 (U , X)} = 1,

̂
and let X (m), m ∈ A, where |A| ≥ 2 , be random sequences, conditionally indepen-
n nR

dent of each other and of X n given U n , each distributed according to ∏ni=1 p X|U
̂ (x ̂i |ui ).
Then, there exists δ(є) that tends to zero as є → 0 such that

lim P󶁁(U n , X n , X̂ n (m)) ∉ Tє(n) for all m ∈ A󶁑 = 0,


n→∞

̂ ) + δ(є).
if R > I(X; X|U

Figure .. Illustration of the setup for the bound on P(E): the i.i.d. reconstruction sequences X̂^n(1), . . . , X̂^n(2^{nR}) and the independently generated source sequence X^n.

Proof. Define the event


E_0 = {(U^n, X^n) ∉ T_{є′}^{(n)}}.

Then, the probability of the event of interest can be upper bounded as

P(E) ≤ P(E0 ) + P(E ∩ E0c ).

By the condition of the lemma, P(E0 ) tends to zero as n → ∞. For the second term, recall
from the joint typicality lemma that if (u^n, x^n) ∈ T_{є′}^{(n)}, then for n sufficiently large,

P{(u^n, x^n, X̂^n(m)) ∈ T_є^{(n)} | U^n = u^n, X^n = x^n} = P{(u^n, x^n, X̂^n(m)) ∈ T_є^{(n)} | U^n = u^n}
        ≥ 2^{−n(I(X;X̂|U)+δ(є))}

for each m ∈ A for some δ(є) that tends to zero as є → 0. Hence, for n sufficiently large,

P(E ∩ E_0^c) = ∑_{(u^n,x^n)∈T_{є′}^{(n)}} p(u^n, x^n) P{(u^n, x^n, X̂^n(m)) ∉ T_є^{(n)} for all m | U^n = u^n, X^n = x^n}
        = ∑_{(u^n,x^n)∈T_{є′}^{(n)}} p(u^n, x^n) ∏_{m∈A} P{(u^n, x^n, X̂^n(m)) ∉ T_є^{(n)} | U^n = u^n}
        ≤ (1 − 2^{−n(I(X;X̂|U)+δ(є))})^{|A|}
        ≤ exp(−|A| · 2^{−n(I(X;X̂|U)+δ(є))})
        ≤ exp(−2^{n(R−I(X;X̂|U)−δ(є))}),

which tends to zero as n → ∞, provided R > I(X; X̂ | U) + δ(є). This completes the proof.
Remark .. The covering lemma continues to hold even when independence among
all the sequences X̂ n (m), m ∈ A, is replaced with pairwise independence; see the mutual
covering lemma in Section ..

3.8 QUADRATIC GAUSSIAN SOURCE CODING

We motivated the need for lossy source coding by considering compression of continuous-
alphabet sources. In this section, we study lossy source coding of a Gaussian source, which
is an important example of a continuous-alphabet source and is often used to model real-
world analog signals such as video and speech.
Let X be a WGN(P) source, that is, a source that generates a WGN(P) random process
{X_i}. We consider a lossy source coding problem for the source X with quadratic (squared error) distortion measure d(x, x̂) = (x − x̂)^2 on ℝ^2. The rate–distortion function for this
quadratic Gaussian source coding problem can be defined in the exact same manner as for
the DMS case. Furthermore, Theorem . with the minimum over arbitrary test channels
applies and the rate–distortion function can be expressed simply in terms of the power-
to-distortion ratio.

Theorem .. The rate–distortion function for a WGN(P) source with squared error
distortion measure is

R(D) = inf_{F(x̂|x): E((X−X̂)^2)≤D} I(X; X̂) = R(P/D),

where R(x) = (1/2)[log x]+ is the quadratic Gaussian rate function.
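
A brief numerical sketch (Python, added for illustration) of R(D) = (1/2)[log(P/D)]^+ and its inverse, the distortion–rate function D(R) = P 2^{−2R}:

    import math

    def rate_distortion(P, D):
        # R(D) = max(0.5 * log2(P / D), 0)
        return max(0.5 * math.log2(P / D), 0.0)

    def distortion_rate(P, R):
        # D(R) = P * 2^(-2R): each extra bit per symbol cuts the distortion by a factor of 4.
        return P * 2 ** (-2 * R)

    P = 1.0
    for R in [0.5, 1.0, 2.0]:
        D = distortion_rate(P, R)
        print(R, D, rate_distortion(P, D))   # the last column recovers R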

Proof of the converse. It is easy to see that the converse proof for the lossy source coding
theorem extends to continuous sources with well-defined density such as Gaussian, and
we have
R(D) ≥ inf_{F(x̂|x): E((X−X̂)^2)≤D} I(X; X̂).     (.)

For D ≥ P, we set X̂ = E(X) = 0; thus R(D) = 0. For 0 ≤ D < P, we first find a lower
bound on the infimum in (.) and then show that there exists a test channel that attains
it. Consider

I(X; X̂) = h(X) − h(X | X̂)
        = (1/2) log(2πeP) − h(X − X̂ | X̂)
        ≥ (1/2) log(2πeP) − h(X − X̂)
        ≥ (1/2) log(2πeP) − (1/2) log(2πe E[(X − X̂)^2])
        (a)
        ≥ (1/2) log(2πeP) − (1/2) log(2πeD)
        = (1/2) log(P/D),

where (a) follows since E((X − X̂)^2) ≤ D. It is easy to show that this bound is attained by
the backward Gaussian test channel shown in Figure . and that the associated expected
distortion is D.

Figure .. The backward Gaussian test channel that attains the minimum in (.): X = X̂ + Z, where X̂ ∼ N(0, P − D) and Z ∼ N(0, D) are independent, so that X ∼ N(0, P).

Proof of achievability. We extend the achievability proof for the DMS to the case of a
Gaussian source with quadratic distortion measure by using the following discretization
procedure. Let D be the desired distortion and let (X, X̂) be a pair of jointly Gaussian random variables attaining I(X; X̂) = R((1 − 2є)D) with distortion E((X − X̂)^2) = (1 − 2є)D. Let [X] and [X̂] be finitely quantized versions of X and X̂, respectively, such that

E(([X] − [X̂])^2) ≤ (1 − є)^2 D,
E((X − [X])^2) ≤ є^2 D.     (.)

Then by the data processing inequality,

I([X]; [X̂]) ≤ I(X; X̂) = R((1 − 2є)D).

Now, by the achievability proof for the DMS [X] and reconstruction [X̂], there exists a sequence of (2^{nR}, n) rate–distortion codes with asymptotic distortion

lim sup_{n→∞} E(d([X]^n, [X̂]^n)) ≤ (1 − є)^2 D,     (.)

if R > R((1 − 2є)D) ≥ I([X]; [X̂]). We use this sequence of codes for the original source X by mapping each x^n to the codeword [x̂]^n that is assigned to [x]^n. Then

lim sup_{n→∞} E(d(X^n, [X̂]^n)) = lim sup_{n→∞} (1/n) ∑_{i=1}^n E((X_i − [X̂]_i)^2)
        = lim sup_{n→∞} (1/n) ∑_{i=1}^n E(((X_i − [X_i]) + ([X_i] − [X̂]_i))^2)
        (a)
        ≤ lim sup_{n→∞} (1/n) ∑_{i=1}^n (E((X_i − [X_i])^2) + E(([X_i] − [X̂]_i)^2))
          + lim sup_{n→∞} (2/n) ∑_{i=1}^n √(E((X_i − [X_i])^2) E(([X_i] − [X̂]_i)^2))
        (b)
        ≤ є^2 D + (1 − є)^2 D + 2є(1 − є)D
        = D,

where (a) follows by Cauchy’s inequality and (b) follows by (.) and (.), and Jensen’s
inequality. Thus, R > R((1 − 2є)D) is achievable for distortion D. Using the continuity of
R(D) completes the proof of achievability.

3.9 JOINT SOURCE–CHANNEL CODING

In previous sections we studied limits on communication of compressed sources over


noisy channels and uncompressed sources over noiseless channels. In this section, we
study the more general joint source–channel coding setup depicted in Figure .. The
sender wishes to communicate k symbols of an uncompressed source U over a DMC
p(y|x) in n transmissions so that the receiver can reconstruct the source symbols with a
prescribed distortion D. A straightforward scheme would be to perform separate source
and channel encoding and decoding. Is this separation scheme optimal? Can we do better
by allowing more general joint source–channel encoding and decoding?

Figure .. Joint source–channel coding setup: U^k → Encoder → X^n → Channel → Y^n → Decoder → Û^k.

It turns out that separate source and channel coding is asymptotically optimal for
sending a DMS over a DMC, and hence the fundamental limit depends only on the rate–
distortion function of the source and the capacity of the channel.
Formally, let U be a DMS and d(u, û ) be a distortion measure with rate–distortion
function R(D) and p(y|x) be a DMC with capacity C. A (|U |k , n) joint source–channel
code of rate r = k/n consists of
∙ an encoder that assigns a codeword x^n(u^k) ∈ X^n to each sequence u^k ∈ U^k, and
∙ a decoder that assigns an estimate û^k(y^n) ∈ Û^k to each sequence y^n ∈ Y^n.
A rate–distortion pair (r, D) is said to be achievable if there exists a sequence of (|U |k , n)
joint source–channel codes of rate r such that

lim sup_{k→∞} E[d(U^k, Û^k(Y^n))] ≤ D.

Shannon established the following fundamental limit on joint source–channel coding.

Theorem . (Source–Channel Separation Theorem). Given a DMS U and a distor-


tion measure d(u, û ) with rate–distortion function R(D) and a DMC p(y|x) with ca-
pacity C, the following statements hold:
∙ If rR(D) < C, then (r, D) is achievable.
∙ If (r, D) is achievable, then rR(D) ≤ C.

Proof of achievability. We use separate lossy source coding and channel coding.

∙ Source coding: For any є > 0, there exists a sequence of lossy source codes with rate
R(D/(1 + є)) + δ(є) that achieve expected distortion less than or equal to D. We treat
the index for each code in the sequence as a message to be sent over the channel.
∙ Channel coding: The sequence of source indices can be reliably communicated over
the channel if r(R(D/(1 + є)) + δ(є)) ≤ C − δ 󳰀 (є).
The source decoder finds the reconstruction sequence corresponding to the received
index. If the channel decoder makes an error, the distortion is upper bounded by dmax .
Because the probability of error tends to zero as n → ∞, the overall expected distortion
is less than or equal to D.
Proof of the converse. We wish to show that if a sequence of codes achieves the rate–
distortion pair (r, D), then rR(D) ≤ C. By the converse proof of the lossy source coding
theorem, we know that
R(D) ≤ (1/k) I(U^k; Û^k).

Now, by the data processing inequality,

(1/k) I(U^k; Û^k) ≤ (1/k) I(U^k; Y^n).

Following similar steps to the converse proof for the DMC, we have
(1/k) I(U^k; Y^n) ≤ (1/k) ∑_{i=1}^n I(X_i; Y_i) ≤ (1/r) C.

Combining the above inequalities completes the proof of the converse.

Remark 3.12. Since the converse of the channel coding theorem holds when causal feed-
back is present (see Section ..), the separation theorem continues to hold with feedback.
Remark 3.13. As in Remark ., there are cases where rR(D) = C and the rate–distortion
pair (r, D) is achievable via joint source–channel coding; see Example .. However, if
rR(D) > C, the rate–distortion pair (r, D) is not achievable. Hence, we informally say
that source–channel separation holds in general for sending a DMS over a DMC.
Remark 3.14. As a special case of joint source–channel coding, consider the problem of
sending U over a DMC losslessly, i.e., limk→∞ P{Û k ̸= U k } = 0. The separation theorem
holds with the requirement that rH(U ) ≤ C.
Remark 3.15. The separation theorem can be extended to sending an arbitrary stationary
ergodic source over a DMC.
Remark 3.16. As we will see in Chapter , source–channel separation does not hold in
general for communicating multiple sources over multiuser channels, that is, even in the
asymptotic regime, it may be beneficial to leverage the structure of the source and channel
jointly rather than separately.

3.9.1 Uncoded Transmission


Sometimes optimal joint source–channel coding is simpler than separate source and chan-
nel coding. This is illustrated in the following.
Example .. Consider communicating a Bern(1/2) source over a BSC(p) at rate r = 1
with Hamming distortion less than or equal to D. The separation theorem shows that
1 − H(D) < 1 − H(p), or equivalently, D > p, can be achieved using separate source and
channel coding. More simply, we can transmit the binary sequence over the channel with-
out any coding and achieve average distortion D = p !
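
A tiny numerical check of this comparison (Python; the value p = 0.1 is an arbitrary choice):

    import math

    def h2(q):
        return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

    p = 0.1
    C = 1 - h2(p)                     # BSC capacity
    for D in [0.05, 0.1, 0.11, 0.2]:
        R_D = 1 - h2(D)               # rate-distortion function of the Bern(1/2) source
        print(D, R_D < C)             # separation (r = 1) requires R(D) < C, i.e., D > p
    print("uncoded transmission achieves D =", p)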

Similar uncoded transmission is optimal also for communicating a Gaussian source


over a Gaussian channel with quadratic distortion (with proper scaling to satisfy the power
constraint); see Problem ..
Remark .. In general, we have the following condition for the optimality of uncoded
transmission. A DMS U can be communicated over a DMC p(y|x) uncoded if X ∼ pU (x)
attains the capacity C = max p(x) I(X; Y) of the channel and the test channel pY|X (û |u)
attains the rate–distortion function R(D) = min p(û|u):E(d(U ,Û ))≤D I(U ; Û ) of the source. In
this case, C = R(D).

SUMMARY

∙ Point-to-point communication system architecture


∙ Discrete memoryless channel (DMC), e.g., BSC and BEC
∙ Coding theorem: achievability and the converse
∙ Channel capacity is the limit on channel coding
∙ Random codebook generation
∙ Joint typicality decoding
∙ Packing lemma
∙ Feedback does not increase the capacity of a DMC
∙ Capacity with input cost
∙ Gaussian channel:
∙ Capacity with average power constraint is achieved via Gaussian codes
∙ Extending the achievability proof from discrete to Gaussian
∙ Minimum energy per bit
∙ Water filling
∙ Discrete memoryless source (DMS)

∙ Entropy is the limit on lossless source coding


∙ Joint typicality encoding
∙ Covering lemma
∙ Rate–distortion function is the limit on lossy source coding
∙ Rate–distortion function for Gaussian source with quadratic distortion
∙ Lossless source coding theorem is a corollary of lossy source coding theorem
∙ Source–channel separation
∙ Uncoded transmission can be optimal

BIBLIOGRAPHIC NOTES

The channel coding theorem was first proved in Shannon (). There are alternative proofs of achievability for this theorem, including Feinstein’s () maximal coding theorem and Gallager’s () random coding exponent technique, which yield stronger results. For example, it can be shown (Gallager ) that the
probability of error decays exponentially fast in the block length and the random coding
exponent technique gives a very good bound on the optimal error exponent (reliability
function) for the DMC. These proofs, however, do not extend easily to many multiuser
channel and source coding problems. In comparison, the current proof (Forney ,
Cover b), which is based on Shannon’s original arguments, is much simpler and can
be readily extended to more complex settings. Hence we will adopt random codebook
generation and joint typicality decoding throughout.
The achievability proof of the channel coding theorem for the BSC using a random
linear code is due to Elias (). Even though random linear codes allow for computa-
tionally efficient encoding (by simply multiplying the message by a generator matrix G),
decoding still requires an exponential search, which limits its practical value. This prob-
lem can be mitigated by considering a linear code ensemble with special structures, such
as Gallager’s () low density parity check (LDPC) codes, which have efficient decoding
algorithms and achieve rates close to capacity (Richardson and Urbanke ). A more
recently developed class of capacity-achieving linear codes is polar codes (Arıkan ),
which involve an elegant information theoretic low-complexity decoding algorithm and
can be applied also to lossy compression settings (Korada and Urbanke ). Linear
codes for the BSC or BEC are examples of structured codes. Other examples include lat-
tice codes for the Gaussian channel, which have been shown to achieve the capacity by
Erez and Zamir (); see Zamir () for a survey of recent developments.
The converse of the channel coding theorem states that if R > C, then Pe(n) is bounded
away from zero as n → ∞. This is commonly referred to as the weak converse. In compar-
ison, the strong converse (Wolfowitz ) states that if R > C, then limn→∞ Pe(n) = 1. A
similar statement holds for the lossless source coding theorem. However, except for a few

cases to be discussed later, it appears to be difficult to prove the strong converse for most
multiuser settings. As such, we only present weak converse proofs in our main exposition.
The capacity formula for the Gaussian channel under average power constraint in The-
orem . is due to Shannon (). The achievability proof using the discretization proce-
dure follows McEliece (). Alternative proofs of achievability for the Gaussian channel
can be found in Gallager () and Cover and Thomas (). The discrete-time Gauss-
ian channel is the model for a continuous-time (waveform) bandlimited Gaussian channel
with bandwidth W = 1/2, noise power spectral density (psd) N0 /2, average transmission
power P (area under psd of signal), and channel gain д. If the channel has bandwidth W,
then it is equivalent to 2W parallel discrete-time Gaussian channels (per second) and the
capacity (see, for example, Wyner () and Slepian ()) is

C = W log(1 + д²P/(W N0)) bits/second.

For a wideband channel, the capacity C converges to (S/2) log e as W → ∞, where S = 2д²P/N0 . Thus the capacity grows linearly with S and can be achieved via a simple binary
code as shown by Golay (). The minimum energy per bit for the Gaussian channel
also first appeared in this paper. The minimum energy per bit can be also viewed as a
special case of the reciprocal of the capacity per unit cost studied by Csiszár and Körner
(b, p. ) and Verdú (). The capacity of the spectral Gaussian channel, which is
the continuous counterpart of the Gaussian product channel, and its water-filling solution
are due to Shannon (a).
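The behavior of this formula in W can be checked numerically. The sketch below (in Python, with illustrative values of д, P, and N0 chosen here) evaluates C = W log(1 + д²P/(W N0 )) for increasing bandwidth and compares it with the wideband limit (S/2) log e.

import numpy as np

g, P, N0 = 1.0, 1.0, 1.0            # illustrative channel gain, power, and noise psd level
S = 2 * g**2 * P / N0               # SNR parameter used in the wideband limit

def C_bandlimited(W):
    # capacity in bits/second of the bandlimited Gaussian channel
    return W * np.log2(1 + g**2 * P / (W * N0))

for W in [1, 10, 100, 1000, 10000]:
    print(f"W = {W:6d}  C = {C_bandlimited(W):.4f} bits/s")
print("wideband limit (S/2) log2(e) =", S / 2 * np.log2(np.e))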
The lossless source coding theorem was first proved in Shannon (). In many ap-
plications, one cannot afford to have any errors introduced by compression. Error-free
compression (P{X n ̸= X̂ n } = 0) for fixed-length codes, however, requires that R ≥ log |X |.
Using variable-length codes, Shannon () also showed that error-free compression is
possible if the average rate of the code is larger than the entropy H(X). Hence, the limit
on the average achievable rate is the same for both lossless and error-free compression.
This is not true in general for distributed coding of correlated sources; see Bibliographic
Notes in Chapter .
The lossy source coding theorem was first proved in Shannon (), following an
earlier result for the quadratic Gaussian case in Shannon (). The current achievabil-
ity proof of the quadratic Gaussian lossy source coding theorem follows McEliece ().
There are several other ways to prove achievability for continuous sources and unbounded
distortion measures (Berger , Dunham , Bucklew , Cover and Thomas ).
As an alternative to the expected distortion criterion in (.), several authors have con-
sidered the stronger criterion

lim_{n→∞} P{d(X n , x̂ n (m(X n ))) > D} = 0

in the definition of achievability of a rate–distortion pair (R, D). The lossy source cod-
ing theorem and its achievability proof in Section . continue to hold for this alterna-
tive distortion criterion. In the other direction, a strong converse—if R < R(D), then
P{d(X n , X̂ n ) ≤ D} tends to zero as n → ∞—can be established (Csiszár and Körner b,
Theorem .) that implies the converse for the expected distortion criterion.
The lossless source coding theorem can be extended to discrete stationary ergodic (not
necessarily i.i.d.) sources (Shannon ). Similarly, the lossy source coding theorem can
be extended to stationary ergodic sources (Gallager ) with the following characteri-
zation of the rate–distortion function
R(D) = lim_{k→∞} min_{p(x̂^k | x^k): E(d(X^k, X̂^k)) ≤ D} (1/k) I(X^k; X̂^k).

However, the notion of ergodicity for channels is more subtle and involved. Roughly
speaking, the capacity is well-defined for discrete channels such that for every time i ≥ 1
and shift j ≥ 1, the conditional pmf p(y_i^{j+i} | x_i^{j+i}) is time invariant (that is, independent
of i) and can be estimated using appropriate time averages. For example, if Yi = д(Xi , Zi )
for some stationary ergodic process {Zi }, then it can be shown (Kim b) that the ca-
pacity is
C = lim_{k→∞} sup_{p(x^k)} (1/k) I(X^k; Y^k).

The coding theorem for more general classes of channels with memory can be found in
Gray () and Verdú and Han (). As for point-to-point communication, the essence
of the multiuser source and channel coding problems is captured by the memoryless case.
Moreover, the multiuser problems with memory often have only uncomputable “multi-
letter” expressions as above. We therefore restrict our attention to discrete memoryless
and white Gaussian noise sources and channels.
The source–channel separation theorem was first proved in Shannon (). The gen-
eral condition for optimality of uncoded transmission in Remark . is given by Gastpar,
Rimoldi, and Vetterli ().

PROBLEMS

.. Memoryless property. Show that under the given definition of a (2nR , n) code, the
memoryless property p(yi |x i , y i−1 , m) = pY|X (yi |xi ), i ∈ [1 : n], reduces to

p(y n |x n , m) = ∏ni=1 pY|X (yi |xi ).

.. Z channel. The Z channel has binary input and output alphabets, and conditional
pmf p(0|0) = 1, p(1|1) = p(0|1) = 1/2. Find the capacity C.
.. Capacity of the sum channel. Find the capacity C of the union of two DMCs (X1 ,
p(y1 |x1 ), Y1 ) and (X2 , p(y2 |x2 ), Y2 ), where, in each transmission, one can send a
symbol over channel 1 or channel 2 but not both. Assume that the output alphabets
are distinct, i.e., Y1 ∩ Y2 = ∅.

.. Applications of the packing lemma. Identify the random variables U , X, and Y in
the packing lemma for the following scenarios, and write down the packing lemma
condition on the rate R for each case.
(a) Let (X1 , X2 , X3 ) ∼ p(x1 )p(x2 )p(x3 |x1 , x2 ). Let X1n (m), m ∈ [1 : 2nR ], be each
distributed according to ∏ni=1 p X1 (x1i ), and ( X̃ 2n , X̃ 3n ) ∼ ∏ni=1 p X2 ,X3 (x̃2i , x̃3i ) be
independent of X1n (m) for m ∈ [1 : 2nR ].
(b) Let (X1 , X2 , X3 ) ∼ p(x1 , x2 )p(x3 |x2 ) and R = R0 + R1 . Let X1n (m0 ), m0 ∈ [1 :
2nR0 ], be distributed according to ∏ni=1 p X1 (x1i ). For each m0 , let X2n (m0 , m1 ),
m1 ∈ [1 : 2nR1 ], be distributed according to ∏ni=1 p X2 |X1 (x2i |x1i (m0 )). Let X̃ 3n ∼
∏ni=1 p X3 (x̃3i ) be independent of (X1n (m0 ), X2n (m0 , m1 )) for m0 ∈ [1 : 2nR0 ],
m1 ∈ [1 : 2nR1 ].
.. Maximum likelihood decoding. The achievability proof of the channel coding theo-
rem in Section .. uses joint typicality decoding. This technique greatly simplifies
the proof, especially for multiuser channels. However, given a codebook, the joint
typicality decoding is not optimal in terms of minimizing the probability of de-
coding error (it is in fact surprising that such a suboptimal decoding rule can still
achieve capacity).
Since the messages are equally likely, maximum likelihood decoding (MLD)
m̂ = arg max_m p(y n |m) = arg max_m ∏ni=1 pY|X (yi |xi (m))

is the optimal decoding rule (when there is a tie, choose an arbitrary index that
maximizes the likelihood). Achievability proofs using MLD are more complex
but provide tighter bounds on the optimal error exponent (reliability function);
see, for example, Gallager ().
In this problem we use MLD to establish achievability of the capacity for a
BSC(p), p < 1/2. Define the Hamming distance d(x n , y n ) between two binary
sequences x n and yn as the number of positions where they differ, i.e., d(x n , yn ) =
|{i : xi ̸= yi }|.
(a) Show that the MLD rule reduces to the minimum Hamming distance decoding
rule—declare m̂ is sent if d(x n (m̂), y n ) < d(x n (m), y n ) for all m ̸= m̂.
(b) Now fix X ∼ Bern(1/2). Using random coding and minimum distance decod-
ing, show that for every є > 0, the probability of error averaged over codebooks
is upper bounded as
Pe(n) = P{M̂ ̸= 1 | M = 1}
     ≤ P{d(X n (1), Y n ) > n(p + є) | M = 1} + (2nR − 1) P{d(X n (2), Y n ) ≤ n(p + є) | M = 1}.

(c) Show that the first term tends to zero as n → ∞. It can be shown using the
Chernoff–Hoeffding bound (Hoeffding ) that

P{d(X n (2), Y n ) ≤ n(p + є) | M = 1} ≤ 2−n(1−H(p+є)) .

Using these results, show that any R < C = 1 − H(p) is achievable.


.. Randomized code. Suppose that in the definition of the (2nR , n) code for the DMC
p(y|x), we allow the encoder and the decoder to use random mappings. Specif-
ically, let W be an arbitrary random variable independent of the message M and
the channel, i.e., p(yi |x i , y i−1 , m, w) = pY|X (yi |xi ) for i ∈ [1 : n]. The encoder gen-
erates a codeword x n (m, W), m ∈ [1 : 2nR ], and the decoder generates an estimate m̂(y n , W). Show that this randomization does not increase the capacity of the DMC.
.. Nonuniform message. Recall that a (2nR , n) code for the DMC p(y|x) consists of
an encoder x n = ϕn (m) and a decoder m ̂ = ψn (y n ). Suppose that there exists a se-
quence of (2nR , n) codes such that Pe(n) = P{M ̸= M}̂ tends to zero as n → ∞, where
M is uniformly distributed over [1 : 2nR ]. (In other words, the rate R is achievable.)
Now suppose that we wish to communicate a message M′ that is arbitrarily (not uniformly) distributed over [1 : 2nR ].
(a) Show that there exists a sequence of (2nR , n) codes with encoder–decoder pairs (ϕ′n , ψ′n ) such that
lim_{n→∞} P{M′ ̸= M̂ ′ } = 0.
(Hint: Consider a random ensemble of codes Φ′n = ϕn ∘ σ and Ψ′n = σ −1 ∘ ψn , where σ is a random permutation. Show that the probability of error, averaged over M′ and σ, is equal to Pe(n) and conclude that there exists a good permutation σ for each M′ .)
(b) Does this result imply that the capacity for the maximal probability of error is
equal to that for the average probability of error?
.. Independently generated codebooks. Let (X, Y) ∼ p(x, y), and p(x) and p(y) be
their marginals. Consider two randomly and independently generated codebooks
C1 = {X n (1), . . . , X n (2nR1 )} and C2 = {Y n (1), . . . , Y n (2nR2 )}. The codewords of C1
are generated independently each according to ∏ni=1 p X (xi ), and the codewords
for C2 are generated independently according to ∏ni=1 pY (yi ). Define the set

C = {(x n , y n ) ∈ C1 × C2 : (x n , y n ) ∈ Tє(n) (X, Y )}.

Show that
E |C | ≐ 2n(R1 +R2 −I(X;Y)) .

.. Capacity with input cost. Consider the DMC p(y|x) with cost constraint B.
(a) Using the operational definition of the capacity–cost function C(B), show that
it is nondecreasing and concave for B ≥ 0.

(b) Show that the information capacity–cost function C(B) is nondecreasing, con-
cave, and continuous for B ≥ 0.
.. BSC with input cost. Find the capacity–cost function C(B) for a BSC(p) with input
cost function b(1) = 1 and b(0) = 0.
.. Channels with input–output cost. Let b(x, y) be a nonnegative input–output cost
function on X × Y. Consider a DMC p(y|x) in which every codeword x n (m),
m ∈ [1 : 2nR ], must satisfy the average cost constraint
E(b(x n (m), Y n )) = (1/n) ∑ni=1 E(b(xi (m), Yi )) ≤ B,

where the expectation is with respect to the channel pmf ∏ni=1 pY|X (yi |xi (m)).
Show that the capacity of the DMC with cost constraint B is

C(B) = max_{p(x): E(b(X,Y)) ≤ B} I(X; Y ).

(Hint: Consider the input-only cost function b′(x) = E(b(x, Y )), where the expec-
tation is taken with respect to p(y|x).)
.. Output scaling. Show that the capacity of the Gaussian channel Y = дX + Z re-
mains the same if we scale the output by a nonzero constant a.
.. Water-filling. Consider the 2-component Gaussian product channel Y j = д j X j +
Z j , j = 1, 2, with д1 < д2 and average power constraint P.
(a) Above what power P should we begin to use both channels?
(b) What is the energy-per-bit–rate function Eb (R) needed for reliable commu-
nication at rate R over the channel? Show that Eb (R) is strictly monotoni-
cally increasing and convex in R. What is the minimum energy per bit for the
2-component Gaussian product channel, i.e., limR→0 Eb (R)?
.. List codes. A (2nR , 2nL , n) list code for a DMC p(y|x) with capacity C consists of
an encoder that assigns a codeword x n (m) to each message m ∈ [1 : 2nR ] and a
decoder that upon receiving y n finds the list of messages L(y n ) ⊆ [1 : 2nR ]
of size |L| ≤ 2nL that contains the transmitted message. An error occurs if the
list does not contain the transmitted message M, i.e., Pe(n) = P{M ∉ L(Y n )}. A
rate–list exponent pair (R, L) is said to be achievable if there exists a sequence of
(2nR , 2nL , n) list codes with Pe(n) → 0 as n → ∞.
(a) Using random coding and joint typicality decoding, show that any (R, L) is
achievable, provided R < C + L.
(b) Show that for every sequence of (2nR , 2nL , n) list codes with Pe(n) → 0 as n →
∞, we must have R ≤ C + L. (Hint: You will need to develop a modified Fano’s
inequality.)

.. Strong converse for source coding. Given a sequence of (2nR , n) lossless source codes
with R < H(X), show that Pe(n) → 1 as n → ∞. (Hint: A (2nR , n) code can repre-
sent only 2nR points in X n . Using typicality, show that if R < H(X), the probability
of these 2nR points converges to zero, no matter how we choose them.)
.. Infinite alphabet. Consider the lossless source coding problem for a discrete, but
infinite-alphabet source X with finite entropy H(X) < ∞. Show that R∗ = H(X).
(Hint: For the proof of achievability, consider a truncated DMS [X] such that
P{X n ̸= [X]n } tends to zero as n → ∞.)
.. Rate–distortion function. Consider the lossy source coding for a DMS X with dis-
tortion measure d(x, x̂).
(a) Using the operational definition, show that the rate–distortion function R(D)
is nonincreasing and convex for D ≥ 0.
(b) Show that the information rate–distortion function R(D) is nonincreasing,
convex, and continuous for D ≥ 0.
.. Bounds on the quadratic rate–distortion function. Let X be an arbitrary memoryless
(stationary) source with variance P, and let d(x, x̂) = (x − x̂)² be the quadratic
distortion measure.
(a) Show that the rate–distortion function is bounded as
h(X) − (1/2) log(2πeD) ≤ R(D) ≤ (1/2) log(P/D)
with equality iff X is a WGN(P) source. (Hint: For the upper bound, consider
X̂ = (P − D)X/P + Z, where Z ∼ N(0, D(P − D)/P) is independent of X.)
Remark: The lower bound is referred to as the Shannon lower bound.
(b) Is the Gaussian source harder or easier to describe than other sources with the
same variance?
.. Lossy source coding from a noisy observation. Let X ∼ p(x) be a DMS and Y be
another DMS obtained by passing X through a DMC p(y|x). Let d(x, x̂) be a dis-
tortion measure and consider a lossy source coding problem in which Y (instead
of X) is encoded and sent to the decoder who wishes to reconstruct X with a pre-
scribed distortion D.
Unlike the regular lossy source coding setup, the encoder maps each yn se-
quence to an index m ∈ [1 : 2nR ). Otherwise, the definitions of (2nR , n) codes,
achievability, and rate–distortion function are the same as before.
Let Dmin = minx̂(y) E[d(X, x̂(Y ))]. Show that the rate–distortion function for
this setting is

R(D) = min_{p(x̂|y): E(d(X, X̂)) ≤ D} I(Y ; X̂) for D ≥ Dmin .

(Hint: Define a new distortion measure d′(y, x̂) = E(d(X, x̂) | Y = y), and show
that
E[d(X n , x̂ n (m(Y n )))] = E[d′(Y n , x̂ n (m(Y n )))]. )

.. To code or not to code. Consider a WGN(P) source U and a Gaussian channel with
output Y = дX + Z, where Z ∼ N(0, 1). We wish to communicate the source over
the channel at rate r = 1 symbol/transmission with the smallest possible squared
error distortion. Assume an expected average power constraint
∑ni=1 E(xi2 (U n )) ≤ nP.

(a) Find the minimum distortion achieved by separate source and channel coding.
(b) Find the distortion achieved when the sender transmits Xi = Ui , i ∈ [1 : n],
i.e., performs no coding, and the receiver uses the (linear) MMSE estimate Û i
of Ui given Yi . Compare this to the distortion in part (a) and comment on the
results.
.. Two reconstructions. Let X be a DMS, and d1 (x, x̂1 ), x̂1 ∈ X̂1 , and d2 (x, x̂2 ), x̂2 ∈
X̂2 , be two distortion measures. We wish to reconstruct X under both distortion
measures from the same description as depicted in Figure ..

Figure .. Lossy source coding with two reconstructions: the encoder maps X n into an index M, from which decoder 1 produces X̂ 1n with distortion D1 and decoder 2 produces X̂ 2n with distortion D2 .

Define a (2nR , n) code, achievability of the rate–distortion triple (R, D1 , D2 ),


and the rate–distortion function R(D1 , D2 ) in the standard way. Show that

R(D1 , D2 ) = min_{p(x̂1 , x̂2 |x): E(d j (X, X̂ j )) ≤ D j , j=1,2} I(X; X̂ 1 , X̂ 2 ).

.. Lossy source coding with reconstruction cost. Let X be a DMS and d(x, x̂) be a distortion measure. Further let b(x̂) ≥ 0 be a cost function on X̂ . Suppose that
there is an average cost constraint on each reconstruction sequence x̂n (m),
b(x̂ n (m)) = ∑ni=1 b(x̂i (m)) ≤ nB for every m ∈ [1 : 2nR ),

in addition to the distortion constraint E(d(X n , X̂ n )) ≤ D. Define a (2nR , n) code,


achievability of the triple (R, D, B), and rate–distortion–cost function R(D, B) in


the standard way. Show that

R(D, B) = min_{p(x̂|x): E(d(X, X̂)) ≤ D, E(b(X̂)) ≤ B} I(X; X̂).

Note that this problem is not a special case of the above two-reconstruction prob-
lem.

APPENDIX 3A PROOF OF LEMMA 3.2

We first note that I([X] j ; [Y j ]k ) → I([X] j ; Y j ) = h(Y j ) − h(Z) as k → ∞. This follows


since ([Y j ]k − Y j ) tends to zero as k → ∞; recall Section .. Hence it suffices to show
that
lim inf_{j→∞} h(Y j ) ≥ h(Y).

First note that the pdf of Y j converges pointwise to that of Y ∼ N(0, S + 1). To prove this,
consider
fY j (y) = ∫ fZ (y − x) dF[X] j (x) = E( fZ (y − [X] j )).

Since the Gaussian pdf fZ (z) is continuous and bounded, fY 󰑗 (y) converges to fY (y) for
every y by the weak convergence of [X] j to X. Furthermore, we have

fY j (y) = E( fZ (y − [X] j )) ≤ max_z fZ (z) = 1/√(2π).

Hence, for each a > 0, by the dominated convergence theorem (Appendix B),

h(Y j ) = ∫_{−∞}^{∞} − fY j (y) log fY j (y) dy
       ≥ ∫_{−a}^{a} − fY j (y) log fY j (y) dy + P{|Y j | ≥ a} ⋅ min_y (− log fY j (y)),

which converges to
∫_{−a}^{a} − fY (y) log fY (y) dy + P{|Y | ≥ a} ⋅ min_y (− log fY (y))

as j → ∞. Taking a → ∞, we obtain the desired result.


PART II

SINGLE-HOP NETWORKS
CHAPTER 4

Multiple Access Channels

We introduce the multiple access channel as a simple model for noisy many-to-one com-
munication, such as the uplink of a cellular system, medium access in a local area network
(LAN), or multiple reporters asking questions in a press conference. We then establish the
main result of this chapter, which is a computable characterization of the capacity region
of the two-sender multiple access channel with independent messages. The proof involves
the new techniques of successive cancellation decoding, time sharing, simultaneous de-
coding, and coded time sharing. This result is extended to establish the capacity region
of the Gaussian multiple access channel. We show that successive cancellation decoding
outperforms the point-to-point based coding schemes of time-division multiple access
and treating the other sender’s signal as noise. Finally, we extend these results to multiple
access channels with more than two senders.

4.1 DISCRETE MEMORYLESS MULTIPLE ACCESS CHANNEL

Consider the multiple access communication system model depicted in Figure .. Each
sender wishes to communicate an independent message reliably to a common receiver. As
such, sender j = 1, 2 encodes its message M j into a codeword X nj and transmits it over
the shared channel. Upon receiving the sequence Y n , the decoder finds estimates M̂ j ,
j = 1, 2, of the messages. Since the senders transmit over a common noisy channel, a
tradeoff arises between the rates of reliable communication for the two messages—when
one sender transmits at a high rate, the other sender may need to back off its rate to ensure
reliable communication of both messages. As in the point-to-point communication case,
we study the limit on this tradeoff when there is no constraint on the code block length n.

Figure .. Multiple access communication system with independent messages: encoder 1 maps M1 into X1n , encoder 2 maps M2 into X2n , the channel p(y|x1 , x2 ) produces Y n , and the decoder outputs the estimates (M̂ 1 , M̂ 2 ).



We first consider a -sender discrete memoryless multiple access channel (DM-MAC)


model (X1 × X2 , p(y|x1 , x2 ), Y) that consists of three finite sets X1 , X2 , Y, and a collection
of conditional pmfs p(y|x1 , x2 ) on Y (one for each input symbol pair (x1 , x2 )).
A (2nR1 , 2nR2 , n) code for the DM-MAC consists of
∙ two message sets [1 : 2nR1 ] and [1 : 2nR2 ],
∙ two encoders, where encoder 1 assigns a codeword x1n (m1 ) to each message m1 ∈ [1 : 2nR1 ] and encoder 2 assigns a codeword x2n (m2 ) to each message m2 ∈ [1 : 2nR2 ], and
∙ a decoder that assigns an estimate (m̂ 1 , m̂ 2 ) ∈ [1 : 2nR1 ] × [1 : 2nR2 ] or an error message e to each received sequence y n .
We assume that the message pair (M1 , M2 ) is uniformly distributed over [1 : 2nR1 ] ×
[1 : 2nR2 ]. Consequently, x1n (M1 ) and x2n (M2 ) are independent. The average probability of
error is defined as
Pe(n) = P{(M̂ 1 , M̂ 2 ) ̸= (M1 , M2 )}.

A rate pair (R1 , R2 ) is said to be achievable for the DM-MAC if there exists a sequence of
(2nR1 , 2nR2 , n) codes such that limn→∞ Pe(n) = 0. The capacity region C of the DM-MAC is
the closure of the set of achievable rate pairs (R1 , R2 ). Sometimes we are interested in the
sum-capacity Csum of the DM-MAC defined as Csum = max{R1 + R2 : (R1 , R2 ) ∈ C }.

4.2 SIMPLE BOUNDS ON THE CAPACITY REGION

We begin with simple bounds on the capacity region of the DM-MAC. For this discussion,
note that the maximum achievable individual rates are

C1 = max_{x2 , p(x1 )} I(X1 ; Y | X2 = x2 ),
C2 = max_{x1 , p(x2 )} I(X2 ; Y | X1 = x1 ).     (.)

Using these rates, we can readily achieve any rate pairs below the line segment between
(C1 , 0) and (0, C2 ) using time division (or frequency division), which yields the inner bound
sketched in Figure .. Following similar steps to the converse proof of the channel coding
theorem in Section .., we can show that the sum-rate is upper bounded as

R1 + R2 ≤ C12 = max_{p(x1 )p(x2 )} I(X1 , X2 ; Y).     (.)

Combining the bounds on the individual and sum capacities in (.) and (.), we obtain
the general outer bound on the capacity region sketched in Figure ..
These bounds coincide for the following DM-MAC.
Example . (Binary multiplier MAC). Suppose that the inputs X1 and X2 are binary
and the output Y = X1 ⋅ X2 . Then it can be easily checked that C1 = C2 = C12 = 1. Thus


Figure .. Time-division inner bound and an outer bound on the capacity region.

the inner and outer bounds in Figure . coincide and the capacity region is the set of rate
pairs (R1 , R2 ) such that

R1 ≤ 1,
R2 ≤ 1,
R1 + R2 ≤ 1.

This is plotted in Figure .a.


The bounds in Figure . do not coincide in general, however.
Example . (Binary erasure MAC). Suppose that the inputs X1 and X2 are binary and
the output Y = X1 + X2 is ternary. Again it can be easily checked that C1 = C2 = 1 and
C12 = 3/2. Hence, the inner and outer bounds do not coincide for this channel. We will
shortly see that the outer bound is the capacity region. This is plotted in Figure .b.
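These constants can be verified numerically. The following sketch (a coarse grid over product input pmfs Bern(p1 ) × Bern(p2 ), with the grid resolution chosen only for illustration) computes C1 , C2 , and C12 for the two example channels; since both channels are deterministic, each mutual information reduces to an output entropy.

import numpy as np
from itertools import product

def H(probs):
    # entropy in bits of a pmf given as an array
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def I_joint(y_of, p1, p2):
    # I(X1, X2; Y) = H(Y) for a deterministic channel Y = y_of(X1, X2)
    py = {}
    for a, b in product((0, 1), repeat=2):
        pr = (p1 if a else 1 - p1) * (p2 if b else 1 - p2)
        py[y_of(a, b)] = py.get(y_of(a, b), 0.0) + pr
    return H(list(py.values()))

def I_cond(y_of, x_fixed, p):
    # I(X1; Y | X2 = x_fixed) = H(Y | X2 = x_fixed) for a deterministic channel
    py = {}
    for a in (0, 1):
        pr = p if a else 1 - p
        py[y_of(a, x_fixed)] = py.get(y_of(a, x_fixed), 0.0) + pr
    return H(list(py.values()))

grid = np.linspace(0, 1, 101)
for name, y_of in [("multiplier", lambda a, b: a * b), ("erasure", lambda a, b: a + b)]:
    C1 = max(I_cond(y_of, x2, p) for x2 in (0, 1) for p in grid)
    C2 = max(I_cond(lambda a, b: y_of(b, a), x1, p) for x1 in (0, 1) for p in grid)
    C12 = max(I_joint(y_of, p1, p2) for p1 in grid for p2 in grid)
    print(f"{name:10s}  C1 = {C1:.3f}  C2 = {C2:.3f}  C12 = {C12:.3f}")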
In general, neither the inner bound nor the outer bound in Figure . is tight.


Figure .. Capacity regions of (a) the binary multiplier MAC, and (b) the binary
erasure MAC.

4.3* MULTILETTER CHARACTERIZATION OF THE CAPACITY REGION

The capacity region of the DM-MAC can be characterized as follows.

Theorem .. Let C (k) , k ≥ 1, be the set of rate pairs (R1 , R2 ) such that
R1 ≤ (1/k) I(X1k ; Y k ),
R2 ≤ (1/k) I(X2k ; Y k )

for some pmf p(x1k )p(x2k ). Then the capacity region C of the DM-MAC p(y|x1 , x2 ) is
the closure of the set ⋃k C (k) .

Proof of achievability. We code over k symbols of X1 and X2 (super-symbols) together.


Fix a product pmf p(x1k )p(x2k ). Randomly and independently generate 2nkR1 sequences
x1nk (m1 ), m1 ∈ [1 : 2nkR1 ], each according to ∏ni=1 p_{X1^k}(x_{1,(i−1)k+1}^{ik}). Similarly generate 2nkR2 sequences x2nk (m2 ), m2 ∈ [1 : 2nkR2 ], each according to ∏ni=1 p_{X2^k}(x_{2,(i−1)k+1}^{ik}). Upon re-
ceiving the sequence y nk , the decoder uses joint typicality decoding to find an estimate
for each message separately. It can be readily shown that the average probability of error
tends to zero as n → ∞ if kR1 < I(X1k ; Y k ) − δ(є) and kR2 < I(X2k ; Y k ) − δ(є).

Proof of the converse. Using Fano’s inequality, we can show that

R1 ≤ (1/n) I(X1n ; Y n ) + єn ,
R2 ≤ (1/n) I(X2n ; Y n ) + єn ,

where єn tends to zero as n → ∞. This shows that (R1 , R2 ) must be in the closure of
⋃k C (k) .
Although the above multiletter characterization of the capacity region is well-defined,
it is not clear how to compute it. Furthermore, this characterization does not provide
any insight into how to best code for the multiple access channel. Such multiletter ex-
pressions can be readily obtained for other multiuser channels and sources, leading to a
fairly complete but unsatisfactory theory. Consequently, we seek computable single-letter
characterizations of capacity, such as Shannon’s capacity formula for the point-to-point
channel, that shed light on practical coding techniques. Single-letter characterizations,
however, have been difficult to find for most multiuser channels and sources. The DM-
MAC is one of rare channels for which a complete single-letter characterization of the
capacity region is known. We develop this characterization over the next several sections.

4.4 TIME SHARING

In time/frequency division, only one sender transmits in each time slot/frequency band.
Time sharing generalizes time division by allowing the senders to transmit simultaneously
at different nonzero rates in each slot.

Proposition .. If the rate pairs (R11 , R21 ) and (R12 , R22 ) are achievable for the DM-MAC p(y|x1 , x2 ), then the rate pair (R1 , R2 ) = (αR11 + ᾱR12 , αR21 + ᾱR22 ) is achievable for every α ∈ [0, 1].

Proof. Consider two sequences of codes, one achieving (R11 , R21 ) and the other achieving
(R12 , R22 ). For each block length n, assume without loss of generality that αn is an integer. Consider the (2αnR11 , 2αnR21 , αn) and (2ᾱnR12 , 2ᾱnR22 , ᾱn) codes from the given first and second sequences of codes, respectively.
To send the message pair (M1 , M2 ), we perform rate splitting. We represent the mes-
sage M1 by independent messages M11 at rate αR11 and M12 at rate ᾱR12 . Similarly, we represent the message M2 by M21 at rate αR21 and M22 at rate ᾱR22 . Thus, R1 = αR11 + ᾱR12 and R2 = αR21 + ᾱR22 .
For the first αn transmissions, sender j = 1, 2 transmits its codeword for M j1 from the
(2αnR11 , 2αnR21 , αn) code and for the rest of the transmissions, it transmits its codeword for M j2 from the (2ᾱnR12 , 2ᾱnR22 , ᾱn) code. Upon receiving y n , the receiver decodes y^{αn} using the decoder of the first code and y_{αn+1}^{n} using the decoder of the second code.
By assumption, the probability of error for each decoder tends to zero as n → ∞.
Hence, by the union of events bound, the probability of decoding error tends to zero
as n → ∞ and the rate pair (R1 , R2 ) = (αR11 + ᾱR12 , αR21 + ᾱR22 ) is achievable. This
completes the proof of the proposition.

Remark 4.1. Time division and frequency division are special cases of time sharing, in
which the senders transmit at rate pairs (R1 , 0) and (0, R2 ).
Remark 4.2. The above time-sharing argument shows that the capacity region of the DM-
MAC is convex. Note that this proof uses the operational definition of the capacity region
(as opposed to the information definition in terms of mutual information).
Remark 4.3. Similar time-sharing arguments can be used to show the convexity of the ca-
pacity region of any (synchronous) communication channel for which capacity is defined
as the optimal rate of block codes, e.g., capacity with cost constraint, as well as optimal
rate regions for source coding problems, e.g., rate–distortion function in Chapter . As
we will see in Chapter , when the sender transmissions in the DM-MAC are not syn-
chronized, time sharing becomes infeasible and consequently the capacity region is not
necessarily convex.
Remark 4.4. The rate-splitting technique will be used in other coding schemes later; for
example, see Section ..

4.5 SINGLE-LETTER CHARACTERIZATION OF THE CAPACITY REGION

We are now ready to present a single-letter characterization of the DM-MAC capacity


region. Let (X1 , X2 ) ∼ p(x1 )p(x2 ). Let R(X1 , X2 ) be the set of rate pairs (R1 , R2 ) such that

R1 ≤ I(X1 ; Y | X2 ),
R2 ≤ I(X2 ; Y | X1 ),
R1 + R2 ≤ I(X1 , X2 ; Y).

As shown in Figure ., this set is in general a pentagonal region with a 45∘ side because

max{I(X1 ; Y | X2 ), I(X2 ; Y | X1 )} ≤ I(X1 , X2 ; Y) ≤ I(X1 ; Y | X2 ) + I(X2 ; Y | X1 ).

Figure .. The region R(X1 , X2 ) for a typical DM-MAC: a pentagon with corner points (I(X1 ; Y ), I(X2 ; Y | X1 )) and (I(X1 ; Y | X2 ), I(X2 ; Y )).

For example, consider the binary erasure MAC in Example .. Setting X1 , X2 ∼
Bern(1/2) to be independent, the corresponding R(X1 , X2 ) is the set of rate pairs (R1 , R2 )
such that

R1 ≤ H(Y | X2 ) = 1,
R2 ≤ H(Y | X1 ) = 1,
R1 + R2 ≤ H(Y) = 3/2.

This region coincides with the outer bound in Figure .; hence it is the capacity region.
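The three mutual information bounds defining R(X1 , X2 ) can be computed directly from a channel transition matrix. The sketch below (an illustrative computation with names chosen here) evaluates them for an arbitrary DM-MAC with independent inputs and reproduces the values 1, 1, and 3/2 for the binary erasure MAC with Bern(1/2) inputs.

import numpy as np

def H(p):
    # entropy in bits of a (possibly multidimensional) pmf array
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def pentagon(W, p1, p2):
    # W[x1, x2, y] = p(y | x1, x2); p1, p2 are the pmfs of X1 and X2
    pj = p1[:, None, None] * p2[None, :, None] * W       # joint pmf p(x1, x2, y)
    HY = H(pj.sum(axis=(0, 1)))
    HY_X1X2 = H(pj) - H(pj.sum(axis=2))                  # H(Y | X1, X2)
    HY_X1 = H(pj.sum(axis=1)) - H(p1)                    # H(Y | X1)
    HY_X2 = H(pj.sum(axis=0)) - H(p2)                    # H(Y | X2)
    return (HY_X2 - HY_X1X2,      # I(X1; Y | X2)
            HY_X1 - HY_X1X2,      # I(X2; Y | X1)
            HY - HY_X1X2)         # I(X1, X2; Y)

# binary erasure MAC: Y = X1 + X2 with Y in {0, 1, 2}
W = np.zeros((2, 2, 3))
for a in (0, 1):
    for b in (0, 1):
        W[a, b, a + b] = 1.0
print(pentagon(W, np.array([0.5, 0.5]), np.array([0.5, 0.5])))   # expect (1.0, 1.0, 1.5)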
In general, the capacity region of a DM-MAC is the convex hull of the union of mul-
tiple R(X1 , X2 ) regions; see Figure ..

Theorem .. The capacity region C of the DM-MAC p(y|x1 , x2 ) is the convex hull of
the union of the regions R(X1 , X2 ) over all p(x1 )p(x2 ).

We prove Theorem . in the following two subsections.




Figure .. The capacity region of a typical DM-MAC. The individual capacities C1 ,
C2 , and the sum-capacity C12 are defined in (.) and (.), respectively.

4.5.1 Proof of Achievability


Let (X1 , X2 ) ∼ p(x1 )p(x2 ). We show that every rate pair (R1 , R2 ) in the interior of the re-
gion R(X1 , X2 ) is achievable. The rest of the capacity region is achieved using time sharing
between points in different R(X1 , X2 ) regions. Assume nR1 and nR2 to be integers.
Codebook generation. Randomly and independently generate 2nR1 sequences x1n (m1 ),
m1 ∈ [1 : 2nR1 ], each according to ∏ni=1 p X1 (x1i ). Similarly generate 2nR2 sequences x2n (m2 ),
m2 ∈ [1 : 2nR2 ], each according to ∏ni=1 p X2 (x2i ). These codewords constitute the code-
book, which is revealed to the encoders and the decoder.
Encoding. To send message m1 , encoder 1 transmits x1n (m1 ). Similarly, to send m2 , encoder 2 transmits x2n (m2 ).
In the following, we consider two decoding rules.
Successive cancellation decoding. This decoding rule aims to achieve one of the two
corner points of the pentagonal region R(X1 , X2 ), for example,

R1 < I(X1 ; Y ),
R2 < I(X2 ; Y | X1 ).

Decoding is performed in two steps:
1. The decoder declares that m̂ 1 is sent if it is the unique message such that (x1n (m̂ 1 ), y n ) ∈ Tє(n) ; otherwise it declares an error.
2. If such m̂ 1 is found, the decoder finds the unique m̂ 2 such that (x1n (m̂ 1 ), x2n (m̂ 2 ), y n ) ∈ Tє(n) ; otherwise it declares an error.
Analysis of the probability of error. We bound the probability of error averaged over

codebooks and messages. By symmetry of code generation,

P(E) = P(E | M1 = 1, M2 = 1).

The decoder makes an error iff one or more of the following events occur:

E1 = {(X1n (1), X2n (1), Y n ) ∉ Tє(n) },
E2 = {(X1n (m1 ), Y n ) ∈ Tє(n) for some m1 ̸= 1},
E3 = {(X1n (1), X2n (m2 ), Y n ) ∈ Tє(n) for some m2 ̸= 1}.

By the union of events bound,

P(E) ≤ P(E1 ) + P(E2 ) + P(E3 ).

We bound each term. By the LLN, the first term P(E1 ) tends to zero as n → ∞. For the
second term, note that for m1 ̸= 1, (X1n (m1 ), Y n ) ∼ ∏ni=1 p X1 (x1i )pY (yi ). Hence by the
packing lemma in Section . with A = [2 : 2nR1 ], X ← X1 , and U = ∅, P(E2 ) tends to zero
as n → ∞ if R1 < I(X1 ; Y ) − δ(є).
For the third term, note that for m2 ̸= 1, X2n (m2 ) ∼ ∏ni=1 p X2 (x2i ) is independent of
(X1n (1), Y n ) ∼ ∏ni=1 p X1 ,Y (x1i , yi ). Hence, again by the packing lemma with A = [2 : 2nR2 ],
X ← X2 , Y ← (X1 , Y ), and U = ∅, P(E3 ) tends to zero as n → ∞ if R2 < I(X2 ; Y , X1 ) −
δ(є), or equivalently—since X1 and X2 are independent—if R2 < I(X2 ; Y|X1 ) − δ(є). Thus,
the total average probability of decoding error P(E) tends to zero as n → ∞ if R1 <
I(X1 ; Y ) − δ(є) and R2 < I(X2 ; Y|X1 ) − δ(є). Since the probability of error averaged over
codebooks, P(E), tends to zero as n → ∞, there must exist a sequence of (2nR1 , 2nR2 , n)
codes such that limn→∞ Pe(n) = 0.
Achievability of the other corner point of R(X1 , X2 ) follows by changing the decoding
order. To show achievability of other points in R(X1 , X2 ), we use time sharing between
corner points and points on the axes. Finally, to show achievability of points in C that are
not in any single R(X1 , X2 ) region, we use time sharing between points in these regions.
Simultaneous decoding. We can prove achievability of every rate pair in the interior
of R(X1 , X2 ) without time sharing. The decoder declares that (m̂ 1 , m̂ 2 ) is sent if it is the unique message pair such that (x1n (m̂ 1 ), x2n (m̂ 2 ), y n ) ∈ Tє(n) ; otherwise it declares an error.

Analysis of the probability of error. As before, we bound the probability of error aver-
aged over codebooks and messages. To analyze this probability, consider all possible pmfs
induced on the triple (X1n (m1 ), X2n (m2 ), Y n ) as listed in Table ..
Then the error event E occurs iff one or more of the following events occur:

E1 = {(X1n (1), X2n (1), Y n ) ∉ Tє(n) },
E2 = {(X1n (m1 ), X2n (1), Y n ) ∈ Tє(n) for some m1 ̸= 1},
E3 = {(X1n (1), X2n (m2 ), Y n ) ∈ Tє(n) for some m2 ̸= 1},
E4 = {(X1n (m1 ), X2n (m2 ), Y n ) ∈ Tє(n) for some m1 ̸= 1, m2 ̸= 1}.

m1   m2   Joint pmf
1    1    p(x1n )p(x2n )p(y n |x1n , x2n )
∗    1    p(x1n )p(x2n )p(y n |x2n )
1    ∗    p(x1n )p(x2n )p(y n |x1n )
∗    ∗    p(x1n )p(x2n )p(y n )

Table .. The joint pmfs induced by different (m1 , m2 ) pairs. The ∗ symbol corre-
sponds to message m1 or m2 ̸= 1.

Thus by the union of events bound,


P(E) ≤ P(E1 ) + P(E2 ) + P(E3 ) + P(E4 ).
We now bound each term. By the LLN, P(E1 ) tends to zero as n → ∞. By the pack-
ing lemma, P(E2 ) tends to zero as n → ∞ if R1 < I(X1 ; Y|X2 ) − δ(є). Similarly, P(E3 )
tends to zero as n → ∞ if R2 < I(X2 ; Y |X1 ) − δ(є). Finally, since for m1 ̸= 1, m2 ̸= 1,
(X1n (m1 ), X2n (m2 )) is independent of (X1n (1), X2n (1), Y n ), again by the packing lemma with
A = [2 : 2nR1 ] × [2 : 2nR2 ], X ← (X1 , X2 ), and U = ∅, P(E4 ) tends to zero as n → ∞ if
R1 + R2 < I(X1 , X2 ; Y ) − δ(є).
As in successive cancellation decoding, to show achievability of points in C that are
not in any single R(X1 , X2 ) region, we use time sharing between points in different such
regions.

Remark 4.5. For the DM-MAC, simultaneous decoding does not achieve higher rates
than successive cancellation decoding with time sharing. We will encounter several sce-
narios throughout the book, where simultaneous decoding achieves strictly higher rates
than sequential decoding schemes.
Remark 4.6. Unlike the capacity of the DMC, the capacity region of the DM-MAC with
the maximal probability of error can be strictly smaller than that with the average proba-
bility of error. However, by allowing randomization at the encoders, the capacity region
with the maximal probability of error can be shown to be equal to that with the average
probability of error.

4.5.2 Proof of the Converse


We need to show that for any sequence of (2nR1 , 2nR2 , n) codes with limn→∞ Pe(n) = 0,
we must have (R1 , R2 ) ∈ C as defined in Theorem .. First note that each code
induces the joint pmf
(M1 , M2 , X1n , X2n , Y n ) ∼ 2−n(R1 +R2 ) p(x1n |m1 )p(x2n |m2 ) ∏ni=1 pY|X1 ,X2 (yi |x1i , x2i ).

By Fano’s inequality,
H(M1 , M2 |Y n ) ≤ n(R1 + R2 )Pe(n) + 1 = nєn , (.)

where єn tends to zero as n → ∞. Now, using similar steps to the converse proof of the
channel coding theorem in Section .., it is easy to show that
n(R1 + R2 ) ≤ ∑ni=1 I(X1i , X2i ; Yi ) + nєn .

Next, note from (.) that H(M1 |Y n , M2 ) ≤ H(M1 , M2 |Y n ) ≤ nєn . Hence

nR1 = H(M1 )
    = H(M1 |M2 )
    = I(M1 ; Y n |M2 ) + H(M1 |Y n , M2 )
    ≤ I(M1 ; Y n |M2 ) + nєn
    = ∑ni=1 I(M1 ; Yi |Y i−1 , M2 ) + nєn
    (a)= ∑ni=1 I(M1 ; Yi |Y i−1 , M2 , X2i ) + nєn
    ≤ ∑ni=1 I(M1 , M2 , Y i−1 ; Yi | X2i ) + nєn
    (b)= ∑ni=1 I(X1i , M1 , M2 , Y i−1 ; Yi | X2i ) + nєn
    = ∑ni=1 I(X1i ; Yi | X2i ) + ∑ni=1 I(M1 , M2 , Y i−1 ; Yi | X1i , X2i ) + nєn
    (c)= ∑ni=1 I(X1i ; Yi | X2i ) + nєn ,

where (a) and (b) follow since X ji is a function of M j for j = 1, 2, respectively, and (c)
follows by the memoryless property of the channel, which implies that (M1 , M2 , Y i−1 ) →
(X1i , X2i ) → Yi form a Markov chain. Similarly bounding R2 , we have shown that
R1 ≤ (1/n) ∑ni=1 I(X1i ; Yi | X2i ) + єn ,
R2 ≤ (1/n) ∑ni=1 I(X2i ; Yi | X1i ) + єn ,
R1 + R2 ≤ (1/n) ∑ni=1 I(X1i , X2i ; Yi ) + єn .

Since M1 and M2 are independent, so are X1i (M1 ) and X2i (M2 ) for all i. Note that bound-
ing each of the above terms by its corresponding capacity, i.e., C1 , C2 , and C12 in (.)
and (.), respectively, yields the simple outer bound in Figure ., which is in general
larger than the inner bound we established.

Continuing the proof, let the random variable Q ∼ Unif[1 : n] be independent of


(X1n , X2n , Y n ). Then, we can write
R1 ≤ (1/n) ∑ni=1 I(X1i ; Yi | X2i ) + єn
   = (1/n) ∑ni=1 I(X1i ; Yi | X2i , Q = i) + єn
   = I(X1Q ; YQ | X2Q , Q) + єn .

We now observe that YQ | {X1Q = x1 , X2Q = x2 } ∼ p(y|x1 , x2 ), that is, it is distributed ac-
cording to the channel conditional pmf. Hence, we identify X1 = X1Q , X2 = X2Q , and
Y = YQ to obtain
R1 ≤ I(X1 ; Y | X2 , Q) + єn .

We can similarly bound R2 and R1 + R2 . Note that by the independence of X1n and X2n , X1
and X2 are conditionally independent given Q. Thus, the rate pair (R1 , R2 ) must satisfy
the inequalities

R1 ≤ I(X1 ; Y | X2 , Q) + єn ,
R2 ≤ I(X2 ; Y | X1 , Q) + єn ,
R1 + R2 ≤ I(X1 , X2 ; Y |Q) + єn

for Q ∼ Unif[1 : n] and some pmf p(x1 |q)p(x2 |q) and hence for some pmf p(q)p(x1 |q)
p(x2 |q) with Q taking values in some finite set (independent of n). Since єn tends to zero
as n → ∞ by the assumption that limn→∞ Pe(n) = 0, we have shown that (R1 , R2 ) must be
in the set C 󳰀 of rate pairs (R1 , R2 ) such that

R1 ≤ I(X1 ; Y | X2 , Q),
R2 ≤ I(X2 ; Y | X1 , Q), (.)
R1 + R2 ≤ I(X1 , X2 ; Y |Q)

for some pmf p(q)p(x1 |q)p(x2 |q) with Q in some finite set.
The random variable Q is referred to as a time-sharing random variable. It is an auxil-
iary random variable, that is, a random variable that is not part of the channel variables.
Note that Q → (X1 , X2 ) → Y form a Markov chain.
To complete the proof of the converse, we need to show that the above region C 󳰀 is
identical to the region C in Theorem .. We already know that C ⊆ C 󳰀 (because we
proved that every achievable rate pair must be in C 󳰀 ). We can see this directly also by
noting that C 󳰀 contains all R(X1 , X2 ) regions by definition as well as all points in the
convex closure of the union of these regions, since any such point can be represented as a
convex combination of points in these regions using the time-sharing random variable Q.
It can be also shown that C 󳰀 ⊆ C . Indeed, any joint pmf p(q)p(x1 |q)p(x2 |q) defines
a pentagonal region with a 45∘ side. Thus it suffices to check if the corner points of this

pentagonal region are in C . First consider the corner point (I(X1 ; Y |Q), I(X2 ; Y |X1 , Q)).
It is easy to see that this corner point belongs to C , since it is a finite convex combination
of the points (I(X1 ; Y |Q = q), I(X2 ; Y|X1 , Q = q)), q ∈ Q, each of which in turn belongs
to R(X1q , X2q ) with (X1q , X2q ) ∼ p(x1 |q)p(x2 |q). We can similarly show that the other
corner points of the pentagonal region also belong to C . This shows that C 󳰀 = C , which
completes the proof of Theorem ..
However, neither the characterization C nor C 󳰀 seems to be “computable.” How many
R(X1 , X2 ) sets do we need to consider in computing each point on the boundary of C ?
How large must the cardinality of Q be to compute each point on the boundary of C 󳰀 ? By
the convex cover method in Appendix C, we can show that it suffices to take the cardinality
of the time-sharing random variable |Q| ≤ 3 and hence at most three R(X1 , X2 ) regions
need to be considered for each point on the boundary of C . By exploiting the special
structure of the R(X1 , X2 ) regions, we show in Appendix A that |Q| ≤ 2 is sufficient.
This yields the following computable characterization of the capacity region.

Theorem .. The capacity region of the DM-MAC p(y|x1 , x2 ) is the set of rate pairs
(R1 , R2 ) such that

R1 ≤ I(X1 ; Y | X2 , Q),
R2 ≤ I(X2 ; Y | X1 , Q),
R1 + R2 ≤ I(X1 , X2 ; Y |Q)

for some pmf p(q)p(x1 |q)p(x2 |q) with the cardinality of Q bounded as |Q| ≤ 2.

Remark 4.7. In the above characterization of the capacity region, it should be understood
that p(q, x1 , x2 , y) = p(q)p(x1 |q)p(x2 |q)p(y|x1 , x2 ), i.e., Q → (X1 , X2 ) → Y. Throughout
the book we will encounter many such characterizations. We will not explicitly include
the given source or channel pmf in the characterization. It should be understood that the
joint pmf is the product of the given source or channel pmf and the pmf over which we
are optimizing.
Remark 4.8. The region in the theorem is closed and convex.
Remark 4.9. It can be shown that time sharing is required for some DM-MACs, that is,
setting |Q| = 1 is not sufficient in general; see the push-to-talk MAC in Problem ..
Remark 4.10. The sum-capacity of the DM-MAC can be expressed as

Csum = max_{p(q)p(x1 |q)p(x2 |q)} I(X1 , X2 ; Y |Q)
     (a)= max_{p(x1 )p(x2 )} I(X1 , X2 ; Y),

where (a) follows since the average is upper bounded by the maximum. Thus the sum-
capacity is “computable” even without any cardinality bound on Q. However, the second

maximization problem is not convex in general, so there does not exist an efficient algo-
rithm to compute Csum from it.

4.5.3 Coded Time Sharing


It turns out that we can achieve the capacity region characterization in Theorem . di-
rectly without explicit time sharing. The proof involves the new technique of coded time
sharing as described in the following alternative proof of achievability.
Codebook generation. Fix a pmf p(q)p(x1 |q)p(x2 |q). Randomly generate a time-sharing
sequence qn according to ∏ni=1 pQ (qi ). Randomly and conditionally independently gener-
ate 2nR1 sequences x1n (m1 ), m1 ∈ [1 : 2nR1 ], each according to ∏ni=1 p X1 |Q (x1i |qi ). Similarly
generate 2nR2 sequences x2n (m2 ), m2 ∈ [1 : 2nR2 ], each according to ∏ni=1 p X2 |Q (x2i |qi ). The
chosen codebook, including qn , is revealed to the encoders and the decoder.
Encoding. To send (m1 , m2 ), transmit x1n (m1 ) and x2n (m2 ).
Decoding. The decoder declares that (m̂ 1 , m̂ 2 ) is sent if it is the unique message pair such that (q n , x1n (m̂ 1 ), x2n (m̂ 2 ), y n ) ∈ Tє(n) ; otherwise it declares an error.

Analysis of the probability of error. The decoder makes an error iff one or more of the
following events occur:

E1 = {(Q n , X1n (1), X2n (1), Y n ) ∉ Tє(n) },
E2 = {(Q n , X1n (m1 ), X2n (1), Y n ) ∈ Tє(n) for some m1 ̸= 1},
E3 = {(Q n , X1n (1), X2n (m2 ), Y n ) ∈ Tє(n) for some m2 ̸= 1},
E4 = {(Q n , X1n (m1 ), X2n (m2 ), Y n ) ∈ Tє(n) for some m1 ̸= 1, m2 ̸= 1}.

Hence P(E) ≤ ∑4j=1 P(E j ). By the LLN, P(E1 ) tends to zero as n → ∞. Since for m1 ̸= 1,
X1n (m1 ) is conditionally independent of (X2n (1), Y n ) given Q n , by the packing lemma
with A = [2 : 2nR1 ], Y ← (X2 , Y ), and U ← Q, P(E2 ) tends to zero as n → ∞ if R1 <
I(X1 ; X2 , Y |Q) − δ(є) = I(X1 ; Y|X2 , Q) − δ(є). Similarly, P(E3 ) tends to zero as n → ∞ if
R2 < I(X2 ; Y |X1 , Q) − δ(є). Finally, since for m1 ̸= 1, m2 ̸= 1, (X1n (m1 ), X2n (m2 )) is con-
ditionally independent of Y n given Q n , again by the packing lemma, P(E4 ) tends to zero
as n → ∞ if R1 + R2 < I(X1 , X2 ; Y |Q) − δ(є). The rest of the proof follows as before.
Remark .. As we will see, for example in Chapter , coded time sharing is necessary
when an achievable rate region cannot be equivalently expressed as the convex closure of
the union of rate regions that are achievable without time sharing; see Problem ..

4.6 GAUSSIAN MULTIPLE ACCESS CHANNEL

Consider the -sender discrete-time additive white Gaussian noise MAC (Gaussian MAC
in short) depicted in Figure ., which is a simple model, for example, for uplink (handset-
to-base station) channels in cellular systems. The channel output corresponding to the

inputs X1 and X2 is
Y = д1 X1 + д2 X2 + Z,

where д1 and д2 are the channel gains and Z ∼ N(0, N0 /2) is the noise. Thus in transmis-
sion time i ∈ [1 : n], the channel output is

Yi = д1 X1i + д2 X2i + Zi ,

where {Zi } is a WGN(N0 /2) process, independent of the channel inputs X1n and X2n (when
no feedback is present). We assume average transmission power constraints

∑ni=1 x_{ji}^2 (m j ) ≤ nP, m j ∈ [1 : 2nR j ], j = 1, 2.

Assume without loss of generality that N0 /2 = 1 and define the received powers (SNRs)
as S j = д 2j P, j = 1, 2. As we will see, the capacity region depends only on S1 and S2 .


Figure .. Gaussian multiple access channel.

4.6.1 Capacity Region of the Gaussian MAC


The capacity region of the Gaussian MAC is a pentagonal region with corner points char-
acterized by point-to-point Gaussian channel capacities.

Theorem .. The capacity region of the Gaussian MAC is the set of rate pairs (R1 , R2 )
such that

R1 ≤ C(S1 ),
R2 ≤ C(S2 ),
R1 + R2 ≤ C(S1 + S2 ),

where C(x) is the Gaussian capacity function.

This region is sketched in Figure .. Note that the capacity region coincides with the
simple outer bound in Section ..
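For given received SNRs the three constraints are immediate to evaluate. A minimal sketch (with illustrative values of S1 and S2 chosen here) using the Gaussian capacity function C(x) = (1/2) log(1 + x):

import numpy as np

def C(x):
    # Gaussian capacity function in bits per transmission
    return 0.5 * np.log2(1 + x)

S1, S2 = 4.0, 2.0     # illustrative received SNRs
print("R1      <=", C(S1))
print("R2      <=", C(S2))
print("R1 + R2 <=", C(S1 + S2))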

Figure .. Capacity region of the Gaussian MAC: a pentagon with corner points (C(S1 ), C(S2 /(1 + S1 ))) and (C(S1 /(1 + S2 )), C(S2 )).

Proof of achievability. Consider the R(X1 , X2 ) region where X1 and X2 are N(0, P),
independent of each other. Then

I(X1 ; Y | X2 ) = h(Y | X2 ) − h(Y | X1 , X2 )
             = h(д1 X1 + д2 X2 + Z | X2 ) − h(д1 X1 + д2 X2 + Z | X1 , X2 )
             = h(д1 X1 + Z) − h(Z)
             = (1/2) log(2πe(S1 + 1)) − (1/2) log(2πe)
             = C(S1 ).

The other two mutual information terms follow similarly. Hence the capacity region co-
incides with a single R(X1 , X2 ) region and there is no need to use time sharing. The rest
of the achievability proof follows by first establishing a coding theorem for the DM-MAC
with input costs (see Problem .), and then applying the discretization procedure used
in the achievability proof for the point-to-point Gaussian channel in Section ..
The converse proof is similar to that for the point-to-point Gaussian channel.

4.6.2 Comparison with Point-to-Point Coding Schemes


We compare the capacity region of the Gaussian MAC with the suboptimal rate regions
achieved by practical schemes that use point-to-point Gaussian channel codes. We further
show that such codes, when used with successive cancellation decoding and time sharing,
can achieve the entire capacity region.
Treating other codeword as noise. In this scheme, Gaussian random codes are used and
each message is decoded while treating the other codeword as noise. This scheme achieves
the set of rate pairs (R1 , R2 ) such that

R1 < C(S1 /(1 + S2 )),
R2 < C(S2 /(1 + S1 )).

Time-division multiple access. A naive time-division scheme achieves the set of rate
pairs (R1 , R2 ) such that

R1 < α C(S1 ),
R2 < ᾱ C(S2 )

for some α ∈ [0, 1].


Note that when the channel SNRs are sufficiently low, treating the other codeword as
noise can outperform time division as illustrated in Figure ..


Figure .. Comparison between time division (region RTD ) and treating the other
codeword as noise (region RAN ): (a) high SNR, (b) low SNR.

Time division with power control. The average power used by the senders in time di-
vision is strictly lower than the average power constraint P for α ∈ (0, 1). If the senders
are allowed to use higher powers during their transmission periods (without violating the
power constraint over the entire transmission block), strictly higher rates can be achieved.
We divide the transmission block into two subblocks, one of length αn and the other of length ᾱn (assuming αn is an integer). During the first subblock, sender 1 transmits using Gaussian random codes at average power P/α (rather than P) and sender 2 does not transmit. During the second subblock, sender 2 transmits at average power P/ᾱ and sender 1 does not transmit. Note that the average power constraints are satisfied. This scheme
achieves the set of rate pairs (R1 , R2 ) such that

R1 < α C(S1 /α),
R2 < ᾱ C(S2 /ᾱ)     (.)

for some α ∈ [0, 1].


Now set α = S1 /(S1 + S2 ). Substituting in (.) and adding yields a point (R1 , R2 ) on
the boundary of the inner bound that lies on the sum-capacity line C12 = C(S1 + S2 ) as
shown in Figure .!
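This can be confirmed numerically. The sketch below (same illustrative SNRs as in the earlier sketch) evaluates the sum rate of time division with power control at α = S1 /(S1 + S2 ) and compares it with the sum-capacity C(S1 + S2 ) and with the sum rate obtained by treating the other codeword as noise.

import numpy as np

def C(x):
    return 0.5 * np.log2(1 + x)

S1, S2 = 4.0, 2.0                         # illustrative received SNRs
alpha = S1 / (S1 + S2)

# time division with power control at the particular alpha above
R1_tdp = alpha * C(S1 / alpha)
R2_tdp = (1 - alpha) * C(S2 / (1 - alpha))

# treating the other codeword as noise
R1_an = C(S1 / (1 + S2))
R2_an = C(S2 / (1 + S1))

print("sum-capacity C(S1 + S2)       :", C(S1 + S2))
print("TD with power control sum rate:", R1_tdp + R2_tdp)
print("treating-as-noise sum rate    :", R1_an + R2_an)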

Figure .. Time division with power control (region RTDP ); the corner point on the sum-capacity line corresponds to α = S1 /(S1 + S2 ).

Remark .. It can be shown that time division with power control always outperforms
treating the other codeword as noise.

Successive cancellation decoding. As in the DM-MAC case, the corner points of the
Gaussian MAC capacity region can be achieved using successive cancellation decoding as
depicted in Figure ..
∙ Upon receiving y n = д2 x2n (m2 ) + д1 x1n (m1 ) + z n , the receiver recovers m2 while treat-
ing the received signal д1 x1n (m1 ) from sender 1 as part of the noise. The probability of
error for this step tends to zero as n → ∞ if R2 < C(S2 /(S1 + 1)).
∙ The receiver then subtracts д2 x2n (m2 ) from y n and decodes д1 x1n (m1 ) + z n to recover
the message m1 . The probability of error for this step tends to zero as n → ∞ if the
first decoding step is successful and R1 < C(S1 ).
The other corner point can be achieved by changing the decoding order and any point
on the R1 + R2 = C(S1 + S2 ) line can be achieved by time sharing between the two corner
points. Thus, any point inside the capacity region can be achieved using good point-to-
point Gaussian channel codes.
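The corner point reached by this decoding order can be checked against the sum-capacity via the identity C(S2 /(1 + S1 )) + C(S1 ) = C(S1 + S2 ); a short numerical confirmation (illustrative SNRs) is sketched below.

import numpy as np

def C(x):
    return 0.5 * np.log2(1 + x)

S1, S2 = 4.0, 2.0   # illustrative received SNRs
# corner point achieved by decoding M2 first (treating sender 1's signal as noise), then M1
print(C(S2 / (1 + S1)) + C(S1), "vs", C(S1 + S2))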
As before, the above argument can be made rigorous via the discretization procedure
used in the proof of achievability for the point-to-point Gaussian channel.

Figure .. Successive cancellation decoding assuming m̂ 2 is unique: the M2 -decoder operates on y n , its estimate m̂ 2 is re-encoded into x2n (m̂ 2 ) and subtracted from y n , and the M1 -decoder recovers m̂ 1 from the difference.

4.7 EXTENSIONS TO MORE THAN TWO SENDERS

The capacity region for the -sender DM-MAC extends naturally to DM-MACs with more
senders. Defining a code, achievability, and the capacity region for the k-sender DM-
MAC (X1 × X2 × ⋅ ⋅ ⋅ × Xk , p(y|x1 , x2 , . . . , xk ), Y) in a similar manner to the -sender case,
we can readily establish the following characterization of the capacity region.

Theorem .. The capacity region of the k-sender DM-MAC is the set of rate tuples
(R1 , R2 , . . . , Rk ) such that
∑_{j∈J} R j ≤ I(X(J ); Y | X(J c ), Q) for every J ⊆ [1 : k]

for some pmf p(q) ∏kj=1 p j (x j |q) with |Q| ≤ k.

For the k-sender Gaussian MAC with received SNRs S j for j ∈ [1 : k], the capacity
region is the set of rate tuples such that
∑_{j∈J} R j ≤ C(∑_{j∈J} S j ) for every J ⊆ [1 : k].

This region is a polymatroid in the k-dimensional space and each of k! corner points can
be achieved by using point-to-point Gaussian codes and successive cancellation decoding
in some message decoding order.
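For a fixed decoding order, the corresponding corner point follows from successive cancellation: the sender decoded first treats all other signals as noise, and each subsequent sender sees only the not-yet-decoded signals. The sketch below (illustrative SNRs and decoding order for k = 3, chosen here) computes such a corner point and checks that the rates sum to the sum-capacity.

import numpy as np

def C(x):
    return 0.5 * np.log2(1 + x)

def sic_corner(snrs, order):
    # rate of each sender when messages are decoded in the given order;
    # the sender decoded at step t treats the senders decoded later as noise
    rates = {}
    remaining = list(order)
    for j in order:
        remaining.remove(j)
        interference = sum(snrs[i] for i in remaining)
        rates[j] = C(snrs[j] / (1 + interference))
    return rates

snrs = {1: 4.0, 2: 2.0, 3: 1.0}            # illustrative received SNRs for k = 3 senders
rates = sic_corner(snrs, order=[1, 2, 3])  # decode sender 1 first, then 2, then 3
print(rates, "sum =", sum(rates.values()), "vs C(S1+S2+S3) =", C(sum(snrs.values())))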
Remark .. The multiple access channel is one of the few examples in network infor-
mation theory in which a straightforward extension of the optimal results for a small
number of users is optimal for an arbitrarily large number of users. Other examples in-
clude the degraded broadcast channel discussed in Chapter  and the distributed lossless
source coding problem discussed in Chapter . In other cases, e.g., the broadcast channel
with degraded message sets in Chapter , such straightforward extensions will be shown
to be strictly suboptimal.

SUMMARY

∙ Discrete memoryless multiple access channel (DM-MAC)


∙ Capacity region
∙ Time sharing via rate splitting
∙ Single-letter versus multiletter characterizations of the capacity region
∙ Successive cancellation decoding
∙ Simultaneous decoding is more powerful than successive cancellation decoding

∙ Time-sharing random variable


∙ Bounding the cardinality of the time-sharing random variable
∙ Coded time sharing is more powerful than time sharing
∙ Gaussian multiple access channel:
∙ Time division with power control achieves the sum-capacity
∙ Capacity region with average power constraints achieved via optimal point-to-
point codes, successive cancellation decoding, and time sharing

BIBLIOGRAPHIC NOTES

The multiple access channel was first alluded to in Shannon (). The multiletter char-
acterization of the capacity region in Theorem . was established by van der Meulen
(b). The single-letter characterization in Theorem . was established by Ahlswede
(, ) and Liao (). The characterization in Theorem . is due to Slepian and
Wolf (b) and the cardinality bound |Q| ≤ 2 appears in Csiszár and Körner (b).
The original proof of achievability uses successive cancellation decoding. Simultaneous
decoding first appeared in El Gamal and Cover (). Coded time sharing is due to Han
and Kobayashi (). Dueck (b) established the strong converse. The capacity region
of the Gaussian MAC in Theorem . is due to Cover (c) and Wyner (). Han
() and Tse and Hanly () studied the polymatroidal structure of the capacity re-
gion of the k-sender MAC. Surveys of the early literature on the multiple access channel
can be found in van der Meulen (, ).

PROBLEMS

.. Show that the multiletter characterization of the DM-MAC capacity region C in
Theorem . is convex.
.. Capacity region of multiple access channels.
(a) Consider the binary multiplier MAC in Example .. We established the capac-
ity region using time division between two individual capacities. Show that the
capacity region can be also expressed as the union of R(X1 , X2 ) sets (with no
time sharing) and specify the set of pmfs p(x1 )p(x2 ) on (X1 , X2 ) that achieve
the boundary of the capacity region.
(b) Find the capacity region of the modulo- sum MAC, where X1 and X2 are
binary and Y = X1 ⊕ X2 . Again show that the capacity region can be expressed
as the union of R(X1 , X2 ) sets and therefore time sharing is not necessary.
(c) The capacity regions of the above two MACs and the Gaussian MAC can each
be expressed as the union of R(X1 , X2 ) sets and no time sharing is necessary. Is
time sharing ever necessary? Find the capacity of the push-to-talk MAC with

binary inputs and output, given by p(0|0, 0) = p(1|0, 1) = p(1|1, 0) = 1 and


p(0|1, 1) = 1/2. Why is this channel called “push-to-talk”? Show that the ca-
pacity region cannot be completely expressed as the union of R(X1 , X2 ) sets
and that time sharing is necessary.
Remark: This example is due to Csiszár and Körner (b, Problem ..).
Another simple example for which time sharing is needed can be found in
Bierbaum and Wallmeier ().
.. Average vs. maximal probability of error. We proved the channel coding theorem
for a discrete memoryless channel under the average probability of error criterion.
It is straightforward to show that the capacity is the same if we instead deal with
the maximal probability of error. If we have a (2nR , n) code with Pe(n) ≤ є, then by
the Markov inequality, at least half its codewords have λm ≤ 2є and this half has
a rate R − 1/n → R. Such an argument cannot be used to show that the capacity
region of an arbitrary MAC under the maximal probability of error criterion is the
same as that under the average probability of error criterion.
(a) Argue that simply discarding half of the codeword pairs with the highest prob-
ability of error does not work in general.
(b) How about throwing out the worst half of each sender’s codewords? Show
that this does not work either. (Hint: Provide a simple example of a set of
probabilities λm1 m2 ∈ [0, 1], (m1 , m2 ) ∈ [1 : 2nR ] × [1 : 2nR ], such that the aver-
age probability of error 2−2nR ∑m1 ,m2 λm1 m2 ≤ є for some є ∈ (0, 1/4), yet there
are no subsets J , K ⊆ [1 : 2nR ] with cardinalities |J |, |K| ≥ 2nR /2 that satisfy
λm1 m2 ≤ 4є for all (m1 , m2 ) ∈ J × K.)
Remark: A much stronger statement holds. Dueck () provided an example of
a MAC for which the capacity region with maximal probability of error is strictly
smaller than that with average probability of error. This gives yet another exam-
ple in which a result from point-to-point information theory does not necessarily
carry over to the multiuser case.
.. Convex closure of the union of sets. Let

R1 = {(r1 , r2 ) : r1 ≤ I1 , r2 ≤ I2 , r1 + r2 ≤ I12 }

for some I1 , I2 , I12 ≥ 0 and

R1′ = {(r1 , r2 ) : r1 ≤ I1′ , r2 ≤ I2′ , r1 + r2 ≤ I12′ }

for some I1′ , I2′ , I12′ ≥ 0. Let R2 be the convex closure of the union of R1 and R1′ ,
and R3 = {(r1 , r2 ) : r1 ≤ αI1 + ᾱI1′ , r2 ≤ αI2 + ᾱI2′ , r1 + r2 ≤ αI12 + ᾱI12′ for some
α ∈ [0, 1]}.
(a) Show that R2 ⊆ R3 .
(b) Provide a counterexample showing that the inclusion can be strict.
(c) Under what condition on I1 , I2 , I12 , I1′ , I2′ , and I12′ , is R2 = R3 ?
(Hint: Consider the case I1 = 2, I2 = 5, I12 = 6, I1′ = 1, I2′ = 4, and I12′ = 7.)
.. Another multiletter characterization. Consider a DM-MAC p(y|x1 , x2 ). Let R (k) ,
k ≥ 1, be the set of rate pairs (R1 , R2 ) such that
R1 ≤ (1/k) I(X1k ; Y k | X2k ),
R2 ≤ (1/k) I(X2k ; Y k | X1k ),
R1 + R2 ≤ (1/k) I(X1k , X2k ; Y k )

for some pmf p(x1k )p(x2k ). Show that the capacity region can be characterized as
C = ⋃k R (k) .
.. From multiletter to single-letter. Show that the multiletter characterization in The-
orem . can be directly reduced to Theorem . (without a cardinality bound).
.. Converse for the Gaussian MAC. Prove the weak converse for the Gaussian MAC
with average power constraint P by continuing the inequalities
nR1 ≤ ∑ni=1 I(X1i ; Yi | X2i ) + nєn ,
nR2 ≤ ∑ni=1 I(X2i ; Yi | X1i ) + nєn ,
n(R1 + R2 ) ≤ ∑ni=1 I(X1i , X2i ; Yi ) + nєn .

.. DM-MAC with input costs. Consider the DM-MAC p(y|x1 , x2 ) and let b1 (x1 ) and
b2 (x2 ) be input cost functions with b j (x j0 ) = 0 for some x j0 ∈ X j , j = 1, 2. Assume
that there are cost constraints ∑ni=1 b j (x ji (m j )) ≤ nB j for m j ∈ [1 : 2nR ], j = 1, 2.
Show that the capacity region is the set of rate pairs (R1 , R2 ) such that

R1 ≤ I(X1 ; Y | X2 , Q),
R2 ≤ I(X2 ; Y | X1 , Q),
R1 + R2 ≤ I(X1 , X2 ; Y |Q)
for some pmf p(q)p(x1 |q)p(x2 |q) that satisfies E(b j (X j )) ≤ B j , j = 1, 2.
.. Cooperative capacity of a MAC. Consider a DM-MAC p(y|x1 , x2 ). Assume that
both senders have access to both messages m1 ∈ [1 : 2nR1 ] and m2 ∈ [1 : 2nR2 ]; thus
each of the codewords x1n (m1 , m2 ) and x2n (m1 , m2 ) can depend on both messages.
(a) Find the capacity region.
(b) Evaluate the region for the Gaussian MAC with noise power 1, channel gains
g1 and g2 , and average power constraint P on each of X1 and X2 .
.. Achievable SNR region. Consider a Gaussian multiple access channel Y = gX1 +
X2 + Z with Z ∼ N(0, 1), g ≥ 1, and average power constraints P1 on X1 and P2
on X2 .
(a) Specify the capacity region of this channel with SNRs S1 = g²P1 and S2 = P2 .
(b) Suppose we wish to communicate reliably at a fixed rate pair (R1 , R2 ). Specify
the achievable SNR region S (R1 , R2 ) consisting of all SNR pairs (s1 , s2 ) such
that (R1 , R2 ) is achievable.
(c) Find the SNR pair (s1∗ , s2∗ ) ∈ S (R1 , R2 ) that minimizes the total average trans-
mission power Psum = P1 + P2 . (Hint: You can use a simple geometric argu-
ment.)
(d) Can (R1 , R2 ) be achieved with minimum total average power Psum using only
successive cancellation decoding (i.e., without time sharing)? If so, what is the
order of message decoding? Can (R1 , R2 ) be achieved by time division with
power control? (Hint: Treat the cases g = 1 and g > 1 separately.)
(e) Find the minimum-energy-per-bit region of the Gaussian MAC, that is, the set
of all energy pairs (E1 , E2 ) = (P1 /R1 , P2 /R2 ) such that the rate pair (R1 , R2 ) is
achievable with average code power pair (P1 , P2 ).
.. Two-sender two-receiver channel. Consider the DM 2-sender 2-receiver channel
depicted in Figure . with message pair (M1 , M2 ) uniformly distributed over
[1 : 2nR1 ] × [1 : 2nR2 ]. Sender j = 1, 2 encodes the message M j . Each receiver
wishes to recover both messages (how is this channel related to the multiple access
channel?). The average probability of error is defined as Pe(n) = P{(M̂11 , M̂21 ) ≠ (M1 , M2 ) or (M̂12 , M̂22 ) ≠ (M1 , M2 )}.
defined as for the DM-MAC. Find a single-letter characterization of the capacity
region of this -sender -receiver channel.

Figure .. Two-sender two-receiver channel.

.. MAC with list codes. A (2nR1 , 2nR2 , 2nL2 , n) list code for a DM-MAC p(y|x1 , x2 )
consists of two message sets [1 : 2nR1 ] and [1 : 2nR2 ], two encoders, and a list de-
coder. The message pair (M1 , M2 ) is uniformly distributed over [1 : 2nR1 ] × [1 :
2nR2 ]. Upon receiving y n , the decoder finds m̂1 and a list of messages L2 ⊆ [1 : 2nR2 ]
of size |L2 | ≤ 2nL2 . An error occurs if M̂1 ≠ M1 or if the list L2 does not contain the
transmitted message M2 , i.e., Pe(n) = P{M̂1 ≠ M1 or M2 ∉ L2 }. A triple (R1 , R2 , L2 )
is said to be achievable if there exists a sequence of (2nR1 , 2nR2 , 2nL2 , n) list codes
such that limn→∞ Pe(n) = 0. The list capacity region of the DM-MAC is the closure
of the set of achievable triples (R1 , R2 , L2 ).
(a) Find a single-letter characterization of the list capacity region of the DM-MAC.
(b) Prove achievability and the converse.
.. MAC with dependent messages. Consider a DM-MAC p(y|x1 , x2 ) with message
pair (M1′ , M2′ ) distributed according to an arbitrary pmf on [1 : 2nR1 ] × [1 : 2nR2 ].
In particular, M1′ and M2′ are not independent in general. Show that if the rate pair
(R1 , R2 ) satisfies the inequalities in Theorem ., then there exists a sequence of
(2nR1 , 2nR2 , n) codes with limn→∞ P{(M̂1′ , M̂2′ ) ≠ (M1′ , M2′ )} = 0.

APPENDIX 4A CARDINALITY BOUND ON Q

We already know that the region C ′ of Theorem . without any bound on the cardi-
nality of Q is the capacity region. To establish the bound on |Q|, we first consider some
properties of the region C in Theorem ..
∙ Any pair (R1 , R2 ) in C is in the convex closure of the union of no more than two
R(X1 , X2 ) regions. This follows by the Fenchel–Eggleston–Carathéodory theorem in
Appendix A. Indeed, since the union of the R(X1 , X2 ) sets is a connected compact
set in ℝ2 , each point in its convex closure can be represented as a convex combination
of at most  points in the union, and thus each point is in the convex closure of the
union of no more than two R(X1 , X2 ) sets.
∙ Any point on the boundary of C is either a boundary point of some R(X1 , X2 ) set or
a convex combination of the “upper-diagonal” or “lower-diagonal” corner points of
two R(X1 , X2 ) sets as shown in Figure ..
These two properties imply that |Q| ≤ 2 suffices, which completes the proof.

Figure .. Boundary points of C as convex combinations of the corner points of R(X1 , X2 ).
CHAPTER 5

Degraded Broadcast Channels

We introduce the broadcast channel as a model for noisy one-to-many communication,


such as the downlink of a cellular system, digital TV broadcasting, or a lecture to a group
of people with diverse backgrounds. Unlike the multiple access channel, no single-letter
characterization is known for the capacity region of the broadcast channel. However, there
are several interesting coding schemes and corresponding inner bounds on the capacity
region that are tight in some special cases.
In this chapter, we introduce the technique of superposition coding and use it to es-
tablish an inner bound on the capacity region of the broadcast channel. We show that
this inner bound is tight for the class of degraded broadcast channels, which includes the
Gaussian broadcast channel. The converse proofs involve the new technique of auxiliary
random variable identification and the use of Mrs. Gerber’s lemma, a symmetrization ar-
gument, and the entropy power inequality. We then show that superposition coding is
optimal for the more general classes of less noisy and more capable broadcast channels.
The converse proof uses the Csiszár sum identity.
We will resume the discussion of the broadcast channel in Chapters  and . In these
chapters, we will present more general inner and outer bounds on the capacity region
and show that they coincide for other classes of channels. The reason for postponing the
presentation of these results is pedagogical—we will need coding techniques that can be
introduced at a more elementary level through the discussion of channels with state in
Chapter .

5.1 DISCRETE MEMORYLESS BROADCAST CHANNEL

Consider the broadcast communication system depicted in Figure .. The sender wishes
to reliably communicate a private message to each receiver and a common message to both
receivers. It encodes the message triple (M0 , M1 , M2 ) into a codeword X n and transmits
it over the channel. Upon receiving Y jn , receiver j = 1, 2 finds the estimates M̂0 j and M̂ j
of the common message and its private message, respectively. A tradeoff arises between
the rates of the three messages—when one of the rates is high, the other rates may need to
be reduced to ensure reliable communication of all three messages. As before, we study
the asymptotic limit on this tradeoff.
Figure .. Two-receiver broadcast communication system.

We first consider a -receiver discrete memoryless broadcast channel (DM-BC) model


(X , p(y1 , y2 |x), Y1 × Y2 ) that consists of three finite sets X , Y1 , Y2 , and a collection of
conditional pmfs p(y1 , y2 |x) on Y1 × Y2 (one for each input symbol x).
A (2nR0 , 2nR1 , 2nR2 , n) code for a DM-BC consists of
∙ three message sets [1 : 2nR0 ], [1 : 2nR1 ], and [1 : 2nR2 ],
∙ an encoder that assigns a codeword x n (m0 , m1 , m2 ) to each message triple (m0 , m1 ,
m2 ) ∈ [1 : 2nR0 ] × [1 : 2nR1 ] × [1 : 2nR2 ], and
∙ two decoders, where decoder  assigns an estimate (m ̂ 01 , m
̂ 1 ) ∈ [1 : 2nR0 ] × [1 : 2nR1 ]
or an error message e to each received sequence y1n , and decoder  assigns an estimate
̂ 02 , m
(m ̂ 2 ) ∈ [1 : 2nR0 ] × [1 : 2nR2 ] or an error message e to each received sequence y2n .
We assume that the message triple (M0 , M1 , M2 ) is uniformly distributed over [1 : 2nR0 ] ×
[1 : 2nR1 ] × [1 : 2nR2 ]. The average probability of error is defined as
Pe(n) = P{(M̂01 , M̂1 ) ≠ (M0 , M1 ) or (M̂02 , M̂2 ) ≠ (M0 , M2 )}.

A rate triple (R0 , R1 , R2 ) is said to be achievable for the DM-BC if there exists a sequence
of (2nR0 , 2nR1 , 2nR2 , n) codes such that limn→∞ Pe(n) = 0. The capacity region C of the DM-
BC is the closure of the set of achievable rate triples (R0 , R1 , R2 ).
The following simple observation, which results from the lack of cooperation between
the two receivers, will prove useful.

Lemma .. The capacity region of the DM-BC depends on the channel conditional
pmf p(y1 , y2 |x) only through the conditional marginal pmfs p(y1 |x) and p(y2 |x).

Proof. Consider the individual probabilities of error

Pej(n) = P{(M̂0 j , M̂ j ) ≠ (M0 , M j )}, j = 1, 2.

Note that each term depends only on its corresponding conditional marginal pmf p(y j |x),
j = 1, 2. By the union of events bound, Pe(n) ≤ Pe1(n) + Pe2(n) . Also Pe(n) ≥ max{Pe1(n) , Pe2(n) }.
Hence limn→∞ Pe(n) = 0 iff limn→∞ Pe1(n) = 0 and limn→∞ Pe2(n) = 0, which implies that the
capacity region of a DM-BC depends only on the conditional marginal pmfs. This com-
pletes the proof of the lemma.
For simplicity of presentation, we focus the discussion on the private-message capacity


region, that is, the capacity region when R0 = 0. We then show how the results can be
readily extended to the case of both common and private messages.

5.2 SIMPLE BOUNDS ON THE CAPACITY REGION

Consider a DM-BC p(y1 , y2 |x). Let C j = max p(x) I(X; Y j ), j = 1, 2, be the capacities of
the DMCs p(y1 |x) and p(y2 |x). These capacities define the time-division inner bound in
Figure .. By allowing full cooperation between the receivers, we obtain the bound on
the sum-rate
R1 + R2 ≤ C12 = max p(x) I(X; Y1 , Y2 ).

Combining this bound with the individual capacity bounds gives the outer bound on the
capacity region in Figure ..

Figure .. Time-division inner bound and an outer bound on the capacity region.

The time-division bound can be tight in some cases.


Example .. Consider a symmetric DM-BC, where Y1 = Y2 = Y and pY1 |X (y|x) =
pY2 |X (y|x) = p(y|x). In this example, C1 = C2 = max p(x) I(X; Y). Since the capacity re-
gion depends only on the marginals of p(y1 , y2 |x), we can assume that Y1 = Y2 = Y. Thus
R1 + R2 ≤ C12 = max p(x) I(X; Y ) = C1 = C2 .

This shows that the time-division inner bound and the outer bound sometimes coincide.

The following example shows that the bounds do not always coincide.
Example .. Consider a DM-BC with orthogonal components, where X = X1 × X2 and
p(y1 , y2 |x1 , x2 ) = p(y1 |x1 )p(y2 |x2 ). For this example, the capacity region is the set of rate
pairs (R1 , R2 ) such that R1 ≤ C1 and R2 ≤ C2 ; thus the outer bound is tight, but the time-
division inner bound is not.
As we will see, neither bound is tight in general.

5.3 SUPERPOSITION CODING INNER BOUND

The superposition coding technique is motivated by broadcast channels where one re-
ceiver is “stronger” than the other, for example, because it is closer to the sender, such that
it can always recover the “weaker” receiver’s message. This suggests a layered coding ap-
proach in which the weaker receiver’s message is treated as a “public” (common) message
and the stronger receiver’s message is treated as a “private” message. The weaker receiver
recovers only the public message, while the stronger receiver recovers both messages us-
ing successive cancellation decoding as for the MAC; see Section ... We illustrate this
coding scheme in the following.
Example . (Binary symmetric broadcast channel). The binary symmetric broadcast
channel (BS-BC) consists of a BSC(p1 ) and a BSC(p2 ) as depicted in Figure .. Assume
that p1 < p2 < 1/2.
Using time division, we can achieve the straight line between the capacities 1 − H(p1 )
and 1 − H(p2 ) of the individual BSCs. Can we achieve higher rates than time division? To
answer this question, consider the following superposition coding technique illustrated in
Figure ..
For α ∈ [0, 1/2], let U ∼ Bern(1/2) and V ∼ Bern(α) be independent, and X = U ⊕
V . Randomly and independently generate 2nR2 sequences un (m2 ), each i.i.d. Bern(1/2)
(cloud centers). Randomly and independently generate 2nR1 sequences v n (m1 ), each i.i.d.
Bern(α). The sender transmits x n (m1 , m2 ) = un (m2 ) ⊕ v n (m1 ) (satellite codeword).
To recover m2 , receiver 2 decodes y2n = un (m2 ) ⊕ (v n (m1 ) ⊕ z2n ) while treating v n (m1 )
as noise. The probability of decoding error tends to zero as n → ∞ if R2 < 1 − H(α ∗ p2 ),
where α ∗ p2 = α p̄2 + ᾱ p2 .
Receiver 1 uses successive cancellation decoding—it first decodes y1n = un (m2 ) ⊕
(v n (m1 ) ⊕ z1n ) to recover m2 while treating v n (m1 ) as part of the noise, subtracts off un (m2 ),
and then decodes v n (m1 ) ⊕ z1n to recover m1 . The probability of decoding error tends to
zero as n → ∞ if R1 < I(V ; V ⊕ Z1 ) = H(α ∗ p1 ) − H(p1 ) and R2 < 1 − H(α ∗ p1 ). The
latter condition is already satisfied by the rate constraint for receiver 2.

Figure .. Binary symmetric broadcast channel and its time-division inner bound.

Figure .. Superposition coding for the BS-BC.
Thus, superposition coding leads to an inner bound consisting of the set of rate pairs
(R1 , R2 ) such that

R1 ≤ H(α ∗ p1 ) − H(p1 ),
R2 ≤ 1 − H(α ∗ p2 )

for some α ∈ [0, 1/2]. This inner bound is larger than the time-division inner bound as
sketched in Figure .. We will show later that this bound is the capacity region of the
BS-BC.

Figure .. Superposition coding inner bound for the BS-BC.

The superposition coding technique illustrated in the above example can be general-
ized to obtain the following inner bound on the capacity region of the general DM-BC.
Theorem . (Superposition Coding Inner Bound). A rate pair (R1 , R2 ) is achiev-
able for the DM-BC p(y1 , y2 |x) if

R1 < I(X; Y1 |U),


R2 < I(U ; Y2 ),
R1 + R2 < I(X; Y1 )

for some pmf p(u, x).

Remark 5.1. The random variable U in the theorem is an auxiliary random variable that
does not correspond to any of the channel variables. While the time-sharing random
variable encountered in Chapter  is also an auxiliary random variable, U actually carries
message information and as such plays a more essential role in the characterization of the
capacity region.
Remark 5.2. The superposition coding inner bound is evaluated for the joint pmf p(u, x)
p(y1 , y2 |x), i.e., U → X → (Y1 , Y2 ); see Remark ..
Remark 5.3. It can be shown that the inner bound is convex and therefore there is no
need for further convexification via a time-sharing random variable.

5.3.1 Proof of the Superposition Coding Inner Bound


We begin with a sketch of achievability; see Figure .. Fix a pmf p(u, x) and randomly
generate 2nR2 “cloud centers” un (m2 ). For each cloud center un (m2 ), generate 2nR1 “satel-
lite” codewords x n (m1 , m2 ). Receiver  decodes for the cloud center un (m2 ) and receiver 
decodes for the satellite codeword.

Figure .. Superposition coding.


We now provide details of the proof.


Codebook generation. Fix a pmf p(u)p(x|u). Randomly and independently generate
2nR2 sequences un (m2 ), m2 ∈ [1 : 2nR2 ], each according to ∏ni=1 pU (ui ). For each m2 ∈
[1 : 2nR2 ], randomly and conditionally independently generate 2nR1 sequences x n (m1 , m2 ),
m1 ∈ [1 : 2nR1 ], each according to ∏ni=1 p X|U (xi |ui (m2 )).
Encoding. To send (m1 , m2 ), transmit x n (m1 , m2 ).
Decoding. Decoder  declares that m ̂ 2 is sent if it is the unique message such that (un (m ̂ 2 ),
(n)
y2 ) ∈ Tє ; otherwise it declares an error. Since decoder  is interested only in recover-
n

ing m1 , it uses simultaneous decoding without requiring the recovery of m2 , henceforth


referred to as simultaneous nonunique decoding. Decoder  declares that m ̂ 1 is sent if it is
the unique message such that (un (m2 ), x n (m̂ 1 , m2 ), y1n ) ∈ Tє(n) for some m2 ; otherwise it
declares an error.
Analysis of the probability of error. Assume without loss of generality that (M1 , M2 ) =
(1, 1) is sent. First consider the average probability of error for decoder . This decoder
makes an error iff one or both of the following events occur:

E21 = 󶁁(U n (1), Y2n ) ∉ Tє(n) 󶁑,


E22 = 󶁁(U n (m2 ), Y2n ) ∈ Tє(n) for some m2 ̸= 1󶁑.

Thus the probability of error for decoder  is upper bounded as

P(E2 ) ≤ P(E21 ) + P(E22 ).

By the LLN, the first term P(E21 ) tends to zero as n → ∞. For the second term, since
U n (m2 ) is independent of (U n (1), Y2n ) for m2 ̸= 1, by the packing lemma, P(E22 ) tends to
zero as n → ∞ if R2 < I(U ; Y2 ) − δ(є).
Next consider the average probability of error for decoder 1. To analyze this probabil-
ity, consider all possible pmfs for the triple (U n (m2 ), X n (m1 , m2 ), Y1n ) as listed in Table ..
Since the last case in the table does not result in an error, the decoder makes an error iff

m1    m2    Joint pmf
1     1     p(un , x n ) p(y1n |x n )
∗     1     p(un , x n ) p(y1n |un )
∗     ∗     p(un , x n ) p(y1n )
1     ∗     p(un , x n ) p(y1n )

Table .. The joint pmfs induced by different (m1 , m2 ) pairs. The ∗ symbol corre-
sponds to message m1 ≠ 1 or m2 ≠ 1.
one or more of the following three events occur:

E11 = {(U n (1), X n (1, 1), Y1n ) ∉ Tє(n) },
E12 = {(U n (1), X n (m1 , 1), Y1n ) ∈ Tє(n) for some m1 ≠ 1},
E13 = {(U n (m2 ), X n (m1 , m2 ), Y1n ) ∈ Tє(n) for some m1 ≠ 1, m2 ≠ 1}.

Thus the probability of error for decoder  is upper bounded as

P(E1 ) ≤ P(E11 ) + P(E12 ) + P(E13 ).

We now bound each term. By the LLN, P(E11 ) tends to zero as n → ∞. For the second
term, note that if m1 ̸= 1, then X n (m1 , 1) is conditionally independent of (X n (1, 1), Y1n )
given U n (1) and is distributed according to ∏ni=1 p X|U (xi |ui (1)). Hence, by the packing
lemma, P(E12 ) tends to zero as n → ∞ if R1 < I(X; Y1 |U ) − δ(є).
Finally, for the third term, note that for m2 ̸= 1 (and any m1 ), (U n (m2 ), X n (m1 , m2 )) is
independent of (U n (1), X n (1, 1), Y1n ). Hence, by the packing lemma, P(E13 ) tends to zero
as n → ∞ if R1 + R2 < I(U , X; Y1 ) − δ(є) = I(X; Y1 ) − δ(є) (recall that U → X → Y1 form
a Markov chain). This completes the proof of achievability.

Remark 5.4. Consider the error event

E14 = {(U n (m2 ), X n (1, m2 ), Y1n ) ∈ Tє(n) for some m2 ≠ 1}.

By the packing lemma, P(E14 ) tends to zero as n → ∞ if R2 < I(U , X; Y1 ) − δ(є) =


I(X; Y1 ) − δ(є), which is already satisfied. Therefore, the inner bound does not change
if we use simultaneous (unique) decoding and require decoder  to also recover M2 .
Remark 5.5. We can obtain a second superposition coding inner bound by having re-
ceiver  decode for the cloud center (which would now represent M1 ) and receiver  de-
code for the satellite codeword. This yields the set of rate pairs (R1 , R2 ) such that

R1 < I(U ; Y1 ),
R2 < I(X; Y2 |U),
R1 + R2 < I(X; Y2 )

for some pmf p(u, x). The convex closure of the union of the two superposition coding
inner bounds constitutes a generally tighter inner bound on the capacity region.
Remark 5.6. Superposition coding is optimal for several classes of broadcast channels for
which one receiver is stronger than the other, as we detail in the following sections. It is
not, however, optimal in general since requiring one of the receivers to recover both mes-
sages (even nonuniquely for the other receiver’s message) can unduly constrain the set of
achievable rates. In Chapter , we present Marton’s coding scheme, which can outperform
superposition coding by not requiring either receiver to recover both messages.
5.4 DEGRADED DM-BC

In a degraded broadcast channel, one of the receivers is statistically stronger than the other.
Formally, a DM-BC is said to be physically degraded if

p(y1 , y2 |x) = p(y1 |x)p(y2 | y1 ),

i.e., X → Y1 → Y2 form a Markov chain.


More generally, a DM-BC p(y1 , y2 |x) is said to be stochastically degraded (or simply
degraded) if there exists a random variable Ỹ1 such that Ỹ1 |{X = x} ∼ pY1 |X ( ỹ1 |x), i.e., Ỹ1
has the same conditional pmf as Y1 (given X), and X → Ỹ1 → Y2 form a Markov chain.
Since the capacity region of a DM-BC depends only on the conditional marginal pmfs,
the capacity region of a stochastically degraded DM-BC is the same as that of its corre-
sponding physically degraded channel. This observation will be used later in the proof of
the converse.
Note that the BS-BC in Example . is a degraded DM-BC. Again assume that p1 <
p2 < 1/2. The channel is degraded since we can write Y2 = X ⊕ Z̃1 ⊕ Z̃2 , where Z̃1 ∼
Bern(p1 ) and Z̃2 ∼ Bern( p̃2 ) are independent, and
p̃2 = (p2 − p1 )/(1 − 2p1 ).
Figure . shows the physically degraded BS-BC with the same conditional marginal pmfs.
The capacity region of the original BS-BC is the same as that of the physically degraded
version.
Figure .. Corresponding physically degraded BS-BC.

The superposition coding inner bound is tight for the class of degraded broadcast
channels.

Theorem .. The capacity region of the degraded DM-BC p(y1 , y2 |x) is the set of rate
pairs (R1 , R2 ) such that

R1 ≤ I(X; Y1 |U),
R2 ≤ I(U ; Y2 )

for some pmf p(u, x), where the cardinality of the auxiliary random variable U satisfies
|U| ≤ min{|X |, |Y1 |, |Y2 |} + 1.
To prove achievability, note that the sum of the first two inequalities in the super-
position coding inner bound gives R1 + R2 < I(U ; Y2 ) + I(X; Y1 |U). Since the channel is
degraded, I(U ; Y1 ) ≥ I(U ; Y2 ) for all p(u, x). Hence, I(U ; Y2 ) + I(X; Y1 |U) ≤ I(U ; Y1 ) +
I(X; Y1 |U ) = I(X; Y1 ) and the third inequality is automatically satisfied.

5.4.1 Proof of the Converse


We need to show that given any sequence of (2nR1 , 2nR2 , n) codes with limn→∞ Pe(n) = 0, we
must have R1 ≤ I(X; Y1 |U ) and R2 ≤ I(U ; Y2 ) for some pmf p(u, x) such that U → X →
(Y1 , Y2 ) form a Markov chain. The key new idea in the converse proof is the identification
of the auxiliary random variable U. Every (2nR1 , 2nR2 , n) code (by definition) induces a
joint pmf on (M1 , M2 , X n , Y1n , Y2n ) of the form

p(m1 , m2 , x n , y1n , y2n ) = 2−n(R1 +R2 ) p(x n |m1 , m2 ) ∏ni=1 pY1 ,Y2 |X (y1i , y2i |xi ).

By Fano’s inequality,

H(M1 |Y1n ) ≤ nR1 Pe(n) + 1 ≤ nєn ,


H(M2 |Y2n ) ≤ nR2 Pe(n) + 1 ≤ nєn

for some єn that tends to zero as n → ∞. Hence

nR1 ≤ I(M1 ; Y1n ) + nєn ,
nR2 ≤ I(M2 ; Y2n ) + nєn . (.)

First consider the following intuitive identification of the auxiliary random variable.
Since U represents M2 in the superposition coding scheme, it is natural to choose U = M2 ,
which also satisfies the desired Markov chain condition U → Xi → (Y1i , Y2i ). Using this
identification, consider the first mutual information term in (.),

I(M1 ; Y1n ) ≤ I(M1 ; Y1n |M2 )
= I(M1 ; Y1n |U )
= ∑ni=1 I(M1 ; Y1i |U , Y1i−1 ) (.)
≤ ∑ni=1 I(M1 , Y1i−1 ; Y1i |U )
= ∑ni=1 I(Xi , M1 , Y1i−1 ; Y1i |U )
= ∑ni=1 I(Xi ; Y1i |U ). (.)
Note that this bound has the same structure as the inequality on R1 in Theorem .. Next,
consider the second mutual information term in (.),
I(M2 ; Y2n ) = ∑ni=1 I(M2 ; Y2i |Y2i−1 )
= ∑ni=1 I(U ; Y2i |Y2i−1 ).

However, I(U ; Y2i |Y2i−1 ) is not necessarily less than or equal to I(U ; Y2i ). Thus setting U =
M2 does not yield an inequality that has the same form as the one on R2 in Theorem ..
Step (.) in the above series of inequalities suggests the identification of the auxiliary
random variable as Ui = (M2 , Y1i−1 ), which also satisfies Ui → Xi → (Y1i , Y2i ). This gives

I(M1 ; Y1n ) ≤ I(M1 ; Y1n |M2 )
= ∑ni=1 I(M1 ; Y1i |Y1i−1 , M2 )
= ∑ni=1 I(Xi , M1 ; Y1i |Ui )
= ∑ni=1 I(Xi ; Y1i |Ui ). (.)

Now consider the second term in (.)


I(M2 ; Y2n ) = ∑ni=1 I(M2 ; Y2i |Y2i−1 )
≤ ∑ni=1 I(M2 , Y2i−1 ; Y2i )
≤ ∑ni=1 I(M2 , Y2i−1 , Y1i−1 ; Y2i ). (.)

However, I(M2 , Y2i−1 , Y1i−1 ; Y2i ) is not necessarily equal to I(M2 , Y1i−1 ; Y2i ).
Using the observation that the capacity region of the general degraded DM-BC is
the same as that of the corresponding physically degraded BC, we can assume without
loss of generality that X → Y1 → Y2 form a Markov chain. This implies that Y2i−1 →
(M2 , Y1i−1 ) → Y2i also form a Markov chain and hence (.) implies that
I(M2 ; Y2n ) ≤ ∑ni=1 I(Ui ; Y2i ). (.)

To complete the proof, define the time-sharing random variable Q to be uniformly dis-
tributed over [1 : n] and independent of (M1 , M2 , X n , Y1n , Y2n ), and identify U = (Q, UQ ),
X = XQ , Y1 = Y1Q , and Y2 = Y2Q . Clearly, U → X → (Y1 , Y2 ) form a Markov chain. Using


these definitions and substituting from (.) and (.) into (.), we have
nR1 ≤ ∑ni=1 I(Xi ; Y1i |Ui ) + nєn
= nI(XQ ; Y1Q |UQ , Q) + nєn
= nI(X; Y1 |U ) + nєn ,
nR2 ≤ ∑ni=1 I(Ui ; Y2i ) + nєn
= nI(UQ ; Y2Q |Q) + nєn
≤ nI(U ; Y2 ) + nєn .

The bound on the cardinality of U can be established using the convex cover method
in Appendix C. This completes the proof of Theorem ..
Remark .. The proof works also for Ui = (M2 , Y2i−1 ) or Ui = (M2 , Y1i−1 , Y2i−1 ); both
identifications satisfy the Markov condition Ui → Xi → (Y1i , Y2i ).

5.4.2 Capacity Region of the BS-BC


We show that the superposition coding inner bound for the BS-BC presented in Exam-
ple . is tight. Since the BS-BC is degraded, Theorem . characterizes its capacity region.
We show that this region can be simplified to the set of rate pairs (R1 , R2 ) such that

R1 ≤ H(α ∗ p1 ) − H(p1 ),
(.)
R2 ≤ 1 − H(α ∗ p2 )

for some α ∈ [0, 1/2].


In Example ., we argued that the interior of this rate region is achievable via super-
position coding by taking U ∼ Bern(1/2) and X = U ⊕ V , where V ∼ Bern(α) is indepen-
dent of U . We now show that the characterization of the capacity region in Theorem .
reduces to (.) using two alternative methods.
Proof via Mrs. Gerber’s lemma. First recall that the capacity region is the same as that
of the physically degraded DM-BC X → Ỹ1 → Y2 . Thus we assume that it is physically
degraded, i.e., Y2 = Y1 ⊕ Z̃2 , where Z1 ∼ Bern(p1 ) and Z̃2 ∼ Bern( p̃2 ) are independent and
p2 = p1 ∗ p̃2 .
Consider the second inequality in the capacity region characterization in Theorem .

I(U ; Y2 ) = H(Y2 ) − H(Y2 |U ) ≤ 1 − H(Y2 |U ).

Since 1 ≥ H(Y2 |U ) ≥ H(Y2 |X) = H(p2 ), there exists an α ∈ [0, 1/2] such that H(Y2 |U ) =
H(α ∗ p2 ). Next consider

I(X; Y1 |U) = H(Y1 |U ) − H(Y1 | X) = H(Y1 |U) − H(p1 ).


Now let 0 ≤ H −1 (v) ≤ 1/2 be the inverse of the binary entropy function. By physical
degradedness and the scalar MGL in Section .,

H(Y2 |U ) = H(Y1 ⊕ Z̃2 |U ) ≥ H(H −1 (H(Y1 |U )) ∗ p̃2 ).

But H(Y2 |U) = H(α ∗ p2 ) = H(α ∗ p1 ∗ p̃2 ), and thus

H(Y1 |U ) ≤ H(α ∗ p1 ).
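The inverse H −1 on [0, 1/2] has no closed form, but it is simple to compute by bisection, which makes the MGL bound easy to evaluate numerically. A small sketch follows; the value of p̃2 and the placeholder value of H(Y1 |U ) are arbitrary choices used only for illustration.

    import numpy as np

    def H(p):
        p = min(max(p, 1e-15), 1 - 1e-15)
        return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

    def Hinv(v):
        # inverse of the binary entropy function restricted to [0, 1/2], by bisection
        lo, hi = 0.0, 0.5
        for _ in range(60):
            mid = (lo + hi) / 2
            if H(mid) < v:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    def conv(a, b):
        return a * (1 - b) + (1 - a) * b

    pt2 = 0.125                   # placeholder value of the cascade crossover
    v = 0.7                       # placeholder value of H(Y1 | U) in bits
    print(H(conv(Hinv(v), pt2)))  # MGL lower bound on H(Y2 | U)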

Proof via a symmetrization argument. Since for the BS-BC |U | ≤ |X | + 1 = 3, assume


without loss of generality that U = {1, 2, 3}. Given any (U , X) ∼ p(u, x), define (Ũ , X̃ ) ∼
p(ũ , x̃) as
pŨ (u) = pŨ (−u) = (1/2) pU (u), u ∈ {1, 2, 3},
pX̃|Ũ (x |u) = pX̃|Ũ (1 − x | − u) = p X|U (x |u), (u, x) ∈ {1, 2, 3} × {0, 1}.

Further define Ỹ1 to be the output of the BS-BC when the input is X̃. Given {|Ũ | = u},
u = 1, 2, 3, the channel from Ũ to X̃ is a BSC with parameter p(x|u) (with input alphabet
{−u, u} instead of {0, 1}). Thus, H(Y1 |U = u) = H(Ỹ1 |Ũ = u) = H(Ỹ1 |Ũ = −u) for u =
1, 2, 3, which implies that H(Y1 |U ) = H(Ỹ1 |Ũ ) and the first bound in the capacity region
is preserved. Similarly, H(Y2 |U) = H(Ỹ2 |Ũ ). Also note that X̃ ∼ Bern(1/2) and so are the
corresponding outputs Ỹ1 and Ỹ2 . Hence,

H(Y2 ) − H(Y2 |U ) ≤ H(Ỹ2 ) − H(Ỹ2 |Ũ ).

Therefore, it suffices to evaluate the capacity region characterization in Theorem . with
the above symmetric input pmfs p(ũ , x̃).
Next consider the weighted sum

λI( X̃; Ỹ1 |Ũ ) + (1 − λ)I(Ũ ; Ỹ2 )
= λH(Ỹ1 |Ũ ) − (1 − λ)H(Ỹ2 |Ũ ) − λH(p1 ) + (1 − λ)
= ∑3u=1 (λH(Ỹ1 |Ũ , |Ũ | = u) − (1 − λ)H(Ỹ2 |Ũ , |Ũ | = u))p(u) − λH(p1 ) + (1 − λ)
≤ max u∈{1,2,3} (λH(Ỹ1 |Ũ , |Ũ | = u) − (1 − λ)H(Ỹ2 |Ũ , |Ũ | = u)) − λH(p1 ) + (1 − λ)

for some λ ∈ [0, 1]. Since the maximum is attained by a single u (i.e., pŨ (u) = pŨ (−u) =
1/2) and given |Ũ | = u, the channel from Ũ to X̃ is a BSC, the set of rate pairs (R1 , R2 )
such that

λR1 + (1 − λ)R2 ≤ max α [λ(H(α ∗ p1 ) − H(p1 )) + (1 − λ)(1 − H(α ∗ p2 ))]

constitutes an outer bound on the capacity region given by the supporting line corre-
sponding to each λ ∈ [0, 1]. Finally, since the rate region in (.) is convex (see Prob-
lem .), by Lemma A. it coincides with the capacity region.
Remark 5.8. A brute-force optimization of the weighted sum Rsum (λ) = λI( X̃; Ỹ1 |Ũ ) +
(1 − λ)I(Ũ ; Ỹ2 ) can lead to the same conclusion. In this approach, it suffices to optimize
over all p(u, x) with |U | ≤ 2. The above symmetrization argument provides a more elegant
approach to performing this three-dimensional optimization.
Remark 5.9. The second converse proof does not require physical degradedness. In fact,
this symmetrization argument can be used to evaluate the superposition coding inner
bound in Theorem . for any nondegraded binary-input DM-BC with pY1 ,Y2 |X (y1 , y2 |0) =
pY1 ,Y2 |X (−y1 , −y2 |1) for all y1 , y2 (after proper relabeling of the symbols) such as the BSC–
BEC BC in Example . of Section ..

5.5 GAUSSIAN BROADCAST CHANNEL

Consider the -receiver discrete-time additive white Gaussian noise (Gaussian in short)
BC model depicted in Figure .. The channel outputs corresponding to the input X are

Y1 = g1 X + Z1 ,
Y2 = g2 X + Z2 ,

where g1 and g2 are channel gains, and Z1 ∼ N(0, N0 /2) and Z2 ∼ N(0, N0 /2) are noise
components. Thus in transmission time i, the channel outputs are

Y1i = g1 Xi + Z1i ,
Y2i = g2 Xi + Z2i ,

where {Z1i } and {Z2i } are WGN(N0 /2) processes, independent of the channel input X n .
Note that only the marginal distributions of {(Z1i , Z2i )} are relevant to the capacity region
and hence we do not need to specify their joint distribution. Assume without loss of
generality that |g1 | ≥ |g2 | and N0 /2 = 1. Further assume an average transmission power
constraint
∑ni=1 xi2 (m1 , m2 ) ≤ nP, (m1 , m2 ) ∈ [1 : 2nR1 ] × [1 : 2nR2 ].

For notational convenience, we consider the equivalent Gaussian BC channel

Y1 = X + Z1 ,
Y2 = X + Z2 .

In this model, the channel gains are normalized to 1 and the transmitter-referred noise
components Z1 and Z2 are zero-mean Gaussian with powers N1 = 1/g1² and N2 = 1/g2²,
respectively. This equivalence can be seen by first multiplying both sides of the equa-
tions of the original channel model by 1/g1 and 1/g2 , respectively, and then scaling the
resulting channel outputs by g1 and g2 .

Figure .. Gaussian broadcast channel.

Note that the equivalent broadcast channel is
stochastically degraded. Hence, its capacity region is the same as that of the physically
degraded Gaussian BC (see Figure .)
Y1 = X + Z1 ,
Y2 = Y1 + Z̃2 ,
where Z1 and Z̃2 are independent Gaussian random variables with average powers N1 and
(N2 − N1 ), respectively. Define the channel received signal-to-noise ratios as S1 = P/N1
and S2 = P/N2 .

Figure .. Corresponding physically degraded Gaussian BC.

5.5.1 Capacity Region of the Gaussian BC


The capacity region of the Gaussian BC is a function only of the channel SNRs and a power
allocation parameter.

Theorem .. The capacity region of the Gaussian BC is the set of rate pairs (R1 , R2 )
such that
R1 ≤ C(αS1 ),
R2 ≤ C(ᾱS2 /(αS2 + 1))
for some α ∈ [0, 1], where C(x) is the Gaussian capacity function.
Achievability follows by setting U ∼ N(0, ᾱP) and V ∼ N(0, αP), independent of each
other, and X = U + V ∼ N(0, P) in the superposition coding inner bound. With this choice
of (U , X), it can be readily shown that
I(X; Y1 |U ) = C(αP/N1 ) = C(αS1 ),
I(U ; Y2 ) = C(ᾱP/(αP + N2 )) = C(ᾱS2 /(αS2 + 1)),
which are the expressions in the theorem.
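As with the BS-BC, the boundary of this region can be traced by sweeping the power-allocation parameter α. The Python fragment below is only an illustration; the received SNR values S1 and S2 are arbitrary choices with S1 ≥ S2 .

    import numpy as np

    def C(x):
        # Gaussian capacity function C(x) = (1/2) log2(1 + x)
        return 0.5 * np.log2(1 + x)

    S1, S2 = 20.0, 5.0                               # illustrative SNRs with S1 >= S2
    for alpha in np.linspace(0.0, 1.0, 6):
        R1 = C(alpha * S1)                           # stronger receiver
        R2 = C((1 - alpha) * S2 / (alpha * S2 + 1))  # weaker receiver
        print(f"alpha = {alpha:.1f}:  R1 <= {R1:.3f},  R2 <= {R2:.3f}")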
As for the BS-BC in Example ., the superposition coding scheme for the Gauss-
ian BC can be described more explicitly. Randomly and independently generate 2nR2 se-
quences un (m2 ), m2 ∈ [1 : 2nR2 ], each i.i.d. N(0, ᾱP), and 2nR1 sequences v n (m1 ), m1 ∈
[1 : 2nR1 ], each i.i.d. N(0, αP). To send the message pair (m1 , m2 ), the encoder transmits
x n (m1 , m2 ) = un (m2 ) + v n (m1 ).
Receiver 2 recovers m2 from y2n = un (m2 ) + (v n (m1 ) + z2n ) while treating v n (m1 ) as
noise. The probability of decoding error tends to zero as n → ∞ if R2 < C(ᾱP/(αP + N2 )).
Receiver 1 uses successive cancellation—it first decodes y1n = un (m2 ) + (v n (m1 ) + z1n )
to recover m2 while treating v n (m1 ) as part of the noise, subtracts off un (m2 ), and then
decodes v n (m1 ) + z1n to recover m1 . The probability of decoding error tends to zero as
n → ∞ if R1 < C(αP/N1 ) and R2 < C(ᾱP/(αP + N1 )). Since N1 < N2 , the latter condition
is already satisfied by the rate constraint for receiver 2.

Remark 5.10. To make the above arguments rigorous, we can use the discretization pro-
cedure detailed in the achievability proof for the point-to-point Gaussian channel in Sec-
tion ..
Remark 5.11. As for the Gaussian MAC in Section ., every point in the capacity region
can be achieved simply using good point-to-point Gaussian channel codes and successive
cancellation decoding.

5.5.2 Proof of the Converse


Since the capacity region of the Gaussian BC is the same as that of its corresponding phys-
ically degraded Gaussian BC, we prove the converse for the physically degraded Gaussian
BC shown in Figure .. By Fano’s inequality, we have

nR1 ≤ I(M1 ; Y1n |M2 ) + nєn ,


nR2 ≤ I(M2 ; Y2n ) + nєn .
We need to show that there exists an α ∈ [0, 1] such that
I(M1 ; Y1n |M2 ) ≤ n C(αS1 ) = n C(αP/N1 )
and
I(M2 ; Y2n ) ≤ n C(ᾱS2 /(αS2 + 1)) = n C(ᾱP/(αP + N2 )).
Consider
I(M2 ; Y2n ) = h(Y2n ) − h(Y2n |M2 ) ≤ (n/2) log(2πe(P + N2 )) − h(Y2n |M2 ).

Since
(n/2) log(2πeN2 ) = h(Z2n ) = h(Y2n |M2 , X n ) ≤ h(Y2n |M2 ) ≤ h(Y2n ) ≤ (n/2) log(2πe(P + N2 )),

there must exist an α ∈ [0, 1] such that


h(Y2n |M2 ) = (n/2) log(2πe(αP + N2 )). (.)

Next consider

I(M1 ; Y1n |M2 ) = h(Y1n |M2 ) − h(Y1n |M1 , M2 )
= h(Y1n |M2 ) − h(Y1n |M1 , M2 , X n )
= h(Y1n |M2 ) − h(Y1n | X n )
= h(Y1n |M2 ) − (n/2) log(2πeN1 ).

Now using the conditional EPI in Section ., we obtain

h(Y2n |M2 ) = h(Y1n + Z̃2n |M2 )
≥ (n/2) log(2^(2h(Y1n |M2 )/n) + 2^(2h(Z̃2n |M2 )/n))
= (n/2) log(2^(2h(Y1n |M2 )/n) + 2πe(N2 − N1 )).

Combining this inequality with (.) implies that

2πe(αP + N2 ) ≥ 2^(2h(Y1n |M2 )/n) + 2πe(N2 − N1 ).

Thus, h(Y1n |M2 ) ≤ (n/2) log(2πe(αP + N1 )) and hence

I(M1 ; Y1n |M2 ) ≤ (n/2) log(2πe(αP + N1 )) − (n/2) log(2πeN1 ) = n C(αP/N1 ).

This completes the proof of Theorem ..

Remark 5.12. The converse can be proved directly starting with the single-letter charac-
terization R1 ≤ I(X; Y1 |U ), R2 ≤ I(U ; Y2 ) with the power constraint E(X 2 ) ≤ P. The proof
then follows similar steps to the above converse proof and uses the scalar version of the
conditional EPI.
Remark 5.13. The similarity between the above converse proof and the converse proof
for the BS-BC hints at some form of duality between Mrs. Gerber’s lemma for binary
random variables and the entropy power inequality for continuous random variables.
5.6 LESS NOISY AND MORE CAPABLE BROADCAST CHANNELS

Superposition coding is optimal for the following two classes of broadcast channels, which
are more general than the class of degraded broadcast channels.
Less noisy DM-BC. A DM-BC p(y1 , y2 |x) is said to be less noisy if I(U ; Y1 ) ≥ I(U ; Y2 )
for all p(u, x). In this case we say that receiver 1 is less noisy than receiver 2. The private-
message capacity region of the less noisy DM-BC is the set of rate pairs (R1 , R2 ) such that

R1 ≤ I(X; Y1 |U),
R2 ≤ I(U ; Y2 )

for some pmf p(u, x), where |U | ≤ min{|X |, |Y2 |} + 1. Note that the less noisy condition
guarantees that in the superposition coding scheme, receiver 1 can recover the message
intended for receiver 2.
More capable DM-BC. A DM-BC p(y1 , y2 |x) is said to be more capable if I(X; Y1 ) ≥
I(X; Y2 ) for all p(x). In this case we say that receiver 1 is more capable than receiver 2.
The private-message capacity region of the more capable DM-BC is the set of rate pairs
(R1 , R2 ) such that

R1 ≤ I(X; Y1 |U),
R2 ≤ I(U ; Y2 ), (.)
R1 + R2 ≤ I(X; Y1 )

for some pmf p(u, x), where |U | ≤ min{|X |, |Y1 |⋅|Y2 |} + 1.


It can be easily shown that if a DM-BC is degraded then it is less noisy, and that if
a DM-BC is less noisy then it is more capable. The converse to each of these statements
does not hold in general as illustrated in the following example.
Example . (A BSC and a BEC). Consider a DM-BC with X ∈ {0, 1}, Y1 ∈ {0, 1}, and
Y2 ∈ {0, 1, e}, where the channel from X to Y1 is a BSC(p), p ∈ (0, 1/2), and the channel
from X to Y2 is a BEC(є), є ∈ (0, 1). Then it can be shown that the following hold:
1. For 0 < є ≤ 2p, Y1 is a degraded version of Y2 .
2. For 2p < є ≤ 4p(1 − p), Y2 is less noisy than Y1 , but Y1 is not a degraded version of Y2 .
3. For 4p(1 − p) < є ≤ H(p), Y2 is more capable than Y1 , but not less noisy.
4. For H(p) < є < 1, the channel does not belong to any of the three classes.
The capacity region for each of these cases is achieved using superposition coding. The
converse proofs for the first three cases follow by evaluating the capacity expressions for
the degraded BC, less noisy BC, and more capable BC, respectively. The converse proof
for the last case will be given in Chapter .
5.6.1 Proof of the Converse for the More Capable DM-BC


It is difficult to find an identification of the auxiliary random variable that satisfies the
desired properties for the capacity region characterization in (.). Instead, we prove the
converse for the equivalent region consisting of all rate pairs (R1 , R2 ) such that

R2 ≤ I(U ; Y2 ),
R1 + R2 ≤ I(X; Y1 |U) + I(U ; Y2 ),
R1 + R2 ≤ I(X; Y1 )

for some pmf p(u, x).


The converse proof for this alternative region involves a tricky identification of the
auxiliary random variable and the application of the Csiszár sum identity in Section ..
By Fano’s inequality, it is straightforward to show that

nR2 ≤ I(M2 ; Y2n ) + nєn ,


n(R1 + R2 ) ≤ I(M1 ; Y1n |M2 ) + I(M2 ; Y2n ) + nєn ,
n(R1 + R2 ) ≤ I(M1 ; Y1n ) + I(M2 ; Y2n |M1 ) + nєn .

Consider the mutual information terms in the second inequality.


I(M1 ; Y1n |M2 ) + I(M2 ; Y2n ) = ∑ni=1 I(M1 ; Y1i |M2 , Y1i−1 ) + ∑ni=1 I(M2 ; Y2i |Y2,i+1ⁿ )
≤ ∑ni=1 I(M1 , Y2,i+1ⁿ ; Y1i |M2 , Y1i−1 ) + ∑ni=1 I(M2 , Y2,i+1ⁿ ; Y2i )
= ∑ni=1 I(M1 , Y2,i+1ⁿ ; Y1i |M2 , Y1i−1 ) + ∑ni=1 I(M2 , Y2,i+1ⁿ , Y1i−1 ; Y2i )
  − ∑ni=1 I(Y1i−1 ; Y2i |M2 , Y2,i+1ⁿ )
= ∑ni=1 I(M1 ; Y1i |M2 , Y1i−1 , Y2,i+1ⁿ ) + ∑ni=1 I(M2 , Y2,i+1ⁿ , Y1i−1 ; Y2i )
  − ∑ni=1 I(Y1i−1 ; Y2i |M2 , Y2,i+1ⁿ ) + ∑ni=1 I(Y2,i+1ⁿ ; Y1i |M2 , Y1i−1 )
(a)
= ∑ni=1 (I(M1 ; Y1i |Ui ) + I(Ui ; Y2i ))
≤ ∑ni=1 (I(Xi ; Y1i |Ui ) + I(Ui ; Y2i )),

where Y10 , Y2,n+1ⁿ = ∅ and (a) follows by the Csiszár sum identity and the auxiliary random
variable identification Ui = (M2 , Y1i−1 , Y2,i+1ⁿ ).
Next consider the mutual information term in the first inequality


I(M2 ; Y2n ) = ∑ni=1 I(M2 ; Y2i |Y2,i+1ⁿ )
≤ ∑ni=1 I(M2 , Y1i−1 , Y2,i+1ⁿ ; Y2i ) = ∑ni=1 I(Ui ; Y2i ).

For the third inequality, define Vi = (M1 , Y1i−1 , Y2,i+1ⁿ ). Following similar steps to the
bound for the second inequality, we have
I(M1 ; Y1n ) + I(M2 ; Y2n |M1 ) ≤ ∑ni=1 (I(Vi ; Y1i ) + I(Xi ; Y2i |Vi ))
(a)
≤ ∑ni=1 (I(Vi ; Y1i ) + I(Xi ; Y1i |Vi )) = ∑ni=1 I(Xi ; Y1i ),

where (a) follows by the more capable condition, which implies I(X; Y2 |V ) ≤ I(X; Y1 |V )
whenever V → X → (Y1 , Y2 ) form a Markov chain.
The rest of the proof follows by introducing a time-sharing random variable Q ∼
Unif[1 : n] independent of (M1 , M2 , X n , Y1n , Y2n ) and defining U = (Q, UQ ), X = XQ , Y1 =
Y1Q , and Y2 = Y2Q . The bound on the cardinality of U can be proved using the convex
cover method in Appendix C. This completes the proof of the converse.
Remark .. The converse proof for the more capable DM-BC also establishes the con-
verse for the less noisy and degraded DM-BCs.

5.7 EXTENSIONS

We extend the results in the previous sections to the setup with common message and to
channels with more than two receivers.
Capacity region with common and private messages. Our discussion so far has focused
on the private-message capacity region. As mentioned in Remark ., the superposi-
tion coding inner bound is still achievable if we require the “stronger” receiver to also
recover the message intended for the “weaker” receiver. Therefore, if the rate pair (R1 , R2 )
is achievable for private messages, then the rate triple (R0 , R1 , R2 − R0 ) is achievable for
private and common messages.
Using this observation, we can readily show that the capacity region with common
message of the more capable DM-BC is the set of rate triples (R0 , R1 , R2 ) such that

R1 ≤ I(X; Y1 |U),
R0 + R2 ≤ I(U ; Y2 ),
R0 + R1 + R2 ≤ I(X; Y1 )

for some pmf p(u, x). Unlike achievability, the converse is not implied automatically by
the converse for the private-message capacity region. The proof, however, follows similar
steps to that for the private-message capacity region.

k-Receiver degraded DM-BC. Consider a k-receiver degraded DM-BC p(y1 , . . . , yk |x),


where X → Y1 → Y2 → ⋅ ⋅ ⋅ → Yk form a Markov chain. The private-message capacity
region is the set of rate tuples (R1 , . . . , Rk ) such that

R1 ≤ I(X; Y1 |U2 ),
R j ≤ I(U j ; Y j |U j+1 ), j ∈ [2 : k],

for some pmf p(uk , uk−1 )p(uk−2 |uk−1 ) ⋅ ⋅ ⋅ p(x|u2 ) and Uk+1 = ∅. The capacity region for
the 2-receiver Gaussian BC also extends to more than two receivers.
Remark .. The capacity region is not known in general for the less noisy DM-BC with
k > 3 receivers and for the more capable DM-BC with k > 2 receivers.

SUMMARY

∙ Discrete memoryless broadcast channel (DM-BC)


∙ Capacity region depends only on the channel marginal pmfs
∙ Superposition coding
∙ Simultaneous nonunique decoding
∙ Physically and stochastically degraded BCs
∙ Capacity region of degraded BCs is achieved by superposition coding
∙ Identification of the auxiliary random variable in the proof of the converse
∙ Bounding the cardinality of the auxiliary random variable
∙ Proof of the converse for the BS-BC:
∙ Mrs. Gerber’s lemma
∙ Symmetrization argument
∙ Gaussian BC is always degraded
∙ Use of EPI in the proof of the converse for the Gaussian BC
∙ Less noisy and more capable BCs:
∙ Degraded ⇒ less noisy ⇒ more capable
∙ Superposition coding is optimal
∙ Use of Csiszár sum identity in the proof of the converse for more capable BCs
∙ Open problems:
5.1. What is the capacity region of less noisy BCs with four or more receivers?
5.2. What is the capacity region of more capable BCs with three or more receivers?

BIBLIOGRAPHIC NOTES

The broadcast channel was first introduced by Cover (), who demonstrated super-
position coding through the BS-BC and Gaussian BC examples, and conjectured the char-
acterization of the capacity region of the stochastically degraded broadcast channel using
the auxiliary random variable U . Bergmans () proved achievability of the capacity
region of the degraded DM-BC. Subsequently, Gallager () proved the converse by
providing the nonintuitive identification of the auxiliary random variable discussed in
Section .. He also provided a bound on the cardinality of U. Wyner () proved
the converse for the capacity region of the BS-BC using Mrs. Gerber’s lemma. The sym-
metrization argument in the alternative converse proof for the BS-BC is due to Nair (),
who also applied it to binary-input symmetric-output BCs. Bergmans () established
the converse for the capacity region of the Gaussian BC using the entropy power inequal-
ity. A strong converse for the capacity region of the DM-BC was proved by Ahlswede and
Körner () for the maximal probability of error. A technique by Willems () can
be used to extend this strong converse to the average probability of error; see also Csiszár
and Körner (b) for an indirect proof based on the correlation elimination technique
by Ahlswede ().
The classes of less noisy and more capable DM-BCs were introduced by Körner and
Marton (a), who provided operational definitions of these classes and established the
capacity region for the less noisy case. The capacity region for the more capable case was
established by El Gamal (). Nair () established the classification the BSC–BEC
broadcast channel and showed that superposition coding is optimal for all classes. The
capacity region of the -receiver less noisy DM-BC is due to Wang and Nair (), who
showed that superposition coding is optimal for this case as well. Surveys of the literature
on the broadcast channel can be found in van der Meulen (, ) and Cover ().

PROBLEMS

.. Show that the converse for the degraded DM-BC can be proved with the auxiliary
random variable identification Ui = (M2 , Y2i−1 ) or Ui = (M2 , Y1i−1 , Y2i−1 ).
.. Prove the converse for the capacity region of the Gaussian BC by starting from the
single-letter characterization
R1 ≤ I(X; Y1 |U ),
R2 ≤ I(U ; Y2 )
for some cdf F(u, x) such that E(X 2 ) ≤ P.
.. Verify that the characterizations of the capacity region for the BS-BC and the
Gaussian BC in (.) and in Theorem ., respectively, are convex.
.. Show that if a DM-BC is degraded, then it is also less noisy.
.. Show that if a DM-BC is less noisy, then it is also more capable.
.. Given a DM-BC p(y1 , y2 |x), let D(p(x)) = I(X; Y1 ) − I(X; Y2 ). Show that D(p(x))
is concave in p(x) iff Y1 is less noisy than Y2 .
.. Prove the classification of the BSC–BEC broadcast channel in Example ..
.. Show that the two characterizations of the capacity region for the more capable
DM-BC in Section . are equivalent. (Hint: One direction is trivial. For the other
direction, show that the corner points of these regions are the same.)
.. Another simple outer bound. Consider the DM-BC with three messages.
(a) Show that if a rate triple (R0 , R1 , R2 ) is achievable, then it must satisfy the in-
equalities

R0 + R1 ≤ I(X; Y1 ),
R0 + R2 ≤ I(X; Y2 )

for some pmf p(x).


(b) Show that this outer bound on the capacity region can be strictly tighter than
the simple outer bound in Figure .. (Hint: Consider two antisymmetric Z
channels.)
(c) Set R1 = R2 = 0 in the outer bound in part (a) to show that the common-message
capacity is
C0 = max p(x) min{I(X; Y1 ), I(X; Y2 )}.

Argue that C0 is in general strictly smaller than min{C1 , C2 }.


.. Binary erasure broadcast channel. Consider a DM-BC p(y1 , y2 |x) where the chan-
nel from X to Y1 is a BEC(p1 ) and the channel from X to Y2 is a BEC(p2 ) with
p1 ≤ p2 . Find the capacity region in terms of p1 and p2 .
.. Product of two degraded broadcast channels. Consider two degraded DM-BCs
p(y11 |x1 )p(y21 |y11 ) and p(y12 |x2 )p(y22 |y12 ). The product of these two degraded
DM-BCs depicted in Figure . is a DM-BC with X = (X1 , X2 ), Y1 = (Y11 , Y12 ),
Y2 = (Y21 , Y22 ), and p(y1 , y2 |x) = p(y11 |x1 )p(y21 |y11 )p(y12 |x2 )p(y22 |y12 ). Show
that the private-message capacity region of the product DM-BC is the set of rate
pairs (R1 , R2 ) such that

R1 ≤ I(X1 ; Y11 |U1 ) + I(X2 ; Y12 |U2 ),


R2 ≤ I(U1 ; Y21 ) + I(U2 ; Y22 )
Figure .. Product of two degraded broadcast channels.

for some pmf p(u1 , x1 )p(u2 , x2 ). Thus, the capacity region is the Minkowski sum
of the capacity regions of the two component DM-BCs.
Remark: This result is due to Poltyrev ().
.. Product of two Gaussian broadcast channels. Consider two Gaussian BCs

Y1 j = X j + Z1 j ,
Y2 j = Y1 j + Z̃2 j , j = 1, 2,

where Z11 ∼ N(0, N11 ), Z12 ∼ N(0, N12 ), Z̃21 ∼ N(0, Ñ 21 ), and Z̃22 ∼ N(0, Ñ 22 ) are
independent noise components. Assume the average transmission power con-
straint
∑ni=1 (x1i2 (m1 , m2 ) + x2i2 (m1 , m2 )) ≤ nP.

Find the private-message capacity region of the product Gaussian BC with X =


(X1 , X2 ), Y1 = (Y11 , Y12 ), and Y2 = (Y21 , Y22 ), in terms of the noise powers, P, and
power allocation parameters α1 , α2 ∈ [0, 1] for each subchannel and β ∈ [0, 1] be-
tween two channels.
Remark: This result is due to Hughes-Hartogs ().
.. Minimum-energy-per-bit region. Consider the Gaussian BC with noise powers N1
and N2 . Find the minimum-energy-per-bit region, that is, the set of all energy pairs
(E1 , E2 ) = (P/R1 , P/R2 ) such that the rate pair (R1 , R2 ) is achievable with average
code power P.
.. Reversely degraded broadcast channels with common message. Consider two re-
versely degraded DM-BCs p(y11 |x1 )p(y21 |y11 ) and p(y22 |x2 )p(y12 |y22 ). The prod-
uct of these two degraded DM-BCs depicted in Figure . is a DM-BC with X =
(X1 , X2 ), Y1 = (Y11 , Y12 ), Y2 = (Y21 , Y22 ), and p(y1 , y2 |x) = p(y11 |x1 )p(y21 |y11 ) ⋅
p(y22 |x2 )p(y12 |y22 ). A common message M0 ∈ [1 : 2nR0 ] is to be communicated to
both receivers. Show that the common-message capacity is

C0 = max p(x1 )p(x2 ) min{I(X1 ; Y11 ) + I(X2 ; Y12 ), I(X1 ; Y21 ) + I(X2 ; Y22 )}.

Remark: This channel is studied in more detail in Section ..


Figure .. Product of reversely degraded broadcast channels.

.. Duality between Gaussian broadcast and multiple access channels. Consider the
following Gaussian BC and Gaussian MAC:
∙ Gaussian BC: Y1 = g1 X + Z1 and Y2 = g2 X + Z2 , where Z1 ∼ N(0, 1) and Z2 ∼
N(0, 1). Assume average power constraint P on X.
∙ Gaussian MAC: Y = g1 X1 + g2 X2 + Z, where Z ∼ N(0, 1). Assume the average
sum-power constraint
∑ni=1 (x1i2 (m1 ) + x2i2 (m2 )) ≤ nP, (m1 , m2 ) ∈ [1 : 2nR1 ] × [1 : 2nR2 ].

(a) Characterize the (private-message) capacity regions of these two channels in


terms of P, g1 , g2 , and power allocation parameter α ∈ [0, 1].
(b) Show that the two capacity regions are equal.
(c) Show that every point (R1 , R2 ) on the boundary of the capacity region of the
above Gaussian MAC is achievable using random coding and successive can-
cellation decoding. That is, time sharing is not needed in this case.
(d) Argue that the sequence of codes that achieves the rate pairs (R1 , R2 ) on the
boundary of the Gaussian MAC capacity region can be used to achieve the
same point on the capacity region of the above Gaussian BC.
Remark: This result is a special case of a general duality result between the Gauss-
ian vector BC and MAC presented in Chapter .
.. k-receiver Gaussian BC. Consider the k-receiver Gaussian BC
Y1 = X + Z1 ,
Y j = Y j−1 + Z̃ j , j ∈ [2 : k],

where Z1 , Z̃2 , . . . , Z̃k are independent Gaussian noise components with powers
N1 , Ñ 2 , . . . , Ñ k , respectively. Assume average power constraint P on X. Provide a
characterization of the private-message capacity region in terms of the noise pow-
ers, P, and power allocation parameters α1 , . . . , αk ≥ 0 with ∑kj=1 α j = 1.
.. Three-receiver less noisy BC. Consider the -receiver DM-BC p(y1 , y2 , y3 |x) such
that I(U ; Y1 ) ≥ I(U ; Y2 ) ≥ I(U ; Y3 ) for all p(u, x).
(a) Suppose that W → X n → (Y1n , Y2n ) form a Markov chain. Show that
I(Y2i−1 ; Y1i |W) ≤ I(Y1i−1 ; Y1i |W), i ∈ [2 : n].

(b) Suppose that W → X n → (Y2n , Y3n ) form a Markov chain. Show that
I(Y3i−1 ; Y3i |W) ≤ I(Y2i−1 ; Y3i |W), i ∈ [2 : n].

(c) Using superposition coding and parts (a) and (b), show that the capacity region
is the set of rate triples (R1 , R2 , R3 ) such that
R1 ≤ I(X; Y1 |U , V ),
R2 ≤ I(V ; Y2 |U ),
R3 ≤ I(U ; Y3 )
for some pmf p(u, v, x). (Hint: Identify Ui = (M3 , Y2i−1 ) and Vi = M2 .)
Remark: This result is due to Wang and Nair ().
.. MAC with degraded message sets. Consider a DM-MAC p(y|x1 , x2 ) with message
pair (M0 , M1 ) uniformly distributed over [1 : 2nR0 ] × [1 : 2nR1 ]. Sender  encodes
(M0 , M1 ), while sender  encodes only M1 . The receiver wishes to recover both
messages. The probability of error, achievability, and the capacity region are de-
fined as for the DM-MAC with private messages.
(a) Show that the capacity region is the set of rate pairs (R0 , R1 ) such that
R1 ≤ I(X1 ; Y | X2 ),
R0 + R1 ≤ I(X1 , X2 ; Y)
for some pmf p(x1 , x2 ).
(b) Characterize the capacity region of the Gaussian MAC with noise power 1,
channel gains g1 and g2 , and average power constraint P on each of X1 and X2 .
.. MAC with common message. Consider a DM-MAC p(y|x1 , x2 ) with message triple
(M0 , M1 , M2 ) uniformly distributed over [1 : 2nR0 ] × [1 : 2nR1 ] × [1 : 2nR2 ]. Sender 
encodes (M0 , M1 ) and sender  encodes (M0 , M2 ). Thus, the common message M0
is available to both senders, while the private messages M1 and M2 are available
only to the respective senders. The receiver wishes to recover all three messages.
The probability of error, achievability, and the capacity region are defined as for
the DM-MAC with private messages.
(a) Show that the capacity region is the set of rate triples (R0 , R1 , R2 ) such that
R1 ≤ I(X1 ; Y |U , X2 ),
R2 ≤ I(X2 ; Y |U , X1 ),
R1 + R2 ≤ I(X1 , X2 ; Y |U ),
R0 + R1 + R2 ≤ I(X1 , X2 ; Y )
for some pmf p(u)p(x1 |u)p(x2 |u).

(b) Show that the capacity region of the Gaussian MAC with average power con-
straint P on each of X1 and X2 is the set of rate triples (R0 , R1 , R2 ) such that

R1 ≤ C(α1 S1 ),
R2 ≤ C(α2 S2 ),
R1 + R2 ≤ C(α1 S1 + α2 S2 ),
R0 + R1 + R2 ≤ C(S1 + S2 + 2√(ᾱ1 ᾱ2 S1 S2 ))

for some α1 , α2 ∈ [0, 1].


Remark: The capacity region of the DM-MAC with common message is due to
Slepian and Wolf (b). The converse for the Gaussian case is due to Bross,
Lapidoth, and Wigger ().
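As a quick numerical aid (our own sketch), the four constraints in part (b) can be evaluated for given received SNRs and power splits as follows.

```python
import numpy as np

def C(x):
    return 0.5 * np.log2(1 + x)

def common_message_mac_bounds(S1, S2, a1, a2):
    """The four rate constraints in part (b) for power splits a1, a2 in [0, 1]."""
    return {
        'R1':       C(a1 * S1),
        'R2':       C(a2 * S2),
        'R1+R2':    C(a1 * S1 + a2 * S2),
        'R0+R1+R2': C(S1 + S2 + 2 * np.sqrt((1 - a1) * (1 - a2) * S1 * S2)),
    }

print(common_message_mac_bounds(S1=10, S2=5, a1=0.6, a2=0.3))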
.. MAC with a helper. Consider the -sender DM-MAC p(y|x1 , x2 , x3 ) with mes-
sage pair (M1 , M2 ) uniformly distributed over [1 : 2nR1 ] × [1 : 2nR2 ] as depicted in
Figure .. Sender  encodes M1 , sender  encodes M2 , and sender  encodes
(M1 , M2 ). The receiver wishes to recover both messages. Show that the capacity
region is the set of rate pairs (R1 , R2 ) such that

R1 ≤ I(X1 , X3 ; Y | X2 , Q),
R2 ≤ I(X2 , X3 ; Y | X1 , Q),
R1 + R2 ≤ I(X1 , X2 , X3 ; Y |Q)

for some pmf p(q)p(x1 |q)p(x2 |q)p(x3 |x1 , x2 , q) with |Q| ≤ 2.


Remark: This is a special case of the capacity region of the general l-message k-
sender DM-MAC established by Han ().

Figure .. Three-sender DM-MAC with two messages.


CHAPTER 6

Interference Channels

We introduce the interference channel as a model for single-hop multiple one-to-one


communications, such as pairs of base stations–handsets communicating over a frequency
band that suffers from intercell interference, pairs of DSL modems communicating over
a bundle of telephone lines that suffers from crosstalk, or pairs of people talking to each
other in a cocktail party. The capacity region of the interference channel is not known
in general. In this chapter, we focus on coding schemes for the two sender–receiver pair
interference channel that are optimal or close to optimal in some special cases.
We first study simple coding schemes that use point-to-point channel codes, namely
time division, treating interference as noise, and simultaneous decoding. We show that
simultaneous decoding is optimal under strong interference, that is, when the interfer-
ing signal at each receiver is stronger than the signal from its respective sender. These
inner bounds are compared for the Gaussian interference channel. We extend the strong
interference result to the Gaussian case and show that treating interference as noise is
sum-rate optimal when the interference is sufficiently weak. The converse proof of the
latter result uses the new idea of a genie that provides side information to each receiver
about its intended codeword.
We then present the Han–Kobayashi coding scheme, which generalizes the aforemen-
tioned simple schemes by also using rate splitting (see Section .) and superposition cod-
ing (see Section .). We show that the Han–Kobayashi scheme is optimal for the class of
injective deterministic interference channels. The converse proof of this result is extended
to establish an outer bound on the capacity region of the class of injective semidetermin-
istic interference channels, which includes the Gaussian interference channel. The outer
bound for the Gaussian case, and hence the capacity region, is shown to be within half a bit
per dimension of the Han–Kobayashi inner bound. This gap vanishes in the limit of high
signal and interference to noise ratios for the normalized symmetric capacity (degrees
of freedom). We discuss an interesting correspondence to q-ary expansion deterministic
(QED) interference channels in this limit.
Finally, we introduce the new idea of interference alignment through a QED interfer-
ence channel with many sender–receiver pairs. Interference alignment for wireless fading
channels will be illustrated in Section ..

6.1 DISCRETE MEMORYLESS INTERFERENCE CHANNEL

Consider the two sender–receiver pair communication system depicted in Figure .,
where each sender wishes to communicate a message to its respective receiver over a
shared interference channel. Each message M j , j = 1, 2, is separately encoded into a code-
word X nj and transmitted over the channel. Upon receiving the sequence Y jn , receiver
j = 1, 2 finds an estimate M ̂ j of message M j . Because communication takes place over
a shared channel, the signal at each receiver can suffer not only from the noise in the
channel, but also from interference by the other transmitted codeword. This leads to a
tradeoff between the rates at which both messages can be reliably communicated. We
seek to determine the limits on this tradeoff.
We first consider a two sender–receiver (-user) pair discrete memoryless interference
channel (DM-IC) model (X1 × X2 , p(y1 , y2 |x1 , x2 ), Y1 × Y2 ) that consists of four finite
sets X1 , X2 , Y1 , Y2 , and a collection of conditional pmfs p(y1 , y2 |x1 , x2 ) on Y1 × Y2 . A
(2nR1 , 2nR2 , n) code for the interference channel consists of
∙ two message sets [1 : 2nR1 ] and [1 : 2nR2 ],
∙ two encoders, where encoder  assigns a codeword x1n (m1 ) to each message m1 ∈ [1 :
2nR1 ] and encoder  assigns a codeword x2n (m2 ) to each message m2 ∈ [1 : 2nR2 ], and
∙ two decoders, where decoder  assigns an estimate m̂ 1 or an error message e to each
received sequence y1n and decoder  assigns an estimate m̂ 2 or an error message e to
each received sequence y2n .
We assume that the message pair (M1 , M2 ) is uniformly distributed over [1 : 2nR1 ] × [1 :
2nR2 ]. The average probability of error is defined as
Pe(n) = P{(M̂ 1 , M̂ 2 ) ≠ (M1 , M2 )}.
A rate pair (R1 , R2 ) is said to be achievable for the DM-IC if there exists a sequence of
(2nR1 , 2nR2 , n) codes such that limn→∞ Pe(n) = 0. The capacity region C of the DM-IC is
the closure of the set of achievable rate pairs (R1 , R2 ) and the sum-capacity Csum of the
DM-IC is defined as Csum = max{R1 + R2 : (R1 , R2 ) ∈ C }.
As for the broadcast channel, the capacity region of the DM-IC depends on the chan-
nel conditional pmf p(y1 , y2 |x1 , x2 ) only through the conditional marginals p(y1 |x1 , x2 )
and p(y2 |x1 , x2 ). The capacity region of the DM-IC is not known in general.

Figure .. Two sender–receiver pair communication system.



6.2 SIMPLE CODING SCHEMES

We first consider several simple coding schemes for the interference channel.
Time division. The maximum achievable individual rates for the two sender–receiver
pairs are
C1 = max_{p(x1 ), x2 } I(X1 ; Y1 | X2 = x2 ),
C2 = max_{p(x2 ), x1 } I(X2 ; Y2 | X1 = x1 ).

These capacities define the time-division inner bound consisting of all rate pairs (R1 , R2 )
such that
R1 < αC1 ,
R2 < ᾱC2    (.)

for some α ∈ [0, 1]. This bound is tight in some special cases.
Example . (Modulo- sum IC). Consider a DM-IC where the channel inputs X1 , X2
and outputs Y1 , Y2 are binary, and Y1 = Y2 = X1 ⊕ X2 . The time-division inner bound
reduces to the set of rate pairs (R1 , R2 ) such that R1 + R2 < 1. In the other direction, by
allowing cooperation between the receivers, we obtain the upper bound on the sum-rate

R1 + R2 ≤ C12 = max_{p(x1 )p(x2 )} I(X1 , X2 ; Y1 , Y2 ).

Since in our example, Y1 = Y2 , this bound reduces to the set of rate pairs (R1 , R2 ) such that
R1 + R2 ≤ 1. Hence the time-division inner bound is tight.

The time-division inner bound is not tight in general, however.


Example . (No interference). Consider an interference channel with orthogonal com-
ponents p(y1 , y2 |x1 , x2 ) = p(y1 |x1 )p(y2 |x2 ). In this case, the channel can be viewed sim-
ply as two separate DMCs and the capacity region is the set of rate pairs (R1 , R2 ) such that
R1 ≤ C1 and R2 ≤ C2 . This is clearly larger than the time-division inner bound.

Treating interference as noise. Another inner bound on the capacity region of the inter-
ference channel can be achieved using point-to-point codes, time sharing, and treating
interference as noise. This yields the interference-as-noise inner bound consisting of all
rate pairs (R1 , R2 ) such that

R1 < I(X1 ; Y1 |Q),
R2 < I(X2 ; Y2 |Q)    (.)

for some pmf p(q)p(x1 |q)p(x2 |q).


Simultaneous decoding. At the opposite extreme of treating interference as noise, we
can have each receiver recover both messages. Following the achievability proof for the

DM-MAC using simultaneous decoding and coded time sharing in Section .. (also see
Problem .), we can easily show that this scheme yields the simultaneous-decoding inner
bound on the capacity region of the DM-IC consisting of all rate pairs (R1 , R2 ) such that
R1 < min{I(X1 ; Y1 | X2 , Q), I(X1 ; Y2 | X2 , Q)},
R2 < min{I(X2 ; Y1 | X1 , Q), I(X2 ; Y2 | X1 , Q)}, (.)
R1 + R2 < min{I(X1 , X2 ; Y1 |Q), I(X1 , X2 ; Y2 |Q)}
for some pmf p(q)p(x1 |q)p(x2 |q).
Remark .. Let R(X1 , X2 ) be the set of rate pairs (R1 , R2 ) such that
R1 < min{I(X1 ; Y1 | X2 ), I(X1 ; Y2 | X2 )},
R2 < min{I(X2 ; Y2 | X1 ), I(X2 ; Y1 | X1 )},
R1 + R2 < min{I(X1 , X2 ; Y1 ), I(X1 , X2 ; Y2 )}
for some pmf p(x1 )p(x2 ). Unlike the DM-MAC, the inner bound in . can be strictly
larger than the convex closure of R(X1 , X2 ) over all p(x1 )p(x2 ). Hence, coded time shar-
ing can achieve higher rates than (uncoded) time sharing, and is needed to achieve the
inner bound in (.).

The simultaneous-decoding inner bound is sometimes tight.


Example .. Consider a DM-IC with output alphabets Y1 = Y2 and pY1 |X1 ,X2 (y|x1 , x2 ) =
pY2 |X1 ,X2 (y|x1 , x2 ). The simultaneous-decoding inner bound reduces to the set of rate pairs
(R1 , R2 ) such that
R1 < I(X1 ; Y1 | X2 , Q),
R2 < I(X2 ; Y2 | X1 , Q),
R1 + R2 < I(X1 , X2 ; Y1 |Q)
for some pmf p(q)p(x1 |q)p(x2 |q). Now, using standard converse proof techniques, we can
establish the outer bound on the capacity region of the general DM-IC consisting of all
rate pairs (R1 , R2 ) such that
R1 ≤ I(X1 ; Y1 | X2 , Q),
R2 ≤ I(X2 ; Y2 | X1 , Q),
R1 + R2 ≤ I(X1 , X2 ; Y1 , Y2 |Q)
for some pmf p(q)p(x1 |q)p(x2 |q). This bound can be further improved by using the fact
that the capacity region depends only on the marginals of p(y1 , y2 |x1 , x2 ). If a rate pair
(R1 , R2 ) is achievable, then it must satisfy the inequalities
R1 ≤ I(X1 ; Y1 | X2 , Q),
R2 ≤ I(X2 ; Y2 | X1 , Q), (.)
R1 + R2 ≤ min_{p̃(y1 , y2 |x1 , x2 )} I(X1 , X2 ; Y1 , Y2 |Q)

for some pmf p(q)p(x1 |q)p(x2 |q), where the minimum in the third inequality is over all

conditional pmfs p̃(y1 , y2 |x1 , x2 ) with the same marginals p(y1 |x1 , x2 ) and p(y2 |x1 , x2 ) as
the given channel conditional pmf p(y1 , y2 |x1 , x2 ).
Now, since the marginals of the channel in our example are identical, the minimum
in the third inequality of the outer bound in (.) is attained for Y1 = Y2 , and the bound
reduces to the set of all rate pairs (R1 , R2 ) such that

R1 ≤ I(X1 ; Y1 | X2 , Q),
R2 ≤ I(X2 ; Y2 | X1 , Q),
R1 + R2 ≤ I(X1 , X2 ; Y1 |Q)

for some pmf p(q)p(x1 |q)p(x2 |q). Hence, simultaneous decoding is optimal for the DM-
IC in this example.

Simultaneous nonunique decoding. We can improve upon the simultaneous-decoding


inner bound via nonunique decoding, that is, by not requiring each receiver to recover the
message intended for the other receiver. This yields the simultaneous-nonunique-decoding
inner bound consisting of all rate pairs (R1 , R2 ) such that

R1 < I(X1 ; Y1 | X2 , Q),


R2 < I(X2 ; Y2 | X1 , Q), (.)
R1 + R2 < min{I(X1 , X2 ; Y1 |Q), I(X1 , X2 ; Y2 |Q)}

for some pmf p(q)p(x1 |q)p(x2 |q).


The achievability proof of this inner bound uses techniques we have already encoun-
tered in Sections . and .. Fix a pmf p(q)p(x1 |q)p(x2 |q). Randomly generate a sequence
qn ∼ ∏ni=1 pQ (qi ). Randomly and conditionally independently generate 2nR1 sequences
x1n (m1 ), m1 ∈ [1 : 2nR1 ], each according to ∏ni=1 p X1 |Q (x1i |qi ), and 2nR2 sequences x2n (m2 ),
m2 ∈ [1 : 2nR2 ], each according to ∏ni=1 p X2 |Q (x2i |qi ). To send (m1 , m2 ), encoder j = 1, 2
transmits x nj (m j ).
Decoder  finds the unique message m ̂ 1 such that (q n , x1n (m ̂ 1 ), x2n (m2 ), y1n ) ∈ Tє(n) for
some m2 . By the LLN and the packing lemma, the probability of error for decoder 
tends to zero as n → ∞ if R1 < I(X1 ; Y1 , X2 |Q) − δ(є) = I(X1 ; Y1 |X2 , Q) − δ(є) and R1 +
R2 < I(X1 , X2 ; Y1 |Q) − δ(є). Similarly, decoder  finds the unique message m ̂ 2 such that
(q n , x1n (m1 ), x2n (m̂ 2 ), y2n ) ∈ Tє(n) for some m1 . Again by the LLN and the packing lemma,

the probability of error for decoder  tends to zero as n → ∞ if R2 < I(X2 ; Y2 |X1 , Q) −
δ(є) and R1 + R2 < I(X1 , X2 ; Y2 |Q) − δ(є). This completes the achievability proof of the
simultaneous-nonunique-decoding inner bound.

6.3 STRONG INTERFERENCE

Suppose that each receiver in an interference channel is physically closer to the interfering
transmitter than to its own transmitter and hence the received signal from the interfer-
ing transmitter is stronger than that from its transmitter. Under such strong interference

condition, each receiver can essentially recover the message of the interfering transmitter
without imposing an additional constraint on its rate. We define two notions of strong
interference for the DM-IC and show that simultaneous decoding is optimal under both
notions.
Very strong interference. A DM-IC is said to have very strong interference if

I(X1 ; Y1 | X2 ) ≤ I(X1 ; Y2 ),
(.)
I(X2 ; Y2 | X1 ) ≤ I(X2 ; Y1 )
for all p(x1 )p(x2 ). The capacity region of the DM-IC with very strong interference is the
set of rate pairs (R1 , R2 ) such that

R1 ≤ I(X1 ; Y1 | X2 , Q),
R2 ≤ I(X2 ; Y2 | X1 , Q)
for some pmf p(q)p(x1 |q)p(x2 |q). The converse proof is quite straightforward, since this
region constitutes an outer bound on the capacity region of the general DM-IC. The proof
of achievability follows by noting that under the very strong interference condition, the
sum-rate inequality in the simultaneous (unique or nonunique) decoding inner bound is
inactive. Note that the capacity region can be achieved also via successive cancellation de-
coding and time sharing. Each decoder successively recovers the other message and then
its own message. Because of the very strong interference condition, only the requirements
on the achievable rates for the second decoding step matter.
Strong interference. A DM-IC is said to have strong interference if

I(X1 ; Y1 | X2 ) ≤ I(X1 ; Y2 | X2 ),
(.)
I(X2 ; Y2 | X1 ) ≤ I(X2 ; Y1 | X1 )
for all p(x1 )p(x2 ). Note that this is an extension of the more capable condition for the
DM-BC. In particular, Y2 is more capable than Y1 given X2 , and Y1 is more capable than
Y2 given X1 . Clearly, if the channel has very strong interference, then it also has strong
interference. The converse is not necessarily true as illustrated by the following.
Example .. Consider the DM-IC with binary inputs X1 , X2 and ternary outputs Y1 =
Y2 = X1 + X2 . Then

I(X1 ; Y1 | X2 ) = I(X1 ; Y2 | X2 ) = H(X1 ),


I(X2 ; Y2 | X1 ) = I(X2 ; Y1 | X1 ) = H(X2 ).
Therefore, this DM-IC has strong interference. However,

I(X1 ; Y1 | X2 ) = H(X1 ) ≥ H(X1 ) − H(X1 |Y2 ) = I(X1 ; Y2 ),


I(X2 ; Y2 | X1 ) = H(X2 ) ≥ H(X2 ) − H(X2 |Y1 ) = I(X2 ; Y1 )
with strict inequality for some pmf p(x1 )p(x2 ). Therefore, this channel does not satisfy
the very strong interference condition.

We now show that the simultaneous-nonunique-decoding inner bound is tight under


the strong interference condition.

Theorem .. The capacity region of the DM-IC p(y1 , y2 |x1 , x2 ) with strong interfer-
ence is the set of rate pairs (R1 , R2 ) such that

R1 ≤ I(X1 ; Y1 | X2 , Q),
R2 ≤ I(X2 ; Y2 | X1 , Q),
R1 + R2 ≤ min{I(X1 , X2 ; Y1 |Q), I(X1 , X2 ; Y2 |Q)}

for some pmf p(q)p(x1 |q)p(x2 |q) with |Q| ≤ 4.

Proof of the converse. The first two inequalities can be easily established. By symmetry
it suffices to show that R1 + R2 ≤ I(X1 , X2 ; Y2 |Q). Consider

n(R1 + R2 ) = H(M1 ) + H(M2 )


(a)
≤ I(M1 ; Y1n ) + I(M2 ; Y2n ) + nєn
(b)
= I(X1n ; Y1n ) + I(X2n ; Y2n ) + nєn
≤ I(X1n ; Y1n | X2n ) + I(X2n ; Y2n ) + nєn
(c)
≤ I(X1n ; Y2n | X2n ) + I(X2n ; Y2n ) + nєn
= I(X1n , X2n ; Y2n ) + nєn
n
≤ 󵠈 I(X1i , X2i ; Y2i ) + nєn
i=1
= nI(X1 , X2 ; Y2 |Q) + nєn ,

where (a) follows by Fano’s inequality and (b) follows since M j → X nj → Y jn for j = 1, 2
(by the independence of M1 and M2 ). Step (c) is established using the following.

Lemma .. For a DM-IC p(y1 , y2 |x1 , x2 ) with strong interference, I(X1n ; Y1n |X2n ) ≤
I(X1n ; Y2n |X2n ) for all (X1n , X2n ) ∼ p(x1n )p(x2n ) and all n ≥ 1.

This lemma can be proved by noting that the strong interference condition implies that
I(X1 ; Y1 |X2 , U) ≤ I(X1 ; Y2 |X2 , U) for all p(u)p(x1 |u)p(x2 |u) and using induction on n.
The other bound R1 + R2 ≤ I(X1 , X2 ; Y1 |Q) follows similarly, which completes the
proof of the theorem.

6.4 GAUSSIAN INTERFERENCE CHANNEL

Consider the -user-pair Gaussian interference channel depicted in Figure ., which is

a simple model for a wireless interference channel or a DSL cable bundle. The channel
outputs corresponding to the inputs X1 and X2 are
Y1 = д11 X1 + д12 X2 + Z1 ,
Y2 = д21 X1 + д22 X2 + Z2 ,
where д jk , j, k = 1, 2, is the channel gain from sender k to receiver j, and Z1 ∼ N(0, N0 /2)
and Z2 ∼ N(0, N0 /2) are noise components. Assume average power constraint P on each
of X1 and X2 . We assume without loss of generality that N0 /2 = 1 and define the received
SNRs as S1 = д11² P and S2 = д22² P and the received interference-to-noise ratios (INRs) as
I1 = д12² P and I2 = д21² P.

Figure .. Gaussian interference channel.

The capacity region of the Gaussian IC is not known in general.

6.4.1 Inner Bounds


We specialize the inner bounds in Section . to the Gaussian case.
Time division with power control. Using time division and power control, we obtain the
time-division inner bound on the capacity region of the Gaussian IC that consists of all
rate pairs (R1 , R2 ) such that
R1 < α C(S1 /α),
R2 < ᾱ C(S2 /ᾱ)
for some α ∈ [0, 1].
Treating interference as noise. Consider the inner bound in (.) subject to the power
constraints. By setting X1 ∼ N(0, P) , X2 ∼ N(0, P), and Q = , we obtain the inner bound
on the capacity region of the Gaussian IC consisting of all rate pairs (R1 , R2 ) such that
R1 < C󶀡S1 /(1 + I1 )󶀱,
R2 < C󶀡S2 /(1 + I2 )󶀱.

Note, however, that Gaussian input signals are not necessarily optimal when evaluating
the mutual information characterization in (.) under the power constraints. Also note
that the above inner bound can be further improved via time sharing and power control.
Simultaneous nonunique decoding. The inner bound in (.) subject to the power con-
straints is optimized by setting X1 ∼ N(0, P) , X2 ∼ N(0, P), and Q = . This gives the in-
ner bound on the capacity region of the Gaussian IC that consists of all rate pairs (R1 , R2 )
such that

R1 < C(S1 ),
R2 < C(S2 ),
R1 + R2 < min{C(S1 + I1 ), C(S2 + I2 )}.

Although this bound is again achieved using optimal point-to-point Gaussian codes, it
cannot be achieved in general via successive cancellation decoding.
The above inner bounds are compared in Figure . for symmetric Gaussian ICs with
SNRs S1 = S2 = S = 1 and increasing INRs I1 = I2 = I. When interference is weak (Fig-
ure .a), treating interference as noise can outperform time division and simultaneous
nonunique decoding, and is in fact sum-rate optimal as we show in Section ... As in-
terference becomes stronger (Figure .b), simultaneous nonunique decoding and time
division begin to outperform treating interference as noise. As interference becomes even
stronger, simultaneous nonunique decoding outperforms the other two coding schemes
(Figures .c,d), ultimately achieving the interference-free rate region consisting of all
rate pairs (R1 , R2 ) such that R1 < C1 and R2 < C2 (Figure .d).
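The qualitative comparison in the figure can be reproduced with a short numerical sketch. The Python code below is ours (the helper names and parameter values are illustrative only); it evaluates the three inner bounds above for the symmetric case S1 = S2 = S and I1 = I2 = I.

```python
import numpy as np

def C(x):
    """Gaussian capacity function C(x) = (1/2) log2(1 + x) bits."""
    return 0.5 * np.log2(1 + x)

def td_point(S, alpha):
    """Time division with power control: (alpha C(S/alpha), (1-alpha) C(S/(1-alpha)))."""
    return alpha * C(S / alpha), (1 - alpha) * C(S / (1 - alpha))

def ian_corner(S, I):
    """Treating interference as noise with Gaussian inputs and no time sharing."""
    return C(S / (1 + I)), C(S / (1 + I))

def snd_limits(S, I):
    """Simultaneous nonunique decoding: R1, R2 < C(S) and R1 + R2 < C(S + I)."""
    return {'individual': C(S), 'sum': C(S + I)}

S = 1.0
for I in [0.1, 0.5, 1.1, 5.5]:   # the four interference levels compared in the figure
    print(I, ian_corner(S, I), snd_limits(S, I), td_point(S, 0.5))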

6.4.2 Capacity Region of the Gaussian IC with Strong Interference


A Gaussian IC is said to have strong interference if |д21 | ≥ |д11 | and |д12 | ≥ |д22 |, or equiv-
alently, I2 ≥ S1 and I1 ≥ S2 .

Theorem .. The capacity region of the Gaussian IC with strong interference is the
set of rate pairs (R1 , R2 ) such that

R1 ≤ C(S1 ),
R2 ≤ C(S2 ),
R1 + R2 ≤ min{C(S1 + I1 ), C(S2 + I2 )}.

The proof of achievability follows by using simultaneous nonunique decoding. The


proof of the converse follows by noting that the above condition is equivalent to the strong
interference condition for the DM-IC in (.) and showing that X1 ∼ N(0, P) and X2 ∼
N(0, P) optimize the mutual information terms.
The nontrivial step is to show that the condition I2 ≥ S1 and I1 ≥ S2 is equivalent
to the condition I(X1 ; Y1 |X2 ) ≤ I(X1 ; Y2 |X2 ) and I(X2 ; Y2 |X1 ) ≤ I(X2 ; Y1 |X1 ) for every

(Panels: (a) I = 0.1, (b) I = 0.5, (c) I = 1.1, (d) I = 5.5.)
Figure .. Comparison of time division (region RTD ), treating interference as noise (region RIAN ), and simultaneous nonunique decoding (region RSND ) for S =  and different values of I. Treating interference as noise achieves the sum-capacity for case (a), while RSND is the capacity region for cases (c) and (d).

F(x1 )F(x2 ). If I2 ≥ S1 and I1 ≥ S2 , then it can be easily shown that the Gaussian BC from
X1 to (Y2 − д22 X2 , Y1 − д12 X2 ) given X2 is degraded and the Gaussian BC from X2 to (Y1 −
д11 X1 , Y2 − д21 X1 ) given X1 is degraded, and hence each is more capable. This proves one
direction of the equivalence. To prove the other direction, assume that h(д11 X1 + Z1 ) ≤
h(д21 X1 + Z2 ) and h(д22 X2 + Z2 ) ≤ h(д12 X2 + Z1 ). Substituting X1 ∼ N(0, P) and X2 ∼
N(0, P) shows that I2 ≥ S1 and I1 ≥ S2 , respectively.

Remark .. A Gaussian IC is said to have very strong interference if S2 ≤ I1 /(1 + S1 ) and
S1 ≤ I2 /(1 + S2 ). It can be shown that this condition is the same as the very strong in-
terference condition for the DM-IC in (.) when restricted to Gaussian inputs. Under
this condition, the capacity region is the set of rate pairs (R1 , R2 ) such that R1 ≤ C(S1 ) and
R2 ≤ C(S2 ) and hence interference does not impair communication.
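The interference regimes discussed above and in the next subsection can be collected into a small classification sketch. The code below is ours; the weak-interference test is the symmetric condition given in Section 6.4.3, and "intermediate" simply means that no exact characterization is known.

```python
import numpy as np

def C(x):
    return 0.5 * np.log2(1 + x)

def classify(S, I):
    """What is known for a symmetric Gaussian IC with S1 = S2 = S and I1 = I2 = I."""
    if I >= S * (1 + S):
        # very strong interference: interference does not impair communication
        return 'very strong', {'R1_max': C(S), 'R2_max': C(S)}
    if I >= S:
        # strong interference: simultaneous nonunique decoding is optimal
        return 'strong', {'R_max': C(S), 'Rsum_max': C(S + I)}
    if np.sqrt(I / S) * (1 + I) <= 0.5:
        # weak interference: treating interference as noise achieves the sum-capacity
        return 'weak', {'Csum': 2 * C(S / (1 + I))}
    return 'intermediate', {}

for I in [0.05, 0.5, 1.5, 5.0]:
    print(I, classify(S=1.0, I=I))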

6.4.3 Sum-Capacity of the Gaussian IC with Weak Interference


A Gaussian IC is said to have weak interference if for some ρ1 , ρ2 ∈ [0, 1],

√(I1 /S2 ) (1 + I2 ) ≤ ρ2 √(1 − ρ1²),
√(I2 /S1 ) (1 + I1 ) ≤ ρ1 √(1 − ρ2²).    (.)

Under this weak interference condition, treating interference as noise is optimal for the
sum-rate.

Theorem .. The sum-capacity of the Gaussian IC with weak interference is

Csum = C(S1 /(1 + I1 )) + C(S2 /(1 + I2 )).

The interesting part of the proof is the converse. It involves the use of a genie to es-
tablish an upper bound on the sum-capacity. For simplicity of presentation, we consider
the symmetric case with I1 = I2 = I and S1 = S2 = S. In this case, the weak interference
condition in (.) reduces to
√(I/S) (1 + I) ≤ 1/2    (.)
and the sum-capacity is Csum = 2 C(S/(1 + I)).
Proof of the converse. Consider the genie-aided Gaussian IC depicted in Figure . with
side information

T1 = √(I/P) (X1 + ηW1 ),
T2 = √(I/P) (X2 + ηW2 ),

where W1 ∼ N(0, 1) and W2 ∼ N(0, 1) are independent noise components with E(Z1W1 ) =
E(Z2W2 ) = ρ and η ≥ 0. Suppose that a genie reveals T1 to decoder  and T2 to decoder .
Clearly, the sum-capacity of this channel C̃sum ≥ Csum .
We first show that if η²I ≤ (1 − ρ²)P (useful genie), then the sum-capacity of the
genie-aided channel is achieved by using Gaussian inputs and treating interference as
noise. We then show that if in addition, ηρ√(S/P) = 1 + I (smart genie), then the sum-
capacity of the genie-aided channel is the same as that of the original channel. Since
C̃sum ≥ Csum , this also shows that Csum is achieved by using Gaussian inputs and treating
interference as noise. Using the second condition to eliminate η from the first condi-
tion gives √(I/S) (1 + I) ≤ ρ√(1 − ρ²). Taking ρ = √(1/2), which maximizes the range of I,
gives the weak interference condition in the theorem. The proof steps involve properties
of differential entropy, including the maximum differential entropy lemma; the fact that
Gaussian is the worst noise with a given average power in an additive noise channel with
Gaussian input (see Problem .); and properties of jointly Gaussian random variables
(see Appendix B).

Figure .. Genie-aided Gaussian interference channel.

Let X1∗ and X2∗ be independent N(0, P) random variables, and Y1∗ , Y2∗ and T1∗ , T2∗
be the corresponding channel outputs and side information. Then we can establish the
following condition under which C̃sum is achieved by treating interference as Gaussian
noise.

Lemma . (Useful Genie). If η 2 I ≤ (1 − ρ2 )P, then the sum-capacity of the above
genie-aided channel is

C̃sum = I(X1∗ ; Y1∗ , T1∗ ) + I(X2∗ ; Y2∗ , T2∗ ).

The proof of this lemma is in Appendix A.


Remark .. If ρ = 0, η = 1, and I ≤ P, then the genie is always useful.

Continuing the proof of the converse, suppose that the following smart genie condition

ηρ√(S/P) = 1 + I

holds. Note that combined with the (useful genie) condition for the lemma, the smart
genie gives the weak interference condition in (.). Now by the smart genie condition,

E(T1∗ | X1∗ , Y1∗ ) = E(T1∗ | X1∗ , √(I/P) X2∗ + Z1 )
= √(I/P) X1∗ + η √(I/P) E(W1 | √(I/P) X2∗ + Z1 )
= √(I/P) X1∗ + (ηρ√(I/P)/(1 + I)) (√(I/P) X2∗ + Z1 )
= √(I/S) Y1∗
= E(T1∗ |Y1∗ ).

Since all random variables involved are jointly Gaussian, this implies that X1∗ → Y1∗ → T1∗
form a Markov chain, or equivalently, I(X1∗ ; T1∗ |Y1∗ ) = 0. Similarly I(X2∗ ; T2∗ |Y2∗ ) = 0.
Finally, by the useful genie lemma,
Csum ≤ C̃sum = I(X1∗ ; Y1∗ , T1∗ ) + I(X2∗ ; Y2∗ , T2∗ ) = I(X1∗ ; Y1∗ ) + I(X2∗ ; Y2∗ ).
This completes the proof of the converse.
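The smart genie step lends itself to a quick numerical check. The sketch below is ours (the parameter values are chosen to satisfy both genie conditions); it builds the joint Gaussian distribution of (X1∗, Y1∗, T1∗) and verifies that I(X1∗; T1∗ | Y1∗) ≈ 0, i.e., that X1∗ → Y1∗ → T1∗ form a Markov chain.

```python
import numpy as np

P, S, I = 1.0, 1.0, 0.15                  # satisfy sqrt(I/S)(1 + I) <= 1/2
rho = np.sqrt(0.5)
eta = (1 + I) / (rho * np.sqrt(S / P))    # smart genie: eta*rho*sqrt(S/P) = 1 + I
assert eta**2 * I <= (1 - rho**2) * P     # useful genie condition

# Joint covariance of (X1, X2, Z1, W1); E[Z1 W1] = rho.
cov = np.diag([P, P, 1.0, 1.0])
cov[2, 3] = cov[3, 2] = rho
# Linear map to (X1, Y1, T1): Y1 = sqrt(S/P) X1 + sqrt(I/P) X2 + Z1, T1 = sqrt(I/P)(X1 + eta W1).
A = np.array([[1, 0, 0, 0],
              [np.sqrt(S / P), np.sqrt(I / P), 1, 0],
              [np.sqrt(I / P), 0, 0, eta * np.sqrt(I / P)]])
K = A @ cov @ A.T                          # covariance of (X1, Y1, T1)

def h(Sigma):
    """Differential entropy of a Gaussian vector with covariance Sigma (nats)."""
    Sigma = np.atleast_2d(Sigma)
    return 0.5 * np.log((2 * np.pi * np.e) ** len(Sigma) * np.linalg.det(Sigma))

def h_given(idx, given):
    """Conditional differential entropy h(idx | given) for indices into K."""
    both = idx + given
    return h(K[np.ix_(both, both)]) - h(K[np.ix_(given, given)])

# I(X1; T1 | Y1) = h(X1|Y1) + h(T1|Y1) - h(X1, T1 | Y1); indices: 0 = X1, 1 = Y1, 2 = T1.
print(h_given([0], [1]) + h_given([2], [1]) - h_given([0, 2], [1]))   # ~ 0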
Remark .. The idea of a genie providing each receiver with side information about
its intended codeword can be used to obtain outer bounds on the capacity region of the
general Gaussian IC; see Problem .. This same idea will be used also in the converse
proof for the injective deterministic IC in Section ..

6.5 HAN–KOBAYASHI INNER BOUND

The Han–Kobayashi inner bound is the best-known bound on the capacity region of the
DM-IC. It includes all the inner bounds we discussed so far, and is tight for all interference
channels with known capacity regions. We consider the following characterization of this
inner bound.

Theorem . (Han–Kobayashi Inner Bound). A rate pair (R1 , R2 ) is achievable for
the DM-IC p(y1 , y2 |x1 , x2 ) if

R1 < I(X1 ; Y1 |U2 , Q),


R2 < I(X2 ; Y2 |U1 , Q),
R1 + R2 < I(X1 , U2 ; Y1 |Q) + I(X2 ; Y2 |U1 , U2 , Q),
R1 + R2 < I(X2 , U1 ; Y2 |Q) + I(X1 ; Y1 |U1 , U2 , Q),
R1 + R2 < I(X1 , U2 ; Y1 |U1 , Q) + I(X2 , U1 ; Y2 |U2 , Q),
2R1 + R2 < I(X1 , U2 ; Y1 |Q) + I(X1 ; Y1 |U1 , U2 , Q) + I(X2 , U1 ; Y2 |U2 , Q),
R1 + 2R2 < I(X2 , U1 ; Y2 |Q) + I(X2 ; Y2 |U1 , U2 , Q) + I(X1 , U2 ; Y1 |U1 , Q)

for some pmf p(q)p(u1 , x1 |q)p(u2 , x2 |q), where |U1 | ≤ |X1 | + 4, |U2 | ≤ |X2 | + 4, and
|Q| ≤ 6.

Remark 6.5. The Han–Kobayashi inner bound reduces to the interference-as-noise in-
ner bound in (.) by setting U1 = U2 = . At the other extreme, the Han–Kobayashi
inner bound reduces to the simultaneous-nonunique-decoding inner bound in (.) by
setting U1 = X1 and U2 = X2 . Thus, the bound is tight for the class of DM-ICs with strong
interference.
Remark 6.6. The Han–Kobayashi inner bound can be readily extended to the Gaussian
IC with average power constraints and evaluated using Gaussian (U j , X j ), j = 1, 2; see
Problem .. It is not known, however, if the restriction to the Gaussian distribution is
sufficient.
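For concreteness, the following sketch (ours) carries out one standard evaluation of the Han–Kobayashi bounds for the Gaussian IC: Gaussian inputs, constant Q, and sender j placing a fraction λ_j of its power on its private codeword. Neither this evaluation nor the heuristic choice λ = 1/I (so that the private signal reaches the other receiver at the noise level) is claimed to be optimal.

```python
import numpy as np

def C(x):
    return 0.5 * np.log2(1 + x)

def hk_gaussian(S1, S2, I1, I2, l1, l2):
    """Han-Kobayashi bounds with Gaussian inputs, Q constant, private power fractions l1, l2."""
    d1 = 1 + l2 * I1    # effective noise at receiver 1: unit noise + sender 2's private part
    d2 = 1 + l1 * I2    # effective noise at receiver 2
    b1 = C((S1 + (1 - l2) * I1) / d1) + C(l2 * S2 / d2)
    b2 = C((S2 + (1 - l1) * I2) / d2) + C(l1 * S1 / d1)
    b3 = C((l1 * S1 + (1 - l2) * I1) / d1) + C((l2 * S2 + (1 - l1) * I2) / d2)
    return {
        'R1': C(S1 / d1),
        'R2': C(S2 / d2),
        'R1+R2': min(b1, b2, b3),
        '2R1+R2': C((S1 + (1 - l2) * I1) / d1) + C(l1 * S1 / d1)
                  + C((l2 * S2 + (1 - l1) * I2) / d2),
        'R1+2R2': C((S2 + (1 - l1) * I2) / d2) + C(l2 * S2 / d2)
                  + C((l1 * S1 + (1 - l2) * I1) / d1),
    }

S, I = 100.0, 10.0
print(hk_gaussian(S, S, I, I, 1 / I, 1 / I))   # heuristic private power split l = 1/I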

6.5.1 Proof of the Han–Kobayashi Inner Bound


The proof uses rate splitting. We represent each message M j , j = 1, 2, by independent
“public” message M j0 at rate R j0 and “private” message M j j at rate R j j . Thus, R j = R j0 +
R j j . These messages are sent via superposition coding, whereby the cloud center U j
represents the public message M j0 and the satellite codeword X j represents the mes-
sage pair (M j0 , M j j ). The public messages are to be recovered by both receivers, while
each private message is to be recovered only by its intended receiver. We first show that
(R10 , R20 , R11 , R22 ) is achievable if

R11 < I(X1 ; Y1 |U1 , U2 , Q),


R11 + R10 < I(X1 ; Y1 |U2 , Q),
R11 + R20 < I(X1 , U2 ; Y1 |U1 , Q),
R11 + R10 + R20 < I(X1 , U2 ; Y1 |Q),
R22 < I(X2 ; Y2 |U1 , U2 , Q),
R22 + R20 < I(X2 ; Y2 |U1 , Q),
R22 + R10 < I(X2 , U1 ; Y2 |U2 , Q),
R22 + R20 + R10 < I(X2 , U1 ; Y2 |Q)

for some pmf p(q)p(u1 , x1 |q)p(u2 , x2 |q).


Codebook generation. Fix a pmf p(q)p(u1 , x1 |q)p(u2 , x2 |q). Generate a sequence q n ∼
∏ni=1 pQ (qi ). For j = 1, 2, randomly and conditionally independently generate 2nR j0 se-
quences unj (m j0 ), m j0 ∈ [1 : 2nR j0 ], each according to ∏ni=1 pU j |Q (u ji |qi ). For each m j0 ,
randomly and conditionally independently generate 2nR j j sequences x nj (m j0 , m j j ), m j j ∈
[1 : 2nR j j ], each according to ∏ni=1 p X j |U j ,Q (x ji |u ji (m j0 ), qi ).

Encoding. To send m j = (m j0 , m j j ), encoder j = 1, 2 transmits x nj (m j0 , m j j ).


Decoding. We use simultaneous nonunique decoding. Upon receiving y1n , decoder  finds
the unique message pair (m̂ 10 , m̂ 11 ) such that (q n , u1n (m̂ 10 ), u2n (m20 ), x1n (m̂ 10 , m̂ 11 ), y1n ) ∈ Tє(n)
for some m20 ∈ [1 : 2nR20 ]; otherwise it declares an error. Decoder  finds the message
pair (m̂ 20 , m̂ 22 ) similarly.
Analysis of the probability of error. Assume message pair ((1, 1), (1, 1)) is sent. We
bound the average probability of error for each decoder. First consider decoder . As
shown in Table ., we have eight cases to consider (here conditioning on qn is suppressed).
Cases  and , and  and , respectively, share the same pmf, and case  does not cause an
error. Thus, we are left with only five error events and decoder  makes an error only if
one or more of the following events occur:

E10 = 󶁁(Q n , U1n (1), U2n (1), X1n (1, 1), Y1n ) ∉ Tє(n) 󶁑,
E11 = 󶁁(Q n , U1n (1), U2n (1), X1n (1, m11 ), Y1n ) ∈ Tє(n) for some m11 ̸= 1󶁑,
E12 = 󶁁(Q n , U1n (m10 ), U2n (1), X1n (m10 , m11 ), Y1n ) ∈ Tє(n) for some m10 ̸= 1, m11 󶁑,

m10  m20  m11   Joint pmf
 1    1    1    p(u1n , x1n )p(u2n )p(y1n |x1n , u2n )
 1    1    ∗    p(u1n , x1n )p(u2n )p(y1n |u1n , u2n )
 ∗    1    ∗    p(u1n , x1n )p(u2n )p(y1n |u2n )
 ∗    1    1    p(u1n , x1n )p(u2n )p(y1n |u2n )
 1    ∗    ∗    p(u1n , x1n )p(u2n )p(y1n |u1n )
 ∗    ∗    1    p(u1n , x1n )p(u2n )p(y1n )
 ∗    ∗    ∗    p(u1n , x1n )p(u2n )p(y1n )
 1    ∗    1    p(u1n , x1n )p(u2n )p(y1n |x1n )

Table .. The joint pmfs induced by different (m10 , m20 , m11 ) triples.

E13 = 󶁁(Q n , U1n (1), U2n (m20 ), X1n (1, m11 ), Y1n ) ∈ Tє(n) for some m20 ̸= 1, m11 ̸= 1󶁑,
E14 = 󶁁(Q n , U1n (m10 ), U2n (m20 ), X1n (m10 , m11 ), Y1n ) ∈ Tє(n)
for some m10 ̸= 1, m20 ̸= 1, m11 󶁑.
Hence, the average probability of error for decoder  is upper bounded as
P(E1 ) ≤ P(E10 ) + P(E11 ) + P(E12 ) + P(E13 ) + P(E14 ).
We bound each term. By the LLN, P(E10 ) tends to zero as n → ∞. By the packing
lemma, P(E11 ) tends to zero as n → ∞ if R11 < I(X1 ; Y1 |U1 , U2 , Q) − δ(є). Similarly,
by the packing lemma, P(E12 ), P(E13 ), and P(E14 ) tend to zero as n → ∞ if the con-
ditions R11 + R10 < I(X1 ; Y1 |U2 , Q) − δ(є), R11 + R20 < I(X1 , U2 ; Y1 |U1 , Q) − δ(є), and
R11 + R10 + R20 < I(X1 , U2 ; Y1 |Q) − δ(є) are satisfied, respectively. The average probability
of error for decoder  can be bounded similarly. Finally, substituting R11 = R1 − R10 and
R22 = R2 − R20 , and using the Fourier–Motzkin procedure with the constraints 0 ≤ R j0 ≤
R j , j = 1, 2, to eliminate R10 and R20 (see Appendix D for the details), we obtain the seven
inequalities in Theorem . and two additional inequalities R1 < I(X1 ; Y1 |U1 , U2 , Q) +
I(X2 , U1 ; Y2 |U2 , Q) and R2 < I(X1 , U2 ; Y1 |U1 , Q) + I(X2 ; Y2 |U1 , U2 , Q). The correspond-
ing inner bound can be shown to be equivalent to the inner bound in Theorem .; see
Problem .. The cardinality bound on Q can be proved using the convex cover method
in Appendix C. This completes the proof of the Han–Kobayashi inner bound.

6.6 INJECTIVE DETERMINISTIC IC

Consider the deterministic interference channel depicted in Figure .. The channel out-
puts are given by the functions
Y1 = y1 (X1 , T2 ),
Y2 = y2 (X2 , T1 ),

Figure .. Injective deterministic interference channel.

where T1 = t1 (X1 ) and T2 = t2 (X2 ) are functions of X1 and X2 , respectively. We assume


that the functions y1 and y2 are injective in t1 and t2 , respectively, that is, for every x1 ∈ X1 ,
y1 (x1 , t2 ) is a one-to-one function of t2 and for every x2 ∈ X2 , y2 (x2 , t1 ) is a one-to-one
function of t1 . Note that these conditions imply that H(Y1 |X1 ) = H(T2 ) and H(Y2 |X2 ) =
H(T1 ).
This class of interference channels is motivated by the Gaussian IC, where the func-
tions y1 and y2 are additions. Unlike the Gaussian IC, however, the channel is noiseless
and its capacity region can be fully characterized.

Theorem .. The capacity region of the injective deterministic interference channel
is the set of rate pairs (R1 , R2 ) such that

R1 ≤ H(Y1 |T2 , Q),


R2 ≤ H(Y2 |T1 , Q),
R1 + R2 ≤ H(Y1 |Q) + H(Y2 |T1 , T2 , Q),
R1 + R2 ≤ H(Y1 |T1 , T2 , Q) + H(Y2 |Q),
R1 + R2 ≤ H(Y1 |T1 , Q) + H(Y2 |T2 , Q),
2R1 + R2 ≤ H(Y1 |Q) + H(Y1 |T1 , T2 , Q) + H(Y2 |T2 , Q),
R1 + 2R2 ≤ H(Y1 |T1 , Q) + H(Y2 |Q) + H(Y2 |T1 , T2 , Q)

for some pmf p(q)p(x1 |q)p(x2 |q).

The proof of achievability follows by noting that the above region coincides with the
Han–Kobayashi inner bound (take U1 = T1 , U2 = T2 ).
Remark .. By the one-to-one conditions on the functions y1 and y2 , decoder  knows
T2n after decoding for X1n and decoder  knows T1n after decoding for X2n . As such, the
interference random variables T1 and T2 can be naturally considered as the auxiliary ran-
dom variables that represent the public messages in the Han–Kobayashi scheme.

Proof of the converse. Consider the first two inequalities in the characterization of the

capacity region. By specializing the outer bound in (.), we obtain


nR1 ≤ nI(X1 ; Y1 | X2 , Q) + nєn = nH(Y1 |T2 , Q) + nєn ,
nR2 ≤ nH(Y2 |T1 , Q) + nєn ,
where Q is the usual time-sharing random variable.
Now consider the third inequality. By Fano’s inequality,
n(R1 + R2 ) ≤ I(M1 ; Y1n ) + I(M2 ; Y2n ) + nєn
(a)
≤ I(M1 ; Y1n ) + I(M2 ; Y2n , T2n ) + nєn
≤ I(X1n ; Y1n ) + I(X2n ; Y2n , T2n ) + nєn
(b)
≤ I(X1n ; Y1n ) + I(X2n ; T2n , Y2n |T1n ) + nєn
= H(Y1n ) − H(Y1n | X1n ) + I(X2n ; T2n |T1n ) + I(X2n ; Y2n |T1n , T2n ) + nєn
(c)
= H(Y1n ) + H(Y2n |T1n , T2n ) + nєn
n
≤ 󵠈󶀡H(Y1i ) + H(Y2i |T1i , T2i )󶀱 + nєn
i=1
= n󶀡H(Y1 |Q) + H(Y2 |T1 , T2 , Q)󶀱 + nєn .
Here step (a) is the key step in the proof. Even if a “genie” gives receiver 2 its common
message T2 as side information to help it find X2 , the capacity region does not change! Step
(b) follows by the fact that X2n and T1n are independent, and (c) follows by the equalities
H(Y1n |X1n ) = H(T2n ) and I(X2n ; T2n |T1n ) = H(T2n ). Similarly, for the fourth inequality,
n(R1 + R2 ) ≤ n󶀡H(Y2 |Q) + H(Y1 |T1 , T2 , Q)󶀱 + nєn .
Consider the fifth inequality
n(R1 + R2 ) ≤ I(X1n ; Y1n ) + I(X2n ; Y2n ) + nєn
= H(Y1n ) − H(Y1n | X1n ) + H(Y2n ) − H(Y2n | X1n ) + nєn
(a)
= H(Y1n ) − H(T2n ) + H(Y2n ) − H(T1n ) + nєn
≤ H(Y1n |T1n ) + H(Y2n |T2n ) + nєn
≤ n󶀡H(Y1 |T1 , Q) + H(Y2 |T2 , Q)󶀱 + nєn ,
where (a) follows by the one-to-one conditions of the injective deterministic IC. Following
similar steps, consider the sixth inequality
n(2R1 + R2 ) ≤ 2I(M1 ; Y1n ) + I(M2 ; Y2n ) + nєn
≤ I(X1n ; Y1n ) + I(X1n ; Y1n , T1n |T2n ) + I(X2n ; Y2n ) + nєn
= H(Y1n ) − H(T2n ) + H(T1n ) + H(Y1n |T1n , T2n ) + H(Y2n ) − H(T1n ) + nєn
= H(Y1n ) − H(T2n ) + H(Y1n |T1n , T2n ) + H(Y2n ) + nєn
≤ H(Y1n ) + H(Y1n |T1n , T2n ) + H(Y2n |T2n ) + nєn
≤ n󶀡H(Y1 |Q) + H(Y1 |T1 , T2 , Q) + H(Y2 |T2 , Q)󶀱 + nєn .

Similarly, for the last inequality, we have

n(R1 + 2R2 ) ≤ n󶀡H(Y1 |T1 , Q) + H(Y2 |Q) + H(Y2 |T1 , T2 , Q)󶀱 + nєn .

This completes the proof of the converse.

6.7 CAPACITY REGION OF THE GAUSSIAN IC WITHIN HALF A BIT

As we have seen, the capacity (region) of the Gaussian IC is known only under certain
strong and weak interference conditions and is achieved by extreme special cases of the
Han–Kobayashi scheme where no rate splitting is used. How close is the Han–Kobayashi
inner bound in its full generality to the capacity region?
We show that even a suboptimal evaluation of the Han–Kobayashi inner bound differs
by no more than half a bit per rate component from the capacity region, independent of
the channel parameters! We prove this result by first establishing bounds on the capac-
ity region of a class of semideterministic ICs that include both the Gaussian IC and the
injective deterministic IC in Section . as special cases.

6.7.1 Injective Semideterministic IC


Consider the semideterministic interference channel depicted in Figure .. Here again
the functions y1 , y2 satisfy the condition that for every x1 ∈ X1 , y1 (x1 , t2 ) is a one-to-one
function of t2 and for every x2 ∈ X2 , y2 (x2 , t1 ) is a one-to-one function of t1 . The gener-
alization comes from making the mappings from X1 to T1 and from X2 to T2 random.

Figure .. Injective semideterministic interference channel.

Note that if we assume the channel variables to be real-valued instead of finite, the
Gaussian IC becomes a special case of this semideterministic IC with T1 = д21 X1 + Z2
and T2 = д12 X2 + Z1 .
Outer bound on the capacity region. Consider the following outer bound on the capacity
region of the injective semideterministic IC.

Proposition .. Any achievable rate pair (R1 , R2 ) for the injective semideterministic
IC must satisfy the inequalities

R1 ≤ H(Y1 | X2 , Q) − H(T2 | X2 ),
R2 ≤ H(Y2 | X1 , Q) − H(T1 | X1 ),
R1 + R2 ≤ H(Y1 |Q) + H(Y2 |U2 , X1 , Q) − H(T1 | X1 ) − H(T2 | X2 ),
R1 + R2 ≤ H(Y1 |U1 , X2 , Q) + H(Y2 |Q) − H(T1 | X1 ) − H(T2 | X2 ),
R1 + R2 ≤ H(Y1 |U1 , Q) + H(Y2 |U2 , Q) − H(T1 | X1 ) − H(T2 | X2 ),
2R1 + R2 ≤ H(Y1 |Q) + H(Y1 |U1 , X2 , Q) + H(Y2 |U2 , Q) − H(T1 | X1 ) − 2H(T2 | X2 ),
R1 + 2R2 ≤ H(Y2 |Q) + H(Y2 |U2 , X1 , Q) + H(Y1 |U1 , Q) − 2H(T1 | X1 ) − H(T2 | X2 )

for some pmf p(q)p(x1 |q)p(x2 |q)pT1 |X1 (u1 |x1 )pT2 |X2 (u2 |x2 ).

This outer bound is established by extending the proof of the converse for the injective
deterministic IC. We again use a genie argument with U j conditionally independent of T j
given X j , j = 1, 2. The details are given in Appendix B.

Remark 6.8. If we replace each channel p(t j |x j ), j = 1, 2, with a deterministic function


t j (X j ), the above outer bound reduces to the capacity region of the injective deterministic
IC in Theorem . by setting U j = T j , j = 1, 2.
Remark 6.9. The above outer bound is not tight under the strong interference condition
in (.), and tighter outer bounds can be established.
Remark 6.10. We can obtain a corresponding outer bound for the Gaussian IC with dif-
ferential entropies in place of entropies in the above outer bound.

Inner bound on the capacity region. The Han–Kobayashi inner bound with the restric-
tion that p(u1 , u2 |q, x1 , x2 ) = pT1 |X1 (u1 |x1 ) pT2 |X2 (u2 |x2 ) reduces to the following.

Proposition .. A rate pair (R1 , R2 ) is achievable for the injective semideterministic
IC if

R1 < H(Y1 |U2 , Q) − H(T2 |U2 , Q),


R2 < H(Y2 |U1 , Q) − H(T1 |U1 , Q),
R1 + R2 < H(Y1 |Q) + H(Y2 |U1 , U2 , Q) − H(T1 |U1 , Q) − H(T2 |U2 , Q),
R1 + R2 < H(Y1 |U1 , U2 , Q) + H(Y2 |Q) − H(T1 |U1 , Q) − H(T2 |U2 , Q),
R1 + R2 < H(Y1 |U1 , Q) + H(Y2 |U2 , Q) − H(T1 |U1 , Q) − H(T2 |U2 , Q),
2R1 + R2 < H(Y1 |Q) + H(Y1 |U1 , U2 , Q) + H(Y2 |U2 , Q)
− H(T1 |U1 , Q) − 2H(T2 |U2 , Q),

R1 + 2R2 < H(Y2 |Q) + H(Y2 |U1 , U2 , Q) + H(Y1 |U1 , Q)


− 2H(T1 |U1 , Q) − H(T2 |U2 , Q)

for some pmf p(q)p(x1 |q)p(x2 |q)pT1 |X1 (u1 |x1 )pT2 |X2 (u2 |x2 ).

Considering the Gaussian IC, we obtain a corresponding inner bound with differen-
tial entropies in place of entropies. This inner bound coincides with the outer bound for
the injective deterministic interference channel discussed in Section ., where T1 is a
deterministic function of X1 and T2 is a deterministic function of X2 (thus U1 = T1 and
U2 = T2 ).
Gap between the inner and outer bounds. For a fixed (Q, X1 , X2 ) ∼ p(q)p(x1 |q)p(x2 |q),
let Ro (Q, X1 , X2 ) be the region defined by the set of inequalities in Proposition ., and
let Ri (Q, X1 , X2 ) denote the closure of the region defined by the set of inequalities in
Proposition ..

Lemma .. If (R1 , R2 ) ∈ Ro (Q, X1 , X2 ), then

󶀡R1 − I(X2 ; T2 |U2 , Q), R2 − I(X1 ; T1 |U1 , Q)󶀱 ∈ Ri (Q, X1 , X2 ).

To prove this lemma, we first construct the rate region R o (Q, X1 , X2 ) from the outer
bound Ro (Q, X1 , X2 ) by replacing X j in every positive conditional entropy term in Ro (Q,
X1 , X2 ) with U j for j = 1, 2. Clearly R o (Q, X1 , X2 ) ⊇ Ro (Q, X1 , X2 ). Observing that

I(X j ; T j |U j ) = H(T j |U j ) − H(T j | X j ), j = 1, 2,

and comparing the rate region R o (Q, X1 , X2 ) to the inner bound Ri (Q, X1 , X2 ), we see
that R o (Q, X1 , X2 ) can be equivalently characterized as the set of rate pairs (R1 , R2 ) that
satisfy the statement in Lemma ..

6.7.2 Half-Bit Theorem for the Gaussian IC


We show that the outer bound in Proposition ., when specialized to the Gaussian IC,
is achievable within half a bit per dimension. For the Gaussian IC, the auxiliary random
variables in the outer bound can be expressed as

U1 = д21 X1 + Z2′ ,
U2 = д12 X2 + Z1′ ,    (.)

where Z1′ and Z2′ are N(0, 1), independent of each other and of (X1 , X2 , Z1 , Z2 ). Substitut-
ing in the outer bound in Proposition ., we obtain an outer bound Ro on the capacity

region of the Gaussian IC that consists of all rate pairs (R1 , R2 ) such that

R1 ≤ C(S1 ),
R2 ≤ C(S2 ),
R1 + R2 ≤ C(S1 /(1 + I2 )) + C(I2 + S2 ),
R1 + R2 ≤ C(S2 /(1 + I1 )) + C(I1 + S1 ),
R1 + R2 ≤ C((S1 + I1 + I1 I2 )/(1 + I2 )) + C((S2 + I2 + I1 I2 )/(1 + I1 )),    (.)
2R1 + R2 ≤ C(S1 /(1 + I2 )) + C(S1 + I1 ) + C((S2 + I2 + I1 I2 )/(1 + I1 )),
R1 + 2R2 ≤ C(S2 /(1 + I1 )) + C(S2 + I2 ) + C((S1 + I1 + I1 I2 )/(1 + I2 )).

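The constraints of Ro are easy to evaluate numerically; the following sketch (ours) computes them for given SNR and INR values, with the three sum-rate bounds combined into a single minimum.

```python
import numpy as np

def C(x):
    return 0.5 * np.log2(1 + x)

def outer_bound_Ro(S1, S2, I1, I2):
    """The constraints of the outer bound Ro for the Gaussian IC."""
    return {
        'R1':     C(S1),
        'R2':     C(S2),
        'R1+R2':  min(C(S1 / (1 + I2)) + C(I2 + S2),
                      C(S2 / (1 + I1)) + C(I1 + S1),
                      C((S1 + I1 + I1 * I2) / (1 + I2)) + C((S2 + I2 + I1 * I2) / (1 + I1))),
        '2R1+R2': C(S1 / (1 + I2)) + C(S1 + I1) + C((S2 + I2 + I1 * I2) / (1 + I1)),
        'R1+2R2': C(S2 / (1 + I1)) + C(S2 + I2) + C((S1 + I1 + I1 * I2) / (1 + I2)),
    }

print(outer_bound_Ro(S1=100, S2=100, I1=10, I2=10))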
Now we show that Ro is achievable with half a bit.

Theorem . (Half-Bit Theorem). For the Gaussian IC, if (R1 , R2 ) ∈ Ro , then (R1 −
1/2, R2 − 1/2) is achievable.

To prove this theorem, consider Lemma . for the Gaussian IC with the auxiliary
random variables in (.). Then, for j = 1, 2,

I(X j ; T j |U j , Q) = h(T j |U j , Q) − h(T j |U j , X j , Q)
≤ h(T j − U j ) − h(Z j )
= 1/2.

6.7.3 Symmetric Degrees of Freedom


Consider the symmetric Gaussian IC with S1 = S2 = S and I1 = I2 = I. Note that S and
I fully characterize the channel. Define the symmetric capacity of the channel as Csym =
max{R : (R, R) ∈ C } and the normalized symmetric capacity as

dsym = Csym / C(S).

We find the symmetric degrees of freedom (DoF) dsym , which is the limit of dsym as the SNR
and INR approach infinity. Note that in taking the limit, we are considering a sequence
of channels rather than any particular channel. This limit, however, sheds light on the
optimal coding strategies under different regimes of high SNR/INR.

Specializing the outer bound Ro in (.) to the symmetric case yields

Csym ≤ C̄sym = min{C(S), (1/2) C(S/(1 + I)) + (1/2) C(S + I), C((S + I + I²)/(1 + I)),
(2/3) C(S/(1 + I)) + (1/3) C(S + 2I + I²)}.

By the half-bit theorem,


(C̄sym − 1/2)/C(S) ≤ dsym ≤ C̄sym /C(S).

Thus, the difference between the upper and lower bounds converges to zero as S → ∞,

and the normalized symmetric capacity converges to the degrees of freedom dsym . This
limit, however, depends on how I scales as S → ∞. Since it is customary to measure
SNR and INR in decibels (dBs), we consider the limit for a constant ratio between the
logarithms of the INR and SNR
α = log I / log S,

or equivalently, I = S α . Then, as S → ∞, the normalized symmetric capacity dsym con-


verges to


dsym (α) = lim_{S→∞} C̄sym |I=S^α / C(S)
= min{1, max{α/2, 1 − α/2}, max{α, 1 − α}, max{2/3, 2α/3} + max{1/3, 2α/3} − 2α/3}.

Since the fourth bound inside the minimum is redundant, we have



dsym (α) = min{1, max{α/2, 1 − α/2}, max{α, 1 − α}}. (.)

The symmetric DoF as a function of α is plotted in Figure .. Note the unexpected W
(instead of V) shape of the DoF curve. When interference is negligible (α ≤ 1/2), the DoF
is 1 − α and corresponds to the limit of the normalized rates achieved by treating inter-
ference as noise. For strong interference (α ≥ 1), the DoF is min{1, α/2} and corresponds
to simultaneous decoding. In particular, when interference is very strong (α ≥ 2), it does
not impair the DoF. For moderate interference (1/2 ≤ α ≤ 1), the DoF corresponds to the
Han–Kobayashi rate splitting; see Problem .. However, the DoF first increases until
α = 2/3 and then decreases to 1/2 as α is increased to 1. Note that for α = 1/2 and α = 1,
time division is also optimal.
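The W-shaped curve is easy to reproduce; the following few lines (ours) evaluate the symmetric DoF formula above at several values of α.

```python
def dof_symmetric(alpha):
    """Symmetric degrees of freedom of the two-user-pair Gaussian IC."""
    return min(1.0,
               max(alpha / 2, 1 - alpha / 2),
               max(alpha, 1 - alpha))

for a in [0.25, 0.5, 2/3, 0.8, 1.0, 1.5, 2.0]:
    print(f'alpha = {a:.2f}:  d_sym = {dof_symmetric(a):.3f}')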
Remark .. In the above analysis, we scaled the channel gains under a fixed power con-
straint. Alternatively, we can fix the channel gains and scale the power P to infinity. It is
not difficult to see that under this high power regime, limP→∞ d ∗ = 1/2, regardless of the
values of the channel gains. Thus time division is asymptotically optimal.


Figure .. Degrees of freedom for symmetric Gaussian IC versus α = log I/ log S.

6.8 DETERMINISTIC APPROXIMATION OF THE GAUSSIAN IC

We introduce the q-ary expansion deterministic (QED) interference channel and show
that it closely approximates the Gaussian IC in the limit of high SNR. The inputs to the
QED-IC are q-ary L-vectors X1 and X2 for some “q-ary digit pipe” number L. We express
X1 as [X1,L−1 , X1,L−2 , X1,L−3 , . . . , X10 ]T , where X1l ∈ [0 : q − 1] for l ∈ [0 : L − 1], and sim-
ilarly for X2 . Consider the symmetric case where the interference is specified by the pa-
rameter α ∈ [0, 2] such that αL is an integer. Define the “shift” parameter s = (α − 1)L.
The output of the channel depends on whether the shift is negative or positive.
Downshift. Here s < 0, i.e., 0 ≤ α < 1, and Y1 is a q-ary L-vector with

Y1l = X1l if L + s ≤ l ≤ L − 1,
Y1l = X1l + X2,l−s (mod q) if 0 ≤ l ≤ L + s − 1.

This case is depicted in Figure .. The outputs of the channel can be represented as

Y1 = X1 + Gs X2 ,
(.)
Y2 = Gs X1 + X2 ,

where Gs is an L × L (down)shift matrix with Gs ( j, k) = 1 if k = j − s and Gs ( j, k) = 0,


otherwise.
Upshift. Here s ≥ 0, i.e., 1 ≤ α ≤ 2, and Y1 is a q-ary αL-vector with

Y1l = X2,l−s if L ≤ l ≤ L + s − 1,
Y1l = X1l + X2,l−s (mod q) if s ≤ l ≤ L − 1,
Y1l = X1l if 0 ≤ l ≤ s − 1.

Again the outputs of the channel can be represented as in (.), where Gs is now an
(L + s) × L (up)shift matrix with Gs ( j, k) = 1 if j = k and Gs ( j, k) = 0, otherwise.
The capacity region of the symmetric QED-IC can be obtained by a straightforward

Figure .. The q-ary expansion deterministic interference channel with downshift.

evaluation of the capacity region of the injective deterministic IC in Theorem .. Let
R′j = R j /(L log q), j = 1, 2. The normalized capacity region C ′ is the set of rate pairs
(R1′ , R2′ ) such that

R1′ ≤ 1,
R2′ ≤ 1,
R1′ + R2′ ≤ max{2α, 2 − 2α},    (.)
R1′ + R2′ ≤ max{α, 2 − α},
2R1′ + R2′ ≤ 2,
R1′ + 2R2′ ≤ 2

for α ∈ [1/2, 1], and

R1′ ≤ 1,
R2′ ≤ 1,
R1′ + R2′ ≤ max{2α, 2 − 2α},    (.)
R1′ + R2′ ≤ max{α, 2 − α}

for α ∈ [0, 1/2) ∪ (1, 2].



Surprisingly, the capacity region of the symmetric QED-IC can be achieved error-free
using a simple single-letter linear coding scheme. We illustrate this scheme for the nor-
󳰀
malized symmetric capacity Csym = max{R : (R, R) ∈ C 󳰀 }. Encoder j = 1, 2 represents its
󳰀
“single-letter” message by a q-ary LCsym -vector U j and transmits X j = AU j , where A is an
󳰀
L × LCsym q-ary matrix A. Decoder j multiplies its received symbol Y j by a correspond-
󳰀
ing LCsym × L matrix B to recover U j perfectly! For example, consider a binary expansion
deterministic IC with q = 2, L = 12, and α = 5/6. The symmetric capacity for this case is
Csym = 7 bits/transmission. For encoding, we use the matrix
A =
1 0 0 0 0 0 0
0 1 0 0 0 0 0
0 0 1 0 0 0 0
0 0 0 1 0 0 0
0 0 0 0 1 0 0
0 0 1 0 0 0 0
0 1 0 0 0 0 0
1 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 1 0
0 0 0 0 0 0 1
Note that the first  bits of U j are sent twice, while X1,2 = X1,3 = 0. The transmitted symbol
X j and the two signal components of the received vector Y j are illustrated in Figure ..
Decoding for U1 can also be performed sequentially as follows (see Figure .):
. U1,6 = Y1,11 , U1,5 = Y1,10 , U1,1 = Y1,1 , and U1,0 = Y1,0 . Also U2,5 = Y1,3 and U2,6 = Y1,2 .
. U1,4 = Y1,9 ⊕ U2,6 and U2,4 = Y1,4 ⊕ U1,6 .
. U1,3 = Y1,8 ⊕ U2,5 and U1,2 = Y1,7 ⊕ U2,4 .
This decoding procedure corresponds to multiplying the output by the matrix
B =
1 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 1 0 0
0 0 0 1 0 0 0 0 1 0 0 0
1 0 0 0 1 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 0 0 1
Note that BA = I and BGs A = 0, and hence interference is canceled out while the intended
signal is recovered perfectly.
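This claim can be checked mechanically. The sketch below is ours: the shift is encoded as a matrix that moves the interfering vector down by two digit positions, matching the digit alignment of the downshifted channel (the indexing conventions and helper names are our own). With the matrices A and B displayed above, the two identities hold over GF(2).

```python
import numpy as np

def e(*cols, width=7):
    """0-1 row vector with ones in the given (1-based) columns."""
    r = np.zeros(width, dtype=int)
    for c in cols:
        r[c - 1] = 1
    return r

# Encoding matrix A (12 x 7): rows correspond to digits X_11, ..., X_0.
A = np.array([e(1), e(2), e(3), e(4), e(5), e(3), e(2), e(1),
              e(), e(), e(6), e(7)])
# Decoding matrix B (7 x 12): rows recover U_6, ..., U_0 from Y_11, ..., Y_0.
B = np.array([e(1, width=12), e(2, width=12), e(3, 10, width=12), e(4, 9, width=12),
              e(1, 5, 8, width=12), e(11, width=12), e(12, width=12)])
# Shift matrix: the interfering vector appears two digit positions lower (alpha = 5/6).
G = np.eye(12, k=-2, dtype=int)

assert np.array_equal(B @ A % 2, np.eye(7, dtype=int))   # intended signal recovered
assert not np.any(B @ G @ A % 2)                         # interference canceled
print('BA = I and B G A = 0: the linear scheme is error-free')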
Under the choice of the input X j = AU j , j = 1, 2, where U j is uniformly distributed
over the set of binary vectors of length LC′sym , the symmetric capacity can be expressed as
Csym = H(U j ) = I(U j ; Y j ) = I(X j ; Y j ), j = 1, 2.

Figure .. Transmitted symbol X j and the received vector Y j . The circled numbers denote the order of decoding.

Hence, the symmetric capacity is achieved error-free simply by treating interference as


noise! In fact, the same linear coding technique can achieve the entire capacity region,
which is generally characterized as the set of rate pairs (R1 , R2 ) such that

R1 < I(X1 ; Y1 ),
(.)
R2 < I(X2 ; Y2 )

for some pmf p(x1 )p(x2 ). A similar linear coding technique can be readily developed for
any q-ary alphabet, dimension L, and α ∈ [0, 2] that achieves the entire capacity region
by treating interference as noise.

6.8.1* QED-IC Approximation of the Gaussian IC


Considering the normalized capacity region characterization in (.) and (.), we can
show that the normalized symmetric capacity of the symmetric QED-IC is
󳰀
Csym = min󶁁1, max{α/2, 1 − α/2}, max{α, 1 − α}󶁑 (.)

for α ∈ [0, 2]. This matches the DoF dsym (α) of the symmetric Gaussian IC in (.) ex-
actly. It can be shown that the Gaussian IC can be closely approximated by a QED-IC.
Therefore, if a normalized rate pair is achievable for the QED-IC, then it is achievable for
the corresponding Gaussian IC in the high SNR/INR limit, and vice versa.
We only sketch the proof that achievability carries over from the QED-IC to the Gauss-
ian IC. Consider the q-ary expansions of the inputs and outputs of the Gaussian IC,
e.g., X1 = X1,L−1 X1,L−2 ⋅ ⋅ ⋅ X1,1 X1,0 . X1,−1 X1,−2 ⋅ ⋅ ⋅ , where X1l ∈ [0 : q − 1] are q-ary digits.

Assuming P = 1, we express the channel outputs as Y1 = √S X1 + √I X2 + Z1 and Y2 =
√I X1 + √S X2 + Z2 . Suppose that √S and √I are powers of q. Then the digits of X1 , X2 ,
Y1 , and Y2 align with each other. We further assume that the noise Z1 is peak-power-
constrained. Then, only the least-significant digits of Y1 are affected by the noise. These
digits are considered unusable for transmission. Now, we restrict each input digit to
values from [0 : ⌊(q − 1)/2⌋]. Thus, the signal additions at the q-ary digit-level are in-
dependent of each other, that is, there are no carry-overs, and the additions are effec-
tively modulo-q. Note that this assumption does not affect the rate significantly because
log(⌊(q − 1)/2⌋) / log q can be made arbitrarily close to one by choosing q sufficiently
large. Under the above assumptions, we arrive at a QED-IC, whereby the (random cod-
ing) achievability proof for rate pairs in (.) carries over to the Gaussian IC.
Remark .. Recall that the capacity region of the QED-IC can be achieved by a simple
single-letter linear coding technique (treating interference as noise) without using the full
Han–Kobayashi coding scheme. Hence, the approximate capacity region and the DoF
of the Gaussian IC can be both achieved simply by treating interference as noise. The
resulting approximation gap, however, is significantly larger than half a bit.

6.9 EXTENSIONS TO MORE THAN TWO USER PAIRS

Interference channels with more than two user pairs are far less understood. For example,
the notion of strong interference does not seem to naturally extend to more than two user
pairs. These channels also exhibit the interesting property that decoding at each receiver is
impaired by the joint effect of interference from the other senders rather than by each sender's
signal separately. Consequently, coding schemes that deal directly with the effect of the
combined interference signal are expected to achieve higher rates. One such coding scheme
is interference alignment, whereby the code is designed so that the combined interfering
signal at each receiver is confined (aligned) to a subset of the receiver signal space. The
subspace that contains the combined interference is discarded, while the desired signal is
reconstructed from the orthogonal subspace. We illustrate this scheme in the following
example.
Example . (k-User-pair symmetric QED-IC). Consider the k-user-pair QED-IC
Y j = X j + Gs ∑_{ j′ ̸= j} X j′ , j ∈ [1 : k],

where X1 , . . . , Xk are q-ary L vectors, Y1 , . . . , Yk are q-ary Ls vectors, Ls = max{L, L + s},


and Gs is the Ls × L s-shift matrix for some s ∈ [−L, L]. As before, let α = (L + s)/L. If
α = 1, then the received signals are identical and the normalized symmetric capacity is
C′sym = 1/k, which is achieved via time division. However, if α ̸= 1, then the normalized
symmetric capacity is
C′sym = min{1, max{α, 1 − α}, max{α/2, 1 − α/2}},
which is equal to the normalized symmetric capacity for the -user-pair case, regardless

of k! To show this, consider the single-letter linear coding technique described earlier
for the -user-pair case. Then it is easy to check that the symmetric capacity is achiev-
able (error-free), since the interfering signals from other senders are aligned in the same
subspace and can be filtered out simultaneously.
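To make the alignment concrete, here is a toy instance of our own choosing (q = 2, L = 4, α = 1/2, k = 5 user pairs). Each sender places its two information bits in the top half of its input vector, so the combined interference from all other senders lands in the bottom half and is removed by the same projection no matter how many interferers there are; the per-user rate 1/2 matches C′sym at α = 1/2.

```python
import numpy as np

L, shift, k = 4, 2, 5
A = np.vstack([np.eye(2, dtype=int), np.zeros((2, 2), dtype=int)])   # 4x2 encoder
B = np.hstack([np.eye(2, dtype=int), np.zeros((2, 2), dtype=int)])   # 2x4 decoder
G = np.eye(L, k=-shift, dtype=int)                                   # downshift by 2 digits

rng = np.random.default_rng(0)
U = rng.integers(0, 2, size=(k, 2))            # one 2-bit message per sender
X = U @ A.T % 2                                # X_j = A U_j
Y1 = (X[0] + G @ X[1:].sum(axis=0)) % 2        # Y_1 = X_1 + G * (sum of interferers)
assert np.array_equal(B @ Y1 % 2, U[0])        # U_1 recovered exactly
print('combined interference aligned in the low digits and filtered out')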

Using the same approximation procedure detailed for the -user-pair case, this deter-
ministic IC example shows that the DoF of the symmetric k-user-pair Gaussian IC is

d∗sym (α) = 1/k if α = 1,
d∗sym (α) = min{1, max{α, 1 − α}, max{α/2, 1 − α/2}} otherwise.

The DoF is achieved simply by treating interference as noise with a carefully chosen input
pmf.

SUMMARY

∙ Discrete memoryless interference channel (DM-IC)


∙ Simultaneous nonunique decoding is optimal under strong interference
∙ Coded time sharing can strictly outperform time sharing
∙ Han–Kobayashi coding scheme:
∙ Rate splitting and superposition coding
∙ Fourier–Motzkin elimination
∙ Optimal for injective deterministic ICs
∙ Gaussian interference channel:
∙ Capacity region under strong interference achieved via simultaneous decoding
∙ Sum-capacity under weak interference achieved by treating interference as noise
∙ Genie-based converse proof
∙ Han–Kobayashi coding scheme achieves within half a bit of the capacity region
∙ Symmetric degrees of freedom
∙ Approximation by the q-ary expansion deterministic IC in high SNR
∙ Interference alignment
∙ Open problems:
6.1. What is the capacity region of the Gaussian IC with weak interference?
6.2. What is the generalization of strong interference to three or more user pairs?

6.3. What is the capacity region of the three-user-pair injective deterministic IC?
6.4. Is the Han–Kobayashi inner bound tight in general?

BIBLIOGRAPHIC NOTES

The interference channel was first studied by Ahlswede (), who established basic inner
and outer bounds including the simultaneous decoding inner bound in (.). The outer
bound in (.) is based on a simple observation by L. Coviello that improves upon the
outer bound in Sato () by reversing the order of the union (over the input pmfs)
and the intersection (over the channel pmfs). Carleial () introduced the notion of
very strong interference for the Gaussian IC and showed that the capacity region is the
intersection of the capacity regions for the two component Gaussian MACs. The capacity
region of the Gaussian IC with strong interference was established by Sato (b) and
Han and Kobayashi (). Costa and El Gamal () extended these results to the DM-
IC.
Carleial () introduced the idea of rate splitting and established an inner bound us-
ing successive cancellation decoding and (uncoded) time sharing. His inner bound was
improved through simultaneous decoding and coded time sharing by Han and Kobayashi
(). The inner bound in the Han–Kobayashi paper used four auxiliary random vari-
ables representing public and private messages and involved more inequalities than in
Theorem .. The equivalent characterization with only two auxiliary random variables
and a reduced set of inequalities in Theorem . is due to Chong, Motani, Garg, and
El Gamal (). The injective deterministic IC in Section . was introduced by El Gamal
and Costa (), who used the genie argument to show that the Han–Kobayashi inner
bound is tight.
Kramer () developed a genie-based outer bound for the Gaussian IC. Shang,
Kramer, and Chen (), Annapureddy and Veeravalli (), and Motahari and Khan-
dani () independently established the sum-capacity of the Gaussian IC with weak
interference in Theorem .. Our proof using the genie method follows the one by Anna-
pureddy and Veeravalli (). The half-bit theorem was first established by Etkin, Tse,
and Wang () using the Han–Kobayashi inner bound and a variant of the genie-based
outer bound by Kramer (). The proof in Section . using the injective semideter-
ministic IC is due to Telatar and Tse ().
The approximation of the Gaussian IC by the q-ary expansion deterministic channel
was first proposed by Avestimehr, Diggavi, and Tse (). Bresler, Parekh, and Tse ()
applied this approach to approximate the many-to-one Gaussian IC. This approximation
method was further refined by Jafar and Vishwanath () and Bresler and Tse ().
The symmetric capacity achieving linear coding scheme for the QED-IC is due to Jafar
and Vishwanath (). Bandemer () showed that the entire capacity region can be
achieved by this linear coding scheme.
Interference alignment has been investigated for several classes of Gaussian channels
by Maddah-Ali, Motahari, and Khandani (), Cadambe and Jafar (), Ghasemi,

Motahari, and Khandani (), Motahari, Gharan, Maddah-Ali, and Khandani (),
Gou and Jafar (), and Nazer, Gastpar, Jafar, and Vishwanath (), and for QED-
ICs by Jafar and Vishwanath (), Cadambe, Jafar, and Shamai () and Bandemer,
Vazquez-Vilar, and El Gamal (). Depending on the specific channel, this alignment is
achieved via linear subspaces (Maddah-Ali, Motahari, and Khandani ), signal scale
levels (Cadambe, Jafar, and Shamai ), time delay slots (Cadambe and Jafar ), or
number-theoretic irrational bases (Motahari, Gharan, Maddah-Ali, and Khandani ).
In each case, the subspace that contains the combined interference is disregarded, while
the desired signal is reconstructed from the orthogonal subspace.
There are very few results on the IC with more than two user pairs beyond interference
alignment. A straightforward extension of the Han–Kobayashi coding scheme is shown
to be optimal for the deterministic IC (Gou and Jafar ), where the received signal is
one-to-one to all interference signals given the intended signal. More interestingly, each
receiver can decode for the combined (not individual) interference, which is achieved
using structured codes for the many-to-one Gaussian IC (Bresler, Parekh, and Tse ).
Decoding for the combined interference has also been applied to deterministic ICs with
more than two user pairs (Bandemer and El Gamal ).

PROBLEMS

.. Establish the interference-as-noise inner bound in (.).


.. Prove the outer bound in (.).
.. Prove Lemma ..
.. Verify the outer bound on the capacity region of the Gaussian IC in (.).
.. Show that the normalized capacity region of the QED-IC reduces to the regions
in (.) and (.) and that the normalized symmetric capacity is given by (.).
.. Successive cancellation decoding vs. simultaneous decoding. Consider a DM-IC
p(y1 , y2 |x1 , x2 ). As in the simple coding schemes discussed in Section ., sup-
pose that point-to-point codes are used. Consider the successive cancellation de-
coding scheme, where receiver 1 first decodes for M2 and then decodes for its own
message M1 . Likewise receiver 2 first decodes for M1 and then for M2 .
(a) Find the rate region achieved by successive cancellation decoding.
(b) Show that this region is always contained in the simultaneous-decoding inner
bound in (.).
.. Successive cancellation decoding for the Gaussian IC. In Chapter  we found that
for the DM-MAC, successive cancellation decoding with time sharing achieves
the same inner bound as simultaneous decoding. In this problem, we show that
this is not the case for the interference channel.

Consider the Gaussian IC with SNRs S1 and S2 and INRs I1 and I2 .


(a) Write down the rate region achieved by successive cancellation decoding with
Gaussian codes and no power control.
(b) Under what conditions is this region equal to the simultaneous-nonunique-
decoding inner bound in Section .?
(c) How much worse can successive cancellation decoding be than simultaneous
nonunique decoding?
.. Handoff. Consider two symmetric Gaussian ICs, one with SNR S and INR I > S,
and the other with SNR I and INR S. Thus, the second Gaussian IC is equivalent to
the setting where the messages are sent to the other receivers in the first Gaussian
IC. Which channel has a larger capacity region?
.. Power control. Consider the symmetric Gaussian IC with SNR S and INR I.
(a) Write down the rate region achieved by treating interference as noise with time
sharing between two transmission subblocks and power control. Express the
region in terms of three parameters: time-sharing fraction α ∈ [0, 1] and two
power allocation parameters β1 , β2 ∈ [0, 1].
(b) Similarly, write down the rate region achieved by simultaneous nonunique de-
coding with time sharing between two transmission subblocks and power con-
trol in terms of α, β1 , β2 .
.. Gaussian Z interference channel. Consider the Gaussian IC depicted in Figure .
with SNRs S1 , S2 , and INR I1 . (Here the INR I2 = 0.)

[Figure: block diagram of the Gaussian Z interference channel with direct gains д11 and д22 , cross gain д12 , and noise components Z1 and Z2 .]
Figure .. Gaussian interference channel with I2 = 0.

(a) Find the capacity region when S2 ≤ I1 .


(b) Find the sum-capacity when I1 ≤ S2 .

(c) Find the capacity region when S2 ≤ I1 /(1 + S1 ).


.. Minimum-energy-per-bit region. Consider the Gaussian IC with channel gains
д11 , д12 , д21 , and д22 . Find the minimum-energy-per-bit region, that is, the set of
all energy pairs (E1 , E2 ) = (P1 /R1 , P2 /R2 ) such that the rate pair (R1 , R2 ) is achiev-
able with average code power pair (P1 , P2 ).
.. An equivalent characterization of the Han–Kobayashi inner bound. Consider the
inner bound on the capacity region of the DM-IC that consists of all rate pairs
(R1 , R2 ) such that

R1 < I(X1 ; Y1 |U2 , Q),


R1 < I(X1 ; Y1 |U1 , U2 , Q) + I(X2 , U1 ; Y2 |U2 , Q),
R2 < I(X2 ; Y2 |U1 , Q),
R2 < I(X1 , U2 ; Y1 |U1 , Q) + I(X2 ; Y2 |U1 , U2 , Q),
R1 + R2 < I(X1 , U2 ; Y1 |Q) + I(X2 ; Y2 |U1 , U2 , Q),
R1 + R2 < I(X2 , U1 ; Y2 |Q) + I(X1 ; Y1 |U1 , U2 , Q),
R1 + R2 < I(X1 , U2 ; Y1 |U1 , Q) + I(X2 , U1 ; Y2 |U2 , Q),
2R1 + R2 < I(X1 , U2 ; Y1 |Q) + I(X1 ; Y1 |U1 , U2 , Q) + I(X2 , U1 ; Y2 |U2 , Q),
R1 + 2R2 < I(X2 , U1 ; Y2 |Q) + I(X2 ; Y2 |U1 , U2 , Q) + I(X1 , U2 ; Y1 |U1 , Q)

for some pmf p(q)p(u1 , x1 |q)p(u2 , x2 |q). Show that this inner bound is equivalent
to the characterization of the Han–Kobayashi inner bound in Theorem .. (Hint:
Show that if R1 ≥ I(X1 ; Y1 |U1 , U2 , Q) + I(X2 , U1 ; Y2 |U2 , Q) then the inequalities in
Theorem . imply the above set of inequalities restricted to the choice of U1 = ,
and similarly for the case R2 ≥ I(X1 , U2 ; Y1 |U1 , Q) + I(X2 ; Y2 |U1 , U2 , Q).)
.. A semideterministic interference channel. Consider the DM-IC depicted in Fig-
ure .. Assume that H(Y2 |X2 ) = H(T). A message M j ∈ [1 : 2nR 󰑗 ] is to be sent
from sender X j to receiver Y j for j = 1, 2. The messages are uniformly distributed
and mutually independent. Find the capacity region of this DM-IC. (Hint: To
prove achievability, simplify the Han–Kobayashi inner bound.)

[Figure: X1 → p(y1 |x1 ) → Y1 ; X1 → t(x1 ) → T; (X2 , T) → y2 (x2 , t) → Y2 .]
Figure .. Semideterministic DM-IC.



.. Binary injective deterministic interference channel. Consider an injective deter-


ministic IC with binary inputs X1 and X2 and ternary outputs Y1 = X1 + X2 and
Y2 = X1 − X2 + 1. Find the capacity region of the channel.
.. Deterministic interference channel with strong interference. Find the conditions on
the functions of the injective deterministic IC in Section . under which the chan-
nel has strong interference.
.. Han–Kobayashi inner bound for the Gaussian IC. Consider the Gaussian IC with
SNRs S1 , S2 and INRs I1 , I2 .
(a) Show that the Han–Kobayashi inner bound, when evaluated with Gaussian
random variables, reduces to the set of rate pairs (R1 , R2 ) such that

R1 < EQ [C(S1 /(1 + λ2Q I1 ))],
R2 < EQ [C(S2 /(1 + λ1Q I2 ))],
R1 + R2 < EQ [C((S1 + λ̄2Q I1 )/(1 + λ2Q I1 )) + C(λ2Q S2 /(1 + λ1Q I2 ))],
R1 + R2 < EQ [C((S2 + λ̄1Q I2 )/(1 + λ1Q I2 )) + C(λ1Q S1 /(1 + λ2Q I1 ))],
R1 + R2 < EQ [C((λ1Q S1 + λ̄2Q I1 )/(1 + λ2Q I1 )) + C((λ2Q S2 + λ̄1Q I2 )/(1 + λ1Q I2 ))],
2R1 + R2 ≤ EQ [C((S1 + λ̄2Q I1 )/(1 + λ2Q I1 )) + C(λ1Q S1 /(1 + λ2Q I1 )) + C((λ2Q S2 + λ̄1Q I2 )/(1 + λ1Q I2 ))],
R1 + 2R2 ≤ EQ [C((S2 + λ̄1Q I2 )/(1 + λ1Q I2 )) + C(λ2Q S2 /(1 + λ1Q I2 )) + C((λ1Q S1 + λ̄2Q I1 )/(1 + λ2Q I1 ))]

for some λ1Q , λ2Q ∈ [0, 1] and pmf p(q) with |Q| ≤ 6.
(b) Suppose that S1 = S2 = S and I1 = I2 = I. By further specializing the inner
bound in part (a), show that the symmetric capacity is lower bounded as

Csym ≥ maxλ∈[0,1] min{ C(S/(1 + λI)), C((λS + λ̄I)/(1 + λI)),
        (1/2)[C((S + λ̄I)/(1 + λI)) + C(λS/(1 + λI))] }.

(c) Use part (b) to show that the symmetric DoF is lower bounded as

dsym (α) ≥ max{1 − α, min{1 − α/2, α}, min{1, α/2}},

which coincides with (.). (Hint: Consider λ = 0, 1, and 1/(1 + S α ).)



.. Genie-aided outer bound for the Gaussian IC. Consider the symmetric Gaussian
IC with SNR S and INR I. Establish the outer bound on the capacity region that
consists of the set of rate pairs (R1 , R2 ) such that
R1 ≤ C(S),
R2 ≤ C(S),
R1 + R2 ≤ C(S) + C(S/(1 + I)),
R1 + R2 ≤ 2 C(I + S/(1 + I)),
2R1 + R2 ≤ C(S + I) + C(I + S/(1 + I)) + C(S) − C(I),
R1 + 2R2 ≤ C(S + I) + C(I + S/(1 + I)) + C(S) − C(I).
(Hint: For the last two inequalities, suppose that receiver 1 has side information
T1 = √(I/P) X1 + W1 and receiver 2 has side information T2 = √(I/P) X2 + W2 , where
W1 and W2 are i.i.d. N(0, 1), independent of (Z1 , Z2 ).)
Remark: This bound, which is tighter than the outer bound in (.), is due to
Etkin, Tse, and Wang ().
.. Rate splitting for the more capable DM-BC. Consider the alternative characteriza-
tion of the capacity region in Section .. Prove achievability of this region using
rate splitting and Fourier–Motzkin elimination. (Hint: Divide M1 into two inde-
pendent messages M10 at rate R10 and M11 at rate R11 . Represent (M10 , M2 ) by U
and (M10 , M11 , M2 ) by X.)

APPENDIX 6A PROOF OF LEMMA 6.2

The sum-capacity C̃sum is achieved by treating interference as Gaussian noise. Thus we


only need to prove the converse. Let Q ∼ Unif[1 : n] be a time-sharing random variable in-
dependent of all other random variables and define (T1 , T2 , Y1 , Y2 ) = (T1Q , T2Q , Y1Q , Y2Q ).
Thus, (T1 , T2 , Y1 , Y2 ) = (T1i , T2i , Y1i , Y2i ) with probability 1/n for i ∈ [1 : n]. Suppose that
a rate pair (R̃ 1 , R̃ 2 ) is achievable for the genie-aided channel. Then by Fano’s inequality,
nR̃1 ≤ I(X1n ; Y1n , T1n ) + nєn
= I(X1n ; T1n ) + I(X1n ; Y1n |T1n ) + nєn
= h(T1n ) − h(T1n | X1n ) + h(Y1n |T1n ) − h(Y1n |T1n , X1n ) + nєn
≤ h(T1n ) − h(T1n | X1n ) + ∑ni=1 h(Y1i |T1n ) − h(Y1n |T1n , X1n ) + nєn
(a)
≤ h(T1n ) − h(T1n | X1n ) + ∑ni=1 h(Y1i |T1 ) − h(Y1n |T1n , X1n ) + nєn
(b)
≤ h(T1n ) − h(T1n | X1n ) + nh(Y1∗ |T1∗ ) − h(Y1n |T1n , X1n ) + nєn
(c)
= h(T1n ) − nh(T1∗ | X1∗ ) + nh(Y1∗ |T1∗ ) − h(Y1n |T1n , X1n ) + nєn ,

where (a) follows since h(Y1i |T1n ) = h(Y1i |T1n , Q) ≤ h(Y1i |T1Q , Q) ≤ h(Y1i |T1Q ), (b) fol-
lows by the maximum differential entropy lemma and concavity, and (c) follows since
h(T1n |X1n ) = h(η√(I/P) W1n ) = nh(η√(I/P) W1 ) = nh(T1∗ |X1∗ ). Similarly,

nR̃ 2 ≤ h(T2n ) − nh(T2∗ | X2∗ ) + nh(Y2∗ |T2∗ ) − h(Y2n |T2n , X2n ) + nєn .

Thus, we can upper bound the sum-rate as

n(R̃ 1 + R̃ 2 ) ≤ h(T1n ) − h(Y2n |T2n , X2n ) − nh(T1∗ | X1∗ ) + nh(Y1∗ |T1∗ )


+ h(T2n ) − h(Y1n |T1n , X1n ) − nh(T2∗ | X2∗ ) + nh(Y2∗ |T2∗ ) + nєn .

Evaluating the first two terms, we obtain

h(T1n ) − h(Y2n |T2n , X2n ) = h(√(I/P) X1n + η√(I/P) W1n ) − h(√(I/P) X1n + Z2n | W2n )
= h(√(I/P) X1n + V1n ) − h(√(I/P) X1n + V2n ),

where V1n = η√(I/P) W1n is i.i.d. N(0, η2 I/P) and V2n = Z2n − E(Z2n |W2n ) is i.i.d. N(0, 1 − ρ2 ).
Given the useful genie condition η2 I/P ≤ 1 − ρ2 , express V2n = V1n + V n , where V n is i.i.d.
N(0, 1 − ρ2 − η 2 I/P), independent of V1n . Now let (V , V1 , V2 , X1 ) = (VQ , V1Q , V2Q , X1Q )
and consider

h(T1n ) − h(Y2n |T2n , X2n ) = h(√(I/P) X1n + V1n ) − h(√(I/P) X1n + V1n + V n )
= −I(V n ; √(I/P) X1n + V1n + V n )
= −nh(V ) + h(V n | √(I/P) X1n + V1n + V n )
≤ −nh(V ) + ∑ni=1 h(Vi | √(I/P) X1n + V1n + V n )
≤ −nh(V ) + ∑ni=1 h(Vi | √(I/P) X1i + V1i + Vi )
≤ −nh(V ) + nh(V | √(I/P) X1 + V1 + V )
(a)
≤ −nI(V ; √(I/P) X1∗ + V1 + V )
= nh(√(I/P) X1∗ + V1 ) − nh(√(I/P) X1∗ + V1 + V )
= nh(T1∗ ) − nh(Y2∗ |T2∗ , X2∗ ),

where (a) follows since Gaussian is the worst noise with a given average power in an ad-
ditive noise channel with Gaussian input; see Problem .. The other terms h(T2n ) −
h(Y1n |T1n , X1n ) can be bounded in the same manner. This completes the proof of the lemma.

APPENDIX 6B PROOF OF PROPOSITION 6.1

Consider a sequence of (2nR1 , 2nR2 ) codes with limn→∞ Pe(n) = 0. Furthermore, let X1n , X2n ,
T1n , T2n , Y1n , Y2n denote the random variables resulting from encoding and transmitting

the independent messages M1 and M2 . Define random variables U1n , U2n such that U ji is
jointly distributed with X ji according to pT j |X j (u ji |x ji ), conditionally independent of T ji
given X ji for j = 1, 2 and i ∈ [1 : n]. By Fano’s inequality,

nR j = H(M j )
≤ I(M j ; Y jn ) + nєn
≤ I(X nj ; Y jn ) + nєn .

This directly yields a multiletter outer bound of the capacity region. We are looking for a
nontrivial single-letter upper bound.
Observe that

I(X1n ; Y1n ) = H(Y1n ) − H(Y1n | X1n )


= H(Y1n ) − H(T2n | X1n )
= H(Y1n ) − H(T2n )
≤ ∑ni=1 H(Y1i ) − H(T2n ) ,

since Y1n and T2n are one-to-one given X1n , and T2n is independent of X1n . The second term
H(T2n ), however, is not easily upper-bounded in a single-letter form. Now consider the
following augmentation

I(X1n ; Y1n ) ≤ I(X1n ; Y1n , U1n , X2n )


= I(X1n ; U1n ) + I(X1n ; X2n |U1n ) + I(X1n ; Y1n |U1n , X2n )
= H(U1n ) − H(U1n | X1n ) + H(Y1n |U1n , X2n ) − H(Y1n | X1n , U1n , X2n )
(a)
= H(T1n ) − H(U1n | X1n ) + H(Y1n |U1n , X2n ) − H(T2n | X2n )
≤ H(T1n ) − ∑ni=1 H(U1i | X1i ) + ∑ni=1 H(Y1i |U1i , X2i ) − ∑ni=1 H(T2i | X2i ).

The second and fourth terms in (a) represent the output of a memoryless channel given
its input. Thus they readily single-letterize with equality. The third term can be upper-
bounded in a single-letter form. The first term H(T1n ) will be used to cancel boxed terms
such as H(T2n ) above. Similarly, we can write

I(X1n ; Y1n ) ≤ I(X1n ; Y1n , U1n )


= I(X1n ; U1n ) + I(X1n ; Y1n |U1n )
= H(U1n ) − H(U1n | X1n ) + H(Y1n |U1n ) − H(Y1n | X1n , U1n )
= H(T1n ) − H(U1n | X1n ) + H(Y1n |U1n ) − H(T2n )
≤ H(T1n ) − H(T2n ) − ∑ni=1 H(U1i | X1i ) + ∑ni=1 H(Y1i |U1i ),

and

I(X1n ; Y1n ) ≤ I(X1n ; Y1n , X2n )


= I(X1n ; X2n ) + I(X1n ; Y1n | X2n )
= H(Y1n | X2n ) − H(Y1n | X1n , X2n )
= H(Y1n | X2n ) − H(T2n | X2n )
≤ ∑ni=1 H(Y1i | X2i ) − ∑ni=1 H(T2i | X2i ).

By symmetry, similar bounds can be established for I(X2n ; Y2n ), namely,


I(X2n ; Y2n ) ≤ ∑ni=1 H(Y2i ) − H(T1n ) ,
I(X2n ; Y2n ) ≤ H(T2n ) − ∑ni=1 H(U2i | X2i ) + ∑ni=1 H(Y2i |U2i , X1i ) − ∑ni=1 H(T1i | X1i ),
I(X2n ; Y2n ) ≤ H(T2n ) − H(T1n ) − ∑ni=1 H(U2i | X2i ) + ∑ni=1 H(Y2i |U2i ),
I(X2n ; Y2n ) ≤ ∑ni=1 H(Y2i | X1i ) − ∑ni=1 H(T1i | X1i ).

Now consider linear combinations of the above inequalities where all boxed terms are
canceled. Combining them with the bounds using Fano’s inequality and using a time-
sharing variable Q ∼ Unif[1 : n] completes the proof of the outer bound.
CHAPTER 7

Channels with State

In previous chapters, we assumed that the channel statistics do not change over transmis-
sions and are completely known to the senders and the receivers. In this chapter, we study
channels with state, which model communication settings where the channel statistics
are not fully known or vary over transmissions, such as a wireless channel with fading, a
write-once memory with programmed cells, a memory with stuck-at faults, or a commu-
nication channel with an adversary (jammer). The uncertainty about the channel statistics
in such settings is captured by a state that may be fixed throughout the transmission block
or vary randomly (or arbitrarily) over transmissions. The information about the channel
state may be fully or partially available at the sender, the receiver, or both. In each set-
ting, the channel capacity can be defined as before, but taking into consideration the state
model and the availability of state information at the sender and/or the receiver.
We first discuss the compound channel model, where the state is selected from a given
set and fixed throughout transmission. We then briefly discuss the arbitrarily varying
channel, where the state varies over transmissions in an unknown manner. The rest of
the chapter is dedicated to studying channels for which the state varies randomly over
transmissions according to an i.i.d. process. We establish the capacity under various as-
sumptions on state information availability at the encoder and/or the decoder. The most
interesting case is when the state information is available causally or noncausally only at
the encoder. For the causal case, the capacity is achieved by the Shannon strategy where
each transmitted symbol is a function of a codeword symbol and the current state. For the
noncausal case, the capacity is achieved by the Gelfand–Pinsker coding scheme, which
involves joint typicality encoding and the new technique of multicoding (subcodebook
generation). When specialized to the Gaussian channel with additive Gaussian state, the
Gelfand–Pinsker scheme leads to the writing on dirty paper result, which shows that the
effect of the state can be completely canceled. We also discuss several extensions of these
results to multiuser channels with random state.
In Chapter , we will discuss the application of some of the above results to Gaussian
fading channels, which are popular models for wireless communication. The multicoding
technique will be used to establish an inner bound on the capacity region of the general
broadcast channel in Chapter . Writing on dirty paper will be used to establish an al-
ternative achievability proof for the Gaussian BC in Chapter , which will be extended to
establish the capacity region of the vector Gaussian BC in Chapter .

7.1 DISCRETE MEMORYLESS CHANNEL WITH STATE

Consider the point-to-point communication system with state depicted in Figure .. The
sender wishes to communicate a message M ∈ [1 : 2nR ] over a channel with state to the
receiver with possible side information about the state sequence s n available at the encoder
and/or the decoder. We consider a discrete-memoryless channel with state model (X ×
S , p(y|x, s), Y) that consists of a finite input alphabet X , a finite output alphabet Y, a
finite state alphabet S, and a collection of conditional pmfs p(y|x, s) on Y. The channel is
memoryless in the sense that, without feedback, p(y n |x n , s n , m) = ∏ni=1 pY|X,S (yi |xi , si ).

[Figure: M → Encoder → X n → channel p(y|x, s) → Y n → Decoder → M̂, with the state sequence s n possibly available at the encoder and/or the decoder.]
Figure .. Point-to-point communication system with state.

In the following sections, we study special cases of this general setup.

7.2 COMPOUND CHANNEL

The compound channel is a DMC with state, where the channel is selected arbitrarily
from a set of possible DMCs and fixed throughout the transmission block as depicted
in Figure .. It models communication in the presence of uncertainty about channel
statistics. For clarity of notation, we use the equivalent definition of a compound channel
as consisting of a set of DMCs (X , p(ys |x), Y), where ys ∈ Y for every state s in the finite
set S. The state s remains the same throughout the transmission block, i.e., p(y n |x n , s n ) =
∏ni=1 pY󰑠 |X (yi |xi ) = ∏ni=1 pY|X,S (yi |xi , s).

[Figure: M → Encoder → X n → channel p(ys |x) → Y n → Decoder → M̂.]
Figure .. The compound channel.



Consider the case where the state is not available at either the encoder or the decoder.
The definition of a (2nR , n) code under this assumption is the same as for the DMC. The
average probability of error, however, is defined as

̂ | s is the selected channel state}.


Pe(n) = max P{M ̸= M
s∈S

Achievability and capacity are also defined as for the DMC.

Theorem .. The capacity of the compound channel (X , {p(ys |x) : s ∈ S}, Y) with no
state information available at either the encoder or the decoder is

CCC = maxp(x) mins∈S I(X; Ys ).

Clearly, the channel capacity CCC ≤ mins∈S Cs , where Cs = max p(x) I(X; Ys ) is the ca-
pacity of the channel p(ys |x). The following example shows that this inequality can be
strict.
Example .. Consider the two-state compound Z channel depicted in Figure .. If s = 1,
pYs |X (0|1) = 1/2, and if s = 2, pYs |X (1|0) = 1/2. The capacity of this compound channel is

CCC = H(1/4) − 1/2 = 0.3113,
and is attained by X ∼ Bern(1/2). Note that the capacity is strictly less than C1 = C2 =
H(1/5) − 2/5 = log(5/4) = 0.3219.

[Figure: two binary Z channels; for s = 1 the input 1 is received as 0 with probability 1/2, and for s = 2 the input 0 is received as 1 with probability 1/2.]
Figure .. Z channel with state.

Remark 7.1. Note the similarity between the compound channel setup and the DM-BC
with input X and outputs Ys , s ∈ S, when only a common message is to be sent to all the
receivers. The main difference between these two setups is that in the broadcast channel
case each receiver knows the statistics of its channel from the sender, while in the com-
pound channel case with no state information available at the encoder or the decoder, the
receiver knows only that the channel is one of several possible DMCs. Theorem . shows
that the capacity is the same for these two setups, which in turn shows that the capacity

of the compound channel does not increase when the state s is available at the decoder.
This is not surprising since the receiver can learn the state from a relatively short training
sequence.
Remark 7.2. It can be easily shown that mins∈S Cs is the capacity of the compound chan-
nel when the state s is available at the encoder.
Converse proof of Theorem .. For every sequence of codes with limn→∞ Pe(n) = 0, by
Fano’s inequality, we must have H(M|Ysn ) ≤ nєn for some єn that tends to zero as n → ∞
for every s ∈ S. As in the converse proof for the DMC,

nR ≤ I(M; Ysn ) + nєn


≤ ∑ni=1 I(Xi ; Ysi ) + nєn .

Now we introduce the time-sharing random variable Q ∼ Unif[1 : n] independent of


(M, X n , Ysn , s) and define X = XQ and Ys = YsQ . Then, Q → X → Ys form a Markov
chain and
nR ≤ nI(XQ ; YsQ |Q) + nєn
≤ nI(X, Q; Ys ) + nєn
= nI(X; Ys ) + nєn

for every s ∈ S. By taking n → ∞, we have

R ≤ min I(X; Ys )
s∈S

for some pmf p(x). This completes the proof of the converse.
Achievability proof of Theorem .. The proof uses random codebook generation and
joint typicality decoding. We fix p(x) and randomly and independently generate 2nR se-
quences x n (m), m ∈ [1 : 2nR ], each according to ∏ni=1 p X (xi ). To send m, the encoder
transmits x n (m). Upon receiving y n , the decoder finds a unique message m̂ such that
(x n (m̂), y n ) ∈ Tє(n) (X, Ys ) for some s ∈ S.

We now analyze the probability of error. Assume without loss of generality that M = 1
is sent. The decoder makes an error only if one or both of the following events occur:

E1 = 󶁁(X n (1), Y n ) ∉ Tє(n) (X, Ys󳰀 ) for all s 󳰀 ∈ S󶁑,


E2 = 󶁁(X n (m), Y n ) ∈ Tє(n) (X, Ys󳰀 ) for some m ̸= 1, s 󳰀 ∈ S󶁑.

Then, the average probability of error is upper bounded as P(E) ≤ P(E1 ) + P(E2 ). By the
LLN, P{(X n (1), Y n ) ∉ Tє(n) (X, Ys )} tends to zero as n → ∞. Thus P(E1 ) tends to zero as
n → ∞. By the packing lemma, for each s′ ∈ S,

limn→∞ P{(X n (m), Y n ) ∈ Tє(n) (X, Ys′ ) for some m ≠ 1} = 0,

if R < I(X; Ys′ ) − δ(є). (Recall that the packing lemma applies to an arbitrary output pmf

p(y n ).) Hence, by the union of events bound,


P(E2 ) = P{(X n (m), Y n ) ∈ Tє(n) (X, Ys′ ) for some m ≠ 1, s′ ∈ S}
≤ |S| ⋅ maxs′∈S P{(X n (m), Y n ) ∈ Tє(n) (X, Ys′ ) for some m ≠ 1},

which tends to zero as n → ∞ if R < I(X; Ys′ ) − δ(є) for all s′ ∈ S, or equivalently, R <
mins′∈S I(X; Ys′ ) − δ(є). This completes the achievability proof.
Theorem . can be generalized to the case where S is arbitrary (not necessarily finite),
and the capacity is
CCC = maxp(x) inf s∈S I(X; Ys ).

This can be proved, for example, by noting that the probability of error for the packing
lemma decays exponentially fast in n and the effective number of states is polynomial in
n (there are only polynomially many empirical pmfs p(ysn |x n )); see Problem ..
Example .. Consider the compound BSC(s) with s ∈ [0, p], p < 1/2. Then, CCC =
1 − H(p), which is attained by X ∼ Bern(1/2). In particular, if p = 1/2, then CCC = 0.
This example demonstrates that the compound channel model is quite pessimistic, being
robust against the worst-case channel.

Remark .. The compound channel model can be readily extended to channels with
input cost and to Gaussian channels. The extension to Gaussian channels with state is
particularly interesting because these channels are used to model wireless communication
channels with fading; see Chapter .

7.3* ARBITRARILY VARYING CHANNEL

In an arbitrarily varying channel (AVC), the state sequence s n changes over transmissions
in an unknown and possibly adversarial manner. The capacity of the AVC depends on the
availability of common randomness between the encoder and the decoder (deterministic
code versus randomized code as defined in Problem .), the performance criterion (av-
erage versus maximal probability of error), and knowledge of the adversary (codebook
and/or the actual codeword transmitted). For example, suppose that X and S are binary
and Y = X + S is ternary. If the adversary knows the codebook, then it can always choose
S n to be one of the codewords. Given the sum of two codewords, the decoder has no
way of distinguishing the true codeword from the interference. Hence, the probability
of error is close to 1/2 and the capacity is equal to zero. By contrast, if the encoder and
the decoder can use common randomness, they can use a randomized code to combat
the adversary. In this case, the capacity is 1/2 bit/transmission, which is the capacity of
a BEC(1/2) that corresponds to S ∼ Bern(1/2), for both the average and maximal error
probability criteria. If the adversary, however, knows the actual codeword transmitted,
the shared randomness again becomes useless since the adversary can make the output
equal to the all ones sequence, and the capacity is again equal to zero.
Suppose that the encoder and the decoder can use shared common randomness to

randomize the encoding and decoding operations, and the adversary has no knowledge
of the actual codeword transmitted. In this case, the performance criterion of average or
maximal probability of error does not affect the capacity, which is

CAVC = maxp(x) minp(s) I(X; YS ) = minp(s) maxp(x) I(X; YS ). (.)

Hence, the capacity is the saddle point of the game played by the encoder and the state
selector with randomized strategies p(x) and p(s).

7.4 CHANNELS WITH RANDOM STATE

We now turn our attention to the less adversarial setup where the state is randomly chosen
by nature. We consider the DMC with DM state model (X × S , p(y|x, s)p(s), Y), where
the state sequence (S1 , S2 , . . .) is i.i.d. with Si ∼ pS (si ). We are interested in finding the
capacity of this channel under various scenarios of state information availability at the
encoder and/or the decoder. The fact that the state changes over transmissions provides a
temporal dimension to state information availability. The state may be available causally
(that is, only S i is known before transmission i takes place), noncausally (that is, S n is
known before communication commences), or with some delay or lookahead. For state
availability at the decoder, the capacity under these different temporal constraints is the
same. This is not the case, however, for state availability at the encoder as we will see.
We first consider two simple special cases. More involved cases are discussed in Sec-
tions .., ., and ..
State information not available at either the encoder or the decoder. Let p(y|x) =
∑s p(s)p(y|x, s) be the DMC obtained by averaging the DMCs p(y|x, s) over the state.
Then it is easy to see that the capacity when the state information is not available at the
encoder or the decoder is
C = maxp(x) I(X; Y ).

State information available only at the decoder. When the state sequence is available
only at the decoder, the capacity is

CSI-D = maxp(x) I(X; Y , S) = maxp(x) I(X; Y |S). (.)

Achievability follows by treating (Y , S) as the output of the DMC p(y, s|x) = p(s)p(y|x, s).
The converse proof is straightforward.

7.4.1 State Information Available at Both the Encoder and the Decoder
Suppose that the state information is available causally and/or noncausally at both the
encoder and the decoder as depicted in Figure .. Then the capacity is the same for all
four combinations and is given by

CSI-ED = maxp(x|s) I(X; Y |S). (.)

[Figure: M → Encoder → Xi → channel p(y|x, s) → Yi → Decoder → M̂, with S i (causal) or S n (noncausal) available at both the encoder and the decoder.]
Figure .. State information available at both the encoder and the decoder.

Achievability of CSI-ED can be proved by rate splitting and treating S n as a time-sharing


sequence.
Rate splitting. Divide the message M into independent messages Ms ∈ [1 : 2nRs ], s ∈ S.
Thus R = ∑s Rs .
Codebook generation. Fix the conditional pmf p(x|s) that achieves the capacity and
let 0 < є < 1. For every s ∈ S, randomly and independently generate 2nRs sequences
x n (ms , s), ms ∈ [1 : 2nRs ], each according to ∏ni=1 p X|S (xi |s).
Encoding. To send message m = (ms : s ∈ S), consider the corresponding codeword tuple
(x n (ms , s) : s ∈ S). Store each of these codewords in a first-in first-out (FIFO) buffer of
length n. In time i ∈ [1 : n], the encoder transmits the first untransmitted symbol from
the FIFO buffer corresponding to the state si .
Decoding and the analysis of the probability of error. The decoder demultiplexes the
received sequence into subsequences (y ns (s), s ∈ S), where ∑s ns = n. Assuming s n ∈
Tє(n) , and hence ns ≥ n(1 − є)p(s) for all s ∈ S, it finds for each s a unique m̂ s such that
the codeword subsequence x n(1−є)p(s) (m̂ s , s) is jointly typical with y np(s)(1−є) (s). By the
LLN and the packing lemma, the probability of error for each decoding step tends to zero
as n → ∞ if Rs < (1 − є)p(s)I(X; Y|S = s) − δ(є). Thus, the total probability of error tends
to zero as n → ∞ if R < (1 − є)I(X; Y|S) − δ(є). This completes the proof of achievability.
The converse for (.) when the state is noncausally available is quite straightforward.
Note that this converse also establishes the capacity for the causal case.
Remark .. The capacity expressions in (.) for the case with state information available
only at the decoder and in (.) for the case with state information available at both the
encoder and the decoder continue to hold when {Si } is a stationary ergodic process. This
key observation will be used in the discussion of fading channels in Chapter .

7.4.2 Extensions to the DM-MAC with State


The above coding theorems can be extended to some multiuser channels with DM state.
An interesting example is the DM-MAC with 2-DM state components (X1 × X2 × S ,
p(y|x1 , x2 , s)p(s), Y), where S = (S1 , S2 ). The motivation for assuming two state compo-
nents is that in certain practical settings, such as the MAC with fading studied in Chap-
ter  and the random access channel studied in Chapter , the effect of the channel

uncertainty or variation with time can be modeled by a separate state for the path from
each sender to the receiver. The DM-MAC with a single DM state is a special case of this
model with S = S1 = S2 .

∙ When no state information is available at the encoders or the decoder, the capacity
region is that for the average DM-MAC p(y|x1 , x2 ) = ∑s p(s)p(y|x1 , x2 , s).
∙ When the state sequence is available only at the decoder, the capacity region CSI-D is
the set of rate pairs (R1 , R2 ) such that

R1 ≤ I(X1 ; Y | X2 , Q, S),
R2 ≤ I(X2 ; Y | X1 , Q, S),
R1 + R2 ≤ I(X1 , X2 ; Y |Q, S)

for some pmf p(q)p(x1 |q)p(x2 |q).


∙ When the state sequence is available at both encoders and the decoder, the capacity
region CSI-ED is the set of rate pairs (R1 , R2 ) such that

R1 ≤ I(X1 ; Y | X2 , Q, S),
R2 ≤ I(X2 ; Y | X1 , Q, S),
R1 + R2 ≤ I(X1 , X2 ; Y |Q, S)

for some pmf p(q)p(x1 |s, q)p(x2 |s, q) and the encoders can adapt their codebooks
according to the state sequence.
∙ More interestingly, when the state sequence is available at the decoder and each state
component is available causally or noncausally at its respective encoder, that is, the en-
coders are specified by x nj (m j , s nj ) for j = 1, 2, the capacity region with such distributed
state information, CDSI-ED , is the set of rate pairs (R1 , R2 ) such that

R1 ≤ I(X1 ; Y | X2 , Q, S),
R2 ≤ I(X2 ; Y | X1 , Q, S), (.)
R1 + R2 ≤ I(X1 , X2 ; Y |Q, S)

for some pmf p(q)p(x1 |s1 , q)p(x2 |s2 , q) and each encoder can adapt its codebook to
its state component sequence.

The proofs of achievability and the converse for the above results follow the proofs for the
DM-MAC and the DMC with DM state.

7.5 CAUSAL STATE INFORMATION AVAILABLE AT THE ENCODER

We consider yet another special case of state availability for the DMC with DM state
p(y|x, s) p(s). Suppose that the state sequence is available only at the encoder. In this case,

the capacity depends on whether the state information is available causally or noncausally.
We first study the causal case, that is, when the encoder knows S i before transmission i, as
depicted in Figure .. A (2nR , n) code for this setup is specified by a message set [1 : 2nR ],
an encoder xi (m, s i ), i ∈ [1 : n], and a decoder m̂(y n ). The definitions of the average prob-
ability of error, achievability, and capacity for this case are otherwise the same as for the
DMC without state.

[Figure: M → Encoder → Xi → channel p(y|x, s) → Yi → Decoder → M̂, with S i available only at the encoder.]
Figure .. State information causally available only at the encoder.

The capacity for this case is given in the following.

Theorem .. The capacity of the DMC with DM state p(y|x, s)p(s) when the state
information is available causally only at the encoder is

CCSI-E = maxp(u), x(u,s) I(U ; Y),

where U is an auxiliary random variable independent of S with |U | ≤ min{(|X | − 1)|S| + 1, |Y|}.

Proof of achievability. Fix the pmf p(u) and function x(u, s) that attain CCSI-E . Ran-
domly and independently generate 2nR sequences un (m), m ∈ [1 : 2nR ], each according
to ∏ni=1 pU (ui ). To send m given the state si , the encoder transmits x(ui (m), si ) at time
i ∈ [1 : n]. The decoder finds the unique message m̂ such that (un (m̂), y n ) ∈ Tє(n) . By
the LLN and the packing lemma, the probability of error tends to zero as n → ∞ if
R < I(U ; Y) − δ(є), which completes the proof of achievability.

Remark .. The above coding scheme corresponds to attaching a deterministic “physical
device” x(u, s) with two inputs U and S and one output X in front of the actual channel
input as depicted in Figure .. This induces a new DMC p(y|u) = ∑s p(y|x(u, s), s)p(s)
with input U and output Y with capacity CCSI-E .

Remark . (Shannon strategy). We can view the encoding as being performed over
the set of all functions {xu (s) : S → X } indexed by u as the input alphabet. This tech-
nique of coding over functions onto X instead of actual symbols in X is referred to as

[Figure: M → Encoder → Ui → mapping x(u, s) → Xi → channel p(y|x, s) → Yi → Decoder → M̂, with Si driving the mapping x(u, s).]
Figure .. Coding with causal state information at the encoder.

the Shannon strategy. Note that the above cardinality bound shows that we need only
min{(|X | − 1)|S| + 1, |Y|} (instead of |X ||S| ) functions.
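As a toy illustration of the Shannon strategy (the small example channel below is ours, not one from the text), consider Y = X ⊕ S with S ∼ Bern(1/2). Without any state information the capacity is 0, but coding over the |X||S| = 4 functions u ↦ x(u, s) with |U| = 2 and causal state knowledge at the encoder recovers 1 bit per transmission, attained by x(u, s) = u ⊕ s. The Python sketch brute-forces over function pairs and a grid of pmfs p(u).

```python
from itertools import product
from math import log2

p_s = [0.5, 0.5]                              # state pmf
channel = lambda x, s: x ^ s                  # deterministic channel y = x xor s
functions = list(product([0, 1], repeat=2))   # all maps s -> x, i.e., |X|^|S| = 4 of them

def mutual_info(pu0, f0, f1):
    """I(U; Y) when P{U = 0} = pu0 and X = f_U(S)."""
    pu = [pu0, 1 - pu0]
    p_uy = {(u, y): 0.0 for u in (0, 1) for y in (0, 1)}
    for u, s in product((0, 1), (0, 1)):
        y = channel((f0 if u == 0 else f1)[s], s)
        p_uy[u, y] += pu[u] * p_s[s]
    h = lambda probs: -sum(p * log2(p) for p in probs if p > 0)
    py = [p_uy[0, y] + p_uy[1, y] for y in (0, 1)]
    return h(py) + h(pu) - h(p_uy.values())

best = max(mutual_info(q / 100, f0, f1)
           for q in range(101) for f0 in functions for f1 in functions)
print(best)   # 1.0, attained by x(u, s) = u xor s and U ~ Bern(1/2)
```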

Proof of the converse. The key to the proof is the identification of the auxiliary random
variable U . We would like Xi to be a function of (Ui , Si ). In general, Xi is a function of
(M, S i−1 , Si ). So we define Ui = (M, S i−1 ). This identification also satisfies the require-
ments that Ui is independent of Si and Ui → (Xi , Si ) → Yi form a Markov chain for every
i ∈ [1 : n]. Now, by Fano’s inequality,

nR ≤ I(M; Y n ) + nєn
= ∑ni=1 I(M; Yi |Y i−1 ) + nєn
≤ ∑ni=1 I(M, Y i−1 ; Yi ) + nєn
≤ ∑ni=1 I(M, S i−1 , Y i−1 ; Yi ) + nєn
(a)
= ∑ni=1 I(M, S i−1 , X i−1 , Y i−1 ; Yi ) + nєn
(b)
= ∑ni=1 I(M, S i−1 , X i−1 ; Yi ) + nєn
(c)
= ∑ni=1 I(Ui ; Yi ) + nєn
≤ n maxp(u), x(u,s) I(U ; Y) + nєn ,

where (a) and (c) follow since X i−1 is a function of (M, S i−1 ) and (b) follows since Y i−1 →
(X i−1 , S i−1 ) → Yi . This completes the proof of the converse.
Remark .. Consider the case where the state sequence is available causally at both the
encoder and the decoder in Section ... Note that we can use the Shannon strategy to
provide an alternative proof of achievability to the time-sharing proof discussed earlier.

Treating (Y n , S n ) as the equivalent channel output, Theorem . reduces to


CSI-ED = maxp(u), x(u,s) I(U ; Y , S)
(a)
= maxp(u), x(u,s) I(U ; Y |S)
(b)
= maxp(u), x(u,s) I(X; Y |S)
(c)
= maxp(x|s) I(X; Y |S),

where (a) follows by the independence of U and S, and (b) follows since X is a function
of (U , S) and U → (S, X) → Y form a Markov chain. Step (c) follows by the functional
representation lemma in Appendix B, which states that every conditional pmf p(x|s) can
be represented as a deterministic function x(u, s) of S and a random variable U that is
independent of S.

7.6 NONCAUSAL STATE INFORMATION AVAILABLE AT THE ENCODER

We now consider the case where the state sequence is available noncausally only at the
encoder. In other words, a (2nR , n) code is specified by a message set [1 : 2nR ], an en-
coder x n (m, s n ), and a decoder m̂(y n ). The definitions of the average probability of error,
achievability, and capacity are otherwise the same as for the DMC without state.
Assuming noncausal state information availability at the encoder, however, may ap-
pear unrealistic. How can the encoder know the state sequence before it is generated? The
following example demonstrates a real-world scenario in which the state can be available
noncausally at the encoder (other scenarios will be discussed later).
Example . (Memory with stuck-at faults). Consider the DMC with DM state depicted
in Figure ., which is a model for a digital memory with “stuck-at” faults or a write-once
memory (WOM) such as a CD-ROM. As shown in the figure, the state S = 0 corresponds
to a faulty memory cell that outputs a 0 independent of its stored input value, the state
S = 1 corresponds to a faulty memory cell that always outputs a 1, and the state S = 2
corresponds to a nonfaulty cell that outputs the same value as its stored input. The prob-
abilities of these states are p/2, p/2, and 1 − p, respectively.
The writer (encoder) who knows the locations of the faults (by testing the memory)
wishes to reliably store information in a way that does not require the reader (decoder) to
know the locations of the faults. How many bits per memory cell can be reliably stored?

∙ If neither the writer nor the reader knows the fault locations, we can store up to
C = maxp(x) I(X; Y ) = 1 − H(p/2) bits/cell.
∙ If both the writer and the reader know the fault locations, we can store up to
CSI-ED = maxp(x|s) I(X; Y |S) = 1 − p bits/cell.

[Figure: S = 0 with probability p/2: both inputs are read as 0 (stuck at 0); S = 1 with probability p/2: both inputs are read as 1 (stuck at 1); S = 2 with probability 1 − p: the output equals the stored input.]
Figure .. Memory with stuck-at faults.

∙ If the reader knows the fault locations (erasure channel), we can also store

CSI-D = maxp(x) I(X; Y |S) = 1 − p bits/cell.

Surprisingly, even if only the writer knows the fault locations we can still reliably store
up to 1 − p bits/cell! Consider the following multicoding scheme. Assume that the mem-
ory has n cells. Randomly and independently assign each binary n-sequence to one of 2nR
subcodebooks C(m), m ∈ [1 : 2nR ]; see Figure .. To store message m, the writer searches
in subcodebook C(m) for a sequence that matches the pattern of the faulty cells and stores
it; otherwise it declares an error. The reader declares as the message estimate the index of
the subcodebook that the stored sequence belongs to. Since there are roughly np faulty
cells for n sufficiently large, there are roughly 2n(1−p) sequences that match any given fault
pattern. Hence for n sufficiently large, any given subcodebook has at least one matching
sequence with high probability if R < CSI-E = (1 − p), and asymptotically up to n(1 − p)
bits can be reliably stored in an n-bit memory with np faulty cells.

C(1) C(2) C(3) C(2nR )

Figure .. Multicoding scheme for the memory with stuck-at faults. Each binary
n-sequence is randomly assigned to a subcodebook C(m), m ∈ [1 : 2nR ].
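The multicoding scheme can be simulated directly at a small block length. In the Python sketch below (block length, fault probability, rate, and number of trials are arbitrary illustrative choices), every binary n-sequence is randomly binned into a subcodebook and the writer searches the desired subcodebook for a sequence consistent with the stuck cells; the empirical success rate is close to one when R < 1 − p.

```python
import random

n, p, R = 14, 0.25, 0.5            # rate below 1 - p, so a match should exist w.h.p.
num_msgs = 2 ** int(n * R)

def run_once():
    # Randomly assign every binary n-sequence to one of the 2^{nR} subcodebooks.
    bins = [random.randrange(num_msgs) for _ in range(2 ** n)]
    # Draw the fault pattern: state 0/1 = cell stuck at that value, 2 = functional cell.
    state = [random.choices([0, 1, 2], weights=[p / 2, p / 2, 1 - p])[0] for _ in range(n)]
    m = random.randrange(num_msgs)                 # message to store
    for seq in range(2 ** n):                      # writer: search subcodebook C(m)
        if bins[seq] != m:
            continue
        bits = [(seq >> i) & 1 for i in range(n)]
        if all(s == 2 or b == s for b, s in zip(bits, state)):
            return True    # reader recovers m as the subcodebook index of the stored sequence
    return False           # encoding failure: no sequence in C(m) matches the fault pattern

trials = 200
print(sum(run_once() for _ in range(trials)) / trials)   # success rate close to 1
```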

The following theorem generalizes the above example to DMCs with DM state using
the same basic coding idea.

Theorem . (Gelfand–Pinsker Theorem). The capacity of the DMC with DM state
p(y|x, s)p(s) when the state information is available noncausally only at the encoder is

CSI-E = maxp(u|s), x(u,s) (I(U ; Y) − I(U ; S)),

where |U | ≤ min{|X |⋅|S|, |Y| + |S| − 1}.

For the memory with stuck-at faults example, given S = 2, that is, when there is no
fault, we set U ∼ Bern(1/2) and X = U. If S = 1 or 0, we set U = X = S. Since Y = X = U
under this choice of p(u|s) and x(u, s), we have

I(U ; Y ) − I(U ; S) = H(U |S) − H(U |Y) = H(U |S) = 1 − p.
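This evaluation is easy to confirm numerically. The short Python sketch below (the value of p is an arbitrary choice) enumerates the joint pmf of (S, U, Y) induced by this choice of p(u|s) and x(u, s) and checks that I(U; Y) − I(U; S) = 1 − p.

```python
from math import log2

p = 0.3
# Joint pmf over (s, u, y): for s = 2 (prob 1 - p), U ~ Bern(1/2) and Y = X = U;
# for s = 0 or 1 (prob p/2 each), U = X = S and the stuck cell outputs s.
triples = {}
for s, ps in [(0, p / 2), (1, p / 2), (2, 1 - p)]:
    for u, pu in ([(s, 1.0)] if s != 2 else [(0, 0.5), (1, 0.5)]):
        y = s if s != 2 else u
        triples[(s, u, y)] = triples.get((s, u, y), 0.0) + ps * pu

def H(pmf):
    return -sum(q * log2(q) for q in pmf.values() if q > 0)

def marg(coords):          # marginal pmf over a subset of the coordinates 's', 'u', 'y'
    out = {}
    for (s, u, y), q in triples.items():
        key = tuple({'s': s, 'u': u, 'y': y}[c] for c in coords)
        out[key] = out.get(key, 0.0) + q
    return out

I_uy = H(marg('u')) + H(marg('y')) - H(marg('uy'))
I_us = H(marg('u')) + H(marg('s')) - H(marg('us'))
print(I_uy - I_us, 1 - p)  # both equal 0.7
```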

We prove the Gelfand–Pinsker theorem in the next two subsections.

7.6.1 Proof of Achievability


The Gelfand–Pinsker coding scheme is illustrated in Figure .. It uses multicoding and
joint typicality encoding and decoding. For each message m, we generate a subcodebook
C(m) of 2n(R̃−R) sequences un (l). To send m given the state sequence s n , we find a sequence
un (l) ∈ C(m) that is jointly typical with s n and transmit x n (un (l), s n ). The receiver recov-
ers un (l) and then declares its subcodebook index as the message estimate.

[Figure: the sequences un (1), . . . , un (2nR̃ ) are partitioned into subcodebooks C(1), . . . , C(2nR ); given s n , the encoder picks a sequence un (l) ∈ C(m) that lies in Tє(n) (U , S).]
Figure .. Gelfand–Pinsker coding scheme. Each subcodebook C(m), m ∈ [1 : 2nR ], consists of 2n(R̃−R) sequences un (l).

We now provide the details of the achievability proof.


Codebook generation. Fix the conditional pmf p(u|s) and function x(u, s) that attain
the capacity and let R̃ > R. For each message m ∈ [1 : 2nR ] generate a subcodebook C(m)
consisting of 2n(R̃−R) randomly and independently generated sequences un (l), l ∈ [(m −
1)2n(R̃−R) + 1 : m2n(R̃−R) ], each according to ∏ni=1 pU (ui ).
Encoding. To send message m ∈ [1 : 2nR ] with the state sequence s n observed, the encoder
chooses a sequence un (l) ∈ C(m) such that (un (l), s n ) ∈ Tє′(n) . If no such sequence exists,

it picks l = 1. The encoder then transmits xi = x(ui (l), si ) at time i ∈ [1 : n].


Decoding. Let є > є′ . Upon receiving y n , the decoder declares that m̂ ∈ [1 : 2nR ] is sent
if it is the unique message such that (un (l), y n ) ∈ Tє(n) for some un (l) ∈ C(m̂); otherwise
it declares an error.
Analysis of the probability of error. Assume without loss of generality that M = 1 and
let L denote the index of the chosen U n sequence for M = 1 and S n . The decoder makes
an error only if one or more of the following events occur:
E1 = {(U n (l), S n ) ∉ Tє′(n) for all U n (l) ∈ C(1)},
E2 = {(U n (L), Y n ) ∉ Tє(n) },
E3 = {(U n (l), Y n ) ∈ Tє(n) for some U n (l) ∉ C(1)}.
Thus, the probability of error is upper bounded as
P(E) ≤ P(E1 ) + P(E1c ∩ E2 ) + P(E3 ).
By the covering lemma in Section . with U = ∅, X ← S, and X̂ ← U, P(E1 ) tends to zero
as n → ∞ if R̃ − R > I(U ; S) + δ(є′ ). Next, note that
E1c = {(U n (L), S n ) ∈ Tє′(n) } = {(U n (L), X n , S n ) ∈ Tє′(n) }

and
Y n | {U n (L) = un , X n = x n , S n = s n } ∼ ∏ni=1 pY|U,X,S (yi |ui , xi , si ) = ∏ni=1 pY|X,S (yi |xi , si ).

Hence, by the conditional typicality lemma in Section . (see also Problem .), P(E1c ∩
E2 ) tends to zero as n → ∞ (recall the standing assumption that є′ < є).
Finally, since every U n (l) ∉ C(1) is distributed according to ∏ni=1 pU (ui ) and is in-
dependent of Y n , by the packing lemma in Section ., P(E3 ) tends to zero as n → ∞ if
R̃ < I(U ; Y ) − δ(є). Combining these results, we have shown P(E) tends to zero as n → ∞
if R < I(U ; Y ) − I(U ; S) − δ(є) − δ(є′ ). This completes the proof of achievability.
Remark .. In the bound on P(E3 ), the sequence Y n is not generated i.i.d. However, we
can still use the packing lemma.

7.6.2 Proof of the Converse


Again the key is to identify Ui such that Ui → (Xi , Si ) → Yi form a Markov chain. By
Fano’s inequality, for any code with limn→∞ Pe(n) = 0, H(M|Y n ) ≤ nєn , where єn tends to
zero as n → ∞. Consider

nR ≤ I(M; Y n ) + nєn
= ∑ni=1 I(M; Yi |Y i−1 ) + nєn
≤ ∑ni=1 I(M, Y i−1 ; Yi ) + nєn
= ∑ni=1 I(M, Y i−1 , S ni+1 ; Yi ) − ∑ni=1 I(Yi ; S ni+1 |M, Y i−1 ) + nєn
(a)
= ∑ni=1 I(M, Y i−1 , S ni+1 ; Yi ) − ∑ni=1 I(Y i−1 ; Si |M, S ni+1 ) + nєn
(b)
= ∑ni=1 I(M, Y i−1 , S ni+1 ; Yi ) − ∑ni=1 I(M, Y i−1 , S ni+1 ; Si ) + nєn ,

where (a) follows by the Csiszár sum identity and (b) follows since (M, S ni+1 ) is indepen-
dent of Si . Now identifying Ui = (M, Y i−1 , S ni+1 ), we have Ui → (Xi , Si ) → Yi for i ∈ [1 : n]
as desired. Hence
nR ≤ ∑ni=1 (I(Ui ; Yi ) − I(Ui ; Si )) + nєn
≤ n maxp(u,x|s) (I(U ; Y ) − I(U ; S)) + nєn . (.)

The cardinality bound on U can be proved using the convex cover method in Appendix C.
Finally, we show that it suffices to maximize over p(u|s) and functions x(u, s). For a
fixed p(u|s), note that

p(y|u) = ∑x,s p(s|u)p(x|u, s)p(y|x, s)

is linear in p(x|u, s). Since p(u|s) is fixed, the maximum in (.) is only over I(U ; Y),
which is convex in p(y|u) (since p(u) is fixed) and hence in p(x|u, s). This implies that
the maximum is attained at an extreme point of the set of pmfs p(x|u, s), that is, using one
of the deterministic mappings x(u, s). This completes the proof of the Gelfand–Pinsker
theorem.

7.6.3 Comparison with the Causal Case


Recall that when the state information is available noncausally only at the encoder, the
capacity is
CSI-E = maxp(u|s), x(u,s) (I(U ; Y ) − I(U ; S)).

By comparison, when the state information is available causally only at the encoder, the
capacity can be expressed as

CCSI-E = maxp(u), x(u,s) (I(U ; Y) − I(U ; S)),

since I(U ; S) = 0. Note that these two expressions have the same form, except that in the
causal case the maximum is over p(u) instead of p(u|s).
In addition, the coding schemes for both scenarios are the same except that in the
noncausal case the encoder knows the state sequence S n in advance and hence can choose
a sequence U n jointly typical with S n ; see Figure .. Thus the cost of causality (that is,
the gap between the causal and the noncausal capacities) is captured entirely by the more
restrictive independence condition between U and S.

[Figure: M → Encoder → U n → mapping x(u, s) → channel p(y|x, s) → Y n → Decoder → M̂, with S n ∼ p(s) available to the encoder and the mapping.]
Figure .. Alternative view of Gelfand–Pinsker coding.

7.6.4 Extensions to the Degraded DM-BC with State


The coding schemes for channels with state information available at the encoder can be
extended to multiuser channels. For example, Shannon strategy can be used whenever the
state information is available causally at some of the encoders. Similarly, Gelfand–Pinsker
coding can be used whenever the state sequence is available noncausally. The optimality
of these extensions, however, is not known in most cases.
In this subsection, we consider the degraded DM-BC with DM state p(y1 |x, s)p(y2 |y1 )
p(s) depicted in Figure .. The capacity region of this channel is known for the following
state information availability scenarios:

∙ State information available causally at the encoder: In this case, the encoder at trans-
mission i assigns a symbol xi to each message pair (m1 , m2 ) and state sequence s i
and decoder j = 1, 2 assigns an estimate m ̂ j to each output sequence y nj . The capacity
region is the set of rate pairs (R1 , R2 ) such that

R1 ≤ I(U1 ; Y1 |U2 ),
R2 ≤ I(U2 ; Y2 )

for some pmf p(u1 , u2 ) and function x(u1 , u2 , s), where (U1 , U2 ) is independent of S
with |U1 | ≤ |X |⋅|S|(|X |⋅|S| + 1) and |U2 | ≤ |X |⋅|S| + 1.

[Figure: (M1 , M2 ) → Encoder → Xi → channel p(y1 |x, s)p(y2 |y1 ); decoder 1 observes Y1i and outputs M̂1 , decoder 2 observes Y2i and outputs M̂2 , with state information S i /S n available as described in the text.]
Figure .. Degraded DM-BC with state.

∙ State information available causally at the encoder and decoder , but not at decoder :
The capacity region is the set of rate pairs (R1 , R2 ) such that

R1 ≤ I(X; Y1 |U , S),
R2 ≤ I(U ; Y2 )

for some pmf p(u)p(x|u, s) with |U | ≤ |X |⋅|S| + 1.


∙ State information available noncausally at both the encoder and decoder , but not at
decoder : The capacity region is the set of rate pairs (R1 , R2 ) such that

R1 ≤ I(X; Y1 |U , S),
R2 ≤ I(U ; Y2 ) − I(U ; S)

for some pmf p(u, x|s) with |U| ≤ |X |⋅|S| + 1.

However, the capacity region is not known for other scenarios, for example, when the
state information is available noncausally only at the encoder.

7.7 WRITING ON DIRTY PAPER

Consider the Gaussian channel with additive Gaussian state depicted in Figure .. The
output of the channel is
Y = X + S + Z,

where the state S ∼ N(0, Q) and the noise Z ∼ N(0, 1) are independent. Assume an ex-
pected average transmission power constraint
∑ni=1 E(xi2 (m, S n )) ≤ nP for every m ∈ [1 : 2nR ],

where the expectation is over the random state sequence S n .



[Figure: M → Encoder → X n ; the channel adds the state S and the noise Z to produce Y n → Decoder → M̂, with S n available noncausally at the encoder.]
Figure .. Gaussian channel with additive Gaussian state available noncausally at the encoder.

If the state is not available at either the encoder or the decoder, the state becomes part
of the additive noise and the capacity is
C = C(P/(1 + Q)).
If the state is available at the decoder, it can be simply subtracted off, which in effect re-
duces the channel to a Gaussian channel with no state. Hence the capacity for this case is

CSI-D = CSI-ED = C(P). (.)

Now suppose that the state information is available noncausally only at the encoder.
Can the effect of the state still be canceled in this case? Note that the encoder cannot simply
presubtract the state because the resulting codeword may violate the power constraint.
Nevertheless, it turns out that the effect of the state can still be completely canceled!

Theorem .. The capacity of the Gaussian channel with additive Gaussian state when
the state information is available noncausally only at the encoder is

CSI-E = C(P).

This result is known as writing on dirty paper. Imagine a sheet of paper with indepen-
dent dirt spots of normal intensity. The writer knows the spot locations and intensities,
but the reader cannot distinguish between the message and the dirt. The writing on dirty
paper result shows that using an “optimal writing scheme,” the reader can still recover the
message as if the paper had no dirt at all.
Writing on dirty paper will prove useful in coding for Gaussian BCs as will be dis-
cussed in Chapters  and . In addition, it is used in information hiding and digital wa-
termarking.
Example . (Digital watermarking). Suppose that a publisher embeds a signal (water-
mark) X in a host image or text S. A viewer receives a noisy version of the watermarked
image Y = X + S + Z, where Z ∼ N(0, 1) is an additive Gaussian noise. Given a host im-
age S n , the authentication message M ∈ [1 : 2nR ] is encoded into a watermark sequence
X n (M, S n ), which is added to the image to produce the watermarked image S n + X n . An
authenticator wishes to retrieve the message M from Y n . What is the optimal tradeoff

between the amount of authentication information (measured by the capacity C of the


channel from X to Y ) and the fidelity of the watermarked image (measured by the mean
squared error distortion D between S and S + X)? By Theorem ., the answer is

C(D) = C(D),

where D corresponds to the average power of the watermark.

Proof of Theorem .. The proof of the converse follows immediately from (.). To
prove achievability, first consider the extension of the Gelfand–Pinsker theorem to the
DMC with DM state and input cost constraint
∑ni=1 E[b(xi (m, S n ))] ≤ nB for every m ∈ [1 : 2nR ].

In this case, the capacity is

CSI-E = maxp(u|s), x(u,s): E(b(X))≤B (I(U ; Y) − I(U ; S)). (.)

Now we return to the Gaussian channel with additive Gaussian state. Since the channel
is Gaussian and linear, it is natural to expect the optimal distribution on (U , X) given S
that attains the capacity to be also Gaussian. This turns out to be the case, but with the
nonintuitive choice of X ∼ N(0, P) as independent of S and U = X + αS. Evaluating the
mutual information terms in (.) with this choice, we obtain

I(U ; Y) = h(X + S + Z) − h(X + S + Z | X + αS)


= h(X + S + Z) + h(X + αS) − h(X + S + Z, X + αS)
= (1/2) log((P + Q + 1)(P + α²Q)/(PQ(1 − α)² + P + α²Q))

and

I(U ; S) = (1/2) log((P + α²Q)/P).

Thus

R(α) = I(U ; Y) − I(U ; S)
= (1/2) log(P(P + Q + 1)/(PQ(1 − α)² + P + α²Q)).

Maximizing over α gives maxα R(α) = C(P), which is attained by α ∗ = P/(P + 1). Finally,
applying the discretization procedure in Section . completes the proof of the theorem.
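The maximization over α is easy to confirm numerically. The Python sketch below (the values of P and Q are arbitrary) evaluates R(α) on a grid and checks that the maximum is attained at α∗ = P/(P + 1) with value C(P).

```python
from math import log2

P, Q = 3.0, 10.0
C = lambda x: 0.5 * log2(1 + x)

def R(a):
    return 0.5 * log2(P * (P + Q + 1) / (P * Q * (1 - a) ** 2 + P + a * a * Q))

alphas = [i / 10000 for i in range(10001)]
a_best = max(alphas, key=R)
print(a_best, P / (P + 1))     # maximizer is alpha* = P/(P + 1) = 0.75
print(R(P / (P + 1)), C(P))    # R(alpha*) = C(P) = 1 bit
```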
Alternative proof of Theorem .. In the proof of the writing on dirty paper result, we
found that the optimal coefficient for S is α ∗ = P/(P + 1), which is the weight of the lin-
ear minimum mean squared error (MMSE) estimate of X given X + Z. The following

alternative proof uses the connection to MMSE estimation to generalize the result to non-
Gaussian state S.
As before, let U = X + αS, where X ∼ N(0, P) is independent of S and α = P/(P + 1).
From the Gelfand–Pinsker theorem, we can achieve

I(U ; Y) − I(U ; S) = h(U |S) − h(U |Y).

We wish to show that this is equal to

I(X; X + Z) = h(X) − h(X | X + Z)


= h(X) − h(X − α(X + Z)| X + Z)
= h(X) − h(X − α(X + Z)),

where the last step follows by the fact that (X − α(X + Z)), which is the error of the MMSE
estimate of X given (X + Z), and (X + Z) are orthogonal and thus independent because
X and (X + Z) are jointly Gaussian.
First consider
h(U |S) = h(X + αS |S) = h(X |S) = h(X).

Since X − α(X + Z) is independent of (X + Z, S) and thus of Y = X + S + Z,

h(U |Y) = h(X + αS |Y )


= h(X + αS − αY |Y )
= h(X − α(X + Z)|Y)
= h(X − α(X + Z)).

This completes the proof.


Remark .. This derivation does not require S to be Gaussian. Hence, CSI-E = C(P) for
any independent state S with finite power!
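The orthogonality step above can be illustrated numerically. The following Monte Carlo sketch (an illustration only, with an arbitrary uniform state standing in for a non-Gaussian S) checks that the error X − α(X + Z) with α = P/(P + 1) is uncorrelated with both X + Z and S, which is the property the alternative proof relies on.

    import numpy as np

    rng = np.random.default_rng(0)
    n, P = 1_000_000, 4.0
    alpha = P / (P + 1)                        # LMMSE coefficient of X given X + Z

    X = rng.normal(0, np.sqrt(P), n)           # input, independent of the state
    Z = rng.normal(0, 1, n)                    # unit-variance Gaussian noise
    S = rng.uniform(-3, 3, n)                  # non-Gaussian state (arbitrary choice)

    E = X - alpha * (X + Z)                    # MMSE estimation error
    print(np.mean(E * (X + Z)))                # ~ 0: orthogonal to the observation
    print(np.mean(E * S))                      # ~ 0: uncorrelated with the state
    print(np.var(E), P / (P + 1))              # error variance equals P/(P + 1)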

7.7.1 Writing on Dirty Paper for the Gaussian MAC


The writing on dirty paper result can be extended to several multiuser Gaussian channels.
Consider the Gaussian MAC with additive Gaussian state Y = X1 + X2 + S + Z, where
the state S ∼ N(0, Q) and the noise Z ∼ N(0, 1) are independent. Assume average power
constraints P1 on X1 and P2 on X2 . Further assume that the state sequence S n is available
noncausally at both encoders. The capacity region of this channel is the set of rate pairs
(R1 , R2 ) such that

R1 ≤ C(P1 ),
R2 ≤ C(P2 ),
R1 + R2 ≤ C(P1 + P2 ).

This is the capacity region when the state sequence is available also at the decoder (so the
interference S n can be canceled out). Thus the proof of the converse is trivial.

For the proof of achievability, consider the “writing on dirty paper” channel Y = X1 +
S + (X2 + Z) with input X1 , known state S, and unknown noise (X2 + Z) that will be
shortly shown to be independent of S. We first prove achievability of the corner point
(R1 , R2 ) = (C(P1 /(P2 + 1)), C(P2 )) of the capacity region. By taking U1 = X1 + α1 S, where
X1 ∼ N(0, P1 ) independent of S and α1 = P1 /(P1 + P2 + 1), we can achieve R1 = I(U1 ; Y ) −
I(U1 ; S) = C(P1 /(P2 + 1)). Now as in successive cancellation decoding for the Gaussian
MAC, once u1n is decoded correctly, the decoder can subtract it from y n to obtain the
effective channel ỹn = y n − u1n = x2n + (1 − α1 )s n + z n . Thus, for sender 2, the channel
is another “writing on dirty paper” channel with input X2 , known state (1 − α1 )S, and
unknown noise Z. By taking U2 = X2 + α2 S, where X2 ∼ N(0, P2 ) is independent of S and
α2 = (1 − α1 )P2 /(P2 + 1) = P2 /(P1 + P2 + 1), we can achieve R2 = I(U2 ; Ỹ ) − I(U2 ; S) =
C(P2 ). The other corner point can be achieved by reversing the roles of the two encoders. The
rest of the capacity region can then be achieved using time sharing.
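The corner point above can be checked against the sum-capacity directly: C(P1 /(P2 + 1)) + C(P2 ) = C(P1 + P2 ). A minimal numerical verification (the powers P1 = 3 and P2 = 5 are arbitrary choices):

    import numpy as np

    C = lambda x: 0.5 * np.log2(1 + x)         # Gaussian capacity function

    P1, P2 = 3.0, 5.0                          # arbitrary power constraints
    R1 = C(P1 / (P2 + 1))                      # sender 1, treating X2 + Z as noise
    R2 = C(P2)                                 # sender 2, after u1^n is subtracted
    print(R1 + R2, C(P1 + P2))                 # both equal the sum-capacity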
Achievability can be proved alternatively by considering the DM-MAC with DM state
p(y|x1 , x2 , s)p(s) when the state information is available noncausally at both encoders
and evaluating the inner bound on the capacity region consisting of all rate pairs (R1 , R2 )
such that

R1 < I(U1 ; Y |U2 ) − I(U1 ; S |U2 ),


R2 < I(U2 ; Y |U1 ) − I(U2 ; S |U1 ),
R1 + R2 < I(U1 , U2 ; Y) − I(U1 , U2 ; S)

for some pmf p(u1 |s)p(u2 |s) and functions x1 (u1 , s) and x2 (u2 , s). By choosing (U1 , U2 ,
X1 , X2 ) as before, it can be easily checked that this inner bound simplifies to the capacity
region. It is not known, however, if this inner bound is tight for the general DM-MAC
with DM state.

7.7.2 Writing on Dirty Paper for the Gaussian BC


Now consider the Gaussian BC with additive Gaussian state Y j = X + S j + Z j , j = 1, 2,
where the state components S1 ∼ N(0, Q1 ) and S2 ∼ N(0, Q2 ) and the noise components
Z1 ∼ N(0, N1 ) and Z2 ∼ N(0, N2 ) are mutually independent, and N2 ≥ N1 . Assume average power
constraint P on X. We assume that the state sequences S1n and S2n are available noncausally
at the encoder. The capacity region of this channel is the set of rate pairs (R1 , R2 ) such that

R1 ≤ C(αP/N1 ),
R2 ≤ C(ᾱP/(αP + N2 ))

for some α ∈ [0, 1]. This is the same as the capacity region when the state sequences are
available also at their respective decoders, which establishes the converse.
The proof of achievability follows closely that of the MAC case. We split the input into
two independent parts X1 ∼ N(0, αP) and X2 ∼ N(0, ᾱP) such that X = X1 + X2 . For

the weaker receiver, consider the “writing on dirty paper” channel Y2 = X2 + S2 + (X1 +
Z2 ) with input X2 , known state S2 , and noise (X1 + Z2 ). Then by taking U2 = X2 + β2 S2 ,
where X2 ∼ N(0, ᾱP) is independent of S2 and β2 = ᾱP/(P + N2 ), we can achieve any rate
R2 < I(U2 ; Y2 ) − I(U2 ; S2 ) = C(ᾱP/(αP + N2 )). For the stronger receiver, consider another
“writing on dirty paper” channel Y1 = X1 + (X2 + S1 ) + Z1 with input X1 , known state
(X2 + S1 ), and noise Z1 . Using the writing on dirty paper result with U1 = X1 + β1 (X2 +
S1 ) and β1 = αP/(αP + N1 ), we can achieve any rate R1 < C(αP/N1 ).
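As a numerical illustration (the values P, N1 , N2 below are arbitrary, with N2 ≥ N1 ), sweeping α traces out the same rate pairs as the capacity region stated above:

    import numpy as np

    C = lambda x: 0.5 * np.log2(1 + x)

    P, N1, N2 = 10.0, 1.0, 4.0                 # arbitrary test values, N2 >= N1
    for alpha in np.linspace(0, 1, 6):
        R1 = C(alpha * P / N1)                           # stronger receiver
        R2 = C((1 - alpha) * P / (alpha * P + N2))       # weaker receiver
        print(f"alpha = {alpha:.1f}: R1 = {R1:.3f}, R2 = {R2:.3f}")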
As in the MAC case, achievability can be proved alternatively by considering the DM-
BC with state p(y1 , y2 |x, s)p(s) when the state information is available noncausally at the
encoder and evaluating the inner bound on the capacity region consisting of all rate pairs
(R1 , R2 ) such that
R1 < I(U1 ; Y1 ) − I(U1 ; S),
R2 < I(U2 ; Y2 ) − I(U2 ; S),
R1 + R2 < I(U1 ; Y1 ) + I(U2 ; Y2 ) − I(U1 ; U2 ) − I(U1 , U2 ; S)
for some pmf p(u1 , u2 |s) and function x(u1 , u2 , s). Taking S = (S1 , S2 ) and choosing
(U1 , U2 , X) as before, this inner bound simplifies to the capacity region. Again it is not
known if this inner bound is tight for the general DM-BC with DM state.

7.8 CODED STATE INFORMATION

So far we have considered settings where the state sequence is available perfectly (causally
or noncausally) at the encoder and/or the decoder. Providing such perfect information
about the state sequence is often not feasible, however, because the channel over which
such information is to be sent has limited capacity. This motivates the coded side infor-
mation setup in which a rate-limited description of the state is provided to the encoder
and/or to the decoder. What is the optimal tradeoff between the communication rate and
the rate at which the state is described?
The answer to this question is known only in a few special cases, for example, when
the state sequence is available at the decoder, but only a coded version of it is avail-
able at the encoder as depicted in Figure .. Formally, consider a DMC with DM state
and define a (2nR , 2nRs , n) code to consist of a state encoder ms (s n ) ∈ [1 : 2nRs ], a message
encoder x n (m, ms ), and a decoder m̂(y n , s n ). The probability of error is defined as
Pe(n) = P{M̂ ̸= M}. A rate pair (R, Rs ) is said to be achievable if there exists a sequence
of (2nR , 2nRs , n) codes such that limn→∞ Pe(n) = 0. We define the capacity–rate function
C(Rs ) as the supremum over rates R such that (R, Rs ) is achievable.

Theorem .. The capacity–rate function for the DMC with DM state p(y|x, s)p(s)
when the state information is available at the decoder and coded state information of
rate Rs is available at the encoder is

C(Rs ) = max_{p(u|s)p(x|u): Rs ≥ I(U ;S)} I(X; Y |S, U).

Figure .. Coded state information available at the encoder.

Achievability follows by using joint typicality encoding to describe the state sequence
S n by U n at rate Rs and using time sharing on the “compressed state” sequence U n as for
the case where the state sequence is available at both the encoder and the decoder; see
Section ...
To prove the converse, we identify the auxiliary random variable as Ui = (Ms , S i−1 )
and consider

nRs ≥ H(Ms )
     = I(Ms ; S n )
     = ∑ni=1 I(Ms ; Si |S i−1 )
     = ∑ni=1 I(Ms , S i−1 ; Si )
     = ∑ni=1 I(Ui ; Si ).

In addition, by Fano’s inequality,

nR ≤ I(M; Y n , S n ) + nєn
    = I(M; Y n |S n ) + nєn
    = ∑ni=1 I(M; Yi |Y i−1 , S n ) + nєn
    = ∑ni=1 I(M; Yi |Y i−1 , S n , Ms ) + nєn
    ≤ ∑ni=1 I(Xi , M, Y i−1 , S_{i+1}^n ; Yi |Si , S i−1 , Ms ) + nєn
    ≤ ∑ni=1 I(Xi ; Yi |Si , Ui ) + nєn .

Since Si → Ui → Xi and Ui → (Xi , Si ) → Yi for i ∈ [1 : n], we can complete the proof by


introducing the usual time-sharing random variable.

SUMMARY

∙ State-dependent channel models:
  ∙ Compound channel
  ∙ Arbitrarily varying channel
  ∙ DMC with DM state
∙ Channel coding with side information
∙ Shannon strategy
∙ Gelfand–Pinsker coding:
  ∙ Multicoding (subcodebook generation)
  ∙ Use of joint typicality encoding in channel coding
∙ Writing on dirty paper
∙ Open problems:
7.1. What is the capacity region of the DM-MAC with DM state when the state infor-
mation is available causally or noncausally at the encoders?
7.2. What is the common-message capacity of the DM-BC with DM state when the
state information is available noncausally at the encoder?

BIBLIOGRAPHIC NOTES

The compound channel was first studied by Blackwell, Breiman, and Thomasian (),
Dobrushin (b), and Wolfowitz (), who established Theorem . for arbitrary state
spaces using maximum likelihood decoding or joint typicality decoding. An alternative
proof was given by Csiszár and Körner (b) using maximum mutual information de-
coding. Generalizations of the compound channel to the multiple access channel can be
found in Ahlswede (), Han (), and Lapidoth and Narayan (). The arbitrarily
varying channel was first studied by Blackwell, Breiman, and Thomasian () who es-
tablished the capacity characterization in (.). Coding theorems for other AVC settings
can be found in Ahlswede and Wolfowitz (, ), Ahlswede (), Wolfowitz (),
Csiszár and Narayan (), Csiszár and Körner (b), and Csiszár and Körner (a).
Shannon () introduced the DMC with DM state and established Theorem . using
the idea of coding over the function space (Shannon strategy). Goldsmith and Varaiya
() used time sharing on the state sequence to establish the capacity when the state
information is available at both the encoder and the decoder. The capacity region in (.)
of the DM-MAC with two state components when the state information is available in a
distributed manner is due to Jafar ().

The memory with stuck-at faults was studied by Kuznetsov and Tsybakov (), who
established the capacity using multicoding. This result was generalized to DMCs with
DM state by Gelfand and Pinsker (b) and Heegard and El Gamal (). Our proofs
follow Heegard and El Gamal (). The capacity results on the DM-BC with DM state
are due to Steinberg ().
The writing on dirty paper result is due to Costa (). The alternative proof follows
Cohen and Lapidoth (). A similar result was also obtained for the case of nonstation-
ary, nonergodic Gaussian noise and state (Yu, Sutivong, Julian, Cover, and Chiang ).
Erez, Shamai, and Zamir () studied lattice coding strategies for writing on dirty pa-
per. Writing on dirty paper for the Gaussian MAC and the Gaussian BC was studied by
Gelfand and Pinsker (). Applications to information hiding and digital watermarking
can be found in Moulin and O’Sullivan ().
The DMC with coded state information was first studied by Heegard and El Gamal
(), who considered a more general setup in which encoded versions of the state infor-
mation are separately available at the encoder and the decoder. Rosenzweig, Steinberg,
and Shamai () studied a similar setup with coded state information as well as a quan-
tized version of the state sequence available at the encoder. Extensions to the DM-MAC
with coded state information were studied in Cemal and Steinberg (). Surveys of the
literature on channels with state can be found in Lapidoth and Narayan () and Keshet,
Steinberg, and Merhav ().

PROBLEMS

.. Consider the DMC with DM state p(y|x, s)p(s) and let b(x) ≥ 0 be an input cost
function with b(x0 ) = 0 for some x0 ∈ X . Assume that the state information is
noncausally available at the encoder. Establish the capacity–cost function in (.).
.. Establish the capacity region of the DM-MAC with two DM state components
p(y|x1 , x2 , s)p(s), where S = (S1 , S2 ), (a) when the state information is available
only at the decoder and (b) when the state information is available at the decoder
and each state component S j , j = 1, 2, is available at encoder j.
.. Establish the capacity region of the degraded DM-BC with DM state p(y1 |x, s)
p(y2 |y1 )p(s) (a) when the state information is causally available only at the en-
coder, (b) when the state information is causally available at the encoder and de-
coder , and (c) when the state information is noncausally available at the encoder
and decoder .
.. Compound channel with state information available at the encoder. Suppose that the
encoder knows the state s before communication commences over a compound
channel p(ys |x). Find the capacity.
.. Compound channel with arbitrary state space. Consider the compound channel
p(ys |x) with state space S of infinite cardinality.

(a) Show that there is a finite subset S̃n ⊂ S with cardinality at most (n + 1)2|X |⋅|Y|
such that for every s ∈ S, there exists s̃ ∈ S̃n with
Tє(n) (X, Ys ) = Tє(n) (X, Ys̃).
(Hint: Recall that a typical set is defined by the upper and lower bounds on the
number of occurrences of each pair (x, y) ∈ X × Y in an n sequence.)
(b) Use part (a) to prove achievability for Theorem . with infinite state space.
.. No state information. Show that the capacity of the DMC with DM state p(y|x, s)
p(s) when no state information is available at either the encoder or decoder is
C = max I(X; Y ),
p(x)

where p(y|x) = ∑s p(s)p(y|x, s). Further show that any (2nR , n) code for the DMC
p(y|x) achieves the same average probability of error when used over the DMC
with DM state p(y|x, s)p(s), and vice versa.
.. Stationary ergodic state process. Assume that {Si } is a discrete stationary ergodic
process.
(a) Show that the capacity of the DMC with state p(y|x, s) when the state infor-
mation is available only at the decoder is
CSI-D = max I(X; Y |S).
p(x)

(Hint: If X n ∼ ∏ni=1 p X (xi ), then (X n , S n , Y n ) is jointly stationary ergodic.)


(b) Show that the capacity of the DMC with state p(y|x, s) when the state infor-
mation is available at both the encoder and the decoder is
CSI-ED = max I(X; Y |S).
p(x|s)

.. Strictly causal state information. Consider the DMC with DM state p(y|x, s)p(s).
Suppose that the state information is available strictly causally at the encoder, that
is, the encoder is specified by xi (m, s i−1 ), i ∈ [1 : n]. Establish the capacity (a)
when the state information is not available at the decoder and (b) when the state
information is also available at the decoder.
.. DM-MAC with strictly causal state information. Consider the DM-MAC with DM
state Y = (X1 ⊕ S, X2 ), where the inputs X1 and X2 are binary and the state S ∼
Bern(1/2). Establish the capacity region (a) when the state information is not
available at either the encoders or the decoder, (b) when the state information is
available strictly causally only at encoder , and (c) when the state information is
available strictly causally at both encoders.
Remark: The above two problems demonstrate that while strictly causal state in-
formation does not increase the capacity of point-to-point channels, it can increase
the capacity of some multiuser channels. We will see similar examples for channels
with feedback in Chapter .

.. Value of state information. Consider the DMC with DM state p(y|x, s)p(s). Quan-
tify how much state information can help by proving the following statements:
(a) CSI-D − C ≤ max p(x) H(S|Y ).
(b) CSI-ED − CSI-E ≤ CSI-ED − CCSI-E ≤ max p(x|s) H(S|Y).
Thus, the state information at the decoder is worth at most H(S) bits. Show that
the state information at the encoder can be much more valuable by providing an
example for which CSI-E − C > H(S).
.. Ternary-input memory with state. Consider the DMC with DM state p(y|x, s)p(s)
depicted in Figure ..

Figure .. Ternary-input memory with state.

(a) Show that the capacity when the state information is not available at either the
encoder or the decoder is C = 0.
(b) Show that the capacity when the state information is available only at the de-
coder is CSI-D = 2/3.
(c) Show that the capacity when the state information is (causally or noncausally)
available at the encoder (regardless of whether it is available at the decoder) is
CCSI-E = CSI-E = CSI-ED = 1.
.. BSC with state. Consider the DMC with DM state p(y|x, s)p(s), where S ∼ Bern(q)
and p(y|x, s) is a BSC(ps ), s = 0, 1.
(a) Show that the capacity when the state information is not available at either the

encoder or the decoder is


C = 1 − H(p0 q̄ + p1 q).

(b) Show that the capacity when the state information is available at the decoder
(regardless of whether it is available at the encoder) is

CSI-D = CSI-ED = 1 − q̄ H(p0 ) − qH(p1 ).

(c) Show that the capacity when the state information is (causally or noncausally)
available only at the encoder is

CCSI-E = CSI-E = 1 − H(p0 q̄ + p1 q)   if p0 , p1 ∈ [0, 1/2] or p0 , p1 ∈ [1/2, 1],
CCSI-E = CSI-E = 1 − H( p̄0 q̄ + p1 q)   otherwise.

(Hint: Use a symmetrization argument similar to that in Section .. with


x(u, s) = 1 − x(−u, s) and show that it suffices to consider U ∼ Bern(1/2) in-
dependent of S.)
Remark: The above two problems appeared as examples in Heegard and El Gamal
(). These problems demonstrate that there is no universal ordering between
CSI-E and CSI-D , each of which is strictly less than CSI-ED in general.
.. Common-message broadcasting with state information. Consider the DM-BC with
DM state p(y1 , y2 |x, s)p(s). Establish the common-message capacity C0 (that is,
the maximum achievable common rate R0 when R1 = R2 = 0) for the following
settings:
(a) The state information is available only at decoder .
(b) The state information is causally available at both the encoder and decoder .
(c) The state information is noncausally available at the encoder and decoder .
(d) The state information is causally available only at the encoder.
.. Memory with stuck-at faults and noise. Recall the memory with stuck-at faults
in Example .. Assume that the memory now has temporal noise in addition to
stuck-at faults such that for state s = 2, the memory is modeled by the BSC(p) for
p ∈ [0, 1/2]. Find the capacity of this channel (a) when the state information is
causally available only at the encoder and (b) when the state information is non-
causally available only at the encoder.
.. MMSE estimation via writing on dirty paper. Consider the additive noise channel
with output (observation)
Y = X + S + Z,

where X is the transmitted signal and has mean μ and variance P, S is the state
and has zero mean and variance Q, and Z is the noise and has zero mean and

variance N. Assume that X, S, and Z are uncorrelated. The sender knows S and
wishes to transmit a signal U , but instead transmits X such that U = X + αS for
some constant α.
(a) Find the mean squared error (MSE) of the linear MMSE estimate of U given Y
in terms only of μ, α, P, Q, and N.
(b) Find the value of α that minimizes the MSE in part (a).
(c) How does the minimum MSE obtained in part (b) compare to the MSE of the
linear MMSE when there is no state at all, i.e., S = 0? Interpret the result.
.. Cognitive radio. Consider the Gaussian IC

Y1 = g11 X1 + g12 X2 + Z1 ,
Y2 = g22 X2 + Z2 ,

where g11 , g12 , and g22 are channel gains, and Z1 ∼ N(0, 1) and Z2 ∼ N(0, 1) are
noise components. Sender 1 encodes two independent messages M1 and M2 , while
sender 2 encodes only M2 . Receiver 1 wishes to recover M1 and receiver 2 wishes
to recover M2 . Assume average power constraint P on each of X1 and X2 . Find the
capacity region.
Remark: This is a simple example of cognitive radio channel models studied, for
example, in Devroye, Mitran, and Tarokh (), Wu, Vishwanath, and Arapos-
tathis (), and Jovičić and Viswanath ().
.. Noisy state information. Consider the DMC with DM state p(y|x, s)p(s), where
the state has three components S = (T0 , T1 , T2 ) ∼ p(t0 , t1 , t2 ). Suppose that T1 is
available at the encoder, T2 is available at the decoder, and T0 is hidden from both.
(a) Suppose that T1 is a function of T2 . Show that the capacity (for both causal and
noncausal cases) is
CNSI = max I(X; Y |T2 ).
p(x|t1 )

(b) Show that any (2nR , n) code (in either the causal or noncausal case) for this
channel achieves the same probability of error when used over the DMC with
state p(y′ |x, t1 )p(t1 ), where Y ′ = (Y , T2 ) and

p(y, t2 |x, t1 ) = ∑t0 p(y|x, t0 , t1 , t2 )p(t0 , t2 |t1 ),

and vice versa.


Remark: This result is due to Caire and Shamai ().
CHAPTER 8

General Broadcast Channels

We resume the discussion of broadcast channels started in Chapter . Again consider the
2-receiver DM-BC p(y1 , y2 |x) with private and common messages depicted in Figure ..
The definitions of a code, achievability, and capacity regions are the same as in Chapter .
As mentioned before, the capacity region of the DM-BC is not known in general. In
Chapter , we presented the superposition coding scheme and showed that it is optimal
for several classes of channels in which one receiver is stronger than the other. In this
chapter, we study coding schemes that can outperform superposition coding and present
the tightest known inner and outer bounds on the capacity region of the general broadcast
channel.
We first show that superposition coding is optimal for the 2-receiver DM-BC with
degraded message sets, that is, when either R1 = 0 or R2 = 0. We then show that super-
position coding is not optimal for BCs with more than two receivers. In particular, we
establish the capacity region of the 3-receiver multilevel BC. The achievability proof in-
volves the new idea of indirect decoding, whereby a receiver who wishes to recover only
the common message still uses satellite codewords in decoding for the cloud center.
We then present Marton’s inner bound on the private-message capacity region of the
2-receiver DM-BC and show that it is optimal for the class of semideterministic BCs. The
coding scheme involves the multicoding technique introduced in Chapter 7 and the new
idea of joint typicality codebook generation to construct dependent codewords for inde-
pendent messages without the use of a superposition structure. The proof of the inner
bound uses the mutual covering lemma, which is a generalization of the covering lemma
in Section .. Marton’s coding scheme is then combined with superposition coding to
establish an inner bound on the capacity region of the DM-BC that is tight for all classes
of DM-BCs with known capacity regions. Next, we establish the Nair–El Gamal outer
bound on the capacity region of the DM-BC. We show through an example that there is

Figure .. Two-receiver broadcast communication system.



a gap between these inner and outer bounds. Finally, we discuss extensions of the afore-
mentioned coding techniques to broadcast channels with more than two receivers and
with arbitrary messaging requirements.

8.1 DM-BC WITH DEGRADED MESSAGE SETS

A broadcast channel with degraded message sets is a model for a layered communication
setup in which a sender wishes to communicate a common message to all receivers, a first
private message to a first subset of the receivers, a second private message to a second
subset of the first subset, and so on. Such a setup arises, for example, in video or music
broadcasting over a wireless network at varying levels of quality. The common message
represents the lowest quality description of the data to be sent to all receivers, and each
private message represents a refinement over the common message and previous private
messages.
We first consider the 2-receiver DM-BC with degraded message sets, i.e., with R2 = 0.
The capacity region for this case is known.

Theorem .. The capacity region of the -receiver DM-BC p(y1 , y2 |x) with degraded
message sets is the set of rate pairs (R0 , R1 ) such that

R0 ≤ I(U ; Y2 ),
R1 ≤ I(X; Y1 |U),
R0 + R1 ≤ I(X; Y1 )

for some pmf p(u, x) with |U | ≤ min{|X |, |Y1 | + |Y2 |} + 1.

This capacity region is achieved using superposition coding (see Section .) by noting
that receiver  can recover both messages M0 and M1 . The converse proof follows similar
lines to that for the more capable BC in Section .. It is proved by first considering the
alternative characterization of the capacity region consisting of all rate pairs (R0 , R1 ) such
that
R0 ≤ I(U ; Y2 ),
R0 + R1 ≤ I(X; Y1 ), (.)
R0 + R1 ≤ I(X; Y1 |U ) + I(U ; Y2 )
for some pmf p(u, x). The Csiszár sum identity and other standard techniques are then
used to complete the proof. The cardinality bound on U can be proved using the convex
cover method in Appendix C.
It turns out that we can prove achievability for the alternative region in (.) directly.
The proof involves an (unnecessary) rate splitting step. We divide the private message
M1 into two independent messages M10 at rate R10 and M11 at rate R11 . We use super-
position coding whereby the cloud center U represents the message pair (M0 , M10 ) and

the satellite codeword X represents the message triple (M0 , M10 , M11 ). Following similar
steps to the achievability proof of the superposition coding inner bound, we can show
that (R0 , R10 , R11 ) is achievable if

R0 + R10 < I(U ; Y2 ),


R11 < I(X; Y1 |U ),
R0 + R1 < I(X; Y1 )

for some pmf p(u, x). Substituting R11 = R1 − R10 , we have the conditions

R10 ≥ 0,
R10 < I(U ; Y2 ) − R0 ,
R10 > R1 − I(X; Y1 |U),
R0 + R1 < I(X; Y1 ),
R10 ≤ R1 .

Eliminating R10 by the Fourier–Motzkin procedure in Appendix D yields the desired char-
acterization.
Rate splitting turns out to be crucial when the DM-BC has more than two receivers.

8.2 THREE-RECEIVER MULTILEVEL DM-BC

The degraded message set capacity region of the DM-BC with more than two receivers is
not known in general. We show that the straightforward extension of the superposition
coding inner bound in Section . to more than two receivers is not optimal in general.
Consider the -receiver multilevel DM-BC (X , p(y1 , y3 |x)p(y2 |y1 ), Y1 × Y2 × Y3 ) de-
picted in Figure ., which is a -receiver DM-BC where Y2 is a degraded version of Y1 . We
consider the case of two degraded message sets, where a common message M0 ∈ [1 : 2nR0 ]
is to be communicated to all receivers, and a private message M1 ∈ [1 : 2nR1 ] is to be com-
municated only to receiver . The definitions of a code, probability of error, achievability,
and capacity region for this setting are as before.
A straightforward extension of the superposition coding inner bound to the 3-receiver
multilevel DM-BC, where receivers 2 and 3 decode for the cloud center and receiver 1

Y1
p(y2 |y1 ) Y2
X p(y1 , y3 |x)
Y3

Figure .. Three-receiver multilevel broadcast channel.



decodes for the satellite codeword, gives the set of rate pairs (R0 , R1 ) such that

R0 < min{I(U ; Y2 ), I(U ; Y3 )},
R1 < I(X; Y1 |U) (.)

for some pmf p(u, x). Note that the last inequality R0 + R1 < I(X; Y1 ) in the superposition
coding inner bound in Theorem . drops out by the assumption that Y2 is a degraded
version of Y1 .
This region turns out not to be optimal in general.

Theorem .. The capacity region of the -receiver multilevel DM-BC p(y1 , y3 |x)
p(y2 |y1 ) is the set of rate pairs (R0 , R1 ) such that

R0 ≤ min{I(U ; Y2 ), I(V ; Y3 )},


R1 ≤ I(X; Y1 |U ),
R0 + R1 ≤ I(V ; Y3 ) + I(X; Y1 |V )

for some pmf p(u, v)p(x|v) with |U | ≤ |X | + 4 and |V| ≤ (|X | + 1)(|X | + 4).

The proof of the converse uses steps from the converse proofs for the degraded BC and
the -receiver BC with degraded message sets. The cardinality bounds on U and V can
be proved using the extension of the convex cover method to multiple random variables
in Appendix C.

8.2.1 Proof of Achievability


The achievability proof of Theorem . uses the new idea of indirect decoding. We di-
vide the private message M1 into two messages M10 and M11 . We generate a codebook
of un sequences for the common message M0 , and use superposition coding to gener-
ate a codebook of v n sequences for the message pair (M0 , M10 ). We then use super-
position coding again to generate a codebook of x n sequences for the message triple
(M0 , M10 , M11 ) = (M0 , M1 ). Receiver 1 finds (M0 , M1 ) by decoding for x n , receiver 2 finds
M0 by decoding for un , and receiver 3 finds M0 indirectly by simultaneous decoding for
(un , v n ). We now provide details of the proof.
Rate splitting. Divide the private message M1 into two independent messages M10 at rate
R10 and M11 at rate R11 . Hence R1 = R10 + R11 .
Codebook generation. Fix a pmf p(u, v)p(x|v). Randomly and independently generate
2nR0 sequences un (m0 ), m0 ∈ [1 : 2nR0 ], each according to ∏ni=1 pU (ui ). For each m0 , ran-
domly and conditionally independently generate 2nR10 sequences v n (m0 , m10 ), m10 ∈ [1 :
2nR10 ], each according to ∏ni=1 pV |U (vi |ui (m0 )). For each pair (m0 , m10 ), randomly and
conditionally independently generate 2nR11 sequences x n (m0 , m10 , m11 ), m11 ∈ [1 : 2nR11 ],
each according to ∏ni=1 p X|V (xi |vi (m0 , m10 )).

Encoding. To send the message pair (m0 , m1 ) = (m0 , m10 , m11 ), the encoder transmits
x n (m0 , m10 , m11 ).
Decoding and analysis of the probability of error for decoders 1 and 2. Decoder 2 declares
that m̂02 ∈ [1 : 2nR0 ] is sent if it is the unique message such that (un (m̂02 ), y2n ) ∈ Tє(n) .
By the LLN and the packing lemma, the probability of error tends to zero as n → ∞ if
R0 < I(U ; Y2 ) − δ(є). Decoder 1 declares that (m̂01 , m̂10 , m̂11 ) is sent if it is the unique mes-
sage triple such that (un (m̂01 ), v n (m̂01 , m̂10 ), x n (m̂01 , m̂10 , m̂11 ), y1n ) ∈ Tє(n) . By the LLN
and the packing lemma, the probability of error tends to zero as n → ∞ if

R11 < I(X; Y1 |V ) − δ(є),


R10 + R11 < I(X; Y1 |U ) − δ(є),
R0 + R10 + R11 < I(X; Y1 ) − δ(є).
Decoding and analysis of the probability of error for decoder 3. If receiver 3 decodes for
m0 directly by finding the unique message m̂03 such that (un (m̂03 ), y3n ) ∈ Tє(n) , we obtain
the condition R0 < I(U ; Y3 ) − δ(є), which together with the previous conditions gives the
extended superposition coding inner bound in (.).
To achieve the capacity region, receiver 3 decodes for m0 indirectly. It declares that m̂03
is sent if it is the unique message such that (un (m̂03 ), v n (m̂03 , m10 ), y3n ) ∈ Tє(n) for some
m10 ∈ [1 : 2nR10 ]. Assume that (M0 , M10 ) = (1, 1) is sent and consider all possible pmfs
for the triple (U n (m0 ), V n (m0 , m10 ), Y3n ) as listed in Table ..

m0   m10   Joint pmf
1    1     p(un , v n )p(y3n |v n )
1    ∗     p(un , v n )p(y3n |un )
∗    ∗     p(un , v n )p(y3n )
∗    1     p(un , v n )p(y3n )

Table .. The joint distribution induced by different (m0 , m10 ) pairs.

The second case does not result in an error and the last two cases have the same pmf.
Thus, the decoding error occurs iff one or both of the following events occur:

E31 = {(U n (1), V n (1, 1), Y3n ) ∉ Tє(n) },
E32 = {(U n (m0 ), V n (m0 , m10 ), Y3n ) ∈ Tє(n) for some m0 ̸= 1, m10 }.

Then, the probability of error for decoder 3 averaged over codebooks P(E3 ) ≤ P(E31 ) +
P(E32 ). By the LLN, P(E31 ) tends to zero as n → ∞. By the packing lemma (with A =
[2 : 2n(R0 +R10 ) ], X ← (U , V ), and U = ∅), P(E32 ) tends to zero as n → ∞ if R0 + R10 <
I(U , V ; Y3 ) − δ(є) = I(V ; Y3 ) − δ(є). Combining the bounds, substituting R10 + R11 = R1 ,
and eliminating R10 and R11 by the Fourier–Motzkin procedure completes the proof of
achievability.

Indirect decoding is illustrated in Figures . and .. Suppose that R0 > I(U ; Y3 ) as
shown in Figure .. Then receiver  cannot decode for the cloud center un (1) directly.
Now suppose that R0 + R10 < I(V ; Y3 ) in addition. Then receiver  can decode for the
cloud center indirectly by finding the unique message m ̂ 0 such that (un (m
̂ 0 ), 󰑣 n (m
̂ 0 , m10 ),
(n)
y3 ) ∈ Tє for some m10 as shown in Figure .. Note that the condition R0 + R10 <
n

I(V ; Y3 ) suffices in general (even when R0 ≤ I(U ; Y3 )).

Figure .. Direct decoding for the cloud center un (1).

Figure .. Indirect decoding for un (1) via v n (1, m10 ).

Remark 8.1. Although it seems surprising that higher rates can be achieved by having
receiver  recover more than it needs to, the reason we can do better than superposi-
tion coding can be explained by the observation that for a -receiver BC p(y2 , y3 |x),
the conditions I(U ; Y2 ) < I(U ; Y3 ) and I(V ; Y2 ) > I(V ; Y3 ) can hold simultaneously for
8.2 Three-Receiver Multilevel DM-BC 203

some U → V → X; see discussions on less noisy and more capable BCs in Section ..
Now, considering our -receiver BC scenario, suppose we have a choice of U such that
I(U ; Y3 ) < I(U ; Y2 ). In this case, requiring receivers  and  to directly decode for un
necessitates that R0 < I(U ; Y3 ). From the above observation, a V may exist such that
U → V → X and I(V ; Y3 ) > I(V ; Y2 ), in which case the rate of the common message can
be increased to I(U ; Y2 ) and receiver  can still find un indirectly by decoding for (un , 󰑣 n ).
Thus, the additional “degree-of-freedom” introduced by V helps increase the rates in spite
of the fact that receiver  may need to recover more than just the common message.
Remark 8.2. If we require receiver 3 to recover M10 , we obtain the region consisting of
all rate pairs (R0 , R1 ) such that

R0 < min{I(U ; Y2 ), I(V ; Y3 )},
R1 < I(X; Y1 |U),
R0 + R1 < I(V ; Y3 ) + I(X; Y1 |V ), (.)
R1 < I(V ; Y3 |U) + I(X; Y1 |V )

for some pmf p(u)p(v|u)p(x|v). While this region involves one more inequality than the
capacity region in Theorem ., it can be shown by optimizing the choice of V for each
given U that it coincides with the capacity region. However, requiring decoder 3 to recover
M10 is unnecessary and leads to a region with more inequalities for which the converse is
difficult to establish.

8.2.2 Multilevel Product DM-BC


We show via an example that the extended superposition coding inner bound in (.) can
be strictly smaller than the capacity region in Theorem . for the 3-receiver multilevel
DM-BC. Consider the product of two -receiver BCs specified by the Markov chains
X1 → Y31 → Y11 → Y21 ,
X2 → Y12 → Y22 .
For this channel, the extended superposition coding inner bound in (.) reduces to the
set of rate pairs (R0 , R1 ) such that
R0 ≤ I(U1 ; Y21 ) + I(U2 ; Y22 ),
R0 ≤ I(U1 ; Y31 ), (.)
R1 ≤ I(X1 ; Y11 |U1 ) + I(X2 ; Y12 |U2 )
for some pmf p(u1 , x1 )p(u2 , x2 ). Similarly, it can be shown that the capacity region in
Theorem . reduces to the set of rate pairs (R0 , R1 ) such that
R0 ≤ I(U1 ; Y21 ) + I(U2 ; Y22 ),
R0 ≤ I(V1 ; Y31 ),
(.)
R1 ≤ I(X1 ; Y11 |U1 ) + I(X2 ; Y12 |U2 ),
R0 + R1 ≤ I(V1 ; Y31 ) + I(X1 ; Y11 |V1 ) + I(X2 ; Y12 |U2 )

for some pmf p(u1 , v1 )p(x1 |v1 )p(u2 , x2 ).


In the following, we compare the extended superposition coding inner bound in (.)
to the capacity region in (.).
Example .. Consider the multilevel product DM-BC depicted in Figure ., where X1 ,
X2 , Y12 , and Y31 are binary, and Y11 , Y21 , Y22 ∈ {0, 1, e}.
The extended superposition coding inner bound in (.) can be simplified to the set
of rate pairs (R0 , R1 ) such that
R0 ≤ min{α/6 + β/2, α},
R1 ≤ ᾱ/2 + β̄ (.)
for some α, β ∈ [0, 1]. It is straightforward to show that (R0 , R1 ) = (1/2, 5/12) lies on the
boundary of this region. By contrast, the capacity region in (.) can be simplified to the
set of rate pairs (R0 , R1 ) such that
R0 ≤ min{r/6 + s/2, t},
R1 ≤ (1 − r)/2 + 1 − s, (.)
R0 + R1 ≤ t + (1 − t)/2 + 1 − s
for some 0 ≤ r ≤ t ≤ 1, 0 ≤ s ≤ 1. Note that substituting r = t in (.) yields the extended
superposition coding inner bound in (.). By setting r = 0, s = 1, t = 1, it can be read-
ily checked that (R0 , R1 ) = (1/2, 1/2) lies on the boundary of the capacity region. For
R0 = 1/2, however, the maximum achievable R1 in the extended superposition coding in-
ner bound in (.) is 5/12. Thus the capacity region is strictly larger than the extended
superposition coding inner bound.

Figure .. Binary erasure multilevel product DM-BC.

8.3 MARTON’S INNER BOUND

We now turn our attention to the 2-receiver DM-BC with only private messages, i.e., when
R0 = 0. First we show that a rate pair (R1 , R2 ) is achievable for the DM-BC p(y1 , y2 |x) if

R1 < I(U1 ; Y1 ),
R2 < I(U2 ; Y2 ) (.)

for some pmf p(u1 )p(u2 ) and function x(u1 , u2 ). The proof of achievability for this inner
bound is straightforward.
Codebook generation. Fix a pmf p(u1 )p(u2 ) and x(u1 , u2 ). Randomly and indepen-
dently generate 2nR1 sequences u1n (m1 ), m1 ∈ [1 : 2nR1 ], each according to ∏ni=1 pU1 (u1i ),
and 2nR2 sequences u2n (m2 ), m2 ∈ [1 : 2nR2 ], each according to ∏ni=1 pU2 (u2i ).
Encoding. To send (m1 , m2 ), transmit xi (u1i (m1 ), u2i (m2 )) at time i ∈ [1 : n].
Decoding and analysis of the probability of error. Decoder j = 1, 2 declares that m̂ j is
sent if it is the unique message such that (U jn (m̂ j ), Y jn ) ∈ Tє(n) . By the LLN and the packing
lemma, the probability of decoding error tends to zero as n → ∞ if R j < I(U j ; Y j ) − δ(є)
for j = 1, 2. This completes the proof of achievability.
Marton’s inner bound allows U1 , U2 in (.) to be arbitrarily dependent (even though
the messages themselves are independent). This comes at an apparent penalty term in the
sum-rate.

Theorem . (Marton’s Inner Bound). A rate pair (R1 , R2 ) is achievable for the DM-
BC p(y1 , y2 |x) if

R1 < I(U1 ; Y1 ),
R2 < I(U2 ; Y2 ),
R1 + R2 < I(U1 ; Y1 ) + I(U2 ; Y2 ) − I(U1 ; U2 )

for some pmf p(u1 , u2 ) and function x(u1 , u2 ).

Before proving the theorem, we make a few remarks and show that Marton’s inner
bound is tight for the semideterministic BC.

Remark 8.3. Note that Marton’s inner bound reduces to the inner bound in (.) if U1
and U2 are restricted to be independent.
Remark 8.4. As for the Gelfand–Pinsker theorem in Section ., Marton’s inner bound
does not become larger if we evaluate it using general conditional pmfs p(x|u1 , u2 ). To
show this, fix a pmf p(u1 , u2 , x). By the functional representation lemma in Appendix B,
there exists a random variable V independent of (U1 , U2 ) such that X is a function of

(U1 , U2 , V ). Now defining U1′ = (U1 , V ), we have

I(U1 ; Y1 ) ≤ I(U1′ ; Y1 ),
I(U1 ; Y1 ) + I(U2 ; Y2 ) − I(U1 ; U2 ) ≤ I(U1′ ; Y1 ) + I(U2 ; Y2 ) − I(U1′ ; U2 ),

and X is a function of (U1′ , U2 ). Thus there is no loss of generality in restricting X to be a
deterministic function of (U1 , U2 ).
Remark 8.5. Marton’s inner bound is not convex in general, but can be readily convexi-
fied via a time-sharing random variable Q.

8.3.1 Semideterministic DM-BC


Marton’s inner bound is tight for the class of semideterministic BCs for which Y1 is a func-
tion of X, i.e., Y1 = y1 (X). Hence, we can set the auxiliary random variable U1 = Y1 , which
simplifies Marton’s inner bound to the set of rate pairs (R1 , R2 ) such that

R1 ≤ H(Y1 ),
R2 ≤ I(U ; Y2 ),
R1 + R2 ≤ H(Y1 |U) + I(U ; Y2 )

for some pmf p(u, x). This region turns out to be the capacity region. The converse follows
by the general outer bound on the capacity region of the broadcast channel presented in
Section ..
For the special case of fully deterministic DM-BCs, where Y1 = y1 (X) and Y2 = y2 (X),
the capacity region further simplifies to the set of rate pairs (R1 , R2 ) such that

R1 ≤ H(Y1 ),
R2 ≤ H(Y2 ),
R1 + R2 ≤ H(Y1 , Y2 )

for some pmf p(x). This region is evaluated for the following simple channel.
Example . (Blackwell channel). Consider the deterministic BC depicted in Figure ..
The capacity region, which is plotted in Figure ., is the union of the two regions

R1 = {(R1 , R2 ) : R1 ≤ H(α), R2 ≤ H(α/2), R1 + R2 ≤ H(α) + α for α ∈ [1/2, 2/3]},
R2 = {(R1 , R2 ) : R1 ≤ H(α/2), R2 ≤ H(α), R1 + R2 ≤ H(α) + α for α ∈ [1/2, 2/3]}.

The first region is attained by setting p X (0) = p X (2) = α/2 and p X (1) = ᾱ, while the second
region is attained by setting p X (0) = ᾱ and p X (1) = p X (2) = α/2. It can be shown that this
capacity region is strictly larger than both the superposition coding inner bound and the
inner bound in (.).

Figure .. Blackwell channel. When X = 0, Y1 = Y2 = 0. When X = 1, Y1 = Y2 = 1. However, when X = 2, Y1 = 0 while Y2 = 1.

Figure .. The capacity region of the Blackwell channel: C = R1 ∪ R2 . Note that the sum-rate H(1/3) + 2/3 = log 3 is achievable.
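As a numerical companion to this example (an illustration, not part of the original text), the deterministic-BC bounds R1 ≤ H(Y1 ), R2 ≤ H(Y2 ), R1 + R2 ≤ H(Y1 , Y2 ) can be evaluated directly from an input pmf; the uniform input attains the maximum sum-rate log 3.

    import numpy as np

    def H(p):
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    def blackwell_bounds(pX):
        # x = 0 -> (y1, y2) = (0, 0), x = 1 -> (1, 1), x = 2 -> (0, 1)
        p0, p1, p2 = pX
        HY1 = H([p0 + p2, p1])       # Y1 = 1 iff X = 1
        HY2 = H([p0, p1 + p2])       # Y2 = 0 iff X = 0
        HY1Y2 = H(pX)                # (Y1, Y2) determines X, so H(Y1, Y2) = H(X)
        return HY1, HY2, HY1Y2

    print(blackwell_bounds([1/3, 1/3, 1/3]))   # sum-rate bound log2(3) ~ 1.585
    print(np.log2(3))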

8.3.2 Achievability of Marton’s Inner Bound


The proof of achievability uses multicoding and joint typicality codebook generation. The
idea is illustrated in Figure .. For each message m j , j = 1, 2, we generate a subcode-
book C j (m j ) consisting of independently generated unj sequences. For each message pair
(m1 , m2 ), we find a jointly typical sequence pair (un1 , u2n ) in the product subcodebook
C1 (m1 ) × C2 (m2 ). Note that although these codeword pairs are dependent, they repre-
sent independent messages! To send (m1 , m2 ), we transmit a symbol-by-symbol function
x(u1i , u2i ), i ∈ [1 : n], of the selected sequence pair (u1n , u2n ). Each receiver decodes for its
intended codeword using joint typicality decoding.
A crucial requirement for Marton’s coding scheme to succeed is the existence of at least
one jointly typical sequence pair (u1n , u2n ) in the chosen product subcodebook C1 (m1 ) ×
C2 (m2 ). A sufficient condition on the subcodebook sizes to guarantee the existence of
such a sequence pair is provided in the following.

Figure .. Marton’s coding scheme.

Lemma . (Mutual Covering Lemma). Let (U0 , U1 , U2 ) ∼ p(u0 , u1 , u2 ) and є 󳰀 < є.
Let U0n ∼ p(u0n ) be a random sequence such that limn→∞ P{U0n ∈ Tє(n) 󳰀 } = 1. Let

U1 (m1 ), m1 ∈ [1 : 2 ], be pairwise conditionally independent random sequences, each


n nr1

distributed according to ∏ni=1 pU1 |U0 (u1i |u0i ). Similarly, let U2n (m2 ), m2 ∈ [1 : 2nr2 ], be
pairwise conditionally independent random sequences, each distributed according to
∏ni=1 pU2 |U0 (u2i |u0i ). Assume that (U1n (m1 ) : m1 ∈ [1 : 2nr1 ]) and (U2n (m2 ) : m2 ∈ [1 :
2nr2 ]) are conditionally independent given U0n . Then, there exists δ(є) that tends to
zero as є → 0 such that

lim P󶁁(U0n , U1n (m1 ), U2n (m2 )) ∉ Tє(n) for all (m1 , m2 ) ∈ [1 : 2nr1 ] × [1 : 2nr2 ]󶁑 = 0,
n→∞

if r1 + r2 > I(U1 ; U2 |U0 ) + δ(є).

The proof of this lemma is given in Appendix A. Note that the mutual covering
lemma extends the covering lemma in two ways:
∙ By considering a single U1n sequence (r1 = 0), we obtain the same rate requirement
r2 > I(U1 ; U2 |U0 ) + δ(є) as in the covering lemma.
∙ The lemma requires only pairwise conditional independence among the sequences
U1n (m1 ), m1 ∈ [1 : 2nr1 ] (and among the sequences U2n (m2 ), m2 ∈ [1 : 2nr2 ]). This im-
plies, for example, that it suffices to use linear codes for lossy source coding of a binary
symmetric source with Hamming distortion.
We are now ready to present the details of the proof of Marton’s inner bound in The-
orem ..

Codebook generation. Fix a pmf p(u1 , u2 ) and function x(u1 , u2 ) and let R̃1 ≥ R1 , R̃2 ≥
R2 . For each message m1 ∈ [1 : 2nR1 ] generate a subcodebook C1 (m1 ) consisting of 2n(R̃1 −R1 )
randomly and independently generated sequences u1n (l1 ), l1 ∈ [(m1 − 1)2n(R̃1 −R1 ) + 1 :
m1 2n(R̃1 −R1 ) ], each according to ∏ni=1 pU1 (u1i ). Similarly, for each message m2 ∈ [1 : 2nR2 ]
generate a subcodebook C2 (m2 ) consisting of 2n(R̃2 −R2 ) randomly and independently gen-
erated sequences u2n (l2 ), l2 ∈ [(m2 − 1)2n(R̃2 −R2 ) + 1 : m2 2n(R̃2 −R2 ) ], each according to
∏ni=1 pU2 (u2i ).
For each (m1 , m2 ) ∈ [1 : 2nR1 ] × [1 : 2nR2 ], find an index pair (l1 , l2 ) such that u1n (l1 ) ∈
C1 (m1 ), u2n (l2 ) ∈ C2 (m2 ), and (u1n (l1 ), u2n (l2 )) ∈ Tє′(n) . If there is more than one such pair,
choose an arbitrary one among those. If no such pair exists, choose (l1 , l2 ) = (1, 1). Then
generate x n (m1 , m2 ) as xi (m1 , m2 ) = x(u1i (l1 ), u2i (l2 )), i ∈ [1 : n].
Encoding. To send the message pair (m1 , m2 ), transmit x n (m1 , m2 ).
Decoding. Let є > є′. Decoder 1 declares that m̂1 is sent if it is the unique message such
that (u1n (l1 ), y1n ) ∈ Tє(n) for some u1n (l1 ) ∈ C1 (m̂1 ); otherwise it declares an error. Similarly,
decoder 2 finds the unique message m̂2 such that (u2n (l2 ), y2n ) ∈ Tє(n) for some u2n (l2 ) ∈
C2 (m̂2 ).
Analysis of the probability of error. Assume without loss of generality that (M1 , M2 ) =
(1, 1) and let (L1 , L2 ) be the index pair of the chosen sequences (U1n (L1 ), U2n (L2 )) ∈
C1 (1) × C2 (1). Then decoder 1 makes an error only if one or more of the following events
occur:

E0 = {(U1n (l1 ), U2n (l2 )) ∉ Tє′(n) for all (U1n (l1 ), U2n (l2 )) ∈ C1 (1) × C2 (1)},
E11 = {(U1n (L1 ), Y1n ) ∉ Tє(n) },
E12 = {(U1n (l1 ), Y1n ) ∈ Tє(n) (U1 , Y1 ) for some U1n (l1 ) ∉ C1 (1)}.

Thus the probability of error for decoder 1 is upper bounded as

P(E1 ) ≤ P(E0 ) + P(E0c ∩ E11 ) + P(E12 ).

To bound P(E0 ), we note that the subcodebook C1 (1) consists of 2n(R̃1 −R1 ) i.i.d. U1n (l1 ) se-
quences and the subcodebook C2 (1) consists of 2n(R̃2 −R2 ) i.i.d. U2n (l2 ) sequences. Hence,
by the mutual covering lemma (with U0 = ∅, r1 = R̃1 − R1 , and r2 = R̃2 − R2 ), P(E0 ) tends
to zero as n → ∞ if (R̃1 − R1 ) + (R̃2 − R2 ) > I(U1 ; U2 ) + δ(є′).
To bound P(E0c ∩ E11 ), note that since (U1n (L1 ), U2n (L2 )) ∈ Tє′(n) and є′ < є, then by the
conditional typicality lemma in Section ., P{(U1n (L1 ), U2n (L2 ), X n , Y1n ) ∉ Tє(n) } tends to
zero as n → ∞. Hence, P(E0c ∩ E11 ) tends to zero as n → ∞.
To bound P(E12 ), note that since U1n (l1 ) ∼ ∏ni=1 pU1 (u1i ) and Y1n is independent of
every U1n (l1 ) ∉ C1 (1), then by the packing lemma, P(E12 ) tends to zero as n → ∞ if R̃1 <
I(U1 ; Y1 ) − δ(є). Similarly, the average probability of error P(E2 ) for decoder 2 tends to
zero as n → ∞ if R̃2 < I(U2 ; Y2 ) − δ(є) and (R̃1 − R1 ) + (R̃2 − R2 ) > I(U1 ; U2 ) + δ(є′).

Thus, the average probability of error P(E) tends to zero as n → ∞ if the rate pair (R1 , R2 )
satisfies the inequalities

R1 ≤ R̃ 1 ,
R2 ≤ R̃ 2 ,
R̃ 1 < I(U1 ; Y1 ) − δ(є),
R̃ 2 < I(U2 ; Y2 ) − δ(є),
R1 + R2 < R̃ 1 + R̃ 2 − I(U1 ; U2 ) − δ(є′)

for some (R̃ 1 , R̃ 2 ), or equivalently, if

R1 < I(U1 ; Y1 ) − δ(є),


R2 < I(U2 ; Y2 ) − δ(є),
R1 + R2 < I(U1 ; Y1 ) + I(U2 ; Y2 ) − I(U1 ; U2 ) − δ(є′).

This completes the proof of achievability.

8.3.3 Relationship to Gelfand–Pinsker Coding


Marton’s coding scheme is closely related to the Gelfand–Pinsker coding scheme for the
DMC with DM state when the state information is available noncausally at the encoder;
see Section .. Fix a pmf p(u1 , u2 ) and function x(u1 , u2 ) in Marton’s inner bound and
consider the achievable rate pair R1 < I(U1 ; Y1 ) − I(U1 ; U2 ) and R2 < I(U2 ; Y2 ). As shown
in Figure ., the coding scheme for communicating M1 to receiver  is identical to that for
communicating M1 over the channel p(y1 |u1 , u2 ) = p(y1 |x(u1 , u2 )) with state U2 available
noncausally at the encoder.

Figure .. An interpretation of Marton’s coding scheme at a corner point.

Application to the Gaussian Broadcast Channel. We revisit the Gaussian BC studied in


Section ., with outputs

Y1 = X + Z1 ,
Y2 = X + Z2 ,

where Z1 ∼ N(0, N1 ) and Z2 ∼ N(0, N2 ). As before, assume average power constraint P on


X and define the received SNRs as S j = P/N j , j = 1, 2. We show that a rate pair (R1 , R2 )
is achievable if

R1 < C(αS1 ),
R2 < C(ᾱS2 /(αS2 + 1))

for some α ∈ [0, 1] without using superposition coding and successive cancellation de-
coding, and even when N1 > N2 .
Decompose the channel input X into the sum of two independent parts X1 ∼ N(0, αP)
and X2 ∼ N(0, ᾱP) as shown in Figure .. To send M2 to Y2 , consider the channel
Y2 = X2 + X1 + Z2 with input X2 , Gaussian interference signal X1 , and Gaussian noise Z2 .
Treating the interference signal X1 as noise, M2 can be communicated reliably to receiver 2
if R2 < C(ᾱS2 /(αS2 + 1)). To send M1 to receiver 1, consider the channel Y1 = X1 + X2 + Z1
with input X1 , additive Gaussian state X2 , and Gaussian noise Z1 , where the state X2n (M2 )
is known noncausally at the encoder. By the writing on dirty paper result (which is a spe-
cial case of Gelfand–Pinsker coding and hence Marton coding), M1 can be communicated
reliably to receiver 1 if R1 < C(αS1 ).
The above heuristic argument can be made rigorous by considering Marton’s inner
bound with input cost, setting U2 = X2 , U1 = βU2 + X1 , and X = X1 + X2 = (U1 − βU2 ) +
U2 , where X1 ∼ N(0, αP) and X2 ∼ N(0, ᾱP) are independent, and β = αP/(αP + N1 ),
and using the discretization procedure in Section ..
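For jointly Gaussian variables these Gelfand–Pinsker rates can be computed in closed form from covariances. The following sketch (an illustration with arbitrary test values P, N1 , N2 , α) confirms that U2 = X2 and U1 = βU2 + X1 with β = αP/(αP + N1 ) give I(U1 ; Y1 ) − I(U1 ; U2 ) = C(αS1 ), and that treating X1 as noise gives I(U2 ; Y2 ) = C(ᾱS2 /(αS2 + 1)).

    import numpy as np

    def mi(va, vb, cab):
        # I(A; B) in bits for jointly Gaussian scalars: variances va, vb, covariance cab
        return 0.5 * np.log2(va * vb / (va * vb - cab ** 2))

    C = lambda x: 0.5 * np.log2(1 + x)

    P, N1, N2, a = 10.0, 1.0, 4.0, 0.6        # arbitrary test values
    P1, P2 = a * P, (1 - a) * P               # powers of X1 and X2
    b = P1 / (P1 + N1)                        # writing on dirty paper coefficient

    # U2 = X2, U1 = b*X2 + X1, Y1 = X1 + X2 + Z1, Y2 = X1 + X2 + Z2
    R1 = mi(b**2 * P2 + P1, P + N1, b * P2 + P1) - mi(b**2 * P2 + P1, P2, b * P2)
    R2 = mi(P2, P + N2, P2)
    print(R1, C(P1 / N1))                     # both equal C(alpha * S1)
    print(R2, C(P2 / (P1 + N2)))              # both equal C(abar*S2 / (alpha*S2 + 1))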

Figure .. Writing on dirty paper for the Gaussian BC.

In Section ., we extend this scheme to the multiple-input multiple-output (MIMO)


Gaussian BC, which is not degraded in general, and show that a vector version of writing
on dirty paper achieves the capacity region.

8.4 MARTON’S INNER BOUND WITH COMMON MESSAGE

Marton’s inner bound can be extended to the case of common and private messages by
using superposition coding and rate splitting in addition to multicoding and joint typi-
cality codebook generation. This yields the following inner bound on the capacity of the
DM-BC.

Theorem . (Marton’s Inner Bound with Common Message). A rate triple (R0 ,
R1 , R2 ) is achievable for the DM-BC p(y1 , y2 |x) if

R0 + R1 < I(U0 , U1 ; Y1 ),
R0 + R2 < I(U0 , U2 ; Y2 ),
R0 + R1 + R2 < I(U0 , U1 ; Y1 ) + I(U2 ; Y2 |U0 ) − I(U1 ; U2 |U0 ),
R0 + R1 + R2 < I(U1 ; Y1 |U0 ) + I(U0 , U2 ; Y2 ) − I(U1 ; U2 |U0 ),
2R0 + R1 + R2 < I(U0 , U1 ; Y1 ) + I(U0 , U2 ; Y2 ) − I(U1 ; U2 |U0 )

for some pmf p(u0 , u1 , u2 ) and function x(u0 , u1 , u2 ), where |U0 | ≤ |X | + 4, |U1 | ≤ |X |,
and |U2 | ≤ |X |.

This inner bound is tight for all classes of DM-BCs with known capacity regions. In
particular, the capacity region of the deterministic BC (Y1 = y1 (X), Y2 = y2 (X)) with com-
mon message is the set of rate triples (R0 , R1 , R2 ) such that
R0 ≤ min{I(U ; Y1 ), I(U ; Y2 )},
R0 + R1 < H(Y1 ),
R0 + R2 < H(Y2 ),
R0 + R1 + R2 < H(Y1 ) + H(Y2 |U , Y1 ),
R0 + R1 + R2 < H(Y1 |U , Y2 ) + H(Y2 )
for some pmf p(u, x). It is not known, however, if this inner bound is tight in general.
Proof of Theorem . (outline). Divide M j , j = 1, 2, into two independent messages
M j0 at rate R j0 and M j j at rate R j j . Hence R j = R j0 + R j j . Randomly and indepen-
dently generate 2n(R0 +R10 +R20 ) sequences u0n (m0 , m10 , m20 ). For each (m0 , m10 , m20 , m11 ),
generate a subcodebook C1 (m0 , m10 , m20 , m11 ) consisting of 2n(R̃11 −R11 ) independent se-
quences u1n (m0 , m10 , m20 , l11 ), l11 ∈ [(m11 − 1)2n(R̃11 −R11 ) + 1 : m11 2n(R̃11 −R11 ) ]. Similarly,
for each (m0 , m10 , m20 , m22 ), generate a subcodebook C2 (m0 , m10 , m20 , m22 ) consisting
of 2n(R̃22 −R22 ) independent sequences u2n (m0 , m10 , m20 , l22 ), l22 ∈ [(m22 − 1)2n(R̃22 −R22 ) + 1 :
m22 2n(R̃22 −R22 ) ]. As in the codebook generation step in Marton’s coding scheme for the
private-message case, for each (m0 , m10 , m11 , m20 , m22 ), find an index pair (l11 , l22 ) such
that unj (m0 , m10 , m20 , l j j ) ∈ C j (m0 , m10 , m20 , m j j ), j = 1, 2, and

(u1n (m0 , m10 , m20 , l11 ), u2n (m0 , m10 , m20 , l22 )) ∈ Tє′(n) ,

and generate x n (m0 , m10 , m11 , m20 , m22 ) as

xi = x(u0i (m0 , m10 , m20 ), u1i (m0 , m10 , m20 , l11 ), u2i (m0 , m10 , m20 , l22 ))

for i ∈ [1 : n]. To send the message triple (m0 , m1 , m2 ) = (m0 , m10 , m11 , m20 , m22 ), trans-
mit x n (m0 , m10 , m20 , m11 , m22 ).
Receiver j = 1, 2 uses joint typicality decoding to find the unique message triple (m̂0 j ,
m̂ j0 , m̂ j j ). Following similar steps to the proof for the private-message case and using the
Fourier–Motzkin elimination procedure, it can be shown that the probability of decoding
error tends to zero as n → ∞ if the inequalities in Theorem . are satisfied. The cardinal-
ity bounds of the auxiliary random variables can be proved using the perturbation method
in Appendix C.
Remark .. The inner bound in Theorem . can be equivalently characterized as the
set of rate triples (R0 , R1 , R2 ) such that

R0 < min{I(U0 ; Y1 ), I(U0 ; Y2 )},


R0 + R1 < I(U0 , U1 ; Y1 ),
R0 + R2 < I(U0 , U2 ; Y2 ), (.)
R0 + R1 + R2 < I(U0 , U1 ; Y1 ) + I(U2 ; Y2 |U0 ) − I(U1 ; U2 |U0 ),
R0 + R1 + R2 < I(U1 ; Y1 |U0 ) + I(U0 , U2 ; Y2 ) − I(U1 ; U2 |U0 )

for some pmf p(u0 , u1 , u2 ) and function x(u0 , u1 , u2 ). This region can be achieved without
rate splitting.

Marton’s private-message inner bound with U0 . By setting R0 = 0 in Theorem ., we


obtain the following inner bound.

Proposition .. A rate pair (R1 , R2 ) is achievable for the DM-BC p(y1 , y2 |x) if

R1 < I(U0 , U1 ; Y1 ),
R2 < I(U0 , U2 ; Y2 ),
R1 + R2 < I(U0 , U1 ; Y1 ) + I(U2 ; Y2 |U0 ) − I(U1 ; U2 |U0 ),
R1 + R2 < I(U1 ; Y1 |U0 ) + I(U0 , U2 ; Y2 ) − I(U1 ; U2 |U0 )

for some pmf p(u0 , u1 , u2 ) and function x(u0 , u1 , u2 ).

This inner bound is tight for all classes of broadcast channels with known private-
message capacity regions. Furthermore, it can be shown that the special case of U0 = ∅
(that is, Marton’s inner bound in Theorem .) is not tight in general for the degraded BC.

8.5 OUTER BOUNDS

First consider the following outer bound on the private-message capacity region of the
DM-BC.

Theorem .. If a rate pair (R1 , R2 ) is achievable for the DM-BC p(y1 , y2 |x), then it
must satisfy the inequalities

R1 ≤ I(U1 ; Y1 ),
R2 ≤ I(U2 ; Y2 ),
R1 + R2 ≤ min󶁁I(U1 ; Y1 ) + I(X; Y2 |U1 ), I(U2 ; Y2 ) + I(X; Y1 |U2 )󶁑

for some pmf p(u1 , u2 , x).

The proof of this theorem follows similar steps to the converse proof for the more
capable DM-BC in Section .. The outer bound in the theorem is tight for all broadcast
channels that we presented so far in this chapter and in Chapter , but not tight in general.
In the following, we discuss two applications of this outer bound.

8.5.1 A BSC and A BEC


We revisit Example . in Section ., where the channel from X to Y1 is a BSC(p) and
the channel from X to Y2 is a BEC(є). As mentioned in the example, when H(p) < є < 1,
this DM-BC does not belong to any class with known capacity region. Nevertheless, the
private-message capacity region for this range of parameter values is still achievable using
superposition coding and is given by the set of rate pairs (R1 , R2 ) such that

R2 ≤ I(U ; Y2 ),
(.)
R1 + R2 ≤ I(U ; Y2 ) + I(X; Y1 |U )

for some pmf p(u, x) with X ∼ Bern(1/2).


Proof of achievability. Recall the superposition coding inner bound that consists of all
rate pairs (R1 , R2 ) such that

R2 < I(U ; Y2 ),
R1 + R2 < I(U ; Y2 ) + I(X; Y1 |U),
R1 + R2 < I(X; Y1 )

for some pmf p(u, x). Now it can be easily shown that if H(p) < є < 1 and X ∼ Bern(1/2),
then I(U ; Y2 ) ≤ I(U ; Y1 ) for all p(u|x). Hence, the third inequality in the superposition
coding inner bound is inactive and any rate pair in the capacity region is achievable.
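The key claim used above, that I(U ; Y2 ) ≤ I(U ; Y1 ) for every p(u|x) when X ∼ Bern(1/2) and H(p) < є < 1, can be spot-checked by sampling random auxiliary channels. The sketch below (an informal numerical check, not a proof; |U| = 4 and the pair (p, є) = (0.1, 0.6) are arbitrary choices in the stated range) reports the largest observed gap, which should be nonpositive.

    import numpy as np

    def mi(pUV):
        # Mutual information in bits from a joint pmf matrix pUV[u, v]
        pU = pUV.sum(axis=1, keepdims=True)
        pV = pUV.sum(axis=0, keepdims=True)
        mask = pUV > 0
        return np.sum(pUV[mask] * np.log2(pUV[mask] / (pU @ pV)[mask]))

    p, eps = 0.1, 0.6                                      # H(0.1) ~ 0.47 < 0.6 < 1
    W1 = np.array([[1 - p, p], [p, 1 - p]])                # BSC(p): p(y1 | x)
    W2 = np.array([[1 - eps, 0, eps], [0, 1 - eps, eps]])  # BEC(eps): outputs {0, 1, e}
    pX = np.array([0.5, 0.5])

    rng = np.random.default_rng(1)
    worst = -np.inf
    for _ in range(2000):
        pU_given_X = rng.dirichlet(np.ones(4), size=2)     # random p(u | x), |U| = 4
        pUX = (pX[:, None] * pU_given_X).T                 # joint p(u, x)
        worst = max(worst, mi(pUX @ W2) - mi(pUX @ W1))    # I(U;Y2) - I(U;Y1)
    print(worst)                                           # should be <= 0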
Proof of the converse. We relax the inequalities in Theorem . by replacing the first

inequality with R1 ≤ I(X; Y1 ) and dropping the first term inside the minimum in the third
inequality. Now setting U2 = U, we obtain an outer bound on the capacity region that
consists of all rate pairs (R1 , R2 ) such that
R2 ≤ I(U ; Y2 ),
R1 + R2 ≤ I(U ; Y2 ) + I(X; Y1 |U), (.)
R1 ≤ I(X; Y1 )
for some pmf p(u, x). We first show that it suffices to consider only joint pmfs p(u, x)
with X ∼ Bern(1/2). Given a pair (U, X) ∼ p(u, x), let W′ ∼ Bern(1/2) and (U′, X̃) be
defined as

P{U′ = u, X̃ = x | W′ = w} = pU,X (u, x ⊕ w).

Let (Ỹ1 , Ỹ2 ) be the channel output pair corresponding to the input X̃. Then, by the symmetries
in the input construction and the channels, it is easy to check that X̃ ∼ Bern(1/2) and

I(U ; Y2 ) = I(U′; Ỹ2 |W′) ≤ I(U′, W′; Ỹ2 ) = I(Ũ; Ỹ2 ),
I(X; Y1 |U ) = I(X̃; Ỹ1 |U′, W′) = I(X̃; Ỹ1 |Ũ),
I(X; Y1 ) = I(X̃; Ỹ1 |W′) ≤ I(X̃; Ỹ1 ),

where Ũ = (U′, W′). This proves the sufficiency of X ∼ Bern(1/2). As mentioned in the
proof of achievability, this implies that I(U ; Y2 ) ≤ I(U ; Y1 ) for all p(u|x) and hence the
outer bound in (.) reduces to the characterization of the capacity region in (.).

8.5.2 Binary Skew-Symmetric Broadcast Channel


The binary skew-symmetric broadcast channel consists of two symmetric Z channels as
depicted in Figure ..

Figure .. Binary skew-symmetric broadcast channel.

We show that the private-message sum-capacity Csum = max{R1 + R2 : (R1 , R2 ) ∈ C }


for this channel is bounded as
0.3616 ≤ Csum ≤ 0.3726. (.)

The lower bound is achieved by the following randomized time-sharing technique. We


divide message M j , j = 1, 2, into two independent messages M j0 and M j j .

Codebook generation. Randomly and independently generate 2n(R10 +R20 ) sequences


un (m10 , m20 ), each i.i.d. Bern(1/2). For each (m10 , m20 ), let k(m10 , m20 ) be the number of
locations where ui (m10 , m20 ) = 1. Randomly and conditionally independently generate
2nR11 sequences x k(m10 ,m20 ) (m10 , m20 , m11 ), each i.i.d. Bern(α). Similarly, randomly and
conditionally independently generate 2nR22 sequences x n−k(m10 ,m20 ) (m10 , m20 , m22 ), each
i.i.d. Bern(ᾱ).
Encoding. To send the message pair (m1 , m2 ), represent it by the message quadruple (m10 ,
m20 , m11 , m22 ). Transmit x k(m10 ,m20 ) (m10 , m20 , m11 ) in the locations where ui (m10 , m20 ) =
1 and x n−k(m10 ,m20 ) (m10 , m20 , m22 ) in the locations where ui (m10 , m20 ) = 0. Thus the mes-
sages m11 and m22 are transmitted using time sharing with respect to un (m10 , m20 ).
Decoding. Each decoder j = 1, 2 first decodes for un (m10 , m20 ) and then proceeds to
recover m j j from the output subsequence that corresponds to the respective symbol loca-
tions in un (m10 , m20 ). It is straightforward to show that a rate pair (R1 , R2 ) is achievable
using this scheme if

R1 < min{I(U ; Y1 ), I(U ; Y2 )} + (1/2) I(X; Y1 |U = 1),
R2 < min{I(U ; Y1 ), I(U ; Y2 )} + (1/2) I(X; Y2 |U = 0), (.)
R1 + R2 < min{I(U ; Y1 ), I(U ; Y2 )} + (1/2)(I(X; Y1 |U = 1) + I(X; Y2 |U = 0))
for some α ∈ [0, 1]. Taking the maximum of the sum-rate bound over α establishes the
sum-capacity lower bound Csum ≥ 0.3616.
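The number 0.3616 can be reproduced by a direct numerical evaluation of the sum-rate bound above. The sketch below is illustrative and not part of the original text; it assumes Python with NumPy and one particular orientation of the two mirrored Z-channels (input 1 noisy toward receiver 1, input 0 noisy toward receiver 2). It sweeps α over a grid and evaluates min{I(U ; Y1 ), I(U ; Y2 )} + (1/2)(I(X; Y1 |U = 1) + I(X; Y2 |U = 0)).

import numpy as np

def entropy(v):
    v = v[v > 0]
    return -np.sum(v * np.log2(v))

def mutual_info(px, W):
    """I(X;Y) for input pmf px and channel matrix W = p(y|x)."""
    pxy = px[:, None] * W
    return entropy(px) + entropy(pxy.sum(axis=0)) - entropy(pxy.ravel())

# Two mirrored Z-channels (assumed orientation): rows x = 0, 1; columns y = 0, 1.
W1 = np.array([[1.0, 0.0], [0.5, 0.5]])   # p(y1|x): input 1 is noisy
W2 = np.array([[0.5, 0.5], [0.0, 1.0]])   # p(y2|x): input 0 is noisy

best = 0.0
for alpha in np.linspace(0.0, 1.0, 4001):
    pu = np.array([0.5, 0.5])             # U ~ Bern(1/2)
    px_u1 = np.array([1 - alpha, alpha])  # X ~ Bern(alpha)   when U = 1
    px_u0 = np.array([alpha, 1 - alpha])  # X ~ Bern(1-alpha) when U = 0
    # Effective channels from U to Y_j, averaging over X given U.
    W1_u = np.vstack([px_u0 @ W1, px_u1 @ W1])
    W2_u = np.vstack([px_u0 @ W2, px_u1 @ W2])
    common = min(mutual_info(pu, W1_u), mutual_info(pu, W2_u))
    sum_rate = common + 0.5 * (mutual_info(px_u1, W1) + mutual_info(px_u0, W2))
    best = max(best, sum_rate)
print(best)   # should evaluate to approximately 0.3616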
Note that by considering U0 ∼ Bern(1/2), U1 ∼ Bern(α), and U2 ∼ Bern(α), indepen-
dent of each other, and setting X = U0U1 + (1 − U0 )(1 − U2 ), Marton’s inner bound with
U0 in Proposition . reduces to the randomized time-sharing rate region in (.). Fur-
thermore, it can be shown that the best achievable sum-rate using Marton’s coding scheme
is 0.3616, which is the rate achieved by randomized time-sharing.
The upper bound in (.) follows by Theorem ., which implies that

Csum ≤ max min󶁁I(U1 ; Y1 ) + I(U2 ; Y2 |U1 ), I(U2 ; Y2 ) + I(U1 ; Y1 |U2 )󶁑


p(u1 ,u2 ,x)
1
≤ max 󶀡I(U1 ; Y1 ) + I(U2 ; Y2 |U1 ) + I(U2 ; Y2 ) + I(U1 ; Y1 |U2 )󶀱.
p(u1 ,u2 ,x) 2

It can be shown that the latter maximum is attained by X ∼ Bern(1/2) and binary U1 and
U2 , which gives the upper bound on the sum-capacity Csum ≤ 0.3726.

8.5.3 Nair–El Gamal Outer Bound


We now consider the following outer bound on the capacity region of the DM-BC with
common and private messages.

Theorem . (Nair–El Gamal Outer Bound). If a rate triple (R0 , R1 , R2 ) is achievable
for the DM-BC p(y1 , y2 |x), then it must satisfy the inequalities

R0 ≤ min{I(U0 ; Y1 ), I(U0 ; Y2 )},


R0 + R1 ≤ I(U0 , U1 ; Y1 ),
R0 + R2 ≤ I(U0 , U2 ; Y2 ),
R0 + R1 + R2 ≤ I(U0 , U1 ; Y1 ) + I(U2 ; Y2 |U0 , U1 ),
R0 + R1 + R2 ≤ I(U1 ; Y1 |U0 , U2 ) + I(U0 , U2 ; Y2 )

for some pmf p(u0 , u1 , u2 , x) = p(u1 )p(u2 )p(u0 |u1 , u2 ) and function x(u0 , u1 , u2 ).

The proof of this outer bound is quite similar to the converse proof for the more ca-
pable DM-BC in Section . and is given in Appendix B.
The Nair–El Gamal bound is tight for all broadcast channels with known capacity re-
gions that we discussed so far. Note that the outer bound coincides with Marton’s inner
bound with common message in Theorem . (or more directly, the alternative character-
ization in (.)) if I(U1 ; U2 |U0 , Y1 ) = I(U1 ; U2 |U0 , Y2 ) = 0 for every pmf p(u0 , u1 , u2 , x)
that attains a boundary point of the outer bound. This bound is not tight in general, how-
ever.
Remark .. It can be shown that the Nair–El Gamal outer bound with no common mes-
sage, i.e., with R0 = 0, simplifies to the outer bound in Theorem ., that is, no U0 is needed
when evaluating the outer bound in Theorem . with R0 = 0. This is in contrast to Mar-
ton’s inner bound, where the private-message region without U0 in Theorem . can be
strictly smaller than that with U0 in Proposition ..

8.6 INNER BOUNDS FOR MORE THAN TWO RECEIVERS

Marton’s inner bound can be easily extended to more than two receivers. For example,
consider a -receiver DM-BC p(y1 , y2 , y3 |x) with three private messages M j ∈ [1 : 2nR 󰑗 ],
j = 1, 2, 3. A rate triple (R1 , R2 , R3 ) is achievable for the -receiver DM-BC if
R1 < I(U1 ; Y1 ),
R2 < I(U2 ; Y2 ),
R3 < I(U3 ; Y3 ),
R1 + R2 < I(U1 ; Y1 ) + I(U2 ; Y2 ) − I(U1 ; U2 ),
R1 + R3 < I(U1 ; Y1 ) + I(U3 ; Y3 ) − I(U1 ; U3 ),
R2 + R3 < I(U2 ; Y2 ) + I(U3 ; Y3 ) − I(U2 ; U3 ),
R1 + R2 + R3 < I(U1 ; Y1 ) + I(U2 ; Y2 ) + I(U3 ; Y3 ) − I(U1 ; U2 ) − I(U1 , U2 ; U3 )
for some pmf p(u1 , u2 , u3 ) and function x(u1 , u2 , u3 ). This region can be readily extended
to any number of receivers k.

It can be readily shown that the extended Marton inner bound is tight for deterministic
DM-BCs with an arbitrary number of receivers, where we substitute U j = Y j for j ∈ [1 :
k]. The proof of achievability follows similar steps to the case of two receivers using the
following extension of the mutual covering lemma.

Lemma . (Multivariate Covering Lemma). Let (U0 , U1, . . . , Uk ) ∼ p(u0 , u1, . . . , uk )
and є′ < є. Let U0n ∼ p(u0n ) be a random sequence with limn→∞ P{U0n ∈ Tє′(n) } = 1.
For each j ∈ [1 : k], let Ujn (mj ), mj ∈ [1 : 2nrj ], be pairwise conditionally independent
random sequences, each distributed according to ∏i=1n pUj |U0 (uji |u0i ). Assume that
(U1n (m1 ) : m1 ∈ [1 : 2nr1 ]), (U2n (m2 ) : m2 ∈ [1 : 2nr2 ]), . . . , (Ukn (mk ) : mk ∈ [1 : 2nrk ]) are
mutually conditionally independent given U0n . Then, there exists δ(є) that tends to zero
as є → 0 such that

limn→∞ P{(U0n , U1n (m1 ), U2n (m2 ), . . . , Ukn (mk )) ∉ Tє(n) for all (m1 , m2 , . . . , mk )} = 0,

if ∑ j∈J r j > ∑ j∈J H(U j |U0 ) − H(U (J )|U0 ) + δ(є) for all J ⊆ [1 : k] with |J | ≥ 2.

The proof of this lemma is a straightforward extension of that for the k = 2 case pre-
sented in Appendix A. When k = 3 and U0 = , the conditions for joint typicality become
r1 + r2 > I(U1 ; U2 ) + δ(є),
r1 + r3 > I(U1 ; U3 ) + δ(є),
r2 + r3 > I(U2 ; U3 ) + δ(є),
r1 + r2 + r3 > I(U1 ; U2 ) + I(U1 , U2 ; U3 ) + δ(є).
In general, combining Marton’s coding with superposition coding, rate splitting, and
indirect decoding, we can construct an inner bound for DM-BCs with more than two
receivers for any given messaging requirement. We illustrate this construction via the -
receiver DM-BC p(y1 , y2 , y3 |x) with two degraded message sets, where a common mes-
sage M0 ∈ [1 : 2nR0 ] is to be communicated to receivers , , and , and a private message
M1 ∈ [1 : 2nR1 ] is to be communicated only to receiver .

Proposition .. A rate pair (R0 , R1 ) is achievable for the -receiver DM-BC
p(y1 , y2 , y3 |x) with  degraded message sets if
R0 < min{I(V2 ; Y2 ), I(V3 ; Y3 )},
2R0 < I(V2 ; Y2 ) + I(V3 ; Y3 ) − I(V2 ; V3 |U),
R0 + R1 < min{I(X; Y1 ), I(V2 ; Y2 ) + I(X; Y1 |V2 ), I(V3 ; Y3 ) + I(X; Y1 |V3 )},
2R0 + R1 < I(V2 ; Y2 ) + I(V3 ; Y3 ) + I(X; Y1 |V2 , V3 ) − I(V2 ; V3 |U),
2R0 + 2R1 < I(V2 ; Y2 ) + I(V3 ; Y3 ) + I(X; Y1 |V2 ) + I(X; Y1 |V3 ) − I(V2 ; V3 |U ),
2R0 + 2R1 < I(V2 ; Y2 ) + I(V3 ; Y3 ) + I(X; Y1 |U) + I(X; Y1 |V2 , V3 ) − I(V2 ; V3 |U )
for some pmf p(u, v2 , v3 , x) = p(u)p(v2 |u)p(x, v3 |v2 ) = p(u)p(v3 |u)p(x, v2 |v3 ).

It can be shown that this inner bound is tight when Y1 is less noisy than Y2 , which is a
generalization of the class of multilevel DM-BCs discussed earlier in Section ..

Proof of Proposition . (outline). Divide M1 into four independent messages M10 at
rate R10 , M11 at rate R11 , M12 at rate R12 , and M13 at rate R13 . Let R̃ 12 ≥ R12 and R̃ 13 ≥ R13 .
Randomly and independently generate 2n(R0 +R10 ) sequences un (m0 , m10 ), (m0 , m10 ) ∈ [1 :
2nR0 ] × [1 : 2nR10 ]. As in the achievability proof of Theorem ., for each (m0 , m10 ), we
use the codebook generation step in the proof of Marton’s inner bound in Theorem .
to randomly and conditionally independently generate 2nR̃12 sequences v2n (m0 , m10 , l2 ),
l2 ∈ [1 : 2nR̃12 ], and 2nR̃13 sequences v3n (m0 , m10 , l3 ), l3 ∈ [1 : 2nR̃13 ]. For each (m0 , m10 ,
m12 , m13 ), find a jointly typical sequence pair (v2n (m0 , m10 , l2 ), v3n (m0 , m10 , l3 )), where l2 ∈
[(m12 − 1)2n(R̃12 −R12 ) + 1 : m12 2n(R̃12 −R12 ) ] and l3 ∈ [(m13 − 1)2n(R̃13 −R13 ) + 1 : m13 2n(R̃13 −R13 ) ],
and randomly and conditionally independently generate 2nR11 sequences x n (m0 , m10 , m11 ,
m12 , m13 ). To send the message pair (m0 , m1 ), transmit x n (m0 , m10 , m11 , m12 , m13 ).
Receiver 1 uses joint typicality decoding to find the unique tuple (m̂ 01 , m̂ 10 , m̂ 11 , m̂ 12 ,
m̂ 13 ). Receiver j = 2, 3 uses indirect decoding to find the unique m̂ 0j through (un , vjn ).

Following a standard analysis of the probability of error and using the Fourier–Motzkin
elimination procedure, it can be shown that the probability of error tends to zero as n →
∞ if the inequalities in Proposition . are satisfied.

SUMMARY

∙ Indirect decoding
∙ Marton’s inner bound:
∙ Multidimensional subcodebook generation
∙ Generating correlated codewords for independent messages
∙ Tight for all BCs with known capacity regions
∙ A common auxiliary random variable is needed even when coding only for private
messages
∙ Mutual covering lemma and its multivariate extension
∙ Connection between Marton coding and Gelfand–Pinsker coding
∙ Writing on dirty paper achieves the capacity region of the Gaussian BC
∙ Randomized time sharing for the binary skew-symmetric BC
∙ Nair–El Gamal outer bound:
∙ Does not always coincide with Marton’s inner bound
∙ Not tight in general

∙ Open problems:
8.1. What is the capacity region of the general 3-receiver DM-BC with one common
message to all three receivers and one private message to one receiver?
8.2. Is superposition coding optimal for the general 3-receiver DM-BC with one mes-
sage to all three receivers and another message to two receivers?
8.3. What is the sum-capacity of the binary skew-symmetric broadcast channel?
8.4. Is Marton’s inner bound tight in general?

BIBLIOGRAPHIC NOTES

The capacity region of the -receiver DM-BC with degraded message sets was established
by Körner and Marton (b), who proved the strong converse using the technique of
images of sets (Körner and Marton c). The multilevel BC was introduced by Borade,
Zheng, and Trott (), who conjectured that the straightforward generalization of The-
orem . to three receivers in (.) is optimal. The capacity region of the multilevel BC
in Theorem . was established using indirect decoding by Nair and El Gamal ().
They also established the general inner bound for the -receiver DM-BC with degraded
message sets in Section ..
The inner bound on the private-message capacity region in (.) is a special case of
a more general inner bound by Cover (b) and van der Meulen (), which con-
sists of all rate pairs (R1 , R2 ) that satisfy the inequalities in Proposition . for some pmf
p(u0 )p(u1 )p(u2 ) and x(u0 , u1 , u2 ). The deterministic broadcast channel in Example . is
attributed to D. Blackwell. The capacity region of the Blackwell channel was established
by Gelfand (). The capacity region of the deterministic broadcast channel was estab-
lished by Pinsker (). The capacity region of the semideterministic broadcast channel
was established independently by Marton () and Gelfand and Pinsker (a). The
inner bounds in Theorem . and in (.) are due to Marton (). The mutual covering
lemma and its application in the proof of achievability are due to El Gamal and van der
Meulen (). The inner bound on the capacity region of the DM-BC with common
message in Theorem . is due to Liang (). The equivalence to Marton’s character-
ization in (.) was shown by Liang, Kramer, and Poor (). Gohari, El Gamal, and
Anantharam () showed that Marton’s inner bound without U0 is not optimal even for
a degraded DM-BC with no common message.
The outer bound in Theorem . is a direct consequence of the converse proof for
the more capable BC by El Gamal (). Nair and El Gamal () showed through the
binary skew-symmetric BC example that it is strictly tighter than the earlier outer bounds
by Sato (a) and Körner and Marton (Marton ). The outer bound in Theorem . is
due to Nair and El Gamal (). Other outer bounds can be found in Liang, Kramer, and
Shamai (), Gohari and Anantharam (), and Nair (). These bounds coincide
with the bound in Theorem . when there is no common message. It is not known,
however, if they are strictly tighter than the Nair–El Gamal outer bound. The application
of the outer bound in Theorem . to the BSC–BEC broadcast channel is due to Nair
(). The binary skew-symmetric channel was first studied by Hajek and Pursley (),
who showed through randomized time sharing that U0 is needed for the Cover–van der
Meulen inner bound even when there is no common message. Gohari and Anantharam
(), Nair and Wang (), and Jog and Nair () showed that the maximum sum-
rate of Marton’s inner bound (with U0 ) coincides with the randomized time-sharing lower
bound, establishing a strict gap between Marton’s inner bound with U0 and the outer
bound in Theorem .. Geng, Gohari, Nair, and Yu () established the capacity region
of the product of reversely semideterministic or more capable DM-BCs and provided an
ingenious counterexample that shows that the Nair–El Gamal outer bound is not tight.

PROBLEMS

.. Prove achievability of the extended superposition coding inner bound in (.) for
the -receiver multilevel DM-BC.
.. Consider the binary erasure multilevel product DM-BC in Example .. Show that
the extended superposition coding inner bound in (.) can be simplified to (.).
.. Prove the converse for the capacity region of the 3-receiver multilevel DM-BC in
Theorem .. (Hint: Use the converse proof for the degraded DM-BC for Y1 and
Y2 and the converse proof for the 2-receiver DM-BC with degraded message sets
for Y1 and Y3 .)
.. Consider the BSC–BEC broadcast channel in Section ... Show that if H(p) <
є < 1 and X ∼ Bern(1/2), then I(U ; Y2 ) ≤ I(U ; Y1 ) for all p(u|x).
.. Complete the proof of the mutual covering lemma for the case U0 ̸= ∅.
.. Show that the inner bound on the capacity region of the 3-receiver DM-BC with
two degraded message sets in Proposition . is tight when Y1 is less noisy than Y2 .
.. Complete decoding for the 3-receiver multilevel DM-BC. Show that the fourth in-
equality in (.) is redundant and thus that the region simplifies to the capacity re-
gion in Theorem .. (Hint: Given the choice of U , consider two cases I(U ; Y2 ) ≤
I(U ; Y3 ) and I(U ; Y2 ) > I(U ; Y3 ) separately, and optimize V .)
.. Sato’s outer bound. Show that if a rate pair (R1 , R2 ) is achievable for the DM-BC
p(y1 , y2 |x), then it must satisfy

R1 ≤ I(X; Y1 ),
R2 ≤ I(X; Y2 ),
R1 + R2 ≤ min_{p̃(y1 ,y2 |x)} I(X; Y1 , Y2 )

for some pmf p(x), where the minimum is over all conditional pmfs p̃(y1 , y2 |x)
with the same conditional marginal pmfs p(y1 |x) and p(y2 |x) as the given channel
conditional pmf p(y1 , y2 |x).
.. Write-once memory. Consider a memory that can store bits without any noise. But
unlike a regular memory, each cell of the memory is permanently programmed
to zero once a zero is stored. Thus, the memory can be modeled as a DMC with
state Y = XS, where S is the previous state of the memory, X is the input to the
memory, and Y is the output of the memory.
Suppose that the initial state of the memory is Si = 1, i ∈ [1 : n]. We wish to
use the memory twice at rates R1 and R2 , respectively. For the first write, we store
data with input X1 and output Y1 = X1 . For the second write, we store data with
input X2 and output Y2 = X1 X2 .
(a) Show that the capacity region for two writes over the memory is the set of rate
pairs (R1 , R2 ) such that

R1 ≤ H(α),
R2 ≤ 1 − α

for some α ∈ [0, 1/2].


(b) Find the sum-capacity.
(c) Now suppose we use the memory k times at rates R1 , . . . , Rk . The memory at
stage k has Yk = X1 X2 ⋅ ⋅ ⋅ Xk . Find the capacity region and sum-capacity.
Remark: This result is due to Wolf, Wyner, Ziv, and Körner ().
.. Marton inner bound with common message. Consider the Marton inner bound
with common message in Theorem .. Provide the details of the coding scheme
outlined below.
(a) Using the packing lemma and the mutual covering lemma, show that the prob-
ability of error tends to zero as n → ∞ if

R0 + R01 + R02 + R̃ 11 < I(U0 , U1 ; Y1 ),


R0 + R01 + R02 + R̃ 22 < I(U0 , U2 ; Y2 ),
R̃ 11 < I(U1 ; Y1 |U0 ),
R̃ 22 < I(U2 ; Y2 |U0 ),
R̃ 11 + R̃ 22 − R11 − R22 > I(U1 ; U2 |U0 ).

(b) Establish the inner bound by using the Fourier–Motzkin elimination proce-
dure to reduce the set of inequalities in part (a).
(c) Show that the inner bound is convex.
.. Maximal probability of error. Consider a (2nR1 , 2nR2 , n) code C for the DM-BC
p(y1 , y2 |x) with average probability of error Pe(n) . In this problem, we show that
there exists a (2nR1 /n2 , 2nR2 /n2 , n) code with maximal probability of error less than
Appendix 8A Proof of the Mutual Covering Lemma 223

(Pe(n) )1/2 . Consequently, the capacity region with maximal probability of error
is the same as that with average probability of error. This is in contrast to the
DM-MAC case, where the capacity region for maximal probability of error can
be strictly smaller than that for average probability of error; see Problem ..
(a) A codeword xn (m1 , m2 ) is said to be "bad" if its probability of error λm1 m2 =
P{(M̂ 1 , M̂ 2 ) ̸= (M1 , M2 ) | (M1 , M2 ) = (m1 , m2 )} > (Pe(n) )1/2 . Show that there are
at most 2n(R1 +R2 ) (Pe(n) )1/2 "bad" codewords xn (m1 , m2 ).
(b) Randomly and independently permute the message indices m1 and m2 to gen-
erate a new codebook C̄ consisting of codewords xn (σ1 (m1 ), σ2 (m2 )), where σ1
and σ2 denote the random permutations. Then partition the codebook C̄ into
subcodebooks C̄(m′1 , m′2 ), (m′1 , m′2 ) ∈ [1 : 2nR1 /n2 ] × [1 : 2nR2 /n2 ], each consist-
ing of n2 × n2 sequences. Show that the probability that all n4 codewords in the
subcodebook C̄(m′1 , m′2 ) are "bad" is upper bounded by (Pe(n) )n²/2 . (Hint: This
probability is upper bounded by the probability that all n2 "diagonal" code-
words are "bad." Upper bound the latter probability using part (a) and the
independence of the permutations.)
(c) Suppose that a subcodebook C̄(m′1 , m′2 ) is said to be "bad" if all sequences in
the subcodebook are "bad." Show that the expected number of "bad" subcode-
books is upper bounded by

(2n(R1 +R2 )/n4 ) (Pe(n) )n²/2 .

Further show that this upper bound tends to zero as n → ∞ if Pe(n) tends to
zero as n → ∞.
(d) Argue that there exists at least one permutation pair (σ1 , σ2 ) such that there is
no “bad” subcodebook. Conclude that there exists a (2nR1 /n2 , 2nR2 /n2 , n) code
with maximal probability of error less than or equal to (Pe(n) )1/2 for n sufficiently
large, if Pe(n) tends to zero as n → ∞.
Remark: This argument is due to İ. E. Telatar, which is a simplified version of the
original proof by Willems ().

APPENDIX 8A PROOF OF THE MUTUAL COVERING LEMMA

We prove the case U0 = ∅ only. The proof for the general case follows similarly.
Let A = {(m1 , m2 ) ∈ [1 : 2nr1 ] × [1 : 2nr2 ] : (U1n (m1 ), U2n (m2 )) ∈ Tє(n) (U1 , U2 )}. Then
by the Chebyshev lemma in Appendix B, the probability of the event of interest can be
bounded as
P{|A| = 0} ≤ P{(|A| − E |A|)2 ≥ (E |A|)2 } ≤ Var(|A|)/(E |A|)2 .

We now show that Var(|A|)/(E |A|)2 tends to zero as n → ∞ if


r1 > 3δ(є),
r2 > 3δ(є),
r1 + r2 > I(U1 ; U2 ) + δ(є).
Using indicator random variables, we can express |A| as
|A| = ∑m1 ,m2 E(m1 , m2 ),

where
E(m1 , m2 ) = 1 if (U1n (m1 ), U2n (m2 )) ∈ Tє(n) , and E(m1 , m2 ) = 0 otherwise

for each (m1 , m2 ) ∈ [1 : 2nr1 ] × [1 : 2nr2 ]. Let

p1 = P{(U1n (1), U2n (1)) ∈ Tє(n) },
p2 = P{(U1n (1), U2n (1)) ∈ Tє(n) , (U1n (1), U2n (2)) ∈ Tє(n) },
p3 = P{(U1n (1), U2n (1)) ∈ Tє(n) , (U1n (2), U2n (1)) ∈ Tє(n) },
p4 = P{(U1n (1), U2n (1)) ∈ Tє(n) , (U1n (2), U2n (2)) ∈ Tє(n) } = p1².
Then

E(|A|) = ∑m1 ,m2 P{(U1n (m1 ), U2n (m2 )) ∈ Tє(n) } = 2n(r1 +r2 ) p1 ,

E(|A|²) = ∑m1 ,m2 P{(U1n (m1 ), U2n (m2 )) ∈ Tє(n) }
  + ∑m1 ,m2 ∑m′2 ̸=m2 P{(U1n (m1 ), U2n (m2 )) ∈ Tє(n) , (U1n (m1 ), U2n (m′2 )) ∈ Tє(n) }
  + ∑m1 ,m2 ∑m′1 ̸=m1 P{(U1n (m1 ), U2n (m2 )) ∈ Tє(n) , (U1n (m′1 ), U2n (m2 )) ∈ Tє(n) }
  + ∑m1 ,m2 ∑m′1 ̸=m1 ∑m′2 ̸=m2 P{(U1n (m1 ), U2n (m2 )) ∈ Tє(n) , (U1n (m′1 ), U2n (m′2 )) ∈ Tє(n) }
  ≤ 2n(r1 +r2 ) p1 + 2n(r1 +2r2 ) p2 + 2n(2r1 +r2 ) p3 + 22n(r1 +r2 ) p4 .

Hence
Var(|A|) ≤ 2n(r1 +r2 ) p1 + 2n(r1 +2r2 ) p2 + 2n(2r1 +r2 ) p3 .

Now by the joint typicality lemma, for sufficiently large n, we have

p1 ≥ 2−n(I(U1 ;U2 )+δ(є)) ,


p2 ≤ 2−n(2I(U1 ;U2 )−δ(є)) ,
p3 ≤ 2−n(2I(U1 ;U2 )−δ(є)) ,

and hence

p2 /p1² ≤ 23nδ(є) ,
p3 /p1² ≤ 23nδ(є) .

Therefore
Var(|A|)/(E |A|)2 ≤ 2−n(r1 +r2 −I(U1 ;U2 )−δ(є)) + 2−n(r1 −3δ(є)) + 2−n(r2 −3δ(є)) ,
which tends to zero as n → ∞ if
r1 > 3δ(є),
r2 > 3δ(є),
r1 + r2 > I(U1 ; U2 ) + δ(є).

It can be similarly shown that P{|A| = 0} tends to zero as n → ∞ if r1 = 0 and r2 >


I(U1 ; U2 ) + δ(є), or if r1 > I(U1 ; U2 ) + δ(є) and r2 = 0. Combining these three sets of in-
equalities, we have shown that P{|A| = 0} tends to zero as n → ∞ if r1 + r2 > I(U1 ; U2 ) +
4δ(є).

APPENDIX 8B PROOF OF THE NAIR–EL GAMAL OUTER BOUND

By Fano’s inequality and following standard steps, we have

nR0 ≤ min{I(M0 ; Y1n ), I(M0 ; Y2n )} + nєn ,


n(R0 + R1 ) ≤ I(M0 , M1 ; Y1n ) + nєn ,
n(R0 + R2 ) ≤ I(M0 , M2 ; Y2n ) + nєn ,
n(R0 + R1 + R2 ) ≤ I(M0 , M1 ; Y1n ) + I(M2 ; Y2n |M0 , M1 ) + nєn ,
n(R0 + R1 + R2 ) ≤ I(M1 ; Y1n |M0 , M2 ) + I(M0 , M2 ; Y2n ) + nєn .

We bound the mutual information terms in the above bounds. First consider the terms
in the fourth inequality

I(M0 , M1 ; Y1n ) + I(M2 ; Y2n |M0 , M1 )
  = ∑_{i=1}^n I(M0 , M1 ; Y1i |Y1i−1 ) + ∑_{i=1}^n I(M2 ; Y2i |M0 , M1 , Y2,i+1n ).

Now consider
∑_{i=1}^n I(M0 , M1 ; Y1i |Y1i−1 ) ≤ ∑_{i=1}^n I(M0 , M1 , Y1i−1 ; Y1i )
  = ∑_{i=1}^n I(M0 , M1 , Y1i−1 , Y2,i+1n ; Y1i ) − ∑_{i=1}^n I(Y2,i+1n ; Y1i |M0 , M1 , Y1i−1 ).

Also
∑_{i=1}^n I(M2 ; Y2i |M0 , M1 , Y2,i+1n ) ≤ ∑_{i=1}^n I(M2 , Y1i−1 ; Y2i |M0 , M1 , Y2,i+1n )
  = ∑_{i=1}^n I(Y1i−1 ; Y2i |M0 , M1 , Y2,i+1n ) + ∑_{i=1}^n I(M2 ; Y2i |M0 , M1 , Y2,i+1n , Y1i−1 ).

Combining the above results and identifying auxiliary random variables as U0i = (M0 ,
Y1i−1 , Y2,i+1n ), U1i = M1 , and U2i = M2 , we obtain

I(M0 , M1 ; Y1n ) + I(M2 ; Y2n |M0 , M1 )
  ≤ ∑_{i=1}^n I(M0 , M1 , Y1i−1 , Y2,i+1n ; Y1i ) − ∑_{i=1}^n I(Y2,i+1n ; Y1i |M0 , M1 , Y1i−1 )
    + ∑_{i=1}^n I(Y1i−1 ; Y2i |M0 , M1 , Y2,i+1n ) + ∑_{i=1}^n I(M2 ; Y2i |M0 , M1 , Y2,i+1n , Y1i−1 )
  (a)
  = ∑_{i=1}^n I(M0 , M1 , Y1i−1 , Y2,i+1n ; Y1i ) + ∑_{i=1}^n I(M2 ; Y2i |M0 , M1 , Y2,i+1n , Y1i−1 )
  = ∑_{i=1}^n I(U0i , U1i ; Y1i ) + ∑_{i=1}^n I(U2i ; Y2i |U0i , U1i ),

where (a) follows by the Csiszár sum identity. Similarly


I(M1 ; Y1n |M0 , M2 ) + I(M0 , M2 ; Y2n ) ≤ ∑_{i=1}^n I(U1i ; Y1i |U0i , U2i ) + ∑_{i=1}^n I(U0i , U2i ; Y2i ).

It can be also easily shown that


I(M0 ; Yjn ) ≤ ∑_{i=1}^n I(U0i ; Yji ),
I(M0 , Mj ; Yjn ) ≤ ∑_{i=1}^n I(U0i , Uji ; Yji ), j = 1, 2.

The rest of the proof follows by introducing a time-sharing random variable Q indepen-
dent of (M0 , M1 , M2 , X n , Y1n , Y2n ), and uniformly distributed over [1 : n], and defining
U0 = (Q, U0Q ), U1 = U1Q , U2 = U2Q , X = XQ , Y1 = Y1Q , and Y2 = Y2Q . Using arguments
similar to Remark ., it can be easily verified that taking a function x(u0 , u1 , u2 ) suffices.
Note that the independence of the messages M1 and M2 implies the independence of the
auxiliary random variables U1 and U2 as specified.
CHAPTER 9

Gaussian Vector Channels

Gaussian vector channels are models for multiple-input multiple-output (MIMO) wire-
less communication systems in which each transmitter and each receiver can have more
than a single antenna. The use of multiple antennas provides several benefits in a wireless
multipath medium, including higher received power via beamforming, higher channel
capacity via spatial multiplexing without increasing bandwidth or transmission power,
and improved transmission robustness via diversity coding. In this chapter, we focus on
the limits on spatial multiplexing of MIMO communication.
We first establish the capacity of the Gaussian vector point-to-point channel. We show
that this channel is equivalent to the Gaussian product channel discussed in Section .
and thus the capacity is achieved via water-filling. We then establish the capacity region
of the Gaussian vector multiple access channel as a straightforward extension of the scalar
case and show that the sum-capacity is achieved via iterative water-filling.
The rest of the chapter is dedicated to studying Gaussian vector broadcast channels.
We first establish the capacity region with common message for the special case of the
Gaussian product broadcast channel, which is not in general degraded. Although the
capacity region of this channel is achieved via superposition coding with a product in-
put pmf on the channel components, the codebook generation for the common message
and decoding must each be performed jointly across the channel components. For the
special case of private messages only, the capacity region can be expressed as the intersec-
tion of the capacity regions of two enhanced degraded broadcast channels. Next, we turn
our attention to the general Gaussian vector broadcast channel and establish the private-
message capacity region. We first describe a vector extension of writing on dirty paper
for the Gaussian BC discussed in Section .. We show that this scheme is optimal by
constructing an enhanced degraded Gaussian vector broadcast channel for every corner
point on the boundary of the capacity region. The proof uses Gaussian vector BC–MAC
duality, convex optimization techniques (Lagrange duality and the KKT condition), and
the entropy power inequality.
In Chapter , we show that in the presence of fading, the capacity of the vector Gauss-
ian channel grows linearly with the number of antennas.

9.1 GAUSSIAN VECTOR POINT-TO-POINT CHANNEL

Consider the point-to-point communication system depicted in Figure .. The sender

Figure .. MIMO point-to-point communication system.

wishes to reliably communicate a message M to the receiver over a MIMO communication


channel.
We model the MIMO communication channel as a Gaussian vector channel, where
the output of the channel Y corresponding to the input X is

Y = GX + Z.

Here Y is an r-dimensional vector, X is a t-dimensional vector, Z ∼ N(0, KZ ), KZ ≻ 0, is


an r-dimensional noise vector, and G is an r × t constant channel gain matrix with its
element G jk representing the gain of the channel from transmitter antenna k to receiver
antenna j. The channel is discrete-time and the noise vector process {Z(i)} is i.i.d. with
Z(i) ∼ N(0, KZ ) for every transmission i ∈ [1 : n]. We assume average transmission power
constraint P on every codeword xn (m) = (x(m, 1), . . . , x(m, n)), i.e.,
∑_{i=1}^n xT(m, i)x(m, i) ≤ nP, m ∈ [1 : 2nR ].

Remark 9.1. We can assume without loss of generality that KZ = Ir , since the channel
Y = GX + Z with a general KZ ≻ 0 can be transformed into the channel

Ỹ = KZ−1/2 Y = KZ−1/2 GX + Z̃,

where Z̃ = KZ−1/2 Z ∼ N(0, Ir ), and vice versa.

Remark 9.2. The Gaussian vector channel reduces to the Gaussian product channel in
Section . when r = t = d, G = diag(д1 , д2 , . . . , д d ), and KZ = Id .

The capacity of the Gaussian vector channel is obtained by a straightforward evalua-


tion of the formula for channel capacity with input cost in Theorem ..

Theorem .. The capacity of the Gaussian vector channel is


C = max_{KX ⪰0: tr(KX )≤P} (1/2) log |GKX G T + Ir |.

Proof. First note that the capacity with power constraint is upper bounded as

C ≤ sup_{F(x): E(XTX)≤P} I(X; Y)
  = sup_{F(x): E(XTX)≤P} (h(Y) − h(Z))
  = max_{KX ⪰0: tr(KX )≤P} (1/2) log |GKX G T + Ir |,

where the last step follows by the maximum differential entropy lemma in Section ..
In particular, the supremum is attained by a Gaussian X with zero mean and covariance
matrix KX . With this choice of X, the output Y is also Gaussian with covariance matrix
GKX G T + Ir . To prove achievability, we resort to the DMC with input cost in Section .
and use the discretization procedure in Section .. This completes the proof of Theo-
rem ..
The optimal covariance matrix KX∗ can be characterized more explicitly. Suppose that
G has rank d and singular value decomposition G = ΦΓΨT with Γ = diag(γ1 , γ2 , . . . , γ d ).
Then

C = max_{KX ⪰0: tr(KX )≤P} (1/2) log |GKX G T + Ir |
  = max_{KX ⪰0: tr(KX )≤P} (1/2) log |ΦΓΨT KX ΨΓΦT + Ir |
  (a)
  = max_{KX ⪰0: tr(KX )≤P} (1/2) log |ΦT ΦΓΨT KX ΨΓ + Id |
  (b)
  = max_{KX ⪰0: tr(KX )≤P} (1/2) log |ΓΨT KX ΨΓ + Id |
  (c)
  = max_{K̃X ⪰0: tr(K̃X )≤P} (1/2) log |ΓK̃X Γ + Id |,

where (a) follows since |AB + I| = |BA + I| with A = ΦΓΨT KX ΨΓ and B = ΦT , and
(b) follows since ΦT Φ = Id (recall the definition of the singular value decomposition in
Notation), and (c) follows since the maximization problem is equivalent to that in (b) via
the transformations K̃ X = ΨT KX Ψ and KX = ΨK̃X ΨT . By Hadamard’s inequality, the op-
timal K̃X∗ is a diagonal matrix diag(P1 , P2 , . . . , Pd ) such that the water-filling condition is
satisfied, i.e.,
Pj = (λ − 1/γj²)+ ,

where λ is chosen so that ∑_{j=1}^d Pj = P. Finally, from the transformation between KX and
K̃X , the optimal KX∗ is KX∗ = ΨK̃X∗ ΨT . Thus the transmitter should align its signal direction
with the singular vectors of the effective channel and allocate an appropriate amount of
power in each direction to water-fill over the singular values.
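The water-filling solution can be computed numerically. The following Python/NumPy sketch is an illustration only (the function names and the example channel are assumptions, not from the text): it whitens the noise, takes the SVD of the effective channel, water-fills over the squared singular values, and maps the allocation back to the optimal input covariance KX∗ = ΨK̃X∗ ΨT.

import numpy as np

def water_fill(gains2, P):
    """Water-filling over parallel channels with squared gains `gains2`:
    returns powers p_j = (lam - 1/gains2_j)^+ with sum(p_j) = P."""
    inv = 1.0 / np.asarray(gains2, dtype=float)
    inv_sorted = np.sort(inv)
    for k in range(len(inv), 0, -1):        # find the number of active channels
        lam = (P + inv_sorted[:k].sum()) / k
        if lam > inv_sorted[k - 1]:
            break
    return np.maximum(lam - inv, 0.0)

def gv_capacity(G, P, KZ=None):
    """Capacity (bits/transmission) and optimal KX for Y = GX + Z, tr(KX) <= P."""
    r = G.shape[0]
    KZ = np.eye(r) if KZ is None else KZ
    L = np.linalg.cholesky(KZ)
    Geff = np.linalg.solve(L, G)            # noise-whitened channel
    _, s, Vt = np.linalg.svd(Geff, full_matrices=False)
    keep = s > 1e-12
    s, Psi = s[keep], Vt.T[:, keep]
    p = water_fill(s**2, P)
    C = 0.5 * np.sum(np.log2(1.0 + s**2 * p))
    KX = Psi @ np.diag(p) @ Psi.T           # KX* = Psi diag(P_j) Psi^T
    return C, KX

# Example: a 3x2 channel with power constraint P = 10.
G = np.array([[1.0, 0.5], [0.2, 1.5], [0.0, 0.3]])
C, KX = gv_capacity(G, P=10.0)
print(C, np.trace(KX))                      # trace(KX) should come out (about) 10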

9.1.1 Equivalent Gaussian Product Channel


The role of the singular values of the Gaussian vector channel gain matrix can be seen
more directly. Let Y = GX + Z, where the channel gain matrix G has a singular value
decomposition G = ΦΓΨT and rank d. We show that this channel is equivalent to the
Gaussian product channel

Ỹ j = γ j X̃ j + Z̃ j , j ∈ [1 : d],

where Z̃ j , j ∈ [1 : d], are independent Gaussian additive noise with common average
power of 1.
First consider the channel Y = GX + Z with the input transformation X = ΨX̃ and the
output transformation Ỹ = ΦT Y. This gives a Gaussian product channel

Ỹ = ΦT GX + ΦT Z = ΦT (ΦΓΨT )ΨX̃ + ΦT Z = ΓX̃ + Z̃,

where the zero-mean Gaussian noise vector Z̃ = ΦT Z has covariance matrix ΦT Φ = Id .
Let K̃X = E(X̃X̃T ) and KX = E(XXT ). Then

tr(KX ) = tr(ΨK̃X ΨT ) = tr(ΨT ΨK̃X ) = tr(K̃X ).


Hence, every code for the Gaussian product channel Ỹ = ΓX̃ + Z̃ can be transformed into
a code for the Gaussian vector channel Y = GX + Z with the same probability of error.
Conversely, given the channel Ỹ = ΓX̃ + Z̃, we perform the input transformation X̃ =
ΨT X and the output transformation Y′ = ΦỸ to obtain the channel

Y′ = ΦΓΨT X + ΦZ̃ = GX + ΦZ̃.

Noting that ΦΦT ⪯ Ir , we then add an independent Gaussian noise Z′ ∼ N(0, Ir − ΦΦT )
to Y′ , which yields

Y = Y′ + Z′ = GX + ΦZ̃ + Z′ = GX + Z,

where Z = ΦZ̃ + Z′ has the covariance matrix ΦΦT + (Ir − ΦΦT ) = Ir . Also since ΨΨT ⪯
It , tr(K̃X ) = tr(ΨT KX Ψ) = tr(ΨΨT KX ) ≤ tr(KX ). Hence every code for the Gaussian vector
channel Y = GX + Z can be transformed into a code for the Gaussian product channel
Ỹ = ΓX̃ + Z̃ with the same probability of error. Consequently, the two channels have the
same capacity.

9.1.2 Reciprocity
Since the channel gain matrices G and G T have the same set of (nonzero) singular values,
the channels with gain matrices G and G T have the same capacity. In fact, both channels
are equivalent to the same Gaussian product channel, and hence are equivalent to each
other. The following result is an immediate consequence of this equivalence, and will be
useful later in proving the Gaussian vector BC–MAC duality.

Lemma . (Reciprocity Lemma). For every r × t channel matrix G and t × t matrix
K ⪰ 0, there exists an r × r matrix K ⪰ 0 such that

tr(K) ≤ tr(K),
|G KG + It | = |GKG T + Ir |.
T

To prove this lemma, consider the singular value decomposition G = ΦΓΨT and let
K̄ = ΦΨT KΨΦT . Now, we check that K̄ satisfies both properties. Indeed, since ΨΨT ⪯ It ,

tr(K̄) = tr(ΨΦT ΦΨT K) = tr(ΨΨT K) ≤ tr(K).

To check the second property, let d = rank(G) and consider

|G T K̄G + It | = |ΨΓΦT (ΦΨT KΨΦT )ΦΓΨT + It |
  (a)
  = |(ΨT Ψ)ΓΨT KΨΓ + Id |
  = |ΓΨT KΨΓ + Id |
  = |(ΦT Φ)ΓΨT KΨΓ + Id |
  (b)
  = |ΦΓΨT KΨΓΦT + Ir |
  = |GKG T + Ir |,

where (a) follows since if A = ΨΓΨT KΨΓ and B = ΨT , then |I + AB| = |I + BA|, and (b)
follows similarly. This completes the proof of the reciprocity lemma.
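The two properties in the lemma are also easy to confirm numerically. The sketch below is illustrative (the dimensions and random instances are arbitrary choices): it constructs K̄ = ΦΨT KΨΦT from the reduced SVD and checks the trace inequality and the determinant identity.

import numpy as np

rng = np.random.default_rng(1)
r, t = 2, 4
G = rng.standard_normal((r, t))
B = rng.standard_normal((t, t))
K = B @ B.T                                   # an arbitrary t x t PSD matrix

Phi, gamma, PsiT = np.linalg.svd(G, full_matrices=False)
Psi = PsiT.T
Kbar = Phi @ Psi.T @ K @ Psi @ Phi.T          # the r x r matrix from the proof

assert np.trace(Kbar) <= np.trace(K) + 1e-9
assert np.isclose(np.linalg.det(G.T @ Kbar @ G + np.eye(t)),
                  np.linalg.det(G @ K @ G.T + np.eye(r)))
print("reciprocity lemma verified on a random instance")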

9.1.3 Alternative Characterization of KX∗


Consider the Gaussian vector channel Y = GX + Z, where Z has an arbitrary covariance
matrix KZ ≻ 0. We have already characterized the optimal input covariance matrix KX∗
for the effective channel gain matrix KZ−1/2 G via singular value decomposition and water-
filling. Here we give an alternative characterization of KX∗ via Lagrange duality.
First note that the optimal input covariance matrix KX∗ is the solution to the convex
optimization problem

maximize (1/2) log |GKX G T + KZ |
subject to KX ⪰ 0, tr(KX ) ≤ P. (.)

With dual variables

tr(KX ) ≤ P ⇔ λ ≥ 0,
KX ⪰ 0 ⇔ Υ ⪰ 0,

we can form the Lagrangian

L(KX , Υ, λ) = (1/2) log |GKX G T + KZ | + tr(ΥKX ) − λ(tr(KX ) − P).

Recall that if a convex optimization problem satisfies Slater’s condition (that is, the fea-
sible region has an interior point), then the Karush–Kuhn–Tucker (KKT) condition pro-
vides necessary and sufficient condition for the optimal solution; see Appendix E. For the
problem in (.), Slater’s condition is satisfied for any P > 0. Thus, a solution KX∗ is primal
optimal iff there exists a dual optimal solution (λ∗ , Υ∗ ) that satisfies the KKT condition:
∙ The Lagrangian is stationary (zero differential with respect to KX ), i.e.,

(1/2) G T(GKX∗ G T + KZ )−1 G + Υ∗ − λ∗ It = 0.

∙ The complementary slackness conditions

λ∗ (tr(KX∗ ) − P) = 0,
tr(Υ∗ KX∗ ) = 0

are satisfied.
In particular, fixing Υ∗ = 0, any solution KX∗ with tr(KX∗ ) = P is optimal if

(1/2) G T(GKX∗ G T + KZ )−1 G = λ∗ It

for some λ∗ > 0. Such a covariance matrix KX∗ corresponds to water-filling with all sub-
channels under water (which occurs when the SNR is sufficiently high).
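Since (.) is a convex program, it can also be solved directly with an off-the-shelf solver rather than via the explicit water-filling formula. The sketch below uses the third-party CVXPY package (an assumption; it is not referenced in the text), with the random channel instance purely for illustration.

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(2)
r, t, P = 3, 2, 10.0
G = rng.standard_normal((r, t))
KZ = np.eye(r)

KX = cp.Variable((t, t), PSD=True)                     # KX >= 0
objective = cp.Maximize(0.5 * cp.log_det(G @ KX @ G.T + KZ))
problem = cp.Problem(objective, [cp.trace(KX) <= P])   # tr(KX) <= P
problem.solve()

print(problem.value / np.log(2))   # capacity in bits (log_det uses natural log)
print(KX.value)                    # numerically optimal input covariance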

9.2 GAUSSIAN VECTOR MULTIPLE ACCESS CHANNEL

Consider the MIMO multiple access communication system depicted in Figure ., where
each sender wishes to communicate an independent message to the receiver. Assume a
Gaussian vector multiple access channel (GV-MAC) model

Y = G1 X1 + G2 X2 + Z,

where Y is an r-dimensional output vector, X1 and X2 are t-dimensional input vectors,


G1 and G2 are r × t channel gain matrices, and Z ∼ N(0, KZ ) is an r-dimensional noise
vector. As before, we assume without loss of generality that KZ = Ir . We further assume
average power constraint P on each of X1 and X2 , i.e.,
∑_{i=1}^n xjT(mj , i)xj (mj , i) ≤ nP, mj ∈ [1 : 2nRj ], j = 1, 2.

Figure .. MIMO multiple access communication system.

The capacity region of the GV-MAC is given in the following.

Theorem .. The capacity region of the GV-MAC is the set of rate pairs (R1 , R2 ) such
that
R1 ≤ (1/2) log |G1 K1 G1T + Ir |,
R2 ≤ (1/2) log |G2 K2 G2T + Ir |,
R1 + R2 ≤ (1/2) log |G1 K1 G1T + G2 K2 G2T + Ir |

for some K1 , K2 ⪰ 0 with tr(K j ) ≤ P, j = 1, 2.

To prove the theorem, note that the capacity region of the GV-MAC can be simply
characterized by the capacity region of the MAC with input costs in Problem . as the
set of rate pairs (R1 , R2 ) such that

R1 ≤ I(X1 ; Y|X2 , Q),


R2 ≤ I(X2 ; Y|X1 , Q),
R1 + R2 ≤ I(X1 , X2 ; Y|Q)

for some conditionally independent X1 and X2 given Q that satisfy the constraints
E(XTj X j ) ≤ P, j = 1, 2. Furthermore, it is easy to show that |Q| = 1 suffices and that among
all input distributions with given correlation matrices K1 = E(X1 X1T ) and K2 = E(X2 X2T ),
Gaussian input vectors X1 ∼ N(0, K1 ) and X2 ∼ N(0, K2 ) simultaneously maximize all
three mutual information bounds. The rest of the achievability proof follows by the dis-
cretization procedure.

Remark 9.3. Unlike the point-to-point Gaussian vector channel, which is always equiva-
lent to a product of Gaussian channels, the GV-MAC cannot be factorized into a product
of Gaussian MACs in general.
Remark 9.4. The sum-capacity of the GV-MAC can be found by solving the maximiza-
tion problem

maximize (1/2) log |G1 K1 G1T + G2 K2 G2T + Ir |
subject to tr(Kj ) ≤ P, Kj ⪰ 0, j = 1, 2.

This optimization problem is convex and the optimal solution is attained when K1 is the
water-filling covariance matrix for the channel gain matrix G1 and the noise covariance
matrix G2 K2 G2T + Ir and K2 is the water-filling covariance matrix for the channel gain
matrix G2 and the noise covariance matrix G1 K1 G1T + Ir . The following iterative water-
filling algorithm finds the optimal (K1 , K2 ):

repeat
  Σ1 ← G2 K2 G2T + Ir
  K1 ← arg max_{K: tr(K)≤P} log |G1 KG1T + Σ1 |
  Σ2 ← G1 K1 G1T + Ir
  K2 ← arg max_{K: tr(K)≤P} log |G2 KG2T + Σ2 |
until the desired accuracy is reached.

It can be shown that this algorithm converges to the optimal solution (sum-capacity) from
any initial assignment of K1 and K2 .
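A minimal numerical sketch of this iterative water-filling loop is given below (illustrative only; the helper functions, the fixed number of iterations, and the example channels are assumptions, not part of the text). Each step solves the single-user water-filling problem for one sender, treating the other sender's contribution as colored noise.

import numpy as np

def water_fill(gains2, P):
    """Allocate total power P over parallel channels with squared gains `gains2`."""
    inv = 1.0 / np.asarray(gains2, dtype=float)
    inv_sorted = np.sort(inv)
    for k in range(len(inv), 0, -1):
        lam = (P + inv_sorted[:k].sum()) / k
        if lam > inv_sorted[k - 1]:
            break
    return np.maximum(lam - inv, 0.0)

def water_fill_cov(G, Sigma, P):
    """Water-filling input covariance for Y = GX + Z with Z ~ N(0, Sigma), tr(K) <= P."""
    L = np.linalg.cholesky(Sigma)
    _, s, Vt = np.linalg.svd(np.linalg.solve(L, G), full_matrices=False)
    keep = s > 1e-12
    p = water_fill(s[keep] ** 2, P)
    Psi = Vt.T[:, keep]
    return Psi @ np.diag(p) @ Psi.T

def iterative_water_filling(G1, G2, P, num_iters=100):
    """Sum-capacity-achieving (K1, K2) for the two-sender GV-MAC (sketch)."""
    r = G1.shape[0]
    K1 = np.zeros((G1.shape[1], G1.shape[1]))
    K2 = np.zeros((G2.shape[1], G2.shape[1]))
    for _ in range(num_iters):
        K1 = water_fill_cov(G1, G2 @ K2 @ G2.T + np.eye(r), P)
        K2 = water_fill_cov(G2, G1 @ K1 @ G1.T + np.eye(r), P)
    Csum = 0.5 * np.log2(np.linalg.det(
        G1 @ K1 @ G1.T + G2 @ K2 @ G2.T + np.eye(r)))
    return K1, K2, Csum

# Example with r = 3 receive antennas and t = 2 transmit antennas per sender.
rng = np.random.default_rng(3)
G1, G2 = rng.standard_normal((3, 2)), rng.standard_normal((3, 2))
K1, K2, Csum = iterative_water_filling(G1, G2, P=5.0)
print(Csum, np.trace(K1), np.trace(K2))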

9.2.1 GV-MAC with More than Two Senders


The capacity region of the GV-MAC can be extended to any number of senders. Consider
the k-sender GV-MAC model
Y = ∑_{j=1}^k Gj Xj + Z,

where Z ∼ N(0, KZ ) is the noise vector. Assume an average power constraint P on each
X j . It can be shown that the capacity region is the set of rate tuples (R1 , . . . , Rk ) such that

∑_{j∈J} Rj ≤ (1/2) log |∑_{j∈J} Gj Kj GjT + Ir |, J ⊆ [1 : k],

for some K1 , . . . , Kk ⪰ 0 with tr(K j ) ≤ P, j ∈ [1 : k]. The iterative water-filling algorithm


can be easily extended to find the sum-capacity achieving covariance matrices K1 , . . . , Kk .

9.3 GAUSSIAN VECTOR BROADCAST CHANNEL

Consider the MIMO broadcast communication system depicted in Figure .. The sender
wishes to communicate a common message M0 to the two receivers and a private message

Figure .. MIMO broadcast communication system.

M j to receiver j = 1, 2. The channel is modeled by a Gaussian vector broadcast channel


(GV-BC)
Y1 = G1 X + Z1 ,
Y2 = G2 X + Z2 ,
where G1 , G2 are r × t channel gain matrices and Z1 ∼ N(0, Ir ) and Z2 ∼ N(0, Ir ). Assume
the average transmission power constraint ∑ni=1 xT(m0 , m1 , m2 , i)x(m0 , m1 , m2 , i) ≤ nP
for (m0 , m1 , m2 ) ∈ [1 : 2nR0 ] × [1 : 2nR1 ] × [1 : 2nR2 ].
Note that unlike the scalar Gaussian BC, the Gaussian vector BC is not in general
degraded, and the capacity region is known only in several special cases.
∙ If t = r and G1 and G2 are diagonal, then the channel is a product of Gaussian BCs
and the capacity region is known. This case is discussed in the following section.
∙ If M0 = ∅, then the (private-message) capacity region is known. The achievability
proof uses a generalization of the writing on dirty paper scheme to the vector case,
which is discussed in Section .. The converse proof is quite involved. In fact, even if
the channel is degraded, the converse proof does not follow simply by using the EPI
as in the converse proof for the scalar case in Section .. The converse proof is given
in Section ..
∙ If M1 = ∅ (or M2 = ∅), then the (degraded message sets) capacity region is known.

9.4 GAUSSIAN PRODUCT BROADCAST CHANNEL

The Gaussian product broadcast channel is a special case of the Gaussian vector BC de-
picted in Figure .. The channel consists of a set of d parallel Gaussian BCs
Y1k = Xk + Z1k ,
Y2k = Xk + Z2k , k ∈ [1 : d],

where the noise components Z jk ∼ N(0, N jk ), j = 1, 2, k ∈ [1 : d], are mutually indepen-


dent. This channel models a continuous-time (waveform/spectral) Gaussian broadcast
channel, where the noise spectrum for each receiver varies over frequency or time. We
consider the problem of sending a common message M0 to both receivers and a private
message M j , j = 1, 2, to each receiver over this broadcast channel. Assume average power
constraint P on X d .
In general, the Gaussian product channel is not degraded, but a product of reversely
(or inconsistently) degraded broadcast channels. As such, we assume without loss of gen-
erality that N1k ≤ N2k for k ∈ [1 : r] and N1k > N2k for [r + 1 : d] as depicted in Figure ..
If r = d, then the channel is degraded and it can be easily shown that the capacity region
is the Minkowski sum of the capacity regions for each component Gaussian BC (up to
power allocation).

Figure .. Gaussian product broadcast channel. Here Z2k = Z1k + Z̃2k for k ∈ [1 : r] and
Z1k = Z2k + Z̃1k for k ∈ [r + 1 : d].

Although the channel under consideration is not in general degraded, the capacity
region is established using superposition coding. Coding for the common message, how-
ever, requires special attention.

Theorem .. The capacity region of the Gaussian product broadcast channel is the
set of rate triples (R0 , R1 , R2 ) such that
R0 + R1 ≤ ∑_{k=1}^r C(βk P/N1k ) + ∑_{k=r+1}^d C(αk βk P/(ᾱk βk P + N1k )),
R0 + R2 ≤ ∑_{k=1}^r C(αk βk P/(ᾱk βk P + N2k )) + ∑_{k=r+1}^d C(βk P/N2k ),
R0 + R1 + R2 ≤ ∑_{k=1}^r C(βk P/N1k ) + ∑_{k=r+1}^d (C(αk βk P/(ᾱk βk P + N1k )) + C(ᾱk βk P/N2k )),
R0 + R1 + R2 ≤ ∑_{k=1}^r (C(αk βk P/(ᾱk βk P + N2k )) + C(ᾱk βk P/N1k )) + ∑_{k=r+1}^d C(βk P/N2k )

for some αk , βk ∈ [0, 1], k ∈ [1 : d], with ∑_{k=1}^d βk = 1.

To prove Theorem ., we first consider a product DM-BC consisting of two reversely
degraded DM-BCs (X1 , p(y11 |x1 )p(y21 |y11 ), Y11 × Y21 ) and (X2 , p(y22 |x2 )p(y12 |y22 ),
Y12 × Y22 ) depicted in Figure .. The capacity of this channel is as follows.

Proposition .. The capacity region of the product of two reversely degraded DM-
BCs is the set of rate triples (R0 , R1 , R2 ) such that

R0 + R1 ≤ I(X1 ; Y11 ) + I(U2 ; Y12 ),


R0 + R2 ≤ I(X2 ; Y22 ) + I(U1 ; Y21 ),
R0 + R1 + R2 ≤ I(X1 ; Y11 ) + I(U2 ; Y12 ) + I(X2 ; Y22 |U2 ),
R0 + R1 + R2 ≤ I(X2 ; Y22 ) + I(U1 ; Y21 ) + I(X1 ; Y11 |U1 )

for some pmf p(u1 , x1 )p(u2 , x2 ).

The converse proof of this proposition follows from the Nair–El Gamal outer bound
in Chapter . The proof of achievability uses rate splitting and superposition coding.
Rate splitting. Divide M j , j = 1, 2, into two independent messages: M j0 at rate R j0 and
M j j at rate R j j .
Codebook generation. Fix a pmf p(u1 , x1 )p(u2 , x2 ). Randomly and independently gen-
erate 2n(R0 +R10 +R20 ) sequence pairs (u1n , u2n )(m0 , m10 , m20 ), (m0 , m10 , m20 ) ∈ [1 : 2nR0 ] × [1 :
2nR10 ] × [1 : 2nR20 ], each according to ∏ni=1 pU1 (u1i )pU2 (u2i ). For each (m0 , m10 , m20 ), ran-
domly and conditionally independently generate 2nR 󰑗 󰑗 sequences x nj (m0 , m10 , m20 , m j j ),
m j j ∈ [1 : 2nR 󰑗 󰑗 ], j = 1, 2, each according to ∏ni=1 p X 󰑗 |U 󰑗 (x ji |u ji (m0 , m10 , m20 )).
Encoding. To send the message triple (m0 , m1 , m2 ) = (m0 , (m10 , m11 ), (m20 , m22 )), the
encoder transmits (x1n (m0 , m10 , m20 , m11 ), x2n (m0 , m10 , m20 , m22 )).

Figure .. Product of two reversely degraded DM-BCs.



Decoding and analysis of the probability of error. Decoder 1 finds the unique triple
(m̂ 01 , m̂ 10 , m̂ 11 ) such that ((u1n , u2n )(m̂ 01 , m̂ 10 , m20 ), x1n (m̂ 01 , m̂ 10 , m20 , m̂ 11 ), y11n , y12n ) ∈
Tє(n) for some m20 . Similarly, decoder 2 finds the unique triple (m̂ 02 , m̂ 20 , m̂ 22 ) such that
((u1n , u2n )(m̂ 02 , m10 , m̂ 20 ), x2n (m̂ 02 , m10 , m̂ 20 , m̂ 22 ), y21n , y22n ) ∈ Tє(n) for some m10 .
Using standard arguments, it can be shown that the probability of error for decoder 1
tends to zero as n → ∞ if
R0 + R1 + R20 < I(U1 , U2 , X1 ; Y11 , Y12 ) − δ(є)
= I(X1 ; Y11 ) + I(U2 ; Y12 ) − δ(є),
R11 < I(X1 ; Y11 |U1 ) − δ(є).
Similarly, the probability of error for decoder 2 tends to zero as n → ∞ if

R0 + R10 + R2 < I(X2 ; Y22 ) + I(U1 ; Y21 ) − δ(є),


R22 < I(X2 ; Y22 |U2 ) − δ(є).
Substituting R j j = R j − R j0 , j = 1, 2, combining with the constraints R10 , R20 ≥ 0, and
eliminating R10 and R20 by the Fourier–Motzkin procedure in Appendix D completes the
achievability proof of Proposition ..

Remark 9.5. It is interesting to note that even though U1 and U2 are statistically inde-
pendent, the codebook for the common message is generated simultaneously, and not as
the product of two independent codebooks. Decoding for the common message at each
receiver is also performed simultaneously.
Remark 9.6. Recall that the capacity region C of the product of two degraded DM-BCs
p(y11 |x1 )p(y21 |y11 ) and p(y12 |x2 )p(y22 |y12 ) is the Minkowski sum of the individual ca-
pacity regions C1 and C2 , i.e., C = {(R01 + R02 , R11 + R12 , R21 + R22 ) : (R01 , R11 , R21 ) ∈
C1 , (R02 , R12 , R22 ) ∈ C2 }; see Problem .. However, the capacity region in Proposi-
tion . is in general larger than the Minkowski sum of the individual capacity regions.
For example, consider the special case of reversely degraded BC with Y21 = Y12 = ∅. The
common-message capacities of the component BCs are C01 = C02 = 0, while the common-
message capacity of the product BC is

C0 = min{max_{p(x1 )} I(X1 ; Y11 ), max_{p(x2 )} I(X2 ; Y22 )}.

This shows that simultaneous codebook generation and decoding for the common mes-
sage can be much more powerful than product codebook generation and separate decod-
ing.
Remark 9.7. Proposition . can be extended to the product of any number of channels.
Suppose that Xk → Y1k → Y2k for k ∈ [1 : r] and Xk → Y2k → Y1k for k ∈ [r + 1 : d]. Then
the capacity region of the product of these d reversely degraded DM-BCs is
R0 + R1 ≤ ∑_{k=1}^r I(Xk ; Y1k ) + ∑_{k=r+1}^d I(Uk ; Y1k ),
R0 + R2 ≤ ∑_{k=1}^r I(Uk ; Y2k ) + ∑_{k=r+1}^d I(Xk ; Y2k ),
R0 + R1 + R2 ≤ ∑_{k=1}^r I(Xk ; Y1k ) + ∑_{k=r+1}^d (I(Uk ; Y1k ) + I(Xk ; Y2k |Uk )),
R0 + R1 + R2 ≤ ∑_{k=1}^r (I(Uk ; Y2k ) + I(Xk ; Y1k |Uk )) + ∑_{k=r+1}^d I(Xk ; Y2k )

for some pmf p(u1 , x1 ) ⋅ ⋅ ⋅ p(ud , xd ).


Proof of Theorem .. Achievability follows immediately by extending Proposition . to
the case with input cost, taking Uk ∼ N(0, αk βk P), Vk ∼ N(0, ᾱk βk P), k ∈ [1 : d], indepen-
dent of each other, and Xk = Uk + Vk , k ∈ [1 : d], and using the discretization procedure
in Section .. The converse follows by considering the rate region in Proposition . with
average power constraint and appropriately applying the EPI.

Remark .. The capacity region of the product of reversely degraded DM-BCs for more
than two receivers is not known in general.

Specialization to the private-message capacity region. By setting R0 = 0, Proposition .


yields the private-message capacity region C of the product of two reversely degraded
DM-BC that consists of all rate pairs (R1 , R2 ) such that

R1 ≤ I(X1 ; Y11 ) + I(U2 ; Y12 ),


R2 ≤ I(X2 ; Y22 ) + I(U1 ; Y21 ),
R1 + R2 ≤ I(X1 ; Y11 ) + I(U2 ; Y12 ) + I(X2 ; Y22 |U2 ),
R1 + R2 ≤ I(X2 ; Y22 ) + I(U1 ; Y21 ) + I(X1 ; Y11 |U1 )
for some pmf p(u1 , x1 )p(u2 , x2 ). This can be further simplified as follows.
Let C11 = max p(x1 ) I(X1 ; Y11 ) and C22 = max p(x2 ) I(X2 ; Y22 ). Then the private-message
capacity region C is the set of rate pairs (R1 , R2 ) such that

R1 ≤ C11 + I(U2 ; Y12 ),


R2 ≤ C22 + I(U1 ; Y21 ),
(.)
R1 + R2 ≤ C11 + I(U2 ; Y12 ) + I(X2 ; Y22 |U2 ),
R1 + R2 ≤ C22 + I(U1 ; Y21 ) + I(X1 ; Y11 |U1 )
for some pmf p(u1 , x1 )p(u2 , x2 ). This region can be expressed alternatively as the inter-
section of two regions (see Figure .):
∙ C1 that consists of all rate pairs (R1 , R2 ) such that

R1 ≤ C11 + I(U2 ; Y12 ),


R1 + R2 ≤ C11 + I(U2 ; Y12 ) + I(X2 ; Y22 |U2 )
for some pmf p(u2 , x2 ), and

∙ C2 that consists of all rate pairs (R1 , R2 ) such that


R2 ≤ C22 + I(U1 ; Y21 ),
R1 + R2 ≤ C22 + I(U1 ; Y21 ) + I(X1 ; Y11 |U1 )
for some pmf p(u1 , x1 ).
Note that C1 is the capacity region of the enhanced degraded product BC with Y21 = Y11 ,
which is in general larger than the capacity region of the original product BC. Similarly
C2 is the capacity region of the enhanced degraded product BC with Y12 = Y22 . Thus
C ⊆ C1 ∩ C2 .
To establish the other direction, i.e., C ⊇ C1 ∩ C2 , we show that each boundary point
of C1 ∩ C2 lies on the boundary of the capacity region C . To show this, first note that
(C11 , C22 ) ∈ C is on the boundary of C1 ∩ C2 ; see Figure .. Moreover, each boundary
point (R1∗ , R2∗ ) such that R1∗ ≥ C11 is on the boundary of C1 and satisfies the conditions
R1∗ = C11 + I(U2 ; Y12 ), R2∗ = I(X2 ; Y22 |U2 ) for some pmf p(u2 , x2 ). By evaluating C with
the same p(u2 , x2 ), U1 = , and p(x1 ) that attains C11 , it follows that (R1∗ , R2∗ ) lies on the
boundary of C . We can similarly show every boundary point (R1∗ , R2∗ ) with R1∗ ≤ C11 also
lies on the boundary of C . This shows that C = C1 ∩ C2 .

Figure .. The private-message capacity region of a reversely degraded DM-BC: C = C1 ∩ C2 .

Remark 9.9. The above argument establishes the converse for the private-message capac-
ity region immediately, since the capacity region of the original BC is contained in that
of each enhanced degraded BC. As we will see in Section .., this channel enhancement
approach turns out to be crucial for proving the converse for the general Gaussian vector
BC.
Remark 9.10. Unlike the capacity region for the general setting with common message
in Proposition ., the private-message capacity region in (.) can be generalized to more
than two receivers.

9.5 VECTOR WRITING ON DIRTY PAPER

Consider the problem of communicating a message over a Gaussian vector channel with
additive Gaussian vector state when the state sequence is available noncausally at the en-
coder as depicted in Figure .. The channel output corresponding to the input X is

Y = GX + S + Z,

where the state S ∼ N(0, KS ) and the noise Z ∼ N(0, Ir ) are independent. Assume an
average power constraint ∑ni=1 E(xT (m, S, i)x(m, S, i)) ≤ nP, m ∈ [1 : 2nR ].

Figure .. Gaussian vector channel with additive Gaussian state available noncausally at
the encoder.

This setup is a generalization of the scalar case discussed in Section ., for which the
capacity is the same as when the state is not present and is achieved via writing on dirty
paper (that is, Gelfand–Pinsker coding specialized to this case).
It turns out that the same result holds for the vector case, that is, the capacity is the
same as if S were not present, and is given by
C = max_{tr(KX )≤P} (1/2) log |GKX G T + Ir |.

As for the scalar case, this result follows by appropriately evaluating the Gelfand–Pinsker
capacity expression

C = sup_{F(u|s), x(u,s): E(XTX)≤P} (I(U; Y) − I(U; S)).

Let U = X + AS, where X ∼ N(0, KX ) is independent of S and

A = KX G T (GKX G T + Ir )−1 .

We can easily check that the matrix A is chosen such that A(GX + Z) is the MMSE estimate
of X. Thus, X − A(GX + Z) is independent of GX + Z, S, and hence Y = GX + Z + S.
Finally consider

h(U|S) = h(X + AS|S)


= h(X)

and

h(U|Y) = h(X + AS|Y)


= h(X + AS − AY|Y)
= h(X − A(GX + Z)|Y)
= h(X − A(GX + Z))
= h(X − A(GX + Z)|GX + Z)
= h(X|GX + Z).

Combining the above results implies that we can achieve

h(X) − h(X|GX + Z) = I(X; GX + Z) = (1/2) log |Ir + GKX G T |

for every KX with tr(KX ) ≤ P. Note that this vector writing on dirty paper result holds for
any additive (non-Gaussian) state S independent of the Gaussian noise Z.
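For a Gaussian state, the identity I(U; Y) − I(U; S) = (1/2) log |Ir + GKX G T | with U = X + AS and the MMSE matrix A can be checked numerically via the Gaussian mutual information formula I(A; B) = (1/2) log(|KA||KB|/|K(A,B)|). The sketch below is illustrative only; the particular G, KX , and KS are arbitrary choices, not from the text.

import numpy as np

rng = np.random.default_rng(4)
r, t = 3, 2
G = rng.standard_normal((r, t))
KX = np.diag([2.0, 1.0])                  # some input covariance (tr = 3)
KS = 4.0 * np.eye(r)                      # Gaussian state covariance

A = KX @ G.T @ np.linalg.inv(G @ KX @ G.T + np.eye(r))    # MMSE matrix

def gaussian_mi(Ka, Kb, Kab):
    """I(a;b) for jointly Gaussian vectors with covariances Ka, Kb and cross Kab."""
    Kjoint = np.block([[Ka, Kab], [Kab.T, Kb]])
    return 0.5 * np.log2(np.linalg.det(Ka) * np.linalg.det(Kb)
                         / np.linalg.det(Kjoint))

# Covariances induced by U = X + A S and Y = G X + S + Z (X, S, Z independent).
KU  = KX + A @ KS @ A.T
KY  = G @ KX @ G.T + KS + np.eye(r)
KUY = KX @ G.T + A @ KS
KUS = A @ KS

lhs = gaussian_mi(KU, KY, KUY) - gaussian_mi(KU, KS, KUS)
rhs = 0.5 * np.log2(np.linalg.det(G @ KX @ G.T + np.eye(r)))
print(lhs, rhs)                            # the two values should coincide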

9.6 GAUSSIAN VECTOR BC WITH PRIVATE MESSAGES

Again consider the MIMO broadcast communication system depicted in Figure .. Sup-
pose that M0 = , that is, the sender only wishes to communicate a private message M j to
receiver j = 1, 2. Define the following two rate regions:

∙ R1 that consists of all rate pairs (R1 , R2 ) such that

R1 < (1/2) log (|G1 K1 G1T + G1 K2 G1T + Ir | / |G1 K2 G1T + Ir |),
R2 < (1/2) log |G2 K2 G2T + Ir |,

for some K1 , K2 ⪰ 0 with tr(K1 + K2 ) ≤ P, and


∙ R2 that consists of all rate pairs (R1 , R2 ) such that

R1 < (1/2) log |G1 K1 G1T + Ir |,
R2 < (1/2) log (|G2 K2 G2T + G2 K1 G2T + Ir | / |G2 K1 G2T + Ir |)

for some K1 , K2 ⪰ 0 with tr(K1 + K2 ) ≤ P.



Let RWDP be the convex hull of the union of R1 and R2 , as illustrated in Figure .. This
characterizes the capacity region.

Theorem .. The private-message capacity region of the Gaussian vector BC is

C = RWDP .

This capacity region coincides with Marton’s inner bound without U0 (see Section .)
with properly chosen Gaussian random vectors (U1 , U2 , X). Equivalently and more intu-
itively, achievability can be established via the vector writing on dirty paper scheme dis-
cussed in Section .. The converse proof uses the channel enhancement idea introduced
in Section ., convex optimization techniques (see Appendix E for a brief description of
these techniques), Gaussian BC–MAC duality, and the conditional vector EPI.

Figure .. The capacity region of a GV-BC: C = RWDP = co(R1 ∪ R2 ).

9.6.1 Proof of Achievability


We illustrate the achievability of every rate pair in the interior of R1 (or R2 ). The rest of
the capacity region is achieved using time sharing between points in R1 and R2 . First
consider a rate pair (R1 , R2 ) in the interior of R2 . The writing on dirty paper scheme is
illustrated in Figure .. Fix the covariance matrices K1 and K2 such that tr(K1 + K2 ) ≤ P,
and let X = X1 + X2 , where X1 ∼ N(0, K1 ) and X2 ∼ N(0, K2 ) are independent. Encoding
is performed successively. In the figure M2 is encoded first.

∙ To send M2 to Y2 , we consider the channel Y2 = G2 X2 + G2 X1 + Z2 with input X2 ,


additive independent Gaussian interference signal G2 X1 , and additive Gaussian noise
Z2 . Treating the interference signal G2 X1 as noise, by Theorem . for the Gaussian
vector channel, M2 can be sent reliably to Y2 if

R2 < (1/2) log (|G2 K2 G2T + G2 K1 G2T + Ir | / |G2 K1 G2T + Ir |).

Figure .. Writing on dirty paper for the Gaussian vector BC.

∙ To send M1 to Y1 , consider the channel Y1 = G1 X1 + G1 X2 + Z1 , with input X1 , independent additive Gaussian state G1 X2 , and additive Gaussian noise Z1 , where the state vector G1 X2n (M2 ) is available noncausally at the encoder. By the vector writing on dirty paper result in Section ., M1 can be sent reliably to Y1 if

R1 < (1/2) log |G1 K1 G1T + Ir |.

Now, by considering the other message encoding order, we can similarly obtain the
constraints on the rates

R1 < (1/2) log(|G1 K1 G1T + G1 K2 G1T + Ir | / |G1 K2 G1T + Ir |),
R2 < (1/2) log |G2 K2 G2T + Ir |.

The rest of the achievability proof follows as for the scalar case.
Computing the capacity region is difficult because the rate terms for R j , j = 1, 2, are
not concave functions of K1 and K2 . This difficulty can be overcome by the following
duality result between the Gaussian vector BC and the Gaussian vector MAC. This result
will prove useful in the proof of the converse.

9.6.2 Gaussian Vector BC–MAC Duality


Given the GV-BC (referred to as the original BC) with channel gain matrices G1 and G2
and power constraint P, consider a GV-MAC with channel gain matrices G1T and G2T
(referred to as the dual MAC) as depicted in Figure ..

Figure .. Original BC (with noise Z1 , Z2 ∼ N(0, Ir )) and its dual MAC (with channel gain matrices G1T , G2T and noise Z ∼ N(0, It )).

The following states the dual relationship between these two channels.

Lemma . (BC–MAC Duality Lemma). Let CDMAC denote the capacity of the dual
MAC under the sum-power constraint
n
󵠈(x1T(m1 , i)x1 (m1 , i) + x2T(m2 , i)x2 (m2 , i)) ≤ nP for every (m1 , m2 ).
i=1

Then
RWDP = CDMAC .

Note that this lemma generalizes the scalar Gaussian BC–MAC duality result in Prob-
lem .. The proof is given in Appendix A.
It is easy to characterize CDMAC (see Figure .). Let K1 and K2 be the covariance
matrices for each sender and R(K1 , K2 ) be the set of rate pairs (R1 , R2 ) such that
R1 ≤ (1/2) log |G1T K1 G1 + It |,
R2 ≤ (1/2) log |G2T K2 G2 + It |,
R1 + R2 ≤ (1/2) log |G1T K1 G1 + G2T K2 G2 + It |.
Then any rate pair in the interior of R(K1 , K2 ) is achievable and thus the capacity region
of the dual MAC under the sum-power constraint is
CDMAC = ⋃K1 ,K2 ⪰0: tr(K1 )+tr(K2 )≤P R(K1 , K2 ).

The converse proof of this statement follows the same steps as that for the GV-MAC with
individual power constraints.

Figure .. Dual GV-MAC capacity region CDMAC .

The dual representation RWDP = CDMAC exhibits the following useful properties:
∙ The rate constraint terms for R(K1 , K2 ) are concave functions of (K1 , K2 ), so we can
use tools from convex optimization.
∙ The region ⋃K1 ,K2 R(K1 , K2 ) is closed and convex.
Consequently, each boundary point (R1∗ , R2∗ ) of CDMAC lies on the boundary of R(K1 , K2 )
for some K1 and K2 , and is a solution to the convex optimization problem

maximize αR1 + ᾱR2
subject to (R1 , R2 ) ∈ CDMAC

for some α ∈ [0, 1]. That is, (R1∗ , R2∗ ) lies on the supporting line with slope −α/ᾱ as shown
in Figure .. Now it can be easily seen that each boundary point (R1∗ , R2∗ ) of CDMAC is a
corner point of R(K1 , K2 ) for some K1 and K2 (we refer to such a corner point (R1∗ , R2∗ ) as
a boundary corner point) or a convex combination of boundary corner points (or both).
Furthermore, if a boundary corner point (R1∗ , R2∗ ) is inside the positive quadrant, i.e.,
R1∗ , R2∗ > 0, then it has a unique supporting line; see Appendix B for the proof. In other
words, CDMAC does not have a kink inside the positive quadrant.

Figure .. Boundary corner points of the dual MAC capacity region.

9.6.3 Proof of the Converse


To prove the converse for Theorem ., by the BC–MAC duality, it suffices to show that
every achievable rate pair (R1 , R2 ) for the original GV-BC is in CDMAC . First note that the
end points of CDMAC are (C1 , 0) and (0, C2 ), where

C j = max_{K j : tr(K j )≤P} (1/2) log |G Tj K j G j + It |, j = 1, 2,

is the capacity of the channel to each receiver. By the reciprocity lemma (Lemma .), these
corner points correspond to the individual capacity bounds for the GV-BC and hence are
on the boundary of its capacity region C .
We now focus on proving the optimality of CDMAC inside the positive quadrant. The
proof consists of the following three steps:

∙ First we characterize the boundary corner point (R1∗ (α), R2∗ (α)) associated with the
unique supporting line of slope −α/ᾱ via Lagrange duality.
∙ Next we construct an enhanced degraded Gaussian vector BC, referred to as DBC(α),
for each boundary corner point (R1∗ (α), R2∗ (α)) such that the capacity region C (of the
original BC) is contained in the enhanced BC capacity region CDBC(α) (Lemma .) as
illustrated in Figure ..
∙ We then show that the boundary corner point (R1∗ (α), R2∗ (α)) is on the boundary of
CDBC(α) (Lemma .). Since CDMAC = RWDP ⊆ C ⊆ CDBC(α) , we conclude that each
boundary corner point (R1∗ (α), R2∗ (α)) is on the boundary of C as illustrated in Fig-
ure ..

Finally, since every boundary corner point (R1∗ (α), R2∗ (α)) of CDMAC inside the positive
quadrant has a unique supporting line, the boundary of CDMAC must coincide with that
of C by Lemma A., which completes the proof of the converse. We now give the details
of the above three steps.

Figure .. Illustration of step : RWDP ⊆ C ⊆ CDBC(α) .



Figure .. Illustration of step : Optimality of (R1∗ (α), R2∗ (α)).

Step  (Boundary corner points of CDMAC via Lagrange duality). Recall that every
boundary corner point (R1∗ , R2∗ ) inside the positive quadrant maximizes αR1 + ᾱR2 for
some α ∈ [0, 1]. Assume without loss of generality that α ≤ 1/2 ≤ ᾱ. Then the rate pair

R1∗ = (1/2) log(|G1T K1∗ G1 + G2T K2∗ G2 + It | / |G2T K2∗ G2 + It |),
R2∗ = (1/2) log |G2T K2∗ G2 + It |
2
uniquely corresponds to an optimal solution to the convex optimization problem
maximize (α/2) log |G1T K1 G1 + G2T K2 G2 + It | + ((ᾱ − α)/2) log |G2T K2 G2 + It |
subject to tr(K1 ) + tr(K2 ) ≤ P, K1 , K2 ⪰ 0.
Introducing the dual variables
tr(K1 ) + tr(K2 ) ≤ P ⇔ λ ≥ 0,
K1 , K2 ⪰ 0 ⇔ Υ1 , Υ2 ⪰ 0,
we form the Lagrangian
L(K1 , K2 , Υ1 , Υ2 , λ) = (α/2) log |G1T K1 G1 + G2T K2 G2 + It | + ((ᾱ − α)/2) log |G2T K2 G2 + It |
    + tr(Υ1 K1 ) + tr(Υ2 K2 ) − λ(tr(K1 ) + tr(K2 ) − P).
Since Slater’s condition is satisfied for P > 0, the KKT condition characterizes the optimal
solution (K1∗ , K2∗ ), that is, a primal solution (K1∗ , K2∗ ) is optimal iff there exists a dual
optimal solution (λ∗ , Υ1∗ , Υ2∗ ) such that

λ∗ G1 Σ1 G1T + Υ1∗ − λ∗ Ir = 0,
λ∗ G2 Σ2 G2T + Υ2∗ − λ∗ Ir = 0,
λ∗ (tr(K1∗ ) + tr(K2∗ ) − P) = 0,    (.)
tr(Υ1∗ K1∗ ) = tr(Υ2∗ K2∗ ) = 0,

where

Σ1 = (α/(2λ∗ )) (G1T K1∗ G1 + G2T K2∗ G2 + It )−1 ,
Σ2 = (α/(2λ∗ )) (G1T K1∗ G1 + G2T K2∗ G2 + It )−1 + ((ᾱ − α)/(2λ∗ )) (G2T K2∗ G2 + It )−1 .

Note that λ∗ > 0 by the first two equality conditions and the positive definite property of
Σ1 and Σ2 . Further define
K1∗∗ = (α/(2λ∗ )) (G2T K2∗ G2 + It )−1 − Σ1 ,
K2∗∗ = (ᾱ/(2λ∗ )) It − K1∗∗ − Σ2 .    (.)

It can be easily verified that
1. K1∗∗ , K2∗∗ ⪰ 0,
2. tr(K1∗∗ ) + tr(K2∗∗ ) = P (since for a matrix A, I − (I + A)−1 = (I + A)−1 A), and
3. the boundary corner point (R1∗ , R2∗ ) can be written as

R1∗ = (1/2) log(|K1∗∗ + Σ1 | / |Σ1 |),
R2∗ = (1/2) log(|K1∗∗ + K2∗∗ + Σ2 | / |K1∗∗ + Σ2 |).    (.)
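As a numerical sanity check (not from the text), the sketch below takes a primal optimum (K1∗ , K2∗ ) and the multiplier λ∗ of the sum-power constraint (for example, the dual value reported by the solver in the earlier sketch, with the objective in nats), forms Σ1 , Σ2 , K1∗∗ , K2∗∗ as defined above, and checks properties 1–3. The function names are illustrative and numpy is assumed.

```python
import numpy as np

def kkt_quantities(K1s, K2s, lam, alpha, G1, G2):
    """Sigma1, Sigma2, K1**, K2** from (K1*, K2*, lambda*) per the definitions above."""
    It = np.eye(G1.shape[1])
    M = np.linalg.inv(G1.T @ K1s @ G1 + G2.T @ K2s @ G2 + It)
    N = np.linalg.inv(G2.T @ K2s @ G2 + It)
    Sigma1 = alpha / (2 * lam) * M
    Sigma2 = Sigma1 + ((1 - alpha) - alpha) / (2 * lam) * N
    K1ss = alpha / (2 * lam) * N - Sigma1
    K2ss = (1 - alpha) / (2 * lam) * It - K1ss - Sigma2
    return Sigma1, Sigma2, K1ss, K2ss

def check_properties(K1s, K2s, lam, alpha, G1, G2, P):
    Sigma1, Sigma2, K1ss, K2ss = kkt_quantities(K1s, K2s, lam, alpha, G1, G2)
    ld = lambda M: 0.5 * np.linalg.slogdet(M)[1]     # (1/2) log |M| in nats
    print("1. min eigenvalues:", np.linalg.eigvalsh(K1ss).min(),
          np.linalg.eigvalsh(K2ss).min())            # should be >= 0 up to solver tolerance
    print("2. tr(K1** + K2**) =", np.trace(K1ss + K2ss), " vs P =", P)
    print("3. R1* =", ld(K1ss + Sigma1) - ld(Sigma1),
          " R2* =", ld(K1ss + K2ss + Sigma2) - ld(K1ss + Sigma2))
```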

Step  (Construction of the enhanced degraded GV-BC). For each boundary corner
point (R1∗ (α), R2∗ (α)) corresponding to the supporting line (α, ᾱ), we define the GV-BC DBC(α)

Ỹ1 = X + W1 ,
Ỹ2 = X + W2 ,

where W1 ∼ N(0, Σ1 ) and W2 ∼ N(0, Σ2 ) are the noise vector components. As in the
original BC, we assume average power constraint P on X. Since Σ2 ⪰ Σ1 ≻ 0, the enhanced
channel DBC(α) is a degraded BC. Assume without loss of generality that it is physically
degraded, as shown in Figure ..

Figure .. DBC(α): Ỹ1 = X + W1 with W1 ∼ N(0, Σ1 ), and Ỹ2 = Ỹ1 + W̃2 with W̃2 ∼ N(0, Σ2 − Σ1 ).



We now show the following.

Lemma .. The capacity region CDBC(α) of DBC(α) is an outer bound on the capacity
region C of the original BC, i.e.,

C ⊆ CDBC(α) .

To prove this lemma, we show that the original BC is a degraded version of the en-
hanced DBC(α). This clearly implies that any code for the original BC achieves the same
or smaller probability of error when used for the DBC(α); hence every rate pair in C is
also in CDBC(α) . By the KKT optimality conditions for (K1∗ , K2∗ ) in (.),

Ir − G j Σ j G Tj ⪰ 0, j = 1, 2.

Now each receiver j = 1, 2 of DBC(α) can multiply the received signal Ỹ j by G j and add an independent noise vector W′j ∼ N(0, Ir − G j Σ j G Tj ) to form the new output

Y′j = G j Ỹ j + W′j = G j X + G j W j + W′j ,

where the zero-mean Gaussian noise G j W j + W′j has covariance matrix G j Σ j G Tj + (Ir − G j Σ j G Tj ) = Ir . Thus the transformed received signal Y′j has the same distribution as the received signal Y j for the original BC with channel gain matrix G j and noise covariance matrix Ir .
Step  (Optimality of (R1∗ (α), R2∗ (α))). We are now ready to establish the final step in the
proof of the converse.

Lemma .. Every boundary corner point (R1∗ (α), R2∗ (α)) in CDMAC in the positive
quadrant corresponding to the supporting line of slope −α/ᾱ is on the boundary of the
capacity region of DBC(α).

To prove this lemma, we follow similar steps to the proof of the converse for the scalar
Gaussian BC in Section ... First recall the representation of the boundary corner point
(R1∗ , R2∗ ) in (.). Consider a (2nR1 , 2nR2 , n) code for DBC(α) with limn→∞ Pe(n) = 0. To
prove the optimality of (R1∗ , R2∗ ), we show that if R1 > R1∗ , then R2 ≤ R2∗ .
First note that h(Ỹ2n ) = h(X n + W2n ) is upper bounded by

max_{KX : tr(KX )≤P} (n/2) log((2πe)t |KX + Σ2 |).

Recalling the properties

tr(K1∗∗ + K2∗∗ ) = P,
K1∗∗ + K2∗∗ + Σ2 = (ᾱ/(2λ∗ )) It ,

we see that the covariance matrix (K1∗∗ + K2∗∗ ) satisfies the KKT condition for the point-
to-point channel Y2 = X + Z2 . Therefore

h(Ỹ2n ) ≤ (n/2) log((2πe)t |K1∗∗ + K2∗∗ + Σ2 |).
As in the converse proof for the scalar Gaussian BC, we use the conditional vector EPI to
lower bound h(Ỹ2n |M2 ). By Fano's inequality and the assumption R1∗ < R1 ,

(n/2) log(|K1∗∗ + Σ1 | / |Σ1 |) = nR1∗
  ≤ I(M1 ; Ỹ1n |M2 ) + nєn
  = h(Ỹ1n |M2 ) − h(Ỹ1n |M1 , M2 ) + nєn
  ≤ h(Ỹ1n |M2 ) − (n/2) log((2πe)t |Σ1 |) + nєn ,

or equivalently,

h(Ỹ1n |M2 ) ≥ (n/2) log((2πe)t |K1∗∗ + Σ1 |) − nєn .

Since Ỹ2n = Ỹ1n + W̃2n , and Ỹ1n and W̃2n are independent and have densities, by the conditional vector EPI,

h(Ỹ2n |M2 ) ≥ (nt/2) log(2^{2h(Ỹ1n |M2 )/(nt)} + 2^{2h(W̃2n |M2 )/(nt)})
  ≥ (nt/2) log((2πe)|K1∗∗ + Σ1 |^{1/t} + (2πe)|Σ2 − Σ1 |^{1/t}) − nє′n

for some є′n that tends to zero as n → ∞. But from the definitions of Σ1 , Σ2 , K1∗∗ , K2∗∗ , the matrices

K1∗∗ + Σ1 = (α/(2λ∗ )) (G2T K2∗ G2 + It )−1 ,
Σ2 − Σ1 = ((ᾱ − α)/(2λ∗ )) (G2T K2∗ G2 + It )−1

are scaled versions of each other. Hence

|K1∗∗ + Σ1 |^{1/t} + |Σ2 − Σ1 |^{1/t} = |(K1∗∗ + Σ1 ) + (Σ2 − Σ1 )|^{1/t} = |K1∗∗ + Σ2 |^{1/t} .

Therefore

h(Ỹ2n |M2 ) ≥ (n/2) log((2πe)t |K1∗∗ + Σ2 |) − nє′n .

Combining the bounds and taking n → ∞, we finally obtain


R2 ≤ (1/2) log(|K1∗∗ + K2∗∗ + Σ2 | / |K1∗∗ + Σ2 |) = R2∗ .
This completes the proof of Lemma .. Combining the results from the previous steps
completes the proof of the converse and Theorem ..

9.6.4 GV-BC with More than Two Receivers


The capacity region of the -receiver GV-BC can be extended to an arbitrary number of
receivers. Consider the k-receiver Gaussian vector broadcast channel
Yj = GjX + Zj, j ∈ [1 : k],
where Z j ∼ N(0, Ir ), j ∈ [1 : k]. Assume average power constraint P on X. The capacity
region of this channel is as follows.

Theorem .. The capacity region of the k-receiver GV-BC is the convex hull of the
set of rate tuples (R1 , R2 , . . . , Rk ) such that

Rσ( j) ≤ (1/2) log(|∑ j′ ≥ j Gσ( j′ ) Kσ( j′ ) Gσ( j′ )T + Ir | / |∑ j′ > j Gσ( j′ ) Kσ( j′ ) Gσ( j′ )T + Ir |)

for some permutation σ on [1 : k] and positive semidefinite matrices K1 , . . . , Kk with


∑ j tr(K j ) ≤ P.

For a given σ, the corresponding rate region is achievable by writing on dirty paper
with decoding order σ(1) → ⋅ ⋅ ⋅ → σ(k). As in the -receiver case, the converse proof
hinges on the BC–MAC duality, which can be easily extended to more than two receivers.
First, it can be shown that the corresponding dual MAC capacity region CDMAC consists
of all rate tuples (R1 , R2 , . . . , Rk ) such that

∑ j∈J R j ≤ (1/2) log |∑ j∈J G Tj K j G j + It |, J ⊆ [1 : k],

for some positive semidefinite matrices K1 , . . . , Kk with ∑ j tr(K j ) ≤ P. The optimality


of CDMAC can be proved by induction on k. For the case R j = 0 for some j ∈ [1 : k],
the problem reduces to proving the optimality of CDMAC for k − 1 receivers. Therefore,
we can consider (R1 , . . . , Rk ) in the positive orthant only. Now as in the -receiver case,
each boundary corner point can be shown to be optimal by constructing a corresponding
degraded GV-BC, and showing that the boundary corner point is on the boundary of the
capacity region of the degraded GV-BC and hence on the boundary of the capacity region
of the original GV-BC. It can be also shown that CDMAC does not have a kink, that is, every
boundary corner point has a unique supporting hyperplane. Therefore, the boundary of
CDMAC must coincide with the capacity region, which proves the optimality of writing on
dirty paper.

SUMMARY

∙ Gaussian vector channel models


∙ Reciprocity via singular value decomposition
∙ Water-filling and iterative water-filling
∙ Vector writing on dirty paper:
∙ Marton’s coding scheme
∙ Optimal for the Gaussian vector BC with private messages
∙ BC–MAC duality
∙ Proof of the converse for the Gaussian vector BC:
∙ Converse for each boundary corner point
∙ Characterization of the corner point via Lagrange duality and KKT condition
∙ Construction of an enhanced degraded BC
∙ Use of the vector EPI for the enhanced degraded BC
∙ Open problems:
9.1. What is the capacity region of the Gaussian product BC with more than  re-
ceivers?
9.2. Can the converse for the Gaussian vector BC be proved directly by optimizing
the Nair–El Gamal outer bound?
9.3. What is the capacity region of the -receiver Gaussian vector BC with common
message?

BIBLIOGRAPHIC NOTES

The Gaussian vector channel was first considered by Telatar () and Foschini ().
The capacity region of the Gaussian vector MAC in Theorem . was established by Cheng
and Verdú (). The iterative water-filling algorithm in Section . is due to Yu, Rhee,
Boyd, and Cioffi (). The capacity region of the degraded Gaussian product broadcast
channel was established by Hughes-Hartogs (). The more general Theorem . and
Proposition . appeared in El Gamal (). The private-message capacity region in (.)
is due to Poltyrev ().
Caire and Shamai () devised a writing on dirty paper scheme for the Gaussian
vector broadcast channel and showed that it achieves the sum-capacity for r = 2 and t = 1
antennas. The sum-capacity for an arbitrary number of antennas was established by Vish-
wanath, Jindal, and Goldsmith () and Viswanath and Tse () using the BC–MAC

duality and by Yu and Cioffi () using a minimax argument. The capacity region in
Theorem . was established by Weingarten, Steinberg, and Shamai (), using the
technique of channel enhancement. Our proof is a simplified version of the proof by
Mohseni ().
A recent survey of the literature on Gaussian vector channels can be found in Biglieri,
Calderbank, Constantinides, Goldsmith, Paulraj, and Poor ().

PROBLEMS

.. Show that the capacity region of the reversely degraded DM-BC in Proposition .
coincides with both the Marton inner bound and the Nair–El Gamal outer bound
in Chapter .
.. Establish the converse for Theorem . starting from the rate region in Proposi-
tion . with average power constraint.
.. Time division for the Gaussian vector MAC. Consider a -sender Gaussian vector
MAC with channel output Y = (Y1 , Y2 ) given by

Y1 = X1 + Z1
Y2 = X1 + X2 + Z2 ,

where Z1 ∼ N(0, 1) and Z2 ∼ N(0, 1) are independent. Assume average power


constraint P on each of X1 and X2 .
(a) Find the capacity region.
(b) Find the time-division inner bound (with power control). Is it possible to
achieve any point on the boundary of the capacity region (except for the end
points)?
.. Sato’s outer bound for the Gaussian IC. Consider the Gaussian IC in Section .
with channel gain matrix
G = [д11 д12 ; д21 д22 ]

and SNRs S1 and S2 . Show that if a rate pair (R1 , R2 ) is achievable, then it must
satisfy the inequalities
R1 ≤ C(S1 ),
R2 ≤ C(S2 ),
R1 + R2 ≤ min_K (1/2) log |PGG T + K |,
where the minimum is over all covariance matrices K of the form
K = [1 ρ ; ρ 1].

APPENDIX 9A PROOF OF THE BC–MAC DUALITY LEMMA

We show that RWDP ⊆ CDMAC . The other direction of inclusion can be proved similarly.
Consider the writing on dirty paper scheme in which the message M1 is encoded be-
fore M2 . Denote by K1 and K2 the covariance matrices for X11n (M1 ) and X21n (X11n , M1 ), respectively. We know that the following rates are achievable:

R1 = (1/2) log(|G1 (K1 + K2 )G1T + Ir | / |G1 K2 G1T + Ir |),
R2 = (1/2) log |G2 K2 G2T + Ir |.
We show that (R1 , R2 ) is achievable in the dual MAC with the sum power constraint P =
tr(K1 ) + tr(K2 ) using successive cancellation decoding for M2 before M1 .
Let
K1′ = Σ1−1/2 K̄1 Σ1−1/2 ,

where Σ1−1/2 is the symmetric square root inverse of Σ1 = G1 K2 G1T + Ir and K̄1 is obtained by the reciprocity lemma in Section .. for the covariance matrix K1 and the channel matrix Σ1−1/2 G1 such that tr(K̄1 ) ≤ tr(K1 ) and

|Σ1−1/2 G1 K1 G1T Σ1−1/2 + Ir | = |G1T Σ1−1/2 K̄1 Σ1−1/2 G1 + It |.

Further let
K2′ = Σ21/2 K2 Σ21/2 ,

where Σ21/2 is the symmetric square root of Σ2 = G1T K1′ G1 + It and the bar over Σ21/2 K2 Σ21/2 means K2′ is obtained by the reciprocity lemma for the covariance matrix Σ21/2 K2 Σ21/2 and the channel matrix G2 Σ2−1/2 such that

tr(K2′ ) ≤ tr(Σ21/2 K2 Σ21/2 )

and

|G2 Σ2−1/2 (Σ21/2 K2 Σ21/2 )Σ2−1/2 G2T + Ir | = |Σ2−1/2 G2T K2′ G2 Σ2−1/2 + It |.

Using the covariance matrices K1′ and K2′ for senders 1 and 2, respectively, the following rates are achievable for the dual MAC when the receiver decodes for M2 before M1 (the reverse of the encoding order for the BC):

R1′ = (1/2) log |G1T K1′ G1 + It |,
R2′ = (1/2) log(|G1T K1′ G1 + G2T K2′ G2 + It | / |G1T K1′ G1 + It |).

From the definitions of K1′ , K2′ , Σ1 , and Σ2 , we have

R1 = (1/2) log(|G1 (K1 + K2 )G1T + Ir | / |G1 K2 G1T + Ir |)
   = (1/2) log(|G1 K1 G1T + Σ1 | / |Σ1 |)
   = (1/2) log |Σ1−1/2 G1 K1 G1T Σ1−1/2 + Ir |
   = (1/2) log |G1T Σ1−1/2 K̄1 Σ1−1/2 G1 + It |
   = R1′ ,

and similarly we can show that R2 = R2′ . Furthermore

tr(K2′ ) ≤ tr(Σ21/2 K2 Σ21/2 )
  = tr(Σ2 K2 )
  = tr((G1T K1′ G1 + It )K2 )
  = tr(K2 ) + tr(K1′ G1 K2 G1T )
  = tr(K2 ) + tr(K1′ (Σ1 − Ir ))
  = tr(K2 ) + tr(Σ11/2 K1′ Σ11/2 ) − tr(K1′ )
  = tr(K2 ) + tr(K̄1 ) − tr(K1′ )
  ≤ tr(K2 ) + tr(K1 ) − tr(K1′ ).

Therefore, any point in RWDP is also achievable for the dual MAC under the sum-power
constraint. The proof for the case where M2 is encoded before M1 follows similarly.

APPENDIX 9B UNIQUENESS OF THE SUPPORTING LINE

We show that every boundary corner point (R1∗ , R2∗ ) with R1∗ , R2∗ > 0 has a unique supporting line (α, ᾱ). Consider a boundary corner point (R1∗ , R2∗ ) in the positive quadrant. Assume without loss of generality that 0 ≤ α ≤ 1/2 and

R1∗ = (1/2) log(|G1T K1∗ G1 + G2T K2∗ G2 + It | / |G2T K2∗ G2 + It |),
R2∗ = (1/2) log |G2T K2∗ G2 + It |,
where (K1∗ , K2∗ ) is an optimal solution to the convex optimization problem
maximize (α/2) log |G1T K1 G1 + G2T K2 G2 + It | + ((ᾱ − α)/2) log |G2T K2 G2 + It |
subject to tr(K1 ) + tr(K2 ) ≤ P
         K1 , K2 ⪰ 0.

We prove the uniqueness by contradiction. Suppose that (R1∗ , R2∗ ) has another supporting
line (β, β̄) ̸= (α, ᾱ). First note that α and β must be nonzero. Otherwise, K1∗ = 0, which

contradicts the assumption that R1∗ > 0. Also by the assumption that R1∗ > 0, we must
have G1T K1∗ G1 ̸= 0 as well as K1∗ ̸= 0. Now consider a feasible solution of the optimization
problem at ((1 − є)K1∗ , єK1∗ + K2∗ ), given by

(α/2) log |(1 − є)G1T K1∗ G1 + G2T (єK1∗ + K2∗ )G2 + It | + ((ᾱ − α)/2) log |G2T (єK1∗ + K2∗ )G2 + It | .
Taking the derivative (Boyd and Vandenberghe ) at є = 0 and using the optimality
of (K1∗ , K2∗ ), we obtain

α tr((G1T K1∗ G1 + G2T K2∗ G2 + It )−1 (G2T K1∗ G2 − G1T K1∗ G1 ))
    + (ᾱ − α) tr((G2T K2∗ G2 + It )−1 (G2T K1∗ G2 )) = 0,

and similarly

β tr((G1T K1∗ G1 + G2T K2∗ G2 + It )−1 (G2T K1∗ G2 − G1T K1∗ G1 ))
    + ( β̄ − β) tr((G2T K2∗ G2 + It )−1 (G2T K1∗ G2 )) = 0.

But since α ̸= β, this implies that

tr((G1T K1∗ G1 + G2T K2∗ G2 + It )−1 (G2T K1∗ G2 − G1T K1∗ G1 ))
    = tr((G2T K2∗ G2 + It )−1 (G2T K1∗ G2 )) = 0,

which in turn implies that G1T K1∗ G1 = 0 and that R1∗ = 0. But this contradicts the hypoth-
esis that R1∗ > 0, which completes the proof.
CHAPTER 10

Distributed Lossless Compression

In this chapter, we begin the discussion on communication of uncompressed sources over


multiple noiseless links. We consider the limits on lossless compression of separately en-
coded sources, which is motivated by distributed sensing problems. For example, consider
a sensor network for measuring the temperature at different locations across a city. Sup-
pose that each sensor node compresses its measurement and transmits it to a common
base station via a noiseless link. What is the minimum total transmission rate needed so
that the base station can losslessly recover the measurements from all the sensors? If the
sensor measurements are independent of each other, then the answer to this question is
straightforward; each sensor compresses its measurement to the entropy of its respective
temperature process, and the limit on the total rate is the sum of the individual entropies.
The temperature processes at the sensors, however, can be highly correlated. Can such
correlation be exploited to achieve a lower rate than the sum of the individual entropies?
Slepian and Wolf showed that the total rate can be reduced to the joint entropy of the
processes, that is, the limit on distributed lossless compression is the same as that on cen-
tralized compression, where the sources are jointly encoded. The achievability proof of
this surprising result uses the new idea of random binning.
We then consider lossless source coding with helpers. Suppose that the base station in
our sensor network example wishes to recover the temperature measurements from only
a subset of the sensors while using the information sent by the rest of the sensor nodes to
help achieve this goal. What is the optimal tradeoff between the rates from the different
sensors? We establish the optimal rate region for the case of a single helper node.
In Chapter , we continue the discussion of distributed lossless source coding by
considering more general networks modeled by graphs.

10.1 DISTRIBUTED LOSSLESS SOURCE CODING FOR A 2-DMS

Consider the distributed compression system depicted in Figure ., where two sources
X1 and X2 are separately encoded (described) at rates R1 and R2 , respectively, and the
descriptions are communicated over noiseless links to a decoder who wishes to recover
both sources losslessly. What is the set of simultaneously achievable description rate pairs
(R1 , R2 )?
We assume a -component DMS (-DMS) (X1 × X2 , p(x1 , x2 )), informally referred to
as correlated sources or -DMS (X1 , X2 ), that consists of two finite alphabets X1 , X2 and

Figure .. Distributed lossless compression system.

a joint pmf p(x1 , x2 ) over X1 × X2 . The -DMS (X1 , X2 ) generates a jointly i.i.d. random
process {(X1i , X2i )} with (X1i , X2i ) ∼ p X1 ,X2 (x1i , x2i ).
A (2nR1 , 2nR2 , n) distributed lossless source code for the -DMS (X1 , X2 ) consists of
∙ two encoders, where encoder  assigns an index m1 (x1n ) ∈ [1 : 2nR1 ) to each sequence
x1n ∈ X1n and encoder  assigns an index m2 (x2n ) ∈ [1 : 2nR2 ) to each sequence x2n ∈ X2n ,
and
∙ a decoder that assigns an estimate (x̂1n , x̂2n ) ∈ X1n × X2n or an error message e to each
index pair (m1 , m2 ) ∈ [1 : 2nR1 ) × [1 : 2nR2 ).
The probability of error for a distributed lossless source code is defined as

Pe(n) = P{( X̂ 1n , X̂ 2n ) ̸= (X1n , X2n )}.

A rate pair (R1 , R2 ) is said to be achievable for distributed lossless source coding if there
exists a sequence of (2nR1 , 2nR2 , n) codes such that limn→∞ Pe(n) = 0. The optimal rate re-
gion R ∗ is the closure of the set of achievable rate pairs.
Remark .. As for the capacity region of a multiuser channel, we can readily show that
R ∗ is convex by using time sharing.

10.2 INNER AND OUTER BOUNDS ON THE OPTIMAL RATE REGION

By the lossless source coding theorem in Section ., a rate pair (R1 , R2 ) is achievable for
distributed lossless source coding if

R1 > H(X1 ),
R2 > H(X2 ).

This gives the inner bound in Figure ..


Also, by the lossless source coding theorem, a rate R ≥ H(X1 , X2 ) is necessary and
sufficient to send a pair of sources (X1 , X2 ) together to a receiver. This yields the sum-rate
bound
R1 + R2 ≥ H(X1 , X2 ). (.)

Furthermore, using similar steps to the converse proof of the lossless source coding theo-
rem, we can show that any achievable rate pair for distributed lossless source coding must

satisfy the conditional entropy bounds

R1 ≥ H(X1 | X2 ),
(.)
R2 ≥ H(X2 | X1 ).

Combining the bounds in (.) and (.), we obtain the outer bound on the optimal rate
region in Figure .. In the following, we show that the outer bound is tight.

Figure .. Inner and outer bounds on the optimal rate region R ∗ .

10.3 SLEPIAN–WOLF THEOREM

First suppose that sender  observes both X1 and X2 . Then, by a conditional version of the
lossless source coding theorem (see Problem .), it can be readily shown that the corner
point (R1 , R2 ) = (H(X1 |X2 ), H(X2 )) on the boundary of the outer bound in Figure .
is achievable. Slepian and Wolf showed that this corner point is achievable even when
sender  does not know X2 . Thus the sum-rate R1 + R2 > H(X1 , X2 ) is achievable even
when the sources are separately encoded and the outer bound in Figure . is tight!

Theorem . (Slepian–Wolf Theorem). The optimal rate region R ∗ for distributed
lossless source coding of a -DMS (X1 , X2 ) is the set of rate pairs (R1 , R2 ) such that

R1 ≥ H(X1 | X2 ),
R2 ≥ H(X2 | X1 ),
R1 + R2 ≥ H(X1 , X2 ).

The following example illustrates the saving in rate using Slepian–Wolf coding.

Example .. Consider a doubly symmetric binary source (DSBS(p)) (X1 , X2 ), where
X1 and X2 are binary random variables with p X1 ,X2 (0, 0) = p X1 ,X2 (1, 1) = (1 − p)/2 and
p X1 ,X2 (0, 1) = p X1 ,X2 (1, 0) = p/2, p ∈ [0, 1/2]. Thus, X1 ∼ Bern(1/2), X2 ∼ Bern(1/2),
and their modulo- sum Z = X1 ⊕ X2 ∼ Bern(p) is independent of each of X1 and X2 .
Equivalently, X2 = X1 ⊕ Z is the output of a BSC(p) with input X1 and vice versa.
Suppose that p = 0.01. Then the sources X1 and X2 are highly dependent. If we
compress the sources independently, then we need to send 2 bits/symbol-pair. How-
ever, if we use Slepian–Wolf coding instead, we need to send only H(X1 ) + H(X2 |X1 ) =
H(1/2) + H(0.01) = 1 + 0.0808 = 1.0808 bits/symbol-pair.
In general, using Slepian–Wolf coding, the optimal rate region for the DSBS(p) is the
set of all rate pairs (R1 , R2 ) such that

R1 ≥ H(p),
R2 ≥ H(p),
R1 + R2 ≥ 1 + H(p).
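A quick computation (values produced by the snippet itself, with numpy assumed) reproduces these numbers and the corner rates of the region for p = 0.01:

```python
import numpy as np

def Hb(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

p = 0.01
R1_min, R2_min = Hb(p), Hb(p)       # H(X1|X2) = H(X2|X1) = H(p)
Rsum_min = 1 + Hb(p)                # H(X1, X2) = H(X1) + H(X2|X1)
print(R1_min, R2_min, Rsum_min)     # approx. 0.0808, 0.0808, 1.0808 bits/symbol-pair
print("separate encoding:", 2.0)    # H(X1) + H(X2) = 2 bits/symbol-pair
```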

The converse of the Slepian–Wolf theorem is already established by the fact that the rate
region in the theorem coincides with the outer bound given by (.) and (.) in Fig-
ure .. Achievability involves the new idea of random binning. We first illustrate this
idea through an alternative achievability proof of the lossless source coding theorem.

10.3.1 Lossless Source Coding via Random Binning


Consider the following random binning achievability proof of the lossless source coding
theorem in Section ..

Codebook generation. Randomly and independently assign an index m(x n ) ∈ [1 : 2nR ] to


each sequence x n ∈ X n according to a uniform pmf over [1 : 2nR ]. We refer to each subset
of sequences with the same index m as a bin B(m), m ∈ [1 : 2nR ]. This binning scheme is
illustrated in Figure .. The chosen bin assignments are revealed to the encoder and the
decoder.

Encoding. Upon observing x n ∈ B(m), the encoder sends the bin index m.
Decoding. Upon receiving m, the decoder declares x̂n to be the estimate of the source
sequence if it is the unique typical sequence in B(m); otherwise it declares an error.

Figure .. Random binning for a single source. The black circles denote the typical x n sequences and the white circles denote the rest.

Analysis of the probability of error. We show that the probability of error averaged over
bin assignments tends to zero as n → ∞ if R > H(X).
Let M denote the random bin index of X n , i.e., X n ∈ B(M). Note that M ∼ Unif[1 : 2nR ] is independent of X n . The decoder makes an error iff one or both of the following events occur:

E1 = {X n ∉ Tє(n) },
E2 = {x̃n ∈ B(M) for some x̃n ̸= X n , x̃n ∈ Tє(n) }.

Then, by the symmetry of codebook construction and the union of events bound, the
average probability of error is upper bounded as

P(E) ≤ P(E1 ) + P(E2 )


= P(E1 ) + P(E2 | X n ∈ B(1)).

We now bound each probability of error term. By the LLN, P(E1 ) tends to zero as n → ∞.
For the second term, consider

P(E2 | X n ∈ B(1)) = ∑_{x n} P{X n = x n | X n ∈ B(1)}
      ⋅ P{x̃n ∈ B(1) for some x̃n ̸= x n , x̃n ∈ Tє(n) | x n ∈ B(1), X n = x n }
  (a) ≤ ∑_{x n} p(x n ) ∑_{x̃n ∈ Tє(n) : x̃n ̸= x n} P{x̃n ∈ B(1) | x n ∈ B(1), X n = x n }
  (b) = ∑_{x n} p(x n ) ∑_{x̃n ∈ Tє(n) : x̃n ̸= x n} P{x̃n ∈ B(1)}
  ≤ |Tє(n) | ⋅ 2−nR
  ≤ 2n(H(X)+δ(є)) 2−nR ,

where (a) and (b) follow since for every x̃n ̸= x n , the events {x n ∈ B(1)}, {x̃n ∈ B(1)},
and {X n = x n } are mutually independent. Thus, the probability of error averaged over
bin assignments tends to zero as n → ∞ if R > H(X) + δ(є). Hence, there must exist a
sequence of bin assignments with limn→∞ Pe(n) = 0. This completes the achievability proof
of the lossless source coding theorem via random binning.
Remark .. In the above proof, we used only pairwise independence of the bin assign-
ments. Hence, if X is a Bernoulli source, we can use random linear binning (hashing)
m(x n ) = H x n , where m ∈ [1 : 2nR ] is represented by a vector of nR bits and the elements
of the nR × n “parity-check” random binary matrix H are generated i.i.d. Bern(1/2). Note
that this is the dual of the linear channel coding in which encoding is performed using a
random generator matrix; see Section ... Such linear binning can be extended to the
more general case where X is the set of elements of a finite field.
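The toy simulation below (not from the text) illustrates the mechanics of such linear binning for a single Bern(p) source at a small blocklength: the encoder sends the hash H x n , and the decoder looks for a unique low-weight sequence in the received bin, a crude stand-in for typicality decoding. The blocklength, rate, and weight threshold are arbitrary choices, and numpy is assumed.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n, p, R = 20, 0.1, 0.8                    # toy blocklength, source bias, rate > H(0.1)
k = int(np.ceil(n * R))                   # number of hash bits
H = rng.integers(0, 2, size=(k, n))       # random binary "parity-check" matrix
w_max = 4                                 # weight threshold standing in for typicality

def encode(x):
    return tuple(H @ x % 2)               # bin index m(x^n) = H x^n over GF(2)

def decode(m):
    # unique sequence of weight <= w_max in bin m, if any
    found = None
    for w in range(w_max + 1):
        for pos in combinations(range(n), w):
            x = np.zeros(n, dtype=int)
            x[list(pos)] = 1
            if encode(x) == m:
                if found is not None:
                    return None           # more than one candidate in the bin
                found = x
    return found

errors = 0
for _ in range(200):
    x = rng.binomial(1, p, size=n)
    xhat = decode(encode(x))
    errors += xhat is None or not np.array_equal(xhat, x)
print("empirical error rate:", errors / 200)
```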

10.3.2 Achievability Proof of the Slepian–Wolf Theorem


We now use random binning to prove achievability of the Slepian–Wolf theorem.
Codebook generation. Randomly and independently assign an index m1 (x1n ) to each se-
quence x1n ∈ X1n according to a uniform pmf over [1 : 2nR1 ]. The sequences with the same
index m1 form a bin B1 (m1 ). Similarly assign an index m2 (x2n ) ∈ [1 : 2nR2 ] to each se-
quence x2n ∈ X2n . The sequences with the same index m2 form a bin B2 (m2 ). This binning
scheme is illustrated in Figure .. The bin assignments are revealed to the encoders and
the decoder.

Figure .. Random binning for two correlated sources. The black circles correspond to typical sequences.

Encoding. Upon observing x1n ∈ B1 (m1 ), encoder  sends m1 . Similarly, upon observing
x2n ∈ B2 (m2 ), encoder  sends m2 .
Decoding. Given the received index pair (m1 , m2 ), the decoder declares (x̂1n , x̂2n ) to be
the estimate of the source pair if it is the unique jointly typical pair in the product bin
B1 (m1 ) × B2 (m2 ); otherwise it declares an error.
Analysis of the probability of error. We bound the probability of error averaged over bin
assignments. Let M1 and M2 denote the random bin indices for X1n and X2n , respectively.
The decoder makes an error iff one or more of the following events occur:
E1 = {(X1n , X2n ) ∉ Tє(n) },
E2 = {x̃1n ∈ B1 (M1 ) for some x̃1n ̸= X1n , (x̃1n , X2n ) ∈ Tє(n) },
E3 = {x̃2n ∈ B2 (M2 ) for some x̃2n ̸= X2n , (X1n , x̃2n ) ∈ Tє(n) },
E4 = {x̃1n ∈ B1 (M1 ), x̃2n ∈ B2 (M2 ) for some x̃1n ̸= X1n , x̃2n ̸= X2n , (x̃1n , x̃2n ) ∈ Tє(n) }.

Then, the average probability of error is upper bounded as

P(E) ≤ P(E1 ) + P(E2 ) + P(E3 ) + P(E4 ).

Now we bound each probability of error term. By the LLN, P(E1 ) tends to zero as n → ∞.
Now consider the second term. Using the symmetry of the codebook construction and
following similar steps to the achievability proof of the lossless source coding theorem in
Section .., we have

P(E2 ) = P(E2 | X1n ∈ B1 (1))
  = ∑_{(x1n ,x2n )} P{(X1n , X2n ) = (x1n , x2n ) | X1n ∈ B1 (1)}
      ⋅ P{x̃1n ∈ B1 (1) for some x̃1n ̸= x1n , (x̃1n , x2n ) ∈ Tє(n) | x1n ∈ B1 (1), (X1n , X2n ) = (x1n , x2n )}
  ≤ ∑_{(x1n ,x2n )} p(x1n , x2n ) ∑_{x̃1n ∈ Tє(n) (X1 |x2n ) : x̃1n ̸= x1n} P{x̃1n ∈ B1 (1)}
  ≤ 2n(H(X1 |X2 )+δ(є)) 2−nR1 ,

which tends to zero as n → ∞ if R1 > H(X1 |X2 ) + δ(є). Similarly, P(E3 ) and P(E4 ) tend to
zero as n → ∞ if R2 > H(X2 |X1 ) + δ(є) and R1 + R2 > H(X1 , X2 ) + δ(є). Thus, the proba-
bility of error averaged over bin assignments tends to zero as n → ∞ if (R1 , R2 ) is in the in-
terior of R ∗ . Therefore, there exists a sequence of bin assignments with limn→∞ Pe(n) = 0.
This completes the achievability proof of the Slepian–Wolf theorem.
Remark . (Distributed lossless compression via linear binning). If X1 and X2 are
Bernoulli sources, we can use random linear binnings m1 (x1n ) = H1 x1n and m2 (x2n ) =
H2 x2n , where the entries of H1 and H2 are generated i.i.d. Bern(1/2). If in addition,
(X1 , X2 ) is DSBS(p) (i.e., Z = X1 ⊕ X2 ∼ Bern(p)), then we can use the following cod-
ing scheme based on single-source linear binning. Consider the corner point (R1 , R2 ) =
(1, H(p)) of the optimal rate region. Suppose that X1n is sent uncoded while X2n is encoded
as H X2n with a randomly generated n(H(p) + δ(є)) × n parity-check matrix H. The de-
coder can calculate H X1n ⊕ H X2n = H Z n , from which Z n can be recovered with high prob-
ability as in the single-source case. Hence, X2n = X1n ⊕ Z n can also be recovered with high
probability.
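A toy simulation of this corner-point scheme (not from the text; the blocklength, syndrome length, and the minimum-weight search are illustrative choices, with numpy assumed) is sketched below: X1n is sent uncoded, X2n is described by its syndrome H X2n , and the decoder recovers Z n from H Z n by searching for the lowest-weight solution.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
n, p = 20, 0.05                             # toy blocklength; Z = X1 xor X2 ~ Bern(p)
k = 12                                      # syndrome length, k/n = 0.6 > H(0.05)
H = rng.integers(0, 2, size=(k, n))

def min_weight_solution(s, w_max=4):
    """Smallest-weight z with Hz = s (mod 2), searched up to weight w_max."""
    for w in range(w_max + 1):
        for pos in combinations(range(n), w):
            z = np.zeros(n, dtype=int)
            z[list(pos)] = 1
            if np.array_equal(H @ z % 2, s):
                return z
    return None

errors = 0
for _ in range(200):
    x1 = rng.integers(0, 2, size=n)
    z = rng.binomial(1, p, size=n)
    x2 = (x1 + z) % 2
    s = (H @ x1 + H @ x2) % 2               # decoder computes H x1 xor H x2 = H z
    zhat = min_weight_solution(s)
    errors += zhat is None or not np.array_equal((x1 + zhat) % 2, x2)
print("empirical error rate for recovering X2^n:", errors / 200)
# Rates used: R1 = 1 (uncoded), R2 = k/n, vs the corner point (1, H(p)).
```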

10.4 LOSSLESS SOURCE CODING WITH A HELPER

Consider the distributed compression system depicted in Figure ., where only one of
the two sources is to be recovered losslessly and the encoder for the other source (helper)
provides coded side information to the decoder to help reduce the first encoder’s rate.
The definitions of a code, achievability, and optimal rate region R ∗ are the same as for
the distributed lossless source coding setup in Section ., except that the probability of
error is defined as Pe(n) = P{ X̂ n ̸= X n }.

Xn M1
Encoder 
X̂ n
Decoder
Yn M2
Encoder 

Figure .. Distributed lossless compression system with a helper.

If there is no helper, i.e., R2 = 0, then R1 ≥ H(X) is necessary and sufficient for loss-
lessly recovering X at the decoder. At the other extreme, if the helper sends Y losslessly to
the decoder, i.e., R2 ≥ H(Y), then R1 ≥ H(X|Y ) is necessary and sufficient by the Slepian–
Wolf theorem. These two extreme points define the time-sharing inner bound and the
trivial outer bound in Figure .. Neither bound is tight in general, however.

R2

Outer bound

H(Y) Time-sharing
inner bound

?
R1
H(X|Y) H(X)
Figure .. Inner and outer bounds on the optimal rate region for lossless source
coding with a helper.

The optimal rate region for the one-helper problem can be characterized as follows.

Theorem .. Let (X, Y ) be a -DMS. The optimal rate region R ∗ for lossless source
coding of X with a helper observing Y is the set of rate pairs (R1 , R2 ) such that

R1 ≥ H(X |U ),
R2 ≥ I(Y ; U)

for some conditional pmf p(u|y), where |U | ≤ |Y| + 1.

We illustrate this result in the following.


Example .. Let (X, Y ) be a DSBS(p), p ∈ [0, 1/2]. The optimal rate region simplifies

to the set of rate pairs (R1 , R2 ) such that


R1 ≥ H(α ∗ p),
(.)
R2 ≥ 1 − H(α)
for some α ∈ [0, 1/2]. It is straightforward to show that the above region is attained by
setting the backward test channel from Y to U as a BSC(α). The proof of optimality uses
Mrs. Gerber’s lemma as in the converse for the binary symmetric BC in Section ...
First, note that H(p) ≤ H(X|U ) ≤ 1. Thus there exists an α ∈ [0, 1/2] such that H(X|U ) =
H(α ∗ p). By the scalar MGL, H(X|U) = H(Y ⊕ Z|U) ≥ H(H −1 (H(Y|U )) ∗ p), since
given U, Z and Y remain independent. Thus, H −1 (H(Y|U )) ≤ α, which implies that

I(Y ; U ) = H(Y) − H(Y |U )


= 1 − H(Y |U )
≥ 1 − H(α).

Optimality of the rate region in (.) can be alternatively proved using a symmetrization
argument similar to that in Section ..; see Problem ..
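The boundary of this region is easy to trace numerically. The sketch below (not from the text; p and the α grid are arbitrary example values, with numpy assumed) sweeps α and prints the corresponding (R1 , R2 ) pairs, recovering the two extreme points at α = 0 and α = 1/2.

```python
import numpy as np

def Hb(q):
    return 0.0 if q in (0.0, 1.0) else -q*np.log2(q) - (1-q)*np.log2(1-q)

p = 0.25
for a in np.linspace(0.0, 0.5, 6):
    a_star_p = a * (1 - p) + (1 - a) * p      # binary convolution alpha * p
    R1, R2 = Hb(a_star_p), 1 - Hb(a)
    print(f"alpha={a:.1f}  R1 >= {R1:.3f}  R2 >= {R2:.3f}")
# alpha = 0 gives (H(p), 1): the helper sends Y losslessly;
# alpha = 1/2 gives (1, 0): no help at all.
```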

10.4.1 Proof of Achievability


The coding scheme for Theorem . is illustrated in Figure .. We use random binning
and joint typicality encoding. The helper (encoder ) uses joint typicality encoding (as in
the achievability proof of the lossy source coding theorem) to generate a description of the
source Y represented by the auxiliary random variable U . Encoder  uses random binning
as in the achievability proof of the Slepian–Wolf theorem to help the decoder recover X
given that it knows U . We now present the details of the proof.

x n B(1) B(2) B(3) B(2nR1 )


n
y
u (1)
n

un (2)
Tє(n) (U , X)
u (3)
n

un (2nR2 )

Figure .. Coding scheme for lossless source coding with a helper.

Codebook generation. Fix a conditional pmf p(u|y) and let p(u) = ∑ y p(y)p(u|y). Ran-
domly and independently assign an index m1 (x n ) ∈ [1 : 2nR1 ] to each sequence x n ∈ X n .
The set of sequences with the same index m1 form a bin B(m1 ). Randomly and indepen-
dently generate 2nR2 sequences un (m2 ), m2 ∈ [1 : 2nR2 ], each according to ∏ni=1 pU (ui ).
Encoding. If x n ∈ B(m1 ), encoder 1 sends m1 . Encoder 2 finds an index m2 such that (un (m2 ), y n ) ∈ Tє′(n) . If there is more than one such index, it sends the smallest one among them. If there is no such index, set m2 = 1. Encoder 2 then sends the index m2 to the decoder.
decoder.
Decoding. The receiver finds the unique x̂n ∈ B(m1 ) such that (x̂n , un (m2 )) ∈ Tє(n) . If
there is none or more than one, it declares an error.
Analysis of the probability of error. Assume that M1 and M2 are the chosen indices for
encoding X n and Y n , respectively. The decoder makes an error iff one or more of the
following events occur:

E1 = {(U n (m2 ), Y n ) ∉ Tє′(n) for all m2 ∈ [1 : 2nR2 )},
E2 = {(X n , U n (M2 )) ∉ Tє(n) },
E3 = {x̃n ∈ B(M1 ), (x̃n , U n (M2 )) ∈ Tє(n) for some x̃n ̸= X n }.

Thus, the probability of error is upper bounded as

P(E) ≤ P(E1 ) + P(E1c ∩ E2 ) + P(E3 | X n ∈ B(1)).

We now bound each term. By the covering lemma in Section . with X ← Y, X̂ ← U , and U ← ∅, P(E1 ) tends to zero as n → ∞ if R2 > I(Y ; U ) + δ(є′ ). Next, note that E1c = {(U n (M2 ), Y n ) ∈ Tє′(n) } and X n | {U n (M2 ) = un , Y n = y n } ∼ ∏ni=1 p X|Y (xi |yi ). Hence, by the conditional typicality lemma in Section . (see also Problem .), P(E1c ∩ E2 ) tends
to zero as n → ∞. Finally for the third term, consider

P(E3 | X n ∈ B(1)) = ∑_{(x n ,un )} P{(X n , U n ) = (x n , un ) | X n ∈ B(1)}
      ⋅ P{x̃n ∈ B(1) for some x̃n ̸= x n , (x̃n , un ) ∈ Tє(n) | X n ∈ B(1), (X n , U n ) = (x n , un )}
  ≤ ∑_{(x n ,un )} p(x n , un ) ∑_{x̃n ∈ Tє(n) (X|un ) : x̃n ̸= x n} P{x̃n ∈ B(1)}
  ≤ 2n(H(X|U )+δ(є)) 2−nR1 ,

which tends to zero as n → ∞ if R1 > H(X|U) + δ(є). This completes the proof of achiev-
ability.

10.4.2 Proof of the Converse


Let M1 and M2 denote the indices from encoders  and , respectively. By Fano’s inequality,
H(X n |M1 , M2 ) ≤ nєn for some єn that tends to zero as n → ∞.

First consider
nR2 ≥ H(M2 )
  ≥ I(M2 ; Y n )
  = ∑ni=1 I(M2 ; Yi |Y i−1 )
  = ∑ni=1 I(M2 , Y i−1 ; Yi )
  (a) = ∑ni=1 I(M2 , Y i−1 , X i−1 ; Yi ),

where (a) follows since X i−1 → (M2 , Y i−1 ) → Yi form a Markov chain. Identifying Ui = (M2 , Y i−1 , X i−1 ) and noting that Ui → Yi → Xi form a Markov chain, we have shown that
(M2 , Y i−1 , X i−1 ) and noting that Ui → Yi → Xi form a Markov chain, we have shown that
nR2 ≥ ∑ni=1 I(Ui ; Yi ).

Next consider
nR1 ≥ H(M1 )
≥ H(M1 |M2 )
= H(M1 |M2 ) + H(X n |M1 , M2 ) − H(X n |M1 , M2 )
≥ H(X n , M1 |M2 ) − nєn
= H(X n |M2 ) − nєn
  = ∑ni=1 H(Xi |M2 , X i−1 ) − nєn
  ≥ ∑ni=1 H(Xi |M2 , X i−1 , Y i−1 ) − nєn
  = ∑ni=1 H(Xi |Ui ) − nєn .

Using a time-sharing random variable Q ∼ Unif[1 : n], independent of (X n , Y n , U n ), we


obtain
(1/n) ∑ni=1 H(Xi |Ui , Q = i) = H(XQ |UQ , Q),
(1/n) ∑ni=1 I(Yi ; Ui |Q = i) = I(YQ ; UQ |Q).

Since Q is independent of YQ , I(YQ ; UQ |Q) = I(YQ ; UQ , Q). Thus, defining X = XQ , Y =


YQ , and U = (UQ , Q) and letting n → ∞, we have shown that R1 ≥ H(X|U) and R2 ≥
I(Y ; U ) for some conditional pmf p(u|y). The cardinality bound on U can be proved
using the convex cover method in Appendix C. This completes the proof of Theorem ..

10.5 EXTENSIONS TO MORE THAN TWO SOURCES

The Slepian–Wolf theorem can be extended to distributed lossless source coding for an
arbitrary number of sources.

Theorem .. The optimal rate region R ∗ (X1 , X2 , . . . , Xk ) for distributed lossless
source coding of a k-DMS (X1 , . . . , Xk ) is the set of rate tuples (R1 , R2 , . . . , Rk ) such
that

∑ j∈S R j ≥ H(X(S)| X(S c )) for all S ⊆ [1 : k].

For example, R ∗ (X1 , X2 , X3 ) is the set of rate triples (R1 , R2 , R3 ) such that
R1 ≥ H(X1 | X2 , X3 ),
R2 ≥ H(X2 | X1 , X3 ),
R3 ≥ H(X3 | X1 , X2 ),
R1 + R2 ≥ H(X1 , X2 | X3 ),
R1 + R3 ≥ H(X1 , X3 | X2 ),
R2 + R3 ≥ H(X2 , X3 | X1 ),
R1 + R2 + R3 ≥ H(X1 , X2 , X3 ).
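For a concrete joint pmf these 2^k − 1 constraints can be enumerated mechanically. The sketch below (not from the text; the example pmf is an arbitrary Markov-like choice and numpy is assumed) prints all seven conditional-entropy bounds for k = 3.

```python
import numpy as np
from itertools import product, combinations

# Example joint pmf of binary (X1, X2, X3): X1 ~ Bern(1/2),
# X2 agrees with X1 w.p. 0.9, X3 agrees with X2 w.p. 0.9
p = np.zeros((2, 2, 2))
for x1, x2, x3 in product((0, 1), repeat=3):
    p[x1, x2, x3] = 0.5 * (0.9 if x2 == x1 else 0.1) * (0.9 if x3 == x2 else 0.1)

def H(axes):
    """Joint entropy (bits) of the coordinates listed in `axes`."""
    q = p.sum(axis=tuple(i for i in range(3) if i not in axes)).ravel()
    q = q[q > 0]
    return -(q * np.log2(q)).sum()

for size in (1, 2, 3):
    for S in combinations(range(3), size):
        Sc = tuple(i for i in range(3) if i not in S)
        lhs = " + ".join(f"R{i+1}" for i in S)
        rhs = H(S + Sc) - (H(Sc) if Sc else 0.0)      # H(X(S) | X(S^c))
        print(f"{lhs} >= {rhs:.4f}")
```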

Remark .. It is interesting to note that the optimal rate region for distributed lossless
source coding and the capacity region of the multiple access channel, both of which are for
many-to-one communication, have single-letter characterizations for an arbitrary num-
ber of senders. This is not the case for dual one-to-many communication settings such as
the broadcast channel (even with only two receivers), as we have seen in Chapter .

Theorem . can be similarly generalized to k sources (X1 , X2 , . . . , Xk ) and a single


helper Y .

Theorem .. Let (X1 , X2 , . . . , Xk , Y) be a (k + 1)-DMS. The optimal rate region for
the lossless source coding of (X1 , X2 , . . . , Xk ) with helper Y is the set of rate tuples
(R1 , R2 , . . . , Rk , Rk+1 ) such that

∑ j∈S R j ≥ H(X(S)|U , X(S c )) for all S ⊆ [1 : k],
Rk+1 ≥ I(Y ; U)

for some conditional pmf p(u|y) with |U | ≤ |Y| + 2^k − 1.

The optimal rate region for the lossless source coding with more than one helper is
not known in general even when there is only one source to be recovered.

SUMMARY

∙ k-Component discrete memoryless source (k-DMS)


∙ Distributed lossless source coding for a k-DMS:
∙ Slepian–Wolf optimal rate region
∙ Random binning
∙ Source coding via random linear codes
∙ Lossless source coding with a helper:
∙ Joint typicality encoding for lossless source coding
∙ Use of Mrs. Gerber’s lemma in the proof of the converse for the doubly symmetric
binary source (DSBS)

BIBLIOGRAPHIC NOTES

The Slepian–Wolf theorem was first proved in Slepian and Wolf (a). The random
binning proof is due to Cover (a), who also showed that the proof can be extended
to any pair of stationary ergodic sources X1 and X2 with joint entropy rates H(X1 , X2 ).
Since the converse can also be easily extended, the Slepian–Wolf theorem for this larger
class of correlated sources is the set of rate pairs (R1 , R2 ) such that

R1 ≥ H(X1 | X2 ) = H(X1 , X2 ) − H(X2 ),


R2 ≥ H(X2 | X1 ),
R1 + R2 ≥ H(X1 , X2 ).

The random linear binning technique for the DSBS in Remark . appeared in Wyner
().
Error-free distributed lossless source coding has been studied, for example, in Witsen-
hausen (b), Ahlswede (), and El Gamal and Orlitsky (). As discussed in the
bibliographic notes of Chapter , the rate of lossless source coding is the same as that of
error-free compression using variable-length codes. This result does not hold in general
for distributed source coding, that is, the Slepian–Wolf theorem does not hold in general
for error-free distributed compression (using variable-length codes). In fact, for many
-DMS (X1 , X2 ), the optimal error-free rate region is

R1 ≥ H(X1 ),
R2 ≥ H(X2 ).

For example, error-free distributed compression for a DSBS(0.01) requires that R1 = R2 =


1 bit per symbol; in other words, no compression is possible.
The problem of lossless source coding with a helper was first studied by Wyner (),

who established the optimal rate region for the DSBS in Example .. The optimal rate
region for a general DMS in Theorem . was established independently by Ahlswede
and Körner () and Wyner (b). The strong converse was proved via the blowing-
up lemma by Ahlswede, Gács, and Körner ().

PROBLEMS

.. Conditional lossless source coding. Consider the lossless source coding setup de-
picted in Figure .. Let (X, Y ) be a -DMS. The source sequence X n is to be
sent losslessly to a decoder with side information Y n available at both the encoder
and the decoder. Thus, a (2nR , n) code is defined by an encoder m(x n , y n ) and a
decoder x̂n (m, y n ), and the probability of error is defined as Pe(n) = P{ X̂ n ̸= X n }.

Xn M X̂ n
Encoder Decoder

Yn

Figure .. Source coding with side information.

(a) Find the optimal rate R∗ .


(b) Prove achievability using |Tє(n) (X|y n )| ≤ 2n(H(X|Y)+δ(є)) for y n ∈ Tє(n) (Y).
(c) Prove the converse using Fano’s inequality.
(d) Using part (c), establish the inequalities on the rates in (.) for distributed
lossless source coding.
.. Prove the converse for the Slepian–Wolf theorem by establishing the outer bound
given by (.) and (.).
.. Provide the details of the achievability proof for lossless source coding of a Bern(p)
source using random linear binning described in Remark ..
.. Show that the rate region in Theorem . is convex.
.. Prove Theorem . for a -DMS (X1 , X2 , Y ).
.. Lossless source coding with degraded source sets. Consider the distributed lossless
source coding setup depicted in Figure .. Let (X, Y ) be a 2-DMS. Sender 1 encodes (X n , Y n ) into an index M1 , while sender 2 encodes only Y n into an index
M2 . Find the optimal rate region (a) when the decoder wishes to recover both X
and Y losslessly, and (b) when the decoder wishes to recover only Y.
.. Converse for lossless source coding of a DSBS with a helper via symmetrization.
Consider distributed lossless coding of the DSBS with a helper in Example ..

Figure .. Distributed lossless source coding with degraded source sets.

Here we evaluate the optimal rate region in Theorem . via a symmetrization
argument similar to that in Section ... Given p(u|y) (and corresponding p(u)
and p(y|u)), define p(ũ |y) by
pŨ (u) = pŨ (−u) = (1/2) pU (u), u = 1, 2, 3,
pY|Ũ (y|u) = pY|Ũ (1 − y| − u) = pY|U (y|u), (u, y) ∈ {1, 2, 3} × {0, 1}.
(a) Verify that the above p(ũ ) and p(y|ũ ) are well-defined by checking that
∑ũ p(ũ )p(y|ũ ) = 1/2 for y ∈ {0, 1}.
(b) Show that H(Y|U ) = H(Y|Ũ) and H(X|U ) = H(X|Ũ).
(c) Show that for any λ ∈ [0, 1], the weighted sum λH(X|Ũ ) + (1 − λ)I(Ũ ; Y ) is
minimized by some BSC p(ũ |y).
(d) Show that the rate region in (.) is convex.
(e) Combining parts (b), (c), and (d), conclude that the optimal rate region is char-
acterized by (.).
.. Lossless source coding with two decoders and side information. Consider the lossless
source coding setup for a -DMS (X, Y1 , Y2 ) depicted in Figure .. Source X is
encoded into an index M, which is broadcast to two decoders. Decoder j = 1, 2
wishes to recover X losslessly from the index M and side information Y j . The
probability of error is defined as P{ X̂ 1n ̸= X n or X̂ 2n ̸= X n }. Find the optimal lossless
source coding rate.
.. Correlated sources with side information. Consider the distributed lossless source
coding setup for a -DMS (X1 , X2 , Y) depicted in Figure .. Sources X1 and X2
are separately encoded into M1 and M2 , respectively, and the decoder wishes to
recover both sources from M1 , M2 , and side information Y is available. Find the
optimal rate region R ∗ .
.. Helper to both the encoder and the decoder. Consider the variant on the one-helper
problem for a -DMS (X, Y ) in Section ., as depicted in Figure .. Suppose
that coded side information of Y is available at both the encoder and the decoder,
and the decoder wishes to recover X losslessly. Find the optimal rate region R ∗ .
.. Cascade lossless source coding. Consider the distributed lossless source coding
setup depicted in Figure .. Let (X1 , X2 ) be a -DMS. Source X1 is encoded

into an index M1 . Then source X2 and the index M1 are encoded into an index
M2 . The decoder wishes to recover both sources losslessly only from M2 . Find the
optimal rate region R ∗ .

Figure .. Lossless source coding with multiple side information.

Figure .. Distributed lossless source coding with side information.

Figure .. Lossless source coding with a helper.

Figure .. Cascade lossless source coding.


CHAPTER 11

Lossy Compression with Side Information

We turn our attention to distributed lossy compression. In this chapter, we consider the
special case of lossy source coding with side information depicted in Figure .. Let (X, Y )
be a -DMS and d(x, x̂) be a distortion measure. The sender wishes to communicate the
source X over a noiseless link with distortion D to the receiver with side information Y
available at the encoder, the decoder, or both. The most interesting case is when the side
information is available only at the decoder. For example, the source may be an image
to be sent at high quality to a receiver who already has a noisy or a lower quality version
of the image. As another example, the source may be a frame in a video sequence and
the side information is the previous frame available at the receiver. We wish to find the
rate–distortion function with side information.
We first establish the rate–distortion function when the side information is available
at the decoder and the reconstruction sequence can depend only causally on the side in-
formation. We show that in the lossless limit, causal side information may not reduce
the optimal compression rate. We then establish the rate–distortion function when the
side information is available noncausally at the decoder. The achievability proof uses
Wyner–Ziv coding, which involves joint typicality encoding, binning, and joint typical-
ity decoding. We observe certain dualities between these results and the capacities for
channels with causal and noncausal state information available at the encoder studied in
Chapter . Finally we discuss the lossy source coding problem when the encoder does
not know whether there is side information at the decoder or not. This problem can be
viewed as a source coding dual to the broadcast channel approach to coding for fading
channels discussed in Chapter .

Xn M ( X̂ n , D)
Encoder Decoder

Yn

Figure .. Lossy compression system with side information.



11.1 SIMPLE SPECIAL CASES

By the lossy source coding theorem in Section ., the rate–distortion function with no
side information at either the encoder or the decoder is

R(D) = min_{p(x̂|x): E(d(X, X̂))≤D} I(X; X̂).

In the lossless case, the optimal compression rate is R∗ = H(X). This corresponds to R(0)
under the Hamming distortion measure as discussed in Section ...
It can be easily shown that when side information is available only at the encoder, the
rate–distortion function remains the same, i.e.,

RSI-E (D) = min_{p(x̂|x): E(d(X, X̂))≤D} I(X; X̂) = R(D).    (.)


Thus for the lossless case, RSI-E = H(X) = R∗ .
When side information is available at both the encoder and the decoder (causally or
noncausally), the rate–distortion function is a conditional version of that with no side
information, i.e.,
RSI-ED (D) = min_{p(x̂|x, y): E(d(X, X̂))≤D} I(X; X̂ |Y).    (.)

The proof of this result is a simple extension of the proof of the lossy source coding theo-

rem. For the lossless case, the optimal rate is RSI-ED = H(X|Y), which follows also by the
conditional version of the lossless source coding theorem; see Problem ..

11.2 CAUSAL SIDE INFORMATION AVAILABLE AT THE DECODER

We consider yet another special case of side information availability. Suppose that the side
information sequence is available only at the decoder. Here the rate–distortion function
depends on whether the side information is available causally or noncausally.
We first consider the case where the reconstruction of each symbol X̂ i (M, Y i ) can de-
pend causally on the side information sequence Y i as depicted in Figure .. This setting
is motivated, for example, by denoising with side information. Suppose that X n is the tra-
jectory of a target including the terrain in which it is moving. The tracker (decoder) has
a description M of the terrain available prior to tracking. At time i, the tracker obtains a
noisy observation Yi of the target’s location and wishes to output an estimate X̂ i (M, Y i )
of the location based on the terrain description M and the noisy observations Y i . The
rate–distortion function characterizes the performance limit of the optimal sequential
target tracker (filter). The causal side information setup is also motivated by an attempt
to better understand the duality between channel coding and source coding as discussed
in Section ...

Xn M X̂ i (M, Y i )
Encoder Decoder

Yi

Figure .. Lossy source coding with causal side information at the decoder.

A (2nR , n) lossy source code with side information available causally at the decoder
consists of
. an encoder that assigns an index m(x n ) ∈ [1 : 2nR ) to each x n ∈ X n and
. a decoder that assigns an estimate x̂i (m, y i ) to each received index m and side infor-
mation sequence y i for i ∈ [1 : n].
The rate–distortion function with causal side information available at the decoder
RCSI-D (D) is the infimum of rates R such that there exists a sequence of (2nR , n) codes with
lim supn→∞ E(d(X n , X̂ n )) ≤ D.
The rate–distortion function has a simple characterization.

Theorem .. Let (X, Y) be a -DMS and d(x, x) ̂ be a distortion measure. The rate–
distortion function for X with side information Y causally available at the decoder is

RCSI-D (D) = min I(X; U ) for D ≥ Dmin ,

where the minimum is over all conditional pmfs p(u|x) with |U | ≤ |X | + 1 and functions x̂(u, y) such that E[d(X, X̂)] ≤ D, and Dmin = min_{x̂(y)} E[d(X, x̂(Y))].

Note that RCSI-D (D) is nonincreasing, convex, and thus continuous in D. In the fol-
lowing example, we compare the rate–distortion function in the theorem with the rate–
distortion functions with no side information and with side information available at both
the encoder and the decoder.
Example .. Let (X, Y ) be a DSBS(p), p ∈ [0, 1/2], and d be a Hamming distortion
measure. When there is no side information (see Example .), the rate–distortion func-
tion is
R(D) = 1 − H(D)   for 0 ≤ D ≤ 1/2,
     = 0          for D > 1/2.

By comparison, when side information is available at both the encoder and the decoder,
the rate–distortion function can be found by evaluating (.), which yields

RSI-ED (D) = H(p) − H(D)   for 0 ≤ D ≤ p,        (.)
           = 0             for D > p.

Now suppose that the side information is causally available at the decoder. Then it can be
shown by evaluating the rate–distortion function in Theorem . that

RCSI-D (D) = 1 − H(D)          for 0 ≤ D ≤ Dc ,
           = (p − D) H′(Dc )   for Dc < D ≤ p,
           = 0                 for D > p,

where H′ is the derivative of the binary entropy function, and Dc is the solution to the
equation (1 − H(Dc ))/(p − Dc ) = H′(Dc ). Thus RCSI-D (D) coincides with R(D) for 0 ≤
D ≤ D c , and is otherwise given by the tangent to the curve of R(D) that passes through
the point (p, 0) for Dc ≤ D ≤ p as shown in Figure .. In other words, the optimum
performance is achieved by time sharing between rate–distortion coding with no side
information and zero-rate decoding that uses only the side information. For sufficiently
small distortion, RCSI-D (D) is achieved simply by ignoring the side information.

Figure .. Comparison of R(D), RCSI-D (D), and RSI-ED (D) for a DSBS(p).

11.2.1 Proof of Achievability


To prove achievability for Theorem ., we use joint typicality encoding to describe X by
U as in the achievability proof of the lossy source coding theorem. The reconstruction X̂
is a function of U and the side information Y . The following provides the details.
Codebook generation. Fix the conditional pmf p(u|x) and function x̂(u, y) that attain
RCSI-D (D/(1 + є)), where D is the desired distortion. Randomly and independently gen-
erate 2nR sequences un (m), m ∈ [1 : 2nR ], each according to ∏ni=1 pU (ui ).
Encoding. Given a source sequence x n , find an index m such that (un (m), x n ) ∈ Tє′(n). If there is more than one such index, choose the smallest one among them. If there is no such index, set m = 1. The encoder sends the index m to the decoder.

Decoding. The decoder finds the reconstruction sequence x̂n (m, y n ) by setting x̂i =
x̂(ui (m), yi ) for i ∈ [1 : n].
Analysis of expected distortion. Denote the chosen index by M and let є > є′. Define
the “error” event

E = {(U n (M), X n , Y n ) ∉ Tє(n)}.

Then by the union of events bound,

P(E) ≤ P(E0 ) + P(E0c ∩ E),

where

E0 = {(U n (m), X n ) ∉ Tє′(n) for all m ∈ [1 : 2nR )} = {(U n (M), X n ) ∉ Tє′(n)}.

We now bound each term. By the covering lemma in Section . (with U ←  and X̂ ←
U ), P(E0 ) tends to zero as n → ∞ if R > I(X; U ) + δ(є 󳰀 ). For the second term, since
є > є 󳰀 , (U n (M), X n ) ∈ Tє(n) n n n
󳰀 , and Y | {U (M) = u , X
n
= x n } ∼ ∏ni=1 pY|U ,X (yi |ui , xi ) =
n
∏i=1 pY|X (yi |xi ), by the conditional typicality lemma in Section ., P(E0c ∩ E) tends to
zero as n → ∞. Thus, by the typical average lemma in Section ., the asymptotic distor-
tion averaged over codebooks is upper bounded as

lim sup E(d(X n ; X̂ n )) ≤ lim sup 󶀡dmax P(E) + (1 + є) E(d(X, X))


̂ P(E c )󶀱 ≤ D,
n→∞ n→∞

if R > I(X; U ) + δ(є 󳰀 ) = RCSI-D (D/(1 + є)) + δ(є 󳰀 ). Finally, by taking є → 0 and using the
continuity of RCSI-D (D) in D, we have shown that every rate R > RCSI-D (D) is achievable.
This completes the proof.

11.2.2 Proof of the Converse


Denote the index sent by the encoder as M. In general, X̂ i is a function of (M, Y i ), so we
set Ui = (M, Y i−1 ). Note that Ui → Xi → Yi form a Markov chain and X̂ i is a function of
(Ui , Yi ) as desired. Consider

nR ≥ H(M)
   = I(X n ; M)
   = ∑ni=1 I(Xi ; M | X i−1 )
   = ∑ni=1 I(Xi ; M, X i−1 )
   = ∑ni=1 I(Xi ; M, X i−1 , Y i−1 )      (a)
   ≥ ∑ni=1 I(Xi ; Ui )
   ≥ ∑ni=1 RCSI-D (E(d(Xi , X̂ i )))
   ≥ n RCSI-D ((1/n) ∑ni=1 E(d(Xi , X̂ i ))),      (b)

where (a) follows since Xi → (M, X i−1 ) → Y i−1 form a Markov chain and (b) follows by
the convexity of RCSI-D (D). Since lim supn→∞ (1/n) ∑ni=1 E(d(Xi , X̂ i )) ≤ D (by assump-
tion) and RCSI-D (D) is nonincreasing, R ≥ RCSI-D (D). The cardinality bound on U can
be proved using the convex cover method in Appendix C. This completes the proof of
Theorem ..

11.2.3 Lossless Source Coding with Causal Side Information


Consider the source coding with causal side information setting in which the decoder
wishes to recover X losslessly, that is, with probability of error P{X n ̸= X̂ n } that tends

to zero as n → ∞. Denote the optimal compression rate by RCSI-D . Clearly H(X|Y) ≤

RCSI-D ≤ H(X). First consider some special cases:

∙ If X = Y , then RCSI-D = H(X|Y ) (= 0).

∙ If X and Y are independent, then RCSI-D = H(X|Y ) (= H(X)).

∙ If X = (Y , Z), where Y and Z are independent, then RCSI-D = H(X|Y) (= H(Z)).
The following theorem establishes the optimal lossless rate in general.

Theorem .. Let (X, Y) be a -DMS. The optimal lossless compression rate for the
source X with causal side information Y available at the decoder is

RCSI-D = min I(X; U ),

where the minimum is over all conditional pmfs p(u|x) with |U | ≤ |X | + 1 such that
H(X|U , Y) = 0.


To prove the theorem, we show that RCSI-D = RCSI-D (0) under Hamming distortion
measure as in the random coding proof of the lossless source coding theorem (without
side information) in Section ... The details are as follows.
Proof of the converse. Consider the lossy source coding problem with X = X̂ and Ham-
ming distortion measure d, where the minimum in Theorem . is evaluated at D = 0.
Note that requiring that the block error probability P{X n ̸= X̂ n } tend to zero as n → ∞
is stronger than requiring that the average bit-error probability (1/n) ∑ni=1 P{Xi ̸= X̂ i } =
E(d(X n , X̂ n )) tend to zero. Hence, by the converse proof for the lossy coding case, it

follows that RCSI-D ≥ RCSI-D (0) = min I(X; U), where the minimum is over all conditional pmfs p(u|x) such that E(d(X, x̂(U, Y))) = 0, or equivalently, x̂(u, y) = x (hence p(x|u, y) = 1) for all (x, y, u) with p(x, y, u) > 0, or equivalently, H(X|U, Y) = 0.

Proof of achievability. Consider the proof of achievability for the lossy case. If (x n , y n ,
un (m)) ∈ Tє(n) for some m ∈ [1 : 2nR ), then p X,Y ,U (xi , yi , ui (m)) > 0 for all i ∈ [1 : n] and
hence x̂(ui (m), yi ) = xi for all i ∈ [1 : n]. Thus

P{X̂ n = X n | (U n (m), X n , Y n ) ∈ Tє(n) for some m ∈ [1 : 2nR )} = 1.

But, since P{(U n (m), X n , Y n ) ∉ Tє(n) for all m ∈ [1 : 2nR )} tends to zero as n → ∞, so does
P{ X̂ n ̸= X n } as n → ∞. This completes the proof of achievability.
Compared to the Slepian–Wolf theorem, Theorem . shows that causality of side in-
formation can severely limit how encoders can leverage correlation between the sources.
The following proposition shows that under a fairly general condition causal side infor-
mation does not help at all.

Proposition .. Let (X, Y) be a -DMS such that p(x, y) > 0 for all (x, y) ∈ X × Y.

Then, RCSI-D = H(X).

To prove the proposition, we show by contradiction that if p(x, y) > 0 for all (x, y),
then the condition H(X|U , Y) = 0 in Theorem . implies that H(X|U) = 0. Suppose
H(X|U ) > 0. Then p(u) > 0 and 0 < p(x|u) < 1 for some (x, u), which also implies that
0 < p(x 󳰀 |u) < 1 for some x 󳰀 ̸= x. Since d(x, x̂(u, y)) = 0 and d(x 󳰀 , x̂(u, y)) = 0 cannot
both hold at the same time, and pU ,X,Y (u, x, y) > 0 and pU ,X,Y (u, x 󳰀 , y) > 0 (because
p(x, y) > 0), we must have E[d(X, x̂(U , Y))] > 0. But this contradicts the assumption

of E[d(X, x̂(U , Y ))] = 0. Thus, H(X|U ) = 0 and RCSI-D = H(X). This completes the proof
of the proposition.

11.3 NONCAUSAL SIDE INFORMATION AVAILABLE AT THE DECODER

We now consider the source coding setting with side information available noncausally
at the decoder as depicted in Figure .. In other words, a (2nR , n) code is defined by an
encoder m(x n ) and a decoder x̂n (m, y n ).
The rate–distortion function is also known for this case.

Theorem . (Wyner–Ziv Theorem). Let (X, Y) be a -DMS and d(x, x̂) be a distor-
tion measure. The rate–distortion function for X with side information Y available
noncausally at the decoder is

RSI-D (D) = min 󶀡I(X; U ) − I(Y ; U)󶀱 = min I(X; U |Y ) for D ≥ Dmin ,

where the minimum is over all conditional pmfs p(u|x) with |U | ≤ |X | + 1 and func-
̂ ≤ D, and Dmin = minx(y)
tions x̂(u, y) such that E[d(X, X)] ̂ E[d(X, x̂(Y))].

Note that RSI-D (D) is nonincreasing, convex, and thus continuous in D.



Figure .. Lossy source coding with noncausal side information at the decoder.

Clearly, RSI-ED (D) ≤ RSI-D (D) ≤ RCSI-D (D) ≤ R(D). The difference between RSI-D (D)
and RCSI-D (D) is in the subtracted term I(Y ; U ). Also recall that with side information at
both the encoder and the decoder,
RSI-ED (D) = min_{p(u|x, y), x̂(u, y): E(d(X, X̂)) ≤ D} I(X; U |Y).

Hence the difference between RSI-D (D) and RSI-ED (D) is in taking the minimum over
p(u|x) versus p(u|x, y).
We know from the Slepian–Wolf theorem that the optimal lossless compression rate

with side information Y only at the decoder, RSI-D = H(X|Y), is the same as the rate when
the side information Y is available at both the encoder and the decoder. Does this equal-
ity also hold in general for the lossy case? The following example shows that in some
cases the rate–distortion function with side information available only at the decoder in
Theorem . is the same as when it is available at both the encoder and the decoder.
Example . (Quadratic Gaussian source coding with noncausal side information).
We show that RSI-D (D) = RSI-ED (D) for a -WGN source (X, Y) and squared error dis-
tortion. Assume without loss of generality that X ∼ N(0, P) and the side information
Y = X + Z, where Z ∼ N(0, N) is independent of X. It is easy to show that the rate–
distortion function with noncausal side information Y available at both the encoder and
the decoder is
P󳰀
RSI-ED (D) = R 󶀥 󶀵 , (.)
D
where P 󳰀 = Var(X|Y) = PN/(P + N) and R(x) is the quadratic Gaussian rate function.
Now consider the case in which the side information is available only at the decoder.
Clearly, RSI-D (D) = 0 for D ≥ P 󳰀 . For 0 < D ≤ P 󳰀 , we take the auxiliary random variable to
be U = X + V , where V ∼ N(0, Q) is independent of X and Y , that is, we use a Gaussian
test channel from X to U . Setting Q = P 󳰀 D/(P 󳰀 − D), we can easily see that
P󳰀
I(X; U |Y) = R 󶀥 󶀵.
D
Now let the reconstruction X̂ be the MMSE estimate of X given U and Y. Then, E((X −
̂ 2 ) = Var(X|U , Y), which can be easily shown to be equal to D. Thus the rate–distortion
X)
function for this case is again
P󳰀
RSI-D (D) = R 󶀥 󶀵 = RSI-ED (D). (.)
D

This surprising result does not hold in general, however.


Example .. Consider the case of a DSBS(p), p ∈ [0, 1/2], and Hamming distortion
measure. It can be shown that the rate–distortion function for this case is

RSI-D (D) = g(D)              for 0 ≤ D ≤ Dc′,
          = (p − D) g′(Dc′)   for Dc′ < D ≤ p,
          = 0                 for D > p,

where g(D) = H(p ∗ D) − H(D), g′ is the derivative of g, and Dc′ is the solution to the
equation g(Dc′)/(p − Dc′) = g′(Dc′). It can be easily shown that RSI-D (D) > RSI-ED (D) =
H(p) − H(D) for all D ∈ (0, p). Thus there is a nonzero cost for the lack of side informa-
tion at the encoder. Figure . compares the rate–distortion functions with and without
side information.

Figure .. Comparison of RCSI-D (D), RSI-D (D), and RSI-ED (D) for a DSBS(p).

11.3.1 Proof of Achievability


The Wyner–Ziv coding scheme uses the compress–bin idea illustrated in Figure .. As in
the causal case, we use joint typicality encoding to describe X by U . Since U is correlated
with Y, however, binning can be used to reduce its description rate. The bin index of U is
sent to the receiver. The decoder uses joint typicality decoding with Y to recover U and
then reconstructs X̂ from U and Y. We now provide the details.
Codebook generation. Fix the conditional pmf p(u|x) and function x̂(u, y) that attain the rate–distortion function RSI-D (D/(1 + є)), where D is the desired distortion. Randomly and independently generate 2nR̃ sequences un (l), l ∈ [1 : 2nR̃ ], each according to ∏ni=1 pU (ui ). Partition the set of indices l ∈ [1 : 2nR̃ ] into equal-size subsets referred to as

Figure .. Wyner–Ziv coding scheme. Each bin B(m), m ∈ [1 : 2nR ], consists of 2n(R̃−R) indices.
bins B(m) = [(m − 1)2n(R̃−R) + 1 : m 2n(R̃−R) ], m ∈ [1 : 2nR ]. The codebook is revealed to the encoder and the decoder.
Encoding. Given x n , the encoder finds an index l such that (x n , un (l)) ∈ Tє′(n). If there is more than one such index, it selects one of them uniformly at random. If there is no such index, it selects an index from [1 : 2nR̃ ] uniformly at random. The encoder sends the bin index m such that l ∈ B(m). Note that randomized encoding here is used only to simplify the analysis and should be viewed as a part of random codebook generation; see Problem ..
Decoding. Let є > є′. Upon receiving m, the decoder finds the unique index l̂ ∈ B(m) such that (un (l̂), y n ) ∈ Tє(n); otherwise it sets l̂ = 1. It then computes the reconstruction sequence as x̂i = x̂(ui (l̂), yi ) for i ∈ [1 : n].
Analysis of expected distortion. Let (L, M) denote the chosen indices at the encoder and
L̂ be the index estimate at the decoder. Define the “error” event

E = {(U n (L̂), X n , Y n ) ∉ Tє(n)}

and consider the events

E1 = {(U n (l), X n ) ∉ Tє′(n) for all l ∈ [1 : 2nR̃ )},
E2 = {(U n (L), X n , Y n ) ∉ Tє(n)},
E3 = {(U n (l̃), Y n ) ∈ Tє(n) for some l̃ ∈ B(M), l̃ ≠ L}.
Since the “error” event occurs only if (U n (L), X n , Y n ) ∉ Tє(n) or L̂ ≠ L, by the union of events bound,

P(E) ≤ P(E1 ) + P(E1c ∩ E2 ) + P(E3 ).

We now bound each term. By the covering lemma, P(E1 ) tends to zero as n → ∞ if R̃ > I(X; U) + δ(є′). Since є > є′, E1c = {(U n (L), X n ) ∈ Tє′(n)}, and Y n | {U n (L) = un , X n = x n } ∼ ∏ni=1 pY|U,X (yi |ui , xi ) = ∏ni=1 pY|X (yi |xi ), by the conditional typicality lemma, P(E1c ∩ E2 ) tends to zero as n → ∞.
To bound P(E3 ), we first establish the following bound.

Lemma .. P(E3 ) ≤ P{(U n (l̃), Y n ) ∈ Tє(n) for some l̃ ∈ B(1)}.

The proof of this lemma is given in Appendix 11A.

For each l̃ ∈ B(1), the sequence U n (l̃) ∼ ∏ni=1 pU (ui ) is independent of Y n . Hence, by the packing lemma, P{(U n (l̃), Y n ) ∈ Tє(n) for some l̃ ∈ B(1)} tends to zero as n → ∞ if R̃ − R < I(Y; U) − δ(є). Therefore, by Lemma ., P(E3 ) tends to zero as n → ∞ if R̃ − R < I(Y; U) − δ(є). Combining the bounds, we have shown that P(E) tends to zero as n → ∞ if R > I(X; U) − I(Y; U) + δ(є) + δ(є′).
When there is no “error,” (U n (L), X n , Y n ) ∈ Tє(n) . Thus, by the law of total expecta-
tion and the typical average lemma, the asymptotic distortion averaged over the random
codebook and encoding is upper bounded as

lim supn→∞ E(d(X n , X̂ n )) ≤ lim supn→∞ (dmax P(E) + (1 + є) E(d(X, X̂)) P(E c )) ≤ D,

if R > I(X; U) − I(Y; U) + δ(є) + δ(є′) = RSI-D (D/(1 + є)) + δ(є) + δ(є′). Finally, from the continuity of RSI-D (D) in D, taking є → 0 shows that any rate–distortion pair (R, D) with R > RSI-D (D) is achievable, which completes the proof of achievability.

Remark 11.1 (Deterministic versus random binning). Note that the above proof uses
deterministic instead of random binning. This is because we are given a set of randomly
generated sequences instead of a set of deterministic sequences; hence, there is no need
to also randomize the binning. Following similar arguments to the proof of the lossless
source coding with causal side information, we can readily show that the corner points
of the Slepian–Wolf region are special cases of the Wyner–Ziv theorem. As such, random
binning is not required for either.
Remark 11.2 (Binning versus multicoding). There is an interesting duality between
multicoding and binning. On the one hand, the multicoding technique used in the achiev-
ability proofs of the Gelfand–Pinsker theorem and Marton’s inner bound is a channel
coding technique—we are given a set of messages and we generate a set of codewords
(subcodebook) for each message, which increases the rate. To send a message, we trans-
mit a codeword in its subcodebook that satisfies a desired property. On the other hand,

binning is a source coding technique—we are given a set of indices/sequences and we com-
press them into a smaller number of bin indices, which reduces the rate. To send an in-
dex/sequence, we send its bin index. Thus, although in both techniques we partition a set
into equal-size subsets, multicoding and binning are in some general sense dual to each
other. We discuss other dualities between channel and source coding shortly.

11.3.2 Proof of the Converse


Let M denote the encoded index of X n . The key to the proof is to identify Ui . In general
X̂ i is a function of (M, Y n ). We would like X̂ i to be a function of (Ui , Yi ), so we identify
the auxiliary random variable Ui = (M, Y i−1 , Yi+1 n
). Note that this is a valid choice since
Ui → Xi → Yi form a Markov chain. Consider

nR ≥ H(M)
≥ H(M |Y n )
= I(X n ; M |Y n )
n
= 󵠈 I(Xi ; M |Y n , X i−1 )
i=1
n
(a)
= 󵠈 I(Xi ; M, Y i−1 , Yi+1
n
, X i−1 |Yi )
i=1
n
≥ 󵠈 I(Xi ; Ui |Yi )
i=1
n
≥ 󵠈 RSI-D (E[d(Xi , X̂ i )])
i=1
(b) n
1
≥ nRSI-D 󶀤 󵠈 E(d(Xi , X̂ i ))󶀴,
n i=1

where (a) follows since (Xi , Yi ) is independent of (Y i−1 , Yi+1


n
, X i−1 ) and (b) follows by
the convexity of RSI-D (D). Since lim supn→∞ (1/n) ∑i=1 E(d(Xi , X̂ i )) ≤ D (by assumption)
n

and RSI-D (D) is nonincreasing, R ≥ RSI-D (D). The cardinality bound on U can be proved
using the convex cover method in Appendix C. This completes the proof of the Wyner–
Ziv theorem.

11.3.3 Source–Channel Coding Dualities


First recall the two fundamental limits on point-to-point communication—the rate–
̂ X), and the capacity for a DMC
distortion function for a DMS X, R(D) = min p(x̂|x) I( X;
p(y|x), C = max p(x) I(X; Y). By comparing these two expressions, we observe interesting
dualities between the given source X ∼ p(x) and channel p(y|x), the test channel p(x̂|x)
and channel input X ∼ p(x), and the minimum and maximum. Thus, roughly speaking,
these two solutions (and underlying problems) are dual to each other.

Now, let us compare the rate–distortion functions with side information at the decoder

RCSI-D (D) = min I(X; U),


RSI-D (D) = min (I(X; U ) − I(Y ; U))

with the capacity expressions for the DMC with DM state available at the encoder pre-
sented in Section .
CCSI-E = max I(U ; Y ),
CSI-E = max (I(U ; Y ) − I(U ; S)).

There are obvious dualities between the maximum and minimum, and multicoding and
binning. But perhaps the most intriguing duality is that RSI-D (D) is the difference between
the “covering rate” I(U ; X) and the “packing rate” I(U ; Y ), while CSI-E is the difference
between the “packing rate” I(U ; Y) and the “covering rate” I(U ; S).
These types of dualities are abundant in network information theory. For example, we
can observe a similar duality between distributed lossless source coding and the deter-
ministic broadcast channel. Although not as mathematically precise as Lagrange duality
in convex analysis or the BC–MAC duality in Chapter , this covering–packing duality (or
the source coding–channel coding duality in general) can lead to a better understanding
of the underlying coding techniques.

11.4 SOURCE CODING WHEN SIDE INFORMATION MAY BE ABSENT

Consider the lossy source coding setup depicted in Figure ., where the side information Y may not be available at the decoder; thus the encoder needs to send a robust description of the source X under this uncertainty. Let (X, Y) be a 2-DMS and d j (x, x̂ j ), j = 1, 2, be two distortion measures. The encoder generates a description of X so that decoder 1, who does not have any side information, can reconstruct X with distortion D1 and decoder 2, who has side information Y, can reconstruct X with distortion D2 .

Figure .. Source coding when side information may be absent.

We investigate this problem for the following side information availability scenarios.
The definitions of a (2nR , n) code, achievability for distortion pair (D1 , D2 ), and the rate–
distortion function for each scenario can be defined as before.

Causal side information at decoder . If side information is available causally only at


decoder , then
RCSI-D2 (D1 , D2 ) = min I(X; U),

where the minimum is over all conditional pmfs p(u|x) with |U | ≤ |X | + 2 and functions
x̂1 (u) and x̂2 (u, y) such that E(d j (X, X̂ j )) ≤ D j , j = 1, 2.
Noncausal side information at decoder . If the side information is available noncausally
at decoder , then

RSI-D2 (D1 , D2 ) = min 󶀡I(X; X̂ 1 ) + I(X; U | X̂ 1 , Y)󶀱,

where the minimum is over all conditional pmfs p(u, x̂1 |x) with |U | ≤ |X |⋅|X̂1 | + 2 and
functions x̂2 (u, x̂1 , y) such that E(d j (X, X̂ j )) ≤ D j , j = 1, 2. Achievability follows by using
lossy source coding for the source X with reconstruction X̂ 1 at rate I(X; X̂ 1 ) and then
using Wyner–Ziv coding of the pair (X, X̂ 1 ) with side information ( X̂ 1 , Y) available at
decoder . To prove the converse, let M be the encoded index of X n , identify the auxiliary
random variable Ui = (M, Y i−1 , Yi+1 n
, X i−1 ), and consider

nR ≥ H(M)
= I(M; X n , Y n )
= I(M; X n |Y n ) + I(M; Y n )
n
= 󵠈󶀡I(Xi ; M |Y n , X i−1 ) + I(Yi ; M |Y i−1 )󶀱
i=1
n
= 󵠈󶀡I(Xi ; M, X i−1 , Y i−1 , Yi+1
n
|Yi ) + I(Yi ; M, Y i−1 )󶀱
i=1
n
≥ 󵠈󶀡I(Xi ; Ui , X̂ 1i |Yi ) + I(Yi ; X̂ 1i )󶀱
i=1
n
≥ 󵠈󶀡I(Xi ; Ui | X̂ 1i , Yi ) + I(Xi ; X̂ 1i )󶀱.
i=1

The rest of the proof follows similar steps to the proof of the Wyner–Ziv theorem.
Side information at the encoder and decoder . If side information is available (causally
or noncausally) at both the encoder and decoder , then

RSI-ED2 (D1 , D2 ) = min 󶀡I(X, Y ; X̂ 1 ) + I(X; X̂ 2 | X̂ 1 , Y )󶀱,

where the minimum is over all conditional pmfs p(x̂1 , x̂2 |x, y) such that E(d j (X, X̂ j )) ≤
D j , j = 1, 2. Achievability follows by using lossy source coding for the source pair (X, Y)
with reconstruction X̂ 1 at rate I(X, Y ; X̂ 1 ) and then using conditional lossy source coding
for the source X and reconstruction X̂ 2 with side information ( X̂ 1 , Y) available at both
the encoder and decoder  (see Section .). To prove the converse, first observe that
288 Lossy Compression with Side Information

I(X, Y ; X̂ 1 ) + I(X; X̂ 2 | X̂ 1 , Y) = I( X̂ 1 ; Y ) + I(X; X̂ 1 , X̂ 2 |Y). Using the same first steps as


in the proof of the case of side information only at decoder , we have
n
nR ≥ 󵠈󶀡I(Xi ; M, X i−1 , Y i−1 , Yi+1
n
|Yi ) + I(Yi ; M, Y i−1 )󶀱
i=1
n
≥ 󵠈󶀡I(Xi ; X̂ 1i , X̂ 2i |Yi ) + I(Yi ; X̂ 1i )󶀱.
i=1

The rest of the proof follows as before.

SUMMARY

∙ Lossy source coding with causal side information at the decoder:


∙ Lossless source coding with causal side information does not reduce lossless en-
coding rate when p(x, y) > 0 for all (x, y)
∙ Lossy source coding with noncausal side information at the decoder:
∙ Wyner–Ziv coding (compress–bin)
∙ Deterministic binning
∙ Use of the packing lemma in source coding
∙ Corner points of the Slepian–Wolf region are special cases of the Wyner–Ziv rate–
distortion function
∙ Dualities between source coding with side information at the decoder and channel
coding with state information at the encoder

BIBLIOGRAPHIC NOTES

Weissman and El Gamal () established Theorem . and its lossless counterpart in
Theorem .. Theorem . is due to Wyner and Ziv ().
The duality between source and channel coding was first observed by Shannon ().
This duality has been made more precise in, for example, Gupta and Verdú (). Duality
between source coding with side information and channel coding with state information
was further explored by Cover and Chiang () and Pradhan, Chou, and Ramchandran
().
The problem of lossy source coding when side information may be absent was studied
by Heegard and Berger () and Kaspi (). Heegard and Berger () also consid-
ered the case when different side information is available at several decoders. The rate–
distortion function with side information available at both the encoder and decoder  was
established by Kaspi ().

PROBLEMS

.. Let (X, Y) be a -DMS. Consider lossy source coding for X (a) when side infor-
mation Y is available only at the encoder and (b) when Y is available at both the
encoder and the decoder. Provide the details of the proof of the rate–distortion
functions RSI-E (D) in (.) and RSI-ED (D) in (.).
.. Consider lossy source coding with side information for a DSBS(p) in Example ..
Derive the rate–distortion function RSI-ED (D) in (.).
.. Consider quadratic Gaussian source coding with side information in Example ..
(a) Provide the details of the proof of the rate–distortion function RSI-ED in (.).
(b) Provide the details of the proof of the rate–distortion function RSI-D in (.).
.. Randomized source code. Suppose that in the definition of the (2nR , n) lossy source
code with side information available noncausally at the decoder, we allow the en-
coder and the decoder to use random mappings. Specifically, let W be an arbi-
trary random variable independent of the source X and the side information Y .
The encoder generates an index m(x n , W) and the decoder generates an estimate
x̂n (m, y n , W). Show that this randomization does not decrease the rate–distortion
function.
.. Quadratic Gaussian source coding with causal side information. Consider the causal
version of quadratic Gaussian source coding with side information in Example ..
Although the rate–distortion function RCSI-D (D) is characterized in Theorem .,
the conditional distribution and the reconstruction function that attain the mini-
mum are not known. Suppose we use the Gaussian test channel U = X + Z, where
Z ∼ N(0, N) independent of (X, Y), and the MMSE estimate X̂ = E(X|U , Y). Find
the corresponding upper bound on the rate–distortion function.
Remark: This problem was studied in Weissman and El Gamal (), who also
showed that Gaussian test channels (even with memory) are suboptimal.
.. Side information with occasional erasures. Let X be a Bern(1/2) source, Y be the
output of a BEC(p) with input X, and d be a Hamming distortion measure. Find
a simple expression for the rate–distortion function RSI-D (D) for X with side in-
formation Y.
.. Side information dependent distortion measure. Consider lossy source coding with
noncausal side information in Section .. Suppose that the distortion measure
d(x, x̂, y) depends on the side information Y as well. Show that the rate–distortion
function is characterized by the Wyner–Ziv theorem, except that the minimum is over all p(u|x) and x̂(u, y) such that E(d(X, X̂, Y)) ≤ D.
Remark: This problem was studied by Linder, Zamir, and Zeger ().
.. Lossy source coding from a noisy observation with side information. Let (X, Y , Z) ∼
p(x, y, z) be a -DMS and d(x, x̂) be a distortion measure. Consider the lossy

source coding problem, where the encoder has access to a noisy version Y of the
source X instead of X itself and the decoder has side information Z. Thus, unlike
the standard lossy source coding setup with side information, the encoder maps
each y n sequence to an index m ∈ [1 : 2nR ). Otherwise, the definitions of a (2nR , n)
code, achievability, and rate–distortion function are the same as before.
(a) Suppose that the side information Z is causally available at the decoder. This
setup corresponds to the tracking scenario mentioned in Section ., where X
is the location of the target, Y is a description of the terrain information, and
Z is the information from the tracking sensor. Show that the rate–distortion
function is
RCSI-D (D) = min I(Y ; U) for D ≥ Dmin ,
where the minimum is over all conditional pmfs p(u|y) and functions x̂(u, z) such that E(d(X, X̂)) ≤ D, and Dmin = min_{x̂(y,z)} E[d(X, x̂(Y, Z))].
(b) Now suppose that the side information Z is noncausally available at the de-
coder. Show that the rate–distortion function is
RSI-D (D) = min I(Y ; U |Z) for D ≥ Dmin ,
where the minimum is over all conditional pmfs p(u|y) and functions x̂(u, z) such that E(d(X, X̂)) ≤ D.
.. Quadratic Gaussian source coding when side information may be absent. Consider
the source coding setup when side information may be absent. Let (X, Y) be a 2-WGN source with X ∼ N(0, P) and Y = X + Z, where Z ∼ N(0, N) is independent
of X, and let d1 and d2 be squared error distortion measures. Show that the rate–
distortion function is
RSI-D2 (D1 , D2 ) = R(PN/(D2 (D1 + N)))   if D1 < P and D2 < D1 N/(D1 + N),
                 = R(P/D1 )              if D1 < P and D2 ≥ D1 N/(D1 + N),
                 = R(PN/(D2 (P + N)))    if D1 ≥ P and D2 < D1 N/(D1 + N),
                 = 0                     otherwise.
.. Lossy source coding with coded side information at the encoder and the decoder.
Let (X, Y ) be a -DMS and d(x, x̂) be a distortion measure. Consider the lossy
source coding setup depicted in Figure ., where coded side information of Y is
available at both the encoder and the decoder. Define R1 (D) as the set of rate pairs
(R1 , R2 ) such that
R1 ≥ I(X; X̂ |U),
R2 ≥ I(Y ; U )

for some conditional pmf p(u|y)p(x̂|u, x) that satisfies E(d(X, X̂)) ≤ D. Further define R2 (D) to be the set of rate pairs (R1 , R2 ) that satisfy the above inequalities for some conditional pmf p(u|y)p(x̂|u, x, y) such that E(d(X, X̂)) ≤ D.

Figure .. Lossy source coding with coded side information.

(a) Show that R1 (D) is an inner bound on the rate–distortion region.


(b) Show that R1 (D) = R2 (D).
(c) Prove that R2 (D) is an outer bound on the rate–distortion region. Conclude
that R1 (D) is the rate–distortion region.
.. Source coding with degraded side information. Consider the lossy source coding
setup depicted in Figure .. Let (X, Y1 , Y2 ) be a -DMS such that X → Y1 → Y2
form a Markov chain. The encoder sends a description M of the source X so that
decoder j = 1, 2 can reconstruct X with distortion D j using side information Y j .
Find the rate–distortion function.

Y1n
( X̂ 1n , D1 )
Decoder 
Xn M
Encoder
( X̂ 2n , D2 )
Decoder 

Y2n

Figure .. Lossy source coding with degraded side information.

Remark: This result is due to Heegard and Berger ().


.. Complementary delivery. Consider the lossy source coding with side information
setup in Figure .. The encoder sends a description M ∈ [1 : 2nR ] of the -DMS
(X, Y) to both decoders so that decoder  can reconstruct Y with distortion D2 and
decoder  can reconstruct X with distortion D1 . Define the rate–distortion func-
tion R(D1 , D2 ) to be the infimum of rates R such that the rate–distortion triple
(R, D1 , D2 ) is achievable.
(a) Establish the upper bound

R(D1 , D2 ) ≤ min max{I(X; U |Y), I(Y ; U |X)},



Figure .. Complementary delivery.

where the minimum is over all conditional pmfs p(u|x, y) and functions x̂(u, y) and ŷ(u, x) such that E(d1 (X, X̂)) ≤ D1 and E(d2 (Y, Ŷ)) ≤ D2 .
(b) Let (X, Y ) be a DSBS(p), and d1 and d2 be Hamming distortion measures.
Show that the rate–distortion function for D1 , D2 ≤ p is

R(D1 , D2 ) = H(p) − H(min{D1 , D2 }).

(Hint: For achievability, let U ∈ {0, 1} be a description of X ⊕ Y.)


(c) Now let (X, Y ) be a -WGN source with X ∼ N(0, P) and Y = X + Z, where
Z ∼ N(0, N) is independent of X, and let d1 and d2 be squared error distortion
measures. Let P 󳰀 = PN/(P + N). Show that the rate–distortion function for
D1 ≤ P 󳰀 and D2 ≤ N is

P󳰀 N
R(D1 , D2 ) = max 󶁅R 󶀥 󶀵 , R 󶀥 󶀵󶁕 .
D1 D2

(Hint: For the achievability proof, use the upper bound given in part (a). Let
U = (V1 , V2 ) such that V1 → X → Y → V2 and choose Gaussian test channels
from X to V1 and from Y to V2 , and the functions x̂ and ŷ to be the MMSE
estimate of X given (V1 , Y) and Y given (V2 , X), respectively.)
Remark: This problem was first considered in Kimura and Uyematsu ().

APPENDIX 11A PROOF OF LEMMA 11.1

We first show that

P{(U n (l̃), Y n ) ∈ Tє(n) for some l̃ ∈ B(m), l̃ ≠ L | M = m}
    ≤ P{(U n (l̃), Y n ) ∈ Tє(n) for some l̃ ∈ B(1) | M = m}.

This holds trivially when m = 1. For m ≠ 1, consider

P{(U n (l̃), Y n ) ∈ Tє(n) for some l̃ ∈ B(m), l̃ ≠ L | M = m}
    = ∑l∈B(m) p(l |m) P{(U n (l̃), Y n ) ∈ Tє(n) for some l̃ ∈ B(m), l̃ ≠ l | L = l, M = m}
    = ∑l∈B(m) p(l |m) P{(U n (l̃), Y n ) ∈ Tє(n) for some l̃ ∈ B(m), l̃ ≠ l | L = l}      (a)
    = ∑l∈B(m) p(l |m) P{(U n (l̃), Y n ) ∈ Tє(n) for some l̃ ∈ [1 : 2n(R̃−R) − 1] | L = l}      (b)
    ≤ ∑l∈B(m) p(l |m) P{(U n (l̃), Y n ) ∈ Tє(n) for some l̃ ∈ B(1) | L = l}
    = ∑l∈B(m) p(l |m) P{(U n (l̃), Y n ) ∈ Tє(n) for some l̃ ∈ B(1) | L = l, M = m}      (c)
    = P{(U n (l̃), Y n ) ∈ Tє(n) for some l̃ ∈ B(1) | M = m},

where (a) and (c) follow since M is a function of L and (b) follows since given L = l, any collection of 2n(R̃−R) − 1 codewords U n (l̃) with l̃ ≠ l has the same distribution. Hence, we have

P(E3 ) = ∑m p(m) P{(U n (l̃), Y n ) ∈ Tє(n) for some l̃ ∈ B(m), l̃ ≠ L | M = m}
    ≤ ∑m p(m) P{(U n (l̃), Y n ) ∈ Tє(n) for some l̃ ∈ B(1) | M = m}
    = P{(U n (l̃), Y n ) ∈ Tє(n) for some l̃ ∈ B(1)}.


CHAPTER 12

Distributed Lossy Compression

In this chapter, we consider the general distributed lossy compression system depicted in
Figure .. Let (X1 , X2 ) be a -DMS and d j (x j , x̂ j ), j = 1, 2, be two distortion measures.
Sender j = 1, 2 wishes to communicate its source X j over a noiseless link to the receiver
with a prescribed distortion D j . We wish to find the rate–distortion region, which is the
set of description rates needed to achieve the desired distortions. It is not difficult to see
that this problem includes, as special cases, distributed lossless source coding in Chap-
ter  and lossy source coding with noncausal side information available at the decoder
in Chapter . Unlike these special cases, however, the rate–distortion region of the dis-
tributed lossy source coding problem is not known in general.
We first extend the Wyner–Ziv coding scheme to establish the Berger–Tung inner
bound on the rate–distortion region. The proof involves two new lemmas—the Markov
lemma, which is, in a sense, a stronger version of the conditional typicality lemma, and the
mutual packing lemma, which is a dual to the mutual covering lemma in Section .. We
also establish an outer bound on the rate–distortion region, which is tight in some cases.
The rest of the chapter is largely dedicated to establishing the optimality of the Berger–
Tung coding scheme for the quadratic Gaussian case. The proof of the converse involves
the entropy power inequality, identification of a common-information random variable,
and results from MMSE estimation. We then study the quadratic Gaussian CEO prob-
lem in which each sender observes a noisy version of the same Gaussian source and the
receiver wishes to reconstruct the source with a prescribed distortion. We show that the
Berger–Tung coding scheme is again optimal for this problem. We show via an example,
however, that the Berger–Tung coding scheme is not optimal in general.

Figure .. Distributed lossy compression system.



12.1 BERGER–TUNG INNER BOUND

A (2nR1 , 2nR2 , n) distributed lossy source code consists of

∙ two encoders, where encoder 1 assigns an index m1 (x1n ) ∈ [1 : 2nR1 ) to each sequence x1n ∈ X1n and encoder 2 assigns an index m2 (x2n ) ∈ [1 : 2nR2 ) to each sequence x2n ∈ X2n ,
and
∙ a decoder that assigns a pair of estimates (x̂1n , x̂2n ) to each index pair (m1 , m2 ) ∈ [1 :
2nR1 ) × [1 : 2nR2 ).

A rate–distortion quadruple (R1 , R2 , D1 , D2 ) is said to be achievable (and a rate pair (R1 ,


R2 ) is achievable for distortion pair (D1 , D2 )) if there exists a sequence of (2nR1 , 2nR2 , n)
codes with
lim supn→∞ E(d j (X jn , X̂ jn )) ≤ D j , j = 1, 2.

The rate–distortion region R(D1 , D2 ) for distributed lossy source coding is the closure of
the set of all rate pairs (R1 , R2 ) such that (R1 , R2 , D1 , D2 ) is achievable.
The rate–distortion region for distributed lossy source coding is not known in general.
The following is an inner bound on the rate–distortion region.

Theorem . (Berger–Tung Inner Bound). Let (X1 , X2 ) be a -DMS and d1 (x1 , x̂1 )
and d2 (x2 , x̂2 ) be two distortion measures. A rate pair (R1 , R2 ) is achievable with dis-
tortion pair (D1 , D2 ) for distributed lossy source coding if

R1 > I(X1 ; U1 |U2 , Q),


R2 > I(X2 ; U2 |U1 , Q),
R1 + R2 > I(X1 , X2 ; U1 , U2 |Q)

for some conditional pmf p(q)p(u1 |x1 , q)p(u2 |x2 , q) with |U j | ≤ |X j | + 4, j = 1, 2, and
functions x̂1 (u1 , u2 , q) and x̂2 (u1 , u2 , q) such that E(d j (X j , X̂ j )) ≤ D j , j = 1, 2.

The Berger–Tung inner bound is tight in several special cases.

∙ It reduces to the Slepian–Wolf region when d1 and d2 are Hamming distortion mea-
sures and D1 = D2 = 0 (set U1 = X1 and U2 = X2 ).
∙ It reduces to the Wyner–Ziv rate–distortion function when there is no rate limit on
describing X2 , i.e., R2 ≥ H(X2 ). In this case, the only active constraint I(X1 ; U1 |U2 , Q)
is minimized by U2 = X2 and Q = .
∙ More generally, suppose that d2 is a Hamming distortion measure and D2 = 0. Then
the Berger–Tung inner bound is tight and reduces to the set of rate pairs (R1 , R2 ) such

that

R1 ≥ I(X1 ; U1 | X2 ),
R2 ≥ H(X2 |U1 ), (.)
R1 + R2 ≥ I(X1 ; U1 | X2 ) + H(X2 ) = I(X1 ; U1 ) + H(X2 |U1 )

for some conditional pmf p(u1 |x1 ) and function x̂1 (u1 , x2 ) that satisfy the constraint
E(d1 (X1 , X̂ 1 )) ≤ D1 .
∙ The Berger–Tung inner bound is tight for the quadratic Gaussian case, that is, for a
2-WGN source and squared error distortion measures. We will discuss this result in
detail in Section ..
However, the Berger–Tung inner bound is not tight in general as we show via an example
in Section ..

12.1.1 Markov and Mutual Packing Lemmas


We will need the following two lemmas in the proof of achievability.

Lemma . (Markov Lemma). Suppose that X → Y → Z form a Markov chain. Let
(x n , y n ) ∈ Tє(n)
󳰀 (X, Y) and Z
n
∼ p(z n |y n ), where the conditional pmf p(z n |yn ) satisfies
the following conditions:
. limn→∞ P{(y n , Z n ) ∈ Tє(n)
󳰀 (Y , Z)} = 1.

. For every z n ∈ Tє(n) n


󳰀 (Z|y ) and n sufficiently large

󳰀 󳰀
2−n(H(Z|Y)+δ(є )) ≤ p(z n | y n ) ≤ 2−n(H(Z|Y)−δ(є ))

for some δ(є 󳰀 ) that tends to zero as є 󳰀 → 0.


Then, for some sufficiently small є 󳰀 < є,

lim P󶁁(x n , y n , Z n ) ∈ Tє(n) (X, Y , Z)󶁑 = 1.


n→∞

The proof of this lemma involves a counting argument and is given in Appendix A.
Note that if p(z n |y n ) = ∏ni=1 p Z|Y (zi |yi ), then the conclusion in the lemma holds by the
conditional typicality lemma in Section ., which was sufficient for the achievability
proof of the Wyner–Ziv theorem in Section .. In the proof of the Berger–Tung inner
bound, however, joint typicality encoding is used twice, necessitating the above lemma,
which is applicable to more general conditional pmfs p(z n |y n ).
The second lemma is a straightforward generalization of the packing lemma in Sec-
tion .. It can be viewed also as a dual to the mutual covering lemma in Section ..

Lemma . (Mutual Packing Lemma). Let (U1 , U2 ) ∼ p(u1 , u2 ). Let U1n (l1 ), l1 ∈
[1 : 2nr1 ], be random sequences, each distributed according to ∏ni=1 pU1 (u1i ) with arbi-
trary dependence on the rest of the U1n (l1 ) sequences. Similarly, let U2n (l2 ), l2 ∈ [1 : 2nr2 ],
be random sequences, each distributed according to ∏ni=1 pU2 (u2i ) with arbitrary de-
pendence on the rest of the U2n (l2 ) sequences. Assume that (U1n (l1 ) : l1 ∈ [1 : 2nr1 ]) and
(U2n (l2 ) : l2 ∈ [1 : 2nr2 ]) are independent. Then there exists δ(є) that tends to zero as
є → 0 such that

limn→∞ P{(U1n (l1 ), U2n (l2 )) ∈ Tє(n) for some (l1 , l2 ) ∈ [1 : 2nr1 ] × [1 : 2nr2 ]} = 0,

if r1 + r2 < I(U1 ; U2 ) − δ(є).

12.1.2 Proof of the Berger–Tung Inner Bound


Achievability of the Berger–Tung inner bound uses the distributed compress–bin scheme
illustrated in Figure .. As in Wyner–Ziv coding, the scheme uses joint typicality en-
coding and binning at each encoder and joint typicality decoding and symbol-by-symbol
reconstructions at the decoder.

Figure .. Berger–Tung coding scheme. Each bin B j (m j ), m j ∈ [1 : 2nR j ], j = 1, 2, consists of 2n(R̃ j −R j ) indices.

We prove achievability for |Q| = 1; the rest of the proof follows using time sharing. In the following we assume that є > є′ > є′′.
Codebook generation. Fix a conditional pmf p(u1 |x1 )p(u2 |x2 ) and functions x̂1 (u1 , u2 ) and x̂2 (u1 , u2 ) such that E(d j (X j , X̂ j )) ≤ D j /(1 + є), j = 1, 2. Let R̃ 1 ≥ R1 , R̃ 2 ≥ R2 . For j = 1, 2, randomly and independently generate 2nR̃ j sequences u nj (l j ), l j ∈ [1 : 2nR̃ j ], each according to ∏ni=1 pU j (u ji ). Partition the set of indices l j ∈ [1 : 2nR̃ j ] into equal-size bins B j (m j ) = [(m j − 1)2n(R̃ j −R j ) + 1 : m j 2n(R̃ j −R j ) ], m j ∈ [1 : 2nR j ]. The codebook is revealed to the encoders and the decoder.
Encoding. Upon observing x nj , encoder j = 1, 2 finds an index l j ∈ [1 : 2nR̃ j ] such that (x nj , u nj (l j )) ∈ Tє′′(n). If there is more than one such index l j , encoder j selects one of them uniformly at random. If there is no such index l j , encoder j selects an index from [1 : 2nR̃ j ] uniformly at random. Encoder j = 1, 2 sends the index m j such that l j ∈ B j (m j ).

Decoding. The decoder finds the unique index pair (l̂1 , l̂2 ) ∈ B1 (m1 ) × B2 (m2 ) such that (u1n (l̂1 ), u2n (l̂2 )) ∈ Tє(n). If there is such a unique index pair (l̂1 , l̂2 ), the reconstructions are computed as x̂1i (u1i (l̂1 ), u2i (l̂2 )) and x̂2i (u1i (l̂1 ), u2i (l̂2 )) for i ∈ [1 : n]; otherwise x̂1n and x̂2n are set to arbitrary sequences in X̂1n and X̂2n , respectively.
Analysis of expected distortion. Let (L1 , L2 ) denote the pair of indices for the chosen (U1n , U2n ) pair, (M1 , M2 ) be the pair of corresponding bin indices, and (L̂ 1 , L̂ 2 ) be the pair of decoded indices. Define the “error” event

E = {(U1n (L̂ 1 ), U2n (L̂ 2 ), X1n , X2n ) ∉ Tє(n)}

and consider the following events

E1 = {(U jn (l j ), X nj ) ∉ Tє′′(n) for all l j ∈ [1 : 2nR̃ j ] for j = 1, 2},
E2 = {(U1n (L1 ), X1n , X2n ) ∉ Tє′(n)},
E3 = {(U1n (L1 ), U2n (L2 ), X1n , X2n ) ∉ Tє(n)},
E4 = {(U1n (l̃1 ), U2n (l̃2 )) ∈ Tє(n) for some (l̃1 , l̃2 ) ∈ B1 (M1 ) × B2 (M2 ), (l̃1 , l̃2 ) ≠ (L1 , L2 )}.
By the union of events bound,
P(E) ≤ P(E1 ) + P(E1c ∩ E2 ) + P(E2c ∩ E3 ) + P(E4 ).
We bound each term. By the covering lemma, P(E1 ) tends to zero as n → ∞ if R̃ j > I(U j ; X j ) + δ(є′′) for j = 1, 2. Since X2n | {U1n (L1 ) = u1n , X1n = x1n } ∼ ∏ni=1 pX2|X1 (x2i |x1i ) and є′ > є′′, by the conditional typicality lemma, P(E1c ∩ E2 ) tends to zero as n → ∞.
To bound P(E2c ∩ E3 ), let (u1n , x1n , x2n ) ∈ Tє′(n) (U1 , X1 , X2 ) and consider P{U2n (L2 ) = u2n | X2n = x2n , X1n = x1n , U1n (L1 ) = u1n } = P{U2n (L2 ) = u2n | X2n = x2n } = p(u2n |x2n ). First note that by the covering lemma, P{U2n (L2 ) ∈ Tє′(n) (U2 |x2n ) | X2n = x2n } converges to 1 as n → ∞, that is, p(u2n |x2n ) satisfies the first condition in the Markov lemma. In Appendix B we show that it also satisfies the second condition.



Lemma .. For every u2n ∈ Tє(n)


󳰀 (U2 |x2 ) and n sufficiently large,
n

p(un2 |x2n ) ≐ 2−nH(U2 |X2 ) .

Hence, by the Markov lemma with Z ← U2 , Y ← X2 , and X ← (U1 , X1 ), we have

limn→∞ P{(u1n , x1n , x2n , U2n (L2 )) ∈ Tє(n) | U1n (L1 ) = u1n , X1n = x1n , X2n = x2n } = 1,

if (u1n , x1n , x2n ) ∈ Tє′(n) (U1 , X1 , X2 ) and є′ < є is sufficiently small. Therefore, P(E2c ∩ E3 ) tends to zero as n → ∞.
Following a similar argument to Lemma . in the proof of the Wyner–Ziv theorem,
we have

P(E4 ) ≤ P{(U1n (l̃1 ), U2n (l̃2 )) ∈ Tє(n) for some (l̃1 , l̃2 ) ∈ B1 (1) × B2 (1)}.

Hence, by the mutual packing lemma, P(E4 ) tends to zero as n → ∞ if (R̃ 1 − R1 ) + (R̃ 2 −
R2 ) < I(U1 ; U2 ) − δ(є). Combining the bounds and eliminating R̃ 1 and R̃ 2 , we have shown
that P(E) tends to zero as n → ∞ if

R1 > I(U1 ; X1 ) − I(U1 ; U2 ) + δ(є′′) + δ(є) = I(U1 ; X1 |U2 ) + δ′(є),
R2 > I(U2 ; X2 |U1 ) + δ′(є),        (.)
R1 + R2 > I(U1 ; X1 ) + I(U2 ; X2 ) − I(U1 ; U2 ) + 2δ′(є) = I(U1 , U2 ; X1 , X2 ) + 2δ′(є).

As in previous lossy source coding achievability proofs, by the typical average lemma, it
can be shown that the asymptotic distortions averaged over the random codebook and
encoding, and over (X1n , X2n ) are bounded as

lim supn→∞ E(d j (X jn , X̂ jn )) ≤ D j , j = 1, 2,

if the inequalities in (.) are satisfied. Now using time sharing establishes the achiev-
ability of every rate pair (R1 , R2 ) that satisfies the inequalities in the theorem for some
conditional pmf p(q)p(u1 |x1 , q)p(u2 |x2 , q) and functions x̂1 (u1 , u2 , q), x̂2 (u1 , u2 , q) such
that E(d j (X j , X̂ j )) < D j , j = 1, 2. Finally, using the continuity of mutual information com-
pletes the proof with nonstrict distortion inequalities.

12.2 BERGER–TUNG OUTER BOUND

The following is an outer bound on the rate–distortion region for distributed lossy source
coding.

Theorem . (Berger–Tung Outer Bound). Let (X1 , X2 ) be a -DMS and d1 (x1 , x̂1 )
and d2 (x2 , x̂2 ) be two distortion measures. If a rate pair (R1 , R2 ) is achievable with
distortion pair (D1 , D2 ) for distributed lossy source coding, then it must satisfy the
inequalities
R1 ≥ I(X1 , X2 ; U1 |U2 ),
R2 ≥ I(X1 , X2 ; U2 |U1 ),
R1 + R2 ≥ I(X1 , X2 ; U1 , U2 )

for some conditional pmf p(u1 , u2 |x1 , x2 ) and functions x̂1 (u1 , u2 ) and x̂2 (u1 , u2 ) such
that U1 → X1 → X2 and X1 → X2 → U2 form Markov chains and E(d j (X j , X̂ j )) ≤ D j ,
j = 1, 2.

This outer bound is similar to the Berger–Tung inner bound except that the region is
convex without the use of a time-sharing random variable and the Markov conditions are
weaker than the condition U1 → X1 → X2 → U2 in the inner bound. The proof of the
outer bound uses standard arguments with the auxiliary random variable identifications
U1i = (M1 , X1i−1 , X2i−1 ) and U2i = (M2 , X1i−1 , X2i−1 ).
The Berger–Tung outer bound is again tight when d2 is a Hamming distortion measure
with D2 = 0, which includes as special cases distributed lossless source coding (Slepian–
Wolf) and lossy source coding with noncausal side information available at the decoder
(Wyner–Ziv). The outer bound is also tight when X2 is a function of X1 . In this case,
the Berger–Tung inner bound is not tight as discussed in Section .. The outer bound,
however, is not tight in general. For example, it is not tight for the quadratic Gaussian
case discussed next.

12.3 QUADRATIC GAUSSIAN DISTRIBUTED SOURCE CODING

Consider the distributed lossy source coding problem for a 2-WGN(1, ρ) source (X1 , X2 )
and squared error distortion measures d j (x j , x̂ j ) = (x j − x̂ j )2 , j = 1, 2. Assume without
loss of generality that ρ = E(X1 X2 ) ≥ 0.
The Berger–Tung inner bound is tight for this case and reduces to the following.

Theorem .. The rate–distortion region R(D1 , D2 ) for distributed lossy source cod-
ing a -WGN(1, ρ) source (X1 , X2 ) and squared error distortion is the intersection of
the following three regions:

R1 (D1 ) = 󶁁(R1 , R2 ) : R1 ≥ д(R2 , D1 )󶁑,


R2 (D2 ) = 󶁁(R1 , R2 ) : R2 ≥ д(R1 , D2 )󶁑,
(1 − ρ2 )ϕ(D1 , D2 )
R12 (D1 , D2 ) = 󶁆(R1 , R2 ) : R1 + R2 ≥ R 󶀦 󶀶󶁖 ,
2D1 D2
12.3 Quadratic Gaussian Distributed Source Coding 301

where

1 − ρ2 + ρ2 2−2R
д(R, D) = R 󶀦 󶀶,
D
ϕ(D1 , D2 ) = 1 + 󵀆1 + 4ρ2 D1 D2 /(1 − ρ2 )2 ,

and R(x), x ≥ 1, is the quadratic Gaussian rate function.

The rate–distortion region R(D1 , D2 ) is sketched in Figure ..

Figure .. The rate–distortion region for quadratic Gaussian distributed source coding; R(D1 , D2 ) = R1 (D1 ) ∩ R2 (D2 ) ∩ R12 (D1 , D2 ).
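The three regions are straightforward to evaluate numerically. The following Python sketch (the values of ρ, D1, D2 and the test rate pairs are illustrative) implements g(R, D), ϕ(D1, D2), and a membership test for R(D1, D2) = R1(D1) ∩ R2(D2) ∩ R12(D1, D2).

```python
import math

def rate_fn(x):
    # quadratic Gaussian rate function R(x) = max{ (1/2) log2 x, 0 }
    return max(0.5 * math.log2(x), 0.0)

def g(R, D, rho):
    return rate_fn((1 - rho**2 + rho**2 * 2 ** (-2 * R)) / D)

def phi(D1, D2, rho):
    return 1 + math.sqrt(1 + 4 * rho**2 * D1 * D2 / (1 - rho**2) ** 2)

def in_region(R1, R2, D1, D2, rho):
    # (R1, R2) in R1(D1), R2(D2), and R12(D1, D2) simultaneously
    return (R1 >= g(R2, D1, rho)
            and R2 >= g(R1, D2, rho)
            and R1 + R2 >= rate_fn((1 - rho**2) * phi(D1, D2, rho) / (2 * D1 * D2)))

rho, D1, D2 = 0.8, 0.1, 0.2          # illustrative values
print(in_region(2.0, 1.5, D1, D2, rho))   # True: all three constraints hold
print(in_region(1.0, 0.5, D1, D2, rho))   # False: R1 is too small
```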

We prove Theorem . in the next two subsections.

12.3.1 Proof of Achievability


Assume without loss of generality that D1 ≤ D2 . We set the auxiliary random variables in
the Berger–Tung inner bound to U1 = X1 + V1 and U2 = X2 + V2 , where V1 ∼ N(0, N1 )
and V2 ∼ N(0, N2 ), N2 ≥ N1 , are independent of each other and of (X1 , X2 ). Let the re-
constructions X̂ 1 and X̂ 2 be the MMSE estimates E(X1 |U1 , U2 ) and E(X2 |U1 , U2 ) of X1 and
X2 , respectively. We refer to such choice of (U1 , U2 ) as a distributed Gaussian test channel
characterized by (N1 , N2 ).
Each distributed Gaussian test channel corresponds to the distortion pair

D1 = E((X1 − X̂ 1 )2 ) = N1 (1 + N2 − ρ2 )/((1 + N1 )(1 + N2 ) − ρ2 ),
D2 = E((X2 − X̂ 2 )2 ) = N2 (1 + N1 − ρ2 )/((1 + N1 )(1 + N2 ) − ρ2 ) ≥ D1 .

Evaluating the Berger–Tung inner bound with the above choices of test channels and
reconstruction functions, we obtain
R1 > I(X1 ; U1 |U2 ) = R(((1 + N1 )(1 + N2 ) − ρ2 )/(N1 (1 + N2 ))),
R2 > I(X2 ; U2 |U1 ) = R(((1 + N1 )(1 + N2 ) − ρ2 )/(N2 (1 + N1 ))),        (.)
R1 + R2 > I(X1 , X2 ; U1 , U2 ) = R(((1 + N1 )(1 + N2 ) − ρ2 )/(N1 N2 )).
Since U1 → X1 → X2 → U2 , we have

I(X1 , X2 ; U1 , U2 ) = I(X1 , X2 ; U1 |U2 ) + I(X1 , X2 ; U2 )


= I(X1 ; U1 |U2 ) + I(X2 ; U2 )
= I(X1 ; U1 ) + I(X2 ; U2 |U1 ).

Thus in general, the rate region in (.) has two corner points (I(X1 ; U1 |U2 ), I(X2 ; U2 ))
and (I(X1 ; U1 ), I(X2 ; U2 |U1 )). The first (left) corner point can be expressed as

R2′ = I(X2 ; U2 ) = R((1 + N2 )/N2 ),
R1′ = I(X1 ; U1 |U2 ) = R(((1 + N1 )(1 + N2 ) − ρ2 )/(N1 (1 + N2 ))) = g(R2′ , D1 ).

The other corner point (R1′′ , R2′′ ) has a similar representation.


We now consider the following two cases.
Case : (1 − D2 ) ≤ ρ2 (1 − D1 ). For this case, the characterization for R(D1 , D2 ) can be
simplified as follows.

Lemma .. If (1 − D2 ) ≤ ρ2 (1 − D1 ), then R1 (D1 ) ⊆ R2 (D2 ) ∩ R12 (D1 , D2 ).

The proof of this lemma is in Appendix C.


Consider a distributed Gaussian test channel with N1 = D1 /(1 − D1 ) and N2 = ∞ (i.e.,
U2 = ). Then, the left corner point of the rate region in (.) becomes

R2′ = 0,
R1′ = R((1 + N1 )/N1 ) = R(1/D1 ) = g(R2′ , D1 ).
Also it can be easily verified that the distortion constraints are satisfied as
E((X1 − X̂ 1 )2 ) = 1 − 1/(1 + N1 ) = D1 ,
E((X2 − X̂ 2 )2 ) = 1 − ρ2 /(1 + N1 ) = 1 − ρ2 + ρ2 D1 ≤ D2 .

Now consider a test channel with (Ñ 1 , Ñ 2 ), where Ñ 2 < ∞ and Ñ 1 ≥ N1 such that

Ñ 1 (1 + Ñ 2 − ρ2 )/((1 + Ñ 1 )(1 + Ñ 2 ) − ρ2 ) = N1 /(1 + N1 ) = D1 .

Then, the corresponding distortion pair (D̃ 1 , D̃ 2 ) satisfies D̃ 1 = D1 and

D̃ 2 /D̃ 1 = (1/Ñ 1 + 1/(1 − ρ2 ))/(1/Ñ 2 + 1/(1 − ρ2 )) ≤ (1/N1 + 1/(1 − ρ2 ))/(1/(1 − ρ2 )) ≤ D2 /D1 ,

i.e., D̃ 2 ≤ D2 . Furthermore, as shown before, the left corner point of the corresponding rate region is

R̃ 2 = I(X2 ; U2 ) = R((1 + Ñ 2 )/Ñ 2 ),
R̃ 1 = g(R̃ 2 , D̃ 1 ).

Hence, by varying 0 < Ñ 2 < ∞, we can achieve the entire region R1 (D1 ).
Case : (1 − D2 ) > ρ2 (1 − D1 ). In this case, there exists a distributed Gaussian test channel
with some (N1 , N2 ) such that both distortion constraints are tight. To show this, we use the
following result, which is a straightforward consequence of the matrix inversion lemma
in Appendix B.

Lemma .. Let U j = X j + V j , j = 1, 2, where X = (X1 , X2 ) and V = (V1 , V2 ) are in-


dependent zero-mean Gaussian random vectors with respective covariance matrices
KX , KV ≻ 0. Let KX|U = KX − KXU KU−1 KXU T
be the error covariance matrix of the (lin-
ear) MMSE estimate of X given U. Then

−1 −1
KX|U = KX−1 + KX−1 KXU
T
󶀢KU − KXU KX−1 KXU
T
󶀲 KXU KX−1 = KX−1 + KV−1 .

Hence, V1 and V2 are independent iff KV−1 = KX|U


−1
− KX−1 is diagonal.

Now let

K = [ D1           θ√(D1 D2 )
      θ√(D1 D2 )   D2        ],

where θ ∈ [0, 1] is chosen so that K^−1 − KX^−1 = diag(1/N1 , 1/N2 ) for some N1 , N2 > 0. By simple algebra, if (1 − D2 ) > ρ2 (1 − D1 ), then such a choice of θ exists and is given by

θ = (√((1 − ρ2 )2 + 4ρ2 D1 D2 ) − (1 − ρ2 ))/(2ρ√(D1 D2 )).        (.)

Hence, by Lemma ., we have the covariance matrix K = KX−X̂ = KX|U with U j = X j + V j , j = 1, 2, where V1 ∼ N(0, N1 ) and V2 ∼ N(0, N2 ) are independent of each other and of (X1 , X2 ). In other words, there exists a distributed Gaussian test channel characterized by (N1 , N2 ) with corresponding distortion pair (D1 , D2 ).
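The existence of such a test channel is easy to verify numerically. The following Python sketch (assuming numpy; the values of ρ, D1, D2 are illustrative and satisfy (1 − D2) > ρ2(1 − D1)) computes θ from (.), checks that K^−1 − KX^−1 is diagonal, reads off (N1, N2), and confirms that the resulting distributed Gaussian test channel reproduces (D1, D2) via the distortion expressions of this subsection.

```python
import numpy as np

rho, D1, D2 = 0.6, 0.2, 0.3                      # illustrative values with (1 - D2) > rho^2 (1 - D1)
assert 1 - D2 > rho**2 * (1 - D1)

theta = (np.sqrt((1 - rho**2) ** 2 + 4 * rho**2 * D1 * D2) - (1 - rho**2)) / (2 * rho * np.sqrt(D1 * D2))

KX = np.array([[1.0, rho], [rho, 1.0]])
K = np.array([[D1, theta * np.sqrt(D1 * D2)],
              [theta * np.sqrt(D1 * D2), D2]])

M = np.linalg.inv(K) - np.linalg.inv(KX)         # should equal diag(1/N1, 1/N2)
print(np.round(M, 10))                           # off-diagonal entries vanish
N1, N2 = 1 / M[0, 0], 1 / M[1, 1]

# sanity check: the test channel U_j = X_j + V_j, V_j ~ N(0, N_j), gives back (D1, D2)
D1_check = N1 * (1 + N2 - rho**2) / ((1 + N1) * (1 + N2) - rho**2)
D2_check = N2 * (1 + N1 - rho**2) / ((1 + N1) * (1 + N2) - rho**2)
print(N1, N2, D1_check, D2_check)
```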
With such choice of the distributed Gaussian test channel, it can be readily verified that the left corner point (R1′ , R2′ ) of the rate region is

R1′ + R2′ = R((1 − ρ2 )ϕ(D1 , D2 )/(2D1 D2 )),
R2′ = R((1 + N2 )/N2 ),
R1′ = g(R2′ , D1 ).

Therefore, (R1′ , R2′ ) is on the boundary of both R12 (D1 , D2 ) and R1 (D1 ). Following similar steps to case 1, we can show that any rate pair (R̃ 1 , R̃ 2 ) such that R̃ 1 = g(R̃ 2 , D1 ) and R̃ 2 ≥ R2′ is achievable; see Figure .. Similarly, the right corner point (R1′′ , R2′′ ) is on the boundary of R12 (D1 , D2 ) and R2 (D2 ), and any rate pair (R̃ 1 , R̃ 2 ) such that R̃ 2 = g(R̃ 1 , D2 ) and R̃ 1 ≥ R1′′ is achievable. Using time sharing between the two corner points completes the proof for case 2.

Figure .. Rate region for case 2. Here R(N1 , N2 ) denotes the rate region for (N1 , N2 ) in (.) with left corner point (R1′ , R2′ ) and right corner point (R1′′ , R2′′ ), and R(Ñ 1 , Ñ 2 ) denotes the rate region for (Ñ 1 , Ñ 2 ) with left corner point (R̃ 1 , R̃ 2 ).

The rest of the achievability proof follows by the discretization procedure described
in the proof of the quadratic Gaussian source coding theorem in Section ..

12.3.2 Proof of the Converse


We first show that the rate–distortion region is contained in both R1 (D1 ) and R2 (D2 ),
i.e., R(D1 , D2 ) ⊆ R1 (D1 ) ∩ R2 (D2 ). For a given sequence of (2nR1 , 2nR2 , n) codes that

achieves distortions D1 and D2 , consider

nR1 ≥ H(M1 )
    ≥ H(M1 |M2 )
    = I(M1 ; X1n |M2 )
    = h(X1n |M2 ) − h(X1n |M1 , M2 )
    ≥ h(X1n |M2 ) − h(X1n | X̂ 1n )
    ≥ h(X1n |M2 ) − ∑ni=1 h(X1i | X̂ 1i )
    ≥ h(X1n |M2 ) − (n/2) log(2πeD1 ),      (a)

where (a) follows by Jensen’s inequality and the distortion constraint. Next consider

nR2 ≥ I(M2 ; X2n )
    = h(X2n ) − h(X2n |M2 )
    = (n/2) log(2πe) − h(X2n |M2 ).

Since the sources X1 and X2 are jointly Gaussian, we can express X1n as X1n = ρX2n + W n ,
where {Wi } is a WGN(1 − ρ2 ) process independent of {X2i } and hence of M2 . Now by the
conditional EPI,

2^{2h(X1n |M2 )/n} ≥ 2^{2h(ρX2n |M2 )/n} + 2^{2h(W n |M2 )/n}
    = ρ2 2^{2h(X2n |M2 )/n} + 2πe(1 − ρ2 )
    ≥ 2πe(ρ2 2−2R2 + (1 − ρ2 )).

Therefore

nR1 ≥ (n/2) log(2πe(1 − ρ2 + ρ2 2−2R2 )) − (n/2) log(2πeD1 ) = n g(R2 , D1 )

and (R1 , R2 ) ∈ R1 (D1 ). Similarly, R(D1 , D2 ) ⊆ R2 (D2 ).


Remark .. The above argument shows that the rate–distortion region when only X1
is to be reconstructed with distortion D1 < 1 is R(D1 , 1) = R1 (D1 ). Similarly, when only
X2 is to be reconstructed, R(1, D2 ) = R2 (D2 ).

Proof of R(D1 , D2 ) ⊆ R12 (D1 , D2 ). We now proceed to show that the rate–distortion
region is contained in R12 (D1 , D2 ). By assumption and in light of Lemma ., we assume
without loss of generality that (1 − D1 ) ≥ (1 − D2 ) ≥ ρ2 (1 − D1 ) and the sum-rate bound
is active. We establish the following two lower bounds R′(θ) and R′′(θ) on the sum-rate

parametrized by θ ∈ [−1, 1] and show that

minθ max{R′(θ), R′′(θ)} = R((1 − ρ2 )ϕ(D1 , D2 )/(2D1 D2 )),
which implies that R(D1 , D2 ) ⊆ R12 (D1 , D2 ).
Cooperative lower bound. The sum-rate for the distributed source coding problem is
lower bounded by the sum-rate for the cooperative (centralized) lossy source coding set-
ting depicted in Figure ..

Figure .. Cooperative lossy source coding.

For any sequence of codes that achieves the desired distortions, consider
n(R1 + R2) ≥ H(M1, M2)
           = I(X_1^n, X_2^n; M1, M2)
           = h(X_1^n, X_2^n) − h(X_1^n, X_2^n | M1, M2)
           ≥ (n/2) log((2πe)^2 |K_X|) − Σ_{i=1}^n h(X_{1i}, X_{2i} | M1, M2)
           ≥ (n/2) log((2πe)^2 |K_X|) − Σ_{i=1}^n h(X_{1i} − X̂_{1i}, X_{2i} − X̂_{2i})
           ≥ (n/2) log((2πe)^2 |K_X|) − Σ_{i=1}^n (1/2) log((2πe)^2 |K̃_i|)
           ≥ (n/2) log((2πe)^2 |K_X|) − (n/2) log((2πe)^2 |K̃|)
           = (n/2) log( |K_X| / |K̃| ),
where K̃_i = K_{X(i)−X̂(i)} and K̃ = (1/n) Σ_{i=1}^n K̃_i. Since K̃(j, j) ≤ D_j, j = 1, 2,

K̃ ⪯ K(θ) = [ D1             θ√(D1 D2) ]
            [ θ√(D1 D2)     D2         ]    (.)
for some θ ∈ [−1, 1]. Hence,
R1 + R2 ≥ R′(θ) = R( |K_X| / |K(θ)| ) = R( (1 − ρ^2) / (D1 D2 (1 − θ^2)) ).    (.)

μ-Sum lower bound. Let Yi = μ1 X1i + μ2 X2i + Zi = μT X(i) + Zi , where {Zi } is a WGN(N)
process independent of {(X1i , X2i )}. Then for every (μ, N),

n(R1 + R2) ≥ H(M1, M2)
           = I(X^n, Y^n; M1, M2)
           ≥ h(X^n, Y^n) − h(Y^n | M1, M2) − h(X^n | M1, M2, Y^n)
           ≥ Σ_{i=1}^n ( h(X(i), Yi) − h(Yi | M1, M2) − h(X(i) | M1, M2, Y^n) )
           ≥ Σ_{i=1}^n (1/2) log( |K_X| N / ( |K̂_i| (μ^T K̃_i μ + N) ) )
           ≥ (n/2) log( |K_X| N / ( |K̂| (μ^T K̃ μ + N) ) )
           ≥ (n/2) log( |K_X| N / ( |K̂| (μ^T K(θ) μ + N) ) ),    (.)

where K̂_i = K_{X(i)|M1,M2,Y^n}, K̂ = (1/n) Σ_{i=1}^n K̂_i, and K(θ) is defined in (.).


We now take μ = (1/√D1, 1/√D2) and N = (1 − ρ^2)/(ρ√(D1 D2)). Then μ^T K(θ)μ = 2(1 + θ) and it can be readily shown that X_{1i} → Y_i → X_{2i} form a Markov chain. Furthermore, we can upper bound |K̂| as follows.

Lemma .. |K̂| ≤ D1 D2 N^2 (1 + θ)^2 / (2(1 + θ) + N)^2.

This can be shown by noting that K̂ is diagonal, which follows since X1i → Yi → X2i
form a Markov chain, and by establishing the matrix inequality K̂ ⪯ (K̃ −1 + (1/N)μμT )−1
using results from estimation theory. The details are given in Appendix D.
Using the above lemma, the bound in (.), and the definitions of μ and N establishes
the second lower bound on the sum-rate

R1 + R2 ≥ R″(θ) = R( |K_X| (2(1 + θ) + N) / (N D1 D2 (1 + θ)^2) )
                = R( ( 2ρ√(D1 D2)(1 + θ) + (1 − ρ^2) ) / ( D1 D2 (1 + θ)^2 ) ).    (.)
Finally, combining the cooperative and μ-sum lower bounds in (.) and (.), we have

R1 + R2 ≥ min_θ max{R′(θ), R″(θ)}.

These two bounds are plotted in Figure .. It can be easily checked that R󳰀 (θ) = R󳰀󳰀 (θ)
at a unique point
󵀆(1 − ρ2 )2 + 4ρ2 D1 D2 − (1 − ρ2 )
θ = θ∗ = ,
2ρ󵀄D1 D2

Figure .. Plot of the two bounds on the sum-rate.

which is exactly what we chose in the proof of achievability; see (.). Finally, since R′(θ) is increasing on [θ*, 1) and R″(θ) is decreasing on (−1, 1), we can conclude that

min_θ max{R′(θ), R″(θ)} = R′(θ*) = R″(θ*).

Hence
R1 + R2 ≥ R′(θ*) = R″(θ*) = R( (1 − ρ^2) φ(D1, D2) / (2 D1 D2) ).
This completes the proof of Theorem ..
Remark .. The choice of the auxiliary random variable Y such that X1 → Y → X2
form a Markov chain captures common information between X1 and X2 . A similar auxil-
iary random variable will be used in the converse proof for quadratic Gaussian multiple
descriptions in Section .; see also Section . for a detailed discussion of common
information.

12.4 QUADRATIC GAUSSIAN CEO PROBLEM

The CEO (chief executive or estimation officer) problem is motivated by distributed esti-
mation and detection settings, such as a tracker following the location of a moving object
over time using noisy measurements from multiple sensors, or a CEO making a decision
using several sketchy briefings from subordinates; see Chapter  for more discussion on
communication for computing. The problem is formulated as a distributed source cod-
ing problem where the objective is to reconstruct a source from coded noisy observations
rather than reconstructing the noisy observations themselves. We consider the special
case depicted in Figure .. A WGN(P) source X is observed through a Gaussian broad-
cast channel Y j = X + Z j , j = 1, 2, where Z1 ∼ N(0, N1 ) and Z2 ∼ N(0, N2 ) are noise com-
ponents, independent of each other and of X. The observation sequences Y jn , j = 1, 2, are
separately encoded with the goal of finding an estimate X̂ n of X n with mean squared error
distortion D.

Figure .. Quadratic Gaussian CEO problem.

A (2nR1 , 2nR2 , n) code for the CEO problem consists of


∙ two encoders, where encoder  assigns an index m1 (y1n ) ∈ [1 : 2nR1 ) to each sequence
y1n and encoder  assigns an index m2 (y2n ) ∈ [1 : 2nR2 ) to each sequence y2n , and
∙ a decoder that assigns an estimate x̂n to each index pair (m1 , m2 ) ∈ [1 : 2nR1 ) × [1 :
2nR2 ).
A rate–distortion triple (R1 , R2 , D) is said to be achievable if there exists a sequence of
(2nR1 , 2nR2 , n) codes with
lim sup_{n→∞} E( (1/n) Σ_{i=1}^n (X_i − X̂_i)^2 ) ≤ D.

The rate–distortion region RCEO (D) for the quadratic Gaussian CEO problem is the clo-
sure of the set of all rate pairs (R1 , R2 ) such that (R1 , R2 , D) is achievable.
This problem is closely related to the quadratic Gaussian distributed source coding
problem and the rate–distortion function is given by the following.

Theorem .. The rate–distortion region RCEO (D) for the quadratic Gaussian CEO
problem is the set of rate pairs (R1 , R2 ) such that
−1
1 1 1 − 2−2r2
R1 ≥ r1 + R 󶀧 󶀦 + 󶀶 󶀷,
D P N2
−1
1 1 1 − 2−2r1
R2 ≥ r2 + R 󶀧 󶀦 + 󶀶 󶀷,
D P N1
P
R1 + R2 ≥ r1 + r2 + R 󶀤 󶀴
D
for some r1 , r2 ≥ 0 that satisfy the condition
−1
1 1 − 2−2r1 1 − 2−2r2
D≥󶀦 + + 󶀶 .
P N1 N2

Achievability is proved using the Berger–Tung coding scheme with distributed Gauss-
ian test channels as for quadratic Gaussian distributed lossy source coding; see Prob-
lem ..

12.4.1 Proof of the Converse


We need to show that for any sequence of (2nR1 , 2nR2 , n) codes such that
lim sup_{n→∞} (1/n) Σ_{i=1}^n E( (X_i − X̂_i)^2 ) ≤ D,

the rate pair (R1 , R2 ) ∈ RCEO (D). Let J be a subset of {1, 2} and J c be its complement.
Consider

Σ_{j∈J} nR_j ≥ H(M(J))
             ≥ H(M(J) | M(J^c))
             = I(Y^n(J); M(J) | M(J^c))
             = I(Y^n(J), X^n; M(J) | M(J^c))
             = I(X^n; M(J) | M(J^c)) + I(Y^n(J); M(J) | X^n)
             = I(X^n; M(J), M(J^c)) − I(X^n; M(J^c)) + Σ_{j∈J} I(Y_j^n; M_j | X^n)
             ≥ [ I(X^n; X̂^n) − I(X^n; M(J^c)) ]^+ + Σ_{j∈J} I(Y_j^n; M_j | X^n)
             ≥ [ (n/2) log(P/D) − I(X^n; M(J^c)) ]^+ + Σ_{j∈J} n r_j,    (.)

where we define r_j = (1/n) I(Y_j^n; M_j | X^n), j = 1, 2.

In order to upper bound I(X n ; M(J c )) = h(X n ) − h(X n |M(J c )) = (n/2) log(2πeP) −
h(X n |M(J c )), or equivalently, to lower bound h(X n |M(J c )), let X̃ i be the MMSE esti-
mate of Xi given Yi (J c ). Then, by the orthogonality principle in Appendix B,

X̃_i = Σ_{j∈J^c} (Ñ/N_j) Y_{ji} = Σ_{j∈J^c} (Ñ/N_j) (X_i + Z_{ji}),    (.)
X_i = X̃_i + Z̃_i,    (.)

where {Z̃_i} is a WGN process with average power Ñ = 1/( 1/P + Σ_{j∈J^c} 1/N_j ), independent of {Y_i(J^c)}. By the conditional EPI and (.), we have

2^{2h(X^n | M(J^c))/n} ≥ 2^{2h(X̃^n | M(J^c))/n} + 2^{2h(Z̃^n | M(J^c))/n}
                       = 2^{2[ h(X̃^n | X^n, M(J^c)) + I(X̃^n; X^n | M(J^c)) ]/n} + 2πe Ñ.

We now lower bound each term in the exponent. First, by (.) and the definition
of r j ,

2^{2h(X̃^n | X^n, M(J^c))/n} ≥ Σ_{j∈J^c} (Ñ^2/N_j^2) 2^{2h(Y_j^n | X^n, M(J^c))/n}
                             = Σ_{j∈J^c} (Ñ^2/N_j^2) 2^{2h(Y_j^n | X^n, M_j)/n}
                             = Σ_{j∈J^c} (Ñ^2/N_j^2) 2^{2( h(Y_j^n | X^n) − I(Y_j^n; M_j | X^n) )/n}
                             = Σ_{j∈J^c} (Ñ^2/N_j) (2πe) 2^{−2r_j}.

For the second term, since X^n = X̃^n + Z̃^n and Z̃^n is conditionally independent of Y^n(J^c) and thus of M(J^c),

2^{2I(X^n; X̃^n | M(J^c))/n} = 2^{2[ h(X^n | M(J^c)) − h(X^n | X̃^n, M(J^c)) ]/n}
                             = ( 1/(2πe Ñ) ) 2^{2h(X^n | M(J^c))/n}.
Therefore

2^{2h(X^n | M(J^c))/n} ≥ ( Σ_{j∈J^c} (Ñ^2/N_j) (2πe) 2^{−2r_j} ) ( (1/(2πe Ñ)) 2^{2h(X^n | M(J^c))/n} ) + 2πe Ñ
                       = Σ_{j∈J^c} (Ñ/N_j) 2^{−2r_j} 2^{2h(X^n | M(J^c))/n} + 2πe Ñ,

or equivalently,

2^{−2h(X^n | M(J^c))/n} ≤ ( 1 − Σ_{j∈J^c} (Ñ/N_j) 2^{−2r_j} ) / (2πe Ñ)

and

2^{2I(X^n; M(J^c))/n} ≤ P/Ñ − Σ_{j∈J^c} (P/N_j) 2^{−2r_j}
                      = 1 + Σ_{j∈J^c} (P/N_j) (1 − 2^{−2r_j}).

Thus, continuing the lower bound in (.), we have


+
1 P 1 P
󵠈 R j ≥ 󶁦 log󶀤 󶀴 − log󶀦1 + 󵠈 (1 − 2−2r 󰑗 )󶀶󶁶 + 󵠈 r j
j∈J
2 D 2 j∈J 󰑐
N j j∈J
+
1 1 1 1 1
= 󶁦 log󶀤 󶀴 − log󶀦 + 󵠈 (1 − 2−2r 󰑗 )󶀶󶁶 + 󵠈 r j .
2 D 2 P j∈J 󰑐 N j j∈J

Substituting J = {1, 2}, {1}, {2}, and  establishes the four inequalities in Theorem ..

12.5* SUBOPTIMALITY OF BERGER–TUNG CODING

In previous sections we showed that the Berger–Tung coding scheme is optimal in sev-
eral special cases. The scheme, however, does not perform well when the sources have a
common part; see Section . for a formal definition. In this case, the encoders can send
the description of the common part cooperatively to reduce the rates.

Proposition .. Let (X1 , X2 ) be a -DMS such that X2 is a function of X1 , and d2 ≡ 0,


that is, the decoder is interested in reconstructing only X1 . Then the rate–distortion
region R(D1 ) for distributed lossy source coding is the set of rate pairs (R1 , R2 ) such
that

R1 ≥ I(X1 ; X̂ 1 |V ),
R2 ≥ I(X2 ; V )

for some conditional pmf p(󰑣|x2 )p(x̂1 |x1 , 󰑣) with |V| ≤ |X2 | + 2 that satisfy the con-
straint E(d1 (X1 , X̂ 1 )) ≤ D1 .

The converse follows by the Berger–Tung outer bound. To prove achievability, en-
coder  describes X2 with V . Now that encoder  has access to X2 and thus to V , it can
describe X1 with X̂ 1 conditioned on V . It can be shown that this scheme achieves the
rate–distortion region in the proposition.
To illustrate this result, consider the following.
Example .. Let ( X,̃ Ỹ ) be a DSBS(p), p ∈ (0, 1/2), X1 = ( X, ̃ Y),
̃ X2 = Ỹ , d1 (x1 , x̂1 ) =
d(x̃, x̂1 ) be a Hamming distortion measure, and d2 (x2 , x̂2 ) ≡ 0. We consider the minimum
achievable sum-rate for distortion D1 . By taking p(󰑣| ỹ) to be a BSC(α), the region in
Proposition . can be further simplified to the set of rate pairs (R1 , R2 ) such that

R1 > H(α ∗ p) − H(D1 ),


(.)
R2 > 1 − H(α)

for some α ∈ [0, 1/2] that satisfies the constraint H(α ∗ p) − H(D1 ) ≥ 0.
It can be shown that the Berger–Tung inner bound is contained in the above region.
In fact, noting that RSI-ED (D1 ) < RSI-D (D1 ) for a DSBS(α ∗ p) and using Mrs. Gerber’s
lemma, it can be shown that a boundary point of the region in (.) lies strictly outside
the Berger–Tung inner bound. Thus, the Berger–Tung coding scheme is suboptimal in
general.
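To visualize the example, the region in (.) can be swept over the BSC parameter α. The short Python sketch below (an illustration; the parameter values p and D1 are arbitrary) prints points on the resulting rate boundary.

# Sweep the auxiliary BSC parameter alpha and list boundary points of the region
# in (.): R1 > H(alpha * p) - H(D1), R2 > 1 - H(alpha).
import numpy as np

def Hb(x):
    x = np.clip(x, 1e-12, 1 - 1e-12)
    return float(-x*np.log2(x) - (1 - x)*np.log2(1 - x))

def conv(a, p):                      # binary convolution a * p
    return a*(1 - p) + (1 - a)*p

p, D1 = 0.1, 0.05
for alpha in np.linspace(0.0, 0.5, 6):
    R1 = Hb(conv(alpha, p)) - Hb(D1)
    if R1 < 0:
        continue                     # constraint H(alpha * p) >= H(D1) violated
    R2 = 1 - Hb(alpha)
    print(f"alpha={alpha:.2f}: R1>{R1:.4f}, R2>{R2:.4f}, R1+R2>{R1+R2:.4f}")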

Remark . (Berger–Tung coding with common part). Using the common part of a
general -DMS, the cooperative coding scheme in Proposition . and the Berger–Tung
coding scheme can be combined to yield a tighter inner bound on the rate–distortion
region for distributed lossy source coding.

SUMMARY

∙ Lossy distributed source coding setup


∙ Rate–distortion region
∙ Berger–Tung inner bound:
∙ Distributed compress–bin
∙ Includes Wyner–Ziv and Slepian–Wolf as special cases
∙ Not tight in general
∙ Markov lemma
∙ Mutual packing lemma
∙ Quadratic Gaussian distributed source coding:
∙ Berger–Tung coding is optimal
∙ Identification of a common-information random variable in the proof of the
μ-sum lower bound
∙ Use of the EPI in the proof of the converse
∙ Use of MMSE estimation results in the proof of the converse
∙ Berger–Tung coding is optimal for the quadratic Gaussian CEO problem

BIBLIOGRAPHIC NOTES

Theorems . and . are due to Berger () and Tung (). Tung () established
the Markov lemma and also showed that the Berger–Tung outer bound is not tight in
general. A strictly improved outer bound was recently established by Wagner and Anan-
tharam (). Berger and Yeung () established the rate–distortion region in (.).
The quadratic Gaussian was studied by Oohama (), who showed that R(D1 , D2 ) ⊆
R1 (D1 ) ∩ R2 (D2 ). Theorem . was established by Wagner, Tavildar, and Viswanath
(). Our proof of the converse combines key ideas from the proofs in Wagner, Tavil-
dar, and Viswanath () and Wang, Chen, and Wu (). The CEO problem was first
introduced by Berger, Zhang, and Viswanathan (). The quadratic Gaussian case was
studied by Viswanathan and Berger () and Oohama (). Theorem . is due to
Oohama () and Prabhakaran, Tse, and Ramchandran (). The rate–distortion
region for quadratic Gaussian distributed source coding with more than two sources is
known for sources satisfying a certain tree-structured Markov condition (Tavildar, Vis-
wanath, and Wagner ). In addition, the minimum sum-rate for quadratic Gaussian
distributed lossy source coding with more than two sources is known in several special
cases (Yang, Zhang, and Xiong ). The rate–distortion region for the Gaussian CEO

problem can be extended to more than two encoders (Oohama , Prabhakaran, Tse,
and Ramchandran ).
Proposition . was established by Kaspi and Berger (). Example . is due to
Wagner, Kelly, and Altuğ (), who also showed that the Berger–Tung coding scheme is
suboptimal even for sources without a common part.

PROBLEMS

.. Prove the mutual packing lemma.


.. Show that the Berger–Tung inner bound can be alternatively expressed as the set
of rate pairs (R1 , R2 ) such that
R1 > I(X1 , X2 ; U1 |U2 , Q),
R2 > I(X1 , X2 ; U2 |U1 , Q),
R1 + R2 > I(X1 , X2 ; U1 , U2 |Q)
for some conditional pmf p(q)p(u1 |x1 , q)p(u2 |x2 , q) and functions x̂1 (u1 , u2 , q),
x̂2 (u1 , u2 , q) that satisfy the constraints D j ≥ E(d j (X j , X̂ j )), j = 1, 2.
.. Prove the Berger–Tung outer bound.
.. Lossy source coding with coded side information only at the decoder. Let (X, Y) be
a -DMS and d(x, x̂) be a distortion measure. Consider the lossy source coding
setup depicted in Figure ., where coded side information of Y is available at the
decoder.

Figure .. Lossy source coding with coded side information.

(a) Let R(D) be the set of rate pairs (R1 , R2 ) such that
R1 > I(X; U |V , Q),
R2 > I(Y ; V |Q)
for some conditional pmf p(q)p(u|x, q)p(v|y, q) and function x̂(u, v, q) such that E(d(X, X̂)) ≤ D. Show that any rate pair (R1, R2) ∈ R(D) is achievable.
(b) Consider the Berger–Tung inner bound with (X1 , X2 ) = (X, Y ), d1 = d, and
d2 ≡ 0. Let R 󳰀 (D) denote the resulting region. Show that R(D) = R 󳰀 (D).

.. Distributed lossy source coding with one distortion measure. Consider the distrib-
uted lossy source coding setup where X2 is to be recovered losslessly and X1 is to
be reconstructed with distortion D1 . Find the rate–distortion region.
Remark: This result is due to Berger and Yeung ().
.. Achievability for the quadratic Gaussian CEO problem. Show that the Berger–Tung
̂ 2 ) ≤ D reduces to the
inner bound with the single distortion constraint E((X − X)
rate–distortion region R(D) in Theorem ..
(a) Let R 󳰀 (D) be the set of rate pairs (R1 , R2 ) that satisfy the first three inequalities
in Theorem . and the last one with equality. Show that R 󳰀 (D) = R(D).
(Hint: Consider three cases 1/P + (1 − 2−2r1 )/N1 > 1/D, 1/P + (1 − 2−2r2 )/N2 >
1/D, and otherwise.)
(b) Let U j = Y j + V j , where V j ∼ N(0, Ñ j ) for j = 1, 2, and X̂ = E(X|U1 , U2 ). Show
that
E[ (X − X̂)^2 ] = ( 1/P + (1 − 2^{−2r1})/N1 + (1 − 2^{−2r2})/N2 )^{−1},

where r j = (1/2) log(1 + Ñ j /N j ), j = 1, 2.


(c) Complete the proof by evaluating the mutual information terms in the Berger–
Tung inner bound.

APPENDIX 12A PROOF OF THE MARKOV LEMMA

By the union of events bound,

P{ (x^n, y^n, Z^n) ∉ T_є^{(n)} } ≤ Σ_{x,y,z} P{ |π(x, y, z | x^n, y^n, Z^n) − p(x, y, z)| ≥ є p(x, y, z) }.

Hence it suffices to show that

P(E(x, y, z)) = P󶁁|π(x, y, z|x n , y n , Z n ) − p(x, y, z)| ≥ єp(x, y, z)󶁑

tends to zero as n → ∞ for each (x, y, z) ∈ X × Y × Z.


Given (x, y, z), define the sets

A1 = { z^n : (y^n, z^n) ∈ T_{є′}^{(n)}(Y, Z) },
A2 = { z^n : |π(x, y, z | x^n, y^n, z^n) − p(x, y, z)| ≥ є p(x, y, z) }.

Then P(E(x, y, z)) = P{Z n ∈ A2 } ≤ P{Z n ∈ A1c } + P{Z n ∈ A1 ∩ A2 }.


We bound each term. By the first condition on p(z n |y n ), the first term tends to zero
as n → ∞. For the second term, let

B1 (n yz ) = A1 ∩ 󶁁z n : π(y, z | y n , z n ) = n yz /n󶁑 for n yz ∈ [0 : n].



Then, for n sufficiently large,


󳰀
P{Z n ∈ A1 ∩ A2 } ≤ 󵠈 |B1 (n yz ) ∩ A2 | 2−n(H(Z|Y)−δ(є )) ,
n 󰑦󰑧

where the summation is over all n yz such that |n yz − np(y, z)| ≤ є 󳰀 np(y, z) and the in-
equality follows by the second condition on p(z n |y n ).
Let nx y = nπ(x, y|x n , y n ) and n y = nπ(y|y n ). Let K be a hypergeometric random
variable that represents the number of red balls in a sequence of n yz draws without re-
placement from a bag of nx y red balls and n y − nx y blue balls. Then

P{K = k} = ( (n_xy choose k) (n_y − n_xy choose n_yz − k) ) / (n_y choose n_yz) = |B1(n_yz) ∩ B2(k)| / |B1(n_yz)|,

where
B2 (k) = {z n : π(x, y, z|x n , y n , z n ) = k/n}

for k ∈ [0 : min{nx y , n yz }]. Thus

|B1(n_yz) ∩ A2| = |B1(n_yz)| Σ_{k: |k−np(x,y,z)| ≥ єnp(x,y,z)} |B1(n_yz) ∩ B2(k)| / |B1(n_yz)|
                ≤ |A1| Σ_{k: |k−np(x,y,z)| ≥ єnp(x,y,z)} P{K = k}
                ≤ 2^{n(H(Z|Y)+δ(є′))} P{ |K − np(x, y, z)| ≥ єnp(x, y, z) }.

Note that from the conditions on nx y , n yz , and n y , E(K) = nx y n yz /n y satisfies the bounds
n(1 − δ(є 󳰀 ))p(x, y, z) ≤ E(K) ≤ n(1 + δ(є 󳰀 ))p(x, y, z). Now let K 󳰀 ∼ Binom(nx y , n yz /n y )
be a binomial random variable with the same mean E(K 󳰀 ) = E(K) = nx y n yz /n y . Then
it can be shown (Uhlmann , Orlitsky and El Gamal ) that n(1 − є)p(x, y, z) ≤
E(K) ≤ n(1 + є)p(x, y, z) (which holds if є 󳰀 is sufficiently small) implies that

P󶁁|K − np(x, y, z)| ≥ єnp(x, y, z)󶁑 ≤ P󶁁|K 󳰀 − np(x, y, z)| ≥ єnp(x, y, z)󶁑
󳰀 2
/(3(1+δ(є 󳰀 )))
≤ 2e −np(x, y,z)(є−δ(є )) ,

where the last inequality follows by the Chernoff bound in Appendix B. Combining the
above inequalities, we have for n sufficiently large
P{Z^n ∈ A1 ∩ A2} ≤ Σ_{n_yz} 2^{2nδ(є′)} 2 e^{−np(x,y,z)(є−δ(є′))^2 / (3(1+δ(є′)))}
                 ≤ (n + 1) 2^{2nδ(є′)} 2 e^{−np(x,y,z)(є−δ(є′))^2 / (3(1+δ(є′)))},

which tends to zero as n → ∞ if є 󳰀 is sufficiently small.
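The domination of the hypergeometric tail by the binomial tail, which is the crux of the last two displays, can be illustrated numerically. The following Python snippet (an illustration only, with arbitrary hypothetical parameters; it uses scipy.stats) evaluates the two tails directly.

# Compare the tail of a hypergeometric K (draws without replacement) with that of
# a binomial K' having the same mean, as used in the bound above.
import numpy as np
from scipy.stats import hypergeom, binom

n_y, n_xy, n_yz = 1000, 300, 400            # bag size, red balls, number of draws
mean = n_xy * n_yz / n_y                     # common mean of K and K'
K = hypergeom(n_y, n_xy, n_yz)               # without replacement
Kp = binom(n_yz, n_xy / n_y)                 # with replacement, same mean
ks = np.arange(n_yz + 1)

for eps in [0.1, 0.2, 0.3]:
    mask = np.abs(ks - mean) >= eps * mean
    print(f"eps={eps}: hypergeometric tail={K.pmf(ks)[mask].sum():.3e}, "
          f"binomial tail={Kp.pmf(ks)[mask].sum():.3e}")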



APPENDIX 12B PROOF OF LEMMA 12.3

For every u_2^n ∈ T_{є′}^{(n)}(U_2 | x_2^n),

P{ U_2^n(L_2) = u_2^n | X_2^n = x_2^n }
  = P{ U_2^n(L_2) = u_2^n, U_2^n(L_2) ∈ T_{є′}^{(n)}(U_2 | x_2^n) | X_2^n = x_2^n }
  = P{ U_2^n(L_2) ∈ T_{є′}^{(n)}(U_2 | x_2^n) | X_2^n = x_2^n }
      ⋅ P{ U_2^n(L_2) = u_2^n | X_2^n = x_2^n, U_2^n(L_2) ∈ T_{є′}^{(n)}(U_2 | x_2^n) }
  ≤ P{ U_2^n(L_2) = u_2^n | X_2^n = x_2^n, U_2^n(L_2) ∈ T_{є′}^{(n)}(U_2 | x_2^n) }
  = Σ_{l_2} P{ U_2^n(L_2) = u_2^n, L_2 = l_2 | X_2^n = x_2^n, U_2^n(L_2) ∈ T_{є′}^{(n)}(U_2 | x_2^n) }
  = Σ_{l_2} P{ L_2 = l_2 | X_2^n = x_2^n, U_2^n(L_2) ∈ T_{є′}^{(n)}(U_2 | x_2^n) }
      ⋅ P{ U_2^n(l_2) = u_2^n | X_2^n = x_2^n, U_2^n(l_2) ∈ T_{є′}^{(n)}(U_2 | x_2^n), L_2 = l_2 }
  (a)
  = Σ_{l_2} P{ L_2 = l_2 | X_2^n = x_2^n, U_2^n(L_2) ∈ T_{є′}^{(n)}(U_2 | x_2^n) }
      ⋅ P{ U_2^n(l_2) = u_2^n | U_2^n(l_2) ∈ T_{є′}^{(n)}(U_2 | x_2^n) }
  (b)
  ≤ Σ_{l_2} P{ L_2 = l_2 | X_2^n = x_2^n, U_2^n(L_2) ∈ T_{є′}^{(n)}(U_2 | x_2^n) } ⋅ 2^{−n(H(U_2|X_2)−δ(є′))}
  = 2^{−n(H(U_2|X_2)−δ(є′))},

where (a) follows since U_2^n(l_2) is independent of X_2^n and U_2^n(l_2′) for l_2′ ≠ l_2 and is conditionally independent of L_2 given X_2^n and the indicator variables of the events {U_2^n(l_2) ∈ T_{є′}^{(n)}(U_2 | x_2^n)}, l_2 ∈ [1 : 2^{nR̃_2}], which implies that the event {U_2^n(l_2) = u_2^n} is conditionally independent of {X_2^n = x_2^n, L_2 = l_2} given {U_2^n(l_2) ∈ T_{є′}^{(n)}(U_2 | x_2^n)}. Step (b) follows from the properties of typical sequences. Similarly, for every u_2^n ∈ T_{є′}^{(n)}(U_2 | x_2^n) and n sufficiently large,

P{ U_2^n(L_2) = u_2^n | X_2^n = x_2^n } ≥ (1 − є′) 2^{−n(H(U_2|X_2)+δ(є′))}.

This completes the proof of Lemma ..

APPENDIX 12C PROOF OF LEMMA 12.4

Since D2 ≥ 1 − ρ2 + ρ2 D1 ,

R12 (D1 , D2 ) ⊇ R12 (D1 , 1 − ρ2 + ρ2 D1 ) = 󶁁(R1 , R2 ) : R1 + R2 ≥ R(1/D1 )󶁑.

If (R1 , R2 ) ∈ R1 (D1 ), then

R1 + R2 ≥ д(R2 , D1 ) + R2 ≥ д(0, D1 ) = R(1/D1 ),

since д(R2 , D1 ) + R2 is an increasing function of R2 . Thus, R1 (D1 ) ⊆ R12 (D1 , D2 ).



Also note that the rate regions R1 (D1 ) and R2 (D2 ) can be expressed as

1 − ρ2 + ρ2 2−2R2
R1 (D1 ) = 󶁆(R1 , R2 ) : R1 ≥ R 󶀦 󶀶󶁖 ,
D1
ρ2
R2 (D2 ) = 󶁆(R1 , R2 ) : R1 ≥ R 󶀦 󶀶󶁖 .
D2 22R2 − 1 + ρ2

But D2 ≥ 1 − ρ2 + ρ2 D1 implies that

ρ2 1 − ρ2 + ρ2 2−2R2
≤ .
D2 22R2 − 1 + ρ2 D1

Thus, R1 (D1 ) ⊆ R2 (D2 ).

APPENDIX 12D PROOF OF LEMMA 12.6

The proof has three steps. First, we show that K̂ ⪰ 0 is diagonal. Second, we prove the
matrix inequality K̂ ⪯ (K̃ −1 + (1/N)μμT )−1 . Since K̃ ⪯ K(θ), it can be further shown by
the matrix inversion lemma in Appendix B that

K̂ ⪯ ( K^{−1}(θ) + (1/N)μμ^T )^{−1} = [ (1 − α)D1            (θ − α)√(D1 D2) ]
                                      [ (θ − α)√(D1 D2)     (1 − α)D2        ],

where α = (1 + θ)2 /(2(1 + θ) + N). Finally, combining the above matrix inequality with
the fact that K̂ is diagonal, we show that

|K̂| ≤ D1 D2 (1 + θ − 2α)^2 = D1 D2 N^2 (1 + θ)^2 / (2(1 + θ) + N)^2.

Step . Since M1 → X1n → Y n → X2n → M2 form a Markov chain,

E[ (X_{1i} − E(X_{1i} | Y^n, M1, M2)) (X_{2i′} − E(X_{2i′} | Y^n, M1, M2)) ]
    = E[ (X_{1i} − E(X_{1i} | Y^n, X_{2i′}, M1, M2)) (X_{2i′} − E(X_{2i′} | Y^n, M1, M2)) ] = 0
for all i, i′ ∈ [1 : n]. Thus

K̂ = (1/n) Σ_{i=1}^n K_{X(i)|Y^n, M1, M2} = diag(β1, β2)

for some β1 , β2 > 0.


̃ = (Yi , X(i)),
Step . Let Y(i) ̂ ̂ = E(X(i)|M1 , M2 ) for i ∈ [1 : n], and
where X(i)

n n −1
1 1
A = 󶀧 󵠈 KX(i),Y(i)
̃ 󶀷 󶀧 󵠈 KY(i)
̃ 󶀷 .
n i=1 n i=1

Then

(1/n) Σ_{i=1}^n K_{X(i)|Y^n, M1, M2}
  (a) ⪯ (1/n) Σ_{i=1}^n K_{X(i)−AỸ(i)}
  (b) = ( (1/n) Σ_{i=1}^n K_{X(i)} ) − ( (1/n) Σ_{i=1}^n K_{X(i)Ỹ(i)} ) ( (1/n) Σ_{i=1}^n K_{Ỹ(i)} )^{−1} ( (1/n) Σ_{i=1}^n K_{Ỹ(i)X(i)} )
  (c) = K_X − [ K_X μ    K_X − K̃ ] [ μ^T K_X μ + N     μ^T (K_X − K̃) ]^{−1} [ μ^T K_X  ]
                                    [ (K_X − K̃) μ       K_X − K̃       ]      [ K_X − K̃ ]
  (d) = ( K̃^{−1} + (1/N) μμ^T )^{−1}
  (e) ⪯ ( K^{−1}(θ) + (1/N) μμ^T )^{−1},

where (a) follows by the optimality of the MMSE estimate E(X(i) | Y^n, M1, M2) (compared to the estimate AỸ(i)), (b) follows by the definition of the matrix A, (c) follows since (1/n) Σ_{i=1}^n K_{X̂(i)} = K_X − K̃, (d) follows by the matrix inversion lemma in Appendix B, and (e) follows since K̃ ⪯ K(θ). Substituting for μ and N, we have shown

K̂ = diag(β1, β2) ⪯ [ (1 − α)D1            (θ − α)√(D1 D2) ]
                    [ (θ − α)√(D1 D2)     (1 − α)D2        ],

where α = (1 + θ)^2 / (2(1 + θ) + N).


Step . We first note by simple algebra that if b1 , b2 ≥ 0 and

b1 0 a c
󶁦 󶁶⪯󶁦 󶁶,
0 b2 c a

then b1 b2 ≤ (a − c)2 . Now let Λ = diag(1/󵀄D1 , −1/󵀄D2 ). Then

(1 − α)D1 (θ − α)󵀄D1 D2 1−α α−θ


Λ diag(β1 , β2 )Λ ⪯ Λ 󶀄
󶀜 󶀝Λ = 󶀄
󶀅 󶀜 󶀝.
󶀅
(θ − α)󵀄D1 D2 (1 − α)D2 α−θ 1−α

Therefore
β1 β2 2
≤ 󶀡(1 − α) − (α − θ)󶀱 ,
D1 D2

or equivalently,
β1 β2 ≤ D1 D2 (1 + θ − 2α)2 .

Plugging in α and simplifying, we obtain the desired inequality.
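Both the matrix identity in step (d) of Step  and the algebraic simplification in Step  can be spot-checked numerically. The following Python sketch (an illustration with arbitrary test values, not a proof) verifies them for one choice of (D1, D2, θ, N).

# Spot check: (K(theta)^{-1} + (1/N) mu mu^T)^{-1} has the entries claimed above,
# and D1 D2 (1 + theta - 2 alpha)^2 = D1 D2 N^2 (1+theta)^2 / (2(1+theta) + N)^2.
import numpy as np

D1, D2, theta, N = 0.1, 0.3, 0.4, 2.5
mu = np.array([1/np.sqrt(D1), 1/np.sqrt(D2)])
K_theta = np.array([[D1, theta*np.sqrt(D1*D2)],
                    [theta*np.sqrt(D1*D2), D2]])
alpha = (1 + theta)**2 / (2*(1 + theta) + N)

lhs = np.linalg.inv(np.linalg.inv(K_theta) + np.outer(mu, mu)/N)
rhs = np.array([[(1 - alpha)*D1, (theta - alpha)*np.sqrt(D1*D2)],
                [(theta - alpha)*np.sqrt(D1*D2), (1 - alpha)*D2]])
print(np.allclose(lhs, rhs))                                          # matrix inversion lemma step
print(np.isclose(D1*D2*(1 + theta - 2*alpha)**2,
                 D1*D2*N**2*(1 + theta)**2/(2*(1 + theta) + N)**2))   # Step 3 algebra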


CHAPTER 13

Multiple Description Coding

We consider the problem of generating two descriptions of a source such that each de-
scription by itself can be used to reconstruct the source with some desired distortion and
the two descriptions together can be used to reconstruct the source with a lower dis-
tortion. This problem is motivated by the need to efficiently communicate multimedia
content over networks such as the Internet. Consider the following two scenarios:
∙ Path diversity: Suppose we wish to send a movie to a viewer over a network that suffers
from data loss and delays. We can send multiple copies of the same description of the
movie to the viewer via different paths in the network. Such replication, however, is
inefficient and the viewer does not benefit from receiving more than one copy of the
description. Multiple description coding provides a more efficient means to achieve
such “path diversity.” We generate multiple descriptions of the movie, so that if the
viewer receives only one of them, the movie can be reconstructed with some accept-
able quality, and if the viewer receives two of them, the movie can be reconstructed
with a higher quality and so on.
∙ Successive refinement: Suppose we wish to send a movie with different levels of quality
to different viewers. We can send a separate description of the movie to each viewer.
These descriptions, however, are likely to have significant overlaps. Successive refine-
ment, which is a special case of multiple description coding, provides a more efficient
way to distribute the movie. The idea is to send the lowest quality description and
successive refinements of it (instead of additional full descriptions). Each viewer then
uses the lowest quality description and some of the successive refinements to recon-
struct the movie at her desired level of quality.
The optimal scheme for generating multiple descriptions is not known in general. We
present the El Gamal–Cover coding scheme for generating two descriptions that are in-
dividually good but still carry additional information about the source when combined
together. The proof of achievability uses the multivariate covering lemma in Section ..
We show that this scheme is optimal for the quadratic Gaussian case. The key to the con-
verse is the identification of a common-information random variable. We then present an
improvement on the El Gamal–Cover scheme by Zhang and Berger that involves sending
an additional common description. Finally, we briefly discuss extensions of these results
to more than two descriptions. We will continue the discussion of multiple description
coding in Chapter .

13.1 MULTIPLE DESCRIPTION CODING FOR A DMS

Consider the multiple description coding setup for a DMS X and three distortion mea-
sures d j (x, x̂ j ), j = 0, 1, 2, depicted in Figure .. Each encoder generates a description
of X so that decoder 1 that receives only description M1 can reconstruct X with distortion
D1 , decoder 2 that receives only description M2 can reconstruct X with distortion D2 , and
decoder  that receives both descriptions can reconstruct X with distortion D0 . We wish
to find the optimal tradeoff between the description rate pair (R1 , R2 ) and the distortion
triple (D0 , D1 , D2 ).
A (2nR1 , 2nR2 , n) multiple description code consists of
∙ two encoders, where encoder  assigns an index m1 (x n ) ∈ [1 : 2nR1 ) and encoder 
assigns an index m2 (x n ) ∈ [1 : 2nR2 ) to each sequence x n ∈ X n , and
∙ three decoders, where decoder  assigns an estimate x̂1n to each index m1 , decoder 
assigns an estimate x̂2n to each index m2 , and decoder  assigns an estimate x̂0n to each
index pair (m1 , m2 ).
A rate–distortion quintuple (R1 , R2 , D0 , D1 , D2 ) is said to be achievable (and a rate
pair (R1 , R2 ) is said to be achievable for distortion triple (D0 , D1 , D2 )) if there exists a
sequence of (2nR1 , 2nR2 , n) codes with

lim sup E(d j (X n , X̂ nj )) ≤ D j , j = 0, 1, 2.


n→∞

The rate–distortion region R(D0 , D1 , D2 ) for multiple description coding is the closure of
the set of rate pairs (R1 , R2 ) such that (R1 , R2 , D0 , D1 , D2 ) is achievable.
The rate–distortion region for multiple description coding is not known in general.
The difficulty is that two good individual descriptions must be close to the source and
so must be highly dependent. Thus the second description contributes little extra in-
formation beyond the first one. At the same time, to obtain a better reconstruction by
combining two descriptions, they must be far apart and so must be highly independent.
Two independent descriptions, however, cannot be individually good in general.

Figure .. Multiple description coding setup.



13.2 SIMPLE SPECIAL CASES

First consider the following special cases of the general multiple description problem.
No combined reconstruction. Suppose D0 = ∞, that is, decoder  does not exist as de-
picted in Figure .. Then the rate–distortion region for distortion pair (D1 , D2 ) is the
set of rate pairs (R1 , R2 ) such that

R1 ≥ I(X; X̂ 1 ),
(.)
R2 ≥ I(X; X̂ 2 ),

for some conditional pmf p(x̂1 |x)p(x̂2 |x) that satisfies the constraints E(d j (X, X̂ j )) ≤ D j ,
j = 1, 2. This rate–distortion region is achieved by generating two reconstruction code-
books independently and performing joint typicality encoding separately to generate each
description.

Figure .. Multiple description coding with no combined reconstruction.

Single description with two reconstructions. Suppose that R2 = 0, that is, one descrip-
tion is used to generate two reconstructions X̂ 1n and X̂ 0n as depicted in Figure .. Then
the rate–distortion function is

R(D0 , D1 ) = min I(X; X̂ 0 , X̂ 1 ), (.)


̂ 󰑗 ))≤D 󰑗 , j=0,1
p(x̂0 , x̂1 |x):E(d 󰑗 (X, X

and is achieved by generating the reconstruction sequences jointly and performing joint
typicality encoding to find a pair that is jointly typical with the source sequence.

Figure .. Single description with two reconstructions.



Combined description only. Now suppose D1 = D2 = ∞, that is, decoders  and  do


not exist, as depicted in Figure .. Then the rate–distortion region for distortion D0 is
the set of rate pairs (R1 , R2 ) such that

R1 + R2 ≥ I(X; X̂ 0 ) (.)

for some conditional pmf p(x̂0 |x) that satisfies the constraint E(d0 (X, X̂ 0 )) ≤ D0 . This
rate–distortion region is achieved by rate splitting. We generate a single description with
rate R1 + R2 as in point-to-point lossy source coding and divide it into two independent
indices with rates R1 and R2 , respectively.

Figure .. Multiple descriptions with combined description only.

The optimality of the above rate regions follows by noting that each region coincides
with the following simple outer bound.
Outer bound. Fix a distortion triple (D0 , D1 , D2 ). Following similar steps to the converse
proof of the lossy source coding theorem, we can readily establish an outer bound on the
rate–distortion region that consists of all rate pairs (R1 , R2 ) such that

R1 ≥ I(X; X̂ 1 ),
R2 ≥ I(X; X̂ 2 ), (.)
R1 + R2 ≥ I(X; X̂ 0 , X̂ 1 , X̂ 2 )

for some conditional pmf p(x̂0 , x̂1 , x̂2 |x) that satisfies the constraints E(d j (X, X̂ j )) ≤ D j ,
j = 0, 1, 2. This outer bound, however, is not tight in general.

13.3 EL GAMAL–COVER INNER BOUND

As we have seen, generating the reconstructions independently and performing joint typ-
icality encoding separately is optimal when decoder  does not exist. When it does, how-
ever, this scheme is inefficient because three separate descriptions are sent. At the other
extreme, generating the reconstructions jointly is optimal when either decoder  or  does
not exist, but is inefficient when both exist because the same description is sent twice. The
following inner bound is achieved by generating two descriptions that are individually
good, but different enough to provide additional information to decoder .

Theorem . (El Gamal–Cover Inner Bound). Let X be a DMS and d j (x, x̂ j ), j =
0, 1, 2, be three distortion measures. A rate pair (R1 , R2 ) is achievable with distortion
triple (D0 , D1 , D2 ) for multiple description coding if

R1 > I(X; X̂ 1 |Q),


R2 > I(X; X̂ 2 |Q),
R1 + R2 > I(X; X̂ 0 , X̂ 1 , X̂ 2 |Q) + I( X̂ 1 ; X̂ 2 |Q)

for some conditional pmf p(q)p(x̂0 , x̂1 , x̂2 |x, q) with |Q| ≤ 6 such that E(d j (X, X̂ j )) ≤
D j , j = 0, 1, 2.

It is easy to verify that the above inner bound is tight for all the special cases discussed
in Section .. In addition, the El Gamal–Cover inner bound is tight for the following
nontrivial special cases:
∙ Successive refinement: In this case d2 ≡ 0, that is, decoder  does not exist and the
reconstruction at decoder  is a refinement of the reconstruction at decoder . We
discuss this case in detail in Section ..
∙ Semideterministic distortion measure: Consider the case where d1 (x, x̂1 ) = 0 if x̂1 =
д(x) and d1 (x, x̂1 ) = 1, otherwise, and D1 = 0. In other words, X̂ 1 recovers some
function д(X) losslessly.
∙ Quadratic Gaussian: Let X ∼ N(0, 1) and each d j , j = 0, 1, 2, be the squared error
(quadratic) distortion measure.
In addition to these special cases, the El Gamal–Cover inner bound is tight when there
is no excess rate, that is, when the rate pair (R1 , R2 ) satisfies the condition R1 + R2 = R(D0 ),
where R(D0 ) is the rate–distortion function for X with the distortion measure d0 evalu-
ated at D0 . It can be shown that R(D1 , D2 ) ∩ {(R1 , R2 ) : R1 + R2 = R(D0 )} is contained
in the closure of the El Gamal–Cover inner bound for each distortion triple (D0 , D1 , D2 ).
The no-excess rate case is illustrated in the following.
Example . (Multiple descriptions of a Bernoulli source). Consider the multiple de-
scription coding problem for a Bern(1/2) source and Hamming distortion measures d0 ,
d1 , and d2 . Suppose that there is no excess rate with R1 = R2 = 1/2 and D0 = 0. Symmetry
suggests that D1 = D2 = D. Define

Dmin = inf󶁁D : (R1 , R2 ) = (1/2, 1/2) is achievable for distortion (0, D, D)󶁑.

Note that D0 = 0 requires the two descriptions, each at rate 1/2, to be independent. The
lossy source coding theorem in this case implies that R = 1/2 ≥ 1 − H(D) is necessary, i.e., Dmin ≥ 0.11.
First consider the following simple scheme. We split the source sequence x n into two
equal length subsequences (e.g., the odd and the even entries). Encoder  sends the index

of the first subsequence and encoder  sends the index of the second. This scheme achieves
individual distortions D1 = D2 = D = 1/4; hence Dmin ≤ D = 1/4. Can we do better?
Consider the El Gamal–Cover inner bound with X̂ 1 and X̂ 2 independent Bern(1/󵀂2 )
random variables, X = X̂ 1 ⋅ X̂ 2 , and X̂ 0 = X. Thus, X ∼ Bern(1/2) as it should be. Substi-
tuting in the inner bound rate constraints, we obtain

R1 ≥ I(X; X̂1) = H(X) − H(X | X̂1) = 1 − (1/√2) H(1/√2) ≈ 0.383,
R2 ≥ I(X; X̂2) ≈ 0.383,
R1 + R2 ≥ I(X; X̂0, X̂1, X̂2) + I(X̂1; X̂2)
        = H(X) − H(X | X̂0, X̂1, X̂2) + I(X̂1; X̂2) = 1 − 0 + 0 = 1.

The expected distortions are

E(d(X, X̂1)) = P{X̂1 ̸= X} = P{X̂1 = 0, X = 1} + P{X̂1 = 1, X = 0} = (√2 − 1)/2,
E(d(X, X̂2)) = (√2 − 1)/2,
E(d(X, X̂0)) = 0.

Thus, the rate pair (R1 , R2 ) = (1/2, 1/2) is achievable with distortion triple (D1 , D2 , D0 ) =
((󵀂2 − 1)/2, (󵀂2 − 1)/2, 0). It turns out that Dmin = (󵀂2 − 1)/2 ≈ 0.207 is indeed optimal.
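The numbers in this example are easy to reproduce. The Python snippet below (a quick check, not part of the argument) computes I(X; X̂1) and the individual Hamming distortion for the choice of X̂1, X̂2 i.i.d. Bern(1/√2) and X = X̂1 · X̂2.

# Verify the rate and distortion computations in the example above.
import numpy as np

def Hb(x):
    return float(-x*np.log2(x) - (1 - x)*np.log2(1 - x))

q = 1/np.sqrt(2)                 # P{X_hat_j = 1}
I1 = Hb(q**2) - q*Hb(q)          # I(X; X_hat_1) = H(X) - P{X_hat_1 = 1} H(Bern(q))
D1 = q*(1 - q)                   # P{X_hat_1 = 1, X = 0} = q(1 - q) = (sqrt(2) - 1)/2
print(f"I(X; X_hat_1)   = {I1:.4f}  (< 1/2)")
print(f"E d(X, X_hat_1) = {D1:.4f}  (= (sqrt(2) - 1)/2 ≈ 0.207)")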

The El Gamal–Cover inner bound is not tight in general when there is excess rate even
for the Bern(1/2) source and Hamming distortion measures as discussed in Section ..

13.3.1 Proof of the El Gamal–Cover Inner Bound


The idea of the proof is to generate two descriptions from which we can obtain arbitrarily
correlated reconstructions that satisfy the distortion constraints individually and jointly.
The proof uses the multivariate covering lemma in Section .. We prove achievability for
|Q| = 1; the rest of the proof follows by time sharing.
Rate splitting. Divide index M j , j = 1, 2, into two independent indices M0 j at rate R0 j
and M j j at rate R j j . Thus, R j = R0 j + R j j for j = 1, 2. Define R0 = R01 + R02 .

Codebook generation. Fix a conditional pmf p(x̂0 , x̂1 , x̂2 |x) such that E(d j (X, X̂ j )) ≤
D j /(1 + є), j = 0, 1, 2. For j = 1, 2, randomly and independently generate 2nR 󰑗 󰑗 sequences
x̂nj (m j j ), m j j ∈ [1 : 2nR 󰑗 󰑗 ], each according to ∏ni=1 p X̂ 󰑗 (x̂ ji ). For each (m11 , m22 ) ∈ [1 :
2nR11 ] × [1 : 2nR22 ], randomly and conditionally independently generate 2nR0 sequences
x̂0n (m11 , m22 , m0 ), m0 ∈ [1 : 2nR0 ], each according to ∏ni=1 p X̂ | X̂ , X̂ (x̂0i |x̂1i (m11 ), x̂2i (m22 )).
0 1 2

Encoding. Upon observing x , the encoder finds an index triple (m11 , m22 , m0 ) such that
n

(x n , x̂1n (m11 ), x̂2n (m22 ), x̂0n (m11 , m22 , m0 )) ∈ Tє(n) . If no such triple exists, the encoder sets
(m11 , m22 , m0 ) = (1, 1, 1). It then represents m0 ∈ [1 : 2nR0 ] by (m01 , m02 ) ∈ [1 : 2nR01 ] ×
[1 : 2nR02 ] and sends (m11 , m01 ) to decoder , (m22 , m02 ) to decoder , and (m11 , m22 , m0 )
to decoder .
Decoding. Given the index pair (m11 , m01 ), decoder  declares x̂1n (m11 ) as its reconstruc-
tion of x n . Similarly, given (m22 , m02 ), decoder  declares x̂2n (m22 ) as its reconstruction
of x n , and given (m11 , m22 , m0 ), decoder  declares x̂0n (m11 , m22 , m0 ) as its reconstruction
of x n .
Analysis of expected distortion. Let (M11 , M22 , M0 ) denote the index triple for the re-
construction codeword triple ( X̂ 1n , X̂ 2n , X̂ 0n ). Define the “error” event
E = 󶁁(X n , X̂ 1n (M11 ), X̂ 2n (M22 ), X̂ 0n (M11 , M22 , M0 )) ∉ Tє(n) 󶁑
and consider the following events:
E1 = 󶁁(X n , X̂ 1n (m11 ), X̂ 2n (m22 )) ∉ Tє(n) for all (m11 , m22 ) ∈ [1 : 2nR11 ] × [1 : 2nR22 ]󶁑,
E2 = 󶁁(X n , X̂ 1n (M11 ), X̂ 2n (M22 ), X̂ 0n (M11 , M22 , m0 )) ∉ Tє(n) for all m0 ∈ [1 : 2nR0 ]󶁑.
Then by the union of events bound,
P(E) ≤ P(E1 ) + P(E1c ∩ E2 ).
By the multivariate covering lemma for the -DMS (X, X̂ 1 , X̂ 2 ) with r3 = 0, P(E1 ) tends
to zero as n → ∞ if
R11 > I(X; X̂ 1 ) + δ(є),
R22 > I(X; X̂ 2 ) + δ(є), (.)
R11 + R22 > I(X; X̂ 1 , X̂ 2 ) + I( X̂ 1 ; X̂ 2 ) + δ(є).
By the covering lemma, P(E1c ∩ E2 ) tends to zero as n → ∞ if R0 > I(X; X̂ 0 | X̂ 1 , X̂ 2 ) + δ(є).
Eliminating R11 , R22 , and R0 by the Fourier–Motzkin procedure in Appendix D, it can be
shown that P(E) tends to zero as n → ∞ if
R1 > I(X; X̂ 1 ) + δ(є),
R2 > I(X; X̂ 2 ) + δ(є), (.)
R1 + R2 > I(X; X̂ 0 , X̂ 1 , X̂ 2 ) + I( X̂ 1 ; X̂ 2 ) + 2δ(є).
Now by the law of total expectation and the typical average lemma, the asymptotic dis-
tortions averaged over random codebooks are bounded as
lim sup E(d j (X n , X̂ nj )) ≤ D j , j = 0, 1, 2,
n→∞

if the inequalities on (R1 , R2 ) in (.) are satisfied. Using time sharing establishes the
achievability of every rate pair (R1 , R2 ) that satisfies the inequalities in the theorem for
some conditional pmf p(q)p(x̂0 , x̂1 , x̂2 |x, q) such that E(d j (X, X̂ j )) < D j , j = 0, 1, 2. Fi-
nally, using the continuity of mutual information completes the proof with nonstrict dis-
tortion inequalities.

13.4 QUADRATIC GAUSSIAN MULTIPLE DESCRIPTION CODING

Consider the multiple description coding problem for a WGN(P) source X and squared
error distortion measures d0 , d1 , and d2 . Without loss of generality, we assume that
0 < D0 ≤ D1 , D2 ≤ P. The El Gamal–Cover inner bound is tight in this case.

Theorem .. The rate–distortion region R(D0 , D1 , D2 ) for multiple description cod-
ing of a WGN(P) source X and squared error distortion measures is the set of rate pairs
(R1 , R2 ) such that

    R1 ≥ R(P/D1),
    R2 ≥ R(P/D2),
    R1 + R2 ≥ R(P/D0) + Δ(P, D0, D1, D2),

where Δ = Δ(P, D0, D1, D2) is

    Δ = 0                                 if D0 ≤ D1 + D2 − P,
    Δ = R( (P D0) / (D1 D2) )             if D0 ≥ 1/(1/D1 + 1/D2 − 1/P),
    Δ = R( (P − D0)^2 / ( (P − D0)^2 − ( √((P − D1)(P − D2)) − √((D1 − D0)(D2 − D0)) )^2 ) )   otherwise.
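For computation and plotting it is convenient to have Δ written out explicitly. The Python helper below is a direct transcription of the three cases in the theorem (written for the standing assumption 0 < D0 ≤ D1, D2 ≤ P; the function names are ours).

# Excess sum-rate Delta(P, D0, D1, D2) of the theorem, by cases.
import numpy as np

def Rfun(x):
    return 0.5 * np.log2(x)

def Delta(P, D0, D1, D2):
    if D0 <= D1 + D2 - P:
        return 0.0
    if D0 >= 1.0 / (1/D1 + 1/D2 - 1/P):
        return Rfun(P*D0 / (D1*D2))
    num = (P - D0)**2
    den = num - (np.sqrt((P - D1)*(P - D2)) - np.sqrt((D1 - D0)*(D2 - D0)))**2
    return Rfun(num/den)

# Example: sum-rate bound for a unit-variance source.
P, D0, D1, D2 = 1.0, 0.1, 0.3, 0.3
print(Rfun(P/D0) + Delta(P, D0, D1, D2))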

13.4.1 Proof of Achievability


We set (X, X̂ 0 , X̂ 1 , X̂ 2 ) to be jointly Gaussian with

Dj
X̂ j = 󶀥1 − 󶀵 (X + Z j ), j = 0, 1, 2,
P

where (Z0 , Z1 , Z2 ) is a zero-mean Gaussian random vector independent of X with covari-


ance matrix

N0 N0 N0
K =󶀄
󶀔
󶀜 N0 N1 N0 + ρ󵀄(N1 − N0 )(N2 − N0 )󶀅
󶀕
󶀝,
N0 N0 + ρ󵀄(N1 − N0 )(N2 − N0 ) N2

and
PD j
Nj = , j = 0, 1, 2.
P − Dj

Note that X → X̂ 0 → ( X̂ 1 , X̂ 2 ) form a Markov chain for all ρ ∈ [−1, 1].


We divide the rest of the proof into two parts.

High distortion: D1 + D2 ≥ P + D0 . Note that by relaxing the simple outer bound in


Section ., any achievable rate pair (R1 , R2 ) must satisfy the inequalities
R1 ≥ R(P/D1),
R2 ≥ R(P/D2),
R1 + R2 ≥ R(P/D0).
Surprisingly these rates are achievable under the high distortion condition D1 + D2 ≥
P + D0 . Under this condition and the standing assumption 0 < D0 ≤ D1 , D2 ≤ P, it can
be easily verified that (N1 − N0 )(N2 − N0 ) ≥ (P + N0 )2 . Thus, there exists ρ ∈ [−1, 1] such
that N0 + ρ󵀄(N1 − N0 )(N2 − N0 ) = −P. This shows that X̂ 1 and X̂ 2 can be made inde-
pendent of each other, while achieving E(d(X, X̂ j )) = D j , j = 0, 1, 2, which proves the
achievability of the simple outer bound.
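The claim that (N1 − N0)(N2 − N0) ≥ (P + N0)^2 whenever D1 + D2 ≥ P + D0 can also be checked numerically. The Python sketch below (an illustration only; the sampling is arbitrary) samples high-distortion triples and tests the inequality.

# Check numerically that D1 + D2 >= P + D0 implies (N1 - N0)(N2 - N0) >= (P + N0)^2,
# where N_j = P D_j / (P - D_j), so that rho in [-1, 1] can be chosen to make
# N0 + rho sqrt((N1 - N0)(N2 - N0)) = -P.
import numpy as np

rng = np.random.default_rng(0)
P, violations = 1.0, 0
for _ in range(10000):
    D0 = rng.uniform(0.01, P)
    D1 = rng.uniform(D0, P)
    D2 = rng.uniform(D0, P)
    if D1 + D2 < P + D0:
        continue                                   # keep only high-distortion triples
    N0, N1, N2 = (P*D/(P - D) for D in (D0, D1, D2))
    if (N1 - N0)*(N2 - N0) < (P + N0)**2 * (1 - 1e-9):
        violations += 1
print("violations found:", violations)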
Low distortion: D1 + D2 < P + D0 . Under this low distortion condition, the consequent
dependence of the descriptions causes an increase in the total description rate R1 + R2
beyond R(P/D0 ). Consider the above choice of ( X̂ 0 , X̂ 1 , X̂ 2 ) along with ρ = −1. Then it
can be shown by a little algebra that
I( X̂ 1 ; X̂ 2 ) = Δ(P, D0 , D1 , D2 ),
which proves that the rate region in Theorem . is achievable.
To complete the proof of achievability, we use the discretization procedure described
in Section ..

13.4.2 Proof of the Converse


We only need to consider the low distortion case, where D0 + P > D1 + D2 . Again the in-
equalities R j ≥ R(P/D j ), j = 1, 2, follow immediately by the lossy source coding theorem.
To bound the sum-rate, let Yi = Xi + Zi , where {Zi } is a WGN(N) process independent
of {Xi }. Then
n(R1 + R2 ) ≥ H(M1 , M2 ) + I(M1 ; M2 )
= I(X n ; M1 , M2 ) + I(Y n , M1 ; M2 ) − I(Y n ; M2 |M1 )
≥ I(X n ; M1 , M2 ) + I(Y n ; M2 ) − I(Y n ; M2 |M1 )
= I(X n ; M1 , M2 ) + I(Y n ; M2 ) + I(Y n ; M1 ) − I(Y n ; M1 , M2 )
= 󶀡I(X n ; M1 , M2 ) − I(Y n ; M1 , M2 )󶀱 + I(Y n ; M1 ) + I(Y n ; M2 ). (.)
We first lower bound the second and third terms. Consider
I(Y^n; M1) ≥ Σ_{i=1}^n I(Y_i; X̂_{1i})
           ≥ Σ_{i=1}^n ( h(Y_i) − h(Y_i − X̂_{1i}) )
           ≥ Σ_{i=1}^n (1/2) log( (P + N) / E((Y_i − X̂_{1i})^2) )
           (a)
           ≥ (n/2) log( (P + N) / (D1 + N) ),
where (a) follows by Jensen’s inequality and the fact that
(1/n) Σ_{i=1}^n E((Y_i − X̂_{1i})^2) ≤ (1/n) Σ_{i=1}^n E((X_i − X̂_{1i})^2) + N ≤ D1 + N.

Similarly
I(Y^n; M2) ≥ (n/2) log( (P + N) / (D2 + N) ).
Next, we lower bound the first term in (.). By the conditional EPI,
h(Y^n | M1, M2) ≥ (n/2) log( 2^{2h(X^n | M1, M2)/n} + 2^{2h(Z^n | M1, M2)/n} )
                = (n/2) log( 2^{2h(X^n | M1, M2)/n} + 2πeN ).
Since h(X n |M1 , M2 ) ≤ h(X n | X̂ 0n ) ≤ (n/2) log(2πeD0 ), we have
h(Y^n | M1, M2) − h(X^n | M1, M2) ≥ (n/2) log( 1 + 2πeN / 2^{2h(X^n | M1, M2)/n} )
                                  ≥ (n/2) log( 1 + N/D0 ).
Hence
I(X^n; M1, M2) − I(Y^n; M1, M2) ≥ (n/2) log( P(D0 + N) / ((P + N)D0) ).
Combining these inequalities and continuing with the lower bound on the sum-rate, we
have
R1 + R2 ≥ (1/2) log(P/D0) + (1/2) log( (P + N)(D0 + N) / ((D1 + N)(D2 + N)) ).
Finally we maximize this sum-rate bound over N ≥ 0 by taking
N = [ ( D1 D2 − D0 P + √((D1 − D0)(D2 − D0)(P − D1)(P − D2)) ) / ( P + D0 − D1 − D2 ) ]^+,    (.)
which yields the desired inequality
R1 + R2 ≥ R(P/D0) + Δ(P, D0, D1, D2).
This completes the proof of Theorem ..

Remark 13.1. It can be shown that if (R1 , R2 ) ∈ R(D0 , D1 , D2 ) and D0 ≤ D1 + D2 − P,


then (R1 , R2 ) ∈ R(D0 , D1∗ , D2∗ ) for some D1∗ ≤ D1 and D2∗ ≤ D2 such that D0 = D1∗ + D2∗ −
P. Also, if (R1 , R2 ) ∈ R(D0 , D1 , D2 ) and D0 ≥ 1/(1/D1 + 1/D2 − 1/P), then (R1 , R2 ) ∈
R(D0∗ , D1 , D2 ), where D0∗ = 1/(1/D1 + 1/D2 − 1/P) ≤ D0 . Note that these two cases (high

distortion and low distortion with D0 ≥ 1/(1/D1 + 1/D2 − 1/P)) correspond to N = ∞


and N = 0 in (.), respectively.
Remark 13.2. It can be shown that the optimal Y satisfies the Markov chain relation-
ship X̂ 1 → Y → X̂ 2 , where X̂ 1 and X̂ 2 are the reconstructions specified in the proof of
achievability. In addition, the optimal Y minimizes I( X̂ 1 , X̂ 2 ; Y) over all choices of the
form Y = X + Z such that X̂ 1 → Y → X̂ 2 , where Z is independent of (X, X̂ 1 , X̂ 2 ). The
minimized mutual information represents “common information" between the optimal
reconstructions for the given distortion triple; see Section . for a detailed discussion of
common information.

13.5 SUCCESSIVE REFINEMENT

Consider the special case of multiple description coding depicted in Figure ., where de-
coder  does not exist, or equivalently, d2 (x, x2 ) ≡ 0. Since there is no standalone decoder
for the second description M2 , there is no longer a tension between the two descriptions.
As such, the second description can be viewed as a refinement of the first description that
helps decoder  achieve a lower distortion. However, a tradeoff still exists between the two
descriptions because if the first description is optimal for decoder , that is, if it achieves
the rate–distortion function for d1 , the first and second descriptions combined may not
be optimal for decoder .

Figure .. Successive refinement.

The El Gamal–Cover inner bound is tight for this case.

Theorem .. The successive refinement rate–distortion region R(D0 , D1 ) for a


DMS X and distortion measures d0 and d1 is the set of rate pairs (R1 , R2 ) such that

R1 ≥ I(X; X̂ 1 ),
R1 + R2 ≥ I(X; X̂ 0 , X̂ 1 )

for some conditional pmf p(x̂0 , x̂1 |x) that satisfies the constraints E(d j (X, X̂ j )) ≤ D j ,
j = 0, 1.

The proof of achievability follows immediately from the proof of Theorem . by set-
ting Q = X̂ 2 = . The proof of the converse is also straightforward.

13.5.1 Successively Refinable Sources


Consider the successive refinement of a source under a common distortion measure d1 =
d0 = d. If a rate pair (R1 , R2 ) is achievable with distortion pair (D0 , D1 ) for D0 ≤ D1 , then
we must have
R1 ≥ R(D1 ),
R1 + R2 ≥ R(D0 ),

where R(D) = min_{p(x̂|x): E(d(X, X̂)) ≤ D} I(X; X̂) is the rate–distortion function for a single description.
In some cases, (R1 , R2 ) = (R(D1 ), R(D0 ) − R(D1 )) is actually achievable for all D0 ≤
D1 and there is no loss of optimality in describing the source successively by a coarse
description and a refinement of it. Such a source is referred to as successively refinable.

Example 13.2 (Bernoulli source with Hamming distortion). A Bern(p) source X is


successively refinable under Hamming distortion measures d0 and d1 . This is shown by
considering a cascade of backward binary symmetric test channels

X = X̂ 0 ⊕ Z0 = ( X̂ 1 ⊕ Z1 ) ⊕ Z0 ,

where Z0 ∼ Bern(D0 ) and Z1 ∼ Bern(D 󳰀 ) such that D 󳰀 ∗ D0 = D1 . Note that X̂ 1 → X̂ 0 →


X form a Markov chain.
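For a Bern(1/2) source this cascade is easy to verify numerically: choosing D′ = (D1 − D0)/(1 − 2D0) solves D′ ∗ D0 = D1 (a one-line computation), and the two stages then carry rates R(D1) = 1 − H(D1) and R(D0) − R(D1) = H(D1) − H(D0). A short Python check (illustration only, specialized to p = 1/2) is given below.

# Successive refinement of a Bern(1/2) source under Hamming distortion:
# pick D' with D' * D0 = D1 and print the two stage rates.
import numpy as np

def Hb(x):
    x = np.clip(x, 1e-12, 1 - 1e-12)
    return float(-x*np.log2(x) - (1 - x)*np.log2(1 - x))

D0, D1 = 0.05, 0.2
Dp = (D1 - D0) / (1 - 2*D0)                    # solves D' * D0 = D1
print("D' * D0        =", Dp*(1 - D0) + (1 - Dp)*D0, " (should equal D1 =", D1, ")")
print("R(D1)          =", 1 - Hb(D1))          # rate of the coarse description
print("R(D0) - R(D1)  =", Hb(D1) - Hb(D0))     # rate of the refinement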
Example 13.3 (Gaussian source with squared error distortion). Consider a WGN(P)
source with squared error distortion measures. We can achieve R1 = R(P/D1 ) and R1 +
R2 = R(P/D0 ) (or equivalently, R2 = R(D1 /D0 )) simultaneously by taking a cascade of
backward Gaussian test channels

X = X̂ 0 + Z0 = ( X̂ 1 + Z1 ) + Z0 ,

where X̂ 1 ∼ N(0, P − D1 ), Z1 ∼ N(0, D1 − D0 ), and Z0 ∼ N(0, D0 ) are mutually indepen-


dent and X̂ 0 = X̂ 1 + Z1 . Again X̂ 1 → X̂ 0 → X form a Markov chain.

Successive refinability of the Gaussian source can be shown more directly by the fol-
lowing heuristic argument. By the quadratic Gaussian source coding theorem in Sec-
tion ., using rate R1 , the source sequence X n can be reconstructed using X̂ 1n with er-
ror Y n = X n − X̂ 1n and distortion D1 = (1/n) ∑ni=1 E(Yi2 ) ≤ P2−2R1 . Then, using rate R2 ,
the error sequence Y n can be reconstructed using X̂ 0n with distortion D0 ≤ D1 2−2R2 ≤
P2−2(R1 +R2 ) . Subsequently, the error of the error, the error of the error of the error, and
so on can be successively described, providing further refinement of the source. Each
stage represents a quadratic Gaussian source coding problem. Thus, successive refinement
for a WGN source is, in a sense, dual to successive cancellation for the Gaussian MAC
and BC.
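The heuristic above can be restated numerically: describing a WGN(P) source at rate R1 and then re-describing the resulting error at rate R2 yields distortions D1 = P2^{−2R1} and D0 = D1 2^{−2R2} = P2^{−2(R1+R2)}, which match the single-description rate–distortion function at the total rate. A tiny Python illustration (with arbitrary rates) follows.

# Two-stage description of a WGN(P) source: the refinement incurs no rate loss.
import numpy as np

P, R1, R2 = 1.0, 0.75, 1.25
D1 = P * 2**(-2*R1)            # distortion after the first description
D0 = D1 * 2**(-2*R2)           # distortion after refining the error sequence
print("R(P/D1) =", 0.5*np.log2(P/D1), " (= R1 =", R1, ")")
print("R(P/D0) =", 0.5*np.log2(P/D0), " (= R1 + R2 =", R1 + R2, ")")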

In general, we can establish the following result on successive refinability.

Proposition .. A DMS X is successively refinable under a common distor-


tion measure d = d0 = d1 iff for every D0 ≤ D1 there exists a conditional pmf
p(x̂0 , x̂1 |x) = p(x̂0 |x)p(x̂1 | x̂0 ) such that p(x̂0 |x) and p(x̂1 |x) attain the rate–distortion
functions R(D0 ) and R(D1 ), respectively; in other words, X → X̂ 0 → X̂ 1 form a
Markov chain, and
E(d(X, X̂ 0 )) ≤ D0 ,
E(d(X, X̂ 1 )) ≤ D1 ,
I(X; X̂ 0 ) = R(D0 ),
I(X; X̂ 1 ) = R(D1 ).

13.6 ZHANG–BERGER INNER BOUND

The Zhang–Berger inner bound extends the El Gamal–Cover inner bound by adding a
common description.

Theorem . (Zhang–Berger Inner Bound). Let X be a DMS and d j (x, x̂ j ),


j = 0, 1, 2, be three distortion measures. A rate pair (R1 , R2 ) is achievable for distor-
tion triple (D0 , D1 , D2 ) if

R1 > I(X; X̂ 1 , U ),
R2 > I(X; X̂ 2 , U ),
R1 + R2 > I(X; X̂ 0 , X̂ 1 , X̂ 2 |U ) + 2I(U ; X) + I( X̂ 1 ; X̂ 2 |U)

for some conditional pmf p(u, x̂0 , x̂1 , x̂2 |x) with |U | ≤ |X | + 5 such that E(d j (X, X̂ j )) ≤
D j , j = 0, 1, 2.

Note that the Zhang–Berger inner bound is convex. It is also easy to see that it con-
tains the El Gamal–Cover inner bound (by taking U = ). This containment can be
strict. Consider a Bern(1/2) source and Hamming distortion measures. If (D0 , D1 , D2 ) =
(0, 0.1, 0.1), the El Gamal–Cover coding scheme cannot achieve (R1 , R2 ) such that R1 +
R2 ≤ 1.2564. By comparison, the Zhang–Berger inner bound contains a rate pair (R1 , R2 )
with R1 + R2 = 1.2057. It is not known, however, if this inner bound is tight in general.
Proof of Theorem . (outline). We first generate a common description of the source
and then refine it using the El Gamal–Cover coding scheme to obtain the two descriptions.
The common description is sent by both encoders and each encoder sends a different
refinement in addition. In the following, we describe the coding scheme in more detail.
Divide each index M j , j = 1, 2, into a common index M0 at rate R0 and independent
private indices M0 j at rate R0 j and M j j at rate R j j . Thus R j = R0 + R j j + R0 j , j = 1, 2. Fix

p(u, x̂1 , x̂2 , x̂0 |x) such that E(d(X, X̂ j )) ≤ D j /(1 + є), j = 0, 1, 2. Randomly and indepen-
dently generate 2nR0 sequences un (m0 ), m0 ∈ [1 : 2nR0 ], each according to ∏ni=1 pU (ui ). For
each m0 ∈ [1 : 2nR0 ], randomly and conditional independently generate 2nR11 sequences
x̂1n (m0 , m11 ), m11 ∈ [1 : 2nR11 ], each according to ∏ni=1 p X̂ 1 |U (x̂1i |ui (m0 )), and 2nR22 se-
quences x̂2n (m0 , m22 ), m22 ∈ [1 : 2nR22 ], each according to ∏ni=1 p X̂ 2 |U (x̂2i |ui (m0 )). For each
(m0 , m11 , m22 ) ∈ [1 : 2nR0 ] × [1 : 2nR11 ] × [1 : 2nR22 ], randomly and conditionally indepen-
dently generate 2n(R01 +R02 ) sequences x̂0n (m0 , m11 , m22 , m01 , m02 ), (m01 , m02 ) ∈ [1 : 2nR01 ] ×
[1 : 2nR02 ], each according to ∏ni=1 p X̂ 0 |U , X̂ 1 , X̂ 2 (x̂0i |ui (m0 ), x̂1i (m0 , m11 ), x̂2i (m0 , m22 )).
Upon observing x n , the encoders find an index tuple (m0 , m11 , m22 , m01 , m02 ) such
that (x n, un (m0 ), x̂1n (m0 , m11), x̂2n (m0 , m22 ), x̂0n (m0 , m11 , m22 , m01 , m02 )) ∈ Tє(n). Encoder 
sends (m0 , m11 , m01 ) and encoder  sends (m0 , m22 , m02 ). By the covering lemma and the
multivariate covering lemma, it can be shown that a rate tuple (R0 , R11 , R22 , R01 , R02 ) is
achievable if
R0 > I(X; U ) + δ(є),
R11 > I(X; X̂ 1 |U) + δ(є),
R22 > I(X; X̂ 2 |U) + δ(є),
R11 + R22 > I(X; X̂ 1 , X̂ 2 |U) + I( X̂ 1 ; X̂ 2 |U) + δ(є),
R01 + R02 > I(X; X̂ 0 |U , X̂ 1 , X̂ 2 ) + δ(є)

for some conditional pmf p(u, x̂0 , x̂1 , x̂2 |x) such that E(d j (X, X̂ j )) < D j , j = 0, 1, 2. Fi-
nally, using Fourier–Motzkin elimination and the continuity of mutual information, we
arrive at the inequalities in the theorem.
Remark . (An equivalent characterization). We show that the Zhang–Berger inner
bound is equivalent to the set of rate pairs (R1 , R2 ) such that
R1 > I(X; U0 , U1 ),
R2 > I(X; U0 , U2 ), (.)
R1 + R2 > I(X; U1 , U2 |U0 ) + 2I(U0 ; X) + I(U1 ; U2 |U0 )
for some conditional pmf p(u0 , u1 , u2 |x) and functions x̂0 (u0 , u1 , u2 ), x̂1 (u0 , u1 ), and
x̂2 (u0 , u2 ) that satisfy the constraints

E󶁡d0 (X, x̂0 (U0 , U1 , U2 ))󶁱 ≤ D0 ,


E󶁡d1 (X, x̂1 (U0 , U1 ))󶁱 ≤ D1 ,
E󶁡d2 (X, x̂2 (U0 , U2 ))󶁱 ≤ D2 .
Clearly region (.) is contained in the Zhang–Berger inner bound in Theorem .. We
now show that region (.) contains the Zhang–Berger inner bound. Consider the fol-
lowing corner point of the Zhang–Berger inner bound:

R1 = I(X; X̂ 1 , U),
R2 = I(X; X̂ 0 , X̂ 2 | X̂ 1 , U ) + I(X; U) + I( X̂ 1 ; X̂ 2 |U)

for some conditional pmf p(u, x̂0 , x̂1 , x̂2 |x). By the functional representation lemma in
Appendix B, there exists a random variable W independent of (U , X̂ 1 , X̂ 2 ) such that X̂ 0 is
a function of (U , X̂ 1 , X̂ 2 , W) and X → (U , X̂ 0 , X̂ 1 , X̂ 2 ) → W. Let U0 = U , U1 = X̂ 1 , and
U2 = (W , X̂ 2 ). Then

R1 = I(X; X̂ 1 , U) = I(X; U0 , U1 ),
R2 = I(X; X̂ 0 , X̂ 2 | X̂ 1 , U) + I(X; U ) + I( X̂ 1 ; X̂ 2 |U )
= I(X; X̂ 2 , W | X̂ 1 , U0 ) + I(X; U0 ) + I( X̂ 1 ; X̂ 2 , W |U0 )
= I(X; U2 |U0 , U1 ) + I(X; U0 ) + I(U1 ; U2 |U0 )

and X̂ 1 , X̂ 2 , and X̂ 0 are functions of U1 , U2 , and (U0 , U1 , U2 ), respectively. Hence, (R1 , R2 )


is also a corner point of the region in (.). The rest of the proof follows by time sharing.

SUMMARY

∙ Multiple description coding setup


∙ El Gamal–Cover inner bound:
∙ Generation of two descriptions that are individually good and jointly better
∙ Use of the multivariate covering lemma
∙ Quadratic Gaussian multiple description coding:
∙ El Gamal–Cover inner bound is tight
∙ Individual rate–distortion functions can be achieved at high distortion
∙ Identification of a common-information random variable in the proof of the con-
verse
∙ Successive refinement
∙ Zhang–Berger inner bound adds a common description to El Gamal–Cover coding
∙ Open problems:
13.1. Is the Zhang–Berger inner bound tight?
13.2. What is the multiple description rate–distortion region for a Bern(1/2) source
and Hamming distortion measures?

BIBLIOGRAPHIC NOTES

The multiple description coding problem was formulated by A. Gersho and H. S. Witsen-
hausen, and initially studied by Witsenhausen (), Wolf, Wyner, and Ziv (), and

Witsenhausen and Wyner (). El Gamal and Cover () established Theorem .
and evaluated the inner bound for the quadratic Gaussian case in Theorem .. Ozarow
() proved the converse of Theorem .. Chen, Tian, Berger, and Hemami ()
proposed a successive quantization scheme for the El Gamal–Cover inner bound. Berger
and Zhang () considered the Bernoulli source with Hamming distortion measures in
Example . and showed that the El Gamal–Cover inner bound is tight when there is no
excess rate. Ahlswede () established the optimality for the general no excess rate case.
The rate–distortion region for the semideterministic case is due to Fu and Yeung ().
Theorem . is due to Equitz and Cover () and Rimoldi (). Proposition .
and Examples . and . also appeared in Equitz and Cover (). The inner bound
in Theorem . is due to Venkataramani, Kramer, and Goyal (). The equivalence to
the original characterization in (.) by Zhang and Berger () is due to Wang, Chen,
Zhao, Cuff, and Permuter (). Venkataramani, Kramer, and Goyal () extended the
El Gamal–Cover and Zhang–Berger coding schemes to k descriptions and 2k−1 decoders.
This extension is optimal in several special cases, including k-level successive refinement
(Equitz and Cover ) and quadratic Gaussian multiple descriptions with individual
decoders, each of which receives its own description, and a central decoder that receives
all descriptions (Chen ). It is not known, however, if these extensions are optimal in
general (Puri, Pradhan, and Ramchandran ).

PROBLEMS

.. Establish the rate–distortion region for no combined reconstruction in (.).


.. Establish the outer bound on the rate–distortion region in (.).
.. Consider the multiple description coding setup, where d1 (x, x̂1 ) = 0 if x̂1 = д(x)
and d1 (x, x̂1 ) = 1, otherwise, and D1 = 0. Show that the El Gamal–Cover inner
bound is tight for this case.
.. Complete the details of the achievability proof of Theorem ..
.. Gaussian auxiliary random variable. Consider a jointly Gaussian triple (X, X̂ 1 , X̂ 2 )
that attains the rate–distortion region for quadratic Gaussian multiple description
in Theorem .. Let Y = X + Z, where Z ∼ N(0, N) is independent of (X, X̂ 1 , X̂ 2 )
and N is given by (.).
(a) Show that X̂ 1 → Y → X̂ 2 form a Markov chain.
(b) Let Ỹ = X + Z̃ such that Z̃ is independent of (X, X̂ 1 , X̂ 2 ) and E( Z̃ 2 ) = N. Show
that I(X̂1, X̂2; Y) ≤ I(X̂1, X̂2; Ỹ).

.. Prove the converse for Theorem ..


.. Prove the sufficient and necessary condition for successive refinability in Proposi-
tion ..
CHAPTER 14

Joint Source–Channel Coding

In Chapters  through , we studied reliable communication of independent messages


over noisy single-hop networks (channel coding), and in Chapters  through , we stud-
ied the dual setting of reliable communication of uncompressed sources over noiseless
single-hop networks (source coding). These settings are special cases of the more general
information flow problem of reliable communication of uncompressed sources over noisy
single-hop networks. As we have seen in Section ., separate source and channel coding
is asymptotically sufficient for communicating a DMS over a DMC. Does such separation
hold in general for communicating a k-DMS over a DM single-hop network?
In this chapter, we show that such separation does not hold in general. Thus in some
multiuser settings it is advantageous to perform joint source–channel coding. We demon-
strate this breakdown in separation through examples of lossless communication of a
-DMS over a DM-MAC and over a DM-BC.
For the DM-MAC case, we show that joint source–channel coding can help commu-
nication by utilizing the correlation between the sources to induce statistical cooperation
between the transmitters. We present a joint source–channel coding scheme that out-
performs separate source and channel coding. We then show that this scheme can be
improved when the sources have a common part, that is, a source that both senders can
agree on with probability one.
For the DM-BC case, we show that joint source–channel coding can help communi-
cation by utilizing the statistical compatibility between the sources and the channel. We
first consider a separate source and channel coding scheme based on the Gray–Wyner
source coding system and Marton’s channel coding scheme. The optimal rate–region for
the Gray–Wyner system naturally leads to several definitions of common information be-
tween correlated sources. We then describe a joint source–channel coding scheme that
outperforms the separate Gray–Wyner and Marton coding scheme.
Finally, we present a general single-hop network that includes as special cases many of
the multiuser source and channel settings we discussed in previous chapters. We describe
a hybrid source–channel coding scheme for this network.

14.1 LOSSLESS COMMUNICATION OF A 2-DMS OVER A DM-MAC

Consider the multiple access communication system depicted in Figure ., where a
2-DMS (U1, U2) is to be communicated losslessly over a 2-sender DM-MAC p(y|x1, x2).

Figure .. Communication of a 2-DMS over a 2-sender DM-MAC: encoder j = 1, 2 maps the source block U_j^{k_j} into the channel input X_j^n, and the decoder forms the estimates (Û_1^{k_1}, Û_2^{k_2}) from the channel output Y^n of p(y|x1, x2).

A (|U1|^{k1}, |U2|^{k2}, n) joint source–channel code of rate pair (r1, r2) = (k1/n, k2/n) for this setup consists of

∙ two encoders, where encoder j = 1, 2 assigns a sequence x_j^n(u_j^{k_j}) ∈ X_j^n to each sequence u_j^{k_j} ∈ U_j^{k_j}, and

∙ a decoder that assigns an estimate (û_1^{k_1}, û_2^{k_2}) ∈ Û_1^{k_1} × Û_2^{k_2} to each sequence y^n ∈ Y^n.

The probability of error is defined as Pe(n) = P{(Û_1^{k_1}, Û_2^{k_2}) ≠ (U_1^{k_1}, U_2^{k_2})}. We say that the

sources are communicated losslessly over the DM-MAC if there exists a sequence of
(|U1 |k1 , |U2 |k2 , n) codes such that limn→∞ Pe(n) = 0. The problem is to find the necessary
and sufficient condition for lossless communication. For simplicity, we assume henceforth
the rates r1 = r2 = 1 symbol/transmission.
First consider the following sufficient condition for separate source and channel cod-
ing. We know that the capacity region C of the DM-MAC is the set of rate pairs (R1 , R2 )
such that

R1 ≤ I(X1 ; Y | X2 , Q),
R2 ≤ I(X2 ; Y | X1 , Q),
R1 + R2 ≤ I(X1 , X2 ; Y |Q)

for some pmf p(q)p(x1 |q)p(x2 |q). We also know from the Slepian–Wolf theorem that
the optimal rate region R ∗ for distributed lossless source coding is the set of rate pairs
(R1 , R2 ) such that

R1 ≥ H(U1 |U2 ),
R2 ≥ H(U2 |U1 ),
R1 + R2 ≥ H(U1 , U2 ).

Hence, if the intersection of the interiors of C and R ∗ is not empty, that is, there exists a
pmf p(q)p(x1 |q)p(x2 |q) such that

H(U1 |U2 ) < I(X1 ; Y | X2 , Q),


H(U2 |U1 ) < I(X2 ; Y | X1 , Q), (.)
H(U1 , U2 ) < I(X1 , X2 ; Y |Q),

then the -DMS (U1 , U2 ) can be communicated losslessly over the DM-MAC using sep-
arate source and channel coding. The encoders use Slepian–Wolf coding (binning) to
encode (U1n , U2n ) into the bin indices (M1 , M2 ) ∈ [1 : 2nR1 ] × [1 : 2nR2 ]. The encoders then
transmit the codeword pair (x1n (M1 ), x2n (M2 )) selected from a randomly generated chan-
nel codebook; see Section .. The decoder first performs joint typicality decoding to
find (M1 , M2 ) and then recovers (U1n , U2n ) by finding the unique jointly typical sequence
pair in the product bin with index pair (M1 , M2 ). Since the rate pair (R1 , R2 ) satisfies
the conditions for both lossless source coding and reliable channel coding, the end-to-
end probability of error tends to zero as n → ∞. Note that although the joint pmf on
(M1 , M2 ) is not necessarily uniform, the message pair can still be reliably transmitted to
the receiver if (R1 , R2 ) ∈ C (see Problem .).
Consider the following examples for which this sufficient condition for separate source
and channel coding is also necessary.

Example 14.1 (MAC with orthogonal components). Let (U1 , U2 ) be an arbitrary -


DMS and p(y|x1 , x2 ) = p(y1 |x1 )p(y2 |x2 ) be a DM-MAC with output Y = (Y1 , Y2 ) that
consists of two separate DMCs, p(y1 |x1 ) with capacity C1 and p(y2 |x2 ) with capacity C2 .
The sources can be communicated losslessly over this MAC if
H(U1 |U2 ) < C1 ,
H(U2 |U1 ) < C2 ,
H(U1 , U2 ) < C1 + C2 .
Conversely, if one of the following inequalities is satisfied:
H(U1 |U2 ) > C1 ,
H(U2 |U1 ) > C2 ,
H(U1 , U2 ) > C1 + C2 ,
then the sources cannot be communicated losslessly over the channel. Thus source–
channel separation holds for this case.
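To make these conditions concrete, the following Python sketch (an illustration added here, not part of the original text) evaluates them for a doubly symmetric binary source DSBS(p) sent over two orthogonal links; the numerical choices p = 0.1 and C1 = C2 = 1 are assumptions made only for this example.

```python
import math

def h2(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def dsbs_separation_check(p, C1, C2):
    """Check the three conditions of Example 14.1 for a DSBS(p) source
    sent over two orthogonal links of capacities C1 and C2."""
    H1_given_2 = h2(p)        # H(U1|U2) for a DSBS(p)
    H2_given_1 = h2(p)        # H(U2|U1)
    H_joint = 1 + h2(p)       # H(U1,U2) = H(U1) + H(U2|U1)
    feasible = (H1_given_2 < C1) and (H2_given_1 < C2) and (H_joint < C1 + C2)
    return H1_given_2, H2_given_1, H_joint, feasible

# Hypothetical numbers: DSBS(0.1) over two noiseless binary links (C1 = C2 = 1).
H12, H21, Hj, ok = dsbs_separation_check(0.1, 1.0, 1.0)
print(f"H(U1|U2) = {H12:.3f}, H(U2|U1) = {H21:.3f}, H(U1,U2) = {Hj:.3f}")
print("losslessly communicable by separate coding:", ok)
# Expected: 0.469, 0.469, 1.469 -> True (all three conditions hold).
```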
Example 14.2 (Independent sources). Let U1 and U2 be independent sources with en-
tropies H(U1 ) and H(U2 ), respectively, and p(y|x1 , x2 ) be an arbitrary DM-MAC. Source
channel separation again holds in this case. That is, the sources can be communicated
losslessly over the DM-MAC by separate source and channel coding if
H(U1 ) < I(X1 ; Y | X2 , Q),
H(U2 ) < I(X2 ; Y | X1 , Q),
H(U1 ) + H(U2 ) < I(X1 , X2 ; Y |Q),
for some pmf p(q)p(x1 |q)p(x2 |q), and the converse holds in general.

Does source–channel separation hold in general for lossless communication of an ar-


bitrary -DMS (U1 , U2 ) over an arbitrary DM-MAC p(y|x1 , x2 )? In other words, is it al-
ways the case that if the intersection of R ∗ and C is empty for a -DMS and a DM-MAC,

then the -DMS cannot be communicated losslessly over the DM-MAC? To answer this
question, consider the following.
Example .. Let (U1 , U2 ) be a -DMS with U1 = U2 = {0, 1}, pU1 ,U2 (0, 0) = pU1 ,U2 (0, 1) =
pU1 ,U2 (1, 1) = 1/3, and pU1 ,U2 (1, 0) = 0. Let p(y|x1 , x2 ) be a binary erasure MAC with X1 =
X2 = {0, 1}, Y = {0, 1, 2}, and Y = X1 + X2 (see Example .). The optimal rate region R ∗
for this -DMS and the capacity region of the binary erasure MAC are sketched in Fig-
ure .. Note that the intersection of these two regions is empty since H(U1 , U2 ) = log 3 =
1.585 and max p(x1 )p(x2 ) I(X1 , X2 ; Y) = 1.5. Hence, H(U1 , U2 ) > max p(x1 )p(x2 ) I(X1 , X2 ; Y)
and (U1 , U2 ) cannot be communicated losslessly over the erasure DM-MAC using sepa-
rate source and channel coding.
Now consider an uncoded transmission scheme in which the encoders transmit X1i =
U1i and X2i = U2i in time i ∈ [1 : n]. It is easy to see that this scheme achieves error-free
communication! Thus using separate source and channel coding for sending a -DMS
over a DM-MAC is not optimal in general.

Figure .. The optimal rate region R∗ of the 2-DMS and the capacity region C of the binary erasure MAC in the (R1, R2) plane: separate source and channel coding fails since R∗ ∩ C = ∅.

A general necessary and sufficient condition for lossless communication of a -DMS


over a DM-MAC is not known. In the following we present joint source–channel coding
schemes that include as special cases the aforementioned separate source and channel
coding scheme and the uncoded transmission scheme in Example ..

14.1.1 A Joint Source–Channel Coding Scheme


We establish the following sufficient condition for lossless communication of a -DMS
over a DM-MAC.

Theorem .. A -DMS (U1 , U2 ) can be communicated losslessly over a DM-MAC


p(y|x1 , x2 ) at rates r1 = r2 = 1 if

H(U1 |U2 ) < I(X1 ; Y |U2 , X2 , Q),


H(U2 |U1 ) < I(X2 ; Y |U1 , X1 , Q),
340 Joint Source–Channel Coding

H(U1 , U2 ) < I(X1 , X2 ; Y |Q)

for some conditional pmf p(q, x1 , x2 |u1 , u2 ) = p(q)p(x1 |u1 , q)p(x2 |u2 , q) with |Q| ≤ 3.

This theorem recovers the following as special cases:


∙ Separate source and channel coding: We set p(x1 |u1 , q)p(x2 |u2 , q) = p(x1 |q)p(x2 |q),
that is, (X1 , X2 , Q) is independent of (U1 , U2 ). Then, the set of inequalities in the
theorem simplifies to (.).
∙ Example .: Set Q = , X1 = U1 , and X2 = U2 .

14.1.2 Proof of Theorem 14.1


We establish achievability for |Q| = 1; the rest of the proof follows by time sharing.
Codebook generation. Fix a conditional pmf p(x1 |u1 )p(x2 |u2 ). For each u1n ∈ U1n , ran-
domly and independently generate a sequence x1n (u1n ) according to ∏ni=1 p X1 |U1 (x1i |u1i ).
Similarly, generate a sequence x2n (u2n ), u2n ∈ U2n , according to ∏ni=1 p X2 |U2 (x2i |u2i ).
Encoding. Upon observing u1n , encoder  transmits x1n (u1n ). Similarly encoder  transmits
x2n (u2n ). Note that with high probability, no more than 2n(H(U1 ,U2 )+δ(є)) codeword pairs
(x1n , x2n ) can simultaneously occur.
Decoding. The decoder declares (û 1n , û 2n ) to be the source pair estimate if it is the unique
pair such that (û1n , û2n , x1n (û n1 ), x2n (û2n ), y n ) ∈ Tє(n) ; otherwise it declares an error.
Analysis of the probability of error. The decoder makes an error iff one or more of the
following events occur:

E1 = {(U1n, U2n, X1n(U1n), X2n(U2n), Y n) ∉ Tє(n)},
E2 = {(ũ1n, U2n, X1n(ũ1n), X2n(U2n), Y n) ∈ Tє(n) for some ũ1n ≠ U1n},
E3 = {(U1n, ũ2n, X1n(U1n), X2n(ũ2n), Y n) ∈ Tє(n) for some ũ2n ≠ U2n},
E4 = {(ũ1n, ũ2n, X1n(ũ1n), X2n(ũ2n), Y n) ∈ Tє(n) for some ũ1n ≠ U1n, ũ2n ≠ U2n}.

Thus, the average probability of error is upper bounded as

P(E) ≤ P(E1 ) + P(E2 ) + P(E3 ) + P(E4 ).

By the LLN, P(E1 ) tends to zero as n → ∞. Next consider the second term. By the union
of events bound,

P(E2) = ∑_{u1n} p(u1n) P{(ũ1n, U2n, X1n(ũ1n), X2n(U2n), Y n) ∈ Tє(n) for some ũ1n ≠ u1n | U1n = u1n}
      ≤ ∑_{u1n} p(u1n) ∑_{ũ1n ≠ u1n} P{(ũ1n, U2n, X1n(ũ1n), X2n(U2n), Y n) ∈ Tє(n) | U1n = u1n}.

Now conditioned on {U1n = u1n}, we have (U2n, X1n(ũ1n), X2n(U2n), Y n) ∼ p(u2n, x2n, y n |u1n) p(x1n | ũ1n) = ∏_{i=1}^n pU2,X2,Y|U1(u2i, x2i, yi |u1i) pX1|U1(x1i | ũ1i) for every ũ1n ≠ u1n. Thus

P(E2) ≤ ∑_{u1n} p(u1n) ∑_{ũ1n ≠ u1n} ∑_{(ũ1n, u2n, x1n, x2n, yn) ∈ Tє(n)} p(u2n, x2n, y n |u1n) p(x1n | ũ1n)
      = ∑_{(ũ1n, u2n, x1n, x2n, yn) ∈ Tє(n)} ∑_{u1n ≠ ũ1n} p(u1n, u2n, x2n, y n) p(x1n | ũ1n)
      ≤ ∑_{(ũ1n, u2n, x1n, x2n, yn) ∈ Tє(n)} ∑_{u1n} p(u1n, u2n, x2n, y n) p(x1n | ũ1n)
      = ∑_{(ũ1n, u2n, x1n, x2n, yn) ∈ Tє(n)} p(u2n, x2n, y n) p(x1n | ũ1n)
      = ∑_{(u2n, x2n, yn) ∈ Tє(n)} p(u2n, x2n, y n) ∑_{(ũ1n, x1n) ∈ Tє(n)(U1, X1 | u2n, x2n, yn)} p(x1n | ũ1n)
      ≤ ∑_{(ũ1n, x1n) ∈ Tє(n)(U1, X1 | u2n, x2n, yn)} p(x1n | ũ1n)
      ≤ 2^{n(H(U1, X1 |U2, X2, Y) − H(X1 |U1) + 2δ(є))}.

Collecting the entropy terms, we have

H(U1 , X1 |U2 , X2 , Y) − H(X1 |U1 )


= H(U1 , X1 |U2 , X2 , Y) − H(U1 , X1 |U2 , X2 ) − H(X1 |U1 ) + H(U1 , X1 |U2 , X2 )
(a)
= −I(U1 , X1 ; Y |U2 , X2 ) + H(U1 |U2 )
(b)
= −I(X1 ; Y |U2 , X2 ) + H(U1 |U2 ),

where (a) follows since X1 → U1 → U2 → X2 form a Markov chain and (b) follows since
(U1 , U2 ) → (X1 , X2 ) → Y form a Markov chain. Thus P(E2 ) tends to zero as n → ∞ if
H(U1 |U2 ) < I(X1 ; Y |U2 , X2 ) − 2δ(є). Similarly, P(E3 ) and P(E4 ) tend to zero as n → ∞
if H(U2 |U1 ) < I(X2 ; Y|U1 , X1 ) − 2δ(є) and H(U1 , U2 ) < I(X1 , X2 ; Y) − 3δ(є). This com-
pletes the proof of Theorem ..
Suboptimality of the coding scheme. The coding scheme used in the above proof is not
optimal in general. Suppose U1 = U2 = U . Then Theorem . reduces to the sufficient
condition

H(U) < max_{p(q)p(x1 |q,u)p(x2 |q,u)} I(X1, X2; Y |Q) = max_{p(x1 |u)p(x2 |u)} I(X1, X2; Y).        (.)

However, since both senders observe the same source, they can first encode the source
losslessly at rate H(U) and then transmit the source description using cooperative channel

coding; see Problem .. Thus, the source can be communicated losslessly if

H(U) < max_{p(x1, x2)} I(X1, X2; Y),

which is a less stringent condition than that in (.). Hence, when U1 and U2 have a
common part, we can improve upon the joint source–channel coding scheme for Theo-
rem .. In the following subsection, we formally define the common part between two
correlated sources. Subsequently, we present separate and joint source–channel coding
schemes that incorporate this common part.

14.1.3 Common Part of a 2-DMS


Let (U1 , U2 ) be a pair of random variables. Arrange p(u1 , u2 ) in a block diagonal form
with the maximum possible number k of nonzero blocks, as shown in Figure .. The
common part between U1 and U2 is the random variable U0 that takes the value u0 if
(U1 , U2 ) is in block u0 ∈ [1 : k]. Note that U0 can be determined by U1 or U2 alone.

Figure .. Block diagonal arrangement of the joint pmf p(u1, u2): with rows indexed by u2 and columns by u1, the nonzero entries are confined to k diagonal blocks labeled u0 = 1, . . . , u0 = k, and all entries outside these blocks are zero.

Formally, let д1 : U1 → [1 : k] and д2 : U2 → [1 : k] be two functions with the largest


integer k such that P{д1 (U1 ) = u0 } > 0, P{д2 (U2 ) = u0 } > 0 for u0 ∈ [1 : k] and P{д1 (U1 ) =
д2 (U2 )} = 1. The common part between U1 and U2 is defined as U0 = д1 (U1 ) = д2 (U2 ),
which is unique up to relabeling of the symbols.
To better understand this definition, consider the following.
Example .. Let (U1 , U2 ) be a pair of random variables with the joint pmf in Table ..
Here k = 2 and the common part U0 has the pmf pU0 (1) = 0.7 and pU0 (2) = 0.3.

Now let (U1 , U2 ) be a -DMS. What is the common part between the sequences U1n
and U2n ? It turns out that this common part is always U0n (up to relabeling). Thus we say
that U0 is the common part of the -DMS (U1 , U2 ).

              u1 = 1   u1 = 2   u1 = 3   u1 = 4
              (----- u0 = 1 ----) (----- u0 = 2 ----)
  u2 = 1        0.1      0.2       0        0
  u2 = 2        0.1      0.1       0        0      (u0 = 1)
  u2 = 3        0.1      0.1       0        0
  u2 = 4         0        0       0.2      0.1     (u0 = 2)

Table .. Joint pmf for Example . (rows indexed by u2, columns by u1; the two blocks correspond to u0 = 1 and u0 = 2).

14.1.4 Three-Index Separate Source and Channel Coding Scheme


Taking the common part into consideration, we can generalize the 2-index separate source
and channel coding scheme discussed earlier in this section into a 3-index scheme. Source
coding is performed by encoding U1n into an index pair (M0 , M1 ) and U2n into an in-
dex pair (M0 , M2 ) such that (U1n , U2n ) can be losslessly recovered from the index triple
(M0 , M1 , M2 ) as depicted in Figure ..

Figure .. Source encoding setup for the 3-index separate source and channel coding scheme: encoder 1 maps U1n into (M0, M1) and encoder 2 maps U2n into (M0, M2). The 2-DMS can be losslessly recovered from (M0, M1, M2).

Since M0 must be a function only of U0n , it can be easily shown that the optimal rate
region R ∗ is the set of rate triples (R0 , R1 , R2 ) such that

R1 ≥ H(U1 |U2 ),
R2 ≥ H(U2 |U1 ),
(.)
R1 + R2 ≥ H(U1 , U2 |U0 ),
R0 + R1 + R2 ≥ H(U1 , U2 ).

At the same time, the capacity region C for a DM-MAC p(y|x1 , x2 ) with a common mes-
sage (see Problem .) is the set of rate triples (R0 , R1 , R2 ) such that

R1 ≤ I(X1 ; Y | X2 , W),
R2 ≤ I(X2 ; Y | X1 , W),

R1 + R2 ≤ I(X1 , X2 ; Y |W),
R0 + R1 + R2 ≤ I(X1 , X2 ; Y )

for some pmf p(󰑤)p(x1 |󰑤)p(x2 |󰑤), where |W| ≤ min{|X1 |⋅|X2 | + 2, |Y| + 3}. Hence, if
the intersection of the interiors of R ∗ and C is not empty, separate source and channel
coding using three indices can be used to communicate the -DMS losslessly over the
DM-MAC. Note that this coding scheme is not optimal in general as already shown in
Example ..

14.1.5 A Joint Source–Channel Coding Scheme with Common Part


By generalizing the coding schemes in Sections .. and .., we obtain the following
sufficient condition for lossless communication of a -DMS over a DM-MAC.

Theorem .. A -DMS (U1 , U2 ) with common part U0 can be communicated loss-
lessly over a DM-MAC p(y|x1 , x2 ) if

H(U1 |U2 ) < I(X1 ; Y | X2 , U2 , W),


H(U2 |U1 ) < I(X2 ; Y | X1 , U1 , W),
H(U1 , U2 |U0 ) < I(X1 , X2 ; Y |U0 , W),
H(U1 , U2 ) < I(X1 , X2 ; Y )

for some conditional pmf p(󰑤)p(x1 |u1 , 󰑤)p(x2 |u2 , 󰑤).

In this sufficient condition, the common part U0 is represented by the independent


auxiliary random variable W, which is chosen to maximize cooperation between the
senders.

Remark 14.1. Although the auxiliary random variable W represents the common part
U0 , there is no benefit in making it statistically correlated with U0 . This is a consequence
of Shannon’s source–channel separation theorem in Section ..
Remark 14.2. The above sufficient condition does not change by introducing a time-
sharing random variable Q.
Proof of Theorem . (outline). For each u0n , randomly and independently generate
󰑤 n (u0n ) according to ∏ni=1 pW (󰑤i ). For each (u0n , u1n ), randomly and independently gen-
erate x1n (u0n , u1n ) according to ∏ni=1 p X1 |U1 ,W (x1i |u1i , 󰑤i (u0n )). Similarly, for (u0n , u2n ), ran-
domly and independently generate x2n (u0n , u2n ). The decoder declares (û 0n , û 1n , û 2n ) to be the
estimate of the sources if it is the unique triple such that (û n0 , û n1 , û 2n , 󰑤 n (û n0 ), x1n (û0n , û1n ),
x2n (û 0n , û2n ), y n ) ∈ Tє(n) (this automatically implies that û 0n is the common part of û 1n and
û 2n ). Following the steps in the proof of the previous coding scheme, it can be shown that
the above inequalities are sufficient for the probability of error to tend to zero as n → ∞.
Remark .. The above coding scheme is not optimal in general either.

14.2 LOSSLESS COMMUNICATION OF A 2-DMS OVER A DM-BC

Now consider the broadcast communication system depicted in Figure ., where a -
DMS (U1 , U2 ) is to be communicated losslessly over a -receiver DM-BC p(y1 , y2 |x). The
definitions of a code, probability of error, and lossless communication for this setup are
along the same lines as those for the MAC case. As before, assume rates r1 = r2 = 1 sym-
bol/transmission.
Since the private-message capacity region of the DM-BC is not known in general (see
Chapter ), the necessary and sufficient condition for lossless communication of a -DMS
over a DM-BC is not known even when the sources are independent. We will show nev-
ertheless that separation does not hold in general for sending a -DMS over a DM-BC.
Consider the following separate source and channel coding scheme for this setup. The
encoder first assigns an index triple (M0 , M1 , M2 ) ∈ [1 : 2nR0 ] × [1 : 2nR1 ] × [1 : 2nR2 ] to the
source sequence pair (U1n , U2n ) such that U1n can be recovered losslessly from the pair of
indices (M0 , M1 ) and U2n can be recovered losslessly from the pair of indices (M0 , M2 ).
The encoder then transmits a codeword x n (M0 , M1 , M2 ) from a channel codebook. De-
coder  first decodes for (M0 , M1 ) and then recovers U1n . Similarly decoder  first decodes
for (M0 , M2 ) and then recovers U2n . The source coding part of this scheme is discussed in
the following subsection.

Figure .. Communication of a 2-DMS over a 2-receiver DM-BC: the encoder maps (U1n, U2n) into X n; decoder 1 forms Û1n from Y1n and decoder 2 forms Û2n from Y2n, where (Y1n, Y2n) are the outputs of p(y1, y2 |x).

14.2.1 Gray–Wyner System


The Gray–Wyner system depicted in Figure . is a distributed lossless source coding
setup in which a -DMS (U1 , U2 ) is described by three encoders so that decoder , who
receives the descriptions M0 and M1 , can losslessly recover U1n and decoder , who re-
ceives the descriptions M0 and M2 , can losslessly recover U2n . We wish to find the optimal
rate region for this distributed lossless source coding setup.
A (2nR0 , 2nR1 , 2nR2 , n) code for the Gray–Wyner system consists of

∙ three encoders, where encoder j = 0, 1, 2 assigns the index m j (u1n , u2n ) ∈ [1 : 2nR 󰑗 ) to
each sequence pair (u1n , u2n ) ∈ U1n × U2n , and
∙ two decoders, where decoder  assigns an estimate û1n (m0 , m1 ) to each index pair

Figure .. Gray–Wyner system: encoder j = 0, 1, 2 maps (U1n, U2n) into the index M j; decoder 1 recovers Û1n from (M0, M1) and decoder 2 recovers Û2n from (M0, M2).

(m0 , m1 ) ∈ [1 : 2nR0 ) × [1 : 2nR1 ) and decoder  assigns an estimate û2n (m0 , m2 ) to each
index pair (m0 , m2 ) ∈ [1 : 2nR0 ) × [1 : 2nR2 ).
The probability of error is defined as Pe(n) = P{(Û1n, Û2n) ≠ (U1n, U2n)}. A rate triple (R0, R1,
R2 ) is said to be achievable if there exists a sequence of (2nR0 , 2nR1 , 2nR2 , n) codes such that
limn→∞ Pe(n) = 0. The optimal rate region R ∗ for the Gray–Wyner system is the closure
of the set of achievable rate triples.
The optimal rate region for the Gray–Wyner system is given in the following.

Theorem .. The optimal rate region R ∗ for the Gray–Wyner system with -DMS
(U1 , U2 ) is the set of rate triples (R0 , R1 , R2 ) such that

R0 ≥ I(U1 , U2 ; V ),
R1 ≥ H(U1 |V ),
R2 ≥ H(U2 |V )

for some conditional pmf p(󰑣|u1 , u2 ) with |V| ≤ |U1 |⋅|U2 | + 2.

The optimal rate region has the following extreme points:


∙ R0 = 0: By taking V = ∅, the region reduces to R1 ≥ H(U1) and R2 ≥ H(U2).
∙ R1 = 0: By taking V = U1 , the region reduces to R0 ≥ H(U1 ) and R2 ≥ H(U2 |U1 ).
∙ R2 = 0: By taking V = U2 , the region reduces to R0 ≥ H(U2 ) and R1 ≥ H(U1 |U2 ).
∙ (R1 , R2 ) = (0, 0): By taking V = (U1 , U2 ), the region reduces to R0 ≥ H(U1 , U2 ).
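To make these corner points concrete, the following Python sketch (an added illustration, not from the text) evaluates the rate triple (I(U1, U2; V), H(U1 |V), H(U2 |V)) for a given joint pmf and a given choice of V; here it uses the source with p(0, 0) = p(0, 1) = p(1, 1) = 1/3 from earlier in this chapter and V = U1, which gives the R1 = 0 extreme point.

```python
import math

def H(probs):
    """Entropy (in bits) of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def gray_wyner_corner(pUV):
    """pUV: dict mapping (u1, u2, v) -> probability.
    Returns (I(U1,U2;V), H(U1|V), H(U2|V))."""
    pV, pU, pU1V, pU2V = {}, {}, {}, {}
    for (u1, u2, v), p in pUV.items():
        pV[v] = pV.get(v, 0) + p
        pU[(u1, u2)] = pU.get((u1, u2), 0) + p
        pU1V[(u1, v)] = pU1V.get((u1, v), 0) + p
        pU2V[(u2, v)] = pU2V.get((u2, v), 0) + p
    HU = H(pU.values())
    HU_given_V = H(pUV.values()) - H(pV.values())   # H(U1,U2|V)
    R0 = HU - HU_given_V                            # I(U1,U2;V)
    R1 = H(pU1V.values()) - H(pV.values())          # H(U1|V)
    R2 = H(pU2V.values()) - H(pV.values())          # H(U2|V)
    return R0, R1, R2

# Source p(0,0) = p(0,1) = p(1,1) = 1/3 with the choice V = U1.
pUV = {(0, 0, 0): 1/3, (0, 1, 0): 1/3, (1, 1, 1): 1/3}
R0, R1, R2 = gray_wyner_corner(pUV)
print(f"R0 = {R0:.3f}, R1 = {R1:.3f}, R2 = {R2:.3f}")
# Expected: R0 = H(U1) ~ 0.918, R1 = 0, R2 = H(U2|U1) ~ 0.667.
```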
Proof of Theorem .. To prove achievability, we use joint typicality encoding to find a
󰑣 n (m0 ), m0 ∈ [1 : 2nR0 ], jointly typical with (u1n , u2n ). The index m0 is sent to both decoders.
Given 󰑣 n (m0 ), we assign indices m1 ∈ [1 : 2nR1 ] and m2 ∈ [1 : 2nR2 ] to the sequences in
Tє(n) (U1 |󰑣 n (m0 )) and Tє(n) (U2 |󰑣 n (m0 )), respectively, and send them to decoders  and ,
respectively. For the proof of the converse, we use standard arguments with the auxiliary

random variable identification Vi = (M0 , U1i−1 , U2i−1 ). The cardinality bound on V can be
proved using the convex cover method in Appendix C.

14.2.2 Common Information


A rate triple (R0 , R1 , R2 ) in the optimal rate region for the Gray–Wyner system must satisfy
the inequalities

R0 + R1 ≥ H(U1 ),
R0 + R2 ≥ H(U2 ),
R0 + R1 + R2 ≥ H(U1 , U2 ),
2R0 + R1 + R2 ≥ H(U1 ) + H(U2 ).

Each of these inequalities is tight as seen from the extreme points above. Interestingly, the
corresponding common rate R0 on these extreme points of R ∗ leads to several notions of
common information.

∙ Gács–Körner–Witsenhausen common information. The maximum common rate


R0 subject to R0 + R1 = H(U1 ) and R0 + R2 = H(U2 ) is the entropy H(U0 ) of the com-
mon part between U1 and U2 (as defined in Section ..), denoted by K(U1 ; U2 ).
∙ Mutual information. The maximum common rate R0 subject to 2R0 + R1 + R2 =
H(U1 ) + H(U2 ) is the mutual information I(U1 ; U2 ).
∙ Wyner’s common information. The minimum common rate R0 subject to R0 + R1 +
R2 = H(U1 , U2 ) is
J(U1 ; U2 ) = min I(U1 , U2 ; V ), (.)

where the minimum is over all conditional pmfs p(󰑣|u1 , u2 ) with |V| ≤ |U1 |⋅|U2 | such
that I(U1 ; U2 |V ) = 0, i.e., U1 → V → U2 . Recall that this Markov structure appeared
in the converse proofs for the quadratic Gaussian distributed source coding and mul-
tiple description coding problems in Sections . and ., respectively.

The above three quantities represent common information between the random variables
U1 and U2 in different contexts. The Gács–Körner–Witsenhausen common information
K(X; Y ) captures the amount of common randomness that can be extracted by knowing
U1 and U2 separately. In comparison, Wyner’s common information captures the amount
of common randomness that is needed to generate U1 and U2 separately. Mutual informa-
tion, as we have seen in the Slepian–Wolf theorem, captures the amount of information
about U1 provided by observing U2 and vice versa.
In general, it can be easily shown that

0 ≤ K(U1 ; U2 ) ≤ I(U1 ; U2 ) ≤ J(U1 ; U2 ) ≤ H(U1 , U2 ), (.)

and these inequalities can be strict. Furthermore, K(U1 ; U2 ) = I(U1 ; U2 ) = J(U1 ; U2 ) iff
U1 = (V , V1 ) and U2 = (V , V2 ) for some pmf p(󰑣1 )p(󰑣|󰑣1 )p(󰑣2 |󰑣).

Example 14.5. Let (U1 , U2 ) be a DSBS(p), p ∈ [0, 1/2]. Then it can be easily shown that
J(U1 ; U2 ) = 1 + H(p) − 2H(α), where α ⋆ α = p. The minimum in (.) is attained by
setting V ∼ Bern(1/2), V1 ∼ Bern(α), and V2 ∼ Bern(α) to be mutually independent and
U j = V ⊕ V j , j = 1, 2.
Example 14.6. Let (U1, U2) be binary with p(0, 0) = p(0, 1) = p(1, 1) = 1/3. Then it can
be shown that J(U1; U2) = 2/3, which is attained by setting V ∼ Bern(1/2), and U1 = 0,
U2 ∼ Bern(1/3) if V = 0, and U1 ∼ Bern(2/3), U2 = 1 if V = 1.
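As a numerical sanity check (added here, not part of the original text), the following Python sketch evaluates the three notions of common information for a DSBS(p): the common part is trivial so K = 0, I(U1; U2) = 1 − H(p), and J(U1; U2) = 1 + H(p) − 2H(α) with α ⋆ α = p as in Example 14.5, and it confirms the ordering 0 ≤ K ≤ I ≤ J ≤ H(U1, U2). The value p = 0.25 is an arbitrary choice for illustration.

```python
import math

def h2(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def common_information_dsbs(p):
    """K, I, J, and H(U1,U2) for a doubly symmetric binary source DSBS(p), 0 < p < 1/2."""
    K = 0.0                                  # all four pairs have positive probability -> trivial common part
    I = 1 - h2(p)                            # I(U1;U2) = H(U1) - H(U1|U2)
    alpha = (1 - math.sqrt(1 - 2 * p)) / 2   # solves 2*alpha*(1-alpha) = p, i.e. alpha * alpha (convolution) = p
    J = 1 + h2(p) - 2 * h2(alpha)            # Wyner's common information (Example 14.5)
    H_joint = 1 + h2(p)
    return K, I, J, H_joint

p = 0.25                                     # arbitrary crossover probability for illustration
K, I, J, Hj = common_information_dsbs(p)
print(f"K = {K:.3f}, I = {I:.3f}, J = {J:.3f}, H(U1,U2) = {Hj:.3f}")
print("ordering 0 <= K <= I <= J <= H(U1,U2):", 0 <= K <= I <= J <= Hj)
# For p = 0.25: K = 0, I ~ 0.189, J ~ 0.609, H(U1,U2) ~ 1.811.
```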

14.2.3 A Separate Source–Channel Coding Scheme


We return to the discussion on sending a -DMS over a DM-BC using separate source
and channel coding. Recall that Marton’s inner bound in Section . is the best-known
inner bound on the capacity region of the DM-BC. Denote this inner bound as R ⊆ C .
Then, a -DMS can be communicated losslessly over a DM-BC if the intersection of the
interiors of Marton’s inner bound R and the optimal rate region R ∗ for the Gray–Wyner
system is not empty, that is, if

I(U1 , U2 ; V ) + H(U1 |V ) < I(W0 , W1 ; Y1 ),


I(U1 , U2 ; V ) + H(U2 |V ) < I(W0 , W2 ; Y2 ),
I(U1 , U2 ; V ) + H(U1 |V ) + H(U2 |V ) < I(W0 , W1 ; Y1 ) + I(W2 ; Y2 |W0 ) − I(W1 ; W2 |W0 ),
I(U1 , U2 ; V ) + H(U1 |V ) + H(U2 |V ) < I(W1 ; Y1 |W0 ) + I(W0 , W2 ; Y2 ) − I(W1 ; W2 |W0 ),
2I(U1 , U2 ; V ) + H(U1 |V ) + H(U2 |V ) < I(W0 , W1 ; Y1 ) + I(W0 , W2 ; Y2 ) − I(W1 ; W2 |W0 )
(.)

for some pmfs p(v|u1, u2) and p(w0, w1, w2), and function x(w0, w1, w2). This separate
source–channel coding scheme is optimal for some classes of sources and channels.

∙ More capable BC: Suppose that Y1 is more capable than Y2 , i.e., I(X; Y1 ) ≥ I(X; Y2 )
for all p(x). Then the -DMS (U1 , U2 ) can be communicated losslessly if

H(U1 , U2 ) < I(X; Y1 ),


H(U1 , U2 ) < I(X; Y1 |W) + I(W ; Y2 ),
H(U2 ) < I(W ; Y2 )

for some pmf p(󰑤, x).


∙ Nested sources: Suppose that U1 = (V1 , V2 ) and U2 = V2 for some (V1 , V2 ) ∼ p(󰑣1 , 󰑣2 ).
Then the -DMS (U1 , U2 ) can be communicated losslessly if

H(V1 , V2 ) = H(U1 ) < I(X; Y1 ),


H(V1 , V2 ) = H(U1 ) < I(X; Y1 |W) + I(W ; Y2 ),
H(V2 ) = H(U2 ) < I(W ; Y2 )

for some pmf p(󰑤, x).



In both cases, achievability follows by representing (U1n , U2n ) by a message pair (M1 , M2 )
at rates R2 = H(U2 ) and R1 = H(U1 |U2 ), respectively, and using superposition coding.
The converse proofs are essentially the same as the converse proofs for the more capable
BC and degraded message sets BC, respectively.
Source–channel separation is not optimal in general, however, as demonstrated in the
following.
Example .. Consider the -DMS (U1 , U2 ) with U1 = U2 = {0, 1} and pU1 ,U2 (0, 0) =
pU1 ,U2 (0, 1) = pU1 ,U2 (1, 1) = 1/3 and the Blackwell channel in Example . defined by X =
{0, 1, 2}, Y1 = Y2 = {0, 1}, and pY1 ,Y2 |X (0, 0|0) = pY1 ,Y2 |X (0, 1|1) = pY1 ,Y2 |X (1, 1|2) = 1. The
capacity region of this channel is contained in the set of rate triples (R0 , R1 , R2 ) such that

R0 + R1 ≤ 1,
R0 + R2 ≤ 1,
R0 + R1 + R2 ≤ log 3.

However, as we found in Example ., the sources require R0 ≥ J(U1 ; U2 ) = 2/3 when
R0 + R1 + R2 = log 3, or equivalently, 2R0 + R1 + R2 ≥ log 3 + 2/3 = 2.252, which implies
that R0 + R1 ≥ 1.126 or R0 + R2 ≥ 1.126.
Hence, the intersection of the optimal rate region R ∗ for the Gray–Wyner system and
the capacity region C is empty and this -DMS cannot be communicated losslessly over
the Blackwell channel using separate source and channel coding.
By contrast, setting X = U1 + U2 achieves error-free transmission since Y1 and Y2
uniquely determine U1 and U2 , respectively. Thus joint source–channel coding can strictly
outperform separate source and channel coding for sending a -DMS over a DM-BC.
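The error-free claim is easy to verify by enumeration. The following Python sketch (an added check, not from the text) confirms that with X = U1 + U2 the Blackwell channel outputs satisfy Y1 = U1 and Y2 = U2 for every pair in the support of the source.

```python
# Blackwell channel: deterministic map x -> (y1, y2).
blackwell = {0: (0, 0), 1: (0, 1), 2: (1, 1)}

# Support of the source of Example 14.7: the pair (1, 0) has probability zero.
support = [(0, 0), (0, 1), (1, 1)]

ok = True
for u1, u2 in support:
    y1, y2 = blackwell[u1 + u2]          # uncoded transmission X = U1 + U2
    ok &= (y1 == u1) and (y2 == u2)      # receiver j reads off U_j directly
print("Y1 = U1 and Y2 = U2 on the whole support:", ok)   # True
```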

14.2.4 A Joint Source–Channel Coding Scheme


We describe a general joint source–channel coding scheme that improves upon separate
Gray–Wyner source coding and Marton’s channel coding.

Theorem .. A -DMS (U1 , U2 ) can be communicated losslessly over a DM-BC


p(y1 , y2 |x) if

H(U1 |U2 ) < I(U1 , W0 , W1 ; Y1 ) − I(U1 , W0 , W1 ; U2 ),


H(U2 |U1 ) < I(U2 , W0 , W2 ; Y2 ) − I(U2 , W0 , W2 ; U1 ),
H(U1 , U2 ) < I(U1 , W0 , W1 ; Y1 ) + I(U2 , W2 ; Y2 |W0 ) − I(U1 , W1 ; U2 , W2 |W0 ),
H(U1 , U2 ) < I(U1 , W1 ; Y1 |W0 ) + I(U2 , W0 , W2 ; Y2 ) − I(U1 , W1 ; U2 , W2 |W0 ),
H(U1 , U2 ) < I(U1 , W0 , W1 ; Y1 ) + I(U2 , W0 , W2 ; Y2 ) − I(U1 , W1 ; U2 , W2 |W0 )
− I(U1 , U2 ; W0 )

for some conditional pmf p(󰑤0 , 󰑤1 , 󰑤2 |u1 , u2 ) and function x(u1 , u2 , 󰑤0 , 󰑤1 , 󰑤2 ).



This theorem recovers the following as special cases:


∙ Separate source and channel coding: We set p(󰑤0 , 󰑤1 , 󰑤2 |u0 , u1 ) = p(󰑤0 , 󰑤1 , 󰑤2 ) and
x(u1 , u2 , 󰑤0 , 󰑤1 , 󰑤2 ) = x(󰑤0 , 󰑤1 , 󰑤2 ), i.e., (W0 , W1 , W2 , X) is independent of (U1 , U2 ).
Then, the set of inequalities in the theorem simplifies to (.).
∙ Example .: Set W0 = , W1 = U1 , W2 = U2 , X = U1 + U2 .

Remark .. The sufficient condition in Theorem . does not improve by time sharing.

14.2.5 Proof of Theorem 14.4

Codebook generation. Fix a conditional pmf p(󰑤0 , 󰑤1 , 󰑤2 |u1 , u2 ) and function


x(u1 , u2 , 󰑤0 , 󰑤1 , 󰑤2 ). Randomly and independently generate 2nR0 sequences 󰑤0n (m0 ),
m0 ∈ [1 : 2nR0 ], each according to ∏ni=1 pW0 (󰑤0i ). For each u1n ∈ U1n and m0 ∈ [1 : 2nR0 ],
randomly and independently generate 2nR1 sequences 󰑤1n (u1n , m0 , m1 ), m1 ∈ [1 : 2nR1 ],
each according to ∏ni=1 pW1 |U1 ,W0 (󰑤1i |u1i , 󰑤0i (m0 )). Similarly, for each u2n ∈ U2n and m0 ∈
[1 : 2nR0 ], randomly and independently generate 2nR2 sequences 󰑤2n (un2 , m0 , m2 ), m2 ∈ [1 :
2nR2 ], each according to ∏ni=1 pW2 |U2 ,W0 (󰑤2i |u2i , 󰑤0i (m0 )).
Encoding. For each sequence pair (u1n , u2n ), choose a triple (m0 , m1 , m2 ) ∈ [1 : 2nR0 ] ×
[1 : 2nR1 ] × [1 : 2nR2 ] such that (u1n , u2n , 󰑤0n (m0 ), 󰑤1n (un1 , m0 , m1 ), 󰑤2n (u2n , m0 , m2 )) ∈ Tє(n)
󳰀 .

If there is no such triple, choose (m0 , m1 , m2 ) = (1, 1, 1). Then the encoder transmits
xi = x(u1i , u2i , 󰑤0i (m0 ), 󰑤1i (u1n , m0 , m1 ), 󰑤2i (u2n , m0 , m2 )) for i ∈ [1 : n].
Decoding. Let є > є 󳰀 . Decoder  declares û1n to be the estimate of u1n if it is the unique se-
quence such that (û1n , 󰑤0n (m0 ), 󰑤1n (û1n , m0 , m1 ), y1n ) ∈ Tє(n) for some (m0 , m1 ) ∈ [1 : 2nR0 ] ×
[1 : 2nR1 ]. Similarly, decoder  declares û2n to be the estimate of u2n if it is the unique se-
quence such that (û2n , 󰑤0n (m0 ), 󰑤2n (û2n , m0 , m2 ), y2n ) ∈ Tє(n) for some (m0 , m2 ) ∈ [1 : 2nR0 ] ×
[1 : 2nR2 ].
Analysis of the probability of error. Assume (M0 , M1 , M2 ) is selected at the encoder.
Then decoder  makes an error only if one or more of the following events occur:
E0 = 󶁁(U1n , U2n , W0n (m0 ), W1n (U1n , m0 , m1 ), W2n (U2n , m0 , m2 )) ∉ Tє(n)
󳰀

for all m0 , m1 , m2 󶁑,
E11 = 󶁁(U1n , W0n (M0 ), W1n (U1n , M0 , M1 ), Y1n ) ∉ Tє(n) 󶁑,
u1n , W0n (M0 ), W1n (ũ 1n , M0 , m1 ), Y1n ) ∈ Tє(n) for some ũ1n ̸= U1n , m1 󶁑,
E12 = 󶁁(̃
u1n , W0n (m0 ), W1n (ũ1n , m0 , m1 ), Y1n ) ∈ Tє(n) for some ũ1n ̸= U1n , m0 ̸= M0 , m1 󶁑.
E13 = 󶁁(̃
Thus the probability of error P(E1 ) for decoder  is upper bounded as
P(E1 ) ≤ P(E0 ) + P(E0c ∩ E11 ) + P(E12 ) + P(E13 ).
The first term tends to zero by the following variant of the multivariate covering lemma
in Section ..

Lemma .. The probability P(E0 ) tends to zero as n → ∞ if

R0 > I(U1 , U2 ; W0 ) + δ(є 󳰀 ),


R0 + R1 > I(U1 , U2 ; W0 ) + I(U2 ; W1 |U1 , W0 ) + δ(є 󳰀 ),
R0 + R2 > I(U1 , U2 ; W0 ) + I(U1 ; W2 |U2 , W0 ) + δ(є 󳰀 ),
R0 + R1 + R2 > I(U1 , U2 ; W0 ) + I(U2 ; W1 |U1 , W0 ) + I(U1 , W1 ; W2 |U2 , W0 ) + δ(є 󳰀 ).

The proof of this lemma is given in Appendix A.


By the conditional typicality lemma, P(E0c ∩ E11 ) tends to zero as n → ∞. Following
steps similar to the DM-MAC joint source–channel coding, it can be shown that P(E12 )
tends to zero as n → ∞ if H(U1 ) + R1 < I(U1 , W1 ; Y1 |W0 ) + I(U1 ; W0 ) − δ(є), and P(E13 )
tends to zero as n → ∞ if H(U1 ) + R0 + R1 < I(U1 , W0 , W1 ; Y1 ) + I(U1 ; W0 ) − δ(є).
Similarly, the probability of error for decoder  tends to zero as n → ∞ if H(U2 ) +
R2 < I(U2 , W2 ; Y2 |W0 ) + I(U2 ; W0 ) − δ(є) and H(U2 ) + R0 + R2 < I(U2 , W0 , W2 ; Y2 ) +
I(U2 ; W0 ) − δ(є). The rest of the proof follows by combining the above inequalities and
eliminating (R0 , R1 , R2 ) by the Fourier–Motzkin procedure in Appendix D.

14.3 A GENERAL SINGLE-HOP NETWORK

We end our discussion of single-hop networks with a general network model that includes
many of the setups we studied in previous chapters. Consider the -sender -receiver
communication system with general source transmission demand depicted in Figure ..
Let (U1 , U2 ) be a -DMS with common part U0 , p(y1 , y2 |x1 , x2 ) be a DM single-hop
network, and d11 (u1 , û 11 ), d12 (u1 , û12 ), d21 (u2 , û21 ), d22 (u2 , û 22 ) be four distortion mea-
sures. For simplicity, assume transmission rates r1 = r2 = 1 symbol/transmission. Sender 
observes the source sequence U1n and sender  observes the source sequence U2n . Re-
ceiver  wishes to reconstruct (U1n , U2n ) with distortions (D11 , D21 ) and receiver  wishes
to reconstruct (U1n , U2n ) with distortions (D12 , D22 ). We wish to determine the necessary
and sufficient condition for sending the sources within prescribed distortions.
This general network includes the following special cases we discussed earlier.

∙ Lossless communication of a -DMS over a DM-MAC: Assume that Y2 = , d11 and


d21 are Hamming distortion measures, and D11 = D21 = 0. As we have seen, this setup

Figure .. A general single-hop communication network: sender j = 1, 2 maps U jn into X jn; receiver 1 forms the estimates (Û11n, Û21n) from Y1n and receiver 2 forms (Û12n, Û22n) from Y2n, where (Y1n, Y2n) are the outputs of p(y1, y2 |x1, x2).



in turn includes as special cases communication of independent and common mes-


sages over a DM-MAC in Problem . and distributed lossless source coding in Chap-
ter .
∙ Lossy communication of a -DMS over a DM-MAC: Assume Y2 =  and relabel d11
as d1 and d21 as d2 . This setup includes distributed lossy source coding discussed in
Chapter  as a special case.
∙ Lossless communication of a -DMS over a DM-BC: Assume that X2 = U2 = , U1 =
(V1 , V2 ), d11 and d21 are Hamming distortion measures on V1 and V2 , respectively,
and D11 = D21 = 0. As we have seen, this setup includes sending private and common
messages over a DM-BC in Chapter  and the Gray–Wyner system in Section .. as
special cases.
∙ Lossy communication of a -DMS over a DM-BC: Assume that X2 = U2 = , and
relabel d11 as d1 and d21 as d2 . This setup includes several special cases of the multiple-
description coding problem in Chapter  such as successive refinement.
∙ Interference channel: Assume that U1 and U2 are independent, d11 and d22 are Ham-
ming distortion measures, and D11 = D22 = 0. This yields the DM-IC in Chapter .

14.3.1 Separate Source and Channel Coding Scheme


We define separate source and channel coding for this general single-hop network as fol-
lows. A (2nR0 , 2nR10 , 2nR11 , 2nR20 , 2nR22 , n) source code consists of
∙ two source encoders, where source encoder  assigns an index triple (m0 , m10 , m11 ) ∈
[1 : 2nR0 ] × [1 : 2nR10 ] × [1 : 2nR11 ] to every u1n and source encoder  assigns an index
triple (m0 , m20 , m22 ) ∈ [1 : 2nR0 ] × [1 : 2nR20 ] × [1 : 2nR22 ] to every u2n (here m0 is a com-
mon index that is a function only of the common part u0n ), and
∙ two source decoders, where source decoder 1 assigns an estimate (û11n, û21n) to every index quadruple (m0, m10, m11, m20) and source decoder 2 assigns an estimate (û12n, û22n) to every index quadruple (m0, m10, m20, m22).
Achievability and the rate–distortion region R(D11 , D12 , D21 , D22 ) are defined as for other
lossy source coding problems. A (2nR0 , 2nR10 , 2nR11 , 2nR20 , 2nR22 , n) channel code consists
of
∙ five message sets [1 : 2nR0 ], [1 : 2nR10 ], [1 : 2nR11 ], [1 : 2nR20 ], and [1 : 2nR22 ],
∙ two channel encoders, where channel encoder  assigns a codeword x1n (m0 , m10 , m11 )
to every message triple (m0 , m10 , m11 ) ∈ [1 : 2nR0 ] × [1 : 2nR10 ] × [1 : 2nR11 ] and chan-
nel encoder  assigns a codeword x2n (m0 , m20 , m22 ) to every message triple (m0 , m20 ,
m22 ) ∈ [1 : 2nR0 ] × [1 : 2nR20 ] × [1 : 2nR22 ], and
∙ two channel decoders, where channel decoder 1 assigns an estimate (m̂01, m̂101, m̂11, m̂201) to every received sequence y1n and channel decoder 2 assigns an estimate (m̂02, m̂102, m̂202, m̂22) to every received sequence y2n.

The average probability of error, achievability, and the capacity region C are defined as
for other channel coding settings.
The sources can be communicated over the channel with distortion quadruple (D11 ,
D12 , D21 , D22 ) using separate source and channel coding if the intersection of the interiors
of R(D11 , D12 , D21 , D22 ) and C is nonempty. As we have already seen, source–channel
separation does not hold in general, that is, there are cases where this intersection is empty,
yet the sources can be still communicated over the channel as specified.

14.3.2* A Hybrid Source–Channel Coding Scheme


In separate source and channel coding, channel codewords are conditionally independent
of the source sequences given the descriptions (indices). Hence, the correlation between
the sources is not utilized in channel coding. The hybrid source–channel coding scheme
we discuss here captures this correlation in channel coding, while utilizing the wealth
of known lossy source coding and channel coding schemes. Each sender first performs
source encoding on its source sequences. It then maps the resulting codewords and the
source sequence symbol-by-symbol into a channel input sequence and transmits it. Each
receiver performs channel decoding for the codewords generated through source encod-
ing and then maps the codeword estimates and the received sequence symbol-by-symbol
into reconstructions of the desired source sequences.
For simplicity of presentation, we describe this scheme only for the special case of
lossy communication of a -DMS over a DM-MAC.

Proposition .. Let (U1 , U2 ) be a -DMS and d1 (u1 , û1 ), d2 (u2 , û2 ) be two distortion
measures. The -DMS (U1 , U2 ) can be communicated over a DM-MAC p(y|x1 , x2 ) with
distortion pair (D1 , D2 ) if

I(U1 ; V1 |Q) < I(V1 ; Y , V2 |Q),


I(U2 ; V2 |Q) < I(V2 ; Y , V1 |Q),
I(U1 ; V1 |Q) + I(U2 ; V2 |Q) < I(V1 , V2 ; Y |Q) + I(V1 ; V2 |Q)

for some conditional pmf p(q, 󰑣1 , 󰑣2 |u1 , u2 ) = p(q)p(󰑣1 |u1 , q)p(󰑣2 |u2 , q) and functions
x1 (u1 , 󰑣1 , q), x2 (u2 , 󰑣2 , q), û1 (󰑣1 , 󰑣2 , y, q), and û2 (󰑣1 , 󰑣2 , y, q) such that E(d j (U j , Û j )) ≤
D j , j = 1, 2.

Proof outline. The coding scheme used to prove this proposition is depicted in Fig-
ure .. For simplicity, let Q = . Fix a conditional pmf p(󰑣1 |u1 )p(󰑣2 |u2 ) and functions
x1 (u1 , 󰑣1 ), x2 (u2 , 󰑣2 ), û1 (󰑣1 , 󰑣2 , y), and û 2 (󰑣1 , 󰑣2 , y). For j = 1, 2, randomly and indepen-
dently generate 2nR 󰑗 sequences 󰑣 nj (m j ), m j ∈ [1 : 2nR 󰑗 ], each according to ∏ni=1 pV󰑗 (󰑣 ji ).
Given unj , encoder j = 1, 2 finds an index m j ∈ [1 : 2nR 󰑗 ] such that (unj , 󰑣 nj (m j )) ∈ Tє(n)
󳰀 .

Figure .. Hybrid source–channel coding for communicating a 2-DMS over a DM-MAC: encoder j = 1, 2 maps U_j^n into the codeword v_j^n(M_j) by joint typicality encoding and transmits X_ji = x_j(v_ji(M_j), U_ji); the decoder recovers (M̂1, M̂2) by joint typicality decoding from Y^n and forms Û_ji = û_j(v_1i(M̂1), v_2i(M̂2), Y_i).

By the covering lemma, the probability of error for this joint typicality encoding step tends to zero as n → ∞ if
R1 > I(U1 ; V1 ) + δ(є 󳰀 ),
R2 > I(U2 ; V2 ) + δ(є 󳰀 ).
Encoder j = 1, 2 then transmits x ji = x j (u ji , 󰑣 ji (m j )) for i ∈ [1 : n]. Upon receiving y n ,
the decoder finds the unique index pair (m ̂ 1, m ̂ 2 ) such that (󰑣1n (m ̂ 2 ), y n ) ∈ Tє(n) .
̂ 1 ), 󰑣2n (m
It can be shown that the probability of error for this joint typicality decoding step tends
to zero as n → ∞ if
R1 < I(V1 ; Y , V2 ) − δ(є),
R2 < I(V2 ; Y , V1 ) − δ(є),
R1 + R2 < I(V1 , V2 ; Y) + I(V1 ; V2 ) − δ(є).
The decoder then sets the reconstruction sequences as û ji = û j (󰑣1i (m
̂ 1 ), 󰑣2i (m
̂ 2 ), yi ), i ∈
[1 : n], for j = 1, 2. Eliminating R1 and R2 and following similar arguments to the achiev-
ability proof for distributed lossy source coding completes the proof.

Remark 14.5. By setting V j = (U j , X j ) and Û j = U j , j = 1, 2, Proposition . reduces to


Theorem ..
Remark 14.6. Due to the dependence between the codebook {U jn (m j ) : m j ∈ [1 : 2nR 󰑗 ]}
and the index M j , j = 1, 2, the analysis of the probability error for joint typicality decoding
requires nontrivial extensions of the packing lemma and the proof of achievability for the
DM-MAC.

Remark 14.7. This hybrid coding scheme can be readily extended to the case of sources
with a common part. It can be extended also to the general single-hop network depicted
in Figure . by utilizing the source coding and channel coding schemes discussed in
previous chapters.

SUMMARY

∙ Source–channel separation does not hold in general for communicating correlated


sources over multiuser channels
∙ Joint source–channel coding schemes that utilize the correlation between the sources
for cooperative transmission
∙ Common part of a -DMS
∙ Gray–Wyner system
∙ Notions of common information:
∙ Gács–Körner–Witsenhausen common information K(X; Y)
∙ Wyner’s common information J(X; Y )
∙ Mutual information I(X; Y )
∙ K(X; Y ) ≤ I(X; Y ) ≤ J(X; Y)
∙ Joint Gray–Wyner–Marton coding for lossless communication of a -DMS over a DM-
BC
∙ Hybrid source–channel coding scheme for a general single-hop network

BIBLIOGRAPHIC NOTES

The joint source–channel coding schemes for sending a -DMS over a DM-MAC in The-
orems . and . are due to Cover, El Gamal, and Salehi (), who also showed via
Example . that source–channel separation does not always hold. The definition of a
common part of a -DMS and its characterization are due to Gács and Körner () and
Witsenhausen (). Dueck (a) showed via an example that the coding scheme used
in the proof of Theorem ., which utilizes the common part, is still suboptimal.
Theorem . is due to Gray and Wyner (), who also established the rate–distortion
region for the lossy case. The definitions of common information and their properties can
be found in Wyner (a). Examples . and . are due to Wyner (a) and Witsen-
hausen (a). Theorem . was established by Han and Costa (); see also Kramer
and Nair (). The proof in Section .. is due to Minero and Kim (). The hybrid
source–channel coding scheme in Section .. was proposed by Lim, Minero, and Kim
(), who also established Proposition ..

PROBLEMS

.. Establish the necessary condition for lossless communication of an arbitrary
2-DMS (U1, U2) over a DM-MAC with orthogonal components p(y1 |x1)p(y2 |x2)
in Example ..
.. Consider the -index lossless source coding setup in Section ... Show that the
optimal rate region is given by (.).
.. Provide the details of the proof of Theorem ..
.. Consider the sufficient condition for lossless communication of a -DMS (U1 , U2 )
over a DM-MAC p(y|x1 , x2 ) in Theorem .. Show that the condition does not
change by considering conditional pmfs p(󰑤|u0 )p(x1 |u1 , 󰑤)p(x2 |u2 , 󰑤). Hence,
joint source–channel coding of the common part U0n via the codeword W n does
not help.
.. Provide the details of the proof of Theorem ..
.. Show that the optimal rate region R ∗ of the Gray–Wyner system can be equiva-
lently characterized by the set of rate pairs (R1 , R2 ) such that
R0 + R1 ≥ H(U1 ),
R0 + R2 ≥ H(U2 ),
R0 + R1 + R2 ≥ H(U1 , U2 ).
.. Separate source and channel coding over a DM-BC. Consider the sufficient condi-
tion for lossless communication of a -DMS over a DM-BC via separate source
and channel coding in (.).
(a) Show that, when specialized to a noiseless BC, the condition simplifies to the
set of rate triples (R0 , R1 , R2 ) such that
R0 + R1 ≥ I(U1 , U2 ; V ) + H(U1 |V ),
R0 + R2 ≥ I(U1 , U2 ; V ) + H(U2 |V ),
R0 + R1 + R2 ≥ I(U1 , U2 ; V ) + H(U1 |V ) + H(U2 |V ),
for some conditional pmf p(󰑣|u1 , u2 ).
(b) Show that the above region is equivalent to the optimal rate region for the
Gray–Wyner system in Theorem ..
.. Common information. Consider the optimal rate region R ∗ of the Gray–Wyner
system in Theorem ..
(a) Complete the derivations of the three measures of common information as
extreme points of R ∗ .
(b) Show that the three measures of common information satisfy the inequalities
0 ≤ K(U1 ; U2 ) ≤ I(U1 ; U2 ) ≤ J(U1 ; U2 ) ≤ H(U1 , U2 ).

(c) Show that K(U1 ; U2 ) = I(U1 ; U2 ) = J(U1 ; U2 ) iff U1 = (V , V1 ) and U2 = (V , V2 )


for some (V , V1 , V2 ) ∼ p(󰑣)p(󰑣1 |󰑣)p(󰑣2 |󰑣).
.. Lossy Gray–Wyner system. Consider the Gray–Wyner system in Section .. for
a -DMS (U1 , U2 ) and two distortion measures d1 and d2 . The sources are to
be reconstructed with prescribed distortion pair (D1 , D2 ). Show that the rate–
distortion region R(D1 , D2 ) is the set of rate pairs (R1 , R2 ) such that

R0 ≥ I(U1 , U2 ; V ),
R1 ≥ I(U1 ; Û 1 |V ),
R2 ≥ I(U2 ; Û 2 |V )

for some conditional pmf p(󰑣|u1 , u2 )p(û1 |u1 , 󰑣)p(û2 |u2 , 󰑣) that satisfy the con-
straints E(d j (U j , Û j )) ≤ D j , j = 1, 2.
.. Nested sources over a DM-MAC. Let (U1 , U2 ) be a -DMS with common part U0 =
U2 . We wish to send this -DMS over a DM-MAC p(y|x1 , x2 ) at rates r1 = r2 = r
symbol/transmission. Show that source–channel separation holds for this setting.
Remark: This problem was studied by De Bruyn, Prelov, and van der Meulen
().
.. Nested sources over a DM-BC. Consider the nested -DMS (U1 , U2 ) in the previous
problem. We wish to communicate this 2-DMS over a DM-BC p(y1, y2 |x) at rates
r1 = r2 = r. Show that source–channel separation holds again for this setting.
.. Lossy communication of a Gaussian source over a Gaussian BC. Consider a Gauss-
ian broadcast channel Y1 = X + Z1 and Y2 = X + Z2 , where Z1 ∼ N(0, N1 ) and
Z2 ∼ N(0, N2 ) are noise components with N2 > N1 . Assume average power con-
straint P on X. We wish to communicate a WGN(P) source U with mean squared
error distortions D1 to Y1 and D2 to Y2 at rate r = 1 symbol/transmission.

(a) Find the minimum achievable individual distortions D1 and D2 in terms of P,


N1 , and N2 .

(b) Suppose we use separate source and channel coding by first using successive
refinement coding for the quadratic Gaussian source in Example . and then
using optimal Gaussian BC codes for independent messages. Characterize the
set of achievable distortion pairs (D1 , D2 ) using this scheme.

(c) Now suppose we send the source with no coding, i.e., set Xi = Ui for i ∈ [1 : n],
and use the linear MMSE estimate Û 1i at Y1 and Û 2i at Y2 . Characterize the set
of achievable distortion pairs (D1 , D2 ) using this scheme.

(d) Does source–channel separation hold for communicating a Gaussian source


over a Gaussian BC with squared error distortion measure?

APPENDIX 14A PROOF OF LEMMA 14.1

The proof follows similar steps to the mutual covering lemma in Section .. For each
(u1n , u2n ) ∈ Tє(n)
󳰀 (U1 , U2 ), define

A(u1n , u2n ) = 󶁁(m0 , m1 , m2 ) ∈ [1 : 2nR0 ] × [1 : 2nR1 ] × [1 : 2nR2 ] :


(u1n , u2n , W0n (m0 ), W1n (u1n , m0 , m1 ), W2n (u2n , m0 , m2 )) ∈ Tє(n)
󳰀 󶁑.

Then

P(E0 ) ≤ P󶁁(U1n , U2n ) ∉ Tє(n)


󳰀 󶁑 + 󵠈 p(u1n , u2n ) P󶁁|A(un1 , u2n )| = 0󶁑.
(u1󰑛 ,u2󰑛 )∈T (󰑛)
󳰀
󰜖

By the LLN, the first term tends to zero as n → ∞. To bound the second term, recall from
the proof of the mutual covering lemma that

Var(|A(u1n , u2n )|)


P󶁁|A(un1 , u2n )| = 0󶁑 ≤ .
(E(|A(u1n , u2n )|))2

Now, define the indicator function

1 if 󶀡u1n , u2n , W0n (m0 ), W1n (u1n , m0 , m1 ), W2n (u2n , m0 , m2 )󶀱 ∈ Tє(n)


󳰀 ,
E(m0 , m1 , m2 ) = 󶁇
0 otherwise

for each (m0 , m1 , m2 ). We can then write

|A(u1n , u2n )| = 󵠈 E(m0 , m1 , m2 ),


m0 ,m1 ,m2

Let

p1 = E[E(1, 1, 1)]
= P󶁁󶀡un1 , u2n , W0n (m0 ), W1n (u1n , m0 , m1 ), W2n (u2n , m0 , m2 )󶀱 ∈ Tє(n)
󳰀 󶁑,

p2 = E[E(1, 1, 1)E(1, 2, 1)],


p3 = E[E(1, 1, 1)E(1, 1, 2)],
p4 = E[E(1, 1, 1)E(1, 2, 2)],
p5 = E[E(1, 1, 1)E(2, 1, 1)] = E[E(1, 1, 1)E(2, 1, 2)]
= E[E(1, 1, 1)E(2, 2, 1)] = E[E(1, 1, 1)E(2, 2, 2)] = p12 .

Then
E(|A(un1 , u2n )|) = 󵠈 E[E(m0 , m1 , m2 )] = 2n(R0 +R1 +R2 ) p1
m0 ,m1 ,m2

and

E(|A(u1n , u2n )| 2 ) = 󵠈 E[E(m0 , m1 , m2 )]


m0 ,m1 ,m2

+ 󵠈 󵠈 E[E(m0 , m1 , m2 )E(m0 , m󳰀1 , m2 )]


m0 ,m1 ,m2 m󳰀 ̸=m
1 1

+ 󵠈 󵠈 E[E(m0 , m1 , m2 )E(m0 , m1 , m󳰀2 )]


m0 ,m1 ,m2 m󳰀 ̸=m
2 2

+ 󵠈 󵠈 E[E(m0 , m1 , m2 )E(m0 , m1󳰀 , m2󳰀 )]


m0 ,m1 ,m2 m󳰀 ̸=m ,m󳰀 ̸=m
1 1 2 2

+ 󵠈 󵠈 E[E(m0 , m1 , m2 )E(m0󳰀 , m1󳰀 , m󳰀2 )]


m0 ,m1 ,m2 m󳰀 ̸=m ,m󳰀 ,m󳰀
0 0 1 2

≤ 2n(R0 +R1 +R2 ) p1 + 2n(R0 +2R1 +R2 ) p2 + 2n(R0 +R1 +2R2 ) p3


+ 2n(R0 +2R1 +2R2 ) p4 + 22n(R0 +R1 +R2 ) p5 .

Hence

Var(|A(u1n , u2n )|) ≤ 2n(R0 +R1 +R2 ) p1 + 2n(R0 +2R1 +R2 ) p2 + 2n(R0 +R1 +2R2 ) p3 + 2n(R0 +2R1 +2R2 ) p4 .

Now by the joint typicality lemma, we have


󳰀
p1 ≥ 2−n(I(U1 ,U2 ;W0 )+I(U2 ;W1 |U1 ,W0 )+I(U1 ,W1 ;W2 |U2 ,W0 )+δ(є )) ,
󳰀
p2 ≤ 2−n(I(U1 ,U2 ;W0 )+2I(U2 ,W2 ;W1 |U1 ,W0 )+I(U1 ;W2 |U2 ,W0 )−δ(є )) ,
󳰀
p3 ≤ 2−n(I(U1 ,U2 ;W0 )+I(U2 ;W1 |U1 ,W0 )+2I(U1 ,W1 ,U1 ;W2 |U2 ,W0 )−δ(є )) ,
󳰀
p4 ≤ 2−n(I(U1 ,U2 ;W0 )+2I(U2 ;W1 |U1 ,W0 )+2I(U1 ,W1 ;W2 |U2 ,W0 )−δ(є )) .
Hence
Var(|A(u1n , u2n )|) 󳰀

2
≤ 2−n(R0 +R1 +R2 −I(U1 ,U2 ;W0 )−I(U2 ;W1 |U1 ,W0 )−I(U1 ,W1 ;W2 |U2 ,W0 )−δ(є ))
(E(|A(u1 , u2 )|))
n n
󳰀
+ 2−n(R0 +R2 −I(U1 ,U2 ;W0 )−I(U1 ;W2 |U2 ,W0 )−3δ(є ))
󳰀
+ 2−n(R0 +R1 −I(U1 ,U2 ;W0 )−I(U2 ;W1 |U1 ,W0 )−3δ(є ))
󳰀
+ 2−n(R0 −I(U1 ,U2 ;W0 )−3δ(є )) .

Therefore, P{|A(u1n , u2n )| = 0} tends to zero as n → ∞ if

R0 > I(U1 , U2 ; W0 ) + 3δ(є 󳰀 ),


R0 + R1 > I(U1 , U2 ; W0 ) + I(U2 ; W1 |U1 , W0 ) + 3δ(є 󳰀 ),
R0 + R2 > I(U1 , U2 ; W0 ) + I(U1 ; W2 |U2 , W0 ) + 3δ(є 󳰀 ),
R0 + R1 + R2 > I(U1 , U2 ; W0 ) + I(U2 ; W1 |U1 , W0 ) + I(U1 , W1 ; W2 |U2 , W0 ) + δ(є 󳰀 ).

This completes the proof of Lemma ..


PART III

MULTIHOP NETWORKS
CHAPTER 15

Graphical Networks

So far we have studied single-hop networks in which each node is either a sender or a
receiver. In this chapter, we begin the discussion of multihop networks, where some nodes
can act as both senders and receivers and hence communication can be performed over
multiple rounds. We consider the limits on communication of independent messages over
networks modeled by a weighted directed acyclic graph. This network model represents,
for example, a wired network or a wireless mesh network operated in time or frequency
division, where the nodes may be servers, handsets, sensors, base stations, or routers. The
edges in the graph represent point-to-point communication links that use channel coding
to achieve close to error-free communication at rates below their respective capacities.
We assume that each node wishes to communicate a message to other nodes over this
graphical network. The nodes can also act as relays to help other nodes communicate
their messages. What is the capacity region of this network?
Although communication over such a graphical network is not hampered by noise or
interference, the conditions on optimal information flow are not known in general. The
difficulty arises in determining the optimal relaying strategies when several messages are
to be sent to different destination nodes.
We first consider the graphical multicast network, where a source node wishes to com-
municate a message to a set of destination nodes. We establish the cutset upper bound
on the capacity and show that it is achievable error-free via routing when there is only
one destination, leading to the celebrated max-flow min-cut theorem. When there are
multiple destinations, routing alone cannot achieve the capacity, however. We show that
the cutset bound is still achievable, but using more sophisticated coding at the relays. The
proof of this result involves linear network coding in which the relays perform simple
linear operations over a finite field.
We then consider graphical networks with multiple independent messages. We show
that the cutset bound is tight when the messages are to be sent to the same set of des-
tination nodes (multimessage multicast), and is achieved again error-free using linear
network coding. When each message is to be sent to a different set of destination nodes,
however, neither the cutset bound nor linear network coding is optimal in general.
The aforementioned capacity results can be extended to networks with broadcasting
and cycles. In Chapter , we present a coding scheme that extends network coding to
general networks. This noisy network coding scheme yields an alternative achievability

proof of the network coding theorem that applies to graphical networks with and without
cycles.

15.1 GRAPHICAL MULTICAST NETWORK

Consider a multicast network modeled by a weighted directed acyclic graph G = (N , E , C)


as depicted in Figure .. Here N = [1 : N] is the set of nodes, E ⊂ N × N is the set
of edges, and C = {C jk : ( j, k) ∈ E} is the set of edge weights. Each node represents a
sender–receiver pair and each edge ( j, k) represents a noiseless communication link with
capacity C jk . Source node  wishes to communicate a message M ∈ [1 : 2nR ] to a set of
destination nodes D ∈ N . Each node k ∈ [2 : N] can also act as a relay to help the source
node communicate its message to the destination nodes. Note that in addition to being
noiseless, this network model does not allow for broadcasting or interference. However,
we do not assume any constraints on the functions performed by the nodes; hence general
relaying operations are allowed.
A (2nR , n) code for a graphical multicast network G = (N , E , C) consists of
∙ a message set [1 : 2nR ],
∙ a source encoder that assigns an index m1 j (m) ∈ [1 : 2nC1 󰑗 ] to each message m ∈ [1 :
2nR ] for each edge (1, j) ∈ E,
∙ a set of relay encoders, where encoder k ∈ [2 : N − 1] assigns an index mkl ∈ [1 : 2nC󰑘󰑙 ]
to each received index tuple (m jk : ( j, k) ∈ E) for each (k, l) ∈ E, and

∙ a set of decoders, where decoder k ∈ D assigns an estimate m ̂ k ∈ [1 : 2nR ] or an error


message e to every received index tuple (m jk : ( j, k) ∈ E).
These coding operations are illustrated in Figure ..
We assume that the message M is uniformly distributed over [1 : 2nR ]. The average

Figure .. Graphical multicast network: source node 1 communicates the message M over edges with capacities C jk (such as C12, C13, and C14 to nodes 2, 3, and 4); each destination node j ∈ D forms an estimate M̂ j.



Figure .. Coding operations at the nodes: (a) the source encoder maps the message m into the outgoing indices m1 j; (b) a relay encoder k maps its incoming indices m jk into outgoing indices mkl; (c) a decoder k maps its incoming indices m jk into the estimate m̂ k.

̂ j ̸= M for some j ∈ D}. A rate R is said to


probability of error is defined as Pe(n) = P{M
be achievable if there exists a sequence of (2nR , n) codes such that limn→∞ Pe(n) = 0. The
capacity (maximum flow) of the graphical multicast network is the supremum of the set
of achievable rates.
First consider the following upper bound on the capacity.
Cutset bound. For destination node j ∈ D, define a cut (S , S c ) as a partition of the set of
nodes N such that source node 1 ∈ S and destination node j ∈ S c . The capacity of the
cut is defined as
C(S) = 󵠈 Ckl ,
(k,l)∈E
k∈S, l∈S 󰑐

that is, the sum of the capacities of the edges from the subset of nodes S to its complement
S c . Intuitively, the capacity of this network cannot exceed the smallest cut capacity C(S)
of every cut (S, S c ) and every destination node j ∈ D. We formalize this statement in the
following.

Theorem . (Cutset Bound for the Graphical Multicast Network). The capacity
of the graphical multicast network G = (N , E , C) with destination set D is upper
bounded as
C ≤ min min C(S).
j∈D S⊂N
1∈S, j∈S 󰑐

Proof. To establish this cutset upper bound, consider a cut (S, S c ) such that 1 ∈ S and
j ∈ S c for some j ∈ D. Then for every (2nR , n) code, M ̂ j is a function of M(N , S c ) =
(Mkl : (k, l) ∈ E , l ∈ S ), which in turn is a function of M(S, S c ) = (Mkl : (k, l) ∈ E , k ∈
c

S , l ∈ S c ). Thus by Fano’s inequality,

̂ j ) + nєn
nR ≤ I(M; M
̂ j ) + nєn
≤ H(M
≤ H(M(S, S c )) + nєn
≤ nC(S) + nєn .
366 Graphical Networks

Repeating this argument over all cuts and all destination nodes completes the proof of the
cutset bound.

15.2 CAPACITY OF GRAPHICAL UNICAST NETWORK

The cutset bound is achievable when there is only a single destination node, say, D = {N}.

Theorem . (Max-Flow Min-Cut Theorem). The capacity of the graphical unicast
network G = (N , E , C) with destination node N is

C= min C(S).
S⊂N
1∈S , N∈S 󰑐

Note that it suffices to take the minimum over connected cuts (S , S c ), that is, if k ∈ S
and k ̸= 1, then ( j, k) ∈ E for some j ∈ S, and if j ∈ S c and j ̸= N, then ( j, k) ∈ E for
some k ∈ S c . This theorem is illustrated in the following.
Example .. Consider the graphical unicast network depicted in Figure .. The ca-
pacity of this network is C = 3, with the minimum cut S = {1, 2, 3, 5}, and is achieved by
routing  bit along the path 1 → 2 → 4 → 6 and  bits along the path 1 → 3 → 5 → 6.

S Sc
2 1 4
2 2
1 2 6
M 1 1 ̂
M
2 2
2
3 5
Figure .. Graphical unicast network for Example ..

To prove the max-flow min-cut theorem, we only need to establish achievability, since
the converse follows by the cutset bound.
Proof of achievability. Assume without loss of generality that every node lies on at least
one path from node  to node N. Suppose we allocate rate r jk ≤ C jk to each edge ( j, k) ∈ E
such that
󵠈 r jk = 󵠈 rkl , k ̸= 1, N ,
j:( j,k)∈E l:(k,l)∈E

󵠈 r1k = 󵠈 r jN = R,
k:(1,k)∈E j:( j,N)∈E
15.2 Capacity of Graphical Unicast Network 367

that is, the total incoming information rate at each node is equal to the total outgoing
information rate from it. Then it is straightforward to check that the rate R is achievable
by splitting the message into multiple messages and routing them according to the rate
allocation r jk as in commodity flow. Hence, the optimal rate allocation can be found by
solving the optimization problem

maximize R
subject to 0 ≤ r jk ≤ C jk ,
󵠈 r jk = 󵠈 rkl , k ̸= 1, N ,
j l (.)
󵠈 r1k = R,
k

󵠈 r jN = R.
j

This is a linear program (LP) and the optimal solution can be found by solving its dual
problem, which is another LP (see Appendix E)

minimize 󵠈 λ jk C jk
j,k

subject to λ jk ≥ 0, (.)
󰜈 j − 󰜈k = λ jk ,
󰜈 1 − 󰜈N = 1

with variables λ jk , ( j, k) ∈ E (weights for the link capacities) and 󰜈 j , j ∈ N (differences


of the weights). Since the minimum depends on 󰜈 j , j ∈ [1 : N], only through their differ-
ences, the dual LP is equivalent to

minimize 󵠈 λ jk C jk
j,k

subject to λ jk ≥ 0,
(.)
󰜈 j − 󰜈k = λ jk ,
󰜈1 = 1,
󰜈N = 0.

Now it can be shown (see Problem .) that the set of feasible solutions to (.) is a
polytope with extreme points of the form

1 if j ∈ S ,
󰜈j = 󶁇
0 otherwise,
(.)
1 if j ∈ S , k ∈ S c , ( j, k) ∈ E ,
λ jk = 󶁇
0 otherwise
368 Graphical Networks

for some S such that 1 ∈ S, N ∈ S c , and S and S c are each connected. Furthermore, it can
be readily checked that the value of (.) at each of these extreme solutions corresponds
to C(S). Since the minimum of a linear (or convex in general) function is attained by an
extreme point of the feasible set, the minimum of (.) and, in turn, of (.) is equal
to min C(S), where the minimum is over all (connected) cuts. Finally, since both the
primal and dual optimization problems satisfy Slater’s condition (see Appendix E), the
dual optimum is equal to the primal optimum, i.e., max R = min C(S). This completes
the proof of the max-flow min-cut theorem.

Remark 15.1. The capacity of a unicast graphical network is achieved error-free using
routing. Hence, information in such a network can be treated as water flowing in pipes or
a commodity transported over a network of roads.
Remark 15.2. The max-flow min-cut theorem continues to hold for networks with cycles
or delays (see Chapter ), and capacity is achieved error-free using routing. For networks
with cycles, there is always an optimal routing that does not involve any cycle.

15.3 CAPACITY OF GRAPHICAL MULTICAST NETWORK

The cutset bound turns out to be achievable also when the network has more than one
destination. Unlike the unicast case, however, it is not always achievable using only rout-
ing as demonstrated in the following.
Example . (Butterfly network). Consider the graphical multicast network depicted
in Figure . with C jk = 1 for all ( j, k) and D = {6, 7}. It is easy to see that the cutset
upper bound for this network is C ≤ 2. To send the message M via routing, we split it into
two independent messages M1 at rate R1 and M2 at rate R2 with R = R1 + R2 . If each relay
node k simply forwards its incoming messages (i.e., ∑ j r jk = ∑ l rkl for each relay node k),
then it can be easily seen that R1 + R2 cannot exceed 1 bit per transmission with link (4, 5)
being the main bottleneck. Even if we allow the relay nodes to forward multiple copies of
its incoming messages, we must still have R1 + R2 ≤ 1.

2 6
̂6
M
M1 M1
(M1 , M2 )
(M1 , M2 )
M = (M1 , M2 ) 1 4 5
(M1 , M2 )
M2 M2
̂7
M
3 7
Figure .. Optimal routing for the butterfly network.
15.3 Capacity of Graphical Multicast Network 369

Surprisingly, if we allow simple encoding operations at the relay nodes (network cod-
ing), we can achieve the cutset bound. Let R1 = R2 = 1. As illustrated in Figure ., relay
nodes , , and  forward multiple copies of their incoming messages, and relay node 
sends the modulo- sum of its incoming messages. Then both destination nodes  and 
can recover (M1 , M2 ) error-free, achieving the cutset upper bound. This simple example
shows that treating information as a physical commodity is not optimal in general.

2 M1 6
̂6
M
M1 M1
M1 ⊕ M2
M1 ⊕ M2
M = (M1 , M2 ) 1 4 5
M1 ⊕ M2
M2 M2
M2
̂7
M
3 7
Figure .. Network coding for the butterfly network.

The above network coding idea can be generalized to achieve the cutset bound for
arbitrary graphical multicast networks.

Theorem . (Network Coding Theorem). The capacity of the graphical multicast
network G = (N , E , C) with destination set D is

C = min min C(S).


j∈D S⊂N
1∈S, j∈S 󰑐

As in the butterfly network example, capacity can be achieved error-free using simple
linear operations at the relay nodes.
Remark .. The network coding theorem can be extended to networks with broad-
casting (that is, networks modeled by a hypergraph), cycles, and delays. The proof of the
extension to networks with cycles is sketched in the Bibliographic Notes. In Section .,
we show that network coding and these extensions are special cases of the noisy network
coding scheme.

15.3.1 Linear Network Coding


For simplicity, we first consider a multicast network with integer link capacities, repre-
sented by a directed acyclic multigraph G = (N , E) with links of the same -bit capacity
as depicted in Figure .. Hence, each link of the multigraph G can carry n bits of infor-
mation (a symbol from the finite field 𝔽2󰑛 ) per n-transmission block. Further, we assume
370 Graphical Networks

3 1 M1 ̂1
M
M 1 ̂
M M2 ̂2
M
M3 ̂3
M
1 2

Figure .. Graphical network and its equivalent multigraph.

that R is an integer and thus the message can be represented as M = (M1 , . . . , MR ) with
M j ∈ 𝔽2󰑛 , j ∈ [1 : R].
Given a network modeled by a multigraph (N , E), we denote the set of outgoing edges
from a node k ∈ N by Ek→ and the set of incoming edges to a node k by E→k . A (2nR , n)
linear network code for the multigraph (N , E) consists of
∙ a message set 𝔽R2󰑛 (that is, each message is represented by a vector in the R-dimensional
vector space over the finite field 𝔽2󰑛 ),
∙ a linear source encoder that assigns an index tuple m(E1→ ) = {me ∈ 𝔽2󰑛 : e ∈ E1→ } to
each (m1 , . . . , mR ) ∈ 𝔽R2󰑛 via a linear transformation (with coefficients in 𝔽2󰑛 ),
∙ a set of linear relay encoders, where encoder k assigns an index tuple m(Ek→ ) to each
m(E→k ) via a linear transformation, and
̂ Rj to each m(E→ j ) via a linear
∙ a set of linear decoders, where decoder j ∈ D assigns m
transformation.
These linear network coding operations are illustrated in Figure ..
Note that for each destination node j ∈ D, a linear network code induces a linear
transformation
̂ Rj = A j (α) mR
m

for some A j (α) ∈ 𝔽2R×R


󰑛 , where α is a vector of linear encoding/decoding coefficients with
elements taking values in 𝔽2󰑛 . A rate R is said to be achievable error-free if there exist an
integer n and a vector α such that A j (α) = IR for every j ∈ D. Note that any invertible
A j (α) suffices since the decoder can multiply m ̂ Rj by A−1
j (α) to recover m. The resulting
decoder is still linear (with a different α).
Example .. Consider the -node network depicted in Figure .. A linear network
code with R = 2 induces the linear transformation
̂1
m α α10 α 0 α α2 m1 m
󶁦 󶁶=󶁦 9 󶁶󶁦 6 󶁶󶁦 1 󶁶 󶁦 󶁶 = A(α) 󶁦 1 󶁶 .
̂2
m α11 α12 α5 α8 α7 α3 α4 m2 m2

If A(α) is invertible, then the rate R = 2 can be achieved error-free by substituting the
matrix
α α10
󶁦 9 󶁶
α11 α12
15.3 Capacity of Graphical Multicast Network 371

m1 me1 = α11 m1 + α12 m2 + α13 m3


m2
m3 me2 = α21 m1 + α22 m2 + α23 m3

(a)
󳰀 󳰀
me1󳰀 me3󳰀 = α31 me1󳰀 + α32 me2󳰀
󳰀 󳰀
me4󳰀 = α41 me1󳰀 + α42 me2󳰀
󳰀 󳰀
me2󳰀 me5󳰀 = α51 me1󳰀 + α52 me2󳰀

(b)
󳰀󳰀 󳰀󳰀 󳰀󳰀
me1󳰀󳰀 ̂ j1 = α11
m me1󳰀󳰀 + α12 me2󳰀󳰀 + α13 me3󳰀󳰀
󳰀󳰀 󳰀󳰀 󳰀󳰀
me2󳰀󳰀 ̂ j2 = α21
m me1󳰀󳰀 + α22 me2󳰀󳰀 + α23 me3󳰀󳰀
󳰀󳰀 󳰀󳰀 󳰀󳰀
me3󳰀󳰀 ̂ j3 = α31
m me1󳰀󳰀 + α32 me2󳰀󳰀 + α33 me3󳰀󳰀

(c)

Figure .. Linear network code: (a) source encoder, (b) relay encoder, (c) decoder.

2
2
+ α 2m α6 m
1 12
α 1m
m1 1 4 ̂ 1 = α9 m24 + α10 m34
m
α5 m12
m2 α3 m ̂ 2 = α11 m24 + α12 m34
m
m 23
1 +α + α8
4m 13
2 α 7m
3
Figure .. Linear network code for Example ..

at the decoder with

α9 α10
A−1 (α) 󶁦 󶁶.
α11 α12

Remark .. A general network with noninteger link capacities can be approximated by
a multigraph with links of the same n/k bit capacities (each link carries n information bits
per k transmissions). A (2nR , k) linear code with rate nR/k is defined as before with the
R-dimensional vector space over 𝔽2󰑛 .
372 Graphical Networks

15.3.2 Achievability Proof of the Network Coding Theorem


We show that the cutset bound can be achieved using linear network coding. First con-
sider a graphical unicast network (i.e., D = {N}). A (2nR , n) linear code induces a linear
transformation m ̂ R = A(α)mR for some matrix A(α), where α denotes the coefficients in
the linear encoder and decoder maps. Now replace the coefficients α with an indetermi-
nate vector x and consider the determinant |A(x)| as a multivariate polynomial in x. In
Example ., x = (x1 , . . . , x12 ),
x9 x10 x 0 x x2
A(x) = 󶁦 󶁶󶁦 6 󶁶󶁦 1 󶁶
x11 x12 x5 x8 x7 x3 x4

x1 (x6 x9 + x5 x8 x10 ) + x3 x7 x10 x2 (x6 x9 + x5 x8 x10 ) + x4 x7 x10


=󶁦 󶁶,
x1 (x6 x11 + x5 x8 x12 ) + x3 x7 x12 x2 (x6 x11 + x5 x8 x12 ) + x4 x7 x12

and

|A(x)| = (x1 (x6 x9 + x5 x8 x10 ) + x3 x7 x10 )(x2 (x6 x11 + x5 x8 x12 ) + x4 x7 x12 )
− (x1 (x6 x11 + x5 x8 x12 ) + x3 x7 x12 )(x2 (x6 x9 + x5 x8 x10 ) + x4 x7 x10 ).
In general, |A(x)| is a polynomial in x with binary coefficients, that is, |A(x)| ∈ 𝔽2 [x], the
polynomial ring over 𝔽2 . This polynomial depends on the network topology and the rate
R, but not on n.
We first show that the rate R is achievable iff |A(x)| is nonzero. Suppose R ≤ C, that
is, R is achievable by the max-flow min-cut theorem. Then R is achievable error-free via
routing, which is a special class of linear network coding (with zero or one coefficients).
Hence, there exists a coefficient vector α with components in 𝔽2 such that A(α) = IR .
In particular, |A(α)| = |IR | = 1, which implies that |A(x)| is a nonzero element of the
polynomial ring 𝔽2 [x].
Conversely, given a rate R, suppose that the corresponding |A(x)| is nonzero. Then
there exist an integer n and a vector α with components taking values in 𝔽2󰑛 such that
A(α) is invertible. This is proved using the following.

Lemma .. If P(x) is a nonzero polynomial over 𝔽2 , then there exist an integer n and
a vector α with components taking values in 𝔽2󰑛 such that P(α) ̸= 0.

For example, x 2 + x = 0 for all x ∈ 𝔽2 , but x 2 + x ̸= 0 for some x ∈ 𝔽4 . The proof of


this lemma is given in Appendix A.
Now by Lemma ., |A(α)| is nonzero, that is, A(α) is invertible and hence R is achiev-
able.
Next consider the multicast network with destination set D. If R ≤ C, then by the
above argument, |A j (x)| is a nonzero polynomial over 𝔽2 for every j ∈ D. Since the poly-
nomial ring 𝔽2 [x] is an integral domain, that is, the product of two nonzero elements is
always nonzero (Lidl and Niederreiter , Theorem .), the product ∏ j∈D |A j (x)| is
also nonzero. As before, this implies that there exist n and α with elements in 𝔽2󰑛 such
15.4 Graphical Multimessage Network 373

that |A j (α)| is nonzero for all j ∈ D, that is, A−1


j (α) is invertible for every j ∈ D. This
completes the proof of achievability.

Remark 15.5. It can be shown that block length n ≤ ⌈log(|D|R + 1)⌉ suffices in the above
proof.
Remark 15.6. The achievability proof of the network coding theorem via linear network
coding readily extends to broadcasting. It can be also extended to networks with cycles
by using convolutional codes.

15.4 GRAPHICAL MULTIMESSAGE NETWORK

We now consider the more general problem of communicating multiple independent


messages over a network. As before, we model the network by a directed acyclic graph
G = (N , E , C) as depicted in Figure .. Assume that the nodes are ordered so that there
is no path from node k to node j if j < k. Each node j ∈ [1 : N − 1] wishes to send a
message M j to a set D j ⊆ [ j + 1 : N] of destination nodes. This setting includes the case
where only a subset of nodes are sending messages by taking M j =  (R j = 0) for each
nonsource node j.
A (2nR1 , . . . , 2nR󰑁−1 , n) code for the graphical multimessage network G = (N , E , C)
consists of
∙ message sets [1 : 2nR1 ], . . . , [1 : 2nR󰑁−1 ],
∙ a set of encoders, where encoder k ∈ [1 : N − 1] assigns an index mkl ∈ [1 : 2nC󰑘󰑙 ] to
each received index tuple (m jk : ( j, k) ∈ E) and its own message mk for each (k, l) ∈ E,
and
∙ a set of decoders, where decoder l ∈ [2 : N] assigns an estimate m̂ jl or an error message
e to each received index tuple (mkl : (k, l) ∈ E) for j such that l ∈ D j .

̂ 12
M ̂1j, M
(M ̂ 3j)
2 j
M2

C12

1 C14 N
M1 ̂ 1N
M
4
C13

M3
3 k
̂ 13
M ̂ 2k , M
(M ̂ 3k )

Figure .. Graphical multimessage network.


374 Graphical Networks

Assume that the message tuple (M1 , . . . , MN−1 ) is uniformly distributed over [1 :
2nR1 ] × ⋅ ⋅ ⋅ × [1 : 2nR󰑁−1 ]. The average probability of error is defined as Pe(n) = P{M ̂ jk ̸=
M j for some j ∈ [1 : N − 1], k ∈ D j }. A rate tuple (R1 , . . . , RN−1 ) is said to be achievable
if there exists a sequence of (2nR1 , . . . , 2nR󰑁−1 , n) codes such that limn→∞ Pe(n) = 0. The
capacity region of the graphical network is the closure of the set of achievable rate tuples.
The capacity region for the multimessage setting is not known in general. By extending
the proof of the cutset bound for the multicast case, we can obtain the following outer
bound on the capacity region.

Theorem . (Cutset Bound for the Graphical Multimessage Network). If the
rate tuple (R1 , . . . , RN−1 ) is achievable for the graphical multimessage network
G = (N , E , C) with destination sets (D1 , . . . , DN−1 ), then it must satisfy the inequality

󵠈 R j ≤ C(S)
j∈S:D 󰑗 ∩S 󰑐 ̸=

for all S ⊂ N such that D j ∩ S c ̸=  for some j ∈ [1 : N − 1].

Remark .. As we show in Chapter , this cutset outer bound continues to hold even
when the network has cycles and allows for broadcasting and interaction between the
nodes.
In the following, we consider two special classes of multimessage networks.

15.4.1 Graphical Multimessage Multicast Network


When R2 = ⋅ ⋅ ⋅ = RN−1 = 0, the network reduces to a multicast network and the cutset
bound is tight. More generally, let [1 : k], for some k ≤ N − 1, be the set of source nodes
and assume that the sets of destination nodes are the same for every source, i.e., D j = D
for j ∈ [1 : k]. Hence in this general multimessage multicast setting, every destination
node in D is to recover all the messages. The cutset bound is again tight for this class of
networks and is achieved via linear network coding.

Theorem .. The capacity region of the graphical multimessage multicast network
G = (N , E , C) with source nodes [1 : k] and destination nodes D is the set of rate tuples
(R1 , . . . , Rk ) such that
󵠈 R j ≤ C(S)
j∈S

for all S ⊂ N with [1 : k] ∩ S ̸=  and D ∩ S c ̸= .

It can be easily checked that for k = 1, this theorem reduces to the network coding
theorem. The proof of the converse follows by the cutset bound. For the proof of achiev-
ability, we use linear network coding.
15.4 Graphical Multimessage Network 375

Proof of achievability. Since the messages are to be sent to the same set of destination
nodes, we can treat them as a single large message rather than distinct nonexchangeable
quantities. This key observation makes it straightforward to reduce the multimessage
multicast problem to a single-message multicast one as follows.
Consider the augmented network G 󳰀 depicted in Figure ., where an auxiliary node
0 is connected to every source node j ∈ [1 : k] by an edge (0, j) of capacity R j . Suppose
that the auxiliary node 0 wishes to communicate a message M0 ∈ [1 : 2nR0 ] to D. Then, by
the achievability proof of the network coding theorem for the multicast network, linear
network coding achieves the cutset bound for the augmented network G 󳰀 and hence its
capacity is
C0 = min C(S 󳰀 ),
where the minimum is over all S 󳰀 such that 0 ∈ S 󳰀 and D ∩ S 󳰀c ̸= . Now if the rate tuple
(R1 , . . . , Rk ) satisfies the rate constraints in Theorem ., then it can be easily shown that
k
C0 = 󵠈 R j .
j=1

Hence, there exist an integer n and a coefficient vector α such that for every j ∈ D, the
linear transformation A󳰀j (α) induced by the corresponding linear network code is invert-
ible. But from the structure of the augmented graph G 󳰀 , the linear transformation can be
factored as A󳰀j (α) = A j (α)B j (α), where B j (α) is a square matrix that encodes m0 ∈ 𝔽2󰑛0
C

into (m1 , . . . , mk ) ∈ 𝔽2󰑛0 . Since A󳰀j (α) is invertible, both A j (α) and B j (α) must be invert-
C

ible as well. Therefore, each destination node j ∈ D can recover (m1 , . . . , mk ) as well as
m0 , which establishes the achievability of the rate tuple (R1 , . . . , Rk ).

0 R1
M0 1
R2
2
G
Rk

G󳰀 k

Figure .. Augmented single-message multicast network.

15.4.2 Graphical Multiple-Unicast Network


Consider a multiple-unicast network, where each source j has a single destination, i.e.,
|D j | = 0 or 1 for all j ∈ [1 : N − 1]. If the operations at the nodes are restricted to routing,
then the problem reduces to the well-studied multicommodity flow. The necessary and
376 Graphical Networks

sufficient conditions for optimal multicommodity flow can be found by linear program-
ming as for the max-flow min-cut theorem. This provides an inner bound on the capacity
region of the multiple-unicast network, which is not optimal in general. More generally,
each node can perform linear network coding.
Example .. Consider the -unicast butterfly network depicted in Figure ., where
R3 = ⋅ ⋅ ⋅ = R6 = 0, D1 = {6}, D2 = {5}, and C jk = 1 for all ( j, k) ∈ E. By the cutset bound,
we must have R1 ≤ 1 and R2 ≤ 1 (by setting S = {1, 3, 4, 5} and S = {2, 3, 4, 6}, respec-
tively), which is achievable via linear network coding. In comparison, routing can achieve
at most R1 + R2 ≤ 1, because of the bottleneck edge (3, 4).

1 5
M1 ̂2
M

3 4

M2 ̂1
M
2 6
Figure .. Two-unicast butterfly network.

The cutset bound is not tight in general, however, as demonstrated in the following.
Example .. Consider the -unicast network depicted in Figure ., where R3 = ⋅ ⋅ ⋅ =
R6 = 0, D1 = {6}, D2 = {5}, and C jk = 1 for all ( j, k) ∈ E. It can be easily verified that the
cutset bound is the set of rate pairs (R1 , R2 ) such that R1 ≤ 1 and R2 ≤ 1. By compari-
son, the capacity region is the set of rate pairs (R1 , R2 ) such that R1 + R2 ≤ 1, which is
achievable via routing.

1 5
M1 ̂2
M

3 4

M2 ̂1
M
2 6
Figure .. Two-unicast network for which the cutset bound is not tight.

The capacity region of the multiple-unicast network is not known in general.


Summary 377

SUMMARY

∙ Cutset bounds on the capacity of graphical networks


∙ Max-flow min-cut theorem for graphical unicast networks
∙ Routing alone does not achieve the capacity of general graphical networks
∙ Network coding theorem for graphical multicast networks
∙ Linear network coding achieves the capacity of graphical multimessage multicast net-
works (error-free and with finite block length)
∙ Open problems:
15.1. What is the capacity region of the general graphical -unicast network?
15.2. Does linear network coding achieve the capacity region of graphical multiple-
unicast networks?
15.3. Is the average probability of error capacity region of the general graphical multi-
message network equal to its maximal probability of error capacity region?

BIBLIOGRAPHIC NOTES

The max-flow min-cut theorem was established by Ford and Fulkerson (); see also
Elias, Feinstein, and Shannon (). The capacity and optimal routing can be found by
the Ford–Fulkerson algorithm, which is constructive and more efficient than standard
linear programming algorithms such as the simplex and interior point methods.
The butterfly network in Example . and the network coding theorem are due to
Ahlswede, Cai, Li, and Yeung (). The proof of the network coding theorem in their
paper uses random coding. The source and relay encoding and the decoding mappings are
randomly and independently generated, each according to a uniform pmf. The key step
in the proof is to show that if the rate R is less than the cutset bound, then the end-to-end
mapping is one-to-one with high probability. This proof was extended to cyclic networks
by constructing a time-expanded acyclic network as illustrated in Figure .. The nodes
in the original network are replicated b times and auxiliary source and destination nodes
are added as shown in the figure. An edge is drawn between two nodes in consecutive
levels of the time-expanded network if the nodes are connected by an edge in the original
network. The auxiliary source node is connected to each copy of the original source node
and each copy of a destination node is connected to the corresponding auxiliary desti-
nation node. Note that this time-expanded network is always acyclic. Hence, we can use
the random coding scheme for the acyclic network, which implies that the same message
is in effect sent over b transmission blocks using independent mappings. The key step is
to show that for sufficiently large b, the cutset bound for the time-expanded network is
roughly b times the capacity of the original network.
378 Graphical Networks

Time 1 2 3 4
1

2
2
3
1 4
4

5
3

Figure .. A cyclic network and its time-expanded acyclic network. The cycle
2 → 3 → 4 → 2 in the original network is unfolded to the paths 2(t) → 3(t + 1) →
4(t + 2) → 2(t + 3) in the time-expanded network.

The network coding theorem was subsequently proved using linear coding by Li,
Yeung, and Cai () and Koetter and Médard (). The proof of achievability in Sec-
tion .. follows the latter, who also extended the result to cyclic networks using convo-
lutional codes. Jaggi, Sanders, Chou, Effros, Egner, Jain, and Tolhuizen () developed
a polynomial-time algorithm for finding a vector α that makes A j (α) invertible for each
j ∈ D. In comparison, finding the optimal routing is the same as packing Steiner trees
(Chou, Wu, and Jain ), which is an NP-complete problem. For sufficiently large n, a
randomly generated linear network code achieves the capacity with high probability. This
random linear network coding can be used as both a construction tool and a method of
attaining robustness to link failures and network topology changes (Ho, Médard, Koetter,
Karger, Effros, Shi, and Leong , Chou, Wu, and Jain ).
Multicommodity flow was first studied by Hu (), who established a version of
max-flow min-cut theorem for undirected networks with two commodities. Extensions
to (unicast) networks with more than two commodities can be found, for example, in
Schrijver (). The routing capacity region of a general multimessage network was es-
tablished by Cannons, Dougherty, Freiling, and Zeger (). The capacity of the multi-
message multicast network in Theorem . is due to Ahlswede, Cai, Li, and Yeung ().
Dougherty, Freiling, and Zeger () showed via an ingenuous counterexample that
unlike the multicast case, linear network coding fails to achieve the capacity region of
a general graphical multimessage network error-free. This counterexample hinges on
a deep connection between linear network coding and matroid theory; see Dougherty,
Freiling, and Zeger () and the references therein. Network coding has attracted much
attention from researchers in coding theory, wireless communication, and networking, in
addition to information theorists. Comprehensive treatments of network coding and its
extensions and applications can be found in Yeung, Li, Cai, and Zhang (a,b), Fragouli
and Soljanin (a,b), Yeung (), and Ho and Lun ().
Problems 379

PROBLEMS

.. Show that it suffices to take the minimum over connected cuts in the max-flow
min-cut theorem.
.. Show that it suffices to take n ≤ ⌈log(|D|R + 1)⌉ in the achievability proof of the
network coding theorem in Section ...
.. Prove the cutset bound for the general graphical multimessage network in Theo-
rem ..
.. Duality for the max-flow min-cut theorem. Consider the optimization problem
in (.).
(a) Verify that (.) and (.) are dual to each other by rewriting them in the
standard forms of the LP and the dual LP in Appendix E.
(b) Show that the set of feasible solutions to (.) is characterized by (󰜈1 , . . . , 󰜈N )
such that 󰜈 j ∈ [0, 1], 󰜈1 = 1, 󰜈N = 0, and 󰜈 j ≥ 󰜈k if j precedes k. Verify that
the solution in (.) is feasible.
(c) Using part (b) and the fact that every node lies on some path from node  to
node N, show that every feasible solution to (.) is a convex combination of
extreme points of the form (.).
.. Hypergraphical network. Consider a wireless network modeled by a weighted di-
rected acyclic hypergraph H = (N , E , C), where E now consists of a set of hyper-
edges ( j, N j ) with capacity C jN 󰑗 . Each hyperedge models a noiseless broadcast
channel from node j to a set of receiver nodes N j . Suppose that source node 
wishes to communicate a message M ∈ [1 : 2nR ] to a set of destination nodes D.
(a) Generalize the cutset bound for the graphical multicast network to establish
an upper bound on the capacity of the hypergraphical multicast network.
(b) Show that the cutset bound in part (a) is achievable via linear network coding.
.. Multimessage network. Consider the network depicted in Figure ., where R3 =
⋅ ⋅ ⋅ = R6 = 0, D1 = {4, 5}, D2 = {6}, and C jk = 1 for all ( j, k) ∈ E.

1 4
M1 ̂ 14
M

3 6
̂ 26
M

M2 ̂ 15
M
2 5
Figure .. Multimessage network for which the cutset bound is not tight.
380 Graphical Networks

(a) Show that the cutset bound is the set of rate pairs (R1 , R2 ) such that R1 , R2 ≤ 1.
(b) Show that the capacity region is the set of rate pairs (R1 , R2 ) such that 2R1 +
R2 ≤ 2 and R2 ≤ 1. (Hint: For the proof of the converse, use Fano’s inequality
and the data processing inequality to show that nR1 ≤ I(M ̂ 14 ; M
̂ 15 ) − nєn ≤
I(M34 ; M35 ) − nєn , and n(R1 + R2 ) ≤ H(M34 , M35 ) − nєn .)
Remark: A similar example appeared in Yeung (, Section .) and the hint is
due to G. Kramer.
.. Triangular cyclic network. Consider the -node graphical multiple-unicast net-
work in Figure ., where C12 = C23 = C31 = 1. Node j = 1, 2, 3 wishes to com-
municate a message M j ∈ [1 : 2nR 󰑗 ] to its predecessor node.

M2 ̂3
M

1 3
̂2
M M3

M1 ̂1
M
Figure .. Triangular cyclic network.

(a) Find the cutset bound.


(b) Show that the capacity region is the set of rate triples (R1 , R2 , R3 ) such that
R j + Rk ≤ 1 for j, k = 1, 2, 3 with j ̸= k.
Remark: This problem was studied by Kramer and Savari (), who developed
the edge-cut outer bound.
.. Multiple-unicast routing. Consider a multiple-unicast network G = (N , E , C). A
(2nR1 , . . . , 2nR󰑘 , n) code as defined in Section . is said to be a routing code if each
encoder sends a subset of the incoming bits over each outgoing edge (when the
messages and indices are represented as binary sequences). A rate tuple (R1 , . . . ,
Rk ) is said to be achievable by routing if there exists a sequence of (2nR1 , . . . , 2nR󰑘 , n)
routing codes such that limn→∞ Pe(n) = 0. The routing capacity region of the net-
work G is the closure of all rate tuples achievable by routing.
(a) Characterize the routing capacity region by finding the weighted sum-capacity
∑kj=1 λ j R j using linear programming.
(b) Show that the routing capacity region is achieved via forwarding, that is, no
Appendix 15A Proof of Lemma 15.1 381

duplication of the same information is needed. Thus, for routing over unicast
networks, information from different sources can be treated as physical com-
modities.
(c) Show that the routing capacity region for average probability of error is the
same as that for maximal probability of error.

APPENDIX 15A PROOF OF LEMMA 15.1

Let P(x1 , . . . , xk ) be a nonzero polynomial in 𝔽2 [x1 , . . . , xk ]. We show that if n is suf-


ficiently large, then there exist α1 , . . . , αk ∈ 𝔽2󰑛 such that P(α1 , . . . , αk ) ̸= 0. First, sup-
pose k = 1 and recall the fact that the number of roots of a (single-variable) polynomial
P(x1 ) ∈ 𝔽[x1 ] cannot exceed its degree for any field 𝔽 (Lidl and Niederreiter , The-
orem .). Hence by treating P(x1 ) as a polynomial over 𝔽2󰑛 , there exists an element
α1 ∈ 𝔽2󰑛 with P(α1 ) ̸= 0, if 2n is strictly larger than the degree of the polynomial.
We proceed by induction on the number of variables k. Express the polynomial as

d
j
P(x1 , . . . , xk ) = 󵠈 P j (x2 , . . . , xk )x1 .
j=0

Since P(x1 , . . . , xk ) ̸= 0, P j (x2 , . . . , xk ) ̸= 0 for some j. Then by the induction hypothe-


sis, if n is sufficiently large, there exist α2 , . . . , αk ∈ 𝔽2󰑛 such that P j (α2 , . . . , αk ) ̸= 0 for
some j. But this implies that P(x1 , α2 , . . . , αk ) ∈ 𝔽2󰑛 is nonzero. Hence, using the afore-
mentioned fact on the number of roots of single-variable polynomials, we conclude that
P(α1 , α2 , . . . , αk ) is nonzero for some α1 ∈ 𝔽2󰑛 if n is sufficiently large.
CHAPTER 16

Relay Channels

In this chapter, we begin our discussion of communication over general multihop net-
works. We study the -node relay channel, which is a model for point-to-point com-
munication with the help of a relay, such as communication between two base stations
through both a terrestrial link and a satellite, or between two nodes in a mesh network
with an intermediate node acting as a relay.
The capacity of the relay channel is not known in general. We establish a cutset upper
bound on the capacity and discuss several coding schemes that are optimal in some special
cases. We first discuss the following two extreme schemes.
∙ Direct transmission: In this simple scheme, the relay is not actively used in the com-
munication.
∙ Decode–forward: In this multihop scheme, the relay plays a central role in the com-
munication. It decodes for the message and coherently cooperates with the sender
to communicate it to the receiver. This scheme involves the new techniques of block
Markov coding, backward decoding, and the use of binning in channel coding.
We observe that direct transmission can outperform decode–forward when the channel
from the sender to the relay is weaker than that to the receiver. This motivates the devel-
opment of the following two schemes.
∙ Partial decode–forward: Here the relay recovers only part of the message and the rest
of the message is recovered only by the receiver. We show that this scheme is optimal
for a class of semideterministic relay channels and for relay channels with orthogonal
sender components.
∙ Compress–forward: In this scheme, the relay does not attempt to recover the message.
Instead, it uses Wyner–Ziv coding with the receiver’s sequence acting as side informa-
tion, and forwards the bin index. The receiver then decodes for the bin index, finds
the corresponding reconstruction of the relay received sequence, and uses it together
with its own sequence to recover the message. Compress–forward is shown to be op-
timal for a class of deterministic relay channels and for a modulo- sum relay channel
example whose capacity turns out to be strictly lower than the cutset bound.
16.1 Discrete Memoryless Relay Channel 383

Motivated by wireless networks, we study the following three Gaussian relay channel
models.

∙ Full-duplex Gaussian RC: The capacity for this model is not known for any set of
nonzero channel parameter values. We evaluate and compare the cutset upper bound
and the decode–forward and compress–forward lower bounds. We show that the par-
tial decode–forward lower bound reduces to the largest of the rates for direct trans-
mission and decode–forward.
∙ Half-duplex Gaussian RC with sender frequency division: In contrast to the full-
duplex RC, we show that partial decode–forward is optimal for this model.
∙ Half-duplex Gaussian RC with receiver frequency division: We show that the cutset
bound coincides with the decode–forward bound for a range of channel parameter
values. We then present the amplify–forward coding scheme in which the relay sends
a scaled version of its previously received signal. We generalize amplify–forward to
linear relaying functions that are weighted sums of past received signals and establish
a single-letter characterization of the capacity with linear relaying.

In the last section of this chapter, we study the effect of relay lookahead on capacity. In
the relay channel setup, we assume that the relaying functions depend only on past re-
ceived relay symbols; hence they are strictly causal. Here we allow the relaying functions
to depend with some lookahead on the relay received sequence. We study two extreme
lookahead models—the noncausal relay channel and the causal relay channel. We present
upper and lower bounds on the capacity for these two models that are tight in some cases.
In particular, we show that the cutset bound for the strictly causal relay channel does not
hold for the causal relay channel. We further show that simple instantaneous relaying can
be optimal and achieves higher rates than the cutset bound for the strictly causal relay
channel. We then extend these results to the (full-duplex) Gaussian case. For the non-
causal Gaussian RC, we show that capacity is achieved via noncausal decode–forward if
the channel from the sender to the relay is sufficiently strong, while for the causal Gaussian
RC, we show that capacity is achieved via instantaneous amplify–forward if the channel
from the sender to the relay is sufficiently weaker than the other two channels. These re-
sults are in sharp contrast to the strictly causal case for which capacity is not known for
any nonzero channel parameter values.

16.1 DISCRETE MEMORYLESS RELAY CHANNEL

Consider the -node point-to-point communication system with a relay depicted in Fig-
ure .. The sender (node ) wishes to communicate a message M to the receiver (node )
with the help of the relay (node ). We first consider the discrete memoryless relay channel
(DM-RC) model (X1 × X2 , p(y2 , y3 |x1 , x2 ), Y2 × Y3 ) that consists of four finite sets X1 ,
X2 , Y2 , Y3 , and a collection of conditional pmfs p(y2 , y3 |x1 , x2 ) on Y2 × Y3 .
384 Relay Channels

Relay encoder

Y2n X2n

M X1n Y3n ̂
M
Encoder p(y2 , y3 |x1 , x2 ) Decoder

Figure .. Point-to-point communication system with a relay.

A (2nR , n) code for the DM-RC consists of


∙ a message set [1 : 2nR ],
∙ an encoder that assigns a codeword x1n (m) to each message m ∈ [1 : 2nR ],
∙ a relay encoder that assigns a symbol x2i (y2i−1 ) to each past received sequence y2i−1 ∈
Y2i−1 for each time i ∈ [1 : n], and
̂ or an error message e to each received sequence
∙ a decoder that assigns an estimate m
y3 ∈ Y 3 .
n n

The channel is memoryless in the sense that the current received symbols (Y2i , Y3i ) and
the message and past symbols (m, X1i−1 , X2i−1 , Y2i−1 , Y3i−1 ) are conditionally independent
given the current transmitted symbols (X1i , X2i ).
We assume that the message M is uniformly distributed over the message set. The
̂ ̸= M}. A rate R is said to be achievable
average probability of error is defined as Pe(n) = P{M
for the DM-RC if there exists a sequence of (2 , n) codes such that limn→∞ Pe(n) = 0. The
nR

capacity C of the DM-RC is the supremum of all achievable rates.


The capacity of the DM-RC is not known in general. We discuss upper and lower
bounds on the capacity that are tight for some special classes of relay channels.

16.2 CUTSET UPPER BOUND ON THE CAPACITY

The following upper bound is motivated by the cutset bounds for graphical networks we
discussed in Chapter .

Theorem . (Cutset Bound for the DM-RC). The capacity of the DM-RC is upper
bounded as
C ≤ max min󶁁I(X1 , X2 ; Y3 ), I(X1 ; Y2 , Y3 | X2 )󶁑.
p(x1 ,x2 )

The terms under the minimum in the bound can be interpreted as cooperative multiple
access and broadcast bounds as illustrated in Figure .—the sender cannot transmit
information at a higher rate than if both senders or both receivers fully cooperate.
16.2 Cutset Upper Bound on the Capacity 385

X2 Y2 : X2

X1 Y3 X1 Y3

R < I(X1 , X2 ; Y3 ) R < I(X1 ; Y2 , Y3 |X2 )

Cooperative multiple access bound Cooperative broadcast bound

Figure .. Min-cut interpretation of the cutset bound.

The cutset bound is tight for many classes of DM-RCs with known capacity. However,
it is not tight in general as shown via an example in Section ...
Proof of Theorem .. By Fano’s inequality,

nR = H(M) = I(M; Y3n ) + H(M |Y3n ) ≤ I(M; Y3n ) + nєn ,

where єn tends to zero as n → ∞. We now show that


n n
I(M; Y3n ) ≤ min󶁄󵠈 I(X1i , X2i ; Y3i ), 󵠈 I(X1i ; Y2i , Y3i | X2i )󶁔.
i=1 i=1

To establish the first inequality, consider


n
I(M; Y3n ) = 󵠈 I(M; Y3i |Y3i−1 )
i=1
n
≤ 󵠈 I(M, Y3i−1 ; Y3i )
i=1
n
≤ 󵠈 I(X1i , X2i , M, Y3i−1 ; Y3i )
i=1
n
= 󵠈 I(X1i , X2i ; Y3i ).
i=1

To establish the second inequality, consider

I(M; Y3n ) ≤ I(M; Y2n , Y3n )


n
= 󵠈 I(M; Y2i , Y3i |Y2i−1 , Y3i−1 )
i=1
n
(a)
= 󵠈 I(M; Y2i , Y3i |Y2i−1 , Y3i−1 , X2i )
i=1
386 Relay Channels

n
≤ 󵠈 I(M, Y2i−1 , Y3i−1 ; Y2i , Y3i | X2i )
i=1
n
= 󵠈 I(X1i , M, Y2i−1 , Y3i−1 ; Y2i , Y3i | X2i )
i=1
n
= 󵠈 I(X1i ; Y2i , Y3i | X2i ),
i=1

where (a) follows since X2i is a function of Y2i−1 . Finally, let Q ∼ Unif[1 : n] be inde-
pendent of (X1n , X2n , Y2n , Y3n ) and set X1 = X1Q , X2 = X2Q , Y2 = Y2Q , Y3 = Y3Q . Since Q →
(X1 , X2 ) → (Y2 , Y3 ), we have
n
󵠈 I(X1i , X2i ; Y3i ) = nI(X1 , X2 ; Y3 |Q) ≤ nI(X1 , X2 ; Y3 ),
i=1
n
󵠈 I(X1i ; Y2i , Y3i | X2i ) = nI(X1 ; Y2 , Y3 | X2 , Q) ≤ nI(X1 ; Y2 , Y3 | X2 ).
i=1

Thus
R ≤ min󶁁I(X1 , X2 ; Y3 ), I(X1 ; Y2 , Y3 | X2 )󶁑 + єn .

Taking n → ∞ completes the proof of the cutset bound.

16.3 DIRECT-TRANSMISSION LOWER BOUND

One simple coding scheme for the relay channel is to fix the relay transmission at the most
favorable symbol to the channel from the sender to the receiver and to communicate the
message directly using optimal point-to-point channel coding. The capacity of the relay
channel is thus lower bounded by the capacity of the resulting DMC as

C ≥ max I(X1 ; Y3 | X2 = x2 ). (.)


p(x1 ), x2

This bound is tight for the class of reversely degraded DM-RCs in which

p(y2 , y3 |x1 , x2 ) = p(y3 |x1 , x2 )p(y2 | y3 , x2 ),

that is, X1 → Y3 → Y2 form a Markov chain conditioned on X2 . The proof of the converse
follows by the cutset bound and noting that I(X1 ; Y2 , Y3 |X2 ) = I(X1 ; Y3 |X2 ).
When the relay channel is not reversely degraded, however, we can achieve better rates
by actively using the relay.

16.4 DECODE–FORWARD LOWER BOUND

At the other extreme of direct transmission, the decode–forward coding scheme relies
heavily on the relay to help communicate the message to the receiver. We develop this
16.4 Decode–Forward Lower Bound 387

scheme in three steps. In the first two steps, we use a multihop relaying scheme in which
the receiver treats the transmission from the sender as noise. The decode–forward scheme
improves upon this multihop scheme by having the receiver decode also for the informa-
tion sent directly by the sender.

16.4.1 Multihop Lower Bound


In the multihop relaying scheme, the relay recovers the message received from the sender
in each block and retransmits it in the following block. This gives the lower bound on the
capacity of the DM-RC

C ≥ max min󶁁I(X2 ; Y3 ), I(X1 ; Y2 | X2 )󶁑. (.)


p(x1 )p(x2 )

It is not difficult to show that this lower bound is tight when the DM-RC consists of a
cascade of two DMCs, i.e., p(y2 , y3 |x1 , x2 ) = p(y2 |x1 )p(y3 |x2 ). In this case the capacity
expression simplifies to

C = max min󶁁I(X2 ; Y3 ), I(X1 ; Y2 | X2 )󶁑


p(x1 )p(x2 )

= max min󶁁I(X2 ; Y3 ), I(X1 ; Y2 )󶁑


p(x1 )p(x2 )

= min󶁃max I(X2 ; Y3 ), max I(X1 ; Y2 )󶁓.


p(x2 ) p(x1 )

Achievability of the multihop lower bound uses b transmission blocks, each consisting
of n transmissions, as illustrated in Figure .. A sequence of (b − 1) messages M j , j ∈
[1 : b − 1], each selected independently and uniformly over [1 : 2nR ], is sent over these b
blocks, We assume mb = 1 by convention. Note that the average rate over the b blocks is
R(b − 1)/b, which can be made as close to R as desired.

M1 M2 M3 Mb−1 1
Block 1 Block 2 Block 3 Block b − 1 Block b

Figure .. Multiple transmission blocks used in the multihop scheme.

Codebook generation. Fix the product pmf p(x1 )p(x2 ) that attains the multihop lower
bound in (.). Randomly and independently generate a codebook for each block. For
each j ∈ [1 : b], randomly and independently generate 2nR sequences x1n (m j ), m j ∈ [1 :
2nR ], each according to ∏ni=1 p X1 (x1i ). Similarly, generate 2nR sequences x2n (m j−1 ), m j−1 ∈
[1 : 2nR ], each according to ∏ni=1 p X2 (x2i ). This defines the codebook

C j = 󶁁(x1n (m j ), x2n (m j−1 )) : m j−1 , m j ∈ [1 : 2nR ]󶁑, j ∈ [1 : b].

The codebooks are revealed to all parties.


388 Relay Channels

Encoding. Let m j ∈ [1 : 2nR ] be the new message to be sent in block j. The encoder
transmits x1n (m j ) from codebook C j .
Relay encoding. By convention, let m ̃ 0 = 1. At the end of block j, the relay finds the
unique message m ̃ j such that (x1n (m ̃ j−1 ), y2n ( j)) ∈ Tє(n) . In block j + 1, it transmits
̃ j ), x2n (m
̃ j ) from codebook C j+1 .
x2 (m
n

Since the relay codeword transmitted in a block depends statistically on the message
transmitted in the previous block, we refer to this scheme as block Markov coding.
̂ j such that
Decoding. At the end of block j + 1, the receiver finds the unique message m
(n)
n
̂ j ), y3 ( j + 1)) ∈ Tє .
(x2 (m n

Analysis of the probability of error. We analyze the probability of decoding error for the
message M j averaged over codebooks. Assume without loss of generality that M j = 1. Let
M̃ j be the relay message estimate at the end of block j. Since

̂ j ̸= 1} ⊆ {M
{M ̃ j ̸= 1} ∪ {M
̂ j ̸= M
̃ j },

the decoder makes an error only if one or more of the following events occur:

Ẽ1 ( j) = 󶁁(X1n (1), X2n (M


̃ j−1 ), Y2n ( j)) ∉ Tє(n) 󶁑,
Ẽ2 ( j) = 󶁁(X1n (m j ), X2n (M
̃ j−1 ), Y n ( j)) ∈ T (n) for some m j ̸= 1󶁑,
2 є
̃ j ), Y3n ( j + 1)) ∉ Tє(n) 󶁑,
E1 ( j) = 󶁁(X2n (M
̃ j 󶁑.
E2 ( j) = 󶁁(X2n (m j ), Y3n ( j + 1)) ∈ Tє(n) for some m j ̸= M

Thus, the probability of error is upper bounded as

P(E( j)) = P{M̂ j ̸= 1}


≤ P(Ẽ1 ( j) ∪ Ẽ2 ( j) ∪ E1 ( j) ∪ E2 ( j))
≤ P(Ẽ1 ( j)) + P(Ẽ2 ( j)) + P(E1 ( j)) + P(E2 ( j)),

where the first two terms upper bound P{M ̃ j ̸= 1} and the last two terms upper bound
P{M ̂ j ̸= M̃ j }.
Now, by the independence of the codebooks, the relay message estimate M ̃ j−1 , which
is a function of Y2 ( j − 1) and codebook C j−1 , is independent of the codewords X1n (m j ),
n

X2n (m j−1 ), m j , m j−1 ∈ [1 : 2nR ], from codebook C j . Hence, by the LLN, P(Ẽ1 ( j)) tends
to zero as n → ∞, and by the packing lemma, P(Ẽ2 ( j)) tends to zero as n → ∞ if R <
I(X1 ; Y2 |X2 ) − δ(є). Similarly, by the independence of the codebooks and the LLN,
P(E1 ( j)) tends to zero as n → ∞, and by the same independence and the packing lemma,
P(E2 ( j)) tends to zero as n → ∞ if R < I(X2 ; Y3 ) − δ(є). Thus we have shown that un-
der the given constraints on the rate, P{M ̂ j ̸= M j } tends to zero as n → ∞ for each j ∈
[1 : b − 1]. This completes the proof of the multihop lower bound.
16.4 Decode–Forward Lower Bound 389

16.4.2 Coherent Multihop Lower Bound


The multihop scheme discussed in the previous subsection can be improved by having
the sender and the relay coherently cooperate in transmitting their codewords. With this
improvement, we obtain the lower bound on the capacity of the DM-RC

C ≥ max min󶁁I(X2 ; Y3 ), I(X1 ; Y2 | X2 )󶁑. (.)


p(x1 ,x2 )

Again we use a block Markov coding scheme in which a sequence of (b − 1) i.i.d. messages
M j , j ∈ [1 : b − 1], is sent over b blocks each consisting of n transmissions.
Codebook generation. Fix the pmf p(x1 , x2 ) that attains the lower bound in (.). For
j ∈ [1 : b], randomly and independently generate 2nR sequences x2n (m j−1 ), m j−1 ∈ [1 :
2nR ], each according to ∏ni=1 p X2 (x2i ). For each m j−1 ∈ [1 : 2nR ], randomly and condition-
ally independently generate 2nR sequences x1n (m j |m j−1 ), m j ∈ [1 : 2nR ], each according to
∏ni=1 p X1 |X2 (x1i |x2i (m j−1 )). This defines the codebook

C j = 󶁁(x1n (m j |m j−1 ), x2n (m j−1 )) : m j−1 , m j ∈ [1 : 2nR ]󶁑, j ∈ [1 : b].

The codebooks are revealed to all parties.


Encoding and decoding are explained with the help of Table ..

Block    ⋅⋅⋅ b−1 b

X1 x1n (m1 |1) x1n (m2 |m1 ) x1n (m3 |m2 ) ⋅⋅⋅ x1n (mb−1 |mb−2 ) x1n (1|mb−1 )

Y2 ̃1
m ̃2
m ̃3
m ⋅⋅⋅ ̃ b−1
m 

X2 x2n (1) ̃ 1)
x2n (m ̃ 2)
x2n (m ⋅⋅⋅ ̃ b−2 )
x2n (m ̃ b−1 )
x2n (m

Y3  ̂1
m ̂2
m ⋅⋅⋅ ̂ b−2
m ̂ b−1
m

Table .. Encoding and decoding for the coherent multihop lower bound.

Encoding. Let m j ∈ [1 : 2nR ] be the message to be sent in block j. The encoder transmits
x1n (m j |m j−1 ) from codebook C j , where m0 = mb = 1 by convention.
Relay encoding. By convention, let m ̃ 0 = 1. At the end of block j, the relay finds the
unique message m ̃ j such that (x1n (m
̃ j |m ̃ j−1 ), y2n ( j)) ∈ Tє(n) . In block j + 1, it
̃ j−1 ), x2n (m
̃ j ) from codebook C j+1 .
transmits x2 (m
n

̂ j such that
Decoding. At the end of block j + 1, the receiver finds the unique message m
(n)
n
̂ j ), y3 ( j + 1)) ∈ Tє .
(x2 (m n

Analysis of the probability of error. We analyze the probability of decoding error for M j
averaged over codebooks. Assume without loss of generality that M j−1 = M j = 1. Let M ̃j
390 Relay Channels

be the relay message estimate at the end of block j. As before, the decoder makes an error
only if one or more of the following events occur:
̃ j) = {M
E( ̃ j ̸= 1},
̃ j ), Y n ( j + 1)) ∉ T (n) 󶁑,
E1 ( j) = 󶁁(X2n (M 3 є
̃ j 󶁑.
E2 ( j) = 󶁁(X2n (m j ), Y3n ( j + 1)) ∈ Tє(n) for some m j ̸= M
Thus, the probability of error is upper bounded as
̃ j) ∪ E1 ( j) ∪ E2 ( j)) ≤ P(E(
̂ j ̸= 1} ≤ P(E(
P(E( j)) = P{M ̃ j)) + P(E1 ( j)) + P(E2 ( j)).

Following the same steps to the analysis of the probability of error for the (noncoherent)
multihop scheme, the last two terms, P(E1 ( j)) and P(E2 ( j)), tend to zero as n → ∞ if
̃ j)), define
R < I(X2 ; Y3 ) − δ(є). To upper bound the first term P(E(
Ẽ1 ( j) = 󶁁(X1n (1| M
̃ j−1 ), X n (M
2
̃ j−1 ), Y n ( j)) ∉ T (n) 󶁑,
2 є

Ẽ2 ( j) = 󶁁(X1n (m j | M
̃ j−1 ), X n (M
2
̃ j−1 ), Y n ( j)) ∈ T (n) for some m j ̸= 1󶁑.
2 є

Then
̃ j)) ≤ P(E(
P(E( ̃ j − 1) ∪ Ẽ1 ( j) ∪ Ẽ2 ( j))
̃ j − 1)) + P(Ẽ1 ( j) ∩ Ẽc ( j − 1)) + P(Ẽ2 ( j)).
≤ P(E(
Consider the second term
P(Ẽ1 ( j) ∩ Ẽc ( j − 1)) = P󶁁(X1n (1| M
̃ j−1 ), X n (M
2
̃ j−1 ), Y n ( j)) ∉ T (n) , M
2 є
̃ j−1 = 1󶁑
≤ P󶁁(X1n (1|1), X2n (1), Y2n ( j)) ∉ Tє(n) 󵄨󵄨󵄨󵄨 M
̃ j−1 = 1󶁑,

which, by the independence of the codebooks and the LLN, tends to zero as n → ∞. By
the packing lemma, P(Ẽ2 ( j)) tends to zero as n → ∞ if R < I(X1 ; Y2 |X2 ) − δ(є). Note that
M̃ 0 = 1 by definition. Hence, by induction, P(E( ̃ j)) tends to zero as n → ∞ for every j ∈
[1 : b − 1]. Thus we have shown that under the given constraints on the rate, P{M ̂ j ̸= M j }
tends to zero as n → ∞ for every j ∈ [1 : b − 1]. This completes the proof of achievability
of the coherent multihop lower bound in (.).

16.4.3 Decode–Forward Lower Bound


The coherent multihop scheme can be further improved by having the receiver decode
simultaneously for the messages sent by the sender and the relay. This leads to the follow-
ing.

Theorem . (Decode–Forward Lower Bound). The capacity of the DM-RC is


lower bounded as

C ≥ max min󶁁I(X1 , X2 ; Y3 ), I(X1 ; Y2 | X2 )󶁑.


p(x1 ,x2 )
16.4 Decode–Forward Lower Bound 391

Note that the main difference between this bound and the cutset bound is that the
latter includes Y3 in the second mutual information term.
This lower bound is tight when the DM-RC is degraded, i.e.,

p(y2 , y3 |x1 , x2 ) = p(y2 |x1 , x2 )p(y3 | y2 , x2 ).

The proof of the converse for this case follows by the cutset upper bound in Theorem .
since the degradedness of the channel implies that I(X1 ; Y2 , Y3 |X2 ) = I(X1 ; Y2 |X2 ). We
illustrate this capacity result in the following.
Example . (Sato relay channel). Consider the degraded DM-RC with X1 = Y2 = Y3 =
{0, 1, 2}, X2 = {0, 1}, and Y2 = X1 as depicted in Figure .. With direct transmission,
R0 = 1 bits/transmission can be achieved by setting X2 = 0 (or 1). By comparison, using
the optimal first-order Markov relay function x2i (y2,i−1 ) yields R1 = 1.0437, and using the
optimal second-order Markov relay function x2i (y2,i−1 , y2,i−2 ) yields R2 = 1.0549. Since
the channel is degraded, the capacity coincides with the decode–forward lower bound in
Theorem .. Evaluating this bound yields C = 1.1619.

X1 Y3 X1 Y3
1/2
0 0 0 0
1/2
1 1

1 1
1/2
2 2 2 2
1/2

X2 = 0 X2 = 1

Figure .. Sato relay channel.

16.4.4 Proof via Backward Decoding


Again we consider b transmission blocks, each consisting of n transmissions, and use a
block Markov coding scheme. A sequence of (b − 1) i.i.d. messages M j ∈ [1 : 2nR ], j ∈ [1 :
b − 1], is to be sent over the channel in nb transmissions.
Codebook generation. Fix the pmf p(x1 , x2 ) that attains the lower bound. As in the
coherent multihop scheme, we randomly and independently generate codebooks C j =
{(x1n (m j |m j−1 ), x2n (m j−1 )) : m j−1 , m j ∈ [1 : 2nR ]}, j ∈ [1 : b].
Encoding and backward decoding are explained with the help of Table ..
Encoding. Encoding is again the same as in the coherent multihop scheme. To send m j
in block j, the encoder transmits x1n (m j |m j−1 ) from codebook C j , where m0 = mb = 1 by
convention.
392 Relay Channels

Block    ⋅⋅⋅ b−1 b

X1 x1n (m1 |1) x1n (m2 |m1 ) x1n (m3 |m2 ) ⋅⋅⋅ x1n (mb−1 |mb−2 ) x1n (1|mb−1 )

Y2 ̃1 →
m ̃2 →
m ̃3 →
m ⋅⋅⋅ ̃ b−1
m 

X2 x2n (1) ̃ 1)
x2n (m ̃ 2)
x2n (m ⋅⋅⋅ ̃ b−2 )
x2n (m ̃ b−1 )
x2n (m

Y3  ̂1
m ̂2
←m ⋅⋅⋅ ̂ b−2
←m ̂ b−1
←m

Table .. Encoding and backward decoding for the decode–forward lower bound.

Relay encoding. Relay encoding is also the same as in the coherent multihop scheme.
By convention, let m ̃ 0 = 1. At the end of block j, the relay finds the unique message m ̃j
(n)
n
̃ j |m
such that (x1 (m ̃ j−1 ), x2 (m
n
̃ j−1 ), y2 ( j)) ∈ Tє . In block j + 1, it transmits x2 (m
n n
̃ j ) from
codebook C j+1 .
Backward decoding. Decoding at the receiver is done backwards after all b blocks are
received. For j = b − 1, b − 2, . . . , 1, the receiver finds the unique message m ̂ j such that
(n)
̂ j+1 |m
(x1 (m
n
̂ j ), x2 (m
n
̂ j ), y3 ( j + 1)) ∈ Tє , successively with the initial condition m
n
̂ b = 1.
Analysis of the probability of error. We analyze the probability of decoding error for
the message M j averaged over codebooks. Assume without loss of generality that M j =
M j+1 = 1. The decoder makes an error only if one or more of the following events occur:
̃ j) = {M
E( ̃ j ̸= 1},
̂ j+1 ̸= 1},
E( j + 1) = {M
̂ j+1 | M
E1 ( j) = 󶁁(X1n (M ̃ j ), X n (M
̃ j ), Y n ( j + 1)) ∉ T (n) 󶁑,
2 3 є
̂ j+1 |m j ), X n (m j ), Y n ( j + 1)) ∈ T (n) for some m j ̸= M
E2 ( j) = 󶁁(X1n (M ̃ j 󶁑.
2 3 є

Thus the probability of error is upper bounded as


P(E( j)) = P{M̂ j ̸= 1}
̃ j) ∪ E( j + 1) ∪ E1 ( j) ∪ E2 ( j))
≤ P(E(
̃ j)) + P(E( j + 1)) + P(E1 ( j) ∩ Ẽc ( j) ∩ E c ( j + 1)) + P(E2 ( j)).
≤ P(E(
Following the same steps as in the analysis of the probability of error for the coherent
̃ j)) tends to zero as n → ∞ if R < I(X1 ; Y2 |X2 ) − δ(є).
multihop scheme, the first term P(E(
The third term is upper bounded as
P(E1 ( j) ∩ Ẽc ( j) ∩ E c ( j + 1))
= P(E1 ( j) ∩ {M ̂ j+1 = 1} ∩ { M
̃ j = 1})
̂ j+1 = 1, M
= P󶁁(X1n (1|1), X2n (1), Y3n ( j + 1)) ∉ Tє(n) , M ̃ j = 1󶁑
̃ j = 1󶁑,
≤ P󶁁(X1n (1|1), X2n (1), Y3n ( j + 1)) ∉ Tє(n) | M
16.4 Decode–Forward Lower Bound 393

which, by the independence of the codebooks and the LLN, tends to zero as n → ∞.
By the same independence and the packing lemma, the fourth term P(E2 ( j)) tends to
zero as n → ∞ if R < I(X1 , X2 ; Y3 ) − δ(є). Finally for the second term P(E( j + 1)), note
that M̂ b = Mb = 1. Hence, by induction, P{M ̂ j ̸= M j } tends to zero as n → ∞ for every
j ∈ [1 : b − 1] if the given constraints on the rate are satisfied. This completes the proof of
the decode–forward lower bound.

16.4.5 Proof via Binning


The excessive delay of backward decoding can be alleviated by using binning or sliding
window decoding. We describe the binning scheme here. The proof using sliding window
decoding will be given in Chapter .
In the binning scheme, the sender and the relay cooperatively send the bin index L j of
the message M j (instead of the message itself) in block j + 1 to help the receiver recover
Mj.
Codebook generation. Fix the pmf p(x1 , x2 ) that attains the lower bound. Let 0 ≤ R2 ≤ R.
For each j ∈ [1 : b], randomly and independently generate 2nR2 sequences x2n (l j−1 ), l j−1 ∈
[1 : 2nR2 ], each according to ∏ni=1 p X2 (x2i ). For each l j−1 ∈ [1 : 2nR2 ], randomly and condi-
tionally independently generate 2nR sequences x1n (m j |l j−1 ), m j ∈ [1 : 2nR ], each according
to ∏ni=1 p X1 |X2 (x1i |x2i (l j−1 )). This defines the codebook

C j = 󶁁(x1n (m j | l j−1 ), x2n (l j−1 )) : m j ∈ [1 : 2nR ], l j−1 ∈ [1 : 2nR2 ]󶁑, j ∈ [1 : b].

Partition the set of messages into 2nR2 equal size bins B(l) = [(l − 1)2n(R−R2 ) + 1 :
l2n(R−R2 ) ], l ∈ [1 : 2nR2 ]. The codebooks and bin assignments are revealed to all parties.
Encoding and decoding are explained with the help of Table ..

Block    ⋅⋅⋅ b−1 b

X1 x1n (m1 |1) x1n (m2 |l1 ) x1n (m3 |l2 ) ⋅⋅⋅ x1n (mb−1 |lb−2 ) x1n (1|lb−1 )

Y2 ̃ 1 , ̃l1
m ̃ 2 , ̃l2
m ̃ 3 , ̃l3
m ⋅⋅⋅ ̃ b−1 , ̃lb−1
m 

X2 x2n (1) x2n ( ̃l1 ) x2n ( ̃l2 ) ⋅⋅⋅ x2n ( ̃lb−2 ) x2n ( ̃lb−1 )

̂l , m ̂l , m ̂l , m ̂l , m
Y3  1 ̂1 2 ̂2 ⋅⋅⋅ b−2 ̂ b−2 b−1 ̂ b−1

Table .. Encoding and decoding of the binning scheme for decode–forward.

Encoding. Let m j ∈ [1 : 2nR ] be the new message to be sent in block j and assume that
m j−1 ∈ B(l j−1 ). The encoder transmits x1n (m j |l j−1 ) from codebook C j , where l0 = mb = 1
by convention.
394 Relay Channels

Relay encoding. At the end of block j, the relay finds the unique message m ̃ j such that
n
̃ j | ̃l j−1 ), x2 ( ̃l j−1 ), y2 ( j)) ∈ Tє . If m
(x1 (m n n (n)
̃ j ∈ B( ̃l j ), the relay transmits x2 ( ̃l j ) from code-
n

book C j+1 in block j + 1, where by convention ̃l0 = 1.

Decoding. At the end of block j + 1, the receiver finds the unique index ̂l j such that
(x2n ( ̂l j ), y3n ( j + 1)) ∈ Tє(n) . It then finds the unique message m ̂ j | ̂l j−1 ),
̂ j such that (x1n (m
n ̂ (n)
x2 ( l j−1 ), y3 ( j)) ∈ Tє and m
n
̂ j ∈ B( ̂l j ).
Analysis of the probability of error. We analyze the probability of decoding error for
the message M j averaged over codebooks. Assume without loss of generality that M j =
L j−1 = L j = 1 and let L̃ j be the relay estimate of L j . Then the decoder makes an error only
if one or more of the following events occur:
Ẽ( j − 1) = {L̃ j−1 ≠ 1},
E1 ( j − 1) = {L̂ j−1 ≠ 1},
E1 ( j) = {L̂ j ≠ 1},
E2 ( j) = {(X1n (1| L̂ j−1 ), X2n (L̂ j−1 ), Y3n ( j)) ∉ Tє(n) },
E3 ( j) = {(X1n (m j | L̂ j−1 ), X2n (L̂ j−1 ), Y3n ( j)) ∈ Tє(n) for some m j ≠ 1, m j ∈ B(L̂ j )}.
Thus the probability of error is upper bounded as
P(E( j)) = P{M̂ j ≠ 1}
  ≤ P(Ẽ( j − 1) ∪ E1 ( j − 1) ∪ E1 ( j) ∪ E2 ( j) ∪ E3 ( j))
  ≤ P(Ẽ( j − 1)) + P(E1 ( j)) + P(E1 ( j − 1))
    + P(E2 ( j) ∩ Ẽc ( j − 1) ∩ E1c ( j − 1) ∩ E1c ( j))
    + P(E3 ( j) ∩ Ẽc ( j − 1) ∩ E1c ( j − 1) ∩ E1c ( j)).
Following similar steps to the analysis of the error probability for the coherent multihop scheme with L̃ j−1 replacing M̃ j−1 , the first term P(Ẽ( j − 1)) tends to zero as n → ∞ if R < I(X1 ; Y2 |X2 ) − δ(є). Again following similar steps to the analysis of the error probability for the coherent multihop scheme, the second and third terms, P(E1 ( j)) and P(E1 ( j − 1)), tend to zero as n → ∞ if R2 < I(X2 ; Y3 ) − δ(є). The fourth term is upper bounded as
P(E2 ( j) ∩ Ẽc ( j − 1) ∩ E1c ( j − 1) ∩ E1c ( j))
= P(E2 ( j) ∩ {L̃ j−1 = 1} ∩ {L̂ j−1 = 1} ∩ {L̂ j = 1})
≤ P{(X1n (1|1), X2n (1), Y3n ( j)) ∉ Tє(n) | L̃ j−1 = 1},
which, by the independence of the codebooks and the LLN, tends to zero as n → ∞. The
last term is upper bounded as
P(E3 ( j) ∩ Ẽc ( j − 1) ∩ E1c ( j − 1) ∩ E1c ( j))
= P(E3 ( j) ∩ {L̃ j−1 = 1} ∩ {L̂ j−1 = 1} ∩ {L̂ j = 1})
≤ P{(X1n (m j |1), X2n (1), Y3n ( j)) ∈ Tє(n) for some m j ≠ 1, m j ∈ B(1) | L̃ j−1 = 1},
which, by the same independence and the packing lemma, tends to zero as n → ∞ if R −
R2 < I(X1 ; Y3 |X2 ) − δ(є). Combining the bounds and eliminating R2 , we have shown that
P{M̂ j ̸= M j } tends to zero as n → ∞ for each j ∈ [1 : b − 1] if R < I(X1 ; Y2 |X2 ) − δ(є) and
R < I(X1 ; Y3 |X2 ) + I(X2 ; Y3 ) − 2δ(є) = I(X1 , X2 ; Y3 ) − 2δ(є). This completes the proof of
the decode–forward lower bound using binning.

16.5 GAUSSIAN RELAY CHANNEL

Consider the Gaussian relay channel depicted in Figure ., which is a simple model for
wireless point-to-point communication with a relay. The channel outputs corresponding
to the inputs X1 and X2 are

Y2 = д21 X1 + Z2 ,
Y3 = д31 X1 + д32 X2 + Z3 ,

where д21 , д31 , and д32 are channel gains, and Z2 ∼ N(0, 1) and Z3 ∼ N(0, 1) are independent
noise components. Assume average power constraint P on each of X1 and X2 . Since the
relay can both send X2 and receive Y2 at the same time, this model is sometimes referred
to as the full-duplex Gaussian RC, compared to the half-duplex models we discuss in Sec-
tions .. and ..
We denote the SNR of the direct channel by S31 = д31²P, the SNR of the channel from the sender to the relay by S21 = д21²P, and the SNR of the channel from the relay to the receiver by S32 = д32²P. Note that under this model, the RC cannot be degraded or reversely degraded. In fact, the capacity is not known for any S21 , S31 , S32 > 0.

Figure .. Gaussian relay channel: the sender input X1 reaches the relay output Y2 through gain д21 (with noise Z2 ) and the receiver output Y3 through gain д31 ; the relay input X2 reaches Y3 through gain д32 (with noise Z3 ).

16.5.1 Upper and Lower Bounds on the Capacity of the Gaussian RC


We evaluate the upper and lower bounds we discussed in the previous sections.
Cutset upper bound. The proof of the cutset bound in Theorem . applies to arbitrary
alphabets. By optimizing the bound subject to the power constraints, we can show that it
is attained by jointly Gaussian (X1 , X2 ) (see Appendix 16A) and simplifies to

C ≤ max_{0≤ρ≤1} min{C(S31 + S32 + 2ρ√(S31 S32 )), C((1 − ρ²)(S31 + S21 ))}
  = { C((√(S21 S32 ) + √(S31 (S31 + S21 − S32 )))² / (S31 + S21 ))   if S21 ≥ S32 ,
      C(S31 + S21 )                                                 otherwise.      (.)

Direct-transmission lower bound. It is straightforward to see that the lower bound


in (.) yields the lower bound
C ≥ C(S31 ).

Multihop lower bound. Consider the multihop lower bound in (.) subject to the
power constraints. The distributions on the inputs X1 and X2 that optimize the bound are
not known in general. Assuming X1 and X2 to be Gaussian, we obtain the lower bound

C ≥ min{C(S21 ), C(S32 /(S31 + 1))}. (.)

To prove achievability of this bound, we extend the multihop achievability to the case with
input costs and use the discretization procedure in Section ..
Decode–forward lower bound. Maximizing the decode–forward lower bound in Theo-
rem . subject to the power constraints yields

C ≥ max_{0≤ρ≤1} min{C(S31 + S32 + 2ρ√(S31 S32 )), C(S21 (1 − ρ²))}
  = { C((√(S31 (S21 − S32 )) + √(S32 (S21 − S31 )))² / S21 )   if S21 ≥ S31 + S32 ,
      C(S21 )                                                  otherwise.      (.)

Achievability follows by setting X2 ∼ N(0, P) and X1 = ρX2 + X1′ , where X1′ ∼ N(0, (1 − ρ²)P) is independent of X2 and carries the new message to be recovered first by the relay. Note that when S21 < S31 , the decode–forward rate becomes lower than the direct transmission rate C(S31 ).
Noncoherent decode–forward lower bound. Since implementing coherent communi-
cation is difficult in wireless systems, one may consider a noncoherent decode–forward
coding scheme, where X1 and X2 are independent. This gives the lower bound

C ≥ min{C(S31 + S32 ), C(S21 )}. (.)

This scheme uses the same codebook generation and encoding steps as the (noncoherent)
multihop scheme, but achieves a higher rate by performing simultaneous decoding.
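As a rough numerical illustration (our own sketch, not part of the text), the bounds of this section can be compared by evaluating the closed-form expressions above; the sample SNR values below are arbitrary.

import numpy as np

def C(x):
    # C(x) = (1/2) log2(1 + x), in bits per transmission
    return 0.5 * np.log2(1 + x)

def cutset(S31, S21, S32):
    # cutset upper bound, maximized over the correlation coefficient rho
    rho = np.linspace(0.0, 1.0, 10001)
    return np.max(np.minimum(C(S31 + S32 + 2 * rho * np.sqrt(S31 * S32)),
                             C((1 - rho**2) * (S31 + S21))))

def decode_forward(S31, S21, S32):
    # coherent decode-forward lower bound, maximized over rho
    rho = np.linspace(0.0, 1.0, 10001)
    return np.max(np.minimum(C(S31 + S32 + 2 * rho * np.sqrt(S31 * S32)),
                             C(S21 * (1 - rho**2))))

def multihop(S31, S21, S32):
    return min(C(S21), C(S32 / (S31 + 1)))

def noncoherent_df(S31, S21, S32):
    return min(C(S31 + S32), C(S21))

S31, S21, S32 = 1.0, 4.0, 4.0             # illustrative SNRs
print("direct transmission ", C(S31))
print("multihop            ", multihop(S31, S21, S32))
print("noncoherent DF      ", noncoherent_df(S31, S21, S32))
print("coherent DF         ", decode_forward(S31, S21, S32))
print("cutset upper bound  ", cutset(S31, S21, S32))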

16.6 PARTIAL DECODE–FORWARD LOWER BOUND

In decode–forward, the relay fully recovers the message, which is optimal for the degraded
RC because the relay receives a strictly better version of X1 than the receiver. But in some
cases (e.g., the Gaussian RC with S21 < S31 ), the channel to the relay can be a bottleneck
and decode–forward can be strictly worse than direct transmission. In partial decode–
forward, the relay recovers only part of the message. This yields a tighter lower bound on
the capacity than both decode–forward and direct transmission.
Theorem . (Partial Decode–Forward Lower Bound). The capacity of the DM-
RC is lower bounded as

C ≥ max min󶁁I(X1 , X2 ; Y3 ), I(U ; Y2 | X2 ) + I(X1 ; Y3 | X2 , U)󶁑,


p(u,x1 ,x2 )

where |U| ≤ |X1 |⋅|X2 |.

Note that if we set U = X1 , this lower bound reduces to the decode–forward lower bound, and if we set U = ∅, it reduces to the direct-transmission lower bound.
Proof outline. We use block Markov coding and backward decoding. Divide the message M j , j ∈ [1 : b − 1], into two independent messages M′j at rate R′ and M′′j at rate R′′. Hence R = R′ + R′′. Fix the pmf p(u, x1 , x2 ) that attains the lower bound and randomly generate an independent codebook

C j = {(un (m′j |m′j−1 ), x1n (m′j , m′′j |m′j−1 ), x2n (m′j−1 )) : m′j−1 , m′j ∈ [1 : 2nR′ ], m′′j ∈ [1 : 2nR′′ ]}

for each block j ∈ [1 : b]. The sender and the relay cooperate to communicate m′j to the receiver. The relay recovers m′j at the end of block j using joint typicality decoding (with un (m′j |m′j−1 ) replacing x1n (m j |m j−1 ) in decode–forward). The probability of error for this step tends to zero as n → ∞ if R′ < I(U ; Y2 |X2 ) − δ(є). After receiving all the blocks, the messages m′j , j ∈ [1 : b − 1], are first recovered at the receiver using backward decoding (with (un (m′j+1 |m′j ), x2n (m′j )) replacing (x1n (m j+1 |m j ), x2n (m j )) in decode–forward). The probability of error for this step tends to zero as n → ∞ if R′ < I(U , X2 ; Y3 ) − δ(є). The receiver then finds the unique message m′′j , j ∈ [1 : b − 1], such that (un (m′j |m′j−1 ), x1n (m′j , m′′j |m′j−1 ), x2n (m′j−1 ), y3n ( j)) ∈ Tє(n) . The probability of error for this step tends to zero as n → ∞ if R′′ < I(X1 ; Y3 |U , X2 ) − δ(є). Eliminating R′ and R′′ from the rate constraints establishes the partial decode–forward lower bound in Theorem ..
The partial decode–forward scheme is optimal in some special cases.

16.6.1 Semideterministic DM-RC


Suppose that Y2 is a function of (X1 , X2 ), i.e., Y2 = y2 (X1 , X2 ). Then, the capacity of this
semideterministic DM-RC is

C = max_{p(x1 ,x2 )} min{I(X1 , X2 ; Y3 ), H(Y2 | X2 ) + I(X1 ; Y3 | X2 , Y2 )}. (.)

Achievability follows by setting U = Y2 in the partial decode–forward lower bound in


Theorem ., which is feasible since Y2 is a function of (X1 , X2 ). The converse follows by
the cutset bound in Theorem ..

16.6.2 Relay Channel with Orthogonal Sender Components


The relay channel with orthogonal components is motivated by the fact that in many wire-
less communication systems the relay cannot send and receive in the same time slot or
in the same frequency band. The relay channel model can be specialized to accommo-
date this constraint by assuming orthogonal sender or receiver components. Here we
consider the DM-RC with orthogonal sender components depicted in Figure ., where
X1 = (X1′ , X1′′ ) and p(y2 , y3 |x1 , x2 ) = p(y3 |x1′ , x2 )p(y2 |x1′′ , x2 ). The relay channel with or-
thogonal receiver components is discussed in Section ... It turns out that the capacity
is known for this case.

Proposition .. The capacity of the DM-RC with orthogonal sender components is

C = max_{p(x2 )p(x1′ |x2 )p(x1′′ |x2 )} min{I(X1′ , X2 ; Y3 ), I(X1′′ ; Y2 | X2 ) + I(X1′ ; Y3 | X2 )}.

The proof of achievability uses partial decode–forward with U = X1′′ . The proof of the
converse follows by the cutset bound.

Figure .. Relay channel with orthogonal sender components: X1′′ and X2 drive the channel p(y2 |x1′′ , x2 ) to the relay output Y2 , while X1′ and X2 drive the channel p(y3 |x1′ , x2 ) to the receiver output Y3 .

16.6.3 SFD Gaussian Relay Channel


We consider the Gaussian counterpart of the relay channel with orthogonal sender com-
ponents depicted in Figure ., which we refer to as the sender frequency-division (SFD)
Gaussian RC. In this half-duplex model, the channel from the sender to the relay uses a
separate frequency band. More specifically, in this model X1 = (X1′ , X1′′ ) and

Y2 = д21 X1′′ + Z2 ,
Y3 = д31 X1′ + д32 X2 + Z3 ,

where д21 , д31 , and д32 are channel gains, and Z2 ∼ N(0, 1) and Z3 ∼ N(0, 1) are independent noise components. Assume average power constraint P on each of X1 = (X1′ , X1′′ ) and X2 . The capacity of the SFD Gaussian RC is

C = max_{0≤α,ρ≤1} min{C(αS31 + S32 + 2ρ√(αS31 S32 )), C(ᾱS21 ) + C(α(1 − ρ²)S31 )}. (.)

Achievability is proved by extending the partial decode–forward lower bound in Theo-


rem . to the case with input cost constraints and using the discretization procedure in
Figure .. Sender frequency-division Gaussian relay channel: X1′′ reaches the relay output Y2 through gain д21 (noise Z2 ), while X1′ and the relay input X2 reach the receiver output Y3 through gains д31 and д32 (noise Z3 ).

Section . with U = X1󳰀󳰀 ∼ N(0, αP),


̄ X1󳰀 ∼ N(0, αP), and X2 ∼ N(0, P), where (X1󳰀 , X2 )
is jointly Gaussian with correlation coefficient ρ and is independent of X1󳰀󳰀 . The converse
follows by showing that the capacity in Proposition . is attained by the same choice of
(X1󳰀 , X1󳰀󳰀 , X2 ).

Remark 16.1. It can be readily verified that direct transmission achieves C(S31 ) (corre-
sponding to α = 1 in (.)), while decode–forward achieves min{C(S21 ), C(S32 )} (cor-
responding to α = 0 in (.)). Both of these coding schemes are strictly suboptimal in
general.
Remark 16.2. By contrast, it can be shown that the partial decode–forward lower bound
for the (full-duplex) Gaussian RC in Section . is equal to the maximum of the direct-
transmission and decode–forward lower bounds; see Appendix B. As such, partial
decode–forward does not offer any rate improvement over these simpler schemes for the
full-duplex case.
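As a quick numerical check of Remark 16.1 (our own sketch; the grid search and the sample SNRs are illustrative, not from the text), the SFD capacity expression can be evaluated by a two-dimensional search over (α, ρ) and compared with the direct-transmission and decode–forward rates:

import numpy as np

def C(x):
    return 0.5 * np.log2(1 + x)

def sfd_capacity(S31, S21, S32, grid=401):
    # grid search over the power split alpha and correlation coefficient rho
    alpha = np.linspace(0.0, 1.0, grid)[:, None]
    rho = np.linspace(0.0, 1.0, grid)[None, :]
    mac = C(alpha * S31 + S32 + 2 * rho * np.sqrt(alpha * S31 * S32))
    bc = C((1 - alpha) * S21) + C(alpha * (1 - rho**2) * S31)
    return np.max(np.minimum(mac, bc))

S31, S21, S32 = 2.0, 3.0, 2.0             # illustrative SNRs
print("SFD capacity        ", sfd_capacity(S31, S21, S32))
print("direct (alpha = 1)  ", C(S31))
print("decode-forward      ", min(C(S21), C(S32)))   # alpha = 0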

16.7 COMPRESS–FORWARD LOWER BOUND

In the (partial) decode–forward coding scheme, the relay recovers the entire message (or
part of it). If the channel from the sender to the relay is weaker than the direct channel
to the receiver, this requirement can reduce the rate below that for direct transmission
in which the relay is not used at all. In the compress–forward coding scheme, the relay
helps communication by sending a description of its received sequence to the receiver.
Because this description is correlated with the received sequence, Wyner–Ziv coding is
used to reduce the rate needed to communicate it to the receiver. This scheme achieves
the following lower bound.

Theorem . (Compress–Forward Lower Bound). The capacity of the DM-RC is


lower bounded as

C ≥ max min{I(X1 , X2 ; Y3 ) − I(Y2 ; Ŷ2 | X1 , X2 , Y3 ), I(X1 ; Ŷ2 , Y3 | X2 )},

where the maximum is over all conditional pmfs p(x1 )p(x2 )p( ŷ2 |x2 , y2 ) with |Ŷ2 | ≤
|X2 |⋅|Y2 | + 1.

Compared to the cutset bound in Theorem ., the first term in the minimum is the
multiple access bound without coherent cooperation (X1 and X2 are independent) and
with a subtracted term, and the second term resembles the broadcast bound but with Y2
replaced by the description Ŷ2 .

Remark 16.3. The compress–forward lower bound can be equivalently characterized as

C ≥ max I(X1 ; Ŷ2 , Y3 | X2 ), (.)

where the maximum is over all conditional pmfs p(x1 )p(x2 )p( ŷ2 |x2 , y2 ) such that

I(X2 ; Y3 ) ≥ I(Y2 ; Ŷ2 | X2 , Y3 ).

We establish this equivalence in Appendix C.


Remark 16.4. The bound (before maximization) is not in general convex in p(x1 )p(x2 )
p( ŷ2 |x2 , y2 ) and hence the compress–forward scheme can be improved using coded time
sharing to yield the lower bound

C ≥ max min{I(X1 , X2 ; Y3 |Q) − I(Y2 ; Ŷ2 | X1 , X2 , Y3 , Q), I(X1 ; Ŷ2 , Y3 | X2 , Q)},

where the maximum is over all conditional pmfs p(q)p(x1 |q)p(x2 |q)p( ŷ2 |x2 , y2 , q) with
|Ŷ2 | ≤ |X2 |⋅|Y2 | + 1 and |Q| ≤ 2.

16.7.1 Proof of the Compress–Forward Lower Bound


Again a block Markov coding scheme is used to communicate (b − 1) i.i.d. messages in
b blocks. At the end of block j, a reconstruction sequence ŷ2n ( j) of y2n ( j) conditioned on
x2n ( j) (which is known to both the relay and the receiver) is chosen by the relay. Since the
receiver has side information y3n ( j) about ŷ2n ( j), we use binning as in Wyner–Ziv coding
to reduce the rate necessary to send ŷ2n ( j). The bin index is sent to the receiver in block
j + 1 via x2n ( j + 1). At the end of block j + 1, the receiver decodes for x2n ( j + 1). It then
uses y3n ( j) and x2n ( j) to decode for ŷ2n ( j) and x1n ( j) simultaneously. We now give the details
of the proof.
Codebook generation. Fix the conditional pmf p(x1 )p(x2 )p( ŷ2 |y2 , x2 ) that attains the
lower bound. Again, we randomly generate an independent codebook for each block. For
j ∈ [1 : b], randomly and independently generate 2nR sequences x1n (m j ), m j ∈ [1 : 2nR ],
each according to ∏ni=1 p X1 (x1i ). Randomly and independently generate 2nR2 sequences
x2n (l j−1 ), l j−1 ∈ [1 : 2nR2 ], each according to ∏ni=1 p X2 (x2i ). For each l j−1 ∈ [1 : 2nR2 ], ran-
domly and conditionally independently generate 2nR̃2 sequences ŷ2n (k j |l j−1 ), k j ∈ [1 : 2nR̃2 ], each according to ∏ni=1 pŶ2 |X2 ( ŷ2i |x2i (l j−1 )). This defines the codebook

C j = {(x1n (m j ), x2n (l j−1 ), ŷ2n (k j | l j−1 )) : m j ∈ [1 : 2nR ], l j−1 ∈ [1 : 2nR2 ], k j ∈ [1 : 2nR̃2 ]}.

Partition the set [1 : 2nR̃2 ] into 2nR2 equal-size bins B(l j ), l j ∈ [1 : 2nR2 ]. The codebook and bin assignments are revealed to all parties.
Encoding and decoding are explained with the help of Table ..

Block    1                  2                   3                   ⋅⋅⋅    b−1                        b

X1       x1n (m1 )          x1n (m2 )           x1n (m3 )           ⋅⋅⋅    x1n (mb−1 )                x1n (1)
Y2       ŷ2n (k1 |1), l1    ŷ2n (k2 |l1 ), l2   ŷ2n (k3 |l2 ), l3   ⋅⋅⋅    ŷ2n (kb−1 |lb−2 ), lb−1    —
X2       x2n (1)            x2n (l1 )           x2n (l2 )           ⋅⋅⋅    x2n (lb−2 )                x2n (lb−1 )
Y3       —                  l̂1 , k̂1 , m̂ 1       l̂2 , k̂2 , m̂ 2       ⋅⋅⋅    l̂b−2 , k̂b−2 , m̂ b−2        l̂b−1 , k̂b−1 , m̂ b−1

Table .. Encoding and decoding for the compress–forward lower bound.

Encoding. Let m j ∈ [1 : 2nR ] be the message to be sent in block j. The encoder transmits
x1n (m j ) from codebook C j , where mb = 1 by convention.
Relay encoding. By convention, let l0 = 1. At the end of block j, the relay finds an index k j such that (y2n ( j), ŷ2n (k j |l j−1 ), x2n (l j−1 )) ∈ Tє′(n) . If there is more than one such index, it selects one of them uniformly at random. If there is no such index, it selects an index from [1 : 2nR̃2 ] uniformly at random. In block j + 1 the relay transmits x2n (l j ), where l j is the bin index of k j .

Decoding. Let є > є′. At the end of block j + 1, the receiver finds the unique index l̂ j such that (x2n ( l̂ j ), y3n ( j + 1)) ∈ Tє(n) . It then finds the unique message m̂ j such that (x1n (m̂ j ), x2n ( l̂ j−1 ), ŷ2n (k̂ j | l̂ j−1 ), y3n ( j)) ∈ Tє(n) for some k̂ j ∈ B( l̂ j ).

Analysis of the probability of error. We analyze the probability of decoding error for the
message M j averaged over codebooks. Assume without loss of generality that M j = 1 and
let L j−1 , L j , K j denote the indices chosen by the relay in block j. Then the decoder makes
an error only if one or more of the following events occur:
Ẽ( j) = {(X2n (L j−1 ), Ŷ2n (k j |L j−1 ), Y2n ( j)) ∉ Tє′(n) for all k j ∈ [1 : 2nR̃2 ]},
E1 ( j − 1) = {L̂ j−1 ≠ L j−1 },
E1 ( j) = {L̂ j ≠ L j },
E2 ( j) = {(X1n (1), X2n (L̂ j−1 ), Ŷ2n (K j | L̂ j−1 ), Y3n ( j)) ∉ Tє(n) },
E3 ( j) = {(X1n (m j ), X2n (L̂ j−1 ), Ŷ2n (K j | L̂ j−1 ), Y3n ( j)) ∈ Tє(n) for some m j ≠ 1},
E4 ( j) = {(X1n (m j ), X2n (L̂ j−1 ), Ŷ2n (k̂ j | L̂ j−1 ), Y3n ( j)) ∈ Tє(n) for some k̂ j ∈ B(L̂ j ), k̂ j ≠ K j , m j ≠ 1}.

Thus the probability of error is upper bounded as

P(E( j)) = P{M̂ j ≠ 1}
  ≤ P(Ẽ( j)) + P(E1 ( j − 1)) + P(E1 ( j)) + P(E2 ( j) ∩ Ẽc ( j) ∩ E1c ( j − 1))
    + P(E3 ( j)) + P(E4 ( j) ∩ E1c ( j − 1) ∩ E1c ( j)).

By independence of the codebooks and the covering lemma, the first term P(Ẽ( j)) tends to zero as n → ∞ if R̃ 2 > I(Y2 ; Ŷ2 |X2 ) + δ(є′). Following the analysis of the error probability in the multihop coding scheme, the next two terms P(E1 ( j − 1)) = P{L̂ j−1 ≠ L j−1 } and P(E1 ( j)) = P{L̂ j ≠ L j } tend to zero as n → ∞ if R2 < I(X2 ; Y3 ) − δ(є). The fourth term is upper bounded as

P(E2 ( j) ∩ Ẽc ( j) ∩ E1c ( j − 1)) ≤ P{(X1n (1), X2n (L j−1 ), Ŷ2n (K j |L j−1 ), Y3n ( j)) ∉ Tє(n) | Ẽc ( j)},

which, by the independence of the codebooks and the conditional typicality lemma, tends
to zero as n → ∞. By the same independence and the packing lemma, P(E3 ( j)) tends to
zero as n → ∞ if R < I(X1 ; X2 , Ŷ2 , Y3 ) − δ(є) = I(X1 ; Ŷ2 , Y3 |X2 ) − δ(є). Finally, following
similar steps as in Lemma ., the last term is upper bounded as

P(E4 ( j) ∩ E1c ( j − 1) ∩ E1c ( j))
  ≤ P{(X1n (m j ), X2n (L j−1 ), Ŷ2n (k̂ j |L j−1 ), Y3n ( j)) ∈ Tє(n) for some k̂ j ∈ B(L j ), k̂ j ≠ K j , m j ≠ 1}
  ≤ P{(X1n (m j ), X2n (L j−1 ), Ŷ2n (k̂ j |L j−1 ), Y3n ( j)) ∈ Tє(n) for some k̂ j ∈ B(1), m j ≠ 1}, (.)

which, by the independence of the codebooks, the joint typicality lemma (twice), and the
union of events bound, tends to zero as n → ∞ if R + R̃ 2 − R2 < I(X1 ; Y3 |X2 ) +
I(Ŷ2 ; X1 , Y3 |X2 ) − δ(є). Combining the bounds and eliminating R2 and R̃ 2 , we have shown
that P{M̂ j ≠ M j } tends to zero as n → ∞ for every j ∈ [1 : b − 1] if

R < I(X1 , X2 ; Y3 ) + I(Ŷ2 ; X1 , Y3 | X2 ) − I(Ŷ2 ; Y2 | X2 ) − 2δ(є) − δ(є′)
  (a)
  = I(X1 , X2 ; Y3 ) + I(Ŷ2 ; X1 , Y3 | X2 ) − I(Ŷ2 ; X1 , Y2 , Y3 | X2 ) − δ′(є)
  = I(X1 , X2 ; Y3 ) − I(Ŷ2 ; Y2 | X1 , X2 , Y3 ) − δ′(є),

where (a) follows since Ŷ2 → (X2 , Y2 ) → (X1 , Y3 ) form a Markov chain. This completes
the proof of the compress–forward lower bound.
Remark .. There are several other coding schemes that achieve the compress–forward
lower bound, most notably, the noisy network coding scheme described in Section ..

16.7.2 Compress–Forward for the Gaussian RC


The conditional distribution F(x1 )F(x2 )F( ŷ2 |y2 , x2 ) that attains the compress–forward
lower bound in Theorem . is not known for the Gaussian RC in general. Let X1 ∼
N(0, P), X2 ∼ N(0, P), and Z ∼ N(0, N) be mutually independent and Ŷ2 = Y2 + Z (see
Figure .). Substituting in the compress–forward lower bound in Theorem . and
optimizing over N, we obtain the lower bound

C ≥ C(S31 + S21 S32 /(S31 + S21 + S32 + 1)). (.)

This bound becomes tight as S32 tends to infinity. When S21 is small, the bound can be
improved via time sharing on the sender side.

Figure .. Compress–forward for the Gaussian RC: the relay description is Ŷ2 = Y2 + Z with quantization noise Z ∼ N(0, N).

Figure . compares the cutset bound, the decode–forward lower bound, and the
compress–forward lower bound as a function of S31 for different values of S21 and S32 .
Note that in general compress–forward outperforms decode–forward when the channel
from the sender to the relay is weaker than that to the receiver, i.e., S21 < S31 , or when
the channel from the relay to the receiver is sufficiently strong, specifically if S32 ≥ S21 (1 +
S21 )/S31 − (1 + S31 ). Decode–forward outperforms compress–forward (when the latter is
evaluated using Gaussian distributions) in other regimes. In general, it can be shown that
both decode–forward and compress–forward achieve rates within half a bit of the cutset
bound.
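The comparison just described is easy to reproduce numerically. The following sketch (ours, with arbitrary sample SNRs) evaluates the Gaussian decode–forward and compress–forward expressions in a regime where the sender-to-relay channel is weaker than the direct channel:

import numpy as np

def C(x):
    return 0.5 * np.log2(1 + x)

def decode_forward(S31, S21, S32, grid=10001):
    rho = np.linspace(0.0, 1.0, grid)
    return np.max(np.minimum(C(S31 + S32 + 2 * rho * np.sqrt(S31 * S32)),
                             C(S21 * (1 - rho**2))))

def compress_forward(S31, S21, S32):
    # closed form after optimizing the quantization noise power N
    return C(S31 + S21 * S32 / (S31 + S21 + S32 + 1))

for S31 in [1.0, 4.0]:
    S21, S32 = S31 / 2, 4 * S31           # S21 < S31: weak sender-to-relay channel
    rdf = decode_forward(S31, S21, S32)
    rcf = compress_forward(S31, S21, S32)
    print(f"S31 = {S31:.1f}: DF = {rdf:.3f}, CF = {rcf:.3f}")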

16.7.3 Relay Channel with Orthogonal Receiver Components


As a dual model to the DM-RC with orthogonal sender components (see Section ..),
consider the DM-RC with orthogonal receiver components depicted in Figure .. Here
Y3 = (Y3′ , Y3′′ ) and p(y2 , y3 |x1 , x2 ) = p(y3′ , y2 |x1 )p(y3′′ |x2 ), decoupling the broadcast chan-
nel from the sender to the relay and the receiver from the direct channel from the relay to
the receiver.
The capacity of the DM-RC with orthogonal receiver components is not known in
general. The cutset bound in Theorem . simplifies to

C ≤ max_{p(x1 )p(x2 )} min{I(X1 ; Y3′ ) + I(X2 ; Y3′′ ), I(X1 ; Y2 , Y3′ )}.
(a) S21 = S31 , S32 = 4S31 .    (b) S21 = 4S31 , S32 = S31 .

Figure .. Comparison of the cutset bound RCS , the decode–forward lower bound RDF , and the compress–forward lower bound RCF on the capacity of the Gaussian relay channel, plotted against S31 ∈ [0, 10].
Figure .. Relay channel with orthogonal receiver components: X1 drives the broadcast channel p(y2 , y3′ |x1 ) to the relay output Y2 and the receiver output Y3′ , and X2 drives the separate channel p(y3′′ |x2 ) to the receiver output Y3′′ .

Let C0 = max_{p(x2 )} I(X2 ; Y3′′ ) denote the capacity of the channel from the relay to the receiver. Then the cutset bound can be expressed as

C ≤ max_{p(x1 )} min{I(X1 ; Y3′ ) + C0 , I(X1 ; Y2 , Y3′ )}. (.)

By comparison, the compress–forward lower bound in Theorem . simplifies to

C ≥ max_{p(x1 )p( ŷ2 |y2 )} min{I(X1 ; Y3′ ) − I(Y2 ; Ŷ2 | X1 , Y3′ ) + C0 , I(X1 ; Ŷ2 , Y3′ )}. (.)

These two bounds coincide for the deterministic relay channel with orthogonal receiver components where Y2 is a function of (X1 , Y3′ ). The proof follows by setting Ŷ2 = Y2 in the compress–forward lower bound (.) and using the fact that H(Y2 |X1 , Y3′ ) = 0. Note that in general, the capacity itself depends on p(y3′′ |x2 ) only through C0 .
The following example shows that the cutset bound is not tight in general.
Example . (Modulo- sum relay channel). Consider the DM-RC with orthogonal
receiver components depicted in Figure ., where

Y3󳰀 = X1 ⊕ Z3 ,
Y2 = Z2 ⊕ Z3 ,
and Z2 ∼ Bern(p) and Z3 ∼ Bern(1/2) are independent of each other and of X1 .

C0
Z2 Y2 X2 Y3󳰀󳰀

Z3

X1 Y3󳰀

Figure .. Modulo- sum relay channel.

For C0 ∈ [0, 1], the capacity of this relay channel is

C = 1 − H(p ∗ H −1 (1 − C0 )),

where H −1 (v) ∈ [0, 1/2] is the inverse of the binary entropy function. The proof of achiev-
ability follows by setting Ŷ2 = Y2 ⊕ V , where V ∼ Bern(α) is independent of (X1 , Z2 , Z3 )
and α = H −1 (1 − C0 ), in the compress–forward lower bound in (.). For the proof of the converse, consider

nR ≤ I(X1n ; Y3′n , Y3′′n ) + nєn
   (a)
   = I(X1n ; Y3′n |Y3′′n ) + nєn
   ≤ n − H(Y3′n | X1n , Y3′′n ) + nєn
   = n − H(Z3n |Y3′′n ) + nєn
   (b)
   ≤ n − nH(p ∗ H −1 (H(Y2n |Y3′′n )/n)) + nєn
   (c)
   ≤ n − nH(p ∗ H −1 (1 − C0 )) + nєn ,

where (a) follows by the independence of X1n and (Z2n , Z3n , X2n , Y3′′n ), (b) follows by the vector MGL with Z3n = Y2n ⊕ Z2n , which yields H(Z3n |Y3′′n ) ≥ nH(p ∗ H −1 (H(Y2n |Y3′′n )/n)), and (c) follows since nC0 ≥ I(X2n ; Y3′′n ) ≥ I(Y2n ; Y3′′n ) = n − H(Y2n |Y3′′n ). Note that the cutset bound in (.) simplifies to min{1 − H(p), C0 }, which is strictly larger than the capacity if p ≠ 1/2 and 1 − H(p) ≤ C0 . Hence, the cutset bound is not tight in general.
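The gap between the capacity and the cutset bound can be seen numerically with a short computation (our own sketch; the parameter values are arbitrary, and the inverse binary entropy function is computed by bisection):

import numpy as np

def H(x):
    # binary entropy in bits
    x = np.clip(x, 1e-12, 1 - 1e-12)
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

def H_inv(v):
    # inverse of H on [0, 1/2], by bisection
    lo, hi = 0.0, 0.5
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if H(mid) < v else (lo, mid)
    return (lo + hi) / 2

def conv(a, b):
    # binary convolution a * b
    return a * (1 - b) + (1 - a) * b

p, C0 = 0.1, 0.5
capacity = 1 - H(conv(p, H_inv(1 - C0)))
cutset = min(1 - H(p), C0)
print(f"capacity = {capacity:.4f}, cutset bound = {cutset:.4f}")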

16.8 RFD GAUSSIAN RELAY CHANNEL

The receiver frequency-division (RFD) Gaussian RC depicted in Figure . has orthog-
onal receiver components. In this half-duplex model, the channel from the relay to the
receiver uses a different frequency band from the broadcast channel from the sender to
the relay and the receiver. More specifically, in this model Y3 = (Y3′ , Y3′′ ) and

Y2 = д21 X1 + Z2 ,
Y3′ = д31 X1 + Z3′ ,
Y3′′ = д32 X2 + Z3′′ ,

where д21 , д31 , and д32 are channel gains, and Z2 ∼ N(0, 1), Z3′ ∼ N(0, 1), and Z3′′ ∼ N(0, 1) are independent noise components. Assume average power constraint P on each of X1 and X2 .
The capacity of this channel is not known in general. The cutset upper bound in The-
orem . (under the power constraints) simplifies to

C ≤ { C(S31 ) + C(S32 )   if S21 ≥ S32 (S31 + 1),
      C(S31 + S21 )       otherwise.                (.)

The decode–forward lower bound in Theorem . simplifies to

C ≥ { C(S31 ) + C(S32 )   if S21 ≥ S31 + S32 (S31 + 1),
      C(S21 )             otherwise.                (.)

When S21 ≥ S31 + S32 (S31 + 1), the bounds in (.) and (.) coincide and the capacity
Figure .. Receiver frequency-division Gaussian relay channel: X1 reaches the relay output Y2 through gain д21 (noise Z2 ) and the receiver output Y3′ through gain д31 (noise Z3′ ); X2 reaches the receiver output Y3′′ through gain д32 (noise Z3′′ ).

C = C(S31 ) + C(S32 ) is achieved by decode–forward. If S21 ≤ S31 , the decode–forward


lower bound is worse than the direct-transmission lower bound. As in the full-duplex
case, the partial decode–forward lower bound reduces to the maximum of the direct-
transmission and decode–forward lower bounds, which is in sharp contrast to the sender
frequency-division Gaussian RC; see Remarks . and .. The compress–forward lower
bound in Theorem . with X1 ∼ N(0, P), X2 ∼ N(0, P), and Z ∼ N(0, N), independent
of each other, and Ŷ2 = Y2 + Z, simplifies (after optimizing over N) to
C ≥ C(S31 + S21 S32 (S31 + 1)/(S21 + (S31 + 1)(S32 + 1))). (.)
This bound becomes asymptotically tight as either S31 or S32 approaches infinity. At low
S21 , that is, low SNR for the channel from the sender to the relay, compress–forward out-
performs both direct transmission and decode–forward. Furthermore, the compress–
forward rate can be improved via time sharing at the sender, that is, by having the sender
transmit at power P/α for a fraction α ∈ [0, 1] of the time and at zero power for the rest
of the time.

16.8.1 Linear Relaying for RFD Gaussian RC


Consider the RFD Gaussian RC with the relaying functions restricted to being linear
combinations of past received symbols. Note that under the orthogonal receiver com-
ponents assumption, we can eliminate the delay in relay encoding simply by relabeling
the transmission time for the channel from X2 to Y3󳰀󳰀 . Hence, we equivalently consider re-
laying functions of the form x2i = ∑ij=1 ai j y2 j , i ∈ [1 : n], or in vector notation of the form X2n = AY2n , where A is an n × n lower triangular matrix. This scheme reduces the relay channel to a point-to-point Gaussian channel with input X1n and output Y3n = (Y3′n , Y3′′n ).
Note that linear relaying is considerably simpler to implement in practice than decode–
forward and compress–forward. It turns out that its performance also compares well with
these more complex schemes under certain high SNR conditions.
The capacity with linear relaying, CL , is characterized by the multiletter expression
CL = lim CL(k) ,
k→∞

where
1
CL(k) = sup I(X1k ; Y3k )
F(x1󰑘 ), A
k

and the supremum is over all cdfs F(x1k ) and lower triangular matrices A that satisfy the
sender and the relay power constraints P; see Problems . and . for multiletter
characterizations of the capacity of the DM and Gaussian relay channels. It can be easily
shown that CL(k) is attained by a Gaussian input X1k that satisfies the power constraint.

16.8.2 Amplify–Forward
Consider CL(1) , which is the maximum rate achieved via a simple amplify–forward relaying
scheme. It can be shown that CL(1) is attained by X1 ∼ N(0, P) and X2 = √(P/(S21 + 1)) Y2 . Therefore

CL(1) = C(S31 + S21 S32 /(S21 + S32 + 1)). (.)

This rate approaches the capacity as S32 → ∞.


The amplify–forward rate CL(1) is not convex in P. In fact, it is concave for small P
and convex for large P. Hence the rate can be improved by time sharing between direct
transmission and amplify–forward. Assuming amplify–forward is performed α of the
time and the relay transmits at power P/α during this time, we can achieve the improved
rate
max_{0<α,β≤1} ( ᾱ C(β̄S31 /ᾱ) + α C((β/α)(S31 + S21 S32 /(βS21 + S32 + α))) ). (.)

Figure . compares the cutset bound to the decode–forward, compress–forward,


and amplify–forward lower bounds for different SNRs. Note that compress–forward out-
performs amplify–forward in general, but is significantly more complex to implement.
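The following sketch (ours; the sample SNRs are arbitrary) evaluates the four RFD bounds above at a few operating points, in the spirit of the figure referenced in the text:

import numpy as np

def C(x):
    return 0.5 * np.log2(1 + x)

def cutset_rfd(S31, S21, S32):
    return C(S31) + C(S32) if S21 >= S32 * (S31 + 1) else C(S31 + S21)

def df_rfd(S31, S21, S32):
    return C(S31) + C(S32) if S21 >= S31 + S32 * (S31 + 1) else C(S21)

def cf_rfd(S31, S21, S32):
    return C(S31 + S21 * S32 * (S31 + 1) / (S21 + (S31 + 1) * (S32 + 1)))

def af_rfd(S31, S21, S32):
    return C(S31 + S21 * S32 / (S21 + S32 + 1))

for S31 in [1.0, 5.0]:
    S21, S32 = 3 * S31, S31               # illustrative operating point
    print(f"S31 = {S31}: cutset = {cutset_rfd(S31, S21, S32):.3f}, "
          f"DF = {df_rfd(S31, S21, S32):.3f}, CF = {cf_rfd(S31, S21, S32):.3f}, "
          f"AF = {af_rfd(S31, S21, S32):.3f}")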

16.8.3* Linear Relaying Capacity of the RFD Gaussian RC


We can establish the following single-letter characterization of the capacity with linear
relaying.

Theorem .. The linear relaying capacity of the RFD Gaussian RC is

4 2 2
β0 P βjP д21 д32 η j
CL = max󶁦α0 C󶀦 󶀶 + 󵠈 α j C󶀦 󶀤1 + 2 η
󶀴󶀶󶁶,
α0 j=1 αj 1 + д32 j

where the maximum is over α j , β j ≥ 0 and η j > 0 such that ∑4j=0 α j = ∑4j=0 β j = 1 and
∑4j=1 η j 󶀢д21
2
β j + α j 󶀲 = P.
(a) S21 = S31 , S32 = 3S31 .    (b) S21 = 3S31 , S32 = S31 .

Figure .. Comparison of the cutset bound RCS , the decode–forward lower bound RDF , the compress–forward lower bound RCF , and the amplify–forward lower bound RAF on the capacity of the RFD Gaussian RC, plotted against S31 ∈ [0, 10].
Proof outline. Assume without loss of generality that д31 = 1. Then CL(k) is the solution
to the optimization problem
maximize    (1/(2k)) log( det[ I + KX1 , д21д32 KX1 Aᵀ ; д21д32 AKX1 , I + д21²д32² AKX1 Aᵀ + д32² AAᵀ ]
                           / det[ I , 0 ; 0 , I + д32² AAᵀ ] )
subject to  KX1 ⪰ 0,
            tr(KX1 ) ≤ kP,
            tr(д21² KX1 AᵀA + AᵀA) ≤ kP,
            A lower triangular,

where KX1 = E(X1k (X1k )T ) and A are the optimization variables. This is a nonconvex prob-
lem in (KX1 , A) with k 2 + k variables. For a fixed A, the problem is convex in KX1 and has
a water-filling solution. However, finding A for a fixed KX1 is a nonconvex problem.
Now it can be shown that it suffices to consider diagonal KX1 and A. Thus, the opti-
mization problem simplifies to

maximize    (1/(2k)) log ∏_{j=1}^{k} (1 + σ j (1 + д21²д32²a j² /(1 + д32²a j²)))
subject to  σ j ≥ 0, j ∈ [1 : k],
            ∑_{j=1}^{k} σ j ≤ kP,
            ∑_{j=1}^{k} a j²(1 + д21²σ j ) ≤ kP.

While this is still a nonconvex optimization problem, the problem now involves only 2k
variables. Furthermore, it can be shown that at the optimum point, if σ j = 0, then a j = 0,
and conversely if a j = a j′ = 0, then σ j = σ j′ . Thus, the optimization problem can be fur-
ther simplified to
maximize    (1/(2k)) log( (1 + kβ0 P/k0 )^{k0 } ∏_{j=k0 +1}^{k} (1 + σ j (1 + д21²д32²a j² /(1 + д32²a j²))) )
subject to  a j > 0, j ∈ [k0 + 1 : k],
            ∑_{j=k0 +1}^{k} σ j ≤ k(1 − β0 )P,
            ∑_{j=k0 +1}^{k} a j²(1 + д21²σ j ) ≤ kP.
By the KKT condition, it can be shown that at the optimum, there are no more than four
distinct nonzero (σ j , a j ) pairs. Hence, CL(k) is the solution to the optimization problem
maximize    (1/(2k)) log( (1 + kβ0 P/k0 )^{k0 } ∏_{j=1}^{4} (1 + σ j (1 + д21²д32²a j² /(1 + д32²a j²)))^{k j } )
subject to  a j > 0, j ∈ [1 : 4],
            ∑_{j=1}^{4} k j σ j ≤ k(1 − β0 )P,
            ∑_{j=1}^{4} k j a j²(1 + д21²σ j ) ≤ kP,
            ∑_{j=0}^{4} k j = k,

where k j is a new optimization variable that denotes the number of times the pair (σ j , a j )
is used during transmission. Taking k → ∞ completes the proof.
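The reduced optimization over the diagonal parameters (σ j , a j ) is simple enough to explore numerically. The sketch below is ours; it normalizes д31 = 1 as in the proof outline and uses a crude random search, so it only lower-bounds CL(k) for the chosen k, and the gains and power are arbitrary sample values:

import numpy as np

def rate(sigma, a, g21, g32, k):
    # (1/2k) sum_j log2(1 + sigma_j (1 + g21^2 g32^2 a_j^2 / (1 + g32^2 a_j^2)))
    snr = sigma * (1 + g21**2 * g32**2 * a**2 / (1 + g32**2 * a**2))
    return np.sum(0.5 * np.log2(1 + snr)) / k

def linear_relaying_lb(P, g21, g32, k=4, trials=50000, seed=0):
    rng = np.random.default_rng(seed)
    best = 0.0
    for _ in range(trials):
        sigma = rng.random(k)
        sigma *= k * P / np.sum(sigma)                        # sum sigma_j = kP
        a2 = rng.random(k)
        a2 *= k * P / np.sum(a2 * (1 + g21**2 * sigma))       # relay power constraint
        best = max(best, rate(sigma, np.sqrt(a2), g21, g32, k))
    return best

P, g21, g32 = 2.0, 1.5, 1.0
print("random-search lower bound on C_L^(4):", linear_relaying_lb(P, g21, g32))
print("amplify-forward C_L^(1):",
      0.5 * np.log2(1 + P + (g21**2 * P) * (g32**2 * P)
                    / (g21**2 * P + g32**2 * P + 1)))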
Remark .. This is a rare example for which there is no known single-letter mutual in-
formation characterization of the capacity, yet we are able to reduce the multiletter char-
acterization directly to a computable characterization.

16.9 LOOKAHEAD RELAY CHANNELS

The relay channel can be viewed as a point-to-point communication system with side in-
formation about the input X1 available at the receiver through the relay. As we have seen
in Chapters  and , the degree to which side information can help communication de-
pends on its temporal availability (causal versus noncausal). In our discussion of the relay
channel so far, we have assumed that the relaying functions depend only on the past re-
ceived relay sequence, and hence the side information at the receiver is available strictly
causally. In this section, we study the relay channel with causal or lookahead relaying
functions. The lookahead at the relay may be, for example, the result of a difference be-
tween the arrival times of the signal from the sender to the receiver and to the relay. If
the signal arrives at both the relay and the receiver at the same time, then we obtain the
strictly causal relaying functions assumed so far. If the signal arrives at the receiver later
than at the relay, then we effectively have lookahead relaying functions.
To study the effect of lookahead on the capacity of the relay channel, consider the
lookahead DM-RC (X1 × X2 , p(y2 |x1 )p(y3 |x1 , x2 , y2 ), Y2 × Y3 , l) depicted in Figure ..
The integer parameter l specifies the amount of relaying lookahead. Note that here we
define the lookahead relay channel as p(y2 |x1 )p(y3 |x1 , x2 , y2 ), since the conditional pmf
p(y2 , y3 |x1 , x2 ) depends on the code due to the instantaneous or lookahead dependency of
X2 on Y2 . The channel is memoryless in the sense that p(y2i |x1i , y2i−1 , m) = pY2 |X1 (y2i |x1i )
and p(y3i |x1i , x2i , y2i , y3i−1 , m) = pY3 |X1 ,X2 ,Y2 (y3i |x1i , x2i , y2i ).
A (2nR , n) code for the lookahead relay channel consists of


∙ a message set [1 : 2nR ],
∙ an encoder that assigns a codeword x1n (m) to each message m ∈ [1 : 2nR ],
∙ a relay encoder that assigns a symbol x2i (y2i+l ) to each sequence y2i+l for i ∈ [1 : n],
where the symbols that do not have positive time indices or time indices greater than
n are arbitrary, and
∙ a decoder that assigns an estimate m̂(y3n ) or an error message e to each received sequence y3n .
We assume that the message M is uniformly distributed over [1 : 2nR ]. The definitions
of probability of error, achievability, and capacity C l are as before. Clearly, C l is mono-
tonically nondecreasing in l. The capacity C l of the lookahead DM-RC is not known in
general for any finite or unbounded l.

Figure .. Lookahead relay channel: at time i the relay transmits X2i (Y2i+l ), a function of its received sequence up to time i + l, over the channel p(y2 |x1 )p(y3 |x1 , x2 , y2 ).

The DM-RC we studied earlier corresponds to lookahead parameter l = −1, or equiv-


alently, a delay of . We denote its capacity C−1 by C. In the following, we discuss two
special cases.
∙ Noncausal relay channel: For this setup, l is unbounded, that is, the relaying func-
tions can depend on the entire relay received sequence y2n . We denote the capacity of
the noncausal relay channel by C∞ . The purpose of studying this extreme case is to
quantify the limit on the potential gain from allowing lookahead at the relay.
∙ Causal relay channel: for this setup, the lookahead parameter l = 0, that is, the relaying
function at time i can depend only on the past and present relay received symbols y2i
(instead of y2i−1 as in the DM-RC). We investigate the effect of this change on the
relaying gain.

16.9.1 Noncausal Relay Channels


The noncausal relay allows for arbitrarily large lookahead relaying functions. Its capacity
C∞ is not known in general. We establish upper and lower bounds on C∞ that are tight in
some cases, and show that C∞ can be strictly larger than the cutset bound on the capacity
C of the (strictly causal) relay channel in Theorem .. Hence C∞ can be larger than the
capacity C itself.
We first consider the following upper bound on C∞ .

Theorem . (Cutset Bound for the Noncausal DM-RC). The capacity of the non-
causal DM-RC p(y2 |x1 )p(y3 |x1 , x2 , y2 ) is upper bounded as

C∞ ≤ max_{p(x1 )p(u|x1 ,y2 ), x2 (u,y2 )} min{I(X1 ; Y2 ) + I(X1 ; Y3 | X2 , Y2 ), I(U , X1 ; Y3 )}.

The proof follows by noting that a (2nR , n) code for the noncausal relay channel in-
duces a joint pmf of the form

p(m, x1n , x2n , y2n , y3n ) = 2−nR p(x1n |m)(∏ni=1 pY2 |X1 (y2i |x1i )) p(x2n | y2n )(∏ni=1 pY3 |X1 ,X2 ,Y2 (y3i |x1i , x2i , y2i ))

and using standard converse proof arguments.


By extending decode–forward to the noncausal case, we can establish the following
lower bound.

Theorem . (Noncausal Decode–Forward Lower Bound). The capacity of the


noncausal DM-RC p(y2 |x1 )p(y3 |x1 , x2 , y2 ) is lower bounded as

C∞ ≥ max_{p(x1 ,x2 )} min{I(X1 ; Y2 ), I(X1 , X2 ; Y3 )}.

To prove achievability, fix the pmf p(x1 , x2 ) that attains the lower bound and randomly
and independently generate 2nR sequence pairs (x1n , x2n )(m), m ∈ [1 : 2nR ], each according
to ∏ni=1 p X1 ,X2 (x1i , x2i ). Since the relay knows y2n in advance, it decodes for the message
m before transmission commences. The probability of error at the relay tends to zero
as n → ∞ if R < I(X1 ; Y2 ) − δ(є). The sender and the relay then cooperatively transmit
(x1n , x2n )(m) and the receiver decodes for m. The probability of error at the receiver tends
to zero as n → ∞ if R < I(X1 , X2 ; Y3 ) − δ(є). Combining the two conditions completes
the proof.
This lower bound is tight in some special cases.
Example . (Noncausal Sato relay channel). Consider the noncausal version of the
Sato relay channel in Example .. We showed that the capacity for the strictly causal case
is C = 1.1619 and coincides with the cutset bound in Theorem .. Now for the noncausal
case, note that by the degradedness of the channel, I(X1 ; Y3 |X2 , Y2 ) = 0, and p(x2 |x1 , y2 ) =
p(x2 |x1 ), since Y2 = X1 . Hence, the noncausal decode–forward lower bound in Theo-
rem . coincides with the cutset bound in Theorem . (maximized by setting U =
(X2 , Y2 ) = (X1 , X2 )) and characterizes the capacity C∞ . Optimizing the capacity expres-
sion C∞ = max p(x1 ,x2 ) min{I(X1 ; Y2 ), I(X1 , X2 ; Y3 )} by setting p(x1 , x2 ) as p X1 ,X2 (0, 1) =
p X1 ,X2 (1, 0) = p X1 ,X2 (1, 1) = p X1 ,X2 (2, 1) = 1/18 and p X1 ,X2 (0, 0) = p X1 ,X2 (2, 0) = 7/18, we
obtain C∞ = log(9/4) = 1.1699. Therefore, for this example C∞ is strictly larger than the
cutset bound on the capacity C of the relay channel in Theorem ..
As another example, consider the noncausal version of the Gaussian relay channel in
Section .. By evaluating the cutset bound in Theorem . and the noncausal decode–
forward lower bound in Theorem . with average power constraint P on each of X1 and
X2 , we can show that if S21 ≥ S31 + S32 , the capacity is
C∞ = C(S31 + S32 + 2√(S31 S32 )). (.)
As in the Sato relay channel, this capacity is strictly larger than the cutset bound for the
Gaussian RC in (.).

16.9.2 Causal Relay Channels


We now consider the causal DM-RC, where for each i ∈ [1 : n], the relay encoder assigns a
symbol x2i (y2i ) to every y2i ∈ Y2i , that is, the relaying lookahead is l = 0. Again the capacity
C0 is not known in general. We provide upper and lower bounds on C0 and show that rates
higher than the cutset bound on the capacity C of the corresponding DM-RC can still be
achieved.
Consider the following upper bound on the capacity.

Theorem . (Cutset Bound for the Causal DM-RC). The capacity of the causal
DM-RC p(y2 |x1 )p(y3 |x1 , x2 , y2 ) is upper bounded as

C0 ≤ max_{p(u,x1 ), x2 (u,y2 )} min{I(X1 ; Y2 , Y3 |U ), I(U , X1 ; Y3 )},

where |U | ≤ |X1 |⋅|X2 | + 1.

Note that restricting x2 to be a function only of u reduces this bound to the cutset
bound for the DM-RC in Theorem .. Conversely, this upper bound can be expressed
as a cutset bound C0 ≤ max_{p(x1 ,x2′ )} min{I(X1 ; Y2 , Y3 |X2′ ), I(X1 , X2′ ; Y3 )} for a DM-RC with relay sender alphabet X2′ that consists of all mappings x2′ : Y2 → X2 . This is analogous to
the capacity expression for the DMC with DM state available causally at the encoder in
Section ., which is achieved via the Shannon strategy.
Now we present lower bounds on the capacity of the causal DM-RC. Note that any
lower bound on the capacity of the DM-RC, for example, using partial decode–forward
or compress–forward, is a lower bound on the capacity of the causal relay channel. We
expect, however, that higher rates can be achieved by using the present relay received
symbol in addition to its past received symbols.
Instantaneous relaying lower bound. In this simple scheme, the relay transmits x2i = x2 (y2i ) for i ∈ [1 : n], that is, x2i is a function only of y2i . This simple scheme yields the lower bound

C0 ≥ max_{p(x1 ), x2 (y2 )} I(X1 ; Y3 ). (.)
We now show that this simple lower bound can be tight.


Example . (Causal Sato relay channel). Consider the causal version of the Sato relay
channel in Example .. We have shown that C = 1.1619 and C∞ = log(9/4) = 1.1699.
Consider the instantaneous relaying lower bound in (.) with pmf (3/9, 2/9, 4/9) on
X1 and the function x2 (y2 ) = 0 if y2 = 0 and x2 (y2 ) = 1 if y2 = 1 or 2. It can be shown that
this choice yields I(X1 ; Y3 ) = 1.1699. Hence the capacity of the causal Sato relay channel
is C0 = C∞ = 1.1699.
This result is not too surprising. Since the channel from the sender to the relay is noise-
less, complete cooperation, which requires knowledge of the entire received sequence in
advance, can be achieved simply via instantaneous relaying. Since for this example C0 > C
and C coincides with the cutset bound, instantaneous relaying alone can achieve a higher
rate than the cutset bound on C!

Causal decode–forward lower bound. The decode–forward lower bound for the DM-
RC can be easily extended to incorporate the present received symbol at the relay. This
yields the lower bound

C0 ≥ max_{p(u,x1 ), x2 (u,y2 )} min{I(X1 ; Y2 |U ), I(U , X1 ; Y3 )}. (.)

As for the cutset bound in Theorem ., this bound can be viewed as a decode–forward
lower bound for the DM-RC p(y2 , y3 |x1 , x2′ ), where the relay sender alphabet X2′ consists of all mappings x2′ : Y2 → X2 . Note that the causal decode–forward lower bound coin-
cides with the cutset bound when the relay channel is degraded, i.e., p(y3 |x1 , x2 , y2 ) =
p(y3 |x2 , y2 ).
Now we investigate the causal version of the Gaussian relay channel in the previous
subsection and in Section .. Consider the amplify–forward relaying scheme, which is a
special case of instantaneous relaying, where the relay in time i ∈ [1 : n] transmits a scaled
version of its received signal, i.e., x2i = ay2i . To satisfy the relay power constraint, we must have a² ≤ P/(д21²P + 1). The capacity of the resulting equivalent point-to-point Gaussian channel with average received power (aд21д32 + д31 )²P and noise power a²д32² + 1 yields the lower bound on the capacity of the causal Gaussian relay channel

C0 ≥ C((aд21д32 + д31 )²P/(a²д32² + 1)).

Now, it can be shown that if S21 (S21 + 1) ≤ S31 S32 , then this bound is optimized by a∗ =
д21 /(д31д32 ) and simplifies to
C0 ≥ C(S21 + S31 ). (.)
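A quick numerical check of this optimization (our own sketch; the gains are arbitrary sample values satisfying S21 (S21 + 1) ≤ S31 S32 ) confirms that a∗ is feasible and that the grid maximum matches C(S21 + S31 ):

import numpy as np

def C(x):
    return 0.5 * np.log2(1 + x)

P, g21, g31, g32 = 1.0, 1.0, 1.5, 2.0      # then S21(S21+1) = 2 <= S31*S32 = 9
S21, S31, S32 = g21**2 * P, g31**2 * P, g32**2 * P

a_max = np.sqrt(P / (g21**2 * P + 1))      # relay power constraint: |a| <= a_max
a = np.linspace(0.0, a_max, 100001)
rates = C((a * g21 * g32 + g31)**2 * P / (a**2 * g32**2 + 1))

a_star = g21 / (g31 * g32)
print("maximum over the grid   ", rates.max())
print("closed form C(S21 + S31)", C(S21 + S31))
print("a* feasible             ", a_star <= a_max)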

Evaluating the cutset bound on C∞ in Theorem . for the Gaussian case with average
power constraints yields
C∞ ≤ C(S21 + S31 ), (.)

if S21 ≤ S32 . Thus, if S21 ≤ S32 min{1, S31 /(S21 + 1)}, the bounds in (.) and (.)
coincide and
C0 = C∞ = C(S21 + S31 ). (.)

This shows that amplify–forward alone can be optimal for both the causal and the non-
causal Gaussian relay channels, which is surprising given the extreme simplicity of this
relaying scheme. Note that the capacity in the above range of SNR values coincides with
the cutset bound on C. This is not the case in general, however, and it can be shown using
causal decode–forward that the capacity of the causal Gaussian relay channel can exceed
this cutset bound.
Remark .. Combining (.) and (.), we have shown that the capacity of the non-
causal Gaussian RC is known if S21 ≥ S31 + S32 or if S21 ≤ S32 min{1, S31 /(S21 + 1)}.

16.9.3 Coherent Cooperation


In studying the relay channel with and without lookahead, we have encountered three
different forms of coherent cooperation:
∙ Decode–forward: Here the relay recovers part or all of the message and the sender
and the relay cooperate on communicating the previous message. This requires that
the relay know only its past received symbols and therefore decode–forward can be
implemented for any finite lookahead l.
∙ Instantaneous relaying: Here the relay sends a function only of its current received
symbol. This is possible when the relay has access to the current received symbol,
which is the case for every l ≥ 0.
∙ Noncausal decode–forward: This scheme is possible only when the relaying functions
are noncausal. The relay recovers part or all of the message before communication
commences and cooperates with the sender to communicate the message to the re-
ceiver.
Although instantaneous relaying alone is sometimes optimal (e.g., for the causal Sato
relay channel and for a class of causal Gaussian relay channels), utilizing past received
symbols at the relay can achieve a higher rate in general.

SUMMARY

∙ Discrete memoryless relay channel (DM-RC)


∙ Cutset bound for the relay channel:
∙ Cooperative MAC and BC bounds
∙ Not tight in general
∙ Block Markov coding
∙ Use of multiple independent codebooks
∙ Decode–forward:
∙ Backward decoding
∙ Use of binning in channel coding
∙ Optimal for the degraded relay channel
∙ Partial decode–forward:
∙ Optimal for the semideterministic relay channel
∙ Optimal for the relay channel with orthogonal sender components
∙ Compress–forward:
∙ Optimal for the deterministic relay channel with orthogonal receiver components
∙ Optimal for the modulo-2 sum relay channel, which shows that the cutset bound is
not always tight
∙ Gaussian relay channel:
∙ Capacity is not known for any nonzero channel gains
∙ Decode–forward and compress–forward are within 1/2 bit of the cutset bound
∙ Partial decode–forward reduces to maximum of decode–forward and direct trans-
mission rates
∙ SFD Gaussian RC: Capacity is achieved by partial decode–forward
∙ RFD Gaussian RC:
∙ Capacity is known for a range of channel gains and achieved using decode–forward
∙ Compress–forward outperforms amplify–forward
∙ Linear relaying capacity
∙ Lookahead relay channels:
∙ Cutset bounds for causal and noncausal relay channels
∙ Causal and noncausal decode–forward
∙ Instantaneous relaying
∙ Capacity of noncausal Gaussian RC is known for a large range of channel gains
and achieved by noncausal decode–forward/amplify–forward
∙ Coherent cooperation:
∙ Cooperation on the previous message (decode–forward)
∙ Cooperation on the current symbol (instantaneous relaying)
∙ Combination of the above (causal decode–forward)


∙ Cooperation on the current message (noncausal decode–forward)
∙ Open problems:
16.1. Is decode–forward or compress–forward optimal for the Gaussian RC with any
nonzero channel gains?
16.2. What joint distribution maximizes the compress–forward lower bound for the
Gaussian RC?
16.3. What is the linear relaying capacity of the Gaussian RC?
16.4. Can linear relaying outperform compress–forward?

BIBLIOGRAPHIC NOTES

The relay channel was first introduced by van der Meulen (a), who also established
the multiletter characterization of the capacity in Problem .. The relay channel was
independently motivated by work on packet radio systems at the University of Hawaii in
the mid s by Sato () among others. The relay channel in Example . is due to
him. The cutset bound and the decode–forward, partial decode–forward, and compress–
forward coding schemes are due to Cover and El Gamal (). They also established the
capacity of the degraded and reversely degraded relay channels as well as a lower bound
that combines compress–forward and partial decode–forward. The proof of decode–
forward using binning is due to Cover and El Gamal (). Backward decoding is due
to Willems and van der Meulen (), who used it to establish the capacity region of the
DM-MAC with cribbing encoders. The proof of decode–forward using backward decod-
ing is due to Zeng, Kuhlmann, and Buzo ().
The capacity of the semideterministic DM-RC is due to El Gamal and Aref ().
The capacity of the DM and Gaussian relay channels with orthogonal sender components
was established by El Gamal and Zahedi (). The characterization of the compress–
forward lower bound in Theorem . is due to El Gamal, Mohseni, and Zahedi ().
They also established the equivalence to the original characterization in (.) by Cover
and El Gamal (). The capacity of the deterministic RC with orthogonal receiver com-
ponents is due to Kim (a), who established it using the hash–forward coding scheme
due to Cover and Kim (). The modulo- sum relay channel example, which shows
that the cutset bound can be strictly loose, is due to Aleksic, Razaghi, and Yu (). The
bounds on the capacity of the receiver frequency-division Gaussian RC in Section .
are due to Høst-Madsen and Zhang (), Liang and Veeravalli (), and El Gamal,
Mohseni, and Zahedi ().
The amplify–forward relaying scheme was proposed by Schein and Gallager ()
for the diamond relay network and subsequently studied for the Gaussian relay channel
in Laneman, Tse, and Wornell (). The capacity of the RFD Gaussian RC with linear
relaying was established by El Gamal, Mohseni, and Zahedi (). The results on the
lookahead relay channel in Section . mostly follow El Gamal, Hassanpour, and Mam-
men (). The capacity of the causal relay channel in Example . is due to Sato ().
A survey of the literature on the relay channel can found in van der Meulen ().

PROBLEMS

.. Provide the details of the achievability proof of the partial decode–forward lower
bound in Theorem ..
.. Provide the details of the converse proof for the capacity of the DM-RC with
orthogonal sender components in Proposition ..
.. Beginning with the capacity expression for the DM case in Proposition ., estab-
lish the capacity of the sender frequency-division Gaussian RC in (.).
.. Justify the last inequality in (.) and show that the upper bound tends to zero as
n → ∞ using the joint typicality lemma.
.. Show that the capacity of the DM-RC with orthogonal receiver components
p(y2 , y3′ |x1 )p(y3′′ |x2 ) depends on p(y3′′ |x2 ) only through C0 = max_{p(x2 )} I(X2 ; Y3′′ ).

.. Using the definition of the (2nR , n) code and the memoryless property of the DM
lookahead RC p(y2 |x1 )p(y3 |x1 , x2 , y2 ) with l ≥ 0, show that
p(y3n |x1n , x2n , y2n , m) = ∏ni=1 pY3 |X1 ,X2 ,Y2 (y3i |x1i , x2i , y2i ).

.. Prove the cutset bound for the noncausal DM-RC in Theorem ..
.. Provide the details of the achievability proof for the decode–forward lower bound
for the noncausal DM-RC in Theorem ..
.. Establish the capacity of the noncausal Gaussian relay channel for S21 ≥ S31 + S32
in (.).
.. Multiletter characterization of the capacity. Show that the capacity of the DM-RC
p(y2 , y3 |x1 , x2 ) is characterized by

C = sup_k C (k) = lim_{k→∞} C (k) , (.)

where

C (k) = max_{p(x1k ), {x2i (y2i−1 )}ki=1} (1/k) I(X1k ; Y3k ).

(Hint: To show that the supremum is equal to the limit, establish the superaddi-
tivity of kC (k) , i.e., jC ( j) + kC (k) ≤ ( j + k)C ( j+k) for all j, k.)
.. Gaussian relay channel. Consider the Gaussian relay channel with SNRs S21 , S31 ,
and S32 .
(a) Derive the cutset bound in (.) starting from the cutset bound in Theo-
rem . under the average power constraints.
(b) Derive the multihop lower bound in (.).
(c) Using jointly Gaussian input distributions, derive an expression for the coher-
ent multihop lower bound for the Gaussian RC.
(d) Derive the decode–forward lower bound in (.) starting from the decode–
forward lower bound in Theorem . under average power constraints.
(e) Show that the decode–forward lower bound is within 1/2 bit of the cutset upper
bound.
(f) Derive the noncoherent decode–forward lower bound in (.).
(g) Using the Gaussian input distribution on (X1 , X2 ) and Gaussian test channel
for Ŷ2 , derive the compress–forward lower bound in (.).
(h) Show that the compress–forward lower bound is within 1/2 bit of the cutset
upper bound.
.. A multiletter characterization of the Gaussian relay channel capacity. Show that the
capacity of the Gaussian relay channel with average power constraint P on each of
X1 and X2 is characterized by

C(P) = lim_{k→∞} C (k) (P),

where

C (k) (P) = sup_{F(x1k ), {x2i (y2i−1 )}ki=1 : ∑ki=1 E(X1i²) ≤ kP, ∑ki=1 E(X2i²) ≤ kP} (1/k) I(X1k ; Y3k ).

.. Properties of the Gaussian relay channel capacity. Let C(P) be the capacity of the
Gaussian RC with average power constraint P on each of X1 and X2 .
(a) Show that C(P) > 0 if P > 0 and C(P) tends to infinity as P → ∞.
(b) Show that C(P) tends to zero as P → 0.
(c) Show that C(P) is concave and strictly increasing in P.
.. Time-division lower bound for the Gaussian RC. Consider the Gaussian relay chan-
nel with SNRs S21 , S31 , and S32 . Suppose that the sender transmits to the receiver
for a fraction α1 of the time, to the relay α2 of the time, and the relay transmits to
the receiver the rest of the time. Find the highest achievable rate using this scheme
in terms of the SNRs. Compare this time-division lower bound to the multihop
and direct transmission lower bounds.
.. Gaussian relay channel with correlated noise components. Consider a Gaussian relay
channel where д31 = д21 = д32 = 1 and the noise components Z2 ∼ N(0, N2 ) and
Z3 ∼ N(0, N3 ) are jointly Gaussian with correlation coefficient ρ. Assume average
power constraint P on each of X1 and X2 . Derive expressions for the cutset bound
and the decode–forward lower bound in terms of P, N2 , N3 , and ρ. Under what
conditions do these bounds coincide? Interpret the result.
.. RFD-Gaussian relay channel. Consider the receiver frequency-division Gaussian
RC with SNRs S21 , S31 , and S32 .
(a) Derive the cutset bound in (.).
(b) Derive the decode–forward lower bound in (.). Show that it coincides with
the cutset bound when S21 ≥ S31 + S32 (S31 + 1).
(c) Using Gaussian inputs and test channel, derive the compress–forward lower
bound in (.).
(d) Consider Gaussian inputs and test channel as in part (c) with time sharing
between X1 ∼ N(0, P/α) for a fraction α of the time and X1 = 0 the rest of
the time. Derive an expression for the corresponding compress–forward lower
bound. Compare the bound to the one without time sharing.
(e) Show that the partial decode–forward lower bound reduces to the maximum
of the direct-transmission and decode–forward lower bounds.
(f) Derive the amplify–forward lower bound without time sharing in (.) and
the one with time sharing in (.).
(g) Show that the compress–forward lower bound with or without time sharing is
tighter than the amplify–forward lower bound with or without time sharing,
respectively.
.. Another modulo-2 sum relay channel. Consider the DM-RC with orthogonal re-
ceiver components p(y2 , y3′ |x1 )p(y3′′ |x2 ), where

Y3′ = Z2 ⊕ Z3 ,
Y2 = X1 ⊕ Z2 ,

and Z2 ∼ Bern(1/2) and Z3 ∼ Bern(p) are independent of each other and of X1 .
Suppose that the capacity of the relay-to-receiver channel p(y3′′ |x2 ) is C0 ∈ [0, 1].
Find the capacity of the relay channel.
.. Broadcasting over the relay channel. Consider the DM-RC p(y2 , y3 |x1 , x2 ). Sup-
pose that the message M is to be reliably communicated to both the relay and the
receiver. Find the capacity.
.. Partial decode–forward for noncausal relay channels. Consider the noncausal DM-
RC p(y2 |x1 )p(y3 |x1 , x2 , y2 ).
(a) By adapting partial decode–forward for the DM-RC to this case, show that the
capacity is lower bounded as

C∞ ≥ max_{p(u,x1 ,x2 )} min{I(X1 , X2 ; Y3 ), I(U ; Y2 ) + I(X1 ; Y3 | X2 , U )}.
(b) Suppose that Y2 = y2 (X1 ). Using the partial decode–forward lower bound in
part (a) and the cutset bound, show that the capacity of this noncausal semide-
terministic RC is

C∞ = max_{p(x1 ,x2 )} min{I(X1 , X2 ; Y3 ), H(Y2 ) + I(X1 ; Y3 | X2 , Y2 )}.

(c) Consider the noncausal DM-RC with orthogonal sender components, where
X1 = (X1′ , X1′′ ) and p(y2 |x1 )p(y3 |x1 , x2 , y2 ) = p(y2 |x1′′ )p(y3 |x1′ , x2 ). Show that
the capacity is

C∞ = max_{p(x1′ ,x2 )p(x1′′ )} min{I(X1′ , X2 ; Y3 ), I(X1′′ ; Y2 ) + I(X1′ ; Y3 | X2 )}.

.. Partial decode–forward for causal relay channels. Consider the causal DM-RC
p(y2 |x1 )p(y3 |x1 , x2 , y2 ).
(a) By adapting partial decode–forward for the DM-RC to this case, show that the
capacity is lower bounded as

C0 ≥ max min󶁁I(V , X1 ; Y3 ), I(U ; Y2 |V ) + I(X1 ; Y3 |U , V )󶁑,


p(u,󰑣,x1 ), x2 (󰑣, y2 )

(b) Suppose that Y2 = y2 (X1 ). Using the partial decode–forward lower bound in
part (a) and the cutset bound, show that the capacity of this causal semideter-
ministic RC is

C0 = max min󶁁I(V , X1 ; Y3 ), H(Y2 |V ) + I(X1 ; Y3 |V , Y2 )󶁑.


p(󰑣,x1 ), x2 (󰑣, y2 )

.. Instantaneous relaying and compress–forward. Show how compress–forward can


be combined with instantaneous relaying for the causal DM-RC. What is the re-
sulting lower bound on the capacity C0 ?
.. Lookahead relay channel with orthogonal receiver components. Consider the DM-
RC with orthogonal receiver components p(y3󳰀 , y2 |x1 )p(y3󳰀󳰀 |x2 ). Show that the ca-
pacity with and without lookahead is the same, i.e., C = C l = C∞ for all l.
.. MAC with cribbing encoders. Consider the DM-MAC p(y|x1 , x2 ), where sender
j = 1, 2 wishes to communicate an independent message M j to the receiver.
(a) Suppose that the codeword from sender  is known strictly causally at sender ,
that is, the encoding function at sender  is x1i (m1 , x2i−1 ) at time i ∈ [1 : n]. Find
the capacity region.
(b) Find the capacity region when the encoding function is x1i (m1 , x2i ).
(c) Find the capacity region when the encoding function is x1i (m1 , x2n ).
(d) Find the capacity region when both encoders are “cribbing,” that is, the encod-
ing functions are x1i (m1 , x2i ) and x2i (m2 , x1i−1 ).
Remark: This problem was studied by Willems and van der Meulen ().
Appendix 16A Cutset Bound for the Gaussian RC 423

APPENDIX 16A CUTSET BOUND FOR THE GAUSSIAN RC

The cutset bound for the Gaussian RC is given by

C≤ sup min{I(X1 , X2 ; Y3 ), I(X1 ; Y2 , Y3 | X2 )}.


F(x1 ,x2 ):E(X12 )≤P, E(X22 )≤P

We perform the maximization by first establishing an upper bound on the right hand side
of this expression and then showing that it is attained by jointly Gaussian (X1 , X2 ).
We begin with the first mutual information term. Assume without loss of generality
that E(X1 ) = E(X2 ) = 0. Consider

I(X1 , X2 ; Y3 ) = h(Y3 ) − h(Y3 | X1 , X2 )


1
= h(Y3 ) − log(2πe)
2
1
≤ log(E(Y32 ))
2
1 2
≤ log󶀡1 + д31 E(X12 ) + д32
2
E(X22 ) + 2д31 д32 E(X1 X2 )󶀱
2
1
≤ log󶀡1 + S31 + S32 + 2ρ󵀄S31 S32 󶀱
2
= C󶀡S31 + S32 + 2ρ󵀄S31 S32 󶀱,

where ρ = E(X1 X2 )/󵀆E(X12 ) E(X22 ) is the correlation coefficient.


Next we consider the second mutual information term in the cutset bound

I(X1 ; Y2 , Y3 | X2 ) = h(Y2 , Y3 | X2 ) − h(Y2 , Y3 | X1 , X2 )


= h(Y2 , Y3 | X2 ) − h(Z2 , Z3 )
= h(Y3 | X2 ) + h(Y2 |Y3 , X2 ) − log(2πe)
1 1
≤ log󶀡E(Var(Y3 | X2 ))󶀱 + log󶀡E(Var(Y2 |Y3 , X2 ))󶀱
2 2
(a) 1
≤ log󶀡1 + д31 (E(X1 ) − (E(X1 X2 ))2 / E(X22 ))󶀱
2 2
2
2 2 2 2 2
1 1 + (д21 + д31 )(E(X1 ) − (E(X1 X2 )) / E(X2 ))
+ log 󶀦 2 (E(X 2 ) − (E(X X ))2 / E(X 2 ))
󶀶
2 1 + д31 1 1 2 2
1 2 2
= log󶀡1 + (д21 + д31 )(E(X12 ) − (E(X1 X2 ))2 / E(X22 ))󶀱
2
1
≤ log󶀡1 + (1 − ρ2 )(S21 + S31 )󶀱
2
= C󶀡(1 − ρ2 )(S21 + S31 )󶀱,

where (a) follows from the fact that the mean-squared errors of the linear MMSE estimates
of Y3 given X2 and of Y2 given (Y3 , X2 ) are greater than or equal to the expected conditional
variances E(Var(Y3 |X2 )) and E(Var(Y2 |Y3 , X2 )), respectively. Now it is clear that a zero-
mean Gaussian (X1 , X2 ) with the same power P and correlation coefficient ρ attains the
above upper bounds.
424 Relay Channels

APPENDIX 16B PARTIAL DECODE–FORWARD FOR THE GAUSSIAN RC

It is straightforward to verify that

I(X1 , X2 ; Y3 ) ≤ C󶀡S31 + S32 + 2ρ󵀄S31 S32 󶀱,

where ρ is the correlation coefficient between X1 and X2 . Next consider

I(U ; Y2 | X2 ) + I(X1 ; Y3 | X2 , U)
= h(Y2 | X2 ) − h(Y2 | X2 , U ) + h(Y3 | X2 , U ) − h(Y3 | X1 , X2 , U)
1 1
≤ log󶀡2πe E[Var(Y2 | X2 )]󶀱 − h(Y2 | X2 , U ) + h(Y3 | X2 , U) − log(2πe)
2 2
1 2 2 2 2
≤ log󶀡1 + д21 󶀡E[X1 ] − (E[X1 X2 ]) / E[X2 ]󶀱󶀱 − h(Y2 | X2 , U) + h(Y3 | X2 , U )
2
= C󶀡(1 − ρ2 )S21 󶀱 − h(Y2 | X2 , U) + h(Y3 | X2 , U ).

We now upper bound (h(Y3 |X2 , U ) − h(Y2 |X2 , U )). First consider the case S21 > S31 , i.e.,
|д21 | > |д31 |. In this case

h(Y2 | X2 , U ) = h(д21 X1 + Z2 | X2 , U )
= h(д21 X1 + Z3 | X2 , U )
≥ h(д31 X1 + Z3 | X2 , U )
= h(Y3 | X2 , U ).

Hence, I(U ; Y2 |X2 ) + I(X1 ; Y3 |X2 , U ) ≤ C((1 − ρ2 )S21 ) and the rate of partial decode–
forward reduces to that of decode–forward.
Next consider the case S21 ≤ S31 . Since
1 1
log(2πe) ≤ h(Y2 | X2 , U) ≤ h(Y2 ) = log(2πe(1 + S21 )),
2 2
there exists a constant β ∈ [0, 1] such that
1
h(Y2 | X2 , U ) = log(2πe(1 + βS21 )).
2
Now consider

h(д21 X1 + Z2 | X2 , U ) = h󶀡(д21 /д31 )(д31 X1 + (д31 /д21 )Z2 ) | X2 , U 󶀱


= h󶀡д31 X1 + (д31 /д21 )Z2 | X2 , U 󶀱 + log | д21 /д31 |
(a)
= h󶀡д31 X1 + Z3󳰀 + Z3󳰀󳰀 | X2 , U󶀱 + log | д21 /д31 |
(b)
1 󳰀 󳰀󳰀
≥ log 󶀣22h(д31 X1 +Z3 |X2 ,U ) + 22h(Z3 |X2 ,U ) 󶀳 + log | д21 /д31 |
2
1 󳰀
= log 󶀣22h(д31 X1 +Z3 |X2 ,U ) + 2πe(д31 2
/д212
− 1)󶀳 + log | д21 /д31 |
2
1 1
= log 󶀢22h(Y3 |X2 ,U ) + 2πe(S31 /S21 − 1)󶀲 + log(S21 /S31 ),
2 2
Appendix 16C Equivalent Compress–Forward Lower Bound 425

where in (a) Z3󳰀 ∼ N(0, 1) and Z3󳰀󳰀 ∼ N(0, д31


2 2
/д21 − 1) are independent, and (b) follows
by the entropy power inequality. Since

1
h(д21 X1 + Z2 | X2 , U) = log(2πe(1 + βS21 )),
2

we obtain
2πe(S31 /S21 + βS31 ) ≥ 22h(Y3 |X2 ,U ) + 2πe(S31 /S21 − 1).

Thus h(Y3 |X2 , U) ≤ (1/2) log(2πe(1 + βS31 )) and

1 1 + βS31
h(Y3 | X2 , U ) − h(Y2 | X2 , U) ≤ log 󶀥 󶀵
2 1 + βS21
(a) 1 1 + S31
≤ log 󶀥 󶀵,
2 1 + S21

where (a) follows since if S21 ≤ S31 , (1 + βS31 )/(1 + βS21 ) is a strictly increasing function
of β and attains its maximum when β = 1. Substituting, we obtain

1 1 1 + S31
I(U ; Y2 | X2 ) + I(X1 ; Y3 | X2 , U ) ≤ log󶀡1 + S21 (1 − ρ2 )󶀱 + log 󶀥 󶀵
2 2 1 + S21
1 1 1 + S31
≤ log(1 + S21 ) + log 󶀥 󶀵
2 2 1 + S21
= C(S31 ),

which is the capacity of the direct channel. Thus the rate of partial decode–forward re-
duces to that of direct transmission.

APPENDIX 16C EQUIVALENT COMPRESS–FORWARD LOWER BOUND

Denote the compress–forward lower bound in Theorem . by

R󳰀 = max min󶁁I(X1 , X2 ; Y3 ) − I(Y2 ; Ŷ2 | X1 , X2 , Y3 ), I(X1 ; Ŷ2 , Y3 | X2 )󶁑,

where the maximum is over all conditional pmfs p(x1 )p(x2 )p( ŷ2 |x2 , y2 ), and denote the
alternative characterization in (.) by

R󳰀󳰀 = max I(X1 ; Ŷ2 , Y3 | X2 ),

where the maximum is over all conditional pmfs p(x1 )p(x2 )p( ŷ2 |x2 , y2 ) that satisfy the
constraint
I(X2 ; Y3 ) ≥ I(Y2 ; Ŷ2 | X2 , Y3 ). (.)
426 Relay Channels

We first show that R 󳰀󳰀 ≤ R󳰀 . For the conditional pmf that attains R󳰀󳰀 , we have

R󳰀󳰀 = I(X1 ; Y3 , Ŷ2 | X2 )


= I(X1 ; Y3 | X2 ) + I(X1 ; Ŷ2 | X2 , X3 )
= I(X1 , X2 ; Y3 ) − I(X2 ; Y3 ) + I(X1 ; Ŷ2 | X2 , Y3 )
(a)
≤ I(X1 , X2 ; Y3 ) − I(Y2 ; Ŷ2 | X2 , Y3 ) + I(X1 ; Ŷ2 | X2 , Y3 )
= I(X1 , X2 ; Y3 ) − I(X1 , Y2 ; Ŷ2 | X2 , Y3 ) + I(X1 ; Ŷ2 | X2 , X3 )
= I(X1 , X2 ; Y3 ) − I(Y2 ; Ŷ2 | X1 , X2 , Y3 ),

where (a) follows by (.).


To show that R󳰀 ≤ R 󳰀󳰀 , note that this is the case if I(X1 ; Y3 , Ŷ2 |X2 ) ≤ I(X1 , X2 ; Y3 ) −
I(Y2 ; Ŷ2 |X1 , X2 , Y3 ) for the conditional pmf that attains R󳰀 . Now assume that at the op-
timum conditional pmf, I(X1 ; Y3 , Ŷ2 |X2 ) > I(X1 , X2 ; Y3 ) − I(Y2 ; Ŷ2 |X1 , X2 , Y3 ). We show
that a higher rate can be achieved. Fix a product pmf p(x1 )p(x2 ) and let Ŷ2󳰀 = Ŷ2 with prob-
ability p and Ŷ2󳰀 =  with probability (1 − p). Then Ŷ2󳰀 → Ŷ2 → (Y2 , X2 ) form a Markov
chain. Note that the two mutual information terms are continuous in p and that as p
increases, the first term decreases and the second increases. Thus there exists p∗ such that

I(X1 ; Y3 , Ŷ2󳰀 | X2 ) = I(X1 , X2 ; Y3 ) − I(Y2 ; Ŷ2󳰀 | X1 , X2 , Y3 )

and the rate using p( ŷ2󳰀 |y2 , x2 ) is larger than that using p( ŷ2 |y2 , x2 ). By the above argu-
ment, at the optimum conditional pmf,

I(X1 , X2 ; Y3 ) − I(Y2 ; Ŷ2 | X1 , X2 , Y3 ) = I(X1 ; Y3 , Ŷ2 | X2 ).

Thus

I(X2 ; Y3 ) = I(X1 ; Y3 , Ŷ2 | X2 ) + I(Y2 ; Ŷ2 | X1 , X2 , Y3 ) − I(X1 ; Y3 | X2 )


= I(X1 ; Ŷ2 | X2 , Y3 ) + I(Y2 ; Ŷ2 | X1 , X2 , Y3 )
= I(X1 , Y2 ; Ŷ2 | X2 , Y3 )
= I(Y2 ; Ŷ2 | X2 , Y3 ).

This completes the proof of the equivalence.


CHAPTER 17

Interactive Channel Coding

The network models we studied so far involve only one-way (feedforward) communi-
cation. Many communication systems, however, are inherently interactive, allowing for
cooperation through feedback and information exchange over multiway channels. In this
chapter, we study the role of feedback in communication and present results on the two-
way channel introduced by Shannon as the first multiuser channel. The role of multi-
way interaction in compression and secure communication will be studied in Chapters 
and , respectively.
As we showed in Section .., the capacity of a memoryless point-to-point channel
does not increase when noiseless causal feedback is present. Feedback can still benefit
point-to-point communication, however, by simplifying coding and improving reliability.
The idea is to first send the message uncoded and then to use feedback to iteratively reduce
the receiver’s error about the message, the error about the error, and so on. We demon-
strate this iterative refinement paradigm via the Schalkwijk–Kailath coding scheme for
the Gaussian channel and the Horstein and block feedback coding schemes for the bi-
nary symmetric channel. We show that the probability of error for the Schalkwijk–Kailath
scheme decays double-exponentially in the block length, which is significantly faster than
the single-exponential decay of the probability of error without feedback.
We then show that feedback can enlarge the capacity region in multiuser channels. For
the multiple access channel, feedback enlarges the capacity region by enabling statistical
cooperation between the transmitters. We show that the capacity of the Gaussian MAC
with feedback coincides with the outer bound obtained by allowing arbitrary (instead
of product) joint input distributions. For the broadcast channel, feedback can enlarge the
capacity region by enabling the sender to simultaneously refine both receivers’ knowledge
about the messages. For the relay channel, we show that the cutset bound is achievable
when noiseless causal feedback from the receiver to the relay is allowed. This is in contrast
to the case without feedback in which the cutset bound is not achievable in general.
Finally, we discuss the two-way channel, where two nodes wish to exchange their mes-
sages interactively over a shared noisy channel. The capacity region of this channel is not
known in general. We first establish simple inner and outer bounds on the capacity region.
The outer bound is further improved using the new idea of dependence balance. Finally
we introduce the notion of directed information and use it to establish a nontrivial multi-
letter characterization of the capacity region of the two-way channel.
428 Interactive Channel Coding

17.1 POINT-TO-POINT COMMUNICATION WITH FEEDBACK

Consider the point-to-point feedback communication system depicted in Figure .. The
sender wishes to communicate a message M to the receiver in the presence of noiseless
causal feedback from the receiver, that is, the encoder assigns a symbol xi (m, y i−1 ) to
each message m ∈ [1 : 2nR ] and past received output symbols y i−1 ∈ Y i−1 for i ∈ [1 : n].
Achievability and capacity are defined as for the DMC with no feedback in Section ..
Recall from Section .. that feedback does not increase the capacity when the chan-
nel is memoryless, e.g., a DMC or a Gaussian channel. Feedback, however, can greatly
simplify coding and improve reliability. For example, consider a BEC with erasure prob-
ability p. Without feedback, we need to use block error correcting codes to approach the
channel capacity C = (1 − p) bits/transmission. With feedback, however, we can achieve
capacity by simply retransmitting each bit immediately after it is erased. It can be shown
that roughly n = k/(1 − p) transmissions suffice to send k bits of information reliably.
Thus, with feedback there is no need for sophisticated error correcting codes.
This simple observation can be extended to other channels with feedback. The ba-
sic idea is to first send the message uncoded and then to iteratively refine the receiver’s
knowledge about it. In the following, we demonstrate this general paradigm of iterative
refinement for feedback communication.

M Xi Yi ̂
M
Encoder p(y|x) Decoder

Y i−1

Figure .. Point-to-point feedback communication system.

17.1.1 Schalkwijk–Kailath Coding Scheme for the Gaussian Channel


Consider a Gaussian channel with noiseless causal feedback, where the channel output is
Y = X + Z and Z ∼ N(0, 1) is the noise. Assume the expected average transmission power
constraint
n
󵠈 E(xi2 (m, Y i−1 )) ≤ nP, m ∈ [1 : 2nR ]. (.)
i=1

As shown in Section ., the capacity of this channel is C = C(P). We present a simple
coding scheme by Schalkwijk and Kailath that achieves any rate R < C(P).
Codebook. Divide the interval [−󵀂P, 󵀂P] into 2nR equal-length “message intervals.”
Represent each message m ∈ [1 : 2nR ] by the midpoint θ(m) of its interval with distance
Δ = 2󵀂P ⋅ 2−nR between neighboring message points.
Encoding. To simplify notation, we assume that transmission begins at time i = 0. The
encoder first transmits X0 = θ(m) at time i = 0; hence Y0 = θ(m) + Z0 . Because of the
17.1 Point-to-Point Communication with Feedback 429

feedback of Y0 , the encoder can learn the noise Z0 = Y0 − X0 . The encoder then transmits
X1 = γ1 Z0 at time i = 1, where γ1 = 󵀂P is chosen so that E(X12 ) = P. Subsequently, in
time i ∈ [2 : n], the encoder forms the MMSE estimate E(Z0 |Y i−1 ) of Z0 given Y i−1 , and
transmits
Xi = γi (Z0 − E(Z0 |Y i−1 )), (.)

where γi is chosen so that E(Xi2 ) = P for i ∈ [1 : n]. Hence, Yi ∼ N(0, P + 1) for every
i ∈ [1 : n], and the total (expected) power consumption over the (n + 1) transmissions is
upper bounded as
n
󵠈 E(Xi2 ) ≤ (n + 1)P.
i=0

Decoding. Upon receiving Y n , the receiver estimates θ(m) by


̂ n = Y0 − E(Z0 |Y n ) = θ(m) + Z0 − E(Z0 |Y n ),
Θ

̂ is sent if θ(m)
and declares that m ̂ n.
̂ is the closest message point to Θ
Analysis of the probability of error. Since Z0 and Z1 are independent and Gaussian, and
Y1 = γ1 Z0 + Z1 , it follows that E(Z0 |Y1 ) is linear in Y1 . Thus by the orthogonality principle
in Appendix B, X2 = γ2 (Z0 − E(Z0 |Y1 )) is Gaussian and independent of Y1 . Furthermore,
since Z2 is Gaussian and independent of (Y1 , X2 ), Y2 = X2 + Z2 is also Gaussian and in-
dependent of Y1 . In general, for i ≥ 1, E(Z0 |Y i−1 ) is linear in Y i−1 , and Yi is Gaussian and
independent of Y i−1 . Thus the output sequence Y n is i.i.d. with Yi ∼ N(0, P + 1).
Now we expand I(Z0 ; Y n ) in two ways. On the one hand,
n
I(Z0 ; Y n ) = 󵠈 I(Z0 ; Yi |Y i−1 )
i=1
n
= 󵠈󶀡h(Yi |Y i−1 ) − h(Yi |Z0 , Y i−1 )󶀱
i=1
n
= 󵠈󶀡h(Yi ) − h(Zi |Z0 , Y i−1 )󶀱
i=1
n
= 󵠈󶀡h(Yi ) − h(Zi )󶀱
i=1
n
= log(1 + P)
2
= n C(P).

On the other hand,


1 1
I(Z0 ; Y n ) = h(Z0 ) − h(Z0 |Y n ) = log .
2 Var(Z0 |Y n )

Hence, Var(Z0 |Y n ) = 2−2n C(P) and Θ ̂ n ∼ N(θ(m), 2−2n C(P) ). It is easy to see that the de-
coder makes an error only if Θ ̂ n is closer to the nearest neighbors of θ(m) than to θ(m)
430 Interactive Channel Coding

̂ n − θ(m)| > Δ/2 = 2−nR 󵀂P (see Figure .). The probability of error is
itself, that is, if |Θ
thus upper bounded as Pe(n) ≤ 2Q󶀡2n(C(P)−R) 󵀂P󶀱, where

1 −t 2 /2
Q(x) = 󵐐 e dt, x ≥ 0.
x 󵀂2π
2
Since Q(x) ≤ (1/󵀂2π)e −x /2
for x ≥ 1 (Durrett , Theorem ..), if R < C(P), we have

2 22n(C(P)−R) P
Pe(n) ≤ 󵀊 exp 󶀦− 󶀶,
π 2

that is, the probability of error decays double-exponentially fast in block length n.

Δ Δ
θ− 2
θ θ+ 2

Figure .. Error event for the Schalkwijk–Kailath coding scheme.

Remark 17.1. The Schalkwijk–Kailath encoding rule (.) can be interpreted as updat-
ing the receiver’s knowledge about the initial noise Z0 (or equivalently the message θ(m))
in each transmission. This encoding rule can be alternatively expressed as

Xi = γi (Z0 − E(Z0 |Y i−1 ))


= γi (Z0 − E(Z0 |Y i−2 ) + E(Z0 |Y i−2 ) − E(Z0 |Y i−1 ))
γ
= i (Xi−1 − E(Xi−1 |Y i−1 ))
γi−1
γ
= i (Xi−1 − E(Xi−1 |Yi−1 )). (.)
γi−1
Thus, the sender iteratively corrects the receiver’s error in estimating the previous trans-
mission.
Remark 17.2. Consider the channel with input X1 and output X̂ 1 (Y n ) = E(X1 |Y n ). Be-
cause the MMSE estimate X̂ 1 (Y n ) is a linear function of X1 and Z n with I(X1 ; X̂ 1 ) = n C(P)
for Gaussian X1 , the channel from X1 to X̂ 1 is equivalent to a Gaussian channel with SNR
22n C(P) − 1 (independent of the specific input distribution on X1 ). Hence, the Schalkwijk–
Kailath scheme transforms n uses of the Gaussian channel with SNR P into a single use
of the channel with received SNR 22n C(P) − 1. Thus to achieve the capacity, the sender
can first send X1 = θ(m) and then use the same linear feedback functions as in (.) in
subsequent transmissions.
17.1 Point-to-Point Communication with Feedback 431

Remark 17.3. As another implication of its linearity, the Schalkwijk–Kailath scheme can
be used even when the additive noise is not Gaussian. In this case, by the Chebyshev
̂ n − θ(m)| > 2−nR 󵀂P} ≤ 2−2n(C(P)−R) P, which tends to zero as n →
inequality, Pe(n) ≤ P{|Θ
∞ if R < C(P).
Remark 17.4. The double-exponential decay of the error probability depends crucially
on the assumption of an expected power constraint ∑ni=1 E(xi2 (m, Y i−1 )) ≤ nP. Under the
more stringent almost-sure average power constraint
n
P󶁄󵠈 xi2 (m, Y i−1 ) ≤ nP󶁔 = 1 (.)
i=1

as assumed in the nonfeedback case, the double-exponential decay is no longer achievable.


Nevertheless, Schalkwijk–Kailath coding can still be used with a slight modification to
provide a simple constructive coding scheme.

17.1.2* Horstein’s Coding Scheme for the BSC


Consider a BSC(p), where the channel output is Y = X ⊕ Z and Z ∼ Bern(p). We present
a simple coding scheme by Horstein that achieves any rate R < 1 − H(p). The scheme is
illustrated in Figure ..
Codebook. Represent each message m ∈ [1 : 2nR ] by one of 2nR equidistant points θ(m) =
α + (m − 1)2−nR ∈ [0, 1), where the offset α ∈ [0, 2−nR ) is to be specified later.
Encoding. Define the encoding map for every θ0 ∈ [0, 1] (not only the message points
θ(m)). For each θ ∈ [0, 1) and pdf f on [0, 1), define

1 if θ is greater than the median of f ,


ϕ(θ, f ) = 󶁇
0 otherwise.

Let f0 = f (θ) be the uniform pdf (prior) on [0, 1) (i.e., Θ ∼ Unif[0, 1)). In time i = 1, the
encoder transmits x1 = 1 if θ0 > 1/2, and x1 = 0 otherwise. In other words, x1 = ϕ(θ0 , f0 ).
In time i ∈ [2 : n], upon receiving Y i−1 = y i−1 , the encoder calculates the conditional
pdf (posterior) as

f (θ|y i−2 )p(yi−1 |y i−2 , θ)


fi−1 = f (θ| y i−1 ) =
p(yi−1 |y i−2 )
p(yi−1 |y i−2 , θ)
= f (θ| y i−2 ) ⋅
∫ p(yi−1 |y i−2 , θ 󳰀 ) f (θ 󳰀 |y i−2 )dθ 󳰀
2 p̄ fi−2 if yi−1 = ϕ(θ, fi−2 ),
=󶁇
2p fi−2 otherwise.

The encoder then transmits xi = ϕ(θ0 , fi−1 ). Note that fi−1 is a function of y i−1 .
432 Interactive Channel Coding

f (θ) f (θ|y1 ) f (θ|y 2 )

4 p̄2

2 p̄

1
4p p̄
2p

0 0 0
0 θ0 1 0 θ0 1 0 θ0 1

Y1 = 1 Y2 = 0
X1 = 1 X2 = 0 X3 = 1

Figure .. Horstein coding scheme. The area of the shaded region under each
conditional pdf is equal to 1/2. At time i, Xi = 0 is transmitted if θ0 is in the shaded
region and Xi = 1 is transmitted, otherwise.

Decoding. Upon receiving Y n = y n , the decoder uses maximal posterior interval decod-
ing. It finds the interval [β, β + 2−nR ) of length 2−nR that maximizes the posterior proba-
bility
β+2−󰑛󰑅
󵐐 f (θ| y n )dθ.
β

̂ is sent if θ(m)
If there is a tie, it chooses the smallest such β. Then it declares that m ̂ ∈
[β, β + 2−nR ).
Analysis of the probability of error (outline). We first consider the average probability
of error for Θ ∼ Unif[0, 1). Note that X1 ∼ Bern(1/2) and Y1 ∼ Bern(1/2). Moreover, for
every y i−1 ,

P{Xi = 1 | Y i−1 = y i−1 } = P󶁁ϕ(Θ, fΘ|Y 󰑖−1 (θ| y i−1 )) = 1 󵄨󵄨󵄨󵄨 Y i−1 = y i−1 󶁑 = .
1
2

Thus, Xi ∼ Bern(1/2) is independent of Y i−1 and hence Yi ∼ Bern(1/2) is also indepen-


dent of Y i−1 . Therefore,

I(Θ; Y n ) = H(Y n ) − H(Z n ) = n(1 − H(p)) = nC,

which implies that h(Θ|Y n ) = −nC. Moreover, by the LLN and the recursive definitions
of the encoding maps and conditional pdfs, it can be shown that (1/n) log f (Θ|Y n ) con-
verges to C in probability. These two facts strongly suggest (albeit do not prove) that the
probability of error P{Θ ∉ [β, β + 2−nR )} tends to zero as n → ∞ if R < C. By a more
17.1 Point-to-Point Communication with Feedback 433

refined analysis of the evolution of f (Θ|Y i ), i ∈ [1 : n], based on iterated function sys-
tems (Diaconis and Freedman , Steinsaltz ), it can be shown that the probability
of error EΘ 󶀡P{Θ ∉ [β, β + 2−nR )|Θ}󶀱 indeed tends to zero as n → ∞ if R < C. Therefore,
there must exist an αn ∈ [0, 2−nR ) such that
󰑛󰑅
1 2
Pe(n) = 󵠈 P󶁁θ ∉ [β, β + 2−nR ) 󵄨󵄨󵄨󵄨 Θ = αn + (m − 1)2−nR 󶁑
2nR m=1

tends to zero as n → ∞ if R < C. Thus, with high probability θ(M) = αn + (M − 1)2−nR


is the unique message point within the [β, β + 2−nR ) interval.
Posterior matching scheme. The Schalkwijk–Kailath and Horstein coding schemes are
special cases of a more general posterior matching coding scheme, which can be applied
to any DMC. In this coding scheme, to send a message point θ0 ∈ [0, 1), the encoder
transmits
Xi = FX−1 (FΘ|Y 󰑖−1 (θ0 |Y i−1 ))

in time i ∈ [1 : n], where FX−1 (u) = min{x : F(x) ≥ u} denotes the generalized inverse of
the capacity-achieving input cdf F(x) and the iteration begins with the uniform prior on
Θ. Note that Xi ∼ FX (xi ) is independent of Y i−1 .
Letting Ui (θ0 , Y i−1 ) = FΘ|Y 󰑖−1 (θ0 |Y i−1 ), we can express the coding scheme recursively
as

Ui = FU |Y (Ui−1 |Yi−1 ),
Xi = FX−1 (Ui ), i ∈ [1 : n],

where FU|Y (u|y) is the backward channel cdf for U ∼ Unif[0, 1), X = FX−1 (U), and Y | {X =
x} ∼ p(y|x). Thus {(Ui , Yi )} is a Markov process. Using this Markovity and properties of
iterated function systems, it can be shown that the maximal posterior interval decoding
used in the Horstein scheme achieves the capacity of the DMC (for some relabeling of the
input symbols in X ).

17.1.3 Block Feedback Coding Scheme for the BSC


In the previous iterative refinement coding schemes, the encoder sends a large amount of
uncoded information in the initial transmission and then iteratively refines the receiver’s
knowledge of this information in subsequent transmissions. Here we present a coding
scheme that implements iterative refinement at the block level; the encoder initially trans-
mits an uncoded block of information and then refines the receiver’s knowledge about it
in subsequent blocks.
Again consider a BSC(p), where the channel output is Y = X ⊕ Z and Z ∼ Bern(p).
Encoding. To send message m ∈ [1 : 2n ], the encoder first transmits the binary represen-
tation X1 = X n (m) of the message m. Upon receiving the feedback signal Y1 , the encoder
434 Interactive Channel Coding

compresses the noise sequence Z1 = X1 ⊕ Y1 losslessly and transmits the binary represen-
tation X2 (Z1 ) of the compression index over the next block of n(H(p) + δ(є)) transmis-
sions. Upon receiving the resulting feedback signal Y2 , the encoder transmits the corre-
sponding noise index X3 (Z2 ) over the next n(H(p) + δ(є))2 transmissions. Continuing in
the same manner, in the j-th block, the encoder transmits the noise index X j (Z j−1 ) over
n(H(p) + δ(є)) j−1 transmissions. After k1 such unequal-length block transmissions, the
encoder transmits the compression index of Zk1 over nk2 (H(p) + δ(є))k1 −1 transmissions
using repetition coding, that is, the encoder transmits the noise index k2 times without any
k1
coding. Note that the rate of this code is 1/(∑ j=1 (H(p) + δ(є)) j−1 + k2 (H(p) + δ(є))k1 ).
Decoding and analysis of the probability of error. Decoding is performed backwards.
The decoder first finds Zk1 using majority decoding, recovers Zk1 −1 through X(Zk1 −1 ) =
Yk1 ⊕ Zk1 , then recovers Zk1 −2 , and so on, until Z1 is recovered. The message is finally
recovered through X1 (m) = Y1 ⊕ Z1 . By choosing k1 = log(n/ log n) and k2 = log2 n, it
can be shown that the probability of error over all transmission blocks tends to zero while
the achieved rate approaches 1/ ∑∞ j=1 (H(p) + δ(є))
j−1
= 1 − H(p) − δ(є) as n → ∞.

17.2 MULTIPLE ACCESS CHANNEL WITH FEEDBACK

Consider the multiple access channel with noiseless causal feedback from the receiver
to both senders depicted in Figure .. A (2nR1 , 2nR2 , n) code for the DM-MAC with
feedback consists of two message sets [1 : 2nR 󰑗 ], j = 1, 2, two encoders x ji (m j , y i−1 ), i ∈
̂ 1 (y n ), m
[1 : n], j = 1, 2, and a decoder (m ̂ 2 (y n )). The probability of error, achievability,
and the capacity region are defined as for the DM-MAC with no feedback in Section ..

Y i−1

M1 X1i
Encoder 
Yi ̂ 1, M
M ̂2
p(y|x1 , x2 ) Decoder
M2 X2i
Encoder 

Y i−1

Figure .. Multiple access channel with feedback.

Unlike the point-to-point case, feedback can enlarge the capacity region by inducing
statistical cooperation between the two senders. The capacity region with feedback, how-
ever, is known only in some special cases, including the Gaussian MAC in Section ..
and the binary erasure MAC in Example ..
Consider the following outer bound on the capacity region with feedback.
17.2 Multiple Access Channel with Feedback 435

Proposition .. Any achievable rate pair (R1 , R2 ) for the DM-MAC p(y|x1 , x2 ) with
feedback must satisfy the inequalities

R1 ≤ I(X1 ; Y | X2 ),
R2 ≤ I(X2 ; Y | X1 ),
R1 + R2 ≤ I(X1 , X2 ; Y)

for some pmf p(x1 , x2 ).

Proof. To establish this cooperative outer bound, note that by the memoryless property,
(M1 , M2 , Y i−1 ) → (X1i , X2i ) → Yi form a Markov chain. Hence, as in the proof of the
converse for the DM-MAC in Section ., the following inequalities continue to hold
R1 ≤ I(X1Q ; YQ | X2Q , Q),
R2 ≤ I(X2Q ; YQ | X1Q , Q),
R1 + R2 ≤ I(X1Q , X2Q ; YQ |Q).
Now, since Q → (X1Q , X2Q ) → YQ form a Markov chain, the above inequalities can be
further relaxed to
R1 ≤ I(X1Q ; YQ | X2Q ),
R2 ≤ I(X2Q ; YQ | X1Q ),
R1 + R2 ≤ I(X1Q , X2Q ; YQ ).
Identifying X1 = X1Q , X2 = X2Q , and Y = YQ completes the proof of Proposition ..
Remark .. Note that unlike the case with no feedback, X1 and X2 are no longer con-
ditionally independent given Q. Hence, the cooperative outer bound is evaluated over all
joint pmfs p(x1 , x2 ). Later in Section .., we find constraints on the set of joint pmfs
induced between X1 and X2 through feedback, which yield a tighter outer bound in gen-
eral.

17.2.1 Cover–Leung Inner Bound


We establish the following inner bound on the capacity region of the DM-MAC with
feedback.

Theorem . (Cover–Leung Inner Bound). A rate pair (R1 , R2 ) is achievable for the
DM-MAC p(y|x1 , x2 ) with feedback if
R1 < I(X1 ; Y | X2 , U ),
R2 < I(X2 ; Y | X1 , U ),
R1 + R2 < I(X1 , X2 ; Y)
for some pmf p(u)p(x1 |u)p(x2 |u), where |U | ≤ min{|X1 |⋅|X2 | + 1, |Y| + 2}.
436 Interactive Channel Coding

Note that this region is convex and has a very similar form to the capacity region with
no feedback, which consists of all rate pairs (R1 , R2 ) such that

R1 ≤ I(X1 ; Y | X2 , Q),
R2 ≤ I(X2 ; Y | X1 , Q),
R1 + R2 ≤ I(X1 , X2 ; Y |Q)

for some pmf p(q)p(x1 |q)p(x2 |q), where |Q| ≤ 2. The only difference between the two
regions is the conditioning in the sum-rate bound.
The above inner bound is tight in some special cases.
Example . (Binary erasure MAC with feedback). Consider the binary erasure MAC
in which the channel input symbols X1 and X2 are binary and the channel output symbol
Y = X1 + X2 is ternary. We know that the capacity region without feedback is given by

R1 ≤ 1,
R2 ≤ 1,
3
R1 + R2 ≤ .
2

First we show that this region can be achieved via a simple feedback coding scheme remi-
niscent of the block feedback coding scheme in Section ... We focus on the symmetric
rate pair (R1 , R2 ) = (R, R).
Suppose that each encoder sends k independent bits uncoded. Then on average k/2
bits are erased (that is, Y = 0 + 1 = 1 + 0 = 1 is received). Since encoder  knows the exact
locations of the erasures through feedback, it can retransmit the erased bits over the next
k/2 transmissions (while encoder  sends X2 = 0). The decoder now can recover both
messages. Since k bits are sent over k + k/2 transmissions, the rate R = 2/3 is achievable.
Alternatively, since encoder  also knows the k/2 erased bits for encoder  through
feedback, the two encoders can cooperate by each sending half of the k/2 erased bits over
the following k/4 transmissions. These retransmissions result in roughly k/8 erased bits,
which can be retransmitted again over the following k/16 transmissions, and so on. Pro-
ceeding recursively, we can achieve the rate R = k/(k + k/4 + k/16 + ⋅ ⋅ ⋅ ) = 3/4. This
coding scheme can be further improved by inducing further cooperation between the
encoders. Again suppose that k independent bits are sent initially. Since both encoders
know the k/2 erased bits, they can cooperate to send them over the next k/(2 log 3) trans-
missions. Hence we can achieve the rate R = k/(k + k/(2 log 3)) = 0.7602 > 3/4.
Now evaluating the Cover–Leung inner bound by taking X j = U ⊕ V j , where U ∼
Bern(1/2) and V j ∼ Bern(0.2377), j = 1, 2, are mutually independent, we can achieve
R = 0.7911. This rate can be shown to be optimal.

17.2.2 Proof of the Cover–Leung Inner Bound


The proof of achievability involves superposition coding, block Markov coding, and back-
ward decoding. A sequence of b i.i.d. message pairs (m1 j , m2 j ) ∈ [1 : 2nR1 ] × [1 : 2nR2 ],
17.2 Multiple Access Channel with Feedback 437

j ∈ [1 : b], are sent over b blocks of n transmissions, where by convention, m20 = m2b = 1.
At the end of block j − 1, sender  recovers m2, j−1 and then in block j both senders co-
operatively transmit information to the receiver (carried by U) to resolve the remaining
uncertainty about m2, j−1 , superimposed with information about the new message pair
(m1 j , m2 j ).
We now provide the details of the proof.
Codebook generation. Fix a pmf p(u)p(x1 |u)p(x2 |u). As in block Markov coding for the
relay channel, we randomly and independently generate a codebook for each block. For
j ∈ [1 : b], randomly and independently generate 2nR2 sequences un (m2, j−1 ), m2, j−1 ∈ [1 :
2nR2 ], each according to ∏ni=1 pU (ui ). For each m2, j−1 ∈ [1 : 2nR2 ], randomly and condi-
tionally independently generate 2nR1 sequences x1n (m1 j |m2, j−1 ), m1 j ∈ [1 : 2nR1 ], each ac-
cording to ∏ni=1 p X1 |U (x1i |ui (m2, j−1 )), and 2nR2 sequences x2n (m2 j |m2, j−1 ), m2 j ∈ [1 : 2nR2 ],
each according to ∏ni=1 p X2 |U (x2i |ui (m2, j−1 )). This defines the codebook C j = {(u(m2, j−1 ),
x1n (m1 j |m2, j−1 ), x2n (m2 j |m2, j−1 )) : m1 j ∈ [1 : 2nR1 ], m2, j−1 , m2 j ∈ [1 : 2nR2 ]} for j ∈ [1 : b].
The codebooks are revealed to all parties.
Encoding and decoding are explained with the help of Table ..

Block   ⋅⋅⋅ b−1 b

X1 x1n (m11 |1) ̃ 21 )


x1n (m12 |m ⋅⋅⋅ ̃ 2,b−2 )
x1n (m1,b−1 |m ̃ 2,b−1 )
x1n (m1b |m

(X1 , Y) ̃ 21 →
m ̃ 22 →
m ⋅⋅⋅ ̃ 2,b−1
m 

X2 x2n (m21 |1) x2n (m22 |m21 ) ⋅⋅⋅ x2n (m2,b−1 |m2,b−2 ) x2n (1|m2,b−1 )

Y ̂ 11
m ̂ 12 , m
←m ̂ 21 ⋅⋅⋅ ̂ 1,b−1 , m
←m ̂ 2,b−2 ̂ 1b , m
←m ̂ 2,b−1

Table .. Encoding and decoding for the Cover–Leung inner bound.

Encoding. Let (m1 j , m2 j ) be the messages to be sent in block j. Encoder  transmits


x2n (m2 j |m2, j−1 ) from codebook C j . At the end of block j − 1, encoder  finds the unique
message m ̃ 2, j−1 such that (un (m
̃ 2, j−2 ), x1n (m1, j−1 |m
̃ 2, j−2 ), x2n (m
̃ 2, j−1 |m
̃ 2, j−2 ), y n ( j − 1)) ∈
(n)
Tє . In block j, encoder  transmits x1 (m1 j |m
n
̃ 2, j−1 ) from codebook C j , where m ̃ 20 = 1 by
convention.
Decoding. Decoding at the receiver is performed successively backwards after all b blocks
are received. For j = b, . . . , 1, the decoder finds the unique message (m ̂ 1j, m
̂ 2, j−1 ) such
(n)
that (u (m
n
̂ 2, j−1 ), x1 (m
n
̂ 1 j |m
̂ 2, j−1 ), x2 (m
n
̂ 2 j |m
̂ 2, j−1 ), y ( j)) ∈ Tє , with the initial condition
n

m̂ 2b = 1.
Analysis of the probability of error. We analyze the probability of decoding error for the
message pair (M1 j , M2, j−1 ) averaged over codebooks. Assume without loss of generality
that M1 j = M2, j−1 = M2 j = 1. Then the decoder makes an error only if one or more of the
438 Interactive Channel Coding

following events occur:


̃ j) = 󶁁M
E( ̃ 2, j−1 ̸= 1󶁑,
̂ 1, j+1 , M
E( j + 1) = 󶁁(M ̂ 2j) =
̸ (1, 1)󶁑,
̃ 2, j−1 ), X n (1|1), Y n ( j)) ∉ T (n) 󶁑,
E1 ( j) = 󶁁(U n (1), X1n (1| M 2 є
̂ 2 j |1), Y n ( j)) ∈ Tє(n) for some m1 j ̸= 1󶁑,
E2 ( j) = 󶁁(U n (1), X1n (m1 j |1), X2n (M
̂ 2 j |m2, j−1 ), Y n ( j)) ∈ T (n)
E3 ( j) = 󶁁(U n (m2, j−1 ), X1n (m1 j |m2, j−1 ), X2n (M є
for some m2, j−1 ̸= 1, m1 j 󶁑.
Thus the probability of error is upper bounded as
P(E( j)) = P{(M̂ 1j, M
̂ 2, j−1 ) =
̸ (1, 1)}
̃ j) ∪ E( j + 1) ∪ E1 ( j) ∪ E2 ( j) ∪ E3 ( j))
≤ P(E(
̃ j)) + P(E( j + 1)) + P(E1 ( j) ∩ Ẽc ( j) ∩ E c ( j + 1))
≤ P(E(
+ P(E2 ( j) ∩ Ẽc ( j)) + P(E3 ( j) ∩ Ẽc ( j)).
Following steps similar to the analysis of the coherent multihop relaying scheme in Sec-
tion .., by independence of the codebooks, the LLN, the packing lemma, and induc-
̃ j)) tends to zero as n → ∞ for all j ∈ [1 : b] if R2 < I(X2 ; Y , X1 |U) − δ(є) =
tion, P(E(
I(X2 ; Y|X1 , U ) − δ(є). Also, following steps similar to the analysis of the decode–forward
scheme via backward decoding in Section .., by the independence of the codebooks,
the LLN, the packing lemma, and induction, the last three terms tend to zero as n → ∞
if R1 < I(X1 ; Y |X2 , U) − δ(є) and R1 + R2 < I(U , X1 , X2 ; Y) − δ(є) = I(X1 , X2 ; Y ) − δ(є).
By induction, the second term P(E( j + 1)) tends to zero as n → ∞ and hence P(E( j))
tends to zero as n → ∞ for every j ∈ [1 : b] if the inequalities in the theorem are satisfied.
This completes the proof of the Cover–Leung inner bound.
Remark .. The Cover–Leung coding scheme uses only one-sided feedback. As such,
it is not surprising that this scheme is suboptimal in general, as we show next.

17.2.3 Gaussian MAC with Feedback


Consider the Gaussian multiple access channel Y = д1 X1 + д2 X2 + Z with feedback under
expected average power constraint P on each of X1 and X2 (as defined in (.)). It can be
shown that the Cover–Leung inner bound simplifies to the inner bound consisting of all
(R1 , R2 ) such that
R1 ≤ C(ᾱ1 S1 ),
R2 ≤ C(ᾱ2 S2 ), (.)
R1 + R2 ≤ C󶀡S1 + S2 + 2󵀄α1 α2 S1 S2 󶀱
for some α1 , α2 ∈ [0, 1], where S1 = д12 P and S2 = д22 P. Achievability follows by setting
X1 = 󵀄α1 PU + V1 and X2 = 󵀄α2 PU + V2 , where U ∼ N(0, 1), V1 ∼ N(0, ᾱ1 P), and V2 ∼
N(0, ᾱ2 P) are mutually independent.
17.2 Multiple Access Channel with Feedback 439

It turns out this inner bound is strictly suboptimal.

Theorem .. The capacity region of the Gaussian MAC with feedback is the set of
rate pairs (R1 , R2 ) such that

R1 ≤ C󶀡(1 − ρ2 )S1 󶀱,
R2 ≤ C󶀡(1 − ρ2 )S2 󶀱,
R1 + R2 ≤ C󶀡S1 + S2 + 2ρ󵀄S1 S2 󶀱

for some ρ ∈ [0, 1].

The converse follows by noting that in order to evaluate the cooperative outer bound
in Proposition ., it suffices to consider a zero-mean Gaussian (X1 , X2 ) with the same
average power P and correlation coefficient ρ.

17.2.4 Achievability Proof of Theorem 17.2


Achievability is proved using an interesting extension of the Schalkwijk–Kailath coding
scheme. We first show that the sum-capacity
Csum = max min󶁁C󶀡(1 − ρ2 )S1 󶀱 + C󶀡(1 − ρ2 )S2 󶀱, C󶀡S1 + S2 + 2ρ󵀄S1 S2 󶀱󶁑 (.)
ρ∈[0,1]

is achievable. Since the first term in the minimum decreases with ρ while the second term
increases with ρ, the maximum is attained by a unique ρ∗ ∈ [0, 1] such that
2 2
C󶀡(1 − ρ∗ )S1 󶀱 + C󶀡(1 − ρ∗ )S2 󶀱 = C󶀡S1 + S2 + 2ρ∗ 󵀄S1 S2 󶀱.
This sum-capacity corresponds to the rate pair
2
R1 = I(X1 ; Y ) = I(X1 ; Y | X2 ) = C󶀡(1 − ρ∗ )S1 󶀱,
2
(.)
R2 = I(X2 ; Y ) = I(X2 ; Y | X1 ) = C󶀡(1 − ρ∗ )S2 󶀱
attained by a zero-mean Gaussian (X1 , X2 ) with the same average power P and correlation
coefficient ρ∗ . For simplicity of exposition, we only consider the symmetric case S1 = S2 =
S = д 2 P. The general case follows similarly.
Codebook. Divide the interval [−󵀄P(1 − ρ∗ ), 󵀄P(1 − ρ∗ )] into 2nR 󰑗 , j = 1, 2, message
intervals and represent each message m j ∈ [1 : 2nR 󰑗 ] by the midpoint θ j (m j ) of an interval
with distance Δ j = 2󵀄P(1 − ρ∗ ) ⋅ 2−nR 󰑗 between neighboring messages.
Encoding. To simplify notation, we assume that transmission commences at time i = −2.
To send the message pair (m1 , m2 ), in the initial  transmissions the encoders transmit
(X1,−2 , X2,−2 ) = 󶀡0, θ2 (m2 )/󵀄1 − ρ∗ 󶀱,
(X1,−1 , X2,−1 ) = 󶀡θ1 (m1 )/󵀄1 − ρ∗ , 0󶀱,
(X1,0 , X2,0 ) = (0, 0).
440 Interactive Channel Coding

From the noiseless feedback, encoder  knows the initial noise values (Z−1 , Z0 ) and en-
coder  knows (Z−2 , Z0 ). Let U1 = 󵀄1 − ρ∗ Z−1 + 󵀄ρ∗ Z0 and U2 = 󵀄1 − ρ∗ Z−2 + 󵀄ρ∗ Z0
be jointly Gaussian with zero means, unit variances, and correlation coefficient ρ∗ .
In time i = 1, the encoders transmit X11 = γ1U1 and X21 = γ1U2 , respectively, where
2 2
γ1 is chosen so that E(X11 ) = E(X21 ) = P. In time i ∈ [2 : n], the encoders transmit

X1i = γi (X1,i−1 − E(X1,i−1 |Yi−1 )),


X2i = −γi (X2,i−1 − E(X2,i−1 |Yi−1 )),

where γi , i ∈ [2 : n], is chosen so that E(X1i2 ) = E(X2i2 ) = P for each i. Such γi exists since
the “errors” (X1i − E(X1i |Yi )) and (X2i − E(X2i |Yi )) have the same power. Note that the
errors are scaled with opposite signs. The total (expected) power consumption for each
sender over the (n + 3) transmissions is upper bounded as
n
󵠈 E(X 2ji ) ≤ (n + 1)P for j = 1, 2.
i=−2

Decoding. The decoder estimates θ1 (m1 ) and θ2 (m2 ) as

̂ 1n = 󵀆1 − ρ∗Y−1 + 󵀆ρ∗Y0 − E(U1 |Y n ) = θ1 (m1 ) + U1 − E(U1 |Y n ),


Θ
̂ 2n = 󵀆1 − ρ∗Y−2 + 󵀆ρ∗Y0 − E(U2 |Y n ) = θ2 (m2 ) + U2 − E(U2 |Y n ).
Θ

Analysis of the probability of error. Following similar steps as in Remark ., note that
we can rewrite X1i , X2i as

X1i = γ󳰀i (U1 − E(U1 |Y i−1 )),


X2i = (−1)i−1 γi󳰀 (U2 − E(U2 |Y i−1 )).

Hence X1i , X2i , and Yi are independent of Y i−1 . Now consider


n
I(U1 ; Y n ) = 󵠈 I(U1 ; Yi |Y i−1 )
i=1
n
= 󵠈 h(Yi ) − h(Yi |U1 , Y i−1 )
i=1
n
= 󵠈 h(Yi ) − h(Yi | X1i , Y i−1 )
i=1
n
(a)
= 󵠈 h(Yi ) − h(Yi | X1i )
i=1
n
= 󵠈 I(X1i ; Yi ), (.)
i=1

where (a) follows since X1i and Yi are independent of Y i−1 . Each mutual information
term satisfies the following fixed point property.
17.2 Multiple Access Channel with Feedback 441

Lemma .. Let ρi be the correlation coefficient between X1i and X2i . Then, ρi = ρ∗
for all i ∈ [1 : n], where ρ∗ ∈ [0, 1] uniquely satisfies the condition
2
2 C󶀡(1 − ρ∗ )S󶀱 = C󶀡2(1 + ρ∗ )S󶀱.

In other words, the correlation coefficient ρi between X1i and X2i stays constant at ρ∗
over all i ∈ [1 : n]. The proof of this lemma is given in Appendix A.
Continuing with the proof of achievability, from (.), (.), and Lemma ., we have
2
I(U1 ; Y n ) = n C󶀡(1 − ρ∗ )S󶀱,

which implies that


∗2
̂ 1n − θ1 (m1 ) = U1 − E(U1 |Y n ) ∼ N(0, 2−2n C((1−ρ
Θ )S)
).
2
Similarly, we have I(U2 ; Y n ) = n C((1 − ρ∗ )S) and
∗2
̂ 2n − θ2 (m2 ) = U2 − E(U2 |Y n ) ∼ N(0, 2−2n C((1−ρ
Θ )S)
).

Hence, as in the Schalkwijk–Kailath coding scheme, the probability of error Pe(n) tends to
2
zero double-exponentially fast as n → ∞ if R1 , R2 < C((1 − ρ∗ )S). This completes the
proof of achievability for the sum-capacity in (.) when S1 = S2 = S.
Achieving other points in the capacity region. Other rate pairs in the capacity region can
be achieved by combining rate splitting, superposition coding, and successive cancellation
decoding with the above feedback coding scheme for the sum-capacity. Encoder  divides
M1 into two independent messages M10 and M11 , and sends M10 using a Gaussian random
code with power αP1 and M11 using the above feedback coding scheme while treating the
codeword for M10 as noise. Encoder  sends M2 using the above scheme. The decoder
first recovers (M11 , M2 ) as in the above feedback coding scheme and then recovers M10
using successive cancellation decoding.
Now it can be easily shown that the rate triple (R10 , R11 , R2 ) is achievable if

R10 < C(αS1 ),


(1 − (ρ∗ )2 )αS
̄ 1
R11 < C 󶀦 󶀶,
1 + αS1
(1 − (ρ∗ )2 )S2
R2 < C 󶀦 󶀶,
1 + αS1

where ρ∗ ∈ [0, 1] satisfies the condition

(1 − (ρ∗ )2 )αS
̄ 1 (1 − (ρ∗ )2 )S2 ̄ + S2 + 2ρ󵀄αS
αS ̄ 1 S2
C󶀦 󶀶 + C󶀦 󶀶 = C󶀦 1 󶀶.
1 + αS1 1 + αS1 1 + αS1
442 Interactive Channel Coding

By substituting R1 = R10 + R11 , taking ρ = 󵀂αρ ̄ ∗ , and varying α ∈ [0, 1], we can establish
the achievability of any rate pair (R1 , R2 ) such that

R1 < C󶀡(1 − ρ2 )S1 󶀱,


R2 < C󶀡S1 + S2 + 2ρ󵀄S1 S2 󶀱 − C󶀡(1 − ρ2 )S1 󶀱

for some ρ ≤ ρ∗ . By symmetry, we can show the achievability of any rate pair (R1 , R2 )
such that

R1 < C󶀡S1 + S2 + 2ρ󵀄S1 S2 󶀱 − C󶀡(1 − ρ2 )S2 󶀱,


R2 < C󶀡(1 − ρ2 )S2 󶀱

for some ρ ≤ ρ∗ . Finally, taking the union over these two regions and noting that the
inequalities in the capacity region are inactive for ρ > ρ∗ completes the proof of Theo-
rem ..
Remark .. Consider the coding scheme for achieving the sum-capacity in (.). Since
the covariance matrix KX󰑖 of (X1i , X2i ) is constant over time, the error scaling factor γi ≡ γ
is also constant for i ≥ 2. Let

γ 0
A=󶁦 󶁶,
0 −γ
B =󶁢д д 󶁲,

and consider

X1i = γ(X1,i−1 − E(X1,i−1 |Yi−1 )),


X2i = −γ(X2,i−1 − E(X2,i−1 |Yi−1 ))

Then KX󰑖 can be expressed recursively as

KX󰑖 = AKX󰑖−1 AT − (AKX󰑖−1 B T )(1 + BKX󰑖−1 B T )−1 (AKX󰑖−1 B T )T .

Now from the properties of discrete algebraic Riccati equations (Lancaster and Rodman
, Chapter ; Kailath, Sayed, and Hassibi , Section .), it can be shown that

P P ρ∗
lim KX󰑖 = K ⋆ = 󶁦 ∗ 󶁶
i→∞ Pρ P

for any initial condition KX1 ≻ 0. Equivalently, both (1/n)I(X11 ; Y n ) and (1/n)I(X21 ; Y n )
2
converge to C((1 − ρ∗ )S) as n → ∞. Thus using the same argument as for the Schalkwijk–
Kailath coding scheme (see Remark .), this implies that no initial transmission phase
is needed and the same rate pair can be achieved over any additive (non-Gaussian) noise
MAC using linear feedback coding.
17.3 Broadcast Channel with Feedback 443

17.3 BROADCAST CHANNEL WITH FEEDBACK

Consider the broadcast channel with noiseless causal feedback from both receivers de-
picted in Figure .. The sender wishes to communicate independent messages M1 and
M2 to receivers  and , respectively.

Y1i−1 , Y2i−1

Y1i ̂1
M
Decoder 
M1 , M2 Xi
Encoder p(y1 , y2 |x)
Y2i ̂2
M
Decoder 

Y1i−1 , Y2i−1

Figure .. Broadcast channel with feedback.

If the broadcast channel is physically degraded, i.e., p(y1 , y2 |x) = p(y1 |x)p(y2 |y1 ),
then the converse proof in Section . can be modified to show that feedback does not
enlarge the capacity region. Feedback can, however, enlarge the capacity region of broad-
cast channels in general. We illustrate this fact via the following two examples.
Example . (Dueck’s example). Consider the DM-BC depicted in Figure . with in-
put X = (X0 , X1 , X2 ) and outputs Y1 = (X0 , X1 ⊕ Z) and Y2 = (X0 , X2 ⊕ Z), where X0 =
X1 = X2 = {0, 1} and Z ∼ Bern(1/2). It is easy to show that the capacity region of this
channel without feedback is the set of rate pairs (R1 , R2 ) such that R1 + R2 ≤ 1. By com-
parison, the capacity region with feedback is the set of rate pairs (R1 , R2 ) such that R1 ≤ 1
and R2 ≤ 1. The proof of the converse is straightforward. To prove achievability, we show
that i.i.d. Bern(1/2) sequences X1n and X2n can be sent error-free to receivers  and , re-
spectively, in (n + 1) transmissions. In the first transmission, we send (0, X11 , X21 ) over
the channel. Using the feedback link, the sender can determine the noise Z1 . The triple
(Z1 , X12 , X22 ) is then sent over the channel in the second transmission, and upon receiving
the common information Z1 , the decoders can recover X11 and X21 perfectly. Thus using
the feedback link, the sender and the receivers can recover Z n and hence each receiver
can recover its intended sequence perfectly. Therefore, (R1 , R2 ) = (1, 1) is achievable with
feedback.
As shown in the above example, feedback can increase the capacity of the DM-BC by
letting the encoder broadcast common channel information to all decoders. The following
example demonstrates this role of feedback albeit in a less transparent manner.
Example . (Gaussian BC with feedback). Consider the symmetric Gaussian broad-
cast channel Y j = X + Z j , j = 1, 2, where Z1 ∼ N(0, 1) and Z2 ∼ N(0, 1) are independent
444 Interactive Channel Coding

X1
Y1
X0
Y2
X2

Figure .. DM-BC for Example ..

of each other. Assume expected average power constraint P on X. Without feedback, the
capacity region is the set of rate pairs (R1 , R2 ) such that R1 + R2 ≤ C(P). With feedback,
we can use the following variation of the Schalkwijk–Kailath coding scheme. After proper
initialization, we send Xi = X1i + X2i , where
X1i = γi (X1,i−1 − E(X1,i−1 |Y1,i−1 )),
X2i = −γi (X2,i−1 − E(X2,i−1 |Y2,i−1 )),

and γi is chosen so that E(Xi2 ) ≤ P for each i. It can be shown that


1 P(1 + ρ∗ )/2
R1 = R2 = C󶀥 󶀵
2 1 + P(1 − ρ∗ )/2
is achievable, where ρ∗ satisfies the condition
P(1 − ρ∗ ) P(P + 2)
ρ∗ 󶀥1 + (P + 1) 󶀥1 + 󶀵󶀵 = (1 − ρ∗ ).
2 2
For example, when P = 1, (R1 , R2 ) = (0.2803, 0.2803) is achievable with feedback, which
is strictly outside the capacity region without feedback.
Remark .. As shown in Lemma ., the capacity region of the broadcast channel with
no feedback depends only on its conditional marginal pmfs. Hence, the capacity region
of the Gaussian BC in Example . is the same as that of its corresponding physically
degraded broadcast channel Y1 = Y2 = X + Z, where Z ∼ N(0, 1). The above example
shows that this is not the case when feedback is present. Hence, in general the capacity
region of a stochastically degraded BC can be strictly larger than that of its physically
degraded version.

17.4 RELAY CHANNEL WITH FEEDBACK

Consider the DM-RC p(y2 , y3 |x1 , x2 ) with noiseless causal feedback from the receivers
to the relay and the sender as depicted in Figure .. Here, for each i ∈ [1 : n], the relay
17.5 Two-Way Channel 445

encoder assigns a symbol x2i (y2i−1 , y3i−1 ) to each pair (y2i−1 , y3i−1 ) and the encoder assigns
a symbol x1i (m, y2i−1 , y3i−1 ) to each triple (m, y2i−1 , y3i−1 ).

Y3i−1

Relay Encoder

Y2i−1
Y2i X2i

M X1i Y3i ̂
M
Encoder p(y2 , y3 |x1 , x2 ) Decoder

Figure .. Relay channel with feedback.

Unlike the nonfeedback case, we have the following single-letter characterization of


the capacity.

Theorem .. The capacity of DM-RC p(y2 , y3 |x1 , x2 ) with feedback is

C = max min󶁁I(X1 , X2 ; Y3 ), I(X1 ; Y2 , Y3 | X2 )󶁑


p(x1 ,x2 )

The proof of the converse follows by noting that the cutset bound on the capacity in
Theorem . continues to hold when feedback is present. To prove achievability, note
that due to feedback, the relay encoder at time i observes (Y2i−1 , Y3i−1 ) instead of only
Y2i−1 . Hence, feedback in effect converts the relay channel p(y2 , y3 |x1 , x2 ) into a (physi-
cally) degraded relay channel p(y2󳰀 , y3 |x1 , x2 ), where Y2󳰀 = (Y2 , Y3 ). Substituting Y2󳰀 in the
corresponding decode–forward lower bound

C ≥ max min󶁁I(X1 , X2 ; Y3 ), I(X1 ; Y2󳰀 | X2 )󶁑


p(x1 ,x2 )

completes the proof of achievability.


Remark .. To achieve the capacity in Theorem ., we only need feedback from the
receiver to the relay. Hence, adding other feedback links does not increase the capacity.

17.5 TWO-WAY CHANNEL

Consider the two-way communication system depicted in Figure .. We assume a dis-
crete memoryless two-way channel (DM-TWC) model (X1 × X2 , p(y1 , y2 |x1 , x2 ), Y1 × Y2 )
that consists of four finite sets X1 , X2 , Y1 , Y2 and a collection of conditional pmfs
p(y1 , y2 |x1 , x2 ) on Y1 × Y2 . Each node wishes to send a message to the other node.
446 Interactive Channel Coding

A (2nR1 , 2nR2 , n) code for the DM-TWC consists of


∙ two message sets [1 : 2nR1 ] and [1 : 2nR2 ],
∙ two encoders, where encoder  assigns a symbol x1i (m1 , y1i−1 ) to each pair (m1 , y1i−1 )
and encoder  assigns a symbol x2i (m2 , y2i−1 ) to each pair (m2 , y2i−1 ) for i ∈ [1 : n], and
∙ two decoders, where decoder  assigns an estimate m ̂ 2 to each pair (m1 , y1n ) and de-
̂ 1 to each pair (m2 , y2n ).
coder  assigns an estimate m
The channel is memoryless in the sense that (Y1i , Y2i ) is conditionally independent
of (M1 , M2 , X1i−1 , X2i−1 , Y1i−1 , Y2i−1 ) given (X1i , X2i ). We assume that the message pair
(M1 , M2 ) is uniformly distributed over [1 : 2nR1 ] × [1 : 2nR2 ]. The average probability
of error is defined as Pe(n) = P{(M ̂ 1, M̂ 2) =
̸ (M1 , M2 )}. A rate pair (R1 , R2 ) is said to be
achievable for the DM-TWC if there exists a sequence of (2nR1 , 2nR2 , n) codes such that
limn→∞ Pe(n) = 0. The capacity region C of the DM-TWC is the closure of the set of achiev-
able rate pairs (R1 , R2 ).
The capacity regionof the DM-TWC is not known in general. The main difficulty is
that the two information flows share the same channel, causing interference to each other.
In addition, each node has to play the two competing roles of communicating its own
message and providing feedback to help the other node.

M1 X1i Y2i ̂1
M
Encoder  Decoder 
p(y1 , y2 |x1 , x2 )
̂2
M Y1i X2i M2
Decoder  Encoder 

Node  Node 

Figure .. Two-way communication system.

17.5.1 Simple Inner and Outer Bounds


By having the encoders completely ignore the received outputs and use randomly and
independently generated point-to-point codebooks, we obtain the following inner bound.

Proposition .. A rate pair (R1 , R2 ) is achievable for the DM-TWC p(y1 , y2 |x1 , x2 )
if
R1 < I(X1 ; Y2 | X2 , Q),
R2 < I(X2 ; Y1 | X1 , Q)

for some pmf p(q)p(x1 |q)p(x2 |q).


17.5 Two-Way Channel 447

This inner bound is tight in some special cases, for example, when the channel de-
composes into two separate DMCs, i.e., p(y1 , y2 |x1 , x2 ) = p(y1 |x2 )p(y2 |x1 ). The bound is
not tight in general, however, as will be shown in Example ..
Using standard converse techniques, we can establish the following outer bound.

Proposition .. If a rate pair (R1 , R2 ) is achievable for the DM-TWC p(y1 , y2 |x1 , x2 ),
then it must satisfy the inequalities

R1 ≤ I(X1 ; Y2 | X2 ),
R2 ≤ I(X2 ; Y1 | X1 )

for some pmf p(x1 , x2 ).

This outer bound is a special case of the cutset bound that will be discussed in Sec-
tion . and is not tight in general.
Example . (Binary multiplier channel (BMC)). Consider the DM-TWC with X1 =
X2 = {0, 1} and Y1 = Y2 = Y = X1 ⋅ X2 depicted in Figure ..

X1 X2

Y Y

Figure .. Binary multiplier channel.

The inner bound in Proposition . simplifies to the set of rate pairs (R1 , R2 ) such that

R1 < α2 H(α1 ),
R2 < α1 H(α2 ),

which is attained by Q = , X1 ∼ Bern(α1 ), and X2 ∼ Bern(α2 ) for some α1 , α2 ∈ [1/2, 1].


In the other direction, the outer bound in Proposition . simplifies to the set of rate pairs
(R1 , R2 ) such that
α2
R1 ≤ ᾱ1 H 󶀥 󶀵,
ᾱ1
α
R2 ≤ ᾱ2 H 󶀥 1 󶀵 ,
ᾱ2
which is attained by setting p X1 ,X2 (1, 0) = α1 , p X1 ,X2 (0, 1) = α2 , and p X1 ,X2 (1, 1) = 1 − α1 −
α2 for some α1 , α2 ≥ 0 such that α1 + α2 ≤ 1. The two bounds are plotted in Figure ..
Note that these bounds lead to the lower and upper bounds 0.6170 ≤ Csym ≤ 0.6942
on the symmetric capacity Csym = max{R : (R, R) ∈ C }.
448 Interactive Channel Coding

0.8

0.6

R2
0.4

0.2

00 0.2 0.4 0.6 0.8 1


R1

Figure .. Simple inner and outer bounds on the capacity region of the BMC.

17.5.2 Dependence Balance Bound


Consider the DM-TWC with common output Y1 = Y2 = Y . We can establish the following
improved outer bound on the capacity region.

Theorem . (Dependence Balance Bound for DM-TWC with Common Output).
If a rate pair (R1 , R2 ) is achievable for the DM-TWC with common output p(y|x1 , x2 ),
then it must satisfy the inequalities

R1 ≤ I(X1 ; Y | X2 , U ),
R2 ≤ I(X2 ; Y | X1 , U )

for some pmf p(u, x1 , x2 ) such that

I(X1 ; X2 |U) ≤ I(X1 ; X2 |Y , U) (.)

and |U | ≤ 3.

Note that the only difference between this bound and the simple outer bound in Propo-
sition . is the extra dependence balance condition in (.), which limits the set of joint
pmfs that can be formed through sequential transmissions. This bound is not tight in
general, however. For the case of the BMC, the dependence balance condition in (.)
is inactive (since U = Y = X1 ⋅ X2 ), which leads to the same upper bound on the sym-
metric capacity Csym ≤ 0.6942 obtained by the simple outer bound in Proposition ..
17.6 Directed Information 449

The dependence balance bound, however, can be improved by a genie argument to yield
the tighter upper bound Csym ≤ 0.6463.
The proof of the dependence balance bound follows by standard converse techniques
with the auxiliary random variable identification Ui = Y i−1 .
Remark .. A similar outer bound can be derived for the DM-MAC with feedback. In
this case, we add the inequality

R1 + R2 ≤ I(X1 , X2 ; Y |U)

to the inequalities in Theorem .. It can be shown that the resulting outer bound is
in general tighter than the cooperative outer bound for the DM-MAC with feedback in
Proposition ., due to the dependence balance condition in (.) on the set of joint pmfs
on (X1 , X2 ).

17.6 DIRECTED INFORMATION

Just as mutual information between two random sequences X n and Y n captures the un-
certainty about Y n reduced by knowing X n , directed information from X n to Y n captures
the uncertainty about Y n reduced by causal knowledge of X n . This notion of directed in-
formation provides a nontrivial multiletter characterization of the capacity region of the
DM-TWC.
More precisely, the directed information from a random sequence X n to another ran-
dom sequence Y n of the same length is defined as
n n
I(X n → Y n ) = 󵠈 I(X i ; Yi |Y i−1 ) = H(Y n ) − 󵠈 H(Yi |Y i−1 , X i ).
i=1 i=1

In comparison, the mutual information between X n and Y n is


n n
I(X n ; Y n ) = 󵠈 I(X n ; Yi |Y i−1 ) = H(Y n ) − 󵠈 H(Yi |Y i−1 , X n ).
i=1 i=1

The directed information from X n to Y n causally conditioned on Z n is similarly defined as


n
I(X n → Y n ‖Z n ) = 󵠈 I(X i ; Yi |Y i−1 , Z i ).
i=1

As a related notion, the pmf of X causally conditioned on Y n is defined as


n

n
p(x n ‖y n ) = 󵠉 p(xi |x i−1 , y i ).
i=1

By convention, the pmf of X n causally conditioned on (, Y n−1 ) is expressed as


n
p(x n ‖y n−1 ) = 󵠉 p(xi |x i−1 , y i−1 ).
i=1
450 Interactive Channel Coding

17.6.1 Multiletter Characterization of the DM-TWC Capacity Region


We use directed information to obtain a multiletter characterization of the capacity region
of the DM-TWC with common output.

Theorem .. Let C (k) , k ≥ 1, be the set of rate pairs (R1 , R2 ) such that
1
R1 ≤ I(X1k → Y k ‖X2k ),
k
1
R2 ≤ I(X2k → Y k ‖X1k )
k

for some (causally conditional) pmf p(x1k ‖y k−1 )p(x2k ‖y k−1 ). Then the capacity region
C of the DM-TWC with common output p(y|x1 , x2 ) is the closure of ⋃k C (k) .

While this characterization in itself is not computable and thus of little use, each choice
of k and p(x1k ‖y k−1 )p(x2k ‖y k−1 ) leads to an inner bound on the capacity region, as illus-
trated in the following.
Example . (BMC revisited). Consider the binary multiplier channel Y = X1 ⋅ X2 in
Example .. We evaluate the inequalities in Theorem . with U0 =  and Ui = ui (Y i ) ∈
{0, 1, 2}, i ∈ [1 : n], such that

󶀂
󶀒 0 if (Ui−1 , Yi ) = (0, 1) or (Ui−1 , Yi ) = (1, 0) or Ui = 2,
󶀒
Ui = 󶀊1 if (Ui−1 , Yi ) = (0, 0),
󶀒
󶀒
󶀚2 otherwise.

For j = 1, 2, consider the same causally conditional pmf p(x kj ‖y k−1 ) of the form

p(x ji = 1|ui−1 = 0) = α,
p(x ji = 1|ui−1 = 1, x j,i−1 = 1) = β,
p(x ji = 1|ui−1 = 1, x j,i−1 = 0) = 1,
p(x ji = 1|ui−1 = 2, x j,i−2 = 1) = 0,
p(x ji = 1|ui−1 = 2, x j,i−2 = 0) = 1

for some α, β ∈ [0, 1]. Then, by optimizing over α, β, it can be shown that
1 1
lim I(X1k → Y k ‖X2k ) = lim I(X2k → Y k ‖X1k ) = 0.6191,
k→∞ k k→∞ k

and hence (R1 , R2 ) = (0.6191, 0.6191) is achievable, which is strictly outside the inner
bound in Proposition ..

Remark .. A similar multiletter characterization to Theorem . can be found for the
general DM-TWC p(y1 , y2 |x1 , x2 ) and the DM-MAC with feedback.
17.6 Directed Information 451

17.6.2 Proof of the Converse


By Fano’s inequality,

nR1 ≤ I(M1 ; Y n , M2 ) + nєn


= I(M1 ; Y n |M2 ) + nєn
n
= 󵠈 I(M1 ; Yi |M2 , Y i−1 ) + nєn
i=1
n
= 󵠈 I(X1i ; Yi | X2i , Y i−1 ) + nєn
i=1
n
≤ 󵠈 I(X1i ; Y i | X2i ) + nєn
i=1
= I(X1n → Y n ‖X2n ) + nєn .

Similarly, nR2 ≤ I(X_2^n → Y^n ‖ X_1^n) + nє_n. Also it can be shown that I(M1; M2 | X_2^i, Y^i) ≤ I(M1; M2 | X_2^{i−1}, Y^{i−1}) for all i ∈ [1 : n]. Hence

I(X_1^i; X_{2i} | X_2^{i−1}, Y^{i−1}) ≤ I(M1; M2 | X_2^{i−1}, Y^{i−1}) ≤ I(M1; M2) = 0,

or equivalently, X_{2i} → (X_2^{i−1}, Y^{i−1}) → X_1^i form a Markov chain. This implies that the joint pmf is of the form p(x_1^n ‖ y^{n−1}) p(x_2^n ‖ y^{n−1}). Therefore, for any є > 0, (R1 − є, R2 − є) ∈ C^(n) for n sufficiently large. This completes the converse proof of Theorem ..

17.6.3 Proof of Achievability


Communication takes place over k interleaved blocks, each used for one of k independent message pairs (M_{1j}, M_{2j}) ∈ [1 : 2^{nR_{1j}}] × [1 : 2^{nR_{2j}}], j ∈ [1 : k]. Block j consists of transmission times j, k + j, 2k + j, . . . , (n − 1)k + j, as illustrated in Figure ..
For each block, we treat the channel input and output sequences from the previous
blocks as causal state information available at each encoder–decoder pair and use the
multiplexing technique in Section ...
Codebook generation. Fix k and a conditional pmf

p(x_1^k ‖ y^{k−1}) p(x_2^k ‖ y^{k−1}) = ∏_{j=1}^{k} p(x_{1j} | x_1^{j−1}, y^{j−1}) p(x_{2j} | x_2^{j−1}, y^{j−1}).

Let S_{1j} = (X_1^{j−1}, Y^{j−1}) and S_{2j} = (X_2^{j−1}, Y^{j−1}). Divide the message m1 into k independent messages (m_{11}, . . . , m_{1k}). Then further divide each message m_{1j} into the messages (m_{1j}(s_{1j}) : s_{1j} ∈ S_{1j} = X_1^{j−1} × Y^{j−1}). Thus, R1 = ∑_{j=1}^{k} ∑_{s_{1j}∈S_{1j}} R_{1j}(s_{1j}). For j ∈ [1 : k] and s_{1j} ∈ S_{1j}, randomly and conditionally independently generate 2^{nR_{1j}(s_{1j})} sequences x_{1j}(m_{1j}(s_{1j}), s_{1j}), m_{1j}(s_{1j}) ∈ [1 : 2^{nR_{1j}(s_{1j})}], each according to ∏_{i=1}^{n} p_{X_{1j}|S_{1j}}(x_{1,(i−1)k+j} | s_{1j}).

Figure .. Multiletter coding scheme for the DM-TWC.

These sequences form the codebook C1 j (s1 j ) for j ∈ [1 : k]. Similarly, generate random
j−1
codebooks C2 j (s2 j ), j ∈ [1 : k], s2 j ∈ S2 j = X2 × Y j−1 for m2 = (m21 , . . . , m2k ) and m2 j =
(m2 j (s2 j ) : s2 j ∈ S2 j ).
Encoding. For each block j ∈ [1 : k] (consisting of transmission times j, k + j, 2k +
j−1
j, . . . , (n − 1)k + j), encoder  treats the sequences (x1 , y j−1 ) ∈ S1nj as causal state in-
formation available at both encoder  and decoder . As in the achievability proof in Sec-
j−1
tion .., encoder  stores the codewords x1 j (m1 j (s1 j ), s1 j ), s1 j ∈ X1 × Y j−1 , in FIFO
buffers and transmits a symbol from the buffer that corresponds to the state symbol of
the given time. Similarly, encoder  stores the codewords in FIFO buffers and multiplexes
over them according to its state sequence.
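The buffer mechanics can be pictured with a small sketch (a toy illustration under simplifying assumptions, not the actual encoder of the proof): one FIFO queue per state value, preloaded with the codeword generated for that state, and at each transmission time the encoder pops the next symbol from the queue indexed by the current state symbol. The decoder demultiplexes the channel output by the same (estimated) state sequence before decoding each per-state subsequence.

```python
from collections import deque

def multiplex_transmit(codewords, state_sequence):
    """Multiplex codeword symbols over per-state FIFO buffers.

    codewords: dict mapping each state value s to the codeword (a list of
        symbols) selected for the message part m(s); each codeword is
        preloaded into its own FIFO buffer.
    state_sequence: the state symbols at transmission times 1, ..., n.
    Returns the transmitted sequence, one symbol per time.
    """
    buffers = {s: deque(cw) for s, cw in codewords.items()}
    transmitted = []
    for s in state_sequence:
        transmitted.append(buffers[s].popleft())  # next unused symbol for state s
    return transmitted

# Example with two states 'a' and 'b' and one short codeword per state.
x = multiplex_transmit({'a': [0, 1, 1, 0], 'b': [1, 1, 0, 0]},
                       ['a', 'b', 'b', 'a', 'a', 'b'])
# x == [0, 1, 1, 1, 1, 0]
```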
Decoding and the analysis of the probability of error. Upon receiving y kn , decoder 
recovers m11 , . . . , m1k successively. For interleaved block j, decoder  first forms the state
j−1
sequence estimate ̂s1 j = (̂x1 , y j−1 ) from the codeword estimates and output sequences
from previous blocks, and demultiplexes the output sequence for block j into |S1 j | subse-
quences accordingly. Then it finds the unique index m ̂ 1 j (s1 j ) for each subsequence corre-
sponding to the state s1 j . Following the same argument as in Section .., the probability
of error for M1 j tends to zero as n → ∞ if previous decoding steps are successful and

R1 j < I(X1 j ; X2k , Y jk |S1 j ) − δ(є)


(a) j j−1
= I(X1 j ; X2,k j+1 , Y jk | X2 , X1 , Y j−1 ) − δ(є)
k
j−1
= 󵠈 I(X1 j ; X2,i+1 , Yi | X2i , X1 , Y i−1 ) − δ(є)
i= j
k
(b) j−1
= 󵠈 I(X1 j ; Yi | X2i , X1 , Y i−1 ) − δ(є)
i= j

j−1 j
where (a) and (b) follow since X1 j → (X1 , Y j−1 ) → X2 and X2,i+1 → (X2i , Y i ) → X1i+1 ,
respectively. Therefore, by induction, the probability of error over all time slots j ∈ [1 : k]

tends to zero as n → ∞ if
k k k
j−1
󵠈 R1 j < 󵠈 󵠈 I(X1 j ; Yi | X2i , X1 , Y i−1 ) − kδ(є)
j=1 j=1 i= j
k i
j−1
= 󵠈 󵠈 I(X1 j ; Yi | X2i , X1 , Y i−1 ) − kδ(є)
i=1 j=1
k
= 󵠈 I(X1i ; Yi | X2i , Y i−1 ) − kδ(є)
i=1

= I(X1k → Y k ‖X2k ) − kδ(є).

Similarly, the probability of error for decoder  tends to zero as n → ∞ if


k
󵠈 R2 j < I(X2k → Y k ‖X1k ) − kδ(є).
j=1

This completes the achievability proof of Theorem ..

SUMMARY

∙ Feedback can simplify coding and improve reliability


∙ Iterative refinement:
∙ Schalkwijk–Kailath scheme for the Gaussian channel
∙ Horstein’s scheme for the BSC
∙ Posterior matching scheme for the DMC
∙ Block feedback scheme for the DMC
∙ Feedback can enlarge the capacity region of multiuser channels:
∙ Feedback can be used to induce cooperation between the senders
∙ Cover–Leung inner bound
∙ A combination of the Schalkwijk–Kailath scheme, rate splitting, and superposition
coding achieves the capacity region of the -sender Gaussian MAC with feedback
∙ Feedback can be used also to provide channel information to be broadcast to mul-
tiple receivers
∙ The cutset bound is achievable for the relay channel with feedback
∙ Dependence balance bound
∙ Directed information and the multiletter characterization of the capacity region of the
two-way channel

∙ Open problems:
17.1. What is the sum-capacity of the -sender symmetric Gaussian MAC with feed-
back?
17.2. What is the sum-capacity of the -receiver symmetric Gaussian BC with feed-
back?
17.3. What is the symmetric capacity of the binary multiplier channel?

BIBLIOGRAPHIC NOTES

The Schalkwijk–Kailath coding scheme appeared in Schalkwijk and Kailath () and
Schalkwijk (). Pinsker () and Shepp, Wolf, Wyner, and Ziv () showed that
under the almost-sure average power constraint in (.), the probability of error decays
only (single) exponentially fast, even for sending a binary message M ∈ {1, 2}. Wyner
() provided a simple modification of the Schalkwijk–Kailath scheme that achieves
any rate R < C under this more stringent power constraint. Butman () extended the
Schalkwijk–Kailath scheme to additive colored Gaussian noise channels with feedback,
the optimality of which was established by Kim ().
The Horstein coding scheme appeared in Horstein (). The first rigorous proof
that this scheme achieves the capacity of the BSC is due to Shayevitz and Feder (),
who also developed the posterior matching scheme. The block feedback coding scheme
for the BSC in Section .. is originally due to Weldon () and has been extended to
arbitrary DMCs by Ahlswede () and Ooi and Wornell ().
Gaarder and Wolf () first showed that feedback can enlarge the capacity region of
DM multiuser channels via the binary erasure MAC (R = 0.7602 in Example .). The
Cover–Leung inner bound on the capacity region of the DM-MAC with feedback ap-
peared in Cover and Leung (). The proof of achievability in Section .. follows
Zeng, Kuhlmann, and Buzo (). The capacity region of the Gaussian MAC with feed-
back in Theorem . is due to Ozarow (). Carleial () studied the MAC with
generalized feedback, where the senders observe noisy versions of the channel output at
the receiver; see Kramer () for a discussion of this model.
El Gamal () showed that feedback does not enlarge the capacity region of the
physically degraded DM-BC. Example . is due to Dueck (). Ozarow and Leung
() extended the Schalkwijk–Kailath coding scheme to the Gaussian BC as described
in Example .. The improved inner bound for the Gaussian BC is due to Elia ().
Cover and El Gamal () established the capacity of the DM-RC with feedback in The-
orem ..
Shannon () introduced the DM-TWC and established the inner and outer bounds
in Propositions . and .. The binary multiplier channel (BMC) in Example . first
appeared in the same paper, where it is attributed to D. Blackwell. Dueck () showed
via a counterexample that Shannon’s inner bound is not tight in general. Hekstra and
Willems () established the dependence balance bound and its extensions. They also

established the symmetric capacity upper bound Csym ≤ 0.6463. The symmetric capacity
lower bound in Example . was originally obtained by Schalkwijk () using a con-
structive coding scheme in the flavor of the Horstein scheme. This bound was further
improved by Schalkwijk () to Csym ≥ 0.6306.
The notion of directed information was first introduced by Marko () in a slightly
different form. The definition in Section . is due to Massey (), who demonstrated
its utility by showing that the capacity of point-to-point channels with memory is upper
bounded by the maximum directed information from X n to Y n . The causal condition-
ing notation was developed by Kramer (), who also established the multiletter char-
acterization of the capacity region for the DM-TWC in Kramer (). Massey’s upper
bound was shown to be tight for classes of channels with memory by Tatikonda and Mitter
(), Kim (b), and Permuter, Weissman, and Goldsmith (). The inner bound
on the capacity region of the BMC using directed information in Example . is due to
Ardestanizadeh (). Directed information and causally conditional probabilities arise
as a canonical answer to many other problems with causality constraints (Permuter, Kim,
and Weissman ).

PROBLEMS

.. Show that the cutset bound on the capacity of the DM-RC in Theorem . con-
tinues to hold when feedback from the receivers to the senders is present.
.. Provide the details of the analysis of the probability of error for the Cover–Leung
inner bound in Theorem ..
.. Show that the Cover–Leung inner bound for the Gaussian MAC simplifies to (.).
.. Prove achievability of the sum-capacity in (.) of the Gaussian MAC with feed-
back when S1 ̸= S2 .
.. Provide the details of the achievability proof of every rate pair in the capacity re-
gion of the Gaussian MAC.
.. Show that feedback does not increase the capacity region of the physically de-
graded broadcast channel. (Hint: Consider auxiliary random variable identifica-
tion Ui = (M2 , Y1i−1 , Y2i−1 ).)
.. Show that the simple outer bound on the capacity region of the DM-BC p(y1 , y2 |x)
in Figure . continues to hold with feedback.
.. Provide the details of the proof of the dependence balance bound in Theorem ..
.. Semideterministic MAC with feedback. Show that the Cover–Leung inner bound
on the capacity region of the DM-MAC p(y|x1 , x2 ) with feedback is tight when X1
is a function of (X2 , Y).
Remark: This result is due to Willems ().

.. Gaussian MAC with noise feedback. Consider the Gaussian multiple access chan-
nel Y = д1 X1 + д2 X2 + Z in Section ... Suppose that there is noiseless causal
feedback of the noise Z, instead of Y, to the senders. Thus, encoder j = 1, 2 as-
signs a symbol x_{ji}(m_j, z^{i−1}) at time i ∈ [1 : n]. Show that the capacity region of
this channel is the set of rate pairs (R1 , R2 ) such that

R1 + R2 ≤ C(S1 + S2 + 2√(S1 S2)),

which is equal to the cooperative capacity region (see Problem .).


.. Directed information. Prove the following properties of directed information and
causally conditional probability distributions:
(a) Chain rule: p(y n , x n ) = p(y n ||x n )p(x n ||y n−1 ).
(b) Nonnegativity: I(X n → Y n ) ≥ 0 with equality iff p(y n ||x n ) = p(y n ).
(c) Conservation: I(X^n; Y^n) = I(X^n → Y^n) + I((Y0, Y^{n−1}) → X^n), where Y0 = ∅.
(d) Comparison to mutual information: I(X n → Y n ) ≤ I(X n ; Y n ) with equality if
there is no feedback, i.e., p(xi |x i−1 , y i−1 ) = p(xi |x i−1 ), i ∈ [1 : n].
.. Compound channel with feedback. Consider the compound channel p(ys |x) in
Chapter  with finite state alphabet S.
(a) Find the capacity of the compound channel with noiseless causal feedback.
(b) Compute the capacity of the binary erasure compound channel with feedback
where the erasure probability can take one of the four values (0, 0.1, 0.2, 0.25).
(c) Let C1 be the capacity of the compound channel without feedback, C2 be the
capacity of the compound channel without feedback but when the sender and
code designer know the actual channel (in addition to the decoder), and C3 be
the capacity of the compound channel with feedback. Which of the following
statements hold in general?
(1) C1 = C2 .
(2) C1 = C3 .
(3) C2 = C3 .
(4) C1 = C2 = C3 .
(5) C1 ≤ C2 .
(d) Consider a compound channel where the capacity of each individual channel
s ∈ S is achieved by the same pmf p∗ (x) on X . Answer part (c) for this case.
(e) Consider a compound channel having zero capacity without feedback. Is the
capacity with feedback also equal to zero in this case? Prove it or provide a
counterexample.
.. Modulo- sum two-way channel. Consider a DM-TWC with X1 = X2 = Y1 = Y2 =
{0, 1} and Y1 = Y2 = X1 ⊕ X2 . Find its capacity region.

.. Gaussian two-way channel. Consider the Gaussian two-way channel

Y1 = д12 X2 + Z1 ,
Y2 = д21 X1 + Z2 ,

where the noise pair (Z1 , Z2 ) ∼ N(0, K). Assume average power constraint P on
each of X1 and X2 . Find the capacity region of this channel in terms of the power
constraint P, channel gains д12 and д21 , and the noise covariance matrix K.
.. Common-message feedback capacity of broadcast channels. Consider the DM-BC
p(y1 , y2 |x) with feedback from the receivers. Find the common-message capacity
CF .
.. Broadcast channel with feedback. Consider the generalization of Dueck’s broad-
cast channel with feedback example as depicted in Figure ., where (Z1 , Z2 ) is
a DSBS(p). Show that the capacity region with feedback is the set of rate pairs
(R1 , R2 ) such that

R1 ≤ 1,
R2 ≤ 1,
R1 + R2 ≤ 2 − H(p).

Remark: This result is due to Shayevitz and Wigger ().

Figure .. Generalized Dueck's example.

.. Channels with state and feedback. Consider the DMC with DM state p(y|x, s)p(s)
in Chapter . Suppose that noiseless causal feedback from the receiver is available
at the encoder. Show that feedback does not increase the capacity when the state
information is available causally or noncausally at the encoder.
.. Delayed feedback. We considered the capacity region for channels with causal feed-
back. Now consider the DM-MAC p(y|x1 , x2 ) and assume that the feedback has a

constant delay d such that encoder j = 1, 2 assigns a symbol x_{ji}(m_j, y^{i−d}) for time i ∈ [1 : n]. Show that the capacity region is identical to that for the causal case
(d = 1). (Hint: For example, when d = 2, consider  parallel DM-MACs corre-
sponding to even and odd transmissions.)

APPENDIX 17A PROOF OF LEMMA 17.1

We use induction. By the construction of U1 and U2 , ρ1 = ρ∗ . For i ≥ 2, consider

(a)
I(X1i ; X2i ) = I(X1i ; X2i |Yi−1 )
= I(X1,i−1 ; X2,i−1 |Yi−1 )
= I(X1,i−1 ; X2,i−1 ) + I(X1,i−1 ; Yi−1 | X2,i−1 ) − I(X1,i−1 ; Yi−1 )
(b)
= I(X1,i−1 ; X2,i−1 ),

where (a) follows since the pair (X1i , X2i ) is independent of Yi−1 , (b) follows by the in-
duction hypothesis I(X1,i−1 ; Yi−1 |X2,i−1 ) = I(X1,i−1 ; Yi−1 ). Hence, we have ρ2i = ρ2i−1 .
To show that ρi = ρi−1 (same sign), consider

I(X1i ; X1i + X2i ) = I(X1i ; X1i + X2i |Yi−1 )


(a)
= I(X1,i−1 ; X1,i−1 − X2,i−1 |Yi−1 )
(b)
= I(X1,i−1 , Yi−1 ; X1,i−1 − X2,i−1 )
= I(X1,i−1 ; X1,i−1 − X2,i−1 ) + I(X2,i−1 ; Yi−1 | X1,i−1 )
> I(X1,i−1 ; X1,i−1 − X2,i−1 ),

where (a) follows by the definitions of X1i , X2i and (b) follows since X1,i−1 − X2,i−1 is in-
dependent of Yi−1 . Hence we must have ρi ≥ 0, which completes the proof of Lemma ..
CHAPTER 18

Discrete Memoryless Networks

We introduce the discrete memoryless network (DMN) as a model for general multihop
networks that includes as special cases graphical networks, the DM-RC, the DM-TWC,
and the DM single-hop networks with and without feedback we studied earlier. In this
chapter we establish outer and inner bounds on the capacity region of this network.
We first consider the multicast messaging case and establish a cutset upper bound on
the capacity. We then extend the decode–forward coding scheme for the relay channel to
multicast networks using the new idea of sliding block decoding. The resulting network
decode–forward lower bound on the capacity is tight for the physically degraded multicast
network. Generalizing compress–forward for the relay channel and network coding for
graphical networks, we introduce the noisy network coding scheme. This scheme involves
several new ideas. Unlike the compress–forward coding scheme for the relay channel in
Section ., where independent messages are sent over multiple blocks, in noisy network
coding the same message is sent multiple times using independent codebooks. Further-
more, the relays do not use Wyner–Ziv binning and each decoder performs simultaneous
nonunique decoding on the received signals from all the blocks without explicitly decod-
ing for the compression indices. We show that this scheme is optimal for certain classes of
deterministic networks and wireless erasure networks. These results also extend the net-
work coding theorem in Section . to graphical networks with cycles and broadcasting.
We then turn our attention to the general multimessage network. We establish a cutset
bound on the multimessage capacity region of the DMN, which generalizes all previous
cutset bounds. We also extend the noisy network coding inner bound to multimessage
multicast networks and show that it coincides with the cutset bound for the aforemen-
tioned classes of deterministic networks and wireless erasure networks. Finally, we show
how the noisy network coding scheme can be combined with decoding techniques for
the interference channel to establish inner bounds on the capacity region of the general
DMN.

18.1 DISCRETE MEMORYLESS MULTICAST NETWORK

Consider the multicast network depicted in Figure .. The network is modeled by an
N-node discrete memoryless network (DMN) (X1 × ⋅ ⋅ ⋅ × XN , p(y1 , . . . , yN |x1 , . . . , xN ),
Y1 × ⋅ ⋅ ⋅ × YN ) that consists of N sender–receiver alphabet pairs (Xk , Yk ), k ∈ [1 : N],

Figure .. Discrete memoryless multicast network.

and a collection of conditional pmfs p(y1 , . . . , yN |x1 , . . . , xN ). Note that the topology of
this network, that is, which nodes can communicate directly to which other nodes, is de-
fined through the structure of the conditional pmf p(y N |x N ). In particular, the graphical
network discussed in Chapter  is a special case of the DMN.
We assume that source node  wishes to send a message M to a set of destination nodes
D ⊆ [2 : N]. We wish to find the capacity of this DMN.
A (2nR , n) code for this discrete memoryless multicast network (DM-MN) consists of
∙ a message set [1 : 2nR ],
∙ a source encoder that assigns a symbol x1i (m, y1i−1 ) to each message m ∈ [1 : 2nR ] and
received sequence y1i−1 for i ∈ [1 : n],
∙ a set of relay encoders, where encoder k ∈ [2 : N] assigns a symbol xki (yki−1 ) to every
received sequence yki−1 for i ∈ [1 : n], and
̂ k or an error message e
∙ a set of decoders, where decoder k ∈ D assigns an estimate m
n
to each yk .
We assume that the message M is uniformly distributed over the message set. The average
̂ k ̸= M for some k ∈ D}. A rate R is said to
probability of error is defined as Pe(n) = P{M
be achievable if there exists a sequence of (2nR , n) codes such that limn→∞ Pe(n) = 0. The
capacity of the DM-MN is the supremum of the set of achievable rates.
Consider the following special cases:
∙ If N = 2, Y1 = Y2 , X2 = , and D = {2}, then the DM-MN reduces to the DMC with
feedback.
∙ If N = 3, X3 = Y1 = , and D = {3}, the DM-MN reduces to the DM-RC.
∙ If X2 = ⋅ ⋅ ⋅ = XN =  and D = [2 : N], then the DM-MN reduces to the DM-BC with
common message (with feedback if Y1 = (Y2 , . . . , YN ) or without feedback if Y1 = ).
∙ If D = {N}, then the DM-MN reduces to a discrete memoryless unicast network.

As we already know from the special case of the DM-RC, the capacity of the DM-MN is
not known in general.
Cutset bound. The cutset bounds for the graphical multicast network in Section . and
the DM-RC in Section . can be generalized to the DM-MN.

Theorem . (Cutset Bound for the DM-MN). The capacity of the DM-MN with
destination set D is upper bounded as

C ≤ max min min I(X(S); Y(S c )| X(S c )).


p(x 󰑁 ) k∈D S:1∈S,k∈S 󰑐

Proof. To establish this cutset upper bound, let k ∈ D and consider a cut (S, S^c) such that 1 ∈ S and k ∈ S^c. Then by Fano's inequality, H(M | Y^n(S^c)) ≤ H(M | Y_k^n) ≤ nє_n, where є_n tends to zero as n → ∞. Now consider

nR = H(M)
   ≤ I(M; Y^n(S^c)) + nє_n
   = ∑_{i=1}^{n} I(M; Y_i(S^c) | Y^{i−1}(S^c)) + nє_n
   = ∑_{i=1}^{n} I(M; Y_i(S^c) | Y^{i−1}(S^c), X_i(S^c)) + nє_n
   ≤ ∑_{i=1}^{n} I(M, Y^{i−1}(S^c); Y_i(S^c) | X_i(S^c)) + nє_n
   ≤ ∑_{i=1}^{n} I(X_i(S), M, Y^{i−1}(S^c); Y_i(S^c) | X_i(S^c)) + nє_n
   = ∑_{i=1}^{n} I(X_i(S); Y_i(S^c) | X_i(S^c)) + nє_n.

Introducing a time-sharing random variable Q ∼ Unif[1 : n] independent of all other random variables and defining X(S) = X_Q(S), Y(S^c) = Y_Q(S^c), and X(S^c) = X_Q(S^c) so that Q → X^N → Y^N, we obtain

nR ≤ nI(X_Q(S); Y_Q(S^c) | X_Q(S^c), Q) + nє_n
   ≤ nI(X_Q(S); Y_Q(S^c) | X_Q(S^c)) + nє_n
   = nI(X(S); Y(S^c) | X(S^c)) + nє_n,

which completes the proof of Theorem ..

As discussed in Example ., this cutset bound is not tight in general.



18.2 NETWORK DECODE–FORWARD

We now consider coding schemes and corresponding lower bounds on the capacity of
the DM-MN. First we generalize the decode–forward scheme for the relay channel to
DM multicast networks.

Theorem . (Network Decode–Forward Lower Bound). The capacity of the DM-
MN p(y N |x N ) with destination set D is lower bounded as

C ≥ max
󰑁
min I(X k ; Yk+1 | Xk+1
N
).
p(x ) k∈[1:N−1]

Remark 18.1. For N = 3 and X3 = , the network decode–forward lower bound reduces
to the decode–forward lower bound for the relay channel in Section ..
Remark 18.2. The network decode–forward lower bound is tight when the DM-MN is degraded, i.e.,

p(y_{k+2}^N | x^N, y^{k+1}) = p(y_{k+2}^N | x_{k+1}^N, y_{k+1})

for k ∈ [1 : N − 2]. The converse follows by evaluating the cutset bound in Theorem . only for cuts of the form S = [1 : k], k ∈ [1 : N − 1], instead of all possible cuts. Hence

C ≤ max_{p(x^N)} min_{k∈[1:N−1]} I(X^k; Y_{k+1}^N | X_{k+1}^N) = max_{p(x^N)} min_{k∈[1:N−1]} I(X^k; Y_{k+1} | X_{k+1}^N),

where the equality follows by the degradedness condition X^k → (Y_{k+1}, X_{k+1}^N) → Y_{k+2}^N.
Remark 18.3. The network decode–forward lower bound is also tight for broadcasting
over the relay channel p(y2 , y3 |x1 , x2 ), i.e., D = {2, 3} (see Problem .). For this case,
the capacity is
C = max_{p(x1,x2)} min{I(X1; Y2 | X2), I(X1, X2; Y3)}.

Remark 18.4. The decode–forward lower bound holds for any set of destination nodes
D ⊆ [2 : N]. The bound can be further improved by removing some relay nodes and
relabeling the nodes in the best order.

18.2.1 Sliding Window Decoding for the DM-RC


The achievability proof of the network decode–forward lower bound in Theorem . in-
volves the new idea of sliding window decoding. For clarity of presentation, we first use
this decoding scheme to provide an alternative proof of the decode–forward lower bound
for the relay channel

C ≥ max_{p(x1,x2)} min{I(X1, X2; Y3), I(X1; Y2 | X2)}.   (.)

The codebook generation, encoding, and relay encoding are the same as in the decode–
forward scheme for the DM-RC in Section .. However, instead of using backward
decoding, the receiver decodes for the message sequence in the forward direction using
a different procedure.
As before, we use a block Markov coding scheme to send a sequence of (b − 1) i.i.d.
messages, m j ∈ [1 : 2nR ], j ∈ [1 : b − 1], in b transmission blocks, each consisting of n
transmissions.

Codebook generation. We use the same codebook generation as in cooperative multihop


relaying discussed in Section ... Fix the pmf p(x1 , x2 ) that attains the decode–forward
lower bound in (.). For each block j ∈ [1 : b], randomly and independently generate 2nR
sequences x2n (m j−1 ), m j−1 ∈ [1 : 2nR ], each according to ∏ni=1 p X2 (x2i ). For each m j−1 ∈
[1 : 2nR ], randomly and conditionally independently generate 2nR sequences x1n (m j |m j−1 ),
m j ∈ [1 : 2nR ], each according to ∏ni=1 p X1 |X2 (x1i |x2i (m j−1 )). This defines the codebook
C j = {(x1n (m j |m j−1 ), x2n (m j−1 )) : m j−1 , m j ∈ [1 : 2nR ]} for j ∈ [1 : b]. The codebooks are
revealed to all parties.
Encoding and decoding are explained with the help of Table ..

Block    ⋅⋅⋅ b−1 b

X1 x1n (m1 |1) x1n (m2 |m1 ) x1n (m3 |m2 ) ⋅⋅⋅ x1n (mb−1 |mb−2 ) x1n (1|mb−1 )

Y2 ̃1
m ̃2
m ̃3
m ⋅⋅⋅ ̃ b−1
m 

X2 x2n (1) ̃ 1)
x2n (m ̃ 2)
x2n (m ⋅⋅⋅ ̃ b−2 )
x2n (m ̃ b−1 )
x2n (m

Y3  ̂1
m ̂2
m ⋅⋅⋅ ̂ b−2
m ̂ b−1
m

Table .. Encoding and sliding window decoding for the relay channel.

Encoding. Encoding is again the same as in the coherent multihop coding scheme. To
send m j in block j, the sender transmits x1n (m j |m j−1 ) from codebook C j , where m0 =
mb = 1 by convention.

Relay encoding. Relay encoding is also the same as in the coherent multihop coding
scheme. By convention, let m ̃ 0 = 1. At the end of block j, the relay finds the unique
message m ̃ j such that (x1n (m
̃ j |m ̃ j−1 ), y2n ( j)) ∈ Tє(n) . In block j + 1, it transmits
̃ j−1 ), x2n (m
̃ j ) from codebook C j+1 .
x2 ( m
n

Sliding window decoding. At the end of block j + 1, the receiver finds the unique mes-
̂ j such that (x1n (m
sage m ̂ j |m ̂ j−1 ), y3n ( j)) ∈ Tє(n) and (x2n (m
̂ j−1 ), x2n (m ̂ j ), y3n ( j + 1)) ∈ Tє(n)
simultaneously.

Analysis of the probability of error. We analyze the probability of decoding error for M j

averaged over codebooks. Assume without loss of generality that M j−1 = M j = 1. Then
the decoder makes an error only if one or more of the following events occur:
̃ j − 1) = {M
E( ̃ j−1 ̸= 1},
̃ j) = {M
E( ̃ j ̸= 1},
̂ j−1 ̸= 1},
E( j − 1) = {M
̃ j |M
E1 ( j) = 󶁁(X1n (M ̂ j−1 ), X n (M
̂ j−1 ), Y n ( j)) ∉ T (n) or (X n (M
̃ j ), Y n ( j + 1)) ∉ T (n) 󶁑,
2 3 є 2 3 є
̂ j−1 ), X n (M
E2 ( j) = 󶁁(X1n (m j | M ̂ j−1 ), Y n ( j)) ∈ T (n) and (X n (m j ), Y n ( j + 1)) ∈ T (n)
2 3 є 2 3 є
for some m j ̸= M ̃ j 󶁑.

Thus the probability of error is upper bounded as


P(E( j)) = P{M̂ j ̸= 1}
̃ j − 1) ∪ E(
≤ P(E( ̃ j) ∪ E( j − 1) ∪ E1 ( j) ∪ E2 ( j))
̃ j − 1)) + P(E(
≤ P(E( ̃ j)) + P(E( j − 1))
+ P(E1 ( j) ∩ Ẽ ( j − 1) ∩ Ẽc ( j) ∩ E c ( j − 1)) + P(E2 ( j) ∩ Ẽc ( j)).
c

By independence of the codebooks, the LLN, and the packing lemma, the first, second,
and fourth terms tend to zero as n → ∞ if R < I(X1 ; Y2 |X2 ) − δ(є), and by induction, the
third term tends to zero as n → ∞. For the last term, consider
P(E2 ( j) ∩ Ẽc ( j)) = P󶁁(X1n (m j | M
̂ j−1 ), X n (M
2
̂ j−1 ), Y n ( j)) ∈ T (n) ,
3 є
̃ j = 1󶁑
(X2n (m j ), Y3n ( j + 1)) ∈ Tє(n) for some m j ̸= 1, and M
̂ j−1 ), X2n (M
≤ 󵠈 P󶁁(X1n (m j | M ̂ j−1 ), Y3n ( j)) ∈ Tє(n) ,
m 󰑗 ̸=1
̃ j = 1󶁑
(X2n (m j ), Y3n ( j + 1)) ∈ Tє(n) , and M
(a)
= 󵠈 P󶁁(X1n (m j | M ̂ j−1 ), X n (M
̂ j−1 ), Y n ( j)) ∈ T (n) and M
̃ j = 1󶁑
2 3 є
m 󰑗 ̸=1
⋅ P󶁁(X2n (m j ), Y3n ( j + 1)) ∈ Tє(n) 󵄨󵄨󵄨󵄨 M
̃ j = 1󶁑
≤ 󵠈 P󶁁(X1n (m j | M ̂ j−1 ), X2n (M
̂ j−1 ), Y3n ( j)) ∈ Tє(n) 󶁑
m 󰑗 ̸=1
⋅ P󶁁(X2n (m j ), Y3n ( j + 1)) ∈ Tє(n) 󵄨󵄨󵄨󵄨 M
̃ j = 1󶁑
(b) nR −n(I(X ;Y |X )−δ(є)) −n(I(X ;Y )−δ(є))
≤2 2 1 3 2 2 3
2 ,
which tends to zero as n → ∞ if R < I(X1 ; Y3 |X2 ) + I(X2 ; Y3 ) − 2δ(є) = I(X1 , X2 ; Y3 ) −
2δ(є). Here step (a) follows since, by the independence of the codebooks, the events
̂ j−1 ), X n (M
󶁁(X1n (m j | M ̂ j−1 ), Y n ( j)) ∈ T (n) 󶁑
2 3 є

and
󶁁(X2n (m j ), Y3n ( j + 1)) ∈ Tє(n) 󶁑
are conditionally independent given M ̃ j = 1 for m j ̸= 1, and (b) follows by independence
of the codebooks and the joint typicality lemma. This completes the proof of the decode–
forward lower bound for the DM-RC using sliding block decoding.

18.2.2 Proof of the Network Decode–Forward Lower Bound


We now extend decode–forward using sliding window decoding for the relay channel to
the DM-MN. Each relay node k ∈ [2 : N − 1] recovers the message and forwards it to the
next node. A sequence of (b − N + 2) i.i.d. messages m j ∈ [1 : 2nR ], j ∈ [1 : b − N + 2],
is to be sent over the channel in b blocks, each consisting of n transmissions. Note that
the average rate over the b blocks is R(b − N + 2)/b, which can be made as close to R as
desired.
Codebook generation. Fix p(x N ). For every j ∈ [1 : b], randomly and independently
generate a sequence xNn ( j) according to ∏ni=1 p X󰑁 (xNi ) (this is similar to coded time shar-
ing). For every relay node k = N − 1, N − 2, . . . , 1 and every (m j−N+2 , . . . , m j−k ), gener-
j−k
ate 2nR conditionally independent sequences xkn (m j−k+1 |m j−N+2 ), m j−k+1 ∈ [1 : 2nR ], each
j−k−1
according to ∏ni=1 p X󰑘 |X󰑘+1
󰑁 (x ki |xk+1,i (m j−k |m
j−N+2 ), . . . , xN−1,i (m j−N+2 ), xNi ( j)). This de-
fines the codebook

j−1 j−2
C j = 󶁁(x1n (m j |m j−N+2 ), x2n (m j−1 |m j−N+2 ), . . . , xN−1
n
(m j−N+2 ), xNn ( j)) :
m j−N+2 , . . . , m j ∈ [1 : 2nR ]󶁑

for each block j ∈ [1 : b]. The codebooks are revealed to all parties.
Encoding and decoding for N = 4 are explained with the help of Table ..
Encoding. Let m j ∈ [1 : 2nR ] be the new message to be sent in block j. Source node 
j−1
transmits x1n (m j |m j−N+2 ) from codebook C j . At the end of block j + k − 2, relay node
k ∈ [2 : N − 1] has an estimate m ̂ k j of the message m j . In block j + k − 1, it transmits
j−1
̂ k j |m
x k (m
n
̂ k, j+k−N+1 ) from codebook C j+k−1 . Node N transmits xNn ( j) in block j ∈ [1 : N].
Decoding and analysis of the probability of error. The decoding procedures for message
̂ k j such
m j are as follows. At the end of block j + k − 2, node k finds the unique message m
that

̂ k j ), x2n , . . . , xNn ( j), ykn ( j)) ∈ Tє(n) ,


(x1n (m
̂ k j ), x3n , . . . , xNn ( j + 1), ykn ( j + 1)) ∈ Tє(n) ,
(x2n (m
..
.
n
(xk−1 ̂ k j ), xkn , . . . , xNn ( j + k − 2), ykn ( j + k − 2)) ∈ Tє(n) ,
(m

j−1
where the dependence of codewords on previous message indices m ̂ k, j−N+2 is suppressed
for brevity. For example, for the case N = 4, node  at the end of block j + 2 finds the
unique message m ̂ 4 j such that

̂ 4 j |m
(x1n (m ̂ 4, j−2 , m
̂ 4, j−1 ), x2n (m
̂ 4, j−1 | m ̂ 4, j−2 ), x4n ( j), y4n ( j)) ∈ Tє(n) ,
̂ 4, j−2 ), x3n (m
̂ 4 j |m
(x2n (m ̂ 4, j−1 ), x4n ( j + 1), y4n ( j + 1)) ∈ Tє(n) ,
̂ 4, j−1 ), x3n (m
̂ 4 j ), x4n ( j + 2), y4n ( j + 2)) ∈ Tє(n) .
(x3n (m

Block  1               2                  3                    ⋅⋅⋅  j
X1     x1^n(m1|1,1)    x1^n(m2|1,m1)      x1^n(m3|m_1^2)       ⋅⋅⋅  x1^n(m_j|m_{j−2}^{j−1})
Y2     m̂_{21}          m̂_{22}             m̂_{23}               ⋅⋅⋅  m̂_{2j}
X2     x2^n(1|1)       x2^n(m̂_{21}|1)     x2^n(m̂_{22}|m̂_{21})  ⋅⋅⋅  x2^n(m̂_{2,j−1}|m̂_{2,j−2})
Y3     ∅               m̂_{31}             m̂_{32}               ⋅⋅⋅  m̂_{3,j−1}
X3     ∅               ∅                  x3^n(m̂_{31})         ⋅⋅⋅  x3^n(m̂_{3,j−2})
Y4     ∅               ∅                  m̂_{41}               ⋅⋅⋅  m̂_{4,j−2}

Block  j+1                        ⋅⋅⋅  b−2                           b−1                         b
X1     x1^n(m_{j+1}|m_{j−1}^{j})  ⋅⋅⋅  x1^n(m_{b−2}|m_{b−4}^{b−3})   x1^n(1|m_{b−3}^{b−2})       x1^n(1|m_{b−2},1)
Y2     m̂_{2,j+1}                  ⋅⋅⋅  m̂_{2,b−2}                     ∅                           ∅
X2     x2^n(m̂_{2j}|m̂_{2,j−1})     ⋅⋅⋅  x2^n(m̂_{2,b−3}|m̂_{2,b−4})     x2^n(m̂_{2,b−2}|m̂_{2,b−3})   x2^n(1|m̂_{2,b−2})
Y3     m̂_{3j}                     ⋅⋅⋅  m̂_{3,b−3}                     m̂_{3,b−2}                   ∅
X3     x3^n(m̂_{3,j−1})            ⋅⋅⋅  x3^n(m̂_{3,b−4})               x3^n(m̂_{3,b−3})             x3^n(m̂_{3,b−2})
Y4     m̂_{4,j−1}                  ⋅⋅⋅  m̂_{4,b−4}                     m̂_{4,b−3}                   m̂_{4,b−2}

Table .. Encoding and decoding for network decode–forward for N = 4.

Following similar steps to the proof of the sliding window decoding for the relay chan-
nel, by independence of the codebooks, the LLN, the joint typicality lemma, and induc-
tion, it can be shown that the probability of error tends to zero as n → ∞ if
R < ∑_{k′=1}^{k−1} I(X_{k′}; Y_k | X_{k′+1}^N) − δ(є) = I(X^{k−1}; Y_k | X_k^N) − δ(є).

This completes the proof of Theorem ..

18.3 NOISY NETWORK CODING

The compress–forward coding scheme for the relay channel and network coding for the
graphical multicast network can be extended to DM multicast networks.

Theorem . (Noisy Network Coding Lower Bound). The capacity of the DM-MN
p(y N |x N ) with destination set D is lower bounded as

C ≥ max min min ̂ c ), Yk | X(S c )) − I(Y (S); Ŷ (S)| X N , Y(S


󶀡I(X(S); Y(S ̂ c ), Yk )󶀱,
k∈D S:1∈S ,k∈S 󰑐

where the maximum is over all conditional pmfs ∏Nk=1 p(xk )p( ŷk |yk , xk ) and Ŷ1 =  by
convention.

Note that this lower bound differs from the cutset bound in Theorem . in the fol-
lowing three aspects:
∙ The first term is similar to the cutset bound with Y(S c ) replaced by the “compressed
version” Ŷ (S c ).
∙ There is a subtracted term that captures the rate penalty for describing the compressed
version.
∙ The maximum is over independent X N (no coherent cooperation).

Remark 18.5. For N = 3, the noisy network coding lower bound reduces to the com-
press–forward lower bound for the DM-RC in Section ..
Remark 18.6. As in compress–forward for the relay channel, the noisy network coding
scheme can be improved by coded time sharing (i.e., by incorporating a time-sharing
random variable Q into the characterization of the lower bound).
Before proving Theorem ., we discuss several special cases.

18.3.1 Deterministic Multicast Network


Consider the deterministic multicast network depicted in Figure ., where the received
symbol Yk at each node k ∈ [1 : N] is a function of the transmitted symbols, i.e., Yk =
yk (X N ). Note that this model generalizes the graphical multicast network. It also captures
the interference and broadcasting aspects of the DMN without the noise; hence it is a good
model for networks with “high SNR.”
The cutset bound for this deterministic multicast network simplifies to

C ≤ max_{p(x^N)} min_{k∈D} min_{S : 1∈S, k∈S^c} H(Y(S^c) | X(S^c)).   (.)

In the other direction, by setting Ŷk = Yk for all k ∈ [2 : N], the noisy network coding lower bound simplifies to

C ≥ max_{∏_{k=1}^N p(x_k)} min_{k∈D} min_{S : 1∈S, k∈S^c} H(Y(S^c) | X(S^c)).   (.)

Note that the maximum here is taken over all product pmfs instead of all joint pmfs as in

Figure .. Deterministic network.

the cutset bound in (.). As such, these two bounds do not coincide in general. There
are several interesting special cases where the bounds coincide, however.
Graphical multicast network with cycles. Consider a multicast network modeled by a
directed cyclic graph (N , E , C) with destination set D. Each node k ∈ [1 : N] transmits
Xk = (Xkl : (k, l) ∈ E) and receives Yk = (X jk : ( j, k) ∈ E). Thus, each link ( j, k) ∈ E
carries a symbol X jk ∈ X jk noiselessly from node j to node k with link capacity C jk =
log |X_{jk}|. Now for each cut (S, S^c) separating source node 1 and some destination node,

H(Y(S^c) | X(S^c)) ≤ ∑_{k∈S^c} H(X_{jk} : j ∈ S, (j, k) ∈ E)
                   ≤ ∑_{k∈S^c} ∑_{j∈S : (j,k)∈E} H(X_{jk})
                   ≤ ∑_{(j,k)∈E : j∈S, k∈S^c} C_{jk}
                   = C(S)

with equality if (X_{jk} : (j, k) ∈ E) has a uniform product pmf. Hence, the cutset bound in (.) and the lower bound in (.) coincide, and the capacity is

C = min_{k∈D} min_{S : 1∈S, k∈S^c} C(S).

Note that this result extends the network coding theorem in Section . to graphical
networks with cycles.
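Since this capacity is a minimum of finitely many cut capacities, it can be evaluated by direct enumeration for small graphs. The sketch below is only an illustration (the graph, capacities, and function name are invented for the example): it computes the minimum of C(S) over destinations and cuts for a directed graph given as a dictionary of link capacities.

```python
from itertools import combinations

def graphical_multicast_capacity(num_nodes, cap, source, destinations):
    """min over destinations d and cuts S (source in S, d not in S) of
    C(S) = sum of C_jk over links (j, k) with j in S and k in S^c.

    cap: dict {(j, k): C_jk} of directed link capacities on nodes 1..num_nodes.
    The enumeration is exponential in the number of nodes; fine for small examples.
    """
    others = [v for v in range(1, num_nodes + 1) if v != source]
    best = None
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            S = {source} | set(extra)
            if not (set(destinations) - S):   # cut must separate some destination
                continue
            cut = sum(c for (j, k), c in cap.items() if j in S and k not in S)
            best = cut if best is None else min(best, cut)
    return best

# Butterfly-like example with unit-capacity links; destinations 5 and 6.
links = {(1, 2): 1, (1, 3): 1, (2, 4): 1, (3, 4): 1,
         (2, 5): 1, (3, 6): 1, (4, 5): 1, (4, 6): 1}
print(graphical_multicast_capacity(6, links, source=1, destinations=[5, 6]))  # 2
```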
Deterministic multicast network with no interference. Suppose that each node in the
deterministic network receives a collection of single-variable functions of input symbols,
i.e., Yk = (yk1 (X1 ), . . . , ykN (XN )), k ∈ [1 : N]. Then it can be shown that the cutset bound
in (.) is attained by a product input pmf and thus the capacity is
C = max_{∏_{k=1}^N p(x_k)} min_{k∈D} min_{S : 1∈S, k∈S^c} ∑_{j∈S} H(Y_{lj} : l ∈ S^c).   (.)

For example, if N = 3, D = {2, 3}, Y2 = (y21 (X1 ), y23 (X3 )), and Y3 = (y31 (X1 ), y32 (X2 )),
then

C = max_{p(x1)p(x2)p(x3)} min{H(Y21, Y31), H(Y21) + H(Y23), H(Y31) + H(Y32)}.

Finite-field deterministic multicast network (FFD-MN). Suppose that each node receives a linear function

Y_j = ∑_{k=1}^{N} д_{jk} X_k,

where д_{jk}, j, k ∈ [1 : N], and X_k, k ∈ [1 : N], take values in the finite field 𝔽q. Then it can be easily shown that the cutset bound is attained by the uniform product input pmf. Hence the capacity is

C = min_{k∈D} min_{S : 1∈S, k∈S^c} rank(G(S)) log q,

where G(S) for each cut (S, S^c) is defined such that

[ Y(S)   ]   [ G′(S)   G(S^c)  ] [ X(S)   ]   [ Z(S)   ]
[ Y(S^c) ] = [ G(S)    G′(S^c) ] [ X(S^c) ] + [ Z(S^c) ],

for some submatrices G′(S) and G′(S^c).
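The same enumeration applies here with the cut value C(S) replaced by rank(G(S)) log q, where G(S) collects the gains from the senders in S to the receivers in S^c. The sketch below works over 𝔽2 and is illustrative only; the rank routine is plain Gaussian elimination modulo 2, and any gain matrix G passed in is an invented example rather than data from the text.

```python
from itertools import combinations

def rank_gf2(rows):
    """Rank of a 0/1 matrix (list of row lists) over GF(2), by Gaussian elimination."""
    rows = [r[:] for r in rows if any(r)]
    if not rows:
        return 0
    rank, ncols = 0, len(rows[0])
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [(a + b) % 2 for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def ffd_min_cut_rank(G, source, destinations):
    """min over destinations d and cuts S (source in S, d in S^c) of rank(G(S)),
    where G(S) has one row per receiver in S^c and one column per sender in S.
    Multiply the result by log q to get the capacity; over GF(2), log q = 1 bit."""
    N = len(G)
    others = [v for v in range(1, N + 1) if v != source]
    best = None
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            S = {source} | set(extra)
            if not (set(destinations) - S):
                continue
            sub = [[G[i - 1][j - 1] for j in sorted(S)]
                   for i in range(1, N + 1) if i not in S]
            rk = rank_gf2(sub)
            best = rk if best is None else min(best, rk)
    return best
```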


Remark .. The lower bound in (.) does not always coincide with the cutset bound.
For example, consider the deterministic relay channel with X3 = Y1 = , Y2 = y2 (X1 , X2 ),
Y3 = y3 (X1 , X2 ), and D = {3}. Then the capacity is

C = max min󶁁H(Y3 ), H(Y2 , Y3 | X2 )󶁑,


p(x1 ,x2 )

and is achieved by partial decode–forward as discussed in Section .. In the other di-
rection, the noisy network coding lower bound in (.) simplifies to

C ≥ max min󶁁H(Y3 ), H(Y2 , Y3 | X2 )󶁑,


p(x1 )p(x2 )

which can be strictly lower than the capacity.

18.3.2 Wireless Erasure Multicast Network


Consider a wireless data network with packet loss modeled by a hypergraph H = (N , E , C)
with random input erasures as depicted in Figure .. Each node k ∈ [1 : N] broadcasts a
symbol Xk to a subset of nodes Nk over a hyperedge (k, Nk ) and receives Yk = (Yk j : k ∈
N j ) from nodes j for k ∈ N j , where

Y_{kj} = e with probability p_{kj}, and Y_{kj} = X_j with probability 1 − p_{kj}.

Figure .. Wireless erasure multicast network.

Note that the capacity of each hyperedge (k, Nk ) (with no erasure) is Ck = log |Xk |. We
assume that the erasures are independent of each other. Assume further that the erasure
pattern of the entire network is known at each destination node.
The capacity of this network is

C = min_{j∈D} min_{S : 1∈S, j∈S^c} ∑_{k∈S : N_k ∩ S^c ≠ ∅} (1 − ∏_{l∈N_k ∩ S^c} p_{lk}) C_k.   (.)

The proof follows by evaluating the noisy network coding lower bound using the uniform
product pmf on X N and Ŷk = Yk , k ∈ [2 : N], and showing that this choice attains the
maximum in the cutset bound in Theorem ..
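The right-hand side of (.) is again a minimum over finitely many cuts and is straightforward to evaluate. The sketch below uses an invented four-node diamond topology and erasure probabilities purely to illustrate the formula; the function and variable names are not from the text.

```python
from itertools import combinations

def erasure_multicast_capacity(N, out_nbrs, p_erase, C, source, destinations):
    """Evaluate min over destinations and cuts of
    sum_{k in S with N_k ∩ S^c nonempty} (1 - prod_{l in N_k ∩ S^c} p_erase[(l, k)]) * C[k].

    out_nbrs[k]: receivers of node k's broadcast (the hyperedge (k, N_k)).
    p_erase[(l, k)]: probability that node l erases the symbol sent by node k.
    C[k]: hyperedge capacity log|X_k| of node k.
    """
    others = [v for v in range(1, N + 1) if v != source]
    best = None
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            S = {source} | set(extra)
            if not (set(destinations) - S):
                continue
            val = 0.0
            for k in S:
                across = out_nbrs.get(k, set()) - S
                if across:
                    prod = 1.0
                    for l in across:
                        prod *= p_erase[(l, k)]
                    val += (1.0 - prod) * C[k]
            best = val if best is None else min(best, val)
    return best

# Diamond network 1 -> {2, 3} -> 4 with unit hyperedge capacities.
nbrs = {1: {2, 3}, 2: {4}, 3: {4}}
pe = {(2, 1): 0.5, (3, 1): 0.5, (4, 2): 0.25, (4, 3): 0.25}
print(erasure_multicast_capacity(4, nbrs, pe, {1: 1, 2: 1, 3: 1, 4: 1}, 1, [4]))  # 0.75
```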

18.3.3 Noisy Network Coding for the DM-RC


While the noisy network coding scheme achieves the same compress–forward lower
bound for the relay channel, it involves several new ideas not encountered in the com-
press–forward scheme. First, unlike other block Markov coding techniques we have seen
so far, in noisy network coding the same message m ∈ [1 : 2nbR ] is sent over b blocks
(each with n transmissions), while the relays send compressed versions of the received
sequences in the previous block. Second, instead of using Wyner–Ziv coding to send the
compression index, the relay sends the compression index without binning. Third, the
receiver performs simultaneous joint typicality decoding on the received sequences from
all b blocks without explicitly decoding for the compression indices.
For clarity of presentation, we first use the noisy network coding scheme to obtain an
alternative proof of the compress–forward lower bound for the relay channel

C ≥ max_{p(x1)p(x2)p(ŷ2|y2,x2)} min{I(X1, X2; Y3) − I(Y2; Ŷ2 | X1, X2, Y3), I(X1; Ŷ2, Y3 | X2)}.   (.)

Codebook generation. Fix the conditional pmf p(x1 )p(x2 )p( ŷ2 |y2 , x2 ) that attains the
lower bound. For each j ∈ [1 : b], we generate an independent codebook as follows.
Randomly and independently generate 2nbR sequences x1n ( j, m), m ∈ [1 : 2nbR ], each ac-
cording to ∏ni=1 p X1 (x1i ). Randomly and independently generate 2nR2 sequences x2n (l j−1 ),
l j−1 ∈ [1 : 2nR2 ], each according to ∏ni=1 p X2 (x2i ). For each l j−1 ∈ [1 : 2nR2 ], randomly and
conditionally independently generate 2nR2 sequences ŷ2n (l j |l j−1 ), l j ∈ [1 : 2nR2 ], each ac-
cording to ∏ni=1 pŶ2 |X2 ( ŷ2i |x2i (l j−1 )). This defines the codebook

C j = 󶁁(x1n ( j, m), x2n (l j−1 ), ŷ2n (l j | l j−1 )) : m ∈ [1 : 2nbR ], l j , l j−1 ∈ [1 : 2nR2 ]󶁑

for j ∈ [1 : b]. The codebooks are revealed to all parties.


Encoding and decoding are explained with the help of Table ..

Block    ⋅⋅⋅ b−1 b

X1 x1n (1, m) x1n (2, m) x1n (3, m) ⋅⋅⋅ x1n (b − 1, m) x1n (b, m)

Y2 ŷ2n (l1 |1), l1 ŷ2n (l2 |l1 ), l2 ŷ2n (l3 |l2 ), l3 ⋅⋅⋅ ŷ2n (lb−1 |lb−2 ), lb−1 ŷ2n (lb |lb−1 ), lb

X2 x2n (1) x2n (l1 ) x2n (l2 ) ⋅⋅⋅ x2n (lb−2 ) x2n (lb−1 )

Y3    ⋅⋅⋅  ̂
m

Table .. Noisy network coding for the relay channel.

Encoding. To send m ∈ [1 : 2nbR ], the sender transmits x1n ( j, m) from C j in block j.


Relay encoding. By convention, let l0 = 1. At the end of block j, the relay finds an index
l j such that (y2n ( j), ŷ2n (l j |l j−1 ), x2n (l j−1 )) ∈ Tє(n)
󳰀 . If there is more than one such index, it

selects one of them uniformly at random. If there is no such index, it selects an index
from [1 : 2nR2 ] uniformly at random. The relay then transmits x2n (l j ) from C j+1 in block
j + 1.
Decoding. Let є > є 󳰀 . At the end of block b, the receiver finds the unique index m ̂ such
(n)
̂ x2 (l j−1 ), ŷ2 (l j |l j−1 ), y3 ( j)) ∈ Tє for all j ∈ [1 : b] for some l1 , l2 , . . . , lb .
that (x1 ( j, m),
n n n n

Analysis of the probability of error. To bound the probability of error, assume without
loss of generality that M = 1 and L1 = L2 = ⋅ ⋅ ⋅ = Lb = 1. Then the decoder makes an error
only if one or more of the following events occur:

E1 = 󶁁(Y2n ( j), Ŷ2n (l j |1), X2n (1)) ∉ Tє(n)


󳰀 for all l j for some j ∈ [1 : b]󶁑,
E2 = 󶁁(X1n ( j, 1), X2n (1), Ŷ2n (1|1), Y3n ( j)) ∉ Tє(n) for some j ∈ [1 : b]󶁑,
E3 = 󶁁(X1n ( j, m), X2n (l j−1 ), Ŷ2n (l j | l j−1 ), Y3n ( j)) ∈ Tє(n) for all j for some l b , m ̸= 1󶁑.

Thus, the probability of error is upper bounded as

P(E) ≤ P(E1 ) + P(E2 ∩ E1c ) + P(E3 ).

By the covering lemma and the union of events bound (over b blocks), P(E1 ) tends to zero
as n → ∞ if R2 > I(Ŷ2 ; Y2 |X2 ) + δ(є 󳰀 ). By the conditional typicality lemma and the union
of events bound, the second term P(E2 ∩ E1c ) tends to zero as n → ∞. For the third term,
define the events

Ẽj (m, l j−1 , l j ) = 󶁁(X1n ( j, m), X2n (l j−1 ), Ŷ2n (l j | l j−1 ), Y3n ( j)) ∈ Tє(n) 󶁑.

Now consider

b
P(E3 ) = P󶀣 󵠎 󵠎 󵠏 Ẽj (m, l j−1 , l j )󶀳
m̸=1 l 󰑏 j=1
b
≤ 󵠈 󵠈 P󶀣 󵠏 Ẽj (m, l j−1 , l j )󶀳
m̸=1 l 󰑏 j=1
b
(a)
= 󵠈 󵠈 󵠉 P(Ẽj (m, l j−1 , l j ))
m̸=1 l 󰑏 j=1
b
≤ 󵠈 󵠈 󵠉 P(Ẽj (m, l j−1 , l j )), (.)
m̸=1 l 󰑏 j=2

where (a) follows since the codebooks are generated independently for each block j ∈
[1 : b] and the channel is memoryless. Note that if m ̸= 1 and l j−1 = 1, then X1n ( j, m) ∼
∏ni=1 p X1 (x1i ) is independent of (Ŷ2n (l j |l j−1 ), X2n (l j−1 ), Y3n ( j)). Hence, by the joint typical-
ity lemma,

P(Ẽj (m, l j−1 , l j )) = P󶁁(X1n ( j, m), X2n (l j−1 ), Ŷ2n (l j | l j−1 ), Y3n ( j)) ∈ Tє(n) 󶁑
≤ 2−n(I1 −δ(є)) ,

where I1 = I(X1 ; Ŷ2 , Y3 |X2 ). Similarly, if m ̸= 1 and l j−1 ̸= 1, then (X1n ( j, m), X2n (l j−1 ),
Ŷ2n (l j |l j−1 )) ∼ ∏ni=1 p X1 (x1i )p X2 ,Ŷ2 (x2i , ŷ2i ) is independent of Y3n ( j). Hence

P(Ẽj (m, l j−1 , l j )) ≤ 2−n(I2 −δ(є)) ,

where I2 = I(X1 , X2 ; Y3 ) + I(Ŷ2 ; X1 , Y3 |X2 ). If π(1|l b−1 ) = k/(b − 1), i.e., l b−1 has k ones,
then
b
󵠉 P(Ẽj (m, l j−1 , l j )) ≤ 2−n(kI1 +(b−1−k)I2 −(b−1)δ(є)) .
j=2

Continuing with the bound in (.), we have


b b
󵠈 󵠈 󵠉 P(Ẽj (m, l j−1 , l j )) = 󵠈 󵠈 󵠈 󵠉 P(Ẽj (m, l j−1 , l j ))
m̸=1 l 󰑏 j=2 m̸=1 l󰑏 l 󰑏−1 j=2
b−1
b − 1 n(b−1− j)R2 −n( jI1 +(b−1− j)I2 −(b−1)δ(є))
≤ 󵠈 󵠈󵠈󶀤 󶀴2 ⋅2
m̸=1 l󰑏 j=0 j
b−1
b − 1 −n( jI1 +(b−1− j)(I2 −R2 )−(b−1)δ(є))
= 󵠈 󵠈󵠈󶀤 󶀴2
m̸=1 l󰑏 j=0 j
b−1
b − 1 −n(b−1)(min{I1 , I2 −R2 }−δ(є))
≤ 󵠈 󵠈󵠈󶀤 󶀴2
m̸=1 l󰑏 j=0 j

≤ 2nbR ⋅ 2nR2 ⋅ 2b ⋅ 2−n(b−1)(min{I1 , I2 −R2 }−δ(є)) ,


which tends to zero as n → ∞ if
b−1 R
R< 󶀡min{I1 , I2 − R2 } − δ 󳰀 (є)󶀱 − 2 .
b b
Finally, by eliminating R2 > I(Ŷ2 ; Y2 |X2 ) + δ(є 󳰀 ), substituting I1 = I(X1 ; Ŷ2 , Y3 |X2 ) and
I2 = I(X1 , X2 ; Y3 ) + I(Ŷ2 ; X1 , Y3 |X2 ), and taking b → ∞, we have shown that the proba-
bility of error tends to zero as n → ∞ if
R < min󶁁I(X1 ; Ŷ2 , Y3 | X2 ), I(X1 , X2 ; Y3 ) − I(Ŷ2 ; Y2 | X1 , X2 , Y3 )󶁑 − δ 󳰀 (є) − δ(є 󳰀 ).
This completes the proof of the compress–forward lower bound for the relay channel
in (.) using noisy network coding.

18.3.4 Proof of the Noisy Network Coding Lower Bound


We now establish the noisy network coding lower bound for the DM-MN. First consider
the unicast special case with (N − 2) relay nodes k ∈ [2 : N − 1] and a single destination
node N. Assume without loss of generality that XN =  (otherwise, we can consider a new
destination node N + 1 with YN+1 = YN and XN+1 = ). To simplify notation, let Ŷ1 = 
and ŶN = YN .
Codebook generation. Fix the conditional pmf ∏N−1 ̂k |yk , xk ) that attains the
k=1 p(xk )p( y
lower bound. For each j ∈ [1 : b], randomly and independently generate 2nbR sequences
x1n ( j, m), m ∈ [1 : 2nbR ], each according to ∏ni=1 p X1 (x1i ). For each k ∈ [2 : N − 1], ran-
domly and independently generate 2nR󰑘 sequences xkn (lk, j−1 ), lk, j−1 ∈ [1 : 2nR󰑘 ], each ac-
cording to ∏ni=1 p X󰑘 (xki ). For each k ∈ [2 : N − 1] and lk, j−1 ∈ [1 : 2nR󰑘 ], randomly and
conditionally independently generate 2nR󰑘 sequences ŷkn (lk j |lk, j−1 ), lk j ∈ [1 : 2nR󰑘 ], each
according to ∏ni=1 pŶ󰑘 |X󰑘 ( ŷki |xki (lk, j−1 )). This defines the codebook
C j = 󶁁(x1n (m, j), x2n (l2, j−1 ), . . . , xN−1
n
(lN−1, j−1 ), ŷ2n (l2 j | l2, j−1 ), . . . , ŷN−1
n
(lN−1, j | lN−1, j−1 )) :
m ∈ [1 : 2nbR ], lk, j−1 , lk j ∈ [1 : 2nR󰑘 ], k ∈ [2 : N − 1]󶁑
for j ∈ [1 : b]. The codebooks are revealed to all parties.

Encoding and decoding are explained with the help of Table ..

Block   ⋅⋅⋅ b−1 b

X1 x1n (1, m) x1n (2, m) ⋅⋅⋅ x1n (b − 1, m) x1n (b, m)

Yk ŷkn (lk1 |1), lk1 ŷkn (lk2 |lk1 ), lk2 ⋅⋅⋅ ŷkn (lk,b−1 |lk,b−2 ), lk,b−1 ŷkn (lkb |lk,b−1 ), lkb

Xk xkn (1) xkn (lk1 ) ⋅⋅⋅ xkn (lk,b−2 ) xkn (lk,b−1 )

YN   ⋅⋅⋅  ̂
m

Table .. Encoding and decoding for the noisy network coding lower bound.

Encoding. To send m ∈ [1 : 2nbR ], source node  transmits x n ( j, m) in block j.


Relay encoding. Relay node k, upon receiving ykn ( j), finds an index lk j such that (ykn ( j),
ŷkn (lk j |lk, j−1 ), xkn (lk, j−1 )) ∈ Tє(n)
󳰀 . If there is more than one such index, it selects one of

them uniformly at random. If there is no such index, it selects an index from [1 : 2nR󰑘 ]
uniformly at random. Relay node k then transmits xkn (lk j ) in block j + 1.
Decoding and analysis of the probability of error. Let є > є 󳰀 . At the end of block b, the
̂ such that
receiver finds the unique index m

̂ x2n (l2, j−1 ), . . . , xN−1


(x1n ( j, m), n
(lN−1, j−1 ),
ŷ2n (l2 j | l2, j−1 ), . . . , ŷN−1
n
(lN−1, j | lN−1, j−1 ), yNn ( j)) ∈ Tє(n) for every j ∈ [1 : b]

for some l b = (l 1 , . . . , l b ), where l j = (l2 j , . . . , lN−1, j ), j ∈ [1 : b], and lk0 = 1, k ∈ [2 :


N − 1], by convention.
Analysis of the probability of error. To bound the probability of error, assume without
loss of generality that M = 1 and Lk j = 1, k ∈ [2 : N − 1], j ∈ [1 : b]. The decoder makes
an error only if one or more of the following events occur:

E1 = 󶁁(Ykn ( j), Ŷkn (lk j |1), Xkn (1)) ∉ Tє(n)


󳰀

for all lk j for some j ∈ [1 : b], k ∈ [2 : N − 1]󶁑,


E2 = 󶁁(X1n ( j, 1), X2n (1), . . . , XN−1
n
(1), Ŷ2n (1|1), . . . , ŶN−1
n
(1|1), YNn ( j)) ∉ Tє(n)
for some j ∈ [1 : b]󶁑,
E3 = 󶁁(X1n ( j, m), X2n (l2 j ), . . . , XN−1
n
(lN−1, j ), Ŷ2n (l2 j | l2, j−1 ), . . . , ŶN−1
n
(lN−1, j | lN−1, j−1 ),
YNn ( j)) ∈ Tє(n) for all j ∈ [1 : b] for some l b , m ̸= 1󶁑.

Thus, the probability of error is upper bounded as

P(E) ≤ P(E1 ) + P(E2 ∩ E1c ) + P(E3 ).



By the covering lemma and the union of events bound, P(E1 ) tends to zero as n → ∞ if
Rk > I(Ŷk ; Yk |Xk ) + δ(є 󳰀 ), k ∈ [2 : N − 1]. By the Markov lemma and the union of events
bound, the second term P(E2 ∩ E1c ) tends to zero as n → ∞. For the third term, define
the events

Ẽj (m, l j−1 , l j ) = 󶁁(X1n ( j, m), X2n (l2, j−1 ), . . . , XN−1


n
(lN−1, j−1 ),
Ŷ2n (l2 j | l2, j−1 ), . . . ŶN−1
n
(lN−1, j | lN−1, j−1 ), YNn ( j)) ∈ Tє(n) 󶁑.

Then
b
P(E3 ) = P󶀣 󵠎 󵠎 󵠏 Ẽj (m, l j−1 , l j )󶀳
m̸=1 l 󰑏 j=1
b
≤ 󵠈 󵠈 P󶀣 󵠏 Ẽj (m, l j−1 , l j )󶀳
m̸=1 l 󰑏 j=1

b
(a)
= 󵠈 󵠈 󵠉 P(Ẽj (m, l j−1 , l j ))
m̸=1 l 󰑏 j=1
b
≤ 󵠈 󵠈 󵠉 P(Ẽj (m, l j−1 , l j )), (.)
m̸=1 l 󰑏 j=2

where (a) follows since the codebook is generated independently for each block.
For each l b and j ∈ [2 : b], define the subset of nodes

S j (l b ) = {1} ∪ {k ∈ [2 : N − 1] : lk, j−1 ̸= 1}.

Note that S j (l b ) depends only on l j−1 and hence can be written as S j (l j−1 ). Now by the
joint typicality lemma,

P(Ẽj (m, l j−1 , l j )) ≤ 2−n(I1 (S 󰑗 (l 󰑗−1 ))+I2 (S 󰑗 (l 󰑗−1 ))−δ(є)) , (.)

where
̂ c )| X(S c )),
I1 (S) = I(X(S); Y(S
I2 (S) = 󵠈 I(Ŷk ; Y(S
̂ c ∪ {k 󳰀 ∈ S : k 󳰀 < k}), X N−1 | Xk ).
k∈S

Furthermore

󵠈 2−n(I1 (S 󰑗 (l 󰑗−1 ))+I2 (S 󰑗 (l 󰑗−1 ))−δ(є)) ≤ 󵠈 󵠈 2−n(I1 (S 󰑗 (l 󰑗−1 ))+I2 (S 󰑗 (l 󰑗−1 ))−δ(є))
l 󰑗−1 S:1∈S,N∈S 󰑐 l 󰑗−1 :S 󰑗 (l 󰑗−1 )=S

≤ 󵠈 2−n(I1 (S)+I2 (S)−∑󰑘∈S R󰑘 −δ(є))


S:1∈S,N∈S 󰑐

≤ 2N−2 ⋅ 2−n[minS (I1 (S)+I2 (S)−∑󰑘∈S R󰑘 −δ(є))] .



Continuing with the bound in (.), we have

b b
󵠈 󵠈 󵠉 P(Ẽj (m, l j−1 , l j ) ≤ 󵠈 󵠈 󵠉󶀦󵠈 2−n(I1 (S 󰑗 (l 󰑗−1 ))+I2 (S 󰑗 (l 󰑗−1 )−δ(є)) 󶀶
m̸=1 l 󰑏 j=2 m̸=1 l 󰑏 j=2 l 󰑗−1
󰑁−1
≤ 2(N−2)(b−1) ⋅ 2n[bR+∑󰑘=2 R󰑘 −(b−1) minS (I1 (S)+I2 (S)−∑󰑘∈S R󰑘 −δ(є))]
,

which tends to zero as n → ∞ if


N−1
b−1 1
R< 󶀦 min 󰑐 󶀤I1 (S) + I2 (S) − 󵠈 Rk 󶀴 − δ(є)󶀶 − 󵠈 Rk .
b S:1∈S ,N∈S
k∈S
b k=2

By eliminating Rk > I(Ŷk ; Yk |Xk ) + δ(є 󳰀 ), noting that

I2 (S) − 󵠈 I(Ŷk ; Yk | Xk ) = − 󵠈 I(Ŷk ; Yk | X N−1 , Y(S


̂ c ), Y({k
̂ 󳰀 ∈ S : k 󳰀 < k}))
k∈S k∈S

= − 󵠈 I(Ŷk ; Y (S) | X N−1 , Ŷ (S c ), Y({k


̂ 󳰀 ∈ S : k 󳰀 < k}))
k∈S

= −I(Ŷ (S); Y(S)| X N−1 , Y(S


̂ c )),

and taking b → ∞, the probability of error tends to zero as n → ∞ if

R< min 󶀤I1 (S) + I2 (S) − 󵠈 I(Ŷk ; Yk | Xk )󶀴 − (N − 2)δ(є 󳰀 ) − δ(є)


S:1∈S,N∈S 󰑐
k∈S

= ̂ c )| X(S c )) − I(Ŷ (S); Y(S)| X N−1 , Y(S


min 󰑐 󶀡I(X(S); Y(S ̂ c ))󶀱 − δ 󳰀 (є).
S:1∈S,N∈S

This completes the proof of the noisy network coding lower bound in Theorem . for a
single destination node N.
In general when XN ̸=  and ŶN ̸= YN , it can be easily seen by relabeling the nodes
that the above condition becomes

R< min ̂ c ), YN | X(S c )) − I(Ŷ (S); Y (S)| X N , Y(S


󶀡I(X(S); Y(S ̂ c ), YN )󶀱 − δ 󳰀 (є).
S:1∈S ,N∈S 󰑐

Now to prove achievability for the general multicast case, note that each relay encoder
operates in the same manner at the same rate regardless of which node is the destination.
Therefore, if each destination node performs the same multiblock decoding procedure
as in the single-destination case described above, the probability of decoding error for
destination node k ∈ D tends to zero as n → ∞ if

R< min 󶀡I(X(S); Ŷ (S c ), Yk | X(S c )) − I(Y(S);


̂ Y(S)| X N , Ŷ (S c ), Yk )󶀱.
S:1∈S,k∈S 󰑐

This completes the proof of Theorem ..



18.4 DISCRETE MEMORYLESS MULTIMESSAGE NETWORK

We consider extensions of the upper and lower bounds on the multicast capacity presented
in the previous sections to multimessage networks. Assume that node j ∈ [1 : N] wishes
to send a message M j to a set of destination nodes D j ⊆ [1 : N]. Note that a node may
not be a destination for any message or a destination for one or more messages, and may
serve as a relay for messages from other nodes.
A (2nR1 , . . . , 2nR󰑁 , n) code for the DMN consists of
∙ N message sets [1 : 2nR1 ], . . . , [1 : 2nR󰑁 ],
∙ a set of encoders, where encoder j ∈ [1 : N] assigns a symbol x ji (m j , y i−1
j ) to each pair
(m j , y j ) for i ∈ [1 : n], and
i−1

̂ jk : j ∈ [1 : N], k ∈
∙ a set of decoders, where decoder k ∈ ⋃ j D j assigns an estimate (m
D j ) or an error message e to each pair (mk , yk ).
n

Assume that (M1 , . . . , MN ) is uniformly distributed over [1 : 2nR1 ] × ⋅ ⋅ ⋅ × [1 : 2nR󰑁 ]. The


average probability of error is defined as

̂ jk ̸= M j for some j ∈ [1 : N], k ∈ D j 󶁑.


Pe(n) = P󶁁M

A rate tuple (R1 , . . . , RN ) is said to be achievable if there exists a sequence of codes such
that limn→∞ Pe(n) = 0. The capacity region of the DMN is the closure of the set of achiev-
able rates.
Note that this model includes the DM-MAC, DM-IC, DM-RC, and DM-TWC (with
and without feedback) as special cases. In particular, when XN =  and D1 = ⋅ ⋅ ⋅ = DN−1 =
{N}, the corresponding network is referred to as the DM-MAC with generalized feedback.
However, our DMN model does not include broadcast networks with multiple messages
communicated by a single source to different destination nodes.
The cutset bound can be easily extended to multimessage multicast networks.

Theorem . (Cutset Bound for the DMN). If a rate tuple (R1 , . . . , RN ) is achievable
for the DMN p(y N |x N ) with destination sets (D1 , . . . , DN ), then it must satisfy the in-
equality
∑_{j∈S : D_j ∩ S^c ≠ ∅} R_j ≤ I(X(S); Y(S^c) | X(S^c))

for all S ⊂ [1 : N] such that S^c ∩ D(S) ≠ ∅ for some pmf p(x^N), where D(S) = ⋃_{j∈S} D_j.

This cutset outer bound can be improved for some special network models. For exam-
ple, when no cooperation between the nodes is possible (or allowed), that is, the encoder
of each node is a function only of its own message, the bound can be tightened by condi-
tioning on a time-sharing random variable Q and replacing p(x N ) with p(q) ∏Nj=1 p(x j |q).
This modification makes the bound tight for the N-sender DM-MAC. It also yields a

tighter outer bound on the capacity region of the N sender–receiver pair interference
channel.

18.4.1 Noisy Network Coding for the Multimessage Multicast Network


We extend noisy network coding to multimessage multicast networks, where the sets of
destination nodes are the same, i.e., D j = D for all j ∈ [1 : N].

Theorem . (Multimessage Multicast Noisy Network Coding Inner Bound).


A rate tuple (R1 , . . . , RN ) is achievable for the DM multimessage multicast network
p(y N |x N ) with destinations D if

∑_{j∈S} R_j < min_{k∈S^c ∩ D} ( I(X(S); Ŷ(S^c), Y_k | X(S^c), Q) − I(Y(S); Ŷ(S) | X^N, Ŷ(S^c), Y_k, Q) )

for all S ⊂ [1 : N] such that S^c ∩ D ≠ ∅ for some pmf p(q) ∏_{j=1}^{N} p(x_j | q) p(ŷ_j | y_j, x_j, q).

This inner bound is tight for several classes of networks. For the deterministic network Y^N = y^N(X^N), the cutset bound in Theorem . simplifies to the set of rate tuples (R1, . . . , RN) such that

∑_{j∈S} R_j ≤ H(Y(S^c) | X(S^c))   (.)

for all S ⊂ [1 : N] with D ∩ S^c ≠ ∅ for some p(x^N). In the other direction, by setting Ŷk = Yk, k ∈ [1 : N], the inner bound in Theorem . simplifies to the set of rate tuples (R1, . . . , RN) such that

∑_{j∈S} R_j < H(Y(S^c) | X(S^c))   (.)

for all S ⊂ [1 : N] with D ∩ S^c ≠ ∅ for some ∏_{j=1}^{N} p(x_j). It can be easily shown that the two bounds coincide for the graphical multimessage multicast network, the deterministic network with no interference, and the deterministic finite-field network, extending the single-message results in Section . and Section ... Similarly, it can be shown that the capacity region of the wireless erasure multimessage multicast network is the set of all rate tuples (R1, . . . , RN) such that

∑_{j∈S} R_j ≤ ∑_{j∈S : N_j ∩ S^c ≠ ∅} (1 − ∏_{l∈N_j ∩ S^c} p_{lj}) C_j   (.)

for all S ⊂ [1 : N] with D ∩ S^c ≠ ∅. This extends the results for the wireless erasure multicast network in Section ...
Proof of Theorem . (outline). Randomly generate an independent codebook C j =
̃
{xkn (mk , lk, j−1 ), ŷkn (lk j |mk , lk, j−1 ) : mk ∈ [1 : 2nbR󰑘 ], lk j , lk, j−1 ∈ [1 : 2nR󰑘 ], k ∈ [1 : N]} for
each block j ∈ [1 : b], where lk0 = 1, k ∈ [1 : N], by convention. At the end of block j,

node k finds an index lk j such that (ykn ( j), ŷkn (lk j |mk , lk, j−1 ), xkn (mk , lk, j−1 )) ∈ Tє(n)
󳰀 and
transmits xkn (mk , lk j ) from C j+1 in block j + 1. The probability of error for this joint typi-
cality encoding step tends to zero as n → ∞ if R̃ k > I(Ŷk ; Yk |Xk ) + δ(є 󳰀 ) as in the proof of
Theorem .. Let ̂lk j = lk j and m ̂ kk = mk by convention. At the end of block b, decoder
k ∈ D finds the unique message tuple (m ̂ 1k , . . . , m
̂ N k ), such that
̂ 1k , ̂l1, j−1 ), . . . , xNn (m
󶀡x1n (m ̂ N k , ̂lN , j−1 ),
(.)
ŷ1n ( ̂l1 j | m
̂ 1k , ̂l1, j−1 ), . . . , ŷNn ( ̂lN j | m
̂ N k , ̂lN , j−1 ), ykn ( j)󶀱 ∈ Tє(n)

for all j ∈ [1 : b] for some (̂l 1 , . . . , ̂l b ), where ̂l j = ( ̂l1 j , . . . , ̂lN j ). Following similar steps to
the proof of Theorem ., we define S j (m, l b ) = {k 󳰀 ∈ [1 : N] : mk󳰀 ̸= 1 or lk 󳰀 , j−1 ̸= 1} and
T (m) = {k 󳰀 ∈ [1 : N] : mk 󳰀 ̸= 1} ⊆ S j (m, l b ), where m = (m1 , . . . , mN ). Then, it can be
shown that the probability of error corresponding to P(E3 ) in the proof of Theorem .
tends to zero as n → ∞ if
b−1 1
󵠈 Rk 󳰀 < 󶀤I1 (S) + I2 (S) − 󵠈 R̃ k 󳰀 − δ(є)󶀴 − 󶀤 󵠈 R̃ k󳰀 󶀴
b b
k 󳰀 ∈T 󳰀 k ∈S 󳰀 k ̸= k

for every S, T ⊂ [1 : N] such that  ̸= T ⊆ S and k ∈ S c , where I1 (S) and I2 (S) are
defined as before. By eliminating R̃ k 󳰀 , k 󳰀 ∈ [1 : N], taking b → ∞, and observing that
each proper subset T of S corresponds to an inactive inequality, we obtain the condition
in Theorem ..

18.4.2* Noisy Network Coding for General Multimessage Networks


We now extend noisy network coding to general multimessage networks. As a first step,
we note that Theorem . continues to hold for general networks with multicast comple-
tion of destination nodes, that is, when every message is decoded by all the destination nodes in D = ⋃_{j=1}^N D_j. Thus, we can obtain an inner bound on the capacity region of the DMN in the same form as the inner bound in Theorem . with D = ⋃_{j=1}^N D_j.
This multicast-completion inner bound can be improved by noting that noisy net-
work coding transforms a multihop relay network p(y N |x N ) into a single-hop network
p( ỹN |x N ), where the effective output at destination node k is Ỹk = (Yk , Ŷ N ) and the com-
pressed channel outputs Ŷ N are described to the destination nodes with some rate penalty.
This observation leads to improved coding schemes that combine noisy network coding
with decoding techniques for interference channels.
Simultaneous nonunique decoding. Each receiver decodes for all the messages and
compression indices without requiring the correct recovery of unintended messages and
the compression indices; see the simultaneous-nonunique-decoding inner bound in Sec-
tion .. This approach yields the following inner bound on the capacity region that con-
sists of all rate tuples (R1 , . . . , RN ) such that
    ∑_{k′∈S} R_{k′} < min_{k∈S^c∩D(S)} [ I(X(S); Ŷ(S^c), Y_k | X(S^c), Q) − I(Y(S); Ŷ(S) | X^N, Ŷ(S^c), Y_k, Q) ]    (.)

for all S ⊂ [1 : N] with S^c ∩ D(S) ≠ ∅ for some conditional pmf p(q) ∏_{k=1}^N p(x_k|q) p(ŷ_k|y_k, x_k, q). The proof of this inner bound is similar to that of Theorem ., except that decoder k finds the unique index tuple (m̂_{k′k} : k′ ∈ N_k), where N_k = {k′ ∈ [1 : N] : k ∈ D_{k′}}, such that (.) holds for all j ∈ [1 : b] for some (l̂_1, . . . , l̂_b) and (m̂_{k′k} : k′ ∉ N_k).
Treating interference as noise. As an alternative to decoding for all the messages, each
destination node can simply treat interference as noise as in the interference channel; see
Section .. This can be combined with superposition coding of the message (not intended
for every destination) and the compression index at each node to show that a rate tuple
(R_1, . . . , R_N) is achievable for the DMN if

    ∑_{k′∈T} R_{k′} < I(X(T), U(S); Ŷ(S^c), Y_k | X(N_k ∩ T^c), U(S^c), Q) − I(Y(S); Ŷ(S) | X(N_k), U^N, Ŷ(S^c), Y_k, Q)

for all S, T ⊂ [1 : N], k ∈ D(S) such that S^c ∩ D(S) ≠ ∅ and S ∩ N_k ⊆ T ⊆ N_k for some conditional pmf p(q) ∏_{k=1}^N p(u_k, x_k|q) p(ŷ_k|y_k, u_k, q), where N_k = {k′ ∈ [1 : N] : k ∈ D_{k′}} is the set of sources for node k. To prove achievability, we use a randomly and independently generated codebook C_j = {u_k^n(l_{k,j−1}), x_k^n(m_k | l_{k,j−1}), ŷ_k^n(l_{kj} | l_{k,j−1}) : m_k ∈ [1 : 2^{nbR_k}], l_{kj}, l_{k,j−1} ∈ [1 : 2^{nR̃_k}], k ∈ [1 : N]} for each block j ∈ [1 : b]. Upon receiving y_k^n(j) at the end of block j, node k ∈ [1 : N] finds an index l_{kj} ∈ [1 : 2^{nR̃_k}] such that (u_k^n(l_{k,j−1}), ŷ_k^n(l_{kj} | l_{k,j−1}), y_k^n(j)) ∈ T_{ε′}^{(n)}, and transmits the codeword x_k^n(m_k | l_{kj}) in block j + 1. The probability of error for this joint typicality encoding step tends to zero as n → ∞ if R̃_k > I(Ŷ_k; Y_k | U_k) + δ(ε′). At the end of block b, decoder k finds the unique index tuple (m̂_{k′k} : k′ ∈ N_k) such that

    ((x_{k′}^n(m̂_{k′k} | l̂_{k′,j−1}) : k′ ∈ N_k), u_1^n(l̂_{1,j−1}), . . . , u_N^n(l̂_{N,j−1}), ŷ_1^n(l̂_{1j} | l̂_{1,j−1}), . . . , ŷ_N^n(l̂_{Nj} | l̂_{N,j−1}), y_k^n(j)) ∈ T_ε^{(n)}

for all j ∈ [1 : b] for some (l̂_1, . . . , l̂_b). Defining S_j(l^b) = {k ∈ [1 : N] : l_{k,j−1} ≠ 1} and T(m) = {k′ ∈ N_k : m_{k′} ≠ 1}, and following similar steps to the proofs of Theorems . and ., it can be shown that the probability of error corresponding to P(E_3) in the proof of Theorem . tends to zero as n → ∞ if

    ∑_{k′∈T} R_{k′} < ((b − 1)/b) ( I_1(S, T) + I_2(S, T) − ∑_{k′∈S} R̃_{k′} − δ(ε) ) − (1/b) ∑_{k′≠k} R̃_{k′}

for all S ⊂ [1 : N] and T ⊆ N_k such that k ∈ S^c ∩ T^c, where

    I_1(S, T) = I(X((S ∪ T) ∩ N_k), U(S); Ŷ(S^c), Y_k | X((S^c ∩ T^c) ∩ N_k), U(S^c)),
    I_2(S, T) = ∑_{k′∈S} I(Ŷ_{k′}; Ŷ(S^c ∪ {k″ ∈ S : k″ < k′}), Y_k, X(N_k), U^N | U_{k′}).

Eliminating R̃_k, k ∈ [1 : N], taking b → ∞, and removing inactive inequalities completes the proof of the inner bound.
Remark .. As for the interference channel, neither simultaneous nonunique decoding
nor treating interference as noise consistently outperforms the other and the rates can be
further improved using more sophisticated interference channel coding schemes.

SUMMARY

∙ Discrete memoryless network (DMN): Generalizes graphical networks, DM-RC, and


single-hop networks with and without feedback
∙ Cutset bounds for the DMN
∙ Sliding window decoding for network decode–forward
∙ Noisy network coding:
∙ Same message is sent multiple times using independent codebooks
∙ No Wyner–Ziv binning
∙ Simultaneous nonunique decoding without requiring the recovery of compression
bin indices
∙ Includes compress–forward for the relay channel and network coding for graphical
networks as special cases
∙ Extensions to multimessage networks via interference channel coding schemes
∙ Open problems:
18.1. What is the common-message capacity of a general DM network (that is, multi-
cast with D = [2 : N])?
18.2. What is the capacity of a general deterministic multicast network?
18.3. How can partial decode–forward be extended to noisy networks?
18.4. How can noisy network coding and interference alignment be combined?

BIBLIOGRAPHIC NOTES

The cutset bound in Theorem . is due to El Gamal (). Aref () established the ca-
pacity of physically degraded unicast networks by extending decode–forward via binning
for the relay channel. He also established the capacity of deterministic unicast networks by
extending partial decode–forward. The sliding window decoding scheme was developed
by Carleial (). The network decode–forward lower bound for the general network in
Theorem . is due to Xie and Kumar () and Kramer, Gastpar, and Gupta ().
This improves upon a previous lower bound based on an extension of decode–forward
via binning by Gupta and Kumar (). Kramer et al. () also provided an extension
of compress–forward for the relay channel. As in the original compress–forward scheme,
this extension involves Wyner–Ziv coding and sequential decoding. Decode–forward of
compression indices is used to enhance the performance of the scheme.
In an independent line of work, Ratnakar and Kramer () extended network cod-
ing to establish the capacity of deterministic multicast networks with no interference

in (.). Subsequently Avestimehr, Diggavi, and Tse () further extended this result
to obtain the lower bound on the capacity of general deterministic multicast networks
in (.), and showed that it is tight for finite-field deterministic networks, which they
used to approximate the capacity of Gaussian multicast networks. As in the original proof
of the network coding theorem (see the Bibliographic Notes for Chapter ), the proof by
Avestimehr et al. () is divided into two steps. In the first step, layered networks as ex-
emplified in Figure . are considered and it is shown that if the rate R satisfies the lower
bound, then the end-to-end mapping is one-to-one with high probability. The proof is
then extended to nonlayered networks by considering a time-expanded layered network
with b blocks, and it is shown that if the rate bR is less than b times the lower bound
in (.) for sufficiently large b, then the end-to-end mapping is again one-to-one with
high probability.

[Figure: a three-layer deterministic network with source node 1 (input X_1), a first relay layer with outputs Y_2 = g_2(⋅), Y_3 = g_3(⋅) and inputs X_2, X_3, a second relay layer with outputs Y_4 = g_4(⋅), Y_5 = g_5(⋅) and inputs X_4, X_5, and destination node 6 with output Y_6 = g_6(⋅).]

Figure .. A layered deterministic network example.

The noisy network coding scheme in Section . generalizes compress–forward for
the relay channel and network coding and its extensions to deterministic networks to
noisy networks. The extensions of noisy network coding to multimessage networks in
Section . are due to Lim, Kim, El Gamal, and Chung (). Other extensions of noisy
network coding can be found in Lim, Kim, El Gamal, and Chung (). The inner bound
on the capacity region of deterministic multimessage multicast networks was established
by Perron (). The capacity region of wireless erasure networks in (.) is due to
Dana, Gowaikar, Palanki, Hassibi, and Effros (), who extended linear network coding
to packet erasures.
In the discrete memoryless network model in Section ., each node wishes to com-
municate at most one message. A more general model of multiple messages at each node
was considered by Kramer (), where a cutset bound similar to Theorem . was es-
tablished. The simplest example of such multihop broadcast networks is the relay broad-
cast channel studied by Kramer, Gastpar, and Gupta () and Liang and Kramer ().

PROBLEMS

.. Provide the details of the probability of error analysis for the network de-
code–forward lower bound in Section ...

.. Consider the cutset bound on the multicast capacity for deterministic networks
in (.). Show that the product input pmf attains the maximum in the bound for
the deterministic network with no interference and that the uniform input pmf
attains the maximum for the finite-field deterministic network.
.. Consider the wireless erasure multicast network in Section ... Show that the
cutset bound in Theorem . and the noisy network coding lower bound in The-
orem . coincide and simplify to the capacity expression in (.).
.. Derive the inequality in (.) using the fact that m ̸= 1 and lk, j−1 ̸= 1 for k ∈
S j (l j−1 ).
.. Prove the cutset bound for noisy multimessage networks in Theorem ..
.. Complete the details of the achievability proof of Theorem ..
.. Broadcasting over a diamond network. Consider a DMN p(y2 , y3 , y4 |x1 , x2 , x3 ) =
p(y2 |x1 )p(y3 |x1 )p(y4 |x2 , x3 ). Node  wishes to communicate a common message
M to all other nodes. Find the capacity.
.. Two-way relay channel. Consider a DMN p(y1 , y2 , y3 |x1 , x2 , x3 ). Node  wishes to
communicate a message M1 ∈ [1 : 2nR1 ] to node  and node  wishes to commu-
nicate a message M2 ∈ [1 : 2nR2 ] to node  with the help of relay node .
(a) Characterize the cutset bound for this channel.
(b) Noting that the channel is a multimessage multicast network with D = {1, 2},
simplify the noisy network coding inner bound in Theorem ..
.. Interference relay channel. Consider a DMN p(y3 , y4 , y5 |x1 , x2 , x3 ). Node  wishes
to communicate a message M1 ∈ [1 : 2nR1 ] to node  and node  wishes to com-
municate a message M2 ∈ [1 : 2nR2 ] to node  with the help of relay node .
(a) Characterize the cutset bound for this channel.
(b) Simplify the noisy network coding inner bound with simultaneous nonunique
decoding in (.) for this channel.
CHAPTER 19

Gaussian Networks

In this chapter, we discuss models for wireless multihop networks that generalize the
Gaussian channel models we studied earlier. We extend the cutset bound and the noisy
network coding inner bound on the capacity region of the multimessage DMN presented
in Chapter  to Gaussian networks. We show through a Gaussian two-way relay chan-
nel example that noisy network coding can outperform decode–forward and amplify–
forward, achieving rates within a constant gap of the cutset bound while the inner bounds
achieved by these other schemes can have an arbitrarily large gap to the cutset bound.
More generally, we show that noisy network coding for the Gaussian multimessage multi-
cast network achieves rates within a constant gap of the capacity region independent of
network topology and channel gains. For Gaussian networks with other messaging de-
mands, e.g., general multiple-unicast networks, however, no such constant gap results
exist in general. Can we still obtain some guarantees on the capacity of these networks?
To address this question, we introduce the scaling-law approach to capacity, where we
seek to find the order of capacity scaling as the number of nodes in the network becomes
large. In addition to providing some guarantees on network capacity, the study of capacity
scaling sheds light on the role of cooperation through relaying in combating interference
and path loss in large wireless networks. We first illustrate the scaling-law approach via a
simple unicast network example that shows how relaying can dramatically increase the ca-
pacity by reducing the effect of high path loss. We then present the Gupta–Kumar random
network model in which the nodes are randomly distributed over a geographical area and
the goal is to determine the capacity scaling law that holds for most such networks. We es-
tablish lower and upper bounds on the capacity scaling law for the multiple-unicast case.
The lower bound is achieved via a cellular time-division scheme in which the messages are
sent simultaneously using a simple multihop scheme with nodes in cells along the lines
from each source to its destination acting as relays. We show that this scheme achieves
much higher rates than direct transmission with time division, which demonstrates the
role of relaying in mitigating interference in large networks. This cellular time-division
scheme also outperforms noncellular multihop through spatial reuse of time enabled by
high path loss. Finally, we derive an upper bound on the capacity scaling law using the
cutset bound and a network augmentation technique. This upper bound becomes tighter
as the path loss exponent increases and has essentially the same order as the cellular time-
division lower bound under the absorption path loss model.

19.1 GAUSSIAN MULTIMESSAGE NETWORK

Consider an N-node Gaussian network

    Y_k = ∑_{j=1}^N g_{kj} X_j + Z_k,   k ∈ [1 : N],

where g_{kj} is the gain from the transmitter of node j to the receiver of node k, and the noise components Z_k, k ∈ [1 : N], are i.i.d. N(0, 1). We assume expected average power constraint P on each X_j, i.e., ∑_{i=1}^n E(x_{ji}^2(m_j, Y_j^{i−1})) ≤ nP, m_j ∈ [1 : 2^{nR_j}], j ∈ [1 : N]. We consider a general multimessage demand where each node j wishes to send a
message M j to a set of destination nodes D j . The definitions of a code, probability of error,
achievability, and capacity region follow those for the multimessage DMN in Section ..
Consider the following special cases:
∙ If X_N = ∅ and D_j = {N} for j ∈ [1 : N − 1], then the network reduces to the (N − 1)-sender Gaussian MAC with generalized feedback.
∙ If N = 2k, X_{k+1} = ⋅⋅⋅ = X_N = Y_1 = ⋅⋅⋅ = Y_k = ∅, and D_j = {j + k} for j ∈ [1 : k], then the network reduces to the k-user-pair Gaussian IC.
∙ If N = 3, X_3 = Y_1 = ∅, D_1 = {3}, and R_2 = 0, then the network reduces to the Gaussian RC.
The Gaussian network can be equivalently written in a vector form
Y N = G XN + ZN , (.)
where X N is the channel input vector, G ∈ ℝN×N is the channel gain matrix, and Z N is
a vector of i.i.d. N(0, 1) noise components. Using this vector form, the cutset bound in
Theorem . can be easily adapted to the Gaussian network model.

Theorem . (Cutset Bound for the Gaussian Multimessage Network). If a rate
tuple (R1 , . . . , RN ) is achievable for the Gaussian multimessage network with destina-
tion sets (D1 , . . . , DN ), then it must satisfy the inequality
    ∑_{j∈S: D_j∩S^c≠∅} R_j ≤ (1/2) log | I + G(S) K(S|S^c) G^T(S) |

for all S such that S^c ∩ D(S) ≠ ∅ for some covariance matrix K ⪰ 0 with K_{jj} ≤ P, j ∈ [1 : N]. Here D(S) = ⋃_{j∈S} D_j, K(S|S^c) is the conditional covariance matrix of X(S) given X(S^c) for X^N ∼ N(0, K), and G(S) is defined such that

    [ Y(S)   ]   [ G′(S)   G(S^c)  ] [ X(S)   ]   [ Z(S)   ]
    [ Y(S^c) ] = [ G(S)    G′(S^c) ] [ X(S^c) ] + [ Z(S^c) ]

for some gain submatrices G′(S) and G′(S^c).



When no cooperation between the nodes is possible, the cutset bound can be tight-
ened as for the DMN by conditioning on a time-sharing random variable Q and consid-
ering X N |{Q = q} ∼ N(0, K(q)), where K(q) is diagonal and EQ (K j j (Q)) ≤ P. This yields
the improved bound with conditions
    ∑_{j∈S: D_j∩S^c≠∅} R_j ≤ (1/2) E_Q( log | I + G(S) K(S|Q) G^T(S) | )

for all S such that S^c ∩ D(S) ≠ ∅, where K(S|Q) is the (random) covariance matrix of X(S) given Q.

19.1.1 Noisy Network Coding Lower Bound


The inner bound on the capacity region of the DM multimessage multicast network in Theorem . can also be adapted to Gaussian networks. By adding the power constraints, we can readily obtain the noisy network coding inner bound that consists of all rate tuples (R_1, . . . , R_N) such that

    ∑_{j∈S} R_j < min_{k∈S^c∩D} [ I(X(S); Ŷ(S^c), Y_k | X(S^c), Q) − I(Y(S); Ŷ(S) | X^N, Ŷ(S^c), Y_k, Q) ]    (.)

for all S satisfying S^c ∩ D ≠ ∅ for some conditional distribution p(q) ∏_{j=1}^N F(x_j|q) F(ŷ_j|y_j, x_j, q) such that E(X_j^2) ≤ P for j ∈ [1 : N]. The optimizing conditional distribution of the inner bound in (.) is not known in general. To compare this noisy network coding inner bound to the cutset bound in Theorem . and to other inner bounds, we set Q = ∅, X_j, j ∈ [1 : N], i.i.d. N(0, P), and

    Ŷ_k = Y_k + Ẑ_k,   k ∈ [1 : N],

where Ẑ_k ∼ N(0, 1), k ∈ [1 : N], are independent of each other and of (X^N, Y^N). Substituting in (.), we have

    I(Y(S); Ŷ(S) | X^N, Ŷ(S^c), Y_k) ≤ I(Ŷ(S); Y(S) | X^N)        (a)
                                     = h(Ŷ(S) | X^N) − h(Ŷ(S) | Y(S), X^N)
                                     = (|S|/2) log(4πe) − (|S|/2) log(2πe)
                                     = |S|/2

for each k ∈ D and S such that S^c ∩ D ≠ ∅. Here step (a) follows since (Ŷ(S^c), Y_k) → (X^N, Y(S)) → Ŷ(S) form a Markov chain. Furthermore,

    I(X(S); Ŷ(S^c), Y_k | X(S^c)) ≥ I(X(S); Ŷ(S^c) | X(S^c))
                                  = h(Ŷ(S^c) | X(S^c)) − h(Ŷ(S^c) | X^N)
                                  = (1/2) log( (2πe)^{|S^c|} | 2I + P G(S) G^T(S) | ) − (|S^c|/2) log(4πe)
                                  = (1/2) log | I + (P/2) G(S) G^T(S) |.

Hence, we obtain the inner bound characterized by the set of inequalities

    ∑_{j∈S} R_j < (1/2) log | I + (P/2) G(S) G^T(S) | − |S|/2    (.)

for all S with S^c ∩ D ≠ ∅.


Remark .. As in the compress–forward lower bound for the Gaussian RC in Sec-
tion ., the choice of Ŷk = Yk + Ẑk with Ẑk ∼ N(0, 1) can be improved upon by opti-
mizing over the average powers of Ẑk , k ∈ [1 : N], for the given channel gain matrix. The
bound can be improved also by time sharing. It is not known, however, if Gaussian test
channels are optimal.
In the following, we compare this noisy network coding inner bound to the cutset
bound and other inner bounds on the capacity region.

19.1.2 Gaussian Two-Way Relay Channel


Consider the -node Gaussian two-way relay channel with no direct links depicted in
Figure . with outputs

Y1 = д13 X3 + Z1 ,
Y2 = д23 X3 + Z2 ,
Y3 = д31 X1 + д32 X2 + Z3 ,

where the noise components Z_k, k = 1, 2, 3, are i.i.d. N(0, 1). We assume expected average power constraint P on each of X_1, X_2, and X_3. Denote the received SNR for the signal from node j to node k as S_{kj} = g_{kj}^2 P. Node 1 wishes to communicate a message M_1 to node 2 and node 2 wishes to communicate a message M_2 to node 1 with the help of relay node 3, i.e., D = {1, 2}; see Problem . for a more general DM counterpart.

Figure .. Gaussian two-way relay channel with no direct links: nodes 1 and 2 communicate with each other only through relay node 3 (uplink gains g_31, g_32; downlink gains g_13, g_23).

The capacity region of this multimessage multicast network is not known in general.
We compare the following outer and inner bounds on the capacity region.
Cutset bound. The cutset bound in Theorem . can be readily specialized to this Gauss-
ian two-way channel. If a rate pair (R1 , R2 ) is achievable, then it must satisfy the inequal-
ities
R1 ≤ min{C(S31 ), C(S23 )},
(.)
R2 ≤ min{C(S32 ), C(S13 )}.

Decode–forward inner bound. The decode–forward coding scheme for the DM-RC in
Section . can be extended to this two-way relay channel. Node  recovers both M1 and
M2 over the multiple access channel Y3 = д31 X1 + д32 X2 + Z3 and broadcasts them. It can
be easily shown that a rate pair (R1 , R2 ) is achievable if

R1 < min{C(S31 ), C(S23 )},


R2 < min{C(S32 ), C(S13 )}, (.)
R1 + R2 < C(S31 + S32 ).

Amplify–forward inner bound. The amplify–forward relaying scheme for the RFD Gaussian RC in Section . can be easily extended to this setting by having node 3 send a scaled version of its received symbol. The corresponding inner bound consists of all rate pairs (R_1, R_2) such that

    R_1 < C( S_23 S_31 / (1 + S_23 + S_31 + S_32) ),
    R_2 < C( S_13 S_32 / (1 + S_13 + S_31 + S_32) ).    (.)

Noisy network coding inner bound. By setting Q = ∅ and Ŷ_3 = Y_3 + Ẑ_3, where Ẑ_3 ∼ N(0, σ^2) is independent of (X^3, Y^3), in (.), we obtain the inner bound that consists of all rate pairs (R_1, R_2) such that

    R_1 < min{ C(S_31/(1 + σ^2)), C(S_23) − C(1/σ^2) },
    R_2 < min{ C(S_32/(1 + σ^2)), C(S_13) − C(1/σ^2) }    (.)

for some σ^2 > 0.


Figure . compares the cutset bound to the decode–forward, amplify–forward, and
noisy network coding bounds on the sum-capacity (with optimized parameters). The
plots in the figure assume that nodes 1 and 2 are unit distance apart and node 3 is distance r ∈ [0, 1] from node 1 along the line between nodes 1 and 2; the channel gains are of the form g_{kj} = r_{kj}^{−3/2}, where r_{kj} is the distance between nodes j and k, hence g_13 = g_31 = r^{−3/2} and g_23 = g_32 = (1 − r)^{−3/2}; and the power P = 10. Note that noisy network coding outperforms amplify–forward and decode–forward when the relay is sufficiently far from both destination nodes.

Figure .. Comparison of the cutset bound RCS , decode–forward lower bound
RDF , amplify–forward lower bound RAF , and noisy network coding lower bound
RNNC on the sum-capacity of the Gaussian two-way relay channel as the function of
the distance r between nodes  and .
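The curves in this comparison can be reproduced numerically from the bounds above. The following sketch is my own (it assumes C(x) = (1/2) log2(1 + x) and replaces the optimization over the compression noise variance σ² in noisy network coding by a simple grid search); it prints the sum-rate given by each bound for a few relay positions r.

```python
import numpy as np

def C(x):
    """Gaussian capacity function C(x) = (1/2) log2(1 + x)."""
    return 0.5 * np.log2(1 + x)

P = 10.0

def sum_rates(r):
    # gains g_kj = r_kj^{-3/2}; node 3 is at distance r from node 1 and 1 - r from node 2
    g31 = g13 = r ** (-1.5)
    g32 = g23 = (1 - r) ** (-1.5)
    S31, S13 = g31**2 * P, g13**2 * P
    S32, S23 = g32**2 * P, g23**2 * P

    R_cs = min(C(S31), C(S23)) + min(C(S32), C(S13))                      # cutset bound (sum of individual constraints)
    R_df = min(min(C(S31), C(S23)) + min(C(S32), C(S13)), C(S31 + S32))   # decode-forward
    R_af = (C(S23 * S31 / (1 + S23 + S31 + S32))
            + C(S13 * S32 / (1 + S13 + S31 + S32)))                        # amplify-forward
    # noisy network coding: grid search over the compression noise variance
    sig2 = np.linspace(0.01, 20, 2000)
    R1 = np.minimum(C(S31 / (1 + sig2)), C(S23) - C(1 / sig2))
    R2 = np.minimum(C(S32 / (1 + sig2)), C(S13) - C(1 / sig2))
    R_nnc = float(np.max(R1 + R2))
    return R_cs, R_df, R_af, R_nnc

for r in [0.1, 0.2, 0.3, 0.4, 0.5]:
    print(r, [round(x, 2) for x in sum_rates(r)])
```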

In general, it can be shown that noisy network coding achieves the capacity region
within 1/2 bit per dimension, while the other schemes have an unbounded gap to the
cutset bound as P → ∞ (see Problem .).
Remark .. Unlike the case of the RFD Gaussian relay channel studied in Section .,
noisy network coding does not always outperform amplify–forward. The reason is that
both destination nodes are required to recover the compression index and hence its rate
is limited by the worse channel. This limitation can be overcome by sending layered de-
scriptions of Y3n such that the weaker receiver recovers the coarser description while the
stronger receiver recovers both descriptions.

19.1.3 Multimessage Multicast Capacity Region within a Constant Gap


We show that noisy network coding achieves the capacity region of the Gaussian multi-
message network Y N = G X N + Z N within a constant gap uniformly for any channel gain
matrix G.

Theorem . (Constant Gap for Gaussian Multimessage Multicast Network).


For the Gaussian multimessage multicast network, if a rate tuple (R1 , . . . , RN ) is in the
cutset bound in Theorem ., then the rate tuple (R1 − Δ, . . . , RN − Δ) is achievable,
where Δ = (N/2) log 6.

Proof. Note that the cutset bound in Theorem . can be loosened as

    ∑_{j∈S} R_j ≤ (1/2) log |I + G(S) K_{X(S)} G^T(S)|
               = (1/2) log |I + K_{X(S)} G^T(S) G(S)|
               ≤ (1/2) log |I + K_{X(S)} G^T(S) G(S) + (2/P) K_{X(S)} + (P/2) G^T(S) G(S)|
               = (1/2) log( |I + (2/P) K_{X(S)}| |I + (P/2) G^T(S) G(S)| )
               ≤(a) (|S|/2) log 3 + (1/2) log |I + (P/2) G(S) G^T(S)|
               ≤ (N/2) log 3 + (1/2) log |I + (P/2) G(S) G^T(S)|    (.)

for all S such that D ∩ S^c ≠ ∅, where K_{X(S)} denotes the covariance matrix of X(S) when X^N ∼ N(0, K), and (a) follows by Hadamard's inequality. In the other direction, by loosening the inner bound in (.), a rate tuple (R_1, . . . , R_N) is achievable if

    ∑_{j∈S} R_j < (1/2) log |I + (P/2) G(S) G^T(S)| − N/2    (.)

for all S such that D ∩ S^c ≠ ∅. Comparing (.) and (.) completes the proof of Theorem ..

19.2 CAPACITY SCALING LAWS

As we have seen, the capacity of Gaussian networks is known only in very few special
cases. For the multimessage multicast case, we are able to show that the capacity region
for any Gaussian network is within a constant gap of the cutset bound. No such constant
gap results exist, however, for other multimessage demands. The scaling laws approach to
capacity provides another means for obtaining guarantees on the capacity of a Gaussian
network. It aims to establish the optimal scaling order of the capacity as the number of
nodes grows.
In this section, we focus on Gaussian multiple-unicast networks in which each node
in a source-node set S wishes to communicate a message to a distinct node in a disjoint
destination-node set D. The rest of the nodes as well as the source and destination nodes
themselves can also act as relays. We define the symmetric network capacity C(N) as the
supremum of the set of symmetric rates R such that the rate tuple (R, . . . , R) is achievable.
We seek to establish the scaling law for C(N), that is, to find a function д(N) such that
C(N) = Θ(д(N)); see Notation.
We illustrate this approach through the following simple example. Consider the N-
node Gaussian unicast network depicted in Figure .. Assume the power law path loss
(channel gain) g(r) = r^{−α/2}, where r is the distance and α > 2 is the path loss exponent.

Figure .. Gaussian unicast network: nodes 1, 2, . . . , N on a line with unit spacing; node 1 transmits X_1, nodes 2 through N − 1 act as relays (Y_j, X_j), and node N receives Y_N.

Hence the received signal at node k is

    Y_k = ∑_{j=1, j≠k}^{N−1} |j − k|^{−α/2} X_j + Z_k,   k ∈ [2 : N].

We assume expected average power constraint P on each X_j, j ∈ [1 : N − 1].
Source node 1 wishes to communicate a message to destination node N with the other nodes acting as relays to help the communication; thus the source and relay encoders are specified by x_1^n(m) and x_{ji}(y_j^{i−1}), i ∈ [1 : n], for j ∈ [2 : N − 1]. As we discussed in Chapter , the capacity of this network is not known for N = 3 for any nonzero channel parameter values. How does C(N) scale with N? To answer this question, consider the following bounds on C(N).
Lower bounds. Consider a simple multihop relaying scheme, where signals are Gaussian
and interference is treated as noise. In each transmission block, the source transmits a
new message to the first relay and node j ∈ [2 : N − 1] transmits its most recently received
message to node j + 1. Then C(N) ≥ min_j C(P/(I_j + 1)). Now the interference power at node j ∈ [2 : N] is I_j = ∑_{k=1, k≠j−1, j}^{N−1} |j − k|^{−α} P. Since α > 2, I_j = O(1) for all j. Hence C(N) = Ω(1).
Upper bound. Consider the cooperative broadcast upper bound on the capacity

    C(N) ≤ sup_{F(x^{N−1}): E(X_j^2)≤P, j∈[1:N−1]} I(X_1; Y_2, . . . , Y_N | X_2, . . . , X_{N−1})
         ≤ (1/2) log |I + A P A^T|
         = (1/2) log |I + P A^T A|,

where A = [1  2^{−α/2}  3^{−α/2}  ⋅⋅⋅  (N − 1)^{−α/2}]^T. Hence

    C(N) ≤ C( (∑_{j=1}^{N−1} 1/j^α) P ).

Since α > 2, C(N) = O(1), which is the same scaling as achieved by the simple multihop scheme. Thus, we have shown that C(N) = Θ(1).
Remark .. The maximum rate achievable by direct transmission from the source to
the destination using the same total system power NP is C(PN(N − 1)−󰜈 ) = Θ(N 1−󰜈 ).
Since 󰜈 > 2, this rate tends to zero as N → ∞.
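A quick numerical illustration of these scaling claims (my own sketch; the values of P and the path loss exponent α are arbitrary): it evaluates the multihop rate min_j C(P/(1 + I_j)) with interference treated as noise and the direct-transmission rate C(PN(N − 1)^{−α}) for growing N, showing that the former stays bounded away from zero while the latter vanishes.

```python
import numpy as np

def C(x):
    return 0.5 * np.log2(1 + x)

P, alpha = 10.0, 3.0
for N in [10, 100, 1000]:
    rates = []
    for j in range(2, N + 1):
        # interference at node j from all transmitters other than its intended sender j - 1 (and itself)
        I_j = sum(abs(j - k) ** (-alpha) * P
                  for k in range(1, N) if k not in (j - 1, j))
        rates.append(C(P / (1 + I_j)))
    multihop = min(rates)
    direct = C(P * N * (N - 1) ** (-alpha))   # single hop with total power NP
    print(N, "multihop ~", round(multihop, 3), " direct ~", round(direct, 6))
```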

This example shows that relaying can dramatically increase the communication rate
when the path loss exponent α is large. Relaying can also help mitigate the effect of inter-
ference as we see in the next section.

19.3 GUPTA–KUMAR RANDOM NETWORK

The Gupta–Kumar random network approach aims to establish capacity scaling laws that
apply to most large ad-hoc wireless networks. The results can help our understanding of
the role of cooperation in large networks, which in turn can guide network architecture
design and coding scheme development.
We assume a “constant density” network with 2N nodes, each randomly and inde-
pendently placed according to a uniform pdf over a square of area N as illustrated in
Figure .. The nodes are randomly partitioned into N source–destination (S-D) pairs.
Label the source nodes as 1, 2, . . . , N and the destination nodes as N + 1, N + 2, . . . , 2N.
Once generated, the node locations and the S-D assignments are assumed to be fixed
and known to the network architect (code designer). We allow each node, in addition to
being either a source or a destination, to act as a relay to help other nodes communicate
their messages.
We assume the Gaussian network model in (.) with power law path loss, that is, if the distance between nodes j and k is r_{jk}, then the channel gain g_{jk} = r_{jk}^{−α/2} for α > 2. Hence, the output signal at each node k ∈ [1 : 2N] is

    Y_k = ∑_{j=1, j≠k}^{2N} r_{kj}^{−α/2} X_j + Z_k.

We consider a multiple-unicast setting in which source node j ∈ [1 : N] wishes to

Figure .. Gupta–Kumar random network: 2N nodes placed uniformly at random over a square of area N (side √N, centered at the origin), partitioned into N source–destination pairs.



communicate a message M j ∈ [1 : 2nR 󰑗 ] reliably to destination node j + N. The messages


are assumed to be independent and uniformly distributed. We wish to determine the
scaling law for the symmetric capacity C(N) that holds with high probability (w.h.p.), that
is, with probability ≥ (1 − єN ), where єN tends to zero as N → ∞. In other words, the
scaling law holds for most large networks generated in this random manner.
We establish the following bounds on the symmetric capacity.

Theorem .. The symmetric capacity of the random network model with path loss
exponent 󰜈 > 2 has the following order bounds:
. Lower bound: C(N) = Ω󶀡N −1/2 (log N)−(󰜈+1)/2 󶀱 w.h.p.
. Upper bound: C(N) = O󶀡N −1/2+1/󰜈 log N󶀱 w.h.p.

In other words, there exist constants a1 , a2 > 0 such that

lim P󶁁a1 N −1/2 (log N)−(󰜈+1)/2 ≤ C(N) ≤ a2 N −1/2+1/󰜈 log N󶁑 = 1.


N→∞

Before proving the upper and lower order bounds on the symmetric capacity scaling,
consider the following simple transmission schemes.
Direct transmission. Suppose that there is only a single randomly chosen S-D pair. Then
it can be readily checked that the S-D pair is Ω(N −1/2 ) apart w.h.p. and thus direct trans-
mission achieves the rate Ω(N^{−α/2}) w.h.p. Hence, for N S-D pairs, using time division with power control achieves the symmetric rate Ω(N^{−α/2}) w.h.p.
Multihop relaying. Consider a single randomly chosen S-D pair. As we mentioned
above, the S-D pair is Ω(N −1/2 ) apart. Furthermore, it can be shown that with high prob-
ability, there are roughly Ω((N/ log N)1/2 ) relays placed close to the straight line from
the source to the destination with distance O((log N)1/2 ) between every two consecu-
tive relays. Using the multihop scheme in Section . with these relays, we can show
that Ω((log N)^{−α/2}) is achievable w.h.p. Hence, using time division and multihop relaying (without power control), we can achieve the lower bound on the symmetric capacity C(N) = Ω((log N)^{−α/2}/N) w.h.p., which is a huge improvement over direct transmission when the path loss exponent α > 2 is large.
Remark .. Using relaying, each node transmits at a much lower power than using di-
rect transmission. This has the added benefit of reducing interference between the nodes,
which can be exploited through spatial reuse of time/frequency to achieve higher rates.

19.3.1 Proof of the Lower Bound


To prove the lower bound in Theorem ., consider the cellular time-division scheme
illustrated in Figure . with cells of area log N (to guarantee that no cell is empty w.h.p.).
As shown in the figure, the cells are divided into nine groups. We assume equal trans-
mission rates for all S-D pairs. A block Markov transmission scheme is used, where each

Figure .. Cellular time-division scheme: the network area is divided into cells of area log N, and the cells are partitioned into nine groups (labeled 1–9) that take turns being active.

source node sends messages over several transmission blocks. Each transmission block is
divided into nine cell-blocks. A cell is said to be active if its nodes are allowed to transmit.
Each cell is active only during one out of the nine cell-blocks. Nodes in inactive cells act
as receivers. As shown in Figure ., each message is sent from its source node to the
destination node using other nodes in cells along the straight line joining them (referred
to as an S-D line) as relays.
Transmission from each node in an active cell to nodes in its four neighboring cells
is performed using Gaussian random codes with power P and each receiver treats inter-
ference from other senders as noise. Let S(N) be the maximum number of sources in a
cell and L(N) be the maximum number of S-D lines passing through a cell, over all cells.

Figure .. Messages transmitted via relays along S-D lines.



Each cell-block is divided into S(N) + L(N) node-blocks for time-division transmission
by nodes inside each active cell as illustrated in Figure .. Each source node in an ac-
tive cell broadcasts a new message during its node-block using a Gaussian random code
with power P. One of the nodes in the active cell acts as a relay for the S-D pairs that
communicate their messages through this cell. It relays the messages during the allotted
L(N) node-blocks using a Gaussian random code with power P.

Figure .. Time-division scheme: each transmission block consists of nine cell-blocks, and each cell-block is further divided into S(N) + L(N) node-blocks.

Analysis of the probability of failure. The cellular time-division scheme fails if one or
both of the following events occur:

E1 = {there is a cell with no nodes in it},


E2 = {transmission from a node in a cell to a node in a neighboring cell fails}.

Then the probability that the scheme fails is upper bounded as

P(E) ≤ P(E1 ) + P(E2 ∩ E1c ).

It is straightforward to show that P(E1 ) tends to zero as N → ∞. Consider the second


term P(E_2 ∩ E_1^c). From the cell geometry, the distance between each transmitting node in a cell and each receiving node in its neighboring cells is always less than or equal to (5 log N)^{1/2}. Since each transmitting node uses power P, the received power at a node in a neighboring cell is always greater than or equal to (5 log N)^{−α/2} P. Under worst-case placement of the sender, the receiver, and the interfering transmitters during a cell-block (see Figure .), it can be shown that the total average interference power at a receiver from all other transmitting nodes is

    I ≤ ∑_{j=1}^∞ 2P / ((3j − 2)^2 log N)^{α/2} + ∑_{j=1}^∞ ∑_{k=1}^∞ 4P / (((3j − 2)^2 + (3k − 1)^2) log N)^{α/2}.    (.)

Hence, if α > 2, I ≤ a_3 (log N)^{−α/2} for some constant a_3 > 0.
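The constant a_3 in this bound can be estimated numerically. The following sketch is my own (it truncates the two infinite sums at a finite number of terms); it evaluates I · (log N)^{α/2} / P, which by the bound above does not depend on N, for a few path loss exponents.

```python
import numpy as np

def a3_estimate(alpha, terms=1000):
    """Truncated evaluation of the N-independent constant in the interference bound."""
    j = np.arange(1, terms + 1)
    single = np.sum(2.0 / (3 * j - 2) ** alpha)
    jj, kk = np.meshgrid(j, j)
    double = np.sum(4.0 / ((3 * jj - 2) ** 2 + (3 * kk - 1) ** 2) ** (alpha / 2))
    return single + double

for alpha in [2.5, 3.0, 4.0]:
    print(alpha, round(a3_estimate(alpha), 3))
```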


Since we are using Gaussian random codes, the probability of error tends to zero as
the node-block length n → ∞ if the transmission rate for each node block is less than


Figure .. Placement of the active nodes assumed in the derivation of the bound
on interference power.

C((5 log N)^{−α/2} P / (1 + a_3 (log N)^{−α/2})). Thus, for a fixed network, P(E_2 ∩ E_1^c) tends to zero as n → ∞ if the symmetric rate

    R(N) < (1 / (9(S(N) + L(N)))) C( (5 log N)^{−α/2} P / (1 + a_3 (log N)^{−α/2}) ).    (.)

In the following, we bound S(N) + L(N) over a random network.

Lemma .. S(N) + L(N) = O󶀡(N log N)1/2 󶀱 w.h.p.

The proof of this lemma is given in Appendix A. Combining Lemma . and the
bound on R(N) in (.), we have shown that C(N) = Ω(N −1/2 (log N)−(󰜈+1)/2 ) w.h.p.,
which completes the proof of the lower bound in Theorem ..
Remark .. The lower bound achieved by the cellular time-division scheme represents
a vast improvement over time division with multihop, which, by comparison, achieves
C(N) = Ω((log N)−󰜈/2 /N) w.h.p. This improvement is the result of spatial reuse of time
(or frequency), which enables simultaneous transmission with relatively low interference
due to the high path loss.

19.3.2 Proof of the Upper Bound


We prove the upper bound in Theorem ., i.e., C(N) = O(N^{−1/2+1/α} log N) w.h.p. For
a given random network, divide the square area of the network into two halves. Assume
the case where there are at least N/3 sources on the left half and at least a third of them

transmit to destinations on the right half. Since the locations of sources and destinations
are chosen independently, it can be easily shown that the probability of this event tends
to one as N → ∞. We relabel the nodes so that these sources are 1, . . . , N 󳰀 and the cor-
responding destinations are N + 1, . . . , N + N 󳰀 .
By the cutset bound in Theorem ., the symmetric capacity for these source nodes
is upper bounded by

    max_{F(x^N)} (1/N′) I(X^{N′}; Y_{N+1}^{N+N′} | X_{N′+1}^N) ≤ max_{F(x^{N′})} (1/N′) I(X^{N′}; Ỹ_{N+1}^{N+N′}),

where Ỹ_k = ∑_{j=1}^{N′} g_{kj} X_j + Z_k for k ∈ [N′ + 1 : 2N′]. Since the symmetric capacity of the original network is upper bounded by the symmetric capacity of these N′ source–destination pairs, from this point on, we consider the subnetwork consisting only of these source–destination pairs and ignore the reception at the source nodes and the transmission at the destination nodes. To simplify the notation, we relabel N′ as N and Ỹ_k as Y_k for k ∈ [N + 1 : 2N]. Thus, each source node j ∈ [1 : N] transmits X_j with the same average power constraint P and each destination node k ∈ [N + 1 : 2N] receives

    Y_k = ∑_{j=1}^N g_{kj} X_j + Z_k.

We upper bound (1/N) I(X^N; Y_{N+1}^{2N}) for this 2N-user interference channel.
Let node j (source or destination) be at random location (U j , V j ). We create an aug-
mented network by adding 2N mirror nodes as depicted in Figure .. For every destina-
tion node Y j , j ∈ [N + 1 : 2N], we add a sender node X j at location (−UN+ j , VN+ j ), and
for every source node X j , j ∈ [1 : N], we add a receiver node Y j at location (−U j , V j ).

Figure .. Augmented network: for every destination node a mirror sender node is added, and for every source node a mirror receiver node is added, at the location reflected across the vertical axis (e.g., a node at (U_j, V_j) is mirrored at (−U_j, V_j)).



The received vector of this augmented network is

    Y^{2N} = G X^{2N} + Z^{2N},

where Z^{2N} is a vector of i.i.d. N(0, 1) noise components. The gain matrix G is symmetric and G_{jj} = (2|U_j|)^{−α/2}. Furthermore, it can be shown that G ⪰ 0 (for α > 0).
Now consider

    N C(N) ≤ sup_{F(x^{2N}): E(X_j^2)≤P, j∈[1:2N]} I(X^N; Y_{N+1}^{2N})
           ≤ sup_{F(x^{2N}): E(X_j^2)≤P, j∈[1:2N]} I(X^{2N}; Y^{2N})
           ≤ max_{K_X ⪰ 0: tr(K_X) ≤ 2NP} (1/2) log |I + G K_X G^T|
           = max_{P_j: ∑_{j=1}^{2N} P_j ≤ 2NP} (1/2) ∑_{j=1}^{2N} log(1 + P_j λ_j^2)
           ≤ (1/2) ∑_{j=1}^{2N} log(1 + 2NP λ_j^2)
           ≤ ∑_{j=1}^{2N} log(1 + (2NP)^{1/2} λ_j)
           = log |I_{2N} + (2NP)^{1/2} G|
           ≤ ∑_{j=1}^{2N} log(1 + (2NP)^{1/2} G_{jj}),    (.)

where P_j and λ_j, j ∈ [1 : 2N], are the eigenvalues of the positive semidefinite matrices K_X and G, respectively, and G_{jj} = (2|U_j|)^{−α/2}. Define

    D(N) = ∑_{j=1}^{2N} log(1 + (2|U_j|)^{−α/2} (2NP)^{1/2}).

Then we have the following.

Lemma .. D(N) = O(N^{1/2+1/α} log N) w.h.p.

The proof of this lemma is given in Appendix B.


Combining the lemma with the bound in (.) completes the proof of Theorem ..
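The key matrix steps in this chain — diagonalizing the symmetric matrix G, replacing each P_j by 2NP, and applying Hadamard's inequality to the diagonal entries — can be checked numerically. The sketch below is my own; a random symmetric PSD matrix stands in for the augmented network's gain matrix, and the matrix dimension n plays the role of 2N.

```python
import numpy as np

rng = np.random.default_rng(2)
n, P = 6, 4.0
A = rng.normal(size=(n, n))
G = A @ A.T                      # random symmetric PSD stand-in for the gain matrix
c = np.sqrt(n * P)               # sqrt of (dimension x per-node power), i.e., sqrt(2NP) with n = 2N
lam = np.linalg.eigvalsh(G)

lhs = 0.5 * np.sum(np.log2(1 + n * P * lam ** 2))   # (1/2) sum log(1 + 2NP lambda_j^2)
mid = np.sum(np.log2(1 + c * lam))                  # = log |I + sqrt(2NP) G|
rhs = np.sum(np.log2(1 + c * np.diag(G)))           # Hadamard bound via the diagonal entries
print(round(lhs, 2), "<=", round(mid, 2), "<=", round(rhs, 2))
```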
Remark .. If we assume the path loss to include absorption, i.e.,
д(r) = e −γr/2 r −󰜈/2
for some γ > 0, the effect of interference in the network becomes more localized and the
upper bound on C(N) reduces to O(N −1/2 (log N)2 ) w.h.p., which has roughly the same
order as the lower bound.

SUMMARY

∙ Cutset bound for the Gaussian network


∙ Noisy network coding achieves within a constant gap of the cutset bound for Gaussian
multimessage multicast networks
∙ Relaying plays a key role in combating high path loss and interference in large wireless
networks
∙ Scaling laws for network capacity
∙ Random network model
∙ Cellular time-division scheme:
∙ Outperforms time division with relaying via spatial reuse of time/frequency en-
abled by high path loss
∙ Achieves close to the symmetric capacity order for most networks as the path loss
exponent becomes large, and is order-optimal under the absorption model
∙ Use of network augmentation in the proof of the symmetric capacity upper bound
∙ Open problems:
19.1. What is the capacity region of the Gaussian two-way relay channel with no di-
rect links?
19.2. What is the symmetric capacity scaling law for the random network model?

BIBLIOGRAPHIC NOTES

The noisy network coding inner bound on the capacity region of the Gaussian multi-
message multicast network in (.) and the constant gap result in Theorem . were es-
tablished by Lim, Kim, El Gamal, and Chung (). The Gaussian two-way relay channel
with and without direct links was studied by Rankov and Wittneben (), Katti, Maric,
Goldsmith, Katabi, and Médard (), Nam, Chung, and Lee (), and Lim, Kim,
El Gamal, and Chung (, ). The layered noisy network coding scheme mentioned
in Remark . was proposed by Lim, Kim, El Gamal, and Chung (), who showed that
it can significantly improve the achievable rates over nonlayered noisy network coding.
The random network model was first introduced by Gupta and Kumar (). They
analyzed the network under two network theoretic models for successful transmission, the
signal-to-interference ratio (SIR) model and the protocol model. They roughly showed
that the symmetric capacity under these models scales as Θ(N −1/2 ). Subsequent work
under these network theoretic models include Grossglauser and Tse () and El Gamal,
Mammen, Prabhakar, and Shah (a,b).

The capacity scaling of a random network was first studied by Xie and Kumar ().
Subsequent work includes Gastpar and Vetterli () and Özgür, Lévêque, and Preiss-
mann (). The lower bound in Theorem . is due to El Gamal, Mammen, Prab-
hakar, and Shah (a). This lower bound was improved to C(N) = Ω(N −1/2 ) w.h.p.
by Franceschetti, Dousse, Tse, and Thiran (). The upper bound in Theorem . was
established by Lévêque and Telatar (). Analysis of scaling laws based on physical lim-
itations on electromagnetic wave propagation was studied in Franceschetti, Migliore, and
Minero () and Özgür, Lévêque, and Tse ().

PROBLEMS

.. Prove the cutset bound in Theorem ..


.. Consider the Gaussian two-way relay channel in Section ...
(a) Derive the cutset bound in (.), the decode–forward inner bound in (.),
the amplify–forward inner bound in (.), and the noisy network coding inner
bound in (.).
(b) Suppose that in the decode–forward coding scheme, node 3 uses network coding and broadcasts the modulo-2 sum of the binary sequence representations of M_1 and M_2, instead of (M_1, M_2), and nodes 1 and 2 find each other's message by first recovering the modulo-2 sum. Show that this modified coding scheme yields the lower bound

R1 < min󶁁C(S31 ), C(S13 ), C(S23 )󶁑,


R2 < min󶁁C(S32 ), C(S13 ), C(S23 )󶁑,
R1 + R2 < C(S31 + S32 ).

Note that this bound is worse than the decode–forward lower bound when node 3 broadcasts (M_1, M_2)! Explain this surprising result.
(c) Let g_31 = g_32 = 1 and g_13 = g_23 = 2. Show that the gap between the decode–forward inner bound and the cutset bound is unbounded.
(d) Let g_31 = g_13 = g_23 = 1 and g_32 = √P. Show that the gap between the amplify–forward inner bound and the cutset bound is unbounded.
.. Consider the cellular time-division scheme in Section ...
(a) Show that P(E_1) tends to zero as N → ∞.
(b) Verify the upper bound on the total average interference power in (.).
.. Capacity scaling of the N-user-pair Gaussian IC. Consider the N-user-pair sym-
metric Gaussian interference channel

Y N = G XN + ZN ,

where the channel gain matrix is

    G = [ 1  a  ⋯  a ]
        [ a  1  ⋯  a ]
        [ ⋮  ⋮  ⋱  ⋮ ]
        [ a  a  ⋯  1 ].
Assume average power constraint P on each sender. Denote the symmetric capac-
ity by C(N).
(a) Using time division with power control, show that the symmetric capacity is
lower bounded as
1
C(N) ≥ C(NP).
N
(b) Show that the symmetric capacity is upper bounded as C(N) ≤ C(P). (Hint:
Consider the case a = 0.)
(c) Tighten the bound in part (b) and show that

        C(N) ≤ (1/(2N)) log |I + G G^T P|
             = (1/(2N)) log( (1 + (a − 1)^2 P)^{N−1} (1 + (a(N − 1) + 1)^2 P) ).
(d) Show that when a = 1, the symmetric capacity is C(N) = (1/N) C(P).

APPENDIX 19A PROOF OF LEMMA 19.1

It is straightforward to show that the number of sources in each cell S(N) = O(log N)
w.h.p. We now bound L(N). Consider a torus with the same area and the same square cell
division as the square area. For each S-D pair on the torus, send each packet along the
four possible lines connecting them. Clearly for every configuration of nodes, each cell in
the torus has at least as many S-D lines crossing it as in the original square. The reason we
consider the torus is that the pmf of the number of lines in each cell becomes the same,
which greatly simplifies the proof.
Let H j be the total number of hops taken by packets traveling along one of the four
lines between S-D pair j, j ∈ [1 : N]. It is not difficult to see that the expected length of
each path is Θ(N 1/2 ). Since the hops are along cells having side-length (log N)1/2 ,

E(H j ) = Θ󶀡(N/ log N)1/2 󶀱.

Fix a cell c ∈ [1 : N/ log N] and define E jc to be the indicator of the event that a line
between S-D pair j ∈ [1 : N] passes through cell c ∈ [1 : N/ log N], i.e.,

    E_{jc} = { 1   if a hop of S-D pair j is in cell c,
             { 0   otherwise.

Summing up the total number of hops in the cells in two different ways, we obtain

    ∑_{j=1}^N ∑_{c=1}^{N/ log N} E_{jc} = ∑_{j=1}^N H_j.

Taking expectations on both sides and noting that the probabilities P{E jc = 1} are equal
for all j ∈ [1 : N] because of the symmetry on the torus, we obtain

    (N^2/ log N) P{E_{jc} = 1} = N E(H_j),

or equivalently,

    P{E_{jc} = 1} = Θ((log N/N)^{1/2})

for j ∈ [1 : N] and c ∈ [1 : N/ log N]. Now for a fixed cell c, the total number of lines
passing through it is Lc = ∑Nj=1 E jc . This is the sum of N i.i.d. Bernoulli random variables
since the positions of the nodes are independent and E jc depends only on the positions
of the source and destination nodes of S-D pair j. Moreover
    E(L_c) = ∑_{j=1}^N P{E_{jc} = 1} = Θ((N log N)^{1/2})

for every cell c. Hence, by the Chernoff bound,

P{L c > (1 + δ) E(L c )} ≤ exp(− E(L c )δ 2 /3).

Choosing δ = 2√(2 log N/ E(L_c)) yields

P{L c > (1 + δ) E(L c )} ≤ 1/N 2 .

Since δ = o(1), L_c = O(E(L_c)) with probability ≥ 1 − 1/N^2. Finally, using the union of events bound over the N/ log N cells shows that L(N) = max_{c∈[1:N/ log N]} L_c = O((N log N)^{1/2}) with probability ≥ 1 − 1/(N log N) for sufficiently large N.
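A small simulation of this concentration step (my own sketch; it models each L_c directly as a Binomial(N, p) variable with p = √(log N / N), which is the order of the crossing probability derived above, rather than simulating actual node placements and S-D lines) suggests the predicted behavior: the maximum over cells stays within a bounded factor of √(N log N).

```python
import numpy as np

rng = np.random.default_rng(0)
for N in [100, 1000, 10000]:
    p = np.sqrt(np.log(N) / N)                  # order of P{E_jc = 1}
    cells = int(N / np.log(N))
    L = rng.binomial(N, p, size=(200, cells))   # 200 random networks, one Binomial per cell
    ratio = L.max(axis=1) / np.sqrt(N * np.log(N))
    print(N, "E(L_c)/sqrt(N log N) =", round(N * p / np.sqrt(N * np.log(N)), 2),
          " mean max ratio over cells ~", round(float(np.mean(ratio)), 2))
```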

APPENDIX 19B PROOF OF LEMMA 19.2

Define

    W(N) = log(1 + (2U)^{−α/2} (2NP)^{1/2}),

where U ∼ Unif[0, N^{1/2}]. Since D(N) is the sum of i.i.d. random variables, we have

    E(D(N)) = N E(W(N)),
    Var(D(N)) = N Var(W(N)).

We find upper and lower bounds on E(W(N)) and an upper bound on E(W 2 (N)). For

simplicity, we assume the natural logarithm here since we are only interested in order results. Let a = 2^{−(α−1)/2} P^{1/2} (so that (2u)^{−α/2}(2NP)^{1/2} = a u^{−α′} k), α′ = α/2, k = N^{1/2}, and u_0 = (ak)^{1/α′}. Consider

    k E(W(N)) = ∫_0^k log(1 + a u^{−α′} k) du
              = ∫_0^1 log(1 + a u^{−α′} k) du + ∫_1^{u_0} log(1 + a u^{−α′} k) du + ∫_{u_0}^k log(1 + a u^{−α′} k) du    (.)
              ≤ ∫_0^1 log((1 + ak) u^{−α′}) du + ∫_1^{u_0} log(1 + ak) du + ∫_{u_0}^k a u^{−α′} k du
              = log(1 + ak) + α′ ∫_0^1 log(1/u) du + (u_0 − 1) log(1 + ak) + (ak/(α′ − 1)) (u_0^{−(α′−1)} − k^{−(α′−1)}).

Thus, there exists a constant b_1 > 0 such that for N sufficiently large,

    E(W(N)) ≤ b_1 N^{−1/2+1/α} log N.    (.)

Now we establish a lower bound on E(W(N)). From (.), we have

    k E(W(N)) ≥ ∫_1^{u_0} log(1 + a u^{−α′} k) du ≥ ∫_1^{u_0} log(1 + a) du = (u_0 − 1) log(1 + a).

Thus there exists a constant b_2 > 0 such that for N sufficiently large,

    E(W(N)) ≥ b_2 N^{−1/2+1/α}.    (.)

Next we find an upper bound on E(W^2(N)). Consider

    k E(W^2(N)) = ∫_0^1 (log(1 + a u^{−α′} k))^2 du + ∫_1^{u_0} (log(1 + a u^{−α′} k))^2 du + ∫_{u_0}^k (log(1 + a u^{−α′} k))^2 du
                ≤ ∫_0^1 (log((1 + ak) u^{−α′}))^2 du + ∫_1^{u_0} (log(1 + ak))^2 du + ∫_{u_0}^k a^2 u^{−2α′} k^2 du
                ≤ (log(1 + ak))^2 + (α′)^2 ∫_0^1 (log(1/u))^2 du + 2α′ log(1 + ak) ∫_0^1 log(1/u) du
                  + (u_0 − 1)(log(1 + ak))^2 + (a^2 k^2/(2α′ − 1)) (u_0^{−(2α′−1)} − k^{−(2α′−1)}).

Thus there exists a constant b_3 > 0 such that for N sufficiently large,

    E(W^2(N)) ≤ b_3 N^{−1/2+1/α} (log N)^2.    (.)

Finally, using the Chebyshev lemma in Appendix B and substituting from (.), (.), and (.), we have, for N sufficiently large,

    P{D(N) ≥ 2b_1 N^{1/2+1/α} log N} ≤ P{D(N) ≥ 2 E(D(N))}
                                     ≤ Var(D(N)) / (E[D(N)])^2
                                     ≤ N E(W^2(N)) / (N^2 (E[W(N)])^2)
                                     ≤ b_3 N^{−1/2+1/α} (log N)^2 / (b_2^2 N^{2/α})
                                     = (b_3/b_2^2) N^{−1/2−1/α} (log N)^2,

which tends to zero as N → ∞. This completes the proof of the lemma.


CHAPTER 20

Compression over Graphical


Networks

In this chapter, we study communication of correlated sources over networks represented


by graphs, which include as special cases communication of independent messages over
graphical networks studied in Chapter  and lossy and lossless source coding over noise-
less links studied in Part II of the book. We first consider networks modeled by a directed
acyclic graph. We show that the optimal coding scheme for communicating correlated
sources losslessly to the same set of destination nodes (multisource multicast) is to per-
form separate Slepian–Wolf coding and linear network coding. For the lossy case, we
consider the multiple description network in which a node observes a source and wishes
to communicate it with prescribed distortions to other nodes in the network. We establish
the optimal rate–distortion region for special classes of this network.
We then consider interactive communication over noiseless public broadcast chan-
nels. We establish the optimal rate region for multiway lossless source coding (coding for
omniscience/CFO problem) in which each node has a source and the nodes communi-
cate in several rounds over a noiseless public broadcast channel so that a subset of them
can losslessly recover all the sources. Achievability is proved via independent rounds of
Slepian–Wolf coding; hence interaction does not help in this case. Finally, we discuss
the two-way lossy source coding problem. We establish the rate–distortion region for a
given number of communication rounds, and show through an example that two rounds
of communication can strictly outperform a single round. Hence, interaction can help
reduce the rates in lossy source coding over graphical networks.

20.1 DISTRIBUTED LOSSLESS SOURCE–NETWORK CODING

In Chapter , we showed that linear network coding can achieve the cutset bound for
graphical multimessage multicast networks. We extend this result to lossless source cod-
ing of correlated sources over graphical networks. We model an N-node network by a di-
rected acyclic graph G = (N , E), and assume that the nodes are ordered so that there is no
path from node k to node j if j < k. Let (X1 , . . . , XN ) be an N-DMS. Node j ∈ [1 : N − 1]
observes the source X j and wishes to communicate it losslessly to a set of destination

[Figure: node 1 observes X_1 and sends indices over edges of rates R_12, R_13, R_14; node 2 (observing X_2) forms X̂_12, node 3 (observing X_3) forms X̂_13, node N forms X̂_1N, and other nodes form estimates such as (X̂_1j, X̂_3j) and X̂_2k.]

Figure .. Graphical multisource network.

nodes D j ⊆ [ j + 1 : N] as depicted in Figure .. The problem is to determine the set of


achievable link rate tuples (R jk : ( j, k) ∈ E).
A ((2^{nR_{jk}} : (j, k) ∈ E), n) lossless source code for the multisource network G = (N, E) is defined in a similar manner to the multimessage case. The probability of error is defined as

    P_e^{(n)} = P{ X̂_{jk}^n ≠ X_j^n for some j ∈ [1 : N − 1], k ∈ D_j }.

A rate tuple (R_{jk} : (j, k) ∈ E) is said to be achievable if there exists a sequence of ((2^{nR_{jk}} : (j, k) ∈ E), n) codes such that lim_{n→∞} P_e^{(n)} = 0. The optimal rate region R* is the closure of the set of achievable rate tuples.
Since this problem is a generalization of the multiple independent message case, the
optimal rate region is not known in general. The cutset bound in Theorem . can be
extended to the following.

Theorem . (Cutset Bound for the Graphical Multisource Network). If the rate
tuple (R jk : ( j, k) ∈ E) is achievable, then it must satisfy the inequality

    R(S) ≥ H(X(J(S)) | X(S^c))

for all S ⊂ [1 : N] such that S^c ∩ D_j ≠ ∅ for some j ∈ [1 : N − 1], where

    R(S) = ∑_{(j,k): j∈S, k∈S^c} R_{jk}

and J(S) = {j ∈ S : S^c ∩ D_j ≠ ∅}.

The cutset outer bound is tight when the sources are to be sent to the same set of destination nodes. Suppose that X_{k+1} = ⋅⋅⋅ = X_N = ∅ and D_1 = ⋅⋅⋅ = D_k = D.

Theorem .. The optimal rate region R ∗ for lossless multisource multicast coding
of a k-DMS (X1 , . . . , Xk ) is the set of rate tuples (R jl : ( j, l) ∈ E) such that

    R(S) ≥ H(X(S ∩ [1 : k]) | X(S^c ∩ [1 : k]))

for all S ⊂ [1 : N] such that S^c ∩ D ≠ ∅.

The converse of this theorem follows by the cutset bound. The proof of achievabil-
ity follows by separate Slepian–Wolf coding of the sources at rates R1 , . . . , Rk and linear
network coding to send the bin indices.
Theorems . and . can be easily extended to wireless networks with cycles mod-
eled by a directed cyclic hypergraph. This is illustrated in the following.
Example . (Source coding with a two-way relay). Consider the noiseless two-way
relay channel depicted in Figure ., which is the source coding counterpart of the DM
and Gaussian two-way relay channels in Problem . and Section .., respectively.

Figure .. Source coding over a noiseless two-way relay channel: node 1 observes X_1^n and sends the index M_1(X_1^n) to relay node 3, node 2 observes X_2^n and sends M_2(X_2^n), and the relay broadcasts M_3(M_1, M_2); node 1 then forms X̂_2^n and node 2 forms X̂_1^n.

Let (X1 , X2 ) be a -DMS. Node  observes X1 and node  observes X2 . Communication


is performed in two rounds. In the first round, node j = 1, 2 encodes X nj into an index M j
at rate R j and sends it to the relay node . Node  then encodes the index pair (M1 , M2 )
into an index M3 at rate R3 and broadcasts it to both source nodes. Upon receiving M3 ,
nodes  and  find the estimates X̂ 2n and X̂ 1n , respectively, based on their respective sources
and the index M3 . By the extension of Theorem . to noiseless wireless networks, the
optimal rate region is the set of rate triples (R1 , R2 , R3 ) such that

R1 ≥ H(X1 | X2 ),
R2 ≥ H(X2 | X1 ),
R3 ≥ max{H(X1 | X2 ), H(X2 | X1 )}.
The converse follows by a simple cutset argument. To prove achievability, we use Slepian–
Wolf coding and linear network coding. Node  sends the nR1 -bit bin index M1 of X1n and
node  sends the nR2 -bit bin index M2 of X2n . Assume without loss of generality that
R1 ≥ R2 . The relay pads the binary sequence representation of M2 with zeros to obtain
the nR1 -bit index M2󳰀 . It then broadcasts the modulo- sum M3 = M1 ⊕ M2󳰀 . Upon re-
ceiving M3 , each node decodes the other node’s index to recover its source. Following the

achievability proof of the Slepian–Wolf theorem, the probability of error tends to zero as
n → ∞ if R1 > H(X1 |X2 ) + δ(є), R2 > H(X2 |X1 ) + δ(є) and R3 ≥ max{R1 , R2 }.
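The relaying step in this example is easy to demonstrate at the bit level. The sketch below is my own (the index lengths are arbitrary, and the Slepian–Wolf binning itself is not implemented; M_1 and M_2 simply stand for the nR_1- and nR_2-bit bin indices); it shows that broadcasting the single XOR M_3 = M_1 ⊕ M_2′ lets each node recover the other node's bin index, after which Slepian–Wolf decoding proceeds as usual.

```python
import random

nR1, nR2 = 12, 7                      # assume R1 >= R2
M1 = random.getrandbits(nR1)          # bin index of X_1^n (placeholder)
M2 = random.getrandbits(nR2)          # bin index of X_2^n (placeholder)

M2_padded = M2                        # zero-padding M2 to nR1 bits (a no-op on integers)
M3 = M1 ^ M2_padded                   # relay broadcasts the modulo-2 sum

# Node 1 knows M1 and recovers M2; node 2 knows M2 and recovers M1.
M2_hat = M3 ^ M1
M1_hat = M3 ^ M2_padded
assert M1_hat == M1 and M2_hat == M2
print("relay broadcast uses", nR1, "bits instead of", nR1 + nR2)
```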

20.2 MULTIPLE DESCRIPTION NETWORK CODING

The optimal rate–distortion region for lossy source coding over graphical networks is not
known even for the single-hop case discussed in Chapter . Hence, we focus our dis-
cussion on the multiple description network, which extends multiple description coding
to graphical networks and is well motivated by multimedia distribution over communi-
cation networks as mentioned in the introduction of Chapter . Consider an N-node
communication network modeled by a directed acyclic graph G = (N , E). Let X be a
DMS and d_j(x, x̂_j), j ∈ [2 : N], be a set of distortion measures. Source node 1 observes the source X and node j ∈ [2 : N] wishes to reconstruct it with a prescribed distortion D_j. We wish to find the set of achievable link rates.
A ((2^{nR_{jk}} : (j, k) ∈ E), n) multiple description code for the graphical network G = (N, E) consists of
∙ a source encoder that assigns an index m_{1j}(x^n) ∈ [1 : 2^{nR_{1j}}] to each x^n ∈ X^n for each (1, j) ∈ E,
∙ a set of N − 2 relay encoders, where encoder k ∈ [2 : N − 1] assigns an index m_{kl} ∈ [1 : 2^{nR_{kl}}] to each received index tuple (m_{jk} : (j, k) ∈ E) for each (k, l) ∈ E, and
∙ a set of N − 1 decoders, where decoder k ∈ [2 : N] assigns an estimate x̂_k^n to every received index tuple (m_{jk} : (j, k) ∈ E).
A rate–distortion tuple ((R_{jk} : (j, k) ∈ E), D_2, . . . , D_N) is said to be achievable if there exists a sequence of ((2^{nR_{jk}} : (j, k) ∈ E), n) codes such that

lim sup E(d j (X n , X̂ nj )) ≤ D j , j ∈ [2 : N].


n→∞

The rate–distortion region R(D2 , . . . , DN ) for the multiple description network is the clo-
sure of the set of rate tuples (R jk : ( j, k) ∈ E) such that ((R jk : ( j, k) ∈ E), D2 , . . . , DN ) is
achievable.
It is easy to establish the following cutset outer bound on the rate–distortion region.

Theorem . (Cutset Bound for the Multiple Description Network). If the rate–
distortion tuple ((R jk : ( j, k) ∈ E), D2 , . . . , DN ) is achievable, then it must satisfy the
inequality
R(S) ≥ I(X; X(S ̂ c ))

for all S ⊂ [1 : N] such that 1 ∈ S for some conditional pmf p(x̂2N |x) that satisfies the
constraints E(d j (X, X̂ j )) ≤ D j , j ∈ [2 : N].
20.2 Multiple Description Network Coding 509

If d j , j ∈ [2 : N], are Hamming distortion measures and D j = 0 or 1, then the problem


reduces to communication of a single message at rate H(X) over the graphical multicast
network in Chapter  with the set of destination nodes D = { j ∈ [2 : N] : D j = 0}. The
rate–distortion region is the set of edge rate tuples such that the resulting min-cut capacity
C (minimized over all destination nodes) satisfies the condition H(X) ≤ C.
The rate–distortion region for the multiple description network is not known in gen-
eral. In the following, we present a few nontrivial special cases for which the optimal
rate–distortion tradeoff is known.

20.2.1 Cascade Multiple Description Network


The cascade multiple description network is a special case of the multiple description net-
work, where the graph has a line topology, i.e., E = {( j − 1, j) : j ∈ [2 : N]}. The rate–
distortion region for this network is known.

Theorem .. The rate–distortion region for a cascade multiple description network
with distortion tuple (D2 , . . . , DN ) is the set of rate tuples (R j−1, j : j ∈ [2 : N]) such that

̂ j : N])),
R j−1, j ≥ I(X; X([ j ∈ [2 : N],

for some conditional pmf p(x̂2N |x) that satisfies the constraints E(d j (X, X̂ j )) ≤ D j , j ∈
[2 : N].

For example, consider the -node cascade multiple description network depicted in
Figure .. By Theorem ., the rate–distortion region with distortion pair (D2 , D3 ) is
the set of rate pairs (R12 , R23 ) such that

R12 ≥ I(X; X̂ 2 , X̂ 3 ),
R23 ≥ I(X; X̂ 3 )

for some conditional pmf p(x̂2 , x̂3 |x) that satisfies the constraints E(d j (X, X̂ j )) ≤ D j , j =
2, 3.

X̂ 2

R12 R23
X X̂ 3
1 2 3

Figure .. Three-node cascade multiple description network.

Proof of Theorem .. The proof of the converse follows immediately by relaxing the
cutset bound in Theorem . only to the cuts (S, S c ) = ([1 : j − 1], [ j : N]), j ∈ [2 : N].
510 Compression over Graphical Networks

To prove achievability, fix the conditional pmf p(x̂2N |x) that satisfies the distortion con-
straints (with a factor 1/(1 + є) as in the achievability proofs for single-hop lossy source
coding problems). We generate the codebook sequentially for j = N , N − 1, . . . , 2. For
̃
each (m̃ j+1 , . . . , m
̃ N ), randomly and independently generate 2nR 󰑗 sequences x̂nj (m ̃ j, . . . ,
̃
m̃ N ), m
̃ j ∈ [1 : 2nR 󰑗 ], each according to ∏ni=1 p X̂ | X̂ 󰑁 (x̂ ji |x̂ j+1,i (m
̃ j+1 ), . . . , x̂Ni (m
̃ N )). By
󰑗 󰑗+1
the covering lemma and the typical average lemma, if
R̃ j > I(X; X̂ j | X̂ Nj+1 ) + δ(є) for all j ∈ [2 : N],

then there exists a tuple (x̂2n (m


̃ N2 ), x̂3n (m
̃ 3N ), . . . , x̂Nn (m
̃ N )) that satisfies the distortion con-
straints asymptotically. Letting
R j−1, j = R̃ j + R̃ j+1 + ⋅ ⋅ ⋅ + R̃ N , j ∈ [2 : N],
and using Fourier–Motzkin elimination, we obtain the desired inequalities
R j−1, j > I(X; X̂ Nj ) + δ(є), j ∈ [2 : N].
This completes the proof of Theorem ..

20.2.2 Triangular Multiple Description Network


Consider the -node triangular multiple description network depicted in Figure ..

2
X̂ 2
R12 R23

1 R13 3
X X̂ 3

Figure .. Triangular multiple description network.

The rate–distortion region is again known for this special case of the multiple descrip-
tion network.

Theorem .. The rate–distortion region for a triangular multiple description net-
work with distortion pair (D2 , D3 ) is the set of rate tuples (R12 , R13 , R23 ) such that

R23 ≥ I(U ; X),


R12 ≥ I(U , X̂ 2 ; X),
R13 ≥ I( X̂ 3 ; X |U )

for some conditional pmf p(u|x)p(x̂2 |u, x)p(x̂3 |u, x) that satisfies the constraints
E(d j (X, X̂ j )) ≤ D j , j = 2, 3.
20.2 Multiple Description Network Coding 511

To illustrate this theorem, consider the following.


Example .. Let X be a WGN(P) source and d2 , d3 be squared error distortion mea-
sures. The rate–distortion region for distortion pair (D2 , D3 ) is the set of rate tuples
(R12 , R13 , R23 ) such that
P
R12 ≥ R 󶀥 󶀵,
D2
P
R12 + R13 ≥ R󶀥 󶀵,
D3
P
R13 + R23 ≥ R󶀥 󶀵.
D3
The proof of the converse follows immediately by applying the cutset bound in Theo-
rem . to the quadratic Gaussian case. The proof of achievability follows by the Gaussian
successive refinement coding in Example .. This rate–distortion region can be equiva-
lently represented as the set of rate tuples (R12 , R13 , R23 ) such that
P
R12 ≥ R 󶀥 󶀵,
D2
D󳰀
R13 ≥ R󶀥 󶀵,
D3
P
R23 ≥ R󶀤 󳰀󶀴
D
for some D 󳰀 ≥ max{D2 , D3 }. This equivalent characterization follows by evaluating The-
orem . directly using the Gaussian test channels for the successive refinement scheme
in Section ..

Proof of Theorem .. To prove achievability, we first generate a description U of X at


rate R̃ 3 > I(U ; X). Conditioned on each description U, we generate reconstructions X̂ 2
and X̂ 3 of X at rates R̃ 2 > I(X; X̂ 2 |U ) and R13 > I(X; X̂ 3 |U), respectively. By substituting
R12 = R̃ 2 + R̃ 3 and R23 = R̃ 3 , and using the Fourier–Motzkin elimination procedure in
Appendix D, the inequalities in the theorem guarantee the achievability of the desired
distortions.
To prove the converse, we use the auxiliary random variable identification Ui = (M23 ,
X i−1 ). First consider
nR23 ≥ H(M23 )
n
= 󵠈 I(M23 , X i−1 ; Xi )
i=1
n
= 󵠈 I(Ui ; Xi )
i=1
= nI(UQ , Q; XQ )
= nI(U ; X),
512 Compression over Graphical Networks

where Q is a time-sharing random variable, U = (UQ , Q), and X = XQ . Next, since M23
is a function of M12 , we have

nR12 ≥ H(M12 )
≥ H(M12 |M23 )
≥ I(M12 ; X n |M23 )
n
≥ 󵠈 I( X̂ 2i ; Xi |M23 , X i−1 )
i=1

= nI( X̂ 2 ; X |U),

which depends only on p(x)p(u|x)p(x̂2 |u, x). Similarly,

nR13 ≥ nI( X̂ 3 ; X |U),

which depends only on p(x)p(u|x)p(x̂3 |u, x). Even though as defined, X̂ 2 and X̂ 3 are
not in general conditionally independent given (U , X), we can assume without loss of
generality that they are, since none of the mutual information and distortion terms in the
theorem is a function of the joint conditional pmf of ( X̂ 2 , X̂ 3 ) given (U , X). This completes
the proof of the theorem.

Remark 20.1. Theorems . and . can be extended to the network with edge set

E = {(1, 2), (2, 3), (3, 4), . . . , (N − 1, N), (1, N)}.

In this case, the rate–distortion region for distortion tuple (D2 , D3 , . . . , DN ) is the set of
rate tuples (R12 , R23 , R34 , . . . , RN−1,N , R1N ) such that

RN−1,N ≥ I(X; U),


R j−1, j ≥ I(X; X̂ N−1 , U ),
j j ∈ [2 : N − 1], (.)
R1N ≥ I(X; X̂ N |U)

for some conditional pmf p(u|x)p(x̂2N−1 |u, x)p(x̂N |u, x) such that E(d j (X, X̂ j )) ≤ D j , j ∈
[2 : N].
Remark 20.2. The multiple description network can be generalized by considering wire-
less networks modeled by a hypergraph or by adding correlated side information at vari-
ous nodes; see Problems . and ..

20.3 INTERACTIVE SOURCE CODING

As we showed in Chapter , interaction among nodes can increase the capacity of multi-
user networks. Does interactive communication also help reduce the rates needed for
encoding correlated sources? We investigate this question by considering source coding
over noiseless public broadcast channels.
20.3 Interactive Source Coding 513

20.3.1 Two-Way Lossless Source Coding


Consider the two-way lossless source coding setting depicted in Figure .. Let (X1 , X2 )
be a -DMS and assume a -node communication network in which node  observes
X1 and node  observes X2 . The nodes interactively communicate over a noiseless bi-
directional link so that each node can losslessly recover the source observed by the other
node. We wish to determine the number of communication rounds needed and the set of
achievable rates for this number of rounds.
Assume without loss of generality that node  sends the first index and that the num-
ber of rounds of communication q is even. A (2nr1 , . . . , 2nr󰑞 , n) code for two-way lossless
source coding consists of
∙ two encoders, one for each node, where in round l j ∈ { j, j + 2, . . . , q − 2 + j}, encoder
j = 1, 2 sends an index ml 󰑗 (x nj , m l 󰑗 −1 ) ∈ [1 : 2 󰑙 󰑗 ], that is, a function of its source vector
nr

and all previously transmitted indices, and


∙ two decoders, one for each node, where decoder  assigns an estimate x̂2n to each re-
ceived index and source sequence pair (mq , x1n ) and decoder  assigns an estimate x̂1n
to each index and source sequence pair (mq , x2n ).
The probability of error is defined as Pe(n) = P{( X̂ 1n , X̂ 2n ) ̸= (X1n , X2n )}. The total transmis-
sion rate for node j = 1, 2 is defined as R j = ∑ l 󰑗 = j, j+2,...,q−2+ j r l 󰑗 . A rate pair (R1 , R2 ) is said
to be achievable if there exists a sequence of (2nr1 , . . . , 2nr󰑞 , n) codes for some q such that
limn→∞ Pe(n) = 0 and R j = ∑l 󰑗 = j, j+2,...,q−2+ j r l 󰑗 . The optimal rate region R ∗ for two-way
lossless source coding is the closure of the set of achievable rate pairs (R1 , R2 ).
The optimal rate region for two-way lossless source coding is easy to establish. By the
Slepian–Wolf theorem, we know that R1 > H(X1 |X2 ) and R2 > H(X2 |X1 ) are sufficient
and can be achieved in two independent communication rounds. It is also straightforward
to show that this set of rate pairs is necessary. Hence, interaction does not help in this
lossless source coding setting.

X1n M l (X1n , M l−1 ) X2n


Node  M l+1 (X2n , M l ) Node 
X̂ 2n X̂ 1n

Figure .. Two-way lossless source coding.

Remark .. For error-free compression, it can be shown that


R1 ≥ H(X1 ),
R2 ≥ H(X2 | X1 )
is necessary in some cases, for example, when p(x, y) > 0 for all (x, y) ∈ X × Y. By con-
trast, without interaction, R1 ≥ H(X1 ) and R2 ≥ H(X2 ) are sometimes necessary. Hence
interaction can help in error-free compression.
514 Compression over Graphical Networks

20.3.2 CFO Problem


We now consider the more general problem of multiway lossless source coding of an
N-DMS (X1 , . . . , XN ) over an N-node wireless network, where node j ∈ [1 : N] observes
the source X j . The nodes interactively communicate over a noiseless public broadcast
channel so that each node in some subset A ⊆ [1 : N] can losslessly recover all the sources.
This problem is referred to as communication for omniscience (CFO).
As for the -node special case in the previous subsection, we assume that the nodes
communicate in a round-robin fashion in q rounds, where q is divisible by N and can
depend on the code block length. Node j ∈ [1 : N] broadcasts in rounds j, N + j, . . . , q −
N + j. Note that this “protocol” can achieve any other node communication order by
having some of the nodes send a null message in some of their allotted rounds.
A (2nr1 , . . . , 2nr󰑞 , n) code for the CFO problem consists of
∙ a set of encoders, where in round l j ∈ { j, N + j, . . . , q − N + j}, encoder j ∈ [1 : N]
sends an index ml 󰑗 (x nj , m l 󰑗 −1 ) ∈ [1 : 2 󰑙 󰑗 ], that is, a function of its source vector and
nr

all previously transmitted indices, and


∙ a set of decoders, where decoder k ∈ A assigns an estimate (x̂1k
n
, . . . , x̂Nn k ) to each index
sequence m and source sequence xk ∈ Xk .
q n n

The probability of error is defined as

Pe(n) = P󶁁( X̂ 1k
n
, . . . , X̂ Nn k ) =
̸ (X1n , . . . , XNn ) for some k ∈ A󶁑.

The total transmission rate for node j is defined as R j = ∑ l 󰑗 = j, j+N ,...,q−N+ j r l 󰑗 . A rate tuple
(R1 , . . . , RN ) is said to be achievable if there exists a sequence of (2nr1 , . . . , 2nr󰑞 , n) codes for
some q such that limn→∞ Pe(n) = 0 and R j = ∑ l 󰑗 = j, j+N ,...,q−N+ j r l 󰑗 . The optimal rate region
R ∗ (A) for the CFO problem is the closure of the set of achievable rate tuples (R1 , . . . , RN ).
In some cases, we are interested in the minimum sum-rate

N
RCFO (A) = min 󵠈 Rj.
(R1 ,...,R󰑁 )∈R ∗ (A)
j=1

As in the two-way lossless source coding problem, optimal coding for the CFO prob-
lem involves N rounds of Slepian–Wolf coding.

Theorem .. The optimal rate region R ∗ (A) for the CFO problem is the set of rate
tuples (R1 , R2 , . . . , RN ) such that

󵠈 R j ≥ H(X(S)| X(S c ))
j∈S

for every subset S ⊆ [1 : N]\{k} and every k ∈ A.


20.3 Interactive Source Coding 515

For N = 2 and A = {1, 2}, the optimal rate region is the set of rate pairs (R1 , R2 ) such
that
R1 ≥ H(X1 | X2 ),
R2 ≥ H(X2 | X1 ),

and the minimum sum-rate is RCFO ({1, 2}) = H(X1 |X2 ) + H(X2 |X1 ), which recovers the
two-way case in the previous subsection. Note that the optimal rate region for the set A is
the intersection of the optimal rates for the sets {k}, k ∈ A, i.e., R ∗ (A) = ⋂k∈A R ∗ ({k}).
Example .. Let N = 3 and A = {1, 2}. The optimal rate region is the set of rate triples
(R1 , R2 , R3 ) such that

R1 ≥ H(X1 | X2 , X3 ),
R2 ≥ H(X2 | X1 , X3 ),
R3 ≥ H(X3 | X1 , X2 ),
R1 + R3 ≥ H(X1 , X3 | X2 ),
R2 + R3 ≥ H(X2 , X3 | X1 ).

Using Fourier–Motzkin elimination in Appendix D, we can show that

RCFO ({1, 2}) = H(X1 | X2 ) + H(X2 | X1 ) + max{H(X3 | X1 ), H(X3 | X2 )}.

Remark .. The minimum sum-rate RCFO (A) can be computed efficiently using du-
ality results from linear programming. In particular, a maximum of N constraints, as
compared to 2N for naive computation, are required.

20.3.3 Two-Way Lossy Source Coding


We now turn our attention to interactive lossy source coding; again see Figure .. Let
(X1 , X2 ) be a -DMS and d1 , d2 be two distortion measures. Node  observes X1 and
node  observes X2 . The nodes communicate in rounds over a noiseless bidirectional link
so that each node reconstruct the source at the other node with a prescribed distortion.
We wish to find the optimal number of communication rounds and the rate–distortion
region for this number of rounds.
We again assume that node  sends the first index and the number of rounds q is even.
A (2nr1 , . . . , 2nr󰑞 , n) code is defined as for the lossless case. A rate–distortion quadruple
(R1 , R2 , D1 , D2 ) is said to be achievable if there exists a sequence of (2nr1 , . . . , 2nr󰑞 , n) codes
such that lim supn→∞ E(d j (X nj , X̂ nj )) ≤ D j and R j = ∑ l 󰑗 = j, j+2,...,q−2+ j rl 󰑗 , j = 1, 2. The q-
round rate–distortion region R (q) (D1 , D2 ) is the closure of the set of rate pairs (R1 , R2 )
such that (R1 , R2 , D1 , D2 ) is achievable in q rounds. The rate–distortion region R(D1 , D2 )
is the closure of the set of rate pairs (R1 , R2 ) such that (R1 , R2 , D1 , D2 ) is achievable for
some q.
The rate–distortion region R(D1 , D2 ) is not known in general. We first consider sim-
ple inner and outer bounds.
516 Compression over Graphical Networks

Simple inner bound. Consider the inner bound on the rate–distortion region for two-
way lossy source coding obtained by having each node perform an independent round
of Wyner–Ziv coding considering the source at the other node as side information. This
simple scheme, which requires only two rounds of one-way communication, yields the
inner bound consisting of rate pairs (R1 , R2 ) such that

R1 > I(X1 ; U1 | X2 ),
(.)
R2 > I(X2 ; U2 | X1 )

for some conditional pmf p(u1 |x1 )p(u2 |x2 ) and functions x̂1 (u1 , x2 ) and x̂2 (u2 , x1 ) that
satisfy the constraints E(d j (X j , X̂ j )) ≤ D j , j = 1, 2.
Simple outer bound. Even if each encoder (but not the decoders) has access to the source
observed by the other node, the rate pair (R1 , R2 ) must satisfy the inequalities

R1 ≥ I(X1 ; X̂ 1 | X2 ),
(.)
R2 ≥ I(X2 ; X̂ 2 | X1 )

for some conditional pmf p(x̂1 |x1 , x2 )p(x̂2 |x1 , x2 ) that satisfies E(d j (X j , X̂ j )) ≤ D j , j =
1, 2.
These bounds can be sometimes tight.
Example .. Let (X1 , X2 ) be a -WGN(P, ρ) source and assume squared error distor-
tion measures. The inner and outer bounds in (.) and (.) coincide, since in this
case the Wyner–Ziv rate is the same as the rate when the side information is available at
both the encoder and the decoder. Hence, the rate–distortion region R(D1 , D2 ) is the set
of rate pairs (R1 , R2 ) such that R1 ≥ R((1 − ρ2 )P/D1 ) and R2 ≥ R((1 − ρ2 )P/D2 ), and is
achieved by two independent rounds of Wyner–Ziv coding.

The inner and outer bounds in (.) and (.), however, are not tight in general, and
interactive communication is needed to achieve the rate–distortion region. To reduce the
rates, the two nodes interactively build additional correlation between their respective
knowledge of the sources instead of using two independent rounds of Wyner–Ziv coding.

Theorem .. The q-round rate–distortion region R (q) (D1 , D2 ) for two-way lossy
source coding is the set of rate pairs (R1 , R2 ) such that

R1 ≥ I(X1 ; U q | X2 ),
R2 ≥ I(X2 ; U q | X1 )
q
for some conditional pmf ∏ l=1 p(u l |u l−1 , x j󰑙 ) with |Ul | ≤ |X j󰑙 |⋅(∏ l−1
j=1 |U j |) + 1 and
functions x̂1 (u , x2 ) and x̂2 (u , x1 ) that satisfy the constraints E(d j (X j , X̂ j )) ≤ D j , j =
q q

1, 2, where j l = 1 if l is odd and j l = 2 if l is even.


20.3 Interactive Source Coding 517

Note that the rate–distortion region in this theorem is computable for any given num-
ber of rounds q. However, there is no bound on q that holds in general; hence this theo-
rem does not yield a computable characterization of the rate–distortion region R(D1 , D2 ).
The following example shows that interactive communication can achieve a strictly lower
sum-rate than one round of Wyner–Ziv coding.
Example .. Let (X1 , X2 ) be a DSBS(p), d2 ≡ 0 (that is, there is no distortion constraint
on reconstructing X2 ), and

󶀂
󶀒 0 if x̂1 = x1 ,
󶀒
̂
d1 (x1 , x1 ) = 󶀊1 if x̂1 = e,
󶀒
󶀒
󶀚∞ if x̂1 = 1 − x1 .

With one-way communication from node  to node , it can be easily shown that the
optimal Wyner–Ziv rate is R (1) = (1 − D1 )H(p).
Now consider the rate–distortion region in Theorem . for two rounds of commu-
nication, i.e., q = 2. We set the conditional pmf p(u1 |x2 ) as a BSC(α), the conditional pmf
p(u2 |x1 , u1 ) as in Table ., and the function x̂1 (u2 , u1 , x2 ) = u2 . Evaluating the sum-rate
for p = 0.03, α = 0.3, and β = 0.3, we obtain
(2)
Rsum = I(X2 ; U1 | X1 ) + I(U2 ; X1 | X2 , U1 ) = 0.0858,

and the corresponding distortion is E(d1 (X1 , X̂ 1 )) = 0.5184. By comparison, the one-way
rate for this distortion is R(1) = 0.0936. Hence, interactive communication can reduce the
rate.

u2
(x1 , u1 ) 0 e 1
(0, 0) 1−β β 0

(0, 1) 0 1 0

(1, 0) 0 1 0

(1, 1) 0 β 1−β

Table .. Test channel p(u2 |x1 , u1 ) for Example ..

Proof of Theorem .. Achievability is established by performing Wyner–Ziv coding


in each round. Fix q and a joint pmf p(uq |x1 , x2 ) and reconstruction functions of the
forms specified in the theorem. In odd rounds l ∈ [1 : q], node  sends the bin index of
the description U l of X1 given U l−1 to node  at rate r1l > I(X1 ; Ul |U l−1 , X2 ), and in even
rounds node  sends the index of the description Ul of X2 at rate r2l > I(X2 ; U l |U l−1 , X1 ).
Summing up the rates for each node establishes the required bounds on R1 = ∑ l odd r1l
518 Compression over Graphical Networks

and R2 = ∑ l even r2l . At the end of the q rounds, node  forms the estimate x̂2 (U q , X1 ) of
X2 and node  forms the estimate x̂1 (U q , X2 ) of X1 . The details follow the achievability
proof of the Wyner–Ziv theorem in Section ..
The proof of the converse involves careful identification of the auxiliary random vari-
ables. Consider
q−1
nR1 ≥ 󵠈 H(Ml )
l odd
≥ H(M1 , M3 , . . . , Mq−1 )
≥ I(X1n ; M1 , M3 , . . . , Mq−1 | X2n )
= H(X1n | X2n ) − H(X1n | X2n , M1 , M3 , . . . , Mq−1 )
(a)
= H(X1n | X2n ) − H(X1n | X2n , M q )
n
n
≥ 󵠈 󶀢H(X1i | X2i ) − H(X1i | X2i , X2,i+1 , X1i−1 , M q )󶀲
i=1
n
n
= 󵠈 I(X1i ; X2,i+1 , X1i−1 , M q | X2i )
i=1
n
(b)
= 󵠈 I(X1i ; U1i , . . . , Uqi | X2i ),
i=1

where (a) follows since (M2 , M4 , . . . , Mq ) is a function of (M1 , M3 , . . . , Mq−1 , X2n ) and
(b) follows by the auxiliary random variable identifications U1i = (M1 , X1i−1 , X2,i+1 n
) and
U li = M l for l ∈ [2 : q]. The bound on nR2 can be similarly established with the same
identifications of U li . Next, we establish the following properties of U q (i) = (U1i , . . . , Uqi ).

Lemma .. For every i ∈ [1 : n], the following Markov relationships hold:
. X2i → (X1i , U1i , . . . , Ul−1,i ) → Uli for l odd.
. X1i → (X2i , U1i , . . . , Ul−1,i ) → Uli for l even.
. X2i−1 → (X2i , U q (i)) → X1i .
n
. X1,i+1 → (X1i , U q (i)) → X2i .

The proof of this lemma is given in Appendix A. We also need the following simple
lemma.

Lemma .. Suppose Y → Z → W form a Markov chain and d(y, ŷ) is a distortion
measure. Then for every reconstruction function ŷ(z, 󰑤), there exists a reconstruction
function ŷ∗ (z) such that

E󶁡d(Y , ŷ∗ (Z))󶁱 ≤ E󶁡d(Y , ŷ(Z, W))󶁱.


Summary 519

To prove Lemma ., consider

E󶁡d(Y , ŷ(Z, W))󶁱 = 󵠈 p(y, z, 󰑤)d(y, ŷ(z, 󰑤))


y,z,󰑤

= 󵠈 p(y, z) 󵠈 p(󰑤|z)d(y, ŷ(z, 󰑤))


y,z 󰑤
(a)
≥ 󵠈 p(y, z) min d(y, ŷ(z, 󰑤(z)))
y,z 󰑤(z)

= E󶁡d(Y , ŷ∗ (Z))󶁱,

where (a) follows since the minimum is less than or equal to the average.
Continuing with the converse proof of Theorem ., we introduce a time-sharing
random variable T and set U1 = (T , U1T ) and Ul = U lT for l ∈ [2 : q]. Noting that
(X1T , X2T ) has the same pmf as (X1 , X2 ), we have
n
1
R1 ≥ 󵠈 I(X1i ; U q (i)| X2i ) = I(X1T ; U q (T)| X2T , T ) = I(X1 ; U q | X2 )
n i=1
and
R2 ≥ I(X2 ; U q | X1 ).

Using the first two statements in Lemma . and the identification of U q , it can be eas-
q
ily verified that p(uq |x1 , x2 ) = ∏l=1 p(u l |u l−1 , x j󰑙 ) as desired. Finally, to check that the
distortion constraints are satisfied, note from Lemma . and the third statement of
Lemma . that for every i ∈ [1 : n], there exists a reconstruction function x̂1∗ such that

E󶁡d(X1i , x̂1∗ (i, U q (i), X2i ))󶁱 ≤ E󶁡d(X1i , x̂1i (M q , X2n ))󶁱.

Hence n
󵠈 E󶁡d(X1i , x̂1∗ (i, U q (i), X2i )) 󵄨󵄨󵄨󵄨 T = i󶁱
1
E󶁡d(X1 , x̂1∗ (U q , X2 ))󶁱 =
n i=1
n
1
≤ 󵠈 E󶁡d(X1i , x̂1i (M q , X2n ))󶁱.
n i=1

Letting n → ∞ shows that E[d(X1 , x̂1∗ (U q , X2 ))] ≤ D1 . The other distortion constraint
can be verified similarly. This completes the proof of Theorem ..

SUMMARY

∙ Cutset bounds for source coding networks


∙ Distributed lossless source–network coding: Separate Slepian–Wolf coding and net-
work coding suffices for multicasting correlated sources over graphical networks
∙ Multiple description network: Rate–distortion regions for tree and triangular net-
works
520 Compression over Graphical Networks

∙ CFO problem:
∙ Noninteractive rounds of Slepian–Wolf coding achieve the cutset bound
∙ Interaction does not reduce the rates in lossless source coding over noiseless public
broadcast channels
∙ Two-way lossy source coding:
∙ q-round rate–distortion region
∙ Identification of auxiliary random variables satisfying Markovity in the proof of
the converse
∙ Interaction can reduce the rates in lossy source coding over noiseless bidirectional
links

BIBLIOGRAPHIC NOTES

The optimal rate region for distributed lossless source-network coding in Theorem .
was established by Effros, Médard, Ho, Ray, Karger, Koetter, and Hassibi (). The loss-
less source coding problem over a noiseless two-way relay channel in Example . is a
special case of the three-source setup investigated by Wyner, Wolf, and Willems ().
The cascade multiple description network was formulated by Yamamoto (), who es-
tablished the rate–distortion region. The rate–distortion region for the triangular network
in Theorem . was also established by Yamamoto ().
El Gamal and Orlitsky () showed that R1 ≥ H(X1 ), R2 ≥ H(X2 |X1 ) is sometimes
necessary for interactive error-free source coding. The CFO problem was formulated by
Csiszár and Narayan (), who established the optimal rate region. The q-round rate–
distortion region for two-way lossy source coding was established by Kaspi (). Exam-
ple ., which shows that two rounds of communication can strictly outperform a single
round, is due to Ma and Ishwar ().

PROBLEMS

.. Prove Theorem ..


.. Provide the details of the proof of Theorem ..
.. Show that the rate–distortion region of the generalized triangular multiple de-
scription network in Remark . is given by (.).
.. Prove Theorem ..
.. Provide the details of the proof of the rate–distortion region for two-way quadratic
Gaussian source coding in Example ..
Problems 521

.. Dual-cascade multiple description network. Consider the multiple description net-
work depicted in Figure .. Find the rate–distortion region R(D2 , D3 , D4 , D5 ).
Remark: This result can be easily extended to networks with multiple cascades.

X̂ 2

2 R23 3
X̂ 3

R12

1
X

R14
R45
X̂ 5
4 5

X̂ 4

Figure .. Dual-cascade multiple description network.

.. Branching multiple description network. Consider the -node branching multiple
description network depicted in Figure .. Show that a rate triple (R12 , R23 , R24 )
is achievable for distortion triple (D2 , D3 , D4 ) if

R12 > I(X; X̂ 2 , X̂ 3 , X̂ 4 |U) + 2I(X; U ) + I( X̂ 1 ; X̂ 2 |U ),


R23 > I(X; X̂ 3 , U),
R24 > I(X; X̂ 4 , U),
R23 + R24 > I(X; X̂ 3 , X̂ 4 |U) + 2I(X; U ) + I( X̂ 1 ; X̂ 2 |U )

for some conditional pmf p(u, x̂2 , x̂3 , x̂4 |x) such that E(d j (X, X̂ j )) ≤ D j , j = 2, 3, 4.
.. Diamond multiple description network. Consider the -node diamond multiple
network depicted in Figure .. Show that a rate quadruple (R12 , R13 , R24 , R34 ) is
achievable for distortion triple (D2 , D3 , D4 ) if

R12 > I(X; X̂ 2 , U2 , U4 ),


R13 > I(X; X̂ 3 , U3 , U4 ),
R24 > I(X; U2 , U4 ),
R34 > I(X; U3 , U4 ),
R24 + R34 > I(X; X̂ 4 , U2 , U3 |U4 ) + 2I(U4 ; X) + I(U2 ; U3 |U4 )
522 Compression over Graphical Networks

3
X̂ 3
R23

1 R12 2
X X̂ 2

R24

X̂ 4
4

Figure .. Branching multiple description network.

2
X̂ 2
R12 R24

1 4
X X̂ 4

R13 R34

X̂ 3
3

Figure .. Diamond multiple description network.

for some conditional pmf p(u2 , u3 , u4 , x̂2 , x̂3 , x̂4 |x) such that E(d j (X, X̂ j )) ≤ D j ,
j = 2, 3, 4.
.. Cascade multiple description network with side information. The cascade multiple
description network can be extended to various side information scenarios. Here
we consider two cases. In the first case, the rate–distortion region is known. In the
second case, the rate–distortion region is known only for the quadratic Gaussian
case. In the following assume the -DMS (X, Y) and distortion measures d2 and
d3 . The definitions of a code, achievability and rate–distortion region for each
scenario are straightforward extensions of the case with no side information.
(a) Consider the case where the side information Y is available at both node 
and node  as depicted in Figure .. Show that the rate–distortion region
R(D2 , D3 ) is the set of rate pairs (R12 , R23 ) such that

R12 ≥ I(X; X̂ 2 , X̂ 3 |Y),


R23 ≥ I(X, Y ; X̂ 3 )
Problems 523

for some conditional pmf p(x̂2 , x̂3 |x, y) that satisfies E(d j (X, X̂ j )) ≤ D j , j =
2, 3.

X̂ 2

1 R12 2 R23 3
X X̂ 3

Y Y
Figure .. Cascade multiple description network with side information at node 
and node .

(b) Consider the case where the side information Y is available only at node 
as depicted in Figure .. Establish the inner bound on the rate–distortion
region for distortion pair (D2 , D3 ) that consists of the set of rate pairs (R12 , R23 )
such that
R23 > I(X; U , V |Y),
R12 > I(X; X̂ 2 , U ) + I(X; V |U , Y)
for some conditional pmf p(u, 󰑣|x) and function x̂3 (u, 󰑣, y) that satisfy the con-
straints E(d j (X, X̂ j )) ≤ D j , j = 2, 3.

X̂ 2

1 R12 2 R23 3
X X̂ 3

Y
Figure .. Cascade multiple description network with side information at node .

Remark: The first part is due to Permuter and Weissman () and the second
part is due to Vasudevan, Tian, and Diggavi ().
.. Quadratic Gaussian extension. Consider the cascade multiple description net-
work with side information depicted in Figure .. Let (X, Y) be a -WGN with
X ∼ N(0, P) and Y = X + Z, where Z ∼ N(0, N) is independent of X, and let d2
and d3 be squared error distortion measures. Show that the rate–distortion region
R(D2 , D3 ) is the set of rate pairs (R12 , R23 ) such that
P
R12 ≥ R 󶀥 󶀵,
D2
PX|Y
R23 ≥ R󶀥 󶀵
D3
524 Compression over Graphical Networks

for D3 ≥ D2∗ , and

PN
R12 ≥ R 󶀥 󶀵,
D3 (D2 + N)
PX|Y
R23 ≥ R󶀥 󶀵
D3

for D3 < D2∗ , where D2∗ = D2 N/(D2 + N) and PX|Y = PN/(P + N).
(Hint: Evaluate the inner bound in Problem . for this case. For D3 ≥ D2∗ ,
set X = X̂ 2 + Z2 , where X̂ 2 ∼ N(0, P − D2 ) and Z2 ∼ N(0, D2 ). Next, let X̂ 2 = U +
Z3 , where Z3 ∼ N(0, D3 N/(N − D3 ) − D2 ), U ∼ N(0, PU ), and X̂ 3 = E(X|U , Y ).
For D3 < D2∗ , set U = X + Z2 and V = X + Z3 , where Z2 ∼ N(0, P2 ) and Z3 ∼
N(0, P3 ) are independent, and show that P2 and P3 can be chosen such that 1/P +
1/P2 = 1/D2 and 1/P + 1/P2 + 1/P3 + 1/N = 1/D3 . To prove the inequality on
R12 , assume that node  receives the message M12 in addition to M23 . Argue that
the minimum R12 for this modified setup is equal to the minimum rate (at the
same distortions) for lossy source coding when side information may be absent
discussed in Section ..)
.. Triangular multiple description network with side information. Consider the -node
multiple description network with side information at nodes  and  as depicted in
Figure . for a -DMS (X, Y) and distortion measures d2 and d3 . Show that the
rate–distortion region R(D2 , D3 ) is the set of rate triple (R12 , R23 , R13 ) such that

R12 ≥ I(X; X̂ 2 , U |Y),


R23 ≥ I(X, Y ; U),
R13 ≥ I(X; X̂ 3 |U )

for some conditional pmf p(x̂2 , u|x, y)p(x̂3 |x, u) that satisfies E(d j (X, X̂ j )) ≤ D j ,
j = 2, 3.

2
Y X̂ 2
R12 R23

1 R13 3
X X̂ 3

Figure .. Triangular multiple description network with side information.


Appendix 20A Proof of Lemma 20.1 525

.. Conditional CFO problem. Let (X1 , . . . , XN , Y ) be an (N + 1)-DMS. Consider a


variation on the CFO problem in Section .., where node j ∈ [1 : N] observes
the common side information Y as well as its own source X j . Find the optimal rate
region R ∗ (A).
.. Lossy source coding over a noiseless two-way relay channel. Consider the lossy ver-
sion of the source coding problem over a noiseless two-way relay channel in Ex-
ample .. Let d1 and d2 be two distortion measures. Suppose that source node 
wishes to reconstruct the source X2 with distortion D2 and source node  wishes
to reconstruct X1 with distortion D1 .
(a) Show that if the rate–distortion quintuple (R1 , R2 , R3 , D1 , D2 ) is achievable,
then it must satisfy the inequalities

R1 ≥ I(X1 ; U1 | X2 ),
R2 ≥ 0,
R3 ≥ I(X2 ; U2 | X1 , U1 )

for some conditional pmf p(u1 |x1 )p(u2 |x2 , u1 ) and functions x̂1 (u1 , u2 , x2 ) and
x̂2 (u1 , u2 , x1 ) such that E(d j (X j , X̂ j )) ≤ D j , j = 1, 2. (Hint: Consider the two-
round communication between node  and supernode {2, 3}.)
(b) Show that the rate–distortion quintuple (R1 , R2 , R3 , D1 , D2 ) is achievable if

R1 > RSI-D2 (D1 ),


R1 > RSI-D1 (D2 ),
R3 > max{RSI-D2 (D1 ), RSI-D1 (D2 )},

where

RSI-D2 (D1 ) = min I(X1 ; U | X2 ),


̂ 1 ))≤D1
p(u|x1 ), x̂1 (u,x2 ):E(d1 (X1 , X

RSI-D1 (D2 ) = min I(X2 ; U | X1 ).


̂ 2 ))≤D2
p(u|x2 ), x̂2 (u,x1 ):E(d2 (X2 , X

(c) Find the rate–distortion region when the sources are independent.
(d) Find the rate–distortion region when (X1 , X2 ) is a -WGN(P,ρ) source and d1
and d2 are squared error distortion measures.
Remark: This problem was studied by Su and El Gamal ().

APPENDIX 20A PROOF OF LEMMA 20.1

We first prove the following.


526 Compression over Graphical Networks

Lemma .. Let (Y1 , Y2 , Z1 , Z2 ) be a quadruple of random variables with joint pmf
p(y1 , y2 , z1 , z2 ) = p(y1 , z1 )p(y2 , z2 ) and M l is a function of (Y1 , Y2 , M l−1 ) for l odd and
(Z1 , Z2 , M l−1 ) for l even. Then

I(Y2 ; Z1 |M q , Y1 , Z2 ) = 0,
I(Z1 ; Ml |M l−1 , Y1 , Z2 ) = 0 for l odd,
I(Y2 ; Ml |M l−1 , Y1 , Z2 ) = 0 for l even.

To prove this lemma, consider

I(Y2 ; Z1 |M q , Y1 , Z2 ) = H(Y2 |M q , Y1 , Z2 ) − H(Y2 |M q , Y1 , Z2 , Z1 )


= H(Y2 |M q−1 , Y1 , Z2 ) − I(Y2 ; Mq |M q−1 , Y1 , Z2 )
− H(Y2 |M q , Y1 , Z2 , Z1 )
≤ I(Y2 ; Z1 |M q−1 , Y1 , Z2 )
= H(Z1 |M q−1 , Y1 , Z2 ) − H(Z1 |M q−1 , Y1 , Y2 , Z2 )
= H(Z1 |M q−2 , Y1 , Z2 ) − I(Z1 ; Mq−1 |M q−2 , Y1 , Z2 )
− H(Z1 |M q−1 , Y1 , Y2 , Z2 )
≤ I(Y2 ; Z1 |M q−2 , Y1 , Z2 ).

Continuing this process gives I(Y2 ; Z1 |Y1 , Z2 ) = 0. Hence, all the inequalities hold with
equality, which completes the proof of Lemma ..
We are now ready to prove Lemma .. First consider the first statement X2i →
(X1i , U l−1 (i)) → Ul (i) for l > 1. Since Uli = M l ,

I(Ul (i); X2i | X1i , Uil−1 ) ≤ I(M l ; X2i , X2i−1 | X1i , M l−1 , X1i−1 , X2,i+1
n
).

Setting Y1 = (X1i−1 , X1i ), Y2 = X1,i+1


n
, Z1 = (X2i−1 , X2i ), and Z2 = X2,i+1
n
in Lemma .
i−1 l−1 i−1 n
yields I(M l ; X2i , X2 | X1,i , M , X1 , X2,i+1 ) = 0 as desired. For l = 1, note that

I(X2i ; M1 , X1i−1 , X2,i+1


n
| X1i ) ≤ I(X2i ; M1 , X1i−1 , X2,i+1
n n
, X1,i+1 | X1i )
= I(X2i ; X1i−1 , X2,i+1
n n
, X1,i+1 | X1i )
= 0.

The second statement can be similarly proved. To prove the third statement, note that

I(X1i ; X2i−1 |M q , X2i , X2,i+1


n
, X1i−1 ) ≤ I(X1i , X1,i+1
n
; X2i−1 |M q , X2i , X2,i+1
n
, X1i−1 ).

Setting Y1 = X1i−1 , Y2 = (X1,i+1


n
, X1i ), Z1 = X2i−1 , and Z2 = (X2,i+1
n
, X2i ) in Lemma .
yields I(X1i , X1 ; X2 |M , X2i , X2,i+1 , X1 ) = 0 as desired. The last statement can be
i−1 i−1 q n i−1

similarly proved.
PART IV

EXTENSIONS
CHAPTER 21

Communication for Computing

In the first three parts of the book we investigated the limits on information flow in net-
works whose task is to communicate (or store) distributed information. In many real-
world distributed systems, such as multiprocessors, peer-to-peer networks, networked
mobile agents, and sensor networks, the task of the network is to compute a function,
make a decision, or coordinate an action based on distributed information. Can the com-
munication rate needed to perform such a task at some node be reduced relative to com-
municating all the sources to this node?
This question has been formulated and studied in computer science under communi-
cation complexity and gossip algorithms, in control and optimization under distributed
consensus, and in information theory under coding for computing and the μ-sum prob-
lem, among other topics. In this chapter, we study information theoretic models for dis-
tributed computing over networks. In some cases, we find that the total communication
rate can be significantly reduced when the task of the network is to compute a function of
the sources rather than to communicate the sources themselves, while in other cases, no
such reduction is possible.
We first show that the Wyner–Ziv theorem in Chapter  extends naturally to the case
when the decoder wishes to compute a function of the source and the side information.
We provide a refined characterization of the lossless special case of this result in terms of
conditional graph entropy. We then discuss distributed coding for computing. Although
the rate–distortion region for this case is not known in general (even when the goal is to
reconstruct the sources themselves), we show through examples that the total communi-
cation rate needed for computing can be significantly lower than for communicating the
sources themselves. The first example we discuss is the μ-sum problem, where the de-
coder wishes to reconstruct a weighted sum of two separately encoded Gaussian sources
with a prescribed quadratic distortion. We establish the rate–distortion region for this
setting by reducing the problem to the CEO problem discussed in Chapter . The second
example is lossless computing of the modulo- sum of a DSBS. Surprisingly, we find that
using the same linear code at both encoders can outperform Slepian–Wolf coding.
Next, we consider coding for computing over multihop networks.
∙ We extend the result on two-way lossy source coding in Chapter  to computing a
function of the sources and show that interaction can help reduce the total commu-
nication rate even for lossless computing.
530 Communication for Computing

∙ We establish the optimal rate region for cascade lossless coding for computing and
inner and outer bounds on the rate–distortion region for the lossy case.
∙ We present an information theoretic formulation of the distributed averaging problem
and establish upper and lower bounds on the network rate–distortion function that
differ by less than a factor of  for some large networks.
Finally, we show through an example that even when the sources are independent,
source–channel separation may fail when the goal is to compute a function of several
sources over a noisy multiuser channel.

21.1 CODING FOR COMPUTING WITH SIDE INFORMATION

Consider the communication system for computing with side information depicted in
Figure .. Let (X, Y) be a -DMS and d(z, ẑ) be a distortion measure. Suppose that the
receiver wishes to reconstruct a function Z = д(X, Y ) of the -DMS with distortion D.
What is the rate–distortion function R д (D), that is, the minimum rate needed to achieve
distortion D for computing the function д(X, Y)?

Xn M ( Ẑ n , D)
Encoder Decoder

Yn

Figure .. Coding for computing with side information at the decoder.

We define a (2nR , n) code, achievability, and the rate–distortion function R д (D) for
this setup as in Chapter . Following the proof of the Wyner–Ziv theorem, we can readily
determine the rate–distortion function.

Theorem .. The rate–distortion function for computing Z = д(X, Y) with side in-
formation Y is
R д (D) = min I(X; U |Y),

where the minimum is over all conditional pmfs p(u|x) with |U | ≤ |X | + 1 and func-
̂ ≤ D.
tions ẑ(u, y) such that E(d(Z, Z))

To illustrate this result, consider the following.


Example .. Let (X, Y) be a -WGN(P, ρ) source and d be a squared error distortion
measure. If the function to be computed is д(X, Y) = (X + Y )/2, that is, the average of
the source and the side information, then R д (D) = R((1 − ρ2 )P/4D), and is achieved by
Wyner–Ziv coding of the source X/2 with side information Y .
21.1 Coding for Computing with Side Information 531

21.1.1 Lossless Coding for Computing


Now suppose that the receiver wishes to recover the function Z = д(X, Y) losslessly. What
is the optimal rate R∗д ? Clearly

H(Z |Y ) ≤ R∗д ≤ H(X |Y ). (.)

These bounds sometimes coincide, for example, if Z = X, or if (X, Y ) is a DSBS(p) and


д(X, Y ) = X ⊕ Y. The bounds do not coincide in general, however, as illustrated in the
following.
Example .. Let X = (V1 , V2 , . . . , V10 ), where the V j , j ∈ [1 : 10], are i.i.d. Bern(1/2),
and Y ∼ Unif[1 : 10]. Suppose д(X, Y) = VY . The lower bound gives H(VY |Y) = 1 bit and
the upper bound gives H(X|Y ) = 10 bits. It can be shown that R∗д = 10 bits, that is, the
decoder must be able to recover X losslessly.

Optimal rate. As we showed in Sections . and ., lossless source coding can be viewed
as a special case of lossy source coding. Similarly, the optimal rate for lossless comput-
ing with side information R∗д can be obtained from the rate–distortion function in Theo-
rem . by assuming a Hamming distortion measure and setting D = 0. This yields

R∗д = R д (0) = min I(X; U |Y),

where the minimum is over all conditional pmfs p(u|x) such that H(Z|U , Y) = 0.
The characterization of the optimal rate R∗д can be further refined. We first introduce
some needed definitions. Let G = (N , E) be an undirected simple graph (i.e., a graph with
no self-loops). A set of nodes in N is said to be independent if no two nodes are connected
to each other through an edge. An independent set is said to be maximally independent if it
is not a subset of any other independent set. Define Γ(G) to be the collection of maximally
independent sets of G. For example, the collection of maximally independent sets for the
graph in Figure . is Γ(G) = {{1, 3, 5}, {1, 3, 6}, {1, 4, 6}, {2, 5}, {2, 6}}.

1 2 4 5 6

Figure .. Example graph.

Graph entropy. Let X be a random variable over the nodes of the graph G. Define a
random variable W ∈ Γ(G) with conditional pmf p(󰑤|x) such that p(󰑤|x) = 0 if x ∉ 󰑤,
and hence for every x, ∑󰑤: x∈󰑤 p(󰑤|x) = 1. The graph entropy of X is defined as

HG (X) = min I(X; W).


p(󰑤|x)
532 Communication for Computing

To better understand this definition, consider the following.

Example 21.3. Let G be a graph with no edges, i.e., E = . Then, the graph entropy
HG (X) = 0, since Γ(G) = {N } has a single element.
Example 21.4. Let G be a complete graph. Then, the graph entropy is HG (X) = H(X)
and is attained by W = {X}.
Example 21.5. Let X be uniformly distributed over N = {1, 2, 3} and assume that the
graph G has only a single edge (1, 3). Then, Γ(G) = {{1, 2}, {2, 3}}. By convexity, I(X; W)
is minimized when p({1, 2}|2) = p({2, 3}|2) = 1/2. Thus
1 2
HG (X) = H(W) − H(W | X) = 1 − = .
3 3

Conditional graph entropy. Now let (X, Y) be a pair of random variables and let the
nodes of the graph G be the support set of X. The conditional graph entropy of X given
Y is defined as
HG (X |Y) = min I(X; W |Y),

where the minimum is over all conditional pmfs p(󰑤|x) such that W ∈ Γ(G) and p(󰑤|x) =
0 if x ∉ 󰑤. To explain this definition, consider the following.

Example 21.6. Let E = . Then HG (X|Y) = 0.


Example 21.7. Let G be a complete graph. Then HG (X|Y) = H(X|Y ).
Example 21.8. Let (X, Y) be uniformly distributed over {(x, y) ∈ {1, 2, 3}2 : x ̸= y} and
assume that G has a single edge (1, 3). By convexity, I(X; W|Y) is minimized by setting
p({1, 2}|2) = p({2, 3}|2) = 1/2. Thus
1 2 1 1 2 1
HG (X |Y ) = H(W |Y ) − H(W | X, Y) = + H 󶀣 󶀳 − = H 󶀣 󶀳.
3 3 4 3 3 4

Refined characterization. Finally, define the characteristic graph G of the (X, Y , д) triple
such that the set of nodes is the support of X and there is an edge between two distinct
nodes x and x 󳰀 iff there exists a symbol y such that p(x, y), p(x 󳰀 , y) > 0 and д(x, y) ̸=
д(x 󳰀 , y). Then, we can obtain the following refined expression for the optimal rate.

Theorem .. The optimal rate for losslessly computing the function д(X, Y) with
side information Y is
R∗д = HG (X |Y ).

To illustrate this result, consider the following.


Example . (Online card game). Recall the online card game discussed in Chapter ,
where Alice and Bob each select one card without replacement from a virtual hat with
three cards labeled ,,. The one with the larger number wins. Let the -DMS (X, Y )
21.2 Distributed Coding for Computing 533

represent the numbers on Alice and Bob’s cards. Bob wishes to find who won the game,
i.e., д(X, Y ) = max{X, Y}. Since д(x, y) ̸= д(x 󳰀 , y) iff (x, y, x 󳰀 ) = (1, 2, 3) or (3, 2, 1), the
characteristic graph G has one edge (1, 3). Hence the triple (X, Y , G) is the same as in
Example . and the optimal rate is R∗д = (2/3)H(1/4).

Proof of Theorem .. To prove the theorem, we need to show that

min I(U ; X |Y ) = min I(W ; X |Y),


p(u|x) p(󰑤|x)

where the first minimum is over all conditional pmfs p(u|x) such that H(Z|U , Y) = 0 and
the second minimum is over all conditional pmfs p(󰑤|x) such that W ∈ Γ(G) and X ∈ W.
Consider p(󰑤|x) that achieves the second minimum. Since 󰑤 is a maximally independent
set with respect to the characteristic graph of (X, Y , д), for each y ∈ Y, д(x, y) is constant
for all x ∈ 󰑤 with p(x, y) > 0. Hence, д(x, y) can be uniquely determined from (󰑤, y)
whenever p(󰑤, x, y) > 0, or equivalently, H(д(X, Y)|W , Y ) = 0. By identifying U = W,
it follows that min p(u|x) I(U ; X|Y) ≤ min p(󰑤|x) I(W ; X|Y ).
To establish the other direction of the inequality, let p(u|x) be the conditional pmf
that attains the first minimum and define W = 󰑤(U ), where 󰑤(u) = {x : p(u, x) > 0}. If
p(󰑤, x) > 0, then there exists a u such that 󰑤(u) = 󰑤 and p(u, x) > 0, which implies that
x ∈ 󰑤. Thus, p(󰑤|x) = 0 if x ∉ 󰑤. Now suppose that x ∈ 󰑤 and p(󰑤) > 0. Then there
exists a u such that 󰑤(u) = 󰑤 and p(u, x) > 0. Hence by the Markovity of U → X → Y ,
p(x, y) > 0 implies that p(u, x, y) > 0. But since H(д(X, Y )|U , Y ) = 0, the pair (u, y)
must determine д(x, y) uniquely. Therefore, if x, x 󳰀 ∈ 󰑤 and p(x, y), p(x 󳰀 , y) > 0, then
д(x, y) = д(x 󳰀 , y), which implies that W is maximally independent. Finally, since W is a
function of U , I(U ; X|Y ) ≥ I(W ; X|Y) ≥ min p(󰑤|x) I(W ; X|Y ). This completes the proof
of Theorem ..

21.2 DISTRIBUTED CODING FOR COMPUTING

Consider the distributed communication system for computing depicted in Figure .
for a -DMS (X1 , X2 ) and distortion measure d(z, ẑ). Suppose that the decoder wishes
to compute a function Z = д(X1 , X2 ) of the -DMS with distortion D. The goal is to find
the rate–distortion region R д (D) for computing д.

X1n M1
Encoder 
(Ẑ n , D)
Decoder
X2n M2
Encoder 

Figure .. Distributed coding for computing.


534 Communication for Computing

A (2nR1 , 2nR2 , n) code, achievability, and the rate–distortion region R д (D) can be de-
fined as in Chapter . The rate–distortion region for this problem is not known in general.
In the following we discuss special cases.

21.2.1 μ-Sum Problem


Let (Y1 , Y2 ) be a -WGN(, ρ) source with ρ ∈ (0, 1) and μ = [μ1 μ2 ]T . The sources are
separately encoded with the goal of computing Z = μT Y at the decoder with a prescribed
mean squared error distortion D. The μ-sum rate–distortion region R μ (D) is character-
ized in the following.

Theorem .. Let μ1 μ2 = δ 2 s1 s2 /ρ > 0, where s1 = ρ(ρ + μ1 /μ2 )/(1 − ρ2 ), s2 = ρ(ρ +


μ2 /μ1 )/(1 − ρ2 ), and δ = 1/(1 + s1 + s2 ). The μ-sum rate–distortion region R μ (D) is
the set of rate pairs (R1 , R2 ) such that
−1
󶀡1 + s2 (1 − 2−2r2 )󶀱
R1 ≥ r 1 + R 󶀧 󶀷,
D+δ
−1
󶀡1 + s1 (1 − 2−2r1 )󶀱
R2 ≥ r 2 + R 󶀧 󶀷,
D+δ
1
R1 + R2 ≥ r1 + r2 + R 󶀤 󶀴
D+δ

for some r1 , r2 ≥ 0 that satisfy the condition D + δ ≥ (1 + s1 (1 − 2−2r1 ) + s2 (1 − 2−2r2 ))−1 .

Note that if (R1 , R2 ) is achievable with distortion D for μ, then for any b > 0, (R1 , R2 )
is also achievable with distortion b 2 D for bμ since
̃ 2 󶁱 ≤ b2 D
E󶁡(bμ1Y1 + bμ2Y2 − Z)
̂ 2 ] ≤ D, where Z̃ = b Ẑ is the estimate of bZ. Thus, this theorem
if E[(μ1Y1 + μ2 Y2 − Z)
characterizes the rate–distortion region for any μ with μ1 μ2 > 0.
Proof of Theorem .. We show that the μ-sum problem is equivalent to the quadratic
Gaussian CEO problem discussed in Section . and use this equivalence to find R μ (D).
Since Y1 and Y2 are jointly Gaussian, they can be expressed as

Y1 = a1 X + Z1 ,
Y2 = a2 X + Z2

for some random variables X ∼ N(0, 1), Z1 ∼ N(0, 1 − a12 ), and Z2 ∼ N(0, 1 − a22 ), inde-
pendent of each other, such that a1 , a2 ∈ (0, 1) and a1 a2 = ρ. Consider the MMSE estimate
of X given (Y1 , Y2 )
a − ρa2 a2 − ρa1
X̃ = E(X |Y1 , Y2 ) = 󶁡a1 a2 󶁱 KY−1 Y = 󶁥 1 󶁵 Y.
1 − ρ2 1 − ρ2
21.2 Distributed Coding for Computing 535

We now choose a j , j = 1, 2, such that X̃ = μT Y, i.e.,

a1 − ρa2
= μ1 ,
1 − ρ2
a2 − ρa1
= μ2 .
1 − ρ2

Solving for a1 , a2 , we obtain

ρ + μ1 /μ2 s1
a21 = ρ = ,
1 + ρμ1 /μ2 1 + s1
ρ + μ2 /μ1 s2
a22 = ρ = .
1 + ρμ2 /μ1 1 + s2

It can be readily checked that the constraint ρ = a1 a2 is equivalent to the normalization

ρ
μ1 μ2 =
(ρ + μ1 /μ2 )(ρ + μ2 /μ1 )

and that the corresponding mean squared error is

a1 − ρa2 a2 − ρa1 a21 + a22 − 2ρ2


̃ 2󶀱 = 1 − 󶀥
E󶀡(X − X) a + a 󶀵 = 1 − = δ.
1 − ρ2 1 1 − ρ2 2 1 − ρ2

Now let Z̃ = X − X,
̃ then for every U such that X → (Y1 , Y2 ) → U form a Markov chain,

E󶁡(X − E(X |U))2 󶁱 = E󶁡(μT Y + Z̃ − E(μT Y + Z̃ |U))2 󶁱


= δ + E󶁡(μT Y − E(μT Y|U))2 󶁱.

Therefore, every code that achieves distortion D in the μ-sum problem can be used to
achieve distortion D + δ in the CEO problem and vice versa.
The observations for the corresponding CEO problem are Y j /a j = X + Z̃ j , j = 1, 2,
where Z̃ j = Z j /a j ∼ N(0, 1/s j ) since s j = a2j /(1 − a2j ). Hence by Theorem ., the rate–
distortion region for the CEO problem with distortion D + δ is the set of rate pairs (R1 , R2 )
satisfying the inequalities in the theorem. Finally, by equivalence, this region is also the
rate–distortion region for the μ-sum problem, which completes the proof of Theorem ..
Remark .. When μ1 μ2 = 0, the μ-sum problem is equivalent to the quadratic Gaussian
distributed source coding discussed in Section . with D1 ≥ 1 or D2 ≥ 1, and the rate–
distortion region is given by Theorem . with proper normalization. Thus, for μ1 μ2 ≥ 0,
the Berger–Tung inner bound is tight. When μ1 μ2 < 0, however, it can be shown that the
Berger–Tung inner bound is not tight in general.
536 Communication for Computing

21.2.2 Distributed Lossless Computing


Consider the lossless special case of distributed coding for computing. As for the general
lossy case, the optimal rate region is not known in general. Consider the following simple
inner and outer bounds on R ∗д .
Inner bound. The Slepian–Wolf region consisting of all rate pairs (R1 , R2 ) such that

R1 ≥ H(X1 | X2 ),
R2 ≥ H(X2 | X1 ), (.)
R1 + R2 ≥ H(X1 , X2 )

constitutes an inner bound on the optimal rate region for any function д(X1 , X2 ).
Outer bound. Even when the receiver knows X1 , R2 ≥ H(Z|X1 ) is still necessary. Sim-
ilarly, we must have R1 ≥ H(Z|X2 ). These inequalities constitute an outer bound on the
optimal rate region for any function Z = д(X1 , X2 ) that consists of all rate pairs (R1 , R2 )
such that

R1 ≥ H(Z | X2 ),
(.)
R2 ≥ H(Z | X1 ).

The above simple inner bound is sometimes tight, e.g., when Z = (X1 , X2 ). The fol-
lowing is an interesting example in which the outer bound is tight.
Example . (Distributed computing of the modulo- sum of a DSBS). Let (X1 , X2 )
be a DSBS(p), that is, X1 ∼ Bern(1/2) and Z ∼ Bern(p) are independent and X2 = X1 ⊕ Z.
The decoder wishes to losslessly compute the modulo- sum of the two sources Z = X1 ⊕
X2 . The inner bound in (.) on the optimal rate region reduces to R1 ≥ H(p), R2 ≥ H(p),
R1 + R2 ≥ 1 + H(p), while the outer bound in (.) reduces to R1 ≥ H(p), R2 ≥ H(p).
We now show that the outer bound can be achieved using random linear codes! Ran-
domly generate k × n binary parity-check matrix H with i.i.d. Bern(1/2) entries. Suppose
that encoder  sends the binary k-vector H X1n and encoder  sends the binary k-vector
H X2n . Then the receiver adds the two binary k-vectors to obtain H X1n ⊕ H X2n = H Z n .
Following similar steps to the random linear binning achievability proof of the lossless
source coding theorem in Remark ., it can be readily shown that the probability of
decoding error averaged over the random encoding matrix H tends to zero as n → ∞ if
k/n > H(p) + δ(є).

Remark .. The outer bound in (.) can be tightened by using Theorem . twice,
once when the receiver knows X1 and a second time when it knows X2 . Defining the
characteristic graphs G1 for (X1 , X2 , д) and G2 for (X2 , X1 , д) as before, we obtain the
tighter outer bound consisting of all rate pairs (R1 , R2 ) such that

R1 ≥ HG1 (X1 | X2 ),
R2 ≥ HG2 (X2 | X1 ).
21.3 Interactive Coding for Computing 537

21.3 INTERACTIVE CODING FOR COMPUTING

Consider the two-way communication system for computing depicted in Figure . for
a -DMS (X1 , X2 ) and two distortion measures d1 and d2 . Node  wishes to reconstruct
a function Z1 = д1 (X1 , X2 ) and node  wishes to reconstruct a function Z2 = д2 (X1 , X2 )
with distortions D1 and D2 , respectively. The goal is to find the rate–distortion region
R(D1 , D2 ) defined as in Section ...

X1n M l (X1n , M l−1 ) X2n

Ẑ1n Node  M l+1 (X2n , M l ) Node  Ẑ2n

Figure .. Two-way coding for computing.

We have the following simple bounds on the rate–distortion region.


Inner bound. Using two independent rounds of Wyner–Ziv coding, we obtain the inner
bound consisting of all rate pairs (R1 , R2 ) such that
R1 > I(X1 ; U1 | X2 ),
(.)
R2 > I(X2 ; U2 | X1 )
for some conditional pmf p(u1 |x1 )p(u2 |x2 ) and functions ẑ1 (u1 , x2 ) and ẑ2 (u2 , x1 ) that
satisfy the constraints E(d j (Z j , Ẑ j )) ≤ D j , j = 1, 2.
Outer bound. Even if each encoder knows the source at the other node, every achievable
rate pair must satisfy the inequalities
R1 ≥ I(Z2 ; Ẑ2 | X2 ),
(.)
R2 ≥ I(Z1 ; Ẑ1 | X1 )

for some conditional pmf p(̂z1 |x1 , x2 )p(̂z2 |x1 , x2 ) such that E(d j (Z j , Ẑ j )) ≤ D j , j = 1, 2.
These bounds are sometimes tight.
Example .. Let (X1 , X2 ) be a -WGN(P, ρ) source and Z1 = Z2 = Z = (X1 + X2 )/2.
The rate–distortion region for mean squared error distortion D1 = D2 = D is the set of
rate pairs (R1 , R2 ) such that
(1 − ρ2 )P
R1 ≥ R 󶀦 󶀶,
4D
(1 − ρ2 )P
R2 ≥ R 󶀦 󶀶.
4D
It is easy to see that this region coincides with both the inner bound in (.) and the outer
bound in (.). Note that this problem is equivalent (up to scaling) to the two-way lossy
source coding problem for a -WGN source in Example ..
538 Communication for Computing

The bounds in (.) and (.) are not tight in general. Following Theorem . in
Section .., we can readily establish the rate–distortion region for q rounds of com-
munication.

Theorem .. The q-round rate–distortion region Rq (D1 , D2 ) for computing func-
tions Z1 and Z2 with distortion pair (D1 , D2 ) is the set of rate pairs (R1 , R2 ) such that

R1 ≥ I(X1 ; U q | X2 ),
R2 ≥ I(X2 ; U q | X1 )
q
for some conditional pmf ∏ l=1 p(u l |u l−1 , x j󰑙 ) and functions ẑ1 (uq , x1 ) and ẑ2 (uq , x2 )
that satisfy the constraints E(d j (Z j , Ẑ j )) ≤ D j , j = 1, 2, where j l = 1 if l is odd and
j l = 2 if l is even.

Note that the above region is computable for every given q. However, there is no bound
on q in general.

21.3.1 Interactive Coding for Lossless Computing


Now consider the two-way coding for computing setting in which node  wishes to re-
cover Z1 (X1 , X2 ) and node  wishes to recover Z2 (X1 , X2 ) losslessly. As we have seen in
Section ., when Z1 = X2 and Z2 = X1 , the optimal rate region is achieved by two in-
dependent rounds of Slepian–Wolf coding. For arbitrary functions, Theorem . for the
lossy setting can be specialized to yield the following characterization of the optimal rate
region for q rounds.

Theorem .. The optimal q-round rate region Rq∗ for lossless computing of the func-
tions Z1 and Z2 is the set of rate pairs (R1 , R2 ) such that

R1 ≥ I(X1 ; U q | X2 ),
R2 ≥ I(X2 ; U q | X1 )

for some conditional pmf ∏ l=1 p(u l |u l−1 , x j󰑙 ) that satisfy H(д1 (X1 , X2 )|X1 , U q ) = 0
and H(д2 (X1 , X2 )|X2 , U q ) = 0, where j l = 1 if l is odd and j l = 2 if l is even.

In some cases, two independent rounds are sufficient, for example, if Z1 = X2 and
Z2 = X1 , or if (X1 , X2 ) is a DSBS and Z1 = Z2 = X1 ⊕ X2 . In general, however, interactivity
can help reduce the transmission rate.
Assume that Z2 =  and relabel Z1 as Z. Consider two rounds of communication. In
the first round, node  sends a message to node  that depends on X1 and in the second
round, node  sends a second message to node  that depends on X2 and the first message.
21.4 Cascade Coding for Computing 539

By Theorem ., the optimal rate region for two rounds of communication is the set of
rate pairs (R1 , R2 ) such that

R1 ≥ I(U1 ; X1 | X2 ),
R2 ≥ I(U2 ; X2 |U1 , X1 )

for some conditional pmf p(u1 |x1 )p(u2 |u1 , x2 ) that satisfies H(Z|U1 , U2 , X1 ) = 0.
The following example shows that two rounds can achieve strictly lower rates than one
round.
Example .. Let X1 ∼ Bern(p), p ≪ 1/2, and X2 ∼ Bern(1/2) be two independent
sources, and Z = X1 ⋅ X2 . The minimum rate for one round of communication from
node  to node  is R(1) = 1 bit/symbol.
Consider the following two-round coding scheme. In the first round, node  uses
Slepian–Wolf coding to communicate X1 losslessly to node  at rate R1 > H(p). In the
second round, node  sets Y = X2 if X1 = 1 and Y = e if X1 = 0. It then uses Slepian–Wolf
coding to send Y losslessly to node  at rate R2 > H(Y|X1 ) = p. Thus the sum-rate for the
two rounds is
(2)
Rsum = H(p) + p < 1

for p sufficiently small.

The sum-rate can be reduced further by using more than two rounds.
Example .. Let (X1 , X2 ) be a DSBS(p), Z1 = Z2 = X1 ⋅ X2 , and let d1 and d2 be Ham-
ming distortion measures. For two rounds of communication a rate pair (R1 , R2 ) is achiev-
able if R1 > HG (X1 |X2 ) = H(X1 |X2 ) = H(p) and R2 > H(Z1 |X1 ) = (1/2)H(p). Optimality
follows by Theorem . for the bound on R1 and by the outer bound for the lossy case
in (.) with Hamming distortion D = 0 for the bound on R2 .
Consider the following -round coding scheme. Set the auxiliary random variables
in Theorem . as U1 = (1 − X1 ) ⋅ W, where W ∼ Bern(1/2) is independent of X1 , U2 =
X2 ⋅ (1 − U1 ), and U3 = X1 ⋅ U2 . Since U3 = X1 ⋅ X2 , both nodes can compute the product
losslessly. This gives the upper bound on the sum-rate

R_sum^(3) < (5/4)H(p) + (1/2)H((1 − p)/2) − (1 − p)/2 < (3/2)H(p).

Note that the sum-rate can be reduced further by using more rounds.

21.4 CASCADE CODING FOR COMPUTING

Consider the cascade communication system for computing depicted in Figure . for a
-DMS (X1 , X2 ) and distortion measure d(z, ẑ). The decoder wishes to reconstruct the
function Z = д(X1 , X2 ) with prescribed distortion D. The definitions of a (2nR1 , 2nR2 , n)
code, achievability, and rate–distortion region are similar to those for the cascade multiple

Figure .. Cascade coding for computing: encoder 1 observes X1^n and sends index M1 to encoder 2, which observes X2^n and sends index M2 to the decoder, which outputs the estimate Ẑ^n.

description network in Section .. Again we wish to find the rate–distortion region
R д (D).
The rate–distortion region for this problem is not known in general, even when X1
and X2 are independent! In the following, we discuss inner and outer bounds on the
rate–distortion region.
Cutset outer bound. If a rate pair (R1 , R2 ) is achievable with distortion D for cascade
coding for computing, then it must satisfy the inequalities

R1 ≥ I(X1; U | X2),
R2 ≥ I(Z; Ẑ)    (.)

for some conditional pmf p(u|x1)p(ẑ|x2, u) such that E(d(Z, Ẑ)) ≤ D.
Local computing inner bound. Encoder 1 uses Wyner–Ziv coding to send a description
U of its source X1 to encoder 2 at rate R1 > I(X1; U|X2). Encoder 2 sends the estimate Ẑ of
Z based on (U, X2) at rate R2 > I(X2, U; Ẑ) to the decoder. This gives the local computing
inner bound on the rate–distortion region R_g(D) that consists of all rate pairs (R1, R2)
such that

R1 > I(X1; U | X2),
R2 > I(X2, U; Ẑ)    (.)

for some conditional pmf p(u|x1)p(ẑ|x2, u) that satisfies the constraint E(d(Z, Ẑ)) ≤ D.
The local computing inner bound coincides with the cutset bound in (.) for the
special case of lossless computing of Z. To show this, let d be a Hamming distortion
measure and consider the case of D = 0. Then the inner bound in (.) simplifies to the
set of rate pairs (R1 , R2 ) such that

R1 > I(X1 ; U | X2 ),
R2 > H(Z)

for some conditional pmf p(u|x1 ) that satisfies the condition H(Z|U , X2 ) = 0. Now, con-
sider the cutset bound in . and set Ẑ = Z. Since Ẑ → (X2 , U) → X1 form a Markov
chain and Z is a function of (X1 , X2 ), we must have H(Z|U , X2 ) = 0. As in Theorem .,
the inequality R1 > I(X1 ; U |X2 ) can be further refined to the conditional graph entropy
HG (X1 |X2 ).

Forwarding inner bound. Consider the following alternative coding scheme for cascade
coding for computing. Encoder 1 again sends a description U1 of its source X1 at rate
R1 > I(X1; U1|X2). Encoder 2 forwards the description from encoder 1 together with a
description U2 of its source X2 given U1 at total rate R2 > I(X1; U1) + I(X2; U2|U1). The
decoder then computes the estimate Ẑ(U1, U2). This yields the forwarding inner bound
on the rate–distortion region R_g(D) that consists of all rate pairs (R1, R2) such that

R1 > I(X1; U1 | X2),
R2 > I(U1; X1) + I(U2; X2 |U1)    (.)

for some conditional pmf p(u1|x1)p(u2|x2, u1) and function ẑ(u1, u2) that satisfy the constraint E(d(Z, Ẑ)) ≤ D.
It is straightforward to show that forwarding is optimal when X1 and X2 are indepen-
dent and Z = (X1 , X2 ). The lossless computing example shows that local computing can
outperform forwarding. The following is another example where the same conclusion
holds.
Example .. Let X1 and X2 be independent WGN(1) sources, Z = X1 + X2 , and d be
a squared error distortion measure. Consider the forwarding inner bound in (.). It
can be shown that the minimum sum-rate for this region is attained by setting U1 = X̂ 1 ,
U2 = X̂ 2 , and independent Gaussian test channels. This yields the forwarding sum-rate

R_F = R1 + R2 = 2 R(1/D1) + R(1/D2) = R(1/(D1²D2))

such that D1 + D2 ≤ D. Optimizing over D1 , D2 yields the minimum forwarding sum-rate

R_F* = R(27/(4D³)),
which is achieved for D1 = 2D2 = 2D/3.
Next consider the local computing inner bound in (.). Let U = X1 + W1 , V =
E(X1 |U ) + X2 + W2 , and Ẑ = E(Z|V ), where W1 ∼ N(0, D1 /(1 − D1 )) and W2 ∼ N(0, (2 −
D1 )D2 /(2 − D1 − D2 )) are independent of each other and of (X1 , X2 ). The local computing
sum-rate for this choice of test channels is

R_LC = R1 + R2 = R(1/D1) + R((2 − D1)/D2)

subject to the distortion constraint D1 + D2 ≤ D. Optimizing over D1 , D2 , we obtain

R_LC* = R(1/(2(1 − √(1 − D/2))²)) < R_F*

for all D ∈ (0, 1]. Thus local computing can outperform forwarding.
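
The following Python sketch (with illustrative distortion values of our choosing) evaluates the two optimized sum-rates R_F* and R_LC* derived above and confirms numerically that local computing requires a strictly smaller sum-rate.

```python
import math

def R(x):
    """Gaussian rate function R(x) = (1/2) log2 x, taken as 0 when x <= 1."""
    return max(0.5 * math.log2(x), 0.0)

for D in [0.1, 0.5, 1.0]:
    R_forwarding = R(27 / (4 * D**3))
    R_local = R(1 / (2 * (1 - math.sqrt(1 - D / 2))**2))
    print(f"D = {D:3.1f}: forwarding sum-rate = {R_forwarding:.3f}, "
          f"local computing sum-rate = {R_local:.3f}")
```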

Combined inner bound. Local computing and forwarding can be combined as follows.
Encoder  uses Wyner–Ziv coding to send the description pair (U , V ) of X1 to encoder .
Encoder  forwards the description V and the estimate Ẑ based on (U , V , X2 ) to the de-
coder. This yields the inner bound on the rate–distortion region R д (D) that consists of
all rate pairs (R1 , R2 ) such that

R1 > I(X1 ; U , V | X2 ),
R2 > I(X1 ; V ) + I(X2 , U ; Ẑ |V )

for some conditional pmf p(u, v|x1)p(ẑ|x2, u, v) that satisfies E(d(Z, Ẑ)) ≤ D. This inner
bound, however, does not coincide with the cutset bound in (.) in general.
Remark . (Cascade versus distributed coding for computing). The optimal rate re-
gion for lossless computing is known for the cascade coding case but is not known in gen-
eral for the distributed coding case. This suggests that the cascade source coding prob-
lem is more tractable than distributed source coding for computing. This is not always
the case, however. For example, while the rate–distortion region for Gaussian sources is
known for the μ-sum problem, it is not known for the cascade case even when X1 and X2
are independent.

21.5 DISTRIBUTED LOSSY AVERAGING

Distributed averaging is a canonical example of distributed consensus, which has been


extensively studied in control and computer science. The problem arises, for example,
in distributed coordination of autonomous agents, distributed computing in sensor net-
works, and statistics collection in peer-to-peer networks. We discuss a lossy source coding
formulation of this problem.
Consider a network modeled by a graph with N nodes and a set of undirected edges
representing noiseless bidirectional links. Node j ∈ [1 : N] observes an independent
WGN(1) source X j . Each node wishes to estimate the average Z = (1/N) ∑Nj=1 X j to
the same prescribed mean squared error distortion D. Communication is performed in
q rounds. In each round an edge (node-pair) is selected and the two nodes communicate
over a noiseless bidirectional link in several subrounds using block codes as in two-way
coding for lossy computing in Section .. Let r_l be the total communication rate in
round l ∈ [1 : q] and define the network sum-rate as R = ∑_{l=1}^q r_l. A rate–distortion pair
(R, D) is said to be achievable if there exist a number of rounds q and associated sequence
of edge selections and codes such that
lim sup_{n→∞} (1/n) ∑_{i=1}^n E((Ẑ_{ji} − Z_i)²) ≤ D for all j ∈ [1 : N],

where Zi = (1/N) ∑Nj=1 X ji and Ẑ ji is the estimate of Zi at node j. The network rate–
distortion function R(D) is the infimum of network sum-rates R such that (R, D) is achiev-
able.

Note that R(D) = 0 if D ≥ (N − 1)/N 2 , and is achieved by the estimates Ẑ ji = X ji /N for


j ∈ [1 : N] and i ∈ [1 : n]. Also, R(D) is known completely for N = 2; see Example ..
The network rate–distortion function is not known in general, however. Consider the
following upper and lower bounds.

Theorem . (Cutset Bound for Distributed Lossy Averaging). The network rate–
distortion function for the distributed lossy averaging problem is lower bounded as

R(D) ≥ N R((N − 1)/(N²D)).

This cutset bound is established by considering the information flow into each node
from the rest of the nodes when they are combined into a single “supernode.” The bound
turns out to be achievable within a factor of two for sufficiently large N and D < 1/N 2 for
every connected network that contains a star subnetwork, that is, every network having a
node with N − 1 edges.

Theorem .. The network rate–distortion function for a network with star subnet-
work is upper bounded as

R(D) ≤ 2(N − 1) R((2N − 3)/(N²D))

for D < (N − 1)/N 2 .

To prove this upper bound, suppose that there are (N − 1) edges between node 1 and
nodes 2, . . . , N. Node j ∈ [2 : N] sends the description of the Gaussian reconstruction X̂_j
of its source X_j to node 1 at a rate higher than I(X_j; X̂_j) = R(1/D′), where D′ ∈ (0, 1) is
determined later. Node 1 computes the estimates

Ẑ1 = (1/N) X1 + (1/N) ∑_{j=2}^N X̂_j,
U_j = Ẑ1 − (1/N) X̂_j,

and sends the description of the Gaussian reconstruction Û j of U j to node j ∈ [2 : N] at


a rate higher than
1
I(U j ; Û j ) = R 󶀤 󳰀 󶀴 .
D

Node j ∈ [2 : N] then computes the estimate Ẑ j = (1/N)X j + Û j . The corresponding dis-


tortion is
D1 = E[(Ẑ1 − Z)²] = E[((1/N) ∑_{j=2}^N (X_j − X̂_j))²] = ((N − 1)/N²) D′,

and for j ∈ [2 : N]

D_j = E[(Ẑ_j − Z)²]
    = E[((Û_j − U_j) + (1/N) ∑_{k≠1,j} (X̂_k − X_k))²]
    = (1/N² + ((N − 2)/N²)(1 − D′)) D′ + ((N − 2)/N²) D′
    ≤ ((2N − 3)/N²) D′.

Now choose D′ = N²D/(2N − 3) to satisfy the distortion constraint. Then every sum-
rate higher than

2(N − 1) R(1/D′) = 2(N − 1) R((2N − 3)/(N²D))

is achievable. This completes the proof of Theorem ..
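
As a numerical illustration of how close the star-network scheme comes to the cutset bound, the following Python sketch (the network size N and the distortion values are ours, chosen for illustration) evaluates both bounds and their ratio.

```python
import math

def R(x):
    """Gaussian rate function R(x) = (1/2) log2 x, taken as 0 when x <= 1."""
    return max(0.5 * math.log2(x), 0.0)

N = 20
for D in [1e-4, 1e-3, 1e-2]:
    lower = N * R((N - 1) / (N**2 * D))                  # cutset lower bound
    upper = 2 * (N - 1) * R((2 * N - 3) / (N**2 * D))    # star-network scheme
    print(f"N = {N}, D = {D:.0e}: lower = {lower:6.1f}, "
          f"upper = {upper:6.1f}, ratio = {upper / lower:.2f}")
```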

21.6 COMPUTING OVER A MAC

As we discussed in Chapter , source–channel separation holds for communicating in-


dependent sources over a DM-MAC, but does not necessarily hold when the sources are
correlated. We show through an example that even when the sources are independent,
separation does not necessarily hold when the goal is to compute a function of the sources
over a DM-MAC.
Let (U1 , U2 ) be a pair of independent Bern(1/2) sources and suppose that the function
Z = U1 ⊕ U2 is to be computed over a modulo-2 sum DM-MAC followed by a BSC(p) as
depicted in Figure .. Sender j = 1, 2 encodes the source sequence U jk into a channel
input sequence X nj . Hence the transmission rate is r = k/n bits per transmission. We wish
to find the necessary and sufficient condition for computing Z losslessly.

Figure .. Computing over a MAC example: encoders 1 and 2 map U1^k and U2^k into X1^n and X2^n, the channel output Y^n is the modulo-2 sum X1 ⊕ X2 passed through a BSC(p), and the decoder forms the estimate Ẑ^k.

If we use separate source and channel coding, Z = U1 ⊕ U2 can be computed losslessly



over the DM-MAC at rate r if

rH(U1 ) < I(X1 ; Y | X2 , Q),


rH(U2 ) < I(X2 ; Y | X1 , Q),
r(H(U1 ) + H(U2 )) < I(X1 , X2 ; Y |Q)

for some pmf p(q)p(x1 |q)p(x2 |q). This reduces to r(H(U1 ) + H(U2 )) = 2r < 1 − H(p).
Thus, a rate r is achievable via separate source and channel coding if r < (1 − H(p))/2. In
the other direction, considering the cut at the input of the BSC part of the channel yields
the upper bound
rH(U1 ⊕ U2 ) ≤ H(X1 ⊕ X2 ) ≤ 1 − H(p),

or equivalently, r ≤ 1 − H(p).
We now show that the upper bound is achievable using random linear codes as in
Example ..
Codebook generation. Randomly and independently generate an l × k binary matrix A
and an n × l binary matrix B, each with i.i.d. Bern(1/2) entries.
Encoding. Encoder 1 sends X1^n = BAU1^k and encoder 2 sends X2^n = BAU2^k.
Decoding. The receiver first decodes for A(U1k ⊕ U2k ) and then decodes for Z k = U1k ⊕ U2k .
If l < n(1 − H(p) − δ(є)), the receiver can recover A(U1k ⊕ U2k ) with probability of error
(averaged over the random matrix B) that tends to zero as n → ∞; see Section ... In
addition, if l > k(1 + δ(є)), then the receiver can recover Z k with probability of error
(averaged over the random matrix A) that tends to zero as n → ∞. Therefore, r < 1 −
H(p) is achievable using joint source–channel coding.
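
The scheme works because both encoders apply the same linear map BA, so the modulo-2 addition performed by the channel yields BA(U1^k ⊕ U2^k), which is itself a codeword for the sum Z^k = U1^k ⊕ U2^k. The following sketch (with small, arbitrary matrix dimensions of our choosing) checks this linearity numerically.

```python
import numpy as np

rng = np.random.default_rng(0)
k, l, n = 8, 10, 16                  # illustrative sizes with k < l < n

A = rng.integers(0, 2, size=(l, k))  # l x k binary matrix
B = rng.integers(0, 2, size=(n, l))  # n x l binary matrix
u1 = rng.integers(0, 2, size=k)
u2 = rng.integers(0, 2, size=k)

x1 = (B @ (A @ u1 % 2)) % 2          # X1^n = B A U1^k
x2 = (B @ (A @ u2 % 2)) % 2          # X2^n = B A U2^k

# The modulo-2 sum of the transmitted sequences is a codeword for U1 + U2.
lhs = (x1 + x2) % 2
rhs = (B @ (A @ ((u1 + u2) % 2) % 2)) % 2
print(np.array_equal(lhs, rhs))      # True: linearity aligns the sums
```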

SUMMARY

∙ Information theoretic framework for distributed computing


∙ Optimal rate for lossless computing with side information via conditional graph en-
tropy
∙ μ-sum problem is equivalent to the quadratic Gaussian CEO problem
∙ Using the same linear code can outperform random codes for distributed lossless com-
puting
∙ Interaction can reduce the rates in lossless coding for computing
∙ Cascade coding for computing:
∙ Local computing
∙ Forwarding
∙ Neither scheme outperforms the other in general

∙ Information theoretic formulation of the distributed averaging problem


∙ Source–channel separation is suboptimal for computing even for independent sources
∙ Open problems:
21.1. What is the optimal rate region for distributed lossless computing of X1 ⋅ X2 ,
where (X1 , X2 ) is a DSBS?
21.2. What is the rate–distortion region of cascade coding for computing the sum of
independent WGN sources?

BIBLIOGRAPHIC NOTES

The rate–distortion function for lossy computing with side information in Theorem .
was established by Yamamoto (). The refined characterization for the lossless case in
Theorem . using conditional graph entropy and Example . are due to Orlitsky and
Roche (). The rate–distortion region for the μ-sum in Theorem . was established
by Wagner, Tavildar, and Viswanath (). Krithivasan and Pradhan () showed that
lattice coding can outperform Berger–Tung coding when μ1 μ2 < 0. Han and Kobayashi
() established necessary and sufficient conditions on the function д(X1 , X2 ) such that
the Slepian–Wolf region in (.) is optimal for distributed lossless computing. The loss-
less computing problem of the modulo- sum of a DSBS in Example . is due to Körner
and Marton (). Their work has motivated the example of distributed computing over
a MAC in Section . by Nazer and Gastpar (a) as well as coding schemes for Gauss-
ian networks that use structured, instead of random, coding (Nazer and Gastpar b,
Wilson, Narayanan, Pfister, and Sprintson ).
Two-way coding for computing is closely related to communication complexity intro-
duced by Yao (). In this setup each node has a value drawn from a finite set and both
nodes wish to compute the same function with no errors. Communication complexity is
measured by the minimum number of bits that need to be exchanged. Work in this area
is described in Kushilevitz and Nisan (). Example . is due to Su and El Gamal
(). The optimal two-round rate region for lossless computing in Theorem . was
established by Orlitsky and Roche (), who also demonstrated the benefit of interac-
tion via Example .. The optimal rate region for an arbitrary number of rounds was
established by Ma and Ishwar (), who showed that three rounds can strictly out-
perform two rounds via Example .. They also characterized the optimal rate region
R ∗ = ⋃q Rq∗ for this example. Cascade coding for computing in Section . was studied
by Cuff, Su, and El Gamal (), who established the local computing, forwarding, and
combined inner bounds.
The distributed averaging problem has been extensively studied in control and com-
puter science, where each node is assumed to have a nonrandom real number and wishes
to estimate the average by communicating synchronously, that is, in each round the nodes
communicate with their neighbors and update their estimates simultaneously, or asyn-
chronously, that is, in each round a randomly selected subset of the nodes communicate

and update their estimates. The cost of communication is measured by the number of
communication rounds needed. This setup was further extended to quantized averaging,
where the nodes exchange quantized versions of their observations. Results on these for-
mulations can be found, for example, in Tsitsiklis (), Xiao and Boyd (), Boyd,
Ghosh, Prabhakar, and Shah (), Xiao, Boyd, and Kim (), Kashyap, Başar, and
Srikant (), Nedić, Olshevsky, Ozdaglar, and Tsitsiklis (), and survey papers by
Olfati-Saber, Fax, and Murray () and Shah (). The information theoretic treat-
ment in Section . is due to Su and El Gamal (). Another information theoretic
formulation can be found in Ayaso, Shah, and Dahleh ().

PROBLEMS

.. Evaluate Theorem . to show that the rate–distortion function in Example . is
R д (D) = R((1 − ρ2 )P/4D).
.. Establish the upper and lower bounds on the optimal rate for lossless computing
with side information in (.).
.. Evaluate Theorem . to show that the optimal rate in Example . is R ∗д = 10.
.. Prove achievability of the combined inner bound in Section ..
.. Prove the cutset bound for distributed lossy averaging in Theorem ..
.. Cutset bound for distributed lossy averaging over tree networks. Show that the cutset
bound in Theorem . can be improved for a tree network to yield

R(D) ≥ ((N − 1)/(2N)) log(1/(2N³D²)).
(Hint: In a tree, removing an edge partitions the network into two disconnected
subnetworks. Hence, we can use the cutset bound for two-way quadratic Gaussian
lossy coding.)
.. Compute–compress inner bound. Consider the lossy source coding over a noiseless
two-way relay channel in Problem ., where (X1 , X2 ) is a DSBS(p), and d1 and
d2 are Hamming distortion measures. Consider the following compute–compress
coding scheme. Node j = 1, 2 sends the index M j to node 3 at rate R j so that node 3
can losslessly recover their mod-2 sum Z n = X1n ⊕ X2n . The relay then performs
lossy source coding on Z n and sends the index M3 at rate R3 to nodes 1 and 2.
Node 1 first reconstructs an estimate Ẑ1n of Z n and then computes the estimate
X̂ 2n = X1n ⊕ Ẑ1n of X2n . Node 2 computes the reconstruction X̂ 1n similarly.
(a) Find the inner bound on the rate–distortion region achieved by this coding
scheme. (Hint: For lossless computing of Z n at node 3, use random linear
binning as in Example .. For lossy source coding of Z n at nodes 1 and 2,
use the fact that Z n is independent of X1n and of X2n .)

(b) Show that the inner bound in part (a) is strictly larger than the inner bound in
part (b) of Problem . that uses two rounds of Wyner–Ziv coding.
(c) Use this compute–compress idea to establish the inner bound on the rate–
distortion region for a general -DMS (X1 , X2 ) and distortion measures d1 , d2
that consists of all rate triples (R1 , R2 , R3 ) such that

R1 ≥ I(X1 ; U1 | X2 , Q),
R2 ≥ I(X2 ; U2 | X1 , Q),
R1 + R2 ≥ I(X1 , X2 ; U1 , U2 |Q),
R3 ≥ I(Z; W | X1 , Q),
R3 ≥ I(Z; W | X2 , Q)

for some conditional pmf p(q)p(u1 |x1 , q)p(u2 |x2 , q)p(󰑤|z, q) and functions
z(x1 , x2 ), x̂1 (󰑤, x2 ), and x̂2 (󰑤, x1 ) that satisfy the constraints H(V |U1 , U2 ) = 0
and

E󶁡d1 (X1 , X̂ 1 (W , X2 ))󶁱 ≤ D1 ,


E󶁡d2 (X2 , X̂ 2 (W , X1 ))󶁱 ≤ D2 .
CHAPTER 22

Information Theoretic Secrecy

Confidentiality of information is a key consideration in many networking applications,


including e-commerce, online banking, and intelligence operations. How can informa-
tion be communicated reliably to the legitimate users, while keeping it secret from eaves-
droppers? How does such a secrecy constraint on communication affect the limits on
information flow in the network?
In this chapter, we study these questions under the information theoretic notion of
secrecy, which requires each eavesdropper to obtain essentially no information about the
messages sent from knowledge of its received sequence, the channel statistics, and the
codebooks used. We investigate two approaches to achieve secure communication. The
first is to exploit the statistics of the channel from the sender to the legitimate receivers and
the eavesdroppers. We introduce the wiretap channel as a 2-receiver broadcast channel
with a legitimate receiver and an eavesdropper, and establish its secrecy capacity, which is
the highest achievable secret communication rate. The idea is to design the encoder so that
the channel from the sender to the receiver becomes effectively stronger than the channel
to the eavesdropper; hence the receiver can recover the message but the eavesdropper
cannot. This wiretap coding scheme involves multicoding and randomized encoding.
If the channel from the sender to the receiver is weaker than that to the eavesdropper,
however, secret communication at a positive rate is not possible. This brings us to the
second approach to achieve secret communication, which is to use a secret key shared
between the sender and the receiver but unknown to the eavesdropper. We show that the
rate of such secret key must be at least as high as the rate of the confidential message. This
raises the question of how the sender and the receiver can agree on such a long secret
key in the first place. After all, if they had a confidential channel with sufficiently high
capacity to communicate the key, then why not use it to communicate the message itself!
We show that the sender and the receiver can still agree on a secret key even when the
channel has zero secrecy capacity if they have access to correlated sources (e.g., through
a satellite beaming common randomness to both of them). We first consider the source
model for key agreement, where the sender communicates with the receiver over a noise-
less public broadcast channel to generate a secret key from their correlated sources. We
establish the secret key capacity from one-way public communication. The coding scheme
involves the use of double random binning to generate the key while keeping it secret from
the eavesdropper. We then obtain upper and lower bounds on the secret key capacity for
multiple rounds of communication and show that interaction can increase the key rate.

As a more general model for key agreement, we consider the channel model in which
the sender broadcasts a random sequence over a noisy channel to generate a correlated
output sequence at the receiver (and unavoidably another correlated sequence at the
eavesdropper). The sender and the receiver then communicate over a noiseless public
broadcast channel to generate the key from these correlated sequences as in the source
model. We illustrate this setting via Maurer’s example in which the channel to the receiver
is a degraded version of the channel to the eavesdropper and hence no secret communica-
tion is possible without a key. We show that the availability of correlated sources through
channel transmissions, however, makes it possible to generate a secret key at a positive
rate.

22.1 WIRETAP CHANNEL

Consider the point-to-point communication system with an eavesdropper depicted in


Figure .. We assume a discrete memoryless wiretap channel (DM-WTC) (X , p(y, z|x),
Y × Z) with sender X, legitimate receiver Y , and eavesdropper Z. The sender X (Alice)
wishes to communicate a message M to the receiver Y (Bob) while keeping it secret from
the eavesdropper Z (Eve).
A (2nR , n) secrecy code for the DM-WTC consists of
∙ a message set [1 : 2nR ],
∙ a randomized encoder that generates a codeword X n (m), m ∈ [1 : 2nR ], according to a
conditional pmf p(x n |m), and
∙ a decoder that assigns an estimate m̂ ∈ [1 : 2^{nR}] or an error message e to each received
sequence y^n ∈ Y^n.
The message M is assumed to be uniformly distributed over the message set. The infor-
mation leakage rate associated with the (2nR , n) secrecy code is defined as

R_L^(n) = (1/n) I(M; Z^n).

The average probability of error for the secrecy code is defined as P_e^(n) = P{M̂ ≠ M}. A
rate–leakage pair (R, R_L) is said to be achievable if there exists a sequence of (2^{nR}, n) codes

Figure .. Point-to-point communication system with an eavesdropper: the encoder maps M to X^n, and the channel p(y, z|x) delivers Y^n to the decoder (which forms M̂) and Z^n to the eavesdropper.



such that

lim_{n→∞} P_e^(n) = 0,
lim sup_{n→∞} R_L^(n) ≤ R_L.

The rate–leakage region R is the closure of the set of achievable rate–leakage pairs
(R, RL ). As in the DM-BC (see Lemma .), the rate–leakage region depends on the chan-
nel conditional pmf p(y, z|x) only through the conditional marginals p(y|x) and p(z|x).
We focus mainly on the secrecy capacity CS = max{R : (R, 0) ∈ R ∗ }.
Remark .. As we already know, using a randomized encoder does not help communi-
cation when there is no secrecy constraint; see Problem .. We show that under a secrecy
constraint, however, a randomized encoder can help the sender hide the message from the
eavesdropper; hence it can increase the secret communication rate.

The secrecy capacity has a simple characterization.

Theorem .. The secrecy capacity of the DM-WTC is

CS = max_{p(u,x)} (I(U; Y) − I(U; Z)),

where |U| ≤ |X|.

If the channel to the eavesdropper is a degraded version of the channel to the receiver,
i.e., p(y, z|x) = p(y|x)p(z|y),

I(U ; Y) − I(U ; Z) = I(U ; Y |Z) ≤ I(X; Y |Z) = I(X; Y) − I(X; Z). (.)

Hence, the secrecy capacity simplifies to

CS = max_{p(x)} (I(X; Y) − I(X; Z)). (.)

More generally, this capacity expression holds if Y is more capable than Z, i.e., I(X; Y) ≥
I(X; Z) for all p(x). To show this, note that

I(U ; Y) − I(U ; Z) = I(X; Y) − I(X; Z) − (I(X; Y |U) − I(X; Z |U ))


≤ I(X; Y) − I(X; Z),

since the more capable condition implies that I(X; Y|U) − I(X; Z|U) ≥ 0 for all p(u, x).
To illustrate Theorem . and the special case in (.), consider the following.

Example 22.1 (Binary symmetric wiretap channel). Consider the BS-WTC with chan-
nel outputs Y = X ⊕ Z1 and Z = X ⊕ Z2 , where Z1 ∼ Bern(p1 ) and Z2 ∼ Bern(p2 ). The
secrecy capacity of this BS-WTC is
CS = [H(p2) − H(p1)]⁺,

where [a]+ = max{0, a}. If p1 ≥ p2 , the eavesdropper can recover any message intended
for the receiver; thus CS = 0. To prove the converse, we set X ∼ Bern(p) in (.) and
consider

I(X; Y) − I(X; Z) = H(p2 ) − H(p1 ) − (H(p ∗ p2 ) − H(p ∗ p1 )). (.)

For p < 1/2, the function p ∗ u is monotonically increasing in u ∈ [0, 1/2]. This implies
that H(p ∗ p2 ) − H(p ∗ p1 ) ≥ 0. Hence, setting p = 1/2 optimizes the expression in (.)
and CS can be achieved using binary symmetric random codes.
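
The following Python sketch (crossover probabilities chosen for illustration) evaluates the expression in (.) over a grid of input parameters p and confirms that p = 1/2 attains the maximum H(p2) − H(p1).

```python
import math

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p*math.log2(p) - (1-p)*math.log2(1-p)

def conv(a, b):
    """Binary convolution a * b = a(1-b) + (1-a)b."""
    return a*(1-b) + (1-a)*b

p1, p2 = 0.1, 0.3   # receiver noise p1, eavesdropper noise p2 > p1
best = max(
    (h2(p2) - h2(p1) - (h2(conv(p, p2)) - h2(conv(p, p1))), p)
    for p in [i / 1000 for i in range(1, 1000)]
)
print(f"max over p of I(X;Y)-I(X;Z) = {best[0]:.4f} at p = {best[1]:.3f}")
print(f"H(p2) - H(p1)               = {h2(p2) - h2(p1):.4f}")
```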
Example 22.2 (Gaussian wiretap channel). Consider the Gaussian WTC with outputs
Y = X + Z1 and Z = X + Z2 , where Z1 ∼ N(0, N1 ) and Z2 ∼ N(0, N2 ). Assume an almost-
sure average power constraint
P{∑_{i=1}^n X_i²(m) ≤ nP} = 1, m ∈ [1 : 2^{nR}].

The secrecy capacity of the Gaussian WTC is

CS = [C(P/N1) − C(P/N2)]⁺.
To prove the converse, assume without loss of generality that the channel is physically
degraded and Z2 = Z1 + Z2󳰀 , where Z2󳰀 ∼ N(0, N2 − N1 ). Consider

I(X; Y) − I(X; Z) = (1/2) log(N2/N1) − (h(Z) − h(Y)).
By the entropy power inequality (EPI),

h(Z) − h(Y) = h(Y + Z2′) − h(Y)
            ≥ (1/2) log(2^{2h(Z2′)} + 2^{2h(Y)}) − h(Y)
            = (1/2) log(2πe(N2 − N1) + 2^{2h(Y)}) − h(Y).

Now since the function (1/2) log(2πe(N2 − N1) + 2^{2u}) − u is monotonically decreasing in
u and h(Y) ≤ (1/2) log(2πe(P + N1)),

h(Z) − h(Y) ≥ (1/2) log(2πe(N2 − N1) + 2πe(P + N1)) − (1/2) log(2πe(P + N1))
           = (1/2) log((P + N2)/(P + N1)).
Hence

CS ≤ (1/2) log(N2/N1) − (1/2) log((P + N2)/(P + N1)) = C(P/N1) − C(P/N2),

which is attained by X ∼ N(0, P).
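
A direct numerical evaluation of this secrecy capacity, for illustrative values of P, N1, and N2 of our choosing, is given below.

```python
import math

def C(x):
    """Gaussian capacity function C(x) = (1/2) log2(1 + x)."""
    return 0.5 * math.log2(1 + x)

P, N1, N2 = 10.0, 1.0, 4.0   # eavesdropper noisier than the legitimate receiver
Cs = max(C(P / N1) - C(P / N2), 0.0)
print(f"C(P/N1) = {C(P/N1):.3f}, C(P/N2) = {C(P/N2):.3f}, "
      f"secrecy capacity = {Cs:.3f}")
```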

Example . can be further extended to the vector case, which is not degraded in
general.
Example . (Gaussian vector wiretap channel). Consider the Gaussian vector WTC
Y = G1 X + Z1 ,
Z = G2 X + Z2 ,
where Z1 ∼ N(0, I) and Z2 ∼ N(0, I), and assume almost-sure average power constraint
P on X. It can be shown that the secrecy capacity is

CS = max_{K_X ⪰ 0 : tr(K_X) ≤ P} ((1/2) log |I + G1 K_X G1^T| − (1/2) log |I + G2 K_X G2^T|).

Note that the addition of the spatial dimension through multiple antennas allows for
beamforming the signal away from the eavesdropper.

22.1.1 Achievability Proof of Theorem 22.1


The coding scheme for the DM-WTC is illustrated in Figure .. We assume that CS > 0
and fix the pmf p(u, x) that attains it. Thus I(U ; Y) − I(U ; Z) > 0. We use multicoding
and a two-step randomized encoding scheme to help hide the message from the eaves-
dropper. For each message m, we generate a subcodebook C(m) consisting of 2^{n(R̃−R)}
u^n(l) sequences. To send m, the encoder randomly chooses one of the u^n(l) sequences
in its subcodebook. It then randomly generates the codeword X^n ∼ ∏_{i=1}^n p_{X|U}(x_i|u_i(l))
and transmits it. The receiver can recover the index l if R̃ < I(U; Y). For each sub-
codebook C(m), the eavesdropper has roughly 2^{n(R̃−R−I(U;Z))} u^n(l) sequences such that
(u^n(l), z^n) ∈ T_є^(n). Thus, if R̃ − R > I(U; Z), the eavesdropper has a roughly equal num-
ber (in the exponent) of jointly typical sequences from each subcodebook and hence has
almost no information about the actual message sent.
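
To make the rate allocation concrete, the following sketch (a degraded binary symmetric example of our own, with U = X ∼ Bern(1/2)) chooses R̃ just below I(U; Y) and R̃ − R just above I(U; Z), so that the resulting secret rate R approaches I(U; Y) − I(U; Z).

```python
import math

def h2(p):
    return 0.0 if p in (0.0, 1.0) else -p*math.log2(p) - (1-p)*math.log2(1-p)

# Degraded BSC wiretap example with U = X ~ Bern(1/2).
p1, p2 = 0.05, 0.20
I_UY = 1 - h2(p1)            # I(U;Y)
I_UZ = 1 - h2(p2)            # I(U;Z)
margin = 0.01                # plays the role of delta(eps)

R_tilde = I_UY - margin              # total multicoding rate
R = R_tilde - (I_UZ + margin)        # secret message rate
print(f"I(U;Y) = {I_UY:.3f}, I(U;Z) = {I_UZ:.3f}")
print(f"R_tilde = {R_tilde:.3f}, secrecy rate R = {R:.3f} "
      f"(close to I(U;Y) - I(U;Z) = {I_UY - I_UZ:.3f})")
# Each of the 2^{nR} subcodebooks then holds 2^{n(R_tilde - R)} u^n(l) sequences.
```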

Figure .. Coding scheme for the wiretap channel: the 2^{nR̃} indices l are partitioned into 2^{nR} subcodebooks C(1), C(2), . . . , C(2^{nR}), each containing 2^{n(R̃−R)} u^n(l) sequences. The black circle corresponds to (u^n(l), y^n) ∈ T_є^(n) and the gray circles correspond to (u^n(l), z^n) ∈ T_є^(n).

In the following we provide details of this scheme.


Codebook generation. For each message m ∈ [1 : 2^{nR}], generate a subcodebook C(m)
consisting of 2^{n(R̃−R)} randomly and independently generated sequences u^n(l), l ∈ [(m −
1)2^{n(R̃−R)} + 1 : m2^{n(R̃−R)}], each according to ∏_{i=1}^n p_U(u_i). The codebook C = {C(m) : m ∈
[1 : 2^{nR}]} is revealed to all parties (including the eavesdropper).

Encoding. To send message m ∈ [1 : 2^{nR}], the encoder chooses an index L uniformly
at random from [(m − 1)2^{n(R̃−R)} + 1 : m2^{n(R̃−R)}], generates X^n(m) ∼ ∏_{i=1}^n p_{X|U}(x_i|u_i(L)),
and transmits it.
Decoding. The decoder finds the unique message m̂ such that (u^n(l), y^n) ∈ T_є^(n) for some
u^n(l) ∈ C(m̂).
Analysis of the probability of error. By the LLN and the packing lemma, the probability
of error P(E) averaged over the random codebook and encoding tends to zero as n → ∞
if R̃ < I(U ; Y) − δ(є).
Analysis of the information leakage rate. Let M be the message sent and L be the ran-
domly selected index by the encoder. Every codebook C induces a conditional pmf on
(M, L, U n , Z n ) of the form
p(m, l, u^n, z^n | C) = 2^{−nR} 2^{−n(R̃−R)} p(u^n | l, C) ∏_{i=1}^n p_{Z|U}(z_i |u_i).

In particular, p(u^n, z^n) = ∏_{i=1}^n p_{U,Z}(u_i, z_i). Now we bound the amount of information
leakage averaged over codebooks. Consider
I(M; Z n |C) = H(M |C) − H(M |Z n , C)
= nR − H(M, L|Z n , C) + H(L|Z n , M, C)
= nR − H(L|Z n , C) + H(L|Z n , M, C), (.)
where the last step follows since the message M is a function of the index L. We first estab-
lish a lower bound on the equivocation term H(L|Z n , C) in (.), which is tantamount
to lower bounding the number of U n sequences that are jointly typical with Z n . Consider
H(L|Z n , C) = H(L|C) − I(L; Z n |C)
= nR̃ − I(L; Z n |C)
= nR̃ − I(U n , L; Z n |C)
≥ nR̃ − I(U n , L, C; Z n )
(a)
= nR̃ − I(U n ; Z n )
(b)
= nR̃ − nI(U ; Z), (.)
where (a) follows since (L, C) → U n → Z n form a Markov chain and (b) follows since
by construction p(un , z n ) = ∏ni=1 pU ,Z (ui , zi ). Next, we establish an upper bound on the
second equivocation term in (.), which is tantamount to upper bounding the number
of U n sequences in the subcodebook C(M) that are jointly typical with Z n .

Lemma .. If R̃ − R ≥ I(U ; Z), then


1
lim sup H(L|Z n , M, C) ≤ R̃ − R − I(U ; Z) + δ(є).
n→∞ n

The proof of this lemma is given in Appendix A. Substituting from (.) and this
lemma into (.) and using the condition that R̃ < I(U ; Y ) − δ(є), we have shown that
lim supn→∞ (1/n)I(M; Z n | C) ≤ δ(є), if R < I(U ; Y ) − I(U ; Z) − δ(є). Therefore, there
must exist a sequence of codes such that Pe(n) tends to zero and RL(n) ≤ δ(є) as n → ∞,
which completes the proof of achievability.
Remark .. Instead of sending a random X n , we can randomly generate a satellite code-
󳰀
book consisting of 2nR codewords x n (l, l 󳰀 ) for each l. The encoder then randomly trans-
mits one of the codewords in the chosen satellite codebook. To achieve secrecy, we must
have R󳰀 > I(X; Z|U ) + δ(є) in addition to R̃ − R ≥ I(U ; Z). As we discuss in Section ..,
this approach is useful for establishing lower bounds on the secrecy capacity of wiretap
channels with more than one legitimate receiver.

22.1.2 Converse Proof of Theorem 22.1


Consider any sequence of (2nR , n) codes such that Pe(n) tends to zero and RL(n) ≤ є as
n → ∞. Then, by Fano’s inequality, H(M|Y n ) ≤ nєn for some єn that tends to zero as
n → ∞. Now consider

nR ≤ I(M; Y^n) + nє_n
   = I(M; Y^n) − I(M; Z^n) + nR_L^(n) + nє_n
   ≤ ∑_{i=1}^n (I(M; Y_i |Y^{i−1}) − I(M; Z_i |Z_{i+1}^n)) + n(є + є_n)
(a) = ∑_{i=1}^n (I(M, Z_{i+1}^n; Y_i |Y^{i−1}) − I(M, Y^{i−1}; Z_i |Z_{i+1}^n)) + n(є + є_n)
(b) = ∑_{i=1}^n (I(M; Y_i |Y^{i−1}, Z_{i+1}^n) − I(M; Z_i |Y^{i−1}, Z_{i+1}^n)) + n(є + є_n)
(c) = ∑_{i=1}^n (I(U_i; Y_i |V_i) − I(U_i; Z_i |V_i)) + n(є + є_n)
(d) = n(I(U; Y |V) − I(U; Z |V)) + n(є + є_n)
   ≤ n max_v (I(U; Y |V = v) − I(U; Z |V = v)) + n(є + є_n)
(e) ≤ nCS + n(є + є_n),

where (a) and (b) follow by the Csiszár sum identity in Section ., (c) follows by identify-
ing the auxiliary random variables V_i = (Y^{i−1}, Z_{i+1}^n) and U_i = (M, V_i), and (d) follows by
introducing a time-sharing random variable Q and defining U = (U_Q, Q), V = (V_Q, Q),
Y = Y_Q, and Z = Z_Q. Step (e) follows since U → X → (Y, Z) form a Markov chain given
{V = v}. The bound on the cardinality of U can be proved using the convex cover method
in Appendix C. This completes the proof of the converse.

22.1.3 Extensions
The wiretap channel model and its secrecy capacity can be extended in several directions.
Secrecy capacity for multiple receivers or eavesdroppers. The secrecy capacity for more
than one receiver and/or eavesdropper is not known. We discuss a lower bound on the
secrecy capacity of a 2-receiver 1-eavesdropper wiretap channel that involves several ideas
beyond wiretap channel coding. Consider a DM-WTC p(y1 , y2 , z|x) with sender X, two
legitimate receivers Y1 and Y2 , and a single eavesdropper Z. The sender wishes to commu-
nicate a common message M ∈ [1 : 2nR ] reliably to both receivers Y1 and Y2 while keeping
it asymptotically secret from the eavesdropper Z, i.e., limn→∞ (1/n)I(M; Z n ) = 0.
A straightforward extension of Theorem . to this case yields the lower bound
CS ≥ max_{p(u,x)} min{I(U; Y1) − I(U; Z), I(U; Y2) − I(U; Z)}. (.)

Now suppose that Z is a degraded version of Y1 . Then from (.), I(U ; Y1 ) − I(U ; Z) ≤
I(X; Y1 ) − I(X; Z) for all p(u, x), with equality if U = X. However, no such inequality
holds in general for the second term; hence taking U = X does not attain the maximum
in (.) in general. Using the satellite codebook idea in Remark . and indirect decod-
ing in Section ., we can obtain the following tighter lower bound.

Proposition .. The secrecy capacity of the -receiver -eavesdropper DM-WTC


p(y1 , y2 , z|x) is lower bounded as

CS ≥ max min󶁁I(X; Y1 ) − I(X; Z), I(U ; Y2 ) − I(U ; Z)󶁑.


p(u,x)

To prove achievability of this lower bound, for each message m ∈ [1 : 2^{nR}], gener-
ate a subcodebook C(m) consisting of 2^{n(R̃−R)} randomly and independently generated se-
quences u^n(l0), l0 ∈ [(m − 1)2^{n(R̃−R)} + 1 : m2^{n(R̃−R)}], each according to ∏_{i=1}^n p_U(u_i). For
each l0, conditionally independently generate 2^{nR1} sequences x^n(l0, l1), l1 ∈ [1 : 2^{nR1}], each
according to ∏_{i=1}^n p_{X|U}(x_i |u_i(l0)). To send m ∈ [1 : 2^{nR}], the encoder chooses an index
pair (L0, L1) uniformly at random from [(m − 1)2^{n(R̃−R)} + 1 : m2^{n(R̃−R)}] × [1 : 2^{nR1}], and
transmits x n (L0 , L1 ). Receiver Y2 decodes for the message directly through U and receiver
Y1 decodes for the message indirectly through (U , X) as detailed in Section .. Then the
probability of error tends to zero as n → ∞ if
R̃ < I(U ; Y2 ) − δ(є),
R̃ + R1 < I(X; Y1 ) − δ(є).
It can be further shown that M is kept asymptotically secret from the eavesdropper Z if
R̃ − R > I(U ; Z) + δ(є),
R1 > I(X; Z |U ) + δ(є).
Eliminating R̃ and R1 by the Fourier–Motzkin procedure in Appendix D, we obtain the
lower bound in Proposition ..

Broadcast channel with common and confidential messages. Consider a DM-BC


p(y, z|x), where the sender wishes to communicate a common message M0 reliably to
both receivers Y and Z in addition to communicating a confidential message M1 to Y
and keeping it partially secret from Z. A (2nR0 , 2nR1 , n) code for the DM-BC with com-
mon and confidential messages consists of
∙ two message sets [1 : 2nR0 ] and [1 : 2nR1 ],
∙ an encoder that randomly assigns a codeword X n (m0 , m1 ) to each (m0 , m1 ) according
to a conditional pmf p(x n |m0 , m1 ), and
∙ two decoders, where decoder 1 assigns an estimate (m̂01, m̂11) ∈ [1 : 2^{nR0}] × [1 : 2^{nR1}]
or an error message e to each sequence y^n, and decoder 2 assigns an estimate m̂02 ∈
[1 : 2^{nR0}] or an error message e to each sequence z^n.

Assume that the message pair (M0 , M1 ) is uniformly distributed over [1 : 2nR0 ] × [1 : 2nR1 ].
The average probability of error is defined as P_e^(n) = P{(M̂01, M̂1) ≠ (M0, M1) or M̂02 ≠
M0}. The information leakage rate at receiver Z is defined as R_L^(n) = (1/n)I(M1; Z^n). A
rate–leakage triple (R0 , R1 , RL ) is said to be achievable if there exists a sequence of codes
such that limn→∞ Pe(n) = 0 and lim supn→∞ RL(n) ≤ RL . The rate–leakage region is the clo-
sure of the set of achievable rate triples (R0 , R1 , RL ).

Theorem .. The rate–leakage region of the -receiver DM-BC p(y, z|x) is the set
of rate triples (R0 , R1 , RL ) such that

R0 ≤ min{I(U ; Z), I(U ; Y)},


R0 + R1 ≤ I(U ; Z) + I(V ; Y |U ),
R0 + R1 ≤ I(V ; Y),
RL ≤ min{R1, [I(V; Y |U) − I(V; Z |U)]⁺}

for some pmf p(u)p(v|u)p(x|v) with |U| ≤ |X| + 3, |V| ≤ (|X| + 1)(|X| + 3), where
[a]⁺ = max{a, 0}.

The proof of achievability uses superposition coding and randomized encoding for M1
as in the achievability proof of Theorem .. The proof of the converse follows similar
steps to the converse proofs for the DM-BC with degraded message sets in Section . and
for the DM-WTC in Section ...

22.2 CONFIDENTIAL COMMUNICATION VIA SHARED KEY

If the channel to the eavesdropper is less noisy than the channel to the receiver, no secret
communication can take place at a positive rate without a secret key shared between the
sender and the receiver. How long must this secret key be to ensure secrecy?

To answer this question, we consider the secure communication system depicted in


Figure ., where the sender wishes to communicate a message M to the receiver over a
public DMC p(y|x) in the presence of an eavesdropper who observes the channel output.
The sender and the receiver share a secret key K, which is unknown to the eavesdropper,
and use the key to keep the message secret from the eavesdropper. We wish to find the
optimal tradeoff between the secrecy capacity and the rate of the secret key.

Figure .. Secure communication over a public channel: the encoder maps M to X^n, the DMC p(y|x) delivers Y^n to the decoder (which forms M̂), and the eavesdropper observes the channel output.

A (2nR , 2nRK , n) secrecy code for the DMC consists of


∙ a message set [1 : 2nR ] and a key set [1 : 2nRK ],
∙ a randomized encoder that generates a codeword X n (m, k) ∈ X n according to a con-
ditional pmf p(x n |m, k) for each message–key pair (m, k) ∈ [1 : 2nR ] × [1 : 2nRK ], and
∙ a decoder that assigns an estimate m̂ ∈ [1 : 2nR ] or an error message e to each received
sequence y n ∈ Y n and key k ∈ [1 : 2nRK ].
We assume that the message–key pair (M, K) is uniformly distributed over [1 : 2nR ] ×
[1 : 2nRK ]. The information leakage rate associated with the (2nR , 2nRK , n) secrecy code is
RL(n) = (1/n)I(M; Y n ). The average probability of error and achievability are defined as for
the DM-WTC. The rate–leakage region R ∗ is the set of achievable rate triples (R, RK , RL ).
The secrecy capacity with key rate RK is defined as CS (RK ) = max{R : (R, RK , 0) ∈ R ∗ }.
The secrecy capacity is upper bounded by the key rate until saturated by the channel
capacity.

Theorem .. The secrecy capacity of the DMC p(y|x) with key rate RK is

CS(RK) = min{RK, max_{p(x)} I(X; Y)}.

To prove achievability for this theorem, assume without loss of generality that R = RK .
The sender first encrypts the message M by taking the bitwise modulo-2 sum L of the bi-
nary expansions of M and K (referred to as a one-time pad), and transmits the encrypted

message L using an optimal channel code for the DMC. If R < CS , the receiver can re-
cover L and decrypt it to recover M. Since M and L are independent, M and Y n are also
independent. Thus, I(M; Y n ) = 0 and we can achieve perfect secrecy.
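
A minimal sketch of the one-time pad step is given below (the block length is illustrative); it checks that decryption with the key recovers M and that, for a fixed message, the encrypted index ranges uniformly over all values as the key varies.

```python
import random

nR = 8                                  # message and key length in bits
random.seed(1)

m = random.getrandbits(nR)              # message M
k = random.getrandbits(nR)              # shared secret key K
l = m ^ k                               # one-time pad: L = M xor K

# The receiver, knowing K, decrypts perfectly.
assert l ^ k == m

# For a fixed m, ranging over all keys produces every value of L exactly once,
# so L (hence Y^n) carries no information about M.
values = {m ^ key for key in range(2**nR)}
print(len(values) == 2**nR)             # True
```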
To prove the converse, consider any sequence of (2nR , 2nRK , n) codes such that Pe(n)
tends to zero and RL(n) ≤ є as n → ∞. Then we have

nR ≤ I(M; Y n , K) + nєn
= I(M; Y n |K) + nєn
≤ I(M, K ; Y n ) + nєn
≤ I(X n ; Y n ) + nєn
≤ n max I(X; Y) + nєn .
p(x)

Furthermore,

nR ≤ H(M |Y n ) + I(M; Y n )
(a)
≤ H(M |Y n ) + nє
≤ H(M, K |Y n ) + nє
= H(K |Y n ) + H(M |Y n , K) + nє
(b)
≤ nRK + n(є + єn ),

where (a) follows by the secrecy constraint and (b) follows by Fano’s inequality. This com-
pletes the proof of Theorem ..
Wiretap channel with secret key. A secret key between the sender and the receiver can
be combined with wiretap channel coding to further increase the secrecy capacity. If RK
is the rate of the secret key, then the secrecy capacity of a DM-WTC p(y, z|x) is

CS(RK) = max_{p(u,x)} min{I(U; Y) − I(U; Z) + RK, I(U; Y)}. (.)

Such a secret key can be generated via interactive communication. For example, if noise-
less causal feedback is available from the receiver to the sender but not to the eavesdropper,
then it can be distilled into a secret key; see Problem .. In the following, we show that
even nonsecure interaction can be used to generate a secret key.

22.3 SECRET KEY AGREEMENT: SOURCE MODEL

Suppose that the sender and the receiver observe correlated sources. Then it turns out that
they can agree on a secret key through interactive communication over a public channel
that the eavesdropper has complete access to. We first discuss this key agreement scheme
under the source model.

Consider a network with two sender–receiver nodes, an eavesdropper, and a 3-DMS
(X1, X2, Z) as depicted in Figure .. Node 1 observes the DMS X1, node 2 observes the
DMS X2, and the eavesdropper observes the DMS Z. Nodes 1 and 2 communicate over a
noiseless public broadcast channel that the eavesdropper has complete access to with the
goal of agreeing on a key that the eavesdropper has almost no information about. We wish
to find the maximum achievable secret key rate. We assume that the nodes communicate
in a round robin fashion over q rounds as in the interactive source coding setup studied
in Section .. Assume without loss of generality that q is even and that node  transmits
during the odd rounds l = 1, 3, . . . , q − 1 and node  transmits during the even rounds.
A (2nr1 , . . . , 2nr󰑞 , n) key agreement code consists of

∙ two randomized encoders, where in odd rounds l = 1, 3, . . . , q − 1, encoder 1 gener-
ates an index M_l ∈ [1 : 2^{nr_l}] according to a conditional pmf p(m_l |x1^n, m^{l−1}) (that is, a
random mapping given its source vector and all previously transmitted indices), and
in even rounds l = 2, 4, . . . , q, encoder 2 generates an index M_l ∈ [1 : 2^{nr_l}] according
to p(m_l |x2^n, m^{l−1}), and
∙ two decoders, where decoder j = 1, 2 generates a key K j according to a conditional
pmf p(k j |mq , x nj ) (that is, a random mapping given its source sequence and all received
indices).

The probability of error for the key agreement code is defined as Pe(n) = P{K1 ̸= K2 }. The
key leakage rate is defined as RL(n) = max j∈{1,2} (1/n)I(K j ; Z n , M q ). A key rate–leakage pair
(R, RL ) is said to be achievable if there exists a sequence of (2nr1 , 2nr2 , . . . , 2nr󰑞 , n) codes
such that limn→∞ Pe(n) = 0, lim supn→∞ RL(n) ≤ RL , and lim inf n→∞ (1/n)H(K j ) ≥ R for
j = 1, 2. The key rate–leakage region R ∗ is the closure of the set of achievable rate–
leakage pairs (R, RL ). As in the wiretap channel, we focus on the secret key capacity
CK = max{R : (R, 0) ∈ R ∗ }. The secret key capacity is not known in general.

Figure .. Source model for secret key agreement: node 1 (observing X1^n) and node 2 (observing X2^n) exchange indices M_l(X1^n, M^{l−1}) and M_{l+1}(X2^n, M^l) over a public channel, while the eavesdropper observes Z^n and all the indices.

22.3.1 Secret Key Agreement from One-Way Communication


We first consider the case where the public communication is from node 1 to node 2 only,
i.e., q = 1. We establish the following single-letter characterization of the one-round secret
key capacity.

Theorem .. The one-round secret key capacity for the -DMS (X1 , X2 , Z) is

CK(1) = max 󶀡I(V ; X2 |U ) − I(V ; Z |U )󶀱,


p(u,󰑣|x1 )

where |U| ≤ |X1 | and |V| ≤ |X1 |.

Note that the auxiliary random variable U can be taken as a function of V with cardi-
nality bounds |U| ≤ |X1| and |V| ≤ |X1|². Also note that if X1 → X2 → Z form a Markov
chain, then the one-round secret key capacity simplifies to C_K^(1) = I(X1; X2 |Z).
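
As a small worked example of this degraded case (the binary Markov chain below is our own choice, not from the text), the following sketch computes C_K^(1) = I(X1; X2|Z) = H(X1|Z) − H(X1|X2).

```python
import math

def h2(p):
    return 0.0 if p in (0.0, 1.0) else -p*math.log2(p) - (1-p)*math.log2(1-p)

def conv(a, b):
    """Binary convolution p * q = p(1-q) + (1-p)q."""
    return a*(1-b) + (1-a)*b

# Toy Markov chain X1 -> X2 -> Z: X1 ~ Bern(1/2), X2 = X1 + N1 (BSC(p)), Z = X2 + N2 (BSC(q)).
p, q = 0.1, 0.2
H_X1_given_Z = h2(conv(p, q))     # X1 and Z differ with probability p * q
H_X1_given_X2 = h2(p)
print(f"C_K^(1) = I(X1;X2|Z) = {H_X1_given_Z - H_X1_given_X2:.4f} bits")
```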
Remark . (One-way secret key agreement). Suppose that the public communication
is either from node  to node  or vice versa, that is, M q = (M1 , ) or (, M2 ). By changing
the roles of nodes  and  in Theorem ., it can be readily shown that the one-way secret
key capacity is

CK-OW = max󶁁 max 󶀡I(V ; X2 |U ) − I(V ; Z |U )󶀱,


p(u,󰑣|x1 )
(.)
max 󶀡I(V ; X1 |U ) − I(V ; Z |U )󶀱󶁑.
p(u,󰑣|x2 )

Clearly, CK(1) ≤ CK-OW ≤ CK . As we show in Example ., these inequalities can be strict.

Converse proof of Theorem .. Consider

nR ≤ H(K1)
(a) ≤ H(K1) − H(K1 |X2^n, M) + nє_n
    = I(K1; X2^n, M) + nє_n
(b) ≤ I(K1; X2^n, M) − I(K1; Z^n, M) + n(є + є_n)
    = I(K1; X2^n |M) − I(K1; Z^n |M) + n(є + є_n)
(c) = ∑_{i=1}^n (I(K1; X_{2i} |M, X_2^{i−1}, Z_{i+1}^n) − I(K1; Z_i |M, X_2^{i−1}, Z_{i+1}^n)) + n(є + є_n)
(d) = ∑_{i=1}^n (I(V_i; X_{2i} |U_i) − I(V_i; Z_i |U_i)) + n(є + є_n)
(e) = n(I(V; X2 |U) − I(V; Z |U)) + n(є + є_n),

where (a) follows by Fano’s inequality, (b) follows by the secrecy constraint, (c) follows by
the converse proof of Theorem ., (d) follows by identifying U_i = (M, X_2^{i−1}, Z_{i+1}^n) and
V_i = (K1, U_i), and (e) follows by introducing a time-sharing random variable Q and defin-
ing U = (U_Q, Q), V = (V_Q, Q), X1Q = X1, X2Q = X2, and ZQ = Z. Finally, observing that
the pmf p(u, v|x1, x2, z) = p(v|x1)p(u|v) completes the converse proof of Theorem ..

22.3.2 Achievability Proof of Theorem 22.4


We first consider the special case of U = ∅ and V = X1, and then generalize the result.
Assume that I(X1 ; X2 ) − I(X1 ; Z) = H(X1 |Z) − H(X1 |X2 ) > 0. To generate the key, we
use the double random binning scheme illustrated in Figure ..

Figure .. Key generation scheme: the sequences in X1^n are partitioned into bins B(1), B(2), . . . , B(2^{nR̃}), and each bin is further partitioned into 2^{nR} sub-bins indexed by k ∈ [1 : 2^{nR}]. The black circles correspond to (x1^n, x2^n) ∈ T_є^(n) and the white circles correspond to (x1^n, z^n) ∈ T_є^(n).

Codebook generation. Randomly and independently partition the set of sequences X1^n
into 2^{nR̃} bins B(m), m ∈ [1 : 2^{nR̃}]. Randomly and independently partition the sequences
in each nonempty bin B(m) into 2^{nR} sub-bins B(m, k), k ∈ [1 : 2^{nR}]. The bin assignments
are revealed to all parties.
Encoding. Given a sequence x1n , node 1 finds the index pair (m, k) such that x1n ∈ B(m, k).
It then sends the index m to both node 2 and the eavesdropper.
Decoding and key generation. Node 1 sets its key K1 = k. Upon receiving m, node 2
finds the unique sequence x̂1^n ∈ B(m) such that (x̂1^n, x2^n) ∈ T_є^(n). If such a unique sequence
exists, then it sets K2 = k̂ such that x̂1^n ∈ B(m, k̂). If there is no such sequence or there is
more than one, it sets K2 = 1.
Analysis of the probability of error. We analyze the probability of error averaged over
(X1n , X2n ) and the random bin assignments. Since node  makes an error only if X̂ 1n ̸= X1n ,
P(E) = P{K1 ̸= K2 } ≤ P{ X̂ 1n ̸= X1n },
which, by the achievability proof of the Slepian–Wolf theorem, tends to zero as n → ∞ if
R̃ > H(X1 |X2 ) + δ(є).
Analysis of the key rate. We first establish the following bound on the key rate averaged
over the random bin assignments C.

Lemma .. If R < H(X1 ) − 4δ(є), then lim inf n→∞ (1/n)H(K1 |C) ≥ R − δ(є).

The proof of this lemma is in Appendix B.


Next we show that H(K2 |C) is close to H(K1 |C). By Fano’s inequality,
H(K2 |C) = H(K1 |C) + H(K2 |K1 , C) − H(K1 |K2 , C) ≤ H(K1 |C) + 1 + nR P(E)

and similarly H(K2 |C) ≥ H(K1 |C) − 1 − nR P(E). Thus, if R̃ > H(X1 |X2 ) + δ(є),
lim_{n→∞} (1/n) |H(K2 |C) − H(K1 |C)| = 0, (.)
hence by Lemma ., lim inf n→∞ (1/n)H(K2 | C) ≥ R − δ(є).
Analysis of the key leakage rate. Consider the key leakage rate averaged over C. Follow-
ing a similar argument to (.), we only need to consider the K1 term

I(K1 ; Z n , M |C) = I(K1 , X1n ; Z n , M |C) − I(X1n ; Z n , M |K1 , C)


= I(X1n ; Z n , M |C) − H(X1n |K1 , C) + H(X1n |Z n , M, K1 , C)
= H(X1n ) − H(X1n |Z n , M, C) − H(X1n |K1 , C) + H(X1n |Z n , M, K1 , C).

Now
H(X1n |Z n , M, C) = H(X1n |Z n ) + H(M |Z n , X1n , C) − H(M |Z n , C)
= H(X1n |Z n ) − H(M |Z n , C)
≥ H(X1n |Z n ) − H(M)
= n(H(X1 |Z) − R̃)

and
H(X1n |K1 , C) = H(X1n ) + H(K1 | X1n , C) − H(K1 |C)
= H(X1n ) − H(K1 |C)
≥ n󶀡H(X1 ) − R󶀱.

Substituting, we have

I(K1 ; Z n , M |C) ≤ n󶀡R̃ + R − H(X1 |Z)󶀱 + H(X1n |Z n , M, K1 , C). (.)


We bound the remaining term in the following.

Lemma .. If R < H(X1 |Z) − H(X1 |X2 ) − 2δ(є), then


lim sup_{n→∞} (1/n) H(X1^n |Z^n, M, K1, C) ≤ H(X1 |Z) − R̃ − R + δ(є).

The proof of this lemma is in Appendix C.


Substituting from this lemma into the bound in (.) and taking limits shows that
(1/n)I(K1 ; Z n , M | C) ≤ δ(є) as n → ∞ if R < H(X1 |Z) − H(X1 |X2 ) − 2δ(є).
To summarize, we have shown that if R < H(X1 |Z) − H(X1 |X2 ) − 2δ(є), then P(E)
tends to zero as n → ∞ and for j = 1, 2,
lim inf_{n→∞} (1/n) H(K_j |C) ≥ R − δ(є),
lim sup_{n→∞} (1/n) I(K_j; Z^n, M |C) ≤ δ(є).

Therefore, there exists a sequence of key agreement codes such that Pe(n) tends to zero,
H(K j ) ≥ R − δ(є), j = 1, 2, and RL(n) ≤ δ(є) as n → ∞.
Finally, to complete the proof of achievability for a general (U , V ), we modify code-
book generation as follows. Node  generates (U n , V n ) according to ∏ni=1 p(ui , 󰑣i |x1i ).
Node  then sends U n over the public channel. Following the above proof with V in place
of X1 at node , (U , X2 ) at node 2, and (U , Z) at the eavesdropper proves the achievability
of R < I(V ; X2 , U ) − I(V ; Z, U) = I(V ; X2 |U) − I(V ; Z|U ).
Remark . (One-round secret key agreement with rate constraint). As in the wire-
tap channel case (see Remark .), instead of generating (U n , V n ) randomly, we can
generate codewords u^n(l), v^n(l, l′) and use Wyner–Ziv coding to send the indices. Thus
the one-round secret key capacity under a rate constraint r1 ≤ R1 on the public discussion
link is lower bounded as

C_K^(1)(R1) ≥ max (I(V; X2 |U) − I(V; Z |U)), (.)

where the maximum is over all conditional pmfs p(u, v|x1) such that R1 ≥ I(U, V; X1) −
I(U , V ; X2 ). It can be shown that this lower bound is tight, establishing the secret key
capacity for this case.

22.3.3 Lower Bound on the Secret Key Capacity


The secret key capacity for general interactive communication is not known. We establish
the following lower bound on the q-round secrecy capacity.

Theorem .. The q-round secret key capacity for the -DMS (X1 , X2 , Z) is lower
bounded as
q
(q)
CK ≥ max 󵠈󶀡I(Ul ; X j󰑙+1 |U l−1 ) − I(Ul ; Z |U l−1 )󶀱
l=1
q
= max󶀤H(U |Z) − 󵠈 H(Ul |U l−1 , X j󰑙+1 )󶀴,
q

l=1

q
where the maximum is over all conditional pmfs ∏ l=1 p(u l |u l−1 , x j󰑙 ) and j l = 1 if l is
odd and j l = 2 if l is even.

This lower bound can be strictly tighter than the one-way secret key capacity in (.),
as demonstrated in the following.
Example .. Let X1 = (X11 , X12 ), where X11 and X12 are independent Bern(1/2) ran-
dom variables. The joint conditional pmf for X2 = (X21 , X22 ) and Z = (Z1 , Z2 ) is defined
in Figure ..
By setting U1 = X11 and U2 = X21 in Theorem ., interactive communication can

Figure .. DMS for key agreement in Example .; p, є ∈ (0, 1).

achieve the secret key rate

I(X1 ; X2 |Z) = I(X11 ; X21 |Z1 ) + I(X12 ; X22 |Z2 ), (.)

which will be shown to be optimal in Theorem .. This, however, is strictly larger than
the one-round secrecy key capacity CK(1) .
To show this, first note that the one-round secret key capacity from X1 to X2 in The-
orem . depends only on the marginal pmfs p(x1 , x2 ) and p(x1 , z). As shown in Fig-
ure ., (X1 , X2 ) = (X11 , X12 , X21 , X22 ) has the same joint pmf as (X1 , X2󳰀 ) = (X11 , X12 ,
X21 , T2 ). Furthermore, X1 → X2󳰀 → Z form a Markov chain. Thus, the one-round secret
key capacity is

CK(1) = I(X1 ; X2󳰀 |Z) = I(X11 ; X21 |Z1 ) + I(X12 ; T2 |Z2 ).

But this rate is strictly less than the interactive communication rate in (.), since

I(X12 ; T2 |Z2 ) = I(X12 ; T2 ) − I(X12 ; Z2 )


< I(X12 ; T2 ) − I(X22 ; Z2 )
= I(X12 ; X22 ) − I(X22 ; Z2 )
= I(X12 ; X22 |Z2 ).

Similarly, note that (X1 , X2 ) = (X11 , X12 , X21 , X22 ) has the same joint pmf as (X1󳰀 , X2 ) =
(T1 , X12 , X21 , X22 ), X2 → X1󳰀 → Z form a Markov chain, and I(X11 ; Z1 ) < I(X21 ; Z1 ).
Hence, the one-round secret key capacity from X2 to X1 is strictly less than the interactive
communication rate in (.).
Thus, as we have shown for channel coding in Chapter , source coding in Chapter ,
and coding for computing in Chapter , interaction can also improve the performance
of coding for secrecy.
Proof of Theorem . (outline). Fix the pmf ∏_{l=1}^q p(u_l |u^{l−1}, x_{j_l}), j_l = 1 if l is odd and
j_l = 2 if l is even, that attains the maximum in the lower bound in Theorem .. For each
l ∈ [1 : q], randomly and independently assign each sequence u_l^n ∈ U_l^n to one of 2^{nR̃_l} bins

B_l(m_l), m_l ∈ [1 : 2^{nR̃_l}]. For each message sequence (m_1, . . . , m_q) ∈ [1 : 2^{nR̃_1}] × ⋅ ⋅ ⋅ × [1 :
2^{nR̃_q}], randomly and independently assign each sequence (u_1^n, . . . , u_q^n) ∈ B_1(m_1) × ⋅ ⋅ ⋅ ×
B_q(m_q) to one of 2^{nR} sub-bins B(m_1, . . . , m_q, k), k ∈ [1 : 2^{nR}].
The coding procedure for each odd round l is as follows. Assuming that node 1 has
(û_2^n, û_4^n, . . . , û_{l−1}^n) as the estimate of the sequences generated by node 2, node 1 generates a
sequence u_l^n according to ∏_{i=1}^n p(u_{li} |u_{1i}, û_{2i}, u_{3i}, . . . , û_{l−1,i}, x_{1i}). Node 1 sends the index
m_l such that u_l^n ∈ B_l(m_l) to node 2. Upon receiving m_l, node 2 finds the unique sequence
û_l^n ∈ B_l(m_l) such that (û_1^n, u_2^n, . . . , û_l^n, x_2^n) ∈ T_є^(n). The same procedure is repeated for the
next round with the roles of nodes 1 and 2 interchanged. By the Slepian–Wolf theorem and
induction, the probability of error tends to zero as n → ∞ if R̃_l > H(U_l |U^{l−1}, X_{j_{l+1}}) + δ(є).
At the end of round q, if the unique set of sequences (u_1^n, û_2^n, . . . , u_{q−1}^n, û_q^n) is avail-
able at node 1, it finds the sub-bin index k̂ to which the set of sequences belongs and sets
K1 = k̂. Otherwise it sets K1 to a random index in [1 : 2^{nR}]. Node 2 generates K2 similarly.
The key rates and the eavesdropper’s key leakage rate can be analyzed by following sim-
ilar steps to the one-way case in Section ... If R̃_l > H(U_l |U^{l−1}, X_{j_{l+1}}) + δ(є) and R +
∑_{l=1}^q R̃_l < H(U^q |Z), then H(K1 |C), H(K2 |C) ≥ n(R − є_n) and (1/n)I(K1; Z^n, M^q |C) ≤
δ(є) as n → ∞. This completes the proof of achievability.


Remark .. As in the one-way communication case, the key rate may improve by first
exchanging U1 , U2 , . . . , Uq󳰀 −1 for some q 󳰀 ≤ q and then conditionally double binning
Uq󳰀 , Uq󳰀 +1 , . . . , Uq to generate the key. This yields the lower bound
q
(q)
CK ≥ max 󵠈 󶀡I(Ul ; X j󰑙+1 |U l−1 ) − I(U l ; Z |U l−1 )󶀱,
l=q 󳰀

where the maximum is over all conditional pmfs ∏ l=1 p(u l |u l−1 , x j󰑙 ) and q 󳰀 ∈ [1 : q].
q

22.3.4 Upper Bound on the Secret Key Capacity


We establish the following general upper bound on the secret key capacity.

Theorem .. The secret key capacity for the -DMS (X1 , X2 , Z) is upper bounded as

̄
CK ≤ min I(X1 ; X2 | Z),
p(̄z |z)

̄ ≤ |Z|.
where |Z|

This bound is tight when X1 → X2 → Z or X2 → X1 → Z, in which case CK =


I(X1 ; X2 |Z). The proof follows immediately from the lower bound on the one-round se-
cret key capacity in Theorem . and the Markov condition. In particular, consider the
following interesting special cases.
∙ Z is known everywhere: Suppose that Z is a common part between X1 and X2. Then nodes 1 and 2 know the eavesdropper's sequence Z^n and the secret key capacity is CK = I(X1; X2 | Z).

∙ Z = ∅: Suppose that the eavesdropper has access to the public communication but does not have correlated prior information. The secret key capacity for this case is CK = I(X1; X2). This is in general strictly larger than the Gács–Körner–Witsenhausen common information K(X1; X2), which is the maximum amount of common randomness nodes 1 and 2 can agree on without public information; see Section .. and Problem ..
The theorem also shows that the interactive secret key rate in (.) for Example .
is optimal.
In all the cases above, the upper bound in Theorem . is tight with Z̄ = Z. In the
following, we show that the upper bound can be strictly tighter with Z̄ ̸= Z.
Example .. Let X1 and X2 be independent Bern(1/2) sources and Z = X1 ⊕ X2 . Then
I(X1 ; X2 |Z) = 1. However, I(X1 ; X2 ) = 0. By taking Z̄ = ∅, the upper bound in Theo-
rem . is tight and yields CK = 0.

Proof of Theorem .. We first consider the case where the encoding and key generation
mappings are deterministic and establish the upper bound on the key rate R ≤ I(X1 ; X2 |Z).
We then generalize the proof for arbitrary Z̄ and randomized codes. Given a sequence of
deterministic codes such that limn→∞ Pe(n) = 0 and

max_{j∈{1,2}} lim sup_{n→∞} (1/n) I(K_j; Z^n, M^q) = 0,

we establish an upper bound on the key rate min{(1/n)H(K1 ), (1/n)H(K2 )}. By Fano’s
inequality, it suffices to provide a bound only on (1/n)H(K1 ). Consider

H(K1 ) ≤ H(K1 |M q , Z n ) + I(K1 ; M q , Z n )


(a)
≤ H(K1 |M q , Z n ) + nє
= H(K1 , X1n |M q , Z n ) − H(X1n |M q , K1 , Z n ) + nє
= H(X1n |M q , Z n ) + H(K1 | X1n , M q , Z n ) − H(X1n |M q , K1 , Z n ) + nє
(b)
≤ H(X1n |M q , Z n ) − H(X1n |M q , K1 , X2n , Z n ) + nє
= H(X1n |M q , Z n ) − H(X1n , K1 |M q , X2n , Z n ) + H(K1 |M q , X2n , Z n ) + nє
(c)
= H(X1n |M q , Z n ) − H(X1n |M q , X2n , Z n ) + H(K1 |M q , X2n , Z n ) + nє
(d)
≤ I(X1n ; X2n |M q , Z n ) + H(K1 |K2 ) + nє
(e)
≤ I(X1n ; X2n |M q , Z n ) + n(є + єn ),

where (a) follows by the secrecy constraint, (b) and (c) follow since H(K1 |M q , X1n ) = 0,
(d) follows since H(K1 |M q , X2n , Z n ) = H(K1 |M q , K2 , X2n , Z n ) ≤ H(K1 |K2 ), and finally (e)
follows by the condition that P{K1 ̸= K2 } ≤ єn and Fano’s inequality.
Next, since q is even by convention and encoding is deterministic by assumption, H(M_q | X_2^n, M^{q−1}) = 0. Hence

I(X1n ; X2n |M q , Z n ) = H(X1n |M q , Z n ) − H(X1n |M q , X2n , Z n )


= H(X1n |M q , Z n ) − H(X1n |M q−1 , X2n , Z n )
≤ H(X1n |M q−1 , Z n ) − H(X1n |M q−1 , X2n , Z n )
= I(X1n ; X2n |Z n , M q−1 ).

Using the same procedure, we expand I(X1n ; X2n |M q−1 , Z n ) in the other direction to obtain

I(X1n ; X2n |M q−1 , Z n ) = H(X2n |M q−1 , Z n ) − H(X2n |M q−1 , X1n , Z n )


≤ I(X1n ; X2n |M q−2 , Z n ).

Repeating this procedure q times, we obtain I(X1n ; X2n |M q , Z n ) ≤ I(X1n ; X2n |Z n ) =


nI(X1 ; X2 |Z). Substituting and taking n → ∞ shows that R = lim supn→∞ (1/n)H(K1 ) ≤
I(X1 ; X2 |Z) + є for every є > 0 and hence R ≤ I(X1 ; X2 |Z).
We can readily improve the bound by allowing the eavesdropper to pass Z through an arbitrary channel p(z̄|z) to generate a new random variable Z̄. However, any secrecy code for the original system with (X1, X2, Z) is also a secrecy code for (X1, X2, Z̄), since

I(K1; Z̄, M^q) = H(K1) − H(K1 | Z̄, M^q)
≤ H(K1) − H(K1 | Z̄, Z, M^q)
= I(K1; M^q, Z).

This establishes the upper bound R ≤ I(X1; X2 | Z̄) for every p(z̄|z).
Finally, we consider the case with randomized encoding and key generation. Note that
randomization is equivalent to having random variables W1 at node 1 and W2 at node 2
that are independent of each other and of (X1n , X2n , Z n ) and performing encoding and
key generation using deterministic functions of the source sequence, messages, and the
random variable at each node. Using this equivalence and the upper bound on the key
rate for deterministic encoding obtained above, the key rate for randomized encoding is
upper bounded as

R ≤ min_{p(z̄|z)} sup_{p(w1)p(w2)} I(X1, W1; X2, W2 | Z̄)
= min_{p(z̄|z)} sup_{p(w1)p(w2)} ( I(X1, W1; W2 | Z̄) + I(X1, W1; X2 | Z̄, W2) )
= min_{p(z̄|z)} sup_{p(w1)p(w2)} I(X1, W1; X2 | Z̄)
= min_{p(z̄|z)} sup_{p(w1)p(w2)} I(X1; X2 | Z̄)
= min_{p(z̄|z)} I(X1; X2 | Z̄).

This completes the proof of Theorem ..



22.3.5 Extensions to Multiple Nodes


Consider a network with N nodes, an eavesdropper, and an (N + 1)-DMS (X1 , . . . , XN , Z).
Node j ∈ [1 : N] observes source X j and the eavesdropper observes Z. The nodes commu-
nicate over a noiseless public broadcast channel in a round robin fashion over q rounds,
where q is divisible by N, such that node j ∈ [1 : N] communicates in rounds j, N + j, . . . ,
q − N + j. A (2nr1 , . . . , 2nr󰑞 , n) code for key agreement consists of
∙ a set of encoders, where in round l j ∈ { j, N + j, . . . , q − N + j}, encoder j ∈ [1 : N]
generates an index M l 󰑗 ∈ [1 : 2 󰑙 󰑗 ] according to a conditional pmf p(m l 󰑗 |x nj , m l 󰑗 −1 ),
nr

and
∙ a set of decoders, where decoder j ∈ [1 : N] generates a key K j according to a condi-
tional pmf p(k j |mq , x nj ).

The probability of error for the key agreement code is defined as Pe(n) = 1 − P{K1 = K2 =
⋅ ⋅ ⋅ = KN }. The key leakage rate, achievability, key rate–leakage region, and secret key
capacity are defined as for the -node case in the beginning of this section. Consider the
following bounds on the secret key capacity.
Lower bound. The lower bound on the secret key capacity for  nodes in Theorem .
can be extended to N nodes.

Proposition .. The q-round secret key capacity for the (N + 1)-DMS (X1 , . . . ,
XN , Z) is lower bounded as
q
(q)
CK ≥ max 󵠈 󶀤 min I(Ul ; X j |U l−1 ) − I(Ul ; Z |U l−1 )󶀴,
j∈[1:N]
l=q󳰀

where the maximum is over all conditional pmfs ∏Nj=1 ∏ l 󰑗 p(u l 󰑗 |u l 󰑗 −1 , x j ) and q󳰀 ∈
[1 : q].

Upper bound. The upper bound on the secret key capacity for  nodes in Theorem .
can also be extended to N nodes. The upper bound for N nodes has a compact rep-
resentation in terms of the minimum sum-rate of the CFO problem in Section ...
Let RCFO ([1 : N]|Z) be the minimum sum-rate for the CFO problem when all the nodes
know Z, that is, RCFO ([1 : N]|Z) for the (N + 1)-DMS (X1 , . . . , XN , Z) is equivalent to
RCFO ([1 : N]) for the N-DMS ((X1 , Z), . . . , (XN , Z)).

Proposition .. The secret key capacity for the (N + 1)-DMS (X1 , . . . , XN , Z) is up-
per bounded as
CK ≤ min 󶀡H(X N | Z)̄ − RCFO ([1 : N]| Z)󶀱,
̄
p(̄z |z)

̄ ≤ |Z|.
where |Z|

To establish this upper bound, it suffices to bound H(K1 ), which results in an upper
bound on the achievable key rate. As for the -node case, we first establish the bound for
Z̄ = Z and no randomization. Consider

H(K1 ) = H(K1 |M q , Z n ) + I(K1 ; M q , Z n )


(a)
≤ H(K1 |M q , Z n ) + nє
= H(K1 , X n ([1 : N]) | M q , Z n ) − H(X n ([1 : N]) | K1 , M q , Z n ) + nє
(b)
= H(X n ([1 : N]) | M q , Z n ) − H(X n ([1 : N]) | K1 , M q , Z n ) + nє
= H(X n ([1 : N]), M q | Z n ) − H(M q |Z n ) − H(X n ([1 : N]) | K1 , M q , Z n ) + nє
(c)
= H(X n ([1 : N]) | Z n ) − H(M q |Z n ) − H(X n ([1 : N]) | K1 , M q , Z n ) + nє
= nH(X N |Z) − H(M q |Z n ) − H(X n ([1 : N]) | K1 , M q , Z n ) + nє
(d)
= nH(X^N | Z) − ∑_{j=1}^N nR′_j + nє,    (.)

where (a) follows by the secrecy constraint, (b) and (c) follow since both K1 and M q are
functions of X n ([1 : N]), and (d) follows by defining

nR′_j = ∑_{l_j} H(M_{l_j} | M^{l_j −1}, Z^n) + H(X_j^n | K1, M^q, X^n([1 : j − 1]), Z^n), j ∈ [1 : N].

Now for every proper subset J of [1 : N],

nH(X(J) | X(J^c), Z)
= H(X^n(J) | X^n(J^c), Z^n)
= H(M^q, K1, X^n(J) | X^n(J^c), Z^n)
= ∑_{j=1}^N ∑_{l_j} H(M_{l_j} | M^{l_j −1}, X^n(J^c), Z^n) + H(K1 | M^q, X^n(J^c), Z^n)
   + ∑_{j∈J} H(X_j^n | K1, M^q, X^n([1 : j − 1]), X^n(J^c ∩ [j + 1 : N]), Z^n)
(a)
≤ ∑_{j=1}^N ∑_{l_j} H(M_{l_j} | M^{l_j −1}, X^n(J^c), Z^n) + ∑_{j∈J} H(X_j^n | K1, M^q, X^n([1 : j − 1]), Z^n) + nє_n
(b)
= ∑_{j∈J} ( ∑_{l_j} H(M_{l_j} | M^{l_j −1}, Z^n) + H(X_j^n | K1, M^q, X^n([1 : j − 1]), Z^n) ) + nє_n
= ∑_{j∈J} nR′_j + nє_n,

where (a) follows by Fano’s inequality (since J c is nonempty) and (b) follows since M l 󰑗
is a function of (M l 󰑗 −1 , X nj ) and thus H(M l 󰑗 |M l 󰑗 −1 , X n (J c ), Z n ) = 0 if j ∉ J . Hence, by
22.3 Secret Key Agreement: Source Model 571

Theorem ., the rate tuple (R1󳰀 + єn , . . . , RN󳰀 + єn ) is in the optimal rate region for the
CFO problem when all the nodes know Z. In particular,
∑_{j=1}^N R′_j ≥ RCFO([1 : N] | Z) − Nє_n.

Substituting in (.), we have (1/n)H(K1 ) ≤ H(X([1 : N])|Z) − RCFO ([1 : N]|Z) + N єn +


є, which completes the proof of the upper bound without randomization. The proof of
the bound with Z̄ and randomization follows the same steps as for the -node case.
Secret key capacity when all nodes know Z. Suppose that the eavesdropper observes Z
and node j ∈ [1 : N] observes (X j , Z).

Theorem .. The secret key capacity when all nodes know Z is

CK = H(X N |Z) − RCFO ([1 : N]|Z).

The converse follows by noting that Z̄ = Z minimizes the upper bound in Proposi-
tion .. Achievability follows by setting U j = (X j , Z), j ∈ [1 : N], in Proposition ..
We give an alternative proof of achievability that uses the achievability proof for the
CFO problem explicitly. The codebook generation for the public communications be-
tween the nodes is the same as that for the CFO problem. To generate the secret key
codebook, we randomly and independently partition the set of x n ([1 : N]) sequences into
2nR bins. The index of the bin is the key to be agreed upon. The nodes communicate
to achieve omniscience, that is, each node j ∈ [1 : N] broadcasts the bin index M j of its
source sequence. The decoder for each node first recovers x̂n ([1 : N]) and then sets the
secret key equal to the bin index of x̂n ([1 : N]). The analysis of the probability of error
follows that for the CFO problem with side information Z (see Problem .).
The analysis of the key leakage rate is similar to that for the two-node case. We consider the leakage only for K1; the other cases follow by Fano's inequality as in the two-node case. Denote the tuple X^n([1 : N]) by X^n and consider

I(K1; M^N, Z^n | C) = I(K1, X^n; M^N, Z^n | C) − I(X^n; M^N, Z^n | K1, C)
(a)
= I(X^n; M^N, Z^n | C) − I(X^n; M^N, Z^n | K1, C)
= I(X^n; M^N | Z^n, C) + I(X^n; Z^n | C) − I(X^n; M^N, Z^n | K1, C)
≤ H(M^N) + I(X^n; Z^n) − H(X^n | K1, C) + H(X^n | M^N, K1, Z^n, C)
(b)
≤ n(RCFO([1 : N] | Z) + δ(є)) + (H(X^n) − H(X^n | Z^n))
   − (H(X^n) + H(K1 | X^n, C) − H(K1 | C)) + H(X^n | M^N, K1, Z^n, C)
(c)
≤ n(RCFO([1 : N] | Z) − H(X^N | Z) + R + δ(є)) + H(X^n | M^N, K1, Z^n, C),    (.)

where (a) and (c) follow since K1 is a function of (X^n, C) and (b) follows since H(M^N) ≤ n(RCFO([1 : N] | Z) + δ(є)) in the CFO coding scheme. Using a similar approach to the two-node case, it can be shown that if R < H(X^N | Z) − RCFO([1 : N] | Z) − δ(є), then

lim sup_{n→∞} (1/n) H(X^n | M^N, K1, Z^n, C) ≤ H(X^N | Z) − R − RCFO([1 : N] | Z) + δ(є).

Substituting in bound (.) yields I(K1; M^N, Z^n) ≤ nδ(є), which completes the proof of achievability.

Remark 22.6. Consider the case where only a subset of the nodes A ⊆ [1 : N] wish to
agree on a secret key and the rest of the nodes act as “helpers” to achieve a higher secret
key rate. As before, all nodes observe the source Z at the eavesdropper. In this case, it can
be shown that
CK (A) = H(X N |Z) − RCFO (A|Z). (.)

Note that this secret key capacity can be larger than the capacity without helpers in The-
orem ..
Remark 22.7. Since RCFO (A|Z) can be computed efficiently (see Remark .), the secret
key capacity with and without helpers in Theorem . and in (.), respectively, can be
also computed efficiently.
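As a rough illustration of these remarks, the following Python sketch (our own; the three-component source X1 = (A, B), X2 = (B, C), X3 = (C, A) with A, B, C i.i.d. Bern(1/2) and a constant Z is an assumed example) computes RCFO([1 : N] | Z) by solving the linear program minimize ∑_j R_j subject to ∑_{j∈J} R_j ≥ H(X(J) | X(J^c), Z) for every proper subset J, and then evaluates CK = H(X^N | Z) − RCFO([1 : N] | Z) as in Theorem ..

import numpy as np
from itertools import product, combinations
from scipy.optimize import linprog

def entropy(pmf):
    q = np.array([v for v in pmf.values() if v > 0])
    return float(-(q * np.log2(q)).sum())

# Assumed example: X1 = (A, B), X2 = (B, C), X3 = (C, A), A, B, C i.i.d. Bern(1/2); Z constant.
pmf = {}
for a, b, c in product([0, 1], repeat=3):
    pmf[((a, b), (b, c), (c, a), 0)] = 1 / 8

N = 3

def cond_entropy(J):
    # H(X(J) | X(J^c), Z) = H(X1, ..., XN, Z) - H(X(J^c), Z)
    Jc = [j for j in range(N) if j not in J]
    marg = {}
    for k, q in pmf.items():
        key = tuple(k[j] for j in Jc) + (k[N],)
        marg[key] = marg.get(key, 0.0) + q
    return entropy(pmf) - entropy(marg)

# CFO linear program: min sum_j R_j s.t. sum_{j in J} R_j >= H(X(J) | X(J^c), Z) for proper J
A_ub, b_ub = [], []
for size in range(1, N):
    for J in combinations(range(N), size):
        A_ub.append([-1.0 if j in J else 0.0 for j in range(N)])
        b_ub.append(-cond_entropy(J))
res = linprog(c=[1.0] * N, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * N)

R_CFO = res.fun                                  # 1.5 for this source
H_XN_given_Z = cond_entropy(tuple(range(N)))     # H(X^N | Z) = 3
print(R_CFO, H_XN_given_Z - R_CFO)               # secret key capacity C_K = 1.5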

22.4 SECRET KEY AGREEMENT: CHANNEL MODEL

In the source model for key agreement, we assumed that the nodes have access to cor-
related sources. The channel model for key agreement provides a natural way for gen-
erating such sources. It generalizes the source model by assuming a DM-WTC p(y, z|x)
from one of the nodes to the other node and the eavesdropper, in addition to the noise-
less public broadcast channel. The input to the DM-WTC is not limited to a DMS X and
communication is performed in multiple rounds. Each round consists of a number of
transmissions over the DM-WTC followed by rounds of interactive communication over
the public channel. The definitions of the secret key rate and achievability are similar to
those for the source model. In the following, we illustrate this channel model through the
following ingenious example.

22.4.1 Maurer’s Example


Consider the BS-WTC in Example .. Assume that the receiver is allowed to feed some
information back to the sender through a noiseless public broadcast channel of unlimited
capacity. This information is also available to the eavesdropper. The sender (Alice) and
the receiver (Bob) wish to agree on a key for secure communication that the eavesdropper
(Eve) is ignorant of. What is the secret key capacity?
Maurer established the secret key capacity using a wiretap channel coding scheme over
a “virtual” degraded WTC from Bob to Alice and Eve as depicted in Figure .. Alice
(Figure .: Maurer's key agreement scheme.)
first sends a random sequence X n . Bob then randomly picks a key K ∈ [1 : 2nR ], adds its
codeword V n to the received sequence Y n , and sends the sum W n = V n ⊕ Y n over the
public channel. Alice uses her knowledge of X n to obtain a better version of V n than Eve’s
and proceeds to decode it to find K̂.
Now we present the details of this scheme. Let X and V be independent Bern(1/2) random variables. Randomly and independently generate 2^{nR̃} sequences v^n(l), l ∈ [1 : 2^{nR̃}], each according to ∏_{i=1}^n pV(v_i), and partition them into 2^{nR} equal-size bins B(k), k ∈ [1 : 2^{nR}]. This codebook is revealed to all parties.
Alice sends the randomly generated sequence X^n ∼ ∏_{i=1}^n pX(x_i) over the BS-WTC. Bob picks an index L uniformly at random from the set [1 : 2^{nR̃}]. The secret key is the index K such that v^n(L) ∈ B(K). Bob then sends W^n = v^n(L) ⊕ Y^n over the public channel.
Upon receiving W^n, Alice adds it to X^n to obtain v^n(L) ⊕ Z_1^n. Thus in effect Alice receives v^n(L) over a BSC(p1). Alice declares that l̂ is sent by Bob if it is the unique index such that (v^n(l̂), v^n(L) ⊕ Z_1^n) ∈ Tє(n), and then declares the bin index k̂ of v^n(l̂) as the key estimate. By the packing lemma, the probability of decoding error tends to zero as n → ∞ if R̃ < I(V; V ⊕ Z1) − δ(є) = 1 − H(p1) − δ(є).
We now show that the key K satisfies the secrecy constraint. Let C denote the codebook
(random bin assignments). Then
H(K |Z n , W n , C) = H(K | Z n ⊕ W n , W n , C)
= H(K , W n |Z n ⊕ W n , C) − H(W n |Z n ⊕ W n , C)
= H(K |Z n ⊕ W n , C) + H(W n |K , Z n ⊕ W n , C) − H(W n |Z n ⊕ W n , C)
= H(K |Z n ⊕ W n , C),

which implies that I(K ; Z n , W n |C) = I(K ; Z n ⊕ W n |C), that is, Z n ⊕ W n = Z1n ⊕ Z2n ⊕ V n
is a “sufficient statistic” of (Z n , W n ) for K (conditioned on C). Hence, we can assume that
Eve has Z^n ⊕ W^n = V^n ⊕ Z_1^n ⊕ Z_2^n, that is, she receives v^n(L) over a BSC(p1 ∗ p2). From the
analysis of the wiretap channel, a key rate R < I(V ; V ⊕ Z1 ) − I(V ; V ⊕ Z1 ⊕ Z2 ) − 2δ(є) =
H(p1 ∗ p2 ) − H(p1 ) − 2δ(є) is achievable. In other words, the wiretap channel p(y, z|x)
with zero secrecy capacity can generate a secret key of positive rate when accompanied by
public communication.
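The gain from public discussion is easy to quantify numerically. The sketch below (our own; the crossover probabilities p1 = 0.2 and p2 = 0.1 are assumed values) compares the BS-WTC secrecy capacity [H(p2) − H(p1)]^+, which is zero here since Eve's channel is less noisy than Bob's, with the key rate H(p1 ∗ p2) − H(p1) achieved by Maurer's scheme.

from math import log2

def Hb(p):
    # binary entropy function in bits
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def conv(a, b):
    # binary convolution a * b = a(1 - b) + (1 - a)b
    return a * (1 - b) + (1 - a) * b

p1, p2 = 0.2, 0.1   # assumed crossovers: Eve's channel is less noisy than Bob's

wiretap_rate = max(Hb(p2) - Hb(p1), 0.0)    # secrecy capacity without public discussion: 0
key_rate = Hb(conv(p1, p2)) - Hb(p1)        # H(p1 * p2) - H(p1): strictly positive
print(wiretap_rate, round(key_rate, 4))     # 0.0 and about 0.105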

22.4.2 Lower and Upper Bounds on the Secret Key Capacity


The secret key capacity CK-CM of the DM-WTC p(y, z|x) is not known in general. We
discuss known upper and lower bounds.
Lower bounds. The lower bounds for the source model in Theorems . and . can be
readily extended to the channel model by restricting communication to be one-way. Fix
a pmf p(x) and transmit X n ∼ ∏ni=1 p X (xi ) over the DM-WTC p(y, z|x) to generate the
correlated source (Y n , Z n ). Interactive communication over the public channel is then
used to generate the key. This restriction to one-round communication turns the prob-
lem into an equivalent source model with the -DMS (X, Y , Z) ∼ p(x)p(y, z|x). Thus, if
CK (X; Y|Z) denotes the secret key capacity for the -DMS (X, Y , Z), then

CK-CM ≥ max_{p(x)} CK(X; Y | Z).    (.)

In particular, using the lower bound in Theorem . with U = ∅ yields

CK-CM ≥ max_{p(v,x)} ( I(V; Y) − I(V; Z) ),

which is the secrecy capacity of the DM-WTC (without any public communication) and
corresponds to the simple key agreement scheme of sending a secret key from node  to
node  via wiretap channel coding.
For the general N-receiver -eavesdropper DM-WTC p(y1 , . . . , yN , z|x), the secrecy
capacity is lower bounded as

CK-CM ≥ max CK (X; Y1 ; . . . ; YN |Z), (.)


p(x)

where CK (X; Y1 ; . . . ; YN |Z) is the secret key capacity for the (N + 2)-DMS (X, Y N , Z).
Upper bounds. The secret key capacity for the DM-WTC p(y, z|x) is upper bounded as

CK-CM ≤ max_{p(x)} min_{p(z̄|z)} I(X; Y | Z̄).    (.)

It can be readily verified that this upper bound is tight for Maurer’s example. The proof of
this upper bound follows similar steps to the upper bound on the secret key capacity for
the source model in Section ...

For the general N-receiver -eavesdropper DM-WTC p(y1 , . . . , yN , z|x), the secrecy
capacity is upper bounded as
̄ − RCFO ([1 : N + 1]| Z)󶀱,
CK-CM ≤ max min 󶀡H(X, Y N | Z) ̄ (.)
p(x) p(̄z |z)

where RCFO ([1 : N + 1]| Z) ̄ is the minimum CFO sum-rate for the (N + 1)-DMS ((X, Z), ̄
̄ ̄
(Y1 , Z), . . . , (YN , Z)). This upper bound coincides with the lower bound in (.) when
all nodes know the eavesdropper’s source Z, establishing the secret key capacity. Thus in
this case, the channel is used solely to provide correlated sources to the nodes.

22.4.3 Connection between Key Agreement and the Wiretap Channel


Note that in Maurer’s example we generated a key by constructing a virtual degraded wire-
tap channel from Y to (X, Z). This observation can be generalized to obtain the one-round
secret key capacity for the source model from the wiretap channel result. We provide an
alternative achievability proof of Theorem . by converting the source model for key
agreement into a wiretap channel. For simplicity, we consider achievability of the rate
R < I(X; Y) − I(X; Z). The general case can be proved as discussed in the achievability
proof in Section ...
Fix V ∼ Unif(X ) independent of (X, Y , Z). For i ∈ [1 : n], node  turns the public dis-
cussion channel into a wiretap channel by sending Vi ⊕ Xi over it. The equivalent wiretap
channel is then V → ((Y , V ⊕ X), (Z, V ⊕ X)). By Theorem ., we can transmit a con-
fidential message at a rate

R < I(V ; Y , V ⊕ X) − I(V ; Z, V ⊕ X)


= I(V ; Y |V ⊕ X) − I(V ; Z |V ⊕ X)
= H(Y |V ⊕ X) − H(Y |V ⊕ X, V ) − H(Z |V ⊕ X) + H(Z |V ⊕ X, V )
(a)
= H(Y) − H(Y | X, V ) − H(Z) + H(Z | X, V )
= H(Y ) − H(Y | X) − H(Z) + H(Z | X)
= I(X; Y ) − I(X; Z),

where (a) follows since V ⊕ X is independent of X and hence of (Y , Z) by the Markov


relation V ⊕ X → X → (Y , Z).

SUMMARY

∙ Information theoretic notion of secrecy


∙ Discrete memoryless wiretap channel (DM-WTC):
∙ Multicoding
∙ Random selection of the codeword
∙ Random mapping of the codeword to the channel input sequence

∙ Upper bound on the average leakage rate via the analysis of the number of jointly
typical sequences
∙ Secrecy capacity of public communication is upper bounded by the key rate
∙ Source model for secret key agreement via public communication:
∙ Double random binning
∙ Lower bound on the average key rate via the analysis of the random pmf
∙ Interaction over the public channel can increase the secret key rate
∙ CFO rate characterizes the secret key capacity when all nodes know the eavesdrop-
per’s source
∙ Randomized encoding achieves higher rates for confidential communication and key
agreement than deterministic encoding
∙ Channel model for secret key agreement:
∙ Generation of correlated sources via a noisy channel
∙ Wiretap channel coding for key generation
∙ Open problems:
22.1. What is the secrecy capacity of the -receiver -eavesdropper DM-WTC?
22.2. What is the secrecy capacity of the -receiver -eavesdropper DM-WTC?
22.3. What is the -round secret key capacity for the -DMS (X1 , X2 , Z)?

BIBLIOGRAPHIC NOTES

Information theoretic secrecy was pioneered by Shannon (b). He considered com-


munication of a message (plaintext) M from Alice to Bob over a noiseless public broadcast
channel in the presence of an eavesdropper (Eve) who observes the channel output (ci-
phertext) L. Alice and Bob share a key, which is unknown to Eve, and they can use it to
encrypt and decrypt M into L and vice versa. Shannon showed that to achieve perfect
secrecy, that is, information leakage I(M; L) = 0, the size of the key must be at least as
large as the size of the message, i.e., H(K) ≥ H(M).
This negative result has motivated subsequent work on the wiretap channel and secret
key agreement. The wiretap channel was first introduced by Wyner (c), who estab-
lished the secrecy capacity of the degraded wiretap channel. The secrecy capacity of the
scalar Gaussian wiretap channel in Example . is due to Leung and Hellman (). The
secrecy capacity for the vector case in Example . is due to Oggier and Hassibi ()
and Khisti and Wornell (a,b). Csiszár and Körner () established the rate–leakage
region of a general DM-BC with common and confidential messages in Theorem .,

which includes Theorem . as a special case. Chia and El Gamal () extended wire-
tap channel coding to the cases of two receivers and one eavesdropper and of one receiver
and two eavesdroppers, and established inner bounds on the rate–leakage region that are
tight in some special cases. For the 2-receiver 1-eavesdropper case, they showed that the secrecy capacity is lower bounded as

CS ≥ max min{ I(U0, U1; Y1 | Q) − I(U0, U1; Z | Q), I(U0, U2; Y2 | Q) − I(U0, U2; Z | Q) },

where the maximum is over all pmfs p(q, u0 , u1 , u2 , x) = p(q, u0 )p(u1 |u0 )p(u2 , x|u1 ) =
p(q, u0 )p(u2 |u0 )p(u1 , x|u0 ) such that
I(U1 , U2 ; Z |U0 ) ≤ I(U1 ; Z |U0 ) + I(U2 ; Z |U0 ) − I(U1 ; U2 |U0 ).
The proof involved Marton coding in addition to wiretap channel coding, satellite code-
book generation, and indirect decoding. Proposition . is a special case of this result.
Wiretap channel coding has been extended to other multiuser channels; see the survey by
Liang, Poor, and Shamai ().
The secrecy capacity of the DM-WTC with a secret key in (.) is due to Yamamoto
(). Ahlswede and Cai () and Ardestanizadeh, Franceschetti, Javidi, and Kim
() studied secure feedback coding schemes to generate such a shared key, which are
optimal when the wiretap channel is physically degraded. Chia and El Gamal () stud-
ied wiretap channels with state and proposed a coding scheme that combines wiretap
channel coding with a shared key extracted from the channel state information to achieve
a lower bound on the secrecy capacity.
The source model for secret key agreement was introduced by Ahlswede and Csiszár
(), who established the one-way secret key capacity in Section ... The one-round
secret key capacity with rate constraint in Remark . is due to Csiszár and Narayan
(). Gohari and Anantharam (a) established the lower bound on the secret key
capacity for interactive key agreement between  nodes in Section .. and its extension
to multiple nodes in Section ... Example . is also due to them. The upper bound
for the -node case in Theorem . is due to Maurer and Wolf () and Example .
is due to A. A. Gohari. The upper bound for multiple nodes in Proposition . is due to
Csiszár and Narayan (), who also established the secret key capacity in Theorem .
when all the nodes know Z and the extension in (.) when some nodes are helpers. A
tighter upper bound was established by Gohari and Anantharam (a).
The example in Section .. is due to Maurer (), who also established the upper
bound on the secret key capacity in (.). Csiszár and Narayan () established the
upper bound in (.) and showed that it is tight when every node knows the eavesdrop-
per’s source. Tighter lower and upper bounds were given by Gohari and Anantharam
(b).
Most of the above results can be extended to a stronger notion of secrecy (Maurer and
Wolf ). For example, Csiszár and Narayan () established the secret key capacity
in Theorem . under the secrecy constraint
lim sup I(K j ; Z n , M q ) = 0, j ∈ [1 : N].
n→∞

Information theoretic secrecy concerns confidentiality at the physical layer. Most secrecy
systems are built at higher layers and use the notion of computational secrecy, which ex-
ploits the computational hardness of recovering the message from the ciphertext without
knowing the key (assuming P ̸= NP); see, for example, Menezes, van Oorschot, and Van-
stone () and Katz and Lindell ().

PROBLEMS

.. Verify that (U , V ) → X → (Y , Z) form a Markov chain in the proof of the converse
for Theorem . in Section ...
.. Provide the details of the proof of the lower bound on the -receiver DM-WTC
secrecy capacity in Proposition ..
.. Prove Theorem ..
.. Establish the secrecy capacity of the DM-WTC p(y, z|x) with secret key rate RK
in (.). (Hint: Randomly and independently generate wiretap channel code-
books CmK for each key mK ∈ [1 : 2nRK ].)
.. Provide the details of the proof of the lower bound on the secret key capacity in
Theorem ..
.. Establish the upper bound in (.) on the secret key capacity for the DM-WTC
p(x2 , z|x1 ).
.. Secrecy capacity of a BSC-BEC WTC. Consider the DM-WTC in Example ..
(a) Suppose that the channel to the receiver is a BEC(є) and the channel to the
eavesdropper is a BSC(p). Find the secrecy capacity for each of the four regimes
of є and p.
(b) Now suppose that the channels to the receiver and the eavesdropper in part (a)
are exchanged. Find the secrecy capacity.
.. Secret key capacity. Consider the sources X1 , X2 , Z ∈ {0, 1, 2, 3} with the joint pmf
pX1,X2,Z(0, 0, 0) = pX1,X2,Z(0, 1, 1) = pX1,X2,Z(1, 0, 1) = pX1,X2,Z(1, 1, 0) = 1/8,
pX1,X2,Z(2, 2, 2) = pX1,X2,Z(3, 3, 3) = 1/4.
Find the secret key capacity for the -DMS (X1 , X2 , Z).
.. Key generation without communication. Consider a secret key agreement setup,
where node  observes X1 , node  observes X2 , and the eavesdropper observes Z.
Show that the secret key capacity without (public) communication is

CK = H(X0 |Z),

where X0 is the common part of X1 and X2 as defined in Section ..



.. Secrecy capacity of the DMC with feedback. Consider the physically degraded DM-
WTC p(y|x)p(z|y). Assume that there is noiseless causal secure feedback from the
receiver to the sender. Show that the secrecy capacity is

CS = max_{p(x)} max{ I(X; Y | Z) + H(Y | X, Z), I(X; Y) }.

(Hint: Due to the secure noiseless feedback, the sender and the receiver can agree
on a secret key of rate H(Y|X, Z) independent of the message and the transmitted
random codeword; see Problem .. Using block Markov coding, this key can be
used to increase the secrecy capacity for the next transmission block as in (.).)
Remark: This result is due to Ahlswede and Cai ().

APPENDIX 22A PROOF OF LEMMA 22.1

We bound H(L | Z n , M = m, C) for every m. We first bound the probabilities of the fol-
lowing events. Fix L = l and a sequence z n ∈ Tє(n) . Let

N(m, l, z^n) = |{ U^n(l̃) ∈ C(m) : (U^n(l̃), z^n) ∈ Tє(n), l̃ ≠ l }|.

It is not difficult to show that

2^{n(R̃−R−I(U;Z)−δ(є)−є_n)} ≤ E(N(m, l, z^n)) ≤ 2^{n(R̃−R−I(U;Z)+δ(є)−є_n)}

and

Var(N(m, l, z^n)) ≤ 2^{n(R̃−R−I(U;Z)+δ(є)−є_n)},

where є_n tends to zero as n → ∞.


̃
Define the random event E2(m, l, z^n) = { N(m, l, z^n) ≥ 2^{n(R̃−R−I(U;Z)+δ(є)−є_n/2)+1} }. By the Chebyshev inequality,

P(E2(m, l, z^n)) = P{ N(m, l, z^n) ≥ 2^{n(R̃−R−I(U;Z)+δ(є)−є_n/2)+1} }
≤ P{ N(m, l, z^n) ≥ E(N(m, l, z^n)) + 2^{n(R̃−R−I(U;Z)+δ(є)−є_n/2)} }
≤ P{ |N(m, l, z^n) − E(N(m, l, z^n))| ≥ 2^{n(R̃−R−I(U;Z)+δ(є)−є_n/2)} }
≤ Var(N(m, l, z^n)) / 2^{2n(R̃−R−I(U;Z)+δ(є)−є_n/2)}.    (.)

Thus if R̃ − R − I(U ; Z) ≥ 0, then P(E2 (m, l , z n )) tends to zero as n → ∞ for every m.


Next, for each message m, define N(m) = |{ U^n(l̃) ∈ C(m) : (U^n(l̃), Z^n) ∈ Tє(n), l̃ ≠ L }| and E2(m) = { N(m) ≥ 2^{n(R̃−R−I(U;Z)+δ(є)−є_n/2)+1} }. Finally, define the indicator random variable E(m) = 1 if (U^n(L), Z^n) ∉ Tє(n) or the event E2(m) occurs, and E(m) = 0 otherwise. By the union of events bound,

P{E(m) = 1} ≤ P{ (U^n(L), Z^n) ∉ Tє(n) } + P(E2(m)).



We bound each term. By the LLN, the first term tends to zero as n → ∞. For P(E2 (m)),

P(E2(m)) ≤ ∑_{z^n ∈ Tє(n)} p(z^n) P{E2(m) | Z^n = z^n} + P{Z^n ∉ Tє(n)}
= ∑_{z^n ∈ Tє(n)} ∑_l p(z^n) p(l | z^n) P{E2(m) | Z^n = z^n, L = l} + P{Z^n ∉ Tє(n)}
= ∑_{z^n ∈ Tє(n)} ∑_l p(z^n) p(l | z^n) P{E2(m, z^n, l)} + P{Z^n ∉ Tє(n)}.

By (.), the first term tends to zero as n → ∞ if R̃ − R ≥ I(U ; Z), and by the LLN, the
second term tends to zero as n → ∞.
We are now ready to upper bound the equivocation. Consider

H(L | Z^n, M = m, C) ≤ 1 + P{E(m) = 1} H(L | Z^n, M = m, E(m) = 1, C)
   + H(L | Z^n, M = m, E(m) = 0, C)
≤ 1 + n(R̃ − R) P{E(m) = 1} + H(L | Z^n, M = m, E(m) = 0, C)
≤ 1 + n(R̃ − R) P{E(m) = 1} + log( 2^{n(R̃−R−I(U;Z)+δ(є)−є_n/2)+1} ).

Now since P{E(m) = 1} tends to zero as n → ∞ if R̃ − R ≥ I(U; Z), then for every m,

lim sup_{n→∞} (1/n) H(L | Z^n, M = m, C) ≤ R̃ − R − I(U; Z) + δ′(є).

This completes the proof of the lemma.

APPENDIX 22B PROOF OF LEMMA 22.2

Consider

H(K1 | C) ≥ P{X^n ∈ Tє(n)} H(K1 | C, X^n ∈ Tє(n)) ≥ (1 − є′_n) H(K1 | C, X^n ∈ Tє(n))

for some є′_n that tends to zero as n → ∞. Let P(k1) be the random pmf of K1 given {X^n ∈ Tє(n)}, where the randomness is induced by the random bin assignments C. By symmetry, P(k1), k1 ∈ [1 : 2^{nR}], are identically distributed. Let B′(1) = ⋃_m B(m, 1) and express P(1) in terms of a weighted sum of indicator functions as

P(1) = ∑_{x^n ∈ Tє(n)} ( p(x^n) / P{X^n ∈ Tє(n)} ) · E(x^n),

where E(x^n) = 1 if x^n ∈ B′(1) and E(x^n) = 0 otherwise.

Then, it can be easily shown that

EC(P(1)) = 2^{−nR},
Var(P(1)) = 2^{−nR}(1 − 2^{−nR}) ∑_{x^n ∈ Tє(n)} ( p(x^n) / P{X^n ∈ Tє(n)} )^2
≤ 2^{−nR} · 2^{n(H(X)+δ(є))} · 2^{−2n(H(X)−δ(є))} / (1 − є′_n)^2
≤ 2^{−n(R+H(X)−4δ(є))}

for n sufficiently large. Thus, by the Chebyshev lemma in Appendix B,

P{ |P(1) − E(P(1))| ≥ є E(P(1)) } ≤ Var(P(1)) / (є E[P(1)])^2 ≤ 2^{−n(H(X)−R−4δ(є))} / є^2,

which tends to zero as n → ∞ if R < H(X) − 4δ(є). Now, by symmetry,

H(K1 | C, X^n ∈ Tє(n))
= 2^{nR} E[ P(1) log(1/P(1)) ]
≥ 2^{nR} P{ |P(1) − E(P(1))| < є2^{−nR} } · E[ P(1) log(1/P(1)) given |P(1) − E(P(1))| < є2^{−nR} ]
≥ ( 1 − 2^{−n(H(X)−R−4δ(є))}/є^2 ) · ( nR(1 − є) − (1 − є) log(1 + є) )
≥ n(R − δ(є))

for n sufficiently large and R < H(X) − 4δ(є). Thus, we have shown that if R < H(X) − 4δ(є), then lim inf_{n→∞} (1/n) H(K1 | C) ≥ R − δ(є), which completes the proof of the lemma.

APPENDIX 22C PROOF OF LEMMA 22.3

Let E1 = 1 if (X1n , Z n ) ∉ Tє(n) and E1 = 0 otherwise. Note that by the LLN, P{E1 = 1} tends
to zero as n → ∞. Consider

H(X_1^n | Z^n, M, K1, C) ≤ H(X_1^n, E1 | Z^n, M, K1, C)
≤ 1 + n P{E1 = 1} log|X_1| + ∑_{(z^n, m, k1)} p(z^n, m, k1 | E1 = 0) · H(X_1^n | Z^n = z^n, M = m, K1 = k1, E1 = 0, C).

Now, for a codebook C and z n ∈ Tє(n) , let N(z n , C) be the number of sequences x1n ∈
B(m, k1 ) ∩ Tє(n) (X1 |z n ), and define

E2(z^n, C) = 1 if N(z^n, C) ≥ 2 E(N(z^n, C)), and E2(z^n, C) = 0 otherwise.

It is easy to show that

E(N(z^n, C)) = 2^{−n(R+R̃)} |Tє(n)(X1 | z^n)|,
Var(N(z^n, C)) ≤ 2^{−n(R+R̃)} |Tє(n)(X1 | z^n)|.

Then, by the Chebyshev lemma in Appendix B,

P{E2(z^n, C) = 1} ≤ Var(N(z^n, C)) / (E[N(z^n, C)])^2 ≤ 2^{−n(H(X1|Z)−R−R̃−δ(є))},

which tends to zero as n → ∞ if R̃ + R < H(X1 |Z) − δ(є), i.e., R < H(X1 |Z) − H(X1 |X2 ) −
2δ(є). Now consider

H(X1n | Z n = z n , M = m, K1 = k1 , E1 = 0, C)
≤ H(X1n , E2 | Z n = z n , M = m, K1 = k1 , E1 = 0, C)
≤ 1 + n P{E2 = 1} log |X1 | + H(X1n | Z n = z n , M = m, K1 = k1 , E1 = 0, E2 = 0, C)
≤ 1 + n P{E2 = 1} log |X1 | + n(H(X1 |Z) − R̃ − R + δ(є)),

which implies that

H(X1n |Z n , M, K1 , C)
≤ 2 + n P{E1 = 1} log |X1 | + n P{E2 = 1} log |X1 | + n(H(X1 |Z) − R̃ − R + δ(є)).

Taking n → ∞ completes the proof of the lemma.


CHAPTER 23

Wireless Fading Channels

So far we have modeled wireless channels as time-invariant Gaussian channels. Real-


world wireless channels, however, are time-varying due to multiple signal paths and user
mobility. The wireless fading channel models capture these effects by allowing the gains in
the Gaussian channels to change randomly over time. Since the channel gain information
is typically available at each receiver (through training sequences) and may be available at
each sender (via feedback from the receivers), wireless fading channels can be viewed as
channels with random state, where the state (channel gain) information is available at the
decoders and fully or partially available at the encoders. Depending on the fading model
and coding delay, wireless fading channel capacity may or may not be well defined. More-
over, even when capacity is well defined, it may not be a good measure of performance in
practice because it is overly pessimistic, or because achieving it requires very long coding
delay.
In this chapter we study several canonical wireless fading channels under the block
fading model with stationary ergodic channel gains. We introduce several coding ap-
proaches under the fast and slow fading assumptions, including water-filling in time,
compound channel coding, outage coding, broadcasting, and adaptive coding. We com-
pare these approaches using performance metrics motivated by practice. Finally, we show
that significant increase in rates can be achieved by exploiting fading through interference
alignment as we demonstrated for the k-user-pair symmetric QED-IC in Section ..

23.1 GAUSSIAN FADING CHANNEL

Consider the Gaussian fading channel

Yi = Gi Xi + Zi , i ∈ [1 : n],

where {Gi } is a channel gain process that models fading in wireless communication and
{Zi } is a WGN(1) process independent of {Gi }.
In practice, the channel gain typically varies at a much longer time scale than sym-
bol transmission time. This motivates the simplified block fading model depicted in Fig-
ure ., where the gain Gi is assumed to be constant over each coherence time interval
[(l − 1)k + 1 : lk] of length k for l = 1, 2, . . . , and the block gain process {G_l}_{l=1}^∞ = {G_{lk}}_{l=1}^∞ is stationary ergodic. Assuming this model, we consider two coding paradigms.

∙ Fast fading. Here the code block length spans a large number of coherence time inter-
vals and thus the channel is ergodic with a well-defined Shannon capacity (sometimes
referred to as the ergodic capacity). However, coding over a large number of coherence
time intervals results in excessive delay.
∙ Slow fading. Here the code block length is in the order of the coherence time interval
and therefore the channel is not ergodic and does not have a Shannon capacity in
general. We discuss alternative coding approaches and corresponding performance
metrics for this case.

Figure .. Wireless channel fading process and its block fading model.

In the following two sections, we consider coding under fast and slow fading with
channel gain availability only at the decoder or at both the encoder and the decoder. When
the channel gain is available only at the decoder, we assume average power constraint P
on X, i.e., ∑ni=1 xi2 (m) ≤ nP, m ∈ [1 : 2nR ], and when the channel gain is also available at
the encoder, we assume expected average power constraint P on X, i.e.,
∑_{i=1}^n E(x_i^2(m, G_i)) ≤ nP, m ∈ [1 : 2^{nR}].

23.2 CODING UNDER FAST FADING

In fast fading, we code over many coherence time intervals (i.e., n ≫ k) and the block
gain process {G l } is stationary ergodic, for example, an i.i.d. process.

Channel gain available only at the decoder. First we show that the ergodic capacity for
this case is
CGI-D = EG[C(G^2 P)].

Recall from Remark . that for a DMC with stationary ergodic state p(y|x, s), the capacity
when the state information is available only at the decoder is

CSI-D = max_{p(x)} I(X; Y | S).

This result can be readily extended to the Gaussian fading channel with a stationary er-
godic block gain process {G l } having the marginal distribution FG (д l ) and with power
constraint P to obtain the capacity

CSI-D(P) = sup_{F(x): E(X^2) ≤ P} I(X; Y | G)
= sup_{F(x): E(X^2) ≤ P} ( h(Y | G) − h(Y | G, X) )
= sup_{F(x): E(X^2) ≤ P} h(GX + Z | G) − h(Z)
= EG[C(G^2 P)],

where the supremum is attained by X ∼ N(0, P).


Now the Gaussian fading channel under fast fading can be decomposed in time into
k parallel Gaussian fading channels, the first corresponding to transmission times 1, k + 1,
2k + 1, . . . , the second corresponding to transmission times 2, k + 2, . . . , and so on, with
the same stationary ergodic channel gain process and average power constraint P. Hence,
CGI-D ≥ CSI-D (P). The converse can be readily proved using standard arguments.
Channel gain available at the encoder and decoder. The ergodic capacity in this case is

CGI-ED = max_{F(x|д): E(X^2) ≤ P} I(X; Y | G)
= max_{F(x|д): E(X^2) ≤ P} ( h(Y | G) − h(Y | G, X) )
= max_{F(x|д): E(X^2) ≤ P} h(GX + Z | G) − h(Z)
(a)
= max_{ϕ(д): E(ϕ(G)) ≤ P} EG[C(G^2 ϕ(G))],

where F(x|д) is the conditional cdf of X given {G = д} and (a) follows since the maxi-
mum is attained by X | {G = д} ∼ N(0, ϕ(д)). This can be proved by Remark . and using
similar arguments to the case with channel gain available only at the decoder.
The above capacity expression can be simplified further. Using a Lagrange multi-
plier λ, we can show that the capacity is attained by

ϕ*(д) = [λ − 1/д^2]^+,

where λ is chosen to satisfy

EG(ϕ*(G)) = EG( [λ − 1/G^2]^+ ) = P.
Note that this power allocation corresponds to water-filling over time; see Figure . in
Section ... At high SNR, however, the capacity gain from water-filling vanishes and
CGI-ED − CGI-D tends to zero as P → ∞.
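A minimal numerical sketch of this water-filling solution is given below (our own code; the two-state gain distribution and power budget are assumed for illustration). It finds λ by bisection so that EG[(λ − 1/G^2)^+] = P and compares the resulting ergodic capacity CGI-ED with the no-power-control capacity CGI-D = EG[C(G^2 P)].

import numpy as np

def C(x):
    # Gaussian capacity function in bits per transmission
    return 0.5 * np.log2(1.0 + x)

def waterfill(gains, probs, P, tol=1e-10):
    # bisection on the water level lambda so that E[(lambda - 1/G^2)^+] = P
    g2 = np.asarray(gains, dtype=float) ** 2
    probs = np.asarray(probs, dtype=float)
    lo, hi = 0.0, P + 1.0 / g2.min()
    while hi - lo > tol:
        lam = (lo + hi) / 2
        power = np.maximum(lam - 1.0 / g2, 0.0)
        if (probs * power).sum() > P:
            hi = lam
        else:
            lo = lam
    return np.maximum(lam - 1.0 / g2, 0.0)

gains, probs, P = [1.0, 0.3], [0.5, 0.5], 1.0      # assumed two-state fading
phi = waterfill(gains, probs, P)
g2 = np.asarray(gains) ** 2
C_GI_D  = float((np.asarray(probs) * C(g2 * P)).sum())     # no power control
C_GI_ED = float((np.asarray(probs) * C(g2 * phi)).sum())   # water-filling over time
print(phi, C_GI_D, C_GI_ED)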

23.3 CODING UNDER SLOW FADING

Under the assumption of slow fading, we code over a single coherence time interval (i.e.,
n = k) and the notion of channel capacity is not well defined in general. As before, we
consider the cases where the channel gain is available only at the decoder and where it is
available at both the encoder and the decoder.

23.3.1 Channel Gain Available Only at the Decoder


When the encoder does not know the gain, we have several coding options.
Compound channel approach. We code against the worst channel to guarantee reliable
communication. The (Shannon) capacity under this coding approach can be readily found
by extending the capacity of the compound channel in Section . to the Gaussian case,
which yields
CCC = inf_{д∈G} C(д^2 P).

This compound channel approach, however, becomes impractical when fading causes
very low channel gain. Hence, we consider the following alternative coding approaches
that are more useful in practice.
Outage capacity approach. In this approach we transmit at a rate higher than the com-
pound channel capacity CCC and tolerate some information loss when the channel gain
is too low for the message to be recovered. If the probability of such an outage event is
small, then we can reliably communicate the message most of the time. More precisely, if
we can tolerate an outage probability pout , that is, a loss of a fraction pout of the messages
on average, then we can communicate at any rate lower than the outage capacity

Cout = max_{д: P{G<д} ≤ pout} C(д^2 P).

Broadcast channel approach. For simplicity of presentation, assume two fading states д1 and д2 with д1 > д2. We view the channel as a Gaussian BC with gains д1 and д2, and use superposition coding to send a common message to both receivers at rate R̃0 < C((1 − α)д2^2 P/(1 + αд2^2 P)), α ∈ [0, 1], and a private message to the stronger receiver at rate R̃1 < C(αд1^2 P). If the gain is д2, the receiver of the fading channel can recover the common message at rate R2 = R̃0, and if the gain is д1, it can recover both messages at total rate

R1 = R̃0 + R̃1. Assuming P{G = д1} = p and P{G = д2} = p̄, we can compute the broadcast capacity as

CBC = max_{α∈[0,1]} ( p C(αд1^2 P) + C( (1 − α)д2^2 P / (1 + αд2^2 P) ) ).

This approach is best suited for sending multimedia (video or music) over a fading channel
using successive refinement as discussed in Chapter . If the channel gain is low, the
receiver recovers only the low-fidelity description of the source and if the gain is high, it
also recovers the refinement and obtains the high-fidelity description.

23.3.2 Channel Gain Available at Both the Encoder and the Decoder
Compound channel approach. If the channel gain is available at the encoder, the com-
pound channel capacity is

CCC-E = inf_{д∈G} C(д^2 P) = CCC.

Thus, the capacity is the same as when the encoder does not know the state.
Adaptive coding. Instead of communicating at the capacity of the channel with the worst
gain, we adapt the transmission rate to the channel gain and communicate at the maxi-
mum rate C д = C(д 2 P) when the gain is д. We define the adaptive capacity as

CA = EG[C(G^2 P)].

Note that this is the same as the ergodic capacity when the channel gain is available only
at the decoder. The adaptive capacity, however, is just a convenient performance metric
and not a capacity in the Shannon sense.
Adaptive coding with power control. Since the encoder knows the channel gain, it can
adapt the power as well as the transmission rate. In this case, we define the power-control
adaptive capacity as

CPA = max_{ϕ(д): EG(ϕ(G)) ≤ P} EG[C(G^2 ϕ(G))],

where the maximum is attained by the water-filling power allocation that satisfies the
constraint EG (ϕ(G)) ≤ P. Note that the power-control adaptive capacity is identical to the
ergodic capacity when the channel gain is available at both the encoder and the decoder.
Again CPA is not a capacity in the Shannon sense.
Remark .. Although using power control achieves a higher rate on average, in some
practical situations, such as under the Federal Communications Commission (FCC) reg-
ulation, the power constraint must be satisfied in each coding block.

In the following, we compare the performance of the above coding schemes.



Example .. Assume two fading states д1 and д2 with д1 > д2 and P{G = д1 } = p. In
Figure ., we compare the performance metrics CCC , Cout , CBC , CA , and CPA for different
values of p ∈ [0, 1]. The broadcast channel approach is effective when the better channel
occurs more often (p ≈ 1) and power control is particularly effective when the channel
varies frequently (p ≈ 1/2).
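The comparison in this example can be reproduced numerically. The sketch below (our own; the gains, probability p, power budget, and outage tolerance p_out = 0.25 are all assumed values) evaluates CCC, Cout, CBC, CA, and CPA for a two-state example.

import numpy as np

def C(x):
    return 0.5 * np.log2(1.0 + x)

g1, g2, P, p = 2.0, 0.5, 1.0, 0.8      # assumed two-state fading, P{G = g1} = p
p_out = 0.25                           # assumed outage tolerance

C_CC  = C(g2**2 * P)                                          # compound (worst case)
C_out = C(g1**2 * P) if (1 - p) <= p_out else C(g2**2 * P)    # outage coding
alphas = np.linspace(0, 1, 10001)
C_BC  = np.max(p * C(g1**2 * alphas * P)
               + C(g2**2 * (1 - alphas) * P / (1 + alphas * g2**2 * P)))
C_A   = p * C(g1**2 * P) + (1 - p) * C(g2**2 * P)             # adaptive coding

# adaptive coding with power control: E[phi(G)] <= P
phi1 = np.linspace(0, P / p, 10001)
phi2 = np.clip((P - p * phi1) / (1 - p), 0, None)
C_PA  = np.max(p * C(g1**2 * phi1) + (1 - p) * C(g2**2 * phi2))

print(C_CC, C_out, C_BC, C_A, C_PA)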

Figure .. Comparison of the performance metrics CCC, Cout, CBC, CA, and CPA as functions of p.

23.4 GAUSSIAN VECTOR FADING CHANNEL

As mentioned in Chapter , in the presence of fading, using multiple antennas can in-
crease the capacity via spatial diversity. To demonstrate this improvement, we model a
MIMO fading channel by the Gaussian vector fading channel

Y(i) = G(i)X(i) + Z(i), i ∈ [1 : n],

where Y(i) is an r-dimensional vector, the input X(i) is a t-dimensional vector, {G(i)}
is the channel gain matrix process that models random multipath fading, and {Z(i)} is a
WGN(Ir ) process independent of {G(i)}. We assume power constraint P on X.
As in the scalar case, we assume the block fading model, that is, G(i) is constant over
each coherence time interval but varies according to a stationary ergodic process over
consecutive coherence time intervals. As before, we can investigate various channel gain
availability scenarios and coding strategies under fast and slow fading assumptions. In
the following, we assume fast fading and study the effect of the number of antennas on
the capacity when the channel gain matrices are available at the decoder.
If the random channel gain matrices G(1), G(2), . . . , G(n) are identically distributed,

then the ergodic capacity when the gain matrices are available only at the decoder is

C = max_{tr(KX) ≤ P} E[ (1/2) log |Ir + G KX G^T| ].
The maximization in the capacity expression can be further simplified if G is isotropic,
that is, the joint pdf of the matrix elements (G jk : j ∈ [1 : r], k ∈ [1 : t]) is invariant under
orthogonal transformations. For example, if Rayleigh fading is assumed, that is, if G jk ∼
N(0, 1), and G jk , j ∈ [1 : r], k ∈ [1 : t], are mutually independent, then G is isotropic.

Theorem .. If the channel gain matrix G is isotropic, then the ergodic capacity
when the channel gain is available only at the decoder is

󵄨󵄨 󵄨󵄨 min{t,r}
1 P 1 P
C = E 󶁤 log 󵄨󵄨󵄨 Ir + GG T 󵄨󵄨󵄨󶁴 = 󵠈 E 󶁤log 󶀤1 + Λ j 󶀴󶁴 ,
2 󵄨󵄨 t 󵄨󵄨 2 j=1 t

which is attained by KX∗ = (P/t)It , where Λ j are random nonzero eigenvalues of GG T .

At small P (low SNR), assuming Rayleigh fading with E(|G_jk|^2) = 1, the capacity expression simplifies to

C = (1/2) ∑_{j=1}^{min{t,r}} E[ log(1 + (P/t) Λ_j) ]
≈ (P/(2t log 2)) ∑_{j=1}^{min{t,r}} E(Λ_j)
= (P/(2t log 2)) E[tr(GG^T)]
= (P/(2t log 2)) E[ ∑_{j,k} |G_jk|^2 ]
= rP/(2 log 2).
This is an r-fold SNR (and capacity) gain compared to the single-antenna case. By con-
trast, at large P (high SNR),
C = (1/2) ∑_{j=1}^{min{t,r}} E[ log(1 + (P/t) Λ_j) ]
≈ (1/2) ∑_{j=1}^{min{t,r}} E[ log((P/t) Λ_j) ]
= (min{t, r}/2) log(P/t) + (1/2) ∑_{j=1}^{min{t,r}} E(log Λ_j).

This is a min{t, r}-fold increase in capacity over the single-antenna case.
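The ergodic capacity in Theorem . is easy to estimate by Monte Carlo simulation. The sketch below (our own; the SNR and antenna configurations are illustrative) draws i.i.d. N(0, 1) gain matrices and averages (1/2) log |Ir + (P/t) GG^T|, exhibiting the roughly min{t, r}-fold growth discussed above.

import numpy as np

def ergodic_mimo_capacity(t, r, P, trials=20000, rng=np.random.default_rng(0)):
    # Monte Carlo estimate of E[(1/2) log2 |I_r + (P/t) G G^T|] under Rayleigh fading
    total = 0.0
    for _ in range(trials):
        G = rng.standard_normal((r, t))
        sign, logdet = np.linalg.slogdet(np.eye(r) + (P / t) * G @ G.T)
        total += 0.5 * logdet / np.log(2)
    return total / trials

P = 10.0
for t, r in [(1, 1), (2, 2), (4, 4)]:
    print(t, r, round(ergodic_mimo_capacity(t, r, P), 3))
# the printed capacities grow roughly in proportion to min{t, r} at this SNR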



23.5 GAUSSIAN FADING MAC

Consider the Gaussian fading MAC

Yi = G1i X1i + G2i X2i + Zi , i ∈ [1 : n],

where {G1i ∈ G1 } and {G2i ∈ G2 } are channel gain processes and {Zi } is a WGN(1) process,
independent of {G1i } and {G2i }. Assume power constraint P on each of X1 and X2 . As
before, we assume the block fading model, where in block l = 1, 2, . . . , G ji = G jl for i ∈
[(l − 1)n + 1 : ln] and j = 1, 2, and consider fast and slow fading scenarios.

23.5.1 Fast Fading


Channel gains available only at the decoder. The ergodic capacity region in this case is
the set of rate pairs (R1 , R2 ) such that

R1 ≤ E_{G1}[C(G1^2 P)],
R2 ≤ E_{G2}[C(G2^2 P)],
R1 + R2 ≤ E_{G1,G2}[C((G1^2 + G2^2)P)].

This can be shown by evaluating the capacity region characterization for the DM-MAC
with DM state when the state information is available only at the decoder in Section ..
under average power constraint P on each sender. In particular, the ergodic sum-capacity
is
CGI-D = E_{G1,G2}[C((G1^2 + G2^2)P)].

Channel gains available at both encoders and the decoder. The ergodic capacity region
in this case is the set of rate pairs (R1 , R2 ) such that

R1 ≤ E[C(G1^2 ϕ1(G1, G2))],
R2 ≤ E[C(G2^2 ϕ2(G1, G2))],
R1 + R2 ≤ E[C(G1^2 ϕ1(G1, G2) + G2^2 ϕ2(G1, G2))]

for some ϕ1 and ϕ2 that satisfy the constraints E(ϕ j (G1 , G2 )) ≤ P, j = 1, 2. This can be
shown by evaluating the capacity region characterization for the DM-MAC with DM state
in Section .. under expected average power constraint P on each sender when the state
information is available at both encoders and the decoder. In particular, the ergodic sum-
capacity CGI-ED for this case can be computed by solving the optimization problem

maximize E[C(G1^2 ϕ1(G1, G2) + G2^2 ϕ2(G1, G2))]
subject to E[ϕ_j(G1, G2)] ≤ P, j = 1, 2.

Channel gains available at their respective encoders and at the decoder. Consider the
case where both channel gains are available at the receiver but each sender knows only the

gain of its channel to the receiver. This scenario is motivated by practical wireless commu-
nication systems in which each sender can estimate its channel gain via electromagnetic
reciprocity from a training sequence globally transmitted by the receiver (access point or
base station). Complete gain availability at the receiver can be achieved by feeding back
these gains from the senders to the receiver. Complete channel gain availability at the
senders, however, is not always practical as it requires either communication between the
senders or an additional round of feedback from the access point.
The ergodic capacity region in this case is the set of rate pairs (R1 , R2 ) such that

R1 ≤ E[C(G1^2 ϕ1(G1))],
R2 ≤ E[C(G2^2 ϕ2(G2))],
R1 + R2 ≤ E[C(G1^2 ϕ1(G1) + G2^2 ϕ2(G2))]

for some ϕ1 and ϕ2 that satisfy the constraints E(ϕ j (G j )) ≤ P, j = 1, 2. This again can
be established by evaluating the capacity region characterization for the DM-MAC with
DM state under expected average power constraint P on each sender when the state infor-
mation is available at the decoder and each state component is available at its respective
encoder. In particular, the sum-capacity CDGI-ED can be computed by solving the opti-
mization problem

maximize E[C(G1^2 ϕ1(G1) + G2^2 ϕ2(G2))]
subject to E[ϕ_j(G_j)] ≤ P, j = 1, 2.

23.5.2 Slow Fading


If the channel gains are available only at the decoder or at both encoders and the decoder,
the compound channel, outage, and adaptive coding approaches and corresponding per-
formance metrics can be analyzed as in the point-to-point Gaussian fading channel case.
Hence, we discuss these coding approaches only for the case when the channel gains are available at their respective encoders and at the decoder.
Compound channel approach. The capacity region using the compound channel ap-
proach is the set of rate pairs (R1 , R2 ) such that

R1 ≤ min_{д1} C(д1^2 P),
R2 ≤ min_{д2} C(д2^2 P),
R1 + R2 ≤ min_{д1,д2} C((д1^2 + д2^2)P).

This is the same as the case where the channel gains are available only at the decoder.
Adaptive coding. Each sender can adapt its rate to the channel gain so that its message
can be reliably communicated regardless of the other sender’s channel gain. This coding
approach makes the fading MAC equivalent to a channel that comprises multiple MACs

(one for each channel gain pair) with shared inputs and separate outputs as illustrated in
Figure . for G j = {д j1 , д j2 }, j = 1, 2. Note that the capacity region of this equivalent
multiple MACs is the set of rate quadruples (R11 , R12 , R21 , R22 ) such that
R11 ≤ C(д11^2 P),
R12 ≤ C(д12^2 P),
R21 ≤ C(д21^2 P),
R22 ≤ C(д22^2 P),
R11 + R21 ≤ C((д11^2 + д21^2)P),    (.)
R11 + R22 ≤ C((д11^2 + д22^2)P),
R12 + R21 ≤ C((д12^2 + д21^2)P),
R12 + R22 ≤ C((д12^2 + д22^2)P).

Hence, the adaptive capacity region of the original fading MAC is the set of rate pairs
(R1 , R2 ) such that

R1 = P{G1 = д11 }R11 + P{G1 = д12 }R12 ,


R2 = P{G2 = д21 }R21 + P{G2 = д22 }R22 ,

where the rate quadruple (R11, R12, R21, R22) satisfies the inequalities in (.). (Figure . depicts this equivalent channel with multiple MACs for adaptive coding over the Gaussian fading MAC.) A more
explicit characterization of the adaptive capacity region can be obtained by the Fourier–
Motzkin elimination procedure.
In general, the adaptive capacity region is the set of rate pairs (R1 , R2 ) = (E[R1 (G1 )],
E[R2 (G2 )]) such that the adaptive rate pair (R1 (д1 ), R2 (д2 )) satisfies the inequalities

R1(д1) ≤ C(д1^2 P), д1 ∈ G1,
R2(д2) ≤ C(д2^2 P), д2 ∈ G2,    (.)
R1(д1) + R2(д2) ≤ C((д1^2 + д2^2)P), (д1, д2) ∈ G1 × G2.

In particular, the adaptive sum-capacity CA can be computed by solving the optimization


problem

maximize E(R1(G1)) + E(R2(G2))
subject to R1(д1) ≤ C(д1^2 P), д1 ∈ G1,
R2(д2) ≤ C(д2^2 P), д2 ∈ G2,
R1(д1) + R2(д2) ≤ C((д1^2 + д2^2)P), (д1, д2) ∈ G1 × G2.
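Since the objective and the constraints are linear in the adaptive rates R1(д1) and R2(д2), this is a small linear program. The sketch below (our own; the gain values and probabilities are assumed) sets it up for a two-state example and solves it with scipy.

import numpy as np
from itertools import product
from scipy.optimize import linprog

def C(x):
    return 0.5 * np.log2(1.0 + x)

g_vals, g_probs, P = [1.0, 0.4], [0.5, 0.5], 1.0   # assumed gains and probabilities
n = len(g_vals)

# variables: R1(g) for each g, then R2(g) for each g; maximize the expected sum-rate
c = -np.concatenate([g_probs, g_probs])
A_ub, b_ub = [], []
for i, g1 in enumerate(g_vals):                    # individual-rate constraints, user 1
    row = np.zeros(2 * n); row[i] = 1.0
    A_ub.append(row); b_ub.append(C(g1**2 * P))
for j, g2 in enumerate(g_vals):                    # individual-rate constraints, user 2
    row = np.zeros(2 * n); row[n + j] = 1.0
    A_ub.append(row); b_ub.append(C(g2**2 * P))
for (i, g1), (j, g2) in product(enumerate(g_vals), repeat=2):   # sum-rate constraints
    row = np.zeros(2 * n); row[i] = 1.0; row[n + j] = 1.0
    A_ub.append(row); b_ub.append(C((g1**2 + g2**2) * P))

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (2 * n))
print(-res.fun)   # adaptive sum-capacity C_A for this example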

Adaptive coding with power control. Each sender can further adapt its power to the
channel gain according to the power allocation function ϕ j (д j ) that satisfies the constraint
E(ϕ(G j )) ≤ P, j = 1, 2. The power-control adaptive capacity region can be characterized as
in (.) with P replaced by ϕ(д j ). In particular, the power-control adaptive sum-capacity
CPA can be computed by solving the optimization problem

maximize E(R1(G1)) + E(R2(G2))
subject to E(ϕ_j(G_j)) ≤ P, j = 1, 2,
R1(д1) ≤ C(д1^2 ϕ1(д1)), д1 ∈ G1,
R2(д2) ≤ C(д2^2 ϕ2(д2)), д2 ∈ G2,
R1(д1) + R2(д2) ≤ C(д1^2 ϕ1(д1) + д2^2 ϕ2(д2)), (д1, д2) ∈ G1 × G2.

In the following, we compare the sum-capacities achieved by the above approaches.


Example .. Assume that the gains G1 , G2 ∈ {д (1) , д (2) }, where states д (1) > д (2) and
P{G j = д (1) } = p, j = 1, 2. In Figure ., we compare the sum-capacities CCC (compound
channel), CA (adaptive), CPA (power-control adaptive), CGI-D (ergodic with gains available
only at the decoder), CDGI-ED (ergodic with gains available at their respective encoder and
the decoder), and CGI-ED (ergodic with gains available at both encoders and the decoder)
for different values of p ∈ [0, 1]. At low SNR (Figure .a), power control is useful for
adaptive coding; see CPA versus CA . At high SNR (Figure .b), centralized knowledge
of channel gains significantly improves upon distributed knowledge of channel gains.

Figure .. Comparison of performance metrics for the fading MAC: (a) low SNR; (b) high SNR. Here S_j = (д^{(j)})^2 P, j = 1, 2.

23.6 GAUSSIAN FADING BC

Consider the Gaussian fading BC


Y1i = G1i Xi + Z1i ,
Y2i = G2i Xi + Z2i , i ∈ [1 : n],
where {G1i ∈ G1 } and {G2i ∈ G2 } are channel gain processes and {Z1i } and {Z2i } are WGN(1)
processes, independent of {G1i } and {G2i }. Assume power constraint P on X. As before,
we assume the block fading model, where the channel gains are fixed in each coherence
time interval but vary according to a stationary ergodic process over different intervals.
Fast fading. When the channel gain is available only at the decoders, the ergodic capacity
region is not known in general. The difficulty is that the encoder does not know in which
direction the channel is degraded during each coherence time interval. When the channel
gain is available also at the encoder, however, the direction of channel degradedness is
known and the capacity region can be easily established. Let E(д1 , д2 ) = 1 if д2 ≥ д1 and
E(д1 , д2 ) = 0 otherwise, and Ē = 1 − E. Then, the ergodic capacity region is the set of rate
pairs (R1 , R2 ) such that
R1 ≤ E[ C( αG1^2 P / (1 + (1 − α)G1^2 P E(G1, G2)) ) ],
R2 ≤ E[ C( (1 − α)G2^2 P / (1 + αG2^2 P Ē(G1, G2)) ) ]

for some α ∈ [0, 1].


Slow fading. The interesting case is when the channel gains are available only at the de-
coders. Again we can investigate different approaches to coding. For example, using the
compound channel approach, we code for the worst channel. Let д1∗ and д2∗ be the lowest
gains for the channel to Y1 and the channel to Y2 , respectively, and assume without loss
of generality that д1∗ ≥ д2∗ . Then, the compound capacity region is the set of rate pairs
(R1 , R2 ) such that
R1 ≤ C(αд1*^2 P),
R2 ≤ C( (1 − α)д2*^2 P / (1 + αд2*^2 P) )
for some α ∈ [0, 1]. As before, the adaptive coding capacity regions are the same as the
corresponding ergodic capacity regions.

23.7 GAUSSIAN FADING IC

Consider a k-user-pair Gaussian fading IC

Y(i) = G(i)X(i) + Z(i), i ∈ [1 : n],



where {G_{jj′}(i)}_{i=1}^∞, j, j′ ∈ [1 : k], are the random channel gain processes from sender j′ to receiver j, and {Z_{ji}}_{i=1}^∞, j ∈ [1 : k], are WGN(1) processes, independent of the channel
gain processes. Assume power constraint P on X_j, j ∈ [1 : k]. As before, assume the block fading model, where the channel gains are fixed in each coherence time interval but vary according to stationary ergodic processes over different intervals.
We consider fast fading where the channel gains {G(i)} are available at all the encoders
and the decoders. Assuming that the marginal distribution of each channel gain process {G_{jj'}(i)}, j, j' ∈ [1 : k], is symmetric, that is, P{G_{jj'}(i) ≤ g} = P{G_{jj'}(i) ≥ −g}, we can apply the interference alignment technique introduced in Section . to achieve higher rates than time division or treating interference as Gaussian noise.
We illustrate this result through the following simple example. Suppose that for each i ∈ [1 : n], G_{jj'}(i), j, j' ∈ [1 : k], are i.i.d. Unif{−g, +g}.
Time division. Using time division with power control, we can easily show that the er-
godic sum-capacity is lower bounded as

CGI-ED ≥ C(g² kP).

Thus the rate per user tends to zero as k → ∞.


Treating interference as Gaussian noise. Using Gaussian point-to-point channel codes
and treating interference as Gaussian noise, the ergodic sum-capacity is lower bounded
as
CGI-ED ≥ k C(g² P / (g² (k − 1)P + 1)).

Thus again the rate per user tends to zero as k → ∞.


Ergodic interference alignment. Using interference alignment over time, we show that
the ergodic sum-capacity is
CGI-ED = (k/2) C(2g² P).
The proof of the converse follows by noting that the pairwise sum-rate R_j + R_{j'}, j ≠ j', is upper bounded by the sum-capacity of the two 2-user-pair Gaussian ICs with strong interference:

Y_j = g X_j + g X_{j'} + Z_j,
Y_{j'} = g X_j − g X_{j'} + Z_{j'},

and
Y_j = g X_j + g X_{j'} + Z_j,
Y_{j'} = g X_j + g X_{j'} + Z_{j'}.

Note that these two Gaussian ICs have the same capacity region.
Achievability can be proved by treating the channel gain matrices G n as a time-sharing
sequence as in Section .. and repeating each codeword twice such that the interfering signals are aligned. For a channel gain matrix G, let G* be its conjugate channel gain matrix, that is, G_{jj'} + G*_{jj'} = 0 for j ≠ j' and G_{jj} = G*_{jj}. Under our channel model, there are |G| = 2^{k²} channel gain matrices and |G|/2 pairs of {G, G*}, and p(G) = 1/|G| for all G.
Following the coding procedure for the DMC with DM state available at both the
encoder and the decoder, associate with each channel gain matrix G a FIFO buffer of
length n. Divide the message M j into |G|/2 equal-rate messages M j,G of rate 2R j /|G|, and
generate codewords x nj (m j,G , G) for each pair {G, G ∗ }. Store each codeword twice into
FIFO buffers corresponding to each G and its conjugate. Then transmit the codewords by
multiplexing over the buffers based on the channel gain matrix sequence.
Now since the same codeword is repeated twice, the demultiplexed channel outputs
corresponding to each gain pair {G, G ∗ } are

Y(G) = GX + Z(G),
Y(G ∗ ) = G ∗ X + Z(G ∗ )

with the same X. Here the corresponding noise vectors Z(G) and Z(G ∗ ) are indepen-
dent because they have different transmission times. Thus, for each gain pair {G, G ∗ }, the
effective channel is
Ỹ = (G + G*)X + Z(G) + Z(G*),

which has no interference since G + G ∗ is diagonal! Hence, the probability of error tends
to zero as n → ∞ if the rate

R_{j,G} < (1 − є) p(G) C(2g² P),

or equivalently, if
R_j < ((1 − є)/2) C(2g² P).
This completes the proof of achievability, establishing the sum-capacity of the channel.
Remark .. This ergodic interference alignment technique can be extended to arbi-
trary symmetric fading distributions by quantizing channel gain matrices.
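As a quick numerical sanity check (with arbitrary illustrative values, not taken from the text), the following Python sketch compares the per-user rates of the three sum-rate expressions above as the number of user pairs k grows; the values g = 1 and P = 10 are assumptions.

# Per-user sum-rate comparison: time division, interference as noise, ergodic IA.
import numpy as np

def C(x):
    return 0.5 * np.log2(1 + x)

g, P = 1.0, 10.0
for k in [2, 4, 8, 16, 64]:
    td = C(g**2 * k * P)                                  # time division with power control
    noise = k * C(g**2 * P / (g**2 * (k - 1) * P + 1))    # treating interference as noise
    ia = (k / 2) * C(2 * g**2 * P)                        # ergodic interference alignment
    print(f"k={k:3d}: per-user TD={td/k:.3f}, noise={noise/k:.3f}, IA={ia/k:.3f}")

The per-user rates of the first two schemes vanish as k grows, while the per-user rate under ergodic interference alignment stays at C(2g²P)/2.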

SUMMARY

∙ Gaussian fading channel models


∙ Coding under fast fading:
∙ Ergodic capacity
∙ Water-filling in time
∙ The ergodic capacity of a MIMO fading channel increases linearly with the number of
antennas

∙ Coding under slow fading:


∙ Compound channel approach
∙ Outage capacity approach
∙ Broadcast channel approach
∙ Adaptive coding with or without power control
∙ Ergodic interference alignment

∙ Open problem .. What is the ergodic capacity region of the Gaussian fading BC
under fast fading when the channel gain information is available only at the decoders?

BIBLIOGRAPHIC NOTES

The block fading model was first studied by Ozarow, Shamai, and Wyner (), who also
introduced the notion of outage capacity. The ergodic capacity of the Gaussian fading
channel was established by Goldsmith and Varaiya (). Cover () introduced the
broadcast channel approach to compound channels, which was later applied to Gaussian
fading channels with slow fading by Shamai () and Shamai and Steiner (). The
Gaussian vector fading channel was first considered by Telatar () and Foschini (),
who established the ergodic capacity in Theorem ..
The ergodic sum-capacity of the Gaussian fading MAC when channel gains are avail-
able at both encoders and the decoder was established by Knopp and Humblet (). This
result was generalized to the capacity region by Tse and Hanly (). The ergodic capac-
ity region when the channel gains are available at the decoder and each respective encoder
was established by Cemal and Steinberg () and Jafar (). The adaptive coding ap-
proach under the same channel gain availability assumption was studied by Shamai and
Telatar () and Hwang, Malkin, El Gamal, and Cioffi ().
The ergodic capacity region of the Gaussian fading BC when the channel gains are
available at the encoder and both decoders was established by Li and Goldsmith ().
Ergodic interference alignment was introduced by Cadambe and Jafar (). The exam-
ple in Section . is due to Nazer, Gastpar, Jafar, and Vishwanath (), who studied
both the k-user QED-IC and Gaussian IC. A survey of the literature on Gaussian fading
channels can be found in Biglieri, Proakis, and Shamai (). More recent developments
as well as applications to wireless systems are discussed in Tse and Viswanath () and
Goldsmith ().
The capacity scaling for random wireless fading networks was studied in Jovičić, Vis-
wanath, and Kulkarni (), Xue, Xie, and Kumar (), Xie and Kumar (), and
Ahmad, Jovičić, and Viswanath (). Özgür, Lévêque, and Tse () showed that
under Rayleigh fading with channel gains available at all the nodes, the order of the sym-
metric ergodic capacity is lower bounded as C(N) = Ω(N −є ) for every є > 0; see also
Ghaderi, Xie, and Shen (). Özgür et al. () used a -phase cellular hierarchical
Problems 599

cooperation scheme, where in phase  source nodes in every cell exchange their messages,
in phase  the source node and other nodes in the same cell cooperatively transmit the in-
tended message for each source–destination pair as for the point-to-point Gaussian vector
fading channel, and in phase  receiver nodes in every cell exchange the quantized ver-
sions of their received outputs. The communication in the first and third phases can be
performed using the same scheme recursively by further dividing the cells into subcells.
This result demonstrates the potential improvement in capacity by transmission coopera-
tion and opportunistic coding using the channel fading information. This result was fur-
ther extended to networks with varying node densities by Özgür, Johari, Tse, and Lévêque
(). The hierarchical cooperation approach was also extended to scaling laws on the
capacity region of wireless networks by Niesen, Gupta, and Shah ().

PROBLEMS

.. Find the compound channel capacity region of the Gaussian fading MAC un-
der slow fading (a) when the channel gains are available only at the decoder and
(b) when the channel gains are available at their respective encoders and at the
decoder.
.. Establish the ergodic capacity region of the Gaussian fading BC under fast fading
when the channel gains are available at the encoder and both decoders.
.. Outage capacity. Consider the slow-fading channel Y = G X + Z, where G ∼
Unif[0, 1], Z ∼ N(0, 1), and assume average power constraint P. Find the outage
capacity (a) when P = 1 and pout = 0.2, and (b) when P = 2 and pout = 0.1.
.. On-off fading MAC. Consider the Gaussian fading MAC under slow fading when
the channel gains are available at their respective encoders and at the decoder.
Suppose that G1 and G2 are i.i.d. Bern(p).
(a) Find the adaptive coding capacity region.
(b) Find the adaptive sum capacity.
.. DM-MAC with distributed state information. Consider the DM-MAC with DM
state p(y|x1 , x2 , s1 , s2 )p(s1 , s2 ), where sender j = 1, 2 wishes to communicate a
message M_j ∈ [1 : 2^{nR_j}] to the receiver. Suppose that both states are available at
the decoder and each state is causally available at its corresponding encoder. Find
the adaptive coding capacity region of this channel, which is the capacity region of
multiple MACs, one for each channel gain pair (s1 , s2 ) ∈ S1 × S2 , defined similarly
to the Gaussian case.
CHAPTER 24

Networking and Information Theory

The source and network models we discussed so far capture many essential ingredients of
real-world communication networks, including
∙ noise,
∙ multiple access,
∙ broadcast,
∙ interference,
∙ time variation and uncertainty about channel statistics,
∙ distributed compression and computing,
∙ joint source–channel coding,
∙ multihop relaying,
∙ node cooperation,
∙ interaction and feedback, and
∙ secure communication.
Although a general theory for information flow under these models remains elusive, we
have seen that there are several coding techniques—some of which are optimal or close to
optimal—that promise significant performance improvements over today’s practice. Still,
the models we discussed do not capture other key aspects of real-world networks.
∙ We assumed that data is always available at the communication nodes. In real-world
networks, data is bursty and the nodes have finite buffer sizes.
∙ We assumed that the network has a known and fixed number of users. In real-world
networks, users can enter and leave the network at will.
∙ We assumed that the network operation is centralized and communication over the
network is synchronous. Many real-world networks are decentralized and communi-
cation is asynchronous.

∙ We analyzed performance assuming arbitrarily long delays. In many networking ap-


plications, delay is a primary concern.
∙ We ignored the overhead (protocol) needed to set up the communication as well as
the cost of feedback and channel state information.
While these key aspects of real-world networks have been at the heart of the field of com-
puter networks, they have not been satisfactorily addressed by network information the-
ory, either because of their incompatibility with the basic asymptotic approach of infor-
mation theory or because the resulting models are messy and intractable. There have been
several success stories at the intersection of networking and network information theory,
however. In this chapter we discuss three representative examples.
We first consider the channel coding problem for a DMC with random data arrival.
We show that reliable communication is feasible provided that the data arrival rate is less
than the channel capacity. Similar results can be established for multiuser channels and
multiple data streams. A key new ingredient in this study is the notion of queue stability.
The second example we discuss is motivated by the random medium access control
scheme for sharing a channel among multiple senders such as in the ALOHA network.
We model a -sender -receiver random access system by a modulo- sum MAC with
multiplicative binary state available partially at each sender and completely at the receiver.
We apply various coding approaches introduced in Chapter  to this model and compare
the corresponding performance metrics.
Finally, we investigate the effect of asynchrony on the capacity region of the DM-MAC.
We extend the synchronous multiple access communication system setup in Chapter 
to multiple transmission blocks in order to incorporate unknown transmission delays.
When the delay is small relative to the transmission block length, the capacity region
does not change. However, when we allow arbitrary delay, time sharing cannot be used
and hence the capacity region can be smaller than for the synchronous case.

24.1 RANDOM DATA ARRIVALS

In the point-to-point communication system setup in Section . and subsequent exten-
sions to multiuser channels, we assumed that data is always available at the encoder. In
many networking applications, however, data is bursty and it may or may not be available
at the senders when the channel is free. Moreover, the amount of data at a sender may ex-
ceed its finite buffer size, which results in data loss even before transmission takes place. It
turns out that under fairly general data arrival models, if the data rate λ bits/transmission
is below the capacity C of the channel, then the data can be reliably communicated to the
receiver, while if λ > C, data cannot be reliably communicated either because the incom-
ing data exceeds the sender’s queue size or because transmission rate exceeds the channel
capacity. We illustrate this general result using a simple random data arrival process.
Consider the point-to-point communication system with random data arrival at its
input depicted in Figure .. Suppose that data packets arrive at the encoder at the “end”

Figure .. Communication system with random data arrival.

of transmission time i = 1, 2, . . . according to an i.i.d. process {A(i)}, where

A(i) = k with probability p, and A(i) = 0 with probability p̄ = 1 − p.

Thus, a packet randomly and uniformly chosen from the set of k-bit sequences arrives at the encoder with probability p and no packet arrives with probability p̄. Assume that the packets arriving at different transmission times are independent of each other.
A (2nR , n) augmented block code for the DMC consists of
∙ an augmented message set [1 : 2nR ] ∪ {0},
∙ an encoder that assigns a codeword x n (m) to each m ∈ [1 : 2nR ] ∪ {0}, and
∙ a decoder that assigns a message m̂ ∈ [1 : 2^{nR}] ∪ {0} or an error message e to each received sequence y^n.
The code is used in consecutive transmission blocks as follows. Let Q(i) be the number of
bits (backlog) in the sender’s queue at the “beginning” of transmission time i = 1, 2, . . . .
At the beginning of time jn, j = 1, 2, . . . , that is, at the beginning of transmission block
j, nR bits are taken out of the queue if Q( jn) ≥ nR. The bits are represented by a message
M j ∈ [1 : 2nR ] and the codeword x n (m j ) is sent over the DMC. If Q( jn) < nR, no bits are
taken out of the queue and the "0-message" codeword x^n(0) is sent. Thus, the backlog Q(i) is a time-varying Markov process with transition law

Q(i + 1) = Q(i) − nR + A(i)   if i = jn and Q(i) ≥ nR,
Q(i + 1) = Q(i) + A(i)           otherwise.                         (.)

The queue is said to be stable if supi E(Q(i)) ≤ B for some constant B < ∞. By the Markov
inequality, queue stability implies that the probability of data loss can be made as small
as desired with a finite buffer size. Define the arrival rate λ = kp as the product of the
packet arrival rate p ∈ (0, 1] and packet size k bits. We have the following sufficient and
necessary conditions on the stability of the queue in terms of the transmission rate R and
the arrival rate λ.

Lemma .. If λ < R, then the queue is stable. Conversely, if the queue is stable, then
λ ≤ R.

The proof of this lemma is given in Appendix ..



Let p j = P{M j = 0} be the probability that the sender queue has less than nR bits
at the beginning of transmission block j. By the definition of the arrival time process,
M j | {M j ̸= 0} ∼ Unif[1 : 2nR ]. Define the probability of error in transmission block j as

P_{ej}^{(n)} = P{M̂_j ≠ M_j} = p_j P{M̂_j ≠ 0 | M_j = 0} + ((1 − p_j)/2^{nR}) ∑_{m=1}^{2^{nR}} P{M̂_j ≠ m | M_j = m}.

The data arriving at the encoder according to the process {A(i)} is said to be reliably com-
municated at rate R over the DMC if the queue is stable and there exists a sequence of
(2^{nR}, n) augmented codes such that lim_{n→∞} sup_j P_{ej}^{(n)} = 0. We wish to find the necessary
and sufficient condition for reliable communication of the data over the DMC.

Theorem .. The random data arrival process {A(i)} with arrival rate λ can be reli-
ably communicated at rate R over a DMC p(y|x) with capacity C if λ < R < C. Con-
versely, if the process {A(i)} can be reliably communicated at rate R over this DMC,
then λ ≤ R ≤ C.

Proof. To prove achievability, let λ < R < C. Then the queue is stable by Lemma 24.1 and there exists a sequence of (2^{nR} + 1, n) (regular) channel codes such that both the average probability of error P_e^{(n)} and P{M̂ ≠ M | M = m'} for some m' tend to zero as n → ∞. By relabeling m' = 0, we have shown that there exists a sequence of (2^{nR}, n) augmented codes such that P_{ej}^{(n)} tends to zero as n → ∞ for every j.
To prove the converse, note first that λ ≤ R from Lemma .. Now, for each j, follow-
ing similar steps to the converse proof of the channel coding theorem in Section .., we
obtain
nR = H(M j | M j ̸= 0)
≤ I(M j ; Y n | M j ̸= 0) + nєn
n (.)
≤ 󵠈 I(Xi ; Yi | M j ̸= 0) + nєn
i=1
≤ n(C + єn ).

This completes the proof of Theorem ..

Remark 24.1. Theorem . continues to hold for arrival processes for which Lemma .
holds. It can be also extended to multiuser channels with random data arrivals at each
sender. For example, consider the case of a DM-MAC with two independent i.i.d. arrival
processes {A 1 (i)} and {A 2 (i)} of arrival rates λ1 and λ2 , respectively. The stability region
S for the two sender queues is the closure of the set of arrival rates (λ1 , λ2 ) such that both
queues are stable. We define the augmented code (2nR1 , 2nR2 , n), the average probability
of error, and achievability as for the point-to-point case. Let C be the capacity region of
the DM-MAC. Then it can be readily shown that S = C . Note that the same result holds
when the packet arrivals (but not the packet contents) are correlated.

Remark 24.2. The conclusion that randomly arriving data can be communicated reli-
ably over a channel when the arrival rate is less than the capacity trivializes the effect of
randomness in data arrival. In real-world applications, packet delay constraints are as
important as queue stability. However, the above result, and the asymptotic approach of
information theory in general, does not capture such constraints well.

24.2 RANDOM ACCESS CHANNEL

The previous section dealt with random data arrivals at the senders. In this section, we
consider random data arrivals at the receivers. We discuss random access, which is a
popular scheme for medium access control in local area networks. In these networks, the
number of senders is not fixed a priori and hence using time division can be inefficient.
The random access scheme improves upon time division by having each active sender
transmit its packets in randomly selected transmission blocks. In practical random access
control systems, however, the packets are encoded at a fixed rate and if more than one
sender transmits in the same block, the packets are lost. It turns out that we can do better
by using more sophisticated coding schemes.
We model a random access channel by a modulo- sum MAC with multiplicative bi-
nary state components as depicted in Figure .. The output of the channel at time i
is
Yi = S1i ⋅ X1i ⊕ S2i ⋅ X2i ,

where the states S1i and S2i are constant over each access time interval [(l − 1)k + 1 : lk]
of length k for l = 1, 2, . . . , and the processes {S̄_{1l}} = {S_{1,(l−1)k+1}} and {S̄_{2l}} = {S_{2,(l−1)k+1}} are independent Bern(p) processes. Sender j = 1, 2 is active (has a packet
to transmit) when S j = 1 and is inactive when S j = 0. We assume that the receiver knows
which senders are active in each access time interval, but each sender knows only its own
activity.

Figure .. Random access channel.



Note that this model is analogous to the Gaussian fading MAC in Section ., where
the channel gains are available at the receiver but each sender knows only the gain of its
own channel. Since each sender becomes active at random in each block, the communi-
cation model corresponds to the slow fading scenario. Using the analogy to the fading
MAC, we consider different coding approaches and corresponding performance metrics
for the random access channel. Unlike the fading MAC, however, no coordination is al-
lowed between the senders in the random access channel.
Compound channel approach. In this approach, we code for the worst case in which no
packets are to be transmitted (i.e., S1i = S2i = 0 for all i ∈ [1 : n]). Hence, the capacity
region is {(0, 0)}.
ALOHA. In this approach, sender j = 1, 2 transmits at rate R j = 1 when it is active and
at rate R j = 0 when it is not. When there is collision (that is, both senders are active),
decoding simply fails. The ALOHA sum-capacity (that is, the average total throughput) is

CALOHA = p(1 − p) + p(1 − p) = 2p(1 − p).

Adaptive coding. By reducing the rates in the ALOHA approach to R̃ j ≤ 1 when sender
j = 1, 2 is active (so that the messages can be recovered even under collision), we can
increase the average throughput.
To analyze the achievable rates for this approach, consider the -sender -receiver
channel depicted in Figure .. It can be easily shown that the capacity region of this
channel is the set of rate pairs (R̃ 1 , R̃ 2 ) such that R̃ 1 + R̃ 2 ≤ 1 and is achieved using simul-
taneous decoding without time sharing; see Problem .. Hence, any rate pair (R̃ 1 , R̃ 2 )
in the capacity region of the -sender -receiver channel is achievable for the random ac-
cess channel, even though each sender is aware only of its own activity. In particular, the
adaptive coding sum-capacity is

CA = max_{(R̃1, R̃2): R̃1+R̃2≤1} (P{S1 = 1} R̃1 + P{S2 = 1} R̃2) = p.

Broadcast channel approach. In the ALOHA approach, the messages cannot be recov-
ered at all when there is a collision. In the adaptive coding approach, both messages must

Encoder  Decoder 

Figure .. Adaptive coding for the random access channel.



be recovered even when there is a collision. The broadcast approach combines these two
approaches by requiring only part of each message to be recovered when there is a col-
lision and the rest of the message to be also recovered when there is no collision. This
is achieved using superposition coding. To analyze the achievable rates for this strategy,
consider the -sender -receiver channel depicted in Figure .. Here the message pair
(M̃ j0 , M
̃ j j ) from the active sender j is to be recovered when there is no collision, while the
message pair (M ̃ 10 , M
̃ 20 ), one from each sender is to be recovered when there is collision.
It can be shown (see Problem .) that the capacity region of this -sender -receiver
channel is the set of rate quadruples (R̃ 10 , R̃ 11 , R̃ 20 , R̃ 22 ) such that
R̃ 10 + R̃ 20 + R̃ 11 ≤ 1,
(.)
R̃ 10 + R̃ 20 + R̃ 22 ≤ 1.
As for the adaptive coding case, this region can be achieved using simultaneous decoding
without time sharing. Note that taking (R̃ 11 , R̃ 22 ) = (0, 0) reduces to the adaptive coding
case. The average throughput of sender j ∈ {1, 2} is
R_j = p(1 − p)(R̃_{j0} + R̃_{jj}) + p² R̃_{j0} = p R̃_{j0} + p(1 − p) R̃_{jj}.
Thus, the broadcast sum-capacity is
CBC = max 󶀡p(R̃ 10 + R̃ 20 ) + p(1 − p)(R̃ 11 + R̃ 22 )󶀱,
where the maximum is over all rate quadruples in the capacity region in (.). By sym-
metry, it can be readily checked that
CBC = max{2p(1 − p), p}.
Note that this sum-capacity is achieved by setting R̃ 11 = R̃ 22 = 1, R̃ 10 = R̃ 20 = 0 for p ≤ 1/2,
and R̃ 10 = R̃ 20 = 1/2, R̃ 11 = R̃ 22 = 0 for p ≥ 1/2. Hence, ignoring collision (ALOHA)
is throughput-optimal when p ≤ 1/2, while the broadcast channel approach reduces to
adaptive coding when p ≥ 1/2.
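A short numerical check (illustrative only) of the sum-capacity expressions derived above for the two-sender random access channel, as a function of the activity probability p:

# Comparing the random access sum-capacities over the activity probability p.
import numpy as np

for p in np.linspace(0.1, 0.9, 9):
    C_aloha = 2 * p * (1 - p)          # ALOHA
    C_a = p                            # adaptive coding
    C_bc = max(2 * p * (1 - p), p)     # broadcast channel approach
    print(f"p={p:.1f}: ALOHA={C_aloha:.2f}, adaptive={C_a:.2f}, broadcast={C_bc:.2f}")

The crossover at p = 1/2 is visible directly: ALOHA dominates for p below 1/2 and adaptive coding dominates above it, with the broadcast approach matching the better of the two.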
Figure . compares the sum-capacities CCC (compound channel approach), CALOHA
(ALOHA), CA (adaptive coding), and CBC (broadcast channel approach). Note that the
broadcast channel approach performs better than adaptive coding when the senders are
active less often (p ≤ 1/2).

Figure .. Broadcast coding for the random access channel.



Figure .. Comparison of the sum-capacities of the random access channel—CCC for the compound approach, CALOHA for ALOHA, CA for adaptive coding, and CBC for the broadcast channel approach.

24.3 ASYNCHRONOUS MAC

In the single-hop channel models we discussed in Part II of the book, we assumed that
the transmissions from the senders to the receivers are synchronized (at both the symbol
and block levels). In practice, such complete synchronization is often not feasible. How
does the lack of synchronization affect the capacity region of the channel? We answer
this question for the asynchronous multiple access communication system depicted in
Figure ..
Suppose that sender j = 1, 2 wishes to communicate an i.i.d. message sequence (M j1 ,
M j2 , . . .). Assume that the same codebook is used in each transmission block. Further
assume that symbols are synchronized, but that the blocks sent by the two encoders in-
cur arbitrary delays d1 , d2 ∈ [0 : d], respectively, for some d ≤ n − 1. Assume that the
encoders and the decoder do not know the delays a priori. The received sequence Y n is
distributed according to

p(y^n | x_{1,1−d1}^n, x_{2,1−d2}^n) = ∏_{i=1}^n p_{Y|X1,X2}(y_i | x_{1,i−d1}, x_{2,i−d2}),

where the symbols with negative indices are from the previous transmission block.

Figure .. Asynchronous multiple access communication system.



A (2nR1 , 2nR2 , n, d) code for the asynchronous DM-MAC consists of


∙ two message sets [1 : 2nR1 ] and [1 : 2nR2 ],
∙ two encoders, where encoder  assigns a sequence of codewords x1n (m1l ) to each mes-
sage sequence m1l ∈ [1 : 2nR1 ], l = 1, 2, . . . , and encoder  assigns a sequence of code-
words x2n (m2l ) to each message sequence m2l ∈ [1 : 2nR2 ], l = 1, 2, . . . , and
∙ a decoder that assigns a sequence of message pairs (m̂_{1l}, m̂_{2l}) ∈ [1 : 2^{nR1}] × [1 : 2^{nR2}] or an error message e to each received sequence y_{(l−1)n+1}^{ln+d} for each l = 1, 2, . . . (the received sequence y_{(l−1)n+1}^{ln+d} can include parts of the previous and next blocks).
We assume that the message sequences {M_{1l}} and {M_{2l}} are independent and each
message pair (M1l , M2l ), l = 1, 2, . . . , is uniformly distributed over [1 : 2nR1 ] × [1 : 2nR2 ].
The average probability of error is defined as
P_e^{(n)} = max_{d1,d2 ∈ [0:d]} sup_l P_{el}^{(n)}(d1, d2),

where P_{el}^{(n)}(d1, d2) = P{(M̂_{1l}, M̂_{2l}) ≠ (M_{1l}, M_{2l}) | d1, d2}. Note that by the memoryless property of the channel and the definition of the code, sup_l P_{el}^{(n)}(d1, d2) = P_{el}^{(n)}(d1, d2) for all l. Thus in the following, we drop the subscript l. Achievability and the capacity region are defined as for the synchronous DM-MAC.
We consider two degrees of asynchrony.
Mild asynchrony. Suppose that d/n tends to zero as n → ∞. Then, it can be shown that
the capacity region is the same as for the synchronous case.
Total asynchrony. Suppose that d1 and d2 can vary from  to (n − 1), i.e., d = n − 1. In this
case, time sharing is no longer feasible and the capacity region reduces to the following.

Theorem .. The capacity region of the totally asynchronous DM-MAC is the set of
all rate pairs (R1 , R2 ) such that

R1 ≤ I(X1 ; Y | X2 ),
R2 ≤ I(X2 ; Y | X1 ),
R1 + R2 ≤ I(X1 , X2 ; Y)

for some pmf p(x1 )p(x2 ).

Note that this region is not convex in general, since time sharing is sometimes neces-
sary; see Problem .. Hence, unlike the synchronous case, the capacity region for net-
works with total asynchrony is not necessarily convex.
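As a small sketch (the channel and input pmf here are our own illustrative choices, not from the text), the following Python snippet evaluates the rate region of the theorem for the binary adder MAC Y = X1 + X2 with X1, X2 independent Bern(1/2), a case in which no time sharing is involved.

# Evaluating I(X1;Y|X2), I(X2;Y|X1), and I(X1,X2;Y) for Y = X1 + X2, X1, X2 ~ Bern(1/2).
import numpy as np

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# joint pmf p(x1, x2, y) for the deterministic channel y = x1 + x2
p = np.zeros((2, 2, 3))
for x1 in range(2):
    for x2 in range(2):
        p[x1, x2, x1 + x2] = 0.25

pY = p.sum(axis=(0, 1))
I_sum = H(pY)  # I(X1,X2;Y) = H(Y) since the channel is deterministic
I1 = sum(0.5 * H(p[:, x2, :].sum(axis=0) / 0.5) for x2 in range(2))  # I(X1;Y|X2) = H(Y|X2)
I2 = sum(0.5 * H(p[x1, :, :].sum(axis=0) / 0.5) for x1 in range(2))  # I(X2;Y|X1) = H(Y|X1)
print(f"R1 <= {I1:.2f}, R2 <= {I2:.2f}, R1 + R2 <= {I_sum:.2f}")  # 1.00, 1.00, 1.50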

Remark 24.3. The sum-capacity of the totally asynchronous DM-MAC is the same as
that of the synchronous DM-MAC and is given by
C_sum = max_{p(x1)p(x2)} I(X1, X2; Y).

Remark 24.4. The capacity region of the Gaussian MAC does not change with asyn-
chrony because time sharing is not required. However, under total asynchrony, simul-
taneous decoding is needed to achieve all the points in the capacity region.
Remark 24.5. Theorem . also shows that the capacity of a point-to-point channel does
not change with asynchrony.
We prove Theorem . in the next two subsections.

24.3.1 Proof of Achievability


Divide each n-transmission block into (b + 1) subblocks each consisting of k symbols as
illustrated in Figure .; thus n = (b + 1)k and the delays range from 0 to (b + 1)k − 1.
The first subblock labeled j = 0 is the preamble subblock. Also divide the message pair
(M1 , M2 ) into b independent submessage pairs (M1 j , M2 j ) ∈ [1 : 2kR1 ] × [1 : 2kR2 ], j ∈ [1 :
b], and send them in the following b subblocks. Note that the resulting rate pair for this
code, (bR1 /(b + 1), bR2 /(b + 1)), can be made arbitrarily close to (R1 , R2 ) as b → ∞.

Figure .. Transmission block divided into subblocks.

Codebook generation. Fix a product pmf p(x1 )p(x2 ). Randomly and independently gen-
erate a codebook for each subblock. Randomly generate a preamble codeword x_1^k(0) according to ∏_{i=1}^k p_{X1}(x_{1i}). For each j ∈ [1 : b], randomly and independently generate 2^{kR1} codewords x_1^k(m_{1j}), m_{1j} ∈ [1 : 2^{kR1}], each according to ∏_{i=1}^k p_{X1}(x_{1i}). Similarly generate a preamble codeword x_2^k(0) and codewords x_2^k(m_{2j}), m_{2j} ∈ [1 : 2^{kR2}], j ∈ [1 : b], each according to ∏_{i=1}^k p_{X2}(x_{2i}).

Encoding. To send the submessages m1b , encoder  first transmits its preamble codeword
x1k (0) followed by x1k (m1 j ) for each j ∈ [1 : b]. Similarly, encoder  transmits its preamble
codeword x2k (0) followed by x2k (m2 j ) for each j ∈ [1 : b].

Decoding. The decoding procedure consists of two steps—preamble decoding and mes-
sage decoding. The decoder declares d̂1 to be the estimate for d1 if it is the unique number in [0 : (b + 1)k − 1] such that (x_1^k(0), y_{d̂1+1}^{d̂1+k}) ∈ T_є^{(n)}. Similarly, the decoder declares d̂2 to be the estimate for d2 if it is the unique number in [0 : (b + 1)k − 1] such that (x_2^k(0), y_{d̂2+1}^{d̂2+k}) ∈ T_є^{(n)}.
Assume without loss of generality that d̂1 ≤ d̂2 . Referring to Figure ., define the
sequences
x_1(m_1^b) = (x_{1,δ+1}^k(0), x_1^k(m_{11}), x_1^k(m_{12}), . . . , x_1^k(m_{1b}), x_1^δ(0)),
x_2(m̃_2^{b+1}) = (x_2^k(m̃_{21}), x_2^k(m̃_{22}), . . . , x_2^k(m̃_{2,b+1})),
y = y_{d̂1+δ+1}^{(b+1)k+d̂1+δ},

where δ = d̂2 − d̂1 (mod k). The receiver declares that m̂_1^b is the sequence of submessages sent by sender 1 if it is the unique submessage sequence such that (x_1(m̂_1^b), x_2(m̃_2^{b+1}), y) ∈ T_є^{(n)} for some m̃_2^{b+1}.

Figure .. Asynchronous transmission and received sequence.

To recover the message sequence m_2^b, the same procedure is repeated beginning with the preamble of sender 2.
Analysis of the probability of error. We bound the probability of decoding error for the
submessages M1b from sender  averaged over the codes. Assume without loss of generality
̃ b+1 , X1 (M b ), X2 (M
that M1b = 1 = (1, . . . , 1) and d1 ≤ d2 . Let M ̃ b+1 ), and Y be defined as
2 1 2
before (see Figure .) with (d1 , d2 ) in place of ( d̂1 , d̂2 ). The decoder makes an error only
if one or more of the following events occur:
E_0 = {(d̂1(Y^{2n−1}), d̂2(Y^{2n−1})) ≠ (d1, d2)},
E_11 = {(X_1(1), X_2(M̃_2^{b+1}), Y) ∉ T_є^{(n)}},
E_12 = {(X_1(m_1^b), X_2(m̄_2^{b+1}), Y) ∈ T_є^{(n)} for some m_1^b ≠ 1, m̄_2^{b+1} ≠ M̃_2^{b+1}}.
Thus, the probability of decoding error for M1b is upper bounded as


P(E1 ) ≤ P(E0 ) + P(E11 ∩ E0c ) + P(E12 ∩ E0c ). (.)

To bound the first term, the probability of preamble decoding error, define the events
E_01 = {(X_1^k(0), Y_{d1+1}^{d1+k}) ∉ T_є^{(n)}},
E_02 = {(X_2^k(0), Y_{d2+1}^{d2+k}) ∉ T_є^{(n)}},
E_03 = {(X_1^k(0), Y_{d̃1+1}^{d̃1+k}) ∈ T_є^{(n)} for some d̃1 ≠ d1, d̃1 ∈ [0 : (b + 1)k − 1]},
E_04 = {(X_2^k(0), Y_{d̃2+1}^{d̃2+k}) ∈ T_є^{(n)} for some d̃2 ≠ d2, d̃2 ∈ [0 : (b + 1)k − 1]}.

Then
P(E0 ) ≤ P(E01 ) + P(E02 ) + P(E03 ) + P(E04 ).
By the LLN, the first two terms tend to zero as k → ∞. To bound the other two terms, we
use the following.

Lemma .. Let (X, Y ) ∼ p(x, y) ̸= p(x)p(y) and (X n , Y n ) ∼ ∏ni=1 p X,Y (xi , yi ). If є >
0 is sufficiently small, then there exists γ(є) > 0 that depends only on p(x, y) such that

P󶁁(X k , Yd+1
d+k
󶀱 ∈ Tє(k) 󶁑 ≤ 2−kγ(є)

for every d ̸= 0.

The proof of this lemma is given in Appendix B.


Now using this lemma with X^k ← X_1^k(0) and Y_{d+1}^{d+k} ← Y_{d̃1+1}^{d̃1+k}, we have

P{(X_1^k(0), Y_{d̃1+1}^{d̃1+k}) ∈ T_є^{(k)}} ≤ 2^{−kγ(є)}

for d̃1 < d1, and the same bound holds also for d̃1 > d1 by changing the role of X and Y in the lemma. Thus, by the union of events bound,

P(E_03) ≤ (b + 1)k 2^{−kγ(є)},
which tends to zero as k → ∞. Similarly, P(E04 ) tends to zero as k → ∞.
We continue with bounding the last two terms in (.). By the LLN, P(E11 ∩ E0c ) tends
to zero as n → ∞. To upper bound P(E12 ∩ E0c ), define the events

E(J_1, J_2) = {(X_1(m_1^b), X_2(m̄_2^{b+1}), Y) ∈ T_є^{(n)} for m_{1j_1} = 1, j_1 ∉ J_1, m̄_{2j_2} = M̃_{2j_2}, j_2 ∉ J_2, and some m_{1j_1} ≠ 1, j_1 ∈ J_1, m̄_{2j_2} ≠ M̃_{2j_2}, j_2 ∈ J_2}

for each J_1 ⊆ [1 : b] and J_2 ⊆ [1 : b + 1]. Then

P(E_12 ∩ E_0^c) ≤ ∑_{∅≠J_1⊆[1:b], J_2⊆[1:b+1]} P(E(J_1, J_2)).

We bound each term. Consider the event E(J1 , J2 ) illustrated in Figure . for b = 5,
J1 = {1, 3}, J2 = {3, 4}. The (b + 1)k transmissions are divided into the following four
groups:

∙ Transmissions where both m1 j1 and m ̄ 2 j2 are correct: Each symbol in this group is
generated according to p(x1 )p(x2 )p(y|x1 , x2 ). Assume that there are k1 such symbols.
∙ Transmissions where m1 j1 is in error but m̄ 2 j2 is correct: Each symbol in this group is
generated according to p(x1 )p(x2 )p(y|x2 ). Assume that there are k2 such symbols.
∙ Transmissions where m̄ 2 j2 is in error but m1 j1 is correct: Each symbol in this group is
generated according to p(x1 )p(x2 )p(y|x1 ). Assume that there are k3 such symbols.
∙ Transmissions where both m1 j1 and m ̄ 2 j2 are in error: Each symbol in this group is
generated according to p(x1 )p(x2 )p(y). Assume that there are k4 such symbols.
Note that k1 + k2 + k3 + k4 = (b + 1)k, k2 + k4 = k|J1 |, and k3 + k4 = k|J2 |.

Figure .. Illustration of error event E(J1, J2) partitioning into four groups. The shaded subblocks denote the messages in error.

Now, by the independence of the subblock codebooks and the joint typicality lemma,
P{(X_1(m_1^b), X_2(m̄_2^{b+1}), Y) ∈ T_є^{(n)}}
   ≤ 2^{−k_2(I(X1;Y|X2)−δ(є))} ⋅ 2^{−k_3(I(X2;Y|X1)−δ(є))} ⋅ 2^{−k_4(I(X1,X2;Y)−δ(є))}          (.)

for each submessage sequence pair (m_1^b, m̄_2^{b+1}) with the given error location. Furthermore, the total number of such submessage sequence pairs is upper bounded by 2^{k(|J1|R1+|J2|R2)}. Thus, by the union of events bound and (.), we have

P(E(J_1, J_2)) ≤ 2^{k(|J1|R1+|J2|R2)} ⋅ 2^{−k_2(I(X1;Y|X2)−δ(є))−k_3(I(X2;Y|X1)−δ(є))−k_4(I(X1,X2;Y)−δ(є))}
             = 2^{−k_2(I(X1;Y|X2)−R1−δ(є))} ⋅ 2^{−k_3(I(X2;Y|X1)−R2−δ(є))} ⋅ 2^{−k_4(I(X1,X2;Y)−R1−R2−δ(є))},
which tends to zero as k → ∞ if R1 < I(X1 ; Y |X2 ) − δ(є), R2 < I(X2 ; Y|X1 ) − δ(є), and
R1 + R2 < I(X1 , X2 ; Y ) − δ(є).
The probability of decoding error for M2b can be bounded similarly. This completes
the achievability proof of Theorem ..

24.3.2 Proof of the Converse


Given a sequence of (2nR1 , 2nR2 , n, d = n − 1) codes such that limn→∞ Pe(n) = 0, we wish
to show that the rate pair (R1 , R2 ) must satisfy the inequalities in Theorem . for some
product pmf p(x1 )p(x2 ). Recall that the codebook is used independently in consecutive
blocks. Assume that d1 = 0 and the receiver can synchronize the decoding with the trans-
mitted sequence from sender . The probability of error in this case is

max sup Pel(n) (0, d2 ) ≤ max sup Pel(n) (d1 , d2 ) = Pe(n) .


d2 ∈[0:n−1] l d1 ,d2 ∈[0:n−1] l

Further assume that D2 ∼ Unif[0 : n − 1]. Then the expected probability of error is upper
bounded as ED2 (supl Pel(n) (0, D2 )) ≤ Pe(n) . We now prove the converse under these more
relaxed assumptions.
To simplify the notation and ignore the edge effect, we assume that the communication started in the distant past, so (X_1^n, X_2^n, Y^n) has the same distribution as (X_{1,n+1}^{2n}, X_{2,n+1}^{2n}, Y_{n+1}^{2n}). Consider decoding the received sequence Y^{(κ+1)n−1} to recover the sequence of κ message pairs (M_{1l}, M_{2l}) ∈ [1 : 2^{nR1}] × [1 : 2^{nR2}], l ∈ [1 : κ].
By Fano’s inequality,

H(M1l , M2l |Y (κ+1)n , D2 ) ≤ H(M1l , M2l |Y (κ+1)n−1 ) ≤ nєn

for l ∈ [1 : κ], where єn tends to zero as n → ∞.


Following the converse proof for the synchronous DM-MAC in Section ., it is easy
to show that
κnR1 ≤ ∑_{i=1}^{(κ+1)n} I(X_{1i}; Y_i | X_{2,i−D2}, D2) + κnє_n,
κnR2 ≤ ∑_{i=1}^{(κ+1)n} I(X_{2,i−D2}; Y_i | X_{1i}, D2) + κnє_n,
κn(R1 + R2) ≤ ∑_{i=1}^{(κ+1)n} I(X_{1i}, X_{2,i−D2}; Y_i | D2) + κnє_n.

Now let Q ∼ Unif[1 : n] (not over [1 : (κ + 1)n − 1]) be the time-sharing random variable
independent of (X1κn , X2κn , Y (κ+1)n , D2 ). Then

κnR1 ≤ ∑_{l=1}^{κ+1} n I(X_{1,Q+(l−1)n}; Y_{Q+(l−1)n} | X_{2,Q+(l−1)n−D2}, D2, Q) + κnє_n
     (a)= (κ + 1)n I(X_{1Q}; Y_Q | X_{2,Q−D2}, Q, D2) + κnє_n
     = (κ + 1)n I(X1; Y | X2, Q, D2) + κnє_n
     (b)≤ (κ + 1)n I(X1; Y | X2) + κnє_n,

where X1 = X1Q , X2 = X2,Q−D2 , Y = YQ , (a) follows since the same codebook is used over
blocks, and (b) follows since (Q, D2 ) → (X1 , X2 ) → Y form a Markov chain. Similarly
κnR2 ≤ (κ + 1)nI(X2 ; Y | X1 ) + κnєn ,
κn(R1 + R2 ) ≤ (κ + 1)nI(X1 , X2 ; Y) + κnєn .
Note that since D2 ∼ Unif[0 : n − 1] is independent of Q, X2 is independent of Q and thus
of X1 . Combining the above inequalities, and letting n → ∞ and then κ → ∞ completes
the proof of Theorem ..

SUMMARY

∙ DMC with random arrival model:


∙ Queue stability
∙ Channel capacity is the limit on the arrival rate for reliable communication
∙ Extensions to multiuser channels
∙ Random access channel as a MAC with state:
∙ Compound channel approach
∙ ALOHA
∙ Adaptive coding
∙ Broadcast channel approach
∙ Asynchronous MAC:
∙ Capacity region does not change under mild asynchrony
∙ Capacity region under total asynchrony reduces to the synchronous capacity re-
gion without time sharing
∙ Subblock coding and synchronization via preamble decoding
∙ Simultaneous decoding increases the rates under asynchrony

∙ Open problem .. What is the capacity region of the asynchronous MAC when
d = αn for α ∈ (0, 1)?

BIBLIOGRAPHIC NOTES

The “unconsummated union” between information theory and networking was surveyed
by Ephremides and Hajek (). This survey includes several topics at the intersection of
the two fields, including multiple access protocols, timing channels, effective bandwidth

of bursty data sources, deterministic constraints on data streams, queuing theory, and
switching networks. The result on the stability region of a DM-MAC mentioned in Re-
mark . can be found, for example, in Kalyanarama Sesha Sayee and Mukherji ().
The random access (collision) channel is motivated by the ALOHA System first described
in Abramson (). A comparative study of information theoretic and collision resolu-
tion approaches to the random access channel is given by Gallager (). The adaptive
coding approach in Section . is an example of the DM-MAC with distributed state in-
formation studied in Hwang, Malkin, El Gamal, and Cioffi (). The broadcast channel
approach to the random access channel is due to Minero, Franceschetti, and Tse ().
They analyzed the broadcast channel approach for the N-sender random access channel
and demonstrated that simultaneous decoding can greatly improve the average through-
put over simple collision resolution approaches as sketched in Figure ..

Figure .. Comparison of the sum-capacities (average throughputs) of ALOHA (CALOHA), adaptive coding (CA), and broadcast channel approach (CBC) versus the load (average number of active senders) λ.

Cover, McEliece, and Posner () showed that mild asynchrony does not affect the
capacity region of the DM-MAC. Massey and Mathys () studied total asynchrony in
the collision channel without feedback and showed that time sharing cannot be used. The
capacity region of the totally asynchronous DM-MAC in Theorem . is due to Poltyrev
() and Hui and Humblet (). Verdú () extended this result to multiple access
channels with memory and showed that unlike the memoryless case, asynchrony can in
general reduce the sum-capacity.

PROBLEMS

.. Provide the details of the converse proof of Theorem . by justifying the second
inequality in (.).

.. Consider the DM-MAC p(y1 , y2 |x) with two i.i.d. arrival processes {A 1 (i)} and
{A 2 (i)} of arrival rates λ1 and λ2 , respectively. Show that the stability region S is
equal to the capacity region C .
.. Nonslotted DMC with random data arrivals. Consider the DMC with random data
arrival process {A(i)} as defined in Section .. Suppose that the sender trans-
mits a codeword if there are more than nR bits in the queue and transmits a fixed
symbol, otherwise. Find the necessary and sufficient conditions for reliable com-
munication (that is, the queue is stable and the message is recovered).
.. Two-sender three-receiver channel with  messages. Consider a DM -sender -
receiver channel p(y1 |x1 )p(y2 |x2 )p(y12 |x1 , x2 ), where the message demands are
specified as in Figure ..
(a) Show that the capacity region of this channel is the set of rate pairs (R̃ 1 , R̃ 2 )
such that

R̃ 1 ≤ I(X1 ; Y1 |Q),
R̃ 1 ≤ I(X1 ; Y12 | X2 , Q),
R̃ 2 ≤ I(X2 ; Y2 |Q),
R̃ 2 ≤ I(X2 ; Y12 | X1 , Q),
R̃ 1 + R̃ 2 ≤ I(X1 , X2 ; Y12 |Q)

for some p(q)p(x1 |q)p(x2 |q).


(b) Consider the special case in Figure ., where X1 and X2 are binary, and Y1 =
X1 , Y2 = X2 , and Y12 = X1 ⊕ X2 . Show that the capacity region reduces to the
set of rate pairs (R̃ 1 , R̃ 2 ) such that R̃ 1 + R̃ 2 ≤ 1 and can be achieved without
time sharing.
.. Two-sender three-receiver channel with  messages. Consider a DM -sender -
receiver channel p(y1 |x1 )p(y2 |x2 )p(y12 |x1 , x2 ), where the message demands are
specified as in Figure ..
(a) Show that a rate quadruple (R̃ 10 , R̃ 11 , R̃ 20 , R̃ 22 ) is achievable if

R̃ 11 ≤ I(X1 ; Y1 |U1 , Q),


R̃ 10 + R̃ 11 ≤ I(X1 ; Y1 |Q),
R̃ 22 ≤ I(X2 ; Y2 |U2 , Q),
R̃ 20 + R̃ 22 ≤ I(X2 ; Y2 |Q),
R̃ 10 + R̃ 20 ≤ I(U1 , U2 ; Y12 |Q),
R̃ 10 ≤ I(U1 ; Y12 |U2 , Q),
R̃ 20 ≤ I(U2 ; Y12 |U1 , Q)

for some pmf p(q)p(u1 , x1 |q)p(u2 , x2 |q).



(b) Consider the special case in Figure ., where X1 and X2 are binary, and Y1 =
X1 , Y2 = X2 , and Y12 = X1 ⊕ X2 . Show that the above inner bound simplifies
to (.). (Hint: Show that both regions have the same five extreme points
(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1), and (0, 1, 0, 1).)
(c) Prove the converse for the capacity region in (.).
.. MAC and BC with known delays. Consider the DM-MAC and the DM-BC with
constant delays d1 and d2 known at the senders and the receivers. Show that the
capacity regions for these channels coincide with those without any delays.
.. Mild asynchrony. Consider the DM-MAC with delays d1 , d2 ∈ [0 : d] such that d/n
tends to zero as n → ∞. Show that the capacity region is equal to that without any
delays. (Hint: Consider all contiguous codewords of length n − d and perform
joint typicality decoding using y_{d+1}^n for each delay pair.)

APPENDIX 24A PROOF OF LEMMA 24.1

We first prove the converse, that is, the necessity of λ ≤ R. By the transition law for Q(i)
in (.),
Q(i + 1) ≥ Q(i) − nR + A(i)   if i = jn,
Q(i + 1) ≥ Q(i) + A(i)           otherwise.

Hence, by summing over i and telescoping, we have Q(jn + 1) ≥ ∑_{i=1}^{jn} A(i) − jnR. By
taking expectation on both sides and using the stability condition, we have ∞ > B ≥
E(Q( jn + 1)) ≥ jn(λ − R) for j = 1, 2, . . . . This implies that R ≥ λ.
Next we prove the sufficiency of λ < R using an elementary form of Foster–Lyapunov
techniques (Meyn and Tweedie ). Let Q̃ j = Q(( j − 1)n + 1) for j = 1, 2, . . . and à j =
jn
∑i=( j−1)n+1 A(i). Then, by the queue transition law,

Q̃ − nR + Ã j if Q̃ j ≥ nR,
Q̃ j+1 = 󶁇 j
Q̃ j + Ã j otherwise
≤ max{Q̃ j − nR, nR} + Ã j
= max{Q̃ j − 2nR, 0} + Ã j + nR.

Since (max{Q̃_j − 2nR, 0})² ≤ (Q̃_j − 2nR)²,

Q̃²_{j+1} ≤ Q̃²_j + (2nR)² + (Ã_j + nR)² − 2Q̃_j(nR − Ã_j).

By taking expectation on both sides and using the independence of Q̃_j and Ã_j and the fact that E(Ã_j) = nλ and E((Ã_j + nR)²) ≤ n²(k + R)², we obtain

E(Q̃²_{j+1}) ≤ E(Q̃²_j) + n²((k + R)² + 4R²) − 2n(R − λ) E(Q̃_j),



or equivalently,

E(Q̃_j) ≤ n((k + R)² + 4R²)/(2(R − λ)) + (E(Q̃²_j) − E(Q̃²_{j+1}))/(2n(R − λ)).

Since Q̃_1 = 0, summing over j and telescoping, we have

(1/b) ∑_{j=1}^b E(Q̃_j) ≤ n((k + R)² + 4R²)/(2(R − λ)) + (E(Q̃²_1) − E(Q̃²_{b+1}))/(2nb(R − λ)) ≤ n((k + R)² + 4R²)/(2(R − λ)).

Recall the definition of Q̃ j = Q(( j − 1)n + 1) and note that Q(i) ≤ Q(( j − 1)n + 1) + kn
for i ∈ [( j − 1)n + 1 : jn]. Therefore, we have stability in the mean, that is,
sup_l (1/l) ∑_{i=1}^l E(Q(i)) ≤ B < ∞.                    (.)

To prove stability, i.e., supi E(Q(i)) < ∞, which is a stronger notion than stability in the
mean in (.), we note that the Markov chain {Q̃ j } is positively recurrent; otherwise,
the stability in the mean would not hold. Furthermore, it can be readily checked that the
Markov chain is aperiodic. Hence, the chain has a unique limiting distribution and E(Q̃ j )
converges to a limit (Meyn and Tweedie ). But by the Cesàro mean lemma (Hardy
, Theorem ), (1/b) ∑bj=1 E(Q̃ j ) < ∞ for all b implies that lim j E(Q̃ j ) < ∞. Thus,
sup j E(Q̃ j ) < ∞ (since E(Q̃ j ) < ∞ for all j). Finally, using the same argument as before,
we can conclude that supi E(Q(i)) < ∞, which completes the proof of stability.

APPENDIX 24B PROOF OF LEMMA 24.2

First consider the case d ≥ k (indices for the underlying k-sequences do not overlap).
Then (X_1, Y_{d+1}), (X_2, Y_{d+2}), . . . are i.i.d. with (X_i, Y_{d+i}) ∼ p_X(x_i) p_Y(y_{d+i}). Hence, by the joint typicality lemma,

P{(X^k, Y_{d+1}^{d+k}) ∈ T_є^{(n)}(X, Y)} ≤ 2^{−k(I(X;Y)−δ(є))}.

Next consider the case d ∈ [1 : k − 1]. Then X k and Yd+1 d+k


have overlapping indices
and are no longer independent of each other. Suppose that є > 0 is sufficiently small that
(1 − є)p X,Y (x ∗ , y∗ ) ≥ (1 + є)p X (x ∗ )pY (y∗ ) for some (x ∗ , y ∗ ). Let p = p X (x ∗ )pY (y ∗ ) and
q = p X,Y (x ∗ , y∗ ). For i ∈ [1 : k], define Ỹi = Yd+i and

E_i = 1 if (X_i, Ỹ_i) = (x*, y*), and E_i = 0 otherwise.

Now consider

π(x*, y* | X^k, Ỹ^k) = |{i : (X_i, Ỹ_i) = (x*, y*)}| / k = (1/k) ∑_{i=1}^k E_i.

Since {(X_i, Ỹ_i)} is stationary ergodic with p_{X_i,Ỹ_i}(x*, y*) = p for all i ∈ [1 : k],

P{(X^k, Y_{d+1}^{d+k}) ∈ T_є^{(n)}(X, Y)} ≤ P{π(x*, y* | X^k, Ỹ^k) ≥ (1 − є)q}
                                      ≤ P{π(x*, y* | X^k, Ỹ^k) ≥ (1 + є)p},

which, by Birkhoff's ergodic theorem (Petersen , Section .), tends to zero as n → ∞. To show the exponential tail, however, we should bound P{π(x*, y* | X^k, Ỹ^k) ≥ (1 + є)p} more carefully.
Assuming that k is even, consider the following two cases. First suppose that d is odd.
Let U^{k/2} = ((X_{2i−1}, Y_{d+2i−1}) : i ∈ [1 : k/2]) be the subsequence of odd indices. Then U^{k/2} is i.i.d. with p_{U_i}(x*, y*) = p and by the Chernoff bound in Appendix B,

P{π(x*, y* | U^{k/2}) ≥ (1 + є)p} ≤ e^{−kpє²/6}.

Similarly, let V^{k/2} = ((X_{2i}, Ỹ_{2i}) : i ∈ [1 : k/2]) be the subsequence of even indices. Then

P{π(x*, y* | V^{k/2}) ≥ (1 + є)p} ≤ e^{−kpє²/6}.

Thus, by the union of events bound,

P{π(x*, y* | X^k, Ỹ^k) ≥ (1 + є)p}
   = P{π(x*, y* | U^{k/2}, V^{k/2}) ≥ (1 + є)p}
   ≤ P{π(x*, y* | U^{k/2}) ≥ (1 + є)p or π(x*, y* | V^{k/2}) ≥ (1 + є)p}
   ≤ 2e^{−kpє²/6}.

Next suppose that d is even. We can construct two i.i.d. subsequences by alternating even and odd indices every d indices:

U^{k/2} = ((X_i, Ỹ_i) : i odd ∈ [(2l − 1)d + 1 : 2ld], i even ∈ [2ld + 1 : 2(l + 1)d]).

For example, if d = 2, then U k/2 = ((Xi , Yd+i ) : i = 1, 4, 5, 8, 9, 12, . . .). The rest of the
analysis is the same as before. This completes the proof of the lemma.
APPENDICES
APPENDIX A

Convex Sets and Functions

We review elementary results on convex sets and functions. The readers are referred to
classical texts such as Eggleston () and Rockafellar () for proofs of these results
and more in depth treatment of convexity.
Recall that a set R ⊆ ℝ^d is said to be convex if x, y ∈ R implies that αx + ᾱy ∈ R for
all α ∈ [0, 1]. Thus, a convex set includes a line segment between any pair of points in the
set. Consequently, a convex set includes every convex combination x of any finite number
of elements x_1, . . . , x_k, i.e.,

x = ∑_{j=1}^k α_j x_j

for some (α1 , . . . , αk ) such that α j ≥ 0, j ∈ [1 : k], and ∑kj=1 α j = 1, that is, a convex set
includes every possible “average” of any tuple of points in the set.
The convex hull of a set R ⊆ ℝ d , denoted as co(R), is the union of all finite convex
combinations of elements in R. Equivalently, the convex hull of R is the smallest con-
vex set containing R. The convex closure of R is the closure of the convex hull of R, or
equivalently, the smallest closed convex set containing R.

Fenchel–Eggleston–Carathéodory theorem. Any point in the convex closure of a


connected compact set R ∈ ℝ d can be represented as a convex combination of at most
d points in R.

Let R be a convex set and x0 be a point on its boundary. Suppose aT x ≤ aT x0 for


all x ∈ R and some a ≠ 0. Then the hyperplane {y : aT(y − x0) = 0} is referred to as a
supporting hyperplane to R at the point x0 . The supporting hyperplane theorem (Eggleston
) states that at least one such hyperplane exists.
Any closed bounded convex set R can be described as the intersection of closed half
spaces characterized by supporting hyperplanes. This provides a condition to check if two
convex sets are identical.

Lemma A.. Let R ⊆ ℝ d be convex. Let R1 ⊆ R2 be two bounded convex subsets of


R, closed relative to R. If every supporting hyperplane of R2 intersects with R1 , then
R1 = R2 .

Sometimes it is easier to consider supporting hyperplanes of the smaller set.

Lemma A.. Let R ⊆ ℝ d be convex. Let R1 ⊆ R2 be two bounded convex subsets of


R, closed relative to R. Let A be a subset of boundary points of R1 such that the
convex hull of A includes all boundary points. If each x0 ∈ A has a unique supporting
hyperplane and lies on the boundary of R2 , then R1 = R2 .

A real-valued function g(x) is said to be convex if its epigraph {(x, a) : g(x) ≤ a} is convex. If g(x) is twice differentiable, it is convex iff its Hessian

∇²g(x) = (∂²g(x)/∂x_i ∂x_j)_{ij}

is positive semidefinite for all x. If −g(x) is convex, then g(x) is concave. The following are a few examples of convex and concave functions:
∙ Entropy: g(x) = − ∑_j x_j log x_j is concave on the probability simplex.
∙ Log determinant: g(X) = log |X| is concave on the set of positive definite matrices.
∙ Maximum: The function sup_θ g_θ(x) of any collection of convex functions {g_θ(x)} is convex.
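A minimal numerical illustration (not part of the original text) of the concavity of log det on positive definite matrices, via a random midpoint check:

# Midpoint concavity check for g(K) = log det K on random positive definite matrices.
import numpy as np

rng = np.random.default_rng(0)

def random_pd(d):
    A = rng.standard_normal((d, d))
    return A @ A.T + np.eye(d)   # positive definite by construction

def logdet(K):
    return np.linalg.slogdet(K)[1]

for _ in range(5):
    K1, K2 = random_pd(4), random_pd(4)
    mid = logdet(0.5 * (K1 + K2))
    avg = 0.5 * (logdet(K1) + logdet(K2))
    assert mid >= avg - 1e-9     # concavity: g((K1+K2)/2) >= (g(K1)+g(K2))/2
    print(f"logdet(midpoint) = {mid:.4f} >= average = {avg:.4f}")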
APPENDIX B

Probability and Estimation

We review some basic probability and mean squared error estimation results. More de-
tailed discussion including proofs of some of these results can be found in textbooks on
the subject, such as Durrett (), Mitzenmacher and Upfal (), and Kailath, Sayed,
and Hassibi ().

Probability Bounds and Limits

Union of events bound. Let E1, E2, . . . , Ek be events. Then

P(∪_{j=1}^k E_j) ≤ ∑_{j=1}^k P(E_j).

Jensen’s inequality. Let X ∈ X (or ℝ) be a random variable with finite mean E(X) and д
be a real-valued convex function over X (or ℝ) with finite expectation E(д(X)). Then
E󶀡д(X)󶀱 ≥ д󶀡E(X)󶀱.
Consider the following variant of the standard Chebyshev inequality.
Chebyshev lemma. Let X be a random variable with finite mean E(X) and variance Var(X), and let δ > 0. Then

P{|X − E(X)| ≥ δ E(X)} ≤ Var(X)/(δ E(X))².

In particular,

P{X ≤ (1 − δ) E(X)} ≤ Var(X)/(δ E(X))²,
P{X ≥ (1 + δ) E(X)} ≤ Var(X)/(δ E(X))².
Chernoff bound. Let X1, X2, . . . be a sequence of independent identically distributed (i.i.d.) Bern(p) random variables, and let δ ∈ (0, 1). Then

P{∑_{i=1}^n X_i ≥ n(1 + δ)p} ≤ e^{−npδ²/3},
P{∑_{i=1}^n X_i ≤ n(1 − δ)p} ≤ e^{−npδ²/2}.
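A quick Monte Carlo check (with arbitrary illustrative parameters) of the upper tail bound above:

# Empirical upper tail of a Binomial(n, p) sum versus the Chernoff bound.
import numpy as np

rng = np.random.default_rng(2)
n, p, delta, trials = 200, 0.3, 0.2, 100_000
counts = rng.binomial(n, p, size=trials)
empirical = np.mean(counts >= n * (1 + delta) * p)
bound = np.exp(-n * p * delta**2 / 3)
print(f"empirical tail = {empirical:.4f} <= Chernoff bound = {bound:.4f}")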

Weak law of large numbers (LLN). Let X1 , X2 , . . . be a sequence of i.i.d. random variables
with finite mean E(X) and variance. Then for every є > 0,

lim_{n→∞} P{|(1/n) ∑_{i=1}^n X_i − E(X)| ≥ є} = 0.

In other words, (1/n) ∑ni=1 Xi converges to E(X) in probability.


Dominated convergence theorem. Let {g_n(x)} be a sequence of real-valued functions such that |g_n(x)| ≤ ϕ(x) for some integrable function ϕ(x) (i.e., ∫ ϕ(x) dx < ∞). Then

∫ lim_{n→∞} g_n(x) dx = lim_{n→∞} ∫ g_n(x) dx.

Functional Representation Lemma


The following lemma shows that any conditional pmf can be represented as a determin-
istic function of the conditioning variable and an independent random variable. This
lemma has been used to establish identities between information quantities optimized
over different sets of distributions (Hajek and Pursley , Willems and van der Meulen
).

Functional Representation Lemma. Let (X, Y , Z) ∼ p(x, y, z). Then, we can rep-
resent Z as a function of (Y , W) for some random variable W of cardinality |W| ≤
|Y|(|Z| − 1) + 1 such that W is independent of Y and X → (Y , Z) → W form a Markov
chain.

To prove this lemma, it suffices to show that Z can be represented as a function of


(Y , W) for some W independent of Y with |W| ≤ |Y|(|Z| − 1) + 1. The Markov rela-
tion X → (Y , Z) → W is guaranteed by generating X according to the conditional pmf
p(x|y, z), which results in the desired joint pmf p(x, y, z) on (X, Y , Z). We first illustrate
the proof through the following.
Example B.. Suppose that (Y , Z) is a DSBS(p) for some p ∈ (0, 1/2), i.e., pY ,Z (0, 0) =
pY ,Z (1, 1) = (1 − p)/2 and pY ,Z (0, 1) = pY ,Z (1, 0) = p/2. Let P = {F(z|y) : (y, z) ∈ Y × Z}
be the set of values that the conditional cdf F(z|y) takes and let W be a random variable
independent of Y with cdf F(󰑤) such that {F(󰑤) : 󰑤 ∈ W} = P as shown in Figure B..
Then P = {0, p, 1 − p, 1} and W is ternary with pmf

󶀂
󶀒 p 󰑤 = 1,
󶀒
p(󰑤) = 󶀊1 − 2p 󰑤 = 2,
󶀒
󶀒
󶀚p 󰑤 = 3.
Now we can write Z = д(Y , W), where

0 (y, 󰑤) = (0, 1) or (0, 2) or (1, 1),


д(y, 󰑤) = 󶁇
1 (y, 󰑤) = (0, 3) or (1, 2) or (1, 3).

Figure B.1. Conditional pmf for the DSBS(p) and its functional representation.

It is straightforward to see that for each y ∈ Y, Z = g(y, W) ∼ F(z|y).

We can easily generalize the above example to an arbitrary pmf p(y, z). Assume with-
out loss of generality that Y = {1, 2, . . . , |Y|} and Z = {1, 2, . . . , |Z|}. Let P = {F(z|y) :
(y, z) ∈ Y × Z} be the set of values that F(z|y) takes and define W to be a random variable independent of Y with cdf F(w) that takes values in P. It is easy to see that |W| = |P| − 1 ≤ |Y|(|Z| − 1) + 1. Now consider the function

g(y, w) = min{z ∈ Z : F(w) ≤ F(z | y)}.

Then

P{g(Y, W) ≤ z | Y = y} = P{g(y, W) ≤ z | Y = y}
                      (a)= P{F(W) ≤ F(z | y) | Y = y}
                      (b)= P{F(W) ≤ F(z | y)}
                      (c)= F(z | y),

where (a) follows since g(y, w) ≤ z iff F(w) ≤ F(z|y), (b) follows by the independence of Y and W, and (c) follows since F(w) takes values in P = {F(z|y)}. Hence we can write Z = g(Y, W), which completes the proof of the functional representation lemma.
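A small numerical sketch (our own construction following the DSBS(p) example above, with an assumed value of p) verifying that Z = g(Y, W) with W independent of Y reproduces the DSBS statistics:

# Functional representation check for the DSBS(p) example.
import numpy as np

rng = np.random.default_rng(3)
p = 0.2
# W takes values {1, 2, 3} with probabilities (p, 1 - 2p, p); cdf values F(w) in {p, 1-p, 1}
pw = np.array([p, 1 - 2 * p, p])

def g(y, w):
    # 0 if F(w) <= F(0 | y), else 1; here F(0|0) = 1 - p and F(0|1) = p
    return 0 if (y, w) in {(0, 1), (0, 2), (1, 1)} else 1

N = 200_000
Y = rng.integers(0, 2, size=N)
W = rng.choice([1, 2, 3], size=N, p=pw)
Z = np.array([g(y, w) for y, w in zip(Y, W)])
print("empirical P{Z != Y} =", np.mean(Z != Y), " (target p =", p, ")")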

Mean Squared Error Estimation


Let (X, Y) ∼ F(x, y) and Var(X) < ∞. The minimum mean squared error (MMSE) estimate of X given an observation vector Y is a (measurable) function x̂(Y) of Y that minimizes the mean squared error (MSE) E((X − X̂)²).
To find the MMSE estimate, note that E(X g(Y)) = E(E(X|Y) g(Y)) for every function g(Y) of the observation vector. Hence, the MSE is lower bounded by

E((X − X̂)²) = E((X − E(X|Y) + E(X|Y) − X̂)²)
            = E((X − E(X|Y))²) + E((E(X|Y) − X̂)²)
            ≥ E((X − E(X|Y))²)

with equality iff X̂ = E(X|Y). Thus, the MMSE estimate of X given Y is the conditional
expectation E(X|Y) and the corresponding minimum MSE is

E((X − E(X |Y))^2) = E(X^2) − E((E(X |Y))^2) = E(Var(X |Y)).

The MMSE is related to the variance of X through the law of conditional variances

Var(X) = E(Var(X |Y)) + Var(E(X |Y)).
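A quick Monte Carlo check of these two facts (a sketch assuming Python with NumPy; the model Y ∼ N(0, 1), X = Y^2 + Z with Z ∼ N(0, 1) independent is an arbitrary choice for which E(X|Y) = Y^2):

import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
y = rng.standard_normal(n)
x = y**2 + rng.standard_normal(n)

mmse_est = y**2                                    # conditional expectation E(X|Y)
a = np.cov(x, y)[0, 1] / np.var(y)                 # best affine estimate aY + b
b = x.mean() - a * y.mean()

print("MSE of E(X|Y):            ", np.mean((x - mmse_est)**2))     # about 1
print("MSE of affine estimate:   ", np.mean((x - a * y - b)**2))    # about 3, larger
print("Var(X):                   ", np.var(x))
print("E(Var(X|Y)) + Var(E(X|Y)):", np.mean((x - mmse_est)**2) + np.var(mmse_est))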

Linear estimation. The linear MMSE estimate of X given Y = (Y1 , . . . , Yn ) is an affine
function X̂ = a^T Y + b that minimizes the MSE. For simplicity we first consider the case
E(X) = 0 and E(Y) = 0. The linear MMSE estimate of X given Y is

X̂ = a^T Y,

where a is such that the estimation error (X − X̂) is orthogonal to Y, i.e.,

E((X − a^T Y)Yi) = 0 for every i ∈ [1 : n],

or equivalently, a^T KY = KXY . If KY is nonsingular, then the linear MMSE estimate is

X̂ = KXY KY^{−1} Y

with corresponding minimum MSE

E((X − X̂)^2) = E((X − X̂)X)                      (a)
             = E(X^2) − E(KXY KY^{−1} Y X)
             = KX − KXY KY^{−1} KYX ,

where (a) follows since E((X − X̂)X̂) = 0 by the orthogonality principle.
When X or Y has a nonzero mean, the linear MMSE estimate is determined by finding
the linear MMSE estimate of X′ = X − E(X) given Y′ = Y − E(Y) and setting X̂ = X̂′ + E(X).
Thus, if KY is nonsingular, the linear MMSE estimate of X given Y is

X̂ = KXY KY^{−1} (Y − E(Y)) + E(X)

with corresponding MSE

KX − KXY KY^{−1} KYX .

Thus, unlike the (nonlinear) MMSE estimate, the linear MMSE estimate is a function only
of E(X), E(Y), Var(X), KY , and K XY .
Since the linear MMSE estimate is not optimal in general, its MSE is no smaller than that
of the MMSE estimate, i.e.,

KX − KXY KY^{−1} KYX ≥ E(Var(X |Y)).

The following fact on matrix inversion is often useful in calculating the linear MMSE
estimate.

Matrix Inversion Lemma. If a square block matrix

K = [ K11  K12 ]
    [ K21  K22 ]

is invertible, then

(K11 − K12 K22^{−1} K21)^{−1} = K11^{−1} + K11^{−1} K12 (K22 − K21 K11^{−1} K12)^{−1} K21 K11^{−1} .
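A direct numerical check of this identity (a sketch assuming Python with NumPy; the block sizes and the random positive definite K are arbitrary choices):

import numpy as np

rng = np.random.default_rng(2)
n1, n2 = 3, 4
A = rng.standard_normal((n1 + n2, n1 + n2))
K = A @ A.T + np.eye(n1 + n2)                      # positive definite block matrix
K11, K12 = K[:n1, :n1], K[:n1, n1:]
K21, K22 = K[n1:, :n1], K[n1:, n1:]

lhs = np.linalg.inv(K11 - K12 @ np.linalg.inv(K22) @ K21)
rhs = (np.linalg.inv(K11)
       + np.linalg.inv(K11) @ K12
         @ np.linalg.inv(K22 - K21 @ np.linalg.inv(K11) @ K12)
         @ K21 @ np.linalg.inv(K11))
print("max |lhs - rhs| =", np.abs(lhs - rhs).max())   # numerically zero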

Example B.. Let X be the signal with mean μ and variance P, and the observations be
Yi = X + Zi for i = [1 : n], where each noise component Zi has zero mean and variance
N. Assume that Zi and Z j are uncorrelated for every i ̸= j ∈ [1 : n], and X and Zi are
uncorrelated for every i ∈ [1 : n].
We find the linear MMSE estimate X̂ of X given Y = (Y1 , . . . , Yn ). Since KY = P11T +
N I, by the matrix inversion lemma (with K11 = N I, K12 = 1, K21 = 1T , K22 = −1/P),

1 P
KY−1 = I− 11T .
N N(nP + N)

Also note that E(Y) = μ1 and KXY = P1T . Hence, the linear MMSE estimate is

X̂ = K XY KY−1 (Y − E(Y)) + E(X)


n
P
= 󵠈(Yi − μ) + μ
nP + N i=1
n
P N
= 󵠈 Yi + μ
nP + N i=1 nP + N

and its MSE is


̂ 2 ) = P − K XY K −1 KYX = PN
E((X − X) Y .
nP + N
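The closed-form coefficients and MSE above can be checked directly against the general formulas (a sketch assuming Python with NumPy; the parameter values are arbitrary choices):

import numpy as np

mu, P, N, n = 1.0, 4.0, 2.0, 5
K_Y = P * np.ones((n, n)) + N * np.eye(n)          # K_Y = P 11^T + N I
K_XY = P * np.ones((1, n))                         # K_XY = P 1^T

a = K_XY @ np.linalg.inv(K_Y)                      # estimate coefficients
print("coefficients:", a.ravel())                  # each equals P/(nP + N)
print("P/(nP+N)    =", P / (n * P + N))

mse = P - (K_XY @ np.linalg.inv(K_Y) @ K_XY.T).item()
print("MSE          =", mse)                       # equals PN/(nP + N)
print("PN/(nP+N)    =", P * N / (n * P + N))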

Gaussian Random Vectors


Let X = (X1 , . . . , Xn ) be a random vector with mean μ and covariance matrix K ⪰ 0. We
say that (X1 , . . . , Xn ) is jointly Gaussian and X is a Gaussian random vector if every linear
combination a^T X is a Gaussian random variable. If K ≻ 0, then the joint pdf is well-
defined as

f (x) = (1/√((2π)^n |K|)) e^{−(1/2)(x−μ)^T K^{−1} (x−μ)} .

A Gaussian random vector X = (X1 , . . . , Xn ) satisfies the following properties:


. If X1 , . . . , Xn are uncorrelated, that is, K is diagonal, then X1 , . . . , Xn are independent.
This can be verified by substituting Ki j = 0 for all i ̸= j in the joint pdf.

. A linear transformation of X is a Gaussian random vector, that is, for every real m × n
matrix A,
Y = AX ∼ N(Aμ, AK AT ).

This can be verified from the characteristic function of Y.


. The marginals of X are Gaussian, that is, X(S) is jointly Gaussian for any subset S ⊆
[1 : n]. This follows from the second property.
. The conditionals of X are Gaussian; more specifically, if

X = [X1 ; X2 ] ∼ N( [μ1 ; μ2 ], [K11 , K12 ; K21 , K22 ] ),

where X1 = (X1 , . . . , Xk ) and X2 = (Xk+1 , . . . , Xn ), then

X2 | {X1 = x} ∼ N( K21 K11^{−1} (x − μ1 ) + μ2 , K22 − K21 K11^{−1} K12 ) .

This follows from the above properties of Gaussian random vectors.


Note that the last property implies that if (X, Y) is jointly Gaussian, then the MMSE
estimate of X given Y is linear. Since uncorrelatedness implies independence for Gaussian
random vectors, the error (X − X̂) of the MMSE estimate and the observation Y are in-
dependent.
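The conditional distribution formula above can be evaluated and checked against samples (a sketch assuming Python with NumPy; the mean vector, covariance matrix, and conditioning value are arbitrary choices):

import numpy as np

rng = np.random.default_rng(3)
mu = np.array([0.0, 1.0, -1.0])
A = rng.standard_normal((3, 3))
K = A @ A.T + np.eye(3)                            # a valid covariance matrix
k = 1                                              # X1 = X_1, X2 = (X_2, X_3)
K11, K12 = K[:k, :k], K[:k, k:]
K21, K22 = K[k:, :k], K[k:, k:]

x = np.array([0.5])                                # condition on X1 = 0.5
cond_mean = K21 @ np.linalg.inv(K11) @ (x - mu[:k]) + mu[k:]
cond_cov = K22 - K21 @ np.linalg.inv(K11) @ K12
print("conditional mean:", cond_mean)
print("conditional covariance:\n", cond_cov)

# Empirical check: average X2 over samples with X1 close to the conditioning value.
samples = rng.multivariate_normal(mu, K, size=1_000_000)
near = np.abs(samples[:, 0] - x[0]) < 0.02
print("empirical conditional mean:", samples[near, 1:].mean(axis=0))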
APPENDIX C

Cardinality Bounding Techniques

We introduce techniques for bounding the cardinalities of auxiliary random variables.

Convex Cover Method


The convex cover method is based on the following lemma (Ahlswede and Körner ,
Wyner and Ziv ), which is a direct consequence of the Fenchel–Eggleston–Carathéo-
dory theorem in Appendix A.

Support Lemma. Let X be a finite set and U be an arbitrary set. Let P be a connected
compact subset of pmfs on X and p(x|u) ∈ P, indexed by u ∈ U , be a collection of
(conditional) pmfs on X . Suppose that g_j (π), j = 1, . . . , d, are real-valued continuous
functions of π ∈ P. Then for every U ∼ F(u) defined on U , there exist a random
variable U′ ∼ p(u′) with |U′| ≤ d and a collection of conditional pmfs p(x|u′) ∈ P,
indexed by u′ ∈ U′, such that for j = 1, . . . , d,

∫_U g_j (p(x |u)) dF(u) = ∑_{u′∈U′} g_j (p(x |u′)) p(u′).

We now show how this lemma is used to bound the cardinality of auxiliary random
variables. For concreteness, we focus on bounding the cardinality of the auxiliary random
variable U in the characterization of the capacity region of the (physically) degraded DM-
BC p(y1 |x)p(y2 |y1 ) in Theorem ..
Let U ∼ F(u) and X | {U = u} ∼ p(x|u), where U takes values in an arbitrary set U .
Let R(U , X) be the set of rate pairs (R1 , R2 ) such that

R1 ≤ I(X; Y1 |U),
R2 ≤ I(U ; Y2 ).

We prove that given any (U , X), there exists (U′, X) with |U′| ≤ min{|X |, |Y1 |, |Y2 |} + 1
such that R(U , X) = R(U′, X), which implies that it suffices to consider auxiliary ran-
dom variables with |U | ≤ min{|X |, |Y1 |, |Y2 |} + 1.
We first show that it suffices to take |U| ≤ |X | + 1. Assume without loss of generality
that X = {1, 2, . . . , |X |}. Given (U , X), consider the set P of all pmfs on X (which is

connected and compact) and the following |X | + 1 continuous functions on P:

g_j (π) = { π( j)     j = 1, . . . , |X | − 1,
          { H(Y1 )    j = |X |,
          { H(Y2 )    j = |X | + 1.

Clearly, the first |X | − 1 functions are continuous. The last two functions are also contin-
uous in π by continuity of the entropy function and linearity of p(y1 ) = ∑x p(y1 |x)π(x)
and p(y2 ) = ∑x p(y2 |x)π(x) in π. Now by the support lemma, we can find a random
variable U′ taking at most |X | + 1 values such that

H(Y1 |U ) = ∫_U H(Y1 |U = u) dF(u) = ∑_{u′} H(Y1 |U′ = u′) p(u′) = H(Y1 |U′),

H(Y2 |U ) = ∫_U H(Y2 |U = u) dF(u) = ∑_{u′} H(Y2 |U′ = u′) p(u′) = H(Y2 |U′),

∫_U p(x |u) dF(u) = p(x) = ∑_{u′} p_{X|U} (x |u′) p(u′)

for x = 1, . . . , |X | − 1 and hence for x = |X |. Since p(x) determines p(x, y1 ) = p(x) ⋅
p(y1 |x) and p(y2 ) = ∑x p(x)p(y2 |x), H(Y1 |X) and H(Y2 ) are also preserved, and

I(X; Y1 |U) = H(Y1 |U ) − H(Y1 | X) = H(Y1 |U′) − H(Y1 | X) = I(X; Y1 |U′),
I(U ; Y2 ) = H(Y2 ) − H(Y2 |U ) = H(Y2 ) − H(Y2 |U′) = I(U′; Y2 ).

Thus we have shown that R(U , X) = R(U′, X) for some U′ with |U′| ≤ |X | + 1.


Now we derive the bound |U 󳰀 | ≤ |Y2 | + 1. Again assume that Y2 = {1, 2, . . . , |Y2 |}.
Consider the following |Y2 | + 1 continuous functions on P:

󶀂
󶀒 pY ( j) = ∑x pY2 |X ( j|x)π(x) j = 1, . . . , |Y2 | − 1,
󶀒 2
д j (π) = 󶀊H(Y2 ) j = |Y2 |,
󶀒
󶀒
󶀚 I(X; Y )
1 j = |Y2 | + 1.

Again by the support lemma, there exists U 󳰀 (not necessarily the same as the one above)
with |U 󳰀 | ≤ |Y2 | + 1 such that p(y2 ) = ∑u󳰀 ,x p(y2 |x)p(x|u󳰀 )p(u󳰀 ), H(Y2 ), H(Y2 |U 󳰀 ), and
I(X; Y1 |U 󳰀 ) stay the same as with the original U. Hence R(U , X) = R(U 󳰀 , X). Note that
the pmf p(x) of X itself is not necessarily preserved under U 󳰀 , which, however, does not
affect the quantities of interest.
In the same manner, we can show that R(U , X) = R(U 󳰀 , X) for some U 󳰀 with |U 󳰀 | ≤
|Y1 | + 1 by preserving p(y1 ) instead of p(y2 ). In this case, the physical degradedness of
the channel guarantees that preserving p(y1 ) also preserves H(Y2 ).
Combining the above three steps, we have shown that for every U there exists a ran-
dom variable U 󳰀 ∼ p(u󳰀 ) with |U 󳰀 | ≤ min{|X |, |Y1 |, |Y2 |} + 1 such that R(U , X) =
R(U 󳰀 , X). This establishes the desired cardinality bound on U in the capacity region
characterization of the degraded broadcast channel in Theorem ..

Remark C.. We can apply the same technique to bound the cardinality of the time-
sharing random variable Q appearing in the characterization of the DM-MAC capacity
region. Recall that the capacity region of the DM-MAC is the set of rate pairs (R1 , R2 )
such that

R1 ≤ I(X1 ; Y | X2 , Q),
R2 ≤ I(X2 ; Y | X1 , Q),
R1 + R2 ≤ I(X1 , X2 ; Y |Q)

for some pmf p(q)p(x1 |q)p(x2 |q). By considering the set P of all product pmfs on X1 × X2
(which is a connected compact subset of all pmfs on X1 × X2 ) and the following three
continuous functions:

g_1 (p(x1 |q)p(x2 |q)) = I(X1 ; Y | X2 , Q = q),
g_2 (p(x1 |q)p(x2 |q)) = I(X2 ; Y | X1 , Q = q),
g_3 (p(x1 |q)p(x2 |q)) = I(X1 , X2 ; Y |Q = q),

we can easily establish the cardinality bound |Q| ≤ 3. Although this is weaker than the
bound |Q| ≤ 2 derived in Chapter , the technique described here is more general and
can be easily extended to other scenarios encountered in the book.

Extension to Multiple Random Variables


The above technique can be extended to give cardinality bounds for capacity regions with
two or more auxiliary random variables (Csiszár and Körner , Nair and El Gamal
).
As an example, consider the -receiver degraded DM-BC p(y1 , y2 , y3 |x). The capacity
region is the set of rate triples (R1 , R2 , R3 ) such that

R1 ≤ I(X; Y1 |V ),
R2 ≤ I(V ; Y2 |U),
R3 ≤ I(U ; Y3 )

for some pmf p(u)p(v|u)p(x|v). We show that it suffices to take |U | ≤ |X | + 2 and |V| ≤
(|X | + 1)(|X | + 2).
First fix p(x|v) and consider the following |X | + 2 continuous functions of p(v|u):

p(x |u) = ∑_v p(x |v)p(v|u),   x = 1, . . . , |X | − 1,
I(X; Y1 |V , U = u) = I(X; Y1 |U = u) − I(V ; Y1 |U = u),
I(V ; Y2 |U = u),
H(Y3 |U = u).

As in the -receiver DM-BC example (with the bound |U | ≤ |Y2 | + 1), there exists U 󳰀 with
634 Cardinality Bounding Techniques

|U 󳰀 | ≤ |X | + 2 such that p(x), I(X; Y1 |V ) = I(X; Y1 |V , U ), I(V ; Y2 |U ), and I(U ; Y3 ) are


preserved. Let V 󳰀 denote the corresponding random variable.
Now for each u󳰀 ∈ U 󳰀 , consider the following |X | + 1 continuous functions of
p(x|󰑣 󳰀 , u󳰀 ): p(x|󰑣 󳰀 , u󳰀 ), H(Y1 |V 󳰀 = 󰑣 󳰀 , U 󳰀 = u󳰀 ), and H(Y2 |V 󳰀 = 󰑣 󳰀 , U 󳰀 = u󳰀 ). Then as in
the -receiver DM-BC example (with the bound |U | ≤ |X | + 1), for each u󳰀 , there exists
V 󳰀󳰀 | {U 󳰀 = u󳰀 } ∼ p(󰑣 󳰀󳰀 |u󳰀 ) such that the cardinality of the support of p(󰑣 󳰀󳰀 |u󳰀 ) is upper
bounded by |X | + 1 and the quantities p(x|u󳰀 ), I(X; Y1 |V 󳰀 , U 󳰀 = u󳰀 ), and I(V 󳰀 ; Y2 |U 󳰀 =
u󳰀 ) are preserved. By relabeling the support of p(󰑣 󳰀󳰀 |u󳰀 ) for each u󳰀 ∈ U 󳰀 and redefining
p(x|󰑣 󳰀󳰀 , u󳰀 ) accordingly, we can construct V 󳰀󳰀 with |V 󳰀󳰀 | ≤ |X | + 1. However, this choice
does not satisfy the Markov chain condition U 󳰀 → V 󳰀󳰀 → X in general!
Instead, consider V 󳰀󳰀󳰀 = (U 󳰀 , V 󳰀󳰀 ). Then |V 󳰀󳰀󳰀 | ≤ |U 󳰀 |⋅|V 󳰀󳰀 | ≤ (|X | + 1)(|X | + 2) and
U → V 󳰀󳰀󳰀 → X → (Y1 , Y2 , Y3 ) form a Markov chain. Furthermore,
󳰀

I(X; Y1 |V 󳰀󳰀󳰀 ) = I(X; Y1 |U 󳰀 ) − I(V 󳰀󳰀󳰀 ; Y1 |U 󳰀 )


= I(X; Y1 |U 󳰀 ) − I(V 󳰀󳰀 ; Y1 |U 󳰀 )
= I(X; Y1 |U 󳰀 ) − I(V 󳰀 ; Y1 |U 󳰀 )
= I(X; Y1 |V 󳰀 )
= I(X; Y1 |V ).
Similarly
I(V 󳰀󳰀󳰀 ; Y2 |U 󳰀 ) = I(V 󳰀󳰀 ; Y2 |U 󳰀 ) = I(V 󳰀 ; Y2 |U 󳰀 ) = I(V ; Y2 |U),
I(U 󳰀 ; Y3 ) = I(U ; Y3 ).
This completes the proof of the cardinality bound.

Perturbation Method
The technique described above does not provide cardinality bounds for all capacity/rate–
distortion regions. Most notably, the cardinality bounds on U0 , U1 , and U2 in Marton’s
inner bound for the DM-BC in Theorem . require a different technique based on a per-
turbation method introduced by Gohari and Anantharam () and further simplified
by Jog and Nair ().
To be concrete, we consider the maximum sum-rate for Marton’s inner bound in The-
orem .,

max_{p(u1 ,u2 ), x(u1 ,u2 )} ( I(U1 ; Y1 ) + I(U2 ; Y2 ) − I(U1 ; U2 ) ),        (C.)

and show that it suffices to take |U1 |, |U2 | ≤ |X |.


Let (U1 , U2 , X) be the random variables that attain the maximum in (C.) and let
(U1′, U2′, X′) ∼ pє (u1′, u2′, x′) = pU1 ,U2 ,X (u1′, u2′, x′)(1 + єϕ(u1′)) be its perturbed version.
We assume that 1 + єϕ(u1 ) ≥ 0 for all u1 and E(ϕ(U1 )) = ∑_{u1} p(u1 )ϕ(u1 ) = 0, so that
pє (u1′, u2′, x′) is a valid pmf. We further assume that

E(ϕ(U1 )| X = x) = ∑_{u1 ,u2} p(u1 , u2 |x)ϕ(u1 ) = 0

for all x ∈ X , which implies that pє (x′) = pX (x′). In other words, the perturbation pre-
serves the pmf of X and hence the pmf of (Y1 , Y2 ). Note that there exists a nonzero per-
turbation that satisfies this condition (along with E(ϕ(U1 )) = 0) as long as |U1 | ≥ |X | + 1,
since there are |X | + 1 linear constraints. Note also that X′ continues to be a function of
(U1′, U2′), since pU1 ,U2 ,X (u1′, u2′, x′) = 0 implies that pє (u1′, u2′, x′) = 0. Now consider

I(U1′; Y1′) + I(U2′; Y2′) − I(U1′; U2′)
  = H(Y1′) + H(Y2′) + H(U1′, U2′) − H(U1′, Y1′) − H(U2′, Y2′)
  = H(Y1 ) + H(Y2 ) + H(U1′, U2′) − H(U1′, Y1′) − H(U2′, Y2′)
  = H(Y1 ) + H(Y2 ) + H(U1 , U2 ) − H(U1 , Y1 ) + єHϕ (U1 , U2 ) − єHϕ (U1 , Y1 ) − H(U2′, Y2′),

where

Hϕ (U1 , U2 ) = − ∑_{u1 ,u2} p(u1 , u2 )ϕ(u1 ) log p(u1 , u2 ),
Hϕ (U1 , Y1 ) = − ∑_{u1 ,y1} p(u1 , y1 )ϕ(u1 ) log p(u1 , y1 ).

Since p(u1 , u2 , x) attains the maximum in (C.),

󰜕2 󵄨󵄨 󰜕2 󵄨󵄨
󶀡I(U1󳰀 ; Y1󳰀 ) + I(U2󳰀 ; Y2󳰀 ) − I(U1󳰀 ; U2󳰀 )󶀱󵄨󵄨󵄨 =− H(U2󳰀 , Y2󳰀 )󵄨󵄨󵄨 ≤ 0.
󰜕є 2 󵄨󵄨 󰜕є 2 󵄨󵄨
є=0 є=0

It can be shown by simple algebra that this is equivalent to E(E(ϕ(U1 )|U2 , Y2 )2 ) ≤ 0. In


particular, E(ϕ(U1 ) |U2 = u2 , Y2 = y2 ) = 0 whenever p(u2 , y2 ) > 0. Hence pє (u2󳰀 , y2󳰀 ) =
pU2 ,Y2 (u2󳰀 , y2󳰀 ) and H(U2󳰀 , Y2󳰀 ) = H(U2 , Y2 ). Therefore

I(U1󳰀 ; Y1󳰀 ) + I(U2󳰀 ; Y2󳰀 ) − I(U1󳰀 ; U2󳰀 )


= H(Y1 ) + H(Y2 ) + H(U1 , U2 ) − H(U1 , Y1 ) − H(U2 , Y2 ) + єHϕ (U1 , U2 ) − єHϕ (U1 , Y1 ).

Once again by the optimality of p(u1 , u2 ), we have

󰜕
I(U1󳰀 ; Y1󳰀 ) + I(U2󳰀 ; Y2󳰀 ) − I(U1󳰀 ; U2󳰀 ) = Hϕ (U1 , U2 ) − Hϕ (U1 , Y1 ) = 0,
󰜕є
which, in turn, implies that

I(U1󳰀 ; Y1󳰀 ) + I(U2󳰀 ; Y2󳰀 ) − I(U1󳰀 ; U2󳰀 ) = I(U1 ; Y1 ) + I(U2 ; Y2 ) − I(U1 ; U2 ),

that is, (U1󳰀 , U2󳰀 , X 󳰀 ) ∼ pє (u1󳰀 , u󳰀2 , x 󳰀 ) also attains the maximum in (C.). Finally, we choose
the largest є > 0 such that 1 + єϕ(u1 ) ≥ 0, that is, 1 + єϕ(u1∗ ) = 0 for some u∗1 ∈ U1 . Then
pє (u1∗ ) = 0, i.e., |U1󳰀 | ≤ |U1 | − 1, and the maximum is still attained.
We can repeat the same argument as long as |U1󳰀 | ≥ |X | + 1. Hence by induction, we
can take |U1󳰀 | = |X | while preserving I(U1 ; Y1 ) + I(U2 ; Y2 ) − I(U1 ; U2 ). Using the same
argument for U2 , we can take |U2󳰀 | = |X | as well.
APPENDIX D

Fourier–Motzkin Elimination

Suppose that R ⊆ ℝ^d is the set of tuples (r1 , r2 , . . . , rd ) that satisfy a finite system of linear
inequalities Ar^d ≤ b^k , i.e.,

a11 r1 + a12 r2 + ⋅ ⋅ ⋅ + a1d rd ≤ b1
a21 r1 + a22 r2 + ⋅ ⋅ ⋅ + a2d rd ≤ b2
          ...                                      (D.)
ak1 r1 + ak2 r2 + ⋅ ⋅ ⋅ + akd rd ≤ bk .

Such R is referred to as a polyhedron (or a polytope if it is bounded).


Let R′ ⊆ ℝ^{d−1} be the projection of R onto the hyperplane {r1 = 0} (or any hyperplane
parallel to it). In other words,
. if (r1 , r2 , . . . , rd ) ∈ R, then (r2 , . . . , rd ) ∈ R′; and
. if (r2 , . . . , rd ) ∈ R′, then there exists an r1 such that (r1 , r2 , . . . , rd ) ∈ R.
Thus R′ captures the exact collection of inequalities satisfied by (r2 , . . . , rd ). How can we
find the system of linear inequalities that characterizes R′ from the original system of k
inequalities for R?
The Fourier–Motzkin elimination procedure (see, for example, Ziegler ) is a sys-
tematic method for finding the system of linear inequalities in (r2 , . . . , rd ). We explain
this procedure through the following.
Example D.. Consider the system of inequalities

−r1 ≤ −1,
−2r2 ≤ −1,
−r1 +r2 ≤ 0,
(D.)
−r1 −r2 ≤ −2,
+r1 +4r2 ≤ 15,
+2r1 −r2 ≤ 3.

These inequalities characterize the set R shown in Figure D..



Figure D.. The set R of (r1 , r2 ) satisfying the inequalities in (D.). (The figure plots R in
the (r1 , r2 ) plane, with tick marks at r1 = 1/2, 3/2, 2 and r2 = 1/2, 15/4.)

Note that r2 ∈ R′ iff there exists an r1 (for given r2 ) that satisfies all the inequalities,
which in turn happens iff every lower bound on r1 is less than or equal to every upper
bound on r1 . Hence, the necessary and sufficient condition for r2 ∈ R′ becomes

max{1, r2 , 2 − r2 } ≤ min{(r2 + 3)/2, 15 − 4r2 }

(6 inequalities total) in addition to the inequality −2r2 ≤ −1 in the original system, which
does not involve r1 . Solving for r2 and removing redundant inequalities, it can be easily
checked that R′ = {r2 : 1/2 ≤ r2 ≤ 3}.

The extension of this method to d dimensions is as follows.

. Partition the system of inequalities in (D.) into three sets,
  (a) inequalities without r1 , i.e., a j1 = 0,
  (b) upper bounds on r1 , i.e., a j1 > 0, and
  (c) lower bounds on r1 , i.e., a j1 < 0.
  Note that each upper or lower bound is an affine function of (r2 , . . . , rd ).
. The first set of inequalities is copied verbatim for R′ as inequalities in (r2 , . . . , rd ).
. We then generate a new system of inequalities by writing each lower bound as less
  than or equal to each upper bound.
In this manner, we obtain a new system of inequalities in (r2 , . . . , rd ).
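The procedure is simple to automate numerically. A minimal Python sketch (assuming NumPy; the function name is illustrative, and redundant inequalities are not removed):

import numpy as np

def fourier_motzkin(A, b, col=0):
    """Eliminate variable `col` from the system A r <= b and return the
    resulting system in the remaining variables."""
    A, b = np.asarray(A, dtype=float), np.asarray(b, dtype=float)
    zero = [j for j in range(len(b)) if A[j, col] == 0]
    upper = [j for j in range(len(b)) if A[j, col] > 0]    # upper bounds on r_col
    lower = [j for j in range(len(b)) if A[j, col] < 0]    # lower bounds on r_col
    rows = [A[j] for j in zero]                            # copied verbatim
    rhs = [b[j] for j in zero]
    for i in upper:                                        # each lower <= each upper
        for j in lower:
            rows.append(-A[j, col] * A[i] + A[i, col] * A[j])
            rhs.append(-A[j, col] * b[i] + A[i, col] * b[j])
    return np.delete(np.array(rows), col, axis=1), np.array(rhs)

# Eliminating r1 from the system (D.) of Example D. The surviving inequalities
# on r2 (after dropping redundant ones) give R' = {r2 : 1/2 <= r2 <= 3}.
A = [[-1, 0], [0, -2], [-1, 1], [-1, -1], [1, 4], [2, -1]]
b = [-1, -1, 0, -2, 15, 3]
A2, b2 = fourier_motzkin(A, b, col=0)
for row, bound in zip(A2, b2):
    print(f"{row[0]:+g} r2 <= {bound:g}")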

Remark D.1. The number of inequalities can increase rapidly—in the worst case, there
can be k^2/4 inequalities generated from the original k inequalities. Some inequalities can
be inactive and thus redundant.
Remark D.2. We can project onto an arbitrary hyperplane by taking an affine transfor-
mation of the variables. We can also eliminate multiple variables by applying the method
successively.

Remark D.3. This method can be applied to “symbolic” inequalities as well. In this case,
we can consider (r1 , . . . , rd , b1 , . . . , bk ) as the set of variables and eliminate unnecessary
variables among r1 , . . . , r d . In particular, if the constants b1 , . . . , bk are information the-
oretic quantities, the inequality relations among them can be further incorporated to re-
move inactive inequalities; see Minero and Kim () for an example.
Remark D.4. The Fourier–Motzkin procedure can be performed easily by using com-
puter programs such as PORTA (Christof and Löbel ).

We illustrate the Fourier–Motzkin elimination procedure with the final step in the
proof of the Han–Kobayashi inner bound in Section ..

Example D. (Han–Kobayashi inner bound). We wish to eliminate R10 and R20 in the
system of inequalities

R1 − R10 < I1 ,
R1 < I2 ,
R1 − R10 + R20 < I3 ,
R1 + R20 < I4 ,
R2 − R20 < I5 ,
R2 < I6 ,
R2 − R20 + R10 < I7 ,
R2 + R10 < I8 ,
R10 ≥ 0,
R1 − R10 ≥ 0,
R20 ≥ 0,
R2 − R20 ≥ 0.

Step 1 (elimination of R10 ). We have three upper bounds on R10 :

R10 < I7 − R2 + R20 ,


R10 < I8 − R2 ,
R10 ≤ R1 ,

and three lower bounds on R10 :

R10 > R1 − I1 ,
R10 > R1 + R20 − I3 ,
R10 ≥ 0.

Comparing the upper and lower bounds, copying the six inequalities in the original sys-
tem that do not involve R10 , and removing trivial or redundant inequalities I1 > 0, R1 ≥ 0,

R2 < I8 (since I6 ≤ I8 ), and R2 − R20 < I7 (since I5 ≤ I7 ), we obtain the new system of in-
equalities in (R20 , R1 , R2 ):
R1 < I2 ,
R1 + R20 < I4 ,
R2 − R20 < I5 ,
R2 < I6 ,
R1 + R2 − R20 < I1 + I7 ,
R1 + R2 < I1 + I8 ,
R1 + R2 < I3 + I7 ,
R1 + R2 + R20 < I3 + I8 ,
R20 < I3 ,
R20 ≥ 0,
R2 − R20 ≥ 0.
Step 2 (elimination of R20 ). Comparing the upper bounds on R20
R20 < I4 − R1 ,
R20 < I3 + I8 − R1 − R2 ,
R20 < I3 ,
R20 ≤ R2
with the lower bounds
R20 > R2 − I5 ,
R20 > R1 + R2 − I1 − I7 ,
R20 ≥ 0,
copying inequalities that do not involve R20 , and removing trivial or redundant inequali-
ties I3 > 0, I5 > 0, R2 ≥ 0, R1 + R2 < I1 + I3 + I7 , R1 < I4 (since I2 ≤ I4 ), R1 + R2 < I3 + I8
(since I1 ≤ I3 ), and 2R1 + 2R2 < I1 + I3 + I7 + I8 (since it is the sum of R1 + R2 < I1 + I8
and R1 + R2 < I3 + I7 ), we obtain nine inequalities in (R1 , R2 ):
R1 < I2 ,
R1 < I1 + I7 ,
R2 < I6 ,
R2 < I3 + I5 ,
R1 + R2 < I1 + I8 ,
R1 + R2 < I3 + I7 ,
R1 + R2 < I4 + I5 ,
2R1 + R2 < I1 + I4 + I7 ,
R1 + 2R2 < I3 + I5 + I8 .
APPENDIX E

Convex Optimization

We review basic results in convex optimization. For details, readers are referred to Boyd
and Vandenberghe ().
An optimization problem

minimize g_0 (x)
subject to g_j (x) ≤ 0, j ∈ [1 : k],
Ax = b

with variables x is convex if the functions g_j , j ∈ [0 : k], are convex. We denote by

D = {x : g_j (x) ≤ 0, j ∈ [1 : k], Ax = b}

the set of feasible points (the domain of the optimization problem). The convex optimiza-
tion problem is said to be feasible if D ≠ ∅. The optimal value of the problem is denoted
by p∗ = inf{g_0 (x) : x ∈ D} (or ∞ if the problem is infeasible). Any x that attains the
infimum is said to be optimal and is denoted by x∗ .

Example E.1. Linear program (LP):

minimize cT x
subject to Ax = b,
x j ≥ 0 for all j.

Example E.2. Differential entropy maximization under correlation constraint:

maximize log |K |
subject to K j, j+k = ak , k ∈ [0 : l].

Note that this problem is a special case of matrix determinant maximization (max-det)
with linear matrix inequalities (Vandenberghe, Boyd, and Wu ).

We define the Lagrangian associated with a feasible optimization problem

minimize g_0 (x)
subject to g_j (x) ≤ 0, j ∈ [1 : k],
Ax = b

as

L(x, λ, ν) = g_0 (x) + ∑_{j=1}^k λ_j g_j (x) + ν^T (Ax − b).

We further define the Lagrangian dual function (or dual function in short) as

ϕ(λ, ν) = inf_x L(x, λ, ν).

It can be easily seen that for any (λ, ν) with λ_j ≥ 0 for all j and any feasible x, ϕ(λ, ν) ≤
g_0 (x). This leads to the (Lagrange) dual problem

maximize ϕ(λ, ν)
subject to λ_j ≥ 0, j ∈ [1 : k].

Note that ϕ(λ, ν) is concave and hence the dual problem is convex (regardless of whether
the primal problem is convex or not). The optimal value of the dual problem is denoted
by d∗ and the dual optimal point is denoted by (λ∗ , ν∗ ).
The original optimization problem, its feasible set, and optimal value are sometimes
referred to as the primal problem, primal feasible set, and primal optimal value, respec-
tively.
Example E.. Consider the LP discussed above with x = x n . The Lagrangian is
n
L(x, λ, 󰜈) = cT x − 󵠈 λ j x j + 󰜈 T (Ax − b) = −bT 󰜈 + (c + AT 󰜈 − λ)T x
j=1

and the dual function is

ϕ(λ, 󰜈) = −bT 󰜈 + inf


x
(c + AT 󰜈 − λ)T x

−bT 󰜈 if AT 󰜈 − λ + c = 0,
=󶁇
−∞ otherwise.

Hence, the dual problem is

maximize −bT 󰜈
subject to AT 󰜈 − λ + c = 0,
λj ≥ 0 for all j,

which is another LP.
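Strong duality for this primal–dual pair can be verified numerically. A small sketch (assuming Python with SciPy; the data c, A, b are arbitrary choices):

import numpy as np
from scipy.optimize import linprog

c = np.array([2.0, 3.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

# Primal LP: minimize c^T x subject to Ax = b, x >= 0.
primal = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * len(c))

# Dual LP: maximize -b^T v subject to A^T v - lambda + c = 0, lambda >= 0,
# i.e., minimize b^T v subject to -A^T v <= c with v free.
dual = linprog(b, A_ub=-A.T, b_ub=c, bounds=[(None, None)] * len(b))

print("p* =", primal.fun)      # primal optimal value
print("d* =", -dual.fun)       # dual optimal value; equals p* (strong duality)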

By the definition of the dual function and dual problem, we already know that the dual
optimal value gives a lower bound on the primal optimal value: d ∗ ≤ p∗ . When this bound is tight (i.e., d ∗ =
p∗ ), we say that strong duality holds. One simple sufficient condition for strong duality to
hold is the following.

Slater’s Condition. If the primal problem is convex and there exists a feasible x in the
relative interior of D, i.e., g_j (x) < 0, j ∈ [1 : k], and Ax = b, then strong duality holds.

If strong duality holds (say, Slater’s condition holds), then

g_0 (x∗ ) = ϕ(λ∗ , ν∗ )
         ≤ inf_x ( g_0 (x) + ∑_{j=1}^k λ∗_j g_j (x) + (ν∗ )^T (Ax − b) )
         ≤ g_0 (x∗ ) + ∑_{j=1}^k λ∗_j g_j (x∗ ) + (ν∗ )^T (Ax∗ − b)
         ≤ g_0 (x∗ ).

Following the equality conditions, we obtain the following necessary and sufficient condi-
tion, commonly referred to as the Karush–Kuhn–Tucker (KKT) condition, for the primal
optimal point x∗ and dual optimal point (λ∗ , ν∗ ):
. x∗ minimizes L(x, λ∗ , ν∗ ). This condition can be easily checked if L(x, λ∗ , ν∗ ) is dif-
ferentiable.
. Complementary slackness: λ∗_j g_j (x∗ ) = 0, j ∈ [1 : k].

Example E.. Consider the determinant maximization problem


maximize log | X + K |
subject to X ⪰ 0,
tr(X) ≤ P,
where K is a given positive definite matrix.
Noting that X ⪰ 0 iff tr(ΥX) ≥ 0 for every Υ ⪰ 0, we form the Lagrangian
L(X, Υ, λ) = log | X + K | + tr(ΥX) − λ(tr(X) − P).
Since the domain D has a nonempty interior (for example, X = (P/2)I is feasible), Slater’s
condition is satisfied and strong duality holds. Hence the KKT condition characterizes the
optimal X ∗ and (Υ∗ , λ∗ ):
. X ∗ maximizes the Lagrangian. Since the derivative of log |Y | is Y −1 for Y ≻ 0 and the
derivative of tr(AY) is A (Boyd and Vandenberghe ),
󰜕
L(X, Υ∗ , λ∗ ) = (X + K)−1 + Υ ∗ − λI = 0.
󰜕X
. Complementary slackness:
tr(Υ∗ X ∗ ) = 0,
λ∗ (tr(X ∗ ) − P) = 0.
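The KKT conditions can be checked numerically for the water-filling candidate solution, which allocates tr(X) = P over the eigenvalues of K up to a common water level (a sketch assuming Python with NumPy; K and P are arbitrary choices, and the water-filling form of X is the standard solution of this max-det problem):

import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((4, 4))
K = B @ B.T + 0.1 * np.eye(4)                  # positive definite K
P = 2.0

sigma, Q = np.linalg.eigh(K)                   # K = Q diag(sigma) Q^T

# Bisection for the water level: sum_i (level - sigma_i)^+ = P.
lo, hi = sigma.min(), sigma.max() + P
for _ in range(100):
    level = (lo + hi) / 2
    if np.maximum(level - sigma, 0).sum() > P:
        hi = level
    else:
        lo = level

X = Q @ np.diag(np.maximum(level - sigma, 0)) @ Q.T
lam = 1 / level
Upsilon = lam * np.eye(4) - np.linalg.inv(X + K)    # from the stationarity condition

print("tr(X) =", np.trace(X))                                            # equals P
print("min eigenvalue of Upsilon =", np.linalg.eigvalsh(Upsilon).min())  # >= 0
print("tr(Upsilon X) =", np.trace(Upsilon @ X))                          # complementary slackness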
Bibliography

Abramson, N. (). The ALOHA system: Another alternative for computer communications. In
Proc. AFIPS Joint Comput. Conf., Houston, TX, pp. –. []
Ahlswede, R. (). Multiway communication channels. In Proc. nd Int. Symp. Inf. Theory,
Tsahkadsor, Armenian SSR, pp. –. [xvii, ]
Ahlswede, R. (). A constructive proof of the coding theorem for discrete memoryless channels
in case of complete feedback. In Trans. th Prague Conf. Inf. Theory, Statist. Decision Functions,
Random Processes (Tech Univ., Prague, ), pp. –. Academia, Prague. []
Ahlswede, R. (). The capacity region of a channel with two senders and two receivers. Ann.
Probability, (), –. [xvii, , , ]
Ahlswede, R. (). Elimination of correlation in random codes for arbitrarily varying channels.
Probab. Theory Related Fields, (), –. [, ]
Ahlswede, R. (). Coloring hypergraphs: A new approach to multi-user source coding—I. J.
Combin. Inf. Syst. Sci., (), –. []
Ahlswede, R. (). The rate–distortion region for multiple descriptions without excess rate. IEEE
Trans. Inf. Theory, (), –. []
Ahlswede, R. and Cai, N. (). Transmission, identification, and common randomness capaci-
ties for wire-tape channels with secure feedback from the decoder. In R. Ahlswede, L. Bäumer,
N. Cai, H. Aydinian, V. Blinovsky, C. Deppe, and H. Mashurian (eds.) General Theory of Infor-
mation Transfer and Combinatorics, pp. –. Springer, Berlin. [, ]
Ahlswede, R., Cai, N., Li, S.-Y. R., and Yeung, R. W. (). Network information flow. IEEE Trans.
Inf. Theory, (), –. [, ]
Ahlswede, R. and Csiszár, I. (). Common randomness in information theory and cryptogra-
phy—I: Secret sharing. IEEE Trans. Inf. Theory, (), –. []
Ahlswede, R., Gács, P., and Körner, J. (). Bounds on conditional probabilities with applications
in multi-user communication. Probab. Theory Related Fields, (), –. Correction ().
ibid, (), –. []
Ahlswede, R. and Körner, J. (). Source coding with side information and a converse for de-
graded broadcast channels. IEEE Trans. Inf. Theory, (), –. [, , ]
Ahlswede, R. and Wolfowitz, J. (). Correlated decoding for channels with arbitrarily varying
channel probability functions. Inf. Control, (), –. []
Ahlswede, R. and Wolfowitz, J. (). The capacity of a channel with arbitrarily varying channel
probability functions and binary output alphabet. Z. Wahrsch. verw. Gebiete, (), –.
[]
Ahmad, S. H. A., Jovičić, A., and Viswanath, P. (). On outer bounds to the capacity region of
wireless networks. IEEE Trans. Inf. Theory, (), –. []

Aleksic, M., Razaghi, P., and Yu, W. (). Capacity of a class of modulo-sum relay channels. IEEE
Trans. Inf. Theory, (), –. []
Annapureddy, V. S. and Veeravalli, V. V. (). Gaussian interference networks: Sum capacity
in the low interference regime and new outer bounds on the capacity region. IEEE Trans. Inf.
Theory, (), –. []
Ardestanizadeh, E. (). Feedback communication systems: Fundamental limits and control-
theoretic approach. Ph.D. thesis, University of California, San Diego, La Jolla, CA. []
Ardestanizadeh, E., Franceschetti, M., Javidi, T., and Kim, Y.-H. (). Wiretap channel with
secure rate-limited feedback. IEEE Trans. Inf. Theory, (), –. []
Aref, M. R. (). Information flow in relay networks. Ph.D. thesis, Stanford University, Stanford,
CA. []
Arıkan, E. (). Channel polarization: A method for constructing capacity-achieving codes for
symmetric binary-input memoryless channels. IEEE Trans. Inf. Theory, (), –. []
Artstein, S., Ball, K. M., Barthe, F., and Naor, A. (). Solution of Shannon’s problem on the
monotonicity of entropy. J. Amer. Math. Soc., (), –. []
Avestimehr, A. S., Diggavi, S. N., and Tse, D. N. C. (). Wireless network information flow: A
deterministic approach. IEEE Trans. Inf. Theory, (), –. [, ]
Ayaso, O., Shah, D., and Dahleh, M. A. (). Information theoretic bounds for distributed com-
putation over networks of point-to-point channels. IEEE Trans. Inf. Theory, (), –.
[]
Bandemer, B. (). Capacity region of the -user-pair symmetric deterministic IC. Applet avail-
able at http://www.stanford.edu/~bandemer/detic/detic/. []
Bandemer, B. and El Gamal, A. (). Interference decoding for deterministic channels. IEEE
Trans. Inf. Theory, (), –. []
Bandemer, B., Vazquez-Vilar, G., and El Gamal, A. (). On the sum capacity of a class of cycli-
cally symmetric deterministic interference channels. In Proc. IEEE Int. Symp. Inf. Theory, Seoul,
Korea, pp. –. []
Barron, A. R. (). The strong ergodic theorem for densities: Generalized Shannon–McMillan–
Breiman theorem. Ann. Probab., (), –. []
Beckner, W. (). Inequalities in Fourier analysis. Ann. Math., (), –. []
Berger, T. (). Rate distortion theory for sources with abstract alphabets and memory. Inf.
Control, (), –. []
Berger, T. (). Multiterminal source coding. In G. Longo (ed.) The Information Theory Approach
to Communications, pp. –. Springer-Verlag, New York. [, ]
Berger, T. and Yeung, R. W. (). Multiterminal source encoding with one distortion criterion.
IEEE Trans. Inf. Theory, (), –. [, ]
Berger, T. and Zhang, Z. (). Minimum breakdown degradation in binary source encoding.
IEEE Trans. Inf. Theory, (), –. []
Berger, T., Zhang, Z., and Viswanathan, H. (). The CEO problem. IEEE Trans. Inf. Theory,
(), –. []
Bergmans, P. P. (). Random coding theorem for broadcast channels with degraded components.
IEEE Trans. Inf. Theory, (), –. []
Bergmans, P. P. (). A simple converse for broadcast channels with additive white Gaussian
noise. IEEE Trans. Inf. Theory, (), –. []

Bierbaum, M. and Wallmeier, H. (). A note on the capacity region of the multiple-access chan-
nel. IEEE Trans. Inf. Theory, (), . []
Biglieri, E., Calderbank, R., Constantinides, A., Goldsmith, A. J., Paulraj, A., and Poor, H. V. ().
MIMO Wireless Communications. Cambridge University Press, Cambridge. []
Biglieri, E., Proakis, J., and Shamai, S. (). Fading channels: Information-theoretic and com-
munications aspects. IEEE Trans. Inf. Theory, (), –. []
Blachman, N. M. (). The convolution inequality for entropy powers. IEEE Trans. Inf. Theory,
(), –. []
Blackwell, D., Breiman, L., and Thomasian, A. J. (). The capacity of a class of channels. Ann.
Math. Statist., (), –. []
Blackwell, D., Breiman, L., and Thomasian, A. J. (). The capacity of a certain channel classes
under random coding. Ann. Math. Statist., (), –. []
Borade, S., Zheng, L., and Trott, M. (). Multilevel broadcast networks. In Proc. IEEE Int. Symp.
Inf. Theory, Nice, France, pp. –. []
Boyd, S., Ghosh, A., Prabhakar, B., and Shah, D. (). Randomized gossip algorithms. IEEE
Trans. Inf. Theory, (), –. []
Boyd, S. and Vandenberghe, L. (). Convex Optimization. Cambridge University Press, Cam-
bridge. [, , ]
Brascamp, H. J. and Lieb, E. H. (). Best constants in Young’s inequality, its converse, and its
generalization to more than three functions. Adv. Math., (), –. []
Breiman, L. (). The individual ergodic theorem of information theory. Ann. Math. Statist.,
(), –. Correction (). (), –. []
Bresler, G., Parekh, A., and Tse, D. N. C. (). The approximate capacity of the many-to-one and
one-to-many Gaussian interference channel. IEEE Trans. Inf. Theory, (), –. [,
]
Bresler, G. and Tse, D. N. C. (). The two-user Gaussian interference channel: A deterministic
view. Euro. Trans. Telecomm., (), –. []
Bross, S. I., Lapidoth, A., and Wigger, M. A. (). The Gaussian MAC with conferencing en-
coders. In Proc. IEEE Int. Symp. Inf. Theory, Toronto, Canada, pp. –. []
Bucklew, J. A. (). The source coding theorem via Sanov’s theorem. IEEE Trans. Inf. Theory,
(), –. []
Butman, S. (). Linear feedback rate bounds for regressive channels. IEEE Trans. Inf. Theory,
(), –. []
Cadambe, V. and Jafar, S. A. (). Interference alignment and degrees of freedom of the K-user
interference channel. IEEE Trans. Inf. Theory, (), –. [, , ]
Cadambe, V., Jafar, S. A., and Shamai, S. (). Interference alignment on the deterministic chan-
nel and application to fully connected Gaussian interference channel. IEEE Trans. Inf. Theory,
(), –. []
Caire, G. and Shamai, S. (). On the capacity of some channels with channel state information.
IEEE Trans. Inf. Theory, (), –. []
Caire, G. and Shamai, S. (). On the achievable throughput of a multiantenna Gaussian broad-
cast channel. IEEE Trans. Inf. Theory, (), –. []
Cannons, J., Dougherty, R., Freiling, C., and Zeger, K. (). Network routing capacity. IEEE
Trans. Inf. Theory, (), –. []

Carleial, A. B. (). A case where interference does not reduce capacity. IEEE Trans. Inf. Theory,
(), –. []
Carleial, A. B. (). Interference channels. IEEE Trans. Inf. Theory, (), –. []
Carleial, A. B. (). Multiple-access channels with different generalized feedback signals. IEEE
Trans. Inf. Theory, (), –. [, ]
Cemal, Y. and Steinberg, Y. (). The multiple-access channel with partial state information at
the encoders. IEEE Trans. Inf. Theory, (), –. [, ]
Chen, J. (). Rate region of Gaussian multiple description coding with individual and central
distortion constraints. IEEE Trans. Inf. Theory, (), –. []
Chen, J., Tian, C., Berger, T., and Hemami, S. S. (). Multiple description quantization via
Gram–Schmidt orthogonalization. IEEE Trans. Inf. Theory, , –. []
Cheng, R. S. and Verdú, S. (). Gaussian multiaccess channels with ISI: Capacity region and
multiuser water-filling. IEEE Trans. Inf. Theory, (), –. []
Chia, Y.-K. and El Gamal, A. (). -receiver broadcast channels with common and confidential
messages. In Proc. IEEE Int. Symp. Inf. Theory, Seoul, Korea, pp. –. []
Chia, Y.-K. and El Gamal, A. (). Wiretap channel with causal state information. In Proc. IEEE
Int. Symp. Inf. Theory, Austin, TX, pp. –. []
Chong, H.-F., Motani, M., Garg, H. K., and El Gamal, H. (). On the Han–Kobayashi region
for the interference channel. IEEE Trans. Inf. Theory, (), –. []
Chou, P. A., Wu, Y., and Jain, K. (). Practical network coding. In Proc. st Ann. Allerton Conf.
Comm. Control Comput., Monticello, IL. []
Christof, T. and Löbel, A. (). PORTA: Polyhedron representation transformation algorithm.
Software available at http://typo.zib.de/opt-long_projects/Software/Porta/. []
Cohen, A. S. and Lapidoth, A. (). The Gaussian watermarking game. IEEE Trans. Inf. Theory,
(), –. []
Costa, M. H. M. (). Writing on dirty paper. IEEE Trans. Inf. Theory, (), –. []
Costa, M. H. M. (). A new entropy power inequality. IEEE Trans. Inf. Theory, (), –.
[]
Costa, M. H. M. and Cover, T. M. (). On the similarity of the entropy power inequality and
the Brunn–Minkowski inequality. IEEE Trans. Inf. Theory, (), –. []
Costa, M. H. M. and El Gamal, A. (). The capacity region of the discrete memoryless interfer-
ence channel with strong interference. IEEE Trans. Inf. Theory, (), –. []
Cover, T. M. (). Broadcast channels. IEEE Trans. Inf. Theory, (), –. [xvii, , , ]
Cover, T. M. (a). A proof of the data compression theorem of Slepian and Wolf for ergodic
sources. IEEE Trans. Inf. Theory, (), –. []
Cover, T. M. (b). An achievable rate region for the broadcast channel. IEEE Trans. Inf. Theory,
(), –. [, ]
Cover, T. M. (c). Some advances in broadcast channels. In A. J. Viterbi (ed.) Advances in
Communication Systems, vol. , pp. –. Academic Press, San Francisco. []
Cover, T. M. (). Comments on broadcast channels. IEEE Trans. Inf. Theory, (), –.
[]
Cover, T. M. and Chiang, M. (). Duality between channel capacity and rate distortion with
two-sided state information. IEEE Trans. Inf. Theory, (), –. []

Cover, T. M. and El Gamal, A. (). Capacity theorems for the relay channel. IEEE Trans. Inf.
Theory, (), –. [, ]
Cover, T. M., El Gamal, A., and Salehi, M. (). Multiple access channels with arbitrarily corre-
lated sources. IEEE Trans. Inf. Theory, (), –. []
Cover, T. M. and Kim, Y.-H. (). Capacity of a class of deterministic relay channels. In Proc.
IEEE Int. Symp. Inf. Theory, Nice, France, pp. –. []
Cover, T. M. and Leung, C. S. K. (). An achievable rate region for the multiple-access channel
with feedback. IEEE Trans. Inf. Theory, (), –. []
Cover, T. M., McEliece, R. J., and Posner, E. C. (). Asynchronous multiple-access channel
capacity. IEEE Trans. Inf. Theory, (), –. []
Cover, T. M. and Thomas, J. A. (). Elements of Information Theory. nd ed. Wiley, New York.
[, , , ]
Csiszár, I. and Körner, J. (). Broadcast channels with confidential messages. IEEE Trans. Inf.
Theory, (), –. [, ]
Csiszár, I. and Körner, J. (a). On the capacity of the arbitrarily varying channel for maximum
probability of error. Z. Wahrsch. Verw. Gebiete, (), –. []
Csiszár, I. and Körner, J. (b). Information Theory: Coding Theorems for Discrete Memoryless
Systems. Akadémiai Kiadó, Budapest. [xvii, , , , , , , ]
Csiszár, I. and Narayan, P. (). The capacity of the arbitrarily varying channel revisited: Positiv-
ity, constraints. IEEE Trans. Inf. Theory, (), –. []
Csiszár, I. and Narayan, P. (). Common randomness and secret key generation with a helper.
IEEE Trans. Inf. Theory, (), –. []
Csiszár, I. and Narayan, P. (). Secrecy capacities for multiple terminals. IEEE Trans. Inf. Theory,
(), –. [, ]
Csiszár, I. and Narayan, P. (). Secrecy capacities for multiterminal channel models. IEEE
Trans. Inf. Theory, (), –. []
Cuff, P., Su, H.-I., and El Gamal, A. (). Cascade multiterminal source coding. In Proc. IEEE
Int. Symp. Inf. Theory, Seoul, Korea, pp. –. []
Dana, A. F., Gowaikar, R., Palanki, R., Hassibi, B., and Effros, M. (). Capacity of wireless
erasure networks. IEEE Trans. Inf. Theory, (), –. []
De Bruyn, K., Prelov, V. V., and van der Meulen, E. C. (). Reliable transmission of two correlated
sources over an asymmetric multiple-access channel. IEEE Trans. Inf. Theory, (), –.
[]
Devroye, N., Mitran, P., and Tarokh, V. (). Achievable rates in cognitive radio channels. IEEE
Trans. Inf. Theory, (), –. []
Diaconis, P. and Freedman, D. (). Iterated random functions. SIAM Rev., (), –. []
Dobrushin, R. L. (a). General formulation of Shannon’s main theorem in information theory.
Uspkhi Mat. Nauk, (), –. English translation (). Amer. Math. Soc. Transl., (),
–. []
Dobrushin, R. L. (b). Optimum information transmission through a channel with unknown
parameters. Radio Eng. Electron., (), –. []
Dougherty, R., Freiling, C., and Zeger, K. (). Insufficiency of linear coding in network infor-
mation flow. IEEE Trans. Inf. Theory, (), –. []

Dougherty, R., Freiling, C., and Zeger, K. (). Network coding and matroid theory. Proc. IEEE,
(), –. []
Dueck, G. (). Maximal error capacity regions are smaller than average error capacity regions
for multi-user channels. Probl. Control Inf. Theory, (), –. []
Dueck, G. (). The capacity region of the two-way channel can exceed the inner bound. Inf.
Control, (), –. []
Dueck, G. (). Partial feedback for two-way and broadcast channels. Inf. Control, (), –.
[]
Dueck, G. (a). A note on the multiple access channel with correlated sources. IEEE Trans. Inf.
Theory, (), –. []
Dueck, G. (b). The strong converse of the coding theorem for the multiple-access channel.
J. Combin. Inf. Syst. Sci., (), –. []
Dunham, J. G. (). A note on the abstract-alphabet block source coding with a fidelity criterion
theorem. IEEE Trans. Inf. Theory, (), . []
Durrett, R. (). Probability: Theory and Examples. th ed. Cambridge University Press, Cam-
bridge. [, ]
Effros, M., Médard, M., Ho, T., Ray, S., Karger, D. R., Koetter, R., and Hassibi, B. (). Linear
network codes: A unified framework for source, channel, and network coding. In Advances
in Network Information Theory, pp. –. American Mathematical Society, Providence, RI.
[]
Eggleston, H. G. (). Convexity. Cambridge University Press, Cambridge. []
El Gamal, A. (). The feedback capacity of degraded broadcast channels. IEEE Trans. Inf. Theory,
(), –. []
El Gamal, A. (). The capacity of a class of broadcast channels. IEEE Trans. Inf. Theory, (),
–. [, ]
El Gamal, A. (). Capacity of the product and sum of two unmatched broadcast channels. Probl.
Inf. Transm., (), –. []
El Gamal, A. (). On information flow in relay networks. In Proc. IEEE National Telecomm.
Conf., vol. , pp. D..–D... New Orleans, LA. []
El Gamal, A. and Aref, M. R. (). The capacity of the semideterministic relay channel. IEEE
Trans. Inf. Theory, (), . []
El Gamal, A. and Costa, M. H. M. (). The capacity region of a class of deterministic interference
channels. IEEE Trans. Inf. Theory, (), –. []
El Gamal, A. and Cover, T. M. (). Multiple user information theory. Proc. IEEE, (), –
. [xvii, ]
El Gamal, A. and Cover, T. M. (). Achievable rates for multiple descriptions. IEEE Trans. Inf.
Theory, (), –. []
El Gamal, A., Hassanpour, N., and Mammen, J. (). Relay networks with delays. IEEE Trans.
Inf. Theory, (), –. []
El Gamal, A., Mammen, J., Prabhakar, B., and Shah, D. (a). Optimal throughput–delay scaling
in wireless networks—I: The fluid model. IEEE Trans. Inf. Theory, (), –. [, ]
El Gamal, A., Mammen, J., Prabhakar, B., and Shah, D. (b). Optimal throughput–delay scaling
in wireless networks—II: Constant-size packets. IEEE Trans. Inf. Theory, (), –. []

El Gamal, A., Mohseni, M., and Zahedi, S. (). Bounds on capacity and minimum energy-per-
bit for AWGN relay channels. IEEE Trans. Inf. Theory, (), –. []
El Gamal, A. and Orlitsky, A. (). Interactive data compression. In Proc. th Ann. Symp. Found.
Comput. Sci., Washington DC, pp. –. [, ]
El Gamal, A. and van der Meulen, E. C. (). A proof of Marton’s coding theorem for the discrete
memoryless broadcast channel. IEEE Trans. Inf. Theory, (), –. []
El Gamal, A. and Zahedi, S. (). Capacity of a class of relay channels with orthogonal compo-
nents. IEEE Trans. Inf. Theory, (), –. []
Elia, N. (). When Bode meets Shannon: Control-oriented feedback communication schemes.
IEEE Trans. Automat. Control, (), –. []
Elias, P. (). Coding for noisy channels. In IRE Int. Conv. Rec., vol. , part , pp. –. []
Elias, P., Feinstein, A., and Shannon, C. E. (). A note on the maximum flow through a network.
IRE Trans. Inf. Theory, (), –. [, ]
Ephremides, A. and Hajek, B. E. (). Information theory and communication networks: An
unconsummated union. IEEE Trans. Inf. Theory, (), –. []
Equitz, W. H. R. and Cover, T. M. (). Successive refinement of information. IEEE Trans. Inf.
Theory, (), –. Addendum (). ibid, (), –. []
Erez, U., Shamai, S., and Zamir, R. (). Capacity and lattice strategies for canceling known
interference. IEEE Trans. Inf. Theory, (), –. []
Erez, U. and Zamir, R. (). Achieving (1/2) log(1 + SNR) on the AWGN channel with lattice en-
coding and decoding. IEEE Trans. Inf. Theory, (), –. []
Etkin, R., Tse, D. N. C., and Wang, H. (). Gaussian interference channel capacity to within one
bit. IEEE Trans. Inf. Theory, (), –. [, ]
Fano, R. M. (). Transmission of information. Unpublished course notes, Massachusetts Insti-
tute of Technology, Cambridge, MA. []
Feinstein, A. (). A new basic theorem of information theory. IRE Trans. Inf. Theory, (), –.
[]
Ford, L. R., Jr. and Fulkerson, D. R. (). Maximal flow through a network. Canad. J. Math., (),
–. [, ]
Forney, G. D., Jr. (). Information theory. Unpublished course notes, Stanford University, Stan-
ford, CA. []
Foschini, G. J. (). Layered space-time architecture for wireless communication in a fading
environment when using multi-element antennas. Bell Labs Tech. J., (), –. [, ]
Fragouli, C. and Soljanin, E. (a). Network coding fundamentals. Found. Trends Netw., (),
–. []
Fragouli, C. and Soljanin, E. (b). Network coding applications. Found. Trends Netw., (),
–. []
Franceschetti, M., Dousse, O., Tse, D. N. C., and Thiran, P. (). Closing the gap in the capacity
of wireless networks via percolation theory. IEEE Trans. Inf. Theory, (), –. []
Franceschetti, M., Migliore, M. D., and Minero, P. (). The capacity of wireless networks:
Information-theoretic and physical limits. IEEE Trans. Inf. Theory, (), –. []
Fu, F.-W. and Yeung, R. W. (). On the rate–distortion region for multiple descriptions. IEEE
Trans. Inf. Theory, (), –. []

Gaarder, N. T. and Wolf, J. K. (). The capacity region of a multiple-access discrete memoryless
channel can increase with feedback. IEEE Trans. Inf. Theory, (), –. []
Gács, P. and Körner, J. (). Common information is far less than mutual information. Probl.
Control Inf. Theory, (), –. []
Gallager, R. G. (). Low-Density Parity-Check Codes. MIT Press, Cambridge, MA. []
Gallager, R. G. (). A simple derivation of the coding theorem and some applications. IEEE
Trans. Inf. Theory, (), –. []
Gallager, R. G. (). Information Theory and Reliable Communication. Wiley, New York. [–]
Gallager, R. G. (). Capacity and coding for degraded broadcast channels. Probl. Inf. Transm.,
(), –. []
Gallager, R. G. (). A perspective on multiaccess channels. IEEE Trans. Inf. Theory, (), –
. []
Gardner, R. J. (). The Brunn–Minkowski inequality. Bull. Amer. Math. Soc. (N.S.), (), –
. []
Gastpar, M., Rimoldi, B., and Vetterli, M. (). To code, or not to code: Lossy source-channel
communication revisited. IEEE Trans. Inf. Theory, (), –. []
Gastpar, M. and Vetterli, M. (). On the capacity of large Gaussian relay networks. IEEE Trans.
Inf. Theory, (), –. []
Gelfand, S. I. (). Capacity of one broadcast channel. Probl. Inf. Transm., (), –. []
Gelfand, S. I. and Pinsker, M. S. (a). Capacity of a broadcast channel with one deterministic
component. Probl. Inf. Transm., (), –. []
Gelfand, S. I. and Pinsker, M. S. (b). Coding for channel with random parameters. Probl.
Control Inf. Theory, (), –. []
Gelfand, S. I. and Pinsker, M. S. (). On Gaussian channels with random parameters. In Proc.
th Int. Symp. Inf. Theory, Tashkent, USSR, part , pp. –. []
Geng, Y., Gohari, A. A., Nair, C., and Yu, Y. (). The capacity region for two classes of product
broadcast channels. In Proc. IEEE Int. Symp. Inf. Theory, Saint Petersburg, Russia. []
Ghaderi, J., Xie, L.-L., and Shen, X. (). Hierarchical cooperation in ad hoc networks: optimal
clustering and achievable throughput. IEEE Trans. Inf. Theory, (), –. []
Ghasemi, A., Motahari, A. S., and Khandani, A. K. (). Interference alignment for the K user
MIMO interference channel. In Proc. IEEE Int. Symp. Inf. Theory, Austin, TX, pp. –.
[]
Gohari, A. A. and Anantharam, V. (). An outer bound to the admissible source region of
broadcast channels with arbitrarily correlated sources and channel variations. In Proc. th
Ann. Allerton Conf. Comm. Control Comput., Monticello, IL, pp. –. []
Gohari, A. A. and Anantharam, V. (). Evaluation of Marton’s inner bound for the general
broadcast channel. In Proc. IEEE Int. Symp. Inf. Theory, Seoul, Korea, pp. –. [,
]
Gohari, A. A. and Anantharam, V. (a). Information-theoretic key agreement of multiple ter-
minals—I: Source model. IEEE Trans. Inf. Theory, (), –. []
Gohari, A. A. and Anantharam, V. (b). Information-theoretic key agreement of multiple ter-
minals—II: Channel model. IEEE Trans. Inf. Theory, (), –. []
Gohari, A. A., El Gamal, A., and Anantharam, V. (). On an outer bound and an inner bound for

the general broadcast channel. In Proc. IEEE Int. Symp. Inf. Theory, Austin, TX, pp. –.
[]
Golay, M. J. E. (). Note on the theoretical efficiency of information reception with PPM. Proc.
IRE, (), . []
Goldsmith, A. J. (). Wireless Communications. Cambridge University Press, Cambridge. []
Goldsmith, A. J. and Varaiya, P. P. (). Capacity of fading channels with channel side informa-
tion. IEEE Trans. Inf. Theory, (), –. [, ]
Gou, T. and Jafar, S. A. (). Capacity of a class of symmetric SIMO Gaussian interference chan-
nels within O(1). In Proc. IEEE Int. Symp. Inf. Theory, Seoul, Korea, pp. –. []
Gou, T. and Jafar, S. A. (). Degrees of freedom of the K user M × N MIMO interference channel.
IEEE Trans. Inf. Theory, (), –. []
Gray, R. M. (). Entropy and Information Theory. Springer, New York. [, ]
Gray, R. M. and Wyner, A. D. (). Source coding for a simple network. Bell Syst. Tech. J., (),
–. []
Grossglauser, M. and Tse, D. N. C. (). Mobility increases the capacity of ad hoc wireless net-
works. IEEE/ACM Trans. Netw., (), –. []
Gupta, A. and Verdú, S. (). Operational duality between lossy compression and channel coding.
IEEE Trans. Inf. Theory, (), –. []
Gupta, P. and Kumar, P. R. (). The capacity of wireless networks. IEEE Trans. Inf. Theory, (),
–. []
Gupta, P. and Kumar, P. R. (). Toward an information theory of large networks: An achievable
rate region. IEEE Trans. Inf. Theory, (), –. []
Hajek, B. E. and Pursley, M. B. (). Evaluation of an achievable rate region for the broadcast
channel. IEEE Trans. Inf. Theory, (), –. [, ]
Han, T. S. (). The capacity region of general multiple-access channel with certain correlated
sources. Inf. Control, (), –. [, ]
Han, T. S. (). An information-spectrum approach to capacity theorems for the general
multiple-access channel. IEEE Trans. Inf. Theory, (), –. []
Han, T. S. and Costa, M. H. M. (). Broadcast channels with arbitrarily correlated sources. IEEE
Trans. Inf. Theory, (), –. []
Han, T. S. and Kobayashi, K. (). A new achievable rate region for the interference channel. IEEE
Trans. Inf. Theory, (), –. [, ]
Han, T. S. and Kobayashi, K. (). A dichotomy of functions F(X, Y) of correlated sources (X, Y)
from the viewpoint of the achievable rate region. IEEE Trans. Inf. Theory, (), –. []
Hardy, G. H. (). Divergent Series. nd ed. American Mathematical Society, Providence, RI.
[]
Heegard, C. and Berger, T. (). Rate distortion when side information may be absent. IEEE
Trans. Inf. Theory, (), –. [, ]
Heegard, C. and El Gamal, A. (). On the capacity of computer memories with defects. IEEE
Trans. Inf. Theory, (), –. [, ]
Hekstra, A. P. and Willems, F. M. J. (). Dependence balance bounds for single-output two-way
channels. IEEE Trans. Inf. Theory, (), –. []
Ho, T. and Lun, D. (). Network Coding: An Introduction. Cambridge University Press, Cam-
bridge. []

Ho, T., Médard, M., Koetter, R., Karger, D. R., Effros, M., Shi, J., and Leong, B. (). A random
linear network coding approach to multicast. IEEE Trans. Inf. Theory, (), –. []
Hoeffding, W. (). Probability inequalities for sums of bounded random variables. J. Amer.
Statist. Assoc., , –. []
Horstein, M. (). Sequential transmission using noiseless feedback. IEEE Trans. Inf. Theory,
(), –. []
Høst-Madsen, A. and Zhang, J. (). Capacity bounds and power allocation for wireless relay
channels. IEEE Trans. Inf. Theory, (), –. []
Hu, T. C. (). Multi-commodity network flows. Oper. Res., (), –. []
Hughes-Hartogs, D. (). The capacity of the degraded spectral Gaussian broadcast channel. Ph.D.
thesis, Stanford University, Stanford, CA. [, ]
Hui, J. Y. N. and Humblet, P. A. (). The capacity region of the totally asynchronous multiple-
access channel. IEEE Trans. Inf. Theory, (), –. []
Hwang, C.-S., Malkin, M., El Gamal, A., and Cioffi, J. M. (). Multiple-access channels with
distributed channel state information. In Proc. IEEE Int. Symp. Inf. Theory, Nice, France, pp.
–. [, ]
Jafar, S. A. (). Capacity with causal and noncausal side information: A unified view. IEEE
Trans. Inf. Theory, (), –. [, ]
Jafar, S. A. and Vishwanath, S. (). Generalized degrees of freedom of the symmetric Gaussian
K user interference channel. IEEE Trans. Inf. Theory, (), –. [, ]
Jaggi, S., Sanders, P., Chou, P. A., Effros, M., Egner, S., Jain, K., and Tolhuizen, L. M. G. M. ().
Polynomial time algorithms for multicast network code construction. IEEE Trans. Inf. Theory,
(), –. []
Jog, V. and Nair, C. (). An information inequality for the BSSC channel. In Proc. UCSD Inf.
Theory Appl. Workshop, La Jolla, CA. [, ]
Jovičić, A. and Viswanath, P. (). Cognitive radio: An information-theoretic perspective. IEEE
Trans. Inf. Theory, (), –. []
Jovičić, A., Viswanath, P., and Kulkarni, S. R. (). Upper bounds to transport capacity of wireless
networks. IEEE Trans. Inf. Theory, (), –. []
Kailath, T., Sayed, A. H., and Hassibi, B. (). Linear Estimation. Prentice-Hall, Englewood
Cliffs, NJ. [, ]
Kalyanarama Sesha Sayee, K. C. V. and Mukherji, U. (). A multiclass discrete-time processor-
sharing queueing model for scheduled message communication over multiaccess channels
with joint maximum-likelihood decoding. In Proc. th Ann. Allerton Conf. Comm. Control
Comput., Monticello, IL, pp. –. []
Kashyap, A., Başar, T., and Srikant, R. (). Quantized consensus. Automatica, (), –.
[]
Kaspi, A. H. (). Two-way source coding with a fidelity criterion. IEEE Trans. Inf. Theory, (),
–. []
Kaspi, A. H. (). Rate–distortion function when side-information may be present at the decoder.
IEEE Trans. Inf. Theory, (), –. []
Kaspi, A. H. and Berger, T. (). Rate–distortion for correlated sources with partially separated
encoders. IEEE Trans. Inf. Theory, (), –. []

Katti, S., Maric, I., Goldsmith, A. J., Katabi, D., and Médard, M. (). Joint relaying and network
coding in wireless networks. In Proc. IEEE Int. Symp. Inf. Theory, Nice, France, pp. –.
[]
Katz, J. and Lindell, Y. (). Introduction to Modern Cryptography. Chapman & Hall/CRC, Boca
Raton, FL. []
Keshet, G., Steinberg, Y., and Merhav, N. (). Channel coding in the presence of side informa-
tion. Found. Trends Comm. Inf. Theory, (), –. []
Khisti, A. and Wornell, G. W. (a). Secure transmission with multiple antennas—I: The
MISOME wiretap channel. IEEE Trans. Inf. Theory, (), –. []
Khisti, A. and Wornell, G. W. (b). Secure transmission with multiple antennas—II: The
MIMOME wiretap channel. IEEE Trans. Inf. Theory, (), –. []
Kim, Y.-H. (a). Capacity of a class of deterministic relay channels. IEEE Trans. Inf. Theory,
(), –. []
Kim, Y.-H. (b). A coding theorem for a class of stationary channels with feedback. IEEE Trans.
Inf. Theory, (), –. [, ]
Kim, Y.-H. (). Feedback capacity of stationary Gaussian channels. IEEE Trans. Inf. Theory,
(), –. []
Kimura, A. and Uyematsu, T. (). Multiterminal source coding with complementary delivery.
In Proc. IEEE Int. Symp. Inf. Theory Appl., Seoul, Korea, pp. –. []
Knopp, R. and Humblet, P. A. (). Information capacity and power control in single-cell multi-
user communications. In Proc. IEEE Int. Conf. Comm., Seattle, WA, vol. , pp. –. []
Koetter, R. and Médard, M. (). An algebraic approach to network coding. IEEE/ACM Trans.
Netw., (), –. []
Kolmogorov, A. N. (). On the Shannon theory of information transmission in the case of
continuous signals. IRE Trans. Inf. Theory, (), –. []
Korada, S. and Urbanke, R. (). Polar codes are optimal for lossy source coding. IEEE Trans.
Inf. Theory, (), –. []
Körner, J. and Marton, K. (a). Comparison of two noisy channels. In I. Csiszár and P. Elias
(eds.) Topics in Information Theory (Colloquia Mathematica Societatis János Bolyai, Keszthely,
Hungary, ), pp. –. North-Holland, Amsterdam. []
Körner, J. and Marton, K. (b). General broadcast channels with degraded message sets. IEEE
Trans. Inf. Theory, (), –. []
Körner, J. and Marton, K. (c). Images of a set via two channels and their role in multi-user
communication. IEEE Trans. Inf. Theory, (), –. []
Körner, J. and Marton, K. (). How to encode the modulo-two sum of binary sources. IEEE
Trans. Inf. Theory, (), –. []
Kramer, G. (). Feedback strategies for a class of two-user multiple-access channels. IEEE Trans.
Inf. Theory, (), –. []
Kramer, G. (). Capacity results for the discrete memoryless network. IEEE Trans. Inf. Theory,
(), –. []
Kramer, G. (). Outer bounds on the capacity of Gaussian interference channels. IEEE Trans.
Inf. Theory, (), –. []
Kramer, G. (). Topics in multi-user information theory. Found. Trends Comm. Inf. Theory,
(/), –. [, ]
Kramer, G., Gastpar, M., and Gupta, P. (). Cooperative strategies and capacity theorems for
relay networks. IEEE Trans. Inf. Theory, (), –. [, ]
Kramer, G. and Nair, C. (). Comments on “Broadcast channels with arbitrarily correlated
sources”. In Proc. IEEE Int. Symp. Inf. Theory, Seoul, Korea, pp. –. []
Kramer, G. and Savari, S. A. (). Edge-cut bounds on network coding rates. J. Network Syst.
Manage., (), –. []
Krithivasan, D. and Pradhan, S. S. (). Lattices for distributed source coding: Jointly Gaussian
sources and reconstruction of a linear function. IEEE Trans. Inf. Theory, (), –.
[]
Kushilevitz, E. and Nisan, N. (). Communication Complexity. Cambridge University Press,
Cambridge. []
Kuznetsov, A. V. and Tsybakov, B. S. (). Coding in a memory with defective cells. Probl. Inf.
Transm., (), –. []
Lancaster, P. and Rodman, L. (). Algebraic Riccati Equations. Oxford University Press, New
York. []
Laneman, J. N., Tse, D. N. C., and Wornell, G. W. (). Cooperative diversity in wireless networks:
Efficient protocols and outage behavior. IEEE Trans. Inf. Theory, (), –. []
Lapidoth, A. and Narayan, P. (). Reliable communication under channel uncertainty. IEEE
Trans. Inf. Theory, (), –. [, ]
Leung, C. S. K. and Hellman, M. E. (). The Gaussian wire-tap channel. IEEE Trans. Inf. Theory,
(), –. []
Lévêque, O. and Telatar, İ. E. (). Information-theoretic upper bounds on the capacity of large
extended ad hoc wireless networks. IEEE Trans. Inf. Theory, (), –. []
Li, L. and Goldsmith, A. J. (). Capacity and optimal resource allocation for fading broadcast
channels—I: Ergodic capacity. IEEE Trans. Inf. Theory, (), –. []
Li, S.-Y. R., Yeung, R. W., and Cai, N. (). Linear network coding. IEEE Trans. Inf. Theory,
(), –. []
Liang, Y. (). Multiuser communications with relaying and user cooperation. Ph.D. thesis, Uni-
versity of Illinois, Urbana-Champaign, IL. []
Liang, Y. and Kramer, G. (). Rate regions for relay broadcast channels. IEEE Trans. Inf. Theory,
(), –. []
Liang, Y., Kramer, G., and Poor, H. V. (). On the equivalence of two achievable regions for the
broadcast channel. IEEE Trans. Inf. Theory, (), –. []
Liang, Y., Kramer, G., and Shamai, S. (). Capacity outer bounds for broadcast channels. In
Proc. IEEE Inf. Theory Workshop, Porto, Portugal, pp. –. []
Liang, Y., Poor, H. V., and Shamai, S. (). Information theoretic security. Found. Trends Comm.
Inf. Theory, (/), –. []
Liang, Y. and Veeravalli, V. V. (). Gaussian orthogonal relay channels: Optimal resource allo-
cation and capacity. IEEE Trans. Inf. Theory, (), –. []
Liao, H. H. J. (). Multiple access channels. Ph.D. thesis, University of Hawaii, Honolulu, HI.
[xvii, ]
Lidl, R. and Niederreiter, H. (). Finite Fields. nd ed. Cambridge University Press, Cambridge.
[, ]
Lieb, E. H. (). Proof of an entropy conjecture of Wehrl. Comm. Math. Phys., (), –. []
Lim, S. H., Kim, Y.-H., El Gamal, A., and Chung, S.-Y. (). Layered noisy network coding. In
Proc. IEEE Wireless Netw. Coding Workshop, Boston, MA. [, ]
Lim, S. H., Kim, Y.-H., El Gamal, A., and Chung, S.-Y. (). Noisy network coding. IEEE Trans.
Inf. Theory, (), –. [, ]
Lim, S. H., Minero, P., and Kim, Y.-H. (). Lossy communication of correlated sources over mul-
tiple access channels. In Proc. th Ann. Allerton Conf. Comm. Control Comput., Monticello,
IL, pp. –. []
Linder, T., Zamir, R., and Zeger, K. (). On source coding with side-information-dependent
distortion measures. IEEE Trans. Inf. Theory, (), –. []
Ma, N. and Ishwar, P. (). Two-terminal distributed source coding with alternating messages
for function computation. In Proc. IEEE Int. Symp. Inf. Theory, Toronto, Canada, pp. –.
[]
Ma, N. and Ishwar, P. (). Interaction strictly improves the Wyner–Ziv rate distortion function.
In Proc. IEEE Int. Symp. Inf. Theory, Austin, TX, pp. –. []
Maddah-Ali, M. A., Motahari, A. S., and Khandani, A. K. (). Communication over MIMO X
channels: Interference alignment, decomposition, and performance analysis. IEEE Trans. Inf.
Theory, (), –. [, ]
Madiman, M. and Barron, A. R. (). Generalized entropy power inequalities and monotonicity
properties of information. IEEE Trans. Inf. Theory, (), –. []
Marko, H. (). The bidirectional communication theory: A generalization of information the-
ory. IEEE Trans. Comm., (), –. []
Marton, K. (). A coding theorem for the discrete memoryless broadcast channel. IEEE Trans.
Inf. Theory, (), –. []
Massey, J. L. (). Causality, feedback, and directed information. In Proc. IEEE Int. Symp. Inf.
Theory Appl., Honolulu, HI, pp. –. []
Massey, J. L. and Mathys, P. (). The collision channel without feedback. IEEE Trans. Inf. Theory,
(), –. []
Maurer, U. M. (). Secret key agreement by public discussion from common information. IEEE
Trans. Inf. Theory, (), –. []
Maurer, U. M. and Wolf, S. (). Unconditionally secure key agreement and the intrinsic condi-
tional information. IEEE Trans. Inf. Theory, (), –. []
Maurer, U. M. and Wolf, S. (). Information-theoretic key agreement: From weak to strong
secrecy for free. In Advances in Cryptology—EUROCRYPT  (Bruges, Belgium), pp. –
. Springer, Berlin. []
McEliece, R. J. (). The Theory of Information and Coding. Addison-Wesley, Reading, MA. []
McMillan, B. (). The basic theorems of information theory. Ann. Math. Statist., (), –.
[]
Menezes, A. J., van Oorschot, P. C., and Vanstone, S. A. (). Handbook of Applied Cryptography.
CRC Press, Boca Raton, FL. []
Meyn, S. and Tweedie, R. L. (). Markov Chains and Stochastic Stability. nd ed. Cambridge
University Press, Cambridge. [, ]
Minero, P., Franceschetti, M., and Tse, D. N. C. (). Random access: An information-theoretic
perspective. Preprint available at http://arxiv.org/abs/./. []
Minero, P. and Kim, Y.-H. (). Correlated sources over broadcast channels. In Proc. IEEE Int.
Symp. Inf. Theory, Seoul, Korea, pp. –. [, ]
Mitzenmacher, M. and Upfal, E. (). Probability and Computing. Cambridge University Press,
Cambridge. []
Mohseni, M. (). Capacity of Gaussian vector broadcast channels. Ph.D. thesis, Stanford Uni-
versity, Stanford, CA. []
Motahari, A. S., Gharan, S. O., Maddah-Ali, M. A., and Khandani, A. K. (). Real inter-
ference alignment: Exploring the potential of single antenna systems. Preprint available at
http://arxiv.org/abs/./. [, ]
Motahari, A. S. and Khandani, A. K. (). Capacity bounds for the Gaussian interference channel.
IEEE Trans. Inf. Theory, (), –. []
Moulin, P. and O’Sullivan, J. A. (). Information-theoretic analysis of information hiding. IEEE
Trans. Inf. Theory, (), –. []
Nair, C. (). An outer bound for -receiver discrete memoryless broadcast channels. Preprint
available at http://arxiv.org/abs/./. []
Nair, C. (). Capacity regions of two new classes of two-receiver broadcast channels. IEEE
Trans. Inf. Theory, (), –. [, ]
Nair, C. and El Gamal, A. (). An outer bound to the capacity region of the broadcast channel.
IEEE Trans. Inf. Theory, (), –. []
Nair, C. and El Gamal, A. (). The capacity region of a class of three-receiver broadcast channels
with degraded message sets. IEEE Trans. Inf. Theory, (), –. [, ]
Nair, C. and Wang, Z. V. (). On the inner and outer bounds for -receiver discrete memoryless
broadcast channels. In Proc. UCSD Inf. Theory Appl. Workshop, La Jolla, CA, pp. –. []
Nam, W., Chung, S.-Y., and Lee, Y. H. (). Capacity of the Gaussian two-way relay channel to
within 1/2 bit. IEEE Trans. Inf. Theory, (), –. []
Nazer, B. and Gastpar, M. (a). Computation over multiple-access channels. IEEE Trans. Inf.
Theory, (), –. []
Nazer, B. and Gastpar, M. (b). Lattice coding increases multicast rates for Gaussian multiple-
access networks. In Proc. th Ann. Allerton Conf. Comm. Control Comput., Monticello, IL, pp.
–. []
Nazer, B., Gastpar, M., Jafar, S. A., and Vishwanath, S. (). Ergodic interference alignment. In
Proc. IEEE Int. Symp. Inf. Theory, Seoul, Korea, pp. –. [, ]
Nedić, A., Olshevsky, A., Ozdaglar, A., and Tsitsiklis, J. N. (). On distributed averaging algo-
rithms and quantization effects. IEEE Trans. Automat. Control, (), –. []
Niesen, U., Gupta, P., and Shah, D. (). On capacity scaling in arbitrary wireless networks. IEEE
Trans. Inf. Theory, (), –. []
Oggier, F. and Hassibi, B. (). The MIMO wiretap channel. In Proc. rd Int. Symp. Comm.
Control Signal Processing, St. Julians, Malta, pp. –. []
Olfati-Saber, R., Fax, J. A., and Murray, R. M. (). Consensus and cooperation in networked
multi-agent systems. Proc. IEEE, (), –. []
Oohama, Y. (). Gaussian multiterminal source coding. IEEE Trans. Inf. Theory, (), –
. []
Oohama, Y. (). The rate–distortion function for the quadratic Gaussian CEO problem. IEEE
Trans. Inf. Theory, (), –. []
Oohama, Y. (). Rate–distortion theory for Gaussian multiterminal source coding systems with
several side informations at the decoder. IEEE Trans. Inf. Theory, (), –. [, ]
Ooi, J. M. and Wornell, G. W. (). Fast iterative coding techniques for feedback channels. IEEE
Trans. Inf. Theory, (), –. []
Orlitsky, A. and El Gamal, A. (). Average and randomized communication complexity. IEEE
Trans. Inf. Theory, (), –. []
Orlitsky, A. and Roche, J. R. (). Coding for computing. IEEE Trans. Inf. Theory, (), –.
[, ]
Ozarow, L. H. (). On a source-coding problem with two channels and three receivers. Bell Syst.
Tech. J., (), –. []
Ozarow, L. H. (). The capacity of the white Gaussian multiple access channel with feedback.
IEEE Trans. Inf. Theory, (), –. []
Ozarow, L. H. and Leung, C. S. K. (). An achievable region and outer bound for the Gaussian
broadcast channel with feedback. IEEE Trans. Inf. Theory, (), –. []
Ozarow, L. H., Shamai, S., and Wyner, A. D. (). Information theoretic considerations for cel-
lular mobile radio. IEEE Trans. Veh. Tech., (), –. []
Özgür, A., Johari, R., Tse, D. N. C., and Lévêque, O. (). Information-theoretic operating regimes
of large wireless networks. IEEE Trans. Inf. Theory, (), –. []
Özgür, A., Lévêque, O., and Preissmann, E. (). Scaling laws for one- and two-dimensional
random wireless networks in the low-attenuation regimes. IEEE Trans. Inf. Theory, (),
–. []
Özgür, A., Lévêque, O., and Tse, D. N. C. (). Hierarchical cooperation achieves optimal capac-
ity scaling in ad hoc networks. IEEE Trans. Inf. Theory, (), –. []
Özgür, A., Lévêque, O., and Tse, D. N. C. (). Linear capacity scaling in wireless networks:
Beyond physical limits? In Proc. UCSD Inf. Theory Appl. Workshop, La Jolla, CA. []
Permuter, H. H., Kim, Y.-H., and Weissman, T. (). Interpretations of directed information
in portfolio theory, data compression, and hypothesis testing. IEEE Trans. Inf. Theory, (),
–. []
Permuter, H. H. and Weissman, T. (). Cascade and triangular source coding with side infor-
mation at the first two nodes. In Proc. IEEE Int. Symp. Inf. Theory, Austin, TX, pp. –. []
Permuter, H. H., Weissman, T., and Goldsmith, A. J. (). Finite state channels with time-
invariant deterministic feedback. IEEE Trans. Inf. Theory, (), –. []
Perron, E. (). Information-theoretic secrecy for wireless networks. Ph.D. thesis, École Polytech-
nique Fédérale de Lausanne, Lausanne, Switzerland. []
Petersen, K. (). Ergodic Theory. Cambridge University Press, Cambridge. []
Pinsker, M. S. (). Information and Information Stability of Random Variables and Processes.
Holden-Day, San Francisco. []
Pinsker, M. S. (). The probability of error in block transmission in a memoryless Gaussian
channel with feedback. Probl. Inf. Transm., (), –. []
Pinsker, M. S. (). Capacity of noiseless broadcast channels. Probl. Inf. Transm., (), –.
[]
Poltyrev, G. S. (). The capacity of parallel broadcast channels with degraded components. Probl.
Inf. Transm., (), –. [, ]
Poltyrev, G. S. (). Coding in an asynchronous multiple-access channel. Probl. Inf. Transm., (), –. []
Prabhakaran, V., Tse, D. N. C., and Ramchandran, K. (). Rate region of the quadratic Gaussian
CEO problem. In Proc. IEEE Int. Symp. Inf. Theory, Chicago, IL, pp. . [, ]
Pradhan, S. S., Chou, J., and Ramchandran, K. (). Duality between source and channel coding
and its extension to the side information case. IEEE Trans. Inf. Theory, (), –. []
Puri, R., Pradhan, S. S., and Ramchandran, K. (). n-channel symmetric multiple descrip-
tions—II: An achievable rate–distortion region. IEEE Trans. Inf. Theory, (), –. []
Rankov, B. and Wittneben, A. (). Achievable rate region for the two-way relay channel. In
Proc. IEEE Int. Symp. Inf. Theory, Seattle, WA, pp. –. []
Ratnakar, N. and Kramer, G. (). The multicast capacity of deterministic relay networks with
no interference. IEEE Trans. Inf. Theory, (), –. []
Richardson, T. and Urbanke, R. (). Modern Coding Theory. Cambridge University Press,
Cambridge. []
Rimoldi, B. (). Successive refinement of information: Characterization of the achievable rates.
IEEE Trans. Inf. Theory, (), –. []
Rockafellar, R. T. (). Convex Analysis. Princeton University Press, Princeton, NJ. []
Rosenzweig, A., Steinberg, Y., and Shamai, S. (). On channels with partial channel state infor-
mation at the transmitter. IEEE Trans. Inf. Theory, (), –. []
Royden, H. L. (). Real Analysis. rd ed. Macmillan, New York. []
Sato, H. (). Information transmission through a channel with relay. Technical Report B-,
The Aloha Systems, University of Hawaii, Honolulu, HI. [, ]
Sato, H. (). Two-user communication channels. IEEE Trans. Inf. Theory, (), –. []
Sato, H. (a). An outer bound to the capacity region of broadcast channels. IEEE Trans. Inf.
Theory, (), –. []
Sato, H. (b). On the capacity region of a discrete two-user channel for strong interference.
IEEE Trans. Inf. Theory, (), –. []
Schalkwijk, J. P. M. (). A coding scheme for additive noise channels with feedback—II: Band-
limited signals. IEEE Trans. Inf. Theory, (), –. []
Schalkwijk, J. P. M. (). The binary multiplying channel: A coding scheme that operates beyond
Shannon’s inner bound region. IEEE Trans. Inf. Theory, (), –. []
Schalkwijk, J. P. M. (). On an extension of an achievable rate region for the binary multiplying
channel. IEEE Trans. Inf. Theory, (), –. []
Schalkwijk, J. P. M. and Kailath, T. (). A coding scheme for additive noise channels with
feedback—I: No bandwidth constraint. IEEE Trans. Inf. Theory, (), –. []
Schein, B. and Gallager, R. G. (). The Gaussian parallel relay channel. In Proc. IEEE Int. Symp.
Inf. Theory, Sorrento, Italy, pp. . []
Schrijver, A. (). Combinatorial Optimization.  vols. Springer-Verlag, Berlin. []
Shah, D. (). Gossip algorithms. Found. Trends Netw., (), –. []
Shamai, S. (). A broadcast strategy for the Gaussian slowly fading channel. In Proc. IEEE Int.
Symp. Inf. Theory, Ulm, Germany, pp. . []
Shamai, S. and Steiner, A. (). A broadcast approach for a single-user slowly fading MIMO
channel. IEEE Trans. Inf. Theory, (), –. []
Shamai, S. and Telatar, İ. E. (). Some information theoretic aspects of decentralized power
control in multiple access fading channels. In Proc. IEEE Inf. Theory Netw. Workshop, Metsovo,
Greece, pp. . []
Shamai, S. and Wyner, A. D. (). A binary analog to the entropy-power inequality. IEEE Trans.
Inf. Theory, (), –. []
Shang, X., Kramer, G., and Chen, B. (). A new outer bound and the noisy-interference sum-
rate capacity for Gaussian interference channels. IEEE Trans. Inf. Theory, (), –. []
Shannon, C. E. (). A mathematical theory of communication. Bell Syst. Tech. J., (), –,
(), –. [, , , , –]
Shannon, C. E. (a). Communication in the presence of noise. Proc. IRE, (), –. []
Shannon, C. E. (b). Communication theory of secrecy systems. Bell Syst. Tech. J., (), –
. []
Shannon, C. E. (). The zero error capacity of a noisy channel. IRE Trans. Inf. Theory, (), –.
[, ]
Shannon, C. E. (). Channels with side information at the transmitter. IBM J. Res. Develop.,
(), –. []
Shannon, C. E. (). Coding theorems for a discrete source with a fidelity criterion. In IRE
Int. Conv. Rec., vol. , part , pp. –. Reprint with changes (). In R. E. Machol (ed.)
Information and Decision Processes, pp. –. McGraw-Hill, New York. [, , , , ]
Shannon, C. E. (). Two-way communication channels. In Proc. th Berkeley Symp. Math. Statist.
Probab., vol. I, pp. –. University of California Press, Berkeley. [xvii, ]
Shayevitz, O. and Feder, M. (). Optimal feedback communication via posterior matching. IEEE
Trans. Inf. Theory, (), –. []
Shayevitz, O. and Wigger, M. A. (). An achievable region for the discrete memoryless broadcast
channel with feedback. In Proc. IEEE Int. Symp. Inf. Theory, Austin, TX, pp. –. []
Shepp, L. A., Wolf, J. K., Wyner, A. D., and Ziv, J. (). Binary communication over the Gaussian
channel using feedback with a peak energy constraint. IEEE Trans. Inf. Theory, (), –.
[]
Slepian, D. (). On bandwidth. Proc. IEEE, (), –. []
Slepian, D. and Wolf, J. K. (a). Noiseless coding of correlated information sources. IEEE Trans.
Inf. Theory, (), –. [xvii, , ]
Slepian, D. and Wolf, J. K. (b). A coding theorem for multiple access channels with correlated
sources. Bell Syst. Tech. J., (), –. [, ]
Stam, A. J. (). Some inequalities satisfied by the quantities of information of Fisher and Shan-
non. Inf. Control, (), –. []
Steinberg, Y. (). Coding for the degraded broadcast channel with random parameters, with
causal and noncausal side information. IEEE Trans. Inf. Theory, (), –. []
Steinsaltz, D. (). Locally contractive iterated function systems. Ann. Probab., (), –.
[]
Su, H.-I. and El Gamal, A. (). Distributed lossy averaging. In Proc. IEEE Int. Symp. Inf. Theory,
Seoul, Korea, pp. –. [, ]
Su, H.-I. and El Gamal, A. (). Two-way source coding through a relay. In Proc. IEEE Int. Symp.
Inf. Theory, Austin, TX, pp. –. []
Tatikonda, S. and Mitter, S. (). The capacity of channels with feedback. IEEE Trans. Inf. Theory,
(), –. []
Tavildar, S., Viswanath, P., and Wagner, A. B. (). The Gaussian many-help-one distributed
source coding problem. In Proc. IEEE Inf. Theory Workshop, Chengdu, China, pp. –.
[]
Telatar, İ. E. (). Capacity of multi-antenna Gaussian channels. Euro. Trans. Telecomm., (),
–. [, ]
Telatar, İ. E. and Tse, D. N. C. (). Bounds on the capacity region of a class of interference
channels. In Proc. IEEE Int. Symp. Inf. Theory, Nice, France, pp. –. []
Tse, D. N. C. and Hanly, S. V. (). Multiaccess fading channels—I: Polymatroid structure, opti-
mal resource allocation and throughput capacities. IEEE Trans. Inf. Theory, (), –.
[, ]
Tse, D. N. C. and Viswanath, P. (). Fundamentals of Wireless Communication. Cambridge
University Press, Cambridge. []
Tsitsiklis, J. N. (). Problems in decentralized decision making and computation. Ph.D. thesis,
Massachusetts Institute of Technology, Cambridge, MA. []
Tung, S.-Y. (). Multiterminal source coding. Ph.D. thesis, Cornell University, Ithaca, NY. []
Uhlmann, W. (). Vergleich der hypergeometrischen mit der Binomial-Verteilung [Comparison of the
hypergeometric with the binomial distribution]. Metrika, (), –. []
van der Meulen, E. C. (a). Three-terminal communication channels. Adv. Appl. Probab., (),
–. []
van der Meulen, E. C. (b). The discrete memoryless channel with two senders and one receiver.
In Proc. nd Int. Symp. Inf. Theory, Tsahkadsor, Armenian SSR, pp. –. []
van der Meulen, E. C. (). Random coding theorems for the general discrete memoryless broad-
cast channel. IEEE Trans. Inf. Theory, (), –. []
van der Meulen, E. C. (). A survey of multi-way channels in information theory: –.
IEEE Trans. Inf. Theory, (), –. [xvii, , ]
van der Meulen, E. C. (). Recent coding theorems and converses for multi-way channels—I:
The broadcast channel (–). In J. K. Skwyrzinsky (ed.) New Concepts in Multi-User
Communication, pp. –. Sijthoff & Noordhoff, Alphen aan den Rijn. []
van der Meulen, E. C. (). Recent coding theorems and converses for multi-way channels—
II: The multiple access channel (–). Department Wiskunde, Katholieke Universiteit
Leuven, Leuven, Belgium. []
van der Meulen, E. C. (). A survey of the relay channel. In E. Biglieri and L. Györfi (eds.)
Multiple Access Channels: Theory and Practice, pp. –. IOS Press, Boston. []
Vandenberghe, L., Boyd, S., and Wu, S.-P. (). Determinant maximization with linear matrix
inequality constraints. SIAM J. Matrix Anal. Appl., (), –. []
Vasudevan, D., Tian, C., and Diggavi, S. N. (). Lossy source coding for a cascade communica-
tion system with side-informations. In Proc. th Ann. Allerton Conf. Comm. Control Comput.,
Monticello, IL, pp. –. []
Venkataramani, R., Kramer, G., and Goyal, V. K. (). Multiple description coding with many
channels. IEEE Trans. Inf. Theory, (), –. []
Verdú, S. (). Multiple-access channels with memory with and without frame synchronism.
IEEE Trans. Inf. Theory, (), –. []
Verdú, S. (). On channel capacity per unit cost. IEEE Trans. Inf. Theory, (), –. []
Verdú, S. and Guo, D. (). A simple proof of the entropy-power inequality. IEEE Trans. Inf.
Theory, (), –. []
Verdú, S. and Han, T. S. (). A general formula for channel capacity. IEEE Trans. Inf. Theory,
(), –. []
Vishwanath, S., Jindal, N., and Goldsmith, A. J. (). Duality, achievable rates, and sum-rate
capacity of Gaussian MIMO broadcast channels. IEEE Trans. Inf. Theory, (), –.
[]
Viswanath, P. and Tse, D. N. C. (). Sum capacity of the vector Gaussian broadcast channel and
uplink-downlink duality. IEEE Trans. Inf. Theory, (), –. []
Viswanathan, H. and Berger, T. (). The quadratic Gaussian CEO problem. IEEE Trans. Inf.
Theory, (), –. []
Wagner, A. B. and Anantharam, V. (). An improved outer bound for multiterminal source
coding. IEEE Trans. Inf. Theory, (), –. []
Wagner, A. B., Kelly, B. G., and Altuğ, Y. (). Distributed rate–distortion with common compo-
nents. IEEE Trans. Inf. Theory, (), –. []
Wagner, A. B., Tavildar, S., and Viswanath, P. (). Rate region of the quadratic Gaussian two-
encoder source-coding problem. IEEE Trans. Inf. Theory, (), –. [, ]
Wang, J., Chen, J., and Wu, X. (). On the sum rate of Gaussian multiterminal source coding:
New proofs and results. IEEE Trans. Inf. Theory, (), –. []
Wang, J., Chen, J., Zhao, L., Cuff, P., and Permuter, H. H. (). On the role of the refinement layer
in multiple description coding and scalable coding. IEEE Trans. Inf. Theory, (), –.
[]
Wang, Z. V. and Nair, C. (). The capacity region of a class of broadcast channels with a sequence
of less noisy receivers. In Proc. IEEE Int. Symp. Inf. Theory, Austin, TX, pp. –. [, ]
Weingarten, H., Steinberg, Y., and Shamai, S. (). The capacity region of the Gaussian multiple-
input multiple-output broadcast channel. IEEE Trans. Inf. Theory, (), –. []
Weissman, T. and El Gamal, A. (). Source coding with limited-look-ahead side information
at the decoder. IEEE Trans. Inf. Theory, (), –. [, ]
Weldon, E. J., Jr. (). Asymptotic error coding bounds for the binary symmetric channel with
feedback. Ph.D. thesis, University of Florida, Gainesville, FL. []
Willems, F. M. J. (). The feedback capacity region of a class of discrete memoryless multiple
access channels. IEEE Trans. Inf. Theory, (), –. []
Willems, F. M. J. (). The maximal-error and average-error capacity region of the broadcast
channel are identical: A direct proof. Probl. Control Inf. Theory, (), –. [, ]
Willems, F. M. J. and van der Meulen, E. C. (). The discrete memoryless multiple-access channel
with cribbing encoders. IEEE Trans. Inf. Theory, (), –. [, , ]
Wilson, M. P., Narayanan, K., Pfister, H. D., and Sprintson, A. (). Joint physical layer coding
and network coding for bidirectional relaying. IEEE Trans. Inf. Theory, (), –. []
Witsenhausen, H. S. (). Entropy inequalities for discrete channels. IEEE Trans. Inf. Theory,
(), –. []
Witsenhausen, H. S. (). On sequences of pairs of dependent random variables. SIAM J. Appl.
Math., (), –. []
Witsenhausen, H. S. (a). Values and bounds for the common information of two discrete
random variables. SIAM J. Appl. Math., (), –. []
Witsenhausen, H. S. (b). The zero-error side information problem and chromatic numbers.
IEEE Trans. Inf. Theory, (), –. []
Witsenhausen, H. S. (). Indirect rate distortion problems. IEEE Trans. Inf. Theory, (), –
. []
Witsenhausen, H. S. and Wyner, A. D. (). A conditional entropy bound for a pair of discrete
random variables. IEEE Trans. Inf. Theory, (), –. []
Witsenhausen, H. S. and Wyner, A. D. (). Source coding for multiple descriptions—II: A binary
source. Bell Syst. Tech. J., (), –. []
Wolf, J. K., Wyner, A. D., and Ziv, J. (). Source coding for multiple descriptions. Bell Syst. Tech.
J., (), –. []
Wolf, J. K., Wyner, A. D., Ziv, J., and Körner, J. (). Coding for write-once memory. Bell Syst.
Tech. J., (), –. []
Wolfowitz, J. (). The coding of messages subject to chance errors. Illinois J. Math., (), –.
[]
Wolfowitz, J. (). Simultaneous channels. Arch. Rational Mech. Anal., (), –. []
Wolfowitz, J. (). Coding Theorems of Information Theory. rd ed. Springer-Verlag, Berlin. []
Wu, W., Vishwanath, S., and Arapostathis, A. (). Capacity of a class of cognitive radio channels:
Interference channels with degraded message sets. IEEE Trans. Inf. Theory, (), –.
[]
Wyner, A. D. (). The capacity of the band-limited Gaussian channel. Bell Syst. Tech. J., (),
–. []
Wyner, A. D. (). On the Schalkwijk–Kailath coding scheme with a peak energy constraint.
IEEE Trans. Inf. Theory, (), –. []
Wyner, A. D. (). A theorem on the entropy of certain binary sequences and applications—II.
IEEE Trans. Inf. Theory, (), –. [, ]
Wyner, A. D. (). Recent results in the Shannon theory. IEEE Trans. Inf. Theory, (), –.
[, ]
Wyner, A. D. (a). The common information of two dependent random variables. IEEE Trans.
Inf. Theory, (), –. []
Wyner, A. D. (b). On source coding with side information at the decoder. IEEE Trans. Inf.
Theory, (), –. []
Wyner, A. D. (c). The wire-tap channel. Bell Syst. Tech. J., (), –. []
Wyner, A. D., Wolf, J. K., and Willems, F. M. J. (). Communicating via a processing broadcast
satellite. IEEE Trans. Inf. Theory, (), –. []
Wyner, A. D. and Ziv, J. (). A theorem on the entropy of certain binary sequences and appli-
cations—I. IEEE Trans. Inf. Theory, (), –. []
Wyner, A. D. and Ziv, J. (). The rate–distortion function for source coding with side informa-
tion at the decoder. IEEE Trans. Inf. Theory, (), –. [, ]
Xiao, L. and Boyd, S. (). Fast linear iterations for distributed averaging. Syst. Control Lett.,
(), –. []
Xiao, L., Boyd, S., and Kim, S.-J. (). Distributed average consensus with least-mean-square
deviation. J. Parallel Distrib. Comput., (), –. []
Xie, L.-L. and Kumar, P. R. (). A network information theory for wireless communication:
Scaling laws and optimal operation. IEEE Trans. Inf. Theory, (), –. []
Xie, L.-L. and Kumar, P. R. (). An achievable rate for the multiple-level relay channel. IEEE
Trans. Inf. Theory, (), –. []
Xie, L.-L. and Kumar, P. R. (). On the path-loss attenuation regime for positive cost and linear
scaling of transport capacity in wireless networks. IEEE Trans. Inf. Theory, (), –.
[]
Xue, F., Xie, L.-L., and Kumar, P. R. (). The transport capacity of wireless networks over fading
channels. IEEE Trans. Inf. Theory, (), –. []
Yamamoto, H. (). Source coding theory for cascade and branching communication systems.
IEEE Trans. Inf. Theory, (), –. []
Yamamoto, H. (). Wyner–Ziv theory for a general function of the correlated sources. IEEE
Trans. Inf. Theory, (), –. []
Yamamoto, H. (). Source coding theory for a triangular communication systems. IEEE Trans.
Inf. Theory, (), –. []
Yamamoto, H. (). Rate–distortion theory for the Shannon cipher system. IEEE Trans. Inf.
Theory, (), –. []
Yang, Y., Zhang, Y., and Xiong, Z. (). A new sufficient condition for sum-rate tightness in
quadratic Gaussian MT source coding. In Proc. UCSD Inf. Theory Appl. Workshop, La Jolla,
CA. []
Yao, A. C.-C. (). Some complexity questions related to distributive computing. In Proc. th
Ann. ACM Symp. Theory Comput., Atlanta, GA, pp. –. []
Yeung, R. W. (). Information Theory and Network Coding. Springer, New York. [, ]
Yeung, R. W., Li, S.-Y. R., Cai, N., and Zhang, Z. (a). Network coding theory—I: Single source.
Found. Trends Comm. Inf. Theory, (), –. []
Yeung, R. W., Li, S.-Y. R., Cai, N., and Zhang, Z. (b). Network coding theory—II: Multiple
source. Found. Trends Comm. Inf. Theory, (), –. []
Yu, W. and Cioffi, J. M. (). Sum capacity of Gaussian vector broadcast channels. IEEE Trans.
Inf. Theory, (), –. []
Yu, W., Rhee, W., Boyd, S., and Cioffi, J. M. (). Iterative water-filling for Gaussian vector
multiple-access channels. IEEE Trans. Inf. Theory, (), –. []
Yu, W., Sutivong, A., Julian, D. J., Cover, T. M., and Chiang, M. (). Writing on colored paper.
In Proc. IEEE Int. Symp. Inf. Theory, Washington DC, pp. . []
Zamir, R. (). Lattices are everywhere. In Proc. UCSD Inf. Theory Appl. Workshop, La Jolla, CA,
pp. –. []
Zamir, R. and Feder, M. (). A generalization of the entropy power inequality with applications.
IEEE Trans. Inf. Theory, (), –. []
Zeng, C.-M., Kuhlmann, F., and Buzo, A. (). Achievability proof of some multiuser channel
coding theorems using backward decoding. IEEE Trans. Inf. Theory, (), –. [,
]
Zhang, Z. and Berger, T. (). New results in binary multiple descriptions. IEEE Trans. Inf.
Theory, (), –. []
Ziegler, G. M. (). Lectures on Polytopes. Springer-Verlag, New York. []
Common Symbols

α, β numbers in [0, 1]
δ, ϵ small positive numbers
ρ correlation coefficient
B input cost constraint
b input cost/number of transmission blocks
C capacity
C Gaussian capacity function
C capacity region
D distortion constraint/relative entropy
d distortion measure/dimension
E indicator variable
E expectation
E error events/set of edges
e error message
F cumulative distribution function (cdf)
𝔽 field
f probability density function (pdf)
G channel gain matrix/generator matrix
G graph
g channel gain/generic function
H entropy/parity check matrix
h differential entropy
I mutual information/interference-to-noise ratio (INR)
i transmission time
j, k, l generic indices
K covariance matrix/crosscovariance matrix/secret key
M, m message/index
N number of nodes in a network/noise power
N set of nodes
n transmission block length
P power constraint
Pe(n) probability of error
P probability
p probability mass function (pmf)
Q time-sharing random variable/Q-function
q number of rounds
R rate
R Gaussian rate function
R rate region
ℝ set of real numbers
S signal-to-noise ratio (SNR)/random channel state
s channel state
Tϵ(n) typical set
U , V , W, X, Y, Z source/channel/auxiliary random variables
U, V, W, X , Y, Z source/channel/auxiliary random variable alphabets
Author Index

Abramson, N., ,  Butman, S., , 
Ahlswede, R., xvii, , , , , , , Buzo, A., , , 
, , , , , , , 
Ahmad, S. H. A., ,  Cadambe, V., , , 
Aleksic, M., ,  Cai, N., , , , , , , 
Altuğ, Y., ,  Caire, G., , , 
Anantharam, V., , , , , , ,  Calderbank, R., , 
Annapureddy, V. S., ,  Cannons, J., , 
Arapostathis, A., ,  Carleial, A. B., , , , 
Ardestanizadeh, E., , ,  Cemal, Y., , , 
Aref, M. R., , , ,  Chen, B., , 
Arıkan, E., ,  Chen, J., , , , 
Artstein, S., ,  Cheng, R. S., , 
Avestimehr, A. S., , ,  Chia, Y.-K., , 
Ayaso, O., ,  Chiang, M., , , , 
Chong, H.-F., , 
Ball, K. M., ,  Chou, J., , 
Bandemer, B., , ,  Chou, P. A., , , 
Barron, A. R., , ,  Christof, T., , 
Barthe, F., ,  Chung, S.-Y., , , , 
Başar, T., ,  Cioffi, J. M., , , , , , 
Beckner, W., ,  Cohen, A. S., , 
Berger, T., , , , , –, , , Constantinides, A., , 
, , , ,  Costa, M. H. M., , , , , , , ,
Bergmans, P. P., ,  
Bierbaum, M., ,  Cover, T. M., xvii, , , , , , , , ,
Biglieri, E., , ,  , , , , , , , , ,
Blachman, N. M., ,  , –, 
Blackwell, D., , , ,  Coviello, L., 
Borade, S., ,  Csiszár, I., xvii, , , , , , , , ,
Boyd, S., , , , , , , , , , , , , 
 Cuff, P., , , , 
Brascamp, H. J., , 
Breiman, L., , ,  Dahleh, M. A., , 
Bresler, G., , ,  Dana, A. F., , 
Bross, S. I., ,  De Bruyn, K., , 
Bucklew, J. A., ,  Devroye, N., , 
Diaconis, P., ,  Ghaderi, J., , 
Diggavi, S. N., , , , ,  Gharan, S. O., , 
Dobrushin, R. L., , ,  Ghasemi, A., , 
Dougherty, R., , , ,  Ghosh, A., , 
Dousse, O., ,  Gohari, A. A., , , , , 
Dueck, G., , , , ,  Golay, M. J. E., , 
Dunham, J. G., ,  Goldsmith, A. J., , , , , , ,
Durrett, R., , ,  , , , , , 
Gou, T., , 
Effros, M., , , , , ,  Gowaikar, R., , 
Eggleston, H. G., ,  Goyal, V. K., , 
Egner, S., ,  Gray, R. M., , , , 
El Gamal, A., xvii, , , , , , , Grossglauser, M., , 
, , , , , , , , , Guo, D., , 
, , , , , , , , , Gupta, A., , 
, , , , , , –, Gupta, P., , , , , , , 
–, , 
El Gamal, H., ,  Hajek, B. E., , , , , 
Elia, N., ,  Han, T. S., , , , , , , , , 
Elias, P., , , ,  Hanly, S. V., , , 
Ephremides, A., ,  Hardy, G. H., , 
Equitz, W. H. R., ,  Hassanpour, N., , 
Erez, U., , ,  Hassibi, B., , , , , , , ,
Etkin, R., , ,  , 
Heegard, C., , , , , 
Fano, R. M., ,  Hekstra, A. P., , 
Fax, J. A., ,  Hellman, M. E., , 
Feder, M., , , ,  Hemami, S. S., , 
Feinstein, A., , , ,  Ho, T., , , , , 
Ford, L. R., Jr., , ,  Hoeffding, W., , 
Forney, G. D., Jr., ,  Horstein, M., , 
Foschini, G. J., , ,  Høst-Madsen, A., , 
Fragouli, C., ,  Hu, T. C., , 
Franceschetti, M., , , , , ,  Hughes-Hartogs, D., , , 
Freedman, D., ,  Hui, J. Y. N., , 
Freiling, C., , , ,  Humblet, P. A., , , , 
Fu, F.-W., ,  Hwang, C.-S., , , 
Fulkerson, D. R., , , 
Ishwar, P., , , 
Gaarder, N. T., , 
Gács, P., , , ,  Jafar, S. A., , , , , , , , 
Gallager, R. G., –, , , , ,  Jaggi, S., , 
Gardner, R. J., ,  Jain, K., , , 
Garg, H. K., ,  Javidi, T., , 
Gastpar, M., , , , , , , , Jindal, N., , 
, ,  Jog, V., , , 
Gelfand, S. I., , ,  Johari, R., , 
Geng, Y., ,  Jovičić, A., , , , 
Gersho, A.,  Julian, D. J., , 
Kailath, T., , , , ,  Linder, T., , 
Kalyanarama Sesha Sayee, K. C. V., ,  Löbel, A., , 
Karger, D. R., , , ,  Lun, D., , 
Kashyap, A., , 
Kaspi, A. H., , , ,  Ma, N., , , 
Katabi, D., ,  Maddah-Ali, M. A., , , 
Katti, S., ,  Madiman, M., , 
Katz, J., ,  Malkin, M., , , 
Kelly, B. G., ,  Mammen, J., , , , 
Keshet, G., ,  Maric, I., , 
Khandani, A. K., , , , ,  Marko, H., , 
Khisti, A., ,  Marton, K., , , , , 
Kim, S.-J., ,  Massey, J. L., , , 
Kim, Y.-H., , , , , , , , , Mathys, P., , 
, , , , – Maurer, U. M., , 
Kimura, A., ,  McEliece, R. J., , , , 
Knopp, R., ,  McMillan, B., , 
Kobayashi, K., , , ,  Médard, M., , , , , 
Koetter, R., , , , ,  Menezes, A. J., , 
Merhav, N., , 
Kolmogorov, A. N., , 
Meyn, S., , , 
Korada, S., , 
Migliore, M. D., , 
Körner, J., xvii, , , , , , , , ,
Minero, P., , , , , , , 
, , , , , , , , ,
Mitran, P., , 
, , 
Mitter, S., , 
Kramer, G., , , , , , , ,
Mitzenmacher, M., , 
, , , , –
Mohseni, M., , , , 
Krithivasan, D., , 
Motahari, A. S., , , , , 
Kuhlmann, F., , , 
Motani, M., , 
Kulkarni, S. R., , 
Moulin, P., , 
Kumar, P. R., , , , , , 
Mukherji, U., , 
Kushilevitz, E., , 
Murray, R. M., , 
Kuznetsov, A. V., , 
Nair, C., , , , , , , , ,
Lancaster, P., ,  , , , 
Laneman, J. N., ,  Nam, W., , 
Lapidoth, A., , , , , ,  Naor, A., , 
Lee, Y. H., ,  Narayan, P., , , , , , 
Leong, B., ,  Narayanan, K., , 
Leung, C. S. K., , , , ,  Nazer, B., , , , 
Lévêque, O., , , , ,  Nedić, A., , 
Li, L., ,  Niederreiter, H., , , 
Li, S.-Y. R., , , , ,  Niesen, U., , 
Liang, Y., , , , ,  Nisan, N., , 
Liao, H. H. J., xvii, , 
Lidl, R., , ,  Oggier, F., , 
Lieb, E. H., , ,  Olfati-Saber, R., , 
Lim, S. H., , , ,  Olshevsky, A., , 
Lindell, Y., ,  Oohama, Y., , , , 
Ooi, J. M., ,  Shah, D., , , , , , , , ,
Orlitsky, A., , , , , , ,  
O’Sullivan, J. A., ,  Shamai, S., , , , , , , , ,
Ozarow, L. H., , , ,  , , , , –, 
Ozdaglar, A., ,  Shang, X., , 
Özgür, A., , , ,  Shannon, C. E., xvii, –, , , , –, ,
, , , , , , 
Palanki, R., ,  Shayevitz, O., , , 
Parekh, A., , ,  Shen, X., , 
Paulraj, A., ,  Shepp, L. A., , 
Permuter, H. H., , , , ,  Shi, J., , 
Perron, E., ,  Slepian, D., xvii, , , , , , 
Petersen, K., ,  Soljanin, E., , 
Pfister, H. D., ,  Sprintson, A., , 
Pinsker, M. S., , , , , ,  Srikant, R., , 
Poltyrev, G. S., , , , ,  Stam, A. J., , 
Poor, H. V., , , , ,  Steinberg, Y., , , , , , , ,
Posner, E. C., ,  
Prabhakar, B., , , , ,  Steiner, A., , 
Prabhakaran, V., , ,  Steinsaltz, D., , 
Pradhan, S. S., , , , ,  Su, H.-I., , , , , 
Preissmann, E., ,  Sutivong, A., , 
Prelov, V. V., , 
Proakis, J., ,  Tarokh, V., , 
Puri, R., ,  Tatikonda, S., , 
Pursley, M. B., , ,  Tavildar, S., , , , 
Telatar, İ. E., , , , , , , ,
Ramchandran, K., , , , ,  
Rankov, B., ,  Thiran, P., , 
Ratnakar, N., ,  Thomas, J. A., , , , , 
Ray, S., ,  Thomasian, A. J., , 
Razaghi, P., ,  Tian, C., , , , 
Rhee, W., ,  Tolhuizen, L. M. G. M., , 
Richardson, T., ,  Trott, M., , 
Rimoldi, B., , , ,  Tse, D. N. C., , , , , , , , ,
Roche, J. R., , ,  , , , , , , , , ,
Rockafellar, R. T., ,  , , , , , , 
Rodman, L., ,  Tsitsiklis, J. N., , , 
Rosenzweig, A., ,  Tsybakov, B. S., , 
Royden, H. L., ,  Tung, S.-Y., , 
Tweedie, R. L., , , 
Salehi, M., , 
Sanders, P., ,  Uhlmann, W., , 
Sato, H., , , , ,  Upfal, E., , 
Savari, S. A., ,  Urbanke, R., , , 
Sayed, A. H., , ,  Uyematsu, T., , 
Schalkwijk, J. P. M., , , 
Schein, B., ,  van der Meulen, E. C., xvii, , , , ,
Schrijver, A., ,  , , , , , , , 
van Oorschot, P. C., ,  Wolfowitz, J., , , , 
Vandenberghe, L., , , , ,  Wornell, G. W., , , , , , 
Vanstone, S. A., ,  Wu, S.-P., , 
Varaiya, P. P., , ,  Wu, W., , 
Vasudevan, D., ,  Wu, X., , 
Vazquez-Vilar, G., ,  Wu, Y., , 
Veeravalli, V. V., , , ,  Wyner, A. D., , , , , , , , ,
Venkataramani, R., ,  , , , , , , , , ,
Verdú, S., ,  , , 
Vetterli, M., , , 
Vishwanath, S., , , , , , , , Xiao, L., , 
,  Xie, L.-L., , , , , 
Viswanath, P., , , , , , , , Xiong, Z., , 
,  Xue, F., , 
Viswanathan, H., , , 
Yamamoto, H., , , , 
Wagner, A. B., , , , ,  Yang, Y., , 
Wallmeier, H., ,  Yao, A. C.-C., , 
Wang, H., , ,  Yeung, R. W., , , , , , , ,
Wang, J., , ,  , , , 
Wang, Z. V., , , , ,  Yu, W., , , , , , 
Weingarten, H., ,  Yu, Y., , 
Weissman, T., , , , , , 
Weldon, E. J., Jr., ,  Zahedi, S., , 
Wigger, M. A., , , ,  Zamir, R., , , , , , , 
Willems, F. M. J., , , , , , , Zeger, K., , , , , , 
, , , ,  Zeng, C.-M., , , 
Wilson, M. P., ,  Zhang, J., , 
Witsenhausen, H. S., , , , , , , Zhang, Y., , 
 Zhang, Z., , , , , 
Wittneben, A., ,  Zhao, L., , 
Wolf, J. K., xvii, , , , , , , , Zheng, L., , 
, , ,  Ziegler, G. M., , 
Wolf, S., ,  Ziv, J., , , , , , , , 
Subject Index

absorption, 498 cardinality bound, see cardinality
access point, 591 identification, 113–115, 122, 125, 177,
access time interval, 604 182, 190, 226, 300, 347, 511, 518,
achievability, 39, 41, 55 555, 561
achievable SNR region, 102 time-sharing, see time sharing
ad-hoc network, 1, 10, 492 AVC, see arbitrarily varying channel
adaptive capacity, 587 average probability of error, 39
adaptive coding
for Gaussian fading channel, 587 backlog, 602
for Gaussian fading MAC, 591, 593 backward channel, 58, 64, 266
backward decoding, 391, 397, 434, 436–438
for random access channel, 605
bandwidth, 70
with power control, 587, 593
base station, 6, 7, 93, 131, 258, 363, 382, 591
adversary, 168, 172, 173
BC, see broadcast channel
AEP, see asymptotic equipartition property
beamforming, 227, 553
algorithm, 93, 378
BEC, see binary erasure channel
decoding, 69
Berger–Tung coding, 297–299, 301, 310,
Ford–Fulkerson, 377
312
gossip, 529
with common part, 312
iterative water-filling, 234 Bernoulli process, xxvii
linear programming, 377 Bernoulli random variable, xxvii
subcarrier bit-loading, 54 Bernoulli source, 55, 68
ALOHA, 14, 605, 606 bin, 261
alphabet, 38 product, 263
amplify–forward, 408, 415, 416, 488 binary entropy function, xxvii, 17
analog-to-digital conversion, 56 inverse, 116
antenna, 227 binary erasure channel (BEC), 40, 121, 126,
arbitrarily varying channel (AVC), 172, 173, 428
191 binary field, 44
arrival, 411 binary multiplier channel (BMC), 447, 448,
arrival rate, 602, 603 450, 454
asymptotic equipartition property (AEP), binary symmetric channel (BSC), 40, 43, 68,
32 107, 116, 121, 195, 431–434
asynchrony, 608, 617 backward, 58, 266
augmented code, 602 compound, 172
augmented network, 375, 497 with state, 194
authentication, 185 binning, 400
auxiliary random variable, 109, 146, 266 compress, 282–285
deterministic, 284 less noisy, 121, 124–126, 128
distributed compress, 297 lossless communication of a 2-DMS,
double random, 562 345–351
for decode–forward, 393–395 Marton’s inner bound, 205, 213
random, 261–264 with common message, 212, 213
binomial random variable, xxvii, 316 more capable, 121–126, 164, 348
biological network, 1 multilevel, 199–204
Blackwell channel, 206, 349 Nair–El Gamal outer bound, 217
block fading, 583 outer bound, 106, 214
block feedback coding, 433 product, 126, 127, 203
block Markov coding, 388, 391, 397, 400, relay, 482
436, 437, 463, 493 reversely degraded, 127, 237–240
blowing-up lemma, 271 semideterministic, 206
BMC, see binary multiplier channel superposition coding inner bound
boundary corner point, 246–249 extended, 199
briefing, 308 superposition coding inner bound,
broadcast capacity, 587 109, 111
broadcast channel (BC), 104–130, 197–226, symmetric, 106
443, 444, 460, 557, 595, 598, 617 time-division inner bound, 106
binary erasure, 126 with confidential message, 557
binary skew-symmetric, 215, 216 with degraded message sets, 198–204,
binary symmetric (BS-BC), 107, 108, 218
112, 115–117 with feedback, 443, 444, 454
Blackwell, 206, 349 with known delays, 617
BSC–BEC, 121, 125, 214, 215 with more than two receivers, 124,
capacity region, 105 125, 128, 199–204, 217, 239, 240,
private-message, 106 252, 253
code for, 105 with orthogonal components, 106
degraded, 112–117, 124–127 with state, 183, 184, 188, 189, 191, 195
capacity region, 112, 124 broadcast channel approach
enhanced, 240, 247, 249 for Gaussian fading channel, 586
product, 126 for random access channel, 605
deterministic, 206, 212, 217 BSC, see binary symmetric channel
discrete memoryless (DM-BC), butterfly network, 368
104–117, 121–127, 198–210, 2-unicast, 376
212–226, 557
Gaussian, 117–120, 125, 127, 128, 210, capacity, 39
443, 444, 454, 595, 598 capacity–cost function, 47
capacity region, 118 capacity gap, 151, 489
duality to Gaussian MAC, 128 capacity per unit cost, 70
fading, 595, 598 capacity region, 82
minimum-energy-per-bit region, convexity, 85
127 capacity scaling law, 490
product, 235–240, 253 cardinality, xxv, 92, 103, 115, 123, 145, 177,
vector (GV-BC), 234–253, 255–257 182, 198, 200, 213, 268, 279, 285,
with additive Gaussian state, 188, 347, 555, 631–635
189 card game, 7, 532
with feedback, 443, 444, 454 Cauchy’s inequality, 65
causal conditioning, 449, 455, 456 strictly causal state information
causal state information, 175 available at the encoder, 193
CD-ROM, 178 with input cost, 192
cdf, see cumulative distribution function characteristic graph, 532
cellular system, 1, 81, 93, 104 Chebyshev inequality, 431, 579, 625
cellular time division, 493–496 Chebyshev lemma, 223, 504, 581, 582, 625
CEO problem, 308–311, 313, 315, 534, 535 Chernoff bound, 316, 502, 619, 625
Cesáro mean lemma, 618 Chernoff–Hoeffding bound, 73
CFO (communication for omniscience) cloud center, 107, 109, 111, 144, 198, 199,
problem, 514, 515, 525, 569, 571 202
conditional, 525, 571 cocktail party, 131
chain rule codebook, 39, 57
causally conditional pmf, 456 coded state information, 189
differential entropy, 21 coded time sharing, 93, 134, 400, 467
entropy, 18 codeword, 39
mutual information, 24, 25 coding for computing, 529–548
pdf, 21 cascade, 539–542, 546
pmf, 18 distributed, 533–536, 546
channel coding theorem, 39 interactive, 537–539
channel enhancement, 240, 247, 249 over a MAC, 544, 545
channel gain, 49, 94, 117 with side information, 530–533
matrix, 228, 485
cognitive radio, 196
process, 583
coherence time interval, 583
channel with random state, 173–190, 285
commodity, 2, 367–369, 381, see also
broadcast, 183, 184, 188, 189, 191, 195,
multicommodity flow
see also broadcast channel
common-message capacity, 126, 195, 238,
causal state information available at
457, 481
the encoder, 175–178
common information, 308, 330, 347
DMC with DM state, 173–183, 189,
common part, 312, 342
190, 192–196
common randomness, 172
Gaussian channel with additive
communication complexity, 546
Gaussian state, 184–187, 195
complementary delivery, 291
Gaussian vector channel with additive
Gaussian vector state, 241 complementary slackness, 232, 642
memory with stuck-at faults, 178–180, compound channel, 169–172, 192
195 binary symmetric channel (BSC), 172
multiple access, 174, 175, 187, 188, capacity, 170, 172
191, 193, 599, see also multiple multiple access, 191
access channel Z, 170
noncausal state information available compound channel approach
at the encoder, 178–187 for Gaussian fading channel, 586, 587
no state information, 173, 193 for Gaussian fading MAC, 591
state information available at both the for random access channel, 605
encoder and the decoder, 173, compress–bin, 282–285
174, 177 distributed, 297–299
state information available at the compress–forward, 399, 402, 407, 481
decoder, 173 computable characterization, 71, 84, 92,
stationary ergodic state, 193 411, 450, 517, 538, see also
multiletter characterization, crossover probability, 40, see also binary
single-letter characterization symmetric channel
compute–compress, 547, 548 crosstalk, 131
concave function, 624 Csiszár sum identity, 25, 32, 122, 182, 198,
conditional covariance matrix, xxvi 226, 555
conditional entropy, 18, see also cumulative distribution function (cdf), xxvi
equivocation cut, 365
conditional expectation, xxvi cutset bound
conditional graph entropy, 532, 540 cascade coding for computing, 540
conditional source coding distributed lossless source–network
lossless, 271 coding, 506
lossy, 275 distributed lossy averaging, 543, 547
conditional typicality lemma, 27, 28, 30, 32, multicast network, 461
35, 36, 181, 209, 267, 278, 284, deterministic, 467
296, 298, 351, 402, 472 graphical, 365
conditional variance, xxvi multimessage network, 477
constant density model, 492 deterministic, 478
converse, 41, 55, 69 Gaussian, 485, 490
convex closure, 623 graphical, 374
convex cover method, 92, 115, 123, 145, multiple description network, 508
182, 198, 200, 268, 279, 285, 347, relay channel, 384
555, 631–633 causal, 414
convex function, 624 Gaussian, 395
convex hull, 623 noncausal, 413
convex optimization, 231, 234, 246, 248, RFD Gaussian, 406
256, 640–642 with orthogonal receiver
convex set, 623 components, 403, 405
convolutional code, 373 cut capacity, 365
cooperation, 477
coherent, 389, 416, 467 data processing inequality, 24, 25, 44, 48,
hierarchical, 599 51, 65, 67, 380
receiver, 106, 133, 384 decode–forward, 390, 396, 399, 406, 416,
sender, 384, 434 481, 488
correlated codewords, 325 causal, 415
correlated noise, 420 network, 462–466
correlated sources, 14, 258, 342, 519, 559, noncausal, 413, 416
574–576 noncoherent, 396
correlation, 280 partial, 396–399, 407, 469
coefficient, xxvii decoder, 39, 55
matrix, 21, 640 degraded, 112, 126, 391, 462
correlation elimination, 125 inconsistently, 236
cost constraint, 47 physically, 112
covariance matrix, xxvi reversely, 127, 236, 386
covering lemma, 62, 181, 267, 278, 284, 298, stochastically, 112
326, 333, 402, 472, 475, 510 degraded message sets
multivariate, 218, 326, 333, 351 broadcast channel (BC), 198
mutual, 208, 209, 221, 222 interference channel (IC), 196
crosscovariance matrix, xxvi multiple access channel (MAC), 129
degraded side information, 291 2-component DMS (2-DMS),
degraded source sets, 271 258–264, 271
delay, 160, 320, 393, 412, 584, 604, 607 code for, 259
dependence balance bound, 448, 449 k-DMS, 269
dependent messages, 103 optimal rate region, 259, 260, 269
determinant, xxviii over a BC, 345–351
de Bruijn’s identity, 22, 31 over a MAC, 336–344
differential entropy, 19 stationary ergodic sources, 270
chain rule, 21 with degraded source sets, 271
digital subscriber line (DSL), 49, 54, 131, distributed lossy averaging, 542–544, 547
138 distributed state information, 175
digital TV, 104 diversity coding, 227
digital watermarking, 185 DMC (discrete memoryless channel), see
directed information, 449, 455, 456 point-to-point channel
direct transmission, 386, 396, 399, 491, 493 DMN (discrete memoryless network), see
discrete algebraic Riccati equation, 442 unicast network, multicast
discrete memoryless, 39 network, multimessage network
discretization procedure DMS (discrete memoryless source), see
Gaussian channel, 51, 53, 95, 97, 119, lossless source coding, lossy
186, 211, 229, 233, 239 source coding
Gaussian source, 65, 304, 328 DoF, see symmetric degrees of freedom
distortion, 56, 57 dominated convergence theorem, 77, 626
distortion measure, 56 double-exponential decay, 430, 431
erasure, 61 doubly symmetric binary source (DSBS),
Hamming, 57 261, 265, 271, 276, 277, 282, 292,
quadratic, 64 312, 457
semideterministic, 324 downlink, 8, 9, 104, see also broadcast
side information dependent, 289 channel
squared error, 64 downshift, 153
unbounded, 61 DSBS, see doubly symmetric binary source
distributed lossy source coding, 294–300, DSL, see digital subscriber line
312, 317–319 duality, 51, 120, 274
2-component DMS (2-DMS), BC–MAC, 128, 230, 245, 246, 252, 286
294–300, 312 channel coding–source coding, 54,
Berger–Tung inner bound, 295 285, 286
Berger–Tung outer bound, 300 covering–packing, 62, 286, 296
code for, 295 Lagrange, 231, 247, 248, 286, 367, 368,
cooperative lower bound, 306 515, 641, 642
μ-sum lower bound, 307 linear coding–linear binning, 262
quadratic Gaussian, 300–308 multicoding–binning, 284, 286
rate–distortion region, 295 strong, 641, 642
with more than two sources, 313 successive cancellation–successive
with one distortion measure, 295, 315 refinement, 331
distributed consensus, 542 Dueck’s example, 443
distributed lossless source–network coding,
505–508 e-commerce, 14, 549
distributed lossless source coding, 258–264, eavesdropper, 550
271 edge-cut outer bound, 380
eigenvalue decomposition, xxviii fidelity, 56
electromagnetic reciprocity, 591 finite field, xxv, 262
electromagnetic wave propagation, 500 first-in first-out (FIFO) buffer, 174
El Gamal–Cover coding, 325–327, 330 Ford–Fulkerson algorithm, 377
empirical pmf, 172 forwarding, 380, 541
encoder, 39, 55 Fourier–Motzkin elimination, 145, 164,
randomized, 550 199, 201, 213, 219, 238, 326, 333,
energy-per-bit–rate function, 52, 74, see 351, 510, 511, 515, 556, 593
also minimum energy per bit frame, 274
entropy, 17 frequency division, 82, 85, 398, 406
chain rule, 18 functional representation lemma, 178, 205,
conditional, 18 334, 626, 627
differential, 19
graph, see graph entropy game, 7, 173, 532
joint, 18 Gaussian broadcast channel, see broadcast
entropy power inequality (EPI), 21, 22, 31, channel
32, 34, 35, 120, 239, 251, 305, Gaussian capacity function, xxvii, 50
310, 329, 425, 552 Gaussian channel, see point-to-point
EPI, see entropy power inequality channel
epigraph, 624 Gaussian interference channel, see
equivocation, 18, see also conditional interference channel
entropy Gaussian multiple access channel, see
erasure, 40, 289 multiple access channel
erasure distortion measure, 61 Gaussian network, see unicast network,
erasure probability, 40 multicast network, multimessage
ergodic capacity, 584 network
error-free communication, 155, 156, 158, Gaussian random variable, xxvii
339, 368–370, 378 Gaussian random vector, xxvii, 629
error-free compression, 70, 270, 513, 520 Gaussian relay channel, see relay channel
error exponent, 69, 72 Gaussian source, see lossy source coding
excess rate, 324 Gelfand–Pinsker coding, 180, 181, 183, 186,
expected value, xxvi 210, 241
Gelfand–Pinsker theorem, 180
fading, 583, 584, 588 generator matrix, 69
Fano’s inequality, 19, 31, 32, 44, 48, 56, 74, genie, 141, 143, 147, 149, 164, 449
84, 89, 113, 119, 122, 137, 147, gossip algorithm, 529
164, 166, 167, 171, 177, 182, 190, graph, 364, 373
225, 251, 267, 271, 365, 380, 451, augmented, 375
461, 555, 559, 561, 562, 567, 570, characteristic, 532
571, 613 cyclic, 468
fast fading, 584 graphical network, see unicast network,
FCC (Federal Communications multicast network, multimessage
Commission), 587 network
feedback, 45, 49, 67, 428, 559, 579, 591 graph entropy, 531
generalized, 477 conditional, 532, 540
one-sided, 438 Gray–Wyner system, 345–348, 357
Fenchel–Eggleston–Carathéodory theorem, Gupta–Kumar random network, see
103, 623, 631 random network
Gács–Körner–Witsenhausen common interference-as-noise inner bound,
information, 347 138
minimum-energy-per-bit region,
Hadamard’s inequality, 21, 34, 229, 490 162
half-bit theorem, 151 normalized symmetric capacity, 151
half duplex, 398 simultaneous-nonunique-decoding
Hamming distortion measure, 57 inner bound, 139
Han–Kobayashi coding, 143–146, 152, 163 symmetric, 139, 141, 151–157
handoff, 161 symmetric degrees of freedom
handset, 1, 93, 131, 363 (DoF), 151, 152, 158
hash–forward, 418 symmetric capacity, 151
hashing, 262 time-division inner bound, 138
helper, 264 with strong interference, 139, 140
hierarchical cooperation, 599 with very strong interference, 140
high distortion, 328 with weak interference, 141–143
Horstein coding, 431, 455 Z, 161
host image, 185 Han–Kobayashi inner bound, 143, 162
hybrid coding, 353, 355 injective deterministic, 145–148, 159,
hypergeometric random variable, 316 163
hypergraph, 379, 469, 512 injective semideterministic, 148–150
interference-as-noise inner bound, 133
image, 185, 274 modulo-2 sum, 133
image of a set, 220 outer bound, 134
independent set, 531 q-ary expansion deterministic
indirect decoding, 200–203, 218, 219, 556, (QED-IC), 153–157
577 semideterministic, 162
information capacity, 39, 41 simultaneous-decoding inner bound,
information hiding, 185 134
information leakage rate, 550 simultaneous-nonunique-decoding
INR, see interference-to-noise ratio inner bound, 135
instantaneous relaying, 414–416 sum-capacity, 132
integral domain, 372 time-division inner bound, 133
intercell interference, 9, 131 with more than two user pairs,
interference-to-noise ratio (INR), 138 157–159
interference alignment, 157, 596 with degraded message sets, 196
interference channel (IC), 131–167, 254, with orthogonal components, 133
595–597 with strong interference, 136, 137, 139,
binary expansion deterministic, 155 140
capacity region, 132 with very strong interference, 136, 140
code for, 132 Z, 161, 162
discrete memoryless (DM-IC), interference decoding, 160
132–137, 145–148, 160, 162, 163 interference relay channel, 483
Gaussian, 137–143, 148–157, 160–164, interior point method, 377
196, 254, 595–597 Internet, xvii, 1, 2, 5, 320
fading, 595–597 irrational basis, 160
genie-aided outer bound, 164 isotropic, 589
half-bit theorem, 151 iterated function system, 433
Han–Kobayashi inner bound, 163 iterative refinement, 428, 433
iterative water-filling, 234 LDPC, see low density parity check code
less noisy, 121, 125, 126, 128, see also
jammer, 168 broadcast channel
Jensen’s inequality, 17, 18, 21, 65, 305, 329, linear binning, 262, 264, 536
625 linear code
joint entropy, 18 BSC, 43
joint source–channel coding, 66–68, computing, 536, 545
336–359, 545
distributed lossless source coding, 264
code for, 66
graphical network, 370
lossless communication of a 2-DMS
lossless source coding, 262
over a DM-BC, 345–352
lossy source coding, 208
lossless communication of a 2-DMS
QED-IC, 155
over a DM-MAC, 336–344, 351
lossy communication of a 2-DMS over linear estimation, 628
a DM-BC, 352 linear network coding, 369–374, 376–378,
lossy communication of a 2-DMS over 482, 507
a DM-MAC, 352, 353, 355 linear program (LP), 367, 376, 377, 380,
lossy communication of a Gaussian 515, 640, 641
source over a Gaussian BC, 357 dual, 641
single-hop network, 351–355 linear relaying, 407
stationary ergodic source, 67 list code, 74, 102
with feedback, 67 LLN, see law of large numbers
joint typicality codebook generation, 207 local area network (LAN), 81, 604
joint typicality decoding, 42, 171, 180, 207, local computing, 540, 541
213, 219, 297, 354, 437, 470, 479, lookahead, 411, 412
617 lossless source coding, 54–56, 61, 62, 71,
joint typicality encoding, 59, 180, 190, 266, 271, 279–281
277, 282, 297, 322, 353 as a special case of lossy source
joint typicality lemma, 29, 30, 35, 43, 47, 60, coding, 61, 62, 279–281
63, 224, 359, 402, 419, 464, 466, Bernoulli source, 55
472, 475, 612, 618 code for, 55
conditional, 271
Karush–Kuhn–Tucker (KKT) condition, discrete memoryless source (DMS),
232, 248, 250, 251, 411, 642 54–56, 61, 62, 271, 279–281
key leakage rate, 560
distributed, see distributed lossless
Kullback–Leibler divergence, see relative
source coding
entropy
optimal rate, 55
Lagrange duality, 231, 247, 248, 286, 367, stationary ergodic source, 71
368, 515, 641, 642 with a helper, 264–269, 271
Lagrange multiplier, 53, 585 with causal side information, 279, 280
Lagrangian, 53, 232, 248 with noncausal side information, 281
LAN, see local area network lossless source coding theorem, 55
lattice code, 69, 192 lossy source coding with side information,
law of conditional variances, 628 274–293
law of large numbers (LLN), 626 2-component DMS (2-DMS), 274–290
law of total expectation, 60, 284, 326 causal side information available at the
layered coding, 107, see also superposition decoder, 275–280, 286, 287
coding coded side information, 290
different side information available at matroid theory, 378
several decoders, 288, 291 Maurer’s example, 572–574
doubly symmetric binary source max-flow min-cut theorem, 366
(DSBS), 276, 282 maximal coding theorem, 69
from a noisy observation, 289 maximal independence, 531
noncausal side information available maximal posterior interval decoding, 432,
at the decoder, 280–288 433
no side information, 275 maximal probability of error, 43, 73, 89, 100,
quadratic Gaussian, 281, 289, 290, 292 125, 172, 173, 222, 223, 377, 381
side information available at both the maximum differential entropy lemma, 50,
encoder and the decoder, 275 141, 165, 229
side information available at the maximum flow, 365
encoder, 275 maximum likelihood decoding, 72, 191
side information dependent distortion maximum mutual information decoding,
measure, 289 191
when side information may be absent, medium access, 14, 81, 604
286–288, 290 memory
lossy source coding, 56–65, 71 ternary-input, 194
Bernoulli source and Hamming with stuck-at faults, 168, 178–180, 195
distortion measure, 57 write-once (WOM), 168, 178
code for, 57 memoryless property, 38, 39, 45, 59, 90, 384,
discrete memoryless source (DMS), 411, 435, 472, 608
56–62 mesh network, 1, 10, 363, 382
distributed, see distributed lossy message interval, 428
source coding message point, 428, 431, 433
quadratic Gaussian, 64, 65 MGL, see Mrs. Gerber’s lemma
rate–distortion function, 57 mild asynchrony, 608, 617
stationary ergodic source, 71 minimum energy per bit, 51, 74, 102, 127,
with two reconstructions, 322 162, see also energy-per-bit–rate
lossy source coding theorem, 57 function
low density parity check (LDPC) code, 69 minimum mean squared error (MMSE)
low distortion, 328 estimation, xxvi, 76, 186, 195,
LP, see linear program 241, 301, 310, 319, 429, 627, 630
Minkowski sum, 127, 236, 238
MAC, see multiple access channel mobile agent, 529
Markov chain, xxvii modulo-2 sum, xxv, 99, 133, 369, 507, 536,
Markov inequality, 43, 100, 602 544, 558
Markov lemma, 296, 299, 475 more capable, 121, 125, 126, 136, see also
Marton coding, 207–213, 217, 218, 243, 577 broadcast channel
matrix, xxviii movie, 5, 320
channel gain, 228, 485 Mrs. Gerber’s lemma (MGL), 19, 22, 31, 33,
conditional covariance, xxvi 115, 116, 120, 266, 312, 406
correlation, 21, 640 μ-sum problem, 535
covariance, xxvi multicast completion, 479
crosscovariance, xxvi multicast network
generator, 69 capacity, 460
parity-check, 262, 536 code for, 460
matrix inversion lemma, 303, 318, 319, 629 cutset bound, 461
degraded, 462 multiple-unicast, 375, 376, 380
deterministic, 467–469 routing capacity region, 380
finite-field, 469 routing code for, 380
layered, 482 multicast, 478, 479, 483
nonlayered, 482 multicast-completion inner bound,
with no interference, 468 479
discrete memoryless (DM-MN), 459 noisy network coding inner bound,
graphical, 364–366, 368–373 478
butterfly, 368 simultaneous-nonunique-decoding
capacity, 365, 369 inner bound, 479, 480
code for, 364 wireless erasure, 478
cutset bound, 365 multipath, 227, 588
linear code for, 370 multiple-input multiple-output (MIMO),
with cycles, 369, 373, 377, 468 227, 588
with delays, 369 multiple access channel (MAC), 81–103,
hypergraphical, 379 129, 130, 254, 434–442, 590–593,
network decode–forward lower bound, 599, 604, 607–614, 617
462 asynchronous, 607–614, 617
noisy network coding lower bound, binary erasure, 83, 86, 436
467 binary multiplier, 82, 99
wireless erasure, 469, 470 capacity region, 82, 86, 92, 98
multicoding, 179, 180, 207, 284, 553 multiletter characterization, 84, 101
multicommodity flow, 375 code for, 82
multigraph, 369–371 compound, 191
multihop relaying, 387–390, 396, 491, 493 cooperative capacity, 101
coherent, 389–390 discrete memoryless (DM-MAC),
multiletter characterization, 71, 84, 101, 81–93, 98–103, 129, 130,
408, 411, 450 434–438, 599
multimedia, 320, 508 Gaussian, 93–98, 101, 102, 128, 130,
multimessage network, 477–480 188, 254, 438, 454, 590–593, 599
capacity region, 477 achievable SNR region, 102
code for, 477 asynchronous, 609
cutset bound, 477 capacity region, 94, 98
deterministic, 478 duality to Gaussian BC, 128
finite-field, 478 fading, 590–593, 599
with no interference, 478 minimum-energy-per-bit region,
discrete memoryless (DMN), 477 102
Gaussian, 485–490 time-division inner bound, 96, 254
cutset bound, 485 vector (GV-MAC), 232–234
multicast, 486, 490 with additive Gaussian state, 187,
multiple-unicast, 492 188
graphical, 373–377, 478 with feedback, 438–442, 454
2-unicast, 376, 377 lossless communication of a 2-DMS,
butterfly, 376 336–344
capacity region, 374 lossless computing, 544, 545
code for, 373 modulo-2 sum, 99, 604
cutset bound, 374 outer bound, 83
multicast, 374, 375 push-to-talk, 99
sum-capacity, 82, 92 quadratic Gaussian, 511, 523
time-division inner bound, 83 rate–distortion region, 508
with common message, 129 triangular, 510–512, 524
with degraded message sets, 129 with side information, 522–524
with dependent messages, 103 multiprocessors, 529
with more than two senders, 98 multivariate covering lemma, 218, 326, 333,
with a helper, 130 351
with cribbing encoders, 418 music, 198
with feedback, 434–442, 450, 454 μ-sum problem, 534
cooperative outer bound, 434, 439, mutual covering lemma, 208, 209, 221, 222
449 mutual information, 22, 23, 347
Cover–Leung inner bound, 435, 438 chain rule, 24, 25
dependence balance bound, 449 conditional, 24
with generalized feedback, 477 mutual packing lemma, 297, 299, 314
with input costs, 101
with known delays, 617 nearest neighbor decoding, 429
with list codes, 102 nested sources, 348, 357
with more than two senders, 130, 234 network coding, 369
with orthogonal components, 338 linear, 369–374, 376–378, 507
with random arrivals, 603 network coding theorem, 369
with state, 174, 175, 187, 188, 191, 193, noisy network coding, 369, 466–476,
599 478–480, 486, 488, 489
multiple description coding, 320–335 layered, 489
Bernoulli source and Hamming noisy state information, 196
distortion measures, 324, 331, non-Gaussian additive noise, 442
332, 334 noncausal state information, 178
code for, 321 nonconvex optimization, 93
discrete memoryless source (DMS), nonuniform message, 73, 338
321–326, 331–334 NP-complete, 378
El Gamal–Cover inner bound, 324 number theory, 160
more than two descriptions, 335
outer bound, 323 OFDM, see orthogonal frequency division
quadratic Gaussian, 327–331 multiplexing
rate–distortion region, 321 one-helper problem, 265
with semideterministic distortion one-time pad, 558
measure, 324 online banking, 549
with combined description only, 323 operational capacity, 41
with no combined reconstruction, 322 opportunistic coding, 599
with no excess rate, 324 optimal rate, 55
Zhang–Berger inner bound, 332, 333 optimization, 117
multiple description network coding, convex, 231, 234, 246, 248, 256,
508–512, 521–524 640–642
branching, 521 nonconvex, 93, 410
cascade, 509, 510, 522, 523 order notation, xxviii
code for, 508 orthogonality principle, 187, 310, 429, 628
cutset bound, 508 orthogonal frequency division multiplexing
diamond, 521 (OFDM), 54
dual-cascade, 521 outage capacity, 586, 599
packet, 469 polar code, 69
packet radio, 418 polyhedron, 636
packing lemma, 46, 72, 88, 89, 93, 110, 111, polymatroid, 98
135, 145, 171, 172, 174, 176, 181, polynomial, 372, 381
201, 205, 209, 222, 284, 388, 390, polytope, 367, 636
393, 395, 402, 438, 464, 554, 573 posterior, 431
mutual, 297, 299, 314 posterior matching, 433
parity-check matrix, 262, 536 power-to-distortion ratio, 64
path diversity, 320 power constraint, 49, 94, 117, 228, 232, 235
path loss, 12, 49, 490, 492, 496, 498, see also almost-sure, 431, 552
channel gain expected, 76, 184, 428, 584
pdf, see probability density function sum, 245
peer-to-peer, 529, 542 power control, 96, 97, 102, 138, 139, 161,
pentagon, 86 587, 593
perfect secrecy, 559 power grid, 2
perturbation method, 213, 634, 635 power law, 490, 492
pipe, 368 power spectral density (psd), 70
pmf, see probability mass function preamble, 609
point-to-point channel, 38–54, 583–589 press conference, 81
asynchronous, 609 probability density function (pdf), xxvi
binary erasure (BEC), 40 probability mass function (pmf), xxvi
binary symmetric (BSC), 40 causally conditional, 449, 456
capacity, 39 conditional, xxvi
code for, 39 induced by feedback, 435
discrete memoryless (DMC), 38–48 induced by messages, 89, 110, 145, 201
energy-per-bit–rate function, 52 joint, xxvi
Gaussian, 49–54, 428–431, 583–589 protocol model, 499
capacity, 50 psd, see power spectral density
fading, 583–589 push-to-talk, 99
minimum energy-per-bit, 52
Q-function, xxvii, 430
product, 52–54, 230
quadratic distortion measure, 64, 75
spectral, 70
quadratic Gaussian rate function, xxvii, 64
vector, 227–232, 588, 589
quantization, 56
vector fading, 588, 589 queue, 601, 602
with additive Gaussian state,
184–187, 195 Radon–Nikodym derivative, 23
with feedback, 49 randomized encoding, 73, 89, 172, 173, 283,
product, 41, 52–54 289, 551, 553, 557, 558, 568, 576
secure communication over, 558, 559 randomized time sharing, 216
with feedback, 45, 428–434, 460 random access channel, 604–606
with input cost, 47 random binning, 261–264
with memory, 71 random coding, 42, 59, 61, 62, 377, 378
with random data arrival, 601–604 random coding exponent, 69
with state, 169–187, 192–196, see also random data arrival, 601–604
arbitrarily varying channel, random network, 492–499, 598
channel with random state, with fading, 598
compound channel random process, xxvii
Z, 71 random variable, xxv
random vector, xxvi reliability function, 69, 72
rank, xxviii ring, 372
rate–distortion function, 57 robust typicality, 32
rate splitting, 85, 144, 152, 164, 174, 198, routing, xix, 5, 366–368, 372, 375–377, 380
200, 218, 237, 323, 325, 441
Rayleigh fading, 589 saddle point, 173
RC, see relay channel satellite, 382, 549
reciprocity lemma, 231 satellite codebook, 555, 556, 577
reconstruction, 54, 56 satellite codeword, 107, 109, 111, 144, 199,
relative entropy (Kullback–Leibler 200
divergence), 23, 33 scalar quantization, 56
relay broadcast channel, 482 Schalkwijk–Kailath coding, 428
relay channel (RC), 382–426, 444, 445, 460, secret key agreement, 559–575, 578
462–464, 470–473 channel model, 572–575
broadcast bound, 384 code for, 560
capacity, 384 lower bound, 564
cascade, 387 multiple nodes, 569–572
causal, 414–416 one-way, 560–564
code for, 384 q-round, 564–572
coherent multihop lower bound, 389 secret key capacity, 560
compress–forward lower bound, 399 source model, 559–572, 578
cutset bound, 384 upper bound, 566
decode–forward lower bound, 390 without communication, 578
degraded, 391 with helpers, 572
direct transmission lower bound, 386 with rate constraint, 564
discrete memoryless (DM-RC), 383 semideterministic distortion measure, 324
Gaussian, 395–396, 399, 402, 414–416 sensor, 1, 6, 258, 308, 363
receiver frequency-division (RFD), sensor network, 6, 7, 258, 529, 542
406–411 separate source and channel coding, 66,
sender frequency-division (SFD), 337, 338, 343–345, 348, 353, 545
398 sequential decoding, 89, 481
interference, 483 Shannon capacity, 584
lookahead, 411–416 Shannon lower bound, 75
modulo-2 sum, 405 Shannon–McMillan–Breiman theorem, 32
multihop lower bound, 387 Shannon strategy, 176, 177, 414
multiple access bound, 384 shift, 153
noncausal, 412–414 side information, 169, 264, 274, 411, 530
partial decode–forward lower bound, causal, 275
396 coded, 264, 290
reversely degraded, 386 degraded, 291
Sato, 391, 413, 415 noncausal, 280
semideterministic, 397 signal-to-interference ratio model, 499
two-way, 483 signal-to-noise ratio (SNR), 49, 94, 98, 102,
with feedback, 444, 445 118, 138
with orthogonal receiver components, simplex method, 377
403 simultaneous decoding, 88, 89, 111, 133,
with orthogonal sender components, 136, 152, 238
398 simultaneous nonunique decoding, 110,
135, 136, 139, 144, 160, 161, 479, sum-capacity, 82
483 super-symbol, 84
simultaneous transmission, 496 supernode, 543
single-letter characterization, 84, 86, 101, superposition coding, 107–111, 119, 121,
269, 408 123, 144, 198–200, 214, 218, 236,
singular value decomposition, xxviii, 237, 349, 436, 441, 480, 557, 586
229–231 supporting hyperplane, 252, 623
Slater’s condition, 232, 248, 368, 642 supporting line, 116, 246, 247, 256
Slepian–Wolf coding, 263, 264, 338, 507, support lemma, 631, 632
539, 562 symmetric degrees of freedom (DoF), 151
Slepian–Wolf theorem, 260 symmetric network capacity, 490, 493
sliding window decoding, 462–466 symmetric square root, xxviii
slow fading, 584 symmetrization argument, 116, 117, 195,
smart genie, 141, 142 271, 272
SNR, see signal-to-noise ratio
source–channel separation, 66, 67, 338 target tracking, 275, 290
source–channel separation theorem, 66 terrain, 275, 290
spatial multiplexing, 227 terrestrial link, 382
spatial reuse of time/frequency, 496 test channel
squared error distortion measure, 64 distributed Gaussian, 301–303, 310
stability, 602 Gaussian, 487
in the mean, 618 time division, 82, 85, 96, 99, 106, 107, 133,
stability region, 603 152, 493, 596
state information, 173 cellular, 493–496
causal, 175 with power control, 96, 97, 102, 138,
coded, 189 254, 493
distributed, 175 time expansion, 377, 482
noisy, 196 time sharing, 85, 88, 89, 92, 97, 99, 100, 102,
noncausal, 178 128, 134, 136, 139, 161, 188, 243,
strictly causal, 193 259, 265, 304, 325, 334, 340, 403,
value of, 194 407, 408, 487
Steiner tree, 378 argument, 85
storage, 2, 38 coded, 93, 134, 400, 467
strictly causal state information, 193 on the state sequence, 174, 190
strong converse, 69, 70, 75, 99, 125, 220, 271 randomized, 216
strong duality, 641, 642 random variable, 91, 92, 114, 123, 147,
strong interference, 136, 139, 158, 163, 596 164, 167, 171, 226, 268, 344, 461,
strong typicality, 32 477, 486, 512, 519, 555, 561, 613,
structured code, 69, 160 633
sub-bin, 562 sequence, 93, 174
subblock, 96, 609 torus, 501, 502
subcodebook, 179, 181, 207, 284, 553 total asynchrony, 608
subspace, 157, 160 trace, xxviii
successive cancellation decoding, 87–89, 97, training sequence, 171, 591
98, 102, 107, 119, 128, 136, 139, trajectory, 275
160, 188, 255, 441 transmitter-referred noise, 117
successive refinability, 331, 332 transportation, 2, 368
successive refinement, 320, 330–332, 511 treating interference as noise, 95, 97, 133,
138, 141, 142, 152, 156–158, 161, waveform, 52, 70, 236
480 weak convergence, 77
TWC, see two-way channel weak converse, 69
two-sender three-receiver channel, 606, 616 weak interference, 141, 158
two-sender two-receiver channel, 102 weak typicality, 32
two-way channel (TWC), 445–453 white Gaussian noise (WGN) process, xxvii
binary multiplier channel (BMC), 447, 2-component (2-WGN), xxvii
448, 450, 454 white Gaussian noise (WGN) source, 64
capacity region, 446 wideband, 70
discrete memoryless (DM-TWC), wireless erasure network, 469, 470
445–453 wiretap channel (WTC), 550–556, 572–575,
Gaussian, 457 578, 579
with common output, 448 binary symmetric (BS-WTC), 551,
two-way lossless source coding, 513 572–574
with a relay, 507 BSC–BEC, 578
two-way lossy source coding, 515–519 code for, 550
with a relay, 525, 547 degraded, 551, 579
two-way relay channel, 483, 487–489, 507, discrete memoryless (DM-WTC),
547 550–556, 572–574, 578, 579
Gaussian, 487–489, 499 Gaussian, 552
noiseless, 507, 525, 547 Gaussian vector, 553
typicality, 25, 32 more capable, 551
typical average lemma, 26, 32, 48, 60, 278, rate–leakage region, 551
284, 299, 326, 510 secrecy capacity, 551
virtual, 572, 575
uncertainty, 449 with feedback, 559, 579
uncoded transmission, 68, 339 with more than two receivers, 556,
unicast network 576, 577
discrete memoryless, 460 with secret key, 559
Gaussian, 490 worst noise, 165
graphical, 366–368 write-once memory (WOM), 168, 178
uniform random variable, xxvii writing on dirty paper, 184–189, 195, 211,
union of events bound, 625 243, 252
uplink, 9, 81, 93, see also multiple access vector, 241, 242, 244
channel Wyner–Ziv coding, 282–285, 400, 481, 516,
upshift, 153 517, 530, 537, 540, 542, 564
useful genie, 141, 142 Wyner–Ziv theorem, 280
Wyner’s common information, 347
variable-length code, 70, 270
variance, xxvi Young’s inequality, 22, 31
vector quantization, 56
very strong interference, 136, 140 Zhang–Berger coding, 332
video, 1, 198, 274 Z channel, 71
compound, 170
water-filling, 53, 54, 70, 74, 229, 231, 232,
234, 410, 586, 587
iterative, 234
over time, 586
watermark, 185, 186
