Incremental Redundancy,
Fountain Codes and Advanced Topics
Suayb S. Arslan
141 Innovation Dr. Irvine, CA 92617
[email protected]
Revision History:
• Jan. 2014, First version is released.
CONTENTS

1 Abstract
2 Introduction
  2.1 Notation
  2.2 Channel Model and Linear Codes
  2.3 Incremental Redundancy
5 Advanced Topics
  5.1 Connections to the random graph theory
  5.2 Inactivation Decoding
  5.3 Standardized Fountain Codes for Data Streaming
  5.4 Fountain Codes for Data Storage
1 ABSTRACT
The idea of writing this technical document dates back to my time at Quantum Corporation, when I was studying efficient coding strategies for cloud storage applications. Having made a thorough review of the literature, I decided to jot down a few notes for future reference. Later, these tech notes turned into this document, with the hope of establishing a common ground on which the majority of the relevant research can easily be analyzed and compared. To the best of my knowledge, there is no unified approach that outlines and compares most of the published literature about fountain codes in a single, self-contained framework. I believe that this document presents a comprehensive review of the theoretical fundamentals of efficient coding techniques for incremental redundancy, with a special emphasis on “fountain coding" and related applications. Writing this document also gave me a shorthand reference. Hopefully, it will be a useful resource for many other graduate students who might be interested in pursuing a research career on graph codes, fountain codes in particular, and their interesting applications. As for the prerequisites, this document may require some background in information, coding, graph and probability theory, although the relevant essentials are recalled for the reader on a periodic basis.
Although various aspects of this topic and much other relevant research are deliberately left out, I still hope that this document will serve researchers' needs well. I have also included several warm-up exercises. The presentation style is usually informal, and the presented material is not necessarily rigorous. Many parts of the text are products of joint work with my coauthors, some of which has not been published yet. Last but not least, I cannot thank Quantum Corporation enough for providing me and my colleagues with “the appropriate playground" for research, leaping us forward in knowledge. I cordially welcome any comments, corrections or suggestions.
January 7, 2014
2 INTRODUCTION
Although the problem of transferring information meaningfully is as old as humankind, the discovery of its underlying mathematical principles dates back only to 1948, when Claude E. Shannon introduced the formal description of information [1]. Since then, numerous efforts have been made to achieve the limits set forth by Shannon. In his original description of a typical communication scenario, there are two parties involved: the sender or transmitter of the information and the receiver. In one application, the sender could be writing information on a magnetic medium and the receiver will be reading it out later. In another, the sender could be transmitting the information to the receiver over a physical medium such as twisted wire or air. Either way, the receiver shall receive a corrupted version of what is transmitted. The concept of Error Correction Coding is introduced to protect information against channel errors. For bandwidth efficiency and increased reconstruction capabilities, incremental redundancy schemes have found widespread use in various communication protocols.
In this document, you will find some of the recent developments in “incremental redundancy" with a special emphasis on fountain coding techniques, viewed from the perspective of what was conventional to what is the trend now. This paradigm shift, as well as the theoretical and practical aspects of designing and analyzing modern fountain codes, shall be discussed. Although a few introductory papers have been published in the past, such as [2], this subject has broadened its influence and expanded its applications so widely in the last decade that it has become impossible to cover all of the details. Hopefully, this document shall cover most of the recent advancements related to the topic and enable the reader to think about “what is the next step now?" type of questions.
The document considers fountain codes to be used over erasure channels, although the idea is general and can be used over error-prone wireline/wireless channels with soft inputs/outputs. Indeed, an erasure channel model is more appropriate in a context where fountain codes are frequently used at the application layer with limited access to the physical layer. We also note that this version of the document focuses on linear fountain codes, although recent progress has been made in the area of non-linear fountain codes and their applications, such as Spinal codes [3]. The non-linear class of fountain codes comes with interesting properties due to its construction, such as a polynomial-time bubble decoder and the generation of coded symbols in one encoding step.
What follows is the set of notation we use throughout the document and the definition of the Binary Erasure Channel (BEC) model.
2.1 NOTATION
Let us introduce the notation we use throughout the document.
• Pr{A} denotes the probability of event A and Pr{A|B} is the conditional probability of event A given event B.
• Matrices are denoted by bold capitals (X) whereas the vectors are denoted by bold lower case
letters (x).
• F_q is a field of q elements. Also, F_q^k is the vector space of dimension k whose entries belong to F_q.
• f′(x) and f″(x) are the first and second order derivatives of the continuous function f(x). More generally, we let f^(j)(x) denote the j-th order derivative of f(x).
• Let f(x) and g(x) be two functions defined over some real support. Then, f(x) = O(g(x)) if and only if there exist C ∈ R+ and a real number x_0 ∈ R such that |f(x)| ≤ C|g(x)| for all x > x_0.
• coef(f(x), x^j) is the j-th coefficient f_j of a power series f(x) = ∑_j f_j x^j.
Since graph codes are an essential part of our discussion, some graph-theoretic terminology will be helpful.
• A graph G consists of the tuple (V, E ) i.e., the set of vertices (nodes) V and edges E .
• A neighbor set (or neighbors) of a node v is the set of vertices adjacent to v, i.e., {u ∈ V : u ≠ v, (u, v) ∈ E}.
• A path in graph G is a sequence of nodes in which each pair of consecutive nodes is connected
by an edge.
• A cycle is a special path in which the start and the end node is the same node.
• A graph G with k nodes is connected if there is a path between every one of the \binom{k}{2} pairs of nodes.
Figure 2.1: The binary erasure channel (BEC). Each input bit X ∈ {0, 1} is received correctly with probability 1 − ϵ_0 and is mapped to the erasure symbol e with probability ϵ_0.
In a typical packet network scenario, the source data is partitioned into k packets, which later are encoded into a set of n packets and transported as independent units through various links. A packet either reaches the destination without any interruption or is lost permanently, i.e., the information is never corrupted. Moreover, the original order of packets may or may not be preserved due to random delays. The destination reconstructs the original k packets if a sufficient number of encoded packets is reliably received.
The BEC channel model is parameterized by the erasure probability ϵ_0 and is typically represented as shown in Fig. 2.1. Let a binary random variable X ∈ {0, 1} represent the channel input transmitted over the BEC. The output we receive is another random variable Y ∈ {0, 1, e}, where e represents an erasure. The mathematical model of this channel is nothing but a set of conditional probabilities given by Pr{Y = X | X} = 1 − ϵ_0, Pr{Y = 1 − X | X} = 0 and Pr{Y = e | X} = ϵ_0. Since erasures occur independently at each channel use, this channel is called memoryless. The capacity (the maximum rate of transmission that allows reliable communication) of the BEC is 1 − ϵ_0 bits per channel use. A practical (n, k) block code of rate r = k/n may satisfy r ≤ 1 − ϵ_0 while at the same time providing adequately reliable communication over the BEC. On the other hand, Shannon's channel coding theorem [1] states that there is no code with rate r > 1 − ϵ_0 that can provide reliable communication. Optimal codes are the ones that have rate r = 1 − ϵ_0 and provide zero data reconstruction failure probability; this class of codes is called capacity-achieving over the BEC.
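As a quick illustration, a minimal simulation sketch of the BEC is given below (the function name and parameters are illustrative only):
———————————
import random

def bec(bit, eps0, rng=random):
    """Transmit one bit over a BEC(eps0): return the bit itself or 'e'."""
    return 'e' if rng.random() < eps0 else bit

eps0, n_uses = 0.3, 100_000
erased = sum(bec(random.getrandbits(1), eps0) == 'e' for _ in range(n_uses))
print(f"empirical erasure rate: {erased / n_uses:.3f} (target {eps0})")
print(f"capacity of this BEC: {1 - eps0:.2f} bits per channel use")
———————————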
For an (n, k) block code, let ρ_j be the probability of correcting j erasures. Based on one of the basic bounds of coding theory (the Singleton bound), we know that ρ_j = 0 for j > n − k. The class of codes that have ρ_j = 1 for j ≤ n − k are called Maximum Distance Separable (MDS) codes. This implies that any pattern of n − k erasures can be corrected, and hence these codes achieve the capacity of a BEC with erasure probability (n − k)/n = 1 − k/n for large block lengths. The details can be found in any elementary coding book.
Figure 2.2: Codeword symbols already sent at times t_0 through t_4 in an incremental-redundancy transmission.
In other words, a rate-compatible family of codes has the property that codewords of the higher
rate codes in the family are prefixes of those of the lower rate ones. Moreover, a perfect family
of such codes is the one in which each element of the family is capacity achieving. For a given
channel model, design of a perfect family, if it is ever possible, is of tremendous interest to the
research community.
In a typical rate-compatible transmission, punctured symbols are treated as erasures by some form of labeling at the decoder, and decoding is initiated afterwards. If a decoding failure occurs, the transmitter sends the punctured symbols one at a time until the decoding process is successful. If all the punctured symbols are sent and the decoder is still unable to decode the information symbols, then a retransmission is initiated through an Automatic Repeat reQuest (ARQ) mechanism. Clearly, this transmission scenario is by its nature “rateless", and it provides the desired incremental redundancy for the reliable transmission of data. An example is shown in Fig. 2.2 for a rate-1/3 block code encoding a message block of four symbols. The four message symbols are encoded to produce twelve coded symbols. The encoder sends off six coded symbols (at time t_0) through the erasure channel according to a predetermined puncturing pattern. If the decoder is successful, it means a rate-2/3 code was effectively used in the transmission (because of puncturing), and it transferred the message symbols reliably. If the decoder is unable to decode the message symbols, perhaps because the channel introduces a lot of erasures, a feedback message is generated and sent. Upon reception of the feedback, the encoder is triggered to send two more coded symbols (at time t_1) to help the decoding process. This interaction continues until either the decoder sends a success flag or the encoder runs out of coded symbols (at time t_4), in which case an ARQ mechanism must be initiated for successful decoding. For this example, any generic code with rate 1/3 should work, as long as the puncturing patterns are appropriately designed for the best
performance. Our final note is that the rate of the base (unpunctured) code is determined before transmission; whenever the channel is bad, the code performance may fall apart and an ARQ mechanism is inevitable for a reliable transfer.
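The feedback-driven interaction above is easy to prototype. The sketch below is an idealized illustration assuming an MDS-like code whose decoder succeeds as soon as any k coded symbols arrive unerased; all names and parameters are illustrative:
———————————
import random

def transmit(symbols, eps0, rng=random):
    """Send symbols over a BEC; return only those that survive."""
    return [s for s in symbols if rng.random() >= eps0]

def incremental_send(k=4, n=12, first_burst=6, step=2, eps0=0.4):
    codeword = list(range(n))                 # placeholder coded symbols
    received = transmit(codeword[:first_burst], eps0)
    sent = first_burst
    while len(received) < k and sent < n:     # feedback round: send more
        extra = codeword[sent:sent + step]
        received += transmit(extra, eps0)
        sent += len(extra)
    if len(received) >= k:
        return f"decoded after {sent} symbols sent (rate {k/sent:.2f})"
    return "all n symbols sent; falling back to ARQ"

random.seed(1)
print(incremental_send())
———————————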
This rate-compatible approach is well explored in the literature and has been applied successfully to Reed-Solomon codes, convolutional codes and turbo codes. Numerous efforts have been made toward well-performing Rate-Compatible Reed-Solomon (RCRS) codes, Rate-Compatible Punctured Convolutional (RCPC) codes [4], Rate-Compatible Turbo Codes (RCTC) [5] and Rate-Compatible LDPC codes [6] for various applications. However, one problem with those constructions is that a good set of puncturing patterns allows only a limited set of code rates, particularly for convolutional and turbo codes. Secondly, the decoding process is almost always complex even if the channel is in a good state, because the decoder always decodes the same low-rate code. In addition, puncturing a sub-optimal low-rate code may produce a high-rate code with very poor performance. Thus, the low-rate base code, as well as the puncturing table used to generate the other rates, has to be designed very carefully [7]. This usually complicates the design of a rate-compatible block code. In fact, from a purely information-theoretic perspective, the problem of rateless transmission is well understood: for channels possessing a single maximizing input distribution, a randomly generated linear code drawn from that distribution performs well with high probability. However, the construction of such good codes with computationally efficient encoders and decoders is not so straightforward. In order to free the design framework from these impracticalities of developing rate-compatible, or in general rateless, codes, we need a completely different paradigm for constructing codes with rateless properties.
In the next section, you will be introduced to a class of codes belonging to the “near-perfect code family for erasure channels" called fountain codes¹. The following chapters shall explore the theoretical principles as well as the practical applicability of such codes to real-life transmission scenarios. The construction details of such codes have a very interesting, deep-rooted relationship to graph theory, which will be covered as an advanced topic later in the document.
¹ The approach used to transmit such codes is called Digital Fountain (DF), since the transmitter can be viewed as a fountain emitting coded symbols until all the interested receivers (the sinks) have received the number of symbols required for successful decoding.
The LT encoder generates the output coded symbols to be sent over the erasure channel.
Without loss of generality, we will focus on the binary alphabet, as the methods can be applied to larger alphabet sizes. An LT encoder takes a set of k information symbols and generates coded symbols over the same alphabet. Let a binary information block x^T = (x_1, x_2, …, x_k) ∈ F_2^k consist of k bits. The m-th coded symbol (check node or check symbol) y_m is generated in the following way. First, the degree of y_m, denoted d_m, is chosen according to a suitable degree distribution Ω(x) = ∑_{ℓ=1}^k Ω_ℓ x^ℓ, where Ω_ℓ is the probability of choosing degree ℓ ∈ {1, …, k}. Then, after choosing the degree d_m ∈ {1, …, k}, a d_m-element subset of x is chosen randomly according to a suitable selection distribution. For standard LT coding [9], the selection distribution is the uniform distribution. This corresponds to generating a random column vector w_m of length k in which weight(w_m) = d_m positions, selected uniformly at random without replacement, are set to logical 1 (or to any non-zero element of F_q for non-binary coding). More specifically, this means that any possible binary vector of weight d_m is selected with probability 1/\binom{k}{d_m}. Finally, the coded symbol is given by y_m = w_m^T x (mod 2) for m = 1, 2, …, N. Note that all these operations are performed modulo 2. Some of the coded symbols are erased by the channel, and for decoding purposes we concern ourselves only with those n ≤ N coded symbols which arrive unerased at the decoder. Hence the subscript m on y_m, d_m and w_m runs only from 1 to n, and we ignore at the decoder those quantities associated with erased symbols. If the fountain code is defined over F_q^k, then all the encoding operations follow the rules of the field algebra.
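A minimal sketch of one LT encoding step over F_2 is given below; the toy degree distribution omega is illustrative and not from the text:
———————————
import random

def lt_encode_symbol(x, omega, rng):
    """Generate one coded symbol y_m = w_m^T x (mod 2) and its neighbor set."""
    k = len(x)
    degrees, probs = zip(*sorted(omega.items()))
    d = rng.choices(degrees, weights=probs)[0]     # draw d_m ~ Omega
    neighbors = rng.sample(range(k), d)            # uniform, no replacement
    value = 0
    for i in neighbors:                            # XOR the chosen message bits
        value ^= x[i]
    return value, frozenset(neighbors)

rng = random.Random(42)
x = [rng.getrandbits(1) for _ in range(10)]        # k = 10 message bits
omega = {1: 0.1, 2: 0.5, 3: 0.4}                   # toy degree distribution
coded = [lt_encode_symbol(x, omega, rng) for _ in range(15)]
print(coded[:3])
———————————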
From the previous description, we realize that the encoding process is done by generating binary vectors. The generator matrix of the code is hence a k × n binary matrix with the w_m's as its column vectors, i.e.,

G = ( w_1 | w_2 | w_3 | ⋯ | w_{n−1} | w_n )_{k×n}
The decodability of the code is, in a general sense, tied to the invertibility of the generator matrix. As will be explored later, this type of decoding is optimal but complex to implement. However, it is useful for drawing fundamental limits on the performance and complexity of fountain codes. Clearly, if k > n, the matrix G cannot have full rank. If n ≥ k and G contains an invertible k × k submatrix, then we can invert the encoding operation and claim that the decoding is successful. Let us assume the degree distribution to be binomial with p_0 = 0.5; in other words, we flip a fair coin to determine each entry of the column vectors of G. This idea is quantified for this simple case in the following theorem [10].
Theorem 1: Let us denote the number of binary matrices of dimension m_x × m_y and rank R by M(m_x, m_y, R). Without loss of generality, we assume m_y ≥ m_x. Then, the probability that a randomly chosen G has full rank, i.e., R = m_x, is given by ∏_{i=0}^{m_x−1} (1 − 2^{i−m_y}).
PROOF: Note that the number of 1 × m_y matrices with R = 1 is 2^{m_y} − 1, due to the fact that any non-zero vector has rank one. By induction, the number of ways of extending an (m_x − 1) × m_y matrix of rank m_x − 1 to an m_x × m_y matrix of rank m_x is 2^{m_y} − 2^{m_x−1}, so that

M(m_x, m_y, m_x) = M(m_x − 1, m_y, m_x − 1)(2^{m_y} − 2^{m_x−1})   (3.1)
Since M(1, m_y, 1) = 2^{m_y} − 1 and using the recursive relationship above, we end up with the following probability:

M(m_x, m_y, m_x) / 2^{m_x m_y} = ∏_{i=0}^{m_x−1} (2^{m_y} − 2^i) / 2^{m_x m_y} = ∏_{i=0}^{m_x−1} (1 − 2^{i−m_y})   (3.2)
Setting m_x = k and m_y = n ≥ k, the full-rank probability can be approximated as

∏_{i=0}^{k−1} (1 − 2^{i−n}) ≈ 1 − (2^{−n} + 2^{1−n} + ⋯ + 2^{k−1−n})   (3.3)
  = 1 − 2^{−n} (1 − 2^k)/(1 − 2) = 1 − 2^{−n} (2^k − 1)   (3.4)
  ≈ 1 − 2^{k−n}   (3.5)
Thus, if we let γ = 2^{k−n}, the probability that G does not have full rank, and hence is not invertible, is given by γ. We note that this quantity is exponentially related to the extra redundancy n − k needed to achieve reliable operation. Reading this conclusion in reverse, it says that the number of coded bits required to achieve a success probability of 1 − γ is given by n ≈ k + log_2(1/γ).
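The quantities of Theorem 1 and equation (3.5) are straightforward to evaluate numerically, as the following sketch illustrates:
———————————
def full_rank_prob(k, n):
    """prod_{i=0}^{k-1} (1 - 2^{i-n}) for a random k x n binary matrix, n >= k."""
    p = 1.0
    for i in range(k):
        p *= 1.0 - 2.0 ** (i - n)
    return p

k = 100
for extra in (0, 2, 5, 10):
    n = k + extra
    print(f"n-k={extra:2d}: exact={full_rank_prob(k, n):.6f}, "
          f"approx={1 - 2.0 ** (k - n):.6f}")
———————————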
The expected number of operations for encoding one bit of a binary fountain code is the average number of bits XOR-ed to generate a coded bit. Since the entries of G are selected to be one or zero with probability one half, the expected encoding cost per coded bit is O(k). If we generate n bits, the total expected cost is O(nk). The decoding cost is that of inverting G and multiplying the inverse with the received word. The matrix inversion in general requires O(n^3) operations, and multiplying by the inverse costs O(k^2) operations. As can be seen, the random code so generated performs well, with exponentially decaying error probability, yet its encoding and decoding complexity is high, especially for long block lengths.
A straightforward balls-and-bins argument might be quite useful to understand the dynamics of edge connections and their relationship to the performance [2]. Suppose we have k bins and n balls to throw into these bins, each throw performed independently of the others. One may wonder what the probability is that a given bin has no balls in it after all n balls are thrown. Let us denote this probability by ω. Since balls are thrown without making any distinction between bins, this probability is given by

ω = (1 − 1/k)^n ≈ e^{−n/k}   (3.6)

for large k and n. Since the number of empty bins is binomially distributed with parameters k and ω, the average number of empty bins is kω = ke^{−n/k}. In coding theory, a convention
for decoder design is to declare a failure probability γ > 0 beyond which the performance is not allowed to degrade. Adopting such a convention, we can bound the probability of having at least one empty bin (denoted Σ_k) by γ. Therefore, we guarantee that every single bin is covered by at least one ball with probability greater than 1 − γ. We have

Σ_k = 1 − (1 − e^{−n/k})^k < γ   (3.7)

For large n and k, Σ_k ≈ ke^{−n/k} < γ. This implies that n > k ln(k/γ), i.e., the number of balls must scale at least with k times the logarithm of k. This result has close connections to the coupon collector's problem of probability theory, although the draws in the original problem statement are made with replacement. Finally, we note that this result establishes an information-theoretic lower bound on n for fountain codes constructed as described above.
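A quick simulation sketch supports this bound: with n ≈ k ln(k/γ) balls, the empirical probability of leaving some bin empty stays below γ (parameters here are illustrative):
———————————
import math
import random

def any_bin_empty(k, n, rng):
    hit = [False] * k
    for _ in range(n):
        hit[rng.randrange(k)] = True
    return not all(hit)

k, gamma = 200, 0.05
n = math.ceil(k * math.log(k / gamma))
rng = random.Random(0)
trials = 2000
failures = sum(any_bin_empty(k, n, rng) for _ in range(trials))
print(f"n = {n}, empirical Pr[some bin empty] = {failures / trials:.4f} (< {gamma})")
———————————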
Exercise 1: Let every length-k sequence of F_2^k be equally likely to be chosen. The nonzero entries of the chosen element of F_2^k establish the indices of the message symbols that contribute to the generated coded symbol. What is the degree distribution Ω(x) in this case? Is there a closed-form expression for Ω(x)?
Exercise 2: Extend the result of Exercise 1 to F_q^k.
Theorem 2: Let G be a randomly chosen m_x × m_y matrix over F_q with m_y ≥ m_x, each entry drawn uniformly at random. Then the probability that G is not full rank is bounded as

q^{m_x−m_y−1} ≤ 1 − ∏_{i=0}^{m_x−1} (1 − q^{i−m_y}) < q^{m_x−m_y} / (q − 1)   (3.9)

PROOF: The lower bound follows from the observation that 1 − q^{m_x−1−m_y} ≥ ∏_{i=0}^{m_x−1} (1 − q^{i−m_y}), since each term in the product is at most 1. The upper bound can either be proved by induction, similar to Theorem 1, as given in [11], or using union bound arguments as in Theorem 3.1 of [35]. What is interesting here is that both the upper and the lower bounds are independent of m_x and m_y individually and depend only on the difference m_y − m_x.
Figure 3.1: Lower and upper bounds on the block ML decoding failure of dense random linear
fountain codes, defined over Fq for q = 2, 4, 8, 64, 256.
The bounds of Theorem 2 are depicted in Fig. 3.1 as functions of m_y − m_x, i.e., the extra redundancy. As can be seen, the bounds converge for large q, and significant gains can be obtained over dense fountain codes defined over F_2. These performance curves can in a way be thought of as lower bounds for the rest of the fountain code discussion in this document: they characterize what is achievable without regard to the complexity of the implementation. Later, we shall mainly focus on low-complexity alternatives while targeting a performance profile close to what is achievable.
The arguments above and those of the previous section raise curiosity about the symbol-level ML decoding performance of dense random fountain codes, i.e., a randomly generated dense G^T matrix of size n × k. Although exact formulations might be cumbersome to find, tight upper and lower bounds have been developed in the past. Let us consider the binary field for the moment, and let p_0 be the probability of selecting any entry of G^T to be one. Such an assumption induces a probability distribution on the check node degrees of the fountain code. In fact, it yields the following degree distribution:

∑_{d=0}^k Ω_d x^d = ∑_{d=0}^k \binom{k}{d} p_0^d (1 − p_0)^{k−d} x^d   (3.10)
If n independently generated coded symbols are collected for decoding, the probability
that none of them are generated using the value of message symbol v shall be bounded above
and below by
e^{−Ω′(1)n/(k−Ω′(1))} ≤ (1 − Ω′(1)/k)^n ≤ e^{−Ω′(1)n/k}   (3.12)
Exercise 3: Show the lower bound of equation (3.12). Hint: You might want to consider
the Taylor series expansion of ln(1 − x).
If we let 0 < h_0 < 1 be the denseness parameter such that Ω′(1) = h_0 k, the lower bound will be approximately of the form (1 − h_0)^n ≈ 1 − h_0 n for h_0 ≪ 1. In other words, larger h_0 means a sharper fall-off (steeper slope), i.e., an improved lower bound. This hints that denser generator matrices shall work very well under the optimal (ML) decoding assumption. We will next explore an upper bound on the symbol-level performance of ML decoding for fountain codes. It is ensured by our previous argument that equation (3.12) is a lower bound for ML decoding. The following theorem from [12] establishes an upper bound for ML decoding.
Theorem 3: For a fountain code of length k with degree distribution Ω(x) and n collected coded symbols, the symbol-level ML decoding failure probability can be upper bounded by

∑_{l=1}^k \binom{k−1}{l−1} [ ∑_d Ω_d ( ∑_{s=0,2,…,min{l,⌊d⌋_even}} \binom{l}{s} \binom{k−l}{d−s} ) / \binom{k}{d} ]^n   (3.13)
where G^T x = 0 indicates that the i-th row of G is dependent on a subset of the other rows of G, hence causing the rank to be less than k. We go from (3.14) to (3.15) using the union bound over events. Note that each column of G is independently generated, i.e., the columns are different realizations of a vector random variable w ∈ F_2^k. Therefore, we can write

Pr{G^T x = 0} = ( Pr{w^T x = 0} )^n   (3.16)
Figure 3.2: A bipartite graph G between the message symbols (U_1) and the coded symbols (U_2), shown together with the corresponding generator matrix

G =
1 0 1 0 0 1
0 1 0 1 1 0
0 0 1 0 0 0
1 0 1 0 0 0
0 0 0 0 1 1
If we average over Ω(x) and all possible choices of x with x_i = 1 and weight(x) = l, using equation (3.16) we obtain the desired result.
In delay-sensitive transmission scenarios or storage applications (such as email transactions), the use of short block length codes is inevitable. ML decoding might be the only viable choice in those cases for reasonable performance. The results of Theorem 3 are extended in [13] to q-ary random linear fountain codes, and it is demonstrated that these codes show excellent performance under ML decoding; the tradeoff between the density of the generator matrix (equivalently, the complexity of decoding) and the associated error floor is also investigated there. Although the complexity of ML decoding might be tolerable for short block lengths, it is computationally prohibitive for long block lengths (large k). In order to allow low decoding complexity with increasing block length, an iterative scheme called the Belief Propagation (BP) algorithm is used.
A bipartite graph is a graph whose vertex set can be partitioned into two disjoint sets U_1 and U_2 such that every edge connects a node in U_1 to a node in U_2. In fact, any block code can be represented using a bipartite graph; however, this representation is particularly important if the code has sparse generator or parity check matrices. Let us consider Fig. 3.2. As can be seen, a coded symbol is generated by adding (XOR-ing in the binary domain) a subset of the message symbols. These subsets are indicated by drawing edges between the coded symbols (check nodes) and the message symbols (variable nodes), all together making up a bipartite graph named G. The corresponding generator matrix is also shown for a code defined over F_2^6. The BP algorithm has prior information about the graph connections but not about the message symbols. A conventional way to convey the graph connections to the receiver side is by way of a pseudo-random number generator fed with one or more seed numbers. The communication is then reduced to communicating the seed, which can be transferred easily and reliably. In the rest of our discussion, we will assume this seed is reliably communicated.
The BP algorithm can be summarized as follows:
———————————
• Step 1: Find a coded symbol of degree one. Decode the unique message symbol that is connected to this coded symbol and remove the corresponding edge from the graph. If there is no degree-one coded symbol, the decoder cannot iterate further and reports a failure.
• Step 2: Update the neighbors of the decoded message symbol based on the decoded value, i.e., the decoded value is added to (XOR-ed with, in the binary case) each neighboring coded symbol. After this update, remove all the edges incident to the decoded message symbol, ending up with a reduced graph.
• Step 3: If there are unrecovered message symbols, continue with the first step on the reduced graph. Else, the decoder stops.
———————————
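A compact sketch of this peeling decoder is given below; it assumes coded symbols arrive as (value, neighbor-set) pairs, such as those produced by the earlier encoding sketch, and returns the recovered message or None upon failure:
———————————
def bp_decode(k, coded):
    coded = [(v, set(nb)) for v, nb in coded]       # mutable working copy
    message = [None] * k
    progress = True
    while progress and any(m is None for m in message):
        progress = False
        for idx, (value, nbs) in enumerate(coded):
            if len(nbs) == 1:                       # Step 1: degree-one symbol
                i = nbs.pop()
                if message[i] is None:
                    message[i] = value
                # Step 2: XOR the decoded value into all symbols containing i
                for j, (v2, nb2) in enumerate(coded):
                    if i in nb2:
                        nb2.remove(i)
                        coded[j] = (v2 ^ message[i], nb2)
                progress = True
                break                               # Step 3: restart the scan
    return message if all(m is not None for m in message) else None

msg = [1, 0, 1, 1]
coded = [(msg[0], {0}), (msg[0] ^ msg[1], {0, 1}),
         (msg[1] ^ msg[2], {1, 2}), (msg[2] ^ msg[3], {2, 3})]
print(bp_decode(4, coded))                          # -> [1, 0, 1, 1]
———————————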
As is clear from the description of the decoding algorithm, the number of operations is related to the number of edges of the bipartite graph representation of the code, which in turn is related to the degree distribution. Therefore, the design of the fountain code must ensure a good degree distribution that allows low complexity (a sparse G matrix) and low failure probability at the same time. Although these two goals might be conflicting, the tradeoff can be balanced for a given application.
It is best to describe the BP algorithm through an example. In Fig. 3.3, an example of decoding the information symbols, together with the edge removals (alternatively called the graph pruning process), is shown in detail. As can be seen in this example, in each decoding step only one information symbol is decoded. In fact, for the BP algorithm to continue, we must have at least one degree-one coded symbol at each decoding step. This implies that the minimum number of message symbols decoded per iteration is one if the BP algorithm is successful. The set of degree-one coded symbols available at each iteration is conventionally named the ripple [9].
Luby proposed an optimal degree distribution Ω(x) (optimal in expectation) called the Soliton distribution. Under the Soliton distribution, the expected ripple size is one, i.e., the optimal ripple size for the BP algorithm to continue⁴. Since this holds only in expectation, in reality it might be very
⁴ Here, we call it optimal because although a ripple size greater than one is sufficient for the BP algorithm to continue iterating, the number of extra coded symbols needed to decode all of the message symbols generally increases if the ripple size is greater than one.
likely that the ripple size hits zero at some point in time. Clearly, degree distributions that show good performance in practice are needed.
In order to derive the so-called Soliton distribution, let us start with a useful lemma.
Lemma 1: Let Ω(x) = ∑_d Ω_d x^d be a generic degree distribution and let f(x) be a monotone increasing function of the discrete variable x > 0, i.e., f(x − 1) < f(x) for all x. Then,

Ω′(f(x)) − Ω′(f(x − 1)) = ( f(x) − f(x − 1) ) ∑_d d Ω_d f(x)^{d−2} ∑_{j=0}^{d−2} ( f(x − 1)/f(x) )^j   (3.24)
  < ( f(x) − f(x − 1) ) ∑_d d (d − 1) Ω_d f(x)^{d−2}   (3.25)
  = ( f(x) − f(x − 1) ) Ω″(f(x))   (3.26)
which proves the inequality; the equality will only hold asymptotically, as will be explained shortly. Note here that since by assumption f(x − 1) < f(x) for all x, we have ∑_{j=0}^{d−2} ( f(x − 1)/f(x) )^j < d − 1, which is what establishes the inequality above.
If we let f(x) = x/k for k → ∞ and Δx = f(x) − f(x − 1) = 1/k, the left-hand side becomes a difference quotient with a vanishing step, which along with Lemma 1 shows that there are cases where the equality holds, particularly in asymptotic considerations. In this case, therefore, we have for k → ∞

Ω′(x/k) − Ω′((x − 1)/k) → (1/k) Ω″(x/k)   (3.28)

This expression shall be useful in the following discussion.
In the process of graph pruning, when the last edge is removed from a coded symbol, that coded symbol is said to be released from the rest of the decoding process and is no longer used by the BP algorithm. We would like to find the probability that a coded symbol of initial degree d is released at the i-th iteration. For simplicity, let us assume edge connections are performed with replacement (which is easier to analyze), although in the original LT process edge selections are performed without replacement; the main reason for this assumption is that in the limit (k → ∞) the two models probabilistically converge to one another. In order for a degree-d symbol to be released, it has to have exactly one edge connected to the k − i symbols that remain unrecovered after iteration i, and not all of the remaining d − 1 edges may connect to the i − 1 already recovered message symbols. This probability is simply given by

\binom{d}{1} ((k − i)/k) ( (i/k)^{d−1} − ((i − 1)/k)^{d−1} )   (3.29)
in which the single edge to an unrecovered symbol can be chosen in d different ways from the k − i unrecovered message symbols. This is illustrated in Fig. 3.4. Since we need at least one connection with the “red" message symbol of Fig. 3.4, we subtract the probability that all the remaining edges connect to the already recovered i − 1 message symbols. If we average over the degree distribution, we obtain the probability P_k(i) of releasing a coded symbol at iteration i as follows:

P_k(i) = ∑_d Ω_d d ((k − i)/k) ( (i/k)^{d−1} − ((i − 1)/k)^{d−1} ) = (1 − i/k) ( Ω′(i/k) − Ω′((i − 1)/k) )   (3.30)
Asymptotically, we need to collect only k coded symbols to reconstruct the message block. The expected number of coded symbols the algorithm releases at iteration i is therefore k times P_k(i) and is given by (using the result of Lemma 1 and equation (3.28))

k P_k(i) = k (1/k) (1 − i/k) Ω″(i/k) = (1 − i/k) Ω″(i/k)   (3.31)
Figure 3.4: A degree release at the i -th iteration of a coded symbol of degree d .
which must be equal to 1, because at each iteration ideally one and only one coded symbol is released, and at the end of k iterations all message symbols are decoded successfully. If we set y = i/k, for 0 < y < 1, we have the following differential equation to solve:

(1 − y) Ω″(y) = 1   (3.32)

The general solution to this second-order ordinary differential equation is given by

Ω(y) = (1 − y) ln(1 − y) + c_1 y + c_2   (3.33)

with the initial conditions Ω(0) = 0 and Ω(1) = 1 (since the probabilities must add up to unity), yielding c_1 = 1 and c_2 = 0. Using the series expansion ln(1 − y) = −∑_{d=1}^∞ y^d / d, we obtain the sum, i.e., the limiting distribution,
Ω(y) = (1 − y) ln(1 − y) + y = −∑_{d=1}^∞ y^d / d + ∑_{d=1}^∞ y^{d+1} / d + y   (3.34)
  = −∑_{d=2}^∞ y^d / d + ∑_{d=2}^∞ y^d / (d − 1)   (3.35)
  = ∑_{d=2}^∞ y^d / (d (d − 1)) = ∑_d Ω_d y^d   (3.36)
from which we see that in the limiting distribution Ω_1 = 0, and therefore the BP algorithm cannot start decoding. A finite-length analysis (assuming selection of edges without replacement) shows [9] that the distribution can be derived to be of the form

Ω(y) = y/k + ∑_{d=2}^k y^d / (d (d − 1))   (3.37)

which is named the Soliton distribution due to its resemblance to the physical phenomenon known as soliton waves. We note that the Soliton distribution is almost exactly the same as the limiting distribution of our analysis; this demonstrates that for k → ∞ the Soliton distribution converges to the limiting distribution.
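For concreteness, the Soliton distribution of equation (3.37) can be tabulated as in the following sketch; the printed average degree is the harmonic sum discussed in Lemma 2 below:
———————————
def soliton(k):
    """Ideal Soliton distribution: Omega_1 = 1/k, Omega_d = 1/(d(d-1))."""
    omega = {1: 1.0 / k}
    for d in range(2, k + 1):
        omega[d] = 1.0 / (d * (d - 1))
    return omega

k = 1000
omega = soliton(k)
print(f"sum of probabilities: {sum(omega.values()):.6f}")           # -> 1.0
print(f"average degree Omega'(1): {sum(d * p for d, p in omega.items()):.3f}")
———————————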
Lemma 2: Let Ω(x) be a Soliton degree distribution. The average degree of a coded symbol
is given by O(ln(k)) .
Ω′(1) = ∑_{d=1}^k d Ω_d = 1/k + ∑_{d=2}^k 1/(d − 1) = 1/k + ∑_{d=1}^{k−1} 1/d = ∑_{d=1}^k 1/d = ln(k) + 0.57721… + π_k   (3.38)

where π_k vanishes as k → ∞.
In Luby's robust Soliton distribution (RSD) [9], the Soliton distribution is augmented by an extra component τ_d, where τ_d = R/(dk) for d = 1, …, k/R − 1, τ_{k/R} = (R/k) ln(R/γ) and τ_d = 0 otherwise, with R = c √k ln(k/γ) for some constant c > 0. In this description, the average number of degree-d coded symbols is set to k(Ω_d + τ_d). Thus, the average number of coded symbols is given by

n = k ∑_{d=1}^k (Ω_d + τ_d) = k + ∑_{i=1}^{k/R−1} R/i + R ln(R/γ)   (3.39)
  ≈ k + R ln(k/R − 1) + R ln(R/γ)   (3.40)
  ≤ k + R ln(k/R) + R ln(R/γ) = k + R ln(k/γ)   (3.41)
  = k + c ln²(k/γ) √k   (3.42)
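A sketch of the robust Soliton construction in the standard form of [9] follows; the parameter names c and gamma are mine, and the normalization Z equals the overhead factor n/k of equation (3.39):
———————————
import math

def robust_soliton(k, c=0.1, gamma=0.05):
    """Return the normalized RSD {degree: prob} and the overhead Z = n/k."""
    omega = {1: 1.0 / k}
    omega.update({d: 1.0 / (d * (d - 1)) for d in range(2, k + 1)})
    R = c * math.sqrt(k) * math.log(k / gamma)
    pivot = int(round(k / R))
    tau = {d: R / (d * k) for d in range(1, pivot)}
    tau[pivot] = R * math.log(R / gamma) / k
    mu = {d: omega[d] + tau.get(d, 0.0) for d in range(1, k + 1)}
    Z = sum(mu.values())
    return {d: p / Z for d, p in mu.items()}, Z

mu, Z = robust_soliton(1000)
print(f"overhead factor Z = n/k = {Z:.3f}")
———————————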
Exercise 4: Show that the average degree of a check symbol using the RSD is µ′(1) = O(ln(k/γ)). Hint: µ′(1) ≤ ln(k) + 2 + ln(R/γ).
Figure 3.5: A local subgraph G_l rooted at a variable node v; levels 0, 1, 2, 3, … alternate between OR nodes and AND nodes.
Exercise 4 shows that, in the practical scenarios where the RSD is expected to perform better than the Soliton distribution, the average number of edges (computations) is still on the order of k ln(k/γ), i.e., not linear in k. Based on our previous argument in Lemma 2 and the information-theoretic lower bound, we conclude that there is no degree distribution that makes encoding/decoding linear in k if we impose the constraint of a negligible decoder failure probability. Instead, as will be shown, degree distributions with a constant maximum degree allow linear-time operation. However, degree distributions that have a constant maximum degree result in a decoding error floor, due to the fact that with such a degree distribution only a fraction of the message symbols can be decoded with vanishing error probability. This led the research community to the idea of concatenating LT codes with linear-time encodable erasure codes, to be able to execute the overall operation in linear time. This will be explored in Section 4.
only send “one", i.e., they can be decoded if any one of the check symbols sends “one". Similarly, check nodes send “one" if and only if all of the other adjacent variable nodes send “one", i.e., that check node can decode the particular variable node if all of the other variable nodes are already decoded and known.
Suppose OR nodes have i children with probability α_i, whereas AND nodes have i children with probability β_i. These induce two probability distributions α(x) = ∑_{i=0}^{L_α} α_i x^i and β(x) = ∑_{i=0}^{L_β} β_i x^i associated with the tree G_l, where L_α and L_β are the constant maximum degrees of these probability distributions. Note that the nodes at depth 2l are leaf nodes and do not have any children.
The process starts by assigning each leaf node a “0" or a “1" independently, with probabilities y_0 and y_1, respectively. We think of the tree G_l as a boolean circuit consisting of OR and AND nodes, which may independently be short-circuited with probabilities a and b, respectively. OR nodes without children are assumed to be set to “0" and AND nodes without children are assumed to be set to “1". We are interested in the probability of the root node evaluating to 0 at the end of the process. This is characterized by the following popular lemma [15].
Lemma 3: (The And-Or tree lemma) The probability y_l that the root of G_l evaluates to 0 is y_l = f(y_{l−1}), where y_{l−1} is the probability that the root node of a G_{l−1} evaluates to 0, and

f(x) = (1 − a) α(1 − (1 − b) β(1 − x)),  with  α(x) = ∑_{i=0}^{L_α} α_i x^i  and  β(x) = ∑_{i=0}^{L_β} β_i x^i   (3.43)
PROOF: The proof is relatively straightforward and can be found in many lecture notes on LDPC codes used over BECs; see also [14] and [15].
In the decoding process, the root node of G_l corresponds to any variable node v, and y_0 = 1 (equivalently y_1 = 0), i.e., the probability of having a zero in each leaf node is one, because at the beginning of the decoding no message symbol is decoded yet. In order to model the decoding process via this And-Or tree, we need to compute the distributions α(x) and β(x) that stochastically characterize the number of children of OR and AND nodes. Luckily, this computation turns out to be easy: it corresponds to the edge-perspective degree distributions of standard LDPC codes [28].
We already know the check symbol degree distribution Ω(x). The following argument establishes the coefficients of the variable node degree distribution λ(x) = ∑_{d=0}^{n−1} λ_{d+1} x^{d+1}. We note that the average number of edges in G is Ω′(1)n, and for simplicity we assume that the edge selections are made with replacement (remember that this assumption is valid as k → ∞). Since coded symbols choose their edges uniformly at random from the k input symbols, the probability that any variable node v has degree d is given by the binomial distribution:

λ_d = \binom{Ω′(1)n}{d} (1/k)^d (1 − 1/k)^{Ω′(1)n−d}   (3.44)
Now, for convenience, we perform a change of variables and rewrite Ω(x) = ∑_{d=0}^{k−1} Ω_{d+1} x^{d+1}; the edge-perspective distributions are then given by (borrowing ideas from LDPC code discussions)

α(x) = λ′(x)/λ′(1)  and  β(x) = Ω′(x)/Ω′(1)   (3.46)
In the limit k → ∞ with n = (1 + ϵ)k, the binomial distribution in (3.44) converges to a Poisson distribution, so that

α(x) = e^{−Ω′(1)(1+ϵ)(1−x)}   (3.49)

and

β(x) = (1/Ω′(1)) ∑_{d=0}^{k−1} (d + 1) Ω_{d+1} x^d   (3.50)
Note that in a standard LT code, Ω(x) determines the message node degree distribution λ(x). Thus, knowing Ω(x) is equivalent to knowing the asymptotic performance of the LT code, using the result of Lemma 3.
Here one may be curious about the value of the limit lim_{l→∞} y_l. The result of this limit is the appropriate unique root (i.e., the root 0 ≤ x* ≤ 1) of the fixed-point equation x = f(x). If we insert α(x) and β(x) as found in equations (3.49) and (3.50) (with a = b = 0) and substitute y = 1 − x, we have

(1 + ϵ) Ω′(y) = −ln(1 − y)   (3.54)
If we think of the limiting distribution Ω_d = 1/(d(d − 1)) for d > 1 up to some large maximum degree (to keep Ω′(1) constant), and zero otherwise, it is easy to see that we can satisfy the equality asymptotically as y → 1. This means that at an overhead of ϵ → 0 we have x → 0, and therefore zero failure probability is possible in the limit. The latter also establishes that LT codes used with the Soliton distribution are asymptotically optimal.
If Ω_d = 0 for all d > N ∈ N, then it is impossible to satisfy the equality above with y = 1. This condition is usually imposed to obtain linear-time encoding/decoding complexity for sparse fountain codes. Thus, for a given ϵ and N, we can solve for y* = 1 − x* < 1 (y cannot
Figure 3.6: Asymptotic BP performance together with upper and lower bounds on ML decoding failure (k = 500), plotted against the reception overhead 1 + ϵ.
be one in this case, because if y = 1 the right-hand side of equation (3.54) diverges whereas the left-hand side has a finite value), and hence 1 − y* shall be the limiting value of y_l as l tends to infinity. This corresponds to an error floor of the LT code using such an Ω(x).
As an example with k = 500, an LT code with the following check symbol degree distribution from [17] is used, with L_β = 66:

Ω̃(x) = 0.008x + 0.494x^2 + 0.166x^3 + 0.073x^4 + 0.083x^5 + 0.056x^8 + 0.037x^9 + 0.056x^19 + 0.025x^65 + 0.003x^66   (3.55)

where the average degree per check node is Ω̃′(1) ≈ 5.85, independent of k. Therefore, we have a linear-time encodable sparse fountain code, with the set of performance curves shown in Fig. 3.6. As can be seen, the asymptotic BP algorithm performance gets pretty close to the lower bound of ML decoding (it can be verified that these bounds for ML decoding do not change much for k > 500). Note that the fall-off starts at ϵ = 0. If we had an optimal (MDS) code, we would have the same fall-off but with zero error probability for all ϵ > 0. This is not the case for linear-time encodable LT codes.
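The error-floor behavior of Fig. 3.6 can be reproduced by iterating the fixed-point recursion implied by equation (3.54), y_{l+1} = e^{−(1+ϵ)Ω′(1−y_l)}, for the distribution in (3.55); the sketch below does exactly that (note that the rounded coefficients give Ω̃′(1) ≈ 5.87):
———————————
import math

OMEGA = {1: 0.008, 2: 0.494, 3: 0.166, 4: 0.073, 5: 0.083,
         8: 0.056, 9: 0.037, 19: 0.056, 65: 0.025, 66: 0.003}

def omega_prime(x):
    return sum(d * w * x ** (d - 1) for d, w in OMEGA.items())

def asymptotic_bp(eps, iters=500):
    y = 1.0                                   # nothing recovered initially
    for _ in range(iters):
        y = math.exp(-(1 + eps) * omega_prime(1 - y))
    return y

print(f"Omega'(1) = {omega_prime(1.0):.3f}")  # average degree, ~5.87
for eps in (0.0, 0.05, 0.1):
    print(f"eps={eps:.2f}: unrecovered fraction y* ~ {asymptotic_bp(eps):.2e}")
———————————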
As the decoding proceeds, the ripple size evolves as well: it can grow or shrink depending on the number of coded symbols released at each iteration. The evolution of the ripple can be described by a simple random walk model [9], in which at each step the location jumps to the next state in either the positive or the negative direction with probability 1/2. In a more formal definition, let {Z_i, i ≥ 1} be a sequence of independent random variables, each taking on either 1 or −1 with equal probability. Let us define S_k = ∑_{j=1}^k Z_j to be a random walk on Z. At each time step, the walker jumps one state in either the positive or the negative direction. It is easy to see that E(S_k) = 0. By recognizing that E(Z_k²) = 1 and E(S_k²) = k, we can argue that E(|S_k|) = O(√k). Moreover, using diffusion arguments we can show that [16]

lim_{k→∞} E(|S_k|)/√k = √(2/π)   (3.56)

from which we deduce that the expected ripple size had better scale as ∼ √(2k/π) as k → ∞ if the ripple size resembles the evolution of a random walk. This resemblance is quite interesting, and the connections with the one-dimensional diffusion process and the Gaussian approximation can help us understand more about the ripple size evolution.
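A direct simulation sketch (with arbitrary parameters) confirms the √(2k/π) scaling of E(|S_k|):
———————————
import math
import random

def mean_abs_walk(k, trials, rng):
    total = 0
    for _ in range(trials):
        s = sum(1 if rng.random() < 0.5 else -1 for _ in range(k))
        total += abs(s)
    return total / trials

rng = random.Random(7)
for k in (100, 1000, 10000):
    emp = mean_abs_walk(k, 2000, rng)
    print(f"k={k:6d}: E|S_k| ~ {emp:7.2f}, sqrt(2k/pi) = {math.sqrt(2*k/math.pi):7.2f}")
———————————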
From Lemma 3 and the subsequent analysis of the previous subsection, the standard tree analysis allows us to compute the probability of recovery failure of an input symbol at the l-th iteration by

y_l = e^{−(1+ϵ)Ω′(1−y_{l−1})}   (3.57)

Assuming that the input symbols are independently recovered with probability 1 − y_l at the l-th iteration (i.e., the number of recovered symbols is binomially distributed with parameters k and 1 − y_l), the expected numbers of unrecovered symbols are k y_l and k y_{l+1} = k e^{−Ω′(1)(1+ϵ)β(1−y_l)} = k e^{−(1+ϵ)Ω′(1−y_l)}, respectively, where we used the result that β(x) = Ω′(x)/Ω′(1). For large k, the expected ripple size is then given by y_l k − y_{l+1} k = k (y_l − e^{−(1+ϵ)Ω′(1−y_l)}). In general, we say an x-fraction of input symbols is unrecovered at some iteration of the BP algorithm. With this new notation, the expected ripple size is then given by

k ( x − e^{−(1+ϵ)Ω′(1−x)} )   (3.59)
Finally, we note that since the actual number of unrecovered symbols converges to its expected value in the limit, this expected ripple size is the actual ripple size for large k⁵. Now, let us use our previous random-walk argument for the expected number xk of unrecovered symbols, making sure that the expected deviation from the mean of the walker (the ripple size in our context) is at least √(2xk/π). Assuming that the decoder is required to recover a 1 − γ fraction of the k message symbols with overwhelming probability, this can be formally expressed as follows:

x − e^{−(1+ϵ)Ω′(1−x)} ≥ √(2x/(kπ))   (3.60)
⁵ This can be seen using standard Chernoff bound arguments for binomial/Poisson distributions.
for x ∈ [γ, 1]. From equation (3.60), we can lower bound the derivative of the check node degree distribution:

Ω′(1 − x) ≥ −ln( x − √(2x/(kπ)) ) / (1 + ϵ)   (3.61)

for x ∈ [γ, 1], or equivalently

Ω′(x) ≥ −ln( 1 − x − √(2x/(kπ)) ) / (1 + ϵ)   (3.62)
for x ∈ [0, 1 − γ], as derived in [17]. Let us assume that the check node degree distribution is given by the limiting distribution derived earlier, i.e., Ω(x) = (1 − x) ln(1 − x) + x for x ∈ [0, 1], and that γ = 0. It remains to check that

(1 − x)^{1+ϵ} + x + √(2x/(kπ)) ≤ 1   (3.63)

If we let k → ∞, this reduces to

(1 − x)^{1+ϵ} ≤ 1 − x   (3.64)

which is always true for x ∈ [0, 1] and any ϵ > 0. Thus, the limiting distribution conforms with the assumption that the ripple size evolves according to a simple random walk. For a given γ, a way to design a degree distribution satisfying equation (3.62) is to discretize the interval [0, 1 − γ] using some Δγ > 0, so that we have a union of disjoint sets:

[0, 1 − γ] = ⋃_{i=0}^{(1−γ)/Δγ − 1} [ iΔγ, (i + 1)Δγ )   (3.65)
We require that equation (3.62) holds at each discretized point of the interval [0, 1 − γ], which eventually gives us a set of inequalities involving the coefficients of Ω(x). Satisfying this set of inequalities, we may find more than one solution for {Ω_d, d = 1, 2, …, k}, from which we choose the one with minimum Ω′(1). This is similar to the linear program outlined in [17], although the details of the optimization are omitted there. We evaluate the following inequality at M = (1 − γ)/Δγ + 1 discretized points⁶:

−Ω′(x) = ∑_{d=1}^F c_d x^{d−1} ≤ ln( 1 − x − √(2x/(kπ)) ) / (1 + ϵ)   (3.66)
Table 3.1: Optimized degree distributions for k = 4096 and k = 8192 (γ = 0.005, Δγ = 0.005).

        k = 4096            k = 8192
Ω1      0.01206279868062    0.00859664884231
Ω2      0.48618222931140    0.48800207839031
Ω3      0.14486030215468    0.16243601073478
Ω4      0.11968155126998    0.06926848659608
Ω5      0.03845536920060    0.09460770077248
Ω8      0.03045905002768    -
Ω9      0.08718444024457    0.03973381508374
Ω10     -                   0.06397077147921
Ω32     0.08111425911047    -
Ω34     -                   0.06652107350334
Ω35     -                   0.00686341459082
ϵ       0.04                0.03
Ω′(1)   5.714               5.7213
Note that the condition Ω(1) = 1 imposes a tighter constraint than the box constraint l_c ≤ c_d ≤ 0 does, and it is needed to ensure that we converge to a valid probability distribution Ω(x). The degree distribution in equation (3.55) was obtained using a similar optimization procedure for k = 65536 and ϵ = 0.038 in [17]. Other constraints, such as Ω″(0) = 1, are possible based on the design choices and objectives.
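A Python/scipy reconstruction of this linear program is sketched below. It is my own variant of the optimization (the author's MATLAB version is linked in the footnote); in particular, discarding grid points where the logarithm in (3.66) is undefined is an assumption of this sketch:
———————————
import numpy as np
from scipy.optimize import linprog

def design_degree_dist(k=4096, F=40, eps=0.04, gamma=0.005, dgamma=0.005):
    """Minimize Omega'(1) subject to (3.62) on a grid and sum(Omega) = 1."""
    xs = np.arange(dgamma, 1.0 - gamma, dgamma)
    arg = 1.0 - xs - np.sqrt(2.0 * xs / (k * np.pi))
    xs, arg = xs[arg > 0], arg[arg > 0]      # keep points where log is defined
    rhs = -np.log(arg) / (1.0 + eps)
    # One row per grid point: -Omega'(x) <= -rhs, Omega'(x) = sum_d d Omega_d x^(d-1)
    A_ub = np.array([[-d * x ** (d - 1) for d in range(1, F + 1)] for x in xs])
    cost = np.arange(1, F + 1, dtype=float)  # objective: Omega'(1)
    return linprog(cost, A_ub=A_ub, b_ub=-rhs,
                   A_eq=np.ones((1, F)), b_eq=[1.0], bounds=[(0, 1)] * F)

res = design_degree_dist()
if res.success:
    support = {d + 1: round(float(w), 4) for d, w in enumerate(res.x) if w > 1e-6}
    print("Omega'(1) =", round(float(res.x @ np.arange(1, len(res.x) + 1)), 4))
    print("support:", support)
———————————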
Let us choose the parameter set {γ = 0.005, Δγ = 0.005} and various k with corresponding ϵ values as shown in Table 3.1. The results are shown for k = 4096 and k = 8192. As can be seen, the probabilities resemble a Soliton distribution wherever Ω_i is non-zero. Also included are the average degrees per coded symbol for each degree distribution.
⁶ A MATLAB implementation of this linear program can be found at http://suaybarslan.com/optLTdd.txt.
Note that the p_{j,i} are design parameters of the system, subject to optimization. For convenience, the authors denote the proposed selection distribution in matrix form as follows:

P_{r×k} =
[ p_{1,1}    p_{1,2}    …   p_{1,k}   ]
[ p_{2,1}    p_{2,2}    …   p_{2,k}   ]
[   ⋮          ⋮        …     ⋮       ]
[ p_{r−1,1}  p_{r−1,2}  …   p_{r−1,k} ]
[ p_{r,1}    p_{r,2}    …   p_{r,k}   ]
Since the set of probabilities in each column sums to unity, the number of design parameters of P_{r×k} is (r − 1) × k. Similarly, the degree distribution can be expressed in vector form as Ω_k, where the i-th entry is the probability that a coded symbol chooses degree i, i.e., Ω_i. Note that Ω_k and P_{r×k} completely determine the performance of the proposed generalization.
In the BP algorithm, we observe that not all check nodes decode information symbols at each iteration. For example, degree-one check nodes immediately decode their neighboring information symbols at the very first iteration. Then, degree-two and degree-three check nodes recover some of the information bits later in the sequence of iterations. In general, at later iterations, low-degree check nodes will already have been released from the decoding process, and higher-degree check nodes start decoding the information symbols (due to edge eliminations). So the coded symbols take part in different stages of the BP decoding process depending on their degrees.
UEP and URT are achieved by allowing coded symbols to make more edge connections with the more important information sets, which increases the probability of decoding the more important symbols. However, coded symbols decode information symbols at different iterations of the BP algorithm depending on their degrees. For example, at the second iteration of the BP algorithm, the probability that degree-two coded symbols decode information symbols is higher than that of coded symbols with degrees larger than two⁷. If the BP algorithm stops unexpectedly at an early iteration, it is essential that the more important information symbols have been recovered. This suggests that it is beneficial to have low-degree check nodes generally make their edge connections with the important information sets. That is the idea behind this generalization.
In the encoding process of the generalization of EWF codes, after choosing the degree of each coded symbol, the authors select the edge connections according to the following distribution.
Definition 3: Generalized Window Selection Distribution.
• For i = 1, …, k, let L_i(x) = ∑_{j=1}^r γ_{j,i} x^j, where γ_{j,i} ≥ 0 is the conditional probability of choosing the j-th window W_j given that the degree of the coded symbol is i, and ∑_{j=1}^r γ_{j,i} = 1.
Similar to the previous generalization, the γ_{j,i} are design parameters of the system, subject to optimization. For convenience, we denote the proposed window selection distribution in matrix form as L_{r×k}, defined analogously to P_{r×k} above.
⁷ This observation will be quantified in Section 5 by drawing connections to random graph theory.
The set of probabilities in each column sums to unity, and the number of design parameters of L_{r×k} is again (r − 1) × k. Similarly, we observe that Ω_k and L_{r×k} completely determine the performance of the proposed generalization of EWF codes. The authors also proposed a method for reducing the set of parameters of these generalizations for a progressive source transmission scenario; we refer the interested reader to the original studies [22] and [23].
k message symbols → Precoding → k′ intermediate symbols → LT coding

Figure 4.1: Concatenated fountain codes consist of one or more precoding stages before an LT-type encoding is used at the final stage of the encoder.
In the past (pioneering works include [14] and [17]), asymptotically good (but not necessarily optimal) check node degree distributions that allow linear-time LT encoding/decoding were proposed based on the Soliton distribution. In other words, such good distributions are developed based on an instance of a Soliton distribution with a maximum degree F. More specifically, let

Π(x) = ∑_{d=1}^F η_d x^d = x/F + ∑_{d=2}^F x^d / (d (d − 1))

be the modified Soliton distribution, assuming that the source block size is larger than F (cf. equation (3.37)). Both in Online codes [14] and in Raptor codes [17], a generic degree distribution of the following form is assumed:

Ω_F(x) = ∑_{d=1}^F c_d η_d x^d   (4.2)
Researchers designed F and the coefficient set {c_i}_{i=1}^F such that, from any (1 + ϵ)k coded symbols, all the message symbols but a γ fraction can be correctly recovered with overwhelming probability. More formally, we have the following theorem establishing the conditions for the existence of asymptotically good degree distributions.
Theorem 4: For any message of size k blocks and for parameters ϵ > 0 and a BP decoder error probability γ > 0, there exists a distribution Ω_F(x) that can recover a 1 − γ fraction of the original message from any (1 + ϵ)k coded symbols in time proportional to k ln(g(ϵ, γ)), where the function g(ϵ, γ) depends on the choice of F.
The proof of this theorem depends on the choice of F as well as on the coefficients {c_i}_{i=1}^F. For example, it is easy to verify that for Online codes these weighting coefficients are given by c_1 = (ϵF − 1)/(1 + ϵ) and c_i = (F + 1)/((1 + ϵ)(F − 1)) for i = 2, 3, …, F. Similarly, for Raptor codes it can
be shown that c_1 = µF/(µ + 1), c_i = 1/(µ + 1) for i = 2, 3, …, F − 1, and c_F = F/(µ + 1) for some µ ≥ ϵ. In that respect, one can see the clear similarity between the two different choices of the asymptotically good degree distribution. This also shows the non-uniqueness of the solution established by the two independent studies in the past.
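As a sanity check, the sketch below builds Ω_F(x) from Π(x) with the Raptor-style coefficients quoted above and verifies Ω_F(1) = 1:
———————————
def omega_F_raptor(F, eps, mu=None):
    """Omega_F coefficients c_d * eta_d with the Raptor-style choice of c_d."""
    mu = eps if mu is None else mu                  # any mu >= eps
    eta = {1: 1.0 / F}
    eta.update({d: 1.0 / (d * (d - 1)) for d in range(2, F + 1)})
    c = {1: mu * F / (mu + 1)}
    c.update({d: 1.0 / (mu + 1) for d in range(2, F)})
    c[F] = F / (mu + 1)
    return {d: c[d] * eta[d] for d in range(1, F + 1)}

omega = omega_F_raptor(F=30, eps=0.1)
print(f"Omega_F(1) = {sum(omega.values()):.6f}")    # -> 1.000000
———————————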
PROOF of Theorem 4: Since both codes show the existence of one possible distribution that is a special case of the general form given in equation (4.2), we will give one such appropriate choice and then prove the theorem by giving the explicit form of g(ϵ, γ). Let us rewrite the condition of (4.1) using the degree distribution Ω_F(x):

Ω′_F(x) > −ln(1 − x) / (1 + ϵ),  x ∈ (0, 1 − γ]   (4.4)

Let us consider the coefficient set with c_1 free and c_i = c, some constant, for i = 2, 3, …, F. In fact, by setting Ω_F(1) = 1 we can express c = (F − c_1)/(F − 1). We note again that such a choice might not be optimal, but it is sufficient to prove the asymptotic result. We have

Ω′_F(x) = c_1/F + ((F − c_1)/(F − 1)) ∑_{i=1}^{F−1} x^i / i   (4.5)
  = c_1/F + ((F − c_1)/(F − 1)) ( ∑_{i=1}^∞ x^i / i − ∑_{i=F}^∞ x^i / i )   (4.6)
  = c_1/F + ((F − c_1)/(F − 1)) ( −ln(1 − x) − ∑_{i=F}^∞ x^i / i )   (4.7)
Plugging (4.7) into (4.4) gives

c_1/F + ((F − c_1)/(F − 1)) ( −ln(1 − x) − ∑_{i=F}^∞ x^i / i ) > −ln(1 − x) / (1 + ϵ)   (4.8)

or equivalently,

c_1/F − ((F − c_1)/(F − 1)) ∑_{i=F}^∞ x^i / i > ( (F − c_1)/(F − 1) − 1/(1 + ϵ) ) ln(1 − x)   (4.9)

Assuming that the right-hand side of inequality (4.9) is negative, with x ∈ (0, 1 − γ] we can upper bound c_1 as follows:

(ϵF + 1)/(ϵ + 1) > c_1   (4.10)

and similarly, assuming that the left-hand side of inequality (4.9) is positive, we can lower bound c_1 as follows:

c_1 > F² ∑_{i=F}^∞ x^i / i / ( F − 1 + F ∑_{i=F}^∞ x^i / i )   (4.11)
For example, for Online codes c_1 = (ϵF − 1)/(1 + ϵ), and for Raptor codes c_1 = ϵF/(1 + ϵ); both choices satisfy inequality (4.10). If we set c_1 = (ϵF − 1)/(1 + ϵ), using inequality (4.11) we will have, for large F,

(ϵF − 1)(F − 1) / (F (F + 1)) ≈ (ϵF − 1)/F > ∑_{i=F}^∞ x^i / i   (4.12)

Also, if we set c_1 = ϵF/(1 + ϵ), we similarly arrive at the following inequality:

ϵ(F − 1)/F > ∑_{i=F}^∞ x^i / i   (4.13)

From inequalities (4.12) and/or (4.13), we can obtain lower bounds on F provided that ∑_{i=F}^∞ x^i / i < ξ:

F > 1/(ϵ − ξ)  or  F > ϵ/(ϵ − ξ),   (4.14)
respectively. For Online codes, the author sets F = (ln(γ) + ln(ϵ/2)) / ln(1 − γ), whereas the author of Raptor codes sets

F = ⌊1/γ + 1⌋ + 1  with  γ = (ϵ/2) / (1 + 2ϵ)   (4.15)

Therefore, given ϵ, γ and x ∈ (0, 1 − γ], as long as the choices for F comply with the inequalities (4.14), we obtain an asymptotically good degree distribution, which proves the result of the theorem. We note here that the explicit form of g(ϵ, γ) is given by F and its functional form in terms of ϵ and γ.
Furthermore, using both selections of F will result in the number of edges given by
\[
k \sum_{d=1}^{F} d\, c_d\, \pi_d \approx \frac{k\epsilon}{1 + \epsilon} + \frac{k}{1 + \epsilon} \sum_{d=2}^{F-1} \frac{1}{d - 1} + \frac{k c_F}{F - 1} \qquad (4.16)
\]
\[
\approx k \ln(F) = k \ln(g(\epsilon, \gamma)) \qquad (4.17)
\]
\[
= O(k \ln(1/\epsilon)) \qquad (4.18)
\]
which is mainly due to the fact that for a given ϵ, F can be chosen proportional to 1/ϵ to satisfy the asymptotic condition. This result eventually shows that by choosing an appropriate F , the encoding/decoding complexity can be made linear in k.
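Before moving on, a quick numerical sanity check may help. The following sketch is my own illustration (the grid resolution and the values of ϵ and γ are arbitrary): it evaluates Ω′F (x) with the online-codes coefficients quoted above and verifies the BP condition Ω′F (x) > −ln(1 − x)/(1 + ϵ) on x ∈ (0, 1 − γ], with F rounded up from the online-codes selection since F must be an integer.
\begin{verbatim}
import math

def omega_prime(x, eps, F):
    # Derivative with the online-codes weights:
    # Omega'_F(x) = c1/F + ((F - c1)/(F - 1)) * sum_{i=1}^{F-1} x^i / i,
    # where c1 = (eps*F - 1)/(1 + eps).
    c1 = (eps * F - 1) / (1 + eps)
    return c1 / F + (F - c1) / (F - 1) * sum(x**i / i for i in range(1, F))

def bp_condition_holds(eps, gamma, F, grid=2000):
    # Check Omega'_F(x) > -ln(1 - x)/(1 + eps) on a grid over (0, 1 - gamma].
    for t in range(1, grid + 1):
        x = (1 - gamma) * t / grid
        if omega_prime(x, eps, F) <= -math.log(1 - x) / (1 + eps):
            return False
    return True

eps, gamma = 0.1, 0.05
F = math.ceil((math.log(gamma) + math.log(eps / 2)) / math.log(1 - gamma))
print(F, bp_condition_holds(eps, gamma, F))   # the condition holds for this F
\end{verbatim}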
4.2 PRECODING
Our main goal when designing good degree distributions for concatenated fountain codes was to dictate an ΩF (x) on the BP algorithm that ensures the recovery of a (1 − γ) fraction of the intermediate message symbols with overwhelming probability. This requires a rate-(1 − γ) (k′, k) MDS-type code which can recover any k′ − k or fewer erasures with probability one, where γ = 1 − k/k′. This is in fact the ideal case (capacity achieving), yet it lacks ease of implementation. For example, we recall that the best algorithm that can decode MDS
Figure 4.2: A tornado code consists of a cascade of sparse graphs applied to the k message symbols, followed by a dense graph with the MDS property (an MDS erasure code). The figure also depicts the precoding of message symbols into check symbols in online codes, a single sparse graph with q = 2.
Tornado codes [26] are systematic codes consisting of m + 1 stages of a parity symbol generation process, where in each stage β ∈ (0, 1) times the number of previous stage symbols are generated as check symbols. The encoding graph is roughly shown in Fig. 4.2. In stage 0, βk check symbols are produced from the k message symbols. Similarly, in stage 1, β²k check symbols are generated, and so on. This sequence of check symbol generation is truncated by an MDS erasure code of rate 1 − β. This way, the total number of check symbols so produced is given by
\[
\sum_{i=1}^{m} \beta^i k + \frac{\beta^{m+1} k}{1 - \beta} = \frac{\beta k}{1 - \beta} \qquad (4.19)
\]
Therefore, with k message symbols, the block length of the code is k + βk/(1 − β) and the rate is 1 − β. Thus, a tornado code encodes k message symbols into (1 + β + β² + . . . )k coded symbols. To maintain the linear time operation, the authors chose β^m k = O(√k) so that the last
MDS erasure code has encoding and decoding complexity linear in k (one such alternative
for the last stage could be a Cauchy Reed-Solomon Code [27]).
The decoding operation starts with decoding the last stage (the m-th stage) MDS code. This decoding will be successful if at most a β fraction of the last β^m k/(1 − β) check symbols have been lost. If the m-th stage decoding is successful, the β^m k check symbols are used to recover the lost symbols among the (m − 1)th stage check symbols. If there exists a right node all of whose left neighbors except a single one are known, then the unknown value is recovered using simple XOR-based logic. The lost symbols of the other stages are recovered using the check symbols of the succeeding stage in the same recursive manner. For long block lengths, it can be shown that this rate-(1 − β) tornado code can recover from an average β(1 − ϵ) fraction of lost symbols using this decoding algorithm with high probability, in time proportional to k ln(1/ϵ). The major advantage of the cascade is to enable linear time operation of the encoding and decoding algorithms, although practical applications use few cascade stages and thus the last stage input size is usually greater than O(√k). This choice is due to the fact that the asymptotic results assume erasures to be distributed over the codeword uniformly at random. In fact, in some practical scenarios, erasures might be quite correlated and bursty.
Exercise 5: Show that the code formed by all cascading stages of the tornado code except the last stage has rate > 1 − β.
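As a numerical sanity check on the construction (and a warm-up for the exercise), the short sketch below lists the per-stage check symbol counts of the cascade and verifies the totals against equation (4.19); the values of k, β and m are arbitrary choices of mine.
\begin{verbatim}
# Tornado cascade sizes versus equation (4.19); k, beta, m are illustrative.
k, beta, m = 10_000, 0.5, 5

stages = [beta**i * k for i in range(1, m + 1)]   # sparse stages of the cascade
mds = beta**(m + 1) * k / (1 - beta)              # checks of the final MDS stage
total = sum(stages) + mds

print("per-stage checks:", [round(s) for s in stages])
print("total checks:", total, "=?", beta * k / (1 - beta))
print("overall rate:", k / (k + total), "=?", 1 - beta)
\end{verbatim}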
The following precoding strategy was proposed within the online coding context [14]. It exhibits similarity to tornado codes and will therefore be considered a special case of tornado codes (see Fig. 4.2). The main motivation for this precoding strategy is to increase the number of edges slightly in each cascade stage such that a single stage tornado code can practically be sufficient to recover the residual lost intermediate symbols. More specifically, this precoding strategy encodes k message symbols into (1 + qβ)k coded symbols such that the original message fails to be recovered completely with a constant probability proportional to β^q. Here, each message symbol has a fixed degree q and chooses its neighbors uniformly at random from the qβk check symbols. The decoding is exactly the same as the one used for tornado codes. Finally, it is shown in [14] that a missing random β fraction of the original k message symbols can be recovered from a random 1 − β fraction of the check symbols with success probability 1 − β^q.
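The 1 − β^q claim is easy to probe empirically. Below is a minimal Monte Carlo sketch of my own: each message symbol picks q neighbors uniformly at random among the qβk check symbols, a random β fraction of message symbols and of check symbols is erased, and peeling recovers a missing message symbol whenever it is the only erased neighbor of a surviving check symbol. Finite-length results will of course deviate somewhat from the asymptotic figure; all parameter values are arbitrary.
\begin{verbatim}
import random

def precode_trial(k=2000, q=3, beta=0.1, rng=random):
    # True if the peeling decoder recovers every erased message symbol.
    n_chk = round(q * beta * k)
    nbrs = [[] for _ in range(n_chk)]        # check -> message neighbors
    for msg in range(k):
        for c in rng.sample(range(n_chk), q):
            nbrs[c].append(msg)
    erased = set(rng.sample(range(k), round(beta * k)))
    alive = rng.sample(range(n_chk), round((1 - beta) * n_chk))
    progress = True
    while progress and erased:
        progress = False
        for c in alive:
            missing = [m for m in nbrs[c] if m in erased]
            if len(missing) == 1:            # check's XOR reveals the lone symbol
                erased.discard(missing[0])
                progress = True
    return not erased

random.seed(1)
trials = 200
rate = sum(precode_trial() for _ in range(trials)) / trials
print("empirical:", rate, "asymptotic claim:", 1 - 0.1**3)
\end{verbatim}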
LDPC codes are one of the most powerful coding techniques of our age, equipped with easy encoding and decoding algorithms. Their capacity-approaching performance and parallel implementation potential make them one of the prominent options for precoding basic LT codes. This approach was primarily realized with the introduction of Raptor codes, in which the author proposed a special class of (irregular) LDPC precodes to be used with linear time encodable/decodable LT codes. A bipartite graph also constitutes the basis for LDPC codes. Unlike their dual, i.e., fountain codes, LDPC check nodes constrain the sum of the neighboring values (variable nodes) to be zero. An example is shown in Fig. 4.3. The decoding algorithm is very similar to the BP decoding given for LT codes, although here a degree-one node is not necessary for the decoding operation to commence. For the BP decoder to continue, at each iteration of the algorithm there must be at least one check node with at most one
Figure 4.3: An example LDPC code bipartite graph with variable nodes y_0, . . . , y_6 and check equations y_0 + y_1 + y_2 + y_3 = 0, y_0 + y_2 + y_4 + y_5 = 0, y_4 + y_5 + y_6 = 0, and y_1 + y_3 + y_6 = 0.
edge connected to an erasure. The details of LDPC codes are beyond the scope of this note; I highly encourage the interested reader to look into reference [28] for details.
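To make the peeling rule concrete, here is a tiny sketch that decodes erasures on the example code of Fig. 4.3; the checks are read off the figure, while the erasure pattern is an arbitrary choice of mine.
\begin{verbatim}
# Peeling (BP) erasure decoder for the example LDPC code of Fig. 4.3.
checks = [(0, 1, 2, 3), (0, 2, 4, 5), (4, 5, 6), (1, 3, 6)]

def peel(y):
    # y holds bits, with None marking an erasure; decode in place if possible.
    progress = True
    while progress:
        progress = False
        for chk in checks:
            missing = [i for i in chk if y[i] is None]
            if len(missing) == 1:
                # The lone erased neighbor equals the mod-2 sum of the others.
                y[missing[0]] = sum(y[j] for j in chk if j != missing[0]) % 2
                progress = True
    return y

# All-zero codeword with y_0, y_4 and y_6 erased: all three are recovered.
print(peel([None, 0, 0, 0, None, 0, None]))
\end{verbatim}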
Asymptotic analysis of LDPC codes reveals that these codes under BP decoding have a threshold erasure probability ϵ₀* [10] below which error-free decoding is possible. Therefore, an unrecovered fraction γ < ϵ₀* shall yield error-free decoding in the context of asymptotically good concatenated fountain codes. However, practical codes are finite length, and therefore a finite length analysis of LDPC codes is of great significance for predicting the performance of finite length concatenated fountain codes which use LDPC codes in their precoding stage. In the finite length analysis of LDPC codes under BP decoding over the BEC, the following definition is key.
Definition 4: (Stopping Sets) In a given bipartite graph of an LDPC code, a stopping set S is a subset of variable nodes (or an element of the power set⁸ of the set of variable nodes) such that every check node has zero, two, or more connections (through the graph induced by S) with the variable nodes in S.
Since the union of stopping sets is another stopping set, it can be shown (Lemma 1.1 of [10]) that any subset of the set of variable nodes has a unique maximal stopping set (which may be an empty set or a union of smaller stopping sets). Exact formulations exist, for example, for (l , r )-regular LDPC code ensembles [10]. Based on these, block as well as bit level erasure rates under BP decoding have been derived. This is quantified in the following theorem.
Theorem 5: For a given γ fraction of erasures, an (l , r )-regular LDPC code with block length n has the following block erasure and bit erasure probabilities after the BP decoder is run on the received sequence.
\[
P_{block,e} = \sum_{i=0}^{n} \binom{n}{i} \chi_i \gamma^i (1 - \gamma)^{n-i} \qquad (4.20)
\]
8 The power set of any set S is the set of all subsets of S, denoted by P (S), including the empty set as well as S
itself.
where χ_i = 1 − N(i , n/r, 0)/T(i , n/r, 0) for i ≤ n/r − 1 and χ_i = 1 otherwise, whereas
\[
P_{bit,e} = \sum_{i=0}^{n} \binom{n}{i} \gamma^i (1 - \gamma)^{n-i} \sum_{s=0}^{i} \binom{i}{s} \frac{s\, O(i, s, n/r, 0)}{n\, T(i, n/r, 0)} \qquad (4.21)
\]
where also
\[
T(i, n/r, 0) = \binom{n}{il} (il)!, \qquad N(i, n/r, 0) = T(i, n/r, 0) - \sum_{s>0} \binom{i}{s} O(i, s, n/r, 0) \qquad (4.22)
\]
\[
O(i, s, n/r, 0) = \sum_{j} \binom{n/r}{j} \operatorname{coef}\!\left( \left( (1 + x)^r - 1 - rx \right)^j, x^{sl} \right) (sl)!\, N(i - s, n/r - j, jr - sl) \qquad (4.23)
\]
The proof of this theorem can be found in [10]. The ML decoding performance of general LDPC code ensembles is also considered in the same study. Later, more improvements were made to this formulation for accuracy; in fact, this exact formulation is shown in [29] to be a slight underestimator. The generalization of this method to irregular ensembles is observed to be hard. Yet, very good upper bounds have been found on the probability that the induced bipartite graph has a maximal stopping set of size s, such as in [17]. Irregular LDPC codes are usually the de facto choice for precoding due to their capacity-achieving performance curves with fast fall-off, although they can easily show error floors and might be more complex to implement compared to regular LDPC codes.
Let us begin by giving some background information about Hamming codes before discussing their potential use within the context of concatenated fountain codes.
BACKGROUND One of the earliest well-known linear codes is the Hamming code. Hamming codes are defined by a parity check matrix and are able to correct a single bit error. For any integer r ≥ 2, a conventional Hamming code assumes an r × (2^r − 1) parity check matrix whose columns are the binary representations of the numbers 1, . . . , 2^r − 1, i.e., all distinct nonzero r -tuples. Therefore, a binary Hamming code is a (2^r − 1, 2^r − 1 − r, 3) code for any integer r ≥ 2, defined over F_2^{2^r − 1}, where 3 denotes the minimum distance of the code. Consider the following Hamming code example for r = 3, whose columns are the binary representations of the numbers 1, 2, . . . , 7,
\[
H_{ham} = \begin{pmatrix} 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{pmatrix}_{3 \times 7}
\]
Since column permutations do not change the code's properties, we can have the following parity check matrix and the corresponding generator matrix,
\[
H_{ham} = \begin{pmatrix} 1 & 1 & 0 & 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 1 & 1 & 0 & 0 & 1 \end{pmatrix}_{3 \times 7} \Longleftrightarrow G_{ham} = \begin{pmatrix} 1 & 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 1 & 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 0 & 1 \end{pmatrix}_{4 \times 7}
\]
from which we can see that the encoding operation is linear time. Hamming codes can be extended to include one more parity check bit at the end of each valid codeword, such that the parity check matrix will have the following form,
\[
H_{ex} = \left( \begin{array}{c|c} H_{ham} & \begin{matrix} 0 \\ \vdots \\ 0 \end{matrix} \\ \hline 1 \; 1 \; 1 \; \cdots \; 1 \; 1 \; 1 & 1 \end{array} \right)_{(r+1) \times 2^r}
\]
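The construction is mechanical enough to script. Below is a small sketch of mine that builds the r × (2^r − 1) parity check matrix column by column and appends the extra all-ones parity row of the extended code.
\begin{verbatim}
def hamming_H(r):
    # Column j (0-based) is the binary representation of j + 1, MSB in row 0.
    return [[((j + 1) >> (r - 1 - row)) & 1 for j in range(2**r - 1)]
            for row in range(r)]

def extended_H(r):
    # Append a zero column for the extra parity bit and an all-ones row.
    H = [row + [0] for row in hamming_H(r)]
    H.append([1] * (2**r))
    return H

for row in hamming_H(3):
    print(row)
# [0, 0, 0, 1, 1, 1, 1]
# [0, 1, 1, 0, 0, 1, 1]
# [1, 0, 1, 0, 1, 0, 1]
\end{verbatim}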
ERASURE DECODING PERFORMANCE In this subsection we are more interested in the erasure decoding performance of Hamming codes. As mentioned before, r is a design parameter (the number of parity symbols), and the maximum number of erasures (over any pattern of erasures) that a Hamming code can correct cannot be larger than r . This is because if there were more than r erasures, some two valid codewords would be indistinguishable. It is, however, true that a Hamming code can correct any erasure pattern with 2 or fewer erasures. Likewise, the extended Hamming code can correct any erasure pattern with 3 or fewer erasures. The following theorem from [30] is useful for determining the number of erasure patterns of weight τ ≤ r that a Hamming code of length 2^r − 1 can tolerate.
Theorem 6: Let B be an r × τ matrix whose columns are chosen from the columns of a parity check matrix H_ham of length 2^r − 1, where 1 ≤ τ ≤ r . Then the number of matrices B such that rank(B) = τ is equal to
\[
B(\tau, r) = \frac{1}{\tau!} \prod_{i=0}^{\tau-1} (2^r - 2^i) \qquad (4.24)
\]
and furthermore the generating function for the number of correctable erasure patterns for this Hamming code is given by
\[
g(s, r) = \binom{2^r - 1}{1} s + \binom{2^r - 1}{2} s^2 + \sum_{\tau=3}^{r} \frac{s^\tau}{\tau!} \prod_{i=0}^{\tau-1} (2^r - 2^i) \qquad (4.25)
\]
PROOF: The columns of H_ham are the elements of the r -dimensional binary space except the all-zero tuple. The number of matrices B constructed from distinct τ columns of H_ham, having rank(B) = τ, is equal to the number of different bases of τ-dimensional subspaces. Let {b_1, b_2, . . . , b_τ} denote such a set of basis vectors. The number of such sets can be determined by the following procedure,
• Select b_1 to be any nonzero column; there are 2^r − 1 = 2^r − 2^0 choices.
• For i = 2, 3, . . . , τ, select b_i such that it does not lie in the subspace spanned by the previously chosen i − 1 basis vectors, i.e., {b_1, b_2, . . . , b_{i−1}}. It is clear that there are 2^r − 2^{i−1} choices for b_i, i = 2, 3, . . . , τ.
Since the ordering of the basis vectors b_i is irrelevant, we exclude the different orderings from our calculations and finally arrive at B(τ, r ) given in equation (4.24). Finally, we notice that g(1, r ) = $\binom{2^r-1}{1}$ and g(2, r ) = $\binom{2^r-1}{2}$, meaning that all patterns with one or two erasures can be corrected by a Hamming code.
Hamming codes are usually used to constitute the very first stage of the precoding of concatenated fountain codes. The main reason for choosing a conventional or an extended Hamming code as the precoder is to help the consecutive graph-based code (usually an LDPC code) with its small stopping sets. For example, original Raptor codes use extended Hamming codes to reduce the effect of stopping sets of very small size due to the irregular LDPC-based precoding [17].
Exercise 6: It might be a good exercise to derive the equivalent result of Theorem 6 for extended binary Hamming as well as q-ary Hamming codes. Non-binary codes might be beneficial for symbol/object level erasure corrections in future generation fountain code-based storage systems. See Section 5 for more on this.
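Theorem 6 also lends itself to a quick numerical check, which could serve as a starting point for the exercise. The sketch below (helper names are mine) decides the correctability of an erasure pattern by testing the linear independence over F_2 of the erased columns, encoding the column at position i simply as the integer i + 1, and compares the brute-force counts per weight τ against B(τ, r ) of equation (4.24).
\begin{verbatim}
from itertools import combinations
from math import factorial

def gf2_rank(vecs):
    # Rank over F2 of bit vectors encoded as ints (a classic XOR basis).
    pivots = {}                       # leading-bit position -> basis vector
    for v in vecs:
        while v:
            lead = v.bit_length() - 1
            if lead in pivots:
                v ^= pivots[lead]     # reduce by the existing basis vector
            else:
                pivots[lead] = v      # a new independent direction
                break
    return len(pivots)

def correctable(r, erased):
    # Erasure positions (0-based) are correctable iff the corresponding
    # columns of H_ham, i.e., the integers i + 1, are linearly independent.
    cols = [i + 1 for i in erased]
    return gf2_rank(cols) == len(cols)

def B(tau, r):
    # Number of correctable weight-tau erasure patterns, equation (4.24).
    prod = 1
    for i in range(tau):
        prod *= 2**r - 2**i
    return prod // factorial(tau)

r, n = 3, 2**3 - 1
for tau in range(1, r + 1):
    brute = sum(correctable(r, pat) for pat in combinations(range(n), tau))
    print(tau, brute, B(tau, r))      # brute force and formula agree
\end{verbatim}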
The original purpose of concatenating a standard LT code with accumulate codes [31] is to make systematic Accumulate LT (ALT) codes as efficient as standard LT encoding while maintaining the same performance. It is claimed in the original study [32] that additional precodings (such as LDPC) applied to accumulate LT codes (the authors call these doped ALT codes) may render the overall code more amenable to joint optimization and thus result in better asymptotic performance. The major drawback, however, is that these precoded accumulate LT codes demonstrate near-capacity-achieving performance only if the encoder has information about the erasure rate of the channel. This means that the rateless property may not have been fully utilized in that context. Recent developments⁹ in Raptor coding technology and decoder design make this approach somewhat uncompetitive for practical applications.
⁹ Please see Section 5. There have been many advancements in Raptor coding technology before and after the company Digital Fountain (DF) was acquired by Qualcomm in 2009. For the complete history, visit Qualcomm's web site and search for RaptorQ technology.
where
\[
\theta_P(j) = \sum_{k=j+1}^{P-1} \frac{s_k}{n_k}
\]
for 2 ≤ j ≤ P − 1.
PROOF: Let us condition on the l recovered symbols for 0 ≤ l ≤ n_P after BP decoding of the LT code. Also, let us condition on the s_2, . . . , s_P erasures that are corrected by each stage of the precoding. Note that at the P -th precoding stage, after decoding, there shall be (l + s_P) recovered symbols out of n_P. Since we assume these recovered symbols are scattered uniformly at random across the codeword, the number of recovered symbols in the (P − 1)th stage precoding (i.e., within the codeword of length n_{P−1}) is (n_{P−1}/n_P)(l + s_P). Thus, the number of additional recovered symbols at the (P − 1)th precoding stage satisfies 0 ≤ s_{P−1} ≤ n_{P−1} − (n_{P−1}/n_P)(l + s_P). In general, for 2 ≤ j ≤ P − 1, we have
general for 2 ≤ j ≤ P − 1, we have
( ( ))
nj n j +1 n P −1
0 ≤ sj ≤ nj − s j +1 + ···+ (l + s P ) . . . (4.26)
n j +1 n j +2 nP
( )
s j +1 s j +2 s P −1 l + s P
= nj 1− − −···− − (4.27)
n j +1 n j +2 n P −1 nP
( )
l + sP
= n j 1 − θP ( j ) − (4.28)
nP
where we note that the upper bounds on the s_j also determine the limits of the sums in the expression.
Since the failure probability of each coding stage is assumed to be independent, we multiply the failure probabilities together, given that we have l , s_P, . . . , s_2 recovered symbols just before the last precoding stage that decodes the k information symbols. In order for the whole process to fail, the last stage of the decoding cascade must fail. Conditioning on the number of erasures already corrected by the rest of the precoding and LT coding stages, the number of remaining unrecovered message symbols can be calculated using equation (4.28) with j = 1. The probability that s_1 = n_1(1 − θ_P(1) − (l + s_P)/n_P) is given by q_{n_1(1 − θ_P(1) − (l + s_P)/n_P)}. Therefore, if s_1 ≠ n_1(1 − θ_P(1) − (l + s_P)/n_P), the whole process shall fail, and the last probability expression 1 − q_{n_1(1 − θ_P(1) − (l + s_P)/n_P)} follows by the product rule. Finally, we sum over l , s_P, . . . , s_2 to find the unconditional failure probability of the concatenated fountain code decoding process.
As a special case, for P = 1, i.e., when we have only one precoding stage, the failure probability reduces to
\[
\sum_{l=0}^{n_1} p_l \left( 1 - q_{n_1 - l} \right) \qquad (4.29)
\]
where we implicitly set s_1 = 0. Note that this is exactly the same expression given in [17]; here we have formulated the expression for the general case.
Using the result of Theorem 7 and previously developed expressions like equations (4.25) and (4.21), the failure probability of a concatenated fountain code composed of a Hamming, an LDPC, and an LT code can be found. If the performance of the LDPC code can be upper bounded, the result of Theorem 7 can still be used to find tight upper bounds for the concatenated fountain code of interest.
which, however, leads to significant inefficiencies regarding the overhead of the code. In order to design a fountain code with low overhead, the P matrix must have non-sparse columns for binary concatenated fountain codes. In a number of previous studies [40], it is shown that this objective is achievable for non-binary fountain codes at the expense of more complex encoding and decoding operations, i.e., Gaussian elimination, etc. For low complexity operation, codes that operate on binary logic with an associated low complexity decoding algorithm are more desirable. Therefore, the form of the G matrix in equation (4.30) is not very appropriate for achieving our design goal. The approach we adopt is to design the generator matrix similarly to the way we design for non-systematic fountain coding, and then turn the encoding operations upside down a bit to allow systematic encoding.
Let us consider a bipartite graph G that generates the fountain code with k source symbols and n output symbols. We call a subgraph G* ⊆ G BP-decodable if and only if the message symbols can be decoded using only the check symbols adjacent to G*. Suppose that for a given check node degree distribution Ω(x), we are able to find a BP-decodable G* with k adjacent check symbols (note that this is the minimum size of a subgraph with which BP might result in a decoding success). Supposing that such a G* exists, the systematic LT encoding can be summarized as follows,
• Run BP decoding to find the auxiliary symbol values (here, due to the structure of G*, we know that BP decoding is successful).
Figure 4.4: Let G* be a BP-decodable subgraph of G. The first k coded symbols are assumed to be message symbols, from which we compute (using the BP algorithm or back substitution) the auxiliary symbols shown as white circles, from which in turn the redundant symbols are calculated.
• Step 1: Generate the degrees of the other check symbol nodes according to Ω*(x) = (x/(1 − Ω₁))(Ω(x)/x − Ω₁), where Ω(x) is the degree distribution used for the non-systematic code. Let V = [1 2 d_3 . . . d_k] be the set of node degrees generated.
♢ Step 1 Check: Let d(i) be the i-th minimum of V; if for all i = 1, 2, . . . , k we have d(i) ≤ i, then we say V can generate an appropriate G* and continue with Step 2. Otherwise, go back to Step 1.
• Step 2: Since the degree-one probability is zero under the new distribution, we can decode at most one auxiliary block per iteration.
♢ Step 2 Edge Selections: The edge connections are made such that in each iteration exactly one auxiliary block is recovered. This is ensured by the following procedure.
Figure 4.5: Probability of Step 1 Check failure versus the message length k, for the degree distribution Ω(x) in Eqn. (3.55) and for the Soliton distribution (the joint probability of order statistics under different degree distributions).
For iteration i = 3 to k:
The i-th minimum degree coded symbol makes d(i) − 1 connections with the i − 1 recovered auxiliary symbols and one connection with the rest of the k − i + 1 auxiliary symbols. Note that this is possible due to the Step 1 Check and that at the i-th iteration there are exactly i − 1 recovered auxiliary symbols.
Figure 4.6: An example is shown for k = 5. At each step, the auxiliary block to be recovered is labeled gray. Connections are made such that in each iteration exactly one auxiliary block is recovered. If there is more than one option for these edge selections, we select uniformly at random to keep the variable node degree distribution as close to Poisson as possible.
Figure 4.7: Edge eliminations are performed based on the subgraph g ⊆ G each time the BP algorithm gets stuck and needs more edge eliminations.
We note that if we are unable to satisfy the condition of Step 1 with high probability, we may have to change the check node degree distribution, and it may no longer be close to Ω(x). To explore this condition, let X₁, X₂, . . . , X_k be a sequence of independent and identically distributed (∼ Ω(x)) discrete random variables. Consider the order statistics and define X_(i) to be the i-th minimum of these random variables. Then we are interested in the joint probability of the order statistics, Pr{X_(3) ≤ 3, X_(4) ≤ 4, . . . , X_(k) ≤ k}. It can be shown that for check node degree distributions that allow linear time encoding/decoding, this probability is close to one as the message length grows. To give the reader a numerical answer, let us compute this probability as a function of the message block length k for the check node degree distribution of equation (3.55) as well as for the Soliton degree distribution. Note that the degree distribution in equation (3.55) allows linear time encoding/decoding whereas the Soliton distribution does not. Results are shown in Fig. 4.5. We observe that both distributions allow a gradual decrease in failure probability as k increases. For example, for k = 1000 using the Soliton distribution, the failure rate is around 3 × 10⁻⁷, although this is not observable in the figure. The degree distribution Ω*(x) may have to be modified for very large values of k to keep the probability of choosing degree one non-zero.
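A Monte Carlo version of this computation is straightforward: draw k degrees i.i.d. from the distribution (with the first two fixed to 1 and 2, as in Step 1), sort them, and test the order statistics condition d(i) ≤ i. The sketch below, my own illustration with an arbitrary trial count, does this for the ideal Soliton distribution.
\begin{verbatim}
import math, random

def soliton_sample(k, rng):
    # Ideal Soliton: P(1) = 1/k, P(i) = 1/(i(i-1)) for i = 2..k;
    # inverse-CDF sampling using P(d <= i) = 1/k + 1 - 1/i.
    u = rng.random()
    if u < 1 / k:
        return 1
    return min(k, math.ceil(1 / (1 + 1 / k - u)))

def step1_check_fails(k, rng):
    # V = [1, 2, d_3, ..., d_k]; failure if the i-th minimum exceeds i.
    degs = sorted([1, 2] + [soliton_sample(k, rng) for _ in range(k - 2)])
    return any(d > i for i, d in enumerate(degs, start=1))

rng = random.Random(0)
for k in (20, 50, 100):
    fails = sum(step1_check_fails(k, rng) for _ in range(20_000))
    print(k, fails / 20_000)
\end{verbatim}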
Let us consider an example for k = 5 and the degree vector v = [1 2 5 2 3]. It is not hard to show that v can generate an appropriate G*. An example of the edge selections is shown in Fig. 4.6. Such a selection is of course not unique. Since some of the selections are not made randomly, this may leave the auxiliary node degree distribution slightly skewed compared to Poisson. A programmatic sketch of this construction is given below.
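The following is my own rendering of Steps 1–2 with uniformly random tie-breaking; it constructs one admissible G* for the example vector v = [1 2 5 2 3], where the check of degree d(i) processed at iteration i connects to d(i) − 1 already recovered auxiliary symbols and to exactly one unrecovered one.
\begin{verbatim}
import random

def build_Gstar(v, rng=random):
    # Returns check -> auxiliary neighbor lists plus the recovery order.
    k = len(v)
    order = sorted(range(k), key=lambda c: v[c])   # checks by ascending degree
    recovered, edges = [], {}
    for i, c in enumerate(order, start=1):
        assert v[c] <= i, "Step 1 Check violated"
        new = rng.choice([a for a in range(k) if a not in recovered])
        edges[c] = sorted(rng.sample(recovered, v[c] - 1) + [new])
        recovered.append(new)                      # exactly one recovery/iter
    return edges, recovered

random.seed(3)
edges, order = build_Gstar([1, 2, 5, 2, 3])
print("check -> auxiliary neighbors:", edges)
print("auxiliary recovery order:", order)
\end{verbatim}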
Although the presented approach generates an appropriate G*, it has its own problems with regard to the preservation of node degree distributions. A better approach could be to generate a generator matrix G as in non-systematic coding, and then eliminate a proper subset of edges so that the BP decoding algorithm can be successful. However, a problem with this approach is that the degree distributions of concatenated fountain codes do not allow the full recovery of the message symbols by performing LT decoding only. Thus, no matter how well we eliminate edges in the graph, the BP decoding will not be successful with high probability unless we use a very large overhead ϵ (which may have a dramatic effect on the node degree distributions). An alternative is to insert random edges into the graph G so that each variable node makes a connection with the check nodes and their decoding becomes possible. This process corresponds to inserting ones into the appropriate locations in G so
Figure 4.8: Systematic encoding of concatenated fountain codes: an LDPC precode concatenated with an LT base code. In a), the k data symbols are LDPC-encoded into [Data | P_LDPC], systematic LT decoding (the G* matrix) produces the auxiliary symbols, and conventional LT encoding then generates [Data | P_LDPC | P_LT] of total length n. In b), systematic LT decoding (the G* matrix) is applied to the data first to obtain the auxiliary symbols, which are LDPC-encoded into [Aux | P_LDPC], and conventional LT encoding then produces [Data | P_LT] of total length n.
Figure 5.1: Degree-two and degree-four check symbols disappear; however, the connection properties of the message nodes through these check nodes are preserved.
parity symbols are denoted by P_LDPC. A G* is generated according to one of the methods discussed earlier. Using this generator matrix, the LDPC codeword is decoded using the BP algorithm to form the k′ auxiliary symbols. Note that the BP decoding is successful due to the special structure of G*. Lastly, the final parity redundancy is generated as in the usual non-systematic LT encoding. Decoding works in the exact reverse direction. First, a large fraction of the auxiliary symbols is recovered (not completely, due to the linear complexity encoding/decoding constraint) through LT decoding by collecting a sufficient number of encoded symbols. Next, the data and LDPC parity symbols are recovered through LT encoding using only the recovered/available auxiliary symbols. Consequently, any remaining unrecovered data symbols are recovered by systematic LDPC decoding.
Fig. 4.8 b) shows an alternative way of constructing a systematic fountain code. However, this method differs significantly from the former one in terms of decoding performance. It is not hard to realize that in the latter method the data section can only help decode the auxiliary symbols, whereas the parities help decode all of the intermediate symbols, both auxiliary symbols and parities, due to the LDPC coding. This leads to ineffective recovery of the LDPC codeword symbols. Therefore, the former method can be preferable if the decoding performance is the primary design objective.
5 ADVANCED TOPICS
In this section, we will discuss some of the advanced topics related to the development of modern fountain codes and their applications in communications and cloud storage systems. We are by no means rigorous in this section, as the content details can be found in the cited references. The main objective is rather to give intuition and the fundamentals of the system workflow; many more details related to the subject are left untouched and are not included in this document.
j1 • If kp_e < 1, then a graph in G(k, p_e) will almost surely contain only connected components of size no larger than O(ln(k)).
j2 • If kp_e = 1, then a graph in G(k, p_e) will almost surely have a largest connected component of size O(k^{2/3}).
j3 • If kp_e = C > 1 for some constant C ∈ ℝ, then a graph in G(k, p_e) will almost surely have a unique giant component containing a constant fraction of the k nodes. The rest of the components are finite, of size no larger than O(ln(k)).
Furthermore, regarding the connectedness of the whole random graph, they proved for any fixed ε > 0 that
j4 • If kp_e < (1 − ε) ln(k), then a graph in G(k, p_e) will almost surely be disconnected.
j5 • If kp_e > (1 + ε) ln(k), then a graph in G(k, p_e) will almost surely be connected.
Next, we shall show that our conclusions of Section 3.2, and particularly the derivation of the Soliton distribution, have interesting connections with the set of results in [33]. First, we note that the graph representing the fountain code is a bipartite graph. We transform this bipartite graph G into a general graph G* containing only message nodes in the following way. For a degree-d check node, we consider all possible variable node pair connections and thus draw $\binom{d}{2}$ edges between the associated variable nodes (the vertices of G*_d, where the subscript specifies the degree d). This procedure is shown for degree-two and degree-four check symbols in Fig. 5.1. The transformed graph G* is thus given by G* = ∪_d G*_d.
For a given check node degree distribution Ω(x) = Σ_{d=1}^{k} Ω_d x^d, the expected number of edges due to degree-d check symbols only is $\binom{d}{2}$ Ω_d n. Since asymptotically the actual number of edges converges to the expected value, the probability that an edge exists between two message symbols in G*_d can be computed to be of the form
\[
p_e(d) = \frac{\binom{d}{2} \Omega_d n}{\binom{k}{2}} = \frac{d(d-1)\Omega_d (1 + \epsilon)}{k - 1} \qquad (5.1)
\]
if we assume the subgraph induced by using only degree-d check symbols contains no cycles. Of course, this assumption might hold for low d, such as d = 2, with high probability. This
Figure 5.2: A set of degree-two symbols induces a connected graph that can correct every
message node if a single degree-one check node is connected to any of the mes-
sage symbols.
probability gets lower for large d; to see this, consider the extreme case d = k. Now consider the BP algorithm and the degree-two check symbols. It is easy to see that the graph G*₂ allows successful decoding of the whole message block if G*₂ is connected and a single degree-one check node is connected to any of the nodes of G*₂ ¹⁰. The requirement that G*₂ be connected is too tight and impractical. We rather impose the connectedness constraint on the overall graph G*. For this to happen, the average degree of a node in the Erdös-Rényi random graph model must be O(ln(k)) (see j5), and hence the average number of edges in the graph G* must be O(k ln(k)), as established by the arguments of Section 3.
Relaxing the connectedness condition, we can alternatively talk about decoding the giant component of G*₂ by making a single degree-one check node connect with any one of the nodes of the giant component of G*₂. In order to have a giant component in G*₂, we must have (according to the Erdös-Rényi random graph model, j3),
\[
k p_e(2) = \frac{2 k \Omega_2 (1 + \epsilon)}{k - 1} > 1 \implies \Omega_2 \ge \frac{1}{2} \;\; \text{for } k \to \infty \qquad (5.2)
\]
In this regime, the fraction of the message nodes that belong to the giant component of a random graph with mean degree m is given by [33]
\[
\phi(m) = 1 - \frac{1}{m} \sum_{i=1}^{\infty} \frac{i^{i-1}}{i!} \left( m e^{-m} \right)^i. \qquad (5.3)
\]
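The fraction ϕ(m) is easy to evaluate numerically. The sketch below truncates the series in (5.3) and cross-checks it against the well-known fixed point S = 1 − e^{−mS} for the giant component fraction of a random graph with mean degree m; the truncation depth and test values are my own choices.
\begin{verbatim}
import math

def phi_series(m, terms=400):
    # Equation (5.3), truncated; the terms decay geometrically for m != 1.
    x = m * math.exp(-m)
    s = sum(math.exp((i - 1) * math.log(i) - math.lgamma(i + 1)
                     + i * math.log(x))
            for i in range(1, terms + 1))
    return 1 - s / m

def phi_fixed_point(m, iters=200):
    # Cross-check: the giant component fraction solves S = 1 - exp(-m*S).
    s = 1.0
    for _ in range(iters):
        s = 1 - math.exp(-m * s)
    return s

for m in (1.5, 2.0, 3.0):
    print(m, round(phi_series(m), 6), round(phi_fixed_point(m), 6))
\end{verbatim}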
For convenience, we drop the functional dependence and refer to this fraction as ϕ. Let us consider the remaining (1 − ϕ)k message nodes and eliminate the edges connected to the already decoded ϕk message values. This elimination changes the check node degree distribution; we denote this modified degree distribution by Ω_ϕ(x) from now on. Suppose a check node originally has degree d, and its degree is reduced to i after the edge eliminations. This
¹⁰ In fact, the node to which this connection is made may change the number of iterations as well as the order of decoded message symbols. The order and the number of iterations of the BP algorithm can be very important in a number of applications [20], [21], [22].
Figure 5.3: A ϕ fraction of the k message symbols constitutes a giant component of the random graph. Eliminating the edges of the check nodes that have connections with that giant component induces a modified degree distribution Ω_ϕ(x).
conditional probability is simply given by the binomial distribution (since the edge selections have been made independently),
\[
\Pr\{\text{reduced degree} = i \mid \text{original degree} = d\} = \binom{d}{i} (1 - \phi)^i \phi^{d-i} \qquad (5.4)
\]
for i = 0, 1, . . . , d. Using standard averaging arguments, we can find the unconditional modified degree distribution to be of the form,
\[
\Omega_\phi(x) = \sum_{d=1}^{k} \Omega_d \sum_{i=0}^{d} \binom{d}{i} (1 - \phi)^i \phi^{d-i} x^i = \sum_{d=1}^{k} \Omega_d \sum_{i=0}^{d} \binom{d}{i} ((1 - \phi) x)^i \phi^{d-i} \qquad (5.5)
\]
\[
= \sum_{d=1}^{k} \Omega_d ((1 - \phi) x + \phi)^d \qquad (5.6)
\]
\[
= \Omega((1 - \phi) x + \phi) \qquad (5.7)
\]
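The binomial-thinning identity (5.5)-(5.7) can be spot-checked by simulation: draw a degree d from Ω, delete each edge independently with probability ϕ, and compare the empirical reduced-degree histogram with the coefficients of Ω((1 − ϕ)x + ϕ). The toy distribution and parameters below are my own.
\begin{verbatim}
import random
from math import comb
from collections import Counter

random.seed(7)
omega = {1: 0.1, 2: 0.5, 3: 0.4}     # a toy check-degree distribution
phi = 0.3                             # fraction of already-decoded messages
trials = 200_000

counts = Counter()
for _ in range(trials):
    d = random.choices(list(omega), weights=list(omega.values()))[0]
    counts[sum(random.random() > phi for _ in range(d))] += 1

# Coefficients of Omega((1 - phi) x + phi), i.e., equation (5.7).
analytic = Counter()
for d, w in omega.items():
    for i in range(d + 1):
        analytic[i] += w * comb(d, i) * (1 - phi)**i * phi**(d - i)

for i in sorted(analytic):
    print(i, round(counts[i] / trials, 4), "vs", round(analytic[i], 4))
\end{verbatim}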
In order to find the number of reduced degree-two check nodes, we need to find the probability of having degree-two nodes based on Ω_ϕ(x) = Ω((1 − ϕ)x + ϕ). Let us look at the Taylor expansion of Ω_ϕ(x) at x = 0,
\[
\sum_{d=0}^{\infty} \frac{\Omega_\phi^{(d)}(0)}{d!} x^d = \Omega_\phi(0) + \frac{\Omega'_\phi(0)}{1!} x + \frac{\Omega''_\phi(0)}{2!} x^2 + \frac{\Omega^{(3)}_\phi(0)}{3!} x^3 + \ldots \qquad (5.8)
\]
\[
= \Omega(\phi) + \frac{(1 - \phi)\Omega'(\phi)}{1!} x + \frac{(1 - \phi)^2 \Omega''(\phi)}{2!} x^2 + \ldots \qquad (5.9)
\]
from which we find the probability of degree two to be (1 − ϕ)²Ω″(ϕ)/2!. In order to have a giant component among the remaining (1 − ϕ)k message nodes, we need to have
First of all, in the limit we realize that it is sufficient to have Ω₂ = 1/2 and (1 − ϕ)Ω″(ϕ) = 1 in order to obtain the two giant components, and hence an overwhelming portion of the message symbols shall be decoded successfully. Once we set these, we immediately realize that the equation in (5.11) is the same as equation (3.32). In fact, we have shown that the degree distribution Ω(x) satisfying both Ω₂ = 1/2 and (1 − ϕ)Ω″(ϕ) = 1 is the limiting form of the Soliton distribution. In summary, therefore, the degree-two check nodes of the Soliton distribution create a giant component and ensure the recovery of a ϕ fraction of the message symbols, whereas the majority of the remaining fraction is recovered by the reduced degree-two check nodes. Decoding to completion is ensured by the higher degree check nodes. This can be seen by applying the same procedure repeatedly a few more times, although the expressions become more complex to track.
Figure 5.4: The BP algorithm worked on the generator matrix of our example. Pink columns show the indices of the message symbols successfully decoded, i.e., x₁ and x₃ are recovered.
the unrecovered active message symbols can be solved using belief propagation based on the already decoded message symbols as well as the values of the inactivated message symbols. This process is guaranteed to recover all of the message symbols if the code is ML-decodable. This modification of the decoding algorithm may call for modifications in the degree distribution design as well. A degree distribution is good if the average degree of edges per coded symbol is constant and the number of inactivated message symbols is around O(√k), as this means that the total number of symbol operations for inactivation decoding is linear in k.
Let us give an example to clarify the idea behind inactivation decoding. Consider the following system of equations, where we would like to decode the message block x from y based on the decoding graph G defined by the following generator matrix,
\[
\begin{pmatrix} 0 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 1 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{pmatrix}
\]
In the first round of the BP algorithm, only decoder scheduling is performed, i.e., it is determined which symbols shall be decoded at which instant of the iterative BP algorithm. Let us summarize this scheduling through the generator matrix notion. The BP algorithm first finds a degree-one row and takes away the column that shares the one with that row. In order to proceed, there must be at least one degree-one row at each step of the algorithm. This is another way of interpreting the BP algorithm defined in Section 3.2. In our example, the BP algorithm executes two steps and gets stuck, i.e., two columns are eliminated in the decoding process, as shown in Fig. 5.4. At this point the BP algorithm is no longer able to continue because there is no degree-one coded symbol left after the edge eliminations (column eliminations). Inactivation decoding inactivates the second column (named i₁) and labels it "inactive", whereas the previously taken away columns are labeled "active". Next, we check whether BP decoding is able to continue. In our example, after the inactivation it continues its iterations without getting stuck a second time. Based on the decoding order of the message symbols, we reorder the rows and the columns of the generator matrix according to the numbering shown in Fig. 5.4. Once we do so, we
Figure 5.5: The BP algorithm worked on the generator matrix of our example. Elementary row operations are performed to obtain the suitable form of the generator matrix, where the last entry of the modified x vector is the message symbol that is inactivated.
obtain the matrix shown on the left in Fig. 5.5. Invoking elementary row operations on the final form of the generator matrix results in the system of equations shown on the right in the same figure. As can be seen, since we only have one inactivated message symbol, the right-bottom block of the row echelon form is of size 2 × 1. In general, if we have i inactivated message symbols, the right-bottom block of the row echelon form is of size (n − k + i) × i. Any invertible i × i submatrix would be enough to decode the inactivated message symbols. Considering our example, x₂ is given either by y₁ + y₄ + y₅ or by y₁ + y₂ + y₄ + y₆. Once we insert the value of x₂ into the unrecovered message symbols and run the BP algorithm (this time allowing actual decoding, i.e., XOR operations), successful decoding is guaranteed. In general, the (n − k + i) × i matrix is usually dense due to the elementary row operations. Hence, the complexity of the inversion operation shall be at least O(i²). Moreover, the row and column operations as well as the elimination steps take extra effort, at least on the order of O(i²), thereby making the overall operation at least O(i²). That is why, if i ≤ O(√k), linear-time decoding complexity shall still be maintained in the inactivation decoding context.
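The whole schedule is compact enough to sketch end to end. Assuming the 6 × 5 generator matrix of our example, the code below performs BP scheduling, inactivates the first stuck column whenever no degree-one row remains (my own tie-breaking rule), and, instead of explicitly inverting the dense submatrix, simply brute-forces the few inactivated unknowns against the leftover equations, which is an equivalent shortcut for this illustration.
\begin{verbatim}
from itertools import product

def inactivation_decode(G, y):
    # expr[j] = (mask, const): x_j equals the XOR of the inactivated unknowns
    # selected by mask, plus the constant bit const.
    n, k = len(G), len(G[0])
    expr, inact, used = [None] * k, [], [False] * n
    while any(e is None for e in expr):
        row = next((r for r in range(n) if not used[r] and
                    sum(G[r][j] and expr[j] is None for j in range(k)) == 1),
                   None)
        if row is None:                          # BP stuck: inactivate a column
            j = next(j for j in range(k) if expr[j] is None)
            expr[j] = (1 << len(inact), 0)       # a fresh symbolic unknown
            inact.append(j)
            continue
        mask, const, lone = 0, y[row], None      # resolve the lone unknown
        for j in range(k):
            if G[row][j]:
                if expr[j] is None:
                    lone = j
                else:
                    mask, const = mask ^ expr[j][0], const ^ expr[j][1]
        expr[lone], used[row] = (mask, const), True
    for bits in product((0, 1), repeat=len(inact)):   # solve the inactives
        def val(j):
            m, c = expr[j]
            return c ^ (sum(b for t, b in enumerate(bits) if (m >> t) & 1) % 2)
        x = [val(j) for j in range(k)]
        if all(sum(G[r][j] * x[j] for j in range(k)) % 2 == y[r]
               for r in range(n)):
            return x
    return None

G = [[0, 0, 0, 1, 1], [1, 0, 1, 0, 0], [1, 1, 1, 1, 0],
     [0, 0, 1, 0, 0], [0, 1, 1, 1, 1], [1, 1, 0, 1, 1]]
x_true = [1, 0, 1, 1, 0]
y = [sum(G[r][j] * x_true[j] for j in range(5)) % 2 for r in range(6)]
print(inactivation_decode(G, y) == x_true)   # True
\end{verbatim}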
Alternatively, some of the message symbols can be permanently labeled inactive (hence the name permanent inactivation) before the BP scheduling even takes place. Active and permanently inactive message symbols constitute two classes of message symbols, and different degree distributions can be used to generate their degrees and edge connections. This idea in fact resembles the generalization of LT coding covered in Section 3.6. The use of permanent inactivations is justified by the observation that the overhead-failure probability curve of the code so constructed is similar to that of a dense random binary fountain code, whereas the decoder matrix potentially has only a small number of dense columns. See [35] for more details.
tor code in [17]) is adopted by the 3GPP Multimedia Broadcast Multicast Service, IETF RFC 5053, and IP Datacast (DVB-IPDC) (ETSI TS 102 472 v1.2.1) for DVB-H and DVB-SH. The more advanced RaptorQ (RQ) code [35] is implemented and used by Qualcomm [36] in broadcast/multicast file delivery and fast data streaming applications. Additionally, online codes were used by the Amplidata object storage system [37] to efficiently and reliably store large data sets. We defer the discussion of fountain codes within the context of storage applications to the next subsection.
So far, our discussion has focused on the design of degree distributions and precode selections given the simple decoding algorithm, namely BP. The main objective, therefore, was to design the code such that decoding can be done efficiently. We have also seen in Section 3.1 that dense fountain codes under Gaussian elimination provide the best performance at the cost of a significantly more complex decoding process. Given this potential, the idea behind almost all advanced standardized fountain codes for communication applications has been to devise methods that bring the performance of a concatenated fountain code close to the ML decoding performance of a random dense fountain code while maintaining the most attractive feature: low complexity decoding.
A simple comparison reveals that the code design and degree distributions are functions of the decoding algorithm. R10 and RQ code designs are performed systematically and based on the idea of inactivation decoding. Thus, their designs are a little different from the standard LT and concatenated fountain codes presented so far. One of the observations was that using a small high-density submatrix inside the sparse generator matrix of the concatenated fountain code successfully mimics the behavior of a dense random linear fountain code. R10 mimics a dense random linear fountain code defined over F₂, whereas RQ mimics one defined over F₂₅₆. The generator matrices of these codes have the particular structure shown in Fig. 5.6. As can be seen, a sparse graph G is complemented by a denser matrix B (entries from F₂) in R10, and additionally by Q (entries from F₂₅₆) in RQ. The design of B is completely deterministic and consists of two submatrices, one for a sparse LDPC code and one for a dense parity check code. These matrices are designed so that the BP algorithm with inactivation decoding works well, where the majority of the design work is mostly
Figure 5.6: Adding dense rows to mimic the failure probability of a random dense fountain code [35]. The sparse graph G (of k′ columns) is complemented by dense rows B in R10, and additionally by Q in RQ; matrices G and B are binary and Q is non-binary.
based on heuristics. Two potential improvements of RQ over R10 are a steeper overhead-failure curve and a larger number of supported source symbols per encoded source block. RQ achieves this performance using permanent inactivation decoding and operations over larger field alphabets.
As has been established in [11] and [13], using higher field arithmetic significantly improves the ML decoding performance (also see Section 3.1), making RQ one of the most advanced and best performing fountain codes suitable for data streaming and communication applications. However, although many improvements have been made to make the whole encoding and decoding process linear time for R10 and RQ, their complexity is much higher than that of concatenated fountain codes defined over binary arithmetic and relying solely on the BP algorithm. Compared to large dense random fountain codes, however, R10 and RQ provide significant complexity savings while allowing exceptional recovery properties.
through fountain code encoding. Let µ_e and µ_d be the encoding and decoding cost per symbol, respectively. The decoding stage thus requires µ_d k symbol operations on average. Encoding the symbols in L takes an average of µ_e |L| symbol operations. The repair complexity is given by,
\[
C_R(\mathcal{L}) = \frac{\mu_d k + \mu_e |\mathcal{L}|}{|\mathcal{L}|} = \frac{\mu_d k}{|\mathcal{L}|} + \mu_e \qquad (5.12)
\]
which would have been efficient if |L| were on the order of k. However, in many practical cases a single failure or two are most frequent. Even if µ_d and µ_e are constants, the repair process will be inefficient for large k and small constant |L|, i.e., C_R(L) = O(k).
One can immediately realize that the main reason for inefficient repair is tied to the fact that the message symbols (the auxiliary symbols in the systematic form) are not readily available when a failure is attempted to be corrected, and thus the decoding of the whole message block (or the whole auxiliary block in the systematic form) is necessary. To remedy this, in [39] a copy of the message block is stored in addition to the non-systematic fountain code symbols. In this case, it is shown that the expected repair complexity for an arbitrary set L is at most (1 + ϵ)Ω′(1). For Raptor codes, for instance, C_R(L) = O(1); therefore, a constant repair complexity can be achieved. However, the overall cost is the increased overhead ϵ′ = 1 + ϵ, which does not vanish as k → ∞.
In [40], the existence of fountain codes that are systematic, have a vanishing overhead, and have a repair complexity C_R(L) = O(ln(k)) has been shown. These codes are defined over F_q, and the construction is based on the rank properties of the generator matrix. Since the BP algorithm is not applicable to the encoding and decoding, the complexity of the original encoding and decoding processes is high. The existence of systematic fountain codes, preferably defined over F₂, with very good overhead and repair complexity, while admitting low complexity encoding/decoding, is an open problem.
R EFERENCES
[1] C. E. Shannon, “A mathematical theory of communication," Bell Sys. Tech. J., vol. 27, pp. 379–423, July 1948 and pp. 623–656, Oct. 1948.
[2] D. J. C. MacKay, “Fountain codes," in IEE Proceedings Communications, vol. 152, no. 6,
2005, pp. 1062–1068.
[3] J. Perry, P. Iannucci, K. Fleming, H. Balakrishnan, and D. Shah. “Spinal Codes," In SIG-
COMM, Aug. 2012.
[5] A. S. Barbulescu and S. S. Pietrobon, “Rate compatible turbo codes," Electron. Let., pp. 535–536, 1995.
[9] M. Luby, “LT codes," in Proc. 29th Annu. ACM Symp. Theory of Computing, pp. 150–159,
2002.
[10] C. Di, D. Proietti, E. Telatar, T. Richardson, and R. Urbanke, “Finite Length Analysis of
Low-Density Parity-Check Codes," IEEE Trans. on Information Theory, pp. 1570– 1579,
Jun. 2002.
[11] G. Liva, E. Paolini, and M. Chiani, “Performance versus Overhead for Fountain Codes
over Fq ," IEEE Communications Letters, vol. 14, no. 2, pp. 178–180, 2010.
[13] B. Schotsch, R. Lupoaie, and P. Vary, “The Performance of Low-Density Random Lin-
ear Fountain Codes over Higher order Galois fields under Maximum Likelihood De-
coding," In Proc. 49th Annual Allerton Conf. on Commun., Control, and Computing,
Monticello, IL, USA, pp. 1004–1011, Sep. 2011.
[14] P. Maymounkov, “Online Codes," Secure Computer Systems Group, New York Univ., New
York, Tech. Rep. TR2002–833, 2002.
[15] M. Luby, M. Mitzenmacher, and A. Shokrollahi, “Analysis of Random Processes via And-
Or Tree Evaluation", In Symposium on Discrete Algorithms, 1998.
[16] Oliver C. Ibe, “Elements of Random Walk and Diffusion Processes", Wiley Series in Op-
erations Research and Management Science, 2013.
[17] A. Shokrollahi, “Raptor Codes," IEEE Trans. Inf. Theory, vol. 52, no. 6, pp. 2410–2423,
Jun. 2006.
[19] X. Yuan and L. Ping, “On systematic LT codes," IEEE Commun. Lett., vol. 12, pp. 681–
683, Sept. 2008.
[20] N. Rahnavard, B. N. Vellambi, and F. Fekri, “Rateless codes with unequal protection
property," IEEE Trans. Inf. Theory, vol. 53, no. 4, pp. 1521–1532, Apr. 2007.
[24] Ron M. Roth, “Introduction to Coding theory", Cambridge Univ. Press., 2006.
[25] Qualcomm Incorporated White Paper, “ Why Raptor Codes Are Better Than Reed-
Solomon Codes for Streaming Applications", Jul. 2013.
[27] J. Bloemer, M. Kalfane, M. Karpinski, R. Karp, M. Luby and D. Zuckerman, “An XOR-
Based Erasure-Resilient Coding Scheme", ICSI Technical Report, TR-95-048, August
1995.
[29] S. J. Johnson, “A Finite-Length Algorithm for LDPC Codes Without Repeated Edges on
the Binary Erasure Channel", IEEE Transactions on Information Theory, vol. 55, no. 1,
pp. 27–32, 2009.
[32] X. Yuan and L. Ping, “Doped Accumulate LT Codes," In Proc. IEEE Int. Symp. Inf. Theory
(ISIT) 2007, Nice, France, pp. 2001–2005, Jun. 2007.
[33] P. Erdös and A. Rényi, “On the Evolution of Random Graphs", Publications of the Math-
ematical Institute of the Hungarian Academy of Sciences 5: 17–61, 1960.
[34] P. Erdös and A. Rényi, “On Random Graphs", Publicationes Mathematicae 6 pp. 290–
297, 1959.
[35] A. Shokrollahi and M. Luby, “Raptor Codes", Foundations and Trends in Communica-
tions and Information Theory, vol. 6, no. 3-4, pp. 213–322, 2009.
[36] Qualcomm Incorporated White Paper, “ RaptorQ Technical Overview ", 2010. Available
online: http://www.qualcomm.com/solutions/multimedia/media-delivery/raptorq
[38] P. Gopalan, C. Huang, H. Simitci, and S. Yekhanin, “On the Locality of Codeword Sym-
bols," IEEE Transactions on Information Theory, vol. 58, no. 11, pp. 6925–6934, 2012.
[39] R. Gummadi and R. S. Sreenivas, “Erasure Codes with Efficient Repair," 2012. Available
online: http://www.stanford.edu/ gummadi/papers/repair.pdf
[40] M. Asteris and A. G. Dimakis, “Repairable Fountain Codes," In Proc. of 2012 IEEE Inter-
national Symposium on Information Theory, Cambridge, MA, July 2012.