Topics 2 To 5

n-Sequences, Independent Events and Error-Correcting Codes
Digital communication is fundamental to many things that we take for granted:
CDs, mobile phones and communicating with the Martian Rovers, for instance.
Error-correcting codes are used to enable errors, from fingerprints on CDs for
example, to be corrected unobtrusively. We begin the introduction to error-correcting codes in this topic.
2.1
n-Sequences
EXAMPLE 2.1.1.
The sequence (1,4,5,3,1) has n = 5 entries; the first entry is 1, the second entry
is 4, the third entry is 5, the fourth entry is 3 and the fifth entry is 1.
In the previous example notice that repeated entries in a sequence are
recorded, rather than being omitted as they are in a set. Thus the position
of an element in a sequence is very important and the sequences (1,4,5,3,1) and
(1,5,4,3,1) are different.
We will be using sequences to represent the messages that we want to send,
perhaps from a CD to the speakers, perhaps from Mars to the receivers on
Earth. We will have an alphabet of two symbols, 0 and 1, which we will think
of as the elements of Z2 . We will use these symbols to write messages, just as
we use the 26 letters of the alphabet to write messages in English.
EXAMPLE 2.1.2.
Suppose that we can have messages of length 3. Then there are 8 messages
(sequences of length 3 or 3-sequences) that are possible over Z2 . These are
(remember that we do not write the parentheses or the commas):
000, 001, 010, 011, 100, 101, 110, 111.
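These eight sequences are exactly the elements of Z2 × Z2 × Z2, so they are easy to list by machine. A minimal Python sketch (the function name is illustrative, not part of the notes):

```python
from itertools import product

def all_messages(alphabet, n):
    """Return every n-sequence over the given alphabet, written as a string."""
    return ["".join(map(str, seq)) for seq in product(alphabet, repeat=n)]

print(all_messages([0, 1], 3))
# ['000', '001', '010', '011', '100', '101', '110', '111']
```

Changing the alphabet to [0, 1, 2] produces the messages over Z3 asked for in the exercises below.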
2.1.1
Exercises
1. Give the four messages possible with binary 2-sequences and the 16 messages possible with binary 4-sequences.
2. Suppose that we use the elements of Z3 to be the alphabet. Give all
possible messages of lengths 2 and 3.
2.2
Suppose that the 8 messages given in Example 2.1.2 represent 8 colours and
that we send one of the 8 messages. We see that if even one digit is changed
then a different colour is received from the one that was sent. And perhaps even
more importantly, we have no way of knowing that the message that was sent
was not the message that was received. This is because we have no redundancy
in the message.
Suppose that we send one message. Suppose that each digit is correctly
received with probability p. Suppose that errors in transmission occur independently from digit to digit. Then the probability that the message is correctly
received is p^3. So if p = 0.9 then the probability that the message is correctly
received is 0.9^3 = 0.729; if p = 0.99 then the probability that the message is
correctly received is 0.99^3 = 0.970299.
Suppose that we want to transmit an image that has 20 × 20 pixels in it.
Each pixel represents one message with 3 digits. Thus the total message length
is 20 × 20 × 3 = 1200 and the probability that the whole message is correctly
received is p^1200. For p = 0.99 this is 0.000005784; in other words it is very
unlikely to happen. (Most pictures are more likely to be at least 200 × 400
pixels and have 2^12 = 4096 colours (the Voyager probe, for instance) so the
problem is actually much worse than we have indicated here.)
In English it is rarely true that omitting one letter, or even several, makes
a sentence unintelligible. For example, Hw r y tdy? is rapidly interpreted as
How are you today?. Thus we can see that there is redundancy in English and
this makes it possible for us to correctly guess at missing parts of the message.
An easy way to introduce redundancy is to send the message several times
and then decode by noting which symbol appears most often in each place. This
is called majority decoding.
EXAMPLE 2.2.1.
Suppose that we transmit each symbol three times and use majority decoding.
If the symbol we want to transmit is 0 then we actually send 000. If we
receive 000 or 001 or 010 or 100 then we correctly decode to 0. The probability
that we decode to 0 is p^3 + p^2·q + p·q·p + q·p^2 = p^3 + 3p^2·q (where q = 1 - p). If
p = 0.9 this is 0.972 and if p = 0.99 this is 0.999702.
Returning to the pixels with colours represented by messages of length 3, we
see that if we send the message three times then the probability that any digit of
each pixel is correctly received is 0.972 and the probability that the pixel itself
is correctly received is 0.972^3 = 0.91833, up from 0.729 without retransmission.
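These figures can be checked by summing over the eight possible patterns of digit errors; a small Python sketch, assuming independent digit errors:

```python
from itertools import product

def p_majority_correct(p):
    """Probability that a bit sent three times is decoded correctly
    by majority vote, assuming independent digit errors."""
    q = 1 - p
    total = 0.0
    for flips in product([0, 1], repeat=3):   # 1 marks a flipped digit
        if sum(flips) <= 1:                   # majority vote still correct
            total += p ** flips.count(0) * q ** flips.count(1)
    return total

print(p_majority_correct(0.9))    # 0.972
print(p_majority_correct(0.99))   # 0.999702
```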
2.2.1
Exercises
so that Σ_i c_i = 0 mod 2.

(c) When we receive the string c_1 c_2 c_3 we calculate Σ_i c_i mod 2. If the
result is 1 then we know an error has occurred. Why?
(d) Consider the message 00. Give the received strings in which 0 errors
have been made, 1 error has been made, 2 errors have been made, 3
errors have been made.
(e) Which of these errors will we be able to detect?
(f) For those errors that can not be detected, to what message(s) will
they be decoded?
(g) What is the probability of an undetected error being made in transmission?
(h) How does the probability of undetected errors compare with that of
the (6,2) code discussed in Question 1?
3. Consider the 8 messages possible using binary 3-sequences. Suppose that
we construct codewords of length 6 by adjoining 3 parity check digits.
So the message (m_1, m_2, m_3) becomes the codeword
(m_1, m_2, m_3, m_1 + m_2, m_1 + m_3, m_2 + m_3), where addition is carried out mod 2.
2.3
Hamming Codes
In this section we will look at one member of the family of Hamming codes.
Hamming codes are important because they can be encoded and decoded
very quickly, which means that conversations on mobile phones, for example, are
not subject to long delay times.
We suppose that the message consists of 4 binary digits. We assume that
each message gets transformed to a codeword of 7 binary digits. We do this
encoding by post-multiplying by a generator matrix G. G is chosen so that a
single digit change, in any position of the codeword, can be corrected.
EXAMPLE 2.3.1.
Let

G = [ 1 0 0 0 1 1 0 ]
    [ 0 1 0 0 1 0 1 ]
    [ 0 0 1 0 0 1 1 ]
    [ 0 0 0 1 1 1 1 ].
Then the message m becomes the codeword c = mG (mod 2). In particular the
message 1011 is transmitted as the codeword 1011G = 1011010.
Note that the first 4 columns of G form the identity matrix of order 4, I4 , so
the first 4 entries in c are just the elements of m. If we think of G as G = [I4 P ]
then c = [m mP ].
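Encoding is then a single vector-matrix multiplication mod 2. A minimal Python sketch of the encoder, using the G of Example 2.3.1 (the function name is illustrative):

```python
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(m, G):
    """Codeword c = mG over Z2: each digit is a mod-2 dot product."""
    n = len(G[0])
    return [sum(m[i] * G[i][j] for i in range(len(m))) % 2 for j in range(n)]

print(encode([1, 0, 1, 1], G))   # [1, 0, 1, 1, 0, 1, 0], i.e. 1011010
```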
EXAMPLE 2.3.2.
Let

H = [ 1 1 0 1 1 0 0 ]
    [ 1 0 1 1 0 1 0 ]
    [ 0 1 1 1 0 0 1 ].
Suppose that we receive v = 1100001. Then vH′ = 010, which is the 6th row
of H′, and so the error was in the 6th digit and the transmitted codeword should have
been 1100011, corresponding to the message 1100.
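This decoding rule is easy to mechanise: compute vH′, and if it is non-zero, flip the digit whose row of H′ matches it. A sketch, with H′ written out row by row:

```python
H_T = [  # H', the transpose of H: row i is the syndrome of an error in digit i+1
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]

def correct(v):
    """Return v with a single error (if any) corrected."""
    s = [sum(v[i] * H_T[i][j] for i in range(7)) % 2 for j in range(3)]
    if s != [0, 0, 0]:
        v = v[:]
        v[H_T.index(s)] ^= 1     # flip the digit whose row of H' matches s
    return v

print(correct([1, 1, 0, 0, 0, 0, 1]))  # [1, 1, 0, 0, 0, 1, 1] -> message 1100
```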
2.3.1
Exercises
1. Suppose that you received 0100111 and 0101010, both sent using the [7,4]
Hamming code. What were the corresponding messages?
2. Let

G = [ 1 0 0 1 1 0 ]
    [ 0 1 0 0 1 1 ]
    [ 0 0 1 1 1 1 ].
2.4
Hadamard Codes
A Hadamard matrix of order h is an h × h matrix H_h, every entry of which is
1 or -1, such that H_h H_h′ = hI_h.

EXAMPLE 2.4.1.

H2 = [ 1  1 ]
     [ 1 -1 ];

H4 = [ 1  1  1  1 ]
     [ 1  1 -1 -1 ]
     [ 1 -1  1 -1 ]
     [ 1 -1 -1  1 ];

H4 = [ 1  1  1  1 ]
     [ 1 -1  1 -1 ]
     [ 1  1 -1 -1 ]
     [ 1 -1 -1  1 ].
Hadamard matrices can only exist for orders 1 and 2 and for orders that are a
multiple of 4. Whether or not Hadamard matrices exist for all orders that are
a multiple of 4 is not known.
One easy way to generate a Hadamard matrix is to take the Kronecker
product of two Hadamard matrices. Recall that if we have two matrices A =
(a_ij) and B = (b_ij) then the Kronecker product of A and B, written A ⊗ B, is
the block matrix whose blocks are a_ij·B.
EXAMPLE 2.4.2.

H2 ⊗ H2 = [ 1  1  1  1 ]
          [ 1 -1  1 -1 ]
          [ 1  1 -1 -1 ]
          [ 1 -1 -1  1 ].
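For experimenting, numpy computes Kronecker products directly, so larger Hadamard matrices can be built up recursively; a minimal sketch:

```python
import numpy as np

H2 = np.array([[1, 1],
               [1, -1]])

H4 = np.kron(H2, H2)      # a Hadamard matrix of order 4
H32 = H2
for _ in range(4):        # four more Kronecker products give order 2^5 = 32
    H32 = np.kron(H32, H2)

print((H4 @ H4.T == 4 * np.eye(4)).all())      # True: H4 H4' = 4 I4
print((H32 @ H32.T == 32 * np.eye(32)).all())  # True
```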
The Hadamard code of order h has as its codewords the h rows of H_h together
with the h rows of -H_h. Let h = 4. Then, if we use the first H4 given in
Example 2.4.1, the 8 codewords in the Hadamard code are

( 1,  1,  1,  1)    ( 1,  1, -1, -1)    ( 1, -1,  1, -1)    ( 1, -1, -1,  1)
(-1, -1, -1, -1)    (-1, -1,  1,  1)    (-1,  1, -1,  1)    (-1,  1,  1, -1).
Let v = (1, 1, 1, 1). Then vH4′ = (4, 0, 0, 0). Now consider v = (1, 1, 1, -1). Then
vH4′ = (2, 2, 2, -2).

It is easy to check that for every codeword v in the Hadamard code the entries of
vH_h′ equal h, -h or 0. But more can be said; see the exercises.
The Mariner spacecraft used the Hadamard code derived from H32 . The
colours were represented by 6 bits which then became codewords of length 32.
Thus up to 7 errors could be corrected and up to 15 errors could be detected.
(Why?)
2.4.1
Exercises
2.5
Topic 3

Introduction to Linear Codes
We saw in the previous topic that Hamming codes are fast to decode since matrix
multiplication was all that was required, as opposed to searching through a list
of possible received strings to obtain the closest codeword and corresponding
message. Hamming codes are an example of linear codes and we will discuss
linear codes further here. We will also introduce the idea of cyclic linear codes.
In the following topic we will extend these ideas further to develop the codes
that are used to record music on CDs.
3.1
Linear Codes
A code C is said to be linear if the sum of any two codewords is also a codeword.
Formally, if v, w ∈ C then v + w ∈ C. Since we are considering codewords to be
binary n-sequences, we will calculate the sum of two codewords component-wise
mod 2.
EXAMPLE 3.1.1.
Let C = {0000, 0011, 1100, 0101, 1111}. C is not a linear code since 0011+0101=0110
and this is not in C.
Thus a linear code is said to be closed under addition since the sum of any
two codewords of a linear code is also a codeword. The set of all 2^n binary
n-sequences, which we will represent by Z_2^n, is closed under addition, of course.
Any binary n-sequence can be written as the sum of the n binary n-sequences
e_i, where e_i has a 0 in all positions except position i where it has a 1. Since we
can not express any of the e_i as a linear combination of the other e_j we say that
the e_i are linearly independent. Since every n-sequence in Z_2^n can be written as
a linear combination of the e_i and the e_i are linearly independent we say that
e_1, e_2, . . . , e_n form a basis for Z_2^n.
EXAMPLE 3.1.3.
Let n = 3. Then Z_2^3 = {000, 001, 010, 011, 100, 101, 110, 111} and e_1 = 100, e_2 =
010 and e_3 = 001. Then xyz ∈ Z_2^3 can be written as xyz = x·e_1 + y·e_2 + z·e_3.
For a linear code C, we have that C ⊆ Z_2^n and that C is closed under
addition. That means that we can find a basis for C. We can then write the
basis vectors as the rows of a matrix, which we will call G, and all the codewords
can be written as linear combinations of these rows. So G is called the generator
matrix of the linear code C. If we change the -1s to 0s then the Hadamard code
of the previous topic is one example of a linear code.
Recall that we are interested in codes because we want to be able to send
messages that are likely to be received correctly. If two codewords are too
close together, whatever that means exactly, then it is likely that we will have
trouble telling which one was actually received.
We will define the Hamming distance between two codewords u and v to be the
number of places where they differ, written d(u, v).
EXAMPLE 3.1.4.
We will now talk about an (n, m, d) code, where n is the length of the
codewords, m is the length of the message and d is the minimum distance of
the code.
We define the Hamming weight of u to be the number of non-zero entries in
u; that is, d(u, 0) = wt(u).
LEMMA 3.1.1.
Let C be a linear code. Then d(C) equals the minimum Hamming weight of any
of the non-zero codewords.

Proof. Since wt(u) = d(u, 0) and 0 ∈ C, we have that wt(u) ≥ d(C) for every
non-zero u ∈ C. We also know that d(u, v) = wt(u - v). So choose u, v ∈ C
such that d(u, v) = d(C). Then d(u, v) = wt(u - v) = d(C) and, since u - v ∈ C,
the result follows.
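For a small code the lemma gives an immediate way to compute d(C) by machine: span the basis and take the minimum weight of a non-zero codeword. A sketch (function name illustrative):

```python
from itertools import product

def min_distance(basis):
    """d(C) for the binary linear code spanned by the given basis rows."""
    n = len(basis[0])
    weights = []
    for coeffs in product([0, 1], repeat=len(basis)):
        word = [sum(c * row[j] for c, row in zip(coeffs, basis)) % 2
                for j in range(n)]
        if any(word):                  # skip the zero codeword
            weights.append(sum(word))
    return min(weights)

# The code {0000, 0011, 1100, 1111} has distance 2.
print(min_distance([[0, 0, 1, 1], [1, 1, 0, 0]]))   # 2
```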
We will assume that the first m binary digits are the information digits and
that the remaining n - m binary digits are the check symbols. Then we can
write G in the standard form G = [I_m P]. As in the case of the Hamming
code, we get a parity check matrix, H, by calculating H = [P′ I_{n-m}]. We can
detect errors by calculating vH′ for the received n-sequence v. If vH′ = 0 then
it is unlikely that an error has been made.
EXAMPLE 3.1.5.
Consider the linear code C = {0000, 0011, 1100, 1111}.
We see that 0011 and 1100 form a basis for this code. So we have

G = [ 0 0 1 1 ]
    [ 1 1 0 0 ]

which is not in standard form. To get G into that form we need to interchange
some of the columns in G. If we interchange columns 1 and 3 we get

Gs = [ 1 0 0 1 ]
     [ 0 1 1 0 ].

The weights of the non-zero codewords are 2, 2 and 4, so d(C) = 2.
EXAMPLE 3.1.6.
Let

G = [ 1 0 1 0 1 1 ]
    [ 0 1 1 1 1 0 ]
    [ 0 0 0 1 1 1 ].

Clearly G is not in standard form. Since the first two columns are in the required
form, we start by swapping columns to move a 1 into position (3,3). We swap
columns 3 and 4 in G to get

G1 = [ 1 0 0 1 1 1 ]
     [ 0 1 1 1 1 0 ]
     [ 0 0 1 0 1 1 ].

The first two entries in the third row are now 0, as required, but the entry in
position (2,3) is a 1 and not a 0. We can add row 3 to row 2 to get

Gs = [ 1 0 0 1 1 1 ]
     [ 0 1 0 1 0 1 ]
     [ 0 0 1 0 1 1 ].
THEOREM 3.1.2.
Let H be the parity check matrix for the code C. Then d(C) = d if and only
if any set of d - 1 rows of H′ is linearly independent and at least one set of d
rows of H′ is linearly dependent.

Proof. We know that cH′ = 0 for all the codewords. Thus for each codeword c
there is a linear combination of wt(c) rows of H′ which is 0. So if d(C) = d then
there is at least one codeword of weight d, and hence at least one set of d rows
of H′ that is linearly dependent. If there were a set of d - 1 rows of H′ that was
linearly dependent then there would be a codeword of weight at most d - 1, so
d(C) ≠ d.
3.1.1
Exercises
1. Find the generator matrix and parity check matrix of the (5,3) binary
code whose messages are the eight binary triples with encoding

xyz ↦ (x, y, x + y, z, y + z).

Now find a mapping so that G is in standard form.
2. Consider the generator matrix

G = [ 1 1 0 1 0 0 1 ]
    [ 0 0 1 0 1 1 1 ]
    [ 0 1 0 1 0 1 0 ]
    [ 1 1 1 1 1 1 1 ].
4. Let

S = [ 0 1 1 0 ]
    [ 1 0 1 1 ]
    [ 1 1 0 1 ]
    [ 1 1 1 0 ].

Show that S can not be the generator matrix of a binary linear code.
5. Show that in a binary linear code either all of the codewords have even
weight, or half of the codewords have even weight and half have odd weight.
6. Let S = {100, 101, 111}. Give the linear code that arises from using the
elements of S as the rows of a generator matrix. (Another way of saying
this is to say that we are using the elements of S as a basis for the linear
code.)
7. Let S = {1010, 0101, 1111}. Give the linear code that arises from using
the elements of S as the rows of a generator matrix.
8. Let S = {011, 101, 110, 111}. Show that the elements in S are not linearly
independent. Give a subset of S which is linearly independent and give
the corresponding linear code. Can you give a second set of linearly independent basis vectors for this code?
3.2
Cosets
Suppose that C is an (n, m, d) linear code and let u ∈ Z_2^n. Then the coset of C
generated by u is given by

C + u = {v + u | v ∈ C}.

Note that the coset of C generated by any u ∈ C is C itself.
EXAMPLE 3.2.1.
Let C = {000, 111}. Then the three other cosets are C + 001 = C + 110 =
{001, 110}, C + 010 = C + 101 = {010, 101} and C + 100 = C + 011 = {100, 011}.
This example illustrates a general result: if w ∈ C + u then C + w = C + u.
The next result summarises a number of useful facts about cosets, most of
which are immediately apparent and so the technical proofs are left as exercises.
THEOREM 3.2.1.
EXAMPLE 3.2.2.
Let C be the code with generator matrix

G = [ 1 0 1 0 1 ]
    [ 0 1 0 1 1 ].

Then there are four codewords in C, 00000, 10101, 01011 and 11110, arising
from the messages 00, 10, 01 and 11 in order. We see that the minimum weight
of a non-zero codeword is 3 so d(C) = 3. There are 8 cosets corresponding to
C; these are given as the columns in Table 3.1. Observe that each 5-sequence
of weight 1 appears in a unique coset and that there are sequences of weight 2
that appear in the same coset (for instance, 11000 and 00110).
Table 3.1: The Eight Cosets of a (5,2,3) Linear Code

00000  00001  00010  00100  01000  10000  00110  01100
10101  10100  10111  10001  11101  00101  10011  11001
01011  01010  01001  01111  00011  11011  01101  00111
11110  11111  11100  11010  10110  01110  11000  10010
The cosets are very helpful for the purposes of decoding. We know that
cH′ = 0 if c ∈ C. We also know that all n-sequences, v and w, say, in the same
coset have vH′ = wH′ since we know that v + w ∈ C. Thus we know that
the error pattern and the received word are in the same coset of C. Since we
expect as few errors as possible in an error pattern, when we receive v we find
the coset that contains v, find a word of least weight in that coset, u, say, and
assume that v + u was the codeword sent.
We call the word of least weight in a coset the coset leader. In some cases
the coset leaders are unambiguously determined (the first 6 cosets in Table 3.1)
and in some there are two or more words of equal weight. In this case one is
chosen at random to be the coset leader.
We use the coset leaders to construct a fast decoding strategy. For each
coset leader u we calculate the syndrome uH′. When we receive v we evaluate
vH′ and find the matching syndrome. The corresponding coset leader u is the
most likely error pattern and we assume that v + u was the codeword sent.
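A sketch of this table-driven decoder for the (5,2,3) code of Example 3.2.2: the leader table is built once by scanning all of Z_2^5, and then each received word costs one syndrome computation and one lookup.

```python
from itertools import product

H_T = [[1, 0, 1],   # H': rows are the columns of H
       [0, 1, 1],
       [1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]]

def syndrome(v):
    return tuple(sum(v[i] * H_T[i][j] for i in range(5)) % 2 for j in range(3))

# For each syndrome keep a word of least weight: a coset leader.
leaders = {}
for v in product([0, 1], repeat=5):
    s = syndrome(v)
    if s not in leaders or sum(v) < sum(leaders[s]):
        leaders[s] = v

def decode(v):
    u = leaders[syndrome(v)]                # most likely error pattern
    return [a ^ b for a, b in zip(v, u)]    # v + u

print(decode([1, 0, 0, 1, 1]))  # [1, 0, 1, 0, 1], message 10
```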
EXAMPLE 3.2.3.
Consider the code given in Example 3.2.2 and the corresponding cosets given in
Table 3.1. The parity check matrix H is given by

H = [ 1 0 1 0 0 ]
    [ 0 1 0 1 0 ]
    [ 1 1 0 0 1 ].
The coset leaders and the corresponding syndromes are given in Table 3.2.
Suppose that we received the word 10011. Then we evaluate

10011·H′ = 110.

Thus we know that the coset leader for that syndrome is 00110. Hence we
assume that the transmitted codeword is

10011 + 00110 = 10101
with corresponding message 10. If we had chosen as the coset leader 11000 then
we would have assumed that the transmitted codeword was
10011 + 11000 = 01011
with corresponding message 01. Thus the choice of coset leader has a direct
bearing on the message received. Note that this is not the case for the cosets
with coset leaders of weight 1. This reflects the fact that the distance of the code is
3 and so all errors of weight 1 can be corrected while errors of weight 2 can only
be detected.
Table 3.2: The Eight Syndromes of a (5,2,3) Linear Code

Coset Leader   Syndrome
00000          000
00001          001
00010          010
00100          100
01000          011
10000          101
00110          110
01100          111
In the previous example and discussion we have been making use of the
following result.
THEOREM 3.2.2.
Let C be a linear code of length n with parity check matrix H. Let u and v be
elements of Z_2^n.

1. uH′ = 0 if and only if u ∈ C.

2. uH′ = vH′ if and only if u and v are in the same coset of C.

3. If the error pattern in a received word is u then uH′ is the sum of the
rows of H′ that correspond to the positions of the errors.
3.2.1
Exercises
1. List the cosets of the code C = {0000, 1001, 0110, 1111}. Give the syndromes corresponding to each of the cosets. Are the coset leaders uniquely
determined?
2. List the cosets of the code C = {00000, 10010, 01101, 11111}. Give the
syndromes corresponding to each of the cosets. Are the coset leaders
uniquely determined?
3. Consider the code with generator matrix

G = [ 1 1 1 0 0 0 ]
    [ 0 0 1 1 1 0 ]
    [ 1 0 0 0 1 1 ].
3.3
Cyclic Linear Codes

A cyclic linear code is a linear code in which for each c = (c_1, c_2, . . . , c_n) ∈ C each
cyclic permutation of c is also in C. Thus (c_2, . . . , c_n, c_1) ∈ C, (c_3, . . . , c_n, c_1, c_2) ∈
C and so on until (c_n, c_1, c_2, . . . , c_{n-1}) ∈ C. We will write

σ(c_1, c_2, . . . , c_n) = (c_2, . . . , c_n, c_1).

EXAMPLE 3.3.1.
Let C = {000, 011, 101, 110}. Let c = 011. Then σ(c) = 110 and σ^2(c) = 101.
Since the sum of any two codewords is a codeword we see that C is a cyclic
linear code.
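Checking that a small code is cyclic and linear is a finite computation; a sketch (function name illustrative):

```python
def is_cyclic_linear(C):
    """True if the set of binary strings C is closed under mod-2 sums
    and under the cyclic shift sigma."""
    C = set(C)
    sums_ok = all(
        "".join(str(int(a) ^ int(b)) for a, b in zip(u, v)) in C
        for u in C for v in C)
    shifts_ok = all(c[1:] + c[0] in C for c in C)   # sigma(c)
    return sums_ok and shifts_ok

print(is_cyclic_linear({"000", "011", "101", "110"}))  # True
```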
LEMMA 3.3.1.
3.3.1
Exercises
1. Find the smallest linear cyclic code containing 010101; 010010; 0101100.
(Use your common sense here. First work out how many codewords there
are. It may make more sense to give the basis vectors and say how many
words are in the code.)
2. Let C be the linear code generated by S. Show that C is a cyclic linear
code when S = {010, 011, 111} and when S = {1010, 0101, 1100}.
3.4
References

Hankerson, Hoffman, Leonard, Lindner, Phelps, Rodger, and Wall (2000) have
a detailed introductory level account of linear codes and their applications.
Topic 4
4.1
Finite Fields

of our equation. Thus we get the addition and multiplication tables shown in
Table 4.1.

Table 4.1: Addition and Multiplication in GF[4]

+      0      1      α      α+1
0      0      1      α      α+1
1      1      0      α+1    α
α      α      α+1    0      1
α+1    α+1    α      1      0

×      0      1      α      α+1
0      0      0      0      0
1      0      1      α      α+1
α      0      α      α+1    1
α+1    0      α+1    1      α
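Table 4.1 can be reproduced by computing with coefficient pairs; a sketch, assuming each element is stored as the pair (a_0, a_1) for a_0 + a_1·α with α^2 = α + 1:

```python
# Elements of GF(4) as bit pairs (a0, a1) meaning a0 + a1*alpha,
# where alpha^2 = alpha + 1 (x^2 + x + 1 = 0 over Z2).
ELTS = {(0, 0): "0", (1, 0): "1", (0, 1): "a", (1, 1): "a+1"}

def add(u, v):
    return (u[0] ^ v[0], u[1] ^ v[1])

def mul(u, v):
    # (u0 + u1 a)(v0 + v1 a) = u0v0 + (u0v1 + u1v0) a + u1v1 a^2,
    # and a^2 = a + 1 folds the last term back into both coefficients.
    c0 = (u[0] & v[0]) ^ (u[1] & v[1])
    c1 = (u[0] & v[1]) ^ (u[1] & v[0]) ^ (u[1] & v[1])
    return (c0, c1)

for op, name in ((add, "+"), (mul, "*")):
    print(name, [ELTS[e] for e in ELTS])
    for u in ELTS:
        print(ELTS[u], [ELTS[op(u, v)] for v in ELTS])
```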
4.1.1
Exercises
4.2
Reed-Solomon Codes
Reed-Solomon codes are cyclic linear codes that use an alphabet from GF[2^k].
They are used extensively in practice; we will discuss how they are used in the
encoding of audio CDs shortly.

We will let GF[2^k][x] be the set of all polynomials with coefficients from
GF[2^k]. We will define a cyclic linear code by using a polynomial with known
roots to give the entries in the first row of the generating matrix. Properties of
the code can be deduced from properties of the chosen polynomial. The next
example illustrates this idea.
EXAMPLE 4.2.1.
Let

G = [ α^3  α^4  1    0    0    0    0 ]
    [ 0    α^3  α^4  1    0    0    0 ]
    [ 0    0    α^3  α^4  1    0    0 ]
    [ 0    0    0    α^3  α^4  1    0 ]
    [ 0    0    0    0    α^3  α^4  1 ].

Messages in C are of length 5 and have an alphabet chosen from GF[8]. Thus
there are 8^5 codewords in C. As usual, the message m corresponds to codeword
mG. For example, m = (1, α, α^5, 0, 0) corresponds to the codeword 1·(row 1 of
G) + α·(row 2 of G) + α^5·(row 3 of G), which is (α^3, 0, α^2, α^4, α^5, 0, 0). (From
the list of the elements of the field given above we know that α^2 + α = α^4, for
example.)
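Computations like this are conveniently done with log and antilog tables for the powers of α. A minimal sketch that rebuilds the codeword above; field elements are stored as 3-bit integers whose bits are the coefficients of (1, α, α^2), and the helper names are mine:

```python
# GF(8) built from 1 + x + x^3.
EXP, LOG = [0] * 14, {}
x = 1
for i in range(7):
    EXP[i] = EXP[i + 7] = x
    LOG[x] = i
    x <<= 1
    if x & 0b1000:          # reduce by alpha^3 = alpha + 1
        x ^= 0b1011

def mul(u, v):
    return 0 if u == 0 or v == 0 else EXP[LOG[u] + LOG[v]]

A = lambda k: EXP[k % 7]    # alpha^k

# g(x) = alpha^3 + alpha^4 x + x^2; the rows of G are its cyclic shifts.
g = [A(3), A(4), 1]
G = [[0] * i + g + [0] * (4 - i) for i in range(5)]

m = [1, A(1), A(5), 0, 0]
c = [0] * 7
for i, mi in enumerate(m):          # c = mG; addition in GF(8) is XOR
    c = [ci ^ mul(mi, gi) for ci, gi in zip(c, G[i])]
print(c == [A(3), 0, A(2), A(4), A(5), 0, 0])   # True
```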
In the previous example we have constructed an (n = 7, m = 5) cyclic linear
code. We say this code has generating polynomial g(x). Clearly the distance of
the code is at most 3 (since there are three non-zero entries in each row of G).
One way to determine the distance of the code would be to show that there is
no linear combination of the rows of G with fewer than 3 non-zero entries. We
will not do that at the moment.
Can we determine a generating polynomial so that we know the properties
of the cyclic linear code to which it gives rise? We start by considering the form
of the parity check matrix of the previous example.
EXAMPLE 4.2.2.
We know that α and α^2 are the roots of g(x). Since α is a root of g(x) this
means that

g(α) = α^3 + α^4·α + α^2 = 0,

which we can write as

(α^3  α^4  1  0  0  0  0)·(1, α, α^2, α^3, α^4, α^5, α^6)′ = 0.

Also

xg(x) = α^3·x + α^4·x^2 + x^3 = 0 + α^3·x + α^4·x^2 + x^3

and so we have

(0  α^3  α^4  1  0  0  0)·(1, α, α^2, α^3, α^4, α^5, α^6)′ = 0.

We can use the root α^2 in the same way and so we find that the matrix

H′ = [ 1      1        ]
     [ α      α^2      ]
     [ α^2    (α^2)^2  ]
     [ α^3    (α^2)^3  ]
     [ α^4    (α^2)^4  ]
     [ α^5    (α^2)^5  ]
     [ α^6    (α^2)^6  ]

satisfies GH′ = 0. Any two rows of H′ are linearly independent and so we
know that the code with generating matrix G has distance 3 (using Theorem
3.1.2).

In the previous example it was easy to believe that any two rows of H′
were linearly independent, but we can prove it formally by considering the
determinant of the submatrix constructed from any two rows of H′ and showing
that any such submatrix has a non-zero determinant. In this example the
general form of such a submatrix is

[ α^i  (α^2)^i ]
[ α^j  (α^2)^j ]

with determinant α^i·α^{2j} - α^j·α^{2i} = α^{i+2j} + α^{2i+j} = α^{i+j}(α^i + α^j),
which is non-zero since we are working over GF[8] and α^i ≠ α^j when i ≠ j.
The matrix above is closely related to a Vandermonde matrix. A Vandermonde
matrix of order s has the following form:

[ 1          1          1          . . .  1          ]
[ x_1        x_2        x_3        . . .  x_s        ]
[ x_1^2      x_2^2      x_3^2      . . .  x_s^2      ]
[ ...                                                ]
[ x_1^{s-1}  x_2^{s-1}  x_3^{s-1}  . . .  x_s^{s-1}  ]

An expression for the determinant can be found in general; here we give the
result over GF[2^k] only.

LEMMA 4.2.1.
Let x_1, x_2, . . . , x_s be non-zero elements of GF[2^k]. Then the determinant of
the Vandermonde matrix of order s equals Π_{i<j} (x_i + x_j).
Proof. We prove the result by induction. For s = 2 the result is clear (recalling
that we are working mod 2). For s = 3 we have

det [ 1      1      1     ]
    [ x_1    x_2    x_3   ]
    [ x_1^2  x_2^2  x_3^2 ].

Adding x_1 times each row to the row below it (which does not change the
determinant, and addition and subtraction agree mod 2) gives

det [ 1                1                1              ]
    [ x_1 + x_1        x_2 + x_1        x_3 + x_1      ]
    [ x_1^2 + x_1 x_1  x_2^2 + x_2 x_1  x_3^2 + x_3 x_1 ]

 = det [ 1    1                1              ]
       [ 0    x_2 + x_1        x_3 + x_1      ]
       [ 0    x_2(x_2 + x_1)   x_3(x_3 + x_1) ].

Thus

det [ 1      1      1     ]
    [ x_1    x_2    x_3   ]  =  (x_2 + x_1)(x_3 + x_1) det [ 1    1   ]
    [ x_1^2  x_2^2  x_3^2 ]                                [ x_2  x_3 ].

Thus the result holds for s = 3. To complete the proof we only need to observe
that in general we add x_1 times row i to row i + 1. The details are left as an
exercise.
THEOREM 4.2.1.
Let α be a primitive element of GF[2^k], let n = 2^k - 1 and let

g(x) = (α^{ℓ+1} + x)(α^{ℓ+2} + x) · · · (α^{ℓ+d-1} + x).

Let C be the cyclic linear code whose generator matrix G has as its rows the
coefficients of

g(x), xg(x), x^2 g(x), . . . , x^{2^k - d - 1} g(x).

Then d(C) ≥ d.

Proof. The roots of g(x) are given by α^{ℓ+1}, α^{ℓ+2}, . . . , α^{ℓ+d-1}. Construct the
matrix

H′ = [ 1                1                . . .  1                  ]
     [ α^{ℓ+1}          α^{ℓ+2}          . . .  α^{ℓ+d-1}          ]
     [ (α^{ℓ+1})^2      (α^{ℓ+2})^2      . . .  (α^{ℓ+d-1})^2      ]
     [ ...                                                         ]
     [ (α^{ℓ+1})^{n-1}  (α^{ℓ+2})^{n-1}  . . .  (α^{ℓ+d-1})^{n-1}  ].

This matrix satisfies GH′ = 0 and, by Lemma 4.2.1, every submatrix formed
from d - 1 distinct rows has a non-zero determinant. So d(C) ≥ d as required
and H is the parity check matrix of C.

The codes of this theorem are Reed-Solomon codes. They have n = 2^k - 1,
m = 2^k - d and distance d. So the Reed-Solomon code of Example 4.2.1 has
n = 2^3 - 1 = 7, m = 2^3 - 3 = 5 and distance d = 3.
While we can use the alphabet over GF[2^k], in practice we are using a binary
channel to transmit the codewords and so we use a binary representation of the
codewords. In this representation, each element of GF[2^k] is replaced by its
equivalent binary k-tuple. The next example illustrates this idea.
EXAMPLE 4.2.3.
Let

G = [ α^2  1    0 ]
    [ 0    α^2  1 ].

There are 16 possible messages (the ordered pairs of elements of GF[4]). To get
the corresponding codewords we evaluate mG over GF[4]. The 16 messages,
the corresponding codewords and the binary representations of these codewords
(writing each element a_0 + a_1·α as the bit pair a_0 a_1, so 0 ↦ 00, 1 ↦ 10,
α ↦ 01, α^2 ↦ 11) are given below.

m            mG                 c
(0, 0)       (0, 0, 0)          000000
(0, 1)       (0, α^2, 1)        001110
(0, α)       (0, 1, α)          001001
(0, α^2)     (0, α, α^2)        000111
(1, 0)       (α^2, 1, 0)        111000
(1, 1)       (α^2, α, 1)        110110
(1, α)       (α^2, 0, α)        110001
(1, α^2)     (α^2, α^2, α^2)    111111
(α, 0)       (1, α, 0)          100100
(α, 1)       (1, 1, 1)          101010
(α, α)       (1, α^2, α)        101101
(α, α^2)     (1, 0, α^2)        100011
(α^2, 0)     (α, α^2, 0)        011100
(α^2, 1)     (α, 0, 1)          010010
(α^2, α)     (α, α, α)          010101
(α^2, α^2)   (α, 1, α^2)        011011
We have seen that the Reed-Solomon codes all have length 2^k - 1 for some
k. Sometimes we want codes of a different length from this. A shortened Reed-Solomon code of length n - t is obtained by taking all codewords with 0 in the
final t positions (of the representation over GF[2^k] of course, given the value of
n) and deleting those positions.
EXAMPLE 4.2.4.
4.2.1
Exercises
4.3
Decoding Reed-Solomon Codes

EXAMPLE 4.3.1.
Consider the Reed-Solomon code of Example 4.2.1. Recall that the generating
matrix was given by

G = [ α^3  α^4  1    0    0    0    0 ]
    [ 0    α^3  α^4  1    0    0    0 ]
    [ 0    0    α^3  α^4  1    0    0 ]
    [ 0    0    0    α^3  α^4  1    0 ]
    [ 0    0    0    0    α^3  α^4  1 ].
Consider the message m = (m_1, m_2, m_3, m_4, m_5). It becomes the codeword
c = mG and we see that the last entry of c is m_5. The second-last entry of c is
c_6 = m_4 + α^4·m_5 and so we can find m_4. Proceeding in this way we can find m
given that c has been received without errors.
The next example illustrates the ideas of both location and magnitude of
the errors for the Reed-Solomon codes.
EXAMPLE 4.3.2.
Consider the Reed-Solomon code of Example 4.2.3. Suppose that we sent the
codeword c = (α^2, α, 1) and received the word v = (α^2, α, α^2). Then the error
is e = (0, 0, α^2 + 1) = (0, 0, α). The error appears in position α^2 and is of size
α.
We know that the code of the previous example had distance 2 and so we
would not expect to be able to correct any errors (see Theorem 3.1.1). In the
next example we consider the smallest Reed-Solomon code of distance 3 and see
how a decoding strategy might be developed.
EXAMPLE 4.3.3.
We know that the roots of g(x) are 1 and α. To decode any received sequence
it is helpful to construct the parity check matrix. As in the proof of Theorem
4.2.1, we know that

H′ = [ 1  1   ]
     [ 1  α   ]
     [ 1  α^2 ].

Suppose that we receive v = (α, 1 + α, 1 + α). We evaluate vH′ = (α, 1).
Since this is not 0 we assume that 1 error has occurred (since d = 3 and so we
can correct 1 error). Assume that the size of the error is s_1 and that it is in
position p_1. Thus we know that e = (s_1, 0, 0) (so the error is in position 1) or
e = (0, s_1, 0) (so the error is in position α) or e = (0, 0, s_1) (so the error is in
position α^2).

Evaluating eH′ gives (s_1, s_1·p_1). Equating this to vH′ = (α, 1) we see that
s_1 = α and p_1 = α^2. Thus we assume that the error is given by (0, 0, α), which
would mean that the corrected received sequence would be

(α, 1 + α, 1 + α) + (0, 0, α) = (α, 1 + α, 1),

which is indeed a codeword.
So now we have some idea of how a decoding strategy is going to work. We
will assume that there are at most t errors, where d ≥ 2t + 1, and that these
errors are located in positions p_1, p_2, . . . , p_t and have sizes s_1, s_2, . . . , s_t. We will
evaluate vH′ and equate these values to the values that we expect from the
error vector. Then we will solve for the s_i and the p_i.

We will work through another example before giving a general description
of the decoding algorithm.
EXAMPLE 4.3.4.
Once again we will use the field GF[2^3] generated from the irreducible cubic
1 + x + x^3. We will use α for the primitive element of the field. Then we have

GF[8] = {0, 1, α, α^2, α^3 = α + 1, α^4 = α^2 + α, α^5 = α^2 + α + 1, α^6 = α^2 + 1}.

Let the generating polynomial for the code be

g(x) = (1 + x)(α + x)(α^2 + x)(α^3 + x).

This gives rise to a Reed-Solomon code with n = 7, m = 3 and d = 5. This
means that t = 2. We know that the parity check matrix is given by

H′ = [ 1  1    1    1   ]
     [ 1  α    α^2  α^3 ]
     [ 1  α^2  α^4  α^6 ]
     [ 1  α^3  α^6  α^2 ]
     [ 1  α^4  α    α^5 ]
     [ 1  α^5  α^3  α   ]
     [ 1  α^6  α^5  α^4 ].

Suppose that we receive v = (α^6, α, α^5, α^2, 1, 0, α^2). We calculate vH′ =
(1, α^3, α^3, 1). As this is not 0 we assume that two errors have been made. We
further assume that the first error is in position p_1 and is of size s_1 and that
the second error is in position p_2 and is of size s_2. That is, e has two non-zero
entries, in positions p_1 and p_2, and these entries are equal to s_1 and s_2
respectively. So we want to evaluate these four unknowns.

We do this by evaluating eH′. This gives

(s_1 + s_2, s_1 p_1 + s_2 p_2, s_1 p_1^2 + s_2 p_2^2, s_1 p_1^3 + s_2 p_2^3).
Thus we get the four equations

1   = s_1 + s_2
α^3 = s_1 p_1 + s_2 p_2
α^3 = s_1 p_1^2 + s_2 p_2^2
1   = s_1 p_1^3 + s_2 p_2^3.

On the face of it these equations are going to be very difficult to solve. But
we notice that these equations are linear in the s_i, so if we could solve for the
p_i then we could substitute and solve for the s_i.

To this end let's construct a polynomial which has p_1 and p_2 as its 0s. So
we let π_p(x) = (p_1 + x)(p_2 + x). Now we expand π_p and get

p_1 p_2 + (p_1 + p_2)x + x^2 = π_p(x) = σ_0 + σ_1 x + x^2.

Now we will multiply both sides of this equation by s_1 to get

s_1 π_p(x) = s_1 σ_0 + s_1 σ_1 x + s_1 x^2.

Substituting p_1 for x gives

0 = s_1 σ_0 + s_1 σ_1 p_1 + s_1 p_1^2.

Now multiply both sides of the original equation by s_2 to get

s_2 π_p(x) = s_2 σ_0 + s_2 σ_1 x + s_2 x^2.

Substituting p_2 for x gives

0 = s_2 σ_0 + s_2 σ_1 p_2 + s_2 p_2^2.

Adding we get

0 = (s_1 + s_2)σ_0 + (s_1 p_1 + s_2 p_2)σ_1 + s_1 p_1^2 + s_2 p_2^2.

We realise, however, that we know some of these values and so we end up with

0 = σ_0 + α^3 σ_1 + α^3.

Proceeding in the same way, multiplying this time by s_1 p_1 and s_2 p_2 and
substituting and adding, we get

0 = α^3 σ_0 + α^3 σ_1 + 1.

So now we have two equations in the two unknowns σ_0 and σ_1. Solving we
have σ_0 = 1 and σ_1 = α^5. This means that we know that π_p(x) = 1 + α^5 x + x^2.
To find the values of p_1 and p_2 we substitute each of the field elements in turn
into π_p. We get
x      π_p(x)
1      α^5
α      0
α^2    α^4
α^3    α + α^2
α^4    1 + α + α^2
α^5    1
α^6    0

Thus π_p has roots p_1 = α and p_2 = α^6.
Now we generalise what we have just done in the previous two examples.

1. Take a received sequence v.

2. Evaluate vH′.

3. If vH′ = 0, assume that no errors have been made. Decode the received
sequence.

4. Otherwise assume that at most t errors have been made, where d ≥ 2t + 1.
Further assume that the errors are located in positions p_1, p_2, . . . , p_t with
sizes s_1, s_2, . . . , s_t. Construct a polynomial π_p(x) with roots p_1, p_2, . . . , p_t.

5. Expand π_p(x). The coefficient of x^i in this polynomial is σ_i.

6. Multiply both sides of π_p(x) by s_i p_i^j, substitute x = p_i and sum over i.
Do this for each j = ℓ + 1 to j = ℓ + t.

7. Solve the resulting equations for the σ_i and hence the p_i.

8. Now solve the equations that arise from vH′ for the s_i.

9. Calculate the most likely error vector, e.

10. Decode v + e.
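For a code this small, steps 4 to 8 can be checked by an exhaustive search over all error patterns of weight at most t. A sketch for the (7,3,5) code of Example 4.3.4; the brute-force search is my substitute for the locator-polynomial algebra and is not how Reed-Solomon decoders are implemented in practice:

```python
from itertools import combinations, product

# GF(8) tables as in the sketch in Section 4.2 (alpha^3 = alpha + 1).
EXP, LOG = [0] * 14, {}
x = 1
for i in range(7):
    EXP[i] = EXP[i + 7] = x
    LOG[x] = i
    x <<= 1
    if x & 0b1000:
        x ^= 0b1011
mul = lambda u, v: 0 if 0 in (u, v) else EXP[LOG[u] + LOG[v]]
A = lambda k: EXP[k % 7]                     # alpha^k

H_T = [[A(i * j) for j in range(4)] for i in range(7)]   # H' above

def syndrome(v):
    s = [0, 0, 0, 0]
    for i in range(7):
        for j in range(4):
            s[j] ^= mul(v[i], H_T[i][j])
    return s

def decode(v, t=2):
    """Try every error pattern of weight <= t until the syndromes match."""
    target = syndrome(v)
    if target == [0, 0, 0, 0]:
        return v
    nonzero = [A(k) for k in range(7)]
    for w in range(1, t + 1):
        for pos in combinations(range(7), w):
            for sizes in product(nonzero, repeat=w):
                e = [0] * 7
                for p, s in zip(pos, sizes):
                    e[p] = s
                if syndrome(e) == target:
                    return [a ^ b for a, b in zip(v, e)]
    return None                              # more than t errors

v = [A(6), A(1), A(5), A(2), 1, 0, A(2)]
# The errors found lie in coordinates 2 and 7, i.e. positions alpha and
# alpha^6, matching the roots of the locator polynomial in Example 4.3.4.
print(decode(v))
```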
4.3.1
Exercises
4.4
Burst Errors

To date we have been assuming that the errors that arise when sending an
encoded message arise independently. But sometimes this is not the case:
think about a scratch on the surface of a CD, for example. A set of errors that
occur together are termed a burst of errors. We will say that the burst length of
an error is the number of digits from the first 1 in e to the last 1 in e.
EXAMPLE 4.4.1.
The error patterns e = 010001, e = 000110 and e = 101011 have burst lengths
5, 2 and 6 respectively.
To interleave a code C to depth s, we take s codewords at a time, write them
as the rows of an s-row array, and transmit the array column by column. How
does interleaving improve the burst error correcting properties? Suppose that
C is ℓ burst error correcting and suppose that we interleave C to depth s. Then
a burst of errors of length at most sℓ during transmission can affect at most
ℓ digits in any one codeword. So as long as there is only one burst error
pattern affecting the codeword it can be corrected. We can summarise these
observations as follows.
THEOREM 4.4.1.
Suppose that C is an ℓ burst error correcting code and suppose that C is
interleaved to depth s. Then all bursts of length at most sℓ can be corrected,
assuming that each codeword is affected by at most one burst of errors.

Take a code that is 1 error correcting and interleave it to depth 3. Then the
interleaved code corrects all bursts of length at most 3.
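Interleaving itself is just a transpose: write the codewords as rows and read the array out by columns. A sketch (function names mine):

```python
def interleave(codewords):
    """Transmit s codewords column by column instead of row by row."""
    return [c[j] for j in range(len(codewords[0])) for c in codewords]

def deinterleave(stream, s):
    n = len(stream) // s
    return [[stream[j * s + i] for j in range(n)] for i in range(s)]

words = [[1, 1, 0, 1, 0, 0],
         [0, 0, 1, 0, 1, 0],
         [1, 0, 0, 1, 1, 1]]
sent = interleave(words)                # depth s = 3
print(sent)
print(deinterleave(sent, 3) == words)   # True

# A burst of 3 consecutive channel errors in `sent` now touches each of the
# three codewords at most once.
```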
4.4.1
Exercises

1. Consider the code with generator matrix

G = [ 1 1 0 1 0 0 ]
    [ 0 0 1 0 1 0 ]
    [ 1 0 0 1 1 1 ].

Encode the messages 101, 011, 111, 010, 100 and 001. Now find the string
of digits transmitted when the code is interleaved to depth s = 1; s = 2;
s = 3.
4.5
Encoding Music on CDs
The laser tracking device can not stay on track if there are no height changes
nor can it cope if there are lots of changes close together. Thus between any
two 1s (changes in height) there must be at least 2 0s and no more than 10 0s.
Obviously the 2^8 binary 8-sequences do not all have this property, but it turns out
that there are 267 binary 14-sequences that do have this property. 256 of these
are chosen and mapped to the 256 alphabet symbols. This is termed eight to
fourteen modulation (EFM).
The following table shows part of the mapping used in EFM.

Table 4.3: Sample of EFM Mapping

8-sequence    14-sequence
01100101      00000000100010
01100110      01000000100100
01100111      00100100100010
01101000      01001001000010
01101001      10000001000010
01101010      10010001000010
01101011      10001001000010
01101100      01000001000010
01101101      00000001000010
01101110      00010001000010
We also want to avoid having too many 0s between these strings of length 14
and so, between the strings that make up one codeword, we insert a further
symbol equal to 1 + (final bit of left-hand string) + (final bit of right-hand
string).
The final encoding is to adjoin a binary 27-sequence with information to
ensure the playback is synchronised. These 27-sequences also have no 1s too
close together or too far apart.
Thus by the end of the encoding we have taken 6 ticks, represented by
6 × 4 × 8 = 192 binary digits, and converted them to 588 binary digits.
To decode from the 588 binary digits, start by reversing the final few steps
of the encoding process. So remove the 27 bits which include the synchronising
information, remove the merging bits between the 14-strings and reverse the
look-up procedure to get the strings of length 8. Hopefully these are indeed
codewords in C2, from which all single errors can be corrected. If more than
one error is detected then all bytes in that word are flagged. Next the
interleaving is removed and C1, which has distance 5, is used to correct up to
4 erasures, where flagged bytes are treated as erasures.
How good is this decoding? Decoding in C2 goes wrong if the received word
is within distance 1 of the wrong codeword. There are (2^8)^28 = 2^224 codewords
in C2 and one of them is right. There are 32(2^8 - 1) words of length 32 at
distance 1 from each codeword. To be undetected, an error pattern must take
one codeword in C2 to another codeword in C2 or it must take a codeword in C2
to a word that is distance 1 from an incorrect codeword in C2. There are (2^8)^32
possible received words and of these (2^224 - 1)(1 + 32(2^8 - 1)) result in
the wrong codeword or in a word at distance 1 from the wrong codeword. So the
probability of an undetected decoding error is about (2^224 × 8161)/2^256 =
8161/2^32, which is less than 2 × 10^-6.
4.5.1
Exercises
4.6
References
The material in this topic closely parallels the presentation in Hankerson, Hoffman, Leonard, Lindner, Phelps, Rodger, and Wall (2000).

The introduction to finite fields is an edited version of material in Street and
Burgess (2007).

Table 4.3 is from Watkinson, J. (1988). The Art of Digital Audio. Focal
Press, London, first edition, quoted in Introduction to Sound Recording by
Geoff Martin, available at
http://www.tonmeister.ca/main/textbook/node895.html

A good description of CD encoding, with very nice diagrams and with different statements about the merging channel bits, may be found at
http://www.laesieworks.com/digicom/Storage CD.html
Topic 5
5.1
Number Theory
The RSA algorithm is based on three ideas from elementary number theory:
modular arithmetic, the Euclidean algorithm and Euler's φ-function.
The RSA algorithm uses properties of the powers mod n. The idea is for
each person j to find two distinct large primes, p and q say, calculate n = pq
and find a pair of numbers d and e such that de ≡ 1 mod ((p - 1)(q - 1)). Then
the public encryption function is (n, e), so each ciphertext that person j receives
is c = m^e mod n. It turns out that m = c^d mod n and this is what we will
prove in this section.

The first step is to be able to find pairs of numbers d and e such that
de ≡ 1 mod ((p - 1)(q - 1)) for some p and q.
Recall from the results on affine ciphers that a pair of numbers such that
de ≡ 1 mod 26 were said to be multiplicative inverses mod 26 and that we
showed that d had a multiplicative inverse only if gcd(d, 26) = 1. We said at
the time that this was a general result and this is what we now establish.
Recall that the greatest common divisor of a and b is the largest number that
divides both a and b. We denote this number by gcd(a, b). If gcd(a, b) = 1
then we say that a and b are relatively prime.

We can find gcd(a, b) easily if we can calculate the prime factorisations
of a and b. The next example shows how this works.
EXAMPLE 5.1.1.
We find gcd(126, 95) using the Euclidean algorithm:

126 = 95 × 1 + 31
95  = 31 × 3 + 2
31  = 2 × 15 + 1
2   = 1 × 2 + 0.

So the divisor in each line becomes the subject of the next line and the
remainder in one line becomes the divisor in the next. The last non-zero
remainder is the gcd, so gcd(126, 95) = 1 (which is clearly true since 95 = 19 × 5
and 126 = 2 × 7 × 9).
You can also use this algorithm to write the gcd in terms of the original
numbers. So

1 = 31 - 2 × 15
  = 31 - (95 - 31 × 3) × 15
  = 31 - 95 × 15 + 31 × 45
  = 31 × 46 - 95 × 15
  = (126 - 95) × 46 - 95 × 15
  = 126 × 46 - 95 × 61.
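The back-substitution can be mechanised; a standard extended-Euclid sketch:

```python
def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y = g = gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    return g, y, x - (a // b) * y

print(extended_gcd(126, 95))   # (1, 46, -61): 126*46 - 95*61 = 1
# Multiplicative inverses drop out for free when g = 1:
# extended_gcd(21, 26) gives (1, 5, -4), so 5 is the inverse of 21 mod 26.
```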
THEOREM 5.1.1.

Find the multiplicative inverse of 21 mod 26. First we observe that gcd(21, 26) =
1 and so we know that 21 has a multiplicative inverse mod 26. Next we use the
Euclidean algorithm to find the inverse. Proceeding as above we have

26 = 21 × 1 + 5
21 = 5 × 4 + 1
5  = 1 × 5,

which confirms that gcd(21, 26) = 1 and gives us the information that we
need to calculate the multiplicative inverse. We do that by expressing 1 as a
linear combination of 26 and 21:

1 = 21 - 5 × 4
  = 21 - (26 - 21) × 4
  = 21 × 5 - 26 × 4

and so we see that 5 is the multiplicative inverse of 21 mod 26.
Now we consider a very small example of the RSA system.
EXAMPLE 5.1.4.
Let p = 5 and q = 7, so that n = 35 and (p - 1)(q - 1) = 24. Choose e = 11.
Applying the Euclidean algorithm to 24 and 11 we have

24 = 11 × 2 + 2
11 = 2 × 5 + 1

so we see that

1 = 11 - 5 × 2 = 11 - 5(24 - 11 × 2) = 11 × 11 - 24 × 5,

so d = 11. (In this case the same power is used to encrypt and decrypt and you
can show that this is true for every invertible value mod 24. This is an unusual
situation.)

Now suppose that we want to send the message m = 23. We calculate
m^e mod 35 = 23^11 mod 35 = 32 = c. When c is received calculate c^d mod 35 =
32^11 mod 35 = 23 = m.
We see that we need to be able to calculate modular exponentiation efficiently and we need to establish Euler's Theorem (on which RSA is based).
We make use of the fact that if a ≡ b mod n then a^h ≡ b^h mod n. Thus
we can calculate the powers of numbers mod n without having to do the
exponentiation of more than a few terms. The following example gives the idea.
EXAMPLE 5.1.5.
What is the final digit of 3^56? Thus we want to know the value of 3^56 mod 10.
We observe that 3^2 ≡ 9 mod 10, 3^3 ≡ 9 × 3 ≡ 7 mod 10, 3^4 ≡ 7 × 3 ≡ 1 mod 10
and 3^5 ≡ 1 × 3 ≡ 3 mod 10. Now 56 = 4 × 14 so 3^56 ≡ (3^4)^14 ≡ 1 mod 10.
We need to use the ideas in the previous example since otherwise the computations would tax the computer's memory. For instance if a, b and n are 100-digit
numbers then a^b has more than 10^100 digits and the computer's memory
would overflow. By using the ideas in the previous example the computation of
a^b mod n can be achieved in at most 700 steps and no number will have more
than 200 digits.
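This square-and-multiply idea is what Python's built-in pow(a, b, n) implements. A sketch of it, reducing mod n after every step so that no intermediate value exceeds n^2:

```python
def power_mod(a, b, n):
    """Compute a**b mod n by repeated squaring (square-and-multiply)."""
    result = 1
    a %= n
    while b > 0:
        if b & 1:               # current binary digit of the exponent is 1
            result = result * a % n
        a = a * a % n           # square for the next binary digit
        b >>= 1
    return result

print(power_mod(3, 56, 10))     # 1, as in Example 5.1.5
print(power_mod(23, 11, 35))    # 32, the ciphertext of Example 5.1.4
```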
The next results give us a tool for finding suitable values of e, d and n.
THEOREM 5.1.2.
Let p be a prime and let a be an integer with gcd(a, p) = 1. Then a^{p-1} ≡ 1 mod p.

Proof. Let S = {1, 2, . . . , p - 1}. Multiplying each element of S by a and
reducing mod p gives the elements of S back again, in some order. Thus

(1)(2) . . . (p - 1) ≡ (a)(2a)(3a) . . . (a(p - 1)) ≡ a^{p-1}(1 × 2 × . . . × (p - 1)) mod p.

Since all the entries in S are co-prime to p we can divide by each of them to get

1 ≡ a^{p-1} mod p.
EXAMPLE 5.1.6.
THEOREM 5.1.3.
Suppose that gcd(a, b) = 1. Then φ(ab) = φ(a)φ(b).

Proof. Consider the integers between 1 and ab. Partition them into a sets
S_i = {i, a + i, . . . , (b - 1)a + i}, 1 ≤ i ≤ a. For each of these sets, either all of the
numbers are relatively prime to a or none of the numbers are relatively prime
to a. Thus there are φ(a) sets that have all entries relatively prime to a.

How many of these entries are also co-prime to b? There are b entries in each
set and no two are congruent mod b (since ka + i ≡ ja + i mod b implies that
b | (k - j)a and hence b | (k - j), which is a contradiction). So each set contains
each of the congruence classes mod b and so contains φ(b) entries co-prime to b.
The result follows.
EXAMPLE 5.1.7.
5.1.1
Exercises
5.2
RSA Cryptosystem
The RSA cryptosystem was proposed by Rivest, Shamir and Adleman, after
whom it is named, in 1977. It is an example of a public key cryptosystem,
a concept first suggested by Diffie and Hellman in 1976. In 1997 documents
released under Freedom of Information in Britain showed that James Ellis had
discovered public key cryptography in 1970 and a variant of RSA had been
found in 1973 by Clifford Cocks.
The idea behind public key cryptography is very simple. Each person in a
group, all of whose members want to be able to communicate securely, chooses
a public key, which they publish. This is a function for encrypting a message.
Let's call it E_j for the jth person. Person j also has a secret decryption function
D_j.

So if we want to send message m to person j we look up the list and find
E_j. We then send E_j(m). When person j receives E_j(m) they apply D_j and
get D_j(E_j(m)) = m.
As well as discussing the RSA algorithm we will discuss how easy it is to
recover a message when Dj is not known. We will also briefly discuss ways of
producing a secure cryptosystem which reduces the amount of information that
can be deduced about m merely from seeing the ciphertext c.
5.2.1
Choosing d, e and n

Choose two distinct large primes, say p and q. Let n = pq. Choose an e such
that gcd(e, φ(n)) = 1. Then find the d such that de ≡ 1 mod φ(n). We release e
and n. Suppose that the message is m where m < n. (If not, break m into blocks
that have this property.) Then compute c = m^e mod n. To recover the message
observe that c^d ≡ m^{ed} ≡ m mod n since ed = 1 + kφ(n) by construction. While
this assumes that gcd(m, n) = 1, there is an exercise that shows that in fact we
don't need this to be true to be able to decrypt the ciphertext.
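A sketch of the whole procedure on the toy numbers of Example 5.1.4 (pow(e, -1, phi) needs Python 3.8 or later; real keys would use primes of about 100 digits):

```python
from math import gcd

def make_keys(p, q, e):
    """Toy RSA key generation; assumes p, q prime and gcd(e, phi) == 1."""
    n, phi = p * q, (p - 1) * (q - 1)
    assert gcd(e, phi) == 1
    d = pow(e, -1, phi)          # multiplicative inverse of e mod phi
    return (n, e), d

public, d = make_keys(5, 7, 11)
n, e = public
m = 23
c = pow(m, e, n)                 # encrypt: c = m^e mod n
print(c, pow(c, d, n))           # 32 23 -- decryption recovers m
```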
It is best if p and q are chosen independently of each other and have of the
order of 100 digits each. The lengths should be slightly different (to make it
harder to factor n) and (again for protection from attack) it is better if neither
p - 1 nor q - 1 has only small prime factors.

To make sure that d exists it is often easiest to let e be a moderately large
prime. Small values of e can be used to help factorise n.
Remember that everyone knows e and n. If we could factor n then we could
recover d. Here is an equivalent problem.

If we know n and φ(n) then we can find p and q. Observe that n - φ(n) + 1 =
pq - (p - 1)(q - 1) + 1 = p + q. So we know pq and p + q. Think about the
quadratic polynomial (x - p)(x - q). It has roots p and q but it can also be
written as x^2 - (n - φ(n) + 1)x + n, which can be solved using the quadratic
formula and so yields p and q.
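So knowing φ(n) is as good as knowing the factorisation. A sketch of this recovery (function name mine):

```python
from math import isqrt

def factor_from_phi(n, phi):
    """Recover p and q from n = pq and phi = (p-1)(q-1)."""
    s = n - phi + 1                  # s = p + q
    root = isqrt(s * s - 4 * n)      # discriminant of x^2 - s*x + n
    return (s - root) // 2, (s + root) // 2

print(factor_from_phi(35, 24))       # (5, 7)
```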
5.2.2
RSA is not really fast enough to use to send lots of data but it is a good way
of sending a key to be used for DES (Data Encryption Standard), which is then
used to send the large amounts of data.
5.2.3
Attacks on RSA
This section is really a quick overview of some ideas that you should be aware of
before trying to implement RSA to use in the real world. See Boneh (1999),
which can be found at crypto.stanford.edu/~dabo/papers/RSA-survery.pdf, for
more details.
Suppose that n has g digits and let d be the decryption exponent. If an
attacker has the last g/4 digits of d then they can efficiently find d in time that
is linear in e log_2(e). So if e is small and we have a large part of d then it is
quick to find the rest of d. But if e is large then this result is still no better
than a case-by-case search.
It is tempting to choose small values of d so that messages can be decrypted
quickly. However if d < n^{1/4}/3 then d can be calculated in time polynomial in
log(n).
Transmitting short plaintext can be a problem. For example suppose that a
56-bit key is to be transmitted for use in DES. This key is the message m and is
about 10^17. This is encrypted to give c ≡ m^e mod n and is likely to have about
200 digits. So Eve calculates c·x^{-e} mod n for all 1 ≤ x ≤ 10^9 and y^e mod n for
all 1 ≤ y ≤ 10^9. If she gets a match then c·x^{-e} ≡ y^e, and so c ≡ (xy)^e mod n
and so m ≡ xy mod n. This attack only works when m is the product of two
integers x and y both less than 10^9, but in those cases it does work.
This problem is overcome by padding m with random digits at both the
beginning and end to make a larger number initially. A more sophisticated
approach is given by optimal asymmetric encryption padding; see Trappe and
Washington (2006) for more information.
A timing attack works by observing how long it takes Bob to decrypt various
ciphertexts. Similar attacks can be based on looking at the power consumed
by the decryption process. Again look at Trappe and Washington (2006) for
further details.
Other attacks are based on factorising n and it is instructive to see how
quickly computing power has improved.
When RSA was released in 1977 an RSA challenge (now called the RSA-129
challenge) was created. Given

n = 1143816257578888676692357799761466120102182
    9672124236256256184293570693524573389783059
    7123563958705058989075147599290026879543541,
e = 9007 and
c = 9686961375462206147714092225435588290575999
1124574319874695120930816298225145708356931
476622883989628013391990551829945157815154
find the corresponding plaintext.
In 1977 it was estimated this would take of the order of 4 × 10^16 years. It
was finally solved in 1994 by Atkins, Graff, Lenstra and Leyland.

The plaintext is THE MAGIC WORDS ARE SQUEAMISH OSSIFRAGE.

To quote from Atkins et al.:
To find the factorization of RSA-129, we used the double large
prime variation of the multiple polynomial quadratic sieve factoring
method. The sieving step took approximately 5000 mips years, and
was carried out in 8 months by about 600 volunteers from more
than 20 countries, on all continents except Antarctica. Combining
the partial relations produced a sparse matrix of 569466 rows and
524338 columns. This matrix was reduced to a dense matrix of
188614 rows and 188160 columns using structured Gaussian elimination. Ordinary Gaussian elimination on this matrix, consisting of
35489610240 bits (4.13 gigabyte), took 45 hours on a 16K MasPar
MP-1 massively parallel computer. The first three dependencies all
turned out to be unlucky and produced the trivial factor RSA-129.
The fourth dependency produced the above factorization.
A history of the RSA factor challenge and the numbers involved at each
stage may be found at http://www.rsa.com/rsalabs/node.asp?id=2093.
5.2.4
Exercises
5.3
References

Trappe and Washington (2006) have a nice account of the RSA system,
including both the number theory related to it and a discussion of some
of the number-theoretic attacks that are possible. Boneh (1999), which
can be found at crypto.stanford.edu/~dabo/papers/RSA-survery.pdf, has
more details about attacking the RSA cryptosystem and a criticism of
"textbook RSA".