CHAPTER 01 - Basics of Coding Theory
The ISBN-code

Single error detection

Let X = x_1 … x_10 be a correct codeword (so $\sum_{i=1}^{10} i x_i \equiv 0 \pmod{11}$) and let

Y = x_1 … x_{j-1} y_j x_{j+1} … x_10 with y_j = x_j + a, a ≠ 0.

In such a case:

$$\sum_{i=1}^{10} i y_i = \sum_{i=1}^{10} i x_i + ja \equiv ja \not\equiv 0 \pmod{11},$$

because 11 is a prime and j, a ≢ 0 (mod 11); hence the error is detected.
Transposition detection
Let x_j and x_k be exchanged (j ≠ k).
Then

$$\sum_{i=1}^{10} i y_i - \sum_{i=1}^{10} i x_i = j x_k + k x_j - j x_j - k x_k = (k - j)(x_j - x_k) \equiv 0 \pmod{11}$$

if and only if j = k or x_j = x_k (again because 11 is a prime); hence any transposition of two distinct digits is detected.
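To make the two detection properties concrete, here is a minimal sketch in Python; the function name isbn_ok and the sample numbers are our own choices for illustration, not part of any standard library.

def isbn_ok(code):
    # ISBN-10 check: sum of i * x_i for i = 1..10 must be 0 modulo 11;
    # the symbol 'X' denotes the digit value 10.
    digits = [10 if c == 'X' else int(c) for c in code]
    return sum(i * x for i, x in enumerate(digits, start=1)) % 11 == 0

assert isbn_ok('123456789X')        # a codeword constructed to satisfy the check
assert not isbn_ok('123466789X')    # one digit changed: detected
assert not isbn_ok('213456789X')    # two distinct digits transposed: detected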
Equivalence of codes
Definition Two q-ary codes are called equivalent if one can be obtained from the
other by a combination of operations of the following type:
(a) a permutation of the positions of the code;
(b) a permutation of symbols appearing in a fixed position.
Question: Let a code be displayed as an M × n matrix. To what do operations (a)
and (b) correspond?
Claim: Distances between codewords are unchanged by operations (a), (b).
Consequently, equivalent codes have the same parameters (n,M,d) (and correct
the same number of errors).
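The claim is easy to check numerically. The following sketch (Python; the small binary code and the two permutations are arbitrary choices made for illustration) verifies that operations (a) and (b) leave all pairwise distances unchanged.

from itertools import combinations

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

code = ["00100", "11000", "11111", "00011"]   # a small (5,4,3)-code

pi = [4, 3, 0, 1, 2]          # operation (a): a permutation of the positions
swap = {"0": "1", "1": "0"}   # operation (b): a symbol permutation in position 0

def transform(w):
    w = "".join(w[i] for i in pi)
    return swap[w[0]] + w[1:]

assert all(hamming(u, v) == hamming(transform(u), transform(v))
           for u, v in combinations(code, 2))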
Examples of equivalent codes
Lemma Any q-ary (n,M,d)-code over an alphabet {0,1,…,q−1} is equivalent to an
(n,M,d)-code which contains the all-zero codeword 00…0.
Proof Trivial.
(1) The binary codes
{00100, 11000, 11111, 00011} and {00000, 10110, 11101, 01011}
are equivalent.

(2) The ternary code {102, 021, 210} is equivalent to the code {000, 111, 222},
which contains the all-zero codeword, as the lemma guarantees.
The main coding theory problem
A good (n,M,d)-code has small n, large M and large d.
The main coding theory problem is to optimize one of the parameters n, M, d
for given values of the other two.
Notation: A_q(n,d) is the largest M such that there is a q-ary (n,M,d)-code.

Theorem (a) A_q(n,1) = q^n;
(b) A_q(n,n) = q.

Proof
(a) Obvious.
(b) Let C be a q-ary (n,M,n)-code. Any two distinct codewords of C differ in
all n positions. Hence the symbols in any fixed position of the M codewords have
to be different, so A_q(n,n) ≤ q. Since the q-ary repetition code is an
(n,q,n)-code, we get A_q(n,n) ≥ q.
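A_q(n,d) has no closed form in general, but for tiny parameters it can be computed by exhaustive search. A sketch for the binary case (exponential time, illustration only; the name A2 is ours):

from itertools import product

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def A2(n, d):
    # size of a largest set of binary words of length n
    # with pairwise Hamming distance >= d
    words = list(product((0, 1), repeat=n))
    def grow(chosen, candidates):
        best = len(chosen)
        for i, w in enumerate(candidates):
            compatible = [v for v in candidates[i + 1:] if hamming(v, w) >= d]
            best = max(best, grow(chosen + [w], compatible))
        return best
    return grow([], words)

assert A2(3, 1) == 2 ** 3   # theorem, part (a)
assert A2(3, 3) == 2        # theorem, part (b)
assert A2(5, 3) == 4        # the example proved below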
Example Proof that A_2(5,3) = 4.

(a) Code C_3 is a (5,4,3)-code, hence A_2(5,3) ≥ 4.
(b) Let C be a (5,M,3)-code with M ≥ 4.
By the previous lemma we can assume that 00000 ∈ C.
C contains at most one codeword with at least four 1's (otherwise d(x,y) ≤ 2
for two such codewords x, y).
Since 00000 ∈ C, there can be no codeword in C with exactly one or two 1's.
Since d = 3, C cannot contain three codewords with exactly three 1's.
Since M ≥ 4, there have to be in C two codewords with three 1's (say 11100,
00111); the only possible codeword with four or five 1's is then 11011.
Hence M ≤ 4, and so A_2(5,3) = 4.
Theorem Suppose d is odd. Then a binary (n,M,d)-code exists if and only if a
binary (n+1,M,d+1)-code exists.

Proof Only if case: Let C be a binary (n,M,d)-code. Let

$$C' = \{\, x_1 \ldots x_n x_{n+1} \mid x_1 \ldots x_n \in C,\ x_{n+1} = \Big(\sum_{i=1}^{n} x_i\Big) \bmod 2 \,\}.$$

Since the parity of every codeword in C' is even, d(x,y) is even for all
x, y ∈ C'. Hence d(C') is even. Since d ≤ d(C') ≤ d+1 and d is odd,
d(C') = d+1.
Hence C' is an (n+1,M,d+1)-code.

If case: Let D be an (n+1,M,d+1)-code. Choose codewords x, y of D such
that d(x,y) = d+1.
Find a position in which x and y differ and delete this position from all
codewords of D. The resulting code is an (n,M,d)-code.
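Both directions of the proof are constructive. A minimal sketch of the two constructions (Python; binary codewords as strings, and the names extend and puncture are our own):

def extend(code):
    # 'only if' direction: append an overall parity bit to every codeword,
    # turning an (n,M,d)-code with d odd into an (n+1,M,d+1)-code
    return [w + str(sum(map(int, w)) % 2) for w in code]

def puncture(code, j):
    # 'if' direction: delete position j from every codeword
    return [w[:j] + w[j + 1:] for w in code]

C = ["00000", "01101", "10110", "11011"]   # a (5,4,3)-code
D = extend(C)   # ["000000", "011011", "101101", "110110"], a (6,4,4)-code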
Corollary:
If d is odd, then A_2(n,d) = A_2(n+1,d+1).
Equivalently, if d is even, then A_2(n,d) = A_2(n−1,d−1).

Example A_2(5,3) = 4 implies A_2(6,4) = 4.

(5,4,3)-code      (6,4,4)-code
0 0 0 0 0         0 0 0 0 0 0
0 1 1 0 1         0 1 1 0 1 1
1 0 1 1 0         1 0 1 1 0 1    (by adding a parity-check bit)
1 1 0 1 1         1 1 0 1 1 0
A general upper bound on A_q(n,d)
Notation F_q^n is the set of all words of length n over the alphabet
{0,1,2,…,q−1}.

Definition For any word u ∈ F_q^n and any integer r ≥ 0, the sphere of
radius r and centre u is denoted by

$$S(u,r) = \{\, v \in F_q^n \mid d(u,v) \le r \,\}.$$

Theorem A sphere of radius r in F_q^n, 0 ≤ r ≤ n, contains

$$\binom{n}{0} + \binom{n}{1}(q-1) + \binom{n}{2}(q-1)^2 + \cdots + \binom{n}{r}(q-1)^r$$

words.
Proof Let u be a fixed word in F_q^n. The number of words that differ from u
in exactly m positions is

$$\binom{n}{m}(q-1)^m.$$
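A quick numerical sanity check of the theorem (Python; the parameters are arbitrary and the name sphere_size is ours):

from itertools import product
from math import comb

def sphere_size(n, r, q):
    # number of words within Hamming distance r of any fixed word of F_q^n
    return sum(comb(n, m) * (q - 1) ** m for m in range(r + 1))

n, q, r = 4, 3, 2
brute = sum(1 for w in product(range(q), repeat=n)
            if sum(x != 0 for x in w) <= r)   # count words around 0000
assert brute == sphere_size(n, r, q)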
Theorem (The sphere-packing or Hamming bound)
If C is a q-ary (n,M,2t+1)-code, then

$$M \left\{ \binom{n}{0} + \binom{n}{1}(q-1) + \cdots + \binom{n}{t}(q-1)^t \right\} \le q^n. \qquad (1)$$

Proof Any two spheres of radius t centred on distinct codewords have no
word in common. Hence the total number of words in the M spheres of radius
t centred on the M codewords is given by the left side of (1). This number has
to be less than or equal to q^n.

A code which achieves the sphere-packing bound (1), i.e. such that equality
holds in (1), is called a perfect code.
Example A binary (7,M,3)-code is perfect if

$$M \left( \binom{7}{0} + \binom{7}{1} \right) = 2^7,$$

i.e. M = 16.

An example of such a code:

C_4 = {0000000, 1111111, 1000101, 1100010, 0110001, 1011000, 0101100,
0010110, 0001011, 0111010, 0011101, 1001110, 0100111, 1010011,
1101001, 1110100}

Table of A_2(n,d) from 1981 (for current best results see
http://www.win.tue.nl/math/dw/voorlincod.html):
n    d = 3       d = 5     d = 7
5    4           2         -
6    8           2         -
7    16          2         2
8    20          4         2
9    40          6         2
10   72-79       12        2
11   144-158     24        4
12   256         32        4
13   512         64        8
14   1024        128       16
15   2048        256       32
16   2560-3276   256-340   36-37
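The following sketch (Python) checks that C_4 above is indeed perfect: it has minimum distance 3 (so t = 1) and attains equality in the sphere-packing bound (1).

from itertools import combinations
from math import comb

C4 = ["0000000", "1111111", "1000101", "1100010", "0110001", "1011000",
      "0101100", "0010110", "0001011", "0111010", "0011101", "1001110",
      "0100111", "1010011", "1101001", "1110100"]

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

d = min(hamming(u, v) for u, v in combinations(C4, 2))
t = (d - 1) // 2
lhs = len(C4) * sum(comb(7, m) for m in range(t + 1))   # M * (C(7,0) + C(7,1))
assert d == 3 and lhs == 2 ** 7                         # equality in (1): perfect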
LOWER BOUND for A_q(n,d)
The following lower bound for A_q(n,d) is known as the Gilbert-Varshamov bound:

Theorem Given d ≤ n, there exists a q-ary (n,M,d)-code with

$$M \ge \frac{q^n}{\sum_{j=0}^{d-1} \binom{n}{j} (q-1)^j}$$

and therefore

$$A_q(n,d) \ge \frac{q^n}{\sum_{j=0}^{d-1} \binom{n}{j} (q-1)^j}.$$
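A sketch evaluating the bound numerically (Python; the name gv_lower_bound is ours, and rounding the right-hand side down still gives a valid integer lower bound):

from math import comb

def gv_lower_bound(n, d, q):
    # q^n / sum_{j=0}^{d-1} C(n,j) (q-1)^j, rounded down
    denom = sum(comb(n, j) * (q - 1) ** j for j in range(d))
    return q ** n // denom

print(gv_lower_bound(5, 3, 2))    # -> 2: guarantees only A_2(5,3) >= 2 (true value: 4)
print(gv_lower_bound(13, 5, 2))   # -> 7, a bound for a larger parameter set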
General coding problem
The basic problems of information theory are how to define formally such concepts
as information and how to store or transmit information efficiently.

Let X be a random variable (source) which takes a value x with probability p(x).
The entropy of X is defined by

$$S(X) = -\sum_{x} p(x) \lg p(x)$$

and it is considered to be the information content of X.

The maximum information which can be stored by an n-value variable is lg n.

In the special case of a binary variable X which takes on the value 1 with
probability p and the value 0 with probability 1 − p,

S(X) = H(p) = −p lg p − (1 − p) lg (1 − p).

Problem: What is the minimal number of bits we need to transmit n values of X?

Basic idea: Encode more probable outputs of X by shorter binary words.

Example (Morse code)
a .-     b -...   c -.-.   d -..    e .      f ..-.   g --.
h ....   i ..     j .---   k -.-    l .-..   m --     n -.
o ---    p .--.   q --.-   r .-.    s ...    t -      u ..-
v ...-   w .--    x -..-   y -.--   z --..
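A direct transcription of the entropy definition (Python; the name entropy is ours):

from math import log2

def entropy(probs):
    # S(X) = -sum_x p(x) lg p(x); terms with p(x) = 0 contribute nothing
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.25, 0.75]))   # H(1/4), about 0.811 bits
print(entropy([1/8] * 8))      # uniform on 8 values: lg 8 = 3.0 bits, the maximum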
Shannon's noiseless coding theorem
In a simple form, Shannon's noiseless coding theorem says that in order to transmit
n values of X we need nS(X) bits.
More exactly, we cannot do better, and we can get as close to the bound nS(X) as
desirable.

Example Let a source X produce the value 1 with probability p = ¼
and the value 0 with probability 1 − p = ¾.
Assume we want to encode blocks of the outputs of X of length 4.
By Shannon's theorem we need 4H(¼) ≈ 3.245 bits per block (on average).
A simple and practical method known as Huffman coding requires in this case
3.273 bits per message:

mess.  code     mess.  code     mess.  code     mess.  code
0000   10       0100   010      1000   011      1100   11101
0001   000      0101   11001    1001   11011    1101   111110
0010   001      0110   11010    1010   11100    1110   111101
0011   11000    0111   1111000  1011   111111   1111   1111001
Observe that this is a prefix code - no codeword is a prefix of another codeword.
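The two figures quoted above can be reproduced from the table. A sketch (Python), assuming the four bits of a block are produced independently with p = ¼:

from math import log2

table = {"0000": "10",    "0100": "010",     "1000": "011",    "1100": "11101",
         "0001": "000",   "0101": "11001",   "1001": "11011",  "1101": "111110",
         "0010": "001",   "0110": "11010",   "1010": "11100",  "1110": "111101",
         "0011": "11000", "0111": "1111000", "1011": "111111", "1111": "1111001"}

p = 1 / 4
def prob(block):                     # independent bits, each 1 with probability p
    ones = block.count("1")
    return p ** ones * (1 - p) ** (4 - ones)

avg = sum(prob(b) * len(c) for b, c in table.items())
bound = 4 * -(p * log2(p) + (1 - p) * log2(1 - p))
print(round(avg, 3), round(bound, 3))   # -> 3.273 3.245

codes = table.values()                  # prefix property holds:
assert not any(c != d and d.startswith(c) for c in codes for d in codes)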
Design of Huffman code
Given a sequence of n objects, x_1, …, x_n, with probabilities p_1 ≥ … ≥ p_n.

Stage 1 - shrinking of the sequence.
Replace x_{n−1}, x_n with a new object y_{n−1} with probability p_{n−1} + p_n,
and rearrange the sequence so that one again has nonincreasing probabilities.
Keep doing the above step till the sequence shrinks to two objects.
Stage 2 - extending the code. Apply again and again the following method:
If C = {c_1, …, c_r} is a prefix optimal code for a source S_r, then
C' = {c'_1, …, c'_{r+1}} is an optimal code for S_{r+1}, where

c'_i = c_i for 1 ≤ i ≤ r − 1,
c'_r = c_r 1,
c'_{r+1} = c_r 0.
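A compact sketch of both stages using a priority queue (Python; the name huffman is ours). Stage 1 happens as the two least probable items are merged; Stage 2 happens in reverse as 0/1 are prepended on the way back up:

import heapq
from itertools import count

def huffman(probs):
    # probs: symbol -> probability; returns symbol -> binary codeword
    tiebreak = count()                 # keeps the heap from comparing dicts
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # Stage 1: merge the two least
        p2, _, c2 = heapq.heappop(heap)   # probable objects...
        merged = {s: "1" + w for s, w in c1.items()}
        merged.update({s: "0" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]                  # ...Stage 2 extended the codes on the way

print(huffman({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}))
# one optimal result: 'a' gets 1 bit, 'b' 2 bits, 'c' and 'd' 3 bits each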
A BIT OF HISTORY
The subject of error-correcting codes arose originally as a response to
practical problems in the reliable communication of digitally encoded
information.
The discipline was initiated in the paper:

Claude Shannon: A mathematical theory of communication, Bell Syst. Tech.
Journal, V27, 1948, 379-423, 623-656.

Shannon's paper started the scientific discipline of information theory,
and error-correcting codes are a part of it.
Originally, information theory was a part of electrical engineering.
Nowadays, it is an important part of mathematics and also of
informatics.
SHANNON's VIEW
In the introduction to his seminal paper "A mathematical theory of
communication" Shannon wrote:

"The fundamental problem of communication is that of reproducing at
one point either exactly or approximately a message selected at
another point."