Error-correcting codes and cryptology

Ruud Pellikaan¹, Xin-Wen Wu², Stanislav Bulygin³ and Relinde Jurrius⁴
PRELIMINARY VERSION
23 January 2012
All rights reserved.
To be published by Cambridge University Press.
No part of this manuscript is to be reproduced
without written consent of the authors and the publisher.
¹ g.r.pellikaan@tue.nl, Department of Mathematics and Computing Science, Eindhoven University of Technology, P.O. Box 513, NL-5600 MB Eindhoven, The Netherlands
² x.wu@griffith.edu.au, School of Information and Communication Technology, Griffith University, Gold Coast, QLD 4222, Australia
³ Stanislav.Bulygin@cased.de, Department of Mathematics, Technische Universität Darmstadt, Mornewegstrasse 32, 64293 Darmstadt, Germany
⁴ r.p.m.j.jurrius@tue.nl, Department of Mathematics and Computing Science, Eindhoven University of Technology, P.O. Box 513, NL-5600 MB Eindhoven, The Netherlands
Contents

1 Introduction
1.1 Notes

2 Error-correcting codes
2.1 Block codes
2.1.1 Repetition, product and Hamming codes
2.1.2 Codes and Hamming distance
2.1.3 Exercises
2.2 Linear Codes
2.2.1 Linear codes
2.2.2 Generator matrix and systematic encoding
2.2.3 Exercises
2.3 Parity checks and dual code
2.3.1 Parity check matrix
2.3.2 Hamming and simplex codes
2.3.3 Inner product and dual codes
2.3.4 Exercises
2.4 Decoding and the error probability
2.4.1 Decoding problem
2.4.2 Symmetric channel
2.4.3 Exercises
2.5 Equivalent codes
2.5.1 Number of generator matrices and codes
2.5.2 Isometries and equivalent codes
2.5.3 Exercises
2.6 Notes

3 Code constructions and bounds
3.1 Code constructions
3.1.1 Constructing shorter and longer codes
3.1.2 Product codes
3.1.3 Several sum constructions
3.1.4 Concatenated codes
3.1.5 Exercises
3.2 Bounds on codes
3.2.1 Singleton bound and MDS codes
3.2.2 Griesmer bound
3.2.3 Hamming bound
3.2.4 Plotkin bound
3.2.5 Gilbert and Varshamov bounds
3.2.6 Exercises
3.3 Asymptotically good codes
3.3.1 Asymptotic Gilbert-Varshamov bound
3.3.2 Some results for the generic case
3.3.3 Exercises
3.4 Notes

4 Weight enumerator
4.1 Weight enumerator
4.1.1 Weight spectrum
4.1.2 Average weight enumerator
4.1.3 MacWilliams identity
4.1.4 Exercises
4.2 Error probability
4.2.1 Error probability of undetected error
4.2.2 Probability of decoding error
4.2.3 Random coding
4.2.4 Exercises
4.3 Finite geometry and codes
4.3.1 Projective space and projective systems
4.3.2 MDS codes and points in general position
4.3.3 Exercises
4.4 Extended weight enumerator
4.4.1 Arrangements of hyperplanes
4.4.2 Weight distribution of MDS codes
4.4.3 Extended weight enumerator
4.4.4 Puncturing and shortening
4.4.5 Exercises
4.5 Generalized weight enumerator
4.5.1 Generalized Hamming weights
4.5.2 Generalized weight enumerators
4.5.3 Generalized weight enumerators of MDS-codes
4.5.4 Connections
4.5.5 Exercises
4.6 Notes

5 Codes and related structures
5.1 Graphs and codes
5.1.1 Colorings of a graph
5.1.2 Codes on graphs
5.1.3 Exercises
5.2 Matroids and codes
5.2.1 Matroids
5.2.2 Realizable matroids
5.2.3 Graphs and matroids
5.2.4 Tutte and Whitney polynomial of a matroid
5.2.5 Weight enumerator and Tutte polynomial
5.2.6 Deletion and contraction of matroids
5.2.7 MacWilliams type property for duality
5.2.8 Exercises
5.3 Geometric lattices and codes
5.3.1 Posets, the Möbius function and lattices
5.3.2 Geometric lattices
5.3.3 Geometric lattices and matroids
5.3.4 Exercises
5.4 Characteristic polynomial
5.4.1 Characteristic and Möbius polynomial
5.4.2 Characteristic polynomial of an arrangement
5.4.3 Characteristic polynomial of a code
5.4.4 Minimal codewords and subcodes
5.4.5 Two variable zeta function
5.4.6 Overview
5.4.7 Exercises
5.5 Combinatorics and codes
5.5.1 Orthogonal arrays and codes
5.5.2 Designs and codes
5.5.3 Exercises
5.6 Notes

6 Complexity and decoding
6.1 Complexity
6.1.1 Big-Oh notation
6.1.2 Boolean functions
6.1.3 Hard problems
6.1.4 Exercises
6.2 Decoding
6.2.1 Decoding complexity
6.2.2 Decoding erasures
6.2.3 Information and covering set decoding
6.2.4 Nearest neighbor decoding
6.2.5 Exercises
6.3 Difficult problems in coding theory
6.3.1 General decoding and computing minimum distance
6.3.2 Is decoding up to half the minimum distance hard?
6.3.3 Other hard problems
6.4 Notes

7 Cyclic codes
7.1 Cyclic codes
7.1.1 Definition of cyclic codes
7.1.2 Cyclic codes as ideals
7.1.3 Generator polynomial
7.1.4 Encoding cyclic codes
7.1.5 Reversible codes
7.1.6 Parity check polynomial
7.1.7 Exercises
7.2 Defining zeros
7.2.1 Structure of finite fields
7.2.2 Minimal polynomials
7.2.3 Cyclotomic polynomials and cosets
7.2.4 Zeros of the generator polynomial
7.2.5 Exercises
7.3 Bounds on the minimum distance
7.3.1 BCH bound
7.3.2 Quadratic residue codes
7.3.3 Hamming, simplex and Golay codes as cyclic codes
7.3.4 Exercises
7.4 Improvements of the BCH bound
7.4.1 Hartmann-Tzeng bound
7.4.2 Roos bound
7.4.3 AB bound
7.4.4 Shift bound
7.4.5 Exercises
7.5 Locator polynomials and decoding cyclic codes
7.5.1 Mattson-Solomon polynomial
7.5.2 Newton identities
7.5.3 APGZ algorithm
7.5.4 Closed formulas
7.5.5 Key equation and Forney's formula
7.5.6 Exercises
7.6 Notes

8 Polynomial codes
8.1 RS codes and their generalizations
8.1.1 Reed-Solomon codes
8.1.2 Extended and generalized RS codes
8.1.3 GRS codes under transformations
8.1.4 Exercises
8.2 Subfield and trace codes
8.2.1 Restriction and extension by scalars
8.2.2 Parity check matrix of a restricted code
8.2.3 Invariant subspaces
8.2.4 Cyclic codes as subfield subcodes
8.2.5 Trace codes
8.2.6 Exercises
8.3 Some families of polynomial codes
8.3.1 Alternant codes
8.3.2 Goppa codes
8.3.3 Counting polynomials
8.3.4 Exercises
8.4 Reed-Muller codes
8.4.1 Punctured Reed-Muller codes as cyclic codes
8.4.2 Reed-Muller codes as subfield subcodes and trace codes
8.4.3 Exercises
8.5 Notes

9 Algebraic decoding
9.1 Error-correcting pairs
9.1.1 Decoding by error-correcting pairs
9.1.2 Existence of error-correcting pairs
9.1.3 Exercises
9.2 Decoding by key equation
9.2.1 Algorithm of Euclid-Sugiyama
9.2.2 Algorithm of Berlekamp-Massey
9.2.3 Exercises
9.3 List decoding by Sudan's algorithm
9.3.1 Error-correcting capacity
9.3.2 Sudan's algorithm
9.3.3 List decoding of Reed-Solomon codes
9.3.4 List decoding of Reed-Muller codes
9.3.5 Exercises
9.4 Notes

10 Cryptography
10.1 Symmetric cryptography and block ciphers
10.1.1 Symmetric cryptography
10.1.2 Block ciphers. Simple examples
10.1.3 Security issues
10.1.4 Modern ciphers. DES and AES
10.1.5 Exercises
10.2 Asymmetric cryptosystems
10.2.1 RSA
10.2.2 Discrete logarithm problem and public-key cryptography
10.2.3 Some other asymmetric cryptosystems
10.2.4 Exercises
10.3 Authentication, orthogonal arrays, and codes
10.3.1 Authentication codes
10.3.2 Authentication codes and other combinatorial objects
10.3.3 Exercises
10.4 Secret sharing
10.4.1 Exercises
10.5 Basics of stream ciphers. Linear feedback shift registers
10.5.1 Exercises
10.6 PKC systems using error-correcting codes
10.6.1 McEliece encryption scheme
10.6.2 Niederreiter's encryption scheme
10.6.3 Attacks
10.6.4 The attack of Sidelnikov and Shestakov
10.6.5 Exercises
10.7 Notes
10.7.1 Section 10.1
10.7.2 Section 10.2
10.7.3 Section 10.3
10.7.4 Section 10.4
10.7.5 Section 10.5
10.7.6 Section 10.6

11 The theory of Gröbner bases and its applications
11.1 Polynomial system solving
11.1.1 Linearization techniques
11.1.2 Gröbner bases
11.1.3 Exercises
11.2 Decoding codes with Gröbner bases
11.2.1 Cooper's philosophy
11.2.2 Newton identities based method
11.2.3 Decoding arbitrary linear codes
11.2.4 Exercises
11.3 Algebraic cryptanalysis
11.3.1 Toy example
11.3.2 Writing down equations
11.3.3 General S-Boxes
11.3.4 Exercises
11.4 Notes

12 Coding theory with computer algebra packages
12.1 Singular
12.2 Magma
12.2.1 Linear codes
12.2.2 AG-codes
12.2.3 Algebraic curves
12.3 GAP
12.4 Sage
12.4.1 Coding Theory
12.4.2 Cryptography
12.4.3 Algebraic curves
12.5 Coding with computer algebra
12.5.1 Introduction
12.5.2 Error-correcting codes
12.5.3 Code constructions and bounds
12.5.4 Weight enumerator
12.5.5 Codes and related structures
12.5.6 Complexity and decoding
12.5.7 Cyclic codes
12.5.8 Polynomial codes
12.5.9 Algebraic decoding

13 Bézout's theorem and codes on plane curves
13.1 Affine and projective space
13.2 Plane curves
13.3 Bézout's theorem
13.3.1 Another proof of Bézout's theorem by the footprint
13.4 Codes on plane curves
13.5 Conics, arcs and Segre
13.6 Cubic plane curves
13.6.1 Elliptic curves
13.6.2 The addition law on elliptic curves
13.6.3 Number of rational points on an elliptic curve
13.6.4 The discrete logarithm on elliptic curves
13.7 Quartic plane curves
13.7.1 Flexes and bitangents
13.7.2 The Klein quartic
13.8 Divisors
13.9 Differentials on a curve
13.10 The Riemann-Roch theorem
13.11 Codes from algebraic curves
13.12 Rational functions and divisors on plane curves
13.13 Resolution or normalization of curves
13.14 Newton polygon of plane curves
13.15 Notes

14 Curves
14.1 Algebraic varieties
14.2 Curves
14.3 Curves and function fields
14.4 Normal rational curves and Segre's problems
14.5 The number of rational points
14.5.1 Zeta function
14.5.2 Hasse-Weil bound
14.5.3 Serre's bound
14.5.4 Ihara's bound
14.5.5 Drinfeld-Vlăduţ bound
14.5.6 Explicit formulas
14.5.7 Oesterlé's bound
14.6 Trace codes and curves
14.7 Good curves
14.7.1 Maximal curves
14.7.2 Shimura modular curves
14.7.3 Drinfeld modular curves
14.7.4 Tsfasman-Vlăduţ-Zink bound
14.7.5 Towers of Garcia-Stichtenoth
14.8 Applications of AG codes
14.8.1 McEliece cryptosystem with AG codes
14.8.2 Authentication codes
14.8.3 Fast multiplication in finite fields
14.8.4 Correlation sequences and pseudo random sequences
14.8.5 Quantum codes
14.8.6 Exercises
14.9 Notes
Chapter 1
Introduction
Acknowledgement:
1.1 Notes
Chapter 2
Error-correcting codes
Ruud Pellikaan and Xin-Wen Wu
The idea of redundant information is a well-known phenomenon to every reader of a newspaper. Misspellings usually go unnoticed by a casual reader, while the meaning is still grasped. In Semitic languages such as Hebrew, and even earlier in the hieroglyphics in the tombs of the pharaohs of Egypt, only the consonants are written while the vowels are left out, so that nowadays we do not know for sure how these words were pronounced. The letter "e" is the most frequently occurring symbol in the English language, and leaving out all these letters would still give an understandable text in almost all cases, at the expense of greater attention from the reader.
The art and science of deleting redundant information in a clever way, such that it can be stored in less memory or space and still be expanded to the original message, is called data compression or source coding. It is not the topic of this book. So we can compress data, but an error made in a compressed text would give a different message that is most of the time completely meaningless.
The idea in error-correcting codes is the converse. One adds redundant information in such a way that it is possible to detect or even correct errors after transmission. In radio contact between pilots and radar controllers the letters of the alphabet are spoken phonetically as "Alpha, Bravo, Charlie, ..." but "Adams, Boston, Chicago, ..." is more commonly used for spelling in a telephone conversation. The addition of a parity check symbol enables one to detect an error, such as on the punch cards that were formerly fed to a computer, in the ISBN code for books, the European Article Numbering (EAN) and the Universal Product Code (UPC) for articles. Error-correcting codes are common in numerous situations such as audio-visual media, fault-tolerant computers and deep-space telecommunication.
more examples: QR quick response 2D code.
deep space, compact disc and DVD, .....
more pictures
[Figure 2.1: Block diagram of a communication system: message → source encoding (e.g. 001...) → sender → noise → receiver (e.g. 011...) → decoding → target message]
2.1 Block codes
Legend has it that Hamming was so frustrated that the computer halted every time it detected an error after he handed in a stack of punch cards, that he thought about a way the computer would be able not only to detect the error but also to correct it automatically. He came up with the famous code that is now named after him. Whereas the theory of Hamming is about the actual construction, the encoding and decoding of codes, and uses tools from combinatorics and algebra, the approach of Shannon leads to information theory, and his theorems tell us what is and what is not possible in a probabilistic sense.
According to Shannon we have a message m in a certain alphabet and of a certain length. We encode m to c by expanding the length of the message and adding redundant information. One can define the information rate R that measures the slowing down of the transmission of the data. The encoded message c is sent over a noisy channel such that the symbols are changed, according to certain probabilities that are characteristic of the channel. The received word r is decoded to m'. Now given the characteristics of the channel one can define the capacity C of the channel, and it has the property that for every R < C it is possible to find an encoding and decoding scheme such that the error probability that m' ≠ m is arbitrarily small. For R > C such a scheme is not possible. The capacity is explicitly known as a function of the characteristic probability for quite a number of channels.
The notion of a channel must be taken in a broad sense. Not only the transmission of data via satellite or telephone, but also the storage of information on a hard disk of a computer or a compact disc for music and film can be modeled by a channel.
The theorem of Shannon tells us that certain encoding and decoding schemes exist; one can even say that they exist in abundance and that almost all schemes satisfy the required conditions, but it does not tell us how to construct a specific scheme efficiently. The information theoretic part of error-correcting codes is considered in this book only insofar as it motivates the construction of coding and decoding algorithms.
The situation for the best codes in terms of the maximal number of errors that
one can correct for a given information rate and code length is not so clear.
Several existence and nonexistence theorems are known, but the exact bound is
in fact still an open problem.
2.1.1 Repetition, product and Hamming codes
Adding a parity check such that the number of ones is even is a well-known way to detect one error. But this does not correct the error.
Example 2.1.1 Replacing every symbol by a threefold repetition gives the possibility of correcting one error in every 3-tuple of symbols in a received word by a majority vote. The price one has to pay is that the transmission is three times slower. We see here the two conflicting demands of error-correction: to correct as many errors as possible and to transmit as fast as possible. Notice furthermore that in case two errors are introduced by transmission, the majority decoding rule will introduce a decoding error.
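The majority vote is easy to make concrete. The following Python sketch (our own illustration; the function names are freely chosen and not from the text) encodes a bit sequence by threefold repetition and decodes each 3-tuple by majority:

    def encode_repetition(bits):
        # repeat every bit three times: 1,0 -> 1,1,1,0,0,0
        return [b for b in bits for _ in range(3)]

    def decode_repetition(received):
        # majority vote in every 3-tuple; corrects one error per tuple,
        # but two errors in one tuple cause a decoding error
        return [int(sum(received[i:i + 3]) >= 2)
                for i in range(0, len(received), 3)]

    word = encode_repetition([1, 0, 1])
    word[1] ^= 1                      # flip one bit in the first 3-tuple
    assert decode_repetition(word) == [1, 0, 1]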
Example 2.1.2 An improvement is the following product construction. Suppose we want to transmit a binary message (m1, m2, m3, m4) of length 4 by adding 5 redundant bits (r1, r2, r3, r4, r5). Put these 9 bits in a 3 × 3 array as shown below. The redundant bits are defined by the condition that the number of ones in every row and in every column should be even.

    m1 m2 r1
    m3 m4 r2
    r3 r4 r5

It is clear that r1, r2, r3 and r4 are well defined by these rules. The condition on the last row and on the last column are equivalent, given the rules for the first two rows and columns. Hence r5 is also well defined.
If in the transmission of this word of 9 bits one symbol is flipped from 0 to 1 or vice versa, then the receiver will notice this and is able to correct it. For if the error occurred in row i and column j, then the receiver will detect an odd parity in this row and this column and an even parity in the remaining rows and columns. Suppose that the message is m = (1, 1, 0, 1). Then the redundant part is r = (0, 1, 1, 0, 1) and c = (1, 1, 0, 1, 0, 1, 1, 0, 1) is transmitted. Suppose that y = (1, 1, 0, 1, 0, 0, 1, 0, 1) is the received word.
1 1 0
0 1 0 ←
1 0 1
↑
Then the receiver detects an error in row 2 and column 3 and will change the corresponding symbol. So this product code can correct one error, like the repetition code, but its information rate is improved from 1/3 to 4/9.
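The single-error correction just described fits in a few lines of code. Below is a Python sketch (our own illustration, with freely chosen function names) that places the 9 bits in the 3 × 3 array of the text, locates the row and the column with odd parity, and flips the bit where they cross.

    def to_array(c):
        # c = (m1, m2, m3, m4, r1, r2, r3, r4, r5) placed as in the text:
        #   m1 m2 r1
        #   m3 m4 r2
        #   r3 r4 r5
        m1, m2, m3, m4, r1, r2, r3, r4, r5 = c
        return [[m1, m2, r1], [m3, m4, r2], [r3, r4, r5]]

    def from_array(a):
        return [a[0][0], a[0][1], a[1][0], a[1][1],
                a[0][2], a[1][2], a[2][0], a[2][1], a[2][2]]

    def correct_one_error(y):
        a = to_array(y)
        bad_rows = [i for i in range(3) if sum(a[i]) % 2 == 1]
        bad_cols = [j for j in range(3) if sum(a[i][j] for i in range(3)) % 2 == 1]
        if len(bad_rows) == 1 and len(bad_cols) == 1:
            a[bad_rows[0]][bad_cols[0]] ^= 1   # flip the bit at the crossing
        return from_array(a)

    c = [1, 1, 0, 1, 0, 1, 1, 0, 1]
    y = [1, 1, 0, 1, 0, 0, 1, 0, 1]    # r2 flipped, as in the text
    assert correct_one_error(y) == c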
This decoding scheme is incomplete in the sense that in some cases it is not decided what to do and the scheme will fail to determine a candidate for the transmitted word. That is called a decoding failure. Sometimes two errors can be corrected. Suppose the first error is in row i and column j, and the second in row i' and column j' with i' > i and j' ≠ j. Then the receiver will detect odd parities in rows i and i' and in columns j and j'. There are two error patterns of two errors with this behavior: errors at the positions (i, j) and (i', j'), or at the two positions (i, j') and (i', j). If the receiver decides to change the first two positions if j' > j and the second two positions if j' < j, then it will recover the transmitted word half of the time this pattern of two errors takes place. If for instance the word c = (1, 1, 0, 1, 0, 1, 1, 0, 1) is transmitted and y = (1, 0, 0, 1, 0, 0, 1, 0, 1) is received, then the above decoding scheme will change it correctly into c. But if y = (1, 1, 0, 0, 1, 1, 1, 0, 1) is received, then the scheme will change it into the codeword c' = (1, 0, 0, 0, 1, 0, 1, 0, 1) and we have a decoding error.
1 0 0 ←
0 1 0 ←
1 0 1
↑ ↑
1 1 1 ←
0 0 1 ←
1 0 1
↑ ↑
If two errors take place in the same row, then the receiver will see an even parity in all rows and odd parities in the columns j and j'. We can expand the decoding rule to change the bits at the positions (1, j) and (1, j'). Likewise we will change the bits at the positions (i, 1) and (i', 1) if the columns give even parity and the rows i and i' have odd parity. This decoding scheme will correct all patterns with 1 error, and some of the patterns with 2 errors. But it is still incomplete, since the received word (1, 1, 0, 1, 1, 0, 0, 1, 0) has an odd parity in every row and in every column and the scheme fails to decode.
One could extend the decoding rule to get a complete decoding in such a way that every received word is decoded to a nearest codeword. This nearest codeword is not always unique.
In case the transmission is by means of certain electromagnetic pulses or waves, one has to consider modulation and demodulation. The message consists of letters of a finite alphabet, say consisting of zeros and ones, and these are modulated, transmitted as waves, received and demodulated into zeros and ones. In the demodulation part one has to make a hard decision between a zero and a one. But usually there is also a probability that the signal represents a zero. The hard decision together with this probability is called a soft decision. One can make use of this information in the decoding algorithm: one considers the list of all nearest codewords, and one chooses the codeword in this list that has the highest probability.
Example 2.1.3 An improvement of the repetition code of rate 1/3 and the product code of rate 4/9 is given by Hamming. Suppose we have a message (m1, m2, m3, m4) of 4 bits. Put them in the middle of the Venn diagram of three intersecting circles as given in Figure 2.2. Complete the three empty areas of the circles according to the rule that the number of ones in every circle is even. In this way we get 3 redundant bits (r1, r2, r3) that we add to the message and which we transmit over the channel.
In every block of 7 bits the receiver can correct one error, since the parity in every circle should be even: if the parity is even we declare the circle correct, if the parity is odd we declare the circle incorrect. The error is in the incorrect circles and in the complement of the correct circles. We see that every pattern of at most one error can be corrected in this way. For instance, if m = (1, 1, 0, 1) is the message, then r = (0, 0, 1) is the redundant information added and c = (1, 1, 0, 1, 0, 0, 1) is the codeword sent.
[Figure 2.2: Venn diagram of the Hamming code: three intersecting circles with the message bits m1, m2, m3, m4 in the inner intersection areas and the redundant bits r1, r2, r3 in the outer areas of the circles]
[Figure 2.3: Venn diagram of a received word for the Hamming code]
If after transmission one symbol is flipped and y = (1, 0, 0, 1, 0, 0, 1) is the received word, as given in Figure 2.3, then we conclude that the error is in the left and the upper circle, but not in the right one, so the error is at m2. But in case of 2 errors, if for instance the word y = (1, 0, 0, 1, 1, 0, 1) is received, then the receiver would assume that the error occurred in the upper circle and not in the two lower circles, and would therefore conclude that the transmitted codeword was (1, 0, 0, 1, 1, 0, 0). Hence the decoding scheme creates an extra error.
The redundant information r can be obtained from the message m by means of three linear equations or parity checks modulo two:

    r1 = m2 + m3 + m4
    r2 = m1 + m3 + m4
    r3 = m1 + m2 + m4

Let c = (m, r) be the codeword. Then c is a codeword if and only if Hc^T = 0, where

    H = ( 0 1 1 1 1 0 0 )
        ( 1 0 1 1 0 1 0 )
        ( 1 1 0 1 0 0 1 ).
The information rate is improved from 1/3 for the repetition code and 4/9 for
the product code to 4/7 for the Hamming code.
*** gate diagrams of encoding/decoding scheme ***
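As a software counterpart of such a scheme, here is a hedged Python sketch (our own, not the authors' gate diagrams): encoding evaluates the three parity checks above, and decoding flips the position whose column of H equals the syndrome.

    H = [[0, 1, 1, 1, 1, 0, 0],
         [1, 0, 1, 1, 0, 1, 0],
         [1, 1, 0, 1, 0, 0, 1]]

    def encode(m):
        # c = (m1, m2, m3, m4, r1, r2, r3) with the parity checks above
        m1, m2, m3, m4 = m
        return [m1, m2, m3, m4,
                (m2 + m3 + m4) % 2, (m1 + m3 + m4) % 2, (m1 + m2 + m4) % 2]

    def correct(y):
        s = [sum(h * yj for h, yj in zip(row, y)) % 2 for row in H]
        if any(s):
            # the syndrome equals the column of H at the error position
            j = [list(col) for col in zip(*H)].index(s)
            y = y[:j] + [y[j] ^ 1] + y[j + 1:]
        return y

    c = encode([1, 1, 0, 1])       # gives (1,1,0,1,0,0,1) as in the text
    y = c[:]
    y[1] ^= 1                      # flip m2
    assert correct(y) == c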
2.1.2 Codes and Hamming distance
In general the alphabets of the message word and the encoded word might be distinct. Furthermore the length of both the message word and the encoded word might vary, as in a convolutional code. We restrict ourselves to [n, k] block codes: the message words have a fixed length of k symbols and the encoded words a fixed length of n symbols, both from the same alphabet Q. For the purpose of error control, before transmission, we add redundant symbols to the message in a clever way.
Definition 2.1.4 Let Q be a set of q symbols called the alphabet. Let Q^n be the set of all n-tuples x = (x1, . . . , xn) with entries xi ∈ Q. A block code C of length n over Q is a non-empty subset of Q^n. The elements of C are called codewords. If C contains M codewords, then M is called the size of the code. We call a code with length n and size M an (n, M) code. If M = q^k, then C is called an [n, k] code. For an (n, M) code defined over Q, the value n − log_q(M) is called the redundancy. The information rate is defined as R = log_q(M)/n.
Example 2.1.5 The repetition code has length 3 and 2 codewords, so its information rate is 1/3. The product code has length 9 and 2^4 codewords, hence its rate is 4/9. The Hamming code has length 7 and 2^4 codewords, therefore its rate is 4/7.
Example 2.1.6 Let C be the binary block code of length n consisting of all
words with exactly two ones. This is an (n, n(n − 1)/2) code. In this example
the number of codewords is not a power of the size of the alphabet.
Definition 2.1.7 Let C be an [n, k] block code over Q. An encoder of C is a one-to-one map

    E : Q^k −→ Q^n

such that C = E(Q^k). Let c ∈ C be a codeword. Then there exists a unique m ∈ Q^k with c = E(m). This m is called the message or source word of c.
In order to measure the difference between two distinct words and to evaluate the error-correcting capability of the code, we need to introduce an appropriate metric on Q^n. A natural metric used in Coding Theory is the Hamming distance.

Definition 2.1.8 For x = (x1, . . . , xn), y = (y1, . . . , yn) ∈ Q^n, the Hamming distance d(x, y) is defined as the number of places where they differ:

    d(x, y) = |{i | xi ≠ yi}|.
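In code this definition is a one-liner; a small Python sketch (ours) for words given as strings or lists:

    def hamming_distance(x, y):
        # number of positions in which x and y differ
        return sum(1 for xi, yi in zip(x, y) if xi != yi)

    assert hamming_distance("10110", "10011") == 2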
[Figure 2.4: Triangle inequality in the Hamming metric: d(x, z) ≤ d(x, y) + d(y, z)]
Proposition 2.1.9 The Hamming distance is a metric on Q^n, that means that the following properties hold for all x, y, z ∈ Q^n:
(1) d(x, y) ≥ 0, and equality holds if and only if x = y,
(2) d(x, y) = d(y, x) (symmetry),
(3) d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality).
Proof. Properties (1) and (2) are trivial from the definition. We leave (3) to the reader as an exercise.
Definition 2.1.10 The minimum distance of a code C of length n is defined as

    d = d(C) = min{ d(x, y) | x, y ∈ C, x ≠ y }

if C consists of more than one element, and is by definition n + 1 if C consists of one word. We denote by (n, M, d) a code C with length n, size M and minimum distance d.
The main problem of error-correcting codes from "Hamming's point of view" is to construct, for a given length and number of codewords, a code with the largest possible minimum distance, and to find efficient encoding and decoding algorithms for such a code.
Example 2.1.11 The triple repetition code consists of two codewords: (0, 0, 0)
and (1, 1, 1), so its minimum distance is 3. The product and Hamming code
both correct one error. So the minimum distance is at least 3, by the triangle
inequality. The product code has minimum distance 4 and the Hamming code
has minimum distance 3. Notice that all three codes have the property that
x + y is again a codeword if x and y are codewords.
Definition 2.1.12 Let x ∈ Q^n. The ball of radius r around x, denoted by Br(x), is defined by Br(x) = { y ∈ Q^n | d(x, y) ≤ r }. The sphere of radius r around x is denoted by Sr(x) and defined by Sr(x) = { y ∈ Q^n | d(x, y) = r }.
[Figure 2.5: Ball of radius √2 in the Euclidean plane]

[Figure 2.6: Balls of radius 0 and 1 in the Hamming metric on Q^2 with |Q| = 5]
Figure 2.5 shows a ball in the Euclidean plane. This is misleading in some respects, but gives an indication of what we should have in mind. Figure 2.6 shows Q^2, where the alphabet Q consists of 5 elements. The ball B0(x) consists of the points in the circle, B1(x) is depicted by the points inside the cross, and B2(x) consists of all 25 dots.
Proposition 2.1.13 Let Q be an alphabet of q elements and x ∈ Q^n. Then

    |Si(x)| = (n choose i)·(q − 1)^i   and   |Br(x)| = Σ_{i=0}^{r} (n choose i)·(q − 1)^i.

Proof. Let y ∈ Si(x). Let I be the subset of {1, . . . , n} consisting of all positions j such that yj ≠ xj. Then the number of elements of I is equal to i, and (q − 1)^i is the number of words y ∈ Si(x) that have the same fixed I. The number of possibilities to choose the subset I with a fixed number of elements i is equal to (n choose i). This shows the formula for the number of elements of Si(x). Furthermore Br(x) is the disjoint union of the subsets Si(x) for i = 0, . . . , r. This proves the statement about the number of elements of Br(x).
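For small parameters the formula is easy to confirm by brute force; a Python sketch (ours):

    from itertools import product
    from math import comb

    def ball_size_formula(n, q, r):
        # |B_r(x)| = sum_{i=0}^{r} (n choose i) (q-1)^i
        return sum(comb(n, i) * (q - 1)**i for i in range(r + 1))

    def ball_size_count(n, q, r):
        # direct count around x = (0,...,0); by symmetry the same for every x
        x = (0,) * n
        return sum(1 for y in product(range(q), repeat=n)
                   if sum(xi != yi for xi, yi in zip(x, y)) <= r)

    assert ball_size_formula(4, 3, 2) == ball_size_count(4, 3, 2) == 33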
2.1.3 Exercises
2.1.1 Consider the code of length 8 that is obtained by deleting the last entry
r5 from the product code of Example 2.1.2. Show that this code corrects one
error.
2.1.2 Give a gate diagram of the decoding algorithm for the product code of Example 2.1.2 that always corrects 1 error and sometimes 2 errors.
2.1.3 Give a proof of Proposition 2.1.9 (3), that is the triangle inequality of
the Hamming distance.
2.1.4 Let Q be an alphabet of q elements. Let x, y ∈ Q^n have distance d. Show that the number of elements in the intersection Br(x) ∩ Bs(y) is equal to

    Σ_{i,j,k} (d choose i)·(d−i choose j)·(n−d choose k)·(q − 2)^j·(q − 1)^k,

where i, j and k are non-negative integers such that i + j ≤ d, k ≤ n − d, i + j + k ≤ r and d − i + k ≤ s.
2.1.5 Write a procedure in GAP that takes n as an input and constructs the
code as in Example 2.1.6.
2.2 Linear Codes
Linear codes are introduced in case the alphabet is a finite field. These codes
have more structure and are therefore more tangible than arbitrary codes.
2.2.1 Linear codes
If the alphabet Q is a finite field, then Q^n is a vector space. This is for instance the case if Q = {0, 1} = F2. Therefore it is natural to look at codes in Q^n that have more structure, in particular codes that are linear subspaces.
Definition 2.2.1 A linear code C is a linear subspace of Fq^n, where Fq stands for the finite field with q elements. The dimension of a linear code is its dimension as a linear space over Fq. We denote a linear code C over Fq of length n and dimension k by [n, k]q, or simply by [n, k]. If furthermore the minimum distance of the code is d, then we call [n, k, d]q or [n, k, d] the parameters of the code.
It is clear that for a linear [n, k] code over Fq, its size is M = q^k. The information rate is R = k/n and the redundancy is n − k.
Definition 2.2.2 For a word x ∈ Fq^n, its support, denoted by supp(x), is defined as the set of nonzero coordinate positions, so supp(x) = {i | xi ≠ 0}. The weight of x is defined as the number of elements of its support, which is denoted by wt(x). The minimum weight of a code C, denoted by mwt(C), is defined as the minimal value of the weights of the nonzero codewords:

    mwt(C) = min{ wt(c) | c ∈ C, c ≠ 0 },

in case there is a c ∈ C not equal to 0, and n + 1 otherwise.
Proposition 2.2.3 The minimum distance of a linear code C is equal to its
minimum weight.
Proof. Since C is a linear code, we have that 0 ∈ C and for any c1, c2 ∈ C,
c1 −c2 ∈ C. Then the conclusion follows from the fact that wt(c) = d(0, c) and
d(c1, c2) = wt(c1 − c2).
Definition 2.2.4 Consider the situation of two Fq-linear codes C and D of
length n. If D ⊆ C, then D is called a subcode of C, and C a supercode of D.
Remark 2.2.5 Suppose C is an [n, k, d] code. Then, for any r, 1 ≤ r ≤ k,
there exist subcodes with dimension r. And for any given r, there may exist
more than one subcode with dimension r. The minimum distance of a subcode
is always greater than or equal to d. So, by taking an appropriate subcode, we
can get a new code of the same length which has a larger minimum distance.
We will discuss this later in Section 3.1.
Now let us see some examples of linear codes.
Example 2.2.6 The repetition code over Fq of length n consists of all words
c = (c, c, . . . , c) with c ∈ Fq. This is a linear code of dimension 1 and minimum
distance n.
Example 2.2.7 Let n be an integer with n ≥ 2. The even weight code C of length n over Fq consists of all words in Fq^n of even weight. The minimum weight of C is by definition 2; the minimum distance of C is 2 if q = 2 and 1 otherwise. The code C is linear if and only if q = 2.
Example 2.2.8 Let C be a binary linear code. Consider the subset Cev of C consisting of all codewords in C of even weight. Then Cev is a linear subcode and is called the even weight subcode of C. If C ≠ Cev, then there exists a codeword c in C of odd weight and C is the disjoint union of the cosets c + Cev and Cev. Hence dim(Cev) ≥ dim(C) − 1.
Example 2.2.9 The Hamming code C of Example 2.1.3 consists of all the words c ∈ F2^7 satisfying Hc^T = 0, where

    H = ( 0 1 1 1 1 0 0 )
        ( 1 0 1 1 0 1 0 )
        ( 1 1 0 1 0 0 1 ).

This code is linear of dimension 4, since it is given by the solutions of three independent homogeneous linear equations. The minimum weight is 3 as shown in Example 2.1.11. So it is a [7, 4, 3] code.
2.2.2 Generator matrix and systematic encoding
Let C be an [n, k] linear code over Fq. Since C is a k-dimensional linear subspace of Fq^n, there exists a basis that consists of k linearly independent codewords, say g1, . . . , gk. Suppose gi = (gi1, . . . , gin) for i = 1, . . . , k. Denote

    G = ( g1 )   ( g11 g12 · · · g1n )
        ( g2 ) = ( g21 g22 · · · g2n )
        ( ⋮  )   (  ⋮    ⋮         ⋮ )
        ( gk )   ( gk1 gk2 · · · gkn ).
Every codeword c can be written uniquely as a linear combination of the basis elements, so c = m1g1 + · · · + mkgk where m1, . . . , mk ∈ Fq. Let m = (m1, . . . , mk) ∈ Fq^k. Then c = mG. The encoding

    E : Fq^k −→ Fq^n

from the message word m ∈ Fq^k to the codeword c ∈ Fq^n can be done efficiently by a matrix multiplication:

    c = E(m) := mG.
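As a sketch (ours, over F2 for simplicity): encoding is a vector-matrix product modulo 2, and since by Proposition 2.2.3 the minimum distance of a linear code equals its minimum weight, enumerating all q^k messages gives the minimum distance of a small code.

    from itertools import product

    def encode(m, G, q=2):
        # c = mG over F_q
        k, n = len(G), len(G[0])
        return [sum(m[i] * G[i][j] for i in range(k)) % q for j in range(n)]

    def minimum_distance(G, q=2):
        # minimum weight of the nonzero codewords mG; feasible only
        # for small k, since all q^k messages are enumerated
        k = len(G)
        return min(sum(cj != 0 for cj in encode(m, G, q))
                   for m in product(range(q), repeat=k) if any(m))

    # generator matrix of the [7,4] Hamming code, see Example 2.2.14
    G = [[1, 0, 0, 0, 0, 1, 1],
         [0, 1, 0, 0, 1, 0, 1],
         [0, 0, 1, 0, 1, 1, 0],
         [0, 0, 0, 1, 1, 1, 1]]
    assert minimum_distance(G) == 3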
Definition 2.2.10 A k × n matrix G with entries in Fq is called a generator
matrix of an Fq-linear code C if the rows of G are a basis of C.
A given [n, k] code C can have more than one generator matrix; however, every generator matrix of C is a k × n matrix of rank k. Conversely, every k × n matrix of rank k is the generator matrix of an Fq-linear [n, k] code.
Example 2.2.11 The linear codes with parameters [n, 0, n + 1] and [n, n, 1] are the trivial codes {0} and Fq^n, and they have the empty matrix and the n × n identity matrix In as generator matrix, respectively.
Example 2.2.12 The repetition code of length n has generator matrix
G = ( 1 1 · · · 1 ).
Example 2.2.13 The binary even-weight code of length n has for instance the following two generator matrices:

    ( 1 1 0 . . . 0 0 0 )
    ( 0 1 1 . . . 0 0 0 )
    ( ⋮ ⋮ ⋮       ⋮ ⋮ ⋮ )
    ( 0 0 0 . . . 1 1 0 )
    ( 0 0 0 . . . 0 1 1 )

and

    ( 1 0 . . . 0 0 1 )
    ( 0 1 . . . 0 0 1 )
    ( ⋮ ⋮       ⋮ ⋮ ⋮ )
    ( 0 0 . . . 1 0 1 )
    ( 0 0 . . . 0 1 1 ).
Example 2.2.14 The Hamming code C of Example 2.1.3 is a [7, 4] code. The message symbols mi for i = 1, . . . , 4 are free to choose. If we take mi = 1 and the remaining mj = 0 for j ≠ i, we get the codeword gi. In this way we get the basis g1, g2, g3, g4 of the code C; these are the rows of the following generator matrix

    G = ( 1 0 0 0 0 1 1 )
        ( 0 1 0 0 1 0 1 )
        ( 0 0 1 0 1 1 0 )
        ( 0 0 0 1 1 1 1 ).

From the example, the generator matrix G of the Hamming code has the form

    (Ik | P),

where Ik is the k × k identity matrix and P a k × (n − k) matrix.
Remark 2.2.15 Let G be a generator matrix of C. From Linear Algebra, see Section ??, we know that we can transform G by Gaussian elimination into a row equivalent matrix in row reduced echelon form by a sequence of the three elementary row operations:
1) interchanging two rows,
2) multiplying a row with a nonzero constant,
3) adding one row to another row.
Moreover for a given matrix G, there is exactly one row equivalent matrix that
is in row reduced echelon form, denoted by rref(G). In the following proposition
it is stated that rref(G) is also a generator matrix of C.
Proposition 2.2.16 Let G be a generator matrix of C. Then rref(G) is also
a generator matrix of C and rref(G) = MG, where M is an invertible k × k
matrix with entries in Fq.
Proof. The row reduced echelon form rref(G) of G is obtained from G by a
sequence of elementary operations. The code C is equal to the row space of G,
and the row space does not change under elementary row operations. So rref(G)
generates the same code C.
Furthermore rref(G) = E1 · · · ElG, where E1, . . . , El are the elementary matrices
that correspond to the elementary row operations. Let M = E1 · · · El. Then M
is an invertible matrix, since the Ei are invertible, and rref(G) = MG.
Proposition 2.2.17 Let G1 and G2 be two k × n generator matrices generating the codes C1 and C2 over Fq. Then the following statements are equivalent:
1) C1 = C2,
2) rref(G1) = rref(G2),
3) there is a k × k invertible matrix M with entries in Fq such that G2 = MG1.
Proof.
1) implies 2): The row spaces of G1 and G2 are the same, since C1 = C2. So G1 and G2 are row equivalent. Hence rref(G1) = rref(G2).
2) implies 3): Let Ri = rref(Gi). There is a k × k invertible matrix Mi such that Gi = MiRi for i = 1, 2, by Proposition 2.2.16. Let M = M2M1^{-1}. Then MG1 = M2M1^{-1}M1R1 = M2R1 = M2R2 = G2, since R1 = R2.
3) implies 1): Suppose G2 = MG1 for some k × k invertible matrix M. Then every codeword of C2 is a linear combination of the rows of G1, which are in C1. So C2 is a subcode of C1. Similarly C1 ⊆ C2, since G1 = M^{-1}G2. Hence C1 = C2.
Remark 2.2.18 Although a generator matrix G of a code C is not unique, the
row reduced echelon form rref(G) is unique. That is to say, if G is a generator
matrix of C, then rref(G) is also a generator matrix of C, and furthermore if
G1 and G2 are generator matrices of C, then rref(G1) = rref(G2). Therefore
the row reduced echelon form rref(C) of a code C is well-defined, being rref(G)
for a generator matrix G of C by Proposition 2.2.17.
Example 2.2.19 The generator matrix G2 of Example 2.2.13 is in row-reduced
echelon form and a generator matrix of the binary even-weight code C. Hence
G2 = rref(G1) = rref(C).
Definition 2.2.20 Let C be an [n, k] code. The code is called systematic at the positions (j1, . . . , jk) if for all m ∈ Fq^k there exists a unique codeword c such that cji = mi for all i = 1, . . . , k. In that case, the set {j1, . . . , jk} is called an information set. A generator matrix G of C is called systematic at the positions (j1, . . . , jk) if the k × k submatrix G' consisting of the k columns of G at the positions (j1, . . . , jk) is the identity matrix. For such a matrix G the mapping m → mG is called systematic encoding.

Remark 2.2.21 If a generator matrix G of C is systematic at the positions (j1, . . . , jk) and c is a codeword, then c = mG for a unique m ∈ Fq^k and cji = mi for all i = 1, . . . , k. Hence C is systematic at the positions (j1, . . . , jk). Now suppose that the ji with 1 ≤ j1 < · · · < jk ≤ n indicate the positions of the pivots of rref(G). Then the code C and the generator matrix rref(G) are systematic at the positions (j1, . . . , jk).
Proposition 2.2.22 Let C be a code with generator matrix G. Then C is systematic at the positions j1, . . . , jk if and only if the k columns of G at the positions j1, . . . , jk are linearly independent.
Proof. Let G be a generator matrix of C. Let G' be the k × k submatrix of G consisting of the k columns at the positions (j1, . . . , jk). Suppose C is systematic at the positions (j1, . . . , jk). Then the map given by x → xG' is injective. Hence the columns of G' are linearly independent.
Conversely, if the columns of G' are linearly independent, then there exists a k × k invertible matrix M such that MG' is the identity matrix. Hence MG is a generator matrix of C and C is systematic at (j1, . . . , jk).
Example 2.2.23 Consider a code C with generator matrix

    G = ( 1 0 1 0 1 0 1 0 )
        ( 1 1 0 0 1 1 0 0 )
        ( 1 1 0 1 0 0 1 0 )
        ( 1 1 0 1 0 0 1 1 ).

Then

    rref(C) = rref(G) = ( 1 0 1 0 1 0 1 0 )
                        ( 0 1 1 0 0 1 1 0 )
                        ( 0 0 0 1 1 1 1 0 )
                        ( 0 0 0 0 0 0 0 1 )

and the code is systematic at the positions 1, 2, 4 and 8. By the way we notice that the minimum distance of the code is 1.
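Gaussian elimination over Fq differs from the real case only in that one divides by the inverse in the field. A Python sketch (ours, assuming q is prime so that the integers modulo q form Fq), which reproduces the row reduced echelon form just computed:

    def rref_mod_q(G, q):
        # row reduced echelon form over F_q (q prime)
        A = [row[:] for row in G]
        k, n, r = len(A), len(A[0]), 0
        for j in range(n):
            pivot = next((i for i in range(r, k) if A[i][j] % q != 0), None)
            if pivot is None:
                continue
            A[r], A[pivot] = A[pivot], A[r]
            inv = pow(A[r][j], -1, q)          # inverse in F_q
            A[r] = [(inv * a) % q for a in A[r]]
            for i in range(k):
                if i != r:
                    f = A[i][j]
                    A[i] = [(a - f * b) % q for a, b in zip(A[i], A[r])]
            r += 1
        return A

    G = [[1, 0, 1, 0, 1, 0, 1, 0],
         [1, 1, 0, 0, 1, 1, 0, 0],
         [1, 1, 0, 1, 0, 0, 1, 0],
         [1, 1, 0, 1, 0, 0, 1, 1]]
    assert rref_mod_q(G, 2) == [[1, 0, 1, 0, 1, 0, 1, 0],
                                [0, 1, 1, 0, 0, 1, 1, 0],
                                [0, 0, 0, 1, 1, 1, 1, 0],
                                [0, 0, 0, 0, 0, 0, 0, 1]]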
2.2.3 Exercises
2.2.1 Determine for the product code of Example 2.1.2 the number of codewords, the number of codewords of a given weight, the minimum weight and the minimum distance. Express the redundant bits rj for j = 1, . . . , 5 as linear equations over F2 in the message bits mi for i = 1, . . . , 4. Give a 5 × 9 matrix H such that c = (m, r) is a codeword of the product code if and only if Hc^T = 0, where m is the message of 4 bits mi and r is the vector with the 5 redundant bits rj.
2.2.2 Let x and y be binary words of the same length. Show that
wt(x + y) = wt(x) + wt(y) − 2|supp(x) ∩ supp(y)|.
2.2.3 Let C be an Fq-linear code with generator matrix G. Let q = 2. Show that every codeword of C has even weight if and only if every row of G has even weight. Show by means of a counterexample that the above statement is not true if q ≠ 2.
2.2.4 Consider the following matrix with entries in F5:

    G = ( 1 1 1 1 1 0 )
        ( 0 1 2 3 4 0 )
        ( 0 1 4 4 1 1 ).

Show that G is a generator matrix of a [5, 3, 3] code. Give the row reduced echelon form of this code.
2.2.5 Compute the complexity of the encoding of a linear [n, k] code by an
arbitrary generator matrix G and in case G is systematic, respectively, in terms
of the number of additions and multiplications.
2.3 Parity checks and dual code
Linear codes are implicitly defined by parity check equations and the dual of a
code is introduced.
2.3.1 Parity check matrix
There are two standard ways to describe a subspace: explicitly by giving a basis, or implicitly as the solution space of a set of homogeneous linear equations. Therefore there are two ways to describe a linear code: explicitly, as we have seen, by a generator matrix, or implicitly by a set of homogeneous linear equations, that is, by the null space of a matrix.
Let C be an Fq-linear [n, k] code. Suppose that H is an m × n matrix with entries in Fq. Let C be the null space of H. So C is the set of all c ∈ Fq^n such that Hc^T = 0. These m homogeneous linear equations are called parity check equations, or simply parity checks. The dimension k of C is at least n − m. If there are dependent rows in the matrix H, that is if k > n − m, then we can delete a few rows until we obtain an (n − k) × n matrix H' with independent rows and with the same null space as H. So H' has rank n − k.
Definition 2.3.1 An (n − k) × n matrix of rank n − k is called a parity check
matrix of an [n, k] code C if C is the null space of this matrix.
Remark 2.3.2 The parity check matrix of a code can be used for error detection. This is useful in a communication channel where one asks for retransmission in case more than a certain number of errors occurred. Suppose that C is a linear code of minimum distance d and H is a parity check matrix of C. Suppose that the codeword c is transmitted and r = c + e is received. Then e is called the error vector and wt(e) the number of errors. Now Hr^T = 0 if there is no error and Hr^T ≠ 0 for all e such that 0 < wt(e) < d. Therefore we can detect any pattern of t errors with t < d. But not more, since if the error vector is equal to a nonzero codeword of minimal weight d, then the receiver would assume that no errors have been made. The vector Hr^T is called the syndrome of the received word.
We show that every linear code has a parity check matrix and we give a method
to obtain such a matrix in case we have a generator matrix G of the code.
Proposition 2.3.3 Suppose C is an [n, k] code. Let Ik be the k × k identity matrix. Let P be a k × (n − k) matrix. Then (Ik | P) is a generator matrix of C if and only if (−P^T | In−k) is a parity check matrix of C.
Proof. Every codeword c is of the form mG with m ∈ Fq^k. Suppose that the generator matrix G is systematic at the first k positions. So c = (m, r) with r ∈ Fq^{n−k} and r = mP. Hence for a word of the form c = (m, r) with m ∈ Fq^k and r ∈ Fq^{n−k} the following statements are equivalent:

    c is a codeword,
    −mP + r = 0,
    −P^T m^T + r^T = 0,
    (−P^T | In−k)(m, r)^T = 0,
    (−P^T | In−k) c^T = 0.

Hence (−P^T | In−k) is a parity check matrix of C. The converse is proved similarly.
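Proposition 2.3.3 translates directly into code. A Python sketch (ours, over F2): given G = (Ik | P), build H = (−P^T | In−k) and check that every row of G lies in the null space of H.

    def parity_check_from_systematic(G, q=2):
        # G = (I_k | P)  ->  H = (-P^T | I_{n-k})
        k, n = len(G), len(G[0])
        P = [row[k:] for row in G]
        return [[(-P[i][j]) % q for i in range(k)] +
                [int(l == j) for l in range(n - k)]
                for j in range(n - k)]

    # G = (I_4 | P) of the Hamming code, Example 2.2.14
    G = [[1, 0, 0, 0, 0, 1, 1],
         [0, 1, 0, 0, 1, 0, 1],
         [0, 0, 1, 0, 1, 1, 0],
         [0, 0, 0, 1, 1, 1, 1]]
    H = parity_check_from_systematic(G)
    # every row of G is a codeword, so G H^T = 0 over F_2
    assert all(sum(g[l] * h[l] for l in range(7)) % 2 == 0
               for g in G for h in H)

This reproduces the parity check matrix H of Example 2.2.9.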
Example 2.3.4 The trivial codes {0} and Fq^n have In and the empty matrix as parity check matrix, respectively.
Example 2.3.5 As a consequence of Proposition 2.3.3 we see that a parity
check matrix of the binary even weight code is equal to the generator matrix
( 1 1 · · · 1 ) of the repetition code, and the generator matrix G2 of the binary
even weight code of Example 2.2.13 is a parity check matrix of the repetition
code.
Example 2.3.6 The ISBN code of a book consists of a word (b_1, ..., b_{10}) of
10 symbols over the alphabet with the 11 elements 0, 1, 2, ..., 9 and X of the
finite field F_{11}, where X is the symbol representing 10, that satisfies the
parity check equation:

b_1 + 2b_2 + 3b_3 + · · · + 10b_{10} = 0.

Clearly this code detects one error. This code also corrects many patterns of
one transposition of two consecutive symbols. Suppose that the symbols b_i and
b_{i+1} are interchanged and there are no other errors. Then the parity check
gives as outcome

s = i b_{i+1} + (i + 1)b_i + \sum_{j ≠ i, i+1} j b_j.

We know that \sum_j j b_j = 0, since (b_1, ..., b_{10}) is an ISBN codeword.
Hence s = b_i − b_{i+1}. But this position i is in general not unique.
Consider for instance the following code: 0444815933. Then the checksum gives
4, so it is not a valid ISBN code. Now assume that the code is the result
of transposition of two consecutive symbols. Then 4044815933, 0448415933,
0444185933, 0444851933 and 0444819533 are the possible ISBN codes. The
first and third code do not match with existing books. The second, fourth
and fifth code correspond to books with the titles: “The revenge of the dragon
lady,” “The theory of error-correcting codes” and “Nagasaki’s symposium on
Chernobyl,” respectively.
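This search for transposition candidates is mechanical. Here is a small Python
sketch (illustrative only, with invented names) that computes the ISBN-10
checksum over F_11 and lists all neighbor transpositions that restore a valid
checksum.

# Sketch: ISBN-10 checksum and candidate corrections of one transposition
# of two consecutive symbols.

VALUES = {**{str(d): d for d in range(10)}, 'X': 10}

def checksum(isbn):
    """Return sum of i*b_i mod 11 for an ISBN-10 word."""
    return sum(i * VALUES[b] for i, b in enumerate(isbn, start=1)) % 11

def transposition_candidates(isbn):
    """All words obtained by swapping two consecutive symbols that have checksum 0."""
    out = []
    for i in range(len(isbn) - 1):
        w = list(isbn)
        w[i], w[i + 1] = w[i + 1], w[i]
        w = ''.join(w)
        if checksum(w) == 0:
            out.append(w)
    return out

print(checksum('0444815933'))               # 4, so not a valid ISBN
print(transposition_candidates('0444815933'))
# ['4044815933', '0448415933', '0444185933', '0444851933', '0444819533']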
Example 2.3.7 The generator matrix G of the Hamming code C in Example
2.2.14 is of the form (I_4 | P), and in Example 2.2.9 we see that the parity
check matrix is equal to (P^T | I_3).
Remark 2.3.8 Let G be a generator matrix of an [n, k] code C. Then the row
reduced echelon form G_1 = rref(G) need not be systematic at the first k
positions, but it is systematic at the positions (j_1, ..., j_k) with
1 ≤ j_1 < · · · < j_k ≤ n. After a permutation π of the n positions with
corresponding n × n permutation matrix, denoted by Π, we may assume that
G_2 = G_1Π is of the form (I_k | P). Now G_2 is a generator matrix of the code
C_2 which is not necessarily equal to C. A parity check matrix H_2 for C_2 is
given by (−P^T | I_{n−k}) according to Proposition 2.3.3. A parity check
matrix H for C is now of the form (−P^T | I_{n−k})Π^T, since Π^{−1} = Π^T.
This remark motivates the following definition.

Definition 2.3.9 Let I = {i_1, ..., i_k} be an information set of the code C.
Then its complement {1, ..., n} \ I is called a check set.
Example 2.3.10 Consider the code C of Example 2.2.23 with generator matrix
G. The row reduced echelon form G_1 = rref(G) is systematic at the positions 1,
2, 4 and 8. Let π be the permutation (348765) with corresponding permutation
matrix Π. Then G_2 = G_1Π = (I_4 | P) and H_2 = (P^T | I_4) with

G_2 =
\begin{pmatrix}
1 & 0 & 0 & 0 & 1 & 1 & 0 & 1 \\
0 & 1 & 0 & 0 & 1 & 0 & 1 & 1 \\
0 & 0 & 1 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0
\end{pmatrix},
\quad
H_2 =
\begin{pmatrix}
1 & 1 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 \\
1 & 1 & 1 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}.

Now π^{−1} = (356784) and

H = H_2Π^T =
\begin{pmatrix}
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 & 1 & 0
\end{pmatrix}

is a parity check matrix of C.
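The recipe of Remark 2.3.8 can be automated. The following Python sketch (an
illustration with invented names, restricted to q prime so that inverses can
be computed with Fermat's little theorem) computes rref(G), reads off the
pivot positions, and assembles a parity check matrix with the identity part
placed on the check set; this organizes the computation of Example 2.3.10
without an explicit permutation matrix.

# Sketch: parity check matrix of a code from an arbitrary generator
# matrix over F_q (q prime), following Remark 2.3.8.

def rref(G, q):
    """Row reduced echelon form over F_q and the list of pivot columns."""
    G = [row[:] for row in G]
    pivots, r = [], 0
    for j in range(len(G[0])):
        piv = next((i for i in range(r, len(G)) if G[i][j] % q), None)
        if piv is None:
            continue
        G[r], G[piv] = G[piv], G[r]
        inv = pow(G[r][j], q - 2, q)          # inverse in F_q, q prime
        G[r] = [(x * inv) % q for x in G[r]]
        for i in range(len(G)):
            if i != r and G[i][j] % q:
                c = G[i][j]
                G[i] = [(a - c * b) % q for a, b in zip(G[i], G[r])]
        pivots.append(j)
        r += 1
    return G, pivots

def parity_check(G, q):
    """H with -P^T on the pivot columns and identity columns on the check set."""
    R, pivots = rref(G, q)
    n, k = len(G[0]), len(pivots)
    free = [j for j in range(n) if j not in pivots]
    H = [[0] * n for _ in range(n - k)]
    for r, j in enumerate(free):
        H[r][j] = 1                            # identity part on the check set
        for i, p in enumerate(pivots):
            H[r][p] = (-R[i][j]) % q           # -P^T part on the pivots
    return H

G = [[1,1,0,1,0,0,0],    # a generator matrix of a binary [7,4] code
     [0,1,1,0,1,0,0],
     [0,0,1,1,0,1,0],
     [0,0,0,1,1,0,1]]
H = parity_check(G, 2)
assert all(sum(a*b for a, b in zip(g, h)) % 2 == 0 for g in G for h in H)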
2.3.2 Hamming and simplex codes
The following proposition gives a method to determine the minimum distance of
a code in terms of the number of dependent columns of the parity check matrix.
Proposition 2.3.11 Let H be a parity check matrix of a code C. Then the
minimum distance d of C is the smallest integer d such that d columns of H are
linearly dependent.
Proof. Let h_1, ..., h_n be the columns of H. Let c be a nonzero codeword
of weight w. Let supp(c) = {j_1, ..., j_w} with 1 ≤ j_1 < · · · < j_w ≤ n. Then
Hc^T = 0, so c_{j_1}h_{j_1} + · · · + c_{j_w}h_{j_w} = 0 with c_{j_i} ≠ 0 for
all i = 1, ..., w. Therefore the columns h_{j_1}, ..., h_{j_w} are dependent.
Conversely, if h_{j_1}, ..., h_{j_w} are dependent, then there exist constants
a_1, ..., a_w, not all zero, such that a_1h_{j_1} + · · · + a_wh_{j_w} = 0. Let
c be the word defined by c_j = 0 if j ≠ j_i for all i, and c_j = a_i if j = j_i
for some i. Then Hc^T = 0. Hence c is a nonzero codeword of weight at most w.
Remark 2.3.12 Let H be a parity check matrix of a code C. As a consequence
of Proposition 2.3.11 we have the following special cases. The minimum distance
of a code is 1 if and only if H has a zero column. An example of this is seen in
Example 2.3.10. Now suppose that H has no zero column, then the minimum
distance of C is at least 2. The minimum distance is equal to 2 if and only if
H has two columns, say h_{j_1}, h_{j_2}, that are dependent. In the binary case
that means h_{j_1} = h_{j_2}. In other words, the minimum distance of a binary
code is at least 3 if and only if H has no zero columns and all columns are
mutually distinct. This is the case for the Hamming code of Example 2.2.9. For
a given redundancy r the length of a binary linear code C of minimum distance 3
is at most 2^r − 1, the number of all nonzero binary columns of length r. For
arbitrary F_q, the number of nonzero columns with entries in F_q is q^r − 1.
Two such columns are dependent if and only if one is a nonzero multiple of the
other. Hence the length of an F_q-linear code C with d(C) ≥ 3 and redundancy r
is at most (q^r − 1)/(q − 1).
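For small binary codes Proposition 2.3.11 can be checked by brute force: over
F_2 a minimal dependent set of columns has all coefficients equal to 1, so the
minimum distance is the size of the smallest set of columns of H that sums to
zero. A Python sketch (illustrative only, and exponential in n):

# Sketch (binary case): minimum distance = smallest number of columns
# of H that sum to 0, cf. Proposition 2.3.11.
from itertools import combinations

def min_distance_from_H(H):
    """Smallest d such that d columns of the binary matrix H sum to 0."""
    n = len(H[0])
    cols = [tuple(row[j] for row in H) for j in range(n)]
    for d in range(1, n + 1):
        for S in combinations(range(n), d):
            if all(sum(cols[j][i] for j in S) % 2 == 0 for i in range(len(H))):
                return d   # the columns indexed by S are dependent
    return None            # only if all columns of H are independent

# Parity check matrix of the binary [7,4,3] Hamming code: all nonzero columns.
H = [[0,0,0,1,1,1,1],
     [0,1,1,0,0,1,1],
     [1,0,1,0,1,0,1]]
print(min_distance_from_H(H))   # 3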
Definition 2.3.13 Let n = (q^r − 1)/(q − 1). Let H_r(q) be an r × n matrix
over F_q with nonzero columns, such that no two columns are dependent. The
code H_r(q) with H_r(q) as parity check matrix is called a q-ary Hamming code.
The code with H_r(q) as generator matrix is called a q-ary simplex code and is
denoted by S_r(q).
Proposition 2.3.14 Let r ≥ 2. Then the q-ary Hamming code H_r(q) has
parameters [(q^r − 1)/(q − 1), (q^r − 1)/(q − 1) − r, 3].

Proof. The rank of the matrix H_r(q) is r, since the r standard basis vectors
of weight 1 are among the columns of the matrix. So indeed H_r(q) is a parity
check matrix of a code with redundancy r. Any 2 columns are independent by
construction. And a column of weight 2 is a linear combination of two columns
of weight 1, and such a triple of columns exists, since r ≥ 2. Hence the minimum
distance is 3 by Proposition 2.3.11.
Example 2.3.15 Consider the ternary Hamming code H_3(3) of redundancy 3
and length 13 with parity check matrix

H_3(3) =
\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\
2 & 2 & 2 & 1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 & 0 \\
2 & 1 & 0 & 2 & 1 & 0 & 2 & 1 & 0 & 2 & 1 & 0 & 1
\end{pmatrix}.

By Proposition 2.3.14 the code H_3(3) has parameters [13, 10, 3]. Notice that
all rows of H_3(3) have weight 9. In fact every linear combination xH_3(3) with
x ∈ F_3^3 and x ≠ 0 has weight 9. So all nonzero codewords of the ternary simplex
code of dimension 3 have weight 9. Hence S_3(3) is a constant weight code. This
is a general fact of simplex codes as is stated in the following proposition.
Proposition 2.3.16 The q-ary simplex code S_r(q) is a constant weight code with
parameters [(q^r − 1)/(q − 1), r, q^{r−1}].

Proof. We have seen already in Proposition 2.3.14 that H_r(q) has rank r, so
it is indeed a generator matrix of a code of dimension r. Let c be a nonzero
codeword of the simplex code. Then c = mH_r(q) for some nonzero m ∈ F_q^r.
Let h_j^T be the j-th column of H_r(q). Then c_j = 0 if and only if
m · h_j = 0. Now m · x = 0 is a nontrivial homogeneous linear equation. This
equation has q^{r−1} solutions x ∈ F_q^r, and it has q^{r−1} − 1 nonzero
solutions. It has (q^{r−1} − 1)/(q − 1) solutions x such that x^T is a column
of H_r(q), since for every nonzero x ∈ F_q^r there is exactly one column in
H_r(q) that is a nonzero multiple of x^T. So the number of zeros of c is
(q^{r−1} − 1)/(q − 1). Hence the weight of c is the number of nonzeros, which
is q^{r−1}.
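The matrix H_r(q) is easily generated by taking one representative of every
line through the origin in F_q^r, say the vectors whose first nonzero entry is
1, as in Example 2.3.15. A Python sketch (for q prime, so arithmetic is taken
modulo q; function names are illustrative), together with a check of the
constant weight q^{r−1} of the simplex code:

# Sketch: parity check matrix H_r(q) of the q-ary Hamming code (q prime),
# one column per line through the origin of F_q^r, and the weights of the
# simplex code S_r(q).
from itertools import product

def hamming_matrix(r, q):
    cols = [v for v in product(range(q), repeat=r)
            if any(v) and v[next(i for i, x in enumerate(v) if x)] == 1]
    return [[c[i] for c in cols] for i in range(r)]   # r x (q^r - 1)/(q - 1)

def simplex_weights(r, q):
    H = hamming_matrix(r, q)
    n = len(H[0])
    weights = set()
    for m in product(range(q), repeat=r):
        if any(m):
            c = [sum(m[i] * H[i][j] for i in range(r)) % q for j in range(n)]
            weights.add(sum(1 for x in c if x))
    return weights

print(len(hamming_matrix(3, 3)[0]))  # 13 = (3^3 - 1)/(3 - 1)
print(simplex_weights(3, 3))         # {9} = {3^(3-1)}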
2.3.3 Inner product and dual codes
Definition 2.3.17 The inner product on F_q^n is defined by

x · y = x_1y_1 + · · · + x_ny_n

for x, y ∈ F_q^n.
This inner product is bilinear, symmetric and nondegenerate, but the notion
of "positive definite" makes no sense over a finite field as it does over the
real numbers. For instance, for a binary word x ∈ F_2^n we have that x · x = 0
if and only if the weight of x is even.
Definition 2.3.18 For an [n, k] code C we define the dual or orthogonal
code C^⊥ as

C^⊥ = { x ∈ F_q^n | c · x = 0 for all c ∈ C }.
Proposition 2.3.19 Let C be an [n, k] code with generator matrix G. Then
C^⊥ is an [n, n − k] code with parity check matrix G.

Proof. From the definition of dual codes, the following statements are
equivalent:

x ∈ C^⊥,
c · x = 0 for all c ∈ C,
mGx^T = 0 for all m ∈ F_q^k,
Gx^T = 0.

This means that C^⊥ is the null space of G. Because G is a k × n matrix of rank
k, the linear space C^⊥ has dimension n − k and G is a parity check matrix of
C^⊥.
Example 2.3.20 The trivial codes {0} and F_q^n are dual codes.

Example 2.3.21 The binary even weight code and the repetition code of the
same length are dual codes.

Example 2.3.22 The simplex code S_r(q) and the Hamming code H_r(q) are
dual codes, since H_r(q) is a parity check matrix of the Hamming code and a
generator matrix of S_r(q).
A subspace C of a real vector space R^n has the property that C ∩ C^⊥ = {0},
since the standard inner product is positive definite. Over finite fields this
is not always the case.
Definition 2.3.23 Two codes C_1 and C_2 in F_q^n are called orthogonal if
x · y = 0 for all x ∈ C_1 and y ∈ C_2, and they are called dual if C_2 = C_1^⊥.
If C ⊆ C^⊥, we call C weakly self-dual or self-orthogonal. If C = C^⊥, we call
C self-dual. The hull of a code C is defined by H(C) = C ∩ C^⊥. A code is
called complementary dual if H(C) = {0}.
Example 2.3.24 The binary repetition code of length n is self-orthogonal if
and only if n is even. This code is self-dual if and only if n = 2.
Proposition 2.3.25 Let C be an [n, k] code. Then:
(1) (C^⊥)^⊥ = C.
(2) C is self-dual if and only if C is self-orthogonal and n = 2k.

Proof.
(1) Let c ∈ C. Then c · x = 0 for all x ∈ C^⊥. So C ⊆ (C^⊥)^⊥. Moreover,
applying Proposition 2.3.19 twice, we see that C and (C^⊥)^⊥ have the same
finite dimension. Therefore equality holds.
(2) Suppose C is self-orthogonal, then C ⊆ C^⊥. Now C = C^⊥ if and only if
k = n − k, by Proposition 2.3.19. So C is self-dual if and only if n = 2k.
Example 2.3.26 Consider

G =
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\
0 & 1 & 0 & 0 & 1 & 0 & 1 & 1 \\
0 & 0 & 1 & 0 & 1 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 & 0
\end{pmatrix}.

Let G be the generator matrix of the binary [8, 4] code C. Notice that
GG^T = 0. So x · y = 0 for all x, y ∈ C. Hence C is self-orthogonal.
Furthermore n = 2k. Therefore C is self-dual. Notice that all rows of G have
weight 4, therefore all codewords have weights divisible by 4 by Exercise
2.3.11. Hence C has parameters [8, 4, 4].
Remark 2.3.27 Notice that x · x ≡ wt(x) mod 2 if x ∈ F_2^n, and x · x ≡ wt(x)
mod 3 if x ∈ F_3^n. Therefore all weights are even for a binary self-orthogonal
code, and all weights are divisible by 3 for a ternary self-orthogonal code.
Example 2.3.28 Consider the ternary code C with generator matrix G = (I_6 | A)
with

A =
\begin{pmatrix}
0 & 1 & 1 & 1 & 1 & 1 \\
1 & 0 & 1 & 2 & 2 & 1 \\
1 & 1 & 0 & 1 & 2 & 2 \\
1 & 2 & 1 & 0 & 1 & 2 \\
1 & 2 & 2 & 1 & 0 & 1 \\
1 & 1 & 2 & 2 & 1 & 0
\end{pmatrix}.

It is left as an exercise to show that C is self-dual. The linear combination
of any two columns of A has weight at least 3, and the linear combination of
any two columns of I_6 has weight at most 2. So no three columns of G are
dependent and G is also a parity check matrix of C. Hence the minimum distance
of C is at least 4, and therefore it is 6 by Remark 2.3.27. Thus C has
parameters [12, 6, 6] and it is called the extended ternary Golay code. By
puncturing C we get an [11, 6, 5] code which is called the ternary Golay code.
Corollary 2.3.29 Let C be a linear code. Then:
(1) G is a generator matrix of C if and only if G is a parity check matrix of
C^⊥.
(2) H is a parity check matrix of C if and only if H is a generator matrix of
C^⊥.

Proof. The first statement is Proposition 2.3.19 and the second statement is a
consequence of the first applied to the code C^⊥ using Proposition 2.3.25(1).
Proposition 2.3.30 Let C be an [n, k] code. Let G be a k × n generator matrix
of C and let H be an (n − k) × n matrix of rank n − k. Then H is a parity check
matrix of C if and only if GH^T = 0, the k × (n − k) zero matrix.

Proof. Suppose H is a parity check matrix. For any m ∈ F_q^k, mG is a codeword
of C. So HG^T m^T = H(mG)^T = 0. This implies that mGH^T = 0. Since m
can be any vector in F_q^k, we have GH^T = 0.
Conversely, suppose GH^T = 0. We assumed that G is a k × n matrix of rank k
and H is an (n − k) × n matrix of rank n − k. So H is the parity check matrix
of an [n, k] code C'. For any c ∈ C, we have c = mG for some m ∈ F_q^k. Now

Hc^T = (mGH^T)^T = 0.

So c ∈ C'. This implies that C ⊆ C'. Hence C' = C, since both C and C' have
dimension k. Therefore H is a parity check matrix of C.
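For the self-dual code of Example 2.3.26 this criterion is particularly
simple, since one may take H = G. A Python sketch (illustrative; the function
name is invented):

# Sketch: the criterion of Proposition 2.3.30 applied to the self-dual
# binary [8,4,4] code of Example 2.3.26, where H = G.

G = [[1,0,0,0,0,1,1,1],
     [0,1,0,0,1,0,1,1],
     [0,0,1,0,1,1,0,1],
     [0,0,0,1,1,1,1,0]]

def product_is_zero(G, H, q=2):
    """True if G H^T is the zero matrix over F_q."""
    return all(sum(a * b for a, b in zip(g, h)) % q == 0 for g in G for h in H)

print(product_is_zero(G, G))   # True: G is also a parity check matrix of C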
Remark 2.3.31 A consequence of Proposition 2.3.30 is another proof of
Proposition 2.3.3. For let G = (I_k | P) be a generator matrix of C, and let
H = (−P^T | I_{n−k}). Then G has rank k and H has rank n − k, and
GH^T = −P + P = 0. Therefore H is a parity check matrix of C.
2.3.4 Exercises
2.3.1 Assume that 3540461335 is obtained from an ISBN code by interchang-
ing two neighboring symbols. What are the possible ISBN codes? Now assume
moreover that it is an ISBN code of an existing book. What is the title of this
book?
2.3.2 Consider the binary product code C of Example 2.1.2. Give a parity
check matrix and a generator matrix of this code. Determine the parameters of
the dual of C.
2.3.3 Give a parity check matrix of the code C of Exercise 2.2.4. Show that C
is self-dual.
2.3.4 Consider the binary simplex code S3(2) with generator matrix H as
given in Example 2.2.9. Show that there are exactly seven triples (i1, i2, i3) with
increasing coordinate positions such that S3(2) is not systematic at (i1, i2, i3).
Give the seven four-tuples of positions that are not systematic with respect to
the Hamming code H3(2) with parity check matrix H.
2.3.5 Let C_1 and C_2 be linear codes of the same length. Show the following
statements:
(1) If C_1 ⊆ C_2, then C_2^⊥ ⊆ C_1^⊥.
(2) C_1 and C_2 are orthogonal if and only if C_1 ⊆ C_2^⊥ if and only if
C_2 ⊆ C_1^⊥.
(3) (C_1 ∩ C_2)^⊥ = C_1^⊥ + C_2^⊥.
(4) (C_1 + C_2)^⊥ = C_1^⊥ ∩ C_2^⊥.
2.3.6 Show that a linear code C with generator matrix G has a complementary
dual if and only if det(GG^T) ≠ 0.
2.3.7 Show that there exists a [2k, k] self-dual code over F_q if and only if
there is a k × k matrix P with entries in F_q such that PP^T = −I_k.
2.3.8 Give an example of a ternary [4,2] self-dual code and show that there is
no ternary self-dual code of length 6.
2.3.9 Show that the extended ternary Golay code in Example 2.3.28 is self-
dual.
2.3.10 Show that a binary code is self-orthogonal if the weights of all code-
words are divisible by 4. Hint: use Exercise 2.2.2.
2.3.11 Let C be a binary self-orthogonal code which has a generator matrix
such that all its rows have weight divisible by 4. Show that the weights of all
codewords are divisible by 4.
2.3.12 Write a procedure either in GAP or Magma that determines whether
the given code is self-dual or not. Test correctness of your procedure with
commands IsSelfDualCode and IsSelfDual in GAP and Magma respectively.
2.4 Decoding and the error probability
Intro
2.4.1 Decoding problem
Definition 2.4.1 Let C be a linear code in F_q^n of minimum distance d. If c
is a transmitted codeword and r is the received word, then {i | r_i ≠ c_i} is
the set of error positions, and the number of error positions is called the
number of errors of the received word. Let e = r − c. Then e is called the
error vector and r = c + e. Hence supp(e) is the set of error positions and
wt(e) the number of errors. The e_i's are called the error values.
Remark 2.4.2 If r is the received word and t' = d(C, r) is the distance of r
to the code C, then there exists a nearest codeword c' such that t' = d(c', r).
So there exists an error vector e' such that r = c' + e' and wt(e') = t'. If
the number of errors t is at most (d − 1)/2, then we are sure that c = c' and
e = e'. In other words, the nearest codeword to r is unique when r has distance
at most (d − 1)/2 to C.
***Picture***
Definition 2.4.3 e(C) = ⌊(d(C) − 1)/2⌋ is called the error-correcting capacity,
or decoding radius, of the code C.
Definition 2.4.4 A decoder D for the code C is a map

D : F_q^n → F_q^n ∪ {∗}

such that D(c) = c for all c ∈ C.
If E : F_q^k → F_q^n is an encoder of C and D : F_q^n → F_q^k ∪ {∗} is a map
such that D(E(m)) = m for all m ∈ F_q^k, then D is called a decoder with
respect to the encoder E.
Remark 2.4.5 If E is an encoder of C and D is a decoder with respect to E,
then the composition E ◦ D is a decoder of C. It is allowed that the decoder
gives as outcome the symbol ∗ in case it fails to find a codeword. This is
called a decoding failure. If c is the codeword sent and r is the received word
and D(r) = c' ≠ c, then this is called a decoding error. If D(r) = c, then r is
decoded correctly. Notice that a decoding failure is noted on the receiving
end, whereas there is no way that the decoder can detect a decoding error.
Definition 2.4.6 A complete decoder is a decoder that always gives a codeword
in C as outcome. A nearest neighbor decoder, also called a minimum distance
decoder, is a complete decoder with the property that D(r) is a nearest
codeword. A decoder D for a code C is called a t-bounded distance decoder, or
a decoder that corrects t errors, if D(r) is a nearest codeword for all
received words r with d(C, r) ≤ t. A decoder for a code C decodes up to half
the minimum distance if it is an e(C)-bounded distance decoder, where
e(C) = ⌊(d(C) − 1)/2⌋ is the error-correcting capacity of C.
Remark 2.4.7 If D is a t-bounded distance decoder, then it is not required
that D gives a decoding failure as outcome for a received word r if the
distance of r to the code is strictly larger than t. In other words: D is also
a t'-bounded distance decoder for all t' ≤ t.
A nearest neighbor decoder is a t-bounded distance decoder for all t ≤ ρ(C),
where ρ(C) is the covering radius of the code. A ρ(C)-bounded distance decoder
is a nearest neighbor decoder, since d(C, r) ≤ ρ(C) for all received words r.
Definition 2.4.8 Let r be a received word with respect to a code C. A coset
leader of r + C is a choice of an element of minimal weight in the coset r + C.
The weight of a coset is the minimal weight of an element in the coset. Let
α_i be the number of cosets of C that are of weight i. Then the coset leader
weight enumerator α_C(X, Y) of C is the polynomial defined by

α_C(X, Y) = \sum_{i=0}^{n} α_i X^{n−i} Y^i.
Remark 2.4.9 The choice of a coset leader of the coset r + C is unique if
d(C, r) ≤ (d − 1)/2, and α_i = \binom{n}{i}(q − 1)^i for all i ≤ (d − 1)/2,
where d is the minimum distance of C. Let ρ(C) be the covering radius of the
code, then there is at least one codeword c such that d(c, r) ≤ ρ(C). Hence
the weight of a coset leader is at most ρ(C) and α_i = 0 for i > ρ(C).
Therefore the coset leader weight enumerator of a perfect code C of minimum
distance d = 2t + 1 is given by

α_C(X, Y) = \sum_{i=0}^{t} \binom{n}{i}(q − 1)^i X^{n−i} Y^i.

The computation of the coset leader weight enumerator of a code is in general
a very hard problem.
Definition 2.4.10 Let r be a received word. Let e be the chosen coset leader
of the coset r + C. The coset leader decoder gives r − e as output.
Remark 2.4.11 The coset leader decoder is a nearest neighbor decoder.
Definition 2.4.12 Let r be a received word with respect to a code C of
dimension k. Choose an (n − k) × n parity check matrix H of the code C. Then
s = rH^T ∈ F_q^{n−k} is called the syndrome of r with respect to H.
Remark 2.4.13 Let C be a code of dimension k. Let r be a received word.
Then r + C is called the coset of r. Now the cosets of the received words r_1
and r_2 are the same if and only if r_1H^T = r_2H^T. Therefore there is a
one-to-one correspondence between cosets of C and values of syndromes.
Furthermore every element of F_q^{n−k} is the syndrome of some received word
r, since H has rank n − k. Hence the number of cosets is q^{n−k}.
A list decoder gives as output the collection of all nearest codewords.
Knowing that a decoder exists is nice from a theoretical point of view; in
practice the problem is to find an efficient algorithm that computes the
outcome of the decoder. Finding the closest vector in a given linear subspace
to a given vector in Euclidean n-space can be done efficiently by orthogonal
projection onto the subspace. The corresponding problem for linear codes is in
general not such an easy task. This is treated in Section 6.2.1.
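Although efficient decoding is subtle in general, the coset leader decoder
itself is straightforward to realize with a precomputed syndrome table
whenever q^{n−k} is small. A Python sketch with invented names, shown for the
binary [7, 4, 3] Hamming code:

# Sketch: coset leader decoding via a syndrome table for a small binary code.
from itertools import product

def syndrome(word, H, q=2):
    return tuple(sum(h[j] * word[j] for j in range(len(word))) % q for h in H)

def coset_leader_table(H, q=2):
    """Map each of the q^(n-k) syndromes to an error vector of minimal weight."""
    n = len(H[0])
    table = {}
    # Enumerate words by increasing weight so the first hit is a coset leader.
    for e in sorted(product(range(q), repeat=n), key=lambda w: sum(x != 0 for x in w)):
        table.setdefault(syndrome(e, H, q), e)
    return table

def decode(r, H, table, q=2):
    e = table[syndrome(r, H, q)]
    return tuple((a - b) % q for a, b in zip(r, e))   # output r - e

H = [[0,0,0,1,1,1,1],      # parity check matrix of the [7,4,3] Hamming code
     [0,1,1,0,0,1,1],
     [1,0,1,0,1,0,1]]
table = coset_leader_table(H)
r = (1,0,1,1,0,1,1)                # received word
print(decode(r, H, table))         # (1, 0, 1, 1, 0, 1, 0): one error corrected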
2.4.2 Symmetric channel
....
Definition 2.4.14 The q-ary symmetric channel (qSC) is a channel where q-ary
words are sent with independent errors with the same cross-over probability
p at each coordinate, with 0 ≤ p ≤ 1/2, such that each of the q − 1 wrong
symbols occurs with the same probability p/(q − 1). So a symbol is transmitted
correctly with probability 1 − p. The special case q = 2 is called the binary
symmetric channel (BSC).
***Picture***
Remark 2.4.15 Let P(x) be the probability that the codeword x is sent. This
probability is assumed to be the same for all codewords. Hence P(c) = 1/|C|
for all c ∈ C. Let P(r|c) be the probability that r is received given that c
is sent. Then

P(r|c) = \left( \frac{p}{q − 1} \right)^{d(c,r)} (1 − p)^{n−d(c,r)}

for a q-ary symmetric channel.
Definition 2.4.16 For every decoding scheme and channel one defines three
probabilities P_cd(p), P_de(p) and P_df(p): the probability of correct
decoding, decoding error and decoding failure, respectively. Then

P_cd(p) + P_de(p) + P_df(p) = 1 for all 0 ≤ p ≤ 1/2.

So it suffices to find formulas for two of these three probabilities. The error
probability, also called the error rate, is defined by P_err(p) = 1 − P_cd(p).
Hence P_err(p) = P_de(p) + P_df(p).
Proposition 2.4.17 The probability of correct decoding of a decoder that
corrects up to t errors, with 2t + 1 ≤ d, of a code C of minimum distance d on
a q-ary symmetric channel with cross-over probability p is given by

P_cd(p) = \sum_{w=0}^{t} \binom{n}{w} p^w (1 − p)^{n−w}.

Proof. Every codeword has the same probability of transmission. So

P_cd(p) = \sum_{c ∈ C} P(c) \sum_{d(c,r) ≤ t} P(r|c)
        = \frac{1}{|C|} \sum_{c ∈ C} \sum_{d(c,r) ≤ t} P(r|c).

Now P(r|c) depends only on the distance between r and c by Remark 2.4.15.
So without loss of generality we may assume that 0 is the codeword sent. Hence

P_cd(p) = \sum_{d(0,r) ≤ t} P(r|0)
        = \sum_{w=0}^{t} \binom{n}{w} (q − 1)^w \left( \frac{p}{q − 1} \right)^w (1 − p)^{n−w}

by Proposition 2.1.13. Clearing the factor (q − 1)^w in the numerator and the
denominator gives the desired result.
In Proposition 4.2.6 a formula will be derived for the probability of decoding
error for a decoding algorithm that corrects errors up to half the minimum
distance.
Example 2.4.18 Consider the binary triple repetition code. Assume that
(0, 0, 0) is transmitted. In case the received word has weight 0 or 1, then it
is correctly decoded to (0, 0, 0). If the received word has weight 2 or 3, then
it is decoded to (1, 1, 1), which is a decoding error. Hence there are no
decoding failures and

P_cd(p) = (1 − p)^3 + 3p(1 − p)^2 = 1 − 3p^2 + 2p^3 and
P_err(p) = P_de(p) = 3p^2 − 2p^3.

If the Hamming code is used, then there are no decoding failures and

P_cd(p) = (1 − p)^7 + 7p(1 − p)^6

and

P_err(p) = P_de(p) = 21p^2 − 70p^3 + 105p^4 − 84p^5 + 35p^6 − 6p^7.

This shows that the error probability of the repetition code is smaller than
the one for the Hamming code. This comparison is not fair, since only one bit
of information is transmitted with the repetition code and four bits with the
Hamming code. One could transmit 4 bits of information by using the repetition
code four times. This would give the error probability

1 − (1 − 3p^2 + 2p^3)^4 = 12p^2 − 8p^3 − 54p^4 + 72p^5 + 84p^6 − 216p^7 + · · ·

***Plot of these functions***

Suppose that four bits of information are transmitted uncoded, by the Hamming
code and the triple repetition code, respectively. Then the error probabilities
are 0.04, 0.002 and 0.001, respectively, if the cross-over probability is 0.01.
The error probability for the repetition code is in fact smaller than that of
the Hamming code for all p ≤ 1/2, but the transmission by the Hamming code is
almost twice as fast as by the repetition code.
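The numbers in this example are easily reproduced. A Python sketch evaluating
the three error probabilities at p = 0.01, using the formula of Proposition
2.4.17 (the function name is illustrative):

# Sketch: the error probabilities compared in Example 2.4.18 at p = 0.01.
from math import comb

def p_err_bounded(n, t, p):
    """1 - sum_{w<=t} C(n,w) p^w (1-p)^(n-w), cf. Proposition 2.4.17."""
    return 1 - sum(comb(n, w) * p**w * (1 - p)**(n - w) for w in range(t + 1))

p = 0.01
print(1 - (1 - p)**4)                        # uncoded, 4 bits:      ~0.039
print(p_err_bounded(7, 1, p))                # [7,4,3] Hamming code: ~0.0020
print(1 - (1 - p_err_bounded(3, 1, p))**4)   # 4 x triple repetition: ~0.0012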
Example 2.4.19 Consider the binary n-fold repetition code. Let t = (n − 1)/2.
Use the decoding algorithm correcting all patterns of t errors. Then

P_err(p) = \sum_{i=t+1}^{n} \binom{n}{i} p^i (1 − p)^{n−i}.

Hence the error probability becomes arbitrarily small for increasing n. The
price one has to pay is that the information rate R = 1/n tends to 0. The
remarkable result of Shannon states that for a fixed rate R < C(p), where

C(p) = 1 + p log_2(p) + (1 − p) log_2(1 − p)

is the capacity of the binary symmetric channel, one can devise encoding and
decoding schemes such that P_err(p) becomes arbitrarily small. This will be
treated in Theorem 4.2.9.
The main problem of error-correcting codes from "Shannon's point of view" is to
construct efficient encoding and decoding algorithms of codes with the smallest
error probability for a given information rate and cross-over probability.
Proposition 2.4.20 The probability of correct decoding of the coset leader
decoder on a q-ary symmetric channel with cross-over probability p is given by

P_cd(p) = α_C\left(1 − p, \frac{p}{q − 1}\right).
Proof. This is left as an exercise.
Example 2.4.21 ...........
2.4.3 Exercises
2.4.1 Consider the binary repetition code of length n. Compute the
probabilities of correct decoding, decoding error and decoding failure, in
case of incomplete decoding up to t = (n − 1)/2 errors, and of complete
decoding by choosing one nearest neighbor.
2.4.2 Consider the product code of Example 2.1.2. Compute the probabilities
of correct decoding, decoding error and decoding failure in case the decoding
algorithm corrects all error patterns of at most t errors for t = 1, t = 2 and
t = 3, respectively.
2.4.3 Give a proof of Proposition 2.4.20.
2.4.4 ***Give the probability of correct decoding for the code .... for a coset
leader decoder. ***
2.4.5 ***Product code has error probability at most P1(P2(p)).***
2.5 Equivalent codes
Notice that a Hamming code over Fq of a given redundancy r is defined up to the
order of the columns of the parity check matrix and up to multiplying a column
with a nonzero constant. A permutation of the columns and multiplying the
columns with nonzero constants gives another code with the same parameters
and is in a certain sense equivalent.
2.5.1 Number of generator matrices and codes
The set of all invertible n × n matrices over the finite field Fq is denoted by
Gl(n, q). Now Gl(n, q) is a finite group with respect to matrix multiplication
and it is called the general linear group.
Proposition 2.5.1 The number of elements of Gl(n, q) is

(q^n − 1)(q^n − q) · · · (q^n − q^{n−1}).

Proof. Let M be an n × n matrix with rows m_1, ..., m_n. Then M is invertible
if and only if m_1, ..., m_n are independent, that is, if and only if m_1 ≠ 0
and m_i is not in the linear subspace generated by m_1, ..., m_{i−1} for all
i = 2, ..., n. Hence for an invertible matrix M we are free to choose a nonzero
vector for the first row. There are q^n − 1 possibilities for the first row.
The second row should not be a multiple of the first row, so we have q^n − q
possibilities for the second row for every nonzero choice of the first row.
The subspace generated by m_1, ..., m_{i−1} has dimension i − 1 and q^{i−1}
elements. The i-th row is not in this subspace if M is invertible. So we have
q^n − q^{i−1} possible choices for the i-th row for every legitimate choice of
the first i − 1 rows. This proves the claim.
Proposition 2.5.2
(1) The number of k × n generator matrices over F_q is

(q^n − 1)(q^n − q) · · · (q^n − q^{k−1}).

(2) The number of [n, k] codes over F_q is equal to the Gaussian binomial

\binom{n}{k}_q := \frac{(q^n − 1)(q^n − q) · · · (q^n − q^{k−1})}
                       {(q^k − 1)(q^k − q) · · · (q^k − q^{k−1})}.

Proof.
(1) A k × n generator matrix consists of k independent rows of length n over
F_q. The counting of the number of these matrices is done similarly as in the
proof of Proposition 2.5.1.
(2) The second statement is a consequence of Propositions 2.5.1 and 2.2.17,
and the fact that MG = G if and only if M = I_k for every M ∈ Gl(k, q) and
k × n generator matrix G, since G has rank k.
It is a consequence of Proposition 2.5.2 that the Gaussian binomials are integers
for every choice of n, k and q. In fact more is true.
Proposition 2.5.3 The number of [n, k] codes over F_q is a polynomial in q of
degree k(n − k) with non-negative integers as coefficients.

Proof. There is another way to count the number of [n, k] codes over F_q, since
the row reduced echelon form rref(C) of a generator matrix of C is unique by
Proposition 2.2.17. Now suppose that rref(C) has pivots at j = (j_1, ..., j_k)
with 1 ≤ j_1 < · · · < j_k ≤ n, then the remaining entries are free to choose
as long as the row reduced echelon form at the given pivots (j_1, ..., j_k) is
respected. Let the number of these free entries be e(j). Then the number of
[n, k] codes over F_q is equal to

\sum_{1 ≤ j_1 < · · · < j_k ≤ n} q^{e(j)}.

Furthermore e(j) is maximal and equal to k(n − k) for j = (1, 2, ..., k). This
is left as Exercise 2.5.2 to the reader.
Example 2.5.4 Let us compute the number of [3, 2] codes over F_q. According
to Proposition 2.5.2 it is equal to

\binom{3}{2}_q = \frac{(q^3 − 1)(q^3 − q)}{(q^2 − 1)(q^2 − q)} = q^2 + q + 1,

which is a polynomial of degree 2 · (3 − 2) = 2 with non-negative integers as
coefficients. This is in agreement with Proposition 2.5.3. If we follow the
proof of this proposition then the possible row reduced echelon forms are

\begin{pmatrix} 1 & 0 & ∗ \\ 0 & 1 & ∗ \end{pmatrix},
\quad
\begin{pmatrix} 1 & ∗ & 0 \\ 0 & 0 & 1 \end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},

where the ∗'s denote the entries that are free to choose. So e(1, 2) = 2,
e(1, 3) = 1 and e(2, 3) = 0. Hence the number of [3, 2] codes is equal to
q^2 + q + 1, as we have seen before.
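Both counts are easy to compare by machine. A Python sketch (illustrative
names, and relying on the expression for e(j) stated in Exercise 2.5.2):

# Sketch: the number of [n,k] codes over F_q via the Gaussian binomial of
# Proposition 2.5.2, and via the sum of q^e(j) over pivot positions as in
# the proof of Proposition 2.5.3.
from itertools import combinations

def gaussian_binomial(n, k, q):
    num = den = 1
    for i in range(k):
        num *= q**n - q**i
        den *= q**k - q**i
    return num // den

def count_by_pivots(n, k, q):
    total = 0
    for j in combinations(range(1, n + 1), k):
        j = j + (n + 1,)                       # convention j_{k+1} = n + 1
        e = sum(t * (j[t] - j[t - 1] - 1) for t in range(1, k + 1))
        total += q**e
    return total

for q in (2, 3, 4, 5):
    assert gaussian_binomial(3, 2, q) == q*q + q + 1 == count_by_pivots(3, 2, q)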
2.5.2 Isometries and equivalent codes
Definition 2.5.5 Let M ∈ Gl(n, q). Then the map

M : F_q^n → F_q^n,

defined by M(x) = xM, is a one-to-one linear map. Notice that the map and
the matrix are both denoted by M. Let S be a subset of F_q^n. The operation
xM, where x ∈ S and M ∈ Gl(n, q), is called an action of the group Gl(n, q)
on S. For a given M ∈ Gl(n, q), the set SM = {xM | x ∈ S}, also denoted by
M(S), is called the image of S under M.
Definition 2.5.6 The group of permutations of {1, . . . , n} is called the sym-
metric group on n letters and is denoted by Sn. Let π ∈ Sn. Define the
corresponding permutation matrix Π with entries pij by pij = 1 if i = π(j) and
pij = 0 otherwise.
Remark 2.5.7 S_n is indeed a group and has n! elements. Let Π be the
permutation matrix of a permutation π in S_n. Then Π is invertible and
orthogonal, that means Π^T = Π^{−1}. The corresponding map Π : F_q^n → F_q^n
is given by Π(x) = y with y_i = x_{π(i)} for all i. Now Π is an invertible
linear map. Let e_i be the i-th standard basis row vector. Then
Π^{−1}(e_i) = e_{π(i)} by the above conventions. The set of n × n permutation
matrices is a subgroup of Gl(n, q) with n! elements.
Definition 2.5.8 Let v ∈ F_q^n. Then diag(v) is the n × n diagonal matrix with
v on its diagonal and zeros outside the diagonal. An n × n matrix with entries
in F_q is called monomial if every row has exactly one non-zero entry and every
column has exactly one non-zero entry. Let Mono(n, q) be the set of all n × n
monomial matrices with entries in F_q.
Remark 2.5.9 The matrix diag(v) is invertible if and only if every entry of v
is nonzero. Hence the set of n × n invertible diagonal matrices is a subgroup
of Gl(n, q) with (q − 1)^n elements.
Let M be an element of Mono(n, q). Define the vector v ∈ F_q^n with nonzero
entries and the map π from {1, ..., n} to itself by π(j) = i if v_i is the
unique nonzero entry of M in the i-th row and the j-th column. Now π is a
permutation by the definition of a monomial matrix. So M has entries m_{ij}
with m_{ij} = v_i if i = π(j) and m_{ij} = 0 otherwise. Hence M = diag(v)Π.
Therefore a matrix is monomial if and only if it is the product of a diagonal
and a permutation matrix. The corresponding monomial map M : F_q^n → F_q^n of
the monomial matrix M is given by M(x) = y with y_i = v_i x_{π(i)}. The set
Mono(n, q) is a subgroup of Gl(n, q) with (q − 1)^n n! elements.
Definition 2.5.10 A map ϕ : F_q^n → F_q^n is called an isometry if it leaves
the Hamming metric invariant, that means that

d(ϕ(x), ϕ(y)) = d(x, y)

for all x, y ∈ F_q^n. Let Isom(n, q) be the set of all isometries of F_q^n.
Proposition 2.5.11 Isom(n, q) is a group under the composition of maps.
Proof. The identity map is an isometry.
Let ϕ and ψ be isometries of F_q^n. Let x, y ∈ F_q^n. Then

d((ϕ ◦ ψ)(x), (ϕ ◦ ψ)(y)) = d(ϕ(ψ(x)), ϕ(ψ(y))) = d(ψ(x), ψ(y)) = d(x, y).

Hence ϕ ◦ ψ is an isometry.
Let ϕ be an isometry of F_q^n. Suppose that x, y ∈ F_q^n and ϕ(x) = ϕ(y). Then
0 = d(ϕ(x), ϕ(y)) = d(x, y). So x = y. Hence ϕ is bijective. Therefore it has
an inverse map ϕ^{−1}. Let x, y ∈ F_q^n. Then

d(x, y) = d(ϕ(ϕ^{−1}(x)), ϕ(ϕ^{−1}(y))) = d(ϕ^{−1}(x), ϕ^{−1}(y)),

since ϕ is an isometry. Therefore ϕ^{−1} is an isometry.
So Isom(n, q) is nonempty and closed under taking the composition of maps
and taking the inverse. Therefore Isom(n, q) is a group.
Remark 2.5.12 Permutation matrices define isometries. Translations and
invertible diagonal matrices, and more generally the coordinatewise
permutations of the elements of F_q, also define isometries. Conversely, every
isometry is a composition of the isometries mentioned before. This fact we
leave as Exercise 2.5.4. The following proposition characterizes linear
isometries.
Proposition 2.5.13 Let M ∈ Gl(n, q). Then the following statements are
equivalent:
(1) M is an isometry,
(2) wt(M(x)) = wt(x) for all x ∈ F_q^n, so M leaves the weight invariant,
(3) M is a monomial matrix.

Proof.
Statements (1) and (2) are equivalent, since M(x − y) = M(x) − M(y) and
d(x, y) = wt(x − y).
Statement (3) implies (1), since permutation matrices and invertible diagonal
matrices leave the weight of a vector invariant, and a monomial matrix is a
product of such matrices by Remark 2.5.9.
Statement (2) implies (3): Let e_i be the i-th standard basis vector of F_q^n.
Then e_i has weight 1. So M(e_i) has also weight 1. Hence M(e_i) = v_i e_{π(i)},
where v_i is a nonzero element of F_q, and π is a map from {1, ..., n} to
itself. Now π is a bijection, since M is invertible. So π is a permutation and
M = diag(v)Π^{−1}. Therefore M is a monomial matrix.
Corollary 2.5.14 An isometry is linear if and only if it is a map coming from
a monomial matrix, that is,

Gl(n, q) ∩ Isom(n, q) = Mono(n, q).

Proof. This follows directly from the definitions and Proposition 2.5.13.
Definition 2.5.15 Let C and D be codes in F_q^n that are not necessarily
linear. Then C is called equivalent to D if there exists an isometry ϕ of
F_q^n such that ϕ(C) = D. If moreover C = D, then ϕ is called an automorphism
of C. The automorphism group of C is the set of all isometries ϕ such that
ϕ(C) = C and is denoted by Aut(C).
C is called permutation equivalent to D, denoted by D ≡ C, if there exists
a permutation matrix Π such that Π(C) = D. If moreover C = D, then Π is
called a permutation automorphism of C. The permutation automorphism group of
C is the set of all permutation automorphisms of C and is denoted by PAut(C).
C is called generalized equivalent or monomial equivalent to D, denoted by
D ≅ C, if there exists a monomial matrix M such that M(C) = D. If moreover
C = D, then M is called a monomial automorphism of C. The monomial
automorphism group of C is the set of all monomial automorphisms of C and is
denoted by MAut(C).
Proposition 2.5.16 Let C and D be two F_q-linear codes of the same length.
Then:
(1) If C ≡ D, then C^⊥ ≡ D^⊥.
(2) If C ≅ D, then C^⊥ ≅ D^⊥.
(3) If C ≡ D, then C ≅ D.
(4) If C ≅ D, then C and D have the same parameters.

Proof. We leave the proof to the reader as an exercise.
Remark 2.5.17 Every [n, k] code is equivalent to a code which is systematic at
the first k positions, that is, with a generator matrix of the form (I_k | P),
according to Remark 2.3.8.
Notice that in the binary case C ≡ D if and only if C ≅ D.
Example 2.5.18 Let C be a binary [7, 4, 3] code with parity check matrix H.
Then H is a 3 × 7 matrix such that all columns are nonzero and mutually
distinct by Proposition 2.3.11, since C has minimum distance 3. There are
exactly 7 binary nonzero column vectors with 3 entries. Hence H is a
permutation of the columns of a parity check matrix of the [7, 4, 3] Hamming
code. Therefore: every binary [7, 4, 3] code is permutation equivalent with the
Hamming code.
Proposition 2.5.19
(1) Every F_q-linear code with parameters
[(q^r − 1)/(q − 1), (q^r − 1)/(q − 1) − r, 3] is generalized equivalent with
the Hamming code H_r(q).
(2) Every F_q-linear code with parameters [(q^r − 1)/(q − 1), r, q^{r−1}] is
generalized equivalent with the simplex code S_r(q).

Proof. (1) Let n = (q^r − 1)/(q − 1). Then n is the number of lines in F_q^r
through the origin. Let H be a parity check matrix of an F_q-linear code with
parameters [n, n − r, 3]. Then there are no zero columns in H and every two
columns are independent by Proposition 2.3.11. Every column of H generates
a unique line in F_q^r through the origin, and every such line is obtained in
this way. Let H' be the parity check matrix of a code C' with the same
parameters [n, n − r, 3]. Then for every column h'_j of H' there is a unique
column h_i of H such that h'_j is a nonzero multiple of h_i. Hence H' = HM for
some monomial matrix M. Hence C and C' are generalized equivalent.
(2) The second statement follows from the first one, since the simplex code is
the dual of the Hamming code.
Remark 2.5.20 A code of length n is called cyclic if the cyclic permutation
of coordinates σ(i) = i − 1 modulo n leaves the code invariant. A cyclic code
of length n has an element of order n in its automorphism group. Cyclic codes
are extensively treated in Chapter 7.1.
Remark 2.5.21 Let C be an F_q-linear code of length n. Then PAut(C) is a
subgroup of S_n and MAut(C) is a subgroup of Mono(n, q). If C is a trivial
code, then PAut(C) = S_n and MAut(C) = Mono(n, q). The matrices λI_n are in
MAut(C) for all nonzero λ ∈ F_q. So MAut(C) always contains F_q^∗ as a
subgroup. Furthermore Mono(n, q) = S_n and MAut(C) = PAut(C) if q = 2.
Example 2.5.22 Let C be the n-fold repetition code. Then PAut(C) = S_n
and MAut(C) is isomorphic with F_q^∗ × S_n.
Proposition 2.5.23 Let G be a generator matrix of an Fq-linear code C of
length n. Let Π be an n × n permutation matrix. Let M ∈ Mono(n, q). Then:
(1) Π ∈ PAut(C) if and only if rref(G) = rref(GΠ),
(2) M ∈ MAut(C) if and only if rref(G) = rref(GM).
Proof. (1) Let Π be an n × n permutation matrix. Then GΠ is a generator
matrix of Π(C). Moreover Π(C) = C if and only if rref(G) = rref(GΠ) by
Proposition 2.2.17.
(2) The second statement is proved similarly.
Example 2.5.24 Let C be the code with generator matrix G and let M be the
monomial matrix given by

G =
\begin{pmatrix}
1 & 0 & a_1 \\
0 & 1 & a_2
\end{pmatrix}
\quad\text{and}\quad
M =
\begin{pmatrix}
0 & x_2 & 0 \\
x_1 & 0 & 0 \\
0 & 0 & x_3
\end{pmatrix},

where the a_i and x_j are nonzero elements of F_q. Now G is already in reduced
row echelon form. One verifies that

rref(GM) =
\begin{pmatrix}
1 & 0 & a_2x_3/x_1 \\
0 & 1 & a_1x_3/x_2
\end{pmatrix}.

Hence M is a monomial automorphism of C if and only if a_1x_1 = a_2x_3 and
a_2x_2 = a_1x_3.
Definition 2.5.25 A map f from the set of all (linear) codes to another set
is called an invariant of a (linear) code if f(C) = f(ϕ(C)) for every code C
in F_q^n and every isometry ϕ of F_q^n. The map f is called a permutation
invariant if f(C) = f(Π(C)) for every code C in F_q^n and every n × n
permutation matrix Π. The map f is called a monomial invariant if
f(C) = f(M(C)) for every code C in F_q^n and every M ∈ Mono(n, q).
Remark 2.5.26 The length, the number of elements and the minimum distance
are clearly invariants of a code. The dimension is a permutation and
a monomial invariant of a linear code. The isomorphy class of the group of
automorphisms of a code is an invariant of a code. The isomorphy classes of
PAut(C) and MAut(C) are permutation and monomial invariants, respectively,
of a linear code.
2.5.3 Exercises
2.5.1 Determine the number of [5, 3] codes over F_q by Proposition 2.5.2 and
show by division that it is a polynomial in q. Determine the exponent e(j) and
the number of codes such that rref(C) is systematic at a given 3-tuple
(j_1, j_2, j_3), for all 3-tuples with 1 ≤ j_1 < j_2 < j_3 ≤ 5, as in
Proposition 2.5.3, and verify that they sum up to the total number of [5, 3]
codes.
2.5.2 Show that e(j) = \sum_{t=1}^{k} t(j_{t+1} − j_t − 1) for every k-tuple
(j_1, ..., j_k) with 1 ≤ j_1 < · · · < j_k ≤ n and j_{k+1} = n + 1, in the
proof of Proposition 2.5.3. Show that the maximum of e(j) is equal to k(n − k)
and that this maximum is attained by exactly one k-tuple, namely (1, 2, ..., k).
2.5.3 Let p be a prime. Let q = p^m. Consider the map ϕ : F_q^n → F_q^n
defined by ϕ(x_1, ..., x_n) = (x_1^p, ..., x_n^p). Show that ϕ is an isometry
that permutes the elements of the alphabet F_q coordinatewise. Prove that ϕ is
a linear map if and only if m = 1. So ϕ is not linear if m > 1. Show that ϕ(C)
is a linear code if C is a linear code.
2.5.4 Show that permutation matrices and the coordinatewise permutations of
the elements of F_q define isometries. Show that every element of Isom(n, q)
is the composition of a permutation matrix and a coordinatewise permutation of
the elements of F_q. Moreover such a composition is unique. Show that the
number of elements of Isom(n, q) is equal to n!(q!)^n.
2.5.5 Give a proof of Proposition 2.5.16.
2.5.6 Show that every binary (7,16,3) code is isometric with the Hamming
code.
2.5.7 Let C be a linear code of length n. Assume that n is a power of a prime.
Show that if there exists an element in PAut(C) of order n, then C is equivalent
with a cyclic code. Show that the assumption on n being a prime power is
necessary by means of a counterexample.
2.5.8 A code C is called quasi self-dual if it is monomial equivalent with its
dual. Consider the [2k, k] code over F_q with generator matrix (I_k | I_k).
Show that this code is quasi self-dual for all q, and self-dual if q is even.
2.5.9 Let C be an F_q-linear code of length n with hull H(C) = C ∩ C^⊥. Let Π
be an n × n permutation matrix. Let D be an invertible n × n diagonal matrix.
Let M ∈ Mono(n, q).
(1) Show that (Π(C))^⊥ = Π(C^⊥).
(2) Show that H(Π(C)) = Π(H(C)).
(3) Show that (D(C))^⊥ = D^{−1}(C^⊥).
(4) Show that H(M(C)) = M(H(C)) if q = 2 or q = 3.
(5) Show by means of a counterexample that the dimension of the hull of a
linear code over F_q is not a monomial invariant for q > 3.
2.5.10 Show that every linear code over F_q is monomial equivalent to a code
with a complementary dual if q > 3.
2.5.11 Let C be the code of Example 2.5.24. Show that this code has 6(q − 1)
monomial automorphisms. Compute Aut(C) for all possible choices of the ai.
2.5.12 Show that PAut(C^⊥) and MAut(C^⊥) are isomorphic as groups with
PAut(C) and MAut(C), respectively.
2.5.13 Determine the automorphism group of the ternary code with generator
matrix

\begin{pmatrix}
1 & 0 & 1 & 1 \\
0 & 1 & 1 & 2
\end{pmatrix}.
2.5.14 Show that in Example 12.5.5 the permutation automorphism groups
obtained for Hamming codes in the GAP and Magma programs are different. This
implies that these codes are not the same. Find out what is the permutation
equivalence between these codes.
2.6 Notes
One considers the seminal papers of Shannon [107] and Hamming [61] as the
starting point of information theory and coding theory. Many papers that ap-
peared in the early days of coding theory and information theory are published
in Bell System Technical Journal, IEEE Transaction on Information Theory
and Problemy Peredachi Informatsii. They were collected as key papers in
[21, 10, 111].
We mention the following classical textbooks in coding theory [3, 11, 19, 62,
75, 76, 78, 84, 93] and several more recent ones [20, 67, 77]. The Handbook on
coding theory [95] gives a wealth of information.
Audio-visual media, compact disc and DVD [76, 105]
fault-tolerant computers ...[]
deep space telecommunication [86, 134]
***Elias, sequence of codes with with R  0 and error probability going to
zero.***
***Forney, concatenated codes, sequence of codes with with R near capacity
and error probability going to zero and efficient decoding algorithm.***
***Elias Wozencraft, list decoding***.
Chapter 3

Code constructions and bounds

Ruud Pellikaan and Xin-Wen Wu
This chapter treats the existence and nonexistence of codes. Several construc-
tions show that the existence of one particular code gives rise to a cascade of
derived codes. Upper bounds in terms of the parameters exclude codes and
lower bounds show the existence of codes.
3.1 Code constructions
In this section, we discuss some classical methods of constructing new codes
using known codes.
3.1.1 Constructing shorter and longer codes
The most obvious way to make a shorter code out of a given code is to delete
several coordinates. This is called puncturing.
Definition 3.1.1 Let C be an [n, k, d] code. For any codeword, the process
of deleting one or more fixed coordinates is called puncturing. Let P be a
subset of {1, ..., n} consisting of p integers such that its complement is the
set {i_1, ..., i_{n−p}} with 1 ≤ i_1 < · · · < i_{n−p} ≤ n. Let x ∈ F_q^n.
Define x_P = (x_{i_1}, ..., x_{i_{n−p}}) ∈ F_q^{n−p}. Let C_P be the set of
all punctured codewords of C, where the puncturing takes place at all the
positions of P:

C_P = { c_P | c ∈ C }.

We will also use notation with respect to the non-punctured positions.
Definition 3.1.2 Let R be a subset of {1, ..., n} consisting of r integers
{i_1, ..., i_r} with 1 ≤ i_1 < · · · < i_r ≤ n. Let x ∈ F_q^n. Define
x(R) = (x_{i_1}, ..., x_{i_r}) ∈ F_q^r. Let C(R) be the set of all codewords
of C restricted to the positions of R:

C(R) = { c(R) | c ∈ C }.
Remark 3.1.3 So C_P is a code of length n − p, where p is the number of
elements of P. Furthermore C_P is linear, since C is linear. In fact, suppose
G is a generator matrix of C. Then C_P is a linear code generated by the rows
of G_P, where G_P is the k × (n − p) matrix consisting of the n − p columns at
the positions i_1, ..., i_{n−p} of G. If we consider the restricted code C(R),
then its generator matrix G(R) is the k × r submatrix of G composed of the
columns indexed by j_1, ..., j_r, where R = {j_1, ..., j_r}.
Proposition 3.1.4 Let C be an [n, k, d] code. Suppose P consists of p
elements. Then the punctured code C_P is an [n − p, k_P, d_P] code with

d − p ≤ d_P ≤ d and k − p ≤ k_P ≤ k.

If moreover p < d, then k_P = k.

Proof. The given upper bounds are clear. Let c ∈ C. Then at most p nonzero
positions are deleted from c to obtain c_P. Hence wt(c_P) ≥ wt(c) − p. Hence
d_P ≥ d − p.
The column rank of G, which is equal to the row rank, is k. The column rank
of G_P must be greater than or equal to k − p, since p columns are deleted.
This implies that the row rank of G_P is at least k − p. So k_P ≥ k − p.
Suppose p < d. If c and c' are two distinct codewords in C, then
d(c_P, c'_P) ≥ d − p > 0, so c_P and c'_P are distinct. Therefore C and C_P
have the same number of codewords. Hence k = k_P.
Example 3.1.5 It is worth pointing out that the dimension of C_P can be
smaller than k. From the definition of puncturing, C_P seemingly has the same
number of codewords as C. However, it is possible that C contains distinct
codewords that have the same coordinates outside the positions of P. In this
case, after deleting the coordinates at the positions of P, the number of
codewords of C_P is less than that of C. Look at the following simple example.
Let C be the binary code with generator matrix

G =
\begin{pmatrix}
1 & 1 & 0 & 0 \\
1 & 1 & 1 & 0 \\
0 & 0 & 1 & 1
\end{pmatrix}.

This is a [4, 3, 1] code. Let P = {4}. Then the rows of G_P are (1, 1, 0),
(1, 1, 1) and (0, 0, 1). It is clear that the second row is the sum of the
first and third ones. So G_P has row rank 2, and C_P has dimension 2.
In this example we have d = 1 = p.
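The drop in dimension is visible at once if one punctures the set of codewords
itself rather than the generator matrix. A small Python sketch reproducing
this example (names are illustrative):

# Sketch: puncturing a binary code at a set of positions P, cf. Example 3.1.5.
from itertools import product

def span(G, q=2):
    """All codewords generated by the rows of G over F_q."""
    k, n = len(G), len(G[0])
    return {tuple(sum(m[i] * G[i][j] for i in range(k)) % q for j in range(n))
            for m in product(range(q), repeat=k)}

def puncture(code, P):
    """Delete the coordinates at the positions in P (1-based)."""
    return {tuple(x for j, x in enumerate(c, start=1) if j not in P) for c in code}

G = [[1,1,0,0],
     [1,1,1,0],
     [0,0,1,1]]
C = span(G)
CP = puncture(C, {4})
print(len(C), len(CP))    # 8, 4: the dimension drops from 3 to 2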
We now introduce an inverse process to puncturing the code C, which is called
extending the code.
Definition 3.1.6 Let C be a linear code of length n. Let v ∈ F_q^n. The
extended code C^e(v) of length n + 1 is defined as follows. For every codeword
c = (c_1, ..., c_n) ∈ C, construct the word c^e(v) by adding the symbol
c_{n+1}(v) ∈ F_q at the end of c such that the following parity check holds:

v_1c_1 + v_2c_2 + · · · + v_nc_n + c_{n+1} = 0.

Now C^e(v) consists of all the codewords c^e(v), where c is a codeword of C.
In case v is the all-ones vector, then C^e(v) is denoted by C^e.
Remark 3.1.7 Let C be an [n, k] code. Then it is clear that C^e(v) is a linear
subspace of F_q^{n+1}, and has dimension k. So C^e(v) is an [n + 1, k] code.
Suppose G and H are generator and parity check matrices of C, respectively.
Then C^e(v) has a k × (n + 1) generator matrix G^e(v) and an
(n − k + 1) × (n + 1) parity check matrix H^e(v), which are given by

G^e(v) =
\left( G \;\middle|\; \begin{matrix} g_{1,n+1} \\ \vdots \\ g_{k,n+1} \end{matrix} \right)
\quad\text{and}\quad
H^e(v) =
\left( \begin{array}{cccc|c}
v_1 & v_2 & \cdots & v_n & 1 \\
\hline
 & H & & & \begin{matrix} 0 \\ \vdots \\ 0 \end{matrix}
\end{array} \right),

where the last column of G^e(v) has entries g_{i,n+1} = −\sum_{j=1}^{n} g_{ij}v_j.
Example 3.1.8 The extension of the [7,4,3] binary Hamming code with the
generator matrix given in Example 2.2.14 is equal to the [8,4,4] code with the
generator matrix given in Example 2.3.26. The increase of the minimum dis-
tance by one in the extension of a code of odd minimum distance is a general
phenomenon for binary codes.
Proposition 3.1.9 Let C be a binary [n, k, d] code. Then C^e has parameters
[n + 1, k, d^e] with d^e = d if d is even and d^e = d + 1 if d is odd.

Proof. Let C be a binary [n, k, d] code. Then C^e is an [n + 1, k] code by
Remark 3.1.7. The minimum distance d^e of the extended code satisfies
d ≤ d^e ≤ d + 1, since wt(c) ≤ wt(c^e) ≤ wt(c) + 1 for all c ∈ C. Assume that
d is even. Then there is a codeword c of weight d, and c^e is obtained from c
by extending with a zero. So c^e also has weight d. If d is odd, then the claim
follows, since all the codewords of the extended code C^e have even weight by
the parity check c_1 + · · · + c_{n+1} = 0.
Example 3.1.10 The binary [2^r − 1, 2^r − r − 1, 3] Hamming code H_r(2) has
the extension H_r(2)^e with parameters [2^r, 2^r − r − 1, 4]. The binary
[2^r − 1, r, 2^{r−1}] simplex code S_r(2) has the extension S_r(2)^e with
parameters [2^r, r, 2^{r−1}]. These claims are a direct consequence of
Propositions 2.3.14 and 2.3.16, Remark 3.1.7 and Proposition 3.1.9.
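A Python sketch reproducing this behavior for r = 3, extending a [7, 4, 3]
Hamming code (given by a systematic generator matrix) to an [8, 4, 4] code;
the function names are illustrative and the minimum distance is computed by
brute force:

# Sketch: extending a binary code with an overall parity check,
# cf. Proposition 3.1.9 and Example 3.1.10.
from itertools import product

def extend(code):
    """Append to every codeword the bit that makes the total weight even."""
    return {c + (sum(c) % 2,) for c in code}

def min_dist(code):
    return min(sum(x != y for x, y in zip(c1, c2))
               for c1 in code for c2 in code if c1 != c2)

G = [[1,0,0,0,1,1,0],   # systematic generator matrix of a [7,4,3] Hamming code
     [0,1,0,0,1,0,1],
     [0,0,1,0,0,1,1],
     [0,0,0,1,1,1,1]]
C = {tuple(sum(m[i]*G[i][j] for i in range(4)) % 2 for j in range(7))
     for m in product(range(2), repeat=4)}
print(min_dist(C), min_dist(extend(C)))   # 3 4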
The operations extending and puncturing at the last position are inverse to each
other.
Proposition 3.1.11 Let C be a linear code of length n. Let v ∈ F_q^n. Let
P = {n + 1} and Q = {n}. Then (C^e(v))_P = C. If the all-ones vector is a
parity check of C, then (C_Q)^e = C.

Proof. The first statement is a consequence of the fact that (c^e(v))_P = c
for all words c. The last statement is left as an exercise.
Example 3.1.12 Puncturing the extended binary Hamming code H_r(2)^e at the
last position gives the original Hamming code back.
By taking subcodes appropriately, we can get some new codes. The following
technique of constructing a new code involves a process of taking a subcode and
puncturing.
Definition 3.1.13 Let C be an [n, k, d] code. Let S be a subset of {1, ..., n}.
Let C(S) be the subcode of C consisting of all c ∈ C such that c_i = 0 for all
i ∈ S. The shortened code C^S is defined by C^S = (C(S))_S. It is obtained by
puncturing the subcode C(S) at S, so by deleting the coordinates that are
in S.
Remark 3.1.14 Let S consist of s elements. Let x ∈ F_q^{n−s}. Let x^S ∈ F_q^n
be the unique word of length n such that x = (x^S)_S and the entries of x^S at
the positions of S are zero, obtained by extending x with zeros appropriately.
Then

x ∈ C^S if and only if x^S ∈ C.

Furthermore

x^S · y = x · y_S for all x ∈ F_q^{n−s} and y ∈ F_q^n.
Proposition 3.1.15 Let C be an [n, k, d] code. Suppose S consists of s
elements. Then the shortened code C^S is an [n − s, k_S, d_S] code with

k − s ≤ k_S ≤ k and d ≤ d_S.

Proof. The dimension of C^S is equal to the dimension of the subcode C(S) of
C, and C(S) is defined by s homogeneous linear equations of the form c_i = 0.
This proves the statement about the dimension.
The minimum distance of C^S is the same as the minimum distance of C(S),
and C(S) is a subcode of C. Hence d ≤ d_S.
Example 3.1.16 Consider the binary [8, 4, 4] code of Example 2.3.26. In the
following diagram we show what happens with the generator matrix by shortening
at the first position in the left column of the diagram, by puncturing at
the first position in the right column, and by taking the dual in the upper and
lower row of the diagram.

\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\
0 & 1 & 0 & 0 & 1 & 0 & 1 & 1 \\
0 & 0 & 1 & 0 & 1 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 & 0
\end{pmatrix}
\quad \xleftrightarrow{\ \text{dual}\ } \quad
\begin{pmatrix}
0 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 1 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 & 1 & 0 \\
1 & 1 & 1 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}

↓ shorten at first position        ↓ puncture at first position

\begin{pmatrix}
1 & 0 & 0 & 1 & 0 & 1 & 1 \\
0 & 1 & 0 & 1 & 1 & 0 & 1 \\
0 & 0 & 1 & 1 & 1 & 1 & 0
\end{pmatrix}
\quad \xleftrightarrow{\ \text{dual}\ } \quad
\begin{pmatrix}
1 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 1 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}

Notice that the diagram commutes. This is a general fact as stated in the
following proposition.
Proposition 3.1.17 Let C be an [n, k, d] code. Let P and S be subsets of
{1, ..., n}. Then

(C_P)^⊥ = (C^⊥)^P and (C^S)^⊥ = (C^⊥)_S,

dim C_P + dim (C^⊥)^P = n − |P| and dim C^S + dim (C^⊥)_S = n − |S|.
Proof. Let x ∈ (C_P)^⊥. Let z ∈ C. Then z_P ∈ C_P. So x^P · z = x · z_P = 0,
by Remark 3.1.14. Hence x^P ∈ C^⊥ and x ∈ (C^⊥)^P. Therefore
(C_P)^⊥ ⊆ (C^⊥)^P.
Conversely, let x ∈ (C^⊥)^P. Then x^P ∈ C^⊥. Let y ∈ C_P. Then y = z_P for
some z ∈ C. So x · y = x · z_P = x^P · z = 0. Hence x ∈ (C_P)^⊥. Therefore
(C^⊥)^P ⊆ (C_P)^⊥, and in fact equality holds, since the converse inclusion
was already shown.
The statement on the dimensions is a direct consequence of the corresponding
equality of the codes.
The claim about the shortening of C with S is a consequence of the equality
for puncturing with P = S applied to the dual code C^⊥.
If we want to increase the size of the code without changing the code length,
we can augment the code by adding a word which is not in the code.
Definition 3.1.18 Let C be an F_q-linear code of length n. Let v ∈ F_q^n. The
augmented code, denoted by C^a(v), is defined by

C^a(v) = { αv + c | α ∈ F_q, c ∈ C }.

If v is the all-ones vector, then we denote C^a(v) by C^a.
Remark 3.1.19 The augmented code C^a(v) is a linear code. Suppose that G
is a generator matrix of C. Then the (k + 1) × n matrix G^a(v), which is
obtained by adding the row v to G, is a generator matrix of C^a(v) if v is not
an element of C.
Proposition 3.1.20 Let C be a code of minimum distance d. Suppose that the
vector v is not in C and has weight w. Then

min{d − w, w} ≤ d(C^a(v)) ≤ min{d, w}.

In particular d(C^a(v)) = w if w ≤ d/2.

Proof. C is a subcode and v is an element of the augmented code. This
implies the upper bound.
The lower bound is trivially satisfied if d ≤ w. Suppose w < d. Let x be a
nonzero element of C^a(v). Then x = αv + c for some α ∈ F_q and c ∈ C. If
α = 0, then wt(x) = wt(c) ≥ d > w. If c = 0, then wt(x) = wt(αv) = w, since
α ≠ 0. If α ≠ 0 and c ≠ 0, then c = x − αv. So d ≤ wt(c) ≤ wt(x) + w. Hence
d − w ≤ wt(x).
If w ≤ d/2, then the upper and lower bound are both equal to w.
Suppose C is a binary [n, k, d] code. We get a new code by deleting the
codewords of odd weight. In other words, the new code C_ev consists of all the
codewords in C which have even weight. It is called the even weight subcode in
Example 2.2.8. This process is also called expurgating the code C.
Definition 3.1.21 Let C be an F_q-linear code of length n. Let v ∈ F_q^n. The
expurgated code of C is denoted by C_e(v) and is defined by

C_e(v) = { c | c ∈ C and c · v = 0 }.

If v = 1, then C_e(1) is denoted by C_e.
Proposition 3.1.22 Let C be an [n, k, d] code. Then

(C^a(v))^⊥ = (C^⊥)_e(v).

Proof. If v ∈ C, then C^a(v) = C and x · v = 0 for all x ∈ C^⊥, so
(C^⊥)_e(v) = C^⊥. Suppose v is not an element of C. Let G be a generator
matrix of C. Then G is a parity check matrix of C^⊥, by Corollary 2.3.29. Now
G^a(v) is a generator matrix of C^a(v) by definition. Hence G^a(v) is a parity
check matrix of (C^a(v))^⊥. Furthermore G^a(v) is also a parity check matrix
of (C^⊥)_e(v) by definition. Hence (C^a(v))^⊥ = (C^⊥)_e(v).
Lengthening a code is a technique which combines augmenting and extending.
Definition 3.1.23 Let C be an [n, k] code. Let v ∈ F_q^n. The lengthened
code C^l(v) is obtained by first augmenting C by v, and then extending it:

C^l(v) = (C^a(v))^e.

If v = 1, then C^l(v) is denoted by C^l.
Remark 3.1.24 The lengthening of an [n, k] code is a linear code. If v is not
an element of C, then C^l(v) is an [n + 1, k + 1] code.
3.1.2 Product codes
We describe a method for combining two codes to get a new code. In Example
2.1.2 the [9,4,4] product code is introduced. This construction will be general-
ized in this section.
Consider the identification of the space of all n1 × n2 matrices with entries in Fq and the space Fq^n, where n = n1n2 and the matrix X = (xij), 1 ≤ i ≤ n1, 1 ≤ j ≤ n2, is mapped to the vector x with entries x_{(i−1)n2+j} = xij. In other words, the rows of X are put in linear order behind each other:

x = (x11, x12, . . . , x1n2, x21, . . . , x2n2, x31, . . . , xn1n2).
For α ∈ Fq and n1 × n2 matrices (xij) and (yij) with entries in Fq, the scalar
multiplication and addition are defined by:
α(xij) = (αxij), and (xij) + (yij) = (xij + yij).
These operations on matrices correspond to the operations on the vectors under the identification. Hence the identification of the space of n1 × n2 matrices with the space Fq^n is an isomorphism of vector spaces. In the following these two spaces are identified.
Definition 3.1.25 Let C1 and C2 be respectively [n1, k1, d1] and [n2, k2, d2] codes. Let n = n1n2. The product code, denoted by C1 ⊗ C2, is defined by

C1 ⊗ C2 = { (cij) | (cij)_{1≤i≤n1} ∈ C1 for all j, and (cij)_{1≤j≤n2} ∈ C2 for all i }.
From the definition, the product code C1 ⊗ C2 is exactly the set of all n1 × n2
arrays whose columns belong to C1 and rows to C2. In the literature, the
product code is called direct product, or Kronecker product, or tensor product
code.
Example 3.1.26 Let C1 = C2 be the [3, 2, 2] binary even weight code. So it
consists of the following codewords:
(0, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1).
This is the set of all words (m1, m2, m1 + m2) where m1 and m2 are arbitrary
bits. By the definition, the following 16 arrays are the codewords of the product code C1 ⊗ C2:

[ m1        m2        m1 + m2           ]
[ m3        m4        m3 + m4           ]
[ m1 + m3   m2 + m4   m1 + m2 + m3 + m4 ],

where the mi are free to choose. So indeed this is the product code of Example 2.1.2. The sum of two arrays (cij) and (c′ij) is the array (cij + c′ij). Therefore, C1 ⊗ C2 is a linear code of length 9 = 3 × 3 and dimension 4 = 2 × 2. And it is clear that the minimum distance of C1 ⊗ C2 is 4 = 2 × 2.
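A brute-force check of this example takes a few lines of Python. The sketch below is our own; it represents codes as sets of tuples and tests the row and column conditions of Definition 3.1.25 directly.

from itertools import product

# The binary [3,2,2] even weight code.
C = {(m1, m2, (m1 + m2) % 2) for m1, m2 in product(range(2), repeat=2)}

# All 3x3 binary arrays whose rows and columns all belong to C.
prod_code = [rows for rows in product(C, repeat=3)
             if all(col in C for col in zip(*rows))]

weights = [sum(sum(r) for r in rows) for rows in prod_code]
print(len(prod_code))                        # 16, so the dimension is 4
print(min(w for w in weights if w > 0))      # 4, the minimum distance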
This is a general fact, but before we state this result we need some preparations.
Definition 3.1.27 For two vectors x = (x1, . . . , xn1) and y = (y1, . . . , yn2), we define the tensor product of them, denoted by x ⊗ y, as the n1 × n2 array whose (i, j)-entry is xiyj.
Remark 3.1.28 It is clear that C1 ⊗ C2 is a linear code if C1 and C2 are both
linear.
Remark that x ⊗ y ∈ C1 ⊗ C2 if x ∈ C1 and y ∈ C2, since the i-th row of x ⊗ y is xiy ∈ C2 and the j-th column is yjx^T with yjx ∈ C1. But the set of all x ⊗ y with x ∈ C1 and y ∈ C2 is not equal to C1 ⊗ C2. In the previous example

[ 0 1 1 ]
[ 1 0 1 ]
[ 1 1 0 ]

is in the product code, but it is not of the form x ⊗ y with x ∈ C1 and y ∈ C2, since otherwise it would have at least one zero row and at least one zero column. In general, the number of pairs (x, y) with x ∈ C1 and y ∈ C2 is equal to q^{k1+k2}, but x ⊗ y = 0 if x = 0 or y = 0. Moreover λ(x ⊗ y) = (λx) ⊗ y = x ⊗ (λy) for all λ ∈ Fq. Hence we get at most (q^{k1} − 1)(q^{k2} − 1)/(q − 1) + 1 elements of the form x ⊗ y. If k1 > 1 and k2 > 1, then this is smaller than q^{k1k2}, the number of elements of C1 ⊗ C2 according to the following proposition.
Proposition 3.1.29 Let x1, . . . , xk ∈ Fq^{n1} and y1, . . . , yk ∈ Fq^{n2}. If y1, . . . , yk are independent and x1 ⊗ y1 + · · · + xk ⊗ yk = 0, then xi = 0 for all i.

Proof. Suppose that y1, . . . , yk are independent and x1 ⊗ y1 + · · · + xk ⊗ yk = 0. Let xjs be the s-th entry of xj. Then the s-th row of Σ_j xj ⊗ yj is equal to Σ_j xjs yj, which is equal to 0 by assumption. Hence xjs = 0 for all j and s, since the yj are independent. Hence xj = 0 for all j.
Corollary 3.1.30 Let x1, . . . , xk1 ∈ Fq^{n1} and y1, . . . , yk2 ∈ Fq^{n2}. If x1, . . . , xk1 and y1, . . . , yk2 are both independent, then { xi ⊗ yj | 1 ≤ i ≤ k1, 1 ≤ j ≤ k2 } is an independent set of matrices.

Proof. Suppose that Σ_{i,j} λij xi ⊗ yj = 0 for certain scalars λij ∈ Fq. Then Σ_j (Σ_i λij xi) ⊗ yj = 0 and y1, . . . , yk2 ∈ Fq^{n2} are independent. So Σ_i λij xi = 0 for all j by Proposition 3.1.29. Hence λij = 0 for all i, j, since x1, . . . , xk1 are independent.
Proposition 3.1.31 Let x1, . . . , xk1 in Fq^{n1} be a basis of C1 and y1, . . . , yk2 in Fq^{n2} a basis of C2. Then

{ xi ⊗ yj | 1 ≤ i ≤ k1, 1 ≤ j ≤ k2 }

is a basis of C1 ⊗ C2.

Proof. The given set is an independent set by Corollary 3.1.30. This set is a subset of C1 ⊗ C2. So the dimension of C1 ⊗ C2 is at least k1k2. Now we will show that it is in fact a basis of C1 ⊗ C2. Without loss of generality we may assume that C1 is systematic at the first k1 coordinates with generator matrix (I_{k1}|A) and C2 is systematic at the first k2 coordinates with generator matrix (I_{k2}|B). Then U is an l × n2 matrix with rows in C2 if and only if U = (M|MB), where M is an l × k2 matrix. And V is an n1 × m matrix with columns in C1 if and only if V^T = (N|NA), where N is an m × k1 matrix. Now let M be a k1 × k2 matrix. Then (M|MB) is a k1 × n2 matrix with rows in C2, and

[ M     ]
[ A^T M ]

is an n1 × k2 matrix with columns in C1. Therefore

[ M      MB     ]
[ A^T M  A^T MB ]

is an n1 × n2 matrix with columns in C1 and rows in C2 for every k1 × k2 matrix M. And conversely every codeword of C1 ⊗ C2 is of this form. Hence the dimension of C1 ⊗ C2 is equal to k1k2 and the given set is a basis of C1 ⊗ C2.
Theorem 3.1.32 Let C1 and C2 be respectively [n1, k1, d1] and [n2, k2, d2] codes. Then the product code C1 ⊗ C2 is an [n1n2, k1k2, d1d2] code.

Proof. By definition n = n1n2 is the length of the product code. It was already mentioned that C1 ⊗ C2 is a linear subspace of Fq^{n1n2}. The dimension of the product code is k1k2 by Proposition 3.1.31.
Next, we prove that the minimum distance of C1 ⊗ C2 is d1d2. Let c be a nonzero codeword of C1 ⊗ C2, which is an n1 × n2 array. It has a nonzero row, and this row has weight at least d2; each of the at least d2 columns meeting this row in a nonzero entry is a nonzero column, hence has weight at least d1. So the weight of a nonzero codeword of the product code is at least d1d2. This implies that the minimum distance of C1 ⊗ C2 is at least d1d2. Now suppose x ∈ C1 has weight d1, and y ∈ C2 has weight d2. Then x ⊗ y is a codeword of C1 ⊗ C2 of weight d1d2.
Definition 3.1.33 Let A = (aij) be a k1 × n1 matrix and B = (bij) a k2 × n2
matrix. The Kronecker product or tensor product A ⊗ B of A and B is the
k1k2 × n1n2 matrix obtained from A by replacing every entry aij by aijB.
Remark 3.1.34 The tensor product x ⊗ y of the two row vectors x and y of length n1 and n2, respectively, as defined in Definition 3.1.27, is the same as the Kronecker product of x^T and y, now considered as n1 × 1 and 1 × n2 matrices, respectively, as in Definition 3.1.33.
Proposition 3.1.35 Let G1 be a generator matrix of C1, and G2 a generator
matrix of C2. Then G1 ⊗ G2 is a generator matrix of C1 ⊗ C2.
Proof. In this proposition the codewords are considered as elements of Fq^n and no longer as matrices. Let xi be the i-th row of G1, and denote by yj the j-th row of G2. So x1, . . . , xk1 ∈ Fq^{n1} is a basis of C1 and y1, . . . , yk2 ∈ Fq^{n2} is a basis of C2. Hence the set {xi ⊗ yj | 1 ≤ i ≤ k1, 1 ≤ j ≤ k2} is a basis of C1 ⊗ C2 by Proposition 3.1.31. Furthermore, if l = (i − 1)k2 + j, then xi ⊗ yj is the l-th row of G1 ⊗ G2. Hence the matrix G1 ⊗ G2 is a generator matrix of C1 ⊗ C2.
Example 3.1.36 Consider the ternary codes C1 and C2 with generator matrices

G1 =
[ 1 1 1 ]
[ 0 1 2 ]

and G2 =
[ 1 1 1 0 ]
[ 0 1 2 0 ]
[ 0 1 1 1 ],

respectively. Then

G1 ⊗ G2 =
[ 1 1 1 0 1 1 1 0 1 1 1 0 ]
[ 0 1 2 0 0 1 2 0 0 1 2 0 ]
[ 0 1 1 1 0 1 1 1 0 1 1 1 ]
[ 0 0 0 0 1 1 1 0 2 2 2 0 ]
[ 0 0 0 0 0 1 2 0 0 2 1 0 ]
[ 0 0 0 0 0 1 1 1 0 2 2 2 ].

The second row of G1 is x2 = (0, 1, 2) and y2 = (0, 1, 2, 0) is the second row of G2. Then x2 ⊗ y2 is equal to

[ 0 0 0 0 ]
[ 0 1 2 0 ]
[ 0 2 1 0 ],

considered as a matrix, and equal to (0, 0, 0, 0, 0, 1, 2, 0, 0, 2, 1, 0) written as a vector, which is indeed equal to the (2 − 1) · 3 + 2 = 5-th row of G1 ⊗ G2.
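The computation of this example can be reproduced with the Kronecker product routine of numpy. The sketch below is our own and assumes numpy is available; reducing modulo 3 keeps the entries in F3.

import numpy as np

G1 = np.array([[1, 1, 1],
               [0, 1, 2]])
G2 = np.array([[1, 1, 1, 0],
               [0, 1, 2, 0],
               [0, 1, 1, 1]])

G = np.kron(G1, G2) % 3        # generator matrix of C1 (x) C2 over F_3
print(G.shape)                 # (6, 12)

x2, y2 = G1[1], G2[1]          # second rows of G1 and G2
print(np.kron(x2, y2) % 3)     # [0 0 0 0 0 1 2 0 0 2 1 0]
print(np.array_equal(np.kron(x2, y2) % 3, G[4]))   # True: the 5-th row of G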
3.1.3 Several sum constructions
We have seen that given an [n1, k1] code C1 and an [n2, k2] code C2, by the
product construction, we get an [n1n2, k1k2] code. The product code has infor-
mation rate (k1k2)/(n1n2) = R1R2, where R1 and R2 are the rates of C1 and
C2, respectively. In this subsection, we introduce some simple constructions by
which we can get new codes with greater rate from two given codes.
Definition 3.1.37 Given an [n1, k1] code C1 and an [n2, k2] code C2, their direct sum C1 ⊕ C2, also called the (u|v) construction, is defined by

C1 ⊕ C2 = { (u|v) | u ∈ C1, v ∈ C2 },

where (u|v) denotes the word (u1, . . . , un1, v1, . . . , vn2) if u = (u1, . . . , un1) and v = (v1, . . . , vn2).
Proposition 3.1.38 Let Ci be an [ni, ki, di] code with generator matrix Gi for i = 1, 2. Let d = min{d1, d2}. Then C1 ⊕ C2 is an [n1 + n2, k1 + k2, d] code with generator matrix

G =
[ G1  0  ]
[ 0   G2 ].

Proof. Let x1, . . . , xk1 and y1, . . . , yk2 be bases of C1 and C2, respectively. Then (x1|0), . . . , (xk1|0), (0|y1), . . . , (0|yk2) is a basis of the direct sum code. Therefore, the direct sum is an [n1 + n2, k1 + k2] code with the given generator matrix G. The minimum distance of the direct sum is min{d1, d2}, since wt(u|v) = wt(u) + wt(v), and a minimum weight codeword of C1 or C2, padded with zeros, gives a codeword of the direct sum of the same weight.
The direct sum or (u|v) construction is defined by the juxtaposition of arbitrary codewords u ∈ C1 and v ∈ C2. In the following definition only a restricted set of pairs of codewords are put behind each other. This definition depends on the choice of the generator matrices of the codes C1 and C2.
Definition 3.1.39 Let C1 be an [n1, k, d1] code and C2 an [n2, k, d2] code with
generator matrices G1 and G2, respectively. The juxtaposition of the codes C1
and C2 is the code with generator matrix (G1|G2).
Proposition 3.1.40 Let Ci be an [ni, k, di] code for i = 1, 2. Then the juxtaposition of the codes C1 and C2 is an [n1 + n2, k, d] code with d ≥ d1 + d2.

Proof. The length and the dimension are clear from the definition. A nonzero codeword c is of the form mG = (mG1|mG2) for a nonzero element m in Fq^k. So mGi is a nonzero codeword of Ci. Hence the weight of c is at least d1 + d2.
The rate of the direct sum is (k1+k2)/(n1+n2), which is greater than (k1k2)/(n1n2),
the rate of the product code. Now a more intelligent construction is studied.
Definition 3.1.41 Let C1 be an [n, k1, d1] code and C2 an [n, k2, d2] code. The (u|u + v) construction is the following code:

{ (u|u + v) | u ∈ C1, v ∈ C2 }.
Theorem 3.1.42 Let Ci be an [n, ki, di] code with generator matrix Gi for i = 1, 2. Then the (u|u + v) construction of C1 and C2 is a [2n, k1 + k2, d] code with minimum distance d = min{2d1, d2} and generator matrix

G =
[ G1  G1 ]
[ 0   G2 ].
Proof. It is straightforward to check the linearity of the (u|u + v) construction. Suppose x1, . . . , xk1 and y1, . . . , yk2 are bases of C1 and C2, respectively. Then it is easy to see that (x1|x1), . . . , (xk1|xk1), (0|y1), . . . , (0|yk2) is a basis of the (u|u + v) construction. So it is a [2n, k1 + k2] code with generator matrix G as given.
Consider the minimum distance d of the (u|u + v) construction. For any codeword (x|x + y), we have wt(x|x + y) = wt(x) + wt(x + y). If y = 0 and x ≠ 0, then wt(x|x + y) = 2wt(x) ≥ 2d1. If y ≠ 0, then

wt(x|x + y) = wt(x) + wt(x + y) ≥ wt(x) + wt(y) − wt(x) = wt(y) ≥ d2.

Hence, d ≥ min{2d1, d2}. Let x0 be a codeword of C1 with weight d1, and y0 be a codeword of C2 with weight d2. Then either (x0|x0) or (0|y0) has weight min{2d1, d2}.
Example 3.1.43 The (u|u + v) construction of the binary even weight [4,3,2] code and the 4-tuple repetition [4,1,4] code gives an [8,4,4] code with generator matrix

[ 1 0 0 1 1 0 0 1 ]
[ 0 1 0 1 0 1 0 1 ]
[ 0 0 1 1 0 0 1 1 ]
[ 0 0 0 0 1 1 1 1 ],

which is equivalent to the extended Hamming code of Example 2.3.26.
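This example can be verified with a short Python sketch. As before, the representation of codes as sets of tuples and the helper names are our own.

from itertools import product

def linear_span(gens, n):
    return {tuple(sum(c * g[i] for c, g in zip(cf, gens)) % 2 for i in range(n))
            for cf in product(range(2), repeat=len(gens))}

def u_u_plus_v(C1, C2):
    # { (u | u+v) : u in C1, v in C2 } for binary codes of the same length.
    return {u + tuple((ui + vi) % 2 for ui, vi in zip(u, v))
            for u in C1 for v in C2}

even = linear_span([(1, 0, 0, 1), (0, 1, 0, 1), (0, 0, 1, 1)], 4)  # [4,3,2]
rep  = linear_span([(1, 1, 1, 1)], 4)                              # [4,1,4]
C = u_u_plus_v(even, rep)
print(len(C), min(sum(c) for c in C if any(c)))   # 16 4: an [8,4,4] code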
Remark 3.1.44 For two vectors u of length n1 and v of length n2, we can still
define the sum u + v as a vector of length max{n1, n2}, by adding enough zeros
at the end of the shorter vector. From this definition of sum, the (u|u + v)
construction still works for codes C1 and C2 of different lengths.
Proposition 3.1.45 If C1 is an [n1, k1, d1] code, and C2 is an [n2, k2, d2] code,
then the (u|u + v) construction is an [n1 + max{n1, n2}, k1 + k2, min{2d1, d2}]
linear code.
Proof. The proof is similar to the proof of Theorem 3.1.42.
Definition 3.1.46 The (u + v|u − v) construction is a slightly modified construction, which is defined as the following code:

{ (u + v|u − v) | u ∈ C1, v ∈ C2 }.

When we consider this construction, we restrict ourselves to the case that q is odd, since u + v = u − v if q is even.
Proposition 3.1.47 Let Ci be an [n, ki, di] code with generator matrix Gi for i = 1, 2. Assume that q is odd. Then the (u + v|u − v) construction of C1 and C2 is a [2n, k1 + k2, d] code with d ≥ min{2d1, 2d2, max{d1, d2}} and generator matrix

G =
[ G1  G1  ]
[ G2  −G2 ].
Proof. The proof of the proposition is similar to that of Theorem 3.1.42. In fact, suppose x1, . . . , xk1 and y1, . . . , yk2 are bases of C1 and C2, respectively. Then every codeword is of the form (u + v|u − v) = (u|u) + (v|−v) with u ∈ C1 and v ∈ C2. So (u|u) is a linear combination of (x1|x1), . . . , (xk1|xk1), and (v|−v) is a linear combination of (y1|−y1), . . . , (yk2|−yk2).
Using the assumption that q is odd, we can prove that this set of vectors (xi|xi), (yj|−yj) is linearly independent. Suppose that

Σ_i λi(xi|xi) + Σ_j µj(yj|−yj) = 0.

Then

Σ_i λixi + Σ_j µjyj = 0 and Σ_i λixi − Σ_j µjyj = 0.

Adding the two equations and dividing by 2 gives Σ_i λixi = 0. So λi = 0 for all i, since the xi are independent. Similarly, the subtraction of the equations gives that µj = 0 for all j.
So the (xi|xi), (yj|−yj) are independent and generate the code. Hence they form a basis and this shows that the given G is a generator matrix of this construction.
Let (u + v|u − v) be a nonzero codeword. The weight of this word is at least 2d1 if v = 0, and at least 2d2 if u = 0. Now suppose u ≠ 0 and v ≠ 0. Then the weight of u − v is at least wt(u) − w, where w is the number of positions i such that ui = vi ≠ 0. If ui = vi ≠ 0, then ui + vi ≠ 0, since q is odd. Hence wt(u + v) ≥ w, and wt(u + v|u − v) ≥ w + (wt(u) − w) = wt(u) ≥ d1. In the same way wt(u + v|u − v) ≥ d2. Hence wt(u + v|u − v) ≥ max{d1, d2}. This proves the estimate on the minimum distance.
Example 3.1.48 Consider the following ternary codes
C1 = {000, 110, 220}, C2 = {000, 011, 022}.
They are [3, 1, 2] codes. The (u + v|u − v) construction of these codes is a
[6, 2, d] code with d ≥ 2 by Proposition 3.1.47. It consists of the following nine
codewords:
(0, 0, 0, 0, 0, 0), (0, 1, 1, 0, 2, 2), (0, 2, 2, 0, 1, 1),
(1, 1, 0, 1, 1, 0), (1, 2, 1, 1, 0, 2), (1, 0, 2, 1, 2, 1),
(2, 2, 0, 2, 2, 0), (2, 0, 1, 2, 1, 2), (2, 1, 2, 2, 0, 1).
Hence d = 4. On the other hand, by the (u|u+v) construction, we get a [6, 2, 2]
code, which has a smaller minimum distance than the (u+v|u−v) construction.
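The nine codewords above, and the minimum distance 4, can be checked mechanically; the following small Python sketch uses the same set-of-tuples conventions as before.

C1 = {(0, 0, 0), (1, 1, 0), (2, 2, 0)}
C2 = {(0, 0, 0), (0, 1, 1), (0, 2, 2)}

def upv_umv(C1, C2, q=3):
    # { (u+v | u-v) : u in C1, v in C2 } over F_q with q odd.
    return {tuple((ui + vi) % q for ui, vi in zip(u, v)) +
            tuple((ui - vi) % q for ui, vi in zip(u, v))
            for u in C1 for v in C2}

C = upv_umv(C1, C2)
wt = lambda c: sum(1 for x in c if x != 0)
print(len(C), min(wt(c) for c in C if any(c)))   # 9 4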
Now a more complicated construction is given.
Definition 3.1.49 Let C1 and C2 be [n, k1] and [n, k2] codes, respectively. The (a + x|b + x|a + b − x) construction of C1 and C2 is the following code:

{ (a + x|b + x|a + b − x) | a, b ∈ C1, x ∈ C2 }.
Proposition 3.1.50 Let C1 and C2 be [n, k1] and [n, k2] codes over Fq, respectively. Suppose q is not a power of 3. Then the (a + x|b + x|a + b − x) construction of C1 and C2 is a [3n, 2k1 + k2] code with generator matrix

G =
[ G1  0   G1  ]
[ 0   G1  G1  ]
[ G2  G2  −G2 ].
Proof. Let x1, . . . , xk1 and y1, . . . , yk2 be bases of C1 and C2, respectively. Consider the following 2k1 + k2 vectors:

(x1|0|x1), . . . , (xk1|0|xk1),
(0|x1|x1), . . . , (0|xk1|xk1),
(y1|y1|−y1), . . . , (yk2|yk2|−yk2).

It is left as an exercise to check that they form a basis of this construction in case q is not a power of 3. This shows that the given G is a generator matrix of the code and that its dimension is 2k1 + k2.
For binary codes, some simple inequalities, for example, Exercise 3.1.9, can be
used to estimate the minimum distance of the last construction. In general we
have the following estimate for the minimum distance.
Proposition 3.1.51 Let C1 and C2 be [n, k1, d1] and [n, k2, d2] codes over Fq, respectively. Suppose q is not a power of 3. Let d0 and d3 be the minimum distances of C1 ∩ C2 and C1 + C2, respectively. Then the minimum distance d of the (a + x|b + x|a + b − x) construction of C1 and C2 is at least min{d0, 2d1, 3d3}.
Proof. This is left as an exercise.
The choice of the minus sign in the (a + x|b + x|a + b − x) construction be-
comes apparent in the construction of self-dual codes over Fq for arbitrary q not
divisible by 3.
Proposition 3.1.52 Let C1 and C2 be self-dual [2k, k] codes. Then the codes obtained from C1 and C2 by the direct sum, the (u|u + v) construction if C1 = C2, the (u + v|u − v) construction, and the (a + x|b + x|a + b − x) construction in case q is not divisible by 3, are also self-dual.

Proof. The generator matrix Gi of Ci has size k × 2k and satisfies GiGi^T = 0 for i = 1, 2. In all the constructions the generator matrix G, of size 2k × 4k or 3k × 6k as given in Theorem 3.1.42 and Propositions 3.1.38, 3.1.47 and 3.1.50, also satisfies GG^T = 0. For instance, in the case of the (a + x|b + x|a + b − x) construction we have

GG^T =
[ G1  0   G1  ] [ G1^T  0     G2^T  ]
[ 0   G1  G1  ] [ 0     G1^T  G2^T  ]
[ G2  G2  −G2 ] [ G1^T  G1^T  −G2^T ].

All the entries in this product are sums of terms of the form GiGi^T or G1G2^T − G1G2^T, which are all zero. Hence GG^T = 0.
Example 3.1.53 Let C1 be the binary [8, 4, 4] self-dual code with the generator matrix G1 of the form (I4|A1) as given in Example 2.3.26. Let C2 be the code with generator matrix G2 = (I4|A2), where A2 is obtained from A1 by a cyclic shift of the columns:

A1 =
[ 0 1 1 1 ]
[ 1 0 1 1 ]
[ 1 1 0 1 ]
[ 1 1 1 0 ],

A2 =
[ 1 0 1 1 ]
[ 1 1 0 1 ]
[ 1 1 1 0 ]
[ 0 1 1 1 ].

The codes C1 and C2 are both [8, 4, 4] self-dual codes, C1 ∩ C2 = {0, 1} and C1 + C2 is the even weight code. Let C be the (a + x|b + x|a + b + x) construction applied to C1 and C2. Then C is a binary self-dual [24, 12, 8] code. The claim on the minimum distance is the only remaining statement to verify, by Proposition 3.1.52. Let G be the generator matrix of C as given in Proposition 3.1.50. The weights of the rows of G are all divisible by 4. Hence the weights of all codewords are divisible by 4 by Exercise ??. Let c = (a + x|b + x|a + b + x) be a nonzero codeword with a, b ∈ C1 and x ∈ C2. If a + x = 0, then a = x ∈ C1 ∩ C2. So a = x = 0 and c = (0|b|b), or a = x = 1 and c = (0|b + 1|b), and in both cases the weight of c is at least 8, since the weight of b is at least 4 and the weight of 1 is 8. Similarly it is argued that the weight of c is at least 8 if b + x = 0 or a + b + x = 0. So we may assume that neither of a + x, b + x, nor a + b + x is zero. Hence all three are nonzero even weight codewords and wt(c) ≥ 6. But the weight is divisible by 4. Hence the minimum distance is at least 8. Let a be a codeword of C1 of weight 4; then c = (a|0|a) is a codeword of weight 8.
In this way we have constructed a binary self-dual [24, 12, 8] code. It is called the extended binary Golay code. The binary Golay code is the [23, 12, 7] code obtained by puncturing one coordinate.
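The whole construction fits in a few lines of Python. The brute-force sketch below is our own; it rebuilds C from C1 and C2 and confirms the parameters [24, 12, 8] (the loop runs over all 16 · 16 · 16 = 4096 triples, which is instantaneous).

from itertools import product

def span(G, n):
    return [tuple(sum(c * g[i] for c, g in zip(cf, G)) % 2 for i in range(n))
            for cf in product(range(2), repeat=len(G))]

I4 = [(1,0,0,0), (0,1,0,0), (0,0,1,0), (0,0,0,1)]
A1 = [(0,1,1,1), (1,0,1,1), (1,1,0,1), (1,1,1,0)]
A2 = [(1,0,1,1), (1,1,0,1), (1,1,1,0), (0,1,1,1)]
C1 = span([i + a for i, a in zip(I4, A1)], 8)
C2 = span([i + a for i, a in zip(I4, A2)], 8)

add = lambda *ws: tuple(sum(t) % 2 for t in zip(*ws))
golay = {add(a, x) + add(b, x) + add(a, b, x)
         for a in C1 for b in C1 for x in C2}
print(len(golay))                               # 4096 = 2^12
print(min(sum(c) for c in golay if any(c)))     # 8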
3.1.4 Concatenated codes
For this section we need some theory of finite fields. See Section 7.2.1. Let q be a prime power and k a positive integer. The finite field F_{q^k} with q^k elements contains Fq as a subfield. Now F_{q^k} is a k-dimensional vector space over Fq. Let ξ1, . . . , ξk be a basis of F_{q^k} over Fq. Consider the map

ϕ : Fq^k −→ F_{q^k}

defined by ϕ(a) = a1ξ1 + · · · + akξk. Then ϕ is an isomorphism of vector spaces with inverse map ϕ^{−1}.
The vector space Fq^{K×k} of K × k matrices over Fq is a vector space of dimension Kk over Fq, and it is linearly isometric with Fq^{Kk} by taking some ordering of the Kk entries of such matrices. Let M be a K × k matrix over Fq with i-th row mi. The map

ϕK : Fq^{K×k} −→ F_{q^k}^K

is defined by ϕK(M) = (ϕ(m1), . . . , ϕ(mK)). The inverse map

ϕN^{−1} : F_{q^k}^N −→ Fq^{N×k}

is given by ϕN^{−1}(a1, . . . , aN) = P, where P is the N × k matrix with i-th row pi = ϕ^{−1}(ai).
Let A be an [N, K] code over F_{q^k}, and B an [n, k] code over Fq. Let GA and GB be generator matrices of A and B, respectively. The N-fold direct sum

GB^{(N)} = GB ⊕ · · · ⊕ GB : Fq^{Nk} → Fq^{Nn}

is defined by GB^{(N)}(P) = Q, where Q is the N × n matrix with i-th row qi = piGB, for a given N × k matrix P with i-th row pi in Fq^k.
By the following concatenation procedure a message of length Kk over Fq is encoded to a codeword of length Nn over Fq.
Step 1: The K × k matrix M is mapped to m = ϕK(M).
Step 2: m in F_{q^k}^K is mapped to a = mGA in F_{q^k}^N.
Step 3: a in F_{q^k}^N is mapped to P = ϕN^{−1}(a).
Step 4: The N × k matrix P with i-th row pi is mapped to the N × n matrix Q with i-th row qi = piGB.
The encoding map

E : Fq^{K×k} −→ Fq^{N×n}

is the composition of the four maps explained above:

E = GB^{(N)} ◦ ϕN^{−1} ◦ GA ◦ ϕK.

Let

C = { E(M) | M ∈ Fq^{K×k} }.

We call C the concatenated code with outer code A and inner code B.
Theorem 3.1.54 Let A be an [N, K, D] code over F_{q^k}, and B an [n, k, d] code over Fq. Let C be the concatenated code with outer code A and inner code B. Then C is an Fq-linear [Nn, Kk] code and its minimum distance is at least Dd.

Proof. The encoding map E is an Fq-linear map, since it is a composition of four Fq-linear maps. The first and third map are isomorphisms, and the second and last map are injective, since they are given by generator matrices of full rank. Hence E is injective, and the concatenated code C is an Fq-linear code of length Nn and dimension Kk.
Next, consider the minimum distance of C. Since A is an [N, K, D] code, every nonzero codeword a obtained in Step 2 has weight at least D. As a result, the N × k matrix P obtained in Step 3 has at least D nonzero rows pi. Now, because B is an [n, k, d] code, every piGB has weight at least d if pi is not zero. Therefore, the minimum distance of C is at least Dd.
Example 3.1.55 The definition of the concatenated code depends on the choice of the map ϕ, that is, on the choice of the basis ξ1, . . . , ξk. In fact the minimum distance of the concatenated code can be strictly larger than Dd, as the following example shows.
The field F9 contains the ternary field F3 as a subfield and an element ξ such that ξ^2 = 1 + ξ, since the polynomial X^2 − X − 1 is irreducible in F3[X]. Now take ξ1 = 1 and ξ2 = ξ as a basis of F9 over F3. Let A be the [2, 1, 2] outer code over F9 with generator matrix GA = [1, ξ^2]. Let B be the trivial [2, 2, 1] code over F3 with generator matrix GB = I2. Let M = (m1, m2) ∈ F3^{1×2}. Then m = ϕ1(M) = m1 + m2ξ ∈ F9. So a = mGA = (m1 + m2ξ, (m1 + m2) + (m1 − m2)ξ), since ξ^2 = 1 + ξ and ξ^3 = 1 − ξ. Hence

Q = P = ϕ2^{−1}(a) =
[ m1       m2      ]
[ m1 + m2  m1 − m2 ].

Therefore the concatenated code has minimum distance 3 > Dd.
Suppose we would have taken ξ1 = 1 and ξ2 = ξ^2 as a basis instead. Take M = (1, 0). Then m = ϕ1(M) = 1 ∈ F9. So a = mGA = (1, ξ^2). Hence Q = P = ϕ2^{−1}(a) = I2 is a codeword in the concatenated code that has weight 2 = Dd.
Thus, the definition and the parameters of a concatenated code depend on the specific choice of the map ϕ.
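The first computation of this example can be redone in Python. The sketch below is our own; it hard-codes F9 = F3[ξ] with ξ^2 = 1 + ξ, representing a + bξ as the pair (a, b), and carries out the four encoding steps for GA = [1, ξ^2] and GB = I2.

def mul(x, y):
    # (a + b ξ)(c + d ξ) = ac + bd + (ad + bc + bd) ξ, since ξ^2 = 1 + ξ.
    a, b = x; c, d = y
    return ((a * c + b * d) % 3, (a * d + b * c + b * d) % 3)

xi2 = mul((0, 1), (0, 1))            # ξ^2 = (1, 1), i.e. 1 + ξ

def encode(m1, m2):
    m = (m1, m2)                     # Step 1: φ maps (m1, m2) to m1 + m2·ξ
    a = (m, mul(m, xi2))             # Step 2: outer encoding with GA = [1, ξ^2]
    return a[0] + a[1]               # Steps 3-4: φ^{-1} per symbol; GB = I2

words = [encode(m1, m2) for m1 in range(3) for m2 in range(3)]
wt = lambda c: sum(1 for x in c if x != 0)
print(min(wt(c) for c in words if any(c)))   # 3 > Dd = 2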
3.1.5 Exercises
3.1.1 Prove Proposition 3.1.11.
3.1.2 Let C be the binary [9,4,4] product code of Example 2.1.2. Show that puncturing C at the position i gives an [8,4,3] code for every choice of i = 1, . . . , 9. Is it possible to obtain the binary [7,4,3] Hamming code by puncturing C? Show that shortening C at the position i gives an [8,3,4] code for every choice of i. Is it possible to obtain the binary [7,3,4] simplex code by a combination of puncturing and shortening the product code?
3.1.3 Suppose that there exists an [n′, k′, d′]q code and an [n, k, d]q code with an [n, k − k′, d + d′]q subcode. Use a generalization of the construction for Ce(v) to show that there exists an [n + n′, k, d + d′]q code.
3.1.4 Let C be a binary code with minimum distance d. Let d′ be the largest weight of any codeword of C. Suppose that the all-ones vector is not in C. Then the augmented code Ca has minimum distance min{d, n − d′}.
3.1.5 Let C be an Fq-linear code of length n. Let v ∈ Fq^n and S = {n + 1}. Suppose that the all-ones vector is a parity check of C but not of v. Show that (Cl(v))S = C.
3.1.6 Show that the shortened binary [7,3,4] code is a product code of codes
of length 2 and 3.
3.1.7 Let C be a nontrivial linear code of length n. Then C is the direct sum of two codes of lengths strictly smaller than n if and only if C = v ∗ C for some v ∈ Fq^n with nonzero entries that are not all the same.
3.1.8 Show that the punctured binary [7,3,4] code is equal to the (u|u + v) construction of a [3, 2, 2] code and a [3, 1, 3] code.
3.1.9 For binary vectors a, b and x,
wt(a + x|b + x|a + b + x) ≥ 2wt(a + b + a ∗ b) − wt(x),
with equality if and only if ai = 1 or bi = 1 or xi = 0 for all i, where a ∗ b =
(a1b1, . . . , anbn).
3.1.10 Give a parity check matrix for the direct sum, the (u|u + v), the (u +
v|u − v) and the (a + x|b + x|a + b − x) construction in terms of the parity
check matrices H1 and H2 of the codes C1 and C2, respectively.
3.1.11 Give proofs of Propositions 3.1.50 and 3.1.51.
3.1.12 Let Ci be an [n, ki, di] code over Fq for i = 1, 2, where q is a power of
3. Let k0 be the dimension of C1 ∩C2 and d3 the minimum distance of C1 +C2.
Show that the (a + x|b + x|a + b − x) construction with C1 and C2 gives a
[3n, 2k1 + k2 − k0, d] code with d ≥ min{2d1, 3d3}.
3.1.13 Show that C1 ∩ C2 = {0, 1} and C1 + C2 is the even weight code, for
the codes C1 and C2 of Example 3.1.53.
3.1.14 Show the existence of a binary [45,15,16] code.
3.1.15 Show the existence of a binary self-dual [72,36,12] code.
3.1.16 [CAS] Construct a binary random [100, 50] code and verify that the identities from Proposition 3.1.17 hold for different position sets: the last position, the last five positions, and a random set of five positions.
3.1.17 [CAS] Write procedures that take generator matrices G1 and G2 of the
codes C1 and C2 and return a matrix G that is the generator matrix of the code
C, which is the result of the
• (u + v|u − v)-construction of Proposition 3.1.47;
• (a + x|b + x|a + b − x)-construction of Proposition 3.1.50.
3.1.18 [CAS] Using the previous exercise construct the extended Golay code as in Example 3.1.53. Compare this code with the one returned by ExtendedBinaryGolayCode() (in GAP) and GolayCode(GF(2),true) (in Magma).
3.1.19 Show by means of an example that the concatenation of a [3, 2, 2] outer and a [2, 2, 1] inner code gives a [6, 4] code of minimum distance 2 or 3, depending on the choice of the basis of the extension field.
3.2 Bounds on codes
We have introduced some parameters of a linear code in the previous sections.
In coding theory one of the most basic problems is to find the best value of a
parameter when other parameters have been given. In this section, we discuss
some bounds on the code parameters.
3.2.1 Singleton bound and MDS codes
The following bound gives us the maximal minimum distance of a code with a
given length and dimension. This bound is called the Singleton bound.
Theorem 3.2.1 (The Singleton Bound) If C is an [n, k, d] code, then
d ≤ n − k + 1.
Proof. Let H be a parity check matrix of C. This is an (n − k) × n matrix of
row rank n − k. The minimum distance of C is the smallest integer d such that
H has d linearly dependent columns, by Proposition 2.3.11. This means that
every d − 1 columns of H are linearly independent. Hence, the column rank of
H is at least d − 1. By the fact that the column rank of a matrix is equal to the
row rank, we have n − k ≥ d − 1. This implies the Singleton bound.
Definition 3.2.2 Let C be an [n, k, d] code. If d = n − k + 1, then C is called
a maximum distance separable code or an MDS code, for short.
Remark 3.2.3 From the Singleton bound, a maximum distance separable code
achieves the maximum possible value for the minimum distance given the code
length and dimension.
Example 3.2.4 The minimum distance of the zero code of length n is n + 1, by definition. Hence the zero code has parameters [n, 0, n + 1] and is MDS. Its dual is the whole space Fq^n with parameters [n, n, 1] and is also MDS. The n-fold repetition code has parameters [n, 1, n] and its dual is an [n, n − 1, 2] code; both are MDS.
Proposition 3.2.5 Let C be an [n, k, d] code over Fq. Let G be a generator matrix and H a parity check matrix of C. Then the following statements are equivalent:
(1) C is an MDS code,
(2) every (n − k)-tuple of columns of the parity check matrix H is linearly independent,
(3) every k-tuple of columns of the generator matrix G is linearly independent.
Proof. As the minimum distance of C is d, any d − 1 columns of H are linearly independent, by Proposition 2.3.11. Now d ≤ n − k + 1 by the Singleton bound. So d = n − k + 1 if and only if every n − k columns of H are independent. Hence (1) and (2) are equivalent.
Now let us assume (3). Let c be an element of C which is zero at k given coordinates. Let c = xG for some x ∈ Fq^k. Let G′ be the square matrix consisting of the k columns of G corresponding to the k given zero coordinates of c. Then xG′ = 0. Hence x = 0, since the k columns of G′ are independent by assumption. So c = 0. This implies that the minimum distance of C is at least n − (k − 1) = n − k + 1. Therefore C is an [n, k, n − k + 1] MDS code, by the Singleton bound.
Conversely, assume that C is MDS. Let G′ be the square matrix consisting of k chosen columns of G. Let x ∈ Fq^k be such that xG′ = 0. Then c = xG is a codeword that is zero at the k chosen positions, so its weight is at most n − k. So c = 0, since the minimum distance is n − k + 1. Hence x = 0, since the rank of G is k. Therefore the k columns are independent.
Example 3.2.6 Consider the code C over F5 of length 5 and dimension 2 with generator matrix

G =
[ 1 1 1 1 1 ]
[ 0 1 2 3 4 ].

Note that while the first row of the generator matrix is the all 1's vector, the entries of the second row are mutually distinct. A codeword a(1, 1, 1, 1, 1) + b(0, 1, 2, 3, 4) with b ≠ 0 has five mutually distinct entries, so at most one of them is zero; and a codeword with b = 0 and a ≠ 0 has weight 5. Hence the minimum distance of C is at least 4. On the other hand, the second row is a word of weight 4. Hence C is a [5, 2, 4] MDS code. The matrix G is a parity check matrix for the dual code C⊥. All columns of G are nonzero, and every two columns are independent since

det
[ 1 1 ]
[ i j ]
= j − i ≠ 0

for all 0 ≤ i < j ≤ 4. Therefore, C⊥ is also an MDS code.
In fact, we have the following general result.
Corollary 3.2.7 The dual of an [n, k, n − k + 1] MDS code is an [n, n − k, k + 1] MDS code.

Proof. The trivial codes are MDS and are duals of each other by Example 3.2.4. Assume 0 < k < n. Let H be a parity check matrix of an [n, k, n − k + 1] MDS code C. Then any n − k columns of H are linearly independent, by (2) of Proposition 3.2.5. Now H is a generator matrix of the dual code. Therefore C⊥ is an [n, n − k, k + 1] MDS code, since (3) of Proposition 3.2.5 holds.
Definition 3.2.8 Let a be a vector of Fq^k. Then V(a) is the Vandermonde matrix with (i, j)-entry aj^{i−1}.

Lemma 3.2.9 Let a be a vector of Fq^k. Then

det V(a) = Π_{1≤r<s≤k} (as − ar).
Proof. This is left as an exercise.
Proposition 3.2.10 Let n ≤ q. Let a = (a1, . . . , an) be an n-tuple of mutually distinct elements of Fq. Let k be an integer such that 0 ≤ k ≤ n. Define the matrices Gk(a) and G′k(a) by

Gk(a) =
[ 1        · · ·  1        ]
[ a1       · · ·  an       ]
[ ...             ...      ]
[ a1^{k−1} · · ·  an^{k−1} ]

and G′k(a) =
[ 1        · · ·  1        0 ]
[ a1       · · ·  an       0 ]
[ ...             ...    ... ]
[ a1^{k−1} · · ·  an^{k−1} 1 ].

The codes with generator matrices Gk(a) and G′k(a) are MDS codes.

Proof. Consider a k × k submatrix of Gk(a). This is a Vandermonde matrix and its determinant is not zero by Lemma 3.2.9, since the ai are mutually distinct. So any system of k columns of Gk(a) is independent. Hence Gk(a) is the generator matrix of an MDS code by Proposition 3.2.5.
The proof for G′k(a) is similar and is left as an exercise.
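The MDS property of Gk(a) can be tested directly with Proposition 3.2.5: every k-tuple of columns must be independent over Fq. The following Python sketch does this for the prime field F7 (the rank computation by Gaussian elimination is our own; a computer algebra system would provide it natively):

import itertools

q, k, n = 7, 3, 7
a = list(range(n))                                   # distinct elements of F_7
G = [[pow(x, i, q) for x in a] for i in range(k)]    # rows 1, a_j, a_j^2

def rank_mod_p(rows, p):
    # Rank of a matrix over F_p by Gaussian elimination.
    M = [r[:] for r in rows]
    rank = 0
    for col in range(len(M[0])):
        piv = next((r for r in range(rank, len(M)) if M[r][col] % p), None)
        if piv is None:
            continue
        M[rank], M[piv] = M[piv], M[rank]
        inv = pow(M[rank][col], p - 2, p)            # inverse mod p, p prime
        M[rank] = [x * inv % p for x in M[rank]]
        for r in range(len(M)):
            if r != rank and M[r][col] % p:
                M[r] = [(x - M[r][col] * y) % p for x, y in zip(M[r], M[rank])]
        rank += 1
    return rank

print(all(rank_mod_p([[row[j] for j in cols] for row in G], q) == k
          for cols in itertools.combinations(range(n), k)))   # True: [7,3,5] MDS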
Remark 3.2.11 The codes defined in Proposition 3.2.10 are called generalized Reed-Solomon codes and are the prime examples of MDS codes. These codes will be treated in Section 8.1. The notion of an MDS code has a nice interpretation in terms of n points in general position in projective space, as we will see in Section 4.3.1. Proposition 3.2.10 shows the existence of MDS codes over Fq with parameters [n, k, n − k + 1] for all possible values of k and n such that 0 ≤ k ≤ n ≤ q + 1.
Example 3.2.12 Let q be a power of 2. Let n = q + 2 and a1, a2, . . . , aq be an enumeration of the elements of Fq. Consider the code C with generator matrix

[ 1     1     . . .  1     0  0 ]
[ a1    a2    . . .  aq    0  1 ]
[ a1^2  a2^2  . . .  aq^2  1  0 ].

Then any 3 columns of this matrix are independent: by Proposition 3.2.10, the only remaining nontrivial case to check is

det
[ 1     1     0 ]
[ ai    aj    1 ]
[ ai^2  aj^2  0 ]
= −(aj^2 − ai^2) = (ai − aj)^2 ≠ 0 in characteristic 2,

for all 1 ≤ i < j ≤ q. Hence C is a [q + 2, 3, q] code.
Remark 3.2.13 From (3) of Proposition 3.2.5 and Proposition 2.2.22 we see
that any k symbols of the codewords of an MDS code of dimension k may be
taken as message symbols. This is another reason for the name of maximum
distance separable codes.
Corollary 3.2.14 Let C be an [n, k, d] code. Then C is MDS if and only if for any given d coordinate positions i1, i2, . . . , id there is a minimum weight codeword with the set of these positions as support. Furthermore, two minimum weight codewords of an MDS code with the same support are a nonzero multiple of each other.

Proof. Let G be a generator matrix of C. Suppose d < n − k + 1. There exist k positions j1, j2, . . . , jk such that the columns of G at these positions are independent. The complement of these k positions consists of n − k elements and d ≤ n − k. Choose a subset {i1, i2, . . . , id} of d elements in this complement. Let c be a codeword with support contained in {i1, i2, . . . , id}. Then c is zero at the positions j1, j2, . . . , jk. Hence c = 0 and the support of c is empty. So no minimum weight codeword has this set of positions as support.
If C is MDS, then d = n − k + 1. Let {i1, i2, . . . , id} be a set of d coordinate positions. Then the complement of this set consists of k − 1 elements j1, j2, . . . , jk−1. Let jk = i1. Then j1, j2, . . . , jk are k elements that can be used for systematic encoding by Remark 3.2.13. So there is a unique codeword c such that cj = 0 for all j ∈ {j1, j2, . . . , jk−1} and cjk = 1. Hence c is a nonzero codeword of weight at most d and support contained in {i1, i2, . . . , id}. Therefore c is a codeword of weight d and support equal to {i1, i2, . . . , id}, since d is the minimum weight of the code.
Furthermore, let c′ be another codeword of weight d and support equal to {i1, i2, . . . , id}. Then c′j = 0 for all j ∈ {j1, j2, . . . , jk−1} and c′jk ≠ 0. Then c′ and c′jk·c are two codewords that coincide at j1, j2, . . . , jk. Hence c′ = c′jk·c.
Remark 3.2.15 It follows from Corollary 3.2.14 that the number of nonzero codewords of an [n, k] MDS code of minimum weight n − k + 1 is equal to

(q − 1) · (n choose n − k + 1).

In Section 4.1 we will introduce the weight distribution of a linear code. Using the above result the weight distribution of an MDS code can be completely determined. This will be done in Proposition 4.4.22.
Remark 3.2.16 Let C be an [n, k, n − k + 1] code. Then it is systematic at the first k positions. Hence C has a generator matrix of the form (Ik|A). It is left as an exercise to show that every square submatrix of A is nonsingular. The converse is also true.
Definition 3.2.17 Let a, b, r and s be vectors of Fq^k such that ai ≠ bj for all i, j. Then C(a, b) is the k × k Cauchy matrix with entries 1/(ai − bj), and C(a, b; r, s) is the k × k generalized Cauchy matrix with entries risj/(ai − bj). Now let n ≤ q, let a be an n-tuple of mutually distinct elements of Fq, and let k be an integer such that 0 ≤ k ≤ n. Let A(a) be the k × (n − k) matrix with entries 1/(aj+k − ai) for 1 ≤ i ≤ k, 1 ≤ j ≤ n − k. Then the Cauchy code Ck(a) is the code with generator matrix (Ik|A(a)). If r is an n-tuple with ri not zero for all i, then A(a, r) is the k × (n − k) matrix with entries

rj+k·ri^{−1}/(aj+k − ai) for 1 ≤ i ≤ k, 1 ≤ j ≤ n − k.

The generalized Cauchy code Ck(a, r) is the code with generator matrix (Ik|A(a, r)).
Lemma 3.2.18 Let a, b, r and s be vectors of Fq^k such that ai ≠ bj for all i, j. Then

det C(a, b; r, s) = (Π_{i=1}^{k} ri)(Π_{j=1}^{k} sj) · Π_{i<j}(ai − aj)(bj − bi) / Π_{i,j=1}^{k}(ai − bj).
Proof. This is left as an exercise.
Proposition 3.2.19 Let n ≤ q. Let a be an n-tuple of mutually distinct elements of Fq, and r an n-tuple of nonzero elements of Fq. Let k be an integer such that 0 ≤ k ≤ n. Then the generalized Cauchy code Ck(a, r) is an [n, k, n − k + 1] code.

Proof. Every square t × t submatrix of A(a, r) is a generalized Cauchy matrix of the form C((ai1, . . . , ait), (ak+j1, . . . , ak+jt); (ri1^{−1}, . . . , rit^{−1}), (rk+j1, . . . , rk+jt)). The determinant of this matrix is not zero by Lemma 3.2.18, since the entries of a are mutually distinct and the entries of r are not zero. Hence (Ik|A(a, r)) is the generator matrix of an MDS code by Remark 3.2.16.
In Section 8.1 it will be shown that generalized Reed-Solomon codes and Cauchy
codes are the same.
3.2.2 Griesmer bound
Clearly, the Singleton bound can be viewed as a lower bound on the code length
n with given dimension k and minimum distance d, that is n ≥ d + k − 1. In
this subsection, we will give another lower bound on the length.
Theorem 3.2.20 (The Griesmer Bound) If C is an [n, k, d] code with k > 0, then

n ≥ Σ_{i=0}^{k−1} ⌈d/q^i⌉.
Note that the Griesmer bound implies the Singleton bound. In fact, we have ⌈d/q^0⌉ = d and ⌈d/q^i⌉ ≥ 1 for i = 1, . . . , k − 1, from which the Singleton bound follows. In the previous Section 3.1 we introduced some methods to construct new codes from a given code. In the following, we give another construction of a new code, which will be used to prove Theorem 3.2.20.
Let C be an [n, k, d] code, and c a codeword with w = wt(c). Let I = supp(c) (see the definition in Subsection 2.1.2). The residual code of C with respect to c, denoted by Res(C, c), is the code of length n − w obtained by puncturing C on all the coordinates of I.
Proposition 3.2.21 Suppose C is an [n, k, d] code over Fq and c is a codeword of weight w < qd/(q − 1). Then Res(C, c) is an [n − w, k − 1, d′] code with

d′ ≥ d − w + ⌈w/q⌉.

Proof. By replacing C by an equivalent code we may assume without loss of generality that c = (1, 1, . . . , 1, 0, . . . , 0), where the first w components are equal to 1 and the other components are 0. Clearly, the dimension of Res(C, c) is less than or equal to k − 1. If the dimension is strictly less than k − 1, then there must be a nonzero codeword in C of the form x = (x1, . . . , xw, 0, . . . , 0), where not all the xi are the same. There exists α ∈ Fq such that at least ⌈w/q⌉ coordinates of (x1, . . . , xw) are equal to α. Thus,

d ≤ wt(x − αc) ≤ w − ⌈w/q⌉ ≤ w(q − 1)/q,

which contradicts the assumption on w. Hence dim Res(C, c) = k − 1.
Next, consider the minimum distance. Let (xw+1, . . . , xn) be any nonzero codeword in Res(C, c), and x = (x1, . . . , xw, xw+1, . . . , xn) a corresponding codeword in C. There exists α ∈ Fq such that at least ⌈w/q⌉ coordinates of (x1, . . . , xw) equal α. Therefore,

d ≤ wt(x − αc) ≤ w − ⌈w/q⌉ + wt((xw+1, . . . , xn)).

Thus every nonzero codeword of Res(C, c) has weight at least d − w + ⌈w/q⌉.
Proof of Theorem 3.2.20. We will prove the theorem by mathematical induction on k. If k = 1, the inequality that we want to prove is n ≥ d, which is obviously true. Now suppose k > 1. Let c be a codeword of weight d. Using Proposition 3.2.21, Res(C, c) is an [n − d, k − 1, d′] code with d′ ≥ ⌈d/q⌉. Applying the inductive assumption to Res(C, c), and using ⌈⌈d/q⌉/q^i⌉ = ⌈d/q^{i+1}⌉, we have

n − d ≥ Σ_{i=0}^{k−2} ⌈d′/q^i⌉ ≥ Σ_{i=0}^{k−2} ⌈d/q^{i+1}⌉.

The Griesmer bound follows.
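The Griesmer bound is immediate to evaluate numerically; a small Python sketch (our own):

from math import ceil

def griesmer_length(k, d, q):
    # Lower bound on the length n of an [n, k, d] code over F_q (Theorem 3.2.20).
    return sum(ceil(d / q**i) for i in range(k))

print(griesmer_length(4, 3, 2))   # 7: met by the binary [7,4,3] Hamming code
print(griesmer_length(3, 4, 2))   # 7: met by the binary [7,3,4] simplex code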
3.2.3 Hamming bound
In practical applications, given the length and the minimum distance, the codes which have more codewords (in other words, codes of larger size) are often preferred. A natural question is: what is the maximal possible size of a code, given the length and minimum distance? Denote by Aq(n, d) the maximum number of codewords in any code over Fq (which can be linear or nonlinear) of length n and minimum distance d. The maximum when restricted to linear codes is denoted by Bq(n, d). Clearly Bq(n, d) ≤ Aq(n, d). The following is a well-known upper bound for Aq(n, d).

Remark 3.2.22 Denote by Vq(n, t) the number of vectors in Bt(x), the ball of radius t around a given vector x ∈ Fq^n as defined in 2.1.12. Then

Vq(n, t) = Σ_{i=0}^{t} (n choose i)(q − 1)^i

by Proposition 2.1.13.
Theorem 3.2.23 (Hamming or sphere-packing bound)

Bq(n, d) ≤ Aq(n, d) ≤ q^n / Vq(n, t),

where t = ⌊(d − 1)/2⌋.

Proof. Let C be any code over Fq (which can be linear or nonlinear) of length n and minimum distance d. Denote by M the number of codewords of C. Since the distance between any two codewords is greater than or equal to d ≥ 2t + 1, the balls of radius t around the codewords must be disjoint. By Proposition 2.1.13, each of these M balls contains Vq(n, t) vectors. The total number of vectors in the space Fq^n is q^n. Thus, we have

M · Vq(n, t) ≤ q^n.

As C is an arbitrary code with length n and minimum distance d, we have established the theorem.
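Both Vq(n, t) and the resulting bound are easily computed; a Python sketch (our own):

from math import comb

def V(q, n, t):
    # Number of vectors in a ball of radius t in F_q^n (Remark 3.2.22).
    return sum(comb(n, i) * (q - 1)**i for i in range(t + 1))

def hamming_bound(q, n, d):
    t = (d - 1) // 2
    return q**n // V(q, n, t)

print(hamming_bound(2, 7, 3))    # 16 = 2^4: met by the [7,4,3] Hamming code
print(hamming_bound(3, 11, 5))   # 729 = 3^6: met by the ternary Golay code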
Definition 3.2.24 The covering radius ρ(C) of a code C of length n over Fq is defined to be the smallest integer ρ such that

∪_{c∈C} Bρ(c) = Fq^n,

that is, every vector of Fq^n is in the union of the balls of radius ρ around the codewords. A code of covering radius ρ is called perfect if the balls Bρ(c), c ∈ C, are mutually disjoint.
Theorem 3.2.25 (Sphere-covering bound) Let C be a code of length n with M codewords and covering radius ρ. Then

M · Vq(n, ρ) ≥ q^n.

Proof. By definition

∪_{c∈C} Bρ(c) = Fq^n.

Now |Bρ(c)| = Vq(n, ρ) for all c in C by Proposition 2.1.13. So M · Vq(n, ρ) ≥ q^n.
Example 3.2.26 If C = Fq^n, then the balls B0(c) = {c}, c ∈ C, cover Fq^n and are mutually disjoint. So Fq^n is perfect and has covering radius 0.
If C = {0}, then the ball Bn(0) covers Fq^n and there is only one codeword. Hence C is perfect and has covering radius n.
Therefore the trivial codes are perfect.
Remark 3.2.27 It is easy to see that

ρ(C) = max_{x∈Fq^n} min_{c∈C} d(x, c).

Let e(C) = ⌊(d(C) − 1)/2⌋. Then obviously e(C) ≤ ρ(C). Let C be a code of length n and minimum distance d with more than one codeword. Then C is a perfect code if and only if ρ(C) = e(C).
Proposition 3.2.28 The following codes are perfect:
(1) the trivial codes,
(2) the (2e + 1)-fold binary repetition code,
(3) the Hamming codes,
(4) the binary and ternary Golay codes.
Proof. (1) The trivial codes are perfect as shown in Example 3.2.26.
(2) The (2e + 1)-fold binary repetition code consists of two codewords, has minimum distance d = 2e + 1 and error-correcting capacity e. Now

2^{2e+1} = Σ_{i=0}^{2e+1} (2e+1 choose i) = Σ_{i=0}^{e} (2e+1 choose i) + Σ_{i=0}^{e} (2e+1 choose e+1+i)

and (2e+1 choose e+1+i) = (2e+1 choose e−i). So 2 Σ_{i=0}^{e} (2e+1 choose i) = 2^{2e+1}, that is, M · V2(2e + 1, e) = 2^{2e+1} for M = 2. Therefore the covering radius is e and the code is perfect.
(3) From Definition 2.3.13 and Proposition 2.3.14, the q-ary Hamming code Hr(q) is an [n, k, d] code with

n = (q^r − 1)/(q − 1), k = n − r, and d = 3.

For this code, t = 1, n = k + r, and the number of codewords is M = q^k. Thus,

M(1 + (q − 1)n) = M · q^r = q^{k+r} = q^n.

Therefore, Hr(q) is a perfect code.
(4) It is left to the reader to show that the binary and ternary Golay codes are perfect.
3.2.4 Plotkin bound
The Plotkin bound is an upper bound on Aq(n, d) which is valid when d is large enough compared with n.
Theorem 3.2.29 (Plotkin bound) Let C be an (n, M, d) code over Fq such that qd > (q − 1)n. Then

M ≤ qd / (qd − (q − 1)n).
Proof. We calculate the sum

S = Σ_{x∈C} Σ_{y∈C} d(x, y)

in two ways. First, since for any x, y ∈ C with x ≠ y the distance satisfies d(x, y) ≥ d, we have

S ≥ M(M − 1)d.

On the other hand, consider the M × n matrix whose rows are the codewords of C. For i = 1, . . . , n, let ni,α be the number of times α ∈ Fq occurs in column i of this matrix. Clearly, Σ_{α∈Fq} ni,α = M for every i. Now we have

S = Σ_{i=1}^{n} Σ_{α∈Fq} ni,α(M − ni,α) = nM^2 − Σ_{i=1}^{n} Σ_{α∈Fq} ni,α^2.

Using the Cauchy-Schwarz inequality,

Σ_{α∈Fq} ni,α^2 ≥ (1/q)(Σ_{α∈Fq} ni,α)^2 = M^2/q.

Thus,

S ≤ nM^2 − n·M^2/q = n(1 − 1/q)M^2.

Combining the two inequalities on S gives M(M − 1)d ≤ n(1 − 1/q)M^2, hence M(qd − (q − 1)n) ≤ qd, which proves the theorem.
Example 3.2.30 Consider the simplex code S3(3), that is, the dual code of the Hamming code H3(3) over F3 of Example 2.3.15. This is a [13, 3, 9] code which has M = 3^3 = 27 codewords. Every nonzero codeword in this code has Hamming weight 9, and d(x, y) = 9 for any distinct codewords x and y. Thus, qd = 27 > 26 = (q − 1)n. Since

qd / (qd − (q − 1)n) = 27 = M,

this code achieves the Plotkin bound.
Remark 3.2.31 A code in which all the nonzero codewords have the same weight is called a constant weight code; a code in which the distances between any two distinct codewords are the same is called an equidistant code. A linear code is a constant weight code if and only if it is an equidistant code. From the proof of Theorem 3.2.29, only constant weight and equidistant codes can achieve the Plotkin bound. So the simplex code Sr(q) achieves the Plotkin bound by Proposition 2.3.16.
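The bound itself is a one-line evaluation; applied to the simplex code of Example 3.2.30 (a Python sketch, our own):

def plotkin_bound(q, n, d):
    # Theorem 3.2.29; only valid when qd > (q-1)n.
    assert q * d > (q - 1) * n
    return (q * d) // (q * d - (q - 1) * n)

print(plotkin_bound(3, 13, 9))   # 27 = 3^3: S_3(3) attains the Plotkin bound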
Remark 3.2.32 ***Improved Plotkin Bound in the binary case.***
3.2.5 Gilbert and Varshamov bounds
The Hamming and Plotkin bounds give upper bounds for Aq(n, d) and Bq(n, d).
In this subsection, we discuss lower bounds for these numbers. Since Bq(n, d) ≤
Aq(n, d), each lower bound for Bq(n, d) is also a lower bound for Aq(n, d).
Theorem 3.2.33 (Gilbert bound)
logq (Aq(n, d)) ≥ n − logq (Vq(n, d − 1)) .
Proof. Let C be a code over Fq, not necessarily linear, of length n and minimum distance d, which has M = Aq(n, d) codewords. If

M · Vq(n, d − 1) < q^n,

then the union of the balls of radius d − 1 around all codewords in C is not equal to Fq^n by Proposition 2.1.13. Take x ∈ Fq^n outside this union. Then d(x, c) ≥ d for all c ∈ C. So C ∪ {x} is a code of length n with M + 1 codewords and minimum distance d. This contradicts the maximality of Aq(n, d). Hence

Aq(n, d) · Vq(n, d − 1) ≥ q^n.
By the following greedy algorithm one can construct a linear code of length n, minimum distance at least d, and dimension k, and therefore the number of codewords, as large as possible.
Theorem 3.2.34 Let n and d be integers satisfying 2 ≤ d ≤ n. If

k ≤ n − logq(1 + Vq(n − 1, d − 2)), (3.1)

then there exists an [n, k] code over Fq with minimum distance at least d.
Proof. Suppose k is an integer satisfying the inequality (3.1), which is equivalent to

Vq(n − 1, d − 2) < q^{n−k}. (3.2)

We construct by induction the columns h1, . . . , hn ∈ Fq^{n−k} of an (n − k) × n matrix H over Fq such that every d − 1 columns of H are linearly independent. Choose for h1 any nonzero vector. Suppose that j < n and h1, . . . , hj are chosen such that any d − 1 of them are linearly independent. Choose hj+1 such that hj+1 is not a linear combination of any d − 2 or fewer of the vectors h1, . . . , hj.
The above procedure is a greedy algorithm. We now prove the correctness of the algorithm by induction on j. When j = 1, it is trivial that there exists a nonzero vector h1. Suppose that j < n and any d − 1 of h1, . . . , hj are linearly independent. The number of different linear combinations of d − 2 or fewer of the h1, . . . , hj is at most

Σ_{i=0}^{d−2} (j choose i)(q − 1)^i ≤ Σ_{i=0}^{d−2} (n−1 choose i)(q − 1)^i = Vq(n − 1, d − 2).

Hence under the condition (3.2) there always exists a vector hj+1 which is not a linear combination of d − 2 or fewer of h1, . . . , hj.
By induction, we find h1, . . . , hn such that hj is not a linear combination of any d − 2 or fewer of the vectors h1, . . . , hj−1. Hence every d − 1 of h1, . . . , hn are linearly independent.
The null space of H is a code C of dimension at least k and minimum distance at least d by Proposition 2.3.11. Let C′ be a subcode of C of dimension k. Then the minimum distance of C′ is at least d.
Corollary 3.2.35 (Varshamov bound)

logq Bq(n, d) ≥ ⌊n − logq(1 + Vq(n − 1, d − 2))⌋.

Proof. The largest integer k satisfying (3.1) of Theorem 3.2.34 is given by the right hand side of the inequality.
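The greedy algorithm in the proof of Theorem 3.2.34 is easy to run for small parameters. The following Python sketch is our own specialization to q = 2; it encodes columns of F_2^{n−k} as integer bit masks, so that XOR is vector addition, and condition (3.2) guarantees that a usable column always exists.

from itertools import combinations

def varshamov_greedy(n, k, d):
    # Build n columns in F_2^(n-k) such that no d-1 of them are dependent.
    r = n - k
    cols = []
    for _ in range(n):
        # All sums of at most d-2 already chosen columns (0 is the empty sum).
        bad = {0}
        for t in range(1, d - 1):
            for sub in combinations(cols, t):
                s = 0
                for c in sub:
                    s ^= c
                bad.add(s)
        cols.append(next(x for x in range(1, 2**r) if x not in bad))
    return cols

print(varshamov_greedy(7, 4, 3))   # [1, 2, 3, 4, 5, 6, 7]

For n = 7, k = 4, d = 3 the greedy choice returns all seven nonzero vectors of F_2^3, that is, a parity check matrix of the [7,4,3] Hamming code.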
In the next subsection, we will see that the Gilbert bound and the Varshamov bound are asymptotically the same. In the literature, sometimes either of them is called the Gilbert-Varshamov bound. The resulting asymptotic bound is called the asymptotic Gilbert-Varshamov bound.
3.2.6 Exercises
3.2.1 Show that for an arbitrary code, possibly nonlinear, of length n over an alphabet with q elements, with M codewords and minimum distance d, the following form of the Singleton bound holds: M ≤ q^{n+1−d}.
3.2.2 Let C be an [n, k] code. Let d⊥ be the minimum distance of C⊥. Show that d⊥ ≤ k + 1, and that equality holds if and only if C is MDS.
3.2.3 Give a proof of the formula in Lemma 3.2.9 for the determinant of a Vandermonde matrix.
3.2.4 Prove that the code with generator matrix G′k(a) in Proposition 3.2.10 is MDS.
3.2.5 Let C be an [n, k, d] code over Fq. Prove that the number of codewords of minimum weight d is divisible by q − 1 and is at most equal to (q − 1)(n choose d). Show that C is MDS in case equality holds.
3.2.6 Give a proof of Remark 3.2.16.
3.2.7 Give a proof of the formula in Lemma 3.2.18 for the determinant of a Cauchy matrix.
3.2.8 Let C be a binary MDS code. Show that if C is not trivial, then it is a repetition code or an even weight code.
3.2.9 [20] ***Show that the code C1 in Proposition 3.2.10 is self-orthogonal if
n = q and k ≤ n/2. Self-dual ***
3.2.10 [CAS] Take q = 256 in Proposition 3.2.10 and construct the matrices G10(a) and G′10(a). Construct the corresponding codes with these matrices as generator matrices. Show that these codes are MDS by using the commands IsMDSCode in GAP and IsMDS in Magma.
3.2.11 Give a proof of the statements made in Remark 3.2.27.
3.2.12 Show that the binary and ternary Golay codes are perfect.
3.2.13 Let C be the binary [7, 4, 3] Hamming code. Let D be the F4 linear
code with the same generator matrix as C. Show that ρ(C) = 2 and ρ(D) = 3.
3.2.14 Let C be an [n, k] code. Let H be a parity check matrix of C. Show that ρ(C) is the minimal number ρ such that x^T is a linear combination of at most ρ columns of H for every x ∈ Fq^{n−k}. Derive the redundancy bound ρ(C) ≤ n − k.
3.2.15 Give an estimate of the complexity of finding a code satisfying (3.1) of
Theorem 3.2.34 by the greedy algorithm.
3.3 Asymptotically good codes
***
3.3.1 Asymptotic Gilbert-Varshamov bound
In practical applications, sometimes long codes are preferred. For an infinite family of codes, a measure of the goodness of the family is whether it is so-called asymptotically good.
Definition 3.3.1 An infinite sequence C = {Ci}_{i=1}^{∞} of codes Ci with parameters [ni, ki, di] is called asymptotically good if lim_{i→∞} ni = ∞ and

R(C) = lim inf_{i→∞} ki/ni > 0 and δ(C) = lim inf_{i→∞} di/ni > 0.
Using the bounds that we introduced in the previous subsection, we will prove
the existence of asymptotically good codes.
Definition 3.3.2 Define the q-ary entropy function Hq on [0, (q − 1)/q] by

Hq(x) = x logq(q − 1) − x logq x − (1 − x) logq(1 − x), if 0 < x ≤ (q − 1)/q,
Hq(x) = 0, if x = 0.

The function Hq(x) is increasing on [0, (q − 1)/q]. The function H2(x) is the entropy function.
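For numerical experiments the entropy function is a few lines of Python (a sketch, where log(x, q) is the logarithm to base q):

from math import log

def Hq(x, q=2):
    # q-ary entropy function of Definition 3.3.2 on [0, (q-1)/q].
    if x == 0:
        return 0.0
    return x * log(q - 1, q) - x * log(x, q) - (1 - x) * log(1 - x, q)

print(Hq(0.5))        # 1.0, the binary entropy at 1/2
print(1 - Hq(0.11))   # about 0.50: the Gilbert-Varshamov rate at delta = 0.11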
Lemma 3.3.3 Let q ≥ 2 and 0 ≤ θ ≤ (q − 1)/q. Then

lim_{n→∞} (1/n) logq Vq(n, ⌊θn⌋) = Hq(θ).
Proof. Since θn − 1 < ⌊θn⌋ ≤ θn, we have

lim_{n→∞} (1/n)⌊θn⌋ = θ and lim_{n→∞} (1/n) logq(1 + ⌊θn⌋) = 0. (3.3)

Now we are going to prove the following equality:

lim_{n→∞} (1/n) logq (n choose ⌊θn⌋) = −θ logq θ − (1 − θ) logq(1 − θ). (3.4)

To this end we introduce the little-o notation and use the Stirling formula

log n! = (n + 1/2) log n − n + (1/2) log(2π) + o(1), (n → ∞).

For two functions f(n) and g(n), f(n) = o(g(n)) means that for all c > 0 there exists some k > 0 such that 0 ≤ f(n) < cg(n) for all n ≥ k. The value of k must not depend on n, but may depend on c. Thus, o(1) is a function of n which tends to 0 when n → ∞. By the Stirling formula, we have

(1/n) logq (n choose ⌊θn⌋) = (1/n)(logq n! − logq ⌊θn⌋! − logq(n − ⌊θn⌋)!)
= logq n − θ logq ⌊θn⌋ − (1 − θ) logq(n − ⌊θn⌋) + o(1)
= −θ logq θ − (1 − θ) logq(1 − θ) + o(1).

Thus (3.4) follows. From the definition we have

(n choose ⌊θn⌋)(q − 1)^{⌊θn⌋} ≤ Vq(n, ⌊θn⌋) ≤ (1 + ⌊θn⌋)(n choose ⌊θn⌋)(q − 1)^{⌊θn⌋}. (3.5)

From the right-hand part of (3.5) we have

logq Vq(n, ⌊θn⌋) ≤ logq(1 + ⌊θn⌋) + logq (n choose ⌊θn⌋) + ⌊θn⌋ logq(q − 1).

By (3.3) and (3.4), we have

lim_{n→∞} (1/n) logq Vq(n, ⌊θn⌋) ≤ θ logq(q − 1) − θ logq θ − (1 − θ) logq(1 − θ). (3.6)

The right hand side is equal to Hq(θ) by definition. Similarly, using the left-hand part of (3.5) we prove

lim_{n→∞} (1/n) logq Vq(n, ⌊θn⌋) ≥ Hq(θ). (3.7)

Combining (3.6) and (3.7), we obtain the result.
Now we are ready to prove the existence of asymptotically good codes. Specifically, we have the following stronger result.

Theorem 3.3.4 Let 0 < θ < (q − 1)/q. Then there exists an asymptotically good sequence C of codes such that δ(C) = θ and R(C) = 1 − Hq(θ).
Proof. Let 0 < θ < (q − 1)/q. Let {ni}_{i=1}^{∞} be a sequence of positive integers with lim_{i→∞} ni = ∞; for example, we can take ni = i. Let di = ⌊θni⌋ and

ki = ⌊ni − logq(1 + Vq(ni − 1, di − 2))⌋.

By Theorem 3.2.34 and the Varshamov bound, there exists a sequence C = {Ci}_{i=1}^{∞} of [ni, ki, di] codes Ci over Fq.
Clearly δ(C) = θ > 0 for this sequence of q-ary codes. We now prove R(C) = 1 − Hq(θ). To this end, we first use Lemma 3.3.3 to prove the following equation:

lim_{i→∞} (1/ni) logq(1 + Vq(ni − 1, di − 2)) = Hq(θ). (3.8)

First, we have

1 + Vq(ni − 1, di − 2) ≤ Vq(ni, di).

By Lemma 3.3.3, we have

lim sup_{i→∞} (1/ni) logq(1 + Vq(ni − 1, di − 2)) ≤ lim_{i→∞} (1/ni) logq Vq(ni, di) = Hq(θ). (3.9)

Let δ = max{1, ⌈3/θ⌉}, mi = ni − δ and ei = ⌊θmi⌋. Then

di − 2 = ⌊θni⌋ − 2 > θni − 3 ≥ θ(ni − δ) = θmi ≥ ei

and ni − 1 ≥ ni − δ = mi. Therefore

(1/ni) logq(1 + Vq(ni − 1, di − 2)) ≥ (1/(mi + δ)) logq Vq(mi, ei) = (1/mi) logq Vq(mi, ei) · mi/(mi + δ).

Since δ is a constant and mi → ∞, we have lim_{i→∞} mi/(mi + δ) = 1. Again by Lemma 3.3.3, the right hand side of the above inequality tends to Hq(θ). It follows that

lim inf_{i→∞} (1/ni) logq(1 + Vq(ni − 1, di − 2)) ≥ Hq(θ). (3.10)

By inequalities (3.9) and (3.10), we obtain (3.8).
Now by (3.8), we have

R(C) = lim_{i→∞} ki/ni = 1 − lim_{i→∞} (1/ni) logq(1 + Vq(ni − 1, di − 2)) = 1 − Hq(θ),

and 1 − Hq(θ) > 0, since θ < (q − 1)/q.
So the sequence C of codes satisfying Theorem 3.3.4 is asymptotically good.
However, asymptotically good codes are not necessarily codes satisfying the
conditions in Theorem 3.3.4.
The number of codewords increases exponentially with the code length. So for large n, instead of Aq(n, d) the following parameter is used:

α(θ) = lim sup_{n→∞} (1/n) logq Aq(n, θn).
Since Aq(n, θn) ≥ Bq(n, θn) and for a linear code C the dimension is k = logq |C|, a straightforward consequence of Theorem 3.3.4 is the following asymptotic bound.

Corollary 3.3.5 (Asymptotic Gilbert-Varshamov bound) Let 0 ≤ θ ≤ (q − 1)/q. Then

α(θ) ≥ 1 − Hq(θ).
Note that both the Gilbert and the Varshamov bound that we introduced in the previous subsection imply the asymptotic Gilbert-Varshamov bound.
***Manin αq(δ) is a decreasing continuous function. picture ***
3.3.2 Some results for the generic case
In this section we investigate the parameters of "generic" codes. It turns out that almost all codes have the same minimum distance and covering radius when the length n and dimension k = nR, 0 < R < 1, are fixed. By "almost all" we mean that as n tends to infinity, the fraction of [n, nR] codes that do not have the "generic" minimum distance and covering radius tends to 0.
Theorem 3.3.6 Let 0 < R < 1. Then almost all [n, nR] codes over Fq have
• minimum distance d0 := n·Hq^{−1}(1 − R) + o(n),
• covering radius d0(1 + o(1)).
Here Hq is the q-ary entropy function.
Theorem 3.3.7 *** it gives a number of codewords that project on a given
k-set. Handbook of Coding theory, p.691. ***
3.3.3 Exercises
***???***
3.4 Notes
Puncturing and shortening at arbitrary sets of positions and the duality theorem are from Simonis [?].
Golay code, Turyn [?] construction, Pless handbook [?].
MacWilliams
In 1973 J. H. van Lint and A. Tietäväinen proved a theorem regarding perfect codes:
***–puncturing gives the binary [23,12,7] Golay code, which is cyclic.
–automorphism group of (extended) Golay code.
– (extended) ternary Golay code.
– designs and Golay codes.
– lattices and Golay codes.***
***repeated decoding of product code (Hoeholdt-Justesen).
***Singleton defect s(C) = n + 1 − k − d.
s(C) ≥ 0 and equality holds if and only if C is MDS.
s(C) = 0 if and only if s(C⊥) = 0.
Example where s(C) = 1 and s(C⊥) > 1.
Almost MDS and near MDS.
Genus g = max{s(C), s(C⊥)} in 4.1. If k ≥ 2, then d ≤ q(s + 1). If k ≥ 3 and d = q(s + 1), then s + 1 ≤ q.
Faldum-Willems, de Boer, Dodunekov-Langev,
relation with Griesmer bound***
Chapter 4
Weight enumerator
Relinde Jurrius, Ruud Pellikaan and Xin-Wen Wu
***
The weight enumerator of a code is introduced and a random coding argument
gives a proof of Shannon’s theorem.
4.1 Weight enumerator
Apart from the minimum Hamming weight, a code has other important invari-
ants. In this section, we will introduce the weight spectrum and the generalized
weight spectrum of a code.
***applications***
4.1.1 Weight spectrum
The weight spectrum of a code is an important invariant, which provides useful
information for both the code structure and practical applications of the code.
Definition 4.1.1 Let C be a code of length n. The weight spectrum, also called the weight distribution, is the set
\[
\{(w, A_w) \mid w = 0, 1, \ldots, n\},
\]
where $A_w$ denotes the number of codewords in C of weight w.
The so-called weight enumerator is a convenient representation of the weight spectrum.

Definition 4.1.2 The weight enumerator of C is defined as the polynomial
\[
W_C(Z) = \sum_{w=0}^{n} A_w Z^w.
\]
The homogeneous weight enumerator of C is defined as
\[
W_C(X, Y) = \sum_{w=0}^{n} A_w X^{n-w} Y^w.
\]
Remark 4.1.3 Note that $W_C(Z)$ and $W_C(X, Y)$ are equivalent in representing the weight spectrum. They determine each other uniquely by the equations
\[
W_C(Z) = W_C(1, Z)
\]
and
\[
W_C(X, Y) = X^n W_C(X^{-1} Y).
\]
Given the weight enumerator or the homogeneous weight enumerator, the weight spectrum is determined completely by the coefficients.
Clearly, the weight enumerator and the homogeneous weight enumerator can also be written as
\[
W_C(Z) = \sum_{c \in C} Z^{\mathrm{wt}(c)} \tag{4.1}
\]
and
\[
W_C(X, Y) = \sum_{c \in C} X^{n-\mathrm{wt}(c)} Y^{\mathrm{wt}(c)}. \tag{4.2}
\]
Example 4.1.4 The zero code has one codeword, and its weight is zero. Hence the homogeneous weight enumerator of this code is $W_{\{0\}}(X, Y) = X^n$. The number of words of weight w in the trivial code $\mathbb{F}_q^n$ is $A_w = \binom{n}{w}(q-1)^w$. So
\[
W_{\mathbb{F}_q^n}(X, Y) = \sum_{w=0}^{n} \binom{n}{w} (q-1)^w X^{n-w} Y^w = (X + (q-1)Y)^n.
\]
Example 4.1.5 The n-fold repetition code C has homogeneous weight enumerator
\[
W_C(X, Y) = X^n + (q-1)Y^n.
\]
In the binary case its dual is the even weight code. Hence it has homogeneous weight enumerator
\[
W_{C^\perp}(X, Y) = \sum_{t=0}^{\lfloor n/2 \rfloor} \binom{n}{2t} X^{n-2t} Y^{2t} = \frac{1}{2}\bigl((X+Y)^n + (X-Y)^n\bigr).
\]
Example 4.1.6 The nonzero entries of the weight distribution of the [7,4,3]
binary Hamming code are given by A0 = 1, A3 = 7, A4 = 7, A7 = 1, as is seen
by inspecting the weights of all 16 codewords. Hence its homogeneous weight enumerator is
\[
X^7 + 7X^4 Y^3 + 7X^3 Y^4 + Y^7.
\]
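The inspection of all $2^4 = 16$ codewords can be mechanized. The sketch below brute-forces the weight spectrum from a generator matrix; the matrix used is a standard systematic choice for the [7, 4, 3] Hamming code and may differ from, but is equivalent to, the one of Example 2.2.14.

```python
from itertools import product

G = [[1, 0, 0, 0, 0, 1, 1],     # a generator matrix of the binary
     [0, 1, 0, 0, 1, 0, 1],     # [7,4,3] Hamming code
     [0, 0, 1, 0, 1, 1, 0],
     [0, 0, 0, 1, 1, 1, 1]]

A = [0] * 8
for x in product((0, 1), repeat=4):
    c = [sum(x[i] * G[i][j] for i in range(4)) % 2 for j in range(7)]
    A[sum(c)] += 1
print(A)   # [1, 0, 0, 7, 7, 0, 0, 1], i.e. A_0=1, A_3=7, A_4=7, A_7=1
```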
Example 4.1.7 The simplex code $S_r(q)$ is a constant weight code by Proposition 2.3.16 with parameters $[(q^r-1)/(q-1),\, r,\, q^{r-1}]$. Hence its homogeneous weight enumerator is
\[
W_{S_r(q)}(X, Y) = X^n + (q^r - 1) X^{n - q^{r-1}} Y^{q^{r-1}}.
\]
Remark 4.1.8 Let C be a linear code. Then $A_0 = 1$ and the minimum distance d(C), which is equal to the minimum weight, is determined by the weight enumerator as follows:
\[
d(C) = \min\{\, i \mid A_i \neq 0,\ i > 0 \,\}.
\]
The weight enumerator also determines the dimension k(C), since
\[
W_C(1, 1) = \sum_{w=0}^{n} A_w = q^{k(C)}.
\]
Example 4.1.9 The Hamming code over $\mathbb{F}_q$ of length $n = (q^r-1)/(q-1)$ has parameters $[n, n-r, 3]$ and is perfect with covering radius 1 by Proposition 3.2.28. The following recurrence relation holds for the weight distribution $(A_0, A_1, \ldots, A_n)$ of these codes:
\[
\binom{n}{w}(q-1)^w = A_{w-1}(n-w+1)(q-1) + A_w(1 + w(q-2)) + A_{w+1}(w+1)
\]
for all w. This is seen as follows. Every word y of weight w is at distance at
most 1 to a unique codeword c, and such a codeword has possible weights w−1,
w or w + 1.
Let c be a codeword of weight w −1, then there are n−w +1 possible positions
j in the complement of the support of c where cj = 0 could be changed into a
nonzero element in order to get the word y of weight w.
Similarly, let c be a codeword of weight w, then either y = c or there are w
possible positions j in the support of c where cj could be changed into another
nonzero element to get y.
Finally, let c be a codeword of weight w + 1, then there are w + 1 possible
positions j in the support of c where cj could be changed into zero to get y.
Multiply the recurrence relation by $Z^w$ and sum over w. Let $W(Z) = \sum_w A_w Z^w$. Then
\[
(1+(q-1)Z)^n = (q-1)nZW(Z) - (q-1)Z^2 W'(Z) + W(Z) + (q-2)ZW'(Z) + W'(Z),
\]
since
\[
\sum_w \binom{n}{w}(q-1)^w Z^w = (1+(q-1)Z)^n, \qquad \sum_w (w+1)A_{w+1} Z^w = W'(Z),
\]
\[
\sum_w w A_w Z^w = ZW'(Z), \qquad \sum_w (w-1)A_{w-1} Z^w = Z^2 W'(Z).
\]
Therefore W(Z) satisfies the following ordinary first order differential equation:
\[
\bigl((q-1)Z^2 - (q-2)Z - 1\bigr)W'(Z) - (1 + (q-1)nZ)W(Z) + (1+(q-1)Z)^n = 0.
\]
The corresponding homogeneous differential equation is separable:
\[
\frac{W'(Z)}{W(Z)} = \frac{1 + (q-1)nZ}{(q-1)Z^2 - (q-2)Z - 1}
\]
and has general solution
\[
W_h(Z) = C(Z-1)^{q^{r-1}}\bigl((q-1)Z+1\bigr)^{n-q^{r-1}},
\]
where C is some constant. A particular solution is given by
\[
P(Z) = \frac{1}{q^r}(1+(q-1)Z)^n.
\]
Therefore the solution that satisfies W(0) = 1 is equal to
\[
W(Z) = \frac{1}{q^r}(1+(q-1)Z)^n + \frac{q^r-1}{q^r}(Z-1)^{q^{r-1}}\bigl((q-1)Z+1\bigr)^{n-q^{r-1}}.
\]
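As a sanity check, the closed formula can be expanded with exact integer arithmetic; for q = 2 and r = 3 it must reproduce the weight distribution of the [7, 4, 3] Hamming code of Example 4.1.6. A minimal sketch (the polynomial helpers are our own, with coefficients stored in ascending order):

```python
def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_pow(p, e):
    out = [1]
    for _ in range(e):
        out = poly_mul(out, p)
    return out

q, r = 2, 3
n = (q**r - 1) // (q - 1)                     # n = 7
# q^r W(Z) = (1+(q-1)Z)^n + (q^r-1) (Z-1)^{q^{r-1}} ((q-1)Z+1)^{n-q^{r-1}}
t1 = poly_pow([1, q - 1], n)
t2 = poly_mul(poly_pow([-1, 1], q**(r - 1)),
              poly_pow([1, q - 1], n - q**(r - 1)))
W = [(t1[i] + (q**r - 1) * t2[i]) // q**r for i in range(n + 1)]
print(W)   # [1, 0, 0, 7, 7, 0, 0, 1]
```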
To prove that the weight enumerator of a perfect code is completely determined
by its parameters we need the following lemma.
Lemma 4.1.10 The number $N_q(n, v, w, s)$ of words in $\mathbb{F}_q^n$ of weight w that are at distance s from a given word of weight v does not depend on the chosen word and is equal to
\[
N_q(n, v, w, s) = \sum_{\substack{i+j+k=s \\ v+k-j=w}} \binom{n-v}{k}\binom{v}{i}\binom{v-i}{j}(q-2)^i (q-1)^k.
\]
Proof. Consider a given word x of weight v. Let y be a word of weight w and distance s to x. Suppose that y has k nonzero coordinates in the complement of the support of x, j zero coordinates in the support of x, and i nonzero coordinates in the support of x that are distinct from the corresponding coordinates of x. Then $s = d(x, y) = i + j + k$ and $\mathrm{wt}(y) = w = v + k - j$.
There are $\binom{n-v}{k}$ possible subsets of k elements in the complement of the support of x, and there are $(q-1)^k$ possible choices for the nonzero symbols at the corresponding k coordinates.
There are $\binom{v}{i}$ possible subsets of i elements in the support of x, and there are $(q-2)^i$ possible choices of the symbols at those i positions that are distinct from the coordinates of x.
There are $\binom{v-i}{j}$ possible subsets of j elements in the support of x that are zero at those positions. Therefore
\[
N_q(n, v, w, s) = \sum_{\substack{i+j+k=s \\ v+k-j=w}} \binom{n-v}{k}(q-1)^k \binom{v}{i}(q-2)^i \binom{v-i}{j}.
\]
Remark 4.1.11 Let us consider special values of $N_q(n, v, w, s)$. If s = 0, then $N_q(n, v, w, 0) = 1$ if v = w and $N_q(n, v, w, 0) = 0$ otherwise. If s = 1, then
\[
N_q(n, v, w, 1) =
\begin{cases}
(n-w+1)(q-1) & \text{if } v = w-1, \\
w(q-2) & \text{if } v = w, \\
w+1 & \text{if } v = w+1, \\
0 & \text{otherwise.}
\end{cases}
\]
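Lemma 4.1.10 translates directly into code. The following sketch (the function name `N` is our own) evaluates $N_q(n, v, w, s)$ and confirms the special values for s = 1 listed above, here for q = 3 and n = 11:

```python
from math import comb

def N(q, n, v, w, s):
    """Number of words of weight w at distance s from a word of weight v."""
    total = 0
    for i in range(s + 1):
        for j in range(s - i + 1):
            k = s - i - j
            if v + k - j == w and i + j <= v and k <= n - v:
                total += (comb(n - v, k) * comb(v, i) * comb(v - i, j)
                          * (q - 2)**i * (q - 1)**k)
    return total

q, n, w = 3, 11, 5
print(N(q, n, w - 1, w, 1), (n - w + 1) * (q - 1))   # 14 14
print(N(q, n, w,     w, 1), w * (q - 2))             # 5 5
print(N(q, n, w + 1, w, 1), w + 1)                   # 6 6
```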
Proposition 4.1.12 Let C be a perfect code of length n with covering radius ρ and weight distribution $(A_0, A_1, \ldots, A_n)$. Then
\[
\binom{n}{w}(q-1)^w = \sum_{v=w-\rho}^{w+\rho} A_v \sum_{s=|v-w|}^{\rho} N_q(n, v, w, s) \quad \text{for all } w.
\]
Proof. Define the set
\[
N(w, \rho) = \{\, (y, c) \mid y \in \mathbb{F}_q^n,\ \mathrm{wt}(y) = w,\ c \in C,\ d(y, c) \le \rho \,\}.
\]
(1) For every y in $\mathbb{F}_q^n$ of weight w there is a unique codeword c in C that has distance at most ρ to y, since C is perfect with covering radius ρ. Hence
\[
|N(w, \rho)| = \binom{n}{w}(q-1)^w.
\]
(2) On the other hand, consider the fibre of the projection on the second factor:
\[
N(c, w, \rho) = \{\, y \in \mathbb{F}_q^n \mid \mathrm{wt}(y) = w,\ d(y, c) \le \rho \,\}
\]
for a given codeword c in C. If c has weight v, then
\[
|N(c, w, \rho)| = \sum_{s=0}^{\rho} N_q(n, v, w, s).
\]
Hence
\[
|N(w, \rho)| = \sum_{v=0}^{n} A_v \sum_{s=0}^{\rho} N_q(n, v, w, s).
\]
Notice that $|\mathrm{wt}(x) - \mathrm{wt}(y)| \le d(x, y)$. Hence $N_q(n, v, w, s) = 0$ if $|v - w| > s$. Combining (1) and (2) gives the desired result.
Example 4.1.13 The ternary Golay code has parameters [11, 6, 5] and is perfect with covering radius 2 by Proposition 3.2.28. We leave it as an exercise to show by means of the recursive relations of Proposition 4.1.12 that the weight enumerator of this code is given by
\[
1 + 132Z^5 + 132Z^6 + 330Z^8 + 110Z^9 + 24Z^{11}.
\]
Example 4.1.14 The binary Golay code has parameters [23, 12, 7] and is perfect with covering radius 3 by Proposition 3.2.28. We leave it as an exercise to show by means of the recursive relations of Proposition 4.1.12 that the weight enumerator of this code is given by
\[
1 + 253Z^7 + 506Z^8 + 1288Z^{11} + 1288Z^{12} + 506Z^{15} + 253Z^{16} + Z^{23}.
\]
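The identity of Proposition 4.1.12 can also be used to check a claimed weight distribution. The sketch below, reusing the function `N` from the sketch after Remark 4.1.11, verifies the stated enumerator of the ternary Golay code; the binary case is handled the same way.

```python
from math import comb
# Reuses N(q, n, v, w, s) from the sketch after Remark 4.1.11.

def check_perfect(q, n, rho, A):
    """Check the identity of Proposition 4.1.12 for all w."""
    for w in range(n + 1):
        lhs = comb(n, w) * (q - 1)**w
        rhs = sum(A[v] * sum(N(q, n, v, w, s) for s in range(rho + 1))
                  for v in range(n + 1))
        assert lhs == rhs, (w, lhs, rhs)
    print("all", n + 1, "identities hold")

A_golay3 = [1, 0, 0, 0, 0, 132, 132, 0, 330, 110, 0, 24]
check_perfect(3, 11, 2, A_golay3)
```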
4.1.2 Average weight enumerator
Remark 4.1.15 The computation of the weight enumerator of a given code is
most of the time hard. For the perfect codes such as the Hamming codes and the
binary and ternary Golay codes this is left as exercises to the reader and can be
done by using Proposition 4.1.12. In Proposition 4.4.22 the weight distribution
of MDS codes is treated. The weight enumerator of only a few infinite families
of codes is known. On the other hand the average weight enumerator of a class
of codes is very often easy to determine.
Definition 4.1.16 Let $\mathcal{C}$ be a nonempty class of codes over $\mathbb{F}_q$ of the same length. The average weight enumerator of $\mathcal{C}$ is defined as the average of all $W_C$ with $C \in \mathcal{C}$:
\[
W_{\mathcal{C}}(Z) = \frac{1}{|\mathcal{C}|} \sum_{C \in \mathcal{C}} W_C(Z),
\]
and similarly for the homogeneous average weight enumerator $W_{\mathcal{C}}(X, Y)$ of this class.
Definition 4.1.17 A class $\mathcal{C}$ of [n, k] codes over $\mathbb{F}_q$ is called balanced if there is a number $N(\mathcal{C})$ such that
\[
N(\mathcal{C}) = |\{\, C \in \mathcal{C} \mid y \in C \,\}|
\]
for every nonzero word y in $\mathbb{F}_q^n$.
Example 4.1.18 The prime example of a class of balanced codes is the set
C[n, k]q of all [n, k] codes over Fq. ***Other examples are:***
Lemma 4.1.19 Let $\mathcal{C}$ be a balanced class of [n, k] codes over $\mathbb{F}_q$. Then
\[
N(\mathcal{C}) = |\mathcal{C}|\,\frac{q^k - 1}{q^n - 1}.
\]
Proof. Compute the number of elements of the set of pairs
\[
\{\, (y, C) \mid y \neq 0,\ y \in C \in \mathcal{C} \,\}
\]
in two ways. In the first place by keeping a nonzero y in $\mathbb{F}_q^n$ fixed, and letting C vary in $\mathcal{C}$ such that $y \in C$. This gives the number $(q^n - 1)N(\mathcal{C})$, since $\mathcal{C}$ is balanced. Secondly by keeping C in $\mathcal{C}$ fixed, and letting the nonzero y in C vary. This gives the number $|\mathcal{C}|(q^k - 1)$. This gives the desired result, since both numbers are the same.
Proposition 4.1.20 Let f be a function on $\mathbb{F}_q^n$ with values in a complex vector space. Let $\mathcal{C}$ be a balanced class of [n, k] codes over $\mathbb{F}_q$. Then
\[
\frac{1}{|\mathcal{C}|} \sum_{C \in \mathcal{C}} \sum_{c \in C^*} f(c) = \frac{q^k - 1}{q^n - 1} \sum_{v \in (\mathbb{F}_q^n)^*} f(v),
\]
where $C^*$ denotes the set of all nonzero elements of C.
Proof. By interchanging the order of summation we get
\[
\sum_{C \in \mathcal{C}} \sum_{v \in C^*} f(v) = \sum_{v \in (\mathbb{F}_q^n)^*} f(v) \sum_{v \in C \in \mathcal{C}} 1.
\]
The inner sum on the right is constant and equals $N(\mathcal{C})$, by assumption. Now the result follows by the computation of $N(\mathcal{C})$ in Lemma 4.1.19.
Corollary 4.1.21 Let $\mathcal{C}$ be a balanced class of [n, k] codes over $\mathbb{F}_q$. Then
\[
W_{\mathcal{C}}(Z) = 1 + \frac{q^k - 1}{q^n - 1} \sum_{w=1}^{n} \binom{n}{w} (q-1)^w Z^w.
\]
Proof. Apply Proposition 4.1.20 to the function $f(v) = Z^{\mathrm{wt}(v)}$, and use (4.1) of Remark 4.1.3.
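For a small balanced class the corollary can be verified by exhaustive enumeration. The sketch below takes the class of all [4, 2] binary codes, the prime example of Example 4.1.18, and compares the enumerated average weight distribution with the formula of Corollary 4.1.21.

```python
from itertools import product
from math import comb

q, n, k = 2, 4, 2
# All [4,2] binary codes: the distinct 2-dimensional subspaces of F_2^4.
vectors = list(product(range(q), repeat=n))
codes = set()
for g1, g2 in product(vectors, repeat=2):
    span = {tuple((a * x + b * y) % q for x, y in zip(g1, g2))
            for a in range(q) for b in range(q)}
    if len(span) == q**k:                 # g1 and g2 are independent
        codes.add(frozenset(span))

wt = lambda c: sum(1 for x in c if x != 0)
avg = [sum(sum(1 for c in code if wt(c) == w) for code in codes) / len(codes)
       for w in range(n + 1)]
gv = [1.0] + [(q**k - 1) / (q**n - 1) * comb(n, w) * (q - 1)**w
              for w in range(1, n + 1)]
print(avg)   # [1.0, 0.8, 1.2, 0.8, 0.2] by direct enumeration
print(gv)    # the same numbers from Corollary 4.1.21
```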
***GV bound for a collection of balanced codes, Loeliger***
4.1.3 MacWilliams identity
Although there is no apparent relation between the minimum distances of a code and its dual, the weight enumerators satisfy the MacWilliams identity.

Theorem 4.1.22 Let C be an [n, k] code over $\mathbb{F}_q$. Then
\[
W_{C^\perp}(X, Y) = q^{-k}\, W_C(X + (q-1)Y,\ X - Y).
\]
The following simple result is useful in the proof of the MacWilliams identity.

Lemma 4.1.23 Let C be an [n, k] linear code over $\mathbb{F}_q$. Let v be an element of $\mathbb{F}_q^n$, but not in $C^\perp$. Then, for every $\alpha \in \mathbb{F}_q$, there exist exactly $q^{k-1}$ codewords c such that $c \cdot v = \alpha$.

Proof. Consider the map $\varphi : C \to \mathbb{F}_q$ defined by $\varphi(c) = c \cdot v$. This is a linear map. The map is not identically zero, since v is not in $C^\perp$. Hence $\varphi$ is surjective and every fibre $\varphi^{-1}(\alpha)$ consists of the same number $q^{k-1}$ of elements, for all $\alpha \in \mathbb{F}_q$.
To prove Theorem 4.1.22, we introduce the characters of Abelian groups and
prove some lemmas.
Definition 4.1.24 Let (G, +) be an abelian group with respect to the addition
+. Let (S, ·) be the multiplicative group of the complex numbers of modulus
one. A character χ of G is a homomorphism from G to S. So, χ is a mapping
satisfying
χ(g1 + g2) = χ(g1) · χ(g2), for all g1, g2 ∈ G.
If χ(g) = 1 for all elements g ∈ G, we call χ the principal character.
Remark 4.1.25 For any character χ we have χ(0) = 1, since χ(0) is not zero and $\chi(0) = \chi(0+0) = \chi(0)^2$. If G is a finite group of order N and χ is a character of G, then χ(g) is an N-th root of unity for all $g \in G$, since
\[
1 = \chi(0) = \chi(Ng) = \chi(g)^N.
\]
Lemma 4.1.26 Let χ be a character of a finite group G. Then
\[
\sum_{g \in G} \chi(g) =
\begin{cases}
|G| & \text{if } \chi \text{ is the principal character,} \\
0 & \text{otherwise.}
\end{cases}
\]
Proof. The result is trivial when χ is principal. Now suppose χ is not principal. Let $h \in G$ be such that $\chi(h) \neq 1$. We have
\[
\chi(h) \sum_{g \in G} \chi(g) = \sum_{g \in G} \chi(h + g) = \sum_{g \in G} \chi(g),
\]
since the map $g \mapsto h + g$ is a permutation of G. Hence $(\chi(h) - 1)\sum_{g \in G} \chi(g) = 0$, which implies $\sum_{g \in G} \chi(g) = 0$.
Definition 4.1.27 Let V be a complex vector space. Let $f : \mathbb{F}_q^n \to V$ be a mapping on $\mathbb{F}_q^n$ with values in V. Let χ be a character of $\mathbb{F}_q$. The Hadamard transform $\hat{f}$ of f is defined as
\[
\hat{f}(u) = \sum_{v \in \mathbb{F}_q^n} \chi(u \cdot v) f(v).
\]
Lemma 4.1.28 Let $f : \mathbb{F}_q^n \to V$ be a mapping on $\mathbb{F}_q^n$ with values in the complex vector space V. Let χ be a non-principal character of $\mathbb{F}_q$. Then
\[
\sum_{c \in C} \hat{f}(c) = |C| \sum_{v \in C^\perp} f(v).
\]
Proof. By definition, we have
\[
\sum_{c \in C} \hat{f}(c) = \sum_{c \in C} \sum_{v \in \mathbb{F}_q^n} \chi(c \cdot v) f(v) = \sum_{v \in \mathbb{F}_q^n} f(v) \sum_{c \in C} \chi(c \cdot v)
\]
\[
= \sum_{v \in C^\perp} f(v) \sum_{c \in C} \chi(c \cdot v) + \sum_{v \in \mathbb{F}_q^n \setminus C^\perp} f(v) \sum_{c \in C} \chi(c \cdot v)
= |C| \sum_{v \in C^\perp} f(v) + \sum_{v \in \mathbb{F}_q^n \setminus C^\perp} f(v) \sum_{c \in C} \chi(c \cdot v),
\]
since $\chi(c \cdot v) = \chi(0) = 1$ for all $c \in C$ and $v \in C^\perp$. The result follows, since
\[
\sum_{c \in C} \chi(c \cdot v) = q^{k-1} \sum_{\alpha \in \mathbb{F}_q} \chi(\alpha) = 0
\]
for any $v \in \mathbb{F}_q^n \setminus C^\perp$ and χ not principal, by Lemmas 4.1.23 and 4.1.26.
Proof of Theorem 4.1.22. Let χ be a non-principal character of $\mathbb{F}_q$. Consider the mapping
\[
f(v) = X^{n-\mathrm{wt}(v)} Y^{\mathrm{wt}(v)}
\]
from $\mathbb{F}_q^n$ to the vector space of polynomials in the variables X and Y with complex coefficients. Then
\[
\sum_{v \in C^\perp} f(v) = \sum_{v \in C^\perp} X^{n-\mathrm{wt}(v)} Y^{\mathrm{wt}(v)} = W_{C^\perp}(X, Y),
\]
by applying (4.2) of Remark 4.1.3 to $C^\perp$. Let $c = (c_1, \ldots, c_n)$ and $v = (v_1, \ldots, v_n)$. Define $\mathrm{wt}(0) = 0$ and $\mathrm{wt}(\alpha) = 1$ for all nonzero $\alpha \in \mathbb{F}_q$. Then
\[
\mathrm{wt}(v) = \mathrm{wt}(v_1) + \cdots + \mathrm{wt}(v_n).
\]
The Hadamard transform $\hat{f}(c)$ is equal to
\[
\sum_{v \in \mathbb{F}_q^n} \chi(c \cdot v) X^{n-\mathrm{wt}(v)} Y^{\mathrm{wt}(v)}
= \sum_{v \in \mathbb{F}_q^n} X^{n-\mathrm{wt}(v_1)-\cdots-\mathrm{wt}(v_n)}\, Y^{\mathrm{wt}(v_1)+\cdots+\mathrm{wt}(v_n)}\, \chi(c_1 v_1 + \cdots + c_n v_n)
\]
\[
= X^n \sum_{v \in \mathbb{F}_q^n} \prod_{i=1}^{n} \left(\frac{Y}{X}\right)^{\mathrm{wt}(v_i)} \chi(c_i v_i)
= X^n \prod_{i=1}^{n} \sum_{v \in \mathbb{F}_q} \left(\frac{Y}{X}\right)^{\mathrm{wt}(v)} \chi(c_i v).
\]
If $c_i \neq 0$, then
\[
\sum_{v \in \mathbb{F}_q} \left(\frac{Y}{X}\right)^{\mathrm{wt}(v)} \chi(c_i v) = 1 + \frac{Y}{X} \sum_{\alpha \in \mathbb{F}_q^*} \chi(\alpha) = 1 - \frac{Y}{X},
\]
by Lemma 4.1.26. Hence
\[
\sum_{v \in \mathbb{F}_q} \left(\frac{Y}{X}\right)^{\mathrm{wt}(v)} \chi(c_i v) =
\begin{cases}
1 + (q-1)\dfrac{Y}{X} & \text{if } c_i = 0, \\[1ex]
1 - \dfrac{Y}{X} & \text{if } c_i \neq 0.
\end{cases}
\]
Therefore $\hat{f}(c)$ is equal to
\[
X^n \left(1 - \frac{Y}{X}\right)^{\mathrm{wt}(c)} \left(1 + (q-1)\frac{Y}{X}\right)^{n-\mathrm{wt}(c)}
= (X - Y)^{\mathrm{wt}(c)} (X + (q-1)Y)^{n-\mathrm{wt}(c)}.
\]
Hence
\[
\sum_{c \in C} \hat{f}(c) = \sum_{c \in C} U^{n-\mathrm{wt}(c)} V^{\mathrm{wt}(c)} = W_C(U, V),
\]
by (4.2) of Remark 4.1.3 with the substitution $U = X + (q-1)Y$ and $V = X - Y$. It is shown that on the one hand
\[
\sum_{v \in C^\perp} f(v) = W_{C^\perp}(X, Y),
\]
and on the other hand
\[
\sum_{c \in C} \hat{f}(c) = W_C(X + (q-1)Y, X - Y).
\]
The result follows by Lemma 4.1.28 on the Hadamard transform.
Example 4.1.29 The zero code C has homogeneous weight enumerator $X^n$ and its dual $\mathbb{F}_q^n$ has homogeneous weight enumerator $(X + (q-1)Y)^n$, by Example 4.1.4, which is indeed equal to $q^0\, W_C(X + (q-1)Y, X - Y)$ and confirms the MacWilliams identity.
Example 4.1.30 The n-fold repetition code C has homogeneous weight enumerator $X^n + (q-1)Y^n$ and the homogeneous weight enumerator of its dual code in the binary case is $\frac{1}{2}((X+Y)^n + (X-Y)^n)$, by Example 4.1.5, which is equal to $2^{-1} W_C(X+Y, X-Y)$, confirming the MacWilliams identity for q = 2. For arbitrary q we have
\[
W_{C^\perp}(X, Y) = q^{-1} W_C(X + (q-1)Y, X - Y) = q^{-1}\bigl((X+(q-1)Y)^n + (q-1)(X-Y)^n\bigr)
= \sum_{w=0}^{n} \binom{n}{w} \frac{(q-1)^w + (q-1)(-1)^w}{q}\, X^{n-w} Y^w.
\]
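The MacWilliams transform is a finite computation on the weight distribution. The sketch below (assuming only that the input is the distribution of a linear code, so that $\sum_w A_w = q^k$) expands $q^{-k}\sum_w A_w (X+(q-1)Y)^{n-w}(X-Y)^w$ and recovers the weight distribution of the [7, 3, 4] simplex code from that of the Hamming code, as in Exercise 4.1.2.

```python
from math import comb

def macwilliams(A, q):
    """Weight distribution of the dual code, by Theorem 4.1.22."""
    n = len(A) - 1
    qk = sum(A)                          # = q^k for a linear code
    B = [0] * (n + 1)
    for w, Aw in enumerate(A):           # expand A_w (X+(q-1)Y)^{n-w} (X-Y)^w
        for i in range(n - w + 1):
            for j in range(w + 1):
                B[i + j] += (Aw * comb(n - w, i) * (q - 1)**i
                             * comb(w, j) * (-1)**j)
    return [b // qk for b in B]

hamming = [1, 0, 0, 7, 7, 0, 0, 1]
print(macwilliams(hamming, 2))           # [1, 0, 0, 0, 7, 0, 0, 0]
```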
Example 4.1.31 ***dual of a balanced class of codes, C⊥
balanced?***
Definition 4.1.32 An [n, k] code C over $\mathbb{F}_q$ is called formally self-dual if C and $C^\perp$ have the same weight enumerator.
Remark 4.1.33 ***A quasi self-dual code is formally self-dual, existence of an
asymp. good family of codes***
4.1.4 Exercises
4.1.1 Compute the weight spectrum of the dual of the q-ary n-fold repetition
code directly, that is without using MacWilliams identity. Compare this result
with Example 4.1.30.
4.1.2 Check MacWilliams identity for the binary [7, 4, 3] Hamming code and
its dual the [7, 3, 4] simplex code.
4.1.3 Compute the weight enumerator of the Hamming code $H_r(q)$ by solving the differential equation given in Example 4.1.9.
4.1.4 Compute the weight enumerator of the ternary Golay code as given in
Example 4.1.13.
4.1.5 Compute the weight enumerator of the binary Golay code as given in
Example 4.1.14.
4.1.6 Consider the quasi self-dual code with generator matrix $(I_k \mid I_k)$ of Exercise 2.5.8. Show that its weight enumerator is equal to $(X^2 + (q-1)Y^2)^k$. Verify that this code is formally self-dual.
4.1.7 Let C be the code over $\mathbb{F}_q$, with q even, with generator matrix H of Example 2.2.9. For which q does this code contain a word of weight 7?
4.2 Error probability
*** Some introductory results on the error probability of correct decoding up
to half the minimum distance were given in Section ??. ***
4.2.1 Error probability of undetected error
***
Definition 4.2.1 Consider the q-ary symmetric channel where the receiver checks whether the received word r is a codeword or not, for instance by computing whether $Hr^T$ is zero or not for a chosen parity check matrix H, and asks for retransmission in case r is not a codeword. See Remark 2.3.2. Now it may occur that r is again a codeword, but not equal to the codeword that was sent. This is called an undetected error.
Proposition 4.2.2 Let $W_C(X, Y)$ be the weight enumerator of the code C. Then the probability of undetected error on a q-ary symmetric channel with cross-over probability p is given by
\[
P_{ue}(p) = W_C\!\left(1 - p,\ \frac{p}{q-1}\right) - (1-p)^n.
\]
Proof. Every codeword has the same probability of transmission and the code is linear. So without loss of generality we may assume that the zero word is sent. Hence
\[
P_{ue}(p) = \frac{1}{|C|} \sum_{x \in C} \sum_{x \neq y \in C} P(y|x) = \sum_{0 \neq y \in C} P(y|0).
\]
If the received codeword y has weight w, then w symbols are changed and the remaining n − w symbols remain the same. So
\[
P(y|0) = (1-p)^{n-w} \left(\frac{p}{q-1}\right)^w
\]
by Remark 2.4.15. Hence
\[
P_{ue}(p) = \sum_{w=1}^{n} A_w (1-p)^{n-w} \left(\frac{p}{q-1}\right)^w.
\]
Substituting X = 1 − p and Y = p/(q−1) in $W_C(X, Y)$ gives the desired result, since $A_0 = 1$.
Remark 4.2.3 The probability of retransmission is $P_{retr}(p) = 1 - (1-p)^n - P_{ue}(p)$, since retransmission is requested precisely when the received word is not a codeword.
Example 4.2.4 Let C be the binary triple repetition code. Then $P_{ue}(p) = p^3$, since $W_C(X, Y) = X^3 + Y^3$ by Example 4.1.5.

Example 4.2.5 Let C be the [7, 4, 3] Hamming code. Then
\[
P_{ue}(p) = 7p^3 - 21p^4 + 21p^5 - 7p^6 + p^7
\]
by Example 4.1.6.
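Proposition 4.2.2 gives $P_{ue}$ directly from the weight distribution. A minimal sketch, checked against the polynomial of Example 4.2.5:

```python
def P_ue(A, q, p):
    """Probability of undetected error, by Proposition 4.2.2."""
    n = len(A) - 1
    return sum(A[w] * (1 - p)**(n - w) * (p / (q - 1))**w
               for w in range(1, n + 1))

hamming, p = [1, 0, 0, 7, 7, 0, 0, 1], 0.01
print(P_ue(hamming, 2, p))                           # about 6.8e-06
print(7*p**3 - 21*p**4 + 21*p**5 - 7*p**6 + p**7)    # the same value
```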
4.2.2 Probability of decoding error
Remember that in Lemma 4.1.10 a formula was derived for $N_q(n, v, w, s)$, the number of words in $\mathbb{F}_q^n$ of weight w that are at distance s from a given word of weight v.
Proposition 4.2.6 The probability of decoding error of a decoder that corrects up to t errors, with $2t + 1 \le d$, of a code C of minimum distance d on a q-ary symmetric channel with cross-over probability p is given by
\[
P_{de}(p) = \sum_{w=0}^{n} \left(\frac{p}{q-1}\right)^w (1-p)^{n-w} \sum_{s=0}^{t} \sum_{v=1}^{n} A_v N_q(n, v, w, s).
\]
Proof. This is left as an exercise.
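For a perfect code with covering radius $\rho = t$ every received word is decoded, so $P_{de}$ must equal 1 minus the probability of at most t errors. The sketch below, reusing `N` from the sketch after Remark 4.1.11, confirms this for the [7, 4, 3] Hamming code with t = 1.

```python
from math import comb
# Reuses N(q, n, v, w, s) from the sketch after Remark 4.1.11.

def P_de(A, q, p, t):
    """Probability of decoding error, by Proposition 4.2.6."""
    n = len(A) - 1
    return sum((p / (q - 1))**w * (1 - p)**(n - w)
               * sum(N(q, n, v, w, s)
                     for s in range(t + 1) for v in range(1, n + 1))
               for w in range(n + 1))

hamming, q, p, t, n = [1, 0, 0, 7, 7, 0, 0, 1], 2, 0.01, 1, 7
print(P_de(hamming, q, p, t))
print(1 - sum(comb(n, s) * p**s * (1 - p)**(n - s) for s in range(t + 1)))
# both print the same number, since the Hamming code is perfect with rho = 1
```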
Example 4.2.7 ...........
4.2.3 Random coding
***ML (maximum likelihood) decoding = MD (minimum distance or nearest neighbor) decoding for the BSC.***
Proposition 4.2.8 ***...***
\[
P_{err}(p) = W_C(\gamma) - 1, \quad \text{where } \gamma = 2\sqrt{p(1-p)}.
\]
Proof. ....
Theorem 4.2.9 ***Shannon’s theorem for random codes***
Proof. ***...***
4.2.4 Exercises
4.2.1 ***Give the probability of undetected error for the code ....***
4.2.2 Give a proof of Proposition 4.2.6.
4.2.3 ***Give the probability of decoding error and decoding failure for the
code .... for a decoder correcting up to ... errors.***
4.3 Finite geometry and codes
***Intro***
4.3.1 Projective space and projective systems
The notion of a linear code has a geometric equivalent in the concept of a
projective system which is a set of points in projective space.
Remark 4.3.1 The affine line A over a field F is nothing else than the field F.
The projective line P is an extension of the affine line by one point at infinity.
***picture: the affine line extended by one point at infinity***
The elements are fractions $(x_0 : x_1)$ with $x_0, x_1$ elements of a field F, not both zero, and the fraction $(x_0 : x_1)$ is equal to $(y_0 : y_1)$ if and only if $(x_0, x_1) = \lambda(y_0, y_1)$ for some $\lambda \in F^*$. The point $(x_0 : x_1)$ with $x_0 \neq 0$ is equal to $(1 : x_1/x_0)$ and corresponds to the point $x_1/x_0 \in \mathbb{A}$. The point $(x_0 : x_1)$ with $x_0 = 0$ is equal to $(0 : 1)$ and is the unique point at infinity. The notation $\mathbb{P}(F)$ and $\mathbb{A}(F)$ is used to emphasize that the elements are in the field F.
The affine plane $\mathbb{A}^2$ over a field F consists of points and lines. The points are in $F^2$ and the lines are the subsets of the form $\{\, a + \lambda v \mid \lambda \in F \,\}$ with $v \neq 0$, in a parametric explicit description. A line is alternatively given by an implicit description by means of an equation $aX + bY + c = 0$, with $a, b, c \in F$ not all zero. Every two distinct points are contained in exactly one line. Two lines are either parallel, that is they coincide or do not intersect, or they intersect in exactly one point. If F is equal to the finite field $\mathbb{F}_q$, then there are $q^2$ points and $q^2 + q$ lines, every line consists of q points, and the number of lines through a given point is q + 1.
Being parallel defines an equivalence relation on the set of lines in the affine plane, and every equivalence or parallel class of a line l defines a unique point at infinity $P_l$. So $P_l = P_m$ if and only if l and m are parallel. In this way the affine plane is extended to the projective plane $\mathbb{P}^2$ by adding the points at infinity $P_l$. A line in the projective plane is a line l in the affine plane extended with its point at infinity $P_l$, or the line at infinity, consisting of all the points at infinity. Every two distinct points in $\mathbb{P}^2$ are contained in exactly one line, and two distinct lines intersect in exactly one point. If F is equal to the finite field $\mathbb{F}_q$, then there are $q^2 + q + 1$ points and the same number of lines, every line consists of q + 1 points, and the number of lines through a given point is q + 1.
***picture***
Another model of the projective plane can be obtained as follows. Consider the points of the affine plane as the plane in three space $F^3$ with coordinates (x, y, z) given by the equation Z = 1. Every point (x, y, 1) in the affine plane corresponds with a unique line in $F^3$ through the origin, parameterized by λ(x, y, 1), λ ∈ F. Conversely, a line in $F^3$ through the origin, parameterized by λ(x, y, z), λ ∈ F, intersects the affine plane in the unique point (x/z, y/z, 1) if $z \neq 0$, and corresponds to the unique parallel class $P_l$ of the line l in the affine plane with equation xY = yX if z = 0. Furthermore every line in the affine plane corresponds with a unique plane through the origin in $F^3$, and conversely every plane through the origin in $F^3$ with equation aX + bY + cZ = 0 intersects the affine plane in the unique line with equation aX + bY + c = 0 if not both a = 0 and b = 0, or corresponds to the line at infinity if a = b = 0.
***picture***
An F-rational point of the projective plane is a line through the origin in $F^3$. Such a point is determined by a triple $(x, y, z) \in F^3$, not all of them being zero. A scalar multiple determines the same point in the projective plane. This defines an equivalence relation ≡ by $(x, y, z) \equiv (x', y', z')$ if and only if there exists a nonzero $\lambda \in F$ such that $(x, y, z) = \lambda(x', y', z')$. The equivalence class with representative (x, y, z) is denoted by (x : y : z), and x, y and z are called homogeneous coordinates of the point. The set of all projective points (x : y : z), with $x, y, z \in F$ not all zero, is called the projective plane over F. The set of F-rational projective points is denoted by $\mathbb{P}^2(F)$. A line in the projective plane that is defined over F is a plane through the origin in $F^3$. Such a line has a homogeneous equation $aX + bY + cZ = 0$ with $a, b, c \in F$ not all zero.
The affine plane is embedded in the projective plane by the map $(x, y) \mapsto (x : y : 1)$. The image is the subset of all projective points (x : y : z) such that $z \neq 0$. The line at infinity is the line with equation Z = 0. A point at infinity of the affine plane is a point on the line at infinity in the projective plane. Every line in the affine plane intersects the line at infinity in a unique point, and all lines in the affine plane which are parallel, that is to say which do not intersect in the affine plane, intersect in the same point at infinity. The above embedding of the affine plane in the projective plane is standard, but the mappings $(x, z) \mapsto (x : 1 : z)$ and $(y, z) \mapsto (1 : y : z)$ give two alternative embeddings of the affine plane. The images are the complements of the lines Y = 0 and X = 0, respectively. Thus the projective plane is covered by three copies of the affine plane.
Definition 4.3.2 An affine subspace of $F^r$ of dimension s is a subset of the form
\[
\{\, a + \lambda_1 v_1 + \cdots + \lambda_s v_s \mid \lambda_i \in F,\ i = 1, \ldots, s \,\},
\]
where $a \in F^r$ and $v_1, \ldots, v_s$ is a linearly independent set of vectors in $F^r$; $r - s$ is called the codimension of the subspace. The affine space of dimension r over a field F, denoted by $\mathbb{A}^r(F)$, consists of all affine subsets of $F^r$. The elements of $F^r$ are called points of the affine space. Lines and planes are the affine subspaces of dimension one and two, respectively. A hyperplane is an affine subspace of codimension 1.
Definition 4.3.3 A point of the projective space over a field F of dimension r is a line through the origin in $F^{r+1}$. A line in $\mathbb{P}^r(F)$ is a plane through the origin in $F^{r+1}$. More generally a projective subspace of dimension s in $\mathbb{P}^r(F)$ is a linear subspace of dimension s + 1 of the vector space $F^{r+1}$, and $r - s$ is called the codimension of the subspace. The projective space of dimension r over a field F, denoted by $\mathbb{P}^r(F)$, consists of all its projective subspaces. A point of a projective space is incident with, or an element of, a projective subspace if the line corresponding to the point is contained in the linear subspace that corresponds with the projective subspace. A hyperplane in $\mathbb{P}^r(F)$ is a projective subspace of codimension 1.
Definition 4.3.4 A point in $\mathbb{P}^r(F)$ is denoted by its homogeneous coordinates $(x_0 : x_1 : \cdots : x_r)$ with $x_0, x_1, \ldots, x_r \in F$ not all zero, where $\lambda(x_0, x_1, \ldots, x_r)$, $\lambda \in F$, is a parametrization of the corresponding line in $F^{r+1}$. Let $(x_0, x_1, \ldots, x_r)$ and $(y_0, y_1, \ldots, y_r)$ be two nonzero vectors in $F^{r+1}$. Then $(x_0 : x_1 : \cdots : x_r)$ and $(y_0 : y_1 : \cdots : y_r)$ represent the same point in $\mathbb{P}^r(F)$ if and only if $(x_0, x_1, \ldots, x_r) = \lambda(y_0, y_1, \ldots, y_r)$ for some $\lambda \in F^*$. The standard homogeneous coordinates of a point in $\mathbb{P}^r(F)$ are given by $(x_0 : x_1 : \cdots : x_r)$ such that there exists a j with $x_j = 1$ and $x_i = 0$ for all $i < j$.
The standard embedding of $\mathbb{A}^r(F)$ in $\mathbb{P}^r(F)$ is given by
\[
(x_1, \ldots, x_r) \mapsto (1 : x_1 : \cdots : x_r).
\]
Remark 4.3.5 Every hyperplane in $\mathbb{P}^r(F)$ is defined by an equation
\[
a_0 X_0 + a_1 X_1 + \cdots + a_r X_r = 0,
\]
where $a_0, a_1, \ldots, a_r$ are r + 1 elements of F, not all zero. Furthermore
\[
a_0' X_0 + a_1' X_1 + \cdots + a_r' X_r = 0
\]
defines the same hyperplane if and only if there exists a nonzero λ in F such that $a_i' = \lambda a_i$ for all $i = 0, 1, \ldots, r$. Hence there is a duality between points and hyperplanes in $\mathbb{P}^r(F)$, where a point $(a_0 : a_1 : \cdots : a_r)$ is sent to the hyperplane with equation $a_0 X_0 + a_1 X_1 + \cdots + a_r X_r = 0$.
Example 4.3.6 The columns of a generator matrix of a simplex code $S_r(q)$ represent all the points of $\mathbb{P}^{r-1}(\mathbb{F}_q)$.
Proposition 4.3.7 Let r and s be non-negative integers such that s ≤ r. The number of s-dimensional projective subspaces of $\mathbb{P}^r(\mathbb{F}_q)$ is equal to the Gaussian binomial
\[
\begin{bmatrix} r+1 \\ s+1 \end{bmatrix}_q = \frac{(q^{r+1}-1)(q^{r+1}-q)\cdots(q^{r+1}-q^s)}{(q^{s+1}-1)(q^{s+1}-q)\cdots(q^{s+1}-q^s)}.
\]
In particular, the number of points of $\mathbb{P}^r(\mathbb{F}_q)$ is equal to
\[
\begin{bmatrix} r+1 \\ 1 \end{bmatrix}_q = \frac{q^{r+1}-1}{q-1} = q^r + q^{r-1} + \cdots + q + 1.
\]
Proof. An s-dimensional projective subspace of $\mathbb{P}^r(\mathbb{F}_q)$ is an (s+1)-dimensional subspace of $\mathbb{F}_q^{r+1}$, which is an [r+1, s+1] code over $\mathbb{F}_q$. The number of the latter objects is equal to the stated Gaussian binomial, by Proposition 2.5.2.
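The Gaussian binomial is a finite product and straightforward to evaluate; a small sketch (the function name is our own):

```python
def gaussian_binomial(m, r, q):
    """Number of r-dimensional subspaces of F_q^m."""
    num = den = 1
    for i in range(r):
        num *= q**m - q**i
        den *= q**r - q**i
    return num // den

print(gaussian_binomial(3, 1, 2))   # 7 = number of points of P^2(F_2)
print(gaussian_binomial(5, 3, 2))   # 155 = number of [5,3] binary codes
```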
Definition 4.3.8 Let $\mathcal{P} = (P_1, \ldots, P_n)$ be an n-tuple of points in $\mathbb{P}^r(\mathbb{F}_q)$. Then $\mathcal{P}$ is called a projective system in $\mathbb{P}^r(\mathbb{F}_q)$ if not all these points lie in a hyperplane. This system is called simple if the n points are mutually distinct.

Definition 4.3.9 A code C is called degenerate if there is a coordinate i such that $c_i = 0$ for all $c \in C$.
Remark 4.3.10 A code C is nondegenerate if and only if there is no zero column in a generator matrix of the code, if and only if $d(C^\perp) \ge 2$.
Example 4.3.11 Let G be a generator matrix of a nondegenerate code C of dimension k. So G has no zero columns. Take the columns of G as homogeneous coordinates of points in $\mathbb{P}^{k-1}(\mathbb{F}_q)$. This gives the projective system $\mathcal{P}_G$ of G. Conversely, let $(P_1, \ldots, P_n)$ be an enumeration of the points of a projective system $\mathcal{P}$ in $\mathbb{P}^r(\mathbb{F}_q)$. Let $(p_{0j} : p_{1j} : \cdots : p_{rj})$ be homogeneous coordinates of $P_j$. Let $G_{\mathcal{P}}$ be the $(r+1) \times n$ matrix with $(p_{0j}, p_{1j}, \ldots, p_{rj})^T$ as j-th column. Then $G_{\mathcal{P}}$ is the generator matrix of a nondegenerate code of length n and dimension r + 1, since not all points lie in a hyperplane.
Proposition 4.3.12 Let C be a nondegenerate code of length n with generator matrix G. Let $\mathcal{P}_G$ be the projective system of G. The code has generalized Hamming weight $d_r$ if and only if $n - d_r$ is the maximal number of points of $\mathcal{P}_G$ in a linear subspace of codimension r.
Proof. Let $G = (g_{ij})$ and $P_j = (g_{1j} : \cdots : g_{kj})$. Then $\mathcal{P} = (P_1, \ldots, P_n)$. Let D be a subspace of C of dimension r of minimal weight $d_r$. Let $c_1, \ldots, c_r$ be a basis of D. Then $c_i = (c_{i1}, \ldots, c_{in}) = h_i G$ for a nonzero $h_i = (h_{i1}, \ldots, h_{ik}) \in \mathbb{F}_q^k$. Let $H_i$ be the hyperplane in $\mathbb{P}^{k-1}(\mathbb{F}_q)$ with equation $h_{i1}X_1 + \cdots + h_{ik}X_k = 0$. Then $c_{ij} = 0$ if and only if $P_j \in H_i$, for all $1 \le i \le r$ and $1 \le j \le n$. Let H be the intersection of $H_1, \ldots, H_r$. Then H is a linear subspace of codimension r, since $c_1, \ldots, c_r$ are linearly independent. Furthermore $P_j \in H$ if and only if $c_{ij} = 0$ for all $1 \le i \le r$, if and only if $j \notin \mathrm{supp}(D)$. Hence $n - d_r$ points lie in a linear subspace of codimension r.
The proof of the converse is left to the reader.
Definition 4.3.13 A code C is called projective if $d(C^\perp) \ge 3$.

Remark 4.3.14 A code of length n is projective if and only if G has no zero column and no column is a scalar multiple of another column of G, if and only if the projective system $\mathcal{P}_G$ is simple, for every generator matrix G of the code.
Definition 4.3.15 A map $\varphi : \mathbb{P}^r(F) \to \mathbb{P}^r(F)$ is called a projective transformation if φ is given by $\varphi(x_0 : x_1 : \cdots : x_r) = (y_0 : y_1 : \cdots : y_r)$, where $y_i = \sum_{j=0}^{r} a_{ij} x_j$ for all $i = 0, \ldots, r$, for a given invertible matrix $(a_{ij})$ of size r + 1 with entries in F.

Remark 4.3.16 The map φ is well defined by $\varphi(x) = y$ with $y_i = \sum_{j=0}^{r} a_{ij} x_j$, since the equations for the $y_i$ are homogeneous in the $x_j$. The diagonal matrices $\lambda I_{r+1}$ induce the identity map on $\mathbb{P}^r(F)$ for all $\lambda \in F^*$.
Definition 4.3.17 Let $\mathcal{P} = (P_1, \ldots, P_n)$ and $\mathcal{Q} = (Q_1, \ldots, Q_n)$ be two projective systems in $\mathbb{P}^r(F)$. They are called equivalent if there exists a projective transformation φ of $\mathbb{P}^r(F)$ and a permutation σ of $\{1, \ldots, n\}$ such that $\mathcal{Q} = (\varphi(P_{\sigma(1)}), \ldots, \varphi(P_{\sigma(n)}))$.
Proposition 4.3.18 There is a one-to-one correspondence between generalized equivalence classes of nondegenerate [n, k, d] codes over $\mathbb{F}_q$ and equivalence classes of projective systems of n points in $\mathbb{P}^{k-1}(\mathbb{F}_q)$.

Proof. The correspondence between codes and projective systems is given in Example 4.3.11.
Let C be a nondegenerate code over $\mathbb{F}_q$ with parameters [n, k, d]. Let G be a generator matrix of C. Take the columns of G as homogeneous coordinates of points in $\mathbb{P}^{k-1}(\mathbb{F}_q)$. This gives the projective system $\mathcal{P}_G$ of G. If G′ is another generator matrix of C, then G′ = AG for some invertible k × k matrix A with entries in $\mathbb{F}_q$. Furthermore A induces a projective transformation φ of $\mathbb{P}^{k-1}(\mathbb{F}_q)$ such that $\mathcal{P}_{G'} = \varphi(\mathcal{P}_G)$. So $\mathcal{P}_G$ and $\mathcal{P}_{G'}$ are equivalent.
Conversely, let $\mathcal{P} = (P_1, \ldots, P_n)$ be a projective system in $\mathbb{P}^{k-1}(\mathbb{F}_q)$. This gives the k × n generator matrix $G_{\mathcal{P}}$ of a nondegenerate code. Another enumeration of the points of $\mathcal{P}$ and another choice of the homogeneous coordinates of the $P_j$ gives a permutation of the columns of $G_{\mathcal{P}}$ and a nonzero scalar multiple of the columns, and therefore a generalized equivalent code.
Proposition 4.3.19 Every r-tuple of points in $\mathbb{P}^r(\mathbb{F}_q)$ lies in a hyperplane.

Proof. Let $P_1, \ldots, P_r$ be r points in $\mathbb{P}^r(\mathbb{F}_q)$. Let $(p_{0j} : p_{1j} : \cdots : p_{rj})$ be the standard homogeneous coordinates of $P_j$. The r homogeneous equations
\[
Y_0 p_{0j} + Y_1 p_{1j} + \cdots + Y_r p_{rj} = 0, \quad j = 1, \ldots, r,
\]
in the r + 1 variables $Y_0, \ldots, Y_r$ have a nonzero solution $(h_0, \ldots, h_r)$. Let H be the hyperplane with equation $h_0 X_0 + \cdots + h_r X_r = 0$. Then $P_1, \ldots, P_r$ lie in H.
4.3.2 MDS codes and points in general position
***points in general position***
A second geometric proof of the Singleton bound is given by means of projective
systems.
Corollary 4.3.20 (Singleton bound)
The minimum distance d of a code of length n and dimension k is at most
n − k + 1.
Proof. The zero code has parameters [n, 0, n + 1] by definition, and indeed this code satisfies the Singleton bound. If C is not the zero code, we may assume without loss of generality that the code is nondegenerate, by deleting the coordinates where all the codewords are zero. Let $\mathcal{P}$ be the projective system in $\mathbb{P}^{k-1}(\mathbb{F}_q)$ of a generator matrix of the code. Then some k − 1 points of the system lie in a hyperplane by Proposition 4.3.19. Hence $n - d \ge k - 1$ by Proposition 4.3.12.
The notion for projective systems that corresponds to MDS codes is the concept
of general position.
Definition 4.3.21 A projective system of n points in $\mathbb{P}^r(\mathbb{F}_q)$ is called in general position, or an n-arc, if no r + 1 points lie in a hyperplane.
Example 4.3.22 Let n = q + 1 and let $a_1, a_2, \ldots, a_{q-1}$ be an enumeration of the nonzero elements of $\mathbb{F}_q$. Consider the code C with generator matrix
\[
G = \begin{pmatrix}
a_1 & a_2 & \cdots & a_{q-1} & 0 & 0 \\
a_1^2 & a_2^2 & \cdots & a_{q-1}^2 & 0 & 1 \\
1 & 1 & \cdots & 1 & 1 & 0
\end{pmatrix}.
\]
Then C is a [q + 1, 3, q − 1] code by Proposition 3.2.10. Let $P_j = (a_j : a_j^2 : 1)$ for $1 \le j \le q-1$ and $P_q = (0 : 0 : 1)$, $P_{q+1} = (0 : 1 : 0)$. Let $\mathcal{P} = (P_1, \ldots, P_n)$. Then $\mathcal{P} = \mathcal{P}_G$ and $\mathcal{P}$ is a projective system in the projective plane in general position. Remark that $\mathcal{P}$ is the set of all points in the projective plane with coordinates (x : y : z) in $\mathbb{F}_q$ that lie on the conic with equation $X^2 = YZ$.
Remark 4.3.23 If q is large enough with respect to n, then almost every pro-
jective system of n points in Pr
(Fq) is in general position, or equivalently a
random code over Fq of length n is MDS. The following proposition and corol-
lary show that every Fq-linear code with parameters [n, k, d] is contained in an
Fqm -linear MDS code with parameters [n, n − d + 1, d] if m is large enough.
Proposition 4.3.24 Let B be a q-ary code. If $q^m > \max\{\binom{n}{i} \mid 0 \le i \le t\}$ and $d(B^\perp) > t$, then there exists a sequence $\{B_r \mid 0 \le r \le t\}$ of $q^m$-ary codes such that $B_{r-1} \subseteq B_r$ and $B_r$ is an [n, r, n−r+1] code contained in the $\mathbb{F}_{q^m}$-linear code generated by B, for all $0 \le r \le t$.
Proof. The minimum distances of $B^\perp$ and $(B \otimes \mathbb{F}_{q^m})^\perp$ are the same. Induction on t is used. In case t = 0, there is nothing to prove; we can take $B_0 = 0$.
Suppose the statement is proved for t. Let B be a code such that $d(B^\perp) > t+1$ and suppose $q^m > \max\{\binom{n}{i} \mid 0 \le i \le t+1\}$. By induction we may assume that there is a sequence $\{B_r \mid 0 \le r \le t\}$ of $q^m$-ary codes such that $B_{r-1} \subseteq B_r \subseteq B \otimes \mathbb{F}_{q^m}$ and $B_r$ is an [n, r, n−r+1] code for all r, $0 \le r \le t$. So $B \otimes \mathbb{F}_{q^m}$ has a generator matrix G with entries $g_{ij}$ for $1 \le i \le k$ and $1 \le j \le n$, such that the first r rows of G give a generator matrix $G_r$ of $B_r$. In particular the determinants of all $t \times t$ submatrices of $G_t$ are nonzero, by Proposition 3.2.5. Let $\Delta(j_1, \ldots, j_t)$ be the determinant of $G_t(j_1, \ldots, j_t)$, which is the matrix obtained from $G_t$ by taking the columns numbered by $j_1, \ldots, j_t$, where $1 \le j_1 < \cdots < j_t \le n$. For $t < i \le n$ and $1 \le j_1 < \cdots < j_{t+1} \le n$ we define $\Delta(i; j_1, \ldots, j_{t+1})$ to be the determinant of the $(t+1) \times (t+1)$ submatrix of G formed by taking the columns numbered by $j_1, \ldots, j_{t+1}$ and the rows numbered by $1, \ldots, t, i$. Now consider, for every (t+1)-tuple $j = (j_1, \ldots, j_{t+1})$ such that $1 \le j_1 < \cdots < j_{t+1} \le n$, the linear equation in the variables $X_{t+1}, \ldots, X_n$ given by
\[
\sum_{s=1}^{t+1} (-1)^s \Delta(j_1, \ldots, \hat{j_s}, \ldots, j_{t+1}) \sum_{i>t} g_{ij_s} X_i = 0,
\]
where $(j_1, \ldots, \hat{j_s}, \ldots, j_{t+1})$ is the t-tuple obtained from j by deleting the s-th element. Rewrite this equation by interchanging the order of summation as follows:
\[
\sum_{i>t} \Delta(i; j) X_i = 0.
\]
If for a given j the coefficients $\Delta(i; j)$ are zero for all $i > t$, then all the rows of the matrix G(j), which is the submatrix of G consisting of the columns numbered by $j_1, \ldots, j_{t+1}$, are dependent on the first t rows of G(j). Thus $\mathrm{rank}(G(j)) \le t$, so G has t + 1 columns which are dependent. But G is a parity check matrix for $(B \otimes \mathbb{F}_{q^m})^\perp$, therefore $d((B \otimes \mathbb{F}_{q^m})^\perp) \le t+1$, which contradicts the assumption in the induction hypothesis. We have therefore proved that for a given (t+1)-tuple, at least one of the coefficients $\Delta(i; j)$ is nonzero. Therefore the above equation defines a hyperplane H(j) in a vector space over $\mathbb{F}_{q^m}$ of dimension n − t. We assumed $q^m > \binom{n}{t+1}$, so
\[
(q^m)^{n-t} > \binom{n}{t+1} (q^m)^{n-t-1}.
\]
Therefore $(\mathbb{F}_{q^m})^{n-t}$ has more elements than the union of all $\binom{n}{t+1}$ hyperplanes of the form H(j). Thus there exists an element $(x_{t+1}, \ldots, x_n) \in (\mathbb{F}_{q^m})^{n-t}$ which does not lie in this union. Now consider the code $B_{t+1}$, defined by the generator matrix $G_{t+1}$ with entries $g'_{lj}$, $1 \le l \le t+1$, $1 \le j \le n$, where
\[
g'_{lj} =
\begin{cases}
g_{lj} & \text{if } 1 \le l \le t, \\
\sum_{i>t} g_{ij} x_i & \text{if } l = t+1.
\end{cases}
\]
Then $B_{t+1}$ is a subcode of $B \otimes \mathbb{F}_{q^m}$, and for every (t+1)-tuple j the determinant of the corresponding $(t+1) \times (t+1)$ submatrix of $G_{t+1}$ is equal to $\sum_{i>t} \Delta(i; j) x_i$, which is not zero, since x is not an element of H(j). Thus $B_{t+1}$ is an [n, t+1, n−t] code.
Corollary 4.3.25 Suppose $q^m > \max\{\binom{n}{i} \mid 1 \le i \le d-1\}$. Let C be a q-ary code of minimum distance d. Then C is contained in a $q^m$-ary MDS code of the same minimum distance as C.

Proof. The corollary follows from Proposition 4.3.24 by taking $B = C^\perp$ and t = d − 1. Indeed, we have $B_0 \subseteq B_1 \subseteq \cdots \subseteq B_{d-1} \subseteq C^\perp \otimes \mathbb{F}_{q^m}$ for some $\mathbb{F}_{q^m}$-linear codes $B_r$, $r = 0, \ldots, d-1$, with parameters [n, r, n−r+1]. Applying Exercise 2.3.5 (1) we obtain $C \otimes \mathbb{F}_{q^m} \subseteq B_{d-1}^\perp$, so also $C \subseteq B_{d-1}^\perp$ holds. Now $B_{d-1}$ is an $\mathbb{F}_{q^m}$-linear MDS code, thus $B_{d-1}^\perp$ also is, and it has parameters [n, n−d+1, d] by Corollary 3.2.14.
4.3.3 Exercises
4.3.1 Give a proof of Remarks 4.3.10 and 4.3.14.
4.3.2 Let C be the binary [7,3,4] Simplex code. Give a parity check matrix of a [7, 4, 4] MDS code D over $\mathbb{F}_4$ that contains C as a subfield subcode.
4.3.3 ....
4.4 Extended weight enumerator
***Intro***
4.4.1 Arrangements of hyperplanes
***affine/projective arrangements***
The weight spectrum can be computed by counting points in certain configura-
tions of a set of hyperplanes.
Definition 4.4.1 Let F be a field. A hyperplane in $F^k$ is the set of solutions in $F^k$ of a given linear equation
\[
a_1 X_1 + \cdots + a_k X_k = b,
\]
where $a_1, \ldots, a_k$ and b are elements of F such that not all the $a_i$ are zero. The hyperplane is called homogeneous if the equation is homogeneous, that is b = 0.

Remark 4.4.2 The equations $a_1 X_1 + \cdots + a_k X_k = b$ and $a_1' X_1 + \cdots + a_k' X_k = b'$ define the same hyperplane if and only if $(a_1', \ldots, a_k', b') = \lambda(a_1, \ldots, a_k, b)$ for some nonzero $\lambda \in F$.
Definition 4.4.3 An n-tuple $(H_1, \ldots, H_n)$ of hyperplanes in $F^k$ is called an arrangement in $F^k$. The arrangement is called simple if all the n hyperplanes are mutually distinct. The arrangement is called central if all the hyperplanes are linear subspaces. A central arrangement is called essential if the intersection of all its hyperplanes is equal to {0}.
Remark 4.4.4 A central arrangement of hyperplanes in $F^{r+1}$ gives rise to an arrangement of hyperplanes in $\mathbb{P}^r(F)$, since the defining equations are homogeneous. The arrangement is essential if the intersection of all its hyperplanes is empty in $\mathbb{P}^r(F)$. The dual notion of an arrangement in projective space is a projective system.
Definition 4.4.5 Let $G = (g_{ij})$ be a generator matrix of a nondegenerate code C of dimension k. So G has no zero columns. Let $H_j$ be the linear hyperplane in $\mathbb{F}_q^k$ with equation
\[
g_{1j} X_1 + \cdots + g_{kj} X_k = 0.
\]
The arrangement $(H_1, \ldots, H_n)$ associated with G will be denoted by $\mathcal{A}_G$.
Remark 4.4.6 Let G be a generator matrix of a code C. Then the rank of G is equal to the number of rows of G. Hence the arrangement $\mathcal{A}_G$ is essential. A code C is projective if and only if $d(C^\perp) \ge 3$, if and only if $\mathcal{A}_G$ is simple. Similarly as in Definition 4.3.17 on equivalent projective systems one defines the equivalence of the dual notion, that is of essential arrangements of hyperplanes in $\mathbb{P}^r(F)$. Then there is a one-to-one correspondence between generalized equivalence classes of nondegenerate [n, k, d] codes over $\mathbb{F}_q$ and equivalence classes of essential arrangements of n hyperplanes in $\mathbb{P}^{k-1}(\mathbb{F}_q)$, as in Proposition 4.3.18.
Example 4.4.7 Consider the matrix G given by
\[
G = \begin{pmatrix}
1 & 0 & 0 & 0 & 1 & 1 & 1 \\
0 & 1 & 0 & 1 & 0 & 1 & 1 \\
0 & 0 & 1 & 1 & 1 & 0 & 1
\end{pmatrix}.
\]
Let C be the code over $\mathbb{F}_q$ with generator matrix G. For q = 2, this is the simplex code $S_3(2)$. The columns of G also represent the coefficients of the lines of $\mathcal{A}_G$. The projective picture of $\mathcal{A}_G$ is given in Figure 4.1.
Proposition 4.4.8 Let C be a nondegenerate code with generator matrix G. Let $c = xG$ be the codeword corresponding to $x \in \mathbb{F}_q^k$. Then
\[
\mathrm{wt}(c) = n - \text{(number of hyperplanes in } \mathcal{A}_G \text{ through } x).
\]
Proof. Now c = xG, so $c_j = g_{1j}x_1 + \cdots + g_{kj}x_k$. Hence $c_j = 0$ if and only if x lies on the hyperplane $H_j$. The result follows, since the weight of c is equal to n minus the number of positions j such that $c_j = 0$.
Remark 4.4.9 The number $A_w$ of codewords of weight w equals the number of points that are on exactly n − w of the hyperplanes in $\mathcal{A}_G$, by Proposition 4.4.8. In particular $A_n$ is equal to the number of points in the complement of the union of these hyperplanes in $\mathbb{F}_q^k$. This number can be computed by the principle of inclusion/exclusion:
\[
A_n = q^k - |H_1 \cup \cdots \cup H_n| = q^k + \sum_{w=1}^{n} (-1)^w \sum_{i_1 < \cdots < i_w} |H_{i_1} \cap \cdots \cap H_{i_w}|.
\]
***Figure 4.1: Arrangement of G for q odd and q even***
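Proposition 4.4.8 yields a point-counting algorithm for the weight distribution: walk through all of $\mathbb{F}_q^k$ and count the hyperplanes through each point. The sketch below does this for the arrangement of Example 4.4.7 with q = 2.

```python
from itertools import product

G = [[1, 0, 0, 0, 1, 1, 1],   # Example 4.4.7; column j gives the hyperplane
     [0, 1, 0, 1, 0, 1, 1],   # g_{1j} x_1 + g_{2j} x_2 + g_{3j} x_3 = 0
     [0, 0, 1, 1, 1, 0, 1]]
q, k, n = 2, 3, 7

A = [0] * (n + 1)
for x in product(range(q), repeat=k):
    on = sum(1 for j in range(n)
             if sum(G[i][j] * x[i] for i in range(k)) % q == 0)
    A[n - on] += 1            # wt(xG) = n - #hyperplanes through x
print(A)                      # [1, 0, 0, 0, 7, 0, 0, 0]: the [7,3,4] simplex code
```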
The following notations are introduced to find a formalism as above for the computation of the weight enumerator.

Definition 4.4.10 For a subset J of {1, 2, . . . , n} define
\[
C(J) = \{\, c \in C \mid c_j = 0 \text{ for all } j \in J \,\}, \qquad l(J) = \dim C(J),
\]
\[
B_J = q^{l(J)} - 1 \quad \text{and} \quad B_t = \sum_{|J|=t} B_J.
\]
Remark 4.4.11 The encoding map $x \mapsto xG = c$ from vectors $x \in \mathbb{F}_q^k$ to codewords gives the following isomorphism of vector spaces:
\[
\bigcap_{j \in J} H_j \cong C(J),
\]
by Proposition 4.4.8. Furthermore $B_J$ is equal to the number of nonzero codewords c that are zero at all j in J, and this is equal to the number of nonzero elements of the intersection $\bigcap_{j \in J} H_j$.
The following two lemmas about the determination of l(J) will become useful
later.
Lemma 4.4.12 Let C be a linear code with generator matrix G. Let J ⊆
{1, . . . , n} and |J| = t. Let GJ be the k × t submatrix of G consisting of the
columns of G indexed by J, and let r(J) be the rank of GJ . Then the dimension
l(J) is equal to k − r(J).
Proof. The code $C_J$ is defined in 3.1.2 by restricting the codewords of C to J. Then $G_J$ is a generator matrix of $C_J$ by Remark 3.1.3. Consider the projection map $\pi_J : C \to \mathbb{F}_q^t$ given by $\pi_J(c) = c_J$. Then $\pi_J$ is a linear map. The image of C under $\pi_J$ is $C_J$ and the kernel of $\pi_J$ is C(J) by definition. It follows that $\dim C_J + \dim C(J) = \dim C$. So l(J) = k − r(J).
Lemma 4.4.13 Let k be the dimension of C. Let d and $d^\perp$ be the minimum distances of the code C and its dual code, respectively. Then
\[
l(J) =
\begin{cases}
k - t & \text{for all } t < d^\perp, \\
0 & \text{for all } t > n - d.
\end{cases}
\]
Furthermore
\[
k - t \le l(J) \le
\begin{cases}
k - d^\perp + 1 & \text{for all } t \ge d^\perp, \\
n - d - t + 1 & \text{for all } t \le n - d.
\end{cases}
\]
Proof. (1) Let $t > n - d$ and let J be a subset of {1, . . . , n} of size t and c a codeword such that $c \in C(J)$. Then J is contained in the complement of the support of c. Hence $t \le n - \mathrm{wt}(c)$, so $\mathrm{wt}(c) \le n - t < d$. So c = 0. Therefore C(J) = 0 and l(J) = 0.
(2) Let J be a t-subset of {1, . . . , n}. Then C(J) is defined by t homogeneous linear equations on the vector space C of dimension k. So $l(J) \ge k - t$.
(3) The matrix G is a parity check matrix for the dual code, by (2) of Corollary 2.3.29. Now suppose that $t < d^\perp$. Then any t columns of G are independent, by Proposition 2.3.11. So l(J) = k − t for all t-subsets J of {1, . . . , n} by Lemma 4.4.12.
(4) Assume that $t \le n - d$. Let J be a t-subset. Let $t' = n - d + 1$. Choose a t′-subset J′ such that $J \subseteq J'$. Then
\[
C(J') = \{\, c \in C(J) \mid c_j = 0 \text{ for all } j \in J' \setminus J \,\}.
\]
Now l(J′) = 0 by (1). Hence C(J′) = 0 and C(J′) is obtained from C(J) by imposing $|J' \setminus J| = n - d - t + 1$ linear homogeneous equations. Hence $l(J) = \dim C(J) \le n - d - t + 1$.
(5) Assume that $d^\perp \le t$. Let J be a t-subset. Let $t' = d^\perp - 1$. Choose a t′-subset J′ such that $J' \subseteq J$. Then $l(J') = k - d^\perp + 1$ by (3) and $l(J) \le l(J')$, since $J' \subseteq J$. Hence $l(J) \le k - d^\perp + 1$.
Remark 4.4.14 Notice that $d^\perp \le n - (n-k) + 1$ and $n - d \le k - 1$ by the Singleton bound. So for t = k both cases of Lemma 4.4.13 apply and both give l(J) = 0.
Proposition 4.4.15 Let k be the dimension of C. Let d and $d^\perp$ be the minimum distances of the code C and its dual code, respectively. Then
\[
B_t =
\begin{cases}
\binom{n}{t}(q^{k-t} - 1) & \text{for all } t < d^\perp, \\
0 & \text{for all } t > n - d.
\end{cases}
\]
Furthermore
\[
\binom{n}{t}(q^{k-t} - 1) \le B_t \le \binom{n}{t}\bigl(q^{\min\{n-d-t+1,\, k-d^\perp+1\}} - 1\bigr)
\]
for all $d^\perp \le t \le n - d$.
Proof. This is a direct consequence of Lemma 4.4.13 and the definition of Bt.
Proposition 4.4.16 The following formula holds:
\[
B_t = \sum_{w=d}^{n-t} \binom{n-w}{t} A_w.
\]
Proof. This is shown by computing the number of elements of the set of pairs
\[
\mathcal{B}_t = \{\, (J, c) \mid J \subseteq \{1, 2, \ldots, n\},\ |J| = t,\ c \in C(J),\ c \neq 0 \,\}
\]
in two different ways, as in Lemma 4.1.19.
For fixed J, the number of these pairs is equal to $B_J$, by definition.
If we fix the weight w of a nonzero codeword c in C, then the number of zero entries of c is n − w, and if $c \in C(J)$, then J is contained in the complement of the support of c, so there are $\binom{n-w}{t}$ possible choices for such a J. In this way we get the right-hand side of the formula.
Theorem 4.4.17 The homogeneous weight enumerator of C can be expressed in terms of the $B_t$ as follows:
\[
W_C(X, Y) = X^n + \sum_{t=0}^{n} B_t (X - Y)^t Y^{n-t}.
\]
Proof. Now
\[
X^n + \sum_{t=0}^{n} B_t (X-Y)^t Y^{n-t} = X^n + \sum_{t=0}^{n-d} B_t (X-Y)^t Y^{n-t},
\]
since $B_t = 0$ for all $t > n - d$ by Proposition 4.4.15. Substituting the formula for $B_t$ of Proposition 4.4.16, interchanging the order of summation in the double sum, and applying the binomial expansion of $((X-Y)+Y)^{n-w}$ gives that the above formula is equal to
\[
X^n + \sum_{t=0}^{n-d} \sum_{w=d}^{n-t} \binom{n-w}{t} A_w (X-Y)^t Y^{n-t}
= X^n + \sum_{w=d}^{n} A_w \left( \sum_{t=0}^{n-w} \binom{n-w}{t} (X-Y)^t Y^{n-w-t} \right) Y^w
\]
\[
= X^n + \sum_{w=d}^{n} A_w X^{n-w} Y^w = W_C(X, Y).
\]
Proposition 4.4.18 Let $A_0, \ldots, A_n$ be the weight spectrum of a code of minimum distance d. Then $A_0 = 1$, $A_w = 0$ if $0 < w < d$, and
\[
A_w = \sum_{t=n-w}^{n-d} (-1)^{n+w+t} \binom{t}{n-w} B_t \quad \text{if } d \le w \le n.
\]
Proof. This identity is proved by inverting the argument of the proof of the formula of Theorem 4.4.17 and using the binomial expansion of $(X-Y)^t$. This is left as an exercise. An alternative proof is given by the principle of inclusion/exclusion. A third proof can be obtained by using Proposition 4.4.16. A fourth proof is obtained by showing that the transformations of the $B_t$ into the $A_w$ and vice versa, given by the linear maps of Propositions 4.4.16 and 4.4.18, are each other's inverse. See Exercise 4.4.5.
Example 4.4.19 Consider the [7, 4, 3] Hamming code as in Examples 2.2.14 and ??. Then its dual is the [7, 3, 4] simplex code. Hence d = 3 and $d^\perp = 4$. So $B_t = \binom{7}{t}(2^{4-t} - 1)$ for all t < 4 and $B_t = 0$ for all t > 4 by Proposition 4.4.15. Of the 35 subsets J of size 4 there are exactly 7 with l(J) = 1, and l(J) = 0 for the 28 remaining subsets, by Exercise 2.3.4. Therefore $B_4 = 7(2^1 - 1) = 7$. To find the $A_w$ we apply Proposition 4.4.18:
\[
B_0 = 15, \quad B_1 = 49, \quad B_2 = 63, \quad B_3 = 35, \quad B_4 = 7,
\]
\[
\begin{aligned}
A_3 &= B_4 = 7,\\
A_4 &= B_3 - 4B_4 = 7,\\
A_5 &= B_2 - 3B_3 + 6B_4 = 0,\\
A_6 &= B_1 - 2B_2 + 3B_3 - 4B_4 = 0,\\
A_7 &= B_0 - B_1 + B_2 - B_3 + B_4 = 1.
\end{aligned}
\]
This is in agreement with Example 4.1.6.
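The computation of this example can be mechanized: $l(J) = k - r(G_J)$ is a rank computation over $\mathbb{F}_2$, and Proposition 4.4.18 converts the $B_t$ into the $A_w$. A sketch; the generator matrix is a standard choice of the [7, 4, 3] Hamming code, equivalent to the book's.

```python
from itertools import combinations
from math import comb

def rank_f2(rows):
    """Rank of a binary matrix, by Gaussian elimination over F_2."""
    rows, rk = [list(r) for r in rows], 0
    for col in range(len(rows[0]) if rows else 0):
        piv = next((r for r in range(rk, len(rows)) if rows[r][col]), None)
        if piv is None:
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][col]:
                rows[r] = [(a + b) % 2 for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk

G = [[1, 0, 0, 0, 0, 1, 1],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1, 0],
     [0, 0, 0, 1, 1, 1, 1]]      # [7,4,3] Hamming code
k, n, q, d = 4, 7, 2, 3

B = [sum(q**(k - rank_f2([[row[j] for j in J] for row in G])) - 1
         for J in combinations(range(n), t)) for t in range(n + 1)]
print(B[:5])   # [15, 49, 63, 35, 7]

A = {w: sum((-1)**(n + w + t) * comb(t, n - w) * B[t]
            for t in range(n - w, n - d + 1)) for w in range(d, n + 1)}
print(A)       # {3: 7, 4: 7, 5: 0, 6: 0, 7: 1}
```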
4.4.2 Weight distribution of MDS codes
***
Definition 4.4.20 Let C be a code of length n, minimum distance d and dual minimum distance $d^\perp$. The genus of C is defined by $g(C) = \max\{n+1-k-d,\ k+1-d^\perp\}$. ***Transfer to end of 3.2.1***
***diagram of (un)known values of Bt(T).****
Remark 4.4.21 The $B_t$ are known as functions of the parameters $[n, k]_q$ of the code for all $t < d^\perp$ and for all $t > n - d$. So $B_t$ is unknown only for the $n - d - d^\perp + 1$ values of t such that $d^\perp \le t \le n - d$. In particular the weight enumerator of an MDS code is completely determined by the parameters $[n, k]_q$ of the code, since then $d = n - k + 1$ and $d^\perp = k + 1$, so that $n - d - d^\perp + 1 < 0$ and this range of t is empty.
Proposition 4.4.22 The weight distribution of an MDS code of length n and dimension k is given by
\[
A_w = \binom{n}{w} \sum_{j=0}^{w-d} (-1)^j \binom{w}{j} \bigl(q^{w-d+1-j} - 1\bigr)
\]
for $w \ge d = n - k + 1$.
Proof. Let C be an [n, k, n−k+1] MDS code. Then its dual is also an MDS code, with parameters [n, n−k, k+1], by Proposition 3.2.7. Then $B_t = \binom{n}{t}(q^{k-t} - 1)$ for all $t < d^\perp = k+1$ and $B_t = 0$ for all $t > n - d = k - 1$ by Proposition 4.4.15. Hence
\[
A_w = \sum_{t=n-w}^{n-d} (-1)^{n+w+t} \binom{t}{n-w} \binom{n}{t} \bigl(q^{k-t} - 1\bigr)
\]
by Proposition 4.4.18. Make the substitution j = t − n + w. Then the summation is from j = 0 to j = w − d. Furthermore
\[
\binom{t}{n-w} \binom{n}{t} = \binom{n}{w} \binom{w}{j}.
\]
This gives the formula for $A_w$.
Remark 4.4.23 Let C be an [n, k, n − k + 1] MDS code. Then the number of nonzero codewords of minimal weight is
\[
A_d = \binom{n}{d}(q-1),
\]
according to Proposition 4.4.22. This is in agreement with Remark 3.2.15.
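Proposition 4.4.22 makes the weight distribution of an MDS code a one-line computation. The sketch below evaluates it for a [6, 3, 4] MDS code over $\mathbb{F}_5$ (such codes exist, e.g. extended Reed-Solomon codes) and checks the two invariants $\sum_w A_w = q^k$ and $A_d = \binom{n}{d}(q-1)$.

```python
from math import comb

def mds_weight_distribution(n, k, q):
    """A_w of an [n, k, n-k+1] MDS code, by Proposition 4.4.22."""
    d = n - k + 1
    A = [0] * (n + 1)
    A[0] = 1
    for w in range(d, n + 1):
        A[w] = comb(n, w) * sum((-1)**j * comb(w, j) * (q**(w - d + 1 - j) - 1)
                                for j in range(w - d + 1))
    return A

A = mds_weight_distribution(6, 3, 5)
print(A)                                             # [1, 0, 0, 0, 60, 24, 40]
print(sum(A) == 5**3, A[4] == comb(6, 4) * (5 - 1))  # True True
```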
Remark 4.4.24 The trivial codes with parameters [n, n, 1] and [n, 0, n+1], and
the repetition code and its dual with parameters [n, 1, n] and [n, n − 1, 2] are
MDS codes of arbitrary length. But the length is bounded if 2 ≤ k according
to the following proposition.
Proposition 4.4.25 Let C be an MDS code over $\mathbb{F}_q$ of length n and dimension k. If $k \ge 2$, then $n \le q + k - 1$.

Proof. Let C be an [n, k, n−k+1] code with $2 \le k$. Then $d + 1 = n - k + 2 \le n$ and
\[
A_{d+1} = \binom{n}{d+1} \bigl( (q^2 - 1) - (d+1)(q-1) \bigr) = \binom{n}{d+1} (q-1)(q-d)
\]
by Proposition 4.4.22. This implies that $d \le q$, since $A_{d+1} \ge 0$. Now $n = d + k - 1 \le q + k - 1$.
Remark 4.4.26 Proposition 4.4.25 also holds for nonlinear codes. That is: if there exists an $(n, q^k, n-k+1)$ code with $k \ge 2$, then $d = n - k + 1 \le q$. This is proved by means of orthogonal arrays by Bush, as we will see in Section 5.5.1.
Corollary 4.4.27 (Bush bound) Let C be an MDS code over $\mathbb{F}_q$ of length n and dimension k. If $k \ge q$, then $n \le k + 1$.

Proof. If $n > k + 1$, then $C^\perp$ is an MDS code of dimension $n - k \ge 2$. Hence $n \le q + (n - k) - 1$ by Proposition 4.4.25. Therefore $k < q$.
Remark 4.4.28 The length of the repetition code is arbitrarily long. The length n of a q-ary MDS code of dimension k is at most q + k − 1 if $2 \le k$, by Proposition 4.4.25. In particular the maximal length of an MDS code is a function of k and q, if $k \ge 2$.
Definition 4.4.29 Let k ≥ 2. Let m(k, q) be the maximal length of an MDS
code over Fq of dimension k.
Remark 4.4.30 So m(k, q) ≤ k + q − 1 if 2 ≤ k, and m(k, q) ≤ k + 1 if k ≥ q
by the Bush bound.
We have seen that m(k, q) is at least q + 1 for all k and q in Proposition 3.2.10.
Let C be an [n, 2, n−1] code. Then C is systematic at the first two positions, so we may assume that its generator matrix G is of the form
\[
G = \begin{pmatrix}
1 & 0 & x_3 & x_4 & \cdots & x_n \\
0 & 1 & y_3 & y_4 & \cdots & y_n
\end{pmatrix}.
\]
The weight of all codewords is at least n − 1. Hence $x_j \neq 0$ and $y_j \neq 0$ for all $3 \le j \le n$. The code is generalized equivalent with a code with $x_j = 1$, after dividing the j-th coordinate by $x_j$ for $j \ge 3$. Let $g_i$ be the i-th row of G. If $3 \le j < l \le n$ and $y_j = y_l$, then $g_2 - y_j g_1$ is a codeword of weight at most n − 2, which is a contradiction. So the $y_j$ are n − 2 mutually distinct nonzero elements of $\mathbb{F}_q$, hence $n - 2 \le q - 1$. Therefore m(2, q) = q + 1. Dually we get m(q − 1, q) = q + 1.
If q is even, then m(3, q) is at least q + 2 by Example 3.2.12, and dually m(q − 1, q) ≥ q + 2.
Later it will be shown in Proposition 13.5.1 that these values are in fact optimal.
Remark 4.4.31 The MDS conjecture states that for a nontrivial [n, k, n−k+1] MDS code over $\mathbb{F}_q$ we have $n \le q + 2$ if q is even and k = 3 or k = q − 1, and $n \le q + 1$ in all other cases. So it is conjectured that
\[
m(k, q) =
\begin{cases}
q + 1 & \text{if } 2 \le k \le q, \\
k + 1 & \text{if } q < k,
\end{cases}
\]
except when q is even and k = 3 or k = q − 1, in which case
\[
m(3, q) = m(q-1, q) = q + 2.
\]
4.4.3 Extended weight enumerator
Definition 4.4.32 Let $\mathbb{F}_{q^m}$ be the extension field of $\mathbb{F}_q$ of degree m. Let C be an $\mathbb{F}_q$-linear code of length n. The extension by scalars of C to $\mathbb{F}_{q^m}$ is the $\mathbb{F}_{q^m}$-linear subspace of $\mathbb{F}_{q^m}^n$ generated by C, and will be denoted by $C \otimes \mathbb{F}_{q^m}$.
Remark 4.4.33 Let G be a generator matrix of the code C of length n over $\mathbb{F}_q$. Then G is also a generator matrix of the $\mathbb{F}_{q^m}$-linear code $C \otimes \mathbb{F}_{q^m}$. The dimension l(J) is equal to k − r(J) by Lemma 4.4.12, where r(J) is the rank of the k × t submatrix $G_J$ of G consisting of the t columns indexed by J. This rank is equal to the number of pivots of $G_J$, so this rank does not change under an extension of $\mathbb{F}_q$ to $\mathbb{F}_{q^m}$. So
\[
\dim_{\mathbb{F}_{q^m}} (C \otimes \mathbb{F}_{q^m})(J) = \dim_{\mathbb{F}_q} C(J).
\]
Hence the numbers $B_J(q^m)$ and $B_t(q^m)$ of the code $C \otimes \mathbb{F}_{q^m}$ are equal to
\[
B_J(q^m) = q^{m \cdot l(J)} - 1 \quad \text{and} \quad B_t(q^m) = \sum_{|J|=t} B_J(q^m).
\]
This motivates considering $q^m$ as a variable in the following definitions.
Definition 4.4.34 Let C be an $\mathbb{F}_q$-linear code of length n. Define
\[
B_J(T) = T^{l(J)} - 1 \quad \text{and} \quad B_t(T) = \sum_{|J|=t} B_J(T).
\]
The extended weight enumerator is defined by
\[
W_C(X, Y, T) = X^n + \sum_{t=0}^{n-d} B_t(T)(X - Y)^t Y^{n-t}.
\]
Proposition 4.4.35 Let d and $d^\perp$ be the minimum distances of the code and the dual code, respectively. Then
\[
B_t(T) =
\begin{cases}
\binom{n}{t}(T^{k-t} - 1) & \text{for all } t < d^\perp, \\
0 & \text{for all } t > n - d.
\end{cases}
\]
Proof. This is a direct consequence of Lemma 4.4.13 and the definition of Bt.
Theorem 4.4.36 The extended weight enumerator of a linear code of length n and minimum distance d can be expressed as a homogeneous polynomial in X and Y of degree n with coefficients $A_w(T)$ that are integral polynomials in T:
\[
W_C(X, Y, T) = \sum_{w=0}^{n} A_w(T) X^{n-w} Y^w,
\]
where $A_0(T) = 1$, $A_w(T) = 0$ if $0 < w < d$, and
\[
A_w(T) = \sum_{t=n-w}^{n-d} (-1)^{n+w+t} \binom{t}{n-w} B_t(T) \quad \text{if } d \le w \le n.
\]
Proof. The proof is similar to the proof of Proposition 4.4.18 and is left as an exercise.
Remark 4.4.37 The definition of $A_w(T)$ is consistent with the fact that $A_w(q^m)$ is the number of codewords of weight w in $C \otimes \mathbb{F}_{q^m}$ and
\[
W_C(X, Y, q^m) = \sum_{w=0}^{n} A_w(q^m) X^{n-w} Y^w = W_{C \otimes \mathbb{F}_{q^m}}(X, Y)
\]
by Proposition 4.4.18 and Theorem 4.4.36.
Proposition 4.4.38 The following formula holds:
\[
B_t(T) = \sum_{w=d}^{n-t} \binom{n-w}{t} A_w(T).
\]
Proof. This is left as an exercise.
Remark 4.4.39 Using Theorem 4.4.36 it is immediate to find the weight dis-
tribution of a code over any extension Fqm if one knows the l(J) over the ground
field Fq for all subsets J of {1, . . . , n}. Computing the C(J) and l(J) for a fixed
J is just linear algebra. The large complexity for the computation of the weight
enumerator and the minimum distance in this way stems from the exponential
growth of the number of all possible subsets of {1, . . . , n}.
Example 4.4.40 Consider the [7, 4, 3] Hamming code as in Example 4.4.19, but now over all extensions of the binary field. Then $B_t(T) = \binom{7}{t}(T^{4-t} - 1)$ for all t < 4 and $B_t(T) = 0$ for all t > 4 by Proposition 4.4.35, and $B_4(T) = 7(T - 1)$ by the count of Example 4.4.19. To find the $A_w(T)$ we apply Theorem 4.4.36:
\[
\begin{aligned}
A_3(T) &= B_4(T) = 7(T-1),\\
A_4(T) &= B_3(T) - 4B_4(T) = 7(T-1),\\
A_5(T) &= B_2(T) - 3B_3(T) + 6B_4(T) = 21(T-1)(T-2),\\
A_6(T) &= B_1(T) - 2B_2(T) + 3B_3(T) - 4B_4(T) = 7(T-1)(T-2)(T-3),\\
A_7(T) &= B_0(T) - B_1(T) + B_2(T) - B_3(T) + B_4(T) = T^4 - 7T^3 + 21T^2 - 28T + 13.
\end{aligned}
\]
***factorize, example 4.1.8***
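The same rank computations give the $A_w(T)$ as polynomials in T: for each t, count how many t-subsets J attain each value of l(J), and apply Theorem 4.4.36 coefficientwise. The sketch below reuses `rank_f2` and `G` from the sketch after Example 4.4.19 and reproduces the table above.

```python
from itertools import combinations
from math import comb
# Reuses rank_f2 and G from the sketch after Example 4.4.19.
k, n, d = 4, 7, 3

def Bt_poly(t):
    """B_t(T) as an ascending coefficient list, via B_J(T) = T^{l(J)} - 1."""
    p = [0] * (k + 1)
    for J in combinations(range(n), t):
        l = k - rank_f2([[row[j] for j in J] for row in G])
        p[l] += 1
        p[0] -= 1
    return p

for w in range(d, n + 1):
    Aw = [0] * (k + 1)
    for t in range(n - w, n - d + 1):
        c = (-1)**(n + w + t) * comb(t, n - w)
        for i, bi in enumerate(Bt_poly(t)):
            Aw[i] += c * bi
    print(f"A_{w}(T), ascending coefficients:", Aw)
# A_3: [-7, 7, 0, 0, 0]        = 7(T - 1)
# A_7: [13, -28, 21, -7, 1]    = T^4 - 7T^3 + 21T^2 - 28T + 13
```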
The following description of the extended weight enumerator of a code will be useful.

Proposition 4.4.41 The extended weight enumerator of a code of length n can be written as
\[
W_C(X, Y, T) = \sum_{t=0}^{n} \sum_{|J|=t} T^{l(J)} (X - Y)^t Y^{n-t}.
\]
Proof. By rewriting $((X-Y)+Y)^n$, we get
\[
\sum_{t=0}^{n} \sum_{|J|=t} T^{l(J)} (X-Y)^t Y^{n-t}
= \sum_{t=0}^{n} (X-Y)^t Y^{n-t} \sum_{|J|=t} \bigl((T^{l(J)} - 1) + 1\bigr)
\]
\[
= \sum_{t=0}^{n} (X-Y)^t Y^{n-t} \left( \binom{n}{t} + \sum_{|J|=t} (T^{l(J)} - 1) \right)
= \sum_{t=0}^{n} \binom{n}{t} (X-Y)^t Y^{n-t} + \sum_{t=0}^{n} B_t(T) (X-Y)^t Y^{n-t}
\]
\[
= X^n + \sum_{t=0}^{n} B_t(T) (X-Y)^t Y^{n-t} = W_C(X, Y, T).
\]
***Examples, repetition code, Hamming, simplex, Golay, MDS code***
***MacWilliams identity***
4.4.4 Puncturing and shortening
There are several ways to get new codes from existing ones. In this section, we
will focus on puncturing and shortening of codes and show how they are used in
an alternative algorithm for finding the extended weight enumerator. The algo-
rithm is based on the Tutte-Grothendieck decomposition of matrices introduced
by Brylawski [31]. Greene [59] used this decomposition for the determination of
the weight enumerator.
Let C be a linear [n, k] code and let J ⊆ {1, . . . , n}. Then the code C punctured
by J is obtained by deleting all the coordinates indexed by J from the codewords
of C. The length of this punctured code is n − |J| and the dimension is at most
k. Let C be a linear [n, k] code and let J ⊆ {1, . . . , n}. If we puncture the code
C(J) by J, we get the code C shortened by J. The length of this shortened
code is n − |J| and the dimension is l(J).
The operations of puncturing and shortening a code are each other's dual: puncturing a code $C$ by $J$ and then taking the dual gives the same code as shortening $C^\perp$ by $J$.
We have seen that we can determine the extended weight enumerator of an $[n, k]$ code $C$ with the use of a $k \times n$ generator matrix of $C$. This concept can be generalized to arbitrary matrices, not necessarily of full rank.
Definition 4.4.42 Let F be a field. Let G be a k × n matrix over F, possibly
of rank smaller than k and with zero columns. Then for each J ⊆ {1, . . . , n} we
define
\[ l(J) = l(J, G) = k - r(G_J) \]
as in Lemma 7.4.37. Define the extended weight enumerator $W_G(X, Y, T)$ as in
Definition 4.4.34.
We can now make the following remarks about $W_G(X, Y, T)$.
Proposition 4.4.43 Let $G$ be a $k \times n$ matrix over $\mathbb{F}$ and $W_G(X, Y, T)$ the associated extended weight enumerator. Then the following statements hold:
(i) $W_G(X,Y,T)$ is invariant under row equivalence of matrices.
(ii) Let $G'$ be an $l \times n$ matrix with the same row space as $G$; then we have $W_G(X,Y,T) = T^{k-l}\, W_{G'}(X,Y,T)$. In particular, if $G$ is a generator matrix of an $[n,k]$ code $C$, we have $W_G(X,Y,T) = W_C(X,Y,T)$.
(iii) $W_G(X,Y,T)$ is invariant under permutation of the columns of $G$.
(iv) $W_G(X,Y,T)$ is invariant under multiplying a column of $G$ with an element of $\mathbb{F}^*$.
(v) If $G$ is the direct sum of $G_1$ and $G_2$, i.e. of the form
\[ \begin{pmatrix} G_1 & 0 \\ 0 & G_2 \end{pmatrix}, \]
then $W_G(X,Y,T) = W_{G_1}(X,Y,T) \cdot W_{G_2}(X,Y,T)$.
Proof. (i) If we multiply $G$ from the left with an invertible $k \times k$ matrix, the $r(G_J)$ do not change, and therefore (i) holds.
For (ii), we may assume without loss of generality that $k \ge l$. Because $G$ and $G'$ have the same row space, the ranks $r(G_J)$ and $r(G'_J)$ are the same. So $l(J, G) = k - l + l(J, G')$. Using Proposition 4.4.41 we have for $G$
\begin{align*}
W_G(X,Y,T) &= \sum_{t=0}^{n} \sum_{|J|=t} T^{l(J,G)} (X-Y)^t Y^{n-t} \\
&= \sum_{t=0}^{n} \sum_{|J|=t} T^{k-l+l(J,G')} (X-Y)^t Y^{n-t} \\
&= T^{k-l} \sum_{t=0}^{n} \sum_{|J|=t} T^{l(J,G')} (X-Y)^t Y^{n-t} \\
&= T^{k-l}\, W_{G'}(X,Y,T).
\end{align*}
The last part of (ii) and (iii)–(v) follow directly from the definitions.
With the use of the extended weight enumerator for general matrices, we can
derive a recursive algorithm to determine the extended weight enumerator of a
code. Let $G$ be a $k \times n$ matrix with entries in $\mathbb{F}$. Suppose that the $j$-th column is not the zero vector. Then there exists a matrix row-equivalent to $G$ such that the $j$-th column is of the form $(1, 0, \ldots, 0)^T$. Such a matrix is called reduced at the $j$-th column. In general, this reduction is not unique.
Let $G$ be a matrix that is reduced at the $j$-th column $a$. The matrix $G \setminus a$ is the $k \times (n-1)$ matrix $G$ with the column $a$ removed, and $G/a$ is the $(k-1) \times (n-1)$ matrix $G$ with the column $a$ and the first row removed. We can view $G \setminus a$ as $G$ punctured by $a$, and $G/a$ as $G$ shortened by $a$.
For the extended weight enumerators of these matrices, we have the following connection (we omit the $(X, Y, T)$ arguments for brevity):
Proposition 4.4.44 Let $G$ be a $k \times n$ matrix that is reduced at the $j$-th column $a$. For the extended weight enumerator of the reduced matrix $G$ the following holds:
\[ W_G = (X-Y)\, W_{G/a} + Y\, W_{G \setminus a}. \]
Proof. We distinguish between two cases here. First, assume that $G \setminus a$ and $G/a$ have the same rank. Then we can choose a $G$ with all zeros in the first row, except for the 1 in the column $a$. So $G$ is the direct sum of $1$ and $G/a$. By Proposition 4.4.43 parts (v) and (ii) we have
\[ W_G = (X + (T-1)Y)\, W_{G/a} \quad \text{and} \quad W_{G \setminus a} = T\, W_{G/a}. \]
Combining the two gives
\begin{align*}
W_G &= (X + (T-1)Y)\, W_{G/a} \\
&= (X-Y)\, W_{G/a} + Y\, T\, W_{G/a} \\
&= (X-Y)\, W_{G/a} + Y\, W_{G \setminus a}.
\end{align*}
For the second case, assume that $G \setminus a$ and $G/a$ do not have the same rank. So $r(G \setminus a) = r(G/a) + 1$. This implies that $G$ and $G \setminus a$ do have the same rank. By Proposition 4.4.41 we have
\[ W_G(X,Y,T) = \sum_{t=0}^{n} \sum_{|J|=t} T^{l(J,G)} (X-Y)^t Y^{n-t}. \]
This double sum splits into two parts by distinguishing between the cases $j \in J$ and $j \notin J$.
Let $j \in J$, $t = |J|$, $J' = J \setminus \{j\}$ and $t' = |J'| = t - 1$. Then
\[ l(J', G/a) = k - 1 - r((G/a)_{J'}) = k - r(G_J) = l(J, G). \]
So the first part is equal to
\[ \sum_{t=0}^{n} \sum_{\substack{|J|=t \\ j \in J}} T^{l(J,G)} (X-Y)^t Y^{n-t} = \sum_{t'=0}^{n-1} \sum_{|J'|=t'} T^{l(J',G/a)} (X-Y)^{t'+1} Y^{n-1-t'}, \]
which is equal to $(X-Y)\, W_{G/a}$.
Let $j \notin J$. Then $(G \setminus a)_J = G_J$. So $l(J, G \setminus a) = l(J, G)$. Hence the second part is equal to
\[ \sum_{t=0}^{n} \sum_{\substack{|J|=t \\ j \notin J}} T^{l(J,G)} (X-Y)^t Y^{n-t} = Y \sum_{t=0}^{n-1} \sum_{\substack{|J|=t \\ j \notin J}} T^{l(J, G \setminus a)} (X-Y)^t Y^{n-1-t}, \]
which is equal to $Y\, W_{G \setminus a}$.
Theorem 4.4.45 Let $G$ be a $k \times n$ matrix over $\mathbb{F}$ with $n > k$ of the form $G = (I_k \mid P)$, where $P$ is a $k \times (n-k)$ matrix over $\mathbb{F}$. Let $A \subseteq [k]$ and write $P_A$ for the matrix formed by the rows of $P$ indexed by $A$. Let $W_A(X,Y,T) = W_{P_A}(X,Y,T)$. Then the following holds:
\[ W_C(X,Y,T) = \sum_{l=0}^{k} \sum_{|A|=l} Y^l (X-Y)^{k-l}\, W_A(X,Y,T). \]
Proof. We use the formula of the last proposition recursively. We denote the construction of $G \setminus a$ by $G_1$ and the construction of $G/a$ by $G_2$. Repeating this procedure, we get the matrices $G_{11}$, $G_{12}$, $G_{21}$ and $G_{22}$. So we get for the weight enumerator
\[ W_G = Y^2 W_{G_{11}} + Y(X-Y) W_{G_{12}} + Y(X-Y) W_{G_{21}} + (X-Y)^2 W_{G_{22}}. \]
Repeating this procedure $k$ times, we get $2^k$ matrices with $n-k$ columns and $0, \ldots, k$ rows, which form exactly the $P_A$. The diagram below shows the sizes of the matrices in the first two steps; note that only the $k \times n$ matrix on top has to be of full rank. The number of matrices of size $(k-i) \times (n-j)$ is given by the binomial coefficient $\binom{j}{i}$.
\[ \begin{array}{ccc} k \times n & & \\ k \times (n-1) & (k-1) \times (n-1) & \\ k \times (n-2) & (k-1) \times (n-2) & (k-2) \times (n-2) \end{array} \]
On the last line we have $W_0(X,Y,T) = X^{n-k}$. This proves the formula.
Example 4.4.46 Let $C$ be the even weight code of length $n = 6$ over $\mathbb{F}_2$. Then a generator matrix of $C$ is the $5 \times 6$ matrix $G = (I_5 \mid P)$ with $P = (1, 1, 1, 1, 1)^T$. So the matrices $P_A$ are $l \times 1$ matrices with all ones. We have $W_0(X,Y,T) = X$ and $W_l(X,Y,T) = T^{l-1}(X + (T-1)Y)$ by part (ii) of Proposition 4.4.43. Therefore the extended weight enumerator of $C$ is equal to
\begin{align*}
W_C(X,Y,T) &= W_G(X,Y,T) \\
&= X(X-Y)^5 + \sum_{l=1}^{5} \binom{5}{l} Y^l (X-Y)^{5-l}\, T^{l-1} (X + (T-1)Y) \\
&= X^6 + 15(T-1)X^4Y^2 + 20(T^2-3T+2)X^3Y^3 \\
&\quad + 15(T^3-4T^2+6T-3)X^2Y^4 + 6(T^4-5T^3+10T^2-10T+4)XY^5 \\
&\quad + (T^5-6T^4+15T^3-20T^2+15T-5)Y^6.
\end{align*}
For $T = 2$ we get $W_C(X,Y,2) = X^6 + 15X^4Y^2 + 15X^2Y^4 + Y^6$, which we indeed recognize as the weight enumerator of the even weight code that we found in Example 4.1.5.
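This example can also be checked directly against Proposition 4.4.41, which only needs the ranks of column submatrices of $G$. A small sketch in Python (assuming SymPy; the rows of $G$ are encoded as bit masks) computing $W_C(X,Y,T) = \sum_J T^{l(J)}(X-Y)^{|J|}Y^{n-|J|}$:

```python
# Sketch: extended weight enumerator of the [6,5] even weight code over F_2
# via Proposition 4.4.41. l(J) = k - rank(G_J), with ranks over F_2.
from itertools import combinations
from sympy import symbols, expand

X, Y, T = symbols('X Y T')
n, k = 6, 5
rows = [(1 << i) | (1 << 5) for i in range(k)]  # G = (I_5 | all-ones column)

def rank_gf2(vectors):
    """Rank over F_2 of integer bit masks, by building an XOR basis."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)
        if v:
            basis.append(v)
    return len(basis)

W = 0
for t in range(n + 1):
    for J in combinations(range(n), t):
        mask = sum(1 << j for j in J)
        lJ = k - rank_gf2([row & mask for row in rows])
        W += T**lJ * (X - Y)**t * Y**(n - t)
print(expand(W))  # agrees with the polynomial found above
```

Substituting T = 2 in the printed polynomial again gives the binary weight enumerator.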
4.4.5 Exercises
4.4.1 Compute the extended weight enumerator of the binary simplex code
S3(2).
4.4.2 Compute the extended weight enumerators of the n-fold repetition code
and its dual.
4.4.3 Compute the extended weight enumerator of the binary Golay code.
4.4.4 Compute the extended weight enumerator of the ternary Golay code.
4.4.5 Consider the square matrices $A$ and $B$ of size $n+1$ with entries $a_{ij}$ and $b_{ij}$, respectively, given by
\[ a_{ij} = (-1)^{i+j} \binom{i}{j} \quad \text{and} \quad b_{ij} = \binom{i}{j} \quad \text{for } 0 \le i, j \le n. \]
Show that $A$ and $B$ are inverses of each other.
4.4.6 Give a proof of Theorem 4.4.36.
4.4.7 Give a proof of Proposition 4.4.38.
4.4.8 Compare the complexity of the methods "exhaustive search" and "arrangements of hyperplanes" to compute the weight enumerator, as a function of $q$ and the parameters $[n, k, d]$ and $d^\perp$.
4.5 Generalized weight enumerator
***Intro***
4.5.1 Generalized Hamming weights
We recall that for a linear code $C$, the minimum Hamming weight is the minimal one among all Hamming weights $\mathrm{wt}(c)$ for nonzero codewords $c \ne 0$. In this subsection, we generalize this parameter to a sequence of values, the so-called generalized Hamming weights, which are useful in the study of the complexity of trellis decoding and of other properties of the code $C$.
***C nondegenerate?***
Let $D$ be a subcode of $C$. Generalizing Definition 2.2.2, we define the support of $D$, denoted by $\mathrm{supp}(D)$, as the set of positions where at least one codeword in $D$ is not zero, i.e.,
\[ \mathrm{supp}(D) = \{ i \mid \text{there exists } x \in D \text{ such that } x_i \ne 0 \}. \]
The weight of $D$, $\mathrm{wt}(D)$, is defined as the size of $\mathrm{supp}(D)$.
Suppose $C$ is an $[n, k]$ code. For any $r \le k$, the $r$-th generalized Hamming weight (GHW) of $C$ is defined as
\[ d_r(C) = \min\{ \mathrm{wt}(D) \mid D \text{ is an } r\text{-dimensional subcode of } C \}. \]
The set of GHWs $\{d_1(C), \ldots, d_k(C)\}$ is called the weight hierarchy of $C$. Note that since any 1-dimensional subcode has a nonzero codeword as its basis, the first generalized Hamming weight $d_1(C)$ is exactly equal to the minimum weight of $C$.
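As an illustration, here is a sketch in Python that computes the weight hierarchy of the binary $[7,4,3]$ Hamming code by brute force; the generator matrix used is one common choice and is an assumption of the sketch. Over $\mathbb{F}_2$ the support of a subcode is the union of the supports of any basis, so the support weight of a span is the popcount of the bitwise OR of the basis vectors.

```python
# Sketch: weight hierarchy d_1,...,d_k of the [7,4,3] Hamming code.
from itertools import combinations, product

G = [0b1000011, 0b0100101, 0b0010110, 0b0001111]  # one generator matrix
k, n = 4, 7

# All 2^k codewords as XOR combinations of the rows of G.
codewords = []
for coeffs in product([0, 1], repeat=k):
    c = 0
    for a, row in zip(coeffs, G):
        if a:
            c ^= row
    codewords.append(c)

def rank_gf2(vectors):
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)
        if v:
            basis.append(v)
    return len(basis)

for r in range(1, k + 1):
    best = n
    for tup in combinations(codewords, r):
        if rank_gf2(tup) < r:      # not an r-dimensional subcode
            continue
        support = 0
        for c in tup:
            support |= c
        best = min(best, bin(support).count('1'))
    print(f"d_{r} =", best)        # expected output: 3, 5, 6, 7
```

The output $3 < 5 < 6 < 7$ illustrates the monotonicity proved next.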
We now state several properties of generalized Hamming weights.
Proposition 4.5.1 (Monotonicity) For an $[n, k]$ code $C$, the generalized Hamming weights satisfy
\[ 1 \le d_1(C) < d_2(C) < \cdots < d_k(C) \le n. \]
Proof. For any $1 \le r \le k-1$, it is trivial to verify $1 \le d_r(C) \le d_{r+1}(C) \le n$. Let $D$ be a subcode of dimension $r+1$ such that $\mathrm{wt}(D) = d_{r+1}(C)$. We choose any index $i \in \mathrm{supp}(D)$. Consider
\[ E = \{ x \in D \mid x_i = 0 \}. \]
By Definition 3.1.13 and Proposition 3.1.15, $E$ is a shortened code of $D$, and $r \le \dim(E) \le r+1$. However, by the choice of $i$, there exists a codeword $c \in D$ with $c_i \ne 0$. Thus, $c$ cannot be a codeword of $E$. This implies that $E$ is a proper subcode of $D$, that is, $\dim(E) = r$. Now, by the definition of the GHWs, we have
\[ d_r(C) \le \mathrm{wt}(E) \le \mathrm{wt}(D) - 1 = d_{r+1}(C) - 1. \]
This proves that $d_r(C) < d_{r+1}(C)$.
Proposition 4.5.2 (Generalized Singleton Bound) For an $[n, k]$ code $C$, we have
\[ d_r(C) \le n - k + r. \]
This bound on $d_r(C)$ is a straightforward consequence of Proposition 4.5.1. When $r = 1$, we get the Singleton bound (see Theorem 3.2.1).
Let $H$ be a parity check matrix of the $[n, k]$ code $C$, which is an $(n-k) \times n$ matrix of rank $n-k$. From Proposition 2.3.11, we know that the minimum distance of $C$ is the smallest integer $d$ such that $d$ columns of $H$ are linearly dependent. We now present a generalization of this property. Let $H_i$, $1 \le i \le n$, be the column vectors of $H$. For any subset $I$ of $\{1, 2, \ldots, n\}$, let $\langle H_i \mid i \in I \rangle$ be the subspace of $\mathbb{F}_q^{n-k}$ generated by the vectors $H_i$, $i \in I$, which, for simplicity, is denoted by $V_I$.
Lemma 4.5.3 The $r$-th generalized Hamming weight of $C$ is
\[ d_r(C) = \min\{ |I| \mid \dim(V_I) \le |I| - r \}. \]
Proof. Denote $V_I^\perp = \{ x \mid x_i = 0 \text{ for } i \notin I, \text{ and } \sum_{i \in I} x_i H_i = 0 \}$. Then it is easy to see that $\dim(V_I) + \dim(V_I^\perp) = |I|$. Also, from the definition, $V_I^\perp$ is a subcode of $C$ for any $I$.
Let $D$ be a subcode of $C$ with $\dim(D) = r$ and $|\mathrm{supp}(D)| = d_r(C)$. Let $I = \mathrm{supp}(D)$. Then $D \subseteq V_I^\perp$. This implies that $\dim(V_I) = |I| - \dim(V_I^\perp) \le |I| - \dim(D) = |I| - r$. Therefore, $d_r(C) = |\mathrm{supp}(D)| = |I| \ge \min\{ |I| \mid \dim(V_I) \le |I| - r \}$.
We now prove the reverse inequality. Denote $d = \min\{ |I| \mid \dim(V_I) \le |I| - r \}$. Let $I$ be a subset of $\{1, 2, \ldots, n\}$ such that $\dim(V_I) \le |I| - r$ and $|I| = d$. Then $\dim(V_I^\perp) \ge r$. Therefore, $d_r(C) \le |\mathrm{supp}(V_I^\perp)| \le |I| = d$.
Proposition 4.5.4 (Duality) Let $C$ be an $[n, k]$ code. Then the weight hierarchy of its dual code $C^\perp$ is completely determined by the weight hierarchy of $C$, precisely:
\[ \{ d_r(C^\perp) \mid 1 \le r \le n-k \} = \{1, 2, \ldots, n\} \setminus \{ n + 1 - d_s(C) \mid 1 \le s \le k \}. \]
Proof. Consider the two sets $\{ d_r(C^\perp) \mid 1 \le r \le n-k \}$ and $\{ n + 1 - d_s(C) \mid 1 \le s \le k \}$. Both are subsets of $\{1, 2, \ldots, n\}$, and by the Monotonicity the first one has size $n-k$ and the second one has size $k$. Thus, it is sufficient to prove that these two sets are disjoint.
We now prove an equivalent fact: for any $1 \le r \le k$, the value $n + 1 - d_r(C)$ is not a generalized Hamming weight of $C^\perp$. Let $t = n - k + r - d_r(C)$.
It is sufficient to prove that $d_t(C^\perp) < n + 1 - d_r(C)$ and that for any $\delta \ge 1$, $d_{t+\delta}(C^\perp) \ne n + 1 - d_r(C)$. Let $D$ be a subcode of $C$ with $\dim(D) = r$ and $|\mathrm{supp}(D)| = d_r(C)$. There exists a parity check matrix $G$ for $C^\perp$ (which is a generator matrix for $C$) whose first $r$ rows are words in $D$ and whose last $k-r$ rows are not. The column vectors $G_i$ with $i \notin \mathrm{supp}(D)$ have their first $r$ coordinates zero. Thus $\dim \langle G_i \mid i \notin \mathrm{supp}(D) \rangle$, the column rank of the matrix $(G_i \mid i \notin \mathrm{supp}(D))$, is at most the row rank of the matrix $(R_i \mid r+1 \le i \le k)$, which is at most $k-r$, where $R_i$ is the $i$-th row vector of $G$. Let $I = \{1, 2, \ldots, n\} \setminus \mathrm{supp}(D)$. Then $|I| = n - d_r(C)$ and $\dim \langle G_i \mid i \in I \rangle \le k - r = |I| - t$. Thus, by Lemma 4.5.3, we have $d_t(C^\perp) \le |I| = n - d_r(C) < n - d_r(C) + 1$.
Next, we show $d_{t+\delta}(C^\perp) \ne n + 1 - d_r(C)$. Suppose, on the contrary, that $d_{t+\delta}(C^\perp) = n + 1 - d_r(C)$ holds for some $\delta$. Then by the definition of the generalized Hamming weights, there exist a generator matrix $H$ for $C^\perp$ (which is a parity check matrix for $C$) and $d_r(C) - 1$ positions $1 \le i_1 < \cdots < i_{d_r(C)-1} \le n$ such that the coordinates of the first $t + \delta$ rows of $H$ are all zero at these $d_r(C) - 1$ positions. Without loss of generality, we may assume these positions are exactly the last $d_r(C) - 1$ positions $n - d_r(C) + 2, \ldots, n$, and let $I = \{ n - d_r(C) + 2, \ldots, n \}$. Clearly, the last $|I|$ column vectors span a space of dimension at most $n - k - t - \delta = d_r(C) - r - \delta$. By Lemma 4.5.3, $d_s(C) \le d_r(C) - 1$, where $s = |I| - (d_r(C) - r - \delta) = r + \delta - 1 \ge r$. This contradicts the Monotonicity.
4.5.2 Generalized weight enumerators
The weight distribution is generalized in the following way. Instead of looking
at words of C, we consider all the subcodes of C of a certain dimension r.
Definition 4.5.5 Let $C$ be a linear code of length $n$. The number of subcodes with a given weight $w$ and dimension $r$ is denoted by $A_w^{(r)}$, that is,
\[ A_w^{(r)} = |\{ D \subseteq C \mid \dim D = r,\ \mathrm{wt}(D) = w \}|. \]
Together they form the $r$-th generalized weight distribution of the code. The $r$-th generalized weight enumerator $W_C^{(r)}(X,Y)$ of $C$ is the polynomial with the weight distribution as coefficients, that is,
\[ W_C^{(r)}(X,Y) = \sum_{w=0}^{n} A_w^{(r)} X^{n-w} Y^w. \]
Remark 4.5.6 From this definition it follows that $A_0^{(0)} = 1$ and $A_0^{(r)} = 0$ for all $0 < r \le k$. Furthermore, every 1-dimensional subspace of $C$ contains $q-1$ nonzero codewords, so $(q-1) A_w^{(1)} = A_w$ for $0 < w \le n$. This means we can recover the original weight enumerator by using
\[ W_C(X,Y) = W_C^{(0)}(X,Y) + (q-1)\, W_C^{(1)}(X,Y). \]
Definition 4.5.7 We introduce the following notations:
\[ [m, r]_q = \prod_{i=0}^{r-1} (q^m - q^i), \qquad \langle r \rangle_q = [r, r]_q, \qquad {k \brack r}_q = \frac{[k, r]_q}{\langle r \rangle_q}. \]
Remark 4.5.8 In Proposition 2.5.2 it is shown that the first number is equal to the number of $m \times r$ matrices of rank $r$ over $\mathbb{F}_q$. The third number is the Gaussian binomial, and it represents the number of $r$-dimensional subspaces of $\mathbb{F}_q^k$. Hence the second number is the number of bases of $\mathbb{F}_q^r$.
Definition 4.5.9 For $J \subseteq \{1, \ldots, n\}$ and an integer $r \ge 0$ we define
\[ B_J^{(r)} = |\{ D \subseteq C(J) \mid D \text{ subspace of dimension } r \}|, \qquad B_t^{(r)} = \sum_{|J|=t} B_J^{(r)}. \]
Remark 4.5.10 Note that $B_J^{(r)} = {l(J) \brack r}_q$. For $r = 0$ this gives $B_t^{(0)} = \binom{n}{t}$. So we see that in general $l(J) = 0$ does not imply $B_J^{(r)} = 0$, because ${0 \brack 0}_q = 1$. But if $r \ne 0$, we do have that $l(J) = 0$ implies $B_J^{(r)} = 0$ and $B_t^{(r)} = 0$.
Proposition 4.5.11 Let $d_r$ be the $r$-th generalized Hamming weight of $C$, and $d^\perp$ the minimum distance of the dual code $C^\perp$. Then we have
\[ B_t^{(r)} = \begin{cases} \binom{n}{t} {k-t \brack r}_q & \text{for all } t < d^\perp, \\ 0 & \text{for all } t > n - d_r. \end{cases} \]
Proof. The first case is a direct corollary of Lemma 4.4.13, since there are $\binom{n}{t}$ subsets $J \subseteq \{1, \ldots, n\}$ with $|J| = t$. The proof of the second case is analogous to the proof of the same lemma: let $|J| = t$ with $t > n - d_r$, and suppose there is a subspace $D \subseteq C(J)$ of dimension $r$. Then $J$ is contained in the complement of $\mathrm{supp}(D)$, so $t \le n - \mathrm{wt}(D)$. It follows that $\mathrm{wt}(D) \le n - t < d_r$, which is impossible, so such a $D$ does not exist. So $B_J^{(r)} = 0$ for all $J$ with $|J| = t$ and $t > n - d_r$, and therefore $B_t^{(r)} = 0$ for $t > n - d_r$.
We can check that the formula is well-defined: if $t < d^\perp$ then $l(J) = k - t$. If also $t > n - d_r$, we have $t > n - d_r \ge k - r$ by the generalized Singleton bound. This implies $r > k - t = l(J)$, so ${k-t \brack r}_q = 0$.
The relation between $B_t^{(r)}$ and $A_w^{(r)}$ becomes clear in the next proposition.
Proposition 4.5.12 The following formula holds:
\[ B_t^{(r)} = \sum_{w=0}^{n} \binom{n-w}{t} A_w^{(r)}. \]
Proof. We will count the elements of the set
\[ \mathcal{B}_t^{(r)} = \{ (D, J) \mid J \subseteq \{1, \ldots, n\},\ |J| = t,\ D \subseteq C(J) \text{ subspace of dimension } r \} \]
in two different ways. For each $J$ with $|J| = t$ there are $B_J^{(r)}$ pairs $(D, J)$ in $\mathcal{B}_t^{(r)}$, so the total number of elements in this set is $\sum_{|J|=t} B_J^{(r)} = B_t^{(r)}$. On the other hand, let $D$ be an $r$-dimensional subcode of $C$ with $\mathrm{wt}(D) = w$. There are $A_w^{(r)}$ possibilities for such a $D$. If we want to find a $J$ such that $D \subseteq C(J)$, we have to pick $t$ coordinates from the $n-w$ all-zero coordinates of $D$. Summation over all $w$ proves the given formula.
Note that because $A_w^{(r)} = 0$ for all $w < d_r$, we can start the summation at $w = d_r$. We can end the summation at $w = n - t$, because $\binom{n-w}{t} = 0$ for $t > n - w$. So the formula can be rewritten as
\[ B_t^{(r)} = \sum_{w=d_r}^{n-t} \binom{n-w}{t} A_w^{(r)}. \]
In practice, we will often prefer the summation given in the proposition.
Theorem 4.5.13 The generalized weight enumerator is given by the following formula:
\[ W_C^{(r)}(X,Y) = \sum_{t=0}^{n} B_t^{(r)} (X-Y)^t Y^{n-t}. \]
Proof. The proof is similar to the one given for Theorem 4.4.17 and is left as an exercise.
It is possible to determine the $A_w^{(r)}$ directly from the $B_t^{(r)}$, by using the next proposition.
Proposition 4.5.14 The following formula holds:
\[ A_w^{(r)} = \sum_{t=n-w}^{n} (-1)^{n+w+t} \binom{t}{n-w} B_t^{(r)}. \]
Proof. The proof is similar to the one given for Proposition 4.4.18 and is left as an exercise.
4.5.3 Generalized weight enumerators of MDS codes
We can use the theory of Sections 4.5.2 and 4.4.3 to calculate the weight distribution, generalized weight distribution, and extended weight distribution of a linear $[n, k]$ code $C$. This is done by determining the values $l(J)$ for each $J \subseteq \{1, \ldots, n\}$. In general, we have to look at all $2^n$ subsets of $\{1, \ldots, n\}$ to find the $l(J)$, but for the special case of MDS codes we can find the weight distributions much faster.
Proposition 4.5.15 Let $C$ be a linear $[n, k]$ MDS code, and let $J \subseteq \{1, \ldots, n\}$ with $|J| = t$. Then we have
\[ l(J) = \begin{cases} 0 & \text{for } t > k, \\ k - t & \text{for } t \le k, \end{cases} \]
so for a given $t$ the value of $l(J)$ is independent of the choice of $J$.
Proof. We know that the dual of an MDS code is also MDS, so $d^\perp = k + 1$. Now use $d = n - k + 1$ in Lemma 7.4.39.
Now that we know all the l(J) for an MDS code, it is easy to find the weight
distribution.
Theorem 4.5.16 Let $C$ be an MDS code with parameters $[n, k]$. Then the generalized weight distribution is given by
\[ A_w^{(r)} = \binom{n}{w} \sum_{j=0}^{w-d} (-1)^j \binom{w}{j} {w-d+1-j \brack r}_q. \]
The coefficients of the extended weight enumerator are given by
\[ A_w(T) = \binom{n}{w} \sum_{j=0}^{w-d} (-1)^j \binom{w}{j} \left( T^{w-d+1-j} - 1 \right). \]
Proof. We give the construction for the generalized weight enumerator here; the case of the extended weight enumerator goes similarly and is left as an exercise. We know from Proposition 4.5.15 that for an MDS code, $B_t^{(r)}$ depends only on the size of $J$, so $B_t^{(r)} = \binom{n}{t} {k-t \brack r}_q$. Using this in the formula for $A_w^{(r)}$ and substituting $j = t - n + w$, we have
\begin{align*}
A_w^{(r)} &= \sum_{t=n-w}^{n-d_r} (-1)^{n+w+t} \binom{t}{n-w} B_t^{(r)} \\
&= \sum_{t=n-w}^{n-d_r} (-1)^{t-n+w} \binom{t}{n-w} \binom{n}{t} {k-t \brack r}_q \\
&= \sum_{j=0}^{w-d_r} (-1)^j \binom{n}{w} \binom{w}{j} {k+w-n-j \brack r}_q \\
&= \binom{n}{w} \sum_{j=0}^{w-d_r} (-1)^j \binom{w}{j} {w-d+1-j \brack r}_q.
\end{align*}
In the second step, we are using the binomial identity
\[ \binom{n}{t} \binom{t}{n-w} = \binom{n}{n-w} \binom{n-(n-w)}{t-(n-w)} = \binom{n}{w} \binom{w}{t-(n-w)} = \binom{n}{w} \binom{w}{n-t}. \]
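As a quick numerical check of the second formula, the sketch below (assuming SymPy) evaluates $A_w(T)$ for the $[6,5,2]$ even weight code over $\mathbb{F}_2$, which is MDS; the result agrees with the extended weight enumerator found in Example 4.4.46.

```python
# Sketch: A_w(T) = C(n,w) * sum_j (-1)^j C(w,j) (T^(w-d+1-j) - 1) for an MDS
# code, evaluated for the [6,5,2] even weight code.
from math import comb
from sympy import symbols, expand, factor

T = symbols('T')
n, k = 6, 5
d = n - k + 1  # = 2, since the code is MDS

for w in range(d, n + 1):
    Aw = comb(n, w) * sum((-1)**j * comb(w, j) * (T**(w - d + 1 - j) - 1)
                          for j in range(w - d + 1))
    print(f"A_{w}(T) =", factor(expand(Aw)))
# At T = 2 this gives the binary weight distribution 15, 0, 15, 0, 1
# for the weights w = 2, ..., 6, as in Example 4.4.46.
```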
So, for all MDS codes with given parameters $[n, k]$, the extended and generalized weight distributions are the same. But not all such codes are equivalent. We can conclude from this that the generalized and extended weight enumerators are not enough to distinguish between codes with the same parameters. We illustrate the non-equivalence of two MDS codes by an example.
Example 4.5.17 Let $C$ be a linear $[n, 3]$ MDS code over $\mathbb{F}_q$. It is possible to write the generator matrix $G$ of $C$ in the following form:
\[ \begin{pmatrix} 1 & 1 & \ldots & 1 \\ x_1 & x_2 & \ldots & x_n \\ y_1 & y_2 & \ldots & y_n \end{pmatrix}. \]
Because $C$ is MDS we have $d = n - 2$. We now view the $n$ columns of $G$ as points in the projective plane $\mathbb{P}^2(\mathbb{F}_q)$, say $P_1, \ldots, P_n$. The MDS property that every $k$ columns of $G$ are independent is now equivalent to saying that no three points are on a line. To see that these $n$ points do not always determine an equivalent code, consider the following construction. Through the $n$ points there are $\binom{n}{2} = N$ lines, the set $\mathcal{N}$. These lines determine (the generator matrix of) an $[N, 3]$ code $\hat{C}$. The minimum distance of the code $\hat{C}$ is equal to the total number of lines minus the maximum number of lines of $\mathcal{N}$ through an arbitrary point $P \in \mathbb{P}^2(\mathbb{F}_q)$, by Proposition 4.4.8. If $P \notin \{P_1, \ldots, P_n\}$ then the maximum number of lines of $\mathcal{N}$ through $P$ is at most $\frac{1}{2}n$, since no three of the points $P_1, \ldots, P_n$ lie on a line. If $P = P_i$ for some $i \in \{1, \ldots, n\}$ then $P$ lies on exactly $n-1$ lines of $\mathcal{N}$, namely the lines $P_iP_j$ for $j \ne i$. Therefore the minimum distance of $\hat{C}$ is $d = N - n + 1$.
We have now constructed an $[N, 3, N-n+1]$ code $\hat{C}$ from the original code $C$. Notice that two codes $\hat{C}_1$ and $\hat{C}_2$ are generalized equivalent if $C_1$ and $C_2$ are generalized equivalent. The generalized and extended weight enumerators of an MDS code of length $n$ and dimension $k$ are completely determined by the pair $(n, k)$, but this is not generally true for the weight enumerator of $\hat{C}$.
Take for example $n = 6$ and $q = 9$, so $\hat{C}$ is a $[15, 3, 10]$ code. Look at the codes $C_1$ and $C_2$ generated by the following matrices, respectively, where $\alpha \in \mathbb{F}_9$ is a primitive element:
\[ \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & 1 & \alpha^5 & \alpha^6 \\ 0 & 0 & 1 & \alpha^3 & \alpha & \alpha^3 \end{pmatrix} \qquad \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & \alpha^7 & \alpha^4 & \alpha^6 \\ 0 & 0 & 1 & \alpha^5 & \alpha & 1 \end{pmatrix} \]
Being both MDS codes, the weight distribution is (1, 0, 0, 120, 240, 368). If we
now apply the above construction, we get ˆC1 and ˆC2 generated by
\[ \begin{pmatrix} 1 & 0 & 0 & 1 & 1 & \alpha^4 & \alpha^6 & \alpha^3 & \alpha^7 & \alpha & 1 & \alpha^2 & 1 & \alpha^7 & 1 \\ 0 & 1 & 0 & \alpha^7 & 1 & 0 & 0 & \alpha^4 & 1 & 1 & 0 & \alpha^6 & \alpha & 1 & \alpha^3 \\ 0 & 0 & 1 & 1 & 0 & 1 & 1 & 1 & 0 & 0 & 1 & 1 & 1 & 1 & 1 \end{pmatrix} \]
\[ \begin{pmatrix} 1 & 0 & 0 & \alpha^7 & \alpha^2 & \alpha^3 & \alpha & 0 & \alpha^7 & \alpha^7 & \alpha^4 & \alpha^7 & \alpha & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 & \alpha^3 & 0 & \alpha^6 & \alpha^6 & 0 & \alpha^7 & \alpha & \alpha^6 & \alpha^3 & \alpha \\ 0 & 0 & 1 & \alpha^5 & \alpha^5 & \alpha^6 & \alpha^3 & \alpha^7 & \alpha^4 & \alpha^3 & \alpha^5 & \alpha^2 & \alpha^4 & \alpha & \alpha^5 \end{pmatrix} \]
The weight distributions of $\hat{C}_1$ and $\hat{C}_2$ are, respectively,
(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 0, 16, 312, 288, 64) and
(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 0, 32, 264, 336, 48).
So the latter two codes are not generalized equivalent, and therefore not all
[6, 3, 4] MDS codes over F9 are generalized equivalent.
Another example was given in [110, 29] showing that two [6, 3, 4] MDS codes
could have distinct covering radii.
4.5.4 Connections
There is a connection between the extended weight enumerator and the generalized weight enumerators. We first prove the next proposition.
Proposition 4.5.18 Let $C$ be a linear $[n, k]$ code over $\mathbb{F}_q$, and let $C^m$ be the linear space consisting of the $m \times n$ matrices over $\mathbb{F}_q$ whose rows are in $C$. Then there is an isomorphism of $\mathbb{F}_q$-vector spaces between $C \otimes \mathbb{F}_{q^m}$ and $C^m$.
Proof. Let $\alpha$ be a primitive element of $\mathbb{F}_{q^m}$. Then we can write an element of $\mathbb{F}_{q^m}$ in a unique way on the basis $(1, \alpha, \alpha^2, \ldots, \alpha^{m-1})$ with coefficients in $\mathbb{F}_q$. If we do this for all the coordinates of a word in $C \otimes \mathbb{F}_{q^m}$, we get an $m \times n$ matrix over $\mathbb{F}_q$. The rows of this matrix are words of $C$, because $C$ and $C \otimes \mathbb{F}_{q^m}$ have the same generator matrix. This map is clearly injective. There are $(q^m)^k = q^{km}$ words in $C \otimes \mathbb{F}_{q^m}$, and the number of elements of $C^m$ is $(q^k)^m = q^{km}$, so our map is a bijection. It is given by
\[ \left( \sum_{i=0}^{m-1} c_{i1} \alpha^i,\ \sum_{i=0}^{m-1} c_{i2} \alpha^i,\ \ldots,\ \sum_{i=0}^{m-1} c_{in} \alpha^i \right) \mapsto \begin{pmatrix} c_{01} & c_{02} & c_{03} & \ldots & c_{0n} \\ c_{11} & c_{12} & c_{13} & \ldots & c_{1n} \\ \vdots & \vdots & \vdots & & \vdots \\ c_{(m-1)1} & c_{(m-1)2} & c_{(m-1)3} & \ldots & c_{(m-1)n} \end{pmatrix}. \]
We see that the map is $\mathbb{F}_q$-linear, so it gives an isomorphism $C \otimes \mathbb{F}_{q^m} \to C^m$.
Note that this isomorphism depends on the choice of the primitive element $\alpha$. We also need the next lemma.
Lemma 4.5.19 Let $c \in C \otimes \mathbb{F}_{q^m}$ and let $M \in C^m$ be the corresponding $m \times n$ matrix under a given isomorphism. Let $D \subseteq C$ be the subcode generated by the rows of $M$. Then $\mathrm{wt}(c) = \mathrm{wt}(D)$.
Proof. If the $j$-th coordinate $c_j$ of $c$ is zero, then the $j$-th column of $M$ consists of only zeros, because the representation of $c_j$ on the basis $(1, \alpha, \alpha^2, \ldots, \alpha^{m-1})$ is unique. On the other hand, if the $j$-th column of $M$ consists of all zeros, then $c_j$ is also zero. Therefore $\mathrm{wt}(c) = \mathrm{wt}(D)$.
Proposition 4.5.20 Let $C$ be a linear code over $\mathbb{F}_q$. Then the weight enumerator of an extension code and the generalized weight enumerators are connected via
\[ A_w(q^m) = \sum_{r=0}^{m} [m, r]_q\, A_w^{(r)}. \]
Proof. We count the number of words in $C \otimes \mathbb{F}_{q^m}$ of weight $w$ in two ways, using the bijection of Proposition 4.5.18. The first way is just by substituting $T = q^m$ in $A_w(T)$: this gives the left side of the equation. For the second way, note that every $M \in C^m$ generates a subcode of $C$ whose weight is equal to the weight of the corresponding word in $C \otimes \mathbb{F}_{q^m}$. Fix this weight $w$ and a dimension $r$: there are $A_w^{(r)}$ subcodes of $C$ of dimension $r$ and weight $w$. Every such subcode is generated by an $r \times n$ matrix whose rows are words of $C$. Left multiplication by an $m \times r$ matrix of rank $r$ gives an element of $C^m$ which generates the same subcode of $C$, and all such elements of $C^m$ are obtained this way. The number of $m \times r$ matrices of rank $r$ is $[m, r]_q$, so summation over all dimensions $r$ gives
\[ A_w(q^m) = \sum_{r=0}^{k} [m, r]_q\, A_w^{(r)}. \]
We can let the summation run to $m$, because $A_w^{(r)} = 0$ for $r > k$ and $[m, r]_q = 0$ for $r > m$. This proves the given formula.
In general, we have the following theorem.
Theorem 4.5.21 Let $C$ be a linear code over $\mathbb{F}_q$. Then the extended weight enumerator is determined by the generalized weight enumerators:
\[ W_C(X, Y, T) = \sum_{r=0}^{k} \left( \prod_{j=0}^{r-1} (T - q^j) \right) W_C^{(r)}(X, Y). \]
Proof. If we know $A_w^{(r)}$ for all $r$, we can determine $A_w(q^m)$ for every $m$. If we have $k+1$ values of $m$ for which $A_w(q^m)$ is known, we can use Lagrange interpolation to find $A_w(T)$, for this is a polynomial in $T$ of degree at most $k$. In fact, we have
\[ A_w(T) = \sum_{r=0}^{k} \left( \prod_{j=0}^{r-1} (T - q^j) \right) A_w^{(r)}. \]
This formula has the right degree and is correct for $T = q^m$ for all integer values $m \ge 0$, so we know it must be the correct polynomial. Therefore the theorem follows.
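This relation can be verified numerically. The sketch below (assuming SymPy, and reusing the hypothetical generator matrix of the $[7,4,3]$ Hamming code from the earlier sketch) counts the $A_w^{(r)}$ by enumerating all subcodes and then assembles $A_w(T) = \sum_r \left(\prod_{j=0}^{r-1}(T-q^j)\right) A_w^{(r)}$; the result matches Example 4.4.40.

```python
# Sketch: check A_w(T) = sum_r prod_{j<r}(T - q^j) A_w^(r) for the [7,4,3]
# Hamming code with q = 2. Subcodes are enumerated as spans of codeword tuples.
from itertools import combinations, product
from functools import reduce
from sympy import symbols, expand

T = symbols('T')
q, n, k = 2, 7, 4
G = [0b1000011, 0b0100101, 0b0010110, 0b0001111]

codewords = []
for coeffs in product([0, 1], repeat=k):
    c = 0
    for a, row in zip(coeffs, G):
        if a:
            c ^= row
    codewords.append(c)
nonzero = [c for c in codewords if c]

def span(vs):
    S = {0}
    for v in vs:
        S |= {s ^ v for s in S}
    return frozenset(S)

A = {}  # A[(r, w)] = number of r-dimensional subcodes of support weight w
for r in range(k + 1):
    subcodes = set()
    for tup in combinations(nonzero, r):
        S = span(tup)
        if len(S) == q**r:          # the tuple was independent
            subcodes.add(S)
    for D in subcodes:
        w = bin(reduce(lambda a, b: a | b, D)).count('1')
        A[(r, w)] = A.get((r, w), 0) + 1

def falling(r):                      # (T-1)(T-q)...(T-q^(r-1))
    res = 1
    for j in range(r):
        res *= (T - q**j)
    return res

for w in range(n + 1):
    Aw = sum(A.get((r, w), 0) * falling(r) for r in range(k + 1))
    print(f"A_{w}(T) =", expand(Aw))   # e.g. A_3(T) = 7*T - 7
```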
The converse of the theorem is also true: we can write the generalized weight enumerator in terms of the extended weight enumerator. To this end the following lemma is needed.
Lemma 4.5.22
\[ \prod_{j=0}^{r-1} (Z - q^j) = \sum_{j=0}^{r} {r \brack j}_q (-1)^{r-j} q^{\binom{r-j}{2}} Z^j. \]
Proof. This identity can be proven by induction and is left as an exercise.
Theorem 4.5.23 Let $C$ be a linear code over $\mathbb{F}_q$. Then the $r$-th generalized weight enumerator is determined by the extended weight enumerator:
\[ W_C^{(r)}(X, Y) = \frac{1}{\langle r \rangle_q} \sum_{j=0}^{r} {r \brack j}_q (-1)^{r-j} q^{\binom{r-j}{2}}\, W_C(X, Y, q^j). \]
Proof. We consider the generalized weight enumerator in terms of Theorem 4.5.13. Using Remark 4.5.10 and rewriting gives the following:
\begin{align*}
W_C^{(r)}(X,Y) &= \sum_{t=0}^{n} B_t^{(r)} (X-Y)^t Y^{n-t} \\
&= \sum_{t=0}^{n} \sum_{|J|=t} {l(J) \brack r}_q (X-Y)^t Y^{n-t} \\
&= \sum_{t=0}^{n} \sum_{|J|=t} \left( \prod_{j=0}^{r-1} \frac{q^{l(J)} - q^j}{q^r - q^j} \right) (X-Y)^t Y^{n-t} \\
&= \frac{1}{\prod_{v=0}^{r-1} (q^r - q^v)} \sum_{t=0}^{n} \sum_{|J|=t} \left( \prod_{j=0}^{r-1} \left( q^{l(J)} - q^j \right) \right) (X-Y)^t Y^{n-t} \\
&= \frac{1}{\langle r \rangle_q} \sum_{t=0}^{n} \sum_{|J|=t} \sum_{j=0}^{r} {r \brack j}_q (-1)^{r-j} q^{\binom{r-j}{2}} q^{j \cdot l(J)} (X-Y)^t Y^{n-t} \\
&= \frac{1}{\langle r \rangle_q} \sum_{j=0}^{r} {r \brack j}_q (-1)^{r-j} q^{\binom{r-j}{2}} \sum_{t=0}^{n} \sum_{|J|=t} (q^j)^{l(J)} (X-Y)^t Y^{n-t} \\
&= \frac{1}{\langle r \rangle_q} \sum_{j=0}^{r} {r \brack j}_q (-1)^{r-j} q^{\binom{r-j}{2}}\, W_C(X, Y, q^j).
\end{align*}
In the fourth step Lemma 4.5.22 is used (with $Z = q^{l(J)}$), and in the last step Proposition 4.4.41.
4.5.5 Exercises
4.5.1 Give a proof of Theorem 4.5.16.
4.5.2 Compute the generalized weight enumerator of the binary Golay code.
4.5.3 Compute the generalized weight enumerator of the ternary Golay code.
4.5.4 Give a proof of Lemma 4.5.22.
4.6 Notes
Puncturing and shortening at arbitrary sets of positions and the duality theo-
rem is from Simonis [?].
Golay code, Turyn [?] construction, Pless handbook [?] .
MacWilliams
***–puncturing gives the binary [23,12,7] Golay code, which is cyclic.
–automorphism group of (extended) Golay code.
– (extended) ternary Golay code.
– designs and Golay codes.
– lattices and Golay codes.***
***repeated decoding of product code (Hoeholdt-Justesen).
***Singleton defect s(C) = n + 1 − k − d
s(C) ≥ 0 and equality holds if and only if C is MDS.
s(C) = 0 if and only if s(C⊥) = 0.
Example where s(C) = 1 and s(C⊥) > 1.
Almost MDS and near MDS.
Genus g = max{s(C), s(C⊥)} in 4.1. If k ≥ 2, then d ≤ q(s + 1). If k ≥ 3 and d = q(s + 1), then s + 1 ≤ q.
Faldum-Willems, de Boer, Dodunekov-Langev,
relation with Griesmer bound***
***Incidence structures and geometric codes
Chapter 5
Codes and related
structures
Relinde Jurrius and Ruud Pellikaan
***In this chapter seemingly unrelated topics are discussed.***
5.1 Graphs and codes
5.1.1 Colorings of a graph
Graph theory is generally regarded as starting with Euler's paper [57] on the problem of the Königsberg bridges. For an introduction to the theory of graphs we refer to [14, 136].
Definition 5.1.1 A graph Γ is a pair (V, E) where V is a non-empty set and E
is a set disjoint from V . The elements of V are called vertices, and members of
E are called edges. Edges are incident to one or two vertices, which are called
the ends of the edge. If an edge is incident with exactly one vertex, then it
is called a loop. If u and v are vertices that are incident with an edge, then
they are called neighbors or adjacent. Two edges are called parallel if they are
incident with the same vertices. The graph is called simple if it has no loops
and no parallel edges.
[Figure 5.1: A planar graph]
Definition 5.1.2 A graph is called planar if there is an injective map $f : V \to \mathbb{R}^2$ from the set of vertices $V$ to the real plane such that for every edge $e$ with ends $u$ and $v$ there is a simple curve in the plane connecting the ends of the edge, such that mutually distinct simple curves do not intersect except at the endpoints. More formally: for every edge $e$ with ends $u$ and $v$ there is an injective continuous map $g_e : [0, 1] \to \mathbb{R}^2$ from the unit interval to the plane such that $\{f(u), f(v)\} = \{g_e(0), g_e(1)\}$, and $g_e((0,1)) \cap g_{e'}((0,1)) = \emptyset$ for all edges $e, e'$ with $e \ne e'$.
Example 5.1.3 Consider the next riddle:
Three newly built houses have to be connected to the three nearest terminals for gas, water and electricity. For security reasons, the connections are not allowed to cross. How can this be done?
The answer is "not", because the corresponding graph (see Figure 5.3) is not planar. This riddle is very suitable to occupy kids who like puzzles, but make sure to have an easily explainable proof of the impossibility. We leave it to the reader to find one.
Definition 5.1.4 Let Γ1 = (V1, E1) and Γ2 = (V2, E2) be graphs. A map
ϕ : V1 → V2 is called a morphism of graphs if ϕ(v) and ϕ(w) are connected in
Γ2 for all v, w ∈ V1 that are connected in Γ1. The map is called an isomorphism
of graphs if it is a morphism of graphs and there exists a map ψ : V2 → V1 such
that it is a morphism of graphs and it is the inverse of ϕ. The graphs are called
isomorphic if there is an isomorphism of graphs between them.
Definition 5.1.5 An edge of a graph is called an isthmus if the number of com-
ponents of the graph increases by deleting the edge. If the graph is connected,
then deleting an isthmus gives a graph that is no longer connected. Therefore
an isthmus is also called a bridge. An edge is an isthmus if and only if it is in
no cycle. Therefore an edge that is an isthmus is also called an acyclic edge.
Remark 5.1.6 By deleting loops and parallel edges from a graph Γ one gets a
simple graph. There is a choice in the process of deleting parallel edges, but the
resulting graphs are all isomorphic. We call this simple graph the simplification of the graph, and it is denoted by $\bar{\Gamma}$.
Definition 5.1.7 Let $\Gamma = (V, E)$ be a graph. Let $K$ be a finite set and $k = |K|$. The elements of $K$ are called colors. A $k$-coloring of $\Gamma$ is a map $\gamma : V \to K$ such that $\gamma(u) \ne \gamma(v)$ for all distinct adjacent vertices $u$ and $v$ in $V$. So vertex $u$ has color $\gamma(u)$, and all vertices adjacent to $u$ have a color distinct from $\gamma(u)$. Let $P_\Gamma(k)$ be the number of $k$-colorings of $\Gamma$. Then $P_\Gamma$ is called the chromatic polynomial of $\Gamma$.
Remark 5.1.8 If the graph $\Gamma$ has no edges, then $P_\Gamma(k) = k^v$, where $|V| = v$ and $|K| = k$, since it is equal to the number of all maps from $V$ to $K$. In particular there is no map from $V$ to an empty set if $V$ is nonempty, so the number of 0-colorings is zero for every graph.
The number of colorings of graphs was studied by Birkhoff [16], Whitney [130, 129] and Tutte [121, 124, 125, 126, 127]. Much research on the chromatic polynomial was motivated by the four-color problem of planar graphs.
Example 5.1.9 Let $K_n$ be the complete graph on $n$ vertices, in which every pair of distinct vertices is connected by exactly one edge. Then there is no $k$-coloring if $k < n$. Now let $k \ge n$. Take an enumeration of the vertices. Then there are $k$ possible choices of a color for the first vertex and $k-1$ choices for the second vertex, since the first and second vertex are connected.
[Figure 5.2: Complete graph $K_5$]
Now suppose
by induction that we have a coloring of the first $i$ vertices; then there are $k - i$ possibilities to color the next vertex, since the $(i+1)$-th vertex is connected to the first $i$ vertices. Hence
\[ P_{K_n}(k) = k(k-1) \cdots (k-n+1). \]
So $P_{K_n}(k)$ is a polynomial in $k$ of degree $n$.
Proposition 5.1.10 Let Γ = (V, E) be a graph. Then PΓ(k) is a polynomial
in k.
Proof. See [16]. Let $\gamma : V \to K$ be a $k$-coloring of $\Gamma$ with exactly $i$ colors. Let $\sigma$ be a permutation of $K$. Then the composition $\sigma \circ \gamma$ is also a $k$-coloring of $\Gamma$ with exactly $i$ colors. Two such colorings are called equivalent. Then $k(k-1) \cdots (k-i+1)$ is the number of colorings in the equivalence class of a given $k$-coloring of $\Gamma$ with exactly $i$ colors. Let $m_i$ be the number of equivalence classes of colorings with exactly $i$ colors of the set $K$. Let $v = |V|$. Then $P_\Gamma(k)$ is equal to
\[ m_1 k + m_2 k(k-1) + \cdots + m_i k(k-1)\cdots(k-i+1) + \cdots + m_v k(k-1)\cdots(k-v+1). \]
Therefore $P_\Gamma(k)$ is a polynomial in $k$.
Definition 5.1.11 A graph Γ = (V, E) is called bipartite if V is the disjoint
union of two nonempty sets M and N such that the ends of an edge are in M
and in N. Hence no two points in M are adjacent and no two points in N are
adjacent. Let m and n be integers such that 1 ≤ m ≤ n. The complete bipartite
graph Km,n is the graph on a set of vertices V that is the disjoint union of two
sets M and N with |M| = m and |N| = n, and such that every vertex in M is
connected with every vertex in N by a unique edge.
Another tool to show that $P_\Gamma(k)$ is a polynomial is deletion-contraction of graphs, a process which is similar to the puncturing and shortening of codes from Section 4.4.4.
[Figure 5.3: Complete bipartite graph $K_{3,3}$]
Definition 5.1.12 Let $\Gamma = (V, E)$ be a graph. Let $e$ be an edge that is incident to the vertices $u$ and $v$. Then the deletion $\Gamma \setminus e$ is the graph with vertices $V$ and edges $E \setminus \{e\}$. The contraction $\Gamma/e$ is the graph obtained by identifying $u$ and $v$ and deleting $e$. Formally this is defined as follows. Let $\tilde{u} = \tilde{v} = \{u, v\}$, and $\tilde{w} = \{w\}$ if $w \ne u$ and $w \ne v$. Let $\tilde{V} = \{ \tilde{w} : w \in V \}$. Then $\Gamma/e$ is the graph $(\tilde{V}, E \setminus \{e\})$, where an edge $f \ne e$ is incident with $\tilde{w}$ in $\Gamma/e$ if $f$ is incident with $w$ in $\Gamma$.
Remark 5.1.13 Notice that the number of $k$-colorings of $\Gamma$ does not change by deleting loops and parallel edges. Hence the chromatic polynomials of $\Gamma$ and its simplification $\bar{\Gamma}$ are the same.
The following proposition is due to Foster. See the concluding note in [129].
Proposition 5.1.14 Let $\Gamma = (V, E)$ be a simple graph. Let $e$ be an edge of $\Gamma$. Then the following deletion-contraction formula holds:
\[ P_\Gamma(k) = P_{\Gamma \setminus e}(k) - P_{\Gamma/e}(k) \]
for all positive integers $k$.
Proof. Let $u$ and $v$ be the vertices of $e$. Then $u \ne v$, since the graph is simple. Let $\gamma$ be a $k$-coloring of $\Gamma \setminus e$. Then $\gamma$ is also a coloring of $\Gamma$ if and only if $\gamma(u) \ne \gamma(v)$. If $\gamma(u) = \gamma(v)$, then consider the induced map $\tilde{\gamma}$ on $\tilde{V}$ defined by $\tilde{\gamma}(\tilde{u}) = \gamma(u)$ and $\tilde{\gamma}(\tilde{w}) = \gamma(w)$ if $w \ne u$ and $w \ne v$. The map $\tilde{\gamma}$ gives a $k$-coloring of $\Gamma/e$. Conversely, every $k$-coloring of $\Gamma/e$ gives a $k$-coloring $\gamma$ of $\Gamma \setminus e$ such that $\gamma(u) = \gamma(v)$. Therefore
\[ P_{\Gamma \setminus e}(k) = P_\Gamma(k) + P_{\Gamma/e}(k). \]
This follows also from a more general deletion-contraction formula for matroids
that will be treated in Section 5.2.6 and Proposition ??.
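The deletion-contraction formula translates directly into a recursive computation. The following sketch in Python (assuming SymPy) computes the chromatic polynomial of a simple graph; after contraction, parallel edges collapse because edges are stored as a set, and any edge that would become a loop is dropped, which is harmless by Remark 5.1.13.

```python
# Sketch: chromatic polynomial via deletion-contraction (Proposition 5.1.14).
from sympy import symbols, expand

K = symbols('k')

def chromatic(vertices, edges):
    # vertices: frozenset; edges: set of 2-element frozensets (simple graph).
    if not edges:
        return K**len(vertices)
    e = next(iter(edges))
    u, v = tuple(e)
    deleted = edges - {e}
    # Contract e: identify v with u; duplicates merge in the set, and any
    # edge that would become a loop is discarded.
    contracted = {frozenset(u if x == v else x for x in f) for f in deleted}
    contracted = {f for f in contracted if len(f) == 2}
    return (chromatic(vertices, deleted)
            - chromatic(vertices - {v}, contracted))

# Example: K_4. Expected: k(k-1)(k-2)(k-3) = k^4 - 6k^3 + 11k^2 - 6k.
V = frozenset(range(4))
E = {frozenset((i, j)) for i in range(4) for j in range(i + 1, 4)}
print(expand(chromatic(V, E)))
```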
5.1.2 Codes on graphs
Definition 5.1.15 Let $\Gamma = (V, E)$ be a graph. Suppose that $V' \subseteq V$ and $E' \subseteq E$ and that all the endpoints of the edges in $E'$ are in $V'$. Then $\Gamma' = (V', E')$ is a graph and it is called a subgraph of $\Gamma$.
Definition 5.1.16 Two vertices $u$ and $v$ are connected by a path from $u$ to $v$ if there is a $t$-tuple of mutually distinct vertices $(v_1, \ldots, v_t)$ with $u = v_1$ and $v = v_t$, and a $(t-1)$-tuple of mutually distinct edges $(e_1, \ldots, e_{t-1})$ such that $e_i$ is incident with $v_i$ and $v_{i+1}$ for all $1 \le i < t$. If moreover $e_t$ is an edge that is incident with $u$ and $v$ and distinct from $e_i$ for all $i < t$, then $(e_1, \ldots, e_{t-1}, e_t)$ is called a cycle. The length of the smallest cycle is called the girth of the graph and is denoted by $\gamma(\Gamma)$.
Definition 5.1.17 A graph is called connected if every two vertices are connected by a path. A maximal connected subgraph of $\Gamma$ is called a connected component of $\Gamma$. The vertex set $V$ of $\Gamma$ is a disjoint union of subsets $V_i$ and the set of edges $E$ is a disjoint union of subsets $E_i$ such that $\Gamma_i = (V_i, E_i)$ is a connected component of $\Gamma$. The number of connected components of $\Gamma$ is denoted by $c(\Gamma)$.
Definition 5.1.18 Let $\Gamma = (V, E)$ be a finite graph. Suppose that $V$ consists of $m$ elements enumerated by $v_1, \ldots, v_m$. Suppose that $E$ consists of $n$ elements enumerated by $e_1, \ldots, e_n$. The incidence matrix $I(\Gamma)$ is an $m \times n$ matrix with entries $a_{ij}$ defined by
\[ a_{ij} = \begin{cases} 1 & \text{if } e_j \text{ is incident with } v_i \text{ and } v_k \text{ for some } i < k, \\ -1 & \text{if } e_j \text{ is incident with } v_i \text{ and } v_k \text{ for some } i > k, \\ 0 & \text{otherwise.} \end{cases} \]
Suppose moreover that $\Gamma$ is simple. Then $\mathcal{A}_\Gamma$ is the arrangement $(H_1, \ldots, H_n)$ of hyperplanes where $H_j = X_i - X_k$ if $e_j$ is incident with $v_i$ and $v_k$ with $i < k$. An arrangement $\mathcal{A}$ is called graphic if $\mathcal{A}$ is isomorphic with $\mathcal{A}_\Gamma$ for some graph $\Gamma$.
***characteristic polynomial det(A − λI), Matrix tree theorem
Definition 5.1.19 The graph code of Γ over Fq is the Fq-linear code that is
generated by the rows of the incidence matrix I(Γ). The cycle code CΓ of Γ is
the dual of the graph code of Γ.
Remark 5.1.20 Let $\Gamma$ be a finite graph without loops. Then the arrangement $\mathcal{A}_\Gamma$ is isomorphic with $\mathcal{A}_{C_\Gamma}$.
Proposition 5.1.21 Let $\Gamma$ be a finite graph. Then $C_\Gamma$ is a code with parameters $[n, k, d]$, where $n = |E|$, $k = |E| - |V| + c(\Gamma)$ and $d = \gamma(\Gamma)$.
Proof. See [14, Prop. 4.3].
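For a concrete instance of Proposition 5.1.21, the cycle code of $K_4$ over $\mathbb{F}_2$ consists of the edge subsets meeting every vertex in an even number of edges. A brute-force sketch in Python (the expected parameters are $n = 6$, $k = 6 - 4 + 1 = 3$ and $d = \gamma(K_4) = 3$):

```python
# Sketch: parameters of the cycle code of K_4 over F_2 (Proposition 5.1.21).
from itertools import combinations, product

vertices = range(4)
edges = list(combinations(vertices, 2))   # the 6 edges of K_4
n = len(edges)

# Over F_2 the cycle code is the null space of the incidence matrix, i.e.
# the edge subsets of even degree at every vertex; brute force over F_2^n.
codewords = []
for x in product([0, 1], repeat=n):
    if all(sum(x[j] for j, e in enumerate(edges) if v in e) % 2 == 0
           for v in vertices):
        codewords.append(x)

k = len(codewords).bit_length() - 1       # |C| = 2^k
d = min(sum(x) for x in codewords if any(x))
print(n, k, d)                            # prints: 6 3 3
```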
***Sparse graph codes, Gallager or Low-density parity check codes and Tanner
graph codes play an important role in the research of coding theory at this mo-
ment. See [77, 99].
5.1.3 Exercises
5.1.1 Determine the chromatic polynomial of the bipartite graph K3,2.
5.1.2 Determine the parameters of the cycle code of the complete graph $K_m$. Show that the code $C_{K_4}$ over $\mathbb{F}_2$ is equivalent to the punctured binary $[7, 3, 4]$ simplex code.
5.1.3 Determine the parameters of the cycle code of the complete bipartite graph $K_{m,n}$. Let $C(m)$ be the dual of the $m$-fold repetition code. Show that $C_{K_{m,n}}$ is equivalent to the product code $C(m) \otimes C(n)$.
5.2 Matroids and codes
Matroids were introduced by Whitney [130, 131] in axiomatizing and generaliz-
ing the concepts of independence in linear algebra and circuit in graph theory.
In the theory of arrangements one uses the notion of a geometric lattice. In
graph and coding theory one refers more to matroids.
5.2.1 Matroids
Definition 5.2.1 A matroid M is a pair (E, I) consisting of a finite set E and
a collection I of subsets of E such that the following three conditions hold.
(I.1) ∅ ∈ I.
(I.2) If $J \subseteq I$ and $I \in \mathcal{I}$, then $J \in \mathcal{I}$.
(I.3) If $I, J \in \mathcal{I}$ and $|I| < |J|$, then there exists a $j \in J \setminus I$ such that $I \cup \{j\} \in \mathcal{I}$.
A subset $I$ of $E$ is called independent if $I \in \mathcal{I}$, otherwise it is called dependent. Condition (I.3) is called the independence augmentation axiom.
Remark 5.2.2 If J is a subset of E, then J has a maximal independent subset,
that is there exists an I ∈ I such that I ⊆ J and I is maximal with respect to
this property and the inclusion. If I1 and I2 are maximal independent subsets
of J, then |I1| = |I2| by condition (I.3). The rank or dimension of a subset
J of E is the number of elements of a maximal independent subset of J. An
independent set of rank r(M) is called a basis of M. The collection of all bases
of M is denoted by B.
Example 5.2.3 Let $n$ and $k$ be non-negative integers such that $k \le n$. Let $U_{n,k}$ be a set consisting of $n$ elements and $\mathcal{I}_{n,k} = \{ I \subseteq U_{n,k} : |I| \le k \}$. Then $(U_{n,k}, \mathcal{I}_{n,k})$ is a matroid, called the uniform matroid of rank $k$ on $n$ elements. A subset $B$ of $U_{n,k}$ is a basis if and only if $|B| = k$. The matroid $U_{n,n}$ has no dependent sets and is called free.
Definition 5.2.4 Let (E, I) be a matroid. An element x in E is called a loop if
{x} is a dependent set. Let x and y in E be two distinct elements that are not
loops. Then x and y are called parallel if r({x, y}) = 1. The matroid is called
simple if it has no loops and no parallel elements. Now Un,r is the only simple
matroid of rank r if r ≤ 2.
Remark 5.2.5 Let $G$ be a $k \times n$ matrix with entries in a field $\mathbb{F}$. Let $E$ be the set $[n]$ indexing the columns of $G$ and $\mathcal{I}_G$ the collection of all subsets $I$ of $E$ such that the columns of the submatrix $G_I$, consisting of the columns of $G$ at the positions of $I$, are independent. Then $M_G = (E, \mathcal{I}_G)$ is a matroid. Suppose that $\mathbb{F}$ is a finite field and $G_1$ and $G_2$ are generator matrices of a code $C$; then $(E, \mathcal{I}_{G_1}) = (E, \mathcal{I}_{G_2})$. So the matroid $M_C = (E, \mathcal{I}_C)$ of a code $C$ is well defined by $(E, \mathcal{I}_G)$ for some generator matrix $G$ of $C$. If $C$ is degenerate, then there is a position $i$ such that $c_i = 0$ for every codeword $c \in C$, and all such positions correspond one-to-one with loops of $M_C$. Let $C$ be nondegenerate. Then $M_C$ has no loops, and the positions $i$ and $j$ with $i \ne j$ are parallel in $M_C$ if and only if the $i$-th column of $G$ is a scalar multiple of the $j$-th column. The code $C$ is projective if and only if the arrangement $\mathcal{A}_G$ is simple, if and only if the matroid $M_C$ is simple. An $[n, k]$ code $C$ is MDS if and only if the matroid $M_C$ is the uniform matroid $U_{n,k}$.
Definition 5.2.6 Let $M = (E, \mathcal{I})$ be a matroid. Let $\mathcal{B}$ be the collection of all bases of $M$. Define $B^\perp = E \setminus B$ for $B \in \mathcal{B}$, and $\mathcal{B}^\perp = \{ B^\perp : B \in \mathcal{B} \}$. Define $\mathcal{I}^\perp = \{ I \subseteq E : I \subseteq B \text{ for some } B \in \mathcal{B}^\perp \}$. Then $(E, \mathcal{I}^\perp)$ is called the dual matroid of $M$ and is denoted by $M^\perp$.
Remark 5.2.7 The dual matroid is indeed a matroid. Let $C$ be a code over a finite field. Then $(M_C)^\perp$ is isomorphic with $M_{C^\perp}$ as matroids.
Let $e$ be a loop of the matroid $M$. Then $e$ is not a member of any basis of $M$. Hence $e$ is in every basis of $M^\perp$. An element of $M$ is called an isthmus if it is an element of every basis of $M$. Hence $e$ is an isthmus of $M$ if and only if $e$ is a loop of $M^\perp$.
Proposition 5.2.8 Let $(E, \mathcal{I})$ be a matroid with rank function $r$. Then the dual matroid has rank function $r^\perp$ given by
\[ r^\perp(J) = |J| - r(E) + r(E \setminus J). \]
Proof. The proof is based on the observation that $r(J) = \max_{B \in \mathcal{B}} |B \cap J|$ and $B \setminus J = B \cap (E \setminus J)$, together with $|B| = r(E)$ for every basis $B$:
\begin{align*}
r^\perp(J) &= \max_{B \in \mathcal{B}^\perp} |B \cap J| \\
&= \max_{B \in \mathcal{B}} |(E \setminus B) \cap J| \\
&= \max_{B \in \mathcal{B}} |J \setminus B| \\
&= |J| - \min_{B \in \mathcal{B}} |J \cap B| \\
&= |J| - \left( r(E) - \max_{B \in \mathcal{B}} |B \setminus J| \right) \\
&= |J| - r(E) + \max_{B \in \mathcal{B}} |B \cap (E \setminus J)| \\
&= |J| - r(E) + r(E \setminus J).
\end{align*}
5.2.2 Realizable matroids
Definition 5.2.9 Let M1 = (E1, I1) and M2 = (E2, I2) be matroids. A map
ϕ : E1 → E2 is called a morphism of matroids if ϕ(I) ∈ I2 for all I ∈ I1. The
map is called an isomorphism of matroids if it is a morphism of matroids and
there exists a map ψ : E2 → E1 such that it is a morphism of matroids and it is
the inverse of ϕ. The matroids are called isomorphic if there is an isomorphism
of matroids between them.
Remark 5.2.10 A matroid M is called realizable or representable over the field
F if there exists a matrix G with entries in F such that M is isomorphic with MG.
***six points in a plane is realizable over every field?,
*** The Fano plane is realizable over F if and only if F has characteristic two.
***Pappos, Desargues configuration.
For more on representable matroids we refer to Tutte [123] and Whittle [132,
133]. Let gn be the number of simple matroids on n points. The values of gn
are determined for n ≤ 8 by [18] and are given in the following table:
n 1 2 3 4 5 6 7 8
gn 1 1 2 4 9 26 101 950
Extended tables can be found in [51]. Clearly $g_n \le 2^{2^n}$. Asymptotically the number $g_n$ is given in [73] as follows:
\[ \log_2 \log_2 g_n \le n - \log_2 n + O(\log_2 \log_2 n), \]
\[ \log_2 \log_2 g_n \ge n - \tfrac{3}{2} \log_2 n + O(\log_2 \log_2 n). \]
A crude upper bound on the number of $k \times n$ matrices with $k \le n$ and entries in $\mathbb{F}_q$ is given by $(n+1)\, q^{n^2}$. Hence the vast majority of all matroids on $n$ elements is not representable over a given finite field as $n \to \infty$.
5.2.3 Graphs and matroids
Definition 5.2.11 Let M = (E, I) be a matroid. A subset C of E is called a
circuit if it is dependent and all its proper subsets are independent. A circuit
of the dual matroid of M is called a cocircuit of M.
Proposition 5.2.12 Let $\mathcal{C}$ be the collection of circuits of a matroid. Then
(C.0) $\emptyset \notin \mathcal{C}$.
(C.1) If $C_1, C_2 \in \mathcal{C}$ and $C_1 \subseteq C_2$, then $C_1 = C_2$.
(C.2) If $C_1, C_2 \in \mathcal{C}$ with $C_1 \ne C_2$ and $x \in C_1 \cap C_2$, then there exists a $C_3 \in \mathcal{C}$ such that $C_3 \subseteq (C_1 \cup C_2) \setminus \{x\}$.
Proof. See [?, Lemma 1.1.3].
Condition (C.2) is called the circuit elimination axiom. The converse of Proposition 5.2.12 also holds.
Proposition 5.2.13 Let $\mathcal{C}$ be a collection of subsets of a finite set $E$ that satisfies conditions (C.0), (C.1) and (C.2). Let $\mathcal{I}$ be the collection of all subsets of $E$ that contain no member of $\mathcal{C}$. Then $(E, \mathcal{I})$ is a matroid with $\mathcal{C}$ as its collection of circuits.
Proof. See [?, Theorem 1.1.4].
Proposition 5.2.14 Let $\Gamma = (V, E)$ be a finite graph. Let $\mathcal{C}$ be the collection of all subsets $\{e_1, \ldots, e_t\}$ such that $(e_1, \ldots, e_t)$ is a cycle in $\Gamma$. Then $\mathcal{C}$ is the collection of circuits of a matroid $M_\Gamma$ on $E$. This matroid is called the cycle matroid of $\Gamma$.
Proof. See [?, Proposition 1.1.7].
Remark 5.2.15 Loops in $\Gamma$ correspond one-to-one to loops in $M_\Gamma$. Two edges that are not loops are parallel in $\Gamma$ if and only if they are parallel in $M_\Gamma$. So $\Gamma$ is simple if and only if $M_\Gamma$ is simple. Let $e$ be in $E$. Then $e$ is an isthmus in the graph $\Gamma$ if and only if $e$ is an isthmus in the matroid $M_\Gamma$.
Remark 5.2.16 A matroid $M$ is called graphic if $M$ is isomorphic with $M_\Gamma$ for some graph $\Gamma$, and it is called cographic if $M^\perp$ is graphic. If $\Gamma$ is a planar graph, then the matroid $M_\Gamma$ is graphic by definition, but it is also cographic.
Let $\Gamma$ be a finite graph with incidence matrix $I(\Gamma)$. This is a generator matrix for $C_\Gamma$ over a field $\mathbb{F}$. Suppose that $\mathbb{F}$ is the binary field. Look at all the columns indexed by the edges of a cycle of $\Gamma$. Since every vertex in a cycle is incident with exactly two edges, the sum of these columns is zero and therefore they are dependent. Removing a column gives an independent set of vectors. Hence the cycles in the matroid $M_{C_\Gamma}$ coincide with the cycles in $\Gamma$. Therefore $M_\Gamma$ is isomorphic with $M_{C_\Gamma}$. One can generalize this argument to any field. Hence graphic matroids are representable over every field.
The matroid of the binary $[7, 4, 3]$ Hamming code is neither graphic nor cographic. Clearly the matroids $M_{K_5}$ and $M_{K_{3,3}}$ are graphic by definition, but they are not cographic. Tutte [122] found a classification of graphic matroids.
5.2.4 Tutte and Whitney polynomial of a matroid
See [7, 8, 25, 26, 28, 34, 59, 68] for references of this section.
Definition 5.2.17 Let $M = (E, \mathcal{I})$ be a matroid. Then the Whitney rank generating function $R_M(X, Y)$ is defined by
\[ R_M(X, Y) = \sum_{J \subseteq E} X^{r(E)-r(J)}\, Y^{|J|-r(J)} \]
and the Tutte-Whitney or Tutte polynomial by
\[ t_M(X, Y) = \sum_{J \subseteq E} (X-1)^{r(E)-r(J)}\, (Y-1)^{|J|-r(J)}. \]
In other words,
\[ t_M(X, Y) = R_M(X-1, Y-1). \]
Remark 5.2.18 Whitney [129] defined the coefficients $m_{ij}$ of the polynomial $R_M(X, Y)$ such that
\[ R_M(X, Y) = \sum_{i=0}^{r(M)} \sum_{j=0}^{|M|} m_{ij} X^i Y^j, \]
but he did not define the polynomial $R_M(X, Y)$ as such. It is clear that these coefficients are nonnegative, since they count the number of elements of certain sets. The coefficients of the Tutte polynomial are also nonnegative, but this is not a trivial fact; it follows from the counting of certain internal and external bases of a matroid. See [56].
5.2.5 Weight enumerator and Tutte polynomial
As we have seen, we can interpret a linear [n, k] code C over Fq as a matroid
via the columns of a generator matrix G.
Proposition 5.2.19 Let $C$ be an $[n, k]$ code over $\mathbb{F}_q$. Then the Tutte polynomial $t_C$ associated with the matroid $M_C$ of the code $C$ is
\[ t_C(X, Y) = \sum_{t=0}^{n} \sum_{|J|=t} (X-1)^{l(J)}\, (Y-1)^{l(J)-(k-t)}. \]
Proof. This follows from $l(J) = k - r(J)$ by Lemma 4.4.12 and $r(M) = k$.
This formula and Proposition 4.4.41 suggest the following connection between the weight enumerator and the Tutte polynomial. Greene [59] was the first to notice this connection.
Theorem 5.2.20 Let $C$ be an $[n, k]$ code over $\mathbb{F}_q$ with generator matrix $G$. Then the following holds for the Tutte polynomial and the extended weight enumerator:
\[ W_C(X, Y, T) = (X-Y)^k\, Y^{n-k}\, t_C\!\left( \frac{X + (T-1)Y}{X-Y},\ \frac{X}{Y} \right). \]
Proof. By using Proposition 5.2.19 about the Tutte polynomial, rewriting, and Proposition 4.4.41, we get
\begin{align*}
&(X-Y)^k Y^{n-k}\, t_C\!\left( \frac{X+(T-1)Y}{X-Y},\ \frac{X}{Y} \right) \\
&\quad = (X-Y)^k Y^{n-k} \sum_{t=0}^{n} \sum_{|J|=t} \left( \frac{TY}{X-Y} \right)^{l(J)} \left( \frac{X-Y}{Y} \right)^{l(J)-(k-t)} \\
&\quad = (X-Y)^k Y^{n-k} \sum_{t=0}^{n} \sum_{|J|=t} T^{l(J)}\, Y^{k-t} (X-Y)^{-(k-t)} \\
&\quad = \sum_{t=0}^{n} \sum_{|J|=t} T^{l(J)} (X-Y)^t Y^{n-t} \\
&\quad = W_C(X, Y, T).
\end{align*}
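Greene's identity can be verified symbolically for a small code. The sketch below (assuming SymPy) computes both $W_C(X,Y,T)$ and $t_C(X,Y)$ of the $[6,5]$ even weight code from the subset ranks $l(J)$ of a generator matrix, and checks that the difference vanishes:

```python
# Sketch: check Theorem 5.2.20 for the [6,5] even weight code over F_2.
from itertools import combinations
from sympy import symbols, simplify

X, Y, T = symbols('X Y T')
n, k = 6, 5
rows = [(1 << i) | (1 << 5) for i in range(k)]  # G = (I_5 | all-ones column)

def rank_gf2(vs):
    basis = []
    for v in vs:
        for b in basis:
            v = min(v, v ^ b)
        if v:
            basis.append(v)
    return len(basis)

def l(J):
    mask = sum(1 << j for j in J)
    return k - rank_gf2([r & mask for r in rows])

subsets = [J for t in range(n + 1) for J in combinations(range(n), t)]
W = sum(T**l(J) * (X - Y)**len(J) * Y**(n - len(J)) for J in subsets)
tutte = sum((X - 1)**l(J) * (Y - 1)**(l(J) - (k - len(J))) for J in subsets)
rhs = (X - Y)**k * Y**(n - k) * tutte.subs(
    [(X, (X + (T - 1)*Y) / (X - Y)), (Y, X / Y)], simultaneous=True)
print(simplify(W - rhs))  # prints 0
```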
We use the extended weight enumerator here, because extending a code does not change the generator matrix and therefore does not change the matroid $M_G$. The converse of this theorem is also true: the Tutte polynomial is completely determined by the extended weight enumerator.
Theorem 5.2.21 Let $C$ be an $[n, k]$ code over $\mathbb{F}_q$. Then the following holds for the extended weight enumerator and the Tutte polynomial:
\[ t_C(X, Y) = Y^n (Y-1)^{-k}\, W_C\!\left(1,\ Y^{-1},\ (X-1)(Y-1)\right). \]
Proof. The proof of this theorem is analogous to the proof of the previous theorem:
\begin{align*}
&Y^n (Y-1)^{-k}\, W_C(1, Y^{-1}, (X-1)(Y-1)) \\
&\quad = Y^n (Y-1)^{-k} \sum_{t=0}^{n} \sum_{|J|=t} \big( (X-1)(Y-1) \big)^{l(J)} (1 - Y^{-1})^t\, Y^{-(n-t)} \\
&\quad = Y^n (Y-1)^{-k} \sum_{t=0}^{n} \sum_{|J|=t} (X-1)^{l(J)} (Y-1)^{l(J)} (Y-1)^t\, Y^{-n} \\
&\quad = \sum_{t=0}^{n} \sum_{|J|=t} (X-1)^{l(J)} (Y-1)^{l(J)-(k-t)} \\
&\quad = t_C(X, Y).
\end{align*}
We see that the Tutte polynomial depends on two variables, while the ex-
tended weight enumerator depends on three variables. This is no problem,
because the weight enumerator is given in its homogeneous form here: we
can view the extended weight enumerator as a polynomial in two variables via
WC(Z, T) = WC(1, Z, T).
Greene [59] already showed that the Tutte polynomial determines the weight
enumerator, but not the other way round. By using the extended weight enu-
merator, we get a two-way equivalence and the proof reduces to rewriting.
We can also give expressions for the generalized weight enumerator in terms of
the Tutte polynomial, and the other way round. The first formula was found
by Britz [28] and independently by Jurrius [68].
Theorem 5.2.22 For the generalized weight enumerator of an $[n, k]$ code $C$ and the associated Tutte polynomial we have that $W_C^{(r)}(X, Y)$ is equal to
\[ \frac{1}{\langle r \rangle_q} \sum_{j=0}^{r} {r \brack j}_q (-1)^{r-j} q^{\binom{r-j}{2}} (X-Y)^k Y^{n-k}\, t_C\!\left( \frac{X + (q^j-1)Y}{X-Y},\ \frac{X}{Y} \right). \]
And, conversely,
\[ t_C(X, Y) = Y^n (Y-1)^{-k} \sum_{r=0}^{k} \left( \prod_{j=0}^{r-1} \left( (X-1)(Y-1) - q^j \right) \right) W_C^{(r)}(1, Y^{-1}). \]
Proof. For the first formula, use Theorems 4.5.23 and 5.2.20. Use Theorems
4.5.21 and 5.2.21 for the second formula.
5.2.6 Deletion and contraction of matroids
Definition 5.2.23 Let $M = (E, \mathcal{I})$ be a matroid of rank $k$. Let $e$ be an element of $E$. Then the deletion $M \setminus e$ is the matroid on the set $E \setminus \{e\}$ with independent sets of the form $I \setminus \{e\}$, where $I$ is independent in $M$. The contraction $M/e$ is the matroid on the set $E \setminus \{e\}$ with independent sets of the form $I \setminus \{e\}$, where $I$ is independent in $M$ and $e \in I$.
Remark 5.2.24 Let $M$ be a graphic matroid, so $M = M_\Gamma$ for some finite graph $\Gamma$. Let $e$ be an edge of $\Gamma$; then $M \setminus e = M_{\Gamma \setminus e}$ and $M/e = M_{\Gamma/e}$.
Remark 5.2.25 Let $C$ be a code with generator matrix $G$ reduced at position $e$, so $a = (1, 0, \ldots, 0)^T$ is the column of $G$ at position $e$. Then $M \setminus e = M_{G \setminus a}$ and $M/e = M_{G/a}$. A puncturing-shortening formula for the extended weight enumerator is given in Proposition 4.4.44. By virtue of the fact that the extended weight enumerator and the Tutte polynomial of a code determine each other by Theorems 5.2.20 and 5.2.21, one expects that an analogous formula holds for the Tutte polynomial of matroids.
Proposition 5.2.26 Let $M = (E, \mathcal{I})$ be a matroid. Let $e \in E$ be an element that is neither a loop nor an isthmus. Then the following deletion-contraction formula holds:
\[ t_M(X, Y) = t_{M \setminus e}(X, Y) + t_{M/e}(X, Y). \]
Proof. See [119, 120, 125, 31].
5.2.7 MacWilliams type property for duality
For both codes and matroids we defined the dual structure. These objects obviously completely determine their duals. But what about the various polynomials associated to a code and a matroid? We know from Example 4.5.17 that the weight enumerator is a weaker invariant for a code than the code itself: this means there are non-equivalent codes with the same weight enumerator. So it is a priori not clear that the weight enumerator of a code completely determines the weight enumerator of its dual code. We already saw that there is in fact such a relation, namely the MacWilliams identity of Theorem 4.1.22. We will give a proof of this relation by considering the more general question for the extended weight enumerator. We will prove the MacWilliams identities using the Tutte polynomial. We do this because of the following simple and very useful relation between the Tutte polynomial of a matroid and that of its dual.
Theorem 5.2.27 Let $t_M(X, Y)$ be the Tutte polynomial of a matroid $M$, and let $M^\perp$ be the dual matroid. Then
\[ t_M(X, Y) = t_{M^\perp}(Y, X). \]
Proof. Let $M$ be a matroid on the set $E$. Then $M^\perp$ is a matroid on the same set. In Proposition 5.2.8 we proved $r^\perp(J) = |J| - r(E) + r(E \setminus J)$. In particular, we have $r^\perp(E) + r(E) = |E|$. Substituting this relation into the definition of the Tutte polynomial of the dual matroid gives
\begin{align*}
t_{M^\perp}(X, Y) &= \sum_{J \subseteq E} (X-1)^{r^\perp(E) - r^\perp(J)}\, (Y-1)^{|J| - r^\perp(J)} \\
&= \sum_{J \subseteq E} (X-1)^{r^\perp(E) - |J| - r(E \setminus J) + r(E)}\, (Y-1)^{r(E) - r(E \setminus J)} \\
&= \sum_{J \subseteq E} (X-1)^{|E \setminus J| - r(E \setminus J)}\, (Y-1)^{r(E) - r(E \setminus J)} \\
&= t_M(Y, X).
\end{align*}
In the last step, we use that the summation over all $J \subseteq E$ is the same as the summation over all $E \setminus J \subseteq E$. This proves the theorem.
If we consider a code as a matroid, then the dual matroid corresponds to the dual code. Therefore we can use the above theorem to prove the MacWilliams relations. Greene [59] was the first to use this idea; see also Brylawski and Oxley [33].
Theorem 5.2.28 (MacWilliams) Let $C$ be a code and let $C^\perp$ be its dual. Then the extended weight enumerator of $C$ completely determines the extended weight enumerator of $C^\perp$ and vice versa, via the following formula:
\[ W_{C^\perp}(X, Y, T) = T^{-k}\, W_C\big( X + (T-1)Y,\ X - Y,\ T \big). \]
Proof. Using the previous theorem and the relation between the weight enumerator and the Tutte polynomial, we find
\begin{align*}
T^{-k}\, W_C(X + (T-1)Y, X - Y, T)
&= T^{-k} (TY)^k (X-Y)^{n-k}\, t_C\!\left( \frac{X}{Y},\ \frac{X + (T-1)Y}{X-Y} \right) \\
&= Y^k (X-Y)^{n-k}\, t_{C^\perp}\!\left( \frac{X + (T-1)Y}{X-Y},\ \frac{X}{Y} \right) \\
&= W_{C^\perp}(X, Y, T).
\end{align*}
Notice in the last step that $\dim C^\perp = n - k$, and $n - (n-k) = k$.
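The identity can be checked on a concrete dual pair, e.g. the $[6,5]$ even weight code $C$ and its dual, the $[6,1]$ repetition code, whose extended weight enumerator is $X^6 + (T-1)Y^6$. A sketch assuming SymPy, with $W_C$ taken from Example 4.4.46:

```python
# Sketch: MacWilliams identity (Theorem 5.2.28) for the even weight code and
# the repetition code of length 6 over F_2 (k = dim C = 5).
from sympy import symbols, expand, simplify

X, Y, T = symbols('X Y T')
k = 5

WC = (X**6 + 15*(T - 1)*X**4*Y**2 + 20*(T**2 - 3*T + 2)*X**3*Y**3
      + 15*(T**3 - 4*T**2 + 6*T - 3)*X**2*Y**4
      + 6*(T**4 - 5*T**3 + 10*T**2 - 10*T + 4)*X*Y**5
      + (T**5 - 6*T**4 + 15*T**3 - 20*T**2 + 15*T - 5)*Y**6)
Wdual = X**6 + (T - 1)*Y**6

rhs = T**(-k) * WC.subs([(X, X + (T - 1)*Y), (Y, X - Y)], simultaneous=True)
print(simplify(expand(Wdual - rhs)))  # prints 0
```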
We can use the relations of Theorems 4.5.21 and 4.5.23 to prove the MacWilliams identities for the generalized weight enumerators.
Theorem 5.2.29 Let $C$ be a code and let $C^\perp$ be its dual. Then the generalized weight enumerators of $C$ completely determine the generalized weight enumerators of $C^\perp$ and vice versa, via the following formula:
\[ W_{C^\perp}^{(r)}(X, Y) = \sum_{j=0}^{r} \sum_{l=0}^{j} (-1)^{r-j}\, \frac{q^{\binom{r-j}{2} - j(r-j) - l(j-l) - jk}}{\langle r-j \rangle_q\, \langle j-l \rangle_q}\, W_C^{(l)}\big( X + (q^j - 1)Y,\ X - Y \big). \]
Proof. We write the generalized weight enumerator in terms of the extended weight enumerator, use the MacWilliams identity for the extended weight enumerator, and convert back to the generalized weight enumerator:
\begin{align*}
W_{C^\perp}^{(r)}(X, Y) &= \frac{1}{\langle r \rangle_q} \sum_{j=0}^{r} {r \brack j}_q (-1)^{r-j} q^{\binom{r-j}{2}}\, W_{C^\perp}(X, Y, q^j) \\
&= \sum_{j=0}^{r} (-1)^{r-j}\, \frac{q^{\binom{r-j}{2} - j(r-j)}}{\langle j \rangle_q\, \langle r-j \rangle_q}\, q^{-jk}\, W_C\big( X + (q^j - 1)Y,\ X - Y,\ q^j \big) \\
&= \sum_{j=0}^{r} (-1)^{r-j}\, \frac{q^{\binom{r-j}{2} - j(r-j) - jk}}{\langle j \rangle_q\, \langle r-j \rangle_q} \sum_{l=0}^{j} \frac{\langle j \rangle_q}{q^{l(j-l)}\, \langle j-l \rangle_q}\, W_C^{(l)}\big( X + (q^j - 1)Y,\ X - Y \big) \\
&= \sum_{j=0}^{r} \sum_{l=0}^{j} (-1)^{r-j}\, \frac{q^{\binom{r-j}{2} - j(r-j) - l(j-l) - jk}}{\langle r-j \rangle_q\, \langle j-l \rangle_q}\, W_C^{(l)}\big( X + (q^j - 1)Y,\ X - Y \big).
\end{align*}
This theorem was proved by Kløve [72], although the proof there uses only half of the relations between the generalized weight enumerator and the extended weight enumerator. Using both makes the proof much shorter.
5.2.8 Exercises
5.2.1 Give a proof of the statements in Remark 5.2.2.
5.2.2 Give a proof of the statements in Remark 5.2.7.
5.2.3 Show that all matroids on at most 3 elements are graphic. Give an
example of a matroid that is not graphic.
5.3 Geometric lattices and codes
***Intro***
5.3.1 Posets, the Möbius function and lattices
Definition 5.3.1 Let $L$ be a set and $\le$ a relation on $L$ such that:
(PO.1) $x \le x$, for all $x$ in $L$ (reflexive),
(PO.2) if $x \le y$ and $y \le x$, then $x = y$, for all $x, y$ in $L$ (anti-symmetric),
(PO.3) if $x \le y$ and $y \le z$, then $x \le z$, for all $x, y$ and $z$ in $L$ (transitive).
Then the pair $(L, \le)$, or just $L$, is called a poset with partial order $\le$ on the set $L$. Define $x < y$ if $x \le y$ and $x \ne y$. The elements $x$ and $y$ in $L$ are comparable if $x \le y$ or $y \le x$. A poset $L$ is called a linear order if every two elements are comparable. Define $L_x = \{ y \in L \mid x \le y \}$ and $L^x = \{ y \in L \mid y \le x \}$, and the interval between $x$ and $y$ by $[x, y] = \{ z \in L \mid x \le z \le y \}$. Notice that $[x, y] = L_x \cap L^y$.
Definition 5.3.2 Let (L, ≤) be a poset. A chain of length r from x to y in L is a sequence of elements x_0, x_1, . . . , x_r in L such that

x = x_0 < x_1 < · · · < x_r = y.

Let r be a number. Let x, y in L. Then c_r(x, y) denotes the number of chains of length r from x to y. Now c_r(x, y) is finite if L is finite. The poset is called locally finite if c_r(x, y) is finite for all x, y ∈ L and every number r.
Proposition 5.3.3 Let L be a locally finite poset. Let x ≤ y in L. Then
(C.0) c_0(x, y) = 0 if x and y are not comparable.
(C.1) c_0(x, x) = 1, c_r(x, x) = 0 for all r > 0, and c_0(x, y) = 0 if x < y.
(C.2) c_{r+1}(x, y) = Σ_{x≤z<y} c_r(x, z) = Σ_{x<z≤y} c_r(z, y).

Proof. Statements (C.0) and (C.1) are trivial. Let z < y and let x = x_0 < x_1 < · · · < x_r = z be a chain of length r from x to z; then x = x_0 < x_1 < · · · < x_r < x_{r+1} = y is a chain of length r + 1 from x to y, and every chain of length r + 1 from x to y is obtained uniquely in this way. Hence c_{r+1}(x, y) = Σ_{x≤z<y} c_r(x, z). The last equality is proved similarly.
Definition 5.3.4 The Möbius function of L, denoted by µ_L or µ, is defined by

µ(x, y) = Σ_{r=0}^{∞} (−1)^r c_r(x, y).
Proposition 5.3.5 Let L be a locally finite poset. Then for all x, y in L:
(M.0) µ(x, y) = 0 if x and y are not comparable.
(M.1) µ(x, x) = 1.
(M.2) If x < y, then Σ_{x≤z≤y} µ(x, z) = Σ_{x≤z≤y} µ(z, y) = 0.
(M.3) If x < y, then µ(x, y) = − Σ_{x≤z<y} µ(x, z) = − Σ_{x<z≤y} µ(z, y).
Proof.
(M.0) and (M.1) follow from (C.0) and (C.1), respectively, of Proposition 5.3.3.
(M.2) is clearly equivalent with (M.3).
(M.3) If x < y, then c_0(x, y) = 0. So

µ(x, y) = Σ_{r=1}^{∞} (−1)^r c_r(x, y) = Σ_{r=0}^{∞} (−1)^{r+1} c_{r+1}(x, y)
        = − Σ_{r=0}^{∞} (−1)^r Σ_{x≤z<y} c_r(x, z) = − Σ_{x≤z<y} Σ_{r=0}^{∞} (−1)^r c_r(x, z) = − Σ_{x≤z<y} µ(x, z).

The first and last equality use the definition of µ. The second equality starts counting at r = 0 instead of r = 1, the third uses (C.2) of Proposition 5.3.3, and in the fourth the order of summation is interchanged.
Remark 5.3.6 (M.1) and (M.3) of Proposition 5.3.5 can be used as an alternative way to compute µ(x, y) by induction.
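The induction of Remark 5.3.6 is easy to implement. The following Python sketch (an added illustration, not part of the original text) computes µ(x, y) for the poset of subsets of {0, 1, 2} ordered by inclusion and compares the result with the closed formula (−1)^{|J|−|I|} of Example 5.3.17 below.

from itertools import combinations

elements = [frozenset(s) for r in range(4) for s in combinations(range(3), r)]

def mobius(x, y):
    # mu(x, x) = 1 and, by (M.3), mu(x, y) = -sum of mu(x, z) over x <= z < y
    if x == y:
        return 1
    if not x <= y:
        return 0
    return -sum(mobius(x, z) for z in elements if x <= z and z < y)

I0, J0 = frozenset(), frozenset({0, 1, 2})
print(mobius(I0, J0), (-1) ** (len(J0) - len(I0)))  # both are -1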
Definition 5.3.7 Let L be a poset. If L has an element 0L such that 0L is the
unique minimal element of L, then 0L is called the minimum of L. Similarly 1L
is called the maximum of L if 1L is the unique maximal element of L. If x, y
in L and x ≤ y, then the interval [x, y] has x as minimum and y as maximum.
Suppose that L has 0L and 1L as minimum and maximum, also denoted by 0 and 1, respectively. Then 0 ≤ x ≤ 1 for all x ∈ L. Define µ(x) = µ(0, x) and µ(L) = µ(0, 1) if L is finite.
Definition 5.3.8 Let L be a locally finite poset with a minimum element. Let A be an abelian group and f : L → A a map from L to A. The sum function ˆf of f is defined by

ˆf(x) = Σ_{y≤x} f(y).

Define similarly the sum function ˇf of f by ˇf(x) = Σ_{x≤y} f(y) if L is a locally finite poset with a maximum element.
Remark 5.3.9 A poset L is locally finite if and only if [x, y] is finite for all
x ≤ y in L. So [0, x] is finite if L is a locally finite poset with minimum element
0. Hence the sum function ˆf(x) is well-defined, since it is a finite sum of f(y)
in A with y in [0, x]. In the same way ˇf(x) is well-defined, since [x, 1] is finite.
Theorem 5.3.10 (Möbius inversion formula) Let L be a locally finite poset with a minimum element. Then

f(x) = Σ_{y≤x} µ(y, x) ˆf(y).

Similarly f(x) = Σ_{x≤y} µ(x, y) ˇf(y) if L is a locally finite poset with a maximum element.
Proof. Let x be an element of L. Then

Σ_{y≤x} µ(y, x) ˆf(y) = Σ_{y≤x} Σ_{z≤y} µ(y, x) f(z) = Σ_{z≤x} f(z) Σ_{z≤y≤x} µ(y, x)
 = f(x) µ(x, x) + Σ_{z<x} f(z) Σ_{z≤y≤x} µ(y, x) = f(x).

The first equality uses the definition of ˆf(y). In the second equality the order of summation is interchanged. In the third equality the first summation is split into the parts z = x and z < x, respectively. Finally µ(x, x) = 1 and the second summation is zero for all z < x, by Proposition 5.3.5.
The proof of the second formula is similar.
Example 5.3.11 Let f(x) = 1 if x = 0 and f(x) = 0 otherwise. Then the sum function ˆf(x) = Σ_{y≤x} f(y) is constant 1 for all x. The Möbius inversion formula gives that

Σ_{y≤x} µ(y, x) = 1 if x = 0, and Σ_{y≤x} µ(y, x) = 0 if x > 0,

which is a special case of Proposition 5.3.5.
Remark 5.3.12 Let (L, ≤) be a poset. Let ≤_R be the reverse relation on L defined by x ≤_R y if and only if y ≤ x. Then (L, ≤_R) is a poset. Suppose that (L, ≤) is locally finite with Möbius function µ. Then the number of chains of length r from x to y in (L, ≤_R) is the same as the number of chains of length r from y to x in (L, ≤). Hence (L, ≤_R) is locally finite with Möbius function µ_R such that µ_R(x, y) = µ(y, x). If (L, ≤) has minimum 0L or maximum 1L, then (L, ≤_R) has maximum 0L or minimum 1L, respectively.
Definition 5.3.13 Let L be a poset. Let x, y ∈ L. Then y is called a cover of x if x < y and there is no z such that x < z < y. The Hasse diagram of L is a directed graph that has the elements of L as vertices, and there is a directed edge from y to x if and only if y is a cover of x.
***picture***
Example 5.3.14 Let L = Z be the set of integers with the usual linear order. Let x, y ∈ L and x ≤ y. Then c_0(x, x) = 1, c_0(x, y) = 0 if x < y, and c_r(x, y) = \binom{y−x−1}{r−1} for all r ≥ 1. So L is infinite and locally finite. Furthermore µ(x, x) = 1, µ(x, x + 1) = −1 and µ(x, y) = 0 if y > x + 1.
Definition 5.3.15 Let L be a poset. Let x, y in L. Then x and y have a least
upper bound if there is a z ∈ L such that x ≤ z and y ≤ z, and if x ≤ w and
y ≤ w, then z ≤ w for all w ∈ L. If x and y have a least upper bound, then
such an element is unique and it is called the join of x and y and denoted by
x ∨ y. Similarly the greatest lower bound of x and y is defined. If it exists, then
it is unique and it is called the meet of x and y and denoted by x ∧ y. A poset
L is called a lattice if x ∨ y and x ∧ y exist for all x, y in L.
Remark 5.3.16 Let (L, ≤) be a finite poset with maximum 1 such that x ∧ y exists for all x, y ∈ L. The collection {z | x ≤ z, y ≤ z} is finite and not empty, since it contains 1. The meet of all the elements in this collection is well defined and is given by

x ∨ y = ⋀ { z | x ≤ z, y ≤ z }.

Hence L is a lattice. Similarly L is a lattice if L is a finite poset with minimum 0 such that x ∨ y exists for all x, y ∈ L, since x ∧ y = ⋁ { z | z ≤ x, z ≤ y }.
Example 5.3.17 Let L be the collection of all finite subsets of a given set X. Let ≤ be defined by the inclusion, that means I ≤ J if and only if I ⊆ J. Then 0L = ∅, and L has a maximum if and only if X is finite, in which case 1L = X. Let I, J ∈ L and I ≤ J. Then |I| ≤ |J| < ∞. Let m = |J| − |I|. Then

c_r(I, J) = Σ_{0<m_1<m_2<···<m_{r−1}<m} \binom{m_2}{m_1} \binom{m_3}{m_2} · · · \binom{m}{m_{r−1}}.

Hence L is locally finite, and L is finite if and only if X is finite. Furthermore I ∨ J = I ∪ J and I ∧ J = I ∩ J. So L is a lattice. Using Remark 5.3.6 we see that µ(I, J) = (−1)^{|J|−|I|} if I ≤ J. This is much easier than computing µ(I, J) by means of Definition 5.3.4.
Example 5.3.18 Now suppose that X = {1, . . . , n}. Let L be the poset of subsets of X. Let A_1, . . . , A_n be a collection of subsets of a finite set A. Define for a subset J of X

A_J = ⋂_{j∈J} A_j and f(J) = |A_J ∖ ⋃_{J<I} A_I|.

Then A_J is the disjoint union of the subsets A_I ∖ (⋃_{I<K} A_K) for all I with J ≤ I. Hence the sum function

ˇf(J) = Σ_{J≤I} f(I) = Σ_{J≤I} |A_I ∖ ⋃_{I<K} A_K| = |A_J|.

Möbius inversion gives that

|A_J ∖ ⋃_{J<I} A_I| = Σ_{J≤I} (−1)^{|I|−|J|} |A_I|,

which is called the principle of inclusion/exclusion.
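A quick numeric illustration of the last formula in Python (an added example; the three sets below are an arbitrary toy choice):

from itertools import combinations

A = set(range(10))
Asets = {1: {0, 1, 2, 3}, 2: {2, 3, 4, 5}, 3: {3, 5, 6}}

def AJ(J):
    # intersection of the A_j for j in J; the empty intersection is A itself
    out = set(A)
    for j in J:
        out &= Asets[j]
    return out

J = frozenset({1})
supersets = [frozenset(S) for r in range(4) for S in combinations([1, 2, 3], r)
             if J <= frozenset(S)]
proper = [I for I in supersets if I != J]
lhs = len(AJ(J) - set().union(*(AJ(I) for I in proper)))
rhs = sum((-1) ** (len(I) - len(J)) * len(AJ(I)) for I in supersets)
print(lhs, rhs)  # both are 2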
Example 5.3.19 A variant of the principle of inclusion/exclusion is given as follows. Let H_1, . . . , H_n be a collection of subsets of a finite set H. Let L be the poset of all intersections of the H_j with the reverse inclusion as partial order. Then H is the minimum of L and H_1 ∩ · · · ∩ H_n is the maximum of L. Let x ∈ L. Define

f(x) = |x ∖ ⋃_{x<y} y|.

Then

ˇf(x) = Σ_{x≤y} f(y) = Σ_{x≤y} |y ∖ ⋃_{y<z} z| = |x|.

Hence

|x ∖ ⋃_{x<y} y| = Σ_{x≤y} µ(x, y) |y|.
Example 5.3.20 Let L = N be the set of positive integers with the divisibility relation as partial order. Then 0L = 1 is the minimum of L, it is locally finite and it has no maximum. Now m ∨ n = lcm(m, n) and m ∧ n = gcd(m, n). Hence L is a lattice. By Remark 5.3.6 we see that

µ(n) = 1 if n = 1,
µ(n) = (−1)^r if n is the product of r mutually distinct primes,
µ(n) = 0 if n is divisible by the square of a prime.

Hence µ(n) is the classical Möbius function. Furthermore µ(d, n) = µ(n/d) if d|n. Let

ϕ(n) = |{i ∈ {1, . . . , n} | gcd(i, n) = 1}|

be Euler's ϕ function. Define

V_d = {i ∈ {1, . . . , n} | gcd(i, n) = n/d}

for d|n. Then

V_d = { i · (n/d) | i ∈ {1, . . . , d}, gcd(i, d) = 1 },

so |V_d| = ϕ(d). Now {1, . . . , n} is the disjoint union of the subsets V_d with d|n. Hence the sum function of ϕ is given by

ˆϕ(n) = Σ_{d|n} ϕ(d) = n.

Therefore

ϕ(n) = Σ_{d|n} µ(d) (n/d),

by Möbius inversion.
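This inversion is easy to confirm numerically. The short Python check below (an added illustration) implements the classical Möbius function by factorization and compares Σ_{d|n} µ(d) n/d with Euler's ϕ, using sympy only for factoring, divisors and ϕ itself.

from sympy import factorint, divisors, totient

def mu(n):
    # classical Moebius function: 0 on non-squarefree n, else (-1)^(number of primes)
    factors = factorint(n)
    if any(e > 1 for e in factors.values()):
        return 0
    return (-1) ** len(factors)

for n in (1, 12, 30, 360):
    print(n, sum(mu(d) * (n // d) for d in divisors(n)), totient(n))  # equal values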
Definition 5.3.21 Let (L_1, ≤_1) and (L_2, ≤_2) be posets. A map ϕ : L_1 → L_2 is called monotone if ϕ(x) ≤_2 ϕ(y) for all x ≤_1 y in L_1. The map ϕ is called strictly monotone if ϕ(x) <_2 ϕ(y) for all x <_1 y in L_1. The map is called an isomorphism of posets if it is strictly monotone and there exists a strictly monotone map ψ : L_2 → L_1 that is the inverse of ϕ. The posets are called isomorphic if there is an isomorphism of posets between them.
Remark 5.3.22 If ϕ : L1 → L2 is an isomorphism between locally finite posets
with a minimum, then µ2(ϕ(x), ϕ(y)) = µ1(x, y) for all x, y in L1.
If (L1, ≤1) and (L2, ≤2) are isomorphic posets and L1 is a lattice, then L2 is
also a lattice.
Example 5.3.23 Let n be a positive integer that is the product of r mutually distinct primes p_1, . . . , p_r. Let L_1 be the set of all positive integers that divide n with divisibility as partial order ≤_1 as in Example 5.3.20. Let L_2 be the collection of all subsets of {1, . . . , r} with the inclusion as partial order ≤_2 as in Example 5.3.17. Define the maps ϕ : L_1 → L_2 and ψ : L_2 → L_1 by ϕ(d) = {i | p_i divides d} and ψ(x) = Π_{i∈x} p_i. Then ϕ and ψ are strictly monotone and they are inverses of each other. Hence L_1 and L_2 are isomorphic lattices.
5.3.2 Geometric lattices
Remark 5.3.24 Let (L, ≤) be a lattice without infinite chains. Then L has a
minimum and a maximum.
Definition 5.3.25 Let L be a lattice with minimum 0. An atom is an element a ∈ L that is a cover of 0. A lattice is called atomic if for every x > 0 in L there exist atoms a_1, . . . , a_r such that x = a_1 ∨ · · · ∨ a_r, and the minimum possible r is called the rank of x and is denoted by r_L(x) or r(x) for short. A lattice is called semimodular if for all mutually distinct x, y in L, x ∨ y covers x and y if there exists a z such that x and y cover z. A lattice is called modular if x ∨ (y ∧ z) = (x ∨ y) ∧ z for all x, y and z in L such that x ≤ z. A lattice L is called a geometric lattice if it is atomic and semimodular and has no infinite chains. If L is a geometric lattice, then it has a minimum and a maximum, and r(1) is called the rank of L and is denoted by r(L).
Example 5.3.26 Let L be the collection of all finite subsets of a given set X as in Example 5.3.17. The atoms are the singleton sets, that is subsets consisting of exactly one element of X. Every x ∈ L is the finite union of its singleton subsets. So L is atomic and r(x) = |x|. Now y covers x if and only if there is an element Q not in x such that y = x ∪ {Q}. If x ≠ y and x and y both cover z, then there is an element P not in z such that x = z ∪ {P}, and there is an element Q not in z such that y = z ∪ {Q}. Now P ≠ Q, since x ≠ y. Hence x ∨ y = z ∪ {P, Q} covers x and y. Hence L is semimodular. In fact L is modular. L is locally finite. L is a geometric lattice if and only if X is finite.
Example 5.3.27 Let L be the set of positive integers with the divisibility relation as in Example 5.3.20. The atoms of L are the primes. But L is not atomic, since the square of a prime is not the join of finitely many atoms. L is semimodular. The interval [1, n] in L is a geometric lattice if and only if n is square free. If n is square free and m|n, then r(m) = r if and only if m is the product of r mutually distinct primes.
Proposition 5.3.28 Let L be a geometric lattice. Then for all x, y ∈ L:
(GL.1) If x < y, then r(x) < r(y) (strictly monotone).
(GL.2) r(x ∨ y) + r(x ∧ y) ≤ r(x) + r(y) (semimodular inequality).
(GL.3) All maximal chains from 0 to x have the same length r(x).

Proof. See [113, Prop. 3.3.2] and [114, Prop. 3.7].
Remark 5.3.29 Let L be an atomic lattice. Then L is semimodular if and only if the semimodular inequality (GL.2) holds for all x, y ∈ L. And L is modular if and only if the modular equality

r(x ∨ y) + r(x ∧ y) = r(x) + r(y)

holds for all x, y ∈ L.
Remark 5.3.30 Let L be a geometric lattice. Let x, y ∈ L and x ≤ y. The chain x = y_0 < y_1 < · · · < y_s = y from x to y is called an extension of the chain x = x_0 < x_1 < · · · < x_r = y if {x_0, x_1, . . . , x_r} is a subset of {y_0, y_1, . . . , y_s}. A chain from x to y is called maximal if there is no extension to a longer chain from x to y. Every chain from x to y can be extended to a maximal chain with the same end points, and all such maximal chains have the same length r(y) − r(x). This is called the Jordan–Hölder property.
Remark 5.3.31 Let L be a geometric lattice. Let L_j = {x ∈ L | r(x) = j}. Then L_j is called the j-th level of L. The Hasse diagram of L is a graph that has the elements of L as vertices. If x, y ∈ L, x < y and r(y) = r(x) + 1, then x and y are connected by an edge. So only elements of two consecutive levels L_j and L_{j+1} are connected by an edge. The Hasse diagram of L considered as a poset as in Definition 5.3.13 is the directed graph with an arrow from y to x if x, y ∈ L, x < y and r(y) = r(x) + 1.
***picture***
Remark 5.3.32 Let L be a geometric lattice. Then L_x is a geometric lattice with x as minimum element and of rank r_L(1) − r_L(x), and µ_{L_x}(y) = µ(x, y) and r_{L_x}(y) = r_L(y) − r_L(x) for all x ∈ L and y ∈ L_x. Similar remarks hold for L^x and [x, y].
Example 5.3.33 Let L be the collection of all linear subspaces of a given finite dimensional vector space V over a field F with the inclusion as partial order. Then 0L = {0} is the minimum and 1L = V is the maximum of L. The poset L is locally finite if and only if L is finite, if and only if the field F is finite. Let x and y be linear subspaces of V. Then x ∩ y, the intersection of x and y, is the largest linear subspace that is contained in both x and y. So x ∧ y = x ∩ y. The sum x + y of x and y is by definition the set of elements a + b with a in x and b in y. Then x + y is the smallest linear subspace containing both x and y. Hence x ∨ y = x + y. So L is a lattice. The atoms are the one dimensional linear subspaces. Let x be a subspace of dimension r over F. So x is generated by a basis g_1, . . . , g_r. Let a_i be the one dimensional subspace generated by g_i. Then x = a_1 ∨ · · · ∨ a_r. Hence L is atomic and r(x) = dim(x). Moreover L is modular, since

dim(x ∩ y) + dim(x + y) = dim(x) + dim(y)

for all x, y ∈ L. Furthermore L has no infinite chains, since V is finite dimensional. Therefore L is a modular geometric lattice.
Example 5.3.34 Let F be a field. Let V = (v_1, . . . , v_n) be an n-tuple of nonzero vectors in F^k. Let L = L(V) be the collection of all linear subspaces of F^k that are generated by subsets of V, with inclusion as partial order. So L is finite and a fortiori locally finite. By definition {0} is the linear subspace generated by the empty set. Then 0L = {0} and 1L is the subspace generated by v_1, . . . , v_n. Furthermore L is a lattice with x ∨ y = x + y and

x ∧ y = ⋁ { z ∈ L | z ≤ x, z ≤ y }

by Remark 5.3.16. Let a_j be the linear subspace generated by v_j. Then a_1, . . . , a_n are the atoms of L. Let x be the subspace generated by {v_j | j ∈ J}. Then x = ⋁_{j∈J} a_j. If x has dimension r, then there exists a subset I of J such that |I| = r and x = ⋁_{i∈I} a_i. Hence L is atomic and r(x) = dim(x). Now x ∧ y ⊆ x ∩ y, so

r(x ∨ y) + r(x ∧ y) ≤ dim(x + y) + dim(x ∩ y) = r(x) + r(y).

Hence the semimodular inequality holds and L is a geometric lattice. In most cases L is not modular.
Example 5.3.35 Let F be a field. Let A = (H_1, . . . , H_n) be an arrangement over F of hyperplanes in the vector space V = F^k. Let L = L(A) be the collection of all nonempty intersections of elements of A. By definition F^k is the empty intersection. Define the partial order ≤ by

x ≤ y if and only if y ⊆ x.

Then V is the minimum element and {0} is the maximum element. Furthermore

x ∨ y = x ∩ y if x ∩ y ≠ ∅, and x ∧ y = ⋀ { z | x ∪ y ⊆ z }.

Suppose that A is a central arrangement. Then x ∩ y is nonempty for all x, y in L. So x ∨ y and x ∧ y exist for all x, y in L, and L is a lattice. Let v_j = (v_{1j}, . . . , v_{kj}) be a nonzero vector such that Σ_{i=1}^{k} v_{ij} X_i = 0 is a homogeneous equation of H_j. Let V = (v_1, . . . , v_n). Consider the map ϕ : L(V) → L(A) defined by

ϕ(x) = ⋂_{j∈J} H_j if x is the subspace generated by {v_j | j ∈ J}.

Now x ⊂ y if and only if ϕ(y) ⊂ ϕ(x) for all x, y ∈ L(V). So ϕ is a strictly monotone map. Furthermore ϕ is a bijection and its inverse map is also strictly monotone. Hence L(V) and L(A) are isomorphic lattices. Therefore L(A) is also a geometric lattice.
5.3.3 Geometric lattices and matroids
The notion of a geometric lattice is "cryptomorphic", that is almost equivalent, to the notion of a matroid. See [34, 38, 44, ?, 114].
Proposition 5.3.36 Let L be a finite geometric lattice. Let M(L) be the set
of all atoms of L. Let I(L) be the collection of all subsets I of M(L) such that
r(a1 ∨ · · · ∨ ar) = r if I = {a1, . . . , ar} is a collection of r atoms of L. Then
(M(L), I(L)) is a matroid.
Proof. The proof is left as an exercise.
Proposition 5.3.37 (Rota's Crosscut Theorem) Let L be a finite geometric lattice. Let M(L) be the matroid associated with L. Then

χ_L(T) = Σ_{I⊆M(L)} (−1)^{|I|} T^{r(L)−r(I)}.

Proof. See [101] and [24, Theorem 3.1].
Definition 5.3.38 Let (M, I) be a matroid. An element x in M is called a
loop if {x} is a dependent set. Let x and y in M be two distinct elements that
are not loops. Then x and y are called parallel if r({x, y}) = 1. The matroid is
called simple if it has no loops and no parallel elements.
Remark 5.3.39 Let G be a k × n matrix with entries in a field F. Let M_G be the set {1, . . . , n} indexing the columns of G and let I_G be the collection of all subsets I of M_G such that the columns of the submatrix G_I, consisting of the columns of G at the positions of I, are independent. Then (M_G, I_G) is a matroid. Suppose that F is a finite field and G_1 and G_2 are generator matrices of a code C; then (M_{G_1}, I_{G_1}) = (M_{G_2}, I_{G_2}). So the matroid (M_C, I_C) of a code C is well defined by (M_G, I_G) for some generator matrix G of C. If C is degenerate, then there is a position i such that c_i = 0 for every codeword c ∈ C, and all such positions correspond one-to-one with loops of M_C. Let C be nondegenerate. Then M_C has no loops, and the positions i and j with i ≠ j are parallel in M_C if and only if the i-th column of G is a scalar multiple of the j-th column. The code C is projective if and only if the arrangement A_G is simple, if and only if the matroid M_C is simple. An [n, k] code C is MDS if and only if the matroid M_C is the uniform matroid U_{n,k}.
Remark 5.3.40 Let C be a projective code with generator matrix G. Then AG
is an essential simple arrangement with geometric lattice L(AG). Furthermore
the matroids M(L(AG)) and MC are isomorphic.
Definition 5.3.41 Let (M, I) be a matroid. A k-flat of M is a maximal subset of M of rank k. Let L(M) be the collection of all flats of M; it is called the lattice of flats of M. Let J be a subset of M. Then the closure ¯J is by definition the intersection of all flats that contain J.
Remark 5.3.42 M is a k-flat with k = r(M). If F1 and F2 are flats, then
F1 ∩ F2 is also a flat. Consider L(M) with the inclusion as partial order. Then
M is the maximum of L(M). And F1 ∩F2 = F1 ∧F2 for all F1 and F2 in L(M).
Hence L(M) is indeed a lattice by Remark 5.3.16. Let J be a subset of M,
then ¯J is a flat, since it is a nonempty, finite intersection of flats. So ¯∅ is the
minimum of L(M).
Remark 5.3.43 An element x in M is a loop if and only if ¯x = ¯∅. If x, y ∈ M are not loops, then x and y are parallel if and only if ¯x = ¯y. Let ¯M = {¯x | x ∈ M, ¯x ≠ ¯∅} and let ¯I = { ¯I | I ∈ I }, where ¯I = {¯x | x ∈ I}. Then ( ¯M, ¯I) is a simple matroid.
Definition 5.3.44 Let G be a generator matrix of a code C. The reduced
matrix ¯G is the matrix obtained from G by deleting all zero columns from G
and all columns that are a scalar multiple of a previous column. The reduced
code ¯C of C is the code with generator matrix ¯G.
Remark 5.3.45 Let G be a generator matrix of a code C. The definition of the
reduced code ¯C by means of ¯G does not depend on the choice of the generator
matrix G of C. The matroids ¯MG and M ¯G are isomorphic.
Let J be a subset of {1, . . . , n}. Then the closure ¯J is equal to the complement
in {1, . . . , n} of the support of C(J) and C(J) = C( ¯J).
Proposition 5.3.46 Let (M, I) be a matroid. Then L(M) with the inclusion
as partial order is a geometric lattice and L(M) is isomorphic with L( ¯M).
Proof. See [114, Theorem 3.8].
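To make Remark 5.3.39 and Definition 5.3.41 concrete, here is a small Python sketch (an added illustration; the binary matrix G is a hypothetical example). It computes the rank function of the matroid of a generator matrix, the closure of a subset of positions, and the list of all flats, i.e. the elements of the geometric lattice of the code.

from itertools import combinations

G = [[1,0,0,1,1],
     [0,1,0,1,0],
     [0,0,1,0,1]]
n = len(G[0])

def rank(J):
    # rank over F_2 of the columns of G indexed by J (Gaussian elimination)
    pivots = {}
    for j in J:
        v = [G[i][j] for i in range(len(G))]
        for p, b in pivots.items():
            if v[p]:
                v = [(x + y) % 2 for x, y in zip(v, b)]
        if any(v):
            pivots[v.index(1)] = v
    return len(pivots)

def closure(J):
    # the closure of J consists of all positions that do not increase the rank
    r = rank(J)
    return frozenset(j for j in range(n) if rank(set(J) | {j}) == r)

flats = sorted({closure(J) for r in range(n + 1) for J in combinations(range(n), r)},
               key=lambda F: (rank(F), sorted(F)))
for F in flats:
    print(rank(F), sorted(F))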
5.3.4 Exercises
5.3.1 Give a proof of Remark 5.3.9.
5.3.2 Give a proof of Remark 5.3.16.
5.3.3 Give a proof of the formulas for cr(x, y) and µ(x, y) in Example 5.3.17.
5.3.4 Give a proof of the formula for µ(x) in Example 5.3.20.
5.3.5 Give a proof of the statements in Example 5.3.27.
5.3.6 Give an example of an atomic finite lattice with minimum 0 and maxi-
mum 1 that is not semimodular.
5.3.7 Give a proof of the statements in Remark 5.3.29.
5.3.8 Let L be a finite geometric lattice. Show that (M(L), I(L)) is a matroid
as stated in Proposition 5.3.36. Show moreover that this matroid is simple.
5.3.9 Give a proof of the statements in Remark 5.3.39.
5.3.10 Give a proof of the statements in Remark 5.3.42.
5.3.11 Give a proof of Proposition 5.3.46.
5.3.12 Let L be a geometric lattice. Let a be an atom of L and x ∈ L. Show
that r(x ∨ a) ≤ r(x) + 1 and r(x ∨ a) = r(x) if and only if a ≤ x.
5.3.13 Let L be a geometric lattice. Show that r(y) − r(x) is the length of
every maximal chain from x to y for all x ≤ y in L.
5.3.14 Give a proof of Remark 5.3.32.
5.3.15 Give an example of a central arrangement A such that the lattice L(A)
is not modular.
5.4 Characteristic polynomial
***
5.4.1 Characteristic and Möbius polynomial

Definition 5.4.1 Let L be a finite geometric lattice. The characteristic polynomial χ_L(T) and the Poincaré polynomial π_L(T) of L are defined by:

χ_L(T) = Σ_{x∈L} µ_L(x) T^{r(L)−r(x)}, and π_L(T) = Σ_{x∈L} µ_L(x) (−T)^{r(x)}.

The two variable Möbius polynomial µ_L(S, T) in S and T is defined by

µ_L(S, T) = Σ_{x∈L} Σ_{x≤y∈L} µ(x, y) S^{r(x)} T^{r(L)−r(y)}.

The two variable characteristic polynomial or coboundary polynomial is defined by

χ_L(S, T) = Σ_{x∈L} Σ_{x≤y∈L} µ(x, y) S^{a(x)} T^{r(L)−r(y)},

where a(x) is the number of atoms a in L such that a ≤ x.
Remark 5.4.2 Now µ(L) = χ_L(0), and χ_L(1) = 0 if and only if L consists of more than one element. Furthermore χ_L(T) = T^{r(L)} π_L(−T^{−1}), and µ_L(0, T) = χ_L(0, T) = χ_L(T).
Remark 5.4.3 Let r be the rank of L. Then the following relation holds for the Möbius polynomial in terms of characteristic polynomials:

µ_L(S, T) = Σ_{i=0}^{r} S^i µ_i(T) with µ_i(T) = Σ_{x∈L_i} χ_{L_x}(T),

where L_i = {x ∈ L | r(x) = i} and n = |L_1| is the number of atoms in L. Then similarly

χ_L(S, T) = Σ_{i=0}^{n} S^i χ_i(T) with χ_i(T) = Σ_{x∈L, a(x)=i} χ_{L_x}(T).
Remark 5.4.4 Let L be a geometric lattice. Then

Σ_{i=0}^{r(L)} µ_i(T) = µ_L(1, T) = Σ_{y∈L} Σ_{0≤x≤y} µ(x, y) T^{r(L)−r(y)} = T^{r(L)},

since Σ_{0≤x≤y} µ(x, y) = 0 for all 0 < y in L by Proposition 5.3.5. Similarly Σ_{i=0}^{n} χ_i(T) = χ_L(1, T) = T^{r(L)}. Also Σ_{i=0}^{n} A_i(T) = T^k for the extended weight distribution of a code of dimension k, by Proposition 4.4.38 for t = 0.
Example 5.4.5 Let L be the lattice of all subsets of a given finite set of r elements as in Example 5.3.17. Then r(x) = a(x) and µ(x, y) = (−1)^{a(y)−a(x)} if x ≤ y. Hence

χ_L(T) = Σ_{j=0}^{r} \binom{r}{j} (−1)^j T^{r−j} = (T − 1)^r and µ_i(T) = \binom{r}{i} (T − 1)^{r−i}.

Therefore µ_L(S, T) = (S + T − 1)^r.
Example 5.4.6 Let L be the lattice of all linear subspaces of a given vector space of dimension r over the finite field F_q as in Example 5.3.33. Then r(x) is the dimension of x over F_q. The number of subspaces of dimension i is counted in Proposition 4.3.7. It is left as an exercise to show that

µ(x, y) = (−1)^{j−i} q^{(j−i)(j−i−1)/2} if r(x) = i, r(y) = j and x ≤ y,

and that

χ_L(T) = Σ_{i=0}^{r} [r i]_q (−1)^i q^{\binom{i}{2}} T^{r−i} = (T − 1)(T − q) · · · (T − q^{r−1})

and

µ_i(T) = [r i]_q (T − 1)(T − q) · · · (T − q^{r−i−1}).

See [71].
Remark 5.4.7 Every polynomial in one variable with coefficients in a field F factorizes into linear factors over the algebraic closure ¯F of F. In Examples 5.4.5 and 5.4.6 we see that χ_L(T) factorizes into linear factors over Z. This is always the case for so-called supersolvable geometric lattices and for lattices of free central arrangements. See [92].
Definition 5.4.8 Let L be a finite geometric lattice. The Whitney numbers w_i and W_i of the first and second kind, respectively, are defined by

w_i = Σ_{x∈L_i} µ(x) and W_i = |L_i|.

The doubly indexed Whitney numbers w_{ij} and W_{ij} of the first and second kind, respectively, are defined by

w_{ij} = Σ_{x∈L_i} Σ_{y∈L_j} µ(x, y) and W_{ij} = |{(x, y) | x ∈ L_i, y ∈ L_j, x ≤ y}|.

See [60], [34, §6.6.D], [?, Chapter 14] and [113, §3.11].
Remark 5.4.9 We have that

χ_L(T) = Σ_{i=0}^{r(L)} w_i T^{r(L)−i} and µ_L(S, T) = Σ_{i=0}^{r(L)} Σ_{j=0}^{r(L)} w_{ij} S^i T^{r(L)−j}.

Hence the (doubly indexed) Whitney numbers of the first kind are determined by µ_L(S, T). The leading coefficient of

µ_i(T) = Σ_{x∈L_i} Σ_{x≤y} µ(x, y) T^{r(L_x)−r_{L_x}(y)}

is equal to Σ_{x∈L_i} µ(x, x) = |L_i| = W_i. Hence the Whitney numbers of the second kind W_i are determined by µ_L(S, T). We will see in Example 5.4.32 that the Whitney numbers are not determined by χ_L(S, T). Finally, let r = r(L). Then

µ_{r−1}(T) = W_{r−1}(T − 1).
5.4.2 Characteristic polynomial of an arrangement

A central arrangement A gives rise to a geometric lattice L(A) and a characteristic polynomial χ_{L(A)}(T), which will be denoted by χ_A(T). Similarly π_A denotes the Poincaré polynomial of A. If A is an arrangement over the real numbers, then π_A(1) counts the number of connected components of the complement of the arrangement. See [139]. Something similar can be said about arrangements over finite fields.
Proposition 5.4.10 Let q be a prime power, and let A = (H_1, . . . , H_n) be a simple and central arrangement in F_q^k. Then

χ_A(q^m) = |F_{q^m}^k ∖ (H_1 ∪ · · · ∪ H_n)|.

Proof. See [7, Theorem 2.2], [17, Proposition 3.2], [44, Sect. 16] and [92, Theorem 2.69].
Let A = F_{q^m}^k and A_j = H_j(F_{q^m}). Let L be the poset of all intersections of the A_j. The principle of inclusion/exclusion as formulated in Example 5.3.19 gives that

|F_{q^m}^k ∖ (H_1 ∪ · · · ∪ H_n)| = Σ_{x∈L} µ(x)|x| = Σ_{x∈L} µ(x) q^{m·dim(x)}.

The expression on the right hand side is equal to χ_A(q^m), since L is isomorphic with the reverse of the geometric lattice L(A) of the arrangement A = (H_1, . . . , H_n), so dim(x) = r(L(A)) − r_{L(A)}(x) and µ_L(x) = µ_{L(A)}(x) by Remark 5.3.12.
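Proposition 5.4.10 is easy to test by brute force. The following Python snippet (an added illustration with a hypothetical arrangement) counts the points of F_q^2, for a prime q, avoiding the three concurrent lines X = 0, Y = 0 and X = Y, whose characteristic polynomial is (T − 1)(T − 2):

q = 7  # any prime; a genuine prime power would need F_q arithmetic
count = sum(1 for x in range(q) for y in range(q)
            if x != 0 and y != 0 and x != y)
print(count, (q - 1) * (q - 2))  # both are 30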
Definition 5.4.11 Let A = (H_1, . . . , H_n) be an arrangement in F^k over the field F. Let H = H_i. Then the deletion A ∖ H is the arrangement in F^k obtained from (H_1, . . . , H_n) by deleting all the H_j such that H_j = H. Let x = ∩_{i∈I} H_i be an intersection of hyperplanes of A. Let l be the dimension of x. The restriction A^x is the arrangement in F^l of all hyperplanes x ∩ H_j in x such that x ∩ H_j ≠ ∅ and x ∩ H_j ≠ x, for a chosen isomorphism of x with F^l.
Proposition 5.4.12 (Deletion-restriction formula) Let A = (H_1, . . . , H_n) be a simple and central arrangement in F^k over the field F. Let H = H_i. Then

χ_A(T) = χ_{A∖H}(T) − χ_{A^H}(T).

Proof. A proof for an arbitrary field can be found in [92, Theorem 2.56]. Here the special case of a central arrangement over the finite field F_q will be treated. Without loss of generality we may assume that H = H_1. Denote H_j(F_{q^m}) by H_j and F_{q^m}^k by V. Then the following set is written as the disjoint union of two others:

V ∖ (H_2 ∪ · · · ∪ H_n) = (V ∖ (H_1 ∪ H_2 ∪ · · · ∪ H_n)) ∪ (H_1 ∖ (H_2 ∪ · · · ∪ H_n)).

The number of elements of the left hand side is equal to χ_{A∖H}(q^m), and the numbers of elements of the two sets on the right hand side are equal to χ_A(q^m) and χ_{A^H}(q^m), respectively, by Proposition 5.4.10. Hence

χ_{A∖H}(q^m) = χ_A(q^m) + χ_{A^H}(q^m)

for all positive integers m, since the union is disjoint. Therefore the identity of polynomials holds.
Definition 5.4.13 Let A = (H_1, . . . , H_n) be a central simple arrangement over the field F in F^k. Let J ⊆ {1, . . . , n}. Define H_J = ∩_{j∈J} H_j. Consider the decreasing sequence

N_k ⊂ N_{k−1} ⊂ · · · ⊂ N_1 ⊂ N_0

of algebraic subsets of the affine space A^k, defined by

N_i = ⋃_{J⊆{1,...,n}, r(H_J)=i} H_J.

Define M_i = N_i ∖ N_{i+1}.
Remark 5.4.14 N_0 = A^k, N_1 = ⋃_{j=1}^{n} H_j, N_k = {0} and N_{k+1} = ∅. Furthermore N_i is a union of linear subspaces of A^k, all of dimension k − i. Notice that H_J is isomorphic with C(J) in case A is the arrangement of the generator matrix G of the code C, as remarked in the proof of Proposition 4.4.8.
Proposition 5.4.15 Let A = (H_1, . . . , H_n) be a central simple arrangement over the field F in F^k. Let z(x) = {j ∈ {1, . . . , n} | x ∈ H_j} and let r(x) = r(H_{z(x)}) be the rank of x for x ∈ A^k. Then

N_i = { x ∈ A^k | r(x) ≥ i } and M_i = { x ∈ A^k | r(x) = i }.

Proof. Let x ∈ A^k and c = xG. Let x ∈ N_i. Then there exists a J ⊆ {1, . . . , n} such that r(H_J) = i and x ∈ H_J. So c_j = 0 for all j ∈ J. So J ⊆ z(x). Hence H_{z(x)} ⊆ H_J. Therefore r(x) = r(H_{z(x)}) ≥ r(H_J) = i. The converse implication is proved similarly.
The statement about M_i is a direct consequence of the one about N_i.
Proposition 5.4.16 Let A be a central simple arrangement over F_q. Let L = L(A) be the geometric lattice of A. Then

µ_i(q^m) = |M_i(F_{q^m})|.

Proof. See also [7, Theorem 6.3]. Remember that µ_i(T) = Σ_{r(x)=i} χ_{L_x}(T) as defined in Remark 5.4.3. Let L = L(A) and x ∈ L. Then L(A^x) = L_x. Let ∪A^x be the union of the hyperplanes of A^x. Then |(x ∖ (∪A^x))(F_{q^m})| = χ_{L_x}(q^m) by Proposition 5.4.10. Now M_i is the disjoint union of the complements of the arrangements A^x for all x ∈ L such that r(x) = i, by Proposition 5.4.15. Hence

|M_i(F_{q^m})| = Σ_{x∈L, r(x)=i} |(x ∖ (∪A^x))(F_{q^m})| = Σ_{x∈L, r(x)=i} χ_{L_x}(q^m).
5.4.3 Characteristic polynomial of a code
Proposition 5.4.17 Let C be a nondegenerate F_q-linear code. Then

A_n(T) = χ_C(T).

Proof. The elements of F_{q^m}^k ∖ (H_1 ∪ · · · ∪ H_n) correspond one-to-one to codewords of weight n in C ⊗ F_{q^m} by Proposition 4.4.8. So A_n(q^m) = χ_C(q^m) for all positive integers m by Proposition 5.4.10. Hence A_n(T) = χ_C(T).
Definition 5.4.18 Let G be a generator matrix of an [n, k] code C over F_q. Define

Y_i = { x ∈ A^k | wt(xG) ≤ n − i } and X_i = { x ∈ A^k | wt(xG) = n − i }.

Remark 5.4.19 The Y_i form a decreasing sequence

Y_n ⊆ Y_{n−1} ⊆ · · · ⊆ Y_1 ⊆ Y_0

of algebraic subsets of A^k, and X_i = Y_i ∖ Y_{i+1}.
Proposition 5.4.20 Let C be a projective code of length n. Then

χ_i(q^m) = |X_i(F_{q^m})| = A_{n−i}(q^m).

Proof. Every x ∈ F_{q^m}^k corresponds one-to-one to a codeword in C ⊗ F_{q^m} via the map x → xG. So |X_i(F_{q^m})| = A_{n−i}(q^m). And A_{n−i}(q^m) = χ_i(q^m) for all i, by Remark ??.

Corollary 5.4.21 Let C be a projective code of length n. Then χ_i(T) = A_{n−i}(T) for all i.
Remark 5.4.22 Another way to define X_i is as the collection of all points P ∈ A^k such that P is on exactly i distinct hyperplanes of the arrangement A_G. Denote the arrangement of hyperplanes in P^{k−1} also by A_G, and let ¯P be the point in P^{k−1} corresponding to P ∈ A^k. Define

¯X_i = { ¯P ∈ P^{k−1} | ¯P is on exactly i hyperplanes of A_G }.

For all i < n the polynomial χ_i(T) is divisible by T − 1. Define ¯χ_i(T) = χ_i(T)/(T − 1). Then ¯χ_i(q^m) = | ¯X_i(F_{q^m})| for all i < n by Proposition 5.4.20.
Theorem 5.4.23 Let G be a generator matrix of a nondegenerate code C. Let A_G be the associated central arrangement. Let d⊥ = d(C⊥). Then N_i ⊆ Y_i for all i, equality holds for all i < d⊥, and M_i = X_i for all i < d⊥ − 1. If furthermore C is projective, then

µ_i(T) = χ_i(T) = A_{n−i}(T) for all i < d⊥ − 1.

Proof. Let x ∈ N_i. Then x ∈ H_J for some J ⊆ {1, . . . , n} such that r(H_J) = i. So |J| ≥ i and wt(xG) ≤ n − i by Proposition 4.4.8. Hence x ∈ Y_i. Therefore N_i ⊆ Y_i.
Let i < d⊥ and x ∈ Y_i. Then wt(xG) ≤ n − i. Let J be the complement of supp(xG) in {1, . . . , n}. Then |J| ≥ i. Take a subset I of J such that |I| = i. Then x ∈ H_I and r(I) = |I| = i by Lemma 7.4.39, since i < d⊥. Hence x ∈ N_i. Therefore Y_i ⊆ N_i. So Y_i = N_i for all i < d⊥, and M_i = X_i for all i < d⊥ − 1.
The code is nondegenerate, so d(C⊥) ≥ 2. Suppose furthermore that C is projective. Then µ_i(T) = χ_i(T) = A_{n−i}(T) for all i < d⊥ − 1, by Remark ?? and Propositions 5.4.20 and 5.4.16.
The extended and generalized weight enumerators are determined by the pair
(n, k) for an [n, k] MDS code by Remark ??. If C is an [n, k] code, then d(C⊥
)
is at most k + 1. Furthermore d(C⊥
) = k + 1 if and only if C is MDS if and
only if C⊥
is MDS. An [n, k, d] code is called almost MDS if d = n − k. So
d(C⊥
) = k if and only if C⊥
is almost MDS. If C is almost MDS, then C⊥
is
not necessarily almost MDS. The code C is called near MDS if both C and C⊥
are almost MDS. See [?].
Proposition 5.4.24 Let C be an [n, k, d] code such that C⊥ is MDS or almost MDS and k ≥ 3. Then both χ_C(S, T) and W_C(X, Y, T) determine µ_C(S, T). In particular

µ_i(T) = χ_i(T) = A_{n−i}(T) for all i < k − 1,

µ_{k−1}(T) = Σ_{i=k−1}^{n−1} χ_i(T) = Σ_{i=k−1}^{n−1} A_{n−i}(T),

and µ_k(T) = 1.

Proof. Let C be a code such that d(C⊥) ≥ k ≥ 3. Then C is projective and A_{n−i} = χ_i for all i < k − 1 by Remark ??.
If i < k − 1, then the expression for µ_i(T) is given by Theorem 5.4.23. Furthermore µ_k(T) = χ_n(T) = A_0(T) = 1. Finally let L = L(C). Then Σ_{i=0}^{k} µ_i(T) = T^k, Σ_{i=0}^{n} χ_i(T) = T^k and Σ_{i=0}^{n} A_i(T) = T^k by Remark 5.4.4. Hence the formula for µ_{k−1}(T) holds. Therefore µ_C(S, T) is determined both by W_C(X, Y, T) and by χ_C(S, T).
Projective codes of dimension 3 are examples of codes C such that C⊥ is MDS or almost MDS. In the following we will give explicit formulas for µ_C(S, T) for such codes. Let C be a projective code of length n and dimension 3 over F_q with generator matrix G. The arrangement A_G = (H_1, . . . , H_n) of planes in F_q^3 is simple and essential, and the corresponding arrangement of lines in P^2(F_q) is also denoted by A_G. We defined

¯X_i(F_{q^m}) = { ¯P ∈ P^2(F_{q^m}) | ¯P is on exactly i lines of A_G }

and ¯χ_i(q^m) = | ¯X_i(F_{q^m})| in Remark 5.4.22 for all i < n.

Remark 5.4.25 Notice that for projective codes of dimension three ¯X_i(F_{q^m}) = ¯X_i(F_q) for all positive integers m and 2 ≤ i < n. Abbreviate in this case ¯χ_i(q^m) = ¯χ_i for 2 ≤ i < n.
Proposition 5.4.26 Let C be a projective code of length n and dimension 3 over F_q. Then

µ_0(T) = (T − 1)( T² − (n − 1)T + Σ_{i=2}^{n−1} (i − 1) ¯χ_i − n + 1 ),
µ_1(T) = (T − 1)( nT + n − Σ_{i=2}^{n−1} i ¯χ_i ),
µ_2(T) = (T − 1) Σ_{i=2}^{n−1} ¯χ_i.
Proof. A more general statement and proof is possible for [n, k] codes C such that d(C⊥) ≥ k, using Proposition 5.4.24, the fact that B_t(T) = T^{k−t} − 1 for all t < d(C⊥) by Lemma 7.4.39, and the expression of B_t(T) in terms of A_w(T) by Proposition ??. We will give a second, geometric proof for the special case of projective codes of dimension 3.
It is enough to show this proposition with T = q^m for all m, by Lagrange interpolation. Notice that µ_i(q^m) is the number of elements of M_i(F_{q^m}) by Proposition 5.4.16. Let ¯P be the corresponding point in P^2(F_{q^m}) for P ∈ F_{q^m}^3 and P ≠ 0. Abbreviate M_i(F_{q^m}) by M_i. Define ¯M_i = { ¯P | P ∈ M_i }. Then |M_i| = (q^m − 1)| ¯M_i| for all i < 3.
(1) If ¯P ∈ ¯M_2, then ¯P ∈ H_j ∩ H_k for some j ≠ k. Hence ¯P ∈ ¯X_i(F_q) for some i ≥ 2, since the code is projective. So ¯M_2 is the disjoint union of the ¯X_i(F_q), 2 ≤ i < n. Therefore | ¯M_2| = Σ_{i=2}^{n−1} ¯χ_i.
(2) ¯P ∈ ¯M_1 if and only if ¯P is on exactly one line H_j. There are n lines, and every line has q^m + 1 points that are defined over F_{q^m}. If i ≥ 2, then every ¯P ∈ ¯X_i(F_q) is on i lines H_j. Hence | ¯M_1| = n(q^m + 1) − Σ_{i=2}^{n−1} i ¯χ_i.
(3) P² is the disjoint union of ¯M_0, ¯M_1 and ¯M_2. The numbers | ¯M_2| and | ¯M_1| are computed in (1) and (2), and |P²(F_{q^m})| = q^{2m} + q^m + 1. From this we derive the number of elements of ¯M_0.
Example 5.4.27 Consider the matrices G and P given by

G = [ 1 0 0 0 1 1 1
      0 1 0 1 0 1 1
      0 0 1 1 1 0 1 ]

and

P = [ 1 0 0  0  1  1 −1  1  1
      0 1 0  1  0 −1  1 −1  1
      0 0 1 −1 −1  0  1  1 −1 ].

Let C be the code over F_q with generator matrix G. The columns of G represent also the coefficients of the lines of A_G. The j-th column of P represents the homogeneous coordinates of the point P_j in the projective plane that occurs as an intersection of two lines of A_G. In case q is even, the points P_7, P_8 and P_9 coincide.
***two pictures: q odd and q even***
If q is even, then ¯χ_2 = 0 and ¯χ_3 = 7, and ¯χ_i = 0 for all other i; the nonzero ¯A_i are

¯A_4 = 7, ¯A_6(T) = 7T − 14 and ¯A_7(T) = T² − 6T + 8,

and ( ¯µ_2, ¯µ_1(T), ¯µ_0(T)) = (7, 7T − 14, T² − 6T + 8).
If q is odd, then ¯χ_2 = 3 and ¯χ_3 = 6, and ¯χ_i = 0 for all other i; the nonzero ¯A_i are

¯A_4 = 6, ¯A_5 = 3, ¯A_6(T) = 7T − 17 and ¯A_7(T) = T² − 6T + 9,

and ( ¯µ_2, ¯µ_1(T), ¯µ_0(T)) = (9, 7T − 17, T² − 6T + 9).
Notice that there is a codeword of weight 7 in case q is even and q > 4, or q is odd and q > 3, since ¯A_7(T) = (T − 2)(T − 4) or ¯A_7(T) = (T − 3)², respectively.
Example 5.4.28 Let G be a 3 × n generator matrix of an MDS code. The lines of the arrangement A_G are in general position. That means that every two distinct lines meet in exactly one point, and every three mutually distinct lines have an empty intersection. So ¯χ_2 = \binom{n}{2} and ¯χ_i = 0 for all i > 2. Hence

¯A_{n−2}(T) = ¯µ_2(T) = \binom{n}{2}, ¯A_{n−1}(T) = ¯µ_1(T) = nT + 2n − n²

and

¯A_n(T) = ¯µ_0(T) = T² − (n − 1)T + \binom{n−1}{2},

by Proposition 5.4.16 and Theorem ??, which is in agreement with Proposition 4.4.22.
Example 5.4.29 Let a and b be positive integers such that 2 < a < b. Let n = a + b. Let G be a 3 × n generator matrix of a nondegenerate code. Suppose that there are two points P and Q in the projective plane over F_q such that the a + b lines of the projective arrangement of A_G consist of a distinct lines incident with P and b distinct lines incident with Q, and there is no line incident with both P and Q. Then ¯A_{n−2} = ¯χ_2 = ab, ¯A_a = ¯χ_b = 1 and ¯A_b = ¯χ_a = 1. Hence ¯µ_2(T) = ab + 2. Furthermore

¯A_{n−1}(T) = ¯µ_1(T) = (a + b)T − 2ab,
¯A_n(T) = ¯µ_0(T) = T² − (a + b − 1)T + ab − 1,

and ¯A_i(T) = 0 for all i ∉ {a, b, n − 2, n − 1, n}.
Example 5.4.30 Let a, b and c be positive integers such that 2 < a < b < c. Let n = a + b + c. Let G be a 3 × n generator matrix of a nondegenerate code C(a, b, c). Suppose that there are three points P, Q and R in the projective plane over F_q such that the lines of the projective arrangement of A_G consist of a distinct lines incident with P and not with Q and R, b distinct lines incident with Q and not with P and R, and c distinct lines incident with R and not with P and Q. If q is large enough, then such a configuration exists. The a lines through P intersect the b lines through Q in ab points. Similar statements hold for the lines through P and R intersecting in ac points, and the lines through Q and R intersecting in bc points. All these intersection points are on exactly two lines of the arrangement, and there are no others. Hence ¯χ_2 = ab + bc + ca. Now P is the unique point on exactly a lines of the arrangement. So ¯χ_a = 1. Similarly ¯χ_b = ¯χ_c = 1. Finally ¯χ_i = 0 for all 2 ≤ i < n with i ∉ {2, a, b, c}. Now µ_i(T) is divisible by T − 1 for all 0 ≤ i < k. Define ¯µ_i(T) = µ_i(T)/(T − 1). Define similarly ¯A_w(T) = A_w(T)/(T − 1) for all 0 < w ≤ n. Propositions 5.4.24 and 5.4.26 imply that ¯A_{n−a} = ¯A_{n−b} = ¯A_{n−c} = 1, ¯A_{n−2} = ab + bc + ca and ¯µ_2(T) = ab + bc + ca + 3. Furthermore

¯A_{n−1}(T) = ¯µ_1(T) = nT − 2(ab + bc + ca),
¯A_n(T) = ¯µ_0(T) = T² − (n − 1)T + ab + bc + ca − 2,

and ¯A_i(T) = 0 for all i ∉ {0, n − a, n − b, n − c, n − 2, n − 1, n}.
Therefore W_{C(a,b,c)}(X, Y, T) = W_{C(a′,b′,c′)}(X, Y, T) if and only if (a, b, c) = (a′, b′, c′), and µ_{C(a,b,c)}(S, T) = µ_{C(a′,b′,c′)}(S, T) if and only if a + b + c = a′ + b′ + c′ and ab + bc + ca = a′b′ + b′c′ + c′a′. In particular let C1 = C(3, 9, 14) and C2 = C(5, 6, 15). Then C1 and C2 are two projective codes with the same Möbius polynomial µ_C(S, T) but distinct extended weight enumerators and coboundary polynomials χ_C(S, T).
Example 5.4.31 Consider the codes C3 and C4 over F_q with q > 2 with generator matrices G3 and G4 given by

G3 = [  1 1 0 0 1 0 0
        0 1 1 1 0 1 0
       −1 0 1 1 0 0 1 ]

and

G4 = [ 1 1 0 0 1 0 0
       0 1 1 1 0 1 0
       0 1 1 a 0 0 1 ],

where a ∈ F_q ∖ {0, 1}. It was shown in [34, Exercise 6.96] that the duals of these codes have the same Tutte polynomial. So the codes C3 and C4 have the same Tutte polynomial

t_C(X, Y) = 2X + 2Y + 3X² + 5XY + 4Y² + X³ + X²Y + 2XY² + 3Y³ + Y⁴.
Hence C3 and C4 have the extended weight enumerator given by

X⁷ + (2T − 2)X⁴Y³ + (3T − 3)X³Y⁴ + (T² − T)X²Y⁵ + (5T² − 15T + 10)XY⁶ + (T³ − 6T² + 11T − 6)Y⁷.
The codes C3 and C4 are not projective, and their reductions ¯C3 and ¯C4, respectively, have generator matrices given by

¯G3 = [  1 1 0 1 0 0
         0 1 1 0 1 0
        −1 0 1 0 0 1 ]

and

¯G4 = [ 1 1 0 0 0 0
        0 1 1 1 1 0
        0 1 1 a 0 1 ].
From the arrangements A( ¯C3) and A( ¯C4) we deduce the ¯χ_i(T) that are given in the following table.

code \ i   0             1         2   3   4   5
C3         T² − 5T + 6   6T − 12   3   4   0   0
C4         T² − 5T + 6   6T − 13   6   1   1   0

Therefore t_{C3}(X, Y) = t_{C4}(X, Y), but χ_{C3}(S, T) ≠ χ_{C4}(S, T) and t_{ ¯C3}(X, Y) ≠ t_{ ¯C4}(X, Y).
Example 5.4.32 Let C5 = C3⊥ and C6 = C4⊥. Then C5 and C6 have the same Tutte polynomial t_{C⊥}(X, Y) = t_C(Y, X), as given by Example 5.4.31:

2X + 2Y + 4X² + 5XY + 3Y² + 3X³ + 2X²Y + XY² + Y³ + X⁴.
Hence C5 and C6 have the same extended weight enumerator given by

X⁷ + (T − 1)X⁵Y² + (6T − 6)X⁴Y³ + (2T² − T − 1)X³Y⁴ + (15T² − 43T + 28)X²Y⁵ + (7T³ − 36T² + 60T − 31)XY⁶ + (T⁴ − 7T³ + 19T² − 23T + 10)Y⁷.
The geometric lattice L(C5) has atoms a, b, c, d, e, f, g corresponding to the first,
second, etc. column of G3. The second level of L(C5) consists of the following
17 elements:
abe, ac, ad, af, ag, bc, bd, bf, bg, cd, ce, cf, cg, de, df, dg, efg.
The third level consists of the following 12 elements:
abce, abde, abefg, acdg, acf, adf, bcdf, bcg, bdg, cde, cefg, defg.
Similarly, the geometric lattice L(C6) has atoms a, b, c, d, e, f, g corresponding
to the first, second, etc. column of G4. The second level of L(C6) consists of
the following 17 elements:
abe, ac, ad, af, ag, bc, bd, bf, bg, cd, ce, cf, cg, de, dfg, ef, eg.
The third level consists of the following 13 elements:
abce, abde, abef, abeg, acd, acf, acg, adfg, bcdfg, cde, cef, ceg, defg.
Proposition 5.4.24 implies that µ_0(T) and µ_1(T) are the same for both codes and equal to

µ_0(T) = χ_0(T) = A_7(T) = (T − 1)(T − 2)(T² − 4T + 5),
µ_1(T) = χ_1(T) = A_6(T) = (T − 1)(7T² − 29T + 31).

The polynomials µ_3(T) and µ_2(T) are given in the following table, using Remarks 5.4.9 and 5.4.4.

         C5                C6
µ_2(T)   17T² − 49T + 32   17T² − 50T + 33
µ_3(T)   12T − 12          13T − 13

This example shows that the Möbius polynomial µ_C(S, T) is not determined by the coboundary polynomial χ_C(S, T).
5.4.4 Minimal codewords and subcodes
Definition 5.4.33 A minimal codeword of a code C is a codeword whose sup-
port does not properly contain the support of another codeword.
Remark 5.4.34 The zero word is a minimal codeword. Notice that a nonzero scalar multiple of a minimal codeword is again a minimal codeword. Nonzero minimal codewords play a role in minimum distance decoding algorithms [6, 8, 9] and in secret sharing schemes and access structures [80, 117]. We can generalize this notion to subcodes instead of words.
Definition 5.4.35 A minimal subcode of dimension r of a code C is an r-dimensional subcode whose support does not properly contain the support of another r-dimensional subcode.
Remark 5.4.36 A minimal codeword generates a minimal subcode of dimension one, and all the elements of a minimal subcode of dimension one are minimal codewords. A codeword of minimal weight is a nonzero minimal codeword, but the converse is not always the case.
In Example 5.4.32 it is shown that the codes C5 and C6 have the same Tutte polynomial, whereas the number of minimal codewords of the code C5 is 12 and of C6 is 13. Hence the number of minimal codewords and subcodes is not determined by the Tutte polynomial. However, the number of minimal codewords and the number of minimal subcodes of a given dimension are given by the Möbius polynomial.
Theorem 5.4.37 Let C be a code of dimension k. Let 0 ≤ r ≤ k. Then the number of minimal subcodes of dimension r is equal to W_{k−r}, the (k − r)-th Whitney number of the second kind, and it is determined by the Möbius polynomial.

Proof. Let D be a subcode of C of dimension r. Let J be the complement in [n] of the support of D. If d ∈ D and d_j ≠ 0, then j ∈ supp(D) and j ∉ J. Hence D ⊆ C(J). Now suppose moreover that D is a minimal subcode of C. Without loss of generality we may assume that D is systematic at the first r positions. So D has a generator matrix of the form (I_r | A). Let d_j be the j-th row of this matrix. Let c ∈ C(J). If c − Σ_{j=1}^{r} c_j d_j is not the zero word, then the subcode D′ of C generated by c − Σ_{j=1}^{r} c_j d_j, d_2, . . . , d_r has dimension r and its support is contained in supp(D) ∖ {1}, while 1 ∈ supp(D). This contradicts the minimality of D. Hence c − Σ_{j=1}^{r} c_j d_j = 0 and c ∈ D. Therefore D = C(J). To find a minimal subcode of dimension r, we fix l(J) = r and minimize the support of C(J) with respect to inclusion. Because J is contained in the complement in [n] of the support of C(J), this is equivalent to maximizing J with respect to inclusion. In matroid terms this means we are maximizing J for r(J) = k − l(J) = k − r. This means J = ¯J is a flat of rank k − r by Remark 5.3.45. The flats of a matroid are the elements of the geometric lattice L = L(M). The number of elements of rank k − r in L(M) is equal to |L_{k−r}|, which is equal to the Whitney number of the second kind W_{k−r} and thus equal to the leading coefficient of µ_{k−r}(T) by Remark 5.4.9. Hence the Möbius polynomial determines all the numbers of minimal subcodes of dimension r for 0 ≤ r ≤ k.
Remark 5.4.38 Note that the flats of rank k − r in a matroid are exactly the hyperplanes of the (r − 1)-th truncated matroid T^{r−1}(M). This gives another proof of the result of Britz [28, Theorem 3] that the minimal supports of dimension r are the cocircuits of the (r − 1)-th truncated matroid. For r = 1 this gives the well-known equivalence between nonzero minimal codewords and cocircuits. See [?, Theorem 9.2.4] and [123, 1.21].
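For small codes, minimal codewords can be found by brute force directly from Definition 5.4.33. The Python sketch below (an added illustration; the binary matrix G is the same hypothetical example used earlier) lists all nonzero minimal codewords:

from itertools import product

G = [[1,0,0,1,1],
     [0,1,0,1,0],
     [0,0,1,0,1]]
n, k = 5, 3

words = []
for m in product(range(2), repeat=k):
    c = tuple(sum(m[i] * G[i][j] for i in range(k)) % 2 for j in range(n))
    if any(c):
        words.append(c)

def supp(c):
    return frozenset(j for j, cj in enumerate(c) if cj)

# minimal: the support properly contains no support of another nonzero codeword
minimal = [c for c in words if not any(supp(d) < supp(c) for d in words)]
print(len(minimal))
for c in minimal:
    print(c)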
5.4.5 Two variable zeta function
In general, counting the rational points of a variety over the field extensions F_{q^m} is encoded by the zeta function.

Definition 5.4.39 Let X be an affine variety in A^k defined over F_q, that is, the zero set of a collection of polynomials in F_q[X_1, . . . , X_k]. Then X(F_{q^m}) is the set of all points of X with coordinates in F_{q^m}, also called the set of F_{q^m}-rational points of X. The zeta function Z_X(T) of X is the formal power series in T defined by

Z_X(T) = exp( Σ_{m=1}^{∞} |X(F_{q^m})| T^m / m ).

Theorem 5.4.40 Let A be a central simple arrangement in F_q^k. Let χ_A(T) = Σ_{j=0}^{k} c_j T^j be the characteristic polynomial of A. Let M = A^k ∖ (H_1 ∪ · · · ∪ H_n) be the complement of the arrangement. Then the zeta function of M is given by:

Z_M(T) = Π_{j=0}^{k} (1 − q^j T)^{−c_j}.

Proof. See [17, Theorem 3.6].
Two variable zeta function of Duursma
5.4.6 Overview
We have established relations between the generalized weight enumerators for
0 ≤ r ≤ k, the extended weight enumerator and the Tutte polynomial. We
summarize this in the following diagram:
***diagram: the extended weight enumerator W_C(X, Y, T), the collection of generalized weight enumerators {W^{(r)}_C(X, Y)}_{r=0}^{k} (and their extended versions {W^{(r)}_C(X, Y, T)}_{r=0}^{k}) and the Tutte polynomial t_C(X, Y) determine one another via Theorems 4.5.21, 4.5.23, 5.2.20, 5.2.21 and 5.2.22; each of them determines the ordinary weight enumerator W_C(X, Y)***
We see that the Tutte polynomial, the extended weight enumerator and the collection of generalized weight enumerators all contain the same amount of information about a code, because they completely determine each other. The original weight enumerator W_C(X, Y) contains less information and therefore does not determine W_C(X, Y, T) or {W^{(r)}_C(X, Y)}_{r=0}^{k}. See Simonis [109].
One may wonder if the method of generalizing and extending the weight enumerator can be continued, creating the generalized extended weight enumerator, in order to get a stronger invariant. The answer is no: the generalized extended weight enumerator can be defined, but does not contain more information than the three underlying polynomials.
It was shown by Gray [29] that the matroid of a code is a stronger invariant than its Tutte polynomial.
5.4.7 Exercises
5.4.1 Give a proof of the formulas in Example 5.4.6.
5.4.2 Give a proof of Remark 5.4.25.
5.4.3 Compute the two variable Möbius and coboundary polynomial of the simplex code S3(q).
5.5 Combinatorics and codes
***Intro***
5.5.1 Orthogonal arrays and codes
Definition 5.5.1 Let q be a positive integer, not necessarily a power of a prime. A Latin square of order q is a q × q array with entries from a set Q of q elements, such that every column and every row is a permutation of the symbols of Q.
Example 5.5.2 An example of a Latin square of order 4 with Q = {a, b, c, d}
is given by
a d c b
d a b c
c b a d
b c d a
Remark 5.5.3 An alternative way to represent a Latin square is by a map L : R × C → Q, where R, C and Q are the sets of rows, columns and values, respectively, all three of size q. Then L represents a Latin square if and only if L(x, j) = k has a unique solution x ∈ R for all j ∈ C and k ∈ Q, and L(i, y) = k has a unique solution y ∈ C for all i ∈ R and k ∈ Q.
Any permutation of the rows, that is of the set R, gives another Latin square,
and similarly permutations of the columns C and the entries Q give again Latin
squares.
Example 5.5.4 Let (G, ·) be a group where · is the multiplication on G. Let
R, C and Q all three be equal to G. Let L(x, y) = x · y. Then L defines a Latin
square of order |G|.
Remark 5.5.5 A pair of Greek-Latin squares.
Euler's problem of the 36 officers and the non-existence of two mutually orthogonal Latin squares of order 6.
Definition 5.5.6 Two Latin squares L_1 and L_2 are called mutually orthogonal if Q² is equal to the set of all pairs (L_1(x, y), L_2(x, y)) with x, y ∈ Q. A collection {L_i : i ∈ J} of Latin squares L_i of order q with entries from a set Q is called a set of mutually orthogonal Latin squares (MOLS) if L_i and L_j are mutually orthogonal for all i, j ∈ J with i ≠ j.
Example 5.5.7 Consider Q = F_q where + is the addition. Let L_a(x, y) = x + ay. Then L_a defines a Latin square of order q for all a ∈ F_q*. Furthermore {L_a : a ∈ F_q*} forms a collection of q − 1 MOLS of order q.
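For a prime q (so that F_q is just Z/qZ), Example 5.5.7 can be checked directly. The Python sketch below (an added illustration) builds the squares L_a and verifies both the Latin and the orthogonality property:

q = 7  # any prime

def L(a):
    return [[(x + a * y) % q for y in range(q)] for x in range(q)]

def is_latin(M):
    return all(len(set(row)) == q for row in M) and \
           all(len({M[x][y] for x in range(q)}) == q for y in range(q))

def orthogonal(M1, M2):
    # the q^2 value pairs (M1[x][y], M2[x][y]) must all be distinct
    pairs = {(M1[x][y], M2[x][y]) for x in range(q) for y in range(q)}
    return len(pairs) == q * q

squares = [L(a) for a in range(1, q)]
print(all(is_latin(M) for M in squares))  # True
print(all(orthogonal(squares[i], squares[j])
          for i in range(len(squares)) for j in range(i + 1, len(squares))))  # True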
Example 5.5.8 In GAP one can construct lists of MOLS. For example for q = 7 we can construct 6 MOLS:

gap> M:=MOLS(7,6);;
gap> M[1];
[ [ 0, 1, 2, 3, 4, 5, 6 ], [ 1, 2, 3, 4, 5, 6, 0 ],
  [ 2, 3, 4, 5, 6, 0, 1 ], [ 3, 4, 5, 6, 0, 1, 2 ],
  [ 4, 5, 6, 0, 1, 2, 3 ], [ 5, 6, 0, 1, 2, 3, 4 ],
  [ 6, 0, 1, 2, 3, 4, 5 ] ]
Definition 5.5.9 Let n ≥ 2. An orthogonal array OA(q, n) of order q and depth n is a q² × n array whose entries are from a set Q of q elements, such that for every two columns all q² pairs of symbols from Q appear in exactly one row.
Remark 5.5.10 Let J = {1, 2, . . . , j}. Let {L_i : i ∈ J} be a collection of j MOLS of order q. Let n = j + 2. We can construct a q² × n orthogonal array as follows. Identify R and C with Q by means of bijections, so we may assume that they are equal. In the first two columns all q² pairs of Q² are tabulated. If (x, y) is in a row of the first two columns, then L_i(x, y) is in column i + 2 of the same row.
Conversely an OA(q, n) gives rise to n − 2 MOLS of order q if n ≥ 3. In particular an OA(q, 3) is a Latin square and an OA(q, 4) corresponds to two mutually orthogonal Latin squares.
Example 5.5.11 Let q be a power of a prime. Then a collection of q − 1
MOLS of order q is constructed in Example 5.5.7. Therefore there exists an
OA(q, q + 1).
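Combining Example 5.5.7 with Remark 5.5.10 gives the OA(q, q + 1) of Example 5.5.11 explicitly. The following Python sketch (an added illustration, again for a prime q) builds it and verifies the orthogonal array property:

q = 5  # any prime
rows = [[x, y] + [(x + a * y) % q for a in range(1, q)]
        for x in range(q) for y in range(q)]
n = q + 1

# OA property: every ordered pair of symbols appears once in every pair of columns
ok = all(len({(r[i], r[j]) for r in rows}) == q * q
         for i in range(n) for j in range(i + 1, n))
print(ok)  # True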
Remark 5.5.12 Let {L_i : i ∈ J} be a collection of n − 2 MOLS of order q with an array A the corresponding OA(q, n). A permutation σ of the rows R gives a collection {L′_i : i ∈ J} of Latin squares which are again mutually orthogonal, with a corresponding array A_1. Then A_1 is obtained from A by permuting the symbols in the first column under σ and leaving the remaining columns unchanged. Similarly, a permutation of the columns C gives an array A_2 that is obtained from A by permuting the symbols in the second column. A permutation of the entries from Q of L_i gives an array A_{i+2} that is obtained from A by permuting the symbols in the (i + 2)-th column.
Remark 5.5.13 Let A be an OA(q, n) with entries in Q. Then any two rows of A are distinct and coincide in at most one position. Let C be the subset of Q^n consisting of the rows of A. Then C is a nonlinear code of length n with q² codewords and minimum distance n − 1. So C attains the Singleton bound of Exercise 3.2.1. Conversely any nonlinear (n, q², n − 1) code yields an OA(q, n).
The following proposition is a generalization of Proposition 4.4.25 in case k = 2, which states that n ≤ q + 1 if there exists an [n, 2, n − 1] code over F_q.
Proposition 5.5.14 Suppose there exists an orthogonal array OA(q, n). Then n ≤ q + 1.

Proof. Let A be the array of an OA(q, n). Choose an element in Q and denote it by 0. If the symbols in the i-th column of A are permuted, while the other columns remain unchanged, the new array is again an OA(q, n) by Remark 5.5.12. Therefore we may assume without loss of generality that the first row of A consists of zeros. The distance between two rows is at least n − 1 by Remark 5.5.13. Hence apart from the first row, no other row contains two zeros. Next, it can be easily observed that each element from Q occurs in every column of A exactly q times. We leave this as an exercise to the reader.
Count the number of rows that contain exactly one zero. This number is n(q − 1). Indeed, zero appears q times in each column, and the zeros in the first row have already been counted. In addition, since a row i > 1 cannot have more than one zero, all these zeros lie in different rows. So 1 + n(q − 1) is the number of rows that contain a zero, and this is at most q², the total number of rows. Therefore n ≤ q + 1.
Remark 5.5.15 The bound of Proposition 5.5.14 is tight if q is a power of a
prime by Example 5.5.11.
Consider the following generalization of an orthogonal array.
Definition 5.5.16 An orthogonal array OA(q, n, λ) is a λq² × n array whose entries are from a set Q of q elements, such that for every two columns each of the q² pairs of symbols from Q occurs in exactly λ rows. In particular OA(q, n) = OA(q, n, 1).
The next result we present here without proof. It provides a lower bound on the value of λ in terms of q and n.

Theorem 5.5.17 If there exists an orthogonal array OA(q, n, λ), then

λ ≥ (n(q − 1) + 1) / q².

Proof. Reference: ***...***
Definition 5.5.18 An orthogonal array OA_λ(t, n, q) is an M × n array, where M = λq^t, whose entries are from a set Q of q ≥ 2 elements, such that for every M × t subarray all q^t possible t-tuples occur exactly λ times as a row. The parameters λ, t, n, q and M are called the index, strength, constraints, levels and size, respectively. The orthogonal array is called linear if Q = F_q and the rows of the array form an F_q-linear subspace of F_q^n.
Remark 5.5.19 An OA(q, n, λ) is an orthogonal array of strength 2, that is
OA(q, n, λ) = OAλ(2, n, q). ***Notice that the order of n and q is interchanged
according to the literature!!! should we adopt this convention too???***
Theorem 5.5.20 The following objects correspond to each other:
1) an F_q-linear [n, k, d] code,
2) a linear orthogonal array OA_{q^s}(d − 1, n, q), where s = n − k + 1 − d is the Singleton defect of C.

Proof. Let C be an F_q-linear [n, k, d] code with Singleton defect s = s(C) = n − k + 1 − d. Consider the q^{n−k} × n matrix A having as rows the codewords of C⊥. Then A is a linear OA_{q^s}(d − 1, n, q). ***...***
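The correspondence of Theorem 5.5.20 is easy to check numerically for a small code. The Python sketch below (an added illustration) uses the binary [7, 4, 3] Hamming code, whose Singleton defect is s = 7 − 4 + 1 − 3 = 1, so the 8 codewords of its dual (simplex) code should form a linear OA_2(2, 7, 2):

from itertools import product, combinations

H = [[0,1,1,1,1,0,0],   # generator matrix of the dual (simplex) code
     [1,0,1,1,0,1,0],
     [1,1,0,1,0,0,1]]
rows = [tuple(sum(m[i] * H[i][j] for i in range(3)) % 2 for j in range(7))
        for m in product(range(2), repeat=3)]

lam, t, q = 2, 2, 2
ok = True
for cols in combinations(range(7), t):
    counts = {}
    for r in rows:
        key = tuple(r[c] for c in cols)
        counts[key] = counts.get(key, 0) + 1
    ok &= all(counts.get(tup, 0) == lam for tup in product(range(q), repeat=t))
print(ok)  # True: every pair of symbols occurs exactly twice in every column pair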
Remark 5.5.21 An OA_1(n − k, n, q) is a nonlinear generalization of an F_q-linear MDS code of length n and dimension k.
Consider the following generalization of Corollary 4.4.27 on MDS codes.

Theorem 5.5.22 (Bush bound) Let A be an OA_1(k, n, q). If q ≤ k, then n ≤ k + 1.

Proof. ***...***
5.5.2 Designs and codes
5.5.3 Exercises
5.5.1 Proof that Example 5.5.7 gives a set of q − 1 mutually orthogonal Latin
squares of order q.
5.5.2 Let q be positive integer. Show that q − 1 is the maximal number of
MOLS of order q.
5.5.3 Show that there exist t MOLS of order qr if there exist t MOLS of orders
q and r, respectively.
5.5.4 Let n ≥ 3. Give a proof of the correspondence between an OA(q, n) and
n − 2 MOLS of order q of Remark 5.5.10.
5.5.5 Let A be the array of an OA(q, n, λ) with entries from Q. Show that
every symbol of Q occurs in every column of A exactly λq times.
5.5.6 Let A be the array of an OAλ(t, n, q) with entries from Q. Let A′ be
obtained from A by permuting the symbols in a given column and leaving the
remaining columns unchanged. Show that A′ is the array of an OAλ(t, n, q).
5.5.7 [CAS] Write two procedures:
• the first takes as input a q × q table and checks whether the table is a Latin
square; compare your procedure with IsLatinSquare in GAP;
• the second, given a list of q × q tables, checks whether they are MOLS; use
AreMOLS from GAP to test your procedure (a sketch of both checks follows
below).
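As a starting point for this exercise, here is a sketch of both checks in Python
(rather than GAP); the names is_latin_square and are_mols are our own, and
the GAP functions IsLatinSquare and AreMOLS can be used to validate the
output. Symbols are taken to be 0, . . . , q − 1, and the two example squares are
built as L_a(i, j) = a·i + j mod 3, presumably the construction of Example 5.5.7
for q = 3.

from itertools import combinations

def is_latin_square(table):
    # every row and every column of the q x q table must contain
    # each of the q symbols exactly once
    q = len(table)
    symbols = set(range(q))
    rows_ok = all(set(row) == symbols for row in table)
    cols_ok = all({table[i][j] for i in range(q)} == symbols
                  for j in range(q))
    return rows_ok and cols_ok

def are_mols(squares):
    # mutually orthogonal: superimposing any two squares yields
    # every ordered pair of symbols exactly once
    if not all(is_latin_square(sq) for sq in squares):
        return False
    q = len(squares[0])
    for A, B in combinations(squares, 2):
        pairs = {(A[i][j], B[i][j]) for i in range(q) for j in range(q)}
        if len(pairs) != q * q:
            return False
    return True

L1 = [[(1 * i + j) % 3 for j in range(3)] for i in range(3)]
L2 = [[(2 * i + j) % 3 for j in range(3)] for i in range(3)]
print(are_mols([L1, L2]))  # True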
5.6 Notes
Section 4.1.6: MDS Conjecture is confirmed for all q such that 2 ≤ q ≤ 11,
Blokhuis-Bruen-Thas, Hirschfeld-Storme.
Section 4.2:
Theory of arrangements of hyperplanes [92].
The use of the isomorphism in Proposition 4.5.18 for the proof of Theorem
4.5.21 was suggested in [109] by Simonis.
Proposition 4.5.20 first appears in [63, Theorem 3.2], although the term “gen-
eralized weight enumerator” was yet to be invented.
The identity of Lemma 4.5.22 can be found in [5, 27, 71, 113, 128].
Section 4.3:
Applications of GHW’s
***dimension/length profile, Forney***
***Wire-tap channel of type II***
***trellis complexity***
***r-th rank MDS, Kloeve,Simonis,Wei***
***Question: does the two-variable weight enumerator determine the generalized
weight enumerator?***
***C AMDS and C⊥ AMDS iff d2 = d1 + 2.***
***If d  qs(C), then ...***
*** wt enumerator of AMDS code***
Section 4.4:
Theory of lattices [38, ?].
The polynomial µL(S, T) is defined by Zaslavsky in [139, Section 1]. In [140,
Section 2] and [?, Section 6] it is called the Whitney polynomial. The polynomial
χL(S, T) is called the coboundary polynomial by Crapo in [42, p. 605] and [43].
See also [30, 32].
Minihypers, blocking sets and codes meeting the Griesmer bound:
Belov, Hamada-Helleseth, Storme.
Section 4.4.2: Corollary 4.3.25 was first proved by Oberst and Dür [?], with the
weaker assumption q^m > \binom{n−1}{d−1} − \binom{n−k−1}{d−1}, where C is
an [n, k, d] code. Proposition 4.3.24 was shown by Pellikaan [?] with a stronger
conclusion.
(Complete) n-arcs, ovals, Segre: an oval is a (q + 1)-arc if q is odd, ***B. Segre,
conic, odd curve in char 2, nucleus***.
Conjectures of Segre, Hirschfeld-Thas, Hirschfeld-Korchmáros-Torres pp. 599.
Section 4.5:
Section 4.6:
Literature on (mutually orthogonal) Latin squares, orthogonal arrays, codes and
designs:
J.H. van Lint and R.M. Wilson, A course in combinatorics, pages 158, 250,
261, 382 and 495.
P. Cameron and J.H. van Lint, Designs, graphs, codes and their links, pages
14, 93, 170, 209.
Links between coding theory and statistical objects:
R.C. Bose, “On some connections between the design of experiments and infor-
mation theory,” Bull. Inst. Internat. Statist., vol. 38, pp. 257–271, 1961.
Connection between OA and error-correcting codes with a given defect: R.C.
Bose and K.A. Bush, “Orthogonal arrays of strength two and three,” Ann.
Math. Stat., vol. 23, pp. 508–524, 1952.
The construction of OA of maximal length and the Bush bound:
K.A. Bush, “Orthogonal arrays of index unity,” Ann. Math. Stat., vol. 23, pp.
426–434, 1952.
J.W.P. Hirschfeld and L. Storme, “The packing problem in statistics, coding
theory and finite projective spaces,” Journ. Stat. Planning and Inference, vol.
72, pp. 355–380, 1998.
The notion of an OAλ(t, n, q) as a generalization of MOLS is from:
C.R. Rao, “Factorial experiments derivable from combinatorial arrangements of
arrays,” Journ. Royal Stat. Soc. Suppl. vol. 9, pp. 128–139, 1947.
***Bose-Bush, Bierbrauer, Stinson***
***t-resilient functions***
***The design of statistical experiments***
***Lattices and codes***
Chapter 6
Complexity and decoding
Stanislav Bulygin, Ruud Pellikaan and Xin-Wen Wu
6.1 Complexity
In this section we briefly explain the theory of complexity and introduce some
hard problems which are related to the theme of this book and will be useful in
the following chapters.
6.1.1 Big-Oh notation
The following definitions and notations are essential in the evaluation of the
complexity of an algorithm.
Definition 6.1.1 Let f(n) and g(n) be functions mapping non-negative inte-
gers to real numbers. We define
(1) f(n) = O(g(n)) for n → ∞, if there exist a real constant c > 0 and an
integer constant n0 > 0 such that 0 ≤ f(n) ≤ cg(n) for all n ≥ n0.
(2) f(n) = Ω(g(n)) for n → ∞, if there exist a real constant c > 0 and an
integer constant n0 > 0 such that 0 ≤ cg(n) ≤ f(n) for all n ≥ n0.
(3) f(n) = Θ(g(n)) for n → ∞, if there exist real constants c1 > 0 and c2 > 0,
and an integer constant n0 > 0 such that c1g(n) ≤ f(n) ≤ c2g(n) for all
n ≥ n0.
(4) f(n) ≈ g(n) for n → ∞, if lim_{n→∞} f(n)/g(n) = 1.
(5) f(n) = o(g(n)) for n → ∞, if for every real constant ε > 0 there exists an
integer constant n0 > 0 such that 0 ≤ f(n) ≤ εg(n) for all n ≥ n0.
Remark 6.1.2 The notations f(n) = O(g(n)) and f(n) = o(g(n)) of Landau
are often referred to as the “big-Oh” and “little-oh” notations. Furthermore
f(n) = O(g(n)) is expressed as “f(n) is of the order g(n)”. Intuitively, this
means that f(n) grows no faster asymptotically than g(n) up to a constant. And
f(n) ≈ g(n) is expressed as “f(n) is approximately equal to g(n)”. Similarly,
in the literature f(n) = Ω(g(n)) and f(n) = Θ(g(n)) are referred to as the
“big-Omega” and “big-Theta” notations, respectively.
Example 6.1.3 It is easy to see that for every positive constant a, we have
a = O(1) and a/n = O(1/n). Let f(n) = a_k n^k + a_{k−1} n^{k−1} + · · · + a_0,
where k is an integer constant and a_k, a_{k−1}, . . . , a_0 are real constants with
a_k > 0. For this polynomial in n, we have f(n) = O(n^k), f(n) = Θ(n^k),
f(n) ≈ a_k n^k and f(n) = o(n^{k+1}) for n → ∞.
We have 2 log n + 3 log log n = O(log n), 2 log n + 3 log log n = Θ(log n) and
2 log n + 3 log log n ≈ 2 log n for n → ∞, since 2 log n ≤ 2 log n + 3 log log n ≤
5 log n when n ≥ 2 and lim_{n→∞} log log n/ log n = 0.
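These asymptotic statements can also be observed numerically; a small sketch
in Python (our own illustration, not part of the text):

import math

# the ratio (2 log n + 3 log log n) / (2 log n) tends to 1,
# illustrating 2 log n + 3 log log n ≈ 2 log n for n → ∞
for n in [10, 10**3, 10**6, 10**12]:
    f = 2 * math.log2(n) + 3 * math.log2(math.log2(n))
    g = 2 * math.log2(n)
    print(n, f / g)
# the printed ratios decrease towards 1 as n grows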
6.1.2 Boolean functions
An algorithm is a well-defined computational procedure such that every execu-
tion takes a variable input and halts with an output.
The complexity of an algorithm or a computational problem includes time com-
plexity and storage space complexity.
Definition 6.1.4 A (binary) elementary (arithmetic) operation is an addition,
a comparison or a multiplication of two elements x, y ∈ {0, 1} = F2. Let A be
an algorithm that has as input a binary word. Then the time or work complexity
CT(A, n) is the number of elementary operations in the algorithm A to get the
output as a function of the length n of the input, that is the number of bits of
the input. The space or memory complexity CS(A, n) is the maximum number
of bits needed for memory during the execution of the algorithm with an input
of n bits. The complexity C(A, n) is the maximum of CT(A, n) and CS(A, n).
Example 6.1.5 Let C be a binary [n, k] code given by the generator matrix G.
Then the encoding procedure
(a1, . . . , ak) → (a1, . . . , ak)G
is an algorithm. For every execution of the encoding algorithm, the input is a
vector of length k which represents a message block; the output is a codeword
of length n. To compute one entry of a codeword one has to perform k multipli-
cations and k − 1 additions. The work complexity of this encoding is therefore
n(2k − 1). The memory complexity is nk + k + n: the number of bits needed
to store the input vector, the matrix G and the output codeword. The
complexity is dominated by the work complexity and is thus n(2k − 1).
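A sketch of this encoding map in Python, counting the elementary operations
explicitly (our own illustration; the [4, 2] matrix below is a toy example, not a
code from the text):

def encode_count(a, G):
    # encode the message a with generator matrix G over F2 and
    # count the elementary operations (multiplications and additions)
    k, n = len(G), len(G[0])
    ops = 0
    c = []
    for j in range(n):
        entry = 0
        for i in range(k):           # k multiplications and
            entry ^= a[i] & G[i][j]  # k - 1 additions per entry
        ops += k + (k - 1)
        c.append(entry)
    return c, ops                    # ops equals n * (2k - 1)

G = [[1, 0, 1, 1],
     [0, 1, 0, 1]]
print(encode_count([1, 1], G))  # ([1, 1, 1, 0], 12), and 4 * (2*2 - 1) = 12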
Example 6.1.6 In coding theory the code length is usually taken as a measure
of the input size. In case of binary codes this coincides with the above complexity
measures. For q-ary codes an element of Fq has a minimal binary representation
of ⌈log(q)⌉ bits. So a received word of length n, the input of a decoding algorithm,
can be represented by a binary word of length N = n⌈log(q)⌉. In case the finite
field is fixed there is no danger of confusion, but in case the efficiency of algo-
rithms for distinct finite fields is compared, everything should be expressed in
terms of the number of binary elementary operations as a function of the length
of the input as a binary string.
Let us see how this works out for solving a system of linear equations over a
finite field. Whereas an addition and a multiplication each count for 1 unit in
the binary case, this is no longer so in the q-ary case. An addition in Fq takes
⌈log(q)⌉ binary elementary operations and a multiplication needs
O(m^2 log^2(p) + m log^3(p)) = O(log^3(q)) elementary operations, where q = p^m
and p is the characteristic of the finite field, see ??. The Gauss-Jordan algorithm
to solve a system of n linear equations in n unknowns over a finite field Fq needs
O(n^3) additions and multiplications in Fq. That means the binary complexity is
O(n^3 log^3(q)) = O(N^3), where N = n⌈log(q)⌉ is the length of the binary input.
The known decoding algorithms that have polynomial complexity and that will
be treated in the sequel all reduce to linear algebra computations, so they have
complexity O(n^3) elementary operations in Fq or O(N^3) bit operations. So
we will take the code length n as a measure of the input size, and state the
complexity as a function of n. These polynomial decoding algorithms apply to
restricted classes of linear codes.
To study the theory of complexity, two different computational models are
widely used in the literature: the Turing machine (TM) model and the Boolean
circuit model. Of these two models the Boolean circuit model has an especially
simple definition and is viewed as more amenable to combinatorial analysis. A
Boolean circuit represents a Boolean function in a natural way, and Boolean
functions have many applications in coding theory. In this book we choose
Boolean circuits as the computational model.
*** One or two paragraphs on Boolean circuits vs. Turing machines (cf. R.B.
Boppana and M. Sipser, “The complexity of finite functions”) ***
The basic elements of a Boolean circuit are Boolean gates, namely AND, OR,
NOT and XOR, which are defined by the following truth tables.
The truth table of AND (denoted by ∧):
∧ | F T
F | F F
T | F T
The truth table of OR (denoted by ∨):
∨ | F T
F | F T
T | T T
The truth table of NOT (denoted by ¬): ¬F = T and ¬T = F.
The truth table of XOR:
XOR | F T
F   | F T
T   | T F
It is easy to check that the XOR gate can be represented by AND, OR and NOT
as follows:
x XOR y = (x ∧ (¬y)) ∨ ((¬x) ∧ y).
The NAND operation is an AND operation followed by a NOT operation. The
NOR operation is an OR operation followed by a NOT operation. In the fol-
lowing definition of Boolean circuits we restrict ourselves to the operations AND,
OR and NOT.
Substituting F = 0 and T = 1, the Boolean gates above are actually operations
on bits (called logical operations on bits). We have
∧ operation:
0 ∧ 0 = 0
0 ∧ 1 = 0
1 ∧ 0 = 0
1 ∧ 1 = 1
∨ operation:
0 ∨ 0 = 0
0 ∨ 1 = 1
1 ∨ 0 = 1
1 ∨ 1 = 1
NOT operation:
¬ 0 = 1
¬ 1 = 0
Consider the binary elementary arithmetic operations + and ·. It is easy to
verify that
x · y = x ∧ y, and x + y = x XOR y = (x ∧ (¬y)) ∨ ((¬x) ∧ y).
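These identities are easy to verify exhaustively; a small sketch in Python (our
own illustration):

# verify x XOR y = (x AND (NOT y)) OR ((NOT x) AND y) on all bit pairs,
# and that AND and XOR realize multiplication and addition in F2
for x in (0, 1):
    for y in (0, 1):
        xor_via_and_or_not = (x & (1 - y)) | ((1 - x) & y)
        assert xor_via_and_or_not == x ^ y     # x + y in F2
        assert (x & y) == (x * y) % 2          # x · y in F2
print("identities verified")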
Definition 6.1.7 Given positive integers n and m, a Boolean function is a
function b : {0, 1}^n → {0, 1}^m. It is also called an n-input, m-output Boolean
function and the set of all such functions is denoted by B(n, m). Denote B(n, 1)
by B(n).
Remark 6.1.8 The number of elements of B(n, m) is (2^m)^{2^n} = 2^{m2^n}.
Identify {0, 1} with the binary field F2. Let b1 and b2 be elements of B(n, m).
Then the sum b1 + b2 is defined by (b1 + b2)(x) = b1(x) + b2(x) for x ∈ F2^n.
In this way the set of Boolean functions B(n, m) is a vector space over F2 of
dimension m2^n. Let b1 and b2 be elements of B(n). Then the product b1b2 is
defined by (b1b2)(x) = b1(x)b2(x) for x ∈ F2^n. In this way B(n) is an
F2-algebra with the property b^2 = b for all b in B(n).
Every polynomial f(X) in F2[X1, . . . , Xn] yields a Boolean function
f̃ : F2^n → F2 by evaluation: f̃(x) = f(x) for x ∈ F2^n. Consider the map
ev : F2[X1, . . . , Xn] −→ B(n),
defined by ev(f) = f̃. Then ev is an algebra homomorphism. Now X̃i^2 = X̃i
for all i. Hence the ideal ⟨X1^2 + X1, . . . , Xn^2 + Xn⟩ is contained in the kernel
of ev. The factor ring F2[X1, . . . , Xn]/⟨X1^2 + X1, . . . , Xn^2 + Xn⟩ and B(n)
are both F2-algebras of the same dimension 2^n. Hence ev induces an
isomorphism
ev : F2[X1, . . . , Xn]/⟨X1^2 + X1, . . . , Xn^2 + Xn⟩ −→ B(n).
Example 6.1.9 Let sym_k(x) be the Boolean function defined by the following
polynomial in k^2 variables x_{ij}, 1 ≤ i, j ≤ k:
sym_k(x) = ∏_{i=1}^{k} ∑_{j=1}^{k} x_{ij}.
This description needs k(k − 1) additions and k − 1 multiplications. There-
fore k^2 − 1 elementary operations are needed in total. If we had written
sym_k in normal form by expanding the products, the description would be of
the form
sym_k(x) = ∑_{σ∈K^K} ∏_{i=1}^{k} x_{iσ(i)},
where K^K is the set of all functions σ : {1, . . . , k} → {1, . . . , k}. This expression
has k^k terms, each a product of k factors. So this needs (k − 1)k^k
multiplications and k^k − 1 additions. Therefore k^{k+1} − 1 elementary
operations are needed in total. Hence this last description has exponential
complexity.
Example 6.1.10 Computing the binary determinant. Let det_k(x) be the Boolean
function of k^2 variables x_{ij}, 1 ≤ i, j ≤ k, that computes the determinant over
F2 of the k × k matrix x = (x_{ij}). Hence
det_k(x) = ∑_{σ∈S_k} ∏_{i=1}^{k} x_{iσ(i)},
where S_k is the symmetric group on k elements. This expression has k! terms,
each a product of k factors. Therefore k(k!) − 1 elementary operations are
needed in total.
Let x̂_{ij} be the square matrix of size k − 1 obtained by deleting the i-th row
and the j-th column from x. Using the cofactor expansion
det_k(x) = ∑_{j=1}^{k} x_{ij} det_{k−1}(x̂_{ij}),
we see that the complexity of this computation is of the order O(k!). This
complexity is still exponential. But det_k has complexity O(k^3) by Gaussian
elimination. This translates into a description of det_k as a Boolean function
with O(k^3) elementary operations.
***explicit description, worked out in an example for det3***
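A sketch of the O(k^3) computation by Gaussian elimination over F2 in Python
(our own illustration; a Boolean-circuit description would unfold the same
operations):

def det_f2(x):
    # determinant over F2 of a k x k 0/1 matrix by Gaussian elimination,
    # using O(k^3) elementary operations
    m = [row[:] for row in x]  # work on a copy
    k = len(m)
    for col in range(k):
        # find a pivot row with a 1 in this column
        pivot = next((r for r in range(col, k) if m[r][col]), None)
        if pivot is None:
            return 0               # singular, determinant 0
        m[col], m[pivot] = m[pivot], m[col]  # a row swap does not change det over F2
        for r in range(col + 1, k):
            if m[r][col]:
                m[r] = [a ^ b for a, b in zip(m[r], m[col])]
    return 1                       # full rank, determinant 1 over F2

print(det_f2([[1, 1, 0], [0, 1, 1], [1, 0, 0]]))  # 1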
Example 6.1.11 A Boolean function computing whether an integer is prime
or not. Let prime_m(x) be the Boolean function that is defined by
prime_m(x1, . . . , xm) = 1 if x1 + x2·2 + · · · + xm·2^{m−1} is a prime, and
0 otherwise.
So prime_2(x1, x2) = x2 and prime_3(x1, x2, x3) = x2 + x1x3 + x2x3.
Only very recently it was proved that the decision problem whether an integer
is prime or not, has polynomial complexity, see ??.
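The algebraic normal form of prime_3 can be verified by brute force; a sketch
in Python (our own illustration):

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, n))

def prime_m(bits):
    # prime_m(x1, ..., xm) = 1 iff x1 + x2*2 + ... + xm*2^(m-1) is prime
    return int(is_prime(sum(b << i for i, b in enumerate(bits))))

# check prime_3(x1, x2, x3) = x2 + x1*x3 + x2*x3 over F2 on all 8 inputs
for x1 in (0, 1):
    for x2 in (0, 1):
        for x3 in (0, 1):
            anf = (x2 + x1 * x3 + x2 * x3) % 2
            assert anf == prime_m((x1, x2, x3))
print("prime_3 verified")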
Example 6.1.12 ***A Boolean function computing exponentiation expa by
Coppersmith and Shparlinski, “A polynomial approximation of DL and DH
mapping,” Journ. Crypt. vol. 13, pp. 339–360, 2000. ***
Remark 6.1.13 From these examples we see that the complexity of a Boolean
function depends on the way we write it as a combination of elementary opera-
tions.
We can formally define the complexity of a Boolean function f in terms of the
size of a circuit that represents the Boolean function.
Definition 6.1.14 A Boolean circuit is a directed graph containing no cycles
(that is, if there is a route from any node to another node then there is no way
back), which has the following structure:
(i) Any node (also called vertex) v has in-degree (that is, the number of edges
entering v) equal to 0, 1 or 2, and the out-degree (that is, the number of
edges leaving v) equal to 0 or 1.
(ii) Each node is labeled by one of AND, OR, NOT, 0, 1, or a variable xi.
(iii) If a node has in-degree 0, then it is called an input and is labeled by 0, 1,
or a variable xi.
(iv) If a node has in-degree 1 and out-degree 1, then it is labeled by NOT.
(v) If a node has in-degree 2 and out-degree 1, then it is labeled by AND or
OR.
In a Boolean circuit, any node with in-degree greater than 0 is called a gate.
Any node with out-degree 0 is called an output.
Remark 6.1.15 By the definition, we observe that:
(1) A Boolean circuit can have more than one input and more than one output.
(2) Suppose a Boolean circuit has n variables x1, x2, . . . , xn, and has m out-
puts; then it represents a Boolean function f : {0, 1}^n → {0, 1}^m in a
natural way.
(3) Any Boolean function f : {0, 1}^n → {0, 1}^m can be represented by a
Boolean circuit.
Definition 6.1.16 The size of a Boolean circuit is the number of gates that
it contains. The depth of a Boolean circuit is the length of the longest path
from an input to an output. For a Boolean function f, the time complexity of
f, denoted by CT (f), is the smallest value of the sizes of the Boolean circuits
representing f. The space complexity (also called depth complexity), denoted by
CS(f) is the smallest value of the depths of the Boolean circuits representing f.
Theorem 6.1.17 (Shannon) There exist families of Boolean functions of ex-
ponential complexity; in fact almost every Boolean function in B(n) requires
circuits of size larger than 2^n/(10n).
Proof. Let us first give an upper bound on the number of circuits with n
variables and size s, and then compare it with the number of Boolean functions
of n variables.
In a circuit of size s, each gate computes an AND or an OR of two previous
nodes. Each previous node can either be a previous gate with at most s choices,
a literal (that is, a variable or its negation) with 2n choices, or a constant with
2 choices. Therefore each gate has at most 2(s + 2n + 2)^2 choices, which implies
that the number of circuits with n variables and size s is at most
2^s (s + 2n + 2)^{2s}. Now, setting s = 2^n/(10n), the upper bound
2^s (s + 2n + 2)^{2s} is approximately 2^{2^n/5}, which is much smaller than
2^{2^n}. On the other hand, the number of Boolean functions of n variables and
one output is 2^{2^n}. This implies that almost every Boolean function requires
circuits of size larger than 2^n/(10n).
6.1.3 Hard problems
We now look at the classification of algorithms through the complexity.
Definition 6.1.18 Let
L_n(α, a) = O(exp(a n^α (ln n)^{1−α})),
where a and α are constants with 0 ≤ a and 0 ≤ α ≤ 1. In particular
L_n(1, a) = O(exp(an)), and L_n(0, a) = O(exp(a ln n)) = O(n^a). Let A denote
an algorithm with input size n. Then A is an L(α)-algorithm if the complexity
of this algorithm has an estimate of the form L_n(α, a) for some a. An L(0)-
algorithm is called a polynomial algorithm and an L(1)-algorithm is called an
exponential algorithm. An L(α)-algorithm is called a subexponential algorithm
if α < 1.
A problem that has either YES or NO as an answer is called a decision problem.
All the computational problems that will be encountered here can be phrased
as decision problems in such a way that an efficient algorithm for the decision
problem yields an efficient algorithm for the computational problem, and vice
versa. In the following complexity classes, we restrict our attention to decision
problems.
Definition 6.1.19 The complexity class P is the set of all decision problems
that are solvable in polynomial complexity.
Definition 6.1.20 The complexity class NP is the set of all decision problems
for which a YES answer can be verified in polynomial time given some extra
information, called a certificate. The complexity class co-NP is the set of all
decision problems for which a NO answer can be verified in polynomial time
given an appropriate certificate.
Example 6.1.21 Consider the decision problem that has as input a generator
matrix of a code C and a positive integer w, with question “d(C) ≤ w?” In case
the answer is yes, there exists a codeword c of minimum weight d(C). Then c
is a certificate and the verification wt(c) ≤ w has complexity n.
Definition 6.1.22 Let D1 and D2 be two computational problems. Then D1
is said to be polytime reducible to D2, denoted by D1 ≤P D2, provided that there
exists an algorithm A1 that solves D1 which uses an algorithm A2 that solves
D2, and A1 runs in polynomial time if A2 does. Informally, if D1 ≤P D2, we
say D1 is no harder than D2. If D1 ≤P D2 and D2 ≤P D1, then D1 and D2 are
said to be computationally equivalent.
Definition 6.1.23 A decision problem D is said to be NP-complete if
• D ∈ NP, and
• E ≤P D for every E ∈ NP.
The class of all NP-complete problems is denoted by NPC.
Definition 6.1.24 A computational problem (not necessarily a decision prob-
lem) is NP-hard if there exists some NP-complete problem that polytime re-
duces to it.
Observe that every NP-complete problem is NP-hard. So the set of all NP-
hard problems contains NPC as a subset. Some other relationships among the
complexity classes above are illustrated as follows.
******A Figure******
It is natural to ask the following questions:
(1) Is P = NP ?
(2) Is NP = co-NP ?
(3) Is P = NP ∩ co-NP ?
Most experts are of the opinion that the answer to each of these questions is NO.
However no mathematical proofs are available, and to answer these questions is
an interesting and hard problem in theoretical computer science.
6.1.4 Exercises
6.1.1 Give an explicit expression of det3(x) as a Boolean function.
6.1.2 Give an explicit expression of prime4(x) as a Boolean function.
6.1.3 Give an explicit expression of expa(x) as a Boolean function, where ....
6.2 Decoding
*** intro***
6.2.1 Decoding complexity
The known decoding algorithms that work for all linear codes have exponential
complexity. Now we consider some of them.
Remark 6.2.1 The brute force method compares the distance of a received
word with all possible codewords, and chooses a codeword of minimum distance.
The time complexity of the brute force method is at most nq^k.
Definition 6.2.2 Let r be a received word with respect to a code C of dimen-
sion k. Choose an (n − k) × n parity check matrix H of the code C. Then
s = rH^T ∈ Fq^{n−k} is called the syndrome of r.
Remark 6.2.3 Let C be a code of dimension k. Let r be a received word.
Then r + C is called the coset of r. Now the cosets of the received words r1
and r2 are the same if and only if r1H^T = r2H^T. Therefore there is a one-to-
one correspondence between cosets of C and values of syndromes. Furthermore
every element of Fq^{n−k} is the syndrome of some received word r, since H has
rank n − k. Hence the number of cosets is q^{n−k}.
Remark 6.2.4 In Definition 2.4.10 of coset leader decoding no mention is made
of how this method is implemented. Coset leader decoding can be done in two
ways. Let H be a parity check matrix and G a generator matrix of C.
1) Preprocess a look-up table and store it in memory with a list of pairs (s, e),
where e is a coset leader of the coset with syndrome s ∈ Fq^{n−k}.
Suppose a received word r is the input; compute s = rH^T; look up the unique
pair (s, e) in the table with s as its first entry; give r − e as output.
2) For a received word r, compute s = rH^T; compute a solution e of minimal
weight of the equation eH^T = s; give r − e as output.
Now consider the complexity of the two methods for coset leader decoding:
1) The space complexity is clearly q^{n−k}, the number of elements in the table.
The time complexity is O(k^2(n − k)) for finding the solution c. The preprocessing
of the table has time complexity q^{n−k}, by going through all possible error
patterns e of non-decreasing weight and computing s = eH^T; put (s, e) in the
list if s is not already a first entry of a pair in the list.
2) Go through all possible error patterns e of non-decreasing weight and compute
s = eH^T and compare it with rH^T, where r is the received word. The first
instance where eH^T = rH^T gives a closest codeword c = r − e. The complexity
is at most |Bρ|n^2 for finding a coset leader, where ρ is the covering radius, by
Remark 2.4.9.
***Now |Bρ| ≈ ... ***
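A sketch of method 1) for a small binary code in Python (our own illustration;
the parity check matrix below is that of the binary [3, 1, 3] repetition code, a
toy example):

from itertools import combinations

def syndrome(v, H):
    # the syndrome v H^T over F2; H has n - k rows of length n
    return tuple(sum(v[j] * row[j] for j in range(len(v))) % 2 for row in H)

def coset_leader_table(H, n):
    # pair every syndrome with a coset leader, visiting error
    # patterns in order of non-decreasing weight
    table = {}
    for w in range(n + 1):
        for supp in combinations(range(n), w):
            e = [1 if j in supp else 0 for j in range(n)]
            table.setdefault(syndrome(e, H), e)  # keep the first leader found
    return table

H = [[1, 1, 0],
     [1, 0, 1]]
table = coset_leader_table(H, 3)
r = [1, 0, 1]                              # received word
e = table[syndrome(r, H)]                  # coset leader of its coset
print([ri ^ ei for ri, ei in zip(r, e)])   # decoded codeword [1, 1, 1]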
Example 6.2.5 ***[7,4,3] Hamming codes and other perfect codes, some small
non perfect codes.***
In order to compare their complexities we introduce the following definitions.
***work factor, memory factor***
Definition 6.2.6 Let the complexity of an algorithm be exponential O(q^{en})
for n → ∞. Then e is called the complexity exponent of the algorithm.
Example 6.2.7 The complexity exponent of the brute force method is R and
of coset leader decoding is 1 − R, where R is the information rate.
***Barg, van Tilburg. picture***
6.2.2 Decoding erasures
***hard/soft decision decoding, (de)modulation, signalling***
After receiving a word there is a stage at the beginning of the decoding process
where a decision has to be made about which symbol has been received. In
some applications it is desirable to postpone a decision and to put a question
mark “?” as a new symbol at that position, as if the symbol was erased. This is
called an erasure. So a word over the alphabet Fq with erasures can be viewed
as a word in the alphabet Fq ∪ {?}, that is an element of (Fq ∪ {?})^n. If only
erasures occur and the number of erasures is at most d − 1, then we are sure that
there is a unique codeword that agrees with the received word at all positions
that are not an erasure.
Proposition 6.2.8 Let d be the minimum distance of a code. Then for every
received word with t errors and s erasures such that 2t + s < d there is a unique
nearest codeword. Conversely, if d ≤ 2t + s then there is a received word with
at most t errors and s erasures with respect to more than one codeword.
Proof. This is left as an exercise to the reader.
Suppose that we have received a word with s erasures and no errors. Then
the brute force method would fill in all the possible q^s words at the erasure
positions and check whether the obtained word is a codeword. This method has
complexity O(n^2 q^s), which is exponential in the number of erasures. In this
section it is shown that correcting erasures only amounts to solving a system of
linear equations. This can be achieved by using the generator matrix or the
parity check matrix. The most efficient choice depends on the rate and the
minimum distance of the code.
Proposition 6.2.9 Let C be a code in Fq^n with parity check matrix H and
minimum distance d. Suppose that the codeword c is transmitted and the word r
is received with no errors and at most d − 1 erasures. Let J be the set of erasure
positions of r. Let y ∈ Fq^n be defined by yj = rj if j ∉ J and yj = 0 otherwise.
Let s = yH^T be the syndrome of y. Let e = y − c. Then wt(e) < d and e is the
unique solution of the following system of linear equations in x:
xH^T = s and xj = 0 for all j ∉ J.
Proof. By the definitions we have that
s = yH^T = cH^T + eH^T = 0 + eH^T = eH^T.
The support of e is contained in J. Hence ej = 0 for all j ∉ J. Therefore e is a
solution of the system of linear equations.
If x is another solution, then (x − e)H^T = 0. Therefore x − e is an element of
C, and moreover it is supported at J. So its weight is at most d(C) − 1. Hence
it must be zero. Therefore x = e.
The above method of correcting the erasures only by means of a parity check
matrix is called syndrome decoding up to the minimum distance.
Definition 6.2.10 Let the complexity of an algorithm be f(n) with f(n) ≈ cn^e
for n → ∞. Then the algorithm is called polynomial of degree e with complexity
coefficient c.
Corollary 6.2.11 The complexity of correcting erasures only by means of syn-
drome decoding up to the minimum distance is polynomial of degree 3 with com-
plexity coefficient (1/3)(1 − R)^2 δ for a code of length n → ∞, where R is the
information rate and δ the relative minimum distance.
Proof. This is a consequence of Proposition 6.2.9, which amounts to solving a
system of n − k linear equations in at most d − 1 unknowns, in order to get the
error vector e. Then c = y − e is the codeword sent. We may assume that the
encoding is done systematically at k positions, so the message m is immediately
read off from these k positions. The complexity is asymptotically of the order
(1/3)(n − k)^2 d = (1/3)(1 − R)^2 δ n^3 for n → ∞. See Appendix ??.
Example 6.2.12 Let C be the binary [7, 4, 3] Hamming code with parity check
matrix H given in Example 2.2.9. Suppose that r = (1, 0, ?, ?, 0, 1, 0) is a received
word with two erasures. Replace the erasures by zeros, giving y = (1, 0, 0, 0, 0, 1, 0).
The syndrome of y is s = yH^T. Now we want to solve the system of linear
equations xH^T = s with xi = 0 for all i ≠ 3, 4. Its solution is x3 = 1 and
x4 = 1, and c = (1, 0, 1, 1, 0, 1, 0) is the transmitted codeword.
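This computation can be redone mechanically; a sketch in Python, where H is
a common choice of parity check matrix for the [7, 4, 3] Hamming code (it may
differ from the one of Example 2.2.9 by a permutation of columns, but it decodes
this example to the same codeword). Since there are at most d − 1 erasures, the
sketch finds the solution by brute force over the erased positions instead of by
Gaussian elimination:

from itertools import product

H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]

def correct_erasures(r, erased, H):
    # erasures-only correction as in Proposition 6.2.9 over F2:
    # set the erased positions to 0 and solve x H^T = y H^T
    # with supp(x) inside the erasure set
    n = len(H[0])
    y = [0 if j in erased else r[j] for j in range(n)]
    s = [sum(row[j] * y[j] for j in range(n)) % 2 for row in H]
    for values in product((0, 1), repeat=len(erased)):
        x = [0] * n
        for j, v in zip(erased, values):
            x[j] = v
        if all(sum(row[j] * x[j] for j in range(n)) % 2 == si
               for row, si in zip(H, s)):
            return [yj ^ xj for yj, xj in zip(y, x)]

# r = (1, 0, ?, ?, 0, 1, 0): erasures at positions 3 and 4 (1-based)
print(correct_erasures([1, 0, None, None, 0, 1, 0], [2, 3], H))
# prints [1, 0, 1, 1, 0, 1, 0], the codeword of Example 6.2.12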
Example 6.2.13 Consider the MDS code C1 over F11 of length 11 and dimen-
sion 4 with generator matrix G1 as given in Proposition 3.2.10 with xi = i ∈ F11
for i = 1, . . . , 11. Let C be the dual code of C1. Then C is a [11, 7, 5] code by
Corollary 3.2.7, and H = G1 is a parity check matrix for C by Proposition
2.3.19. Suppose that we receive the following word with 4 erasures and no
errors.
r = (1, 0, ?, 2, ?, 0, 0, 3, ?, ?, 0).
What is the codeword sent? Replacing the erasures by 0 gives the word
y = (1, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0).
So yH^T = (6, 0, 5, 4). Consider the linear system of equations given by the 4 × 4
submatrix of H consisting of the columns corresponding to the erasure positions
3, 5, 9 and 10, with yH^T as the right-hand side:
| 1 1 1  1 | 6 |
| 3 5 9 10 | 0 |
| 9 3 4  1 | 5 |
| 5 4 3 10 | 4 |
After Gaussian elimination we see that (0, 8, 9, 0)^T is the unique solution of this
system of linear equations. Hence
c = (1, 0, 0, 2, 3, 0, 0, 3, 2, 0, 0)
is the codeword sent.
Remark 6.2.14 Erasures only correction by means of syndrome decoding is
efficient in case the information rate R is close to 1 and the relative minimum
distance δ is small, but cumbersome if R is small and δ is close to 1. Take
for instance the [n, 1, n] binary repetition code. Any received word with n − 1
erasures is readily corrected by looking at the remaining unerased position, if
it is 0, then the all zero word was sent, and if it is 1, then the all one word
was sent. With syndrome decoding one should solve a system of n − 1 linear
equations in n − 1 unknowns.
The following method to correct erasures only uses a generator matrix of a code.
Proposition 6.2.15 Let G be a generator matrix of an [n, k, d] code C over
Fq. Let m ∈ Fq^k be the transmitted message. Let s be an integer such that
s < d. Let r be the received word with no errors and at most s erasures. Let
I = {j1, . . . , j_{n−s}} be the subset of size n − s that is the complement of the
set of erasure positions. Let y ∈ Fq^{n−s} be defined by yi = r_{ji} for
i = 1, . . . , n − s. Let G′ be the k × (n − s) submatrix of G consisting of the
n − s columns of G corresponding to the set I. Then xG′ = y has a unique
solution m, and mG is the codeword sent.
Proof. The Singleton bound 3.2.1 states that k ≤ n − d + 1. So k ≤ n − s.
Now mG = c is the codeword sent and yi = r_{ji} = c_{ji} for i = 1, . . . , n − s.
Hence mG′ = y and m is a solution. Now suppose that x ∈ Fq^k satisfies
xG′ = y; then (m − x)G is a codeword that has a zero at n − s positions, so its
weight is at most s < d. So (m − x)G is the zero codeword and xG = mG. Hence
m − x = 0, since G has rank k.
The above method is called correcting erasures only up to the minimum distance
by means of the generator matrix.
Corollary 6.2.16 The complexity of correcting erasures only up to the mini-
mum distance by means of the generator matrix is polynomial of degree 3 with
complexity coefficient R^2(1 − δ − (2/3)R) for a code of length n → ∞, where R
is the information rate and δ the relative minimum distance.
Proof. This is a consequence of Proposition 6.2.15. The complexity is that of
solving a system of k linear equations in at most n − d + 1 unknowns, which is
asymptotically of the order
(n − d − (2/3)k)k^2 = R^2(1 − δ − (2/3)R)n^3 for n → ∞.
See Appendix ??.
***picture, comparison of G and H method***
Example 6.2.17 Let C be the [7, 2, 6] extended Reed-Solomon code over F7
with generator matrix
G = | 1 1 1 1 1 1 1 |
    | 0 1 2 3 4 5 6 |
Suppose that (?, 3, ?, ?, ?, 4, ?) is a received word with no errors and 5 erasures.
By means of the generator matrix we have to solve the following linear system
of equations:
x1 + x2 = 3
x1 + 5x2 = 4
which has (x1, x2) = (1, 2) as solution. Hence (1, 2)G = (1, 3, 5, 0, 2, 4, 6) was the
transmitted codeword. With syndrome decoding a system of 5 linear equations
in 5 unknowns must be solved.
Remark 6.2.18 For MDS codes we have asymptotically R ≈ 1 − δ, and cor-
recting erasures only by syndrome decoding and by a generator matrix has
complexity coefficients (1/3)(1 − R)^3 and (1/3)R^3, respectively. Therefore
syndrome decoding is preferred for R > 0.5 and decoding by a generator matrix
for R < 0.5.
6.2.3 Information and covering set decoding
The idea of this section is to decode by finding error-free positions in a received
word, thus localizing the errors. Let r be a received word written as r = c + e,
where c is a codeword from an [n, k, d] code C and e is an error vector with
support supp(e). Note that if I is an information set (Definition 2.2.20)
such that supp(e) ∩ I = ∅, then we are actually able to decode. Indeed, as
supp(e) ∩ I = ∅, we have that r(I) = c(I) (Definition 3.1.2). Now denote
by G the generator matrix of C; the submatrix G(I) is invertible and can be
transformed to the identity matrix Id_k: let G′ = MG, where M = G(I)^{−1}, so
that G′(I) = Id_k, see Proposition 2.2.22. Thus the unique m ∈ Fq^k with
mG = c can be found as m = r(I)M, because c(I) = mG(I) gives
m = c(I)G(I)^{−1} = r(I)M; the codeword itself is c = mG = r(I)G′. The
algorithm exploiting this idea, called information set decoding, is presented in
Algorithm 6.1.
Algorithm 6.1 Information set decoding
Input:
- Generator matrix G of an [n, k] code C
- Received word r
- I(C), the collection of all information sets of the given code C
Output: A codeword c ∈ C such that d(r, c) = d(r, C)
Begin
c := 0;
for I ∈ I(C) do
    G′ := G(I)^{−1} G
    c′ := r(I)G′
    if d(c′, r) < d(c, r) then
        c := c′
    end if
end for
return c
End
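A sketch of Algorithm 6.1 over F2 in Python (our own illustration; instead of
enumerating I(C) it runs over all k-subsets and skips those where G(I) is
singular, and it solves mG(I) = r(I) directly by Gaussian elimination rather
than computing G′):

from itertools import combinations

def gauss_solve_f2(M, b):
    # solve x M = b over F2 for a square matrix M; None if singular
    k = len(M)
    A = [[M[r][c] for r in range(k)] + [b[c]] for c in range(k)]
    for col in range(k):
        piv = next((r for r in range(col, k) if A[r][col]), None)
        if piv is None:
            return None
        A[col], A[piv] = A[piv], A[col]
        for r in range(k):
            if r != col and A[r][col]:
                A[r] = [a ^ p for a, p in zip(A[r], A[col])]
    return [A[r][k] for r in range(k)]

def information_set_decode(G, r):
    k, n = len(G), len(G[0])
    best = [0] * n
    for I in combinations(range(n), k):
        GI = [[G[i][j] for j in I] for i in range(k)]
        m = gauss_solve_f2(GI, [r[j] for j in I])
        if m is None:
            continue                  # I is not an information set
        c = [sum(m[i] * G[i][j] for i in range(k)) % 2 for j in range(n)]
        if sum(ci ^ ri for ci, ri in zip(c, r)) < \
           sum(bi ^ ri for bi, ri in zip(best, r)):
            best = c
    return best

# [3, 1] repetition code, one error in the received word
print(information_set_decode([[1, 1, 1]], [1, 0, 1]))  # [1, 1, 1]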
Theorem 6.2.19 The information set decoding algorithm performs minimum
distance decoding.
Proof. Let r = c + e, where wt(e) = d(r, C). Let rH^T = eH^T = s. Then e
is a coset leader with support E = supp(e) in the coset with syndrome s. It is
enough to prove that there exists some information set disjoint from E, or,
equivalently, some check set (Definition 2.3.9) that contains E. Consider the
(n − k) × |E| submatrix H(E) of the parity check matrix H. As e is a coset
leader, no vector v in the same coset has supp(v) a proper subset of E. Thus
the subsystem of the parity check system defined by the positions from E has
a unique solution e(E); otherwise it would be possible to find a solution with
support a proper subset of E, contradicting the minimality of the coset leader e.
The above implies that rank(H(E)) = |E| ≤ n − k. Thus E can be expanded
to a check set.
For a practical application it is convenient to choose the sets I at random.
Namely, we choose k-subsets randomly in the hope that after some reasonable
number of trials we encounter one that is an information set and error-free.
Algorithm 6.2 Probabilistic information set decoding
Input:
- Generator matrix G of an [n, k] code C
- Received word r
- Number of trials Ntrials(n, k)
Output: A codeword c ∈ C
Begin
c := 0;
Ntr := 0;
repeat
    Ntr := Ntr + 1;
    Choose uniformly at random a subset I of {1, . . . , n} of cardinality k.
    if G(I) is invertible then
        G′ := G(I)^{−1} G
        c′ := r(I)G′
        if d(c′, r) < d(c, r) then
            c := c′
        end if
    end if
until Ntr > Ntrials(n, k)
return c
End
We would like to estimate the complexity of probabilistic information set
decoding for generic codes. Parameters of generic codes are computed in
Theorem 3.3.6. We now use this result and its notation to formulate the
following result on complexity.
Theorem 6.2.20 Let C be a generic [n, k, d] q-ary code, with dimension
k = Rn, 0 < R < 1, and minimum distance d = d0, so that the covering
radius is d0(1 + o(1)). If Ntrials(n, k) is at least
σ · n · \binom{n}{d0} / \binom{n−k}{d0},
*** sigma is 1/pr(n×n matrix is invertible), add to Theorem 3.3.7 *** then for
large enough n the probabilistic information set decoding algorithm for the generic
code C performs minimum distance decoding with negligibly small decoding er-
ror. Moreover the algorithm is exponential with complexity exponent
CC_q(R) = (log_q 2) ( H2(δ0) − (1 − R)H2( δ0/(1 − R) ) ), (6.1)
where H2 is the binary entropy function.
Proof. In order to succeed in the algorithm, we need that the set I chosen
at a certain iteration is error-free and that the corresponding submatrix of G is
invertible. The probability P(n, k, d0) of this event is
\binom{n−d0}{k} / \binom{n}{k} · σ_q(n) = \binom{n−k}{d0} / \binom{n}{d0} · σ_q(n).
Therefore the probability that I fails to satisfy these properties is
1 − \binom{n−k}{d0} / \binom{n}{d0} · σ_q(n).
Considering the assumption on Ntrials(n, k), the probability of not finding an
error-free information set after Ntrials(n, k) trials is
(1 − P(n, k, d0))^{n/P(n,k,d0)} = O(e^{−n}),
which is negligible.
which is negligible.
Next, due to the fact that determining whether G(I) is invertible and perform-
ing operations in the if-part have polynomial time complexity, we have that
Ntrials(n, k) dominates time complexity. Our task now is to give an asymptotic
estimate of the latter. First, d0 = δ0n, where δ0 = H−1
q (1 − R), see Theorem
3.3.6. Then, using Stirling’s approximation log2 n! = n log2 n − n + o(n), we
have
n−1
log2
n
d0
= n−1
n log2 n − d0 log2 d0 − (n − d0) log2(n − d0) + o(n) =
= log2 n − δ0 log2(δ0n) − (1 − δ0) log2((1 − δ0)n) + o(1)
= H2(δ0) + o(1)
Thus
logq
n
d0
= (nH2(δ0) + o(n)) log2 q.
Analogously
logq
n − k
d0
= (n(1 − R)H2
δ0
1 − R
+ o(n)) log2 q,
where n − k = (1 − R)n. Now
logq Ntrials(n, k) = logq n + logq σ + logq
n
d0
+ logq
n − k
d0
.
Considering that the first two summands are dominated by the last two, the
claim on the complexity exponent follows.
[Figure: the complexity coefficients of exhaustive search (ES), syndrome
decoding (SD) and covering set decoding (CS) plotted as functions of the rate
R ∈ (0, 1).]
Figure 6.1: Exhaustive search, syndrome decoding, and information set algo-
rithm
If we depict the complexity coefficients of exhaustive search, syndrome decoding,
and probabilistic information set decoding, we see that information set
decoding is strongly superior to the former two, see Figure 6.1.
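The curves of Figure 6.1 can be reproduced numerically; a sketch for q = 2
(our own illustration), using δ0 = H2^{−1}(1 − R) and equation (6.1):

import math

def h2(x):
    # binary entropy function
    if x in (0, 1):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def h2_inverse(y):
    # inverse of H2 on [0, 1/2] by bisection
    lo, hi = 0.0, 0.5
    for _ in range(60):
        mid = (lo + hi) / 2
        if h2(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for R in (0.1, 0.3, 0.5, 0.7, 0.9):
    d0 = h2_inverse(1 - R)        # relative distance of a generic code
    es = R                        # exhaustive search: exponent R
    sd = 1 - R                    # syndrome decoding: exponent 1 - R
    cs = h2(d0) - (1 - R) * h2(d0 / (1 - R))  # equation (6.1) with q = 2
    print(R, round(es, 3), round(sd, 3), round(cs, 3))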
We may think of the above algorithms in a dual way, using check sets instead of
information sets and parity check matrices instead of generator matrices. The
set of all check sets is closely related to the so-called covering systems, which
we will consider a bit later in this section and which give the algorithm its
name.
Algorithm 6.3 Covering set decoding
Input:
- Parity check matrix H of an [n, k] code C
- Received word r
- J(C), the collection of all check sets of the given code C
Output: A codeword c ∈ C such that d(r, c) = d(r, C)
Begin
c := 0;
s := rH^T;
for J ∈ J(C) do
    e′ := s · (H(J)^{−1})^T;
    Compute e such that e(J) = e′ and ej = 0 for j not in J;
    c′ := r − e;
    if d(c′, r) < d(c, r) then
        c := c′
    end if
end for
return c
End
Theorem 6.2.21 The covering set decoding algorithm performs minimum dis-
tance decoding.
Proof. Let r = c + e as in the proof of Theorem 6.2.19. From that proof
we know that there exists a check set J such that supp(e) ⊂ J. Now we have
Hr^T = He^T = H(J)e(J)^T. Since for the check set J the matrix H(J) is
invertible, we may find e(J) and thus e.
Similarly to what was done for Algorithm 6.1, one may define a probabilistic
version of covering set decoding. As we have already mentioned, the covering
set decoding algorithm is closely related to the notion of a covering system. An
overview of this notion follows next.
Definition 6.2.22 Let n, l and t be integers such that 0 < t ≤ l ≤ n. An
(n, l, t) covering system is a collection J of subsets J of {1, . . . , n}, such that
every J ∈ J has l elements and every subset of {1, . . . , n} of size t is contained
in at least one J ∈ J. The elements J of a covering system J are also called
blocks. If a subset T of size t is contained in a J ∈ J, then we say that T is
covered or trapped by J.
Remark 6.2.23 From the proof of Theorem 6.2.21 it follows that for almost all
codes it is enough to find a collection J of subsets J of {1, . . . , n} such that all
J ∈ J have n − k elements and every subset of {1, . . . , n} of size ρ = d0 + o(1) is
contained in at least one J ∈ J, thus obtaining an (n, n − k, d0) covering system.
Example 6.2.24 The collection of all subsets of {1, . . . , n} of size l is an (n, l, t)
covering system for all 0 < t ≤ l. This collection consists of \binom{n}{l} blocks.
Example 6.2.25 Consider Fq^2, the affine plane over Fq. Let n = q^2 be the
number of its points. Then every line consists of q points, and every pair of
two points is covered by exactly one line. Hence there exists a (q^2, q, 2)
covering system. Every line that is not parallel to the y-axis is given by a
unique equation y = mx + c. There are q^2 such lines. And there are q lines
parallel to the y-axis. So the total number of lines is q^2 + q.
Example 6.2.26 Consider the projective plane over Fq as treated in Section
4.3.1. Let n = q^2 + q + 1 be the number of its points. Then every line consists
of q + 1 points, and every pair of two points is covered by exactly one line.
There are q^2 + q + 1 lines. Hence there exists a (q^2 + q + 1, q + 1, 2) covering
system consisting of q^2 + q + 1 blocks.
Remark 6.2.27 The number of blocks of an (n, l, t) covering system is consid-
erably smaller than the number of all possible t-sets. It is still at least
\binom{n}{t} / \binom{l}{t}. But also this number grows exponentially in n if
λ = lim_{n→∞} l/n > 0 and τ = lim_{n→∞} t/n > 0.
Definition 6.2.28 The covering coefficient b(n, l, t) is the smallest integer b
such that there is an (n, l, t) covering system consisting of b blocks.
Although the exact value of the covering coefficient b(n, l, t) is an open problem
we do know its asymptotic logarithmic behavior.
Proposition 6.2.29 Let λ and τ be constants such that 0 < τ < λ < 1. Then
lim_{n→∞} (1/n) log b(n, ⌈λn⌉, ⌈τn⌉) = H2(τ) − λH2(τ/λ).
Proof. *** I suggest to skip the proof ***
In order to establish this asymptotic result we prove lower and upper bounds
which are asymptotically identical. First the lower bound on b(n, l, t). Note
that every l-tuple traps \binom{l}{t} t-tuples. Therefore one needs at least
\binom{n}{t} / \binom{l}{t}
l-tuples. Now we use the relation from [?]:
2^{nH2(θ)−o+(n)} ≤ \binom{n}{⌈θn⌉} ≤ 2^{nH2(θ)}
for 0 < θ < 1, where o+(n) is a non-negative function such that o+(n) = o(n).
Applying this with l = ⌈λn⌉ and t = ⌈τn⌉ we have
\binom{n}{⌈τn⌉} ≥ 2^{nH2(τ)−o+(n)} and \binom{⌈λn⌉}{⌈τn⌉} ≤ 2^{nλH2(τ/λ)}.
Therefore,
b(n, ⌈λn⌉, ⌈τn⌉) ≥ \binom{n}{⌈τn⌉} / \binom{⌈λn⌉}{⌈τn⌉} ≥ 2^{n(H2(τ)−λH2(τ/λ))−o+(n)}.
For a similar lower bound see Exercise ??.
Now the upper bound. Consider a set S of f(n, l, t) independently and uni-
formly randomly chosen l-tuples, where
f(n, l, t) = ( \binom{n}{l} / \binom{n−t}{n−l} ) · cn,
with c > ln 2. The probability that a given t-tuple is not trapped by any tuple
from S is
( 1 − \binom{n−t}{n−l} / \binom{n}{l} )^{f(n,l,t)}.
Indeed, the number of all l-tuples is \binom{n}{l}, and the probability of trapping
a given t-tuple T1 by an l-tuple T2 is the same as the probability of trapping the
complement of T2 by the complement of T1, which is equal to
\binom{n−t}{n−l} / \binom{n}{l}. The expected number of non-trapped t-tuples
is then
\binom{n}{t} ( 1 − \binom{n−t}{n−l} / \binom{n}{l} )^{f(n,l,t)}.
Using the relation lim_{x→∞}(1 − 1/x)^x = e^{−1} and the expression for
f(n, l, t), we have that this expected number tends to
T = 2^{nH2(t/n)+o(n)−cn log e}.
From the condition on c we have that T < 1. This implies that among all
the sets of f(n, l, t) independently and uniformly randomly chosen l-tuples,
there exists one that traps all the t-tuples. Thus b(n, l, t) ≤ f(n, l, t). By the
well-known combinatorial identities
\binom{n}{t} \binom{n−t}{n−l} = \binom{n}{n−l} \binom{l}{t} = \binom{n}{l} \binom{l}{t},
we have that for t = ⌈τn⌉ and l = ⌈λn⌉
b(n, l, t) ≤ 2^{n(H2(τ)−λH2(τ/λ))+o(n)},
which asymptotically coincides with the lower bound proven above.
Let us now turn to the case of bounded distance decoding. Here we aim at
correcting some t errors, where t < ρ. The complexity result for almost all
codes is obtained by substituting t/n for δ0 in (6.1). In particular, for
decoding up to half the minimum distance for almost all codes we have the
following result.
Corollary 6.2.30 If Ntrials(n, k) is at least
n · \binom{n}{d0/2} / \binom{n−k}{d0/2},
then the covering set decoding algorithm for almost all codes performs decoding up
to half the minimum distance with negligibly small decoding error. Moreover the
algorithm is exponential with complexity coefficient
CSB_q(R) = (log_q 2) ( H2(δ0/2) − (1 − R)H2( δ0/(2(1 − R)) ) ). (6.2)
We are now interested in bounded distance decoding up to t ≤ d − 1. For almost
all (long) codes the case t = d − 1 coincides with minimum distance decoding,
see... . By Proposition 6.2.9 it is enough to find a collection J of subsets
J of {1, . . . , n}, such that all J ∈ J have d − 1 elements and every subset
of {1, . . . , n} of size t is contained in at least one J ∈ J. Thus we need an
(n, d − 1, t) covering system. Let us call this erasure set decoding.
Example 6.2.31 Consider a code of length 13, dimension 9 and minimum dis-
tance 5. The number of all 2-sets of {1, . . . , 13} is equal to \binom{13}{2} = 78.
In order to correct two errors one has to compute the linear combinations of two
columns of a parity check matrix H, for all the 78 choices of two columns, and
see whether the combination is equal to rH^T for the received word r.
An improvement can be obtained by a covering set. Consider the projective
plane over F3 as in Example 6.2.26. Hence we have a (13, 4, 2) covering system.
Using this covering system there are 13 subsets of 4 elements for which one has
to find Hr^T as a linear combination of the corresponding columns of the parity
check matrix. So we have to consider 13 times a system of 4 linear equations in
4 variables instead of 78 times a system of 4 linear equations in 2 variables.
From Proposition 6.2.29 and Remark ?? we have the complexity result for era-
sure set decoding.
Proposition 6.2.32 Erasure set decoding performs bounded distance decoding
for every t = αδ0n, 0 < α ≤ 1. The algorithm is exponential with complexity
coefficient
ES_q(R) = (log_q 2) ( H2(αδ0) − δ0H2(α) ). (6.3)
Proof. The proof is left to the reader as an exercise.
It can be shown, see Exercise 6.2.7, that erasure set decoding is inferior to cov-
ering set decoding for all α.
***Permutation decoding, Huffman-Pless 10.2, ex Golay q=3, exer q=2***
6.2.4 Nearest neighbor decoding
***decoding using minimal codewords***
6.2.5 Exercises
6.2.1 Count an erasure as half an error. Use this idea to define an extension
of the Hamming distance on (Fq ∪ {?})^n and show that it is a metric.
6.2.2 Give a proof of Proposition 6.2.8.
6.2.3 Consider the code C over F11 with parameters [11, 7, 5] of Example
6.2.13. Suppose that we receive the word (7, 6, 5, 4, 3, 2, 1, ?, ?, ?, ?) with 4 era-
sures and no errors. Which codeword is sent?
6.2.4 Consider the code C1 over F11 with parameters [11, 4, 8] of Example
6.2.13. Suppose that we receive the word (4, 3, 2, 1, ?, ?, ?, ?, ?, ?, ?) with 7 era-
sures and no errors. Find the codeword sent.
6.2.5 Consider the covering systems of lines in the affine space of dimension m
over Fq and in the projective space of dimension m over Fq, respectively. Show
the existence of a (q^m, q, 2) and a ((q^{m+1} − 1)/(q − 1), q + 1, 2) covering
system, generalizing Examples 6.2.25 and 6.2.26 which treat the case m = 2.
Compute the number of lines in both cases.
6.2.6 Prove the following lower bound on b(n, l, t):
b(n, l, t) ≥ ⌈ (n/l) ⌈ ((n − 1)/(l − 1)) ⌈ · · · ⌈ (n − t + 1)/(l − t + 1) ⌉ · · · ⌉ ⌉ ⌉.
Hint: by a double counting argument prove first that l · b(n, l, t) ≥ n · b(n − 1, l −
1, t − 1), and then use b(n, l, 1) = ⌈n/l⌉.
6.2.7 By using the properties of the binary entropy function prove that for all
0 < R < 1 and 0 < α < 1
(1 − R) H2( α H_q^{−1}(1 − R) / (1 − R) ) > H_q^{−1}(1 − R) · H2(α).
Conclude that covering set decoding is superior to erasure set decoding.
6.3 Difficult problems in coding theory
6.3.1 General decoding and computing minimum distance
We have formulated the decoding problem in Section 6.2. As we have seen, the
minimum (Hamming) distance of a linear code is an important parameter
which can be used to estimate the decoding performance. However, a larger
minimum distance does not guarantee the existence of an efficient decoding al-
gorithm. It is natural to ask the following computational questions. Does there
exist, for general linear codes, a decoding algorithm with polynomial-time
complexity? Does there exist a polynomial-time algorithm which finds the
minimum distance of any linear code? It has been proved that these
computational problems are both intractable.
Let C be an [n, k] binary linear code. Suppose r is the received word. According
to the maximum-likelihood decoding principle, we wish to find a codeword such
that the Hamming distance between r and the codeword is minimal. As we
have seen in previous sections, using brute force search, correct decoding
requires 2^k comparisons in the worst case, and thus has exponential-time
complexity.
Consider the syndrome of the received word. Let H be a parity-check matrix of
C, which is an m × n matrix, where m = n − k. The syndrome of r is s = rH^T.
The following two computational problems are equivalent, letting c = r − e:
(1) (Maximum-likelihood decoding problem) Find a codeword c such
that d(r, c) is minimal.
(2) Find a minimum-weight solution e to the equation xH^T = s.
Clearly, an algorithm which solves the following computational problem (3) also
solves the above problem (2).
(3) For any non-negative integer w, find a vector x of Hamming weight ≤ w
such that xH^T = s.
Conversely, an algorithm which solves problem (2) also solves problem (3). In
fact, suppose e is a minimum-weight solution to the equation xH^T = s. Then,
for w < wt(e), the algorithm will return “no solution”; for w ≥ wt(e), the algo-
rithm returns e. Thus, the maximum-likelihood decoding problem is equivalent
to the above problem (3).
The decision problem corresponding to the maximum-likelihood decoding
problem is as follows.
Decision Problem of Decoding Linear Codes
INSTANCE: An m × n binary matrix H, a binary vector s of length m, and
a non-negative integer w.
QUESTION: Is there a binary vector x ∈ F2^n of Hamming weight ≤ w such
that xH^T = s?
Proposition 6.3.1 The decision problem of decoding linear codes is an NP-
complete problem.
We will prove this proposition by reducing the three-dimensional matching prob-
lem to the decision problem of decoding linear codes. The three-dimensional
matching problem is a well-known NP-complete problem. For completeness we
recall this problem here.
Three-Dimensional Matching Problem
INSTANCE: A set T ⊆ S1 × S2 × S3, where S1, S2, and S3 are disjoint finite
sets having the same number of elements, a = |S1| = |S2| = |S3|.
QUESTION: Does T contain a matching, that is, a subset U ⊆ T such that
|U| = a and no two elements of U agree in any coordinate?
We now construct a matrix M, called the incidence matrix of T, as follows.
Fix an ordering of the triples of T. Let ti = (ti1, ti2, ti3) denote the i-th triple
of T for i = 1, . . . , |T|. The matrix M has |T| rows and 3a columns. Each row
mi of M is a binary vector of length 3a and Hamming weight 3, consisting of
three blocks bi1, bi2 and bi3 of the same length a, i.e., mi = (bi1, bi2, bi3). For
u = 1, 2, 3, if tiu is the v-th element of Su, then the v-th coordinate of biu is 1
and all the other coordinates of this block are 0.
Clearly, the existence of a matching in the Three-Dimensional Matching Prob-
lem is equivalent to the existence of a rows of M whose mod 2 sum is
(1, 1, . . . , 1), that is, to the existence of a binary vector x ∈ F2^{|T|} of weight a
such that xM = (1, 1, . . . , 1) ∈ F2^{3a}. Now we are ready to prove Proposition
6.3.1.
Proof of Proposition 6.3.1. Suppose we have a polynomial-time algorithm
solving the Decision Problem of Decoding Linear Codes. Given an input T ⊆
S1 × S2 × S3 for the Three-Dimensional Matching Problem, set H = M^T,
where M is the incidence matrix of T, s = (1, 1, . . . , 1) and w = a. Then,
running the algorithm for the Decision Problem of Decoding Linear Codes, we
discover whether or not there exists the desired matching. Thus, a polynomial-
time algorithm for the Decision Problem of Decoding Linear Codes implies a
polynomial-time algorithm for the Three-Dimensional Matching Problem. This
proves that the Decision Problem of Decoding Linear Codes is NP-complete.
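The incidence matrix construction used in this reduction is easy to make con-
crete; a sketch in Python (our own illustration), building M for a small instance
and checking a candidate matching:

def incidence_matrix(T, a):
    # incidence matrix of T ⊆ S1 x S2 x S3 with |S1| = |S2| = |S3| = a;
    # the elements of each Su are taken to be 0, ..., a - 1, and row i
    # has a single 1 in each of its three blocks, at the positions of t_i
    M = []
    for (t1, t2, t3) in T:
        row = [0] * (3 * a)
        row[t1] = row[a + t2] = row[2 * a + t3] = 1
        M.append(row)
    return M

# instance with a = 2 containing the matching {(0,0,0), (1,1,1)}
T = [(0, 0, 0), (1, 1, 1), (0, 1, 1)]
M = incidence_matrix(T, 2)
rows = [0, 1]  # candidate matching given by x of weight a with support {0, 1}
s = [sum(M[i][j] for i in rows) % 2 for j in range(6)]
print(s == [1] * 6)  # True: x M = (1, ..., 1), so the rows form a matching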
Next, let us consider the problem of computing the minimum distance of an
[n, k] binary linear code C with a parity-check matrix H. For any linear code
the minimum distance is equal to the minimum weight, so we use these two
terms interchangeably. Consider the following decision problem.
Decision Problem of Computing Minimum Distance
INSTANCE: An m × n binary matrix H and a non-negative integer w.
QUESTION: Is there a nonzero binary vector x of Hamming weight w such
that xH^T = 0?
If we have an algorithm which solves the above problem, then we can run the
algorithm with w = 1, 2, . . ., and the first integer d with affirmative answer is
the minimum weight of C. On the other hand, if we have an algorithm which
finds the minimum weight d of C, then we can solve the above problem by
comparing w with d. Therefore, we call this problem the Decision Problem
of Computing Minimum Distance, and the NP-completeness of this problem
implies the NP-hardness of the problem of computing the minimum distance.
***Computing the minimum distance:
– brute force, complexity (q^k − 1)/(q − 1), O(q^k)
– minimal number of parity checks: O(\binom{n}{k} k^3)***
***Brouwer's algorithm and variations, Zimmermann, Canteaut-Chabaud, Sala***
*** Vardy’s result: computing the min. dist. is NP hard***
6.3.2 Is decoding up to half the minimum distance hard?
Finding the minimum distance and decoding up to half the minimum distance
are closely related problems.
Algorithm 6.3.2 Suppose that A is an algorithm that computes the minimum
distance of an Fq-linear code C that is given by a parity check matrix H. We
define an algorithm D with input y ∈ Fq^n. Let s = Hy^T be the syndrome of
y with respect to H. Let H̃ = [H|s] be the parity check matrix of a code
C̃ of length n + 1. Let C̃i be the code that is obtained by puncturing C̃ at
the i-th position. Use algorithm A to compute d(C) and d(C̃i) for i ≤ n. Let
t = min{ d(C̃i) | i ≤ n }. Let I = { i | t = d(C̃i), i ≤ n }. Assume |I| = t and
t < d(C). Assume furthermore that erasure decoding at the positions I finds a
unique codeword c in C such that ci = yi for all i not in I. Output c in case
the above assumptions are met, and output ∗ otherwise.
Proposition 6.3.3 Let A be an algorithm that computes the minimum dis-
tance. Let D be the algorithm that is defined in 6.3.2. Let y ∈ Fq^n be an input.
Then D is a decoder that gives as output c in case d(C, y) < d(C) and y has c
as unique nearest codeword. In particular D is a decoder of C that corrects up
to half the minimum distance.
Proof. Let y be a word with t = d(C, y) < d(C) and suppose that c is the
unique nearest codeword. Then y = c + e with c ∈ C and t = wt(e). Note that
(e, −1) ∈ C̃, since s = Hy^T = He^T. So d(C̃) ≤ t + 1. Let z̃ be in C̃. If
z̃_{n+1} = 0, then z̃ = (z, 0) with z ∈ C. Hence wt(z̃) ≥ d(C) ≥ t + 1. If
z̃_{n+1} ≠ 0, then without loss of generality we may assume that z̃ = (z, −1).
So H̃z̃^T = 0. Hence Hz^T = s. So c′ = y − z ∈ C. If wt(z̃) ≤ t + 1, then
wt(z) ≤ t. So d(y, c′) ≤ t. Hence c′ = c, since c is the unique nearest codeword
by assumption. Therefore z = e and wt(z̃) = t + 1. Hence d(C̃) = t + 1, since
t + 1 ≤ d(C).
Let C̃i be the code that is obtained by puncturing C̃ at the i-th position. Use
the algorithm A to compute d(C̃i) for all i ≤ n. An argument similar to the
above shows that d(C̃i) = t if i is in the support of e, and d(C̃i) = t + 1 if i
is not in the support of e. So t = min{ d(C̃i) | i ≤ n } and I = { i | t = d(C̃i),
i ≤ n } is the support of e and has size t. So the error positions are known.
Computing the error values is a matter of linear algebra as shown in Proposition
6.2.9. In this way e and c are found.
Proposition 6.3.4 Let MD be the problem of computing the minimum distance
of a code given by a parity check matrix. Let DHMD be the problem of decoding
up to half the minimum distance. Then
DHMD ≤P MD.
Proof. Let A be an algorithm that computes the minimum distance of an Fq-
linear code C that is given by a parity check matrix H. Let D be the algorithm
given in 6.3.2. Then A is used (n + 1) times in D. Suppose that the complexity
of A is polynomial of degree e. We may assume that e ≥ 2. Computing the
error values can be done with complexity O(n^3) by Corollary 6.2.11. Then
the complexity of D is polynomial of degree e + 1.
***Sendrier and Finasz***
***Decoding with preprocessing, Bruck-Naor***
6.3.3 Other hard problems
***worst case versus average case; the simplex method for linear programming is
an example of an algorithm that almost always runs fast, that is, polynomially
in its input, but which is known to be exponential in the worst case. Ellipsoid
method, Khachiyan's method***
***approximate solutions of NP-hard problems***
6.4 Notes
In 1978, Berlekamp, McEliece and van Tilborg proved that the maximum-
likelihood decoding problem is NP-hard for general binary codes. Vardy showed
in 1997 that the problem of computing the minimum distance of a binary linear
code is NP-hard.
Chapter 7
Cyclic codes
Ruud Pellikaan
Cyclic codes have been at the center of interest in the theory of error-correcting codes since their introduction. Cyclic codes of relatively small length have good parameters. In the list of 62 binary cyclic codes of length 63 there are 51 codes that have the largest known minimum distance for a given dimension among all linear codes of length 63. Binary cyclic codes are better than the Gilbert-Varshamov bound for lengths up to 1023. Although some negative results are known indicating that cyclic codes may be asymptotically bad, this is still an open problem. Rich combinatorics is involved in the determination of the parameters of cyclic codes in terms of patterns of the defining set.
***...***
7.1 Cyclic codes
7.1.1 Definition of cyclic codes
Definition 7.1.1 The cyclic shift σ(c) of a word c = (c_0, c_1, . . . , c_{n−1}) ∈ F_q^n is defined by
σ(c) := (c_{n−1}, c_0, c_1, . . . , c_{n−2}).
An F_q-linear code C of length n is called cyclic if
σ(c) ∈ C for all c ∈ C.
The subspaces {0} and F_q^n are clearly cyclic and are called the trivial cyclic codes.
Remark 7.1.2 In the context of cyclic codes it is convenient to consider the index i of a word modulo n, and the convention is that the numbering of the entries (c_0, c_1, . . . , c_{n−1}) starts with 0 instead of 1. The cyclic shift defines a linear map σ : F_q^n → F_q^n. The i-fold composition σ^i = σ ◦ · · · ◦ σ is the i-fold forward shift. Now σ^n is the identity map and σ^{n−1} is the backward shift. A cyclic code is invariant under σ^i for all i.
Proposition 7.1.3 Let G be a generator matrix of a linear code C. Then C is
cyclic if and only if the cyclic shift of every row of G is in C.
Proof. If C is cyclic, then the cyclic shift of every row of G is in C, since all
the rows of G are codewords.
Conversely, suppose that the cyclic shift of every row of G is in C. Let g_1, . . . , g_k be the rows of G. Let c ∈ C. Then c = ∑_{i=1}^{k} x_i g_i for some x_1, . . . , x_k ∈ F_q. Now σ is a linear transformation of F_q^n. So
σ(c) = ∑_{i=1}^{k} x_i σ(g_i) ∈ C,
since C is linear and σ(g_i) ∈ C for all i by assumption. Hence C is cyclic.
Example 7.1.4 Consider the [6,3] code over F_7 with generator matrix G defined by
G = [ 1 1 1 1 1 1 ]
    [ 1 3 2 6 4 5 ]
    [ 1 2 4 1 2 4 ]
Then σ(g_1) = g_1, σ(g_2) = 5g_2 and σ(g_3) = 4g_3. Hence the code is cyclic.
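The criterion of Proposition 7.1.3 is easy to check by machine. The following GAP fragment — a small illustrative sketch using only standard GAP functions — verifies for the example above that every cyclic shift of a row of G again lies in the row space of G.
> G := Z(7)^0 * [ [1,1,1,1,1,1], [1,3,2,6,4,5], [1,2,4,1,2,4] ];;
> C := VectorSpace(GF(7), G);;                          # the code as row space
> sigma := v -> Concatenation([ v[6] ], v{[1..5]});;    # cyclic shift
> ForAll(G, r -> sigma(r) in C);
true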
Example 7.1.5 Consider the [7, 4, 3] Hamming code C, with generator matrix G as given in Example 2.2.14. Then (0, 0, 0, 1, 0, 1, 1), the cyclic shift of the third row, is not a codeword. Hence this code is not cyclic. After a permutation of the columns and rows of G we get the generator matrix G′ of the code C′, where
G′ = [ 1 0 0 0 1 1 0 ]
     [ 0 1 0 0 0 1 1 ]
     [ 0 0 1 0 1 1 1 ]
     [ 0 0 0 1 1 0 1 ]
Let g′_i be the i-th row of G′. Then σ(g′_1) = g′_2, σ(g′_2) = g′_1 + g′_3, σ(g′_3) = g′_1 + g′_4 and σ(g′_4) = g′_1. Hence C′ is cyclic by Proposition 7.1.3. Therefore C is not cyclic, but equivalent to the cyclic code C′.
Proposition 7.1.6 The dual of a cyclic code is again cyclic.
Proof. Let C be a cyclic code. Then σ(c) ∈ C for all c ∈ C. So
σ^{n−1}(c) = (c_1, . . . , c_{n−1}, c_0) ∈ C for all c ∈ C.
Let x ∈ C^⊥. Then
σ(x) · c = x_{n−1}c_0 + x_0c_1 + · · · + x_{n−2}c_{n−1} = x · σ^{n−1}(c) = 0
for all c ∈ C. Hence C^⊥ is cyclic.
7.1.2 Cyclic codes as ideals
The set of all polynomials in the variable X with coefficients in F_q is denoted by F_q[X]. Two polynomials can be added and multiplied and in this way F_q[X] is a ring. One has division with rest: this means that every polynomial f(X) has, after division by another nonzero polynomial g(X), a quotient q(X) with rest r(X) that is zero or of degree strictly smaller than deg g(X). In other words
f(X) = q(X)g(X) + r(X) and r(X) = 0 or deg r(X) < deg g(X).
In this way F_q[X] with its degree is a Euclidean domain. Using division with rest repeatedly we find the greatest common divisor gcd(f(X), g(X)) of two polynomials f(X) and g(X) by the algorithm of Euclid.
***complexity of Euclidean Algorithm***
Every nonempty subset of a ring that is closed under addition and under multiplication by an arbitrary element of the ring is called an ideal. Let g_1, . . . , g_m be given elements of a ring. The set of all a_1g_1 + · · · + a_mg_m with a_1, . . . , a_m in the ring forms an ideal; it is denoted by ⟨g_1, . . . , g_m⟩ and is called the ideal generated by g_1, . . . , g_m. As a consequence of division with rest, every ideal in F_q[X] is either {0} or generated by a unique monic polynomial. Furthermore
⟨f(X), g(X)⟩ = ⟨gcd(f(X), g(X))⟩.
We refer for these notions and properties to Appendix ??.
Definition 7.1.7 Let R be a ring and I an ideal in R. Then R/I is the factor ring of R modulo I. If R = F_q[X] and I = ⟨X^n − 1⟩ is the ideal generated by X^n − 1, then C_{q,n} is the factor ring
C_{q,n} = F_q[X]/⟨X^n − 1⟩.
Remark 7.1.8 The factor ring C_{q,n} has an easy description. Every polynomial f(X) has after division by X^n − 1 a rest r(X) of degree at most n − 1, that is, there exist polynomials q(X) and r(X) such that
f(X) = q(X)(X^n − 1) + r(X) and deg r(X) < n or r(X) = 0.
The coset of the polynomial f(X) modulo ⟨X^n − 1⟩ is denoted by f(x). Hence f(X) and r(X) have the same coset and represent the same element in C_{q,n}. Now x^i denotes the coset of X^i modulo ⟨X^n − 1⟩. Hence the cosets 1, x, . . . , x^{n−1} form a basis of C_{q,n} over F_q. The multiplication of the basis elements x^i and x^j in C_{q,n} with 0 ≤ i, j < n is given by
x^i x^j = x^{i+j} if i + j < n, and x^i x^j = x^{i+j−n} if i + j ≥ n.
Definition 7.1.9 Consider the map ϕ between F_q^n and C_{q,n} defined by
ϕ(c) = c_0 + c_1x + · · · + c_{n−1}x^{n−1}.
Then ϕ(c) is also denoted by c(x).
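Computations in C_{q,n} can be carried out in GAP by multiplying representatives in F_q[X] and reducing modulo X^n − 1; a minimal sketch (EuclideanRemainder is standard GAP):
> x := Indeterminate(GF(2));;
> red := f -> EuclideanRemainder(f, x^7 - 1);;   # reduction modulo X^7 - 1
> red(x^5 * x^4);                                # x^9 = x^2 in C_{2,7}
x_1^2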
Proposition 7.1.10 The map ϕ is an isomorphism of vector spaces. Ideals in the ring C_{q,n} correspond one-to-one to cyclic codes in F_q^n.
Proof. The map ϕ is clearly linear and it maps the i-th standard basis vector of F_q^n to the coset x^{i−1} in C_{q,n} for i = 1, . . . , n. Hence ϕ is an isomorphism of vector spaces. Let ψ be the inverse map of ϕ.
Let I be an ideal in C_{q,n}. Let C := ψ(I). Then C is a linear code, since ψ is a linear map. Let c ∈ C. Then c(x) = ϕ(c) ∈ I and I is an ideal. So xc(x) ∈ I. But
xc(x) = c_0x + c_1x^2 + · · · + c_{n−2}x^{n−1} + c_{n−1}x^n = c_{n−1} + c_0x + c_1x^2 + · · · + c_{n−2}x^{n−1},
since x^n = 1. So ψ(xc(x)) = (c_{n−1}, c_0, c_1, . . . , c_{n−2}) ∈ C. Hence C is cyclic.
Conversely, let C be a cyclic code in F_q^n, and let I := ϕ(C). Then I is closed under addition of its elements, since C is a linear code and ϕ is a linear map. If a ∈ F_q^n and c ∈ C, then
a(x)c(x) = ϕ(a_0c + a_1σ(c) + · · · + a_{n−1}σ^{n−1}(c)) ∈ I.
Hence I is an ideal in C_{q,n}.
In the following we will not distinguish between words and the corresponding
polynomials under ϕ; we will talk about words c(x) when in fact we mean the
vector c and vice versa.
Example 7.1.11 Consider the rows of the generator matrix G′ of the [7, 4, 3] Hamming code of Example 7.1.5. They correspond to g_1(x) = 1 + x^4 + x^5, g_2(x) = x + x^5 + x^6, g_3(x) = x^2 + x^4 + x^5 + x^6 and g_4(x) = x^3 + x^4 + x^6, respectively. Furthermore x · x^6 = 1, so x is invertible in the ring F_2[X]/⟨X^7 − 1⟩. Now
⟨1 + x^4 + x^5⟩ = ⟨x + x^5 + x^6⟩ = ⟨x^6 + x^{10} + x^{11}⟩ = ⟨x^3 + x^4 + x^6⟩.
Hence the ideals generated by g_i(x) are the same for i = 1, 2, 4 and there is no unique generating element. The third row generates the ideal
⟨x^2 + x^4 + x^5 + x^6⟩ = ⟨x^2(1 + x^2 + x^3 + x^4)⟩ = ⟨1 + x^2 + x^3 + x^4⟩ = ⟨(1 + x)(1 + x + x^3)⟩,
which gives a cyclic code that is a proper subcode of dimension 3. Therefore all except the third element generate the same ideal.
7.1.3 Generator polynomial
Remark 7.1.12 The ring F_q[X] with its degree function is a Euclidean ring. Hence F_q[X] is a principal ideal domain, that means that all ideals are generated by one element. If an ideal of F_q[X] is not zero, then a generating element is unique up to a nonzero scalar multiple of F_q. So there is a unique monic polynomial generating the ideal. Now C_{q,n} is a factor ring of F_q[X], therefore it is also a principal ideal ring. A cyclic code C considered as an ideal in C_{q,n} is generated by one element, but this element is not unique, as we have seen in Example 7.1.11. The inverse image of C under the map F_q[X] → C_{q,n} is denoted by I. Then I is a nonzero ideal in F_q[X] containing X^n − 1. Therefore I has a unique monic polynomial g(X) as generator. So g(X) is the monic polynomial in I of minimal degree. Hence g(X) is the monic polynomial of minimal degree such that g(x) ∈ C.
Definition 7.1.13 Let C be a cyclic code. Let g(X) be the monic polynomial of minimal degree such that g(x) ∈ C. Then g(X) is called the generator polynomial of C.
Example 7.1.14 The generator polynomial of the trivial code F_q^n is 1, and that of the zero code of length n is X^n − 1. The repetition code and its dual have as generator polynomials X^{n−1} + · · · + X + 1 and X − 1, respectively.
Proposition 7.1.15 Let g(X) be a polynomial in F_q[X]. Then g(X) is a generator polynomial of a cyclic code over F_q of length n if and only if g(X) is monic and divides X^n − 1.
Proof. Suppose g(X) is the generator polynomial of a cyclic code. Then g(X) is monic and a generator of an ideal in F_q[X] that contains X^n − 1. Hence g(X) divides X^n − 1.
Conversely, suppose that g(X) is monic and divides X^n − 1. So b(X)g(X) = X^n − 1 for some b(X). Now ⟨g(x)⟩ is an ideal in C_{q,n} and defines a cyclic code C. Let c(X) be a monic polynomial such that c(x) ∈ C. Then c(x) = a(x)g(x). Hence there exists an h(X) such that
c(X) = a(X)g(X) + h(X)(X^n − 1) = (a(X) + b(X)h(X))g(X).
Hence deg g(X) ≤ deg c(X). Therefore g(X) is the monic polynomial of minimal degree such that g(x) ∈ C. Hence g(X) is the generator polynomial of C.
Example 7.1.16 The polynomial X^3 + X + 1 divides X^8 − 1 in F_3[X], since
(X^3 + X + 1)(X^5 − X^3 − X^2 + X − 1) = X^8 − 1.
Hence 1 + X + X^3 is a generator polynomial of a ternary cyclic code of length 8.
Remark 7.1.17 Let g(X) be the generator polynomial of C. Then g(X) is a monic polynomial and g(x) generates C. Let c(X) be another polynomial such that c(x) generates C. Let d(X) be the greatest common divisor of c(X) and X^n − 1. Then d(X) is the monic polynomial such that
⟨d(X)⟩ = ⟨c(X), X^n − 1⟩ = I.
But also g(X) is the unique monic polynomial such that ⟨g(X)⟩ = I. Hence
g(X) = gcd(c(X), X^n − 1).
Example 7.1.18 Consider the binary cyclic code C of length 7 generated by 1 + x^2. Then 1 + X^2 = (1 + X)^2 and 1 + X^7 is divisible by 1 + X in F_2[X]. So 1 + X is the greatest common divisor of 1 + X^7 and 1 + X^2. Hence 1 + X is the generator polynomial of C.
Example 7.1.19 Let C be the Hamming code of Examples 7.1.5 and 7.1.11. Then 1 + x^4 + x^5 generates C. In order to get the greatest common divisor of 1 + X^7 and 1 + X^4 + X^5 we apply the Euclidean algorithm:
1 + X^7 = (1 + X + X^2)(1 + X^4 + X^5) + (X + X^2 + X^4),
1 + X^4 + X^5 = (1 + X)(X + X^2 + X^4) + (1 + X + X^3),
X + X^2 + X^4 = X(1 + X + X^3).
Hence 1 + X + X^3 is the greatest common divisor, and therefore 1 + X + X^3 is the generator polynomial of C.
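The same greatest common divisor can be found by machine; for instance in GAP, where Gcd applies directly to polynomials over a finite field:
> x := Indeterminate(GF(2));;
> Gcd(x^7 - 1, x^5 + x^4 + 1);
x_1^3+x_1+Z(2)^0
This is 1 + X + X^3, in agreement with the computation above.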
Remark 7.1.20 Let g(X) be a generator polynomial of a cyclic code of length n. Then g(X) divides X^n − 1 by Proposition 7.1.15. So g(X)h(X) = X^n − 1 for some h(X). Hence g(0)h(0) = −1. Therefore the constant term of the generator polynomial of a cyclic code is not zero.
Proposition 7.1.21 Let g(X) = g_0 + g_1X + · · · + g_lX^l be a polynomial of degree l. Let n be an integer such that l ≤ n. Let k = n − l. Let G be the k × n matrix defined by
G = [ g_0 g_1 ... g_l  0   ...  0   ]
    [ 0   g_0 g_1 ...  g_l ...  0   ]
    [ ... ... ... ...  ... ...  ... ]
    [ 0   ... 0   g_0  g_1 ...  g_l ]
1. If g(X) is the generator polynomial of a cyclic code C, then the dimension of C is equal to k and a generator matrix of C is G.
2. If g_l = 1 and G is the generator matrix of a code C such that
(g_l, 0, . . . , 0, g_0, g_1, . . . , g_{l−1}) ∈ C,
then C is cyclic with generator polynomial g(X).
Proof.
1) Suppose g(X) is the generator polynomial of a cyclic code C. Then the element g(x) generates C and the elements g(x), xg(x), . . . , x^{k−1}g(x) correspond to the rows of the above matrix.
The generator polynomial is monic, so g_l = 1 and the k × k submatrix of G consisting of the last k columns is a lower triangular matrix with ones on the diagonal, so the rows of G are independent. Every codeword c(x) ∈ C is equal to a(x)g(x) for some a(X). Division with remainder of a(X)g(X) by X^n − 1 gives that there exist e(X) and f(X) such that
a(X)g(X) = e(X)(X^n − 1) + f(X) and deg f(X) < n or f(X) = 0.
But X^n − 1 is divisible by g(X) by Proposition 7.1.15. So f(X) is divisible by g(X). Hence f(X) = b(X)g(X) for some polynomial b(X) with deg b(X) < n − l = k or b(X) = 0. Therefore c(x) = a(x)g(x) = f(x) = b(x)g(x) with deg b(X) < k or b(X) = 0. So every codeword is a linear combination of g(x), xg(x), . . . , x^{k−1}g(x). Hence k is the dimension of C and G is a generator matrix of C.
2) Suppose G is the generator matrix of a code C such that g_l = 1 and
(g_l, 0, . . . , 0, g_0, g_1, . . . , g_{l−1}) ∈ C.
Then the cyclic shift of the i-th row of G is the (i + 1)-th row of G for all i < k, and the cyclic shift of the k-th row of G is (g_l, 0, . . . , 0, g_0, g_1, . . . , g_{l−1}), which is also an element of C by assumption. Hence C is cyclic by Proposition 7.1.3. Now g_l = 1 and the upper right corner of G consists of zeros, so G has rank k and the dimension of C is k. Now g(X) is monic, has degree l = n − k and g(x) ∈ C. The generator polynomial of C has the same degree l by (1). Hence g(X) is the generator polynomial of C.
Example 7.1.22 The ternary cyclic code of length 8 with generator polynomial 1 + X + X^3 of Example 7.1.16 has dimension 5.
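With the GUAVA package (also used in Example 7.1.41 below) this code can be constructed directly from its generator polynomial; a sketch:
> LoadPackage("guava");;
> x := Indeterminate(GF(3));;
> C := GeneratorPolCode(x^3 + x + 1, 8, GF(3));;
> Dimension(C);
5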
Remark 7.1.23 A cyclic [n, k] code is systematic at the first k positions, since it has a generator matrix as given in Proposition 7.1.21, which is upper triangular with nonzero entries on the diagonal at the first k positions, because g_0 ≠ 0 by Remark 7.1.20. So the row reduced echelon form of a generator matrix of the code has the k × k identity matrix at the first k columns. The last row of this rref matrix is up to the constant g_0 equal to (0, . . . , 0, g_0, g_1, . . . , g_l), giving the coefficients of the generator polynomial. This method of obtaining the generator polynomial from a given generator matrix G is more efficient than taking the greatest common divisor of g_1(X), . . . , g_k(X), X^n − 1, where g_1, . . . , g_k are the rows of G.
Example 7.1.24 Consider the generator matrix G of the [6,3] cyclic code over F_7 of Example 7.1.4. The row reduced echelon form of G is equal to
[ 1 0 0 6 1 3 ]
[ 0 1 0 3 3 6 ]
[ 0 0 1 6 4 6 ]
The last row represents
x^2 + 6x^3 + 4x^4 + 6x^5 = x^2(1 + 6x + 4x^2 + 6x^3).
Hence 1 + 6x + 4x^2 + 6x^3 is a codeword. The corresponding monic polynomial 6 + X + 3X^2 + X^3 has degree 3. Hence this is the generator polynomial.
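The method of Remark 7.1.23 is easily automated. The following GAP sketch uses the standard function TriangulizedMat for the row reduced echelon form; the index range [3..6] singles out the nonzero tail of the last row, as in this example:
> G := Z(7)^0 * [ [1,1,1,1,1,1], [1,3,2,6,4,5], [1,2,4,1,2,4] ];;
> R := TriangulizedMat(G);;                    # row reduced echelon form
> x := Indeterminate(GF(7));;
> p := Sum([3..6], j -> R[3][j] * x^(j-3));;   # tail of the last row
> p * LeadingCoefficient(p)^-1 = x^3 + 3*x^2 + x + 6;   # made monic
true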
7.1.4 Encoding cyclic codes
Consider a cyclic code of length n with generator polynomial g(X) and the corresponding generator matrix G as in Proposition 7.1.21. Let the message m = (m_0, . . . , m_{k−1}) ∈ F_q^k be mapped to the codeword c = mG. In terms of polynomials that means that
c(x) = m(x)g(x), where m(x) = m_0 + · · · + m_{k−1}x^{k−1}.
In this way we get an encoding of message words into codewords.
The k × k submatrix of G consisting of the last k columns of G is a lower triangular matrix with ones on its diagonal, so it is invertible. That means that we can perform row operations on this matrix until we get another matrix G_2 such that its last k columns form the k × k identity matrix. The matrix G_2 is another generator matrix of the same code. The encoding
m → c_2 = mG_2
by means of G_2 is systematic in the last k positions, that means that there exist r_0, . . . , r_{n−k−1} ∈ F_q such that
c_2 = (r_0, . . . , r_{n−k−1}, m_0, . . . , m_{k−1}).
In other words the encoding has the nice property that one can read off the sent message directly from the encoded word by looking at the last k positions, in case no errors appeared during the transmission at these positions.
Now how does one translate this systematic encoding in terms of polynomials? Let m(X) be a polynomial of degree at most k − 1. Let −r(X) be the rest after dividing m(X)X^{n−k} by g(X). Now deg g(X) = n − k. So there is a polynomial q(X) such that
m(X)X^{n−k} = q(X)g(X) − r(X) and deg r(X) < n − k or r(X) = 0.
Hence r(x) + m(x)x^{n−k} = q(x)g(x) is a codeword of the form
r_0 + r_1x + · · · + r_{n−k−1}x^{n−k−1} + m_0x^{n−k} + · · · + m_{k−1}x^{n−1}.
Example 7.1.25 Consider the cyclic [7,4,3] Hamming code of Example 7.1.19 with generator polynomial g(X) = 1 + X + X^3. Let m be a message with polynomial m(X) = 1 + X^2 + X^3. Then division of m(X)X^3 by g(X) gives as quotient q(X) = 1 + X + X^2 + X^3 with rest r(X) = 1. The corresponding codeword by systematic encoding is
c_2(x) = r(x) + m(x)x^3 = 1 + x^3 + x^5 + x^6.
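This division is straightforward to reproduce in GAP; a sketch for the example above, where EuclideanRemainder returns the rest, which equals −r(X) in the notation of this section (over F_2 the sign plays no role):
> x := Indeterminate(GF(2));;
> g := x^3 + x + 1;; m := x^3 + x^2 + 1;;
> r := EuclideanRemainder(m * x^3, g);;   # the rest, here equal to 1
> c2 := m * x^3 - r;;                     # the codeword q(x)g(x)
> IsZero(EuclideanRemainder(c2, g));      # check: c2 is divisible by g
true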
Example 7.1.26 Consider the ternary cyclic code of length 8 with generator polynomial 1 + X + X^3 of Example 7.1.16. Let m be a message with polynomial m(X) = 1 + X^2 + X^3. Then division of m(X)X^3 by g(X) gives as quotient q(X) = −1 − X + X^2 + X^3 with rest −r(X) = 1 − X. The corresponding codeword by systematic encoding is
c_2(x) = r(x) + m(x)x^3 = −1 + x + x^3 + x^5 + x^6.
7.1.5 Reversible codes
Definition 7.1.27 Define the reversed word ρ(x) of x ∈ F_q^n by
ρ(x_0, x_1, . . . , x_{n−2}, x_{n−1}) = (x_{n−1}, x_{n−2}, . . . , x_1, x_0).
Let C be a code in F_q^n, then its reversed code ρ(C) is defined by
ρ(C) = { ρ(c) | c ∈ C }.
A code is called reversible if C = ρ(C).
Remark 7.1.28 The dimensions of C and ρ(C) are the same, since ρ is an automorphism of F_q^n. If a code is reversible, then ρ ∈ Aut(C).
Definition 7.1.29 Let g(X) be a polynomial of degree l given by
g_0 + g_1X + · · · + g_{l−1}X^{l−1} + g_lX^l.
Then
X^l g(X^{−1}) = g_l + g_{l−1}X + · · · + g_1X^{l−1} + g_0X^l
is called the reciprocal of g(X). If moreover g(0) ≠ 0, then X^l g(X^{−1})/g(0) is called the monic reciprocal of g(X). The polynomial g(X) is called reversible if g(0) ≠ 0 and it is equal to its monic reciprocal.
Remark 7.1.30 If g = (g_0, g_1, . . . , g_{l−1}, g_l) are the coefficients of the polynomial g(X), then the reversed word ρ(g) gives the coefficients of the reciprocal of g(X).
Remark 7.1.31 If α is a zero of g(X) and α ≠ 0, then the inverse α^{−1} is a zero of the reciprocal of g(X).
Proposition 7.1.32 Let g(X) be the generator polynomial of a cyclic code C. Then ρ(C) is cyclic with the monic reciprocal of g(X) as generator polynomial, and C is reversible if and only if g(X) is reversible.
Proof. A cyclic code is invariant under the forward shift σ and the backward shift σ^{n−1}. Now σ(ρ(c)) = ρ(σ^{n−1}(c)) for all c ∈ C. Hence ρ(C) is cyclic.
Now g(0) ≠ 0 by Remark 7.1.20. Hence the monic reciprocal of g(X) is well defined and its corresponding word is an element of ρ(C) by Remark 7.1.30. The degree of g(X) and its monic reciprocal are the same, and the dimensions of C and ρ(C) are the same. Hence this monic reciprocal is the generator polynomial of ρ(C).
Therefore C is reversible if and only if g(X) is reversible, by the definition of a reversible polynomial.
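Whether a given generator polynomial is reversible can be tested with the GUAVA function ReciprocalPolynomial (also mentioned in Exercise 7.1.10); a small sketch over F_2, where every admissible constant term is 1, so reciprocal and monic reciprocal coincide:
> LoadPackage("guava");;
> x := Indeterminate(GF(2));;
> ReciprocalPolynomial(x^3 + x + 1) = x^3 + x + 1;   # not reversible
false
> f := x^4 + x^3 + x^2 + x + 1;;                     # palindromic coefficients
> ReciprocalPolynomial(f) = f;                       # hence reversible
true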
Remark 7.1.33 If C is a reversible cyclic code, then the group generated by
σ and ρ is the dihedral group of order 2n and is contained in Aut(C).
7.1.6 Parity check polynomial
Definition 7.1.34 Let g(X) be the generator polynomial of a cyclic code C of length n. Then g(X) divides X^n − 1 by Proposition 7.1.15, and
h(X) = (X^n − 1)/g(X)
is called the parity check polynomial of C.
Proposition 7.1.35 Let h(X) be the parity check polynomial of a cyclic code C. Then c(x) ∈ C if and only if c(x)h(x) = 0.
Proof. Let c(x) ∈ C. Then c(x) = a(x)g(x) for some a(x). We have that g(X)h(X) = X^n − 1. Hence g(x)h(x) = 0. So c(x)h(x) = a(x)g(x)h(x) = 0.
Conversely, suppose that c(x)h(x) = 0. There exist polynomials a(X) and b(X) such that
c(X) = a(X)g(X) + b(X) and b(X) = 0 or deg b(X) < deg g(X).
Hence
c(x)h(x) = a(x)g(x)h(x) + b(x)h(x) = b(x)h(x).
Notice that b(x)h(x) ≠ 0 if b(X) is a nonzero polynomial, since deg b(X)h(X) is at most n − 1. Hence b(X) = 0 and c(x) = a(x)g(x) ∈ C.
Remark 7.1.36 If H is a parity check matrix for a code C, then H is a gen-
erator matrix for the dual of C. One might expect that if h(X) is the parity
check polynomial for a cyclic code C, then h(X) is the generator polynomial of
the dual of C. This is not the case but something of this nature is true as the
following shows.
Proposition 7.1.37 Let h(X) be the parity check polynomial of a cyclic code C. Then the monic reciprocal of h(X) is the generator polynomial of C^⊥.
Proof. Let C be a cyclic code of length n and dimension k with generator polynomial g(X) and parity check polynomial h(X).
If k = 0, then g(X) = X^n − 1 and h(X) = 1, and similarly if k = n, then g(X) = 1 and h(X) = X^n − 1. Hence the proposition is true in these cases.
Now suppose that 0 < k < n. Then h(X) = h_0 + h_1X + · · · + h_kX^k. Hence
X^k h(X^{−1}) = h_k + h_{k−1}X + · · · + h_0X^k.
The i-th position of x^k h(x^{−1}) is h_{k−i}. Let g(X) be the generator polynomial of C. Let l = n − k. Then g(X) = g_0 + g_1X + · · · + g_lX^l and g_l = 1. The elements x^t g(x) with t = 0, . . . , k − 1 generate C. The i-th position of x^t g(x) is equal to g_{i−t}. Hence the inner product of the words x^t g(x) and x^k h(x^{−1}) is
∑_i g_{i−t} h_{k−i},
which is the coefficient of the term X^{k−t} in g(X)h(X). But g(X)h(X) is equal to X^n − 1 and 0 < k − t < n for t = 0, . . . , k − 1, hence this coefficient is zero. So ∑_i g_{i−t} h_{k−i} = 0 for all t. So x^k h(x^{−1}) is an element of the dual of C.
Now g(X)h(X) = X^n − 1. So g(0)h(0) = −1. Hence the monic reciprocal of h(X) is well defined, is monic, represents an element of C^⊥, has degree k, and the dimension of C^⊥ is n − k. Hence X^k h(X^{−1})/h(0) is the generator polynomial of C^⊥ by Proposition 7.1.21.
Example 7.1.38 Consider the [6,3] cyclic code over F_7 of Example 7.1.24, which has generator polynomial g(X) = X^3 + 3X^2 + X + 6. Hence
h(X) = (X^6 − 1)/(X^3 + 3X^2 + X + 6) = X^3 + 4X^2 + X + 1
is the parity check polynomial of the code. The generator polynomial of the dual code is
g^⊥(X) = X^3 h(X^{−1}) = 1 + 4X + X^2 + X^3
by Proposition 7.1.37, since h(0) = 1.
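These computations can be checked in GAP; a sketch (EuclideanQuotient is standard GAP, ReciprocalPolynomial comes with GUAVA):
> LoadPackage("guava");;
> x := Indeterminate(GF(7));;
> g := x^3 + 3*x^2 + x + 6;;
> h := EuclideanQuotient(x^6 - 1, g);;            # the parity check polynomial
> h = x^3 + 4*x^2 + x + 1;
true
> ReciprocalPolynomial(h) = x^3 + x^2 + 4*x + 1;  # the dual generator, h(0) = 1
true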
Example 7.1.39 Consider in F_2[X] the polynomial
g(X) = 1 + X^4 + X^6 + X^7 + X^8.
Then g(X) divides X^{15} − 1 with quotient
h(X) = (X^{15} − 1)/g(X) = 1 + X^4 + X^6 + X^7.
Hence g(X) is the generator polynomial of a binary cyclic code of length 15 with parity check polynomial h(X). The generator polynomial of the dual code is
g^⊥(X) = X^7 h(X^{−1}) = 1 + X + X^3 + X^7
by Proposition 7.1.37, since h(0) = 1.
Example 7.1.40 The generator polynomial 1 + X + X^3 of the ternary code of length 8 of Example 7.1.16 has parity check polynomial
h(X) = (X^8 − 1)/g(X) = X^5 − X^3 − X^2 + X − 1.
The generator polynomial of the dual code is
g^⊥(X) = X^5 h(X^{−1})/h(0) = X^5 − X^4 + X^3 + X^2 − 1
by Proposition 7.1.37.
Example 7.1.41 Let us now take a look at how cyclic codes are constructed via generator and check polynomials in GAP.
> x:=Indeterminate(GF(2));;
> f:=x^17-1;;
> F:=Factors(PolynomialRing(GF(2)),f);
[ x_1+Z(2)^0, x_1^8+x_1^5+x_1^4+x_1^3+Z(2)^0, x_1^8+x_1^7+x_1^6+
x_1^4+x_1^2+x_1+Z(2)^0 ]
> g:=F[2];;
> C:=GeneratorPolCode(g,17,"code from Example 7.1.41",GF(2));;
> MinimumDistance(C);;
> C;
a cyclic [17,9,5]3..4 code from Example 7.1.41 over GF(2)
> h:=F[3];;
> C2:=CheckPolCode(h,17,GF(2));;
> MinimumDistance(C2);;
> C2;
a cyclic [17,8,6]3..7 code defined by check polynomial over GF(2)
So here x is a variable with which the polynomials are built. Note that one can also define it via x:=X(GF(2)), since X is a synonym of Indeterminate. For this same reason we could not use X as a variable.
7.1.7 Exercises
7.1.1 Let C be the F_q-linear code with generator matrix
G = [ 1 1 1 1 0 0 0 ]
    [ 0 1 1 1 1 0 0 ]
    [ 0 0 1 1 1 1 0 ]
    [ 0 0 0 1 1 1 1 ]
Show that C is not cyclic for any finite field F_q.
7.1.2 Let C be a cyclic code over F_q of length 7 such that (1, 1, 1, 0, 0, 0, 0) is an element of C. Show that C is a trivial code if q is not a power of 3.
7.1.3 Find the generator polynomial of the binary cyclic code of length 7 generated by 1 + x + x^5.
7.1.4 Show that 2 + X^2 + X^3 is the generator polynomial of a ternary cyclic code of length 13.
7.1.5 Let α be an element in F_8 such that α^3 = α + 1. Let C be the F_8-linear code with generator matrix G, where
G = [ 1 1   1   1   1   1   1   ]
    [ 1 α   α^2 α^3 α^4 α^5 α^6 ]
    [ 1 α^2 α^4 α^6 α   α^3 α^5 ]
1) Show that the code C is cyclic.
2) Determine the coefficients of the generator polynomial of this code.
7.1.6 Consider the binary polynomial g(X) = 1 + X^2 + X^5.
1) Show that g(X) is the generator polynomial of a binary cyclic code C of length 31 and dimension 26.
2) Give the encoding with respect to the code C of the message m with m(X) = 1 + X^{10} + X^{25} as message polynomial, that is systematic at the last 26 positions.
3) Find the parity check polynomial of C.
4) Give the coefficients of the generator polynomial of C^⊥.
7.1.7 Give a description of the systematic encoding of an [n, k] cyclic code in
the first k positions in terms of division by the generator polynomial with rest.
7.1.8 Estimate the number of additions and the number of multiplications in F_q needed to encode an [n, k] cyclic code using multiplication with the generator polynomial, and compare these with the numbers for systematic encoding in the last k positions by dividing m(X)X^{n−k} by g(X) with rest.
7.1.9 [CAS] Implement the encoding procedure from Section 7.1.4.
7.1.10 [CAS] Given a generator polynomial g, a code length n, and a field size q, construct the cyclic code dual to the one generated by g. Use the function ReciprocalPolynomial (both in GAP and Magma).
7.2 Defining zeros
*** ***
7.2.1 Structure of finite fields
The finite fields we encountered up to now were always of the form F_p = Z/⟨p⟩ with p a prime. For the notion of defining zeros of a cyclic code this does not suffice, and extensions of prime fields are needed. In this section we state basic facts on the structure of finite fields. For proofs we refer to the existing literature.
Definition 7.2.1 The smallest subfield of a field F is unique and is called the prime field of F. The only prime fields are the rational numbers Q and the finite fields F_p with p a prime; the characteristic of the field is zero and p, respectively.
Remark 7.2.2 Let F be a field of characteristic p, a prime. Then
(x + y)^p = x^p + y^p
for all x, y ∈ F by Newton's binomial formula, since the binomial coefficient (p choose i) is divisible by p for all i = 1, . . . , p − 1.
Proposition 7.2.3 If F is a finite field, then the number of elements of F is a
power of a prime number.
Proof. The characteristic of a finite field is prime, and such a field is a vector
space over the prime field of finite dimension. So the number of elements of a
finite field is a power of a prime number.
Remark 7.2.4 The factor ring of the ring of polynomials in one variable with coefficients in a field F modulo an irreducible polynomial gives a way to construct a field extension of F. In particular, if f(X) ∈ F_p[X] is irreducible, and ⟨f(X)⟩ is the ideal consisting of all multiples of f(X), then the factor ring F_p[X]/⟨f(X)⟩ is a field with p^e elements, where e = deg f(X). The coset of X modulo ⟨f(X)⟩ is denoted by x, and the monomials 1, x, . . . , x^{e−1} form a basis over F_p. Hence every element in this field is uniquely represented by a polynomial g(X) ∈ F_p[X] of degree at most e − 1, and its coset is denoted by g(x). This is called the principal representation. The sum of two representatives is again a representative. For the product one has to divide by f(X) and take the rest as a representative.
Example 7.2.5 The irreducible polynomials of degree one in F_2[X] are X and 1 + X. And 1 + X + X^2 is the only irreducible polynomial of degree two in F_2[X]. There are exactly two irreducible polynomials of degree three in F_2[X]. These are 1 + X + X^3 and 1 + X^2 + X^3.
Consider the field F = F_2[X]/⟨1 + X + X^3⟩ with 8 elements. Then 1, x, x^2 is a basis of F over F_2. Now
(1 + X)(1 + X + X^2) = 1 + X^3 ≡ X mod 1 + X + X^3.
Hence (1 + x)(1 + x + x^2) = x in F. In the following table the powers x^i are written by their principal representatives.
x^3 = 1 + x
x^4 = x + x^2
x^5 = 1 + x + x^2
x^6 = 1 + x^2
x^7 = 1
Therefore the nonzero elements form a cyclic group of order 7 with x as generator.
Definition 7.2.6 Let F be a field. Let f(X) = ∑_{i=0}^{n} a_iX^i in F[X]. Then f′(X) is the formal derivative of f(X) and is defined by
f′(X) = ∑_{i=1}^{n} i·a_iX^{i−1}.
Remark 7.2.7 The product or Leibniz rule holds for the derivative:
(f(X)g(X))′ = f′(X)g(X) + f(X)g′(X).
The following criterion gives a way to decide whether the zeros of a polynomial are simple.
Lemma 7.2.8 Let F be a field. Let f(X) ∈ F[X]. Then every zero of f(X) has multiplicity one if and only if gcd(f(X), f′(X)) = 1.
Proof. Suppose gcd(f(X), f′(X)) = 1. Let α be a zero of f(X) of multiplicity m. Then there exists a polynomial a(X) such that f(X) = (X − α)^m a(X). Differentiating this equality gives
f′(X) = m(X − α)^{m−1} a(X) + (X − α)^m a′(X).
If m > 1, then X − α divides f(X) and f′(X). This contradicts the assumption that gcd(f(X), f′(X)) = 1. Hence every zero of f(X) has multiplicity one.
Conversely, if gcd(f(X), f′(X)) ≠ 1, then f(X) and f′(X) have a common zero a, possibly in an extension of F. Conclude that (X − a)^2 divides f(X), using the product rule again.
Remark 7.2.9 Let p be a prime and q = p^e. The formal derivative of X^q − X is −1 in F_p. Hence all zeros of X^q − X in an extension of F_p are simple by Lemma 7.2.8.
For every field F and polynomial f(X) in one variable X there exists a field G that contains F as a subfield such that f(X) splits into linear factors in G[X]. The smallest field with these properties is unique up to an isomorphism of fields and is called the splitting field of f(X) over F.
A field F is called algebraically closed if every polynomial in one variable has a zero in F. So every polynomial in one variable over an algebraically closed field splits into linear factors over this field. Every field F has an extension G that is algebraically closed such that G does not have a proper subfield that is algebraically closed. Such an extension is unique up to isomorphism and is called the algebraic closure of F; it is denoted by F̄. The field C of complex numbers is the algebraic closure of the field R of real numbers.
Remark 7.2.10 If F is a field with q elements, then F^* = F \ {0} is a multiplicative group of order q − 1. So x^{q−1} = 1 for all x ∈ F^*. Hence x^q = x for all x ∈ F. Therefore the zeros of X^q − X are precisely the elements of F.
Theorem 7.2.11 Let p be a prime and q = p^e. There exists a field of q elements, and any field with q elements is isomorphic to the splitting field of X^q − X over F_p; it is denoted by F_q or GF(q), the Galois field of q elements.
Proof. The splitting field of X^q − X over F_p contains the zeros of X^q − X. Let Z be the set of zeros of X^q − X in the splitting field. Then |Z| = q, since X^q − X splits into linear factors and all zeros are simple by Remark 7.2.9. Now 0 and 1 are elements of Z and Z is closed under addition, subtraction, multiplication and division by nonzero elements. Hence Z is a field. Furthermore Z contains F_p since q = p^e. Hence Z is equal to the splitting field of X^q − X over F_p. Hence the splitting field has q elements.
If F is a field with q elements, then all elements of F are zeros of X^q − X by Remark 7.2.10. Hence F is contained in an isomorphic copy of the splitting field of X^q − X over F_p. Therefore they are equal, since both have q elements.
The set of invertible elements of the finite field F_q is an abelian group of order q − 1. But a stronger statement is true.
Proposition 7.2.12 The multiplicative group F_q^* is cyclic.
Proof. The order of an element of F_q^* divides q − 1, since F_q^* is a group of order q − 1. Let d be the maximal order of an element of F_q^*. Then d divides q − 1. Let x be an element of order d. If y ∈ F_q^*, then the order n of y divides d. Otherwise there is a prime l dividing n and l not dividing d. So z = y^{n/l} has order l. Hence xz has order dl, contradicting the maximality of d. Therefore the order of every element of F_q^* divides d. So the elements of F_q^* are zeros of X^d − 1. Hence q − 1 ≤ d and d divides q − 1. We conclude that d = q − 1, x is an element of order q − 1 and F_q^* is cyclic, generated by x.
Definition 7.2.13 A generator of F_q^* is called a primitive element. An irreducible polynomial f(X) ∈ F_p[X] is called primitive if x is a primitive element in F_p[X]/⟨f(X)⟩, where x is the coset of X modulo f(X).
Definition 7.2.14 Choose a primitive element α of F_q. Define α^∗ = 0. Then for every element β ∈ F_q there is a unique i ∈ {∗, 0, 1, . . . , q − 2} such that β = α^i, and this i is called the logarithm of β with respect to α, and α^i the exponential representation of β. For every i ∈ {∗, 0, 1, . . . , q − 2} there is a unique j ∈ {∗, 0, 1, . . . , q − 2} such that
1 + α^i = α^j,
and this j is called the Zech logarithm of i and is denoted by Zech(i) = j.
Remark 7.2.15 Let p be a prime and q = p^e. In a principal representation of F_q, every element is given by a polynomial of degree at most e − 1 with coefficients in F_p, and addition in F_q is easy: it is done coefficientwise in F_p. But for the multiplication we need to multiply two polynomials and compute a division with rest.
Define the addition i + j for i, j ∈ {∗, 0, 1, . . . , q − 2}, where i + j is taken modulo q − 1 if i and j are both not equal to ∗, and i + j = ∗ if i = ∗ or j = ∗. Then multiplication in F_q is easy in the exponential representation with respect to a primitive element, since
α^i α^j = α^{i+j}
for i, j ∈ {∗, 0, 1, . . . , q − 2}. In the exponential representation the addition can be expressed in terms of the Zech logarithm:
α^i + α^j = α^{i+Zech(j−i)}.
Example 7.2.16 Consider the finite field F_8 as given in Example 7.2.5 by the irreducible polynomial 1 + X + X^3. In the following table the elements are represented as powers of x, as polynomials a_0 + a_1x + a_2x^2, and the Zech logarithm is given.
i   x^i   (a_0, a_1, a_2)   Zech(i)
∗   x^∗   (0, 0, 0)         0
0   x^0   (1, 0, 0)         ∗
1   x^1   (0, 1, 0)         3
2   x^2   (0, 0, 1)         6
3   x^3   (1, 1, 0)         1
4   x^4   (0, 1, 1)         5
5   x^5   (1, 1, 1)         4
6   x^6   (1, 0, 1)         2
In the principal representation we immediately see that x^3 + x^5 = x^2, since x^3 = 1 + x and x^5 = 1 + x + x^2. The exponential representation by means of the Zech logarithm gives
x^3 + x^5 = x^{3+Zech(2)} = x^2.
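The Zech logarithm of this example can be recomputed in GAP with the discrete logarithm function LogFFE; a sketch (GAP's Z(8) satisfies Z(8)^3 = Z(8) + 1, the relation of the table above):
> a := Z(8);;                             # a^3 = a + 1
> zech := i -> LogFFE(a^0 + a^i, a);;     # Zech logarithm, for i <> 0
> zech(2);
6
> LogFFE(a^3 + a^5, a) = (3 + zech(5-3)) mod 7;   # x^3 + x^5 = x^2
true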
***Applications: quasi random generators, discrete logarithm.***
Definition 7.2.17 Let Irr_q(n) be the number of irreducible monic polynomials over F_q of degree n.
Proposition 7.2.18 Let q be a power of a prime number. Then
q^n = ∑_{d|n} d · Irr_q(d).
Proof. ***...***
Proposition 7.2.19
Irr_q(n) = (1/n) ∑_{d|n} μ(n/d) q^d.
Proof. Consider the poset N of Example 5.3.20 with the divisibility as partial order. Define f(d) = d · Irr_q(d). Then the sum function f̃(n) = ∑_{d|n} f(d) is equal to q^n by Proposition 7.2.18. The Möbius inversion formula 5.3.10 implies that n · Irr_q(n) = ∑_{d|n} μ(n/d) q^d, which gives the desired result.
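This formula is a one-liner in GAP (MoebiusMu and DivisorsInt are standard functions; compare Exercise 7.2.12):
> IrrNr := function(q, n) return Sum(DivisorsInt(n), d -> MoebiusMu(n/d)*q^d)/n; end;;
> IrrNr(2, 3);    # the two cubic irreducible polynomials of Example 7.2.5
2
> IrrNr(2, 1);    # X and 1 + X
2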
Remark 7.2.20 Proposition 7.2.19 implies
Irr_q(n) ≥ (1/n)( q^n − q^{n−1} − · · · − q ) = (1/n)( q^n − (q^n − q)/(q − 1) ) > 0,
since μ(1) = 1 and μ(d) ≥ −1 for all d. By counting the number of irreducible polynomials over F_q we see that there exists an irreducible polynomial in F_q[X] of every degree d.
Let q = p^d with p a prime. Now Z_p is a field with p elements. There exists an irreducible polynomial f(T) in Z_p[T] of degree d, and Z_p[T]/⟨f(T)⟩ is a field with p^d = q elements. This is another way to show the existence of a finite field with q elements, where q is a prime power.
7.2.2 Minimal polynomials
***
Remark 7.2.21 From now on we assume that n and q are relatively prime. This assumption is not necessary, but matters would be more complicated otherwise. Hence q has an inverse modulo n. So q^e ≡ 1 (mod n) for some positive integer e. Hence n divides q^e − 1. Let F_{q^e} be the extension of F_q of degree e. So n divides the order of F_{q^e}^*, the cyclic group of units. Hence there exists an element α ∈ F_{q^e}^* of order n. From now on we choose such an element α of order n.
Example 7.2.22 The order of the cyclic group F_{3^e}^* is 2, 8, 26, 80 and 242 for e = 1, 2, 3, 4 and 5, respectively. Hence F_{3^5} is the smallest field extension of F_3 that has an element of order 11.
Remark 7.2.23 The multiplicity of every zero of X^n − 1 is one by Lemma 7.2.8, since gcd(X^n − 1, nX^{n−1}) = 1 in F_q by the assumption that gcd(n, q) = 1. Let α be an element in some extension of F_q of order n. Then 1, α, α^2, . . . , α^{n−1} are n mutually distinct zeros of X^n − 1. Hence
X^n − 1 = ∏_{i=0}^{n−1} (X − α^i).
Definition 7.2.24 Let α be a primitive n-th root of unity in the extension field F_{q^e}. For this choice of an element of order n we define m_i(X) as the minimal polynomial of α^i, that is, the monic polynomial in F_q[X] of smallest degree such that m_i(α^i) = 0.
Example 7.2.25 In particular m_0(X) = X − 1.
Proposition 7.2.26 The minimal polynomial m_i(X) is irreducible in F_q[X].
Proof. Let m_i(X) = f(X)g(X) with f(X), g(X) ∈ F_q[X]. Then f(α^i)g(α^i) = m_i(α^i) = 0. So f(α^i) = 0 or g(α^i) = 0. Hence deg f(X) ≥ deg m_i(X) or deg g(X) ≥ deg m_i(X) by the minimality of the degree of m_i(X). Hence m_i(X) is irreducible.
Example 7.2.27 Choose α = 3 as the primitive element in F_7 of order 6. Then X^6 − 1 is the product of linear factors in F_7[X]. Furthermore m_1(X) = X − 3, m_2(X) = X − 2, m_3(X) = X − 6, and so on. But 5 is also an element of order 6 in F_7^*. The choice α = 5 would give m_1(X) = X − 5, m_2(X) = X − 4, and so on.
Example 7.2.28 There are exactly two irreducible polynomials of degree 3 in F_2[X]. They are factors of 1 + X^7:
1 + X^7 = (1 + X)(1 + X + X^3)(1 + X^2 + X^3).
Let α ∈ F_8 be a zero of 1 + X + X^3. Then α is a primitive element of F_8, and α^2 and α^4 are the remaining zeros of 1 + X + X^3. The reciprocal of 1 + X + X^3 is
X^3(1 + X^{−1} + X^{−3}) = 1 + X^2 + X^3
and has α^{−1} = α^6, α^{−2} = α^5 and α^{−4} = α^3 as zeros. So m_1(X) = 1 + X + X^3 and m_3(X) = 1 + X^2 + X^3.
Proposition 7.2.29 The monic reciprocal of m_i(X) is equal to m_{−i}(X).
Proof. The element α^i is a zero of m_i(X). So α^{−i} is a zero of the monic reciprocal of m_i(X) by Remark 7.1.31. Hence the degree of the monic reciprocal of m_i(X) is at least deg m_{−i}(X). So deg m_i(X) ≥ deg m_{−i}(X). Similarly deg m_i(X) ≤ deg m_{−i}(X). So deg m_i(X) = deg m_{−i}(X) is the degree of the monic reciprocal of m_i(X). Hence the monic reciprocal of m_i(X) is a monic polynomial of minimal degree having α^{−i} as a zero; therefore it is equal to m_{−i}(X).
7.2.3 Cyclotomic polynomials and cosets
***
Definition 7.2.30 Let n be a nonnegative integer. Then Euler's function ϕ is given by
ϕ(n) = |{ i : gcd(i, n) = 1, 0 ≤ i < n }|.
Lemma 7.2.31 The following properties of Euler's function hold:
1) ϕ(mn) = ϕ(m)ϕ(n) if gcd(m, n) = 1.
2) ϕ(1) = 1.
3) ϕ(p) = p − 1 if p is a prime number.
4) ϕ(p^e) = p^{e−1}(p − 1) if p is a prime number.
Proof. The set { i : gcd(i, n) = 1, 0 ≤ i < n } is a set of representatives of Z_n^*, the set of all invertible elements of Z_n. Hence ϕ(n) = |Z_n^*|. The Chinese remainder theorem gives that Z_m ⊕ Z_n ≅ Z_{mn} if gcd(m, n) = 1. Hence Z_m^* ⊕ Z_n^* ≅ Z_{mn}^*. Therefore ϕ(mn) = ϕ(m)ϕ(n) if gcd(m, n) = 1.
The remaining items are left as an exercise.
Proposition 7.2.32 Let p_1, . . . , p_k be the primes dividing n. Then
ϕ(n) = n (1 − 1/p_1) · · · (1 − 1/p_k).
Proof. This is a direct consequence of Lemma 7.2.31.
Definition 7.2.33 Let F be a field. Let n be a positive integer that is relatively prime to the characteristic of F. Let α be an element of order n in F̄^*. The cyclotomic polynomial of order n is defined by
Φ_n(X) = ∏_{gcd(i,n)=1, 0≤i<n} (X − α^i).
Remark 7.2.34 The degree of Φ_n(X) is equal to ϕ(n).
Remark 7.2.35 If x is a primitive element, then y is a primitive element if and only if y = x^i for some i such that 1 ≤ i < q − 1 and gcd(i, q − 1) = 1. Hence the number of primitive elements in F_q^* is equal to ϕ(q − 1), where ϕ is Euler's function.
Theorem 7.2.36 Let F be a field. Let n be a positive integer that is relatively prime to the characteristic of F. The polynomial Φ_n(X) is in F[X], has as zeros all elements in F̄^* of order n, and has degree ϕ(n), where ϕ is Euler's function. Furthermore
X^n − 1 = ∏_{d|n} Φ_d(X).
Proof. The degree of Φ_n(X) is equal to the number of i such that 0 ≤ i < n and gcd(i, n) = 1, which is by definition equal to ϕ(n). The power α^i has order n if and only if gcd(i, n) = 1. Conversely, if β is an element of order n in F̄^*, then β is a zero of X^n − 1 and X^n − 1 = ∏_{0≤i<n}(X − α^i). So β = α^i for some i with 0 ≤ i < n and gcd(i, n) = 1. Hence Φ_n(X) has as zeros all elements in F̄^* of order n. Therefore, grouping the α^i by their order d, which divides n,
X^n − 1 = ∏_{0≤i<n} (X − α^i) = ∏_{d|n} ∏_{β of order d} (X − β) = ∏_{d|n} Φ_d(X).
The fact that Φ_n(X) has coefficients in F is shown by induction on n. Now Φ_1(X) = X − 1 is in F[X]. Suppose that Φ_m(X) is in F[X] for all m < n. Then f(X) = ∏_{d|n, d≠n} Φ_d(X) is in F[X], and X^n − 1 = f(X)Φ_n(X). So X^n − 1 is divisible by f(X) in F̄[X], and X^n − 1 and f(X) are in F[X]. Hence Φ_n(X) is in F[X].
Remark 7.2.37 The factorization of X^n − 1 in cyclotomic polynomials gives a way to compute the Φ_n(X) recursively.
Remark 7.2.38 The cyclotomic polynomial Φ_n(X) depends on the field F in Definition 7.2.33. But Φ_n(X) is universal in the sense that in characteristic zero it has integer coefficients and they do not depend on the field F. By reducing the coefficients of this polynomial modulo a prime p one gets the cyclotomic polynomial over any field of characteristic p. In characteristic zero Φ_n(X) is irreducible in Q[X] for all n. But Φ_n(X) is sometimes reducible in F_p[X].
Example 7.2.39 The polynomials Φ_1(X) = X − 1 and Φ_2(X) = X + 1 are irreducible in any characteristic, and X^2 − 1 = Φ_1(X)Φ_2(X). Now
X^3 − 1 = Φ_1(X)Φ_3(X).
Hence Φ_3(X) = X^2 + X + 1, and this polynomial is irreducible in F_p[X] if and only if F_p^* has no element of order 3, if and only if p ≡ 2 mod 3. Similarly
X^4 − 1 = Φ_1(X)Φ_2(X)Φ_4(X).
So Φ_4(X) = X^2 + 1, and this polynomial is irreducible in F_p[X] if and only if p ≡ 3 mod 4.
Proposition 7.2.40 Let f(X) be a polynomial with coefficients in F_q. If β is a zero of f(X), then β^q is also a zero of f(X).
Proof. Let f(X) = f_0 + f_1X + · · · + f_mX^m ∈ F_q[X]. Then f_i^q = f_i for all i. If β is a zero of f(X), then f(β) = 0. So
0 = f(β)^q = (f_0 + f_1β + · · · + f_mβ^m)^q = f_0^q + f_1^q β^q + · · · + f_m^q β^{qm} = f_0 + f_1β^q + · · · + f_mβ^{qm} = f(β^q).
Hence β^q is a zero of f(X).
Remark 7.2.41 In particular we have that m_i(X) = m_{qi}(X). Let g(X) be a generator polynomial of a cyclic code over F_q. If α^i is a zero of g(X), then α^{qi} is also a zero of g(X).
Definition 7.2.42 The cyclotomic coset C_q(I) of the subset I of Z_n with respect to q is the subset of Z_n defined by
C_q(I) = { iq^j | i ∈ I, j ∈ N_0 }.
If I = {i}, then C_q(I) is denoted by C_q(i).
Proposition 7.2.43 The cyclotomic cosets C_q(i) give a partitioning of Z_n for a given q such that gcd(q, n) = 1.
Proof. Every i ∈ Z_n is in the cyclotomic coset C_q(i).
Suppose that C_q(i) and C_q(j) have an element in common. Then iq^k = jq^l for some k, l ∈ N_0. We may assume that k ≤ l; then i = jq^{l−k} and l − k ∈ N_0. So iq^m = jq^{l−k+m} for all m ∈ N_0. Hence C_q(i) is contained in C_q(j). Now n and q are relatively prime, so q is invertible in Z_n and q^e ≡ 1 (mod n) for some positive integer e. So jq^m = iq^{(e−1)(l−k)+m} for all m ∈ N_0. Hence C_q(j) is contained in C_q(i). Therefore we have shown that two cyclotomic cosets are equal or disjoint.
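Cyclotomic cosets can be listed with the GUAVA function CyclotomicCosets (see also Exercise 7.2.14); a sketch, where the exact ordering of the output may vary:
> LoadPackage("guava");;
> CyclotomicCosets(2, 7);
[ [ 0 ], [ 1, 2, 4 ], [ 3, 6, 5 ] ]
> CyclotomicCosets(3, 11);
[ [ 0 ], [ 1, 3, 9, 5, 4 ], [ 2, 6, 7, 10, 8 ] ]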
Proposition 7.2.44
m_i(X) = ∏_{j∈C_q(i)} (X − α^j).
Proof. If j ∈ C_q(i), then m_i(α^j) = 0 by Proposition 7.2.40. Hence the product ∏_{j∈C_q(i)}(X − α^j) divides m_i(X). Now raising to the q-th power gives a permutation of the zeros α^j with j ∈ C_q(i). The coefficients of the product of the linear factors X − α^j are symmetric functions in the α^j and therefore invariant under raising to the q-th power. Hence these coefficients are elements of F_q and this product is an element of F_q[X] that has α^i as a zero. Therefore equality holds by the minimality of m_i(X).
Proposition 7.2.45 Let n be a positive integer such that gcd(n, q) = 1. Then the number of choices of an element of order n in an extension of F_q is equal to ϕ(n). The possible choices of the minimal polynomial m_1(X) correspond to monic irreducible factors of Φ_n(X), and the number of these choices is ϕ(n)/d, where d = |C_q(1)|.
Proof. The number of choices of an element of order n in an extension of F_q is ϕ(n) by Theorem 7.2.36. Let i ∈ Z_n and gcd(i, n) = 1. Consider the map C_q(1) → C_q(i) defined by j → ij. Then this map is well defined and has an inverse, since i is invertible in Z_n. So |C_q(1)| = |C_q(i)|, and the set of elements i in Z_n such that gcd(i, n) = 1 is partitioned in cyclotomic cosets of the same size d by Proposition 7.2.43. Every choice of such a coset corresponds to a choice of m_1(X) and is an irreducible monic factor of Φ_n(X). Hence the number of possible minimal polynomials m_1(X) is ϕ(n)/d.
Example 7.2.46 Let n = 11 and q = 3. Then ϕ(11) = 10. Consider the sequence starting with 1 and obtained by multiplying repeatedly with 3 modulo 11:
1, 3, 9, 27 ≡ 5, 15 ≡ 4, 12 ≡ 1.
So C_3(1) = {1, 3, 4, 5, 9} consists of 5 elements. Hence Φ_{11}(X) has two irreducible factors in F_3[X] given by:
Φ_{11}(X) = (X^{11} − 1)/(X − 1) = (−1 + X^2 − X^3 + X^4 + X^5)(−1 − X + X^2 − X^3 + X^5).
Example 7.2.47 Let n = 23 and q = 2. Then ϕ(23) = 22. Consider the sequence starting with 1 and obtained by multiplying repeatedly with 2 modulo 23:
1, 2, 4, 8, 16, 32 ≡ 9, 18, 36 ≡ 13, 26 ≡ 3, 6, 12, 24 ≡ 1.
So C_2(1) = {1, 2, 3, 4, 6, 8, 9, 12, 13, 16, 18} consists of 11 elements. Hence Φ_{23}(X) = (X^{23} − 1)/(X − 1) is the product of two irreducible factors in F_2[X] given by:
(1 + X^2 + X^4 + X^5 + X^6 + X^{10} + X^{11})(1 + X + X^5 + X^6 + X^7 + X^9 + X^{11}).
Proposition 7.2.48 Let i and j be integers such that 0 < i, j < n. Suppose ij ≡ 1 mod n. Then
m_i(X) = gcd(m_1(X^j), X^n − 1).
Proof. Let β be a zero of gcd(m_1(X^j), X^n − 1). Then β is a zero of m_1(X^j) and of X^n − 1. So β = α^l for some l and m_1(α^{jl}) = 0. Hence jl ∈ C_q(1) by Proposition 7.2.44. So jl ≡ q^m mod n for some m. Hence l ≡ ijl ≡ iq^m mod n. Therefore l ∈ C_q(i) and β is a zero of m_i(X).
Similarly, if β is a zero of m_i(X), then β is a zero of gcd(m_1(X^j), X^n − 1). Both polynomials are monic and have the same zeros and all zeros are simple by Remark 7.2.23. Therefore the polynomials are equal.
Proposition 7.2.49 Let gcd(i, n) = d and j = i/d. Let α be an element of order n in F_{q^e}^* and β = α^d. Let m_i(X) be the minimal polynomial of α^i and n_j(X) the minimal polynomial of β^j. Then β is an element of order n/d in F_{q^e}^* and m_i(X) = n_j(X).
Proof. The map jq^m → jdq^m gives a well defined one-to-one correspondence between the elements of D, the cyclotomic coset of j modulo n/d, and the elements of C, the cyclotomic coset of i modulo n. Hence
m_i(X) = ∏_{k∈C} (X − α^k) = ∏_{l∈D} (X − β^l) = n_j(X)
by Proposition 7.2.44.
Example 7.2.50 Let α be an element of order 8 in an extension of F_3. Let m_1(X) be the minimal polynomial of α in F_3[X]. Then m_1(X) divides X^8 − 1. But X^8 − 1 = (X^4 − 1)(X^4 + 1) and the zeros of X^4 − 1 have order at most 4. The factorization of X^4 − 1 is given by
X^4 − 1 = (X − 1)(X + 1)(X^2 + 1)
with m_0(X) = X − 1 and m_4(X) = X + 1, since α^4 = −1. The cyclotomic coset of 2 is C_3(2) = {2, 6} and α^2 and α^6 are the elements of order 4 in F_9^*. So
m_2(X) = m_6(X) = Φ_4(X) = X^2 + 1.
This confirms Proposition 7.2.49 with i = d = 2 and j = 1.
Now C_3(1) = {1, 3} and C_3(5) = {5, 7}. So m_1(X) = m_3(X) and m_5(X) = m_7(X). Notice that −1 ≡ 7 mod 8 and m_7(X) is the monic reciprocal of m_1(X) by Proposition 7.2.29. The degree of m_1(X) is 2. Suppose
m_1(X) = a_0 + a_1X + X^2.
Then m_7(X) = a_0^{−1} + a_0^{−1}a_1X + X^2. The polynomials m_1(X) and m_7(X) divide X^4 + 1. Hence
Φ_8(X) = X^4 + 1 = m_1(X)m_7(X) = (a_0 + a_1X + X^2)(a_0^{−1} + a_0^{−1}a_1X + X^2).
Expanding the right hand side and comparing coefficients gives that a_0 = −1 and a_1 = 1 or a_1 = −1. Hence there are two possible choices for m_1(X). Choose m_1(X) = X^2 + X − 1. So X^2 − X − 1 is the alternative choice for m_1(X). Furthermore α^2 = −α + 1 and m_5(X) = m_7(X) = (X − α^5)(X − α^7) by Proposition 7.2.44. An application of Proposition 7.2.48 with i = j = 5 gives a third way to compute m_5(X), since 5 · 5 ≡ 1 mod 8, m_1(X^5) = X^{10} + X^5 − 1 and
gcd(X^{10} + X^5 − 1, X^8 − 1) = X^2 − X − 1.
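The two candidates for m_1(X) can also be obtained in GAP, where MinimalPolynomial computes minimal polynomials over a subfield; which of the two choices appears depends on the element of order 8 one takes (a sketch; here GAP's standard generator Z(9) happens to give the alternative choice):
> a := Z(9);;                                # an element of order 8 in F_9
> x := Indeterminate(GF(3));;
> MinimalPolynomial(GF(3), a) = x^2 - x - 1;      # the alternative choice above
true
> MinimalPolynomial(GF(3), a^2) = x^2 + 1;        # m_2(X) = Phi_4(X)
true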
7.2.4 Zeros of the generator polynomial
We have seen in Proposition 7.1.15 that the generator polynomial divides X^n − 1, so its zeros are n-th roots of unity if n is not divisible by the characteristic of F_q. Instead of describing a cyclic code by its generator polynomial g(X), one can describe the code alternatively by the set of zeros of g(X) in an extension of F_q.
Definition 7.2.51 Fix an element α of order n in an extension F_{q^e} of F_q. A subset I of Z_n is called a defining set of a cyclic code C if
C = { c(x) ∈ C_{q,n} : c(α^i) = 0 for all i ∈ I }.
The root set, the set of zeros or the complete defining set Z(C) of C is defined as
Z(C) = { i ∈ Z_n : c(α^i) = 0 for all c(x) ∈ C }.
Proposition 7.2.52 The relation between the generator polynomial g(X) of a cyclic code C and the set of zeros Z(C) is given by
g(X) = ∏_{i∈Z(C)} (X − α^i).
The dimension of C is equal to n − |Z(C)|.
Proof. The generator polynomial g(X) divides X^n − 1 by Proposition 7.1.15. The polynomial X^n − 1 has no multiple zeros by Remark 7.2.23, since n and q are relatively prime. So every zero of g(X) is of the form α^i for some i ∈ Z_n and has multiplicity one. Let Z(g) = { i ∈ Z_n | g(α^i) = 0 }. Then g(X) = ∏_{i∈Z(g)}(X − α^i). Let c(x) ∈ C. Then c(x) = a(x)g(x), so c(α^i) = 0 for all i ∈ Z(g). So Z(g) ⊆ Z(C). Conversely, g(x) ∈ C, so g(α^i) = 0 for all i ∈ Z(C). Hence Z(C) ⊆ Z(g). Therefore Z(g) = Z(C) and g(X) is a product of the linear factors as claimed. Furthermore the degree of g(X) is equal to |Z(C)|. Hence the dimension of C is equal to n − |Z(C)| by Proposition 7.1.21.
Proposition 7.2.53 The complete defining set of a cyclic code is the disjoint union of cyclotomic cosets.
Proof. Let g(X) be the generator polynomial of a cyclic code C. Then g(α^i) = 0 if and only if i ∈ Z(C) by Proposition 7.2.52. If α^i is a zero of g(X), then α^{iq} is a zero of g(X) by Remark 7.2.41. So C_q(i) is contained in Z(C) if i ∈ Z(C). Therefore Z(C) is the union of cyclotomic cosets. This union is a disjoint union by Proposition 7.2.43.
Example 7.2.54 Consider the binary cyclic code C of length 7 with defining set {1}. Then Z(C) = {1, 2, 4} and m_1(X) = 1 + X + X^3 is the generator polynomial of C. Hence C is the cyclic Hamming code of Example 7.1.19. The cyclic code with defining set {3} has generator polynomial m_3(X) = 1 + X^2 + X^3 and complete defining set {3, 5, 6}.
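In GAP the complete defining set and the dimension can be computed from the cyclotomic cosets; a sketch for the defining set {1} above, combining CyclotomicCosets (GUAVA) with Proposition 7.2.52:
> LoadPackage("guava");;
> cosets := CyclotomicCosets(2, 7);;
> defset := Union(Filtered(cosets, S -> 1 in S));;   # Z(C) = {1, 2, 4}
> 7 - Size(defset);                                  # the dimension of C
4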
Remark 7.2.55 If a cyclic code is given by its zero set, then this definition depends on the choice of an element of order n. Consider Example 7.2.54. If we would have taken α^3 as element of order 7, then the generator polynomial of the binary cyclic code with defining set {1} would have been 1 + X^2 + X^3 instead of 1 + X + X^3.
Example 7.2.56 Consider the [6,3] cyclic code over F_7 of Example 7.1.24 with the generator polynomial g(X) = 6 + X + 3X^2 + X^3. Then
X^3 + 3X^2 + X + 6 = (X − 2)(X − 3)(X − 6).
So 2, 3 and 6 are the zeros of the generator polynomial. Choose α = 3 as the primitive element in F_7 of order 6 as in Example 7.2.27. Then α, α^2 and α^3 are the zeros of g(X).
Example 7.2.57 Let α be an element of F_9 such that α^2 = −α + 1 as in Example 7.2.50. Then 1, α and α^3 are the zeros of the ternary cyclic code of length 8 with generator polynomial 1 + X + X^3 of Example 7.1.16, since
X^3 + X + 1 = (X^2 + X − 1)(X − 1) = m_1(X)m_0(X).
Proposition 7.2.58 Let C be a cyclic code of length n. Then
Z(C^⊥) = Z_n \ { −i | i ∈ Z(C) }.
Proof. The power α^i is a zero of g(X) if and only if i ∈ Z(C) by Proposition 7.2.52. And h(α^i) = 0 if and only if g(α^i) ≠ 0, since h(X) = (X^n − 1)/g(X) and all zeros of X^n − 1 are simple by Remark 7.2.23. Furthermore g^⊥(X) is the monic reciprocal of h(X) by Proposition 7.1.37. Finally g^⊥(α^{−i}) = 0 if and only if h(α^i) = 0 by Remark 7.1.31.
Example 7.2.59 Consider the [6,3] cyclic code C over F_7 of Example 7.2.56. Then α, α^2 and α^3 are the zeros of g(X). Hence Z(C) = {1, 2, 3} and
Z(C^⊥) = Z_6 \ {−1, −2, −3} = {0, 1, 2}
by Proposition 7.2.58. Therefore
g^⊥(X) = (X − 1)(X − α)(X − α^2) = (X − 1)(X − 3)(X − 2) = X^3 + X^2 + 4X + 1.
This is in agreement with Example 7.1.38.
Example 7.2.60 Let C be the ternary cyclic code of length 8 with the generator polynomial g(X) = 1 + X + X^3 of Example 7.1.16. Then g(X) = m_0(X)m_1(X) and Z(C) = {0, 1, 3} by Example 7.2.57. Hence
Z(C^⊥) = Z_8 \ {0, −1, −3} = {1, 2, 3, 4, 6}.
Proposition 7.2.61 The number of cyclic codes of length n over F_q is equal to 2^N, where N is the number of cyclotomic cosets modulo n with respect to q.
Proof. A cyclic code C of length n over F_q is uniquely determined by its set of zeros Z(C) by Proposition 7.2.52. The set of zeros is a disjoint union of cyclotomic cosets modulo n with respect to q by Proposition 7.2.53. Hence a cyclic code is uniquely determined by a choice of a subset of all N cyclotomic cosets. There are 2^N such subsets.
Example 7.2.62 There are 3 cyclotomic cosets modulo 7 with respect to 2. Hence there are 8 binary cyclic codes of length 7 with generator polynomials
1, m_0, m_1, m_3, m_0m_1, m_0m_3, m_1m_3, m_0m_1m_3.
7.2.5 Exercises
7.2.1 Show that f(X) = 1 + X^2 + X^5 is irreducible in F_2[X]. Give a principal representation of the product of β = 1 + x + x^4 and γ = 1 + x^3 + x^4 in the factor ring F_2[X]/⟨f(X)⟩ by division by f(X) with rest. Give a table of the principal representation and the Zech logarithm of the powers x^i for i in {∗, 0, 1, . . . , 30}. Compute the addition of β and γ by means of the exponential representation.
7.2.2 What is the smallest field extension of F_q that has an element of order 37 in case q = 2, 3 and 5? Show that the degree of the extension is always a divisor of 36 for any prime power q not divisible by 37.
7.2.3 Determine Φ_6(X) in characteristic zero. Let p be an odd prime. Show that Φ_6(X) is irreducible in F_p[X] if and only if p ≡ 2 mod 3.
7.2.4 Let α be an element of order 8 in an extension of F_5. Give all possible choices of the minimal polynomial m_1(X). Compute the coefficients of m_i(X) for all i between 0 and 7.
7.2.5 Let α be an element of order n in F_{q^e}^*. Let m_1(X) be the minimal polynomial of α over F_q. Estimate the total number of arithmetical operations in F_q to compute the minimal polynomial m_i(X) by means of Proposition 7.2.44 if gcd(i, n) = 1, as a function of n and e. Compare this complexity with the computation by means of Proposition 7.2.48.
7.2.6 Let C be a cyclic code of length 7 over Fq. Show that {1, 2, 4} is a
complete defining set if q is even.
7.2.7 Compute the zeros of the code of Example 7.1.5.
7.2.8 Show that α = 5 is an element of order 6 in F_7^*. Give the coefficients of the generator polynomial of the cyclic [6,3] code over F_7 with α, α^2 and α^3 as zeros.
7.2.9 Consider the binary cyclic code C of length 31 with generator polynomial 1 + X^2 + X^5 of Exercise 7.1.6. Let α be a zero of this polynomial. Then α has order 31 by Exercise 7.2.1.
1) Determine the coefficients of m_1(X), m_3(X) and m_5(X) with respect to α.
2) Determine Z(C) and Z(C^⊥).
7.2.10 Let C be a cyclic code over F_5 with m_1(X)m_2(X) as generator polynomial. Determine Z(C) and Z(C^⊥).
7.2.11 What is the number of ternary cyclic codes of length 8?
7.2.12 [CAS] Using the function MoebiusMu from GAP and Magma, write a program that computes the number of irreducible polynomials of given degree as per Proposition 7.2.19. Check your result with the use of the function IrreduciblePolynomialsNr in GAP.
7.2.13 [CAS] Take the field GF(2^10) and its primitive element a. Compute the Zech logarithm of a^100 with respect to a using the command ZechLog both in GAP and Magma.
7.2.14 [CAS] Using the function CyclotomicCosets in GAP/GUAVA, write a function that takes as an input the code length n, the field size q and a list of integers L, and computes the dimension of a q-ary cyclic code whose defining set is { a^i | i in L } for some primitive n-th root of unity a (predefined in GAP is fine).
7.3 Bounds on the minimum distance
*** ***
The BCH bound is a lower bound for the minimum distance of a cyclic code.
Although this bound is tight in many cases, it is not always the true minimum
distance. In this section several improved lower bounds are given but not one of
them gives the true minimum distance in all cases. In fact computing the true
minimum distance of a cyclic code is a hard problem.
7.3.1 BCH bound
Definition 7.3.1 Let C be an F_q-linear code. Let C̃ be an F_{q^m}-linear code in F_{q^m}^n. If C ⊆ C̃ ∩ F_q^n, then C is called a subfield subcode of C̃, and C̃ is called a super code of C. If equality holds, then C is called the restriction (by scalars) of C̃.
Remark 7.3.2 Let I be a defining set for the cyclic code C. Then
c(α^i) = c0 + c1α^i + · · · + cjα^{ij} + · · · + c_{n−1}α^{i(n−1)} = 0
for all i ∈ I. Let l = |I|. Let H̃ be the l × n matrix with entries
( α^{ij} | i ∈ I, j = 0, 1, . . . , n − 1 ).
Let C̃ be the Fqm-linear code with H̃ as parity check matrix. Then C is a subfield subcode of C̃, and it is in fact its restriction by scalars. Any lower bound on the minimum distance of C̃ holds a fortiori for C.
This remark will be used in the following proposition on the BCH (Bose-
Chaudhuri-Hocquenghem) bound on the minimum distance for cyclic codes.
Proposition 7.3.3 Let C be a cyclic code that has at least δ − 1 consecutive
elements in Z(C). Then the minimum distance of C is at least δ.
Proof. The complete defining set of C contains {i | b ≤ i ≤ b+δ−2} for a certain b. We have seen in Remark 7.3.2 that ( α^{ij} | b ≤ i ≤ b+δ−2, 0 ≤ j < n ) is a parity check matrix of a code C̃ over Fqm that has C as a subfield subcode. The j-th column has entries α^{bj}α^{ij}, 0 ≤ i ≤ δ − 2. The code C̃ is generalized equivalent with the code with parity check matrix H̃ = ( α^{ij} ) with 0 ≤ i ≤ δ−2, 0 ≤ j < n, by the linear isometry that divides the j-th coordinate by α^{bj} for 0 ≤ j < n. Let xj = α^{j−1} for 1 ≤ j ≤ n. Then H̃ = ( xj^i | 0 ≤ i ≤ δ − 2, 1 ≤ j ≤ n ) is a generator matrix of an MDS code over Fqm with parameters [n, δ − 1, n − δ + 2] by Proposition 3.2.10. So H̃ is a parity check matrix of a code with parameters [n, n − δ + 1, δ] by Proposition 3.2.7. Hence the minimum distance of C̃ and C is at least δ.
Definition 7.3.4 A cyclic code with defining set {b, b+1, . . . , b+δ−2} is called a BCH code with designed minimum distance δ. The BCH code is called narrow sense if b = 1, and it is called primitive if n = q^m − 1.
Example 7.3.5 Consider the binary cyclic Hamming code C of length 7 of
Example 7.2.28. The complete defining set of C is {1, 2, 4} and contains two
consecutive elements. So 3 is a lower bound for the minimum distance. This
is equal to the minimum distance. Let D be the binary cyclic code of length 7
with defining set {0, 3}. Then the complete defining set of D is {0, 3, 5, 6}. So
5, 6, 7 are three consecutive elements in Z(D), since 7 ≡ 0 mod 7. So 4 is a
lower bound for the minimum distance of D. In fact equality holds, since D is the dual of C, that is, the [7, 3, 4] binary simplex code.
Example 7.3.6 Consider the [6,3] cyclic code C over F7 of Example 7.2.56. The zeros of the generator polynomial are α, α^2 and α^3. So Z(C) = {1, 2, 3} and d(C) ≥ 4. Now g(x) = 6 + x + 3x^2 + x^3 is a codeword of weight 4. Hence the minimum distance is 4.
Remark 7.3.7 If α and β are both elements of order n, then there exist r, s in Zn such that β = α^r and α = β^s and rs ≡ 1 mod n by Theorem 7.2.36. If C is the cyclic code with defining set I with respect to α, and D is the cyclic code with defining set I with respect to β, then C and D are equivalent under the permutation σ of Zn such that σ(0) = 0 and σ(i) = ir for i = 1, . . . , n − 1. Hence a cyclic code that is given by its zero set is defined up to equivalence by the choice of an element of order n.
Example 7.3.8 Consider the binary cyclic code C1 of length 17 with m1(X) as generator polynomial. Then
Z1 = {−, 1, 2, −, 4, −, −, −, 8, 9, −, −, −, 13, −, 15, 16}
is the complete defining set of C1, where the spacing − indicates a gap. The BCH bound gives 3 as a lower bound for the minimum distance of C1. The code C3 with generator polynomial m3(X) has complete defining set
Z3 = {−, −, −, 3, −, 5, 6, 7, −, −, 10, 11, 12, −, 14, −, −}.
Hence 4 is a lower bound for d(C3). The cyclic codes of length 17 with generator polynomial mi(X) are equivalent if i ≠ 0, by Remark 7.3.7, since the order of α^i is 17. Hence d(C1) = d(C3) ≥ 4. In Example 7.4.2 it will be shown that the minimum distance is in fact 5.
The following definition does not depend on the choice of an element of order
n.
Definition 7.3.9 A subset of Zn of the form {b + ia|0 ≤ i ≤ δ − 2} for some
integers a, b and δ with gcd(a, n) = 1 and δ ≤ n + 1 is called consecutive of
period a. Let I be a subset of Zn. The number δBCH(I) is the largest integer
δ ≤ n+1 such that there is a consecutive subset of I consisting of δ−1 elements.
Let C be a cyclic code of length n. Then δBCH(Z(C)) is denoted by δBCH(C).
Theorem 7.3.10 The minimum distance of C is at least δBCH(C).
Proof. Let α be an element of order n in an extension of Fq. Suppose that the complete defining set of C with respect to α contains the set {b+ia | 0 ≤ i ≤ δ−2} of δ−1 elements for some integers a and b with gcd(a, n) = 1. Let β = α^a. Then β is an element of order n and there is an element c ∈ Zn such that ac ≡ 1 mod n, since gcd(a, n) = 1. Hence {bc + i | 0 ≤ i ≤ δ − 2} is a defining set of C with respect to β containing δ − 1 consecutive elements. Hence the minimum distance of C is at least δ by Proposition 7.3.3.
Remark 7.3.11 One easily sees a consecutive set of period one in Z(C) by
writing the elements in increasing order and the gaps by a spacing as done in
Example 7.3.8. Suppose gcd(a, n) = 1. Then there exists a b such that ab ≡ 1
mod n. A consecutive set of period a is seen by considering b·Z(C) and its con-
secutive sets of period 1. In this way one has to inspect ϕ(n) complete defining sets for their consecutive sets of period 1. The complexity of this computation is at most ϕ(n)·|Z(C)| in the worst case, but quite often it is much less, namely when b · Z(C) = Z(C) for many b.
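The procedure of Remark 7.3.11 is easily automated. The following Python sketch (ours, not from the text) computes δBCH(I) by searching all consecutive subsets {b, b+a, . . . , b+(t−1)a} with gcd(a, n) = 1 contained in I.

from math import gcd

def delta_BCH(I, n):
    I, best = set(I), 1
    for a in range(1, n):              # the period a must be a unit mod n
        if gcd(a, n) != 1:
            continue
        for b in range(n):
            t = 0                      # length of the consecutive run starting at b
            while t < n and (b + t * a) % n in I:
                t += 1
            best = max(best, t + 1)    # t consecutive elements give distance >= t+1
    return best

Z3 = {3, 5, 6, 7, 10, 11, 12, 14}      # complete defining set of C3 in Example 7.3.8
print(delta_BCH(Z3, 17))               # 4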
Example 7.3.12 This is a continuation of Example 7.3.8. The complete defin-
ing set Z3 of the code C3 has {5, 6, 7} as largest consecutive subset of period
1 in Z17. Now 3 · 6 ≡ 1 mod 17 and 6 · {5, 6, 7} = {13, 2, 8} is a consecutive
subset of period 6 in Z17 of three elements contained in the complete defining
set Z1 of the code C1. Now b · Z1 is equal to Z1 or Z3 for all 0 < b < 17. Hence
δBCH(C1) = δBCH(C3) = 4.
Example 7.3.13 Consider the binary BCH code Cb of length 15 and with defining set {b, b + 1, b + 2, b + 3} for some b. So its designed distance is 5. Take α in F16^* with α^4 = 1 + α as primitive element. Then m0(X) = 1 + X, m1(X) = 1 + X + X^4, m3(X) = 1 + X + X^2 + X^3 + X^4 and m5(X) = 1 + X + X^2.
If b = 1, then the complete defining set is {1, 2, 3, 4, 6, 8, 9, 12}, so δBCH(C1) = 5. The generator polynomial is
g1(X) = m1(X)m3(X) = 1 + X^4 + X^6 + X^7 + X^8
as is shown in Example 7.1.39 and has weight 5. So the minimum distance of C1 is 5.
If b = 0, then δBCH(C0) = 6. The generator polynomial is
g0(X) = m0(X)m1(X)m3(X) = 1 + X + X^4 + X^5 + X^6 + X^9
and has weight 6. So the minimum distance of C0 is 6.
If b = 2 or b = 3, then δBCH(C2) = 7. The generator polynomial g2(X) is equal to 1 + X + X^2 + X^4 + X^5 + X^8 + X^10 and has weight 7. So the minimum distance of C2 is 7. If b = 4 or b = 5, then δBCH(C4) = 15. The generator polynomial g4(X) is equal to 1 + X + X^2 + · · · + X^12 + X^13 + X^14 and has weight 15. So the minimum distance of C4 is 15.
Example 7.3.14 Consider the primitive narrow sense BCH code of length 15 over F16 with designed distance 5. So the defining set is {1, 2, 3, 4}. Then this is also the complete defining set. Take α with α^4 = 1 + α as primitive element. Then the generator polynomial is given by
(X − α)(X − α^2)(X − α^3)(X − α^4) = α^10 + α^3 X + α^6 X^2 + α^13 X^3 + X^4.
In all these cases of the previous two examples the minimum distance is equal to the BCH bound and equal to the weight of the generator polynomial. This is not always the case, as one sees in Exercise 7.3.8.
Example 7.3.15 Although BCH codes are a special case of codes defined
through roots as in Section 7.2.4, GAP and Magma have special functions for
constructing these. In GAP/GUAVA we proceed as follows.
> C:=BCHCode(17,3,GF(2));
a cyclic [17,9,3..5]3..4 BCH code, delta=3, b=1 over GF(2)
> DesignedDistance(C);
3
> MinimumDistance(C);
5
Syntax is BCHCode(n,delta,F), where n is the length, delta is δ in Defini-
tion 7.3.4, and F is the ground field. So here we constructed the narrow-
sense BCH code. One can give the parameter b explicitly, by the command
BCHCode(n,b,delta,F). The designed distance for a BCH code is printed in
its description, but can also be called separately as above. Note that code C
coincides with the code CR from Example 12.5.17.
In Magma we proceed as follows.
> C:=BCHCode(GF(2),17,3); // [17, 9, 5] BCH code (d = 3, b = 1)
// Linear Code over GF(2)
> a:=RootOfUnity(17,GF(2));
> C:=CyclicCode(17,[a^3],GF(2));
> BCHBound(C);
4
We can also provide b giving it as the last parameter in the BCHCode command.
Note that there is a possibility in Magma to compute the BCH bound as above.
7.3.2 Quadratic residue codes
***
7.3.3 Hamming, simplex and Golay codes as cyclic codes
Example 7.3.16 Consider an Fq-linear cyclic code of length n = (q^r − 1)/(q − 1) with defining set {1}. Let α be an element of order n in F_{q^r}^*. The minimum distance of the code is at least 2, by the BCH bound. If gcd(r, q − 1) = i > 1, then i divides n, since
n = (q^r − 1)/(q − 1) = q^{r−1} + · · · + q + 1 ≡ r mod (q − 1).
Let j = n/i. Let c0 = −α^j. Then c0 ∈ Fq^*, since j(q − 1) = n(q − 1)/i is a multiple of n. So c(x) = c0 + x^j is a codeword of weight 2 and the minimum distance is 2. Now consider the case with q = 3 and r = 2 in particular. Then α ∈ F9^* is an element of order 4 and c(x) = 1 + x^2 is a codeword of the ternary cyclic code of length 4 with defining set {1}. So this code has parameters [4,2,2].
Proposition 7.3.17 Let n = (q^r − 1)/(q − 1). If r is relatively prime with q − 1, then the Fq-linear cyclic code of length n with defining set {1} is a generalized [n, n − r, 3] Hamming code.
Proof. Let α be an element of order n in F_{q^r}^*. The minimum distance of the code is at least 2 by the BCH bound. Suppose there is a codeword c(x) of weight 2 with nonzero coefficients ci and cj with 0 ≤ i < j < n. Then c(α) = 0. So ciα^i + cjα^j = 0. Hence α^{j−i} = −ci/cj. Therefore α^{(j−i)(q−1)} = 1, since −ci/cj ∈ Fq^*. Now n|(j − i)(q − 1), but since gcd(n, q − 1) = gcd(r, q − 1) = 1 by assumption, it follows that n|j−i, which is a contradiction. Hence the minimum distance is at least 3. Therefore the parameters are [n, n − r, 3] and the code is equivalent with the Hamming code Hr(q) by Proposition 2.5.19.
Corollary 7.3.18 The simplex code Sr(q) is equivalent with a cyclic code if r
is relatively prime with q − 1.
Proof. The dual of a cyclic code is cyclic by Proposition 7.1.6 and a sim-
plex code is by definition the dual of a Hamming code. So the statement is a
consequence of Proposition 7.3.17.
Proposition 7.3.19 The binary cyclic code of length 23 with defining set {1}
is equivalent to the binary [23,12,7] Golay code.
Proof. ***
Proposition 7.3.20 The ternary cyclic code of length 11 with defining set {1}
is equivalent to the ternary [11,6,5] Golay code.
Proof. ***
*** Show that there are two generator polynomials of a ternary cyclic code of
length 11 with defining set {1}, depending on the choice of an element of order
11. Give the coefficients of these generator polynomials. ***
7.3.4 Exercises
7.3.1 Let C be the binary cyclic code of length 9 and defining set {0, 1}. Give
the BCH bound of this code.
7.3.2 Show that a nonzero binary cyclic code of length 11 has minimum dis-
tance 1, 2 or 11.
7.3.3 Choose the primitive element as in Exercise 7.2.9. Give the coefficients
of the generator polynomial of a cyclic H5(2) Hamming code and give a word
of weight 3.
7.3.4 Choose the primitive element as in Exercise 7.2.9. Consider the binary cyclic code C of length 31 and generator polynomial m0(X)m1(X)m3(X)m5(X). Show that C has dimension 15 and δBCH(C) = 8. Give a word of weight 8.
7.3.5 Determine δBCH(C) for all the binary cyclic codes C of length 17.
7.3.6 Show the existence of a binary cyclic code of length 127, dimension 64
and minimum distance at least 21.
7.3.7 Let C be the ternary cyclic code of length 13 with complete defining set
{1, 3, 4, 9, 10, 12}. Show that δBCH(C) = 5 and that it is the true minimum
distance.
7.3.8 Consider the binary code C of length 21 and defining set {1}.
1) Show that there are exactly two binary irreducible polynomials of degree 6 that have as zeros elements of order 21.
2) Show that the BCH bound and the minimum distance are both equal to 3.
3) Conclude that the minimum distance of a cyclic code is not always equal to the minimal weight of the generator polynomials of all equivalent cyclic codes.
7.4 Improvements of the BCH bound
***.....***
7.4.1 Hartmann-Tzeng bound
Proposition 7.4.1 Let C be a cyclic code of length n with defining set I. Let
U1 and U2 be two consecutive sets in Zn consisting of δ1 −1 and δ2 −1 elements,
respectively. Suppose that U1 + U2 ⊆ I. Then the minimum distance of C is at
least δ1 + δ2 − 2.
Proof. This is a special case of the forthcoming Theorem 7.4.19 and Proposi-
tion 7.4.20. ***direct proof***
Example 7.4.2 Consider the binary cyclic code C3 of length 17 and defining set {3} of Example 7.3.8. Then Proposition 7.4.1 applies with U1 = {5, 6, 7}, U2 = {0, 5}, δ1 = 4 and δ2 = 3. Hence the minimum distance of C3 is at least 5. The factorization of 1 + X^17 in F2[X] is given by
(1 + X)(1 + X^3 + X^4 + X^5 + X^8)(1 + X + X^2 + X^4 + X^6 + X^7 + X^8).
Let α be a zero of the second factor. Then α is an element of F_{2^8} of order 17. Hence m1(X) is the second factor and m3(X) is the third factor. Now 1 + x^3 + x^4 + x^5 + x^8 is a codeword of C1 of weight 5. Furthermore C1 and C3 are equivalent. Hence d(C3) = 5.
Definition 7.4.3 For a subset I of Zn, let δHT (I) be the largest number δ such
that there exist two nonempty consecutive sets U1 and U2 in Zn consisting of
δ1 − 1 and δ2 − 1 elements, respectively, with U1 + U2 ⊆ I and δ = δ1 + δ2 − 2.
Let C be a cyclic code of length n. Then δHT (Z(C)) is denoted by δHT (C).
Theorem 7.4.4 The Hartmann-Tzeng bound. Let I be the complete defining
set of a cyclic code C. Then the minimum distance of C is at least δHT (I).
Proof. This is a consequence of Definition 7.4.3 and Proposition 7.4.1.
Proposition 7.4.5 Let I be a subset of Zn. Then δHT (I) ≥ δBCH(I).
Proof. If we take U1 = U, U2 = {0}, δ1 = δ and δ2 = 2 in the HT bound, then
we get the BCH bound.
Remark 7.4.6 In computing δHT (I) one considers all a·I with gcd(a, n) = 1 as
in Remark 7.3.11. So we may assume that U1 is a consecutive set of period one.
Let S(U1) = {i ∈ Zn|i + U1 ⊆ I} be the shift set of U1. Then U1 + S(U1) ⊆ I.
Furthermore if U1 + U2 ⊆ I, then U2 ⊆ S(U1). Take a consecutive subset U2
of S(U1). This gives all desired pairs (U1, U2) of consecutive subsets in order to
compute δHT (I).
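This computation can be organized as in the following Python sketch (ours, not from the text), which enumerates consecutive sets U1 in I together with consecutive subsets U2 of the shift set S(U1).

from math import gcd

def runs(S, n):
    # all consecutive sets {b, b+a, ..., b+(t-1)a} with gcd(a,n)=1 inside S
    out = []
    for a in range(1, n):
        if gcd(a, n) != 1:
            continue
        for b in range(n):
            t = 0
            while t < n and (b + t * a) % n in S:
                t += 1
            if t > 0:
                out.append([(b + k * a) % n for k in range(t)])
    return out

def delta_HT(I, n):
    I, best = set(I), 1
    for R in runs(I, n):
        for t in range(1, len(R) + 1):         # every prefix of a run may serve as U1
            U1 = R[:t]
            S = {i for i in range(n) if all((i + u) % n in I for u in U1)}
            for U2 in runs(S, n):              # consecutive subsets of the shift set
                best = max(best, len(U1) + len(U2))
    return best

Z1 = {1, 2, 4, 8, 9, 13, 15, 16}               # Example 7.3.8
print(delta_HT(Z1, 17))                        # 5, see Example 7.4.7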
Example 7.4.7 Consider Example 7.4.2. Then U1 = {5, 6, 7} is a consecutive subset of period one of Z3 and U2 = S(U1) = {0, 5} is a consecutive subset of period five. And U1′ = {1, 2} is a consecutive subset of period one of Z1 and U2′ = S(U1′) = {0, 7, 14} is a consecutive subset of period seven. The choices of (U1, U2) and (U1′, U2′) are both optimal. Hence δHT(Z1) = 5.
Example 7.4.8 Let C be the binary cyclic code of length 21 and defining set
{1, 3, 7, 9}. Then
I = {−, 1, 2, 3, 4, −, 6, 7, 8, 9, −, 11, 12, −, 14, 15, 16, −, 18, −, −}
is the complete defining set of C. From this we conclude that δBCH(I) ≥ 5 and
δHT (I) ≥ 6. By considering 5 · I one concludes that in fact equalities hold. But
we show in Example 7.4.17 that the minimum distance of C is strictly larger
than 6.
7.4.2 Roos bound
The Roos bound is first formulated for arbitrary linear codes and afterwards
applied to cyclic codes.
Definition 7.4.9 Let a, b ∈ Fq^n. Define the star product a ∗ b by the coordinatewise multiplication:
a ∗ b = (a1b1, . . . , anbn).
Let A and B be subsets of Fq^n. Define
A ∗ B = { a ∗ b | a ∈ A, b ∈ B }.
Remark 7.4.10 If A and B are subsets of Fq^n, then
(A ∗ B)⊥ = { c ∈ Fq^n | (a ∗ b) · c = 0 for all a ∈ A, b ∈ B }
is a linear subspace of Fq^n. But if A and B are linear subspaces of Fq^n, then A ∗ B is not necessarily a linear subspace. See Example 9.1.3.
Consider the star product combined with the inner product. Then
(a ∗ b) · c = Σ_{i=1}^{n} aibici.
Hence a · (b ∗ c) = (a ∗ b) · c.
Proposition 7.4.11 Let C be an Fq-linear code of length n. Let (A, B) be a pair of Fqm-linear codes of length n such that C ⊆ (A ∗ B)⊥. Assume that A is not degenerate and k(A) + d(A) + d(B⊥) ≥ n + 3. Then d(C) ≥ k(A) + d(B⊥) − 1.
Proof. Let a = k(A) − 1 and b = d(B⊥) − 1. Let c be a nonzero element of C with support I. If |I| ≤ b, then take i ∈ I. There exists an a ∈ A such that ai ≠ 0, since A is not degenerate. So a ∗ c is not zero. Now (c ∗ a) · b = c · (a ∗ b) by Remark 7.4.10 and this is equal to zero for all b in B, since C ⊆ (A ∗ B)⊥. Hence a ∗ c is a nonzero element of B⊥ of weight at most b. This contradicts d(B⊥) > b. So b < |I|.
If |I| ≤ a + b, then we can choose index sets I− and I+ such that I− ⊆ I ⊆ I+ and I− has b elements and I+ has a + b elements. Recall from Definition 4.4.10 that A(I+ \ I−) is defined as the space {a ∈ A | ai = 0 for all i ∈ I+ \ I−}. Now k(A) > a and I+ \ I− has a elements. Hence A(I+ \ I−) is not zero. Let a be a nonzero element of A(I+ \ I−). The vector c ∗ a is an element of B⊥ and has support in I−. Furthermore |I−| = b < d(B⊥), hence a ∗ c = 0, so ai = 0 for all i ∈ I+. Therefore a is a nonzero element of A of weight at most n − |I+| = n − (a + b), which contradicts the assumption d(A) > n − (a + b). So |I| > a + b. Therefore d(C) ≥ a + b + 1 = k(A) + d(B⊥) − 1.
In order to apply this proposition to cyclic codes some preparations are needed.
Definition 7.4.12 Let U be a subset of Zn. Let α be an element of order n in Fqm^*. Let CU be the code over Fqm of length n generated by the elements (1, α^i, . . . , α^{i(n−1)}) for i ∈ U. Then U is called a generating set of CU. Let dU be the minimum distance of the code CU⊥.
Remark 7.4.13 Notice that CU and its dual are codes over Fqm. Every subset U of Zn is a complete defining set with respect to q^m, since n divides q^m − 1, so q^m·U = U. Furthermore CU has dimension |U|. The code CU is cyclic, since
σ(1, α^i, . . . , α^{i(n−1)}) = α^{−i}(1, α^i, . . . , α^{i(n−1)}).
U is the complete defining set of CU⊥. So dU ≥ δBCH(U). Beware that dU is by definition the minimum distance of CU⊥ over Fqm and not of the cyclic code over Fq with defining set U.
Remark 7.4.14 Let U and V be subsets of Zn. Let w ∈ U + V. Then w = u + v with u ∈ U and v ∈ V. So
(1, α^w, . . . , α^{w(n−1)}) = (1, α^u, . . . , α^{u(n−1)}) ∗ (1, α^v, . . . , α^{v(n−1)}).
Hence
CU ∗ CV ⊆ C_{U+V}.
Therefore C ⊆ (CU ∗ CV)⊥ if C is a cyclic code with U + V in its defining set.
Remark 7.4.15 Let U be a subset of Zn. Let Ū be a consecutive set containing U. Then U is the complete defining set of CU⊥. Hence Zn \ {−i | i ∈ U} is the complete defining set of CU by Proposition 7.2.58. Then Zn \ {−i | i ∈ Ū} is a consecutive set of size n − |Ū| that is contained in the defining set of CU. Hence the minimum distance of CU is at least n − |Ū| + 1 by the BCH bound.
Proposition 7.4.16 Let U be a nonempty subset of Zn that is contained in the consecutive set Ū. Let V be a subset of Zn such that |Ū| ≤ |U| + dV − 2. Let C be a cyclic code of length n such that U + V is in the set of zeros of C. Then the minimum distance of C is at least |U| + dV − 1.
Proof. Let A and B be the cyclic codes with generating sets U and V, respectively. Then A has dimension |U| by Remark 7.4.13 and its minimum distance is at least n − |Ū| + 1 by Remark 7.4.15. A generating matrix of A has no zero column, since otherwise A would be zero, since A is cyclic; but A is not zero, since U is not empty. So A is not degenerate. Moreover d(B⊥) = dV, by Definition 7.4.12. Hence k(A) + d(A) + d(B⊥) ≥ |U| + (n − |Ū| + 1) + dV, which is at least n + 3, since |Ū| ≤ |U| + dV − 2. Finally C ⊆ (A ∗ B)⊥ by Remark 7.4.14. Therefore all assumptions of Proposition 7.4.11 are fulfilled. Hence d(C) ≥ k(A) + d(B⊥) − 1 = |U| + dV − 1.
Example 7.4.17 Let C be the binary cyclic code of Example 7.4.8. Let U = 4 · {0, 1, 3, 5} and V = {2, 3, 4}. Then Ū = 4 · {0, 1, 2, 3, 4, 5} is a consecutive set and dV = 4. By inspection of the table

 +   0   4  12  20
 2   2   6  14   1
 3   3   7  15   2
 4   4   8  16   3

we see that U + V is contained in the complete defining set of C. Furthermore |Ū| = 6 = |U| + dV − 2. Hence d(C) ≥ 7 by Proposition 7.4.16. The alternative choice with U = 4 · {0, 1, 2, 3, 5, 6}, Ū = 4 · {0, 1, 2, 3, 4, 5, 6} and V = {3, 4} gives d(C) ≥ 8 by the Roos bound. This in fact is the true minimum distance.
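The containments used in this example are mechanical to check; the following Python sketch (ours, not from the text) verifies U + V ⊆ Z(C) for the first choice above.

n, q = 21, 2

def complete_defining_set(D, n, q):
    Z = set()
    for i in D:                    # close each element under multiplication by q
        while i % n not in Z:
            Z.add(i % n)
            i *= q
    return Z

Z = complete_defining_set({1, 3, 7, 9}, n, q)        # code of Example 7.4.8
U = {4 * u % n for u in (0, 1, 3, 5)}                # {0, 4, 12, 20}
V = {2, 3, 4}
print(all((u + v) % n in Z for u in U for v in V))   # True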
Definition 7.4.18 Let I be a subset of Zn. Denote by δR(I) the largest number δ such that there exist nonempty subsets U and V of Zn and a consecutive set Ū with U ⊆ Ū, U + V ⊆ I and |Ū| ≤ |U| + dV − 2 = δ − 1.
Let C be a cyclic code of length n. Then δR(Z(C)) is denoted by δR(C).
Theorem 7.4.19 The Roos bound. The minimum distance of a cyclic code C
is at least δR(C).
Proof. This is a consequence of Proposition 7.4.16 and Definition 7.4.18.
Proposition 7.4.20 Let I be a subset of Zn. Then δR(I) ≥ δHT(I).
Proof. Let U1 and U2 be nonempty consecutive subsets of Zn of sizes δ1 − 1 and δ2 − 1, respectively, with U1 + U2 ⊆ I. Let U = Ū = U1 and V = U2. Now dV ≥ δ2 ≥ 2, since V is not empty. Hence |Ū| ≤ |U| + dV − 2. Applying Proposition 7.4.16 gives δR(I) ≥ |U| + dV − 1 ≥ δ1 + δ2 − 2. Hence δR(I) ≥ δHT(I).
Example 7.4.21 Examples 7.4.8 and 7.4.17 give a subset I of Z21 such that δBCH(I) < δHT(I) < δR(I).
7.4.3 AB bound
Remark 7.4.22 In Section 3.1.2 we defined for every subset I of {1, . . . , n} the projection map πI : Fq^n → Fq^t by πI(x) = (x_{i1}, . . . , x_{it}), where I = {i1, . . . , it} and 1 ≤ i1 < . . . < it ≤ n. We denote the image of A under πI by AI and the kernel of πI by A(I), that is A(I) = {a ∈ A | ai = 0 for all i ∈ I}. Suppose that dim A = k and |I| = t. If t < d(A⊥), then dim A(I) = k − t by Lemma 4.4.13, and therefore dim AI = t.
The following proposition is known for cyclic codes as the AB or the van Lint-
Wilson bound.
Proposition 7.4.23 Let A, B and C be linear codes of length n over Fq such that (A ∗ B) ⊥ C and d(A⊥) > a > 0 and d(B⊥) > b > 0. Then d(C) ≥ a + b.
Proof. Let c be a nonzero codeword in C with support I, that is to say I = {i | ci ≠ 0}. Let t = |I|. Without loss of generality we may assume that a ≤ b. We have that
dim(AI) + dim(BI) ≥ 2t if t ≤ a, ≥ a + t if a < t ≤ b, and ≥ a + b if b < t,
by Remark 7.4.22. But (A ∗ B) ⊥ C, so (c ∗ A)I ⊥ BI. Moreover
dim((c ∗ A)I) = dim(AI),
since ci ≠ 0 for all i ∈ I. Therefore
dim(AI) + dim(BI) ≤ |I| = t.
This is only possible in case t ≥ a + b. Hence d(C) ≥ a + b.
Example 7.4.24 Consider the binary cyclic code of length 21 and defining set
{0, 1, 3, 7}. Then the complete defining set of this code is given by
I = {0, 1, 2, 3, 4, −, 6, 7, 8, −, −, 11, 12, −, 14, −, 16, −, −, −, −}.
We leave it as an exercise to show that δBCH(I) = δHT (I) = δR(I) = 6.
Application of the AB bound to U = {1, 2, 3, 6} and V = {0, 1, 5} gives that the
minimum distance is at least 7. The minimum distance is at least 8, since it is
an even weight code.
Remark 7.4.25 Let C be an Fq-linear code of length n. Let (A, B) be a pair of Fqm-linear codes of length n. Let a = k(A) − 1 and b = d(B⊥) − 1. Then one can restate the conditions of Proposition 7.4.11 as follows: If (1) (A ∗ B) ⊥ C, (2) k(A) > a, (3) d(B⊥) > b, (4) d(A) + a + b > n and (5) d(A⊥) > 1, then d(C) ≥ a + b + 1.
The original proof given by Van Lint and Wilson of the Roos bound is as follows. Let A be a generator matrix of A. Let AI be the submatrix of A consisting of the columns indexed by I. Then rank(AI) = dim(AI). Condition (5) implies that A has no zero column, so rank(AI) ≥ 1 for all I with at least one element. Let I be an index set such that |I| ≤ a + b, then any two words of A differ in at least one place of I, since d(A) > n − (a + b) ≥ n − |I|, by Condition (4). So A and AI have the same number of codewords, so rank(AI) = k(A) ≥ a + 1. Hence for any I such that b < |I| ≤ a + b we have that rank(AI) ≥ |I| − b + 1. Let B be a generator matrix of B. Then Condition (3) implies:
rank(BI) = |I| if |I| ≤ b, and rank(BI) ≥ b if |I| > b,
by Remark 7.4.22. Therefore
rank(AI) + rank(BI) > |I| for |I| ≤ a + b.
Now let c be a nonzero element of C with support I, then rank(AI) + rank(BI) ≤ |I|, as we have seen in the proof of Proposition 7.4.23. Hence |I| > a + b, so d(C) > a + b.
Example 7.4.26 In this example we show that the assumption that A is non-degenerate is necessary. Let A, B⊥ and C be the binary codes with generating matrices (011), (111) and (100), respectively. Then A ∗ C ⊆ B⊥ and k(A) = 1, d(A) = 2, n = 3 and d(B⊥) = 3, so k(A) + d(A) + d(B⊥) = 6 = n + 3, but d(C) = 1.
7.4.4 Shift bound
***....***
Definition 7.4.27 Let I be a subset of Zn. A subset A of Zn is called inde-
pendent with respect to I if it can be obtained by the following rules:
(I.1) the empty set is independent with respect to I.
(I.2) if A is independent with respect to I and A is a subset of I and b ∈ Zn is
not an element of I, then A ∪ {b} is independent with respect to I.
(I.3) if A is independent with respect to I and c ∈ Zn, then c+A is independent
with respect to I, where c + A = {c + a | a ∈ A}.
Remark 7.4.28 The name "shifting" refers to condition (I.3). A set A is independent with respect to I if and only if there exists a sequence of sets A1, . . . , Aw and elements a1, . . . , a_{w−1} and b0, b1, . . . , b_{w−1} in Zn such that A1 = {b0} and A = Aw and furthermore
A_{i+1} = (ai + Ai) ∪ {bi}, where ai + Ai is a subset of I and bi is not an element of I.
Then
Ai = { b_{l−1} + Σ_{j=l}^{i−1} aj | l = 1, . . . , i },
and all Ai are independent with respect to I.
Let i1, i2, . . . , iw and j1, j2, . . . , jw be new sequences which are obtained from the sequences a1, . . . , a_{w−1} and b0, b1, . . . , b_{w−1} by:
iw = 0, i_{w−1} = a1, . . . , i_{w−k} = a1 + · · · + ak and jk = b_{k−1} − i_{w−k+1}.
These data can be given in the following table

                    j1         j2      j3    . . .  j_{w−1}    jw       +
 a_{w−1}  Aw        i1+j1      i1+j2   i1+j3 . . .  i1+j_{w−1} b_{w−1}  i1
 a_{w−2}  A_{w−1}   i2+j1      i2+j2   i2+j3 . . .  b_{w−2}             i2
 ...
 a2       A3        a1+a2+b0   a2+b1   b2                               i_{w−2}
 a1       A2        a1+b0      b1                                       i_{w−1}
          A1        b0                                                  iw

with the elements of At as rows in the middle part. The enumeration of the At is from the bottom to the top, and the bi are on the diagonal. In the first row and the last column the jl and the ik are tabulated, respectively. The sum ik + jl is given in the middle part.
By this transformation it is easy to see that a set A is independent with respect to I if and only if there exist sequences i1, i2, . . . , iw and j1, j2, . . . , jw such that
A = {i1 + jl | 1 ≤ l ≤ w}, ik + jl ∈ I for all k + l ≤ w, and ik + jl ∉ I for all k + l = w + 1.
So the entries in the table above the diagonal are elements of I, and the entries on the diagonal are not in I.
Notice that in this formulation we did not assume that the sets
{ik | 1 ≤ k ≤ w}, {jl | 1 ≤ l ≤ w}
and A have size w, since this is a consequence of this definition. If for instance ik = i_{k′} for some 1 ≤ k < k′ ≤ w, then ik + j_{w+1−k′} ∈ I since k + (w + 1 − k′) ≤ w, but ik + j_{w+1−k′} = i_{k′} + j_{w+1−k′} ∉ I, which is a contradiction.
Definition 7.4.29 For a subset Z of Zn, let µ(Z) be the maximal size of a set which is independent with respect to Z. Define the shift bound for a subset I of Zn as follows:
δS(I) = min{ µ(Z) | I ⊆ Z ⊆ Zn, Z ≠ Zn and Z a complete defining set }.
Theorem 7.4.30 The minimum distance of C(I) is at least δS(I).
The proof of this theorem will be given at the end of this section.
Proposition 7.4.31 The following inequality holds:
δS(I) ≥ δHT(I).
Proof. There exist δ, s, i and a such that gcd(a, n) = 1 and δHT(I) = δ + s and
{i + j + ka | 1 ≤ j < δ, 0 ≤ k ≤ s} ⊆ I.
Suppose Z is a complete defining set which contains I and is not equal to Zn. Then there exists a δ′ ≥ δ such that i + j ∈ Z for all 1 ≤ j < δ′ and i + δ′ ∉ Z. The set {i + j + ka | 1 ≤ j < δ′, k ∈ Zn} is equal to Zn, since gcd(a, n) = 1. So there exist s′ ≥ s and j′ such that i + j + ka ∈ Z for all 1 ≤ j < δ′ and 0 ≤ k ≤ s′, and 1 ≤ j′ < δ′ and i + j′ + (s′ + 1)a ∉ Z. Let w = δ′ + s′. Let ik = (k − 1)a for all 1 ≤ k ≤ s′ + 1, and ik = k − s′ − 1 for all k such that s′ + 2 ≤ k ≤ δ′ + s′. Let jl = i + l for all 1 ≤ l ≤ δ′ − 1, and let jl = i + j′ + (l − δ′ + 1)a for all l such that δ′ ≤ l ≤ δ′ + s′. Then one easily checks that ik + jl ∈ Z for all k + l ≤ w, and ik + j_{w−k+1} = i + j′ + (s′ + 1)a ∉ Z for all 1 ≤ k ≤ s′ + 1, and ik + j_{w−k+1} = i + δ′ ∉ Z for all s′ + 2 ≤ k ≤ δ′ + s′. So we have a set which is independent with respect to Z and has size w = δ′ + s′ ≥ δ + s. Hence µ(Z) ≥ δ + s for all complete defining sets Z which contain I and are not equal to Zn. Therefore δS(I) ≥ δHT(I).
Example 7.4.32 The binary Golay code of length 23 can be defined as the cyclic code with defining set {1}, see Proposition 7.3.19. In this example we show that the shift bound is strictly greater than the HT bound and is still not equal to the minimum distance. Let Zi be the cyclotomic coset of i. Then
Z0 = {0},
Z1 = {−, 1, 2, 3, 4, −, 6, −, 8, 9, −, −, 12, 13, −, −, 16, −, 18, −, −, −, −},
and
Z5 = {−, −, −, −, −, 5, −, 7, −, −, 10, 11, −, −, 14, 15, −, 17, −, 19, 20, 21, 22}.
Then δBCH(Z1) = δHT(Z1) = 5.
Let (a1, . . . , a5) = (−1, −3, 7, 4, 13) and (b0, . . . , b5) = (5, 5, 5, 14, 5, 5). Then the A_{t+1} = (At + at) ∪ {bt} are given in the rows of the middle part of the following table

 at    A_{t+1}                            i_{t+1}
             5   6   9  11  −2   8
 13    A6    2   3   6   8  18   5        −3
  4    A5   12  13  16  18   5             7
  7    A4    8   9  12  14                 3
 −3    A3    1   2   5                    −4
 −1    A2    4   5                        −1
       A1    5                             0

with the at in the first column and the bt on the diagonal. The corresponding sequence (i1, . . . , i6) = (−3, 7, 3, −4, −1, 0) is given in the last column of the table and (j1, . . . , j6) = (5, 6, 9, 11, −2, 8) in the top row. So Z1 has an independent set of size 6. In fact this is the maximal size of an independent set of Z1. Hence µ(Z1) = 6. The defining sets Z0, Z1 and Z5 and their unions are complete, and these are the only ones. Let Z0,1 = Z0 ∪ Z1. Then Z0,1 has an independent set of size 7, since A6 is independent with respect to Z1 and also with respect to Z0,1, and −2 + A6 = {0, 1, 4, 6, 16, 3} is a subset of Z0,1 and 5 ∉ Z0,1, so A7 = {0, 1, 4, 6, 16, 3, 5} is independent with respect to Z0,1. Furthermore Z1,5 = Z1 ∪ Z5 contains a sequence of 22 consecutive elements, so µ(Z1,5) ≥ 23. Therefore δS(Z1) = 6. But the minimum distance of the binary Golay code is 7, since otherwise there would be a word c ∈ C(Z1) of weight 6, so c ∈ C(Z0,1), but δS(Z0,1) ≥ 7, which is a contradiction.
Example 7.4.33 Let n = 26, F = F27, and F0 = F3. Let 0, 13, 14, 16, 17, 22, 23 and 25 be the elements of I. Let U = {0, 3, 9, 12} and V = {13, 14}. Then dV = 3 and Ū = {0, 3, 6, 9, 12}, so |Ū| = 5 ≤ 4 + 3 − 2. Moreover I contains U + V. Hence δR(I) ≥ 4 + 3 − 1 = 6, but in fact δS(I) = 5.
Example 7.4.34 ***Example of δR(I)  δS(I).***
Example 7.4.35 It is necessary to take the minimum of all µ(Z) in the definition of the shift bound. The maximal size of an independent set with respect to a complete defining set I is not a lower bound for the minimum distance of the cyclic code with I as defining set, as the following example shows. Let F be a finite field of odd characteristic. Let α be a non-zero element of F of even order n. Let I = {2, 4, . . . , n − 2} and I′ = {0, 2, 4, . . . , n − 2}. Then I and I′ are complete and µ(I) = 3, since {2, 0, 1} is independent with respect to I, but µ(I′) = 2.
***Picture of interrelations of the several bounds.
One way to get a bound on the weight of a codeword c = (c0, . . . , c_{n−1}) is obtained by looking for a maximal non-singular square submatrix of the matrix of syndromes (Sij). For cyclic codes we get in this way a matrix, with entries
Sij = Σ_k ck α^{k(i+j)},
which is constant along back-diagonals.
Suppose gcd(n, q) = 1. Then there is a field extension Fqm of Fq such that Fqm^* has an element α of order n. Let ai = (1, α^i, . . . , α^{i(n−1)}). Then { ai | i ∈ Zn } is a basis of Fqm^n.
Consider the following generalization of the definition of a syndrome in Definition 6.2.2.
Definition 7.4.36 The syndrome of a word y ∈ F0^n with respect to ai and aj is defined by
Si,j(y) = y · (ai ∗ aj).
Let S(y) be the syndrome matrix with entries Si,j(y).
Notice that ai ∗ aj = a_{i+j} for all i, j ∈ Zn. Hence Si,j = S_{i+j}.
Lemma 7.4.37 Let y ∈ F0^n. Let I = { i + j | i, j ∈ Zn and y · (ai ∗ aj) = 0 }. If A is independent with respect to I, then wt(y) ≥ |A|.
Proof. Suppose A is independent with respect to I and has w elements. Then there exist sequences i1, . . . , iw and j1, . . . , jw such that A consists of the elements i1 + j1, i1 + j2, . . . , i1 + jw, and ik + jl ∈ I for all k + l ≤ w and ik + jl ∉ I for all k + l = w + 1. Consider the (w × w) matrix M with entries Mk,l = S_{ik,jl}(y). By the assumptions we have that M is a matrix such that Mk,l = 0 for all k + l ≤ w and Mk,l ≠ 0 for all k + l = w + 1, that is to say with zeros above the back-diagonal and non-zeros on the back-diagonal, so M has rank w. Moreover M is a submatrix of the matrix S(y) which can be written as a product:
S(y) = H D(y) H^T,
where H is the matrix with the ai as row vectors, and D(y) is the diagonal matrix with the entries of y on the diagonal. Now the rank of H is n, since a0, . . . , a_{n−1} is a basis of Fqm^n. Hence
|A| = w = rank(M) ≤ rank(S(y)) ≤ rank(D(y)) = wt(y).
Remark 7.4.38 Let Ci be a code with Zi as defining set for i = 1, 2. If
Z1 ⊆ Z2, then C2 ⊆ C1.
Lemma 7.4.39 Let I be a complete defining set for the cyclic code C. If y ∈ C and y ∉ D for all cyclic codes D with complete defining sets Z which contain I and are not equal to I, then wt(y) ≥ µ(I).
Proof. Define
Z = {i + j | i, j ∈ Zn, y · (ai ∗ aj) = 0}.
***Then Z is a complete defining set. ***
Clearly I ⊆ Z, since y ∈ C and I is a defining set of C. Let D be the code with defining set Z. Then y ∈ D. If I ≠ Z, then y ∉ D by the assumption, which is a contradiction. Hence I = Z, and wt(y) ≥ µ(I) by Lemma 7.4.37.
Proof (of Theorem 7.4.30). Let y be a non-zero codeword of C. Let Z be equal to {i + j | i, j ∈ Zn, y · (ai ∗ aj) = 0}. Then Z ≠ Zn, since y is not zero and the ai's generate Fqm^n. The theorem now follows from Lemma 7.4.39 and the definition of the shift bound.
Remark 7.4.40 The computation of the shift bound is quite involved, and is only feasible with the use of a computer. It makes sense if one classifies codes with respect to the minimum distance, since in order to get δS(I) one gets at the same time the δS(J) for all J with I ⊆ J.
7.4.5 Exercises
7.4.1 Consider the binary cyclic code of length 15 and defining set {3, 5}.
Compute the complete defining set I of this code. Show that δBCH(I) = 3 and
δHT (I) = 4 is the true minimum distance.
7.4.2 Consider the binary cyclic code of length 35 and defining set {1, 5, 7}.
Compute the complete defining set I of this code. Show that δBCH(I) =
δHT (I) = 6 and δR(I) ≥ 7.
7.4.3 Let m be odd and n = 2^m − 1. Melas's code is the binary cyclic code of length n and defining set {1, −1}. Show that this code is reversible, has dimension k = n − 2m and that the minimum distance is at least five.
7.4.4 Let −1 be a power of q modulo n. Then every cyclic code over Fq of
length n is reversible.
7.4.5 Let n = 2^{2m} + 1 with m > 1. Zetterberg's code is the binary cyclic code of length n and defining set {1}. Show that this code is reversible, has dimension k = n − 4m and that the minimum distance is at least five.
7.4.6 Consider the ternary cyclic code of length 11 and defining set {1}. Compute the complete defining set I of this code. Show that δBCH(I) = δHT(I) = δS(I) = 4. Let I′ = {0} ∪ I. Show that δBCH(I′) = δHT(I′) = 4 and δS(I′) ≥ 5.
7.4.7 Let q be a power of a prime and n a positive integer such that gcd(n, q) = 1. Write a computer program that computes the complete defining set Z modulo n with respect to q and the bounds δBCH(Z), δHT(Z), δR(Z) and δS(Z) for a given defining set I in Zn.
7.5 Locator polynomials and decoding cyclic codes
***
7.5.1 Mattson-Solomon polynomial
Definition 7.5.1 Let α ∈ Fqm^* be a primitive n-th root of unity. The Mattson-Solomon (MS) polynomial A(Z) of
a(x) = a0 + a1x + · · · + a_{n−1}x^{n−1}
is defined by
A(Z) = Σ_{i=1}^{n} Ai Z^{n−i}, where Ai = a(α^i) ∈ Fqm.
Here too we adopt the convention that the index i is computed modulo n. The MS polynomial A(Z) is the discrete Fourier transform of a(x). In order to compute the inverse discrete Fourier transform, that is the coefficients of a(x) in terms of the A(Z), we need the following lemma on the sum of a geometric sequence.
Lemma 7.5.2 Let β ∈ Fqm be a zero of X^n − 1. Then
Σ_{i=1}^{n} β^i = n if β = 1, and Σ_{i=1}^{n} β^i = 0 if β ≠ 1.
Proof. If β = 1, then Σ_{i=1}^{n} β^i = n. If β ≠ 1, then using the formula for the sum of a geometric series Σ_{i=1}^{n} β^i = (β^{n+1} − β)/(β − 1) and β^{n+1} = β gives the desired result.
Proposition 7.5.3
1) The inverse transform is given by ai = (1/n) A(α^i).
2) A(Z) is the MS polynomial of a word a(x) coming from Fq^n if and only if A_{jq} = Aj^q for all j = 1, . . . , n.
3) A(Z) is the MS polynomial of a codeword a(x) of the cyclic code C if and only if Aj = 0 for all j ∈ Z(C) and A_{jq} = Aj^q for all j = 1, . . . , n.
Proof.
1) Expanding A(α^i) and using the definitions gives
A(α^i) = Σ_{j=1}^{n} Aj α^{i(n−j)} = Σ_{j=1}^{n} a(α^j) α^{i(n−j)} = Σ_{j=1}^{n} Σ_{k=0}^{n−1} ak α^{jk} α^{i(n−j)}.
Using α^n = 1, interchanging the order of summation and using Lemma 7.5.2 with β = α^{k−i} gives
Σ_{k=0}^{n−1} ak Σ_{j=1}^{n} α^{(k−i)j} = n ai.
2) If A(Z) is the MS polynomial of a(x), then using Proposition 7.2.40 gives
Aj^q = a(α^j)^q = a(α^{qj}) = A_{qj},
since the coefficients of a(x) are in Fq.
Conversely, suppose that A_{jq} = Aj^q for all j = 1, . . . , n. Then using (1) gives
ai^q = ((1/n) A(α^i))^q = (1/n) Σ_{j=1}^{n} Aj^q α^{qi(n−j)} = (1/n) Σ_{j=1}^{n} A_{qj} α^{qi(n−j)}.
Using the fact that multiplication with q is a permutation of Zn gives that the above sum is equal to
(1/n) Σ_{j=1}^{n} Aj α^{i(n−j)} = ai.
Hence ai^q = ai and ai ∈ Fq for all i. Therefore a(x) is coming from Fq^n.
3) Aj = 0 if and only if a(α^j) = 0 by (1). Together with (2) and the definition of Z(C) this gives the desired result.
Another proof of the BCH bound can be obtained with the Mattson-Solomon
polynomial.
Proposition 7.5.4 Let C be a narrow sense BCH code with designed minimum distance δ. If A(Z) is the MS polynomial of a nonzero codeword a(x) of C, then the degree of A(Z) is at most n − δ and the weight of a(x) is at least δ.
Proof. Let a(x) be a nonzero codeword of C. Let A(Z) be the MS polynomial of a(x). Then Ai = a(α^i) = 0 for all i = 1, . . . , δ − 1. So the degree of A(Z) is at most n − δ. We have that ai = A(α^i)/n by (1) of Proposition 7.5.3. The number of zero coefficients of a(x) is at most the number of zeros of A(Z) in Fqm, which is at most n − δ. Hence the weight of a(x) is at least δ.
Example 7.5.5 Let a(x) = 6 + x + 3x^2 + x^3 be a codeword of the cyclic code of length 6 over F7 of Example 7.1.24. Choose α = 3 as primitive element. Then A(Z) = 4 + Z + 3Z^2 is the MS polynomial of a(x).
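This example is small enough to redo by machine; the following Python sketch (ours, not from the text) computes the MS polynomial over F7 and checks the inverse transform of Proposition 7.5.3.

p, n, alpha = 7, 6, 3
a = [6, 1, 3, 1, 0, 0]                       # a(x) = 6 + x + 3x^2 + x^3

def ev(coeffs, t):                           # evaluate a polynomial at t modulo p
    return sum(c * pow(t, k, p) for k, c in enumerate(coeffs)) % p

AZ = [0] * n                                 # coefficients of A(Z)
for i in range(1, n + 1):
    AZ[n - i] = ev(a, pow(alpha, i, p))      # A_i = a(alpha^i) is the coefficient of Z^(n-i)
print(AZ)                                    # [4, 1, 3, 0, 0, 0]: A(Z) = 4 + Z + 3Z^2

ninv = pow(n, p - 2, p)                      # 1/n in F_7
print([ninv * ev(AZ, pow(alpha, i, p)) % p for i in range(n)])   # recovers a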
7.5.2 Newton identities
Definition 7.5.6 Let a(x) be a word of weight w. Then there are indices 0 ≤ i1 < · · · < iw < n such that
a(x) = a_{i1} x^{i1} + · · · + a_{iw} x^{iw}
with a_{ij} ≠ 0 for all j. Let xj = α^{ij} and yj = a_{ij}. Then the xj are called the locators and the yj the corresponding values. Furthermore
Ai = a(α^i) = Σ_{j=1}^{w} yj xj^i.
Consider the product
σ(Z) = Π_{j=1}^{w} (1 − xjZ).
Then σ(Z) has as zeros the reciprocals of the locators, and is sometimes called the locator polynomial. Sometimes this name is reserved for the monic polynomial that has the locators as zeros.
Proposition 7.5.7 Let σ(Z) = Σ_{i=0}^{w} σi Z^i be the locator polynomial of the locators x1, . . . , xw. Then σi is the i-th elementary symmetric function in these locators:
σt = (−1)^t Σ_{1≤j1<j2<···<jt≤w} x_{j1} x_{j2} · · · x_{jt}.
Proof. This is proved by induction on w and is left to the reader as an exercise.
The following property of the MS polynomial is called the generalized Newton
identity and gives the reason for these definitions.
Proposition 7.5.8 For all i it holds that
A_{i+w} + σ1 A_{i+w−1} + · · · + σw Ai = 0.
Proof. Substitute Z = 1/xj in the equation
1 + σ1Z + · · · + σwZ^w = Π_{j=1}^{w} (1 − xjZ)
and multiply by yj xj^{i+w}. This gives
yj xj^{i+w} + σ1 yj xj^{i+w−1} + · · · + σw yj xj^i = 0.
Summing on j = 1, . . . , w yields the desired result of Proposition 7.5.8.
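As a check, the following Python sketch (ours, not from the text) verifies these identities over F7 for the word a(x) = 6 + x + 3x^2 + x^3 of Example 7.5.5, which has weight w = 4 with locators 1, 3, 2, 6 (the powers α^0, . . . , α^3 at the support positions 0, 1, 2, 3) and values 6, 1, 3, 1.

p, n = 7, 6
locs = [1, 3, 2, 6]                            # locators
vals = [6, 1, 3, 1]                            # corresponding values
w = len(locs)

sigma = [1]                                    # sigma(Z) = prod (1 - x_j Z)
for x in locs:
    new = sigma + [0]
    for k in range(len(new) - 1, 0, -1):
        new[k] = (new[k] - x * sigma[k - 1]) % p
    sigma = new

def A(i):                                      # A_i = sum y_j x_j^i
    return sum(y * pow(x, i, p) for x, y in zip(locs, vals)) % p

print(all(sum(sigma[k] * A(i + w - k) for k in range(w + 1)) % p == 0
          for i in range(n)))                  # True: the Newton identities hold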
Example 7.5.9 Let C be the cyclic code of length 5 over F16 with defining set {1, 2}. Then this defining set is complete. The polynomial
X^4 + X^3 + X^2 + X + 1
is irreducible over F2. Let β be a zero of this polynomial in F16. Then the order of β is 5. The generator polynomial of C is
(X + β)(X + β^2) = X^2 + (β + β^2)X + β^3.
So (β^3, β + β^2, 1, 0, 0) ∈ C and
(β + β^2 + β^3, 1 + β, 0, 1, 0) = (β + β^2)(β^3, β + β^2, 1, 0, 0) + (0, β^3, β + β^2, 1, 0)
is an element of C. These codewords together with their cyclic shifts and their nonzero scalar multiples give (5 + 5) · 15 = 150 words of weight 3. In fact these are the only codewords of weight 3, since it is a [5, 3, 3] MDS code and A3 = (5 choose 3)·(16 − 1) by Remark 3.2.15. Propositions 7.5.3 and 7.5.8 give another way to prove this. Consider the set of equations:
A4 + σ1A3 + σ2A2 + σ3A1 = 0
A5 + σ1A4 + σ2A3 + σ3A2 = 0
A1 + σ1A5 + σ2A4 + σ3A3 = 0
A2 + σ1A1 + σ2A5 + σ3A4 = 0
A3 + σ1A2 + σ2A1 + σ3A5 = 0
If A1, A2, A3, A4 and A5 are the coefficients of the MS polynomial of a codeword, then A1 = A2 = 0. If A3 = 0, then Ai = 0 for all i. So we may assume that A3 ≠ 0. The above equations imply A4 = σ1A3, A5 = (σ1^2 + σ2)A3 and
σ1^3 + σ3 = 0
σ1^2σ2 + σ2^2 + σ1σ3 = 0
σ1^2σ3 + σ2σ3 + 1 = 0.
Substitution of σ3 = σ1^3 in the remaining equations yields
σ1^4 + σ1^2σ2 + σ2^2 = 0
σ1^5 + σ1^3σ2 + 1 = 0.
Multiplying the first equation with σ1 and adding it to the second one gives
1 + σ1σ2^2 = 0.
Thus σ1 = σ2^{−2} and
σ2^{10} + σ2^5 + 1 = 0.
This last equation has 10 solutions in F16, and we are free to choose A3 from F16^*. This gives in total 150 solutions.
7.5.3 APGZ algorithm
Let C be a cyclic code of length n such that the minimum distance of C is at least δ by the BCH bound. In this section we will give a decoding algorithm for such a code which has an efficient implementation and is used in practice. This algorithm corrects errors of weight at most (δ−1)/2, whereas the true minimum distance can be larger than δ.
The notion of a syndrome was already given in the context of arbitrary codes in Definition 6.2.2. Let α be a primitive n-th root of unity. Let C be a cyclic code of length n with 1, . . . , δ − 1 in its complete defining set. Let hi = (1, α^i, . . . , α^{i(n−1)}). Consider C as the subfield subcode of the code with parity check matrix H̃ with rows hi for i ∈ Z(C) as in Remark 7.3.2. Let c = (c0, . . . , c_{n−1}) ∈ C be the transmitted word, so c(x) = c0 + · · · + c_{n−1}x^{n−1}. Let r be the received word with w errors and w ≤ (δ−1)/2. So r(x) = c(x) + e(x) and wt(e(x)) = w. The syndrome Si of r(x) with respect to the row hi is equal to
Si = r(α^i) = e(α^i) for i ∈ Z(C),
since c(α^i) = 0 for all i ∈ Z(C). The syndrome of r is s = r H̃^T. Hence si = Si for all i ∈ Z(C) and these are also called the known syndromes, since the receiver knows Si for all i ∈ Z(C). The unknown syndromes are defined by Si = e(α^i) for i ∉ Z(C).
Let A(Z) be the MS polynomial of e(x). Then
Si = r(α^i) = e(α^i) = Ai for i ∈ Z(C).
The receiver knows all S1, S2, . . . , S_{2w}, since {1, 2, . . . , δ − 1} ⊆ Z(C) and 2w ≤ δ − 1.
Let σ(Z) be the error-locator polynomial, that is the locator polynomial
σ(Z) = Π_{j=1}^{w} (1 − xjZ)
of the error positions
{x1, . . . , xw} = { α^i | ei ≠ 0 }.
Let σi be the i-th coefficient of σ(Z) and form the following set of generalized Newton identities of Proposition 7.5.8 with Si = Ai:
S_{w+1} + σ1Sw + · · · + σwS1 = 0
S_{w+2} + σ1S_{w+1} + · · · + σwS2 = 0
...
S_{2w} + σ1S_{2w−1} + · · · + σwSw = 0.
(7.1)
The algorithm of Arimoto-Peterson-Gorenstein-Zierler (APGZ) solves this system of linear equations in the variables σj by Gaussian elimination. The fact that this system has a unique solution is guaranteed by the following.
Proposition 7.5.10 The matrix (S_{i+j−1} | 1 ≤ i, j ≤ v) is nonsingular if and only if v = w, the number of errors.
Proof. ***(S_{i+j−1}) = H D(e) H^T as in the proof of Lemma 7.4.37***
After the system of linear equations is solved, we know the error-locator polynomial
σ(Z) = 1 + σ1Z + σ2Z^2 + · · · + σwZ^w,
which has as its zeros the reciprocals of the error locations. Finding the zeros of this polynomial is done by inspecting all values of Fqm.
Example 7.5.11 Let C be the binary narrow sense BCH code of length 15 and designed minimum distance 5 with generator polynomial 1 + X^4 + X^6 + X^7 + X^8 as in Example 7.3.13. Let
r = (0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0)
be a received word with respect to the code C with 2 errors. Then r(x) = x + x^3 + x^4 + x^7 + x^13 and S1 = r(α) = α^12 and S3 = r(α^3) = α^7. Now S2 = S1^2 = α^9 and S4 = S1^4 = α^3. The system of equations becomes:
α^7 + α^9σ1 + α^12σ2 = 0
α^3 + α^7σ1 + α^9σ2 = 0,
which has the unique solution σ1 = α^12 and σ2 = α^13. So the error-locator polynomial is
1 + α^12Z + α^13Z^2,
which has α^{−3} and α^{−10} as zeros. Hence
e = (0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0)
is the error and
c = (0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0)
is the codeword sent.
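The computation of this example can be carried out by machine. The following Python sketch (ours, not from the text) builds log/antilog tables for F16 from the primitive polynomial X^4 + X + 1, so that α^4 = 1 + α as above, computes the syndromes, solves the 2 × 2 system by Cramer's rule and finds the zeros of σ(Z) by inspection.

exp = [1] * 15                                 # exp[i] = alpha^i as a bit vector
for i in range(1, 15):
    e = exp[i - 1] << 1
    exp[i] = e ^ 0b10011 if e & 16 else e      # reduce by X^4 + X + 1
log = {exp[i]: i for i in range(15)}
mul = lambda a, b: 0 if 0 in (a, b) else exp[(log[a] + log[b]) % 15]
inv = lambda a: exp[(15 - log[a]) % 15]

r = [0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0]
S = {}
for j in (1, 2, 3, 4):                         # S_j = r(alpha^j); addition in F_16 is xor
    acc = 0
    for i, ri in enumerate(r):
        if ri:
            acc ^= exp[(i * j) % 15]
    S[j] = acc

D  = mul(S[2], S[2]) ^ mul(S[1], S[3])                  # determinant of the system
s1 = mul(mul(S[1], S[4]) ^ mul(S[2], S[3]), inv(D))     # sigma_1 = alpha^12
s2 = mul(mul(S[3], S[3]) ^ mul(S[2], S[4]), inv(D))     # sigma_2 = alpha^13

errs = [i for i in range(15)
        if (1 ^ mul(s1, exp[-i % 15]) ^ mul(s2, exp[-2 * i % 15])) == 0]
print(errs)                                    # [3, 10]: the error positions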
7.5.4 Closed formulas
Consider the system of equations (7.1) as linear in the unknowns σ1, . . . , σw with coefficients in S1, . . . , S_{2w}. Then
σi = ∆i/∆0,
where ∆i is the determinant of a certain w × w matrix according to Cramer's rule. Then the ∆i are polynomials in the Si. Conclude that
det
| 1        Z          . . .  Z^w |
| S_{w+1}  Sw         . . .  S1  |
| ...      ...               ... |
| S_{2w}   S_{2w−1}   . . .  Sw  |
= ∆0 + ∆1Z + · · · + ∆wZ^w
is a closed formula for the generic error-locator polynomial. Notice that the constant coefficient of the generic error-locator polynomial is not 1.
Example 7.5.12 Consider the narrow-sense BCH code with designed minimum distance 5. Then {1, 2, 3, 4} is the defining set, so the syndromes S1, S2, S3 and S4 of a received word are known. We have to solve the system of equations
S2σ1 + S1σ2 = −S3
S3σ1 + S2σ2 = −S4.
Now Cramer's rule gives that
σ1 = (S1S4 − S2S3)/(S2^2 − S1S3)
and similarly
σ2 = (S3^2 − S2S4)/(S2^2 − S1S3).
The generic error-locator polynomial is
det
| 1   Z   Z^2 |
| S3  S2  S1  |
| S4  S3  S2  |
= (S2^2 − S1S3) + (S1S4 − S2S3)Z + (S3^2 − S2S4)Z^2.
In the binary case we have that S2 = S1^2 and S4 = S1^4. So
S2^2 + S1S3 = S1^4 + S1S3 = S1(S1^3 + S3)
S1S4 + S2S3 = S1^5 + S1^2S3 = S1^2(S1^3 + S3)
S3^2 + S2S4 = S3^2 + S1^6 = (S1^3 + S3)^2.
Hence the generic error-locator polynomial becomes, after division by S1^3 + S3,
S1 + S1^2Z + (S1^3 + S3)Z^2.
Example 7.5.13 Let C be the narrow sense BCH code over F16 of length 15 and designed minimum distance 5 as in Example 7.3.14. Let
r = (α^5, α^8, α^11, α^10, α^10, α^7, α^12, α^11, 1, α, α^12, α^14, α^12, α^2, 0)
be a received word with respect to the code C with 2 errors. Then S1 = α^12, S2 = α^7, S3 = 0 and S4 = α^2. The formulas S2 = S1^2 and S4 = S1^4 of Example 7.5.11 are no longer valid, since this code is defined over F16 instead of F2. By the formulas in Example 7.5.12 the error-locator polynomial is
1 + Z + α^10Z^2,
which has α^{−2} and α^{−8} as zeros. In this case the error positions are known, but the error values need some extra computation, since the values are not binary. This could be done by considering the error positions as erasures along the lines of Section 6.2.2. The next section gives an alternative with Forney's formula.
7.5.5 Key equation and Forney’s formula
Consider the narrow sense BCH code C with designed minimum distance δ. So the defining set is {1, . . . , δ − 1}. Let c(x) ∈ C be the transmitted codeword. Let r(x) = c(x) + e(x) be the received word with error e(x). Suppose that the number of errors w = wt(e(x)) is at most (δ − 1)/2. The support of e(x) will be denoted by I, that is, ei ≠ 0 if and only if i ∈ I. So the error-locator polynomial is
σ(Z) = Π_{i∈I} (1 − α^iZ)
with coefficients σ0 = 1, σ1, . . . , σw.
Definition 7.5.14 The syndromes are Sj = r(α^j) for 1 ≤ j ≤ δ − 1. The syndrome polynomial S(Z) is defined by
S(Z) = Σ_{j=1}^{δ−1} Sj Z^{j−1}.
Remark 7.5.15 The syndrome Sj is equal to e(α^j), since c(α^j) = 0, for all j = 1, . . . , δ − 1. Furthermore 2w ≤ δ − 1. The Newton identities
Sk + σ1S_{k−1} + · · · + σwS_{k−w} = 0 for k = w + 1, . . . , 2w
imply that the (k−1)-st coefficient of σ(Z)S(Z) is zero for all k = w+1, . . . , 2w, since
σ(Z)S(Z) = Σ_k ( Σ_{i+j=k} σi Sj ) Z^{k−1}.
Hence there exist polynomials q(Z) and r(Z) such that
σ(Z)S(Z) = r(Z) + q(Z)Z^{2w}, with deg(r(Z)) < w.
In the following we will identify the remainder r(Z).
Definition 7.5.16 The error-evaluator polynomial ω(Z) is defined by
ω(Z) = Σ_{i∈I} ei α^i Π_{j∈I, j≠i} (1 − α^jZ).
Proposition 7.5.17 Let σ′(Z) be the formal derivative of σ(Z). Then the error values are given by Forney's formula:
el = − ω(α^{−l}) / σ′(α^{−l})
for all error positions l ∈ I.
Proof. Differentiating
σ(Z) = Π_{i∈I} (1 − α^iZ)
gives
σ′(Z) = Σ_{i∈I} −α^i Π_{j∈I, j≠i} (1 − α^jZ).
Hence
σ′(α^{−l}) = −α^l Π_{j∈I, j≠l} (1 − α^{j−l}),
which is not zero. Substitution of α^{−l} in ω(Z) gives ω(α^{−l}) = −el σ′(α^{−l}).
Remark 7.5.18 The polynomial σ(Z) has simple zeros. Hence β is not a zero of σ′(Z) if β is a zero of σ(Z), by Lemma 7.2.8. So the denominator in Proposition 7.5.17 is not zero. This proposition implies that β is not a zero of ω(Z) if β is a zero of σ(Z). Hence the greatest common divisor of σ(Z) and ω(Z) is one.
Proposition 7.5.19 The error-locator polynomial σ(Z) and the error-evaluator polynomial ω(Z) satisfy the Key equation:
σ(Z)S(Z) ≡ ω(Z) (mod Z^{δ−1}). (7.2)
Moreover if (σ1(Z), ω1(Z)) is another pair of polynomials that satisfy the Key equation and such that deg ω1(Z) < deg σ1(Z) ≤ (δ − 1)/2, then there exists a polynomial λ(Z) such that σ1(Z) = λ(Z)σ(Z) and ω1(Z) = λ(Z)ω(Z).
Proof. We have that Sj = r(α^j) = e(α^j) for all j = 1, 2, . . . , δ − 1. Using the definitions, interchanging summations and the sum formula for a geometric series we get
S(Z) = Σ_{j=1}^{δ−1} e(α^j) Z^{j−1} = Σ_{j=1}^{δ−1} Σ_{i∈I} ei α^{ij} Z^{j−1} = Σ_{i∈I} ei α^i Σ_{j=1}^{δ−1} (α^iZ)^{j−1} = Σ_{i∈I} ei α^i (1 − (α^iZ)^{δ−1})/(1 − α^iZ).
Hence
σ(Z)S(Z) = Π_{j∈I} (1 − α^jZ) S(Z) = Σ_{i∈I} ei α^i (1 − (α^iZ)^{δ−1}) Π_{j∈I, j≠i} (1 − α^jZ).
Therefore
σ(Z)S(Z) ≡ Σ_{i∈I} ei α^i Π_{j∈I, j≠i} (1 − α^jZ) ≡ ω(Z) (mod Z^{δ−1}).
Suppose that we have another pair (σ1(Z), ω1(Z)) such that
σ1(Z)S(Z) ≡ ω1(Z) (mod Z^{δ−1})
and deg ω1(Z) < deg σ1(Z) ≤ (δ − 1)/2. Then
σ(Z)ω1(Z) ≡ σ1(Z)ω(Z) (mod Z^{δ−1})
and the degrees of σ(Z)ω1(Z) and σ1(Z)ω(Z) are strictly smaller than δ − 1. Hence
σ(Z)ω1(Z) = σ1(Z)ω(Z).
The greatest common divisor of σ(Z) and ω(Z) is one by Remark 7.5.18. Therefore there exists a polynomial λ(Z) such that σ1(Z) = λ(Z)σ(Z) and ω1(Z) = λ(Z)ω(Z).
Remark 7.5.20 In Remark 7.5.15 it is shown that the Newton identities give the Key equation σ(Z)S(Z) ≡ r(Z) (mod Z^{δ−1}). In Proposition 7.5.19 a new proof of the Key equation is given where the remainder r(Z) is identified as the error-evaluator polynomial ω(Z). Conversely the Newton identities can be derived from this second proof.
Example 7.5.21 Let C be the narrow sense BCH code of length 15 over F16 of designed minimum distance 5 and let r be the received word as in Example 7.5.13. The error-locator polynomial is σ(Z) = 1 + Z + α^10Z^2, which has α^{−2} and α^{−8} as zeros. The syndrome polynomial is S(Z) = α^12 + α^7Z + α^2Z^3. Then
σ(Z)S(Z) = α^12 + α^2Z + α^2Z^4 + α^12Z^5.
Proposition 7.5.19 implies
ω(Z) ≡ σ(Z)S(Z) ≡ α^12 + α^2Z (mod Z^4).
Hence ω(Z) = α^12 + α^2Z, since deg(ω(Z)) < deg(σ(Z)) = 2. Furthermore σ′(Z) = 1. The error values are therefore
e2 = ω(α^{−2}) = α^11 and e8 = ω(α^{−8}) = α^8
by Proposition 7.5.17.
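With the same F16 tables as in the sketch after Example 7.5.11, Forney's formula of this example can be checked as follows (again ours, not from the text).

exp = [1] * 15
for i in range(1, 15):
    e = exp[i - 1] << 1
    exp[i] = e ^ 0b10011 if e & 16 else e      # reduce by X^4 + X + 1
log = {exp[i]: i for i in range(15)}
mul = lambda a, b: 0 if 0 in (a, b) else exp[(log[a] + log[b]) % 15]

sigma = [1, 1, exp[10]]              # sigma(Z) = 1 + Z + alpha^10 Z^2
S = [exp[12], exp[7], 0, exp[2]]     # S_1..S_4, so S(Z) = S_1 + S_2 Z + S_3 Z^2 + S_4 Z^3

omega = [0, 0]                       # sigma(Z) S(Z) mod Z^4 has degree < deg(sigma) = 2
for i, si in enumerate(sigma):
    for j, Sj in enumerate(S):
        if i + j < 2:
            omega[i + j] ^= mul(si, Sj)
print(log[omega[0]], log[omega[1]])  # 12 2: omega(Z) = alpha^12 + alpha^2 Z

for l in (2, 8):                     # sigma'(Z) = sigma_1 = 1 in characteristic 2
    e = omega[0] ^ mul(omega[1], exp[-l % 15])   # e_l = omega(alpha^-l)/sigma'(alpha^-l)
    print(l, log[e])                 # e_2 = alpha^11, e_8 = alpha^8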
Remark 7.5.22 Consider the BCH code C with {b, b + 1, . . . , b + δ − 2} as defining set. The syndromes are Sj = e(α^j) for b ≤ j ≤ b + δ − 2. Adapt the above definitions as follows. The syndrome polynomial S(Z) is defined by
S(Z) = Σ_{j=b}^{b+δ−2} Sj Z^{j−b}.
The error-evaluator polynomial ω(Z) is defined by
ω(Z) = Σ_{i∈I} ei α^{ib} Π_{j∈I, j≠i} (1 − α^jZ).
Show that the error-locator polynomial σ(Z) and the error-evaluator polynomial ω(Z) satisfy the Key equation:
σ(Z)S(Z) ≡ ω(Z) (mod Z^{δ−1}).
Show that the error values are given by Forney's formula:
ei = − ω(α^{−i}) / (α^{i(b−1)} σ′(α^{−i}))
for all error positions i.
7.5.6 Exercises
7.5.1 Consider A(Z) = 2 + 6Z + 2Z^2 + 5Z^3 in F7[Z]. Show that A(Z) is the MS polynomial of a codeword a(x) of a cyclic code of length 6 over F7 with primitive element α = 3. Compute the zeros and coefficients of a(x).
7.5.2 Give a proof of Proposition 7.5.7.
7.5.3 In case w = 2 we have that σ1 = −(x1 + x2), σ2 = x1x2 and Ai = y1x1^i + y2x2^i. Substitute these formulas in the Newton identities in order to check their validity.
7.5.4 Let C be the binary narrow sense BCH code of length 15 and designed
minimum distance 5 as in Example 7.5.11. Let
r = (1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1)
be a received word with respect to the code C with 2 errors. Find the codeword
sent.
7.5.5 Consider the narrow-sense BCH code with designed minimum distance 7. Then the syndromes S1, S2, . . . , S6 of a received word are known. Compute the coefficients of the generic error-locator polynomial. Show that in the binary case the generic error-locator polynomial becomes
(S3 + S1^3) + (S1S3 + S1^4)Z + (S5 + S1^2S3)Z^2 + (S3^2 + S1S5 + S1^3S3 + S1^6)Z^3,
by using S2 = S1^2, S4 = S1^4 and S6 = S3^2 and after division by the common factor S3^2 + S1S5 + S1^3S3 + S1^6.
7.5.6 Let C be the narrow sense BCH code of length 15 over F16 of designed minimum distance 5 as in Examples 7.5.13 and 7.5.21. Let
r = (α^8, α^7, α, α^11, α^3, α^5, α^10, α^11, α^10, α^7, α^4, α^10, 0, 1, α^5)
be a received word with respect to the code C with 2 errors. Find the error positions. Determine the error values by Forney's formula.
7.5.7 Show the validity of the Key equation and Forney’s formula as claimed
in Remark 7.5.22.
7.6 Notes
6.4.2: iterated HT, iterated Roos bound.
6.4.3: symmetric Roos bound.
In many cases of binary codes of length at most 62 the shift bound is equal to
the minimum distance, see [?]. For about 95% of all ternary codes of length at
most 40 the shift bound is equal to the minimum distance, see [?].
In a discussion with B.-Z. Shen we came to the following generalization of in-
dependent sets and the shift bound, see also Shen and Tzeng [?] and Augot,
Charpin and Sendrier [?] on generalized Newton identities.
Lemma 7.4.37 is a generalization of a theorem of van Lint and Wilson [?, The-
orem 11].
Generalization of shift bound for linear codes.
Linear complexity and the pseudo rank bound.
Shift bound for gen. Hamming weights.
Conjecture of (non) existence of asymptotically good cyclic codes Assmus, Tu-
ryn 1966.
***Blahut’s theorem, Massey in Festschrift on DFT and PS polynomial***
Fundamental iterative algorithm.
Chapter 8
Polynomial codes
Ruud Pellikaan
****
8.1 RS codes and their generalizations
Reed-Solomon codes will be introduced as special cyclic codes. We will show that these codes are MDS and can be obtained by evaluating certain polynomials. This gives rise to a generalization of these codes. Fractional transformations are defined and related to the automorphism group of generalized Reed-Solomon codes.
8.1.1 Reed-Solomon codes
Consider the following definition of Reed-Solomon codes over the finite field Fq.
Definition 8.1.1 Let α be a primitive element of Fq. Let n = q − 1. Let b and k be non-negative integers such that 0 ≤ b, k ≤ n. Define the generator polynomial g_{b,k}(X) by
g_{b,k}(X) = (X − α^b) · · · (X − α^{b+n−k−1}).
The Reed-Solomon (RS) code RSk(n, b) is by definition the q-ary cyclic code with generator polynomial g_{b,k}(X). In the literature the code is also denoted by RSb(n, k).
Proposition 8.1.2 The code RSk(n, b) has length n = q − 1, is cyclic, linear and MDS of dimension k. The dual of RSk(n, b) is equal to RS_{n−k}(n, n − b + 1).
Proof. The code RSk(n, b) is of length q − 1, cyclic and linear by definition. The degree of the generator polynomial is n − k, so the dimension of the code is k by Proposition 7.1.21. The complete defining set is {b, b+1, . . . , b+n−k−1} and has n − k consecutive elements. Hence the minimum distance d is at least n − k + 1 by the BCH bound of Proposition 7.3.3. The generator polynomial g_{b,k}(X) has degree n − k, so g_{b,k}(x) is a codeword of weight at most n − k + 1. Hence d is at most n − k + 1. Also the Singleton bound gives that d is at most n − k + 1. Hence d = n − k + 1 and the code is MDS. Another proof that the parameters are [n, k, n − k + 1] will be given in Proposition 8.1.14.
The complete defining set of RSk(n, b) is the subset U consisting of n − k consecutive elements:
U = {b, b + 1, . . . , b + n − k − 1}.
Hence Zn \ {−i | i ∈ U} is the complete defining set of the dual of RSk(n, b) by Proposition 7.2.58. But
Zn \ {−i | i ∈ U} = Zn \ {n − (b + n − k − 1), . . . , n − (b + 1), n − b} = {n − b + 1, n − b + 2, . . . , n − b + k}
is the complete defining set of RS_{n−k}(n, n − b + 1).
Another description of RS codes will be given by evaluating polynomials.

Definition 8.1.3 Let f(X) ∈ F_q[X]. Let ev(f(X)) be the evaluation of f(X) defined by
ev(f(X)) = (f(1), f(α), . . . , f(α^{n−1})).

Proposition 8.1.4 We have that
RS_k(n, b) = { ev(X^{n−b+1} f(X)) | f(X) ∈ F_q[X], deg(f) < k }.

Proof. The dual of RS_k(n, b) is RS_{n−k}(n, n − b + 1) by Proposition 8.1.2, which has {n − b + 1, . . . , n − b + k} as complete defining set. So RS_{n−k}(n, n − b + 1) has H = (α^{ij} | n − b + 1 ≤ i ≤ n − b + k, 0 ≤ j ≤ n − 1) as parity check matrix, by Remark 7.3.2 and the proof of Proposition 7.3.3. That means that H is a generator matrix of RS_k(n, b). The rows of H are ev(X^i) for n − b + 1 ≤ i ≤ n − b + k. So they generate the space { ev(X^{n−b+1} f(X)) | deg(f) < k }.
Example 8.1.5 Consider RS_3(7, 1). It is a cyclic code over F_8 with generator polynomial
g_{1,3}(X) = (X − α)(X − α^2)(X − α^3)(X − α^4),
where α is a primitive element of F_8 satisfying α^3 = α + 1. Then
g_{1,3}(X) = α^3 + αX + X^2 + α^3X^3 + X^4.
In the second description we have that
RS_3(7, 1) = { ev(f(X)) | f(X) ∈ F_8[X], deg(f) < 3 },
since X^{n−b+1} = X^7 evaluates to 1 at all nonzero elements of F_8. The matrix in Exercise 7.1.5 is obtained by evaluating the monomials 1, X and X^2 at α^j for j = 0, 1, . . . , 6. It is a generator matrix of RS_3(7, 1).
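For readers who want to reproduce this computation, here is a small Python sketch. It is not part of the original text: the 3-bit encoding of F_8 and all function names are our own choices. It multiplies out the four linear factors of g_{1,3}(X), using that addition in F_8 is bitwise XOR.

```python
# F_8 = F_2[alpha] with alpha^3 = alpha + 1; an element c0 + c1*alpha + c2*alpha^2
# is encoded as the 3-bit integer with bits (c2 c1 c0), so alpha = 0b010.

def gf8_mul(x, y):
    """Carry-less multiplication reduced modulo X^3 + X + 1."""
    z = 0
    while y:
        if y & 1:
            z ^= x
        y >>= 1
        x <<= 1
        if x & 0b1000:
            x ^= 0b1011            # reduce using alpha^3 = alpha + 1
    return z

def poly_mul(f, g):
    """Product of polynomials over F_8; coefficient lists, lowest degree first."""
    h = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[i + j] ^= gf8_mul(fi, gj)   # addition in F_8 is XOR
    return h

alpha = 0b010
powers = [1]
for _ in range(6):
    powers.append(gf8_mul(powers[-1], alpha))   # powers[i] = alpha^i

g = [1]
for i in range(1, 5):                  # multiply by (X - alpha^i); minus = plus here
    g = poly_mul(g, [powers[i], 1])
print(g)   # [3, 2, 1, 3, 1], i.e. alpha^3 + alpha*X + X^2 + alpha^3*X^3 + X^4
```

The printed coefficient list agrees with the generator polynomial displayed above.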
8.1.2 Extended and generalized RS codes
Definition 8.1.6 The extended RS code ERS_k(n, b) is the extension of the code RS_k(n, b).
The code ERS_k(n, 1) also has a description by means of evaluations.

Proposition 8.1.7 We have that
ERS_k(n, 1) = { (f(1), f(α), . . . , f(α^{n−1}), f(0)) | f(X) ∈ F_q[X], deg(f) < k }.

Proof. If C is a code of length n, then by Definition 3.1.6 the extended code C^e is given by
C^e = { (c, −∑_{i=0}^{n−1} c_i) | c ∈ C }.
So we have to show that
f(0) + f(1) + f(α) + · · · + f(α^{n−1}) = 0
for all polynomials f(X) ∈ F_q[X] of degree at most k − 1. By linearity it is enough to show that this is the case for all monomials of degree at most k − 1. Let f(X) be the monomial X^i with 0 ≤ i < n. Then
∑_{j=0}^{n−1} f(α^j) = ∑_{j=0}^{n−1} α^{ij} = n if i = 0, and 0 if 0 < i < n,
by Lemma 7.5.2. Now n = q − 1 = −1 in F_q. So in both cases we have that this sum is equal to −f(0).
Definition 8.1.8 Let F be a field. Let f(X) = f_0 + f_1X + · · · + f_kX^k be an element of F[X] and a ∈ F. Then the evaluation of f(X) at a is given by
f(a) = f_0 + f_1a + · · · + f_ka^k.
Let
L_k = { f(X) ∈ F_q[X] | deg f(X) ≤ k }.
The evaluation map
ev_{k,a} : L_k −→ F_q
is given by ev_{k,a}(f(X)) = f(a). Furthermore the evaluation at infinity is defined by ev_{k,∞}(f(X)) = f_k.

Remark 8.1.9 The evaluation map is linear. Furthermore ev_{k,∞}(f(X)) = 0 if and only if f(X) has degree at most k − 1, for all f(X) ∈ L_k. The map ev_{k,a} does not depend on k if a ∈ F_q. The notation f(∞) will be used instead of ev_{k,∞}(f(X)), but notice that this depends on k and the implicit assumption that f(X) has degree at most k.
Definition 8.1.10 Let n be an arbitrary integer such that 1 ≤ n ≤ q. Let a be an n-tuple of mutually distinct elements of F_q ∪ {∞}. Let b be an n-tuple of nonzero elements of F_q. Let k be an arbitrary integer such that 0 ≤ k ≤ n. The generalized RS code GRS_k(a, b) is defined by
GRS_k(a, b) = { (f(a_1)b_1, f(a_2)b_2, . . . , f(a_n)b_n) | f(X) ∈ F_q[X], deg(f) < k }.
The following two examples show that the generalized RS codes are indeed generalizations of both RS codes and extended RS codes.

Example 8.1.11 Let α be a primitive element of F_q. Let n = q − 1. Define a_j = α^{j−1} and b_j = a_j^{n−b+1} for j = 1, . . . , n. Then RS_k(n, b) = GRS_k(a, b).

Example 8.1.12 Let α be a primitive element of F_q. Let n = q. Let a_1 = 0 and b_1 = 1. Define a_j = α^{j−2} and b_j = 1 for j = 2, . . . , n. Then ERS_k(n, 1) = GRS_k(a, b).

Example 8.1.13 The BCH code over F_q with defining set {b, b + 1, . . . , b + δ − 2} and length n can be considered as a subfield subcode over F_q of a generalized RS code over F_{q^m}, where m is such that n divides q^m − 1.
Proposition 8.1.14 Let 0 ≤ k ≤ n ≤ q. Then GRS_k(a, b) is an F_q-linear MDS code with parameters [n, k, n − k + 1].

Proof. Notice that a linear code C stays linear under the linear map c → (b_1c_1, . . . , b_nc_n), and that the parameters remain the same if the b_i are all nonzero. Hence we may assume without loss of generality that b is the all-ones vector.
Consider the evaluation map ev_{k−1,a} : L_{k−1} → F_q^n defined by
ev_{k−1,a}(f(X)) = (f(a_1), f(a_2), . . . , f(a_n)).
This map is linear and L_{k−1} is a linear space of dimension k. Furthermore GRS_k(a, b) is the image of L_{k−1} under ev_{k−1,a}.
Suppose that a_j ∈ F_q for all j. Let f(X) ∈ L_{k−1} and ev_{k−1,a}(f(X)) = 0. Then f(X) is of degree at most k − 1 and has n zeros. But k − 1 < n by assumption. So f(X) is the zero polynomial. Hence the restriction of the map ev_{k−1,a} to L_{k−1} is injective, and GRS_k(a, b) has the same dimension k as L_{k−1}.
Let c be a nonzero codeword of GRS_k(a, b) of weight d. Then there exists a nonzero polynomial f(X) of degree at most k − 1 such that ev_{k−1,a}(f(X)) = c. The zeros of f(X) among the a_1, . . . , a_n correspond to the zero coordinates of c. So the number of zeros of f(X) among the a_1, . . . , a_n is equal to the number of zero coordinates of c, which is n − d. Hence n − d ≤ deg f(X) ≤ k − 1, that is, d ≥ n − k + 1.
The evaluation of the polynomial f(X) = ∏_{i=1}^{k−1}(X − a_i) gives an explicit codeword of weight n − k + 1. Also the Singleton bound gives that d ≤ n − k + 1. Therefore the minimum distance of the generalized RS code is equal to n − k + 1 and the code is MDS.
In case a_j = ∞ for some j, then a_i ∈ F_q for all i ≠ j. Now f(a_j) = 0 implies that the degree of f(X) is at most k − 2. So the above proof applies for the remaining n − 1 elements and polynomials of degree at most k − 2.
Remark 8.1.15 The monomials 1, X, . . . , X^{k−1} form a basis of L_{k−1}. Suppose that a_j ∈ F_q for all j. Then evaluating these monomials gives a generator matrix with entries a_j^{i−1}b_j of the code GRS_k(a, b). If b is the all-ones vector, then the matrix G_k(a) of Proposition 3.2.10 is a generator matrix of GRS_k(a, b). If a_j = ∞, then ev_{k−1,a_j}(b_jX^{i−1}) = 0 for all i ≤ k − 1 and ev_{k−1,a_j}(b_jX^{k−1}) = b_j. Hence (0, . . . , 0, b_j)^T is the corresponding column vector of the generator matrix.
Remark 8.1.16 A generalized RS code is MDS by Proposition 8.1.14. So any
k positions can be used to encode systematically. That means that there is a
generator matrix G of the form (Ik|P), where Ik is the k × k identity matrix
and P a k × (n − k) matrix. The next proposition gives an explicit description
of P.
Proposition 8.1.17 Let b be an n-tuple of nonzero elements of F_q. Let a be an n-tuple of mutually distinct elements of F_q ∪ {∞}. Define [a_i, a_j] = a_i − a_j, [∞, a_j] = 1 and [a_i, ∞] = −1 for a_i, a_j ∈ F_q. Then GRS_k(a, b) has a generator matrix of the form (I_k|P), where
p_{ij} = ( b_{j+k} ∏_{t=1,t≠i}^{k} [a_{j+k}, a_t] ) / ( b_i ∏_{t=1,t≠i}^{k} [a_i, a_t] )
for 1 ≤ i ≤ k and 1 ≤ j ≤ n − k.
Proof. Assume first that b is the all-ones vector. Let g_i be the i-th row of this generator matrix. Then this corresponds to a polynomial g_i(X) of degree at most k − 1 such that g_i(a_i) = 1 and g_i(a_t) = 0 for all 1 ≤ t ≤ k with t ≠ i. By the Lagrange Interpolation Theorem ?? there is a unique polynomial with these properties and it is given by
g_i(X) = ( ∏_{t=1,t≠i}^{k} (X − a_t) ) / ( ∏_{t=1,t≠i}^{k} [a_i, a_t] ).
Notice that if a_i = ∞, then g_i(X) also satisfies the required conditions, since [a_i, a_t] = [∞, a_t] = 1 by definition and g_i(X) is a monic polynomial of degree k − 1, so g_i(∞) = 1. Hence p_{ij} = g_i(a_{j+k}) is of the described form, also in case a_{j+k} = ∞.
For arbitrary b we have to multiply the j-th column of G by b_j. In order to get the identity matrix back, the i-th row is divided by b_i.
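The closed formula for P can be checked numerically. Below is a hedged sketch in Python (our own helper names; the 1-based indices i, j follow the proposition, all a_i are taken finite, and b is the all-ones vector, so that [x, y] is simply x − y throughout).

```python
from math import prod

q, k = 7, 3
a = [1, 3, 2, 6, 4, 5]          # mutually distinct elements of F_7 (our choice)
n = len(a)

def p(i, j):
    """p_ij from Proposition 8.1.17 with b the all-ones vector (1-based i, j)."""
    num = prod(a[j + k - 1] - a[t] for t in range(k) if t != i - 1)
    den = prod(a[i - 1] - a[t] for t in range(k) if t != i - 1)
    return num * pow(den % q, q - 2, q) % q     # divide via Fermat inversion

P = [[p(i, j) for j in range(1, n - k + 1)] for i in range(1, k + 1)]
for row in P:
    print(row)
```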
Corollary 8.1.18 Let (I_k|P) be the generator matrix of the code GRS_k(a, b). Then
p_{iu}p_{jv}[a_i, a_{k+u}][a_j, a_{k+v}] = p_{ju}p_{iv}[a_j, a_{k+u}][a_i, a_{k+v}]
for all 1 ≤ i, j ≤ k and 1 ≤ u, v ≤ n − k.
Proof. This is left as an exercise.
In Section 3.2.1 both generalized Reed-Solomon and Cauchy codes were intro-
duced as examples of MDS codes. The following corollary shows that in fact
these codes are the same.
Corollary 8.1.19 Let a be an n-tuple of mutually distinct elements of F_q. Let b be an n-tuple of nonzero elements of F_q. Let
c_i = b_i ∏_{t=1,t≠i}^{k} [a_i, a_t]  if 1 ≤ i ≤ k,  and  c_i = b_i ∏_{t=1}^{k} [a_i, a_t]  if k + 1 ≤ i ≤ n.
Then GRS_k(a, b) = C_k(a, c).
Proof. The generator matrix of GRS_k(a, b) which is systematic at the first k positions is of the form (I_k|P) with P as given in Proposition 8.1.17. Then
p_{ij} = c_{j+k}c_i^{−1} / [a_{j+k}, a_i]
for all 1 ≤ i ≤ k and 1 ≤ j ≤ n − k. Hence (I_k|P) = (I_k|A(a, c)) is the generator matrix of the generalized Cauchy code C_k(a, c).
Remark 8.1.20 A generalized RS code is tight with respect to the Singleton bound k + d ≤ n + 1, that is, it is an MDS code. Hence its dual is also MDS. In fact the next proposition shows that the dual of a generalized RS code is again a GRS code.
Proposition 8.1.21 Let b^⊥ be the vector with entries
b^⊥_j = 1 / ( b_j ∏_{i≠j} [a_j, a_i] )
for j = 1, . . . , n. Then GRS_{n−k}(a, b^⊥) is the dual code of GRS_k(a, b).
Proof. Let G = (I_k|P) be the generator matrix of GRS_k(a, b) with P as obtained in Proposition 8.1.17. In the same way GRS_{n−k}(a, b^⊥) has a generator matrix H of the form (Q|I_{n−k}) with
Q_{ij} = ( b^⊥_j ∏_{t=k+1,t≠i+k}^{n} [a_j, a_t] ) / ( b^⊥_{i+k} ∏_{t=k+1,t≠i+k}^{n} [a_{i+k}, a_t] )
for 1 ≤ i ≤ n − k and 1 ≤ j ≤ k. After substituting the values for b^⊥_j and canceling the same terms in numerator and denominator we see that Q = −P^T. Hence H is a parity check matrix of GRS_k(a, b) by Proposition 2.3.30.
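As a sanity check of this proposition, the following sketch (our own, over the prime field F_7, with an arbitrary a and b the all-ones vector) builds generator matrices of GRS_k(a, b) and GRS_{n−k}(a, b^⊥) from the monomial bases of Remark 8.1.15 and verifies that all inner products vanish.

```python
from math import prod

q, k = 7, 3
a = [1, 3, 2, 6, 4, 5]          # mutually distinct elements of F_7 (our choice)
n = len(a)
b = [1] * n
inv = lambda x: pow(x % q, q - 2, q)        # inversion in F_7

# b_perp as in Proposition 8.1.21, with [x, y] = x - y since all a_i are finite
b_perp = [inv(b[j] * prod(a[j] - a[i] for i in range(n) if i != j))
          for j in range(n)]

G = [[pow(a[j], i, q) * b[j] % q for j in range(n)] for i in range(k)]
H = [[pow(a[j], i, q) * b_perp[j] % q for j in range(n)] for i in range(n - k)]
print(all(sum(g[j] * h[j] for j in range(n)) % q == 0
          for g in G for h in H))           # True: the two codes are orthogonal
```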
Example 8.1.22 This is a continuation of Example 8.1.11. Let b be the all-ones vector. Then RS_k(n, 1) = GRS_k(a, b) and RS_k(n, 0) = GRS_k(a, a). Furthermore the dual of RS_k(n, 1) is RS_{n−k}(n, 0) by Proposition 8.1.2. So RS_k(n, 1)^⊥ = GRS_{n−k}(a, a). Alternatively, Proposition 8.1.21 gives that the dual of GRS_k(a, b) is equal to GRS_{n−k}(a, c) with c_j = 1/∏_{i≠j}(a_j − a_i). We leave it as an exercise to show that c_j = −a_j for all j.
Example 8.1.23 Consider the code RS_3(7, 1). Let α ∈ F_8 be an element with α^3 = 1 + α. Let a, b ∈ F_8^7 with a_i = α^{i−1} and b the all-ones vector. Then RS_3(7, 1) = GRS_3(a, b) by Example 8.1.11. Let (I_3|P) be a generator matrix of this code. Let g_1(X) be the quadratic polynomial such that g_1(1) = 1, g_1(α) = 0 and g_1(α^2) = 0. Then
g_1(X) = (X + α)(X + α^2) / ((1 + α)(1 + α^2)).
Hence g_1(α^3) = α^3, g_1(α^4) = α, g_1(α^5) = 1 and g_1(α^6) = α^3 are the entries of the first row of P. Continuing in this way we get
P =
[ α^3  α    1  α^3 ]
[ α^6  α^6  1  α^2 ]
[ α^5  α^4  1  α^4 ].
The dual of RS_3(7, 1) is RS_4(7, 0), by Proposition 8.1.2, which is equal to GRS_4(a, a). This is in agreement with Proposition 8.1.21, since c_j = a_j for all j.
Remark 8.1.24 Let b be an n-tuple of nonzero elements of F_q. Let a be an n-tuple of mutually distinct elements in F_q ∪ {∞} with a_k = ∞. Then GRS_k(a, b) has a generator matrix of the form (I_k|P), where
p_{ij} = c_{j+k}c_i^{−1} / (a_{j+k} − a_i)
for all 1 ≤ i ≤ k − 1 and 1 ≤ j ≤ n − k, and
p_{kj} = c_{j+k}c_k^{−1}
for 1 ≤ j ≤ n − k, with
c_i = b_i ∏_{t=1,t≠i}^{k−1} (a_i − a_t)  if 1 ≤ i ≤ k − 1,
c_k = b_k,
c_i = b_i ∏_{t=1}^{k−1} (a_i − a_t)  if k + 1 ≤ i ≤ n,
by Corollary 8.1.19.
8.1.3 GRS codes under transformations

Proposition 8.1.25 Let n ≥ 2. Let a in F_q^n consist of mutually distinct entries. Let b be an n-tuple of nonzero elements of F_q. Let 1 ≤ i, j ≤ n and i ≠ j. Then there exist a b′ in F_q^n with nonzero entries and an a′ in F_q^n consisting of mutually distinct entries such that a′_i = 0, a′_j = 1 and
GRS_k(a, b) = GRS_k(a′, b′).

Proof. We may assume without loss of generality that b = 1. Consider the linear polynomials l(X) = (X − a_i)/(a_j − a_i) and m(X) = (a_j − a_i)X + a_i. Then l(m(X)) = X and m(l(X)) = X. Now L_{k−1} is the vector space of all polynomials in the variable X of degree at most k − 1. The maps λ, µ : L_{k−1} → L_{k−1} defined by λ(f(X)) = f(l(X)) and µ(g(X)) = g(m(X)) are both linear and inverses of each other. Hence λ and µ are automorphisms of L_{k−1}. Let a′_t = l(a_t) for all t. Then the a′_t are mutually distinct, since the a_t are mutually distinct and l(X) defines a bijection of F_q. Furthermore a′_i = l(a_i) = 0 and a′_j = l(a_j) = 1. Now ev_{k−1,a}(f(l(X))) is equal to
(f(l(a_1)), . . . , f(l(a_n))) = (f(a′_1), . . . , f(a′_n)) = ev_{k−1,a′}(f(X)).
Finally
GRS_k(a′, 1) = { ev_{k−1,a′}(f(X)) | f(X) ∈ L_{k−1} }
and GRS_k(a, 1) is equal to
{ ev_{k−1,a}(g(X)) | g(X) ∈ L_{k−1} } = { ev_{k−1,a}(f(l(X))) | f(X) ∈ L_{k−1} }.
Therefore GRS_k(a, b) = GRS_k(a′, b′).
Remark 8.1.26 ***Introduction of GRS with a_i = ∞ as in Remark 8.1.15. Refer to the forthcoming section on AG codes on the projective line.*** We leave the proof of the fact that we may assume furthermore a_3 = ∞ as an exercise to the reader. For this one has to consider the fractional transformations
(aX + b)/(cX + d)
with ad − bc ≠ 0. The set of fractional transformations with entries in a field F forms a group with the composition of maps as group operation; determine the product and the inverse.
Consider the map from GL(2, F) to the group of fractional transformations with entries in F defined by
( a  b )
( c  d )  →  (aX + b)/(cX + d).
Then this map is a morphism of groups and the kernel of this map consists of the diagonal matrices aI_2 with a ≠ 0.
Remark 8.1.27 ***Definition of the evaluation of a rational function.....
Let ϕ(X) be a fractional transformation, a ∈ F_q ∪ {∞} and f(X) ∈ F[X]. Then
ev_{k,ϕ(a)}(f(X)) = ev_{k,a}(f(ϕ(X))).
This follows straightforwardly from the definitions in case a is in F and a is not a zero of the denominator of ϕ(X).......***
***projective transformations of the projective line***
Proposition 8.1.28 Let n ≥ 3. Let a be an n-tuple of mutually distinct entries in F_q ∪ {∞}. Let b be an n-tuple of nonzero elements of F_q. Let i, j and l be three mutually distinct integers between 1 and n. Then there exist a b′ in F_q^n with nonzero entries and an n-tuple a′ consisting of mutually distinct entries in F_q ∪ {∞} such that a′_i = 0, a′_j = 1 and a′_l = ∞ and
GRS_k(a, b) = GRS_k(a′, b′).

Proof. This is shown similarly to the proof of Proposition 8.1.25, using fractional transformations instead, and is left as an exercise.

Now suppose that a generator matrix of the code GRS_k(a, b) is given. Is it possible to retrieve a and b? The pair (a, b) is not unique by the action of the fractional transformations. The following proposition gives an answer to this question.
Proposition 8.1.29 Let n ≥ 3. Let a and a′ be n-tuples with mutually distinct entries in F_q ∪ {∞}. Let b and b′ be n-tuples of nonzero elements of F_q. Let i, j and l be three mutually distinct integers between 1 and n. If a_i = a′_i, a_j = a′_j, a_l = a′_l and GRS_k(a, b) = GRS_k(a′, b′), then a = a′ and b = λb′ for some nonzero λ in F_q.
Proof. The generalized RS code is MDS, so it is systematic at the first k positions and it has a generator matrix of the form (I_k|P) such that the entries of P are nonzero. Let
c = (p_{11}, . . . , p_{k1}, 1, p_{k1}/p_{k2}, . . . , p_{k1}/p_{k(n−k)}).
Let G′ = c ∗ (I_k|P). Then G′ is the generator matrix of a generalized equivalent code C′. Dividing the i-th row of G′ by p_{i1} gives another generator matrix G′′ of the same code C′ such that the (k + 1)-th column of G′′ is the all-ones vector and the k-th row is of the form (0, . . . , 0, 1, 1, . . . , 1). So we may suppose without loss of generality that the generator matrix of the generalized RS code is of the form (I_k|P) with p_{i1} = 1 for all i = 1, . . . , k and p_{kj} = 1 for all j = 1, . . . , n − k.
After a permutation of the positions we may suppose without loss of generality that l = k, i = k + 1 and j = k + 2. After a fractional transformation we may assume that a_{k+1} = a′_{k+1} = 0, a_{k+2} = a′_{k+2} = 1 and a_k = a′_k = ∞ by Proposition 8.1.28.
Remark 8.1.24 gives that there exists an n-tuple c with nonzero entries in F_q such that
p_{ij} = c_{j+k}c_i^{−1} / (a_{j+k} − a_i)
for all 1 ≤ i ≤ k − 1 and 1 ≤ j ≤ n − k, and
p_{kj} = c_{j+k}c_k^{−1} for 1 ≤ j ≤ n − k.
Hence p_{kj} = c_{k+j}c_k^{−1} = 1. So c_{k+j} = c_k for all j = 1, . . . , n − k. Multiplying all entries of c by a nonzero constant gives the same code. Hence we may assume without loss of generality that c_{k+j} = c_k = 1 for all j = 1, . . . , n − k. Therefore c_j = 1 for all j ≥ k.
Let i < k. Then p_{i1} = c_{k+1}/(c_i(a_{k+1} − a_i)) = 1, c_{k+1} = 1 and a_{k+1} = 0. So p_{i1} = −1/(a_ic_i) = 1. Hence a_ic_i = −1.
Likewise p_{i2} = c_{k+2}/(c_i(a_{k+2} − a_i)), c_{k+2} = 1 and a_{k+2} = 1. So
p_{i2} = 1/((1 − a_i)c_i) = 1/(c_i + 1), since a_ic_i = −1.
Hence
c_i = (1 − p_{i2})/p_{i2} and a_i = p_{i2}/(p_{i2} − 1) for all i < k.
Finally p_{ij} = c_{k+j}/(c_i(a_{k+j} − a_i)) and c_{k+j} = 1. So a_{k+j} − a_i = 1/(c_ip_{ij}). Hence a_{k+j} = a_i − a_i/p_{ij}, since a_i = −1/c_i. Combining this with the expression for a_i gives
a_{j+k} = ( p_{i2}/(p_{i2} − 1) ) · ( (p_{ij} − 1)/p_{ij} ).
Therefore a and c are uniquely determined. So b is also uniquely determined, since
b_i = c_i / ∏_{t=1,t≠i}^{k−1}(a_i − a_t)  if 1 ≤ i ≤ k − 1,
b_k = c_k,
b_i = c_i / ∏_{t=1}^{k−1}(a_i − a_t)  if k + 1 ≤ i ≤ n,
by Remark 8.1.24.
***
- PAut(GRS_k(a, b)) = . . . and MAut(GRS_k(a, b)) = . . . .
- What is the number of GRS codes?
***
Example 8.1.30 Let G be the generator matrix of a generalized Reed-Solomon code C with entries in F_7 given by
G =
[ 6 1 1 6 2 2 3 ]
[ 3 4 1 1 5 4 3 ]
[ 1 0 3 3 6 0 1 ].
Then rref(G) = (I_3|A) with
A =
[ 1 3 3 6 ]
[ 4 4 6 6 ]
[ 3 1 6 3 ].
So we want to find a vector a consisting of mutually distinct entries in F_7 ∪ {∞} and b in F_7^7 with nonzero entries such that C = GRS_3(a, b). Now C′ = (1, 4, 3, 1, 5, 5, 6) ∗ C has a generator matrix of the form (I_3|A′) with
A′ =
[ 1 1 1 1 ]
[ 1 5 4 2 ]
[ 1 4 3 6 ].
We may assume without loss of generality that a_4 = 0, a_5 = 1 and a_3 = ∞ by Proposition 8.1.28. ***...............***
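The row reduction in this example is easy to reproduce. The sketch below is our own code (the function rref is a plain Gauss–Jordan elimination over F_7, not taken from the text); it recomputes rref(G) = (I_3|A).

```python
q = 7
G = [[6, 1, 1, 6, 2, 2, 3],
     [3, 4, 1, 1, 5, 4, 3],
     [1, 0, 3, 3, 6, 0, 1]]

def rref(M, q):
    """Row reduced echelon form over the prime field F_q."""
    M = [row[:] for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] % q), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        inv = pow(M[r][c], q - 2, q)          # pivot inverse via Fermat
        M[r] = [x * inv % q for x in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] % q:
                f = M[i][c]
                M[i] = [(x - f * y) % q for x, y in zip(M[i], M[r])]
        r += 1
    return M

for row in rref(G, q):
    print(row)     # (I_3 | A) with A as in the example
```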
8.1.4 Exercises
8.1.1 Show that in RS_3(7, 1) the generating codeword g_{1,3}(X) is equal to
α ev(1) + α^5 ev(X) + α^4 ev(X^2).
8.1.2 Compute the parity check polynomial of RS_3(7, 1) and the generator polynomial of RS_3(7, 1)^⊥ by means of Proposition 7.1.37, and verify that it is equal to g_{0,4}(X) according to Proposition 8.1.2.
8.1.3 Give the generator matrix of RS_4(7, 1) of the form (I_4|P), where P is a 4 × 3 matrix.
8.1.4 Show directly, that is without the use of Proposition 8.1.4, that the code { ev(X^{n−b+1}f(X)) | deg(f) < k } is cyclic.
8.1.5 Give another proof of the fact in Proposition 8.1.2 that the dual of RS_k(n, b) is equal to RS_{n−k}(n, n − b + 1), using the description with evaluations of Proposition 8.1.4 and the fact that the inner product of codewords of the two codes is zero.
8.1.6 Let n = q − 1. Let a_1, . . . , a_n be an enumeration of the elements of F_q^*. Show that ∏_{i≠j}(a_j − a_i) = −1/a_j for all j.
8.1.7 Consider α ∈ F_8 with α^3 = 1 + α. Let a = (a_1, . . . , a_7) with a_i = α^{i−1} for 1 ≤ i ≤ 7. Let b = (1, α^2, α^4, α^2, 1, 1, α^4). Find c such that the dual of GRS_k(a, b) is equal to GRS_{7−k}(a, c) for all k.
8.1.8 Determine all values of n, k and b such that RS_k(n, b) is self-dual.
8.1.9 Give a proof of Corollary 8.1.18.
8.1.10 Let n ≤ q. Let a be an n-tuple of mutually distinct elements of Fq, and
r an n-tuple of nonzero elements of Fq. Let k be an integer such that 0 ≤ k ≤ n.
Show that the generalized Cauchy code Ck(a, r) is equal to r ∗ Ck(a).
8.1.11 Give a proof of statements made in Remark 8.1.26.
8.1.12 Let u, v and w be three mutually distinct elements of a field F. Show
that there is a unique fractional transformation ϕ such that ϕ(u) = 0, ϕ(v) = 1
and ϕ(w) = ∞.
8.1.13 Give a proof of Proposition 8.1.28.
8.1.14 Let α ∈ F_8 be a primitive element such that α^3 = α + 1. Let G be the generator matrix of a generalized Reed-Solomon code given by
G =
[ α^6  α^6  α    1    α^4  1    α^4 ]
[ 0    α^3  α^3  α^4  α^6  α^6  α^4 ]
[ α^4  α^5  α^3  1    α^2  0    α^6 ].
(1) Find a in F_8^7 consisting of mutually distinct entries and b in F_8^7 with nonzero entries such that G is a generator matrix of GRS_3(a, b).
(2) Consider the 3 × 7 generator matrix G′ of the code RS_3(7, 1) with entry α^{(i−1)(j−1)} in the i-th row and the j-th column. Give an invertible 3 × 3 matrix S and a permutation matrix P such that G′ = SGP.
(3) What is the number of pairs (S, P) of such matrices?
8.2 Subfield and trace codes
***
8.2.1 Restriction and extension by scalars
In this section we derive bounds on the parameters of subfield subcodes. We repeat Definitions 4.4.32 and 7.3.1.

Definition 8.2.1 Let D be an F_q-linear code in F_q^n. Let C be an F_{q^m}-linear code of length n. If D = C ∩ F_q^n, then D is called the subfield subcode or the restriction (by scalars) of C, and is denoted by C|F_q. If D ⊆ C, then C is called a super code of D. If C is generated as an F_{q^m}-linear space by D, then C is called the extension (by scalars) of D and is denoted by D ⊗ F_{q^m}.
Proposition 8.2.2 Let G be a generator matrix with entries in F_q. Let D and C be the F_q-linear and the F_{q^m}-linear code, respectively, with G as generator matrix. Then
(D ⊗ F_{q^m}) = C and (C|F_q) = D.

Proof. Let G be a generator matrix of the F_q-linear code D. Then G is also a generator matrix of D ⊗ F_{q^m} by Remark 4.4.33. Hence (D ⊗ F_{q^m}) = C.
Now D is contained in C and in F_q^n. Hence D ⊆ (C|F_q). Conversely, suppose that c ∈ (C|F_q). Then c ∈ F_q^n and c = xG for some x ∈ F_{q^m}^k. After a permutation of the coordinates we may assume without loss of generality that G = (I_k|A) for some k × (n − k) matrix A with entries in F_q. Therefore (x, xA) = xG = c ∈ F_q^n. Hence x ∈ F_q^k and c ∈ D.
Remark 8.2.3 Similar statements hold as in Proposition 8.2.2 with a parity check matrix H instead of a generator matrix G.

Remark 8.2.4 Let D be a cyclic code of length n over F_q with defining set I. Suppose that gcd(n, q) = 1 and n divides q^m − 1. Let α in F_{q^m}^* have order n. Let D̃ be the F_{q^m}-linear cyclic code with parity check matrix H̃ = (α^{ij} | i ∈ I, j = 0, . . . , n − 1). Then D is the restriction of D̃ by Remark 7.3.2. So (D ⊗ F_{q^m}) ⊆ D̃ and ((D ⊗ F_{q^m})|F_q) = (D̃|F_q) = D. If α is not an element of F_q, then H̃ is not defined over F_q, the analogous statement of Proposition 8.2.2 as mentioned in Remark 8.2.3 does not apply, and (D ⊗ F_{q^m}) can be a proper subcode of D̃.
We will see that H̃ is row equivalent over F_{q^m} with a matrix H with entries in F_q, and that (D ⊗ F_{q^m}) = D̃ if I is the complete defining set of D.
8.2.2 Parity check matrix of a restricted code

Lemma 8.2.5 Let h_1, . . . , h_n ∈ F_{q^m}. Let α_1, . . . , α_m be a basis of F_{q^m} over F_q. Then there exist unique elements h_{ij} ∈ F_q such that
h_j = ∑_{i=1}^{m} h_{ij}α_i.
Furthermore, for all x ∈ F_q^n,
∑_{j=1}^{n} h_jx_j = 0
if and only if
∑_{j=1}^{n} h_{ij}x_j = 0 for all i = 1, . . . , m.

Proof. The existence and uniqueness of the h_{ij} is a consequence of the assumption that α_1, . . . , α_m is a basis of F_{q^m} over F_q. Let x ∈ F_q^n. Then
∑_{j=1}^{n} h_jx_j = ∑_{j=1}^{n} ( ∑_{i=1}^{m} h_{ij}α_i ) x_j = ∑_{i=1}^{m} ( ∑_{j=1}^{n} h_{ij}x_j ) α_i.
The α_i form a basis over F_q and the x_j are elements of F_q. This implies the statement on the equivalence of the linear equations.
Proposition 8.2.6 Let E = (h_1, . . . , h_n) be a 1 × n parity check matrix of the F_{q^m}-linear code C. Let l be the dimension of the F_q-linear subspace in F_{q^m} generated by h_1, . . . , h_n. Then the dimension of C|F_q is equal to n − l.

Proof. Let H be the m × n matrix with entries h_{ij} as given in Lemma 8.2.5. Then (h_{1j}, . . . , h_{mj}) are the coordinates of h_j with respect to the basis α_1, . . . , α_m of F_{q^m} over F_q. So the rank of H is equal to l. The code C|F_q is the null space of the matrix H by Lemma 8.2.5, and has dimension n − rank(H), which is n − l.
Example 8.2.7 Let α ∈ F_9 be a primitive element such that α^2 + α − 1 = 0. Choose the basis α_1 = 1, α_2 = α. Consider the parity check matrix
E = ( 1  α  α^2  α^3  α^4  α^5  α^6  α^7 )
of the F_9-linear code C. Then according to Lemma 8.2.5 the parity check matrix H of C|F_3 is given by
H =
[ 1 0 1 2 2 0 2 1 ]
[ 0 1 2 2 0 2 1 1 ].
For instance α^3 = −1 − α, so α^3 has coordinates (−1, −1) with respect to the chosen basis, and the transpose of this vector is the 4-th column of H. The entries of the row E generate F_9 over F_3. The rank of H is 2, so the dimension of C|F_3 is 6. This is in agreement with Proposition 8.2.6.
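The coordinate expansion of Lemma 8.2.5 used in this example can be carried out mechanically. The following sketch is our own (the encoding of F_9 as pairs (c_0, c_1) meaning c_0 + c_1α, with α^2 = 1 − α ≡ 1 + 2α over F_3, is an assumption of the sketch); it recomputes the matrix H.

```python
def mul_alpha(c):
    """Multiply c0 + c1*alpha by alpha, using alpha^2 = 1 + 2*alpha over F_3."""
    c0, c1 = c
    return (c1 % 3, (c0 + 2 * c1) % 3)

E = [(1, 0)]                     # E = (1, alpha, alpha^2, ..., alpha^7)
for _ in range(7):
    E.append(mul_alpha(E[-1]))

H = [[e[i] for e in E] for i in range(2)]   # i-th coordinate of each entry
for row in H:
    print(row)
# [1, 0, 1, 2, 2, 0, 2, 1]
# [0, 1, 2, 2, 0, 2, 1, 1]
```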
Lemma 8.2.5 has the following consequence.
Proposition 8.2.8 Let D be an F_q-linear code of length n and dimension k. Let m = n − k. If k < n, then D is the restriction of a code C over F_{q^m} of codimension one.

Proof. Let H = (h_{ij}) be an (n − k) × n parity check matrix of D over F_q. Let m = n − k. Let k < n. Then m > 0. Let α_1, . . . , α_m be a basis of F_{q^m} over F_q. Define for j = 1, . . . , n
h_j = ∑_{i=1}^{m} h_{ij}α_i.
Let E = (h_1, . . . , h_n) be the 1 × n parity check matrix of the F_{q^m}-linear code C. Now E is not the zero vector, since k < n. So C has codimension one, and D is the restriction of C by Lemma 8.2.5.
Proposition 8.2.9 Let C be an F_{q^m}-linear code with parameters [n, k, d]_{q^m}. Then the dimension of C|F_q over F_q is at least n − m(n − k) and its minimum distance is at least d.
Proof. The minimum distance of C|Fq is at least the minimum distance of C,
since C|Fq is a subset of C.
Let E be a parity check matrix of C. Then E consists of n − k rows. Every
row gives rise to m linear equations over Fq by Lemma 8.2.5. So C|Fq is the
solution space of m(n−k) homogeneous linear equations over Fq. Therefore the
dimension of C|Fq is at least n − m(n − k).
Remark 8.2.10 ***Lower bound of Delsarte-Sidelnikov***
8.2.3 Invariant subspaces

Remark 8.2.11 Let D be the restriction of an F_{q^m}-linear code C. Suppose that h = (h_1, . . . , h_n) ∈ F_{q^m}^n is a parity check for D. So
h_1c_1 + · · · + h_nc_n = 0 for all c ∈ D.
Then
∑_{i=1}^{n} h_i^q c_i = ∑_{i=1}^{n} h_i^q c_i^q = ( ∑_{i=1}^{n} h_ic_i )^q = 0
for all c ∈ D, since c_i^q = c_i for all i and c ∈ D. Hence (h_1^q, . . . , h_n^q) is also a parity check for the code D.
Example 8.2.12 This is a continuation of Example 8.2.7. Consider the parity check matrix
E′ =
[ 1  α    α^2  α^3  α^4  α^5  α^6  α^7 ]
[ 1  α^3  α^6  α    α^4  α^7  α^2  α^5 ]
of the F_9-linear code C′. Let D′ be the ternary restriction of C′. Then according to Proposition 8.2.6 the code D′ is the null space of the matrix H′ given by
H′ =
[ 1 0 1 2 2 0 2 1 ]
[ 0 1 2 2 0 2 1 1 ]
[ 1 2 2 0 2 1 1 0 ]
[ 0 2 1 1 0 1 2 2 ].
The second row of E′ is obtained by taking the third power of the entries of the first row. So D′ = D by Remark 8.2.11. Indeed, the last two rows of H′ are linear combinations of the first two rows. Hence H and H′ have the same rank, that is 2.
Definition 8.2.13 Extend the Frobenius map ϕ : F_{q^m} → F_{q^m}, defined by ϕ(x) = x^q, to the map ϕ : F_{q^m}^n → F_{q^m}^n defined by ϕ(x) = (x_1^q, . . . , x_n^q). Likewise we define ϕ(G) of a matrix G with entries (g_{ij}) to be the matrix with entries (ϕ(g_{ij})).

Remark 8.2.14 The map ϕ : F_{q^m}^n → F_{q^m}^n has the property that
ϕ(αx + βy) = α^q ϕ(x) + β^q ϕ(y)
for all α, β ∈ F_{q^m} and x, y ∈ F_{q^m}^n. Hence this map is F_q-linear, since α^q = α and β^q = β for α, β ∈ F_q. In particular, the Frobenius map is an automorphism of the field F_{q^m} with F_q as the field of elements that are fixed point-wise. Therefore it also leaves the points of F_q^n point-wise fixed. If x ∈ F_{q^m}^n, then
ϕ(x) = x if and only if x ∈ F_q^n.
Furthermore ϕ is an isometry.
Definition 8.2.15 Let F be a subfield of G. The Galois group Gal(G/F) is the group of all field automorphisms of G that leave F point-wise fixed. Gal(G/F) is denoted by Gal(q^m, q) in case F = F_q and G = F_{q^m}.
A subspace W of F_{q^m}^n is called Gal(q^m, q)-invariant, or just invariant, if τ(W) = W for all τ ∈ Gal(q^m, q).
Remark 8.2.16 Gal(q^m, q) is a cyclic group of order m, generated by ϕ. Hence a subspace W is invariant if and only if ϕ(W) ⊆ W.
The following two lemmas are similar to the statements for the shift operator in connection with cyclic codes in Propositions 7.1.3 and 7.1.6, but now for the Frobenius map.
Lemma 8.2.17 Let G be a k × n generator matrix of the F_{q^m}-linear code C. Let g_i be the i-th row of G. Then C is Gal(q^m, q)-invariant if and only if ϕ(g_i) ∈ C for all i = 1, . . . , k.

Proof. If C is invariant, then ϕ(g_i) ∈ C for all i, since g_i ∈ C.
Conversely, suppose that ϕ(g_i) ∈ C for all i. Let c ∈ C. Then c = ∑_{i=1}^{k} x_ig_i for some x_i ∈ F_{q^m}. So
ϕ(c) = ∑_{i=1}^{k} x_i^q ϕ(g_i) ∈ C.
Hence C is an invariant code.
Lemma 8.2.18 Let C be an F_{q^m}-linear code. Then C^⊥ is invariant if C is invariant.

Proof. Notice that
ϕ(x · y) = ( ∑_{i=1}^{n} x_iy_i )^q = ∑_{i=1}^{n} x_i^q y_i^q = ϕ(x) · ϕ(y)
for all x, y ∈ F_{q^m}^n. Suppose that C is an invariant code. Let y ∈ C^⊥ and c ∈ C. Then ϕ^{m−1}(c) ∈ C. Hence
ϕ(y) · c = ϕ(y) · ϕ^m(c) = ϕ(y · ϕ^{m−1}(c)) = ϕ(0) = 0.
Therefore ϕ(y) ∈ C^⊥ for all y ∈ C^⊥, and C^⊥ is invariant.
Proposition 8.2.19 Let C be an F_{q^m}-linear code of length n. Then C is Gal(q^m, q)-invariant if and only if C has a generator matrix with entries in F_q, if and only if C has a parity check matrix with entries in F_q.

Proof. If C has a generator matrix with entries in F_q, then clearly C is invariant.
Now conversely, suppose that C is invariant. Let G be a k × n generator matrix of C. We may assume without loss of generality that the first k columns are independent. So after applying the Gauss algorithm we get the row reduced echelon form G′ of G with the k × k identity matrix I_k in the first k columns. So G′ = (I_k|A), where A is a k × (n − k) matrix. Let g_i be the i-th row of G′. Now C is invariant. So ϕ(g_i) ∈ C and ϕ(g_i) is an F_{q^m}-linear combination of the g_j. That is, one can find elements s_{ij} in F_{q^m} such that
ϕ(g_i) = ∑_{j=1}^{k} s_{ij}g_j.
Let S be the k × k matrix with entries (s_{ij}). Then
(I_k|ϕ(A)) = (ϕ(I_k)|ϕ(A)) = ϕ(G′) = SG′ = S(I_k|A) = (S|SA).
Therefore I_k = S and ϕ(A) = SA = A. Hence the entries of A are elements of F_q. So G′ is a generator matrix of C with entries in F_q.
The last equivalence is a consequence of Proposition 2.3.3.
Example 8.2.20 Let α ∈ F_8 be a primitive element such that α^3 = α + 1. Let G be the generator matrix of the F_8-linear code C with
G =
[ 1  α    α^2  α^3  α^4  α^5  α^6 ]
[ 1  α^2  α^4  α^6  α    α^3  α^5 ]
[ 1  α^4  α    α^5  α^2  α^6  α^3 ].
Let g_i be the i-th row of G. Then ϕ(g_i) = g_{i+1} for i = 1, 2 and ϕ(g_3) = g_1. Hence C is an invariant code by Lemma 8.2.17. The proof of Proposition 8.2.19 explains how to get a generator matrix G′ with entries in F_2. Let G′ be the row reduced echelon form of G. Then
G′ =
[ 1 0 0 1 0 1 1 ]
[ 0 1 0 1 1 1 0 ]
[ 0 0 1 0 1 1 1 ]
is indeed a binary matrix. In fact it is the generator matrix of the binary [7, 3, 4] simplex code, the dual of the Hamming code.
Definition 8.2.21 Let C be an F_{q^m}-linear code. Define the codes C^0 and C^* by
C^0 = ∩_{i=1}^{m} ϕ^i(C),
C^* = ∑_{i=1}^{m} ϕ^i(C).

Remark 8.2.22 It is clear from the definitions that the codes C^0 and C^* are Gal(q^m, q)-invariant. Furthermore C^0 is the largest invariant code contained in C, that is, if D is an invariant code and D ⊆ C, then D ⊆ C^0. And similarly, C^* is the smallest invariant code containing C, that is, if D is an invariant code and C ⊆ D, then C^* ⊆ D.
Proposition 8.2.23 Let C be an F_{q^m}-linear code. Then
C^0 = ((C^⊥)^*)^⊥.

Proof. The following inclusion holds: C^0 ⊆ C. So dually C^⊥ ⊆ (C^0)^⊥. Now C^0 is invariant. So (C^0)^⊥ is invariant by Lemma 8.2.18, and it contains C^⊥. By Remark 8.2.22, (C^⊥)^* is the smallest invariant code containing C^⊥. Hence (C^⊥)^* ⊆ (C^0)^⊥ and therefore
C^0 ⊆ ((C^⊥)^*)^⊥.
We have C^⊥ ⊆ (C^⊥)^*. So dually ((C^⊥)^*)^⊥ ⊆ C. The code ((C^⊥)^*)^⊥ is invariant and is contained in C. The largest code that is invariant and contained in C is equal to C^0. Hence
((C^⊥)^*)^⊥ ⊆ C^0.
Both inclusions give the desired equality.
Theorem 8.2.24 Let C be an F_{q^m}-linear code. Then C and C^0 have the same restriction. Furthermore
dim_{F_q}(C|F_q) = dim_{F_{q^m}}(C^0) and d(C|F_q) = d(C^0).

Proof. The inclusion C^0 ⊆ C implies (C^0|F_q) ⊆ (C|F_q).
The code (C|F_q) ⊗ F_{q^m} is contained in C and is invariant. Hence
(C|F_q) ⊗ F_{q^m} ⊆ C^0,
by Remark 8.2.22. So (((C|F_q) ⊗ F_{q^m})|F_q) ⊆ (C^0|F_q). But
(C|F_q) = (((C|F_q) ⊗ F_{q^m})|F_q),
by Proposition 8.2.2 applied to D = (C|F_q). Therefore (C|F_q) ⊆ (C^0|F_q), and with the converse inclusion above we get the desired equality (C|F_q) = (C^0|F_q).
The code C^0 has a k × n generator matrix G with entries in F_q by Proposition 8.2.19, since C^0 is an invariant code. Then G is also a generator matrix of (C^0|F_q) by Proposition 8.2.2. Furthermore (C|F_q) = (C^0|F_q). Therefore
dim_{F_q}(C|F_q) = k = dim_{F_{q^m}}(C^0).
The code C^0 has a parity check matrix H with entries in F_q by Proposition 8.2.19. Then H is also a parity check matrix of (C|F_q) over F_q. The minimum distance of a code can be expressed as the minimal number of columns in a parity check matrix that are dependent, by Proposition 2.3.11. Consider an l × m matrix B with entries in F_q. Then the columns of B are dependent if and only if rank(B) < m. The rank of B is equal to the number of pivots in the row reduced echelon form of B. The row reduced echelon form of B is unique, by Remark 2.2.18, and does not change by considering B as a matrix with entries in F_{q^m}. Therefore d(C|F_q) = d(C^0).
Remark 8.2.25 Lemma 8.2.5 gives us a method to compute the parity check matrix of the restriction. Proposition 8.2.23 and Theorem 8.2.24 give us another way to compute the parity check and generator matrix of the restriction of a code. Let C be an F_{q^m}-linear code. Let H be a parity check matrix of C. Then H is a generator matrix of C^⊥. Let h_i, i = 1, . . . , n − k, be the rows of H. Let H^* be the matrix with the (n − k)m rows ϕ^j(h_i), i = 1, . . . , n − k, j = 1, . . . , m. Then these rows generate (C^⊥)^*. Let H_0 be the row reduced echelon form of H^* with the zero rows deleted. Then H_0 has entries in F_q and is a generator matrix of (C^⊥)^*, since it is an invariant code. So H_0 is the parity check matrix of ((C^⊥)^*)^⊥ = C^0. Hence it is also the parity check matrix of (C^0|F_q) = (C|F_q).
Example 8.2.26 Consider the parity check matrix E of Example 8.2.7. Then E^* is equal to the matrix E′ of Example 8.2.12. Taking the row reduced echelon form of E^* gives indeed the parity check matrix H obtained in Example 8.2.7.
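The procedure of Remark 8.2.25 can be illustrated on this example. The sketch below is our own: it re-implements F_9 arithmetic on pairs (c_0, c_1) meaning c_0 + c_1α with α^2 = 1 + 2α over F_3, builds E^* from the row E and its Frobenius image, and row reduces it over F_9. By Theorem 8.2.24 the result has entries in F_3 and should agree with the matrix H of Example 8.2.7.

```python
ELEMS = [(i, j) for i in range(3) for j in range(3)]

def mul(a, b):
    a0, a1 = a
    b0, b1 = b                    # alpha^2 = 1 + 2*alpha over F_3
    return ((a0 * b0 + a1 * b1) % 3, (a0 * b1 + a1 * b0 + 2 * a1 * b1) % 3)

def add(a, b): return ((a[0] + b[0]) % 3, (a[1] + b[1]) % 3)
def neg(a): return ((-a[0]) % 3, (-a[1]) % 3)
def inv(a): return next(x for x in ELEMS if mul(a, x) == (1, 0))

frob = lambda a: mul(mul(a, a), a)        # the Frobenius map x -> x^3

E = [(1, 0)]
for _ in range(7):
    E.append(mul(E[-1], (0, 1)))          # E = (1, alpha, ..., alpha^7)
Estar = [list(E), [frob(x) for x in E]]   # rows h and phi(h) of E*

r = 0                                     # Gaussian elimination over F_9
for c in range(8):
    piv = next((i for i in range(r, 2) if Estar[i][c] != (0, 0)), None)
    if piv is None:
        continue
    Estar[r], Estar[piv] = Estar[piv], Estar[r]
    t = inv(Estar[r][c])
    Estar[r] = [mul(t, x) for x in Estar[r]]
    for i in range(2):
        if i != r and Estar[i][c] != (0, 0):
            f = neg(Estar[i][c])
            Estar[i] = [add(Estar[i][j], mul(f, Estar[r][j])) for j in range(8)]
    r += 1

for row in Estar:                 # second coordinates are 0: entries lie in F_3
    print([x[0] for x in row])    # the two rows agree with H of Example 8.2.7
```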
8.2.4 Cyclic codes as subfield subcodes
***
8.2.5 Trace codes

Definition 8.2.27 The trace map Tr^{F_{q^m}}_{F_q} : F_{q^m} → F_q is defined by
Tr^{F_{q^m}}_{F_q}(x) = x + x^q + · · · + x^{q^{m−1}}
for x ∈ F_{q^m}. The notation Tr^{F_{q^m}}_{F_q} is abbreviated to Tr in case the context is clear. This map is extended coordinatewise to a map Tr : F_{q^m}^n → F_q^n.

Remark 8.2.28 Let F be a field and G a finite field extension of F of degree m. Then G is a vector space over F of dimension m. Choose a basis of G over F. Let x ∈ G. Then multiplication by x on G is an F-linear map. Let M_x be the corresponding matrix of this map with respect to the chosen basis. The sum of the diagonal elements of M_x is called the trace of x. This trace does not depend on the chosen basis and will be denoted by Tr^G_F(x), or by Tr(x) for short.
Definition 8.2.27 of the trace for a finite extension of a finite field is an ad hoc definition. With the above generalization of the definition of the trace the ad hoc definition becomes a property.
The maps Tr : F_{q^m} → F_q and Tr : F_{q^m}^n → F_q^n are F_q-linear.
Proposition 8.2.29 (Delsarte-Sidelnikov) Let C be an F_{q^m}-linear code. Then
(C^⊥ ∩ F_q^n)^⊥ = Tr(C).

Proof. ***
8.2.6 Exercises
8.2.1 Let α ∈ F_16 be a primitive element such that α^4 = α + 1. Choose α_i = α^i with i = 0, 1, 2, 3 as basis. Consider the parity check matrix
E =
[ 1  α    α^2  α^3  · · ·  α^14 ]
[ 1  α^2  α^4  α^6  · · ·  α^13 ]
of the F_16-linear code C. Let E′ be the 1 × 15 submatrix of E consisting of the first row of E. Let C′ be the F_16-linear code with E′ as parity check matrix. Determine the parity check matrices H of C|F_2 and H′ of C′|F_2, using Lemma 8.2.5 and Proposition 8.2.9. Show that H = H′.
8.2.2 Let α ∈ F_16 be a primitive element such that α^4 = α + 1. Give a binary parity check matrix of the binary restriction of the code RS_4(15, 0). Determine the dimension of the binary restriction of the code RS_k(15, 0) for all k.
8.2.3 Let α ∈ F_16 be a primitive element such that α^4 = α + 1. Let G be the 4 × 15 matrix with entry g_{ij} = α^{j2^i} at the i-th row and the j-th column. Let C be the code with generator matrix G. Show that C is Gal(16, 2)-invariant and give a binary generator matrix of C.
8.2.4 Let m be a positive integer and C an F_{q^m}-linear code. Let ϕ be the Frobenius map of F_{q^m} fixing F_q. Show that ϕ(C) is an F_{q^m}-linear code that is isometric with C. Give a counterexample of a code C that is not monomially equivalent to ϕ(C).
8.2.5 Give proofs of the statements made in Remark 8.2.28.
8.3 Some families of polynomial codes
***
8.3.1 Alternant codes

Definition 8.3.1 Let a = (a_1, . . . , a_n) be an n-tuple of n distinct elements of F_{q^m}. Let b = (b_1, . . . , b_n) be an n-tuple of nonzero elements of F_{q^m}. Let GRS_r(a, b) be the generalized RS code over F_{q^m} of dimension r. The alternant code ALT_r(a, b) is the F_q-linear restriction of (GRS_r(a, b))^⊥.
Proposition 8.3.2 The code ALT_r(a, b) has parameters [n, k, d]_q with
k ≥ n − mr and d ≥ r + 1.

Proof. The code (GRS_r(a, b))^⊥ is equal to GRS_{n−r}(a, c) with c_j = 1/(b_j ∏_{i≠j}(a_j − a_i)) by Proposition 8.1.21, and has parameters [n, n − r, r + 1]_{q^m} by Proposition 8.1.14. So the statement is a consequence of Proposition 8.2.9.
Proposition 8.3.3 We have that
(ALT_r(a, b))^⊥ = Tr(GRS_r(a, b)).

Proof. This is a direct consequence of the definition of an alternant code and Proposition 8.2.29.
Proposition 8.3.4 Every linear code of minimum distance at least 2 is an alternant code.

Proof. Let C be a code of length n and dimension k. Then k < n, since the minimum distance of C is at least 2. Let m be a positive integer such that n − k divides m and q^m ≥ n. Let a = (a_1, . . . , a_n) be any n-tuple of n distinct elements of F_{q^m}. Let H be an (n − k) × n parity check matrix of C over F_q. Following the proof of Proposition 8.2.8, let α_1, . . . , α_{n−k} be a basis of F_{q^{n−k}} over F_q. The field F_{q^m} is an extension of F_{q^{n−k}}, since n − k divides m. Define b_j = ∑_{i=1}^{n−k} h_{ij}α_i for j = 1, . . . , n. The minimum distance of C is at least 2, so H does not contain a zero column by Proposition 2.3.11. Hence b_j ≠ 0 for all j. Let b = (b_1, . . . , b_n). Then C is the restriction of GRS_1(a, b)^⊥. Therefore C = ALT_1(a, b) by definition.
Remark 8.3.5 The above proposition gives that almost all linear codes are alternant, but it gives no useful information about the parameters of the code.
***Alternant codes meet the GV bound (MacWilliams & Sloane, page 337).
BCH codes are not asymptotically good??***
8.3.2 Goppa codes
A special class of alternant codes is given by Goppa codes.

Definition 8.3.6 Let L = (a_1, . . . , a_n) be an n-tuple of n distinct elements of F_{q^m}. A polynomial g with coefficients in F_{q^m} such that g(a_j) ≠ 0 for all j is called a Goppa polynomial with respect to L. Define the F_q-linear Goppa code Γ(L, g) by
Γ(L, g) = { c ∈ F_q^n | ∑_{j=1}^{n} c_j/(X − a_j) ≡ 0 mod g(X) }.
Remark 8.3.7 The assumption g(a_j) ≠ 0 implies that X − a_j and g(X) are relatively prime, so their greatest common divisor is 1. Euclid's algorithm gives polynomials P_j and Q_j such that P_j(X)g(X) + Q_j(X)(X − a_j) = 1. So Q_j(X) is the inverse of X − a_j modulo g(X). We claim that
Q_j(X) = − ( (g(X) − g(a_j)) / (X − a_j) ) g(a_j)^{−1}.
Notice that g(X) − g(a_j) has a_j as a zero. So g(X) − g(a_j) is divisible by X − a_j and the quotient is a polynomial of degree one less than the degree of g(X). With the above definition of Q_j we get
Q_j(X)(X − a_j) = −(g(X) − g(a_j))g(a_j)^{−1} = 1 − g(X)g(a_j)^{−1} ≡ 1 mod g(X).
Remark 8.3.8 Let g1 and g2 be two Goppa polynomials with respect to L. If
g2 divides g1, then Γ(L, g1) is a subcode of Γ(L, g2).
Proposition 8.3.9 Let L = a = (a_1, . . . , a_n). Let g be a Goppa polynomial of degree r. The Goppa code Γ(L, g) is equal to the alternant code ALT_r(a, b), where b_j = 1/g(a_j).

Proof. Remark 8.3.7 implies that c ∈ Γ(L, g) if and only if
∑_{j=1}^{n} c_j ( (g(X) − g(a_j)) / (X − a_j) ) g(a_j)^{−1} = 0,
since the left hand side is a polynomial of degree strictly smaller than the degree of g(X), and this polynomial is 0 if and only if it is 0 modulo g(X). Let g(X) = g_0 + g_1X + · · · + g_rX^r. Then
(g(X) − g(a_j)) / (X − a_j) = ∑_{l=0}^{r} g_l (X^l − a_j^l)/(X − a_j) = ∑_{l=0}^{r} g_l ∑_{i=0}^{l−1} X^i a_j^{l−1−i} = ∑_{i=0}^{r−1} ( ∑_{l=i+1}^{r} g_l a_j^{l−1−i} ) X^i.
Therefore c ∈ Γ(L, g) if and only if
∑_{j=1}^{n} ( ∑_{l=i+1}^{r} g_l a_j^{l−1−i} ) g(a_j)^{−1} c_j = 0
for all i = 0, . . . , r − 1, if and only if H_1c^T = 0, where H_1 is the r × n parity check matrix with j-th column
( g_ra_j^{r−1} + g_{r−1}a_j^{r−2} + · · · + g_2a_j + g_1, . . . , g_ra_j^2 + g_{r−1}a_j + g_{r−2}, g_ra_j + g_{r−1}, g_r )^T g(a_j)^{−1}.
The coefficient g_r is not zero, since g(X) has degree r. Divide the last row of H_1 by g_r. Then subtract g_{r−1} times the r-th row from row r − 1. Next divide row r − 1 by g_r. Continuing in this way with a sequence of elementary row operations, it is shown that H_1 is row equivalent with the matrix H_2 with entry a_j^{i−1}g(a_j)^{−1} in the i-th row and the j-th column. So H_2 is the generator matrix of GRS_r(a, b), where b = (b_1, . . . , b_n) and b_j = 1/g(a_j). Hence Γ(L, g) is the restriction of GRS_r(a, b)^⊥. Therefore Γ(L, g) = ALT_r(a, b) by definition.
Proposition 8.3.10 Let g be a Goppa polynomial of degree r over Fqm . Then
the Goppa code Γ(L, g) is an [n, k, d] code with
k ≥ n − mr and d ≥ r + 1.
Proof. This is a consequence of Proposition 8.3.9 showing that a Goppa code
is an alternant code and Proposition 8.3.2 on the parameters of alternant codes.
Remark 8.3.11 Let g be a Goppa polynomial of degree r over F_{q^m}. Then the Goppa code Γ(L, g) has minimum distance d ≥ r + 1 by Proposition 8.3.10. It is an alternant code, that is, a subfield subcode of a GRS code of minimum distance r + 1, by Proposition 8.3.9. This super code has several efficient decoding algorithms that correct ⌊r/2⌋ errors. The same algorithms can be applied to the Goppa code to correct ⌊r/2⌋ errors.
Definition 8.3.12 A polynomial is called square free if all (irreducible) factors
have multiplicity one.
Remark 8.3.13 Notice that irreducible polynomials are square free Goppa polynomials. If g(X) is a square free Goppa polynomial, then g(X) and its formal derivative g′(X) have no common factor by Lemma 7.2.8.
Proposition 8.3.14 Let g be a square free Goppa polynomial with coefficients in F_{2^m}. Then the binary Goppa code Γ(L, g) is equal to Γ(L, g^2).

Proof. (1) The code Γ(L, g^2) is a subcode of Γ(L, g), by Remark 8.3.8.
(2) Let c be a binary word. Define the polynomial f(X) by
f(X) = ∏_{j=1}^{n} (X − a_j)^{c_j}.
So f(X) is the reciprocal locator polynomial of c; it is the monic polynomial of degree wt(c) whose zeros are located at those a_j such that c_j ≠ 0. Now
f′(X) = ∑_{j=1}^{n} c_j(X − a_j)^{c_j−1} ∏_{l=1,l≠j}^{n} (X − a_l)^{c_l}.
Hence
f′(X)/f(X) = ∑_{j=1}^{n} c_j/(X − a_j).
Let c ∈ Γ(L, g). Then f′(X)/f(X) ≡ 0 mod g(X). Now gcd(f(X), g(X)) = 1. So there exist polynomials p(X) and q(X) such that p(X)f(X) + q(X)g(X) = 1. Hence
p(X)f′(X) ≡ f′(X)/f(X) ≡ 0 mod g(X).
Therefore g(X) divides f′(X), since gcd(p(X), g(X)) = 1.
Let f(X) = f_0 + f_1X + · · · + f_nX^n. Then
f′(X) = ∑_{i=0}^{n} if_iX^{i−1} = ∑_{i=0}^{⌊n/2⌋} f_{2i+1}X^{2i} = ( ∑_{i=0}^{⌊n/2⌋} f_{2i+1}^{2^{m−1}} X^i )^2,
since the coefficients are in F_{2^m}. So f′(X) is a square that is divisible by the square free polynomial g(X). Hence f′(X) is divisible by g(X)^2, so c ∈ Γ(L, g^2). Therefore Γ(L, g) is contained in Γ(L, g^2). So they are equal by (1).
Proposition 8.3.15 Let g be a square free Goppa polynomial of degree r with coefficients in F_{2^m}. Then the binary Goppa code Γ(L, g) is an [n, k, d] code with
k ≥ n − mr and d ≥ 2r + 1.

Proof. This is a consequence of Proposition 8.3.14, showing that Γ(L, g) = Γ(L, g^2), and Proposition 8.3.10 on the parameters of Goppa codes. The lower bound on the dimension uses that g(X) has degree r, and the lower bound on the minimum distance uses that g^2(X) has degree 2r.
Example 8.3.16 Let α ∈ F_8 be a primitive element such that α^3 = α + 1. Let a_j = α^{j−1} be an enumeration of the seven elements of L = F_8^*. Let g(X) = 1 + X + X^2. Then g is a square free polynomial in F_2[X] and a Goppa polynomial with respect to L. Let a be the vector with entries a_j. Let b be defined by b_j = 1/g(a_j). Then b = (1, α^2, α^4, α^2, α, α, α^4), and Γ(L, g) = ALT_2(a, b) by Proposition 8.3.9. Let k be the dimension and d the minimum distance of Γ(L, g). Then k ≥ 1 and d ≥ 5 by Proposition 8.3.15. In fact Γ(L, g) is a one-dimensional code generated by (0, 1, 1, 1, 1, 1, 1). Hence d = 6.
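Since the code of this example has length only 7, it can be recovered by brute force. The sketch below is our own (it reuses the 3-bit encoding of F_8 from the sketch after Example 8.1.5): it computes b_j = 1/g(a_j), forms the two parity check rows of ALT_2(a, b) over F_8, and tests all 2^7 binary words.

```python
from functools import reduce

def gf8_mul(x, y):
    z = 0
    while y:
        if y & 1: z ^= x
        y >>= 1
        x <<= 1
        if x & 0b1000: x ^= 0b1011     # alpha^3 = alpha + 1
    return z

def gf8_inv(x):
    return next(y for y in range(1, 8) if gf8_mul(x, y) == 1)

a = [1]
for _ in range(6):
    a.append(gf8_mul(a[-1], 0b010))    # a_j = alpha^(j-1), j = 1, ..., 7
b = [gf8_inv(1 ^ aj ^ gf8_mul(aj, aj)) for aj in a]   # b_j = 1/g(a_j)

# Parity check rows of ALT_2(a, b) over F_8: ev(1) * b and ev(X) * b.
H = [b, [gf8_mul(aj, bj) for aj, bj in zip(a, b)]]

codewords = []
for c in range(2 ** 7):
    bits = [(c >> j) & 1 for j in range(7)]
    if all(reduce(lambda s, j: s ^ (row[j] * bits[j]), range(7), 0) == 0
           for row in H):
        codewords.append(bits)
print(codewords)    # the zero word and (0, 1, 1, 1, 1, 1, 1)
```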
Example 8.3.17 Let L = F_{2^{10}}. Consider the binary Goppa code Γ(L, g) with a Goppa polynomial g in F_{2^{10}}[X] of degree 50 with respect to L. Then the code has length 1024, dimension k ≥ 524 and minimum distance d ≥ 51. If moreover g is square free, then d ≥ 101.
***Goppa codes meet the GV bound, random argument***
8.3.3 Counting polynomials
The number of certain polynomials will be counted in order to get an idea of the number of Goppa codes.

Remark 8.3.18 (1) Irreducible polynomials are square free Goppa polynomials. The number of monic irreducible polynomials in F_q[X] of degree d is denoted by Irr_q(d), and this number is computed by means of the Möbius function as given by Proposition 7.2.19.
(2) Every monic square free polynomial f(X) over F_q of degree r has a unique factorization into monic irreducible polynomials. Let e_i be the number of irreducible factors of f(X) of degree i. Then e_1 + 2e_2 + · · · + re_r = r and there are C(Irr_q(i), e_i) ways to choose the e_i distinct monic irreducible polynomials of degree i, where C(a, e) denotes the binomial coefficient. Hence the number S_q(r) of monic square free polynomials over F_q of degree r is equal to
S_q(r) = ∑_{e_1+2e_2+···+re_r=r} ∏_{i=1}^{r} C(Irr_q(i), e_i).
(3) The number SG_q(r) of square free monic Goppa polynomials in F_q[X] of degree r with respect to L = F_q is obtained similarly, since such Goppa polynomials have no linear factors in F_q[X]. Hence
SG_q(r) = ∑_{2e_2+···+re_r=r} ∏_{i=2}^{r} C(Irr_q(i), e_i).
Simpler formulas are obtained in the following.
Proposition 8.3.19 Let S_q(r) be the number of monic square free polynomials over F_q of degree r. Then S_q(0) = 1, S_q(1) = q and S_q(r) = q^r − q^{r−1} for r > 1.

Proof. Clearly S_q(0) = 1 and S_q(1) = q, since 1 is the only monic polynomial of degree zero, and {X + a | a ∈ F_q} is the set of monic polynomials of degree one and they are all square free.
If f(X) is a monic polynomial of degree r > 1, but not square free, then we have a unique factorization
f(X) = g(X)^2 h(X),
where g(X) is a monic polynomial, say of degree a, and h(X) is a monic square free polynomial of degree b. So 2a + b = r and a > 0. Hence the number of monic polynomials of degree r over F_q that are not square free is q^r − S_q(r) and equal to
∑_{a=1}^{⌊r/2⌋} q^a S_q(r − 2a).
Therefore
S_q(r) = q^r − ∑_{a=1}^{⌊r/2⌋} q^a S_q(r − 2a).
This recurrence relation with starting values S_q(0) = 1 and S_q(1) = q has the unique solution S_q(r) = q^r − q^{r−1} for r > 1. This is left as an exercise.
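The recurrence in this proof is easy to test numerically against the closed formula; the following sketch (our own, with arbitrary small values of q and r) does exactly that.

```python
def S(q, r, memo={}):
    """S_q(r) computed from the recurrence in the proof of Proposition 8.3.19."""
    if r == 0:
        return 1
    if r == 1:
        return q
    if (q, r) not in memo:
        memo[(q, r)] = q**r - sum(q**a * S(q, r - 2 * a)
                                  for a in range(1, r // 2 + 1))
    return memo[(q, r)]

for q in (2, 3, 4, 5):
    for r in range(2, 9):
        assert S(q, r) == q**r - q**(r - 1)   # the closed formula
print("the recurrence agrees with the closed formula")
```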
Proposition 8.3.20 Let r ≤ n ≤ q. The number G_q(r, n) of monic Goppa polynomials in F_q[X] of degree r with respect to a set L that consists of n distinct given elements in F_q is given by
G_q(r, n) = ∑_{i=0}^{r} (−1)^i C(n, i) q^{r−i}.

Proof. Let P_q(r) be the set of all monic polynomials in F_q[X] of degree r. Then P_q(r) := |P_q(r)| = q^r, since the r coefficients of a monic polynomial of degree r are free to choose in F_q. Let a be a vector of length n with entries the elements of L. Let I be a subset of {1, . . . , n}. Define
P_q(r, I) = { f(X) ∈ P_q(r) | f(a_i) = 0 for all i ∈ I }.
If r ≥ |I|, then
P_q(r, I) = P_q(r − |I|) · ∏_{i∈I} (X − a_i),
since f(a_i) = 0 if and only if f(X) = g(X)(X − a_i), and the a_i are mutually distinct. Hence
P_q(r, I) := |P_q(r, I)| = P_q(r − |I|) = q^{r−|I|}
for all r ≥ |I|. So P_q(r, I) depends on q, r and only on the size of I. Furthermore P_q(r, I) is empty if r < |I|. The set of monic Goppa polynomials in F_q[X] of degree r with respect to L is equal to
∩_{i=1}^{n} ( P_q(r) \ P_q(r, {i}) ) = P_q(r) \ ( ∪_{i=1}^{n} P_q(r, {i}) ).
The principle of inclusion/exclusion gives
G_q(r, n) = ∑_{I} (−1)^{|I|} P_q(r, I) = ∑_{i=0}^{r} (−1)^i C(n, i) q^{r−i}.
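For a small field the inclusion/exclusion formula can be verified by exhaustive search. The sketch below is our own; the choices q = 5, r = 3 and L = {0, 1, 2, 3} are arbitrary. It counts monic polynomials of degree r without zeros in L and compares the count with the formula.

```python
from itertools import product
from math import comb

q, r, n = 5, 3, 4
L = list(range(n))                      # n distinct elements of F_5

count = 0
for coeffs in product(range(q), repeat=r):   # low coefficients of a monic poly
    f = list(coeffs) + [1]                   # leading coefficient 1
    if all(sum(c * x**e for e, c in enumerate(f)) % q for x in L):
        count += 1                           # f has no zero in L

formula = sum((-1)**i * comb(n, i) * q**(r - i) for i in range(r + 1))
print(count, formula)                   # both are 51
```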
Proposition 8.3.21 Let r ≤ n ≤ q. The number SG_q(r, n) of square free, monic Goppa polynomials in F_q[X] of degree r with respect to a set L that consists of n distinct given elements in F_q is given by
SG_q(r, n) = (−1)^r C(n + r − 1, r) + ∑_{i=0}^{r−1} (−1)^i ( (n + 2i − 1)/(n + i − 1) ) C(n + i − 1, i) q^{r−i}.

Proof. An outline of the proof is given; the details are left as an exercise.
(1) The following recurrence relation holds:
SG_q(r, n) = G_q(r, n) − ∑_{a=1}^{⌊r/2⌋} G_q(a, n) · SG_q(r − 2a, n).
(2) ***The given formula satisfies the recurrence relation and the starting values.***
Example 8.3.22 ***Consider polynomials over the finite field F_{1024}. Compute the following numbers.
(1) The number of monic irreducible polynomials of degree 50.
(2) The number of square free monic polynomials of degree 50.
(3) The number of monic Goppa polynomials of degree 50 with respect to L.
(4) The number of square free, monic Goppa polynomials of degree 50 with respect to L.***
***Question: if Γ(L, g_1) = Γ(L, g_2) and . . . , then g_1 = g_2???***
***See the book of Berlekamp on algebraic coding theory.***
***generating functions, asymptotics***
***Goppa codes meet the GV bound.***
8.3.4 Exercises
8.3.1 Give a proof of Remark 8.3.8.
8.3.2 Let L = F_9^*. Consider the Goppa codes Γ(L, g) over F_3. Show that the only Goppa polynomials in F_3[X] of degree 2 are X^2 and 2X^2.
8.3.3 Let L be an enumeration of the eight elements of F_9^*. Describe the Goppa codes Γ(L, X) and Γ(L, X^2) over F_3 as alternant codes of the form ALT_1(a, b) and ALT_2(a, b′). Determine the parameters of these codes and compare these with the ones given in Proposition 8.3.15.
8.3.4 Let g be a square free Goppa polynomial of degree r over Fqm . Then the
Goppa code Γ(L, g) has minimum distance d ≥ 2r + 1 by Proposition 8.3.15.
Explain how to adapt the decoding algorithm mentioned in Remark 8.3.11 to
correct r errors.
8.3.5 Let L = F_{2^{11}}. Consider the binary Goppa code Γ(L, g) with a square free Goppa polynomial g in F_{2^{11}}[X] of degree 93 with respect to L. Give lower bounds on the dimension and the minimum distance of this code.
8.3.6 Give a proof of the formula S_q(r) = q^r − q^{r−1} for r > 1 by showing by induction that it satisfies the recurrence relation given in the proof of Proposition 8.3.19.
8.3.7 Give a proof of the recurrence relation given in (1) of the proof of Propo-
sition 8.3.21 and show that the given formula for SGq(r, n) satisfies the recur-
rence relation.
8.3.8 Consider polynomials over the finite field F_{2^{11}}. Let L = F_{2^{11}}. Give a numerical approximation of the following numbers.
(1) The number of monic irreducible polynomials of degree 93.
(2) The number of square free monic polynomials of degree 93.
(3) The number of monic Goppa polynomials of degree 93 with respect to L.
(4) The number of square free, monic Goppa polynomials of degree 93 with respect to L.
8.4 Reed-Muller codes
The q-ary RS code RS_k(n, 1) of length q − 1 was introduced as a cyclic code in Definition 8.1.1, and it was shown in Proposition 8.1.4 that it can also be described as the code obtained by evaluating all univariate polynomials over F_q of degree strictly smaller than k at all the nonzero elements of the finite field F_q. The extended RS codes can be considered as the codes evaluating those polynomials at all the elements of F_q, as done in Proposition 8.1.7. The multivariate generalization of the last point of view is taken as the definition of Reed-Muller codes, and it will be shown that the punctured Reed-Muller codes are certain cyclic codes.
In this section we assume n = q^m. The vector space F_q^m has n elements. Choose an enumeration F_q^m = {P_1, . . . , P_n} of its points. Let P = (P_1, . . . , P_n). Define the evaluation map
ev_P : F_q[X_1, . . . , X_m] −→ F_q^n
by
ev_P(f) = (f(P_1), . . . , f(P_n))
for f ∈ F_q[X_1, . . . , X_m].
Definition 8.4.1 The q-ary Reed-Muller code RM_q(u, m) of order u in m variables is defined as
RM_q(u, m) = { ev_P(f) | f ∈ F_q[X_1, . . . , X_m], deg(f) ≤ u }.
The dual of a Reed-Muller code is again a Reed-Muller code.

Proposition 8.4.2 The dual code of RM_q(u, m) is equal to RM_q(u^⊥, m), where u^⊥ = m(q − 1) − u − 1.

Proof.
8.4.1 Punctured Reed-Muller codes as cyclic codes
The field F_{q^m} can be viewed as an m-dimensional vector space over F_q. Let β_1, . . . , β_m be a basis of F_{q^m} over F_q. Then we have an isomorphism of vector spaces
ϕ : F_{q^m} −→ F_q^m
such that ϕ(α) = (a_1, . . . , a_m) if and only if
α = ∑_{i=1}^{m} a_iβ_i
for every α ∈ F_{q^m}.
Choose a primitive element ζ of F_{q^m}, that is, a generator of F_{q^m}^*, which is an element of order q^m − 1. Now define the n points P = (P_1, . . . , P_n) in F_q^m by
P_1 = 0 and P_i = ϕ(ζ^{i−1}) for i = 2, . . . , n.
Write P_j = (a_{1j}, a_{2j}, . . . , a_{mj}) for j = 1, . . . , n, and let α = (α_1, . . . , α_n) with
α_j = ∑_{i=1}^{m} a_{ij}β_i, j = 1, . . . , n.
8.4.2 Reed-Muller codes as subfield subcodes and trace codes
Alternant codes are restrictions of generalized RS codes, and it is shown in [?, Theorem 15] that Sudan's decoding algorithm can be applied to this situation. Following [?] we describe the q-ary Reed-Muller code RM_q(u, m) as a subfield subcode of RM_{q^m}(v, 1) for some v, and this last one is an RS code over F_{q^m}.
In this section we assume n = q^m. The vector space F_q^m has n elements, which are often called points, i.e. F_q^m = {P_1, . . . , P_n}. Since F_q^m ≅ F_{q^m}, the elements of F_{q^m} exactly correspond to the points of F_q^m. Define the evaluation maps
ev_P : F_q[X_1, . . . , X_m] → F_q^n and ev_α : F_{q^m}[Y] → F_{q^m}^n
by ev_P(f) = (f(P_1), . . . , f(P_n)) for f ∈ F_q[X_1, . . . , X_m] and ev_α(g) = (g(α_1), . . . , g(α_n)) for g ∈ F_{q^m}[Y].
Recall that the q-ary Reed-Muller code RM_q(u, m) of order u is defined as
RM_q(u, m) = { ev_P(f) | f ∈ F_q[X_1, . . . , X_m], deg(f) ≤ u }.
Similarly the q^m-ary Reed-Muller code RM_{q^m}(v, 1) of order v is defined as
RM_{q^m}(v, 1) = { ev_α(g) | g ∈ F_{q^m}[Y], deg(g) ≤ v }.
The following proposition is from [?] and [?].

Proposition 8.4.3 Let ρ be the remainder after division of u^⊥ + 1 by q − 1 with quotient e, that is,
u^⊥ + 1 = e(q − 1) + ρ, where ρ < q − 1.
Define d = (ρ + 1)q^e. Then d is the minimum distance of RM_q(u, m).
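The quantities e, ρ and d are computed in a few lines. The sketch below is our own transcription of the statement of Proposition 8.4.3; it reproduces the value d = 8 used in Example 8.4.5 below.

```python
def rm_min_dist(q, u, m):
    """Minimum distance of RM_q(u, m) as in Proposition 8.4.3."""
    u_perp = m * (q - 1) - u - 1
    e, rho = divmod(u_perp + 1, q - 1)   # u_perp + 1 = e(q-1) + rho, rho < q-1
    return (rho + 1) * q**e

print(rm_min_dist(2, 3, 6))   # 8, matching Example 8.4.5
```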
Proposition 8.4.4 Let n = q^m. Let d be the minimum distance of RM_q(u, m). Then RM_q(u, m) is a subfield subcode of RM_{q^m}(n − d, 1).
Proof. This can be shown by using the corresponding fact for the punctured cyclic codes as shown in Theorem 1 and Corollary 2 of [?]. Here we give a direct proof.
1) Consider the map of rings
ϕ : F_{q^m}[Y] −→ F_{q^m}[X_1, . . . , X_m]
defined by
ϕ(Y) = β_1X_1 + · · · + β_mX_m.
Let Tr : F_{q^m} → F_q be the trace map. This induces an F_q-linear map
F_{q^m}[X_1, . . . , X_m] −→ F_q[X_1, . . . , X_m]
that we also denote by Tr and which is defined by
Tr( ∑_i f_iX^i ) = ∑_i Tr(f_i)X^i,
where the multi-index notation X^i = X_1^{i_1} · · · X_m^{i_m} for i = (i_1, . . . , i_m) ∈ N_0^m is adopted. Define the F_q-linear map
T : F_{q^m}[Y] −→ F_q[X_1, . . . , X_m]
by the composition T = Tr ∘ ϕ.
The trace map
Tr : F_{q^m}^n −→ F_q^n
is defined by Tr(a) = (Tr(a_1), . . . , Tr(a_n)).
Consider the square of maps

F_{q^m}[Y] --- T ---> F_q[X_1, . . . , X_m]
    |                        |
  ev_α                     ev_P
    |                        |
    v                        v
F_{q^m}^n  --- Tr --->    F_q^n

We claim that this diagram commutes, that is,
ev_P ∘ T = Tr ∘ ev_α.
In order to show this it is sufficient that γY^h is mapped to the same element under the two maps for all γ ∈ F_{q^m} and h ∈ N_0, since the maps are F_q-linear and the γY^h generate F_{q^m}[Y] over F_q. Furthermore it is sufficient to show this for the evaluation maps ev_P : F_q[X_1, . . . , X_m] → F_q and ev_α : F_{q^m}[Y] → F_{q^m} at every point P ∈ F_q^m and element α ∈ F_{q^m} such that P = (a_1, a_2, . . . , a_m) and α = ∑_{i=1}^{m} a_iβ_i. In the following computation C(h; i_1, . . . , i_m) denotes the multinomial coefficient h!/(i_1! · · · i_m!). Now
ev_P ∘ T(γY^h) = ev_P(Tr(γ(β_1X_1 + · · · + β_mX_m)^h))
= ev_P( Tr( ∑_{i_1+···+i_m=h} C(h; i_1, . . . , i_m) γ(β_1X_1)^{i_1} · · · (β_mX_m)^{i_m} ) )
= ev_P( ∑_{i_1+···+i_m=h} C(h; i_1, . . . , i_m) Tr(γβ_1^{i_1} · · · β_m^{i_m}) X_1^{i_1} · · · X_m^{i_m} )
= ∑_{i_1+···+i_m=h} C(h; i_1, . . . , i_m) Tr(γβ_1^{i_1} · · · β_m^{i_m}) a_1^{i_1} · · · a_m^{i_m}
= Tr( ∑_{i_1+···+i_m=h} C(h; i_1, . . . , i_m) γβ_1^{i_1} · · · β_m^{i_m} a_1^{i_1} · · · a_m^{i_m} )
= Tr(γ(β_1a_1 + · · · + β_ma_m)^h) = Tr(γα^h) = Tr(ev_α(γY^h)) = Tr ∘ ev_α(γY^h).
This shows the commutativity of the diagram.
2) Let h be an integer such that 0 ≤ h ≤ q^m − 1. Express h in radix-q form:
h = h_0 + h_1q + h_2q^2 + · · · + h_{m−1}q^{m−1}.
Define the weight of h as
W(h) = h_0 + h_1 + h_2 + · · · + h_{m−1}.
We show that for every f ∈ F_{q^m}[Y] of the form f = γY^h, where γ ∈ F_{q^m} and h is an integer such that 0 ≤ h ≤ q^m − 1, there exists a polynomial g ∈ F_q[X_1, . . . , X_m] such that deg(g) ≤ W(h) and
ev_P ∘ T(f) = ev_P(g);
by linearity this is enough. Consider
ev_P ∘ T(γY^h) = ev_P ∘ T( γ ∏_{t=0}^{m−1} Y^{h_tq^t} ).
Expanding this expression, and using that a_i^{q^t} = a_i for a_i ∈ F_q, gives
Tr( γ ∏_{t=0}^{m−1} ( ∑_{i_1+···+i_m=h_t} C(h_t; i_1, . . . , i_m) (β_1^{i_1} · · · β_m^{i_m})^{q^t} a_1^{i_1} · · · a_m^{i_m} ) ).
Let
g = Tr( γ ∏_{t=0}^{m−1} ( ∑_{i_1+···+i_m=h_t} C(h_t; i_1, . . . , i_m) (β_1^{i_1} · · · β_m^{i_m})^{q^t} X_1^{i_1} · · · X_m^{i_m} ) ).
Then this g has the desired properties, since the t-th factor contributes degree at most h_t.
Then this g has the desired properties.
3) A direct consequence of 1) and 2) is
Tr(RM_{q^m}(h, 1)) ⊆ RM_q(W(h), m).
We defined d = (ρ + 1)q^e, where ρ is the remainder after division of u^⊥ + 1 by q − 1 with quotient e, that is, u^⊥ + 1 = e(q − 1) + ρ, where 0 ≤ ρ < q − 1. Then d − 1 is the smallest integer h such that W(h) = u^⊥ + 1, see [?, Theorem 5]. Hence W(h) ≤ u^⊥ for all integers h such that 0 ≤ h ≤ d − 2. Therefore
Tr(RM_{q^m}(d − 2, 1)) ⊆ RM_q(u^⊥, m).
4) So
RM_q(u, m) ⊆ (Tr(RM_{q^m}(d − 2, 1)))^⊥.
5) Let C be an F_{q^m}-linear code in F_{q^m}^n. The relation between the restriction C ∩ F_q^n and the trace code Tr(C) is given by Delsarte's theorem, see [?] and [?, Chap. 7, §8, Theorem 11]:
C ∩ F_q^n = (Tr(C^⊥))^⊥.
Applying this to 4) and using RM_{q^m}(n − d, 1) = RM_{q^m}(d − 2, 1)^⊥ gives
RM_q(u, m) ⊆ RM_{q^m}(n − d, 1) ∩ F_q^n.
Hence RM_q(u, m) is a subfield subcode of RM_{q^m}(n − d, 1).
***Alternative proof making use of the fact that RM is an extension of a restric-
tion of a RS code, and use the duality properties of RS codes and dual(puncture)=shorten(dual)***
Example 8.4.5 The code RM_q(u, m) is not necessarily equal to the restriction of RM_{q^m}(n − d, 1). The following example shows that the punctured Reed-Muller code is a proper subcode of the binary BCH code. Take q = 2, m = 6 and u = 3. Then u^⊥ = 2, e = 3 and ρ = 0. So d = 2^3 = 8. The code RM_2^∗(3, 6) has parameters [63, 42, 7]. The binary BCH code with zeros ζ^i with i ∈ {1, 2, 3, 4, 5, 6} has complete defining set the union of the sets {1, 2, 4, 8, 16, 32}, {3, 6, 12, 24, 48, 33} and {5, 10, 20, 40, 17, 34}. So the dimension of the BCH code is 63 − 3 · 6 = 45. Therefore the BCH code has parameters [63, 45, 7] and it has the punctured RM code as a subcode, but they are not equal. This is explained by the zero 9 = 1 + 2^3 having 2-weight equal to 2 ≤ u^⊥, whereas no element of the cyclotomic coset {9, 18, 36} of 9 is in the set {1, 2, 3, 4, 5, 6}. The BCH code is the binary restriction of RM^∗_{64}(56, 1). Hence RM_2(3, 6) is a subcode of the binary restriction of RM_{64}(56, 1), but they are not equal.
8.4.3 Exercises

8.4.1 Show that the Shift bound for RM_q(u, m)^∗, considered as a cyclic code, is equal to the actual minimum distance.
8.5 Notes

Subfield subcodes of RS codes, McEliece-Solomon.
Numerous applications of Reed-Solomon codes can be found in [135].
Twisted BCH codes by Edel.
Folded RS codes by Guruswami.
Stichtenoth-Wirtz.
Cauchy and Srivastava codes, Roth-Seroussi and Dür.
Proposition 8.3.19 is due to Carlitz [37]. See also [11, Exercise (3.3)]. Proposition 8.3.21 is a generalization of Retter [98].
Chapter 9
Algebraic decoding
Ruud Pellikaan and Xin-Wen Wu
*** intro***
9.1 Error-correcting pairs

In this section we give an algebraic way, that is, by solving a system of linear equations, to compute the error positions of a received word with respect to Reed-Solomon codes. The complexity of this algorithm is O(n^3).
9.1.1 Decoding by error-correcting pairs

In Definition 7.4.9 we introduced the star product a ∗ b for a, b ∈ F_q^n by the coordinatewise multiplication a ∗ b = (a_1b_1, . . . , a_nb_n).

Remark 9.1.1 Notice that multiplying polynomials first and then evaluating gives the same answer as first evaluating and then multiplying. That is, if f(X), g(X) ∈ F_q[X] and h(X) = f(X)g(X), then h(a) = f(a)g(a) for all a ∈ F_q. So
ev(f(X)g(X)) = ev(f(X)) ∗ ev(g(X)) and ev_a(f(X)g(X)) = ev_a(f(X)) ∗ ev_a(g(X))
for the evaluation maps ev and ev_a.
Proposition 9.1.2 Let k + l ≤ n. Then
GRS_k(a, b) ∗ GRS_l(a, c) = GRS_{k+l−1}(a, b ∗ c), and
RS_k(n, b) ∗ RS_l(n, c) = RS_{k+l−1}(n, b + c − 1) if n = q − 1.

Proof. Now GRS_k(a, b) = { ev_a(f(X)) ∗ b | f(X) ∈ F_q[X], deg f(X) < k }, and similar statements hold for GRS_l(a, c) and GRS_{k+l−1}(a, b ∗ c). Furthermore
(ev_a(f(X)) ∗ b) ∗ (ev_a(g(X)) ∗ c) = ev_a(f(X)g(X)) ∗ b ∗ c,
and deg f(X)g(X) < k + l − 1 if deg f(X) < k and deg g(X) < l. Hence
GRS_k(a, b) ∗ GRS_l(a, c) ⊆ GRS_{k+l−1}(a, b ∗ c).
In general equality does not hold, but here we have
GRS_k(a, b) ∗ GRS_l(a, c) = GRS_{k+l−1}(a, b ∗ c),
since on both sides the vector spaces are generated by the elements
(ev_a(X^i) ∗ b) ∗ (ev_a(X^j) ∗ c) = ev_a(X^{i+j}) ∗ b ∗ c
with 0 ≤ i < k and 0 ≤ j < l.
Let n = q − 1. Let α be a primitive element of F_q^∗. Define a_j = α^{j−1} and b_j = a_j^{n−b+1} for j = 1, . . . , n. Then RS_k(n, b) = GRS_k(a, b) by Example 8.1.11. Similar statements hold for RS_l(n, c) and RS_{k+l−1}(n, b + c − 1). The statement concerning the star product of RS codes is now a consequence of the corresponding statement on the GRS codes.
Example 9.1.3 Let n = q − 1, k, l ≥ 2 and k + l ≤ n. Then RS_k(n, 1) is in one-to-one correspondence with polynomials of degree at most k − 1, and similar statements hold for RS_l(n, 1) and RS_{k+l−1}(n, 1). A product a ∗ b with a ∈ RS_k(n, 1) and b ∈ RS_l(n, 1) corresponds to the product of a polynomial of degree at most k − 1 and one of degree at most l − 1; in particular, a polynomial of degree exactly k + l − 2 arising in this way is reducible over F_q. There exists an irreducible polynomial of degree k + l − 2, by Remark 7.2.20. Hence not every word of RS_{k+l−1}(n, 1) is a product a ∗ b, even though these products generate RS_{k+l−1}(n, 1) as a vector space by Proposition 9.1.2.
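The identity ev_a(fg) = ev_a(f) ∗ ev_a(g) of Remark 9.1.1 is easy to check experimentally. Here is a small Sage sketch; the field, polynomials and evaluation points are our own arbitrary illustrations.

F = GF(16, 'a'); a = F.gen()          # a satisfies a^4 = a + 1
R.<X> = F[]
pts = [a^j for j in range(15)]        # the n = q - 1 nonzero elements of F
ev = lambda f: vector(F, [f(x) for x in pts])
f = 1 + a*X + X^2                     # deg f < k for k = 3
g = a^3 + X                           # deg g < l for l = 2
star = vector(F, [u*v for u, v in zip(ev(f), ev(g))])
print(star == ev(f*g))                # True: the star product is ev(fg)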
Definition 9.1.4 Let A and B be linear subspaces of F_q^n. Let r ∈ F_q^n. Define the kernel
K(r) = { a ∈ A | (a ∗ b) · r = 0 for all b ∈ B }.

Definition 9.1.5 Let B^∨ be the space of all linear functions β : B → F_q. Now K(r) is a subspace of A and it is the kernel of the linear map
S_r : A → B^∨
defined by a ↦ β_a, where β_a(b) = (a ∗ b) · r. Let a_1, . . . , a_l and b_1, . . . , b_m be bases of A and B, respectively. Then the map S_r has the m × l syndrome matrix ((b_i ∗ a_j) · r | 1 ≤ j ≤ l, 1 ≤ i ≤ m) with respect to these bases.
Example 9.1.6 Let A = RS_{t+1}(n, 1) and B = RS_t(n, 0). Then A ∗ B is contained in RS_{2t}(n, 0) by Proposition 9.1.2. Let C = RS_{n−2t}(n, 1). As g_{n,k}(X) = g_{0,k}(X) for n = q − 1, by the definition of the Reed-Solomon code, we have C^⊥ = RS_{2t}(n, 0) by Proposition 8.1.2. Hence A ∗ B ⊆ C^⊥.
Let a_i = ev(X^{i−1}) for i = 1, . . . , t + 1, b_j = ev(X^j) for j = 1, . . . , t, and h_l = ev(X^l) for l = 1, . . . , 2t. Then a_1, . . . , a_{t+1} is a basis of A and b_1, . . . , b_t is a basis of B. The vectors h_1, . . . , h_{2t} form the rows of a parity check matrix H for C. Then a_i ∗ b_j = ev(X^{i+j−1}) = h_{i+j−1}. Let r be a received word and s = rH^T its syndrome. Then
(b_i ∗ a_j) · r = s_{i+j−1}.
Hence to compute the kernel K(r) we have to compute the null space of the matrix of syndromes

    ( s_1   s_2      · · ·   s_t       s_{t+1} )
    ( s_2   s_3      · · ·   s_{t+1}   s_{t+2} )
    (  ...   ...              ...       ...    )
    ( s_t   s_{t+1}  · · ·   s_{2t−1}  s_{2t}  ).

We have seen this matrix before as the coefficient matrix of the set of equations for the computation of the error-locator polynomial in the algorithm of APGZ 7.5.3.
Lemma 9.1.7 Let C be an F_q-linear code of length n. Let r be a received word with error vector e. If A ∗ B ⊆ C^⊥, then K(r) = K(e).

Proof. We have that r = c + e for some codeword c ∈ C. Now a ∗ b is a parity check for C, since A ∗ B ⊆ C^⊥. So (a ∗ b) · c = 0, and hence (a ∗ b) · r = (a ∗ b) · e for all a ∈ A and b ∈ B.

Let J be a subset of {1, . . . , n}. The subspace
A(J) = { a ∈ A | a_j = 0 for all j ∈ J }
was defined in 4.4.10.
Lemma 9.1.8 Let A ∗ B ⊆ C^⊥. Let e be the error vector of the received word r. If I = supp(e) = { i | e_i ≠ 0 }, then A(I) ⊆ K(r). If moreover d(B^⊥) > wt(e), then A(I) = K(r).

Proof. 1) Let a ∈ A(I). Then a_i = 0 for all i such that e_i ≠ 0, and therefore
(a ∗ b) · e = Σ_{e_i ≠ 0} a_ib_ie_i = 0
for all b ∈ B. So a ∈ K(e). But K(e) = K(r) by Lemma 9.1.7. Hence a ∈ K(r). Therefore A(I) ⊆ K(r).
2) Suppose moreover that d(B^⊥) > wt(e). Let a ∈ K(r); then a ∈ K(e) by Lemma 9.1.7. Hence
(e ∗ a) · b = e · (a ∗ b) = 0
for all b ∈ B, giving e ∗ a ∈ B^⊥. Now wt(e ∗ a) ≤ wt(e) < d(B^⊥), so e ∗ a = 0, meaning that e_ia_i = 0 for all i. Hence a_i = 0 for all i such that e_i ≠ 0, that is, for all i ∈ I = supp(e). Hence a ∈ A(I). Therefore K(r) ⊆ A(I) and equality holds by 1).
Remark 9.1.9 Let I = supp(e) be the set of error positions. The set of zero coordinates of a ∈ A(I) contains the set of error positions by Lemma 9.1.8. For that reason the elements of A(I) are called error-locator vectors or functions. But the space A(I) is not known to the receiver. The space K(r) can be computed after receiving the word r. The equality A(I) = K(r) implies that all elements of K(r) are error-locator functions.

Let A ∗ B ⊆ C^⊥. The basic algorithm for the code C computes the kernel K(r) for every received word r. If this kernel is nonzero, it takes a nonzero element a and determines the set J of zero positions of a. If d(B^⊥) > wt(e), where e is the error vector, then J contains the support of e by Lemma 9.1.8. If the set J is not too large, the error values are computed.

Thus we have a basic algorithm for every pair (A, B) of subspaces of F_q^n such that A ∗ B ⊆ C^⊥. If A is small with respect to the number of errors, then K(r) = 0. If A is large, then B becomes small, which results in a large code B^⊥, and it will be difficult to meet the requirement d(B^⊥) > wt(e).
Definition 9.1.10 Let A, B and C be subspaces of F_q^n. Then (A, B) is called a t-error-correcting pair for C if the following conditions are satisfied:
1. A ∗ B ⊆ C^⊥,
2. dim(A) > t,
3. d(B^⊥) > t,
4. d(A) + d(C) > n.
Proposition 9.1.11 Let (A, B) be a t-error-correcting pair for C. Then the basic algorithm corrects t errors for the code C with complexity O(n^3).

Proof. The pair (A, B) is a t-error-correcting pair for C, so A ∗ B ⊆ C^⊥ and the basic algorithm can be applied to decode C.
If a received word r has at most t errors, then the error vector e with support I has size at most t and A(I) is not zero, since I imposes at most t linear conditions on A and the dimension of A is at least t + 1.
Let a be a nonzero element of K(r). Let J = { j | a_j = 0 }.
We assumed that d(B^⊥) > t. So K(r) = A(I) by Lemma 9.1.8. So a is an error-locator vector and J contains I.
The weight of the vector a is at least d(A), so a has at most n − d(A) < d(C) zeros by (4) of Definition 9.1.10. Hence |J| < d(C) and Proposition 6.2.9 or 6.2.15 gives the error values.
The complexity is that of solving systems of linear equations, that is, O(n^3).
We will show the existence of error-correcting pairs for (generalized) Reed-Solomon codes.

Proposition 9.1.12 The codes GRS_{n−2t}(a, b) and RS_{n−2t}(n, b) have t-error-correcting pairs.

Proof. Let C = GRS_{n−2t}(a, b). Then C^⊥ = GRS_{2t}(a, c) for some c by Proposition 8.1.21. Let A = GRS_{t+1}(a, 1) and B = GRS_t(a, c). Then A ∗ B ⊆ C^⊥ by Proposition 9.1.2. The codes A, B and C have parameters [n, t + 1, n − t], [n, t, n − t + 1] and [n, n − 2t, 2t + 1], respectively, by Proposition 8.1.14. Furthermore B^⊥ has parameters [n, n − t, t + 1] by Corollary 3.2.7, so it has minimum distance t + 1. Hence (A, B) is a t-error-correcting pair for C.
The code RS_{n−2t}(n, b) is of the form GRS_{n−2t}(a, b). Therefore the pair of codes (RS_{t+1}(n, 1), RS_t(n, n − b + 1)) is a t-error-correcting pair for the code RS_{n−2t}(n, b).
Example 9.1.13 Choose α ∈ F_16 with α^4 = α + 1 as primitive element of F_16. Let C = RS_11(15, 1). Let
r = (0, α^4, α^8, α^14, α, α^10, α^7, α^9, α^2, α^13, α^5, α^12, α^11, α^6, α^3)
be a received word with respect to the code C with 2 errors. We show how to find the transmitted codeword by means of the basic algorithm.
The dual of C is equal to RS_4(15, 0). Hence RS_3(15, 1) ∗ RS_2(15, 0) is contained in RS_4(15, 0). Take A = RS_3(15, 1) and B = RS_2(15, 0). Then A is a [15, 3, 13] code, and the dual of B is RS_13(15, 1), which has minimum distance 3. Therefore (A, B) is a 2-error-correcting pair for C by Proposition 9.1.12. Let
H = ( α^{ij} | 1 ≤ i ≤ 4, 0 ≤ j ≤ 14 ).
Then H is a parity check matrix of C. The syndrome vector of r equals
(s_1, s_2, s_3, s_4) = rH^T = (α^10, 1, 1, α^10).
The space K(r) consists of the evaluations ev(a_0 + a_1X + a_2X^2) of all polynomials a_0 + a_1X + a_2X^2 such that (a_0, a_1, a_2)^T is in the null space of the matrix

    ( s_1 s_2 s_3 )   ( α^10  1    1    )   ( 1 0 1   )
    ( s_2 s_3 s_4 ) = ( 1     1    α^10 ) ~ ( 0 1 α^5 ).

So K(r) = ⟨ ev(1 + α^5X + X^2) ⟩. The polynomial 1 + α^5X + X^2 has α^6 and α^9 as zeros. Hence the error positions are at the 7-th and 10-th coordinate. In order to compute the error values by Proposition 6.2.9 we have to find a linear combination of the 7-th and 10-th columns of H that equals the syndrome vector.
The system

    ( α^6   α^9  )              ( α^10 )
    ( α^12  α^3  ) ( e_7  )     ( 1    )
    ( α^3   α^12 ) ( e_10 )  =  ( 1    )
    ( α^9   α^6  )              ( α^10 )

has (α^5, α^5)^T as unique solution. That is, the error vector e has e_7 = α^5, e_10 = α^5 and e_i = 0 for all i ∉ {7, 10}. Therefore the transmitted codeword is
c = r − e = (0, α^4, α^8, α^14, α, α^10, α^13, α^9, α^2, α^7, α^5, α^12, α^11, α^6, α^3).
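The computation in this example is easy to reproduce with a short Sage sketch of the basic algorithm (a sketch only; the variable names are ours, and the final step simply solves for the error values at the located positions).

F = GF(16, 'a'); a = F.gen()              # a^4 = a + 1, as in the example
n, t = 15, 2
r = vector(F, [0, a^4, a^8, a^14, a, a^10, a^7, a^9,
               a^2, a^13, a^5, a^12, a^11, a^6, a^3])
H = matrix(F, [[a^(i*j) for j in range(n)] for i in range(1, 2*t + 1)])
s = H * r                                 # syndrome (s1, s2, s3, s4)
S = matrix(F, [[s[0], s[1], s[2]],
               [s[1], s[2], s[3]]])       # matrix of syndromes
R.<X> = F[]
coeffs = S.right_kernel().basis()[0]      # an error-locator ev(a0 + a1*X + a2*X^2)
sigma = sum(c * X^i for i, c in enumerate(coeffs))
I = [j for j in range(n) if sigma(a^j) == 0]      # error positions (0-based): [6, 9]
e_vals = H.matrix_from_columns(I).solve_right(s)  # error values: (a^5, a^5)
c = copy(r)
for j, v in zip(I, e_vals):
    c[j] -= v                             # the transmitted codeword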
9.1.2 Existence of error-correcting pairs
Example 9.1.14 Let C be the binary cyclic code with defining set {1, 3, 7, 9}
as in Examples 7.4.8 and 7.4.17. Then d(C) ≥ 7 by the Roos bound 7.4.16
with U = {0, 4, 12, 20} and V = {2, 3, 4}. ***This gives us an error correcting
pair***
Remark 9.1.15 The great similarity between the concept of an error-correcting pair and the techniques used by Van Lint and Wilson in the AB bound can be seen in the reformulation of the Roos bound in Remark 7.4.25. A special case of this reformulation is obtained if we take a = b = t.
Proposition 9.1.16 Let C be an F_q-linear code of length n. Let (A, B) be a pair of F_{q^m}-linear codes of length n such that the following properties hold: (1) (A ∗ B) ⊥ C, (2) k(A) > t, (3) d(B^⊥) > t, (4) d(A) + 2t ≥ n and (5) d(A^⊥) > 1. Then d(C) ≥ 2t + 1 and (A, B) is a t-error-correcting pair for C.

Proof. The conclusion on the minimum distance of C is explained in Remark 7.4.25. Conditions (1), (2) and (3) are the same as the ones in the definition of a t-error-correcting pair. Condition (4) in the proposition is stronger than in the definition, since d(A) + d(C) ≥ d(A) + 2t + 1 > d(A) + 2t ≥ n.
Remark 9.1.17 As a consequence of this proposition there is an abundance of examples of codes C with minimum distance at least 2t + 1 that have a t-error-correcting pair. Take for instance for A and B MDS codes with parameters [n, t + 1, n − t] and [n, t, n − t + 1], respectively. Then k(A) > t and d(B^⊥) > t, since B^⊥ is an [n, n − t, t + 1] code. Take C = (A ∗ B)^⊥. Then d(C) ≥ 2t + 1 and (A, B) is a t-error-correcting pair for C. The dimension of C is at least n − t(t + 1) and is most of the time equal to this lower bound.
Remark 9.1.18 For a given code C it is hard to find a t-error-correcting pair with t close to half the minimum distance. Generalized Reed-Solomon codes have this property, as we have seen in ??, and algebraic geometry codes too, as we shall see in ??. We conjecture that if an [n, n − 2t, 2t + 1] MDS code has a t-error-correcting pair, then this code is a GRS code. This is proven in the cases t = 1 and t = 2.
Proposition 9.1.19 Let C be an F_q-linear code of length n and minimum distance d. Then C has a t-error-correcting pair if t ≤ (n − 1)/(n − d + 2).

Proof. There exist an m and an F_{q^m}-linear [n, n − d + 1, d] code D that contains C, by Corollary 4.3.25. Let t be a positive integer such that t ≤ (n − 1)/(n − d + 2). It is sufficient to show that D has a t-error-correcting pair. Let B be an [n, t, n − t + 1] code with the all-ones vector in it. Such a code exists if m is sufficiently large. Then B^⊥ is an [n, n − t, t + 1] code. So d(B^⊥) > t. Take A = (B ∗ D)^⊥. Now A is contained in D^⊥, since the all-ones vector is in B, and D^⊥ is an [n, d − 1, n − d + 2] code. So d(A) ≥ n − d + 2. Hence d(A) + d(D) > n.
Let b_1, . . . , b_t be a basis of B and d_1, . . . , d_{n−d+1} a basis of D. Then x ∈ A if and only if x · (b_i ∗ d_j) = 0 for all i = 1, . . . , t and j = 1, . . . , n − d + 1. This is a system of t(n − d + 1) homogeneous linear equations, and n − t(n − d + 1) ≥ t + 1 by assumption. Hence k(A) ≥ n − t(n − d + 1) > t. Therefore (A, B) is a t-error-correcting pair for D and a fortiori for C.
9.1.3 Exercises

9.1.1 Choose α ∈ F_16 with α^4 = α + 1 as primitive element of F_16. Let C = RS_11(15, 0). Let
r = (α, 0, α^11, α^10, α^5, α^13, α, α^8, α^5, α^10, α^4, α^4, α^2, 0, 0)
be a received word with respect to the code C with 2 errors. Find the transmitted codeword.

9.1.2 Consider the binary cyclic code of length 21 and defining set {0, 1, 3, 7}. This code has minimum distance 8. Give a 3-error-correcting pair for this code.

9.1.3 Consider the binary cyclic code of length 35 and defining set {1, 5, 7}. This code has minimum distance 7. Give a 3-error-correcting pair for this code.
9.2 Decoding by key equation

In Section 7.5.5 we introduced the key equation. Now we introduce two algorithms which solve the key equation, and thus decode cyclic codes efficiently.

9.2.1 Algorithm of Euclid-Sugiyama

In Section 7.5.5 we have seen that the decoding of a BCH code with designed minimum distance δ is reduced to the problem of finding a pair of polynomials (σ(Z), ω(Z)) satisfying the following key equation for a given syndrome polynomial S(Z) = Σ_{i=1}^{δ−1} S_iZ^{i−1}:
σ(Z)S(Z) ≡ ω(Z) (mod Z^{δ−1}),
such that deg(σ(Z)) ≤ t = ⌊(δ − 1)/2⌋ and deg(ω(Z)) ≤ deg(σ(Z)) − 1. Here σ(Z) = Σ_{i=1}^{t+1} σ_iZ^{i−1} is the error-locator polynomial and ω(Z) = Σ_{i=1}^{t} ω_iZ^{i−1} is the error-evaluator polynomial. Note that σ_1 = 1 by definition.
Given the key equation, the Euclid-Sugiyama algorithm (also called the Sugiyama algorithm in the literature) finds the error-locator and error-evaluator polynomials by an iterative procedure. This algorithm is based on the well-known Euclidean algorithm, which we briefly review first. For a pair of univariate polynomials r_{−1}(Z) and r_0(Z), the Euclidean algorithm finds their greatest common divisor, which we denote by gcd(r_{−1}(Z), r_0(Z)). It proceeds as follows:

r_{−1}(Z) = q_1(Z)r_0(Z) + r_1(Z),        deg(r_1(Z)) < deg(r_0(Z))
r_0(Z)    = q_2(Z)r_1(Z) + r_2(Z),        deg(r_2(Z)) < deg(r_1(Z))
 ...
r_{s−2}(Z) = q_s(Z)r_{s−1}(Z) + r_s(Z),   deg(r_s(Z)) < deg(r_{s−1}(Z))
r_{s−1}(Z) = q_{s+1}(Z)r_s(Z).

In each iteration of the algorithm, the step r_{j−2}(Z) = q_j(Z)r_{j−1}(Z) + r_j(Z), with deg(r_j(Z)) < deg(r_{j−1}(Z)), is implemented by division of polynomials: dividing r_{j−2}(Z) by r_{j−1}(Z), with r_j(Z) being the remainder. The algorithm keeps running until it finds a remainder which is the zero polynomial. That is, the algorithm stops after it completes the s-th iteration, where s is the smallest j such that r_{j+1}(Z) = 0. It is easy to prove that r_s(Z) = gcd(r_{−1}(Z), r_0(Z)).
We are now ready to present the Euclid-Sugiyama algorithm for solving the key equation.

Algorithm 9.2.1 (Euclid-Sugiyama Algorithm)
Input: r_{−1}(Z) = Z^{δ−1}, r_0(Z) = S(Z), U_{−1}(Z) = 0, and U_0(Z) = 1.
Proceed with the Euclidean algorithm for r_{−1}(Z) and r_0(Z), as presented above, until an r_s(Z) is reached such that
deg(r_{s−1}(Z)) ≥ (δ − 1)/2 and deg(r_s(Z)) ≤ (δ − 3)/2.
In each iteration update
U_j(Z) = q_j(Z)U_{j−1}(Z) + U_{j−2}(Z).
Output: the pair of polynomials
σ(Z) = ε U_s(Z),
ω(Z) = (−1)^s ε r_s(Z),
where the constant ε is chosen such that σ(0) = 1.

Then the error-locator and error-evaluator polynomials are given as σ(Z) = ε U_s(Z) and ω(Z) = (−1)^s ε r_s(Z). Note that the Euclid-Sugiyama algorithm does not have to run the Euclidean algorithm completely; it has a different stopping parameter s.
Example 9.2.2 Consider the code C given in Examples 7.5.13 and 7.5.21. It is a narrow-sense BCH code of length 15 over F_16 of designed minimum distance δ = 5. Let r be the received word
r = (α^5, α^8, α^11, α^10, α^10, α^7, α^12, α^11, 1, α, α^12, α^14, α^12, α^2, 0).
Then S_1 = α^12, S_2 = α^7, S_3 = 0 and S_4 = α^2. So S(Z) = α^12 + α^7Z + α^2Z^3.
Running the Euclid-Sugiyama algorithm with the input S(Z), the results of each iteration are given in the following table.

j   r_{j−1}(Z)               r_j(Z)                   U_{j−1}(Z)   U_j(Z)
0   Z^4                      α^2Z^3 + α^7Z + α^12     0            1
1   α^2Z^3 + α^7Z + α^12     α^5Z^2 + α^10Z           1            α^13Z
2   α^5Z^2 + α^10Z           α^2Z + α^12              α^13Z        α^10Z^2 + Z + 1

Thus we have found the error-locator polynomial as σ(Z) = U_2(Z) = 1 + Z + α^10Z^2, and the error-evaluator polynomial as ω(Z) = r_2(Z) = α^12 + α^2Z.
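The iteration above is easy to reproduce in Sage; the following sketch (with our own variable names) runs the Euclidean algorithm with the stopping rule of Algorithm 9.2.1. Since the field has characteristic 2, the sign (−1)^s can be ignored.

F = GF(16, 'a'); a = F.gen()             # a^4 = a + 1
R.<Z> = F[]
delta = 5
S = a^12 + a^7*Z + a^2*Z^3               # syndrome polynomial of Example 9.2.2
r_prev, r_cur = Z^(delta - 1), S
U_prev, U_cur = R(0), R(1)
while r_cur.degree() > (delta - 3)//2:   # stop once deg(r_s) <= (delta-3)/2
    q, rem = r_prev.quo_rem(r_cur)
    U_prev, U_cur = U_cur, q*U_cur + U_prev
    r_prev, r_cur = r_cur, rem
eps = U_cur(0)^(-1)                      # normalize so that sigma(0) = 1
sigma, omega = eps*U_cur, eps*r_cur
print(sigma)                             # a^10*Z^2 + Z + 1
print(omega)                             # a^2*Z + a^12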
9.2.2 Algorithm of Berlekamp-Massey

Consider again the key equation
σ(Z)S(Z) ≡ ω(Z) (mod Z^{δ−1}),
such that deg(σ(Z)) ≤ t = ⌊(δ − 1)/2⌋ and deg(ω(Z)) ≤ deg(σ(Z)) − 1, where S(Z) = Σ_{i=1}^{δ−1} S_iZ^{i−1} is given.
It is easy to show that the problem of solving the key equation is equivalent to the problem of solving the following matrix equation with unknowns (σ_2, . . . , σ_{t+1})^T:

    ( S_t       S_{t−1}   . . .  S_1 ) ( σ_2     )       ( S_{t+1} )
    ( S_{t+1}   S_t       . . .  S_2 ) ( σ_3     )  = −  ( S_{t+2} )
    (  ...       ...              ... ) (  ...    )       (  ...    )
    ( S_{2t−1}  S_{2t−2}  . . .  S_t ) ( σ_{t+1} )       ( S_{2t}  )

The Berlekamp-Massey algorithm, which we will introduce in this section, can solve this matrix equation by finding σ_2, . . . , σ_{t+1} in the recursion
S_i = − Σ_{j=2}^{t+1} σ_j S_{i−j+1},   i = t + 1, . . . , 2t.
We should point out that the Berlekamp-Massey algorithm actually solves a more general problem: for a given sequence E_0, E_1, E_2, . . . , E_{N−1} of length N (which we denote by E in the rest of this section), it finds the recursion
E_i = − Σ_{j=1}^{L} Λ_j E_{i−j},   i = L, . . . , N − 1,
for which L is smallest. If the matrix equation has no solution, the Berlekamp-Massey algorithm then finds a recursion with L > t.
To make it more convenient to present the Berlekamp-Massey algorithm and to prove its correctness, we denote Λ(Z) = Σ_{i=0}^{L} Λ_iZ^i with Λ_0 = 1. The above recursion is denoted by (Λ(Z), L), and L = deg(Λ(Z)) is called the length of the recursion.
The Berlekamp-Massey algorithm is an iterative procedure for finding the shortest recursion for producing successive terms of the sequence E. The r-th iteration of the algorithm finds the shortest recursion (Λ^{(r)}(Z), L_r), where L_r = deg(Λ^{(r)}(Z)), for producing the first r terms of the sequence E, that is,
E_i = − Σ_{j=1}^{L_r} Λ_j^{(r)} E_{i−j},   i = L_r, . . . , r − 1,
or equivalently,
Σ_{j=0}^{L_r} Λ_j^{(r)} E_{i−j} = 0,   i = L_r, . . . , r − 1,
with Λ_0^{(r)} = 1.
Algorithm 9.2.3 (Berlekamp-Massey Algorithm)
(Initialization) r = 0, Λ(Z) = B(Z) = 1, L = 0, λ = 1, and b = 1.
1) If r = N, stop. Otherwise, compute Δ = Σ_{j=0}^{L} Λ_j E_{r−j}.
2) If Δ = 0, then λ ← λ + 1, and go to 5).
3) If Δ ≠ 0 and 2L > r, then
Λ(Z) ← Λ(Z) − Δb^{−1}Z^λ B(Z),
λ ← λ + 1,
and go to 5).
4) If Δ ≠ 0 and 2L ≤ r, then
T(Z) ← Λ(Z)   (temporary storage of Λ(Z)),
Λ(Z) ← Λ(Z) − Δb^{−1}Z^λ B(Z),
L ← r + 1 − L,
B(Z) ← T(Z),
b ← Δ,
λ ← 1.
5) r ← r + 1 and return to 1).
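A direct Sage transcription of Algorithm 9.2.3 is given below; this is a sketch with our own function name, and the sequence E is passed as a list of field elements.

def berlekamp_massey(E, F):
    # Find the shortest recursion (Lambda(Z), L) producing the sequence E over F.
    R = F['Z']; Z = R.gen()
    Lam, B = R(1), R(1)
    L, lam, b = 0, 1, F(1)
    for r in range(len(E)):
        Delta = sum(Lam[j]*E[r - j] for j in range(L + 1))   # discrepancy
        if Delta == 0:
            lam += 1
        elif 2*L > r:
            Lam = Lam - (Delta/b)*Z^lam*B
            lam += 1
        else:
            T = Lam
            Lam = Lam - (Delta/b)*Z^lam*B
            L, B, b, lam = r + 1 - L, T, Delta, 1
    return Lam, L

F = GF(16, 'a'); a = F.gen()
print(berlekamp_massey([a^12, a^7, F(0), a^2], F))   # (a^10*Z^2 + Z + 1, 2)

The test call reproduces Example 9.2.4 below.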
Example 9.2.4 Consider again the code C given in Example 9.2.2. Let r be the received word
r = (α^5, α^8, α^11, α^10, α^10, α^7, α^12, α^11, 1, α, α^12, α^14, α^12, α^2, 0).
Then S_1 = α^12, S_2 = α^7, S_3 = 0 and S_4 = α^2.
Now let us compute the error-locator polynomial σ(Z) by using the Berlekamp-Massey algorithm. Letting E_i = S_{i+1} for i = 0, 1, 2, 3, we have the sequence E = {E_0, E_1, E_2, E_3} = {α^12, α^7, 0, α^2} as the input of the algorithm. The intermediate and final results of the algorithm are given in the following table, where row r shows the state at the start of iteration r together with the discrepancy Δ computed in that iteration.

r   Δ      B(Z)          Λ(Z)                   L
0   α^12   1             1                      0
1   1      1             1 + α^12Z              1
2   α^2    1             1 + α^10Z              1
3   α^7    1 + α^10Z     1 + α^10Z + α^5Z^2     2
4   –      1 + α^10Z     1 + Z + α^10Z^2        2
The result of the last iteration of the Berlekamp-Massey algorithm, Λ(Z), is the error-locator polynomial. That is,
σ(Z) = σ_1 + σ_2Z + σ_3Z^2 = Λ(Z) = Λ_0 + Λ_1Z + Λ_2Z^2 = 1 + Z + α^10Z^2.
Substituting this into the key equation, we then get ω(Z) = α^12 + α^2Z.
Proof of the correctness: will be done.
Complexity and some comparison between E-S and B-M algorithms:
will be done.
9.2.3 Exercises

9.2.1 Take α ∈ F_16^∗ with α^4 = 1 + α as primitive element. Let C be the BCH code over F_16, of length 15 and designed minimum distance 5, with defining set {1, 2, 3, 4, 6, 8, 9, 12}. The generator polynomial is 1 + X^4 + X^6 + X^7 + X^8 (see Example 7.3.13). Let
r = (0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0)
be a received word with respect to the code C. Find the syndrome polynomial S(Z). Write the key equation.
9.2.2 Consider the same code and the same received word given in the previous exercise. Using the Berlekamp-Massey algorithm, compute the error-locator polynomial. Determine the number of errors that occurred in the received word.

9.2.3 For the same code and received word as in the previous exercises, compute the error-locator and error-evaluator polynomials using the Euclid-Sugiyama algorithm. Find the codeword which is closest to the received word.
9.2.4 Let α ∈ F_16^∗ with α^4 = 1 + α as in Exercise 9.2.1. For the following sequence E over F_16, using the Berlekamp-Massey algorithm, find the shortest recursion for producing successive terms of E:
E = {α^12, 1, α^14, α^13, 1, α^11}.
9.2.5 Consider the [15, 9, 7] Reed-Solomon code over F_16 with defining set {1, 2, 3, 4, 5, 6}. Suppose the received word is
r = (0, 0, α^11, 0, 0, α^5, 0, α, 0, 0, 0, 0, 0, 0, 0).
Using the Berlekamp-Massey algorithm, find the codeword which is closest to the received word.
9.3 List decoding by Sudan's algorithm

A decoding algorithm is efficient if its complexity is bounded above by a polynomial in the code length. Brute-force decoding is not efficient, because for a received word it may need to compare q^k codewords to return the most appropriate codeword. The idea behind list decoding is that instead of returning a unique codeword, the list decoder returns a small list of codewords. A list-decoding algorithm is efficient if both the complexity and the size of the output list of the algorithm are bounded above by polynomials in the code length. List decoding was first introduced by Elias and Wozencraft in the 1950s.
We now describe a list decoder more precisely. Suppose C is a q-ary [n, k, d] code and t ≤ n is a positive integer. For any received word r = (r_1, . . . , r_n) ∈ F_q^n, we refer to any codeword c in C satisfying d(c, r) ≤ t as a t-consistent codeword. Let l be a positive integer less than or equal to q^k. The code C is called (t, l)-decodable if for every word r ∈ F_q^n the number of t-consistent codewords is at most l. If for any received word a list decoder can find all the t-consistent codewords, and the output list has at most l codewords, then the decoder is called a (t, l)-list decoder. In 1997, for the first time, Sudan proposed an efficient list-decoding algorithm for Reed-Solomon codes. Later, Sudan's list-decoding algorithm was generalized to decoding algebraic-geometric codes and Reed-Muller codes.
9.3.1 Error-correcting capacity

Suppose a decoding algorithm can find all the t-consistent codewords for any received word. We call t the error-correcting capacity or decoding radius of the decoding algorithm. As we have seen in Section ??, for any [n, k, d] code, if t ≤ ⌊(d − 1)/2⌋, then there is only one t-consistent codeword for any received word. In other words, any [n, k, d] code is (⌊(d − 1)/2⌋, 1)-decodable. The decoding algorithms in the previous sections return a unique codeword for any received word, and they achieve an error-correcting capability of at most ⌊(d − 1)/2⌋. List decoding achieves a decoding radius greater than ⌊(d − 1)/2⌋, while the size of the output list must remain bounded above by a polynomial in n.
It is natural to ask the following question: for an [n, k, d] linear code C over F_q, what is the maximal value of t such that C is (t, l)-decodable for an l that is bounded above by a polynomial in n? In the following we give a lower bound on this maximal t, which is called the Johnson bound in the literature.
Proposition 9.3.1 Let C ⊆ F_q^n be any linear code of minimum distance d = (1 − 1/q)(1 − β)n for some 0 < β < 1. Let t = (1 − 1/q)(1 − γ)n for some 0 < γ < 1. Then for any word r ∈ F_q^n,

|B_t(r) ∩ C| ≤ min{ n(q − 1), (1 − β)/(γ^2 − β) },   when γ > √β,
|B_t(r) ∩ C| ≤ 2n(q − 1) − 1,                        when γ = √β,

where B_t(r) = { x ∈ F_q^n | d(x, r) ≤ t } is the Hamming ball of radius t around r.
We will prove this proposition later. We are now ready to state the Johnson bound.

Theorem 9.3.2 Any linear code C ⊆ F_q^n of relative minimum distance δ = d/n is (t, l(n))-decodable with l(n) bounded above by a linear function in n, provided that
t/n ≤ (1 − 1/q) ( 1 − √(1 − qδ/(q − 1)) ).
Proof. For any received word r ∈ F_q^n, the set of t-consistent codewords is
{ c ∈ C | d(c, r) ≤ t } = B_t(r) ∩ C.
Let β be a positive real number with β < 1 and write d = (1 − 1/q)(1 − β)n. Let t = (1 − 1/q)(1 − γ)n for some 0 < γ < 1. Suppose
t/n ≤ (1 − 1/q) ( 1 − √(1 − qδ/(q − 1)) ).
Then γ ≥ √(1 − qd/((q − 1)n)) = √β. By Proposition 9.3.1, the number of t-consistent codewords l(n), which is |B_t(r) ∩ C|, is bounded above by a polynomial in n; here q is viewed as a constant.
Remark 9.3.3 The classical error-correcting capability is t = ⌊(d − 1)/2⌋. For a linear [n, k] code of minimum distance d, we have d ≤ n − k + 1 (note that for Reed-Solomon codes, d = n − k + 1). Thus the normalized capability is
τ = t/n ≤ (1/n) · (n − k)/2 ≈ 1/2 − κ/2,
where κ is the code rate.
Let us compare this with the Johnson bound. From Theorem 9.3.2 and d ≤ n − k + 1, the Johnson bound is
(1 − 1/q)( 1 − √(1 − qδ/(q − 1)) ) ≤ (1 − 1/q)( 1 − √(1 − (q/(q − 1))(1 − k/n + 1/n)) ) ≈ 1 − √κ
for large n and large q. A comparison is given in the following Figure 9.1.
Figure 9.1: Classical error-correcting capability vs. the Johnson bound.
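The comparison is quickly tabulated in a few lines of Python (also valid Sage), using the large-n, large-q approximations of Remark 9.3.3; the sample rates are our own choice.

# classical radius ~ (1 - kappa)/2 versus Johnson radius ~ 1 - sqrt(kappa)
for kappa in [0.1, 0.3, 0.5, 0.7, 0.9]:
    classical = (1 - kappa) / 2
    johnson = 1 - kappa**0.5
    print(kappa, round(classical, 3), round(johnson, 3))

For every rate the Johnson radius is at least the classical one, with the gap largest at low rates.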
To prove Proposition 9.3.1, we need the following lemma.

Lemma 9.3.4 Let v_1, . . . , v_m be m non-zero vectors in the real N-dimensional space R^N satisfying v_i · v_j ≤ 0 for every pair of distinct vectors. Then we have the following upper bounds on m:
(1) m ≤ 2N.
(2) If there exists a non-zero vector u ∈ R^N such that u · v_i ≥ 0 for all i = 1, . . . , m, then m ≤ 2N − 1.
(3) If there exists a u ∈ R^N such that u · v_i > 0 for all i = 1, . . . , m, then m ≤ N.
Proof. It is clear that (1) follows from (2): viewing −v_1 as u, the conditions of (2) are satisfied by v_2, . . . , v_m. Thus we have m − 1 ≤ 2N − 1, that is, m ≤ 2N.
To prove (2) we use induction on N. When N = 1 it is obvious that m ≤ 2N − 1 = 1: otherwise there would be non-zero real numbers u, v_1 and v_2 such that u · v_1 ≥ 0 and u · v_2 ≥ 0, but v_1 · v_2 < 0, which is impossible.
Now consider N > 1. We may assume that m ≥ N + 1 (because if m ≤ N, the result m ≤ 2N − 1 already holds). As the vectors v_1, . . . , v_m are all in R^N, they must be linearly dependent. Let S ⊆ {1, 2, . . . , m} be a non-empty set of minimum size for which there is a relation Σ_{i∈S} a_iv_i = 0 with all a_i ≠ 0. We claim that the a_i must all be positive or all be negative. In fact, if not, we collect the terms with positive a_i on one side and the terms with negative a_i on the other. We then get an equation Σ_{i∈S^+} a_iv_i = Σ_{j∈S^−} b_jv_j (which we denote by w) with the a_i and b_j all positive, where S^+ and S^− are disjoint non-empty sets with S^+ ∪ S^− = S. By the minimality of S, w ≠ 0. Thus the inner product satisfies w · w > 0. On the other hand, w · w = (Σ_{i∈S^+} a_iv_i) · (Σ_{j∈S^−} b_jv_j) = Σ_{i,j} (a_ib_j)(v_i · v_j) ≤ 0, since a_ib_j > 0 and v_i · v_j ≤ 0. This contradiction shows that the a_i must all be positive or all be negative. Following this, we can in fact assume that a_i > 0 for all i ∈ S (otherwise replace each a_i by −a_i in the relation Σ_{i∈S} a_iv_i = 0).
Without loss of generality we assume that S = {1, 2, . . . , s}. By the linear dependence Σ_{i=1}^{s} a_iv_i = 0 with each a_i > 0 and the minimality of S, the vectors v_1, . . . , v_s must span a subspace V of R^N of dimension s − 1. Now, for l = s + 1, . . . , m, we have Σ_{i=1}^{s} a_i(v_i · v_l) = 0, as Σ_{i=1}^{s} a_iv_i = 0. Since a_i > 0 for 1 ≤ i ≤ s and all v_i · v_l ≤ 0, we have that v_i is orthogonal to v_l for all i, l with 1 ≤ i ≤ s and s < l ≤ m. Similarly, we can prove that u is orthogonal to v_i for i = 1, . . . , s. Therefore the vectors v_{s+1}, . . . , v_m and u are all in the orthogonal complement V^⊥, which has dimension N − s + 1. As s > 1, applying the induction hypothesis to these vectors we have m − s ≤ 2(N − s + 1) − 1. Thus we have m ≤ 2N − s + 1 ≤ 2N − 1.
Now we prove (3). Suppose the result is not true, that is, m ≥ N + 1. As above, v_1, . . . , v_m must be linearly dependent in R^N. Let S ⊆ {1, 2, . . . , m} be a non-empty set of minimum size for which there is a relation Σ_{i∈S} a_iv_i = 0 with all a_i ≠ 0. Again, we can assume that a_i > 0 for all i ∈ S. From this we get Σ_{i∈S} a_i(u · v_i) = 0. But this is impossible, since for each i we have a_i > 0 and u · v_i > 0. This contradiction shows m ≤ N.
Now we are ready to prove Proposition 9.3.1.

Proof of Proposition 9.3.1. We identify vectors in F_q^n with vectors in R^{qn} in the following way. First we fix an ordering of the elements of F_q and denote the elements by α_1, α_2, . . . , α_q. Denote by ord(β) the order of the element β ∈ F_q under this ordering; for example, ord(β) = i if β = α_i. Each element α_i (1 ≤ i ≤ q) then corresponds to the real unit vector of length q with a 1 in position i and 0 elsewhere.
Without loss of generality we assume that r = (α_q, α_q, . . . , α_q). Denote by c_1, c_2, . . . , c_m all the codewords of C that are in the Hamming ball B_t(r), where t = (1 − 1/q)(1 − γ)n for some 0 < γ < 1.
We view each vector in R^{qn} as having n blocks, each having q components, where the n blocks correspond to the n positions of the vectors in F_q^n. For each l = 1, . . . , q, denote by e_l the unit vector of length q with 1 in the l-th position and 0 elsewhere. For i = 1, . . . , m, the vector in R^{qn} associated with the codeword c_i, which we denote by d_i, has in its j-th block the components of the vector e_{ord(c_i[j])}, where c_i[j] is the j-th component of c_i. The vector in R^{qn} associated with the word r ∈ F_q^n, which we denote by s, is defined similarly.
Let 1 ∈ R^{qn} be the all-ones vector. We define v = λs + ((1 − λ)/q)1 for some 0 ≤ λ ≤ 1 that will be specified later. We observe that the d_i and v all lie in the intersection of the hyperplanes P_j = { x ∈ R^{qn} | Σ_{l=1}^{q} x_{j,l} = 1 } for j = 1, . . . , n. This fact implies that the vectors d_i − v, for i = 1, . . . , m, are all in P = ∩_{j=1}^{n} P′_j, where P′_j = { x ∈ R^{qn} | Σ_{l=1}^{q} x_{j,l} = 0 }. As P is an n(q − 1)-dimensional subspace of R^{qn}, the vectors d_i − v, for i = 1, . . . , m, are all in an n(q − 1)-dimensional space.
We will set the parameter λ so that the vectors d_i − v, i = 1, . . . , m, have all pairwise inner products less than 0. For i = 1, . . . , m, let t_i = d(c_i, r). Then t_i ≤ t for every i, and

d_i · v = λ(d_i · s) + ((1 − λ)/q)(d_i · 1) = λ(n − t_i) + (1 − λ)n/q ≥ λ(n − t) + (1 − λ)n/q,   (9.1)
v · v = λ^2 n + 2(1 − λ)λ n/q + (1 − λ)^2 n/q = n/q + λ^2 (1 − 1/q)n,   (9.2)
d_i · d_j = n − d(c_i, c_j) ≤ n − d,   (9.3)

which implies that for i ≠ j,

(d_i − v) · (d_j − v) ≤ 2λt − d + (1 − 1/q)(1 − λ)^2 n.   (9.4)

Substituting t = (1 − 1/q)(1 − γ)n and d = (1 − 1/q)(1 − β)n into the above inequality, we have

(d_i − v) · (d_j − v) ≤ (1 − 1/q)n(β + λ^2 − 2λγ).   (9.5)

Thus, if γ > (β/λ + λ)/2, we will have all pairwise inner products negative. We pick λ to minimize β/λ + λ by setting λ = √β. Now, when γ > √β, we have (d_i − v) · (d_j − v) < 0 for i ≠ j. The stated bounds now follow by applying Lemma 9.3.4 to the vectors d_i − v.
9.3.2 Sudan's algorithm

The algorithm of Sudan is applicable to Reed-Solomon codes, Reed-Muller codes, algebraic-geometric codes, and some other families of codes. In this subsection we give a general description of the algorithm of Sudan.
Consider the following linear code:
C = { (f(P_1), f(P_2), · · · , f(P_n)) | f ∈ F_q[X_1, . . . , X_m] and deg(f) < k },
where P_i = (x_{i1}, . . . , x_{im}) ∈ F_q^m for i = 1, . . . , n, and n ≤ q^m. Note that when m = 1 the code is a Reed-Solomon code or an extended Reed-Solomon code; when m ≥ 2 it is a Reed-Muller code.
In the following algorithm and discussions, to simplify the statements we denote (i_1, . . . , i_m) by i, X_1^{i_1} · · · X_m^{i_m} by X^i, H(X_1 + x_1, . . . , X_m + x_m, Y + y) by H(X + x, Y + y), \binom{j_1}{i_1} · · · \binom{j_m}{i_m} by \binom{j}{i}, and so on.
Algorithm 9.3.5 (The Algorithm of Sudan for List Decoding)
INPUT: The following parameters and a received word:
• code length n and the integer k;
• n points in F_q^m, namely P_i := (x_{i1}, . . . , x_{im}) ∈ F_q^m, i = 1, . . . , n;
• received word r = (y_1, . . . , y_n) ∈ F_q^n;
• desired error-correcting radius t.
Step 0: Compute parameters r, s satisfying certain conditions that we will give for specific families of codes in the following subsections.
Step 1: Find a nonzero polynomial H(X, Y) = H(X_1, . . . , X_m, Y) such that
• the (1, . . . , 1, k − 1)-weighted degree of H(X_1, . . . , X_m, Y) is at most s;
• for i = 1, . . . , n, each point (x_i, y_i) = (x_{i1}, . . . , x_{im}, y_i) ∈ F_q^{m+1} is a zero of H(X, Y) of multiplicity r.
Step 2: Find all the Y-roots of H(X, Y) of degree less than k, namely f = f(X_1, . . . , X_m) with deg(f) < k such that H(X, f) is the zero polynomial. For each such root, check if f(P_i) = y_i for at least n − t values of i ∈ {1, . . . , n}. If so, include f in the output list.
As we will see later, for an appropriately selected parameter t, the algorithm of Sudan returns a list containing all the t-consistent codewords in polynomial time, with the size of the output list bounded above by a polynomial in the code length. So far the best known record for the error-correcting radius of list decoding by Sudan's algorithm is the Johnson bound. In order to achieve this bound, prior to the actual decoding procedure (that is, Steps 1 and 2 of the algorithm above) a pair of integers r and s should be carefully chosen; they will be used to find an appropriate polynomial H(X, Y). The parameters r and s are independent of the received word: once they are determined, they are used in the decoding procedure for every received word. The actual decoding procedure consists of two steps: interpolation and root finding. The interpolation procedure finds a nonzero polynomial H(X, Y); this polynomial contains all the polynomials which define the t-consistent codewords among its Y-roots. A Y-root of H(X, Y) is a polynomial f(X) satisfying that H(X, f(X)) is the zero polynomial. The root-finding procedure finds and returns all these Y-roots; thus all the t-consistent codewords are found.
We now explain the terms weighted degree and multiplicity of a zero of a polynomial, which we have used in the algorithm. Given integers a_1, a_2, . . . , a_l, the (a_1, a_2, . . . , a_l)-weighted degree of a monomial αX_1^{d_1}X_2^{d_2} · · · X_l^{d_l} (where α is the coefficient of the monomial) is a_1d_1 + a_2d_2 + · · · + a_ld_l. The (a_1, a_2, . . . , a_l)-weighted degree of a polynomial P(X_1, X_2, . . . , X_l) is the maximal (a_1, a_2, . . . , a_l)-weighted degree of its terms.
For a polynomial P(X) = α_0 + α_1X + α_2X^2 + · · · + α_dX^d it is clear that 0 is a zero of P(X), i.e. P(0) = 0, if and only if α_0 = 0. We say 0 is a zero of multiplicity r of P(X) provided that α_0 = α_1 = · · · = α_{r−1} = 0 and α_r ≠ 0. A nonzero value β is a zero of multiplicity r of P(X) provided that 0 is a zero of multiplicity r of P(X + β). Similarly, for a multivariate polynomial P(X_1, X_2, . . . , X_l) = Σ α_{i_1,i_2,...,i_l} X_1^{i_1}X_2^{i_2} · · · X_l^{i_l}, the point (0, 0, . . . , 0) is a zero of multiplicity r of this polynomial if and only if
α_{i_1,i_2,...,i_l} = 0 for all (i_1, i_2, . . . , i_l) with i_1 + i_2 + · · · + i_l ≤ r − 1,
and there exists (i_1, i_2, . . . , i_l) with i_1 + i_2 + · · · + i_l = r such that α_{i_1,i_2,...,i_l} ≠ 0. A point (β_1, β_2, . . . , β_l) is a zero of multiplicity r of P(X_1, X_2, . . . , X_l) provided that (0, 0, . . . , 0) is a zero of multiplicity r of P(X_1 + β_1, X_2 + β_2, . . . , X_l + β_l).
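These two notions are easy to experiment with in Sage. The following sketch uses a toy polynomial of our own, constructed to vanish to order 2 at the chosen point; it computes a (1, k − 1)-weighted degree and a multiplicity by shifting the point to the origin.

R.<X, Y> = GF(7)[]
k = 3
H = (X - 1)^2*(Y - 3) + (Y - 3)^2      # vanishes at (1, 3) with multiplicity 2
wdeg = max(m.degree(X) + (k - 1)*m.degree(Y) for m in H.monomials())
Hs = H(X + 1, Y + 3)                   # shift the zero to the origin
mult = min(m.degree() for m in Hs.monomials())   # smallest total degree
print(wdeg, mult)                      # 4 2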
Now we consider a polynomial that Step 1 of Algorithm 9.3.5 seeks. Suppose H(X, Y) = Σ α_{i,i_{m+1}} X^i Y^{i_{m+1}} is a nonzero polynomial in X_1, . . . , X_m, Y. It is easy to prove (we leave the proof to the reader as an exercise) that for x_i = (x_{i1}, . . . , x_{im}) and y_i,
H(X + x_i, Y + y_i) = Σ_{j,j_{m+1}} β_{j,j_{m+1}} X^j Y^{j_{m+1}},
where
β_{j,j_{m+1}} = Σ_{j′ ≥ j} Σ_{j′_{m+1} ≥ j_{m+1}} \binom{j′}{j} \binom{j′_{m+1}}{j_{m+1}} α_{j′,j′_{m+1}} x_i^{j′−j} y_i^{j′_{m+1}−j_{m+1}}.
Step 1 of the algorithm seeks a nonzero polynomial H(X, Y) such that its (1, . . . , 1, k − 1)-weighted degree is at most s, and for i = 1, . . . , n each (x_i, y_i) is a zero of H(X, Y) of multiplicity r. Based on the discussion above, this can be done by solving a system consisting of the following homogeneous linear equations in the unknowns α_{i,i_{m+1}} (which are the coefficients of H(X, Y)):
Σ_{j′ ≥ j} Σ_{j′_{m+1} ≥ j_{m+1}} \binom{j′}{j} \binom{j′_{m+1}}{j_{m+1}} α_{j′,j′_{m+1}} x_i^{j′−j} y_i^{j′_{m+1}−j_{m+1}} = 0
for all i = 1, . . . , n and for every j_1, . . . , j_m, j_{m+1} ≥ 0 with j_1 + · · · + j_m + j_{m+1} ≤ r − 1; and
α_{i,i_{m+1}} = 0
for every i_1, . . . , i_m, i_{m+1} ≥ 0 with i_1 + · · · + i_m + (k − 1)i_{m+1} ≥ s + 1.
9.3.3 List decoding of Reed-Solomon codes

A Reed-Solomon code can be defined as a cyclic code generated by a generator polynomial (see Definition 8.1.1) or as an evaluation code (see Proposition 8.1.4). For the purpose of list decoding by Sudan's algorithm we view Reed-Solomon codes as evaluation codes. Note that since any non-zero element α ∈ F_q satisfies α^n = 1 for n = q − 1, we have ev(X^n f(X)) = ev(f(X)) for any f(X) ∈ F_q[X]. Therefore
RS_k(n, 1) = { (f(x_1), f(x_2), . . . , f(x_n)) | f(X) ∈ F_q[X], deg(f) < k },
where x_1, . . . , x_n are the n distinct nonzero elements of F_q.
In this subsection we consider the list decoding of Reed-Solomon codes RS_k(n, 1) and extended Reed-Solomon codes ERS_k(n, 1), that is, the case m = 1 of the general algorithm, Algorithm 9.3.5. As we will discuss later, Sudan's algorithm can be adapted to list decoding of generalized Reed-Solomon codes (see Definition 8.1.10).
The correctness and error-correcting capability of the list-decoding algorithm depend on the parameters r and s. In the following we first prove the correctness of the algorithm for an appropriate choice of r and s. Then we calculate the error-correcting capability.
We can prove the correctness of the list-decoding algorithm by proving: (1) there exists a nonzero polynomial H(X, Y) satisfying the conditions given in Step 1 of Algorithm 9.3.5; and (2) all the polynomials f(X) satisfying the conditions in Step 2 are Y-roots of H(X, Y), that is, Y − f(X) divides H(X, Y).
Proposition 9.3.6 Consider a pair of parameters r and s.
(1) If r and s satisfy
n \binom{r+1}{2} < s(s + 2)/(2(k − 1)),
then a nonzero polynomial H(X, Y) as sought in Algorithm 9.3.5 does exist.
(2) If r and s satisfy
r(n − t) > s,
then for any polynomial f(X) of degree at most k − 1 such that f(x_i) = y_i for at least n − t values of i ∈ {1, 2, . . . , n}, the polynomial H(X, Y) is divisible by Y − f(X).
Proof. We first prove (1). As discussed in the previous subsection, a nonzero polynomial H(X, Y) exists as long as we have a nonzero solution of a system of homogeneous linear equations in the unknowns α_{i_1,i_2}, i.e. the coefficients of H(X, Y). A nonzero solution of the system exists provided that the number of equations is strictly less than the number of unknowns. From the precise expression of the system (see the end of the last subsection), it is easy to calculate the number of equations, which is n \binom{r+1}{2}. Next we compute the number of unknowns. This number is equal to the number of monomials X^{i_1}Y^{i_2} of (1, k − 1)-weighted degree at most s, which is

Σ_{i_2=0}^{⌊s/(k−1)⌋} Σ_{i_1=0}^{s−i_2(k−1)} 1
= Σ_{i_2=0}^{⌊s/(k−1)⌋} (s + 1 − i_2(k − 1))
= (s + 1)( ⌊s/(k−1)⌋ + 1 ) − ((k − 1)/2) ⌊s/(k−1)⌋ ( ⌊s/(k−1)⌋ + 1 )
≥ ( ⌊s/(k−1)⌋ + 1 )( s/2 + 1 )
≥ ( s/(k−1) ) · ( (s + 2)/2 ),

where ⌊x⌋ stands for the maximal integer less than or equal to x. Thus we have proved (1).
We now prove (2). Suppose H(X, f(X)) is not the zero polynomial. Denote h(X) = H(X, f(X)). Let I = { i | 1 ≤ i ≤ n and f(x_i) = y_i }. We have |I| ≥ n − t. For any i = 1, . . . , n, as (x_i, y_i) is a zero of H(X, Y) of multiplicity r, we can express H(X, Y) = Σ_{j_1+j_2 ≥ r} γ_{j_1,j_2} (X − x_i)^{j_1}(Y − y_i)^{j_2}. Now, for any i ∈ I, we have f(X) − y_i = (X − x_i)f_1(X) for some f_1(X), because f(x_i) − y_i = 0. Thus we have
h(X) = Σ_{j_1+j_2 ≥ r} γ_{j_1,j_2} (X − x_i)^{j_1}(f(X) − y_i)^{j_2} = Σ_{j_1+j_2 ≥ r} γ_{j_1,j_2} (X − x_i)^{j_1+j_2}(f_1(X))^{j_2}.
This implies that (X − x_i)^r divides h(X). Therefore h(X) has the factor g(X) = Π_{i∈I}(X − x_i)^r, which is a polynomial of degree at least r(n − t). On the other hand, since H(X, Y) has (1, k − 1)-weighted degree at most s and the degree of f(X) is at most k − 1, the degree of h(X) is at most s, which is less than r(n − t). This is impossible. Therefore H(X, f(X)) is the zero polynomial, that is, Y − f(X) divides H(X, Y).
Proposition 9.3.7 If t satisfies (n − t)^2 > n(k − 1), then there exist r and s satisfying both n \binom{r+1}{2} < s(s + 2)/(2(k − 1)) and r(n − t) > s.

Proof. Set s = r(n − t) − 1. It suffices to prove that there exists an r satisfying
n \binom{r+1}{2} < (r(n − t) − 1)(r(n − t) + 1)/(2(k − 1)),
which is equivalent to the following inequality:
((n − t)^2 − n(k − 1)) · r^2 − n(k − 1) · r − 1 > 0.
Since (n − t)^2 − n(k − 1) > 0, any integer r satisfying
r > ( n(k − 1) + √(n^2(k − 1)^2 + 4(n − t)^2 − 4n(k − 1)) ) / ( 2(n − t)^2 − 2n(k − 1) )
satisfies the inequality above. Therefore, for the list-decoding algorithm to be correct it suffices to set the integers r and s as
r = ⌊ ( n(k − 1) + √(n^2(k − 1)^2 + 4(n − t)^2 − 4n(k − 1)) ) / ( 2(n − t)^2 − 2n(k − 1) ) ⌋ + 1
and s = r(n − t) − 1.
We give the following result, Theorem 9.3.8, which is a straightforward corollary of the two propositions.

Theorem 9.3.8 For an [n, k] Reed-Solomon or extended Reed-Solomon code, the list-decoding algorithm, Algorithm 9.3.5, can correctly find all the codewords c within distance t from the received word r, i.e. d(r, c) ≤ t, provided
t < n − √(n(k − 1)).

Remark 9.3.9 Note that for an [n, k] Reed-Solomon code the minimum distance is d = n − k + 1, which implies that k − 1 = n − d. Substituting this into the bound on the error-correcting capability in the theorem above, we have
t/n < 1 − √(1 − d/n).
This shows that the list decoding of Reed-Solomon codes achieves the Johnson bound (see Theorem 9.3.2).
Regarding the size of the output list of the list-decoding algorithm, we have the following theorem.

Theorem 9.3.10 Consider an [n, k] Reed-Solomon or extended Reed-Solomon code. For any t < n − √(n(k − 1)) and any received word, the number of t-consistent codewords is O(√(n^3 k)).

Proof. In the proof of Proposition 9.3.6 we actually proved that the number N of the t-consistent codewords is bounded from above by the degree deg_Y(H(X, Y)). Since the (1, k − 1)-weighted degree of H(X, Y) is at most s, we have N ≤ deg_Y(H(X, Y)) ≤ s/(k − 1). By the choices of r and s, s/(k − 1) = O( n(k − 1)(n − t)/(k − 1) ) = O(n(n − t)). Corresponding to the largest permissible value of t for the t-consistent codewords, we can choose n − t = √(n(k − 1)) + 1. Thus
N = O(n(n − t)) = O(√(n^3(k − 1))) = O(√(n^3 k)).
Let us analyze the complexity of the list decoding of an [n, k] Reed-Solomon code. As we have seen, the decoding algorithm consists of two main steps. Step 1 is in fact reduced to the problem of solving a system of homogeneous linear equations, which can be implemented using Gaussian elimination with time complexity O( (s(s + 2)/(2(k − 1)))^3 ) = O(n^3), where s(s + 2)/(2(k − 1)) bounds the number of unknowns of the system of homogeneous linear equations and s is given as in Proposition 9.3.7.
The second step is the problem of finding the Y-roots of the polynomial H(X, Y). This can be implemented by using a fast root-finding algorithm proposed by Roth and Ruckenstein with time complexity O(nk).
9.3.4 List decoding of Reed-Muller codes

We consider the list decoding of Reed-Muller codes in this subsection. Let n = q^m and let P_1, . . . , P_n be an enumeration of the points of F_q^m. Recall that the q-ary Reed-Muller code RM_q(u, m) of order u in m variables is defined as
RM_q(u, m) = { (f(P_1), . . . , f(P_n)) | f ∈ F_q[X_1, . . . , X_m], deg(f) ≤ u }.
Note that when m = 1 the code RM_q(u, 1) is actually an extended Reed-Solomon code.
By Proposition 8.4.4, RM_q(u, m) is a subfield subcode of RM_{q^m}(n − d, 1), where d is the minimum distance of RM_q(u, m); that is,
RM_q(u, m) ⊆ RM_{q^m}(n − d, 1) ∩ F_q^n.
Here RM_{q^m}(n − d, 1) is an extended Reed-Solomon code over F_{q^m} of length n and dimension k = n − d + 1. We now give a list-decoding algorithm for RM_q(u, m) as follows.
Algorithm 9.3.11 (List-Decoding Algorithm for Reed-Muller Codes)
INPUT: Code length n and a received word r = (y_1, . . . , y_n) ∈ F_q^n.
Step 0: Do the following:
(1) Compute the minimum distance d of RM_q(u, m) and the parameter t = ⌈n − √(n(n − d))⌉ − 1.
(2) Construct the extension field F_{q^m} using an irreducible polynomial of degree m over F_q.
(3) Generate the code RM_{q^m}(n − d, 1).
(4) Construct a parity check matrix H over F_q for the code RM_q(u, m).
Step 1: Using the list-decoding algorithm for Reed-Solomon codes over F_{q^m}, find L^{(1)}, the set of all codewords c ∈ RM_{q^m}(n − d, 1) satisfying d(c, r) ≤ t.
Step 2: For every c ∈ L^{(1)}, check if c ∈ F_q^n; if so, append c to L^{(2)}.
Step 3: For every c ∈ L^{(2)}, check if Hc^T = 0; if so, append c to L.
Output: L.
From Theorems 9.3.8 and 9.3.10 we have the following theorem.

Theorem 9.3.12 Denote by d the minimum distance of the q-ary Reed-Muller code RM_q(u, m). Then RM_q(u, m) is (t, l)-decodable, provided that
t < n − √(n(n − d)) and l = O(√((n − d)n^3)).
The algorithm above correctly finds all the t-consistent codewords for any received vector r ∈ F_q^n.
Remark 9.3.13 Note that Algorithm 9.3.11 outputs the set of t-consistent codewords of the q-ary Reed-Muller code defined by the enumeration of the points of F_q^m, say P_1, P_2, . . . , P_n, specified in Section 7.4.2. If RM_q(u, m) is defined by another enumeration of the points of F_q^m, say P′_1, P′_2, . . . , P′_n, we can get the correct t-consistent codewords by the following steps: (1) Find the permutation π such that P′_i = P_{π(i)}, i = 1, 2, . . . , n, and the inverse permutation π^{−1}. (2) Let r^∗ = (r_{π(1)}, r_{π(2)}, . . . , r_{π(n)}); then go through Steps 0–2 of Algorithm 9.3.11 with r^∗. (3) For every codeword c = (c_1, c_2, . . . , c_n) ∈ L, let π^{−1}(c) = (c_{π^{−1}(1)}, c_{π^{−1}(2)}, . . . , c_{π^{−1}(n)}). Then π^{−1}(L) = { π^{−1}(c) | c ∈ L } is the set of t-consistent codewords of RM_q(u, m).
Now let us consider the complexity of Algorithm 9.3.11. In Step 0, to construct the extension field F_{q^m} it is necessary to find an irreducible polynomial g(x) of degree m over F_q. It is well known that there are efficient algorithms for finding irreducible polynomials over finite fields. For example, a probabilistic algorithm proposed by V. Shoup in 1994 can find an irreducible polynomial of degree m over F_q with an expected number of O((m^2 log m + m log q) log m log log m) field operations in F_q.
To generate the Reed-Solomon code GRS_{n−d+1}(a, 1) over F_{q^m}, we need to find a primitive element of F_{q^m}. With a procedure by I.E. Shparlinski from 1993, a primitive element of F_{q^m} can be found in deterministic time O((q^m)^{1/4+ε}) = O(n^{1/4+ε}), where n = q^m is the length of the code and ε denotes an arbitrary positive number.
Step 1 of Algorithm 9.3.11 can be implemented using the list-decoding algorithm for the Reed-Solomon code GRS_{n−d+1}(a, 1) over F_{q^m}. From the previous subsection, it can be implemented to run in O(n^3) field operations in F_{q^m}.
So the implementation of Algorithm 9.3.11 requires O(n) field operations in F_q and O(n^3) field operations in F_{q^m}.
9.3.5 Exercises

9.3.1 Let P(X_1, . . . , X_l) = Σ_{i_1,...,i_l} α_{i_1,...,i_l} X_1^{i_1} · · · X_l^{i_l} be a polynomial in the variables X_1, . . . , X_l with coefficients α_{i_1,...,i_l} in a field F. Prove that for any (a_1, . . . , a_l) ∈ F^l,
P(X_1 + a_1, . . . , X_l + a_l) = Σ_{j_1,...,j_l} β_{j_1,...,j_l} X_1^{j_1} · · · X_l^{j_l},
where
β_{j_1,...,j_l} = Σ_{j′_1 ≥ j_1} · · · Σ_{j′_l ≥ j_l} \binom{j′_1}{j_1} · · · \binom{j′_l}{j_l} α_{j′_1,...,j′_l} a_1^{j′_1−j_1} · · · a_l^{j′_l−j_l}.
9.4 Notes

Many cyclic codes have error-correcting pairs; for this we refer to Duursma and Kötter [53, 54].
The algorithms of Berlekamp-Massey [11, 79] and Sugiyama [118] both have O(t^2) as an estimate of their complexity, where t is the number of corrected errors. In fact the algorithms are equivalent, as shown in [50, 65]. The application of a fast computation of the gcd of two polynomials in [4, Chap. 16, §8.9] to computing a solution of the key equation gives complexity O(t log^2(t)) by [69, 104].
Chapter 10
Cryptography
Stanislav Bulygin
This chapter aims to give an overview of topics from cryptography. In particular, we cover symmetric as well as asymmetric cryptography. When talking about symmetric cryptography, we concentrate on the notion of a block cipher, as a means to implement symmetric cryptosystems in practical environments. Asymmetric cryptography is represented by the RSA and El Gamal cryptosystems, as well as the code-based cryptosystems due to McEliece and Niederreiter. We also take a look at other aspects such as authentication codes, secret sharing, and linear feedback shift registers. The material of this chapter is quite basic, but we elaborate more on several topics. Especially, we show connections to codes and related structures where applicable. The basic idea of algebraic attacks on block ciphers is considered in the next chapter, Section 11.3.
10.1 Symmetric cryptography and block ciphers

10.1.1 Symmetric cryptography

This section is devoted to symmetric cryptosystems. The idea behind these is quite simple and thus has basically been known for quite a long time. The task is to convey a secret between two parties, traditionally called Alice and Bob, so that figuring the secret out is not possible without knowledge of some additional information. This additional information is called a secret key and is supposed to be known only to the two communicating parties. The secrecy of the transmitted message rests entirely upon the knowledge of this secret key, and thus if an adversary or an eavesdropper, traditionally called Eve, is able to find out the key, then the whole secret communication is corrupted. Now let us take a look at the formal definition.
Definition 10.1.1 A symmetric cryptosystem is defined by the following data:
• The plaintext space P and the ciphertext space C.
• The sets {E_e : P → C | e ∈ K} and {D_d : C → P | d ∈ K} of encryption and decryption transformations, which are bijections from P to C and from C to P, respectively.
• The above transformations are parametrized by the key space K.
• Given an associated pair (e, d), so that the property D_d(E_e(p)) = p for all p ∈ P holds, knowing e it is "computationally easy" to find out d, and vice versa.
The pair (e, d) is called a secret key. Moreover, e is called the encryption key and d is called the decryption key.

Note that often the counterparts e and d coincide. This gives a reason for the name "symmetric". There also exist cryptosystems in which knowledge of an encryption key e does not reveal (i.e. it is "computationally hard" to find) an associated decryption key d. So encryption keys can be made public, and such cryptosystems are called asymmetric or public-key cryptosystems, see Section 10.2.
Of course, one should specify exactly what P, C, K and the transformations are. Let us take a look at a concrete example.

Example 10.1.2 The first use of a symmetric cryptosystem is conventionally attributed to Julius Caesar. He used the following cryptosystem for communication with his generals, which is historically called the Caesar cipher. Let P and C be the sets of all strings composed of letters from the English (Latin for Caesar) alphabet A = {A, B, C, . . . , Z}. Let K = {0, 1, 2, . . . , 25}. Now an encryption transformation E_e, given a plaintext p = (p_1, . . . , p_n), p_i ∈ A, i = 1, . . . , n, does the following. For each i = 1, . . . , n one determines the position of p_i in the alphabet A ("A" being 0, "B" being 1, . . . , "Z" being 25). Next one finds the letter in A that stands e positions to the left, thus finding a letter c_i; one needs to wrap around if the beginning of A is reached. So with the enumeration of A as above, we have c_i = p_i − e (mod 26). In this way the ciphertext c = (c_1, . . . , c_n) is obtained. The decryption key is given by d = −e (mod 26); equivalently, for decryption one needs to shift letters e positions to the right.
Julius Caesar used e = 3 for his cryptosystem. Let us consider an example. For the plaintext p = "BRUTUS IS AN ASSASSIN", the ciphertext (if we ignore spaces during the encryption) looks like c = "YORQRP FP XK XPPXPPFK". To decrypt one simply shifts 3 positions to the right.
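A few lines of Python (also valid Sage) implement this cipher; the helper name caesar is ours.

A = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def caesar(text, e):
    # c_i = p_i - e (mod 26); symbols outside A (such as spaces) are dropped
    return "".join(A[(A.index(ch) - e) % 26] for ch in text if ch in A)

c = caesar("BRUTUS IS AN ASSASSIN", 3)   # 'YORQRPFPXKXPPXPPFK'
p = caesar(c, -3)                        # 'BRUTUSISANASSASSIN'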
10.1.2 Block ciphers. Simple examples

The above is a simple example of a so-called substitution cipher, which is in turn an instance of a block cipher. Block ciphers, among other things, provide a practical realization of symmetric cryptosystems. They can also be used for constructing other cryptographic primitives, like pseudorandom number generators, authentication codes (Section 10.3), and hash functions. The formal definition follows.

Definition 10.1.3 An n-bit block cipher is defined as a mapping E : A^n × K → A^n, where A is an alphabet set and K is the key space, such that for each k ∈ K the mapping E(·, k) =: E_k : A^n → A^n is invertible. E_k is the encryption transformation for the key k, and E_k^{−1} = D_k is the decryption transformation. If E_k(p) = c, then c is the ciphertext of the plaintext p under the key k.
It is common to work with the binary alphabet, i.e. A = {0, 1}. In such case,
ideally we would like to have a block cipher that is random in the sense that it
implements all (2n
)! bijections from {0, 1}n
to {0, 1}n
. In practice, though, it
is quite expensive to have such a cipher. So when designing a block cipher we
care that it behaves like a random one, i.e. for a randomly chosen key k ∈ K
the encryption transformation Ek should appear random. If one is able to find
distinctions of Ek, where k is in some subset Kweak, from the random transfor-
mation, then it is an evidence of a weakness of the cipher. Such subset Kweak
is called the subset of weak keys; we will turn back to this later when talking
about DES.
Now we present several simple examples of block ciphers. We consider permu-
tation and substitution ciphers, which were used quite intensively in the past (see
Notes) and whose fundamental ideas also appear in modern ciphers.
Example 10.1.4 (Permutation or transposition cipher) The idea of this cipher
is to partition the plaintext into blocks and perform a permutation of the elements
in each block. More formally, partition the plaintext into blocks of the form
p = p_1 . . . p_t and then permute: c = E_k(p) = p_{k(1)}, . . . , p_{k(t)}. The number t is
called the period of the cipher. The key space K is the set of all permutations
of {1, . . . , t}: K = S_t. For example, let the plaintext be p = "CODING
AND CRYPTO", let t = 5, and k = (4, 2, 5, 3, 1). If we remove the spaces and
partition p into 3 blocks we obtain c = "INCODDCGANTORYP". Used alone,
the permutation cipher does not provide good security (see below), but in combina-
tion with other techniques it is also used in modern ciphers to provide diffusion
in a ciphertext.
Example 10.1.5 We can use the Sage system to run the previous example. The
code looks as follows.
sage: S = AlphabeticStrings()
sage: E = TranspositionCryptosystem(S,5)
sage: K = PermutationGroupElement('(4,2,5,3,1)')
sage: L = E.inverse_key(K)
sage: M = S("CODINGANDCRYPTO")
sage: e = E(K)
sage: c = E(L)
sage: e(M)
INCODDCGANTORYP
sage: c(e(M))
CODINGANDCRYPTO
One can also choose a random key for encryption:
sage: KR = E.random_key()
sage: KR
(1,4,2,3)
sage: LR = E.inverse_key(KR)
sage: LR
(1,3,2,4)
sage: eR = E(KR)
sage: cR = E(LR)
sage: eR(M)
IDCONDNGACTPRYO
sage: cR(eR(M))
CODINGANDCRYPTO
Example 10.1.6 (Substitution cipher) The idea behind the monoalphabetic substi-
tution cipher is to substitute every symbol in a plaintext by some
other symbol from a chosen alphabet. Formally, let A be the alphabet, so that
plaintexts and ciphertexts are composed of symbols from A. For the plaintext
p = p_1 . . . p_n the ciphertext c is obtained as c = E_k(p) = k(p_1), . . . , k(p_n). The
key space is the set of all permutations of A: K = S_{|A|}. In Example
10.1.2 we have already seen an instance of such a cipher. There k was chosen
to be k = (23, 24, 25, 0, 1, . . . , 21, 22). Again, used alone the monoalphabetic cipher
is insecure, but its basic idea is used in modern ciphers to provide confusion
in a ciphertext.
There is also the polyalphabetic substitution cipher. Let the key k be defined as a
sequence of permutations on A: k = (k_1, . . . , k_t), where t is the period. Then
every t symbols of the plaintext p are mapped to t symbols of the ciphertext c
as c = k_1(p_1), . . . , k_t(p_t). Simplifying k_i to shifting by l_i symbols to the right
we obtain c_i = p_i + l_i (mod |A|). Such a cipher is called the simple Vigenère cipher.
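A sketch of the simple Vigenère cipher in the same Python style (reusing
ALPHABET from the Caesar sketch above; the key string is the one that appears
in Example 10.1.7 below, so the outputs can be compared):

def vigenere(text, key, decrypt=False):
    # Position i is shifted by l_i = alphabet index of key[i mod t],
    # i.e. c_i = p_i + l_i (mod 26); decryption subtracts the shifts.
    t = len(key)
    out = []
    for i, ch in enumerate(text):
        l = ALPHABET.index(key[i % t])
        if decrypt:
            l = -l
        out.append(ALPHABET[(ALPHABET.index(ch) + l) % 26])
    return "".join(out)

c = vigenere("CODINGANDCRYPTO", "XSPUDFOQLRMRDJS")
print(c)                                             # ZGSCQLODOTDPSCG
print(vigenere(c, "XSPUDFOQLRMRDJS", decrypt=True))  # CODINGANDCRYPTO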
Example 10.1.7 The Sage code for a substitution cipher encryption is given
below.
sage: S = AlphabeticStrings()
sage: E = SubstitutionCryptosystem(S)
sage: K = E.random_key()
sage: K
ZYNJQHLBSPEOCMDAXWVRUTIKGF
sage: L = E.inverse_key(K)
sage: M = S("CODINGANDCRYPTO")
sage: e = E(K)
sage: e(M)
NDJSMLZMJNWGARD
sage: c = E(L)
Here the string ZYNJQHLBSPEOCMDAXWVRUTIKGF shows the permutation of the
alphabet. Namely, the letter A is mapped to Z, the letter B is mapped to Y, etc.
One can also provide the permutation explicitly as follows.
sage: K = S('MHKENLQSCDFGBIAYOUTZXJVWPR')
sage: e = E(K)
sage: e(M)
KAECIQMIEKUPYZA
A piece of code for working with the simple Vigenère cipher is provided below.
sage: S = AlphabeticStrings()
sage: E = VigenereCryptosystem(S,15)
sage: K = S('XSPUDFOQLRMRDJS')
sage: L = E.inverse_key(K)
sage: M = S("CODINGANDCRYPTO")
sage: e = E(K)
sage: e(M)
ZGSCQLODOTDPSCG
sage: c = E(L)
sage: c(e(M))
CODINGANDCRYPTO

Table 10.1: Frequencies of the letters in the English language

E 11.1607%    M 3.0129%
A 8.4966%     H 3.0034%
R 7.5809%     G 2.4705%
I 7.5448%     B 2.0720%
O 7.1635%     F 1.8121%
T 6.9509%     Y 1.7779%
N 6.6544%     W 1.2899%
S 5.7351%     K 1.1016%
L 5.4893%     V 1.0074%
C 4.5388%     X 0.2902%
U 3.6308%     Z 0.2722%
D 3.3844%     J 0.1965%
P 3.1671%     Q 0.1962%
Note that here the string XSPUDFOQLRMRDJS defines 15 permutations: one per
position. Namely, every letter is the image of the letter A at that position. So
at the first position A is mapped to X (therefore, e.g., B is mapped to Y), at the
second position A is mapped to S, and so on.
The ciphers above, used alone, do not provide security, as has already been men-
tioned. One way to break such ciphers is to use statistical methods. For
permutation ciphers, note that they do not change the frequency of occurrence of
each letter of the alphabet. Comparing the frequencies obtained from a cipher-
text with the frequency distribution of the language used, one can recognize that
one is dealing with a ciphertext obtained with a permutation cipher. Moreover,
for cryptanalysis one may look for anagrams, words in which the letters are
permuted. If the eavesdropper is able to find such anagrams and solve them,
then he/she is pretty close to breaking such a cipher (Exercise 10.1.1). Also,
if the eavesdropper has access to an encryption device and is able to pro-
duce ciphertexts for plaintexts of his/her choice (chosen-plaintext attack), then
he/she can simply choose plaintexts such that figuring out the period and the
permutation becomes easy.
For monoalphabetic substitution ciphers one also notes that although the letters are
changed, the frequencies with which they occur do not change. So the eaves-
dropper may compare the frequencies in a long-enough ciphertext with the frequency
distribution of the language used and thus figure out how the letters of the alphabet
were mapped to obtain the ciphertext. For example, for the English alphabet one
may use the frequency analysis of words occurring in the "Concise Oxford Dictionary"
(http://www.askoxford.com/asktheexperts/faq/aboutwords/frequency), see
Table 10.1. Note that since the positions of the symbols are not altered, the eaves-
dropper may look not only at frequencies of individual symbols, but also at combina-
tions of symbols, in particular pieces of a ciphertext that correspond
to frequently used words like "the", "we", "in", "at", etc. For polyalphabetic
ciphers one needs to find the period first. This can be done by the so-called
Kasiski method. When the period t is determined, one can proceed with the
frequency analysis as above, performed separately for each set of positions that
are at distance t from each other.
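The letter-frequency comparison described above is easy to automate. Below is
a minimal Python sketch (our own helper, not a library function) that computes
the frequency profile of a ciphertext; matching its output against Table 10.1
suggests candidate substitutions:

from collections import Counter

def letter_frequencies(ciphertext):
    # Percentage frequency of each letter, ignoring spaces and punctuation.
    letters = [ch for ch in ciphertext.upper() if ch.isalpha()]
    counts = Counter(letters)
    return {ch: round(100.0 * n / len(letters), 2)
            for ch, n in counts.most_common()}

# The most frequent ciphertext letters are candidates for the images of
# E, A, R, I, ... from Table 10.1; short frequent words help further.
print(letter_frequencies("YORQRP FP XK XPPXPPFK"))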
10.1.3 Security issues
As we have seen, block ciphers provide us with a means to convey secret messages
in the symmetric setting. It is clear that an eavesdropper will try to get insight into
this secret communication. The questions that naturally arise are "What does it
mean to break a cipher?" and "When is a cipher considered to be broken?". In
general we consider a cipher to be totally broken if the eavesdropper is able to
recover the secret key, thus compromising the whole secret communication. We
consider a cipher to be partially broken if the eavesdropper is able to recover (a
part of) a plaintext from a given ciphertext, thus compromising part of the
communication. In order to describe the actions of the eavesdropper more formally,
different assumptions on the eavesdropper's abilities and different attack scenarios
are introduced.
Assumptions:
• The eavesdropper has access to all ciphertexts that are transmitted
through the communication channel. He/she is able to extract these ci-
phertexts and use them further at his/her disposal.
• The eavesdropper has a full description of the block cipher itself, i.e.
he/she is aware of how the encryptions constituting the cipher act.
The first assumption is natural, since communication in the modern
world (e.g. via the Internet) involves huge amounts of information trans-
mitted between an enormous variety of parties. Therefore, it is impossible to
provide secure channels for all such transmissions. The second one is also quite
natural, as for most block ciphers proposed in recent times a full description is
publicly available, either as a legitimate standard or as a paper/report.
Attack scenarios:
• ciphertext-only: The eavesdropper does not have any additional informa-
tion, only an intercepted ciphertext.
• known-plaintext: Some amount of plaintext-ciphertext pairs encrypted
with one particular yet unknown key is available to the eavesdropper.
• chosen-plaintext and chosen-ciphertext: The eavesdropper has access
to plaintext-ciphertext pairs with plaintexts resp. ciphertexts of his/her
choice.
• adaptive chosen-plaintext and adaptive chosen-ciphertext: The choice of
the special plaintexts resp. ciphertexts in the previous scenario depends on
some prior processing of pairs.
• related-key: The eavesdropper is able to do encryptions with unknown yet
related keys, the relations being known to the eavesdropper.
Note that the last three attacks are quite hard to realize in a practical envi-
ronment and are sometimes even impossible. Nevertheless, studying these scenarios
provides more insight into the security properties of a considered cipher.
When undertaking an attack on a cipher one thinks in terms of complexity. Recall
from Definition 6.1.4 that there are always time (or processing) as well as mem-
ory (or storage) complexities. Another type of complexity one deals with here is
data complexity, which is the amount of pre-knowledge (e.g. plain-/ciphertexts)
needed to mount an attack.
The first thing to think of when designing a cipher is to choose the block/key length
so that brute force attacks are not possible. Let us take a closer look. If
the eavesdropper is given 2^n plaintext-ciphertext pairs encrypted with one secret
key, then he/she knows the encryption function for that secret key entirely.
This implies that n should not be chosen too small, as then simply composing
a codebook of associated plaintexts/ciphertexts is possible. For modern block
ciphers, a block length of 128 bits is common. On the other hand, if the eavesdropper
is given just one plaintext-ciphertext pair (p, c), he/she may proceed as follows.
Try every key from K (assume now that K = {0,1}^l) until a key k is found that
maps p to c: E_k(p) = c. Validate k with another pair (or several pairs) (p′, c′),
i.e. check whether E_k(p′) = c′. If the validation fails, then discard k and move
further through K. One expects to find a valid key after searching through half of
{0,1}^l, i.e. after 2^{l−1} trials. This observation implies that the key space should not
be too small, as then an exhaustive search of this kind is possible. For modern
ciphers key lengths of 128, 192, and 256 bits are used. Smaller block lengths, like
64 bits, are also employed in lightweight ciphers that are used for resource-con-
strained devices.
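The exhaustive search just described is captured by the following minimal
Python sketch, with the 26-key Caesar cipher (the caesar() function from the
earlier sketch) standing in for a real block cipher whose key space {0,1}^l would
of course be far too large to enumerate:

def exhaustive_search(pairs, keyspace=range(26)):
    # Try every key; validate each candidate against all known
    # plaintext-ciphertext pairs before accepting it.
    for k in keyspace:
        if all(caesar(p, k) == c for p, c in pairs):
            return k
    return None

pairs = [("BRUTUS", "YORQRP"), ("CAESAR", "ZXBPXO")]
print(exhaustive_search(pairs))  # 3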
Let us now discuss the two main types of security that exist for cryptosys-
tems in general.
Definition 10.1.8
• Computational security. Here one considers a cryp-
tosystem to be (computationally) secure if the number of operations needed
to break the cryptosystem is so large that it cannot be executed in practice,
and similarly for memory. Usually one measures such a number by the best
attacks available for a given cryptosystem, thus claiming computa-
tional security. Another similar idea is to show that breaking a given
cryptosystem is equivalent to solving some problem that is believed to be
hard. Such security is called provable security or sometimes reductionist
security.
• Unconditional security. Here one assumes that an eavesdropper has
unlimited computational power. If one is able to prove that even having
this unlimited power, an eavesdropper is not able to break a given cryp-
tosystem, then it is said that the cryptosystem is unconditionally secure
or that it provides perfect secrecy.
Before going to examples of block ciphers, let us take a look at the security
criteria that are usually used when estimating the security capabilities of a cipher.
Security criteria:
• state-of-the-art security level: One gets more confident in a cipher's se-
curity if known up-to-date attacks, both generic and specialized, do not
break the cipher faster than exhaustive search. The more such at-
tacks are considered, the more confidence one gets. Of course, one cannot
be absolutely confident here, as new, previously unknown attacks may appear
that would pose a real threat.
• block and key size: As we have seen above, small block and key sizes
make brute force attacks possible, so in this respect longer blocks and
keys provide more security. On the other hand, longer blocks and keys
imply higher costs in implementing such a cipher, i.e. encryption time
and memory consumption may rise considerably. So there is a trade-off
between security and ease/speed of implementation.
• implementation complexity: In addition to the previous point, one should
also take care of efficient implementation of the encryption/decryption map-
pings, depending on the environment. For example, different methods may
be used for hardware and software implementations. Special care is to be
taken when one deals with hardware units with very limited memory (e.g.
smartcards).
• others: Things like data expansion and error propagation also play a role
in applications and should be taken into account accordingly.
10.1.4 Modern ciphers. DES and AES
In Section 10.1.2 we considered basic ideas for block ciphers. Next, let us con-
sider two examples of modern block ciphers. The first one - DES (Data Encryp-
tion Standard) - was proposed in 1976 and was used until the late 1990s. Due to
its short key length, it became possible to implement an exhaustive search attack,
so DES was no longer secure. In 2001 the cipher Rijndael, proposed by the
Belgian cryptographers Joan Daemen and Vincent Rijmen, was adopted as the
Advanced Encryption Standard (AES) in the USA and is now widely used for
protecting classified governmental documents. In commerce AES also became
the de facto standard.
We start with DES, which is an instance of a Feistel cipher, which is in turn an
iterative cipher.
Definition 10.1.9 An iterative block cipher is a block cipher which sequentially
performs a certain key-dependent transformation F_k. This transformation is
called the round transformation and the number of rounds N_r is a parameter of the
iterative cipher. It is also common to expand the initial private key k into subkeys
k_i, i = 1, . . . , N_r, where each k_i is used as the key for F at round i. The procedure
for obtaining the subkeys from the initial key is called the key schedule. For each
k_i the transformation F should be invertible to allow decryption.
DES
Definition 10.1.10 A Feistel cipher is an iterative cipher where encryption is
done as follows. Divide the n-bit plaintext p into a left and a right half (l_0, r_0)
(n is assumed to be even). A transformation f : {0,1}^{n/2} × K′ → {0,1}^{n/2} is
chosen (K′ may differ from K). The initial secret key is expanded to obtain the
subkeys k_i, i = 1, . . . , N_r. Then for every i = 1, . . . , N_r a pair (l_i, r_i) is obtained
from the previous pair (l_{i−1}, r_{i−1}) as follows: l_i = r_{i−1}, r_i = l_{i−1} ⊕ f(r_{i−1}, k_i).
Here "⊕" means bitwise addition of {0,1}-vectors. The ciphertext is taken as
(r_{N_r}, l_{N_r}) rather than (l_{N_r}, r_{N_r}).
The scheme of Feistel cipher encryption is shown in Figure 10.1.

Figure 10.1: Feistel cipher encryption
Note that f(·, k_i) need not be invertible (Exercise 10.1.5). Decryption is done
analogously with the reverse order of subkeys: k_{N_r}, . . . , k_1.
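The following minimal Python sketch of the Feistel structure (with a deliberately
non-invertible toy round function standing in for f; all names are ours) makes it
concrete that decryption is the same loop with the subkeys reversed, regardless
of whether f is invertible:

def feistel_encrypt(l, r, subkeys, f):
    # One round: l_i = r_{i-1}, r_i = l_{i-1} XOR f(r_{i-1}, k_i).
    for k in subkeys:
        l, r = r, l ^ f(r, k)
    return r, l  # final swap: the ciphertext is (r_N, l_N)

def feistel_decrypt(r, l, subkeys, f):
    # The same structure with reversed subkeys undoes the rounds.
    for k in reversed(subkeys):
        r, l = l, r ^ f(l, k)
    return l, r

f = lambda x, k: (x * x + k) & 0xFFFFFFFF  # toy, non-invertible f
subkeys = [0x1234, 0x5678, 0x9ABC]
c = feistel_encrypt(0xDEADBEEF, 0xCAFEBABE, subkeys, f)
l0, r0 = feistel_decrypt(*c, subkeys, f)
print(hex(l0), hex(r0))  # 0xdeadbeef 0xcafebabe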
DES is a Feistel cipher that operates on 64-bit blocks and uses a 56-bit key.
Actually the key is initially given as 64 bits, of which 8 bits can be used as
parity checks. DES has 16 rounds. The subkeys k_1, . . . , k_16 are 48 bits long.
The transformation f from Definition 10.1.10 is chosen as

f(r_{i−1}, k_i) = P(S(E(r_{i−1}) ⊕ k_i)). (10.1)

Here E : {0,1}^32 → {0,1}^48 is an expansion transformation that expands a 32-
bit vector to a 48-bit one in order to fit the size of k_i when doing the bitwise
addition. Next, S is a substitution transformation that acts as follows. First
divide the 48-bit vector E(r_{i−1}) ⊕ k_i into eight 6-bit blocks. For every block perform
a (non-linear) substitution that takes 6 bits and outputs 4 bits. Thus at the
end one has a 32-bit vector obtained by concatenating the results of
the substitution S. The substitution S is an instance of an S-box, a carefully
chosen non-linear transformation that makes the relation between its input and
output complex, thus adding confusion to the encryption transformation (see
below for the discussion). Finally, P is a permutation of a 32-bit vector.
Algorithm 10.1.11 (DES encryption)
Input: The 64-bit plaintext p and the 64-bit key k.
Output: The 64-bit ciphertext c corresponding to p.
1. Use the parity check bits k_8, k_16, . . . , k_64 to detect errors in the 8-bit subblocks
of k.
- If no errors are detected, then obtain the 48-bit subkeys k_1, . . . , k_16 from
k using the key schedule.
2. Take p and apply an initial permutation IP to p. Divide the 64-bit
vector IP(p) into halves (l_0, r_0).
3. For i = 1, . . . , 16 do
- l_i := r_{i−1}.
- f(r_{i−1}, k_i) := P(S(E(r_{i−1}) ⊕ k_i)), with S, E, P as explained after
(10.1).
- r_i := l_{i−1} ⊕ f(r_{i−1}, k_i).
4. Interchange the last halves: (l_16, r_16) → (r_16, l_16) = c′.
5. Apply the permutation inverse to the initial one to c′; the result is the
ciphertext c := IP^{-1}(c′).
Let us now give a brief overview of DES properties. First of all, we mention two
basic features that any modern block cipher provides and that definitely should be
taken into account when designing a block cipher.
• Confusion. When an encryption transformation of a block cipher makes the
relations among a plaintext, a ciphertext, and a key as complex as possi-
ble, it is said that such a cipher adds confusion to the encryption process. Confusion
is usually achieved by non-linear transformations realized by S-boxes.
• Diffusion. When an encryption transformation of a block cipher makes
every bit of a ciphertext dependent on every bit of a plaintext and on every
bit of a key, it is said that such a cipher adds diffusion to the encryption process.
Diffusion is usually achieved by permutations. See Exercise 11.3.1 for a
concrete example.
Empirically, DES has the above features, so in this respect it appears to be rather
strong. Let us discuss some other features of DES and some attacks that exist
on DES. Let DES_k(·) be the encryption transformation defined by
DES as per Algorithm 10.1.11 for a key k. DES has 4 weak keys; in this context
these are the keys k for which DES_k(DES_k(·)) is the identity mapping, which,
of course, violates the criteria mentioned above. Moreover, for each of these
weak keys DES has 2^32 fixed points, i.e. plaintexts p such that DES_k(p) = p.
There are 6 pairs of semi-weak keys (dual keys), i.e. pairs (k_1, k_2) such
that DES_{k_1}(DES_{k_2}(·)) is the identity mapping. Similarly to weak keys, 4 out
of the 12 semi-weak keys have 2^32 anti-fixed points, i.e. plaintexts p such that
DES_k(p) = p̄, where p̄ is the bitwise complement of p. It is also known that
DES encryptions are not closed under composition, i.e. they do not form a group.
This is quite important, as otherwise using multiple DES encryptions would be
less secure than is believed.
If the eavesdropper is able to work with huge data complexity, several known-
plaintext attacks become possible. The most well-known of them related to DES
are linear and differential cryptanalysis. Linear cryptanalysis was proposed
by Mitsuru Matsui in the early 1990s and is based on the idea of approximating
a cipher with an affine function. In order to implement this attack on DES one
needs 2^43 known plaintext-ciphertext pairs. The existence of such an attack is
evidence of a theoretical weakness of DES. A similar observation applies to
differential cryptanalysis. The idea of this general method is to carefully explore
how differences in inputs to certain parts of an encryption transformation affect
the outputs of these parts. Usually the focus is on the S-boxes. An eavesdropper
tries to find a bias in the distribution of differences, which would allow him/her
to distinguish a cipher from a random permutation. In the DES situation the
eavesdropper needs 2^55 known or 2^47 chosen plaintext-ciphertext pairs in order
to mount such an attack. These attacks do not bear any practical threat to
DES. Moreover, performing exhaustive search on the entire key space of size 2^56
is faster in practice than the attacks above.
AES
Next we present a basic description and properties of the Advanced Encryp-
tion Standard (AES). AES is the successor of DES and was proposed because
DES was no longer considered secure. A new cipher for the Standard
was required to have a larger key/block size and to be resistant to linear and differential
cryptanalysis, which posed a theoretical threat to DES. The cipher Rijn-
dael adopted for the Standard satisfies these demands. It operates on blocks of
length 128 bits and keys of length 128, 192, or 256 bits. We will concentrate on
AES with keys of length 128 bits - the most common setting used.
AES is an instance of a substitution-permutation network. We give the definition
next.
Definition 10.1.12 A substitution-permutation network (SP-network) is an
iterative block cipher with layers of S-boxes interchanged with layers of permu-
tations (or P-boxes), see Figure 10.2. It is required that the S-boxes are invertible.
Note that in the definition of an SP-network we demand that S-boxes be invert-
ible transformations, in contrast to Feistel ciphers, where S-boxes do not have to
be invertible, see the discussion after Definition 10.1.10. Sometimes invertibility
of S-boxes is not required, which makes the definition wider. If we recall the
notions of confusion and diffusion, we see that SP-networks exactly reflect these
notions: S-boxes provide local confusion and then bit permutations or affine
maps provide diffusion.
Figure 10.2: SP-network
The description of AES follows. As has already been said, AES operates
on 128-bit blocks and 128-bit keys (standard version). For convenience these
128-bit vectors are considered as 4 × 4 arrays of bytes (8 bits). AES-128 (key
length 128 bits) has 10 rounds. We know that AES is an SP-network, so let
us describe its substitution and diffusion (permutation) layers.
The AES substitution layer is based on 16 S-boxes, each acting on a separate byte of
the square representation. In AES terminology the S-box is called SubBytes.
One S-box performs its substitution in three steps:
1. Inversion: Consider an input byte b_input (a {0,1}-vector of length 8) as
an element of F_256. This is done via the isomorphism F_2[a]/⟨a^8 + a^4 + a^3 +
a + 1⟩ ≅ F_256, so that F_256 can be regarded as an 8-dimensional vector
space over F_2 *** Appendix ***. If b_input ≠ 0, then the output of this
step is b_inverse = b_input^{-1}; otherwise b_inverse = 0.
2. F_2-linear mapping: Consider b_inverse again as a vector from F_2^8. The
output of this step is given by b_linear = L(b_inverse), where L is an invertible
F_2-linear mapping given by a prescribed circulant matrix.
3. S-box constant: The output of the entire S-box is obtained as b_output =
b_linear + c, where c is an S-box constant.
Thus, in essence, each S-box applies inversion and then an affine transforma-
tion to an 8-bit input block, yielding an 8-bit output block. It is easy to see that the
S-box so defined is invertible.
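The three steps can be sketched in a few lines of Python. Multiplication and
inversion are in F_256 with the reduction polynomial a^8 + a^4 + a^3 + a + 1
(hexadecimal 0x11B); the affine step below uses the circulant matrix and constant
c = 0x63 of the AES specification (inversion by exhaustive search is our own
simplification, which is fine for 256 elements):

def gf_mul(x, y):
    # Multiplication in F_256 modulo a^8 + a^4 + a^3 + a + 1 (0x11B).
    r = 0
    while y:
        if y & 1:
            r ^= x
        x <<= 1
        if x & 0x100:
            x ^= 0x11B
        y >>= 1
    return r

def gf_inv(b):
    # Inversion in F_256 by exhaustive search; 0 is mapped to 0.
    if b == 0:
        return 0
    return next(x for x in range(256) if gf_mul(b, x) == 1)

def sbox(b):
    # Inversion followed by the affine map b_output = L(b_inverse) + c.
    x = gf_inv(b)
    bits = [(x >> i) & 1 for i in range(8)]
    out = 0
    for i in range(8):
        bit = (bits[i] ^ bits[(i + 4) % 8] ^ bits[(i + 5) % 8]
               ^ bits[(i + 6) % 8] ^ bits[(i + 7) % 8] ^ ((0x63 >> i) & 1))
        out |= bit << i
    return out

print(hex(sbox(0x00)), hex(sbox(0x01)))  # 0x63 0x7c, as in the AES tables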
The substitution layer acts locally on each individual byte, whereas the diffusion
layer acts on the entire square array. The diffusion layer consists of two consec-
utive linear transformations. The first one, called ShiftRows, shifts the i-th row
of the array by i − 1 positions to the left. The second one, called MixColumns,
is given by a 4 × 4 matrix M over F_256 and transforms every column C of the
array into the column MC. The matrix M is the parity check matrix of an MDS
code, cf. Definition 3.2.2; it was introduced to follow the so-called wide trail
strategy and to preclude linear and differential cryptanalysis.
Let us now describe the encryption process of AES.
Algorithm 10.1.13 (AES encryption)
Input: The 128-bit plaintext p and the 128-bit key k.
Output: The 128-bit ciphertext c corresponding to p.
1. Perform the initial key addition: w := p ⊕ k = AddRoundKey(p, k).
2. Expand the initial key k to subkeys k_1, . . . , k_10 using the key schedule.
3. For i = 1, . . . , 9 do
- Perform the S-box substitution: w := SubBytes(w).
- Shift the rows: w := ShiftRows(w).
- Transform the columns with the MDS matrix M: w := MixColumns(w).
- Add the round key: w := AddRoundKey(w, k_i) = w ⊕ k_i.
# The last round does not have MixColumns.
4. Perform the S-box substitution: w := SubBytes(w).
5. Shift the rows: w := ShiftRows(w).
6. Add the round key: w := AddRoundKey(w, k_10) = w ⊕ k_10.
7. The ciphertext is c := w.
The key schedule is designed similarly to the encryption and is omitted here.
All the details on the components, the key schedule, and the reverse cipher for
decryption can be found in the literature, see Notes. The reverse cipher is quite
straightforward, as it has to undo invertible affine transformations and the in-
version in F_256.
Let us discuss some properties of AES. First of all, we note that AES possesses
the confusion and diffusion properties. The use of S-boxes provides sufficient resis-
tance to linear and differential cryptanalysis, which was one of the major concerns
when replacing DES. The use of the affine mapping in the S-box, among other
things, removes fixed points. In the diffusion layer the diffusion is done sep-
arately for rows and columns. It is remarkable that, in contrast to DES,
where the encryption is mainly described via table look-ups, the AES description
is very algebraic. All transformations are described as either field inversions or
matrix multiplications. Of course, in real-world applications some operations
like the S-box are nevertheless realized as table look-ups. Still, the simplicity of the
AES description has been under discussion since the selection process in which the
future AES Rijndael took part. The highly algebraic nature of the AES descrip-
tion boosted a new branch of cryptanalysis called algebraic cryptanalysis. We
address this issue in the next chapter, see Section 11.3.
10.1.5 Exercises
10.1.1 It is known that the following ciphertext is obtained with a permutation
cipher of period 6 and contains an anagram of a famous person’s name (spaces
are ignored by encryption): ”AAASSNISFNOSECRSAAKIWNOSN”. Find the
original plaintext.
10.1.2 Sequential composition of several permutation ciphers with periods t1,
. . . , ts is called compound permutation (compound transposition). Show that
the compound permutation cipher is equivalent to a simple permutation cipher
with the period t = lcm(t1, . . . , ts).
10.1.3 [CAS] The Hill cipher is defined as follows. One encodes a length-n
block p = (p_1 . . . p_n), which is assumed to consist of elements from Z_n, with
an invertible n × n matrix H = (h_ij) as c_i = Σ_{j=1}^n h_ij p_j. Therewith one
obtains the cryptogram c = (c_1 . . . c_n). The decryption is done analogously
using H^{-1}. Write a procedure that implements the Hill cipher. Compare your
implementation with the HillCryptosystem class from Sage.
10.1.4 The following text is encrypted with a monoalphabetic substitution
cipher. Decrypt it using frequency analysis and Table 10.1:
AI QYWX YRHIVXEOI MQQIHMEXI EGXMSR. XLI IRIQC MW ZIVC
GPSWI!
Hint: Decrypting small words first may be very useful.
10.1.5 Show that in the definition of a Feistel cipher the transformation f
need not be invertible to ensure decryption, in the sense that the round function
is invertible even if f is not. Also show that performing encryption starting
at (r_{N_r}, l_{N_r}) with the reverse order of subkeys yields (l_0, r_0) at the end, thus
providing a decryption.
10.1.6 It is known that the expansion transformation E of DES has the comple-
mentation property, i.e. for every input x the value E(x̄) is the bitwise complement
of E(x), where x̄ denotes the bitwise complement of x. It is also known that the
complemented key k̄ expands to the complemented subkeys k̄_1, . . . , k̄_16. Knowing
this, show that
a. The entire DES transformation also possesses the complementation prop-
erty: for all p ∈ {0,1}^64 and k ∈ {0,1}^56, DES_k̄(p̄) is the bitwise complement
of DES_k(p).
Using (a.) show that
b. It is possible to reduce the exhaustive search complexity from 2^55 (half the
key-space size) to 2^54.
10.2 Asymmetric cryptosystems
In Section 10.1 we considered symmetric cryptosystems. As we saw, for
successful communication Alice and Bob are required to keep their encryp-
tion/decryption keys secret. Only the channel itself is assumed to be eaves-
dropped. For Alice and Bob to set up secret communication it is necessary to
convey the encryption/decryption keys. This can be done, e.g., by means of a trusted
courier or some very secure channel (like a specially secured telephone line).
This paradigm suited diplomatic and military communication well: the number
of communicating parties in these scenarios was quite limited; in addition, the
communicating parties could usually afford sending a trusted courier in order to keep
keys secret, or provide some highly protected channel for exchanging keys. In the
1970s, with the beginning of electronic communication, it became apparent that
such an exchange mechanism is absolutely inefficient. This is mainly due to a
drastic increase in the number of communicating parties. It is not only diplomats
or high-ranking military officials that wish to set up secret communication, but
ordinary users (e.g. companies, banks, social network users) who would like to be
able to do business over some large distributed network. Suppose that there are
n users who are potentially willing to communicate with each other secretly. Then
it is possible to share secret keys between every pair of users. There are n(n − 1)/2
pairs of users, so one would need this number of exchanges in a network to set up
the communication. Note that already for n = 1,000 we have n(n − 1)/2 = 499,500,
which is of course not something we would like to do. Another option would be to
set up some trusted authority in the middle who would store secret keys for every
user; then if Alice would like to send a plaintext p to Bob, she would send
c_Alice = E_{K_Alice}(p) to the trusted authority Tim. Tim would decrypt
p = D_{K_Alice}(c_Alice) and send c_Bob = E_{K_Bob}(p) to Bob, who is then able to
decrypt c_Bob with his secret key K_Bob. An obvious drawback of this approach is
that Tim knows all the secret keys, and thus is able to read (and alter!) all the
plaintexts, which is of course not desirable. Another disadvantage is that for a
large network it could be hard to implement a trusted authority of this kind, as it
has to take part in every communication between users and thus can get
overwhelmed.
A solution to the problem above was proposed by Diffie and Hellman in 1976.
This was the starting point of asymmetric cryptography. The idea is that if
Alice wants to communicate with some other parties, she generates an encryp-
tion/decryption pair (e, d) in such a way that knowing e it is computationally
infeasible to obtain d. This is quite different from symmetric cryptography,
where e and d are (computationally) the same. The motivation for the name "asym-
metric cryptosystem" as opposed to "symmetric cryptosystem" should be clear
now. So what Alice does is publish her encryption key e in some public
repository and keep d secret. If Bob wants to send a plaintext p to Alice, he
simply finds her public key e = e_Alice in the repository and uses it for encryp-
tion: c = E_e(p). Now Alice is able to decrypt with her private key d = d_Alice.
Note that due to the assumptions we have on the pair (e, d), Alice is the only person
who is able to decrypt c. Indeed, Eve may know c and the encryption key e
used, but she is not able to get d for decryption. Remarkably, even Bob himself
is not able to restore his plaintext p from c if he loses or deletes it beforehand!
The formal definition follows.
Definition 10.2.1 An asymmetric cryptosystem is defined by the following
data:
• The plaintext space P and the ciphertext space C.
• {E_e : P → C | e ∈ K} and {D_d : C → P | d ∈ K} are the sets of encryption
and decryption transformations respectively, which are bijections from P to C
and from C to P.
• The above transformations are parameterized by the key space K.
• Given an associated pair (e, d), so that the property D_d(E_e(p)) = p holds
for all p ∈ P, knowing e it is "computationally hard" to find d.
Here, the encryption key e is called public and the decryption key d is called
private.
The core issue in the above definition is the property that knowledge of
e practically does not shed any light on d. The study of this issue led to the
notion of a one-way function. We say that a function f : X → Y is one-way if
it is "computationally easy" to compute f(x) for any x ∈ X, but for y ∈ Im(f)
it is "computationally hard" to find x ∈ X such that f(x) = y. Note that
one may compute Y′ = {f(x) | x ∈ Z ⊂ X}, where Z is some small subset of
X, and then invert elements from Y′. Still, Y′ is essentially small compared to
Im(f), so for a randomly chosen y ∈ Im(f) the above assumption should hold.
Theoretically it is not known whether one-way functions exist, but in practice there
are several candidates that are believed to be one-way. We discuss this a bit
later.
The above notion of a one-way function solves half of the problem. Namely, if
Bob sends Alice an encrypted plaintext c = E(p), where E is one-way, Eve
is not able to find p, as she is not able to invert E. But Alice then faces the
same problem! Of course we would like to provide Alice with means to invert
E and find p. Here the notion of a trapdoor one-way function comes in handy.
A one-way function f : X → Y is said to be trapdoor one-way if there is some
additional information, called the trapdoor, with which it is "computationally
easy" for y ∈ Im(f) to find x ∈ X such that f(x) = y. Now if Alice possesses
such a trapdoor for E she is able to obtain p from c.
Example 10.2.2 We now give examples of functions that are believed to be
one-way.
1. The first is f : Z_n → Z_n defined by f(x) = x^a mod n. If we take a = 3
it is easy to compute x^3 mod n, but given y ∈ Z_n it is believed to be
hard to compute x such that y = x^3 mod n. For suitably chosen a and n
this function is used in the RSA cryptosystem, Section 10.2.1. For a = 2 one
obtains the so-called Rabin scheme. It can be shown that in this case factoring
n is in fact equivalent to inverting f. Since factoring integers is
considered to be a hard computational problem, it is believed that f is
one-way. For RSA it is believed that inverting f is as hard as factoring,
although no rigorous proof is known. In both schemes above it is assumed
that n = pq, where p and q are (suitably chosen) primes; this fact is
public knowledge, but p and q are kept secret. The one-way property relies on
the hardness of factoring n, i.e. finding p and q. For Alice the knowledge of p
and q is a trapdoor with which she is able to invert f. Thus f is believed
to be a trapdoor one-way function.
2. The second example is g : F_q^* → F_q^* defined by g(x) = a^x, where a generates
the multiplicative group F_q^*. The problem of inverting g is called the discrete
logarithm problem (DLP) in F_q. It is the basis for the El Gamal scheme, Section
10.2.2. The DLP is believed to be hard in general, thus g is
believed to be one-way, since for given x computing a^x in F_q^* is easy. One
may also use domains different from F_q and try to solve the DLP there; for
some discussion on that cf. Section 10.2.2.
3. Consider a function h : F_q^k → F_q^n, k < n, defined by F_q^k ∋ m ↦ mG + e ∈ F_q^n,
where G is a generator matrix of an [n, k, d]_q linear code and wt(e) ≤ t ≤
(d − 1)/2. So h defines an encoding function for the code defined by G.
When inverting h one faces the problem of bounded distance decoding,
which is believed to be hard. The function h is the basis for the McEliece
and Niederreiter cryptosystems, see Sections 10.6 and ??.
4. In the last example we consider a function z : F_q^n → F_q^m, n ≥ m, defined
by F_q^n ∋ x ↦ F(x) = (f_1(x), . . . , f_m(x)) ∈ F_q^m, where the f_i are non-linear
polynomials over F_q. Inverting z means solving the system of non-linear
equations F(X) = y. This problem is known to be NP-hard even if the f_i
are quadratic and q = 2. The function z is the basis of
multivariate cryptosystems, see Section 10.2.3.
Before considering concrete examples of asymmetric cryptosystems, we
would like to note that there is a vital need for authentication in asymmetric
cryptosystems. Indeed, imagine that Eve can not only intercept and read mes-
sages, but can also alter the repository where public keys are stored. Suppose Alice
is willing to communicate a plaintext p to Bob. Assume that Eve is aware of
this intention and is able to substitute Bob's public key e_Bob with her key e_Eve,
for which she has the corresponding decryption key d_Eve. Alice, not knowing
that the key was replaced, takes e_Eve and encrypts c = E_{e_Eve}(p). Eve intercepts
c and decrypts p = D_{d_Eve}(c). So now Eve knows p. After that she may either
encrypt p with Bob's e_Bob and send the ciphertext to him, or even replace p
with some other p′. As a result, not only does Eve get the secret message p, but
Bob can be misinformed by the message p′, which, as he thinks, comes from
Alice. Fortunately there are ways of providing means to tackle this problem.
They include the use of a trusted third party (TTP) and digital signatures. Digital
signatures are the asymmetric analogue of (message) authentication codes, Section
10.3. These are outside the scope of this introductory chapter.
The last remark concerns the type of security that asymmetric cryptosystems
provide. Note that as opposed to symmetric cryptosystems, some of which can
be shown to be unconditionally secure, asymmetric cryptosystems can only be
computationally secure. Indeed, having Bob's public key e_Bob, Eve can simply
encrypt all possible plaintexts until she finds p such that E_{e_Bob}(p) coincides with
the ciphertext c that she observed.
10.2.1 RSA
Now we consider an example of one of the most widely used asymmetric cryptosystems
- RSA, named after its creators R. Rivest, A. Shamir, and L. Adleman. This
cryptosystem was proposed in 1977, shortly after Diffie and Hellman invented
asymmetric cryptography. It is based on the hardness of factoring integers and up
to now has withstood cryptanalysis, although some of the attacks suggest careful
choice of the public/private keys and their size. First we present RSA itself:
how one chooses a public/private key pair, how encryption/decryption is done,
and why it works. Then we consider a concrete example with small numbers.
Finally we discuss some security issues. In this and the following subsection we
denote the plaintext by m, because historically p and q are reserved in the context of
RSA.
Algorithm 10.2.3 (RSA key generation)
Output: RSA public/private key pair ((e, n), d).
1. Choose two distinct primes p and q.
2. Compute n = pq and φ = φ(n) = (p − 1)(q − 1).
3. Select a number e, 1 < e < φ, such that gcd(e, φ) = 1.
4. Using the extended Euclidean algorithm, compute d such that ed ≡ 1 (mod φ).
5. The key pair is ((e, n), d).
The integers e and d above are called the encryption and decryption exponent
respectively; the integer n is called the modulus. For encryption Alice uses the
following algorithm.
Algorithm 10.2.4 (RSA encryption)
Input: Plaintext m and Bob's encryption exponent e together with the modulus
n.
Output: Ciphertext c.
1. Represent m as an integer 0 ≤ m < n.
2. Compute c = m^e (mod n).
3. The ciphertext for sending to Bob is c.
For decryption Bob uses the following algorithm.
Algorithm 10.2.5 (RSA decryption)
Input: Ciphertext c, the decryption exponent d, and the modulus n.
Output: Plaintext m.
1. Compute m = c^d (mod n).
2. The plaintext is m.
Let us see why Bob gets the initial m as a result of decryption. Since ed ≡ 1
(mod φ), there exists an integer s such that ed = 1 + sφ. For gcd(m, p) there
are two possibilities: either 1 or p. If gcd(m, p) = 1, then by Fermat's
little theorem we have m^{p−1} ≡ 1 (mod p). Raising both sides to the s(q − 1)-th
power and multiplying by m we have m^{1+s(p−1)(q−1)} ≡ m (mod p). Now using
ed = 1 + sφ = 1 + s(p − 1)(q − 1) we have m^{ed} ≡ m (mod p). For the case
gcd(m, p) = p the last congruence holds right away. The same argument can
be applied to q, so we obtain analogously m^{ed} ≡ m (mod q). Using the Chinese
remainder theorem we then get m^{ed} ≡ m (mod n). So indeed c^d = (m^e)^d ≡ m
(mod n).
Example 10.2.6 Consider an example of RSA as described in the algorithms
above with some small values. First let us choose the primes p = 5519 and
q = 4651. So our modulus is n = pq = 25668869, and thus φ = (p − 1)(q − 1) =
25658700. Take e = 29 as encryption exponent; gcd(29, φ) = 1. Using the extended
Euclidean algorithm we obtain e · (−3539131) + 4 · φ = 1, so take d = −3539131
mod φ = 22119569. The key pair now is ((e, n), d) = ((29, 25668869), 22119569).
Suppose Alice wants to transmit a plaintext message m = 7847098 to Bob. She
takes his public key e = 29 and computes c = m^e (mod n) = 22152327. She
sends c to Bob. After obtaining c, Bob computes c^d (mod n) = m.
Example 10.2.7 The Magma computer algebra system (cf. Appendix ??) gives an
opportunity to compute an RSA modulus of a given bit-length. For example,
if we want to construct a "random" RSA modulus of bit-length 25, we write:
> RSAModulus(25);
26827289 1658111
Here the first number is the random RSA modulus n and the second one is
a number e such that gcd(e, φ(n)) = 1. We can also specify the number e
explicitly (below e = 29):
> n := RSAModulus(25,29);
> n;
19579939
One can further factorize n as follows:
> Factorization(n);
[ <3203, 1>, <6113, 1> ]
This means that p = 3203 and q = 6113 are the prime factors of n and n = pq. We
can also use the extended Euclidean algorithm to recover d as follows:
> e := 29; phi := 25658700;
> ExtendedGreatestCommonDivisor(e, phi);
1 -3539131 4
So here 1 is the gcd and d = −3539131 mod φ = 22119569, as was computed in the
example above.
As has already been mentioned, RSA relies on the hardness of factoring inte-
gers. Of course, if Eve is able to factor n, then she is able to produce d and
thus decrypt all ciphertexts. The open question is whether breaking RSA leads
to a factoring algorithm for n. The problem of breaking RSA is called the RSA
problem. There is no rigorous proof, though, that breaking RSA is equivalent
to factoring. Still, it can be shown that computing the decryption exponent d and
factoring are equivalent. Note that in principle an attacker might not need
to compute d in order to figure out m from c given (e, n). Never-
theless, even though there is no rigorous proof of equivalence, breaking RSA is believed
to be as hard as factoring. Now we briefly discuss some other things that need
to be taken into consideration when choosing parameters for RSA.
1. For fast encryption a small encryption exponent, e.g. e = 3, is desirable.
The possibility of an attack exists, though, if this exponent is used
for sending the same message to several different recipients with different
moduli. There is also concern about small decryption exponents. For ex-
ample, if the bitlength of d is approximately 1/4 of the bitlength of n, then there
is an efficient way to get d from (e, n).
2. As to the primes p and q, one should take the following into account. First
of all, p − 1 and q − 1 should each have at least one large prime factor, since
otherwise factoring n with Pollard's p − 1 algorithm is possible. Then, in order
to avoid elliptic curve factoring, p and q should be roughly of the same bitlength.
On the other hand, if the difference p − q is too small then techniques like Fermat
factorization become feasible.
3. In order to avoid problems as in (1.), different padding schemes have been pro-
posed that add a certain amount of randomness to ciphertexts. Thus the
same message will be encrypted to one of several ciphertexts from some range.
An important remark is that using so-called quantum computers, provided large
enough ones are built, it is possible to solve the factorization problem in polynomial
time. See Notes for references. The same problem exists for the cryptosystems
based on the DLP, which are described in the next subsection. Problems (3.)
and (4.) from Example 10.2.2 are not known to be susceptible to quantum com-
puter attacks. Together with some other hard problems, they form a foundation
for post-quantum cryptography, which deals with cryptosystems resistant to
quantum computer attacks. See Notes for references.
10.2.2 Discrete logarithm problem and public-key cryp-
tography
In the previous subsection we considered the asymmetric cryptosystem RSA,
based on the hardness of factoring integers. As has already been noted in Example
10.2.2, there is also the possibility of using the hardness of finding discrete logarithms
as a basis for an asymmetric cryptosystem. The general DLP is defined below.
Definition 10.2.8 Let G be a finite cyclic group of order g. Let α be a gener-
ator of this group, so that G = {α^i | 1 ≤ i ≤ g}. The discrete logarithm problem
(DLP) in G is the problem of finding 1 ≤ x ≤ g from a = α^x, where a ∈ G is
given.
For cryptographic purposes a group G should possess two main properties: 1)
the operation in G should be efficiently performed, and 2) the DLP in G should
be difficult to solve (see Exercise 10.2.4). Cyclic groups that are widely used
in cryptography include the multiplicative group F_q^* of the finite field F_q (in
particular the multiplicative group Z_p^* for p prime) and a group of points on an
elliptic curve over a finite field. Other possibilities that exist are the
group of units Z_n^* for composite n, the Jacobian of a hyperelliptic curve defined
over a finite field, and the class group of an imaginary quadratic number field,
see Notes.
Here we consider the classical El Gamal scheme based on the DLP. As we will see,
the following description works for any cyclic group with an "efficient description".
Initially the multiplicative group of a finite field was used.
Algorithm 10.2.9 (El Gamal key generation)
Output: El Gamal public/private key pair ((G, α, h), a).
1. Choose some cyclic group G of order g = ord(G) in which the group opera-
tion can be done efficiently, and then choose a generator α of G.
2. Select a random integer a such that 1 ≤ a ≤ g − 2 and compute h = α^a.
3. The key pair is ((G, α, h), a).
Note that G and α can be fixed in advance for all users, so that only h becomes a
public key. For encryption Alice uses the following algorithm.
Algorithm 10.2.10 (El Gamal encryption)
Input: Plaintext m and Bob's public key h together with α and the
group description of G.
Output: Ciphertext c.
1. Represent m as an element of G.
2. Select a random b such that 1 ≤ b ≤ g − 2, where g = ord(G), and compute
c_1 = α^b and c_2 = m · h^b.
3. The ciphertext for sending to Bob is c = (c_1, c_2).
For decryption Bob uses the following algorithm.
Algorithm 10.2.11 (El Gamal decryption)
Input: Ciphertext c, the private key a together with α and the group descrip-
tion of G.
Output: Plaintext m.
1. In G compute m = c_2 · c_1^{−a} = c_2 · c_1^{g−1−a}, where g = ord(G).
2. The plaintext is m.
Let us see why we get the initial m as a result of decryption. Using h = α^a we have

c_2 · c_1^{−a} = m · h^b · α^{−ab} = m · α^{ab} · α^{−ab} = m.
Example 10.2.12 For this example let us take the group Z_p^* where p = 8053,
with generator α = 2. Let us choose the private key a = 3117. Compute
h = α^a mod p = 3030. So the public key is h = 3030 and the private key is
a = 3117.
Suppose Alice wants to encrypt the message m = 1734 for Bob. For this she
chooses a random b = 6809 and computes c_1 = α^b mod p = 3540 and c_2 = m · h^b
mod p = 7336. So her ciphertext is c = (3540, 7336). Upon receiving c, Bob
computes c_2 · c_1^{p−1−a} mod p = 7336 · 3540^4935 mod 8053 = 1734.
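Example 10.2.12 is likewise easy to verify with a short Python sketch using
modular exponentiation (the variable names follow the algorithms above):

p, alpha = 8053, 2          # the group Z_p^* and its generator
a = 3117                    # private key
h = pow(alpha, a, p)        # public key: h = alpha^a mod p = 3030

m, b = 1734, 6809           # message and Alice's random exponent
c1 = pow(alpha, b, p)       # 3540
c2 = m * pow(h, b, p) % p   # 7336

# Decryption: c2 * c1^(p-1-a) = c2 * c1^(-a) mod p
print(c2 * pow(c1, p - 1 - a, p) % p)   # 1734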
Now we briefly discuss some issues connected with the El Gamal scheme.
• Message expansion: It should be noted that, as opposed to the RSA scheme,
ciphertexts in El Gamal are twice as large as plaintexts. So the
El Gamal scheme has the drawback of message expansion
by a factor of 2.
• Randomization: Note that in Algorithm 10.2.10 we used randomization to
compute a ciphertext. Randomization in encryption gives the advantage
that the same message is mapped to different ciphertexts in different
encryption runs. This in turn makes chosen-plaintext attacks more difficult.
We will see another example of an asymmetric scheme with randomized
encryption in Section 10.6, where we discuss the McEliece scheme based on
error-correcting codes.
• Security reliance: The problem of breaking the El Gamal scheme is equivalent
to the so-called (generalized) Diffie-Hellman problem, which is the problem
of finding α^{ab} ∈ G given α^a ∈ G and α^b ∈ G. Obviously, if one
is able to solve the DLP, then one is able to solve the Diffie-Hellman
problem, i.e. the Diffie-Hellman problem is polytime reducible to the DLP
(cf. Definition 6.1.22). It is not known whether these two problems are
computationally equivalent. Nevertheless, it is believed that breaking El
Gamal is as hard as solving the DLP.
• As we have mentioned before, the El Gamal scheme is vulnerable to quantum
computer attacks. See Notes.
10.2.3 Some other asymmetric cryptosystems
So far we have seen examples of asymmetric cryptosystems based on the hardness of
factoring integers (Section 10.2.1) and of solving the DLP in the multiplicative group of
a finite field (Section 10.2.2). Other examples that will be covered are the McEliece
scheme, which is based on the hardness of decoding random linear codes (Section 10.6),
and schemes based on solving the DLP in a group of points of an elliptic curve over a
finite field (Section ??). In this subsection we briefly mention some other
alternatives that exist.
The first direction we consider here is the so-called multivariate cryptography.
Here cryptosystems are based on the hardness of solving the multivariate quadratic
(MQ) problem. This is the problem of finding a solution x = (x_1, . . . , x_n) ∈ F_q^n
to the system

y_1 = f_1(X_1, . . . , X_n),
. . .
y_m = f_m(X_1, . . . , X_n),

where f_i ∈ F_q[X_1, . . . , X_n], deg f_i = 2, i = 1, . . . , m, and the vector y = (y_1, . . . ,
y_m) ∈ F_q^m is given. This problem is known to be NP-hard, and so is thought to be
a good source of one-way functions. The trapdoor is added by choosing the f_i
to have some structure that is kept secret and allows decryption, e.g. by reduction
to univariate factorization over a larger field. To an eavesdropper, though,
the system above with such a trapdoor should appear random. So the idea is
that the eavesdropper can do no better than solve a random quadratic system
over a finite field, which is believed to be a hard problem. The cryptosystems
and digital signature schemes in this category include e.g. Hidden Field Equations (HFE),
SFLASH, Unbalanced Oil and Vinegar (UOV), Step-wise Triangular Schemes
(STS), and some others. Some of those were broken and several modifications
were proposed to overcome the attacks (e.g. PFLASH, enSTS). At present it is
not quite clear whether it is possible to design a secure multivariate cryptosys-
tem. A lot of research in this area, though, gives a basis for optimism.
Another well-known example of a cryptosystem based on an NP-hard problem is
the knapsack cryptosystem. This cryptosystem was the first concrete realization
of an asymmetric scheme and was proposed in 1978 by Merkle and Hellman.
The knapsack cryptosystem is based on the well-known NP-hard subset sum prob-
lem: given a set of positive integers A = {a_1, . . . , a_n} and a positive integer s,
find a subset of A such that the sum of its elements yields s. The idea of Merkle
and Hellman was to make so-called superincreasing sequences, for which the above
problem is easily solved, appear as a random set A, thus providing a trapdoor. So
an eavesdropper supposedly has nothing better to do than to deal with the
well-known hard problem. This initial proposal was broken by Shamir, and later an
improved version was broken by Brickell. These attacks are based on integer
lattices and caused quite a stir in the cryptographic community at the time.
There are some other types of cryptosystems out there: polynomial based ("Poly
Cracker"-type), lattice based, hash based, and group based. We may
summarize that active research is being conducted in order to provide alterna-
tives to the widely used cryptosystems.
10.2.4 Exercises
10.2.1 a. Given primes p = 5081 and q = 6829 and an encryption expo-
nent e = 37 find the corresponding decryption exponent and encrypt the
message m = 29800110.
b. Let e and m be as above. Generate (e.g. with Magma) a random RSA
modulus n of bit-length 25. For these n, e, m find the corresponding de-
cryption exponent via factorizing n; encrypt m.
10.2.2 Show that the number λ = lcm(p − 1, q − 1), called the universal
exponent of n, can be used instead of φ in Algorithms 10.2.3 and 10.2.5.
10.2.3 Generate a public/private key pair for the El Gamal scheme with G = Z_{7121}^*
and encrypt the message m = 5198 using this scheme.
10.2.4 Give an example of a finite cyclic group where the DLP problem is easy
to solve.
10.2.5 Show that using the same b in Algorithm 10.2.10 for at least two dif-
ferent encryptions is insecure: namely, if c and c′ are two ciphertexts that
correspond to m and m′ and were encrypted with the same b, then knowing
one of the plaintexts yields the other.
10.3 Authentication, orthogonal arrays, and codes
10.3.1 Authentication codes
In Section 10.1 we dealt with the problem of secure communication between two
parties by means of symmetric cryptosystems. In this section we address another
important problem, the problem of data source authentication. So we are now
interested in providing means for Bob to make sure that an (encrypted) message
he received from Alice was indeed sent by her and was not altered during the
transmission. In this section we consider so-called authentication codes that
provide the tools necessary to ensure authentication. These codes are analyzed in
terms of unconditional security (see Definition 10.1.8). For practical purposes
one is more interested in computational security; the analogues of authentication
codes for this purpose are message authentication codes (MACs). It is also to be
noted that authentication codes are, in a sense, symmetric, i.e. a secretly
shared key is needed to provide such authentication. There is also an asymmetric
analogue (Section 10.2) called a digital signature. In this model everybody can
verify Alice's signature by a publicly available verification algorithm. Let us now
go on to the formal definition of an authentication code.
Definition 10.3.1 An authentication code is defined by the following data:
• A set of source states S.
• A set of authentication tags T .
• A set of keys, the keyspace K.
• A set of authentication maps A parameterized by K: for each k ∈ K there
is an authentication map a_k : S → T.
We also define the message space M = S × T.
The idea of authentication is as follows. Alice and Bob secretly agree on some
secret key k ∈ K for their communication. Suppose that Alice wants to transmit
a message s, which per the definition above is called a source state. Note that now
we are not interested in providing secrecy for s itself, but rather in providing
means of authentication for s. For the transmission Alice adds an authentication
tag to s by t = a_k(s). She then sends the concatenated message (s, t). Usually (s, t)
is an encrypted message, maybe also encoded for error correction, but this does
not play a role here. Suppose Bob receives (s′, t′). He separates s′ and t′
and checks whether t′ = a_k(s′). If the check succeeds, he accepts s′ as a valid
message that came from Alice; otherwise he rejects it. If no intrusion occurred
we have s′ = s and t′ = t and the check trivially succeeds. But what if Eve
wants to alter the message and make Bob believe that the message altered by her
still originates from Alice? There are two types of malicious actions by Eve
that one usually considers.
• Impersonation: Eve sends some message (s, t) with the intention that Bob
accepts it as Alice’s message, i.e. she aims at passing the check t = ak(s)
with high probability, where the key k is unknown to her.
• Substitution: Eve intercepts Alice’s message (s, t) and substitutes
another message (s′, t′), where s′ ≠ s, such that ak(s′) = t′, the key k again
being unknown to her.
As has already been said, authentication codes are studied from the point of view
of unconditional security, i.e. we assume that Eve has unbounded computational
power. In this case we need to show that no matter how much computational
power Eve has, she cannot succeed in the above attack scenarios with large
probability. Therefore, we need to estimate the probabilities of success of
impersonation, PI, and of substitution, PS, given probability distributions pS
and pK on the set of source states and the key space respectively. The
probabilities PI and PS are also called deception probabilities. Note that both PI
and PS are computed under the assumption that Eve tries to maximize her
chances of deception. In reality Eve might not only want to maximize her
probability of passing the check, but might also have some preference as to which
message she wants to substitute for Alice’s. For example, intercepting Alice’s
message (s, t), where s = “Meeting is at seven”, she would like to send something
like (s′, t′), where s′ = “Meeting is at six”. Thus PI and PS actually provide an
upper bound on Eve’s chances of success.
Let us first compute PI, i.e. the probability that some message (s, t) chosen by
Eve is validated by Bob when some private key k0 ∈ K is used. For Eve every
key k that maps s to t will do, so

Pr(ak0(s) = t) = Σ_{k∈K : ak(s)=t} pK(k).

Now, in order to maximize her chances, Eve should choose (s, t) with
Pr(ak0(s) = t) as large as possible, i.e.

PI = max{Pr(ak0(s) = t) | s ∈ S, t ∈ T}.
Note that PI depends only on the distribution pK and not on pS.
Computing PS is a bit trickier. The conditional probability that Eve’s message
(s′, t′), s′ ≠ s, passes the check once the valid message (s, t) is known is

Pr(ak0(s′) = t′ | ak0(s) = t) = Pr(ak0(s′) = t′, ak0(s) = t) / Pr(ak0(s) = t)
= Σ_{k∈K : ak(s′)=t′, ak(s)=t} pK(k) / Σ_{k∈K : ak(s)=t} pK(k).

Having (s, t), Eve maximizes her chances by choosing (s′, t′), s′ ≠ s, such that
this conditional probability is maximal. To reflect this, introduce

ps,t := max{Pr(ak0(s′) = t′ | ak0(s) = t) | s′ ∈ S \ {s}, t′ ∈ T}.

Now in order to get PS we take the weighted average of the ps,t according to
the distribution pM on M:

PS = Σ_{(s,t)∈M} pM(s, t) ps,t,

where the distribution pM is obtained as pM(s, t) = pS(s)p(t|s) =
pS(s)·Pr(ak0(s) = t). The value Pr(ak0(s) = t) is called the pay-off of a message
(s, t); we denote it by π(s, t). Similarly, Pr(ak0(s′) = t′ | ak0(s) = t) is the
pay-off of a message (s′, t′) given a valid message (s, t); we denote it by
πs,t(s′, t′).
For convenience one may think of an authentication code as an array whose rows
are indexed by K and whose columns are indexed by S; the entry at (k, s) for
k ∈ K, s ∈ S has the value ak(s), see Exercise 10.3.1.
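This array representation also makes PI and PS easy to evaluate by brute force.
Below is a small Python sketch (using a hypothetical toy array with three keys
and two source states, not the array of Table 10.2) that computes the pay-offs
π(s, t) and both deception probabilities directly from the formulas above.

A = [[0, 1],          # row k holds the tags a_k(s) for the source states s
     [1, 0],
     [1, 1]]
pK = [1/2, 1/4, 1/4]  # key distribution
pS = [1/2, 1/2]       # source state distribution
tags = sorted({t for row in A for t in row})

def payoff(s, t):
    # pi(s, t) = Pr(a_k0(s) = t): total mass of keys mapping s to t
    return sum(pK[k] for k in range(len(A)) if A[k][s] == t)

# Impersonation: best single forged message (s, t)
PI = max(payoff(s, t) for s in range(len(pS)) for t in tags)

# Substitution: weighted average of the best conditional pay-off p_{s,t}
PS = 0.0
for s in range(len(pS)):
    for t in tags:
        pst = payoff(s, t)
        if pst == 0:
            continue
        best = max(sum(pK[k] for k in range(len(A))
                       if A[k][s2] == t2 and A[k][s] == t) / pst
                   for s2 in range(len(pS)) if s2 != s for t2 in tags)
        PS += pS[s] * pst * best          # p_M(s, t) * p_{s,t}

print(PI, PS)                             # both equal 0.75 for this toy array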
We have discussed some basic facts about authentication codes. The question
now is: what are the criteria for a good authentication code? They are
summarized below.
1. The deception probabilities must be small, so that the eavesdropper’s
chances are low.
2. |S| should be large, to facilitate the authentication of a potentially large
number of source states.
3. Since we are studying authentication codes from the point of view of
unconditional security, the secret key should be used only once and then
changed for the next transmission, as in the one-time pad, cf. Example ??.
Thus |K| should be minimized, because key values have to be transmitted
every time. E.g. if K = {0, 1}^l, then keys of length log2 |K| = l are to be
transmitted.
Let us now concentrate on item (1.); items (2.) and (3.) are considered in
the next sub-sections, where different constructions of authentication codes are
presented. We would like to see which values PI and PS can take and under
which circumstances they attain the minimal possible values. The basic
results are collected in the following proposition.
Proposition 10.3.2 Let the authentication code with the data S, T, K, A, pS, pK
be fixed. We have:
1. PI ≥ 1/|T|. Moreover, PI = 1/|T| iff π(s, t) = 1/|T| for all s ∈ S, t ∈ T.
2. PS ≥ 1/|T|. Moreover, PS = 1/|T| iff πs,t(s′, t′) = 1/|T| for all s, s′ ∈
S, s ≠ s′; t, t′ ∈ T.
3. PI = PS = 1/|T| iff π(s, t)πs,t(s′, t′) = 1/|T|² for all s, s′ ∈ S, s ≠
s′; t, t′ ∈ T.
Proof.
1. For a fixed source state s ∈ S we have

Σ_{t∈T} π(s, t) = Σ_{t∈T} Σ_{k∈K : ak(s)=t} pK(k) = Σ_{k∈K} pK(k) = 1.

Thus for every s ∈ S there exists an authentication tag t = t(s) ∈ T such
that π(s, t(s)) ≥ 1/|T|. Now the claim follows from the computation of PI
we made above. Note that equality is possible iff π(s, t) = 1/|T| for all
s ∈ S, t ∈ T.
2. For different fixed source states s, s′ ∈ S and a tag t ∈ T such that (s, t)
is valid, we have

Σ_{t′∈T} πs,t(s′, t′) = Σ_{t′∈T} [ Σ_{k∈K : ak(s′)=t′, ak(s)=t} pK(k) / Σ_{k∈K : ak(s)=t} pK(k) ]
= Σ_{k∈K : ak(s)=t} pK(k) / Σ_{k∈K : ak(s)=t} pK(k) = 1.

So for every s′, s, t with s′ ≠ s there exists a tag t′ = t′(s′) with
πs,t(s′, t′(s′)) ≥ 1/|T|. Now the claim follows from the computation of PS
we made above. Note that equality is possible iff πs,t(s′, t′) = 1/|T| for all
s′ ∈ S, t′ ∈ T, due to the definition of ps,t.
3. If PI = PS = 1/|T|, then π(s, t) = 1/|T| for all s ∈ S, t ∈ T, and
πs,t(s′, t′) = 1/|T| for all s, s′ ∈ S, s ≠ s′; t, t′ ∈ T. Hence for all
s, s′ ∈ S, s ≠ s′; t, t′ ∈ T we have π(s, t)πs,t(s′, t′) = 1/|T|².
Conversely, if π(s, t)πs,t(s′, t′) = 1/|T|² for all s, s′ ∈ S, s ≠ s′; t, t′ ∈ T,
then

π(s, t) = π(s, t) Σ_{t′∈T} πs,t(s′, t′) = Σ_{t′∈T} π(s, t)πs,t(s′, t′) = Σ_{t′∈T} 1/|T|² = 1/|T|,

so PI = 1/|T| by (1.). Now

πs,t(s′, t′) = (1/|T|²) / π(s, t) = 1/|T|,

so PS = 1/|T| by (2.).
As a straightforward consequence we have
Corollary 10.3.3 With the notation as above and assuming that pK is the
uniform distribution (keys are equiprobable), we have PI = PS = 1/|T| iff

|{k ∈ K : ak(s′) = t′, ak(s) = t}| = |K|/|T|²

for all s, s′ ∈ S, s ≠ s′; t, t′ ∈ T.
10.3.2 Authentication codes and other combinatorial ob-
jects
Authentication codes from orthogonal arrays
Now we take a look at certain combinatorial objects, called orthogonal arrays
that can be used for constructing authentication systems. A bit later we also
consider a construction that uses error-correcting codes. For the definitions and
basic properties of orthogonal arrays the reader is referred to Chapter 5, Section
5.5.1. What is important for us is that orthogonal arrays yield a construction
of authentication codes in quite a natural way. The next proposition shows a
relation between orthogonal arrays and authentication codes.
Proposition 10.3.4 If there exists an orthogonal array OA(n, l, λ) with symbols
from a set N with n elements, then one can construct an authentication
code with |S| = l, |K| = λn², T = N and thus |T| = n, for which PI = PS = 1/n.
Conversely, if there exists an authentication code with the above parameters,
then there exists an orthogonal array OA(n, l, λ).

Proof. Consider OA(n, l, λ) as the array representation of an authentication
code, cf. Section 5.5.1. Moreover, set pK to be uniform, i.e. pK(k) = 1/(λn²)
for every k ∈ K. The values of the parameters of such a code then follow easily.
In order to obtain the values of PI and PS, use Corollary 10.3.3: indeed,
|{k ∈ K : ak(s′) = t′, ak(s) = t}| = λ by the definition of an orthogonal array,
and λ = |K|/|T|². The claim now follows. The converse is proved analogously.
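For concreteness, here is a quick Python check (our own illustration, not taken
from the text) of the classical orthogonal array OA(p, l, 1) built from affine maps
over Zp: the keys are the pairs (a, b) ∈ Zp × Zp, the source states are l ≤ p
elements x, and the tag is ax + b mod p. With the uniform key distribution,
Proposition 10.3.4 then gives PI = PS = 1/p.

p, l = 5, 3                     # p prime, l <= p source states
keys = [(a, b) for a in range(p) for b in range(p)]   # |K| = p^2 = lambda*n^2

def tag(key, x):
    a, b = key
    return (a * x + b) % p      # the authentication map a_k(x)

# Defining property of OA(p, l, 1): in every pair of columns each of the
# p^2 symbol pairs occurs exactly lambda = 1 times.
for x1 in range(l):
    for x2 in range(x1 + 1, l):
        pairs = [(tag(k, x1), tag(k, x2)) for k in keys]
        assert len(set(pairs)) == p * p
print("OA(%d, %d, 1) verified, so PI = PS = 1/%d" % (p, l, p))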
Let us now consider which criteria should be met by orthogonal arrays in order
to produce good authentication codes. Estimates for the orthogonal array
parameters n, l, λ in terms of the authentication code requirements follow directly
from the above proposition.
• If we demand that the deception probabilities be at most some value ε,
i.e. PI ≤ ε, PS ≤ ε, then the orthogonal array should have n ≥ 1/ε.
• As we can always remove some columns from an orthogonal array and still
obtain one after the removal, we demand that l ≥ |S|.
• λ should be minimized under the constraints imposed by the previous two
items. This is due to the fact that we would like to keep the key space size
as low as possible, as has already been noted in the previous sub-section.
Finally, we present without proof two characterization results, which say that
if one wants to construct authentication codes with minimal deception
probabilities, one cannot avoid using orthogonal arrays.

Theorem 10.3.5 Assume there exists an authentication code defined by S, T, K, A,
pK, pS with |T| = n and PI = PS = 1/n. Then:
1. |K| ≥ n². Equality is achieved iff there exists an orthogonal array
OA(n, l, 1) with l = |S| and pK(k) = 1/n² for every k ∈ K.
2. |K| ≥ l(n − 1) + 1. Equality is achieved iff there exists an orthogonal
array OA(n, l, λ) with l = |S|, λ = (l(n − 1) + 1)/n² and pK(k) = 1/(l(n −
1) + 1) for every k ∈ K.
Authentication codes from error-correcting codes
As we have seen above, if one wants to keep the deception probabilities minimal,
one has to deal with orthogonal arrays. A significant drawback of this approach
is that the key space grows linearly in the size of the source state set; in particular
we have from Theorem 10.3.5 (2.) that |K| > l ≥ |S|. This means that the amount
of information that needs to be transmitted secretly is larger than the amount
that is allowed to go through the public channel. The same problem occurs in
the one-time pad scheme, Example ??. Of course, this is not quite practical.
In this sub-section we consider so-called almost universal and almost strongly
universal hash functions. By means of these functions it is possible to construct
authentication codes with deception probabilities slightly larger than minimal,
but whose source state set grows exponentially in the key space size.
This gives an opportunity to work with much shorter keys, sacrificing the
security threshold a bit.
Next we give the definition of an almost universal hash function.

Definition 10.3.6 Let X and Y be sets of cardinality n and m respectively.
Consider a family H of functions f : X → Y and denote N := |H|. We
call the family H ε-almost universal if for every two different x1, x2 ∈ X the
number of functions f from H such that f(x1) = f(x2) is at most εN. The
notation for such a family is ε−AU(N, n, m).
There is a natural connection between almost universal hash functions and error-
correcting codes as is shown next.
Proposition 10.3.7 The existence of one of the two objects below implies the
existence of the other:
1. a family H of almost universal hash functions that is ε−AU(N, n, m);
2. an m-ary error-correcting code C of length N, cardinality n and relative
minimum distance d/N ≥ 1 − ε.
Proof. Let us first describe an ε−AU(N, n, m) family as an array, similarly to
what we did for orthogonal arrays. The rows of the representation array are
indexed by the functions from H and the columns by the set X; in the place
indexed by f ∈ H and x ∈ X we write f(x) ∈ Y. Now the equivalence becomes
clear. Indeed, consider this array also as the code-book of an error-correcting
code C, so that the codewords are written in columns. It is clear that the length
is the number of rows, N, and the cardinality is the number of columns, n. The
entries of the array take their values in Y, thus C is an m-ary code. Now the
definition of H implies that for any two codewords x1 and x2 (columns), the
number of positions where they agree is at most εN. But d(x1, x2) is the number
of positions where they disagree, so d(x1, x2) ≥ (1 − ε)N and d/N ≥ 1 − ε. The
reverse implication is proved analogously.
Next we define almost strongly universal hash functions that are used for au-
thentication.
Definition 10.3.8 Let X and Y be sets of cardinality n and m respectively.
Consider a family H of functions f : X → Y and denote N := |H|. We call the
family H ε-almost strongly universal if the following two conditions hold:
1. For every x ∈ X and y ∈ Y the number of functions f from H such that
f(x) = y is N/m.
2. For every two different x1, x2 ∈ X and every y1, y2 ∈ Y the number of
functions f from H such that f(xi) = yi, i = 1, 2, is at most ε·N/m.
The notation for such a family is ε−ASU(N, n, m).
Almost strongly universal hash functions are nothing but authentication codes
with some conditions on the deception probabilities. The following proposition
is quite straightforward and is left to the reader as an exercise.
Proposition 10.3.9 If there exists a family H which is ε−ASU(N, n, m), then
there exists an authentication code with K = H, S = X, T = Y and pK the
uniform distribution, such that PI = 1/m and PS ≤ ε.
Note that if ε = 1/m in Definition 10.3.8, then from Propositions 10.3.9, 10.3.2
(2.) and 10.3.4 we see that a (1/m)−ASU(N, n, m) family is actually an
orthogonal array. The problem with orthogonal arrays has already been mentioned
above. With almost strongly universal hash functions we have more freedom, as
we can make ε a bit larger while gaining in the other parameters, as we will see
below. So for us it is interesting to be able to construct good ASU-families.
There are two methods of doing so based on coding theory:
1. Construct AU-families from codes as per Proposition 10.3.7 and then use
Stinson’s composition method, Theorem 10.3.10 below.
2. Construct ASU-families directly from error-correcting codes.
Here we consider (1.); for (2.) see the Notes. The next result, due to Stinson,
enables one to construct ASU-families from AU-families and previously
constructed ASU-families; we omit the proof.
Theorem 10.3.10 Let X, Y, U be sets of cardinality n, m, u respectively. Let H1
be an AU-family ε1−AU(N1, n, u) of functions f1 : X → U and let H2 be an
ASU-family ε2−ASU(N2, u, m) of functions f2 : U → Y. Consider the family H
of all possible compositions: H = {f | f = f2 ◦ f1, fi ∈ Hi, i = 1, 2}. Then
H is ε−ASU(N, n, m), where ε = ε1 + ε2 − ε1ε2 and N = N1N2.
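The theorem is easy to check by brute force on small instances. The Python
sketch below (our own toy example, not from the text) composes the
1/p−AU(p, p², p) family H1 = {(m0, m1) ↦ m0 + c·m1 mod p : c ∈ Zp} with the
1/p−ASU(p², p, p) family of affine maps H2 = {u ↦ au + b mod p : a, b ∈ Zp};
the composition should then be ε−ASU(p³, p², p) with ε = 1/p + 1/p − 1/p² =
2/p − 1/p².

from itertools import product

p = 3
X = list(product(range(p), repeat=2))                 # n = p^2 source states
H = [lambda m, c=c, a=a, b=b: (a * ((m[0] + c * m[1]) % p) + b) % p
     for c in range(p) for a in range(p) for b in range(p)]
N, m_ = len(H), p                                     # N = N1*N2 = p^3
eps = 2 / p - 1 / p**2

# ASU condition 1: every (x, y) is hit by exactly N/m functions.
assert all(sum(f(x) == y for f in H) == N // m_
           for x in X for y in range(p))

# ASU condition 2: every pair (x1, y1), (x2, y2) with x1 != x2 is hit
# by at most eps * N / m functions.
worst = max(sum(f(x1) == y1 and f(x2) == y2 for f in H)
            for x1 in X for x2 in X if x1 != x2
            for y1 in range(p) for y2 in range(p))
assert worst <= eps * N / m_
print("worst pair count:", worst, "bound:", eps * N / m_)   # 5 and 5.0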
Table 10.2: For Exercise 10.3.1

k\s  1  2  3
1    2  1  2
2    3  2  1
3    1  1  3
4    2  3  2
One example of idea (1.) that employs Reed-Solomon codes is given in
Exercise 10.3.2. Note that from Exercise 10.3.2 and Proposition 10.3.9 it follows
that there exists an authentication code with |S| = |K|^((2/5)|K|^(1/5)) (set a = 2b)
and PI = 1/|T|, PS = 2/|T|. So by allowing the probability of substitution
deception to rise to just twice its minimal value, we obtain an |S| that
grows exponentially in |K|, which was not possible with orthogonal arrays, where
always |K| > |S|.
10.3.3 Exercises
10.3.1 An authentication code is represented by the array in Table 10.2 (cf.
Sections 10.3.1, 5.5.1). The distributions pK and pS are given as follows:
pS(1) = pS(3) = 1/4, pS(2) = 1/2; pK(1) = pK(2) = pK(3) = 1/6, pK(4) = 1/2.
Compute PI and PS.
Hint: For computing a sum such as

Σ_{k∈K : ak(s)=t} pK(k),

look at the column corresponding to s, mark the rows in which the entry t
appears, and then sum up the probabilities corresponding to the marked rows
(they are indexed by keys).
10.3.2 Consider a q-ary [q, k, q − k + 1] Reed-Solomon code.
• Construct the corresponding AU-family using Proposition 10.3.7. What
are its parameters?
It is known that for natural numbers a, b with a ≥ b and q a prime power there
exists an ASU-family 1/q^b − ASU(q^(a+b), q^a, q^b). Using Stinson’s composition,
Theorem 10.3.10,
• prove that there exists an ASU-family 2/q^b − ASU(q^(2a+b), q^(a·q^(a−b)), q^b) with
ε ≤ 1/q^a + 1/q^b.
10.4 Secret sharing
In the model of symmetric (Section 10.1) and asymmetric (Section 10.2) cryp-
tography a one-to-one relation between Alice and Bob is assumed, maybe with
a trusted party in the middle. This means that Alice and Bob possess the
necessary pieces of secret information to carry out the communication between them.
Sometimes it is necessary to distribute this secret information among several
participants. Possible scenarios are: distributing the secret information among
the participants in such a way that even if some participants lose their pieces,
it is still possible to reconstruct the whole secret; or situations of shared
responsibility, where some action is to be triggered only when several participants
combine their secret pieces of information to form the one that triggers the
action. Examples of the latter are the triggering of some military action (e.g.
a missile launch) by several authorized persons (e.g. a president and high
military officials) or the opening of a bank vault by several top officials of a bank.
means to achieve the goal. The schemes providing such functionality are called
secret sharing schemes. We consider in detail the first such scheme proposed
by Adi Shamir in 1979. Then we also briefly demonstrate how error-correcting
codes can be used for a construction of linear secret sharing schemes.
In secret sharing schemes, shares are produced from the secret to be shared.
These shares are then assigned to the participants of the scheme. The idea is that
if several authorized participants gather in a group that is large enough, they
should be able to reconstruct the secret using the knowledge of their shares. On
the contrary, if a group is too small, or if some outsiders decide to find out the
secret, their knowledge should not suffice to figure it out. This leads to the
following definition.
Definition 10.4.1 Let Si, i = 1, . . . , n, be the shares that are produced from
the secret S. Consider a collection of n participants where each participant is
assigned his/her share Si. A (t, n) threshold scheme is a scheme where every
group of t (or more) participants out of n can obtain the secret S using their
shares, while any group of fewer than t participants is not able to obtain S.
We next present Shamir’s secret sharing scheme, which is a classical example
of a (t, n) threshold scheme for any n and t ≤ n.
Algorithm 10.4.2 (Shamir’s secret sharing scheme)
Set-up: taking n as input, prepare the scheme for n participants.
1. Choose some prime power q > n and fix a working field Fq that will be
used for all the operations in the scheme.
2. Assign to the n participants P1, . . . , Pn some distinct non-zero elements
x1, . . . , xn ∈ F∗q.
Input: The threshold value t, the secret information S in some form.
Output: The secret S is shared among the n participants.
Generation and distribution of shares:
1. Encode the secret to be shared as an element S ∈ Fq. If this is not possible,
redo the Set-up phase with a larger q.
2. Choose randomly t − 1 elements a1, . . . , at−1 ∈ Fq. Assign a0 := S and
form the polynomial f(X) = Σ_{i=0}^{t−1} aiX^i ∈ Fq[X].
3. For i = 1, . . . , n compute the value yi = f(xi) and assign yi to Pi.
Computing the secret from the shares:
1. Any t participants Pi1, . . . , Pit pull their shares yi1, . . . , yit together and
then, using e.g. Lagrange interpolation with the t interpolation points
(xi1, yi1), . . . , (xit, yit), restore f and thus a0 = S = f(0).
The part “Computing the secret from the shares” is justified by the following
formula of Lagrange interpolation (w.l.o.g. the first t participants pull their
shares):

f(X) = Σ_{i=1}^{t} yi Π_{j≠i} (X − xj)/(xi − xj),

so that f(xi) = yi, i = 1, . . . , t, and f is the unique polynomial of degree ≤ t − 1
with this property. Of course the participants do not have to reconstruct the
whole of f; they just need to know a0, which can be computed as

S = a0 = Σ_{i=1}^{t} ci yi,   with ci = Π_{j≠i} xj/(xj − xi).   (10.2)
So every t or more participants can recover the secret value S = f(0). On the
other hand, it is possible to show that for any t − 1 shares (w.l.o.g. the first
ones) (xi, yi), i = 1, . . . , t − 1, and any a ∈ Fq there exists a polynomial fa whose
evaluation at 0 is a. Indeed, take fa(X) = a + X·˜fa(X), where ˜fa(X) is
the Lagrange polynomial of degree ≤ t − 2 such that ˜fa(xi) = (yi − a)/xi, i =
1, . . . , t − 1 (recall that the xi are non-zero). Then deg fa ≤ t − 1, fa(xi) = yi,
and fa(0) = a. So any t − 1 (or fewer) participants have no information about S:
the best they can do is guess the value of S, and such a guess succeeds with
probability 1/q, since, to their knowledge, f can be any of the fa.
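The scheme fits in a few lines of code. The following Python sketch mirrors
Algorithm 10.4.2 over a prime field Zp (a hypothetical prime p is used here for
self-containedness; any field with more than n elements works), with
reconstruction via the coefficients ci of formula (10.2).

import random

p = 7919                           # a prime > n, large enough to hold the secret

def make_shares(S, t, n):
    # a_0 = S, random a_1, ..., a_{t-1}; the share of P_i is (x_i, f(x_i))
    coeffs = [S] + [random.randrange(p) for _ in range(t - 1)]
    f = lambda x: sum(c * pow(x, i, p) for i, c in enumerate(coeffs)) % p
    return [(x, f(x)) for x in range(1, n + 1)]       # x_i = i, all non-zero

def recover(shares):
    # S = f(0) = sum c_i y_i with c_i = prod_{j != i} x_j / (x_j - x_i)
    S = 0
    for i, (xi, yi) in enumerate(shares):
        ci = 1
        for j, (xj, _) in enumerate(shares):
            if j != i:
                ci = ci * xj * pow(xj - xi, -1, p) % p
        S = (S + ci * yi) % p
    return S

shares = make_shares(1234, t=3, n=6)
assert recover(shares[:3]) == 1234                    # any 3 shares suffice
assert recover([shares[1], shares[3], shares[5]]) == 1234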
Example 10.4.3 Let us construct a (3, 6) Shamir threshold scheme. Take
q = 8 and fix the field F8 = F2[α]/(α^3 + α + 1). The element α is a generator
of F∗8. For i = 1, . . . , 6 assign xi = α^i to the participant Pi. Suppose
that the secret S = α^5 is to be shared. Choose a1 = α^3 and a2 = α^6, so that
f(X) = α^5 + α^3X + α^6X^2. Now evaluate y1 = f(α) = α^3, y2 = f(α^2) = α^3,
y3 = f(α^3) = α^6, y4 = f(α^4) = α^5, y5 = f(α^5) = 1, y6 = f(α^6) = α^6. For
every i = 1, . . . , 6 assign yi as the share of Pi. Now suppose that the participants
P2, P3 and P5 decide to pull their shares together and obtain S. As in (10.2)
they compute c2 = (x3/(x3 − x2))·(x5/(x5 − x2)) = 1, c3 = 1, c5 = 1.
Accordingly, c2y2 + c3y3 + c5y5 = α^5 = S.
On the other hand, due to the explanation above, any 2 participants cannot
deduce S from their shares. In other words, any element of F8 is equally likely
for them to be the secret.
See Exercise 10.4.1 for a simple construction of a (t, t) threshold scheme.
Next let us outline how one can use linear error-correcting codes to construct
secret sharing schemes. Fix a finite field Fq: the secret values will be
drawn from this field. Also consider an [n, k]q linear code C with a generator
matrix G that has g′0, . . . , g′n−1 as columns (we add dashes to indicate that
these are columns; they are not to be confused with the usual notation for the
rows of G). Choose some information vector a ∈ F_q^k such that S = ag′0, where
S is the secret information. Then compute s = (s0, s1, . . . , sn−1) = aG. Now
s0 = S, and s1, . . . , sn−1 can be used as shares. The next result characterizes the
situation in which the secret S can be obtained from the shares.

Proposition 10.4.4 With the notation as above, let si1, . . . , sim be some shares,
1 ≤ m ≤ n − 1. These shares can reconstruct the secret S iff c⊥ =
(1, 0, . . . , 0, ci1, 0, . . . , 0, cim, 0, . . . , 0) ∈ C⊥, where at least one cij ≠ 0.

Proof. The claim follows from the fact that G·(c⊥)^T = 0 and that the secret
S = ag′0 can be obtained iff g′0 is a linear combination of g′i1, . . . , g′im.
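A tiny Python illustration of the proposition (with a hypothetical [4, 2] binary
code): since s·(c⊥)^T = 0 for every c⊥ ∈ C⊥, a dual codeword of the form
(1, c1, . . . , cn−1) yields S = s0 = Σj cjsj over F2, so exactly the shares on its
support reconstruct the secret.

import itertools, random

G = [[1, 0, 1, 1],      # toy [4, 2] binary code; columns are g'_0, ..., g'_3
     [0, 1, 1, 0]]
n, k = 4, 2
a = [random.randrange(2) for _ in range(k)]           # information vector
s = [sum(a[i] * G[i][j] for i in range(k)) % 2 for j in range(n)]
S = s[0]                                # the secret; s[1], ..., s[n-1] are shares

# Enumerate the dual code C^perp = {c : G c^T = 0} by brute force.
dual = [c for c in itertools.product(range(2), repeat=n)
        if all(sum(G[i][j] * c[j] for j in range(n)) % 2 == 0
               for i in range(k))]
for c in dual:
    if c[0] == 1:                       # the shape required by Proposition 10.4.4
        # the shares on the support of (c_1, ..., c_{n-1}) recover S
        assert sum(c[j] * s[j] for j in range(1, n)) % 2 == S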
If we look once more at Shamir’s scheme, it is no surprise that it can be seen as
the above construction with a Reed-Solomon code as the code C. Indeed, choose
N = q − 1 and set xi = α^i, where α is a primitive element of Fq. It is then quite
easy to see that encoding the secret and the shares via the polynomial f as in
Algorithm 10.4.2 is equivalent to encoding via the Reed-Solomon code
RSt(N, 1), cf. Definition 8.1.1 and Proposition 8.1.4. The only nuance is that
in general we may assign some n ≤ N shares and not all N. Now we need to
see that every collection of t shares reconstructs the secret. Using the above
notation, let si1, . . . , sit be the shares pulled together. According to Proposition
10.4.4 the dual of C = RSt(N, 1) should contain a codeword with a 1 at the
first position and at least one non-zero element at the positions i1, . . . , it. From
Proposition 8.1.2 we have that RSt(N, 1)⊥ = RSN−t(N, N), and RSN−t(N, N)
is an MDS [N, N − t, t + 1] code. We now use Corollary 3.2.14 with the t + 1
positions 1, i1, . . . , it and are guaranteed to have the prescribed codeword.
Therefore every collection of t shares reconstructs the secret. Having xi = α^i is
not really a restriction (Exercise 10.4.3).
In general the problem of constructing secret sharing schemes can be reduced
to finding codewords of minimal weight in a dual code, as per Proposition
10.4.4. There are more advanced constructions based on error-correcting codes,
in particular on AG-codes; see the Notes for references.
It is clear that if a group of participants can recover the secret by combining their
shares, then any group of participants containing this group can also recover the
secret. We call a group of participants a minimal access set if the participants
of this group can recover the secret with their shares, while no proper subgroup
can do so. From the preceding discussion it is clear that
there is a one-to-one correspondence between the set of minimal access sets and
the set of minimal weight codewords of the dual code C⊥ whose first coordinate
is 1. Therefore, for a secret sharing scheme based on a code C, the problem
of determining the access structure of the scheme reduces to
the problem of determining the set of minimal weight codewords whose first
coordinate is 1. It is obvious that the shares of the participants depend on the
choice of the generator matrix G of the code C. However, by Proposition
??, the choice of the generator matrix does not affect the access structure of the
secret sharing scheme.
Note that the set of minimal weight codewords whose first coordinate is 1 is a
subset of the set of all minimal weight codewords. The problem of determining
the set of all minimal weight codewords of a code is known as the covering
problem; it is a hard problem for an arbitrary linear code. In the
following, let us discuss the access structure of secret sharing
schemes based on special classes of linear codes in some more detail. It is clear
that any participant must be in at least one minimal access set; this is true for
any secret sharing scheme. Now we further ask the following question: given a
participant Pi, how many minimal access sets contain Pi? This
question can be answered when the dual code of the code used by the secret
sharing scheme is a constant weight code. In the following proposition we
suppose that C is a q-ary [n, k] code and that G = (g′0, g′1, . . . , g′n−1) is a
generator matrix of C.
Proposition 10.4.5 Suppose C is a constant weight code. Then, in the secret
sharing scheme based on C⊥, there are q^(k−1) minimal access sets. Moreover, we
have the following:
(1) If g′i is a scalar multiple of g′0, 1 ≤ i ≤ n − 1, then every minimal access
set contains the participant Pi. Such a participant is called a dictatorial
participant.
(2) If g′i is not a scalar multiple of g′0, 1 ≤ i ≤ n − 1, then there are (q − 1)q^(k−2)
minimal access sets which contain the participant Pi.
Proof. .........will be given later.........
The following is an interesting research problem: identify (or construct)
linear codes which are good for secret sharing, that is, for which the covering
problem can be solved or the minimal weight codewords can be well characterized.
Several classes of linear codes which are good for secret sharing have been
identified; see the papers by C. Ding and J. Yuan.
10.4.1 Exercises
10.4.1 Suppose that some trusted party T wants to share a secret S ∈ Zm
between two participants A and B. For this, T generates some random number
a ∈ Zm and assigns it to A. T then assigns b = S − a mod m to B.
• Show that the scheme above is a (2, 2) threshold scheme. This scheme is
an example of a split-knowledge scheme.
• Generalize the idea above to construct a (t, t) threshold scheme for arbi-
trary t.
10.4.2 Construct a (4, 7) Shamir’s threshold scheme and share the bit-string
”1011” using it.
Hint: Represent the bit-string ”1011” as an element of a finite field with more
than 7 elements.
10.4.3 Remove the restriction that xi be equal to α^i in the Reed-Solomon
construction of Shamir’s scheme by using Proposition 3.2.10.
10.5 Basics of stream ciphers. Linear feedback
shift registers
In Section 10.1 we saw how block ciphers are used for the construction of
symmetric cryptosystems. Here we give some basics of stream ciphers, i.e.
ciphers that process information bitwise as opposed to blockwise. Stream ciphers
are usually faster than block ciphers and have lower implementation costs.
Nevertheless, stream ciphers appear to be more susceptible to
cryptanalysis, so much care should be put into designing a secure cipher.
In this section we concentrate on stream cipher designs that involve the linear
feedback shift register (LFSR) as one of the building blocks.
The difference between block and stream ciphers is quite vague, since a block
cipher can be turned into a stream cipher using some special mode of operation.
Nevertheless, let us see what the characterizing features of such ciphers are. A
stream cipher is defined via its stream of states S, the keystream K, and the
stream of outputs C. Having an input (plaintext) stream P, one would like to
obtain C using S and K by operating successively on individual units of these
streams. The streams C and K are obtained using some key, either secret or
not. If these units are binary bits, we are dealing with a binary cipher.
Consider an infinite sequence (a stream) of key bits k1, k2, k3, . . . and a stream
of plaintext bits p1, p2, p3, . . . . Then we can form a ciphertext stream by simply
adding the key stream and the plaintext stream bitwise: ci = pi ⊕ ki, i =
1, 2, 3, . . . . One can stop at some moment n, thus obtaining an n-bit ciphertext
from the n-bit key and the n-bit plaintext. If the ki are chosen uniformly at
random and independently, we have the one-time pad scheme. It can be shown
that in the one-time pad an eavesdropper who only possesses the ciphertext
cannot say anything about the plaintext. In other words, the knowledge of the
ciphertext does not shed any additional light on the plaintext for an
eavesdropper. Moreover, an eavesdropper who even knows n key bits is completely
uncertain about the (n + 1)-th bit. This is a classical example of an
unconditionally secure cryptosystem, cf. Definition 10.1.8.
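In code the one-time pad is essentially a one-liner. Here is a minimal Python
sketch, operating on bytes rather than single bits (the key is as long as the
plaintext and must never be reused):

import secrets

plaintext = b"attack at dawn"
key = secrets.token_bytes(len(plaintext))      # fresh, full-length random key
cipher = bytes(p ^ k for p, k in zip(plaintext, key))   # c_i = p_i XOR k_i
assert bytes(c ^ k for c, k in zip(cipher, key)) == plaintext   # decryption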
Although the above idea yields provable security guarantees, it has an essential
drawback: the key should be at least as long as the plaintext, which is a
usual thing in unconditionally secure systems, see also Section 10.3.1. Clearly
this requirement is quite impractical. That is why one usually proceeds as
follows. One starts with some bitstring of a fixed size called a seed, and then
by performing some operations with this string obtains some larger string
(theoretically it can be infinite), which should “appear random” to an eavesdropper.
Note that since the seed is finite we cannot talk about unconditional security
anymore, only about computational security. Indeed, having a long enough key
stream in the known-plaintext scenario, it is in principle possible to run an
exhaustive search over all possible seeds to find the one that gives rise to the
given key stream. In particular all the successive bits of the key stream will
then be known.
Now let us present two commonly used types of stream ciphers: synchronous
and self-synchronizing. Let P = {p0, p1, . . . } be the plaintext stream, K =
{k0, k1, . . . } the keystream, C = {c0, c1, . . . } the ciphertext stream, and
S = {s0, s1, . . . } the state stream. The synchronous stream cipher is defined
as follows:

si+1 = f(si, k),
ki = g(si, k),
ci = h(ki, pi), i = 0, 1, . . . .
Here s0 is the initial state and f is the state function, which generates the next
state from the previous one and also depends on a key k. The ki form the key
stream via the function g; see Exercise 10.5.1 for a toy example. Finally
the ciphertext is formed by applying the output function h to the bits ki and
pi. This cipher is called synchronous since both Alice and Bob need to use
the same key stream (ki)i. If some (non-)malicious insertions or deletions occur,
the synchronization is lost, so additional means for providing synchronization
are necessary. Note that usually the function h is just a bitwise addition of the
streams (ki)i and (pi)i. It is also very common for stream ciphers to have an
initialization phase, in which only the states si are updated first, and the update
and output start to happen only at some later point in time. Therewith the
key stream (ki)i becomes more complicated and depends on more state bits.
The self-synchronizing stream cipher is defined as
si = (ci−t, . . . , ci−1),
ki = g(si, k),
ci = h(ki, pi), i = 0, 1, . . . .
Here (c−t, . . . , c−1) is a non-secret initial state. The encryption/decryption thus
depends only on some number of preceding ciphertext bits, and therefore the
output stream is able to recover from deletions/insertions.
Observe that if h is bitwise addition modulo 2, then the stream ciphers described
above follow the idea of the one-time pad. The difference is that now
the key stream (ki)i is obtained not fully at random, but as a pseudorandom
expansion of an initial state (seed) s0. The LFSR is a building block
in many stream ciphers that facilitates such a pseudorandom expansion.
LFSRs have the advantage that they can be implemented efficiently in hardware,
and their outputs have nice statistical properties. Moreover, LFSRs
are closely related to so-called linear recurring sequences, which are readily
studied via algebraic methods.
Schematically an LFSR can be represented as in Figure 10.3.
Let us figure out what is going on in the diagram. First the notation. A square
box is a delay box, sometimes called a “flip-flop”. Its task is to pass its stored
value on after each unit of time, as set by a synchronizing clock. A circle
with the value ai in it performs the AND operation, i.e. multiplication modulo 2,
of its input with the prescribed ai. The plus sign in a circle means the
XOR operation, i.e. addition modulo 2. The square boxes are initialized with
some values, namely the box Di gets some value si ∈ {0, 1}, i = 0, . . . , L − 1.
When the first time unit comes to an end the following happens: the value s0
becomes an output bit, and all values si, i = 1, . . . , L − 1, are shifted from
Di to Di−1. Simultaneously, for each i = 0, . . . , L − 1 the value si goes to an
AND-circle and gets multiplied with ai, and all these products are summed
up by means of the plus-circles, so that the sum a0s0 ⊕ a1s1 ⊕ · · · ⊕ aL−1sL−1
is formed. This sum is written to DL−1 and is called sL.
Figure 10.3: Diagram of an LFSR (delay boxes DL−1, . . . , D0 with the output
taken from D0; the taps aL−1, . . . , a0 multiply the stored values and their XOR
is fed back into DL−1)
The same procedure takes place at the end of the next time unit: now s1 is the
output, the remaining values are shifted, and sL+1 = a0s1 ⊕ a1s2 ⊕ · · · ⊕ aL−1sL
is written to DL−1. Analogously one proceeds further.
The name “Linear Feedback Shift Register” is now clear: we use only linear
operations here (multiplication by the ai and addition), the stored values give
feedback to DL−1 by means of a sum of the type described,
and the values are shifted from Di to Di−1.
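The mechanics just described fit in a few lines of Python. The sketch below is
our own rendering of Figure 10.3: it produces the output bits for given taps
a0, . . . , aL−1 and initial state s0, . . . , sL−1.

def lfsr(taps, state, nbits):
    # taps = (a_0, ..., a_{L-1}), state = (s_0, ..., s_{L-1})
    state = list(state)
    out = []
    for _ in range(nbits):
        out.append(state[0])                              # D_0 is the output box
        fb = sum(a * s for a, s in zip(taps, state)) % 2  # feedback sum
        state = state[1:] + [fb]                          # shift; fb into D_{L-1}
    return out

# Example 10.5.4(a) below: X^2 + X + 1, i.e. a_0 = a_1 = 1:
print(lfsr([1, 1], [1, 0], 9))     # [1, 0, 1, 1, 0, 1, 1, 0, 1], period 3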
Algebraically LFSRs are studied via the notion of linear recurring sequences,
which we introduce next.
Definition 10.5.1 Let L be a positive integer and let a0, . . . , aL−1 be values
from F2. A sequence S whose first L elements s0, . . . , sL−1 are values
from F2 and whose further elements are given by the rule

sL+i = aL−1sL+i−1 + aL−2sL+i−2 + · · · + a0si, i ≥ 0, (10.3)

is called an (L-th order) homogeneous linear recurring sequence in F2. The
elements s0, . . . , sL−1 are said to form the initial state sequence.
Obviously, a homogeneous linear recurring sequence represents the output of
some LFSR and vice versa, so we will use both notions interchangeably.
Another important notion that comes along with linear recurring sequences
is the following.
Definition 10.5.2 Let S be an L-th order homogeneous linear recurring
sequence in F2 defined by (10.3). Then the polynomial

f(X) = X^L + aL−1X^(L−1) + · · · + a0 ∈ F2[X]

is called the characteristic polynomial of S.
Remark 10.5.3 Instead of the characteristic polynomial one sometimes works
with g(X) = 1 + aL−1X + · · · + a0X^L, called the connection or feedback
polynomial. We have g(X) = X^L·f(1/X). Everything that will be said about f(X)
in the sequel remains true for g(X) as well.
Figure 10.4: Diagram to Example 10.5.4 (the three two-stage LFSRs (a), (b)
and (c))
Example 10.5.4 In Figure 10.4 (a), (b) and (c) the diagrams of the LFSRs with
the characteristic polynomials X^2 + X + 1, X^2 + 1 and X^2 + X are depicted. We
removed the circles, and sometimes also the connecting lines, since we are working
in F2, so ai ∈ {0, 1}. The table for case (a) looks like this:

D0 D1 output
1  0  -
1  1  0
0  1  1
1  0  1
1  1  0
...

So we see that the output sequence is actually periodic with period 3. The value
3 is the maximum period one can get for L = 2. This is due to the fact that
X^2 + X + 1 is irreducible and moreover primitive, see Theorem 10.5.8 below.
For case (b) we have

D0 D1 output
1  0  -
0  1  0
1  0  1
...

and the period is 2. For case (c) we have

D0 D1 output
1  0  -
1  1  0
1  1  1
...

So the output sequence here is not periodic, but it is ultimately periodic, i.e.
periodic starting at position 2, and the period here is 1. The non-periodicity is
due to the fact that f(X) = X^2 + X has f(0) = 0, see Theorem 10.5.8.
Example 10.5.5 Let us see how one can handle LFSRs in Magma. In Magma
one works with the connection polynomial (Remark 10.5.3). For example, given
the connection polynomial f = X^6 + X^4 + X^3 + X + 1 and the initial state
sequence (s0, s1, s2, s3, s4, s5) = (0, 1, 1, 1, 0, 1), the next state
(s1, s2, s3, s4, s5, s6) can be computed as

> PX<X> := PolynomialRing(GF(2));
> f := X^6+X^4+X^3+X+1;
> S := [GF(2)|0,1,1,1,0,1];
> LFSRStep(f,S);
[ 1, 1, 1, 0, 1, 1 ]

By writing

> LFSRSequence(f,S,10);
[ 0, 1, 1, 1, 0, 1, 1, 1, 1, 0 ]

we get the first 10 terms s0, . . . , s9 of the sequence.
In Sage one can do the same in the following way:

sage: con_poly = [GF(2)(i) for i in [1,0,1,1,0,1]]
sage: init_state = [GF(2)(i) for i in [0,1,1,1,0,1]]
sage: n = 10
sage: lfsr_sequence(con_poly, init_state, n)
[0, 1, 1, 1, 0, 1, 1, 1, 1, 0]

So one has to provide the connection polynomial via its coefficients.
As we have mentioned, the characteristic polynomial plays an essential role
in determining the properties of a linear recurring sequence and the associated
LFSR. Next we summarize all the results concerning a characteristic polynomial,
but first let us make precise the notions of periodic and ultimately periodic
sequences.
Definition 10.5.6 Let S = {si}i≥0 be a sequence for which there exists a
positive integer P such that sP+i = si for all i = 0, 1, . . . . Such a sequence is
called periodic, and P is a period of S. If the property sP+i = si holds for all i
starting from some non-negative index P0, then the sequence is called ultimately
periodic, again with period P. Note that a periodic sequence is also ultimately
periodic.
Remark 10.5.7 Note that periodic and ultimately periodic sequences have
many periods. It turns out that the least period always divides any other
period. We will use the term period to mean the least period of a sequence.
Now the main result follows.
Theorem 10.5.8 Let S be an L-th order homogeneous linear recurring
sequence and let f(X) ∈ F2[X] be its characteristic polynomial. The following
holds:
1. S is an ultimately periodic sequence with period P ≤ 2^L − 1.
2. If f(0) ≠ 0, then S is periodic.
3. If f(X) is irreducible over F2, then S is periodic with period P such that
P | (2^L − 1).
4. If f(X) is primitive *** recall the definition? *** over F2, then S is
periodic with period P = 2^L − 1.
Definition 10.5.9 A homogeneous linear recurring sequence S whose
characteristic polynomial f(X) is primitive is called a maximal period sequence in F2,
or an m-sequence.
The notions and results above can be generalized to the case of an arbitrary
finite field Fq. It is notable that one can compute the characteristic polynomial
of an L-th order homogeneous linear recurring sequence S from any
subsequence of length at least 2L by means of an algorithm of Berlekamp and
Massey, which is essentially the one from Section 9.2.2 *** more details have
to be provided here to show the connection. The explanation in 9.2.2 is a bit
too technical. Maybe we should introduce a simple version of BM in Chapter
6, which we could then use here? ***. See also Exercise 10.5.3.
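As a down-to-earth substitute for Berlekamp-Massey, the Python sketch below
recovers the taps a0, . . . , aL−1 by solving the linear system of Exercise 10.5.3
with plain Gaussian elimination over F2 (it assumes, as that exercise does, that
the matrix in question is invertible).

def recover_taps(bits, L):
    # Row i: (s_i, ..., s_{i+L-1} | s_{i+L}); solve for (a_0, ..., a_{L-1}).
    rows = [bits[i:i + L] + [bits[i + L]] for i in range(L)]
    for col in range(L):                          # Gauss-Jordan over F_2
        piv = next(r for r in range(col, L) if rows[r][col])
        rows[col], rows[piv] = rows[piv], rows[col]
        for r in range(L):
            if r != col and rows[r][col]:
                rows[r] = [(x + y) % 2 for x, y in zip(rows[r], rows[col])]
    return [rows[i][L] for i in range(L)]

stream = [1, 0, 1, 1, 0, 1, 1, 0]    # output of Example 10.5.4(a), L = 2
print(recover_taps(stream, 2))       # [1, 1], i.e. f(X) = X^2 + X + 1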
Naturally one is interested in obtaining sequences with a large period; therefore
m-sequences are of primary interest. These sequences have nice statistical
properties: for example, the distribution of the patterns of length ≤ L is almost
uniform. The notion of linear complexity is used as a tool for investigating
the statistical properties of LFSR outputs. Roughly speaking, the linear
complexity of a sequence is the minimal L for which the sequence can be realized
as an L-th order homogeneous linear recurring sequence, i.e. the degree of its
smallest characteristic polynomial. Because of these nice statistical properties
LFSRs can be used as pseudorandom bit generators, see Notes.
An obvious cryptographic drawback of LFSRs is the fact that the whole output
sequence can be reconstructed from just 2L bits of it, where L is the linear
complexity of the sequence. This obstructs the use of LFSRs on their own as
cryptographic primitives, in particular as key stream generators. Nevertheless,
one can use LFSRs in certain combinations, adding non-linearity, and obtain
quite efficient and secure key stream generators for stream ciphers. Let us briefly
describe three possibilities for such combinations.
• Nonlinear combination generator. Here the outputs of l LFSRs
L1, . . . , Ll are fed to a non-linear function f with l inputs; the output of f
then becomes the key stream. The function f should be chosen to be
correlation immune, i.e. there should be no correlation between the output
of f and the outputs of any small subset of L1, . . . , Ll.
• Nonlinear filter generator. Here the L delay boxes give their values at the
end of every time unit to a non-linear function g with L inputs; the output of
g then becomes the key stream. The function g is chosen in such a way that
its algebraic representation is dense.
• Clock-controlled generator. Here the output of one LFSR controls the clocks
of the other LFSRs that compose the cipher. In this way non-linearity is
introduced.
For some examples of the above, see Notes.
10.5.1 Exercises
10.5.1 Consider an example of a synchronous cipher defined by the following
data. The initial state is s0 = 10010. The function f shifts its argument by 3
positions to the right and adds 01110 bitwise. The function g sums the bits
at positions 2, 4 and 5 modulo 2 to obtain a keystream bit. Compute the first
6 key stream bits of this cipher.
10.5.2 a. The polynomial f(X) = X^4 + X^3 + 1 is primitive over F2. Draw
a diagram of an LFSR that has f as its characteristic polynomial. Let
(0, 1, 1, 0) = (s0, s1, s2, s3) be the initial state. Compute the output of
this LFSR up to the point where it is seen that the output sequence is
periodic. What is the period?
b. Rewrite (a.) in terms of the connection polynomial. Take the same initial
state and compute (e.g. with Magma) enough output sequence values to
see the periodicity.
10.5.3 Let s0, . . . , s2L−1 be the first 2L bits of an L-th order homogeneous
linear recurring sequence defined by (10.3). If it is known that the matrix

( s0    s1   . . .  sL−1  )
( s1    s2   . . .  sL    )
( ...               ...   )
( sL−1  sL   . . .  s2L−2 )

is invertible, show that it is possible to compute a0, . . . , aL−1, i.e. to find out
the structure of the underlying LFSR.
10.5.4 [CAS] The shrinking generator is an example of a clock-controlled
generator. It is composed of two LFSRs L1 and L2.
The output of L1 controls the output of L2 in the following way: if the output
bit of L1 is one, then the output bit of L2 is taken as an output of the whole
generator; if the output bit of L1 is zero, then the output bit of L2 is discarded.
In other words, the output of the generator forms a subsequence of the output
of L2, and this subsequence is masked by the 1’s in the output of L1. Write a
procedure that implements the shrinking generator. Then use the output of the
shrinking generator as a key stream k and define a stream cipher with it, i.e. a
ciphertext is formed as ci = pi ⊕ ki, where p is the plaintext stream. Compare
your simulation results with the ones obtained with the
ShrinkingGeneratorCipher class from Sage.
10.6 PKC systems using error-correcting codes
In this section we consider the public key encryption schemes due to McEliece
(Section 10.6.1) and Niederreiter (Section 10.6.2). Both encryption
schemes rely on the hardness of decoding random linear codes, as well as on the
hardness of distinguishing a code with a prescribed structure from a random one.
As we have seen, the problem of nearest codeword decoding is NP-hard, so the
McEliece cryptosystem is one of the proposals that use an NP-hard problem as
a basis; for some others see Section 10.2.3.
As mentioned at the end of Section 10.2.1, quantum computer attacks
pose a potential threat to classical cryptosystems like RSA (Section 10.2.1)
and those based on the DLP (Section 10.2.2). On the other hand,
no significant advantages of using a quantum computer for attacking the
code-based schemes of McEliece and Niederreiter are known. Therefore, this area of
cryptography has attracted quite a lot of attention in recent years. See the
Notes for recent developments.
10.6.1 McEliece encryption scheme
Now let us consider the public key cryptosystem by McEliece. It was proposed
in 1978 and is in fact one of the oldest public key cryptosystems. The idea of the
cryptosystem is to take a class of codes C for which there is an efficient bounded
distance decoding algorithm. The secret code C ∈ Cis given by a k×n generator
matrix G. This G is scrambled into G = SGP by means of a k × k invertible
matrix S and an n × n permutation matrix P. Denote by C is the code with
the generator matrix G . Now C is equivalent to C, cf. Definition 2.5.15. The
idea of scrambling is that the code C should appear random to an attacker,
so it should not be possible to use the efficient decoding algorithm available for
C to decrypt messages. More formally we have the following procedures that
define the encryption scheme as in Algorithms 10.1, 10.2, and 10.3. Note that
in these algorithms when we say “choose” we mean “choose randomly from an
appropriate set”.
Algorithm 10.1 McEliece key generation
Input:
System parameters:
- Length n
- Dimension k
- Alphabet size q
- Error-correcting capacity t
- A class C of [n, k] q-ary linear codes that have an efficient decoder
that can correct up to t errors
Output: McEliece public/private key pair (PK, SK).
Begin
Choose C ∈ C represented by a generator matrix G and equipped with an
efficient decoder DC.
Choose an invertible q-ary k × k matrix S.
Choose an n × n permutation matrix P.
Compute G′ := SGP {a generator matrix of an equivalent [n, k] code}
PK := G′.
SK := (DC, S, P).
return (PK, SK).
End
Let us see why the decryption procedure really yields the correct message from
a ciphertext. We have c1 = cP⁻¹ = mSG + eP⁻¹. Now, since wt(eP⁻¹) =
wt(e) = t, we have c2 = DC(c1) = mS. The last step is then trivial.
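The whole cycle is easy to trace on a toy instance. The Python sketch below
uses the [7, 4] Hamming code with t = 1 purely as a stand-in (the real scheme
uses binary Goppa codes; a one-error code offers no security), and the matrices
S, Sinv and the permutation seed are arbitrary hypothetical choices.

import numpy as np

G = np.array([[1,0,0,0,0,1,1], [0,1,0,0,1,0,1],       # systematic [7,4] code
              [0,0,1,0,1,1,0], [0,0,0,1,1,1,1]])
H = np.array([[0,1,1,1,1,0,0], [1,0,1,1,0,1,0], [1,1,0,1,0,0,1]])
S = np.array([[1,1,0,0], [0,1,0,0], [0,0,1,1], [1,0,0,1]])
Sinv = np.array([[1,1,0,0], [0,1,0,0], [1,1,1,1], [1,1,0,1]])
assert ((S @ Sinv) % 2 == np.eye(4, dtype=int)).all() # S invertible over F_2
P = np.eye(7, dtype=int)[np.random.default_rng(7).permutation(7)]
Gpub = S @ G @ P % 2                                  # public key G'

def decode(y):                       # syndrome decoder, corrects 1 error
    syn = H @ y % 2
    if syn.any():
        j = next(j for j in range(7) if (H[:, j] == syn).all())
        y = y.copy(); y[j] ^= 1
    return y

m = np.array([1, 0, 1, 1])
e = np.zeros(7, dtype=int); e[3] = 1                  # weight-t error
c = (m @ Gpub + e) % 2                                # encryption
c1 = c @ P.T % 2                                      # P^{-1} = P^T
c2 = decode(c1)                                       # = mSG (error removed)
assert (c2[:4] @ Sinv % 2 == m).all()                 # c3 = mS S^{-1} = m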
Initially McEliece proposed to use the class of binary Goppa codes (cf. Section
8.3.2) as the class C. Interestingly enough, this class has turned out to be pretty
much the only secure choice up to now; see Section 10.6.3 for the discussion.
As we saw in the procedures above, decryption is just decoding with the code
generated by G′. So if we are successful in “masking”, for instance, a binary
Goppa code C as a random code C′, then the adversary is faced with the
problem of correcting t errors in a random code, which is assumed to be hard
if t is large enough. More on that in Section 10.6.3. Let us consider a specific
example.
Algorithm 10.2 McEliece encryption
Input:
- Plaintext m
- Public key PK = G′
Output: Ciphertext c.
Begin
Represent m as a vector from F_q^k.
Choose randomly a vector e ∈ F_q^n of weight t *** notation for these vectors? ***.
Compute c := mG′ + e. {encode and add noise; c is of length n}
return c
End

Algorithm 10.3 McEliece decryption
Input:
- Ciphertext c
- Private key SK = (DC, S, P)
Output: Plaintext m.
Begin
Compute c1 := cP⁻¹.
Compute c2 := DC(c1).
Compute c3 := c2S⁻¹.
return c3
End
Example 10.6.1 [CAS] We use Magma to construct a McEliece encryption
scheme based on a binary Goppa code, encrypt a message with it and then
decrypt. First we construct a Goppa code of length 31 and dimension 16,
efficiently correcting 3 errors (see also Example 12.5.23):

> q := 2^5;
> Px<x> := PolynomialRing(GF(q));
> g := x^3+x+1;
> a := PrimitiveElement(GF(q));
> L := [a^i : i in [0..q-2]];
> C := GoppaCode(L,g); // a [31,16,7] binary Goppa code
> C2 := GoppaCode(L,g^2);
> n := #L; k := Dimension(C);

Note that we also defined the code C2 generated by the square of the Goppa
polynomial g. Although the two codes are equal, we need C2 later for
decoding. *** add references *** Now the key generation part:

> G := GeneratorMatrix(C);
> S := Random(GeneralLinearGroup(k,GF(2)));
> Determinant(S); // indeed an invertible map
1
> p := Random(Sym(n)); // a random permutation of an n-set
> P := PermutationMatrix(GF(2), p); // its matrix
> GPublic := S*G*P; // our public generator matrix

After we have obtained the public key, we can encrypt a message:

> MessageSpace := VectorSpace(GF(2),k);
> m := Random(MessageSpace);
> m;
(1 1 0 0 1 1 1 0 0 0 0 1 1 1 0 0)
> m2 := m*GPublic;
> e := C ! 0; e[10] := 1; e[20] := 1; e[25] := 1; // add 3 errors
> c := m2+e;

Let us decrypt using the private key:

> c1 := c*P^-1;
> bool, c2 := Decode(C2, c1 : Al := Euclidean);
> IS := InformationSet(C);
> ms := MessageSpace ! [c2[i] : i in IS];
> m_dec := ms*S^-1;
> m_dec;
(1 1 0 0 1 1 1 0 0 0 0 1 1 1 0 0)

We see that m_dec = m. Note that we applied the Euclidean algorithm for
decoding the Goppa code (*** reference ***), but we had to apply it to the code
generated by g^2 in order to correct all three errors. Since as a result of decoding
we obtained a codeword, not the message it encodes, we had to find an
information set and then extract the subvector at the positions that correspond to
this set (our generator matrices are in standard form, so we simply take the
subvector).
10.6.2 Niederreiter’s encryption scheme
The scheme proposed by Niederreiter in 1986 is dual to that of McEliece.
Namely, instead of using generator matrices and codewords, this scheme uses
parity check matrices and syndromes. Although different in terms of parameter
sizes and the efficiency of en-/decryption, the two schemes of McEliece and
Niederreiter can actually be shown to have equivalent security, see the end of
this section. We now present how keys are generated and how en-/decryption is
performed in the Niederreiter scheme in Algorithms 10.4, 10.5, and 10.6. Note
that in these algorithms we use a syndrome decoder. Recall that the notion of a
syndrome decoder is equivalent to the notion of a minimum distance decoder
*** add this to the decoding section ***.
The correctness of the en-/decryption procedures is shown analogously to the
McEliece scheme, see Exercise 10.6.1. The only difference is that here we use
a syndrome decoder, which returns a vector of smallest non-zero weight having
the input syndrome, whereas in the McEliece case the output of the decoder is
the codeword closest to the given word. Let us take a look at a specific example.

Example 10.6.2 [CAS] We work in Magma as in Example 10.6.1 and consider
the same binary Goppa code. The first 8 lines that define the code are the same;
we just add

> t := Degree(g);

Now the key generation part is quite similar as well:

> H := ParityCheckMatrix(C);
> S := Random(GeneralLinearGroup(n-k,GF(2)));
> p := Random(Sym(n)); P := PermutationMatrix(GF(2), p);
> HPublic := S*H*P; // our public parity check matrix
Algorithm 10.4 Niederreiter key generation
Input:
System parameters:
- Length n
- Dimension k
- Alphabet size q
- Error-correcting capacity t
- A class C of [n, k] q-ary linear codes that have an efficient syndrome
decoder that corrects up to t errors
Output: Niederreiter public/private key pair (PK, SK).
Begin
Choose C ∈ C represented by a parity check matrix H and equipped with an
efficient decoder DC.
Choose an invertible q-ary (n − k) × (n − k) matrix S.
Choose an n × n permutation matrix P.
Compute H′ := SHP {a parity check matrix of an equivalent [n, k] code}
PK := H′.
SK := (DC, S, P).
return (PK, SK).
End

Algorithm 10.5 Niederreiter encryption
Input:
- Plaintext m
- Public key PK = H′
Output: Ciphertext c.
Begin
Represent m as a vector from F_q^n of weight t. *** notation! ***
Compute c := H′m^T. {the ciphertext is a syndrome}
return c
End

Algorithm 10.6 Niederreiter decryption
Input:
- Ciphertext c
- Private key SK = (DC, S, P)
Output: Plaintext m.
Begin
Compute c1 := S⁻¹c.
Compute c2 := DC(c1). {The decoder returns an error vector of weight t.}
Compute c3 := P⁻¹c2.
return c3
End
The encryption is a bit trickier than in Example 10.6.1, since our messages are
now vectors of length n and weight t.

> MessageSpace := Subsets(Set([1..n]), t);
> mm := Random(MessageSpace);
> mm := [i : i in mm]; m := C ! [0 : i in [1..n]];
> // insert errors at the given positions
> for i in mm do
>   m[i] := 1;
> end for;
> c := m*Transpose(HPublic); // the ciphertext

The decryption part is also a bit trickier, because the decoding function of
Magma expects a word, not a syndrome. So we have to find a solution of
the parity check linear system and then pass this solution to the decoding
function.

> c1 := c*Transpose(S^-1);
> c22 := Solution(Transpose(H), c1); // find any solution
> bool, c2 := Decode(C2, c22 : Al := Euclidean);
> m_dec := (c22-c2)*Transpose(P^-1);

One may check that m = m_dec holds.
Now we will show that the Niederreiter and McEliece encryption schemes indeed
have equivalent security. In order to do so, we assume that the two schemes have
been generated from the same secret code C with a generator matrix G and
a parity check matrix H. Assume further that the private key of the McEliece
scheme is (S, G, P) and that of the Niederreiter scheme is (M, H, P), so that the
public keys are G′ = SGP and H′ = MHP respectively. Let z = yH′^T
be the ciphertext obtained by encrypting y with the Niederreiter scheme and
c = mG′ + e the ciphertext obtained from m with the McEliece scheme.
Equivalence now means that if one is able to recover y from z, then one is able
to recover m from c, and vice versa. Therewith we show that the two systems,
based on the same code and the same secret permutation, provide equivalent
security.
Now, assume we can recover any y of weight ≤ t from z = yH′^T. We want
to recover m from c = mG′ + e with wt(e) ≤ t. For y = e we have

yH′^T = eH′^T = mG′H′^T + eH′^T = cH′^T =: z,

with c = mG′ + e, since G′H′^T = SGP·P^T·H^T·M^T = SGH^T·M^T = 0, due
to PP^T = Idn and GH^T = 0. So if we can recover such a y from the above
constructed z, we are able to recover e and thus m from its ciphertext c =
mG′ + e.
Analogously, assume that for any m and any e of weight ≤ t we can recover them
from c = mG′ + e. Now we want to recover y of weight ≤ t from z = yH′^T.
For e = y we have

z = yH′^T = cH′^T,

with c = mG′ + y being any solution of the equation z = cH′^T. Now we can
recover m and e = y from c, and thus y.
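The central identity G′H′^T = 0 and the resulting relation z = cH′^T = eH′^T
are easy to confirm numerically. The sketch below does so with toy [7, 4]
Hamming matrices and hypothetical choices of S, M and P.

import numpy as np

G = np.array([[1,0,0,0,0,1,1], [0,1,0,0,1,0,1],
              [0,0,1,0,1,1,0], [0,0,0,1,1,1,1]])
H = np.array([[0,1,1,1,1,0,0], [1,0,1,1,0,1,0], [1,1,0,1,0,0,1]])
S = np.array([[1,1,0,0], [0,1,0,0], [0,0,1,1], [1,0,0,1]])   # invertible / F_2
M = np.array([[1,1,0], [0,1,1], [0,0,1]])                    # invertible / F_2
P = np.eye(7, dtype=int)[np.random.default_rng(3).permutation(7)]
Gp, Hp = S @ G @ P % 2, M @ H @ P % 2                 # the two public keys

assert not (Gp @ Hp.T % 2).any()        # G'H'^T = SG(PP^T)H^T M^T = 0
m = np.array([0, 1, 1, 0])
e = np.zeros(7, dtype=int); e[5] = 1
c = (m @ Gp + e) % 2                                  # McEliece ciphertext
z = c @ Hp.T % 2                                      # a Niederreiter ciphertext
assert (z == e @ Hp.T % 2).all()                      # z depends on e only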
10.6.3 Attacks
There are two types of attacks one may consider for code-based cryptosystems:
1. Generic decoding attacks. One tries to recover m from c using the code
C′.
2. Structural attacks. One tries to recover S, G, P from the code C′ given by
G′ in the McEliece scheme, or S, H, P from H′ in the Niederreiter scheme.
Consider the McEliece encryption scheme. In an attack of type (1), the attacker tries to decode the ciphertext c directly in the code C' generated by the public generator matrix G'. Assuming that C' behaves like a random code, one may obtain complexity estimates for this type of attack. The best results in this direction are obtained using the family of algorithms that improve on information set decoding (ISD), see Section 6.2.3.
Recall that the idea of (probabilistic) ISD is to find an error-free information set I and then decode as c = r(I)G̃ for the received word r. Here G̃ is a generator matrix of an equivalent code that is systematic at the positions of I; we write G̃ to avoid confusion with the public generator matrix. The first improvement of ISD, due to Lee and Brickell, consists in allowing some small number p of errors to occur in the set I. So we no longer require r(I) = c(I), but allow r(I) = c(I) + e(I) with wt(e(I)) ≤ p. We can now modify Algorithm 6.2 as in Algorithm 10.7. Note that since we know the number of errors that occurred, the if-part has changed as well.
Algorithm 10.7 Lee-Brickell ISD
Input:
- Generator matrix G of an [n, k] code C
- Received word r
- Number of errors t that occurred, so that d(r, C) = t
- Number of trials Ntrials
- Parameter p
Output: A codeword c ∈ C such that d(r, c) = t, or "No solution found"
Begin
c := 0;
Ntr := 0;
repeat
Ntr := Ntr + 1;
Choose a subset I of {1, . . . , n} of cardinality k.
if G(I) is invertible then
G̃ := G(I)^{-1} G
for all e(I) of weight ≤ p do
c̃ := (r(I) + e(I)) G̃
if d(c̃, r) = t then
return c̃
end if
end for
end if
until Ntr > Ntrials
return "No solution found"
End
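As an illustration, the following Python sketch (our addition; the helper names and the toy demo are ours) implements Algorithm 10.7 over F_2. It row-reduces [G(I) | G] to test whether G(I) is invertible and to obtain G̃ = G(I)^{-1}G in a single pass, and then tries all error patterns e(I) of weight at most p:

import itertools, random
import numpy as np

def rref_gf2(A):
    # reduced row echelon form over GF(2); returns (R, pivot columns)
    R = A.copy() % 2
    pivots, r = [], 0
    for c in range(R.shape[1]):
        rows = [i for i in range(r, R.shape[0]) if R[i, c]]
        if not rows:
            continue
        R[[r, rows[0]]] = R[[rows[0], r]]
        for i in range(R.shape[0]):
            if i != r and R[i, c]:
                R[i] ^= R[r]
        pivots.append(c)
        r += 1
        if r == R.shape[0]:
            break
    return R, pivots

def lee_brickell(G, r, t, p=2, ntrials=1000, seed=0):
    # Algorithm 10.7: look for a codeword c of the code of G with d(r, c) = t
    k, n = G.shape
    rng = random.Random(seed)
    for _ in range(ntrials):
        I = sorted(rng.sample(range(n), k))
        M, piv = rref_gf2(np.concatenate([G[:, I], G], axis=1))
        if piv != list(range(k)):
            continue                        # G(I) is not invertible
        Gt = M[:, k:]                       # Gtilde = G(I)^{-1} G
        for w in range(p + 1):
            for pos in itertools.combinations(range(k), w):
                eI = r[I].copy()
                eI[list(pos)] ^= 1          # r(I) + e(I) with wt(e(I)) = w
                c = (eI @ Gt) % 2
                if np.count_nonzero((c + r) % 2) == t:
                    return c
    return None                             # "No solution found"

# toy demo: the [7,4] Hamming code corrects t = 1 error
G = np.array([[1,0,0,0,0,1,1],
              [0,1,0,0,1,0,1],
              [0,0,1,0,1,1,0],
              [0,0,0,1,1,1,1]])
c = np.array([1,0,0,0,0,1,1])
r = c.copy(); r[3] ^= 1                     # one channel error
print(lee_brickell(G, r, t=1, p=1))         # recovers c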
Remark 10.6.3 In Algorithm 10.7 one may replace the choice of a set I by choosing each time a random permutation matrix Π. Then one computes rref(GΠ), therewith obtaining an information set. One must keep track of the applied permutations Π in order to "go back" after finding a solution in this way.
The probability of success in one trial of the Lee-Brickell variant is

$$p_{LB} = \frac{\binom{k}{p}\binom{n-k}{t-p}}{\binom{n}{t}},$$

compared to the success probability

$$p_{ISD} = \frac{\binom{n-k}{t}}{\binom{n}{t}}$$

of one trial of the original probabilistic ISD.
Since the for-loop of Algorithm 10.7 runs over all $\sum_{i=0}^{p}\binom{k}{i}$ error patterns e(I) of weight at most p, the parameter p should be a small constant. In fact for small p, like p = 2, one obtains a complexity improvement that, although not asymptotic, is quite relevant in practice.
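As a sanity check on these formulas, a few lines of Python (our addition) compute the expected numbers of trials 1/p_ISD and 1/p_LB for the original McEliece parameters, a binary [1024, 524] code with t = 50:

from math import comb

def p_isd(n, k, t):
    # success probability of one trial of plain probabilistic ISD
    return comb(n - k, t) / comb(n, t)

def p_lb(n, k, t, p):
    # success probability of one trial of the Lee-Brickell variant
    return comb(k, p) * comb(n - k, t - p) / comb(n, t)

n, k, t = 1024, 524, 50
print(1 / p_isd(n, k, t))      # expected trials, plain ISD
print(1 / p_lb(n, k, t, 2))    # expected trials, Lee-Brickell with p = 2

For these parameters the Lee-Brickell variant with p = 2 lowers the expected number of trials by a factor of about 1.6 · 10^3, at the cost of the small inner loop over the patterns e(I); this illustrates the practical, though not asymptotic, gain mentioned above.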
There is a rich list of further improvements due to many researchers in the field, see Notes. The improvements basically consider different configurations: where a small number p of errors is allowed to be present, where only a block of l zeroes should be present, etc. Further, the choice of the next set I can be optimized, for example by changing just one element of the current I in a clever way. With all these techniques in mind, one obtains quite a considerable improvement of ISD in practical attacks on the McEliece cryptosystem. In fact the original proposal of McEliece to use [1024, 524] binary Goppa codes correcting 50 errors is no longer a secure choice; one has to increase the parameters of the Goppa codes used.
Example 10.6.4 [CAS] Magma contains implementations of the “vanilla” prob-
abilistic ISD, which was also considered in the original paper of McEliece, as
well as Lee-Brickell’s variant and several other improved algorithms. Let us try
to attack the toy example considered in Example 10.6.1. So we copy all the
instructions responsible for the code construction, key generation, and encryp-
tion.
> ... // as in Example 10.6.1
Then we use the commands
> CPublic:=LinearCode(GPublic);
> McEliecesAttack(CPublic,c,3);
> LeeBrickellsAttack(CPublic,c,3,2);
to mount our toy attack. For this specific example both attacks execute in no time. In both commands we first pass the code, then the received word, and then the number of errors to be corrected. In LeeBrickellsAttack the last parameter is exactly the parameter p from Algorithm 10.7; we set it to 2.
We can also correct errors in random codes. Below is an example:
> C:=RandomLinearCode(GF(2),50,10);
> c:=Random(C); r:=c;
> r[2]:=r[2]+1; r[17]:=r[17]+1; r[26]:=r[26]+1;
> McEliecesAttack(C,r,3);
> LeeBrickellsAttack(C,r,3,2);
Apart from decoding being hard for the public code C', it should be impossible to deduce the structure of the code C from the public C'. Structural attacks of type (2) aim at exploiting this structure. As we have mentioned, the choice of binary Goppa codes has turned out to be pretty much the only secure choice up to now. There have been quite a few attempts to propose other classes of codes for which efficient decoding algorithms are known. Alas, all of these proposals were broken; we name just a few: generalized Reed-Solomon (GRS) codes, Reed-Muller codes, BCH codes, algebraic-geometry codes of small genus, LDPC codes, quasi-cyclic codes; see Notes. In the next section we will consider in detail how a prominent attack on GRS codes works. In particular, the weakness of GRS codes suggests, due to the equivalent security shown above, the weakness of the original proposal of Niederreiter, who suggested to use these codes in his scheme.
10.6.4 The attack of Sidelnikov and Shestakov
Let C be the code GRS_k(a, b), where a consists of n mutually distinct entries of F_q and b consists of nonzero entries, cf. Definition 8.1.10. If this code is used in the McEliece PKC, then for an attacker the code C' with generator matrix G' = SGP is known, where S is an invertible k × k matrix and P = ΠD with Π an n × n permutation matrix and D an invertible diagonal matrix. The code C' is equal to GRS_k(a', b'), where a' = aΠ and b' = bP. In order to decode GRS_k(a', b') up to (n − k + 1)/2 errors it is enough to find a' and b'. The matrix S is not essential in masking G, since G' has a unique row equivalent matrix (I_k|A') that is in reduced row echelon form. Here A' is a generalized Cauchy matrix (Definition 3.2.17), but it is a priori not evident how to recover a' and b' from this.
The code is MDS, hence all square submatrices of A' are invertible by Remark 3.2.16. In particular all entries of A' are nonzero. After multiplying the coordinates with nonzero constants we get a code which is generalized equivalent to the original one, and it is again a GRS code with the same a', since r ∗ GRS_k(a', b') = GRS_k(a', b' ∗ r). So without loss of generality it may be assumed that the code has a generator matrix of the form (I_k|A) such that the last row and the first column of A consist of ones.
Without loss of generality it may be assumed that a_{k−1} = ∞, a_k = 0 and a_{k+1} = 1 by Proposition 8.1.25. Then according to Proposition 8.1.17 and Corollary 8.1.19 there exists a vector c with entries c_i given by

$$c_i = \begin{cases} b_i\prod_{t=1,\, t\ne i}^{k}(a_i-a_t) & \text{if } 1\le i\le k,\\ b_i\prod_{t=1}^{k}(a_i-a_t) & \text{if } k+1\le i\le n,\end{cases}$$

such that A' has entries a_{ij} given by

$$a_{ij} = \frac{c_{j+k-1}\,c_i^{-1}}{a_{j+k-1}-a_i} \quad\text{for } 1\le i\le k-1,\ 1\le j\le n-k+1,$$

and

$$a_{kj} = c_{j+k-1}\,c_k^{-1} \quad\text{for } 1\le j\le n-k+1.$$
Example 10.6.5 Let G' be the generator matrix of a code C' with entries in F_7 given by

$$G' = \begin{pmatrix} 6 & 1 & 1 & 6 & 2 & 2 & 3\\ 3 & 4 & 1 & 1 & 5 & 4 & 3\\ 1 & 0 & 3 & 3 & 6 & 0 & 1 \end{pmatrix}.$$

Then rref(G') = (I_3|A') with

$$A' = \begin{pmatrix} 1 & 3 & 3 & 6\\ 4 & 4 & 6 & 6\\ 3 & 1 & 6 & 3 \end{pmatrix}.$$

G' is a public key and it is known that it is the generator matrix of a generalized Reed-Solomon code. So we want to find a in F_7^7 consisting of mutually distinct entries and b in F_7^7 with nonzero entries such that C' = GRS_3(a, b). Now C = (1, 4, 3, 1, 5, 5, 6) ∗ C' has a generator matrix of the form (I_3|A) with

$$A = \begin{pmatrix} 1 & 1 & 1 & 1\\ 1 & 5 & 4 & 2\\ 1 & 4 & 3 & 6 \end{pmatrix}.$$
We may assume without loss of generality that a_1 = 0 and a_2 = 1 by Proposition 8.1.25. ***...............***
10.6.5 Exercises
10.6.1 Show the correctness of the Niederreiter scheme.
10.6.2 Using the methods of Section 10.6.3, attack larger McEliece schemes. In the Goppa construction take
- m = 8, r = 16
- m = 9, r = 5
- m = 9, r = 7
Make observations that would answer the following questions:
- Which attack is faster: the plain ISD or Lee-Brickell’s variant?
- What is the role of the parameter p? What is the optimal value of p in
these experiments?
- Does the execution time differ from one run to the next, or does it stay the same?
- Is there any change in execution time, when the attacks are done for
random codes with the same parameters as above?
Try to experiment with other attack methods implemented in Magma: LeonsAttack,
SternsAttack, CanteautChabaudsAttack.
Hint: For constructing Goppa polynomials use the command PrimitivePolynomial.
10.6.3 Consider binary Goppa codes of length 1024 and Goppa polynomial of
degree 50.
(1) Give an upper bound of the number of these codes.
(2) What is the fraction of the number of these codes with respect to all binary
[1024, 524] codes?
(3) What is the minimum distance of a random binary [1024, 524] code according
to the Gilbert-Varshamov bound?
10.6.4 Give an estimate of the complexity of decoding 50 errors of a received
word with respect to a binary [1024, 524, 101] code by means of covering set
decoding.
10.6.5 Let α ∈ F_8 be a primitive element such that α^3 = α + 1. Let G' be the generator matrix given by

$$G' = \begin{pmatrix} \alpha^6 & \alpha^6 & \alpha & 1 & \alpha^4 & 1 & \alpha^4\\ 0 & \alpha^3 & \alpha^3 & \alpha^4 & \alpha^6 & \alpha^6 & \alpha^4\\ \alpha^4 & \alpha^5 & \alpha^3 & 1 & \alpha^2 & 0 & \alpha^6 \end{pmatrix}.$$
(1) Find a in F_8^7 consisting of mutually distinct entries and b in F_8^7 with nonzero entries such that G' is a generator matrix of GRS_3(a, b).
(2) Consider the 3 × 7 generator matrix G of the code RS_3(7, 1) with entry α^{(i−1)(j−1)} in the i-th row and j-th column. Give an invertible 3 × 3 matrix S and a permutation matrix P such that G' = SGP.
(3) What is the number of pairs (S, P) of such matrices?
10.7 Notes
Some excellent references for an introduction to cryptography are [87, 117, 35].
10.7.1 Section 10.1
Computational security is concerned with practical attacks on cryptosystems, whereas unconditional security works with probabilistic models in which an attacker is supposed to possess unlimited computing power. A usual claim when working with unconditional security is an upper bound on the attacker's success probability. This probability is independent of the computing power of the attacker and is in that sense "absolute". For instance, in the case of Shamir's secret sharing scheme (Section 10.4), no matter how much computing power a group of t − 1 participants has, it can do no better than guess the value of the secret. The probability of such a guess succeeding is 1/q. More on these issues can be found in [117].
A couple of remarks on block ciphers that were used in the past. The Jefferson cylinder, invented by Thomas Jefferson in 1795 and independently by Etienne Bazeries, is a polyalphabetic cipher that was used by the U.S. Army in 1923-1942 under the name M-94. It was constructed as a rotor with 20-36 lettered discs, each of which provided a substitution at the corresponding position. For its time it had quite good cryptographic properties. Probably the best known historical cipher is the German Enigma. It had already been used for commercial purposes in the 1920s, but became famous for its use by the Nazi German military during World War II. Enigma is also a rotor-based polyalphabetic cipher. More on historical ciphers can be found in [87].
The Kasiski method aims at recovering the period of a polyalphabetic substitution cipher. It exploits the fact that repeated portions of the plaintext encrypted with the same part of the keyword yield repeated portions of the ciphertext. More details can be found in [87].
The National Bureau of Standards (NBS, later renamed the National Institute of Standards and Technology, NIST) initiated the development of DES (Data Encryption Standard) in the early 1970s. IBM's cryptography department, in particular its leaders Dr. Horst Feistel (recall the Feistel cipher) and Dr. W. Tuchman, contributed the most to the development. The evaluation process was also facilitated by the NSA (National Security Agency). The standard was finally approved and published in 1977 [90]. A lot of controversy has accompanied DES since its appearance. Some experts claimed that the developers could have intentionally added design trapdoors to the cipher, so that its cryptanalysis would be possible for them, but not for others. The 56-bit key size also raised concern, which eventually led to the need to adopt a new standard. Historical remarks on DES and its development can be found in [112].
Differential and linear cryptanalysis turned out to be the most successful theoretical attacks on DES. For the initial papers on these attacks, see [15, 82]. The reader may also visit http://www.ciphersbyritter.com/RES/DIFFANA.HTM for more references and the history of differential cryptanalysis. We also mention that differential cryptanalysis may have been known to the developers long before Biham and Shamir published their paper in 1990.
Since DES encryptions do not form a group, a triple application of DES, called Triple DES, was proposed [66]. Although no effective cryptanalysis of Triple DES has been found, it is barely used nowadays due to its slow implementation compared to AES.
In the middle of the 1990s it became apparent to the cryptographic community that DES no longer provided a sufficient level of security. So NIST announced a competition for a cipher that would replace DES and become the AES (Advanced Encryption Standard). The main criteria imposed on the future AES were resistance to linear and differential cryptanalysis, a faster and more effective implementation (compared to DES), and the ability to work with 128-bit plaintext blocks and 128-, 192-, and 256-bit keys; the number of rounds was not specified. After five years of selection, the cipher Rijndael, proposed by the Belgian researchers Joan Daemen and Vincent Rijmen, won. The cipher was officially adopted as the AES in 2001 [91]. Because resistance to linear and differential cryptanalysis was one of the milestones in the design of the AES, Daemen and Rijmen carefully studied this question and showed how such resistance can be achieved within Rijndael. In the design they used what they called the wide trail strategy, a method devised specifically to counter linear and differential cryptanalysis. A description of the AES together with a discussion of the underlying design decisions can be found in [45]; for the wide trail strategy see [46].
As to attacks on AES, up to now there is no attack that breaks AES even theoretically, i.e. faster than exhaustive search, in the scenario where the unknown key stays the same. Several attacks, though, work on reduced versions of AES with fewer than 10 rounds. For example, Boomerang-type attacks are able to break 5-6 rounds of AES-128 much faster than exhaustive search. For 6 rounds the Boomerang attack has data complexity of 2^71 128-bit blocks, memory complexity of 2^33 blocks, and time complexity of 2^71 AES encryptions. This attack is mounted in a mixture of chosen-plaintext and adaptive chosen-ciphertext scenarios. Some other attacks can also handle 5-7 rounds; among them are the Square attack, proposed by Daemen and Rijmen themselves, the collision attack, partial sums, and impossible differentials. For an overview of attacks on Rijndael see [40]. There are recent works on related-key attacks on AES, see [1]. It is possible to attack 12 rounds of AES-192 and 14 rounds of AES-256 in the related-key scenario. Still, it is quite debated in the community whether one may consider these attacks a real threat.
We would like to mention several other recent block ciphers. The cipher Serpent is an instance of an SP-network and came second in the AES competition that was won by Rijndael. As prescribed by the selection committee, it also operates on 128-bit blocks and keys of sizes 128, 192, and 256 bits. Serpent has a strong security margin, prescribing 32 rounds. Some information online: http://www.cl.cam.ac.uk/~rja14/serpent.html. Next, the cipher Blowfish, proposed in 1993, is an instance of a Feistel cipher; it has 16 rounds and operates on 64-bit blocks with a default key size of 128 bits. Blowfish has up to now resisted cryptanalysis and its implementation is rather fast, although it has some limitations that preclude its use in certain environments. Information online: http://www.schneier.com/blowfish.html. A successor of Blowfish proposed by the same person, Bruce Schneier, is Twofish, which was one of the five finalists in the AES competition. It has the same block and key sizes as all the AES contestants and has 16 rounds. Twofish is also a Feistel cipher and is also believed to resist cryptanalysis. Information online: http://www.schneier.com/twofish.html. It is notable that all these ciphers are in the public domain and are free for use in any software or hardware implementation. The light-weight block cipher PRESENT operates on plaintext blocks of only 64 bits with key lengths of 80 and 128 bits; PRESENT has 31 rounds [2]. There exist proposals with even smaller block lengths, see http://www.ecrypt.eu.org/lightweight/index.php/Block_ciphers.
10.7.2 Section 10.2
The concept of asymmetric cryptography was introduced by Diffie and Hellman in 1976 [48]. For an introduction to the subject and a survey of results see [87, 89]. The notion of a one-way function, as well as that of a trapdoor one-way function, was introduced by Diffie and Hellman in the same paper [48].
Rabin’s scheme from Example 10.2.2 was introduced by Rabin in [97] in 1979 and
ElGamal scheme was presented in [55]. The notion of a digital signature was also
presented in the pioneering work [48], see also [88].
The RSA scheme was introduced in 1977 by Rivest, Shamir, and Adleman [100]. In the same paper they showed that computing the decryption exponent and factoring are equivalent. There is no known polynomial time algorithm for factoring integers. Still, there are quite a few algorithms that have sub-exponential complexity. For a survey of existing methods, see [96]. Asymptotically the best known sub-exponential algorithm is the general number field sieve; it has an expected running time of $O(\exp((\tfrac{64}{9}b)^{1/3}(\log b)^{2/3}))$, where b is the bit length of the number n to be factored [36].
The development of factoring algorithms changed the requirements on the RSA key size over time. In their original paper [100] the authors suggested the use of a 200 decimal digit modulus (664 bits). The sizes of 336 and 512 bits were also used. In 2010 the factorization of RSA-768 was announced. The 1024-bit modulus used at present raises many questions on whether it may still be considered secure. Therefore, for long-term security key sizes of 2048 or even 3072 bits are to be used. Quite remarkable is the work of Shor [108], who proposed an algorithm that solves the integer factorization problem in polynomial time on a quantum computer.
The use of Z_n^* in the ElGamal scheme was proposed in [83]. For the use of the Jacobian of a hyperelliptic curve see [41]. There are several methods for solving the DLP; we name just a few: baby-step giant-step, Pollard's rho and Pollard's lambda (or kangaroo), Pohlig-Hellman, and index calculus. The fastest algorithms for solving the DLP in Z_p^* and F_{2^m}^* are variations of the index calculus algorithm. None of the above algorithms applied to the multiplicative group of a finite field has polynomial complexity; index calculus has sub-exponential complexity. For an introduction to these methods the reader may consult [41]. These developments in algorithms for solving the DLP affected the key size of the ElGamal scheme. In practice the key size grew from 512 to 768 and finally to 1024 bits. At the moment a 1024-bit key for ElGamal is considered standard. The mentioned work of Shor [108] also solves the DLP in polynomial time, so the existence of a large enough quantum computer would jeopardize the ElGamal scheme as well.
Despite the doubts of some researchers about the possibility of constructing a large enough quantum computer, the area of post-quantum cryptography has evolved; it incorporates cryptosystems that are potentially resistant to quantum computer attacks. See [12] for an overview of the area, which includes lattice-based, hash-based, code-based, and multivariate-based cryptography. Some references to multivariate asymmetric systems, digital signature schemes, and their cryptanalysis can be found in [49].
The knapsack cryptosystem of Merkle and Hellman has an interesting history. It was one of the first asymmetric cryptosystems. Its successful cryptanalysis showed that relying solely on the hardness of the underlying problem may be misleading. For an interesting historical account, see the survey chapter of Diffie [47].
10.7.3 Section 10.3
Authentication codes were initially proposed by Gilbert, MacWilliams, and Sloane [58]. Introductory material on authentication codes is well exposed in [117].
Message authentication codes (MACs) are widely used in practice for authentication purposes. MACs are keyed hash functions. In the case of MACs one demands that the hash function provide compression (a message of arbitrary size is mapped to a fixed-size vector), ease of computation (it should be easy to compute an image knowing the key), and computation-resistance (it should be practically impossible to compute an image without knowing the key, even when some element-image pairs are available). More on MACs can be found in [87].
Results and a discussion of the relation between authentication codes and orthogonal arrays can be found in [117, 116, 115]. Proposition 10.3.7 is due to Bierbrauer, Johansson, Kabatianskii, and Smeets [13]. By adding linear structure to the source state set, key space, tag space, and authentication mappings, one obtains linear authentication codes, which can be used in the study of distributed authentication systems [103].
10.7.4 Section 10.4
The notion of a secret sharing scheme was first introduced in 1979 by Shamir [106] and independently by Blakley [22]. We mention here some notions that were not mentioned in the main text. A secret sharing scheme is called perfect if knowledge of the shares of an unauthorized group (e.g. a group of < t participants in Shamir's scheme) does not reduce the uncertainty about the secret itself. In terms of the entropy function this can be stated as follows: H(S|A) = H(S), where S is the secret and A is an unauthorized set; moreover, H(S|B) = 0 for an authorized set B. In perfect secret sharing schemes it holds that the size of each share is at least the size of the secret. If equality holds, such a scheme is called ideal. The notion of a secret sharing scheme can be generalized via the notion of an access structure. Using access structures one prescribes which subsets of participants can reconstruct the secret (authorized subsets) and which cannot (unauthorized subsets). The notion of a distribution of shares can also be formalized. More details on these notions and a treatment using probability theory can be found e.g. in [117].
McEliece and Sarwate [85] were the first to point out the connection between Shamir's scheme and the Reed-Solomon code construction. Some other works on the relations between coding theory and secret sharing schemes include [70, 81, 94, 138]. More recent works concern applications of AG-codes to this subject; we mention the chapter of Duursma [52] and the work of Chen and Cramer [39]. In the latter two references the reader can also find the notion of secure multi-party computation, see [137]. The idea here is that several participants wish to compute the value of some publicly known function evaluated at their private values (like the shares above). The point is that no participant should be able to learn the values of the other participants from the computed value of the public function and his or her own value. We also mention that, as was the case with authentication codes, information-theoretic perfectness can be traded off to obtain a system where the shares are smaller than the secret [23].
10.7.5 Section 10.5
Introductory material on LFSRs with a discussion of practical issues can be found in [87]. The notion of linear complexity is treated in [102]; see also the materials online at http://www.ciphersbyritter.com/RES/LINCOMPL.HTM. A thorough exposition of linear recurring sequences is given in [74].
Some examples of non-cryptographic uses of LFSRs, namely randomization in digital broadcasting: the Advanced Television Systems Committee (ATSC) standard for digital television (http://www.atsc.org/guide_default.html), Digital Audio Broadcasting (DAB), a digital radio technology for broadcasting radio stations, and Digital Video Broadcasting - Terrestrial (DVB-T), the European standard for the broadcast transmission of digital terrestrial television (http://www.dvb.org/technology/standards/).
An example of a nonlinear combination generator is E0, a stream cipher used in the Bluetooth protocol, see e.g. [64]; of a nonlinear filter generator, the knapsack generator [87]; of clock-controlled generators, A5/1 and A5/2, stream ciphers used to provide voice privacy in the Global System for Mobile communications (GSM) cellular telephone protocol, see http://web.archive.org/web/20040712061808/www.ausmobile.com/downloads/technical/Security+in+the+GSM+system+01052004.pdf.
10.7.6 Section 10.6
***
Chapter 11
The theory of Gröbner bases and its applications
Stanislav Bulygin
In this chapter we deal with methods in coding theory and cryptology based on polynomial system solving. As the main tool for this we use the theory of Gröbner bases, a well-established instrument in computational algebra. In Section 11.1 we give a brief overview of the topic of polynomial system solving. We start with the relatively easy methods of linearization and extended linearization. Then we give the basics of the more involved theory of Gröbner bases.
The problem we are dealing with in this chapter is the problem of polynomial system solving. We formulate this problem as follows: let F be a field and let P = F[X_1, . . . , X_n] be a polynomial ring over F in n variables X = (X_1, . . . , X_n). Let f_1, . . . , f_m ∈ P. We are interested in finding the set of solutions S ⊆ F^n of the polynomial system

f_1(X_1, . . . , X_n) = 0,
. . .
f_m(X_1, . . . , X_n) = 0.        (11.1)
In other words, S is composed of those elements of F^n that satisfy all the equations above.
In terms of algebraic geometry this problem may be formulated as follows. Given an ideal I ⊆ P, find the variety V_F(I) which it defines:

V_F(I) = {x = (x_1, . . . , x_n) ∈ F^n | f(x) = 0 for all f ∈ I}.

Since we are interested in applications to coding and cryptology, we will be working over finite fields, and often we would like the solutions of the corresponding systems to lie in these finite fields rather than in an algebraic closure. Recall that for every element α ∈ F_q we have α^q = α. This means that if we add the field equation X^q − X = 0 for a variable X to the polynomial system (11.1), we are guaranteed that the solutions for the variable X lie in F_q.
After introducing tools for polynomial system solving in Section 11.1, we give two concrete applications in Sections 11.2 and 11.3. In Section 11.2 we consider applications of Gröbner basis techniques to decoding linear codes, whereas Section 11.3 deals with methods of algebraic cryptanalysis of block ciphers. Due to space limitations many interesting topics related to these areas are not considered; we provide a short overview of them with references in the Notes section.
11.1 Polynomial system solving
11.1.1 Linearization techniques
We know how to solve systems of linear equations efficiently; Gaussian elimination is the standard tool for this job. If we are given a system of non-linear equations, a natural approach is to try to reduce the problem to a linear one, which we know how to solve. This simple idea leads to a technique called linearization. The technique works as follows: we replace every monomial occurring in a non-linear (polynomial) equation by a new variable. At the end we obtain a linear system with the same number of equations, but with many new variables. The hope is that by solving this linear system we are able to obtain a solution of the initial non-linear problem. It is best to illustrate this approach on a concrete example.
Example 11.1.1 Consider a quadratic system in two unknowns x and y over the field F_3:

x^2 − y^2 − x + y = 0
−x^2 + x − y + 1 = 0
y^2 + y + x = 0
x^2 + x + y = 0

Introduce the new variables a := x^2 and b := y^2. Therewith we obtain a linear system:

a − b − x + y = 0
−a + x − y + 1 = 0
b + y + x = 0
a + x + y = 0

This system has a unique solution, which may be found by Gaussian elimination: a = b = x = y = 1. Moreover, the solution for a and b is consistent with the conditions a = x^2, b = y^2. So although the system is quadratic, we are still able to solve it purely with methods of linear algebra.
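The following Python sketch (our addition) carries out this computation: the linearized system is solved over F_3 by Gaussian elimination. The rows of A encode the four linear equations above in the unknowns (a, b, x, y), with −1 ≡ 2 (mod 3), and the constant term of the second equation moved to the right-hand side.

def solve_mod_p(A, b, p):
    # Gaussian elimination over GF(p), p prime; returns {pivot column: value}
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    nrows, ncols = len(M), len(A[0])
    r, piv = 0, []
    for c in range(ncols):
        pr = next((i for i in range(r, nrows) if M[i][c] % p), None)
        if pr is None:
            continue
        M[r], M[pr] = M[pr], M[r]
        inv = pow(M[r][c], p - 2, p)            # field inverse, p prime
        M[r] = [v * inv % p for v in M[r]]
        for i in range(nrows):
            if i != r and M[i][c] % p:
                f = M[i][c]
                M[i] = [(vi - f * vr) % p for vi, vr in zip(M[i], M[r])]
        piv.append(c)
        r += 1
    return {c: M[i][-1] for i, c in enumerate(piv)}

# linearized system of Example 11.1.1 over F_3, unknowns (a, b, x, y)
A = [[1, 2, 2, 1],    #  a - b - x + y = 0
     [2, 0, 1, 2],    # -a + x - y     = -1
     [0, 1, 1, 1],    #  b + x + y     = 0
     [1, 0, 1, 1]]    #  a + x + y     = 0
print(solve_mod_p(A, [0, 2, 0, 0], 3))   # {0: 1, 1: 1, 2: 1, 3: 1}, i.e. a = b = x = y = 1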
It must be noted that the linearization technique succeeds only very seldom. Usually the number of variables (i.e. monomials) in the system is much larger than the number of equations. Therefore one has to solve an underdetermined linear system, which has many solutions, among which it is hard to find the "real" one that stems from the original non-linear system.
Example 11.1.2 Consider a system in three variables x, y, z over the field F_16:

xy + yz + xz = 0
xyz + x + 1 = 0
xy + y + z = 0

It may be shown that over F_16 this system has the unique solution (1, 0, 0). If we replace the monomials in this system by new variables, we end up with a linear system of 3 equations in 7 variables. This system has full rank, and the variables x, y, z occur among the free variables, so every choice of (x, y, z) ∈ F_16^3 extends to a solution of the linear system. Of all these solutions only one provides a legitimate solution of the initial system; the other solutions do not have any meaning. E.g. the assignment x = 1, y = 1, z = 1 implies that the "variable" xy should be 0, and of course this cannot be true, since both x and y are 1. So the linearization technique boils down to sieving the set F_16^3 for the right solution, but this is nothing more than an exhaustive search for the initial system.
So the problem with the linearization technique is that we do not have enough linearly independent equations for solving. This is where the idea of eXtended Linearization (XL) comes in handy. The idea of XL is to multiply the initial equations by all monomials up to a given degree (hopefully not too large) to generate new equations. Of course new variables will appear, since new monomials appear. Still, if the system is "nice" enough, we may generate the necessary number of linearly independent equations to obtain a solution. Namely, we hope that after "extending" our system with the new equations and doing Gaussian elimination, we will be able to find a univariate equation. Then we can solve it, plug in the obtained values, and proceed with a simplified system.
Example 11.1.3 Consider a small system in two unknowns x, y over the field F_4:

x^2 + y + 1 = 0
xy + y = 0

It is clear that the linearization technique does not work so well in this case, since the number of variables (3) is larger than the number of equations (2). Now multiply the two equations first by x and then by y. Therewith we obtain four new equations, which have the same solutions as the initial ones, so we may add them to the system. The new equations are:

x^3 + xy + x = 0
x^2 y + xy = 0
x^2 y + y^2 + y = 0
xy^2 + y^2 = 0

Here again the number of equations is lower than the number of variables. Still, by ordering the monomials in such a way that y^2 and y go leftmost in the matrix representation of the system and doing Gaussian elimination, we encounter a univariate equation y^2 = 0 (check this!). So we have a solution for y, namely y = 0. After substituting y = 0 into the first equation we obtain x^2 + 1 = 0, which is again a univariate equation. Over F_4 it has the unique solution x = 1. So by using linear algebra and univariate equation solving, we were able to obtain the solution (1, 0) of the system.
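Since F_4 has only four elements, the result is easy to double-check by brute force. The Python sketch below (our addition) represents F_4 = F_2[w]/(w^2 + w + 1) by 2-bit integers, with addition being XOR, and enumerates all 16 pairs (x, y):

def gf4_mul(a, b):
    # multiplication in F_4 = F_2[w]/(w^2 + w + 1); bit i is the coefficient of w^i
    r = 0
    if b & 1:
        r ^= a
    if b & 2:                    # add a*w
        t = (a << 1) & 0b11      # multiply by w, dropping the overflow bit
        if a & 2:
            t ^= 0b11            # reduce with w^2 = w + 1
        r ^= t
    return r

sols = [(x, y) for x in range(4) for y in range(4)
        if gf4_mul(x, x) ^ y ^ 1 == 0 and gf4_mul(x, y) ^ y == 0]
print(sols)                      # [(1, 0)], the solution found by XL above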
Algorithm 11.1 explains more formally how XL works. In our example it was enough to set D = 3. Usually one has to go much higher to get a result. In the next section we consider the technique of Gröbner bases, which is a more powerful tool; in some sense, it is a refined and improved version of XL.
Algorithm 11.1 XL(F, D)
Input:
- A system of polynomial equations F = {f_1 = 0, . . . , f_m = 0} of total degree d over the field F in the variables x_1, . . . , x_n
- Parameter D
Output: A solution of the system F, or the message "no solution found"
Begin
D_current := d;
Sol := ∅;
repeat
Extend: multiply each equation f_i ∈ F by all monomials of degree ≤ D_current − d; denote the system so obtained by Sys;
Linearize: assign to each monomial appearing in Sys a new variable, and order the new variables such that the variables x_i^a go leftmost in the matrix representation of the system, in blocks {x_i, x_i^2, . . . }; Sys := Gauss(Sys);
if there exists a univariate equation f(x_i) = 0 in Sys then
solve f(x_i) = 0 over F and obtain a_i with f(a_i) = 0;
Sol := Sol ∪ {(i, a_i)};
if |Sol| = n then
return Sol;
end if
Sys := Sys with a_i substituted for x_i;
else
D_current := D_current + 1;
end if
until D_current = D + 1
return "no solution found"
End
11.1.2 Gröbner bases
In the previous section we saw how one can solve systems of non-linear equations using linearization techniques. Speaking the language of algebraic geometry, we want to find the elements of the variety V(f_1, . . . , f_m), where V(f_1, . . . , f_m) := {a ∈ F^n : f_i(a) = 0 for all 1 ≤ i ≤ m} and F is a field. The central object of this section, the Gröbner basis technique, gives an opportunity to find this variety and also to solve many other important problems, such as ideal membership, i.e. deciding whether a given polynomial may be obtained as a polynomial combination of a given set of polynomials. As we will see, the algorithm for finding Gröbner bases generalizes Gaussian elimination for linear systems on one side and the Euclidean algorithm for finding the GCD of two univariate polynomials on the other. We will see how this algorithm, Buchberger's algorithm, works and how Gröbner bases can be applied to finding a variety (system solving) and to some other problems. First of all, we need some definitions.
Definition 11.1.4 Let R := F[x_1, . . . , x_n] be a polynomial ring in n variables over the field F. An ideal I in R is a subset of R with the following properties:
- 0 ∈ I;
- for every f, g ∈ I: f + g ∈ I;
- for every h ∈ R and every f ∈ I: h · f ∈ I.
So an ideal I is a subset of R closed under addition and under multiplication by elements of R. Let f_1, . . . , f_m ∈ R. It is easy to see that the set ⟨f_1, . . . , f_m⟩ := {a_1 f_1 + · · · + a_m f_m | a_i ∈ R for all i} is an ideal. We say that ⟨f_1, . . . , f_m⟩ is the ideal generated by the polynomials f_1, . . . , f_m. From commutative algebra it is known that every ideal I of R has a finite system of generators, i.e. I = ⟨f_1, . . . , f_m⟩ for some f_1, . . . , f_m ∈ I. A Gröbner basis, which we define below, is a system of generators with special properties.
A monomial in R is a product of the form x_1^{a_1} · · · x_n^{a_n} with a_1, . . . , a_n non-negative integers. Denote X = {x_1, . . . , x_n} and by Mon(X) the set of all monomials in R.
Definition 11.1.5 A monomial ordering on R is any relation > on Mon(X) such that
- > is a total ordering on Mon(X), i.e. any two elements of Mon(X) are comparable;
- > is multiplicative, i.e. X^α > X^β implies X^α · X^γ > X^β · X^γ for all vectors γ with non-negative integer entries, where X^α = x_1^{α_1} · · · x_n^{α_n};
- > is a well-ordering, i.e. every non-empty subset of Mon(X) has a minimal element.
Example 11.1.6 Here are some orderings that are frequently used in practice.
1. Lexicographic ordering induced by x_1 > · · · > x_n: X^α >_lp X^β if and only if there exists an s such that α_1 = β_1, . . . , α_{s−1} = β_{s−1}, α_s > β_s.
2. Degree reverse lexicographic ordering induced by x_1 > · · · > x_n: X^α >_dp X^β if and only if |α| := α_1 + · · · + α_n > β_1 + · · · + β_n =: |β|, or |α| = |β| and there exists an s such that α_n = β_n, . . . , α_{n−s+1} = β_{n−s+1}, α_{n−s} < β_{n−s}.
3. Block ordering or product ordering. Let X and Y be two ordered sets of variables, >_1 a monomial ordering on F[X] and >_2 a monomial ordering on F[Y]. The block ordering on F[X, Y] is the following: X^{α_1} Y^{β_1} > X^{α_2} Y^{β_2} if and only if X^{α_1} >_1 X^{α_2}, or X^{α_1} = X^{α_2} and Y^{β_1} >_2 Y^{β_2}.
Definition 11.1.7 Let > be a monomial ordering on R. Let f = Σ_α c_α X^α be a non-zero polynomial from R. Let α_0 be such that c_{α_0} ≠ 0 and X^{α_0} > X^α for all α ≠ α_0 with c_α ≠ 0. Then lc(f) := c_{α_0} is called the leading coefficient of f, lm(f) := X^{α_0} is called the leading monomial of f, and lt(f) := c_{α_0} X^{α_0} is called the leading term of f; moreover, tail(f) := f − lt(f).
Having these notions we are ready to define a Gröbner basis.
Definition 11.1.8 Let I be an ideal in R. The leading ideal of I with respect to > is defined as L_>(I) := ⟨lt(f) | f ∈ I, f ≠ 0⟩. We abbreviate L_>(I) by L(I) if it is clear which ordering is meant. A finite subset G = {g_1, . . . , g_m} of I is called a Gröbner basis for I with respect to > if L(I) = ⟨lt(g_1), . . . , lt(g_m)⟩. We say that a set F = {f_1, . . . , f_m} is a Gröbner basis if F is a Gröbner basis of the ideal ⟨F⟩.
Remark 11.1.9 Note that a Gröbner basis of an ideal is not unique. The so-called reduced Gröbner basis of an ideal is unique. By this one means a Gröbner basis G in which all elements have leading coefficient equal to 1 and no leading term of an element g ∈ G divides any of the terms of g', where g' ∈ G, g' ≠ g.
Historically, the first algorithm for computing Gröbner bases was proposed by Bruno Buchberger in 1965. In fact the very notion of a Gröbner basis was introduced by Buchberger in his Ph.D. thesis and was named after his Ph.D. advisor Wolfgang Gröbner. In order to be able to formulate the algorithm we need two more definitions.
Definition 11.1.10 Let f, g ∈ R \ {0} be two non-zero polynomials, and let lm(f) and lm(g) be the leading monomials of f and g respectively w.r.t. some monomial ordering. Denote m := lcm(lm(f), lm(g)). Then the s-polynomial of these two polynomials is defined as

spoly(f, g) = m/lm(f) · f − lc(f)/lc(g) · m/lm(g) · g.

Remark 11.1.11 1. If lm(f) = x_1^{a_1} · · · x_n^{a_n} and lm(g) = x_1^{b_1} · · · x_n^{b_n}, then m = x_1^{c_1} · · · x_n^{c_n}, where c_i = max(a_i, b_i) for all i. Therewith m/lm(f) and m/lm(g) are monomials.
2. Note that if we write f = lc(f) · lm(f) + f' and g = lc(g) · lm(g) + g', where lm(f') < lm(f) and lm(g') < lm(g), then

spoly(f, g) = m/lm(f) · (lc(f) · lm(f) + f') − lc(f)/lc(g) · m/lm(g) · (lc(g) · lm(g) + g')
= m · lc(f) + m/lm(f) · f' − m · lc(f) − lc(f)/lc(g) · m/lm(g) · g'
= m/lm(f) · f' − lc(f)/lc(g) · m/lm(g) · g'.

Therewith we have "canceled out" the leading terms of both f and g.
Example 11.1.12 In order to understand this notion better, let us see what the s-polynomials are in the case of linear and of univariate polynomials.
linear: Let R = Q[x, y, z] with the lexicographic ordering with x > y > z. Let f = 3x + 2y − 10z and g = x + 5y − 5z; then lm(f) = lm(g) = x and m = x, so spoly(f, g) = f − 3/1 · g = 3x + 2y − 10z − 3x − 15y + 15z = −13y + 5z, and this is exactly what one would do to cancel the variable x during Gaussian elimination.
univariate: Let R = Q[x]. Let f = 2x^5 − x^3 and g = x^2 − 10x + 1; then m = x^5 and spoly(f, g) = f − 2/1 · x^3 · g = 2x^5 − x^3 − 2x^5 + 20x^4 − 2x^3 = 20x^4 − 3x^3, and this is the first step of the polynomial division algorithm, which is used in the Euclidean algorithm for finding gcd(f, g).
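The s-polynomial is easy to experiment with in a general-purpose system. The following Python sketch (our addition, written with sympy and assuming its LM, LC and lcm functions) implements Definition 11.1.10 and reproduces the linear case above:

from sympy import symbols, lcm, expand, LM, LC

x, y, z = symbols('x y z')

def spoly(f, g, *gens, order='lex'):
    # s-polynomial of Definition 11.1.10
    lmf = LM(f, *gens, order=order)
    lmg = LM(g, *gens, order=order)
    m = lcm(lmf, lmg)
    return expand(m / lmf * f
                  - LC(f, *gens, order=order) / LC(g, *gens, order=order)
                  * m / lmg * g)

f = 3*x + 2*y - 10*z
g = x + 5*y - 5*z
print(spoly(f, g, x, y, z))      # -13*y + 5*z, as computed above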
To define the next notion we need for Buchberger's algorithm, we use the following result.
Theorem 11.1.13 Let f_1, . . . , f_m ∈ R \ {0} be non-zero polynomials in the ring R endowed with a monomial ordering >, and let f ∈ R be some polynomial. Then there exist polynomials a_1, . . . , a_m, h ∈ R with the following properties:
1. f = a_1 · f_1 + · · · + a_m · f_m + h;
2. lm(f) ≥ lm(a_i · f_i) for f ≠ 0 and every i such that a_i · f_i ≠ 0;
3. if h ≠ 0, then lm(h) is not divisible by any of the lm(f_i).
Moreover, if G = {f_1, . . . , f_m} is a Gröbner basis, then the polynomial h is unique.
Definition 11.1.14 Let F = {f_1, . . . , f_m} ⊂ R and f ∈ R. We define a normal form of f w.r.t. F to be any h as in Theorem 11.1.13. Notation: NF(f|F) := h.
Remark 11.1.15 1. If R = F[x] and f_1 := g ∈ R, then NF(f|⟨g⟩) is exactly the remainder of the division of the univariate polynomial f by the polynomial g. So the notion of a normal form generalizes the notion of a remainder to the case of a multivariate polynomial ring.
2. The normal form is uniquely defined only if f_1, . . . , f_m is a Gröbner basis.
3. The normal form has a very important property: f ∈ I ⟺ NF(f|G) = 0, where G is a Gröbner basis of I. So by computing a Gröbner basis of a given ideal I and then computing the normal form of a given polynomial f, we may decide whether f belongs to I or not.
The algorithm for computing a normal form proceeds as in Algorithm 11.2. There the function Exists_LT_Divisor(F, h) returns an index i such that lt(f_i) divides lt(h) if such an index exists, and 0 otherwise. Note that the algorithm may also be adapted so that it returns the polynomial combination of the f_i's that, together with h, satisfies conditions (1)-(3) of Theorem 11.1.13.
Algorithm 11.2 NF(f|F)
Input:
- Polynomial ring R with monomial ordering >
- Set of polynomials F = {f_1, . . . , f_m} ⊂ R
- Polynomial f ∈ R
Output: A polynomial h which satisfies (1)-(3) of Theorem 11.1.13 for the set F and the polynomial f with some a_i's from R
Begin
h := f;
i := Exists_LT_Divisor(F, h);
while h ≠ 0 and i ≠ 0 do
h := h − lt(h)/lt(f_i) · f_i;
i := Exists_LT_Divisor(F, h);
end while
return h
End
Example 11.1.16 Let R = Q[x, y] with the lexicographic ordering with x > y. Let f = x^2 − y^3 and F = {f_1, f_2} = {x^2 + x + y, x^3 + xy + y^3}. At the beginning of Algorithm 11.2 we have h = f. Now Exists_LT_Divisor(F, h) = 1, so we enter the while-loop. In the while-loop the following assignment is made: h := h − lt(h)/lt(f_1) · f_1 = x^2 − y^3 − x^2/x^2 · (x^2 + x + y) = −x − y^3 − y. We compute again Exists_LT_Divisor(F, h) = 0. So we do not enter the loop a second time, and h = −x − y^3 − y is the normal form of f we were looking for.
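Sympy's reduced performs exactly this kind of multivariate division with remainder (it reduces all terms, not only the leading one, but for this example the result coincides with Algorithm 11.2). A short sketch (our addition) reproducing Example 11.1.16:

from sympy import symbols, reduced

x, y = symbols('x y')
f = x**2 - y**3
F = [x**2 + x + y, x**3 + x*y + y**3]
quotients, h = reduced(f, F, x, y, order='lex')
print(h)            # -x - y**3 - y, as in Example 11.1.16
print(quotients)    # the cofactors a_i of Theorem 11.1.13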
Now we are ready to formulate Buchberger's algorithm for finding a Gröbner basis of an ideal: Algorithm 11.3. The main idea of the algorithm is: if after "canceling" the leading terms of the current pair (also called a critical pair) we cannot "divide" the result by the current set, then add the result to this set and add all new pairs to the set of critical pairs. The next example shows the algorithm in action.
Example 11.1.17 We build on Example 11.1.16. So R = Q[x, y] with the lexicographic ordering with x > y, and we have f_1 = x^2 + x + y and f_2 = x^3 + xy + y^3. The initialization phase yields G := {f_1, f_2} and Pairs := {(f_1, f_2)}. Now we enter the while-loop. We have to compute h = NF(spoly(f_1, f_2)|G). First, spoly(f_1, f_2) = x · f_1 − f_2 = x^3 + x^2 + xy − x^3 − xy − y^3 = x^2 − y^3. As we know from Example 11.1.16, NF(x^2 − y^3|G) = −x − y^3 − y, which is non-zero. Therefore we add f_3 := h to G and add the pairs (f_3, f_1) and (f_3, f_2) to Pairs. Recall that the pair (f_1, f_2) is no longer in Pairs, so now we have two elements there.
In the second run of the loop we take the pair (f_3, f_1) and remove it from Pairs. Now spoly(f_3, f_1) = −xy^3 − xy + x + y and NF(−xy^3 − xy + x + y|G) = y^6 + 2y^4 − y^3 + y^2 =: f_4. We update the sets G and Pairs. Now Pairs = {(f_3, f_2), (f_4, f_1), (f_4, f_2), (f_4, f_3)} and G = {f_1, f_2, f_3, f_4}. Next take the pair (f_3, f_2). For this pair spoly(f_3, f_2) = −x^2 y^3 − x^2 y + xy + y^3 and NF(spoly(f_3, f_2)|G) = 0. It may be shown that likewise all the other pairs from the set Pairs reduce to 0 w.r.t. G. Therefore the algorithm outputs G = {f_1, f_2, f_3, f_4} as a Gröbner basis of ⟨f_1, f_2⟩ w.r.t. the lexicographic ordering.
Algorithm 11.3 Buchberger(F)
Input:
- Polynomial ring R with monomial ordering >
- Normal form procedure NF
- Set of polynomials F = {f_1, . . . , f_m} ⊂ R
Output: A set of polynomials G ⊂ R such that G is a Gröbner basis of the ideal generated by the set F w.r.t. the monomial ordering >
Begin
G := {f_1, . . . , f_m};
Pairs := {(f_i, f_j) | 1 ≤ i < j ≤ m};
while Pairs ≠ ∅ do
Select a pair (f, g) ∈ Pairs;
Remove the pair (f, g) from Pairs;
h := NF(spoly(f, g)|G);
if h ≠ 0 then
for all p ∈ G do
Add the pair (h, p) to Pairs;
end for
Add h to G;
end if
end while
return G
End
Example 11.1.18 [CAS] Now we show how to do the above computations in Singular and Magma. In Singular one executes the following code:
> ring r=0,(x,y),lp;
> poly f1=x2+x+y;
> poly f2=x3+xy+y3;
> ideal I=f1,f2;
> ideal GBI=std(I);
> GBI;
GBI[1]=y6+2y4-y3+y2
GBI[2]=x+y3+y
One may request the computation of the reduced Gröbner basis by switching on the option option(redSB). In the above example GBI is already reduced. Now if we compute the normal form of f1-f2 w.r.t. GBI, it should be zero:
> NF(f1-f2,GBI);
0
It is also possible to track the computations for small examples using LIB "teachstd.lib";. One should add this line at the beginning of the above piece of code together with the line printlevel=1;, which makes program comments visible. Then one should use standard(I) instead of std(I) to see the run in detail. Similarly, NFMora(f1-f2,I) should be used instead of NF(f1-f2,I).
In Magma the following piece of code does the job:
> P<x,y> := PolynomialRing(Rationals(), 2, "lex");
> I:=[x^2+x+y,x^3+x*y+y^3];
> G:=GroebnerBasis(I);
> NormalForm(I[1]-I[2],G);
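The same computation can also be reproduced in Python with sympy (our addition; we assume sympy's groebner function and the reduce method of the resulting basis object):

from sympy import symbols, groebner

x, y = symbols('x y')
f1, f2 = x**2 + x + y, x**3 + x*y + y**3
G = groebner([f1, f2], x, y, order='lex')
print(list(G.exprs))         # the same two polynomials Singular printed above
print(G.reduce(f1 - f2)[1])  # remainder 0, so f1 - f2 lies in the ideal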
Now that we have introduced the techniques necessary to compute Gröbner bases, let us demonstrate one of their main applications, namely polynomial system solving. The following result shows how one can solve a polynomial system of equations, provided one can compute a Gröbner basis w.r.t. the lexicographic ordering.
Theorem 11.1.19 Let f_1(X) = · · · = f_m(X) = 0 be a system of polynomial equations defined over F[X] with X = (x_1, . . . , x_n), such that it has finitely many solutions (rigorously speaking, we require finitely many solutions in the algebraic closure of F; such systems, or ideals, are called zero-dimensional). Let I = ⟨f_1, . . . , f_m⟩ be the ideal defined by the polynomials in the system and let G be a Gröbner basis for I with respect to >_lp induced by x_1 > · · · > x_n. Then there are elements g_1, . . . , g_n ∈ G such that

g_n ∈ F[x_n], lt(g_n) = c_n x_n^{m_n},
g_{n−1} ∈ F[x_{n−1}, x_n], lt(g_{n−1}) = c_{n−1} x_{n−1}^{m_{n−1}},
. . .
g_1 ∈ F[x_1, . . . , x_n], lt(g_1) = c_1 x_1^{m_1},

for some positive integers m_i, i = 1, . . . , n, and elements c_i ∈ F \ {0}, i = 1, . . . , n.
It is now clear how to solve the system. After computing G, first solve the univariate equation g_n(x_n) = 0. Let a_1^{(n)}, . . . , a_{l_n}^{(n)} be its roots. For every a_i^{(n)} then solve g_{n−1}(x_{n−1}, a_i^{(n)}) = 0 to find the possible values of x_{n−1}. Repeat this process until all coordinates of all candidate solutions are found. The candidates form a finite set Can ⊆ F^n. Test all other elements of G on whether they vanish at the elements of Can. If there is some g ∈ G that does not vanish at some can ∈ Can, then discard can from Can. Since the number of solutions is finite, the above procedure terminates.
Example 11.1.20 Let us be more specific and give a concrete example of how Theorem 11.1.19 can be applied. Turn back to Example 11.1.17. Suppose we want to solve the system of equations x^2 + x + y = 0, x^3 + xy + y^3 = 0 over the rationals. We compute a Gröbner basis of the corresponding ideal and obtain that the elements f_4 = y^6 + 2y^4 − y^3 + y^2 and f_3 = −x − y^3 − y belong to the Gröbner basis. Since f_4 = 0 has finitely many solutions (at most 6 over the rationals) and for every fixed value of y the equation f_3 = 0 has exactly one solution for x, we actually know that our system has finitely many solutions, both over the rationals and over the algebraic closure. In order to find the solutions, we have to solve the univariate equation y^6 + 2y^4 − y^3 + y^2 = 0 for y. If we factorize, we obtain f_4 = y^2(y^4 + 2y^2 − y + 1), where y^4 + 2y^2 − y + 1 is irreducible over Q. So from the equation f_4 = 0 we only get y = 0 as a solution. Then for y = 0 the equation f_3 = 0 yields x = 0. Therefore, over the rationals the given system has the unique solution (0, 0).
Example 11.1.21 Let us consider another example. Consider the following
system over F_2:

xy + x + y + z = 0,
xz + yz + y = 0,
x + yz + z = 0,
x^2 + x = 0,
y^2 + y = 0,
z^2 + z = 0.

Note that the field equations x^2 + x = 0, y^2 + y = 0, z^2 + z = 0 make sure that any solution of the first three equations actually lies in F_2. Since F_2 is a finite field, we automatically get that the system above has finitely many solutions (in fact not more than 2^3 = 8). One can show that the reduced Gröbner basis (see Remark 11.1.9) of the corresponding ideal w.r.t. the lexicographic ordering with x > y > z is G = {z^2 + z, y + z, x}. From this we obtain that the system in question has two solutions: (0, 0, 0) and (0, 1, 1).
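This small computation can be checked in sympy as well, since its groebner function accepts a modulus argument for coefficients in a prime field (our addition, assuming this sympy feature):

from sympy import symbols, groebner

x, y, z = symbols('x y z')
eqs = [x*y + x + y + z, x*z + y*z + y, x + y*z + z,
       x**2 + x, y**2 + y, z**2 + z]     # system plus field equations
G = groebner(eqs, x, y, z, order='lex', modulus=2)
print(G.exprs)     # expected: [x, y + z, z**2 + z], the reduced basis above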
In Sections 11.2 and 11.3 we will see many more situations in which Gröbner bases are applied in the solving context. Gröbner basis techniques are also used for answering many other important questions. To end this section, we give one such application.
Example 11.1.22 Sometimes one needs to obtain explicit equations relating certain variables from given implicit ones. The following example is quite typical in the algebraic cryptanalysis of block ciphers. One of the main building blocks of modern block ciphers are the so-called S-Boxes, local non-linear transformations that in composition with other, often linear, mappings compose a secure block cipher. Suppose we have an S-Box S that transforms a 4-bit vector into a 4-bit vector in a non-linear way as follows. Consider a non-zero binary vector x as an element of F_{2^4} via the identification of F_2^4 with F_{2^4} = F_2[a]/⟨a^4 + a + 1⟩ done in the usual way, so that e.g. the vector (0, 1, 0, 0) is mapped to the primitive element a, and (0, 1, 0, 1) is mapped to a + a^3. Now if x is considered as an element of F_{2^4}, the S-Box S maps it to y = x^{−1} and then considers the result again as a vector via the above identification. The zero vector is mapped to the zero vector. Not going deeply into details, we just state that such a transformation can be represented over F_2 by a system of quadratic equations that implicitly relate the input variables x to the output variables y. The equations are

y_0 x_0 + y_3 x_1 + y_2 x_2 + y_1 x_3 + 1 = 0,
y_1 x_0 + y_0 x_1 + y_3 x_1 + y_2 x_2 + y_3 x_2 + y_1 x_3 + y_2 x_3 = 0,
y_2 x_0 + y_1 x_1 + y_0 x_2 + y_3 x_2 + y_2 x_3 + y_3 x_3 = 0,
y_3 x_0 + y_2 x_1 + y_1 x_2 + y_0 x_3 + y_3 x_3 = 0,

together with the field equations x_i^2 + x_i = 0 and y_i^2 + y_i = 0 for i = 0, . . . , 3. The equations do not describe the case where 0 is mapped to 0, so only the inversion is modeled.
In certain situations it is preferable to have explicit relations that show how the output variables y depend on the input variables x. For this the following technique is used. Consider the above equations as polynomials in the polynomial ring F_2[y_0, . . . , y_3, x_0, . . . , x_3] with y_0 > · · · > y_3 > x_0 > · · · > x_3 w.r.t. the block ordering with the blocks being the y- and the x-variables, and with the degree reverse lexicographic ordering inside each block (see Example 11.1.6). In this ordering, each monomial in the y-variables is larger than any monomial in the x-variables, regardless of their degrees. This ordering is a so-called elimination ordering for the y-variables. The reduced Gröbner basis of the ideal generated by the above equations consists of

x_i^2 + x_i,

the field equations on x;

(x_0 + 1) · (x_1 + 1) · (x_2 + 1) · (x_3 + 1),

which provides that x is not the all-zero vector; and

y_3 + x_1 x_2 x_3 + x_0 x_3 + x_1 x_3 + x_2 x_3 + x_1 + x_2 + x_3,
y_2 + x_0 x_2 x_3 + x_0 x_1 + x_0 x_2 + x_0 x_3 + x_2 + x_3,
y_1 + x_0 x_1 x_3 + x_0 x_1 + x_0 x_2 + x_1 x_2 + x_1 x_3 + x_3,
y_0 + x_0 x_1 x_2 + x_1 x_2 x_3 + x_0 x_2 + x_1 x_2 + x_0 + x_1 + x_2 + x_3,

which give explicit relations for y in terms of x. Interestingly enough, the field equations together with the latter explicit equations describe the entire S-Box transformation, so the case 0 ↦ 0 is also covered.
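The explicit equations can be verified directly. The Python sketch below (our addition) implements arithmetic in F_16 = F_2[a]/(a^4 + a + 1) on 4-bit integers, computes y = x^{-1} as x^14 (with 0 mapped to 0), and checks the four explicit polynomials on all 16 inputs:

def gf16_mul(a, b):
    # multiply in F_16 = F_2[a]/(a^4 + a + 1); bit i is the coefficient of a^i
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        overflow = a & 0b1000
        a = (a << 1) & 0b1111
        if overflow:
            a ^= 0b0011          # reduce with a^4 = a + 1
    return r

def gf16_inv(v):
    # v^14 = v^(-1) for v != 0 (since v^15 = 1); maps 0 to 0
    r = 1
    for _ in range(14):
        r = gf16_mul(r, v)
    return r

for v in range(16):
    x0, x1, x2, x3 = [(v >> i) & 1 for i in range(4)]
    y0 = x0*x1*x2 ^ x1*x2*x3 ^ x0*x2 ^ x1*x2 ^ x0 ^ x1 ^ x2 ^ x3
    y1 = x0*x1*x3 ^ x0*x1 ^ x0*x2 ^ x1*x2 ^ x1*x3 ^ x3
    y2 = x0*x2*x3 ^ x0*x1 ^ x0*x2 ^ x0*x3 ^ x2 ^ x3
    y3 = x1*x2*x3 ^ x0*x3 ^ x1*x3 ^ x2*x3 ^ x1 ^ x2 ^ x3
    assert y0 | (y1 << 1) | (y2 << 2) | (y3 << 3) == gf16_inv(v)
print('explicit equations agree with inversion on all 16 inputs')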
Using similar techniques one may obtain other interesting properties of ideals, which come in handy in various applications.
11.1.3 Exercises
11.1.1 Let R = Q[x, y, z] and let F = {f_1, f_2} with f_1 = x^2 + xy + z^2 and f_2 = y^2 + z, and let f = x^3 + 2y^3 − z^3. The monomial ordering is degree lexicographic. Compute NF(f|F). Use the procedure NFMora from Singular's teachstd.lib to check your result. (By setting the printing level appropriately, the procedures of teachstd.lib allow one to track their run; therewith one may see exactly what the corresponding algorithm is doing.)
11.1.2 Let R = F_2[x, y, z] and let F = {f_1, f_2} with f_1 = x^2 + xy + z^2 and f_2 = xy + z^2. The monomial ordering is lexicographic. Compute a Gröbner basis of ⟨F⟩. Use the procedure standard from Singular's teachstd.lib to check your result.
11.1.3 [CAS] Recall that in Example 11.1.20 we came to the conclusion that the only solution of the system over the rationals is (0, 0). Use Singular's library solve.lib, in particular the command solve, to also find the complex solutions of this system.
11.1.4 Upgrade Algorithm 11.2 so that it also returns the a_i's from Theorem 11.1.13.
11.1.5 Prove the so-called product criterion: if the polynomials f and g are such that lm(f) and lm(g) are coprime, then NF(spoly(f, g)|{f, g}) = 0.
11.1.6 Do the following sets constitute a Gröbner basis?
1. F_1 := {xy + 1, yz + x + y + 2} ⊂ Q[x, y, z], with the ordering being degree lexicographic.
2. F_2 := {x + 20, y + 10, z + 12, u + 1} ⊂ F_23[x, y, z, u], with the ordering being the block ordering with blocks (x, y) and (z, u) and the degree reverse lexicographic ordering inside the blocks.
11.2 Decoding codes with Gröbner bases
As a first application of the Gröbner basis method we consider decoding linear codes. For clarity of presentation we put the emphasis on cyclic codes. We consider Cooper's philosophy, or the power sums method, in Section 11.2.1 and the method of generalized Newton identities in Section 11.2.2. In Section 11.2.3 we provide a brief overview of methods for decoding general linear codes.
11.2.1 Cooper's philosophy
We now give an introduction to the so-called Cooper's philosophy, or the power sums method. This method uses the special form of a parity check matrix of a cyclic code. The main idea is to write these parity check equations with unknowns for the error positions and error values and then to solve for these unknowns, after adding some natural restrictions on them.
Let F = F_{q^m} be the splitting field of X^n − 1 over F_q. Let a be a primitive n-th root of unity in F. If i is in the defining set of a cyclic code C (Definition ??), then

(1, a^i, . . . , a^{(n−1)i}) c^T = c_0 + c_1 a^i + · · · + c_{n−1} a^{(n−1)i} = c(a^i) = 0

for every codeword c ∈ C. Hence (1, a^i, . . . , a^{(n−1)i}) is a parity check of C. Let {i_1, . . . , i_r} be a defining set of C. Then a parity check matrix H of C can be represented as a matrix with entries in F (see also Section 7.5.3):
$$H = \begin{pmatrix} 1 & a^{i_1} & a^{2i_1} & \dots & a^{(n-1)i_1}\\ 1 & a^{i_2} & a^{2i_2} & \dots & a^{(n-1)i_2}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 1 & a^{i_r} & a^{2i_r} & \dots & a^{(n-1)i_r} \end{pmatrix}.$$
Let c, r and e be the transmitted codeword, the received word and the error vector, respectively, so that r = c + e. Denote the corresponding polynomials by c(x), r(x) and e(x), respectively. If we apply the parity check matrix to r, we obtain

s^T := Hr^T = H(c^T + e^T) = Hc^T + He^T = He^T,

since Hc^T = 0; here s is the syndrome vector. Define s_i = r(a^i) for all i = 1, . . . , n. Then s_i = e(a^i) for all i in the complete defining set, and these s_i are called the known syndromes. The remaining s_i are called the unknown syndromes. The vector s above has entries s = (s_{i_1}, . . . , s_{i_r}). Let t be the number of errors that occurred while transmitting c over a noisy channel. If the error vector is of weight t, then it is of the form

e = (0, . . . , 0, e_{j_1}, 0, . . . , 0, e_{j_l}, 0, . . . , 0, e_{j_t}, 0, . . . , 0).
More precisely, there are t indices j_l with 1 ≤ j_1 < · · · < j_t ≤ n such that e_{j_l} ≠ 0 for all l = 1, . . . , t and e_j = 0 for all j not in {j_1, . . . , j_t}. We obtain

$$s_{i_u} = r(a^{i_u}) = e(a^{i_u}) = \sum_{l=1}^{t} e_{j_l}(a^{i_u})^{j_l}, \quad 1 \le u \le r. \qquad (11.2)$$

The a^{j_1}, . . . , a^{j_t}, but also the j_1, . . . , j_t, are called the error locations, and the e_{j_1}, . . . , e_{j_t} are the error values. Define z_l = a^{j_l} and y_l = e_{j_l}. Then z_1, . . . , z_t are the error locations and y_1, . . . , y_t are the error values, and the syndromes in (11.2) become generalized power sum functions

$$s_{i_u} = \sum_{l=1}^{t} y_l z_l^{i_u}, \quad 1 \le u \le r. \qquad (11.3)$$

In the binary case the error values are y_i = 1, and the syndromes are the ordinary power sums.
Now we give a description of Cooper's philosophy. As the receiver does not know how many errors occurred, the upper bound t is replaced by the error-correcting capacity e, and some z_l's are allowed to be zero, while assuming that the number of errors is at most e. The following variables are introduced: X_1, . . . , X_r, Z_1, . . . , Z_e and Y_1, . . . , Y_e, where X_u stands for the syndrome s_{i_u}, 1 ≤ u ≤ r; Z_l stands for the error location z_l for 1 ≤ l ≤ t, and for 0 for t < l ≤ e; and finally Y_l stands for the error value y_l for 1 ≤ l ≤ t, and for an arbitrary element of F_q \ {0} for t < l ≤ e. The syndrome equations (11.2) are rewritten in terms of these variables as power sums:

f_u := \sum_{l=1}^{e} Y_l Z_l^{i_u} − X_u = 0,  1 ≤ u ≤ r.

We also add some other equations in order to specify the range of values that can be achieved by our variables, namely:

ε_u := X_u^{q^m} − X_u = 0,  1 ≤ u ≤ r,

since s_{i_u} ∈ F;

η_l := Z_l^{n+1} − Z_l = 0,  1 ≤ l ≤ e,

since the a^{j_l} are either n-th roots of unity or zero; and

λ_l := Y_l^{q−1} − 1 = 0,  1 ≤ l ≤ e,

since y_l ∈ F_q \ {0}. We obtain the following set of polynomials in the variables X = (X_1, . . . , X_r), Z = (Z_1, . . . , Z_e) and Y = (Y_1, . . . , Y_e):

F_C = {f_u, ε_u, η_l, λ_l : 1 ≤ u ≤ r, 1 ≤ l ≤ e} ⊂ F_q[X, Z, Y].   (11.4)
The zero-dimensional ideal I_C generated by F_C is called the CRHT-syndrome ideal associated to the code C, and the variety V(F_C) defined by F_C is called the CRHT-syndrome variety, after Chen, Reed, Helleseth and Truong. We have V(F_C) = V(I_C).
Initially, decoding of cyclic codes was essentially reduced to finding the reduced Gröbner basis of the CRHT-ideal. Unfortunately, the CRHT-variety has many spurious elements, i.e. elements that do not correspond to error positions/values. It turns out that adding more polynomials to the CRHT-ideal gives an opportunity to eliminate these spurious elements. By adding the polynomials

χ_{l,m} := Z_l Z_m p(n, Z_l, Z_m) = 0,  1 ≤ l < m ≤ e,

to F_C, where

p(n, X, Y) = (X^n − Y^n)/(X − Y) = \sum_{i=0}^{n−1} X^i Y^{n−1−i},   (11.5)

we ensure that for all l and m either Z_l and Z_m are distinct or at least one of them is zero. The resulting set of polynomials is

F_C := {f_u, ε_u, η_i, λ_i, χ_{l,m} : 1 ≤ u ≤ r, 1 ≤ i ≤ e, 1 ≤ l < m ≤ e} ⊂ F_q[X, Z, Y].   (11.6)

The ideal generated by F_C is denoted by I_C. By investigating the structure of I_C and its reduced Gröbner basis with respect to the lexicographic order induced by X_1 < · · · < X_r < Z_e < · · · < Z_1 < Y_1 < · · · < Y_e, the following result may be proven.
Theorem 11.2.1 Every cyclic code C possesses a general error-locator polynomial L_C. This means that there exists a unique polynomial L_C ∈ F_q[X_1, . . . , X_r, Z] that satisfies the following two properties:
• L_C = Z^e + a_{e−1} Z^{e−1} + · · · + a_0 with a_j ∈ F_q[X_1, . . . , X_r], 0 ≤ j ≤ e − 1;
• given a syndrome s = (s_{i_1}, . . . , s_{i_r}) ∈ F^r corresponding to an error of weight t ≤ e and error locations {k_1, . . . , k_t}, if we evaluate X_u = s_{i_u} for all 1 ≤ u ≤ r, then the roots of L_C(s, Z) are exactly a^{k_1}, . . . , a^{k_t} and 0 with multiplicity e − t; in other words,

L_C(s, Z) = Z^{e−t} \prod_{i=1}^{t} (Z − a^{k_i}).

Moreover, L_C belongs to the reduced Gröbner basis of the ideal I_C as its unique element of degree e in the variable Z_e (with which Z is identified).
Having this polynomial, decoding of the cyclic code C reduces to univariate factorization. The main effort here is finding the reduced Gröbner basis of I_C. In general this is infeasible already for codes of moderate size. For small codes, though, it is possible to apply this technique successfully.
Example 11.2.2 As an example we consider finding the general error-locator polynomial for a binary cyclic BCH code C with parameters [15, 7, 5] that corrects 2 errors. This code has {1, 3} as a defining set. So here q = 2, m = 4, and n = 15. The field F_16 is the splitting field of X^15 − 1 over F_2. In the above description we have to write equations for all syndromes that correspond to elements in the complete defining set. Note that we may write the equations only for the
elements from the defining set {1, 3} as all the others are just consequences of
those. Following the description above we write generators FC of the ideal IC
in the ring F2[X1, X2, Z1, Z2]:
{ Z_1 + Z_2 − X_1,  Z_1^3 + Z_2^3 − X_2,
  X_1^16 − X_1,  X_2^16 − X_2,
  Z_1^16 − Z_1,  Z_2^16 − Z_2,
  Z_1 Z_2 p(15, Z_1, Z_2) }.
We suppress the equations λ_1 and λ_2, as the error values are over F_2. In order to find the general error-locator polynomial we compute the reduced Gröbner basis G of the ideal I_C with respect to the lexicographic order induced by X_1 < X_2 < Z_2 < Z_1. The elements of G are:
{ X_1^16 + X_1,
  X_2 X_1^15 + X_2,
  X_2^8 + X_2^4 X_1^12 + X_2^2 X_1^3 + X_2 X_1^6,
  Z_2 X_1^15 + Z_2,
  Z_2^2 + Z_2 X_1 + X_2 X_1^14 + X_1^2,
  Z_1 + Z_2 + X_1 }.
According to Theorem 11.2.1 the general error-locator polynomial L_C is the unique element of G of degree 2 with respect to Z_2. So L_C ∈ F_2[X_1, X_2, Z] is

L_C(X_1, X_2, Z) = Z^2 + Z X_1 + X_2 X_1^14 + X_1^2.
Let us see how decoding using L_C works. Let

r = (1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1)

be a received word with 2 errors. In the field F_16 with a primitive element a such that a^4 + a + 1 = 0, a is also a 15-th root of unity. Then the syndromes are s_1 = a^2, s_3 = a^14. Plugging them into L_C in place of X_1 and X_2 we obtain

L_C(Z) = Z^2 + a^2 Z + a^6.

Factorizing yields L_C = (Z + a)(Z + a^5). According to Theorem 11.2.1, the exponents 1 and 5 are exactly the error positions minus 1, so the errors occurred at positions 2 and 6.
Example 11.2.3 [CAS] All the computations in the previous example can be carried out using the library decodegb.lib of Singular. The following Singular code yields the CRHT-ideal and its reduced Gröbner basis.
> LIB "decodegb.lib";
> // binary cyclic [15,7,5] code with defining set (1,3)
> list defset=1,3; // defining set
> int n=15; // length
> int e=2; // error-correcting capacity
> int q=2; // base field size
> int m=4; // degree extension of the splitting field
> int sala=1; // indicator to add the additional equations as in (11.5)
> def A=sysCRHT(n,defset,e,q,m,sala);
> setring A; // set the polynomial ring for the system 'crht'
> option(redSB); // compute reduced Groebner bases
> ideal red_crht=std(crht);
Now, by inspecting the ideal red_crht we see which polynomial we should take as the general error-locator polynomial according to Theorem 11.2.1.
> poly gen_err_loc_poly=red_crht[5];
At this point we have to change to the splitting field in order to do our further computations.
> list l=ringlist(basering);
> l[1][4]=ideal(a4+a+1);
> def B=ring(l);
> setring B;
> poly gen_err_loc_poly=imap(A,gen_err_loc_poly);
We can now process our received vector and compute the syndromes:
> matrix rec[1][n]=1,1,0,1,0,0,0,0,0,0,1,1,1,0,1;
> matrix checkrow1[1][n];
> matrix checkrow3[1][n];
> int i;
> number work=a;
> for (i=0; i<=n-1; i++) {
>   checkrow1[1,i+1]=work^i;
> }
> work=a^3;
> for (i=0; i<=n-1; i++) {
>   checkrow3[1,i+1]=work^i;
> }
> // compute syndromes
> matrix s1mat=checkrow1*transpose(rec);
> matrix s3mat=checkrow3*transpose(rec);
> number s1=number(s1mat[1,1]);
> number s3=number(s3mat[1,1]);
One can now substitute and solve:
> poly specialized_gen=substitute(gen_err_loc_poly,X(1),s1,X(2),s3);
> factorize(specialized_gen);
[1]:
_[1]=1
_[2]=Z(2)+(a)
_[3]=Z(2)+(a^2+a)
[2]:
1,1,1
One can also check that a^5=a^2+a.
So we have seen that it is theoretically possible to encode all the information needed for decoding a cyclic code in one polynomial. Finding this polynomial, though, is quite a challenging task. Moreover, note that the polynomial coefficients a_j ∈ F_q[X_1, . . . , X_r] may be quite dense, so even storing the polynomial L_C may be a problem. The method, nevertheless, provides efficient closed formulas for small codes that are relevant in practice. This method can be adapted to correct erasures and to find the minimum distance of a code. More information on these issues is given in the Notes.
11.2.2 Newton identities based method
In Section 7.5.2 and Section 7.5.3 we have seen how Newton identities can be used for efficient decoding of cyclic codes up to half the BCH bound. Now we want to generalize this method to be able to decode up to half the minimum distance. In order to correct more errors we have to pay a price: the systems we have to solve are no longer linear, but quadratic. This is exactly where Gröbner basis techniques come into play.
Let us recall the necessary notions. Note that we change the notation a bit, as this will be convenient for the generalization. The error-locator polynomial is defined by

σ(Z) = \prod_{l=1}^{t} (Z − z_l).

If this product is expanded as

σ(Z) = Z^t + σ_1 Z^{t−1} + · · · + σ_{t−1} Z + σ_t,

then the coefficients σ_i are the elementary symmetric functions in the error locations z_1, . . . , z_t:

σ_i = (−1)^i \sum_{1 ≤ j_1 < j_2 < · · · < j_i ≤ t} z_{j_1} z_{j_2} · · · z_{j_i},  1 ≤ i ≤ t.

The syndromes s_i and the coefficients σ_i satisfy the following generalized Newton identities, see Proposition 7.5.8:

s_i + \sum_{j=1}^{t} σ_j s_{i−j} = 0,  for all i ∈ Z_n.   (11.7)
Now suppose that the complete defining set of the cyclic code contains the 2t consecutive elements b, . . . , b + 2t − 1 for some b. Then d ≥ 2t + 1 by the BCH bound. Furthermore, the set of equations (11.7) for i = b + t, . . . , b + 2t − 1 is a system of t linear equations in the unknowns σ_1, . . . , σ_t with the known syndromes s_b, . . . , s_{b+2t−1} as coefficients. Gaussian elimination solves this system with complexity O(t^3). In this way we obtain the APGZ decoding algorithm, see Section 7.5.3. See Example 7.5.11 for the algorithm in action on a small example.
One may go further and obtain closed formulas, or solve the decoding problem via the key equation, see Sections ?? and ??. All the above-mentioned algorithms from Chapter 7 decode up to the BCH error-correcting capacity, which is often strictly smaller than the true capacity. A general method was outlined by Berlekamp, Tzeng, Hartmann, Chien, and Stevens, where the unknown syndromes are treated as variables. We have

s_{i+n} = s_i,  for all i ∈ Z_n,

since s_{i+n} = r(a^{i+n}) = r(a^i). Furthermore,

s_i^q = (e(a^i))^q = e(a^{iq}) = s_{qi},  for all i ∈ Z_n,
and

σ_i^{q^m} = σ_i,  for all 1 ≤ i ≤ t.

So the zeros of the following set Newton_t of polynomials in the variables S_1, . . . , S_n and σ_1, . . . , σ_t are considered:

Newton_t :=
{ σ_i^{q^m} − σ_i,  for all 1 ≤ i ≤ t,
  S_{i+n} − S_i,  for all i ∈ Z_n,
  S_i^q − S_{qi},  for all i ∈ Z_n,
  S_i + \sum_{j=1}^{t} σ_j S_{i−j},  for all i ∈ Z_n }.   (11.8)
Solutions of Newton_t are called generic, formal, or one-step. Computing these solutions is considered a preprocessing phase, which has to be performed only once. In the actual decoder, for every received word r the variables S_i are specialized to the actual values s_i(r) for i ∈ S_C. Alternatively, one can solve Newton_t together with the polynomials S_i − s_i(r) for i ∈ S_C. This is called online decoding. Note that obtaining the general error-locator polynomial as in the previous subsection is an example of formal decoding: this polynomial has to be found only once.
Example 11.2.4 Let us consider an example of decoding using Newton identities, one where the APGZ algorithm is not applicable. We consider a 3-error-correcting cyclic code of length 31 with defining set {1, 5, 7}. Note that the BCH error-correcting capacity of this code is 2; we are aiming at correcting 3 errors. Let us write the corresponding ideal:
{ σ_1 S_{31} + σ_2 S_{30} + σ_3 S_{29} + S_1,
  σ_1 S_1 + σ_2 S_{31} + σ_3 S_{30} + S_2,
  σ_1 S_2 + σ_2 S_1 + σ_3 S_{31} + S_3,
  σ_1 S_{i−1} + σ_2 S_{i−2} + σ_3 S_{i−3} + S_i,  4 ≤ i ≤ 31,
  σ_i^{32} + σ_i,  i = 1, 2, 3,
  S_{i+31} + S_i,  for all i ∈ Z_{31},
  S_i^2 + S_{2i},  for all i ∈ Z_{31} }.

Note that the equations S_{i+31} = S_i and S_i^2 = S_{2i} imply
S_1^2 + S_2,  S_1^4 + S_4,  S_1^8 + S_8,  S_1^{16} + S_{16},
S_3^2 + S_6,  S_3^4 + S_{12},  S_3^8 + S_{24},  S_3^{16} + S_{17},
S_5^2 + S_{10},  S_5^4 + S_{20},  S_5^8 + S_9,  S_5^{16} + S_{18},
S_7^2 + S_{14},  S_7^4 + S_{28},  S_7^8 + S_{25},  S_7^{16} + S_{19},
S_{11}^2 + S_{22},  S_{11}^4 + S_{13},  S_{11}^8 + S_{26},  S_{11}^{16} + S_{21},
S_{15}^2 + S_{30},  S_{15}^4 + S_{29},  S_{15}^8 + S_{27},  S_{15}^{16} + S_{23},
S_{31}^2 + S_{31}.
Our intent is to write σ_1, σ_2, σ_3 in terms of the known syndromes S_1, S_5, S_7. The next step would be to compute the reduced Gröbner basis of this system with respect to an elimination order induced by S_{31} > · · · > S_8 > S_6 > S_4 > · · · > S_2 > σ_1 > σ_2 > σ_3 > S_7 > S_5 > S_1. Unfortunately this computation is quite time consuming and the result is too huge to illustrate the idea. Rather, we do online decoding, i.e. for a concrete received word r we compute the syndromes s_1, s_5, s_7, plug the values into the system, and then find the σ's. Let

r = (0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1)

be a received word with three errors. The known syndromes we need are s_1 = a^7, s_5 = a^{25} and s_7 = a^{29}. Substitute these values into the system above and compute the reduced Gröbner basis of the system. The reduced Gröbner basis with respect to the degree reverse lexicographic order (here it is possible to go without an elimination order, see Remark ??), restricted to the variables σ_1, σ_2, σ_3, is

{ σ_3 + a^5,
  σ_2 + a^3,
  σ_1 + a^7 }.
The corresponding values of the σ's give rise to the error-locator polynomial

σ(Z) = Z^3 + a^7 Z^2 + a^3 Z + a^5.

Factoring this polynomial yields three roots, a^4, a^{10}, a^{22}, which indicate the error positions 5, 11, and 23. Note also that we could have worked only with the equations for S_1, S_5, S_7, S_3, S_{11}, S_{15}, S_{31}, but the Gröbner basis computation is then harder.
Example 11.2.5 [CAS] The following program carries out the above computation using decodegb.lib from Singular.
> LIB "decodegb.lib";
> int n=31; // length
> list defset=1,5,7; // defining set
> int t=3; // number of errors
> int q=2; // base field size
> int m=5; // degree extension of the splitting field
> def A=sysNewton(n,defset,t,q,m);
> setring A;
> // change the ring to work in the splitting field
> list l=ringlist(basering);
> l[1][4]=ideal(a5+a2+1);
> def B=ring(l);
> setring B;
> ideal newton=imap(A,newton);
> matrix rec[1][n]=0,0,1,0,0,1,1,1,1,0,1,1,0,0,1,1,0,1,0,0,0,0,0,1,0,0,1,1,0,0,1;
> // compute the parity-check rows for the defining set (1,5,7)
> // similarly to the example with CRHT
...
> // compute the syndromes s1,s5,s7
> // analogously to the CRHT-example
...
> // substitute the known syndromes into the system
> int i;
> ideal specialize_newton;
> for (i=1; i<=size(newton); i++) {
>   specialize_newton[i]=substitute(newton[i],S(1),s1,S(5),s5,S(7),s7);
> }
> option(redSB);
> // find the sigmas
> ideal red_spec_newt=std(specialize_newton);
> // identify the values of sigma_1, sigma_2, and sigma_3
> // find the roots of the error-locator polynomial
> ring solve=(2,a),Z,lp; minpoly=a5+a2+1;
> poly error_loc=Z3+(a4+a2)*Z2+(a3)*Z+(a2+1); // the sigma's plugged in
> factorize(error_loc);
So, as we see, by using Gröbner bases it is possible to go beyond the BCH error-correcting capacity. The price paid is the complexity of solving quadratic, as opposed to linear, systems.
11.2.3 Decoding arbitrary linear codes
Now we will outline a couple of ideas that may be used for decoding arbitrary
linear codes up to the full error-correcting capacity.
Decoding affine variety codes with Fitzgerald-Lax
The following method generalizes the ideas of Cooper's philosophy to arbitrary linear codes. In this approach the main notion is that of an affine variety code. Let P_1, . . . , P_n be points in F_q^s. It is possible to compute a Gröbner basis of the ideal I ⊆ F_q[U_1, . . . , U_s] of polynomials that vanish exactly at these points. Define I_q := I + ⟨U_1^q − U_1, . . . , U_s^q − U_s⟩; then I_q is a zero-dimensional ideal and V(I_q) = {P_1, . . . , P_n}. An affine variety code C(I, L) = φ(L) is the image of the evaluation map

φ : R → F_q^n,  \bar{f} ↦ (f(P_1), . . . , f(P_n)),

where R := F_q[U_1, . . . , U_s]/I_q, L is a vector subspace of R, and \bar{f} is the coset of f in F_q[U_1, . . . , U_s] modulo I_q. It is possible to show that every q-ary linear [n, k] code, or equivalently its dual, can be represented as an affine variety code for a certain choice of parameters. See Exercise 11.2.2 for such a construction in the case of cyclic codes.
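To make this notion concrete, here is a minimal Singular sketch (our own illustration; the point set and the choice of L are ours): evaluating the subspace L = span{1, U_1, U_2} at the four points of F_2^2 realizes the [4, 3, 2] even weight code as an affine variety code.
> ring R=2,(u(1),u(2)),dp;
> // the points P1=(0,0), P2=(0,1), P3=(1,0), P4=(1,1) are all of F_2^2,
> // so their vanishing ideal I_q is generated by the field equations
> ideal Iq=u(1)^2-u(1),u(2)^2-u(2);
> poly f=u(1)+u(2)+1; // one element of L
> int i,j;
> for (i=0; i<=1; i++) {
>   for (j=0; j<=1; j++) {
>     subst(subst(f,u(1),i),u(2),j); // prints f(P) for P=(i,j)
>   }
> }
The four printed values 1, 0, 0, 1 form the codeword φ(f̄); evaluating 1, u(1) and u(2) in the same way gives the generators (1,1,1,1), (0,0,1,1) and (0,1,0,1) of the even weight code.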
In order to write a system of polynomial equations similar to the one in Section 11.2.1, one needs to generalize the CRHT approach to affine variety codes. Similarly to the CRHT method, the system of equations (or equivalently the ideal) is composed of a “parity-check” part and a “constraints” part. The parity-check part is constructed according to the evaluation map φ. Now, as can be seen from Exercise 11.2.2, the points P_1, . . . , P_n encode positions in a vector, similarly to how the a^i encode positions in the case of a cyclic code, a being a primitive n-th root of unity. Therefore, one needs to add the polynomials (g_l(X_{k1}, . . . , X_{ks}))_{l=1,...,m; k=1,...,t} for every error position. Adding other natural constraints, like field equations on the error values, and then computing a Gröbner basis of the combined ideal I_C w.r.t. a certain elimination ordering, it is possible to recover both the error positions (i.e. the values of the “error points”) and the error values. In general, finding I and L is quite technical, and it turns out that for random codes this method performs quite poorly because of the complicated structure of I_C. The method may be quite efficient, though, if the code has more structure, as in the case of geometric codes (e.g. Hermitian codes). We mention also that there are improvements of the approach of Fitzgerald and Lax which follow the same idea as the improvements for the CRHT-method. Namely, one adds polynomials that ensure that the error locations are different. It can be proven that affine variety codes possess a so-called multi-dimensional general error-locator polynomial, which is a generalization of the general error-locator polynomial from Theorem 11.2.1.
Decoding by embedding in an MDS code
Now we briefly outline a method that provides a system for decoding that is
composed of at most quadratic equations. The main feature of the method is
that we do not need field equations for the solution to lie in a correct domain.
Let C be an F_q-linear [n, k] code with error-correcting capacity e. Choose a parity check matrix H of C and let h_1, . . . , h_r be its rows. Let b_1, . . . , b_n be a basis of F_q^n and let B_s be the s × n matrix with b_1, . . . , b_s as rows; set B = B_n. We say that b_1, . . . , b_n is an ordered MDS basis and B an MDS matrix if all the s × s submatrices of B_s have rank s for all s = 1, . . . , n. Note that an MDS basis for F_q^n always exists if n ≤ q. By extending the initial field to a sufficiently large degree, we may assume that an MDS basis exists there. Since the parameters of a code do not change when going to a scalar extension, we may assume that our code C is defined over this sufficiently large F_q with q ≥ n. Each row h_i is then a linear combination of the basis b_1, . . . , b_n, that is, there are constants a_{ij} ∈ F_q such that h_i = \sum_{j=1}^{n} a_{ij} b_j. In other words, H = AB where A is the r × n matrix with entries a_{ij}. For every i and j, b_i ∗ b_j is a linear combination of the basis vectors b_1, . . . , b_n, so there are constants μ_l^{ij} ∈ F_q such that b_i ∗ b_j = \sum_{l=1}^{n} μ_l^{ij} b_l. The elements μ_l^{ij} ∈ F_q are called the structure constants of the basis b_1, . . . , b_n. Linear functions U_{ij} in the variables U_1, . . . , U_n are defined by U_{ij} = \sum_{l=1}^{n} μ_l^{ij} U_l.
Definition 11.2.6 For the received vector r, the ideal J(r) in the ring F_q[U_1, . . . , U_n] is generated by the elements

\sum_{l=1}^{n} a_{jl} U_l − s_j(r),  for j = 1, . . . , r,

where s(r) is the syndrome of r. The ideal I(t, U, V) in the ring F_q[U_1, . . . , U_n, V_1, . . . , V_t] is generated by the elements

\sum_{j=1}^{t} U_{ij} V_j − U_{i,t+1},  for i = 1, . . . , n.

Let J(t, r) be the ideal in F_q[U_1, . . . , U_n, V_1, . . . , V_t] generated by J(r) and I(t, U, V).
Now we are ready to state the main result of the method.
Theorem 11.2.7 Let B be an MDS matrix with structure constants μ_l^{ij} and linear functions U_{ij}. Let H be a parity check matrix of the code C such that H = AB as above. Let r = c + e be a received word with c ∈ C the codeword sent and e the error vector. Suppose that the weight of e is not zero and at most e. Let t be the smallest positive integer such that J(t, r) has a solution (u, v) over \bar{F}_q. Then wt(e) = t, and the solution is unique and satisfies u = Be. The error vector is recovered as e = B^{−1} u.
So, as we see, although we imposed no field equations on either the U-variables or the V-variables, we are still able to obtain the correct solution. For the case of cyclic codes it may be shown, by going to a suitable extension of F_q, that the system I(t, U, V) actually defines the generalized Newton identities. Therefore one corollary of the above theorem is that it is actually possible to work without the field equations in the method of Newton identities.
Decoding by normal form computations
Another method for arbitrary linear codes has a different approach to how
one represents code-related information. Below we outline the idea for binary
codes. Let [X] be the commutative monoid generated by X = {X_1, . . . , X_n}. The following mapping associates to a monomial its vector of reduced exponents:

ψ : [X] → F_2^n,  \prod_{i=1}^{n} X_i^{a_i} ↦ (a_1 mod 2, . . . , a_n mod 2).

Now, let w_1, . . . , w_k be the rows of a generator matrix G of the binary [n, k] code C with error-correcting capacity e. Consider the ideal I_C ⊆ K[X_1, . . . , X_n], where K is an arbitrary field:

I_C := ⟨X^{w_1} − 1, . . . , X^{w_k} − 1, X_1^2 − 1, . . . , X_n^2 − 1⟩.

So the ideal I_C encodes the information about the code C. The next theorem shows how one decodes using I_C.
Theorem 11.2.8 Let GB be the reduced Gröbner basis of I_C w.r.t. some degree compatible monomial ordering ≺. If wt(ψ(NF(X^a, GB))) ≤ e, then ψ(NF(X^a, GB)) is the error vector corresponding to the received word ψ(X^a), i.e. ψ(X^a) − ψ(NF(X^a, GB)) is the codeword of C closest to ψ(X^a).
Note that I_C is a binomial ideal, and therefore GB also consists of binomials. For binomial ideals the normal form of a monomial is again a monomial, so the computation in the theorem above is well defined. Using the special structure of I_C it is possible to improve on general Gröbner basis techniques when computing GB.
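To illustrate Theorem 11.2.8, here is a small Singular sketch of our own; we take the rows of the generator matrix of the binary [7, 4, 3] Hamming code as displayed in Example 12.5.2, so e = 1, and choose K = Q.
> ring R=0,(x(1..7)),dp;
> // X^w - 1 for the rows w = 1110000, 1001100, 0101010, 1101001 of G,
> // together with the binomials X_i^2 - 1
> ideal IC=x(1)*x(2)*x(3)-1,x(1)*x(4)*x(5)-1,x(2)*x(4)*x(6)-1,
>          x(1)*x(2)*x(4)*x(7)-1,
>          x(1)^2-1,x(2)^2-1,x(3)^2-1,x(4)^2-1,x(5)^2-1,x(6)^2-1,x(7)^2-1;
> option(redSB);
> ideal GB=std(IC);
> // received word (1,1,1,0,0,0,1): the codeword 1110000 with an error at position 7
> poly r=x(1)*x(2)*x(3)*x(7);
> reduce(r,GB); // normal form: we expect x(7), i.e. the error vector (0,0,0,0,0,0,1)
Subtracting ψ(x(7)) from the received word then recovers the codeword 1110000.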
It is remarkable that the code-related information as well as a solution to the
decoding problem is represented by exponents of monomials, whereas in all
the methods we considered before these data are encoded as values of certain
variables.
11.2.4 Exercises
11.2.1 [CAS] Consider a binary cyclic code of length 21 with defining set (1, 3, 7, 9). This code has parameters [21, 7, 8], see Example 7.4.8 and Example 7.4.17. The BCH bound is 5, so we cannot correct more than 2 errors with the methods from Chapter 7. Use the full error-correcting capacity and correct 3 errors in some random codeword using the methods from Section 11.2.1 and Section 11.2.2 and decodegb.lib from Singular. Note that finding the general error-locator polynomial is computationally very intensive; therefore use online decoding in the CRHT-method: plug in concrete values of the syndromes before computing a Gröbner basis.
11.2.2 Show how a cyclic code may be considered as an affine variety code
from Section 11.2.3.
11.2.3 Using the method of normal forms, decode one error in a random codeword of the Hamming code (Example 2.2.14). Try different coefficient fields, as well as different monomial orderings. Do you always get the same result?
11.3 Algebraic cryptanalysis
In the previous section we have seen how polynomial system solving (via Gröbner bases) is used for the problem of decoding linear codes. In this section we briefly highlight another interesting application of polynomial system solving: algebraic cryptanalysis of block ciphers. Block ciphers were introduced in Chapter 10 as one of the main tools for providing secure symmetric communication. There we also mentioned that there exist methods for cryptanalyzing block ciphers, i.e. distinguishing them from random permutations and using this for recovering the secret key used for encryption.
Traditional methods of cryptanalysis are statistical in nature. A cryptanalyst or attacker queries a cipher, seen as a black box set up with an unknown key, with (possibly chosen) plaintexts and receives the corresponding ciphertexts. By collecting many such pairs a cryptanalyst hopes to find statistical patterns that distinguish the cipher in question from a random permutation. Algebraic cryptanalysis takes another approach. Here a cryptanalyst writes down a system of polynomial equations over a finite field (usually F_2) which corresponds to the cipher in question, by modeling the operations performed by the cipher during the encryption process (and also the key schedule) as algebraic (polynomial) equations. Therewith the obtained system of equations reflects the encryption process; the plaintext and ciphertext are parameters of the system, and the key is the unknown, represented e.g. by bit variables. After plugging in an actual plaintext/ciphertext pair, the system should yield the unknown secret key as a solution. In theory, provided that the plaintext and key lengths coincide, an attacker needs only one plaintext/ciphertext pair to recover the key3. This feature distinguishes the algebraic approach from the statistical one, where an attacker usually needs many pairs to observe some statistical pattern.
3 He/she may need a few pairs in case the sizes of the plaintext and the key do not coincide.
We proceed as follows. In Section 11.3.1 we describe a toy cipher, which will
then be used to illustrate the idea outlined above. We will see how to write
equations for the toy cipher in Section 11.3.2. We will also see that it may be
possible to write equations in different ways, which can be important for actual
solving. In Section 11.3.3 we address the question of writing equations for an
arbitrary S-Box.
11.3.1 Toy example
As a toy block cipher we will take an iterative (Definition 10.1.9) block cipher
(Definition 10.1.3) with text/key length of 16 bits and a two-round encryption.
Our toy cipher is an SP-network (Definition 10.1.12). Namely in every round
we have a layer of local substitutions (S-Boxes) followed by a permutation layer.
Specifically, the encryption algorithm proceeds as in Algorithm 11.4.
In this Algorithm SBox inherits the main idea of the S-Box in the AES, see
Section 10.1.4. Namely, we divide the state vector w := (w0, . . . , w15) into four
blocks of 4 consecutive bits. Then each block of four bits is considered as an
3He/she may need a few pairs in case the size of a plaintext and key do not coincide
11.3. ALGEBRAIC CRYPTANALYSIS 375
element of the field F16
∼= F2[x]/ x4
+ x + 1 . The SBox then takes this number
and outputs an inverse in F16 for non-zero inputs, or 0 ∈ F16 otherwise. The
so obtained number is then interpreted again as a vector over F2 of length 4.
Now the permutation layer represented by Perm acts on the entire 16-bit state
vector. The bit at position i, 0 ≤ i ≤ 15 is moved to position Pos(i), where
Pos(i) =
4 · i mod 15, 0 ≤ i ≤ 14,
15, i = 15.
(11.9)
So Perm(w) = (wP os(1), . . . , wP os(15)). Interestingly enough, this permutation
provides optimal diffusion in a sense that full dependency is achieved already
after 2 rounds, see Exercise 11.3.1.
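As 4 · 4 = 16 ≡ 1 mod 15, applying Pos twice returns every bit to its place, so Pos (and hence Perm) is an involution; this fact is used in Section 11.3.2. A quick check in Singular (our own snippet):
> int i, ok;
> ok=1;
> for (i=0; i<=14; i++) {
>   if ((4*((4*i) mod 15)) mod 15 != i) { ok=0; }
> }
> ok; // prints 1, confirming Pos(Pos(i)) = i for 0 <= i <= 14; Pos(15) = 15 anyway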
Algorithm 11.4 Toy cipher encryption
Input: A 16-bit plaintext p and a 16-bit key k.
Output: A 16-bit ciphertext c.
Begin
Perform initial key addition: w := p ⊕ k = AddKey(p, k).
for i = 1, . . . , 2 do
Perform S-Box substitution: w := SBox(w).
Perform the permutation: w := Perm(w).
Add the key: w := AddKey(w, k) = w ⊕ k.
end for
The ciphertext is c := w.
return c
End
11.3.2 Writing down equations
Now let us turn to the question of how to write a system of equations that
describes the encryption algorithm as in Algorithm 11.4. We would like to
write equations on the bit level, i.e. over F2. Denote by p = (p0, . . . , p15) and
c = (c0, . . . , c15) the plaintext and ciphertext variables that appear as param-
eters in our system. Then k = (k0, . . . , k15) are unknown key variables. Let
xi = (xi,0, . . . , xi,15), i = 0, 1 be the variables representing result of bitwise key
addition, yi = (yi,0, . . . , yi,15), i = 1, 2 be variables representing outcome of the
S-Boxes, and zi = (zi,0, . . . , zi,15), i = 1, 2 be results of the permutation layer.
Now we can write the encryption process as the following system:
x0 = p + k,
yi = SBox(xi−1), i = 1, 2,
zi = Perm(yi), i = 1, 2,
x1 = z1 + k,
c = z2 + k. (11.10)
Here SBox and Perm are some polynomial functions that act on variable-
vectors according to Algorithm 11.4.
There are three operations performed in the algorithm: bitwise key addition, substitution via four 4-bit S-Boxes, and the permutation. The key addition is represented trivially as above, and one can write it on the bit level as, e.g. for the initial key addition, x_{0,j} = p_j + k_j, 0 ≤ j ≤ 15. The permutation Perm also does not pose any problem. According to (11.9) the blocks z_i = Perm(y_i), i = 1, 2, above are written as

z_{i,j} = y_{i,Pos^{−1}(j)},  0 ≤ j ≤ 15,

where Pos^{−1}(j) is easily computed; in fact, in this case we have Pos^{−1} = Pos.
An interesting question is how to write equations over F_2 that describe the S-Box transformation SBox. Since SBox is composed of four parallel S-Boxes that perform inversion in F_16, we may concentrate on writing equations for one S-Box. Let a = (a_0, a_1, a_2, a_3) be the input bits of the S-Box and b = (b_0, b_1, b_2, b_3) the output bits. The way we defined the S-Box, we should consider a ≠ 0 as an element of F_16 and then compute b = a^{−1} in F_16. Afterwards we regard b as a vector in F_2^4. The all-zero vector is mapped to the all-zero vector. The describing equation for inversion over F_16 in the case a ≠ 0 is obviously simply a · b = 1 or, incorporating the case a = 0, b = a^{14}. Let us concentrate on the case a ≠ 0. We would like to rewrite the equation a · b = 1 over F_16 as a system of equations over F_2 involving the bit variables a_i and b_j. In Example 11.1.22 we have seen what these equations are. But how can we obtain them? Using the identification F_16 ≅ F_2[x]/⟨x^4 + x + 1⟩ we identify the vectors (a_0, a_1, a_2, a_3) and (b_0, b_1, b_2, b_3) from F_2^4 with a = a_0 + a_1 x + a_2 x^2 + a_3 x^3 and b = b_0 + b_1 x + b_2 x^2 + b_3 x^3. Now, keeping the rule x^4 + x + 1 = 0 in mind, we have to perform the multiplication a · b and collect the coefficients of the powers of x. We have (considering that x^4 = x + 1, x^5 = x^2 + x, x^6 = x^3 + x^2):
a · b = (a_0 + a_1 x + a_2 x^2 + a_3 x^3) · (b_0 + b_1 x + b_2 x^2 + b_3 x^3)
= a_0 b_0 + (a_0 b_1 + a_1 b_0) x + (a_0 b_2 + a_2 b_0 + a_1 b_1) x^2 + (a_0 b_3 + a_3 b_0 + a_1 b_2 + a_2 b_1) x^3 + (a_1 b_3 + a_3 b_1 + a_2 b_2) x^4 + (a_2 b_3 + a_3 b_2) x^5 + a_3 b_3 x^6
= a_0 b_0 + (a_0 b_1 + a_1 b_0) x + (a_0 b_2 + a_2 b_0 + a_1 b_1) x^2 + (a_0 b_3 + a_3 b_0 + a_1 b_2 + a_2 b_1) x^3 + (a_1 b_3 + a_3 b_1 + a_2 b_2)(x + 1) + (a_2 b_3 + a_3 b_2)(x^2 + x) + a_3 b_3 (x^3 + x^2)
= (a_0 b_0 + a_1 b_3 + a_3 b_1 + a_2 b_2) + (a_0 b_1 + a_1 b_0 + a_1 b_3 + a_2 b_2 + a_2 b_3 + a_3 b_1 + a_3 b_2) x + (a_0 b_2 + a_1 b_1 + a_2 b_0 + a_2 b_3 + a_3 b_2 + a_3 b_3) x^2 + (a_0 b_3 + a_1 b_2 + a_2 b_1 + a_3 b_0 + a_3 b_3) x^3.
So the vector representation of the product a · b is (a_0 b_0 + a_1 b_3 + a_3 b_1 + a_2 b_2, a_0 b_1 + a_1 b_0 + a_1 b_3 + a_2 b_2 + a_2 b_3 + a_3 b_1 + a_3 b_2, a_0 b_2 + a_1 b_1 + a_2 b_0 + a_2 b_3 + a_3 b_2 + a_3 b_3, a_0 b_3 + a_1 b_2 + a_2 b_1 + a_3 b_0 + a_3 b_3). The vector representation of 1 ∈ F_16 is (1, 0, 0, 0). By comparing the corresponding vector entries we obtain the following system over F_2 that describes the S-Box:

a_0 b_0 + a_1 b_3 + a_3 b_1 + a_2 b_2 = 1,
a_0 b_1 + a_1 b_0 + a_1 b_3 + a_2 b_2 + a_2 b_3 + a_3 b_1 + a_3 b_2 = 0,
a_0 b_2 + a_1 b_1 + a_2 b_0 + a_2 b_3 + a_3 b_2 + a_3 b_3 = 0,
a_0 b_3 + a_1 b_2 + a_2 b_1 + a_3 b_0 + a_3 b_3 = 0.
In order to fully describe the S-Box we must recall that our bit variables a_i and b_j live in F_2. Therefore the field equations a_i^2 + a_i = 0 and b_i^2 + b_i = 0 for 0 ≤ i ≤ 3 have to be added. So now we have obtained exactly the implicit equations as in Example 11.1.22.
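The coefficient comparison above can also be left to the computer. In the following Singular sketch (our own) we multiply a and b symbolically and reduce modulo x^4 + x + 1; the coefficients of 1, x, x^2, x^3 of the result are precisely the left-hand sides of the four equations above.
> ring R=2,(a(0..3),b(0..3),x),dp;
> poly A=a(0)+a(1)*x+a(2)*x^2+a(3)*x^3;
> poly B=b(0)+b(1)*x+b(2)*x^2+b(3)*x^3;
> // normal form of the product modulo the field polynomial x^4+x+1
> poly P=reduce(A*B,std(ideal(x^4+x+1)));
> // the rows of C are the coefficients of x^0,...,x^3 of P;
> // comparing them with (1,0,0,0) yields the implicit S-Box equations
> matrix C=coeffs(P,x);
> print(C);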
By adding field equations for all participating variables to the equations introduced above, we obtain a full description of the toy cipher under the assumption that no zero-inversion occurs in the S-Boxes; the probability of this event is computed in Exercise 11.3.2. Having a pair (p, c) encrypted with an unknown key k, we may plug the values of p and c into the system (11.10) and try to solve for the unknowns, in particular for the key variables k. Work out Exercise 11.3.3 to see the details.
Going back to Example 11.1.22, we recall that it is possible to obtain explicit relations between the inputs and outputs. Note also that these relations then include the case 0 ↦ 0 as well, if we remove the equation (a_0 + 1)(a_1 + 1)(a_2 + 1)(a_3 + 1) = 0. These explicit equations are:

b_0 = a_0 a_1 a_2 + a_1 a_2 a_3 + a_0 a_2 + a_1 a_2 + a_0 + a_1 + a_2 + a_3,
b_1 = a_0 a_1 a_3 + a_0 a_1 + a_0 a_2 + a_1 a_2 + a_1 a_3 + a_3,
b_2 = a_0 a_2 a_3 + a_0 a_1 + a_0 a_2 + a_0 a_3 + a_2 + a_3,
b_3 = a_1 a_2 a_3 + a_0 a_3 + a_1 a_3 + a_2 a_3 + a_1 + a_2 + a_3.
These equations may be useful in the following approach. By having explicit equations of degree three that describe the S-Boxes, one may obtain equations of degree 3 · 3 = 9 in the key variables only. Indeed, one should do consecutive substitutions from equation to equation in the system (11.10). One proceeds by substituting the corresponding bit variables from x_0 = p + k into y_1 = SBox(x_0), therewith obtaining relations of the form y_1 = f(p, k) of degree three in k (p is assumed to be known, as usual). Then substitute y_1 = f(p, k) into z_1 = Perm(y_1), and these in turn into x_1 = z_1 + k. One obtains relations of the form x_1 = g(p, k), again of degree three in k. Now the next substitution, of x_1 = g(p, k) into y_2 = SBox(x_1), increases the degree: because g is of degree three and SBox is of degree three, we obtain equations y_2 = h(p, k) of degree 3 · 3 = 9. The subsequent substitutions do not increase the degree, since all the remaining equations are linear.
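For illustration, here is the first substitution step for a single 4-bit block in Singular (our own sketch; we fix the plaintext block p = (1, 0, 1, 1) for concreteness and keep the key bits symbolic):
> ring R=2,(k(0..3),a(0..3)),dp;
> poly b0=a(0)*a(1)*a(2)+a(1)*a(2)*a(3)+a(0)*a(2)+a(1)*a(2)+a(0)+a(1)+a(2)+a(3);
> // initial key addition: substitute a_i = p_i + k_i
> poly y0=subst(subst(subst(subst(b0,a(0),1+k(0)),a(1),k(1)),a(2),1+k(2)),a(3),1+k(3));
> deg(y0); // 3: the relation y_1 = f(p,k) has degree three in the key bits
Substituting such degree-three expressions into the degree-three S-Box equations of the second round then produces the degree 3 · 3 = 9 equations mentioned above.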
The reason for wanting to obtain such equations in the key variables only is the possibility to use more than one plain-/ciphertext pair encrypted with the same unknown key. By doing the above process for each such pair, we obtain each time 16 equations of degree 9 in the key variables k (the key stays the same). Note that if we used the implicit representation we could not eliminate the “intermediate” variables, such as x_0, y_1, z_1, etc. Moreover, these intermediate variables depend on the parameters p (and c), so these variables are all different for different plaintext/ciphertext pairs. The idea of the latter approach is to keep the number of variables as small as possible, but to increase the number of equations that relate them. In the theory and practice of solving polynomial systems it has been observed that solving more overdetermined systems (i.e. with more equations than variables) has a positive effect on the complexity and thus on the success of solving the system in question.
Still, degree-9 equations are too hard to attack. We would like to reduce the degree of our equations. Below we outline a general principle, known as the “meet-in-the-middle” principle, to reduce the degree. As the name suggests, we would like to obtain relations between variables in the middle, rather than at the end of the encryption. For this we need to invert the second half of the cipher in question; in our case this means inverting the second round. We have already noted that Perm = Perm^{−1}. Also, since the S-Box transformation is an inversion in F_16 with 0 ↦ 0, we have SBox = SBox^{−1}. Now, similarly to the above substitution procedure, we do “forward” substitutions

x_0 = p + k → y_1 = SBox(x_0) → z_1 = Perm(y_1),

obtaining in the end equations z_1 = F(p, k) of degree 3, and then “backward” substitutions

z_2 = c + k → y_2 = Perm(z_2) → x_1 = SBox(y_2) → z_1 = x_1 + k,

obtaining equations z_1 = G(c, k), also of degree 3. Equating the two, one obtains a system of 16 equations F(p, k) = G(c, k) of degree 3 in the key variables k only. Repeating this process for each plain-/ciphertext pair, one may obtain as many equations (each time a multiple of 16) as one wants. One should not forget, of course, to include the field equations each time to make sure that the values of the variables stay in F_2. Exercise 11.3.4 elaborates on solving with this approach.
11.3.3 General S-Boxes
In the previous section we have seen how to write equations for an S-Box given by the inversion function in the field F_16. Although this idea was employed in the AES, a widely used cipher (cf. Section 10.1.4), this is not the standard way to define S-Boxes in block ciphers. Usually S-Boxes are defined via so-called look-up tables, i.e. tables which explicitly prescribe an output value for a given input value. Whereas in Section 11.3.1 we used the algebraic structure of the toy cipher to derive the equations, it is not yet clear from that exposition how to write S-Box equations in the more general case of a look-up table definition.
As an illustrative example we will use a 3-bit S-Box. This S-Box is even smaller than the one employed in our toy cipher. Still, it has been proposed in one of the so-called lightweight block ciphers, PrintCIPHER. The look-up table for this S-Box, call it S, is as follows:
x 0 1 2 3 4 5 6 7
S(x) 0 1 3 6 7 4 5 2
Here we used decimal representation for length-3 binary vectors. For example,
the S-Box maps the vector 2 = (0, 1, 0) to the vector 3 = (1, 1, 0).
One method to obtain explicit relations for the output values is as follows. The S-Box S is a function S : F_2^3 → F_2^3, which can be seen as a collection of functions S_i : F_2^3 → F_2, i = 0, 1, 2, mapping the input vectors to the bits at positions 0, 1, and 2, respectively. It is known that every function defined over a finite field is a polynomial function.
Let us find a polynomial describing the function S_0. The look-up table in this case is as follows:
x 0 1 2 3 4 5 6 7
S0(x) 0 1 1 0 1 0 1 0
Denote by x_0, x_1, x_2 the input bits. We have

S_0(x_0, x_1, x_2) = S_0(0, 0, 0) · (x_0 − 1)(x_1 − 1)(x_2 − 1) + S_0(1, 0, 0) · x_0 (x_1 − 1)(x_2 − 1) + S_0(0, 1, 0) · (x_0 − 1) x_1 (x_2 − 1) + S_0(1, 1, 0) · x_0 x_1 (x_2 − 1) + S_0(0, 0, 1) · (x_0 − 1)(x_1 − 1) x_2 + S_0(1, 0, 1) · x_0 (x_1 − 1) x_2 + S_0(0, 1, 1) · (x_0 − 1) x_1 x_2 + S_0(1, 1, 1) · x_0 x_1 x_2.

Indeed, by assigning concrete values (v_0, v_1, v_2) to (x_0, x_1, x_2) we obtain S_0(v_0, v_1, v_2) = S_0(v_0, v_1, v_2) · 1 · 1 · 1, and in every other summand at least one factor evaluates to zero, canceling that summand. Substituting the concrete values for S_0 from the look-up table, we obtain

S_0(x_0, x_1, x_2) = x_0 (x_1 − 1)(x_2 − 1) + (x_0 − 1) x_1 (x_2 − 1) + (x_0 − 1)(x_1 − 1) x_2 + (x_0 − 1) x_1 x_2 = x_1 x_2 + x_0 + x_1 + x_2.
Analogously we obtain polynomial expressions for S_1 and S_2:

S_1(x_0, x_1, x_2) = x_0 x_2 + x_1 + x_2,
S_2(x_0, x_1, x_2) = x_0 x_1 + x_2.
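This interpolation is easy to automate. The following Singular sketch (our own) recovers S_0 directly from its truth table; over F_2 the product (x_0 + v_0 + 1)(x_1 + v_1 + 1)(x_2 + v_2 + 1) is exactly the indicator of the point (v_0, v_1, v_2).
> ring R=2,(x(0..2)),dp;
> intvec T=0,1,1,0,1,0,1,0; // truth table of S0 for x = 0,...,7 (x0 is the low bit)
> poly S0=0;
> int t,v0,v1,v2;
> for (t=0; t<=7; t++) {
>   v0=t mod 2; v1=(t div 2) mod 2; v2=(t div 4) mod 2;
>   S0=S0+T[t+1]*(x(0)+v0+1)*(x(1)+v1+1)*(x(2)+v2+1);
> }
> S0; // prints x(1)*x(2)+x(0)+x(1)+x(2)
Running the same loop with the truth tables of S_1 and S_2 reproduces the other two expressions.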
Another technique, based on linear algebra, gives an opportunity to obtain different relations between the input and output variables. We are interested in relations of as low a degree as possible, and usually these are quadratic relations. Let us demonstrate how to obtain bilinear relations for the S-Box S. Denote y_i = S_i, i = 0, 1, 2. So we are interested in finding relations of the form \sum_{0≤i,j≤2} a_{ij} x_i y_j = 0. In order to do this, we treat the coefficients a_{ij} as variables. Each assignment of values to (x_0, x_1, x_2) yields a unique assignment of values to (y_0, y_1, y_2) according to the look-up table. Each assignment of (x_0, x_1, x_2), and thus of (y_0, y_1, y_2), provides us with a linear equation in the a_{ij} by plugging the assigned values into the relation \sum_{0≤i,j≤2} a_{ij} x_i y_j = 0, which should hold for every assignment. We may use the 2^3 = 8 assignments for the x-variables to get 8 linear equations in the 3 · 3 = 9 variables a_{ij}. Each non-trivial solution of this homogeneous linear system provides us with a non-trivial bilinear relation between the x- and y-variables. Exercise 11.3.5 works out the details of this approach for the example of S. We just mention that, e.g., x_0 y_2 + x_1 y_0 + x_1 y_1 + x_2 y_1 + x_2 y_2 = 0 is one such bilinear relation; overall there exist 2 linearly independent bilinear relations of this form. Using exactly the same idea one may find other relations, e.g. general quadratic ones: \sum_{0≤i,j≤2} a_{ij} x_i y_j + \sum_{0≤i<j≤2} b_{ij} x_i x_j + \sum_{0≤i<j≤2} c_{ij} y_i y_j + \sum_{0≤i≤2} d_i x_i + \sum_{0≤i≤2} e_i y_i = 0, and others that may be of interest.
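The linear algebra step can be sketched in Singular as follows (our own snippet; see Exercise 11.3.5 for the details). We build the 8 × 9 matrix M over F_2 whose row for input u contains the values x_i y_j, and compute its kernel; the kernel vectors are the coefficient vectors (a_ij) of the bilinear relations.
> ring R=2,t,dp; // dummy ring over F_2; we only need constant matrices
> intvec S=0,1,3,6,7,4,5,2; // look-up table of the S-Box S
> matrix M[8][9];
> int u,i,j,s;
> for (u=0; u<=7; u++) {
>   s=S[u+1];
>   for (i=0; i<=2; i++) {
>     for (j=0; j<=2; j++) {
>       // x_i = bit i of u, y_j = bit j of S(u); the column index encodes (i,j)
>       M[u+1,3*i+j+1]=((u div 2^i) mod 2)*((s div 2^j) mod 2);
>     }
>   }
> }
> print(syz(module(M))); // a basis of the kernel of M
The kernel turns out to be two-dimensional, and the relation displayed above lies in its span.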
Clearly, the techniques of this section apply also to other S-Boxes defined by look-up tables. See Exercise 11.3.6 for the treatment of the S-Box from the block cipher PRESENT.
11.3.4 Exercises
11.3.1 Prove that in the toy cipher of Section 11.3.1 every ciphertext bit de-
pends on every plaintext bit.
11.3.2 Considering that the inputs to the S-Boxes of the toy cipher are all uniformly distributed and independent random values, what is the probability that no zero-inversion occurs during the encryption?
11.3.3 [CAS] Using Magma and/or Singular and/or SAGE/PolyBoRi, write an equation system representing the toy cipher from Section 11.3.1. When defining a base ring for your Gröbner basis computations, think about and experiment with the following questions:
• which ordering of variables works better?
• which monomial ordering is better? Try e.g. lexicographic and degree reverse lexicographic;
• does the result of the computation change when changing the ordering? Why?
• what happens if you remove the field equations?
• try explicit vs. implicit representations for the S-Box.
11.3.4 Work out the meet-in-the-middle approach of Section 11.3.2. For the
substitution use the command subst in Singular.
11.3.5 Find bilinear relations for the S-Box S using the linear algebra approach from Section 11.3.3. Compose the matrix of the homogeneous system as described in the text: the rows are indexed by the assignments of (x_0, x_1, x_2) and the columns by the index pairs (i, j) of the variables a_{ij} that are the coefficients of x_i y_j. Show that the rank of this matrix is 7, so that you get 2 linearly independent solutions. Write down 2 linearly independent bilinear relations for S.
11.3.6 An S-Box in the block cipher PRESENT is a non-linear transformation
of 4-bit vectors. Its look-up table is as follows:
x 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
SBox(x) 12 5 6 11 9 0 10 13 3 14 15 8 4 7 1 2
• Write down equations that relate the input bits explicitly to the output bits. What is the degree of these equations?
• Find all linearly independent bilinear relations and general quadratic relations between the inputs and outputs.
11.4 Notes
Chapter 12
Coding theory with
computer algebra packages
Stanislav Bulygin
In this chapter we give a brief overview of three computer algebra systems, Singular, Magma, and GAP, as well as of the Sage framework that builds on such systems. We concentrate our attention on things that are useful for this book. For other topics, as well as for language semantics and syntax, the reader is referred to the corresponding web-sites.
12.1 Singular
As is cited at www.singular.uni-kl.de: “SINGULAR is a Computer Algebra System for polynomial computations with special emphasis on the needs of commutative algebra, algebraic geometry, and singularity theory”. In the context of this book, we use some functionality provided for AG-codes (brnoeth.lib), decoding linear codes via polynomial system solving (decodegb.lib), teaching cryptography (crypto.lib, atkins.lib) and Gröbner bases (teachstd.lib). Singular can be downloaded free of charge from http://www.singular.uni-kl.de/download.html for different platforms (Linux, Windows, Mac OS). The current version is 3-0-4. The web-site provides an online manual at http://www.singular.uni-kl.de/Manual/latest/index.htm. Below we provide a list of commands that can be used to work with objects presented in this book, together with short descriptions. Examples of use can be found via the links given below; more examples occur throughout the book at the corresponding places. The functionality mentioned above is provided via libraries, not kernel functions; to load a library in Singular one has to type (brnoeth.lib as an example):
> LIB "brnoeth.lib";
brnoeth.lib: Brill-Noether Algorithm, Weierstrass-SG and AG-codes by J. I. Farran Martin and C. Lossen (http://www.singular.uni-kl.de/Manual/latest/sing_1238.htm#SEC1297)
Description: Implementation of the Brill-Noether algorithm for solving the Riemann-Roch problem and applications in Algebraic Geometry codes. The
computation of Weierstrass semigroups is also implemented. The procedures
are intended only for plane (singular) curves defined over a prime field of pos-
itive characteristic. For more information about the library see the end of the
file brnoeth.lib.
Selected procedures:
- NSplaces: computes non-singular places with given degrees
- BrillNoether: computes a vector space basis of the linear system L(D)
- Weierstrass: computes the Weierstrass semigroup of C at P up to m
- AGcode_L: computes the evaluation AG code with divisors G and D
- AGcode_Omega: computes the residual AG code with divisors G and D
- decodeSV: decoding of a word with the basic decoding algorithm
- dual_code: computes the dual code
decodegb.lib: Decoding and minimum distance of linear codes with Gröbner bases by S. Bulygin (...)
Description: In this library we generate several systems used for decoding cyclic codes and finding their minimum distance. Namely, we work with Cooper's philosophy and generalized Newton identities. The original method of quadratic equations is worked out here as well. We also (for comparison) enable working with the system of Fitzgerald-Lax. We provide some auxiliary functions for further manipulations and decoding. For an overview of the methods mentioned above, see the “Decoding codes with GB” section of the manual. For the vanishing ideal computation the algorithm of Farr and Gao is implemented.
Selected procedures:
- sysCRHT: generates the CRHT-ideal as in Cooper’s philosophy
- sysNewton: generates the ideal with the generalized Newton identities
- syndrome: computes a syndrome w.r.t. the given check matrix
- sysQE: generates the system of quadratic equations for decoding
- errorRand: inserts random errors in a word
- randomCheck: generates a random check matrix
- mindist: computes the minimum distance of a code
- decode: decoding of a word using the system of quadratic equations
- decodeRandom: a procedure for manipulation with random codes
- decodeCode: a procedure for manipulation with the given code
- vanishId: computes the vanishing ideal for the given set of points
crypto.lib: Procedures for teaching cryptography by G. Pfister (http://www.singular.uni-kl.de/Manual/latest/...)
Description: The library contains procedures to compute the discrete logarithm, primality tests, and factorization, including elliptic curve methods. The library is intended to be used for teaching purposes but not for serious computations. A sufficiently high printlevel allows one to control each step, thus illustrating the algorithms at work.
atkins.lib: Procedures for teaching Elliptic Curve cryptography (primality test) by S. Steidel (http://www.singular.uni-kl.de/Manual/latest/sing_1281.htm#SEC1340)
Description: The library contains auxiliary procedures to compute the elliptic curve primality test of Atkin, as well as Atkin's test itself. The library is intended to be used for teaching purposes but not for serious computations. A sufficiently high printlevel allows one to control each step, thus illustrating the algorithms at work.

teachstd.lib: Procedures for teaching standard bases by G.-M. Greuel (http://www.singular.uni-kl.de/Manual/latest/sing_1344.htm#SEC1403)
Description: The library is intended to be used for teaching purposes, but not for serious computations. A sufficiently high printlevel allows one to control each step, thus illustrating the algorithms at work. The procedures are implemented exactly as described in the book 'A SINGULAR Introduction to Commutative Algebra' by G.-M. Greuel and G. Pfister (Springer 2002).
Selected procedures:
- tail: tail of f
- leadmonomial: leading monomial as poly (also for vectors)
- monomialLcm: lcm of monomials m and n as poly (also for vectors)
- spoly: s-polynomial of f [symmetric form]
- NFMora: normal form of i w.r.t. the Mora algorithm
- prodcrit: test for product criterion
- chaincrit: test for chain criterion
- standard: standard basis of ideal/module
12.2 Magma
“Magma is a large, well-supported software package designed to solve computationally hard problems in algebra, number theory, geometry and combinatorics” – this is the formulation given at the official web-site http://magma.maths.usyd.edu.au/magma/. The current version is 2.15-7. In this book we use illustrations with Magma for various coding constructions, general as well as more specific ones such as AG-codes, for some machinery for working with algebraic curves, and for a few procedures for cryptography. Although Magma is a non-commercial system, it is not free of charge: one has to purchase a license to work with it. Details can be found at http://magma.maths.usyd.edu.au/magma/Ordering/ordering.shtml. Still, one can run simple Magma code in the so-called “Magma-Calculator” (http://magma.maths.usyd.edu.au/calc/); all examples and exercises run successfully in this calculator. The online help system for Magma can be found at http://magma.maths.usyd.edu.au/magma/htmlhelp/MAGMA.htm. Next we describe briefly some procedures that come in handy when dealing with objects from this book. We list only a few commands to give a flavor of the functionality; one can get a lot more from the manual.
12.2.1 Linear codes
A full list of commands with descriptions can be found at http://magma.maths.usyd.edu.au/magma/htmlhelp/text1667.htm
- LinearCode: constructs a linear code as a vector subspace
- PermutationCode: permutes positions in a code
- RepetitionCode: constructs a repetition code
- RandomLinearCode: constructs random linear code
- CyclicCode: constructs a cyclic code
- ReedMullerCode: constructs a Reed-Muller code
- HammingCode: constructs a Hamming code
- BCHCode: constructs a BCH code
- ReedSolomonCode: constructs a Reed-Solomon code
- GeneratorMatrix: yields the generator matrix
- ParityCheckMatrix: yields the parity check matrix
- Dual: constructs the dual code
- GeneratorPolynomial: yields the generator polynomial of the given cyclic
code
- CheckPolynomial: yields the check polynomial of the given cyclic code
- Random: yields a random codeword
- Syndrome: yields a syndrome of a word
- Distance: yields distance between words
- MinimumDistance: computes minimum distance of a code
- WeightEnumerator: computes the weight enumerator of a code
- ProductCode: constructs a product code from the given two
- SubfieldSubcode: constructs a subfield subcode
- McEliecesAttack: runs a basic attack on the McEliece cryptosystem
- GriesmerBound: provides the Griesmer bound for the given parameters
- SpherePackingBound: provides the sphere packing bound for the given
parameters
- BCHBound: provides the BCH bound for the given cyclic code
- Decode: decodes a received word with standard methods
- MattsonSolomonTransform: computes the Mattson-Solomon transform
- AutomorphismGroup: computes the automorphism group of the given code
12.2.2 AG-codes
A full list of commands with descriptions can be found at http://magma.maths.usyd.edu.au/magma/htmlhelp/text1686.htm
- AGCode: constructs an AG-code
- AGDualCode: constructs a dual AG-code
- HermitianCode: constructs a Hermitian code
- GoppaDesignedDistance: returns designed Goppa distance
- AGDecode: basic algorithm for decoding an AG-code
12.2.3 Algebraic curves
A full list of commands with descriptions can be found at http://magma.maths.usyd.edu.au/magma/htmlhelp/text1686.htm
- Curve: constructs a curve
- CoordinateRing: computes the coordinate ring of the given curve with Gröbner basis techniques
- JacobianMatrix: computes the Jacobian matrix
- IsSingular: tests whether the given curve has singularities
- Genus: computes the genus of a curve
- EllipticCurve: constructs an elliptic curve
- AutomorphismGroup: computes the automorphism group of the given curve
- FunctionField: computes the function field of the given curve
- Valuation: computes the valuation of the given function w.r.t. the given place
- GapNumbers: yields gap numbers
- Places: computes places of the given curve
- RiemannRochSpace: computes the Riemann-Roch space
- Basis: computes a sequence containing a basis of the Riemann-Roch space
L(D) of the divisor D.
- CryptographicCurve: given a finite field, computes an elliptic curve E over that field together with a point P on E such that the order of P is a large prime and the pair (E, P) satisfies the standard security conditions for being resistant to MOV and Anomalous attacks
12.3 GAP
In this section we consider the GAP computational discrete algebra system, “a system for computational discrete algebra, with particular emphasis on Computational Group Theory”. GAP stands for Groups, Algorithms, Programming, http://www.gap-system.org. Although the primary concern of GAP is computations with groups, it also provides coding-oriented functionality via the GUAVA package, http://www.gap-system.org/Packages/guava.html. GAP can be downloaded for free from http://www.gap-system.org/Download/index.html. The current GAP version is 4.4.12; the current GUAVA version is 3.9. As before, we only list some procedures here to provide an understanding of which things can be done with GUAVA/GAP. The package GUAVA is loaded as follows:
> LoadPackage("guava");
The online manual for GUAVA can be found at http://www.gap-system.org/Manuals/pkg/guava3.9/htm/chap0.html
Selected procedures:
- RandomLinearCode: constructs a random linear code
- GeneratorMatCode: constructs a linear code via its generator matrix
- CheckMatCode: constructs a linear code via its parity check matrix
- HammingCode: constructs a Hamming code
- ReedMullerCode: constructs a Reed-Muller code
- GeneratorPolCode: constructs a cyclic code via its generator polynomial
- CheckPolCode: constructs a cyclic code via its check polynomial
- RootsCode: constructs a cyclic code via roots of the generator polynomial
- BCHCode: constructs a BCH code
- ReedSolomonCode: constructs a Reed-Solomon code
- CyclicCodes: returns all cyclic codes of given length
- EvaluationCode: constructs an evaluation code
- AffineCurve: sets up a framework for working with an affine curve
- GoppaCodeClassical: constructs a classical geometric Goppa code
- OnePointAGCode: constructs a one-point AG-code
- PuncturedCode: constructs a punctured code of the given code
- DualCode: constructs the dual code of the given code
- UUVCode: constructs a code via the (u|u + v)-construction
- LowerBoundMinimumDistance: yields the best lower bound on the mini-
mum distance available
- UpperBoundMinimumDistance: yields the best upper bound on the mini-
mum distance available
- MinimumDistance: yields the minimum distance of the given code
- WeightDistribution: yields the weight distribution of the given code
- Decode: general decoding procedure
12.4 Sage
The Sage framework provides an opportunity to use the strengths of many open-source computer algebra systems (among them Singular and GAP) for developing effective code for solving different mathematical problems. The general framework is made possible through the Python interface. Sage is thought of as an open-source alternative to commercial systems such as Magma, Maple, Mathematica, and Matlab. Sage provides tools for a wide variety of algebraic and combinatorial objects, among other things; for example, functionality for coding theory and cryptography is present, as well as functionality for working with algebraic curves. The web-page of the project is http://www.sagemath.org/. One can download Sage from http://www.sagemath.org/download.html. The reference manual for Sage is available at http://www.sagemath.org/doc/reference/. Now we briefly describe some commands that may come in handy while working with this book.
12.4.1 Coding Theory
The manual is available at http://www.sagemath.org/doc/reference/coding.html.
The coding functionality of Sage has a lot in common with that of GAP/GUAVA. In fact, for many commands Sage uses the implementations available from GAP.
Selected procedures:
- LinearCodeFromCheckMatrix: constructs a linear code via its parity check
matrix
- RandomLinearCode: constructs a random linear code
- CyclicCodeFromGeneratingPolynomial: constructs a cyclic code via its
generator polynomial
- QuadraticResidueCode: constructs a quadratic residue cyclic code
- ReedSolomonCode: constructs a Reed-Solomon code
- gilbert_lower_bound: computes the lower bound due to Gilbert
- permutation_automorphism_group: computes the permutation automor-
phism group of the given code
- weight_distribution: computes the weight distribution of a code
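As an illustration, a short Sage session in the spirit of the GAP examples of Section 12.5 might look as follows. This is only a sketch: the commands are taken from the list above and from the standard Sage library of this period, and the exact method names and outputs may differ between Sage versions.
 C = RandomLinearCode(15, 5, GF(2))  # a random [15,5] code over GF(2)
 C.minimum_distance()                # for many such commands Sage calls GAP/GUAVA
 C.weight_distribution()             # cf. weight_distribution above
 C.permutation_automorphism_group()  # cf. permutation_automorphism_group above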
12.4.2 Cryptography
The manual is available at http://www.sagemath.org/doc/reference/cryptography.html.
Selected procedures/classes:
- SubstitutionCryptosystem: defines a substitution cryptosystem/cipher
- VigenereCryptosystem: defines the Vigenere cryptosystem/cipher
- lfsr_sequence: produces an output of the given LFSR
- SR: returns a small-scale variant of the AES
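As a small illustration of lfsr_sequence (a sketch; the first argument is the list of feedback coefficients and the second the initial fill of the register, both over a finite field):
 F = GF(2)
 key = [F(1), F(0), F(0), F(1)]    # feedback coefficients of the LFSR
 fill = [F(1), F(1), F(0), F(1)]   # initial fill (state) of the register
 lfsr_sequence(key, fill, 10)      # the first 10 output bits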
12.4.3 Algebraic curves
The manual is available at http://www.sagemath.org/doc/reference/plane_curves.html.
Selected procedures/classes:
- EllipticCurve_finite_field: constructs an elliptic curve over a finite field
- trace_of_frobenius: computes the trace of Frobenius of an elliptic curve
- cardinality: computes the number of rational points of an elliptic curve
- HyperellipticCurve_finite_field: constructs a hyperelliptic curve over a finite field
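For instance, one might construct an elliptic curve over a prime field and count its points as follows (a sketch with an arbitrarily chosen curve, not an example from this book):
 E = EllipticCurve(GF(101), [2, 3])   # the curve y^2 = x^3 + 2*x + 3 over GF(101)
 E.cardinality()                      # number of rational points
 E.trace_of_frobenius()               # trace t with #E = 101 + 1 - t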
12.5 Coding with computer algebra
12.5.1 Introduction
..............
12.5.2 Error-correcting codes
Example 12.5.1 Let us construct Example 2.1.6 for n = 5 using GAP/GUAVA.
First, we need to define the list of codewords
 M := Z(2)^0 * [
 [1,1,0,0,0],[1,0,1,0,0],[1,0,0,1,0],[1,0,0,0,1],[0,1,1,0,0],
 [0,1,0,1,0],[0,1,0,0,1],[0,0,1,1,0],[0,0,1,0,1],[0,0,0,1,1]
 ];
In GAP Z(q) is a primitive element of the field GF(q). So multiplying the list
M by Z(2)^0 we make sure that the elements belong to GF(2). Now construct
the code:
 C:=ElementsCode(M,"Example 2.1.6 for n=5",GF(2));
a (5,10,1..5)1..5 Example 2.1.6 for n=5 over GF(2)
We can compute the minimum distance and the size of C as follows
 MinimumDistance(C);
2
 Size(C);
10
Now the information on the code is updated
 Print(C);
a (5,10,2)1..5 Example 2.1.6 for n=5 over GF(2)
The block 1..5 gives a range for the covering radius of C; this notion is treated later in 3.2.24.
Example 12.5.2 Let us construct the Hamming [7, 4, 3] code in GAP/GUAVA
and Magma. Both systems have a built-in command for this. In GAP
C:=HammingCode(3,GF(2));
a linear [7,4,3]1 Hamming (3,2) code over GF(2)
Here the syntax is HammingCode(r,GF(q)) where r is the redundancy and GF(q)
is the defining alphabet. We can extract a generator matrix as follows
 M:=GeneratorMat(C);;
 Display(M);
1 1 1 . . . .
1 . . 1 1 . .
. 1 . 1 . 1 .
1 1 . 1 . . 1
Two semicolons indicate that we do not want the output of a command to be printed on the screen. Display provides a nice way to represent objects.
In Magma we do it like this
 C:=HammingCode(GF(2),3);
 C;
[7, 4, 3] Hamming code (r = 3) Linear Code over GF(2)
Generator matrix:
[1 0 0 0 1 1 0]
[0 1 0 0 0 1 1]
[0 0 1 0 1 1 1]
[0 0 0 1 1 0 1]
So here the order of the arguments is reversed.
Example 12.5.3 Let us construct the [7, 4, 3] binary Hamming code via its parity check matrix. In GAP/GUAVA we proceed as follows
 H1:=Z(2)^0*[[1,0,0],[0,1,0],[0,0,1],[1,1,0],[1,0,1],[0,1,1],[1,1,1]];;
 H:=TransposedMat(H1);;
 C:=CheckMatCode(H,GF(2));
a linear [7,4,1..3]1 code defined by check matrix over GF(2)
We can now check the property of the check matrix:
 G:=GeneratorMat(C);;
 Display(G*H1);
. . .
. . .
. . .
. . .
We can also compute syndromes:
 c:=CodewordNr(C,7);
[ 1 1 0 0 1 1 0 ]
 Syndrome(C,c);
[ 0 0 0 ]
 e:=Codeword("1000000");;
 Syndrome(C,c+e);
[ 1 0 0 ]
So we have taken the 7th codeword in the list of codewords of C and shown that its syndrome is 0. Then we introduced an error at the first position: the syndrome becomes non-zero.
In Magma one can generate codes only by vector subspace generators. So the way to generate a code via its parity check matrix is to use the Dual command, see Example 12.5.4. Thus we construct the Hamming code as in Example 12.5.2 and then proceed as above.
 C:=HammingCode(GF(2),3);
 H:=ParityCheckMatrix(C);
 H;
[1 0 0 1 0 1 1]
[0 1 0 1 1 1 0]
[0 0 1 0 1 1 1]
 G:=GeneratorMatrix(C);
 G*Transpose(H);
[0 0 0]
[0 0 0]
[0 0 0]
[0 0 0]
Syndromes are handled as follows:
 c:=Random(C);
 Syndrome(c,C);
(0 0 0)
 V:=AmbientSpace(C);
 e:=V![1,0,0,0,0,0,0];
 r:=c+e;
 Syndrome(r,C);
(1 0 0)
Here we have taken a random codeword of C and computed its syndrome. Now, V is the ambient space where C is defined, so the error vector e lives there; this is indicated by the coercion prefix V!.
Example 12.5.4 Let us start again with the binary Hamming code and see
how dual codes are constructed in GAP and Magma. In GAP we have
 C:=HammingCode(3,GF(2));;
 CS:=DualCode(C);
a linear [7,3,4]2..3 dual code
 G:=GeneratorMat(C);;
 H:=GeneratorMat(CS);;
 Display(G*TransposedMat(H));
. . .
. . .
. . .
. . .
The same can be done in Magma. Moreover, we can make sure that the dual of
the Hamming code is the predefined simplex code:
 C:=HammingCode(GF(2),3);
 CS:=Dual(C);
 G:=GeneratorMatrix(CS);
 S:=SimplexCode(3);
 H:=ParityCheckMatrix(S);
 G*Transpose(H);
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
Example 12.5.5 Let us work out some examples in GAP and Magma that il-
lustrate the notions of permutation equivalency and permutation automorphism
group. As a model example we take as usual the binary Hamming code. Next
we show how equivalency can be checked in GAP/GUAVA:
 C:=HammingCode(3,GF(2));;
 p:=(1,2,3)(4,5,6,7);;
 CP:=PermutedCode(C,p);
a linear [7,4,3]1 permuted code
 IsEquivalent(C,CP);
true
So codes C and CP are equivalent. We may compute the permutation that brings
C to CP:
 CodeIsomorphism( C, CP );
(4,5)
Interestingly, CP can be obtained from C by the transposition (4,5) alone. Let us check that this is indeed true:
 CP2:=PermutedCode(C,(4,5));;
 Display(GeneratorMat(CP)*TransposedMat(CheckMat(CP2)));
. . .
. . .
. . .
. . .
So indeed the codes CP and CP2 are the same. The permutation automorphism
group can be computed via:
 AG:=AutomorphismGroup(C);
Group([ (1,2)(5,6), (2,4)(3,5), (2,3)(4,6,5,7), (4,5)(6,7), (4,6)(5,7) ])
 Size(AG);
168
So the permutation automorphism group of C has 5 generators and 168 elements.
In Magma there is no immediate way to define permuted codes. We can still compute the permutation automorphism group, which is called a permutation group there:
 C:=HammingCode(GF(2),3);
 PermutationGroup(C);
Permutation group acting on a set of cardinality 7
Order = 168 = 2^3 * 3 * 7
(3, 6)(5, 7)
(1, 3)(4, 5)
(2, 3)(4, 7)
(3, 7)(5, 6)
12.5.3 Code constructions and bounds
Example 12.5.6 In this example we go through the above constructions in
GAP and Magma. As a model code we consider the [15, 11, 3] binary Hamming
code.
 C:=HammingCode(4,GF(2));;
 CP:=PuncturedCode(C);
a linear [14,11,2]1 punctured code
 CP5:=PuncturedCode(C,[11,12,13,14,15]);
a linear [10,10,1]0 punctured code
So PuncturedCode(C) punctures C at the last position, and there is also a possibility to give the positions explicitly. The same syntax applies to the shortening construction.
 CS:=ShortenedCode(C);
a linear [14,10,3]2 shortened code
 CS5:=ShortenedCode(C,[11,12,13,14,15]);
a linear [10,6,3]2..3 shortened code
Next we extend a code and check the property described in Proposition 3.1.11.
 CE:=ExtendedCode(C);
a linear [16,11,4]2 extended code
 CEP:=PuncturedCode(CE);;
 C=CEP;
true
A code C can be extended i times via ExtendedCode(C,i). Next we take the shortened code, augment it, and lengthen it.
 CSA:=AugmentedCode(CS);;
 d:=MinimumDistance(CSA);;
 CSA;
a linear [14,11,2]1 code, augmented with 1 word(s)
 CSL:=LengthenedCode(CS);
a linear [15,11,2]1..3 code, lengthened with 1 column(s)
By default the augmentation is done with the all-one vector. One can specify the vector v to augment with explicitly via AugmentedCode(C,v). One can also do the extension in the lengthening construction i times via LengthenedCode(C,i).
Now we do the same operations in Magma.
 C:=HammingCode(GF(2),4);
 CP:=PunctureCode(C, 15);
 CP5:=PunctureCode(C, {11..15});
 CS:=ShortenCode(C, 15);
 CS5:=ShortenCode(C, {11..15});
 CE:=ExtendCode(C);
 CEP:=PunctureCode(CE,16);
 C eq CEP;
true
 CSA:=AugmentCode(CS);
 CSL:=LengthenCode(CS);
One can also expurgate a code as follows.
 CExp:=ExpurgateCode(C);
 CExp;
[15, 10, 4] Cyclic Linear Code over GF(2)
Generator matrix:
[1 0 0 0 0 0 0 0 0 0 1 0 1 0 1]
[0 1 0 0 0 0 0 0 0 0 1 1 1 1 1]
[0 0 1 0 0 0 0 0 0 0 1 1 0 1 0]
[0 0 0 1 0 0 0 0 0 0 0 1 1 0 1]
[0 0 0 0 1 0 0 0 0 0 1 0 0 1 1]
[0 0 0 0 0 1 0 0 0 0 1 1 1 0 0]
[0 0 0 0 0 0 1 0 0 0 0 1 1 1 0]
[0 0 0 0 0 0 0 1 0 0 0 0 1 1 1]
[0 0 0 0 0 0 0 0 1 0 1 0 1 1 0]
[0 0 0 0 0 0 0 0 0 1 0 1 0 1 1]
We see that in fact the code CExp has more structure: it is cyclic, i.e. a cyclic shift of every codeword is again a codeword, cf. Chapter 7. One can also expurgate codewords from the given list L by ExpurgateCode(C,L). In GAP this is done via ExpurgatedCode(C,L).
Example 12.5.7 Let us demonstrate how the direct product is constructed in GAP and Magma. We construct the direct product of the binary [15, 11, 3] Hamming code with itself. In GAP we do
 C:=HammingCode(4,GF(2));;
 CProd:=DirectProductCode(C,C);
a linear [225,121,9]15..97 direct product code
In Magma:
 C:=HammingCode(GF(2),4);
 CProd:=DirectProduct(C,C);
Example 12.5.8 Now we go through some of the above constructions using
GAP and Magma. As model codes for summands we take binary [7, 4, 3]
and [15, 11, 3] Hamming codes. In GAP the direct sum and the (u|u + v)-
construction are implemented.
 C1:=HammingCode(3,GF(2));;
 C2:=HammingCode(4,GF(2));;
 C:=DirectSumCode(C1,C2);
a linear [22,15,3]2 direct sum code
 CUV:=UUVCode(C1,C2);
a linear [22,15,3]2..3 U|U+V construction code
In Magma, along with the above commands, a command for juxtaposition is defined. The syntax of the commands is as follows:
 C1:=HammingCode(GF(2),3);
 C2:=HammingCode(GF(2),4);
 C:=DirectSum(C1,C2);
 CJ:=Juxtaposition(C2,C2); // [30, 11, 6] Cyclic Linear Code over GF(2)
 CPl:=PlotkinSum(C1,C2);
Example 12.5.9 Let us construct a concatenated code in GAP and Magma.
We concatenate a Hamming [17, 15, 3] code over F16 and the binary [7, 4, 3]
Hamming code. In GAP we do the following
 O:=[HammingCode(2,GF(16))];;
 I:=[HammingCode(3,GF(2))];;
 C:=BZCodeNC(O,I);
a linear [119,60,9]0..119 Blokh Zyablov concatenated code
In GAP there is a possibility to perform a generalized construction using many
outer and inner codes, therefore the syntax is with square brackets to define
lists.
In Magma we proceed as below
 O:=HammingCode(GF(16),2);
 I:=HammingCode(GF(2),3);
 C:=ConcatenatedCode(O,I);
Example 12.5.10 Magma provides a way to construct an MDS code with parameters [q + 1, k, q − k + 2] over Fq, given the prime power q and a positive integer k. An example follows:
 C:=MDSCode(GF(16),10); //[17, 10, 8] Cyclic Linear Code over GF(2^4)
Example 12.5.11 GAP and Magma provide commands for computing lower and upper bounds on the size and minimum distance of codes, as well as stored tables of best known codes. Let us take a look at how this functionality is handled in GAP first. The command UpperBoundSingleton(n,d,q) gives an upper bound on the size of codes of length n and minimum distance d over Fq. This applies also to non-linear codes. E.g.:
 UpperBoundSingleton(25,10,2);
65536
In the same way one can compute the Hamming, Plotkin, and Griesmer bounds:
 UpperBoundHamming(25,10,2);
2196
 UpperBoundPlotkin(25,10,2);
1280
 UpperBoundGriesmer(25,10,2);
512
Note that GAP does not require qd > (q − 1)n as in Theorem 3.2.29. If qd > (q − 1)n does not hold, shortening is applied. One can also compute an upper bound which combines several of the bounds implemented in GAP:
 UpperBound(25,10,2);
1280
Since the Griesmer bound is not in the list of bounds with which UpperBound works, we obtain a larger value. Analogously one can compute lower bounds:
 LowerBoundGilbertVarshamov(25,10,2);
16
Here 16 = 2^4 is the size of a binary code of length 25 with minimum distance at least 10 that is guaranteed to exist.
One can access built-in tables (although somewhat outdated) as follows:
 Display(BoundsMinimumDistance(50,25,GF(2)));
rec(
  n := 50,
  k := 25,
  q := 2,
  references := rec(
    EB3 := [ "%A Y. Edel & J. Bierbrauer", "%T Inverting Construction Y1", "%R preprint", "%D 1997" ],
    Ja := [ "%A D.B. Jaffe", "%T Binary linear codes: new results on nonexistence", "%D 1996", "%O http://www.math.unl.edu/~djaffe/codes/code.ps.gz" ] ),
  construction := false,
  lowerBound := 10,
  lowerBoundExplanation := [ "Lb(50,25)=10, by taking subcode of:", "Lb(50,27)=10, by extending:", "Lb(49,27)=9, reference: EB3" ],
  upperBound := 12,
  upperBoundExplanation := [ "Ub(50,25)=12, by a one-step Griesmer bound from:", "Ub(37,24)=6, by considering shortening to:", "Ub(28,15)=6, otherwise extending would contradict:", "Ub(29,15)=7, reference: Ja" ] )
In Magma one can compute the bounds in the following way
 GriesmerBound(GF(2),25,10):
 PlotkinBound(GF(2),25,10);
 PlotkinBound(GF(2),25,10);
^
Runtime error in 'PlotkinBound': Require n <= 2*d for even weight binary case
 PlotkinBound(GF(2),100,51);
34
 SingletonBound(GF(2),25,10):
 SpherePackingBound(GF(2),25,10):
 GilbertVarshamovBound(GF(2),25,10);
9
 GilbertVarshamovLinearBound(GF(2),25,10);
16
Note that the result on the Plotkin bound is different from the one computed by GAP, since Magma implements an improved bound, treated in Remark 3.2.32. The colon at the end of a line suppresses the output. Access to the built-in database for given n and d is done as follows:
 BDLCLowerBound(GF(2),50,10);
27
 BDLCUpperBound(GF(2),50,10);
29
The corresponding commands for given n, k and k, d start with the prefixes BKLC and BLLC, respectively.
12.5.4 Weight enumerator
Example 12.5.12 This example illustrates some functionality available for
weight distribution computations in GAP and Magma. In GAP one can com-
pute the weight enumerator of a code as well as the weight enumerator for its
dual via the MacWilliams identity.
 C:=HammingCode(4,GF(2));;
 CodeWeightEnumerator(C);
x_1^15+35*x_1^12+105*x_1^11+168*x_1^10+280*x_1^9+435*x_1^8+435*x_1^7+
280*x_1^6+168*x_1^5+105*x_1^4+35*x_1^3+1
CodeMacWilliamsTransform(C);
15*x_1^8+1
One interesting feature available in GAP is drawing weight histograms. It works
as follows:
 WeightDistribution(C);
[ 1, 0, 0, 35, 105, 168, 280, 435, 435, 280, 168, 105, 35, 0, 0, 1 ]
 WeightHistogram(C);
435-------------------------------------------
* *
* *
* *
* *
* *
* *
* *
* *
* * * *
* * * *
* * * *
* * * *
* * * *
* * * *
* * * * * *
* * * * * *
* * * * * *
* * * * * * * *
* * * * * * * *
* * * * * * * *
* * * * * * * *
* * * * * * * * * *
+--------+--+--+--+--+--+--+--+--+--+--------+--
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
In Magma the analogous functionality looks as follows:
 C:=HammingCode(GF(2),4);
 WeightEnumerator(C);
$.1^15 + 35*$.1^12*$.2^3 + 105*$.1^11*$.2^4 + 168*$.1^10*$.2^5 + 280*$.1^9*$.2^6 
+ 435*$.1^8*$.2^7 + 435*$.1^7*$.2^8 + 280*$.1^6*$.2^9 + 168*$.1^5*$.2^10 + 
105*$.1^4*$.2^11 + 35*$.1^3*$.2^12 + $.2^15
 W:=WeightDistribution(C);
 MacWilliamsTransform(15,11,2,W);
[ <0, 1>, <8, 15> ]
So WeightEnumerator(C) actually returns the homogeneous weight enumerator with $.1 and $.2 as variables.
12.5.5 Codes and related structures
12.5.6 Complexity and decoding
Example 12.5.13 In GAP/GUAVA and Magma, for a general linear code the idea of Definition 2.4.10 (1) is employed. In GAP such a decoding goes as follows
 C:=RandomLinearCode(15,5,GF(2));;
 MinimumDistance(C);
5
 # can correct 2 errors
 c:=Codeword("11101")*C; # encoding
 c in C;
true
 r:=c+Codeword("010000001000000");
 c1:=Decodeword(C,r);;
 c1 = c;
true
 m:=Decode(C,r); # obtain initial message word
[ 1 1 1 0 1 ]
One can also obtain the syndrome table, that is, a table of pairs (coset leader, syndrome), via SyndromeTable(C).
The same idea is realized in Magma as follows.
 C:=RandomLinearCode(GF(2),15,5); // can be [15,5,5] code
 // can correct 2 errors
 c:=Random(C);
 e:=AmbientSpace(C) ! [0,1,0,0,0,0,0,0,1,0,0,0,0,0,0];
 r:=c+e;
 result,c1:=Decode(C,r);
 result; // does decoding succeed?
true
 c1 eq c;
true
There are more advanced decoding methods for general linear codes. More on
that in Section 10.6.
12.5.7 Cyclic codes
Example 12.5.14 We have already constructed finite fields and worked with them in GAP and Magma. Let us take one more look at those notions and show some new ones. In GAP we handle finite fields as follows.
 G:=GF(2^5);;
 a:=PrimitiveRoot(G);
Z(2^5)
 DefiningPolynomial(G);
x_1^5+x_1^2+Z(2)^0
a^5+a^2+Z(2)^0; # check
0*Z(2)
Pretty much the same functionality is provided in Magma:
 G:=GF(2^5);
 a:=PrimitiveElement(G);
 DefiningPolynomial(G);
$.1^5+$.1^2+1
 b:=G.1;
 a eq b;
true
 // define explicitly
 P<x>:=PolynomialRing(GF(2));
 p:=x^5+x^2+1;
 F<z>:=ext<GF(2)|p>;
 F;
Finite field of size 2^5
Example 12.5.15 Minimal polynomials are computed in GAP as follows:
 a:=PrimitiveUnityRoot(2,17);;
 MinimalPolynomial(GF(2),a);
x_1^8+x_1^7+x_1^6+x_1^4+x_1^2+x_1+Z(2)^0
In Magma it is done analogously:
 a:=RootOfUnity(17,GF(2));
 MinimalPolynomial(a,GF(2));
x^8 + x^7 + x^6 + x^4 + x^2 + x + 1
Example 12.5.16 Some examples of how to compute the cyclotomic polynomial in GAP and Magma follow. In GAP:
 CyclotomicPolynomial(GF(2),10);
x_1^4+x_1^3+x_1^2+x_1+Z(2)^0
In Magma it is done as follows
 CyclotomicPolynomial(10);
$.1^4 - $.1^3 + $.1^2 - $.1 + 1
Note that in Magma the cyclotomic polynomial is always defined over Q.
Example 12.5.17 Let us construct cyclic codes via roots in GAP and Magma.
In GAP/GUAVA we proceed as follows.
 C:=GeneratorPolCode(h,17,GF(2));; # h is from Example 6.1.41
 CR:=RootsCode(17,[1],2);;
 MinimumDistance(CR);;
 CR;
a cyclic [17,9,5]3..4 code defined by roots over GF(2)
 C=CR;
true
 C2:=GeneratorPolCode(g,17,GF(2));; # g is from Example 6.1.41
 CR2:=RootsCode(17,[3],2);;
 C2=CR2;
true
So we first generated a cyclic code whose generator polynomial has a (predefined) primitive root of unity as a root. Then we took the smallest element that is not in the cyclotomic class of 1, namely 3, and constructed a cyclic code with the cube of a primitive root of unity as a root of the generator polynomial. Note that these results are in accordance with Example 12.5.15. We can also compute the number of all cyclic codes of a given length, e.g.:
 NrCyclicCodes(17,GF(2));
8
In Magma we do the construction as follows
 a:=RootOfUnity(17,GF(2));
 C:=CyclicCode(17,[a],GF(2));
Example 12.5.18 We can compute the Mattson-Solomon transform in Magma.
This is done as follows:
 F<x> := PolynomialRing(SplittingField(x^17-1));
 f:=x^15+x^3+x;
 A:=MattsonSolomonTransform(f,17);
 A;
$.1^216*x^16 + $.1^177*x^15 + $.1^214*x^14 + $.1^99*x^13 + 
$.1^181*x^12 + $.1^173*x^11 + $.1^182*x^10 + $.1^198*x^9 + 
$.1^108*x^8 + $.1^107*x^7 + $.1^218*x^6 + $.1^91*x^5 + $.1^54*x^4
+ $.1^109*x^3 + $.1^27*x^2 + $.1^141*x + 1
 InverseMattsonSolomonTransform(A,17) eq f;
true
So for the construction we need a field that contains a primitive n-th root of
unity. We can also compute the inverse transform.
12.5.8 Polynomial codes
Example 12.5.19 Now we describe constructions of Reed-Solomon codes in
GAP/ GUAVA and Magma. In GAP we proceed as follows:
 C:=ReedSolomonCode(31,5);
a cyclic [31,27,5]3..4 Reed-Solomon code over GF(32)
The construction of the extended code is somewhat different from the one defined in Definition 8.1.6. GUAVA's ExtendedReedSolomonCode(n,d) first constructs ReedSolomonCode(n-1,d-1) and then extends it. The code is defined over GF(n), so n should be a prime power.
 CE:=ExtendedReedSolomonCode(31,5);
a linear [31,27,5]3..4 extended Reed Solomon code over GF(31)
The generalized Reed-Solomon codes are handled as follows.
 R:=PolynomialRing(GF(2^5));;
 a:=Z(2^5);;
 L:=List([1,2,3,6,7,10,12,16,20,24,25,29],i->Z(2^5)^i);;
 CG:=GeneralizedReedSolomonCode(L,4,R);;
So we define the polynomial ring R and the list of points L. Note that such a
construction corresponds to the construction from Definition 8.1.10 with b = 1.
In Magma we proceed as follows.
 C:=ReedSolomonCode(31,5);
 a:=PrimitiveElement(GF(2^5));
 A:=[a^i:i in [1,2,3,6,7,10,12,16,20,24,25,29]];
 B:=[a^i:i in [1,2,1,2,1,2,1,2,1,2,1,2]];
 CG:=GRSCode(A,B,4);
So Magma gives the opportunity to construct generalized Reed-Solomon codes with an arbitrary b whose entries are non-zero.
Example 12.5.20 In Magma one can compute subfield subcodes. This is done
as follows:
 a:=RootOfUnity(17,GF(2));
 C:=CyclicCode(17,[a],GF(2^8)); // splitting field size 2^8
 CSS:=SubfieldSubcode(C);
 C2:=CyclicCode(17,[a],GF(2));
 C2 eq CSS;
true
 CSS_4:=SubfieldSubcode(C,GF(4)); // [17, 13, 4] code over GF(2^2)
By default the prime subfield is taken for the construction.
Example 12.5.21 In Magma we can compute a trace code as is shown below; the corresponding computation in GUAVA is rather slow.
 C:=HammingCode(GF(16),3);
 CT:=Trace(C);
 CT:Minimal;
[273, 272] Linear Code over GF(2)
We can also specify a subfield to restrict to by giving it as a second parameter
in Trace.
Example 12.5.22 In GAP/GUAVA in order to construct an alternant code
we proceed as follows
 a:=Z(2^5);;
 P:=List([1,2,3,6,7,10,12,16,20,24,25,29],i->a^i);;
 B:=List([1,2,1,2,1,2,1,2,1,2,1,2],i->a^i);;
 CA:=AlternantCode(2,B,P,GF(2));
a linear [12,5,3..4]3..6 alternant code over GF(2)
By providing an extension field as the last parameter in AlternantCode, one
constructs an extension code (as per Definition 8.2.1) of the one defined by the
base field (in our example it is GF(2)), rather than the restriction-construction
as in Definition 8.3.1.
In Magma one proceeds as follows.
 a:=PrimitiveElement(GF(2^5));
 A:=[a^i:i in [1,2,3,6,7,10,12,16,20,24,25,29]];
 B:=[a^i:i in [1,2,1,2,1,2,1,2,1,2,1,2]];
 CA:=AlternantCode(A,B,2);
 CG:=GRSCode(A,B,2);
 CGS:=SubfieldSubcode(Dual(CG));
 CA eq CGS;
true
Here one can specify a desired subfield for the restriction as in Definition 8.3.1 by giving it as another parameter at the end of the parameter list of AlternantCode.
Example 12.5.23 In GAP/GUAVA one can construct a Goppa code as fol-
lows.
 x:=Indeterminate(GF(2),"x");;
 g:=x^3+x+1;
 C:=GoppaCode(g,15);
a linear [15,3,7]6..8 classical Goppa code over GF(2)
So the Goppa code C is constructed over the field, where the polynomial g is
defined. There is also a possibility to provide the list of non-roots L explicitly
via GoppaCode(g,L).
In Magma one needs to provide the list L explicitly.
 P<x>:=PolynomialRing(GF(2^5));
 G:=x^3+x+1;
 a:=PrimitiveElement(GF(2^5));
 L:=[a^i : i in [0..30]];
 C:=GoppaCode(L,G);
 C:Minimal;
[31, 16, 7] Goppa code (r = 3) Linear Code over GF(2)
The polynomial G should be defined in the polynomial ring over the extension.
The command C:Minimal only displays the description for C, no generator ma-
trix is displayed.
Example 12.5.24 Now we show how a binary Reed-Muller code can be constructed in GAP/GUAVA, and we also check the duality property from the previous proposition.
 u:=5;;
 m:=7;;
 C:=ReedMullerCode(u,m);;
 C2:=ReedMullerCode(m-u-1,m);;
 CD:=DualCode(C);
 CD = C2;
true
In Magma one can do the above analogously:
 u:=5;
 m:=7;
 C:=ReedMullerCode(u,m);
 C2:=ReedMullerCode(m-u-1,m);
 CD:=Dual(C);
 CD eq C2;
true
12.5.9 Algebraic decoding
Chapter 13
Bézout's theorem and codes
on plane curves
Ruud Pellikaan
In this section affine and projective plane curves are defined. Bézout's theorem on the number of points in the intersection of two plane curves is proved. A class of codes from plane curves is introduced and the parameters of these codes are determined. Divisors and rational functions on plane curves will be discussed.
13.1 Affine and projective space
lines planes quadrics
coordinate transformations
pictures
13.2 Plane curves
Let F be a field and ¯F its algebraic closure. By an affine plane curve over F we mean the set of points (x, y) ∈ ¯F^2 such that F(x, y) = 0, where F ∈ F[X, Y]. Here F = 0 is called the defining equation of the curve. The F-rational points of the curve with defining equation F = 0 are the points (x, y) ∈ F^2 such that F(x, y) = 0. The degree of the curve is the degree of F.
Two plane curves with defining equations F = 0 and G = 0 have a component
in common with defining equation H = 0, if F and G have a nontrivial factor
H in common, that is F = BH and G = AH for some A, B ∈ F[X, Y ], and the
degree of H is not zero.
A curve with defining equation F = 0, F ∈ F[X, Y], is called irreducible if F is not divisible by any G ∈ F[X, Y] such that 0 < deg(G) < deg(F), and absolutely irreducible if F is irreducible when considered as a polynomial in ¯F[X, Y].
The partial derivative with respect to X of a polynomial F = Σ fij X^i Y^j is defined by
FX = Σ i fij X^(i−1) Y^j.
The partial derivative with respect to Y is defined similarly.
A point (x, y) on an affine curve with equation F = 0 is singular if FX(x, y) =
FY (x, y) = 0, where FX and FY are the partial derivatives of F with respect to
X and Y , respectively. A regular point of a curve is a nonsingular point of the
curve. A regular point (x, y) on the curve has a well-defined tangent line to the
curve with equation
FX(x, y)(X − x) + FY (x, y)(Y − y) = 0.
Example 13.2.1 The curve with defining equation X^2 + Y^2 = 0 can be considered over any field. The polynomial X^2 + Y^2 is irreducible in F3[X, Y] but reducible in F9[X, Y] and F5[X, Y]. The point (0, 0) is an F-rational point of the curve over any field F, and it is the only singular point of this curve if the characteristic of F is not two.
A projective plane curve of degree d with defining equation F = 0 over F is the set of points (x : y : z) ∈ P^2(¯F) such that F(x, y, z) = 0, where F ∈ F[X, Y, Z] is a homogeneous polynomial of degree d.
Let F = Σ fij X^i Y^j ∈ F[X, Y] be a polynomial of degree d. The homogenization F* of F is an element of F[X, Y, Z] and is defined by
F* = Σ fij X^i Y^j Z^(d−i−j).
Then F*(X, Y, Z) = Z^d F(X/Z, Y/Z). If F = 0 defines an affine plane curve of degree d, then F* = 0 is the equation of the corresponding projective curve. A point at infinity of the affine curve with equation F = 0 is a point of the projective plane in the intersection of the line at infinity and the projective curve with equation F* = 0. So the points at infinity on the curve are all points (x : y : 0) ∈ P^2(¯F) such that F*(x, y, 0) = 0.
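For instance (an illustration of our own, not from the original text): for F = Y − X^2 of degree 2 we get F* = YZ − X^2, and F*(x, y, 0) = −x^2 = 0 forces x = 0, so (0 : 1 : 0) is the unique point at infinity of this parabola.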
A projective plane curve is irreducible, respectively absolutely irreducible, if its defining homogeneous polynomial is irreducible, respectively absolutely irreducible.
A point (x : y : z) on a projective curve with equation F = 0 is singular if
FX(x, y, z) = FY (x, y, z) = FZ(x, y, z) = 0, and regular otherwise. Through a
regular point (x : y : z) on the curve passes the tangent line with equation
FX(x, y, z)X + FY (x, y, z)Y + FZ(x, y, z)Z = 0.
If F ∈ F[X, Y, Z] is a homogeneous polynomial of degree d, then Euler’s equation
XFX + Y FY + ZFZ = dF
holds. So the two definitions of the tangent line to a curve in the affine and
projective plane are consistent with each other.
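As a quick check of Euler's equation (our own illustration): for F = X^m + Y^m + Z^m one gets X·mX^(m−1) + Y·mY^(m−1) + Z·mZ^(m−1) = m(X^m + Y^m + Z^m) = mF.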
A curve is called regular or nonsingular if all its points are regular. In Corol-
lary 13.3.13 it will be shown that a regular projective plane curve is absolutely
irreducible.
Remark 13.2.2 Let F be a polynomial in F[X, Y] of degree d. Suppose that the field F has at least d + 1 elements. Then there exists an affine change of coordinates such that the coefficients of U^d and V^d in F(U, V) are 1. This is seen as follows. The projective curve with the defining equation F* = 0 intersects the line at infinity in at most d points. Then there exist two F-rational points P and Q on the line at infinity and not on the curve. Choose a projective transformation of coordinates which transforms P and Q into (1 : 0 : 0) and (0 : 1 : 0), respectively. This change of coordinates leaves the line at infinity invariant and gives a polynomial F(U, V) such that the coefficients of U^d and V^d are not zero. An affine transformation can now transform these coefficients into 1. If for instance F = X^2 Y + X Y^2 ∈ F4[X, Y] and α is a primitive element of F4, then X = U + αV and Y = αU + V gives F(U, V) = U^3 + V^3.
Similarly, for all polynomials F, G ∈ F[X, Y] of degree l and m there exists an affine change of coordinates such that the coefficients of V^l and V^m in F(U, V) and G(U, V), respectively, are 1.
Example 13.2.3 The Fermat curve Fm is a projective plane curve with defining equation
X^m + Y^m + Z^m = 0.
The partial derivatives of X^m + Y^m + Z^m are mX^(m−1), mY^(m−1), and mZ^(m−1). So considered as a curve over the finite field Fq, it is regular if m is relatively prime to q.
Example 13.2.4 Suppose q = r^2. The Hermitian curve Hr over Fq is defined by the equation
U^(r+1) + V^(r+1) + 1 = 0.
The corresponding homogeneous equation is
U^(r+1) + V^(r+1) + W^(r+1) = 0.
Hence it has r + 1 points at infinity and it is the Fermat curve Fm over Fq with r = m − 1. The conjugate of a ∈ Fq over Fr is given by ¯a = a^r. So the equation can also be written as
U¯U + V¯V + W¯W = 0.
This looks like equating a Hermitian form over the complex numbers to zero and explains the terminology.
We will see in Section 3 that for certain constructions of codes on curves it is convenient to have exactly one point at infinity. We will give a transformation such that the new equation of the Hermitian curve has this property. Choose an element b ∈ Fq such that b^(r+1) = −1. There are exactly r + 1 of these, since q = r^2. Let P = (1 : b : 0). Then P is a point of the Hermitian curve. The tangent line at P has equation U + b^r V = 0. Multiplying with b gives the equation V = bU. Substituting V = bU in the defining equation of the curve gives that W^(r+1) = 0. So P is the only intersection point of the Hermitian curve and the tangent line at P. New homogeneous coordinates are chosen such that this tangent line becomes the line at infinity. Let X1 = W, Y1 = U and Z1 = bU − V. Then the curve has homogeneous equation
X1^(r+1) = b^r Y1^r Z1 + b Y1 Z1^r − Z1^(r+1)
in the coordinates X1, Y1 and Z1. Choose an element a ∈ Fq such that a^r + a = −1. There are r of these. Let X = X1, Y = bY1 + aZ1 and Z = Z1. Then the curve has homogeneous equation
X^(r+1) = Y^r Z + Y Z^r
with respect to X, Y and Z. Hence the Hermitian curve has affine equation
X^(r+1) = Y^r + Y
with respect to X and Y. This last equation has (0 : 1 : 0) as the only point at infinity.
To see that the number of affine Fq-rational points is r + (r + 1)(r^2 − r) = r^3 one argues as follows. The right side of the equation X^(r+1) = Y^r + Y is the trace from Fq to Fr. The first r in the formula on the number of points corresponds to the elements of Fr. These are exactly the elements of Fq with zero trace. The remaining term corresponds to the elements in Fq with a nonzero trace, since the equation X^(r+1) = β, β ∈ Fr*, has exactly r + 1 solutions in Fq.
Example 13.2.5 The Klein curve has homogeneous equation
X^3 Y + Y^3 Z + Z^3 X = 0.
More generally we define the curve Km by the equation
X^m Y + Y^m Z + Z^m X = 0.
Suppose that m^2 − m + 1 is relatively prime to q. The partial derivatives of the left side of the equation are mX^(m−1) Y + Z^m, mY^(m−1) Z + X^m and mZ^(m−1) X + Y^m. Let (x : y : z) be a singular point of the curve Km. If m is divisible by the characteristic, then x^m = y^m = z^m = 0. So x = y = z = 0, a contradiction. If m and q are relatively prime, then x^m y = −m y^m z = m^2 z^m x. So
(m^2 − m + 1) z^m x = x^m y + y^m z + z^m x = 0.
Therefore z = 0 or x = 0, since m^2 − m + 1 is relatively prime to the characteristic. But z = 0 implies x^m = −m y^(m−1) z = 0. Furthermore y^m = −m z^(m−1) x. So x = y = z = 0, which is a contradiction. Similarly x = 0 leads to a contradiction. Hence Km is nonsingular if gcd(m^2 − m + 1, q) = 1.
13.3 Bézout's theorem
The principal theorem of algebra says that a polynomial of degree m in one variable with coefficients in a field has at most m zeros. If the field is algebraically closed and if the zeros are counted with multiplicities, then the total number of zeros is equal to m. Bézout's theorem is a generalization of the principal theorem of algebra from one to several variables. It can be stated and proved in any number of variables. But only the two variable case will be treated, that is to say the case of plane curves.
First we recall some well-known notions from commutative algebra.
Let R be a commutative ring with a unit. An ideal I in R is called a prime ideal if I ≠ R and for all f, g ∈ R: if fg ∈ I, then f ∈ I or g ∈ I.
Let F be a field. Let F be a polynomial in F[X, Y] which is not a constant, and let I be the ideal in F[X, Y] generated by F. Then I is a prime ideal if and only if F is irreducible.
Let R be a commutative ring with a unit. A nonzero element f of R is called a zero divisor if fg = 0 for some g ∈ R, g ≠ 0. The ring R is called an integral domain if it has no zero divisors.
Let S be a commutative ring with a unit. Let I be an ideal in S. The factor ring of S modulo I is denoted by S/I. Then I is a prime ideal if and only if S/I is an integral domain.
Let R be an integral domain. Define the relation ∼ on the set of pairs {(f, g) | f, g ∈ R, g ≠ 0} by (f1, g1) ∼ (f2, g2) if and only if there exists an h ∈ R, h ≠ 0, such that f1g2h = g1f2h. This is an equivalence relation. Its classes are called fractions. The class of (f, g) is denoted by f/g, and f is called the numerator and g the denominator. The field of fractions or quotient field of R consists of all fractions f/g where f, g ∈ R and g ≠ 0 and is denoted by Q(R). This indeed is a field with addition and multiplication defined by
f1/g1 + f2/g2 = (f1g2 + f2g1)/(g1g2) and (f1/g1)·(f2/g2) = (f1f2)/(g1g2).
Example 13.3.1 The quotient field of the integers Z is the rationals Q. The
quotient field of the ring of polynomials F[X1, . . . , Xm] is called the field of
rational functions (in m variables) and is denoted by F(X1, . . . , Xm).
Remark 13.3.2 If R is a commutative ring with a unit, then matrices with
entries in R and the determinant of a square matrix can be defined as when
R is a field. The usual properties for matrix addition and multiplication hold.
If moreover R is an integral domain, then a square matrix M of size n has determinant zero if and only if there exists a nonzero r ∈ R^n such that rM = 0. This is seen by considering the same statement over the quotient field Q(R), where it is true, and clearing denominators.
Furthermore we define an algebraic construction, called the resultant of two polynomials, that measures whether they have a factor in common.
Definition 13.3.3 Let R be a commutative ring with a unit. Then R[Y ] is the
ring of polynomials in one variable Y with coefficients in R. Let F and G be
two polynomials in R[Y] of degree l and m, respectively. Then F = Σ_{i=0}^{l} fi Y^i and G = Σ_{j=0}^{m} gj Y^j, where fi, gj ∈ R for all i, j. Define the Sylvester matrix Sylv(F, G) of F and G as the square matrix of size l + m given by

             | f0 f1 . . . fl 0  . . . 0  |
             | 0  f0 f1 . . . fl . . . 0  |
             |            . . .           |
Sylv(F, G) = | 0  . . . 0  f0 f1 . . . fl |
             | g0 g1 . . . gm 0  . . . 0  |
             | 0  g0 g1 . . . gm . . . 0  |
             |            . . .           |
             | 0  . . . 0  g0 g1 . . . gm |

The first m rows consist of the cyclic shifts of the first row (f0, f1, . . . , fl, 0, . . . , 0) and the last l rows consist of the cyclic shifts of row m + 1, (g0, g1, . . . , gm, 0, . . . , 0). The determinant of Sylv(F, G) is called the resultant of F and G and is denoted by Res(F, G).
Proposition 13.3.4 If R is an integral domain and F and G are elements of R[Y], then Res(F, G) = 0 if and only if F and G have a nontrivial common factor.
Proof. If F and G have a nontrivial common factor, then F = BH and G = AH for some A, B and H in R[Y], where H has nonzero degree. So AF = BG, where deg(A) < m = deg(G) and deg(B) < l = deg(F). Write A = Σ ai Y^i, F = Σ fj Y^j, B = Σ br Y^r and G = Σ gs Y^s. Rewrite the equation AF − BG = 0 as a system of equations
Σ_{i+j=k} ai fj − Σ_{r+s=k} br gs = 0 for k = 0, 1, . . . , l + m − 1,
or as a matrix equation
(a, −b) Sylv(F, G) = 0,
where a = (a0, a1, . . . , a(m−1)) and b = (b0, b1, . . . , b(l−1)). Hence the rows of the matrix Sylv(F, G) are dependent in case F and G have a common factor, and so its determinant is zero. Thus we have shown that if F and G have a nontrivial common factor, then Res(F, G) = 0. The converse is also true. This is proved by reversing the argument.
Corollary 13.3.5 If F is an algebraically closed field and F and G are elements
of F[Y ], then Res(F, G) = 0 if and only if F and G have a common zero in F.
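As a small worked example (our own, not from the original text): take F = Y^2 − X and G = Y − X in R[Y] with R = F[X], so l = 2, m = 1, (f0, f1, f2) = (−X, 0, 1) and (g0, g1) = (−X, 1). Then

             | −X  0  1 |
Sylv(F, G) = | −X  1  0 |
             |  0 −X  1 |

and Res(F, G) = X^2 − X = X(X − 1). The zeros 0 and 1 of the resultant are exactly the X-coordinates of the points (0, 0) and (1, 1) in the intersection of the curves Y^2 = X and Y = X, in accordance with Corollary 13.3.5 and the proof below.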
After this introduction of the resultant, we are in a position to prove a weak form of Bézout's theorem.
Proposition 13.3.6 Two plane curves of degree l and m that do not have a
component in common, intersect in at most lm points.
Proof. A special case of Bézout's theorem is m = 1. A line, which is not a component of a curve of degree l, intersects this curve in at most l points. Stated differently, suppose that F is a polynomial in X and Y with coefficients in a field F and has degree l, and L is a nonconstant linear form. If F and L have more than l common zeros, then L divides F in F[X, Y]. A more general special case is if F is a product of linear terms. So if one of the curves is a union of lines and the other curve does not contain any of these lines as a component, then the number of points in the intersection is at most lm. This follows from the above special case. The third special case is: G = XY − 1 and F arbitrary. Then the first curve can be parameterized by X = T, Y = 1/T; substituting this in F gives a polynomial in T and 1/T of degree at most l, multiplying by T^l gives a polynomial of degree at most 2l, and therefore the intersection of these two curves has at most 2l points. It is not possible to continue like this, that is to say by parameterizing the second curve by rational functions in T: X = X(T) and Y = Y(T).
The proof of the general case uses elimination theory. Suppose that we have two
equations in two variables of degree l and m, respectively, and we eliminate one
variable. Then we get a polynomial in one variable of degree at most lm having
as zeros the first coordinates of common zeros of the two original polynomials.
In geometric terms, we have two curves of degree l respectively m in the affine
plane, and we project the points of the intersection to a line. If we can show
that we get at most lm points on this line and we can choose the projection in
such a way that no two points of the intersection project to one point on the
line, then we are done.
We may assume that the field is algebraically closed, since by a common zero (x, y) of F and G we mean a pair (x, y) ∈ ¯F^2 such that F(x, y) = G(x, y) = 0. Let F and G be polynomials in the variables X and Y of degree l and m, respectively, with coefficients in a field F, which do not have a common factor in F[X, Y]. Then they do not have a nontrivial common factor in R[Y], where R = F[X], so Res(F, G) is not zero by Proposition 13.3.4. By Remark 13.2.2 we may assume that, after an affine change of coordinates, F and G are monic and have degree l and m, respectively, as polynomials in Y with coefficients in F[X]. Hence F = Σ_{i=0}^{l} fi Y^i and G = Σ_{j=0}^{m} gj Y^j, where fi and gj are elements of F[X] of degree at most l − i and m − j, respectively, and fl = gm = 1. The square matrix Sylv(F, G) of size l + m has entries in F[X]. Taking the determinant gives the resultant Res(F, G), which is an element of R = F[X], that is to say a polynomial in X with coefficients in F.
The degree is at most lm. This can be seen by homogenizing F and G. Then F* = Σ_{i=0}^{l} fi Y^i where fi is a homogeneous polynomial in X and Z of degree l − i, and similarly for G*. The determinant D(X, Z) of the corresponding Sylvester matrix is homogeneous of degree lm, since
D(TX, TZ) = T^(lm) D(X, Z).
This is seen by dividing the rows and columns of the matrix by appropriate powers of T.
We claim that the zeros of the polynomial Res(F, G) are exactly the projection
of all points in the intersection of the curves defined by the equations F = 0
and G = 0. Thus we claim that x is a zero of Res(F, G) if and only if there
exists an element y ∈ F such that (x, y) is a common zero of F and G.
Let F(x) and G(x) be the polynomials in F[Y ], which are obtained from F and
G by substituting x for X. The polynomials F(x) and G(x) have again degree
l and m in Y , since we assumed that F and G are monic polynomials in Y of
degrees l and m, respectively. Now
Res(F(x), G(x)) = Res(F, G)(x),
that is to say it does not make a difference if we substitute x for X first and take
the resultant afterwards, or take the resultant first and make the substitution
afterwards. The degree of F and G has not diminished after the substitution.
Let (x, y) be a common zero of F and G. Then y is a common zero of F(x) and
G(x), so Res(F(x), G(x)) = 0 by Corollary 13.3.5, and therefore Res(F, G)(x) =
0.
For the proof of the converse statement, one reads the above proof backwards.
Now we know that Res(F, G) is not identically zero and has degree at most lm,
and therefore Res(F, G) has at most lm zeros. There is still a slight problem: it may happen that for a fixed zero x of Res(F, G) there exists more than one y such that (x, y) is a zero of F and G. This occasionally does happen. We will show that after a suitable coordinate change this does not occur.
For every zero x of Res(F, G) there are at most min{l, m} elements y such that (x, y) is a zero of F and G. Therefore F and G have at most min{l^2 m, lm^2} zeros in common, hence the collection of lines which are incident with two distinct points of these zeros is finite. Hence we can find a point P that is not in the union of this finite collection of lines. Furthermore there exists a line L incident with P and not incident with any of the common zeros of F and G. In fact almost every point P and line L incident with P have the above mentioned properties. Choose homogeneous coordinates such that P = (0 : 1 : 0) and L is the line at infinity. If P1 = (x, y1) and P2 = (x, y2) are zeros of F and G, then the line with equation X − xZ = 0 through the corresponding points (x : y1 : 1) and (x : y2 : 1) in the projective plane also contains P. This contradicts the choice made for P. So for every zero x of Res(F, G) there exists exactly one y such that (x, y) is a zero of F and G. Hence F and G have at most lm common zeros. This finishes the proof of the weak form of Bézout's theorem.
There are several reasons why the number of points in the intersection could be
less than lm: F may not be algebraically closed; points of the intersection may
lie at infinity; and multiplicities may occur.
Take for instance F = X^2 − Y^2 + 1, G = Y and F = F3. Then the two points of the intersection lie in F9^2 and not in F3^2. Let H = Y − 1. Then the two lines defined by G and H have no intersection in the affine plane. The homogenized polynomials G* = G and H* = Y − Z define curves in the projective plane which have exactly (1 : 0 : 0) in their intersection. Finally the line with equation H = 0 is the tangent line to the conic defined by F at the point (0, 1), and this point has to be counted with multiplicity 2.
In order to define the multiplicity of a point of intersection we have to localize
the ring of polynomials.
Definition 13.3.7 Let P = (x, y) ∈ F^2. Let F[X, Y]P be the subring of the field of fractions F(X, Y) consisting of all fractions A/B such that A, B ∈ F[X, Y] and B(P) ≠ 0. The ring F[X, Y]P is called the localization of F[X, Y] at P.
We explain the use of localization for the definition of the multiplicity by analogy to the multiplicity of a zero of a polynomial in one variable. Let F = (X − a)^e G, where a ∈ F, F, G ∈ F[X] and G(a) ≠ 0. Then a is a zero of F with multiplicity e. The dimension of F[X]/(F) as a vector space over F is equal to the degree of F. But the element G is invertible in the localization F[X]a of F[X] at a. So the ideal generated by F in F[X]a is equal to the ideal generated by (X − a)^e. Hence the dimension of F[X]a/(F) over F is equal to e.
Definition 13.3.8 Let P be a point in the intersection of two affine curves X
and Y defined by F and G, respectively. The intersection multiplicity I(P; X, Y)
of P at X and Y is defined by
I(P; X, Y) = dim F[X, Y ]P /(F, G).
Without proof we state several properties of the intersection multiplicity.
After a projective change of coordinates it may be assumed that the point
P = (0, 0) is the origin of the affine plane. There is a unique way to write F as
the sum of its homogeneous parts
F = Fd + Fd+1 + · · · + Fl,
where Fi is homogeneous of degree i, and Fd ≠ 0 and Fl ≠ 0. The homogeneous polynomial Fd defines a union of lines over ¯F, which are called the tangent lines of X at P. The point P is a regular point if and only if d = 1. The tangent line to X at P is defined by F1 = 0 if d = 1. Similarly
G = Ge + Ge+1 + · · · + Gm.
If the tangent lines of X at P are distinct from the tangent lines of Y at P, then the multiplicity of P is equal to de. In particular, if P is a regular point of both curves and the tangent lines are distinct, then d = e = 1 and the intersection multiplicity is 1.
The Hermitian curve over Fq, with q = r^2, has the property that every line in the projective plane with coefficients in Fq intersects the Hermitian curve in r + 1 distinct points or in exactly one point with multiplicity r + 1.
Definition 13.3.9 A cycle is a formal sum Σ mP·P of points of the projective plane P^2(¯F) with integer coefficients mP such that mP is nonzero for only finitely many P. The degree of a cycle is defined by deg(Σ mP·P) = Σ mP. If the projective plane curves X and Y are defined by the equations F = 0 and G = 0, respectively, then the intersection cycle X · Y is defined by
X · Y = Σ I(P; X, Y)·P.
Proposition 13.3.6 implies that this indeed is a cycle, that is to say there are only finitely many points P such that I(P; X, Y) is not zero.
Example 13.3.10 Consider the curve X with homogeneous equation
X^a Y^c + Y^(b+c) Z^(a−b) + X^d Z^(a+c−d) = 0
with d ≤ b ≤ a. Let L be the line with equation X = 0. The intersection of L with X consists of the points P = (0 : 0 : 1) and Q = (0 : 1 : 0).
The origin of the affine plane is mapped to P under the mapping (x, y) → (x : y : 1). The affine equation of the curve is
X^a Y^c + Y^(b+c) + X^d = 0.
The intersection multiplicity at P of X and L is equal to the dimension of F[X, Y]0/(X, X^a Y^c + Y^(b+c) + X^d), which is b + c.
The origin of the affine plane is mapped to Q under the mapping (x, z) → (x : 1 : z). The affine equation of the curve becomes now
X^a + Z^(a−b) + X^d Z^(a+c−d) = 0.
The intersection multiplicity at Q of X and L is equal to the dimension of F[X, Z]0/(X, X^a + Z^(a−b) + X^d Z^(a+c−d)), which is a − b. Therefore
X · L = (b + c)P + (a − b)Q.
Let M be the line with equation Y = 0. Let N be the line with equation Z = 0. Let R = (1 : 0 : 0). One shows similarly that
X · M = dP + (a + c − d)R and X · N = aQ + cR.
We state now as a fact the following strong version of Bézout's theorem.
Theorem 13.3.11 If X and Y are projective plane curves of degrees l and m,
respectively, that do not have a component in common, then
deg(X · Y) = lm.
Corollary 13.3.12 Two projective plane curves of positive degree have a point
in common.
Corollary 13.3.13 A regular projective plane curve is absolutely irreducible.
Proof. If F = GH is a factorization of F with factors of positive degree, we
get
FX = GXH + GHX
by the product or Leibniz rule for the partial derivative. So FX is an element
of the ideal generated by G and H, and similarly for the other two partial
derivatives. Hence the set of common zeros of FX, FY , FZ and F contains the
set of common zeros of G and H. The intersection of the curves with equations
G = 0 and H = 0 is not empty, by Corollary 13.3.12 since G and H have positive
degrees. Therefore the curve has a singular point.
Remark 13.3.14 Notice that the assumption that the curve is a projective plane curve is essential. The equation X^2 Y − X = 0 defines a regular affine plane curve, but it is clearly reducible. However, one gets immediately from Corollary 13.3.13 that if F = 0 is an affine plane curve and the homogenization F* defines a regular projective curve, then F is absolutely irreducible. The affine curve with equation X^2 Y − X = 0 has the points (1 : 0 : 0) and (0 : 1 : 0) at infinity, and (0 : 1 : 0) is a singular point.
13.3.1 Another proof of Bézout's theorem by the footprint
13.4 Codes on plane curves
Let G be an irreducible element of Fq[X, Y] of degree m. Let P1, . . . , Pn be n distinct points in the affine plane over Fq which lie on the plane curve defined by the equation G = 0. So G(Pj) = 0 for all j = 1, . . . , n. Consider the code
E(l) = {(F(P1), . . . , F(Pn)) | F ∈ Fq[X, Y], deg(F) ≤ l}.
Let Vl be the vector space of all polynomials in the two variables X, Y with coefficients in Fq and of degree at most l. Let P = {P1, . . . , Pn}. Consider the evaluation map
evP : Fq[X, Y] → Fq^n
defined by evP(F) = (F(P1), . . . , F(Pn)). Then this is a linear map that has E(l) as the image of Vl.
Proposition 13.4.1 Let k be the dimension and d the minimum distance of the code E(l). Suppose lm < n. Then
d ≥ n − lm
and
k = (l+2 choose 2) if l < m, and k = lm + 1 − (m−1 choose 2) if l ≥ m.
Proof. The monomials of the form X^α Y^β with α + β ≤ l form a basis of Vl. Hence Vl has dimension (l+2 choose 2).
Let F ∈ Vl. If G is a factor of F, then the corresponding codeword evP(F) is zero. Conversely, if evP(F) = 0, then the curves with equations F = 0 and G = 0 have degrees at most l and equal to m, respectively, and have the n points P1, . . . , Pn in their intersection. Bézout's theorem and the assumption lm < n imply that F and G have a common factor. But G is irreducible. Therefore F is divisible by G. So the kernel of the evaluation map, restricted to Vl, is equal to GV(l−m), which is zero if l < m. Hence k = (l+2 choose 2) if l < m, and
k = (l+2 choose 2) − (l−m+2 choose 2) = lm + 1 − (m−1 choose 2)
if l ≥ m.
The same argument with Bézout's theorem gives that a nonzero codeword has at most lm zeros, and therefore has weight at least n − lm. This shows that d ≥ n − lm.
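As a small numerical illustration (our own, not from the original text): take for G the defining polynomial X^3 − Y^2 − Y of the Hermitian curve over F4 of Example 13.2.4, so m = 3, and let P1, . . . , P8 be its n = r^3 = 8 affine F4-rational points. For l = 2 we have lm = 6 < 8, so k = (2+2 choose 2) = 6 since l < m, and d ≥ 8 − 6 = 2. Note that k + d ≥ 8 = n + 1 − g with g = (m − 1)(m − 2)/2 = 1, in accordance with the inequality at the end of Remark 13.4.3 below.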
Example 13.4.2 Conics, reducible and irreducible.............................
Remark 13.4.3 If F1, . . . , Fk is a basis of Vl modulo GV(l−m), then
(Fi(Pj) | 1 ≤ i ≤ k, 1 ≤ j ≤ n)
is a generator matrix of E(l). So it is a parity check matrix for C(l), the dual of E(l). The minimum distance d⊥ of C(l) is equal to the minimal number of dependent columns of this matrix. Hence for all t < d⊥ and every subset Q of P consisting of t distinct points Pi1, . . . , Pit, the corresponding k × t submatrix must have maximal rank t. Let Ll = Vl/GV(l−m). Then the evaluation map evQ induces a surjective map from Ll to Fq^t. The kernel is the space of all functions F ∈ Vl which are zero at the points of Q, modulo GV(l−m), which we denote by Ll(Q). So dim(Ll(Q)) = k − t.
Conversely, the dimension of Ll(Q) is at least k − t for all t-subsets Q of P. But in order to get a bound for d⊥, we have to know that dim(Ll(Q)) = k − t for all t < d⊥. The theory developed so far is not sufficient to get such a bound. The theorem of Riemann-Roch in the theory of algebraic curves gives an answer to this question. See Section ??. Section ?? gives another, more elementary, solution to this problem.
Notice that the following inequality holds for the codes E(l):
k + d ≥ n + 1 − g,
where g = (m − 1)(m − 2)/2. In Section 7 we will see that g is the (arithmetic) genus. In Sections 3-6 the role of g will be played by the number of gaps of the (Weierstrass) semigroup of a point at infinity.
13.5 Conics, arcs and Segre
Proposition 13.5.1
m(3, q) = q + 1 if q is odd, and m(3, q) = q + 2 if q is even.
Proof. We have seen that m(3, q) is at least q + 1 for all q in Example ??. In case q is even, m(3, q) is at least q + 2 by Example 3.2.12. ***Segre***
***Finite geometry and the Problems of Segre***
13.6 Cubic plane curves
13.6.1 Elliptic curves
13.6.2 The addition law on elliptic curves
13.6.3 Number of rational points on an elliptic curve
Manin’s proof, Chahal
13.6.4 The discrete logarithm on elliptic curves
13.7 Quartic plane curves
13.7.1 Flexes and bitangents
13.7.2 The Klein quartic
13.8 Divisors
In the following, X is an irreducible smooth projective curve over an algebraically closed field F.
Definition 13.8.1 A divisor is a formal sum D = Σ_{P∈X} nP·P, with nP ∈ Z and nP = 0 for all but a finite number of points P. The support of a divisor is the set of points with nonzero coefficient. A divisor D is called effective if all coefficients nP are non-negative (notation D ≥ 0). The degree deg(D) of the divisor D is Σ nP.
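For instance (a toy illustration of our own): the divisor D = 2P − 3Q has support {P, Q} and degree −1 and is not effective, whereas 2P + 3Q is effective of degree 5.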
Definition 13.8.2 Let X and Y be projective plane curves defined by the equations F = 0 and G = 0, respectively; then the intersection divisor X · Y is defined by
X · Y = Σ I(P; X, Y)·P,
where I(P; X, Y) is the intersection multiplicity of Definition ??.
Bézout's theorem tells us that X · Y is indeed a divisor and that its degree is lm if the degrees of X and Y are l and m, respectively.
Let vP = ordP be the discrete valuation defined for functions on X in Definition
??.
Definition 13.8.3 If f is a rational function on X, not identically 0, we define the divisor of f to be
(f) = Σ_{P∈X} vP(f)·P.
So, in a sense, the divisor of f is a bookkeeping device that tells us where the
zeros and poles of f are and what their multiplicities and orders are.
Theorem 13.8.4 The degree of a divisor of a rational function is 0.
Proof. Let X be a projective curve of degree l. Let f be a rational function on the curve X. Then f is represented by a quotient A/B of two homogeneous polynomials of the same degree, say m. Let Y and Z be the hypersurfaces defined by the equations A = 0 and B = 0, respectively. Then vP(f) = I(P; X, Y) − I(P; X, Z), since f = a/b = (a/h^m)(b/h^m)^(−1), where H is a homogeneous linear form representing h such that H(P) ≠ 0. Hence
(f) = X · Y − X · Z.
So (f) is indeed a divisor and its degree is zero, since it is the difference of two intersection divisors of the same degree lm.
Example 13.8.5 Look at the curve of Example ??. We saw that f = x/(y+z)
has a pole of order 2 in Q = (0 : 1 : 1). The line L with equation X = 0 intersects
the curve in three points, namely P1 = (0 : α : 1), P2 = (0 : 1 + α : 1) and Q.
So X · L = P1 + P2 + Q. The line M with equation Y = 0 intersects the curve
in three points, namely P3 = (1 : 0 : 1), P4 = (α : 0 : 1) and P5 = (1 + α : 0 : 1).
So X · M = P3 + P4 + P5. The line N with equation Y + Z = 0 intersects
the curve only in Q. So X · N = 3Q. Hence (x/(y + z)) = P1 + P2 − 2Q and
(y/(y + z)) = P3 + P4 + P5 − 3Q.
In this example it is not necessary to compute the intersection multiplicities, since they are a consequence of Bézout's theorem.
Example 13.8.6 Let X be the Klein quartic with equation X^3 Y + Y^3 Z + Z^3 X = 0 of Example 13.2.5. Let P1 = (0 : 0 : 1), P2 = (1 : 0 : 0) and Q = (0 : 1 : 0). Let L be the line with equation X = 0. Then L intersects X in the points P1 and Q. Since L is not tangent in Q, we see that I(Q; X, L) = 1. So the intersection multiplicity of X and L in P1 is 3, since the multiplicities add up to 4. Hence X · L = 3P1 + Q. Similarly we get for the lines M and N with equations Y = 0 and Z = 0, respectively, X · M = 3P2 + P1 and X · N = 3Q + P2. Therefore (x/z) = 3P1 − P2 − 2Q and (y/z) = P1 + 2P2 − 3Q.
Definition 13.8.7 The divisor of a rational function is called a principal divisor. We shall call two divisors D and D′ linearly equivalent if and only if D − D′ is a principal divisor; notation D ≡ D′.
This is indeed an equivalence relation.
Definition 13.8.8 Let D be a divisor on a curve X. We define a vector space L(D) over F by
L(D) = {f ∈ F(X)* | (f) + D ≥ 0} ∪ {0}.
The dimension of L(D) over F is denoted by l(D).
Note that if D = ∑_{i=1}^{r} n_i P_i − ∑_{j=1}^{s} m_j Q_j with all n_i, m_j > 0, then L(D) consists of 0 and the functions in the function field that have zeros of multiplicity at least m_j at Q_j (1 ≤ j ≤ s) and that have no poles except possibly at the points P_i, with order at most n_i (1 ≤ i ≤ r). We shall show that this vector space has finite dimension.
First we note that if D ≡ D′ and g is a rational function with (g) = D − D′, then the map f ↦ fg shows that L(D) and L(D′) are isomorphic.
Theorem 13.8.9
(i) l(D) = 0 if deg(D) < 0,
(ii) l(D) ≤ 1 + deg(D).
Proof. (i) If deg(D) < 0, then for any function f ∈ F(X)*, we have deg((f) + D) < 0, that is to say, f ∉ L(D).
(ii) If f is not 0 and f ∈ L(D), then D′ = D + (f) is an effective divisor for which L(D′) has the same dimension as L(D) by our observation above. So without loss of generality D is effective, say D = ∑_{i=1}^{r} n_i P_i (n_i ≥ 0 for 1 ≤ i ≤ r). Again, assume that f is not 0 and f ∈ L(D). In the point P_i, we map f onto the corresponding element of the n_i-dimensional vector space (t_i^{-n_i} O_{P_i})/O_{P_i}, where t_i is a local parameter at P_i. We thus obtain a mapping of f onto the direct sum of these vector spaces (map the 0-function onto 0). This is a linear mapping. Suppose that f is in the kernel. This means that f does not have a pole in any of the points P_i, that is to say, f is a constant function. It follows that
l(D) ≤ 1 + ∑_{i=1}^{r} n_i = 1 + deg(D).
Example 13.8.10 Look at the curve of Examples ?? and 13.8.5. We saw that
f = x/(y+z) and g = y/(y+z) are regular outside Q and have a pole of order 2
and 3, respectively, in Q = (0 : 1 : 1). So the functions 1, f and g have mutually
distinct pole orders and are elements of L(3Q). Hence the dimension of L(3Q)
is at least 3. We will see in Example 13.10.3 that it is exactly 3.
13.9 Differentials on a curve
Let X be an irreducible smooth curve with function field F(X).
Definition 13.9.1 Let V be a vector space over F(X). An F-linear map D : F(X) → V is called a derivation if it satisfies the product rule
D(fg) = fD(g) + gD(f).
Example 13.9.2 Let X be the projective line with function field F(X). Define D(F) = ∑ i a_i X^{i−1} for a polynomial F = ∑ a_i X^i ∈ F[X] and extend this definition to quotients by
D(F/G) = (G D(F) − F D(G)) / G^2.
Then D : F(X) → F(X) is a derivation.
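On the projective line this derivation is just the familiar derivative of rational functions, so its defining identities can be spot-checked with a computer algebra system. A small sympy sketch (our illustration, over the rationals for simplicity; the sample polynomials are arbitrary):

```python
import sympy as sp

X = sp.symbols('X')
F, G = X**3 + 2*X, X**2 + 1              # two sample polynomials
D = lambda h: sp.diff(h, X)              # the derivation of Example 13.9.2

# quotient rule: D(F/G) = (G*D(F) - F*D(G)) / G^2
assert sp.simplify(D(F/G) - (G*D(F) - F*D(G))/G**2) == 0
# product rule of Definition 13.9.1: D(FG) = F*D(G) + G*D(F)
assert sp.simplify(D(F*G) - (F*D(G) + G*D(F))) == 0
```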
Definition 13.9.3 The set of all derivations D : F(X) → V will be denoted by
Der(X, V). We denote Der(X, V) by Der(X) if V = F(X).
The sum of two derivations D1, D2 ∈ Der(X, V) is defined by (D1 + D2)(f) =
D1(f) + D2(f). The product of D ∈ Der(X, V) with f ∈ F(X) is defined by
(fD)(g) = fD(g). In this way Der(X, V) becomes a vector space over F(X).
Theorem 13.9.4 Let t be a local parameter at a point P. Then there exists
a unique derivation Dt : F(X) → F(X) such that Dt(t) = 1. Furthermore
Der(X) is one dimensional over F(X) and Dt is a basis element for every local
parameter t.
Definition 13.9.5 A rational differential form or differential on X is an F(X)-
linear map from Der(X) to F(X). The set of all rational differential forms on
X is denoted by Ω(X).
Again Ω(X) becomes a vector space over F(X) in the obvious way. Consider
the map
d : F(X) −→ Ω(X),
where for f ∈ F(X) the differential df : Der(X) → F(X) is defined by df(D) =
D(f) for all D ∈ Der(X). Then d is a derivation.
Theorem 13.9.6 The space Ω(X) has dimension 1 over F(X) and dt is a basis
for every point P with local parameter t.
So for every point P and local parameter t_P, a differential ω can be represented in a unique way as ω = f_P dt_P, where f_P is a rational function. The obvious definition of "the value" of ω at P by ω(P) = f_P(P) has no meaning, since it depends on the choice of t_P. Despite this negative result it is possible to say whether ω has a pole or a zero at P of a certain order.
Definition 13.9.7 Let ω be a differential on X. The order or valuation of ω
in P is defined by ordP (ω) = vP (ω) = vP (fP ). The differential form ω is called
regular if it has no poles. The regular differentials on X form an F[X]-module,
which we denote by Ω[X].
This definition does not depend on the choices made.
If X is an affine plane curve defined by the equation F = 0 with F ∈ F[X, Y], then Ω[X] is generated by dx and dy as an F[X]-module with the relation f_x dx + f_y dy = 0.
Example 13.9.8 We again look at the curve X in P^2 given by X^3 + Y^3 + Z^3 = 0 in characteristic unequal to three. We define the sets U_x by U_x = {(x : y : z) ∈ X | y ≠ 0, z ≠ 0} and similarly U_y and U_z. Then U_x, U_y, and U_z cover X since there is no point on X where two coordinates are zero. It is easy to check that the three representations
ω = (y/z)^2 d(x/y) on U_x,   η = (z/x)^2 d(y/z) on U_y,   ζ = (x/y)^2 d(z/x) on U_z
define one differential on X. For instance, to show that η and ζ agree on U_y ∩ U_z one takes the equation (x/z)^3 + (y/z)^3 + 1 = 0, differentiates, and applies the formula d(f^{-1}) = −f^{-2} df to f = z/x.
The only regular functions on X are constants, so one cannot represent this
differential as g df with f and g regular functions on X.
Now the divisor of a differential is defined as for functions.
Definition 13.9.9 The divisor (ω) of the differential ω is defined by
(ω) = ∑_{P∈X} v_P(ω) P.
Of course, one must show that only finitely many coefficients in (ω) are not 0. Let ω be a differential and W = (ω). Then W is called a canonical divisor. If ω′ is another nonzero differential, then ω′ = fω for some rational function f. So (ω′) = W′ ≡ W and therefore the canonical divisors form one equivalence class. This class is also denoted by W. Now consider the space L(W). This space of rational functions can be mapped onto an isomorphic space of differential forms by f ↦ fω. By the definition of L(W), the image of f under the mapping is a regular differential form, that is to say, L(W) is isomorphic to Ω[X].
Definition 13.9.10 Let X be a smooth projective curve over F. We define the
genus g of X by g = l(W).
Example 13.9.11 Consider the differential dx on the projective line. Then dx is regular at all points P_a = (a : 1), since x − a is a local parameter in P_a and dx = d(x − a). Let Q = (1 : 0) be the point at infinity. Then t = 1/x is a local parameter in Q and dx = −t^{-2} dt. So v_Q(dx) = −2. Hence (dx) = −2Q and l(−2Q) = 0. Therefore the projective line has genus zero.
The genus of a curve will play an important role in the following sections. For
methods with which one can determine the genus of a curve, we must refer to
textbooks on algebraic geometry. We mention one formula without proof, the
so-called Plücker formula.
Theorem 13.9.12 If X is a nonsingular projective curve of degree m in P^2, then
g = (m − 1)(m − 2)/2.
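The formula is immediate to tabulate; a two-line helper (ours) together with the degrees that occur in the surrounding examples:

```python
def plucker_genus(m: int) -> int:
    """Genus of a nonsingular projective plane curve of degree m (Theorem 13.9.12)."""
    return (m - 1) * (m - 2) // 2

# line and conic: genus 0; cubic: genus 1; quartic (e.g. the Klein quartic): genus 3
assert [plucker_genus(m) for m in (1, 2, 3, 4)] == [0, 0, 1, 3]
```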
Example 13.9.13 The genus of a line and a nonsingular conic are zero by Theorem 13.9.12. In fact a curve of genus zero is isomorphic to the projective line. For example the curve X with equation XZ − Y^2 = 0 of Example ?? is isomorphic to P^1, where the isomorphism is given by (x : y : z) ↦ (x : y) = (y : z) for (x : y : z) ∈ X. The inverse map is given by (u : v) ↦ (u^2 : uv : v^2).
Example 13.9.14 So the curve of Examples ??, 13.8.5 and 13.9.8 has genus
1 and by the definition of genus, L(W) = F, so regular differentials on X are
scalar multiples of the differential ω of Example 13.9.8.
For the construction of codes over algebraic curves that generalize Goppa codes,
we shall need the concept of residue of a differential at a point P. This is defined
in accordance with our treatment of local behavior of a differential ω.
Definition 13.9.15 Let P be a point on X, t a local parameter at P and ω = f dt the representation of ω. The function f can be written as ∑_i a_i t^i. We define the residue Res_P(ω) of ω in the point P to be a_{−1}.
One can show that this algebraic definition of the residue does not depend on
the choice of the local parameter t.
One of the basic results in the theory of algebraic curves is known as the residue
theorem. We only state the theorem.
Theorem 13.9.16 If ω is a differential on a smooth projective curve X, then
∑_{P∈X} Res_P(ω) = 0.
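On the projective line the residue theorem can be verified directly: the residues of ω = f dx at the finite poles must cancel against the residue at infinity, which is computed in the local parameter t = 1/x exactly as in Example 13.9.11. A sympy sketch (ours; the rational function f is an arbitrary choice):

```python
import sympy as sp

x, t = sp.symbols('x t')
f = 1 / (x * (x - 1))                       # omega = f dx has poles at 0 and 1

res_finite = sum(sp.residue(f, x, p) for p in (0, 1))

# at infinity substitute x = 1/t, dx = -t^(-2) dt, and take the residue at t = 0
g = sp.simplify(f.subs(x, 1/t) * (-1/t**2))
res_inf = sp.residue(g, t, 0)

assert res_finite + res_inf == 0            # Theorem 13.9.16 on the projective line
```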
13.10 The Riemann-Roch theorem
The following theorem, known as the Riemann-Roch theorem, is not only a central result in algebraic geometry with applications in other areas, but it is also the key to the new results in coding theory.
Theorem 13.10.1 Let D be a divisor on a smooth projective curve of genus g.
Then, for any canonical divisor W
l(D) − l(W − D) = deg(D) − g + 1.
We do not give the proof. The theorem allows us to determine the degree of
canonical divisors.
Corollary 13.10.2 For a canonical divisor W, we have deg(W) = 2g − 2.
Proof. Everywhere regular functions on a projective curve are constant, that
is to say, L(0) = F, so l(0) = 1. Substitute D = W in Theorem 13.10.1 and the
result follows from Definition 13.9.10.
Example 13.10.3 It is now clear why in Example 13.8.10 the space L(3Q)
has dimension 3. By Example 13.9.14 the curve X has genus 1, the degree of
W − 3Q is negative, so l(W − 3Q) = 0. By Theorem 13.10.1 we have l(3Q) = 3.
At first, Theorem 13.10.1 does not look too useful. However, Corollary 13.10.2
provides us with a means to use it successfully.
Corollary 13.10.4 Let D be a divisor on a smooth projective curve of genus g and let deg(D) > 2g − 2. Then
l(D) = deg(D) − g + 1.
Proof. By Corollary 13.10.2, deg(W − D) < 0, so by Theorem 13.8.9(i), l(W − D) = 0.
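For divisors of large degree the corollary turns the computation of l(D) into pure arithmetic; a tiny helper (ours), applied to Example 13.10.3 above and to the projective line:

```python
def l_dim(deg_D: int, g: int) -> int:
    """l(D) = deg(D) - g + 1, valid whenever deg(D) > 2g - 2 (Corollary 13.10.4)."""
    assert deg_D > 2 * g - 2, "Riemann-Roch gives the exact value only in this range"
    return deg_D - g + 1

assert l_dim(3, 1) == 3   # Example 13.10.3: genus 1, so l(3Q) = 3 - 1 + 1 = 3
assert l_dim(2, 0) == 3   # genus 0: l(mP) = m + 1, the polynomials of degree <= m
```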
Example 13.10.5 Consider the code of Theorem ??. We embed the affine plane in a projective plane and consider the rational functions on the curve defined by G. By Bézout's theorem, this curve intersects the line at infinity, that is to say, the line defined by Z = 0, in m points. These are the possible poles of our rational functions, each with order at most l. So, in the terminology of Definition 13.8.8, we have a space of rational functions, defined by a divisor D of degree lm. Then Corollary 13.10.4 and Theorem ?? imply that the curve defined by G has genus at most equal to (m − 1)(m − 2)/2. This is exactly what we find from the Plücker formula 13.9.12.
Let m be a non-negative integer. Then l(mP) ≤ l((m − 1)P) + 1, by the same argument as in the proof of Theorem 13.8.9.
Definition 13.10.6 If l(mP) = l((m − 1)P), then m is called a (Weierstrass)
gap of P. A non-negative integer that is not a gap is called a nongap of P.
The number of gaps of P is equal to the genus g of the curve, since l(iP) = i + 1 − g if i > 2g − 2, by Corollary 13.10.4 and
1 = l(0) ≤ l(P) ≤ · · · ≤ l((2g − 1)P) = g.
If m ∈ N0, then m is a nongap of P if and only if there exists a rational function
which has a pole of order m in P and no other poles. Hence, if m1 and m2
are nongaps of P, then m1 + m2 is also a nongap of P. The nongaps form
the Weierstrass semigroup in N0. Let (ρi|i ∈ N) be an enumeration of all the
nongaps of P in increasing order, so ρ1 = 0. Let fi ∈ L(ρiP) be such that
vP (fi) = −ρi for i ∈ N. Then f1, . . . , fi provide a basis for the space L(ρiP).
This will be the approach of Sections 3-7.
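Once generators of the Weierstrass semigroup of P are known, the gaps and nongaps can be enumerated mechanically. A brute-force sketch (ours); the generators q and q + 1 are those of the Hermitian curve at its point at infinity (see Proposition 14.8.2), for which the genus is g = q(q − 1)/2:

```python
def nongaps(gens, bound):
    """Elements up to `bound` of the numerical semigroup generated by `gens`."""
    ng = {0}
    for n in range(1, bound + 1):
        if any(n - g in ng for g in gens if n - g >= 0):
            ng.add(n)
    return ng

def gaps(gens, bound):
    return sorted(set(range(bound + 1)) - nongaps(gens, bound))

q = 4
g = q * (q - 1) // 2
# the number of gaps equals the genus, and all gaps lie below 2g
assert gaps([q, q + 1], 2 * g) == [1, 2, 3, 6, 7, 11]
assert len(gaps([q, q + 1], 2 * g)) == g
```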
The term l(W − D) in Theorem 13.10.1 can be interpreted in terms of differen-
tials. We introduce a generalization of Definition 13.8.8 for differentials.
Definition 13.10.7 Let D be a divisor on a curve X. We define
Ω(D) = {ω ∈ Ω(X) | (ω) − D ≥ 0}
and we denote the dimension of Ω(D) over F by δ(D), called the index of speciality of D.
The connection with functions is established by the following theorem.
Theorem 13.10.8 δ(D) = l(W − D).
Proof. If W = (ω), we define a linear map φ : L(W − D) → Ω(D) by
φ(f) = fω. This is clearly an isomorphism.
Example 13.10.9 If we take D = 0, then by Definition 13.9.10 there are ex-
actly g linearly independent regular differentials on a curve X. So the differential
of Example 13.9.8 is the only regular differential on X (up to a constant factor)
as was already observed after Theorem 13.9.12.
13.11 Codes from algebraic curves
We now come to the applications to coding theory. Our alphabet will be Fq. Let
F be the algebraic closure of Fq. We shall apply the theorems of the previous
sections. A few adaptations are necessary, since for example, we consider for
functions in the coordinate ring only those that have coefficients in Fq. If the
affine curve X over Fq is defined by a prime ideal I in Fq[X1, . . . , Xn], then its
coordinate ring Fq[X] is by definition equal to Fq[X1, . . . , Xn]/I and its function
field Fq(X) is the quotient field of Fq[X]. It is always assumed that the curve
is absolutely irreducible; this means that the defining ideal is also prime in F[X1, . . . , Xn]. Similar adaptations are made for projective curves. Notice that F(x_1, . . . , x_n)^q = F(x_1^q, . . . , x_n^q) for all F ∈ Fq[X1, . . . , Xn]. So if (x_1, . . . , x_n) is a zero of F and F is defined over Fq, then (x_1^q, . . . , x_n^q) is also a zero of F.
Let Fr : F → F be the Frobenius map defined by Fr(x) = x^q. We can extend
this map coordinatewise to points in affine and projective space. If X is a curve
defined over Fq and P is a point of X, then Fr(P) is also a point of X, by the
above remark. A divisor D on X is called rational if the coefficients of P and
Fr(P) in D are the same for any point P of X. The space L(D) will only be
considered for rational divisors and is defined as before but with the restriction
of the rational functions to Fq(X). With these changes the stated theorems remain true over Fq, in particular the Riemann-Roch theorem 13.10.1.
Let X be an absolutely irreducible nonsingular projective curve over Fq. We
shall define two kinds of algebraic geometry codes from X. The first kind
generalizes Reed-Solomon codes, the second kind generalizes Goppa codes. In
the following, P1, P2, . . . , Pn are rational points on X and D is the divisor P1 +
P2 + · · · + Pn. Furthermore G is some other divisor that has support disjoint
from D. Although it is not necessary to do so, we shall make more restrictions
on G, namely
2g − 2 < deg(G) < n.
Definition 13.11.1 The linear code C(D, G) of length n over Fq is the image of the linear map α : L(G) → Fq^n defined by α(f) = (f(P1), f(P2), . . . , f(Pn)).
Codes of this kind are called geometric Reed-Solomon codes.
Theorem 13.11.2 The code C(D, G) has dimension k = deg(G) − g + 1 and
minimum distance d ≥ n − deg(G).
Proof. (i) If f belongs to the kernel of α, then f ∈ L(G − D) and by Theorem 13.8.9(i), this implies f = 0. The result follows from the assumption 2g − 2 < deg(G) < n and Corollary 13.10.4.
(ii) If α(f) has weight d, then there are n − d points Pi, say P_{i_1}, P_{i_2}, . . . , P_{i_{n−d}}, for which f(P_{i_j}) = 0. Therefore f ∈ L(G − E), where E = P_{i_1} + · · · + P_{i_{n−d}}. Since f is not 0, Theorem 13.8.9(i) gives deg(G − E) = deg(G) − (n − d) ≥ 0, that is, d ≥ n − deg(G).
Note the analogy with the proof of Theorem ??.
Example 13.11.3 Let X be the projective line over F_{q^m}. Let n = q^m − 1. We define P0 = (0 : 1), P∞ = (1 : 0) and we define the divisor D as ∑_{j=1}^{n} Pj, where Pj = (β^j : 1), (1 ≤ j ≤ n). We define G = aP0 + bP∞, a ≥ 0, b ≥ 0. (Here β is a primitive nth root of unity.) By Theorem 13.10.1, L(G) has dimension a + b + 1 and one immediately sees that the functions (x/y)^i, −a ≤ i ≤ b, form a basis of L(G). Consider the code C(D, G). A generator matrix for this code has as rows (β^i, β^{2i}, . . . , β^{ni}) with −a ≤ i ≤ b. One easily checks that (c1, c2, . . . , cn) is a codeword in C(D, G) if and only if ∑_{j=1}^{n} cj (β^l)^j = 0 for all l with a < l < n − b. It follows that C(D, G) is a Reed-Solomon code. The subfield subcode with coordinates in Fq is a BCH code.
Example 13.11.4 Let X be the curve of Examples ??, 13.8.5, 13.8.10 and 13.10.3. Let G = 3Q, where Q = (0 : 1 : 1). We take n = 8, so D is the sum of the remaining rational points. The coordinates are given by

      Q   P1  P2  P3  P4  P5  P6  P7  P8
  x   0   0   0   1   α   ᾱ   1   α   ᾱ
  y   1   α   ᾱ   0   0   0   1   1   1
  z   1   1   1   1   1   1   0   0   0

where ᾱ = α^2 = 1 + α. We saw in Examples 13.8.10 and 13.10.3 that 1, x/(y+z) and y/(y + z) are a basis of L(3Q) over F and hence also over F4. This leads to the following generator matrix for C(D, G):

      1  1  1  1  1  1  1  1
      0  0  1  α  ᾱ  1  α  ᾱ
      ᾱ  α  0  0  0  1  1  1

By Theorem 13.11.2, the minimum distance is at least 5 and of course, one immediately sees from the generator matrix that d = 5.
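Example 13.11.4 is small enough to check by brute force. The sketch below (ours) encodes F4 as {0, 1, 2, 3} with addition given by XOR and a multiplication table in which 2 plays the role of α and 3 that of ᾱ = 1 + α, lists the nine rational points of X^3 + Y^3 + Z^3 = 0, evaluates the basis 1, x/(y+z), y/(y+z) of L(3Q) at the eight points different from Q, and verifies that the minimum distance is exactly 5:

```python
from itertools import product

ADD = lambda a, b: a ^ b                                  # addition in F4 is XOR
MUL_TABLE = [[0, 0, 0, 0], [0, 1, 2, 3], [0, 2, 3, 1], [0, 3, 1, 2]]
MUL = lambda a, b: MUL_TABLE[a][b]                        # 2*2 = 3, 2*3 = 1, 3*3 = 2
INV = {1: 1, 2: 3, 3: 2}                                  # multiplicative inverses
cube = lambda a: MUL(a, MUL(a, a))

# rational points of X^3 + Y^3 + Z^3 = 0, normalized so the last nonzero coord is 1
points = []
for x, y, z in product(range(4), repeat=3):
    if (x, y, z) != (0, 0, 0) and ADD(cube(x), ADD(cube(y), cube(z))) == 0:
        s = INV[next(c for c in (z, y, x) if c)]
        p = (MUL(x, s), MUL(y, s), MUL(z, s))
        if p not in points:
            points.append(p)
assert len(points) == 9                                   # Q together with P1, ..., P8

Q = (0, 1, 1)
D = [p for p in points if p != Q]

def ev(p, num):                                           # evaluate num/(y+z) at p
    x, y, z = p
    return MUL(num(x, y, z), INV[ADD(y, z)])

G = [[1] * 8,                                             # evaluations of 1
     [ev(p, lambda x, y, z: x) for p in D],               # of x/(y+z)
     [ev(p, lambda x, y, z: y) for p in D]]               # of y/(y+z)

weights = set()
for m in product(range(4), repeat=3):                     # all 4^3 = 64 messages
    c = [ADD(ADD(MUL(m[0], G[0][j]), MUL(m[1], G[1][j])),
             MUL(m[2], G[2][j])) for j in range(8)]
    if any(c):
        weights.add(sum(1 for v in c if v))
assert min(weights) == 5                                  # d = 5, as claimed
```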
We now come to the second class of algebraic geometry codes. We shall call
these codes geometric Goppa codes.
Definition 13.11.5 The linear code C*(D, G) of length n over Fq is the image of the linear map α* : Ω(G − D) → Fq^n defined by
α*(η) = (Res_{P1}(η), Res_{P2}(η), . . . , Res_{Pn}(η)).
The parameters are given by the following theorem.
Theorem 13.11.6 The code C*(D, G) has dimension k* = n − deg(G) + g − 1 and minimum distance d* ≥ deg(G) − 2g + 2.
Proof. Just as in Theorem 13.11.2, these assertions are direct consequences
of Theorem 13.10.1 (Riemann-Roch), using Theorem 13.10.8 (making the con-
nection between the dimension of Ω(G) and l(W − G)) and Corollary 13.10.2
(stating that the degree of a canonical divisor is 2g − 2).
Example 13.11.7 Let L = {α1, . . . , αn} be a set of n distinct elements of F_{q^m}. Let g be a polynomial in F_{q^m}[X] which is not zero at αi for all i. The (classical) Goppa code Γ(L, g) is defined by
Γ(L, g) = {c ∈ Fq^n | ∑_i ci/(X − αi) ≡ 0 (mod g)}.
Let Pi = (αi : 1), Q = (1 : 0) and D = P1 + · · · + Pn. If we take for E the divisor of zeros of g on the projective line, then Γ(L, g) = C*(D, E − Q) and c ∈ Γ(L, g) if and only if
∑_i ci/(X − αi) dX ∈ Ω(E − Q − D).
This is the reason that some authors extend the definition of geometric Goppa codes to subfield subcodes of codes of the form C*(D, G).
It is a well-known fact that the parity check matrix of the Goppa code Γ(L, g) is equal to the following generator matrix of a generalized RS code:

    g(α1)^{-1}            . . .   g(αn)^{-1}
    α1 g(α1)^{-1}         . . .   αn g(αn)^{-1}
    . . .                         . . .
    α1^{r−1} g(α1)^{-1}   . . .   αn^{r−1} g(αn)^{-1}
where r is the degree of the Goppa polynomial g. So Γ(L, g) is the subfield
subcode of the dual of a generalized RS code. This is a special case of the
following theorem.
Theorem 13.11.8 The codes C(D, G) and C∗
(D, G) are dual codes.
Proof. From Theorem 13.11.2 and Theorem 13.11.6 we know that k + k* = n. So it suffices to take a word from each code and show that the inner product of the two words is 0. Let f ∈ L(G), η ∈ Ω(G − D). By Definitions 13.11.1 and 13.11.5, the differential fη has no poles except possibly poles of order 1 in the points P1, P2, . . . , Pn. The residue of fη in Pi is equal to f(Pi) Res_{Pi}(η). By Theorem 13.9.16, the sum of the residues of fη over all the poles, that is to say, over the points Pi, is equal to zero. Hence we have
0 = ∑_{i=1}^{n} f(Pi) Res_{Pi}(η) = ⟨α(f), α*(η)⟩.
Several authors prefer the codes C∗
(D, G) over geometric RS codes but the
nonexperts in algebraic geometry probably feel more at home with polynomials
than with differentials. That this is possible without loss of generality is stated
in the following theorem.
Theorem 13.11.9 Let X be a curve defined over Fq. Let P1, . . . , Pn be n rational points on X. Let D = P1 + · · · + Pn. Then there exists a differential form ω with simple poles at the Pi such that Res_{Pi}(ω) = 1 for all i. Furthermore
C*(D, G) = C(D, W + D − G)
for all divisors G that have a support disjoint from the support of D, where W is the divisor of ω.
So one can do without differentials and the codes C∗
(D, G). However, it is
useful to have both classes when treating decoding methods. These use parity
checks, so one needs a generator matrix for the dual code.
In the next paragraph we treat several examples of algebraic geometry codes.
It is already clear that we find some good codes. For example from Theorem
13.11.2 we see that such codes over a curve of genus 0 (the projective line) are
MDS codes. In fact, Theorem 13.11.2 says that d ≥ n − k + 1 − g, so if g is
small, we are close to the Singleton bound.
13.12 Rational functions and divisors on plane
curves
This section will be finished together with the correction of Section 7.
rational cycles, Frobenius, divisors..... rational functions discrete valuation,
discrete valuation ring.
Example 13.12.1 Consider the curve X with homogeneous equation
X^a Y^c + Y^{b+c} Z^{a−b} + X^d Z^{a+c−d} = 0
with d < b < a as in Example 13.3.10. The divisor of the rational function x/z is
(x/z) = (X · L) − (X · N) = (b + c)P − bQ − cR.
The divisor of the rational function y/z is
(y/z) = (X · M) − (X · N) = dP − aQ − (a − d)R.
Hence the divisor of (x/z)^α (y/z)^β is
((b + c)α + dβ)P + (−bα − aβ)Q + (−cα + (a − d)β)R.
It has only a pole at Q if and only if cα ≤ (a − d)β. (This will serve as a motivation for the choice of the basis of R in Proposition ??.)
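As a consistency check, the divisor of (x/z)^α (y/z)^β must have degree zero, in line with Theorem 13.8.4; a quick symbolic verification (ours):

```python
import sympy as sp

a, b, c, d, alpha, beta = sp.symbols('a b c d alpha beta')

coeff_P = (b + c) * alpha + d * beta
coeff_Q = -b * alpha - a * beta
coeff_R = -c * alpha + (a - d) * beta

assert sp.expand(coeff_P + coeff_Q + coeff_R) == 0   # degree of a principal divisor is 0
```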
13.13 Resolution or normalization of curves
13.14 Newton polygon of plane curves
[?]
13.15 Notes
Goppa submitted his seminal paper [?] in June 1975 and it was published in
1977. Goppa also published three more papers in the eighties [?, ?, ?] and a
book [?] in 1991.
Most of this section is standard textbook material. See for instance [?, ?, ?, ?]
to mention a few. Section 13.4 is a special case of Goppa’s construction and
comes from [?]. The Hermitian curves in Example 13.2.4 and their codes have
been studied by many authors. See [?, ?, ?, ?]. The Klein curve goes back to F.
Klein [?] and has been studied thoroughly, also over finite fields in connection
with codes. See [?, ?, ?, ?, ?, ?, ?, ?].
Chapter 14
Curves
14.1 Algebraic varieties
14.2 Curves
14.3 Curves and function fields
14.4 Normal rational curves and Segre's problems
14.5 The number of rational points
14.5.1 Zeta function
14.5.2 Hasse-Weil bound
14.5.3 Serre’s bound
14.5.4 Ihara’s bound
14.5.5 Drinfeld-Vlăduţ bound
14.5.6 Explicit formulas
14.5.7 Oesterlé's bound
14.6 Trace codes and curves
14.7 Good curves
14.7.1 Maximal curves
14.7.2 Shimura modular curves
14.7.3 Drinfeld modular curves
14.7.4 Tsfasman-Vlăduţ-Zink bound
14.7.5 Towers of Garcia-Stichtenoth
14.8 Applications of AG codes
14.8.1 McEliece crypto system with AG codes
14.8.2 Authentication codes
Here we consider an application of AG-codes to authentication. Recall that in Chapter 10, Section 10.3.1 we started to consider authentication codes that are constructed via almost universal and almost strongly universal hash functions. They, in turn, can be constructed using error-correcting codes. We recall two methods of constructing authentication codes (almost strongly universal hash families, to be precise) from error-correcting codes:
1. Construct AU-families from codes as per Proposition 10.3.7 and then use
Stinson’s composition method, Theorem 10.3.10.
2. Construct ASU-families directly from error-correcting codes.
As an example we mentioned ASU-families constructed as in (1.) using Reed-
Solomon codes, Exercise 10.3.2. Now we would like to move on and present a
general construction of almost universal hash functions that employs AG-codes.
The following proposition formulates the result we need.
Proposition 14.8.1 Let C be an algebraic curve over Fq with N + 1 rational points P0, P1, . . . , PN. Fix P = Pi for some i = 0, . . . , N and let WS(P) = {0, w1, w2, . . . } be the Weierstraß semigroup of P. Then for each j ≥ 1 one can construct an almost universal hash family ε − U(N, q^j, q), where ε ≤ wj/N.
Proof. Indeed, construct an AG-code C = C_L(D, wjP), where the divisor D is defined as D = ∑_{k≠i} Pk and P = Pi. So C is obtained as an image of the evaluation map for the functions that have a pole only at P and its order is bounded by wj. From ?? we have that the length of C is N, dim C = dim L(wjP) = j, and d(C) ≥ N − deg(wjP) = N − wj. So 1 − d(C)/N ≤ wj/N and now the claim easily follows.
As an example of this proposition, we show next how one can obtain AU-families
from Hermitian curves.
Proposition 14.8.2 For every prime power q and every i ≤ q, the Hermitian curve y^q + y = x^{q+1} over F_{q^2} yields an
(i/q^2) − U(q^3, q^{i^2+i}, q^2).
Proof. Recall from ?? that the Hermitian curve over F_{q^2} has q^3 + 1 rational points P1, . . . , P_{q^3}, P∞. Construct C = C_L(D, w_i P), where P = P∞ is the place at infinity, D = ∑_{i=1}^{q^3} Pi, and WS(P) = {0, w1, w2, . . . }. It is known that the Weierstraß semigroup WS(P) is generated by q and q + 1.
Let us show that w_{\binom{i+1}{2}} = iq for all i ≤ q. We proceed by induction. For i = 1 we have w_1 = q, which is obviously true. Then suppose that for some i ≥ 1 we have w_{\binom{i}{2}} = (i − 1)q and want to prove w_{\binom{i+1}{2}} = iq. Clearly, for this we need to show that there are exactly i − 1 non-gaps between (i − 1)q and iq (these numbers themselves are not included in the count). So for the non-gaps aq + b(q + 1) that lie between (i − 1)q and iq we have (i − 1)q < aq + b(q + 1) < iq. Thus, automatically, a < i. We then have
(i − a − 1) q/(q + 1) < b < (i − a) q/(q + 1).   (14.1)
From here we see that 0 ≤ a < i − 1, because for a = i − 1 we would have b < q/(q + 1), which is not possible. So there are i − 1 values of a, namely 0, . . . , i − 2, which could give rise to a non-gap. The interval in (14.1) has length q/(q + 1) < 1, so it may contain at most one integer. If i − a < q + 1, then (i − a − 1)q/(q + 1) < i − a − 1 < (i − a)q/(q + 1), and thus the integer i − a − 1 is always in that interval if i − a < q + 1. But for 0 ≤ a < i − 1, the condition i − a < q + 1 is always fulfilled, since i ≤ q by assumption. Thus for every 0 ≤ a < i − 1, there exists exactly one b = i − a − 1 such that aq + b(q + 1) lies between (i − 1)q and iq. It is also easily seen that all these non-gaps are different. So, indeed, w_{\binom{i+1}{2}} = iq for all i ≤ q.
Now the claim follows from Proposition 14.8.1.
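The induction above is easy to confirm numerically: list the nongaps of the semigroup generated by q and q + 1 in increasing order and check that the one with index \binom{i+1}{2} equals iq. A brute-force sketch (ours):

```python
from math import comb

def positive_nongaps(q, upto):
    """Elements of the semigroup generated by q and q+1 in (0, upto], sorted."""
    S = {a * q + b * (q + 1) for a in range(upto // q + 1)
                             for b in range(upto // (q + 1) + 1)}
    return sorted(s for s in S if 0 < s <= upto)

q = 5
w = [0] + positive_nongaps(q, q * q + q)     # w[j] is the j-th nongap, w[0] = 0
for i in range(1, q + 1):
    assert w[comb(i + 1, 2)] == i * q        # the claim proved above
```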
As a consequence we have
Corollary 14.8.3 Let a, b be positive integers such that b ≤ a ≤ 2b and q^a is a square. Then there exists a
(2/q^b) − SU(q^{5a/2+b}, q^{a·q^{2(a−b)}/2}, q^b).
Proof. Do the "Hermitian" construction from the previous proposition over F_{q^a} with i = q^{a−b}. Then the claim follows from Theorem 10.3.10 and Exercise 10.3.2.
*** Suzuki curves? ***
To get some feeling for all this, the reader is advised to solve Exercise 14.8.1.
Now we move on to (2.). We would like to show the direct construction of Xing et al. ?? that uses AG-codes.
Theorem 14.8.4 Let C be an algebraic curve over Fq of genus g and let R be some set of rational points of C. Let G be a positive divisor such that |R| > deg(G) ≥ 2g + 1 and R ∩ supp(G) = ∅. Then there exists an ε − ASU(N, n, m) with N = q|R|, n = q^{deg(G)−g+1}, m = q, and ε = deg(G)/|R|.
Proof. Consider the set H = {h_{(P,α)} : L(G) → Fq | h_{(P,α)}(f) = f(P) + α, f ∈ L(G)}. Take H as the functions in the definition of an ASU-family; set X = L(G), Y = Fq. Then |X| = q^{dim L(G)} = q^{deg(G)−g+1} = n, because deg(G) ≥ 2g + 1 > 2g − 1.
It can be shown (see Exercise 14.8.2) that if deg(G) ≥ 2g + 1, then |H| = q|R|. It is also easy to see that for any a ∈ L(G) and any b ∈ Fq there exist exactly |R| = |H|/q functions from H that map a to b. This proves the first part of being ASU. As to the second part, consider
m = max_{a1≠a2∈L(G); b1,b2∈Fq} |{h_{(P,α)} ∈ H | h_{(P,α)}(a1) = b1, h_{(P,α)}(a2) = b2}|
  = max_{a1≠a2∈L(G); b1,b2∈Fq} |{(P, α) ∈ R × Fq | (a1 − a2 − b1 + b2)(P) = 0, a2(P) + α = b2}|.
As a1 − a2 ∈ L(G)\{0} and b1 − b2 ∈ Fq we see that a1 − a2 − b1 + b2 ∈ L(G)\{0}. Note that there cannot be more than deg(G) zeros of a1 − a2 − b1 + b2 among the points in R (cf. ??). Since α in (P, α) is uniquely determined by P ∈ R, we see that there are at most deg(G) pairs (P, α) ∈ R × Fq that satisfy both (a1 − a2 − b1 + b2)(P) = 0 and a2(P) + α = b2. In other words,
m ≤ deg(G) = deg(G) · |H| / (|R| · q).
We can now take ε = deg(G)/|R| in Definition 10.3.8.
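The parameters of Theorem 14.8.4 are mechanical to evaluate; a small helper (ours), instantiated with Hermitian data for q = 2 (field size q^2 = 4, |R| = q^3 = 8 remaining rational points, genus g = q(q − 1)/2 = 1):

```python
from fractions import Fraction

def asu_params(q, genus, R_size, degG):
    """(epsilon, N, n, m) of the epsilon-ASU(N, n, m) family of Theorem 14.8.4."""
    assert R_size > degG >= 2 * genus + 1
    return Fraction(degG, R_size), q * R_size, q ** (degG - genus + 1), q

eps, N, n, m = asu_params(q=4, genus=1, R_size=8, degG=3)
assert (eps, N, n, m) == (Fraction(3, 8), 32, 64, 4)
```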
Again we present here a concrete result coming from Hermitian codes.
Corollary 14.8.5 Let q be a prime power and let an integer q^3 > d ≥ q(q − 1) + 1 be given. Then there exists a
(d/q^3) − ASU(q^5, q^{d−q(q−1)/2+1}, q^2).
Proof. Consider again the Hermitian curve over F_{q^2}. Take any rational point P and construct C = C_L(D, G), where D = ∑_{P′≠P} P′ is the sum of all remaining rational points (there are q^3 of them), and G = dP. Then the claim follows directly from the previous theorem.
For a numerical example we refer again to Exercise 14.8.1.
14.8.3 Fast multiplication in finite fields
14.8.4 Correlation sequences and pseudo random sequences
14.8.5 Quantum codes
14.8.6 Exercises
14.8.1 Suppose we would like to obtain an authentication code with P_S = 2^{−20} ≥ P_I and log |S| ≥ 2^{34}. Give the parameters of such an authentication code using the following constructions. Compare the results.
• OA-construction as per Theorem 10.3.5.
• RS-construction as per Exercise 10.3.2.
• Hermitian construction as per Corollary 14.8.3.
• Hermitian construction as per Corollary 14.8.5.
14.8.2 Let H = {h_{(P,α)} : L(G) → Fq | h_{(P,α)}(f) = f(P) + α, f ∈ L(G)} as in the proof of Theorem 14.8.4. Prove that if deg(G) ≥ 2g + 1, then |H| = q|R|.
14.9 Notes
Bibliography
[3] N. Abramson. Information theory and coding. McGraw-Hill, New York,
1963.
[4] A.V. Aho, J.E. Hopcroft, and J.D. Ullman. The design and analysis of computer algorithms. Addison-Wesley, Reading, 1979.
[5] M. Aigner. Combinatorial theory. Springer, New York, 1979.
[6] A. Ashikhmin and A. Barg. Minimal vectors in linear codes. IEEE Trans-
actions on Information Theory, 44(5):2010–2017, 1998.
[7] C.A. Athanasiadis. Characteristic polynomials of subspace arrangements
and finite fields. Advances in Mathematics, 122:193–233, 1996.
[8] A. Barg. The matroid of supports of a linear code. AAECC, 8:165–172,
1997.
[9] A. Barg. Complexity issues in coding theory. In V.S. Pless and W.C.
Huffman, editors, Handbook of coding theory, volume 1, pages 649–754.
North-Holland, Amsterdam, 1998.
[10] E.R. Berlekamp. Key papers in the development of coding theory. IEEE
Press, New York, 1974.
[11] E.R. Berlekamp. Algebraic coding theory. Aegean Park Press, Laguna Hills, 1984.
[12] D.J. Bernstein, J. Buchmann, and E. Dahmen, editors. Post-Quantum
Cryptography. Springer-Verlag, Berlin Heidelberg, 2009.
[13] J. Bierbrauer, T. Johansson, G. Kabatianskii, and B. Smeets. On families
of hash functions via geometric codes and concatenation. In Advances in
Cryptology – CRYPTO ’93. Lecture Notes in Computer Science, volume
773, pages 331–342, 1994.
[14] N. Biggs. Algebraic graph theory. Cambridge University Press, Cambridge,
1993.
[15] E. Biham and A. Shamir. Differential cryptanalysis of DES-like cryp-
tosystems. In Advances in Cryptology – CRYPTO ’90. Lecture Notes in
Computer Science, volume 537, pages 2–21, 1990.
[16] G. Birkhoff. On the number of ways of coloring a map. Proc. Edinburgh
Math. Soc., 2:83–91, 1930.
[17] A. Björner and T. Ekedahl. Subarrangements over finite fields: Cohomological and enumerative aspects. Advances in Mathematics, 129:159–187, 1997.
[18] J.E. Blackburn, N.H. Crapo, and D.A. Higgs. A catalogue of combinatorial
geometries. Math. Comp., 27:155–166, 1973.
[19] R.E. Blahut. Theory and practice of error control codes. Addison-Wesley,
Reading, 1983.
[20] R.E. Blahut. Algebraic codes for data transmission. Cambridge University
Press, Cambridge, 2003.
[21] I.F. Blake. Algebraic coding theory: History and development. Dowden,
Hutchinson and Ross, Stroudsburg, 1973.
[22] G.R. Blakely. Safeguarding cryptographic keys. In Proceedings of 1979
national Computer Conf., pages 313–317, New York, 1979.
[23] G.R. Blakely and C. Meadows. Security of ramp schemes. In Advances in
Cryptology – CRYPTO ’84. Lecture Notes in Computer Science, volume
196, pages 242–268, 1985.
[24] A. Blass and B.E. Sagan. Möbius functions of lattices. Advances in Mathematics, 129:94–123, 1997.
[25] T. Britz. MacWilliams identities and matroid polynomials. The Electronic
Journal of Combinatorics, 9:R19, 2002.
[26] T. Britz. Relations, matroids and codes. PhD thesis, Univ. Aarhus, 2002.
[27] T. Britz. Extensions of the critical theorem. Discrete Mathematics,
305:55–73, 2005.
[28] T. Britz. Higher support matroids. Discrete Mathematics, 307:2300–2308,
2007.
[29] T. Britz and C.G. Rutherford. Covering radii are not matroid invariants.
Discrete Mathematics, 296:117–120, 2005.
[30] T. Britz and K. Shiromoto. A MacWilliams type identity for matroids. Discrete Mathematics, 308:4551–4559, 2008.
[31] T. Brylawski. A decomposition for combinatorial geometries. Transactions
of the American Mathematical Society, 171:235–282, 1972.
[32] T. Brylawski and J. Oxley. Intersection theory for embeddings of matroids
into uniform geometries. Stud. Appl. math., 61:211–244, 1979.
[33] T. Brylawski and J. Oxley. Several identities for the characteristic polyno-
mial of a combinatorial geometry. Discrete Mathematics, 31(2):161–170,
1980.
[34] T.H. Brylawski and J.G. Oxley. The Tutte polynomial and its applications. In N. White, editor, Matroid Applications. Cambridge University Press, Cambridge, 1992.
[35] J. Buchmann. Introduction to Cryptography. Springer, Berlin, 2004.
[36] J.P. Buhler, H.W. Lenstra Jr., and C. Pomerance. Factoring integers with
the number field sieve. In A.K. Lenstra and H.W. Lenstra Jr., editors,
The development of the number field sieve. Lecture Notes in Computer
Science, volume 1554, pages 50–94. Springer, Berlin, 1993.
[37] L. Carlitz. The arithmetic of polynomials in a Galois field. American Journal of Mathematics, 54:39–50, 1932.
[38] P. Cartier. Les arrangements d'hyperplans: un chapitre de géométrie combinatoire. Séminaire N. Bourbaki, 561:1–22, 1981.
[39] H. Chen and R. Cramer. Algebraic geometric secret sharing schemes and
secure multi-party computations over small fields. In C. Dwork, editor,
Advances in Cryptology – CRYPTO 2006. Lecture Notes in Computer
Science, volume 4117, pages 521–536. Springer, Berlin, 2006.
[40] C. Cid and H. Gilbert. AES security report,
ECRYPT, IST-2002-507932. Available online at
http://www.ecrypt.eu.org/ecrypt1/documents/D.STVL.2-1.0.pdf.
[41] H. Cohen and G. Frey. Handbook of elliptic and hyperelliptic curve cryp-
tography. CRC Press, Boca Raton, 2006.
[42] H. Crapo. Möbius inversion in lattices. Archiv der Mathematik, 19:595–607, 1968.
[43] H. Crapo. The Tutte polynomial. Aequationes Math., 3:211–229, 1969.
[44] H. Crapo and G.-C. Rota. On the foundations of combinatorial theory:
Combinatorial geometries. MIT Press, Cambridge MA, 1970.
[45] J. Daemen and V. Rijmen. The design of Rijndael. Springer, Berlin, 2002.
[46] J. Daemen and V. Rijmen. The wide trail design strategy. In B. Honary, editor, Cryptography and Coding 2001. Lecture Notes in Computer Science, volume 2260, pages 222–238. Springer, Berlin, 2001.
[47] W. Diffie. The first ten years of public key cryptography. In J. Simmons,
editor, Contemporary Cryptology: The Science of Information Integrity,
pages 135–176. IEEE Press, Piscataway, 1992.
[48] W. Diffie and M.E. Hellman. New directions in cryptography. IEEE Trans.
Inform. Theory, 22:644–654, 1976.
[49] J. Ding, J.E. Gower, and D.S. Schmidt. Multivariate Public Key Cryp-
tosystems. Advances in Information Security. Springer Science+Business
Media, LLC, 2006.
[50] J.L. Dornstetter. On the equivalence of the Berlekamp-Massey and the
Euclidean algorithms. IEEE Trans. Inform. Theory, 33:428–431, 1987.
[51] W.M.B. Dukes. On the number of matroids on a finite set. Séminaire Lotharingien de Combinatoire, 51, 2004.
[52] I. Duursma. Algebraic geometry codes: general theory. In E. Martínez-Moro, C. Munuera, and D. Ruano, editors, Advances in algebraic geometry codes, pages 1–48. World Scientific, New Jersey, 2008.
[53] I.M. Duursma. Decoding codes from curves and cyclic codes. PhD thesis,
Eindhoven University of Technology, 1993.
[54] I.M. Duursma and R. K¨otter. Error-locating pairs for cyclic codes. IEEE
Trans. Inform. Theory, 40:1108–1121, 1994.
[55] T. ElGamal. A public key cryptosystem and a signature scheme based on
discrete logarithms. IEEE Trans. Inform. Theory, 31:469–472, 1985.
[56] G. Etienne and M. Las Vergnas. Computing the Tutte polynomial of a
hyperplane arrangement. Advances in Applied Mathematics, 32:198–211,
2004.
[57] L. Euler. Solutio problematis ad geometriam situs pertinentis. Commen-
tarii Academiae Scientiarum Imperialis Petropolitanae, 8:128–140, 1736.
[58] E.N. Gilbert, F.J. MacWilliams, and N.J.A. Sloane. Codes which detect deception. Bell Sys. Tech. J., 53(3):405–424, 1974.
[59] C. Greene. Weight enumeration and the geometry of linear codes. Studies
in Applied Mathematics, 55:119–128, 1976.
[60] C. Greene and T. Zaslavsky. On the interpretation of whitney numbers
through arrangements of hyperplanes, zonotopes, non-radon partitions
and orientations of graphs. Trans. Amer. Math. Soc., 280:97–126, 1983.
[61] R.W. Hamming. Error detecting and error correcting codes. Bell System
Techn. Journal, 29:147–160, 1950.
[62] R.W. Hamming. Coding and Information Theory. Prentice-Hall, New
Jersey, 1980.
[63] T. Helleseth, T. Kløve, and J. Mykkeltveit. The weight distribution of irreducible cyclic codes with block lengths n1((q^l − 1)/n). Discrete Mathematics, 18:179–211, 1977.
[64] M. Hermelin and K. Nyberg. Correlation properties of the Bluetooth combiner generator. In Dan Boneh, editor, Information Security and Cryptology ICISC 1999. Lecture Notes in Computer Science, volume 1787, pages 17–29. Springer, Berlin, 2000.
[65] A.E. Heytmann and J.M. Jensen. On the equivalence of the Berlekamp-
Massey and the Euclidean algorithm for decoding. IEEE Trans. Inform.
Theory, 46:2614–2624, 2000.
[66] L.J. Hoffman. Modern methods for computer security and privacy.
Prentice-Hall, New Jersey, 1977.
[67] W.C. Huffman and V.S. Pless. Fundamentals of error-correcting codes.
Cambridge University Press, Cambridge, 2003.
[68] R.P.M.J. Jurrius. Classifying polynomials of linear codes. Master’s thesis,
Leiden University, 2008.
[69] J. Justesen. On the complexity of decoding Reed-Solomon codes. IEEE
Trans. Inform. Theory, 22:237–238, 1976.
[70] E.D. Karnin, J.W. Greene, and M.E. Hellman. On secret sharing systems.
IEEE Trans. Inform. Theory, 29(1):35–31, 1983.
[71] T. Kløve. The weight distribution of linear codes over GF(q^l) having generator matrix over GF(q). Discrete Mathematics, 23:159–168, 1978.
[72] T. Kløve. Support weight distribution of linear codes. Discrete Mathematics, 106/107:311–316, 1992.
[73] D.E. Knuth. The asymptotic number of geometries. J. Comb. Theory Ser.
A, 16:398–400, 1974.
[74] R. Lidl and H. Niederreiter. Introduction to finite fields and their appli-
cations. Cambridge University Press, Cambridge, 1994.
[75] S. Lin and D.J. Costello. Error control coding : fundamentals and appli-
cations. Prentice-Hall, New Jersey, 1983.
[76] J.H. van Lint. Mathematics and the compact disc. Nieuw Archief voor
Wiskunde, 16:183–190, 1998.
[77] David MacKay. Information theory, inference and learning algorithms.
Cambridge University Press, Cambridge, 2003.
[78] F.J. MacWilliams and N.J.A. Sloane. The theory of error-correcting codes.
Elsevier Sc. Publ., New York, 1977.
[79] J.L. Massey. Shift-register synthesis and BCH decoding. IEEE Trans.
Inform. Theory, 15:122–127, 1969.
[80] J.L. Massey. Minimal codewords and secret sharing. In In Proc. Sixth
Joint Swedish-Russian Workshop on Information theory, Molle, Sweden,
pages 276–279, 1993.
[81] J.L. Massey. On some applications of coding theory. In Cryptography,
Codes and Ciphers: Cryptography and Coding IV, pages 33–47. 1995.
[82] M. Matsui. Linear cryptanalysis method for DES cipher. In T. Helleseth,
editor, Advances in Cryptology - EUROCRYPT 1993. Lecture Notes in
Computer Science, volume 765, pages 386–397. Springer, Berlin, 1994.
[83] K.S. McCurley. A key distribution system equivalent to factoring. Journal
of Cryptology, 1:95–105, 1988.
[84] R.J. McEliece. The theory of information and coding. Addison-Wesley
Publ. Comp., Reading, 1977.
[85] R.J. McEliece and D.V. Sawate. On sharing secrets and Reed-Solomon
codes. Communications of ACM, 24:583–584, 1981.
[86] R.J. McEliece and L. Swanson. Reed-Solomon codes and the exploration
of the solar system. In S.B. Wicker and V.K. Bhargava, editors, Reed-
Solomon codes and their applications, pages 25–40. IEEE Press, New York,
1994.
[87] A. Menezes, P. van Oorschot, and S. Vanstone. Handbook of ap-
plied cryptography. CRC Press, Boca Raton, 1996. Available online
http://www.cacr.math.uwaterloo.ca/hac/.
[88] C.J. Mitchell, F. Piper, and P. Wild. Digital signatures. In J. Simmons,
editor, Contemporary Cryptology: The Science of Information Integrity,
pages 325–378. IEEE Press, New York, 1992.
[89] J. Nechvatal. Public key cryptography. In J. Simmons, editor, Contem-
porary Cryptology: The Science of Information Integrity, pages 177–288.
IEEE Press, New York, 1992.
[90] National Institute of Standards and Technology. Data encryp-
tion standard (DES). Federal Information Processing Stan-
dards Publication, National Technical Information Service,
Springfield, VA, Apr., reaffirmed version available online at
http://csrc.nist.gov/publications/fips/fips46-3/fips46-3.pdf,
46(3), 1977.
[91] National Institute of Standards and Technology. Advanced en-
cryption standard (AES). Federal Information Standards Publication
http://www.csrc.nist.gov/publications/fips/fips197/fips-197.pdf,
197(26), 2001.
[92] P. Orlik and H. Terao. Arrangements of hyperplanes, volume 300.
Springer-Verlag, Berlin, 1992.
[93] W.W. Peterson and E.J. Weldon. Error-correcting codes. MIT Press, Cambridge, 1972.
[94] J. Pieprzyk and X.M. Zhang. Ideal threshold schemes from MDS codes. In G. Goos, J. Hartmanis, and J. van Leeuwen, editors, Information Security and Cryptology ICISC 2002. Lecture Notes in Computer Science, volume 2587, pages 269–279. Springer, Berlin, 2003.
[95] V.S. Pless and W.C. Huffman. Handbook of coding theory. Elsevier Sc.
Publ., New York, 1998.
[96] C. Pomerance. Factoring. In C. Pomerance, editor, Cryptology and Com-
putational Number Theory, volume 42, pages 27–47. American Mathemat-
ical Society, Rhode Island, 1990.
[97] M. Rabin. Digitalized signatures and public-key functions as intractable
as factorization. MIT/LCS/TR-212, 1979.
[98] C.T. Retter. Bounds on Goppa codes. IEEE Trans. Inform. Theory,
22(4):476–482, 1976.
[99] T. Richardson and R. Urbanke. Modern coding theory. Cambridge Uni-
versity Press, Cambridge, 2008.
[100] R.L. Rivest, A. Shamir, and L.M. Adleman. A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM, 21:120–126, 1978.
[101] G.-C. Rota. On the foundations of combinatorial theory I: Theory of Möbius functions. Zeit. für Wahrsch., 2:340–368, 1964.
[102] R.A. Rueppel. Analysis and Design of Stream Ciphers. Springer-Verlag,
Berlin, 1986.
[103] R. Safavi-Naini, H. Wang, and C. Xing. Linear authentication codes:
Bounds and constructions. In C. Pandu Rangan and C. Ding, editors,
Advances in Cryptology - INDOCRYPT 2001. Lecture Notes in Computer
Science, volume 2247, pages 127–135. Springer, Berlin, 2001.
[104] D. Sarwate. On the complexity of decoding Goppa codes. IEEE Trans.
Inform. Theory, 23:515–516, 1977.
[105] K.A. Schouhamer Immink. Reed-Solomon codes and the compact disc. In
S.B. Wicker and V.K. Bhargava, editors, Reed-Solomon codes and their
applications, pages 41–59. IEEE Press, New York, 1994.
[106] A. Shamir. How to share a secret. Communications of ACM, 22:612–613,
1979.
[107] C.E. Shannon. A mathematical theory of communication. Bell Syst. Techn. Journal, 27:379–423, 623–656, 1948.
[108] P. Shor. Polynomial time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM Journal on Computing, 26:1484–1509, 1997.
[109] J. Simonis. The effective length of subcodes. AAECC, 5:371–377, 1993.
[110] A.N. Skorobogatov. Linear codes, strata of Grassmannians, and the problems of Segre. In H. Stichtenoth and M.A. Tsfasman, editors, Coding Theory and Algebraic Geometry, Lecture Notes Math. vol 1518, pages 210–223. Springer-Verlag, Berlin, 1992.
[111] D. Slepian. Key papers in the development of information theory. IEEE
Press, New York, 1974.
[112] M.E. Smid and D.K. Branstad. The data encryption standard: Past and
future. In J. Simmons, editor, Contemporary Cryptology: The Science of
Information Integrity, pages 43–64. IEEE Press, New York, 1992.
[113] R.P. Stanley. Enumerative combinatorics, vol. 1. Cambridge University
Press, Cambridge, 1997.
[114] R.P. Stanley. An introduction to hyperplane arrangements. In Geomet-
ric combinatorics, IAS/Park City Math. Ser., 13, pages 389–496. Amer.
Math. Soc., Providence, RI, 2007.
[115] D.R. Stinson. The combinatorics of authentication and secrecy. Journal
of Cryptology, 2:23–49, 1990.
[116] D.R. Stinson. Combinatorial characterization of authentication codes.
Designs, Codes and Cryptography, 2:175–187, 1992.
[117] D.R. Stinson. Cryptography, theory and practice. CRC Press, Boca Raton,
1995.
[118] Y. Sugiyama, M. Kasahara, S. Hirasawa, and T. Namekawa. A method
for solving the key equation for decoding Goppa codes. Information and
Control, 27:87–99, 1975.
[119] W.T. Tutte. A ring in graph theory. Proc. Cambridge Philos. Soc., 43:26–
40, 1947.
[120] W.T. Tutte. An algebraic theory of graphs. PhD thesis, Univ. Cambridge,
1948.
[121] W.T. Tutte. A contribution to the theory of chromatic polynomials. Cana-
dian Journal of Mathematics, 6:80–91, 1954.
[122] W.T. Tutte. Matroids and graphs. Transactions of the American Mathe-
matical Society, 90:527–552, 1959.
[123] W.T. Tutte. Lectures on matroids. J. Res. Natl. Bur. Standards, Sect. B,
69:1–47, 1965.
[124] W.T. Tutte. On the algebraic theory of graph coloring. J. Comb. Theory,
1:15–50, 1966.
[125] W.T. Tutte. On dichromatic polynomials. J. Comb. Theory, 2:301–320,
1967.
[126] W.T. Tutte. Cochromatic graphs. J. Comb. Theory, 16:168–174, 1974.
[127] W.T. Tutte. Graph-polynomials. Advances in Applied Mathematics, 32:5–9, 2004.
[128] J.H. van Lint and R. M. Wilson. A course in combinatorics. Cambridge
University Press, Cambridge, 1992.
[129] H. Whitney. The colorings of graphs. Ann. Math., 33:688–718, 1932.
[130] H. Whitney. A logical expansion in mathematics. Bull. Amer. Math. Soc.,
38:572–579, 1932.
[131] H. Whitney. On the abstract properties of linear dependence. Amer. J.
Math., 57:509–533, 1935.
[132] G. Whittle. A characterization of the matroids representable over GF(3) and the rationals. J. Comb. Theory Ser. B, 65(2):222–261, 1995.
[133] G. Whittle. On matroids representable over GF(3) and other fields. Trans. Amer. Math. Soc., 349(2):579–603, 1997.
[134] S.B. Wicker. Deep space applications. In V.S. Pless and W.C. Huffman,
editors, Handbook of coding theory, volume 2, pages 2119–2169. North-
Holland, Amsterdam, 1998.
[135] S.B. Wicker and V.K. Bhargava. Reed-Solomon codes and their applica-
tions. IEEE Press, New York, 1994.
[136] R.J. Wilson and J.J. Watkins. Graphs: An introductory approach. J. Wiley & Sons, New York, 1990.
[137] A.C.-C. Yao. Protocols for secure computations. Extended Abstract, 23rd
Annual Symposium on Foundations of Computer Science, FOCS 1982,
pages 160–164, 1982.
[138] J. Yuan and C. Ding. Secret sharing schemes from three classes of linear
codes. IEEE Trans. Inform. Theory, 52(1):206–212, 2006.
[139] T. Zaslavsky. Facing up to arrangements: Face-count formulas for partitions of space by hyperplanes. Mem. Amer. Math. Soc. vol. 1, No. 154, Amer. Math. Soc., 1975.
[140] T. Zaslavsky. Signed graph colouring. Discrete. Math., 39:215–228, 1982.
Index
action, 40
adjacent, 123
Adleman, 311
AES, 302
algebra, 14
algorithm
APGZ, 233
basic, 273
efficient, 35
Euclidean, 191, 277
Sudan, 286
Sugiyama, 277
alphabet, 18
anti-symmetric, 136
arc, 95
Arimoto, 233
arrangement, 98
central, 98
essential, 98
graphic, 127
simple, 98
array
orthogonal, 159, 160
linear, 160
atom, 141
atomic, 141
attack
adaptive chosen-ciphertext, 300
adaptive chosen-plaintext, 300
chosen-plaintext, 300
ciphertext
chosen, 300
ciphertext-only, 300
known-plaintext, 300
related-key, 300
authentication code, 318
MAC, 317
message, 317
automorphism, 41
monomial, 42
permutation, 42
axiom
circuit elimination, 130
independence augmentation, 128
balanced, 84
ball, 19
basis, 22
external, 131
internal, 131
Berlekamp, 334
bilinear, 30
binary cipher, 329
binomial
Gaussian, 39, 93
Bose, 214
bound
AB, 223
BCH, 214
Bush, 161
Gilbert, 72
Gilbert-Varshamov, 73
greatest lower, 139
Griesmer, 68
Hamming, 69
HT, 220
Johnson, 282
least upper, 139
Plotkin, 71
Roos, 222
shift, 225
Singleton, 63
sphere-covering, 70
sphere-packing, 69
Varshamov, 73
redundancy, 74
box
delay, 330
S, 304
Brickell, 317
bridge, 124
broken
partially, 300
totally, 300
capacity, 14, 37
error-correcting, 34, 282
chain
extension of, 142
maximal, 142
of length, 137
channel
binary symmetric, 36
q-ary symmetric, 36
character, 85
principal, 85
characteristic, 201
Chaudhuri, 214
Chinese remainder, 207
cipher
alphabetic substitution, 298
block, 296
Caesar, 296
Feistel, 302
iterative block, 302
permutation, 297
self-synchronizing stream, 330
stream, 329
substitution, 296, 298
transposition, 297
Vigenère, 298
ciphertext, 295, 296
confusion, 298
diffusion, 297
circuit, 130
class
parallel, 91
closed
algebraically, 202
closure, 145
cocircuit, 130
code
alternant, 259
augmented, 51
BCH, 215
narrow sense, 215
primitive, 215
block, 18
Cauchy, 67
generalized, 67
concatenated, 61
constant weight, 72
convolutional, 18
cycle, 127
degenerate, 93
dual, 30
complementary, 31
self, 31
weakly self, 31
error-correcting, 13
expurgated, 52
extended, 48
extension by scalars, 104, 251
Gallager, 127
genus, 102
geometric Goppa, 422
Golay, 60
ternary, 32
Goppa, 260
graph, 127
sparse, 127
Tanner, 127
Hamming, 22, 29
hull, 31
inner, 61
lengthened, 52
linear, 21
low-density parity check, 127
maximum distance separable, 64
MDS, 64
Melas, 228
orthogonal, 30, 31
self, 31
outer, 61
projective, 94
punctured, 47, 107
reduced, 145
Reed-Muller, 266
Reed-Solomon, 241
extended, 243
generalized, 66, 243
geometric, 422
residual, 68
restricted, 47
restriction by scalars, 214, 251
reverse, 196
shortened, 50, 107
simplex, 29
sub, 22
even weight, 22
subfield, 214, 251
super, 22, 214, 251
trivial, 23
Zetterberg, 228
codeword, 18
minimal, 156
nearest, 34
codimension, 92
coding
source, 13
color, 124
combinatorics, 14
comparable, 136
complexity
data, 301
implementation, 302
linear, 334
component
connected, 127
compression
data, 13
computation
secure multi-party, 349
conjecture
MDS, 104
connected, 126
connection, 331
consecutive, 216
constant
S-box, 306
constraints, 160
construction
(a + x|b + x|a + b − x), 58
(u + v|u − v), 57
(u|u + v), 56
(u|v), 56
contraction
matroid, 133
coordinate
homogeneous, 91
coordinates
homogeneous
standard, 92
correlation immune, 334
coset, 35, 191
cyclotomic, 208
coset leader, 35
cover, 139
Cramer, 234
cryptanalysis
algebraic, 307
cryptography
multivariate, 316
cryptosystem, 295
asymmetric, 296, 309
knapsack, 316
public, 296
RSA, 310, 311
symmetric, 295
cycle, 126
cyclic, 43, 189
Daemen, 302
decision
hard, 16
soft, 16
decoder, 34
complete, 34
coset leader, 35
half the minimum distance, 34
incomplete, 15
list, 16, 35, 281
minimum distance, 34
nearest neighbor, 34
decoding
correct, 34
decryption, 296
El Gamal, 315
McEliece, 337
Niederreiter, 339
RSA, 312
deletion, 149
matroid, 133
Delsarte, 258
demodulation, 16
dependent, 128
depth, 159
derivation, 417
derivative
formal, 202
DES, 302
triple, 346
detection
error, 26
diagram
Hasse, 139, 142
Diffie, 309, 316
dimension, 21
distance
bounded, 34
designed minimum, 215
Hamming, 18
minimum, 19
distribution
weight, 79
division with rest, 191
divisor, 415
canonical, 418
degree, 415
effective, 415
greatest common, 191
principal, 416
support, 415
DLP, 310
domain
Euclidean, 191
dual, 30, 93
self
formally, 88
quasi, 44
edge, 123
acyclic, 124
El Gamal, 314
Elias, 281
elimination
Gaussian, 24
embedded, 92
encoder, 18
systematic, 25
encoding, 23, 195
encryption, 296
AES, 307
confusion, 304
DES, 304
diffusion, 304
El Gamal, 315
McEliece, 337
Niederreiter, 339
RSA, 312
end, 123
entropy function, 75
enumerator
weight, 79
average, 84
coset leader, 35
extended, 105
generalized, 113
homogeneous, 79
equality
modular, 142
equation
Key, 236
equivalent, 41
generalized, 42
monomial, 42
permutation, 42
error, 34
decoding, 15, 34
number of, 27, 34
undetected, 89
Euclid, 191, 277
EuclidSugiyama, 277
Euler, 206
evaluation, 243
expand, 302
explicit, 26, 91
exponent
universal, 317
failure
decoding, 15, 34
family
almost universal, 322
feedback, 331
Feistel, 302
field
Galois, 203
prime, 201
splitting, 202
sub, 201
finite
locally, 137
flat, 145
force
brute, 301
form
differential, 417
formula
closed, 234
deletion-contraction, 126, 134
deletion-restriction, 149
Möbius inversion, 138
Stirling, 75
Forney, 236
Fourier, 229
free
square, 261
Frobenius, 254
function
Euler’s phi, 140
M¨obius, 137
one-way, 310
trapdoor, 310
state, 330
sum, 138
symmetric, 231
zeta, 157
Galois, 254
gap, 420
Weierstrass, 420
Gauss, 24, 39
generator
clock-controlled, 334
nonlinear combination, 334
nonlinear filter, 334
shrinking, 335
generic, 234
genus, 418
Gilbert, 72, 73
girth, 126
Golay, 60
good
asymptotically, 75
Goppa, 260
Gorenstein, 233
graph, 123
coloring, 124
connected, 127
contraction, 126
deletion, 126
planar, 124
simple, 123
greatest common divisor, 277
Griesmer, 68
group
automorphism, 42
monomial, 42
permutation, 42
dihedral, 197
Galois, 254
general linear, 38
symmetric, 40
Hamming, 14, 16, 18
Hartmann, 220
Hellman, 309, 316
hierarchy
weight, 111
Hocquenghem, 214
hyperplane, 92, 97
homogeneous, 97
projective, 92
ideal, 191
generated, 191
identity
generalized Newton, 231
MacWilliams, 85
image, 40
impersonation, 318
implicit, 26, 91
incident, 92, 123
independent, 224
index, 160
inequality
semimodular, 142
triangle, 19
information theory, 14
inner product, 30
interpolation
Lagrange, 245
interval, 136
invariant, 40, 43, 254
permutation, 43
inversion, 306
isometry, 40
linear, 40
isomorphic
graph, 124
matroid, 129
poset, 141
isthmus
graph, 124
matroid, 129
Jefferson cylinder, 345
Johnson, 282
join, 139
juxtaposition, 56
Kasiski method, 299
kernel, 272
key
decryption, 296, 310
dual, 305
encryption, 296, 310
private, 309
public, 309
schedule, 302
secret, 296
size, 302
weak, 297, 304
semi, 305
key generation
El Gamal, 314
Niederreiter, 339
RSA, 312
key space, 310
keyspace, 318
keystream, 329
Lagrange, 245
lattice, 139
free, 147
geometric, 141
modular, 141
of flats, 145
rank, 141
semimodular, 141
super solvable, 147
Leibniz, 202
length, 18
level, 142
levels, 160
LFSR, 329
line
affine, 90
at infinity, 91
parallel, 91
projective, 90
lines, 91
Lint, van, 223
locator, 230
error, 233, 273
logarithm, 203
Zech, 203
loop, 144
graph, 123
matroid, 128
Möbius, 137
MacWilliams, 85
map
F2-linear, 306
authentication, 318
evaluation, 242, 243
Frobenius, 254
trace, 258
Massey, 334
matrix
Cauchy, 67
generalized, 67
diagonal, 40
generator, 23
incidence, 127
monomial, 40
parity check, 26
permutation, 40
reduced, 145
syndrome, 227, 272
matroid, 128
basis, 128
cographic, 131
cycle, 130
dimension, 128
dual, 129
free, 128
graphic, 131
rank, 128
realizable, 129
representable, 129
simple, 128, 144
uniform, 128
Mattson, 229
maximum, 138
McEliece, 295
meet, 139
Melas, 228
memory, 301
Merkle, 316
message, 18
method
Stinson’s composition, 323
minimum, 138
modulation, 16
monotone, 141
strictly, 141
morphism
graph, 124
matroid, 129
Muller, 266
neighbor, 123
Newton, 201, 231
Niederreiter, 295
nondegenerate, 30
nongap, 420
operations
elementary row, 24
order, 158, 159
linear, 136
partial, 136
output, 329
pad
one-time, 329
pair
error-correcting, 274
parallel, 144
graph, 123
matroid, 128
parameter, 21
parametric, 91
parity check, 26
path, 126
pay-off, 319
perfect, 70
period, 297, 333
periodic, 333
ultimately, 333
permutation
substitution, 305
Peterson, 233
pivots, 25
plaintext, 295, 296
plane
affine, 91
projective, 91
Plotkin, 71
point
anti-fixed, 305
at infinity, 91
fixed, 305
rational, 157
points, 91
polynomial
characteristic, 146, 331
two variable, 146
chromatic, 124
coboundary, 146
cyclotomic, 207
error-evaluator, 236
error-locator, 233
generator, 193
locator, 230
Möbius, 146
Mattson-Solomon, 229
minimal, 205
parity check, 197
Poincaré, 146
syndrome, 235
Tutte, 131
Whitney rank generating, 131
poset, 136
position
error, 34, 233
general, 95
primitive, 203
cryptographic, 296
principle
inclusion/exclusion, 99
of inclusion/exclusion, 140
probabilistic, 14
probability
correct decoding, 36
cross-over, 36
deception, 318
decoding error, 36
decoding failure, 36
error, 14, 36
retransmission, 89
problem
Diffie-Hellman, 316
discrete logarithm, 310, 314
DLP, 310, 314
multivariate quadratic, 316
RSA, 313
subset sum, 316
processing, 301
product, 202
direct, 53
Kronecker, 53
star, 220, 271
tensor, 53
projective plane, 92
property
Jordan-Hölder, 142
pseudorandom, 334
quotient, 191
radius
covering, 69
decoding, 34, 282
rate
error, 36
information, 14, 18
rational, 91
reciprocal, 197
monic, 197
redundant, 13, 18
Reed, 241, 266
reflexive, 136
register
linear feedback shift, 329
relation
reverse, 139
representation
exponential, 203
principal, 201
residue, 419
restriction, 149
retransmission, 26, 89
reverse, 196
Riemann, 419
Rijmen, 302
Rijndael, 302
ring
factor, 191
Rivest, 311
Roch, 419
row reduced echelon form, 24
scheme
El Gamal, 310
Rabin, 310
secret sharing, 325
ideal, 349
perfect, 348
threshold, 325
secrecy
perfect, 301
security
computational, 301
provable, 301
state-of-the-art, 301
unconditional, 301
seed, 329
semigroup
Weierstrass, 420
sequence, 333
initial state, 331
linear recurring, 330
homogeneous, 331
maximal period, 333
super increasing, 316
set
check, 28
complete defining, 211
defining, 211
generating, 221
information, 25
root, 211
shift, 220
zero, 211
Shamir, 311, 317, 325
Shannon, 14, 37
Shestakov, 343
shift
cyclic, 189
Sidelnikov, 258, 343
sieve
number field, 347
Singleton, 63
size, 160
Solomon, 229, 241
space, 92
ciphertext, 295, 309
key, 296
message, 318
null, 273
plaintext, 295, 309
projective, 92
spectrum
weight, 79
sphere, 19
split-knowledge scheme, 328
square
Latin, 158
Greek, 159
mutually orthogonal, 159
standard
advanced encryption, 302
AES, 302
data encryption, 302
DES, 302
state, 329
initial, 330
source, 318
Stinson, 323
storage, 301
strategy
wide trail, 307, 346
strength, 160
structure
access, 349
subcode
minimal, 156
subgraph, 126
subset
independent
maximal, 128
subspace
affine, 92
substitution, 298, 318
Sudan, 281, 286
Sugiyama, 277
sum
direct, 56
support, 21, 111
symmetric, 30
syndrome, 27, 227
known, 232
system
projective, 93
equivalent, 94
simple, 93
systematic, 25, 196
tag
authentication, 318
time, 301
transform
discrete Fourier, 229
transformation
decryption, 296
encryption, 296
fractional, 248
projective, 94
round, 302
transitive, 136
trapdoor, 310
trivial, 189
Tzeng, 220
value, 230
error, 34
Vandermonde, 65
variety
affine, 157
Varshamov, 73
vector
error, 27, 34
Venn diagram, 16
vertex, 123
Weierstrass, 420
weight, 21, 111
constant, 30
even, 22
generalized, 111
minimum, 21
Whitney number, 148
first kind, 148
second kind, 148
Wilson, 223
word
message, 18
source, 18
Wozencraft, 281
Zetterberg, 228
Zierler, 233

Error correcting codes and cryptology

  • 1. Error-correcting codes and cryptology Ruud Pellikaan 1 , Xin-Wen Wu 2 , Stanislav Bulygin 3 and Relinde Jurrius 4 PRELIMINARY VERSION 23 January 2012 All rights reserved. To be published by Cambridge University Press. No part of this manuscript is to be reproduced without written consent of the authors and the publisher. 1 [email protected], Department of Mathematics and Computing Science, Eind- hoven University of Technology, P.O. Box 513, NL-5600 MB Eindhoven, The Nether- lands 2 x.wu@griffith.edu.au, School of Information and Communication Technology, Grif- fith University, Gold Coast, QLD 4222, Australia 3 [email protected], Department of Mathematics, Technische Universit¨at Darmstadt, Mornewegstrasse 32, 64293 Darmstadt, Germany 4 [email protected], Department of Mathematics and Computing Science, Eind- hoven University of Technology, P.O. Box 513, NL-5600 MB Eindhoven, The Nether- lands
  • 2. 2
  • 3. Contents 1 Introduction 11 1.1 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2 Error-correcting codes 13 2.1 Block codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 2.1.1 Repetition, product and Hamming codes . . . . . . . . . . 15 2.1.2 Codes and Hamming distance . . . . . . . . . . . . . . . . 18 2.1.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.2 Linear Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.2.1 Linear codes . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.2.2 Generator matrix and systematic encoding . . . . . . . . 22 2.2.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 2.3 Parity checks and dual code . . . . . . . . . . . . . . . . . . . . . 26 2.3.1 Parity check matrix . . . . . . . . . . . . . . . . . . . . . 26 2.3.2 Hamming and simplex codes . . . . . . . . . . . . . . . . 28 2.3.3 Inner product and dual codes . . . . . . . . . . . . . . . . 30 2.3.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 2.4 Decoding and the error probability . . . . . . . . . . . . . . . . . 33 2.4.1 Decoding problem . . . . . . . . . . . . . . . . . . . . . . 34 2.4.2 Symmetric channel . . . . . . . . . . . . . . . . . . . . . . 35 2.4.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 2.5 Equivalent codes . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 2.5.1 Number of generator matrices and codes . . . . . . . . . . 38 2.5.2 Isometries and equivalent codes . . . . . . . . . . . . . . . 40 2.5.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 2.6 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 3 Code constructions and bounds 47 3.1 Code constructions . . . . . . . . . . . . . . . . . . . . . . . . . . 47 3.1.1 Constructing shorter and longer codes . . . . . . . . . . . 47 3.1.2 Product codes . . . . . . . . . . . . . . . . . . . . . . . . 52 3.1.3 Several sum constructions . . . . . . . . . . . . . . . . . . 55 3.1.4 Concatenated codes . . . . . . . . . . . . . . . . . . . . . 60 3.1.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 3.2 Bounds on codes . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 3.2.1 Singleton bound and MDS codes . . . . . . . . . . . . . . 63 3.2.2 Griesmer bound . . . . . . . . . . . . . . . . . . . . . . . 68 3.2.3 Hamming bound . . . . . . . . . . . . . . . . . . . . . . . 69 3
  • 4. 4 CONTENTS 3.2.4 Plotkin bound . . . . . . . . . . . . . . . . . . . . . . . . 71 3.2.5 Gilbert and Varshamov bounds . . . . . . . . . . . . . . . 72 3.2.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 3.3 Asymptotically good codes . . . . . . . . . . . . . . . . . . . . . 74 3.3.1 Asymptotic Gibert-Varshamov bound . . . . . . . . . . . 74 3.3.2 Some results for the generic case . . . . . . . . . . . . . . 77 3.3.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 3.4 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 4 Weight enumerator 79 4.1 Weight enumerator . . . . . . . . . . . . . . . . . . . . . . . . . . 79 4.1.1 Weight spectrum . . . . . . . . . . . . . . . . . . . . . . . 79 4.1.2 Average weight enumerator . . . . . . . . . . . . . . . . . 83 4.1.3 MacWilliams identity . . . . . . . . . . . . . . . . . . . . 85 4.1.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 4.2 Error probability . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 4.2.1 Error probability of undetected error . . . . . . . . . . . . 89 4.2.2 Probability of decoding error . . . . . . . . . . . . . . . . 89 4.2.3 Random coding . . . . . . . . . . . . . . . . . . . . . . . . 90 4.2.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 4.3 Finite geometry and codes . . . . . . . . . . . . . . . . . . . . . . 90 4.3.1 Projective space and projective systems . . . . . . . . . . 90 4.3.2 MDS codes and points in general position . . . . . . . . . 95 4.3.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 4.4 Extended weight enumerator . . . . . . . . . . . . . . . . . . . . 97 4.4.1 Arrangements of hyperplanes . . . . . . . . . . . . . . . . 97 4.4.2 Weight distribution of MDS codes . . . . . . . . . . . . . 102 4.4.3 Extended weight enumerator . . . . . . . . . . . . . . . . 104 4.4.4 Puncturing and shortening . . . . . . . . . . . . . . . . . 107 4.4.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 110 4.5 Generalized weight enumerator . . . . . . . . . . . . . . . . . . . 111 4.5.1 Generalized Hamming weights . . . . . . . . . . . . . . . 111 4.5.2 Generalized weight enumerators . . . . . . . . . . . . . . . 113 4.5.3 Generalized weight enumerators of MDS-codes . . . . . . 115 4.5.4 Connections . . . . . . . . . . . . . . . . . . . . . . . . . . 118 4.5.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 4.6 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 5 Codes and related structures 123 5.1 Graphs and codes . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 5.1.1 Colorings of a graph . . . . . . . . . . . . . . . . . . . . . 123 5.1.2 Codes on graphs . . . . . . . . . . . . . . . . . . . . . . . 126 5.1.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 5.2 Matroids and codes . . . . . . . . . . . . . . . . . . . . . . . . . . 128 5.2.1 Matroids . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 5.2.2 Realizable matroids . . . . . . . . . . . . . . . . . . . . . 129 5.2.3 Graphs and matroids . . . . . . . . . . . . . . . . . . . . . 130 5.2.4 Tutte and Whitney polynomial of a matroid . . . . . . . . 131 5.2.5 Weight enumerator and Tutte polynomial . . . . . . . . . 132 5.2.6 Deletion and contraction of matroids . . . . . . . . . . . . 133
  • 5. CONTENTS 5 5.2.7 McWilliams type property for duality . . . . . . . . . . . 134 5.2.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 136 5.3 Geometric lattices and codes . . . . . . . . . . . . . . . . . . . . 136 5.3.1 Posets, the M¨obius function and lattices . . . . . . . . . . 136 5.3.2 Geometric lattices . . . . . . . . . . . . . . . . . . . . . . 141 5.3.3 Geometric lattices and matroids . . . . . . . . . . . . . . 144 5.3.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 145 5.4 Characteristic polynomial . . . . . . . . . . . . . . . . . . . . . . 146 5.4.1 Characteristic and M¨obius polynomial . . . . . . . . . . . 146 5.4.2 Characteristic polynomial of an arrangement . . . . . . . 148 5.4.3 Characteristic polynomial of a code . . . . . . . . . . . . 150 5.4.4 Minimal codewords and subcodes . . . . . . . . . . . . . . 156 5.4.5 Two variable zeta function . . . . . . . . . . . . . . . . . 157 5.4.6 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 157 5.4.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 158 5.5 Combinatorics and codes . . . . . . . . . . . . . . . . . . . . . . . 158 5.5.1 Orthogonal arrays and codes . . . . . . . . . . . . . . . . 158 5.5.2 Designs and codes . . . . . . . . . . . . . . . . . . . . . . 161 5.5.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 161 5.6 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162 6 Complexity and decoding 165 6.1 Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165 6.1.1 Big-Oh notation . . . . . . . . . . . . . . . . . . . . . . . 165 6.1.2 Boolean functions . . . . . . . . . . . . . . . . . . . . . . 166 6.1.3 Hard problems . . . . . . . . . . . . . . . . . . . . . . . . 171 6.1.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 172 6.2 Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173 6.2.1 Decoding complexity . . . . . . . . . . . . . . . . . . . . . 173 6.2.2 Decoding erasures . . . . . . . . . . . . . . . . . . . . . . 174 6.2.3 Information and covering set decoding . . . . . . . . . . . 177 6.2.4 Nearest neighbor decoding . . . . . . . . . . . . . . . . . . 184 6.2.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 184 6.3 Difficult problems in coding theory . . . . . . . . . . . . . . . . . 184 6.3.1 General decoding and computing minimum distance . . . 184 6.3.2 Is decoding up to half the minimum distance hard? . . . . 187 6.3.3 Other hard problems . . . . . . . . . . . . . . . . . . . . . 188 6.4 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188 7 Cyclic codes 189 7.1 Cyclic codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189 7.1.1 Definition of cyclic codes . . . . . . . . . . . . . . . . . . 189 7.1.2 Cyclic codes as ideals . . . . . . . . . . . . . . . . . . . . 191 7.1.3 Generator polynomial . . . . . . . . . . . . . . . . . . . . 192 7.1.4 Encoding cyclic codes . . . . . . . . . . . . . . . . . . . . 195 7.1.5 Reversible codes . . . . . . . . . . . . . . . . . . . . . . . 196 7.1.6 Parity check polynomial . . . . . . . . . . . . . . . . . . . 197 7.1.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 200 7.2 Defining zeros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201 7.2.1 Structure of finite fields . . . . . . . . . . . . . . . . . . . 201
  • 6. 6 CONTENTS 7.2.2 Minimal polynomials . . . . . . . . . . . . . . . . . . . . . 205 7.2.3 Cyclotomic polynomials and cosets . . . . . . . . . . . . . 206 7.2.4 Zeros of the generator polynomial . . . . . . . . . . . . . 211 7.2.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 213 7.3 Bounds on the minimum distance . . . . . . . . . . . . . . . . . . 214 7.3.1 BCH bound . . . . . . . . . . . . . . . . . . . . . . . . . . 214 7.3.2 Quadratic residue codes . . . . . . . . . . . . . . . . . . . 217 7.3.3 Hamming, simplex and Golay codes as cyclic codes . . . . 217 7.3.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 218 7.4 Improvements of the BCH bound . . . . . . . . . . . . . . . . . . 219 7.4.1 Hartmann-Tzeng bound . . . . . . . . . . . . . . . . . . . 219 7.4.2 Roos bound . . . . . . . . . . . . . . . . . . . . . . . . . . 220 7.4.3 AB bound . . . . . . . . . . . . . . . . . . . . . . . . . . . 223 7.4.4 Shift bound . . . . . . . . . . . . . . . . . . . . . . . . . . 224 7.4.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 228 7.5 Locator polynomials and decoding cyclic codes . . . . . . . . . . 229 7.5.1 Mattson-Solomon polynomial . . . . . . . . . . . . . . . . 229 7.5.2 Newton identities . . . . . . . . . . . . . . . . . . . . . . . 230 7.5.3 APGZ algorithm . . . . . . . . . . . . . . . . . . . . . . . 232 7.5.4 Closed formulas . . . . . . . . . . . . . . . . . . . . . . . . 234 7.5.5 Key equation and Forney’s formula . . . . . . . . . . . . . 235 7.5.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 238 7.6 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239 8 Polynomial codes 241 8.1 RS codes and their generalizations . . . . . . . . . . . . . . . . . 241 8.1.1 Reed-Solomon codes . . . . . . . . . . . . . . . . . . . . . 241 8.1.2 Extended and generalized RS codes . . . . . . . . . . . . 243 8.1.3 GRS codes under transformations . . . . . . . . . . . . . 247 8.1.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 250 8.2 Subfield and trace codes . . . . . . . . . . . . . . . . . . . . . . . 251 8.2.1 Restriction and extension by scalars . . . . . . . . . . . . 251 8.2.2 Parity check matrix of a restricted code . . . . . . . . . . 252 8.2.3 Invariant subspaces . . . . . . . . . . . . . . . . . . . . . . 254 8.2.4 Cyclic codes as subfield subcodes . . . . . . . . . . . . . . 257 8.2.5 Trace codes . . . . . . . . . . . . . . . . . . . . . . . . . . 258 8.2.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 258 8.3 Some families of polynomial codes . . . . . . . . . . . . . . . . . 259 8.3.1 Alternant codes . . . . . . . . . . . . . . . . . . . . . . . . 259 8.3.2 Goppa codes . . . . . . . . . . . . . . . . . . . . . . . . . 260 8.3.3 Counting polynomials . . . . . . . . . . . . . . . . . . . . 263 8.3.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 265 8.4 Reed-Muller codes . . . . . . . . . . . . . . . . . . . . . . . . . . 266 8.4.1 Punctured Reed-Muller codes as cyclic codes . . . . . . . 266 8.4.2 Reed-Muller codes as subfield subcodes and trace codes . 267 8.4.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 270 8.5 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
  • 7. CONTENTS 7 9 Algebraic decoding 271 9.1 Error-correcting pairs . . . . . . . . . . . . . . . . . . . . . . . . 271 9.1.1 Decoding by error-correcting pairs . . . . . . . . . . . . . 271 9.1.2 Existence of error-correcting pairs . . . . . . . . . . . . . 275 9.1.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 276 9.2 Decoding by key equation . . . . . . . . . . . . . . . . . . . . . . 277 9.2.1 Algorithm of Euclid-Sugiyama . . . . . . . . . . . . . . . 277 9.2.2 Algorithm of Berlekamp-Massey . . . . . . . . . . . . . . 278 9.2.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 281 9.3 List decoding by Sudan’s algorithm . . . . . . . . . . . . . . . . . 281 9.3.1 Error-correcting capacity . . . . . . . . . . . . . . . . . . 282 9.3.2 Sudan’s algorithm . . . . . . . . . . . . . . . . . . . . . . 285 9.3.3 List decoding of Reed-Solomon codes . . . . . . . . . . . . 287 9.3.4 List Decoding of Reed-Muller codes . . . . . . . . . . . . 291 9.3.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 292 9.4 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292 10 Cryptography 295 10.1 Symmetric cryptography and block ciphers . . . . . . . . . . . . 295 10.1.1 Symmetric cryptography . . . . . . . . . . . . . . . . . . . 295 10.1.2 Block ciphers. Simple examples . . . . . . . . . . . . . . . 296 10.1.3 Security issues . . . . . . . . . . . . . . . . . . . . . . . . 300 10.1.4 Modern ciphers. DES and AES . . . . . . . . . . . . . . . 302 10.1.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 308 10.2 Asymmetric cryptosystems . . . . . . . . . . . . . . . . . . . . . 308 10.2.1 RSA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311 10.2.2 Discrete logarithm problem and public-key cryptography 314 10.2.3 Some other asymmetric cryptosystems . . . . . . . . . . . 316 10.2.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 317 10.3 Authentication, orthogonal arrays, and codes . . . . . . . . . . . 317 10.3.1 Authentication codes . . . . . . . . . . . . . . . . . . . . . 317 10.3.2 Authentication codes and other combinatorial objects . . 321 10.3.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 324 10.4 Secret sharing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324 10.4.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 328 10.5 Basics of stream ciphers. Linear feedback shift registers . . . . . 329 10.5.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 334 10.6 PKC systems using error-correcting codes . . . . . . . . . . . . . 335 10.6.1 McEliece encryption scheme . . . . . . . . . . . . . . . . . 336 10.6.2 Niederreiter’s encryption scheme . . . . . . . . . . . . . . 338 10.6.3 Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340 10.6.4 The attack of Sidelnikov and Shestakov . . . . . . . . . . 343 10.6.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 344 10.7 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345 10.7.1 Section 10.1 . . . . . . . . . . . . . . . . . . . . . . . . . . 345 10.7.2 Section 10.2 . . . . . . . . . . . . . . . . . . . . . . . . . . 347 10.7.3 Section 10.3 . . . . . . . . . . . . . . . . . . . . . . . . . . 348 10.7.4 Section 10.4 . . . . . . . . . . . . . . . . . . . . . . . . . . 348 10.7.5 Section 10.5 . . . . . . . . . . . . . . . . . . . . . . . . . . 349 10.7.6 Section 10.6 . . . . . . . . . . . . . . . . . . . . . . . . . . 349
  • 8. 8 CONTENTS 11 The theory of Gr¨obner bases and its applications 351 11.1 Polynomial system solving . . . . . . . . . . . . . . . . . . . . . . 352 11.1.1 Linearization techniques . . . . . . . . . . . . . . . . . . . 352 11.1.2 Gr¨obner bases . . . . . . . . . . . . . . . . . . . . . . . . 355 11.1.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 362 11.2 Decoding codes with Gr¨obner bases . . . . . . . . . . . . . . . . . 363 11.2.1 Cooper’s philosophy . . . . . . . . . . . . . . . . . . . . . 363 11.2.2 Newton identities based method . . . . . . . . . . . . . . 368 11.2.3 Decoding arbitrary linear codes . . . . . . . . . . . . . . . 371 11.2.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 373 11.3 Algebraic cryptanalysis . . . . . . . . . . . . . . . . . . . . . . . 374 11.3.1 Toy example . . . . . . . . . . . . . . . . . . . . . . . . . 374 11.3.2 Writing down equations . . . . . . . . . . . . . . . . . . . 375 11.3.3 General S-Boxes . . . . . . . . . . . . . . . . . . . . . . . 378 11.3.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 379 11.4 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380 12 Coding theory with computer algebra packages 381 12.1 Singular . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381 12.2 Magma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383 12.2.1 Linear codes . . . . . . . . . . . . . . . . . . . . . . . . . 384 12.2.2 AG-codes . . . . . . . . . . . . . . . . . . . . . . . . . . . 385 12.2.3 Algebraic curves . . . . . . . . . . . . . . . . . . . . . . . 385 12.3 GAP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386 12.4 Sage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387 12.4.1 Coding Theory . . . . . . . . . . . . . . . . . . . . . . . . 387 12.4.2 Cryptography . . . . . . . . . . . . . . . . . . . . . . . . . 388 12.4.3 Algebraic curves . . . . . . . . . . . . . . . . . . . . . . . 388 12.5 Coding with computer algebra . . . . . . . . . . . . . . . . . . . 388 12.5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 388 12.5.2 Error-correcting codes . . . . . . . . . . . . . . . . . . . . 388 12.5.3 Code constructions and bounds . . . . . . . . . . . . . . . 392 12.5.4 Weight enumerator . . . . . . . . . . . . . . . . . . . . . . 395 12.5.5 Codes and related structures . . . . . . . . . . . . . . . . 397 12.5.6 Complexity and decoding . . . . . . . . . . . . . . . . . . 397 12.5.7 Cyclic codes . . . . . . . . . . . . . . . . . . . . . . . . . . 397 12.5.8 Polynomial codes . . . . . . . . . . . . . . . . . . . . . . . 399 12.5.9 Algebraic decoding . . . . . . . . . . . . . . . . . . . . . . 401 13 B´ezout’s theorem and codes on plane curves 403 13.1 Affine and projective space . . . . . . . . . . . . . . . . . . . . . 403 13.2 Plane curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403 13.3 B´ezout’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . 406 13.3.1 Another proof of Bezout’s theorem by the footprint . . . 413 13.4 Codes on plane curves . . . . . . . . . . . . . . . . . . . . . . . . 413 13.5 Conics, arcs and Segre . . . . . . . . . . . . . . . . . . . . . . . . 414 13.6 Qubic plane curves . . . . . . . . . . . . . . . . . . . . . . . . . . 414 13.6.1 Elliptic cuves . . . . . . . . . . . . . . . . . . . . . . . . . 414 13.6.2 The addition law on elliptic curves . . . . . . . . . . . . . 
414 13.6.3 Number of rational points on an elliptic curve . . . . . . . 414
  • 9. CONTENTS 9 13.6.4 The discrete logarithm on elliptic curves . . . . . . . . . . 414 13.7 Quartic plane curves . . . . . . . . . . . . . . . . . . . . . . . . . 414 13.7.1 Flexes and bitangents . . . . . . . . . . . . . . . . . . . . 414 13.7.2 The Klein quartic . . . . . . . . . . . . . . . . . . . . . . 414 13.8 Divisors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414 13.9 Differentials on a curve . . . . . . . . . . . . . . . . . . . . . . . . 417 13.10The Riemann-Roch theorem . . . . . . . . . . . . . . . . . . . . . 419 13.11Codes from algebraic curves . . . . . . . . . . . . . . . . . . . . . 421 13.12Rational functions and divisors on plane curves . . . . . . . . . . 424 13.13Resolution or normalization of curves . . . . . . . . . . . . . . . . 424 13.14Newton polygon of plane curves . . . . . . . . . . . . . . . . . . . 424 13.15Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425 14 Curves 427 14.1 Algebraic varieties . . . . . . . . . . . . . . . . . . . . . . . . . . 428 14.2 Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428 14.3 Curves and function fields . . . . . . . . . . . . . . . . . . . . . . 428 14.4 Normal rational curves and Segre’s problems . . . . . . . . . . . 428 14.5 The number of rational points . . . . . . . . . . . . . . . . . . . . 428 14.5.1 Zeta function . . . . . . . . . . . . . . . . . . . . . . . . . 428 14.5.2 Hasse-Weil bound . . . . . . . . . . . . . . . . . . . . . . 428 14.5.3 Serre’s bound . . . . . . . . . . . . . . . . . . . . . . . . . 428 14.5.4 Ihara’s bound . . . . . . . . . . . . . . . . . . . . . . . . . 428 14.5.5 Drinfeld-Vl˘adut¸ bound . . . . . . . . . . . . . . . . . . . . 428 14.5.6 Explicit formulas . . . . . . . . . . . . . . . . . . . . . . . 428 14.5.7 Oesterl´e’s bound . . . . . . . . . . . . . . . . . . . . . . . 428 14.6 Trace codes and curves . . . . . . . . . . . . . . . . . . . . . . . . 428 14.7 Good curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428 14.7.1 Maximal curves . . . . . . . . . . . . . . . . . . . . . . . . 428 14.7.2 Shimura modular curves . . . . . . . . . . . . . . . . . . . 428 14.7.3 Drinfeld modular curves . . . . . . . . . . . . . . . . . . . 428 14.7.4 Tsfasman-Vl˘adut¸-Zink bound . . . . . . . . . . . . . . . . 428 14.7.5 Towers of Garcia-Stichtenoth . . . . . . . . . . . . . . . . 428 14.8 Applications of AG codes . . . . . . . . . . . . . . . . . . . . . . 429 14.8.1 McEliece crypto system with AG codes . . . . . . . . . . 429 14.8.2 Authentication codes . . . . . . . . . . . . . . . . . . . . . 429 14.8.3 Fast multiplication in finite fields . . . . . . . . . . . . . . 431 14.8.4 Correlation sequences and pseudo random sequences . . . 431 14.8.5 Quantum codes . . . . . . . . . . . . . . . . . . . . . . . . 431 14.8.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 431 14.9 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431
  • 12. 12 CHAPTER 1. INTRODUCTION
  • 13. Chapter 2 Error-correcting codes Ruud Pellikaan and Xin-Wen Wu The idea of redundant information is a well known phenomenon in reading a newspaper. Misspellings go usually unnoticed for a casual reader, while the meaning is still grasped. In Semitic languages such as Hebrew, and even older in the hieroglyphics in the tombs of the pharaohs of Egypt, only the consonants are written while the vowels are left out, so that we do not know for sure how to pronounce these words nowadays. The letter “e” is the most frequent occurring symbol in the English language, and leaving out all these letters would still give in almost all cases an understandable text to the expense of greater attention of the reader. The art and science of deleting redundant information in a clever way such that it can be stored in less memory or space and still can be expanded to the original message, is called data compression or source coding. It is not the topic of this book. So we can compress data but an error made in a compressed text would give a different message that is most of the time completely meaningless. The idea in error-correcting codes is the converse. One adds redundant informa- tion in such a way that it is possible to detect or even correct errors after trans- mission. In radio contacts between pilots and radarcontroles the letters in the alphabet are spoken phonetically as ”Alpha, Bravo, Charlie, ...” but ”Adams, Boston, Chicago, ...” is more commonly used for spelling in a telephone conver- sation. The addition of a parity check symbol enables one to detect an error, such as on the former punch cards that were fed to a computer, in the ISBN code for books, the European Article Numbering (EAN) and the Universal Product Code (UPC) for articles. Error-correcting codes are common in numerous sit- uations such as in audio-visual media, fault-tolerant computers and deep space telecommunication. more examples: QR quick response 2D code. deep space, compact disc and DVD, ..... more pictures 13
  • 14. 14 CHAPTER 2. ERROR-CORRECTING CODES source encoding sender noise receiver decoding targetE message E 001... E 011... E message T Figure 2.1: Block diagram of a communication system 2.1 Block codes Legend goes that Hamming was so frustrated the computer halted every time it detected an error after he handed in a stack of punch cards, he thought about a way the computer would be able not only to detect the error but also to correct it automatically. He came with his nowadays famous code named after him. Whereas the theory of Hamming is about the actual construction, the encoding and decoding of codes and uses tools from combinatorics and algebra , the ap- proach of Shannon leads to information theory and his theorems tell us what is and what is not possible in a probabilistic sense. According to Shannon we have a message m in a certain alphabet and of a certain length, we encode m to c by expanding the length of the message and adding redundant information. One can define the information rate R that mea- sures the slowing down of the transmission of the data. The encoded message c is sent over a noisy channel such that the symbols are changed, according to certain probabilities that are characteristic of the channel. The received word r is decoded to m . Now given the characteristics of the channel one can define the capacity C of the channel and it has the property that for every R < C it is possible to find an encoding and decoding scheme such that the error probability that m = m is arbitrarily small. For R > C such a scheme is not possible. The capacity is explicitly known as a function of the characteristic probability for quite a number of channels. The notion of a channel must be taken in a broad sense. Not only the trans- mission of data via satellite or telephone but also the storage of information on a hard disk of a computer or a compact disc for music and film can be modeled by a channel. The theorem of Shannon tells us the existence of certain encoding and decoding schemes and one can even say that they exist in abundance and that almost all schemes satisfy the required conditions, but it does not tell us how to construct a specific scheme efficiently. The information theoretic part of error-correcting codes is considered in this book only so far to motivate the construction of cod- ing and decoding algorithms.
  • 15. 2.1. BLOCK CODES 15 The situation for the best codes in terms of the maximal number of errors that one can correct for a given information rate and code length is not so clear. Several existence and nonexistence theorems are known, but the exact bound is in fact still an open problem. 2.1.1 Repetition, product and Hamming codes Adding a parity check such that the number of ones is even, is a well-known way to detect one error. But this does not correct the error. Example 2.1.1 Replacing every symbol by a threefold repetition gives the pos- sibility of correcting one error in every 3-tuple of symbols in a received word by a majority vote. The price one has to pay is that the transmission is three times slower. We see here the two conflicting demands of error-correction: to correct as many errors as possible and to transmit as fast a possible. Notice furthermore that in case two errors are introduced by transmission the majority decoding rule will introduce an decoding error. Example 2.1.2 An improvement is the following product construction. Sup- pose we want to transmit a binary message (m1, m2, m3, m4) of length 4 by adding 5 redundant bits (r1, r2, r3, r4, r5). Put these 9 bits in a 3 × 3 array as shown below. The redundant bits are defined by the following conditions. The sum of the number of bits in every row and in every column should be even. m1 m2 r1 m3 m4 r2 r3 r4 r5 It is clear that r1, r2, r3 and r4 are well defined by these rules. The condition on the last row and on the last column are equivalent, given the rules for the first two rows and columns. Hence r5 is also well defined. If in the transmission of this word of 9 bits, one symbol is flipped from 0 to 1 or vice versa, then the receiver will notice this, and is able to correct it. Since if the error occurred in row i and column j, then the receiver will detect an odd parity in this row and this column and an even parity in the remaining rows and columns. Suppose that the message is m = (1, 1, 0, 1). Then the redundant part is r = (0, 1, 1, 0, 1) and c = (1, 1, 0, 1, 0, 1, 1, 0, 1) is transmitted. Suppose that y = (1, 1, 0, 1, 0, 0, 1, 0, 1) is the received word. 1 1 0 0 1 0 ← 1 0 1 ↑ Then the receiver detects an error in row 2 and column 3 and will change the corresponding symbol. So this product code can also correct one error as the repetition code but its information rate is improved from 1/3 to 4/9. This decoding scheme is incomplete in the sense that in some cases it is not decided what to do and the scheme will fail to determine a candidate for the transmitted word. That is called a decoding failure. Sometimes two errors can be corrected. If the first error is in row i and column j, and the second in row i
  • 16. 16 CHAPTER 2. ERROR-CORRECTING CODES and column j with i > i and j = j. Then the receiver will detect odd parities in rows i and i and in columns j and j . There are two error patterns of two errors with this behavior. That is errors at the positions (i, j) and (i , j ) or at the two pairs (i, j ) and (i , j). If the receiver decides to change the first two pairs if j > j and the second two pairs if j < j, then it will recover the transmitted word half of the time this pattern of two errors takes place. If for instance the word c = (1, 1, 0, 1, 0, 1, 1, 0, 1) is transmitted and y = (1, 0, 0, 1, 0, 0, 1, 0, 1) is received, then the above decoding scheme will change it correctly in c. But if y = (1, 1, 0, 0, 1, 1, 1, 0, 1) is received, then the scheme will change it in the codeword c = (1, 0, 0, 0, 1, 0, 1, 0, 1) and we have a decoding error. 1 0 0 ← 0 1 0 ← 1 0 1 ↑ ↑ 1 1 1 ← 0 0 1 ← 1 0 1 ↑ ↑ If two errors take place in the same row, then the receiver will see an even par- ity in all rows and odd parities in the columns j and j . We can expand the decoding rule to change the bits at the positions (1, j) and (1, j ). Likewise we will change the bits in positions (i, 1) and (i , 1) if the columns give even parity and the rows i and i have an odd parity. This decoding scheme will correct all patterns with 1 error correctly, and sometimes the patterns with 2 errors. But it is still incomplete, since the received word (1, 1, 0, 1, 1, 0, 0, 1, 0) has an odd parity in every row and in every column and the scheme fails to decode. One could extend the decoding rule to get a complete decoding in such a way that every received word is decoded to a nearest codeword. This nearest code- word is not always unique. In case the transmission is by means of certain electro-magnetic pulses or waves one has to consider modulation and demodulation. The message consists of letters of a finite alphabet, say consisting of zeros and ones, and these are mod- ulated, transmitted as waves, received and demodulated in zeros and ones. In the demodulation part one has to make a hard decision between a zero or a one. But usually there is a probability that the signal represents a zero. The hard decision together with this probability is called a soft decision. One can make use of this information in the decoding algorithm. One considers the list of all nearest codewords, and one chooses the codeword in this list that has the highest probability. Example 2.1.3 An improvement of the repetition code of rate 1/3 and the product code of rate 4/9 is given by Hamming. Suppose we have a message (m1, m2, m3, m4) of 4 bits. Put them in the middle of the following Venn- diagram of three intersecting circles as given in Figure 2.2. Complete the three empty areas of the circles according to the rule that the number of ones in every circle is even. In this way we get 3 redundant bits (r1, r2, r3) that we add to the message and which we transmit over the channel. In every block of 7 bits the receiver can correct one error. Since the parity in every circle should be even. So if the parity is even we declare the circle correct, if the parity is odd we declare the circle incorrect. The error is in the incorrect circles and in the complement of the correct circles. We see that every pattern of at most one error can be corrected in this way. For instance, if m = (1, 1, 0, 1) is the message, then r = (0, 0, 1) is the redundant information
  • 17. 2.1. BLOCK CODES 17 &% '$ &% '$ &% '$ r1 r2 r3 m4 m3 m2 m1 Figure 2.2: Venn diagram of the Hamming code &% '$ &% '$ &% '$ 0 0 1 1 0 0 1 Figure 2.3: Venn diagram of a received word for the Hamming code added and c = (1, 1, 0, 1, 0, 0, 1) the codeword sent. If after transmission one symbol is flipped and y = (1, 0, 0, 1, 0, 0, 1) is the received word as given in Figure 2.3. Then we conclude that the error is in the left and upper circle, but not in the right one. And we conclude that the error is at m2. But in case of 2 errors and for instance the word y = (1, 0, 0, 1, 1, 0, 1) is received, then the receiver would assume that the error occurred in the upper circle and not in the two lower circles, and would therefore conclude that the transmitted codeword was (1, 0, 0, 1, 1, 0, 0). Hence the decoding scheme creates an extra error. The redundant information r can be obtained from the message m by means of three linear equations or parity checks modulo two    r1 = m2 + m3 + m4 r2 = m1 + m3 + m4 r3 = m1 + m2 + m4 Let c = (m, r) be the codeword. Then c is a codeword if and only if HcT = 0,
  • 18. 18 CHAPTER 2. ERROR-CORRECTING CODES where H =   0 1 1 1 1 0 0 1 0 1 1 0 1 0 1 1 0 1 0 0 1   . The information rate is improved from 1/3 for the repetition code and 4/9 for the product code to 4/7 for the Hamming code. *** gate diagrams of encoding/decoding scheme *** 2.1.2 Codes and Hamming distance In general the alphabets of the message word and the encoded word might be distinct. Furthermore the length of both the message word and the encoded word might vary such as in a convolutional code. We restrict ourselves to [n, k] block codes that is the message words have a fixed length of k symbols and the encoded words have a fixed length of n symbols both from the same alphabetQ. For the purpose of error control, before transmission, we add redundant symbols to the message in a clever way. Definition 2.1.4 Let Q be a set of q symbols called the alphabet. Let Qn be the set of all n-tuples x = (x1, . . . , xn), with entries xi ∈ Q. A block code C of length n over Q is a non-empty subset of Qn . The elements of C are called codewords. If C contains M codewords, then M is called the size of the code. We call a code with length n and size M an (n, M) code. If M = qk , then C is called an [n, k] code. For an (n, M) code defined over Q, the value n − logq(M) is called the redundancy. The information rate is defined as R = logq(M)/n. Example 2.1.5 The repetition code has length 3 and 2 codewords, so its in- formation rate is 1/3. The product code has length 9 and 24 codewords, hence its rate is 4/9. The Hamming code has length 7 and 24 codewords, therefore its rate is 4/7. Example 2.1.6 Let C be the binary block code of length n consisting of all words with exactly two ones. This is an (n, n(n − 1)/2) code. In this example the number of codewords is not a power of the size of the alphabet. Definition 2.1.7 Let C be an [n, k] block code over Q. An encoder of C is a one-to-one map E : Qk −→ Qn such that C = E(Qk ). Let c ∈ C be a codeword. Then there exists a unique m ∈ Qk with c = E(m). This m is called the message or source word of c. In order to measure the difference between two distinct words and to evaluate the error-correcting capability of the code, we need to introduce an appropriate metric to Qn . A natural metric used in Coding Theory is the Hamming distance. Definition 2.1.8 For x = (x1, . . . , xn), y = (y1, . . . , yn) ∈ Qn , the Hamming distance d(x, y) is defined as the number of places where they differ: d(x, y) = |{i | xi = yi}|.
  • 19. 2.1. BLOCK CODES 19             x            © d(x,y) rrr rrrj y rr rrrr‰ d(y,z) $$$$$$$$$$$$X z$$$$$$$$$$$$W d(x,z) Figure 2.4: Triangle inequality Proposition 2.1.9 The Hamming distance is a metric on Qn , that means that the following properties hold for all x, y, z ∈ Qn : (1) d(x, y) ≥ 0 and equality hods if and only if x = y, (2) d(x, y) = d(y, x) (symmetry), (3) d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality), Proof. Properties (1) and (2) are trivial from the definition. We leave (3) to the reader as an exercise. Definition 2.1.10 The minimum distance of a code C of length n is defined as d = d(C) = min{ d(x, y) | x, y ∈ C, x = y } if C consists of more than one element, and is by definition n+1 if C consists of one word. We denote by (n, M, d) a code C with length n, size M and minimum distance d. The main problem of error-correcting codes from “Hamming’s point view” is to construct for a given length and number of codewords a code with the largest possible minimum distance, and to find efficient encoding and decoding algo- rithms for such a code. Example 2.1.11 The triple repetition code consists of two codewords: (0, 0, 0) and (1, 1, 1), so its minimum distance is 3. The product and Hamming code both correct one error. So the minimum distance is at least 3, by the triangle inequality. The product code has minimum distance 4 and the Hamming code has minimum distance 3. Notice that all three codes have the property that x + y is again a codeword if x and y are codewords. Definition 2.1.12 Let x ∈ Qn . The ball of radius r around x, denoted by Br(x), is defined by Br(x) = { y ∈ Qn | d(x, y) ≤ r }. The sphere of radius r around x is denoted by Sr(x) and defined by Sr(x) = { y ∈ Qn | d(x, y) = r }.
  • 20. 20 CHAPTER 2. ERROR-CORRECTING CODES % '$ ¨¨B q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q Figure 2.5: Ball of radius √ 2 in the Euclidean plane m q q q q q q q q q q q q q q q q q q q q q q q q q Figure 2.6: Balls of radius 0 and 1 in the Hamming metric Figure 2.1.2 shows the ball in the Euclidean plane. This is misleading in some respects, but gives an indication what we should have in mind. Figure 2.1.2 shows Q2 , where the alphabet Q consists of 5 elements. The ball B0(x) consists of the points in the circle, B1(x) is depicted by the points inside the cross, and B2(x) consists of all 25 dots. Proposition 2.1.13 Let Q be an alphabet of q elements and x ∈ Qn . Then |Si(x)| = n i (q − 1)i and |Br(x)| = r i=0 n i (q − 1)i . Proof. Let y ∈ Si(x). Let I be the subset of {1, . . . , n} consisting of all positions j such that yj = xj. Then the number of elements of I is equal to i. And (q − 1)i is the number of words y ∈ Si(x) that have the same fixed I. The number of possibilities to choose the subset I with a fixed number of elements i is equal to n i . This shows the formula for the number of elements of Si(x). Furthermore Br(x) is the disjoint union of the subsets Si(x) for i = 0, . . . , r. This proves the statement about the number of elements of Br(x).
  • 21. 2.2. LINEAR CODES 21 2.1.3 Exercises 2.1.1 Consider the code of length 8 that is obtained by deleting the last entry r5 from the product code of Example 2.1.2. Show that this code corrects one error. 2.1.2 Give a gate diagram of the decoding algorithm for the product code of Example 2.1.2 that corrects always 1 error and sometimes 2 errors. 2.1.3 Give a proof of Proposition 2.1.9 (3), that is the triangle inequality of the Hamming distance. 2.1.4 Let Q be an alphabet of q elements. Let x, y ∈ Qn have distance d. Show that the number of elements in the intersection Br(x) ∩ Bs(y) is equal to i,j,k d i d − i j n − d k (q − 2)j (q − 1)k , where i, j and k are non-negative integers such that i + j ≤ d, k ≤ n − d, i + j + k ≤ r and d − i + k ≤ s. 2.1.5 Write a procedure in GAP that takes n as an input and constructs the code as in Example 2.1.6. 2.2 Linear Codes Linear codes are introduced in case the alphabet is a finite field. These codes have more structure and are therefore more tangible than arbitrary codes. 2.2.1 Linear codes If the alphabet Q is a finite field, then Qn is a vector space. This is for instance the case if Q = {0, 1} = F2. Therefore it is natural to look at codes in Qn that have more structure, in particular that are linear subspaces. Definition 2.2.1 A linear code C is a linear subspace of Fn q , where Fq stands for the finite field with q elements. The dimension of a linear code is its dimension as a linear space over Fq. We denote a linear code C over Fq of length n and dimension k by [n, k]q, or simply by [n, k]. If furthermore the minimum distance of the code is d, then we call by [n, k, d]q or [n, k, d] the parameters of the code. It is clear that for a linear [n, k] code over Fq, its size M = qk . The information rate is R = k/n and the redundancy is n − k. Definition 2.2.2 For a word x ∈ Fn q , its support, denoted by supp(x), is defined as the set of nonzero coordinate positions, so supp(x) = {i | xi = 0}. The weight of x is defined as the number of elements of its support, which is denoted by wt(x). The minimum weight of a code C, denoted by mwt(C), is defined as the minimal value of the weights of the nonzero codewords: mwt(C) = min{ wt(c) | c ∈ C, c = 0 }, in case there is a c ∈ C not equal to 0, and n + 1 otherwise.
  • 22. 22 CHAPTER 2. ERROR-CORRECTING CODES Proposition 2.2.3 The minimum distance of a linear code C is equal to its minimum weight. Proof. Since C is a linear code, we have that 0 ∈ C and for any c1, c2 ∈ C, c1 −c2 ∈ C. Then the conclusion follows from the fact that wt(c) = d(0, c) and d(c1, c2) = wt(c1 − c2). Definition 2.2.4 Consider the situation of two Fq-linear codes C and D of length n. If D ⊆ C, then D is called a subcode of C, and C a supercode of D. Remark 2.2.5 Suppose C is an [n, k, d] code. Then, for any r, 1 ≤ r ≤ k, there exist subcodes with dimension r. And for any given r, there may exist more than one subcode with dimension r. The minimum distance of a subcode is always greater than or equal to d. So, by taking an appropriate subcode, we can get a new code of the same length which has a larger minimum distance. We will discuss this later in Section 3.1. Now let us see some examples of linear codes. Example 2.2.6 The repetition code over Fq of length n consists of all words c = (c, c, . . . , c) with c ∈ Fq. This is a linear code of dimension 1 and minimum distance n. Example 2.2.7 Let n be an integer with n ≥ 2. The even weight code C of length n over Fq consists of all words in Fn q of even weight. The minimum weight of C is by definition 2, the minimum distance of C is 2 if q = 2 and 1 otherwise. The code C linear if and only if q = 2. Example 2.2.8 Let C be a binary linear code. Consider the subset Cev of C consisting of all codewords in C of even weight. Then Cev is a linear subcode and is called the even weight subcode of C. If C = Cev, then there exists a codeword c in C of odd weight and C is the disjunct union of the cosets c+Cev and Cev. Hence dim(Cev) ≥ dim(C) − 1. Example 2.2.9 The Hamming code C of Example 2.1.3 consists of all the words c ∈ F7 2 satisfying HcT = 0, where H =   0 1 1 1 1 0 0 1 0 1 1 0 1 0 1 1 0 1 0 0 1   . This code is linear of dimension 4, since it is given by the solutions of three independent homogeneous linear equations. The minimum weight is 3 as shown in Example 2.1.11. So it is a [7, 4, 3] code. 2.2.2 Generator matrix and systematic encoding Let C be an [n, k] linear code over Fq. Since C is a k-dimensional linear subspace of Fn q , there exists a basis that consists of k linearly independent codewords, say g1, . . . , gk. Suppose gi = (gi1, . . . , gin) for i = 1, . . . , k. Denote G =      g1 g2 ... gk      =      g11 g12 · · · g1n g21 g22 · · · g2n ... ... ... ... gk1 gk2 · · · gkn      .
  • 23. 2.2. LINEAR CODES 23 Every codeword c can be written uniquely as a linear combination of the basis elements, so c = m1g1 + · · · + mkgk where m1, . . . , mk ∈ Fq. Let m = (m1, . . . , mk) ∈ Fk q . Then c = mG. The encoding E : Fk q −→ Fn q , from the message word m ∈ Fk q to the codeword c ∈ Fn q can be done efficiently by a matrix multiplication. c = E(m) := mG. Definition 2.2.10 A k × n matrix G with entries in Fq is called a generator matrix of an Fq-linear code C if the rows of G are a basis of C. A given [n, k] code C can have more than one generator matrix, however every generator matrix of C is a k×n matrix of rank k. Conversely every k×n matrix of rank k is the generator matrix of an Fq-linear [n, k] code. Example 2.2.11 The linear codes with parameters [n, 0, n+1] and [n, n, 1] are the trivial codes {0} and Fn q , and they have the empty matrix and the n × n identity matrix In as generator matrix, respectively. Example 2.2.12 The repetition code of length n has generator matrix G = ( 1 1 · · · 1 ). Example 2.2.13 The binary even-weight code of length n has for instance the following two generator matrices        1 1 0 . . . 0 0 0 0 1 1 . . . 0 0 0 ... ... ... ... ... ... ... 0 0 0 . . . 1 1 0 0 0 0 . . . 0 1 1        and        1 0 . . . 0 0 1 0 1 . . . 0 0 1 ... ... ... ... ... ... 0 0 . . . 1 0 1 0 0 . . . 0 1 1        . Example 2.2.14 The Hamming code C of Example 2.1.3 is a [7, 4] code. The message symbols mi for i = 1, . . . , 4 are free to choose. If we take mi = 1 and the remaining mj = 0 for j = i we get the codeword gi. In this way we get the basis g1, g2, g3, g4 of the code C, that are the rows of following generator matrix G =     1 0 0 0 0 1 1 0 1 0 0 1 0 1 0 0 1 0 1 1 0 0 0 0 1 1 1 1     . From the example, the generator matrix G of the Hamming code has the fol- lowing form (Ik | P) where Ik is the k × k identity matrix and P a k × (n − k) matrix.
Remark 2.2.15 Let G be a generator matrix of C. From linear algebra, see Section ??, we know that we can transform G by Gaussian elimination into a row equivalent matrix in row reduced echelon form by a sequence of the three elementary row operations:
1) interchanging two rows,
2) multiplying a row by a nonzero constant,
3) adding one row to another row.
Moreover, for a given matrix G there is exactly one row equivalent matrix in row reduced echelon form, denoted by rref(G). The following proposition states that rref(G) is also a generator matrix of C.

Proposition 2.2.16 Let G be a generator matrix of C. Then rref(G) is also a generator matrix of C and rref(G) = MG, where M is an invertible k × k matrix with entries in F_q.

Proof. The row reduced echelon form rref(G) of G is obtained from G by a sequence of elementary row operations. The code C is equal to the row space of G, and the row space does not change under elementary row operations. So rref(G) generates the same code C. Furthermore rref(G) = E_1 · · · E_l G, where E_1, . . . , E_l are the elementary matrices that correspond to the elementary row operations. Let M = E_1 · · · E_l. Then M is invertible, since the E_i are invertible, and rref(G) = MG.

Proposition 2.2.17 Let G_1 and G_2 be two k × n generator matrices generating the codes C_1 and C_2 over F_q. Then the following statements are equivalent:
1) C_1 = C_2,
2) rref(G_1) = rref(G_2),
3) there is a k × k invertible matrix M with entries in F_q such that G_2 = MG_1.

Proof. 1) implies 2): The row spaces of G_1 and G_2 are the same, since C_1 = C_2. So G_1 and G_2 are row equivalent. Hence rref(G_1) = rref(G_2).
2) implies 3): Let R_i = rref(G_i). There is a k × k invertible matrix M_i such that G_i = M_i R_i for i = 1, 2, by Proposition 2.2.16. Let M = M_2 M_1^{-1}. Then MG_1 = M_2 M_1^{-1} M_1 R_1 = M_2 R_2 = G_2, since R_1 = R_2.
3) implies 1): Suppose G_2 = MG_1 for some k × k invertible matrix M. Then every codeword of C_2 is a linear combination of the rows of G_1, which are in C_1. So C_2 is a subcode of C_1. Similarly C_1 ⊆ C_2, since G_1 = M^{-1} G_2. Hence C_1 = C_2.

Remark 2.2.18 Although a generator matrix G of a code C is not unique, the row reduced echelon form rref(G) is unique. That is to say, if G is a generator matrix of C, then rref(G) is also a generator matrix of C, and furthermore, if G_1 and G_2 are generator matrices of C, then rref(G_1) = rref(G_2). Therefore the row reduced echelon form rref(C) of a code C is well-defined, being rref(G) for any generator matrix G of C, by Proposition 2.2.17.

Example 2.2.19 The generator matrix G_2 of Example 2.2.13 is in row reduced echelon form and a generator matrix of the binary even weight code C. Hence G_2 = rref(G_1) = rref(C).
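The following sketch implements row reduction over a prime field F_p with the three elementary operations of Remark 2.2.15 and illustrates Proposition 2.2.17: two generator matrices of the same code have the same row reduced echelon form. The helper function and test are illustrative assumptions, not taken from the book (it needs Python 3.8+ for pow(x, -1, p)).

```python
# rref over F_p by Gaussian elimination, then a check of Proposition 2.2.17.
def rref(G, p):
    G = [row[:] for row in G]
    k, n = len(G), len(G[0])
    pivot_row = 0
    for col in range(n):
        # find a row at or below pivot_row with a nonzero entry in this column
        row = next((r for r in range(pivot_row, k) if G[r][col] % p), None)
        if row is None:
            continue
        G[pivot_row], G[row] = G[row], G[pivot_row]           # 1) interchange rows
        inv = pow(G[pivot_row][col], -1, p)
        G[pivot_row] = [x * inv % p for x in G[pivot_row]]    # 2) scale pivot to 1
        for r in range(k):
            if r != pivot_row and G[r][col] % p:
                f = G[r][col]                                 # 3) add rows
                G[r] = [(a - f * b) % p for a, b in zip(G[r], G[pivot_row])]
        pivot_row += 1
        if pivot_row == k:
            break
    return G

# Hamming generator matrix of Example 2.2.14 (already in rref).
G1 = [[1, 0, 0, 0, 0, 1, 1],
      [0, 1, 0, 0, 1, 0, 1],
      [0, 0, 1, 0, 1, 1, 0],
      [0, 0, 0, 1, 1, 1, 1]]
# G2 = M G1 for an invertible M: replace row 1 by row 1 + row 2, swap rows 3 and 4.
G2 = [[a ^ b for a, b in zip(G1[0], G1[1])], G1[1], G1[3], G1[2]]

print(rref(G1, 2) == rref(G2, 2))  # True: G1 and G2 generate the same code
```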
Definition 2.2.20 Let C be an [n, k] code. The code is called systematic at the positions (j_1, . . . , j_k) if for all m ∈ F_q^k there exists a unique codeword c such that c_{j_i} = m_i for all i = 1, . . . , k. In that case, the set {j_1, . . . , j_k} is called an information set. A generator matrix G of C is called systematic at the positions (j_1, . . . , j_k) if the k × k submatrix G′ consisting of the k columns of G at the positions (j_1, . . . , j_k) is the identity matrix. For such a matrix G the mapping m → mG is called systematic encoding.

Remark 2.2.21 If a generator matrix G of C is systematic at the positions (j_1, . . . , j_k) and c is a codeword, then c = mG for a unique m ∈ F_q^k and c_{j_i} = m_i for all i = 1, . . . , k. Hence C is systematic at the positions (j_1, . . . , j_k). Now suppose that the j_i with 1 ≤ j_1 < · · · < j_k ≤ n indicate the positions of the pivots of rref(G). Then the code C and the generator matrix rref(G) are systematic at the positions (j_1, . . . , j_k).

Proposition 2.2.22 Let C be a code with generator matrix G. Then C is systematic at the positions j_1, . . . , j_k if and only if the k columns of G at the positions j_1, . . . , j_k are linearly independent.

Proof. Let G be a generator matrix of C. Let G′ be the k × k submatrix of G consisting of the k columns at the positions (j_1, . . . , j_k). Suppose C is systematic at the positions (j_1, . . . , j_k). Then the map given by x → xG′ is injective. Hence the columns of G′ are linearly independent. Conversely, if the columns of G′ are linearly independent, then there exists a k × k invertible matrix M such that MG′ is the identity matrix. Hence MG is a generator matrix of C that is systematic at (j_1, . . . , j_k), so C is systematic at (j_1, . . . , j_k).

Example 2.2.23 Consider a code C with generator matrix

G = ( 1 0 1 0 1 0 1 0 )
    ( 1 1 0 0 1 1 0 0 )
    ( 1 1 0 1 0 0 1 0 )
    ( 1 1 0 1 0 0 1 1 ).

Then

rref(C) = rref(G) = ( 1 0 1 0 1 0 1 0 )
                    ( 0 1 1 0 0 1 1 0 )
                    ( 0 0 0 1 1 1 1 0 )
                    ( 0 0 0 0 0 0 0 1 )

and the code is systematic at the positions 1, 2, 4 and 8. In passing we note that the minimum distance of the code is 1.

2.2.3 Exercises

2.2.1 Determine for the product code of Example 2.1.2 the number of codewords, the number of codewords of a given weight, the minimum weight and the minimum distance. Express the redundant bits r_j for j = 1, . . . , 5 as linear equations over F_2 in the message bits m_i for i = 1, . . . , 4. Give a 5 × 9 matrix H such that c = (m, r) is a codeword of the product code if and only if Hc^T = 0, where m is the message of 4 bits m_i and r is the vector with the 5 redundant bits r_j.
2.2.2 Let x and y be binary words of the same length. Show that

wt(x + y) = wt(x) + wt(y) − 2|supp(x) ∩ supp(y)|.

2.2.3 Let C be an F_q-linear code with generator matrix G. Let q = 2. Show that every codeword of C has even weight if and only if every row of G has even weight. Show by means of a counterexample that this statement is not true if q ≠ 2.

2.2.4 Consider the following matrix with entries in F_5:

G = ( 1 1 1 1 1 )
    ( 0 1 2 3 4 )
    ( 0 1 4 4 1 ).

Show that G is a generator matrix of a [5, 3, 3] code. Give the row reduced echelon form of this code.

2.2.5 Compute the complexity of the encoding of a linear [n, k] code by an arbitrary generator matrix G, and in case G is systematic, respectively, in terms of the number of additions and multiplications.

2.3 Parity checks and dual code

Linear codes are implicitly defined by parity check equations, and the dual of a code is introduced.

2.3.1 Parity check matrix

There are two standard ways to describe a subspace: explicitly by giving a basis, or implicitly as the solution space of a set of homogeneous linear equations. Therefore there are two ways to describe a linear code: explicitly, as we have seen, by a generator matrix, or implicitly by a set of homogeneous linear equations, that is, by the null space of a matrix.

Let C be an F_q-linear [n, k] code. Suppose that H is an m × n matrix with entries in F_q, and let C be the null space of H. So C is the set of all c ∈ F_q^n such that Hc^T = 0. These m homogeneous linear equations are called parity check equations, or simply parity checks. The dimension k of C is at least n − m. If there are dependent rows in the matrix H, that is if k > n − m, then we can delete rows until we obtain an (n − k) × n matrix H′ with independent rows and with the same null space as H. So H′ has rank n − k.

Definition 2.3.1 An (n − k) × n matrix of rank n − k is called a parity check matrix of an [n, k] code C if C is the null space of this matrix.

Remark 2.3.2 The parity check matrix of a code can be used for error detection. This is useful in a communication channel where one asks for retransmission in case more than a certain number of errors occurred. Suppose that C is a linear code of minimum distance d and H is a parity check matrix of C. Suppose that the codeword c is transmitted and r = c + e is received.
Then e is called the error vector and wt(e) the number of errors. Now Hr^T = 0 if there is no error, and Hr^T ≠ 0 for all e with 0 < wt(e) < d. Therefore we can detect any pattern of t errors with t < d, but not more: if the error vector is equal to a nonzero codeword of minimal weight d, then the receiver would assume that no errors have been made. The vector Hr^T is called the syndrome of the received word.

We show that every linear code has a parity check matrix, and we give a method to obtain such a matrix in case we have a generator matrix G of the code.

Proposition 2.3.3 Suppose C is an [n, k] code. Let I_k be the k × k identity matrix. Let P be a k × (n − k) matrix. Then (I_k | P) is a generator matrix of C if and only if (−P^T | I_{n−k}) is a parity check matrix of C.

Proof. Every codeword c is of the form mG with m ∈ F_q^k. Suppose that the generator matrix G is systematic at the first k positions. So c = (m, r) with r ∈ F_q^{n−k} and r = mP. Hence for a word of the form c = (m, r) with m ∈ F_q^k and r ∈ F_q^{n−k} the following statements are equivalent:

c is a codeword,
−mP + r = 0,
−P^T m^T + r^T = 0,
(−P^T | I_{n−k})(m, r)^T = 0,
(−P^T | I_{n−k}) c^T = 0.

Hence (−P^T | I_{n−k}) is a parity check matrix of C. The converse is proved similarly.

Example 2.3.4 The trivial codes {0} and F_q^n have I_n and the empty matrix as parity check matrix, respectively.

Example 2.3.5 As a consequence of Proposition 2.3.3 we see that a parity check matrix of the binary even weight code is equal to the generator matrix ( 1 1 · · · 1 ) of the repetition code, and the generator matrix G_2 of the binary even weight code of Example 2.2.13 is a parity check matrix of the repetition code.

Example 2.3.6 The ISBN code of a book consists of a word (b_1, . . . , b_10) of 10 symbols over the alphabet with the 11 elements 0, 1, 2, . . . , 9 and X of the finite field F_11, where X is the symbol representing 10, that satisfies the parity check equation

b_1 + 2b_2 + 3b_3 + · · · + 10b_10 = 0.

Clearly this code detects one error. It also corrects many patterns of one transposition of two consecutive symbols. Suppose that the symbols b_i and b_{i+1} are interchanged and there are no other errors. Then the parity check gives as outcome

i b_{i+1} + (i + 1) b_i + Σ_{j ≠ i, i+1} j b_j = s.
We know that Σ_j j b_j = 0, since (b_1, . . . , b_10) is an ISBN codeword. Hence s = b_i − b_{i+1}. But this position i is in general not unique. Consider for instance the following code: 0444815933. The checksum gives 4, so it is not a valid ISBN code. Now assume that the code is the result of a transposition of two consecutive symbols. Then 4044815933, 0448415933, 0444185933, 0444851933 and 0444819533 are the possible ISBN codes. The first and third do not match with existing books. The second, fourth and fifth correspond to books with the titles "The revenge of the dragon lady," "The theory of error-correcting codes" and "Nagasaki's symposium on Chernobyl," respectively.

Example 2.3.7 The generator matrix G of the Hamming code C in Example 2.2.14 is of the form (I_4 | P), and in Example 2.2.9 we see that the parity check matrix is equal to (P^T | I_3).

Remark 2.3.8 Let G be a generator matrix of an [n, k] code C. Then the row reduced echelon form G_1 = rref(G) is in general not systematic at the first k positions but at the positions (j_1, . . . , j_k) with 1 ≤ j_1 < · · · < j_k ≤ n. After a permutation π of the n positions with corresponding n × n permutation matrix, denoted by Π, we may assume that G_2 = G_1 Π is of the form (I_k | P). Now G_2 is a generator matrix of a code C_2 which is not necessarily equal to C. A parity check matrix H_2 for C_2 is given by (−P^T | I_{n−k}), according to Proposition 2.3.3. A parity check matrix H for C is now of the form (−P^T | I_{n−k}) Π^T, since Π^{-1} = Π^T. This remark motivates the following definition.

Definition 2.3.9 Let I = {i_1, . . . , i_k} be an information set of the code C. Then its complement {1, . . . , n} \ I is called a check set.

Example 2.3.10 Consider the code C of Example 2.2.23 with generator matrix G. The row reduced echelon form G_1 = rref(G) is systematic at the positions 1, 2, 4 and 8. Let π be the permutation (348765) with corresponding permutation matrix Π. Then G_2 = G_1 Π = (I_4 | P) and H_2 = (P^T | I_4) with

G_2 = ( 1 0 0 0 1 1 0 1 )      H_2 = ( 1 1 0 0 1 0 0 0 )
      ( 0 1 0 0 1 0 1 1 )            ( 1 0 1 0 0 1 0 0 )
      ( 0 0 1 0 0 1 1 1 )            ( 0 1 1 0 0 0 1 0 )
      ( 0 0 0 1 0 0 0 0 )            ( 1 1 1 0 0 0 0 1 )

Now π^{-1} = (356784) and

H = H_2 Π^T = ( 1 1 1 0 0 0 0 0 )
              ( 1 0 0 1 1 0 0 0 )
              ( 0 1 0 1 0 1 0 0 )
              ( 1 1 0 1 0 0 1 0 )

is a parity check matrix of C.

2.3.2 Hamming and simplex codes

The following proposition gives a method to determine the minimum distance of a code in terms of the number of dependent columns of the parity check matrix.
Proposition 2.3.11 Let H be a parity check matrix of a code C. Then the minimum distance d of C is the smallest integer d such that d columns of H are linearly dependent.

Proof. Let h_1, . . . , h_n be the columns of H. Let c be a nonzero codeword of weight w. Let supp(c) = {j_1, . . . , j_w} with 1 ≤ j_1 < · · · < j_w ≤ n. Then Hc^T = 0, so c_{j_1} h_{j_1} + · · · + c_{j_w} h_{j_w} = 0 with c_{j_i} ≠ 0 for all i = 1, . . . , w. Therefore the columns h_{j_1}, . . . , h_{j_w} are dependent. Conversely, if h_{j_1}, . . . , h_{j_w} are dependent, then there exist constants a_1, . . . , a_w, not all zero, such that a_1 h_{j_1} + · · · + a_w h_{j_w} = 0. Let c be the word defined by c_j = a_i if j = j_i for some i, and c_j = 0 otherwise. Then Hc^T = 0. Hence c is a nonzero codeword of weight at most w.

Remark 2.3.12 Let H be a parity check matrix of a code C. As a consequence of Proposition 2.3.11 we have the following special cases. The minimum distance of a code is 1 if and only if H has a zero column. An example of this is seen in Example 2.3.10. Now suppose that H has no zero column; then the minimum distance of C is at least 2. The minimum distance is equal to 2 if and only if H has two dependent columns, say h_{j_1} and h_{j_2}. In the binary case this means h_{j_1} = h_{j_2}. In other words, the minimum distance of a binary code is at least 3 if and only if H has no zero columns and all columns are mutually distinct. This is the case for the Hamming code of Example 2.2.9. For a given redundancy r, the length of a binary linear code C of minimum distance 3 is at most 2^r − 1, the number of all nonzero binary columns of length r. For arbitrary F_q, the number of nonzero columns with entries in F_q is q^r − 1. Two such columns are dependent if and only if one is a nonzero multiple of the other. Hence the length of an F_q-linear code C with d(C) ≥ 3 and redundancy r is at most (q^r − 1)/(q − 1).

Definition 2.3.13 Let n = (q^r − 1)/(q − 1). Let H_r(q) be an r × n matrix over F_q with nonzero columns such that no two columns are dependent. The code H_r(q) with H_r(q) as parity check matrix is called a q-ary Hamming code. The code with H_r(q) as generator matrix is called a q-ary simplex code and is denoted by S_r(q).

Proposition 2.3.14 Let r ≥ 2. Then the q-ary Hamming code H_r(q) has parameters [(q^r − 1)/(q − 1), (q^r − 1)/(q − 1) − r, 3].

Proof. The rank of the matrix H_r(q) is r, since the r standard basis vectors of weight 1 are among the columns of the matrix. So indeed H_r(q) is a parity check matrix of a code with redundancy r. Any two columns are independent by construction, and a column of weight 2 is a linear combination of two columns of weight 1; such a triple of columns exists, since r ≥ 2. Hence the minimum distance is 3 by Proposition 2.3.11.

Example 2.3.15 Consider the following ternary Hamming code H_3(3) of redundancy 3 and length 13 with parity check matrix

H_3(3) = ( 1 1 1 1 1 1 1 1 1 0 0 0 0 )
         ( 2 2 2 1 1 1 0 0 0 1 1 1 0 )
         ( 2 1 0 2 1 0 2 1 0 2 1 0 1 ).
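The following sketch builds an H_3(3) as in Definition 2.3.13 and checks Proposition 2.3.16 computationally. The particular choice of columns (one representative per line through the origin, normalized so that the first nonzero entry is 1) is one standard instance of H_r(q), an assumption of this sketch; it needs Python 3.8+ for pow(x, -1, q).

```python
# Build H_3(3) column-by-column (one representative per line through the
# origin of F_3^3) and check Proposition 2.3.16: every nonzero codeword of
# the simplex code S_3(3) has weight 3^2 = 9.
from itertools import product

q, r = 3, 3
points = []
for v in product(range(q), repeat=r):
    if any(v):
        lead = next(x for x in v if x)
        inv = pow(lead, -1, q)
        rep = tuple(x * inv % q for x in v)  # normalize first nonzero entry to 1
        if rep not in points:
            points.append(rep)

H = [[p[i] for p in points] for i in range(r)]  # r x (q^r - 1)/(q - 1) matrix
n = len(points)  # 13

weights = set()
for m in product(range(q), repeat=r):
    if any(m):
        c = [sum(m[i] * H[i][j] for i in range(r)) % q for j in range(n)]
        weights.add(sum(1 for x in c if x))
print(weights)  # {9}: S_3(3) is a constant weight code
```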
By Proposition 2.3.14 the code H_3(3) has parameters [13, 10, 3]. Notice that all rows of H_3(3) have weight 9. In fact every linear combination xH_3(3) with x ∈ F_3^3 and x ≠ 0 has weight 9. So all nonzero codewords of the ternary simplex code of dimension 3 have weight 9. Hence S_3(3) is a constant weight code. This is a general fact of simplex codes, as stated in the following proposition.

Proposition 2.3.16 The q-ary simplex code S_r(q) is a constant weight code with parameters [(q^r − 1)/(q − 1), r, q^{r−1}].

Proof. We have seen already in Proposition 2.3.14 that H_r(q) has rank r, so it is indeed a generator matrix of a code of dimension r. Let c be a nonzero codeword of the simplex code. Then c = mH_r(q) for some nonzero m ∈ F_q^r. Let h_j^T be the j-th column of H_r(q). Then c_j = 0 if and only if m · h_j = 0. Now m · x = 0 is a nontrivial homogeneous linear equation. This equation has q^{r−1} solutions x ∈ F_q^r, of which q^{r−1} − 1 are nonzero. It has (q^{r−1} − 1)/(q − 1) solutions x such that x^T is a column of H_r(q), since for every nonzero x ∈ F_q^r there is exactly one column in H_r(q) that is a nonzero multiple of x^T. So the number of zeros of c is (q^{r−1} − 1)/(q − 1). Hence the weight of c is the number of nonzero entries, which is q^{r−1}.

2.3.3 Inner product and dual codes

Definition 2.3.17 The inner product on F_q^n is defined by

x · y = x_1 y_1 + · · · + x_n y_n

for x, y ∈ F_q^n. This inner product is bilinear, symmetric and nondegenerate, but the notion of "positive definite" makes no sense over a finite field as it does over the real numbers. For instance, for a binary word x ∈ F_2^n we have that x · x = 0 if and only if the weight of x is even.

Definition 2.3.18 For an [n, k] code C we define the dual or orthogonal code C⊥ as

C⊥ = {x ∈ F_q^n | c · x = 0 for all c ∈ C}.

Proposition 2.3.19 Let C be an [n, k] code with generator matrix G. Then C⊥ is an [n, n − k] code with parity check matrix G.

Proof. From the definition of dual codes, the following statements are equivalent:

x ∈ C⊥,
c · x = 0 for all c ∈ C,
mGx^T = 0 for all m ∈ F_q^k,
Gx^T = 0.

This means that C⊥ is the null space of G. Because G is a k × n matrix of rank k, the linear space C⊥ has dimension n − k, and G is a parity check matrix of C⊥.
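A small brute-force sketch of Proposition 2.3.19 for the [7, 4] Hamming code: the dual is the null space of G, with 2^(7−4) = 8 words. That these turn out to be the simplex codewords, all of weight 4 except 0, anticipates Example 2.3.22 below.

```python
# The dual of the [7,4] Hamming code, computed as the null space of G over F_2.
from itertools import product

G = [(1, 0, 0, 0, 0, 1, 1),
     (0, 1, 0, 0, 1, 0, 1),
     (0, 0, 1, 0, 1, 1, 0),
     (0, 0, 0, 1, 1, 1, 1)]

dual = [x for x in product((0, 1), repeat=7)
        if all(sum(g[i] * x[i] for i in range(7)) % 2 == 0 for g in G)]
print(len(dual))                          # 8 = 2^(n-k)
print(sorted(set(sum(x) for x in dual)))  # [0, 4]: the simplex code weights
```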
Example 2.3.20 The trivial codes {0} and F_q^n are dual codes.

Example 2.3.21 The binary even weight code and the repetition code of the same length are dual codes.

Example 2.3.22 The simplex code S_r(q) and the Hamming code H_r(q) are dual codes, since H_r(q) is a parity check matrix of H_r(q) and a generator matrix of S_r(q).

A subspace C of a real vector space R^n has the property that C ∩ C⊥ = {0}, since the standard inner product is positive definite. Over finite fields this is not always the case.

Definition 2.3.23 Two codes C_1 and C_2 in F_q^n are called orthogonal if x · y = 0 for all x ∈ C_1 and y ∈ C_2, and they are called dual if C_2 = C_1⊥. If C ⊆ C⊥, we call C weakly self-dual or self-orthogonal. If C = C⊥, we call C self-dual. The hull of a code C is defined by H(C) = C ∩ C⊥. A code is called complementary dual if H(C) = {0}.

Example 2.3.24 The binary repetition code of length n is self-orthogonal if and only if n is even. This code is self-dual if and only if n = 2.

Proposition 2.3.25 Let C be an [n, k] code. Then:
(1) (C⊥)⊥ = C.
(2) C is self-dual if and only if C is self-orthogonal and n = 2k.

Proof. (1) Let c ∈ C. Then c · x = 0 for all x ∈ C⊥. So C ⊆ (C⊥)⊥. Moreover, applying Proposition 2.3.19 twice, we see that C and (C⊥)⊥ have the same finite dimension. Therefore equality holds.
(2) Suppose C is self-orthogonal; then C ⊆ C⊥. Now C = C⊥ if and only if k = n − k, by Proposition 2.3.19. So C is self-dual if and only if n = 2k.

Example 2.3.26 Consider the matrix

G = ( 1 0 0 0 0 1 1 1 )
    ( 0 1 0 0 1 0 1 1 )
    ( 0 0 1 0 1 1 0 1 )
    ( 0 0 0 1 1 1 1 0 ).

Let G be the generator matrix of the binary [8, 4] code C. Notice that GG^T = 0. So x · y = 0 for all x, y ∈ C. Hence C is self-orthogonal. Furthermore n = 2k, therefore C is self-dual. Notice that all rows of G have weight 4; therefore all codewords have weights divisible by 4, by Exercise 2.3.11. Hence C has parameters [8, 4, 4].

Remark 2.3.27 Notice that x · x ≡ wt(x) mod 2 if x ∈ F_2^n, and x · x ≡ wt(x) mod 3 if x ∈ F_3^n. Therefore all weights are even for a binary self-orthogonal code, and all weights are divisible by 3 for a ternary self-orthogonal code.
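A quick sketch (plain Python, brute force, purely illustrative) that checks the claims of Example 2.3.26: GG^T = 0 over F_2, and every codeword weight is divisible by 4.

```python
# Check self-orthogonality and the weight divisibility of Example 2.3.26.
from itertools import product

G = [(1, 0, 0, 0, 0, 1, 1, 1),
     (0, 1, 0, 0, 1, 0, 1, 1),
     (0, 0, 1, 0, 1, 1, 0, 1),
     (0, 0, 0, 1, 1, 1, 1, 0)]

# GG^T = 0 over F_2
print(all(sum(a * b for a, b in zip(u, v)) % 2 == 0 for u in G for v in G))  # True

weights = {sum(sum(m[i] * G[i][j] for i in range(4)) % 2 for j in range(8))
           for m in product((0, 1), repeat=4)}
print(weights)  # {0, 4, 8}: all weights divisible by 4, so d = 4
```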
Example 2.3.28 Consider the ternary code C with generator matrix G = (I_6 | A) with

A = ( 0 1 1 1 1 1 )
    ( 1 0 1 2 2 1 )
    ( 1 1 0 1 2 2 )
    ( 1 2 1 0 1 2 )
    ( 1 2 2 1 0 1 )
    ( 1 1 2 2 1 0 ).

It is left as an exercise to show that C is self-dual. The linear combination of any two columns of A has weight at least 3, and the linear combination of any two columns of I_6 has weight at most 2. So no three columns of G are dependent and G is also a parity check matrix of C. Hence the minimum distance of C is at least 4, and therefore it is 6 by Remark 2.3.27. Thus C has parameters [12, 6, 6]; it is called the extended ternary Golay code. By puncturing C we get an [11, 6, 5] code, called the ternary Golay code.

Corollary 2.3.29 Let C be a linear code. Then:
(1) G is a generator matrix of C if and only if G is a parity check matrix of C⊥.
(2) H is a parity check matrix of C if and only if H is a generator matrix of C⊥.

Proof. The first statement is Proposition 2.3.19, and the second statement is a consequence of the first applied to the code C⊥, using Proposition 2.3.25(1).

Proposition 2.3.30 Let C be an [n, k] code. Let G be a k × n generator matrix of C and let H be an (n − k) × n matrix of rank n − k. Then H is a parity check matrix of C if and only if GH^T = 0, the k × (n − k) zero matrix.

Proof. Suppose H is a parity check matrix. For any m ∈ F_q^k, mG is a codeword of C. So H(mG)^T = HG^T m^T = 0. This implies mGH^T = 0, and since m can be any vector in F_q^k, we have GH^T = 0. Conversely, suppose GH^T = 0. We assumed that G is a k × n matrix of rank k and H is an (n − k) × n matrix of rank n − k. So H is the parity check matrix of an [n, k] code C′. For any c ∈ C we have c = mG for some m ∈ F_q^k. Now Hc^T = (mGH^T)^T = 0. So c ∈ C′. This implies C ⊆ C′. Hence C′ = C, since both C and C′ have dimension k. Therefore H is a parity check matrix of C.

Remark 2.3.31 A consequence of Proposition 2.3.30 is another proof of Proposition 2.3.3: let G = (I_k | P) be a generator matrix of C and let H = (−P^T | I_{n−k}). Then G has rank k, H has rank n − k, and GH^T = 0. Therefore H is a parity check matrix of C.
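A minimal sketch of the criterion of Proposition 2.3.30, using the generator and parity check matrices of the [7, 4, 3] Hamming code from Examples 2.2.14 and 2.2.9:

```python
# GH^T should be the 4 x 3 zero matrix over F_2.
G = [(1, 0, 0, 0, 0, 1, 1),
     (0, 1, 0, 0, 1, 0, 1),
     (0, 0, 1, 0, 1, 1, 0),
     (0, 0, 0, 1, 1, 1, 1)]
H = [(0, 1, 1, 1, 1, 0, 0),
     (1, 0, 1, 1, 0, 1, 0),
     (1, 1, 0, 1, 0, 0, 1)]

GHT = [[sum(g[i] * h[i] for i in range(7)) % 2 for h in H] for g in G]
print(GHT)  # [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
```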
2.3.4 Exercises

2.3.1 Assume that 3540461335 is obtained from an ISBN code by interchanging two neighboring symbols. What are the possible ISBN codes? Now assume moreover that it is an ISBN code of an existing book. What is the title of this book?

2.3.2 Consider the binary product code C of Example 2.1.2. Give a parity check matrix and a generator matrix of this code. Determine the parameters of the dual of C.

2.3.3 Give a parity check matrix of the code C of Exercise 2.2.4. Show that C is self-dual.

2.3.4 Consider the binary simplex code S_3(2) with generator matrix H as given in Example 2.2.9. Show that there are exactly seven triples (i_1, i_2, i_3) with increasing coordinate positions such that S_3(2) is not systematic at (i_1, i_2, i_3). Give the seven four-tuples of positions that are not systematic with respect to the Hamming code H_3(2) with parity check matrix H.

2.3.5 Let C_1 and C_2 be linear codes of the same length. Show the following statements:
(1) If C_1 ⊆ C_2, then C_2⊥ ⊆ C_1⊥.
(2) C_1 and C_2 are orthogonal if and only if C_1 ⊆ C_2⊥ if and only if C_2 ⊆ C_1⊥.
(3) (C_1 ∩ C_2)⊥ = C_1⊥ + C_2⊥.
(4) (C_1 + C_2)⊥ = C_1⊥ ∩ C_2⊥.

2.3.6 Show that a linear code C with generator matrix G has a complementary dual if and only if det(GG^T) ≠ 0.

2.3.7 Show that there exists a [2k, k] self-dual code over F_q if and only if there is a k × k matrix P with entries in F_q such that PP^T = −I_k.

2.3.8 Give an example of a ternary [4, 2] self-dual code and show that there is no ternary self-dual code of length 6.

2.3.9 Show that the extended ternary Golay code of Example 2.3.28 is self-dual.

2.3.10 Show that a binary code is self-orthogonal if the weights of all codewords are divisible by 4. Hint: use Exercise 2.2.2.

2.3.11 Let C be a binary self-orthogonal code which has a generator matrix such that all its rows have weight divisible by 4. Show that the weights of all codewords are divisible by 4.

2.3.12 Write a procedure in either GAP or Magma that determines whether a given code is self-dual or not. Test the correctness of your procedure with the commands IsSelfDualCode and IsSelfDual in GAP and Magma, respectively.

2.4 Decoding and the error probability

Intro
2.4.1 Decoding problem

Definition 2.4.1 Let C be a linear code in F_q^n of minimum distance d. If c is a transmitted codeword and r is the received word, then {i | r_i ≠ c_i} is the set of error positions, and the number of error positions is called the number of errors of the received word. Let e = r − c. Then e is called the error vector and r = c + e. Hence supp(e) is the set of error positions and wt(e) the number of errors. The e_i's are called the error values.

Remark 2.4.2 If r is the received word and t′ = d(C, r) is the distance of r to the code C, then there exists a nearest codeword c′ such that t′ = d(c′, r). So there exists an error vector e′ such that r = c′ + e′ and wt(e′) = t′. If the number of errors t is at most (d − 1)/2, then we are sure that c = c′ and e = e′. In other words, the nearest codeword to r is unique when r has distance at most (d − 1)/2 to C.

***Picture***

Definition 2.4.3 e(C) = ⌊(d(C) − 1)/2⌋ is called the error-correcting capacity or decoding radius of the code C.

Definition 2.4.4 A decoder D for the code C is a map

D : F_q^n −→ F_q^n ∪ {∗}

such that D(c) = c for all c ∈ C. If E : F_q^k → F_q^n is an encoder of C and D : F_q^n → F_q^k ∪ {∗} is a map such that D(E(m)) = m for all m ∈ F_q^k, then D is called a decoder with respect to the encoder E.

Remark 2.4.5 If E is an encoder of C and D is a decoder with respect to E, then the composition E ◦ D is a decoder of C. It is allowed that the decoder gives as outcome the symbol ∗ in case it fails to find a codeword. This is called a decoding failure. If c is the codeword sent, r is the received word and D(r) = c′ ≠ c, then this is called a decoding error. If D(r) = c, then r is decoded correctly. Notice that a decoding failure is noted on the receiving end, whereas there is no way that the decoder can detect a decoding error.

Definition 2.4.6 A complete decoder is a decoder that always gives a codeword in C as outcome. A nearest neighbor decoder, also called a minimum distance decoder, is a complete decoder with the property that D(r) is a nearest codeword. A decoder D for a code C is called a t-bounded distance decoder, or a decoder that corrects t errors, if D(r) is a nearest codeword for all received words r with d(C, r) ≤ t. A decoder for a code C with error-correcting capacity e(C) decodes up to half the minimum distance if it is an e(C)-bounded distance decoder.

Remark 2.4.7 If D is a t-bounded distance decoder, then it is not required that D gives a decoding failure as outcome for a received word r if the distance of r to the code is strictly larger than t. In other words: D is also a t′-bounded distance decoder for all t′ ≤ t. A nearest neighbor decoder is a t-bounded distance decoder for all t ≤ ρ(C), where ρ(C) is the covering radius of the code. A ρ(C)-bounded distance decoder is a nearest neighbor decoder, since d(C, r) ≤ ρ(C) for all received words r.
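The following brute-force sketch of a nearest neighbor decoder (Definition 2.4.6) is purely illustrative; exhaustive search over all codewords is exponential in the dimension, and finding efficient decoders is the real problem, as the text notes below.

```python
# A brute-force nearest neighbor decoder: return some codeword at minimal
# Hamming distance from the received word r.
def hamming_distance(x, y):
    return sum(a != b for a, b in zip(x, y))

def nearest_neighbor_decoder(r, code):
    return min(code, key=lambda c: hamming_distance(c, r))

# The binary triple repetition code {000, 111}.
code = [(0, 0, 0), (1, 1, 1)]
print(nearest_neighbor_decoder((0, 1, 0), code))  # (0, 0, 0)
```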
Definition 2.4.8 Let r be a received word with respect to a code C. A coset leader of r + C is a choice of an element of minimal weight in the coset r + C. The weight of a coset is the minimal weight of an element in the coset. Let α_i be the number of cosets of C that are of weight i. Then α_C(X, Y), the coset leader weight enumerator of C, is the polynomial defined by

α_C(X, Y) = Σ_{i=0}^{n} α_i X^{n−i} Y^i.

Remark 2.4.9 The choice of a coset leader of the coset r + C is unique if d(C, r) ≤ (d − 1)/2, and α_i = (n choose i)(q − 1)^i for all i ≤ (d − 1)/2, where d is the minimum distance of C. Let ρ(C) be the covering radius of the code; then there is at least one codeword c such that d(c, r) ≤ ρ(C). Hence the weight of a coset leader is at most ρ(C) and α_i = 0 for i > ρ(C). Therefore the coset leader weight enumerator of a perfect code C of minimum distance d = 2t + 1 is given by

α_C(X, Y) = Σ_{i=0}^{t} (n choose i)(q − 1)^i X^{n−i} Y^i.

The computation of the coset leader weight enumerator of a code is in general a very hard problem.

Definition 2.4.10 Let r be a received word. Let e be the chosen coset leader of the coset r + C. The coset leader decoder gives r − e as output.

Remark 2.4.11 The coset leader decoder is a nearest neighbor decoder.

Definition 2.4.12 Let r be a received word with respect to a code C of dimension k. Choose an (n − k) × n parity check matrix H of the code C. Then s = rH^T ∈ F_q^{n−k} is called the syndrome of r with respect to H.

Remark 2.4.13 Let C be a code of dimension k. Let r be a received word. Then r + C is called the coset of r. Now the cosets of the received words r_1 and r_2 are the same if and only if r_1 H^T = r_2 H^T. Therefore there is a one-to-one correspondence between cosets of C and values of syndromes. Furthermore, every element of F_q^{n−k} is the syndrome of some received word r, since H has rank n − k. Hence the number of cosets is q^{n−k}.

A list decoder gives as output the collection of all nearest codewords.

Knowing that a decoder exists is nice from a theoretical point of view; in practice the problem is to find an efficient algorithm that computes the outcome of the decoder. Finding, for a given vector in Euclidean n-space, the closest vector in a given linear subspace can be done efficiently by an orthogonal projection onto the subspace. The corresponding problem for linear codes is in general not such an easy task. This is treated in Section 6.2.1.

2.4.2 Symmetric channel

....
Definition 2.4.14 The q-ary symmetric channel (qSC) is a channel where q-ary words are sent with independent errors with the same cross-over probability p at each coordinate, with 0 ≤ p ≤ 1/2, such that each of the q − 1 wrong symbols occurs with the same probability p/(q − 1). So a symbol is transmitted correctly with probability 1 − p. The special case q = 2 is called the binary symmetric channel (BSC).

picture

Remark 2.4.15 Let P(x) be the probability that the codeword x is sent. This probability is assumed to be the same for all codewords. Hence P(c) = 1/|C| for all c ∈ C. Let P(r|c) be the probability that r is received given that c is sent. Then

P(r|c) = (p/(q − 1))^{d(c,r)} (1 − p)^{n−d(c,r)}

for a q-ary symmetric channel.

Definition 2.4.16 For every decoding scheme and channel one defines three probabilities P_cd(p), P_de(p) and P_df(p): the probability of correct decoding, of a decoding error and of a decoding failure, respectively. Then

P_cd(p) + P_de(p) + P_df(p) = 1 for all 0 ≤ p ≤ 1/2.

So it suffices to find formulas for two of these three probabilities. The error probability, also called the error rate, is defined by P_err(p) = 1 − P_cd(p). Hence P_err(p) = P_de(p) + P_df(p).

Proposition 2.4.17 The probability of correct decoding of a decoder that corrects up to t errors with 2t + 1 ≤ d, for a code C of minimum distance d on a q-ary symmetric channel with cross-over probability p, is given by

P_cd(p) = Σ_{w=0}^{t} (n choose w) p^w (1 − p)^{n−w}.

Proof. Every codeword has the same probability of transmission. So

P_cd(p) = Σ_{c∈C} P(c) Σ_{d(c,r)≤t} P(r|c) = (1/|C|) Σ_{c∈C} Σ_{d(c,r)≤t} P(r|c).

Now P(r|c) depends only on the distance between r and c, by Remark 2.4.15. So without loss of generality we may assume that 0 is the codeword sent. Hence

P_cd(p) = Σ_{d(0,r)≤t} P(r|0) = Σ_{w=0}^{t} (n choose w)(q − 1)^w (p/(q − 1))^w (1 − p)^{n−w}

by Proposition 2.1.13. Cancelling the factor (q − 1)^w in numerator and denominator gives the desired result.

In Proposition 4.2.6 a formula will be derived for the probability of decoding error of a decoding algorithm that corrects errors up to half the minimum distance.
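The following sketch evaluates the formula of Proposition 2.4.17 numerically and reproduces the comparison made in Example 2.4.18 below (error rates at p = 0.01 for four uncoded bits, the Hamming code, and the triple repetition code used four times).

```python
# Evaluate P_cd(p) = sum_{w=0}^{t} C(n, w) p^w (1-p)^(n-w) from Prop. 2.4.17.
from math import comb

def p_correct(n, t, p):
    return sum(comb(n, w) * p**w * (1 - p)**(n - w) for w in range(t + 1))

p = 0.01
print(1 - (1 - p)**4)             # ~0.039: four uncoded bits
print(1 - p_correct(7, 1, p))     # ~0.002: Hamming code, 4 information bits
print(1 - p_correct(3, 1, p)**4)  # ~0.001: repetition code used four times
```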
Example 2.4.18 Consider the binary triple repetition code. Assume that (0, 0, 0) is transmitted. If the received word has weight 0 or 1, then it is correctly decoded to (0, 0, 0). If the received word has weight 2 or 3, then it is decoded to (1, 1, 1), which is a decoding error. Hence there are no decoding failures and

P_cd(p) = (1 − p)^3 + 3p(1 − p)^2 = 1 − 3p^2 + 2p^3 and P_err(p) = P_de(p) = 3p^2 − 2p^3.

If the Hamming code is used, then there are no decoding failures and

P_cd(p) = (1 − p)^7 + 7p(1 − p)^6,
P_err(p) = P_de(p) = 21p^2 − 70p^3 + 105p^4 − 84p^5 + 35p^6 − 6p^7.

This shows that the error probability of the repetition code is smaller than that of the Hamming code. This comparison is not fair, since only one bit of information is transmitted with the repetition code and four bits with the Hamming code. One could transmit 4 bits of information by using the repetition code four times. This would give the error probability

1 − (1 − 3p^2 + 2p^3)^4 = 12p^2 − 8p^3 − 54p^4 + 72p^5 + 84p^6 − 216p^7 + · · ·

plot of these functions

Suppose that four bits of information are transmitted uncoded, by the Hamming code and by the triple repetition code, respectively. Then the error probabilities are 0.04, 0.002 and 0.001, respectively, if the cross-over probability is 0.01. The error probability for the repetition code is in fact smaller than that of the Hamming code for all p ≤ 1/2, but the transmission by the Hamming code is almost twice as fast as by the repetition code.

Example 2.4.19 Consider the binary n-fold repetition code. Let t = (n − 1)/2. Use the decoding algorithm correcting all patterns of t errors. Then

P_err(p) = Σ_{i=t+1}^{n} (n choose i) p^i (1 − p)^{n−i}.

Hence the error probability becomes arbitrarily small for increasing n. The price one has to pay is that the information rate R = 1/n tends to 0. The remarkable result of Shannon states that for a fixed rate R < C(p), where

C(p) = 1 + p log_2(p) + (1 − p) log_2(1 − p)

is the capacity of the binary symmetric channel, one can devise encoding and decoding schemes such that P_err(p) becomes arbitrarily small. This will be treated in Theorem 4.2.9.

The main problem of error-correcting codes from "Shannon's point of view" is to construct efficient encoding and decoding algorithms of codes with the smallest error probability for a given information rate and cross-over probability.

Proposition 2.4.20 The probability of correct decoding of the coset leader decoder on a q-ary symmetric channel with cross-over probability p is given by

P_cd(p) = α_C(1 − p, p/(q − 1)).
Proof. This is left as an exercise.

Example 2.4.21 ...........

2.4.3 Exercises

2.4.1 Consider the binary repetition code of length n. Compute the probabilities of correct decoding, decoding error and decoding failure in case of incomplete decoding of t = ⌊(n − 1)/2⌋ errors and complete decoding by choosing one nearest neighbor.

2.4.2 Consider the product code of Example 2.1.2. Compute the probabilities of correct decoding, decoding error and decoding failure in case the decoding algorithm corrects all error patterns of at most t errors, for t = 1, t = 2 and t = 3, respectively.

2.4.3 Give a proof of Proposition 2.4.20.

2.4.4 ***Give the probability of correct decoding for the code .... for a coset leader decoder. ***

2.4.5 ***Product code has error probability at most P1(P2(p)).***

2.5 Equivalent codes

Notice that a Hamming code over F_q of a given redundancy r is defined up to the order of the columns of the parity check matrix and up to multiplication of a column by a nonzero constant. A permutation of the columns and multiplication of the columns by nonzero constants gives another code with the same parameters that is in a certain sense equivalent.

2.5.1 Number of generator matrices and codes

The set of all invertible n × n matrices over the finite field F_q is denoted by Gl(n, q). Now Gl(n, q) is a finite group with respect to matrix multiplication, called the general linear group.

Proposition 2.5.1 The number of elements of Gl(n, q) is

(q^n − 1)(q^n − q) · · · (q^n − q^{n−1}).

Proof. Let M be an n × n matrix with rows m_1, . . . , m_n. Then M is invertible if and only if m_1, . . . , m_n are independent, that is, if and only if m_1 ≠ 0 and m_i is not in the linear subspace generated by m_1, . . . , m_{i−1} for all i = 2, . . . , n. Hence for an invertible matrix M we are free to choose a nonzero vector for the first row: there are q^n − 1 possibilities. The second row should not be a multiple of the first row, so we have q^n − q possibilities for the second row for every nonzero choice of the first row. The subspace generated by m_1, . . . , m_{i−1} has dimension i − 1 and q^{i−1} elements. The i-th row is not in this subspace if M is invertible. So we have q^n − q^{i−1} possible choices for the i-th row for every legitimate choice of the first i − 1 rows. This proves the claim.
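A one-function sketch of Proposition 2.5.1, with a brute-force cross-check for tiny parameters (the determinant test for n = 2 is the sketch's own shortcut, not part of the proof above):

```python
# Count invertible n x n matrices over F_q by the formula, and verify for
# n = 2, q = 2 by brute force: a 2 x 2 matrix is invertible iff ad - bc != 0 mod 2.
from itertools import product

def gl_order(n, q):
    # (q^n - 1)(q^n - q) ... (q^n - q^(n-1))
    result = 1
    for i in range(n):
        result *= q**n - q**i
    return result

count = sum((a * d - b * c) % 2 != 0 for a, b, c, d in product(range(2), repeat=4))
print(count, gl_order(2, 2))  # 6 6
```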
Proposition 2.5.2 1) The number of k × n generator matrices over F_q is

(q^n − 1)(q^n − q) · · · (q^n − q^{k−1}).

2) The number of [n, k] codes over F_q is equal to the Gaussian binomial

[n choose k]_q := ((q^n − 1)(q^n − q) · · · (q^n − q^{k−1})) / ((q^k − 1)(q^k − q) · · · (q^k − q^{k−1})).

Proof. 1) A k × n generator matrix consists of k independent rows of length n over F_q. These matrices are counted in the same way as in the proof of Proposition 2.5.1.
2) The second statement is a consequence of Propositions 2.5.1 and 2.2.17, and the fact that MG = G if and only if M = I_k, for every M ∈ Gl(k, q) and every k × n generator matrix G, since G has rank k.

It is a consequence of Proposition 2.5.2 that the Gaussian binomials are integers for every choice of n, k and q. In fact more is true.

Proposition 2.5.3 The number of [n, k] codes over F_q is a polynomial in q of degree k(n − k) with non-negative integers as coefficients.

Proof. There is another way to count the number of [n, k] codes over F_q, since the row reduced echelon form rref(C) of a generator matrix of C is unique by Proposition 2.2.17. Suppose that rref(C) has pivots at j = (j_1, . . . , j_k) with 1 ≤ j_1 < · · · < j_k ≤ n; then the remaining entries are free to choose as long as the row reduced echelon form at the given pivots (j_1, . . . , j_k) is respected. Let the number of these free entries be e(j). Then the number of [n, k] codes over F_q is equal to

Σ_{1 ≤ j_1 < · · · < j_k ≤ n} q^{e(j)}.

Furthermore e(j) is maximal and equal to k(n − k) for j = (1, 2, . . . , k). This is left as Exercise 2.5.2 to the reader.

Example 2.5.4 Let us compute the number of [3, 2] codes over F_q. According to Proposition 2.5.2 it is equal to

[3 choose 2]_q = ((q^3 − 1)(q^3 − q)) / ((q^2 − 1)(q^2 − q)) = q^2 + q + 1,

which is a polynomial of degree 2 · (3 − 2) = 2 with non-negative integers as coefficients. This is in agreement with Proposition 2.5.3. If we follow the proof of this proposition, then the possible row reduced echelon forms are

( 1 0 ∗ )    ( 1 ∗ 0 )    ( 0 1 0 )
( 0 1 ∗ ),   ( 0 0 1 )    ( 0 0 1 ),

where the ∗'s denote the entries that are free to choose. So e(1, 2) = 2, e(1, 3) = 1 and e(2, 3) = 0. Hence the number of [3, 2] codes is equal to q^2 + q + 1, as we have seen before.
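A short sketch computing the Gaussian binomial of Proposition 2.5.2, checking Example 2.5.4 for q = 2:

```python
# Number of [n, k] codes over F_q, i.e. the Gaussian binomial [n choose k]_q.
def gaussian_binomial(n, k, q):
    num, den = 1, 1
    for i in range(k):
        num *= q**n - q**i
        den *= q**k - q**i
    return num // den  # always an integer, by Proposition 2.5.2

print(gaussian_binomial(3, 2, 2))  # 7 = q^2 + q + 1 for q = 2
print(gaussian_binomial(5, 3, 2))  # number of binary [5, 3] codes (Exercise 2.5.1)
```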
2.5.2 Isometries and equivalent codes

Definition 2.5.5 Let M ∈ Gl(n, q). Then the map

M : F_q^n −→ F_q^n,

defined by M(x) = xM, is a one-to-one linear map. Notice that the map and the matrix are both denoted by M. Let S be a subset of F_q^n. The operation xM, where x ∈ S and M ∈ Gl(n, q), is called an action of the group Gl(n, q) on S. For a given M ∈ Gl(n, q), the set SM = {xM | x ∈ S}, also denoted by M(S), is called the image of S under M.

Definition 2.5.6 The group of permutations of {1, . . . , n} is called the symmetric group on n letters and is denoted by S_n. Let π ∈ S_n. Define the corresponding permutation matrix Π with entries p_ij by p_ij = 1 if i = π(j) and p_ij = 0 otherwise.

Remark 2.5.7 S_n is indeed a group and has n! elements. Let Π be the permutation matrix of a permutation π in S_n. Then Π is invertible and orthogonal, that is, Π^T = Π^{-1}. The corresponding map Π : F_q^n → F_q^n is given by Π(x) = y with y_i = x_{π(i)} for all i. Now Π is an invertible linear map. Let e_i be the i-th standard basis row vector. Then Π^{-1}(e_i) = e_{π(i)} by the above conventions. The set of n × n permutation matrices is a subgroup of Gl(n, q) with n! elements.

Definition 2.5.8 Let v ∈ F_q^n. Then diag(v) is the n × n diagonal matrix with v on its diagonal and zeros outside the diagonal. An n × n matrix with entries in F_q is called monomial if every row has exactly one nonzero entry and every column has exactly one nonzero entry. Let Mono(n, q) be the set of all n × n monomial matrices with entries in F_q.

Remark 2.5.9 The matrix diag(v) is invertible if and only if every entry of v is nonzero. Hence the set of n × n invertible diagonal matrices is a subgroup of Gl(n, q) with (q − 1)^n elements. Let M be an element of Mono(n, q). Define the vector v ∈ F_q^n with nonzero entries and the map π from {1, . . . , n} to itself by π(j) = i if v_i is the unique nonzero entry of M in the i-th row and the j-th column. Now π is a permutation by the definition of a monomial matrix. So M has entries m_ij with m_ij = v_i if i = π(j) and m_ij = 0 otherwise. Hence M = diag(v)Π. Therefore a matrix is monomial if and only if it is the product of a diagonal and a permutation matrix. The corresponding monomial map M : F_q^n → F_q^n of the monomial matrix M is given by M(x) = y with y_i = v_i x_{π(i)}. The set Mono(n, q) is a subgroup of Gl(n, q) with (q − 1)^n n! elements.

Definition 2.5.10 A map ϕ : F_q^n → F_q^n is called an isometry if it leaves the Hamming metric invariant, that is,

d(ϕ(x), ϕ(y)) = d(x, y)

for all x, y ∈ F_q^n. Let Isom(n, q) be the set of all isometries of F_q^n.

Proposition 2.5.11 Isom(n, q) is a group under the composition of maps.
Proof. The identity map is an isometry. Let ϕ and ψ be isometries of F_q^n. Let x, y ∈ F_q^n. Then

d((ϕ ◦ ψ)(x), (ϕ ◦ ψ)(y)) = d(ϕ(ψ(x)), ϕ(ψ(y))) = d(ψ(x), ψ(y)) = d(x, y).

Hence ϕ ◦ ψ is an isometry. Let ϕ be an isometry of F_q^n. Suppose that x, y ∈ F_q^n and ϕ(x) = ϕ(y). Then 0 = d(ϕ(x), ϕ(y)) = d(x, y). So x = y. Hence ϕ is bijective and has an inverse map ϕ^{-1}. Let x, y ∈ F_q^n. Then

d(x, y) = d(ϕ(ϕ^{-1}(x)), ϕ(ϕ^{-1}(y))) = d(ϕ^{-1}(x), ϕ^{-1}(y)),

since ϕ is an isometry. Therefore ϕ^{-1} is an isometry. So Isom(n, q) is nonempty and closed under taking compositions and inverses. Therefore Isom(n, q) is a group.

Remark 2.5.12 Permutation matrices define isometries. Translations, invertible diagonal matrices and, more generally, coordinatewise permutations of the elements of F_q also define isometries. Conversely, every isometry is a composition of the isometries mentioned before. This fact we leave as Exercise 2.5.4. The following proposition characterizes linear isometries.

Proposition 2.5.13 Let M ∈ Gl(n, q). Then the following statements are equivalent:
(1) M is an isometry,
(2) wt(M(x)) = wt(x) for all x ∈ F_q^n, i.e., M leaves the weight invariant,
(3) M is a monomial matrix.

Proof. Statements (1) and (2) are equivalent, since M(x − y) = M(x) − M(y) and d(x, y) = wt(x − y). Statement (3) implies (1), since permutation matrices and invertible diagonal matrices leave the weight of a vector invariant, and a monomial matrix is a product of such matrices by Remark 2.5.9. Statement (2) implies (3): Let e_i be the i-th standard basis vector of F_q^n. Then e_i has weight 1, so M(e_i) also has weight 1. Hence M(e_i) = v_i e_{π(i)}, where v_i is a nonzero element of F_q and π is a map from {1, . . . , n} to itself. Now π is a bijection, since M is invertible. So π is a permutation and M = diag(v)Π^{-1}. Therefore M is a monomial matrix.

Corollary 2.5.14 An isometry is linear if and only if it comes from a monomial matrix, that is,

Gl(n, q) ∩ Isom(n, q) = Mono(n, q).

Proof. This follows directly from the definitions and Proposition 2.5.13.

Definition 2.5.15 Let C and D be codes in F_q^n that are not necessarily linear. Then C is called equivalent to D if there exists an isometry ϕ of F_q^n such that ϕ(C) = D. If moreover C = D, then ϕ is called an automorphism of C.
The automorphism group of C is the set of all isometries ϕ such that ϕ(C) = C and is denoted by Aut(C). C is called permutation equivalent to D, denoted by D ≡ C, if there exists a permutation matrix Π such that Π(C) = D. If moreover C = D, then Π is called a permutation automorphism of C. The permutation automorphism group of C is the set of all permutation automorphisms of C and is denoted by PAut(C). C is called generalized equivalent or monomial equivalent to D, denoted by D ≅ C, if there exists a monomial matrix M such that M(C) = D. If moreover C = D, then M is called a monomial automorphism of C. The monomial automorphism group of C is the set of all monomial automorphisms of C and is denoted by MAut(C).

Proposition 2.5.16 Let C and D be two F_q-linear codes of the same length. Then:
(1) If C ≡ D, then C⊥ ≡ D⊥.
(2) If C ≅ D, then C⊥ ≅ D⊥.
(3) If C ≡ D, then C ≅ D.
(4) If C ≅ D, then C and D have the same parameters.

Proof. We leave the proof to the reader as an exercise.

Remark 2.5.17 Every [n, k] code is equivalent to a code which is systematic at the first k positions, that is, with a generator matrix of the form (I_k | P), according to Remark 2.3.8. Notice that in the binary case C ≡ D if and only if C ≅ D.

Example 2.5.18 Let C be a binary [7, 4, 3] code with parity check matrix H. Then H is a 3 × 7 matrix such that all columns are nonzero and mutually distinct, by Proposition 2.3.11, since C has minimum distance 3. There are exactly 7 nonzero binary column vectors with 3 entries. Hence H is a permutation of the columns of a parity check matrix of the [7, 4, 3] Hamming code. Therefore every binary [7, 4, 3] code is permutation equivalent to the Hamming code.

Proposition 2.5.19
(1) Every F_q-linear code with parameters [(q^r − 1)/(q − 1), (q^r − 1)/(q − 1) − r, 3] is generalized equivalent to the Hamming code H_r(q).
(2) Every F_q-linear code with parameters [(q^r − 1)/(q − 1), r, q^{r−1}] is generalized equivalent to the simplex code S_r(q).

Proof. (1) Let n = (q^r − 1)/(q − 1). Then n is the number of lines in F_q^r through the origin. Let H be a parity check matrix of an F_q-linear code with parameters [n, n − r, 3]. Then there are no zero columns in H and every two columns are independent, by Proposition 2.3.11. Every column of H generates a unique line in F_q^r through the origin, and every such line is obtained in this way. Let H′ be the parity check matrix of a code C′ with the same parameters [n, n − r, 3]. Then for every column h′_j of H′ there is a unique column h_i of H such that h′_j is a nonzero multiple of h_i. Hence H′ = HM for some monomial matrix M, and C and C′ are generalized equivalent.
(2) The second statement follows from the first, since the simplex code is the dual of the Hamming code.
Remark 2.5.20 A code of length n is called cyclic if the cyclic permutation of coordinates σ(i) = i − 1 modulo n leaves the code invariant. A cyclic code of length n has an element of order n in its automorphism group. Cyclic codes are extensively treated in Chapter 7.1.

Remark 2.5.21 Let C be an F_q-linear code of length n. Then PAut(C) is a subgroup of S_n and MAut(C) is a subgroup of Mono(n, q). If C is a trivial code, then PAut(C) = S_n and MAut(C) = Mono(n, q). The matrices λI_n are in MAut(C) for all nonzero λ ∈ F_q. So MAut(C) always contains F_q^* as a subgroup. Furthermore Mono(n, q) = S_n and MAut(C) = PAut(C) if q = 2.

Example 2.5.22 Let C be the n-fold repetition code. Then PAut(C) = S_n and MAut(C) is isomorphic to F_q^* × S_n.

Proposition 2.5.23 Let G be a generator matrix of an F_q-linear code C of length n. Let Π be an n × n permutation matrix. Let M ∈ Mono(n, q). Then:
(1) Π ∈ PAut(C) if and only if rref(G) = rref(GΠ),
(2) M ∈ MAut(C) if and only if rref(G) = rref(GM).

Proof. (1) Let Π be an n × n permutation matrix. Then GΠ is a generator matrix of Π(C). Moreover Π(C) = C if and only if rref(G) = rref(GΠ), by Proposition 2.2.17.
(2) The second statement is proved similarly.

Example 2.5.24 Let C be the code with generator matrix G and let M be the monomial matrix given by

G = ( 1 0 a_1 )        M = ( 0   x_2 0   )
    ( 0 1 a_2 ),           ( x_1 0   0   )
                           ( 0   0   x_3 ),

where the a_i and x_j are nonzero elements of F_q. Now G is already in row reduced echelon form. One verifies that

rref(GM) = ( 1 0 a_2 x_3 / x_1 )
           ( 0 1 a_1 x_3 / x_2 ).

Hence M is a monomial automorphism of C if and only if a_1 x_1 = a_2 x_3 and a_2 x_2 = a_1 x_3.
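A sketch of the criterion of Proposition 2.5.23(1), assuming the rref helper from the sketch in Section 2.2.2; the test code and the chosen permutation are illustrative assumptions.

```python
# Test Pi in PAut(C) via rref(G) == rref(G Pi) for the binary [3,2] even
# weight code, which is invariant under every permutation of positions.
def permute_columns(G, perm):
    # column j of the result is column perm[j] of G; a simple stand-in for G Pi
    return [[row[perm[j]] for j in range(len(row))] for row in G]

G = [[1, 0, 1],
     [0, 1, 1]]       # even weight code of length 3
shift = [2, 0, 1]     # a cyclic shift of the three positions

print(rref(G, 2) == rref(permute_columns(G, shift), 2))  # True
```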
Definition 2.5.25 A map f from the set of all (linear) codes to another set is called an invariant of a (linear) code if f(C) = f(ϕ(C)) for every code C in F_q^n and every isometry ϕ of F_q^n. The map f is called a permutation invariant if f(C) = f(Π(C)) for every code C in F_q^n and every n × n permutation matrix Π. The map f is called a monomial invariant if f(C) = f(M(C)) for every code C in F_q^n and every M ∈ Mono(n, q).

Remark 2.5.26 The length, the number of elements and the minimum distance are clearly invariants of a code. The dimension is a permutation and a monomial invariant of a linear code. The isomorphy class of the group of automorphisms of a code is an invariant of a code. The isomorphy classes of PAut(C) and MAut(C) are permutation and monomial invariants, respectively, of a linear code.
2.5.3 Exercises

2.5.1 Determine the number of [5, 3] codes over F_q by Proposition 2.5.2 and show by division that it is a polynomial in q. Determine the exponent e(j) and the number of codes such that rref(C) is systematic at a given 3-tuple (j_1, j_2, j_3), for all 3-tuples with 1 ≤ j_1 < j_2 < j_3 ≤ 5, as in Proposition 2.5.3, and verify that they sum up to the total number of [5, 3] codes.

2.5.2 Show that e(j) = Σ_{t=1}^{k} t(j_{t+1} − j_t − 1) for every k-tuple (j_1, . . . , j_k) with 1 ≤ j_1 < . . . < j_k ≤ n and j_{k+1} = n + 1, in the proof of Proposition 2.5.3. Show that the maximum of e(j) is equal to k(n − k) and that this maximum is attained by exactly one k-tuple, namely (1, 2, . . . , k).

2.5.3 Let p be a prime and q = p^m. Consider the map ϕ : F_q^n → F_q^n defined by ϕ(x_1, . . . , x_n) = (x_1^p, . . . , x_n^p). Show that ϕ is an isometry that permutes the elements of the alphabet F_q coordinatewise. Prove that ϕ is a linear map if and only if m = 1; so ϕ is not linear if m > 1. Show that ϕ(C) is a linear code if C is a linear code.

2.5.4 Show that permutation matrices and coordinatewise permutations of the elements of F_q define isometries. Show that every element of Isom(n, q) is the composition of a permutation matrix and a coordinatewise permutation of the elements of F_q, and that such a composition is unique. Show that the number of elements of Isom(n, q) is equal to n!(q!)^n.

2.5.5 Give a proof of Proposition 2.5.16.

2.5.6 Show that every binary (7, 16, 3) code is isometric with the Hamming code.

2.5.7 Let C be a linear code of length n. Assume that n is a power of a prime. Show that if there exists an element in PAut(C) of order n, then C is equivalent to a cyclic code. Show that the assumption on n being a prime power is necessary by means of a counterexample.

2.5.8 A code C is called quasi self-dual if it is monomial equivalent to its dual. Consider the [2k, k] code over F_q with generator matrix (I_k | I_k). Show that this code is quasi self-dual for all q, and self-dual if q is even.

2.5.9 Let C be an F_q-linear code of length n with hull H(C) = C ∩ C⊥. Let Π be an n × n permutation matrix. Let D be an invertible n × n diagonal matrix. Let M ∈ Mono(n, q).
(1) Show that (Π(C))⊥ = Π(C⊥).
(2) Show that H(Π(C)) = Π(H(C)).
(3) Show that (D(C))⊥ = D^{-1}(C⊥).
(4) Show that H(M(C)) = M(H(C)) if q = 2 or q = 3.
(5) Show by means of a counterexample that the dimension of the hull of a linear code over F_q is not a monomial invariant for q > 3.

2.5.10 Show that every linear code over F_q is monomial equivalent to a code with a complementary dual if q > 3.

2.5.11 Let C be the code of Example 2.5.24. Show that this code has 6(q − 1) monomial automorphisms. Compute Aut(C) for all possible choices of the a_i.

2.5.12 Show that PAut(C⊥) and MAut(C⊥) are isomorphic as groups with PAut(C) and MAut(C), respectively.

2.5.13 Determine the automorphism group of the ternary code with generator matrix

( 1 0 1 1 )
( 0 1 1 2 ).

2.5.14 Show that in Example 12.5.5 the permutation automorphism groups obtained for Hamming codes in the GAP and Magma programs are different. This implies that these codes are not the same. Find out what the permutation equivalence between these codes is.

2.6 Notes

One considers the seminal papers of Shannon [107] and Hamming [61] as the starting point of information theory and coding theory. Many papers that appeared in the early days of coding theory and information theory were published in the Bell System Technical Journal, IEEE Transactions on Information Theory and Problemy Peredachi Informatsii. They were collected as key papers in [21, 10, 111]. We mention the following classical textbooks in coding theory [3, 11, 19, 62, 75, 76, 78, 84, 93] and several more recent ones [20, 67, 77]. The Handbook on coding theory [95] gives a wealth of information. Audio-visual media, compact disc and DVD [76, 105], fault-tolerant computers ...[], deep space telecommunication [86, 134].
***Elias, sequence of codes with R > 0 and error probability going to zero.***
***Forney, concatenated codes, sequence of codes with R near capacity and error probability going to zero and an efficient decoding algorithm.***
***Elias Wozencraft, list decoding***.
Chapter 3

Code constructions and bounds

Ruud Pellikaan and Xin-Wen Wu

This chapter treats the existence and nonexistence of codes. Several constructions show that the existence of one particular code gives rise to a cascade of derived codes. Upper bounds in terms of the parameters exclude codes, and lower bounds show the existence of codes.

3.1 Code constructions

In this section, we discuss some classical methods of constructing new codes using known codes.

3.1.1 Constructing shorter and longer codes

The most obvious way to make a shorter code out of a given code is to delete several coordinates. This is called puncturing.

Definition 3.1.1 Let C be an [n, k, d] code. For any codeword, the process of deleting one or more fixed coordinates is called puncturing. Let P be a subset of {1, . . . , n} consisting of p integers such that its complement is the set {i_1, . . . , i_{n−p}} with 1 ≤ i_1 < · · · < i_{n−p} ≤ n. Let x ∈ F_q^n. Define x_P = (x_{i_1}, . . . , x_{i_{n−p}}) ∈ F_q^{n−p}. Let C_P be the set of all punctured codewords of C, where the puncturing takes place at all the positions of P:

C_P = { c_P | c ∈ C }.

We will also use the corresponding notation with respect to the non-punctured positions.

Definition 3.1.2 Let R be a subset of {1, . . . , n} consisting of r integers {i_1, . . . , i_r} with 1 ≤ i_1 < · · · < i_r ≤ n. Let x ∈ F_q^n. Define x(R) = (x_{i_1}, . . . , x_{i_r}) ∈ F_q^r. Let C(R) be the set of all codewords of C restricted to the positions of R:

C(R) = { c(R) | c ∈ C }.
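A minimal sketch of puncturing (Definition 3.1.1), applied to the [7, 4, 3] Hamming code with P = {7}; the resulting parameters [6, 4, 2] are consistent with Proposition 3.1.4 below. The helper functions are this sketch's own, purely illustrative.

```python
# Puncture a code at the positions in P by deleting those coordinates.
from itertools import product

G = [(1, 0, 0, 0, 0, 1, 1),
     (0, 1, 0, 0, 1, 0, 1),
     (0, 0, 1, 0, 1, 1, 0),
     (0, 0, 0, 1, 1, 1, 1)]

def codewords(G, n):
    k = len(G)
    return {tuple(sum(m[i] * G[i][j] for i in range(k)) % 2 for j in range(n))
            for m in product((0, 1), repeat=k)}

def puncture(code, P):
    return {tuple(x for j, x in enumerate(c, start=1) if j not in P) for c in code}

punctured = puncture(codewords(G, 7), {7})
print(len(punctured))                            # 16: the dimension is still 4
print(min(sum(c) for c in punctured if any(c)))  # 2: the minimum distance dropped
```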
Remark 3.1.3 So C_P is a code of length n − p, where p is the number of elements of P. Furthermore C_P is linear, since C is linear. In fact, suppose G is a generator matrix of C. Then C_P is a linear code generated by the rows of G_P, where G_P is the k × (n − p) matrix consisting of the n − p columns of G at the positions i_1, . . . , i_{n−p}. If we consider the restricted code C(R), then its generator matrix G(R) is the k × r submatrix of G composed of the columns indexed by j_1, . . . , j_r, where R = {j_1, . . . , j_r}.

Proposition 3.1.4 Let C be an [n, k, d] code. Suppose P consists of p elements. Then the punctured code C_P is an [n − p, k_P, d_P] code with

d − p ≤ d_P ≤ d and k − p ≤ k_P ≤ k.

If moreover p < d, then k_P = k.

Proof. The given upper bounds are clear. Let c ∈ C. Then at most p nonzero positions are deleted from c to obtain c_P. Hence wt(c_P) ≥ wt(c) − p, so d_P ≥ d − p. The column rank of G, which is equal to the row rank, is k. The column rank of G_P must be at least k − p, since p columns are deleted. This implies that the row rank of G_P is at least k − p. So k_P ≥ k − p. Suppose p < d. If c and c′ are two distinct codewords in C, then d(c_P, c′_P) ≥ d − p > 0, so c_P and c′_P are distinct. Therefore C and C_P have the same number of codewords. Hence k = k_P.

Example 3.1.5 It is worth pointing out that the dimension of C_P can be smaller than k. From the definition of puncturing, C_P seemingly has the same number of codewords as C. However, it is possible that C contains distinct codewords that agree at all positions outside P. In that case, after deleting the coordinates at the positions of P, the number of codewords of C_P is less than that of C. Look at the following simple example. Let C be the binary code with generator matrix

G = ( 1 1 0 0 )
    ( 1 1 1 0 )
    ( 0 0 1 1 ).

This is a [4, 3, 1] code. Let P = {4}. Then the rows of G_P are (1, 1, 0), (1, 1, 1) and (0, 0, 1). Clearly the third row is the sum of the first two. So G_P has row rank 2, and C_P has dimension 2. In this example we have d = 1 = p.

We now introduce an inverse process to puncturing the code C, which is called extending the code.

Definition 3.1.6 Let C be a linear code of length n. Let v ∈ F_q^n. The extended code C^e(v) of length n + 1 is defined as follows. For every codeword c = (c_1, . . . , c_n) ∈ C, construct the word c^e(v) by adding the symbol c_{n+1}(v) ∈ F_q at the end of c such that the following parity check holds:

v_1 c_1 + v_2 c_2 + · · · + v_n c_n + c_{n+1} = 0.

Now C^e(v) consists of all the codewords c^e(v), where c is a codeword of C. In case v is the all-ones vector, C^e(v) is denoted by C^e.
Remark 3.1.7 Let $C$ be an $[n, k]$ code. Then it is clear that $C^e(v)$ is a linear subspace of $\mathbb{F}_q^{n+1}$ and has dimension $k$. So $C^e(v)$ is an $[n+1, k]$ code. Suppose $G$ and $H$ are generator and parity check matrices of $C$, respectively. Then $C^e(v)$ has a generator matrix $G^e(v)$ and a parity check matrix $H^e(v)$, which are given by
$$G^e(v) = \left(\; G \;\middle|\; \begin{matrix} g_{1,n+1} \\ g_{2,n+1} \\ \vdots \\ g_{k,n+1} \end{matrix} \;\right) \quad \text{and} \quad H^e(v) = \begin{pmatrix} v_1 & v_2 & \cdots & v_n & 1 \\ & & & & 0 \\ & & H & & \vdots \\ & & & & 0 \end{pmatrix},$$
where the last column of $G^e(v)$ has entries $g_{i,n+1} = -\sum_{j=1}^{n} g_{ij} v_j$.

Example 3.1.8 The extension of the [7,4,3] binary Hamming code with the generator matrix given in Example 2.2.14 is equal to the [8,4,4] code with the generator matrix given in Example 2.3.26. The increase of the minimum distance by one in the extension of a code of odd minimum distance is a general phenomenon for binary codes.

Proposition 3.1.9 Let $C$ be a binary $[n, k, d]$ code. Then $C^e$ has parameters $[n+1, k, d^e]$ with $d^e = d$ if $d$ is even and $d^e = d+1$ if $d$ is odd.

Proof. Let $C$ be a binary $[n, k, d]$ code. Then $C^e$ is an $[n+1, k]$ code by Remark 3.1.7. The minimum distance $d^e$ of the extended code satisfies $d \le d^e \le d+1$, since $\mathrm{wt}(c) \le \mathrm{wt}(c^e) \le \mathrm{wt}(c) + 1$ for all $c \in C$. Assume that $d$ is even. Then there is a codeword $c$ of weight $d$, and $c^e$ is obtained from $c$ by extending with a zero. So $c^e$ also has weight $d$. If $d$ is odd, then the claim follows, since all the codewords of the extended code $C^e$ have even weight by the parity check $c_1 + \cdots + c_{n+1} = 0$.

Example 3.1.10 The binary $[2^r - 1, 2^r - r - 1, 3]$ Hamming code $H_r(2)$ has the extension $H_r(2)^e$ with parameters $[2^r, 2^r - r - 1, 4]$. The binary $[2^r - 1, r, 2^{r-1}]$ simplex code $S_r(2)$ has the extension $S_r(2)^e$ with parameters $[2^r, r, 2^{r-1}]$. These claims are a direct consequence of Propositions 2.3.14 and 2.3.16, Remark 3.1.7 and Proposition 3.1.9.

The operations of extending and puncturing at the last position are inverse to each other.

Proposition 3.1.11 Let $C$ be a linear code of length $n$. Let $v \in \mathbb{F}_q^n$. Let $P = \{n+1\}$ and $Q = \{n\}$. Then $(C^e(v))_P = C$. If the all-ones vector is a parity check of $C$, then $(C_Q)^e = C$.

Proof. The first statement is a consequence of the fact that $(c^e(v))_P = c$ for all words. The last statement is left as an exercise.

Example 3.1.12 Puncturing the extended binary Hamming code $H_r(2)^e$ gives the original Hamming code back.

By taking subcodes appropriately, we can get some new codes. The following technique of constructing a new code involves a process of taking a subcode and puncturing.
Definition 3.1.13 Let $C$ be an $[n, k, d]$ code. Let $S$ be a subset of $\{1, \ldots, n\}$. Let $C(S)$ denote the subcode of $C$ consisting of all $c \in C$ such that $c_i = 0$ for all $i \in S$. The shortened code $C^S$ is defined by $C^S = (C(S))_S$. It is obtained by puncturing the subcode $C(S)$ at $S$, so by deleting the coordinates at the positions of $S$.

Remark 3.1.14 Let $S$ consist of $s$ elements. Let $x \in \mathbb{F}_q^{n-s}$. Let $x^S \in \mathbb{F}_q^n$ be the unique word of length $n$ such that $x = (x^S)_S$ and the entries of $x^S$ at the positions of $S$ are zero; it is obtained by extending $x$ with zeros appropriately. Then $x \in C^S$ if and only if $x^S \in C$. Furthermore $x^S \cdot y = x \cdot y_S$ for all $x \in \mathbb{F}_q^{n-s}$ and $y \in \mathbb{F}_q^n$.

Proposition 3.1.15 Let $C$ be an $[n, k, d]$ code. Suppose $S$ consists of $s$ elements. Then the shortened code $C^S$ is an $[n-s, k_S, d_S]$ code with $k - s \le k_S \le k$ and $d \le d_S$.

Proof. The dimension of $C^S$ is equal to the dimension of the subcode $C(S)$ of $C$, and $C(S)$ is defined by $s$ homogeneous linear equations of the form $c_i = 0$. This proves the statement about the dimension. The minimum distance of $C^S$ is the same as the minimum distance of $C(S)$, and $C(S)$ is a subcode of $C$. Hence $d \le d_S$.

Example 3.1.16 Consider the binary [8,4,4] code of Example 2.3.26. In the following diagram we show what happens with the generator matrix by shortening at the first position (left column of the diagram), by puncturing at the first position (right column), and by taking the dual (upper and lower rows of the diagram):
$$\begin{pmatrix} 1&0&0&0&0&1&1&1 \\ 0&1&0&0&1&0&1&1 \\ 0&0&1&0&1&1&0&1 \\ 0&0&0&1&1&1&1&0 \end{pmatrix} \quad \overset{\text{dual}}{\longleftrightarrow} \quad \begin{pmatrix} 0&1&1&1&1&0&0&0 \\ 1&0&1&1&0&1&0&0 \\ 1&1&0&1&0&0&1&0 \\ 1&1&1&0&0&0&0&1 \end{pmatrix}$$

shorten at first position $\downarrow$ (left column) and puncture at first position $\downarrow$ (right column)

$$\begin{pmatrix} 1&0&0&1&0&1&1 \\ 0&1&0&1&1&0&1 \\ 0&0&1&1&1&1&0 \end{pmatrix} \quad \overset{\text{dual}}{\longleftrightarrow} \quad \begin{pmatrix} 1&1&1&1&0&0&0 \\ 0&1&1&0&1&0&0 \\ 1&0&1&0&0&1&0 \\ 1&1&0&0&0&0&1 \end{pmatrix}$$

Notice that the diagram commutes. This is a general fact, as stated in the following proposition.
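Shortening can be checked by brute force for small codes. A sketch (ours): take the [8,4,4] code of Example 3.1.16, keep the codewords that vanish at the first position, and delete that position; the result has $2^3 = 8$ codewords, so dimension 3, in accordance with Proposition 3.1.15.

    import itertools
    import numpy as np

    G = np.array([[1, 0, 0, 0, 0, 1, 1, 1],
                  [0, 1, 0, 0, 1, 0, 1, 1],
                  [0, 0, 1, 0, 1, 1, 0, 1],
                  [0, 0, 0, 1, 1, 1, 1, 0]])   # the [8,4,4] code of Example 2.3.26

    S = {0}                                     # shorten at the first position
    keep = [j for j in range(G.shape[1]) if j not in S]
    C = {tuple((np.array(m) @ G) % 2)
         for m in itertools.product([0, 1], repeat=G.shape[0])}
    C_short = {tuple(c[j] for j in keep)
               for c in C if all(c[j] == 0 for j in S)}
    print(len(C_short))                         # 8, so the shortened code has dimension 3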
Proposition 3.1.17 Let $C$ be an $[n, k, d]$ code. Let $P$ and $S$ be subsets of $\{1, \ldots, n\}$. Then
$$(C_P)^\perp = (C^\perp)^P \quad \text{and} \quad (C^S)^\perp = (C^\perp)_S,$$
$$\dim C_P + \dim (C^\perp)^P = n - |P| \quad \text{and} \quad \dim C^S + \dim (C^\perp)_S = n - |S|.$$

Proof. Let $x \in (C_P)^\perp$. Let $z \in C$. Then $z_P \in C_P$. So $x^P \cdot z = x \cdot z_P = 0$ by Remark 3.1.14. Hence $x^P \in C^\perp$ and $x \in (C^\perp)^P$. Therefore $(C_P)^\perp \subseteq (C^\perp)^P$.
Conversely, let $x \in (C^\perp)^P$. Then $x^P \in C^\perp$. Let $y \in C_P$. Then $y = z_P$ for some $z \in C$. So $x \cdot y = x \cdot z_P = x^P \cdot z = 0$. Hence $x \in (C_P)^\perp$. Therefore $(C^\perp)^P \subseteq (C_P)^\perp$, and in fact equality holds, since the converse inclusion was already shown.
The statement on the dimensions is a direct consequence of the corresponding equality of the codes, since the two codes are dual to each other in $\mathbb{F}_q^{n-|P|}$. The claim about shortening $C$ at $S$ is a consequence of the equality for puncturing, applied to the dual code $C^\perp$ with $P = S$.

If we want to increase the size of a code without changing its length, we can augment the code by adding a word which is not in the code.

Definition 3.1.18 Let $C$ be an $\mathbb{F}_q$-linear code of length $n$. Let $v \in \mathbb{F}_q^n$. The augmented code, denoted by $C^a(v)$, is defined by
$$C^a(v) = \{ \alpha v + c \mid \alpha \in \mathbb{F}_q,\ c \in C \}.$$
If $v$ is the all-ones vector, then we denote $C^a(v)$ by $C^a$.

Remark 3.1.19 The augmented code $C^a(v)$ is a linear code. Suppose that $G$ is a generator matrix of $C$. Then the $(k+1) \times n$ matrix $G^a(v)$, which is obtained by adding the row $v$ to $G$, is a generator matrix of $C^a(v)$ if $v$ is not an element of $C$.

Proposition 3.1.20 Let $C$ be a code of minimum distance $d$. Suppose that the vector $v$ is not in $C$ and has weight $w$. Then
$$\min\{d - w, w\} \le d(C^a(v)) \le \min\{d, w\}.$$
In particular $d(C^a(v)) = w$ if $w \le d/2$.

Proof. $C$ is a subcode of the augmented code and $v$ is an element of it. This implies the upper bound. The lower bound is trivially satisfied if $d \le w$. Suppose $w < d$. Let $x$ be a nonzero element of $C^a(v)$. Then $x = \alpha v + c$ for some $\alpha \in \mathbb{F}_q$ and $c \in C$. If $\alpha = 0$, then $\mathrm{wt}(x) = \mathrm{wt}(c) \ge d > w$. If $c = 0$, then $\alpha \ne 0$ and $\mathrm{wt}(x) = \mathrm{wt}(\alpha v) = w$. If $\alpha \ne 0$ and $c \ne 0$, then $c = x - \alpha v$. So $d \le \mathrm{wt}(c) \le \mathrm{wt}(x) + w$. Hence $d - w \le \mathrm{wt}(x)$. If $w \le d/2$, then the upper and lower bounds are both equal to $w$.

Suppose $C$ is a binary $[n, k, d]$ code. We get a new code by deleting the codewords of odd weight. In other words, the new code $C^{ev}$ consists of all the codewords in $C$ which have even weight. It is called the even weight subcode in Example 2.2.8. This process is also called expurgating the code $C$.
Definition 3.1.21 Let $C$ be an $\mathbb{F}_q$-linear code of length $n$. Let $v \in \mathbb{F}_q^n$. The expurgated code of $C$ is denoted by $C_e(v)$ and is defined by
$$C_e(v) = \{ c \mid c \in C \text{ and } c \cdot v = 0 \}.$$
If $v = \mathbf{1}$, then $C_e(\mathbf{1})$ is denoted by $C_e$.

Proposition 3.1.22 Let $C$ be an $[n, k, d]$ code. Then
$$(C^a(v))^\perp = (C^\perp)_e(v).$$

Proof. If $v \in C$, then $C^a(v) = C$ and $x \cdot v = 0$ for every $x \in C^\perp$, so $(C^\perp)_e(v) = C^\perp$. Suppose $v$ is not an element of $C$. Let $G$ be a generator matrix of $C$. Then $G$ is a parity check matrix of $C^\perp$, by Proposition 2.3.29. Now $G^a(v)$ is a generator matrix of $C^a(v)$ by definition. Hence $G^a(v)$ is a parity check matrix of $(C^a(v))^\perp$. Furthermore $G^a(v)$ is also a parity check matrix of $(C^\perp)_e(v)$ by definition. Hence $(C^a(v))^\perp = (C^\perp)_e(v)$.

Lengthening a code is a technique which combines augmenting and extending.

Definition 3.1.23 Let $C$ be an $[n, k]$ code. Let $v \in \mathbb{F}_q^n$. The lengthened code $C^l(v)$ is obtained by first augmenting $C$ by $v$, and then extending it:
$$C^l(v) = (C^a(v))^e.$$
If $v = \mathbf{1}$, then $C^l(v)$ is denoted by $C^l$.

Remark 3.1.24 The lengthening of an $[n, k]$ code is a linear code. If $v$ is not an element of $C$, then $C^l(v)$ is an $[n+1, k+1]$ code.

3.1.2 Product codes

We describe a method for combining two codes to get a new code. In Example 2.1.2 the [9,4,4] product code is introduced. This construction will be generalized in this section. Consider the identification of the space of all $n_1 \times n_2$ matrices with entries in $\mathbb{F}_q$ and the space $\mathbb{F}_q^n$ with $n = n_1 n_2$, where the matrix $X = (x_{ij})_{1 \le i \le n_1, 1 \le j \le n_2}$ is mapped to the vector $x$ with entries
$$x_{(i-1)n_2 + j} = x_{ij}.$$
In other words, the rows of $X$ are put in linear order behind each other:
$$x = (x_{11}, x_{12}, \ldots, x_{1n_2}, x_{21}, \ldots, x_{2n_2}, x_{31}, \ldots, x_{n_1 n_2}).$$
For $\alpha \in \mathbb{F}_q$ and $n_1 \times n_2$ matrices $(x_{ij})$ and $(y_{ij})$ with entries in $\mathbb{F}_q$, scalar multiplication and addition are defined by $\alpha(x_{ij}) = (\alpha x_{ij})$ and $(x_{ij}) + (y_{ij}) = (x_{ij} + y_{ij})$. These operations on matrices correspond to the operations on the vectors under the identification. Hence the identification of the space of $n_1 \times n_2$ matrices and the space $\mathbb{F}_q^n$ is an isomorphism of vector spaces. In the following these two spaces are identified.

Definition 3.1.25 Let $C_1$ and $C_2$ be $[n_1, k_1, d_1]$ and $[n_2, k_2, d_2]$ codes, respectively. Let $n = n_1 n_2$. The product code, denoted by $C_1 \otimes C_2$, is defined by
$$C_1 \otimes C_2 = \left\{ (c_{ij})_{1 \le i \le n_1, 1 \le j \le n_2} \;\middle|\; (c_{ij})_{1 \le i \le n_1} \in C_1 \text{ for all } j, \text{ and } (c_{ij})_{1 \le j \le n_2} \in C_2 \text{ for all } i \right\}.$$
From the definition, the product code $C_1 \otimes C_2$ is exactly the set of all $n_1 \times n_2$ arrays whose columns belong to $C_1$ and whose rows belong to $C_2$. In the literature, the product code is also called the direct product, Kronecker product, or tensor product code.

Example 3.1.26 Let $C_1 = C_2$ be the $[3, 2, 2]$ binary even weight code. So it consists of the following codewords: $(0, 0, 0)$, $(1, 1, 0)$, $(1, 0, 1)$, $(0, 1, 1)$. This is the set of all words $(m_1, m_2, m_1 + m_2)$, where $m_1$ and $m_2$ are arbitrary bits. By the definition, the following 16 arrays are the codewords of the product code $C_1 \otimes C_2$:
$$\begin{pmatrix} m_1 & m_2 & m_1 + m_2 \\ m_3 & m_4 & m_3 + m_4 \\ m_1 + m_3 & m_2 + m_4 & m_1 + m_2 + m_3 + m_4 \end{pmatrix},$$
where the $m_i$ are free to choose. So indeed this is the product code of Example 2.1.2. The sum of two arrays $(c_{ij})$ and $(c'_{ij})$ is the array $(c_{ij} + c'_{ij})$. Therefore $C_1 \otimes C_2$ is a linear code of length $9 = 3 \times 3$ and dimension $4 = 2 \times 2$. And it is clear that the minimum distance of $C_1 \otimes C_2$ is $4 = 2 \times 2$. This is a general fact, but before we state this result we need some preparations.

Definition 3.1.27 For two vectors $x = (x_1, \ldots, x_{n_1})$ and $y = (y_1, \ldots, y_{n_2})$, we define their tensor product, denoted by $x \otimes y$, as the $n_1 \times n_2$ array whose $(i, j)$-entry is $x_i y_j$.

Remark 3.1.28 It is clear that $C_1 \otimes C_2$ is a linear code if $C_1$ and $C_2$ are both linear. Remark that $x \otimes y \in C_1 \otimes C_2$ if $x \in C_1$ and $y \in C_2$, since the $i$-th row of $x \otimes y$ is $x_i y \in C_2$ and the $j$-th column is $y_j x^T$ with $y_j x \in C_1$. But the set of all $x \otimes y$ with $x \in C_1$ and $y \in C_2$ is not equal to $C_1 \otimes C_2$. In the previous example
$$\begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$$
is in the product code, but it is not of the form $x \otimes y$ with $x \in C_1$ and $y \in C_2$, since otherwise it would have at least one zero row and at least one zero column. In general, the number of pairs $(x, y)$ with $x \in C_1$ and $y \in C_2$ is equal to $q^{k_1+k_2}$, but $x \otimes y = 0$ if $x = 0$ or $y = 0$, and moreover $\lambda(x \otimes y) = (\lambda x) \otimes y = x \otimes (\lambda y)$ for all $\lambda \in \mathbb{F}_q$. Hence we get at most $(q^{k_1}-1)(q^{k_2}-1)/(q-1) + 1$ such elements. If $k_1 > 1$ and $k_2 > 1$, then this is smaller than $q^{k_1 k_2}$, the number of elements of $C_1 \otimes C_2$ according to the following proposition.

Proposition 3.1.29 Let $x_1, \ldots, x_k \in \mathbb{F}_q^{n_1}$ and $y_1, \ldots, y_k \in \mathbb{F}_q^{n_2}$. If $y_1, \ldots, y_k$ are independent and $x_1 \otimes y_1 + \cdots + x_k \otimes y_k = 0$, then $x_i = 0$ for all $i$.

Proof. Suppose that $y_1, \ldots, y_k$ are independent and $x_1 \otimes y_1 + \cdots + x_k \otimes y_k = 0$. Let $x_{js}$ be the $s$-th entry of $x_j$. The $s$-th row of $\sum_j x_j \otimes y_j$ is equal to $\sum_j x_{js} y_j$, which is equal to $0$ by assumption. Hence $x_{js} = 0$ for all $j$ and $s$, since the $y_j$ are independent. Hence $x_j = 0$ for all $j$.
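In terms of software, the tensor product of Definition 3.1.27 is just the outer product of the two vectors. A quick sketch (ours) with two words of the [3,2,2] even weight code illustrates Remark 3.1.28: every row of $x \otimes y$ is a multiple of $y$ and every column a multiple of $x$, so the array lies in $C_1 \otimes C_2$.

    import numpy as np

    x = np.array([1, 1, 0])          # a word of the [3,2,2] even weight code
    y = np.array([1, 0, 1])          # another word of the same code
    print(np.outer(x, y) % 2)        # all rows and columns have even weight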
Corollary 3.1.30 Let $x_1, \ldots, x_{k_1} \in \mathbb{F}_q^{n_1}$ and $y_1, \ldots, y_{k_2} \in \mathbb{F}_q^{n_2}$. If $x_1, \ldots, x_{k_1}$ and $y_1, \ldots, y_{k_2}$ are both independent, then
$$\{ x_i \otimes y_j \mid 1 \le i \le k_1,\ 1 \le j \le k_2 \}$$
is an independent set of matrices.

Proof. Suppose that $\sum_{i,j} \lambda_{ij}\, x_i \otimes y_j = 0$ for certain scalars $\lambda_{ij} \in \mathbb{F}_q$. Then $\sum_j \left(\sum_i \lambda_{ij} x_i\right) \otimes y_j = 0$ and $y_1, \ldots, y_{k_2} \in \mathbb{F}_q^{n_2}$ are independent. So $\sum_i \lambda_{ij} x_i = 0$ for all $j$ by Proposition 3.1.29. Hence $\lambda_{ij} = 0$ for all $i, j$, since $x_1, \ldots, x_{k_1}$ are independent.

Proposition 3.1.31 Let $x_1, \ldots, x_{k_1} \in \mathbb{F}_q^{n_1}$ be a basis of $C_1$ and $y_1, \ldots, y_{k_2} \in \mathbb{F}_q^{n_2}$ a basis of $C_2$. Then
$$\{ x_i \otimes y_j \mid 1 \le i \le k_1,\ 1 \le j \le k_2 \}$$
is a basis of $C_1 \otimes C_2$.

Proof. The given set is an independent set by Corollary 3.1.30, and it is a subset of $C_1 \otimes C_2$. So the dimension of $C_1 \otimes C_2$ is at least $k_1 k_2$. Now we show that it is in fact a basis of $C_1 \otimes C_2$. Without loss of generality we may assume that $C_1$ is systematic at the first $k_1$ coordinates with generator matrix $(I_{k_1}|A)$, and $C_2$ is systematic at the first $k_2$ coordinates with generator matrix $(I_{k_2}|B)$. Then $U$ is an $l \times n_2$ matrix with all its rows in $C_2$ if and only if $U = (M|MB)$, where $M$ is an $l \times k_2$ matrix. And $V$ is an $n_1 \times m$ matrix with all its columns in $C_1$ if and only if $V^T = (N|NA)$, where $N$ is an $m \times k_1$ matrix. Now let $M$ be a $k_1 \times k_2$ matrix. Then $(M|MB)$ is a $k_1 \times n_2$ matrix with rows in $C_2$, and $\left(\begin{smallmatrix} M \\ A^T M \end{smallmatrix}\right)$ is an $n_1 \times k_2$ matrix with columns in $C_1$. Therefore
$$\begin{pmatrix} M & MB \\ A^T M & A^T M B \end{pmatrix}$$
is an $n_1 \times n_2$ matrix with columns in $C_1$ and rows in $C_2$, for every $k_1 \times k_2$ matrix $M$. And conversely, every codeword of $C_1 \otimes C_2$ is of this form. Hence the dimension of $C_1 \otimes C_2$ is equal to $k_1 k_2$ and the given set is a basis of $C_1 \otimes C_2$.

Theorem 3.1.32 Let $C_1$ and $C_2$ be $[n_1, k_1, d_1]$ and $[n_2, k_2, d_2]$ codes, respectively. Then the product code $C_1 \otimes C_2$ is an $[n_1 n_2, k_1 k_2, d_1 d_2]$ code.

Proof. By definition $n = n_1 n_2$ is the length of the product code. It was already mentioned that $C_1 \otimes C_2$ is a linear subspace of $\mathbb{F}_q^{n_1 n_2}$. The dimension of the product code is $k_1 k_2$ by Proposition 3.1.31.
Next, we prove that the minimum distance of $C_1 \otimes C_2$ is $d_1 d_2$. Every codeword of $C_1 \otimes C_2$ is an $n_1 \times n_2$ array in which every nonzero column has weight at least $d_1$ and every nonzero row has weight at least $d_2$. A nonzero codeword has a nonzero row; this row has at least $d_2$ nonzero entries, so at least $d_2$ columns are nonzero, and each of these columns has weight at least $d_1$. So the weight of a nonzero codeword of the product code is at least $d_1 d_2$. This implies that the minimum distance of $C_1 \otimes C_2$ is at least $d_1 d_2$. Now suppose $x \in C_1$ has weight $d_1$ and $y \in C_2$ has weight $d_2$. Then $x \otimes y$ is a codeword of $C_1 \otimes C_2$ of weight $d_1 d_2$.
Definition 3.1.33 Let $A = (a_{ij})$ be a $k_1 \times n_1$ matrix and $B = (b_{ij})$ a $k_2 \times n_2$ matrix. The Kronecker product or tensor product $A \otimes B$ of $A$ and $B$ is the $k_1 k_2 \times n_1 n_2$ matrix obtained from $A$ by replacing every entry $a_{ij}$ by $a_{ij} B$.

Remark 3.1.34 The tensor product $x \otimes y$ of two row vectors $x$ and $y$ of lengths $n_1$ and $n_2$, respectively, as defined in Definition 3.1.27, is the same as the Kronecker product of $x^T$ and $y$, now considered as $n_1 \times 1$ and $1 \times n_2$ matrices, respectively, as in Definition 3.1.33.

Proposition 3.1.35 Let $G_1$ be a generator matrix of $C_1$ and $G_2$ a generator matrix of $C_2$. Then $G_1 \otimes G_2$ is a generator matrix of $C_1 \otimes C_2$.

Proof. In this proposition the codewords are considered as elements of $\mathbb{F}_q^n$ and no longer as matrices. Let $x_i$ be the $i$-th row of $G_1$, and denote by $y_j$ the $j$-th row of $G_2$. So $x_1, \ldots, x_{k_1} \in \mathbb{F}_q^{n_1}$ is a basis of $C_1$ and $y_1, \ldots, y_{k_2} \in \mathbb{F}_q^{n_2}$ is a basis of $C_2$. Hence the set $\{x_i \otimes y_j \mid 1 \le i \le k_1, 1 \le j \le k_2\}$ is a basis of $C_1 \otimes C_2$ by Proposition 3.1.31. Furthermore, if $l = (i-1)k_2 + j$, then $x_i \otimes y_j$ is the $l$-th row of $G_1 \otimes G_2$. Hence the matrix $G_1 \otimes G_2$ is a generator matrix of $C_1 \otimes C_2$.

Example 3.1.36 Consider the ternary codes $C_1$ and $C_2$ with generator matrices
$$G_1 = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 2 \end{pmatrix} \quad \text{and} \quad G_2 = \begin{pmatrix} 1 & 1 & 1 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 1 & 1 & 1 \end{pmatrix},$$
respectively. Then
$$G_1 \otimes G_2 = \begin{pmatrix}
1 & 1 & 1 & 0 & 1 & 1 & 1 & 0 & 1 & 1 & 1 & 0 \\
0 & 1 & 2 & 0 & 0 & 1 & 2 & 0 & 0 & 1 & 2 & 0 \\
0 & 1 & 1 & 1 & 0 & 1 & 1 & 1 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 & 0 & 2 & 2 & 2 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 2 & 0 & 0 & 2 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 0 & 2 & 2 & 2
\end{pmatrix}.$$
The second row of $G_1$ is $x_2 = (0, 1, 2)$ and the second row of $G_2$ is $y_2 = (0, 1, 2, 0)$. Then $x_2 \otimes y_2$ is equal to
$$\begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 2 & 1 & 0 \end{pmatrix}$$
considered as a matrix, and equal to $(0, 0, 0, 0, 0, 1, 2, 0, 0, 2, 1, 0)$ written as a vector, which is indeed the $(2-1)\cdot 3 + 2 = 5$-th row of $G_1 \otimes G_2$.
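Proposition 3.1.35 can be reproduced directly with numpy's Kronecker product. The sketch below (ours) rebuilds the generator matrix of Example 3.1.36 and prints its fifth row, which is indeed $x_2 \otimes y_2$.

    import numpy as np

    G1 = np.array([[1, 1, 1],
                   [0, 1, 2]])                 # ternary code C1 of Example 3.1.36
    G2 = np.array([[1, 1, 1, 0],
                   [0, 1, 2, 0],
                   [0, 1, 1, 1]])              # ternary code C2
    G = np.kron(G1, G2) % 3                    # 6 x 12 generator matrix of C1 (x) C2
    print(G[4])                                # [0 0 0 0 0 1 2 0 0 2 1 0], the 5th row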
3.1.3 Several sum constructions

We have seen that, given an $[n_1, k_1]$ code $C_1$ and an $[n_2, k_2]$ code $C_2$, the product construction yields an $[n_1 n_2, k_1 k_2]$ code. The product code has information rate $(k_1 k_2)/(n_1 n_2) = R_1 R_2$, where $R_1$ and $R_2$ are the rates of $C_1$ and $C_2$, respectively. In this subsection, we introduce some simple constructions by which we can get new codes with greater rate from two given codes.

Definition 3.1.37 Given an $[n_1, k_1]$ code $C_1$ and an $[n_2, k_2]$ code $C_2$, their direct sum $C_1 \oplus C_2$, also called the $(u|v)$ construction, is defined by
$$C_1 \oplus C_2 = \{ (u|v) \mid u \in C_1,\ v \in C_2 \},$$
where $(u|v)$ denotes the word $(u_1, \ldots, u_{n_1}, v_1, \ldots, v_{n_2})$ if $u = (u_1, \ldots, u_{n_1})$ and $v = (v_1, \ldots, v_{n_2})$.

Proposition 3.1.38 Let $C_i$ be an $[n_i, k_i, d_i]$ code with generator matrix $G_i$ for $i = 1, 2$. Let $d = \min\{d_1, d_2\}$. Then $C_1 \oplus C_2$ is an $[n_1 + n_2, k_1 + k_2, d]$ code with generator matrix
$$G = \begin{pmatrix} G_1 & 0 \\ 0 & G_2 \end{pmatrix}.$$

Proof. Let $x_1, \ldots, x_{k_1}$ and $y_1, \ldots, y_{k_2}$ be bases of $C_1$ and $C_2$, respectively. Then $(x_1|0), \ldots, (x_{k_1}|0), (0|y_1), \ldots, (0|y_{k_2})$ is a basis of the direct sum code. Therefore the direct sum is an $[n_1 + n_2, k_1 + k_2]$ code with the given generator matrix $G$. The minimum distance of the direct sum is $\min\{d_1, d_2\}$.

The direct sum or $(u|v)$ construction is defined by the juxtaposition of arbitrary codewords $u \in C_1$ and $v \in C_2$. In the following definition only a restricted set of pairs of codewords are put behind each other. This definition depends on the choice of the generator matrices of the codes $C_1$ and $C_2$.

Definition 3.1.39 Let $C_1$ be an $[n_1, k, d_1]$ code and $C_2$ an $[n_2, k, d_2]$ code with generator matrices $G_1$ and $G_2$, respectively. The juxtaposition of the codes $C_1$ and $C_2$ is the code with generator matrix $(G_1|G_2)$.

Proposition 3.1.40 Let $C_i$ be an $[n_i, k, d_i]$ code for $i = 1, 2$. Then the juxtaposition of the codes $C_1$ and $C_2$ is an $[n_1 + n_2, k, d]$ code with $d \ge d_1 + d_2$.

Proof. The length and the dimension are clear from the definition. A nonzero codeword $c$ is of the form $mG = (mG_1, mG_2)$ for a nonzero element $m \in \mathbb{F}_q^k$. So $mG_i$ is a nonzero codeword of $C_i$. Hence the weight of $c$ is at least $d_1 + d_2$.

The rate of the direct sum is $(k_1 + k_2)/(n_1 + n_2)$, which is greater than $(k_1 k_2)/(n_1 n_2)$, the rate of the product code. Now a more intelligent construction is studied.

Definition 3.1.41 Let $C_1$ be an $[n, k_1, d_1]$ code and $C_2$ an $[n, k_2, d_2]$ code. The $(u|u+v)$ construction is the following code:
$$\{ (u|u+v) \mid u \in C_1,\ v \in C_2 \}.$$

Theorem 3.1.42 Let $C_i$ be an $[n, k_i, d_i]$ code with generator matrix $G_i$ for $i = 1, 2$. Then the $(u|u+v)$ construction of $C_1$ and $C_2$ is a $[2n, k_1 + k_2, d]$ code with minimum distance $d = \min\{2d_1, d_2\}$ and generator matrix
$$G = \begin{pmatrix} G_1 & G_1 \\ 0 & G_2 \end{pmatrix}.$$
Proof. It is straightforward to check the linearity of the $(u|u+v)$ construction. Suppose $x_1, \ldots, x_{k_1}$ and $y_1, \ldots, y_{k_2}$ are bases of $C_1$ and $C_2$, respectively. Then it is easy to see that $(x_1|x_1), \ldots, (x_{k_1}|x_{k_1}), (0|y_1), \ldots, (0|y_{k_2})$ is a basis of the $(u|u+v)$ construction. So it is a $[2n, k_1 + k_2]$ code with generator matrix $G$ as given.
Consider the minimum distance $d$ of the $(u|u+v)$ construction. For any codeword $(x|x+y)$ we have $\mathrm{wt}(x|x+y) = \mathrm{wt}(x) + \mathrm{wt}(x+y)$. If $y = 0$ and $x \ne 0$, then $\mathrm{wt}(x|x+y) = 2\,\mathrm{wt}(x) \ge 2d_1$. If $y \ne 0$, then
$$\mathrm{wt}(x|x+y) = \mathrm{wt}(x) + \mathrm{wt}(x+y) \ge \mathrm{wt}(x) + \mathrm{wt}(y) - \mathrm{wt}(x) = \mathrm{wt}(y) \ge d_2.$$
Hence $d \ge \min\{2d_1, d_2\}$. Let $x_0$ be a codeword of $C_1$ with weight $d_1$, and $y_0$ a codeword of $C_2$ with weight $d_2$. Then either $(x_0|x_0)$ or $(0|y_0)$ has weight $\min\{2d_1, d_2\}$.

Example 3.1.43 The $(u|u+v)$ construction of the binary even weight [4,3,2] code and the 4-fold repetition [4,1,4] code gives an [8,4,4] code with generator matrix
$$\begin{pmatrix}
1 & 0 & 0 & 1 & 1 & 0 & 0 & 1 \\
0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\
0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1
\end{pmatrix},$$
which is equivalent to the extended Hamming code of Example 2.3.26.

Remark 3.1.44 For two vectors $u$ of length $n_1$ and $v$ of length $n_2$, we can still define the sum $u + v$ as a vector of length $\max\{n_1, n_2\}$, by adding enough zeros at the end of the shorter vector. With this definition of the sum, the $(u|u+v)$ construction still works for codes $C_1$ and $C_2$ of different lengths.

Proposition 3.1.45 If $C_1$ is an $[n_1, k_1, d_1]$ code and $C_2$ is an $[n_2, k_2, d_2]$ code, then the $(u|u+v)$ construction gives an $[n_1 + \max\{n_1, n_2\}, k_1 + k_2, \min\{2d_1, d_2\}]$ linear code.

Proof. The proof is similar to the proof of Theorem 3.1.42.
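The generator matrix of Theorem 3.1.42 is a block matrix and can be assembled directly. A sketch (ours) for the two codes of Example 3.1.43:

    import numpy as np

    G1 = np.array([[1, 0, 0, 1],
                   [0, 1, 0, 1],
                   [0, 0, 1, 1]])              # binary even weight [4,3,2] code
    G2 = np.array([[1, 1, 1, 1]])              # 4-fold repetition [4,1,4] code

    top = np.hstack([G1, G1])                  # rows (x_i | x_i)
    bottom = np.hstack([np.zeros((1, 4), dtype=int), G2])   # rows (0 | y_j)
    G = np.vstack([top, bottom])               # generator matrix of the [8,4,4] code
    print(G)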
Definition 3.1.46 The $(u+v|u-v)$ construction is a slightly modified construction, defined as the following code:
$$\{ (u+v|u-v) \mid u \in C_1,\ v \in C_2 \}.$$
When we consider this construction, we restrict ourselves to the case $q$ odd, since $u + v = u - v$ if $q$ is even.

Proposition 3.1.47 Let $C_i$ be an $[n, k_i, d_i]$ code with generator matrix $G_i$ for $i = 1, 2$. Assume that $q$ is odd. Then the $(u+v|u-v)$ construction of $C_1$ and $C_2$ is a $[2n, k_1 + k_2, d]$ code with $d \ge \min\{2d_1, 2d_2, \max\{d_1, d_2\}\}$ and generator matrix
$$G = \begin{pmatrix} G_1 & G_1 \\ G_2 & -G_2 \end{pmatrix}.$$

Proof. The proof of the proposition is similar to that of Theorem 3.1.42. In fact, suppose $x_1, \ldots, x_{k_1}$ and $y_1, \ldots, y_{k_2}$ are bases of $C_1$ and $C_2$, respectively. Every codeword is of the form $(u+v|u-v) = (u|u) + (v|-v)$ with $u \in C_1$ and $v \in C_2$. So $(u|u)$ is a linear combination of $(x_1|x_1), \ldots, (x_{k_1}|x_{k_1})$, and $(v|-v)$ is a linear combination of $(y_1|-y_1), \ldots, (y_{k_2}|-y_{k_2})$. Using the assumption that $q$ is odd, we can prove that the set of vectors $(x_i|x_i), (y_j|-y_j)$ is linearly independent. Suppose that
$$\sum_i \lambda_i (x_i|x_i) + \sum_j \mu_j (y_j|-y_j) = 0.$$
Then
$$\sum_i \lambda_i x_i + \sum_j \mu_j y_j = 0 \quad \text{and} \quad \sum_i \lambda_i x_i - \sum_j \mu_j y_j = 0.$$
Adding the two equations and dividing by 2 gives $\sum_i \lambda_i x_i = 0$. So $\lambda_i = 0$ for all $i$, since the $x_i$ are independent. Similarly, subtracting the equations gives $\mu_j = 0$ for all $j$. So the $(x_i|x_i), (y_j|-y_j)$ are independent and generate the code. Hence they form a basis, and this shows that the given $G$ is a generator matrix of this construction.
Let $(u+v|u-v)$ be a nonzero codeword. The weight of this word is at least $2d_1$ if $v = 0$, and at least $2d_2$ if $u = 0$. Now suppose $u \ne 0$ and $v \ne 0$. Then the weight of $u - v$ is at least $\mathrm{wt}(u) - w$, where $w$ is the number of positions $i$ such that $u_i = v_i \ne 0$. If $u_i = v_i \ne 0$, then $u_i + v_i \ne 0$, since $q$ is odd. Hence $\mathrm{wt}(u+v) \ge w$ and
$$\mathrm{wt}(u+v|u-v) \ge w + (\mathrm{wt}(u) - w) = \mathrm{wt}(u) \ge d_1.$$
In the same way $\mathrm{wt}(u+v|u-v) \ge d_2$. Hence $\mathrm{wt}(u+v|u-v) \ge \max\{d_1, d_2\}$. This proves the estimate on the minimum distance.

Example 3.1.48 Consider the following ternary codes: $C_1 = \{000, 110, 220\}$ and $C_2 = \{000, 011, 022\}$. They are $[3, 1, 2]$ codes. The $(u+v|u-v)$ construction of these codes is a $[6, 2, d]$ code with $d \ge 2$ by Proposition 3.1.47. It consists of the following nine codewords:
$(0, 0, 0, 0, 0, 0)$, $(0, 1, 1, 0, 2, 2)$, $(0, 2, 2, 0, 1, 1)$,
$(1, 1, 0, 1, 1, 0)$, $(1, 2, 1, 1, 0, 2)$, $(1, 0, 2, 1, 2, 1)$,
$(2, 2, 0, 2, 2, 0)$, $(2, 0, 1, 2, 1, 2)$, $(2, 1, 2, 2, 0, 1)$.
Hence $d = 4$. On the other hand, by the $(u|u+v)$ construction we get a $[6, 2, 2]$ code, which has a smaller minimum distance than the $(u+v|u-v)$ construction.
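A brute-force check (ours) of Example 3.1.48: enumerate all nine codewords of the $(u+v|u-v)$ construction over $\mathbb{F}_3$ and compute the minimum weight.

    import itertools

    C1 = [(0, 0, 0), (1, 1, 0), (2, 2, 0)]
    C2 = [(0, 0, 0), (0, 1, 1), (0, 2, 2)]
    words = {tuple((u[i] + v[i]) % 3 for i in range(3)) +
             tuple((u[i] - v[i]) % 3 for i in range(3))
             for u, v in itertools.product(C1, C2)}
    print(min(sum(x != 0 for x in w) for w in words if any(w)))   # prints 4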
Now a more complicated construction is given.

Definition 3.1.49 Let $C_1$ and $C_2$ be $[n, k_1]$ and $[n, k_2]$ codes, respectively. The $(a+x|b+x|a+b-x)$ construction of $C_1$ and $C_2$ is the following code:
$$\{ (a+x|b+x|a+b-x) \mid a, b \in C_1,\ x \in C_2 \}.$$

Proposition 3.1.50 Let $C_1$ and $C_2$ be $[n, k_1]$ and $[n, k_2]$ codes over $\mathbb{F}_q$, respectively. Suppose $q$ is not a power of 3. Then the $(a+x|b+x|a+b-x)$ construction of $C_1$ and $C_2$ is a $[3n, 2k_1 + k_2]$ code with generator matrix
$$G = \begin{pmatrix} G_1 & 0 & G_1 \\ 0 & G_1 & G_1 \\ G_2 & G_2 & -G_2 \end{pmatrix}.$$

Proof. Let $x_1, \ldots, x_{k_1}$ and $y_1, \ldots, y_{k_2}$ be bases of $C_1$ and $C_2$, respectively. Consider the following $2k_1 + k_2$ vectors: $(x_1|0|x_1), \ldots, (x_{k_1}|0|x_{k_1})$, $(0|x_1|x_1), \ldots, (0|x_{k_1}|x_{k_1})$, $(y_1|y_1|-y_1), \ldots, (y_{k_2}|y_{k_2}|-y_{k_2})$. It is left as an exercise to check that they form a basis of this construction in case $q$ is not a power of 3. This shows that the given $G$ is a generator matrix of the code and that its dimension is $2k_1 + k_2$.

For binary codes, some simple inequalities, for example Exercise 3.1.9, can be used to estimate the minimum distance of the last construction. In general we have the following estimate for the minimum distance.

Proposition 3.1.51 Let $C_1$ and $C_2$ be $[n, k_1, d_1]$ and $[n, k_2, d_2]$ codes over $\mathbb{F}_q$, respectively. Suppose $q$ is not a power of 3. Let $d_0$ and $d_3$ be the minimum distances of $C_1 \cap C_2$ and $C_1 + C_2$, respectively. Then the minimum distance $d$ of the $(a+x|b+x|a+b-x)$ construction of $C_1$ and $C_2$ is at least $\min\{d_0, 2d_1, 3d_3\}$.

Proof. This is left as an exercise.

The choice of the minus sign in the $(a+x|b+x|a+b-x)$ construction becomes apparent in the construction of self-dual codes over $\mathbb{F}_q$ for arbitrary $q$ not divisible by 3.

Proposition 3.1.52 Let $C_1$ and $C_2$ be self-dual $[2k, k]$ codes. Then the codes obtained from $C_1$ and $C_2$ by the direct sum construction, by the $(u|u+v)$ construction if $C_1 = C_2$, by the $(u+v|u-v)$ construction, and by the $(a+x|b+x|a+b-x)$ construction in case $q$ is not divisible by 3, are also self-dual.

Proof. The generator matrix $G_i$ of $C_i$ has size $k \times 2k$ and satisfies $G_i G_i^T = 0$ for $i = 1, 2$. In all the constructions the generator matrix $G$, of size $2k \times 4k$ or $3k \times 6k$ as given in Theorem 3.1.42 and Propositions 3.1.38, 3.1.47 and 3.1.50, also satisfies $GG^T = 0$. For instance, in the case of the $(a+x|b+x|a+b-x)$ construction we have
$$GG^T = \begin{pmatrix} G_1 & 0 & G_1 \\ 0 & G_1 & G_1 \\ G_2 & G_2 & -G_2 \end{pmatrix} \begin{pmatrix} G_1^T & 0 & G_2^T \\ 0 & G_1^T & G_2^T \\ G_1^T & G_1^T & -G_2^T \end{pmatrix}.$$
All the entries in this product are sums of terms of the form $G_i G_i^T$ or $G_1 G_2^T - G_1 G_2^T$, which are all zero. Hence $GG^T = 0$.
Example 3.1.53 Let $C_1$ be the binary $[8, 4, 4]$ self-dual code with generator matrix $G_1$ of the form $(I_4|A_1)$ as given in Example 2.3.26. Let $C_2$ be the code with generator matrix $G_2 = (I_4|A_2)$, where $A_2$ is obtained from $A_1$ by a cyclic shift of the columns:
$$A_1 = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{pmatrix}, \qquad A_2 = \begin{pmatrix} 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 \end{pmatrix}.$$
The codes $C_1$ and $C_2$ are both $[8, 4, 4]$ self-dual codes, $C_1 \cap C_2 = \{0, \mathbf{1}\}$, and $C_1 + C_2$ is the even weight code. Let $C$ be the $(a+x|b+x|a+b+x)$ construction applied to $C_1$ and $C_2$ (note that $-x = x$ in the binary case). Then $C$ is a binary self-dual $[24, 12, 8]$ code. By Proposition 3.1.52, the claim on the minimum distance is the only remaining statement to verify. Let $G$ be the generator matrix of $C$ as given in Proposition 3.1.50. The weights of the rows of $G$ are all divisible by 4. Hence the weights of all codewords are divisible by 4 by Exercise ??. Let $c = (a+x|b+x|a+b+x)$ be a nonzero codeword with $a, b \in C_1$ and $x \in C_2$. If $a + x = 0$, then $a = x \in C_1 \cap C_2$. So either $a = x = 0$ and $c = (0|b|b)$, or $a = x = \mathbf{1}$ and $c = (0|b+\mathbf{1}|b)$; in both cases the weight of $c$ is at least 8, since the weight of $b$ is at least 4 and the weight of $\mathbf{1}$ is 8. Similarly it is argued that the weight of $c$ is at least 8 if $b + x = 0$ or $a + b + x = 0$. So we may assume that none of $a+x$, $b+x$ and $a+b+x$ is zero. Hence all three are nonzero even weight codewords and $\mathrm{wt}(c) \ge 6$. But the weight is divisible by 4. Hence the minimum distance is at least 8. Let $a$ be a codeword of $C_1$ of weight 4; then $c = (a|0|a)$ is a codeword of weight 8. In this way we have constructed a binary self-dual $[24, 12, 8]$ code. It is called the extended binary Golay code. The binary Golay code is the $[23, 12, 7]$ code obtained by puncturing one coordinate.
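Example 3.1.53 can be verified by machine: assemble the 12 x 24 generator matrix of the $(a+x|b+x|a+b+x)$ construction and enumerate all $2^{12}$ codewords. A sketch (ours; the enumeration takes at most a few seconds):

    import itertools
    import numpy as np

    A1 = np.array([[0,1,1,1],[1,0,1,1],[1,1,0,1],[1,1,1,0]])
    A2 = np.array([[1,0,1,1],[1,1,0,1],[1,1,1,0],[0,1,1,1]])
    G1 = np.hstack([np.eye(4, dtype=int), A1])
    G2 = np.hstack([np.eye(4, dtype=int), A2])
    Z = np.zeros((4, 8), dtype=int)
    G = np.block([[G1, Z, G1],        # rows (x_i | 0 | x_i)
                  [Z, G1, G1],        # rows (0 | x_i | x_i)
                  [G2, G2, G2]])      # rows (y_j | y_j | y_j), since -y = y over F_2

    weights = {int(((np.array(m) @ G) % 2).sum())
               for m in itertools.product([0, 1], repeat=12) if any(m)}
    print(min(weights))               # prints 8: the extended binary Golay code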
3.1.4 Concatenated codes

For this section we need some theory of finite fields; see Section 7.2.1. Let $q$ be a prime power and $k$ a positive integer. The finite field $\mathbb{F}_{q^k}$ with $q^k$ elements contains $\mathbb{F}_q$ as a subfield, and $\mathbb{F}_{q^k}$ is a $k$-dimensional vector space over $\mathbb{F}_q$. Let $\xi_1, \ldots, \xi_k$ be a basis of $\mathbb{F}_{q^k}$ over $\mathbb{F}_q$. Consider the map
$$\varphi : \mathbb{F}_q^k \longrightarrow \mathbb{F}_{q^k}$$
defined by $\varphi(a) = a_1 \xi_1 + \cdots + a_k \xi_k$. Then $\varphi$ is an isomorphism of vector spaces with inverse map $\varphi^{-1}$.
The space $\mathbb{F}_q^{K \times k}$ of $K \times k$ matrices over $\mathbb{F}_q$ is a vector space of dimension $Kk$ over $\mathbb{F}_q$, and it is linearly isomorphic with $\mathbb{F}_q^{Kk}$ by taking some ordering of the $Kk$ entries of such matrices. Let $M$ be a $K \times k$ matrix over $\mathbb{F}_q$ with $i$-th row $m_i$. The map
$$\varphi_K : \mathbb{F}_q^{K \times k} \longrightarrow \mathbb{F}_{q^k}^K$$
is defined by $\varphi_K(M) = (\varphi(m_1), \ldots, \varphi(m_K))$. The inverse map
$$\varphi_N^{-1} : \mathbb{F}_{q^k}^N \longrightarrow \mathbb{F}_q^{N \times k}$$
is given by $\varphi_N^{-1}(a_1, \ldots, a_N) = P$, where $P$ is the $N \times k$ matrix with $i$-th row $p_i = \varphi^{-1}(a_i)$.
Let $A$ be an $[N, K]$ code over $\mathbb{F}_{q^k}$, and $B$ an $[n, k]$ code over $\mathbb{F}_q$. Let $G_A$ and $G_B$ be generator matrices of $A$ and $B$, respectively. The $N$-fold direct sum
$$G_B^{(N)} = G_B \oplus \cdots \oplus G_B : \mathbb{F}_q^{Nk} \to \mathbb{F}_q^{Nn}$$
is defined by $G_B^{(N)}(P) = Q$, where, for a given $N \times k$ matrix $P$ with $i$-th row $p_i \in \mathbb{F}_q^k$, $Q$ is the $N \times n$ matrix with $i$-th row $q_i = p_i G_B$.
By the following concatenation procedure, a message of length $Kk$ over $\mathbb{F}_q$ is encoded to a codeword of length $Nn$ over $\mathbb{F}_q$.
Step 1: The $K \times k$ matrix $M$ is mapped to $m = \varphi_K(M)$.
Step 2: $m \in \mathbb{F}_{q^k}^K$ is mapped to $a = m G_A \in \mathbb{F}_{q^k}^N$.
Step 3: $a \in \mathbb{F}_{q^k}^N$ is mapped to $P = \varphi_N^{-1}(a)$.
Step 4: The $N \times k$ matrix $P$ with $i$-th row $p_i$ is mapped to the $N \times n$ matrix $Q$ with $i$-th row $q_i = p_i G_B$.
The encoding map
$$E : \mathbb{F}_q^{K \times k} \longrightarrow \mathbb{F}_q^{N \times n}$$
is the composition of the four maps explained above: $E = G_B^{(N)} \circ \varphi_N^{-1} \circ G_A \circ \varphi_K$. Let $C = \{ E(M) \mid M \in \mathbb{F}_q^{K \times k} \}$. We call $C$ the concatenated code with outer code $A$ and inner code $B$.

Theorem 3.1.54 Let $A$ be an $[N, K, D]$ code over $\mathbb{F}_{q^k}$, and $B$ an $[n, k, d]$ code over $\mathbb{F}_q$. Let $C$ be the concatenated code with outer code $A$ and inner code $B$. Then $C$ is an $\mathbb{F}_q$-linear $[Nn, Kk]$ code and its minimum distance is at least $Dd$.

Proof. The encoding map $E$ is an $\mathbb{F}_q$-linear map, since it is a composition of four $\mathbb{F}_q$-linear maps. The first and third maps are isomorphisms, and the second and fourth maps are injective, since they are given by generator matrices of full rank. Hence $E$ is injective, and the concatenated code $C$ is an $\mathbb{F}_q$-linear code of length $Nn$ and dimension $Kk$.
Next, consider the minimum distance of $C$. Since $A$ is an $[N, K, D]$ code, every nonzero codeword $a$ obtained in Step 2 has weight at least $D$. As a result, the $N \times k$ matrix $P$ obtained in Step 3 has at least $D$ nonzero rows $p_i$. Now, because $B$ is an $[n, k, d]$ code, every $p_i G_B$ has weight at least $d$ if $p_i$ is not zero. Therefore the minimum distance of $C$ is at least $Dd$.
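To make the four steps concrete, here is a toy sketch (ours; the choice of outer and inner codes is our own and not from the book). The outer code $A$ is the $[2,1,2]$ repetition code over $\mathbb{F}_4$ and the inner code $B$ is the binary $[3,2,2]$ even weight code; $\mathbb{F}_4$ is realized as pairs $(a, b) = a + b\omega$, although for $G_A = [1, 1]$ no field multiplication is actually needed.

    import numpy as np

    G_B = np.array([[1, 0, 1],
                    [0, 1, 1]])              # inner [3,2,2] even weight code, k = 2

    def encode(m1, m2):
        m = (m1, m2)                         # step 1: phi(M) = m1 + m2*omega in F_4
        a = (m, m)                           # step 2: a = m G_A with G_A = [1, 1]
        P = np.array([list(x) for x in a])   # step 3: phi^{-1} row by row, a 2x2 matrix
        return (P @ G_B) % 2                 # step 4: encode every row with G_B

    print(encode(1, 1))                      # weight 4 = Dd, spread over two inner blocks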
Example 3.1.55 The definition of the concatenated code depends on the choice of the map $\varphi$, that is, on the choice of the basis $\xi_1, \ldots, \xi_k$. In fact the minimum distance of the concatenated code can be strictly larger than $Dd$, as the following example shows. The field $\mathbb{F}_9$ contains the ternary field $\mathbb{F}_3$ as a subfield and an element $\xi$ such that $\xi^2 = 1 + \xi$, since the polynomial $X^2 - X - 1$ is irreducible in $\mathbb{F}_3[X]$. Now take $\xi_1 = 1$ and $\xi_2 = \xi$ as a basis of $\mathbb{F}_9$ over $\mathbb{F}_3$. Let $A$ be the $[2, 1, 2]$ outer code over $\mathbb{F}_9$ with generator matrix $G_A = [1, \xi^2]$. Let $B$ be the trivial $[2, 2, 1]$ code over $\mathbb{F}_3$ with generator matrix $G_B = I_2$. Let $M = (m_1, m_2) \in \mathbb{F}_3^{1 \times 2}$. Then $m = \varphi_1(M) = m_1 + m_2\xi \in \mathbb{F}_9$. So $a = mG_A = (m_1 + m_2\xi, (m_1 + m_2) + (m_1 - m_2)\xi)$, since $\xi^2 = 1 + \xi$. Hence
$$Q = P = \varphi_2^{-1}(a) = \begin{pmatrix} m_1 & m_2 \\ m_1 + m_2 & m_1 - m_2 \end{pmatrix}.$$
Therefore the concatenated code has minimum distance $3 > Dd$. Suppose we would have taken $\xi_1 = 1$ and $\xi_2 = \xi^2$ as a basis instead. Take $M = (1, 0)$. Then $m = \varphi_1(M) = 1 \in \mathbb{F}_9$. So $a = mG_A = (1, \xi^2)$. Hence $Q = P = \varphi_2^{-1}(a) = I_2$ is a codeword in the concatenated code that has weight $2 = Dd$. Thus the definition and the parameters of a concatenated code depend on the specific choice of the map $\varphi$.

3.1.5 Exercises

3.1.1 Prove Proposition 3.1.11.

3.1.2 Let $C$ be the binary [9,4,4] product code of Example 2.1.2. Show that puncturing $C$ at position $i$ gives an [8,4,3] code for every choice of $i = 1, \ldots, 9$. Is it possible to obtain the binary [7,4,3] Hamming code by puncturing $C$? Show that shortening $C$ at position $i$ gives an [8,3,4] code for every choice of $i$. Is it possible to obtain the binary [7,3,4] simplex code by a combination of puncturing and shortening the product code?

3.1.3 Suppose that there exists an $[n', k', d']_q$ code and an $[n, k, d]_q$ code with an $[n, k-k', d+d']_q$ subcode. Use a generalization of the construction of $C^e(v)$ to show that there exists an $[n+n', k, d+d']_q$ code.

3.1.4 Let $C$ be a binary code with minimum distance $d$. Let $d'$ be the largest weight of any codeword of $C$. Suppose that the all-ones vector is not in $C$. Show that the augmented code $C^a$ has minimum distance $\min\{d, n-d'\}$.

3.1.5 Let $C$ be an $\mathbb{F}_q$-linear code of length $n$. Let $v \in \mathbb{F}_q^n$ and $S = \{n+1\}$. Suppose that the all-ones vector is a parity check of $C$ but not of $v$. Show that $(C^l(v))^S = C$.

3.1.6 Show that the shortened binary [7,3,4] code is a product code of codes of length 2 and 3.

3.1.7 Let $C$ be a nontrivial linear code of length $n$. Then $C$ is the direct sum of two codes of lengths strictly smaller than $n$ if and only if $C = v * C$ for some $v \in \mathbb{F}_q^n$ with nonzero entries that are not all the same.

3.1.8 Show that the punctured binary [7,3,4] code is equal to the $(u|u+v)$ construction of a $[3, 2, 2]$ code and a $[3, 1, 3]$ code.

3.1.9 Show that for binary vectors $a$, $b$ and $x$,
$$\mathrm{wt}(a+x|b+x|a+b+x) \ge 2\,\mathrm{wt}(a+b+a*b) - \mathrm{wt}(x),$$
with equality if and only if $a_i = 1$ or $b_i = 1$ or $x_i = 0$ for all $i$, where $a * b = (a_1 b_1, \ldots, a_n b_n)$.
3.1.10 Give a parity check matrix for the direct sum, the $(u|u+v)$, the $(u+v|u-v)$ and the $(a+x|b+x|a+b-x)$ construction in terms of the parity check matrices $H_1$ and $H_2$ of the codes $C_1$ and $C_2$, respectively.

3.1.11 Give proofs of Propositions 3.1.50 and 3.1.51.

3.1.12 Let $C_i$ be an $[n, k_i, d_i]$ code over $\mathbb{F}_q$ for $i = 1, 2$, where $q$ is a power of 3. Let $k_0$ be the dimension of $C_1 \cap C_2$ and $d_3$ the minimum distance of $C_1 + C_2$. Show that the $(a+x|b+x|a+b-x)$ construction with $C_1$ and $C_2$ gives a $[3n, 2k_1+k_2-k_0, d]$ code with $d \ge \min\{2d_1, 3d_3\}$.

3.1.13 Show that $C_1 \cap C_2 = \{0, \mathbf{1}\}$ and that $C_1 + C_2$ is the even weight code, for the codes $C_1$ and $C_2$ of Example 3.1.53.

3.1.14 Show the existence of a binary [45,15,16] code.

3.1.15 Show the existence of a binary self-dual [72,36,12] code.

3.1.16 [CAS] Construct a random binary [100, 50] code and check that the identities of Proposition 3.1.17 hold for several position sets: the last position, the last five positions, and five random positions.

3.1.17 [CAS] Write procedures that take generator matrices $G_1$ and $G_2$ of the codes $C_1$ and $C_2$ and return a matrix $G$ that is the generator matrix of the code $C$ obtained by
• the $(u+v|u-v)$ construction of Proposition 3.1.47;
• the $(a+x|b+x|a+b-x)$ construction of Proposition 3.1.50.

3.1.18 [CAS] Using the previous exercise, construct the extended Golay code as in Example 3.1.53. Compare this code with the one returned by ExtendedBinaryGolayCode() (in GAP) and GolayCode(GF(2),true) (in Magma).

3.1.19 Show by means of an example that the concatenation of a $[3, 2, 2]$ outer code and a $[2, 2, 1]$ inner code gives a $[6, 4]$ code of minimum distance 2 or 3, depending on the choice of the basis of the extension field.

3.2 Bounds on codes

We have introduced some parameters of a linear code in the previous sections. In coding theory, one of the most basic problems is to find the best value of a parameter when the other parameters are given. In this section, we discuss some bounds on the code parameters.

3.2.1 Singleton bound and MDS codes

The following bound gives the maximal minimum distance of a code with a given length and dimension. It is called the Singleton bound.

Theorem 3.2.1 (Singleton bound) If $C$ is an $[n, k, d]$ code, then $d \le n - k + 1$.
Proof. Let $H$ be a parity check matrix of $C$. This is an $(n-k) \times n$ matrix of row rank $n-k$. The minimum distance of $C$ is the smallest integer $d$ such that $H$ has $d$ linearly dependent columns, by Proposition 2.3.11. This means that every $d-1$ columns of $H$ are linearly independent. Hence the column rank of $H$ is at least $d-1$. By the fact that the column rank of a matrix is equal to its row rank, we have $n-k \ge d-1$. This implies the Singleton bound.

Definition 3.2.2 Let $C$ be an $[n, k, d]$ code. If $d = n - k + 1$, then $C$ is called a maximum distance separable code or an MDS code, for short.

Remark 3.2.3 By the Singleton bound, a maximum distance separable code achieves the maximum possible value of the minimum distance, given the code length and dimension.

Example 3.2.4 The minimum distance of the zero code of length $n$ is $n+1$, by definition. Hence the zero code has parameters $[n, 0, n+1]$ and is MDS. Its dual is the whole space $\mathbb{F}_q^n$ with parameters $[n, n, 1]$ and is also MDS. The $n$-fold repetition code has parameters $[n, 1, n]$, its dual is an $[n, n-1, 2]$ code, and both are MDS.

Proposition 3.2.5 Let $C$ be an $[n, k, d]$ code over $\mathbb{F}_q$. Let $G$ be a generator matrix and $H$ a parity check matrix of $C$. Then the following statements are equivalent:
(1) $C$ is an MDS code,
(2) every $(n-k)$-tuple of columns of the parity check matrix $H$ is linearly independent,
(3) every $k$-tuple of columns of the generator matrix $G$ is linearly independent.

Proof. As the minimum distance of $C$ is $d$, any $d-1$ columns of $H$ are linearly independent, by Proposition 2.3.11. Now $d \le n-k+1$ by the Singleton bound. So $d = n-k+1$ if and only if every $n-k$ columns of $H$ are independent. Hence (1) and (2) are equivalent.
Now let us assume (3). Let $c$ be an element of $C$ which is zero at $k$ given coordinates. Let $c = xG$ for some $x \in \mathbb{F}_q^k$. Let $G'$ be the square matrix consisting of the $k$ columns of $G$ corresponding to the $k$ given zero coordinates of $c$. Then $xG' = 0$. Hence $x = 0$, since the $k$ columns of $G'$ are independent by assumption. So $c = 0$. This implies that the minimum distance of $C$ is at least $n-(k-1) = n-k+1$. Therefore $C$ is an $[n, k, n-k+1]$ MDS code, by the Singleton bound.
Conversely, assume that $C$ is MDS. Let $G'$ be the square matrix consisting of $k$ chosen columns of $G$. Let $x \in \mathbb{F}_q^k$ be such that $xG' = 0$. Then $c = xG$ is a codeword which is zero at the $k$ chosen positions, so its weight is at most $n-k$. Hence $c = 0$, since the minimum distance is $n-k+1$. Therefore $x = 0$, since the rank of $G$ is $k$. So the $k$ columns are independent.

Example 3.2.6 Consider the code $C$ over $\mathbb{F}_5$ of length 5 and dimension 2 with generator matrix
$$G = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 & 4 \end{pmatrix}.$$
Note that the first row of the generator matrix is the all-ones vector, and the entries of the second row are mutually distinct. Every nonzero codeword of $C$ is of the form $c = a(1,1,1,1,1) + b(0,1,2,3,4)$ with $(a, b) \ne (0, 0)$. If $b \ne 0$, the entries $a + bi$ of $c$ are mutually distinct, so at most one of them is zero and $\mathrm{wt}(c) \ge 4$. If $b = 0$ and $a \ne 0$, then $\mathrm{wt}(c) = 5$. Hence the minimum distance of $C$ is at least 4. On the other hand, the second row is a word of weight 4. Hence $C$ is a $[5, 2, 4]$ MDS code.
The matrix $G$ is a parity check matrix for the dual code $C^\perp$. All columns of $G$ are nonzero, and every two columns are independent, since
$$\det \begin{pmatrix} 1 & 1 \\ i & j \end{pmatrix} = j - i \ne 0$$
for all $0 \le i < j \le 4$. Therefore $C^\perp$ is also an MDS code. In fact, we have the following general result.

Corollary 3.2.7 The dual of an $[n, k, n-k+1]$ MDS code is an $[n, n-k, k+1]$ MDS code.

Proof. The trivial codes are MDS and are duals of each other by Example 3.2.4. Assume $0 < k < n$. Let $H$ be a parity check matrix of an $[n, k, n-k+1]$ MDS code $C$. Then any $n-k$ columns of $H$ are linearly independent, by (2) of Proposition 3.2.5. Now $H$ is a generator matrix of the dual code. Therefore $C^\perp$ is an $[n, n-k, k+1]$ MDS code, since (3) of Proposition 3.2.5 holds.

Definition 3.2.8 Let $a$ be a vector of $\mathbb{F}_q^k$. Then $V(a)$ is the $k \times k$ Vandermonde matrix with entries $V(a)_{ij} = a_j^{i-1}$.

Lemma 3.2.9 Let $a$ be a vector of $\mathbb{F}_q^k$. Then
$$\det V(a) = \prod_{1 \le r < s \le k} (a_s - a_r).$$

Proof. This is left as an exercise.

Proposition 3.2.10 Let $n \le q$. Let $a = (a_1, \ldots, a_n)$ be an $n$-tuple of mutually distinct elements of $\mathbb{F}_q$. Let $k$ be an integer such that $0 \le k \le n$. Define the matrices $G_k(a)$ and $G'_k(a)$ by
$$G_k(a) = \begin{pmatrix} 1 & \cdots & 1 \\ a_1 & \cdots & a_n \\ \vdots & & \vdots \\ a_1^{k-1} & \cdots & a_n^{k-1} \end{pmatrix} \quad \text{and} \quad G'_k(a) = \begin{pmatrix} 1 & \cdots & 1 & 0 \\ a_1 & \cdots & a_n & 0 \\ \vdots & & \vdots & \vdots \\ a_1^{k-1} & \cdots & a_n^{k-1} & 1 \end{pmatrix}.$$
The codes with generator matrices $G_k(a)$ and $G'_k(a)$ are MDS codes.

Proof. Consider a $k \times k$ submatrix of $G_k(a)$. This is a Vandermonde matrix, and its determinant is not zero by Lemma 3.2.9, since the $a_i$ are mutually distinct. So any $k$ columns of $G_k(a)$ are independent. Hence $G_k(a)$ is the generator matrix of an MDS code by Proposition 3.2.5. The proof for $G'_k(a)$ is similar and is left as an exercise.
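Proposition 3.2.10 is easy to verify by machine for small parameters. The sketch below (ours; the rank routine is plain Gaussian elimination over a prime field) checks that every 2 columns of $G_2(a)$ over $\mathbb{F}_5$ are independent, i.e. that the $[5, 2, 4]$ code of Example 3.2.6 is MDS.

    import itertools
    import numpy as np

    q, k = 5, 2
    a = [0, 1, 2, 3, 4]                           # the distinct elements of F_5
    G = np.array([[x**i % q for x in a] for i in range(k)])

    def rank_mod_q(M, q):
        # Gaussian elimination over F_q (q prime)
        M = M.copy() % q
        r = 0
        for c in range(M.shape[1]):
            piv = next((i for i in range(r, M.shape[0]) if M[i, c]), None)
            if piv is None:
                continue
            M[[r, piv]] = M[[piv, r]]
            M[r] = (M[r] * pow(int(M[r, c]), q - 2, q)) % q   # scale pivot row to 1
            for i in range(M.shape[0]):
                if i != r:
                    M[i] = (M[i] - M[i, c] * M[r]) % q
            r += 1
        return r

    assert all(rank_mod_q(G[:, list(cols)], q) == k
               for cols in itertools.combinations(range(len(a)), k))
    print("every k columns are independent: the code is MDS")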
Remark 3.2.11 The codes defined in Proposition 3.2.10 are called generalized Reed-Solomon codes and are the prime examples of MDS codes. These codes will be treated in Section 8.1. The notion of an MDS code has a nice interpretation in terms of $n$ points in general position in projective space, as we will see in Section 4.3.1. Proposition 3.2.10 shows the existence of MDS codes over $\mathbb{F}_q$ with parameters $[n, k, n-k+1]$ for all possible values of $k$ and $n$ such that $0 \le k \le n \le q+1$.

Example 3.2.12 Let $q$ be a power of 2. Let $n = q+2$ and let $a_1, a_2, \ldots, a_q$ be an enumeration of the elements of $\mathbb{F}_q$. Consider the code $C$ with generator matrix
$$\begin{pmatrix} 1 & 1 & \cdots & 1 & 0 & 0 \\ a_1 & a_2 & \cdots & a_q & 0 & 1 \\ a_1^2 & a_2^2 & \cdots & a_q^2 & 1 & 0 \end{pmatrix}.$$
Then any 3 columns of this matrix are independent, since by Proposition 3.2.10 the only remaining nontrivial case to check is
$$\det \begin{pmatrix} 1 & 1 & 0 \\ a_i & a_j & 1 \\ a_i^2 & a_j^2 & 0 \end{pmatrix} = -(a_j^2 - a_i^2) = (a_i - a_j)^2 \ne 0$$
in characteristic 2, for all $1 \le i < j \le q$. Hence $C$ is a $[q+2, 3, q]$ code.

Remark 3.2.13 From (3) of Proposition 3.2.5 and Proposition 2.2.22 we see that any $k$ symbols of the codewords of an MDS code of dimension $k$ may be taken as message symbols. This is another reason for the name maximum distance separable codes.

Corollary 3.2.14 Let $C$ be an $[n, k, d]$ code. Then $C$ is MDS if and only if for any given $d$ coordinate positions $i_1, i_2, \ldots, i_d$ there is a minimum weight codeword with the set of these positions as support. Furthermore, two codewords of minimum weight of an MDS code with the same support are nonzero multiples of each other.

Proof. Let $G$ be a generator matrix of $C$. Suppose $d < n-k+1$. There exist $k$ positions $j_1, j_2, \ldots, j_k$ such that the columns of $G$ at these positions are independent. The complement of these $k$ positions consists of $n-k$ elements and $d \le n-k$. Choose a subset $\{i_1, i_2, \ldots, i_d\}$ of $d$ elements in this complement. Let $c$ be a codeword with support contained in $\{i_1, i_2, \ldots, i_d\}$. Then $c$ is zero at the positions $j_1, j_2, \ldots, j_k$. Hence $c = 0$ and the support of $c$ is empty.
If $C$ is MDS, then $d = n-k+1$. Let $\{i_1, i_2, \ldots, i_d\}$ be a set of $d$ coordinate positions. The complement of this set consists of $k-1$ elements $j_1, j_2, \ldots, j_{k-1}$. Let $j_k = i_1$. Then $j_1, j_2, \ldots, j_k$ are $k$ elements that can be used for systematic encoding, by Remark 3.2.13. So there is a unique codeword $c$ such that $c_j = 0$ for $j = j_1, \ldots, j_{k-1}$ and $c_{j_k} = 1$. Hence $c$ is a nonzero codeword of weight at most $d$ with support contained in $\{i_1, i_2, \ldots, i_d\}$. Therefore $c$ is a codeword of weight $d$ and support equal to $\{i_1, i_2, \ldots, i_d\}$, since $d$ is the minimum weight of the code.
Furthermore, let $c'$ be another codeword of weight $d$ and support equal to
$\{i_1, i_2, \ldots, i_d\}$. Then $c'_j = 0$ for $j = j_1, \ldots, j_{k-1}$ and $c'_{j_k} \ne 0$. Then $c'$ and $c'_{j_k} c$ are two codewords that coincide at $j_1, j_2, \ldots, j_k$. Hence $c' = c'_{j_k} c$.

Remark 3.2.15 It follows from Corollary 3.2.14 that the number of nonzero codewords of minimum weight $n-k+1$ of an $[n, k]$ MDS code is equal to
$$(q-1)\binom{n}{n-k+1}.$$
In Section 4.1 we will introduce the weight distribution of a linear code. Using the above result, the weight distribution of an MDS code can be completely determined. This will be done in Proposition 4.4.22.

Remark 3.2.16 Let $C$ be an $[n, k, n-k+1]$ code. Then it is systematic at the first $k$ positions, hence $C$ has a generator matrix of the form $(I_k|A)$. It is left as an exercise to show that every square submatrix of $A$ is nonsingular. The converse is also true.

Definition 3.2.17 Let $n \le q$. Let $a$, $b$, $r$ and $s$ be vectors of $\mathbb{F}_q^k$ such that $a_i \ne b_j$ for all $i, j$. Then $C(a, b)$ is the $k \times k$ Cauchy matrix with entries $1/(a_i - b_j)$, and $C(a, b; r, s)$ is the $k \times k$ generalized Cauchy matrix with entries $r_i s_j/(a_i - b_j)$.
Let $k$ be an integer such that $0 \le k \le n$. Let $A(a)$ be the $k \times (n-k)$ matrix with entries $1/(a_{j+k} - a_i)$ for $1 \le i \le k$, $1 \le j \le n-k$. Then the Cauchy code $C_k(a)$ is the code with generator matrix $(I_k|A(a))$. If $r_i$ is not zero for all $i$, then $A(a, r)$ is the $k \times (n-k)$ matrix with entries
$$\frac{r_{j+k}\, r_i^{-1}}{a_{j+k} - a_i}$$
for $1 \le i \le k$, $1 \le j \le n-k$. The generalized Cauchy code $C_k(a, r)$ is the code with generator matrix $(I_k|A(a, r))$.

Lemma 3.2.18 Let $a$, $b$, $r$ and $s$ be vectors of $\mathbb{F}_q^k$ such that $a_i \ne b_j$ for all $i, j$. Then
$$\det C(a, b; r, s) = \frac{\prod_{i=1}^{k} r_i \prod_{j=1}^{k} s_j \prod_{i<j}(a_i - a_j)(b_j - b_i)}{\prod_{i,j=1}^{k}(a_i - b_j)}.$$

Proof. This is left as an exercise.

Proposition 3.2.19 Let $n \le q$. Let $a$ be an $n$-tuple of mutually distinct elements of $\mathbb{F}_q$, and $r$ an $n$-tuple of nonzero elements of $\mathbb{F}_q$. Let $k$ be an integer such that $0 \le k \le n$. Then the generalized Cauchy code $C_k(a, r)$ is an $[n, k, n-k+1]$ code.

Proof. Every square $t \times t$ submatrix of $A(a, r)$ is a generalized Cauchy matrix of the form
$$C((a_{i_1}, \ldots, a_{i_t}), (a_{k+j_1}, \ldots, a_{k+j_t}); (r_{i_1}^{-1}, \ldots, r_{i_t}^{-1}), (r_{k+j_1}, \ldots, r_{k+j_t})).$$
The determinant of this matrix is not zero by Lemma 3.2.18, since the entries of $a$ are mutually distinct and the entries of $r$ are not zero. Hence $(I_k|A(a, r))$ is the generator matrix of an MDS code by Remark 3.2.16.

In Section 8.1 it will be shown that generalized Reed-Solomon codes and Cauchy codes are the same.
3.2.2 Griesmer bound

Clearly, the Singleton bound can be viewed as a lower bound on the code length $n$ for given dimension $k$ and minimum distance $d$, namely $n \ge d + k - 1$. In this subsection, we give another lower bound on the length.

Theorem 3.2.20 (Griesmer bound) If $C$ is an $[n, k, d]$ code with $k > 0$, then
$$n \ge \sum_{i=0}^{k-1} \left\lceil \frac{d}{q^i} \right\rceil.$$

Note that the Griesmer bound implies the Singleton bound: we have $\lceil d/q^0 \rceil = d$ and $\lceil d/q^i \rceil \ge 1$ for $i = 1, \ldots, k-1$, from which the Singleton bound follows.

In the previous Section 3.1 we introduced some methods to construct new codes from a given code. In the following, we give another construction, which will be used to prove Theorem 3.2.20. Let $C$ be an $[n, k, d]$ code and $c$ a codeword with $w = \mathrm{wt}(c)$. Let $I = \mathrm{supp}(c)$ (see the definition in Subsection 2.1.2). The residual code of $C$ with respect to $c$, denoted by $\mathrm{Res}(C, c)$, is the code of length $n-w$ obtained by puncturing $C$ on all the coordinates of $I$.

Proposition 3.2.21 Suppose $C$ is an $[n, k, d]$ code over $\mathbb{F}_q$ and $c$ is a codeword of weight $w < qd/(q-1)$. Then $\mathrm{Res}(C, c)$ is an $[n-w, k-1, d']$ code with
$$d' \ge d - w + \left\lceil \frac{w}{q} \right\rceil.$$

Proof. By replacing $C$ with an equivalent code we may assume without loss of generality that $c = (1, 1, \ldots, 1, 0, \ldots, 0)$, where the first $w$ components are equal to 1 and the other components are 0. Clearly, the dimension of $\mathrm{Res}(C, c)$ is less than or equal to $k-1$. If the dimension were strictly less than $k-1$, then there must be a nonzero codeword in $C$ of the form $x = (x_1, \ldots, x_w, 0, \ldots, 0)$, where not all the $x_i$ are the same. There exists $\alpha \in \mathbb{F}_q$ such that at least $\lceil w/q \rceil$ coordinates of $(x_1, \ldots, x_w)$ are equal to $\alpha$. Thus
$$d \le \mathrm{wt}(x - \alpha c) \le w - \lceil w/q \rceil \le w(q-1)/q,$$
which contradicts the assumption on $w$. Hence $\dim \mathrm{Res}(C, c) = k-1$.
Next, consider the minimum distance. Let $(x_{w+1}, \ldots, x_n)$ be any nonzero codeword in $\mathrm{Res}(C, c)$, and let $x = (x_1, \ldots, x_w, x_{w+1}, \ldots, x_n)$ be a corresponding codeword in $C$. There exists $\alpha \in \mathbb{F}_q$ such that at least $\lceil w/q \rceil$ coordinates of $(x_1, \ldots, x_w)$ equal $\alpha$. Therefore
$$d \le \mathrm{wt}(x - \alpha c) \le w - \lceil w/q \rceil + \mathrm{wt}((x_{w+1}, \ldots, x_n)).$$
Thus every nonzero codeword of $\mathrm{Res}(C, c)$ has weight at least $d - w + \lceil w/q \rceil$.
Proof of Theorem 3.2.20. We prove the theorem by induction on $k$. If $k = 1$, the inequality is $n \ge d$, which is obviously true. Now suppose $k > 1$. Let $c$ be a codeword of weight $d$. By Proposition 3.2.21, $\mathrm{Res}(C, c)$ is an $[n-d, k-1, d']$ code with $d' \ge \lceil d/q \rceil$. Applying the induction hypothesis to $\mathrm{Res}(C, c)$, we have
$$n - d \ge \sum_{i=0}^{k-2} \left\lceil \frac{d'}{q^i} \right\rceil \ge \sum_{i=0}^{k-2} \left\lceil \frac{d}{q^{i+1}} \right\rceil.$$
The Griesmer bound follows.

3.2.3 Hamming bound

In practical applications, given the length and the minimum distance, codes which have more codewords (in other words, codes of larger size) are often preferred. A natural question is: what is the maximal possible size of a code, given the length and the minimum distance? Denote by $A_q(n, d)$ the maximum number of codewords in any code over $\mathbb{F}_q$ (linear or nonlinear) of length $n$ and minimum distance $d$. The maximum when restricted to linear codes is denoted by $B_q(n, d)$. Clearly $B_q(n, d) \le A_q(n, d)$. The following is a well-known upper bound for $A_q(n, d)$.

Remark 3.2.22 Denote by $V_q(n, t)$ the number of vectors in $B_t(x)$, the ball of radius $t$ around a given vector $x \in \mathbb{F}_q^n$, as defined in 2.1.12. Then
$$V_q(n, t) = \sum_{i=0}^{t} \binom{n}{i}(q-1)^i$$
by Proposition 2.1.13.

Theorem 3.2.23 (Hamming or sphere-packing bound)
$$B_q(n, d) \le A_q(n, d) \le \frac{q^n}{V_q(n, t)},$$
where $t = \lfloor (d-1)/2 \rfloor$.

Proof. Let $C$ be any code over $\mathbb{F}_q$ (linear or nonlinear) of length $n$ and minimum distance $d$. Denote by $M$ the number of codewords of $C$. Since the distance between any two distinct codewords is at least $d \ge 2t+1$, the balls of radius $t$ around the codewords are mutually disjoint. From Proposition 2.1.13, each of these $M$ balls contains $V_q(n, t) = \sum_{i=0}^{t} \binom{n}{i}(q-1)^i$ vectors. The total number of vectors in the space $\mathbb{F}_q^n$ is $q^n$. Thus $M \cdot V_q(n, t) \le q^n$. As $C$ is an arbitrary code of length $n$ and minimum distance $d$, the theorem is established.
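Both the Griesmer and the Hamming bound are straightforward to evaluate numerically. A small sketch (ours):

    from math import ceil, comb

    def griesmer(k, d, q):
        # lower bound on the length n of any [n, k, d] code over F_q
        return sum(ceil(d / q**i) for i in range(k))

    def hamming(n, d, q):
        # upper bound on A_q(n, d): q^n / V_q(n, t) with t = floor((d-1)/2)
        t = (d - 1) // 2
        V = sum(comb(n, i) * (q - 1)**i for i in range(t + 1))
        return q**n // V

    print(griesmer(12, 8, 2))   # 23 <= 24: consistent with the [24,12,8] extended Golay code
    print(hamming(7, 3, 2))     # 16 = 2^4: attained by the perfect [7,4,3] Hamming code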
Definition 3.2.24 The covering radius $\rho(C)$ of a code $C$ of length $n$ over $\mathbb{F}_q$ is defined to be the smallest integer $s$ such that
$$\bigcup_{c \in C} B_s(c) = \mathbb{F}_q^n,$$
that is, every vector of $\mathbb{F}_q^n$ is in the union of the balls of radius $s$ around the codewords. A code of covering radius $\rho$ is called perfect if the balls $B_\rho(c)$, $c \in C$, are mutually disjoint.

Theorem 3.2.25 (Sphere-covering bound) Let $C$ be a code of length $n$ with $M$ codewords and covering radius $\rho$. Then
$$M \cdot V_q(n, \rho) \ge q^n.$$

Proof. By definition $\bigcup_{c \in C} B_\rho(c) = \mathbb{F}_q^n$. Now $|B_\rho(c)| = V_q(n, \rho)$ for all $c \in C$ by Proposition 2.1.13. So $M \cdot V_q(n, \rho) \ge q^n$.

Example 3.2.26 If $C = \mathbb{F}_q^n$, then the balls $B_0(c) = \{c\}$, $c \in C$, cover $\mathbb{F}_q^n$ and are mutually disjoint. So $\mathbb{F}_q^n$ is perfect and has covering radius 0. If $C = \{0\}$, then the ball $B_n(0)$ covers $\mathbb{F}_q^n$ and there is only one codeword. Hence $C$ is perfect and has covering radius $n$. Therefore the trivial codes are perfect.

Remark 3.2.27 It is easy to see that
$$\rho(C) = \max_{x \in \mathbb{F}_q^n} \min_{c \in C} d(x, c).$$
Let $e(C) = \lfloor (d(C)-1)/2 \rfloor$. Then obviously $e(C) \le \rho(C)$. Let $C$ be a code of length $n$ and minimum distance $d$ with more than one codeword. Then $C$ is a perfect code if and only if $\rho(C) = e(C)$.

Proposition 3.2.28 The following codes are perfect:
(1) the trivial codes,
(2) the $(2e+1)$-fold binary repetition code,
(3) the Hamming codes,
(4) the binary and ternary Golay codes.

Proof. (1) The trivial codes are perfect, as shown in Example 3.2.26.
(2) The $(2e+1)$-fold binary repetition code consists of two codewords, has minimum distance $d = 2e+1$ and error-correcting capacity $e$. Now
$$2^{2e+1} = \sum_{i=0}^{2e+1} \binom{2e+1}{i} = \sum_{i=0}^{e} \binom{2e+1}{i} + \sum_{i=0}^{e} \binom{2e+1}{e+1+i}$$
and $\binom{2e+1}{e+1+i} = \binom{2e+1}{e-i}$. So $2\sum_{i=0}^{e} \binom{2e+1}{i} = 2^{2e+1}$, that is, $2 \cdot V_2(2e+1, e) = 2^{2e+1}$. Therefore the covering radius is $e$ and the code is perfect.
(3) By Definition 2.3.13 and Proposition 2.3.14, the $q$-ary Hamming code $H_r(q)$ is an $[n, k, d]$ code with
$$n = \frac{q^r - 1}{q - 1}, \quad k = n - r, \quad d = 3.$$
For this code, $t = 1$, $n = k + r$, and the number of codewords is $M = q^k$. Thus
$$M\left(1 + (q-1)\binom{n}{1}\right) = M(1 + (q-1)n) = Mq^r = q^{k+r} = q^n.$$
Therefore $H_r(q)$ is a perfect code.
(4) It is left to the reader to show that the binary and ternary Golay codes are perfect.

3.2.4 Plotkin bound

The Plotkin bound is an upper bound on $A_q(n, d)$ which is valid when $d$ is large enough compared with $n$.

Theorem 3.2.29 (Plotkin bound) Let $C$ be an $(n, M, d)$ code over $\mathbb{F}_q$ such that $qd > (q-1)n$. Then
$$M \le \frac{qd}{qd - (q-1)n}.$$

Proof. We calculate the sum
$$S = \sum_{x \in C} \sum_{y \in C} d(x, y)$$
in two ways. First, since for any $x, y \in C$ with $x \ne y$ the distance satisfies $d(x, y) \ge d$, we have
$$S \ge M(M-1)d.$$
On the other hand, arrange the $M$ codewords of $C$ as the rows of an $M \times n$ matrix. For $i = 1, \ldots, n$, let $n_{i,\alpha}$ be the number of times $\alpha \in \mathbb{F}_q$ occurs in column $i$ of this matrix. Clearly, $\sum_{\alpha \in \mathbb{F}_q} n_{i,\alpha} = M$ for every $i$. Now
$$S = \sum_{i=1}^{n} \sum_{\alpha \in \mathbb{F}_q} n_{i,\alpha}(M - n_{i,\alpha}) = nM^2 - \sum_{i=1}^{n} \sum_{\alpha \in \mathbb{F}_q} n_{i,\alpha}^2.$$
Using the Cauchy-Schwarz inequality,
$$\sum_{\alpha \in \mathbb{F}_q} n_{i,\alpha}^2 \ge \frac{1}{q}\left(\sum_{\alpha \in \mathbb{F}_q} n_{i,\alpha}\right)^2.$$
Thus
$$S \le nM^2 - \sum_{i=1}^{n} \frac{1}{q}\left(\sum_{\alpha \in \mathbb{F}_q} n_{i,\alpha}\right)^2 = n(1 - 1/q)M^2.$$
Combining the two inequalities on $S$ gives $M(M-1)d \le n(1-1/q)M^2$, which proves the theorem.
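A quick numerical check (ours) of Example 3.2.30 below: the ternary simplex code $S_3(3)$ is a $[13, 3, 9]$ equidistant code meeting the Plotkin bound.

    q, n, d = 3, 13, 9                          # parameters of S_3(3)
    M_bound = (q * d) // (q * d - (q - 1) * n)  # 27 / (27 - 26) = 27
    print(M_bound, q**3)                        # both 27: the bound is attained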
Example 3.2.30 Consider the simplex code $S_3(3)$, that is, the dual code of the Hamming code $H_3(3)$ over $\mathbb{F}_3$ of Example 2.3.15. This is a $[13, 3, 9]$ code with $M = 3^3 = 27$ codewords. Every nonzero codeword in this code has Hamming weight 9, and $d(x, y) = 9$ for any two distinct codewords $x$ and $y$. Thus $qd = 27 > 26 = (q-1)n$. Since
$$\frac{qd}{qd - (q-1)n} = 27 = M,$$
this code achieves the Plotkin bound.

Remark 3.2.31 A code in which all nonzero codewords have the same weight is called a constant weight code; a code in which the distances between any two distinct codewords are the same is called an equidistant code. A linear code is a constant weight code if and only if it is an equidistant code. From the proof of Theorem 3.2.29, only constant weight and equidistant codes can achieve the Plotkin bound. So the simplex code $S_r(q)$ achieves the Plotkin bound by Proposition 2.3.16.

Remark 3.2.32 ***Improved Plotkin bound in the binary case.***

3.2.5 Gilbert and Varshamov bounds

The Hamming and Plotkin bounds give upper bounds for $A_q(n, d)$ and $B_q(n, d)$. In this subsection, we discuss lower bounds for these numbers. Since $B_q(n, d) \le A_q(n, d)$, each lower bound for $B_q(n, d)$ is also a lower bound for $A_q(n, d)$.

Theorem 3.2.33 (Gilbert bound)
$$\log_q(A_q(n, d)) \ge n - \log_q(V_q(n, d-1)).$$

Proof. Let $C$ be a code over $\mathbb{F}_q$, not necessarily linear, of length $n$ and minimum distance $d$, which has $M = A_q(n, d)$ codewords. If $M \cdot V_q(n, d-1) < q^n$, then the union of the balls of radius $d-1$ around all codewords of $C$ is not equal to $\mathbb{F}_q^n$, by Proposition 2.1.13. Take $x \in \mathbb{F}_q^n$ outside this union. Then $d(x, c) \ge d$ for all $c \in C$. So $C \cup \{x\}$ is a code of length $n$ with $M+1$ codewords and minimum distance $d$. This contradicts the maximality of $A_q(n, d)$. Hence $A_q(n, d) \cdot V_q(n, d-1) \ge q^n$.

In the following theorem a greedy algorithm is used to construct a linear code of length $n$, minimum distance at least $d$, and dimension $k$ as large as possible, and therefore with as many codewords as possible.

Theorem 3.2.34 Let $n$ and $d$ be integers satisfying $2 \le d \le n$. If
$$k \le n - \log_q(1 + V_q(n-1, d-2)), \tag{3.1}$$
then there exists an $[n, k]$ code over $\mathbb{F}_q$ with minimum distance at least $d$.
Proof. Suppose $k$ is an integer satisfying the inequality (3.1), which is equivalent to
$$V_q(n-1, d-2) < q^{n-k}. \tag{3.2}$$
We construct by induction the columns $h_1, \ldots, h_n \in \mathbb{F}_q^{n-k}$ of an $(n-k) \times n$ matrix $H$ over $\mathbb{F}_q$ such that every $d-1$ columns of $H$ are linearly independent. Choose for $h_1$ any nonzero vector. Suppose that $j < n$ and $h_1, \ldots, h_j$ are chosen such that any $d-1$ of them are linearly independent. Choose $h_{j+1}$ such that $h_{j+1}$ is not a linear combination of any $d-2$ or fewer of the vectors $h_1, \ldots, h_j$.
The above procedure is a greedy algorithm. We now prove its correctness by induction on $j$. When $j = 1$, it is trivial that there exists a nonzero vector $h_1$. Suppose that $j < n$ and any $d-1$ of $h_1, \ldots, h_j$ are linearly independent. The number of different linear combinations of $d-2$ or fewer of the $h_1, \ldots, h_j$ is
$$\sum_{i=0}^{d-2} \binom{j}{i}(q-1)^i \le \sum_{i=0}^{d-2} \binom{n-1}{i}(q-1)^i = V_q(n-1, d-2).$$
Hence under condition (3.2) there always exists a vector $h_{j+1}$ which is not a linear combination of $d-2$ or fewer of $h_1, \ldots, h_j$. By induction, we find $h_1, \ldots, h_n$ such that $h_j$ is not a linear combination of any $d-2$ or fewer of the vectors $h_1, \ldots, h_{j-1}$. Hence every $d-1$ of $h_1, \ldots, h_n$ are linearly independent.
The null space of $H$ is a code $C$ of dimension at least $k$ and minimum distance at least $d$ by Proposition 2.3.11. Let $C'$ be a subcode of $C$ of dimension $k$. Then the minimum distance of $C'$ is at least $d$.

Corollary 3.2.35 (Varshamov bound)
$$\log_q B_q(n, d) \ge n - \log_q(1 + V_q(n-1, d-2)).$$

Proof. The largest integer $k$ satisfying (3.1) of Theorem 3.2.34 is given by the right-hand side of the inequality.

In the next subsection, we will see that the Gilbert bound and the Varshamov bound are asymptotically the same. In the literature, sometimes either of them is called the Gilbert-Varshamov bound. The resulting asymptotic bound is called the asymptotic Gilbert-Varshamov bound.

3.2.6 Exercises

3.2.1 Show that for an arbitrary code, possibly nonlinear, of length $n$ over an alphabet with $q$ elements, with $M$ codewords and minimum distance $d$, the following form of the Singleton bound holds: $M \le q^{n+1-d}$.

3.2.2 Let $C$ be an $[n, k]$ code. Let $d^\perp$ be the minimum distance of $C^\perp$. Show that $d^\perp \le k+1$, and that equality holds if and only if $C$ is MDS.
3.2.6 Exercises

3.2.1 Show that for an arbitrary code, possibly nonlinear, of length $n$ over an alphabet with $q$ elements, with $M$ codewords and minimum distance $d$, the following form of the Singleton bound holds: $M \leq q^{n+1-d}$.

3.2.2 Let $C$ be an $[n, k]$ code. Let $d^\perp$ be the minimum distance of $C^\perp$. Show that $d^\perp \leq k + 1$, and that equality holds if and only if $C$ is MDS.

3.2.3 Give a proof of the formula in Lemma 3.2.9 of the determinant of a Vandermonde matrix.

3.2.4 Prove that the code with generator matrix $G_k(\mathbf{a})$ in Proposition 3.2.10 is MDS.

3.2.5 Let $C$ be an $[n, k, d]$ code over $\mathbb{F}_q$. Prove that the number of codewords of minimum weight $d$ is divisible by $q-1$ and is at most equal to $(q-1)\binom{n}{d}$. Show that $C$ is MDS in case equality holds.

3.2.6 Give a proof of Remark 3.2.16.

3.2.7 Give a proof of the formula in Lemma 3.2.18 of the determinant of a Cauchy matrix.

3.2.8 Let $C$ be a binary MDS code. Show that if $C$ is not trivial, then it is a repetition code or an even weight code.

3.2.9 [20] ***Show that the code $C_1$ in Proposition 3.2.10 is self-orthogonal if $n = q$ and $k \leq n/2$. Self-dual ***

3.2.10 [CAS] Take $q = 256$ in Proposition 3.2.10 and construct the matrices $G_{10}(\mathbf{a})$ and $G_{10}(\mathbf{a}')$. Construct the corresponding codes with these matrices as generator matrices. Show that these codes are MDS by using the commands IsMDSCode in GAP and IsMDS in Magma.

3.2.11 Give a proof of the statements made in Remark 3.2.27.

3.2.12 Show that the binary and ternary Golay codes are perfect.

3.2.13 Let $C$ be the binary $[7, 4, 3]$ Hamming code. Let $D$ be the $\mathbb{F}_4$-linear code with the same generator matrix as $C$. Show that $\rho(C) = 2$ and $\rho(D) = 3$.

3.2.14 Let $C$ be an $[n, k]$ code. Let $H$ be a parity check matrix of $C$. Show that $\rho(C)$ is the minimal number $\rho$ such that $\mathbf{x}^T$ is a linear combination of at most $\rho$ columns of $H$ for every $\mathbf{x} \in \mathbb{F}_q^{n-k}$. Derive the redundancy bound $\rho(C) \leq n - k$.

3.2.15 Give an estimate of the complexity of finding a code satisfying (3.1) of Theorem 3.2.34 by the greedy algorithm.

3.3 Asymptotically good codes

***

3.3.1 Asymptotic Gilbert-Varshamov bound

In practical applications long codes are sometimes preferred. For an infinite family of codes, a measure of the goodness of the family is whether it contains so-called asymptotically good codes.
Definition 3.3.1 An infinite sequence $\mathcal{C} = \{C_i\}_{i=1}^{\infty}$ of codes $C_i$ with parameters $[n_i, k_i, d_i]$ is called asymptotically good if $\lim_{i\to\infty} n_i = \infty$,
$$R(\mathcal{C}) = \liminf_{i\to\infty}\frac{k_i}{n_i} > 0 \quad\text{and}\quad \delta(\mathcal{C}) = \liminf_{i\to\infty}\frac{d_i}{n_i} > 0.$$
Using the bounds introduced in the previous subsection, we will prove the existence of asymptotically good codes.

Definition 3.3.2 Define the $q$-ary entropy function $H_q$ on $[0, (q-1)/q]$ by
$$H_q(x) = \begin{cases} x\log_q(q-1) - x\log_q x - (1-x)\log_q(1-x) & \text{if } 0 < x \leq \frac{q-1}{q},\\[2pt] 0 & \text{if } x = 0.\end{cases}$$
The function $H_q(x)$ is increasing on $[0, (q-1)/q]$. The function $H_2(x)$ is the entropy function.

Lemma 3.3.3 Let $q \geq 2$ and $0 \leq \theta \leq (q-1)/q$. Then
$$\lim_{n\to\infty}\frac{1}{n}\log_q V_q(n, \lfloor\theta n\rfloor) = H_q(\theta).$$

Proof. Since $\theta n - 1 < \lfloor\theta n\rfloor \leq \theta n$, we have
$$\lim_{n\to\infty}\frac{1}{n}\lfloor\theta n\rfloor = \theta \quad\text{and}\quad \lim_{n\to\infty}\frac{1}{n}\log_q(1 + \lfloor\theta n\rfloor) = 0. \quad (3.3)$$
Now we are going to prove the following equality:
$$\lim_{n\to\infty}\frac{1}{n}\log_q\binom{n}{\lfloor\theta n\rfloor} = -\theta\log_q\theta - (1-\theta)\log_q(1-\theta). \quad (3.4)$$
To this end we introduce the little-o notation and use the Stirling formula
$$\log n! = \left(n + \tfrac{1}{2}\right)\log n - n + \tfrac{1}{2}\log(2\pi) + o(1) \quad (n\to\infty).$$
For two functions $f(n)$ and $g(n)$, $f(n) = o(g(n))$ means that for every $c > 0$ there exists some $k > 0$ such that $0 \leq f(n) < c\,g(n)$ for all $n \geq k$; the value of $k$ must not depend on $n$, but may depend on $c$. Thus $o(1)$ is a function of $n$ that tends to $0$ as $n\to\infty$. By the Stirling formula we have
$$\frac{1}{n}\log_q\binom{n}{\lfloor\theta n\rfloor} = \frac{1}{n}\left(\log_q n! - \log_q\lfloor\theta n\rfloor! - \log_q(n - \lfloor\theta n\rfloor)!\right)$$
$$= \log_q n - \theta\log_q\lfloor\theta n\rfloor - (1-\theta)\log_q(n - \lfloor\theta n\rfloor) + o(1) = -\theta\log_q\theta - (1-\theta)\log_q(1-\theta) + o(1).$$
Thus (3.4) follows. From the definition we have
$$\binom{n}{\lfloor\theta n\rfloor}(q-1)^{\lfloor\theta n\rfloor} \leq V_q(n, \lfloor\theta n\rfloor) \leq (1 + \lfloor\theta n\rfloor)\binom{n}{\lfloor\theta n\rfloor}(q-1)^{\lfloor\theta n\rfloor}. \quad (3.5)$$
From the right-hand part of (3.5) we have
$$\log_q V_q(n, \lfloor\theta n\rfloor) \leq \log_q(1 + \lfloor\theta n\rfloor) + \log_q\binom{n}{\lfloor\theta n\rfloor} + \lfloor\theta n\rfloor\log_q(q-1).$$
By (3.3) and (3.4) we have
$$\lim_{n\to\infty}\frac{1}{n}\log_q V_q(n, \lfloor\theta n\rfloor) \leq \theta\log_q(q-1) - \theta\log_q\theta - (1-\theta)\log_q(1-\theta). \quad (3.6)$$
The right-hand side is equal to $H_q(\theta)$ by definition. Similarly, using the left-hand part of (3.5), we prove
$$\lim_{n\to\infty}\frac{1}{n}\log_q V_q(n, \lfloor\theta n\rfloor) \geq H_q(\theta). \quad (3.7)$$
Combining (3.6) and (3.7), we obtain the result.

Now we are ready to prove the existence of asymptotically good codes. Specifically, we have the following stronger result.

Theorem 3.3.4 Let $0 < \theta < (q-1)/q$. Then there exists an asymptotically good sequence $\mathcal{C}$ of codes such that $\delta(\mathcal{C}) = \theta$ and $R(\mathcal{C}) = 1 - H_q(\theta)$.

Proof. Let $0 < \theta < (q-1)/q$. Let $\{n_i\}_{i=1}^{\infty}$ be a sequence of positive integers with $\lim_{i\to\infty} n_i = \infty$; for example we can take $n_i = i$. Let $d_i = \lfloor\theta n_i\rfloor$ and
$$k_i = \left\lfloor n_i - \log_q\left(1 + V_q(n_i - 1, d_i - 2)\right)\right\rfloor.$$
By Theorem 3.2.34 and the Varshamov bound, there exists a sequence $\mathcal{C} = \{C_i\}_{i=1}^{\infty}$ of $[n_i, k_i, d_i]$ codes $C_i$ over $\mathbb{F}_q$. Clearly $\delta(\mathcal{C}) = \theta > 0$ for this sequence of $q$-ary codes. We now prove $R(\mathcal{C}) = 1 - H_q(\theta)$. To this end, we first use Lemma 3.3.3 to prove the following equation:
$$\lim_{i\to\infty}\frac{1}{n_i}\log_q\left(1 + V_q(n_i - 1, d_i - 2)\right) = H_q(\theta). \quad (3.8)$$
First, we have $1 + V_q(n_i - 1, d_i - 2) \leq V_q(n_i, d_i)$. By Lemma 3.3.3, we have
$$\limsup_{i\to\infty}\frac{1}{n_i}\log_q\left(1 + V_q(n_i - 1, d_i - 2)\right) \leq \lim_{i\to\infty}\frac{1}{n_i}\log_q V_q(n_i, d_i) = H_q(\theta). \quad (3.9)$$
Let $\delta = \max\{1, \lceil 3/\theta\rceil\}$, $m_i = n_i - \delta$ and $e_i = \lfloor\theta m_i\rfloor$. Then
$$d_i - 2 = \lfloor\theta n_i\rfloor - 2 > \theta n_i - 3 \geq \theta(n_i - \delta) = \theta m_i \geq e_i$$
and $n_i - 1 \geq n_i - \delta = m_i$. Therefore
$$\frac{1}{n_i}\log_q\left(1 + V_q(n_i - 1, d_i - 2)\right) \geq \frac{1}{m_i + \delta}\log_q V_q(m_i, e_i) = \frac{1}{m_i}\log_q V_q(m_i, e_i)\cdot\frac{m_i}{m_i + \delta}.$$
Since $\delta$ is a constant and $m_i \to \infty$, we have $\lim_{i\to\infty} m_i/(m_i + \delta) = 1$. Again by Lemma 3.3.3, the right-hand side of the above inequality tends to $H_q(\theta)$. It follows that
$$\liminf_{i\to\infty}\frac{1}{n_i}\log_q\left(1 + V_q(n_i - 1, d_i - 2)\right) \geq H_q(\theta). \quad (3.10)$$
By inequalities (3.9) and (3.10), we obtain (3.8). Now by (3.8) we have
$$R(\mathcal{C}) = \lim_{i\to\infty}\frac{k_i}{n_i} = 1 - \lim_{i\to\infty}\frac{1}{n_i}\log_q\left(1 + V_q(n_i - 1, d_i - 2)\right) = 1 - H_q(\theta),$$
and $1 - H_q(\theta) > 0$, since $\theta < (q-1)/q$.

So a sequence $\mathcal{C}$ of codes satisfying Theorem 3.3.4 is asymptotically good. However, asymptotically good codes need not satisfy the conditions of Theorem 3.3.4. The number of codewords increases exponentially with the code length. So for large $n$, instead of $A_q(n, d)$ the following parameter is used:
$$\alpha(\theta) = \limsup_{n\to\infty}\frac{\log_q A_q(n, \theta n)}{n}.$$
Since $A_q(n, \theta n) \geq B_q(n, \theta n)$ and for a linear code $C$ the dimension is $k = \log_q|C|$, a straightforward consequence of Theorem 3.3.4 is the following asymptotic bound.

Corollary 3.3.5 (Asymptotic Gilbert-Varshamov bound) Let $0 \leq \theta \leq (q-1)/q$. Then
$$\alpha(\theta) \geq 1 - H_q(\theta).$$

Note that both the Gilbert bound and the Varshamov bound introduced in the previous subsection imply the asymptotic Gilbert-Varshamov bound.

***Manin $\alpha_q(\delta)$ is a decreasing continuous function. picture ***
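To make the asymptotic statements concrete, the following sketch (ours, for illustration) evaluates $H_q$ and shows numerically how $\frac{1}{n}\log_q V_q(n, \lfloor\theta n\rfloor)$ of Lemma 3.3.3 approaches $H_q(\theta)$.

```python
from math import comb, log

def Hq(q, x):
    """q-ary entropy function H_q on [0, (q-1)/q] (Definition 3.3.2)."""
    if x == 0:
        return 0.0
    return x * log(q - 1, q) - x * log(x, q) - (1 - x) * log(1 - x, q)

def Vq(q, n, r):
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))

q, theta = 2, 0.11
print(1 - Hq(q, theta))  # asymptotic GV rate, approximately 0.5 for theta = 0.11

# Lemma 3.3.3: (1/n) log_q V_q(n, floor(theta*n)) tends to H_q(theta).
for n in (100, 1000, 10000):
    print(n, log(Vq(q, n, int(theta * n)), q) / n, Hq(q, theta))
```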
3.3.2 Some results for the generic case

In this section we investigate the parameters of "generic" codes. It turns out that almost all codes have the same minimum distance and covering radius, for fixed length $n$ and dimension $k = \lfloor nR\rfloor$ with $0 < R < 1$. By "almost all" we mean that, as $n$ tends to infinity, the fraction of $[n, \lfloor nR\rfloor]$ codes that do not have the "generic" minimum distance and covering radius tends to $0$.

Theorem 3.3.6 Let $0 < R < 1$. Then almost all $[n, \lfloor nR\rfloor]$ codes over $\mathbb{F}_q$ have
• minimum distance $d_0 := nH_q^{-1}(1 - R) + o(n)$,
• covering radius $d_0(1 + o(1))$.
Here $H_q$ is the $q$-ary entropy function.

Theorem 3.3.7 *** it gives a number of codewords that project on a given k-set. Handbook of Coding theory, p.691. ***

3.3.3 Exercises

***???***

3.4 Notes

Puncturing and shortening at arbitrary sets of positions and the duality theorem are from Simonis [?]. Golay code, Turyn [?] construction, Pless handbook [?]. MacWilliams. The 1973 theorem of J. H. van Lint and A. Tietäväinen on perfect codes:
***–puncturing gives the binary [23,12,7] Golay code, which is cyclic. –automorphism group of the (extended) Golay code. – (extended) ternary Golay code. – designs and Golay codes. – lattices and Golay codes.***
***repeated decoding of product codes (Hoeholdt-Justesen).***
***Singleton defect $s(C) = n + 1 - k - d$: $s(C) \geq 0$ and equality holds if and only if $C$ is MDS. $s(C) = 0$ if and only if $s(C^\perp) = 0$. Example where $s(C) = 1$ and $s(C^\perp) > 1$. Almost MDS and near MDS. Genus $g = \max\{s(C), s(C^\perp)\}$ in 4.1. If $k \geq 2$, then $d \leq q(s+1)$. If $k \geq 3$ and $d = q(s+1)$, then $s + 1 \leq q$. Faldum-Willems, de Boer, Dodunekov-Langev, relation with Griesmer bound***
Chapter 4

Weight enumerator

Relinde Jurrius, Ruud Pellikaan and Xin-Wen Wu

*** The weight enumerator of a code is introduced and a random coding argument gives a proof of Shannon's theorem.

4.1 Weight enumerator

Apart from the minimum Hamming weight, a code has other important invariants. In this section we introduce the weight spectrum and the generalized weight spectrum of a code. ***applications***

4.1.1 Weight spectrum

The weight spectrum of a code is an important invariant, which provides useful information on both the structure and the practical applications of the code.

Definition 4.1.1 Let $C$ be a code of length $n$. The weight spectrum, also called the weight distribution, is the set
$$\{(w, A_w) \mid w = 0, 1, \ldots, n\},$$
where $A_w$ denotes the number of codewords in $C$ of weight $w$.

The so-called weight enumerator is a convenient representation of the weight spectrum.

Definition 4.1.2 The weight enumerator of $C$ is defined as the polynomial
$$W_C(Z) = \sum_{w=0}^{n} A_w Z^w.$$
The homogeneous weight enumerator of $C$ is defined as
$$W_C(X, Y) = \sum_{w=0}^{n} A_w X^{n-w} Y^w.$$
Remark 4.1.3 Note that $W_C(Z)$ and $W_C(X, Y)$ are equivalent representations of the weight spectrum. They determine each other uniquely by the equations
$$W_C(Z) = W_C(1, Z) \quad\text{and}\quad W_C(X, Y) = X^n W_C(X^{-1}Y).$$
Given the weight enumerator or the homogeneous weight enumerator, the weight spectrum is determined completely by the coefficients. Clearly, the weight enumerator and homogeneous weight enumerator can be written in another form, that is,
$$W_C(Z) = \sum_{\mathbf{c}\in C} Z^{\mathrm{wt}(\mathbf{c})} \quad (4.1)$$
and
$$W_C(X, Y) = \sum_{\mathbf{c}\in C} X^{n-\mathrm{wt}(\mathbf{c})} Y^{\mathrm{wt}(\mathbf{c})}. \quad (4.2)$$

Example 4.1.4 The zero code has one codeword, and its weight is zero. Hence the homogeneous weight enumerator of this code is $W_{\{0\}}(X, Y) = X^n$. The number of words of weight $w$ in the trivial code $\mathbb{F}_q^n$ is $A_w = \binom{n}{w}(q-1)^w$. So
$$W_{\mathbb{F}_q^n}(X, Y) = \sum_{w=0}^{n}\binom{n}{w}(q-1)^w X^{n-w}Y^w = (X + (q-1)Y)^n.$$

Example 4.1.5 The $n$-fold repetition code $C$ has homogeneous weight enumerator
$$W_C(X, Y) = X^n + (q-1)Y^n.$$
In the binary case its dual is the even weight code. Hence the dual has homogeneous weight enumerator
$$W_{C^\perp}(X, Y) = \sum_{t=0}^{\lfloor n/2\rfloor}\binom{n}{2t}X^{n-2t}Y^{2t} = \frac{1}{2}\left((X+Y)^n + (X-Y)^n\right).$$

Example 4.1.6 The nonzero entries of the weight distribution of the [7,4,3] binary Hamming code are given by $A_0 = 1$, $A_3 = 7$, $A_4 = 7$, $A_7 = 1$, as is seen by inspecting the weights of all 16 codewords. Hence its homogeneous weight enumerator is
$$X^7 + 7X^4Y^3 + 7X^3Y^4 + Y^7.$$

Example 4.1.7 The simplex code $S_r(q)$ is a constant weight code by Proposition 2.3.16 with parameters $[(q^r-1)/(q-1), r, q^{r-1}]$. Hence its homogeneous weight enumerator is
$$W_{S_r(q)}(X, Y) = X^n + (q^r - 1)X^{n-q^{r-1}}Y^{q^{r-1}}.$$
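The weight spectrum of a small code can be computed by brute force. The following sketch (ours, for illustration) recovers the weight distribution of the [7,4,3] Hamming code of Example 4.1.6 from a generator matrix; the particular matrix is one standard choice, and any generator matrix of an equivalent code gives the same spectrum.

```python
from itertools import product

G = [
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
]

def weight_distribution(G, q=2):
    """Enumerate all q^k codewords xG and tally their Hamming weights."""
    k, n = len(G), len(G[0])
    A = [0] * (n + 1)
    for x in product(range(q), repeat=k):
        c = [sum(x[i] * G[i][j] for i in range(k)) % q for j in range(n)]
        A[sum(1 for cj in c if cj != 0)] += 1
    return A

print(weight_distribution(G))  # [1, 0, 0, 7, 7, 0, 0, 1]
```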
Remark 4.1.8 Let $C$ be a linear code. Then $A_0 = 1$ and the minimum distance $d(C)$, which is equal to the minimum weight, is determined by the weight enumerator as follows:
$$d(C) = \min\{\, i \mid A_i \neq 0,\ i > 0 \,\}.$$
The weight enumerator also determines the dimension $k(C)$, since
$$W_C(1, 1) = \sum_{w=0}^{n} A_w = q^{k(C)}.$$

Example 4.1.9 The Hamming code over $\mathbb{F}_q$ of length $n = (q^r-1)/(q-1)$ has parameters $[n, n-r, 3]$ and is perfect with covering radius 1 by Proposition 3.2.28. The following recurrence relation holds for the weight distribution $(A_0, A_1, \ldots, A_n)$ of these codes:
$$\binom{n}{w}(q-1)^w = A_{w-1}(n-w+1)(q-1) + A_w(1 + w(q-2)) + A_{w+1}(w+1)$$
for all $w$. This is seen as follows. Every word $\mathbf{y}$ of weight $w$ is at distance at most 1 from a unique codeword $\mathbf{c}$, and such a codeword has possible weights $w-1$, $w$ or $w+1$. Let $\mathbf{c}$ be a codeword of weight $w-1$; then there are $n-w+1$ possible positions $j$ in the complement of the support of $\mathbf{c}$ where $c_j = 0$ could be changed into a nonzero element in order to get the word $\mathbf{y}$ of weight $w$. Similarly, let $\mathbf{c}$ be a codeword of weight $w$; then either $\mathbf{y} = \mathbf{c}$ or there are $w$ possible positions $j$ in the support of $\mathbf{c}$ where $c_j$ could be changed into another nonzero element to get $\mathbf{y}$. Finally, let $\mathbf{c}$ be a codeword of weight $w+1$; then there are $w+1$ possible positions $j$ in the support of $\mathbf{c}$ where $c_j$ could be changed into zero to get $\mathbf{y}$.

Multiply the recurrence relation by $Z^w$ and sum over $w$. Let $W(Z) = \sum_w A_w Z^w$. Then
$$(1 + (q-1)Z)^n = (q-1)nZW(Z) - (q-1)Z^2W'(Z) + W(Z) + (q-2)ZW'(Z) + W'(Z),$$
since
$$\sum_w\binom{n}{w}(q-1)^wZ^w = (1 + (q-1)Z)^n,\quad \sum_w(w+1)A_{w+1}Z^w = W'(Z),$$
$$\sum_w wA_wZ^w = ZW'(Z),\quad \sum_w(w-1)A_{w-1}Z^w = Z^2W'(Z).$$
Therefore $W(Z)$ satisfies the following ordinary first-order differential equation:
$$\left((q-1)Z^2 - (q-2)Z - 1\right)W'(Z) - (1 + (q-1)nZ)W(Z) + (1 + (q-1)Z)^n = 0.$$
The corresponding homogeneous differential equation is separable:
$$\frac{W'(Z)}{W(Z)} = \frac{1 + (q-1)nZ}{(q-1)Z^2 - (q-2)Z - 1}$$
and has general solution
$$W_h(Z) = C(Z-1)^{q^{r-1}}\left((q-1)Z + 1\right)^{n - q^{r-1}},$$
where $C$ is some constant. A particular solution is given by
$$P(Z) = \frac{1}{q^r}(1 + (q-1)Z)^n.$$
Therefore the solution satisfying $W(0) = 1$ is equal to
$$W(Z) = \frac{1}{q^r}(1 + (q-1)Z)^n + \frac{q^r-1}{q^r}(Z-1)^{q^{r-1}}\left((q-1)Z + 1\right)^{n-q^{r-1}}.$$

To prove that the weight enumerator of a perfect code is completely determined by its parameters we need the following lemma.

Lemma 4.1.10 The number $N_q(n, v, w, s)$ of words in $\mathbb{F}_q^n$ of weight $w$ that are at distance $s$ from a given word of weight $v$ does not depend on the chosen word and is equal to
$$N_q(n, v, w, s) = \sum_{\substack{i+j+k=s\\ v+k-j=w}}\binom{n-v}{k}\binom{v}{i}\binom{v-i}{j}(q-2)^i(q-1)^k.$$

Proof. Consider a given word $\mathbf{x}$ of weight $v$. Let $\mathbf{y}$ be a word of weight $w$ at distance $s$ from $\mathbf{x}$. Suppose that $\mathbf{y}$ has $k$ nonzero coordinates in the complement of the support of $\mathbf{x}$, $j$ zero coordinates in the support of $\mathbf{x}$, and $i$ nonzero coordinates in the support of $\mathbf{x}$ that are distinct from the corresponding coordinates of $\mathbf{x}$. Then $s = d(\mathbf{x}, \mathbf{y}) = i + j + k$ and $\mathrm{wt}(\mathbf{y}) = w = v + k - j$. There are $\binom{n-v}{k}$ possible subsets of $k$ elements in the complement of the support of $\mathbf{x}$, and there are $(q-1)^k$ possible choices of the nonzero symbols at the corresponding $k$ coordinates. There are $\binom{v}{i}$ possible subsets of $i$ elements in the support of $\mathbf{x}$, and there are $(q-2)^i$ possible choices of the symbols at those $i$ positions that are distinct from the corresponding coordinates of $\mathbf{x}$. There are $\binom{v-i}{j}$ possible subsets of $j$ elements in the support of $\mathbf{x}$ that are zero at those positions. Therefore
$$N_q(n, v, w, s) = \sum_{\substack{i+j+k=s\\ v+k-j=w}}\binom{n-v}{k}(q-1)^k\binom{v}{i}(q-2)^i\binom{v-i}{j}.$$

Remark 4.1.11 Let us consider special values of $N_q(n, v, w, s)$. If $s = 0$, then $N_q(n, v, w, 0) = 1$ if $v = w$ and $N_q(n, v, w, 0) = 0$ otherwise. If $s = 1$, then
$$N_q(n, v, w, 1) = \begin{cases}(n-w+1)(q-1) & \text{if } v = w-1,\\ w(q-2) & \text{if } v = w,\\ w+1 & \text{if } v = w+1,\\ 0 & \text{otherwise.}\end{cases}$$

Proposition 4.1.12 Let $C$ be a perfect code of length $n$ and covering radius $\rho$ with weight distribution $(A_0, A_1, \ldots, A_n)$. Then
$$\binom{n}{w}(q-1)^w = \sum_{v=w-\rho}^{w+\rho}A_v\sum_{s=|v-w|}^{\rho}N_q(n, v, w, s)$$
for all $w$.
Proof. Define the set
$$N(w, \rho) = \{\,(\mathbf{y}, \mathbf{c}) \mid \mathbf{y}\in\mathbb{F}_q^n,\ \mathrm{wt}(\mathbf{y}) = w,\ \mathbf{c}\in C,\ d(\mathbf{y}, \mathbf{c})\leq\rho\,\}.$$
(1) For every $\mathbf{y}$ in $\mathbb{F}_q^n$ of weight $w$ there is a unique codeword $\mathbf{c}$ in $C$ at distance at most $\rho$ from $\mathbf{y}$, since $C$ is perfect with covering radius $\rho$. Hence
$$|N(w, \rho)| = \binom{n}{w}(q-1)^w.$$
(2) On the other hand, consider the fibre of the projection on the second factor:
$$N(\mathbf{c}, w, \rho) = \{\,\mathbf{y}\in\mathbb{F}_q^n \mid \mathrm{wt}(\mathbf{y}) = w,\ d(\mathbf{y}, \mathbf{c})\leq\rho\,\}$$
for a given codeword $\mathbf{c}$ in $C$. If $\mathbf{c}$ has weight $v$, then
$$|N(\mathbf{c}, w, \rho)| = \sum_{s=0}^{\rho}N_q(n, v, w, s).$$
Hence
$$|N(w, \rho)| = \sum_{v=0}^{n}A_v\sum_{s=0}^{\rho}N_q(n, v, w, s).$$
Notice that $|\mathrm{wt}(\mathbf{x}) - \mathrm{wt}(\mathbf{y})|\leq d(\mathbf{x}, \mathbf{y})$. Hence $N_q(n, v, w, s) = 0$ if $|v-w| > s$. Combining (1) and (2) gives the desired result.

Example 4.1.13 The ternary Golay code has parameters [11, 6, 5] and is perfect with covering radius 2 by Proposition 3.2.28. We leave it as an exercise to show by means of the recurrence relations of Proposition 4.1.12 that the weight enumerator of this code is given by
$$1 + 132Z^5 + 132Z^6 + 330Z^8 + 110Z^9 + 24Z^{11}.$$

Example 4.1.14 The binary Golay code has parameters [23, 12, 7] and is perfect with covering radius 3 by Proposition 3.2.28. We leave it as an exercise to show by means of the recurrence relations of Proposition 4.1.12 that the weight enumerator of this code is given by
$$1 + 253Z^7 + 506Z^8 + 1288Z^{11} + 1288Z^{12} + 506Z^{15} + 253Z^{16} + Z^{23}.$$
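The recurrences of Proposition 4.1.12 can be solved mechanically: for a perfect code with covering radius $\rho$, the identity at weight $w$ involves only $A_{w-\rho}, \ldots, A_{w+\rho}$, so one can solve for $A_{w+\rho}$ step by step. The following sketch (ours, for illustration) does this for the ternary Golay code of Example 4.1.13.

```python
from math import comb
from fractions import Fraction

def N(q, n, v, w, s):
    """N_q(n, v, w, s) of Lemma 4.1.10: words of weight w at distance s
    from a fixed word of weight v."""
    total = 0
    for i in range(s + 1):
        for j in range(s - i + 1):
            k = s - i - j
            if v + k - j == w and k <= n - v and i <= v and j <= v - i:
                total += (comb(n - v, k) * comb(v, i) * comb(v - i, j)
                          * (q - 2) ** i * (q - 1) ** k)
    return total

def perfect_weight_distribution(q, n, d, rho):
    """Solve the identities of Proposition 4.1.12 step by step: the identity
    at weight w has A_{w+rho} as its only new unknown, with nonzero
    coefficient N_q(n, w+rho, w, rho)."""
    A = {v: Fraction(0) for v in range(-rho, n + rho + 1)}
    A[0] = Fraction(1)  # A_1 = ... = A_{d-1} = 0 are already set
    for w in range(d - rho, n - rho + 1):
        rhs = comb(n, w) * (q - 1) ** w
        known = sum(A[v] * sum(N(q, n, v, w, s) for s in range(abs(v - w), rho + 1))
                    for v in range(w - rho, w + rho))
        A[w + rho] = Fraction(rhs - known, N(q, n, w + rho, w, rho))
    return {v: int(A[v]) for v in range(n + 1) if A[v]}

# Ternary Golay [11, 6, 5], perfect with covering radius 2 (Example 4.1.13):
print(perfect_weight_distribution(3, 11, 5, 2))
# {0: 1, 5: 132, 6: 132, 8: 330, 9: 110, 11: 24}
```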
4.1.2 Average weight enumerator

Remark 4.1.15 The computation of the weight enumerator of a given code is most of the time hard. For the perfect codes, such as the Hamming codes and the binary and ternary Golay codes, this is left as an exercise to the reader and can be done by using Proposition 4.1.12. In Proposition 4.4.22 the weight distribution of MDS codes is treated. The weight enumerator of only a few infinite families of codes is known. On the other hand, the average weight enumerator of a class of codes is very often easy to determine.

Definition 4.1.16 Let $\mathcal{C}$ be a nonempty class of codes over $\mathbb{F}_q$ of the same length. The average weight enumerator of $\mathcal{C}$ is defined as the average of all $W_C$ with $C\in\mathcal{C}$:
$$W_{\mathcal{C}}(Z) = \frac{1}{|\mathcal{C}|}\sum_{C\in\mathcal{C}}W_C(Z),$$
and similarly for the homogeneous average weight enumerator $W_{\mathcal{C}}(X, Y)$ of this class.

Definition 4.1.17 A class $\mathcal{C}$ of $[n, k]$ codes over $\mathbb{F}_q$ is called balanced if there is a number $N(\mathcal{C})$ such that
$$N(\mathcal{C}) = |\{\, C\in\mathcal{C} \mid \mathbf{y}\in C \,\}|$$
for every nonzero word $\mathbf{y}$ in $\mathbb{F}_q^n$.

Example 4.1.18 The prime example of a class of balanced codes is the set $\mathcal{C}[n, k]_q$ of all $[n, k]$ codes over $\mathbb{F}_q$. ***Other examples are:***

Lemma 4.1.19 Let $\mathcal{C}$ be a balanced class of $[n, k]$ codes over $\mathbb{F}_q$. Then
$$N(\mathcal{C}) = |\mathcal{C}|\,\frac{q^k - 1}{q^n - 1}.$$

Proof. Compute the number of elements of the set of pairs
$$\{\,(\mathbf{y}, C) \mid \mathbf{y}\neq\mathbf{0},\ \mathbf{y}\in C\in\mathcal{C}\,\}$$
in two ways. First by keeping a nonzero $\mathbf{y}$ in $\mathbb{F}_q^n$ fixed and letting $C$ vary in $\mathcal{C}$ with $\mathbf{y}\in C$. This gives the number $(q^n-1)N(\mathcal{C})$, since $\mathcal{C}$ is balanced. Secondly by keeping $C$ in $\mathcal{C}$ fixed and letting the nonzero $\mathbf{y}$ in $C$ vary. This gives the number $|\mathcal{C}|(q^k-1)$. This gives the desired result, since both numbers are the same.

Proposition 4.1.20 Let $f$ be a function on $\mathbb{F}_q^n$ with values in a complex vector space. Let $\mathcal{C}$ be a balanced class of $[n, k]$ codes over $\mathbb{F}_q$. Then
$$\frac{1}{|\mathcal{C}|}\sum_{C\in\mathcal{C}}\sum_{\mathbf{c}\in C^*}f(\mathbf{c}) = \frac{q^k-1}{q^n-1}\sum_{\mathbf{v}\in(\mathbb{F}_q^n)^*}f(\mathbf{v}),$$
where $C^*$ denotes the set of all nonzero elements of $C$.

Proof. By interchanging the order of summation we get
$$\sum_{C\in\mathcal{C}}\sum_{\mathbf{v}\in C^*}f(\mathbf{v}) = \sum_{\mathbf{v}\in(\mathbb{F}_q^n)^*}f(\mathbf{v})\sum_{\substack{C\in\mathcal{C}\\ \mathbf{v}\in C}}1.$$
The inner sum is constant and equals $N(\mathcal{C})$, by assumption. Now the result follows by the computation of $N(\mathcal{C})$ in Lemma 4.1.19.

Corollary 4.1.21 Let $\mathcal{C}$ be a balanced class of $[n, k]$ codes over $\mathbb{F}_q$. Then
$$W_{\mathcal{C}}(Z) = 1 + \frac{q^k-1}{q^n-1}\sum_{w=1}^{n}\binom{n}{w}(q-1)^wZ^w.$$

Proof. Apply Proposition 4.1.20 to the function $f(\mathbf{v}) = Z^{\mathrm{wt}(\mathbf{v})}$ and use (4.1) of Remark 4.1.3.

***GV bound for a collection of balanced codes, Loeliger***
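Corollary 4.1.21 can be checked directly for small parameters. The following sketch (ours, for illustration) enumerates all $[4, 2]$ binary codes, that is, all 2-dimensional subspaces of $\mathbb{F}_2^4$, and compares the averaged weight distribution with the formula.

```python
from itertools import product
from math import comb

q, n, k = 2, 4, 2

def span(basis):
    """All F_2-linear combinations of the given basis vectors."""
    return frozenset(tuple(sum(c * b[j] for c, b in zip(coeffs, basis)) % q
                           for j in range(n))
                     for coeffs in product(range(q), repeat=len(basis)))

vectors = [v for v in product(range(q), repeat=n) if any(v)]
codes = {S for u in vectors for v in vectors
         for S in [span((u, v))] if len(S) == q ** k}
print(len(codes))  # 35, the Gaussian binomial [4 choose 2]_2

avg = [sum(sum(1 for c in S if sum(map(bool, c)) == w) for S in codes) / len(codes)
       for w in range(n + 1)]
formula = [1.0] + [(q ** k - 1) / (q ** n - 1) * comb(n, w) * (q - 1) ** w
                   for w in range(1, n + 1)]
print(avg)      # [1.0, 0.8, 1.2, 0.8, 0.2]
print(formula)  # matches Corollary 4.1.21
```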
4.1.3 MacWilliams identity

Although there is no apparent relation between the minimum distances of a code and its dual, their weight enumerators satisfy the MacWilliams identity.

Theorem 4.1.22 Let $C$ be an $[n, k]$ code over $\mathbb{F}_q$. Then
$$W_{C^\perp}(X, Y) = q^{-k}W_C(X + (q-1)Y, X - Y).$$

The following simple result is useful in the proof of the MacWilliams identity.

Lemma 4.1.23 Let $C$ be an $[n, k]$ linear code over $\mathbb{F}_q$. Let $\mathbf{v}$ be an element of $\mathbb{F}_q^n$, but not in $C^\perp$. Then, for every $\alpha\in\mathbb{F}_q$, there exist exactly $q^{k-1}$ codewords $\mathbf{c}$ such that $\mathbf{c}\cdot\mathbf{v} = \alpha$.

Proof. Consider the map $\varphi: C\to\mathbb{F}_q$ defined by $\varphi(\mathbf{c}) = \mathbf{c}\cdot\mathbf{v}$. This is a linear map. The map is not identically zero, since $\mathbf{v}$ is not in $C^\perp$. Hence every fibre $\varphi^{-1}(\alpha)$ consists of the same number $q^{k-1}$ of elements, for all $\alpha\in\mathbb{F}_q$.

To prove Theorem 4.1.22, we introduce the characters of abelian groups and prove some lemmas.

Definition 4.1.24 Let $(G, +)$ be an abelian group with respect to the addition $+$. Let $(S, \cdot)$ be the multiplicative group of the complex numbers of modulus one. A character $\chi$ of $G$ is a homomorphism from $G$ to $S$. So $\chi$ is a mapping satisfying
$$\chi(g_1 + g_2) = \chi(g_1)\cdot\chi(g_2) \quad\text{for all } g_1, g_2\in G.$$
If $\chi(g) = 1$ for all elements $g\in G$, we call $\chi$ the principal character.

Remark 4.1.25 For any character $\chi$ we have $\chi(0) = 1$, since $\chi(0)$ is not zero and $\chi(0) = \chi(0+0) = \chi(0)^2$. If $G$ is a finite group of order $N$ and $\chi$ is a character of $G$, then $\chi(g)$ is an $N$-th root of unity for all $g\in G$, since $1 = \chi(0) = \chi(Ng) = \chi(g)^N$.

Lemma 4.1.26 Let $\chi$ be a character of a finite group $G$. Then
$$\sum_{g\in G}\chi(g) = \begin{cases}|G| & \text{if }\chi\text{ is the principal character},\\ 0 & \text{otherwise.}\end{cases}$$

Proof. The result is trivial when $\chi$ is principal. Now suppose $\chi$ is not principal. Let $h\in G$ be such that $\chi(h)\neq 1$. We have
$$\chi(h)\sum_{g\in G}\chi(g) = \sum_{g\in G}\chi(h+g) = \sum_{g\in G}\chi(g),$$
since the map $g\mapsto h+g$ is a permutation of $G$. Hence $(\chi(h)-1)\sum_{g\in G}\chi(g) = 0$, which implies $\sum_{g\in G}\chi(g) = 0$.
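For $G = (\mathbb{F}_p, +)$ with $p$ prime, the maps $g\mapsto e^{2\pi i a g/p}$ are characters, and the character is principal exactly when $a = 0$. A quick numerical illustration of Lemma 4.1.26 (ours):

```python
from cmath import exp, pi

q = 5
for a in range(q):
    # chi_a(g) = exp(2*pi*i*a*g/q) is a character of (F_5, +).
    s = sum(exp(2j * pi * a * g / q) for g in range(q))
    print(a, round(s.real, 10), round(s.imag, 10))  # 5 for a = 0, else 0
```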
Definition 4.1.27 Let $V$ be a complex vector space. Let $f: \mathbb{F}_q^n\to V$ be a mapping on $\mathbb{F}_q^n$ with values in $V$. Let $\chi$ be a character of $\mathbb{F}_q$. The Hadamard transform $\hat{f}$ of $f$ is defined as
$$\hat{f}(\mathbf{u}) = \sum_{\mathbf{v}\in\mathbb{F}_q^n}\chi(\mathbf{u}\cdot\mathbf{v})f(\mathbf{v}).$$

Lemma 4.1.28 Let $f: \mathbb{F}_q^n\to V$ be a mapping on $\mathbb{F}_q^n$ with values in the complex vector space $V$. Let $\chi$ be a non-principal character of $\mathbb{F}_q$. Then
$$\sum_{\mathbf{c}\in C}\hat{f}(\mathbf{c}) = |C|\sum_{\mathbf{v}\in C^\perp}f(\mathbf{v}).$$

Proof. By definition, we have
$$\sum_{\mathbf{c}\in C}\hat{f}(\mathbf{c}) = \sum_{\mathbf{c}\in C}\sum_{\mathbf{v}\in\mathbb{F}_q^n}\chi(\mathbf{c}\cdot\mathbf{v})f(\mathbf{v}) = \sum_{\mathbf{v}\in\mathbb{F}_q^n}f(\mathbf{v})\sum_{\mathbf{c}\in C}\chi(\mathbf{c}\cdot\mathbf{v})$$
$$= \sum_{\mathbf{v}\in C^\perp}f(\mathbf{v})\sum_{\mathbf{c}\in C}\chi(\mathbf{c}\cdot\mathbf{v}) + \sum_{\mathbf{v}\in\mathbb{F}_q^n\setminus C^\perp}f(\mathbf{v})\sum_{\mathbf{c}\in C}\chi(\mathbf{c}\cdot\mathbf{v}) = |C|\sum_{\mathbf{v}\in C^\perp}f(\mathbf{v}) + \sum_{\mathbf{v}\in\mathbb{F}_q^n\setminus C^\perp}f(\mathbf{v})\sum_{\mathbf{c}\in C}\chi(\mathbf{c}\cdot\mathbf{v}).$$
The result follows, since
$$\sum_{\mathbf{c}\in C}\chi(\mathbf{c}\cdot\mathbf{v}) = q^{k-1}\sum_{\alpha\in\mathbb{F}_q}\chi(\alpha) = 0$$
for any $\mathbf{v}\in\mathbb{F}_q^n\setminus C^\perp$ and $\chi$ not principal, by Lemmas 4.1.23 and 4.1.26.

Proof of Theorem 4.1.22. Let $\chi$ be a non-principal character of $\mathbb{F}_q$. Consider the mapping
$$f(\mathbf{v}) = X^{n-\mathrm{wt}(\mathbf{v})}Y^{\mathrm{wt}(\mathbf{v})}$$
from $\mathbb{F}_q^n$ to the vector space of polynomials in the variables $X$ and $Y$ with complex coefficients. Then
$$\sum_{\mathbf{v}\in C^\perp}f(\mathbf{v}) = \sum_{\mathbf{v}\in C^\perp}X^{n-\mathrm{wt}(\mathbf{v})}Y^{\mathrm{wt}(\mathbf{v})} = W_{C^\perp}(X, Y),$$
by applying (4.2) of Remark 4.1.3 to $C^\perp$. Let $\mathbf{c} = (c_1, \ldots, c_n)$ and $\mathbf{v} = (v_1, \ldots, v_n)$. Define $\mathrm{wt}(0) = 0$ and $\mathrm{wt}(\alpha) = 1$ for all nonzero $\alpha\in\mathbb{F}_q$. Then $\mathrm{wt}(\mathbf{v}) = \mathrm{wt}(v_1) + \cdots + \mathrm{wt}(v_n)$. The Hadamard transform $\hat{f}(\mathbf{c})$ is equal to
$$\sum_{\mathbf{v}\in\mathbb{F}_q^n}X^{n-\mathrm{wt}(v_1)-\cdots-\mathrm{wt}(v_n)}Y^{\mathrm{wt}(v_1)+\cdots+\mathrm{wt}(v_n)}\chi(c_1v_1 + \cdots + c_nv_n)$$
$$= X^n\sum_{\mathbf{v}\in\mathbb{F}_q^n}\prod_{i=1}^{n}\left(\frac{Y}{X}\right)^{\mathrm{wt}(v_i)}\chi(c_iv_i) = X^n\prod_{i=1}^{n}\sum_{v\in\mathbb{F}_q}\left(\frac{Y}{X}\right)^{\mathrm{wt}(v)}\chi(c_iv).$$
If $c_i\neq 0$, then
$$\sum_{v\in\mathbb{F}_q}\left(\frac{Y}{X}\right)^{\mathrm{wt}(v)}\chi(c_iv) = 1 + \frac{Y}{X}\sum_{\alpha\in\mathbb{F}_q^*}\chi(\alpha) = 1 - \frac{Y}{X},$$
by Lemma 4.1.26. Hence
$$\sum_{v\in\mathbb{F}_q}\left(\frac{Y}{X}\right)^{\mathrm{wt}(v)}\chi(c_iv) = \begin{cases}1 + (q-1)\frac{Y}{X} & \text{if } c_i = 0,\\[2pt] 1 - \frac{Y}{X} & \text{if } c_i\neq 0.\end{cases}$$
Therefore $\hat{f}(\mathbf{c})$ is equal to
$$X^n\left(1 - \frac{Y}{X}\right)^{\mathrm{wt}(\mathbf{c})}\left(1 + (q-1)\frac{Y}{X}\right)^{n-\mathrm{wt}(\mathbf{c})} = (X-Y)^{\mathrm{wt}(\mathbf{c})}(X + (q-1)Y)^{n-\mathrm{wt}(\mathbf{c})}.$$
Hence
$$\sum_{\mathbf{c}\in C}\hat{f}(\mathbf{c}) = \sum_{\mathbf{c}\in C}U^{n-\mathrm{wt}(\mathbf{c})}V^{\mathrm{wt}(\mathbf{c})} = W_C(U, V),$$
by (4.2) of Remark 4.1.3 with the substitution $U = X + (q-1)Y$ and $V = X - Y$. It is shown that on the one hand
$$\sum_{\mathbf{v}\in C^\perp}f(\mathbf{v}) = W_{C^\perp}(X, Y),$$
and on the other hand
$$\sum_{\mathbf{c}\in C}\hat{f}(\mathbf{c}) = W_C(X + (q-1)Y, X - Y).$$
The result follows by Lemma 4.1.28 on the Hadamard transform.

Example 4.1.29 The zero code $C$ has homogeneous weight enumerator $X^n$ and its dual $\mathbb{F}_q^n$ has homogeneous weight enumerator $(X + (q-1)Y)^n$, by Example 4.1.4, which is indeed equal to $q^0W_C(X + (q-1)Y, X - Y)$ and confirms the MacWilliams identity.

Example 4.1.30 The $n$-fold repetition code $C$ has homogeneous weight enumerator $X^n + (q-1)Y^n$, and the homogeneous weight enumerator of its dual code in the binary case is $\frac{1}{2}\left((X+Y)^n + (X-Y)^n\right)$, by Example 4.1.5, which is
equal to $2^{-1}W_C(X+Y, X-Y)$, confirming the MacWilliams identity for $q = 2$. For arbitrary $q$ we have
$$W_{C^\perp}(X, Y) = q^{-1}W_C(X + (q-1)Y, X - Y) = q^{-1}\left((X + (q-1)Y)^n + (q-1)(X-Y)^n\right)$$
$$= \sum_{w=0}^{n}\binom{n}{w}\frac{(q-1)^w + (q-1)(-1)^w}{q}X^{n-w}Y^w.$$

Example 4.1.31 ***dual of a balanced class of codes, $\mathcal{C}^\perp$ balanced?***

Definition 4.1.32 An $[n, k]$ code $C$ over $\mathbb{F}_q$ is called formally self-dual if $C$ and $C^\perp$ have the same weight enumerator.

Remark 4.1.33 ***A quasi self-dual code is formally self-dual, existence of an asymp. good family of codes***

4.1.4 Exercises

4.1.1 Compute the weight spectrum of the dual of the $q$-ary $n$-fold repetition code directly, that is, without using the MacWilliams identity. Compare the result with Example 4.1.30.

4.1.2 Check the MacWilliams identity for the binary [7, 4, 3] Hamming code and its dual, the [7, 3, 4] simplex code.

4.1.3 Compute the weight enumerator of the Hamming code $H_r(q)$ by solving the differential equation given in Example 4.1.9.

4.1.4 Compute the weight enumerator of the ternary Golay code as given in Example 4.1.13.

4.1.5 Compute the weight enumerator of the binary Golay code as given in Example 4.1.14.

4.1.6 Consider the quasi self-dual code with generator matrix $(I_k|I_k)$ of Exercise 2.5.8. Show that its weight enumerator is equal to $(X^2 + (q-1)Y^2)^k$. Verify that this code is formally self-dual.

4.1.7 Let $C$ be the code over $\mathbb{F}_q$, with $q$ even, with generator matrix $H$ of Example 2.2.9. For which $q$ does this code contain a word of weight 7?
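Exercise 4.1.2 can also be checked numerically. The following sketch (ours, using sympy for the polynomial algebra) computes the weight enumerator of the dual of the [7,4,3] Hamming code directly and compares it with $q^{-k}W_C(X + (q-1)Y, X - Y)$.

```python
from itertools import product
from sympy import symbols, expand

X, Y = symbols("X Y")
q, n, k = 2, 7, 4
G = [[1, 0, 0, 0, 0, 1, 1],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1, 0],
     [0, 0, 0, 1, 1, 1, 1]]
code = {tuple(sum(x[i] * G[i][j] for i in range(k)) % q for j in range(n))
        for x in product(range(q), repeat=k)}
dual = {v for v in product(range(q), repeat=n)
        if all(sum(a * b for a, b in zip(v, c)) % q == 0 for c in code)}

def W(words):
    """Homogeneous weight enumerator, computed from (4.2)."""
    return expand(sum(X ** (n - sum(map(bool, c))) * Y ** sum(map(bool, c))
                      for c in words))

lhs = W(dual)  # X**7 + 7*X**3*Y**4, the [7,3,4] simplex code
rhs = expand(W(code).subs({X: X + (q - 1) * Y, Y: X - Y}, simultaneous=True) / q ** k)
print(expand(lhs - rhs))  # 0
```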
4.2 Error probability

*** Some introductory results on the error probability of correct decoding up to half the minimum distance were given in Section ??. ***

4.2.1 Error probability of undetected error

Definition 4.2.1 Consider the $q$-ary symmetric channel where the receiver checks whether the received word $\mathbf{r}$ is a codeword or not, for instance by computing whether $H\mathbf{r}^T$ is zero or not for a chosen parity check matrix $H$, and asks for retransmission in case $\mathbf{r}$ is not a codeword. See Remark 2.3.2. Now it may occur that $\mathbf{r}$ is again a codeword but not equal to the codeword that was sent. This is called an undetected error.

Proposition 4.2.2 Let $W_C(X, Y)$ be the weight enumerator of the code $C$. Then the probability of undetected error on a $q$-ary symmetric channel with crossover probability $p$ is given by
$$P_{ue}(p) = W_C\left(1-p, \frac{p}{q-1}\right) - (1-p)^n.$$

Proof. Every codeword has the same probability of transmission and the code is linear. So without loss of generality we may assume that the zero word is sent. Hence
$$P_{ue}(p) = \frac{1}{|C|}\sum_{\mathbf{x}\in C}\sum_{\substack{\mathbf{y}\in C\\ \mathbf{y}\neq\mathbf{x}}}P(\mathbf{y}|\mathbf{x}) = \sum_{\substack{\mathbf{y}\in C\\ \mathbf{y}\neq\mathbf{0}}}P(\mathbf{y}|\mathbf{0}).$$
If the received codeword $\mathbf{y}$ has weight $w$, then $w$ symbols were changed and the remaining $n-w$ symbols stayed the same. So
$$P(\mathbf{y}|\mathbf{0}) = (1-p)^{n-w}\left(\frac{p}{q-1}\right)^w$$
by Remark 2.4.15. Hence
$$P_{ue}(p) = \sum_{w=1}^{n}A_w(1-p)^{n-w}\left(\frac{p}{q-1}\right)^w.$$
Substituting $X = 1-p$ and $Y = p/(q-1)$ in $W_C(X, Y)$ gives the desired result, since $A_0 = 1$.

Remark 4.2.3 A retransmission is asked for exactly when $\mathbf{r}$ is not a codeword, so the probability of retransmission is $P_{retr}(p) = 1 - (1-p)^n - P_{ue}(p)$.

Example 4.2.4 Let $C$ be the binary triple repetition code. Then $P_{ue}(p) = p^3$, since $W_C(X, Y) = X^3 + Y^3$ by Example 4.1.5.

Example 4.2.5 Let $C$ be the [7, 4, 3] Hamming code. Then
$$P_{ue}(p) = 7p^3 - 21p^4 + 21p^5 - 7p^6 + p^7$$
by Example 4.1.6.
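Proposition 4.2.2 reduces the computation of $P_{ue}(p)$ to the weight distribution. A small sketch (ours) for the [7,4,3] Hamming code of Example 4.2.5:

```python
# Weight distribution A = (A_0, ..., A_7) of the [7,4,3] Hamming code.
A = [1, 0, 0, 7, 7, 0, 0, 1]
q, n = 2, 7

def P_ue(p):
    """Probability of undetected error, Proposition 4.2.2."""
    return sum(A[w] * (1 - p) ** (n - w) * (p / (q - 1)) ** w for w in range(1, n + 1))

p = 0.01
print(P_ue(p))                                        # direct sum
print(7*p**3 - 21*p**4 + 21*p**5 - 7*p**6 + p**7)     # closed form of Example 4.2.5
```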
4.2.2 Probability of decoding error

Remember that in Lemma 4.1.10 a formula was derived for $N_q(n, v, w, s)$, the number of words in $\mathbb{F}_q^n$ of weight $w$ that are at distance $s$ from a given word of weight $v$.

Proposition 4.2.6 The probability of decoding error of a decoder that corrects up to $t$ errors with $2t + 1\leq d$, for a code $C$ of minimum distance $d$ on a $q$-ary symmetric channel with crossover probability $p$, is given by
$$P_{de}(p) = \sum_{w=0}^{n}\left(\frac{p}{q-1}\right)^w(1-p)^{n-w}\sum_{s=0}^{t}\sum_{v=1}^{n}A_vN_q(n, v, w, s).$$

Proof. This is left as an exercise.

Example 4.2.7 ...........

4.2.3 Random coding

***ML (maximum likelihood) decoding = MD (minimum distance or nearest neighbor) decoding for the BSC.***

Proposition 4.2.8 ***...***
$$P_{err}(p) = W_C(\gamma) - 1, \quad\text{where } \gamma = 2\sqrt{p(1-p)}.$$

Proof. ....

Theorem 4.2.9 ***Shannon's theorem for random codes***

Proof. ***...***

4.2.4 Exercises

4.2.1 ***Give the probability of undetected error for the code ....***

4.2.2 Give a proof of Proposition 4.2.6.

4.2.3 ***Give the probability of decoding error and decoding failure for the code .... for a decoder correcting up to ... errors.***

4.3 Finite geometry and codes

***Intro***

4.3.1 Projective space and projective systems

The notion of a linear code has a geometric equivalent in the concept of a projective system, which is a set of points in projective space.

Remark 4.3.1 The affine line $\mathbb{A}$ over a field $\mathbb{F}$ is nothing else than the field $\mathbb{F}$ itself. The projective line $\mathbb{P}$ is an extension of the affine line by one point at infinity.
$$\cdots \text{ ——————— } \cdots \;\cup\; \{\infty\}$$
The elements are fractions $(x_0 : x_1)$ with $x_0, x_1$ elements of a field $\mathbb{F}$, not both zero, and the fraction $(x_0 : x_1)$ is equal to $(y_0 : y_1)$ if and only if $(x_0, x_1) = \lambda(y_0, y_1)$ for some $\lambda\in\mathbb{F}^*$. The point $(x_0 : x_1)$ with $x_0\neq 0$ is equal to $(1 : x_1/x_0)$ and corresponds to the point $x_1/x_0\in\mathbb{A}$. The point $(x_0 : x_1)$ with $x_0 = 0$ is equal to $(0 : 1)$ and is the unique point at infinity. The notation $\mathbb{P}(\mathbb{F})$ and $\mathbb{A}(\mathbb{F})$ is used to emphasize that the elements are in the field $\mathbb{F}$.

The affine plane $\mathbb{A}^2$ over a field $\mathbb{F}$ consists of points and lines. The points are in $\mathbb{F}^2$ and the lines are the subsets of the form $\{\, \mathbf{a} + \lambda\mathbf{v} \mid \lambda\in\mathbb{F} \,\}$ with $\mathbf{v}\neq\mathbf{0}$, in a parametric explicit description. A line is alternatively given by an implicit description by means of an equation $aX + bY + c = 0$, with $a, b, c\in\mathbb{F}$ and $a, b$ not both zero. Every two distinct points are contained in exactly one line. Two lines are either parallel, that is, they coincide or do not intersect, or they intersect in exactly one point. If $\mathbb{F}$ is equal to the finite field $\mathbb{F}_q$, then there are $q^2$ points and $q^2 + q$ lines, every line consists of $q$ points, and the number of lines through a given point is $q + 1$. Being parallel defines an equivalence relation on the set of lines of the affine plane, and every equivalence or parallel class of a line $l$ defines a unique point at infinity $P_l$. So $P_l = P_m$ if and only if $l$ and $m$ are parallel. In this way the affine plane is extended to the projective plane $\mathbb{P}^2$ by adding the points at infinity $P_l$. A line in the projective plane is a line $l$ of the affine plane extended with its point at infinity $P_l$, or the line at infinity, consisting of all the points at infinity. Every two distinct points in $\mathbb{P}^2$ are contained in exactly one line, and two distinct lines intersect in exactly one point. If $\mathbb{F} = \mathbb{F}_q$, then there are $q^2 + q + 1$ points and the same number of lines, every line consists of $q+1$ points, and the number of lines through a given point is $q+1$. ***picture***

Another model of the projective plane can be obtained as follows. Consider the points of the affine plane as the plane in three-space $\mathbb{F}^3$ with coordinates $(x, y, z)$ given by the equation $Z = 1$. Every point $(x, y, 1)$ in the affine plane corresponds to a unique line in $\mathbb{F}^3$ through the origin, parameterized by $\lambda(x, y, 1)$, $\lambda\in\mathbb{F}$. Conversely, a line in $\mathbb{F}^3$ through the origin parameterized by $\lambda(x, y, z)$, $\lambda\in\mathbb{F}$, intersects the affine plane in the unique point $(x/z, y/z, 1)$ if $z\neq 0$, and corresponds to the unique parallel class $P_l$ of the line $l$ in the affine plane with equation $xY = yX$ if $z = 0$. Furthermore, every line in the affine plane corresponds to a unique plane through the origin in $\mathbb{F}^3$, and conversely every plane through the origin in $\mathbb{F}^3$ with equation $aX + bY + cZ = 0$ intersects the affine plane in the unique line with equation $aX + bY + c = 0$ if not both $a = 0$ and $b = 0$, or corresponds to the line at infinity if $a = b = 0$. ***picture***

An $\mathbb{F}$-rational point of the projective plane is a line through the origin in $\mathbb{F}^3$. Such a point is determined by a triple $(x, y, z)\in\mathbb{F}^3$, not all of them being zero. A scalar multiple determines the same point in the projective plane. This defines an equivalence relation $\equiv$ by $(x, y, z)\equiv(x', y', z')$ if and only if there exists a nonzero $\lambda\in\mathbb{F}$ such that $(x, y, z) = \lambda(x', y', z')$. The equivalence class with representative $(x, y, z)$ is denoted by $(x : y : z)$, and $x$, $y$ and $z$ are called homogeneous coordinates of the point. The set of all projective points $(x : y : z)$,
with $x, y, z\in\mathbb{F}$ not all zero, is called the projective plane over $\mathbb{F}$. The set of $\mathbb{F}$-rational projective points is denoted by $\mathbb{P}^2(\mathbb{F})$. A line in the projective plane that is defined over $\mathbb{F}$ is a plane through the origin in $\mathbb{F}^3$. Such a line has a homogeneous equation $aX + bY + cZ = 0$ with $a, b, c\in\mathbb{F}$, not all zero. The affine plane is embedded in the projective plane by the map $(x, y)\mapsto(x : y : 1)$. The image is the subset of all projective points $(x : y : z)$ such that $z\neq 0$. The line at infinity is the line with equation $Z = 0$. A point at infinity of the affine plane is a point on the line at infinity in the projective plane. Every line in the affine plane intersects the line at infinity in a unique point, and all lines in the affine plane which are parallel, that is to say which do not intersect in the affine plane, intersect the line at infinity in the same point at infinity. The above embedding of the affine plane in the projective plane is standard, but the mappings $(x, z)\mapsto(x : 1 : z)$ and $(y, z)\mapsto(1 : y : z)$ give two alternative embeddings of the affine plane. The images are the complements of the lines $Y = 0$ and $X = 0$, respectively. Thus the projective plane is covered by three copies of the affine plane.

Definition 4.3.2 An affine subspace of $\mathbb{F}^r$ of dimension $s$ is a subset of the form
$$\{\, \mathbf{a} + \lambda_1\mathbf{v}_1 + \cdots + \lambda_s\mathbf{v}_s \mid \lambda_i\in\mathbb{F},\ i = 1, \ldots, s \,\},$$
where $\mathbf{a}\in\mathbb{F}^r$ and $\mathbf{v}_1, \ldots, \mathbf{v}_s$ is a linearly independent set of vectors in $\mathbb{F}^r$; $r-s$ is called the codimension of the subspace. The affine space of dimension $r$ over a field $\mathbb{F}$, denoted by $\mathbb{A}^r(\mathbb{F})$, consists of all affine subsets of $\mathbb{F}^r$. The elements of $\mathbb{F}^r$ are called points of the affine space. Lines and planes are the affine subspaces of dimension one and two, respectively. A hyperplane is an affine subspace of codimension 1.

Definition 4.3.3 A point of the projective space over a field $\mathbb{F}$ of dimension $r$ is a line through the origin in $\mathbb{F}^{r+1}$. A line in $\mathbb{P}^r(\mathbb{F})$ is a plane through the origin in $\mathbb{F}^{r+1}$. More generally, a projective subspace of dimension $s$ in $\mathbb{P}^r(\mathbb{F})$ is a linear subspace of dimension $s+1$ of the vector space $\mathbb{F}^{r+1}$, and $r-s$ is called the codimension of the subspace. The projective space of dimension $r$ over a field $\mathbb{F}$, denoted by $\mathbb{P}^r(\mathbb{F})$, consists of all its projective subspaces. A point of a projective space is incident with, or an element of, a projective subspace if the line corresponding to the point is contained in the linear subspace that corresponds with the projective subspace. A hyperplane in $\mathbb{P}^r(\mathbb{F})$ is a projective subspace of codimension 1.

Definition 4.3.4 A point in $\mathbb{P}^r(\mathbb{F})$ is denoted by its homogeneous coordinates $(x_0 : x_1 : \cdots : x_r)$ with $x_0, x_1, \ldots, x_r\in\mathbb{F}$ not all zero, where $\lambda(x_0, x_1, \ldots, x_r)$, $\lambda\in\mathbb{F}$, is a parametrization of the corresponding line in $\mathbb{F}^{r+1}$. Let $(x_0, x_1, \ldots, x_r)$ and $(y_0, y_1, \ldots, y_r)$ be two nonzero vectors in $\mathbb{F}^{r+1}$. Then $(x_0 : x_1 : \cdots : x_r)$ and $(y_0 : y_1 : \cdots : y_r)$ represent the same point in $\mathbb{P}^r(\mathbb{F})$ if and only if $(x_0, x_1, \ldots, x_r) = \lambda(y_0, y_1, \ldots, y_r)$ for some $\lambda\in\mathbb{F}^*$. The standard homogeneous coordinates of a point in $\mathbb{P}^r(\mathbb{F})$ are given by $(x_0 : x_1 : \cdots : x_r)$ such that there exists a $j$ with $x_j = 1$ and $x_i = 0$ for all $i < j$. The standard embedding of $\mathbb{A}^r(\mathbb{F})$ in $\mathbb{P}^r(\mathbb{F})$ is given by $(x_1, \ldots, x_r)\mapsto(1 : x_1 : \cdots : x_r)$.

Remark 4.3.5 Every hyperplane in $\mathbb{P}^r(\mathbb{F})$ is defined by an equation
$$a_0X_0 + a_1X_1 + \cdots + a_rX_r = 0,$$
where $a_0, a_1, \ldots, a_r$ are $r+1$ elements of $\mathbb{F}$, not all zero. Furthermore,
$$a'_0X_0 + a'_1X_1 + \cdots + a'_rX_r = 0$$
defines the same hyperplane if and only if there exists a nonzero $\lambda$ in $\mathbb{F}$ such that $a'_i = \lambda a_i$ for all $i = 0, 1, \ldots, r$. Hence there is a duality between points and hyperplanes in $\mathbb{P}^r(\mathbb{F})$, where the point $(a_0 : a_1 : \cdots : a_r)$ is sent to the hyperplane with equation $a_0X_0 + a_1X_1 + \cdots + a_rX_r = 0$.

Example 4.3.6 The columns of a generator matrix of a simplex code $S_r(q)$ represent all the points of $\mathbb{P}^{r-1}(\mathbb{F}_q)$.

Proposition 4.3.7 Let $r$ and $s$ be non-negative integers such that $s\leq r$. The number of $s$-dimensional projective subspaces of $\mathbb{P}^r(\mathbb{F}_q)$ is equal to the Gaussian binomial
$$\begin{bmatrix}r+1\\ s+1\end{bmatrix}_q = \frac{(q^{r+1}-1)(q^{r+1}-q)\cdots(q^{r+1}-q^s)}{(q^{s+1}-1)(q^{s+1}-q)\cdots(q^{s+1}-q^s)}.$$
In particular, the number of points of $\mathbb{P}^r(\mathbb{F}_q)$ is equal to
$$\begin{bmatrix}r+1\\ 1\end{bmatrix}_q = \frac{q^{r+1}-1}{q-1} = q^r + q^{r-1} + \cdots + q + 1.$$

Proof. An $s$-dimensional projective subspace of $\mathbb{P}^r(\mathbb{F}_q)$ is an $(s+1)$-dimensional subspace of $\mathbb{F}_q^{r+1}$, which is an $[r+1, s+1]$ code over $\mathbb{F}_q$. The number of the latter objects is equal to the stated Gaussian binomial, by Proposition 2.5.2.

Definition 4.3.8 Let $\mathcal{P} = (P_1, \ldots, P_n)$ be an $n$-tuple of points in $\mathbb{P}^r(\mathbb{F}_q)$. Then $\mathcal{P}$ is called a projective system in $\mathbb{P}^r(\mathbb{F}_q)$ if not all these points lie in a hyperplane. This system is called simple if the $n$ points are mutually distinct.

Definition 4.3.9 A code $C$ is called degenerate if there is a coordinate $i$ such that $c_i = 0$ for all $\mathbf{c}\in C$.

Remark 4.3.10 A code $C$ is nondegenerate if and only if there is no zero column in a generator matrix of the code, if and only if $d(C^\perp)\geq 2$.

Example 4.3.11 Let $G$ be a generator matrix of a nondegenerate code $C$ of dimension $k$. So $G$ has no zero columns. Take the columns of $G$ as homogeneous coordinates of points in $\mathbb{P}^{k-1}(\mathbb{F}_q)$. This gives the projective system $\mathcal{P}_G$ of $G$. Conversely, let $(P_1, \ldots, P_n)$ be an enumeration of the points of a projective system $\mathcal{P}$ in $\mathbb{P}^r(\mathbb{F}_q)$. Let $(p_{0j} : p_{1j} : \cdots : p_{rj})$ be homogeneous coordinates of $P_j$. Let $G_{\mathcal{P}}$ be the $(r+1)\times n$ matrix with $(p_{0j}, p_{1j}, \ldots, p_{rj})^T$ as $j$-th column. Then $G_{\mathcal{P}}$ is the generator matrix of a nondegenerate code of length $n$ and dimension $r+1$, since not all points lie in a hyperplane.

Proposition 4.3.12 Let $C$ be a nondegenerate code of length $n$ with generator matrix $G$. Let $\mathcal{P}_G$ be the projective system of $G$. The code has generalized Hamming weight $d_r$ if and only if $n - d_r$ is the maximal number of points of $\mathcal{P}_G$ in a linear subspace of codimension $r$.
Proof. Let $G = (g_{ij})$ and $P_j = (g_{1j} : \cdots : g_{kj})$. Then $\mathcal{P} = (P_1, \ldots, P_n)$. Let $D$ be a subspace of $C$ of dimension $r$ of minimal weight $d_r$. Let $\mathbf{c}_1, \ldots, \mathbf{c}_r$ be a basis of $D$. Then $\mathbf{c}_i = (c_{i1}, \ldots, c_{in}) = \mathbf{h}_iG$ for a nonzero $\mathbf{h}_i = (h_{i1}, \ldots, h_{ik})\in\mathbb{F}_q^k$. Let $H_i$ be the hyperplane in $\mathbb{P}^{k-1}(\mathbb{F}_q)$ with equation $h_{i1}X_1 + \cdots + h_{ik}X_k = 0$. Then $c_{ij} = 0$ if and only if $P_j\in H_i$, for all $1\leq i\leq r$ and $1\leq j\leq n$. Let $H$ be the intersection of $H_1, \ldots, H_r$. Then $H$ is a linear subspace of codimension $r$, since $\mathbf{c}_1, \ldots, \mathbf{c}_r$ are linearly independent. Furthermore $P_j\in H$ if and only if $c_{ij} = 0$ for all $1\leq i\leq r$, if and only if $j\notin\mathrm{supp}(D)$. Hence $n - d_r$ points lie in a linear subspace of codimension $r$. The proof of the converse is left to the reader.

Definition 4.3.13 A code $C$ is called projective if $d(C^\perp)\geq 3$.

Remark 4.3.14 A code of length $n$ is projective if and only if $G$ has no zero column and no column is a scalar multiple of another column of $G$, if and only if the projective system $\mathcal{P}_G$ is simple, for every generator matrix $G$ of the code.

Definition 4.3.15 A map $\varphi: \mathbb{P}^r(\mathbb{F})\to\mathbb{P}^r(\mathbb{F})$ is called a projective transformation if $\varphi$ is given by
$$\varphi(x_0 : x_1 : \cdots : x_r) = (y_0 : y_1 : \cdots : y_r), \quad\text{where } y_i = \sum_{j=0}^{r}a_{ij}x_j \text{ for all } i = 0, \ldots, r,$$
for a given invertible matrix $(a_{ij})$ of size $r+1$ with entries in $\mathbb{F}_q$.

Remark 4.3.16 The map $\varphi$ is well defined by $\varphi(\mathbf{x}) = \mathbf{y}$ with $y_i = \sum_{j=0}^{r}a_{ij}x_j$, since the equations for the $y_i$ are homogeneous in the $x_j$. The diagonal matrices $\lambda I_{r+1}$ induce the identity map on $\mathbb{P}^r(\mathbb{F})$ for all $\lambda\in\mathbb{F}_q^*$.

Definition 4.3.17 Let $\mathcal{P} = (P_1, \ldots, P_n)$ and $\mathcal{Q} = (Q_1, \ldots, Q_n)$ be two projective systems in $\mathbb{P}^r(\mathbb{F})$. They are called equivalent if there exists a projective transformation $\varphi$ of $\mathbb{P}^r(\mathbb{F})$ and a permutation $\sigma$ of $\{1, \ldots, n\}$ such that $\mathcal{Q} = (\varphi(P_{\sigma(1)}), \ldots, \varphi(P_{\sigma(n)}))$.

Proposition 4.3.18 There is a one-to-one correspondence between generalized equivalence classes of nondegenerate $[n, k, d]$ codes over $\mathbb{F}_q$ and equivalence classes of projective systems of $n$ points in $\mathbb{P}^{k-1}(\mathbb{F}_q)$.

Proof. The correspondence between codes and projective systems is given in Example 4.3.11. Let $C$ be a nondegenerate code over $\mathbb{F}_q$ with parameters $[n, k, d]$. Let $G$ be a generator matrix of $C$. Take the columns of $G$ as homogeneous coordinates of points in $\mathbb{P}^{k-1}(\mathbb{F}_q)$. This gives the projective system $\mathcal{P}_G$ of $G$. If $G'$ is another generator matrix of $C$, then $G' = AG$ for some invertible $k\times k$ matrix $A$ with entries in $\mathbb{F}_q$. Furthermore $A$ induces a projective transformation $\varphi$ of $\mathbb{P}^{k-1}(\mathbb{F}_q)$ such that $\mathcal{P}_{G'} = \varphi(\mathcal{P}_G)$. So $\mathcal{P}_G$ and $\mathcal{P}_{G'}$ are equivalent. Conversely, let $\mathcal{P} = (P_1, \ldots, P_n)$ be a projective system in $\mathbb{P}^{k-1}(\mathbb{F}_q)$. This gives the $k\times n$ generator matrix $G_{\mathcal{P}}$ of a nondegenerate code. Another enumeration of the points of $\mathcal{P}$ and another choice of the homogeneous coordinates of the $P_j$ gives a permutation of the columns of $G_{\mathcal{P}}$ and nonzero scalar multiples of the columns, and therefore a generalized equivalent code.

Proposition 4.3.19 Every $r$-tuple of points in $\mathbb{P}^r(\mathbb{F}_q)$ lies in a hyperplane.
Proof. Let $P_1, \ldots, P_r$ be $r$ points in $\mathbb{P}^r(\mathbb{F}_q)$. Let $(p_{0j} : p_{1j} : \cdots : p_{rj})$ be the standard homogeneous coordinates of $P_j$. The $r$ homogeneous equations
$$Y_0p_{0j} + Y_1p_{1j} + \cdots + Y_rp_{rj} = 0, \quad j = 1, \ldots, r,$$
in the $r+1$ variables $Y_0, \ldots, Y_r$ have a nonzero solution $(h_0, \ldots, h_r)$. Let $H$ be the hyperplane with equation $h_0X_0 + \cdots + h_rX_r = 0$. Then $P_1, \ldots, P_r$ lie in $H$.

4.3.2 MDS codes and points in general position

***points in general position***

A second geometric proof of the Singleton bound is given by means of projective systems.

Corollary 4.3.20 (Singleton bound) The minimum distance $d$ of a code of length $n$ and dimension $k$ is at most $n-k+1$.

Proof. The zero code has parameters $[n, 0, n+1]$ by definition, and indeed this code satisfies the Singleton bound. If $C$ is not the zero code, we may assume without loss of generality that the code is not degenerate, by deleting the coordinates where all the codewords are zero. Let $\mathcal{P}$ be the projective system in $\mathbb{P}^{k-1}(\mathbb{F}_q)$ of a generator matrix of the code. Then $k-1$ points of the system lie in a hyperplane by Proposition 4.3.19. Hence $n - d\geq k-1$, by Proposition 4.3.12.

The notion for projective systems that corresponds to MDS codes is the concept of general position.

Definition 4.3.21 A projective system of $n$ points in $\mathbb{P}^r(\mathbb{F}_q)$ is called in general position, or an $n$-arc, if no $r+1$ points lie in a hyperplane.

Example 4.3.22 Let $n = q+1$ and let $a_1, a_2, \ldots, a_{q-1}$ be an enumeration of the nonzero elements of $\mathbb{F}_q$. Consider the code $C$ with generator matrix
$$G = \begin{pmatrix}a_1 & a_2 & \cdots & a_{q-1} & 0 & 0\\ a_1^2 & a_2^2 & \cdots & a_{q-1}^2 & 0 & 1\\ 1 & 1 & \cdots & 1 & 1 & 0\end{pmatrix}.$$
Then $C$ is a $[q+1, 3, q-1]$ code by Proposition 3.2.10. Let $P_j = (a_j : a_j^2 : 1)$ for $1\leq j\leq q-1$ and $P_q = (0 : 0 : 1)$, $P_{q+1} = (0 : 1 : 0)$. Let $\mathcal{P} = (P_1, \ldots, P_n)$. Then $\mathcal{P} = \mathcal{P}_G$ and $\mathcal{P}$ is a projective system in the projective plane in general position. Remark that $\mathcal{P}$ is the set of all points in the projective plane with coordinates $(x : y : z)$ in $\mathbb{F}_q$ that lie on the conic with equation $X^2 = YZ$.

Remark 4.3.23 If $q$ is large enough with respect to $n$, then almost every projective system of $n$ points in $\mathbb{P}^r(\mathbb{F}_q)$ is in general position, or equivalently, a random code over $\mathbb{F}_q$ of length $n$ is MDS. The following proposition and corollary show that every $\mathbb{F}_q$-linear code with parameters $[n, k, d]$ is contained in an $\mathbb{F}_{q^m}$-linear MDS code with parameters $[n, n-d+1, d]$ if $m$ is large enough.
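That the $q+1$ points of Example 4.3.22 are in general position can be verified by checking that every $3\times 3$ minor of $G$ is nonzero, that is, that no three of the points are collinear. A quick check (ours, for illustration) for $q = 7$:

```python
from itertools import combinations

q = 7
# The q+1 points of the conic X^2 = YZ, as columns (x, y, z) of G.
cols = [(a % q, a * a % q, 1) for a in range(1, q)] + [(0, 0, 1), (0, 1, 0)]

def det3(u, v, w):
    """3x3 determinant modulo q; nonzero means the three points are not collinear."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0])) % q

print(all(det3(u, v, w) != 0 for u, v, w in combinations(cols, 3)))  # True
```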
Proposition 4.3.24 Let $B$ be a $q$-ary code. If $q^m > \max\{\binom{n}{i} \mid 0\leq i\leq t\}$ and $d(B^\perp) > t$, then there exists a sequence $\{B_r \mid 0\leq r\leq t\}$ of $q^m$-ary codes such that $B_{r-1}\subseteq B_r$, and $B_r$ is an $[n, r, n-r+1]$ code contained in the $\mathbb{F}_{q^m}$-linear code generated by $B$, for all $0\leq r\leq t$.

Proof. The minimum distances of $B^\perp$ and $(B\otimes\mathbb{F}_{q^m})^\perp$ are the same. Induction on $t$ is used. In case $t = 0$ there is nothing to prove: we can take $B_0 = 0$. Suppose the statement is proved for $t$. Let $B$ be a code such that $d(B^\perp) > t+1$ and suppose $q^m > \max\{\binom{n}{i} \mid 0\leq i\leq t+1\}$. By induction we may assume that there is a sequence $\{B_r \mid 0\leq r\leq t\}$ of $q^m$-ary codes such that $B_{r-1}\subseteq B_r\subseteq B\otimes\mathbb{F}_{q^m}$ and $B_r$ is an $[n, r, n-r+1]$ code for all $r$, $0\leq r\leq t$. So $B\otimes\mathbb{F}_{q^m}$ has a generator matrix $G$ with entries $g_{ij}$ for $1\leq i\leq k$ and $1\leq j\leq n$, such that the first $r$ rows of $G$ give a generator matrix $G_r$ of $B_r$. In particular the determinants of all $t\times t$ submatrices of $G_t$ are nonzero, by Proposition 3.2.5. Let $\Delta(j_1, \ldots, j_t)$ be the determinant of $G_t(j_1, \ldots, j_t)$, which is the matrix obtained from $G_t$ by taking the columns numbered by $j_1, \ldots, j_t$, where $1\leq j_1 < \cdots < j_t\leq n$. For $t < i\leq k$ and $1\leq j_1 < \cdots < j_{t+1}\leq n$ we define $\Delta(i; j_1, \ldots, j_{t+1})$ to be the determinant of the $(t+1)\times(t+1)$ submatrix of $G$ formed by taking the columns numbered by $j_1, \ldots, j_{t+1}$ and the rows numbered by $1, \ldots, t, i$. Now consider, for every $(t+1)$-tuple $j = (j_1, \ldots, j_{t+1})$ with $1\leq j_1 < \cdots < j_{t+1}\leq n$, the linear equation in the variables $X_{t+1}, \ldots, X_k$ given by
$$\sum_{s=1}^{t+1}(-1)^s\Delta(j_1, \ldots, \hat{j_s}, \ldots, j_{t+1})\sum_{i>t}g_{ij_s}X_i = 0,$$
where $(j_1, \ldots, \hat{j_s}, \ldots, j_{t+1})$ is the $t$-tuple obtained from $j$ by deleting the $s$-th element. Rewrite this equation by interchanging the order of summation as follows:
$$\sum_{i>t}\Delta(i; j)X_i = 0.$$
If for a given $j$ the coefficients $\Delta(i; j)$ are zero for all $i > t$, then all the rows of the matrix $G(j)$, which is the submatrix of $G$ consisting of the columns numbered by $j_1, \ldots, j_{t+1}$, are dependent on the first $t$ rows of $G(j)$. Thus $\mathrm{rank}(G(j))\leq t$, so $G$ has $t+1$ columns which are dependent. But $G$ is a parity check matrix for $(B\otimes\mathbb{F}_{q^m})^\perp$, therefore $d((B\otimes\mathbb{F}_{q^m})^\perp)\leq t+1$, which contradicts the assumption in the induction hypothesis. We have therefore proved that for a given $(t+1)$-tuple, at least one of the coefficients $\Delta(i; j)$ is nonzero. Therefore the above equation defines a hyperplane $H(j)$ in a vector space over $\mathbb{F}_{q^m}$ of dimension $k - t$. We assumed $q^m > \binom{n}{t+1}$, so
$$(q^m)^{k-t} > \binom{n}{t+1}(q^m)^{k-t-1}.$$
Therefore $(\mathbb{F}_{q^m})^{k-t}$ has more elements than the union of all $\binom{n}{t+1}$ hyperplanes of the form $H(j)$. Thus there exists an element $(x_{t+1}, \ldots, x_k)\in(\mathbb{F}_{q^m})^{k-t}$ which does not lie in this union. Now consider the code $B_{t+1}$, defined by the generator matrix $G_{t+1}$ with entries $g'_{lj}$, $1\leq l\leq t+1$, $1\leq j\leq n$, where
$$g'_{lj} = \begin{cases}g_{lj} & \text{if } 1\leq l\leq t,\\ \sum_{i>t}g_{ij}x_i & \text{if } l = t+1.\end{cases}$$
Then $B_{t+1}$ is a subcode of $B\otimes\mathbb{F}_{q^m}$, and for every $(t+1)$-tuple $j$, the determinant of the corresponding $(t+1)\times(t+1)$ submatrix of $G_{t+1}$ is equal to $\sum_{i>t}\Delta(i; j)x_i$, which is not zero, since $\mathbf{x}$ is not an element of $H(j)$. Thus $B_{t+1}$ is an $[n, t+1, n-t]$ code.

Corollary 4.3.25 Suppose $q^m > \max\{\binom{n}{i} \mid 1\leq i\leq d-1\}$. Let $C$ be a $q$-ary code of minimum distance $d$. Then $C$ is contained in a $q^m$-ary MDS code of the same minimum distance as $C$.

Proof. The corollary follows from Proposition 4.3.24 by taking $B = C^\perp$ and $t = d-1$. Indeed, we have $B_0\subseteq B_1\subseteq\cdots\subseteq B_{d-1}\subseteq(C\otimes\mathbb{F}_{q^m})^\perp$ for some $\mathbb{F}_{q^m}$-linear codes $B_r$, $r = 0, \ldots, d-1$, with parameters $[n, r, n-r+1]$. Applying Exercise 2.3.5 (1) we obtain $C\otimes\mathbb{F}_{q^m}\subseteq B_{d-1}^\perp$, so also $C\subseteq B_{d-1}^\perp$ holds. Now $B_{d-1}$ is an $\mathbb{F}_{q^m}$-linear MDS code, hence so is $B_{d-1}^\perp$, and it has parameters $[n, n-d+1, d]$ by Corollary 3.2.14.

4.3.3 Exercises

4.3.1 Give a proof of Remarks 4.3.10 and 4.3.14.

4.3.2 Let $C$ be the binary [7,3,4] simplex code. Give a parity check matrix of a $[7, 4, 4]$ MDS code $D$ over $\mathbb{F}_4$ that contains $C$ as a subfield subcode.

4.3.3 ....

4.4 Extended weight enumerator

***Intro***

4.4.1 Arrangements of hyperplanes

***affine/projective arrangements***

The weight spectrum can be computed by counting points in certain configurations of a set of hyperplanes.

Definition 4.4.1 Let $\mathbb{F}$ be a field. A hyperplane in $\mathbb{F}^k$ is the set of solutions in $\mathbb{F}^k$ of a given linear equation
$$a_1X_1 + \cdots + a_kX_k = b,$$
where $a_1, \ldots, a_k$ and $b$ are elements of $\mathbb{F}$ such that not all the $a_i$ are zero. The hyperplane is called homogeneous if the equation is homogeneous, that is, $b = 0$.

Remark 4.4.2 The equations $a_1X_1 + \cdots + a_kX_k = b$ and $a'_1X_1 + \cdots + a'_kX_k = b'$ define the same hyperplane if and only if $(a'_1, \ldots, a'_k, b') = \lambda(a_1, \ldots, a_k, b)$ for some nonzero $\lambda\in\mathbb{F}$.
Definition 4.4.3 An $n$-tuple $(H_1, \ldots, H_n)$ of hyperplanes in $\mathbb{F}^k$ is called an arrangement in $\mathbb{F}^k$. The arrangement is called simple if all the $n$ hyperplanes are mutually distinct. The arrangement is called central if all the hyperplanes are linear subspaces. A central arrangement is called essential if the intersection of all its hyperplanes is equal to $\{0\}$.

Remark 4.4.4 A central arrangement of hyperplanes in $\mathbb{F}^{r+1}$ gives rise to an arrangement of hyperplanes in $\mathbb{P}^r(\mathbb{F})$, since the defining equations are homogeneous. The arrangement is essential if the intersection of all its hyperplanes is empty in $\mathbb{P}^r(\mathbb{F})$. The dual notion of an arrangement in projective space is a projective system.

Definition 4.4.5 Let $G = (g_{ij})$ be a generator matrix of a nondegenerate code $C$ of dimension $k$. So $G$ has no zero columns. Let $H_j$ be the linear hyperplane in $\mathbb{F}_q^k$ with equation
$$g_{1j}X_1 + \cdots + g_{kj}X_k = 0.$$
The arrangement $(H_1, \ldots, H_n)$ associated with $G$ will be denoted by $\mathcal{A}_G$.

Remark 4.4.6 Let $G$ be a generator matrix of a code $C$. Then the rank of $G$ is equal to the number of rows of $G$. Hence the arrangement $\mathcal{A}_G$ is essential. A code $C$ is projective if and only if $d(C^\perp)\geq 3$, if and only if $\mathcal{A}_G$ is simple. Similarly as in Definition 4.3.17 on equivalent projective systems, one defines the equivalence of the dual notion, that is, of essential arrangements of hyperplanes in $\mathbb{P}^r(\mathbb{F})$. Then there is a one-to-one correspondence between generalized equivalence classes of nondegenerate $[n, k, d]$ codes over $\mathbb{F}_q$ and equivalence classes of essential arrangements of $n$ hyperplanes in $\mathbb{P}^{k-1}(\mathbb{F}_q)$, as in Proposition 4.3.18.

Example 4.4.7 Consider the matrix $G$ given by
$$G = \begin{pmatrix}1 & 0 & 0 & 0 & 1 & 1 & 1\\ 0 & 1 & 0 & 1 & 0 & 1 & 1\\ 0 & 0 & 1 & 1 & 1 & 0 & 1\end{pmatrix}.$$
Let $C$ be the code over $\mathbb{F}_q$ with generator matrix $G$. For $q = 2$ this is the simplex code $S_3(2)$. The columns of $G$ also represent the coefficients of the lines of $\mathcal{A}_G$. The projective picture of $\mathcal{A}_G$ is given in Figure 4.1.

Proposition 4.4.8 Let $C$ be a nondegenerate code with generator matrix $G$. Let $\mathbf{c}$ be the codeword $\mathbf{c} = \mathbf{x}G$ for some $\mathbf{x}\in\mathbb{F}^k$. Then
$$\mathrm{wt}(\mathbf{c}) = n - (\text{number of hyperplanes in } \mathcal{A}_G \text{ through } \mathbf{x}).$$

Proof. Now $\mathbf{c} = \mathbf{x}G$, so $c_j = g_{1j}x_1 + \cdots + g_{kj}x_k$. Hence $c_j = 0$ if and only if $\mathbf{x}$ lies on the hyperplane $H_j$. The result follows, since the weight of $\mathbf{c}$ is equal to $n$ minus the number of positions $j$ such that $c_j = 0$.

Remark 4.4.9 The number $A_w$ of codewords of weight $w$ equals the number of points that are on exactly $n-w$ of the hyperplanes in $\mathcal{A}_G$, by Proposition 4.4.8. In particular $A_n$ is equal to the number of points that is in the complement of
the union of these hyperplanes in $\mathbb{F}_q^k$. This number can be computed by the principle of inclusion/exclusion:
$$A_n = q^k - |H_1\cup\cdots\cup H_n| = q^k + \sum_{w=1}^{n}(-1)^w\sum_{i_1<\cdots<i_w}|H_{i_1}\cap\cdots\cap H_{i_w}|.$$

Figure 4.1: Arrangement of $G$ for $q$ odd and $q$ even

The following notations are introduced to find a formalism as above for the computation of the weight enumerator.

Definition 4.4.10 For a subset $J$ of $\{1, 2, \ldots, n\}$ define
$$C(J) = \{\,\mathbf{c}\in C \mid c_j = 0 \text{ for all } j\in J\,\}, \quad l(J) = \dim C(J)$$
and
$$B_J = q^{l(J)} - 1, \qquad B_t = \sum_{|J|=t}B_J.$$

Remark 4.4.11 The encoding map $\mathbf{x}\mapsto\mathbf{x}G = \mathbf{c}$ from vectors $\mathbf{x}\in\mathbb{F}_q^k$ to codewords gives the following isomorphism of vector spaces:
$$\bigcap_{j\in J}H_j \cong C(J)$$
by Proposition 4.4.8. Furthermore $B_J$ is equal to the number of nonzero codewords $\mathbf{c}$ that are zero at all $j$ in $J$, and this is equal to the number of nonzero elements of the intersection $\bigcap_{j\in J}H_j$.
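The inclusion/exclusion computation of Remark 4.4.9 can be carried out mechanically, using that an intersection $H_{i_1}\cap\cdots\cap H_{i_w}$ is a linear subspace with $q^{k-r}$ points, where $r$ is the rank of the corresponding set of columns of $G$. The following sketch (ours, for illustration) computes $A_n$ for the code of Example 4.4.7 over $\mathbb{F}_3$ in this way and compares it with a direct count.

```python
from itertools import combinations, product

q, k = 3, 3
G = [[1, 0, 0, 0, 1, 1, 1],
     [0, 1, 0, 1, 0, 1, 1],
     [0, 0, 1, 1, 1, 0, 1]]
n = len(G[0])
cols = list(zip(*G))

def rank_mod_q(vectors):
    """Gaussian elimination over F_q (q prime) on a list of length-k vectors."""
    rows, r, lead = [list(v) for v in vectors], 0, 0
    while r < len(rows) and lead < k:
        piv = next((i for i in range(r, len(rows)) if rows[i][lead] % q), None)
        if piv is None:
            lead += 1
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        inv = pow(rows[r][lead], q - 2, q)  # inverse via Fermat, q prime
        rows[r] = [x * inv % q for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][lead] % q:
                f = rows[i][lead]
                rows[i] = [(a - f * b) % q for a, b in zip(rows[i], rows[r])]
        r, lead = r + 1, lead + 1
    return r

# |H_{i1} ∩ ... ∩ H_{iw}| = q^(k - rank), so inclusion/exclusion gives A_n:
A_n = q ** k + sum((-1) ** w * q ** (k - rank_mod_q(J))
                   for w in range(1, n + 1) for J in combinations(cols, w))

# Direct count of codewords of full weight n:
direct = sum(all(sum(x[i] * G[i][j] for i in range(k)) % q for j in range(n))
             for x in product(range(q), repeat=k))
print(A_n, direct)  # the two counts agree
```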
The following two lemmas about the determination of $l(J)$ will become useful later.

Lemma 4.4.12 Let $C$ be a linear code with generator matrix $G$. Let $J\subseteq\{1, \ldots, n\}$ and $|J| = t$. Let $G_J$ be the $k\times t$ submatrix of $G$ consisting of the columns of $G$ indexed by $J$, and let $r(J)$ be the rank of $G_J$. Then the dimension $l(J)$ is equal to $k - r(J)$.

Proof. The code $C_J$ is defined in Section 3.1.2 by restricting the codewords of $C$ to $J$. Then $G_J$ is a generator matrix of $C_J$ by Remark 3.1.3. Consider the projection map $\pi_J: C\to\mathbb{F}_q^t$ given by $\pi_J(\mathbf{c}) = \mathbf{c}_J$. Then $\pi_J$ is a linear map. The image of $C$ under $\pi_J$ is $C_J$ and the kernel of $\pi_J$ is $C(J)$, by definition. It follows that $\dim C_J + \dim C(J) = \dim C$. So $l(J) = k - r(J)$.

Lemma 4.4.13 Let $k$ be the dimension of $C$. Let $d$ and $d^\perp$ be the minimum distances of the code $C$ and its dual code, respectively. Then
$$l(J) = \begin{cases}k - t & \text{for all } t < d^\perp,\\ 0 & \text{for all } t > n - d.\end{cases}$$
Furthermore
$$k - t \leq l(J) \leq \begin{cases}k - d^\perp + 1 & \text{for all } t\geq d^\perp,\\ n - d - t + 1 & \text{for all } t\leq n - d.\end{cases}$$

Proof. (1) Let $t > n-d$, let $J$ be a subset of $\{1, \ldots, n\}$ of size $t$ and let $\mathbf{c}$ be a codeword such that $\mathbf{c}\in C(J)$. Then $J$ is contained in the complement of the support of $\mathbf{c}$. Hence $t\leq n - \mathrm{wt}(\mathbf{c})$, so $\mathrm{wt}(\mathbf{c})\leq n - t < d$. So $\mathbf{c} = \mathbf{0}$. Therefore $C(J) = 0$ and $l(J) = 0$.
(2) Let $J$ be a $t$-subset of $\{1, \ldots, n\}$. Then $C(J)$ is defined by $t$ homogeneous linear equations on the vector space $C$ of dimension $k$. So $l(J)\geq k - t$.
(3) The matrix $G$ is a parity check matrix for the dual code, by (2) of Corollary 2.3.29. Now suppose that $t < d^\perp$. Then any $t$ columns of $G$ are independent, by Proposition 2.3.11. So $l(J) = k - t$ for all $t$-subsets $J$ of $\{1, \ldots, n\}$, by Lemma 4.4.12.
(4) Assume that $t\leq n-d$. Let $J$ be a $t$-subset. Let $t' = n - d + 1$. Choose a $t'$-subset $J'$ such that $J\subseteq J'$. Then
$$C(J') = \{\,\mathbf{c}\in C(J) \mid c_j = 0 \text{ for all } j\in J'\setminus J\,\}.$$
Now $l(J') = 0$ by (1). Hence $C(J') = 0$, and $C(J')$ is obtained from $C(J)$ by imposing $|J'\setminus J| = n - d - t + 1$ linear homogeneous equations. Hence $l(J) = \dim C(J)\leq n - d - t + 1$.
(5) Assume that $d^\perp\leq t$. Let $J$ be a $t$-subset. Let $t' = d^\perp - 1$. Choose a $t'$-subset $J'$ such that $J'\subseteq J$. Then $l(J') = k - d^\perp + 1$ by (3) and $l(J)\leq l(J')$, since $J'\subseteq J$. Hence $l(J)\leq k - d^\perp + 1$.

Remark 4.4.14 Notice that $d^\perp\leq n - (n-k) + 1$ and $n - d\leq k - 1$ by the Singleton bound. So for $t = k$ both cases of Lemma 4.4.13 apply and both give $l(J) = 0$.

Proposition 4.4.15 Let $k$ be the dimension of $C$. Let $d$ and $d^\perp$ be the minimum distances of the code $C$ and its dual code, respectively. Then
$$B_t = \begin{cases}\binom{n}{t}\left(q^{k-t} - 1\right) & \text{for all } t < d^\perp,\\ 0 & \text{for all } t > n - d.\end{cases}$$
Furthermore
$$\binom{n}{t}\left(q^{k-t} - 1\right)\leq B_t\leq\binom{n}{t}\left(q^{\min\{n-d-t+1,\,k-d^\perp+1\}} - 1\right) \quad\text{for all } d^\perp\leq t\leq n-d.$$
Proof. This is a direct consequence of Lemma 4.4.13 and the definition of $B_t$.

Proposition 4.4.16 The following formula holds:
$$B_t = \sum_{w=d}^{n-t}\binom{n-w}{t}A_w.$$

Proof. This is shown by computing the number of elements of the set of pairs
$$\mathcal{B}_t = \{\,(J, \mathbf{c}) \mid J\subseteq\{1, 2, \ldots, n\},\ |J| = t,\ \mathbf{c}\in C(J),\ \mathbf{c}\neq\mathbf{0}\,\}$$
in two different ways, as in Lemma 4.1.19. For fixed $J$, the number of these pairs is equal to $B_J$, by definition. If we fix the weight $w$ of a nonzero codeword $\mathbf{c}$ in $C$, then the number of zero entries of $\mathbf{c}$ is $n-w$, and if $\mathbf{c}\in C(J)$, then $J$ is contained in the complement of the support of $\mathbf{c}$, and there are $\binom{n-w}{t}$ possible choices for such a $J$. In this way we get the right-hand side of the formula.

Theorem 4.4.17 The homogeneous weight enumerator of $C$ can be expressed in terms of the $B_t$ as follows:
$$W_C(X, Y) = X^n + \sum_{t=0}^{n}B_t(X-Y)^tY^{n-t}.$$

Proof. Now
$$X^n + \sum_{t=0}^{n}B_t(X-Y)^tY^{n-t} = X^n + \sum_{t=0}^{n-d}B_t(X-Y)^tY^{n-t},$$
since $B_t = 0$ for all $t > n-d$ by Proposition 4.4.15. Substituting the formula for $B_t$ of Proposition 4.4.16, interchanging the order of summation in the double sum and applying the binomial expansion of $((X-Y)+Y)^{n-w}$ gives that the above formula is equal to
$$X^n + \sum_{t=0}^{n-d}\sum_{w=d}^{n-t}\binom{n-w}{t}A_w(X-Y)^tY^{n-t} = X^n + \sum_{w=d}^{n}A_w\left(\sum_{t=0}^{n-w}\binom{n-w}{t}(X-Y)^tY^{n-w-t}\right)Y^w$$
$$= X^n + \sum_{w=d}^{n}A_wX^{n-w}Y^w = W_C(X, Y).$$
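Lemma 4.4.12 and Theorem 4.4.17 together give an algorithm for the weight enumerator: compute $l(J) = k - r(J)$ for all $J$, sum to get the $B_t$, and expand. The following sketch (ours, for illustration) does this for the [7,4,3] Hamming code; the resulting $B_t$ are the ones derived in Example 4.4.19 below.

```python
from itertools import combinations
from sympy import symbols, expand

X, Y = symbols("X Y")
q, k = 2, 4
# Columns of a [7,4,3] Hamming generator matrix, stored as 4-bit masks.
cols = [0b1000, 0b0100, 0b0010, 0b0001, 0b0111, 0b1011, 0b1101]
n = len(cols)

def rank2(vectors):
    """Rank over F_2 of a set of bit-mask vectors (elimination by leading bit)."""
    pivots = {}
    for v in vectors:
        while v:
            h = v.bit_length() - 1
            if h not in pivots:
                pivots[h] = v
                break
            v ^= pivots[h]
    return len(pivots)

# B_t = sum over all t-subsets J of q^{l(J)} - 1, with l(J) = k - rank(G_J).
B = [sum(q ** (k - rank2(J)) - 1 for J in combinations(cols, t)) for t in range(n + 1)]
print(B)  # [15, 49, 63, 35, 7, 0, 0, 0], cf. Example 4.4.19 below

W = expand(X ** n + sum(B[t] * (X - Y) ** t * Y ** (n - t) for t in range(n + 1)))
print(W)  # X**7 + 7*X**4*Y**3 + 7*X**3*Y**4 + Y**7
```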
Proposition 4.4.18 Let $A_0, \ldots, A_n$ be the weight spectrum of a code of minimum distance $d$. Then $A_0 = 1$, $A_w = 0$ if $0 < w < d$ and
$$A_w = \sum_{t=n-w}^{n-d}(-1)^{n+w+t}\binom{t}{n-w}B_t \quad\text{if } d\leq w\leq n.$$

Proof. This identity is proved by inverting the argument of the proof of the formula of Theorem 4.4.17 and using the binomial expansion of $(X-Y)^t$. This is left as an exercise. An alternative proof is given by the principle of inclusion/exclusion. A third proof can be obtained by using Proposition 4.4.16. A fourth proof is obtained by showing that the transformation of the $B_t$'s into the $A_w$'s and vice versa are given by the linear maps of Propositions 4.4.16 and 4.4.18 that are each other's inverse. See Exercise 4.4.5.

Example 4.4.19 Consider the [7, 4, 3] Hamming code as in Examples 2.2.14 and ??. Then its dual is the [7, 3, 4] simplex code. Hence $d = 3$ and $d^\perp = 4$. So $B_t = \binom{7}{t}(2^{4-t} - 1)$ for all $t < 4$ and $B_t = 0$ for all $t > 4$ by Proposition 4.4.15. Of the 35 subsets $J$ of size 4 there are exactly 7 of them with $l(J) = 1$, and $l(J) = 0$ for the 28 remaining subsets, by Exercise 2.3.4. Therefore $B_4 = 7(2^1 - 1) = 7$. To find the $A_w$ we apply Proposition 4.4.18:
$$B_0 = 15,\quad B_1 = 49,\quad B_2 = 63,\quad B_3 = 35,\quad B_4 = 7,$$
$$A_3 = B_4 = 7,$$
$$A_4 = B_3 - 4B_4 = 7,$$
$$A_5 = B_2 - 3B_3 + 6B_4 = 0,$$
$$A_6 = B_1 - 2B_2 + 3B_3 - 4B_4 = 0,$$
$$A_7 = B_0 - B_1 + B_2 - B_3 + B_4 = 1.$$
This is in agreement with Example 4.1.6.

4.4.2 Weight distribution of MDS codes

***

Definition 4.4.20 Let $C$ be a code of length $n$, minimum distance $d$ and dual minimum distance $d^\perp$. The genus of $C$ is defined by
$$g(C) = \max\{n + 1 - k - d,\ k + 1 - d^\perp\}.$$

***Transfer to end of 3.2.1***
***diagram of (un)known values of Bt(T).***

Remark 4.4.21 The $B_t$ are known as functions of the parameters $[n, k]_q$ of the code for all $t < d^\perp$ and for all $t > n-d$. So the $B_t$ is unknown for the $n - d - d^\perp + 1$ values of $t$ such that $d^\perp\leq t\leq n-d$. In particular, the weight enumerator of an MDS code is completely determined by the parameters $[n, k]_q$ of the code.

Proposition 4.4.22 The weight distribution of an MDS code of length $n$ and dimension $k$ is given by
$$A_w = \binom{n}{w}\sum_{j=0}^{w-d}(-1)^j\binom{w}{j}\left(q^{w-d+1-j} - 1\right)$$
for $w\geq d = n-k+1$.

Proof. Let $C$ be an $[n, k, n-k+1]$ MDS code. Then its dual is also an MDS code, with parameters $[n, n-k, k+1]$, by Proposition 3.2.7. Then $B_t = \binom{n}{t}\left(q^{k-t} - 1\right)$
4.4.2 Weight distribution of MDS codes

***

Definition 4.4.20 Let $C$ be a code of length $n$, minimum distance $d$ and dual minimum distance $d^\perp$. The genus of $C$ is defined by $g(C) = \max\{n+1-k-d,\ k+1-d^\perp\}$.

***Transfer to end of 3.2.1***
***diagram of (un)known values of Bt(T)***

Remark 4.4.21 The $B_t$ are known as functions of the parameters $[n,k]_q$ of the code for all $t < d^\perp$ and for all $t > n-d$. So $B_t$ is unknown only for the $n - d - d^\perp + 1$ values of $t$ with $d^\perp \le t \le n-d$. In particular the weight enumerator of an MDS code is completely determined by the parameters $[n,k]_q$ of the code.

Proposition 4.4.22 The weight distribution of an MDS code of length $n$ and dimension $k$ is given by
$$A_w = \binom{n}{w} \sum_{j=0}^{w-d} (-1)^j \binom{w}{j} \left( q^{w-d+1-j} - 1 \right)$$
for $w \ge d = n-k+1$.

Proof. Let $C$ be an $[n, k, n-k+1]$ MDS code. Then its dual is also an MDS code, with parameters $[n, n-k, k+1]$ by Proposition 3.2.7. So $B_t = \binom{n}{t}(q^{k-t}-1)$ for all $t < d^\perp = k+1$ and $B_t = 0$ for all $t > n-d = k-1$, by Proposition 4.4.15. Hence
$$A_w = \sum_{t=n-w}^{n-d} (-1)^{n+w+t} \binom{t}{n-w} \binom{n}{t} \left( q^{k-t} - 1 \right)$$
by Proposition 4.4.18. Make the substitution $j = t-n+w$. Then the summation runs from $j = 0$ to $j = w-d$. Furthermore
$$\binom{t}{n-w}\binom{n}{t} = \binom{n}{w}\binom{w}{j}.$$
This gives the formula for $A_w$.

Remark 4.4.23 Let $C$ be an $[n, k, n-k+1]$ MDS code. Then the number of nonzero codewords of minimal weight is
$$A_d = \binom{n}{d}(q-1)$$
according to Proposition 4.4.22. This is in agreement with Remark 3.2.15.
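Since the weight distribution of an MDS code depends only on $n$, $k$ and $q$, it can be tabulated directly. A minimal Python sketch of Proposition 4.4.22 (our own helper, using a hypothetical function name):

```python
from math import comb

def mds_weight_distribution(n, k, q):
    # Proposition 4.4.22:
    # A_w = C(n,w) * sum_{j=0}^{w-d} (-1)^j C(w,j) (q^(w-d+1-j) - 1), d = n-k+1
    d = n - k + 1
    return {w: comb(n, w) * sum((-1) ** j * comb(w, j) * (q ** (w - d + 1 - j) - 1)
                                for j in range(w - d + 1))
            for w in range(d, n + 1)}

# a [6,3,4] MDS code over F_9 has weight distribution A_4, A_5, A_6 = 120, 240, 368
print(mds_weight_distribution(6, 3, 9))
# sanity check against Remark 4.4.23: A_d = C(n,d)(q-1)
print(mds_weight_distribution(6, 3, 9)[4] == comb(6, 4) * 8)   # True
```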
Remark 4.4.24 The trivial codes with parameters $[n,n,1]$ and $[n,0,n+1]$, and the repetition code and its dual with parameters $[n,1,n]$ and $[n,n-1,2]$, are MDS codes of arbitrary length. But the length is bounded if $2 \le k$, by the following proposition.

Proposition 4.4.25 Let $C$ be an MDS code over $\mathbb{F}_q$ of length $n$ and dimension $k$. If $k \ge 2$, then $n \le q+k-1$.

Proof. Let $C$ be an $[n, k, n-k+1]$ code with $2 \le k$. Then $d+1 = n-k+2 \le n$ and
$$A_{d+1} = \binom{n}{d+1}\left( (q^2 - 1) - (d+1)(q-1) \right) = \binom{n}{d+1}(q-1)(q-d)$$
by Proposition 4.4.22. This implies $d \le q$, since $A_{d+1} \ge 0$. Now $n = d+k-1 \le q+k-1$.

Remark 4.4.26 Proposition 4.4.25 also holds for nonlinear codes. That is: if there exists an $(n, q^k, n-k+1)$ code with $k \ge 2$, then $d = n-k+1 \le q$. This is proved by means of orthogonal arrays by Bush, as we will see in Section 5.5.1.

Corollary 4.4.27 (Bush bound) Let $C$ be an MDS code over $\mathbb{F}_q$ of length $n$ and dimension $k$. If $k \ge q$, then $n \le k+1$.

Proof. If $n > k+1$, then $C^\perp$ is an MDS code of dimension $n-k \ge 2$. Hence $n \le q + (n-k) - 1$ by Proposition 4.4.25, and therefore $k < q$, contradicting the assumption $k \ge q$.

Remark 4.4.28 The length of the repetition code is arbitrarily long. The length $n$ of a $q$-ary MDS code of dimension $k$ is at most $q+k-1$ if $2 \le k$, by Proposition 4.4.25. In particular the maximal length of an MDS code is a function of $k$ and $q$ if $k \ge 2$.

Definition 4.4.29 Let $k \ge 2$. Let $m(k,q)$ be the maximal length of an MDS code over $\mathbb{F}_q$ of dimension $k$.

Remark 4.4.30 So $m(k,q) \le k+q-1$ if $2 \le k$, and $m(k,q) \le k+1$ if $k \ge q$ by the Bush bound. We have seen in Proposition 3.2.10 that $m(k,q)$ is at least $q+1$ for all $k$ and $q$. Let $C$ be an $[n, 2, n-1]$ code. Then $C$ is systematic at the first two positions, so we may assume that its generator matrix $G$ is of the form
$$G = \begin{pmatrix} 1 & 0 & x_3 & x_4 & \cdots & x_n \\ 0 & 1 & y_3 & y_4 & \cdots & y_n \end{pmatrix}.$$
The weight of every codeword is at least $n-1$. Hence $x_j \neq 0$ and $y_j \neq 0$ for all $3 \le j \le n$. The code is generalized equivalent with a code with $x_j = 1$, after dividing the $j$-th coordinate by $x_j$ for $j \ge 3$. Let $g_i$ be the $i$-th row of $G$. If $3 \le j < l$ and $y_j = y_l$, then $g_2 - y_j g_1$ is a codeword of weight at most $n-2$, which is a contradiction. So the $y_j$ are $n-2$ mutually distinct nonzero elements, and $n-2 \le q-1$. Therefore $m(2,q) = q+1$. Dually we get $m(q-1,q) = q+1$. In case $q$ is even, $m(3,q)$ is at least $q+2$ by Example 3.2.12, and dually $m(q-1,q) \ge q+2$. Later it will be shown in Proposition 13.5.1 that these values are in fact optimal.

Remark 4.4.31 The MDS conjecture states that for a nontrivial $[n, k, n-k+1]$ MDS code over $\mathbb{F}_q$ we have $n \le q+2$ if $q$ is even and $k = 3$ or $k = q-1$, and $n \le q+1$ in all other cases. So it is conjectured that
$$m(k,q) = \begin{cases} q+1 & \text{if } 2 \le k \le q, \\ k+1 & \text{if } q < k, \end{cases}$$
except when $q$ is even and $k = 3$ or $k = q-1$, in which case $m(3,q) = m(q-1,q) = q+2$.

4.4.3 Extended weight enumerator

Definition 4.4.32 Let $\mathbb{F}_{q^m}$ be the extension field of $\mathbb{F}_q$ of degree $m$. Let $C$ be an $\mathbb{F}_q$-linear code of length $n$. The extension by scalars of $C$ to $\mathbb{F}_{q^m}$ is the $\mathbb{F}_{q^m}$-linear subspace of $\mathbb{F}_{q^m}^n$ generated by $C$, and will be denoted by $C \otimes \mathbb{F}_{q^m}$.

Remark 4.4.33 Let $G$ be a generator matrix of the code $C$ of length $n$ over $\mathbb{F}_q$. Then $G$ is also a generator matrix of the $\mathbb{F}_{q^m}$-linear code $C \otimes \mathbb{F}_{q^m}$. The dimension $l(J)$ is equal to $k - r(J)$ by Lemma 4.4.12, where $r(J)$ is the rank of the $k \times t$ submatrix $G_J$ of $G$ consisting of the $t$ columns indexed by $J$. This rank is equal to the number of pivots of $G_J$, so it does not change under the extension from $\mathbb{F}_q$ to $\mathbb{F}_{q^m}$. So
$$\dim_{\mathbb{F}_{q^m}} (C \otimes \mathbb{F}_{q^m})(J) = \dim_{\mathbb{F}_q} C(J).$$
Hence the numbers $B_J(q^m)$ and $B_t(q^m)$ of the code $C \otimes \mathbb{F}_{q^m}$ are given by
$$B_J(q^m) = q^{m \cdot l(J)} - 1 \quad \text{and} \quad B_t(q^m) = \sum_{|J|=t} B_J(q^m).$$
This motivates treating $q^m$ as a variable in the following definitions.
Definition 4.4.34 Let $C$ be an $\mathbb{F}_q$-linear code of length $n$. Define
$$B_J(T) = T^{l(J)} - 1 \quad \text{and} \quad B_t(T) = \sum_{|J|=t} B_J(T).$$
The extended weight enumerator is defined by
$$W_C(X, Y, T) = X^n + \sum_{t=0}^{n-d} B_t(T) (X-Y)^t Y^{n-t}.$$

Proposition 4.4.35 Let $d$ and $d^\perp$ be the minimum distances of the code and the dual code, respectively. Then
$$B_t(T) = \begin{cases} \binom{n}{t}\left( T^{k-t} - 1 \right) & \text{for all } t < d^\perp, \\ 0 & \text{for all } t > n-d. \end{cases}$$

Proof. This is a direct consequence of Lemma 4.4.13 and the definition of $B_t$.

Theorem 4.4.36 The extended weight enumerator of a linear code of length $n$ and minimum distance $d$ can be expressed as a homogeneous polynomial in $X$ and $Y$ of degree $n$ with coefficients $A_w(T)$ that are integral polynomials in $T$:
$$W_C(X, Y, T) = \sum_{w=0}^{n} A_w(T) X^{n-w} Y^w,$$
where $A_0(T) = 1$, $A_w(T) = 0$ if $0 < w < d$, and
$$A_w(T) = \sum_{t=n-w}^{n-d} (-1)^{n+w+t} \binom{t}{n-w} B_t(T) \quad \text{if } d \le w \le n.$$

Proof. The proof is similar to the proof of Proposition 4.4.18 and is left as an exercise.

Remark 4.4.37 The definition of $A_w(T)$ is consistent with the fact that $A_w(q^m)$ is the number of codewords of weight $w$ in $C \otimes \mathbb{F}_{q^m}$, and
$$W_C(X, Y, q^m) = \sum_{w=0}^{n} A_w(q^m) X^{n-w} Y^w = W_{C \otimes \mathbb{F}_{q^m}}(X, Y)$$
by Proposition 4.4.18 and Theorem 4.4.36.

Proposition 4.4.38 The following formula holds:
$$B_t(T) = \sum_{w=d}^{n-t} \binom{n-w}{t} A_w(T).$$

Proof. This is left as an exercise.
Remark 4.4.39 Using Theorem 4.4.36 it is immediate to find the weight distribution of a code over any extension $\mathbb{F}_{q^m}$, once one knows the $l(J)$ over the ground field $\mathbb{F}_q$ for all subsets $J$ of $\{1, \ldots, n\}$. Computing $C(J)$ and $l(J)$ for a fixed $J$ is just linear algebra. The large complexity of computing the weight enumerator and the minimum distance in this way stems from the exponential growth of the number of subsets of $\{1, \ldots, n\}$.

Example 4.4.40 Consider the [7,4,3] Hamming code as in Example 4.4.19, but now over all extensions of the binary field. Then $B_t(T) = \binom{7}{t}(T^{4-t}-1)$ for all $t < 4$ and $B_t(T) = 0$ for all $t > 4$ by Proposition 4.4.35, and $B_4(T) = 7(T-1)$. To find the $A_w(T)$ we apply Theorem 4.4.36.
$$A_3(T) = B_4(T) = 7(T-1)$$
$$A_4(T) = B_3(T) - 4B_4(T) = 7(T-1)$$
$$A_5(T) = B_2(T) - 3B_3(T) + 6B_4(T) = 21(T-1)(T-2)$$
$$A_6(T) = B_1(T) - 2B_2(T) + 3B_3(T) - 4B_4(T) = 7(T-1)(T-2)(T-3)$$
Hence
$$A_7(T) = B_0(T) - B_1(T) + B_2(T) - B_3(T) + B_4(T) = T^4 - 7T^3 + 21T^2 - 28T + 13.$$
***factorize, example 4.1.8***
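The computation of Example 4.4.40 can be replayed symbolically. The following sketch uses the sympy library (the variable and function names are ours); its final line also factors $A_7(T)$, which addresses the factorization question left open above, under the assumption that sympy's factorization over the rationals is what is wanted:

```python
from sympy import symbols, binomial, expand, factor

T = symbols('T')
n, k, d = 7, 4, 3

# B_t(T) of the [7,4,3] Hamming code: C(7,t)(T^(4-t)-1) for t < 4, B_4(T) = 7(T-1)
B = {t: binomial(7, t) * (T ** (4 - t) - 1) for t in range(4)}
B[4] = 7 * (T - 1)

def A(w):
    # Theorem 4.4.36 applied to the B_t(T) above
    return expand(sum((-1) ** (n + w + t) * binomial(t, n - w) * B[t]
                      for t in range(n - w, n - d + 1)))

for w in range(d, n + 1):
    print(w, factor(A(w)))
# 3: 7*(T - 1)          4: 7*(T - 1)
# 5: 21*(T - 2)*(T - 1) 6: 7*(T - 3)*(T - 2)*(T - 1)
# 7: (T - 1)*(T**3 - 6*T**2 + 15*T - 13)
```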
The following description of the extended weight enumerator of a code will be useful.

Proposition 4.4.41 The extended weight enumerator of a code of length $n$ can be written as
$$W_C(X, Y, T) = \sum_{t=0}^{n} \sum_{|J|=t} T^{l(J)} (X-Y)^t Y^{n-t}.$$

Proof. By rewriting $((X-Y)+Y)^n$, we get
$$\sum_{t=0}^{n} \sum_{|J|=t} T^{l(J)} (X-Y)^t Y^{n-t} = \sum_{t=0}^{n} (X-Y)^t Y^{n-t} \sum_{|J|=t} \left( (T^{l(J)} - 1) + 1 \right)$$
$$= \sum_{t=0}^{n} (X-Y)^t Y^{n-t} \left( \binom{n}{t} + \sum_{|J|=t} (T^{l(J)} - 1) \right)$$
$$= \sum_{t=0}^{n} \binom{n}{t} (X-Y)^t Y^{n-t} + \sum_{t=0}^{n} B_t(T) (X-Y)^t Y^{n-t}$$
$$= X^n + \sum_{t=0}^{n} B_t(T) (X-Y)^t Y^{n-t} = W_C(X, Y, T).$$

***Examples, repetition code, Hamming, simplex, Golay, MDS code***
***MacWilliams identity***

4.4.4 Puncturing and shortening

There are several ways to get new codes from existing ones. In this section we focus on puncturing and shortening of codes, and show how they are used in an alternative algorithm for finding the extended weight enumerator. The algorithm is based on the Tutte-Grothendieck decomposition of matrices introduced by Brylawski [31]. Greene [59] used this decomposition for the determination of the weight enumerator.

Let $C$ be a linear $[n,k]$ code and let $J \subseteq \{1, \ldots, n\}$. Then the code $C$ punctured by $J$ is obtained by deleting all the coordinates indexed by $J$ from the codewords of $C$. The length of this punctured code is $n - |J|$ and the dimension is at most $k$.

Let $C$ be a linear $[n,k]$ code and let $J \subseteq \{1, \ldots, n\}$. If we puncture the code $C(J)$ by $J$, we get the code $C$ shortened by $J$. The length of this shortened code is $n - |J|$ and the dimension is $l(J)$.

The operations of puncturing and shortening a code are each other's duals: puncturing a code $C$ by $J$ and then taking the dual gives the same code as shortening $C^\perp$ by $J$.

We have seen that we can determine the extended weight enumerator of an $[n,k]$ code $C$ with the use of a $k \times n$ generator matrix of $C$. This concept can be generalized to arbitrary matrices, not necessarily of full rank.

Definition 4.4.42 Let $\mathbb{F}$ be a field. Let $G$ be a $k \times n$ matrix over $\mathbb{F}$, possibly of rank smaller than $k$ and possibly with zero columns. Then for each $J \subseteq \{1, \ldots, n\}$ we define
$$l(J) = l(J, G) = k - r(G_J),$$
as in Lemma 7.4.37. Define the extended weight enumerator $W_G(X, Y, T)$ as in Definition 4.4.34.
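Definition 4.4.42 together with Proposition 4.4.41 gives a direct, if exponential, algorithm: compute $l(J)$ for every subset $J$ and sum. A minimal Python sketch, assuming $q$ is prime so that arithmetic modulo $q$ realizes $\mathbb{F}_q$ (the generator matrix below is one standard choice for the [7,4,3] Hamming code and may differ from the one of Example 2.2.14 up to equivalence):

```python
from itertools import combinations
from sympy import symbols, expand

X, Y, T = symbols('X Y T')

def rank_mod_p(rows, p):
    """Rank of a matrix over the prime field F_p by Gaussian elimination."""
    m = [[x % p for x in row] for row in rows]
    rank, col = 0, 0
    while m and rank < len(m) and col < len(m[0]):
        piv = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if piv is None:
            col += 1
            continue
        m[rank], m[piv] = m[piv], m[rank]
        inv = pow(m[rank][col], -1, p)
        m[rank] = [x * inv % p for x in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[rank])]
        rank, col = rank + 1, col + 1
    return rank

def extended_weight_enumerator(G, p):
    # Proposition 4.4.41 / Definition 4.4.42: sum of T^l(J) (X-Y)^|J| Y^(n-|J|)
    k, n = len(G), len(G[0])
    W = 0
    for t in range(n + 1):
        for J in combinations(range(n), t):
            GJ = [[row[j] for j in J] for row in G]
            W += T ** (k - rank_mod_p(GJ, p)) * (X - Y) ** t * Y ** (n - t)
    return expand(W)

G = [[1,0,0,0,0,1,1], [0,1,0,0,1,0,1], [0,0,1,0,1,1,0], [0,0,0,1,1,1,1]]
print(extended_weight_enumerator(G, 2))   # matches Example 4.4.40
```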
We can now make the following remarks about $W_G(X, Y, T)$.

Proposition 4.4.43 Let $G$ be a $k \times n$ matrix over $\mathbb{F}$ and $W_G(X, Y, T)$ the associated extended weight enumerator. Then the following statements hold:
(i) $W_G(X, Y, T)$ is invariant under row equivalence of matrices.
(ii) Let $G'$ be an $l \times n$ matrix with the same row space as $G$; then $W_G(X, Y, T) = T^{k-l} W_{G'}(X, Y, T)$. In particular, if $G$ is a generator matrix of an $[n,k]$ code $C$, we have $W_G(X, Y, T) = W_C(X, Y, T)$.
(iii) $W_G(X, Y, T)$ is invariant under permutation of the columns of $G$.
(iv) $W_G(X, Y, T)$ is invariant under multiplying a column of $G$ by an element of $\mathbb{F}^*$.
(v) If $G$ is the direct sum of $G_1$ and $G_2$, i.e. of the form
$$\begin{pmatrix} G_1 & 0 \\ 0 & G_2 \end{pmatrix},$$
then $W_G(X, Y, T) = W_{G_1}(X, Y, T) \cdot W_{G_2}(X, Y, T)$.

Proof. (i) If we multiply $G$ from the left by an invertible $k \times k$ matrix, the $r(J)$ do not change, and therefore (i) holds.
For (ii), we may assume without loss of generality that $k \ge l$. Because $G$ and $G'$ have the same row space, the ranks $r(G_J)$ and $r(G'_J)$ are the same. So $l(J, G) = k - l + l(J, G')$. Using Proposition 4.4.41 we have
$$W_G(X, Y, T) = \sum_{t=0}^{n} \sum_{|J|=t} T^{l(J,G)} (X-Y)^t Y^{n-t} = \sum_{t=0}^{n} \sum_{|J|=t} T^{k-l+l(J,G')} (X-Y)^t Y^{n-t} = T^{k-l} \sum_{t=0}^{n} \sum_{|J|=t} T^{l(J,G')} (X-Y)^t Y^{n-t} = T^{k-l} W_{G'}(X, Y, T).$$
The last part of (ii) and (iii)-(v) follow directly from the definitions.

With the use of the extended weight enumerator for general matrices, we can derive a recursive algorithm to determine the extended weight enumerator of a code. Let $G$ be a $k \times n$ matrix with entries in $\mathbb{F}$. Suppose that the $j$-th column is not the zero vector. Then there exists a matrix row equivalent to $G$ such that the $j$-th column is of the form $(1, 0, \ldots, 0)^T$. Such a matrix is called reduced at the $j$-th column. In general, this reduction is not unique.

Let $G$ be a matrix that is reduced at the $j$-th column $a$. The matrix $G \setminus a$ is the $k \times (n-1)$ matrix obtained from $G$ by removing the column $a$, and $G/a$ is the $(k-1) \times (n-1)$ matrix obtained from $G$ by removing the column $a$ and the first row. We can view $G \setminus a$ as $G$ punctured by $a$, and $G/a$ as $G$ shortened by $a$. For the extended weight enumerators of these matrices, we have the following connection (we omit the $(X, Y, T)$ part for clarity):

Proposition 4.4.44 Let $G$ be a $k \times n$ matrix that is reduced at the $j$-th column $a$. Then
$$W_G = (X - Y) W_{G/a} + Y W_{G \setminus a}.$$

Proof. We distinguish between two cases. First, assume that $G \setminus a$ and $G/a$ have the same rank. Then we can choose a $G$ with all zeros in the first row, except for the 1 in the column $a$. So $G$ is the direct sum of $1$ and $G/a$. By Proposition 4.4.43, parts (v) and (ii), we have $W_G = (X + (T-1)Y) W_{G/a}$ and $W_{G \setminus a} = T W_{G/a}$. Combining the two gives
$$W_G = (X + (T-1)Y) W_{G/a} = (X-Y) W_{G/a} + YT\, W_{G/a} = (X-Y) W_{G/a} + Y W_{G \setminus a}.$$
For the second case, assume that $G \setminus a$ and $G/a$ do not have the same rank. So $r(G \setminus a) = r(G/a) + 1$. This implies that $G$ and $G \setminus a$ have the same rank. We have
$$W_G(X, Y, T) = \sum_{t=0}^{n} \sum_{|J|=t} T^{l(J,G)} (X-Y)^t Y^{n-t}$$
by Proposition 4.4.41. This double sum splits into two parts, by distinguishing between the cases $j \in J$ and $j \notin J$.
Let $j \in J$, $t = |J|$, $J' = J \setminus \{j\}$ and $t' = |J'| = t - 1$. Then $l(J', G/a) = k - 1 - r((G/a)_{J'}) = k - r(G_J) = l(J, G)$. So the first part is equal to
$$\sum_{t=0}^{n} \sum_{\substack{|J|=t \\ j \in J}} T^{l(J,G)} (X-Y)^t Y^{n-t} = \sum_{t'=0}^{n-1} \sum_{|J'|=t'} T^{l(J',G/a)} (X-Y)^{t'+1} Y^{n-1-t'},$$
which is equal to $(X-Y) W_{G/a}$.
Let $j \notin J$. Then $(G \setminus a)_J = G_J$. So $l(J, G \setminus a) = l(J, G)$. Hence the second part is equal to
$$\sum_{t=0}^{n} \sum_{\substack{|J|=t \\ j \notin J}} T^{l(J,G)} (X-Y)^t Y^{n-t} = Y \sum_{t=0}^{n-1} \sum_{\substack{|J|=t \\ j \notin J}} T^{l(J,G \setminus a)} (X-Y)^t Y^{n-1-t},$$
which is equal to $Y W_{G \setminus a}$.

Theorem 4.4.45 Let $G$ be a $k \times n$ matrix over $\mathbb{F}$ with $n > k$ of the form $G = (I_k \mid P)$, where $P$ is a $k \times (n-k)$ matrix over $\mathbb{F}$. Let $A \subseteq [k]$ and write $P_A$ for the matrix formed by the rows of $P$ indexed by $A$. Let $W_A(X, Y, T) = W_{P_A}(X, Y, T)$. Then the following holds:
$$W_C(X, Y, T) = \sum_{l=0}^{k} \sum_{|A|=l} Y^l (X-Y)^{k-l} W_A(X, Y, T).$$

Proof. We use the formula of the last proposition recursively. We denote the construction of $G \setminus a$ by $G_1$ and the construction of $G/a$ by $G_2$. Repeating this procedure, we get the matrices $G_{11}$, $G_{12}$, $G_{21}$ and $G_{22}$. So we get for the weight enumerator
$$W_G = Y^2 W_{G_{11}} + Y(X-Y) W_{G_{12}} + Y(X-Y) W_{G_{21}} + (X-Y)^2 W_{G_{22}}.$$
Repeating this procedure $k$ times, we get $2^k$ matrices with $n-k$ columns and $0, \ldots, k$ rows, which are exactly the $P_A$. In the diagram below are the sizes of the matrices of the first two steps; note that only the $k \times n$ matrix on top has to be of full rank. The number of matrices of size $(k-i) \times (n-j)$ is given by the
binomial coefficient $\binom{j}{i}$.
$$\begin{array}{ccccc} & & k \times n & & \\ & k \times (n-1) & & (k-1) \times (n-1) & \\ k \times (n-2) & & (k-1) \times (n-2) & & (k-2) \times (n-2) \end{array}$$
On the last line we have $W_0(X, Y, T) = X^{n-k}$. This proves the formula.

Example 4.4.46 Let $C$ be the even weight code of length $n = 6$ over $\mathbb{F}_2$. Then a generator matrix of $C$ is the $5 \times 6$ matrix $G = (I_5 \mid P)$ with $P = (1, 1, 1, 1, 1)^T$. So the matrices $P_A$ are $l \times 1$ matrices with all ones. We have $W_0(X, Y, T) = X$ and $W_l(X, Y, T) = T^{l-1}(X + (T-1)Y)$ by part (ii) of Proposition 4.4.43. Therefore the weight enumerator of $C$ is equal to
$$W_C(X, Y, T) = W_G(X, Y, T) = X(X-Y)^5 + \sum_{l=1}^{5} \binom{5}{l} Y^l (X-Y)^{5-l} T^{l-1} (X + (T-1)Y)$$
$$= X^6 + 15(T-1)X^4Y^2 + 20(T^2 - 3T + 2)X^3Y^3 + 15(T^3 - 4T^2 + 6T - 3)X^2Y^4$$
$$+ 6(T^4 - 5T^3 + 10T^2 - 10T + 4)XY^5 + (T^5 - 6T^4 + 15T^3 - 20T^2 + 15T - 5)Y^6.$$
For $T = 2$ we get $W_C(X, Y, 2) = X^6 + 15X^4Y^2 + 15X^2Y^4 + Y^6$, which we indeed recognize as the weight enumerator of the even weight code that we found in Example 4.1.5.
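The sum of Theorem 4.4.45 is small enough here to evaluate symbolically. A minimal sympy sketch of Example 4.4.46 (the function name is ours):

```python
from sympy import symbols, binomial, expand

X, Y, T = symbols('X Y T')

# Theorem 4.4.45 for the even weight code of length 6 (Example 4.4.46):
# the P_A are l x 1 all-ones matrices, so W_0 = X and W_l = T^(l-1)(X + (T-1)Y)
def W_A(l):
    return X if l == 0 else T ** (l - 1) * (X + (T - 1) * Y)

W = expand(sum(binomial(5, l) * Y ** l * (X - Y) ** (5 - l) * W_A(l)
               for l in range(6)))
print(expand(W.subs(T, 2)))   # X**6 + 15*X**4*Y**2 + 15*X**2*Y**4 + Y**6
```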
4.4.5 Exercises

4.4.1 Compute the extended weight enumerator of the binary simplex code $S_3(2)$.

4.4.2 Compute the extended weight enumerators of the $n$-fold repetition code and its dual.

4.4.3 Compute the extended weight enumerator of the binary Golay code.

4.4.4 Compute the extended weight enumerator of the ternary Golay code.

4.4.5 Consider the square matrices $A$ and $B$ of size $n+1$ with entries $a_{ij}$ and $b_{ij}$, respectively, given by
$$a_{ij} = (-1)^{i+j} \binom{i}{j} \quad \text{and} \quad b_{ij} = \binom{i}{j} \quad \text{for } 0 \le i, j \le n.$$
Show that $A$ and $B$ are inverses of each other.

4.4.6 Give a proof of Theorem 4.4.36.

4.4.7 Give a proof of Proposition 4.4.38.

4.4.8 Compare the complexity of the methods "exhaustive search" and "arrangements of hyperplanes" to compute the weight enumerator, as a function of $q$ and the parameters $[n, k, d]$ and $d^\perp$.

4.5 Generalized weight enumerator

***Intro***

4.5.1 Generalized Hamming weights

We recall that for a linear code $C$, the minimum Hamming weight is the minimal one among the Hamming weights $\mathrm{wt}(c)$ of the nonzero codewords $c \neq 0$. In this subsection we generalize this parameter to a sequence of values, the so-called generalized Hamming weights, which are useful in the study of the complexity of trellis decoding and of other properties of the code $C$. ***C nondegenerate?***

Let $D$ be a subcode of $C$. Generalizing Definition 2.2.2, we define the support of $D$, denoted by $\mathrm{supp}(D)$, as the set of positions where at least one codeword in $D$ is not zero, i.e.
$$\mathrm{supp}(D) = \{ i \mid \text{there exists } x \in D \text{ such that } x_i \neq 0 \}.$$
The weight of $D$, $\mathrm{wt}(D)$, is defined as the size of $\mathrm{supp}(D)$. Suppose $C$ is an $[n,k]$ code. For any $r \le k$, the $r$-th generalized Hamming weight (GHW) of $C$ is defined as
$$d_r(C) = \min\{ \mathrm{wt}(D) \mid D \text{ is an } r\text{-dimensional subcode of } C \}.$$
The set of GHWs $\{d_1(C), \ldots, d_k(C)\}$ is called the weight hierarchy of $C$. Note that since any 1-dimensional subcode has a nonzero codeword as its basis, the first generalized Hamming weight $d_1(C)$ is exactly the minimum weight of $C$.
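For small codes the weight hierarchy can be found by exhaustive search over all $r$-dimensional subcodes, directly following the definition. A minimal Python sketch (our own; the Hamming generator matrix below is one standard choice):

```python
from itertools import combinations, product

# all codewords of a [7,4,3] Hamming code
G = [(1,0,0,0,0,1,1), (0,1,0,0,1,0,1), (0,0,1,0,1,1,0), (0,0,0,1,1,1,1)]
code = {tuple(sum(m[i] * G[i][j] for i in range(4)) % 2 for j in range(7))
        for m in product(range(2), repeat=4)}

def d_r(r):
    """r-th generalized Hamming weight, by exhaustive search over subcodes."""
    best = 7
    for gens in combinations([c for c in code if any(c)], r):
        span = {(0,) * 7}                            # F_2-span of the generators
        for g in gens:
            span |= {tuple((a + b) % 2 for a, b in zip(s, g)) for s in span}
        if len(span) == 2 ** r:                      # generators are independent
            supp = [j for j in range(7) if any(c[j] for c in span)]
            best = min(best, len(supp))
    return best

print([d_r(r) for r in range(1, 5)])   # weight hierarchy: [3, 5, 6, 7]
```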
We now state several properties of generalized Hamming weights.

Proposition 4.5.1 (Monotonicity) For an $[n,k]$ code $C$, the generalized Hamming weights satisfy
$$1 \le d_1(C) < d_2(C) < \cdots < d_k(C) \le n.$$

Proof. For any $1 \le r \le k-1$, it is trivial to verify that $1 \le d_r(C) \le d_{r+1}(C) \le n$. Let $D$ be a subcode of dimension $r+1$ such that $\mathrm{wt}(D) = d_{r+1}(C)$. We choose any index $i \in \mathrm{supp}(D)$. Consider
$$E = \{ x \mid x \in D \text{ and } x_i = 0 \}.$$
By Definition 3.1.13 and Proposition 3.1.15, $E$ is a shortened code of $D$, and $r \le \dim(E) \le r+1$. However, by the choice of $i$, there exists a codeword $c \in D$ with $c_i \neq 0$. Thus $c$ cannot be a codeword of $E$. This implies that $E$ is a proper subcode of $D$, that is, $\dim(E) = r$. Now, by the definition of the GHWs, we have
$$d_r(C) \le \mathrm{wt}(E) \le \mathrm{wt}(D) - 1 = d_{r+1}(C) - 1.$$
This proves that $d_r(C) < d_{r+1}(C)$.

Proposition 4.5.2 (Generalized Singleton bound) For an $[n,k]$ code $C$, we have $d_r(C) \le n - k + r$.

This bound on $d_r(C)$ is a straightforward consequence of Proposition 4.5.1. When $r = 1$, we get the Singleton bound (see Theorem 3.2.1).

Let $H$ be a parity check matrix of the $[n,k]$ code $C$, which is an $(n-k) \times n$ matrix of rank $n-k$. From Proposition 2.3.11, we know that the minimum distance of $C$ is the smallest integer $d$ such that $d$ columns of $H$ are linearly dependent. We now present a generalization of this property. Let $H_i$, $1 \le i \le n$, be the column vectors of $H$. For any subset $I$ of $\{1, 2, \ldots, n\}$, let $\langle H_i \mid i \in I \rangle$ be the subspace of $\mathbb{F}_q^{n-k}$ generated by the vectors $H_i$, $i \in I$, which, for simplicity, is denoted by $V_I$.

Lemma 4.5.3 The $r$-th generalized Hamming weight of $C$ is
$$d_r(C) = \min\{ |I| : \dim(\langle H_i \mid i \in I \rangle) \le |I| - r \}.$$

Proof. We denote
$$V_I^\perp = \{ x \mid x_i = 0 \text{ for } i \notin I, \text{ and } \textstyle\sum_{i \in I} x_i H_i = 0 \}.$$
Then it is easy to see that $\dim(V_I) + \dim(V_I^\perp) = |I|$. Also, from the definition, $V_I^\perp$ is a subcode of $C$ for any $I$. Let $D$ be a subcode of $C$ with $\dim(D) = r$ and $|\mathrm{supp}(D)| = d_r(C)$. Let $I = \mathrm{supp}(D)$. Then $D \subseteq V_I^\perp$. This implies that $\dim(V_I) = |I| - \dim(V_I^\perp) \le |I| - \dim(D) = |I| - r$. Therefore
$$d_r(C) = |\mathrm{supp}(D)| = |I| \ge \min\{ |I| : \dim(V_I) \le |I| - r \}.$$
We now prove the reverse inequality. Denote $d = \min\{ |I| : \dim(V_I) \le |I| - r \}$. Let $I$ be a subset of $\{1, 2, \ldots, n\}$ such that $\dim(V_I) \le |I| - r$ and $|I| = d$. Then $\dim(V_I^\perp) \ge r$. Therefore $d_r(C) \le |\mathrm{supp}(V_I^\perp)| \le |I| = d$.

Proposition 4.5.4 (Duality) Let $C$ be an $[n,k]$ code. Then the weight hierarchy of its dual code $C^\perp$ is completely determined by the weight hierarchy of $C$; precisely,
$$\{ d_r(C^\perp) \mid 1 \le r \le n-k \} = \{1, 2, \ldots, n\} \setminus \{ n+1 - d_s(C) \mid 1 \le s \le k \}.$$

Proof. Look at the two sets $\{ d_r(C^\perp) \mid 1 \le r \le n-k \}$ and $\{ n+1-d_s(C) \mid 1 \le s \le k \}$. Both are subsets of $\{1, 2, \ldots, n\}$, and by the monotonicity the first one has size $n-k$ and the second one has size $k$. Thus it is sufficient to prove that these two sets are disjoint. We now prove the equivalent fact that for any $1 \le r \le k$ the value $n+1-d_r(C)$ is not a generalized Hamming weight of $C^\perp$. Let
$$t = n - k + r - d_r(C).$$
It is sufficient to prove that $d_t(C^\perp) < n + 1 - d_r(C)$ and that $d_{t+\delta}(C^\perp) \neq n + 1 - d_r(C)$ for any $\delta \ge 1$.
Let $D$ be a subcode of $C$ with $\dim(D) = r$ and $|\mathrm{supp}(D)| = d_r(C)$. There exists a parity check matrix $G$ for $C^\perp$ (which is a generator matrix for $C$) whose first $r$ rows are words in $D$ and whose last $k-r$ rows are not. The column vectors $\{ G_i \mid i \notin \mathrm{supp}(D) \}$ have their first $r$ coordinates zero. Thus, with $R_i$ the $i$-th row vector of $G$,
$$\dim(\langle G_i \mid i \notin \mathrm{supp}(D) \rangle) = \text{column rank of } (G_i \mid i \notin \mathrm{supp}(D)) \le \text{row rank of } (R_i \mid r+1 \le i \le k) \le k - r.$$
Let $I = \{1, 2, \ldots, n\} \setminus \mathrm{supp}(D)$. Then $|I| = n - d_r(C)$, and $\dim(\langle G_i \mid i \in I \rangle) \le k - r = |I| - t$. Thus, by Lemma 4.5.3, we have
$$d_t(C^\perp) \le |I| = n - d_r(C) < n - d_r(C) + 1.$$
Next, we show $d_{t+\delta}(C^\perp) \neq n + 1 - d_r(C)$. Otherwise, $d_{t+\delta}(C^\perp) = n + 1 - d_r(C)$ holds for some $\delta \ge 1$. Then, by the definition of generalized Hamming weights, there exists a generator matrix $H$ for $C^\perp$ (which is a parity check matrix for $C$) and $d_r(C) - 1$ positions $1 \le i_1 < \cdots < i_{d_r(C)-1} \le n$ such that the coordinates of the first $t+\delta$ rows of $H$ are all zero at these $d_r(C)-1$ positions. Without loss of generality, we may assume these positions are exactly the last $d_r(C)-1$ positions $n - d_r(C) + 2, \ldots, n$, and let $I = \{n - d_r(C) + 2, \ldots, n\}$. Clearly, the last $|I|$ column vectors span a space of dimension at most $n - k - t - \delta = d_r(C) - r - \delta$. By Lemma 4.5.3, $d_s(C) \le d_r(C) - 1$, where $s = |I| - (d_r(C) - r - \delta) = r + \delta - 1 \ge r$. This contradicts the monotonicity.

4.5.2 Generalized weight enumerators

The weight distribution is generalized in the following way. Instead of looking at words of $C$, we consider all subcodes of $C$ of a certain dimension $r$.

Definition 4.5.5 Let $C$ be a linear code of length $n$. The number of subcodes with a given weight $w$ and dimension $r$ is denoted by $A_w^{(r)}$, that is,
$$A_w^{(r)} = |\{ D \subseteq C \mid \dim D = r,\ \mathrm{wt}(D) = w \}|.$$
Together they form the $r$-th generalized weight distribution of the code. The $r$-th generalized weight enumerator $W_C^{(r)}(X, Y)$ of $C$ is the polynomial with the weight distribution as coefficients, that is,
$$W_C^{(r)}(X, Y) = \sum_{w=0}^{n} A_w^{(r)} X^{n-w} Y^w.$$

Remark 4.5.6 From this definition it follows that $A_0^{(0)} = 1$ and $A_0^{(r)} = 0$ for all $0 < r \le k$. Furthermore, every 1-dimensional subspace of $C$ contains $q-1$ nonzero codewords, so $(q-1) A_w^{(1)} = A_w$ for $0 < w \le n$. This means we can recover the original weight enumerator via
$$W_C(X, Y) = W_C^{(0)}(X, Y) + (q-1) W_C^{(1)}(X, Y).$$

Definition 4.5.7 We introduce the following notations:
$$[m, r]_q = \prod_{i=0}^{r-1} (q^m - q^i),$$
$$\langle r \rangle_q = [r, r]_q,$$
$$\binom{k}{r}_q = \frac{[k, r]_q}{\langle r \rangle_q}.$$

Remark 4.5.8 In Proposition 2.5.2 it is shown that the first number is equal to the number of $m \times r$ matrices of rank $r$ over $\mathbb{F}_q$. The third number is the Gaussian binomial, and it represents the number of $r$-dimensional subspaces of $\mathbb{F}_q^k$. Hence the second number is the number of bases of $\mathbb{F}_q^r$.

Definition 4.5.9 For $J \subseteq \{1, \ldots, n\}$ and an integer $r \ge 0$ we define
$$B_J^{(r)} = |\{ D \subseteq C(J) \mid D \text{ subspace of dimension } r \}|,$$
$$B_t^{(r)} = \sum_{|J|=t} B_J^{(r)}.$$

Remark 4.5.10 Note that $B_J^{(r)} = \binom{l(J)}{r}_q$. For $r = 0$ this gives $B_t^{(0)} = \binom{n}{t}$. So we see that in general $l(J) = 0$ does not imply $B_J^{(r)} = 0$, because $\binom{0}{0}_q = 1$. But if $r \neq 0$, we do have that $l(J) = 0$ implies $B_J^{(r)} = 0$ and $B_t^{(r)} = 0$.
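The three quantities of Definition 4.5.7 are easy to compute. A small Python sketch (our own helper names), which also illustrates the convention $\binom{0}{0}_q = 1$ of Remark 4.5.10:

```python
def bracket(m, r, q):
    # [m,r]_q = prod_{i=0}^{r-1} (q^m - q^i): the number of m x r matrices of rank r
    p = 1
    for i in range(r):
        p *= q ** m - q ** i
    return p

def gauss_binomial(k, r, q):
    # [k choose r]_q = [k,r]_q / <r>_q: the number of r-dim subspaces of F_q^k;
    # it vanishes for r > k, since [k,r]_q then contains the factor q^k - q^k = 0
    return bracket(k, r, q) // bracket(r, r, q)

print(gauss_binomial(4, 2, 2))   # 35 two-dimensional subspaces of F_2^4
print(gauss_binomial(0, 0, 2))   # 1, in line with Remark 4.5.10
```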
Proposition 4.5.11 Let $d_r$ be the $r$-th generalized Hamming weight of $C$, and $d^\perp$ the minimum distance of the dual code $C^\perp$. Then we have
$$B_t^{(r)} = \begin{cases} \binom{n}{t} \binom{k-t}{r}_q & \text{for all } t < d^\perp, \\ 0 & \text{for all } t > n - d_r. \end{cases}$$

Proof. The first case is a direct corollary of Lemma 4.4.13, since there are $\binom{n}{t}$ subsets $J \subseteq \{1, \ldots, n\}$ with $|J| = t$. The proof of the second case is analogous to the proof of that lemma: let $|J| = t$ with $t > n - d_r$, and suppose there is a subspace $D \subseteq C(J)$ of dimension $r$. Then $J$ is contained in the complement of $\mathrm{supp}(D)$, so $t \le n - \mathrm{wt}(D)$. It follows that $\mathrm{wt}(D) \le n - t < d_r$, which is impossible, so such a $D$ does not exist. So $B_J^{(r)} = 0$ for all $J$ with $|J| = t$ and $t > n - d_r$, and therefore $B_t^{(r)} = 0$ for $t > n - d_r$.
We can check that the formula is well defined: if $t < d^\perp$ then $l(J) = k - t$. If also $t > n - d_r$, we have $t > n - d_r \ge k - r$ by the generalized Singleton bound. This implies $r > k - t = l(J)$, so $\binom{k-t}{r}_q = 0$.

The relation between $B_t^{(r)}$ and $A_w^{(r)}$ becomes clear in the next proposition.

Proposition 4.5.12 The following formula holds:
$$B_t^{(r)} = \sum_{w=0}^{n} \binom{n-w}{t} A_w^{(r)}.$$

Proof. We will count the elements of the set
$$\mathcal{B}_t^{(r)} = \{ (D, J) \mid J \subseteq \{1, \ldots, n\},\ |J| = t,\ D \subseteq C(J) \text{ subspace of dimension } r \}$$
in two different ways. For each $J$ with $|J| = t$ there are $B_J^{(r)}$ pairs $(D, J)$ in $\mathcal{B}_t^{(r)}$, so the total number of elements in this set is $\sum_{|J|=t} B_J^{(r)} = B_t^{(r)}$. On the other hand, let $D$ be an $r$-dimensional subcode of $C$ with $\mathrm{wt}(D) = w$. There are $A_w^{(r)}$ possibilities for such a $D$. If we want to find a $J$ such that $D \subseteq C(J)$, we have to pick $t$ coordinates from the $n-w$ all-zero coordinates of $D$. Summation over all $w$ proves the given formula.
Note that because $A_w^{(r)} = 0$ for all $w < d_r$, we can start the summation at $w = d_r$. We can end the summation at $w = n-t$ because $\binom{n-w}{t} = 0$ for $t > n-w$. So the formula can be rewritten as
$$B_t^{(r)} = \sum_{w=d_r}^{n-t} \binom{n-w}{t} A_w^{(r)}.$$
In practice, we will often prefer the summation given in the proposition.

Theorem 4.5.13 The generalized weight enumerator is given by the following formula:
$$W_C^{(r)}(X, Y) = \sum_{t=0}^{n} B_t^{(r)} (X-Y)^t Y^{n-t}.$$

Proof. The proof is similar to the one given for Theorem 4.4.17 and is left as an exercise.

It is possible to determine the $A_w^{(r)}$ directly from the $B_t^{(r)}$ by using the next proposition.

Proposition 4.5.14 The following formula holds:
$$A_w^{(r)} = \sum_{t=n-w}^{n} (-1)^{n+w+t} \binom{t}{n-w} B_t^{(r)}.$$

Proof. The proof is similar to the one given for Proposition 4.4.18 and is left as an exercise.

4.5.3 Generalized weight enumerators of MDS-codes

We can use the theory in Sections 4.5.2 and 4.4.3 to calculate the weight distribution, generalized weight distribution, and extended weight distribution of a linear $[n,k]$ code $C$. This is done by determining the values $l(J)$ for each $J \subseteq \{1, \ldots, n\}$. In general, we have to look at all $2^n$ different subsets of $\{1, \ldots, n\}$ to find the $l(J)$, but for the special case of MDS codes we can find the weight distributions much faster.

Proposition 4.5.15 Let $C$ be a linear $[n,k]$ MDS code, and let $J \subseteq \{1, \ldots, n\}$. Then we have
$$l(J) = \begin{cases} 0 & \text{for } t > k, \\ k - t & \text{for } t \le k, \end{cases}$$
so for a given $t$ the value of $l(J)$ is independent of the choice of $J$.
Proof. We know that the dual of an MDS code is also MDS, so $d^\perp = k+1$. Now use $d = n-k+1$ in Lemma 7.4.39.

Now that we know all the $l(J)$ for an MDS code, it is easy to find the weight distribution.

Theorem 4.5.16 Let $C$ be an MDS code with parameters $[n,k]$. Then the generalized weight distribution is given by
$$A_w^{(r)} = \binom{n}{w} \sum_{j=0}^{w-d} (-1)^j \binom{w}{j} \binom{w-d+1-j}{r}_q.$$
The coefficients of the extended weight enumerator are given by
$$A_w(T) = \binom{n}{w} \sum_{j=0}^{w-d} (-1)^j \binom{w}{j} \left( T^{w-d+1-j} - 1 \right).$$

Proof. We give the construction for the generalized weight enumerator here; the case of the extended weight enumerator is similar and left as an exercise. We know from Proposition 4.5.15 that for an MDS code, $B_t^{(r)}$ depends only on the size of $J$, so $B_t^{(r)} = \binom{n}{t} \binom{k-t}{r}_q$. Using this in the formula for $A_w^{(r)}$ and substituting $j = t - n + w$, we have
$$A_w^{(r)} = \sum_{t=n-w}^{n-d_r} (-1)^{n+w+t} \binom{t}{n-w} B_t^{(r)} = \sum_{t=n-w}^{n-d_r} (-1)^{t-n+w} \binom{t}{n-w} \binom{n}{t} \binom{k-t}{r}_q$$
$$= \sum_{j=0}^{w-d_r} (-1)^j \binom{n}{w} \binom{w}{j} \binom{k+w-n-j}{r}_q = \binom{n}{w} \sum_{j=0}^{w-d_r} (-1)^j \binom{w}{j} \binom{w-d+1-j}{r}_q.$$
In the second step, we are using the binomial equivalence
$$\binom{n}{t} \binom{t}{n-w} = \binom{n}{n-w} \binom{n-(n-w)}{t-(n-w)} = \binom{n}{w} \binom{w}{n-t}.$$

So, for all MDS codes with given parameters $[n,k]$, the extended and generalized weight distributions are the same. But not all such codes are equivalent. We can conclude from this that the generalized and extended weight enumerators are not enough to distinguish between codes with the same parameters.
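A short Python sketch of the first formula of Theorem 4.5.16 (our own helper names), checked here against Remark 4.5.6, which says that $(q-1)A_w^{(1)}$ must recover the ordinary weight distribution:

```python
from math import comb

def gauss_binomial(k, r, q):
    # Gaussian binomial [k choose r]_q (it is 0 for r > k)
    num = den = 1
    for i in range(r):
        num *= q ** k - q ** i
        den *= q ** r - q ** i
    return num // den

def mds_generalized_distribution(n, k, q, r):
    # Theorem 4.5.16: A^(r)_w = C(n,w) sum_j (-1)^j C(w,j) [w-d+1-j choose r]_q
    d = n - k + 1
    return {w: comb(n, w) * sum((-1) ** j * comb(w, j)
                                * gauss_binomial(w - d + 1 - j, r, q)
                                for j in range(w - d + 1))
            for w in range(d, n + 1)}

# for a [6,3,4] MDS code over F_9: (q-1) A^(1)_w recovers A_w of Remark 4.5.6
A1 = mds_generalized_distribution(6, 3, 9, 1)
print({w: 8 * a for w, a in A1.items()})   # {4: 120, 5: 240, 6: 368}
```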
We illustrate the non-equivalence of two MDS codes by an example.

Example 4.5.17 Let $C$ be a linear $[n, 3]$ MDS code over $\mathbb{F}_q$. It is possible to write the generator matrix $G$ of $C$ in the following form:
$$\begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \\ y_1 & y_2 & \cdots & y_n \end{pmatrix}.$$
Because $C$ is MDS, we have $d = n-2$. We now view the $n$ columns of $G$ as points in the projective plane $\mathbb{P}^2(\mathbb{F}_q)$, say $P_1, \ldots, P_n$. The MDS property that every $k$ columns of $G$ are independent is now equivalent to saying that no three points are on a line. To see that these $n$ points do not always determine an equivalent code, consider the following construction. Through the $n$ points there are $\binom{n}{2} = N$ lines, the set $\mathcal{N}$. These lines determine (the generator matrix of) an $[N, 3]$ code $\hat{C}$. The minimum distance of the code $\hat{C}$ is equal to the total number of lines minus the maximum number of lines from $\mathcal{N}$ through an arbitrary point $P \in \mathbb{P}^2(\mathbb{F}_q)$, by Proposition 4.4.8. If $P \notin \{P_1, \ldots, P_n\}$, then the maximum number of lines from $\mathcal{N}$ through $P$ is at most $\frac{1}{2}n$, since no three of the points $P_1, \ldots, P_n$ lie on a line. If $P = P_i$ for some $i \in \{1, \ldots, n\}$, then $P$ lies on exactly $n-1$ lines of $\mathcal{N}$, namely the lines $P_iP_j$ for $j \neq i$. Therefore the minimum distance of $\hat{C}$ is $d = N - n + 1$.
We have now constructed an $[N, 3, N-n+1]$ code $\hat{C}$ from the original code $C$. Notice that the two codes $\hat{C}_1$ and $\hat{C}_2$ are generalized equivalent if $C_1$ and $C_2$ are generalized equivalent. The generalized and extended weight enumerators of an MDS code of length $n$ and dimension $k$ are completely determined by the pair $(n, k)$, but this is not generally true for the weight enumerator of $\hat{C}$.
Take for example $n = 6$ and $q = 9$, so $\hat{C}$ is a $[15, 3, 10]$ code. Look at the codes $C_1$ and $C_2$ generated by the following matrices, respectively, where $\alpha \in \mathbb{F}_9$ is a primitive element:
$$\begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & 1 & \alpha^5 & \alpha^6 \\ 0 & 0 & 1 & \alpha^3 & \alpha & \alpha^3 \end{pmatrix} \qquad \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & \alpha^7 & \alpha^4 & \alpha^6 \\ 0 & 0 & 1 & \alpha^5 & \alpha & 1 \end{pmatrix}$$
Being both MDS codes, their weight distribution is $(1, 0, 0, 120, 240, 368)$. If we now apply the above construction, we get $\hat{C}_1$ and $\hat{C}_2$ generated by
$$\begin{pmatrix} 1 & 0 & 0 & 1 & 1 & \alpha^4 & \alpha^6 & \alpha^3 & \alpha^7 & \alpha & 1 & \alpha^2 & 1 & \alpha^7 & 1 \\ 0 & 1 & 0 & \alpha^7 & 1 & 0 & 0 & \alpha^4 & 1 & 1 & 0 & \alpha^6 & \alpha & 1 & \alpha^3 \\ 0 & 0 & 1 & 1 & 0 & 1 & 1 & 1 & 0 & 0 & 1 & 1 & 1 & 1 & 1 \end{pmatrix}$$
$$\begin{pmatrix} 1 & 0 & 0 & \alpha^7 & \alpha^2 & \alpha^3 & \alpha & 0 & \alpha^7 & \alpha^7 & \alpha^4 & \alpha^7 & \alpha & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 & \alpha^3 & 0 & \alpha^6 & \alpha^6 & 0 & \alpha^7 & \alpha & \alpha^6 & \alpha^3 & \alpha \\ 0 & 0 & 1 & \alpha^5 & \alpha^5 & \alpha^6 & \alpha^3 & \alpha^7 & \alpha^4 & \alpha^3 & \alpha^5 & \alpha^2 & \alpha^4 & \alpha & \alpha^5 \end{pmatrix}$$
The weight distributions of $\hat{C}_1$ and $\hat{C}_2$ are, respectively,
$$(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 0, 16, 312, 288, 64) \quad \text{and} \quad (1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 0, 32, 264, 336, 48).$$
So the latter two codes are not generalized equivalent, and therefore not all $[6, 3, 4]$ MDS codes over $\mathbb{F}_9$ are generalized equivalent.
Another example was given in [110, 29], showing that two $[6, 3, 4]$ MDS codes can have distinct covering radii.

4.5.4 Connections

There is a connection between the extended weight enumerator and the generalized weight enumerators. We first prove the next proposition.

Proposition 4.5.18 Let $C$ be a linear $[n,k]$ code over $\mathbb{F}_q$, and let $C^m$ be the linear space consisting of the $m \times n$ matrices over $\mathbb{F}_q$ whose rows are in $C$. Then there is an isomorphism of $\mathbb{F}_q$-vector spaces between $C \otimes \mathbb{F}_{q^m}$ and $C^m$.

Proof. Let $\alpha$ be a primitive element of $\mathbb{F}_{q^m}$. Then we can write an element of $\mathbb{F}_{q^m}$ in a unique way on the basis $(1, \alpha, \alpha^2, \ldots, \alpha^{m-1})$ with coefficients in $\mathbb{F}_q$. If we do this for all the coordinates of a word in $C \otimes \mathbb{F}_{q^m}$, we get an $m \times n$ matrix over $\mathbb{F}_q$. The rows of this matrix are words of $C$, because $C$ and $C \otimes \mathbb{F}_{q^m}$ have the same generator matrix. This map is clearly injective. There are $(q^m)^k = q^{km}$ words in $C \otimes \mathbb{F}_{q^m}$, and the number of elements of $C^m$ is $(q^k)^m = q^{km}$, so our map is a bijection. It is given by
$$\left( \sum_{i=0}^{m-1} c_{i1}\alpha^i, \sum_{i=0}^{m-1} c_{i2}\alpha^i, \ldots, \sum_{i=0}^{m-1} c_{in}\alpha^i \right) \mapsto \begin{pmatrix} c_{01} & c_{02} & c_{03} & \cdots & c_{0n} \\ c_{11} & c_{12} & c_{13} & \cdots & c_{1n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ c_{(m-1)1} & c_{(m-1)2} & c_{(m-1)3} & \cdots & c_{(m-1)n} \end{pmatrix}.$$
We see that the map is $\mathbb{F}_q$-linear, so it gives an isomorphism $C \otimes \mathbb{F}_{q^m} \to C^m$. Note that this isomorphism depends on the choice of the primitive element $\alpha$.

We also need the next subresult.

Lemma 4.5.19 Let $c \in C \otimes \mathbb{F}_{q^m}$ and let $M \in C^m$ be the corresponding $m \times n$ matrix under a given isomorphism. Let $D \subseteq C$ be the subcode generated by the rows of $M$. Then $\mathrm{wt}(c) = \mathrm{wt}(D)$.

Proof. If the $j$-th coordinate $c_j$ of $c$ is zero, then the $j$-th column of $M$ consists of only zeros, because the representation of $c_j$ on the basis $(1, \alpha, \alpha^2, \ldots, \alpha^{m-1})$ is unique. On the other hand, if the $j$-th column of $M$ consists of all zeros, then $c_j$ is also zero. Therefore $\mathrm{wt}(c) = \mathrm{wt}(D)$.

Proposition 4.5.20 Let $C$ be a linear code over $\mathbb{F}_q$. Then the weight enumerator of an extension code and the generalized weight enumerators are connected via
$$A_w(q^m) = \sum_{r=0}^{m} [m, r]_q A_w^{(r)}.$$

Proof. We count the number of words in $C \otimes \mathbb{F}_{q^m}$ of weight $w$ in two ways, using the bijection of Proposition 4.5.18. The first way is just by substituting $T = q^m$ in $A_w(T)$: this gives the left-hand side of the equation. For the second way,
note that every $M \in C^m$ generates a subcode of $C$ whose weight is equal to the weight of the corresponding word in $C \otimes \mathbb{F}_{q^m}$. Fix this weight $w$ and a dimension $r$: there are $A_w^{(r)}$ subcodes of $C$ of dimension $r$ and weight $w$. Every such subcode is generated by an $r \times n$ matrix whose rows are words of $C$. Left multiplication by an $m \times r$ matrix of rank $r$ gives an element of $C^m$ that generates the same subcode of $C$, and all such elements of $C^m$ are obtained this way. The number of $m \times r$ matrices of rank $r$ is $[m, r]_q$, so summation over all dimensions $r$ gives
$$A_w(q^m) = \sum_{r=0}^{k} [m, r]_q A_w^{(r)}.$$
We can let the summation run to $m$, because $A_w^{(r)} = 0$ for $r > k$ and $[m, r]_q = 0$ for $r > m$. This proves the given formula.

In general, we have the following theorem.

Theorem 4.5.21 Let $C$ be a linear code over $\mathbb{F}_q$. Then the extended weight enumerator is determined by the generalized weight enumerators:
$$W_C(X, Y, T) = \sum_{r=0}^{k} \left( \prod_{j=0}^{r-1} (T - q^j) \right) W_C^{(r)}(X, Y).$$

Proof. If we know $A_w^{(r)}$ for all $r$, we can determine $A_w(q^m)$ for every $m$. If we have $k+1$ values of $m$ for which $A_w(q^m)$ is known, we can use Lagrange interpolation to find $A_w(T)$, for this is a polynomial in $T$ of degree at most $k$. In fact, we have
$$A_w(T) = \sum_{r=0}^{k} \left( \prod_{j=0}^{r-1} (T - q^j) \right) A_w^{(r)}.$$
This formula has the right degree and is correct for $T = q^m$ for all integer values $m \ge 0$, so we know it must be the correct polynomial. Therefore the theorem follows.

The converse of the theorem is also true: we can write the generalized weight enumerators in terms of the extended weight enumerator. To this end the following lemma is needed.

Lemma 4.5.22
$$\prod_{j=0}^{r-1} (Z - q^j) = \sum_{j=0}^{r} \binom{r}{j}_q (-1)^{r-j} q^{\binom{r-j}{2}} Z^j.$$

Proof. This identity can be proven by induction and is left as an exercise.
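The identity of Lemma 4.5.22 is easy to check symbolically for specific values of $q$ and $r$. A minimal sympy sketch (with a Gaussian binomial helper of our own):

```python
from math import comb
from sympy import symbols, expand

Z = symbols('Z')

def gauss_binomial(k, r, q):
    num = den = 1
    for i in range(r):
        num *= q ** k - q ** i
        den *= q ** r - q ** i
    return num // den

q, r = 3, 4
lhs = 1
for j in range(r):
    lhs *= Z - q ** j                          # prod_{j=0}^{r-1} (Z - q^j)
rhs = sum(gauss_binomial(r, j, q) * (-1) ** (r - j) * q ** comb(r - j, 2) * Z ** j
          for j in range(r + 1))
print(expand(lhs - rhs) == 0)                  # True: Lemma 4.5.22 for q=3, r=4
```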
Theorem 4.5.23 Let $C$ be a linear code over $\mathbb{F}_q$. Then the $r$-th generalized weight enumerator is determined by the extended weight enumerator:
$$W_C^{(r)}(X, Y) = \frac{1}{\langle r \rangle_q} \sum_{j=0}^{r} \binom{r}{j}_q (-1)^{r-j} q^{\binom{r-j}{2}} W_C(X, Y, q^j).$$

Proof. We consider the generalized weight enumerator in terms of Theorem 4.5.13. Using Remark ?? and rewriting gives the following:
$$W_C^{(r)}(X, Y) = \sum_{t=0}^{n} B_t^{(r)} (X-Y)^t Y^{n-t}$$
$$= \sum_{t=0}^{n} \sum_{|J|=t} \binom{l(J)}{r}_q (X-Y)^t Y^{n-t}$$
$$= \sum_{t=0}^{n} \sum_{|J|=t} \left( \prod_{j=0}^{r-1} \frac{q^{l(J)} - q^j}{q^r - q^j} \right) (X-Y)^t Y^{n-t}$$
$$= \frac{1}{\prod_{v=0}^{r-1} (q^r - q^v)} \sum_{t=0}^{n} \sum_{|J|=t} \left( \prod_{j=0}^{r-1} (q^{l(J)} - q^j) \right) (X-Y)^t Y^{n-t}$$
$$= \frac{1}{\langle r \rangle_q} \sum_{t=0}^{n} \sum_{|J|=t} \sum_{j=0}^{r} \binom{r}{j}_q (-1)^{r-j} q^{\binom{r-j}{2}} q^{j \cdot l(J)} (X-Y)^t Y^{n-t}$$
$$= \frac{1}{\langle r \rangle_q} \sum_{j=0}^{r} \binom{r}{j}_q (-1)^{r-j} q^{\binom{r-j}{2}} \sum_{t=0}^{n} \sum_{|J|=t} (q^j)^{l(J)} (X-Y)^t Y^{n-t}$$
$$= \frac{1}{\langle r \rangle_q} \sum_{j=0}^{r} \binom{r}{j}_q (-1)^{r-j} q^{\binom{r-j}{2}} W_C(X, Y, q^j).$$
In the fourth step Lemma 4.5.22 is used.

4.5.5 Exercises

4.5.1 Give a proof of Theorem 4.5.16.

4.5.2 Compute the generalized weight enumerator of the binary Golay code.

4.5.3 Compute the generalized weight enumerator of the ternary Golay code.

4.5.4 Give a proof of Lemma 4.5.22.

4.6 Notes

Puncturing and shortening at arbitrary sets of positions and the duality theorem is from Simonis [?]. Golay code, Turyn [?] construction, Pless handbook [?]. MacWilliams.
***–puncturing gives the binary [23,12,7] Golay code, which is cyclic.
–automorphism group of (extended) Golay code.
–(extended) ternary Golay code.
–designs and Golay codes.
–lattices and Golay codes.***
***repeated decoding of product codes (Høholdt-Justesen).***
***Singleton defect $s(C) = n + 1 - k - d$; $s(C) \ge 0$, and equality holds if and only if $C$ is MDS. $s(C) = 0$ if and only if $s(C^\perp) = 0$. Example where $s(C) = 1$ and $s(C^\perp) > 1$. Almost MDS and near MDS. Genus $g = \max\{s(C), s(C^\perp)\}$ in 4.1. If $k \ge 2$, then $d \le q(s+1)$. If $k \ge 3$ and $d = q(s+1)$, then $s+1 \le q$. Faldum-Willems, de Boer, Dodunekov-Landgev, relation with Griesmer bound***
***Incidence structures and geometric codes
Chapter 5

Codes and related structures

Relinde Jurrius and Ruud Pellikaan

***In this chapter seemingly unrelated topics are discussed.***

5.1 Graphs and codes

5.1.1 Colorings of a graph

Graph theory is generally regarded as starting with the paper of Euler [57] on his solution of the problem of the Königsberg bridges. For an introduction to the theory of graphs we refer to [14, 136].

Definition 5.1.1 A graph $\Gamma$ is a pair $(V, E)$ where $V$ is a non-empty set and $E$ is a set disjoint from $V$. The elements of $V$ are called vertices, and the members of $E$ are called edges. Edges are incident to one or two vertices, which are called the ends of the edge. If an edge is incident with exactly one vertex, then it is called a loop. If $u$ and $v$ are vertices that are incident with a common edge, then they are called neighbors or adjacent. Two edges are called parallel if they are incident with the same vertices. The graph is called simple if it has no loops and no parallel edges.

[Figure 5.1: A planar graph]
Definition 5.1.2 A graph is called planar if there is an injective map $f : V \to \mathbb{R}^2$ from the set of vertices $V$ to the real plane such that for every edge $e$ with ends $u$ and $v$ there is a simple curve in the plane connecting the ends of the edge, and such that mutually distinct simple curves do not intersect except at the endpoints. More formally: for every edge $e$ with ends $u$ and $v$ there is an injective continuous map $g_e : [0,1] \to \mathbb{R}^2$ from the unit interval to the plane such that $\{f(u), f(v)\} = \{g_e(0), g_e(1)\}$, and $g_e((0,1)) \cap g_{e'}((0,1)) = \emptyset$ for all edges $e, e'$ with $e \neq e'$.

Example 5.1.3 Consider the next riddle: Three newly built houses have to be connected to the three nearest terminals for gas, water and electricity. For security reasons, the connections are not allowed to cross. How can this be done?
The answer is "not", because the corresponding graph (see Figure 5.3) is not planar. This riddle is very suitable to occupy kids who like puzzles, but make sure to have an easily explainable proof of the impossibility. We leave it to the reader to find one.

Definition 5.1.4 Let $\Gamma_1 = (V_1, E_1)$ and $\Gamma_2 = (V_2, E_2)$ be graphs. A map $\varphi : V_1 \to V_2$ is called a morphism of graphs if $\varphi(v)$ and $\varphi(w)$ are adjacent in $\Gamma_2$ for all $v, w \in V_1$ that are adjacent in $\Gamma_1$. The map is called an isomorphism of graphs if it is a morphism of graphs and there exists a map $\psi : V_2 \to V_1$ that is also a morphism of graphs and is the inverse of $\varphi$. The graphs are called isomorphic if there is an isomorphism of graphs between them.

Definition 5.1.5 An edge of a graph is called an isthmus if the number of components of the graph increases by deleting the edge. If the graph is connected, then deleting an isthmus gives a graph that is no longer connected. Therefore an isthmus is also called a bridge. An edge is an isthmus if and only if it is in no cycle. Therefore an edge that is an isthmus is also called an acyclic edge.

Remark 5.1.6 By deleting loops and parallel edges from a graph $\Gamma$ one gets a simple graph. There is a choice in the process of deleting parallel edges, but the resulting graphs are all isomorphic. We call this simple graph the simplification of the graph, and it is denoted by $\bar{\Gamma}$.

Definition 5.1.7 Let $\Gamma = (V, E)$ be a graph. Let $K$ be a finite set and $k = |K|$. The elements of $K$ are called colors. A $k$-coloring of $\Gamma$ is a map $\gamma : V \to K$ such that $\gamma(u) \neq \gamma(v)$ for all distinct adjacent vertices $u$ and $v$ in $V$. So vertex $u$ has color $\gamma(u)$, and all vertices adjacent to $u$ have a color distinct from $\gamma(u)$. Let $P_\Gamma(k)$ be the number of $k$-colorings of $\Gamma$. Then $P_\Gamma$ is called the chromatic polynomial of $\Gamma$.

Remark 5.1.8 If the graph $\Gamma$ has no edges, then $P_\Gamma(k) = k^v$, where $|V| = v$ and $|K| = k$, since this is the number of all maps from $V$ to $K$. In particular there is no map from $V$ to an empty set in case $V$ is nonempty, so the number of 0-colorings is zero for every graph. The number of colorings of graphs was studied by Birkhoff [16], Whitney [130, 129] and Tutte [121, 124, 125, 126, 127]. Much research on the chromatic polynomial was motivated by the four-color problem for planar graphs.
Example 5.1.9 Let $K_n$ be the complete graph on $n$ vertices, in which every pair of two distinct vertices is connected by exactly one edge. Then there is no $k$-coloring if $k < n$. Now let $k \ge n$. Take an enumeration of the vertices. Then there are $k$ possible choices of a color for the first vertex and $k-1$ choices for the second vertex, since the first and second vertex are connected. Now suppose,

[Figure 5.2: Complete graph K_5]

by induction, that we have a coloring of the first $i$ vertices; then there are $k-i$ possibilities to color the next vertex, since the $(i+1)$-th vertex is connected to the first $i$ vertices. Hence
$$P_{K_n}(k) = k(k-1)\cdots(k-n+1).$$
So $P_{K_n}(k)$ is a polynomial in $k$ of degree $n$.

Proposition 5.1.10 Let $\Gamma = (V, E)$ be a graph. Then $P_\Gamma(k)$ is a polynomial in $k$.

Proof. See [16]. Let $\gamma : V \to K$ be a $k$-coloring of $\Gamma$ with exactly $i$ colors. Let $\sigma$ be a permutation of $K$. Then the composition of maps $\sigma \circ \gamma$ is also a $k$-coloring of $\Gamma$ with exactly $i$ colors. Two such colorings are called equivalent. Then $k(k-1)\cdots(k-i+1)$ is the number of colorings in the equivalence class of a given $k$-coloring of $\Gamma$ with exactly $i$ colors. Let $m_i$ be the number of equivalence classes of colorings with exactly $i$ colors of the set $K$. Let $v = |V|$. Then $P_\Gamma(k)$ is equal to
$$m_1 k + m_2 k(k-1) + \cdots + m_i k(k-1)\cdots(k-i+1) + \cdots + m_v k(k-1)\cdots(k-v+1).$$
Therefore $P_\Gamma(k)$ is a polynomial in $k$.

Definition 5.1.11 A graph $\Gamma = (V, E)$ is called bipartite if $V$ is the disjoint union of two nonempty sets $M$ and $N$ such that every edge has one end in $M$ and one end in $N$. Hence no two points in $M$ are adjacent and no two points in $N$ are adjacent. Let $m$ and $n$ be integers such that $1 \le m \le n$. The complete bipartite graph $K_{m,n}$ is the graph on a set of vertices $V$ that is the disjoint union of two sets $M$ and $N$ with $|M| = m$ and $|N| = n$, and such that every vertex in $M$ is connected with every vertex in $N$ by a unique edge.

Another tool to show that $P_\Gamma(k)$ is a polynomial is deletion-contraction of graphs, a process which is similar to the puncturing and shortening of codes from Section ??.
[Figure 5.3: Complete bipartite graph K_{3,3}]

Definition 5.1.12 Let $\Gamma = (V, E)$ be a graph. Let $e$ be an edge that is incident to the vertices $u$ and $v$. Then the deletion $\Gamma \setminus e$ is the graph with vertices $V$ and edges $E \setminus \{e\}$. The contraction $\Gamma/e$ is the graph obtained by identifying $u$ and $v$ and deleting $e$. Formally this is defined as follows. Let $\tilde{u} = \tilde{v} = \{u, v\}$, and $\tilde{w} = \{w\}$ if $w \neq u$ and $w \neq v$. Let $\tilde{V} = \{\tilde{w} : w \in V\}$. Then $\Gamma/e$ is the graph $(\tilde{V}, E \setminus \{e\})$, where an edge $f \neq e$ is incident with $\tilde{w}$ in $\Gamma/e$ if $f$ is incident with $w$ in $\Gamma$.

Remark 5.1.13 Notice that the number of $k$-colorings of $\Gamma$ does not change by deleting loops and parallel edges. Hence the chromatic polynomials of $\Gamma$ and of its simplification $\bar{\Gamma}$ are the same.

The following proposition is due to Foster. See the concluding note in [129].

Proposition 5.1.14 Let $\Gamma = (V, E)$ be a simple graph. Let $e$ be an edge of $\Gamma$. Then the following deletion-contraction formula holds:
$$P_\Gamma(k) = P_{\Gamma \setminus e}(k) - P_{\Gamma/e}(k)$$
for all positive integers $k$.

Proof. Let $u$ and $v$ be the vertices of $e$. Then $u \neq v$, since the graph is simple. Let $\gamma$ be a $k$-coloring of $\Gamma \setminus e$. Then $\gamma$ is also a coloring of $\Gamma$ if and only if $\gamma(u) \neq \gamma(v)$. If $\gamma(u) = \gamma(v)$, then consider the induced map $\tilde{\gamma}$ on $\tilde{V}$ defined by $\tilde{\gamma}(\tilde{u}) = \gamma(u)$ and $\tilde{\gamma}(\tilde{w}) = \gamma(w)$ if $w \neq u$ and $w \neq v$. The map $\tilde{\gamma}$ gives a $k$-coloring of $\Gamma/e$. Conversely, every $k$-coloring of $\Gamma/e$ gives a $k$-coloring $\gamma$ of $\Gamma \setminus e$ with $\gamma(u) = \gamma(v)$. Therefore
$$P_{\Gamma \setminus e}(k) = P_\Gamma(k) + P_{\Gamma/e}(k).$$
This also follows from a more general deletion-contraction formula for matroids that will be treated in Section 5.2.6 and Proposition ??.
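The deletion-contraction formula translates directly into a recursive algorithm. A minimal Python sketch (our own; it also handles the loops and parallel edges that contraction may create, in line with Remark 5.1.13), verified on $K_4$ against Example 5.1.9:

```python
from sympy import symbols, expand

k = symbols('k')

def chromatic(vertices, edges):
    # Proposition 5.1.14: P_G = P_{G\e} - P_{G/e}.
    # A graph is a vertex set plus a list of edges (u, v); exact duplicate
    # edges are parallel and are dropped, and a loop forces P_G = 0.
    edges = [e for i, e in enumerate(edges) if e not in edges[:i]]
    if any(u == v for u, v in edges):
        return 0
    if not edges:
        return k ** len(vertices)          # Remark 5.1.8: edgeless graph
    (u, v), rest = edges[0], edges[1:]
    deleted = chromatic(vertices, rest)
    merged = [(u if x == v else x, u if y == v else y) for x, y in rest]
    contracted = chromatic(vertices - {v}, merged)
    return deleted - contracted

V = {1, 2, 3, 4}
E = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print(expand(chromatic(V, E)))   # k**4 - 6*k**3 + 11*k**2 - 6*k = k(k-1)(k-2)(k-3)
```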
5.1.2 Codes on graphs

Definition 5.1.15 Let $\Gamma = (V, E)$ be a graph. Suppose that $V' \subseteq V$ and $E' \subseteq E$ and that all the endpoints of the edges in $E'$ are in $V'$. Then $\Gamma' = (V', E')$ is a graph and it is called a subgraph of $\Gamma$.

Definition 5.1.16 Two vertices $u$ and $v$ are connected by a path from $u$ to $v$ if there is a $t$-tuple of mutually distinct vertices $(v_1, \ldots, v_t)$ with $u = v_1$ and $v = v_t$, and a $(t-1)$-tuple of mutually distinct edges $(e_1, \ldots, e_{t-1})$ such that $e_i$ is incident with $v_i$ and $v_{i+1}$ for all $1 \le i < t$. If moreover $e_t$ is an edge that is incident with $u$ and $v$ and distinct from $e_i$ for all $i < t$, then $(e_1, \ldots, e_{t-1}, e_t)$ is called a cycle. The length of a smallest cycle is called the girth of the graph and is denoted by $\gamma(\Gamma)$.

Definition 5.1.17 The graph is called connected if every two vertices are connected by a path. A maximal connected subgraph of $\Gamma$ is called a connected component of $\Gamma$. The vertex set $V$ of $\Gamma$ is a disjoint union of subsets $V_i$, and the set of edges $E$ is a disjoint union of subsets $E_i$, such that $\Gamma_i = (V_i, E_i)$ is a connected component of $\Gamma$. The number of connected components of $\Gamma$ is denoted by $c(\Gamma)$.

Definition 5.1.18 Let $\Gamma = (V, E)$ be a finite graph. Suppose that $V$ consists of $m$ elements enumerated by $v_1, \ldots, v_m$, and that $E$ consists of $n$ elements enumerated by $e_1, \ldots, e_n$. The incidence matrix $I(\Gamma)$ is the $m \times n$ matrix with entries $a_{ij}$ defined by
$$a_{ij} = \begin{cases} 1 & \text{if } e_j \text{ is incident with } v_i \text{ and } v_k \text{ for some } i < k, \\ -1 & \text{if } e_j \text{ is incident with } v_i \text{ and } v_k \text{ for some } i > k, \\ 0 & \text{otherwise.} \end{cases}$$
Suppose moreover that $\Gamma$ is simple. Then $\mathcal{A}_\Gamma$ is the arrangement $(H_1, \ldots, H_n)$ of hyperplanes, where $H_j$ is given by the equation $X_i = X_k$ if $e_j$ is incident with $v_i$ and $v_k$ with $i < k$. An arrangement $\mathcal{A}$ is called graphic if $\mathcal{A}$ is isomorphic with $\mathcal{A}_\Gamma$ for some graph $\Gamma$.

***characteristic polynomial det(A − λI), Matrix tree theorem***

Definition 5.1.19 The graph code of $\Gamma$ over $\mathbb{F}_q$ is the $\mathbb{F}_q$-linear code generated by the rows of the incidence matrix $I(\Gamma)$. The cycle code $C_\Gamma$ of $\Gamma$ is the dual of the graph code of $\Gamma$.

Remark 5.1.20 Let $\Gamma$ be a finite graph without loops. Then the arrangement $\mathcal{A}_\Gamma$ is isomorphic with $\mathcal{A}_{C_\Gamma}$.

Proposition 5.1.21 Let $\Gamma$ be a finite graph. Then $C_\Gamma$ is a code with parameters $[n, k, d]$, where $n = |E|$, $k = |E| - |V| + c(\Gamma)$ and $d = \gamma(\Gamma)$.

Proof. See [14, Prop. 4.3].

***Sparse graph codes, Gallager or low-density parity check codes and Tanner graph codes play an important role in current research in coding theory. See [77, 99].***
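The parameters of Proposition 5.1.21 can be checked on a small example by brute force. A minimal Python sketch for the cycle code of $K_4$ over $\mathbb{F}_2$ (our own; over $\mathbb{F}_2$ the signs in Definition 5.1.18 vanish):

```python
from itertools import combinations, product

# incidence matrix of K_4 over F_2
V = [0, 1, 2, 3]
E = list(combinations(V, 2))                        # the 6 edges of K_4
I = [[1 if v in e else 0 for e in E] for v in V]

# the cycle code C_Gamma is the dual of the graph code: the null space of I
def in_cycle_code(x):
    return all(sum(a * b for a, b in zip(row, x)) % 2 == 0 for row in I)

code = [x for x in product(range(2), repeat=len(E)) if in_cycle_code(x)]
k = len(code).bit_length() - 1                      # dimension: |code| = 2^k
d = min(sum(x) for x in code if any(x))             # minimum weight
print(len(E), k, d)   # 6 3 3: n = |E|, k = |E| - |V| + c(K_4) = 3, d = girth = 3
```

These are indeed the parameters of the punctured binary [7,3,4] simplex code of Exercise 5.1.2 below.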
5.1.3 Exercises

5.1.1 Determine the chromatic polynomial of the bipartite graph $K_{3,2}$.

5.1.2 Determine the parameters of the cycle code of the complete graph $K_m$. Show that the code $C_{K_4}$ over $\mathbb{F}_2$ is equivalent to the punctured binary [7,3,4] simplex code.

5.1.3 Determine the parameters of the cycle code of the bipartite graph $K_{m,n}$. Let $C(m)$ be the dual of the $m$-fold repetition code. Show that $C_{K_{m,n}}$ is equivalent to the product code $C(m) \otimes C(n)$.

5.2 Matroids and codes

Matroids were introduced by Whitney [130, 131] in axiomatizing and generalizing the concepts of independence in linear algebra and circuit in graph theory. In the theory of arrangements one uses the notion of a geometric lattice; in graph and coding theory one refers more often to matroids.

5.2.1 Matroids

Definition 5.2.1 A matroid $M$ is a pair $(E, \mathcal{I})$ consisting of a finite set $E$ and a collection $\mathcal{I}$ of subsets of $E$ such that the following three conditions hold:
(I.1) $\emptyset \in \mathcal{I}$.
(I.2) If $J \subseteq I$ and $I \in \mathcal{I}$, then $J \in \mathcal{I}$.
(I.3) If $I, J \in \mathcal{I}$ and $|I| < |J|$, then there exists a $j \in J \setminus I$ such that $I \cup \{j\} \in \mathcal{I}$.
A subset $I$ of $E$ is called independent if $I \in \mathcal{I}$, otherwise it is called dependent. Condition (I.3) is called the independence augmentation axiom.

Remark 5.2.2 If $J$ is a subset of $E$, then $J$ has a maximal independent subset, that is, there exists an $I \in \mathcal{I}$ such that $I \subseteq J$ and $I$ is maximal with respect to this property and the inclusion. If $I_1$ and $I_2$ are maximal independent subsets of $J$, then $|I_1| = |I_2|$ by condition (I.3). The rank or dimension of a subset $J$ of $E$ is the number of elements of a maximal independent subset of $J$. An independent set of rank $r(M)$ is called a basis of $M$. The collection of all bases of $M$ is denoted by $\mathcal{B}$.

Example 5.2.3 Let $n$ and $k$ be non-negative integers such that $k \le n$. Let $U_{n,k}$ be a set consisting of $n$ elements and $\mathcal{I}_{n,k} = \{ I \subseteq U_{n,k} : |I| \le k \}$. Then $(U_{n,k}, \mathcal{I}_{n,k})$ is a matroid, called the uniform matroid of rank $k$ on $n$ elements. A subset $B$ of $U_{n,k}$ is a basis if and only if $|B| = k$. The matroid $U_{n,n}$ has no dependent sets and is called free.

Definition 5.2.4 Let $(E, \mathcal{I})$ be a matroid. An element $x$ in $E$ is called a loop if $\{x\}$ is a dependent set. Let $x$ and $y$ in $E$ be two distinct elements that are not loops. Then $x$ and $y$ are called parallel if $r(\{x, y\}) = 1$. The matroid is called simple if it has no loops and no parallel elements. Now $U_{n,r}$ is the only simple matroid of rank $r$ if $r \le 2$.

Remark 5.2.5 Let $G$ be a $k \times n$ matrix with entries in a field $\mathbb{F}$. Let $E$ be the set $[n]$ indexing the columns of $G$ and let $\mathcal{I}_G$ be the collection of all subsets $I$ of $E$ such that the columns of the submatrix $G_I$, consisting of the columns of $G$ at the positions of $I$, are independent. Then $M_G = (E, \mathcal{I}_G)$ is a matroid. Suppose that $\mathbb{F}$ is a finite field and that $G_1$ and $G_2$ are generator matrices of a code $C$; then $(E, \mathcal{I}_{G_1}) = (E, \mathcal{I}_{G_2})$. So the matroid $M_C = (E, \mathcal{I}_C)$ of a code $C$ is well defined as $(E, \mathcal{I}_G)$ for any generator matrix $G$ of $C$. If $C$ is degenerate, then there is a position $i$ such that $c_i = 0$ for every codeword $c \in C$, and all such positions correspond one-to-one to loops of $M_C$. Let $C$ be nondegenerate. Then $M_C$ has no loops, and the positions $i$ and $j$ with $i \neq j$ are parallel in $M_C$ if and only if the $i$-th column of $G$ is a scalar multiple of the $j$-th column. The code $C$ is projective if and only if the arrangement $\mathcal{A}_G$ is simple, if and only if the matroid $M_C$ is simple. An $[n,k]$ code $C$ is MDS if and only if the matroid $M_C$ is the uniform matroid $U_{n,k}$.
Definition 5.2.6 Let $M = (E, \mathcal{I})$ be a matroid. Let $\mathcal{B}$ be the collection of all bases of $M$. Define $B^\perp = E \setminus B$ for $B \in \mathcal{B}$, and $\mathcal{B}^\perp = \{ B^\perp : B \in \mathcal{B} \}$. Define
$$\mathcal{I}^\perp = \{ I \subseteq E : I \subseteq B \text{ for some } B \in \mathcal{B}^\perp \}.$$
Then $(E, \mathcal{I}^\perp)$ is called the dual matroid of $M$ and is denoted by $M^\perp$.

Remark 5.2.7 The dual matroid is indeed a matroid. Let $C$ be a code over a finite field. Then $(M_C)^\perp$ is isomorphic with $M_{C^\perp}$ as matroids. Let $e$ be a loop of the matroid $M$. Then $e$ is not a member of any basis of $M$, hence $e$ is in every basis of $M^\perp$. An element of $M$ is called an isthmus if it is an element of every basis of $M$. Hence $e$ is an isthmus of $M$ if and only if $e$ is a loop of $M^\perp$.

Proposition 5.2.8 Let $(E, \mathcal{I})$ be a matroid with rank function $r$. Then the dual matroid has rank function $r^\perp$ given by
$$r^\perp(J) = |J| - r(E) + r(E \setminus J).$$

Proof. The proof is based on the observations that $r(J) = \max_{B \in \mathcal{B}} |B \cap J|$, that $B \setminus J = B \cap (E \setminus J)$, and that $|B| = r(E)$ for every basis $B$:
$$r^\perp(J) = \max_{B \in \mathcal{B}^\perp} |B \cap J| = \max_{B \in \mathcal{B}} |(E \setminus B) \cap J| = \max_{B \in \mathcal{B}} |J \setminus B|$$
$$= |J| - \min_{B \in \mathcal{B}} |J \cap B| = |J| - \left( r(E) - \max_{B \in \mathcal{B}} |B \setminus J| \right)$$
$$= |J| - r(E) + \max_{B \in \mathcal{B}} |B \cap (E \setminus J)| = |J| - r(E) + r(E \setminus J).$$
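The rank formula of Proposition 5.2.8 can be tested exhaustively on a small vector matroid. A Python sketch (the matrix below is a hypothetical example of our own choosing):

```python
from itertools import combinations

# a hypothetical 3 x 5 matrix over F_2; M_G is its vector matroid
G = [(1,0,0,1,1), (0,1,0,1,0), (0,0,1,0,1)]
n, k = 5, 3
E = set(range(n))

def r(J):
    """Rank of the columns of G indexed by J, over F_2."""
    span, rank = {(0,) * k}, 0
    for j in J:
        c = tuple(row[j] for row in G)
        if c not in span:
            span |= {tuple((a + b) % 2 for a, b in zip(s, c)) for s in span}
            rank += 1
    return rank

bases = [set(B) for B in combinations(E, k) if r(B) == k]
dual_bases = [E - B for B in bases]                 # bases of the dual matroid

def r_dual(J):
    # rank in M*, straight from its bases: r*(J) = max |B* cap J|
    return max(len(B & set(J)) for B in dual_bases)

# Proposition 5.2.8: r*(J) = |J| - r(E) + r(E \ J) for every J
print(all(r_dual(J) == len(J) - r(E) + r(E - set(J))
          for t in range(n + 1) for J in combinations(E, t)))   # True
```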
5.2.2 Realizable matroids

Definition 5.2.9 Let $M_1 = (E_1, \mathcal{I}_1)$ and $M_2 = (E_2, \mathcal{I}_2)$ be matroids. A map $\varphi : E_1 \to E_2$ is called a morphism of matroids if $\varphi(I) \in \mathcal{I}_2$ for all $I \in \mathcal{I}_1$. The map is called an isomorphism of matroids if it is a morphism of matroids and there exists a map $\psi : E_2 \to E_1$ that is also a morphism of matroids and is the inverse of $\varphi$. The matroids are called isomorphic if there is an isomorphism of matroids between them.

Remark 5.2.10 A matroid $M$ is called realizable or representable over the field $\mathbb{F}$ if there exists a matrix $G$ with entries in $\mathbb{F}$ such that $M$ is isomorphic with $M_G$.
***six points in a plane is realizable over every field?***
The Fano plane is realizable over $\mathbb{F}$ if and only if $\mathbb{F}$ has characteristic two.
***Pappos, Desargues configuration.***

For more on representable matroids we refer to Tutte [123] and Whittle [132, 133]. Let $g_n$ be the number of simple matroids on $n$ points. The values of $g_n$ are determined for $n \le 8$ by [18] and are given in the following table:

n:    1  2  3  4  5  6   7    8
g_n:  1  1  2  4  9  26  101  950

Extended tables can be found in [51]. Clearly $g_n \le 2^{2^n}$. Asymptotically the number $g_n$ is given in [73] as follows:
$$\log_2 \log_2 g_n \le n - \log_2 n + O(\log_2 \log_2 n),$$
$$\log_2 \log_2 g_n \ge n - \tfrac{3}{2} \log_2 n + O(\log_2 \log_2 n).$$
A crude upper bound on the number of $k \times n$ matrices with $k \le n$ and entries in $\mathbb{F}_q$ is given by $(n+1) q^{n^2}$. Hence the vast majority of all matroids on $n$ elements is not representable over a given finite field for $n \to \infty$.

5.2.3 Graphs and matroids

Definition 5.2.11 Let $M = (E, \mathcal{I})$ be a matroid. A subset $C$ of $E$ is called a circuit if it is dependent and all its proper subsets are independent. A circuit of the dual matroid of $M$ is called a cocircuit of $M$.

Proposition 5.2.12 Let $\mathcal{C}$ be the collection of circuits of a matroid. Then:
(C.0) $\emptyset \notin \mathcal{C}$.
(C.1) If $C_1, C_2 \in \mathcal{C}$ and $C_1 \subseteq C_2$, then $C_1 = C_2$.
(C.2) If $C_1, C_2 \in \mathcal{C}$, $C_1 \neq C_2$ and $x \in C_1 \cap C_2$, then there exists a $C_3 \in \mathcal{C}$ such that $C_3 \subseteq (C_1 \cup C_2) \setminus \{x\}$.

Proof. See [?, Lemma 1.1.3].

Condition (C.2) is called the circuit elimination axiom. The converse of Proposition 5.2.12 also holds.

Proposition 5.2.13 Let $\mathcal{C}$ be a collection of subsets of a finite set $E$ that satisfies the conditions (C.0), (C.1) and (C.2). Let $\mathcal{I}$ be the collection of all subsets of $E$ that contain no member of $\mathcal{C}$. Then $(E, \mathcal{I})$ is a matroid with $\mathcal{C}$ as its collection of circuits.

Proof. See [?, Theorem 1.1.4].

Proposition 5.2.14 Let $\Gamma = (V, E)$ be a finite graph. Let $\mathcal{C}$ be the collection of all subsets $\{e_1, \ldots, e_t\}$ such that $(e_1, \ldots, e_t)$ is a cycle in $\Gamma$. Then $\mathcal{C}$ is the collection of circuits of a matroid $M_\Gamma$ on $E$. This matroid is called the cycle matroid of $\Gamma$.

Proof. See [?, Proposition 1.1.7].
Remark 5.2.15 Loops in $\Gamma$ correspond one-to-one to loops in $M_\Gamma$. Two edges that are not loops are parallel in $\Gamma$ if and only if they are parallel in $M_\Gamma$. So $\Gamma$ is simple if and only if $M_\Gamma$ is simple. Let $e$ be in $E$. Then $e$ is an isthmus in the graph $\Gamma$ if and only if $e$ is an isthmus in the matroid $M_\Gamma$.

Remark 5.2.16 A matroid $M$ is called graphic if $M$ is isomorphic with $M_\Gamma$ for some graph $\Gamma$, and it is called cographic if $M^\perp$ is graphic. If $\Gamma$ is a planar graph, then the matroid $M_\Gamma$ is graphic by definition, but it is also cographic.
Let $\Gamma$ be a finite graph with incidence matrix $I(\Gamma)$. This is a generator matrix for $C_\Gamma$ over a field $\mathbb{F}$. Suppose that $\mathbb{F}$ is the binary field. Look at all the columns indexed by the edges of a cycle of $\Gamma$. Since every vertex in a cycle is incident with exactly two edges of the cycle, the sum of these columns is zero and therefore they are dependent. Removing a column gives an independent set of vectors. Hence the circuits of the matroid $M_{C_\Gamma}$ coincide with the cycles in $\Gamma$. Therefore $M_\Gamma$ is isomorphic with $M_{C_\Gamma}$. One can generalize this argument to any field. Hence graphic matroids are representable over every field.
The matroid of the binary Hamming [7,4,3] code is neither graphic nor cographic. Clearly the matroids $M_{K_5}$ and $M_{K_{3,3}}$ are graphic by definition, but they are not cographic. Tutte [122] found a classification of graphic matroids.

5.2.4 Tutte and Whitney polynomial of a matroid

See [7, 8, 25, 26, 28, 34, 59, 68] for references for this section.

Definition 5.2.17 Let $M = (E, \mathcal{I})$ be a matroid. Then the Whitney rank generating function $R_M(X, Y)$ is defined by
$$R_M(X, Y) = \sum_{J \subseteq E} X^{r(E)-r(J)} Y^{|J|-r(J)},$$
and the Tutte-Whitney or Tutte polynomial by
$$t_M(X, Y) = \sum_{J \subseteq E} (X-1)^{r(E)-r(J)} (Y-1)^{|J|-r(J)}.$$
In other words, $t_M(X, Y) = R_M(X-1, Y-1)$.

Remark 5.2.18 Whitney [129] defined the coefficients $m_{ij}$ of the polynomial $R_M(X, Y)$ such that
$$R_M(X, Y) = \sum_{i=0}^{r(M)} \sum_{j=0}^{|M|} m_{ij} X^i Y^j,$$
but he did not define the polynomial $R_M(X, Y)$ as such. It is clear that these coefficients are nonnegative, since they count the numbers of elements of certain sets. The coefficients of the Tutte polynomial are also nonnegative, but this is not a trivial fact; it follows from the counting of certain internal and external bases of a matroid. See [56].
5.2.5 Weight enumerator and Tutte polynomial

As we have seen, we can interpret a linear $[n,k]$ code $C$ over $\mathbb{F}_q$ as a matroid via the columns of a generator matrix $G$.

Proposition 5.2.19 Let $C$ be an $[n,k]$ code over $\mathbb{F}_q$. Then the Tutte polynomial $t_C$ associated with the matroid $M_C$ of the code $C$ is
$$t_C(X, Y) = \sum_{t=0}^{n} \sum_{|J|=t} (X-1)^{l(J)} (Y-1)^{l(J)-(k-t)}.$$

Proof. This follows from $l(J) = k - r(J)$ by Lemma 4.4.12 and $r(M) = k$.

This formula and Proposition 4.4.41 suggest the following connection between the weight enumerator and the Tutte polynomial. Greene [59] was the first to notice this connection.

Theorem 5.2.20 Let $C$ be an $[n,k]$ code over $\mathbb{F}_q$ with generator matrix $G$. Then the following holds for the Tutte polynomial and the extended weight enumerator:
$$W_C(X, Y, T) = (X-Y)^k Y^{n-k}\, t_C\!\left( \frac{X + (T-1)Y}{X-Y}, \frac{X}{Y} \right).$$

Proof. By using Proposition 5.2.19 for the Tutte polynomial, rewriting, and Proposition 4.4.41, we get
$$(X-Y)^k Y^{n-k}\, t_C\!\left( \frac{X + (T-1)Y}{X-Y}, \frac{X}{Y} \right)$$
$$= (X-Y)^k Y^{n-k} \sum_{t=0}^{n} \sum_{|J|=t} \left( \frac{TY}{X-Y} \right)^{l(J)} \left( \frac{X-Y}{Y} \right)^{l(J)-(k-t)}$$
$$= (X-Y)^k Y^{n-k} \sum_{t=0}^{n} \sum_{|J|=t} T^{l(J)} Y^{k-t} (X-Y)^{-(k-t)}$$
$$= \sum_{t=0}^{n} \sum_{|J|=t} T^{l(J)} (X-Y)^t Y^{n-t} = W_C(X, Y, T).$$
We use the extended weight enumerator here, because extending a code does not change the generator matrix, and therefore does not change the matroid $M_G$.
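Both sides of Theorem 5.2.20 can be built from the same subset data $l(J)$ and compared symbolically. A minimal sympy sketch over a hypothetical [4,2] binary code of our own choosing:

```python
from itertools import combinations
from sympy import symbols, simplify

X, Y, T = symbols('X Y T')

G = [(1, 0, 1, 1), (0, 1, 1, 0)]   # a hypothetical [4,2] binary code
k, n = len(G), len(G[0])

def l(J):
    """l(J) = k - r(G_J), with the rank computed over F_2."""
    span, rank = {(0,) * k}, 0
    for j in J:
        c = tuple(row[j] for row in G)
        if c not in span:
            span |= {tuple((a + b) % 2 for a, b in zip(s, c)) for s in span}
            rank += 1
    return k - rank

subsets = [J for t in range(n + 1) for J in combinations(range(n), t)]
tutte = sum((X - 1) ** l(J) * (Y - 1) ** (l(J) - (k - len(J))) for J in subsets)
W = sum(T ** l(J) * (X - Y) ** len(J) * Y ** (n - len(J)) for J in subsets)

# Theorem 5.2.20: W_C(X,Y,T) = (X-Y)^k Y^(n-k) t_C((X+(T-1)Y)/(X-Y), X/Y)
sub = tutte.subs({X: (X + (T - 1) * Y) / (X - Y), Y: X / Y}, simultaneous=True)
print(simplify((X - Y) ** k * Y ** (n - k) * sub - W))   # 0
```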
Proof. The proof of this theorem is analogous to the proof of the previous theorem:
$$Y^n (Y-1)^{-k}\, W_C(1,\, Y^{-1},\, (X-1)(Y-1))$$
$$= Y^n (Y-1)^{-k} \sum_{t=0}^{n} \sum_{|J|=t} \big((X-1)(Y-1)\big)^{l(J)} (1-Y^{-1})^t\, Y^{-(n-t)}$$
$$= \sum_{t=0}^{n} \sum_{|J|=t} (X-1)^{l(J)} (Y-1)^{l(J)}\, Y^{-t}(Y-1)^t\, Y^{-(n-t)}\, Y^n (Y-1)^{-k}$$
$$= \sum_{t=0}^{n} \sum_{|J|=t} (X-1)^{l(J)} (Y-1)^{l(J)-(k-t)} = t_C(X,Y). \qquad \square$$

We see that the Tutte polynomial depends on two variables, while the extended weight enumerator depends on three variables. This is no problem, because the weight enumerator is given in its homogeneous form here: we can view the extended weight enumerator as a polynomial in two variables via $W_C(Z,T) = W_C(1, Z, T)$.

Greene [59] already showed that the Tutte polynomial determines the weight enumerator, but not the other way round. By using the extended weight enumerator, we get a two-way equivalence and the proof reduces to rewriting.

We can also give expressions for the generalized weight enumerators in terms of the Tutte polynomial, and the other way round. The first formula was found by Britz [28] and independently by Jurrius [68].

Theorem 5.2.22 For the generalized weight enumerator of an $[n,k]$ code $C$ and the associated Tutte polynomial we have that
$$W_C^{(r)}(X,Y) = \frac{1}{\langle r \rangle_q} \sum_{j=0}^{r} \begin{bmatrix} r \\ j \end{bmatrix}_q (-1)^{r-j}\, q^{\binom{r-j}{2}}\, (X-Y)^k Y^{n-k}\, t_C\!\left(\frac{X+(q^j-1)Y}{X-Y},\, \frac{X}{Y}\right).$$
And, conversely,
$$t_C(X,Y) = Y^n (Y-1)^{-k} \sum_{r=0}^{k} \left( \prod_{j=0}^{r-1} \big( (X-1)(Y-1) - q^j \big) \right) W_C^{(r)}(1, Y^{-1}).$$

Proof. For the first formula, use Theorems 4.5.23 and 5.2.20. Use Theorems 4.5.21 and 5.2.21 for the second formula. $\square$

5.2.6 Deletion and contraction of matroids

Definition 5.2.23 Let $M = (E, \mathcal{I})$ be a matroid of rank $k$. Let $e$ be an element of $E$. Then the deletion $M \setminus e$ is the matroid on the set $E \setminus \{e\}$ with independent sets of the form $I \setminus \{e\}$ where $I$ is independent in $M$. The contraction $M/e$ is the matroid on the set $E \setminus \{e\}$ with independent sets of the form $I \setminus \{e\}$ where $I$ is independent in $M$ and $e \in I$.
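The following added illustration makes Definition 5.2.23 concrete on the matroid of three parallel elements used above, and anticipates the deletion--contraction formula of Proposition 5.2.26. Let $M$ be the matroid on $E = \{1,2,3\}$ of rank 1 in which all elements are parallel, and let $e = 3$; then $e$ is neither a loop nor an isthmus. The deletion $M \setminus e$ consists of two parallel elements and the contraction $M/e$ consists of two loops, so
$$t_{M \setminus e}(X,Y) = (X-1) + 2 + (Y-1) = X + Y, \qquad t_{M/e}(X,Y) = 1 + 2(Y-1) + (Y-1)^2 = Y^2,$$
and indeed $t_{M \setminus e}(X,Y) + t_{M/e}(X,Y) = X + Y + Y^2 = t_M(X,Y)$.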
Remark 5.2.24 Let $M$ be a graphic matroid, so $M = M_\Gamma$ for some finite graph $\Gamma$. Let $e$ be an edge of $\Gamma$; then $M \setminus e = M_{\Gamma \setminus e}$ and $M/e = M_{\Gamma/e}$.

Remark 5.2.25 Let $C$ be a code with generator matrix $G$ that is reduced at position $e$, so $a = (1, 0, \ldots, 0)^T$ is the column of $G$ at position $e$. Then $M \setminus e = M_{G \setminus a}$ and $M/e = M_{G/a}$.

A puncturing--shortening formula for the extended weight enumerator is given in Proposition 4.4.44. Since the extended weight enumerator and the Tutte polynomial of a code determine each other by Theorems 5.2.20 and 5.2.21, one expects that an analogous statement holds for the Tutte polynomial of matroids.

Proposition 5.2.26 Let $M = (E, \mathcal{I})$ be a matroid. Let $e \in E$ be an element that is neither a loop nor an isthmus. Then the following deletion--contraction formula holds:
$$t_M(X,Y) = t_{M \setminus e}(X,Y) + t_{M/e}(X,Y).$$

Proof. See [119, 120, 125, 31]. $\square$

5.2.7 MacWilliams type property for duality

For both codes and matroids we defined the dual structure. These objects obviously completely determine their duals. But how about the various polynomials associated to a code and a matroid? We know from Example 4.5.17 that the weight enumerator is a weaker invariant for a code than the code itself: there are non-equivalent codes with the same weight enumerator. So it is a priori not clear that the weight enumerator of a code completely determines the weight enumerator of its dual code. We already saw that there is in fact such a relation, namely the MacWilliams identity of Theorem 4.1.22. We will give a proof of this relation by considering the more general question for the extended weight enumerator.

We will prove the MacWilliams identities using the Tutte polynomial. We do this because of the following simple and very useful relation between the Tutte polynomial of a matroid and that of its dual.

Theorem 5.2.27 Let $t_M(X,Y)$ be the Tutte polynomial of a matroid $M$, and let $M^\perp$ be the dual matroid. Then
$$t_M(X,Y) = t_{M^\perp}(Y,X).$$

Proof. Let $M$ be a matroid on the set $E$. Then $M^\perp$ is a matroid on the same set. In Proposition 5.2.8 we proved $r^\perp(J) = |J| - r(E) + r(E \setminus J)$. In particular, $r^\perp(E) + r(E) = |E|$. Substituting this relation into the definition of the Tutte polynomial of the dual matroid gives
$$t_{M^\perp}(X,Y) = \sum_{J \subseteq E} (X-1)^{r^\perp(E)-r^\perp(J)}\, (Y-1)^{|J|-r^\perp(J)}$$
$$= \sum_{J \subseteq E} (X-1)^{r^\perp(E)-|J|+r(E)-r(E \setminus J)}\, (Y-1)^{r(E)-r(E \setminus J)}$$
$$= \sum_{J \subseteq E} (X-1)^{|E \setminus J|-r(E \setminus J)}\, (Y-1)^{r(E)-r(E \setminus J)} = t_M(Y,X).$$
In the last step we use that the summation over all $J \subseteq E$ is the same as the summation over all $E \setminus J \subseteq E$. This proves the theorem. $\square$

If we consider a code as a matroid, then the dual matroid is the dual code. Therefore we can use the above theorem to prove the MacWilliams relations. Greene [59] was the first to use this idea; see also Brylawski and Oxley [33].

Theorem 5.2.28 (MacWilliams) Let $C$ be a code and let $C^\perp$ be its dual. Then the extended weight enumerator of $C$ completely determines the extended weight enumerator of $C^\perp$ and vice versa, via the following formula:
$$W_{C^\perp}(X,Y,T) = T^{-k}\, W_C\big(X + (T-1)Y,\; X-Y,\; T\big).$$

Proof. Let $M_C$ be the matroid associated to the code. Using the previous theorem and the relation between the weight enumerator and the Tutte polynomial, we find
$$T^{-k}\, W_C(X+(T-1)Y,\, X-Y,\, T) = T^{-k} (TY)^k (X-Y)^{n-k}\, t_C\!\left(\frac{X}{Y},\, \frac{X+(T-1)Y}{X-Y}\right)$$
$$= Y^k (X-Y)^{n-k}\, t_{C^\perp}\!\left(\frac{X+(T-1)Y}{X-Y},\, \frac{X}{Y}\right) = W_{C^\perp}(X,Y,T).$$
Notice in the last step that $\dim C^\perp = n-k$, and $n-(n-k) = k$. $\square$

We can use the relations of Theorems 4.5.21 and 4.5.23 to prove the MacWilliams identities for the generalized weight enumerators.

Theorem 5.2.29 Let $C$ be a code and let $C^\perp$ be its dual. Then the generalized weight enumerators of $C$ completely determine the generalized weight enumerators of $C^\perp$ and vice versa, via the following formula:
$$W_{C^\perp}^{(r)}(X,Y) = \sum_{j=0}^{r} \sum_{l=0}^{j} (-1)^{r-j}\, \frac{q^{\binom{r-j}{2}-j(r-j)-l(j-l)-jk}}{\langle r-j \rangle_q\, \langle j-l \rangle_q}\; W_C^{(l)}\big(X+(q^j-1)Y,\; X-Y\big).$$

Proof. We write the generalized weight enumerator in terms of the extended weight enumerator, use the MacWilliams identities for the extended weight
enumerator, and convert back to the generalized weight enumerator:
$$W_{C^\perp}^{(r)}(X,Y) = \frac{1}{\langle r \rangle_q} \sum_{j=0}^{r} \begin{bmatrix} r \\ j \end{bmatrix}_q (-1)^{r-j}\, q^{\binom{r-j}{2}}\, W_{C^\perp}(X,Y,q^j)$$
$$= \sum_{j=0}^{r} (-1)^{r-j}\, \frac{q^{\binom{r-j}{2}-j(r-j)}}{\langle j \rangle_q \langle r-j \rangle_q}\, q^{-jk}\, W_C\big(X+(q^j-1)Y,\, X-Y,\, q^j\big)$$
$$= \sum_{j=0}^{r} (-1)^{r-j}\, \frac{q^{\binom{r-j}{2}-j(r-j)-jk}}{\langle j \rangle_q \langle r-j \rangle_q} \sum_{l=0}^{j} \frac{\langle j \rangle_q\, q^{-l(j-l)}}{\langle j-l \rangle_q}\, W_C^{(l)}\big(X+(q^j-1)Y,\, X-Y\big)$$
$$= \sum_{j=0}^{r} \sum_{l=0}^{j} (-1)^{r-j}\, \frac{q^{\binom{r-j}{2}-j(r-j)-l(j-l)-jk}}{\langle r-j \rangle_q\, \langle j-l \rangle_q}\, W_C^{(l)}\big(X+(q^j-1)Y,\, X-Y\big). \qquad \square$$

This theorem was proved by Kløve [72], although that proof uses only half of the relations between the generalized weight enumerator and the extended weight enumerator. Using both makes the proof much shorter.

5.2.8 Exercises

5.2.1 Give a proof of the statements in Remark 5.2.2.

5.2.2 Give a proof of the statements in Remark 5.2.7.

5.2.3 Show that all matroids on at most 3 elements are graphic. Give an example of a matroid that is not graphic.

5.3 Geometric lattices and codes

***Intro***

5.3.1 Posets, the Möbius function and lattices

Definition 5.3.1 Let $L$ be a set and $\leq$ a relation on $L$ such that:
(PO.1) $x \leq x$ for all $x$ in $L$ (reflexive),
(PO.2) if $x \leq y$ and $y \leq x$, then $x = y$, for all $x, y$ in $L$ (anti-symmetric),
(PO.3) if $x \leq y$ and $y \leq z$, then $x \leq z$, for all $x, y$ and $z$ in $L$ (transitive).
Then the pair $(L, \leq)$, or just $L$, is called a poset with partial order $\leq$ on the set $L$. Define $x < y$ if $x \leq y$ and $x \neq y$. The elements $x$ and $y$ in $L$ are comparable if $x \leq y$ or $y \leq x$. A poset $L$ is called a linear order if every two elements are comparable. Define $L^x = \{\, y \in L \mid x \leq y \,\}$ and $L_x = \{\, y \in L \mid y \leq x \,\}$, and the interval between $x$ and $y$ by $[x,y] = \{\, z \in L \mid x \leq z \leq y \,\}$. Notice that $[x,y] = L^x \cap L_y$.
Definition 5.3.2 Let $(L, \leq)$ be a poset. A chain of length $r$ from $x$ to $y$ in $L$ is a sequence of elements $x_0, x_1, \ldots, x_r$ in $L$ such that
$$x = x_0 < x_1 < \cdots < x_r = y.$$
Let $r$ be a number and let $x, y \in L$. Then $c_r(x,y)$ denotes the number of chains of length $r$ from $x$ to $y$. Now $c_r(x,y)$ is finite if $L$ is finite. The poset is called locally finite if $c_r(x,y)$ is finite for all $x, y \in L$ and every number $r$.

Proposition 5.3.3 Let $L$ be a locally finite poset and let $x, y \in L$. Then
(C.0) $c_r(x,y) = 0$ for all $r$ if $x$ and $y$ are not comparable.
(C.1) $c_0(x,x) = 1$, $c_r(x,x) = 0$ for all $r > 0$, and $c_0(x,y) = 0$ if $x < y$.
(C.2) $c_{r+1}(x,y) = \sum_{x \leq z < y} c_r(x,z) = \sum_{x < z \leq y} c_r(z,y)$.

Proof. Statements (C.0) and (C.1) are trivial. Let $z < y$ and let $x = x_0 < x_1 < \cdots < x_r = z$ be a chain of length $r$ from $x$ to $z$; then $x = x_0 < x_1 < \cdots < x_r < x_{r+1} = y$ is a chain of length $r+1$ from $x$ to $y$, and every chain of length $r+1$ from $x$ to $y$ is obtained uniquely in this way. Hence $c_{r+1}(x,y) = \sum_{x \leq z < y} c_r(x,z)$. The last equality is proved similarly. $\square$

Definition 5.3.4 The Möbius function of $L$, denoted by $\mu_L$ or $\mu$, is defined by
$$\mu(x,y) = \sum_{r=0}^{\infty} (-1)^r\, c_r(x,y).$$

Proposition 5.3.5 Let $L$ be a locally finite poset. Then for all $x, y$ in $L$:
(M.0) $\mu(x,y) = 0$ if $x$ and $y$ are not comparable.
(M.1) $\mu(x,x) = 1$.
(M.2) If $x < y$, then $\sum_{x \leq z \leq y} \mu(x,z) = \sum_{x \leq z \leq y} \mu(z,y) = 0$.
(M.3) If $x < y$, then $\mu(x,y) = -\sum_{x \leq z < y} \mu(x,z) = -\sum_{x < z \leq y} \mu(z,y)$.

Proof. (M.0) and (M.1) follow from (C.0) and (C.1), respectively, of Proposition 5.3.3. (M.2) is clearly equivalent with (M.3).
(M.3) If $x < y$, then $c_0(x,y) = 0$. So
$$\mu(x,y) = \sum_{r=1}^{\infty} (-1)^r c_r(x,y) = \sum_{r=0}^{\infty} (-1)^{r+1} c_{r+1}(x,y) = -\sum_{r=0}^{\infty} (-1)^r \sum_{x \leq z < y} c_r(x,z)$$
$$= -\sum_{x \leq z < y} \sum_{r=0}^{\infty} (-1)^r c_r(x,z) = -\sum_{x \leq z < y} \mu(x,z).$$
The first and last equality use the definition of $\mu$. The second equality starts counting at $r = 0$ instead of $r = 1$, the third uses (C.2) of Proposition 5.3.3, and in the fourth the order of summation is interchanged. $\square$
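The recursion (M.1)/(M.3) is easy to carry out by machine. The following Python sketch is an added illustration (the poset, given by its order relation, and the helper names are chosen for the example only); it computes $\mu(1,d)$ for the divisors of 12 ordered by divisibility.

    # A sketch of the recursion of Remark 5.3.6 below: compute the Moebius
    # function of a finite poset from (M.1) and (M.3). As an illustration we
    # take the divisors of 12 with divisibility as partial order.
    from functools import lru_cache

    elements = [1, 2, 3, 4, 6, 12]

    def leq(x, y):               # x <= y in the divisibility order
        return y % x == 0

    @lru_cache(maxsize=None)
    def mu(x, y):
        if x == y:
            return 1             # (M.1)
        if not leq(x, y):
            return 0             # (M.0)
        # (M.3): mu(x, y) = - sum of mu(x, z) over all z with x <= z < y
        return -sum(mu(x, z) for z in elements
                    if leq(x, z) and leq(z, y) and z != y)

    print([(d, mu(1, d)) for d in elements])
    # [(1, 1), (2, -1), (3, -1), (4, 0), (6, 1), (12, 0)]

The printed values agree with the classical Möbius function of Example 5.3.20 below.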
Remark 5.3.6 (M.1) and (M.3) of Proposition 5.3.5 can be used as an alternative way to compute $\mu(x,y)$ by induction.

Definition 5.3.7 Let $L$ be a poset. If $L$ has an element $0_L$ such that $0_L$ is the unique minimal element of $L$, then $0_L$ is called the minimum of $L$. Similarly $1_L$ is called the maximum of $L$ if $1_L$ is the unique maximal element of $L$. If $x, y \in L$ and $x \leq y$, then the interval $[x,y]$ has $x$ as minimum and $y$ as maximum. Suppose that $L$ has $0_L$ and $1_L$ as minimum and maximum, also denoted by $0$ and $1$, respectively. Then $0 \leq x \leq 1$ for all $x \in L$. Define $\mu(x) = \mu(0,x)$, and $\mu(L) = \mu(0,1)$ if $L$ is finite.

Definition 5.3.8 Let $L$ be a locally finite poset with a minimum element. Let $A$ be an abelian group and $f : L \to A$ a map from $L$ to $A$. The sum function $\hat{f}$ of $f$ is defined by
$$\hat{f}(x) = \sum_{y \leq x} f(y).$$
Define similarly the sum function $\check{f}$ of $f$ by $\check{f}(x) = \sum_{x \leq y} f(y)$ if $L$ is a locally finite poset with a maximum element.

Remark 5.3.9 A poset $L$ is locally finite if and only if $[x,y]$ is finite for all $x \leq y$ in $L$. So $[0,x]$ is finite if $L$ is a locally finite poset with minimum element $0$. Hence the sum function $\hat{f}(x)$ is well-defined, since it is a finite sum of $f(y)$ in $A$ with $y$ in $[0,x]$. In the same way $\check{f}(x)$ is well-defined, since $[x,1]$ is finite.

Theorem 5.3.10 (Möbius inversion formula) Let $L$ be a locally finite poset with a minimum element. Then
$$f(x) = \sum_{y \leq x} \mu(y,x)\, \hat{f}(y).$$
Similarly $f(x) = \sum_{x \leq y} \mu(x,y)\, \check{f}(y)$ if $L$ is a locally finite poset with a maximum element.

Proof. Let $x$ be an element of $L$. Then
$$\sum_{y \leq x} \mu(y,x) \hat{f}(y) = \sum_{y \leq x} \sum_{z \leq y} \mu(y,x) f(z) = \sum_{z \leq x} f(z) \sum_{z \leq y \leq x} \mu(y,x)$$
$$= f(x)\mu(x,x) + \sum_{z < x} f(z) \sum_{z \leq y \leq x} \mu(y,x) = f(x).$$
The first equality uses the definition of $\hat{f}(y)$. In the second equality the order of summation is interchanged. In the third equality the first summation is split into the parts $z = x$ and $z < x$, respectively. Finally $\mu(x,x) = 1$ and the second summation is zero for all $z < x$ by Proposition 5.3.5. The proof of the second statement is similar. $\square$

Example 5.3.11 Let $f(x) = 1$ if $x = 0$ and $f(x) = 0$ otherwise. Then the sum function $\hat{f}(x) = \sum_{y \leq x} f(y)$ is constant $1$ for all $x$. The Möbius inversion formula gives that
$$\sum_{y \leq x} \mu(y) = \begin{cases} 1 & \text{if } x = 0, \\ 0 & \text{if } x > 0, \end{cases}$$
which is a special case of Proposition 5.3.5.
Remark 5.3.12 Let $(L, \leq)$ be a poset. Let $\leq^R$ be the reverse relation on $L$ defined by $x \leq^R y$ if and only if $y \leq x$. Then $(L, \leq^R)$ is a poset. Suppose that $(L, \leq)$ is locally finite with Möbius function $\mu$. Then the number of chains of length $r$ from $x$ to $y$ in $(L, \leq^R)$ is the same as the number of chains of length $r$ from $y$ to $x$ in $(L, \leq)$. Hence $(L, \leq^R)$ is locally finite with Möbius function $\mu^R$ such that $\mu^R(x,y) = \mu(y,x)$. If $(L, \leq)$ has minimum $0_L$ or maximum $1_L$, then $(L, \leq^R)$ has minimum $1_L$ or maximum $0_L$, respectively.

Definition 5.3.13 Let $L$ be a poset. Let $x, y \in L$. Then $y$ is called a cover of $x$ if $x < y$ and there is no $z$ such that $x < z < y$. The Hasse diagram of $L$ is a directed graph that has the elements of $L$ as vertices, and there is a directed edge from $y$ to $x$ if and only if $y$ is a cover of $x$.

***picture***

Example 5.3.14 Let $L = \mathbb{Z}$ be the set of integers with the usual linear order. Let $x, y \in L$ and $x \leq y$. Then $c_0(x,x) = 1$, $c_0(x,y) = 0$ if $x < y$, and $c_r(x,y) = \binom{y-x-1}{r-1}$ for all $r \geq 1$. So $L$ is infinite and locally finite. Furthermore $\mu(x,x) = 1$, $\mu(x,x+1) = -1$ and $\mu(x,y) = 0$ if $y > x+1$.

Definition 5.3.15 Let $L$ be a poset and let $x, y \in L$. Then $x$ and $y$ have a least upper bound if there is a $z \in L$ such that $x \leq z$ and $y \leq z$, and such that whenever $x \leq w$ and $y \leq w$, then $z \leq w$, for all $w \in L$. If $x$ and $y$ have a least upper bound, then such an element is unique; it is called the join of $x$ and $y$ and denoted by $x \vee y$. Similarly the greatest lower bound of $x$ and $y$ is defined. If it exists, then it is unique; it is called the meet of $x$ and $y$ and denoted by $x \wedge y$. A poset $L$ is called a lattice if $x \vee y$ and $x \wedge y$ exist for all $x, y$ in $L$.

Remark 5.3.16 Let $(L, \leq)$ be a finite poset with maximum $1$ such that $x \wedge y$ exists for all $x, y \in L$. The collection $\{\, z \mid x \leq z,\ y \leq z \,\}$ is finite and not empty, since it contains $1$. The meet of all the elements in this collection is well defined and is given by
$$x \vee y = \bigwedge \{\, z \mid x \leq z,\ y \leq z \,\}.$$
Hence $L$ is a lattice. Similarly $L$ is a lattice if $L$ is a finite poset with minimum $0$ such that $x \vee y$ exists for all $x, y \in L$, since $x \wedge y = \bigvee \{\, z \mid z \leq x,\ z \leq y \,\}$.

Example 5.3.17 Let $L$ be the collection of all finite subsets of a given set $X$. Let $\leq$ be defined by inclusion, that is, $I \leq J$ if and only if $I \subseteq J$. Then $0_L = \emptyset$, and $L$ has a maximum if and only if $X$ is finite, in which case $1_L = X$. Let $I, J \in L$ and $I \leq J$. Then $|I| \leq |J| < \infty$. Let $m = |J| - |I|$. Then
$$c_r(I,J) = \sum_{0 < m_1 < m_2 < \cdots < m_{r-1} < m} \binom{m_2}{m_1} \binom{m_3}{m_2} \cdots \binom{m}{m_{r-1}}.$$
Hence $L$ is locally finite, and $L$ is finite if and only if $X$ is finite. Furthermore $I \vee J = I \cup J$ and $I \wedge J = I \cap J$. So $L$ is a lattice. Using Remark 5.3.6 we see that $\mu(I,J) = (-1)^{|J|-|I|}$ if $I \leq J$. This is much easier than computing $\mu(I,J)$ by means of Definition 5.3.4.
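As an added numerical check of the formulas in Example 5.3.17, take $X = \{1,2\}$, $I = \emptyset$ and $J = X$, so $m = 2$. For $r = 1$ the sum is empty and $c_1(I,J) = 1$; for $r = 2$,
$$c_2(I,J) = \sum_{0 < m_1 < 2} \binom{2}{m_1} = \binom{2}{1} = 2,$$
corresponding to the two chains $\emptyset \subset \{1\} \subset X$ and $\emptyset \subset \{2\} \subset X$; and $c_r(I,J) = 0$ for $r > 2$. Hence $\mu(I,J) = -c_1 + c_2 = -1 + 2 = 1 = (-1)^{|J|-|I|}$, as claimed.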
Example 5.3.18 Now suppose that $X = \{1, \ldots, n\}$. Let $L$ be the poset of subsets of $X$. Let $A_1, \ldots, A_n$ be a collection of subsets of a finite set $A$. Define for a subset $J$ of $X$
$$A_J = \bigcap_{j \in J} A_j \quad \text{and} \quad f(J) = \Big|\, A_J \setminus \bigcup_{J \subsetneq I} A_I \,\Big|.$$
Then $A_J$ is the disjoint union of the subsets $A_I \setminus (\bigcup_{I \subsetneq K} A_K)$ for all $I$ with $J \subseteq I$. Hence the sum function
$$\check{f}(J) = \sum_{J \subseteq I} f(I) = \sum_{J \subseteq I} \Big|\, A_I \setminus \bigcup_{I \subsetneq K} A_K \,\Big| = |A_J|.$$
Möbius inversion gives that
$$\Big|\, A_J \setminus \bigcup_{J \subsetneq I} A_I \,\Big| = \sum_{J \subseteq I} (-1)^{|I|-|J|}\, |A_I|,$$
which is called the principle of inclusion/exclusion.

Example 5.3.19 A variant of the principle of inclusion/exclusion is given as follows. Let $H_1, \ldots, H_n$ be a collection of subsets of a finite set $H$. Let $L$ be the poset of all intersections of the $H_j$ with the reverse inclusion as partial order. Then $H$ is the minimum of $L$ and $H_1 \cap \cdots \cap H_n$ is the maximum of $L$. Let $x \in L$. Define
$$f(x) = \Big|\, x \setminus \bigcup_{x < y} y \,\Big|.$$
Then
$$\check{f}(x) = \sum_{x \leq y} f(y) = \sum_{x \leq y} \Big|\, y \setminus \bigcup_{y < z} z \,\Big| = |x|.$$
Hence
$$\Big|\, x \setminus \bigcup_{x < y} y \,\Big| = \sum_{x \leq y} \mu(x,y)\, |y|.$$

Example 5.3.20 Let $L = \mathbb{N}$ be the set of positive integers with the divisibility relation as partial order. Then $0_L = 1$ is the minimum of $L$; $L$ is locally finite and it has no maximum. Now $m \vee n = \operatorname{lcm}(m,n)$ and $m \wedge n = \gcd(m,n)$. Hence $L$ is a lattice. By Remark 5.3.6 we see that
$$\mu(n) = \begin{cases} 1 & \text{if } n = 1, \\ (-1)^r & \text{if } n \text{ is the product of } r \text{ mutually distinct primes}, \\ 0 & \text{if } n \text{ is divisible by the square of a prime}. \end{cases}$$
Hence $\mu(n)$ is the classical Möbius function. Furthermore $\mu(d,n) = \mu(n/d)$ if $d \mid n$. Let $\varphi(n) = |\{\, i \in \mathbb{N} \mid i \leq n,\ \gcd(i,n) = 1 \,\}|$ be Euler's $\varphi$ function. Define
$$V_d = \{\, i \in \{1, \ldots, n\} \mid \gcd(i,n) = n/d \,\}$$
for $d \mid n$.
Then
$$V_d = \{\, i \in \{1, \ldots, d\} \mid \gcd(i,d) = 1 \,\} \cdot \frac{n}{d},$$
so $|V_d| = \varphi(d)$. Now $\{1, \ldots, n\}$ is the disjoint union of the subsets $V_d$ with $d \mid n$. Hence the sum function of $\varphi(n)$ is given by
$$\hat{\varphi}(n) = \sum_{d \mid n} \varphi(d) = n.$$
Therefore
$$\varphi(n) = \sum_{d \mid n} \mu(d)\, \frac{n}{d}$$
by Möbius inversion.

Definition 5.3.21 Let $(L_1, \leq_1)$ and $(L_2, \leq_2)$ be posets. A map $\varphi : L_1 \to L_2$ is called monotone if $\varphi(x) \leq_2 \varphi(y)$ for all $x \leq_1 y$ in $L_1$. The map $\varphi$ is called strictly monotone if $\varphi(x) <_2 \varphi(y)$ for all $x <_1 y$ in $L_1$. The map is called an isomorphism of posets if it is strictly monotone and there exists a strictly monotone map $\psi : L_2 \to L_1$ that is the inverse of $\varphi$. The posets are called isomorphic if there is an isomorphism of posets between them.

Remark 5.3.22 If $\varphi : L_1 \to L_2$ is an isomorphism between locally finite posets with a minimum, then $\mu_2(\varphi(x), \varphi(y)) = \mu_1(x,y)$ for all $x, y$ in $L_1$. If $(L_1, \leq_1)$ and $(L_2, \leq_2)$ are isomorphic posets and $L_1$ is a lattice, then $L_2$ is also a lattice.

Example 5.3.23 Let $n$ be a positive integer that is the product of $r$ mutually distinct primes $p_1, \ldots, p_r$. Let $L_1$ be the set of all positive integers that divide $n$, with divisibility as partial order $\leq_1$ as in Example 5.3.20. Let $L_2$ be the collection of all subsets of $\{1, \ldots, r\}$ with inclusion as partial order $\leq_2$ as in Example 5.3.17. Define the maps $\varphi : L_1 \to L_2$ and $\psi : L_2 \to L_1$ by
$$\varphi(d) = \{\, i \mid p_i \text{ divides } d \,\} \quad \text{and} \quad \psi(x) = \prod_{i \in x} p_i.$$
Then $\varphi$ and $\psi$ are strictly monotone and they are inverses of each other. Hence $L_1$ and $L_2$ are isomorphic lattices.

5.3.2 Geometric lattices

Remark 5.3.24 Let $(L, \leq)$ be a lattice without infinite chains. Then $L$ has a minimum and a maximum.

Definition 5.3.25 Let $L$ be a lattice with minimum $0$. An atom is an element $a \in L$ that is a cover of $0$. A lattice is called atomic if for every $x > 0$ in $L$ there exist atoms $a_1, \ldots, a_r$ such that $x = a_1 \vee \cdots \vee a_r$; the minimum possible $r$ is called the rank of $x$ and is denoted by $r_L(x)$, or $r(x)$ for short. A lattice is called semimodular if for all mutually distinct $x, y$ in $L$, $x \vee y$ covers $x$ and $y$ whenever there exists a $z$ such that $x$ and $y$ cover $z$. A lattice is called modular if $x \vee (y \wedge z) = (x \vee y) \wedge z$ for all $x, y$ and $z$ in $L$ such that $x \leq z$. A lattice $L$ is called a geometric lattice if it is atomic and semimodular and has no infinite chains. If $L$ is a geometric lattice, then it has a minimum and a maximum, and $r(1)$ is called the rank of $L$, denoted by $r(L)$.
Example 5.3.26 Let $L$ be the collection of all finite subsets of a given set $X$ as in Example 5.3.17. The atoms are the singleton sets, that is, subsets consisting of exactly one element of $X$. Every $x \in L$ is the finite union of its singleton subsets. So $L$ is atomic and $r(x) = |x|$. Now $y$ covers $x$ if and only if there is an element $Q$ not in $x$ such that $y = x \cup \{Q\}$. If $x \neq y$ and $x$ and $y$ both cover $z$, then there is an element $P$ not in $z$ such that $x = z \cup \{P\}$, and there is an element $Q$ not in $z$ such that $y = z \cup \{Q\}$. Now $P \neq Q$, since $x \neq y$. Hence $x \vee y = z \cup \{P, Q\}$ covers $x$ and $y$. Hence $L$ is semimodular; in fact $L$ is modular. $L$ is locally finite. $L$ is a geometric lattice if and only if $X$ is finite.

Example 5.3.27 Let $L$ be the set of positive integers with the divisibility relation as in Example 5.3.20. The atoms of $L$ are the primes. But $L$ is not atomic, since a square is not the join of finitely many atoms. $L$ is semimodular. The interval $[1,n]$ in $L$ is a geometric lattice if and only if $n$ is square free. If $n$ is square free and $m \leq n$, then $r(m) = r$ if and only if $m$ is the product of $r$ mutually distinct primes.

Proposition 5.3.28 Let $L$ be a geometric lattice. Then for all $x, y \in L$:
(GL.1) If $x < y$, then $r(x) < r(y)$ (strictly monotone).
(GL.2) $r(x \vee y) + r(x \wedge y) \leq r(x) + r(y)$ (semimodular inequality).
(GL.3) All maximal chains from $0$ to $x$ have the same length $r(x)$.

Proof. See [113, Prop. 3.3.2] and [114, Prop. 3.7]. $\square$

Remark 5.3.29 Let $L$ be an atomic lattice. Then $L$ is semimodular if and only if the semimodular inequality (GL.2) holds for all $x, y \in L$. And $L$ is modular if and only if the modular equality
$$r(x \vee y) + r(x \wedge y) = r(x) + r(y)$$
holds for all $x, y \in L$.

Remark 5.3.30 Let $L$ be a geometric lattice. Let $x, y \in L$ and $x \leq y$. The chain $x = y_0 < y_1 < \cdots < y_s = y$ from $x$ to $y$ is called an extension of the chain $x = x_0 < x_1 < \cdots < x_r = y$ if $\{x_0, x_1, \ldots, x_r\}$ is a subset of $\{y_0, y_1, \ldots, y_s\}$. A chain from $x$ to $y$ is called maximal if there is no extension to a longer chain from $x$ to $y$. Every chain from $x$ to $y$ can be extended to a maximal chain with the same end points, and all such maximal chains have the same length $r(y) - r(x)$. This is called the Jordan--Hölder property.

Remark 5.3.31 Let $L$ be a geometric lattice. Let $L_j = \{\, x \in L \mid r(x) = j \,\}$. Then $L_j$ is called a level of $L$. The Hasse diagram of $L$ is a graph that has the elements of $L$ as vertices; if $x, y \in L$, $x < y$ and $r(y) = r(x) + 1$, then $x$ and $y$ are connected by an edge. So only elements of two consecutive levels $L_j$ and $L_{j+1}$ are connected by an edge. The Hasse diagram of $L$ considered as a poset as in Definition 5.3.13 is the directed graph with an arrow from $y$ to $x$ if $x, y \in L$, $x < y$ and $r(y) = r(x) + 1$.

***picture***
Remark 5.3.32 Let $L$ be a geometric lattice. Then $L^x$ is a geometric lattice with $x$ as minimum element and of rank $r_L(1) - r_L(x)$, and $\mu_{L^x}(y) = \mu(x,y)$ and $r_{L^x}(y) = r_L(y) - r_L(x)$ for all $x \in L$ and $y \in L^x$. Similar remarks hold for $L_x$ and $[x,y]$.

Example 5.3.33 Let $L$ be the collection of all linear subspaces of a given finite dimensional vector space $V$ over a field $\mathbb{F}$, with inclusion as partial order. Then $0_L = \{0\}$ is the minimum and $1_L = V$ is the maximum of $L$. The poset $L$ is locally finite if and only if $L$ is finite, if and only if the field $\mathbb{F}$ is finite. Let $x$ and $y$ be linear subspaces of $V$. Then $x \cap y$, the intersection of $x$ and $y$, is the largest linear subspace that is contained in both $x$ and $y$. So $x \wedge y = x \cap y$. The sum $x + y$ of $x$ and $y$ is by definition the set of elements $a + b$ with $a$ in $x$ and $b$ in $y$. Then $x + y$ is the smallest linear subspace containing both $x$ and $y$. Hence $x \vee y = x + y$. So $L$ is a lattice. The atoms are the one dimensional linear subspaces. Let $x$ be a subspace of dimension $r$ over $\mathbb{F}$. So $x$ is generated by a basis $g_1, \ldots, g_r$. Let $a_i$ be the one dimensional subspace generated by $g_i$. Then $x = a_1 \vee \cdots \vee a_r$. Hence $L$ is atomic and $r(x) = \dim(x)$. Moreover $L$ is modular, since
$$\dim(x \cap y) + \dim(x+y) = \dim(x) + \dim(y)$$
for all $x, y \in L$. Furthermore $L$ has no infinite chains, since $V$ is finite dimensional. Therefore $L$ is a modular geometric lattice.

Example 5.3.34 Let $\mathbb{F}$ be a field. Let $\mathcal{V} = (v_1, \ldots, v_n)$ be an $n$-tuple of nonzero vectors in $\mathbb{F}^k$. Let $L = L(\mathcal{V})$ be the collection of all linear subspaces of $\mathbb{F}^k$ that are generated by subsets of $\mathcal{V}$, with inclusion as partial order. So $L$ is finite and a fortiori locally finite. By definition $\{0\}$ is the linear subspace generated by the empty set. Then $0_L = \{0\}$ and $1_L$ is the subspace generated by all of $v_1, \ldots, v_n$. Furthermore $L$ is a lattice with $x \vee y = x + y$ and $x \wedge y = \bigvee \{\, z \mid z \leq x,\ z \leq y \,\}$ by Remark 5.3.16. Let $a_j$ be the linear subspace generated by $v_j$. Then $a_1, \ldots, a_n$ are the atoms of $L$. Let $x$ be the subspace generated by $\{v_j \mid j \in J\}$. Then $x = \bigvee_{j \in J} a_j$. If $x$ has dimension $r$, then there exists a subset $I$ of $J$ such that $|I| = r$ and $x = \bigvee_{i \in I} a_i$. Hence $L$ is atomic and $r(x) = \dim(x)$. Now $x \wedge y \subseteq x \cap y$, so
$$r(x \vee y) + r(x \wedge y) \leq \dim(x+y) + \dim(x \cap y) = r(x) + r(y).$$
Hence the semimodular inequality holds and $L$ is a geometric lattice. In most cases $L$ is not modular.

Example 5.3.35 Let $\mathbb{F}$ be a field. Let $\mathcal{A} = (H_1, \ldots, H_n)$ be an arrangement over $\mathbb{F}$ of hyperplanes in the vector space $V = \mathbb{F}^k$. Let $L = L(\mathcal{A})$ be the collection of all nonempty intersections of elements of $\mathcal{A}$. By definition $\mathbb{F}^k$ is the empty intersection. Define the partial order $\leq$ by $x \leq y$ if and only if $y \subseteq x$.
Then $V$ is the minimum element and $\{0\}$ is the maximum element. Furthermore
$$x \vee y = x \cap y \text{ if } x \cap y \neq \emptyset, \quad \text{and} \quad x \wedge y = \bigcap \{\, z \in L \mid x \cup y \subseteq z \,\}.$$
Suppose that $\mathcal{A}$ is a central arrangement. Then $x \cap y$ is nonempty for all $x, y$ in $L$. So $x \vee y$ and $x \wedge y$ exist for all $x, y$ in $L$, and $L$ is a lattice. Let $v_j = (v_{1j}, \ldots, v_{kj})$ be a nonzero vector such that $\sum_{i=1}^{k} v_{ij} X_i = 0$ is a homogeneous equation of $H_j$. Let $\mathcal{V} = (v_1, \ldots, v_n)$. Consider the map $\varphi : L(\mathcal{V}) \to L(\mathcal{A})$ defined by
$$\varphi(x) = \bigcap_{j \in J} H_j$$
if $x$ is the subspace generated by $\{v_j \mid j \in J\}$. Now $x \subseteq y$ if and only if $\varphi(y) \subseteq \varphi(x)$ for all $x, y \in L(\mathcal{V})$. So $\varphi$ is a strictly monotone map. Furthermore $\varphi$ is a bijection and its inverse map is also strictly monotone. Hence $L(\mathcal{V})$ and $L(\mathcal{A})$ are isomorphic lattices. Therefore $L(\mathcal{A})$ is also a geometric lattice.

5.3.3 Geometric lattices and matroids

The notion of a geometric lattice is "cryptomorphic", that is, almost equivalent, to that of a matroid. See [34, 38, 44, ?, 114].

Proposition 5.3.36 Let $L$ be a finite geometric lattice. Let $M(L)$ be the set of all atoms of $L$. Let $\mathcal{I}(L)$ be the collection of all subsets $I$ of $M(L)$ such that $r(a_1 \vee \cdots \vee a_r) = r$ if $I = \{a_1, \ldots, a_r\}$ is a collection of $r$ atoms of $L$. Then $(M(L), \mathcal{I}(L))$ is a matroid.

Proof. The proof is left as an exercise. $\square$

Proposition 5.3.37 (Rota's Crosscut Theorem) Let $L$ be a finite geometric lattice. Let $M(L)$ be the matroid associated with $L$. Then
$$\chi_L(T) = \sum_{I \subseteq M(L)} (-1)^{|I|}\, T^{r(L)-r(I)}.$$

Proof. See [101] and [24, Theorem 3.1]. $\square$

Definition 5.3.38 Let $(M, \mathcal{I})$ be a matroid. An element $x$ in $M$ is called a loop if $\{x\}$ is a dependent set. Let $x$ and $y$ in $M$ be two distinct elements that are not loops. Then $x$ and $y$ are called parallel if $r(\{x,y\}) = 1$. The matroid is called simple if it has no loops and no parallel elements.

Remark 5.3.39 Let $G$ be a $k \times n$ matrix with entries in a field $\mathbb{F}$. Let $M_G$ be the set $\{1, \ldots, n\}$ indexing the columns of $G$ and let $\mathcal{I}_G$ be the collection of all subsets $I$ of $M_G$ such that the columns of the submatrix $G_I$, consisting of the columns of $G$ at the positions of $I$, are independent. Then $(M_G, \mathcal{I}_G)$ is a matroid. Suppose that $\mathbb{F}$ is a finite field and $G_1$ and $G_2$ are generator matrices of a code $C$; then $(M_{G_1}, \mathcal{I}_{G_1}) = (M_{G_2}, \mathcal{I}_{G_2})$. So the matroid $(M_C, \mathcal{I}_C)$ of a code $C$ is well defined by $(M_G, \mathcal{I}_G)$ for any generator matrix $G$ of $C$. If $C$ is degenerate, then there is a position $i$ such that $c_i = 0$ for every codeword $c \in C$, and all such positions correspond one-to-one with loops of $M_C$. Let $C$ be nondegenerate. Then $M_C$
has no loops, and the positions $i$ and $j$ with $i \neq j$ are parallel in $M_C$ if and only if the $i$-th column of $G$ is a scalar multiple of the $j$-th column. The code $C$ is projective if and only if the arrangement $\mathcal{A}_G$ is simple, if and only if the matroid $M_C$ is simple. An $[n,k]$ code $C$ is MDS if and only if the matroid $M_C$ is the uniform matroid $U_{n,k}$.

Remark 5.3.40 Let $C$ be a projective code with generator matrix $G$. Then $\mathcal{A}_G$ is an essential simple arrangement with geometric lattice $L(\mathcal{A}_G)$. Furthermore the matroids $M(L(\mathcal{A}_G))$ and $M_C$ are isomorphic.

Definition 5.3.41 Let $(M, \mathcal{I})$ be a matroid. A $k$-flat of $M$ is a maximal subset of $M$ of rank $k$. Let $L(M)$ be the collection of all flats of $M$; it is called the lattice of flats of $M$. Let $J$ be a subset of $M$. Then the closure $\bar{J}$ is by definition the intersection of all flats that contain $J$.

Remark 5.3.42 $M$ is a $k$-flat with $k = r(M)$. If $F_1$ and $F_2$ are flats, then $F_1 \cap F_2$ is also a flat. Consider $L(M)$ with inclusion as partial order. Then $M$ is the maximum of $L(M)$, and $F_1 \cap F_2 = F_1 \wedge F_2$ for all $F_1$ and $F_2$ in $L(M)$. Hence $L(M)$ is indeed a lattice by Remark 5.3.16. Let $J$ be a subset of $M$; then $\bar{J}$ is a flat, since it is a nonempty, finite intersection of flats. So $\bar{\emptyset}$ is the minimum of $L(M)$.

Remark 5.3.43 An element $x$ in $M$ is a loop if and only if $\bar{x} = \bar{\emptyset}$. If $x, y \in M$ are not loops, then $x$ and $y$ are parallel if and only if $\bar{x} = \bar{y}$. Let $\bar{M} = \{\, \bar{x} \mid x \in M,\ \bar{x} \neq \bar{\emptyset} \,\}$. Let $\bar{\mathcal{I}} = \{\, \bar{I} \mid I \in \mathcal{I},\ \bar{\emptyset} \notin \bar{I} \,\}$. Then $(\bar{M}, \bar{\mathcal{I}})$ is a simple matroid.

Definition 5.3.44 Let $G$ be a generator matrix of a code $C$. The reduced matrix $\bar{G}$ is the matrix obtained from $G$ by deleting all zero columns of $G$ and all columns that are a scalar multiple of a previous column. The reduced code $\bar{C}$ of $C$ is the code with generator matrix $\bar{G}$.

Remark 5.3.45 Let $G$ be a generator matrix of a code $C$. The definition of the reduced code $\bar{C}$ by means of $\bar{G}$ does not depend on the choice of the generator matrix $G$ of $C$. The matroids $\bar{M}_G$ and $M_{\bar{G}}$ are isomorphic. Let $J$ be a subset of $\{1, \ldots, n\}$. Then the closure $\bar{J}$ is equal to the complement in $\{1, \ldots, n\}$ of the support of $C(J)$, and $C(J) = C(\bar{J})$.

Proposition 5.3.46 Let $(M, \mathcal{I})$ be a matroid. Then $L(M)$ with inclusion as partial order is a geometric lattice, and $L(M)$ is isomorphic with $L(\bar{M})$.

Proof. See [114, Theorem 3.8]. $\square$

5.3.4 Exercises

5.3.1 Give a proof of Remark 5.3.9.

5.3.2 Give a proof of Remark 5.3.16.

5.3.3 Give a proof of the formulas for $c_r(x,y)$ and $\mu(x,y)$ in Example 5.3.17.

5.3.4 Give a proof of the formula for $\mu(x)$ in Example 5.3.20.
5.3.5 Give a proof of the statements in Example 5.3.27.

5.3.6 Give an example of an atomic finite lattice with minimum 0 and maximum 1 that is not semimodular.

5.3.7 Give a proof of the statements in Remark 5.3.29.

5.3.8 Let $L$ be a finite geometric lattice. Show that $(M(L), \mathcal{I}(L))$ is a matroid as stated in Proposition 5.3.36. Show moreover that this matroid is simple.

5.3.9 Give a proof of the statements in Remark 5.3.39.

5.3.10 Give a proof of the statements in Remark 5.3.42.

5.3.11 Give a proof of Proposition 5.3.46.

5.3.12 Let $L$ be a geometric lattice. Let $a$ be an atom of $L$ and $x \in L$. Show that $r(x \vee a) \leq r(x) + 1$, and that $r(x \vee a) = r(x)$ if and only if $a \leq x$.

5.3.13 Let $L$ be a geometric lattice. Show that $r(y) - r(x)$ is the length of every maximal chain from $x$ to $y$, for all $x \leq y$ in $L$.

5.3.14 Give a proof of Remark 5.3.32.

5.3.15 Give an example of a central arrangement $\mathcal{A}$ such that the lattice $L(\mathcal{A})$ is not modular.

5.4 Characteristic polynomial

***

5.4.1 Characteristic and Möbius polynomial

Definition 5.4.1 Let $L$ be a finite geometric lattice. The characteristic polynomial $\chi_L(T)$ and the Poincaré polynomial $\pi_L(T)$ of $L$ are defined by
$$\chi_L(T) = \sum_{x \in L} \mu_L(x)\, T^{r(L)-r(x)} \quad \text{and} \quad \pi_L(T) = \sum_{x \in L} \mu_L(x)\, (-T)^{r(x)}.$$
The two variable Möbius polynomial $\mu_L(S,T)$ in $S$ and $T$ is defined by
$$\mu_L(S,T) = \sum_{x \in L}\; \sum_{x \leq y,\ y \in L} \mu(x,y)\, S^{r(x)}\, T^{r(L)-r(y)}.$$
The two variable characteristic polynomial or coboundary polynomial is defined by
$$\chi_L(S,T) = \sum_{x \in L}\; \sum_{x \leq y,\ y \in L} \mu(x,y)\, S^{a(x)}\, T^{r(L)-r(y)},$$
where $a(x)$ is the number of atoms $a$ in $L$ such that $a \leq x$.

Remark 5.4.2 Now $\mu(L) = \chi_L(0)$, and $\chi_L(1) = 0$ if and only if $L$ consists of more than one element. Furthermore $\chi_L(T) = T^{r(L)}\, \pi_L(-T^{-1})$, and $\mu_L(0,T) = \chi_L(0,T) = \chi_L(T)$.
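As a small added check of the relation $\chi_L(T) = T^{r(L)} \pi_L(-T^{-1})$, take for $L$ the lattice of subsets of $\{1,2\}$; by Example 5.3.17, $\mu(\emptyset) = 1$, $\mu(\{1\}) = \mu(\{2\}) = -1$ and $\mu(\{1,2\}) = 1$. Then
$$\chi_L(T) = T^2 - 2T + 1 = (T-1)^2, \qquad \pi_L(T) = 1 + 2T + T^2 = (1+T)^2,$$
and indeed $T^2 \pi_L(-T^{-1}) = T^2 (1 - T^{-1})^2 = (T-1)^2 = \chi_L(T)$.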
Remark 5.4.3 Let $r$ be the rank of $L$. Then the following relation holds for the Möbius polynomial in terms of characteristic polynomials:
$$\mu_L(S,T) = \sum_{i=0}^{r} S^i\, \mu_i(T) \quad \text{with} \quad \mu_i(T) = \sum_{x \in L_i} \chi_{L^x}(T),$$
where $L_i = \{\, x \in L \mid r(x) = i \,\}$ and $n = |L_1|$ is the number of atoms of $L$. Similarly
$$\chi_L(S,T) = \sum_{i=0}^{n} S^i\, \chi_i(T) \quad \text{with} \quad \chi_i(T) = \sum_{x \in L,\ a(x)=i} \chi_{L^x}(T).$$

Remark 5.4.4 Let $L$ be a geometric lattice. Then
$$\sum_{i=0}^{r(L)} \mu_i(T) = \mu_L(1,T) = \sum_{y \in L} \sum_{0 \leq x \leq y} \mu(x,y)\, T^{r(L)-r(y)} = T^{r(L)},$$
since $\sum_{0 \leq x \leq y} \mu(x,y) = 0$ for all $y > 0$ in $L$ by Proposition 5.3.5. Similarly $\sum_{i=0}^{n} \chi_i(T) = \chi_L(1,T) = T^{r(L)}$. Also $\sum_{i=0}^{n} A_i(T) = T^k$ for the extended weights of a code of dimension $k$, by Proposition 4.4.38 with $t = 0$.

Example 5.4.5 Let $L$ be the lattice of all subsets of a given finite set of $r$ elements as in Example 5.3.17. Then $r(x) = a(x)$ and $\mu(x,y) = (-1)^{a(y)-a(x)}$ if $x \leq y$. Hence
$$\chi_L(T) = \sum_{j=0}^{r} \binom{r}{j} (-1)^j\, T^{r-j} = (T-1)^r \quad \text{and} \quad \mu_i(T) = \binom{r}{i} (T-1)^{r-i}.$$
Therefore $\mu_L(S,T) = (S+T-1)^r$.

Example 5.4.6 Let $L$ be the lattice of all linear subspaces of a given vector space of dimension $r$ over the finite field $\mathbb{F}_q$ as in Example 5.3.33. Then $r(x)$ is the dimension of $x$ over $\mathbb{F}_q$. The number of subspaces of a given dimension is counted in Proposition 4.3.7. It is left as an exercise to show that $\mu(x,y) = (-1)^{j-i}\, q^{(j-i)(j-i-1)/2}$ if $r(x) = i$, $r(y) = j$ and $x \leq y$, and
$$\chi_L(T) = \sum_{i=0}^{r} \begin{bmatrix} r \\ i \end{bmatrix}_q (-1)^i q^{\binom{i}{2}}\, T^{r-i} = (T-1)(T-q) \cdots (T-q^{r-1})$$
and
$$\mu_i(T) = \begin{bmatrix} r \\ i \end{bmatrix}_q (T-1)(T-q) \cdots (T-q^{r-i-1}).$$
See [71].

Remark 5.4.7 Every polynomial in one variable with coefficients in a field $\mathbb{F}$ factorizes into linear factors over the algebraic closure $\bar{\mathbb{F}}$ of $\mathbb{F}$. In Examples 5.4.5 and 5.4.6 we see that $\chi_L(T)$ factorizes into linear factors over $\mathbb{Z}$. This is always the case for so-called supersolvable geometric lattices and for lattices from free central arrangements. See [92].
Definition 5.4.8 Let $L$ be a finite geometric lattice. The Whitney numbers $w_i$ and $W_i$ of the first and second kind, respectively, are defined by
$$w_i = \sum_{x \in L_i} \mu(x) \quad \text{and} \quad W_i = |L_i|.$$
The doubly indexed Whitney numbers $w_{ij}$ and $W_{ij}$ of the first and second kind, respectively, are defined by
$$w_{ij} = \sum_{x \in L_i} \sum_{y \in L_j} \mu(x,y) \quad \text{and} \quad W_{ij} = |\{\, (x,y) \mid x \in L_i,\ y \in L_j,\ x \leq y \,\}|.$$
See [60], [34, §6.6.D], [?, Chapter 14] and [113, §3.11].

Remark 5.4.9 We have that
$$\chi_L(T) = \sum_{i=0}^{r(L)} w_i\, T^{r(L)-i} \quad \text{and} \quad \mu_L(S,T) = \sum_{i=0}^{r(L)} \sum_{j=0}^{r(L)} w_{ij}\, S^i\, T^{r(L)-j}.$$
Hence the (doubly indexed) Whitney numbers of the first kind are determined by $\mu_L(S,T)$. The leading coefficient of
$$\mu_i(T) = \sum_{x \in L_i} \sum_{x \leq y} \mu(x,y)\, T^{r(L^x) - r_{L^x}(y)}$$
is equal to $\sum_{x \in L_i} \mu(x,x) = |L_i| = W_i$. Hence the Whitney numbers of the second kind $W_i$ are also determined by $\mu_L(S,T)$. We will see in Example 5.4.32 that the Whitney numbers are not determined by $\chi_L(S,T)$. Finally, let $r = r(L)$. Then $\mu_{r-1}(T) = W_{r-1}(T-1)$.

5.4.2 Characteristic polynomial of an arrangement

A central arrangement $\mathcal{A}$ gives rise to a geometric lattice $L(\mathcal{A})$ and a characteristic polynomial $\chi_{L(\mathcal{A})}$ that will be denoted by $\chi_{\mathcal{A}}$. Similarly $\pi_{\mathcal{A}}$ denotes the Poincaré polynomial of $\mathcal{A}$. If $\mathcal{A}$ is an arrangement over the real numbers, then $\pi_{\mathcal{A}}(1)$ counts the number of connected components of the complement of the arrangement. See [139]. Something similar can be said about arrangements over finite fields.

Proposition 5.4.10 Let $q$ be a prime power, and let $\mathcal{A} = (H_1, \ldots, H_n)$ be a simple and central arrangement in $\mathbb{F}_q^k$. Then
$$\chi_{\mathcal{A}}(q^m) = |\, \mathbb{F}_{q^m}^k \setminus (H_1 \cup \cdots \cup H_n) \,|.$$

Proof. See [7, Theorem 2.2], [17, Proposition 3.2], [44, Sect. 16] and [92, Theorem 2.69]. Let $A = \mathbb{F}_{q^m}^k$ and $A_j = H_j(\mathbb{F}_{q^m})$. Let $L$ be the poset of all intersections of the $A_j$. The principle of inclusion/exclusion as formulated in Example 5.3.19 gives that
$$|\, \mathbb{F}_{q^m}^k \setminus (H_1 \cup \cdots \cup H_n) \,| = \sum_{x \in L} \mu(x)\, |x| = \sum_{x \in L} \mu(x)\, q^{m \dim(x)}.$$
The expression on the right hand side is equal to $\chi_{\mathcal{A}}(q^m)$, since $L$ is isomorphic with the reverse of the geometric lattice $L(\mathcal{A})$ of the arrangement $\mathcal{A} = (H_1, \ldots, H_n)$, so $\dim(x) = k - r_{L(\mathcal{A})}(x)$ and $\mu_L(x) = \mu_{L(\mathcal{A})}(x)$ by Remark 5.3.12. $\square$

Definition 5.4.11 Let $\mathcal{A} = (H_1, \ldots, H_n)$ be an arrangement in $\mathbb{F}^k$ over the field $\mathbb{F}$. Let $H = H_i$. Then the deletion $\mathcal{A} \setminus H$ is the arrangement in $\mathbb{F}^k$ obtained from $(H_1, \ldots, H_n)$ by deleting all the $H_j$ such that $H_j = H$. Let $x = \bigcap_{i \in I} H_i$ be an intersection of hyperplanes of $\mathcal{A}$. Let $l$ be the dimension of $x$. The restriction $\mathcal{A}^x$ is the arrangement in $\mathbb{F}^l$ of all hyperplanes $x \cap H_j$ in $x$ such that $x \cap H_j \neq \emptyset$ and $x \cap H_j \neq x$, for a chosen isomorphism of $x$ with $\mathbb{F}^l$.

Proposition 5.4.12 (Deletion--restriction formula) Let $\mathcal{A} = (H_1, \ldots, H_n)$ be a simple and central arrangement in $\mathbb{F}^k$ over the field $\mathbb{F}$. Let $H = H_i$. Then
$$\chi_{\mathcal{A}}(T) = \chi_{\mathcal{A} \setminus H}(T) - \chi_{\mathcal{A}^H}(T).$$

Proof. A proof for an arbitrary field can be found in [92, Theorem 2.56]. Here the special case of a central arrangement over the finite field $\mathbb{F}_q$ will be treated. Without loss of generality we may assume that $H = H_1$. Denote $H_j(\mathbb{F}_{q^m})$ by $H_j$ and $\mathbb{F}_{q^m}^k$ by $V$. Then the following set is written as the disjoint union of two others:
$$V \setminus (H_2 \cup \cdots \cup H_n) = \big( V \setminus (H_1 \cup H_2 \cup \cdots \cup H_n) \big) \cup \big( H_1 \setminus (H_2 \cup \cdots \cup H_n) \big).$$
The number of elements of the left hand side is equal to $\chi_{\mathcal{A} \setminus H}(q^m)$, and the numbers of elements of the two sets on the right hand side are equal to $\chi_{\mathcal{A}}(q^m)$ and $\chi_{\mathcal{A}^H}(q^m)$, respectively, by Proposition 5.4.10. Hence
$$\chi_{\mathcal{A} \setminus H}(q^m) = \chi_{\mathcal{A}}(q^m) + \chi_{\mathcal{A}^H}(q^m)$$
for all positive integers $m$, since the union is disjoint. Therefore the identity of polynomials holds. $\square$

Definition 5.4.13 Let $\mathcal{A} = (H_1, \ldots, H_n)$ be a central simple arrangement over the field $\mathbb{F}$ in $\mathbb{F}^k$. Let $J \subseteq \{1, \ldots, n\}$. Define $H_J = \bigcap_{j \in J} H_j$. Consider the decreasing sequence
$$N_k \subseteq N_{k-1} \subseteq \cdots \subseteq N_1 \subseteq N_0$$
of algebraic subsets of the affine space $\mathbb{A}^k$, defined by
$$N_i = \bigcup_{J \subseteq \{1, \ldots, n\},\ r(H_J) = i} H_J.$$
Define $M_i = N_i \setminus N_{i+1}$.

Remark 5.4.14 $N_0 = \mathbb{A}^k$, $N_1 = \bigcup_{j=1}^{n} H_j$, $N_k = \{0\}$ and $N_{k+1} = \emptyset$. Furthermore $N_i$ is a union of linear subspaces of $\mathbb{A}^k$, all of dimension $k-i$. Notice that $H_J$ is isomorphic with $C(J)$ in case $\mathcal{A}$ is the arrangement of the generator matrix $G$ of the code $C$, as remarked in the proof of Proposition 4.4.8.
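Proposition 5.4.10 can be checked by brute force for small cases. The following Python sketch is an added illustration, not from the manuscript: it computes $\chi_{\mathcal{A}}$ via the crosscut expansion of Proposition 5.3.37 (the example arrangement is essential, so the ranks can be read off from the normal vectors) and compares $\chi_{\mathcal{A}}(p)$ with a direct point count of the complement over a prime field $\mathbb{F}_p$; all function names are chosen for the example only.

    # Check chi_A(p) = #(F_p^k minus the union of the hyperplanes) for the
    # central arrangement of the three lines x=0, y=0, x+y=0 through the
    # origin of F_p^2. chi is computed from the crosscut expansion
    # chi_A(T) = sum over J of (-1)^|J| T^(k - r(J)).
    from itertools import combinations, product

    def rank_mod_p(rows, p):
        """Rank of a matrix (list of row vectors) over F_p, by elimination."""
        m = [list(r) for r in rows]
        rank, cols = 0, (len(m[0]) if m else 0)
        for c in range(cols):
            piv = next((i for i in range(rank, len(m)) if m[i][c] % p), None)
            if piv is None:
                continue
            m[rank], m[piv] = m[piv], m[rank]
            inv = pow(m[rank][c], -1, p)
            m[rank] = [(x * inv) % p for x in m[rank]]
            for i in range(len(m)):
                if i != rank and m[i][c] % p:
                    f = m[i][c]
                    m[i] = [(x - f * y) % p for x, y in zip(m[i], m[rank])]
            rank += 1
        return rank

    def chi_at(normals, k, p, t):
        n = len(normals)
        return sum((-1) ** s * t ** (k - rank_mod_p(J, p))
                   for s in range(n + 1) for J in combinations(normals, s))

    normals, k = [(1, 0), (0, 1), (1, 1)], 2
    for p in [2, 3, 5, 7]:
        complement = sum(1 for v in product(range(p), repeat=k)
                         if all(sum(a * b for a, b in zip(h, v)) % p
                                for h in normals))
        print(p, chi_at(normals, k, p, p), complement)  # the counts agree

Here $\chi_{\mathcal{A}}(T) = T^2 - 3T + 2 = (T-1)(T-2)$, so for instance $\chi_{\mathcal{A}}(3) = 2$, and indeed exactly two points of $\mathbb{F}_3^2$ avoid all three lines.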
Proposition 5.4.15 Let $\mathcal{A} = (H_1, \ldots, H_n)$ be a central simple arrangement over the field $\mathbb{F}$ in $\mathbb{F}^k$. Let $z(x) = \{\, j \in \{1, \ldots, n\} \mid x \in H_j \,\}$ and let $r(x) = r(H_{z(x)})$ be the rank of $x$, for $x \in \mathbb{A}^k$. Then
$$N_i = \{\, x \in \mathbb{A}^k \mid r(x) \geq i \,\} \quad \text{and} \quad M_i = \{\, x \in \mathbb{A}^k \mid r(x) = i \,\}.$$

Proof. Let $x \in \mathbb{A}^k$ and $c = xG$. Let $x \in N_i$. Then there exists a $J \subseteq \{1, \ldots, n\}$ such that $r(H_J) = i$ and $x \in H_J$. So $c_j = 0$ for all $j \in J$, hence $J \subseteq z(x)$ and $H_{z(x)} \subseteq H_J$. Therefore $r(x) = r(H_{z(x)}) \geq r(H_J) = i$. The converse implication is proved similarly. The statement about $M_i$ is a direct consequence of the one about $N_i$. $\square$

Proposition 5.4.16 Let $\mathcal{A}$ be a central simple arrangement over $\mathbb{F}_q$. Let $L = L(\mathcal{A})$ be the geometric lattice of $\mathcal{A}$. Then
$$\mu_i(q^m) = |M_i(\mathbb{F}_{q^m})|.$$

Proof. See also [7, Theorem 6.3]. Remember that $\mu_i(T) = \sum_{r(x)=i} \chi_{L^x}(T)$ as defined in Remark 5.4.3. Let $L = L(\mathcal{A})$ and $x \in L$. Then $L(\mathcal{A}^x) = L^x$. Let $\cup \mathcal{A}^x$ be the union of the hyperplanes of $\mathcal{A}^x$. Then
$$|(x \setminus (\cup \mathcal{A}^x))(\mathbb{F}_{q^m})| = \chi_{L^x}(q^m)$$
by Proposition 5.4.10. Now $M_i$ is the disjoint union of the complements of the arrangements $\mathcal{A}^x$ for all $x \in L$ such that $r(x) = i$, by Proposition 5.4.15. Hence
$$|M_i(\mathbb{F}_{q^m})| = \sum_{x \in L,\ r(x)=i} |(x \setminus (\cup \mathcal{A}^x))(\mathbb{F}_{q^m})| = \sum_{x \in L,\ r(x)=i} \chi_{L^x}(q^m). \qquad \square$$

5.4.3 Characteristic polynomial of a code

Proposition 5.4.17 Let $C$ be a nondegenerate $\mathbb{F}_q$-linear code. Then
$$A_n(T) = \chi_C(T).$$

Proof. The elements of $\mathbb{F}_{q^m}^k \setminus (H_1 \cup \cdots \cup H_n)$ correspond one-to-one to codewords of weight $n$ in $C \otimes \mathbb{F}_{q^m}$ by Proposition 4.4.8. So $A_n(q^m) = \chi_C(q^m)$ for all positive integers $m$ by Proposition 5.4.10. Hence $A_n(T) = \chi_C(T)$. $\square$

Definition 5.4.18 Let $G$ be a generator matrix of an $[n,k]$ code $C$ over $\mathbb{F}_q$. Define
$$Y_i = \{\, x \in \mathbb{A}^k \mid \operatorname{wt}(xG) \leq n-i \,\} \quad \text{and} \quad X_i = \{\, x \in \mathbb{A}^k \mid \operatorname{wt}(xG) = n-i \,\}.$$

Remark 5.4.19 The $Y_i$ form a decreasing sequence
$$Y_n \subseteq Y_{n-1} \subseteq \cdots \subseteq Y_1 \subseteq Y_0$$
of algebraic subsets of $\mathbb{A}^k$, and $X_i = Y_i \setminus Y_{i+1}$.

Proposition 5.4.20 Let $C$ be a projective code of length $n$. Then
$$\chi_i(q^m) = |X_i(\mathbb{F}_{q^m})| = A_{n-i}(q^m).$$
Proof. Every $x \in \mathbb{F}_{q^m}^k$ corresponds one-to-one to a codeword in $C \otimes \mathbb{F}_{q^m}$ via the map $x \mapsto xG$. So $|X_i(\mathbb{F}_{q^m})| = A_{n-i}(q^m)$. And $A_{n-i}(q^m) = \chi_i(q^m)$ for all $i$, by Remark ??. $\square$

Corollary 5.4.21 Let $C$ be a projective code of length $n$. Then $\chi_i(T) = A_{n-i}(T)$ for all $i$.

Remark 5.4.22 Another way to define $X_i$ is as the collection of all points $P \in \mathbb{A}^k$ such that $P$ is on exactly $i$ distinct hyperplanes of the arrangement $\mathcal{A}_G$. Denote the corresponding arrangement of hyperplanes in $\mathbb{P}^{k-1}$ also by $\mathcal{A}_G$, and let $\bar{P}$ be the point in $\mathbb{P}^{k-1}$ corresponding to $P \in \mathbb{A}^k$. Define
$$\bar{X}_i = \{\, \bar{P} \in \mathbb{P}^{k-1} \mid \bar{P} \text{ is on exactly } i \text{ hyperplanes of } \mathcal{A}_G \,\}.$$
For all $i < n$ the polynomial $\chi_i(T)$ is divisible by $T-1$. Define $\bar{\chi}_i(T) = \chi_i(T)/(T-1)$. Then $\bar{\chi}_i(q^m) = |\bar{X}_i(\mathbb{F}_{q^m})|$ for all $i < n$ by Proposition 5.4.20.

Theorem 5.4.23 Let $G$ be a generator matrix of a nondegenerate code $C$. Let $\mathcal{A}_G$ be the associated central arrangement. Let $d^\perp = d(C^\perp)$. Then $N_i \subseteq Y_i$ for all $i$; equality holds for all $i < d^\perp$, and $M_i = X_i$ for all $i < d^\perp - 1$. If furthermore $C$ is projective, then
$$\mu_i(T) = \chi_i(T) = A_{n-i}(T) \quad \text{for all } i < d^\perp - 1.$$

Proof. Let $x \in N_i$. Then $x \in H_J$ for some $J \subseteq \{1, \ldots, n\}$ such that $r(H_J) = i$. So $|J| \geq i$ and $\operatorname{wt}(xG) \leq n-i$ by Proposition 4.4.8. Hence $x \in Y_i$. Therefore $N_i \subseteq Y_i$.
Let $i < d^\perp$ and $x \in Y_i$. Then $\operatorname{wt}(xG) \leq n-i$. Let $J$ be the complement of $\operatorname{supp}(xG)$ in $\{1, \ldots, n\}$; then $|J| \geq i$. Take a subset $I$ of $J$ such that $|I| = i$. Then $x \in H_I$ and $r(I) = |I| = i$ by Lemma 7.4.39, since $i < d^\perp$. Hence $x \in N_i$. Therefore $Y_i \subseteq N_i$. So $Y_i = N_i$ for all $i < d^\perp$, and $M_i = X_i$ for all $i < d^\perp - 1$.
The code is nondegenerate, so $d(C^\perp) \geq 2$. Suppose furthermore that $C$ is projective. Then $\mu_i(T) = \chi_i(T) = A_{n-i}(T)$ for all $i < d^\perp - 1$, by Remark ?? and Propositions 5.4.20 and 5.4.16. $\square$

The extended and generalized weight enumerators of an $[n,k]$ MDS code are determined by the pair $(n,k)$, by Remark ??. If $C$ is an $[n,k]$ code, then $d(C^\perp)$ is at most $k+1$. Furthermore $d(C^\perp) = k+1$ if and only if $C$ is MDS, if and only if $C^\perp$ is MDS. An $[n,k,d]$ code is called almost MDS if $d = n-k$. So $d(C^\perp) = k$ if and only if $C^\perp$ is almost MDS. If $C$ is almost MDS, then $C^\perp$ is not necessarily almost MDS. The code $C$ is called near MDS if both $C$ and $C^\perp$ are almost MDS. See [?].

Proposition 5.4.24 Let $C$ be an $[n,k,d]$ code such that $C^\perp$ is MDS or almost MDS, and $k \geq 3$. Then both $\chi_C(S,T)$ and $W_C(X,Y,T)$ determine $\mu_C(S,T)$. In particular
$$\mu_i(T) = \chi_i(T) = A_{n-i}(T) \text{ for all } i < k-1, \qquad \mu_{k-1}(T) = \sum_{i=k-1}^{n-1} \chi_i(T) = \sum_{i=k-1}^{n-1} A_{n-i}(T),$$
and $\mu_k(T) = 1$.
Proof. Let $C$ be a code such that $d(C^\perp) \geq k \geq 3$. Then $C$ is projective and $A_{n-i} = \chi_i$ for all $i < k-1$ by Remark ??. If $i < k-1$, then the expression for $\mu_i(T)$ is given by Theorem 5.4.23. Furthermore $\mu_k(T) = \chi_n(T) = A_0(T) = 1$. Finally let $L = L(C)$. Then $\sum_{i=0}^{k} \mu_i(T) = T^k$, $\sum_{i=0}^{n} \chi_i(T) = T^k$ and $\sum_{i=0}^{n} A_i(T) = T^k$ by Remark 5.4.4. Hence the formula for $\mu_{k-1}(T)$ holds. Therefore $\mu_C(S,T)$ is determined both by $W_C(X,Y,T)$ and by $\chi_C(S,T)$. $\square$

Projective codes of dimension 3 are examples of codes $C$ such that $C^\perp$ is almost MDS. In the following we will give explicit formulas for $\mu_C(S,T)$ for such codes. Let $C$ be a projective code of length $n$ and dimension 3 over $\mathbb{F}_q$ with generator matrix $G$. The arrangement $\mathcal{A}_G = (H_1, \ldots, H_n)$ of planes in $\mathbb{F}_q^3$ is simple and essential, and the corresponding arrangement of lines in $\mathbb{P}^2(\mathbb{F}_q)$ is also denoted by $\mathcal{A}_G$. We defined
$$\bar{X}_i(\mathbb{F}_{q^m}) = \{\, \bar{P} \in \mathbb{P}^2(\mathbb{F}_{q^m}) \mid \bar{P} \text{ is on exactly } i \text{ lines of } \mathcal{A}_G \,\}$$
and $\bar{\chi}_i(q^m) = |\bar{X}_i(\mathbb{F}_{q^m})|$ in Remark 5.4.22 for all $i < n$.

Remark 5.4.25 Notice that for projective codes of dimension three $\bar{X}_i(\mathbb{F}_{q^m}) = \bar{X}_i(\mathbb{F}_q)$ for all positive integers $m$ and $2 \leq i < n$. Abbreviate in this case $\bar{\chi}_i(q^m) = \bar{\chi}_i$ for $2 \leq i < n$.

Proposition 5.4.26 Let $C$ be a projective code of length $n$ and dimension 3 over $\mathbb{F}_q$. Then
$$\mu_0(T) = (T-1)\Big( T^2 - (n-1)T + \sum_{i=2}^{n-1} (i-1)\bar{\chi}_i - n + 1 \Big),$$
$$\mu_1(T) = (T-1)\Big( nT + n - \sum_{i=2}^{n-1} i\,\bar{\chi}_i \Big), \qquad \mu_2(T) = (T-1) \sum_{i=2}^{n-1} \bar{\chi}_i.$$

Proof. A more general statement and proof is possible for $[n,k]$ codes $C$ such that $d(C^\perp) \geq k$, using Proposition 5.4.24, the fact that $B_t(T) = T^{k-t} - 1$ for all $t < d(C^\perp)$ by Lemma 7.4.39, and the expression of $B_t(T)$ in terms of $A_w(T)$ by Proposition ??. We will give a second, geometric proof for the special case of projective codes of dimension 3.
It is enough to show this proposition with $T = q^m$ for all $m$, by Lagrange interpolation. Notice that $\mu_i(q^m)$ is the number of elements of $M_i(\mathbb{F}_{q^m})$ by Proposition 5.4.16. Let $\bar{P}$ be the corresponding point in $\mathbb{P}^2(\mathbb{F}_{q^m})$ for $P \in \mathbb{F}_{q^m}^3$ and $P \neq 0$. Abbreviate $M_i(\mathbb{F}_{q^m})$ by $M_i$. Define $\bar{M}_i = \{\, \bar{P} \mid P \in M_i \,\}$. Then $|M_i| = (q^m - 1)|\bar{M}_i|$ for all $i < 3$.
(1) If $\bar{P} \in \bar{M}_2$, then $\bar{P} \in H_j \cap H_k$ for some $j \neq k$. Hence $\bar{P} \in \bar{X}_i(\mathbb{F}_q)$ for some $i \geq 2$, since the code is projective. So $\bar{M}_2$ is the disjoint union of the $\bar{X}_i(\mathbb{F}_q)$, $2 \leq i < n$. Therefore $|\bar{M}_2| = \sum_{i=2}^{n-1} \bar{\chi}_i$.
(2) $\bar{P} \in \bar{M}_1$ if and only if $\bar{P}$ is on exactly one line $H_j$. There are $n$ lines, and every line has $q^m + 1$ points that are defined over $\mathbb{F}_{q^m}$. If $i \geq 2$, then every $\bar{P} \in \bar{X}_i(\mathbb{F}_q)$ is on $i$ lines $H_j$. Hence $|\bar{M}_1| = n(q^m + 1) - \sum_{i=2}^{n-1} i\,\bar{\chi}_i$.
(3) $\mathbb{P}^2$ is the disjoint union of $\bar{M}_0$, $\bar{M}_1$ and $\bar{M}_2$. The numbers $|\bar{M}_2|$ and $|\bar{M}_1|$ were computed in (1) and (2), and $|\mathbb{P}^2(\mathbb{F}_{q^m})| = q^{2m} + q^m + 1$. From this we derive the number of elements of $\bar{M}_0$. $\square$
Example 5.4.27 Consider the matrices $G$ and $P$ given by
$$G = \begin{pmatrix} 1 & 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 1 & 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 & 1 & 0 & 1 \end{pmatrix} \quad \text{and} \quad P = \begin{pmatrix} 1 & 0 & 0 & 0 & 1 & 1 & -1 & 1 & 1 \\ 0 & 1 & 0 & 1 & 0 & -1 & 1 & -1 & 1 \\ 0 & 0 & 1 & -1 & -1 & 0 & 1 & 1 & -1 \end{pmatrix}.$$
Let $C$ be the code over $\mathbb{F}_q$ with generator matrix $G$. The columns of $G$ also represent the coefficients of the lines of $\mathcal{A}_G$. The $j$-th column of $P$ represents the homogeneous coordinates of the point $P_j$ in the projective plane that occurs as an intersection of two lines of $\mathcal{A}_G$. In case $q$ is even, the points $P_7$, $P_8$ and $P_9$ coincide.

***two pictures: q odd and q even***

If $q$ is even, then $\bar{\chi}_2 = 0$ and $\bar{\chi}_3 = 7$. If $q$ is odd, then $\bar{\chi}_2 = 3$ and $\bar{\chi}_3 = 6$. The nonzero data are as follows.

For $q$ even: $\bar{\chi}_2 = 0$, $\bar{\chi}_3 = 7$, and $\bar{\chi}_i = 0$ for all other $i$;
$\bar{A}_4 = 7$, $\bar{A}_6 = 7T - 14$, $\bar{A}_7 = T^2 - 6T + 8$, and $\bar{A}_i = 0$ otherwise;
$\bar{\mu}_2 = 7$, $\bar{\mu}_1 = 7T - 14$, $\bar{\mu}_0 = T^2 - 6T + 8$.

For $q$ odd: $\bar{\chi}_2 = 3$, $\bar{\chi}_3 = 6$, and $\bar{\chi}_i = 0$ for all other $i$;
$\bar{A}_4 = 6$, $\bar{A}_5 = 3$, $\bar{A}_6 = 7T - 17$, $\bar{A}_7 = T^2 - 6T + 9$, and $\bar{A}_i = 0$ otherwise;
$\bar{\mu}_2 = 9$, $\bar{\mu}_1 = 7T - 17$, $\bar{\mu}_0 = T^2 - 6T + 9$.

Notice that there is a codeword of weight 7 in case $q$ is even and $q > 4$, or $q$ is odd and $q > 3$, since $\bar{A}_7(T) = (T-2)(T-4)$ or $\bar{A}_7(T) = (T-3)^2$, respectively.

Example 5.4.28 Let $G$ be a $3 \times n$ generator matrix of an MDS code. The lines of the arrangement $\mathcal{A}_G$ are in general position: every two distinct lines meet in one point, and every three mutually distinct lines have an empty intersection. So $\bar{\chi}_2 = \binom{n}{2}$ and $\bar{\chi}_i = 0$ for all $i > 2$. Hence
$$\bar{A}_{n-2}(T) = \bar{\mu}_2(T) = \binom{n}{2}, \qquad \bar{A}_{n-1}(T) = \bar{\mu}_1(T) = nT + 2n - n^2,$$
$$\bar{A}_n(T) = \bar{\mu}_0(T) = T^2 - (n-1)T + \binom{n-1}{2},$$
by Proposition 5.4.16 and Theorem ??, which is in agreement with Proposition 4.4.22.

Example 5.4.29 Let $a$ and $b$ be positive integers such that $2 < a < b$. Let $n = a+b$. Let $G$ be a $3 \times n$ generator matrix of a nondegenerate code. Suppose that there are two points $P$ and $Q$ in the projective plane over $\mathbb{F}_q$ such that the $a+b$ lines of the projective arrangement of $\mathcal{A}_G$ consist of $a$ distinct lines incident with $P$ and $b$ distinct lines incident with $Q$, and there is no line incident with both $P$ and $Q$. Then $\bar{A}_{n-2} = \bar{\chi}_2 = ab$, $\bar{A}_a = \bar{\chi}_b = 1$ and $\bar{A}_b = \bar{\chi}_a = 1$. Hence $\bar{\mu}_2(T) = ab + 2$. Furthermore
$$\bar{A}_{n-1}(T) = \bar{\mu}_1(T) = (a+b)T - 2ab, \qquad \bar{A}_n(T) = \bar{\mu}_0(T) = T^2 - (a+b-1)T + ab - 1,$$
and $\bar{A}_i(T) = 0$ for all other $i$ with $0 < i \leq n$.
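As an added consistency check of Example 5.4.28 against Remark 5.4.4:
$$\bar{\mu}_0(T) + \bar{\mu}_1(T) + \bar{\mu}_2(T) = T^2 + T + \Big( \binom{n-1}{2} + 2n - n^2 + \binom{n}{2} \Big) = T^2 + T + 1,$$
since $\binom{n-1}{2} + \binom{n}{2} = (n-1)^2$ and $(n-1)^2 + 2n - n^2 = 1$. Hence, with $\mu_3(T) = 1$,
$$\sum_{i=0}^{3} \mu_i(T) = (T-1)(T^2 + T + 1) + 1 = T^3,$$
as Remark 5.4.4 requires for a lattice of rank 3.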
Example 5.4.30 Let $a$, $b$ and $c$ be positive integers such that $2 < a < b < c$. Let $n = a+b+c$. Let $G$ be a $3 \times n$ generator matrix of a nondegenerate code $C(a,b,c)$. Suppose that there are three points $P$, $Q$ and $R$ in the projective plane over $\mathbb{F}_q$ such that the lines of the projective arrangement of $\mathcal{A}_G$ consist of $a$ distinct lines incident with $P$ and not with $Q$ and $R$, $b$ distinct lines incident with $Q$ and not with $P$ and $R$, and $c$ distinct lines incident with $R$ and not with $P$ and $Q$. If $q$ is large enough, then such a configuration exists. The $a$ lines through $P$ intersect the $b$ lines through $Q$ in $ab$ points. Similar statements hold for the lines through $P$ and $R$ intersecting in $ac$ points, and the lines through $Q$ and $R$ intersecting in $bc$ points. All these intersection points are on exactly two lines of the arrangement, and there are no others. Hence $\bar{\chi}_2 = ab + bc + ca$. Now $P$ is the unique point on exactly $a$ lines of the arrangement, so $\bar{\chi}_a = 1$. Similarly $\bar{\chi}_b = \bar{\chi}_c = 1$. Finally $\bar{\chi}_i = 0$ for all $2 \leq i < n$ with $i \notin \{2, a, b, c\}$.
Now $\mu_i(T)$ is divisible by $T-1$ for all $0 \leq i < k$. Define $\bar{\mu}_i(T) = \mu_i(T)/(T-1)$. Define similarly $\bar{A}_w(T) = A_w(T)/(T-1)$ for all $0 < w \leq n$. Propositions 5.4.24 and 5.4.26 imply that $\bar{A}_{n-a} = \bar{A}_{n-b} = \bar{A}_{n-c} = 1$, $\bar{A}_{n-2} = ab+bc+ca$ and $\bar{\mu}_2(T) = ab+bc+ca+3$. Furthermore
$$\bar{A}_{n-1}(T) = \bar{\mu}_1(T) = nT - 2(ab+bc+ca),$$
$$\bar{A}_n(T) = \bar{\mu}_0(T) = T^2 - (n-1)T + ab+bc+ca - 2,$$
and $\bar{A}_i(T) = 0$ for all $i \notin \{0, n-a, n-b, n-c, n-2, n-1, n\}$.
Therefore $W_{C(a,b,c)}(X,Y,T) = W_{C(a',b',c')}(X,Y,T)$ if and only if $(a,b,c) = (a',b',c')$, and $\mu_{C(a,b,c)}(S,T) = \mu_{C(a',b',c')}(S,T)$ if and only if $a+b+c = a'+b'+c'$ and $ab+bc+ca = a'b'+b'c'+c'a'$. In particular, let $C_1 = C(3,9,14)$ and $C_2 = C(5,6,15)$. Then $C_1$ and $C_2$ are two projective codes with the same Möbius polynomial $\mu_C(S,T)$ but distinct extended weight enumerators and coboundary polynomials $\chi_C(S,T)$.

Example 5.4.31 Consider the codes $C_3$ and $C_4$ over $\mathbb{F}_q$ with $q > 2$, with generator matrices $G_3$ and $G_4$ given by
$$G_3 = \begin{pmatrix} 1 & 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 & 0 & 1 & 0 \\ -1 & 0 & 1 & 1 & 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad G_4 = \begin{pmatrix} 1 & 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & a & 0 & 0 & 1 \end{pmatrix},$$
where $a \in \mathbb{F}_q \setminus \{0,1\}$. It was shown in [34, Exercise 6.96] that the duals of these codes have the same Tutte polynomial. So the codes $C_3$ and $C_4$ have the same Tutte polynomial
$$t_C(X,Y) = 2X + 2Y + 3X^2 + 5XY + 4Y^2 + X^3 + X^2Y + 2XY^2 + 3Y^3 + Y^4.$$
Hence $C_3$ and $C_4$ have the same extended weight enumerator, given by
$$X^7 + (2T-2)X^4Y^3 + (3T-3)X^3Y^4 + (T^2-T)X^2Y^5 + (5T^2-15T+10)XY^6 + (T^3-6T^2+11T-6)Y^7.$$
The codes $C_3$ and $C_4$ are not projective, and their reductions $\bar{C}_3$ and $\bar{C}_4$, respectively, have generator matrices
$$\bar{G}_3 = \begin{pmatrix} 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 \\ -1 & 0 & 1 & 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad \bar{G}_4 = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & a & 0 & 1 \end{pmatrix}.$$
From the arrangements $\mathcal{A}(\bar{C}_3)$ and $\mathcal{A}(\bar{C}_4)$ we deduce the $\bar{\chi}_i(T)$, given in the following table.

code    $\bar{\chi}_0(T)$    $\bar{\chi}_1(T)$   $\bar{\chi}_2$   $\bar{\chi}_3$   $\bar{\chi}_4$   $\bar{\chi}_5$
$C_3$   $T^2-5T+6$           $6T-12$             3                4                0                0
$C_4$   $T^2-5T+6$           $6T-13$             6                1                1                0

Therefore $t_{C_3}(X,Y) = t_{C_4}(X,Y)$, but $\chi_{\bar{C}_3}(S,T) \neq \chi_{\bar{C}_4}(S,T)$ and $t_{\bar{C}_3}(X,Y) \neq t_{\bar{C}_4}(X,Y)$.

Example 5.4.32 Let $C_5 = C_3^\perp$ and $C_6 = C_4^\perp$. Then $C_5$ and $C_6$ have the same Tutte polynomial $t_{C^\perp}(X,Y) = t_C(Y,X)$, as given by Example 5.4.31:
$$2X + 2Y + 4X^2 + 5XY + 3Y^2 + 3X^3 + 2X^2Y + XY^2 + Y^3 + X^4.$$
Hence $C_5$ and $C_6$ have the same extended weight enumerator, given by
$$X^7 + (T-1)X^5Y^2 + (6T-6)X^4Y^3 + (2T^2-T-1)X^3Y^4 + (15T^2-43T+28)X^2Y^5$$
$$+ (7T^3-36T^2+60T-31)XY^6 + (T^4-7T^3+19T^2-23T+10)Y^7.$$
The geometric lattice $L(C_5)$ has atoms $a, b, c, d, e, f, g$ corresponding to the first, second, etc. column of $G_3$. The second level of $L(C_5)$ consists of the following 17 elements:
abe, ac, ad, af, ag, bc, bd, bf, bg, cd, ce, cf, cg, de, df, dg, efg.
The third level consists of the following 12 elements:
abce, abde, abefg, acdg, acf, adf, bcdf, bcg, bdg, cde, cefg, defg.
Similarly, the geometric lattice $L(C_6)$ has atoms $a, b, c, d, e, f, g$ corresponding to the first, second, etc. column of $G_4$. The second level of $L(C_6)$ consists of the following 17 elements:
abe, ac, ad, af, ag, bc, bd, bf, bg, cd, ce, cf, cg, de, dfg, ef, eg.
The third level consists of the following 13 elements:
abce, abde, abef, abeg, acd, acf, acg, adfg, bcdfg, cde, cef, ceg, defg.
Proposition 5.4.24 implies that $\mu_0(T)$ and $\mu_1(T)$ are the same for both codes and equal to
$$\mu_0(T) = \chi_0(T) = A_7(T) = (T-1)(T-2)(T^2-4T+5),$$
$$\mu_1(T) = \chi_1(T) = A_6(T) = (T-1)(7T^2-29T+31).$$
The polynomials $\mu_3(T)$ and $\mu_2(T)$ are given in the following table, using Remarks 5.4.9 and 5.4.4.

             $C_5$               $C_6$
$\mu_2(T)$   $17T^2-49T+32$      $17T^2-50T+33$
$\mu_3(T)$   $12T-12$            $13T-13$

This example shows that the Möbius polynomial $\mu_C(S,T)$ is not determined by the coboundary polynomial $\chi_C(S,T)$.
5.4.4 Minimal codewords and subcodes

Definition 5.4.33 A minimal codeword of a code $C$ is a codeword whose support does not properly contain the support of another nonzero codeword.

Remark 5.4.34 The zero word is a minimal codeword. Notice that a nonzero scalar multiple of a minimal codeword is again a minimal codeword. Nonzero minimal codewords play a role in minimum distance decoding algorithms [6, 8, 9] and in secret sharing schemes and access structures [80, 117]. We can generalize this notion to subcodes instead of words.

Definition 5.4.35 A minimal subcode of dimension $r$ of a code $C$ is an $r$-dimensional subcode whose support does not properly contain the support of another $r$-dimensional subcode.

Remark 5.4.36 A minimal codeword generates a minimal subcode of dimension one, and all the elements of a minimal subcode of dimension one are minimal codewords. A codeword of minimal weight is a nonzero minimal codeword, but the converse is not always the case. In Example 5.4.32 it is shown that the codes $C_5$ and $C_6$ have the same Tutte polynomial, whereas the number of minimal subcodes of dimension one of the code $C_5$ is 12 and of $C_6$ is 13. Hence the number of minimal codewords and subcodes is not determined by the Tutte polynomial. However, the number of minimal codewords and the number of minimal subcodes of a given dimension are given by the Möbius polynomial.

Theorem 5.4.37 Let $C$ be a code of dimension $k$. Let $0 \leq r \leq k$. Then the number of minimal subcodes of dimension $r$ is equal to $W_{k-r}$, the $(k-r)$-th Whitney number of the second kind, and it is determined by the Möbius polynomial.

Proof. Let $D$ be a subcode of $C$ of dimension $r$. Let $J$ be the complement in $[n]$ of the support of $D$. If $d \in D$ and $d_j \neq 0$, then $j \in \operatorname{supp}(D)$ and $j \notin J$. Hence $D \subseteq C(J)$. Now suppose moreover that $D$ is a minimal subcode of $C$. Without loss of generality we may assume that $D$ is systematic at the first $r$ positions, so $D$ has a generator matrix of the form $(I_r \mid A)$. Let $d_j$ be the $j$-th row of this matrix. Let $c \in C(J)$ and put $c' = c - \sum_{j=1}^{r} c_j d_j$. If $c'$ is not the zero word, then the subcode $D'$ of $C$ generated by $c', d_2, \ldots, d_r$ has dimension $r$, its support is contained in $\operatorname{supp}(D) \setminus \{1\}$, and $1 \in \operatorname{supp}(D)$. This contradicts the minimality of $D$. Hence $c' = 0$ and $c \in D$. Therefore $D = C(J)$.
To find a minimal subcode of dimension $r$, we fix $l(J) = r$ and minimize the support of $C(J)$ with respect to inclusion. Because $J$ is contained in the complement in $[n]$ of the support of $C(J)$, this is equivalent to maximizing $J$ with respect to inclusion. In matroid terms this means we are maximizing $J$ for $r(J) = k - l(J) = k - r$. This means $J = \bar{J}$ is a flat of rank $k-r$, by Remark 5.3.45. The flats of a matroid are the elements of the geometric lattice $L = L(M)$. The number of elements of rank $k-r$ in $L(M)$ is equal to $|L_{k-r}|$, which is the Whitney number of the second kind $W_{k-r}$ and thus equal to the leading coefficient of $\mu_{k-r}(T)$ by Remark 5.4.9. Hence the Möbius polynomial determines the numbers of minimal subcodes of dimension $r$ for all $0 \leq r \leq k$. $\square$
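For very small codes the nonzero minimal codewords can be found by direct enumeration. The following Python sketch is an added illustration of Definition 5.4.33; the binary $[4,2]$ code and all names are chosen for the example only.

    # List the nonzero minimal codewords of a small binary code: codewords
    # whose support does not properly contain the support of another
    # nonzero codeword. Everything is over F_2, so enumeration suffices.
    from itertools import product

    def codewords(G, n):
        k = len(G)
        return {tuple(sum(m[i] * G[i][j] for i in range(k)) % 2
                      for j in range(n))
                for m in product([0, 1], repeat=k)}

    def support(c):
        return frozenset(j for j, cj in enumerate(c) if cj)

    G = [[1, 0, 1, 1],      # an illustrative [4, 2] binary code
         [0, 1, 1, 0]]
    words = codewords(G, 4)
    supports = {support(c) for c in words if any(c)}
    minimal = [c for c in words if any(c)
               and not any(s < support(c) for s in supports)]
    print(sorted(minimal))

For this code the three nonzero codewords have supports $\{1,3,4\}$, $\{2,3\}$ and $\{1,2,4\}$, none of which contains another, so all three are minimal.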
Remark 5.4.38 Note that the flats of rank $k-r$ in a matroid are exactly the hyperplanes of the $(r-1)$-th truncated matroid $T^{r-1}(M)$. This gives another proof of the result of Britz [28, Theorem 3] that the minimal supports of dimension $r$ are the cocircuits of the $(r-1)$-th truncated matroid. For $r = 1$ this gives the well-known equivalence between nonzero minimal codewords and cocircuits. See [?, Theorem 9.2.4] and [123, 1.21].

5.4.5 Two variable zeta function

Generally the counting of rational points over the field extensions $\mathbb{F}_{q^m}$ is captured by the zeta function.

Definition 5.4.39 Let $\mathcal{X}$ be an affine variety in $\mathbb{A}^k$ defined over $\mathbb{F}_q$, that is, the zero set of a collection of polynomials in $\mathbb{F}_q[X_1, \ldots, X_k]$. Then $\mathcal{X}(\mathbb{F}_{q^m})$ is the set of all points of $\mathcal{X}$ with coordinates in $\mathbb{F}_{q^m}$, also called the set of $\mathbb{F}_{q^m}$-rational points of $\mathcal{X}$. The zeta function $Z_{\mathcal{X}}(T)$ of $\mathcal{X}$ is the formal power series in $T$ defined by
$$Z_{\mathcal{X}}(T) = \exp\left( \sum_{m=1}^{\infty} |\mathcal{X}(\mathbb{F}_{q^m})|\, \frac{T^m}{m} \right).$$

Theorem 5.4.40 Let $\mathcal{A}$ be a central simple arrangement in $\mathbb{F}_q^k$. Let $\chi_{\mathcal{A}}(T) = \sum_{j=0}^{k} c_j T^j$ be the characteristic polynomial of $\mathcal{A}$. Let $\mathcal{M} = \mathbb{A}^k \setminus (H_1 \cup \cdots \cup H_n)$ be the complement of the arrangement. Then the zeta function of $\mathcal{M}$ is given by
$$Z_{\mathcal{M}}(T) = \prod_{j=0}^{k} (1 - q^j T)^{-c_j}.$$

Proof. See [17, Theorem 3.6]. $\square$

***Two variable zeta function of Duursma***

5.4.6 Overview

We have established relations between the generalized weight enumerators for $0 \leq r \leq k$, the extended weight enumerator and the Tutte polynomial. We summarize this in the following diagram:

***diagram: the polynomials $W_C(X,Y)$, $W_C(X,Y,T)$, $\{W_C^{(r)}(X,Y)\}_{r=0}^{k}$, $\{W_C^{(r)}(X,Y,T)\}_{r=0}^{k}$ and $t_C(X,Y)$, connected by the conversions of Theorems 4.5.21, 4.5.23, 5.2.20, 5.2.21 and 5.2.22***
We see that the Tutte polynomial, the extended weight enumerator and the collection of generalized weight enumerators all contain the same amount of information about a code, because they completely determine each other. The original weight enumerator $W_C(X,Y)$ contains less information and therefore does not determine $W_C(X,Y,T)$ or $\{W_C^{(r)}(X,Y)\}_{r=0}^{k}$. See Simonis [109].
One may wonder if the method of generalizing and extending the weight enumerator can be continued, creating the generalized extended weight enumerator, in order to get a stronger invariant. The answer is no: the generalized extended weight enumerator can be defined, but it does not contain more information than the three underlying polynomials. It was shown by Gray [29] that the matroid of a code is a stronger invariant than its Tutte polynomial.

5.4.7 Exercises

5.4.1 Give a proof of the formulas in Example 5.4.6.

5.4.2 Give a proof of Remark 5.4.25.

5.4.3 Compute the two variable Möbius and coboundary polynomials of the simplex code $S_3(q)$.

5.5 Combinatorics and codes

***Intro***

5.5.1 Orthogonal arrays and codes

Definition 5.5.1 Let $q$ be a positive integer, not necessarily a power of a prime. A Latin square of order $q$ is a $q \times q$ array with entries from a set $Q$ of $q$ elements, such that every column and every row is a permutation of the symbols of $Q$.

Example 5.5.2 An example of a Latin square of order 4 with $Q = \{a, b, c, d\}$ is given by

a d c b
d a b c
c b a d
b c d a

Remark 5.5.3 An alternative way to represent a Latin square is by a map $L : R \times C \to Q$, where $R$, $C$ and $Q$ are the sets of rows, columns and values, respectively, all three of size $q$. Then $L$ represents a Latin square if and only if $L(x,j) = k$ has a unique solution $x \in R$ for all $j \in C$ and $k \in Q$, and $L(i,y) = k$ has a unique solution $y \in C$ for all $i \in R$ and $k \in Q$. Any permutation of the rows, that is, of the set $R$, gives another Latin square, and similarly permutations of the columns $C$ and of the entries $Q$ give again Latin squares.

Example 5.5.4 Let $(G, \cdot)$ be a group, where $\cdot$ is the multiplication of $G$. Let $R$, $C$ and $Q$ all three be equal to $G$. Let $L(x,y) = x \cdot y$. Then $L$ defines a Latin square of order $|G|$.
Remark 5.5.5 A pair of mutually orthogonal Latin squares is classically called a Graeco-Latin square. Euler's problem of the 36 officers asks for such a pair of order 6; in fact no two mutually orthogonal Latin squares of order 6 exist.

Definition 5.5.6 Two Latin squares L_1 and L_2 are called mutually orthogonal if Q^2 is equal to the set of all pairs (L_1(x, y), L_2(x, y)) with x, y ∈ Q. A collection {L_i : i ∈ J} of Latin squares L_i of order q with entries from a set Q is called a set of mutually orthogonal Latin squares (MOLS) if L_i and L_j are mutually orthogonal for all i, j ∈ J with i ≠ j.

Example 5.5.7 Consider Q = F_q with addition +. Let L_a(x, y) = x + ay. Then L_a defines a Latin square of order q for every a ∈ F_q^*. Furthermore {L_a : a ∈ F_q^*} forms a collection of q − 1 MOLS of order q.

Example 5.5.8 In GAP one can construct lists of MOLS. For example for q = 7 we can construct 6 MOLS:

M:=MOLS(7,6);;
M[1];
[ [ 0, 1, 2, 3, 4, 5, 6 ], [ 1, 2, 3, 4, 5, 6, 0 ], [ 2, 3, 4, 5, 6, 0, 1 ],
  [ 3, 4, 5, 6, 0, 1, 2 ], [ 4, 5, 6, 0, 1, 2, 3 ], [ 5, 6, 0, 1, 2, 3, 4 ],
  [ 6, 0, 1, 2, 3, 4, 5 ] ]

Definition 5.5.9 Let n ≥ 2. An orthogonal array OA(q, n) of order q and depth n is a q^2 × n array whose entries are from a set Q of q elements, such that for every two columns all q^2 pairs of symbols from Q appear in exactly one row.

Remark 5.5.10 Let J = {1, 2, . . . , j}. Let {L_i : i ∈ J} be a collection of j MOLS of order q. Let n = j + 2. We can construct a q^2 × n orthogonal array as follows. Identify R and C with Q by means of bijections, so we may assume that they are equal. In the first two columns all q^2 pairs of Q^2 are tabulated. If (x, y) is in the row of the first two columns, then L_i(x, y) is in column i + 2 of the same row. Conversely an OA(q, n) gives rise to n − 2 MOLS of order q if n ≥ 3. In particular an OA(q, 3) is a Latin square and an OA(q, 4) corresponds to two mutually orthogonal Latin squares.

Example 5.5.11 Let q be a power of a prime. Then a collection of q − 1 MOLS of order q is constructed in Example 5.5.7. Therefore there exists an OA(q, q + 1).

Remark 5.5.12 Let {L_i : i ∈ J} be a collection of n − 2 MOLS of order q, with corresponding array A, an OA(q, n). A permutation σ of the rows R gives a collection of Latin squares which are again mutually orthogonal, with a corresponding array A_1. Then A_1 is obtained from A by permuting the symbols in the first column under σ and leaving the remaining columns unchanged. Similarly, a permutation of the columns C gives an array A_2 that is obtained from A by permuting the symbols in the second column. A permutation of the entries from Q of L_i gives an array A_{i+2} that is obtained from A by permuting the symbols in the (i + 2)-th column.
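For prime q, the construction of Example 5.5.7 and the passage to an orthogonal array in Remark 5.5.10 can be carried out directly. A minimal Python sketch, assuming q prime so that F_q is just arithmetic modulo q (all function names are ours):

    def mols_family(q):
        """The q-1 MOLS L_a(x, y) = x + a*y of Example 5.5.7, q prime."""
        return [[[(x + a * y) % q for y in range(q)] for x in range(q)]
                for a in range(1, q)]

    def are_orthogonal(L1, L2):
        """Definition 5.5.6: all q^2 ordered pairs must occur."""
        q = len(L1)
        pairs = {(L1[x][y], L2[x][y]) for x in range(q) for y in range(q)}
        return len(pairs) == q * q

    def oa_from_mols(mols):
        """Remark 5.5.10: a q^2 x (j+2) orthogonal array from j MOLS."""
        q = len(mols[0])
        return [[x, y] + [L[x][y] for L in mols]
                for x in range(q) for y in range(q)]

    mols = mols_family(5)
    print(all(are_orthogonal(L1, L2)
              for i, L1 in enumerate(mols) for L2 in mols[i+1:]))  # True
    A = oa_from_mols(mols)    # an OA(5, 6): 25 rows, 6 columns
    print(len(A), len(A[0]))  # 25 6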
Remark 5.5.13 Let A be an OA(q, n) with entries in Q. Then any two rows of A are distinct and coincide in at most one position. Let C be the subset of Q^n consisting of the rows of A. Then C is a nonlinear code of length n with q^2 codewords and minimum distance n − 1. So C attains the Singleton bound of Exercise 3.2.1. Conversely, any nonlinear (n, q^2, n − 1) code yields an OA(q, n).

The following proposition is a generalization of Proposition 4.4.25 in the case k = 2, that is, n ≤ q + 1 if there exists an [n, 2, n − 1] code over F_q.

Proposition 5.5.14 Suppose there exists an orthogonal array OA(q, n). Then n ≤ q + 1.

Proof. Let A be the array of an OA(q, n). Choose an element in Q and denote it by 0. If the symbols in the i-th column of A are permuted, while the other columns remain unchanged, the new array is again an OA(q, n) by Remark 5.5.12. Therefore we may assume without loss of generality that the first row of A consists of zeros. The distance between two rows is at least n − 1 by Remark 5.5.13. Hence, apart from the first row, no other row contains two zeros. Next, each element from Q occurs in every column of A exactly q times; we leave this as an exercise for the reader. Count the number of rows that contain exactly one zero. This number is n(q − 1). Indeed, zero appears q times in each column, of which the zero in the first row has already been counted, and since the i-th row with i > 1 cannot have more than one zero, all these remaining zeros lie in different rows. So 1 + n(q − 1) is the number of rows that contain a zero, and this is at most q^2, the total number of rows. Therefore n ≤ q + 1.

Remark 5.5.15 The bound of Proposition 5.5.14 is tight if q is a power of a prime, by Example 5.5.11.

Consider the following generalization of an orthogonal array.

Definition 5.5.16 An orthogonal array OA(q, n, λ) is a λq^2 × n array whose entries are from a set Q of q elements, such that for every two columns each of the q^2 pairs of symbols from Q occurs in exactly λ rows. In particular OA(q, n) = OA(q, n, 1).

The next result we present without proof. It provides a lower bound on the value of λ in terms of q and n.

Theorem 5.5.17 If there exists an orthogonal array OA(q, n, λ), then

λ ≥ (n(q − 1) + 1) / q^2.

Proof. Reference: ***...***

Definition 5.5.18 An orthogonal array OA_λ(t, n, q) is an M × n array, where M = λq^t, whose entries are from a set Q of q ≥ 2 elements, such that for every M × t subarray all q^t possible t-tuples occur exactly λ times as a row. The parameters λ, t, n, q and M are called the index, strength, constraints, levels and size, respectively. The orthogonal array is called linear if Q = F_q and the rows of the array form an F_q-linear subspace of F_q^n.
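Definition 5.5.18 is also easy to test mechanically. A Python sketch of a strength-t checker, under the conventions of Definition 5.5.18 (the function name is ours):

    from itertools import combinations, product

    def is_orthogonal_array(A, t, q):
        """Check Definition 5.5.18: in every choice of t columns, each of
        the q^t t-tuples occurs the same number lambda = M / q^t of times."""
        M, n = len(A), len(A[0])
        if M % (q ** t) != 0:
            return False
        lam = M // (q ** t)
        symbols = sorted({x for row in A for x in row})
        for cols in combinations(range(n), t):
            counts = {}
            for row in A:
                key = tuple(row[c] for c in cols)
                counts[key] = counts.get(key, 0) + 1
            if any(counts.get(tup, 0) != lam
                   for tup in product(symbols, repeat=t)):
                return False
        return True

    # A small OA(3, 4) of index 1, built as in Remark 5.5.10 from the
    # two MOLS x+y and x+2y over F_3.
    A = [[x, y, (x + y) % 3, (x + 2 * y) % 3]
         for x in range(3) for y in range(3)]
    print(is_orthogonal_array(A, 2, 3))  # True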
Remark 5.5.19 An OA(q, n, λ) is an orthogonal array of strength 2, that is, OA(q, n, λ) = OA_λ(2, n, q). ***Notice that the order of n and q is interchanged according to the literature!!! Should we adopt this convention too?***

Theorem 5.5.20 The following objects correspond to each other:
1) an F_q-linear [n, k, d] code C;
2) a linear orthogonal array OA_{q^s}(d − 1, n, q), where s = n − k + 1 − d is the Singleton defect of C.

Proof. Let C be an F_q-linear [n, k, d] code with Singleton defect s = s(C) = n − k + 1 − d. Consider the q^{n−k} × n matrix A having as rows the codewords of C^⊥. Then A is a linear OA_{q^s}(d − 1, n, q). ***...***

Remark 5.5.21 An OA_1(n − k, n, q) is a nonlinear generalization of an F_q-linear MDS code of length n and dimension k.

Consider the following generalization of Corollary 4.4.27 on MDS codes.

Theorem 5.5.22 (Bush bound) Let A be an OA_1(k, n, q). If q ≤ k, then n ≤ k + 1.

Proof. ***...***

5.5.2 Designs and codes

5.5.3 Exercises

5.5.1 Prove that Example 5.5.7 gives a set of q − 1 mutually orthogonal Latin squares of order q.

5.5.2 Let q be a positive integer. Show that q − 1 is the maximal number of MOLS of order q.

5.5.3 Show that there exist t MOLS of order qr if there exist t MOLS of order q and of order r, respectively.

5.5.4 Let n ≥ 3. Give a proof of the correspondence between an OA(q, n) and n − 2 MOLS of order q of Remark 5.5.10.

5.5.5 Let A be the array of an OA(q, n, λ) with entries from Q. Show that every symbol of Q occurs in every column of A exactly λq times.

5.5.6 Let A be the array of an OA_λ(t, n, q) with entries from Q. Let A′ be obtained from A by permuting the symbols in a given column and leaving the remaining columns unchanged. Show that A′ is the array of an OA_λ(t, n, q).

5.5.7 [CAS] Write two procedures:
• the first takes as input a q × q table and checks whether the table is a Latin square; check your procedure against IsLatinSquare in GAP;
• the second, given a list of q × q tables, checks whether they are MOLS; use AreMOLS from GAP to test your procedure.
5.6 Notes

Section 4.1.6: The MDS conjecture is confirmed for all q with 2 ≤ q ≤ 11; Blokhuis–Bruen–Thas, Hirschfeld–Storme.

Section 4.2: Theory of arrangements of hyperplanes [92]. The use of the isomorphism in Proposition 4.5.18 for the proof of Theorem 4.5.21 was suggested by Simonis in [109]. Proposition 4.5.20 first appears in [63, Theorem 3.2], although the term "generalized weight enumerator" was yet to be invented. The identity of Lemma 4.5.22 can be found in [5, 27, 71, 113, 128].

Section 4.3: Applications of GHWs:
***dimension/length profile, Forney***
***wire-tap channel of type II***
***trellis complexity***
***r-th rank MDS, Kløve, Simonis, Wei***
***Question: does the two variable weight enumerator determine the generalized weight enumerators?***
***C AMDS and C^⊥ AMDS iff d_2 = d_1 + 2***
***If d qs(C), then ...***
***weight enumerator of AMDS codes***

Section 4.4: Theory of lattices [38, ?]. The polynomial µ_L(S, T) is defined by Zaslavsky in [139, Section 1]. In [140, Section 2] and [?, Section 6] it is called the Whitney polynomial. The polynomial χ_L(S, T) is called the coboundary polynomial by Crapo in [42, p. 605] and [43]. See also [30, 32].

Minihypers, blocking sets and codes meeting the Griesmer bound: Belov, Hamada–Helleseth, Storme.

Section 4.4.2: Corollary 4.3.25 was first proved by Oberst and Dür [?], with the weaker assumption q^m > \binom{n−1}{d−1} − \binom{n−k−1}{d−1}, where C is an [n, k, d] code. Proposition 4.3.24 was shown by Pellikaan [?] with a stronger conclusion.
(Complete) n-arcs, ovals, Segre: an oval is a (q + 1)-arc if q is odd. ***B. Segre, conic, q odd; curve in characteristic 2, nucleus*** Conjectures of Segre; Hirschfeld–Thas, Hirschfeld–Korchmáros–Torres p. 599.

Section 4.5:

Section 4.6: Literature on (mutually orthogonal) Latin squares, orthogonal arrays, codes and designs:

J.H. van Lint and R.M. Wilson, A course in combinatorics, pages 158, 250, 261, 382 and 495.

P.J. Cameron and J.H. van Lint, Designs, graphs, codes and their links, pages 14, 93, 170, 209.

Links between coding theory and statistical objects:
R.C. Bose, "On some connections between the design of experiments and information theory," Bull. Inst. Internat. Statist., vol. 38, pp. 257–271, 1961.

Connection between orthogonal arrays and error-correcting codes with a given defect:
R.C. Bose and K.A. Bush, "Orthogonal arrays of strength two and three," Ann. Math. Stat., vol. 23, pp. 508–524, 1952.

The construction of orthogonal arrays of maximal length and the Bush bound:
K.A. Bush, "Orthogonal arrays of index unity," Ann. Math. Stat., vol. 23, pp. 426–434, 1952.

J.W.P. Hirschfeld and L. Storme, "The packing problem in statistics, coding theory and finite projective spaces," Journ. Stat. Planning and Inference, vol. 72, pp. 355–380, 1998.

The notion of an OA_λ(t, q, n) as a generalization of MOLS is from:
C.R. Rao, "Factorial experiments derivable from combinatorial arrangements of arrays," Journ. Royal Stat. Soc. Suppl., vol. 9, pp. 128–139, 1947.

***Bose–Bush, Bierbrauer, Stinson***
***t-resilient functions***
***The design of statistical experiments***
***Lattices and codes***
Chapter 6

Complexity and decoding

Stanislav Bulygin, Ruud Pellikaan and Xin-Wen Wu

6.1 Complexity

In this section we briefly explain the theory of complexity and introduce some hard problems which are related to the theme of this book and will be useful in the following chapters.

6.1.1 Big-Oh notation

The following definitions and notations are essential in the evaluation of the complexity of an algorithm.

Definition 6.1.1 Let f(n) and g(n) be functions mapping non-negative integers to real numbers. We define
(1) f(n) = O(g(n)) for n → ∞, if there exist a real constant c > 0 and an integer constant n_0 > 0 such that 0 ≤ f(n) ≤ cg(n) for all n ≥ n_0.
(2) f(n) = Ω(g(n)) for n → ∞, if there exist a real constant c > 0 and an integer constant n_0 > 0 such that 0 ≤ cg(n) ≤ f(n) for all n ≥ n_0.
(3) f(n) = Θ(g(n)) for n → ∞, if there exist real constants c_1 > 0 and c_2 > 0, and an integer constant n_0 > 0 such that c_1 g(n) ≤ f(n) ≤ c_2 g(n) for all n ≥ n_0.
(4) f(n) ≈ g(n) for n → ∞, if lim_{n→∞} f(n)/g(n) = 1.
(5) f(n) = o(g(n)) for n → ∞, if for every real constant ε > 0 there exists an integer constant n_0 > 0 such that 0 ≤ f(n) < εg(n) for all n ≥ n_0.

Remark 6.1.2 The notations f(n) = O(g(n)) and f(n) = o(g(n)) of Landau are often referred to as the "big-Oh" and "little-oh" notations. Furthermore f(n) = O(g(n)) is expressed as "f(n) is of the order g(n)". Intuitively, this means that f(n) grows no faster asymptotically than g(n) up to a constant. And
f(n) ≈ g(n) is expressed as "f(n) is approximately equal to g(n)". Similarly, f(n) = Ω(g(n)) and f(n) = Θ(g(n)) are referred to in the literature as the "big-Omega" and "big-Theta" notations, respectively.

Example 6.1.3 It is easy to see that for every positive constant a, we have a = O(1) and a/n = O(1/n). Let f(n) = a_k n^k + a_{k−1} n^{k−1} + · · · + a_0, where k is an integer constant and a_k, a_{k−1}, . . . , a_0 are real constants with a_k > 0. For this polynomial in n, we have f(n) = O(n^k), f(n) = Θ(n^k), f(n) ≈ a_k n^k and f(n) = o(n^{k+1}) for n → ∞. We have 2 log n + 3 log log n = O(log n), 2 log n + 3 log log n = Θ(log n) and 2 log n + 3 log log n ≈ 2 log n for n → ∞, since 2 log n ≤ 2 log n + 3 log log n ≤ 5 log n when n ≥ 2 and lim_{n→∞} log log n / log n = 0.

6.1.2 Boolean functions

An algorithm is a well-defined computational procedure such that every execution takes a variable input and halts with an output. The complexity of an algorithm or a computational problem comprises time complexity and storage space complexity.

Definition 6.1.4 A (binary) elementary (arithmetic) operation is an addition, a comparison or a multiplication of two elements x, y ∈ {0, 1} = F_2. Let A be an algorithm that has as input a binary word. Then the time or work complexity C_T(A, n) is the number of elementary operations in the algorithm A to get the output as a function of the length n of the input, that is, the number of bits of the input. The space or memory complexity C_S(A, n) is the maximum number of bits needed for memory during the execution of the algorithm with an input of n bits. The complexity C(A, n) is the maximum of C_T(A, n) and C_S(A, n).

Example 6.1.5 Let C be a binary [n, k] code given by the generator matrix G. Then the encoding procedure

(a_1, . . . , a_k) → (a_1, . . . , a_k)G

is an algorithm. For every execution of the encoding algorithm, the input is a vector of length k which represents a message block; the output is a codeword of length n. To compute one entry of a codeword one has to perform k multiplications and k − 1 additions. The work complexity of this encoding is therefore n(2k − 1). The memory complexity is nk + k + n: the number of bits needed to store the input vector, the matrix G and the output codeword. Thus the complexity is dominated by the work complexity and is n(2k − 1).

Example 6.1.6 In coding theory the code length is usually taken as the measure of input size. In the case of binary codes this coincides with the above complexity measures. For q-ary codes an element of F_q has a minimal binary representation by ⌈log(q)⌉ bits. A received word of length n that is the input of a decoding algorithm can be represented by a binary word of length N = n⌈log(q)⌉. In case the finite field is fixed there is no danger of confusion, but in case the efficiencies of algorithms for distinct finite fields are compared, everything should be expressed in
terms of the number of binary elementary operations as a function of the length of the input as a binary string. Let us see how this works out for solving a system of linear equations over a finite field. Whereas an addition or a multiplication counts for 1 unit in the binary case, this is no longer so in the q-ary case. An addition in F_q takes log(q) binary elementary operations, and a multiplication needs O(m^2 log^2(p) + m log^3(p)) = O(log^3(q)) elementary operations, where q = p^m and p is the characteristic of the finite field, see ??. The Gauss–Jordan algorithm to solve a system of n linear equations in n unknowns over a finite field F_q needs O(n^3) additions and multiplications in F_q. That means the binary complexity is O(n^3 log^3(q)) = O(N^3), where N = n⌈log(q)⌉ is the length of the binary input.

The known decoding algorithms that have polynomial complexity and that will be treated in the sequel all reduce to linear algebra computations, so they have complexity O(n^3) elementary operations in F_q, or O(N^3) bit operations. So we will take the code length n as the measure of input size, and state the complexity as a function of n. These polynomial decoding algorithms apply to restricted classes of linear codes.

To study the theory of complexity, two computational models that are both widely used in the literature are the Turing machine (TM) model and the Boolean circuit model. Of these two models the Boolean circuit model has an especially simple definition and is viewed as more amenable to combinatorial analysis. A Boolean circuit represents a Boolean function in a natural way, and Boolean functions have many applications in the theory of coding. In this book we choose Boolean circuits as the computational model.

*** One or two paragraphs on Boolean circuits vs. Turing machines (cf. R.B. Boppana and M. Sipser, "The Complexity of Finite Functions") ***

The basic elements of a Boolean circuit are Boolean gates, namely AND, OR, NOT and XOR, which are defined by the following truth tables.

The truth table of AND (denoted by ∧):

∧ | F T
F | F F
T | F T

The truth table of OR (denoted by ∨):

∨ | F T
F | F T
T | T T

The truth table of NOT (denoted by ¬):

x  | F T
¬x | T F

The truth table of XOR:
XOR | F T
F   | F T
T   | T F

It is easy to check that the XOR gate can be expressed in AND, OR and NOT as follows:

x XOR y = (x ∧ (¬y)) ∨ ((¬x) ∧ y).

The NAND operation is an AND operation followed by a NOT operation; the NOR operation is an OR operation followed by a NOT operation. In the following definition of Boolean circuits we restrict ourselves to the operations AND, OR and NOT.

Substituting F = 0 and T = 1, the Boolean gates above become operations on bits (called logical operations on bits). We have

the ∧ operation: 0 ∧ 0 = 0, 0 ∧ 1 = 0, 1 ∧ 0 = 0, 1 ∧ 1 = 1;
the ∨ operation: 0 ∨ 0 = 0, 0 ∨ 1 = 1, 1 ∨ 0 = 1, 1 ∨ 1 = 1;
the NOT operation: ¬0 = 1, ¬1 = 0.

Consider the binary elementary arithmetic operations + and ·. It is easy to verify that

x · y = x ∧ y, and x + y = x XOR y = (x ∧ (¬y)) ∨ ((¬x) ∧ y).

Definition 6.1.7 Given positive integers n and m, a Boolean function is a function b : {0, 1}^n → {0, 1}^m. It is also called an n-input, m-output Boolean function, and the set of all such functions is denoted by B(n, m). Denote B(n, 1) by B(n).

Remark 6.1.8 The number of elements of B(n, m) is (2^m)^{2^n} = 2^{m2^n}. Identify {0, 1} with the binary field F_2. Let b_1 and b_2 be elements of B(n, m). Then the sum b_1 + b_2 is defined by (b_1 + b_2)(x) = b_1(x) + b_2(x) for x ∈ F_2^n. In this way the set of Boolean functions B(n, m) is a vector space over F_2 of dimension m2^n. Let b_1 and b_2 be elements of B(n). Then the product b_1b_2 is defined by (b_1b_2)(x) = b_1(x)b_2(x) for x ∈ F_2^n. In this way B(n) is an F_2-algebra with the property b^2 = b for all b in B(n).
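The gate identities above are finite statements, so they can be checked exhaustively. A small Python sketch verifying the expression of XOR and of the arithmetic operations + and · in terms of the gates:

    from itertools import product

    # Gates on bits: AND is &, OR is |, NOT x is 1 - x, XOR is ^.
    for x, y in product((0, 1), repeat=2):
        # x XOR y = (x AND NOT y) OR (NOT x AND y)
        assert x ^ y == (x & (1 - y)) | ((1 - x) & y)
        # Arithmetic in F_2: multiplication is AND, addition is XOR.
        assert (x * y) % 2 == x & y
        assert (x + y) % 2 == x ^ y
    print("all gate identities verified")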
Every polynomial f(X) in F_2[X_1, . . . , X_n] yields a Boolean function f̃ : F_2^n → F_2 by evaluation: f̃(x) = f(x) for x ∈ F_2^n. Consider the map

ev : F_2[X_1, . . . , X_n] → B(n)

defined by ev(f) = f̃. Then ev is an algebra homomorphism. Now X̃_i^2 = X̃_i for all i. Hence the ideal ⟨X_1^2 + X_1, . . . , X_n^2 + X_n⟩ is contained in the kernel of ev. The factor ring F_2[X_1, . . . , X_n]/⟨X_1^2 + X_1, . . . , X_n^2 + X_n⟩ and B(n) are both F_2-algebras of the same dimension 2^n. Hence ev induces an isomorphism

ev : F_2[X_1, . . . , X_n]/⟨X_1^2 + X_1, . . . , X_n^2 + X_n⟩ → B(n).

Example 6.1.9 Let sym_k(x) be the Boolean function defined by the following polynomial in the k^2 variables x_{ij}, 1 ≤ i, j ≤ k:

sym_k(x) = \prod_{i=1}^{k} \sum_{j=1}^{k} x_{ij}.

This description needs k(k − 1) additions and k − 1 multiplications, so k^2 − 1 elementary operations in total. If we write sym_k in normal form by expanding the products, the description is of the form

sym_k(x) = \sum_{σ∈KK} \prod_{i=1}^{k} x_{iσ(i)},

where KK is the set of all functions σ : {1, . . . , k} → {1, . . . , k}. This expression has k^k terms, each a product of k factors. So it needs (k − 1)k^k multiplications and k^k − 1 additions, hence k^{k+1} − 1 elementary operations in total. This last description has exponential complexity.

Example 6.1.10 Computing the binary determinant. Let det_k(x) be the Boolean function of the k^2 variables x_{ij}, 1 ≤ i, j ≤ k, that computes the determinant over F_2 of the k × k matrix x = (x_{ij}). Hence

det_k(x) = \sum_{σ∈S_k} \prod_{i=1}^{k} x_{iσ(i)},

where S_k is the symmetric group on k elements. This expression has k! terms, each a product of k factors. Therefore k(k!) − 1 elementary operations are needed in total. Let x̂_{ij} be the square matrix of size k − 1 obtained by deleting the i-th row and the j-th column from x. Using the cofactor expansion

det_k(x) = \sum_{j=1}^{k} x_{1j} det_{k−1}(x̂_{1j}),

we see that the complexity of this computation is of the order O(k!). This complexity is still exponential. But det_k has complexity O(k^3) by Gaussian elimination. This translates into a description of det_k as a Boolean function with O(k^3) elementary operations. ***explicit description, worked out in an example for det_3***
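To make the contrast in Example 6.1.10 concrete, here is a Python sketch computing det_k over F_2 both by the k!-term permutation expansion and by O(k^3) Gaussian elimination (both function names are ours):

    from itertools import permutations
    import random

    def det_expansion(x):
        """det over F_2 via the permutation expansion: k! products."""
        k = len(x)
        total = 0
        for sigma in permutations(range(k)):
            term = 1
            for i in range(k):
                term &= x[i][sigma[i]]
            total ^= term
        return total

    def det_gauss(x):
        """det over F_2 via Gaussian elimination: O(k^3) bit operations."""
        a = [row[:] for row in x]
        k = len(a)
        for col in range(k):
            pivot = next((r for r in range(col, k) if a[r][col]), None)
            if pivot is None:
                return 0
            a[col], a[pivot] = a[pivot], a[col]
            for r in range(col + 1, k):
                if a[r][col]:
                    a[r] = [u ^ v for u, v in zip(a[r], a[col])]
        return 1  # all pivots found: the determinant over F_2 is 1

    random.seed(1)
    for _ in range(100):
        m = [[random.randint(0, 1) for _ in range(4)] for _ in range(4)]
        assert det_expansion(m) == det_gauss(m)
    print("expansion and elimination agree on random 4 x 4 matrices")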
Example 6.1.11 A Boolean function computing whether an integer is prime or not. Let prime_m(x) be the Boolean function defined by

prime_m(x_1, . . . , x_m) = 1 if x_1 + x_2·2 + · · · + x_m·2^{m−1} is a prime, and 0 otherwise.

So prime_2(x_1, x_2) = x_2 and prime_3(x_1, x_2, x_3) = x_2 + x_1x_3 + x_2x_3. Only very recently was it proved that the decision problem whether an integer is prime or not has polynomial complexity, see ??.

Example 6.1.12 ***A Boolean function computing exponentiation exp_a; see Coppersmith and Shparlinski, "A polynomial approximation of DL and DH mapping," Journ. Crypt., vol. 13, pp. 339–360, 2000.***

Remark 6.1.13 From these examples we see that the complexity of a Boolean function depends on the way we write it as a combination of elementary operations. We can formally define the complexity of a Boolean function f in terms of the size of a circuit that represents the Boolean function.

Definition 6.1.14 A Boolean circuit is a directed graph containing no cycles (that is, if there is a route from one node to another node, then there is no way back), which has the following structure:
(i) Any node (also called a vertex) v has in-degree (that is, the number of edges entering v) equal to 0, 1 or 2, and out-degree (that is, the number of edges leaving v) equal to 0 or 1.
(ii) Each node is labeled by one of AND, OR, NOT, 0, 1, or a variable x_i.
(iii) If a node has in-degree 0, then it is called an input and is labeled by 0, 1, or a variable x_i.
(iv) If a node has in-degree 1 and out-degree 1, then it is labeled by NOT.
(v) If a node has in-degree 2 and out-degree 1, then it is labeled by AND or OR.
In a Boolean circuit, any node with in-degree greater than 0 is called a gate. Any node with out-degree 0 is called an output.

Remark 6.1.15 From the definition we observe the following:
(1) A Boolean circuit can have more than one input and more than one output.
(2) If a Boolean circuit has n variables x_1, x_2, . . . , x_n and m outputs, then it represents a Boolean function f : {0, 1}^n → {0, 1}^m in a natural way.
(3) Any Boolean function f : {0, 1}^n → {0, 1}^m can be represented by a Boolean circuit.
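The normal form for prime_3 given in Example 6.1.11 can be verified exhaustively, which also illustrates how a small circuit is checked against the function it should represent. A Python sketch:

    from itertools import product

    def is_prime(n):
        return n >= 2 and all(n % d for d in range(2, n))

    # prime_3(x1, x2, x3) should be 1 exactly when x1 + 2*x2 + 4*x3 is prime.
    for x1, x2, x3 in product((0, 1), repeat=3):
        formula = (x2 + x1 * x3 + x2 * x3) % 2   # arithmetic in F_2
        assert formula == int(is_prime(x1 + 2 * x2 + 4 * x3))
    print("prime_3 normal form verified")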
Definition 6.1.16 The size of a Boolean circuit is the number of gates that it contains. The depth of a Boolean circuit is the length of the longest path from an input to an output. For a Boolean function f, the time complexity of f, denoted by C_T(f), is the smallest of the sizes of the Boolean circuits representing f. The space complexity (also called depth complexity), denoted by C_S(f), is the smallest of the depths of the Boolean circuits representing f.

Theorem 6.1.17 (Shannon) There exists a family of Boolean functions of exponential complexity. More precisely, almost every Boolean function in B(n) requires circuits of size larger than 2^n/(10n).

Proof. Let us first give an upper bound on the number of circuits with n variables and size s, and then compare it with the number of Boolean functions of n variables. In a circuit of size s, each gate computes an AND or an OR of two previous nodes. Each previous node can be a previous gate, with at most s choices; a literal (that is, a variable or its negation), with 2n choices; or a constant, with 2 choices. Therefore each gate allows at most 2(s + 2n + 2)^2 choices, which implies that the number of circuits with n variables and size s is at most 2^s(s + 2n + 2)^{2s}. Now, setting s = 2^n/(10n), the upper bound 2^s(s + 2n + 2)^{2s} is approximately 2^{2^n/5} < 2^{2^n}. On the other hand, the number of Boolean functions of n variables and one output is 2^{2^n}. This implies that almost every Boolean function requires circuits of size larger than 2^n/(10n).

6.1.3 Hard problems

We now look at the classification of algorithms by their complexity.

Definition 6.1.18 Let L_n(α, a) = O(exp(a n^α (ln n)^{1−α})), where a and α are constants with 0 ≤ a and 0 ≤ α ≤ 1. In particular L_n(1, a) = O(exp(an)) and L_n(0, a) = O(exp(a ln n)) = O(n^a). Let A denote an algorithm with input size n. Then A is an L(α)-algorithm if the complexity of this algorithm has an estimate of the form L_n(α, a) for some a. An L(0)-algorithm is called a polynomial algorithm and an L(1)-algorithm is called an exponential algorithm. An L(α)-algorithm is called a subexponential algorithm if α < 1.

A problem that has either YES or NO as an answer is called a decision problem. All the computational problems that will be encountered here can be phrased as decision problems in such a way that an efficient algorithm for the decision problem yields an efficient algorithm for the computational problem, and vice versa. In the following complexity classes, we restrict our attention to decision problems.

Definition 6.1.19 The complexity class P is the set of all decision problems that are solvable in polynomial complexity.
Definition 6.1.20 The complexity class NP is the set of all decision problems for which a YES answer can be verified in polynomial time given some extra information, called a certificate. The complexity class co-NP is the set of all decision problems for which a NO answer can be verified in polynomial time given an appropriate certificate.

Example 6.1.21 Consider the decision problem that has as input a generator matrix of a code C and a positive integer w, with the question "is d(C) ≤ w?". In case the answer is YES, there exists a codeword c of minimum weight d(C). Then c is a certificate, and the verification wt(c) ≤ w has complexity n.

Definition 6.1.22 Let D_1 and D_2 be two computational problems. Then D_1 is said to be polytime reducible to D_2, denoted D_1 ≤_P D_2, provided that there exists an algorithm A_1 that solves D_1 which uses an algorithm A_2 that solves D_2, and A_1 runs in polynomial time if A_2 does. Informally, if D_1 ≤_P D_2, we say that D_1 is no harder than D_2. If D_1 ≤_P D_2 and D_2 ≤_P D_1, then D_1 and D_2 are said to be computationally equivalent.

Definition 6.1.23 A decision problem D is said to be NP-complete if
• D ∈ NP, and
• E ≤_P D for every E ∈ NP.
The class of all NP-complete problems is denoted by NPC.

Definition 6.1.24 A computational problem (not necessarily a decision problem) is NP-hard if there exists some NP-complete problem that polytime reduces to it.

Observe that every NP-complete problem is NP-hard. So the set of all NP-hard problems contains NPC as a subset. Some other relationships among the complexity classes above are illustrated as follows.

***A figure***

It is natural to ask the following questions:
(1) Is P = NP?
(2) Is NP = co-NP?
(3) Is P = NP ∩ co-NP?
Most experts are of the opinion that the answer to each of these questions is NO. However, no mathematical proofs are available, and answering these questions is an interesting and hard problem in theoretical computer science.

6.1.4 Exercises

6.1.1 Give an explicit expression of det_3(x) as a Boolean function.

6.1.2 Give an explicit expression of prime_4(x) as a Boolean function.

6.1.3 Give an explicit expression of exp_a(x) as a Boolean function, where ....
6.2 Decoding

***intro***

6.2.1 Decoding complexity

The known decoding algorithms that work for all linear codes have exponential complexity. We now consider some of them.

Remark 6.2.1 The brute force method compares the distance of a received word with all possible codewords, and chooses a codeword of minimum distance. The time complexity of the brute force method is at most nq^k.

Definition 6.2.2 Let r be a received word with respect to a code C of dimension k. Choose an (n − k) × n parity check matrix H of the code C. Then s = rH^T ∈ F_q^{n−k} is called the syndrome of r.

Remark 6.2.3 Let C be a code of dimension k. Let r be a received word. Then r + C is called the coset of r. The cosets of the received words r_1 and r_2 are the same if and only if r_1H^T = r_2H^T. Therefore there is a one-to-one correspondence between cosets of C and values of syndromes. Furthermore, every element of F_q^{n−k} is the syndrome of some received word r, since H has rank n − k. Hence the number of cosets is q^{n−k}.

Remark 6.2.4 In Definition 2.4.10 of coset leader decoding no mention is made of how this method is implemented. Coset leader decoding can be done in two ways. Let H be a parity check matrix and G a generator matrix of C.
1) Preprocess a look-up table and store it in memory, with a list of pairs (s, e), where e is a coset leader of the coset with syndrome s ∈ F_q^{n−k}. For a received word r as input, compute s = rH^T, look up the unique pair (s, e) in the table with s as its first entry, and give r − e as output.
2) For a received word r, compute s = rH^T, compute a solution e of minimal weight of the equation eH^T = s, and give r − e as output.

Now consider the complexity of the two methods for coset leader decoding.
1) The space complexity is clearly q^{n−k}, the number of elements in the table. The time complexity is O(k^2(n − k)) for finding the solution c. The preprocessing of the table has time complexity q^{n−k}: go through all possible error patterns e in order of non-decreasing weight, compute s = eH^T, and put (s, e) in the list if s is not already a first entry of a pair in the list.
2) Go through all possible error patterns e in order of non-decreasing weight, compute s = eH^T and compare it with rH^T, where r is the received word. The first instance where eH^T = rH^T gives a closest codeword c = r − e. The complexity is at most |B_ρ|n^2 for finding a coset leader, where ρ is the covering radius, by Remark 2.4.9. ***Now |B_ρ| ≈ ...***

Example 6.2.5 ***[7,4,3] Hamming code and other perfect codes, some small non-perfect codes***

In order to compare their complexities we introduce the following definitions. ***work factor, memory factor***
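Method 1) of Remark 6.2.4 is straightforward to prototype. A Python sketch over F_2 with brute-force table building (the function names are ours, and the [7,4,3] Hamming parity check matrix below is one standard choice, not necessarily the matrix of Example 2.2.9):

    from itertools import product

    def syndrome(v, H):
        """s = v H^T over F_2."""
        return tuple(sum(v[j] * row[j] for j in range(len(v))) % 2
                     for row in H)

    def coset_leader_table(H, n):
        """Pairs (s, e): go through error patterns in order of non-decreasing
        weight, keeping the first (hence minimal weight) one per syndrome."""
        table = {}
        for e in sorted(product((0, 1), repeat=n), key=sum):
            table.setdefault(syndrome(e, H), e)
        return table

    # A parity check matrix of a [7,4,3] Hamming code
    # (columns: all nonzero binary vectors of length 3).
    H = [[1, 0, 1, 0, 1, 0, 1],
         [0, 1, 1, 0, 0, 1, 1],
         [0, 0, 0, 1, 1, 1, 1]]
    table = coset_leader_table(H, 7)
    r = (1, 0, 1, 1, 1, 1, 1)                 # received word
    e = table[syndrome(r, H)]                 # coset leader
    print([ri ^ ei for ri, ei in zip(r, e)])  # decoded codeword r - e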
Definition 6.2.6 Let the complexity of an algorithm be exponential, O(q^{en}) for n → ∞. Then e is called the complexity exponent of the algorithm.

Example 6.2.7 The complexity exponent of the brute force method is R and of coset leader decoding is 1 − R, where R is the information rate. ***Barg, van Tilburg, picture***

6.2.2 Decoding erasures

***hard/soft decision decoding, (de)modulation, signalling***

After receiving a word there is a stage at the beginning of the decoding process where a decision has to be made about which symbol has been received. In some applications it is desirable to postpone a decision and to put a question mark "?" as a new symbol at that position, as if the symbol was erased. This is called an erasure. So a word over the alphabet F_q with erasures can be viewed as a word in the alphabet F_q ∪ {?}, that is, an element of (F_q ∪ {?})^n. If only erasures occur and the number of erasures is at most d − 1, then we are sure that there is a unique codeword that agrees with the received word at all positions that are not an erasure.

Proposition 6.2.8 Let d be the minimum distance of a code. Then for every received word with t errors and s erasures such that 2t + s < d there is a unique nearest codeword. Conversely, if d ≤ 2t + s, then there is a received word with at most t errors and s erasures with respect to more than one codeword.

Proof. This is left as an exercise to the reader.

Suppose that we have received a word with s erasures and no errors. Then the brute force method would fill in all the possible q^s words at the erasure positions and check whether the obtained word is a codeword. This method has complexity O(n^2 q^s), which is exponential in the number of erasures. In this section it is shown that correcting erasures only amounts to solving a system of linear equations. This can be achieved by using the generator matrix or the parity check matrix. The most efficient choice depends on the rate and the minimum distance of the code.

Proposition 6.2.9 Let C be a code in F_q^n with parity check matrix H and minimum distance d. Suppose that the codeword c is transmitted and the word r is received with no errors and at most d − 1 erasures. Let J be the set of erasure positions of r. Let y ∈ F_q^n be defined by y_j = r_j if j ∉ J and y_j = 0 otherwise. Let s = yH^T be the syndrome of y. Let e = y − c. Then wt(e) < d and e is the unique solution of the following system of linear equations in x:

xH^T = s and x_j = 0 for all j ∉ J.
Proof. By the definitions we have

s = yH^T = cH^T + eH^T = 0 + eH^T = eH^T.

The support of e is contained in J. Hence e_j = 0 for all j ∉ J. Therefore e is a solution of the system of linear equations. If x is another solution, then (x − e)H^T = 0. Therefore x − e is an element of C, and moreover it is supported at J. So its weight is at most d(C) − 1. Hence it must be zero. Therefore x = e.

The above method of correcting erasures only by means of a parity check matrix is called syndrome decoding up to the minimum distance.

Definition 6.2.10 Let the complexity of an algorithm be f(n) with f(n) ≈ cn^e for n → ∞. Then the algorithm is called polynomial of degree e with complexity coefficient c.

Corollary 6.2.11 The complexity of correcting erasures only by means of syndrome decoding up to the minimum distance is polynomial of degree 3 with complexity coefficient (1/3)(1 − R)^2 δ for a code of length n → ∞, where R is the information rate and δ the relative minimum distance.

Proof. This is a consequence of Proposition 6.2.9, which amounts to solving a system of n − k linear equations in at most d − 1 unknowns in order to get the error vector e. Then c = y − e is the codeword sent. We may assume that the encoding is done systematically at k positions, so the message m is immediately read off from these k positions. The complexity is asymptotically of the order

(1/3)(n − k)^2 d = (1/3)(1 − R)^2 δ n^3 for n → ∞.

See Appendix ??.

Example 6.2.12 Let C be the binary [7, 4, 3] Hamming code with parity check matrix H given in Example 2.2.9. Suppose that r = (1, 0, ?, ?, 0, 1, 0) is a received word with two erasures. Replace the erasures by zeros, giving y = (1, 0, 0, 0, 0, 1, 0), and compute the syndrome s = yH^T. Now we want to solve the system of linear equations xH^T = s and x_i = 0 for all i ≠ 3, 4. This yields x_3 = 1 and x_4 = 1, and c = (1, 0, 1, 1, 0, 1, 0) is the transmitted codeword.

Example 6.2.13 Consider the MDS code C_1 over F_11 of length 11 and dimension 4 with generator matrix G_1 as given in Proposition 3.2.10, with x_i = i ∈ F_11 for i = 1, . . . , 11. Let C be the dual code of C_1. Then C is an [11, 7, 5] code by Corollary 3.2.7, and H = G_1 is a parity check matrix for C by Proposition 2.3.19. Suppose that we receive the following word with 4 erasures and no errors:

r = (1, 0, ?, 2, ?, 0, 0, 3, ?, ?, 0).

What is the codeword sent? Replacing the erasures by 0 gives the word y = (1, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0). So yH^T = (6, 0, 5, 4). Consider the linear system of equations given by the 4 × 4 submatrix of H consisting of the columns corresponding to the erasure positions 3, 5, 9 and 10, augmented with the column Hy^T:

( 1  1  1  1 | 6 )
( 3  5  9 10 | 0 )
( 9  3  4  1 | 5 )
( 5  4  3 10 | 4 )
After Gaussian elimination we see that (0, 8, 9, 0)^T is the unique solution of this system of linear equations. Hence c = (1, 0, 0, 2, 3, 0, 0, 3, 2, 0, 0) is the codeword sent.

Remark 6.2.14 Erasures-only correction by means of syndrome decoding is efficient in case the information rate R is close to 1 and the relative minimum distance δ is small, but cumbersome if R is small and δ is close to 1. Take for instance the [n, 1, n] binary repetition code. Any received word with n − 1 erasures is readily corrected by looking at the remaining unerased position: if it is 0, then the all-zeros word was sent, and if it is 1, then the all-ones word was sent. With syndrome decoding one should solve a system of n − 1 linear equations in n − 1 unknowns.

The following method of correcting erasures only uses a generator matrix of a code.

Proposition 6.2.15 Let G be a generator matrix of an [n, k, d] code C over F_q. Let m ∈ F_q^k be the transmitted message. Let s be an integer such that s < d. Let r be the received word with no errors and at most s erasures. Let I = {j_1, . . . , j_{n−s}} be the subset of size n − s that is the complement of the erasure positions. Let y ∈ F_q^{n−s} be defined by y_i = r_{j_i} for i = 1, . . . , n − s. Let G′ be the k × (n − s) submatrix of G consisting of the n − s columns of G corresponding to the set I. Then xG′ = y has a unique solution m, and mG is the codeword sent.

Proof. The Singleton bound 3.2.1 states that k ≤ n − d + 1, so k ≤ n − s. Now mG = c is the codeword sent and y_i = r_{j_i} = c_{j_i} for i = 1, . . . , n − s. Hence mG′ = y and m is a solution. Now suppose that x ∈ F_q^k satisfies xG′ = y. Then (m − x)G is a codeword that has a zero at n − s positions, so its weight is at most s < d. So (m − x)G is the zero codeword. Hence m − x = 0, since G has rank k.

The above method is called correcting erasures only up to the minimum distance by means of the generator matrix.

Corollary 6.2.16 The complexity of correcting erasures only up to the minimum distance by means of the generator matrix is polynomial of degree 3 with complexity coefficient R^2(1 − δ − (2/3)R) for a code of length n → ∞, where R is the information rate and δ the relative minimum distance.

Proof. This is a consequence of Proposition 6.2.15. The complexity is that of solving a system of k linear equations in at most n − d + 1 unknowns, which is asymptotically of the order

(n − d − (2/3)k)k^2 = R^2(1 − δ − (2/3)R)n^3 for n → ∞.

See Appendix ??.

***picture, comparison of G and H method***

Example 6.2.17 Let C be the [7, 2, 6] extended Reed–Solomon code over F_7 with generator matrix

G = ( 1 1 1 1 1 1 1 )
    ( 0 1 2 3 4 5 6 )
Suppose that (?, 3, ?, ?, ?, 4, ?) is a received word with no errors and 5 erasures. By means of the generator matrix we have to solve the following linear system of equations:

x_1 + x_2 = 3
x_1 + 5x_2 = 4

which has (x_1, x_2) = (1, 2) as its solution. Hence (1, 2)G = (1, 3, 5, 0, 2, 4, 6) is the transmitted codeword. With syndrome decoding a system of 5 linear equations in 5 unknowns would have to be solved.

Remark 6.2.18 For MDS codes we have asymptotically R ≈ 1 − δ, and correcting erasures only by syndrome decoding and by a generator matrix has complexity coefficients (1/3)(1 − R)^3 and (1/3)R^3, respectively. Therefore syndrome decoding is preferred for R > 0.5 and decoding by a generator matrix for R < 0.5.
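Proposition 6.2.15 can be run in a few lines. A Python sketch reproducing Example 6.2.17 over F_7 (function names are ours; for the MDS code of the example any k columns of G are independent, which the sketch relies on):

    def solve_mod_p(A, b, p):
        """Gauss-Jordan over F_p, p prime: solve A x = b for square A;
        raises StopIteration if A is singular."""
        k = len(A)
        M = [row[:] + [bi] for row, bi in zip(A, b)]
        for col in range(k):
            piv = next(r for r in range(col, k) if M[r][col] % p)
            M[col], M[piv] = M[piv], M[col]
            inv = pow(M[col][col], p - 2, p)
            M[col] = [v * inv % p for v in M[col]]
            for r in range(k):
                if r != col and M[r][col] % p:
                    f = M[r][col]
                    M[r] = [(u - f * v) % p for u, v in zip(M[r], M[col])]
        return [M[r][k] for r in range(k)]

    def correct_erasures(G, r, p):
        """Proposition 6.2.15: solve m G' = y on non-erased positions."""
        I = [j for j, rj in enumerate(r) if rj != '?']
        k = len(G)
        # Equations sum_i m_i G[i][j] = r_j for k non-erased positions j.
        A = [[G[i][j] for i in range(k)] for j in I[:k]]
        m = solve_mod_p(A, [r[j] for j in I[:k]], p)
        return [sum(m[i] * G[i][j] for i in range(k)) % p
                for j in range(len(r))]

    # Example 6.2.17: the [7,2,6] extended Reed-Solomon code over F_7.
    G = [[1, 1, 1, 1, 1, 1, 1],
         [0, 1, 2, 3, 4, 5, 6]]
    r = ['?', 3, '?', '?', '?', 4, '?']
    print(correct_erasures(G, r, 7))  # [1, 3, 5, 0, 2, 4, 6]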
6.2.3 Information and covering set decoding

The idea of this section is to decode by finding error-free positions in a received word, thus localizing the errors. Let r be a received word written as r = c + e, where c is a codeword from an [n, k, d] code C and e is an error vector with support supp(e). Note that if I is an information set (Definition 2.2.20) such that supp(e) ∩ I = ∅, then we are actually able to decode. Indeed, as supp(e) ∩ I = ∅, we have r(I) = c(I) (Definition 3.1.2). Now if we denote by G the generator matrix of C, then the submatrix G(I) can be transformed into the identity matrix Id_k. Let G′ = MG, where M = G(I)^{−1}, so that G′(I) = Id_k, see Proposition 2.2.22. Thus the unique solution m ∈ F_q^k of mG = c can be found as m = r(I)M, because mG = r(I)MG = r(I)G′, the latter restricted to the positions of I yields r(I) = c(I), and a codeword is determined by its values on an information set. The algorithm exploiting this idea, called information set decoding, is presented in Algorithm 6.1.

Algorithm 6.1 Information set decoding
Input:
- generator matrix G of an [n, k] code C
- received word r
- I(C), a collection of all the information sets of the given code C
Output: a codeword c ∈ C such that d(r, c) = d(r, C)
Begin
c := 0;
for I ∈ I(C) do
  G′ := G(I)^{−1}G;
  c′ := r(I)G′;
  if d(c′, r) < d(c, r) then c := c′; end if
end for
return c
End

Theorem 6.2.19 The information set decoding algorithm performs minimum distance decoding.

Proof. Let r = c + e, where wt(e) = d(r, C). Let rH^T = eH^T = s. Then e is a coset leader with support E = supp(e) in the coset with syndrome s. It is enough to prove that there exists some information set disjoint from E, or, equivalently, some check set (Definition 2.3.9) that contains E. Consider the (n − k) × |E| submatrix H(E) of the parity check matrix H. As e is a coset leader, for no other vector v in the same coset is supp(v) ⊆ E. Thus the subsystem of the parity check system defined by the positions from E has a unique solution e(E); otherwise it would be possible to find a solution with support a proper subset of E. This implies that rank(H(E)) = |E| ≤ n − k. Thus E can be expanded to a check set.

For practical applications it is convenient to choose the sets I randomly. Namely, we choose k-subsets randomly in the hope that after a reasonable number of trials we encounter one that is an information set and error-free.

Algorithm 6.2 Probabilistic information set decoding
Input:
- generator matrix G of an [n, k] code C
- received word r
- number of trials N_trials(n, k)
Output: a codeword c ∈ C
Begin
c := 0; N_tr := 0;
repeat
  N_tr := N_tr + 1;
  choose uniformly at random a subset I of {1, . . . , n} of cardinality k;
  if G(I) is invertible then
    G′ := G(I)^{−1}G;
    c′ := r(I)G′;
    if d(c′, r) < d(c, r) then c := c′; end if
  end if
until N_tr > N_trials(n, k)
return c
End

We would like to estimate the complexity of probabilistic information set decoding for generic codes. The parameters of generic codes are computed in Theorem 3.3.6; we now use this result and its notation to formulate the following complexity result.

Theorem 6.2.20 Let C be a generic [n, k, d] q-ary code with dimension k = Rn, 0 < R < 1, and minimum distance d = d_0, so that the covering radius is d_0(1 + o(1)). If N_trials(n, k) is at least

σ · n · \binom{n}{d_0} / \binom{n − k}{d_0},
where σ is the inverse of the probability that a random square matrix over F_q is invertible ***add to Theorem 3.3.7***, then for large enough n the probabilistic information set decoding algorithm for the generic code C performs minimum distance decoding with negligibly small decoding error. Moreover, the algorithm is exponential with complexity exponent

CC_q(R) = (log_q 2) ( H_2(δ_0) − (1 − R)H_2( δ_0 / (1 − R) ) ),    (6.1)

where H_2 is the binary entropy function.

Proof. In order for the algorithm to succeed we need that the set I chosen at a certain iteration is error-free and that the corresponding submatrix of G is invertible. The probability P(n, k, d_0) of this event is

\binom{n − d_0}{k} / \binom{n}{k} · σ_q(n) = \binom{n − k}{d_0} / \binom{n}{d_0} · σ_q(n),

where σ_q(n) = 1/σ is the probability that G(I) is invertible. Therefore the probability that I fails to satisfy these properties is 1 − P(n, k, d_0). Considering the assumption on N_trials(n, k), the probability of not finding an error-free information set after N_trials(n, k) trials is

(1 − P(n, k, d_0))^{n/P(n,k,d_0)} = O(e^{−n}),

which is negligible.

Next, since determining whether G(I) is invertible and performing the operations in the if-part have polynomial time complexity, N_trials(n, k) dominates the time complexity. Our task now is to give an asymptotic estimate of the latter. First, d_0 = δ_0 n, where δ_0 = H_q^{−1}(1 − R), see Theorem 3.3.6. Then, using Stirling's approximation log_2 n! = n log_2 n − n + o(n), we have

n^{−1} log_2 \binom{n}{d_0} = n^{−1}( n log_2 n − d_0 log_2 d_0 − (n − d_0) log_2(n − d_0) + o(n) )
= log_2 n − δ_0 log_2(δ_0 n) − (1 − δ_0) log_2((1 − δ_0)n) + o(1) = H_2(δ_0) + o(1).

Thus log_q \binom{n}{d_0} = (nH_2(δ_0) + o(n)) log_q 2. Analogously,

log_q \binom{n − k}{d_0} = ( n(1 − R)H_2( δ_0 / (1 − R) ) + o(n) ) log_q 2,

where n − k = (1 − R)n. Now

log_q N_trials(n, k) = log_q n + log_q σ + log_q \binom{n}{d_0} − log_q \binom{n − k}{d_0}.

Considering that the first two summands are dominated by the last two, the claim on the complexity exponent follows.
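Algorithm 6.2 is short enough to prototype directly. A Python sketch (reusing solve_mod_p from the erasure-decoding sketch above; minimum distance decoding is of course only guaranteed with high probability for a sufficient number of trials):

    import random

    def isd(G, r, ntrials, p=2):
        """A sketch of Algorithm 6.2, probabilistic information set decoding."""
        k, n = len(G), len(G[0])
        best = [0] * n   # the zero word is a codeword
        for _ in range(ntrials):
            I = random.sample(range(n), k)
            try:
                # Equations sum_i m_i G[i][j] = r_j for j in I.
                m = solve_mod_p([[G[i][j] for i in range(k)] for j in I],
                                [r[j] for j in I], p)
            except StopIteration:
                continue  # G(I) is singular: I is not an information set
            c = [sum(m[i] * G[i][j] for i in range(k)) % p for j in range(n)]
            if sum(u != v for u, v in zip(c, r)) < \
               sum(u != v for u, v in zip(best, r)):
                best = c
        return best

    # The [7,4,3] binary Hamming code; one error at position 3.
    G = [[1,0,0,0,0,1,1],
         [0,1,0,0,1,0,1],
         [0,0,1,0,1,1,0],
         [0,0,0,1,1,1,1]]
    c = [a ^ b for a, b in zip(G[0], G[2])]  # a codeword
    r = list(c); r[2] ^= 1
    random.seed(1)
    print(isd(G, r, 50) == c)  # True with overwhelming probability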
[Figure 6.1: Complexity exponents of exhaustive search (ES), syndrome decoding (SD) and covering set decoding (CS).]

If we depict the complexity exponents of exhaustive search, syndrome decoding and probabilistic information set decoding, we see that information set decoding is strongly superior to the former two, see Figure 6.1.

We may think of the above algorithms in a dual way, using check sets instead of information sets and parity check matrices instead of generator matrices. The set of all check sets is closely related to so-called covering systems, which we consider a bit later in this section and which give the algorithm its name.

Algorithm 6.3 Covering set decoding
Input:
- parity check matrix H of an [n, k] code C
- received word r
- J(C), a collection of all the check sets of the given code C
Output: a codeword c ∈ C such that d(r, c) = d(r, C)
Begin
c := 0; s := rH^T;
for J ∈ J(C) do
  e′ := s · (H(J)^{−1})^T;
  compute e such that e(J) = e′ and e_j = 0 for j not in J;
  c′ := r − e;
  if d(c′, r) < d(c, r) then c := c′; end if
end for
return c
End

Theorem 6.2.21 The covering set decoding algorithm performs minimum distance decoding.
Proof. Let r = c + e as in the proof of Theorem 6.2.19. From that proof we know that there exists a check set J such that supp(e) ⊂ J. Now we have Hr^T = He^T = H(J)e(J)^T. Since for the check set J the matrix H(J) is invertible, we may find e(J) and thus e.

Similarly to Algorithm 6.2 one may define a probabilistic version of covering set decoding. As we have already mentioned, the covering set decoding algorithm is closely related to the notion of a covering system. An overview of this notion follows.

Definition 6.2.22 Let n, l and t be integers such that 0 < t ≤ l ≤ n. An (n, l, t) covering system is a collection J of subsets J of {1, . . . , n}, such that every J ∈ J has l elements and every subset of {1, . . . , n} of size t is contained in at least one J ∈ J. The elements J of a covering system J are also called blocks. If a subset T of size t is contained in a J ∈ J, then we say that T is covered or trapped by J.

Remark 6.2.23 From the proof of Theorem 6.2.21, for almost all codes it is enough to find a collection J of subsets J of {1, . . . , n} such that all J ∈ J have n − k elements and every subset of {1, . . . , n} of size ρ = d_0 + o(1) is contained in at least one J ∈ J, thus obtaining an (n, n − k, d_0) covering system.

Example 6.2.24 The collection of all subsets of {1, . . . , n} of size l is an (n, l, t) covering system for all 0 < t ≤ l. This collection consists of \binom{n}{l} blocks.

Example 6.2.25 Consider F_q^2, the affine plane over F_q. Let n = q^2 be the number of its points. Then every line consists of q points, and every pair of two points is covered by exactly one line. Hence there exists a (q^2, q, 2) covering system. Every line that is not parallel to the y-axis is given by a unique equation y = mx + c. There are q^2 such lines. And there are q lines parallel to the y-axis. So the total number of lines is q^2 + q.

Example 6.2.26 Consider the projective plane over F_q as treated in Section 4.3.1. Let n = q^2 + q + 1 be the number of its points. Then every line consists of q + 1 points, and every pair of two points is covered by exactly one line. There are q^2 + q + 1 lines. Hence there exists a (q^2 + q + 1, q + 1, 2) covering system consisting of q^2 + q + 1 blocks.

Remark 6.2.27 The number of blocks of an (n, l, t) covering system can be considerably smaller than the number of all possible t-sets. It is still at least \binom{n}{t} / \binom{l}{t}. But also this number grows exponentially in n if λ = lim_{n→∞} l/n > 0 and τ = lim_{n→∞} t/n > 0.

Definition 6.2.28 The covering coefficient b(n, l, t) is the smallest integer b such that there is an (n, l, t) covering system consisting of b blocks.

Although the exact value of the covering coefficient b(n, l, t) is an open problem, we do know its asymptotic logarithmic behavior.
Proposition 6.2.29 Let λ and τ be constants such that 0 < τ < λ < 1. Then

lim_{n→∞} (1/n) log b(n, ⌈λn⌉, ⌈τn⌉) = H_2(τ) − λH_2(τ/λ).

Proof. *** I suggest to skip the proof ***
In order to establish this asymptotic result we prove lower and upper bounds which are asymptotically identical. First the lower bound on b(n, l, t). Note that every l-tuple traps \binom{l}{t} t-tuples. Therefore one needs at least \binom{n}{t} / \binom{l}{t} l-tuples. Now we use the relation from [?]:

2^{nH_2(θ) − o_+(n)} ≤ \binom{n}{⌈θn⌉} ≤ 2^{nH_2(θ)} for 0 < θ < 1,

where o_+(n) is a non-negative function such that o_+(n) = o(n). Applying this for l = ⌈λn⌉ and t = ⌈τn⌉ we have

\binom{n}{⌈τn⌉} ≥ 2^{nH_2(τ) − o_+(n)} and \binom{⌈λn⌉}{⌈τn⌉} ≤ 2^{nλH_2(τ/λ)}.

Therefore

b(n, ⌈λn⌉, ⌈τn⌉) ≥ \binom{n}{⌈τn⌉} / \binom{⌈λn⌉}{⌈τn⌉} ≥ 2^{n(H_2(τ) − λH_2(τ/λ)) − o_+(n)}.

For a similar lower bound see Exercise ??.

Now the upper bound. Consider a set S with f(n, l, t) independently and uniformly randomly chosen l-tuples, where

f(n, l, t) = ( \binom{n}{l} / \binom{n−t}{n−l} ) · cn, with c > ln 2.

The probability that a given t-tuple is not trapped by any tuple from S is

( 1 − \binom{n−t}{n−l} / \binom{n}{l} )^{f(n,l,t)}.

Indeed, the number of all l-tuples is \binom{n}{l}, and the probability of trapping a given t-tuple T_1 by an l-tuple T_2 is the same as the probability of trapping the complement of T_2 by the complement of T_1, and is thus equal to \binom{n−t}{n−l} / \binom{n}{l}. The expected number of non-trapped t-tuples is then

\binom{n}{t} ( 1 − \binom{n−t}{n−l} / \binom{n}{l} )^{f(n,l,t)}.

Using the relation lim_{x→∞}(1 − 1/x)^x = e^{−1} and the expression for f(n, l, t), we have that this expected number tends to

T = 2^{nH_2(t/n) + o(n) − cn log_2 e}.

From the condition on c we have that T < 1. This implies that among all the sets of f(n, l, t) independently and uniformly randomly chosen l-tuples there exists one that traps all the t-tuples. Thus b(n, l, t) ≤ f(n, l, t). By the well-known combinatorial identities

\binom{n}{t}\binom{n−t}{n−l} = \binom{n}{n−l}\binom{l}{t} = \binom{n}{l}\binom{l}{t},
we have that for t = ⌈τn⌉ and l = ⌈λn⌉

b(n, l, t) ≤ 2^{n(H_2(τ) − λH_2(τ/λ)) + o(n)},

which asymptotically coincides with the lower bound proven above.

Let us now turn to the case of bounded distance decoding. Here we aim at correcting some t errors, where t < ρ. The complexity result for almost all codes is obtained by substituting t/n for δ_0 in (6.1). In particular, for decoding up to half the minimum distance for almost all codes we have the following result.

Corollary 6.2.30 If N_trials(n, k) is at least

n · \binom{n}{d_0/2} / \binom{n − k}{d_0/2},

then the covering set decoding algorithm for almost all codes performs decoding up to half the minimum distance with negligibly small decoding error. Moreover, the algorithm is exponential with complexity coefficient

CSB_q(R) = (log_q 2) ( H_2(δ_0/2) − (1 − R)H_2( δ_0 / (2(1 − R)) ) ).    (6.2)

We are now interested in bounded distance decoding up to t ≤ d − 1. For almost all (long) codes the case t = d − 1 coincides with minimum distance decoding, see ... . From Proposition 6.2.9 it is enough to find a collection J of subsets J of {1, . . . , n} such that all J ∈ J have d − 1 elements and every subset of {1, . . . , n} of size t is contained in at least one J ∈ J. Thus we need an (n, d − 1, t) covering system. Let us call this erasure set decoding.

Example 6.2.31 Consider a code of length 13, dimension 9 and minimum distance 5. The number of all 2-sets of {1, . . . , 13} is equal to \binom{13}{2} = 78. In order to correct two errors one has to compute the linear combinations of two columns of a parity check matrix H, for all 78 choices of two columns, and see whether the combination is equal to rH^T for the received word r. An improvement can be obtained by a covering set. Consider the projective plane over F_3 as in Example 6.2.26. This gives a (13, 4, 2) covering system. Using this covering system there are 13 subsets of 4 elements for which one has to find Hr^T as a linear combination of the corresponding columns of the parity check matrix. So we have to consider 13 systems of 4 linear equations in 4 variables instead of 78 systems of 4 linear equations in 2 variables.

From Proposition 6.2.29 and Remark ?? we have the following complexity result for erasure set decoding.

Proposition 6.2.32 Erasure set decoding performs bounded distance decoding for every t = αδ_0 n, 0 < α ≤ 1. The algorithm is exponential with complexity coefficient

ES_q(R) = (log_q 2) ( H_2(αδ_0) − δ_0 H_2(α) ).    (6.3)

Proof. The proof is left to the reader as an exercise.

It can be shown, see Exercise 6.2.7, that erasure set decoding is inferior to covering set decoding for all α. ***Permutation decoding, Huffman–Pless 10.2, example Golay q=3, exercise q=2***
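The (13, 4, 2) covering system of Example 6.2.31 can be generated explicitly from the lines of the projective plane over F_3. A Python sketch (our own construction, via projective points normalized so that the first nonzero coordinate is 1):

    from itertools import combinations, product

    q = 3
    # Points of the projective plane over F_q: nonzero vectors up to scalar.
    points = [p for p in product(range(q), repeat=3)
              if any(p) and p[next(i for i in range(3) if p[i])] == 1]
    # A line is the set of points p with <a, p> = 0 for a normal vector a.
    lines = {frozenset(i for i, p in enumerate(points)
                       if sum(a * x for a, x in zip(a_vec, p)) % q == 0)
             for a_vec in points}

    print(len(points), len(lines))               # 13 13
    print(all(len(L) == q + 1 for L in lines))   # every block has 4 points
    # Every 2-subset of the 13 points is trapped by some line:
    print(all(any(T <= L for L in lines)
              for T in map(set, combinations(range(13), 2))))  # True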
6.2.4 Nearest neighbor decoding

***decoding using minimal codewords***

6.2.5 Exercises

6.2.1 Count an erasure as half an error. Use this idea to define an extension of the Hamming distance on (F_q ∪ {?})^n and show that it is a metric.

6.2.2 Give a proof of Proposition 6.2.8.

6.2.3 Consider the code C over F_11 with parameters [11, 7, 5] of Example 6.2.13. Suppose that we receive the word (7, 6, 5, 4, 3, 2, 1, ?, ?, ?, ?) with 4 erasures and no errors. Which codeword was sent?

6.2.4 Consider the code C_1 over F_11 with parameters [11, 4, 8] of Example 6.2.13. Suppose that we receive the word (4, 3, 2, 1, ?, ?, ?, ?, ?, ?, ?) with 7 erasures and no errors. Find the codeword sent.

6.2.5 Consider the covering systems of lines in the affine space F_q^m of dimension m over F_q and in the projective space of dimension m over F_q, respectively. Show the existence of a (q^m, q, 2) and a ((q^{m+1} − 1)/(q − 1), q + 1, 2) covering system as in Examples 6.2.25 and 6.2.26 in the case m = 2. Compute the number of lines in both cases.

6.2.6 Prove the following lower bound on b(n, l, t):

b(n, l, t) ≥ ⌈ (n/l) ⌈ ((n − 1)/(l − 1)) · · · ⌈ (n − t + 1)/(l − t + 1) ⌉ · · · ⌉ ⌉.

Hint: by a double counting argument prove first that l · b(n, l, t) ≥ n · b(n − 1, l − 1, t − 1), and then use b(n, l, 1) = ⌈n/l⌉.

6.2.7 By using the properties of the binary entropy function prove that for all 0 < R < 1 and 0 < α < 1

(1 − R) H_2( αH_q^{−1}(1 − R) / (1 − R) ) > H_q^{−1}(1 − R) · H_2(α).

Conclude that covering set decoding is superior to erasure set decoding.

6.3 Difficult problems in coding theory

6.3.1 General decoding and computing minimum distance

We formulated the decoding problem in Section 6.2. As we have seen, the minimum (Hamming) distance of a linear code is an important parameter which can be used to estimate the decoding performance. However, a larger minimum distance does not guarantee the existence of an efficient decoding algorithm. It is natural to ask the following computational questions. For general linear codes, does there exist a decoding algorithm with polynomial-time complexity? Does there exist a polynomial-time algorithm which finds the minimum distance of any linear code? It has been proved that these
computational problems are both intractable.

Let C be an [n, k] binary linear code. Suppose r is the received word. According to the maximum-likelihood decoding principle, we wish to find a codeword such that the Hamming distance between r and the codeword is minimal. As we have seen in previous sections, using brute force search, correct decoding requires 2^k comparisons in the worst case, and thus has exponential-time complexity.

Consider the syndrome of the received word. Let H be a parity check matrix of C, which is an m × n matrix, where m = n − k. The syndrome of r is s = rH^T. The following two computational problems are equivalent, letting c = r − e:

(1) (Maximum-likelihood decoding problem) Find a codeword c such that d(r, c) is minimal.
(2) Find a minimum-weight solution e of the equation xH^T = s.

Clearly, an algorithm which solves the following computational problem (3) also solves the above problem (2).

(3) For any non-negative integer w, find a vector x of Hamming weight ≤ w such that xH^T = s.

Conversely, an algorithm which solves problem (2) also solves problem (3). In fact, suppose e is a minimum-weight solution of the equation xH^T = s. Then for w < wt(e) the algorithm will return "no solution", and for w ≥ wt(e) the algorithm returns e. Thus the maximum-likelihood decoding problem is equivalent to problem (3) above.

The decision problem of the maximum-likelihood decoding problem is as follows.

Decision Problem of Decoding Linear Codes
INSTANCE: An m × n binary matrix H, a binary vector s of length m, and a non-negative integer w.
QUESTION: Is there a binary vector x ∈ F_2^n of Hamming weight ≤ w such that xH^T = s?

Proposition 6.3.1 The decision problem of decoding linear codes is an NP-complete problem.

We will prove this proposition by reducing the three-dimensional matching problem to the decision problem of decoding linear codes. The three-dimensional matching problem is a well-known NP-complete problem. For completeness, we recall this problem as follows.

Three-Dimensional Matching Problem
INSTANCE: A set T ⊆ S_1 × S_2 × S_3, where S_1, S_2 and S_3 are disjoint finite sets having the same number of elements, a = |S_1| = |S_2| = |S_3|.
Three-Dimensional Matching Problem
INSTANCE: A set T ⊆ S_1 × S_2 × S_3, where S_1, S_2 and S_3 are disjoint finite sets having the same number of elements, a = |S_1| = |S_2| = |S_3|.
QUESTION: Does T contain a matching, that is, a subset U ⊆ T such that |U| = a and no two elements of U agree in any coordinate?

We now construct a matrix M, called the incidence matrix of T, as follows. Fix an ordering of the triples of T. Let t_i = (t_{i1}, t_{i2}, t_{i3}) denote the i-th triple of T for i = 1, . . . , |T|. The matrix M has |T| rows and 3a columns. Each row m_i of M is a binary vector of length 3a and Hamming weight 3, which consists of three blocks b_{i1}, b_{i2} and b_{i3} of the same length a, that is, m_i = (b_{i1}, b_{i2}, b_{i3}). For u = 1, 2, 3, if t_{iu} is the v-th element of S_u, then the v-th coordinate of b_{iu} is 1 and all the other coordinates of this block are 0.

Clearly, the existence of a matching for the Three-Dimensional Matching Problem is equivalent to the existence of a rows of M whose mod 2 sum is (1, 1, . . . , 1), that is, to the existence of a binary vector x ∈ F_2^{|T|} of weight a such that xM = (1, 1, . . . , 1) ∈ F_2^{3a}. Now we are ready to prove Proposition 6.3.1.

Proof of Proposition 6.3.1. Suppose we have a polynomial-time algorithm solving the Decision Problem of Decoding Linear Codes. Given an input T ⊆ S_1 × S_2 × S_3 for the Three-Dimensional Matching Problem, set H = M^T, where M is the incidence matrix of T, s = (1, 1, . . . , 1) and w = a. Then, running the algorithm for the Decision Problem of Decoding Linear Codes, we discover whether or not the desired matching exists. Thus, a polynomial-time algorithm for the Decision Problem of Decoding Linear Codes implies a polynomial-time algorithm for the Three-Dimensional Matching Problem. This proves that the Decision Problem of Decoding Linear Codes is NP-complete.

Next, let us consider the problem of computing the minimum distance of an [n, k] binary linear code C with a parity check matrix H. For any linear code the minimum distance is equal to the minimum weight, so we use these two terms interchangeably. Consider the following decision problem.

Decision Problem of Computing Minimum Distance
INSTANCE: An m × n binary matrix H and a non-negative integer w.
QUESTION: Is there a nonzero binary vector x of Hamming weight ≤ w such that xH^T = 0?

If we have an algorithm which solves the above problem, then we can run the algorithm with w = 1, 2, . . ., and the first integer w with an affirmative answer is the minimum weight of C. On the other hand, if we have an algorithm which finds the minimum weight d of C, then we can solve the above problem by comparing w with d. Therefore we call this problem the Decision Problem of Computing Minimum Distance, and the NP-completeness of this problem implies the NP-hardness of the problem of computing the minimum distance.
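The incidence matrix construction is mechanical. A small Python sketch (ours, with a toy instance of our own making) builds M row by row:

def incidence_matrix(T, S1, S2, S3):
    """Incidence matrix of a set T of triples, as in the reduction above.

    Each row has weight 3: one 1 in each of the three blocks of length a,
    marking the position of the triple's entry in S1, S2 and S3.
    """
    a = len(S1)
    M = []
    for (t1, t2, t3) in T:
        row = [0] * (3 * a)
        row[S1.index(t1)] = 1              # block b_{i1}
        row[a + S2.index(t2)] = 1          # block b_{i2}
        row[2 * a + S3.index(t3)] = 1      # block b_{i3}
        M.append(row)
    return M

T = [("x1", "y1", "z1"), ("x1", "y2", "z2"), ("x2", "y2", "z2")]
M = incidence_matrix(T, ["x1", "x2"], ["y1", "y2"], ["z1", "z2"])
# Rows 0 and 2 sum to (1,1,1,1,1,1) mod 2: a matching of size a = 2.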
***Computing the minimum distance:
– brute force, complexity (q^k − 1)/(q − 1), O(q^k)
– minimal number of parity checks: O( (n choose k) k^3 )***

***Brouwer's algorithm and variations, Zimmermann, Canteaut-Chabaud, Sala***

***Vardy's result: computing the minimum distance is NP-hard***

6.3.2 Is decoding up to half the minimum distance hard?

Finding the minimum distance and decoding up to half the minimum distance are closely related problems.

Algorithm 6.3.2 Suppose that A is an algorithm that computes the minimum distance of an F_q-linear code C that is given by a parity check matrix H. We define an algorithm D with input y ∈ F_q^n as follows. Let s = Hy^T be the syndrome of y with respect to H. Let H̃ = [H|s] be the parity check matrix of a code C̃ of length n + 1. Let C̃_i be the code that is obtained by puncturing C̃ at the i-th position. Use algorithm A to compute d(C) and d(C̃_i) for i ≤ n. Let t = min{ d(C̃_i) | i ≤ n } and I = { i | t = d(C̃_i), i ≤ n }. Assume |I| = t and t < d(C). Assume furthermore that erasure decoding at the positions I finds a unique codeword c in C such that c_i = y_i for all i not in I. Output c in case the above assumptions are met, and output ∗ otherwise.

Proposition 6.3.3 Let A be an algorithm that computes the minimum distance. Let D be the algorithm that is defined in 6.3.2. Let y ∈ F_q^n be an input. Then D is a decoder that gives as output c in case d(C, y) < d(C) and y has c as unique nearest codeword. In particular D is a decoder of C that corrects up to half the minimum distance.

Proof. Let y be a word with t = d(C, y) < d(C) and suppose that c is a unique nearest codeword. Then y = c + e with c ∈ C and t = wt(e). Note that (e, −1) ∈ C̃, since s = Hy^T = He^T. So d(C̃) ≤ t + 1. Let z̃ be in C̃. If z̃_{n+1} = 0, then z̃ = (z, 0) with z ∈ C. Hence wt(z̃) ≥ d(C) ≥ t + 1. If z̃_{n+1} ≠ 0, then without loss of generality we may assume that z̃ = (z, −1). So H̃z̃^T = 0. Hence Hz^T = s. So c′ = y − z ∈ C. If wt(z̃) ≤ t + 1, then wt(z) ≤ t. So d(y, c′) ≤ t. Hence c′ = c, since c is the unique nearest codeword by assumption. Therefore z = e and wt(z) = t. Hence d(C̃) = t + 1, since t + 1 ≤ d(C).

Let C̃_i be the code that is obtained by puncturing C̃ at the i-th position. Use the algorithm A to compute d(C̃_i) for all i ≤ n. An argument similar to the above shows that d(C̃_i) = t if i is in the support of e, and d(C̃_i) = t + 1 if i is not in the support of e. So t = min{ d(C̃_i) | i ≤ n } and I = { i | t = d(C̃_i), i ≤ n } is the support of e and has size t. So the error positions are known. Computing the error values is a matter of linear algebra as shown in Proposition 6.2.11. In this way e and c are found.
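A toy implementation of Algorithm 6.3.2 for binary codes (ours, not the authors'; it uses an exponential brute-force oracle in place of A, assumes 0 < d(C, y) < d(C), and only recovers the error support, since over F_2 all error values are 1):

from itertools import product

def codewords(H):
    """All words x with x H^T = 0 (brute force; only for toy sizes)."""
    m, n = len(H), len(H[0])
    return [x for x in product([0, 1], repeat=n)
            if all(sum(x[j] * H[i][j] for j in range(n)) % 2 == 0
                   for i in range(m))]

def min_dist(words, skip=None):
    """Minimum weight of the nonzero words, optionally deleting one position."""
    weights = []
    for x in words:
        y = [b for j, b in enumerate(x) if j != skip]
        if any(y):
            weights.append(sum(y))
    return min(weights)

def decode_support(H, y):
    """Support of the error vector, found as in Algorithm 6.3.2."""
    m, n = len(H), len(H[0])
    s = [sum(H[i][j] * y[j] for j in range(n)) % 2 for i in range(m)]
    H_ext = [H[i] + [s[i]] for i in range(m)]        # parity checks of the code C~
    ext_words = codewords(H_ext)
    d_punct = [min_dist(ext_words, skip=i) for i in range(n)]
    t = min(d_punct)
    return [i for i in range(n) if d_punct[i] == t]  # support of e

Proposition 6.3.4 Let MD be the problem of computing the minimum distance of a code given by a parity check matrix. Let DHMD be the problem of decoding up to half the minimum distance. Then DHMD ≤_P MD.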
Proof. Let A be an algorithm that computes the minimum distance of an F_q-linear code C that is given by a parity check matrix H. Let D be the algorithm given in 6.3.2. Then A is used (n + 1) times in D. Suppose that the complexity of A is polynomial of degree e. We may assume that e ≥ 2. Computing the error values can be done with complexity O(n^3) by Proposition 6.2.11. Then the complexity of D is polynomial of degree e + 1.

***Sendrier and Finiasz***

***Decoding with preprocessing, Bruck-Naor***

6.3.3 Other hard problems

***worst case versus average case; the simplex method for linear programming is an example of an algorithm that almost always runs fast, that is, polynomially in its input, but which is known to be exponential in the worst case. Ellipsoid method, Khachiyan's method***

***approximate solutions of NP-hard problems***

6.4 Notes

In 1978, Berlekamp, McEliece and van Tilborg proved that the maximum-likelihood decoding problem is NP-hard for general binary codes. Vardy showed in 1997 that the problem of computing the minimum distance of a binary linear code is NP-hard.
Chapter 7

Cyclic codes

Ruud Pellikaan

Cyclic codes have been in the center of interest in the theory of error-correcting codes since their introduction. Cyclic codes of relatively small length have good parameters. In the list of 62 binary cyclic codes of length 63 there are 51 codes that have the largest known minimum distance for the given dimension among all linear codes of length 63. Binary cyclic codes are better than the Gilbert-Varshamov bound for lengths up to 1023. Although some negative results indicate that cyclic codes may be asymptotically bad, this is still an open problem. Rich combinatorics is involved in the determination of the parameters of cyclic codes in terms of patterns of the defining set.

***...***

7.1 Cyclic codes

7.1.1 Definition of cyclic codes

Definition 7.1.1 The cyclic shift σ(c) of a word c = (c_0, c_1, . . . , c_{n−1}) ∈ F_q^n is defined by

σ(c) := (c_{n−1}, c_0, c_1, . . . , c_{n−2}).

An F_q-linear code C of length n is called cyclic if σ(c) ∈ C for all c ∈ C. The subspaces {0} and F_q^n are clearly cyclic and are called the trivial cyclic codes.

Remark 7.1.2 In the context of cyclic codes it is convenient to consider the index i of a word modulo n, and the convention is that the numbering of the entries (c_0, c_1, . . . , c_{n−1}) starts with 0 instead of 1. The cyclic shift defines a linear map σ : F_q^n → F_q^n. The i-fold composition σ^i = σ ∘ · · · ∘ σ is the i-fold forward shift. Now σ^n is the identity map and σ^{n−1} is the backward shift. A cyclic code is invariant under σ^i for all i.
Proposition 7.1.3 Let G be a generator matrix of a linear code C. Then C is cyclic if and only if the cyclic shift of every row of G is in C.

Proof. If C is cyclic, then the cyclic shift of every row of G is in C, since all the rows of G are codewords. Conversely, suppose that the cyclic shift of every row of G is in C. Let g_1, . . . , g_k be the rows of G. Let c ∈ C. Then c = Σ_{i=1}^{k} x_i g_i for some x_1, . . . , x_k ∈ F_q. Now σ is a linear transformation of F_q^n. So

σ(c) = Σ_{i=1}^{k} x_i σ(g_i) ∈ C,

since C is linear and σ(g_i) ∈ C for all i by assumption. Hence C is cyclic.

Example 7.1.4 Consider the [6,3] code over F_7 with generator matrix G defined by

G =
( 1 1 1 1 1 1 )
( 1 3 2 6 4 5 )
( 1 2 4 1 2 4 )

Then σ(g_1) = g_1, σ(g_2) = 5g_2 and σ(g_3) = 4g_3. Hence the code is cyclic.

Example 7.1.5 Consider the [7, 4, 3] Hamming code C with generator matrix G as given in Example 2.2.14. Then (0, 0, 0, 1, 0, 1, 1), the cyclic shift of the third row, is not a codeword. Hence this code is not cyclic. After a permutation of the columns and rows of G we get the generator matrix G′ of the code C′, where

G′ =
( 1 0 0 0 1 1 0 )
( 0 1 0 0 0 1 1 )
( 0 0 1 0 1 1 1 )
( 0 0 0 1 1 0 1 )

Let g_i′ be the i-th row of G′. Then σ(g_1′) = g_2′, σ(g_2′) = g_1′ + g_3′, σ(g_3′) = g_1′ + g_4′ and σ(g_4′) = g_1′. Hence C′ is cyclic by Proposition 7.1.3. Therefore C is not cyclic, but equivalent to a cyclic code C′.

Proposition 7.1.6 The dual of a cyclic code is again cyclic.

Proof. Let C be a cyclic code. Then σ(c) ∈ C for all c ∈ C. So σ^{n−1}(c) = (c_1, . . . , c_{n−1}, c_0) ∈ C for all c ∈ C. Let x ∈ C^⊥. Then

σ(x) · c = x_{n−1}c_0 + x_0c_1 + · · · + x_{n−2}c_{n−1} = x · σ^{n−1}(c) = 0

for all c ∈ C. Hence C^⊥ is cyclic.
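Proposition 7.1.3 gives an effective test for cyclicity: it suffices to check the k row shifts. A small Python sketch (our illustration, not from the text) verifies Example 7.1.4 by checking that each shifted row lies in the row space, via exhaustive search over the 7^3 coefficient vectors:

from itertools import product

q = 7
G = [[1, 1, 1, 1, 1, 1],
     [1, 3, 2, 6, 4, 5],
     [1, 2, 4, 1, 2, 4]]

def shift(c):
    """Cyclic shift: (c0,...,c_{n-1}) -> (c_{n-1},c0,...,c_{n-2})."""
    return [c[-1]] + c[:-1]

def in_code(word, G, q):
    """Brute-force membership test in the row space of G over F_q."""
    n = len(G[0])
    for coeffs in product(range(q), repeat=len(G)):
        comb = [sum(x * g[j] for x, g in zip(coeffs, G)) % q for j in range(n)]
        if comb == word:
            return True
    return False

print(all(in_code(shift(row), G, q) for row in G))   # True: the code is cyclic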
7.1.2 Cyclic codes as ideals

The set of all polynomials in the variable X with coefficients in F_q is denoted by F_q[X]. Two polynomials can be added and multiplied, and in this way F_q[X] is a ring. One has division with rest; this means that every polynomial f(X) has, after division by another nonzero polynomial g(X), a quotient q(X) with rest r(X) that is zero or of degree strictly smaller than deg g(X). In other words

f(X) = q(X)g(X) + r(X), with r(X) = 0 or deg r(X) < deg g(X).

In this way F_q[X] with its degree is a Euclidean domain. Using division with rest repeatedly we find the greatest common divisor gcd(f(X), g(X)) of two polynomials f(X) and g(X) by the algorithm of Euclid. ***complexity of Euclidean Algorithm***

Every nonempty subset of a ring that is closed under addition and under multiplication by an arbitrary element of the ring is called an ideal. Let g_1, . . . , g_m be given elements of a ring. The set of all a_1g_1 + · · · + a_mg_m with a_1, . . . , a_m in the ring forms an ideal; it is denoted by ⟨g_1, . . . , g_m⟩ and called the ideal generated by g_1, . . . , g_m. As a consequence of division with rest, every ideal in F_q[X] is either {0} or generated by a unique monic polynomial. Furthermore ⟨f(X), g(X)⟩ = ⟨gcd(f(X), g(X))⟩. We refer for these notions and properties to Appendix ??.

Definition 7.1.7 Let R be a ring and I an ideal in R. Then R/I is the factor ring of R modulo I. If R = F_q[X] and I = ⟨X^n − 1⟩ is the ideal generated by X^n − 1, then C_{q,n} is the factor ring

C_{q,n} = F_q[X]/⟨X^n − 1⟩.

Remark 7.1.8 The factor ring C_{q,n} has an easy description. Every polynomial f(X) has, after division by X^n − 1, a rest r(X) of degree at most n − 1; that is, there exist polynomials q(X) and r(X) such that

f(X) = q(X)(X^n − 1) + r(X), with r(X) = 0 or deg r(X) < n.

The coset of the polynomial f(X) modulo X^n − 1 is denoted by f(x). Hence f(X) and r(X) have the same coset and represent the same element in C_{q,n}. Now x^i denotes the coset of X^i modulo ⟨X^n − 1⟩. Hence the cosets 1, x, . . . , x^{n−1} form a basis of C_{q,n} over F_q. The multiplication of the basis elements x^i and x^j in C_{q,n} with 0 ≤ i, j < n is given by

x^i x^j = x^{i+j} if i + j < n, and x^i x^j = x^{i+j−n} if i + j ≥ n.

Definition 7.1.9 Consider the map ϕ between F_q^n and C_{q,n} defined by ϕ(c) = c_0 + c_1x + · · · + c_{n−1}x^{n−1}. Then ϕ(c) is also denoted by c(x).
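Arithmetic in C_{q,n} is ordinary polynomial arithmetic on coefficient vectors with exponents reduced modulo n. A minimal Python sketch (ours):

def mulmod(a, b, q):
    """Product of a(x) and b(x) in C_{q,n} = F_q[X]/<X^n - 1>.

    a, b: coefficient lists of length n (index i = coefficient of x^i).
    """
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] = (c[(i + j) % n] + a[i] * b[j]) % q
    return c

# In C_{2,7}: x * (1 + x^4 + x^5) = x + x^5 + x^6
print(mulmod([0, 1, 0, 0, 0, 0, 0], [1, 0, 0, 0, 1, 1, 0], 2))
# [0, 1, 0, 0, 0, 1, 1]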
Proposition 7.1.10 The map ϕ is an isomorphism of vector spaces. Ideals in the ring C_{q,n} correspond one-to-one to cyclic codes in F_q^n.

Proof. The map ϕ is clearly linear and it maps the i-th standard basis vector of F_q^n to the coset x^{i−1} in C_{q,n} for i = 1, . . . , n. Hence ϕ is an isomorphism of vector spaces. Let ψ be the inverse map of ϕ.

Let I be an ideal in C_{q,n}. Let C := ψ(I). Then C is a linear code, since ψ is a linear map. Let c ∈ C. Then c(x) = ϕ(c) ∈ I and I is an ideal. So xc(x) ∈ I. But

xc(x) = c_0x + c_1x^2 + · · · + c_{n−2}x^{n−1} + c_{n−1}x^n = c_{n−1} + c_0x + c_1x^2 + · · · + c_{n−2}x^{n−1},

since x^n = 1. So ψ(xc(x)) = (c_{n−1}, c_0, c_1, . . . , c_{n−2}) ∈ C. Hence C is cyclic.

Conversely, let C be a cyclic code in F_q^n, and let I := ϕ(C). Then I is closed under addition of its elements, since C is a linear code and ϕ is a linear map. If a ∈ F_q^n and c ∈ C, then

a(x)c(x) = ϕ(a_0c + a_1σ(c) + · · · + a_{n−1}σ^{n−1}(c)) ∈ I.

Hence I is an ideal in C_{q,n}.

In the following we will not distinguish between words and the corresponding polynomials under ϕ; we will talk about words c(x) when in fact we mean the vector c, and vice versa.

Example 7.1.11 Consider the rows of the generator matrix G′ of the [7, 4, 3] Hamming code of Example 7.1.5. They correspond to g_1(x) = 1 + x^4 + x^5, g_2(x) = x + x^5 + x^6, g_3(x) = x^2 + x^4 + x^5 + x^6 and g_4(x) = x^3 + x^4 + x^6, respectively. Furthermore x · x^6 = 1, so x is invertible in the ring F_2[X]/⟨X^7 − 1⟩. Now

⟨1 + x^4 + x^5⟩ = ⟨x + x^5 + x^6⟩ = ⟨x^6 + x^{10} + x^{11}⟩ = ⟨x^3 + x^4 + x^6⟩.

Hence the ideals generated by g_i(x) are the same for i = 1, 2, 4, and there is no unique generating element. The third row generates the ideal

⟨x^2 + x^4 + x^5 + x^6⟩ = ⟨x^2(1 + x^2 + x^3 + x^4)⟩ = ⟨1 + x^2 + x^3 + x^4⟩ = ⟨(1 + x)(1 + x + x^3)⟩,

which gives a cyclic code that is a proper subcode of dimension 3. Therefore all rows except the third generate the same ideal.

7.1.3 Generator polynomial

Remark 7.1.12 The ring F_q[X] with its degree function is a Euclidean ring. Hence F_q[X] is a principal ideal domain, which means that all ideals are generated by one element. If an ideal of F_q[X] is not zero, then a generating element is unique up to a nonzero scalar multiple of F_q. So there is a unique monic polynomial generating the ideal. Now C_{q,n} is a factor ring of F_q[X], therefore all its ideals are also generated by one element. A cyclic code C considered as an ideal in C_{q,n} is generated by one element, but this element is not unique, as we have seen in Example 7.1.11.
The inverse image of C under the map F_q[X] → C_{q,n} is denoted by I. Then I is a nonzero ideal in F_q[X] containing X^n − 1. Therefore I has a unique monic polynomial g(X) as generator. So g(X) is the monic polynomial in I of minimal degree. Hence g(X) is the monic polynomial of minimal degree such that g(x) ∈ C.

Definition 7.1.13 Let C be a cyclic code. Let g(X) be the monic polynomial of minimal degree such that g(x) ∈ C. Then g(X) is called the generator polynomial of C.

Example 7.1.14 The generator polynomial of the trivial code F_q^n is 1, and that of the zero code of length n is X^n − 1. The repetition code and its dual have as generator polynomials X^{n−1} + · · · + X + 1 and X − 1, respectively.

Proposition 7.1.15 Let g(X) be a polynomial in F_q[X]. Then g(X) is the generator polynomial of a cyclic code over F_q of length n if and only if g(X) is monic and divides X^n − 1.

Proof. Suppose g(X) is the generator polynomial of a cyclic code. Then g(X) is monic and a generator of an ideal in F_q[X] that contains X^n − 1. Hence g(X) divides X^n − 1.

Conversely, suppose that g(X) is monic and divides X^n − 1. So b(X)g(X) = X^n − 1 for some b(X). Now ⟨g(x)⟩ is an ideal in C_{q,n} and defines a cyclic code C. Let c(X) be a monic polynomial such that c(x) ∈ C. Then c(x) = a(x)g(x). Hence there exists an h(X) such that

c(X) = a(X)g(X) + h(X)(X^n − 1) = (a(X) + b(X)h(X))g(X).

Hence deg g(X) ≤ deg c(X). Therefore g(X) is the monic polynomial of minimal degree such that g(x) ∈ C. Hence g(X) is the generator polynomial of C.

Example 7.1.16 The polynomial X^3 + X + 1 divides X^8 − 1 in F_3[X], since

(X^3 + X + 1)(X^5 − X^3 − X^2 + X − 1) = X^8 − 1.

Hence 1 + X + X^3 is the generator polynomial of a ternary cyclic code of length 8.

Remark 7.1.17 Let g(X) be the generator polynomial of C. Then g(X) is a monic polynomial and g(x) generates C. Let c(X) be another polynomial such that c(x) generates C. Let d(X) be the greatest common divisor of c(X) and X^n − 1. Then d(X) is the monic polynomial such that ⟨d(X)⟩ = ⟨c(X), X^n − 1⟩ = I. But g(X) is also the unique monic polynomial such that ⟨g(X)⟩ = I. Hence g(X) = gcd(c(X), X^n − 1).

Example 7.1.18 Consider the binary cyclic code of length 7 generated by 1 + x^2. Then 1 + X^2 = (1 + X)^2 and 1 + X^7 is divisible by 1 + X in F_2[X]. So 1 + X is the greatest common divisor of 1 + X^7 and 1 + X^2. Hence 1 + X is the generator polynomial of C.
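Remark 7.1.17 turns the computation of generator polynomials into a gcd computation. A short Python sketch of the Euclidean algorithm on coefficient lists over a prime field F_q (our own helpers, for illustration), reproducing the gcd that Example 7.1.19 below computes by hand:

def polydivmod(a, b, q):
    """Division with rest of a by b over F_q; coefficient lists, lowest degree first."""
    a = a[:]
    quot = [0] * max(len(a) - len(b) + 1, 1)
    inv_lead = pow(b[-1], q - 2, q)           # inverse of leading coefficient (q prime)
    for i in range(len(a) - len(b), -1, -1):
        coef = (a[i + len(b) - 1] * inv_lead) % q
        quot[i] = coef
        for j, bj in enumerate(b):
            a[i + j] = (a[i + j] - coef * bj) % q
    while len(a) > 1 and a[-1] == 0:
        a.pop()                               # strip leading zeros of the rest
    return quot, a

def polygcd(a, b, q):
    """Greatest common divisor by Euclid's algorithm (not normalized to monic)."""
    while b != [0]:
        _, r = polydivmod(a, b, q)
        a, b = b, r
    return a

# gcd(1 + X^7, 1 + X^4 + X^5) over F_2 is 1 + X + X^3 (Example 7.1.19):
print(polygcd([1, 0, 0, 0, 0, 0, 0, 1], [1, 0, 0, 0, 1, 1], 2))   # [1, 1, 0, 1]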
Example 7.1.19 Let C be the Hamming code of Examples 7.1.5 and 7.1.11. Then 1 + x^4 + x^5 generates C. In order to get the greatest common divisor of 1 + X^7 and 1 + X^4 + X^5 we apply the Euclidean algorithm:

1 + X^7 = (1 + X + X^2)(1 + X^4 + X^5) + (X + X^2 + X^4),
1 + X^4 + X^5 = (1 + X)(X + X^2 + X^4) + (1 + X + X^3),
X + X^2 + X^4 = X(1 + X + X^3).

Hence 1 + X + X^3 is the greatest common divisor, and therefore 1 + X + X^3 is the generator polynomial of C.

Remark 7.1.20 Let g(X) be a generator polynomial of a cyclic code of length n. Then g(X) divides X^n − 1 by Proposition 7.1.15. So g(X)h(X) = X^n − 1 for some h(X). Hence g(0)h(0) = −1. Therefore the constant term of the generator polynomial of a cyclic code is not zero.

Proposition 7.1.21 Let g(X) = g_0 + g_1X + · · · + g_lX^l be a polynomial of degree l. Let n be an integer such that l ≤ n. Let k = n − l. Let G be the k × n matrix defined by

G =
( g_0 g_1 · · · g_l 0   · · ·     0   )
( 0   g_0 g_1 · · · g_l · · ·     0   )
( · · ·                               )
( 0   · · ·   0   g_0 g_1 · · ·   g_l ),

that is, the i-th row of G is the coefficient vector of X^{i−1}g(X) for i = 1, . . . , k.

1. If g(X) is the generator polynomial of a cyclic code C, then the dimension of C is equal to k and G is a generator matrix of C.

2. If g_l = 1 and G is the generator matrix of a code C such that (g_l, 0, · · · , 0, g_0, g_1, · · · , g_{l−1}) ∈ C, then C is cyclic with generator polynomial g(X).

Proof. 1) Suppose g(X) is the generator polynomial of a cyclic code C. Then the element g(x) generates C, and the elements g(x), xg(x), . . . , x^{k−1}g(x) correspond to the rows of the above matrix. The generator polynomial is monic, so g_l = 1, and the k × k submatrix of G consisting of the last k columns is a lower triangular matrix with ones on the diagonal, so the rows of G are independent. Every codeword c(x) ∈ C is equal to a(x)g(x) for some a(X). Division with rest of a(X)g(X) by X^n − 1 gives that there exist e(X) and f(X) such that

a(X)g(X) = e(X)(X^n − 1) + f(X), with f(X) = 0 or deg f(X) < n.

But X^n − 1 is divisible by g(X) by Proposition 7.1.15. So f(X) is divisible by g(X). Hence f(X) = b(X)g(X) with b(X) = 0 or deg b(X) < n − l = k, for some polynomial b(X). Therefore c(x) = a(x)g(x) = b(x)g(x) with b(X) = 0 or deg b(X) < k. So every codeword is a linear combination of g(x), xg(x), . . . , x^{k−1}g(x).
Hence k is the dimension of C and G is a generator matrix of C.

2) Suppose G is the generator matrix of a code C such that g_l = 1 and (g_l, 0, · · · , 0, g_0, g_1, · · · , g_{l−1}) ∈ C. Then the cyclic shift of the i-th row of G is the (i + 1)-th row of G for all i < k, and the cyclic shift of the k-th row of G is (g_l, 0, · · · , 0, g_0, g_1, · · · , g_{l−1}), which is also an element of C by assumption. Hence C is cyclic by Proposition 7.1.3. Now g_l = 1 and the upper right corner of G consists of zeros, so G has rank k and the dimension of C is k. Now g(X) is monic, has degree l = n − k and g(x) ∈ C. The generator polynomial of C has the same degree l by (1). Hence g(X) is the generator polynomial of C.

Example 7.1.22 The ternary cyclic code of length 8 with generator polynomial 1 + X + X^3 of Example 7.1.16 has dimension 5.

Remark 7.1.23 A cyclic [n, k] code is systematic at the first k positions, since it has a generator matrix as given in Proposition 7.1.21 which is upper diagonal with nonzero entries on the diagonal at the first k positions, since g_0 ≠ 0 by Remark 7.1.20. So the row reduced echelon form of a generator matrix of the code has the k × k identity matrix at the first k columns. The last row of this rref matrix is, up to the constant g_0, equal to (0, · · · , 0, g_0, g_1, · · · , g_l), giving the coefficients of the generator polynomial. This method of obtaining the generator polynomial from a given generator matrix G is more efficient than taking the greatest common divisor of g_1(X), . . . , g_k(X), X^n − 1, where g_1, . . . , g_k are the rows of G.

Example 7.1.24 Consider the generator matrix G of the [6,3] cyclic code over F_7 of Example 7.1.4. The row reduced echelon form of G is equal to

( 1 0 0 6 1 3 )
( 0 1 0 3 3 6 )
( 0 0 1 6 4 6 )

The last row represents x^2 + 6x^3 + 4x^4 + 6x^5 = x^2(1 + 6x + 4x^2 + 6x^3). Hence 1 + 6x + 4x^2 + 6x^3 is a codeword, since x is invertible in C_{7,6}. The corresponding monic polynomial 6 + X + 3X^2 + X^3 has degree 3. Hence this is the generator polynomial.

7.1.4 Encoding cyclic codes

Consider a cyclic code of length n with generator polynomial g(X) and the corresponding generator matrix G as in Proposition 7.1.21. Let the message m = (m_0, . . . , m_{k−1}) ∈ F_q^k be mapped to the codeword c = mG. In terms of polynomials this means that c(x) = m(x)g(x), where m(x) = m_0 + · · · + m_{k−1}x^{k−1}. In this way we get an encoding of message words into codewords.
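The generator matrix of Proposition 7.1.21 consists of the k shifted coefficient vectors of g(X). A Python sketch (ours) for the ternary code of Example 7.1.22:

def generator_matrix(g, n):
    """k x n generator matrix of the cyclic code with generator polynomial g.

    g: coefficient list (g_0, ..., g_l), lowest degree first, of degree l = n - k.
    """
    l = len(g) - 1
    k = n - l
    return [[0] * i + g + [0] * (n - l - 1 - i) for i in range(k)]

for row in generator_matrix([1, 1, 0, 1], 8):   # g(X) = 1 + X + X^3, n = 8, k = 5
    print(row)
# [1, 1, 0, 1, 0, 0, 0, 0]
# [0, 1, 1, 0, 1, 0, 0, 0]  ... five rows in total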
The k × k submatrix of G consisting of the last k columns of G is a lower triangular matrix with ones on its diagonal, so it is invertible. That means that we can perform row operations on this matrix until we get another matrix G_2 such that its last k columns form the k × k identity matrix. The matrix G_2 is another generator matrix of the same code. The encoding m → c_2 = mG_2 by means of G_2 is systematic at the last k positions; that means that there exist r_0, . . . , r_{n−k−1} ∈ F_q such that

c_2 = (r_0, . . . , r_{n−k−1}, m_0, . . . , m_{k−1}).

In other words, the encoding has the nice property that one can read off the sent message directly from the encoded word by looking at the last k positions, in case no errors occurred during the transmission at these positions.

Now how does one translate this systematic encoding in terms of polynomials? Let m(X) be a polynomial of degree at most k − 1. Let −r(X) be the rest after dividing m(X)X^{n−k} by g(X). Now deg g(X) = n − k, so there is a polynomial q(X) such that

m(X)X^{n−k} = q(X)g(X) − r(X), with r(X) = 0 or deg r(X) < n − k.

Hence r(x) + m(x)x^{n−k} = q(x)g(x) is a codeword of the form

r_0 + r_1x + · · · + r_{n−k−1}x^{n−k−1} + m_0x^{n−k} + · · · + m_{k−1}x^{n−1}.

Example 7.1.25 Consider the cyclic [7,4,3] Hamming code of Example 7.1.19 with generator polynomial g(X) = 1 + X + X^3. Let m be a message with polynomial m(X) = 1 + X^2 + X^3. Then division of m(X)X^3 by g(X) gives the quotient q(X) = 1 + X + X^2 + X^3 with rest r(X) = 1. The corresponding codeword under systematic encoding is c_2(x) = r(x) + m(x)x^3 = 1 + x^3 + x^5 + x^6.

Example 7.1.26 Consider the ternary cyclic code of length 8 with generator polynomial 1 + X + X^3 of Example 7.1.16. Let m be a message with polynomial m(X) = 1 + X^2 + X^3. Then division of m(X)X^3 by g(X) gives the quotient q(X) = −1 − X + X^2 + X^3 with rest −r(X) = 1 − X. The corresponding codeword under systematic encoding is c_2(x) = r(x) + m(x)x^3 = −1 + x + x^3 + x^5 + x^6.

7.1.5 Reversible codes

Definition 7.1.27 Define the reversed word ρ(x) of x ∈ F_q^n by

ρ(x_0, x_1, . . . , x_{n−2}, x_{n−1}) = (x_{n−1}, x_{n−2}, . . . , x_1, x_0).

Let C be a code in F_q^n, then its reversed code ρ(C) is defined by ρ(C) = { ρ(c) | c ∈ C }. A code is called reversible if C = ρ(C).
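Systematic encoding is thus one division with rest. A Python sketch (ours, reusing the polydivmod helper introduced after Example 7.1.18) reproducing Example 7.1.26 over F_3:

def encode_systematic(m, g, n, q):
    """Systematic encoding of a cyclic [n, k] code at the last k positions.

    m: message coefficients (length k); g: generator polynomial, lowest degree first.
    """
    l = len(g) - 1                        # l = n - k
    shifted = [0] * l + m                 # m(X) * X^{n-k}
    _, rest = polydivmod(shifted, g, q)   # m(X)X^{n-k} = q(X)g(X) - r(X)
    r = [(-c) % q for c in rest]
    r += [0] * (l - len(r))               # pad r to length n - k
    return r + m

# Example 7.1.26: m(X) = 1 + X^2 + X^3, g(X) = 1 + X + X^3 over F_3, n = 8
print(encode_systematic([1, 0, 1, 1, 0], [1, 1, 0, 1], 8, 3))
# [2, 1, 0, 1, 0, 1, 1, 0], i.e. -1 + x + x^3 + x^5 + x^6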
Remark 7.1.28 The dimensions of C and ρ(C) are the same, since ρ is an automorphism of F_q^n. If a code is reversible, then ρ ∈ Aut(C).

Definition 7.1.29 Let g(X) be a polynomial of degree l given by g_0 + g_1X + · · · + g_{l−1}X^{l−1} + g_lX^l. Then

X^l g(X^{−1}) = g_l + g_{l−1}X + · · · + g_1X^{l−1} + g_0X^l

is called the reciprocal of g(X). If moreover g(0) ≠ 0, then X^l g(X^{−1})/g(0) is called the monic reciprocal of g(X). The polynomial g(X) is called reversible if g(0) ≠ 0 and it is equal to its monic reciprocal.

Remark 7.1.30 If g = (g_0, g_1, . . . , g_{l−1}, g_l) are the coefficients of the polynomial g(X), then the reversed word ρ(g) gives the coefficients of the reciprocal of g(X).

Remark 7.1.31 If α is a zero of g(X) and α ≠ 0, then the inverse α^{−1} is a zero of the reciprocal of g(X).

Proposition 7.1.32 Let g(X) be the generator polynomial of a cyclic code C. Then ρ(C) is cyclic with the monic reciprocal of g(X) as generator polynomial, and C is reversible if and only if g(X) is reversible.

Proof. A cyclic code is invariant under the forward shift σ and the backward shift σ^{n−1}. Now σ(ρ(c)) = ρ(σ^{n−1}(c)) for all c ∈ C. Hence ρ(C) is cyclic. Now g(0) ≠ 0 by Remark 7.1.20. Hence the monic reciprocal of g(X) is well defined, and its corresponding word is an element of ρ(C) by Remark 7.1.30. The degree of g(X) and its monic reciprocal are the same, and the dimensions of C and ρ(C) are the same. Hence this monic reciprocal is the generator polynomial of ρ(C). Therefore C is reversible if and only if g(X) is reversible, by the definition of a reversible polynomial.

Remark 7.1.33 If C is a reversible cyclic code, then the group generated by σ and ρ is the dihedral group of order 2n and is contained in Aut(C).

7.1.6 Parity check polynomial

Definition 7.1.34 Let g(X) be the generator polynomial of a cyclic code C of length n. Then g(X) divides X^n − 1 by Proposition 7.1.15, and

h(X) = (X^n − 1)/g(X)

is called the parity check polynomial of C.

Proposition 7.1.35 Let h(X) be the parity check polynomial of a cyclic code C. Then c(x) ∈ C if and only if c(x)h(x) = 0.
Proof. Let c(x) ∈ C. Then c(x) = a(x)g(x) for some a(x). We have g(X)h(X) = X^n − 1. Hence g(x)h(x) = 0. So c(x)h(x) = a(x)g(x)h(x) = 0.

Conversely, suppose that c(x)h(x) = 0. There exist polynomials a(X) and b(X) such that c(X) = a(X)g(X) + b(X) with b(X) = 0 or deg b(X) < deg g(X). Hence c(x)h(x) = a(x)g(x)h(x) + b(x)h(x) = b(x)h(x). Notice that b(x)h(x) ≠ 0 if b(X) is a nonzero polynomial, since deg b(X)h(X) is at most n − 1. Hence b(X) = 0 and c(x) = a(x)g(x) ∈ C.

Remark 7.1.36 If H is a parity check matrix of a code C, then H is a generator matrix of the dual of C. One might expect that if h(X) is the parity check polynomial of a cyclic code C, then h(X) is the generator polynomial of the dual of C. This is not the case, but something of this nature is true, as the following shows.

Proposition 7.1.37 Let h(X) be the parity check polynomial of a cyclic code C. Then the monic reciprocal of h(X) is the generator polynomial of C^⊥.

Proof. Let C be a cyclic code of length n and dimension k with generator polynomial g(X) and parity check polynomial h(X). If k = 0, then g(X) = X^n − 1 and h(X) = 1; similarly, if k = n, then g(X) = 1 and h(X) = X^n − 1. Hence the proposition is true in these cases. Now suppose that 0 < k < n. Then h(X) = h_0 + h_1X + · · · + h_kX^k. Hence

X^k h(X^{−1}) = h_k + h_{k−1}X + · · · + h_0X^k.

The i-th position of x^k h(x^{−1}) is h_{k−i}. Let g(X) be the generator polynomial of C. Let l = n − k. Then g(X) = g_0 + g_1X + · · · + g_lX^l and g_l = 1. The elements x^t g(x) for t = 0, . . . , k − 1 generate C. The i-th position of x^t g(x) is equal to g_{i−t}. Hence the inner product of the words x^t g(x) and x^k h(x^{−1}) is

Σ_i g_{i−t} h_{k−i},

which is the coefficient of the term X^{k−t} in g(X)h(X). But g(X)h(X) = X^n − 1 and 0 < k − t < n, hence this coefficient is zero. So the inner product is zero for all t, and x^k h(x^{−1}) is an element of the dual of C.

Now g(X)h(X) = X^n − 1. So g(0)h(0) = −1. Hence the monic reciprocal of h(X) is well defined, is monic, represents an element of C^⊥, has degree k, and the dimension of C^⊥ is n − k. Hence X^k h(X^{−1})/h(0) is the generator polynomial of C^⊥ by Proposition 7.1.21.

Example 7.1.38 Consider the [6,3] cyclic code over F_7 of Example 7.1.24, which has generator polynomial X^3 + 3X^2 + X + 6. Hence

h(X) = (X^6 − 1)/(X^3 + 3X^2 + X + 6) = X^3 + 4X^2 + X + 1
is the parity check polynomial of the code. The generator polynomial of the dual code is

g^⊥(X) = X^3 h(X^{−1}) = 1 + 4X + X^2 + X^3

by Proposition 7.1.37, since h(0) = 1.

Example 7.1.39 Consider in F_2[X] the polynomial g(X) = 1 + X^4 + X^6 + X^7 + X^8. Then g(X) divides X^15 − 1 with quotient

h(X) = (X^15 − 1)/g(X) = 1 + X^4 + X^6 + X^7.

Hence g(X) is the generator polynomial of a binary cyclic code of length 15 with parity check polynomial h(X). The generator polynomial of the dual code is

g^⊥(X) = X^7 h(X^{−1}) = 1 + X + X^3 + X^7

by Proposition 7.1.37, since h(0) = 1.

Example 7.1.40 The generator polynomial 1 + X + X^3 of the ternary code of length 8 of Example 7.1.16 has parity check polynomial

h(X) = (X^8 − 1)/g(X) = X^5 − X^3 − X^2 + X − 1.

The generator polynomial of the dual code is

g^⊥(X) = X^5 h(X^{−1})/h(0) = X^5 − X^4 + X^3 + X^2 − 1

by Proposition 7.1.37.

Example 7.1.41 Let us now take a look at how cyclic codes are constructed via generator and check polynomials in GAP.

x:=Indeterminate(GF(2));;
f:=x^17-1;;
F:=Factors(PolynomialRing(GF(2)),f);
[ x_1+Z(2)^0, x_1^8+x_1^5+x_1^4+x_1^3+Z(2)^0,
  x_1^8+x_1^7+x_1^6+x_1^4+x_1^2+x_1+Z(2)^0 ]
g:=F[2];;
C:=GeneratorPolCode(g,17,"code from Example 7.1.41",GF(2));;
MinimumDistance(C);;
C;
a cyclic [17,9,5]3..4 code from Example 7.1.41 over GF(2)
h:=F[3];;
C2:=CheckPolCode(h,17,GF(2));;
MinimumDistance(C2);;
C2;
a cyclic [17,8,6]3..7 code defined by check polynomial over GF(2)

So here x is a variable with which the polynomials are built. Note that one can also define it via x:=X(GF(2)), since X is a synonym of Indeterminate. For this same reason we could not use X as a variable.
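These computations are again a division plus a coefficient reversal. A Python sketch (ours, building on the polydivmod helper defined earlier) that reproduces Example 7.1.40:

def parity_check_poly(g, n, q):
    """h(X) = (X^n - 1)/g(X) over F_q."""
    xn1 = [q - 1] + [0] * (n - 1) + [1]      # X^n - 1
    quot, rest = polydivmod(xn1, g, q)
    assert rest == [0], "g(X) must divide X^n - 1"
    return quot

def monic_reciprocal(h, q):
    """X^k h(X^{-1}) / h(0): reverse the coefficients and normalize."""
    rev = h[::-1]
    inv = pow(h[0], q - 2, q)                # h(0) is invertible, q prime
    return [(c * inv) % q for c in rev]

g = [1, 1, 0, 1]                             # 1 + X + X^3 over F_3, n = 8
h = parity_check_poly(g, 8, 3)
print(h)                                     # [2, 1, 2, 2, 0, 1] = -1+X-X^2-X^3+X^5
print(monic_reciprocal(h, 3))                # [2, 0, 1, 1, 2, 1] = -1+X^2+X^3-X^4+X^5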
7.1.7 Exercises

7.1.1 Let C be the F_q-linear code with generator matrix

G =
( 1 1 1 1 0 0 0 )
( 0 1 1 1 1 0 0 )
( 0 0 1 1 1 1 0 )
( 0 0 0 1 1 1 1 )

Show that C is not cyclic for every finite field F_q.

7.1.2 Let C be a cyclic code over F_q of length 7 such that (1, 1, 1, 0, 0, 0, 0) is an element of C. Show that C is a trivial code if q is not a power of 3.

7.1.3 Find the generator polynomial of the binary cyclic code of length 7 generated by 1 + x + x^5.

7.1.4 Show that 2 + X^2 + X^3 is the generator polynomial of a ternary cyclic code of length 13.

7.1.5 Let α be an element in F_8 such that α^3 = α + 1. Let C be the F_8-linear code with generator matrix G, where

G =
( 1 1   1   1   1   1   1   )
( 1 α   α^2 α^3 α^4 α^5 α^6 )
( 1 α^2 α^4 α^6 α   α^3 α^5 )

1) Show that the code C is cyclic.
2) Determine the coefficients of the generator polynomial of this code.

7.1.6 Consider the binary polynomial g(X) = 1 + X^2 + X^5.
1) Show that g(X) is the generator polynomial of a binary cyclic code C of length 31 and dimension 26.
2) Give the encoding with respect to the code C of the message m with message polynomial m(X) = 1 + X^{10} + X^{25} that is systematic at the last 26 positions.
3) Find the parity check polynomial of C.
4) Give the coefficients of the generator polynomial of C^⊥.

7.1.7 Give a description of the systematic encoding of an [n, k] cyclic code at the first k positions in terms of division by the generator polynomial with rest.

7.1.8 Estimate the number of additions and the number of multiplications in F_q needed to encode an [n, k] cyclic code using multiplication by the generator polynomial, and compare these with the numbers for systematic encoding at the last k positions by dividing m(X)X^{n−k} by g(X) with rest.

7.1.9 [CAS] Implement the encoding procedure from Section 7.1.4.

7.1.10 [CAS] Given a generator polynomial g, a code length n and a field size q, construct the cyclic code dual to the one generated by g. Use the function ReciprocalPolynomial (both in GAP and Magma).
7.2 Defining zeros

*** ***

7.2.1 Structure of finite fields

The finite fields we encountered up to now were always of the form F_p = Z/⟨p⟩ with p a prime. For the notion of defining zeros of a cyclic code this does not suffice, and extensions of prime fields are needed. In this section we state basic facts on the structure of finite fields. For proofs we refer to the existing literature.

Definition 7.2.1 The smallest subfield of a field F is unique and is called the prime field of F. The only prime fields are the rational numbers Q and the finite fields F_p with p a prime; the characteristic of the field is zero and p, respectively.

Remark 7.2.2 Let F be a field of characteristic p, a prime. Then (x + y)^p = x^p + y^p for all x, y ∈ F by Newton's binomial formula, since the binomial coefficient (p choose i) is divisible by p for all i = 1, . . . , p − 1.

Proposition 7.2.3 If F is a finite field, then the number of elements of F is a power of a prime number.

Proof. The characteristic of a finite field is a prime, and such a field is a vector space of finite dimension over its prime field. So the number of elements of a finite field is a power of a prime number.

Remark 7.2.4 The factor ring of the ring of polynomials in one variable with coefficients in a field F modulo an irreducible polynomial gives a way to construct a field extension of F. In particular, if f(X) ∈ F_p[X] is irreducible and ⟨f(X)⟩ is the ideal generated by all the multiples of f(X), then the factor ring F_p[X]/⟨f(X)⟩ is a field with p^e elements, where e = deg f(X). The coset of X modulo ⟨f(X)⟩ is denoted by x, and the monomials 1, x, . . . , x^{e−1} form a basis over F_p. Hence every element in this field is uniquely represented by a polynomial g(X) ∈ F_p[X] of degree at most e − 1, and its coset is denoted by g(x). This is called the principal representation. The sum of two representatives is again a representative. For the product one has to divide by f(X) and take the rest as a representative.

Example 7.2.5 The irreducible polynomials of degree one in F_2[X] are X and 1 + X, and 1 + X + X^2 is the only irreducible polynomial of degree two in F_2[X]. There are exactly two irreducible polynomials of degree three in F_2[X]. These are 1 + X + X^3 and 1 + X^2 + X^3. Consider the field F = F_2[X]/⟨1 + X + X^3⟩ with 8 elements. Then 1, x, x^2 is a basis of F over F_2. Now

(1 + X)(1 + X + X^2) = 1 + X^3 ≡ X mod 1 + X + X^3.
Hence (1 + x)(1 + x + x^2) = x in F. In the following table the powers x^i are written by their principal representatives:

x^3 = 1 + x
x^4 = x + x^2
x^5 = 1 + x + x^2
x^6 = 1 + x^2
x^7 = 1

Therefore the nonzero elements form a cyclic group of order 7 with x as generator.

Definition 7.2.6 Let F be a field. Let f(X) = Σ_{i=0}^{n} a_iX^i in F[X]. Then f′(X) is the formal derivative of f(X) and is defined by

f′(X) = Σ_{i=1}^{n} i a_i X^{i−1}.

Remark 7.2.7 The product or Leibniz rule holds for the derivative:

(f(X)g(X))′ = f′(X)g(X) + f(X)g′(X).

The following criterion gives a way to decide whether the zeros of a polynomial are simple.

Lemma 7.2.8 Let F be a field. Let f(X) ∈ F[X]. Then every zero of f(X) has multiplicity one if and only if gcd(f(X), f′(X)) = 1.

Proof. Suppose gcd(f(X), f′(X)) = 1. Let α be a zero of f(X) of multiplicity m. Then there exists a polynomial a(X) such that f(X) = (X − α)^m a(X). Differentiating this equality gives

f′(X) = m(X − α)^{m−1} a(X) + (X − α)^m a′(X).

If m > 1, then X − α divides both f(X) and f′(X). This contradicts the assumption that gcd(f(X), f′(X)) = 1. Hence every zero of f(X) has multiplicity one.

Conversely, if gcd(f(X), f′(X)) ≠ 1, then f(X) and f′(X) have a common zero a, possibly in an extension of F. Conclude that (X − a)^2 divides f(X), using the product rule again.

Remark 7.2.9 Let p be a prime and q = p^e. The formal derivative of X^q − X is −1 in F_p. Hence all zeros of X^q − X in an extension of F_p are simple by Lemma 7.2.8.

For every field F and polynomial f(X) in one variable X there exists a field G that contains F as a subfield such that f(X) splits into linear factors in G[X]. The smallest field with these properties is unique up to an isomorphism of fields and is called the splitting field of f(X) over F. A field F is called algebraically closed if every polynomial in one variable has a zero in F. So every polynomial in one variable over an algebraically closed field splits into linear factors over this field.
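The principal representation is easy to implement: elements are coefficient vectors, and products are reduced modulo f(X). A Python sketch (ours) that rebuilds the power table of F = F_2[X]/⟨1 + X + X^3⟩ above:

def field_mul(a, b, f, p):
    """Product in F_p[X]/<f(X)>; a, b, f are coefficient lists, lowest degree
    first, and f is monic of degree e."""
    e = len(f) - 1
    prod = [0] * (2 * e - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] = (prod[i + j] + ai * bj) % p
    # reduce modulo f: X^d = X^{d-e} * (-f_0 - f_1 X - ... - f_{e-1} X^{e-1})
    for d in range(len(prod) - 1, e - 1, -1):
        c = prod[d]
        prod[d] = 0
        for j in range(e):
            prod[d - e + j] = (prod[d - e + j] - c * f[j]) % p
    return prod[:e]

f = [1, 1, 0, 1]                 # f(X) = 1 + X + X^3 over F_2
power = [1, 0, 0]                # x^0
for i in range(1, 8):
    power = field_mul(power, [0, 1, 0], f, 2)
    print(f"x^{i} = {power}")
# x^3 = [1, 1, 0], ..., x^7 = [1, 0, 0]: the table above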
Every field F has an extension G that is algebraically closed such that G does not have a proper subfield that is algebraically closed. Such an extension is unique up to isomorphism; it is called the algebraic closure of F and is denoted by F̄. The field C of complex numbers is the algebraic closure of the field R of real numbers.

Remark 7.2.10 If F is a field with q elements, then F^∗ = F \ {0} is a multiplicative group of order q − 1. So x^{q−1} = 1 for all x ∈ F^∗. Hence x^q = x for all x ∈ F. Therefore the zeros of X^q − X are precisely the elements of F.

Theorem 7.2.11 Let p be a prime and q = p^e. There exists a field of q elements, and any field with q elements is isomorphic to the splitting field of X^q − X over F_p; it is denoted by F_q or GF(q), the Galois field of q elements.

Proof. The splitting field of X^q − X over F_p contains the zeros of X^q − X. Let Z be the set of zeros of X^q − X in the splitting field. Then |Z| = q, since X^q − X splits into linear factors and all zeros are simple by Remark 7.2.9. Now 0 and 1 are elements of Z, and Z is closed under addition, subtraction, multiplication and division by nonzero elements. Hence Z is a field. Furthermore Z contains F_p, since q = p^e. Hence Z is equal to the splitting field of X^q − X over F_p. Hence the splitting field has q elements.

If F is a field with q elements, then all elements of F are zeros of X^q − X by Remark 7.2.10. Hence F is contained in an isomorphic copy of the splitting field of X^q − X over F_p. Therefore they are equal, since both have q elements.

The set of invertible elements of the finite field F_q is an abelian group of order q − 1. But a stronger statement is true.

Proposition 7.2.12 The multiplicative group F_q^∗ is cyclic.

Proof. The order of an element of F_q^∗ divides q − 1, since F_q^∗ is a group of order q − 1. Let d be the maximal order of an element of F_q^∗. Then d divides q − 1. Let x be an element of order d. If y ∈ F_q^∗, then the order n of y divides d. Otherwise there is a prime l dividing n and not dividing d; then z = y^{n/l} has order l, and xz has order dl, contradicting the maximality of d. Therefore the order of every element of F_q^∗ divides d. So the elements of F_q^∗ are zeros of X^d − 1. Hence q − 1 ≤ d. Since d divides q − 1, we conclude that d = q − 1, x is an element of order q − 1 and F_q^∗ is cyclic, generated by x.

Definition 7.2.13 A generator of F_q^∗ is called a primitive element. An irreducible polynomial f(X) ∈ F_p[X] is called primitive if x is a primitive element in F_p[X]/⟨f(X)⟩, where x is the coset of X modulo ⟨f(X)⟩.

Definition 7.2.14 Choose a primitive element α of F_q. Define α^∗ = 0. Then for every element β ∈ F_q there is a unique i ∈ {∗, 0, 1, . . . , q − 2} such that β = α^i; this i is called the logarithm of β with respect to α, and α^i the exponential representation of β. For every i ∈ {∗, 0, 1, . . . , q − 2} there is a unique j ∈ {∗, 0, 1, . . . , q − 2} such that 1 + α^i = α^j; this j is called the Zech logarithm of i and is denoted by Zech(i) = j.
Remark 7.2.15 Let p be a prime and q = p^e. In a principal representation of F_q, every element is given by a polynomial of degree at most e − 1 with coefficients in F_p, and addition in F_q is easy: it is done coefficient-wise in F_p. But for the multiplication we need to multiply two polynomials and compute a division with rest. Define the addition i + j for i, j ∈ {∗, 0, 1, . . . , q − 2}, where i + j is taken modulo q − 1 if i and j are both not equal to ∗, and i + j = ∗ if i = ∗ or j = ∗. Then multiplication in F_q is easy in the exponential representation with respect to a primitive element, since

α^i α^j = α^{i+j} for i, j ∈ {∗, 0, 1, . . . , q − 2}.

In the exponential representation the addition can be expressed in terms of the Zech logarithm:

α^i + α^j = α^{i + Zech(j−i)}.

Example 7.2.16 Consider the finite field F_8 as given in Example 7.2.5 by the irreducible polynomial 1 + X + X^3. In the following table the elements are represented as powers of x and as polynomials a_0 + a_1x + a_2x^2, and the Zech logarithm is given.

i | x^i | (a_0, a_1, a_2) | Zech(i)
∗ | x^∗ | (0, 0, 0) | 0
0 | x^0 | (1, 0, 0) | ∗
1 | x^1 | (0, 1, 0) | 3
2 | x^2 | (0, 0, 1) | 6
3 | x^3 | (1, 1, 0) | 1
4 | x^4 | (0, 1, 1) | 5
5 | x^5 | (1, 1, 1) | 4
6 | x^6 | (1, 0, 1) | 2

In the principal representation we immediately see that x^3 + x^5 = x^2, since x^3 = 1 + x and x^5 = 1 + x + x^2. The exponential representation by means of the Zech logarithm gives x^3 + x^5 = x^{3 + Zech(2)} = x^2.

***Applications: quasi random generators, discrete logarithm.***
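The Zech table can be generated mechanically from the power table. A Python sketch (ours, reusing field_mul from above) for F_8 with α = x:

f, p = [1, 1, 0, 1], 2                        # F_8 = F_2[X]/<1 + X + X^3>
powers = [[1, 0, 0]]
for i in range(6):
    powers.append(field_mul(powers[-1], [0, 1, 0], f, p))
logs = {tuple(v): i for i, v in enumerate(powers)}   # principal rep -> exponent

def zech(i):
    """Zech(i) = log_alpha(1 + alpha^i); '*' when 1 + alpha^i = 0."""
    a = list(powers[i])
    a[0] = (a[0] + 1) % p                     # add 1 to the constant coefficient
    return logs.get(tuple(a), "*")

for i in range(7):
    print(i, zech(i))                         # reproduces the Zech column above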
Definition 7.2.17 Let Irr_q(n) be the number of irreducible monic polynomials over F_q of degree n.

Proposition 7.2.18 Let q be a power of a prime number. Then

q^n = Σ_{d|n} d · Irr_q(d).

Proof. ***...***

Proposition 7.2.19

Irr_q(n) = (1/n) Σ_{d|n} μ(n/d) q^d.

Proof. Consider the poset N of Example 5.3.20 with divisibility as partial order. Define f(d) = d · Irr_q(d). Then the sum function f̃(n) = Σ_{d|n} f(d) is equal to q^n by Proposition 7.2.18. The Möbius inversion formula 5.3.10 implies that

n · Irr_q(n) = Σ_{d|n} μ(n/d) q^d,

which gives the desired result.

Remark 7.2.20 Proposition 7.2.19 implies

Irr_q(n) ≥ (1/n)(q^n − q^{n−1} − · · · − q) = (1/n)( q^n − (q^n − q)/(q − 1) ) > 0,

since μ(1) = 1 and μ(d) ≥ −1 for all d. By counting the number of irreducible polynomials over F_q we see that there exists an irreducible polynomial in F_q[X] of every degree. Let q = p^d with p a prime. Now Z_p is a field with p elements. There exists an irreducible polynomial f(T) in Z_p[T] of degree d, and Z_p[T]/⟨f(T)⟩ is a field with p^d = q elements. This is another way to show the existence of a finite field with q elements, where q is a prime power.
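Proposition 7.2.19 is directly computable. A Python sketch (ours, with a naive Möbius function):

def moebius(n):
    """Naive Moebius function via trial factorization."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0        # square factor
            result = -result
        d += 1
    return -result if n > 1 else result

def irr_count(q, n):
    """Number of monic irreducible polynomials of degree n over F_q."""
    divisors = [d for d in range(1, n + 1) if n % d == 0]
    return sum(moebius(n // d) * q**d for d in divisors) // n

print(irr_count(2, 3))   # 2: the polynomials 1+X+X^3 and 1+X^2+X^3
print(irr_count(2, 2))   # 1: the polynomial 1+X+X^2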
7.2.2 Minimal polynomials

***

Remark 7.2.21 From now on we assume that n and q are relatively prime. This assumption is not necessary, but it would complicate matters otherwise. Hence q has an inverse modulo n, and q^e ≡ 1 (mod n) for some positive integer e. Hence n divides q^e − 1. Let F_{q^e} be the extension of F_q of degree e. So n divides the order of F_{q^e}^∗, the cyclic group of units. Hence there exists an element α ∈ F_{q^e}^∗ of order n. From now on we choose such an element α of order n.

Example 7.2.22 The order of the cyclic group F_{3^e}^∗ is 2, 8, 26, 80 and 242 for e = 1, 2, 3, 4 and 5, respectively. Hence F_{3^5} is the smallest field extension of F_3 that has an element of order 11.

Remark 7.2.23 The multiplicity of every zero of X^n − 1 is one by Lemma 7.2.8, since gcd(X^n − 1, nX^{n−1}) = 1 in F_q by the assumption that gcd(n, q) = 1. Let α be an element in some extension of F_q of order n. Then 1, α, α^2, . . . , α^{n−1} are n mutually distinct zeros of X^n − 1. Hence

X^n − 1 = ∏_{i=0}^{n−1} (X − α^i).

Definition 7.2.24 Let α be a primitive n-th root of unity in the extension field F_{q^e}. For this choice of an element of order n we define m_i(X) as the minimal polynomial of α^i, that is, the monic polynomial in F_q[X] of smallest degree such that m_i(α^i) = 0.

Example 7.2.25 In particular m_0(X) = X − 1.

Proposition 7.2.26 The minimal polynomial m_i(X) is irreducible in F_q[X].

Proof. Let m_i(X) = f(X)g(X) with f(X), g(X) ∈ F_q[X] both nonconstant. Then f(α^i)g(α^i) = m_i(α^i) = 0. So f(α^i) = 0 or g(α^i) = 0, while deg f(X) < deg m_i(X) and deg g(X) < deg m_i(X). This contradicts the minimality of the degree of m_i(X). Hence m_i(X) is irreducible.

Example 7.2.27 Choose α = 3 as the primitive element of order 6 in F_7^∗. Then X^6 − 1 is the product of linear factors in F_7[X]. Furthermore m_1(X) = X − 3, m_2(X) = X − 2, m_3(X) = X − 6, and so on. But 5 is also an element of order 6 in F_7^∗. The choice α = 5 would give m_1(X) = X − 5, m_2(X) = X − 4, and so on.

Example 7.2.28 There are exactly two irreducible polynomials of degree 3 in F_2[X]. They are factors of 1 + X^7:

1 + X^7 = (1 + X)(1 + X + X^3)(1 + X^2 + X^3).

Let α ∈ F_8 be a zero of 1 + X + X^3. Then α is a primitive element of F_8, and α^2 and α^4 are the remaining zeros of 1 + X + X^3. The reciprocal of 1 + X + X^3 is X^3(1 + X^{−1} + X^{−3}) = 1 + X^2 + X^3 and has α^{−1} = α^6, α^{−2} = α^5 and α^{−4} = α^3 as zeros. So m_1(X) = 1 + X + X^3 and m_3(X) = 1 + X^2 + X^3.

Proposition 7.2.29 The monic reciprocal of m_i(X) is equal to m_{−i}(X).

Proof. The element α^i is a zero of m_i(X). So α^{−i} is a zero of the monic reciprocal of m_i(X) by Remark 7.1.31. Hence the degree of the monic reciprocal of m_i(X), which equals deg m_i(X), is at least deg m_{−i}(X). So deg m_i(X) ≥ deg m_{−i}(X). Similarly deg m_i(X) ≤ deg m_{−i}(X). So deg m_i(X) = deg m_{−i}(X). Hence the monic reciprocal of m_i(X) is a monic polynomial of minimal degree having α^{−i} as a zero; therefore it is equal to m_{−i}(X).

7.2.3 Cyclotomic polynomials and cosets

***

Definition 7.2.30 Let n be a nonnegative integer. Then Euler's function ϕ is given by

ϕ(n) = |{ i : gcd(i, n) = 1, 0 ≤ i < n }|.

Lemma 7.2.31 The following properties of Euler's function hold:
1) ϕ(mn) = ϕ(m)ϕ(n) if gcd(m, n) = 1.
2) ϕ(1) = 1.
3) ϕ(p) = p − 1 if p is a prime number.
4) ϕ(p^e) = p^{e−1}(p − 1) if p is a prime number.
Proof. The set { i : gcd(i, n) = 1, 0 ≤ i < n } is a set of representatives of Z_n^∗, the set of all invertible elements of Z_n. Hence ϕ(n) = |Z_n^∗|. The Chinese remainder theorem gives that Z_m ⊕ Z_n ≅ Z_{mn} if gcd(m, n) = 1. Hence Z_m^∗ ⊕ Z_n^∗ ≅ Z_{mn}^∗. Therefore ϕ(mn) = ϕ(m)ϕ(n) if gcd(m, n) = 1. The remaining items are left as an exercise.

Proposition 7.2.32 Let p_1, . . . , p_k be the primes dividing n. Then

ϕ(n) = n (1 − 1/p_1) · · · (1 − 1/p_k).

Proof. This is a direct consequence of Lemma 7.2.31.

Definition 7.2.33 Let F be a field. Let n be a positive integer that is relatively prime to the characteristic of F. Let α be an element of order n in F̄^∗. The cyclotomic polynomial of order n is defined by

Φ_n(X) = ∏_{gcd(i,n)=1, 0≤i<n} (X − α^i).

Remark 7.2.34 The degree of Φ_n(X) is equal to ϕ(n).

Remark 7.2.35 If x is a primitive element, then y is a primitive element if and only if y = x^i for some i such that 1 ≤ i < q − 1 and gcd(i, q − 1) = 1. Hence the number of primitive elements of F_q^∗ is equal to ϕ(q − 1), where ϕ is Euler's function.

Theorem 7.2.36 Let F be a field. Let n be a positive integer that is relatively prime to the characteristic of F. The polynomial Φ_n(X) is in F[X], has as zeros all elements of F̄^∗ of order n, and has degree ϕ(n), where ϕ is Euler's function. Furthermore

X^n − 1 = ∏_{d|n} Φ_d(X).

Proof. The degree of Φ_n(X) is equal to the number of i such that 0 ≤ i < n and gcd(i, n) = 1, which is by definition equal to ϕ(n). The power α^i has order n if and only if gcd(i, n) = 1. Conversely, if β is an element of order n in F̄^∗, then β is a zero of X^n − 1 and X^n − 1 = ∏_{0≤i<n} (X − α^i). So β = α^i for some i with 0 ≤ i < n and gcd(i, n) = 1. Hence Φ_n(X) has as zeros all elements of F̄^∗ of order n. Therefore, grouping the zeros α^i according to their order d, which divides n,

X^n − 1 = ∏_{0≤i<n} (X − α^i) = ∏_{d|n} ∏_{gcd(i,d)=1, 0≤i<d} (X − α_d^i) = ∏_{d|n} Φ_d(X),

where α_d denotes an element of order d. The fact that Φ_n(X) has coefficients in F is shown by induction on n. Now Φ_1(X) = X − 1 is in F[X]. Suppose that Φ_m(X) is in F[X] for all m < n. Then f(X) = ∏_{d|n, d≠n} Φ_d(X) is in F[X], and X^n − 1 = f(X)Φ_n(X). So X^n − 1 is divisible by f(X) in F̄[X], and X^n − 1 and f(X) are in F[X]. Hence Φ_n(X) is in F[X].
Remark 7.2.37 The factorization of X^n − 1 into cyclotomic polynomials gives a way to compute the Φ_n(X) recursively.

Remark 7.2.38 The cyclotomic polynomial Φ_n(X) depends on the field F in Definition 7.2.33. But Φ_n(X) is universal in the sense that in characteristic zero it has integer coefficients and they do not depend on the field F. By reducing the coefficients of this polynomial modulo a prime p one gets the cyclotomic polynomial over any field of characteristic p. In characteristic zero Φ_n(X) is irreducible in Q[X] for all n. But Φ_n(X) is sometimes reducible in F_p[X].

Example 7.2.39 The polynomials Φ_1(X) = X − 1 and Φ_2(X) = X + 1 are irreducible in any characteristic, and X^2 − 1 = Φ_1(X)Φ_2(X). Now X^3 − 1 = Φ_1(X)Φ_3(X). Hence Φ_3(X) = X^2 + X + 1, and this polynomial is irreducible in F_p[X] if and only if F_p^∗ has no element of order 3, if and only if p ≡ 2 mod 3. Furthermore X^4 − 1 = Φ_1(X)Φ_2(X)Φ_4(X). So Φ_4(X) = X^2 + 1, and this polynomial is irreducible in F_p[X] if and only if p ≡ 3 mod 4.

Proposition 7.2.40 Let f(X) be a polynomial with coefficients in F_q. If β is a zero of f(X), then β^q is also a zero of f(X).

Proof. Let f(X) = f_0 + f_1X + · · · + f_mX^m ∈ F_q[X]. Then f_i^q = f_i for all i. If β is a zero of f(X), then f(β) = 0. So

0 = f(β)^q = (f_0 + f_1β + · · · + f_mβ^m)^q = f_0^q + f_1^qβ^q + · · · + f_m^qβ^{qm} = f_0 + f_1β^q + · · · + f_m(β^q)^m = f(β^q).

Hence β^q is a zero of f(X).

Remark 7.2.41 In particular we have that m_i(X) = m_{qi}(X). Let g(X) be a generator polynomial of a cyclic code over F_q. If α^i is a zero of g(X), then α^{qi} is also a zero of g(X).

Definition 7.2.42 The cyclotomic coset C_q(I) of a subset I of Z_n with respect to q is the subset of Z_n defined by

C_q(I) = { iq^j | i ∈ I, j ∈ N_0 }.

If I = {i}, then C_q(I) is denoted by C_q(i).

Proposition 7.2.43 The cyclotomic cosets C_q(i) give a partition of Z_n for a given q such that gcd(q, n) = 1.

Proof. Every i ∈ Z_n is in the cyclotomic coset C_q(i). Suppose that C_q(i) and C_q(j) have an element in common. Then iq^k = jq^l for some k, l ∈ N_0. We may assume that k ≤ l; then i = jq^{l−k} and l − k ∈ N_0. So iq^m = jq^{l−k+m} for all m ∈ N_0. Hence C_q(i) is contained in C_q(j). Now n and q are relatively prime, so q is invertible in Z_n and q^e ≡ 1 (mod n) for some positive integer e. So jq^m = iq^{(e−1)(l−k)+m} for all m ∈ N_0. Hence C_q(j) is contained in C_q(i). Therefore we have shown that two cyclotomic cosets are equal or disjoint.
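Cyclotomic cosets are easy to enumerate, and by Proposition 7.2.44 below the size of C_q(i) is the degree of m_i(X), so the coset partition already predicts how X^n − 1 factors over F_q. A Python sketch (ours):

def cyclotomic_coset(q, n, i):
    """Cyclotomic coset C_q(i) in Z_n (requires gcd(q, n) = 1)."""
    coset, j = set(), i % n
    while j not in coset:
        coset.add(j)
        j = (j * q) % n
    return sorted(coset)

def coset_partition(q, n):
    """All cyclotomic cosets, as a partition of Z_n."""
    seen, cosets = set(), []
    for i in range(n):
        if i not in seen:
            c = cyclotomic_coset(q, n, i)
            cosets.append(c)
            seen.update(c)
    return cosets

print(cyclotomic_coset(3, 11, 1))   # [1, 3, 4, 5, 9]  (Example 7.2.46)
print(coset_partition(2, 7))        # [[0], [1, 2, 4], [3, 5, 6]]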
Proposition 7.2.44

m_i(X) = ∏_{j∈C_q(i)} (X − α^j).

Proof. If j ∈ C_q(i), then m_i(α^j) = 0 by Proposition 7.2.40. Hence the product ∏_{j∈C_q(i)} (X − α^j) divides m_i(X). Now raising to the q-th power gives a permutation of the zeros α^j with j ∈ C_q(i). The coefficients of the product of the linear factors X − α^j are symmetric functions in the α^j and therefore invariant under raising to the q-th power. Hence these coefficients are elements of F_q, and this product is an element of F_q[X] that has α^i as a zero. Therefore equality holds by the minimality of m_i(X).

Proposition 7.2.45 Let n be a positive integer such that gcd(n, q) = 1. Then the number of choices of an element of order n in an extension of F_q is equal to ϕ(n). The possible choices of the minimal polynomial m_1(X) correspond to the monic irreducible factors of Φ_n(X), and the number of these choices is ϕ(n)/d, where d = |C_q(1)|.

Proof. The number of choices of an element of order n in an extension of F_q is ϕ(n) by Theorem 7.2.36. Let i ∈ Z_n with gcd(i, n) = 1. Consider the map C_q(1) → C_q(i) defined by j → ij. This map is well defined and has an inverse, since i is invertible in Z_n. So |C_q(1)| = |C_q(i)|, and the set of elements i in Z_n such that gcd(i, n) = 1 is partitioned into cyclotomic cosets of the same size d by Proposition 7.2.43. Every choice of such a coset corresponds to a choice of m_1(X), which is an irreducible monic factor of Φ_n(X). Hence the number of possible minimal polynomials m_1(X) is ϕ(n)/d.

Example 7.2.46 Let n = 11 and q = 3. Then ϕ(11) = 10. Consider the sequence starting with 1 and obtained by multiplying repeatedly by 3 modulo 11:

1, 3, 9, 27 ≡ 5, 15 ≡ 4, 12 ≡ 1.

So C_3(1) = {1, 3, 4, 5, 9} consists of 5 elements. Hence Φ_11(X) has two irreducible factors in F_3[X], given by:

Φ_11(X) = (X^11 − 1)/(X − 1) = (−1 + X^2 − X^3 + X^4 + X^5)(−1 − X + X^2 − X^3 + X^5).

Example 7.2.47 Let n = 23 and q = 2. Then ϕ(23) = 22. Consider the sequence starting with 1 and obtained by multiplying repeatedly by 2 modulo 23:

1, 2, 4, 8, 16, 32 ≡ 9, 18, 36 ≡ 13, 26 ≡ 3, 6, 12, 24 ≡ 1.

So C_2(1) = {1, 2, 3, 4, 6, 8, 9, 12, 13, 16, 18} consists of 11 elements. Hence Φ_23(X) = (X^23 − 1)/(X − 1) is the product of two irreducible factors in F_2[X], given by:

(1 + X^2 + X^4 + X^5 + X^6 + X^{10} + X^{11})(1 + X + X^5 + X^6 + X^7 + X^9 + X^{11}).
Proposition 7.2.48 Let i and j be integers such that 0 < i, j < n. Suppose ij ≡ 1 mod n. Then m_i(X) = gcd(m_1(X^j), X^n − 1).

Proof. Let β be a zero of gcd(m_1(X^j), X^n − 1). Then β is a zero of m_1(X^j) and of X^n − 1. So β = α^l for some l and m_1(α^{jl}) = 0. Hence jl ∈ C_q(1) by Proposition 7.2.44. So jl ≡ q^m mod n for some m. Hence l ≡ ijl ≡ iq^m mod n. Therefore l ∈ C_q(i) and β is a zero of m_i(X). Similarly, if β is a zero of m_i(X), then β is a zero of gcd(m_1(X^j), X^n − 1). Both polynomials are monic and have the same zeros, and all zeros are simple by Remark 7.2.23. Therefore the polynomials are equal.

Proposition 7.2.49 Let gcd(i, n) = d and j = i/d. Let α be an element of order n in F_{q^e}^∗ and β = α^d. Let m_i(X) be the minimal polynomial of α^i and n_j(X) the minimal polynomial of β^j. Then β is an element of order n/d in F_{q^e}^∗ and m_i(X) = n_j(X).

Proof. The map jq^m → jdq^m gives a well-defined one-to-one correspondence between the elements of D, the cyclotomic coset of j modulo n/d, and the elements of C, the cyclotomic coset of i modulo n. Hence

m_i(X) = ∏_{k∈C} (X − α^k) = ∏_{l∈D} (X − β^l) = n_j(X)

by Proposition 7.2.44.

Example 7.2.50 Let α be an element of order 8 in an extension of F_3. Let m_1(X) be the minimal polynomial of α in F_3[X]. Then m_1(X) divides X^8 − 1. But X^8 − 1 = (X^4 − 1)(X^4 + 1), and the zeros of X^4 − 1 have order at most 4. The factorization of X^4 − 1 is given by

X^4 − 1 = (X − 1)(X + 1)(X^2 + 1),

with m_0(X) = X − 1 and m_4(X) = X + 1, since α^4 = −1. The cyclotomic coset of 2 is C_3(2) = {2, 6}, and α^2 and α^6 are the elements of order 4 in F_9^∗. So m_2(X) = m_6(X) = Φ_4(X) = X^2 + 1. This confirms Proposition 7.2.49 with i = d = 2 and j = 1. Now C_3(1) = {1, 3} and C_3(5) = {5, 7}. So m_1(X) = m_3(X) and m_5(X) = m_7(X). Notice that −1 ≡ 7 mod 8, and m_7(X) is the monic reciprocal of m_1(X) by Proposition 7.2.29. The degree of m_1(X) is 2. Suppose m_1(X) = a_0 + a_1X + X^2. Then m_7(X) = a_0^{−1} + a_0^{−1}a_1X + X^2. The polynomials m_1(X) and m_7(X) divide X^4 + 1. Hence

Φ_8(X) = X^4 + 1 = m_1(X)m_7(X) = (a_0 + a_1X + X^2)(a_0^{−1} + a_0^{−1}a_1X + X^2).

Expanding the right hand side and comparing coefficients gives that a_0 = −1, and a_1 = 1 or a_1 = −1. Hence there are two possible choices for m_1(X). Choose m_1(X) = X^2 + X − 1. So X^2 − X − 1 is the alternative choice for m_1(X). Furthermore α^2 = −α + 1, and m_5(X) = m_7(X) = (X − α^5)(X − α^7) by Proposition 7.2.44. An application of Proposition 7.2.48 with i = j = 5 gives a third way to compute m_5(X), since 5 · 5 ≡ 1 mod 8: m_1(X^5) = X^{10} + X^5 − 1 and

gcd(X^{10} + X^5 − 1, X^8 − 1) = X^2 − X − 1.
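Example 7.2.50 can be checked mechanically: represent F_9 as F_3[x]/⟨x^2 + x − 1⟩, so that α = x satisfies α^2 = −α + 1, and expand the product of Proposition 7.2.44 over each coset. A Python sketch (ours, reusing field_mul and cyclotomic_coset from above):

f9 = [2, 1, 1]                              # x^2 + x - 1 over F_3: x^2 = -x + 1

def f9_mul(a, b):
    return field_mul(a, b, f9, 3)

def minimal_poly(i, n=8):
    """m_i(X) = product over j in C_3(i) of (X - alpha^j), alpha = x in F_9."""
    m = [[1, 0]]                            # the constant polynomial 1, F_9 coefficients
    for j in cyclotomic_coset(3, n, i):
        a = [1, 0]                          # alpha^j by repeated multiplication
        for _ in range(j):
            a = f9_mul(a, [0, 1])
        neg = [(-u) % 3 for u in a]         # -alpha^j
        shifted = [[0, 0]] + m              # X * m(X)
        scaled = [f9_mul(neg, c) for c in m] + [[0, 0]]
        m = [[(u + v) % 3 for u, v in zip(s, t)] for s, t in zip(shifted, scaled)]
    return m

print(minimal_poly(1))   # [[2, 0], [1, 0], [1, 0]]: X^2 + X - 1
print(minimal_poly(5))   # [[2, 0], [2, 0], [1, 0]]: X^2 - X - 1

All coefficients come out in F_3 (second coordinate zero), as Proposition 7.2.44 predicts.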
7.2.4 Zeros of the generator polynomial

We have seen in Proposition 7.1.15 that the generator polynomial divides X^n − 1, so its zeros are n-th roots of unity if n is not divisible by the characteristic of F_q. Instead of describing a cyclic code by its generator polynomial g(X), one can describe the code alternatively by the set of zeros of g(X) in an extension of F_q.

Definition 7.2.51 Fix an element α of order n in an extension F_{q^e} of F_q. A subset I of Z_n is called a defining set of a cyclic code C if

C = { c(x) ∈ C_{q,n} : c(α^i) = 0 for all i ∈ I }.

The root set, the set of zeros or the complete defining set Z(C) of C is defined as

Z(C) = { i ∈ Z_n : c(α^i) = 0 for all c(x) ∈ C }.

Proposition 7.2.52 The relation between the generator polynomial g(X) of a cyclic code C and the set of zeros Z(C) is given by

g(X) = ∏_{i∈Z(C)} (X − α^i).

The dimension of C is equal to n − |Z(C)|.

Proof. The generator polynomial g(X) divides X^n − 1 by Proposition 7.1.15. The polynomial X^n − 1 has no multiple zeros by Remark 7.2.23, since n and q are relatively prime. So every zero of g(X) is of the form α^i for some i ∈ Z_n and has multiplicity one. Let Z(g) = { i ∈ Z_n | g(α^i) = 0 }. Then g(X) = ∏_{i∈Z(g)} (X − α^i). Let c(x) ∈ C. Then c(x) = a(x)g(x), so c(α^i) = 0 for all i ∈ Z(g). So Z(g) ⊆ Z(C). Conversely, g(x) ∈ C, so g(α^i) = 0 for all i ∈ Z(C). Hence Z(C) ⊆ Z(g). Therefore Z(g) = Z(C) and g(X) is the product of the linear factors as claimed. Furthermore the degree of g(X) is equal to |Z(C)|. Hence the dimension of C is equal to n − |Z(C)| by Proposition 7.1.21.

Proposition 7.2.53 The complete defining set of a cyclic code is a disjoint union of cyclotomic cosets.

Proof. Let g(X) be the generator polynomial of a cyclic code C. Then g(α^i) = 0 if and only if i ∈ Z(C) by Proposition 7.2.52. If α^i is a zero of g(X), then α^{iq} is a zero of g(X) by Remark 7.2.41. So C_q(i) is contained in Z(C) if i ∈ Z(C). Therefore Z(C) is a union of cyclotomic cosets. This union is a disjoint union by Proposition 7.2.43.

Example 7.2.54 Consider the binary cyclic code C of length 7 with defining set {1}. Then Z(C) = {1, 2, 4} and m_1(X) = 1 + X + X^3 is the generator polynomial of C. Hence C is the cyclic Hamming code of Example 7.1.19. The cyclic code with defining set {3} has generator polynomial m_3(X) = 1 + X^2 + X^3 and complete defining set {3, 5, 6}.
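Given a defining set, the complete defining set and the dimension follow from the cyclotomic cosets alone. A Python sketch (ours, reusing cyclotomic_coset) verifying Example 7.2.54:

def complete_defining_set(I, q, n):
    """Complete defining set: the union of the cyclotomic cosets of I."""
    Z = set()
    for i in I:
        Z.update(cyclotomic_coset(q, n, i))
    return sorted(Z)

Z = complete_defining_set({1}, 2, 7)
print(Z, "dimension", 7 - len(Z))         # [1, 2, 4] dimension 4
print(complete_defining_set({3}, 2, 7))   # [3, 5, 6]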
Remark 7.2.55 If a cyclic code is given by its zero set, then this definition depends on the choice of an element of order n. Consider Example 7.2.54. If we would have taken α^3 as element of order 7, then the generator polynomial of the binary cyclic code with defining set {1} would have been 1 + X^2 + X^3 instead of 1 + X + X^3.

Example 7.2.56 Consider the [6,3] cyclic code over F_7 of Example 7.1.24 with the generator polynomial g(X) = 6 + X + 3X^2 + X^3. Then X^3 + 3X^2 + X + 6 = (X − 2)(X − 3)(X − 6). So 2, 3 and 6 are the zeros of the generator polynomial. Choose α = 3 as the primitive element in F_7 of order 6 as in Example 7.2.27. Then α, α^2 and α^3 are the zeros of g(X).

Example 7.2.57 Let α be an element of F_9 such that α^2 = −α + 1 as in Example 7.2.50. Then 1, α and α^3 are the zeros of the ternary cyclic code of length 8 with generator polynomial 1 + X + X^3 of Example 7.1.16, since X^3 + X + 1 = (X^2 + X − 1)(X − 1) = m_1(X)m_0(X).

Proposition 7.2.58 Let C be a cyclic code of length n. Then
Z(C^⊥) = Z_n \ { −i | i ∈ Z(C) }.

Proof. The power α^i is a zero of g(X) if and only if i ∈ Z(C) by Proposition 7.2.52. And h(α^i) = 0 if and only if g(α^i) ≠ 0, since h(X) = (X^n − 1)/g(X) and all zeros of X^n − 1 are simple by Remark 7.2.23. Furthermore g^⊥(X) is the monic reciprocal of h(X) by Proposition 7.1.37. Finally g^⊥(α^{−i}) = 0 if and only if h(α^i) = 0 by Remark 7.1.30.

Example 7.2.59 Consider the [6,3] cyclic code C over F_7 of Example 7.2.56. Then α, α^2 and α^3 are the zeros of g(X). Hence Z(C) = {1, 2, 3} and
Z(C^⊥) = Z_6 \ {−1, −2, −3} = {0, 1, 2}
by Proposition 7.2.58. Therefore g^⊥(X) = (X − 1)(X − α)(X − α^2) = (X − 1)(X − 3)(X − 2) = X^3 + X^2 + 4X + 1. This is in agreement with Example 7.1.38.

Example 7.2.60 Let C be the ternary cyclic code of length 8 with the generator polynomial g(X) = 1 + X + X^3 of Example 7.1.16. Then g(X) = m_0(X)m_1(X) and Z(C) = {0, 1, 3} by Example 7.2.57. Hence Z(C^⊥) = Z_8 \ {0, −1, −3} = {1, 2, 3, 4, 6}.

Proposition 7.2.61 The number of cyclic codes of length n over F_q is equal to 2^N, where N is the number of cyclotomic cosets modulo n with respect to q.

Proof. A cyclic code C of length n over F_q is uniquely determined by its set of zeros Z(C) by Proposition 7.2.52. The set of zeros is a disjoint union of cyclotomic cosets modulo n with respect to q by Proposition 7.2.53. Hence a cyclic code is uniquely determined by a choice of a subset of all N cyclotomic cosets. There are 2^N such subsets.
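Proposition 7.2.61 can be checked in GAP with the GUAVA function CyclotomicCosets mentioned in Exercise 7.2.14; we assume here that its argument order is (q, n), as suggested in that exercise.

LoadPackage("guava");;
N := Length(CyclotomicCosets(2, 7));
# 3
2^N;
# 8, the number of binary cyclic codes of length 7, as in Example 7.2.62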
Example 7.2.62 There are 3 cyclotomic cosets modulo 7 with respect to 2. Hence there are 8 binary cyclic codes of length 7, with generator polynomials
1, m_0, m_1, m_3, m_0m_1, m_0m_3, m_1m_3, m_0m_1m_3.

7.2.5 Exercises

7.2.1 Show that f(X) = 1 + X^2 + X^5 is irreducible in F_2[X]. Give a principal representation of the product of β = 1 + x + x^4 and γ = 1 + x^3 + x^4 in the factor ring F_2[X]/⟨f(X)⟩ by division by f(X) with remainder. Give a table of the principal representation and the Zech logarithm of the powers x^i for i in {∗, 0, 1, . . . , 30}. Compute the addition of β and γ by means of the exponential representation.

7.2.2 What is the smallest field extension of F_q that has an element of order 37 in case q = 2, 3 and 5? Show that the degree of the extension is always a divisor of 36 for any prime power q not divisible by 37.

7.2.3 Determine Φ_6(X) in characteristic zero. Let p be an odd prime. Show that Φ_6(X) is irreducible in F_p[X] if and only if p ≡ 2 mod 3.

7.2.4 Let α be an element of order 8 in an extension of F_5. Give all possible choices of the minimal polynomial m_1(X). Compute the coefficients of m_i(X) for all i between 0 and 7.

7.2.5 Let α be an element of order n in F_{q^e}^*. Let m_1(X) be the minimal polynomial of α over F_q. Estimate the total number of arithmetical operations in F_q to compute the minimal polynomial m_i(X) by means of Proposition 7.2.44 if gcd(i, n) = 1, as a function of n and e. Compare this complexity with the computation by means of Proposition 7.2.48.

7.2.6 Let C be a cyclic code of length 7 over F_q. Show that {1, 2, 4} is a complete defining set if q is even.

7.2.7 Compute the zeros of the code of Example 7.1.5.

7.2.8 Show that α = 5 is an element of order 6 in F_7^*. Give the coefficients of the generator polynomial of the cyclic [6,3] code over F_7 with α, α^2 and α^3 as zeros.

7.2.9 Consider the binary cyclic code C of length 31 and generator polynomial 1 + X^2 + X^5 of Exercise 7.1.6. Let α be a zero of this polynomial. Then α has order 31 by Exercise 7.2.1.
1) Determine the coefficients of m_1(X), m_3(X) and m_5(X) with respect to α.
2) Determine Z(C) and Z(C^⊥).

7.2.10 Let C be a cyclic code over F_5 with m_1(X)m_2(X) as generator polynomial. Determine Z(C) and Z(C^⊥).

7.2.11 What is the number of ternary cyclic codes of length 8?

7.2.12 [CAS] Using the function MoebiusMu available in GAP and Magma, write a program that computes the number of irreducible polynomials of given degree as per Proposition 7.2.19. Check your result with the use of the function IrreduciblePolynomialsNr in GAP.
7.2.13 [CAS] Take the field GF(2^10) and its primitive element a. Compute the Zech logarithm of a^100 with respect to a using the command ZechLog, both in GAP and Magma.

7.2.14 [CAS] Using the function CyclotomicCosets in GAP/GUAVA, write a function that takes as input the code length n, the field size q and a list of integers L, and computes the dimension of a q-ary cyclic code with defining set L, that is, with zeros {a^i | i ∈ L} for some primitive n-th root of unity a (a predefined one in GAP is fine).

7.3 Bounds on the minimum distance

*** *** The BCH bound is a lower bound for the minimum distance of a cyclic code. Although this bound is tight in many cases, it is not always the true minimum distance. In this section several improved lower bounds are given, but not one of them gives the true minimum distance in all cases. In fact computing the true minimum distance of a cyclic code is a hard problem.

7.3.1 BCH bound

Definition 7.3.1 Let C be an F_q-linear code. Let C̃ be an F_{q^m}-linear code in F_{q^m}^n. If C ⊆ C̃ ∩ F_q^n, then C is called a subfield subcode of C̃, and C̃ is called a super code of C. If equality holds, then C is called the restriction (by scalars) of C̃.

Remark 7.3.2 Let I be a defining set for the cyclic code C. Then
c(α^i) = c_0 + c_1α^i + · · · + c_jα^{ij} + · · · + c_{n−1}α^{i(n−1)} = 0 for all i ∈ I.
Let l = |I|. Let H̃ be the l × n matrix with entries
( α^{ij} | i ∈ I, j = 0, 1, . . . , n − 1 ).
Let C̃ be the F_{q^m}-linear code with H̃ as parity check matrix. Then C is a subfield subcode of C̃, and it is in fact its restriction by scalars. Any lower bound on the minimum distance of C̃ holds a fortiori for C. This remark will be used in the following proposition on the BCH (Bose-Chaudhuri-Hocquenghem) bound on the minimum distance for cyclic codes.

Proposition 7.3.3 Let C be a cyclic code that has at least δ − 1 consecutive elements in Z(C). Then the minimum distance of C is at least δ.

Proof. The complete defining set of C contains {i : b ≤ i ≤ b + δ − 2} for a certain b. We have seen in Remark 7.3.2 that ( α^{ij} | b ≤ i ≤ b + δ − 2, 0 ≤ j < n ) is a parity check matrix of a code C̃ over F_{q^m} that has C as a subfield subcode. The j-th column has entries α^{bj}α^{ij}, 0 ≤ i ≤ δ − 2. The code C̃ is generalized equivalent with the code with parity check matrix H̃ = ( α^{ij} ), 0 ≤ i ≤ δ − 2, 0 ≤ j < n, by the linear isometry that divides the j-th coordinate by α^{bj} for 0 ≤ j < n. Let x_j = α^{j−1} for 1 ≤ j ≤ n. Then H̃ = ( x_j^i | 0 ≤ i ≤ δ − 2, 1 ≤ j ≤ n ) is a
generator matrix of an MDS code over F_{q^m} with parameters [n, δ − 1, n − δ + 2] by Proposition 3.2.10. So H̃ is a parity check matrix of a code with parameters [n, n − δ + 1, δ] by Proposition 3.2.7. Hence the minimum distance of C̃, and therefore of C, is at least δ.

Definition 7.3.4 A cyclic code with defining set {b, b+1, . . . , b+δ−2} is called a BCH code with designed minimum distance δ. The BCH code is called narrow sense if b = 1, and it is called primitive if n = q^m − 1.

Example 7.3.5 Consider the binary cyclic Hamming code C of length 7 of Example 7.2.28. The complete defining set of C is {1, 2, 4} and contains two consecutive elements. So 3 is a lower bound for the minimum distance. This is equal to the minimum distance. Let D be the binary cyclic code of length 7 with defining set {0, 3}. Then the complete defining set of D is {0, 3, 5, 6}. So 5, 6, 7 are three consecutive elements in Z(D), since 7 ≡ 0 mod 7. So 4 is a lower bound for the minimum distance of D. In fact equality holds, since D is the dual of C, that is the [7, 3, 4] binary simplex code.

Example 7.3.6 Consider the [6,3] cyclic code C over F_7 of Example 7.2.56. The zeros of the generator polynomial are α, α^2 and α^3. So Z(C) = {1, 2, 3} and d(C) ≥ 4. Now g(x) = 6 + x + 3x^2 + x^3 is a codeword of weight 4. Hence the minimum distance is 4.

Remark 7.3.7 If α and β are both elements of order n, then there exist r, s in Z_n such that β = α^r and α = β^s and rs ≡ 1 mod n by Theorem 7.2.36. If C is the cyclic code with defining set I with respect to α, and D is the cyclic code with defining set I with respect to β, then C and D are equivalent under the permutation σ of Z_n such that σ(0) = 0 and σ(i) = ir for i = 1, . . . , n − 1. Hence a cyclic code that is given by its zero set is defined up to equivalence by the choice of an element of order n.

Example 7.3.8 Consider the binary cyclic code C_1 of length 17 with m_1(X) as generator polynomial. Then
Z_1 = {−, 1, 2, −, 4, −, −, −, 8, 9, −, −, −, 13, −, 15, 16}
is the complete defining set of C_1, where the spacing − indicates a gap. The BCH bound gives 3 as a lower bound for the minimum distance of C_1. The code C_3 with generator polynomial m_3(X) has complete defining set
Z_3 = {−, −, −, 3, −, 5, 6, 7, −, −, 10, 11, 12, −, 14, −, −}.
Hence 4 is a lower bound for d(C_3). The cyclic codes of length 17 with generator polynomial m_i(X) are pairwise equivalent for i ≠ 0, by Remark 7.3.7, since the order of α^i is 17. Hence d(C_1) = d(C_3) ≥ 4. In Example 7.4.2 it will be shown that the minimum distance is in fact 5.

The following definition does not depend on the choice of an element of order n.
Definition 7.3.9 A subset of Z_n of the form {b + ia | 0 ≤ i ≤ δ − 2} for some integers a, b and δ with gcd(a, n) = 1 and δ ≤ n + 1 is called consecutive of period a. Let I be a subset of Z_n. The number δ_BCH(I) is the largest integer δ ≤ n + 1 such that there is a consecutive subset of I consisting of δ − 1 elements. Let C be a cyclic code of length n. Then δ_BCH(Z(C)) is denoted by δ_BCH(C).

Theorem 7.3.10 The minimum distance of C is at least δ_BCH(C).

Proof. Let α be an element of order n in an extension of F_q. Suppose that the complete defining set of C with respect to α contains the set {b + ia | 0 ≤ i ≤ δ − 2} of δ − 1 elements for some integers a and b with gcd(a, n) = 1. Let β = α^a. Then β is an element of order n and there is an element c ∈ Z_n such that ac ≡ 1 mod n, since gcd(a, n) = 1. Hence {bc + i | 0 ≤ i ≤ δ − 2} is a defining set of C with respect to β containing δ − 1 consecutive elements. Hence the minimum distance of C is at least δ by Proposition 7.3.3.

Remark 7.3.11 One easily sees a consecutive set of period one in Z(C) by writing the elements in increasing order and the gaps by a spacing, as done in Example 7.3.8. Suppose gcd(a, n) = 1. Then there exists a b such that ab ≡ 1 mod n. A consecutive set of period a is seen by considering b·Z(C) and its consecutive sets of period 1. In this way one has to inspect ϕ(n) complete defining sets for their consecutive sets of period 1. The complexity of this computation is at most ϕ(n)|Z(C)| in the worst case. But quite often it is much less, in case b·Z(C) = Z(C).

Example 7.3.12 This is a continuation of Example 7.3.8. The complete defining set Z_3 of the code C_3 has {5, 6, 7} as largest consecutive subset of period 1 in Z_17. Now 3 · 6 ≡ 1 mod 17 and 6 · {5, 6, 7} = {13, 2, 8} is a consecutive subset of period 6 in Z_17 of three elements contained in the complete defining set Z_1 of the code C_1. Now b·Z_1 is equal to Z_1 or Z_3 for all 0 < b < 17. Hence δ_BCH(C_1) = δ_BCH(C_3) = 4.

Example 7.3.13 Consider the binary BCH code C_b of length 15 with defining set {b, b + 1, b + 2, b + 3} for some b. So its designed distance is 5. Take α in F_16^* with α^4 = 1 + α as primitive element. Then m_0(X) = 1 + X, m_1(X) = 1 + X + X^4, m_3(X) = 1 + X + X^2 + X^3 + X^4 and m_5(X) = 1 + X + X^2.
If b = 1, then the complete defining set is {1, 2, 3, 4, 6, 8, 9, 12}, so δ_BCH(C_1) = 5. The generator polynomial is g_1(X) = m_1(X)m_3(X) = 1 + X^4 + X^6 + X^7 + X^8, as is shown in Example 7.1.39, and has weight 5. So the minimum distance of C_1 is 5.
If b = 0, then δ_BCH(C_0) = 6. The generator polynomial is g_0(X) = m_0(X)m_1(X)m_3(X) = 1 + X + X^4 + X^5 + X^6 + X^9 and has weight 6. So the minimum distance of C_0 is 6.
If b = 2 or b = 3, then δ_BCH(C_2) = 7. The generator polynomial g_2(X) is equal to 1 + X + X^2 + X^4 + X^5 + X^8 + X^{10} and has weight 7. So the minimum distance of C_2 is 7.
If b = 4 or b = 5, then δ_BCH(C_4) = 15. The generator polynomial g_4(X) is equal to 1 + X + X^2 + · · · + X^{12} + X^{13} + X^{14} and has weight 15. So the minimum distance of C_4 is 15.
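The procedure of Remark 7.3.11 is easily automated. The following GAP sketch (all names are ours) computes δ_BCH(I) by inspecting b·I for every b prime to n and taking the longest run of consecutive elements; on the set Z_1 of Example 7.3.8 it returns 4, in accordance with Example 7.3.12.

LongestRun := function(n, I)
  local best, s, len;
  best := 0;
  for s in [0..n-1] do
    len := 0;
    while len < n and ((s + len) mod n) in I do
      len := len + 1;
    od;
    best := Maximum(best, len);
  od;
  return best;
end;;

DeltaBCH := function(n, I)
  local best, b, J;
  best := 0;
  for b in Filtered([1..n-1], x -> Gcd(x, n) = 1) do
    J := Set(List(I, i -> (b*i) mod n));
    best := Maximum(best, LongestRun(n, J));
  od;
  return best + 1;
end;;

DeltaBCH(17, [1,2,4,8,9,13,15,16]);   # the set Z_1 of Example 7.3.8
# 4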
Example 7.3.14 Consider the primitive narrow sense BCH code of length 15 over F_16 with designed distance 5. So the defining set is {1, 2, 3, 4}. Then this is also the complete defining set. Take α with α^4 = 1 + α as primitive element. Then the generator polynomial is given by
(X − α)(X − α^2)(X − α^3)(X − α^4) = α^{10} + α^3X + α^6X^2 + α^{13}X^3 + X^4.
In all these cases of the previous two examples the minimum distance is equal to the BCH bound and equal to the weight of the generator polynomial. This is not always the case, as one sees in Exercise 7.3.8.

Example 7.3.15 Although BCH codes are a special case of codes defined through roots as in Section 7.2.4, GAP and Magma have special functions for constructing these. In GAP/GUAVA we proceed as follows.

C:=BCHCode(17,3,GF(2));
a cyclic [17,9,3..5]3..4 BCH code, delta=3, b=1 over GF(2)
DesignedDistance(C);
3
MinimumDistance(C);
5

Syntax is BCHCode(n,delta,F), where n is the length, delta is δ in Definition 7.3.4, and F is the ground field. So here we constructed the narrow-sense BCH code. One can give the parameter b explicitly, by the command BCHCode(n,b,delta,F). The designed distance for a BCH code is printed in its description, but can also be called separately as above. Note that the code C coincides with the code CR from Example 12.5.17. In Magma we proceed as follows.

C:=BCHCode(GF(2),17,3);
// [17, 9, 5] BCH code (d = 3, b = 1)
// Linear Code over GF(2)
a:=RootOfUnity(17,GF(2));
C:=CyclicCode(17,[a^3],GF(2));
BCHBound(C);
4

We can also provide b, giving it as the last parameter in the BCHCode command. Note that there is the possibility in Magma to compute the BCH bound as above.

7.3.2 Quadratic residue codes

***

7.3.3 Hamming, simplex and Golay codes as cyclic codes

Example 7.3.16 Consider an F_q-linear cyclic code of length n = (q^r − 1)/(q − 1) with defining set {1}. Let α be an element of order n in F_{q^r}^*. The minimum distance of the code is at least 2 by the BCH bound. If gcd(r, q − 1) = i > 1, then i divides n, since
n = (q^r − 1)/(q − 1) = q^{r−1} + · · · + q + 1 ≡ r mod (q − 1).
Let j = n/i. Let c_0 = −α^j. Then c_0 ∈ F_q^*, since j(q − 1) = n(q − 1)/i is a multiple of n. So c(x) = c_0 + x^j is a codeword of weight 2 and the minimum
distance is 2. Now consider the case with q = 3 and r = 2 in particular. Then α ∈ F_9^* is an element of order 4 and c(x) = 1 + x^2 is a codeword of the ternary cyclic code of length 4 with defining set {1}. So this code has parameters [4,2,2].

Proposition 7.3.17 Let n = (q^r − 1)/(q − 1). If r is relatively prime with q − 1, then the F_q-linear cyclic code of length n with defining set {1} is a generalized [n, n − r, 3] Hamming code.

Proof. Let α be an element of order n in F_{q^r}^*. The minimum distance of the code is at least 2 by the BCH bound. Suppose there is a codeword c(x) of weight 2 with nonzero coefficients c_i and c_j with 0 ≤ i < j < n. Then c(α) = 0. So c_iα^i + c_jα^j = 0. Hence α^{j−i} = −c_i/c_j. Therefore α^{(j−i)(q−1)} = 1, since −c_i/c_j ∈ F_q^*. Now n | (j − i)(q − 1), but since gcd(n, q − 1) = gcd(r, q − 1) = 1 by assumption, it follows that n | j − i, which is a contradiction. Hence the minimum distance is at least 3. Therefore the parameters are [n, n − r, 3] and the code is equivalent with the Hamming code H_r(q) by Proposition 2.5.19.

Corollary 7.3.18 The simplex code S_r(q) is equivalent with a cyclic code if r is relatively prime with q − 1.

Proof. The dual of a cyclic code is cyclic by Proposition 7.1.6, and a simplex code is by definition the dual of a Hamming code. So the statement is a consequence of Proposition 7.3.17.

Proposition 7.3.19 The binary cyclic code of length 23 with defining set {1} is equivalent to the binary [23,12,7] Golay code.

Proof. ***

Proposition 7.3.20 The ternary cyclic code of length 11 with defining set {1} is equivalent to the ternary [11,6,5] Golay code.

Proof. ***

*** Show that there are two generator polynomials of a ternary cyclic code of length 11 with defining set {1}, depending on the choice of an element of order 11. Give the coefficients of these generator polynomials. ***

7.3.4 Exercises

7.3.1 Let C be the binary cyclic code of length 9 and defining set {0, 1}. Give the BCH bound of this code.

7.3.2 Show that a nonzero binary cyclic code of length 11 has minimum distance 1, 2 or 11.

7.3.3 Choose the primitive element as in Exercise 7.2.9. Give the coefficients of the generator polynomial of a cyclic H_5(2) Hamming code and give a word of weight 3.
7.3.4 Choose the primitive element as in Exercise 7.2.9. Consider the binary cyclic code C of length 31 and generator polynomial m_0(X)m_1(X)m_3(X)m_5(X). Show that C has dimension 15 and δ_BCH(C) = 8. Give a word of weight 8.

7.3.5 Determine δ_BCH(C) for all the binary cyclic codes C of length 17.

7.3.6 Show the existence of a binary cyclic code of length 127, dimension 64 and minimum distance at least 21.

7.3.7 Let C be the ternary cyclic code of length 13 with complete defining set {1, 3, 4, 9, 10, 12}. Show that δ_BCH(C) = 5 and that it is the true minimum distance.

7.3.8 Consider the binary code C of length 21 and defining set {1}.
1) Show that there are exactly two binary irreducible polynomials of degree 6 that have as zeros elements of order 21.
2) Show that the BCH bound and the minimum distance are both equal to 3.
3) Conclude that the minimum distance of a cyclic code is not always equal to the minimal weight of the generator polynomials of all equivalent cyclic codes.

7.4 Improvements of the BCH bound

***.....***

7.4.1 Hartmann-Tzeng bound

Proposition 7.4.1 Let C be a cyclic code of length n with defining set I. Let U_1 and U_2 be two consecutive sets in Z_n consisting of δ_1 − 1 and δ_2 − 1 elements, respectively. Suppose that U_1 + U_2 ⊆ I. Then the minimum distance of C is at least δ_1 + δ_2 − 2.

Proof. This is a special case of the forthcoming Theorem 7.4.19 and Proposition 7.4.20. ***direct proof***

Example 7.4.2 Consider the binary cyclic code C_3 of length 17 and defining set {3} of Example 7.3.8. Then Proposition 7.4.1 applies with U_1 = {5, 6, 7}, U_2 = {0, 5}, δ_1 = 4 and δ_2 = 3. Hence the minimum distance of C_3 is at least 5. The factorization of 1 + X^{17} in F_2[X] is given by
(1 + X)(1 + X^3 + X^4 + X^5 + X^8)(1 + X + X^2 + X^4 + X^6 + X^7 + X^8).
Let α be a zero of the second factor. Then α is an element of F_{2^8} of order 17. Hence m_1(X) is the second factor and m_3(X) is the third factor. Now 1 + x^3 + x^4 + x^5 + x^8 is a codeword of C_1 of weight 5. Furthermore C_1 and C_3 are equivalent. Hence d(C_3) = 5.

Definition 7.4.3 For a subset I of Z_n, let δ_HT(I) be the largest number δ such that there exist two nonempty consecutive sets U_1 and U_2 in Z_n consisting of δ_1 − 1 and δ_2 − 1 elements, respectively, with U_1 + U_2 ⊆ I and δ = δ_1 + δ_2 − 2. Let C be a cyclic code of length n. Then δ_HT(Z(C)) is denoted by δ_HT(C).
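Checking the hypothesis U_1 + U_2 ⊆ I of Proposition 7.4.1 for a candidate pair of consecutive sets is a one-line computation. The GAP sketch below (helper name ours) redoes the check of Example 7.4.2.

HTCheck := function(n, I, U1, U2)
  return ForAll(U1, u -> ForAll(U2, v -> ((u+v) mod n) in I));
end;;
Z3 := [3,5,6,7,10,11,12,14];;      # the complete defining set of C_3
HTCheck(17, Z3, [5,6,7], [0,5]);
# true, so d(C_3) >= 4 + 3 - 2 = 5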
Theorem 7.4.4 The Hartmann-Tzeng bound. Let I be the complete defining set of a cyclic code C. Then the minimum distance of C is at least δ_HT(I).

Proof. This is a consequence of Definition 7.4.3 and Proposition 7.4.1.

Proposition 7.4.5 Let I be a subset of Z_n. Then δ_HT(I) ≥ δ_BCH(I).

Proof. If we take U_1 = U, U_2 = {0}, δ_1 = δ and δ_2 = 2 in the HT bound, then we get the BCH bound.

Remark 7.4.6 In computing δ_HT(I) one considers all a·I with gcd(a, n) = 1, as in Remark 7.3.11. So we may assume that U_1 is a consecutive set of period one. Let S(U_1) = {i ∈ Z_n | i + U_1 ⊆ I} be the shift set of U_1. Then U_1 + S(U_1) ⊆ I. Furthermore, if U_1 + U_2 ⊆ I, then U_2 ⊆ S(U_1). Take a consecutive subset U_2 of S(U_1). This gives all desired pairs (U_1, U_2) of consecutive subsets in order to compute δ_HT(I).

Example 7.4.7 Consider Example 7.4.2. Then U_1 = {5, 6, 7} is a consecutive subset of period one of Z_3 and U_2 = S(U_1) = {0, 5} is a consecutive subset of period five. And U_1′ = {1, 2} is a consecutive subset of period one of Z_1 and U_2′ = S(U_1′) = {0, 7, 14} is a consecutive subset of period seven. The choices (U_1, U_2) and (U_1′, U_2′) are both optimal. Hence δ_HT(Z_1) = 5.

Example 7.4.8 Let C be the binary cyclic code of length 21 and defining set {1, 3, 7, 9}. Then
I = {−, 1, 2, 3, 4, −, 6, 7, 8, 9, −, 11, 12, −, 14, 15, 16, −, 18, −, −}
is the complete defining set of C. From this we conclude that δ_BCH(I) ≥ 5 and δ_HT(I) ≥ 6. By considering 5·I one concludes that in fact equalities hold. But we show in Example 7.4.17 that the minimum distance of C is strictly larger than 6.

7.4.2 Roos bound

The Roos bound is first formulated for arbitrary linear codes and afterwards applied to cyclic codes.

Definition 7.4.9 Let a, b ∈ F_q^n. Define the star product a ∗ b by the coordinatewise multiplication:
a ∗ b = (a_1b_1, . . . , a_nb_n).
Let A and B be subsets of F_q^n. Define A ∗ B = { a ∗ b | a ∈ A, b ∈ B }.

Remark 7.4.10 If A and B are subsets of F_q^n, then
(A ∗ B)^⊥ = { c ∈ F_q^n | (a ∗ b) · c = 0 for all a ∈ A, b ∈ B }
is a linear subspace of F_q^n. But if A and B are linear subspaces of F_q^n, then A ∗ B is not necessarily a linear subspace. See Example 9.1.3. Consider the star product combined with the inner product. Then
(a ∗ b) · c = Σ_{i=1}^n a_ib_ic_i.
Hence a · (b ∗ c) = (a ∗ b) · c.

Proposition 7.4.11 Let C be an F_q-linear code of length n. Let (A, B) be a pair of F_{q^m}-linear codes of length n such that C ⊆ (A ∗ B)^⊥. Assume that A is not degenerate and k(A) + d(A) + d(B^⊥) ≥ n + 3. Then d(C) ≥ k(A) + d(B^⊥) − 1.

Proof. Let a = k(A) − 1 and b = d(B^⊥) − 1. Let c be a nonzero element of C with support I. If |I| ≤ b, then take i ∈ I. There exists an a ∈ A such that a_i ≠ 0, since A is not degenerate. So a ∗ c is not zero. Now (c ∗ a) · b = c · (a ∗ b) by Remark 7.4.10, and this is equal to zero for all b in B, since C ⊆ (A ∗ B)^⊥. Hence a ∗ c is a nonzero element of B^⊥ of weight at most b. This contradicts d(B^⊥) > b. So b < |I|. If |I| ≤ a + b, then we can choose index sets I_− and I_+ such that I_− ⊆ I ⊆ I_+ and I_− has b elements and I_+ has a + b elements. Recall from Definition 4.4.10 that A(I_+ \ I_−) is defined as the space {a ∈ A | a_i = 0 for all i ∈ I_+ \ I_−}. Now k(A) > a and I_+ \ I_− has a elements. Hence A(I_+ \ I_−) is not zero. Let a be a nonzero element of A(I_+ \ I_−). The vector c ∗ a is an element of B^⊥ and has support in I_−. Furthermore |I_−| = b < d(B^⊥), hence a ∗ c = 0, so a_i = 0 for all i ∈ I_+. Therefore a is a nonzero element of A of weight at most n − |I_+| = n − (a + b), which contradicts the assumption d(A) > n − (a + b). So |I| > a + b. Therefore d(C) ≥ a + b + 1 = k(A) + d(B^⊥) − 1.

In order to apply this proposition to cyclic codes some preparations are needed.

Definition 7.4.12 Let U be a subset of Z_n. Let α be an element of order n in F_{q^m}^*. Let C_U be the code over F_{q^m} of length n generated by the elements (1, α^i, . . . , α^{i(n−1)}) for i ∈ U. Then U is called a generating set of C_U. Let d_U be the minimum distance of the code C_U^⊥.

Remark 7.4.13 Notice that C_U and its dual are codes over F_{q^m}. Every subset U of Z_n is a complete defining set with respect to q^m, since n divides q^m − 1, so q^m·U = U. Furthermore C_U has dimension |U|. The code C_U is cyclic, since σ(1, α^i, . . . , α^{i(n−1)}) = α^{−i}(1, α^i, . . . , α^{i(n−1)}). U is the complete defining set of C_U^⊥. So d_U ≥ δ_BCH(U). Beware that d_U is by definition the minimum distance of C_U^⊥ over F_{q^m}, and not of the cyclic code over F_q with defining set U.

Remark 7.4.14 Let U and V be subsets of Z_n. Let w ∈ U + V. Then w = u + v with u ∈ U and v ∈ V. So
(1, α^w, . . . , α^{w(n−1)}) = (1, α^u, . . . , α^{u(n−1)}) ∗ (1, α^v, . . . , α^{v(n−1)}).
Hence C_U ∗ C_V ⊆ C_{U+V}. Therefore C ⊆ (C_U ∗ C_V)^⊥ if C is a cyclic code with U + V in its defining set.
Remark 7.4.15 Let U be a subset of Z_n. Let Ū be a consecutive set containing U. Then U is the complete defining set of C_U^⊥. Hence Z_n \ {−i | i ∈ U} is the complete defining set of C_U by Proposition 7.2.58. Then Z_n \ {−i | i ∈ Ū} is a consecutive set of size n − |Ū| that is contained in the defining set of C_U. Hence the minimum distance of C_U is at least n − |Ū| + 1 by the BCH bound.

Proposition 7.4.16 Let U be a nonempty subset of Z_n that is contained in the consecutive set Ū. Let V be a subset of Z_n such that |Ū| ≤ |U| + d_V − 2. Let C be a cyclic code of length n such that U + V is in the set of zeros of C. Then the minimum distance of C is at least |U| + d_V − 1.

Proof. Let A and B be the cyclic codes with generating sets U and V, respectively. Then A has dimension |U| by Remark 7.4.13 and its minimum distance is at least n − |Ū| + 1 by Remark 7.4.15. A generating matrix of A has no zero column, since otherwise A would be zero, since A is cyclic; but A is not zero, since U is not empty. So A is not degenerate. Moreover d(B^⊥) = d_V by Definition 7.4.12. Hence
k(A) + d(A) + d(B^⊥) ≥ |U| + (n − |Ū| + 1) + d_V,
which is at least n + 3, since |Ū| ≤ |U| + d_V − 2. Finally C ⊆ (A ∗ B)^⊥ by Remark 7.4.14. Therefore all assumptions of Proposition 7.4.11 are fulfilled. Hence d(C) ≥ k(A) + d(B^⊥) − 1 = |U| + d_V − 1.

Example 7.4.17 Let C be the binary cyclic code of Example 7.4.8. Let U = 4 · {0, 1, 3, 5} and V = {2, 3, 4}. Then Ū = 4 · {0, 1, 2, 3, 4, 5} is a consecutive set and d_V = 4. By inspection of the table

  +  |  0   4  12  20
  ---+---------------
  2  |  2   6  14   1
  3  |  3   7  15   2
  4  |  4   8  16   3

we see that U + V is contained in the complete defining set of C. Furthermore |Ū| = 6 = |U| + d_V − 2. Hence d(C) ≥ 7 by Proposition 7.4.16. The alternative choice with U′ = 4 · {0, 1, 2, 3, 5, 6}, Ū′ = 4 · {0, 1, 2, 3, 4, 5, 6} and V′ = {3, 4} gives d(C) ≥ 8 by the Roos bound. This in fact is the true minimum distance.

Definition 7.4.18 Let I be a subset of Z_n. Denote by δ_R(I) the largest number δ such that there exist nonempty subsets U and V of Z_n and a consecutive set Ū with U ⊆ Ū, U + V ⊆ I and |Ū| ≤ |U| + d_V − 2 = δ − 1. Let C be a cyclic code of length n. Then δ_R(Z(C)) is denoted by δ_R(C).

Theorem 7.4.19 The Roos bound. The minimum distance of a cyclic code C is at least δ_R(C).

Proof. This is a consequence of Proposition 7.4.16 and Definition 7.4.18.

Proposition 7.4.20 Let I be a subset of Z_n. Then δ_R(I) ≥ δ_HT(I).

Proof. Let U_1 and U_2 be nonempty consecutive subsets of Z_n of sizes δ_1 − 1 and δ_2 − 1, respectively, with U_1 + U_2 ⊆ I. Let U = Ū = U_1 and V = U_2. Now d_V = δ_2 ≥ 2, since V is not empty. Hence |Ū| ≤ |U| + d_V − 2. Applying Proposition 7.4.16 gives δ_R(I) ≥ |U| + d_V − 1 ≥ δ_1 + δ_2 − 2. Hence δ_R(I) ≥ δ_HT(I).

Example 7.4.21 Examples 7.4.8 and 7.4.17 give a subset I of Z_21 such that δ_BCH(I) < δ_HT(I) < δ_R(I).
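The set-theoretic hypothesis of the Roos bound in Example 7.4.17 is again easy to verify mechanically; the following GAP lines (names ours) redo the check of the table above.

I21 := [1,2,3,4,6,7,8,9,11,12,14,15,16,18];;   # Z(C) of Example 7.4.8
U := List([0,1,3,5], i -> (4*i) mod 21);;       # [ 0, 4, 12, 20 ]
V := [2,3,4];;
ForAll(U, u -> ForAll(V, v -> ((u+v) mod 21) in I21));
# true: U + V lies in Z(C), so d(C) >= |U| + d_V - 1 = 7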
7.4.3 AB bound

Remark 7.4.22 In Section 3.1.2 we defined for every subset I of {1, . . . , n} the projection map π_I : F_q^n → F_q^t by π_I(x) = (x_{i_1}, . . . , x_{i_t}), where I = {i_1, . . . , i_t} and 1 ≤ i_1 < . . . < i_t ≤ n. We denoted the image of π_I applied to A by A_I, and the kernel of π_I by A(I), that is, A(I) = {a ∈ A | a_i = 0 for all i ∈ I}. Suppose that dim A = k and |I| = t. If t < d(A^⊥), then dim A(I) = k − t by Lemma 4.4.13, and therefore dim A_I = t.

The following proposition is known for cyclic codes as the AB or the van Lint-Wilson bound.

Proposition 7.4.23 Let A, B and C be linear codes of length n over F_q such that (A ∗ B) ⊥ C and d(A^⊥) > a > 0 and d(B^⊥) > b > 0. Then d(C) ≥ a + b.

Proof. Let c be a nonzero codeword in C with support I, that is to say, I = {i | c_i ≠ 0}. Let t = |I|. Without loss of generality we may assume that a ≤ b. We have that
dim(A_I) + dim(B_I) ≥ 2t if t ≤ a, ≥ a + t if a < t ≤ b, and ≥ a + b if b < t,
by Remark 7.4.22. But (A ∗ B) ⊥ C, so (c ∗ A)_I ⊥ B_I. Moreover dim((c ∗ A)_I) = dim(A_I), since c_i ≠ 0 for all i ∈ I. Therefore
dim(A_I) + dim(B_I) ≤ |I| = t.
This is only possible in case t ≥ a + b. Hence d(C) ≥ a + b.

Example 7.4.24 Consider the binary cyclic code of length 21 and defining set {0, 1, 3, 7}. Then the complete defining set of this code is given by
I = {0, 1, 2, 3, 4, −, 6, 7, 8, −, −, 11, 12, −, 14, −, 16, −, −, −, −}.
We leave it as an exercise to show that δ_BCH(I) = δ_HT(I) = δ_R(I) = 6. Application of the AB bound to U = {1, 2, 3, 6} and V = {0, 1, 5} gives that the minimum distance is at least 7. The minimum distance is at least 8, since it is an even weight code.

Remark 7.4.25 Let C be an F_q-linear code of length n. Let (A, B) be a pair of F_{q^m}-linear codes of length n. Let a = k(A) − 1 and b = d(B^⊥) − 1. Then one can restate the conditions of Proposition 7.4.11 as follows:
If (1) (A ∗ B) ⊥ C, (2) k(A) > a, (3) d(B^⊥) > b, (4) d(A) + a + b > n and (5) d(A^⊥) > 1, then d(C) ≥ a + b + 1.
The original proof of the Roos bound given by Van Lint and Wilson is as follows. Let M be a generator matrix of A. Let M_I be the submatrix of M consisting of the columns indexed by I. Then rank(M_I) = dim(A_I). Condition (5) implies that M has no zero column, so rank(M_I) ≥ 1 for all I with at least one element. Let I be an index set such that |I| ≤ a + b; then any two words of A differ in
at least one place of I, since d(A) > n − (a + b) ≥ n − |I| by Condition (4). So A and A_I have the same number of codewords, and rank(M_I) ≥ k(A) ≥ a + 1. Hence for any I such that b < |I| ≤ a + b we have that rank(M_I) ≥ |I| − b + 1. Let N be a generator matrix of B. Then Condition (3) implies:
rank(N_I) = |I| if |I| ≤ b, and rank(N_I) ≥ b if |I| > b,
by Remark 7.4.22. Therefore
rank(M_I) + rank(N_I) > |I| for |I| ≤ a + b.
Now let c be a nonzero element of C with support I; then rank(M_I) + rank(N_I) ≤ |I|, as we have seen in the proof of Proposition 7.4.23. Hence |I| > a + b, so d(C) > a + b.

Example 7.4.26 In this example we show that the assumption that A is nondegenerate is necessary. Let A, B^⊥ and C be the binary codes with generating matrices (011), (111) and (100), respectively. Then A ∗ C ⊆ B^⊥ and k(A) = 1, d(A) = 2, n = 3 and d(B^⊥) = 3, so k(A) + d(A) + d(B^⊥) = 6 = n + 3, but d(C) = 1.

7.4.4 Shift bound

***....***

Definition 7.4.27 Let I be a subset of Z_n. A subset A of Z_n is called independent with respect to I if it can be obtained by the following rules:
(I.1) the empty set is independent with respect to I.
(I.2) if A is independent with respect to I and A is a subset of I and b ∈ Z_n is not an element of I, then A ∪ {b} is independent with respect to I.
(I.3) if A is independent with respect to I and c ∈ Z_n, then c + A is independent with respect to I, where c + A = {c + a | a ∈ A}.

Remark 7.4.28 The name "shifting" refers to condition (I.3). A set A is independent with respect to I if and only if there exists a sequence of sets A_1, . . . , A_w and elements a_1, . . . , a_{w−1} and b_0, b_1, . . . , b_{w−1} in Z_n such that A_1 = {b_0} and A = A_w, and furthermore A_{i+1} = (a_i + A_i) ∪ {b_i}, where a_i + A_i is a subset of I and b_i is not an element of I. Then
A_i = { b_{l−1} + Σ_{j=l}^{i−1} a_j | l = 1, . . . , i },
and all A_i are independent with respect to I. Let i_1, i_2, . . . , i_w and j_1, j_2, . . . , j_w be new sequences which are obtained from the sequences a_1, . . . , a_{w−1} and b_0, b_1, . . . , b_{w−1} by:
i_w = 0, i_{w−1} = a_1, . . . , i_{w−k} = a_1 + · · · + a_k, and j_k = b_{k−1} − i_{w−k+1}.
These data can be given in the following table:

              j_1          j_2        j_3     . . .   j_{w−1}       j_w
a_{w−1}  A_w:     i_1+j_1    i_1+j_2    i_1+j_3  . . .  i_1+j_{w−1}  b_{w−1} | i_1
a_{w−2}  A_{w−1}: i_2+j_1    i_2+j_2    i_2+j_3  . . .  b_{w−2}              | i_2
  ...
a_2      A_3:     a_1+a_2+b_0  a_2+b_1  b_2                                  | i_{w−2}
a_1      A_2:     a_1+b_0    b_1                                             | i_{w−1}
         A_1:     b_0                                                        | i_w

with the elements of A_t as rows in the middle part. The enumeration of the A_t is from the bottom to the top, and the b_i are on the diagonal. In the first row and the last column the j_l and the i_k are tabulated, respectively. The sum i_k + j_l is given in the middle part. By this transformation it is easy to see that a set A is independent with respect to I if and only if there exist sequences i_1, i_2, . . . , i_w and j_1, j_2, . . . , j_w such that
A = {i_1 + j_l | 1 ≤ l ≤ w}, and i_k + j_l ∈ I for all k + l ≤ w and i_k + j_l ∉ I for all k + l = w + 1.
So the entries in the table above the diagonal are elements of I, and those on the diagonal are not in I. Notice that in this formulation we did not assume that the sets {i_k | 1 ≤ k ≤ w}, {j_l | 1 ≤ l ≤ w} and A have size w, since this is a consequence of this definition. If for instance i_k = i_{k′} for some 1 ≤ k < k′ ≤ w, then i_k + j_{w+1−k′} = i_{k′} + j_{w+1−k′}; the left hand side is an element of I, since k + (w + 1 − k′) ≤ w, but the right hand side is not, which is a contradiction.

Definition 7.4.29 For a subset Z of Z_n, let µ(Z) be the maximal size of a set which is independent with respect to Z. Define the shift bound for a subset I of Z_n as follows:
δ_S(I) = min{ µ(Z) | I ⊆ Z ⊆ Z_n, Z ≠ Z_n and Z a complete defining set }.

Theorem 7.4.30 The minimum distance of C(I) is at least δ_S(I).

The proof of this theorem will be given at the end of this section.

Proposition 7.4.31 The following inequality holds: δ_S(I) ≥ δ_HT(I).

Proof. There exist δ, s, i and a such that gcd(a, n) = 1, δ_HT(I) = δ + s and
{i + j + ka | 1 ≤ j < δ, 0 ≤ k ≤ s} ⊆ I.
Suppose Z is a complete defining set which contains I and is not equal to Z_n. Then there exists a δ′ ≥ δ such that i + j ∈ Z for all 1 ≤ j < δ′ and i + δ′ ∉ Z.
The set {i + j + ka | 1 ≤ j < δ′, k ∈ Z_n} is equal to Z_n, since gcd(a, n) = 1. So there exist s′ ≥ s and j′ such that i + j + ka ∈ Z for all 1 ≤ j < δ′ and 0 ≤ k ≤ s′, and 1 ≤ j′ < δ′ and i + j′ + (s′ + 1)a ∉ Z. Let w = δ′ + s′. Let i_k = (k − 1)a for all 1 ≤ k ≤ s′ + 1, and i_k = k − s′ − 1 for all k such that s′ + 2 ≤ k ≤ δ′ + s′. Let j_l = i + l for all 1 ≤ l ≤ δ′ − 1, and let j_l = i + j′ + (l − δ′ + 1)a for all l such that δ′ ≤ l ≤ δ′ + s′. Then one easily checks that i_k + j_l ∈ Z for all k + l ≤ w, and i_k + j_{w−k+1} = i + j′ + (s′ + 1)a ∉ Z for all 1 ≤ k ≤ s′ + 1, and i_k + j_{w−k+1} = i + δ′ ∉ Z for all s′ + 2 ≤ k ≤ δ′ + s′. So we have a set which is independent with respect to Z and has size w = δ′ + s′ ≥ δ + s. Hence µ(Z) ≥ δ + s for all complete defining sets Z which contain I and are not equal to Z_n. Therefore δ_S(I) ≥ δ_HT(I).

Example 7.4.32 The binary Golay code of length 23 can be defined as the cyclic code with defining set {1}, see Proposition 7.3.19. In this example we show that the shift bound is strictly greater than the HT bound and is still not equal to the minimum distance. Let Z_i be the cyclotomic coset of i. Then Z_0 = {0},
Z_1 = {−, 1, 2, 3, 4, −, 6, −, 8, 9, −, −, 12, 13, −, −, 16, −, 18, −, −, −, −}, and
Z_5 = {−, −, −, −, −, 5, −, 7, −, −, 10, 11, −, −, 14, 15, −, 17, −, 19, 20, 21, 22}.
Then δ_BCH(Z_1) = δ_HT(Z_1) = 5. Let (a_1, . . . , a_5) = (−1, −3, 7, 4, 13) and (b_0, . . . , b_5) = (5, 5, 5, 14, 5, 5). Then the sets A_{t+1} = (a_t + A_t) ∪ {b_t} are given in the rows of the middle part of the following table:

  a_t          j:  5    6    9   11   −2    8    i_t
  13     A_6:      2    3    6    8   18    5  | −3
   4     A_5:     12   13   16   18    5       |  7
   7     A_4:      8    9   12   14            |  3
  −3     A_3:      1    2    5                 | −4
  −1     A_2:      4    5                      | −1
         A_1:      5                           |  0

with the a_t in the first column and the b_t on the diagonal. The corresponding sequence (i_1, . . . , i_6) = (−3, 7, 3, −4, −1, 0) is given in the last column of the table and (j_1, . . . , j_6) = (5, 6, 9, 11, −2, 8) in the first row. So Z_1 has an independent set of size 6. In fact this is the maximal size of an independent set of Z_1. Hence µ(Z_1) = 6. The defining sets Z_0, Z_1 and Z_5 and their unions are complete, and these are the only ones. Let Z_{0,1} = Z_0 ∪ Z_1; then Z_{0,1} has an independent set of size 7, since A_6 is independent with respect to Z_1 and also with respect to Z_{0,1}, and −2 + A_6 = {0, 1, 4, 6, 16, 3} is a subset of Z_{0,1} and 5 ∉ Z_{0,1}, so A_7 = {0, 1, 4, 6, 16, 3, 5} is independent with respect to Z_{0,1}. Furthermore Z_{1,5} = Z_1 ∪ Z_5 contains a sequence of 22 consecutive elements, so µ(Z_{1,5}) ≥ 23. Therefore δ_S(Z_1) = 6. But the minimum distance of the binary Golay code is 7, since otherwise there would be a word c ∈ C(Z_1) of weight 6, so c ∈ C(Z_{0,1}), but δ_S(Z_{0,1}) ≥ 7, which is a contradiction.
Example 7.4.33 Let n = 26, F = F_27 and F_0 = F_3. Let I = {0, 13, 14, 16, 17, 22, 23, 25}. Let U = {0, 3, 9, 12} and V = {13, 14}. Then d_V = 3 and Ū = {0, 3, 6, 9, 12}, so |Ū| = 5 ≤ 4 + 3 − 2. Moreover I contains U + V. Hence δ_R(I) ≥ 4 + 3 − 1 = 6, but in fact δ_S(I) = 5.

Example 7.4.34 ***Example of δ_R(I) < δ_S(I).***

Example 7.4.35 It is necessary to take the minimum of all µ(Z) in the definition of the shift bound. The maximal size of an independent set with respect to a complete defining set I is not a lower bound for the minimum distance of the cyclic code with I as defining set, as the following example shows. Let F be a finite field of odd characteristic. Let α be a non-zero element of F of even order n. Let I = {2, 4, . . . , n − 2} and I′ = {0, 2, 4, . . . , n − 2}. Then I and I′ are complete and µ(I) = 3, since {2, 0, 1} is independent with respect to I, but µ(I′) = 2.

***Picture of interrelations of the several bounds.***

One way to get a bound on the weight of a codeword c = (c_0, . . . , c_{n−1}) is obtained by looking for a maximal non-singular square submatrix of the matrix of syndromes (S_{ij}). For cyclic codes we get in this way a matrix, with entries S_{ij} = Σ_k c_kα^{k(i+j)}, which is constant along back-diagonals. Suppose gcd(n, q) = 1. Then there is a field extension F_{q^m} of F_q such that F_{q^m}^* has an element α of order n. Let a_i = (1, α^i, . . . , α^{i(n−1)}). Then { a_i | i ∈ Z_n } is a basis of F_{q^m}^n. Consider the following generalization of the definition of a syndrome 6.2.2.

Definition 7.4.36 The syndrome of a word y ∈ F_0^n with respect to a_i and a_j is defined by
S_{i,j}(y) = y · a_i ∗ a_j.
Let S(y) be the syndrome matrix with entries S_{i,j}(y). Notice that a_i ∗ a_j = a_{i+j} for all i, j ∈ Z_n. Hence S_{i,j} = S_{i+j}.

Lemma 7.4.37 Let y ∈ F_0^n. Let I = { i + j | i, j ∈ Z_n and y · a_i ∗ a_j = 0 }. If A is independent with respect to I, then wt(y) ≥ |A|.

Proof. Suppose A is independent with respect to I and has w elements. Then there exist sequences i_1, . . . , i_w and j_1, . . . , j_w such that A consists of the elements i_1 + j_1, i_1 + j_2, . . . , i_1 + j_w, and i_k + j_l ∈ I for all k + l ≤ w and i_k + j_l ∉ I for all k + l = w + 1. Consider the (w × w) matrix M with entries M_{k,l} = S_{i_k,j_l}(y). By the assumptions, M is a matrix such that M_{k,l} = 0 for all k + l ≤ w and M_{k,l} ≠ 0 for all k + l = w + 1, that is to say with zeros above the back-diagonal and non-zeros on the back-diagonal, so M has rank w. Moreover M is a submatrix of the matrix S(y), which can be written as a product:
S(y) = HD(y)H^T,
where H is the matrix with the a_i as row vectors and D(y) is the diagonal matrix with the entries of y on the diagonal. Now the rank of H is n, since a_0, . . . , a_{n−1} is a basis of F_{q^m}^n. Hence
|A| = w = rank(M) ≤ rank(S(y)) ≤ rank(D(y)) = wt(y).
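The factorization S(y) = HD(y)H^T in this proof is easy to test numerically. The following GAP lines are a minimal sketch of such a test (the word y and all names are ours), for n = 7 over GF(8).

n := 7;; a := Z(8);;                   # a has order 7 in GF(8)
H := List([0..n-1], i -> List([0..n-1], j -> a^(i*j)));;
y := [1,0,0,1,0,1,0] * Z(2)^0;;        # a word of weight 3
S := H * DiagonalMat(y) * TransposedMat(H);;
RankMat(S);
# 3, which equals wt(y), since H has rank n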
Remark 7.4.38 Let C_i be a code with Z_i as defining set for i = 1, 2. If Z_1 ⊆ Z_2, then C_2 ⊆ C_1.

Lemma 7.4.39 Let I be a complete defining set for the cyclic code C. If y ∈ C and y ∉ D for all cyclic codes D with complete defining sets Z which contain I and are not equal to I, then wt(y) ≥ µ(I).

Proof. Define Z = {i + j | i, j ∈ Z_n, y · a_i ∗ a_j = 0}. ***Then Z is a complete defining set.*** Clearly I ⊆ Z, since y ∈ C and I is a defining set of C. Let D be the code with defining set Z. Then y ∈ D. If I ≠ Z, then y ∉ D by the assumption, which is a contradiction. Hence I = Z, and wt(y) ≥ µ(I) by Lemma 7.4.37.

Proof of Theorem 7.4.30. Let y be a non-zero codeword of C. Let Z be equal to {i + j | i, j ∈ Z_n, y · a_i ∗ a_j = 0}. Then Z ≠ Z_n, since y is not zero and the a_i's generate F_{q^m}^n. The theorem now follows from Lemma 7.4.39 and the definition of the shift bound.

Remark 7.4.40 The computation of the shift bound is quite involved, and is only feasible with the use of a computer. It makes sense if one classifies codes with respect to the minimum distance, since in order to get δ_S(I) one gets at the same time the δ_S(J) for all I ⊆ J.

7.4.5 Exercises

7.4.1 Consider the binary cyclic code of length 15 and defining set {3, 5}. Compute the complete defining set I of this code. Show that δ_BCH(I) = 3 and that δ_HT(I) = 4 is the true minimum distance.

7.4.2 Consider the binary cyclic code of length 35 and defining set {1, 5, 7}. Compute the complete defining set I of this code. Show that δ_BCH(I) = δ_HT(I) = 6 and δ_R(I) ≥ 7.

7.4.3 Let m be odd and n = 2^m − 1. Melas's code is the binary cyclic code of length n and defining set {1, −1}. Show that this code is reversible, has dimension k = n − 2m and that the minimum distance is at least five.

7.4.4 Let −1 be a power of q modulo n. Show that every cyclic code over F_q of length n is reversible.

7.4.5 Let n = 2^{2m} + 1 with m > 1. Zetterberg's code is the binary cyclic code of length n and defining set {1}. Show that this code is reversible, has dimension k = n − 4m and that the minimum distance is at least five.

7.4.6 Consider the ternary cyclic code of length 11 and defining set {1}. Compute the complete defining set I of this code. Show that δ_BCH(I) = δ_HT(I) = δ_S(I) = 4. Let I′ = {0} ∪ I. Show that δ_BCH(I′) = δ_HT(I′) = 4 and δ_S(I′) ≥ 5.

7.4.7 Let q be a power of a prime and n a positive integer such that gcd(n, q) = 1. Write a computer program that computes the complete defining set Z modulo n with respect to q and the bounds δ_BCH(Z), δ_HT(Z), δ_R(Z) and δ_S(Z) for a given defining set I in Z_n.
7.5 Locator polynomials and decoding cyclic codes

***

7.5.1 Mattson-Solomon polynomial

Definition 7.5.1 Let α ∈ F_{q^m}^* be a primitive n-th root of unity. The Mattson-Solomon (MS) polynomial A(Z) of a(x) = a_0 + a_1x + · · · + a_{n−1}x^{n−1} is defined by
A(Z) = Σ_{i=1}^n A_iZ^{n−i}, where A_i = a(α^i) ∈ F_{q^m}.
Here too we adopt the convention that the index i is computed modulo n. The MS polynomial A(Z) is the discrete Fourier transform of a(x). In order to compute the inverse discrete Fourier transform, that is the coefficients of a(x) in terms of the coefficients of A(Z), we need the following lemma on the sum of a geometric sequence.

Lemma 7.5.2 Let β ∈ F_{q^m} be a zero of X^n − 1. Then
Σ_{i=1}^n β^i = n if β = 1, and Σ_{i=1}^n β^i = 0 if β ≠ 1.

Proof. If β = 1, then Σ_{i=1}^n β^i = n. If β ≠ 1, then using the formula Σ_{i=1}^n β^i = (β^{n+1} − β)/(β − 1) for the sum of a geometric series and β^{n+1} = β gives the desired result.

Proposition 7.5.3
1) The inverse transform is given by a_i = A(α^i)/n.
2) A(Z) is the MS polynomial of a word a(x) coming from F_q^n if and only if A_{jq} = A_j^q for all j = 1, . . . , n.
3) A(Z) is the MS polynomial of a codeword a(x) of the cyclic code C if and only if A_j = 0 for all j ∈ Z(C) and A_{jq} = A_j^q for all j = 1, . . . , n.

Proof.
1) Expanding A(α^i) and using the definitions gives
A(α^i) = Σ_{j=1}^n A_jα^{i(n−j)} = Σ_{j=1}^n a(α^j)α^{i(n−j)} = Σ_{j=1}^n Σ_{k=0}^{n−1} a_kα^{jk}α^{i(n−j)}.
Using α^n = 1, interchanging the order of summation and using Lemma 7.5.2 with β = α^{k−i} gives
Σ_{k=0}^{n−1} a_k Σ_{j=1}^n α^{(k−i)j} = na_i.
2) If A(Z) is the MS polynomial of a(x), then using Proposition 7.2.40 gives
A_j^q = a(α^j)^q = a(α^{qj}) = A_{qj},
since the coefficients of a(x) are in F_q. Conversely, suppose that A_{jq} = A_j^q for all j = 1, . . . , n. Then using (1) gives
a_i^q = (A(α^i)/n)^q = (1/n) Σ_{j=1}^n A_j^qα^{qi(n−j)} = (1/n) Σ_{j=1}^n A_{qj}α^{qi(n−j)}.
Using the fact that multiplication with q is a permutation of Z_n gives that the above sum is equal to
(1/n) Σ_{j=1}^n A_jα^{i(n−j)} = a_i.
Hence a_i^q = a_i and a_i ∈ F_q for all i. Therefore a(x) is coming from F_q^n.
3) A_j = 0 if and only if a(α^j) = 0, by (1). Together with (2) and the definition of Z(C) this gives the desired result.

Another proof of the BCH bound can be obtained with the Mattson-Solomon polynomial.

Proposition 7.5.4 Let C be a narrow sense BCH code with designed minimum distance δ. If A(Z) is the MS polynomial of a nonzero codeword a(x) of C, then the degree of A(Z) is at most n − δ and the weight of a(x) is at least δ.

Proof. Let a(x) be a nonzero codeword of C. Let A(Z) be the MS polynomial of a(x); then A_i = a(α^i) = 0 for all i = 1, . . . , δ − 1. So the degree of A(Z) is at most n − δ. We have that a_i = A(α^i)/n by (1) of Proposition 7.5.3. The number of zero coefficients of a(x) is the number of zeros of A(Z) in F_{q^m}, which is at most n − δ. Hence the weight of a(x) is at least δ.

Example 7.5.5 Let a(x) = 6 + x + 3x^2 + x^3 be a codeword of the cyclic code of length 6 over F_7 of Example 7.1.24. Choose α = 3 as primitive element. Then A(Z) = 4 + Z + 3Z^2 is the MS polynomial of a(x).

7.5.2 Newton identities

Definition 7.5.6 Let a(x) be a word of weight w. Then there are indices 0 ≤ i_1 < · · · < i_w < n such that
a(x) = a_{i_1}x^{i_1} + · · · + a_{i_w}x^{i_w}, with a_{i_j} ≠ 0 for all j.
Let x_j = α^{i_j} and y_j = a_{i_j}. Then the x_j are called the locators and the y_j the corresponding values. Furthermore
A_i = a(α^i) = Σ_{j=1}^w y_jx_j^i.
Consider the product
σ(Z) = ∏_{j=1}^w (1 − x_jZ).
Then σ(Z) has as zeros the reciprocals of the locators, and is sometimes called the locator polynomial. Sometimes this name is reserved for the monic polynomial that has the locators as zeros.
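The transform and its inverse are easy to check numerically. The following GAP lines (a minimal sketch; the helper names are ours) recompute the MS polynomial of Example 7.5.5 and verify the inverse transform of Proposition 7.5.3.

alpha := 3*Z(7)^0;;                          # alpha = 3 in F_7
acoef := [6,1,3,1,0,0] * Z(7)^0;;            # a(x) = 6 + x + 3x^2 + x^3
aval := b -> Sum([0..5], k -> acoef[k+1]*b^k);;
A := List([1..6], i -> aval(alpha^i));;      # A_i = a(alpha^i)
List(A, x -> x = 0*Z(7));
# [ true, true, true, false, false, false ]: A_1 = A_2 = A_3 = 0 and
# A_4 = 3, A_5 = 1, A_6 = 4, so A(Z) = 4 + Z + 3Z^2 as claimed
Aval := b -> Sum([1..6], i -> A[i]*b^(6-i));;
List([0..5], k -> Aval(alpha^k)/(6*Z(7)^0)) = acoef;
# true: the inverse transform a_i = A(alpha^i)/n recovers a(x)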
Proposition 7.5.7 Let σ(Z) = Σ_{i=0}^w σ_iZ^i be the locator polynomial of the locators x_1, . . . , x_w. Then σ_i is the i-th elementary symmetric function in these locators:
σ_t = (−1)^t Σ_{1≤j_1<j_2<···<j_t≤w} x_{j_1}x_{j_2} · · · x_{j_t}.

Proof. This is proved by induction on w and is left to the reader as an exercise.

The following property of the MS polynomial is called the generalized Newton identity and gives the reason for these definitions.

Proposition 7.5.8 For all i it holds that
A_{i+w} + σ_1A_{i+w−1} + · · · + σ_wA_i = 0.

Proof. Substitute Z = 1/x_j in the equation
1 + σ_1Z + · · · + σ_wZ^w = ∏_{j=1}^w (1 − x_jZ)
and multiply by y_jx_j^{i+w}. This gives
y_jx_j^{i+w} + σ_1y_jx_j^{i+w−1} + · · · + σ_wy_jx_j^i = 0.
Summing over j = 1, . . . , w yields the desired result of Proposition 7.5.8.

Example 7.5.9 Let C be the cyclic code of length 5 over F_16 with defining set {1, 2}. Then this defining set is complete. The polynomial X^4 + X^3 + X^2 + X + 1 is irreducible over F_2. Let β be a zero of this polynomial in F_16. Then the order of β is 5. The generator polynomial of C is
(X + β)(X + β^2) = X^2 + (β + β^2)X + β^3.
So (β^3, β + β^2, 1, 0, 0) ∈ C and
(β + β^2 + β^3, 1 + β, 0, 1, 0) = (β + β^2)(β^3, β + β^2, 1, 0, 0) + (0, β^3, β + β^2, 1, 0)
is an element of C. These codewords together with their cyclic shifts and their nonzero scalar multiples give (5 + 5) · 15 = 150 words of weight 3. In fact these are the only codewords of weight 3, since C is a [5, 3, 3] MDS code and A_3 = (5 choose 3)(16 − 1) = 150 by Remark 3.2.15. Propositions 7.5.3 and 7.5.8 give another way to prove this. Consider the set of equations:
A_4 + σ_1A_3 + σ_2A_2 + σ_3A_1 = 0
A_5 + σ_1A_4 + σ_2A_3 + σ_3A_2 = 0
A_1 + σ_1A_5 + σ_2A_4 + σ_3A_3 = 0
A_2 + σ_1A_1 + σ_2A_5 + σ_3A_4 = 0
A_3 + σ_1A_2 + σ_2A_1 + σ_3A_5 = 0
If A_1, A_2, A_3, A_4 and A_5 are the coefficients of the MS polynomial of a codeword, then A_1 = A_2 = 0. If A_3 = 0, then A_i = 0 for all i. So we may assume that A_3 ≠ 0. The above equations imply A_4 = σ_1A_3, A_5 = (σ_1^2 + σ_2)A_3 and
σ_1^3 + σ_3 = 0
σ_1^2σ_2 + σ_2^2 + σ_1σ_3 = 0
σ_1^2σ_3 + σ_2σ_3 + 1 = 0.
Substitution of σ_3 = σ_1^3 in the remaining equations yields
σ_1^4 + σ_1^2σ_2 + σ_2^2 = 0 and σ_1^5 + σ_1^3σ_2 + 1 = 0.
Multiplying the first equation with σ_1 and adding it to the second one gives 1 + σ_1σ_2^2 = 0. Thus σ_1 = σ_2^{−2} and σ_2^{10} + σ_2^5 + 1 = 0. This last equation has 10 solutions in F_16, and we are free to choose A_3 from F_16^*. This gives in total 150 solutions.

7.5.3 APGZ algorithm

Let C be a cyclic code of length n such that the minimum distance of C is at least δ by the BCH bound. In this section we will give a decoding algorithm for such a code which has an efficient implementation and is used in practice. This algorithm corrects errors of weight at most (δ − 1)/2, whereas the true minimum distance can be larger than δ. The notion of a syndrome was already given in the context of arbitrary codes in Definition 6.2.2. Let α be a primitive n-th root of unity. Let C be a cyclic code of length n with 1, . . . , δ − 1 in its complete defining set. Let h_i = (1, α^i, . . . , α^{i(n−1)}). Consider C as the subfield subcode of the code with parity check matrix H̃ with rows h_i for i ∈ Z(C), as in Remark 7.3.2. Let c = (c_0, . . . , c_{n−1}) ∈ C be the transmitted word, so c(x) = c_0 + · · · + c_{n−1}x^{n−1}. Let r be the received word with w errors and w ≤ (δ − 1)/2. So r(x) = c(x) + e(x) and wt(e(x)) = w. The syndrome S_i of r(x) with respect to the row h_i is equal to
S_i = r(α^i) = e(α^i) for i ∈ Z(C),
since c(α^i) = 0 for all i ∈ Z(C). The syndrome of r is s = rH̃^T. Hence s_i = S_i for all i ∈ Z(C), and these are also called the known syndromes, since the receiver knows S_i for all i ∈ Z(C). The unknown syndromes are defined by S_i = e(α^i) for i ∉ Z(C). Let A(Z) be the MS polynomial of e(x). Then S_i = r(α^i) = e(α^i) = A_i for i ∈ Z(C). The receiver knows all S_1, S_2, . . . , S_{2w}, since {1, 2, . . . , δ − 1} ⊆ Z(C) and 2w ≤ δ − 1.
Let σ(Z) be the error-locator polynomial, that is the locator polynomial
σ(Z) = ∏_{j=1}^w (1 − x_jZ)
of the error positions {x_1, . . . , x_w} = { α^i | e_i ≠ 0 }. Let σ_i be the i-th coefficient of σ(Z) and form the following set of generalized Newton identities of Proposition 7.5.8 with S_i = A_i:

S_{w+1} + σ_1S_w + · · · + σ_wS_1 = 0
S_{w+2} + σ_1S_{w+1} + · · · + σ_wS_2 = 0
...
S_{2w} + σ_1S_{2w−1} + · · · + σ_wS_w = 0.     (7.1)

The algorithm of Arimoto-Peterson-Gorenstein-Zierler (APGZ) solves this system of linear equations in the variables σ_j by Gaussian elimination. The fact that this system has a unique solution is guaranteed by the following.

Proposition 7.5.10 The matrix (S_{i+j−1} | 1 ≤ i, j ≤ v) is nonsingular if and only if v = w, the number of errors.

Proof. ***(S_{i+j−1}) = HD(e)H^T as in the proof of Lemma 7.4.37***

After the system of linear equations is solved, we know the error-locator polynomial
σ(Z) = 1 + σ_1Z + σ_2Z^2 + · · · + σ_wZ^w,
which has as its zeros the reciprocals of the error locations. Finding the zeros of this polynomial is done by inspecting all elements of F_{q^m}.

Example 7.5.11 Let C be the binary narrow sense BCH code of length 15 and designed minimum distance 5 with generator polynomial 1 + X^4 + X^6 + X^7 + X^8 as in Example 7.3.13. Let
r = (0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0)
be a received word with respect to the code C with 2 errors. Then r(x) = x + x^3 + x^4 + x^7 + x^{13} and S_1 = r(α) = α^{12} and S_3 = r(α^3) = α^7. Now S_2 = S_1^2 = α^9 and S_4 = S_1^4 = α^3. The system of equations becomes:
α^7 + α^9σ_1 + α^{12}σ_2 = 0
α^3 + α^7σ_1 + α^9σ_2 = 0,
which has the unique solution σ_1 = α^{12} and σ_2 = α^{13}. So the error-locator polynomial is
1 + α^{12}Z + α^{13}Z^2,
which has α^{−3} and α^{−10} as zeros. Hence e = (0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0) is the error and c = (0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0) is the codeword sent.
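Example 7.5.11 can be redone step by step in GAP. The sketch below is ours (it is not a GUAVA decoder); it assumes that GAP's Z(16) satisfies the same defining relation α^4 = 1 + α used above, which the first command tests.

a := Z(16);;
a^4 = a + a^0;
# true, so GAP's Z(16) plays the role of alpha
r := [0,1,0,1,1,0,0,1,0,0,0,0,0,1,0] * Z(2)^0;;
S := List([1..4], j -> Sum([0..14], i -> r[i+1] * a^(i*j)));;
# the syndromes S_1 = a^12, S_2 = a^9, S_3 = a^7, S_4 = a^3
M := [[S[2], S[3]], [S[1], S[2]]];;
sol := SolutionMat(M, [-S[3], -S[4]]);   # solves (7.1) for (sigma_1, sigma_2)
# [ a^12, a^13 ]
sigmaval := z -> a^0 + sol[1]*z + sol[2]*z^2;;
Filtered([0..14], i -> sigmaval(a^(-i)) = 0*Z(2));
# [ 3, 10 ], the error positions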
7.5.4 Closed formulas

Consider the system of equations (7.1) as linear in the unknowns σ_1, . . . , σ_w with coefficients in S_1, . . . , S_{2w}. Then σ_i = ∆_i/∆_0, where ∆_i is the determinant of a certain w × w matrix according to Cramer's rule. Then the ∆_i are polynomials in the S_i. Conclude that

det | 1        Z         . . .  Z^w |
    | S_{w+1}  S_w       . . .  S_1 |   =  ∆_0 + ∆_1Z + · · · + ∆_wZ^w
    | ...      ...       · · ·  ... |
    | S_{2w}   S_{2w−1}  . . .  S_w |

is a closed formula for the generic error-locator polynomial. Notice that the constant coefficient of the generic error-locator polynomial is not 1.

Example 7.5.12 Consider the narrow-sense BCH code with designed minimum distance 5. Then {1, 2, 3, 4} is the defining set, so the syndromes S_1, S_2, S_3 and S_4 of a received word are known. We have to solve the system of equations
S_2σ_1 + S_1σ_2 = −S_3
S_3σ_1 + S_2σ_2 = −S_4.
Now Cramer's rule gives that
σ_1 = (S_1S_4 − S_2S_3)/(S_2^2 − S_1S_3), and similarly σ_2 = (S_3^2 − S_2S_4)/(S_2^2 − S_1S_3).
The generic error-locator polynomial is

det | 1    Z    Z^2 |
    | S_3  S_2  S_1 |   =  (S_2^2 − S_1S_3) + (S_1S_4 − S_2S_3)Z + (S_3^2 − S_2S_4)Z^2.
    | S_4  S_3  S_2 |

In the binary case we have that S_2 = S_1^2 and S_4 = S_1^4. So
S_2^2 + S_1S_3 = S_1^4 + S_1S_3 = S_1(S_1^3 + S_3),
S_1S_4 + S_2S_3 = S_1^5 + S_1^2S_3 = S_1^2(S_1^3 + S_3),
S_3^2 + S_2S_4 = S_3^2 + S_1^6 = (S_1^3 + S_3)^2.
Hence the generic error-locator polynomial becomes, after division by S_1^3 + S_3,
S_1 + S_1^2Z + (S_1^3 + S_3)Z^2.
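As a quick consistency check (our own sketch; GAP prints elements of F_16 as powers of Z(2^4)), the binary closed formula reproduces the locator polynomial found by APGZ in Example 7.5.11:

a := Z(16);;
S1 := a^12;; S3 := a^7;;             # the known syndromes of Example 7.5.11
c0 := S1;; c1 := S1^2;; c2 := S1^3 + S3;;
[c1/c0, c2/c0];
# [ a^12, a^13 ]: dividing by the constant coefficient c0 gives
# sigma_1 and sigma_2 as computed by APGZ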
Example 7.5.13 Let C be the narrow sense BCH code over F_16 of length 15 and designed minimum distance 5 as in Example 7.3.14. Let
r = (α^5, α^8, α^{11}, α^{10}, α^{10}, α^7, α^{12}, α^{11}, 1, α, α^{12}, α^{14}, α^{12}, α^2, 0)
be a received word with respect to the code C with 2 errors. Then S_1 = α^{12}, S_2 = α^7, S_3 = 0 and S_4 = α^2. The formulas S_2 = S_1^2 and S_4 = S_1^4 of Example 7.5.11 are no longer valid, since this code is defined over F_16 instead of F_2. By the formulas in Example 7.5.12 the error-locator polynomial is
1 + Z + α^{10}Z^2,
which has α^{−2} and α^{−8} as zeros. In this case the error positions are known, but the error values need some extra computation, since the values are not binary. This could be done by considering the error positions as erasures along the lines of Section 6.2.2. The next section gives an alternative with Forney's formula.

7.5.5 Key equation and Forney's formula

Consider the narrow sense BCH code C with designed minimum distance δ. So the defining set is {1, . . . , δ − 1}. Let c(x) ∈ C be the transmitted codeword. Let r(x) = c(x) + e(x) be the received word with error e(x). Suppose that the number of errors w = wt(e(x)) is at most (δ − 1)/2. The support of e(x) will be denoted by I, that is, e_i ≠ 0 if and only if i ∈ I. So the error-locator polynomial is
σ(Z) = ∏_{i∈I} (1 − α^iZ),
with coefficients σ_0 = 1, σ_1, . . . , σ_w.

Definition 7.5.14 The syndromes are S_j = r(α^j) for 1 ≤ j ≤ δ − 1. The syndrome polynomial S(Z) is defined by
S(Z) = Σ_{j=1}^{δ−1} S_jZ^{j−1}.

Remark 7.5.15 The syndrome S_j is equal to e(α^j), since c(α^j) = 0, for all j = 1, . . . , δ − 1. Furthermore 2w ≤ δ − 1. The Newton identities
S_k + σ_1S_{k−1} + · · · + σ_wS_{k−w} = 0 for k = w + 1, . . . , 2w
imply that the (k − 1)st coefficient of σ(Z)S(Z) is zero for all k = w + 1, . . . , 2w, since
σ(Z)S(Z) = Σ_k ( Σ_{i+j=k} σ_iS_j ) Z^{k−1}.
Hence there exist polynomials q(Z) and r(Z) such that
σ(Z)S(Z) = r(Z) + q(Z)Z^{2w}, with deg(r(Z)) < w.
In the following we will identify the remainder r(Z).
Definition 7.5.16 The error-evaluator polynomial ω(Z) is defined by
ω(Z) = Σ_{i∈I} e_iα^i ∏_{j∈I, j≠i} (1 − α^jZ).

Proposition 7.5.17 Let σ′(Z) be the formal derivative of σ(Z). Then the error values are given by Forney's formula:
e_l = −ω(α^{−l})/σ′(α^{−l})
for all error positions α^l.

Proof. Differentiating σ(Z) = ∏_{i∈I} (1 − α^iZ) gives
σ′(Z) = Σ_{i∈I} −α^i ∏_{j∈I, j≠i} (1 − α^jZ).
Hence
σ′(α^{−l}) = −α^l ∏_{j∈I, j≠l} (1 − α^{j−l}),
which is not zero. Substitution of α^{−l} in ω(Z) gives ω(α^{−l}) = −e_lσ′(α^{−l}).

Remark 7.5.18 The polynomial σ(Z) has simple zeros. Hence β is not a zero of σ′(Z) if β is a zero of σ(Z), by Lemma 7.2.8. So the denominator in Proposition 7.5.17 is not zero. This proposition implies that β is not a zero of ω(Z) if β is a zero of σ(Z). Hence the greatest common divisor of σ(Z) and ω(Z) is one.

Proposition 7.5.19 The error-locator polynomial σ(Z) and the error-evaluator polynomial ω(Z) satisfy the Key equation:
σ(Z)S(Z) ≡ ω(Z) (mod Z^{δ−1}).     (7.2)
Moreover, if (σ_1(Z), ω_1(Z)) is another pair of polynomials that satisfy the Key equation and such that deg ω_1(Z) < deg σ_1(Z) ≤ (δ − 1)/2, then there exists a polynomial λ(Z) such that σ_1(Z) = λ(Z)σ(Z) and ω_1(Z) = λ(Z)ω(Z).

Proof. We have that S_j = r(α^j) = e(α^j) for all j = 1, 2, . . . , δ − 1. Using the definitions, interchanging summations and the sum formula for a geometric series we get
S(Z) = Σ_{j=1}^{δ−1} e(α^j)Z^{j−1} = Σ_{j=1}^{δ−1} Σ_{i∈I} e_iα^{ij}Z^{j−1} = Σ_{i∈I} e_iα^i Σ_{j=1}^{δ−1} (α^iZ)^{j−1} = Σ_{i∈I} e_iα^i (1 − (α^iZ)^{δ−1})/(1 − α^iZ).
Hence
σ(Z)S(Z) = ∏_{j∈I} (1 − α^jZ) S(Z) = Σ_{i∈I} e_iα^i (1 − (α^iZ)^{δ−1}) ∏_{j∈I, j≠i} (1 − α^jZ).
Therefore
σ(Z)S(Z) ≡ Σ_{i∈I} e_iα^i ∏_{j∈I, j≠i} (1 − α^jZ) ≡ ω(Z) (mod Z^{δ−1}).
Suppose that we have another pair (σ_1(Z), ω_1(Z)) such that σ_1(Z)S(Z) ≡ ω_1(Z) (mod Z^{δ−1}) and deg ω_1(Z) < deg σ_1(Z) ≤ (δ − 1)/2. Then σ(Z)ω_1(Z) ≡ σ_1(Z)ω(Z) (mod Z^{δ−1}), and the degrees of σ(Z)ω_1(Z) and σ_1(Z)ω(Z) are strictly smaller than δ − 1. Hence σ(Z)ω_1(Z) = σ_1(Z)ω(Z). The greatest common divisor of σ(Z) and ω(Z) is one by Remark 7.5.18. Therefore there exists a polynomial λ(Z) such that σ_1(Z) = λ(Z)σ(Z) and ω_1(Z) = λ(Z)ω(Z).

Remark 7.5.20 In Remark 7.5.15 it is shown that the Newton identities give the Key equation σ(Z)S(Z) ≡ r(Z) (mod Z^{δ−1}). In Proposition 7.5.19 a new proof of the Key equation is given, where the remainder r(Z) is identified as the error-evaluator polynomial ω(Z). Conversely, the Newton identities can be derived from this second proof.

Example 7.5.21 Let C be the narrow sense BCH code of length 15 over F_16 of designed minimum distance 5 and let r be the received word as in Example 7.5.13. The error-locator polynomial is σ(Z) = 1 + Z + α^{10}Z^2, which has α^{−2} and α^{−8} as zeros. The syndrome polynomial is S(Z) = α^{12} + α^7Z + α^2Z^3. Then
σ(Z)S(Z) = α^{12} + α^2Z + α^2Z^4 + α^{12}Z^5.
Proposition 7.5.19 implies ω(Z) ≡ σ(Z)S(Z) ≡ α^{12} + α^2Z (mod Z^4). Hence ω(Z) = α^{12} + α^2Z, since deg(ω(Z)) < deg(σ(Z)) = 2. Furthermore σ′(Z) = 1. The error values are therefore e_2 = ω(α^{−2}) = α^{11} and e_8 = ω(α^{−8}) = α^8 by Proposition 7.5.17.

Remark 7.5.22 Consider the BCH code C with {b, b + 1, . . . , b + δ − 2} as defining set. The syndromes are S_j = e(α^j) for b ≤ j ≤ b + δ − 2. Adapt the above definitions as follows. The syndrome polynomial S(Z) is defined by
S(Z) = Σ_{j=b}^{b+δ−2} S_jZ^{j−b}.
Remark 7.5.22 Consider the BCH code $C$ with $\{b, b+1, \ldots, b+\delta-2\}$ as defining set. The syndromes are $S_j = e(\alpha^j)$ for $b \leq j \leq b+\delta-2$. Adapt the above definitions as follows. The syndrome polynomial $S(Z)$ is defined by
$$S(Z) = \sum_{j=b}^{b+\delta-2} S_j Z^{j-b},$$
and the error-evaluator polynomial $\omega(Z)$ is defined by
$$\omega(Z) = \sum_{i \in I} e_i \alpha^{ib} \prod_{j \in I,\, j \neq i} (1 - \alpha^j Z).$$
Show that the error-locator polynomial $\sigma(Z)$ and the error-evaluator polynomial $\omega(Z)$ satisfy the Key equation $\sigma(Z)S(Z) \equiv \omega(Z) \pmod{Z^{\delta-1}}$, and that the error values are given by Forney's formula
$$e_i = -\frac{\omega(\alpha^{-i})}{\alpha^{i(b-1)}\sigma'(\alpha^{-i})}$$
for all error positions $i$.

7.5.6 Exercises

7.5.1 Consider $A(Z) = 2 + 6Z + 2Z^2 + 5Z^3$ in $\mathbb{F}_7[Z]$. Show that $A(Z)$ is the MS polynomial of a codeword $a(x)$ of a cyclic code of length 6 over $\mathbb{F}_7$ with primitive element $\alpha = 3$. Compute the zeros and coefficients of $a(x)$.

7.5.2 Give a proof of Proposition 7.5.7.

7.5.3 In case $w = 2$ we have that $\sigma_1 = -(x_1 + x_2)$, $\sigma_2 = x_1 x_2$ and $A_i = y_1 x_1^i + y_2 x_2^i$. Substitute these formulas in the Newton identities in order to check their validity.

7.5.4 Let $C$ be the binary narrow sense BCH code of length 15 and designed minimum distance 5 as in Example 7.5.11. Let $\mathbf{r} = (1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1)$ be a received word with respect to the code $C$ with 2 errors. Find the codeword sent.

7.5.5 Consider the narrow-sense BCH code with designed minimum distance 7. Then the syndromes $S_1, S_2, \ldots, S_6$ of a received word are known. Compute the coefficients of the generic error-locator polynomial. Show that in the binary case the generic error-locator polynomial becomes
$$(S_3 + S_1^3) + (S_1 S_3 + S_1^4)Z + (S_5 + S_1^2 S_3)Z^2 + (S_3^2 + S_1 S_5 + S_1^3 S_3 + S_1^6)Z^3,$$
by using $S_2 = S_1^2$, $S_4 = S_1^4$ and $S_6 = S_3^2$, and after division by the common factor $S_3^2 + S_1 S_5 + S_1^3 S_3 + S_1^6$.

7.5.6 Let $C$ be the narrow sense BCH code of length 15 over $\mathbb{F}_{16}$ of designed minimum distance 5 as in Examples 7.5.13 and 7.5.21. Let
$$\mathbf{r} = (\alpha^8, \alpha^7, \alpha, \alpha^{11}, \alpha^3, \alpha^5, \alpha^{10}, \alpha^{11}, \alpha^{10}, \alpha^7, \alpha^4, \alpha^{10}, 0, 1, \alpha^5)$$
be a received word with respect to the code $C$ with 2 errors. Find the error positions. Determine the error values by Forney's formula.

7.5.7 Show the validity of the Key equation and Forney's formula as claimed in Remark 7.5.22.
7.6 Notes

6.4.2: iterated HT bound, iterated Roos bound. 6.4.3: symmetric Roos bound. In many cases of binary codes of length at most 62 the shift bound is equal to the minimum distance, see [?]. For about 95% of all ternary codes of length at most 40 the shift bound is equal to the minimum distance, see [?]. In a discussion with B.-Z. Shen we came to the following generalization of independent sets and the shift bound; see also Shen and Tzeng [?] and Augot, Charpin and Sendrier [?] on generalized Newton identities. Lemma 7.4.37 is a generalization of a theorem of van Lint and Wilson [?, Theorem 11]. Generalization of the shift bound for linear codes. Linear complexity and the pseudo rank bound. Shift bound for generalized Hamming weights. Conjecture of (non)existence of asymptotically good cyclic codes, Assmus and Turyn 1966. ***Blahut's theorem, Massey in Festschrift on DFT and MS polynomial.*** Fundamental iterative algorithm.
Chapter 8

Polynomial codes

Ruud Pellikaan ****

8.1 RS codes and their generalizations

Reed-Solomon codes will be introduced as special cyclic codes. We will show that these codes are MDS and can be obtained by evaluating certain polynomials. This gives rise to a generalization of these codes. Fractional transformations are defined and related to the automorphism group of generalized Reed-Solomon codes.

8.1.1 Reed-Solomon codes

Consider the following definition of Reed-Solomon codes over the finite field $\mathbb{F}_q$.

Definition 8.1.1 Let $\alpha$ be a primitive element of $\mathbb{F}_q$. Let $n = q - 1$. Let $b$ and $k$ be non-negative integers such that $0 \leq b, k \leq n$. Define the generator polynomial $g_{b,k}(X)$ by
$$g_{b,k}(X) = (X - \alpha^b) \cdots (X - \alpha^{b+n-k-1}).$$
The Reed-Solomon (RS) code $\mathrm{RS}_k(n, b)$ is by definition the $q$-ary cyclic code with generator polynomial $g_{b,k}(X)$. In the literature the code is also denoted by $\mathrm{RS}_b(n, k)$.

Proposition 8.1.2 The code $\mathrm{RS}_k(n, b)$ has length $n = q - 1$, is cyclic, linear and MDS of dimension $k$. The dual of $\mathrm{RS}_k(n, b)$ is equal to $\mathrm{RS}_{n-k}(n, n-b+1)$.

Proof. The code $\mathrm{RS}_k(n, b)$ is of length $q - 1$, cyclic and linear by definition. The degree of the generator polynomial is $n - k$, so the dimension of the code is $k$ by Proposition 7.1.21. The complete defining set is $\{b, b+1, \ldots, b+n-k-1\}$
and has $n - k$ consecutive elements. Hence the minimum distance $d$ is at least $n - k + 1$ by the BCH bound of Proposition 7.3.3. The generator polynomial $g_{b,k}(X)$ has degree $n - k$, so $g_{b,k}(x)$ is a codeword of weight at most $n - k + 1$. Hence $d$ is at most $n - k + 1$. Also the Singleton bound gives that $d$ is at most $n - k + 1$. Hence $d = n - k + 1$ and the code is MDS. Another proof that the parameters are $[n, k, n-k+1]$ will be given in Proposition 8.1.14.
The complete defining set of $\mathrm{RS}_k(n, b)$ is the subset $U$ consisting of $n - k$ consecutive elements: $U = \{b, b+1, \ldots, b+n-k-1\}$. Hence $\mathbb{Z}_n \setminus \{-i \mid i \in U\}$ is the complete defining set of the dual of $\mathrm{RS}_k(n, b)$ by Proposition 7.2.58. But
$$\mathbb{Z}_n \setminus \{-i \mid i \in U\} = \mathbb{Z}_n \setminus \{n-(b+n-k-1), \ldots, n-(b+1), n-b\} = \{n-b+1, n-b+2, \ldots, n-b+k\}$$
is the complete defining set of $\mathrm{RS}_{n-k}(n, n-b+1)$.

Another description of RS codes will be given by evaluating polynomials.

Definition 8.1.3 Let $f(X) \in \mathbb{F}_q[X]$. Let $\mathrm{ev}(f(X))$ be the evaluation of $f(X)$ defined by
$$\mathrm{ev}(f(X)) = (f(1), f(\alpha), \ldots, f(\alpha^{n-1})).$$

Proposition 8.1.4 We have that
$$\mathrm{RS}_k(n, b) = \{\, \mathrm{ev}(X^{n-b+1} f(X)) \mid f(X) \in \mathbb{F}_q[X],\ \deg(f) < k \,\}.$$

Proof. The dual of $\mathrm{RS}_k(n, b)$ is $\mathrm{RS}_{n-k}(n, n-b+1)$ by Proposition 8.1.2, which has $\{n-b+1, \ldots, n-b+k\}$ as complete defining set. So $\mathrm{RS}_{n-k}(n, n-b+1)$ has
$$H = (\alpha^{ij} \mid n-b+1 \leq i \leq n-b+k,\ 0 \leq j \leq n-1)$$
as parity check matrix, by Remark 7.3.2 and the proof of Proposition 7.3.3. That means that $H$ is a generator matrix of $\mathrm{RS}_k(n, b)$. The rows of $H$ are $\mathrm{ev}(X^i)$ for $n-b+1 \leq i \leq n-b+k$. So they generate the space $\{\, \mathrm{ev}(X^{n-b+1} f(X)) \mid \deg(f) < k \,\}$.

Example 8.1.5 Consider $\mathrm{RS}_3(7, 1)$. It is a cyclic code over $\mathbb{F}_8$ with generator polynomial
$$g_{1,3}(X) = (X - \alpha)(X - \alpha^2)(X - \alpha^3)(X - \alpha^4),$$
where $\alpha$ is a primitive element of $\mathbb{F}_8$ satisfying $\alpha^3 = \alpha + 1$. Then
$$g_{1,3}(X) = \alpha^3 + \alpha X + X^2 + \alpha^3 X^3 + X^4.$$
In the second description we have that
$$\mathrm{RS}_3(7, 1) = \{\, \mathrm{ev}(f(X)) \mid f(X) \in \mathbb{F}_8[X],\ \deg(f) < 3 \,\}.$$
The matrix in Exercise 7.1.5 is obtained by evaluating the monomials $1$, $X$ and $X^2$ at $\alpha^j$ for $j = 0, 1, \ldots, 6$. It is a generator matrix of $\mathrm{RS}_3(7, 1)$.
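For concreteness, here is a small Python sketch (our own helpers, not the book's notation) that builds $g_{b,k}(X)$ over $\mathbb{F}_8$ with $\alpha^3 = \alpha + 1$ and reproduces the generator polynomial of Example 8.1.5.

```python
# Sketch: the generator polynomial g_{b,k}(X) of RS_k(n, b) over GF(8).
EXP = [0] * 14
x = 1
for i in range(7):
    EXP[i] = EXP[i + 7] = x
    x = (x << 1) ^ (0xB if x & 0x4 else 0)   # alpha^3 = alpha + 1
LOG = {EXP[i]: i for i in range(7)}
mul = lambda a, b: 0 if 0 in (a, b) else EXP[LOG[a] + LOG[b]]

def poly_mul(p, q):                  # coefficient lists, lowest degree first
    r = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] ^= mul(pi, qj)
    return r

def gen_poly(n, k, b):
    g = [1]
    for i in range(b, b + n - k):
        g = poly_mul(g, [EXP[i % 7], 1])   # factor X - alpha^i (minus = plus in char. 2)
    return g

print([LOG[c] for c in gen_poly(7, 3, 1)])
# expected logs [3, 1, 0, 3, 0]: alpha^3 + alpha X + X^2 + alpha^3 X^3 + X^4
```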
8.1.2 Extended and generalized RS codes

Definition 8.1.6 The extended RS code $\mathrm{ERS}_k(n, b)$ is the extension of the code $\mathrm{RS}_k(n, b)$.

The code $\mathrm{ERS}_k(n, 1)$ also has a description by means of evaluations.

Proposition 8.1.7 We have that
$$\mathrm{ERS}_k(n, 1) = \{\, (f(1), f(\alpha), \ldots, f(\alpha^{n-1}), f(0)) \mid f(X) \in \mathbb{F}_q[X],\ \deg(f) < k \,\}.$$

Proof. If $C$ is a code of length $n$, then by Definition 3.1.6 the extended code $C^e$ is given by $C^e = \{\, (\mathbf{c}, -\sum_{i=0}^{n-1} c_i) \mid \mathbf{c} \in C \,\}$. So we have to show that
$$f(0) + f(1) + f(\alpha) + \cdots + f(\alpha^{n-1}) = 0$$
for all polynomials $f(X) \in \mathbb{F}_q[X]$ of degree at most $k - 1$. By linearity it is enough to show that this is the case for all monomials of degree at most $k - 1$. Let $f(X)$ be the monomial $X^i$ with $0 \leq i < n$. Then
$$\sum_{j=0}^{n-1} f(\alpha^j) = \sum_{j=0}^{n-1} \alpha^{ij} = \begin{cases} n & \text{if } i = 0, \\ 0 & \text{if } 0 < i < n, \end{cases}$$
by Lemma 7.5.2. Now $n = q - 1 = -1$ in $\mathbb{F}_q$. So in both cases we have that this sum is equal to $-f(0)$.

Definition 8.1.8 Let $F$ be a field. Let $f(X) = f_0 + f_1 X + \cdots + f_k X^k$ be an element of $F[X]$ and $a \in F$. Then the evaluation of $f(X)$ at $a$ is given by $f(a) = f_0 + f_1 a + \cdots + f_k a^k$. Let $L_k = \{\, f(X) \in \mathbb{F}_q[X] \mid \deg f(X) \leq k \,\}$. The evaluation map $\mathrm{ev}_{k,a} : L_k \to F$ is given by $\mathrm{ev}_{k,a}(f(X)) = f(a)$. Furthermore the evaluation at infinity is defined by $\mathrm{ev}_{k,\infty}(f(X)) = f_k$.

Remark 8.1.9 The evaluation map is linear. Furthermore $\mathrm{ev}_{k,\infty}(f(X)) = 0$ if and only if $f(X)$ has degree at most $k - 1$, for all $f(X) \in L_k$. The map $\mathrm{ev}_{k,a}$ does not depend on $k$ if $a \in F$. The notation $f(\infty)$ will be used instead of $\mathrm{ev}_{k,\infty}(f(X))$, but notice that this depends on $k$ and the implicit assumption that $f(X)$ has degree at most $k$.

Definition 8.1.10 Let $n$ be an arbitrary integer such that $1 \leq n \leq q$. Let $\mathbf{a}$ be an $n$-tuple of mutually distinct elements of $\mathbb{F}_q \cup \{\infty\}$. Let $\mathbf{b}$ be an $n$-tuple of nonzero elements of $\mathbb{F}_q$. Let $k$ be an arbitrary integer such that $0 \leq k \leq n$. The generalized RS code $\mathrm{GRS}_k(\mathbf{a}, \mathbf{b})$ is defined by
$$\mathrm{GRS}_k(\mathbf{a}, \mathbf{b}) = \{\, (f(a_1)b_1, f(a_2)b_2, \ldots, f(a_n)b_n) \mid f(X) \in \mathbb{F}_q[X],\ \deg(f) < k \,\},$$
where $f(a_j)$ for $a_j = \infty$ is taken as $\mathrm{ev}_{k-1,\infty}(f) = f_{k-1}$, see Remark 8.1.9.
The following two examples show that the generalized RS codes are indeed generalizations of both RS codes and extended RS codes.

Example 8.1.11 Let $\alpha$ be a primitive element of $\mathbb{F}_q^*$. Let $n = q - 1$. Define $a_j = \alpha^{j-1}$ and $b_j = a_j^{n-b+1}$ for $j = 1, \ldots, n$. Then $\mathrm{RS}_k(n, b) = \mathrm{GRS}_k(\mathbf{a}, \mathbf{b})$.

Example 8.1.12 Let $\alpha$ be a primitive element of $\mathbb{F}_q^*$. Let $n = q$. Let $a_1 = 0$ and $b_1 = 1$. Define $a_j = \alpha^{j-2}$ and $b_j = a_j^{n-b+1}$ for $j = 2, \ldots, n$. Then $\mathrm{ERS}_k(n, 1) = \mathrm{GRS}_k(\mathbf{a}, \mathbf{b})$.

Example 8.1.13 The BCH code over $\mathbb{F}_q$ with defining set $\{b, b+1, \ldots, b+\delta-2\}$ and length $n$ can be considered as a subfield subcode over $\mathbb{F}_q$ of a generalized RS code over $\mathbb{F}_{q^m}$, where $m$ is such that $n$ divides $q^m - 1$.

Proposition 8.1.14 Let $0 \leq k \leq n \leq q$. Then $\mathrm{GRS}_k(\mathbf{a}, \mathbf{b})$ is an $\mathbb{F}_q$-linear MDS code with parameters $[n, k, n-k+1]$.

Proof. Notice that a linear code $C$ stays linear under the linear map $\mathbf{c} \mapsto (b_1 c_1, \ldots, b_n c_n)$, and that the parameters remain the same if the $b_i$ are all nonzero. Hence we may assume without loss of generality that $\mathbf{b}$ is the all-ones vector.
Consider the evaluation map $\mathrm{ev}_{k-1,\mathbf{a}} : L_{k-1} \to \mathbb{F}_q^n$ defined by $\mathrm{ev}_{k-1,\mathbf{a}}(f(X)) = (f(a_1), f(a_2), \ldots, f(a_n))$. This map is linear and $L_{k-1}$ is a linear space of dimension $k$. Furthermore $\mathrm{GRS}_k(\mathbf{a}, \mathbf{b})$ is the image of $L_{k-1}$ under $\mathrm{ev}_{k-1,\mathbf{a}}$.
Suppose that $a_j \in \mathbb{F}_q$ for all $j$. Let $f(X) \in L_{k-1}$ with $\mathrm{ev}_{k-1,\mathbf{a}}(f(X)) = 0$. Then $f(X)$ is of degree at most $k - 1$ and has $n$ zeros. But $k - 1 < n$ by assumption. So $f(X)$ is the zero polynomial. Hence the map $\mathrm{ev}_{k-1,\mathbf{a}}$ is injective on $L_{k-1}$, and $\mathrm{GRS}_k(\mathbf{a}, \mathbf{b})$ has the same dimension $k$ as $L_{k-1}$.
Let $\mathbf{c}$ be a nonzero codeword of $\mathrm{GRS}_k(\mathbf{a}, \mathbf{b})$ of weight $d$. Then there exists a nonzero polynomial $f(X)$ of degree at most $k - 1$ such that $\mathrm{ev}_{k-1,\mathbf{a}}(f(X)) = \mathbf{c}$. The zeros of $f(X)$ among the $a_1, \ldots, a_n$ correspond to the zero coordinates of $\mathbf{c}$. So the number of zeros of $f(X)$ among the $a_1, \ldots, a_n$ is equal to the number of zero coordinates of $\mathbf{c}$, which is $n - d$. Hence $n - d \leq \deg f(X) \leq k - 1$, that is, $d \geq n - k + 1$. The evaluation of the polynomial $f(X) = \prod_{i=1}^{k-1}(X - a_i)$ gives an explicit codeword of weight $n - k + 1$. Also the Singleton bound gives that $d \leq n - k + 1$. Therefore the minimum distance of the generalized RS code is equal to $n - k + 1$ and the code is MDS.
In case $a_j = \infty$ for some $j$, we have $a_i \in \mathbb{F}_q$ for all $i \neq j$. Now $f(a_j) = 0$ implies that the degree of $f(X)$ is at most $k - 2$. So the above proof applies to the remaining $n - 1$ elements and polynomials of degree at most $k - 2$.

Remark 8.1.15 The monomials $1, X, \ldots, X^{k-1}$ form a basis of $L_{k-1}$. Suppose that $a_j \in \mathbb{F}_q$ for all $j$. Then evaluating these monomials gives a generator matrix with entries $a_j^{i-1} b_j$ of the code $\mathrm{GRS}_k(\mathbf{a}, \mathbf{b})$. If $\mathbf{b}$ is the all-ones vector, then the matrix $G_k(\mathbf{a})$ of Proposition 3.2.10 is a generator matrix of $\mathrm{GRS}_k(\mathbf{a}, \mathbf{b})$. If $a_j = \infty$, then $\mathrm{ev}_{k-1,a_j}(b_j X^{i-1}) = 0$ for all $i \leq k - 1$ and $\mathrm{ev}_{k-1,a_j}(b_j X^{k-1}) = b_j$. Hence $(0, \ldots, 0, b_j)^T$ is the corresponding column vector of the generator matrix.
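The evaluation description lends itself to a direct encoder. Below is a sketch (our own function names, over $\mathbb{F}_8$ as before, restricted to $a_j \in \mathbb{F}_q$) of the map $f \mapsto (f(a_1)b_1, \ldots, f(a_n)b_n)$ of Definition 8.1.10; Horner's rule keeps each evaluation to $k-1$ multiplications.

```python
# Sketch: encoding GRS_k(a, b) by evaluation, over GF(8) with alpha^3 = alpha + 1.
EXP = [0] * 14
x = 1
for i in range(7):
    EXP[i] = EXP[i + 7] = x
    x = (x << 1) ^ (0xB if x & 0x4 else 0)
LOG = {EXP[i]: i for i in range(7)}
mul = lambda a, b: 0 if 0 in (a, b) else EXP[LOG[a] + LOG[b]]

def grs_encode(f, a, b):
    """f = (f_0, ..., f_{k-1}) -> (f(a_1)b_1, ..., f(a_n)b_n)."""
    def ev(x):
        y = 0
        for c in reversed(f):        # Horner's rule
            y = mul(y, x) ^ c
        return y
    return [mul(ev(aj), bj) for aj, bj in zip(a, b)]

# RS_3(7, 1) = GRS_3(a, 1) with a_j = alpha^{j-1} and b the all-ones vector
# (Example 8.1.11 with b = 1, since then b_j = a_j^7 = 1):
a = [EXP[j] for j in range(7)]
print(grs_encode((1, 1, 0), a, [1] * 7))     # the codeword ev(1 + X)
```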
Remark 8.1.16 A generalized RS code is MDS by Proposition 8.1.14. So any $k$ positions can be used to encode systematically. That means that there is a generator matrix $G$ of the form $(I_k \mid P)$, where $I_k$ is the $k \times k$ identity matrix and $P$ a $k \times (n-k)$ matrix. The next proposition gives an explicit description of $P$.

Proposition 8.1.17 Let $\mathbf{b}$ be an $n$-tuple of nonzero elements of $\mathbb{F}_q$. Let $\mathbf{a}$ be an $n$-tuple of mutually distinct elements of $\mathbb{F}_q \cup \{\infty\}$. Define $[a_i, a_j] = a_i - a_j$, $[\infty, a_j] = 1$ and $[a_i, \infty] = -1$ for $a_i, a_j \in \mathbb{F}_q$. Then $\mathrm{GRS}_k(\mathbf{a}, \mathbf{b})$ has a generator matrix of the form $(I_k \mid P)$, where
$$p_{ij} = \frac{b_{j+k} \prod_{t=1, t \neq i}^{k} [a_{j+k}, a_t]}{b_i \prod_{t=1, t \neq i}^{k} [a_i, a_t]}$$
for $1 \leq i \leq k$ and $1 \leq j \leq n-k$.

Proof. Assume first that $\mathbf{b}$ is the all-ones vector. Let $\mathbf{g}_i$ be the $i$-th row of this generator matrix. Then this corresponds to a polynomial $g_i(X)$ of degree at most $k - 1$ such that $g_i(a_i) = 1$ and $g_i(a_t) = 0$ for all $1 \leq t \leq k$ with $t \neq i$. By the Lagrange Interpolation Theorem ?? there is a unique polynomial with these properties, and it is given by
$$g_i(X) = \frac{\prod_{t=1, t \neq i}^{k} (X - a_t)}{\prod_{t=1, t \neq i}^{k} [a_i, a_t]}.$$
Notice that if $a_i = \infty$, then $g_i(X)$ also satisfies the required conditions, since $[a_i, a_t] = [\infty, a_t] = 1$ by definition and $g_i(X)$ is a monic polynomial of degree $k - 1$, so $g_i(\infty) = 1$. Hence $p_{ij} = g_i(a_{j+k})$ is of the described form, also in case $a_{j+k} = \infty$.
For arbitrary $\mathbf{b}$ we have to multiply the $j$-th column of $G$ by $b_j$. In order to get the identity matrix back, the $i$-th row is divided by $b_i$.

Corollary 8.1.18 Let $(I_k \mid P)$ be the generator matrix of the code $\mathrm{GRS}_k(\mathbf{a}, \mathbf{b})$. Then
$$p_{iu} p_{jv} [a_i, a_{k+u}][a_j, a_{k+v}] = p_{ju} p_{iv} [a_j, a_{k+u}][a_i, a_{k+v}]$$
for all $1 \leq i, j \leq k$ and $1 \leq u, v \leq n-k$.

Proof. This is left as an exercise.

In Section 3.2.1 both generalized Reed-Solomon and Cauchy codes were introduced as examples of MDS codes. The following corollary shows that in fact these codes are the same.

Corollary 8.1.19 Let $\mathbf{a}$ be an $n$-tuple of mutually distinct elements of $\mathbb{F}_q$. Let $\mathbf{b}$ be an $n$-tuple of nonzero elements of $\mathbb{F}_q$. Let
$$c_i = \begin{cases} b_i \prod_{t=1, t \neq i}^{k} [a_i, a_t] & \text{if } 1 \leq i \leq k, \\ b_i \prod_{t=1}^{k} [a_i, a_t] & \text{if } k+1 \leq i \leq n. \end{cases}$$
Then $\mathrm{GRS}_k(\mathbf{a}, \mathbf{b}) = C_k(\mathbf{a}, \mathbf{c})$.
Proof. The generator matrix of $\mathrm{GRS}_k(\mathbf{a}, \mathbf{b})$ which is systematic at the first $k$ positions is of the form $(I_k \mid P)$ with $P$ as given in Proposition 8.1.17. Then
$$p_{ij} = \frac{c_{j+k} c_i^{-1}}{[a_{j+k}, a_i]}$$
for all $1 \leq i \leq k$ and $1 \leq j \leq n-k$. Hence $(I_k \mid P) = (I_k \mid A(\mathbf{a}, \mathbf{c}))$ is the generator matrix of the generalized Cauchy code $C_k(\mathbf{a}, \mathbf{c})$.

Remark 8.1.20 A generalized RS code is tight with respect to the Singleton bound $k + d \leq n + 1$, that is, it is an MDS code. Hence its dual is also MDS. In fact the next proposition shows that the dual of a generalized RS code is again a GRS code.

Proposition 8.1.21 Let $\mathbf{b}^\perp$ be the vector with entries
$$b_j^\perp = \frac{1}{b_j \prod_{i \neq j} [a_j, a_i]}$$
for $j = 1, \ldots, n$. Then $\mathrm{GRS}_{n-k}(\mathbf{a}, \mathbf{b}^\perp)$ is the dual code of $\mathrm{GRS}_k(\mathbf{a}, \mathbf{b})$.

Proof. Let $G = (I_k \mid P)$ be the generator matrix of $\mathrm{GRS}_k(\mathbf{a}, \mathbf{b})$ with $P$ as obtained in Proposition 8.1.17. In the same way $\mathrm{GRS}_{n-k}(\mathbf{a}, \mathbf{b}^\perp)$ has a generator matrix $H$ of the form $(Q \mid I_{n-k})$ with
$$q_{ij} = \frac{c_j \prod_{t=k+1, t \neq i+k}^{n} [a_j, a_t]}{c_{i+k} \prod_{t=k+1, t \neq i+k}^{n} [a_{i+k}, a_t]}$$
for $1 \leq i \leq n-k$ and $1 \leq j \leq k$. After substituting the values for $b_j^\perp$ and canceling the common terms in numerator and denominator we see that $Q = -P^T$. Hence $H$ is a parity check matrix of $\mathrm{GRS}_k(\mathbf{a}, \mathbf{b})$ by Proposition 2.3.30.

Example 8.1.22 This is a continuation of Example 8.1.11. Let $\mathbf{b}$ be the all-ones vector. Then $\mathrm{RS}_k(n, 1) = \mathrm{GRS}_k(\mathbf{a}, \mathbf{b})$ and $\mathrm{RS}_k(n, 0) = \mathrm{GRS}_k(\mathbf{a}, \mathbf{a})$. Furthermore the dual of $\mathrm{RS}_k(n, 1)$ is $\mathrm{RS}_{n-k}(n, 0)$ by Proposition 8.1.2. So $\mathrm{RS}_k(n, 1)^\perp = \mathrm{GRS}_{n-k}(\mathbf{a}, \mathbf{a})$. Alternatively, Proposition 8.1.21 gives that the dual of $\mathrm{GRS}_k(\mathbf{a}, \mathbf{b})$ is equal to $\mathrm{GRS}_{n-k}(\mathbf{a}, \mathbf{c})$ with $c_j = 1/\prod_{i \neq j}(a_j - a_i)$. We leave it as an exercise to show that $c_j = -a_j$ for all $j$.

Example 8.1.23 Consider the code $\mathrm{RS}_3(7, 1)$. Let $\alpha \in \mathbb{F}_8$ be an element with $\alpha^3 = 1 + \alpha$. Let $\mathbf{a}, \mathbf{b} \in \mathbb{F}_8^7$ with $a_i = \alpha^{i-1}$ and $\mathbf{b}$ the all-ones vector. Then $\mathrm{RS}_3(7, 1) = \mathrm{GRS}_3(\mathbf{a}, \mathbf{b})$ by Example 8.1.11. Let $(I_3 \mid P)$ be a generator matrix of this code. Let $g_1(X)$ be the quadratic polynomial such that $g_1(1) = 1$, $g_1(\alpha) = 0$ and $g_1(\alpha^2) = 0$. Then
$$g_1(X) = \frac{(X + \alpha)(X + \alpha^2)}{(1 + \alpha)(1 + \alpha^2)}.$$
Hence $g_1(\alpha^3) = \alpha^3$, $g_1(\alpha^4) = \alpha$, $g_1(\alpha^5) = 1$ and $g_1(\alpha^6) = \alpha^3$ are the entries of the first row of $P$. Continuing in this way we get
$$P = \begin{pmatrix} \alpha^3 & \alpha & 1 & \alpha^3 \\ \alpha^6 & \alpha^6 & 1 & \alpha^2 \\ \alpha^5 & \alpha^4 & 1 & \alpha^4 \end{pmatrix}.$$
The dual of $\mathrm{RS}_3(7, 1)$ is $\mathrm{RS}_4(7, 0)$ by Proposition 8.1.2, which is equal to $\mathrm{GRS}_4(\mathbf{a}, \mathbf{a})$. This is in agreement with Proposition 8.1.21, since $c_j = a_j$ for all $j$.
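A sketch (our helper names) that computes $P$ from the formula of Proposition 8.1.17, specialized to $\mathbf{b} = \mathbf{1}$ and all $a_j$ finite, reproducing the matrix of Example 8.1.23:

```python
# Sketch: the systematic part P of (I_k | P) per Proposition 8.1.17,
# over GF(8) with alpha^3 = alpha + 1, b = 1 and all a_j in GF(8).
EXP = [0] * 14
x = 1
for i in range(7):
    EXP[i] = EXP[i + 7] = x
    x = (x << 1) ^ (0xB if x & 0x4 else 0)
LOG = {EXP[i]: i for i in range(7)}
mul = lambda a, b: 0 if 0 in (a, b) else EXP[LOG[a] + LOG[b]]
inv = lambda a: EXP[(7 - LOG[a]) % 7]

def systematic_P(a, k):
    P = []
    for i in range(k):
        den = 1
        for t in range(k):
            if t != i:
                den = mul(den, a[i] ^ a[t])        # prod over t of [a_i, a_t]
        row = []
        for j in range(k, len(a)):
            num = 1
            for t in range(k):
                if t != i:
                    num = mul(num, a[j] ^ a[t])    # prod over t of [a_j, a_t]
            row.append(mul(num, inv(den)))
        P.append(row)
    return P

a = [EXP[j] for j in range(7)]
for row in systematic_P(a, 3):
    print([LOG[c] for c in row])
# expected logs, per Example 8.1.23: [3, 1, 0, 3], [6, 6, 0, 2], [5, 4, 0, 4]
```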
Remark 8.1.24 Let $\mathbf{b}$ be an $n$-tuple of nonzero elements of $\mathbb{F}_q$. Let $\mathbf{a}$ be an $n$-tuple of mutually distinct elements in $\mathbb{F}_q \cup \{\infty\}$ with $a_k = \infty$. Then $\mathrm{GRS}_k(\mathbf{a}, \mathbf{b})$ has a generator matrix of the form $(I_k \mid P)$, where
$$p_{ij} = \frac{c_{j+k} c_i^{-1}}{a_{j+k} - a_i} \quad \text{for } 1 \leq i \leq k-1,\ 1 \leq j \leq n-k, \qquad p_{kj} = c_{j+k} c_k^{-1} \quad \text{for } 1 \leq j \leq n-k,$$
with
$$c_i = \begin{cases} b_i \prod_{t=1, t \neq i}^{k-1} (a_i - a_t) & \text{if } 1 \leq i \leq k-1, \\ b_k & \text{if } i = k, \\ b_i \prod_{t=1}^{k-1} (a_i - a_t) & \text{if } k+1 \leq i \leq n, \end{cases}$$
by Corollary 8.1.19.

8.1.3 GRS codes under transformations

Proposition 8.1.25 Let $n \geq 2$. Let $\mathbf{a} \in \mathbb{F}_q^n$ consist of mutually distinct entries. Let $\mathbf{b}$ be an $n$-tuple of nonzero elements of $\mathbb{F}_q$. Let $1 \leq i, j \leq n$ and $i \neq j$. Then there exist a $\mathbf{b}'$ in $\mathbb{F}_q^n$ with nonzero entries and an $\mathbf{a}'$ in $\mathbb{F}_q^n$ consisting of mutually distinct entries such that $a_i' = 0$, $a_j' = 1$ and $\mathrm{GRS}_k(\mathbf{a}, \mathbf{b}) = \mathrm{GRS}_k(\mathbf{a}', \mathbf{b}')$.

Proof. We may assume without loss of generality that $\mathbf{b} = \mathbf{1}$. Consider the linear polynomials $l(X) = (X - a_i)/(a_j - a_i)$ and $m(X) = (a_j - a_i)X + a_i$. Then $l(m(X)) = X$ and $m(l(X)) = X$. Now $L_{k-1}$ is the vector space of all polynomials in the variable $X$ of degree at most $k-1$. The maps $\lambda, \mu : L_{k-1} \to L_{k-1}$ defined by $\lambda(f(X)) = f(l(X))$ and $\mu(g(X)) = g(m(X))$ are both linear and inverses of each other. Hence $\lambda$ and $\mu$ are automorphisms of $L_{k-1}$. Let $a_t' = l(a_t)$ for all $t$. Then the $a_t'$ are mutually distinct, since the $a_t$ are mutually distinct and $l(X)$ defines a bijection of $\mathbb{F}_q$. Furthermore $a_i' = l(a_i) = 0$ and $a_j' = l(a_j) = 1$. Now $\mathrm{ev}_{k-1,\mathbf{a}}(f(l(X)))$ is equal to
$$(f(l(a_1)), \ldots, f(l(a_n))) = (f(a_1'), \ldots, f(a_n')) = \mathrm{ev}_{k-1,\mathbf{a}'}(f(X)).$$
Finally $\mathrm{GRS}_k(\mathbf{a}', \mathbf{1}) = \{\mathrm{ev}_{k-1,\mathbf{a}'}(f(X)) \mid f(X) \in L_{k-1}\}$ and $\mathrm{GRS}_k(\mathbf{a}, \mathbf{1})$ is equal to $\{\mathrm{ev}_{k-1,\mathbf{a}}(g(X)) \mid g(X) \in L_{k-1}\} = \{\mathrm{ev}_{k-1,\mathbf{a}}(f(l(X))) \mid f(X) \in L_{k-1}\}$. Therefore $\mathrm{GRS}_k(\mathbf{a}, \mathbf{b}) = \mathrm{GRS}_k(\mathbf{a}', \mathbf{b}')$.
Remark 8.1.26 ***Introduction of GRS with $a_i = \infty$ as in Remark 8.1.15. Refer to the forthcoming section on AG codes on the projective line.*** We leave the proof of the fact that we may assume furthermore $a_3 = \infty$ as an exercise to the reader. For this one has to consider the fractional transformations
$$\frac{aX + b}{cX + d}, \quad \text{with } ad - bc \neq 0.$$
The set of fractional transformations with entries in a field $F$ forms a group with the composition of maps as group operation; determine the product and the inverse. Consider the map from $\mathrm{GL}(2, F)$ to the group of fractional transformations with entries in $F$ defined by
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \mapsto \frac{aX + b}{cX + d}.$$
Then this map is a morphism of groups, and the kernel of this map consists of the diagonal matrices $aI_2$ with $a \neq 0$.

Remark 8.1.27 ***Definition of the evaluation of a rational function.*** Let $\varphi(X)$ be a fractional transformation, $a \in \mathbb{F}_q \cup \{\infty\}$ and $f(X) \in F[X]$. Then
$$\mathrm{ev}_{k,\varphi(a)}(f(X)) = \mathrm{ev}_{k,a}(f(\varphi(X))).$$
This follows straightforwardly from the definitions in case $a$ is in $F$ and $a$ is not a zero of the denominator of $\varphi(X)$. ***Projective transformations of the projective line.***

Proposition 8.1.28 Let $n \geq 3$. Let $\mathbf{a}$ be an $n$-tuple of mutually distinct entries in $\mathbb{F}_q \cup \{\infty\}$. Let $\mathbf{b}$ be an $n$-tuple of nonzero elements of $\mathbb{F}_q$. Let $i$, $j$ and $l$ be three mutually distinct integers between 1 and $n$. Then there exist a $\mathbf{b}'$ in $\mathbb{F}_q^n$ with nonzero entries and an $n$-tuple $\mathbf{a}'$ consisting of mutually distinct entries in $\mathbb{F}_q \cup \{\infty\}$ such that $a_i' = 0$, $a_j' = 1$, $a_l' = \infty$ and $\mathrm{GRS}_k(\mathbf{a}, \mathbf{b}) = \mathrm{GRS}_k(\mathbf{a}', \mathbf{b}')$.

Proof. This is shown similarly to the proof of Proposition 8.1.25, using fractional transformations instead, and is left as an exercise.

Now suppose that a generator matrix of the code $\mathrm{GRS}_k(\mathbf{a}, \mathbf{b})$ is given. Is it possible to retrieve $\mathbf{a}$ and $\mathbf{b}$? The pair $(\mathbf{a}, \mathbf{b})$ is not unique by the action of the fractional transformations. The following proposition gives an answer to this question.

Proposition 8.1.29 Let $n \geq 3$. Let $\mathbf{a}$ and $\mathbf{a}'$ be $n$-tuples with mutually distinct entries in $\mathbb{F}_q \cup \{\infty\}$. Let $\mathbf{b}$ and $\mathbf{b}'$ be $n$-tuples of nonzero elements of $\mathbb{F}_q$. Let $i$, $j$ and $l$ be three mutually distinct integers between 1 and $n$. If $a_i = a_i'$, $a_j = a_j'$, $a_l = a_l'$ and $\mathrm{GRS}_k(\mathbf{a}, \mathbf{b}) = \mathrm{GRS}_k(\mathbf{a}', \mathbf{b}')$, then $\mathbf{a} = \mathbf{a}'$ and $\mathbf{b}' = \lambda \mathbf{b}$ for some nonzero $\lambda$ in $\mathbb{F}_q$.

Proof. The generalized RS code is MDS, so it is systematic at the first $k$ positions and it has a generator matrix of the form $(I_k \mid P)$ such that the entries of $P$ are nonzero. Let
$$\mathbf{c} = (p_{11}, \ldots, p_{k1}, 1, p_{k1}/p_{k2}, \ldots, p_{k1}/p_{k(n-k)}).$$
Let $G' = \mathbf{c} * (I_k \mid P)$. Then $G'$ is the generator matrix of a generalized equivalent code $C'$. Dividing the $i$-th row of $G'$ by $p_{i1}$ gives another generator matrix $G''$ of the same code $C'$ such that the $(k+1)$-th column of $G''$ is the all-ones vector and the $k$-th row is of the form $(0, \ldots, 0, 1, 1, \ldots, 1)$. So we may suppose without loss of generality that the generator matrix of the generalized RS code is of the form $(I_k \mid P)$ with $p_{i1} = 1$ for all $i = 1, \ldots, k$ and $p_{kj} = 1$ for all $j = 1, \ldots, n-k$.
After a permutation of the positions we may suppose without loss of generality that $l = k$, $i = k+1$ and $j = k+2$. After a fractional transformation we may assume that $a_{k+1} = a_{k+1}' = 0$, $a_{k+2} = a_{k+2}' = 1$ and $a_k = a_k' = \infty$ by Proposition 8.1.28.
Remark 8.1.24 gives that there exists an $n$-tuple $\mathbf{c}$ with nonzero entries in $\mathbb{F}_q$ such that
$$p_{ij} = \frac{c_{j+k} c_i^{-1}}{a_{j+k} - a_i} \quad \text{for } 1 \leq i \leq k-1,\ 1 \leq j \leq n-k, \qquad p_{kj} = c_{j+k} c_k^{-1} \quad \text{for } 1 \leq j \leq n-k.$$
Hence $p_{kj} = c_{k+j} c_k^{-1} = 1$. So $c_{k+j} = c_k$ for all $j = 1, \ldots, n-k$. Multiplying all entries of $\mathbf{c}$ by a nonzero constant gives the same code. Hence we may assume without loss of generality that $c_{k+j} = c_k = 1$ for all $j = 1, \ldots, n-k$. Therefore $c_j = 1$ for all $j \geq k$.
Let $i < k$. Then $p_{i1} = c_{k+1}/(c_i(a_{k+1} - a_i)) = 1$, $c_{k+1} = 1$ and $a_{k+1} = 0$. So $p_{i1} = -1/(a_i c_i) = 1$. Hence $a_i c_i = -1$. Likewise $p_{i2} = c_{k+2}/(c_i(a_{k+2} - a_i))$, $c_{k+2} = 1$ and $a_{k+2} = 1$. So
$$p_{i2} = \frac{1}{(1 - a_i)c_i} = \frac{1}{c_i + 1},$$
since $a_i c_i = -1$. Hence
$$c_i = \frac{1 - p_{i2}}{p_{i2}} \quad \text{and} \quad a_i = \frac{p_{i2}}{p_{i2} - 1} \quad \text{for all } i < k.$$
Finally $p_{ij} = c_{k+j}/(c_i(a_{k+j} - a_i))$ and $c_{k+j} = 1$. So $a_{k+j} - a_i = 1/(c_i p_{ij})$. Hence $a_{k+j} = a_i - a_i/p_{ij}$, since $a_i = -1/c_i$. Combining this with the expression for $a_i$ gives
$$a_{j+k} = \frac{p_{i2}}{p_{i2} - 1} \cdot \frac{p_{ij} - 1}{p_{ij}}.$$
Therefore $\mathbf{a}$ and $\mathbf{c}$ are uniquely determined. So also $\mathbf{b}$ is uniquely determined, since
$$b_i = \begin{cases} c_i / \prod_{t=1, t \neq i}^{k-1} (a_i - a_t) & \text{if } 1 \leq i \leq k-1, \\ c_k & \text{if } i = k, \\ c_i / \prod_{t=1}^{k-1} (a_i - a_t) & \text{if } k+1 \leq i \leq n, \end{cases}$$
by Remark 8.1.24.

***$\mathrm{PAut}(\mathrm{GRS}_k(\mathbf{a}, \mathbf{b})) = \ldots$ and $\mathrm{MAut}(\mathrm{GRS}_k(\mathbf{a}, \mathbf{b})) = \ldots$ What is the number of GRS codes?***
Example 8.1.30 Let $G$ be the generator matrix of a generalized Reed-Solomon code with entries in $\mathbb{F}_7$ given by
$$G = \begin{pmatrix} 6 & 1 & 1 & 6 & 2 & 2 & 3 \\ 3 & 4 & 1 & 1 & 5 & 4 & 3 \\ 1 & 0 & 3 & 3 & 6 & 0 & 1 \end{pmatrix}.$$
Then $\mathrm{rref}(G) = (I_3 \mid A)$ with
$$A = \begin{pmatrix} 1 & 3 & 3 & 6 \\ 4 & 4 & 6 & 6 \\ 3 & 1 & 6 & 3 \end{pmatrix}.$$
So we want to find a vector $\mathbf{a}$ consisting of mutually distinct entries in $\mathbb{F}_7 \cup \{\infty\}$ and $\mathbf{b}$ in $\mathbb{F}_7^7$ with nonzero entries such that $C = \mathrm{GRS}_3(\mathbf{a}, \mathbf{b})$. Now $C' = (1, 4, 3, 1, 5, 5, 6) * C$ has a generator matrix of the form $(I_3 \mid A')$ with
$$A' = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 5 & 4 & 2 \\ 1 & 4 & 3 & 6 \end{pmatrix}.$$
We may assume without loss of generality that $a_4 = 0$, $a_5 = 1$ and $a_3 = \infty$ by Proposition 8.1.28. ***...***

8.1.4 Exercises

8.1.1 Show that in $\mathrm{RS}_3(7, 1)$ the generating codeword $g_{1,3}(x)$ is equal to $\alpha\,\mathrm{ev}(1) + \alpha^5\,\mathrm{ev}(X) + \alpha^4\,\mathrm{ev}(X^2)$.

8.1.2 Compute the parity check polynomial of $\mathrm{RS}_3(7, 1)$ and the generator polynomial of $\mathrm{RS}_3(7, 1)^\perp$ by means of Proposition 7.1.37, and verify that it is equal to $g_{0,4}(X)$ in accordance with Proposition 8.1.2.

8.1.3 Give the generator matrix of $\mathrm{RS}_4(7, 1)$ of the form $(I_4 \mid P)$, where $P$ is a $4 \times 3$ matrix.

8.1.4 Show directly, that is, without the use of Proposition 8.1.4, that the code $\{\, \mathrm{ev}(X^{n-b+1} f(X)) \mid \deg(f) < k \,\}$ is cyclic.

8.1.5 Give another proof of the fact stated in Proposition 8.1.2 that the dual of $\mathrm{RS}_k(n, b)$ is equal to $\mathrm{RS}_{n-k}(n, n-b+1)$, using the description with evaluations of Proposition 8.1.4 and showing that the inner product of codewords of the two codes is zero.

8.1.6 Let $n = q - 1$. Let $a_1, \ldots, a_n$ be an enumeration of the elements of $\mathbb{F}_q^*$. Show that $\prod_{i \neq j}(a_j - a_i) = -1/a_j$ for all $j$.

8.1.7 Consider $\alpha \in \mathbb{F}_8$ with $\alpha^3 = 1 + \alpha$. Let $\mathbf{a} = (a_1, \ldots, a_7)$ with $a_i = \alpha^{i-1}$ for $1 \leq i \leq 7$. Let $\mathbf{b} = (1, \alpha^2, \alpha^4, \alpha^2, 1, 1, \alpha^4)$. Find $\mathbf{c}$ such that the dual of $\mathrm{GRS}_k(\mathbf{a}, \mathbf{b})$ is equal to $\mathrm{GRS}_{7-k}(\mathbf{a}, \mathbf{c})$ for all $k$.

8.1.8 Determine all values of $n$, $k$ and $b$ such that $\mathrm{RS}_k(n, b)$ is self-dual.
8.1.9 Give a proof of Corollary 8.1.18.

8.1.10 Let $n \leq q$. Let $\mathbf{a}$ be an $n$-tuple of mutually distinct elements of $\mathbb{F}_q$, and $\mathbf{r}$ an $n$-tuple of nonzero elements of $\mathbb{F}_q$. Let $k$ be an integer such that $0 \leq k \leq n$. Show that the generalized Cauchy code $C_k(\mathbf{a}, \mathbf{r})$ is equal to $\mathbf{r} * C_k(\mathbf{a})$.

8.1.11 Give a proof of the statements made in Remark 8.1.26.

8.1.12 Let $u$, $v$ and $w$ be three mutually distinct elements of a field $F$. Show that there is a unique fractional transformation $\varphi$ such that $\varphi(u) = 0$, $\varphi(v) = 1$ and $\varphi(w) = \infty$.

8.1.13 Give a proof of Proposition 8.1.28.

8.1.14 Let $\alpha \in \mathbb{F}_8$ be a primitive element such that $\alpha^3 = \alpha + 1$. Let $G$ be the generator matrix of a generalized Reed-Solomon code given by
$$G = \begin{pmatrix} \alpha^6 & \alpha^6 & \alpha & 1 & \alpha^4 & 1 & \alpha^4 \\ 0 & \alpha^3 & \alpha^3 & \alpha^4 & \alpha^6 & \alpha^6 & \alpha^4 \\ \alpha^4 & \alpha^5 & \alpha^3 & 1 & \alpha^2 & 0 & \alpha^6 \end{pmatrix}.$$
(1) Find $\mathbf{a}$ in $\mathbb{F}_8^7$ consisting of mutually distinct entries and $\mathbf{b}$ in $\mathbb{F}_8^7$ with nonzero entries such that $G$ is a generator matrix of $\mathrm{GRS}_3(\mathbf{a}, \mathbf{b})$.
(2) Consider the $3 \times 7$ generator matrix $G'$ of the code $\mathrm{RS}_3(7, 1)$ with entry $\alpha^{(i-1)(j-1)}$ in the $i$-th row and the $j$-th column. Give an invertible $3 \times 3$ matrix $S$ and a permutation matrix $P$ such that $G' = SGP$.
(3) What is the number of pairs $(S, P)$ of such matrices?

8.2 Subfield and trace codes

***

8.2.1 Restriction and extension by scalars

In this section we derive bounds on the parameters of subfield subcodes. We repeat Definitions 4.4.32 and 7.3.1.

Definition 8.2.1 Let $D$ be an $\mathbb{F}_q$-linear code in $\mathbb{F}_q^n$. Let $C$ be an $\mathbb{F}_{q^m}$-linear code of length $n$. If $D = C \cap \mathbb{F}_q^n$, then $D$ is called the subfield subcode or the restriction (by scalars) of $C$, and is denoted by $C|_{\mathbb{F}_q}$. If $D \subseteq C$, then $C$ is called a super code of $D$. If $C$ is generated as an $\mathbb{F}_{q^m}$-linear space by $D$, then $C$ is called the extension (by scalars) of $D$ and is denoted by $D \otimes \mathbb{F}_{q^m}$.

Proposition 8.2.2 Let $G$ be a generator matrix with entries in $\mathbb{F}_q$. Let $D$ and $C$ be the $\mathbb{F}_q$-linear and the $\mathbb{F}_{q^m}$-linear code, respectively, with $G$ as generator matrix. Then $D \otimes \mathbb{F}_{q^m} = C$ and $C|_{\mathbb{F}_q} = D$.
Proof. Let $G$ be a generator matrix of the $\mathbb{F}_q$-linear code $D$. Then $G$ is also a generator matrix of $D \otimes \mathbb{F}_{q^m}$ by Remark 4.4.33. Hence $D \otimes \mathbb{F}_{q^m} = C$.
Now $D$ is contained in $C$ and in $\mathbb{F}_q^n$. Hence $D \subseteq C|_{\mathbb{F}_q}$. Conversely, suppose that $\mathbf{c} \in C|_{\mathbb{F}_q}$. Then $\mathbf{c} \in \mathbb{F}_q^n$ and $\mathbf{c} = \mathbf{x}G$ for some $\mathbf{x} \in \mathbb{F}_{q^m}^k$. After a permutation of the coordinates we may assume without loss of generality that $G = (I_k \mid A)$ for some $k \times (n-k)$ matrix $A$ with entries in $\mathbb{F}_q$. Therefore $(\mathbf{x}, \mathbf{x}A) = \mathbf{x}G = \mathbf{c} \in \mathbb{F}_q^n$. Hence $\mathbf{x} \in \mathbb{F}_q^k$ and $\mathbf{c} \in D$.

Remark 8.2.3 Similar statements hold as in Proposition 8.2.2 with a parity check matrix $H$ instead of a generator matrix $G$.

Remark 8.2.4 Let $D$ be a cyclic code of length $n$ over $\mathbb{F}_q$ with defining set $I$. Suppose that $\gcd(n, q) = 1$ and $n$ divides $q^m - 1$. Let $\alpha$ in $\mathbb{F}_{q^m}^*$ have order $n$. Let $\tilde{D}$ be the $\mathbb{F}_{q^m}$-linear cyclic code with parity check matrix $\tilde{H} = (\alpha^{ij} \mid i \in I,\ j = 0, \ldots, n-1)$. Then $D$ is the restriction of $\tilde{D}$ by Remark 7.3.2. So $D \otimes \mathbb{F}_{q^m} \subseteq \tilde{D}$ and $((D \otimes \mathbb{F}_{q^m})|_{\mathbb{F}_q}) = (\tilde{D}|_{\mathbb{F}_q}) = D$. If $\alpha$ is not an element of $\mathbb{F}_q$, then $\tilde{H}$ is not defined over $\mathbb{F}_q$, the analogous statement of Proposition 8.2.2 mentioned in Remark 8.2.3 need not hold, and $D \otimes \mathbb{F}_{q^m}$ can be a proper subcode of $\tilde{D}$. We will see that $\tilde{H}$ is row equivalent over $\mathbb{F}_{q^m}$ to a matrix $H$ with entries in $\mathbb{F}_q$, and that $D \otimes \mathbb{F}_{q^m} = \tilde{D}$ if $I$ is the complete defining set of $D$.

8.2.2 Parity check matrix of a restricted code

Lemma 8.2.5 Let $h_1, \ldots, h_n \in \mathbb{F}_{q^m}$. Let $\alpha_1, \ldots, \alpha_m$ be a basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$. Then there exist unique elements $h_{ij} \in \mathbb{F}_q$ such that $h_j = \sum_{i=1}^{m} h_{ij} \alpha_i$. Furthermore, for all $\mathbf{x} \in \mathbb{F}_q^n$,
$$\sum_{j=1}^{n} h_j x_j = 0 \quad \text{if and only if} \quad \sum_{j=1}^{n} h_{ij} x_j = 0 \text{ for all } i = 1, \ldots, m.$$

Proof. The existence and uniqueness of the $h_{ij}$ is a consequence of the assumption that $\alpha_1, \ldots, \alpha_m$ is a basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$. Let $\mathbf{x} \in \mathbb{F}_q^n$. Then
$$\sum_{j=1}^{n} h_j x_j = \sum_{j=1}^{n} \left( \sum_{i=1}^{m} h_{ij} \alpha_i \right) x_j = \sum_{i=1}^{m} \left( \sum_{j=1}^{n} h_{ij} x_j \right) \alpha_i.$$
The $\alpha_i$ form a basis over $\mathbb{F}_q$ and the $x_j$ are elements of $\mathbb{F}_q$. This implies the statement on the equivalence of the linear equations.

Proposition 8.2.6 Let $E = (h_1, \ldots, h_n)$ be a $1 \times n$ parity check matrix of the $\mathbb{F}_{q^m}$-linear code $C$. Let $l$ be the dimension of the $\mathbb{F}_q$-linear subspace of $\mathbb{F}_{q^m}$ generated by $h_1, \ldots, h_n$. Then the dimension of $C|_{\mathbb{F}_q}$ is equal to $n - l$.
Proof. Let $H$ be the $m \times n$ matrix with entries $h_{ij}$ as given in Lemma 8.2.5. Then $(h_{1j}, \ldots, h_{mj})$ are the coordinates of $h_j$ with respect to the basis $\alpha_1, \ldots, \alpha_m$ of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$. So the rank of $H$ is equal to $l$. The code $C|_{\mathbb{F}_q}$ is the null space of the matrix $H$ by Lemma 8.2.5, and hence has dimension $n - \mathrm{rank}(H)$, which is $n - l$.

Example 8.2.7 Let $\alpha \in \mathbb{F}_9$ be a primitive element such that $\alpha^2 + \alpha - 1 = 0$. Choose $1, \alpha$ as basis. Consider the parity check matrix
$$E = \begin{pmatrix} 1 & \alpha & \alpha^2 & \alpha^3 & \alpha^4 & \alpha^5 & \alpha^6 & \alpha^7 \end{pmatrix}$$
of the $\mathbb{F}_9$-linear code $C$. Then according to Lemma 8.2.5 the parity check matrix $H$ of $C|_{\mathbb{F}_3}$ is given by
$$H = \begin{pmatrix} 1 & 0 & 1 & 2 & 2 & 0 & 2 & 1 \\ 0 & 1 & 2 & 2 & 0 & 2 & 1 & 1 \end{pmatrix}.$$
For instance $\alpha^3 = -1 - \alpha$, so $\alpha^3$ has coordinates $(-1, -1)$ with respect to the chosen basis, and the transpose of this vector is the 4-th column of $H$. The entries of the row $E$ generate $\mathbb{F}_9$ over $\mathbb{F}_3$. The rank of $H$ is 2, so the dimension of $C|_{\mathbb{F}_3}$ is 6. This is in agreement with Proposition 8.2.6.

Lemma 8.2.5 has the following consequence.

Proposition 8.2.8 Let $D$ be an $\mathbb{F}_q$-linear code of length $n$ and dimension $k$ with $k < n$. Let $m = n - k$. Then $D$ is the restriction of a code $C$ over $\mathbb{F}_{q^m}$ of codimension one.

Proof. Let $H$ be an $(n-k) \times n$ parity check matrix of $D$ over $\mathbb{F}_q$. Let $m = n - k$. Since $k < n$ we have $m > 0$. Let $\alpha_1, \ldots, \alpha_m$ be a basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$. Define, for $j = 1, \ldots, n$, $h_j = \sum_{i=1}^{m} h_{ij} \alpha_i$. Let $E = (h_1, \ldots, h_n)$ be the $1 \times n$ parity check matrix of the $\mathbb{F}_{q^m}$-linear code $C$. Now $E$ is not the zero vector, since $k < n$. So $C$ has codimension one, and $D$ is the restriction of $C$ by Lemma 8.2.5.

Proposition 8.2.9 Let $C$ be an $\mathbb{F}_{q^m}$-linear code with parameters $[n, k, d]_{q^m}$. Then the dimension of $C|_{\mathbb{F}_q}$ over $\mathbb{F}_q$ is at least $n - m(n-k)$ and its minimum distance is at least $d$.

Proof. The minimum distance of $C|_{\mathbb{F}_q}$ is at least the minimum distance of $C$, since $C|_{\mathbb{F}_q}$ is a subset of $C$. Let $E$ be a parity check matrix of $C$. Then $E$ consists of $n - k$ rows. Every row gives rise to $m$ linear equations over $\mathbb{F}_q$ by Lemma 8.2.5. So $C|_{\mathbb{F}_q}$ is the solution space of $m(n-k)$ homogeneous linear equations over $\mathbb{F}_q$. Therefore the dimension of $C|_{\mathbb{F}_q}$ is at least $n - m(n-k)$.

Remark 8.2.10 ***Lower bound of Delsarte-Sidelnikov***
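A sketch of Lemma 8.2.5 in action (our representation: elements of $\mathbb{F}_9 = \mathbb{F}_3[\alpha]$ with $\alpha^2 = 1 - \alpha$, stored as coordinate pairs with respect to the basis $1, \alpha$), expanding the row $E = (1, \alpha, \ldots, \alpha^7)$ of Example 8.2.7 into the ternary matrix $H$:

```python
# Sketch: expand the entries of E over the basis 1, alpha of GF(9) over GF(3),
# where alpha^2 + alpha - 1 = 0, i.e. alpha^2 = 1 + 2*alpha over F_3.
def mul9(u, v):
    # (u0 + u1*a)(v0 + v1*a) with a^2 = 1 + 2a over F_3
    c0 = (u[0] * v[0] + u[1] * v[1]) % 3                      # a^2 contributes 1
    c1 = (u[0] * v[1] + u[1] * v[0] + 2 * u[1] * v[1]) % 3    # a^2 contributes 2a
    return (c0, c1)

powers, x = [], (1, 0)
for _ in range(8):
    powers.append(x)
    x = mul9(x, (0, 1))              # multiply by alpha

H = [[p[i] for p in powers] for i in range(2)]
for row in H:
    print(row)
# expected, per Example 8.2.7: [1, 0, 1, 2, 2, 0, 2, 1] and [0, 1, 2, 2, 0, 2, 1, 1]
```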
8.2.3 Invariant subspaces

Remark 8.2.11 Let $D$ be the restriction of an $\mathbb{F}_{q^m}$-linear code $C$. Suppose that $\mathbf{h} = (h_1, \ldots, h_n) \in \mathbb{F}_{q^m}^n$ is a parity check for $D$, so $h_1 c_1 + \cdots + h_n c_n = 0$ for all $\mathbf{c} \in D$. Then
$$\sum_{i=1}^{n} h_i^q c_i = \sum_{i=1}^{n} h_i^q c_i^q = \left( \sum_{i=1}^{n} h_i c_i \right)^q = 0$$
for all $\mathbf{c} \in D$, since $c_i^q = c_i$ for all $i$ and $\mathbf{c} \in D$. Hence $(h_1^q, \ldots, h_n^q)$ is also a parity check for the code $D$.

Example 8.2.12 This is a continuation of Example 8.2.7. Consider the parity check matrix
$$E' = \begin{pmatrix} 1 & \alpha & \alpha^2 & \alpha^3 & \alpha^4 & \alpha^5 & \alpha^6 & \alpha^7 \\ 1 & \alpha^3 & \alpha^6 & \alpha & \alpha^4 & \alpha^7 & \alpha^2 & \alpha^5 \end{pmatrix}$$
of the $\mathbb{F}_9$-linear code $C'$. Let $D'$ be the ternary restriction of $C'$. Then according to Proposition 8.2.6 the code $D'$ is the null space of the matrix $H'$ given by
$$H' = \begin{pmatrix} 1 & 0 & 1 & 2 & 2 & 0 & 2 & 1 \\ 0 & 1 & 2 & 2 & 0 & 2 & 1 & 1 \\ 1 & 2 & 2 & 0 & 2 & 1 & 1 & 0 \\ 0 & 2 & 1 & 1 & 0 & 1 & 2 & 2 \end{pmatrix}.$$
The second row of $E'$ is obtained by taking the third power of the entries of the first row. So $D' = D$ by Remark 8.2.11. Indeed, the last two rows of $H'$ are linear combinations of the first two rows. Hence $H$ and $H'$ have the same rank, that is, 2.

Definition 8.2.13 Extend the Frobenius map $\varphi : \mathbb{F}_{q^m} \to \mathbb{F}_{q^m}$, defined by $\varphi(x) = x^q$, to the map $\varphi : \mathbb{F}_{q^m}^n \to \mathbb{F}_{q^m}^n$ defined by $\varphi(\mathbf{x}) = (x_1^q, \ldots, x_n^q)$. Likewise we define $\varphi(G)$ of a matrix with entries $(g_{ij})$ to be the matrix with entries $(\varphi(g_{ij}))$.

Remark 8.2.14 The map $\varphi : \mathbb{F}_{q^m}^n \to \mathbb{F}_{q^m}^n$ has the property that
$$\varphi(\alpha\mathbf{x} + \beta\mathbf{y}) = \alpha^q \varphi(\mathbf{x}) + \beta^q \varphi(\mathbf{y})$$
for all $\alpha, \beta \in \mathbb{F}_{q^m}$ and $\mathbf{x}, \mathbf{y} \in \mathbb{F}_{q^m}^n$. Hence this map is $\mathbb{F}_q$-linear, since $\alpha^q = \alpha$ and $\beta^q = \beta$ for $\alpha, \beta \in \mathbb{F}_q$. The Frobenius map is an automorphism of the field $\mathbb{F}_{q^m}$ with $\mathbb{F}_q$ as the field of elements that are point-wise fixed. Therefore it also leaves the points of $\mathbb{F}_q^n$ point-wise fixed: if $\mathbf{x} \in \mathbb{F}_{q^m}^n$, then $\varphi(\mathbf{x}) = \mathbf{x}$ if and only if $\mathbf{x} \in \mathbb{F}_q^n$. Furthermore $\varphi$ is an isometry.

Definition 8.2.15 Let $F$ be a subfield of $G$. The Galois group $\mathrm{Gal}(G/F)$ is the group of all field automorphisms of $G$ that leave $F$ point-wise fixed. $\mathrm{Gal}(G/F)$ is denoted by $\mathrm{Gal}(q^m, q)$ in case $F = \mathbb{F}_q$ and $G = \mathbb{F}_{q^m}$. A subspace $W$ of $\mathbb{F}_{q^m}^n$ is called $\mathrm{Gal}(q^m, q)$-invariant, or just invariant, if $\tau(W) = W$ for all $\tau \in \mathrm{Gal}(q^m, q)$.
Remark 8.2.16 $\mathrm{Gal}(q^m, q)$ is a cyclic group of order $m$ generated by $\varphi$. Hence a subspace $W$ is invariant if and only if $\varphi(W) \subseteq W$.

The following two lemmas are similar to the statements for the shift operator in connection with cyclic codes in Propositions 7.1.3 and 7.1.6, but now for the Frobenius map.

Lemma 8.2.17 Let $G$ be a $k \times n$ generator matrix of the $\mathbb{F}_{q^m}$-linear code $C$. Let $\mathbf{g}_i$ be the $i$-th row of $G$. Then $C$ is $\mathrm{Gal}(q^m, q)$-invariant if and only if $\varphi(\mathbf{g}_i) \in C$ for all $i = 1, \ldots, k$.

Proof. If $C$ is invariant, then $\varphi(\mathbf{g}_i) \in C$ for all $i$, since $\mathbf{g}_i \in C$. Conversely, suppose that $\varphi(\mathbf{g}_i) \in C$ for all $i$. Let $\mathbf{c} \in C$. Then $\mathbf{c} = \sum_{i=1}^{k} x_i \mathbf{g}_i$ for some $x_i \in \mathbb{F}_{q^m}$. So
$$\varphi(\mathbf{c}) = \sum_{i=1}^{k} x_i^q \varphi(\mathbf{g}_i) \in C.$$
Hence $C$ is an invariant code.

Lemma 8.2.18 Let $C$ be an $\mathbb{F}_{q^m}$-linear code. Then $C^\perp$ is invariant if $C$ is invariant.

Proof. Notice that
$$\varphi(\mathbf{x} \cdot \mathbf{y}) = \Big( \sum_{i=1}^{n} x_i y_i \Big)^q = \sum_{i=1}^{n} x_i^q y_i^q = \varphi(\mathbf{x}) \cdot \varphi(\mathbf{y})$$
for all $\mathbf{x}, \mathbf{y} \in \mathbb{F}_{q^m}^n$. Suppose that $C$ is an invariant code. Let $\mathbf{y} \in C^\perp$ and $\mathbf{c} \in C$. Then $\varphi^{m-1}(\mathbf{c}) \in C$. Hence
$$\varphi(\mathbf{y}) \cdot \mathbf{c} = \varphi(\mathbf{y}) \cdot \varphi^m(\mathbf{c}) = \varphi(\mathbf{y} \cdot \varphi^{m-1}(\mathbf{c})) = \varphi(0) = 0.$$
Therefore $\varphi(\mathbf{y}) \in C^\perp$ for all $\mathbf{y} \in C^\perp$, and $C^\perp$ is invariant.

Proposition 8.2.19 Let $C$ be an $\mathbb{F}_{q^m}$-linear code of length $n$. Then $C$ is $\mathrm{Gal}(q^m, q)$-invariant if and only if $C$ has a generator matrix with entries in $\mathbb{F}_q$, if and only if $C$ has a parity check matrix with entries in $\mathbb{F}_q$.

Proof. If $C$ has a generator matrix with entries in $\mathbb{F}_q$, then clearly $C$ is invariant. Now conversely, suppose that $C$ is invariant. Let $G$ be a $k \times n$ generator matrix of $C$. We may assume without loss of generality that the first $k$ columns are independent. So after applying the Gauss algorithm we get the row reduced echelon form $G'$ of $G$ with the $k \times k$ identity matrix $I_k$ in the first $k$ columns, so $G' = (I_k \mid A)$, where $A$ is a $k \times (n-k)$ matrix. Let $\mathbf{g}_i$ be the $i$-th row of $G'$. Now $C$ is invariant. So $\varphi(\mathbf{g}_i) \in C$ and $\varphi(\mathbf{g}_i)$ is an $\mathbb{F}_{q^m}$-linear combination of the $\mathbf{g}_j$. That is, one can find elements $s_{ij}$ in $\mathbb{F}_{q^m}$ such that
$$\varphi(\mathbf{g}_i) = \sum_{j=1}^{k} s_{ij} \mathbf{g}_j.$$
Let $S$ be the $k \times k$ matrix with entries $(s_{ij})$. Then
$$(I_k \mid \varphi(A)) = (\varphi(I_k) \mid \varphi(A)) = \varphi(G') = SG' = S(I_k \mid A) = (S \mid SA).$$
Therefore $I_k = S$ and $\varphi(A) = SA = A$. Hence the entries of $A$ are elements of $\mathbb{F}_q$. So $G'$ is a generator matrix of $C$ with entries in $\mathbb{F}_q$.
The last equivalence is a consequence of Proposition 2.3.3.

Example 8.2.20 Let $\alpha \in \mathbb{F}_8$ be a primitive element such that $\alpha^3 = \alpha + 1$. Let $G$ be the generator matrix of the $\mathbb{F}_8$-linear code $C$ with
$$G = \begin{pmatrix} 1 & \alpha & \alpha^2 & \alpha^3 & \alpha^4 & \alpha^5 & \alpha^6 \\ 1 & \alpha^2 & \alpha^4 & \alpha^6 & \alpha & \alpha^3 & \alpha^5 \\ 1 & \alpha^4 & \alpha & \alpha^5 & \alpha^2 & \alpha^6 & \alpha^3 \end{pmatrix}.$$
Let $\mathbf{g}_i$ be the $i$-th row of $G$. Then $\varphi(\mathbf{g}_i) = \mathbf{g}_{i+1}$ for $i = 1, 2$ and $\varphi(\mathbf{g}_3) = \mathbf{g}_1$. Hence $C$ is an invariant code by Lemma 8.2.17. The proof of Proposition 8.2.19 explains how to get a generator matrix $G'$ with entries in $\mathbb{F}_2$. Let $G'$ be the row reduced echelon form of $G$. Then
$$G' = \begin{pmatrix} 1 & 0 & 0 & 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 & 1 & 1 & 1 \end{pmatrix}$$
is indeed a binary matrix. In fact it is a generator matrix of the binary $[7, 3, 4]$ simplex code.

Definition 8.2.21 Let $C$ be an $\mathbb{F}_{q^m}$-linear code. Define the codes $C^0$ and $C^*$ by
$$C^0 = \bigcap_{i=1}^{m} \varphi^i(C), \qquad C^* = \sum_{i=1}^{m} \varphi^i(C).$$

Remark 8.2.22 It is clear from the definitions that the codes $C^0$ and $C^*$ are $\mathrm{Gal}(q^m, q)$-invariant. Furthermore $C^0$ is the largest invariant code contained in $C$: if $D$ is an invariant code and $D \subseteq C$, then $D \subseteq C^0$. Similarly, $C^*$ is the smallest invariant code containing $C$: if $D$ is an invariant code and $C \subseteq D$, then $C^* \subseteq D$.

Proposition 8.2.23 Let $C$ be an $\mathbb{F}_{q^m}$-linear code. Then
$$C^0 = ((C^\perp)^*)^\perp.$$

Proof. The inclusion $C^0 \subseteq C$ holds. So dually $C^\perp \subseteq (C^0)^\perp$. Now $C^0$ is invariant. So $(C^0)^\perp$ is invariant by Lemma 8.2.18, and it contains $C^\perp$. By Remark 8.2.22, $(C^\perp)^*$ is the smallest invariant code containing $C^\perp$. Hence $(C^\perp)^* \subseteq (C^0)^\perp$, and therefore $C^0 \subseteq ((C^\perp)^*)^\perp$.
We have $C^\perp \subseteq (C^\perp)^*$. So dually $((C^\perp)^*)^\perp \subseteq C$. The code $((C^\perp)^*)^\perp$ is invariant and is contained in $C$. The largest code that is invariant and contained in $C$ is equal to $C^0$. Hence $((C^\perp)^*)^\perp \subseteq C^0$. Both inclusions give the desired equality.
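A quick sketch (our helpers, $\mathbb{F}_8$ as before) checking the invariance claim of Example 8.2.20: the coordinate-wise Frobenius $\varphi(x) = x^2$ permutes the rows of $G$ cyclically, so $C$ is invariant by Lemma 8.2.17.

```python
# Sketch: Frobenius invariance of the code of Example 8.2.20.
EXP = [0] * 14
x = 1
for i in range(7):
    EXP[i] = EXP[i + 7] = x
    x = (x << 1) ^ (0xB if x & 0x4 else 0)
LOG = {EXP[i]: i for i in range(7)}
mul = lambda a, b: 0 if 0 in (a, b) else EXP[LOG[a] + LOG[b]]

frob = lambda v: [mul(c, c) for c in v]      # phi(x) = x^2, coordinate-wise

# rows (alpha^j, alpha^{2j}, alpha^{4j})_{j=0..6} of the matrix G above:
G = [[EXP[(e * j) % 7] for j in range(7)] for e in (1, 2, 4)]
print(frob(G[0]) == G[1], frob(G[1]) == G[2], frob(G[2]) == G[0])
# expected: True True True, so phi permutes the rows
```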
Theorem 8.2.24 Let $C$ be an $\mathbb{F}_{q^m}$-linear code. Then $C$ and $C^0$ have the same restriction. Furthermore
$$\dim_{\mathbb{F}_q}(C|_{\mathbb{F}_q}) = \dim_{\mathbb{F}_{q^m}}(C^0) \quad \text{and} \quad d(C|_{\mathbb{F}_q}) = d(C^0).$$

Proof. The inclusion $C^0 \subseteq C$ implies $(C^0|_{\mathbb{F}_q}) \subseteq (C|_{\mathbb{F}_q})$. The code $(C|_{\mathbb{F}_q}) \otimes \mathbb{F}_{q^m}$ is contained in $C$ and is invariant. Hence $(C|_{\mathbb{F}_q}) \otimes \mathbb{F}_{q^m} \subseteq C^0$ by Remark 8.2.22. So $(((C|_{\mathbb{F}_q}) \otimes \mathbb{F}_{q^m})|_{\mathbb{F}_q}) \subseteq (C^0|_{\mathbb{F}_q})$. But $(C|_{\mathbb{F}_q}) = (((C|_{\mathbb{F}_q}) \otimes \mathbb{F}_{q^m})|_{\mathbb{F}_q})$ by Proposition 8.2.2 applied to $D = (C|_{\mathbb{F}_q})$. Therefore $(C|_{\mathbb{F}_q}) \subseteq (C^0|_{\mathbb{F}_q})$, and with the converse inclusion above we get the desired equality $(C|_{\mathbb{F}_q}) = (C^0|_{\mathbb{F}_q})$.
The code $C^0$ has a $k \times n$ generator matrix $G$ with entries in $\mathbb{F}_q$ by Proposition 8.2.19, since $C^0$ is an invariant code. Then $G$ is also a generator matrix of $(C^0|_{\mathbb{F}_q})$ by Proposition 8.2.2. Furthermore $(C|_{\mathbb{F}_q}) = (C^0|_{\mathbb{F}_q})$. Therefore $\dim_{\mathbb{F}_q}(C|_{\mathbb{F}_q}) = k = \dim_{\mathbb{F}_{q^m}}(C^0)$.
The code $C^0$ has a parity check matrix $H$ with entries in $\mathbb{F}_q$ by Proposition 8.2.19. Then $H$ is also a parity check matrix of $(C|_{\mathbb{F}_q})$ over $\mathbb{F}_q$. The minimum distance of a code can be expressed as the minimum number of columns in a parity check matrix that are dependent, by Proposition 2.3.11. Consider an $l \times m'$ matrix $B$ with entries in $\mathbb{F}_q$. Then the columns of $B$ are dependent if and only if $\mathrm{rank}(B) < m'$. The rank of $B$ is equal to the number of pivots in the row reduced echelon form of $B$. The row reduced echelon form of $B$ is unique by Remark 2.2.18, and does not change by considering $B$ as a matrix with entries in $\mathbb{F}_{q^m}$. Therefore $d(C|_{\mathbb{F}_q}) = d(C^0)$.

Remark 8.2.25 Lemma 8.2.5 gives us a method to compute the parity check matrix of the restriction. Proposition 8.2.23 and Theorem 8.2.24 give us another way to compute the parity check and generator matrix of the restriction of a code. Let $C$ be an $\mathbb{F}_{q^m}$-linear code. Let $H$ be a parity check matrix of $C$. Then $H$ is a generator matrix of $C^\perp$. Let $\mathbf{h}_i$, $i = 1, \ldots, n-k$, be the rows of $H$. Let $H^*$ be the matrix with the $(n-k)m$ rows $\varphi^j(\mathbf{h}_i)$, $i = 1, \ldots, n-k$, $j = 1, \ldots, m$. Then these rows generate $(C^\perp)^*$. Let $H^0$ be the row reduced echelon form of $H^*$ with the zero rows deleted. Then $H^0$ has entries in $\mathbb{F}_q$ and is a generator matrix of $(C^\perp)^*$, since it is an invariant code. So $H^0$ is a parity check matrix of $((C^\perp)^*)^\perp = C^0$. Hence it is also a parity check matrix of $(C^0|_{\mathbb{F}_q}) = (C|_{\mathbb{F}_q})$.

Example 8.2.26 Consider the parity check matrix $E$ of Example 8.2.7. Then $E^*$ is equal to the matrix $E'$ of Example 8.2.12. Taking the row reduced echelon form of $E^*$ gives indeed the parity check matrix $H$ obtained in Example 8.2.7.

8.2.4 Cyclic codes as subfield subcodes

***
8.2.5 Trace codes

Definition 8.2.27 The trace map $\mathrm{Tr}^{\mathbb{F}_{q^m}}_{\mathbb{F}_q} : \mathbb{F}_{q^m} \to \mathbb{F}_q$ is defined by
$$\mathrm{Tr}^{\mathbb{F}_{q^m}}_{\mathbb{F}_q}(x) = x + x^q + \cdots + x^{q^{m-1}}$$
for $x \in \mathbb{F}_{q^m}$. The notation $\mathrm{Tr}^{\mathbb{F}_{q^m}}_{\mathbb{F}_q}$ is abbreviated to $\mathrm{Tr}$ in case the context is clear. This map is extended coordinate-wise to a map $\mathrm{Tr} : \mathbb{F}_{q^m}^n \to \mathbb{F}_q^n$.

Remark 8.2.28 Let $F$ be a field and $G$ a finite field extension of $F$ of degree $m$. Then $G$ is a vector space over $F$ of dimension $m$. Choose a basis of $G$ over $F$. Let $x \in G$. Then multiplication by $x$ on $G$ is an $F$-linear map. Let $M_x$ be the corresponding matrix of this map with respect to the chosen basis. The sum of the diagonal elements of $M_x$ is called the trace of $x$. This trace does not depend on the chosen basis and will be denoted by $\mathrm{Tr}^G_F(x)$, or by $\mathrm{Tr}(x)$ for short. Definition 8.2.27 of the trace for a finite extension of a finite field is an ad hoc definition; with the above general definition of the trace, the ad hoc definition becomes a property. The maps $\mathrm{Tr} : \mathbb{F}_{q^m} \to \mathbb{F}_q$ and $\mathrm{Tr} : \mathbb{F}_{q^m}^n \to \mathbb{F}_q^n$ are $\mathbb{F}_q$-linear.

Proposition 8.2.29 (Delsarte-Sidelnikov) Let $C$ be an $\mathbb{F}_{q^m}$-linear code. Then
$$(C^\perp \cap \mathbb{F}_q^n)^\perp = \mathrm{Tr}(C).$$

Proof. ***
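As a small illustration (our own sketch over $\mathbb{F}_8$, with the integer representation used earlier), the trace map of Definition 8.2.27 for $q = 2$, $m = 3$ is $\mathrm{Tr}(x) = x + x^2 + x^4$; the sketch checks that its image lies in $\mathbb{F}_2$ and that it is additive, hence $\mathbb{F}_2$-linear.

```python
# Sketch: the trace map Tr(x) = x + x^2 + x^4 from GF(8) to GF(2).
EXP = [0] * 14
x = 1
for i in range(7):
    EXP[i] = EXP[i + 7] = x
    x = (x << 1) ^ (0xB if x & 0x4 else 0)
LOG = {EXP[i]: i for i in range(7)}
mul = lambda a, b: 0 if 0 in (a, b) else EXP[LOG[a] + LOG[b]]

def tr(x):
    x2 = mul(x, x)
    return x ^ x2 ^ mul(x2, x2)     # x + x^2 + x^4

print(sorted({tr(x) for x in range(8)}))            # [0, 1]: image in GF(2)
print(all(tr(x ^ y) == tr(x) ^ tr(y) for x in range(8) for y in range(8)))  # True
```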
8.2.6 Exercises

8.2.1 Let $\alpha \in \mathbb{F}_{16}$ be a primitive element such that $\alpha^4 = \alpha + 1$. Choose $\alpha^0, \alpha^1, \alpha^2, \alpha^3$ as basis. Consider the parity check matrix
$$E = \begin{pmatrix} 1 & \alpha & \alpha^2 & \alpha^3 & \cdots & \alpha^{14} \\ 1 & \alpha^2 & \alpha^4 & \alpha^6 & \cdots & \alpha^{13} \end{pmatrix}$$
of the $\mathbb{F}_{16}$-linear code $C$. Let $E'$ be the $1 \times 15$ submatrix of $E$ consisting of the first row of $E$. Let $C'$ be the $\mathbb{F}_{16}$-linear code with $E'$ as parity check matrix. Determine the parity check matrices $H$ of $C|_{\mathbb{F}_2}$ and $H'$ of $C'|_{\mathbb{F}_2}$, using Lemma 8.2.5 and Proposition 8.2.9. Show that $H = H'$.

8.2.2 Let $\alpha \in \mathbb{F}_{16}$ be a primitive element such that $\alpha^4 = \alpha + 1$. Give a binary parity check matrix of the binary restriction of the code $\mathrm{RS}_4(15, 0)$. Determine the dimension of the binary restriction of the code $\mathrm{RS}_k(15, 0)$ for all $k$.

8.2.3 Let $\alpha \in \mathbb{F}_{16}$ be a primitive element such that $\alpha^4 = \alpha + 1$. Let $G$ be the $4 \times 15$ matrix with entry $g_{ij} = \alpha^{j2^i}$ in the $i$-th row and the $j$-th column. Let $C$ be the code with generator matrix $G$. Show that $C$ is $\mathrm{Gal}(16, 2)$-invariant and give a binary generator matrix of $C$.

8.2.4 Let $m$ be a positive integer and $C$ an $\mathbb{F}_{q^m}$-linear code. Let $\varphi$ be the Frobenius map of $\mathbb{F}_{q^m}$ fixing $\mathbb{F}_q$. Show that $\varphi(C)$ is an $\mathbb{F}_{q^m}$-linear code that is isometric with $C$. Give a counterexample of a code $C$ that is not monomially equivalent with $\varphi(C)$.

8.2.5 Give proofs of the statements made in Remark 8.2.28.

8.3 Some families of polynomial codes

***

8.3.1 Alternant codes

Definition 8.3.1 Let $\mathbf{a} = (a_1, \ldots, a_n)$ be an $n$-tuple of $n$ distinct elements of $\mathbb{F}_{q^m}$. Let $\mathbf{b} = (b_1, \ldots, b_n)$ be an $n$-tuple of nonzero elements of $\mathbb{F}_{q^m}$. Let $\mathrm{GRS}_r(\mathbf{a}, \mathbf{b})$ be the generalized RS code over $\mathbb{F}_{q^m}$ of dimension $r$. The alternant code $\mathrm{ALT}_r(\mathbf{a}, \mathbf{b})$ is the $\mathbb{F}_q$-linear restriction of $(\mathrm{GRS}_r(\mathbf{a}, \mathbf{b}))^\perp$.

Proposition 8.3.2 The code $\mathrm{ALT}_r(\mathbf{a}, \mathbf{b})$ has parameters $[n, k, d]_q$ with $k \geq n - mr$ and $d \geq r + 1$.

Proof. The code $(\mathrm{GRS}_r(\mathbf{a}, \mathbf{b}))^\perp$ is equal to $\mathrm{GRS}_{n-r}(\mathbf{a}, \mathbf{c})$ with $c_j = 1/(b_j \prod_{i \neq j}(a_j - a_i))$ by Proposition 8.1.21, and has parameters $[n, n-r, r+1]_{q^m}$ by Proposition 8.1.14. So the statement is a consequence of Proposition 8.2.9.

Proposition 8.3.3 $(\mathrm{ALT}_r(\mathbf{a}, \mathbf{b}))^\perp = \mathrm{Tr}(\mathrm{GRS}_r(\mathbf{a}, \mathbf{b}))$.

Proof. This is a direct consequence of the definition of an alternant code and Proposition 8.2.29.

Proposition 8.3.4 Every linear code of minimum distance at least 2 is an alternant code.

Proof. Let $C$ be a code of length $n$ and dimension $k$. Then $k < n$, since the minimum distance of $C$ is at least 2. Let $m$ be a positive integer such that $n - k$ divides $m$ and $q^m \geq n$. Let $\mathbf{a} = (a_1, \ldots, a_n)$ be any $n$-tuple of $n$ distinct elements of $\mathbb{F}_{q^m}$. Let $H$ be an $(n-k) \times n$ parity check matrix of $C$ over $\mathbb{F}_q$. Following the proof of Proposition 8.2.8, let $\alpha_1, \ldots, \alpha_{n-k}$ be a basis of $\mathbb{F}_{q^{n-k}}$ over $\mathbb{F}_q$. The field $\mathbb{F}_{q^m}$ is an extension of $\mathbb{F}_{q^{n-k}}$, since $n - k$ divides $m$. Define $b_j = \sum_{i=1}^{n-k} h_{ij} \alpha_i$ for $j = 1, \ldots, n$. The minimum distance of $C$ is at least 2, so $H$ does not contain a zero column by Proposition 2.3.11. Hence $b_j \neq 0$ for all $j$. Let $\mathbf{b} = (b_1, \ldots, b_n)$. Then $C$ is the restriction of $\mathrm{GRS}_1(\mathbf{a}, \mathbf{b})^\perp$. Therefore $C = \mathrm{ALT}_1(\mathbf{a}, \mathbf{b})$ by definition.

Remark 8.3.5 The above proposition shows that almost all linear codes are alternant, but it gives no useful information about the parameters of the code. ***Alternant codes meet the GV bound (MacWilliams-Sloane, page 337). BCH codes are not asymptotically good.***
8.3.2 Goppa codes

A special class of alternant codes is given by Goppa codes.

Definition 8.3.6 Let $L = (a_1, \ldots, a_n)$ be an $n$-tuple of $n$ distinct elements of $\mathbb{F}_{q^m}$. A polynomial $g$ with coefficients in $\mathbb{F}_{q^m}$ such that $g(a_j) \neq 0$ for all $j$ is called a Goppa polynomial with respect to $L$. Define the $\mathbb{F}_q$-linear Goppa code $\Gamma(L, g)$ by
$$\Gamma(L, g) = \left\{\, \mathbf{c} \in \mathbb{F}_q^n \;\middle|\; \sum_{j=1}^{n} \frac{c_j}{X - a_j} \equiv 0 \bmod g(X) \,\right\}.$$

Remark 8.3.7 The assumption $g(a_j) \neq 0$ implies that $X - a_j$ and $g(X)$ are relatively prime, so their greatest common divisor is 1. Euclid's algorithm gives polynomials $P_j$ and $Q_j$ such that $P_j(X)g(X) + Q_j(X)(X - a_j) = 1$. So $Q_j(X)$ is the inverse of $X - a_j$ modulo $g(X)$. We claim that
$$Q_j(X) = -\frac{g(X) - g(a_j)}{X - a_j}\, g(a_j)^{-1}.$$
Notice that $g(X) - g(a_j)$ has $a_j$ as a zero. So $g(X) - g(a_j)$ is divisible by $X - a_j$ and the quotient is a polynomial of degree one less than the degree of $g(X)$. With the above definition of $Q_j$ we get
$$Q_j(X)(X - a_j) = -(g(X) - g(a_j))\, g(a_j)^{-1} = 1 - g(X)g(a_j)^{-1} \equiv 1 \bmod g(X).$$

Remark 8.3.8 Let $g_1$ and $g_2$ be two Goppa polynomials with respect to $L$. If $g_2$ divides $g_1$, then $\Gamma(L, g_1)$ is a subcode of $\Gamma(L, g_2)$.

Proposition 8.3.9 Let $L = \mathbf{a} = (a_1, \ldots, a_n)$ and let $g$ be a Goppa polynomial of degree $r$. The Goppa code $\Gamma(L, g)$ is equal to the alternant code $\mathrm{ALT}_r(\mathbf{a}, \mathbf{b})$, where $b_j = 1/g(a_j)$.

Proof. Remark 8.3.7 implies that $\mathbf{c} \in \Gamma(L, g)$ if and only if
$$\sum_{j=1}^{n} c_j\, \frac{g(X) - g(a_j)}{X - a_j}\, g(a_j)^{-1} = 0,$$
since the left hand side is a polynomial of degree strictly smaller than the degree of $g(X)$, and this polynomial is 0 if and only if it is 0 modulo $g(X)$. Let $g(X) = g_0 + g_1 X + \cdots + g_r X^r$. Then
$$\frac{g(X) - g(a_j)}{X - a_j} = \sum_{l=0}^{r} g_l\, \frac{X^l - a_j^l}{X - a_j} = \sum_{l=0}^{r} g_l \sum_{i=0}^{l-1} X^i a_j^{l-1-i} = \sum_{i=0}^{r-1} \left( \sum_{l=i+1}^{r} g_l a_j^{l-1-i} \right) X^i.$$
Therefore $\mathbf{c} \in \Gamma(L, g)$ if and only if
$$\sum_{j=1}^{n} \left( \sum_{l=i+1}^{r} g_l a_j^{l-1-i} \right) g(a_j)^{-1} c_j = 0$$
for all $i = 0, \ldots, r-1$, if and only if $H_1 \mathbf{c}^T = 0$, where $H_1$ is an $r \times n$ parity check matrix with $j$-th column
$$\begin{pmatrix} g_r a_j^{r-1} + g_{r-1} a_j^{r-2} + \cdots + g_2 a_j + g_1 \\ \vdots \\ g_r a_j^2 + g_{r-1} a_j + g_{r-2} \\ g_r a_j + g_{r-1} \\ g_r \end{pmatrix} g(a_j)^{-1}.$$
The coefficient $g_r$ is not zero, since $g(X)$ has degree $r$. Divide the last row of $H_1$ by $g_r$. Then subtract $g_{r-1}$ times the $r$-th row from row $r-1$. Next divide row $r-1$ by $g_r$. Continuing in this way, by a sequence of elementary row operations it is shown that $H_1$ is row equivalent to the matrix $H_2$ with entry $a_j^{i-1} g(a_j)^{-1}$ in the $i$-th row and the $j$-th column. So $H_2$ is the generator matrix of $\mathrm{GRS}_r(\mathbf{a}, \mathbf{b})$, where $\mathbf{b} = (b_1, \ldots, b_n)$ and $b_j = 1/g(a_j)$. Hence $\Gamma(L, g)$ is the restriction of $\mathrm{GRS}_r(\mathbf{a}, \mathbf{b})^\perp$. Therefore $\Gamma(L, g) = \mathrm{ALT}_r(\mathbf{a}, \mathbf{b})$ by definition.

Proposition 8.3.10 Let $g$ be a Goppa polynomial of degree $r$ over $\mathbb{F}_{q^m}$. Then the Goppa code $\Gamma(L, g)$ is an $[n, k, d]$ code with $k \geq n - mr$ and $d \geq r + 1$.

Proof. This is a consequence of Proposition 8.3.9, showing that a Goppa code is an alternant code, and Proposition 8.3.2 on the parameters of alternant codes.

Remark 8.3.11 Let $g$ be a Goppa polynomial of degree $r$ over $\mathbb{F}_{q^m}$. Then the Goppa code $\Gamma(L, g)$ has minimum distance $d \geq r + 1$ by Proposition 8.3.10. It is an alternant code, that is, a subfield subcode of a GRS code of minimum distance $r + 1$, by Proposition 8.3.9. This super code has several efficient decoding algorithms that correct $\lfloor r/2 \rfloor$ errors. The same algorithms can be applied to the Goppa code to correct $\lfloor r/2 \rfloor$ errors.

Definition 8.3.12 A polynomial is called square free if all its (irreducible) factors have multiplicity one.

Remark 8.3.13 Notice that irreducible polynomials are square free Goppa polynomials. If $g(X)$ is a square free polynomial, then $g(X)$ and its formal derivative $g'(X)$ have no common factor by Lemma 7.2.8.

Proposition 8.3.14 Let $g$ be a square free Goppa polynomial with coefficients in $\mathbb{F}_{2^m}$. Then the binary Goppa code $\Gamma(L, g)$ is equal to $\Gamma(L, g^2)$.

Proof. (1) The code $\Gamma(L, g^2)$ is a subcode of $\Gamma(L, g)$ by Remark 8.3.8.
(2) Let $\mathbf{c}$ be a binary word. Define the polynomial $f(X)$ by
$$f(X) = \prod_{j=1}^{n} (X - a_j)^{c_j}.$$
So $f(X)$ is the reciprocal locator polynomial of $\mathbf{c}$: it is the monic polynomial of degree $\mathrm{wt}(\mathbf{c})$ whose zeros are located at those $a_j$ such that $c_j \neq 0$. Now
$$f'(X) = \sum_{j=1}^{n} c_j (X - a_j)^{c_j - 1} \prod_{l=1, l \neq j}^{n} (X - a_l)^{c_l}.$$
Hence
$$\frac{f'(X)}{f(X)} = \sum_{j=1}^{n} \frac{c_j}{X - a_j}.$$
Let $\mathbf{c} \in \Gamma(L, g)$. Then $f'(X)/f(X) \equiv 0 \bmod g(X)$. Now $\gcd(f(X), g(X)) = 1$. So there exist polynomials $p(X)$ and $q(X)$ such that $p(X)f(X) + q(X)g(X) = 1$. Hence
$$p(X)f'(X) \equiv \frac{f'(X)}{f(X)} \equiv 0 \bmod g(X).$$
Therefore $g(X)$ divides $f'(X)$, since $\gcd(p(X), g(X)) = 1$. Let $f(X) = f_0 + f_1 X + \cdots + f_n X^n$. Then
$$f'(X) = \sum_{i=0}^{n} i f_i X^{i-1} = \sum_{i=0}^{\lfloor n/2 \rfloor} f_{2i+1} X^{2i} = \left( \sum_{i=0}^{\lfloor n/2 \rfloor} f_{2i+1}^{2^{m-1}} X^i \right)^2,$$
since the coefficients are in $\mathbb{F}_{2^m}$. So $f'(X)$ is a square that is divisible by the square free polynomial $g(X)$. Hence $f'(X)$ is divisible by $g(X)^2$, so $\mathbf{c} \in \Gamma(L, g^2)$. Therefore $\Gamma(L, g)$ is contained in $\Gamma(L, g^2)$. So they are equal by (1).

Proposition 8.3.15 Let $g$ be a square free Goppa polynomial of degree $r$ with coefficients in $\mathbb{F}_{2^m}$. Then the binary Goppa code $\Gamma(L, g)$ is an $[n, k, d]$ code with $k \geq n - mr$ and $d \geq 2r + 1$.

Proof. This is a consequence of Proposition 8.3.14, showing that $\Gamma(L, g) = \Gamma(L, g^2)$, and Proposition 8.3.10 on the parameters of Goppa codes. The lower bound on the dimension uses that $g(X)$ has degree $r$, and the lower bound on the minimum distance uses that $g^2(X)$ has degree $2r$.

Example 8.3.16 Let $\alpha \in \mathbb{F}_8$ be a primitive element such that $\alpha^3 = \alpha + 1$. Let $a_j = \alpha^{j-1}$ be an enumeration of the seven elements of $L = \mathbb{F}_8^*$. Let $g(X) = 1 + X + X^2$. Then $g$ is a square free polynomial in $\mathbb{F}_2[X]$ and a Goppa polynomial with respect to $L$. Let $\mathbf{a}$ be the vector with entries $a_j$. Let $\mathbf{b}$ be defined by $b_j = 1/g(a_j)$. Then $\mathbf{b} = (1, \alpha^2, \alpha^4, \alpha^2, \alpha, \alpha, \alpha^4)$, and $\Gamma(L, g) = \mathrm{ALT}_2(\mathbf{a}, \mathbf{b})$ by Proposition 8.3.9. Let $k$ be the dimension and $d$ the minimum distance of $\Gamma(L, g)$. Then $k \geq 1$ and $d \geq 5$ by Proposition 8.3.15. In fact $\Gamma(L, g)$ is a one dimensional code generated by $(0, 1, 1, 1, 1, 1, 1)$. Hence $d = 6$.

Example 8.3.17 Let $L = \mathbb{F}_{2^{10}}$. Consider the binary Goppa code $\Gamma(L, g)$ with a Goppa polynomial $g$ in $\mathbb{F}_{2^{10}}[X]$ of degree 50 with respect to $L$. Then the code has length 1024, dimension $k \geq 524$ and minimum distance $d \geq 51$. If moreover $g$ is square free, then $d \geq 101$.

***Goppa codes meet the GV bound, random argument.***
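A sketch (our helpers, not the book's) checking Example 8.3.16: it recomputes $b_j = 1/g(a_j)$ and verifies that the stated generator lies in the kernel of the parity check matrix with entries $a_j^{i-1} g(a_j)^{-1}$ from the proof of Proposition 8.3.9.

```python
# Sketch: the Goppa code of Example 8.3.16 as the alternant code ALT_2(a, b).
EXP = [0] * 14
x = 1
for i in range(7):
    EXP[i] = EXP[i + 7] = x
    x = (x << 1) ^ (0xB if x & 0x4 else 0)
LOG = {EXP[i]: i for i in range(7)}
mul = lambda a, b: 0 if 0 in (a, b) else EXP[LOG[a] + LOG[b]]
inv = lambda a: EXP[(7 - LOG[a]) % 7]

g = lambda x: 1 ^ x ^ mul(x, x)               # g(X) = 1 + X + X^2

a = [EXP[j] for j in range(7)]                # a_j = alpha^{j-1}
b = [inv(g(aj)) for aj in a]                  # b_j = 1 / g(a_j)
print([LOG[v] for v in b])                    # expected logs: [0, 2, 4, 2, 1, 1, 4]

c = [0, 1, 1, 1, 1, 1, 1]                     # the stated generator of Gamma(L, g)
for i in range(2):                            # rows with entries a_j^i * b_j
    s = 0
    for j in range(7):
        if c[j]:
            s ^= mul(EXP[(i * j) % 7], b[j])
    print(s)                                  # expected: 0 and 0
```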
8.3.3 Counting polynomials

The number of certain polynomials will be counted in order to get an idea of the number of Goppa codes.

Remark 8.3.18 (1) Irreducible polynomials are square free Goppa polynomials. The number of monic irreducible polynomials in $\mathbb{F}_q[X]$ of degree $d$ is denoted by $\mathrm{Irr}_q(d)$, and this number is computed by means of the Möbius function as given in Proposition 7.2.19.
(2) Every monic square free polynomial $f(X)$ over $\mathbb{F}_q$ of degree $r$ has a unique factorization into monic irreducible polynomials. Let $e_i$ be the number of irreducible factors of $f(X)$ of degree $i$. Then $e_1 + 2e_2 + \cdots + re_r = r$ and there are $\binom{\mathrm{Irr}_q(i)}{e_i}$ ways to choose $e_i$ among the $\mathrm{Irr}_q(i)$ monic irreducible polynomials of degree $i$. Hence the number $S_q(r)$ of monic square free polynomials over $\mathbb{F}_q$ of degree $r$ is equal to
$$S_q(r) = \sum_{e_1 + 2e_2 + \cdots + re_r = r} \prod_{i=1}^{r} \binom{\mathrm{Irr}_q(i)}{e_i}.$$
(3) The number $SG_q(r)$ of square free monic Goppa polynomials in $\mathbb{F}_q[X]$ of degree $r$ with respect to $L = \mathbb{F}_q$ is computed similarly, since such Goppa polynomials have no linear factors in $\mathbb{F}_q[X]$. Hence
$$SG_q(r) = \sum_{2e_2 + \cdots + re_r = r} \prod_{i=2}^{r} \binom{\mathrm{Irr}_q(i)}{e_i}.$$
Simpler formulas are obtained in the following.

Proposition 8.3.19 Let $S_q(r)$ be the number of monic square free polynomials over $\mathbb{F}_q$ of degree $r$. Then $S_q(0) = 1$, $S_q(1) = q$ and $S_q(r) = q^r - q^{r-1}$ for $r > 1$.

Proof. Clearly $S_q(0) = 1$ and $S_q(1) = q$, since 1 is the only monic polynomial of degree zero, and $\{X + a \mid a \in \mathbb{F}_q\}$ is the set of monic polynomials of degree one and they are all square free. If $f(X)$ is a monic polynomial of degree $r > 1$ that is not square free, then we have a unique factorization $f(X) = g(X)^2 h(X)$, where $g(X)$ is a monic polynomial, say of degree $a$, and $h(X)$ is a monic square free polynomial of degree $b$, so that $2a + b = r$ and $a > 0$. Hence the number of monic polynomials of degree $r$ over $\mathbb{F}_q$ that are not square free is $q^r - S_q(r)$ and equal to $\sum_{a=1}^{\lfloor r/2 \rfloor} q^a S_q(r - 2a)$. Therefore
$$S_q(r) = q^r - \sum_{a=1}^{\lfloor r/2 \rfloor} q^a S_q(r - 2a).$$
This recurrence relation with starting values $S_q(0) = 1$ and $S_q(1) = q$ has the unique solution $S_q(r) = q^r - q^{r-1}$ for $r > 1$. This is left as an exercise.
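A brute force check of Proposition 8.3.19 (our own sketch over the prime field $\mathbb{F}_3$, using the standard criterion, not stated in the book at this point, that $f$ is square free if and only if $\gcd(f, f')$ is a nonzero constant):

```python
# Sketch: count monic square free polynomials of degree r over F_3 and compare
# with q^r - q^{r-1}. Polynomials are coefficient lists, lowest degree first.
from itertools import product

Q = 3

def trim(f):
    while f and f[-1] == 0:
        f.pop()
    return f

def poly_rem(f, g):                  # remainder of f modulo g over F_Q
    f = trim(f[:])
    while len(f) >= len(g):
        c = f[-1] * pow(g[-1], Q - 2, Q) % Q
        shift = len(f) - len(g)
        for i, gi in enumerate(g):
            f[shift + i] = (f[shift + i] - c * gi) % Q
        f = trim(f)
    return f

def square_free(f):
    df = trim([(i * f[i]) % Q for i in range(1, len(f))])   # formal derivative
    a, b = f[:], df
    while b:                         # Euclidean algorithm
        a, b = b, poly_rem(a, b)
    return len(a) == 1               # gcd(f, f') is a nonzero constant

r = 3
count = sum(square_free(list(cs) + [1]) for cs in product(range(Q), repeat=r))
print(count, Q**r - Q**(r - 1))      # expected: both 18
```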
Proposition 8.3.20 Let $r \leq n \leq q$. The number $G_q(r, n)$ of monic Goppa polynomials in $\mathbb{F}_q[X]$ of degree $r$ with respect to an $L$ that consists of $n$ distinct given elements of $\mathbb{F}_q$ is given by
$$G_q(r, n) = \sum_{i=0}^{r} (-1)^i \binom{n}{i} q^{r-i}.$$

Proof. Let $\mathcal{P}_q(r)$ be the set of all monic polynomials in $\mathbb{F}_q[X]$ of degree $r$. Then $P_q(r) := |\mathcal{P}_q(r)| = q^r$, since the $r$ coefficients of a monic polynomial of degree $r$ are free to choose in $\mathbb{F}_q$. Let $\mathbf{a}$ be a vector of length $n$ with entries the elements of $L$. Let $I$ be a subset of $\{1, \ldots, n\}$. Define
$$\mathcal{P}_q(r, I) = \{\, f(X) \in \mathcal{P}_q(r) \mid f(a_i) = 0 \text{ for all } i \in I \,\}.$$
If $r \geq |I|$, then
$$\mathcal{P}_q(r, I) = \mathcal{P}_q(r - |I|) \cdot \prod_{i \in I} (X - a_i),$$
since $f(a_i) = 0$ if and only if $f(X) = g(X)(X - a_i)$, and the $a_i$ are mutually distinct. Hence $P_q(r, I) := |\mathcal{P}_q(r, I)| = P_q(r - |I|) = q^{r - |I|}$ for all $r \geq |I|$. So $P_q(r, I)$ depends on $q$, $r$ and only on the size of $I$. Furthermore $\mathcal{P}_q(r, I)$ is empty if $r < |I|$. The set of monic Goppa polynomials in $\mathbb{F}_q[X]$ of degree $r$ with respect to $L$ is equal to
$$\bigcap_{i=1}^{n} \left( \mathcal{P}_q(r) \setminus \mathcal{P}_q(r, \{i\}) \right) = \mathcal{P}_q(r) \setminus \bigcup_{i=1}^{n} \mathcal{P}_q(r, \{i\}).$$
The principle of inclusion/exclusion gives
$$G_q(r, n) = \sum_{I} (-1)^{|I|} P_q(r, I) = \sum_{i=0}^{r} (-1)^i \binom{n}{i} q^{r-i}.$$

Proposition 8.3.21 Let $r \leq n \leq q$. The number $SG_q(r, n)$ of square free, monic Goppa polynomials in $\mathbb{F}_q[X]$ of degree $r$ with respect to an $L$ that consists of $n$ distinct given elements of $\mathbb{F}_q$ is given by
$$SG_q(r, n) = (-1)^r \binom{n + r - 1}{r} + \sum_{i=0}^{r-1} (-1)^i \, \frac{n + 2i - 1}{n + i - 1} \binom{n + i - 1}{i} q^{r-i}.$$

Proof. An outline of the proof is given; the details are left as an exercise.
(1) The following recurrence relation holds:
$$SG_q(r, n) = G_q(r, n) - \sum_{a=1}^{\lfloor r/2 \rfloor} G_q(a, n) \cdot SG_q(r - 2a, n).$$
(2) ***The given formula satisfies the recurrence relation and the starting values.***
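A small numeric check (ours, not the book's) of Proposition 8.3.20 with $q = 3$, $L = \mathbb{F}_3$ and $r = 2$: brute force enumeration against the inclusion/exclusion formula.

```python
# Sketch: brute force count of monic Goppa polynomials over F_3 of degree 2
# with respect to L = F_3, versus the formula of Proposition 8.3.20.
from itertools import product
from math import comb

q, r = 3, 2
L = range(q)

def no_root(coeffs):                 # monic poly c_0 + c_1 X + ... + X^r
    f = list(coeffs) + [1]
    return all(sum(c * a**i for i, c in enumerate(f)) % q for a in L)

brute = sum(no_root(cs) for cs in product(range(q), repeat=r))
formula = sum((-1)**i * comb(q, i) * q**(r - i) for i in range(r + 1))
print(brute, formula)                # expected: both 3
```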
Example 8.3.22 ***Consider polynomials over the finite field $\mathbb{F}_{1024}$. Compute the following numbers.
(1) The number of monic irreducible polynomials of degree 50.
(2) The number of square free monic polynomials of degree 50.
(3) The number of monic Goppa polynomials of degree 50 with respect to $L$.
(4) The number of square free, monic Goppa polynomials of degree 50 with respect to $L$.***

***Question: if $\Gamma(L, g_1) = \Gamma(L, g_2)$ and ..., then $g_1 = g_2$?***
***See the book of Berlekamp on algebraic coding theory.***
***Generating functions, asymptotics.***
***Goppa codes meet the GV bound.***

8.3.4 Exercises

8.3.1 Give a proof of Remark 8.3.8.

8.3.2 Let $L = \mathbb{F}_9^*$. Consider the Goppa codes $\Gamma(L, g)$ over $\mathbb{F}_3$. Show that the only Goppa polynomials in $\mathbb{F}_3[X]$ of degree 2 are $X^2$ and $2X^2$.

8.3.3 Let $L$ be an enumeration of the eight elements of $\mathbb{F}_9^*$. Describe the Goppa codes $\Gamma(L, X)$ and $\Gamma(L, X^2)$ over $\mathbb{F}_3$ as alternant codes of the form $\mathrm{ALT}_1(\mathbf{a}, \mathbf{b})$ and $\mathrm{ALT}_2(\mathbf{a}, \mathbf{b}')$. Determine the parameters of these codes and compare them with the ones given in Proposition 8.3.15.

8.3.4 Let $g$ be a square free Goppa polynomial of degree $r$ over $\mathbb{F}_{2^m}$. Then the Goppa code $\Gamma(L, g)$ has minimum distance $d \geq 2r + 1$ by Proposition 8.3.15. Explain how to adapt the decoding algorithm mentioned in Remark 8.3.11 to correct $r$ errors.

8.3.5 Let $L = \mathbb{F}_{2^{11}}$. Consider the binary Goppa code $\Gamma(L, g)$ with a square free Goppa polynomial $g$ in $\mathbb{F}_{2^{11}}[X]$ of degree 93 with respect to $L$. Give lower bounds on the dimension and the minimum distance of this code.

8.3.6 Give a proof of the formula $S_q(r) = q^r - q^{r-1}$ for $r > 1$ by showing by induction that it satisfies the recurrence relation given in the proof of Proposition 8.3.19.

8.3.7 Give a proof of the recurrence relation given in (1) of the proof of Proposition 8.3.21, and show that the given formula for $SG_q(r, n)$ satisfies the recurrence relation.

8.3.8 Consider polynomials over the finite field $\mathbb{F}_{2^{11}}$. Let $L = \mathbb{F}_{2^{11}}$. Give a numerical approximation of the following numbers.
(1) The number of monic irreducible polynomials of degree 93.
(2) The number of square free monic polynomials of degree 93.
(3) The number of monic Goppa polynomials of degree 93 with respect to $L$.
(4) The number of square free, monic Goppa polynomials of degree 93 with respect to $L$.
8.4 Reed-Muller codes

The $q$-ary RS code $\mathrm{RS}_k(n, 1)$ of length $q - 1$ was introduced as a cyclic code in Definition 8.1.1, and it was shown in Proposition 8.1.4 that it can also be described as the code obtained by evaluating all univariate polynomials over $\mathbb{F}_q$ of degree strictly smaller than $k$ at all the nonzero elements of the finite field $\mathbb{F}_q$. The extended RS codes can be considered as the codes evaluating those polynomials at all the elements of $\mathbb{F}_q$, as in Proposition 8.1.7. The multivariate generalization of the latter point of view is taken as the definition of Reed-Muller codes, and it will be shown that the shortened Reed-Muller codes are certain cyclic codes.
In this section we assume $n = q^m$. The vector space $\mathbb{F}_q^m$ has $n$ elements. Choose an enumeration $\mathbb{F}_q^m = \{P_1, \ldots, P_n\}$ of its points and let $\mathbf{P} = (P_1, \ldots, P_n)$. Define the evaluation map $\mathrm{ev}_{\mathbf{P}} : \mathbb{F}_q[X_1, \ldots, X_m] \to \mathbb{F}_q^n$ by
$$\mathrm{ev}_{\mathbf{P}}(f) = (f(P_1), \ldots, f(P_n)) \quad \text{for } f \in \mathbb{F}_q[X_1, \ldots, X_m].$$

Definition 8.4.1 The $q$-ary Reed-Muller code $\mathrm{RM}_q(u, m)$ of order $u$ in $m$ variables is defined as
$$\mathrm{RM}_q(u, m) = \{\, \mathrm{ev}_{\mathbf{P}}(f) \mid f \in \mathbb{F}_q[X_1, \ldots, X_m],\ \deg(f) \leq u \,\}.$$

The dual of a Reed-Muller code is again a Reed-Muller code.

Proposition 8.4.2 The dual code of $\mathrm{RM}_q(u, m)$ is equal to $\mathrm{RM}_q(u^\perp, m)$, where $u^\perp = m(q-1) - u - 1$.

Proof. ***

8.4.1 Punctured Reed-Muller codes as cyclic codes

The field $\mathbb{F}_{q^m}$ can be viewed as an $m$-dimensional vector space over $\mathbb{F}_q$. Let $\beta_1, \ldots, \beta_m$ be a basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$. Then we have an isomorphism of vector spaces $\varphi : \mathbb{F}_{q^m} \to \mathbb{F}_q^m$ such that $\varphi(\alpha) = (a_1, \ldots, a_m)$ if and only if $\alpha = \sum_{i=1}^{m} a_i \beta_i$, for every $\alpha \in \mathbb{F}_{q^m}$. Choose a primitive element $\zeta$ of $\mathbb{F}_{q^m}$, that is, a generator of $\mathbb{F}_{q^m}^*$, which is an element of order $q^m - 1$. Now define the $n$ points $\mathbf{P} = (P_1, \ldots, P_n)$ in $\mathbb{F}_q^m$ by $P_1 = 0$ and $P_i = \varphi(\zeta^{i-2})$ for $i = 2, \ldots, n$. Write $P_j = (a_{1j}, a_{2j}, \ldots, a_{mj})$ for $j = 1, \ldots, n$, and let $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_n)$ with
$$\alpha_j = \sum_{i=1}^{m} a_{ij} \beta_i, \qquad j = 1, \ldots, n.$$
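A tiny sketch (ours) of Definition 8.4.1 for $q = 2$, $m = 3$, $u = 1$: evaluating all polynomials of degree at most 1 at the 8 points of $\mathbb{F}_2^3$ yields a binary $[8, 4, 4]$ code.

```python
# Sketch: the Reed-Muller code RM_2(1, 3) as an evaluation code.
from itertools import product

m = 3
points = list(product(range(2), repeat=m))     # the 8 points of F_2^3

def ev(coeffs):
    """coeffs = (c_0, c_1, ..., c_m) encodes f = c_0 + c_1 X_1 + ... + c_m X_m."""
    return tuple((coeffs[0] + sum(c * p for c, p in zip(coeffs[1:], P))) % 2
                 for P in points)

RM = {ev(c) for c in product(range(2), repeat=m + 1)}
nonzero = [w for w in RM if any(w)]
print(len(RM), min(sum(w) for w in nonzero))
# expected: 16 codewords (dimension 4) and minimum weight 4
```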
8.4.2 Reed-Muller codes as subfield subcodes and trace codes

Alternant codes are restrictions of generalized RS codes, and it is shown in [?, Theorem 15] that Sudan's decoding algorithm can be applied to this situation. Following [?] we describe the q-ary Reed-Muller code RM_q(u, m) as a subfield subcode of RM_{q^m}(v, 1) for some v, and this last one is a RS code over F_{q^m}.

In this section, we assume n = q^m. The vector space F_q^m has n elements, which are often called points, i.e., F_q^m = {P_1, . . . , P_n}. Since F_{q^m} ≅ F_q^m as vector spaces over F_q, the elements of F_{q^m} exactly correspond to the points of F_q^m. Define the evaluation maps ev_P : F_q[X_1, . . . , X_m] → F_q^n and ev_α : F_{q^m}[Y] → F^n_{q^m} by

ev_P(f) = (f(P_1), . . . , f(P_n)) for f ∈ F_q[X_1, . . . , X_m]

and

ev_α(g) = (g(α_1), . . . , g(α_n)) for g ∈ F_{q^m}[Y].

Recall that the q-ary Reed-Muller code RM_q(u, m) of order u is defined as

RM_q(u, m) = { ev_P(f) | f ∈ F_q[X_1, . . . , X_m], deg(f) ≤ u }.

Similarly the q^m-ary Reed-Muller code RM_{q^m}(v, 1) of order v is defined as

RM_{q^m}(v, 1) = { ev_α(g) | g ∈ F_{q^m}[Y], deg(g) ≤ v }.

The following proposition is from [?] and [?].

Proposition 8.4.3 Let ρ be the remainder after division of u^⊥ + 1 by q − 1 with quotient e, that is u^⊥ + 1 = e(q − 1) + ρ, where ρ < q − 1. Define d = (ρ + 1)q^e. Then d is the minimum distance of RM_q(u, m).

Proposition 8.4.4 Let n = q^m. Let d be the minimum distance of RM_q(u, m). Then RM_q(u, m) is a subfield subcode of RM_{q^m}(n − d, 1).

Proof. This can be shown by using the corresponding fact for the cyclic punctured codes as shown in Theorem 1 and Corollary 2 of [?]. Here we give a direct proof.

1) Consider the map of rings ϕ : F_{q^m}[Y] → F_{q^m}[X_1, . . . , X_m] defined by ϕ(Y) = β_1X_1 + · · · + β_mX_m. Let Tr : F_{q^m} → F_q be the trace map. This induces an F_q-linear map F_{q^m}[X_1, . . . , X_m] → F_q[X_1, . . . , X_m] that we also denote by Tr and which is defined by

Tr(∑_i f_iX^i) = ∑_i Tr(f_i)X^i,
where the multi-index notation X^i = X_1^{i_1} · · · X_m^{i_m} for i = (i_1, . . . , i_m) ∈ N_0^m is adopted. Define the F_q-linear map T : F_{q^m}[Y] → F_q[X_1, . . . , X_m] by the composition T = Tr ◦ ϕ. The trace map Tr : F^n_{q^m} → F_q^n is defined by Tr(a) = (Tr(a_1), . . . , Tr(a_n)). Consider the following square of maps:

F_{q^m}[Y]  --T-->  F_q[X_1, . . . , X_m]
   |ev_α                 |ev_P
   v                     v
F^n_{q^m}   --Tr-->  F_q^n

We claim that this diagram commutes, that means that ev_P ◦ T = Tr ◦ ev_α. In order to show this it is sufficient that γY^h is mapped to the same element under the two maps for all γ ∈ F_{q^m} and h ∈ N_0, since the maps are F_q-linear and the γY^h generate F_{q^m}[Y] over F_q. Furthermore it is sufficient to show this for the evaluation maps ev_P : F_q[X_1, . . . , X_m] → F_q and ev_α : F_{q^m}[Y] → F_{q^m} for all points P ∈ F_q^m and elements α ∈ F_{q^m} such that P = (a_1, a_2, . . . , a_m) and α = ∑_{i=1}^m a_iβ_i. Now, writing binom(h; i_1, . . . , i_m) for the multinomial coefficient h!/(i_1! · · · i_m!),

ev_P ◦ T(γY^h) = ev_P(Tr(γ(β_1X_1 + · · · + β_mX_m)^h))
= ev_P(Tr(∑_{i_1+···+i_m=h} binom(h; i_1, . . . , i_m) γ(β_1X_1)^{i_1} · · · (β_mX_m)^{i_m}))
= ev_P(∑_{i_1+···+i_m=h} binom(h; i_1, . . . , i_m) Tr(γβ_1^{i_1} · · · β_m^{i_m}) X_1^{i_1} · · · X_m^{i_m})
= ∑_{i_1+···+i_m=h} binom(h; i_1, . . . , i_m) Tr(γβ_1^{i_1} · · · β_m^{i_m}) a_1^{i_1} · · · a_m^{i_m}
= Tr(∑_{i_1+···+i_m=h} binom(h; i_1, . . . , i_m) γβ_1^{i_1} · · · β_m^{i_m} a_1^{i_1} · · · a_m^{i_m})
= Tr(γ(β_1a_1 + · · · + β_ma_m)^h) = Tr(γα^h) = Tr(ev_α(γY^h)) = Tr ◦ ev_α(γY^h).

This shows the commutativity of the diagram.

2) Let h be an integer such that 0 ≤ h ≤ q^m − 1. Express h in radix-q form

h = h_0 + h_1q + h_2q^2 + · · · + h_{m−1}q^{m−1}.
Define the weight of h as W(h) = h_0 + h_1 + h_2 + · · · + h_{m−1}. We show that for every f ∈ F_{q^m}[Y] of the form f = γY^h, with γ ∈ F_{q^m} and h an integer such that 0 ≤ h ≤ q^m − 1, there exists a polynomial g ∈ F_q[X_1, . . . , X_m] such that deg(g) ≤ W(h) and ev_P ◦ T(f) = ev_P(g); by F_q-linearity it is enough to treat these f. Consider

ev_P ◦ T(γY^h) = ev_P ◦ T(γY^{∑_t h_tq^t}) = ev_P ◦ T(γ ∏_{t=0}^{m−1} Y^{h_tq^t}).

Expanding this expression gives

Tr(γ ∏_{t=0}^{m−1} ∑_{i_1+···+i_m=h_t} binom(h_t; i_1, . . . , i_m) (β_1^{i_1} · · · β_m^{i_m})^{q^t} a_1^{i_1} · · · a_m^{i_m}).

Let

g = Tr(γ ∏_{t=0}^{m−1} ∑_{i_1+···+i_m=h_t} binom(h_t; i_1, . . . , i_m) (β_1^{i_1} · · · β_m^{i_m})^{q^t} X_1^{i_1} · · · X_m^{i_m}).

Then this g has the desired properties.

3) A direct consequence of 1) and 2) is Tr(RM_{q^m}(h, 1)) ⊆ RM_q(W(h), m). We defined d = (ρ + 1)q^e, where ρ is the remainder after division of u^⊥ + 1 by q − 1 with quotient e, that is u^⊥ + 1 = e(q − 1) + ρ, where ρ < q − 1. Then d − 1 is the smallest integer h such that W(h) = u^⊥ + 1, see [?, Theorem 5]. Hence W(h) ≤ u^⊥ for all integers h such that 0 ≤ h ≤ d − 2. Therefore

Tr(RM_{q^m}(d − 2, 1)) ⊆ RM_q(u^⊥, m).

4) So RM_q(u, m) ⊆ (Tr(RM_{q^m}(d − 2, 1)))^⊥.

5) Let C be an F_{q^m}-linear code in F^n_{q^m}. The relation between the restriction C ∩ F_q^n and the trace code Tr(C) is given by Delsarte's theorem, see [?] and [?, chap. 7, §8, Theorem 11]:

C ∩ F_q^n = (Tr(C^⊥))^⊥.

Applying this to 4) and using RM_{q^m}(n − d, 1) = RM_{q^m}(d − 2, 1)^⊥ gives

RM_q(u, m) ⊆ RM_{q^m}(n − d, 1) ∩ F_q^n.

Hence RM_q(u, m) is a subfield subcode of RM_{q^m}(n − d, 1).

***Alternative proof making use of the fact that RM is an extension of a restriction of a RS code, and use the duality properties of RS codes and dual(puncture) = shorten(dual)***
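The two ingredients of step 3) — the weight W(h) and the claim that d − 1 is the smallest integer h with W(h) = u^⊥ + 1 — are easy to check numerically. The following plain-Python sketch (ours, assuming the distance formula of Proposition 8.4.3) verifies both for the binary case q = 2, m = 6, u = 3 used in Example 8.4.5 below.

def W(h, q):
    # weight of h: the sum of its radix-q digits h_0 + h_1 + ... + h_{m-1}
    s = 0
    while h:
        s += h % q
        h //= q
    return s

def rm_distance(q, m, u):
    # d = (rho + 1) * q^e where u_perp + 1 = e(q - 1) + rho and 0 <= rho < q - 1
    u_perp = m * (q - 1) - u - 1
    e, rho = divmod(u_perp + 1, q - 1)
    return (rho + 1) * q ** e

q, m, u = 2, 6, 3
d = rm_distance(q, m, u)                       # 8
u_perp = m * (q - 1) - u - 1
smallest = min(h for h in range(q ** m) if W(h, q) == u_perp + 1)
print(d, smallest == d - 1)                    # prints: 8 True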
Example 8.4.5 The code RM_q(u, m) is not necessarily the restriction of RM_{q^m}(n − d, 1). The following example shows that the punctured Reed-Muller code is a proper subcode of the binary BCH code. Take q = 2, m = 6 and u = 3. Then u^⊥ = 2, e = 3 and ρ = 0. So d = 2^3 = 8. The code RM*_2(3, 6) has parameters [63, 42, 7]. The binary BCH code with zeros ζ^i with i ∈ {1, 2, 3, 4, 5, 6} has complete defining set the union of the sets {1, 2, 4, 8, 16, 32}, {3, 6, 12, 24, 48, 33} and {5, 10, 20, 40, 17, 34}. So the dimension of the BCH code is 63 − 3 · 6 = 45. Therefore the BCH code has parameters [63, 45, 7] and it has the punctured RM code as a subcode, but they are not equal. This is explained by the zero 9 = 1 + 2^3 having 2-weight equal to 2 ≤ u^⊥, whereas no element of the cyclotomic coset {9, 18, 36} of 9 is in the set {1, 2, 3, 4, 5, 6}. The BCH code is the binary restriction of RM*_64(56, 1). Hence RM_2(3, 6) is a subcode of the binary restriction of RM_64(56, 1), but they are not equal.

8.4.3 Exercises

8.4.1 Show that the Shift bound for RM_q(u, m)* considered as a cyclic code is equal to the actual minimum distance.

8.5 Notes

Subfield subcodes of RS codes, McEliece-Solomon. Numerous applications of Reed-Solomon codes can be found in [135]. Twisted BCH codes by Edel. Folded RS codes by Guruswami. Stichtenoth-Wirtz. Cauchy and Srivastava codes, Roth-Seroussi and Dür. Proposition 8.3.19 is due to Carlitz [37]. See also [11, Exercise (3.3)]. Proposition 8.3.21 is a generalization of Retter [98].
Chapter 9

Algebraic decoding

Ruud Pellikaan and Xin-Wen Wu

*** intro ***

9.1 Error-correcting pairs

In this section we give an algebraic way, that is by solving a system of linear equations, to compute the error positions of a received word with respect to Reed-Solomon codes. The complexity of this algorithm is O(n^3).

9.1.1 Decoding by error-correcting pairs

In Definition 7.4.9 we introduced the star product a ∗ b for a, b ∈ F_q^n by the coordinatewise multiplication a ∗ b = (a_1b_1, . . . , a_nb_n).

Remark 9.1.1 Notice that multiplying polynomials first and then evaluating gives the same answer as first evaluating and then multiplying. That is, if f(X), g(X) ∈ F_q[X] and h(X) = f(X)g(X), then h(a) = f(a)g(a) for all a ∈ F_q. So

ev(f(X)g(X)) = ev(f(X)) ∗ ev(g(X)) and ev_a(f(X)g(X)) = ev_a(f(X)) ∗ ev_a(g(X))

for the evaluation maps ev and ev_a.

Proposition 9.1.2 Let k + l ≤ n. Then

⟨GRS_k(a, b) ∗ GRS_l(a, c)⟩ = GRS_{k+l−1}(a, b ∗ c)

and

⟨RS_k(n, b) ∗ RS_l(n, c)⟩ = RS_{k+l−1}(n, b + c − 1) if n = q − 1,

where ⟨·⟩ denotes the linear span.

Proof. Now GRS_k(a, b) = { ev_a(f(X)) ∗ b | f(X) ∈ F_q[X], deg f(X) < k } and similar statements hold for GRS_l(a, c) and GRS_{k+l−1}(a, b ∗ c). Furthermore

(ev_a(f(X)) ∗ b) ∗ (ev_a(g(X)) ∗ c) = ev_a(f(X)g(X)) ∗ b ∗ c,
and deg f(X)g(X) < k + l − 1 if deg f(X) < k and deg g(X) < l. Hence

GRS_k(a, b) ∗ GRS_l(a, c) ⊆ GRS_{k+l−1}(a, b ∗ c).

In general equality does not hold for the set of star products, but for the spans we have

⟨GRS_k(a, b) ∗ GRS_l(a, c)⟩ = GRS_{k+l−1}(a, b ∗ c),

since on both sides the vector spaces are generated by the elements

(ev_a(X^i) ∗ b) ∗ (ev_a(X^j) ∗ c) = ev_a(X^{i+j}) ∗ b ∗ c, where 0 ≤ i < k and 0 ≤ j < l.

Let n = q − 1. Let α be a primitive element of F_q^*. Define a_j = α^{j−1} and b_j = a_j^{n−b+1} for j = 1, . . . , n. Then RS_k(n, b) = GRS_k(a, b) by Example 8.1.11. Similar statements hold for RS_l(n, c) and RS_{k+l−1}(n, b + c − 1). The statement concerning the star product of RS codes is now a consequence of the corresponding statement on the GRS codes.

Example 9.1.3 Let n = q − 1, k, l > 0 and k + l ≤ n. Then RS_k(n, 1) is in one-to-one correspondence with polynomials of degree at most k − 1, and similar statements hold for RS_l(n, 1) and RS_{k+l−1}(n, 1). Now RS_k(n, 1) ∗ RS_l(n, 1) corresponds one-to-one with polynomials that are a product of polynomials of degree at most k − 1 and l − 1, respectively, that is to certain reducible polynomials over F_q of degree at most k + l − 1. There exists an irreducible polynomial of degree k + l − 1, by Remark 7.2.20, and it is not such a product. Hence RS_k(n, 1) ∗ RS_l(n, 1) ≠ RS_{k+l−1}(n, 1).

Definition 9.1.4 Let A and B be linear subspaces of F_q^n. Let r ∈ F_q^n. Define the kernel

K(r) = { a ∈ A | (a ∗ b) · r = 0 for all b ∈ B }.

Definition 9.1.5 Let B^∨ be the space of all linear functions β : B → F_q. Now K(r) is a subspace of A and it is the kernel of the linear map S_r : A → B^∨ defined by a ↦ β_a, where β_a(b) = (a ∗ b) · r. Let a_1, . . . , a_l and b_1, . . . , b_m be bases of A and B, respectively. Then the map S_r has the m × l syndrome matrix

((b_i ∗ a_j) · r | 1 ≤ j ≤ l, 1 ≤ i ≤ m)

with respect to these bases.

Example 9.1.6 Let A = RS_{t+1}(n, 1) and B = RS_t(n, 0). Then A ∗ B is contained in RS_{2t}(n, 0) by Proposition 9.1.2. Let C = RS_{n−2t}(n, 1). As g_{n,k}(X) = g_{0,k}(X) for n = q − 1 by the definition of Reed-Solomon codes, we have C^⊥ = RS_{2t}(n, 0) by Proposition 8.1.2. Hence A ∗ B ⊆ C^⊥. Let a_i = ev(X^{i−1}) for i = 1, . . . , t + 1, b_j = ev(X^j) for j = 1, . . . , t, and h_l = ev(X^l) for l = 1, . . . , 2t. Then a_1, . . . , a_{t+1} is a basis of A and b_1, . . . , b_t is a basis of B. The vectors h_1, . . . , h_{2t} form the rows of a parity check matrix H for C. Then a_i ∗ b_j = ev(X^{i+j−1}) = h_{i+j−1}. Let r be a received word and s = rH^T its syndrome. Then

(b_i ∗ a_j) · r = s_{i+j−1}.
Hence to compute the kernel K(r) we have to compute the null space of the matrix of syndromes

( s_1     s_2      · · ·  s_t      s_{t+1} )
( s_2     s_3      · · ·  s_{t+1}  s_{t+2} )
( ...     ...             ...      ...     )
( s_t     s_{t+1}  · · ·  s_{2t−1} s_{2t}  ).

We have seen this matrix before as the coefficient matrix of the set of equations for the computation of the error-locator polynomial in the algorithm of APGZ 7.5.3.

Lemma 9.1.7 Let C be an F_q-linear code of length n. Let r be a received word with error vector e. If A ∗ B ⊆ C^⊥, then K(r) = K(e).

Proof. We have that r = c + e for some codeword c ∈ C. Now a ∗ b is a parity check for C, since A ∗ B ⊆ C^⊥. So (a ∗ b) · c = 0, and hence (a ∗ b) · r = (a ∗ b) · e for all a ∈ A and b ∈ B.

Let J be a subset of {1, . . . , n}. The subspace

A(J) = { a ∈ A | a_j = 0 for all j ∈ J }

was defined in 4.4.10.

Lemma 9.1.8 Let A ∗ B ⊆ C^⊥. Let e be the error vector of the received word r. If I = supp(e) = { i | e_i ≠ 0 }, then A(I) ⊆ K(r). If moreover d(B^⊥) > wt(e), then A(I) = K(r).

Proof. 1) Let a ∈ A(I). Then a_i = 0 for all i such that e_i ≠ 0, and therefore

(a ∗ b) · e = ∑_{i : e_i ≠ 0} a_ib_ie_i = 0

for all b ∈ B. So a ∈ K(e). But K(e) = K(r) by Lemma 9.1.7. Hence a ∈ K(r). Therefore A(I) ⊆ K(r).

2) Suppose moreover that d(B^⊥) > wt(e). Let a ∈ K(r); then a ∈ K(e) by Lemma 9.1.7. Hence (e ∗ a) · b = e · (a ∗ b) = 0 for all b ∈ B, giving e ∗ a ∈ B^⊥. Now

wt(e ∗ a) ≤ wt(e) < d(B^⊥).

So e ∗ a = 0, meaning that e_ia_i = 0 for all i. Hence a_i = 0 for all i such that e_i ≠ 0, that is for all i ∈ I = supp(e). Hence a ∈ A(I). Therefore K(r) ⊆ A(I) and equality holds by 1).

Remark 9.1.9 Let I = supp(e) be the set of error positions. The set of zero coordinates of a ∈ A(I) contains the set of error positions by Lemma 9.1.8. For that reason the elements of A(I) are called error-locator vectors or functions. But the space A(I) is not known to the receiver. The space K(r) can be computed after receiving the word r. The equality A(I) = K(r) implies that all elements of K(r) are error-locator functions.

Let A ∗ B ⊆ C^⊥. The basic algorithm for the code C computes the kernel K(r)
for every received word r. If this kernel is nonzero, it takes a nonzero element a and determines the set J of zero positions of a. If d(B^⊥) > wt(e), where e is the error vector, then J contains the support of e by Lemma 9.1.8. If the set J is not too large, the error values are computed. Thus we have a basic algorithm for every pair (A, B) of subspaces of F_q^n such that A ∗ B ⊆ C^⊥. If A is small with respect to the number of errors, then K(r) = 0. If A is large, then B becomes small, which results in a large code B^⊥, and it will be difficult to meet the requirement d(B^⊥) > wt(e).

Definition 9.1.10 Let A, B and C be subspaces of F_q^n. Then (A, B) is called a t-error-correcting pair for C if the following conditions are satisfied:

1. A ∗ B ⊆ C^⊥,
2. dim(A) > t,
3. d(B^⊥) > t,
4. d(A) + d(C) > n.

Proposition 9.1.11 Let (A, B) be a t-error-correcting pair for C. Then the basic algorithm corrects t errors for the code C with complexity O(n^3).

Proof. The pair (A, B) is t-error-correcting for C, so A ∗ B ⊆ C^⊥ and the basic algorithm can be applied to decode C. If a received word r has at most t errors, then the error vector e with support I has size at most t and A(I) is not zero, since I imposes at most t linear conditions on A and the dimension of A is at least t + 1. Let a be a nonzero element of K(r). Let J = { j | a_j = 0 }. We assumed that d(B^⊥) > t. So K(r) = A(I) by Lemma 9.1.8. So a is an error-locator vector and J contains I. The weight of the vector a is at least d(A), so a has at most n − d(A) < d(C) zeros by condition 4 of Definition 9.1.10. Hence |J| < d(C) and Proposition 6.2.9 or 6.2.15 gives the error values. The complexity is that of solving systems of linear equations, that is O(n^3).

We will show the existence of error-correcting pairs for (generalized) Reed-Solomon codes.

Proposition 9.1.12 The codes GRS_{n−2t}(a, b) and RS_{n−2t}(n, b) have t-error-correcting pairs.

Proof. Let C = GRS_{n−2t}(a, b). Then C^⊥ = GRS_{2t}(a, c) for some c by Proposition 8.1.21. Let A = GRS_{t+1}(a, 1) and B = GRS_t(a, c). Then A ∗ B ⊆ C^⊥ by Proposition 9.1.2. The codes A, B and C have parameters [n, t + 1, n − t], [n, t, n − t + 1] and [n, n − 2t, 2t + 1], respectively, by Proposition 8.1.14. Furthermore B^⊥ has parameters [n, n − t, t + 1] by Corollary 3.2.7, so it has minimum distance t + 1 > t. Moreover d(A) + d(C) = (n − t) + (2t + 1) > n. Hence (A, B) is a t-error-correcting pair for C. The code RS_{n−2t}(n, b) is of the form GRS_{n−2t}(a, b). Therefore the pair of codes (RS_{t+1}(n, 1), RS_t(n, n − b + 1)) is a t-error-correcting pair for the code RS_{n−2t}(n, b).
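Example 9.1.13 below carries out the basic algorithm by hand for C = RS_11(15, 1) and t = 2. The following plain-Python sketch (ours, not part of the book) redoes its two linear-algebra steps: the null space of the syndrome matrix of Example 9.1.6 over F_16, and the search for the zeros of the resulting error-locator polynomial. The field F_16 is realized with α^4 = α + 1; elements are stored as 4-bit integers with α = 2, and addition is XOR.

MOD = 0b10011                      # x^4 + x + 1, so alpha^4 = alpha + 1

def mul(a, b):                     # multiplication in F_16
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0b10000:
            a ^= MOD
        b >>= 1
    return r

def inv(a):                        # inverse by exhaustive search in the 16-element field
    return next(x for x in range(1, 16) if mul(a, x) == 1)

A = [1]                            # A[i] = alpha^i
for _ in range(14):
    A.append(mul(A[-1], 2))

s = [A[10], 1, 1, A[10]]           # syndromes (s1, s2, s3, s4) of Example 9.1.13
M = [[s[0], s[1], s[2]],           # matrix of syndromes whose null space is K(r)
     [s[1], s[2], s[3]]]

row = 0                            # Gauss-Jordan elimination over F_16
for col in range(3):
    piv = next((i for i in range(row, 2) if M[i][col]), None)
    if piv is None:
        continue
    M[row], M[piv] = M[piv], M[row]
    ivp = inv(M[row][col])
    M[row] = [mul(ivp, x) for x in M[row]]
    for i in range(2):
        if i != row and M[i][col]:
            f = M[i][col]
            M[i] = [x ^ mul(f, y) for x, y in zip(M[i], M[row])]
    row += 1

a0, a1, a2 = M[0][2], M[1][2], 1   # free variable a2 = 1; minus equals plus here
roots = [i for i in range(15)
         if a0 ^ mul(a1, A[i]) ^ mul(a2, mul(A[i], A[i])) == 0]
print((a0, a1, a2), roots)         # (1, 6, 1): 1 + alpha^5 X + X^2, roots i = 6, 9

The zeros α^6 and α^9 point at the 7-th and 10-th coordinates, in agreement with the example.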
Example 9.1.13 Choose α ∈ F_16 with α^4 = α + 1 as primitive element of F_16. Let C = RS_11(15, 1). Let

r = (0, α^4, α^8, α^14, α, α^10, α^7, α^9, α^2, α^13, α^5, α^12, α^11, α^6, α^3)

be a received word with respect to the code C with 2 errors. We show how to find the transmitted codeword by means of the basic algorithm. The dual of C is equal to RS_4(15, 0). Hence RS_3(15, 1) ∗ RS_2(15, 0) is contained in RS_4(15, 0). Take A = RS_3(15, 1) and B = RS_2(15, 0). Then A is a [15, 3, 13] code, and the dual of B is RS_13(15, 1), which has minimum distance 3. Therefore (A, B) is a 2-error-correcting pair for C by Proposition 9.1.12. Let H = (α^{ij} | 1 ≤ i ≤ 4, 0 ≤ j ≤ 14). Then H is a parity check matrix of C. The syndrome vector of r equals

(s_1, s_2, s_3, s_4) = rH^T = (α^10, 1, 1, α^10).

The space K(r) consists of the evaluations ev(a_0 + a_1X + a_2X^2) of all polynomials a_0 + a_1X + a_2X^2 such that (a_0, a_1, a_2)^T is in the null space of the matrix

( s_1 s_2 s_3 )   ( α^10 1 1    )   ( 1 0 1   )
( s_2 s_3 s_4 ) = ( 1    1 α^10 ) ∼ ( 0 1 α^5 ).

So K(r) = ⟨ev(1 + α^5X + X^2)⟩. The polynomial 1 + α^5X + X^2 has α^6 and α^9 as zeros. Hence the error positions are at the 7-th and 10-th coordinate. In order to compute the error values by Proposition 6.2.9 we have to find a linear combination of the 7-th and 10-th columns of H that equals the syndrome vector. The system

( α^6  α^9  | α^10 )
( α^12 α^3  | 1    )
( α^3  α^12 | 1    )
( α^9  α^6  | α^10 )

has (α^5, α^5)^T as unique solution. That is, the error vector e has e_7 = α^5, e_10 = α^5 and e_i = 0 for all i ∉ {7, 10}. Therefore the transmitted codeword is

c = r − e = (0, α^4, α^8, α^14, α, α^10, α^13, α^9, α^2, α^7, α^5, α^12, α^11, α^6, α^3).

9.1.2 Existence of error-correcting pairs

Example 9.1.14 Let C be the binary cyclic code with defining set {1, 3, 7, 9} as in Examples 7.4.8 and 7.4.17. Then d(C) ≥ 7 by the Roos bound 7.4.16 with U = {0, 4, 12, 20} and V = {2, 3, 4}. ***This gives us an error-correcting pair.***

Remark 9.1.15 The great similarity between the concept of an error-correcting pair and the techniques used by Van Lint and Wilson in the AB bound can be seen in the reformulation of the Roos bound in Remark 7.4.25. A special case of this reformulation is obtained if we take a = b = t.

Proposition 9.1.16 Let C be an F_q-linear code of length n. Let (A, B) be a pair of F_{q^m}-linear codes of length n such that the following properties hold: (1) (A ∗ B) ⊥ C, (2) k(A) > t, (3) d(B^⊥) > t, (4) d(A) + 2t > n and (5) d(A^⊥) > 1. Then d(C) ≥ 2t + 1 and (A, B) is a t-error-correcting pair for C.
Proof. The conclusion on the minimum distance of C is explained in Remark 7.4.25. Conditions (1), (2) and (3) are the same as the ones in the definition of a t-error-correcting pair. Condition (4) of the proposition is stronger than condition 4 of the definition, since

d(A) + d(C) ≥ d(A) + 2t + 1 > d(A) + 2t > n.

Remark 9.1.17 As a consequence of this proposition there is an abundance of examples of codes C with minimum distance at least 2t + 1 that have a t-error-correcting pair. Take for instance A and B MDS codes with parameters [n, t + 1, n − t] and [n, t, n − t + 1], respectively. Then k(A) > t and d(B^⊥) > t, since B^⊥ is an [n, n − t, t + 1] code. Take C = (A ∗ B)^⊥. Then d(C) ≥ 2t + 1 and (A, B) is a t-error-correcting pair for C. The dimension of C is at least n − t(t + 1) and is most of the time equal to this lower bound.

Remark 9.1.18 For a given code C it is hard to find a t-error-correcting pair with t close to half the minimum distance. Generalized Reed-Solomon codes have this property, as we have seen in ??, and algebraic geometry codes have it too, as we shall see in ??. We conjecture that if an [n, n − 2t, 2t + 1] MDS code has a t-error-correcting pair, then this code is a GRS code. This is proven in the cases t = 1 and t = 2.

Proposition 9.1.19 Let C be an F_q-linear code of length n and minimum distance d. Then C has a t-error-correcting pair if t ≤ (n − 1)/(n − d + 2).

Proof. There exist an m and an F_{q^m}-linear [n, n − d + 1, d] code D that contains C, by Corollary 4.3.25. Let t be a positive integer such that t ≤ (n − 1)/(n − d + 2). It is sufficient to show that D has a t-error-correcting pair. Let B be an [n, t, n − t + 1] code with the all-ones vector in it. Such a code exists if m is sufficiently large. Then B^⊥ is an [n, n − t, t + 1] code. So d(B^⊥) > t. Take A = (B ∗ D)^⊥. Now A ⊆ D^⊥, since the all-ones vector is in B and hence D ⊆ B ∗ D. We have that D^⊥ is an [n, d − 1, n − d + 2] code, so d(A) ≥ d(D^⊥) = n − d + 2. Hence d(A) + d(D) > n. Let b_1, . . . , b_t be a basis of B and d_1, . . . , d_{n−d+1} be a basis of D. Then x ∈ A if and only if x · (b_i ∗ d_j) = 0 for all i = 1, . . . , t and j = 1, . . . , n − d + 1. This is a system of t(n − d + 1) homogeneous linear equations in n unknowns, and n − t(n − d + 1) ≥ t + 1 by assumption. Hence k(A) ≥ n − t(n − d + 1) > t. Therefore (A, B) is a t-error-correcting pair for D, and a fortiori for C.

9.1.3 Exercises

9.1.1 Choose α ∈ F_16 with α^4 = α + 1 as primitive element of F_16. Let C = RS_11(15, 0). Let

r = (α, 0, α^11, α^10, α^5, α^13, α, α^8, α^5, α^10, α^4, α^4, α^2, 0, 0)

be a received word with respect to the code C with 2 errors. Find the transmitted codeword.

9.1.2 Consider the binary cyclic code of length 21 and defining set {0, 1, 3, 7}. This code has minimum distance 8. Give a 3-error-correcting pair for this code.

9.1.3 Consider the binary cyclic code of length 35 and defining set {1, 5, 7}. This code has minimum distance 7. Give a 3-error-correcting pair for this code.
9.2 Decoding by key equation

In Section 7.5.5 we introduced the key equation. Now we introduce two algorithms which solve the key equation, and thus decode cyclic codes efficiently.

9.2.1 Algorithm of Euclid-Sugiyama

In Section 7.5.5 we have seen that the decoding of a BCH code with designed minimum distance δ is reduced to the problem of finding a pair of polynomials (σ(Z), ω(Z)) satisfying the following key equation for a given syndrome polynomial S(Z) = ∑_{i=1}^{δ−1} S_iZ^{i−1},

σ(Z)S(Z) ≡ ω(Z) (mod Z^{δ−1}),

such that deg(σ(Z)) ≤ t = ⌊(δ − 1)/2⌋ and deg(ω(Z)) ≤ deg(σ(Z)) − 1. Here σ(Z) = ∑_{i=1}^{t+1} σ_iZ^{i−1} is the error-locator polynomial, and ω(Z) = ∑_{i=1}^{t} ω_iZ^{i−1} is the error-evaluator polynomial. Note that σ_1 = 1 by definition.

Given the key equation, the Euclid-Sugiyama algorithm (which is also called the Sugiyama algorithm in the literature) finds the error-locator and error-evaluator polynomials by an iterative procedure. This algorithm is based on the well-known Euclidean algorithm. To better understand the algorithm, we briefly review the Euclidean algorithm first. For a pair of univariate polynomials, namely r_{−1}(Z) and r_0(Z), the Euclidean algorithm finds their greatest common divisor, which we denote by gcd(r_{−1}(Z), r_0(Z)). The Euclidean algorithm proceeds as follows:

r_{−1}(Z) = q_1(Z)r_0(Z) + r_1(Z),   deg(r_1(Z)) < deg(r_0(Z)),
r_0(Z) = q_2(Z)r_1(Z) + r_2(Z),   deg(r_2(Z)) < deg(r_1(Z)),
. . .
r_{s−2}(Z) = q_s(Z)r_{s−1}(Z) + r_s(Z),   deg(r_s(Z)) < deg(r_{s−1}(Z)),
r_{s−1}(Z) = q_{s+1}(Z)r_s(Z).

In each iteration of the algorithm, the operation r_{j−2}(Z) = q_j(Z)r_{j−1}(Z) + r_j(Z), with deg(r_j(Z)) < deg(r_{j−1}(Z)), is implemented by division of polynomials, that is, dividing r_{j−2}(Z) by r_{j−1}(Z), with r_j(Z) being the remainder. The algorithm keeps running until it finds a remainder which is the zero polynomial. That is, the algorithm stops after it completes the s-th iteration, where s is the smallest j such that r_{j+1}(Z) = 0. It is easy to prove that r_s(Z) = gcd(r_{−1}(Z), r_0(Z)).

We are now ready to present the Euclid-Sugiyama algorithm for solving the key equation.

Algorithm 9.2.1 (Euclid-Sugiyama Algorithm)
Input: r_{−1}(Z) = Z^{δ−1}, r_0(Z) = S(Z), U_{−1}(Z) = 0, and U_0(Z) = 1.
Proceed with the Euclidean algorithm for r_{−1}(Z) and r_0(Z), as presented above, until an r_s(Z) is reached such that

deg(r_{s−1}(Z)) ≥ (δ − 1)/2 and deg(r_s(Z)) ≤ (δ − 3)/2,
updating in each iteration

U_j(Z) = q_j(Z)U_{j−1}(Z) + U_{j−2}(Z).

Output: the pair of polynomials

σ(Z) = εU_s(Z),
ω(Z) = (−1)^s ε r_s(Z),

where ε is chosen such that σ_0 = σ(0) = 1. These are the error-locator and error-evaluator polynomials.

Note that the Euclid-Sugiyama algorithm does not have to run the Euclidean algorithm completely; it has a different stopping parameter s.

Example 9.2.2 Consider the code C given in Examples 7.5.13 and 7.5.21. It is a narrow sense BCH code of length 15 over F_16 of designed minimum distance δ = 5. Let r be the received word

r = (α^5, α^8, α^11, α^10, α^10, α^7, α^12, α^11, 1, α, α^12, α^14, α^12, α^2, 0).

Then S_1 = α^12, S_2 = α^7, S_3 = 0 and S_4 = α^2. So S(Z) = α^12 + α^7Z + α^2Z^3. Running the Euclid-Sugiyama algorithm with the input S(Z), the results for each iteration are given by the following table.

j   r_{j−1}(Z)                r_j(Z)                  U_{j−1}(Z)   U_j(Z)
0   Z^4                       α^2Z^3 + α^7Z + α^12    0            1
1   α^2Z^3 + α^7Z + α^12      α^5Z^2 + α^10Z          1            α^13Z
2   α^5Z^2 + α^10Z            α^2Z + α^12             α^13Z        α^10Z^2 + Z + 1

Thus we have found the error-locator polynomial

σ(Z) = U_2(Z) = 1 + Z + α^10Z^2,

and the error-evaluator polynomial

ω(Z) = r_2(Z) = α^12 + α^2Z.

9.2.2 Algorithm of Berlekamp-Massey

Consider again the following key equation

σ(Z)S(Z) ≡ ω(Z) (mod Z^{δ−1}),

such that deg(σ(Z)) ≤ t = ⌊(δ − 1)/2⌋ and deg(ω(Z)) ≤ deg(σ(Z)) − 1, where S(Z) = ∑_{i=1}^{δ−1} S_iZ^{i−1} is given. It is easy to show that the problem of solving the key equation is equivalent to the problem of solving the following matrix equation with unknowns (σ_2, . . . , σ_{t+1})^T:
( S_t      S_{t−1}  . . .  S_1 ) ( σ_2     )     ( S_{t+1} )
( S_{t+1}  S_t      . . .  S_2 ) ( σ_3     )     ( S_{t+2} )
( ...      ...             ... ) ( ...     ) = − ( ...     )
( S_{2t−1} S_{2t−2} . . .  S_t ) ( σ_{t+1} )     ( S_{2t}  )

The Berlekamp-Massey algorithm, which we will introduce in this section, can solve this matrix equation by finding σ_2, . . . , σ_{t+1} satisfying the following recursion:

S_i = − ∑_{j=2}^{t+1} σ_jS_{i−j+1},   i = t + 1, . . . , 2t.

We should point out that the Berlekamp-Massey algorithm actually solves a more general problem: for a given sequence E_0, E_1, E_2, . . . , E_{N−1} of length N (which we denote by E in the rest of the section), it finds the recursion

E_i = − ∑_{j=1}^{L} Λ_jE_{i−j},   i = L, . . . , N − 1,

for which L is smallest. If the matrix equation has no solution, the Berlekamp-Massey algorithm then finds a recursion with L > t. To make it more convenient to present the Berlekamp-Massey algorithm and to prove its correctness, we denote Λ(Z) = ∑_{i=0}^{L} Λ_iZ^i with Λ_0 = 1. The above recursion is denoted by (Λ(Z), L), and L = deg(Λ(Z)) is called the length of the recursion.

The Berlekamp-Massey algorithm is an iterative procedure for finding the shortest recursion for producing successive terms of the sequence E. The r-th iteration of the algorithm finds the shortest recursion (Λ^(r)(Z), L_r), where L_r = deg(Λ^(r)(Z)), for producing the first r terms of the sequence E, that is,

E_i = − ∑_{j=1}^{L_r} Λ_j^(r)E_{i−j},   i = L_r, . . . , r − 1,

or equivalently,

∑_{j=0}^{L_r} Λ_j^(r)E_{i−j} = 0,   i = L_r, . . . , r − 1,

with Λ_0^(r) = 1.

Algorithm 9.2.3 (Berlekamp-Massey Algorithm)
(Initialization) r = 0, Λ(Z) = B(Z) = 1, L = 0, λ = 1, and b = 1.
1) If r = N, stop. Otherwise, compute

∆ = ∑_{j=0}^{L} Λ_jE_{r−j}.

2) If ∆ = 0, then λ ← λ + 1, and go to 5).
3) If ∆ ≠ 0 and 2L > r, then

Λ(Z) ← Λ(Z) − ∆b^{−1}Z^λB(Z),
λ ← λ + 1,

and go to 5).
4) If ∆ ≠ 0 and 2L ≤ r, then

T(Z) ← Λ(Z)   (temporary storage of Λ(Z)),
Λ(Z) ← Λ(Z) − ∆b^{−1}Z^λB(Z),
L ← r + 1 − L,
B(Z) ← T(Z),
b ← ∆,
λ ← 1.

5) r ← r + 1 and return to 1).

Example 9.2.4 Consider again the code C given in Example 9.2.2. Let r be the received word

r = (α^5, α^8, α^11, α^10, α^10, α^7, α^12, α^11, 1, α, α^12, α^14, α^12, α^2, 0).

Then S_1 = α^12, S_2 = α^7, S_3 = 0 and S_4 = α^2. Now let us compute the error-locator polynomial σ(Z) by using the Berlekamp-Massey algorithm. Letting E_i = S_{i+1} for i = 0, 1, 2, 3, we have the sequence

E = {E_0, E_1, E_2, E_3} = {α^12, α^7, 0, α^2}

as the input of the algorithm. The intermediate and final results of the algorithm are given in the following table.

r   ∆      B(Z)          Λ(Z)                   L
0   α^12   1             1                      0
1   1      1             1 + α^12Z              1
2   α^2    1             1 + α^10Z              1
3   α^7    1 + α^10Z     1 + α^10Z + α^5Z^2     2
4   –      1 + α^10Z     1 + Z + α^10Z^2        2

The result of the last iteration of the Berlekamp-Massey algorithm, Λ(Z), is the error-locator polynomial. That is,

σ(Z) = σ_1 + σ_2Z + σ_3Z^2 = Λ(Z) = Λ_0 + Λ_1Z + Λ_2Z^2 = 1 + Z + α^10Z^2.

Substituting this into the key equation, we then get ω(Z) = α^12 + α^2Z.

Proof of the correctness: will be done.

Complexity and some comparison between E-S and B-M algorithms: will be done.
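The following plain-Python sketch of Algorithm 9.2.3 (a minimal implementation of ours, not part of the book) reproduces Example 9.2.4. The field F_16 is realized with α^4 = α + 1; elements are 4-bit integers with α = 2, and the subtraction in steps 3) and 4) coincides with XOR in characteristic 2.

MOD = 0b10011                       # x^4 + x + 1

def mul(a, b):                      # multiplication in F_16
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0b10000:
            a ^= MOD
        b >>= 1
    return r

def inv(a):
    return next(x for x in range(1, 16) if mul(a, x) == 1)

def berlekamp_massey(E):
    # returns [Lambda_0, ..., Lambda_L], the shortest recursion for E
    Lam, B = [1], [1]               # Lambda(Z) and B(Z)
    L, lam, b = 0, 1, 1
    for r in range(len(E)):
        d = 0                       # step 1): Delta = sum_{j=0}^{L} Lambda_j E_{r-j}
        for j in range(min(L, r) + 1):
            if j < len(Lam):
                d ^= mul(Lam[j], E[r - j])
        if d == 0:                  # step 2)
            lam += 1
            continue
        coef = mul(d, inv(b))
        T = Lam[:]                  # temporary storage as in step 4)
        Lam = Lam + [0] * max(0, lam + len(B) - len(Lam))
        for i, Bi in enumerate(B):  # Lambda <- Lambda - coef * Z^lam * B
            Lam[lam + i] ^= mul(coef, Bi)
        if 2 * L <= r:              # step 4): length change
            L, B, b, lam = r + 1 - L, T, d, 1
        else:                       # step 3)
            lam += 1
    return Lam

A = [1]                             # A[i] = alpha^i
for _ in range(14):
    A.append(mul(A[-1], 2))

E = [A[12], A[7], 0, A[2]]          # E_i = S_{i+1} as in Example 9.2.4
print(berlekamp_massey(E))          # [1, 1, 7]

The output [1, 1, 7] encodes Λ(Z) = 1 + Z + α^10Z^2, since α^10 = α^2 + α + 1 is stored as the bit pattern 0111 = 7, matching the last row of the table above.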
9.2.3 Exercises

9.2.1 Take α ∈ F_16^* with α^4 = 1 + α as primitive element. Let C be the BCH code over F_16, of length 15 and designed minimum distance 5, with defining set {1, 2, 3, 4, 6, 8, 9, 12}. The generator polynomial is 1 + X^4 + X^6 + X^7 + X^8 (see Example 7.3.13). Let

r = (0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0)

be a received word with respect to the code C. Find the syndrome polynomial S(Z). Write the key equation.

9.2.2 Consider the same code and the same received word given in the last exercise. Using the Berlekamp-Massey algorithm, compute the error-locator polynomial. Determine the number of errors that occurred in the received word.

9.2.3 For the same code and the same received word given in the previous exercises, using the Euclid-Sugiyama algorithm, compute the error-locator and error-evaluator polynomials. Find the codeword which is closest to the received word.

9.2.4 Let α ∈ F_16^* with α^4 = 1 + α as in Exercise 9.2.1. For the following sequence E over F_16, using the Berlekamp-Massey algorithm, find the shortest recursion for producing successive terms of E:

E = {α^12, 1, α^14, α^13, 1, α^11}.

9.2.5 Consider the [15, 9, 7] Reed-Solomon code over F_16 with defining set {1, 2, 3, 4, 5, 6}. Suppose the received word is

r = (0, 0, α^11, 0, 0, α^5, 0, α, 0, 0, 0, 0, 0, 0, 0).

Using the Berlekamp-Massey algorithm, find the codeword which is closest to the received word.

9.3 List decoding by Sudan's algorithm

A decoding algorithm is efficient if its complexity is bounded above by a polynomial in the code length. Brute-force decoding is not efficient, because for a received word it may need to compare q^k codewords to return the most appropriate codeword. The idea behind list decoding is that instead of returning a unique codeword, the list decoder returns a small list of codewords. A list-decoding algorithm is efficient if both the complexity and the size of the output list of the algorithm are bounded above by polynomials in the code length. List decoding was first introduced by Elias and Wozencraft in the 1950s.

We now describe a list decoder more precisely. Suppose C is a q-ary [n, k, d] code and t ≤ n is a positive integer. For any received word r = (r_1, . . . , r_n) ∈ F_q^n, we refer to any codeword c in C satisfying d(c, r) ≤ t as a t-consistent codeword. Let l be a positive integer less than or equal to q^k. The code C is called (t, l)-decodable if for any word r ∈ F_q^n the number of t-consistent codewords is at most l. If for any received word a list decoder can find all the t-consistent codewords, and the output list has at most l codewords, then the decoder is called a (t, l)-list decoder.

In 1997, Sudan proposed the first efficient list-decoding algorithm for Reed-Solomon codes. Later, Sudan's list-decoding algorithm was generalized to decoding algebraic-geometric codes and Reed-Muller codes.
9.3.1 Error-correcting capacity

Suppose a decoding algorithm can find all the t-consistent codewords for any received word. We call t the error-correcting capacity or decoding radius of the decoding algorithm. As we have seen in Section ??, for any [n, k, d] code, if t ≤ (d − 1)/2, then there is only one t-consistent codeword for any received word. In other words, any [n, k, d] code is (⌊(d − 1)/2⌋, 1)-decodable. The decoding algorithms in the previous sections return a unique codeword for any received word, and they achieve an error-correcting capability less than or equal to ⌊(d − 1)/2⌋. List decoding achieves a decoding radius greater than ⌊(d − 1)/2⌋, while the size of the output list must be bounded above by a polynomial in n.

It is natural to ask the following question: for an [n, k, d] linear code C over F_q, what is the maximal value t such that C is (t, l)-decodable for an l which is bounded above by a polynomial in n? In the following, we give a lower bound on the maximum t such that C is (t, l)-decodable, which is called the Johnson bound in the literature.

Proposition 9.3.1 Let C ⊆ F_q^n be any linear code of minimum distance d = (1 − 1/q)(1 − β)n for 0 < β < 1. Let t = (1 − 1/q)(1 − γ)n for 0 < γ < 1. Then for any word r ∈ F_q^n,

|B_t(r) ∩ C| ≤ min{ n(q − 1), (1 − β)/(γ^2 − β) }   when γ > √β,
|B_t(r) ∩ C| ≤ 2n(q − 1) − 1   when γ = √β,

where B_t(r) = { x ∈ F_q^n | d(x, r) ≤ t } is the Hamming ball of radius t around r.

We will prove this proposition later. We are now ready to state the Johnson bound.

Theorem 9.3.2 Any linear code C ⊆ F_q^n of relative minimum distance δ = d/n is (t, l(n))-decodable with l(n) bounded above by a linear function in n, provided that

t/n ≤ (1 − 1/q)(1 − √(1 − (q/(q − 1))δ)).

Proof. For any received word r ∈ F_q^n, the set of t-consistent codewords is { c ∈ C | d(c, r) ≤ t } = B_t(r) ∩ C. Let β be a real number with 0 < β < 1 and write d = (1 − 1/q)(1 − β)n. Let t = (1 − 1/q)(1 − γ)n for some 0 < γ < 1. Suppose

t/n ≤ (1 − 1/q)(1 − √(1 − (q/(q − 1))δ)).

Then γ ≥ √(1 − (q/(q − 1)) · (d/n)) = √β. By Proposition 9.3.1, the number of t-consistent codewords l(n), which is |B_t(r) ∩ C|, is bounded above by a polynomial in n; here q is viewed as a constant.

Remark 9.3.3 The classical error-correcting capability is t = ⌊(d − 1)/2⌋. For a linear [n, k] code of minimum distance d we have d ≤ n − k + 1 (note that for Reed-Solomon codes d = n − k + 1). Thus, the normalized capability is

τ = t/n ≤ (1/n) · (n − k)/2 ≈ 1/2 − κ/2,
where κ is the code rate. Let us compare this with the Johnson bound. From Theorem 9.3.2 and d ≤ n − k + 1, the Johnson bound is

(1 − 1/q)(1 − √(1 − (q/(q − 1))δ)) ≤ (1 − 1/q)(1 − √(1 − (q/(q − 1))(1 − k/n + 1/n))) ≈ 1 − √κ

for large n and large q. A comparison is given in Figure 9.1.

Figure 9.1: Classical error-correcting capability vs. the Johnson bound.

To prove Proposition 9.3.1, we need the following lemma.

Lemma 9.3.4 Let v_1, . . . , v_m be m non-zero vectors in the real N-dimensional space R^N satisfying v_i · v_j ≤ 0 for every pair of distinct vectors. Then we have the following upper bounds on m:
(1) m ≤ 2N.
(2) If there exists a non-zero vector u ∈ R^N such that u · v_i ≥ 0 for all i = 1, . . . , m, then m ≤ 2N − 1.
(3) If there exists a u ∈ R^N such that u · v_i > 0 for all i = 1, . . . , m, then m ≤ N.

Proof. It is clear that (1) follows from (2): suppose (2) is true; by viewing −v_1 as u, the conditions of (2) are all satisfied for the vectors v_2, . . . , v_m, thus we have m − 1 ≤ 2N − 1, that is, m ≤ 2N.
To prove (2), we use induction on N. When N = 1, it is obvious that m ≤ 2N − 1 = 1: otherwise, by the conditions, there would be non-zero real numbers u, v_1 and v_2 such that u · v_1 ≥ 0 and u · v_2 ≥ 0, but v_1 · v_2 ≤ 0, which is impossible. Now considering N > 1, we can assume that m ≥ N + 1 (because if m ≤ N, the result m ≤ 2N − 1 already holds). As the vectors v_1, . . . , v_m are all in R^N, they must be linearly dependent. Let S ⊆ {1, 2, . . . , m} be a non-empty set of minimum size for which there is a relation ∑_{i∈S} a_iv_i = 0 with all a_i ≠ 0. We claim that the a_i must all be positive or all be negative. In fact, if not, we collect the terms with positive a_i on one side and the terms with negative a_i on the other. We then get an equation ∑_{i∈S_+} a_iv_i = ∑_{j∈S_−} b_jv_j (which we denote by w) with the a_i and b_j all positive, where S_+ and S_− are disjoint non-empty sets and S_+ ∪ S_− = S. By the minimality of S, w ≠ 0. Thus the inner product w · w > 0. On the other hand,

w · w = (∑_{i∈S_+} a_iv_i) · (∑_{j∈S_−} b_jv_j) = ∑_{i,j} (a_ib_j)(v_i · v_j) ≤ 0,

since a_ib_j > 0 and v_i · v_j ≤ 0. This contradiction shows that the a_i must all be positive or all be negative. Following this, we may assume that a_i > 0 for all i ∈ S (otherwise, we replace each a_i by −a_i in the relation ∑_{i∈S} a_iv_i = 0). Without loss of generality, we assume that S = {1, 2, . . . , s}. By the linear dependence ∑_{i=1}^s a_iv_i = 0 with each a_i > 0 and the minimality of S, the vectors v_1, . . . , v_s must span a subspace V of R^N of dimension s − 1. Now, for l = s + 1, . . . , m, we have (∑_{i=1}^s a_iv_i) · v_l = 0, as ∑_{i=1}^s a_iv_i = 0. Since a_i > 0 for 1 ≤ i ≤ s and all v_i · v_l ≤ 0, we have that v_i is orthogonal to v_l for all i, l with 1 ≤ i ≤ s and s < l ≤ m. Similarly, we can prove that u is orthogonal to v_i for i = 1, . . . , s. Therefore, the vectors v_{s+1}, . . . , v_m and u are all in the dual space V^⊥, which has dimension N − s + 1. As s > 1, applying the induction hypothesis to these vectors, we have m − s ≤ 2(N − s + 1) − 1. Thus, we have m ≤ 2N − s + 1 ≤ 2N − 1.

Now we prove (3). Suppose the result is not true, that is, m ≥ N + 1. As above, v_1, . . . , v_m must be linearly dependent in R^N. Let S ⊆ {1, 2, . . . , m} be a non-empty set of minimum size for which there is a relation ∑_{i∈S} a_iv_i = 0 with all a_i ≠ 0. Again, we can assume that a_i > 0 for all i ∈ S. From this, we have ∑_{i∈S} a_i(u · v_i) = 0. But this is impossible, since for each i we have a_i > 0 and u · v_i > 0. This contradiction shows m ≤ N.

Now we are ready to prove Proposition 9.3.1.

Proof of Proposition 9.3.1. We identify vectors in F_q^n with vectors in R^{qn} in the following way. First, we set an ordering for the elements of F_q, and denote the elements as α_1, α_2, . . . , α_q. Denote by ord(β) the order of an element β ∈ F_q under this ordering; for example, ord(β) = i if β = α_i. Then each element α_i (1 ≤ i ≤ q) corresponds to the real unit vector of length q with a 1 in position i and 0 elsewhere. Without loss of generality, we assume that r = (α_q, α_q, . . . , α_q). Denote by c_1, c_2, . . . , c_m all the codewords of C that are in the Hamming ball B_t(r), where t = (1 − 1/q)(1 − γ)n for some 0 < γ < 1. We view each vector in R^{qn} as having n blocks, each having q components, where the n blocks correspond to the n positions of the vectors in F_q^n. For each l = 1, . . . , q, denote by e_l the unit vector of length q with 1 in the l-th position and 0 elsewhere. For i = 1, . . . , m, the vector in R^{qn} associated with
the codeword c_i, which we denote by d_i, has in its j-th block the components of the vector e_{ord(c_i[j])}, where c_i[j] is the j-th component of c_i. The vector in R^{qn} associated with the word r ∈ F_q^n, which we denote by s, is defined similarly. Let 1 ∈ R^{qn} be the all-ones vector. We define v = λs + ((1 − λ)/q)·1 for some 0 ≤ λ ≤ 1 that will be specified later. We observe that the d_i and v are all in the space defined by the intersection of the hyperplanes

P_j = { x ∈ R^{qn} | ∑_{l=1}^q x_{j,l} = 1 } for j = 1, . . . , n.

This fact implies that the vectors d_i − v, for i = 1, . . . , m, all lie in P = ∩_{j=1}^n P'_j, where

P'_j = { x ∈ R^{qn} | ∑_{l=1}^q x_{j,l} = 0 }.

As P is an n(q − 1)-dimensional subspace of R^{qn}, we have that the vectors d_i − v, for i = 1, . . . , m, are all in an n(q − 1)-dimensional space. We will set the parameter λ so that the n(q − 1)-dimensional vectors d_i − v, i = 1, . . . , m, have all pairwise inner products less than 0.

For i = 1, . . . , m, let t_i = d(c_i, r). Then t_i ≤ t for every i, and

d_i · v = λ(d_i · s) + ((1 − λ)/q)(d_i · 1) = λ(n − t_i) + (1 − λ)n/q ≥ λ(n − t) + (1 − λ)n/q,   (9.1)

v · v = λ^2n + 2λ(1 − λ)n/q + (1 − λ)^2n/q = n/q + λ^2(1 − 1/q)n,   (9.2)

d_i · d_j = n − d(c_i, c_j) ≤ n − d,   (9.3)

which implies that for i ≠ j,

(d_i − v) · (d_j − v) ≤ 2λt − d + (1 − 1/q)(1 − λ)^2n.   (9.4)

Substituting t = (1 − 1/q)(1 − γ)n and d = (1 − 1/q)(1 − β)n into the above inequality, we have

(d_i − v) · (d_j − v) ≤ (1 − 1/q)n(β + λ^2 − 2λγ).   (9.5)

Thus, if γ > (β/λ + λ)/2, we will have all pairwise inner products negative. We pick λ to minimize β/λ + λ by setting λ = √β. Now when γ > √β, we have (d_i − v) · (d_j − v) < 0 for i ≠ j, and the bounds of the proposition follow by applying Lemma 9.3.4 to the vectors d_i − v.

9.3.2 Sudan's algorithm

The algorithm of Sudan is applicable to Reed-Solomon codes, Reed-Muller codes, algebraic-geometric codes, and some other families of codes. In this subsection, we give a general description of the algorithm of Sudan. Consider the following linear code

C = { (f(P_1), f(P_2), . . . , f(P_n)) | f ∈ F_q[X_1, . . . , X_m] and deg(f) < k },

where P_i = (x_{i1}, . . . , x_{im}) ∈ F_q^m for i = 1, . . . , n, and n ≤ q^m. Note that when m = 1, the code is a Reed-Solomon code or an extended Reed-Solomon code; when m ≥ 2, it is a Reed-Muller code.
In the following algorithm and discussions, to simplify the statement we denote (i_1, . . . , i_m) by i, X_1^{i_1} · · · X_m^{i_m} by X^i, H(X_1 + x_1, . . . , X_m + x_m, Y + y) by H(X + x, Y + y), the product of binomial coefficients binom(j_1, i_1) · · · binom(j_m, i_m) by binom(j, i) (with binom(a, b) the binomial coefficient "a choose b"), and so on.

Algorithm 9.3.5 (The Algorithm of Sudan for List Decoding)

INPUT: The following parameters and a received word:
• Code length n and the integer k;
• n points in F_q^m, namely P_i := (x_{i1}, . . . , x_{im}) ∈ F_q^m, i = 1, . . . , n;
• Received word r = (y_1, . . . , y_n) ∈ F_q^n;
• Desired error-correcting radius t.

Step 0: Compute parameters r, s satisfying certain conditions that we will give for specific families of codes in the following subsections.

Step 1: Find a nonzero polynomial H(X, Y) = H(X_1, . . . , X_m, Y) such that
• the (1, . . . , 1, k − 1)-weighted degree of H(X_1, . . . , X_m, Y) is at most s;
• for i = 1, . . . , n, each point (x_i, y_i) = (x_{i1}, . . . , x_{im}, y_i) ∈ F_q^{m+1} is a zero of H(X, Y) of multiplicity r.

Step 2: Find all the Y-roots of H(X, Y) of degree less than k, namely f = f(X_1, . . . , X_m) with deg(f) < k such that H(X, f) is the zero polynomial. For each such root, check if f(P_i) = y_i for at least n − t values of i ∈ {1, . . . , n}. If so, include f in the output list.

As we will see later, for an appropriately selected parameter t, the algorithm of Sudan returns a list containing all the t-consistent codewords in polynomial time, with the size of the output list bounded above by a polynomial in the code length. So far, the best known record for the error-correcting radius of list decoding by Sudan's algorithm is the Johnson bound. In order to achieve this bound, prior to the actual decoding procedure, that is, Steps 1 and 2 of the algorithm above, a pair of integers r and s should be carefully chosen, which will be used to find an appropriate polynomial H(X, Y). The parameters r and s are independent of the received word. They are used in the decoding procedure for any received word, once they are determined. The actual decoding procedure consists of two steps: interpolation and root finding. In the interpolation procedure, a nonzero polynomial H(X, Y) is found. This polynomial contains all the polynomials which define the t-consistent codewords as its Y-roots. A Y-root of H(X, Y) is a polynomial f(X) satisfying that H(X, f(X)) is the zero polynomial. The root-finding procedure finds and returns all these Y-roots; thus all the t-consistent codewords are found.

We now explain the terms weighted degree and multiplicity of a zero of a polynomial, which we have used in the algorithm. Given integers a_1, a_2, . . . , a_l, the
(a_1, a_2, . . . , a_l)-weighted degree of a monomial αX_1^{d_1}X_2^{d_2} · · · X_l^{d_l} (where α is the coefficient of the monomial) is a_1d_1 + a_2d_2 + · · · + a_ld_l. The (a_1, a_2, . . . , a_l)-weighted degree of a polynomial P(X_1, X_2, . . . , X_l) is the maximal (a_1, a_2, . . . , a_l)-weighted degree of its terms.

For a polynomial P(X) = α_0 + α_1X + α_2X^2 + · · · + α_dX^d, it is clear that 0 is a zero of P(X), i.e., P(0) = 0, if and only if α_0 = 0. We say 0 is a zero of multiplicity r of P(X) provided that α_0 = α_1 = · · · = α_{r−1} = 0 and α_r ≠ 0. For a nonzero value β, it is a zero of multiplicity r of P(X) provided that 0 is a zero of multiplicity r of P(X + β). Similarly, for a multivariate polynomial P(X_1, X_2, . . . , X_l) = ∑ α_{i_1,i_2,...,i_l} X_1^{i_1}X_2^{i_2} · · · X_l^{i_l}, the point (0, 0, . . . , 0) is a zero of multiplicity r of this polynomial if and only if α_{i_1,i_2,...,i_l} = 0 for all (i_1, i_2, . . . , i_l) with i_1 + i_2 + · · · + i_l ≤ r − 1, and there exists (i_1, i_2, . . . , i_l) with i_1 + i_2 + · · · + i_l = r such that α_{i_1,i_2,...,i_l} ≠ 0. A point (β_1, β_2, . . . , β_l) is a zero of multiplicity r of P(X_1, X_2, . . . , X_l) provided that (0, 0, . . . , 0) is a zero of multiplicity r of P(X_1 + β_1, X_2 + β_2, . . . , X_l + β_l).

Now we consider a polynomial that Step 1 of Algorithm 9.3.5 seeks. Suppose H(X, Y) = ∑ α_{i,i_{m+1}} X^iY^{i_{m+1}} is a nonzero polynomial in X_1, . . . , X_m, Y. It is easy to prove (we leave the proof to the reader as an exercise) that for x_i = (x_{i1}, . . . , x_{im}) and y_i,

H(X + x_i, Y + y_i) = ∑_{j,j_{m+1}} β_{j,j_{m+1}} X^jY^{j_{m+1}},

where

β_{j,j_{m+1}} = ∑_{j' ≥ j} ∑_{j'_{m+1} ≥ j_{m+1}} binom(j', j) binom(j'_{m+1}, j_{m+1}) α_{j',j'_{m+1}} x_i^{j'−j} y_i^{j'_{m+1}−j_{m+1}}.

Step 1 of the algorithm seeks a nonzero polynomial H(X, Y) such that its (1, . . . , 1, k − 1)-weighted degree is at most s, and for i = 1, . . . , n, each (x_i, y_i) is a zero of H(X, Y) of multiplicity r. Based on the discussion above, this can be done by solving a system consisting of the following homogeneous linear equations in the unknowns α_{i,i_{m+1}} (which are the coefficients of H(X, Y)):

∑_{j' ≥ j} ∑_{j'_{m+1} ≥ j_{m+1}} binom(j', j) binom(j'_{m+1}, j_{m+1}) α_{j',j'_{m+1}} x_i^{j'−j} y_i^{j'_{m+1}−j_{m+1}} = 0

for all i = 1, . . . , n and for every j_1, . . . , j_m, j_{m+1} ≥ 0 with j_1 + · · · + j_m + j_{m+1} ≤ r − 1; and α_{i,i_{m+1}} = 0 for every i_1, . . . , i_m, i_{m+1} ≥ 0 with i_1 + · · · + i_m + (k − 1)i_{m+1} ≥ s + 1.
9.3.3 List decoding of Reed-Solomon codes

A Reed-Solomon code can be defined as a cyclic code generated by a generator polynomial (see Definition 8.1.1) or as an evaluation code (see Proposition 8.1.4). For the purpose of list decoding by Sudan's algorithm, we view Reed-Solomon codes as evaluation codes. Note that since any non-zero element α ∈ F_q satisfies α^n = 1, where n = q − 1, we have ev(X^nf(X)) = ev(f(X)) for any f(X) ∈ F_q[X]. Therefore,

RS_k(n, 1) = { (f(x_1), f(x_2), . . . , f(x_n)) | f(X) ∈ F_q[X], deg(f) < k },

where x_1, . . . , x_n are the n distinct nonzero elements of F_q.

In this subsection, we consider the list decoding of Reed-Solomon codes RS_k(n, 1) and extended Reed-Solomon codes ERS_k(n, 1), that is, the case m = 1 of the general algorithm, Algorithm 9.3.5. As we will discuss later, Sudan's algorithm can be adapted to list decoding of generalized Reed-Solomon codes (see Definition 8.1.10).

The correctness and error-correcting capability of the list-decoding algorithm depend on the parameters r and s. In the following, we first prove the correctness of the algorithm for an appropriate choice of r and s. Then we calculate the error-correcting capability. We can prove the correctness of the list-decoding algorithm by proving: (1) there exists a nonzero polynomial H(X, Y) satisfying the conditions given in Step 1 of Algorithm 9.3.5; and (2) all the polynomials f(X) satisfying the conditions in Step 2 are Y-roots of H(X, Y), that is, Y − f(X) divides H(X, Y).

Proposition 9.3.6 Consider a pair of parameters r and s.
(1) If r and s satisfy

n·binom(r + 1, 2) < s(s + 2)/(2(k − 1)),

then a nonzero polynomial H(X, Y) as sought in Algorithm 9.3.5 does exist.
(2) If r and s satisfy

r(n − t) > s,

then for any polynomial f(X) of degree at most k − 1 such that f(x_i) = y_i for at least n − t values of i ∈ {1, 2, . . . , n}, the polynomial H(X, Y) is divisible by Y − f(X).

Proof. We first prove (1). As discussed in the previous subsection, a nonzero polynomial H(X, Y) exists as long as we have a nonzero solution of a system of homogeneous linear equations in the unknowns α_{i_1,i_2}, i.e., the coefficients of H(X, Y). A nonzero solution of the system exists provided that the number of equations is strictly less than the number of unknowns. From the precise expression of the system (see the end of the last subsection), it is easy to calculate the number of equations, which is n·binom(r + 1, 2). Next, we compute the number of unknowns. This number is equal to the number of monomials X^{i_1}Y^{i_2} of
(1, k − 1)-weighted degree at most s, which is

∑_{i_2=0}^{⌊s/(k−1)⌋} ∑_{i_1=0}^{s−i_2(k−1)} 1 = ∑_{i_2=0}^{⌊s/(k−1)⌋} (s + 1 − i_2(k − 1))
= (s + 1)(⌊s/(k−1)⌋ + 1) − ((k − 1)/2)·⌊s/(k−1)⌋(⌊s/(k−1)⌋ + 1)
≥ (⌊s/(k−1)⌋ + 1)(s/2 + 1)
≥ (s/(k−1))·((s + 2)/2),

where ⌊x⌋ stands for the maximal integer less than or equal to x. Thus, we have proved (1).

We now prove (2). Suppose H(X, f(X)) is not the zero polynomial. Denote h(X) = H(X, f(X)). Let I = { i | 1 ≤ i ≤ n and f(x_i) = y_i }. We have |I| ≥ n − t. For any i = 1, . . . , n, as (x_i, y_i) is a zero of H(X, Y) of multiplicity r, we can express

H(X, Y) = ∑_{j_1+j_2 ≥ r} γ_{j_1,j_2} (X − x_i)^{j_1}(Y − y_i)^{j_2}.

Now, for any i ∈ I, we have f(X) − y_i = (X − x_i)f_1(X) for some f_1(X), because f(x_i) − y_i = 0. Thus, we have

h(X) = ∑_{j_1+j_2 ≥ r} γ_{j_1,j_2} (X − x_i)^{j_1}(f(X) − y_i)^{j_2} = ∑_{j_1+j_2 ≥ r} γ_{j_1,j_2} (X − x_i)^{j_1+j_2}(f_1(X))^{j_2}.

This implies that (X − x_i)^r divides h(X). Therefore, h(X) has a factor g(X) = ∏_{i∈I}(X − x_i)^r, which is a polynomial of degree at least r(n − t). On the other hand, since H(X, Y) has (1, k − 1)-weighted degree at most s and the degree of f(X) is at most k − 1, the degree of h(X) is at most s, which is less than r(n − t). This is impossible. Therefore H(X, f(X)) is the zero polynomial, that is, Y − f(X) divides H(X, Y).

Proposition 9.3.7 If t satisfies (n − t)^2 > n(k − 1), then there exist r and s satisfying both

n·binom(r + 1, 2) < s(s + 2)/(2(k − 1)) and r(n − t) > s.

Proof. Set s = r(n − t) − 1. It suffices to prove that there exists an r satisfying

n·binom(r + 1, 2) < (r(n − t) − 1)(r(n − t) + 1)/(2(k − 1)),

which is equivalent to the following inequality:

((n − t)^2 − n(k − 1))·r^2 − n(k − 1)·r − 1 > 0.

Since (n − t)^2 − n(k − 1) > 0, any integer r satisfying

r > (n(k − 1) + √(n^2(k − 1)^2 + 4(n − t)^2 − 4n(k − 1))) / (2(n − t)^2 − 2n(k − 1))

satisfies the inequality above. Therefore, for the list-decoding algorithm to be correct it suffices to set the integers r and s as

r = ⌊(n(k − 1) + √(n^2(k − 1)^2 + 4(n − t)^2 − 4n(k − 1))) / (2(n − t)^2 − 2n(k − 1))⌋ + 1
and s = r(n − t) − 1.

We give the following result, Theorem 9.3.8, which is a straightforward corollary of the two propositions.

Theorem 9.3.8 For an [n, k] Reed-Solomon or extended Reed-Solomon code, the list-decoding algorithm, Algorithm 9.3.5, can correctly find all the codewords c within distance t from the received word r, i.e., d(r, c) ≤ t, provided

t < n − √(n(k − 1)).

Remark 9.3.9 Note that for an [n, k] Reed-Solomon code, the minimum distance is d = n − k + 1, which implies that k − 1 = n − d. Substituting this into the bound on the error-correcting capability in the theorem above, we have

t < n(1 − √(1 − d/n)).

This shows that the list decoding of Reed-Solomon codes achieves the Johnson bound (see Theorem 9.3.2).

Regarding the size of the output list of the list-decoding algorithm, we have the following theorem.

Theorem 9.3.10 Consider an [n, k] Reed-Solomon or extended Reed-Solomon code. For any t < n − √(n(k − 1)) and any received word, the number of t-consistent codewords is O(√(n^3k)).

Proof. From Proposition 9.3.6, we actually have proved that the number N of t-consistent codewords is bounded from above by the degree deg_Y(H(X, Y)). Since the (1, k − 1)-weighted degree of H(X, Y) is at most s, we have

N ≤ deg_Y(H(X, Y)) ≤ s/(k − 1).

By the choices of r and s, s/(k − 1) = O(n(k − 1)(n − t)/(k − 1)) = O(n(n − t)). Corresponding to the largest permissible value of t for the t-consistent codewords, we can choose n − t = ⌊√(n(k − 1))⌋ + 1. Thus,

N = O(n(n − t)) = O(√(n^3(k − 1))) = O(√(n^3k)).

Let us analyze the complexity of the list decoding of an [n, k] Reed-Solomon code. As we have seen, the decoding algorithm consists of two main steps. Step 1 is in fact reduced to a problem of solving a system of homogeneous linear equations, which can be implemented making use of Gaussian elimination with time complexity O((s(s + 2)/(2(k − 1)))^3) = O(n^3), where s(s + 2)/(2(k − 1)) is the number of unknowns of the system of homogeneous linear equations, and s is given as in Proposition 9.3.7. The second step is the problem of finding the Y-roots of the polynomial H(X, Y). This can be implemented by using a fast root-finding algorithm proposed by Roth and Ruckenstein in time complexity O(nk).
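The parameter choices of Proposition 9.3.7 and the radius of Theorem 9.3.8 are easy to evaluate. A small plain-Python sketch (ours; the function names are illustrative) computes the largest permissible t and the corresponding pair (r, s); for a [15, 3] Reed-Solomon code it gives t = 9, well beyond the unique-decoding radius ⌊(d − 1)/2⌋ = 6.

import math

def max_list_radius(n, k):
    # largest t with (n - t)^2 > n(k - 1), i.e. t < n - sqrt(n(k - 1))
    t = n - math.isqrt(n * (k - 1)) - 1
    while (n - t) ** 2 <= n * (k - 1):
        t -= 1
    return t

def sudan_parameters(n, k, t):
    # multiplicity r and weighted-degree bound s as in the proof of Prop. 9.3.7
    D = (n - t) ** 2 - n * (k - 1)
    assert D > 0, "need t < n - sqrt(n(k - 1))"
    nk = n * (k - 1)
    r = math.floor((nk + math.sqrt(nk * nk + 4 * D)) / (2 * D)) + 1
    s = r * (n - t) - 1
    # sanity check: the two conditions of Proposition 9.3.6
    assert n * r * (r + 1) // 2 < s * (s + 2) / (2 * (k - 1)) and r * (n - t) > s
    return r, s

n, k = 15, 3
t = max_list_radius(n, k)            # 9
print(t, sudan_parameters(n, k, t))  # prints: 9 (6, 35)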
9.3.4 List decoding of Reed-Muller codes

We consider the list decoding of Reed-Muller codes in this subsection. Let n = q^m and let P_1, . . . , P_n be an enumeration of the points of F_q^m. Recall that the q-ary Reed-Muller code RM_q(u, m) of order u in m variables is defined as

RM_q(u, m) = { (f(P_1), . . . , f(P_n)) | f ∈ F_q[X_1, . . . , X_m], deg(f) ≤ u }.

Note that when m = 1, the code RM_q(u, 1) is actually an extended Reed-Solomon code. From Proposition 8.4.4, RM_q(u, m) is a subfield subcode of RM_{q^m}(n − d, 1), where d is the minimum distance of RM_q(u, m), that is,

RM_q(u, m) ⊆ RM_{q^m}(n − d, 1) ∩ F_q^n.

Here RM_{q^m}(n − d, 1) is an extended Reed-Solomon code over F_{q^m} of length n and dimension k = n − d + 1. We now give a list-decoding algorithm for RM_q(u, m) as follows.

Algorithm 9.3.11 (List-Decoding Algorithm for Reed-Muller Codes)

INPUT: Code length n and a received word r = (y_1, . . . , y_n) ∈ F_q^n.

Step 0: Do the following:
(1) Compute the minimum distance d of RM_q(u, m) and the parameter t = n − ⌊√(n(n − d))⌋ − 1.
(2) Construct the extension field F_{q^m} using an irreducible polynomial of degree m over F_q.
(3) Generate the code RM_{q^m}(n − d, 1).
(4) Construct a parity check matrix H over F_q for the code RM_q(u, m).

Step 1: Using the list-decoding algorithm for Reed-Solomon codes over F_{q^m}, find L^(1), the set of all codewords c ∈ RM_{q^m}(n − d, 1) satisfying d(c, r) ≤ t.

Step 2: For every c ∈ L^(1), check if c ∈ F_q^n; if so, append c to L^(2).

Step 3: For every c ∈ L^(2), check if Hc^T = 0; if so, append c to L. Output L.

From Theorems 9.3.8 and 9.3.10, we have the following theorem.

Theorem 9.3.12 Denote by d the minimum distance of the q-ary Reed-Muller code RM_q(u, m). Then RM_q(u, m) is (t, l)-decodable, provided that t < n − √(n(n − d)) and l = O(√((n − d)n^3)). The algorithm above correctly finds all the t-consistent codewords for any received vector r ∈ F_q^n.
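A small plain-Python sketch (ours) of the radius computed in Step 0, assuming the distance formula of Proposition 8.4.3. For q = 2, m = 6 and u = 3 (the code of Example 8.4.5, now of full length n = 64 with d = 8) it yields t = 4, one more than the unique-decoding radius ⌊(d − 1)/2⌋ = 3.

import math

def rm_list_radius(q, m, u):
    # Step 0 of Algorithm 9.3.11: t = n - floor(sqrt(n(n - d))) - 1
    n = q ** m
    u_perp = m * (q - 1) - u - 1
    e, rho = divmod(u_perp + 1, q - 1)   # u_perp + 1 = e(q - 1) + rho
    d = (rho + 1) * q ** e
    return d, n - math.isqrt(n * (n - d)) - 1

print(rm_list_radius(2, 6, 3))           # prints: (8, 4)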
Remark 9.3.13 Note that Algorithm 9.3.11 outputs a set of t-consistent codewords of the q-ary Reed-Muller code defined by the enumeration of the points of F_q^m, say P_1, P_2, . . . , P_n, specified in Section 7.4.2. If RM_q(u, m) is defined by another enumeration of the points of F_q^m, say P'_1, P'_2, . . . , P'_n, we can get the correct t-consistent codewords by the following steps. (1) Find the permutation π such that P'_i = P_{π(i)}, i = 1, 2, . . . , n, and the inverse permutation π^{−1}. (2) Let r* = (r_{π(1)}, r_{π(2)}, . . . , r_{π(n)}). Then go through the steps of Algorithm 9.3.11 with r*. (3) For every codeword c = (c_1, c_2, . . . , c_n) ∈ L, let π^{−1}(c) = (c_{π^{−1}(1)}, c_{π^{−1}(2)}, . . . , c_{π^{−1}(n)}). Then π^{−1}(L) = { π^{−1}(c) | c ∈ L } is the set of t-consistent codewords of RM_q(u, m).

Now let us consider the complexity of Algorithm 9.3.11. In Step 0, to construct the extension field F_{q^m}, it is necessary to find an irreducible polynomial g(x) of degree m over F_q. It is well known that there are efficient algorithms for finding irreducible polynomials over finite fields. For example, a probabilistic algorithm proposed by V. Shoup in 1994 can find an irreducible polynomial of degree m over F_q with an expected number of O((m^2 log m + m log q) log m log log m) field operations in F_q. To generate the Reed-Solomon code GRS_{n−d+1}(a, 1) over F_{q^m}, we need to find a primitive element of F_{q^m}. With a procedure by I. E. Shparlinski from 1993, a primitive element of F_{q^m} can be found in deterministic time O((q^m)^{1/4+ε}) = O(n^{1/4+ε}), where n = q^m is the length of the code and ε denotes an arbitrary positive number. Step 1 of Algorithm 9.3.11 can be implemented using the list-decoding algorithm for the Reed-Solomon code GRS_{n−d+1}(a, 1) over F_{q^m}. From the previous subsection, it can be implemented to run in O(n^3) field operations in F_{q^m}. So the implementation of Algorithm 9.3.11 requires O(n) field operations in F_q and O(n^3) field operations in F_{q^m}.

9.3.5 Exercises

9.3.1 Let P(X_1, . . . , X_l) = ∑_{i_1,...,i_l} α_{i_1,...,i_l} X_1^{i_1} · · · X_l^{i_l} be a polynomial in the variables X_1, . . . , X_l with coefficients α_{i_1,...,i_l} in a field F. Prove that for any (a_1, . . . , a_l) ∈ F^l,

P(X_1 + a_1, . . . , X_l + a_l) = ∑_{j_1,...,j_l} β_{j_1,...,j_l} X_1^{j_1} · · · X_l^{j_l},

where

β_{j_1,...,j_l} = ∑_{j'_1 ≥ j_1} · · · ∑_{j'_l ≥ j_l} binom(j'_1, j_1) · · · binom(j'_l, j_l) α_{j'_1,...,j'_l} a_1^{j'_1−j_1} · · · a_l^{j'_l−j_l}.

9.4 Notes

Many cyclic codes have error-correcting pairs; for this we refer to Duursma and Kötter [53, 54].
The algorithms of Berlekamp-Massey [11, 79] and Sugiyama [118] both have O(t^2) as an estimate of the complexity, where t is the number of corrected errors. In fact the algorithms are equivalent, as shown in [50, 65]. The application of a fast computation of the gcd of two polynomials in [4, Chap. 16, §8.9] to computing a solution of the key equation gives complexity O(t log^2(t)) by [69, 104].
Chapter 10

Cryptography

Stanislav Bulygin

This chapter aims at giving an overview of topics from cryptography. In particular, we cover symmetric as well as asymmetric cryptography. When talking about symmetric cryptography, we concentrate on the notion of a block cipher as a means to implement symmetric cryptosystems in practical environments. Asymmetric cryptography is represented by the RSA and El Gamal cryptosystems, as well as code-based cryptosystems due to McEliece and Niederreiter. We also take a look at other aspects such as authentication codes, secret sharing, and linear feedback shift registers. The material of this chapter is quite basic, but we elaborate more on several topics. In particular, we show connections to codes and related structures where applicable. The basic idea of algebraic attacks on block ciphers is considered in the next chapter, Section 11.3.

10.1 Symmetric cryptography and block ciphers

10.1.1 Symmetric cryptography

This section is devoted to symmetric cryptosystems. The idea behind these is quite simple and thus has basically been known for quite a long time. The task is to convey a secret between two parties, traditionally called Alice and Bob, so that figuring out the secret is not possible without knowledge of some additional information. This additional information is called a secret key and is supposed to be known only to the two communicating parties. The secrecy of the transmitted message rests entirely upon the knowledge of this secret key; thus if an adversary or an eavesdropper, traditionally called Eve, is able to find out the key, then the whole secret communication is corrupted. Now let us take a look at the formal definition.

Definition 10.1.1 A symmetric cryptosystem is defined by the following data:

• The plaintext space P and the ciphertext space C.
• {Ee : P → C | e ∈ K} and {Dd : C → P | d ∈ K} are the sets of encryption and decryption transformations, which are bijections from P to C and from C to P resp.
• The above transformations are parametrized by the key space K.
• Given an associated pair (e, d), so that the property ∀p ∈ P : Dd(Ee(p)) = p holds, knowing e it is "computationally easy" to find out d and vice versa.
The pair (e, d) is called the secret key. Moreover, e is called the encryption key and d is called the decryption key. Note that often the counterparts e and d coincide. This gives a reason for the name "symmetric". There also exist cryptosystems in which knowledge of an encryption key e does not reveal (i.e. it is "computationally hard" to find) an associated decryption key d. So encryption keys can be made public, and such cryptosystems are called asymmetric or public, see Section 10.2. Of course, one should specify exactly what P, C, K and the transformations are. Let us take a look at a concrete example.
Example 10.1.2 The first use of a symmetric cryptosystem is conventionally attributed to Julius Caesar. He used the following cryptosystem for communication with his generals, which is historically called the Caesar cipher. Let P and C be the sets of all strings composed of letters from the English (Latin for Caesar) alphabet A = {A, B, C, . . . , Z}. Let K = {0, 1, 2, . . . , 25}. Now, given a plaintext p = (p1, . . . , pn), pi ∈ A, i = 1, . . . , n, the encryption transformation Ee does the following. For each i = 1, . . . , n one determines the position of pi in the alphabet A ("A" being 0, "B" being 1, . . . , "Z" being 25). Next one finds the letter in A that stands e positions to the left, thus finding the letter ci; one wraps around if the beginning of A is reached. So with the enumeration of A as above, we have ci = pi - e (mod 26). In this way a ciphertext c = (c1, . . . , cn) is obtained. The decryption key is given by d = -e (mod 26), or, equivalently, for decryption one needs to shift letters e positions to the right. Julius Caesar used e = 3 for his cryptosystem. Let us consider an example. For the plaintext p = "BRUTUS IS AN ASSASSIN", the ciphertext (if we ignore spaces during the encryption) looks like c = "YORQRP FP XK XOOXOOFK". To decrypt one simply shifts each letter 3 positions to the right.
10.1.2 Block ciphers. Simple examples
The above is a simple example of a so-called substitution cipher, which is in turn an instance of a block cipher. Block ciphers, among other things, provide a practical realization of symmetric cryptosystems. They can also be used for constructing other cryptographic primitives, like pseudorandom number generators, authentication codes (Section 10.3), and hash functions. The formal definition follows.
Definition 10.1.3 An n-bit block cipher is defined as a mapping E : A^n × K → A^n, where A is an alphabet set and K is the key space, such that for each k ∈ K the mapping E(·, k) =: Ek : A^n → A^n is invertible. Ek is the encryption transformation for the key k, and Ek^(-1) = Dk is the decryption transformation. If Ek(p) = c, then c is the ciphertext of the plaintext p under the key k.
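The Caesar encryption and decryption maps are easy to express in a few lines of code. The following minimal Python sketch (our illustration; the function names are ours and not from any library) reproduces the e = 3 example above:

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def caesar_encrypt(plaintext, e):
    # c_i = p_i - e (mod 26); spaces are ignored during encryption
    return "".join(ALPHABET[(ALPHABET.index(ch) - e) % 26]
                   for ch in plaintext if ch in ALPHABET)

def caesar_decrypt(ciphertext, e):
    # decryption shifts e positions to the right: d = -e (mod 26)
    return "".join(ALPHABET[(ALPHABET.index(ch) + e) % 26]
                   for ch in ciphertext)

print(caesar_encrypt("BRUTUS IS AN ASSASSIN", 3))   # YORQRPFPXKXOOXOOFK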
It is common to work with the binary alphabet, i.e. A = {0, 1}. In such a case, ideally we would like to have a block cipher that is random in the sense that it implements all (2^n)! bijections from {0, 1}^n to {0, 1}^n. In practice, though, it is quite expensive to have such a cipher. So when designing a block cipher we care that it behaves like a random one, i.e. for a randomly chosen key k ∈ K the encryption transformation Ek should appear random. If one is able to distinguish Ek, for k in some subset Kweak, from a random transformation, then this is evidence of a weakness of the cipher. Such a subset Kweak is called the subset of weak keys; we will come back to this later when talking about DES.
Now we present several simple examples of block ciphers. We consider permutation and substitution ciphers that were used quite intensively in the past (see Notes); some fundamental ideas thereof appear also in modern ciphers.
Example 10.1.4 (Permutation or transposition cipher) The idea of this cipher is to partition the plaintext into blocks and perform a permutation of the elements in each block. More formally, partition the plaintext into blocks of the form p = p1 . . . pt and then permute: c = Ek(p) = (pk(1), . . . , pk(t)). The number t is called the period of the cipher. The key space K now is the set of all permutations of {1, . . . , t}: K = St. For example let the plaintext be p = "CODING AND CRYPTO", let t = 5, and k = (4, 2, 5, 3, 1). If we remove the spaces and partition p into 3 blocks we obtain c = "INCODDCGANTORYP". Used alone, the permutation cipher does not provide good security (see below), but in combination with other techniques it is also used in modern ciphers to provide diffusion in a ciphertext.
Example 10.1.5 We can use the Sage system to run the previous example. The code looks as follows.
S = AlphabeticStrings()
E = TranspositionCryptosystem(S,5)
K = PermutationGroupElement('(4,2,5,3,1)')
L = E.inverse_key(K)
M = S("CODINGANDCRYPTO")
e = E(K)
c = E(L)
e(M)
INCODDCGANTORYP
c(e(M))
CODINGANDCRYPTO
One can also choose a random key for encryption:
KR = E.random_key()
KR
(1,4,2,3)
LR = E.inverse_key(KR)
LR
(1,3,2,4)
eR = E(KR)
cR = E(LR)
eR(M)
IDCONDNGACTPRYO
cR(eR(M))
CODINGANDCRYPTO
Example 10.1.6 (Substitution cipher) The idea behind the monoalphabetic substitution cipher is to substitute every symbol in a plaintext by some other symbol from a chosen alphabet. Formally, let A be the alphabet, so that plaintexts and ciphertexts are composed of symbols from A. For the plaintext p = p1 . . . pn the ciphertext c is obtained as c = Ek(p) = (k(p1), . . . , k(pn)). The key space now is the set of all permutations of A: K = S_{|A|}. In Example 10.1.2 we have already seen an instance of such a cipher. There k was chosen to be k = (23, 24, 25, 0, 1, . . . , 21, 22). Again, used alone the monoalphabetic cipher is insecure, but its basic idea is used in modern ciphers to provide confusion in a ciphertext.
There is also a polyalphabetic substitution cipher. Let the key k be defined as a sequence of permutations on A: k = (k1, . . . , kt), where t is the period. Then every t symbols of the plaintext p are mapped to t symbols of the ciphertext c as c = (k1(p1), . . . , kt(pt)). Simplifying ki to shifting by li symbols to the right we obtain ci = pi + li (mod |A|). Such a cipher is called the simple Vigenère cipher.
Example 10.1.7 The Sage code for a substitution cipher encryption is given below.
S = AlphabeticStrings()
E = SubstitutionCryptosystem(S)
K = E.random_key()
K
ZYNJQHLBSPEOCMDAXWVRUTIKGF
L = E.inverse_key(K)
M = S("CODINGANDCRYPTO")
e = E(K)
e(M)
NDJSMLZMJNWGARD
c = E(L)
Here the string ZYNJQHLBSPEOCMDAXWVRUTIKGF shows the permutation of the alphabet. Namely, the letter A is mapped to Z, the letter B is mapped to Y, etc. One can also provide the permutation explicitly as follows:
K = S('MHKENLQSCDFGBIAYOUTZXJVWPR')
e = E(K)
e(M)
KAECIQMIEKUPYZA
A piece of code for working with the simple Vigenère cipher is provided below.
S = AlphabeticStrings()
E = VigenereCryptosystem(S,15)
K = S('XSPUDFOQLRMRDJS')
L = E.inverse_key(K)
M = S("CODINGANDCRYPTO")
e = E(K)
e(M)
ZGSCQLODOTDPSCG
c = E(L)
c(e(M))
CODINGANDCRYPTO
Note that here the string XSPUDFOQLRMRDJS defines 15 permutations: one per position. Namely, every letter is the image of the letter A at that position. So at the first position A is mapped to X (therefore, e.g. B is mapped to Y), at the second position A is mapped to S, and so on.
Table 10.1: Frequencies of the letters in the English language
E 11.1607%    M 3.0129%
A  8.4966%    H 3.0034%
R  7.5809%    G 2.4705%
I  7.5448%    B 2.0720%
O  7.1635%    F 1.8121%
T  6.9509%    Y 1.7779%
N  6.6544%    W 1.2899%
S  5.7351%    K 1.1016%
L  5.4893%    V 1.0074%
C  4.5388%    X 0.2902%
U  3.6308%    Z 0.2722%
D  3.3844%    J 0.1965%
P  3.1671%    Q 0.1962%
The ciphers above, used alone, do not provide security, as has already been mentioned. One way to break such ciphers is to use statistical methods. For the permutation ciphers note that they do not change the frequency of occurrence of each letter of an alphabet. Comparing frequencies obtained from a ciphertext with a frequency distribution of the language used, one can figure out that he/she deals with a ciphertext obtained with a permutation cipher. Moreover, for cryptanalysis one may try to look for anagrams - words in which letters are permuted. If the eavesdropper is able to find such anagrams and solve them, then he/she is pretty close to breaking such a cipher (Exercise 10.1.1). Also, if the eavesdropper has access to an encryption device and is able to produce ciphertexts for plaintexts of his/her choice (chosen-plaintext attack), then he/she can simply choose plaintexts such that figuring out the period and the permutation becomes easy.
For monoalphabetic substitution ciphers one also notes that although letters are changed, the frequency with which they occur does not change. So the eavesdropper may compare frequencies in a long-enough ciphertext with a frequency distribution of the language used and thus figure out how the letters of the alphabet were mapped to obtain a ciphertext. For example, for the English alphabet one may use the frequency analysis of words occurring in the "Concise Oxford Dictionary" (http://www.askoxford.com/asktheexperts/faq/aboutwords/frequency), see Table 10.1. Note that since the positions of the symbols are not altered, the eavesdropper may not only look at frequencies of single symbols, but also at combinations of symbols, in particular at pieces of a ciphertext that correspond to frequently used words like "the", "we", "in", "at", etc. For polyalphabetic ciphers one needs to find out the period first. This can be done by the so-called Kasiski method. When the period is determined, one can proceed with the frequency analysis as above, performed separately for all sets of positions that stand at distance t from each other.
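As a small illustration of the frequency analysis just described, the following Python sketch (ours, not part of the book's Sage sessions) tabulates the letter frequencies of a ciphertext so that they can be compared against Table 10.1:

from collections import Counter

def letter_frequencies(ciphertext):
    # Count each letter and normalize to percentages, most frequent first.
    letters = [ch for ch in ciphertext.upper() if ch.isalpha()]
    counts = Counter(letters)
    return [(ch, 100.0 * n / len(letters)) for ch, n in counts.most_common()]

# In a long-enough English ciphertext from a monoalphabetic substitution,
# the most frequent letter is likely the image of E, the next of A or T, etc.
print(letter_frequencies("NDJSMLZMJNWGARD"))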
10.1.3 Security issues
As we have seen, block ciphers provide us with a means to convey secret messages as per the symmetric scheme. It is clear that an eavesdropper tries to get insight into this secret communication. The question that naturally arises is "What does it mean to break a cipher?" or "When is a cipher considered to be broken?". In general we consider a cipher to be totally broken if the eavesdropper is able to recover the secret key, thus compromising the whole secret communication. We consider a cipher to be partially broken if the eavesdropper is able to recover (a part of) a plaintext from a given ciphertext, thus compromising that part of the communication. In order to describe the actions of the eavesdropper more formally, different assumptions on the eavesdropper's abilities and scenarios of attacks are introduced.
Assumptions:
• The eavesdropper has access to all ciphertexts that are transmitted through the communication channel. He/she is able to extract these ciphertexts and use them further at his/her disposal.
• The eavesdropper has a full description of the block cipher itself, i.e. he/she is aware of how the encryptions constituting the cipher act.
The first assumption is natural, since communication in the modern world (e.g. via the Internet) involves a huge amount of information being transmitted between an enormous variety of parties. Therefore, it is impossible to provide secure channels for all such transmissions. The second one is also quite natural, as for most block ciphers proposed in recent times a full description is publicly available, either as a legitimate standard or as a paper/report.
Attack scenarios:
• ciphertext-only: The eavesdropper does not have any additional information, only an intercepted ciphertext.
• known-plaintext: Some amount of plaintext-ciphertext pairs encrypted with one particular yet unknown key is available to the eavesdropper.
• chosen-plaintext and chosen-ciphertext: The eavesdropper has access to plaintext-ciphertext pairs with a specific eavesdropper's choice of plaintexts and ciphertexts resp.
• adaptive chosen-plaintext and adaptive chosen-ciphertext: The choice of the special plaintexts resp. ciphertexts in the previous scenario depends on some prior processing of pairs.
• related-key: The eavesdropper is able to do encryptions with unknown yet related keys, with the relations known to the eavesdropper.
Note that the last three attacks are quite hard to realize in a practical environment and sometimes even impossible. Nevertheless, studying these scenarios provides more insight into the security properties of a considered cipher.
When undertaking an attack on a cipher one thinks in terms of complexity. Recall from Definition 6.1.4 that there are always time (or processing) as well as memory (or storage) complexities. Another type of complexity one deals with here is data complexity, which is the amount of pre-knowledge (e.g. plain-/ciphertexts) needed to mount an attack. The first thing to think of when designing a cipher is to choose the block/key length so that brute force attacks are not possible. Let us take a closer look here. If the eavesdropper is given 2^n plaintext-ciphertext pairs encrypted with one secret key, then he/she entirely knows the encryption function for that secret key. This implies that n should not be chosen too small, as then simply composing a codebook of associated plaintexts/ciphertexts is possible. For modern block ciphers, a block length of 128 bits is common. On the other side, if the eavesdropper is given just one plaintext-ciphertext pair (p, c), he/she may proceed as follows. Try every key from K (assume now that K = {0, 1}^l) until he/she finds a k that maps p to c: Ek(p) = c. Validate k with another pair (or several pairs) (p', c'), i.e. check whether Ek(p') = c'. If the validation fails, then discard k and move further in K. One expects to find a valid key after searching through half of {0, 1}^l, i.e. after 2^(l-1) trials. This observation implies that the key space should not be too small, as then an exhaustive search of this kind is possible. For modern ciphers key lengths of 128, 192, and 256 bits are applied. Smaller block lengths, like 64 bits, are also employed in lightweight ciphers that are used for resource-constrained devices.
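The exhaustive key search just described is straightforward to express in code. Here is a minimal Python sketch (our illustration; encrypt stands for an arbitrary block-cipher encryption function Ek):

def exhaustive_search(encrypt, pairs, l):
    # Try every key in K = {0,1}^l; validate candidates on the remaining pairs.
    (p0, c0) = pairs[0]
    for k in range(2 ** l):
        if encrypt(p0, k) == c0:                # candidate key found
            if all(encrypt(p, k) == c for (p, c) in pairs[1:]):
                return k                        # expected after about 2^(l-1) trials
    return None                                 # no key maps p0 to c0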
Let us now discuss the two main types of security that exist for cryptosystems in general.
Definition 10.1.8
• Computational security. Here one considers a cryptosystem to be (computationally) secure if the number of operations needed to break the cryptosystem is so large that it cannot be executed in practice, and similarly for memory. Usually one measures such a number by the best attacks available for a given cryptosystem, thus claiming computational security. Another similar idea is to show that breaking a given cryptosystem is equivalent to solving some problem that is believed to be hard. Such security is called provable security or sometimes reductionist security.
• Unconditional security. Here one assumes that an eavesdropper has unlimited computational power. If one is able to prove that even having this unlimited power an eavesdropper is not able to break a given cryptosystem, then it is said that the cryptosystem is unconditionally secure or that it provides perfect secrecy.
Before going on to examples of block ciphers, let us take a look at the security criteria that are usually used when estimating the security capabilities of a cipher.
Security criteria:
• state-of-the-art security level: One gets more confident in a cipher's security if known up-to-date attacks, both generic and specialized, do not break the cipher faster than the exhaustive search. The more such attacks are considered, the more confidence one gets. Of course, one cannot be absolutely confident here, as new, previously unknown attacks may appear that would impose a real threat.
• block and key size: As we have seen above, small block and key sizes make brute force attacks possible, so in this respect longer blocks and keys provide more security. On the other hand, longer blocks and keys imply more costs in implementing such a cipher, i.e. encryption time and memory consumption may rise considerably. So there is a trade-off between security and ease/speed of an implementation.
• implementation complexity: In addition to the previous point, one should also take care of an efficient implementation of the encryption/decryption mappings depending on the environment. For example, different methods may be used for hardware and software implementations. Special care is to be taken when one deals with hardware units with very limited memory (e.g. smartcards).
• others: Things like data expansion and error propagation also play a role in applications and should be taken into account accordingly.
10.1.4 Modern ciphers. DES and AES
In Section 10.1.2 we considered basic ideas for block ciphers. Next, let us consider two examples of modern block ciphers. The first one - DES (Data Encryption Standard) - was proposed in 1976 and was used until the late 1990s. Due to its short key length, it became possible to implement an exhaustive search attack, so DES was no longer secure. In 2001 the cipher Rijndael, proposed by the Belgian cryptographers Joan Daemen and Vincent Rijmen, was adopted as the Advanced Encryption Standard (AES) in the USA and is now widely used for protecting classified governmental documents. In commerce AES also became the de facto standard. We start with DES, which is an instance of a Feistel cipher, which is in turn an iterative cipher.
Definition 10.1.9 An iterative block cipher is a block cipher which sequentially performs a certain key-dependent transformation Fk. This transformation is called the round transformation and the number of rounds Nr is a parameter of an iterative cipher. It is also common to expand the initial private key k to subkeys ki, i = 1, . . . , Nr, where each ki is used as a key for F at round i. A procedure for obtaining the subkeys from the initial key is called a key schedule. For each ki the transformation F should be invertible to allow decryption.
DES
Definition 10.1.10 A Feistel cipher is an iterative cipher, where encryption is done as follows. Divide the n-bit plaintext p into two parts - left and right - (l0, r0) (n is assumed to be even). A transformation f : {0, 1}^(n/2) × K' → {0, 1}^(n/2) is chosen (K' may differ from K). The initial secret key is expanded to obtain the subkeys ki, i = 1, . . . , Nr. Then for every i = 1, . . . , Nr a pair (li, ri) is obtained from the previous pair (li-1, ri-1) as follows: li = ri-1, ri = li-1 ⊕ f(ri-1, ki). Here "⊕" means bitwise addition of {0, 1}-vectors. The ciphertext is taken as (rNr, lNr) rather than (lNr, rNr).
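A minimal Python sketch of Definition 10.1.10 may be helpful (our illustration with a toy round function; this is not DES). As noted after Figure 10.1 below, decryption runs the very same network with the subkeys in reverse order, and the round function f is deliberately non-invertible here (cf. Exercise 10.1.5):

def feistel_encrypt(l, r, subkeys, f):
    # One round per subkey: l_i = r_{i-1}, r_i = l_{i-1} XOR f(r_{i-1}, k_i).
    for k in subkeys:
        l, r = r, l ^ f(r, k)
    return r, l                                  # final swap: (r_Nr, l_Nr)

def feistel_decrypt(l, r, subkeys, f):
    # The same network with reversed subkeys undoes the encryption.
    return feistel_encrypt(l, r, subkeys[::-1], f)

# A toy, non-invertible round function on 16-bit halves:
f = lambda r, k: ((r * k) ^ (r >> 3)) & 0xFFFF
keys = [0x1A2B, 0x3C4D, 0x5E6F]
c = feistel_encrypt(0x0123, 0x4567, keys, f)
assert feistel_decrypt(*c, keys, f) == (0x0123, 0x4567)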
Figure 10.1 shows the scheme of Feistel cipher encryption.
[Figure 10.1: Feistel cipher encryption - the halves (l0, r0) pass through Nr rounds li = ri-1, ri = li-1 ⊕ f(ri-1, ki), followed by the final swap.]
Note that f(·, ki) need not be invertible (Exercise 10.1.5). Decryption is done analogously with the reverse order of subkeys: kNr, . . . , k1.
DES is a Feistel cipher that operates on 64-bit blocks and needs a 56-bit key. Actually the key is given initially in 64 bits, of which 8 bits can be used as parity checks. DES has 16 rounds. The subkeys k1, . . . , k16 are 48 bits long. The transformation f from Definition 10.1.10 is chosen as
f(ri-1, ki) = P(S(E(ri-1) ⊕ ki)). (10.1)
Here E : {0, 1}^32 → {0, 1}^48 is an expansion transformation that expands a 32-bit vector to a 48-bit one in order to fit the size of ki when doing the bitwise addition. Next, S is a substitution transformation that acts as follows. First divide the 48-bit vector E(ri-1) ⊕ ki into 8 six-bit blocks. For every block perform a (non-linear) substitution that takes 6 bits and outputs 4 bits. Thus at the end one has a 32-bit vector obtained by concatenating the results of the substitution S. The substitution S is an instance of an S-box, a carefully chosen non-linear transformation that makes the relation between its input and output complex, thus adding confusion to the encryption transformation (see below for the discussion). Finally, P is a permutation of a 32-bit vector.
Algorithm 10.1.11 (DES encryption)
Input: The 64-bit plaintext p and the 64-bit key k.
Output: The 64-bit ciphertext c corresponding to p.
1. Use the parity check bits k8, k16, . . . , k64 to detect errors in the 8-bit subblocks of k.
- If no errors are detected then obtain the 48-bit subkeys k1, . . . , k16 from k using the key schedule.
2. Take p and apply an initial permutation IP to p. Divide the 64-bit vector IP(p) into halves (l0, r0).
3. For i = 1, . . . , 16 do
- li := ri-1.
- f(ri-1, ki) := P(S(E(ri-1) ⊕ ki)), with S, E, P as explained after (10.1).
- ri := li-1 ⊕ f(ri-1, ki).
4. Interchange the last halves: (l16, r16) → (r16, l16) = c'.
5. Apply the permutation inverse to the initial one to c'; the result is the ciphertext c := IP^(-1)(c').
Let us now give a brief overview of DES properties. First of all, we mention two basic features that any modern block cipher provides and that definitely should be taken into account when designing a block cipher.
• Confusion. When an encryption transformation of a block cipher makes the relations among a plaintext, a ciphertext, and a key as complex as possible, it is said that such a cipher adds confusion to the encryption process. Confusion is usually achieved by non-linear transformations realized by S-boxes.
• Diffusion. When an encryption transformation of a block cipher makes every bit of a ciphertext dependent on every bit of a plaintext and on every bit of a key, it is said that such a cipher adds diffusion to the encryption process. Diffusion is usually achieved by permutations. See Exercise 11.3.1 for a concrete example.
Empirically, DES has the above features, so in this respect it appears to be rather strong. Let us discuss some other features of DES and some attacks that exist for DES. Let DESk(·) be the encryption transformation defined by DES as per Algorithm 10.1.11 for a key k. DES has 4 weak keys; in this context these are the keys k for which DESk(DESk(·)) is the identity mapping, which,
of course, violates the criteria mentioned above. Moreover, for each of these weak keys DES has 2^32 fixed points, i.e. plaintexts p such that DESk(p) = p. There are 6 pairs of semi-weak keys (dual keys), i.e. pairs (k1, k2) such that DESk1(DESk2(·)) is the identity mapping. Similarly to weak keys, 4 out of the 12 semi-weak keys have 2^32 anti-fixed points, i.e. plaintexts p such that DESk(p) = p̄, where p̄ is the bitwise complement of p. It is also known that the DES encryptions are not closed under composition, i.e. do not form a group. This is quite important, as otherwise using multiple DES encryptions would be less secure than is otherwise believed.
If the eavesdropper is able to work with huge data complexity, several known-plaintext attacks become possible. The most well-known of them related to DES are linear and differential cryptanalysis. Linear cryptanalysis was proposed by Mitsuru Matsui in the early 1990s and is based on the idea of approximating a cipher with an affine function. In order to implement this attack for DES one needs 2^43 known plaintext-ciphertext pairs. The existence of such an attack is evidence of a theoretical weakness of DES. A similar observation applies to differential cryptanalysis. The idea of this general method is to carefully explore how differences in the inputs to certain parts of an encryption transformation affect the outputs of these parts. Usually the focus is on the S-boxes. An eavesdropper tries to find a bias in the distribution of differences, which would allow him/her to distinguish a cipher from a random permutation. In the DES situation the eavesdropper needs 2^55 known or 2^47 chosen plaintext-ciphertext pairs in order to mount such an attack. These attacks do not bear any practical threat to DES. Moreover, performing an exhaustive search on the entire key space of size 2^56 is in practice faster than the attacks above.
AES
Next we present the basic description and properties of the Advanced Encryption Standard (AES). AES is a successor of DES and was proposed because DES was not considered to be secure anymore. A new cipher for the Standard should have a larger key/block size and be resistant to the linear and differential cryptanalysis that posed a theoretical threat to DES. The cipher Rijndael adopted for the Standard satisfies these demands. It operates on blocks of length 128 bits and keys of length 128, 192, or 256 bits. We will concentrate on the AES version that employs keys of length 128 bits - the most common setting used. AES is an instance of a substitution-permutation network. We give a definition next.
Definition 10.1.12 A substitution-permutation network (SP-network) is an iterative block cipher with layers of S-boxes interchanged with layers of permutations (or P-boxes), see Figure 10.2. It is required that the S-boxes are invertible.
Note that in the definition of an SP-network we demand that the S-boxes are invertible transformations, in contrast to Feistel ciphers, where S-boxes do not have to be invertible, see the discussion after Definition 10.1.10. Sometimes invertibility of S-boxes is not required, which makes the definition wider. If we recall the notions of confusion and diffusion, we see that SP-networks exactly reflect these notions: S-boxes provide local confusion and then bit permutations or affine maps provide diffusion.
[Figure 10.2: SP-network - alternating layers of S-boxes and P-boxes transform the plaintext into the ciphertext.]
The description of the AES follows. As has already been said, AES operates on 128-bit blocks and 128-bit keys (standard version). For convenience these 128-bit vectors are considered as 4 × 4 arrays of bytes (8 bits). AES-128 (key length 128 bits) has 10 rounds. We know that AES is an SP-network, so let us describe its substitution and diffusion (permutation) layers.
The AES substitution layer is based on 16 S-boxes, each acting on a separate byte of the square representation. In AES terminology the S-box is called SubBytes. One S-box performs its substitution in three steps:
1. Inversion: Consider an input byte binput (a {0, 1}-vector of length 8) as an element of F256. This is done via the isomorphism F2[a]/(a^8 + a^4 + a^3 + a + 1) ≅ F256, so that F256 can be regarded as an 8-dimensional vector space over F2 *** Appendix ***. If binput ≠ 0, then the output of this step is binverse = binput^(-1), otherwise binverse = 0.
2. F2-linear mapping: Consider binverse again as a vector from F2^8. The output of this step is given by blinear = L(binverse), where L is an invertible F2-linear mapping given by a prescribed circulant matrix.
3. S-box constant: The output of the entire S-box is obtained as boutput = blinear + c, where c is an S-box constant.
Thus, in essence, each S-box applies inversion and then an affine transformation to an 8-bit input block, yielding an 8-bit output block. It is easy to see that the S-box so defined is invertible.
The substitution layer acts locally on each individual byte, whereas the diffusion layer acts on the entire square array. The diffusion layer consists of two consecutive linear transformations. The first one, called ShiftRows, shifts the i-th row of the array by i - 1 positions to the left. The second one, called MixColumns, is given by a 4 × 4 matrix M over F256 and transforms every column C of the array to the column MC. The matrix M is the parity check matrix of an MDS
code, cf. Definition 3.2.2; it was chosen following the so-called wide trail strategy and precludes linear and differential cryptanalysis. Let us now describe the encryption process of AES.
Algorithm 10.1.13 (AES encryption)
Input: The 128-bit plaintext p and the 128-bit key k.
Output: The 128-bit ciphertext c corresponding to p.
1. Perform the initial key addition: w := p ⊕ k = AddRoundKey(p, k).
2. Expand the initial key k to subkeys k1, . . . , k10 using the key schedule.
3. For i = 1, . . . , 9 do
- Perform the S-box substitution: w := SubBytes(w).
- Shift the rows: w := ShiftRows(w).
- Transform the columns with the MDS matrix M: w := MixColumns(w).
- Add the round key: w := AddRoundKey(w, ki) = w ⊕ ki.
# The last round does not have MixColumns.
4. Perform the S-box substitution: w := SubBytes(w).
5. Shift the rows: w := ShiftRows(w).
6. Add the round key: w := AddRoundKey(w, k10) = w ⊕ k10.
7. The ciphertext is c := w.
The key schedule is designed similarly to the encryption and is omitted here. All the details on the components, the key schedule, and the reverse cipher for decryption can be found in the literature, see Notes. The reverse cipher is quite straightforward, as it has to undo invertible affine transformations and the inversion in F256.
Let us discuss some properties of AES. First of all, we note that AES possesses the confusion and diffusion properties. The use of S-boxes provides sufficient resistance to linear and differential cryptanalysis, which was one of the major concerns when replacing DES. The use of the affine mapping in the S-box, among other things, removes fixed points. In the diffusion layer the diffusion is done separately for rows and columns. It is remarkable that, in contrast to DES, where the encryption is mainly described via table look-ups, the AES description is very algebraic: all transformations are described as either field inversions or matrix multiplications. Of course, in real-world applications some operations like the S-box are nevertheless realized as table look-ups. Still, the simplicity of the AES description has been under discussion since the selection process in which the future AES Rijndael took part. The highly algebraic nature of the AES description boosted a new branch of cryptanalysis called algebraic cryptanalysis. We address this issue in the next chapter, see Section 11.3.
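As an illustration of the algebraic description above, here is a Python sketch of the SubBytes S-box alone (our code, not an official implementation): inversion in F256 modulo a^8 + a^4 + a^3 + a + 1, followed by the affine map, where the rotation pattern and the constant 0x63 are those of the published AES specification.

def gf_mul(a, b):
    # Multiplication in F_256 = F_2[x]/(x^8 + x^4 + x^3 + x + 1).
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B          # reduce by the AES polynomial
    return r

def gf_inv(a):
    # a^254 = a^(-1) in F_256; this chain maps 0 to 0, as the S-box requires.
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r

def sub_bytes(b):
    x = gf_inv(b)
    y = 0
    for i in range(8):          # affine map: circulant matrix L, then constant 0x63
        bit = ((x >> i) ^ (x >> ((i + 4) % 8)) ^ (x >> ((i + 5) % 8)) ^
               (x >> ((i + 6) % 8)) ^ (x >> ((i + 7) % 8))) & 1
        y |= bit << i
    return y ^ 0x63

assert sub_bytes(0x00) == 0x63 and sub_bytes(0x01) == 0x7C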
10.1.5 Exercises
10.1.1 It is known that the following ciphertext is obtained with a permutation cipher of period 6 and contains an anagram of a famous person's name (spaces are ignored by the encryption): "AAASSNISFNOSECRSAAKIWNOSN". Find the original plaintext.
10.1.2 A sequential composition of several permutation ciphers with periods t1, . . . , ts is called a compound permutation (compound transposition). Show that the compound permutation cipher is equivalent to a simple permutation cipher with the period t = lcm(t1, . . . , ts).
10.1.3 [CAS] The Hill cipher is defined as follows. One encodes a length-n block p = (p1 . . . pn), which is assumed to consist of elements from Z26, with an invertible n × n matrix H = (hij) as ci = Σ_{j=1}^{n} hij pj. Therewith one obtains the cryptogram c = (c1 . . . cn). The decryption is done analogously using H^(-1). Write a procedure that implements the Hill cipher. Compare your implementation with the HillCryptosystem class from Sage.
10.1.4 The following text is encrypted with a monoalphabetic substitution cipher. Decrypt it using frequency analysis and Table 10.1:
AI QYWX YRHIVXEOI MQQIHMEXI EGXMSR. XLI IRIQC MW ZIVC GOSWI!
Hint: Decrypting the small words first may be very useful.
10.1.5 Show that in the definition of a Feistel cipher the transformation f need not be invertible for decryption to be possible, in the sense that the round function is invertible even if f is not. Also show that performing encryption starting at (rNr, lNr) with the reverse order of the subkeys yields (l0, r0) at the end, thus providing a decryption.
10.1.6 It is known that the expansion transformation E of DES has the complementation property, i.e. for every input x it holds that E(x̄) equals the bitwise complement of E(x). It is also known that the complemented key k̄ expands to the complemented subkeys k̄1, . . . , k̄16. Knowing this, show that
a. The entire DES transformation also possesses the complementation property: for all p ∈ {0, 1}^64 and k ∈ {0, 1}^56, DES_k̄(p̄) equals the bitwise complement of DES_k(p).
Using (a.) show that
b. It is possible to reduce the exhaustive search complexity from 2^55 (half the key-space size) to 2^54.
10.2 Asymmetric cryptosystems
In Section 10.1 we considered symmetric cryptosystems. As we have seen, for a successful communication Alice and Bob are required to keep their encryption/decryption keys secret. Only the channel itself is assumed to be eavesdropped. For Alice and Bob to set up a secret communication it is necessary to
convey the encryption/decryption keys. This can be done, e.g., by means of a trusted courier or some channel (like a specially secured telephone line) that is considered to be strongly secure. This paradigm suited diplomatic and military communication well: the number of communicating parties in these scenarios was quite limited; in addition, the communicating parties could usually afford sending a trusted courier in order to keep keys secret or provide some highly protected channel for exchanging keys. In the 1970s, with the beginning of electronic communication, it became apparent that such an exchange mechanism is absolutely inefficient. This is mainly due to a drastic increase in the number of communicating parties. It is not only diplomats or high-ranking military officials that wish to set up secret communication, but ordinary users (e.g. companies, banks, social network users) who would like to be able to do business over some large distributed network. Suppose that there are n users who potentially are willing to communicate with each other secretly. Then it is possible to share secret keys between every pair of users. There are n(n - 1)/2 pairs of users, so one would need this number of exchanges in a network to set up the communication. Note that already for n = 1,000 we have n(n - 1)/2 = 499,500, which is of course not something we would like to do. Another option would be to set up some trusted authority in the middle who would store secret keys for every user; then if Alice would like to send a plaintext p to Bob, she would send cAlice = EKAlice(p) to the trusted authority Tim. Tim would decrypt p = DKAlice(cAlice) and send cBob = EKBob(p) to Bob, who is then able to decrypt cBob with his secret key KBob. An obvious drawback of this approach is that Tim knows all the secret keys, and thus is able to read (and alter!) all the plaintexts, which is of course not desirable. Another disadvantage is that for a large network it could be hard to implement a trusted authority of this kind, as it has to take part in every communication between users and thus can get overwhelmed.
A solution to the problem above was proposed by Diffie and Hellman in 1976. This was the starting point of asymmetric cryptography. The idea is that if Alice wants to communicate with some other parties, she generates an encryption/decryption pair (e, d) in such a way that knowing e it is computationally infeasible to obtain d. This is quite different from symmetric cryptography, where e and d are (computationally) the same. The motivation for the name "asymmetric cryptosystem" as opposed to "symmetric cryptosystem" should be clear now. So what Alice does is publish her encryption key e in some public repository and keep d secret. If Bob wants to send a plaintext p to Alice, he simply finds her public key e = eAlice in the repository and uses it for encryption: c = Ee(p). Now Alice is able to decrypt with her private key d = dAlice. Note that due to the assumptions we have on the pair (e, d), Alice is the only person who is able to decrypt c. Indeed, Eve may know c and the encryption key e used, but she is not able to get d for decryption. Remarkably, even Bob himself is not able to restore his plaintext p from c if he loses or deletes it beforehand! The formal definition follows.
Definition 10.2.1 An asymmetric cryptosystem is defined by the following data:
• The plaintext space P and the ciphertext space C.
• {Ee : P → C|e ∈ K} and {Dd : C → P|d ∈ K} are the sets of encryption and decryption transformations resp., which are bijections from P to C
and from C to P resp.
• The above transformations are parameterized by the key space K.
• Given an associated pair (e, d), so that the property ∀p ∈ P : Dd(Ee(p)) = p holds, knowing e it is "computationally hard" to find out d.
Here, the encryption key e is called public and the decryption key d is called private.
The core issue in the above definition is having the property that knowledge of e practically does not shed any light on d. The study of this issue led to the notion of a one-way function. We say that a function f : X → Y is one-way if it is "computationally easy" to compute f(x) for any x ∈ X, but for y ∈ Im(f) it is "computationally hard" to find x ∈ X such that f(x) = y. Note that one may compute Y' = {f(x) | x ∈ Z ⊂ X}, where Z is some small subset of X, and then invert the elements from Y'. Still, Y' is essentially small compared to Im(f), so for a randomly chosen y ∈ Im(f) the above assumption should hold. Theoretically it is not known whether one-way functions exist, but in practice there are several candidates that are believed to be one-way. We discuss this a bit later.
The above notion of a one-way function solves half of the problem. Namely, if Bob sends Alice an encrypted plaintext c = E(p), where E is one-way, Eve is not able to find p, as she is not able to invert E. But Alice then faces the same problem! Of course we would like to provide Alice with means to invert E and find p. Here the notion of a trapdoor one-way function comes in handy. A one-way function f : X → Y is said to be trapdoor one-way if there is some additional information, called the trapdoor, having which it is "computationally easy" for y ∈ Im(f) to find x ∈ X such that f(x) = y. Now if Alice possesses such a trapdoor for E she is able to obtain p from c.
Example 10.2.2 We now give examples of functions that are believed to be one-way.
1. The first is f : Zn → Zn defined by f(x) = x^a mod n. If we take a = 3 it is easy to compute x^3 mod n, but given y ∈ Zn it is believed to be hard to compute x such that y = x^3 mod n. For suitably chosen a and n this function is used in the RSA cryptosystem, Section 10.2.1. For a = 2 one obtains the so-called Rabin scheme. It can be shown that in this case factoring n is in fact equivalent to inverting f. Since factoring integers is considered to be a hard computational problem, it is believed that f is one-way. For RSA it is believed that inverting f is as hard as factoring, although no rigorous proof is known. In both schemes above it is assumed that n = pq, where p and q are (suitably chosen) primes, and this fact is public knowledge, but p and q are kept secret. The one-way property relies on the hardness of factoring n, i.e. finding p and q. For Alice the knowledge of p and q is a trapdoor using which she is able to invert f. Thus f is believed to be a trapdoor one-way function.
2. The second example is g : Fq* → Fq* defined by g(x) = a^x, where a generates the multiplicative group Fq*. The problem of inverting g is called the discrete logarithm problem (DLP) in Fq. It is the basis of the El Gamal scheme, Section
10.2.2. The DLP is believed to be hard in general, thus g is believed to be one-way, since for a given x computing a^x in Fq* is easy. One may also use domains different from Fq* and try to solve the DLP there; for some discussion on that cf. Section 10.2.2.
3. Consider a function h : Fq^k → Fq^n, k < n, defined as m → mG + e ∈ Fq^n, where G is a generator matrix of an [n, k, d]q linear code and wt(e) ≤ t ≤ (d - 1)/2. So h defines an encoding function for the code defined by G. When inverting h one faces the problem of bounded distance decoding, which is believed to be hard. The function h is the basis of the McEliece and Niederreiter cryptosystems, see Sections 10.6 and ??.
4. In the last example we consider a function z : Fq^n → Fq^m, n ≥ m, defined as x → F(x) = (f1(x), . . . , fm(x)) ∈ Fq^m, where the fi's are non-linear polynomials over Fq. Inverting z means finding a solution of a system of non-linear equations F(X) = y. This problem is known to be NP-hard even if the fi's are quadratic and q = 2. The function z is the basis of multivariate cryptosystems, see Section 10.2.3.
Before going on to consider concrete examples of asymmetric cryptosystems, we would like to note that there is a vital necessity of authentication in asymmetric cryptosystems. Indeed, imagine that Eve can not only intercept and read messages, but also alter the repository where public keys are stored. Suppose Alice is willing to communicate a plaintext p to Bob. Assume that Eve is aware of this intention and is able to substitute Bob's public key eBob with her key eEve, for which she has the corresponding decryption key dEve. Alice, not knowing that the key was replaced, takes eEve and encrypts c = EeEve(p). Eve intercepts c and decrypts p = DdEve(c). So now Eve knows p. After that she may either encrypt p with Bob's eBob and send the ciphertext to him, or even replace p with some other p'. As a result, not only does Eve get the secret message p, but Bob can be misinformed by the message p', which, as he thinks, comes from Alice. Fortunately there are ways to tackle this problem. They include the use of a trusted third party (TTP) and digital signatures. Digital signatures are the asymmetric analogue of (message) authentication codes, Section 10.3. These are out of the scope of this introductory chapter.
The last remark concerns the type of security that asymmetric cryptosystems provide. Note that as opposed to symmetric cryptosystems, some of which can be shown to be unconditionally secure, asymmetric cryptosystems can only be computationally secure. Indeed, having Bob's public key eBob, Eve can simply encrypt all possible plaintexts until she finds p such that EeBob(p) coincides with the ciphertext c that she observed.
10.2.1 RSA
Now we consider an example of one of the most used asymmetric cryptosystems - RSA, named after its creators R. Rivest, A. Shamir, and L. Adleman. This cryptosystem was proposed in 1977, shortly after Diffie and Hellman invented asymmetric cryptography. It is based on the hardness of factoring integers and has up to now withstood cryptanalysis, although some of the attacks suggest a careful choice of the public/private key and its size. First we present the RSA itself:
how one chooses a public/private key pair, how encryption/decryption is done, and why it works. Then we consider a concrete example with small numbers. Finally we discuss some security issues. In this and the following subsection we denote the plaintext by m, because historically p and q are reserved in the context of RSA.
Algorithm 10.2.3 (RSA key generation)
Output: RSA public/private key pair ((e, n), d).
1. Choose two distinct primes p and q.
2. Compute n = pq and φ = φ(n) = (p - 1)(q - 1).
3. Select a number e, 1 < e < φ, such that gcd(e, φ) = 1.
4. Using the extended Euclidean algorithm, compute d such that ed ≡ 1 (mod φ).
5. The key pair is ((e, n), d).
The integers e and d above are called the encryption and decryption exponent resp.; the integer n is called the modulus. For encryption Alice uses the following algorithm.
Algorithm 10.2.4 (RSA encryption)
Input: Plaintext m and Bob's encryption exponent e together with the modulus n.
Output: Ciphertext c.
1. Represent m as an integer 0 ≤ m < n.
2. Compute c = m^e (mod n).
3. The ciphertext for sending to Bob is c.
For decryption Bob uses the following algorithm.
Algorithm 10.2.5 (RSA decryption)
Input: Ciphertext c, the decryption exponent d, and the modulus n.
Output: Plaintext m.
1. Compute m = c^d (mod n).
2. The plaintext is m.
Let us see why Bob gets the initial m as a result of decryption. Since ed ≡ 1 (mod φ), there exists an integer s such that ed = 1 + sφ. For gcd(m, p) there are two possibilities: either 1 or p. If gcd(m, p) = 1, then due to Fermat's little theorem we have m^(p-1) ≡ 1 (mod p). Raising both sides to the s(q - 1)-th power and multiplying by m we have m^(1+s(p-1)(q-1)) ≡ m (mod p). Now using ed = 1 + sφ = 1 + s(p - 1)(q - 1) we have m^(ed) ≡ m (mod p). In the case gcd(m, p) = p we get the last congruence right away. The same argument can be applied to q, so we obtain analogously m^(ed) ≡ m (mod q). Using the Chinese remainder theorem we then get m^(ed) ≡ m (mod n). So indeed c^d = (m^e)^d ≡ m (mod n).
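Algorithms 10.2.3-10.2.5 can be traced in a few lines of Python (our sketch, using the numbers of Example 10.2.6 below; the modular inverse pow(e, -1, phi) needs Python 3.8 or later):

p, q = 5519, 4651                # the primes of Example 10.2.6
n = p * q                        # modulus: 25668869
phi = (p - 1) * (q - 1)          # 25658700
e = 29                           # encryption exponent with gcd(e, phi) = 1
d = pow(e, -1, phi)              # decryption exponent: 22119569
m = 7847098                      # plaintext
c = pow(m, e, n)                 # encryption: c = m^e mod n, giving 22152327
assert pow(c, d, n) == m         # decryption: c^d mod n recovers m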
Example 10.2.6 Consider an example of RSA as described in the algorithms above with some small values. First let us choose the primes p = 5519 and q = 4651. So our modulus is n = pq = 25668869, and thus φ = (p - 1)(q - 1) = 25658700. Take e = 29 as the encryption exponent, gcd(29, φ) = 1. Using the Euclidean algorithm we obtain e · (-3539131) + 4 · φ = 1, so take d = -3539131 mod φ = 22119569. The key pair now is ((e, n), d) = ((29, 25668869), 22119569). Suppose Alice wants to transmit a plaintext message m = 7847098 to Bob. She takes his public key e = 29 and computes c = m^e (mod n) = 22152327. She sends c to Bob. After obtaining c, Bob computes c^d (mod n) = m.
Example 10.2.7 The Magma computer algebra system (cf. Appendix ??) gives an opportunity to compute an RSA modulus of a given bit-length. For example, if we want to construct a "random" RSA modulus of bit-length 25, we should write:
RSAModulus(25);
26827289 1658111
Here the first number is the random RSA modulus n and the second one is a number e such that gcd(e, φ(n)) = 1. We can also specify the number e explicitly (below e = 29):
n := RSAModulus(25,29); n;
19579939
One can further factorize n as follows:
Factorization(n);
[ <3203, 1>, <6113, 1> ]
This means that p = 3203 and q = 6113 are the prime factors of n and n = pq. We can also use the extended Euclidean algorithm to recover d as follows:
e := 29;
phi := 25658700;
ExtendedGreatestCommonDivisor(e, phi);
1 -3539131 4
So here 1 is the gcd and d = -3539131, as was computed in the example above.
As has already been mentioned, RSA relies on the hardness of factoring integers. Of course, if Eve is able to factor n, then she is able to produce d and thus decrypt all ciphertexts. The open question is whether breaking RSA leads to a factoring algorithm for n. The problem of breaking RSA is called the RSA problem. There is no rigorous proof, though, that breaking RSA is equivalent to factoring. Still, it can be shown that computing the decryption exponent d and factoring are equivalent. Note that in principle it might be unnecessary for an attacker to compute d in order to figure out m from c given (e, n). Nevertheless, even though there is no rigorous proof of equivalence, RSA is believed to be as hard as factoring. Now we briefly discuss some other things that need to be taken into consideration when choosing parameters for RSA.
1. For fast encryption a small encryption exponent is desirable, e.g. e = 3. The possibility of an attack then exists if this exponent is used for sending the same message to different recipients with different moduli. There is also a concern about small decryption exponents. For example, if the bitlength of d is approximately 1/4 of the bitlength of n, then there is an efficient way to get d from (e, n).
2. As to the primes p and q, one should take the following into account. First of all, p - 1 and q - 1 should not consist only of small factors, as then factoring n with Pollard's p - 1 algorithm is possible. Further, in order to avoid elliptic curve factoring, p and q should be roughly of the same bitlength. On the other side, if the difference p - q is too small, then techniques like Fermat factorization become feasible.
3. In order to avoid problems as in (1.), different padding schemes have been proposed that add a certain amount of randomness to ciphertexts. Thus the same message will be encrypted to one of the ciphertexts from some range.
An important remark to make is that using so-called quantum computers, provided they are large enough, it is possible to solve the factorization problem in polynomial time. See Notes for references. The same problem exists for the cryptosystems based on the DLP, which are described in the next subsection. Problems (3.) and (4.) from Example 10.2.2 are not known to be susceptible to quantum computer attacks. Together with some other hard problems, they form a foundation for post-quantum cryptography, which deals with cryptosystems resistant to quantum computer attacks. See Notes for references.
10.2.2 Discrete logarithm problem and public-key cryptography
In the previous subsection we considered the asymmetric cryptosystem RSA based on the hardness of factoring integers. As has already been noted in Example 10.2.2, there is also the possibility to use the hardness of finding discrete logarithms as a basis for an asymmetric cryptosystem. The general DLP is defined below.
Definition 10.2.8 Let G be a finite cyclic group of order g. Let α be a generator of this group, so that G = {α^i | 1 ≤ i ≤ g}. The discrete logarithm problem (DLP) in G is the problem of finding 1 ≤ x ≤ g from a = α^x, where a ∈ G is given.
For cryptographic purposes a group G should possess two main properties: 1.) the operation in G should be efficiently performed and 2.) the DLP in G should be difficult to solve (see Exercise 10.2.4). Cyclic groups that are widely used in cryptography include the multiplicative group Fq* of the finite field Fq (in particular the multiplicative group Zp* for p prime) and a group of points on an elliptic curve over a finite field. Other possibilities that exist are the group of units Zn* for a composite n, the Jacobian of a hyperelliptic curve defined over a finite field, and the class group of an imaginary quadratic number field, see Notes. Here we consider the classical El Gamal scheme based on the DLP. As we will see, the following description will do for any cyclic group with an "efficient description". Initially the multiplicative group of a finite field was used.
Algorithm 10.2.9 (El Gamal key generation)
Output: El Gamal public/private key pair ((G, α, h), a).
1. Choose some cyclic group G of order g = ord(G), where the group operation is done efficiently, and then choose a generator α of it.
2. Select a random integer a such that 1 ≤ a ≤ g - 2 and compute h = α^a.
3. The key pair is ((G, α, h), a).
Note that G and α can be fixed in advance for all users, so only h becomes a public key. For encryption Alice uses the following algorithm.
Algorithm 10.2.10 (El Gamal encryption)
Input: Plaintext m and Bob's public encryption key h together with α and the group description of G.
Output: Ciphertext c.
1. Represent m as an element of G.
2. Select a random b such that 1 ≤ b ≤ g - 2, where g = ord(G), and compute c1 = α^b and c2 = m · h^b.
3. The ciphertext for sending to Bob is c = (c1, c2).
For decryption Bob uses the following algorithm.
Algorithm 10.2.11 (El Gamal decryption)
Input: Ciphertext c, the private key a together with α and the group description of G.
Output: Plaintext m.
1. In G compute m = c2 · c1^(-a) = c2 · c1^(g-a), where g = ord(G).
2. The plaintext is m.
Let us see why we get the initial m as a result of decryption. Using h = α^a we have c2 · c1^(-a) = m · h^b · α^(-ab) = m · α^(ab) · α^(-ab) = m.
Example 10.2.12 For this example let us take the group Zp* where p = 8053, with a generator α = 2. Let us choose the private key to be a = 3117. Compute h = α^a mod p = 3030. So the public key is h = 3030 and the private key is a = 3117. Suppose Alice wants to encrypt a message m = 1734 for Bob. For this she chooses a random b = 6809 and computes c1 = α^b mod p = 3540 and c2 = m · h^b mod p = 7336. So her ciphertext is c = (3540, 7336). Upon receiving c, Bob computes c2 · c1^(p-1-a) mod p = 7336 · 3540^4935 mod 8053 = 1734.
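The computations of Example 10.2.12 are easily reproduced in Python (our sketch; the modular exponentiations use the built-in pow):

p, alpha = 8053, 2                    # the group Z_p^* with generator alpha
a = 3117                              # Bob's private key
h = pow(alpha, a, p)                  # public key: 3030
m, b = 1734, 6809                     # plaintext and the random b chosen by Alice
c1 = pow(alpha, b, p)                 # 3540
c2 = m * pow(h, b, p) % p             # 7336
recovered = c2 * pow(c1, p - 1 - a, p) % p    # c2 * c1^(-a) via c1^(g-a)
assert recovered == m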
Now we briefly discuss some issues connected with the El Gamal scheme.
• Message expansion: It should be noted that, as opposed to the RSA scheme, ciphertexts in El Gamal are twice as large as plaintexts. So the El Gamal scheme actually has the drawback of providing message expansion by a factor of 2.
• Randomization: Note that in Algorithm 10.2.10 we used randomization to compute a ciphertext. Randomization in encryption gives the advantage that the same message is mapped to different ciphertexts in different encryption runs. This in turn makes a chosen-plaintext attack more difficult. We will see another example of an asymmetric scheme with randomized encryption in Section 10.6, where we discuss the McEliece scheme based on error-correcting codes.
• Security reliance: The problem of breaking the El Gamal scheme is equivalent to the so-called (generalized) Diffie-Hellman problem, which is the problem of finding α^(ab) ∈ G given α^a ∈ G and α^b ∈ G. Obviously, if one is able to solve the DLP, then one is able to solve the Diffie-Hellman problem, i.e. the Diffie-Hellman problem is polytime reducible to the DLP (cf. Definition 6.1.22). It is not known whether these two problems are computationally equivalent. Nevertheless, it is believed that breaking El Gamal is as hard as solving the DLP.
• As we have mentioned before, the El Gamal scheme is vulnerable to quantum computer attacks. See Notes.
10.2.3 Some other asymmetric cryptosystems
So far we have seen examples of asymmetric cryptosystems based on the hardness of factoring integers (Section 10.2.1) and of solving the DLP in the multiplicative group of a finite field (Section 10.2.2). Other examples that will be covered are the McEliece scheme, which is based on the hardness of decoding random linear codes (Section 10.6), and schemes based on solving the DLP in a group of points of an elliptic curve over a finite field (Section ??). In this subsection we briefly mention some other alternatives that exist.
The first direction we consider here is so-called multivariate cryptography. Here cryptosystems are based on the hardness of solving the multivariate quadratic (MQ) problem. This is the problem of finding a solution x = (x1, . . . , xn) ∈ Fq^n to the system
y1 = f1(X1, . . . , Xn), . . . , ym = fm(X1, . . . , Xn),
where fi ∈ Fq[X1, . . . , Xn], deg fi = 2, i = 1, . . . , m, and the vector y = (y1, . . . , ym) ∈ Fq^m is given. This problem is known to be NP-hard, so it is thought to be a good source of a one-way function. The trapdoor is added by choosing fi's having some structure that is kept secret and allows decryption that, e.g., boils down to univariate factorization over a larger field. To an eavesdropper, though, the system above with such a trapdoor should appear random. So the idea is that the eavesdropper can do no better than solve a random quadratic system over a finite field, which is believed to be a hard problem. The cryptosystems and digital signature schemes in this category include e.g. Hidden Field Equations (HFE), SFLASH, Unbalanced Oil and Vinegar (UOV), Step-wise Triangular Schemes (STS), and some others. Some of those were broken, and several modifications were proposed to overcome the attacks (e.g. PFLASH, enSTS). At present it is not quite clear whether it is possible to design a secure multivariate cryptosystem. A lot of research in this area, though, gives a basis for optimism.
Another well-known example of a cryptosystem based on an NP-hard problem is the knapsack cryptosystem. This cryptosystem was the first concrete realization of an asymmetric scheme and was proposed in 1978 by Merkle and Hellman. The knapsack cryptosystem is based on the well-known NP-hard subset sum problem: given a set of positive integers A = {a1, . . . , an} and a positive integer s, find a subset of A such that the sum of its elements yields s. The idea of Merkle and Hellman was to make so-called super-increasing sequences, for which the above problem is easily solved, appear as
a random set A, thus providing a trapdoor. So an eavesdropper supposedly has nothing better to do than to deal with the well-known hard problem. This initial proposal was broken by Shamir, and later an improved version was broken by Brickell. These attacks are based on integer lattices and caused quite a shake-up in the cryptographic community at that time.
There are some other types of cryptosystems out there: polynomial based ("Polly Cracker"-type), lattice based, hash based, and group based. Therefore, we may summarize that active research is being conducted in order to provide alternatives to the widely used cryptosystems.
10.2.4 Exercises
10.2.1 a. Given the primes p = 5081 and q = 6829 and an encryption exponent e = 37, find the corresponding decryption exponent and encrypt the message m = 29800110.
b. Let e and m be as above. Generate (e.g. with Magma) a random RSA modulus n of bit-length 25. For these n, e, m find the corresponding decryption exponent via factorizing n; encrypt m.
10.2.2 Show that the number λ = lcm(p - 1, q - 1), which is called the universal exponent of n, can be used instead of φ in Algorithms 10.2.3 and 10.2.5.
10.2.3 Generate a public/private key pair for the El Gamal scheme with G = Z*_7121 and encrypt the message m = 5198 using this scheme.
10.2.4 Give an example of a finite cyclic group where the DLP is easy to solve.
10.2.5 Show that using the same b in Algorithm 10.2.10 for at least two different encryptions is insecure, namely if c and c' are two ciphertexts that correspond to m and m', which were encrypted with the same b, then knowing one of the plaintexts yields the other.
10.3 Authentication, orthogonal arrays, and codes
10.3.1 Authentication codes
In Section 10.1 we dealt with the problem of secure communication between two parties by means of symmetric cryptosystems. In this section we address another important problem, the problem of data source authentication. So we are now interested in providing means for Bob to make sure that an (encrypted) message he received from Alice was indeed sent by her and was not altered during the transmission. In this section we consider so-called authentication codes that provide the tools necessary to ensure authentication. These codes are analyzed in terms of unconditional security (see Definition 10.1.8). For practical purposes one is more interested in computational security; the analogues of authentication codes for this purpose are message authentication codes (MACs). It is also to be noted that authentication codes are, in a sense, symmetric based, i.e. a secretly shared key is needed to provide such an authentication. There is also an asymmetric analogue (Section 10.2) called a digital signature. In this model everybody can
verify Alice's signature by a publicly available verification algorithm. Let us now proceed to the formal definition of an authentication code.

Definition 10.3.1 An authentication code is defined by the following data:
• A set of source states S.
• A set of authentication tags T.
• A set of keys, the keyspace K.
• A set of authentication maps A parameterized by K: for each k ∈ K there is an authentication map a_k : S → T.
We also define the message space M = S × T.

The idea of authentication is as follows. Alice and Bob secretly agree on some secret key k ∈ K for their communication. Suppose that Alice wants to transmit a message s, which by the definition above is called a source state. Note that now we are not interested in providing secrecy for s itself, but rather in providing means of authentication for s. For the transmission Alice adds an authentication tag t = a_k(s) to s. She then sends the concatenated message (s, t). Usually (s, t) is an encrypted message, maybe also encoded for error-correction, but this plays no role here. Suppose Bob receives (s′, t′). He separates s′ and t′ and checks whether t′ = a_k(s′). If the check succeeds, he accepts s′ as a valid message that came from Alice; otherwise he rejects it. If no intrusion occurred we have s′ = s and t′ = t, and the check trivially succeeds. But what if Eve wants to alter the message and make Bob believe that the message altered by her still originates from Alice? One usually considers two types of malicious actions by Eve.

• Impersonation: Eve sends some message (s, t) with the intention that Bob accepts it as Alice's message, i.e. she aims at passing the check t = a_k(s) with high probability, where the key k is unknown to her.

• Substitution: Eve intercepts Alice's message (s, t). Now she wants to substitute another message (s′, t′) with s′ ≠ s, such that a_k(s′) = t′ for the key k unknown to her.

As has already been said, authentication codes are studied from the point of view of unconditional security, i.e. we assume that Eve has unbounded computational power. In this case we need to show that no matter how much computational power Eve has, she cannot succeed in the above attack scenarios with a large probability. Therefore, we need to estimate the probabilities of success of impersonation, P_I, and of substitution, P_S, given probability distributions p_S and p_K on the source state set and the key space respectively. The probabilities P_I and P_S are also called deception probabilities. Note that P_I as well as P_S are computed under the assumption that Eve tries to maximize her chances of deception. In reality Eve might want not only to maximize her probability of passing the check, but might also have a preference as to which message she wants to substitute for Alice's. For example, intercepting Alice's message (s, t), where s = "Meeting is at seven", she would like to send something like (s′, t′), where s′ = "Meeting is at six". Thus P_I and P_S actually provide an upper bound on Eve's chances of success.
Let us first compute P_I. Consider the probability that some message (s, t) is validated by Bob when some private key k_0 ∈ K is used. In fact, for Eve every key k that maps s to t will do. So

Pr(a_{k_0}(s) = t) = ∑_{k∈K: a_k(s)=t} p_K(k).

Now, in order to maximize her chances, Eve should choose (s, t) with Pr(a_{k_0}(s) = t) as large as possible, i.e.

P_I = max{ Pr(a_{k_0}(s) = t) | s ∈ S, t ∈ T }.

Note that P_I depends only on the distribution p_K and not on p_S. Computing P_S is a bit trickier. The conditional probability Pr(a_{k_0}(s′) = t′ | a_{k_0}(s) = t) that Eve's message (s′, t′), s′ ≠ s, passes the check once the valid message (s, t) is known is

Pr(a_{k_0}(s′) = t′ | a_{k_0}(s) = t) = Pr(a_{k_0}(s′) = t′, a_{k_0}(s) = t) / Pr(a_{k_0}(s) = t)
= ∑_{k∈K: a_k(s′)=t′, a_k(s)=t} p_K(k) / ∑_{k∈K: a_k(s)=t} p_K(k).

Having (s, t), Eve maximizes her chances by choosing (s′, t′), s′ ≠ s, such that the corresponding conditional probability is maximal. To reflect this, introduce

p_{s,t} := max{ Pr(a_{k_0}(s′) = t′ | a_{k_0}(s) = t) | s′ ∈ S \ {s}, t′ ∈ T }.

Now, in order to get P_S we take the weighted average of the p_{s,t} according to the distribution on messages:

P_S = ∑_{(s,t)∈M} p_M(s, t) p_{s,t},

where the distribution p_M is obtained as p_M(s, t) = p_S(s) p(t|s) = p_S(s) · Pr(a_{k_0}(s) = t). The value Pr(a_{k_0}(s) = t) is called the pay-off of a message (s, t); we denote it by π(s, t). Likewise, Pr(a_{k_0}(s′) = t′ | a_{k_0}(s) = t) is the pay-off of a message (s′, t′) given a valid message (s, t); we denote it by π_{s,t}(s′, t′). For convenience one may think of an authentication code as an array whose rows are indexed by K and columns by S, where the entry (k, s) for k ∈ K, s ∈ S has the value a_k(s); see Exercise 10.3.1. The computation of P_I and P_S from this array representation is illustrated in the sketch below.
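The following minimal Python sketch (ours, not from the book; the dictionary-based array representation and all names are our own choices) computes P_I and P_S exactly as derived above. Fed with the array of Table 10.2 and the distributions of Exercise 10.3.1, it can be used to check the answer to that exercise.

from itertools import product

def deception_probs(A, pK, pS):
    # A[k][s] = a_k(s); pK and pS are the key and source state distributions
    keys = list(A)
    sources = list(pS)
    tags = {A[k][s] for k in keys for s in sources}

    def payoff(s, t):                         # pi(s, t) = Pr(a_{k0}(s) = t)
        return sum(pK[k] for k in keys if A[k][s] == t)

    P_I = max(payoff(s, t) for s in sources for t in tags)

    P_S = 0.0
    for s, t in product(sources, tags):
        pM = pS[s] * payoff(s, t)             # p_M(s, t) = p_S(s) * pi(s, t)
        if pM == 0:
            continue
        p_st = max(sum(pK[k] for k in keys if A[k][s2] == t2 and A[k][s] == t)
                   / payoff(s, t)             # conditional pay-off pi_{s,t}(s', t')
                   for s2 in sources if s2 != s for t2 in tags)
        P_S += pM * p_st
    return P_I, P_S

# the array of Table 10.2 with the distributions of Exercise 10.3.1:
A = {1: {1: 2, 2: 1, 3: 2}, 2: {1: 3, 2: 2, 3: 1},
     3: {1: 1, 2: 1, 3: 3}, 4: {1: 2, 2: 3, 3: 2}}
pK = {1: 1/6, 2: 1/6, 3: 1/6, 4: 1/2}
pS = {1: 1/4, 2: 1/2, 3: 1/4}
print(deception_probs(A, pK, pS))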
We have now discussed the basics of authentication codes. The question is what the important criteria for a good authentication code are. These are summarized below:

1. The deception probabilities must be small, so that the eavesdropper's chances are low.
2. |S| should be large, to facilitate authentication of a potentially large number of source states.
3. Since we are studying authentication codes from the point of view of unconditional security, a secret key should be used only once and then changed for the next transmission, as in the one-time pad, cf. Example ??. Thus |K| should be minimized, because key values have to be transmitted every time. E.g. if K = {0, 1}^l, then keys of length log_2 |K| = l are to be transmitted.

Let us now concentrate on item (1.); items (2.) and (3.) are considered in the next subsections, where different constructions of authentication codes are presented. We would like to see which values can be achieved by P_I and P_S and under which circumstances they achieve their minimal possible values. Basic results are collected in the following proposition.

Proposition 10.3.2 Let the authentication code with the data S, T, K, A, p_S, p_K be fixed. We have:

1. P_I ≥ 1/|T|. Moreover, P_I = 1/|T| iff π(s, t) = 1/|T| for all s ∈ S, t ∈ T.
2. P_S ≥ 1/|T|. Moreover, P_S = 1/|T| iff π_{s,t}(s′, t′) = 1/|T| for all s, s′ ∈ S, s′ ≠ s; t, t′ ∈ T.
3. P_I = P_S = 1/|T| iff π(s, t)π_{s,t}(s′, t′) = 1/|T|^2 for all s, s′ ∈ S, s′ ≠ s; t, t′ ∈ T.

Proof.
1. For a fixed source state s ∈ S we have

∑_{t∈T} π(s, t) = ∑_{t∈T} ∑_{k∈K: a_k(s)=t} p_K(k) = ∑_{k∈K} p_K(k) = 1.

Thus for every s ∈ S there exists an authentication tag t = t(s) ∈ T such that π(s, t(s)) ≥ 1/|T|. Now the claim follows from the computation of P_I we made above. Note that equality is possible iff π(s, t) = 1/|T| for all s ∈ S, t ∈ T.

2. For distinct fixed source states s, s′ ∈ S and a tag t ∈ T such that (s, t) is valid we have

∑_{t′∈T} π_{s,t}(s′, t′) = ∑_{t′∈T} ∑_{k∈K: a_k(s′)=t′, a_k(s)=t} p_K(k) / ∑_{k∈K: a_k(s)=t} p_K(k)
= ∑_{k∈K: a_k(s)=t} p_K(k) / ∑_{k∈K: a_k(s)=t} p_K(k) = 1.

So for every s, s′, t with s′ ≠ s there exists a tag t′ = t′(s′) with π_{s,t}(s′, t′(s′)) ≥ 1/|T|. Now the claim follows from the computation of P_S we made above. Note that, due to the definition of p_{s,t}, equality is possible iff π_{s,t}(s′, t′) = 1/|T| for all s′ ∈ S, t′ ∈ T.

3. If P_I = P_S = 1/|T|, then π(s, t) = 1/|T| for all s ∈ S, t ∈ T, and π_{s,t}(s′, t′) = 1/|T| for all s, s′ ∈ S, s′ ≠ s; t, t′ ∈ T. Hence for all s, s′ ∈ S, s′ ≠ s; t, t′ ∈ T we have π(s, t)π_{s,t}(s′, t′) = 1/|T|^2.

Conversely, if π(s, t)π_{s,t}(s′, t′) = 1/|T|^2 for all s, s′ ∈ S, s′ ≠ s; t, t′ ∈ T, then due to the equality

π(s, t) = π(s, t) ∑_{t′∈T} π_{s,t}(s′, t′) = ∑_{t′∈T} π(s, t)π_{s,t}(s′, t′) = ∑_{t′∈T} 1/|T|^2 = 1/|T|,

we have P_I = 1/|T| by (1.). Now π_{s,t}(s′, t′) = 1/(|T|^2 π(s, t)) = 1/|T|, so P_S = 1/|T| by (2.).
As a straightforward consequence we have:

Corollary 10.3.3 With the notation as above, and assuming that p_K is the uniform distribution (keys are equiprobable), we have P_I = P_S = 1/|T| iff

|{k ∈ K : a_k(s′) = t′, a_k(s) = t}| = |K| / |T|^2

for all s, s′ ∈ S, s′ ≠ s; t, t′ ∈ T.

10.3.2 Authentication codes and other combinatorial objects

Authentication codes from orthogonal arrays

Now we take a look at certain combinatorial objects, called orthogonal arrays, that can be used for constructing authentication systems. A bit later we also consider a construction that uses error-correcting codes. For the definitions and basic properties of orthogonal arrays the reader is referred to Chapter 5, Section 5.5.1. What is important for us is that orthogonal arrays yield a construction of authentication codes in quite a natural way. The next proposition shows the relation between orthogonal arrays and authentication codes.

Proposition 10.3.4 If there exists an orthogonal array OA(n, l, λ) with symbols from a set N with n elements, then one can construct an authentication code with |S| = l, |K| = λn^2 and T = N, thus |T| = n, for which P_I = P_S = 1/n. Conversely, if there exists an authentication code with the above parameters, then there exists an orthogonal array OA(n, l, λ).

Proof. Consider OA(n, l, λ) as the array representation of an authentication code as in Section 5.5.1. Moreover, set p_K to be uniform, i.e. p_K(k) = 1/(λn^2) for every k ∈ K. The values of the parameters of such a code then follow easily. In order to obtain the values of P_I and P_S, use Corollary 10.3.3. Indeed, |{k ∈ K : a_k(s′) = t′, a_k(s) = t}| = λ by the definition of an orthogonal array, and λ = |K|/|T|^2. The claim now follows. The converse is proved analogously.

Let us now consider which criteria should be met by orthogonal arrays in order to produce good authentication codes. Parameter estimates for orthogonal arrays in terms of the authentication code parameters n, l, λ follow directly from the above proposition.

• If we require that the deception probabilities be at most some value ε, i.e. P_I ≤ ε and P_S ≤ ε, then the orthogonal array should have n ≥ 1/ε.
• As we can always remove some columns from an orthogonal array and still obtain one after removal, we demand that l ≥ |S|.
• λ should be minimized under the constraints imposed by the previous two items. This is due to the fact that we would like to keep the key space as small as possible, as has already been noted in the previous subsection.
Finally, we present without proof two characterization results, which say that if one wants to construct authentication codes with minimal deception probabilities, one cannot avoid using orthogonal arrays.

Theorem 10.3.5 Assume there exists an authentication code defined by S, T, K, A, p_K, p_S with |T| = n and P_I = P_S = 1/n. Then:

1. |K| ≥ n^2. Equality is achieved iff there exists an orthogonal array OA(n, l, 1) with l = |S| and p_K(k) = 1/n^2 for every k ∈ K.
2. |K| ≥ l(n − 1) + 1. Equality is achieved iff there exists an orthogonal array OA(n, l, λ) with l = |S|, λ = (l(n − 1) + 1)/n^2 and p_K(k) = 1/(l(n − 1) + 1) for every k ∈ K.

Authentication codes from error-correcting codes

As we have seen above, if one wants to keep the deception probabilities minimal, one has to deal with orthogonal arrays. A significant drawback of this approach is that the key space grows linearly in the size of the source state set: from Theorem 10.3.5 (2.) we have |K| ≥ l(n − 1) + 1 > l = |S| for n ≥ 2. This means that the amount of information that needs to be transmitted secretly is larger than the amount that is allowed to go through a public channel. The same problem occurs in the one-time pad scheme, Example ??. Of course, this is not quite practical. In this subsection we consider so-called almost universal and almost strongly universal hash functions. By means of these functions it is possible to construct authentication codes with deception probabilities slightly larger than minimal, but whose source state set grows exponentially in the key space size. This gives an opportunity to work with much shorter keys while sacrificing the security threshold a bit. Next we give the definition of an almost universal hash function.

Definition 10.3.6 Let X and Y be sets of cardinality n and m respectively. Consider a family H of functions f : X → Y and denote N := |H|. We call the family H ε-almost universal if for every two different x_1, x_2 ∈ X the number of functions f ∈ H such that f(x_1) = f(x_2) is at most εN. The notation for such a family is ε-AU(N, n, m).

There is a natural connection between almost universal hash functions and error-correcting codes, as is shown next.

Proposition 10.3.7 The existence of one of the two objects below implies the existence of the other:

1. A family H = ε-AU(N, n, m) of almost universal hash functions.
2. An m-ary error-correcting code C of length N, cardinality n and relative minimum distance d/N ≥ 1 − ε.

Proof. Let us first describe ε-AU(N, n, m) as an array, similarly to how we did for orthogonal arrays. The rows of the representation array are indexed by the functions from H and the columns by the set X. In the place indexed by f ∈ H and x ∈ X we write f(x) ∈ Y. Now the equivalence becomes clear.
Indeed, consider this array also as a code-book for an error-correcting code C, so that the codewords are written in the columns. It is clear that the length is the number of rows, N, and the cardinality is the number of columns, n. The entries of the array take their values from Y, thus C is an m-ary code. Now the definition of H implies that for any two codewords x_1 and x_2 (columns), the number of positions where they agree is ≤ εN. But d(x_1, x_2) is the number of positions where they disagree, so d(x_1, x_2) ≥ (1 − ε)N, and thus d/N ≥ 1 − ε. The reverse implication is proved analogously.

Next we define almost strongly universal hash functions, which are used for authentication.

Definition 10.3.8 Let X and Y be sets of cardinality n and m respectively. Consider a family H of functions f : X → Y and denote N := |H|. We call the family H ε-almost strongly universal if the following two conditions hold:

1. For every x ∈ X and y ∈ Y the number of functions f ∈ H such that f(x) = y is N/m.
2. For every two different x_1, x_2 ∈ X and every y_1, y_2 ∈ Y the number of functions f ∈ H such that f(x_i) = y_i, i = 1, 2, is at most ε·N/m.

The notation for such a family is ε-ASU(N, n, m).

Almost strongly universal hash functions are nothing but authentication codes with certain conditions on the deception probabilities. The following proposition is quite straightforward and is left to the reader as an exercise.

Proposition 10.3.9 If there exists a family H which is ε-ASU(N, n, m), then there exists an authentication code with K = H, S = X, T = Y and p_K the uniform distribution, such that P_I = 1/m and P_S ≤ ε.

Note that if ε = 1/m in Definition 10.3.8, then from Propositions 10.3.9, 10.3.2 (2.) and 10.3.4 we see that an ε-ASU(N, n, m) is actually an orthogonal array. The problem with orthogonal arrays has already been mentioned above. With almost strongly universal hash functions we have more freedom: we can make ε a bit larger while gaining in the other parameters, as we will see below. So it is interesting for us to be able to construct good ASU-families. There are two methods of doing so based on coding theory:

1. Construct AU-families from codes as per Proposition 10.3.7 and then use Stinson's composition method, Theorem 10.3.10 below.
2. Construct ASU-families directly from error-correcting codes.

Here we consider (1.); for (2.) see the Notes. The next result, due to Stinson, enables one to construct ASU-families from AU-families and previously constructed ASU-families; we omit the proof.

Theorem 10.3.10 Let X, Y, U be sets of cardinality n, m, u respectively. Let H_1 be an AU-family ε_1-AU(N_1, n, u) of functions f_1 : X → U and let H_2 be an ASU-family ε_2-ASU(N_2, u, m) of functions f_2 : U → Y. Consider the family H of all possible compositions thereof: H = {f | f = f_2 ◦ f_1, f_i ∈ H_i, i = 1, 2}. Then H is ε-ASU(N, n, m), where ε = ε_1 + ε_2 − ε_1ε_2 and N = N_1N_2.
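To make Proposition 10.3.7 concrete, here is a small Python sketch (ours, with hypothetical toy parameters) that builds the ε-AU family corresponding to a [q, k, q − k + 1] Reed-Solomon-type evaluation code and verifies ε = 1 − d/N = (k − 1)/q by brute force: the hash functions are the q evaluation points, the inputs are the q^k polynomials of degree < k.

from itertools import product

q, k = 5, 2                                # toy choice: F_5, polynomials of degree < 2
polys = list(product(range(q), repeat=k))  # X: the n = q^k source inputs
points = list(range(q))                    # H: the N = q hash functions (eval points)

def f(alpha, c):                           # f_alpha(c) = c(alpha), evaluation over F_q
    return sum(ci * pow(alpha, i, q) for i, ci in enumerate(c)) % q

# epsilon = max over distinct inputs of the fraction of functions on which they agree
eps = max(sum(f(a, c1) == f(a, c2) for a in points) / q
          for c1, c2 in product(polys, polys) if c1 != c2)
print(eps)    # 0.2 = (k-1)/q, i.e. 1 - d/N for the [q, k, q-k+1] code

Two distinct polynomials of degree < k agree on at most k − 1 evaluation points, which is exactly the statement that the columns of the array disagree in at least N − (k − 1) = d positions.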
Table 10.2: For Exercise 10.3.1

      s=1  s=2  s=3
k=1    2    1    2
k=2    3    2    1
k=3    1    1    3
k=4    2    3    2

One example of the idea (1.) that employs Reed-Solomon codes is given in Exercise 10.3.2. Note that from Exercise 10.3.2 and Proposition 10.3.9 it follows that there exists an authentication code with |S| = |K|^{(2/5)|K|^{1/5}} (set a = 2b) and P_I = 1/|T|, P_S ≤ 2/|T|. So by allowing the probability of substitution deception to rise to just twice the minimal value, we obtain an |S| that grows exponentially in |K|, which was not possible with orthogonal arrays, where always |K| > |S|.

10.3.3 Exercises

10.3.1 An authentication code is represented by the array in Table 10.2 (cf. Sections 10.3.1, 5.5.1). The distributions p_K and p_S are given as follows: p_S(1) = p_S(3) = 1/4, p_S(2) = 1/2; p_K(1) = p_K(2) = p_K(3) = 1/6, p_K(4) = 1/2. Compute P_I and P_S. Hint: for computing the sums use the following: e.g. for the sum ∑_{k∈K: a_k(s)=t} p_K(k), look at the column corresponding to s, mark the rows in which the entry t appears, and then sum up the probabilities that correspond to the marked rows (they are indexed by keys).

10.3.2 Consider a q-ary [q, k, q − k + 1] Reed-Solomon code.
• Construct the corresponding AU-family using Proposition 10.3.7. What are its parameters?
It is known that for natural numbers a, b with a ≥ b and q a prime power there exists an ASU-family 1/q^b-ASU(q^{a+b}, q^a, q^b). Using Stinson's composition, Theorem 10.3.10,
• prove that there exists an ASU-family 2/q^b-ASU(q^{2a+b}, q^{a·q^{a−b}}, q^b), with ε ≤ 1/q^a + 1/q^b ≤ 2/q^b.

10.4 Secret sharing

In the model of symmetric (Section 10.1) and asymmetric (Section 10.2) cryptography a one-to-one relation between Alice and Bob is assumed, maybe with
a trusted party in the middle. This means that Alice and Bob have the necessary pieces of secret information to carry out the communication between them. Sometimes it is necessary to distribute this secret information among several participants. Possible scenarios of such applications are: distributing the secret information among the participants in such a way that even if some participants lose their pieces of secret information, it is still possible to reconstruct the whole secret; also, sometimes shared responsibility is required, i.e. some action is to be triggered only when several participants combine their secret pieces of information to form the one that triggers that action. Examples of the latter could be triggering some military action (e.g. a missile launch) by several authorized persons (e.g. a president and higher military officials) or opening a bank vault by several top officials of a bank. In this section we consider mathematical means to achieve this goal. The schemes providing such functionality are called secret sharing schemes. We consider in detail the first such scheme, proposed by Adi Shamir in 1979. Then we also briefly demonstrate how error-correcting codes can be used for the construction of linear secret sharing schemes.

In secret sharing schemes, shares are produced from the secret to be shared. These shares are then assigned to the participants of the scheme. The idea is that if several authorized participants gather in a group that is large enough, they should be able to reconstruct the secret using the knowledge of their shares. On the contrary, if a group is too small, or some outsiders decide to find out the secret, their knowledge should not be enough to figure it out. This leads to the following definition.

Definition 10.4.1 Let S_i, i = 1, ..., n, be the shares that are produced from the secret S. Consider a collection of n participants where each participant is assigned his/her share S_i. A (t, n) threshold scheme is a scheme where every group of t (or more) participants out of n can obtain the secret S using their shares, while any group of fewer than t participants cannot obtain S.

We next present Shamir's secret sharing scheme, a classical example of a (t, n) threshold scheme for any n and t ≤ n.

Algorithm 10.4.2 (Shamir's secret sharing scheme)
Set-up: taking n as input, prepare the scheme for n participants.
1. Choose some prime power q > n and fix a working field F_q that will be used for all operations in the scheme.
2. Assign to the n participants P_1, ..., P_n some distinct non-zero elements x_1, ..., x_n ∈ F_q^*.
Input: The threshold value t, the secret information S in some form.
Output: The secret S is shared among the n participants.
Generation and distribution of shares:
1. Encode the secret to be shared as an element S ∈ F_q. If this is not possible, redo the Set-up phase with a larger q.
2. Choose randomly t − 1 elements a_1, ..., a_{t−1} ∈ F_q. Set a_0 := S and form the polynomial f(X) = ∑_{i=0}^{t−1} a_i X^i ∈ F_q[X].
3. For i = 1, ..., n: compute the value y_i = f(x_i) and assign the value y_i to P_i.
Computing the secret from the shares:
1. Any t participants P_{i_1}, ..., P_{i_t} pool their shares y_{i_1}, ..., y_{i_t} and then, using e.g. Lagrange interpolation with the t interpolation points (x_{i_1}, y_{i_1}), ..., (x_{i_t}, y_{i_t}), restore f and thus a_0 = S = f(0).

The part "Computing the secret from the shares" is justified by the following formulas of Lagrange interpolation (w.l.o.g. the first t participants pool their shares):

f(X) = ∑_{i=1}^{t} y_i ∏_{j≠i} (X − x_j)/(x_i − x_j),

so that f(x_i) = y_i, i = 1, ..., t, and f is the unique polynomial of degree ≤ t − 1 with this property. Of course the participants do not have to reconstruct the whole of f; they just need to know a_0, which can be computed as

S = a_0 = ∑_{i=1}^{t} c_i y_i,   c_i = ∏_{j≠i} x_j/(x_j − x_i).   (10.2)

So every t or more participants can recover the secret value S = f(0). On the other hand, it is possible to show that for any t − 1 shares (w.l.o.g. the first ones) (x_i, y_i), i = 1, ..., t − 1, and any a ∈ F_q there exists a polynomial f_a whose evaluation at 0 is a. Indeed, take f_a(X) = a + X·f̃_a(X), where f̃_a(X) is the Lagrange polynomial of degree ≤ t − 2 such that f̃_a(x_i) = (y_i − a)/x_i, i = 1, ..., t − 1 (recall that the x_i are non-zero). Then deg f_a ≤ t − 1, f_a(x_i) = y_i, and f_a(0) = a. This means that any t − 1 (or fewer) participants have no information about S: the best they can do is guess the value of S, and the probability of a correct guess is 1/q. This is because, to their knowledge, f could be any of the f_a.

Example 10.4.3 Let us construct a (3, 6) Shamir threshold scheme. Take q = 8 and fix the field F_8 = F_2[α]/⟨α^3 + α + 1⟩. The element α is a generating element of F_8^*. For i = 1, ..., 6 assign x_i = α^i to the participant P_i. Suppose that the secret S = α^5 is to be shared. Choose a_1 = α^3, a_2 = α^6, so that f(X) = α^5 + α^3 X + α^6 X^2. Now evaluate: y_1 = f(α) = α^3, y_2 = f(α^2) = α^3, y_3 = f(α^3) = α^6, y_4 = f(α^4) = α^5, y_5 = f(α^5) = 1, y_6 = f(α^6) = α^6. For every i = 1, ..., 6 assign y_i as the share of P_i. Now suppose that the participants P_2, P_3 and P_5 decide to pool their shares and obtain S. As in (10.2) they compute c_2 = (x_3/(x_3 − x_2))·(x_5/(x_5 − x_2)) = 1, c_3 = 1, c_5 = 1. Accordingly, c_2y_2 + c_3y_3 + c_5y_5 = α^5 = S. On the other hand, by the explanation above, any 2 participants cannot deduce S from their shares: any element of F_8 is equally likely, from their point of view, to be the secret.

See Exercise 10.4.1 for a simple construction of a (t, t) threshold scheme. A small implementation of Shamir's scheme is sketched below.
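The following is a minimal Python sketch of Algorithm 10.4.2 (ours, not the book's; we work over a prime field F_p with the hypothetical choice p = 257 rather than F_8, so that the field arithmetic stays in one line). Reconstruction uses formula (10.2).

import random

p = 257                                    # a prime > n; the working field is F_p

def make_shares(secret, t, n):
    # f(X) = secret + a_1 X + ... + a_{t-1} X^{t-1}; share of P_i is (x_i, f(x_i))
    coeffs = [secret] + [random.randrange(p) for _ in range(t - 1)]
    f = lambda x: sum(a * pow(x, i, p) for i, a in enumerate(coeffs)) % p
    return [(x, f(x)) for x in range(1, n + 1)]    # x_i = i, all non-zero

def reconstruct(shares):
    # formula (10.2): S = sum_i c_i y_i with c_i = prod_{j != i} x_j / (x_j - x_i)
    s = 0
    for xi, yi in shares:
        ci = 1
        for xj, _ in shares:
            if xj != xi:
                ci = ci * xj * pow((xj - xi) % p, -1, p) % p
        s = (s + ci * yi) % p
    return s

shares = make_shares(123, t=3, n=6)
print(reconstruct(shares[1:4]))            # any 3 of the 6 shares recover 123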
Next let us outline how one can use linear error-correcting codes to construct secret sharing schemes. Let us fix the finite field F_q: the secret values will be drawn from this field. Also consider an [n, k]_q linear code C with a generator matrix G that has g′_0, ..., g′_{n−1} as columns (we add dashes to indicate that these are columns and that they are not to be confused with the usual notation for the rows of G). Choose some information vector a ∈ F_q^k such that S = a·g′_0, where S is the secret information. Then compute s = (s_0, s_1, ..., s_{n−1}) = aG. Now s_0 = S, and s_1, ..., s_{n−1} can be used as shares. The next result characterizes the situations in which the secret S can be obtained from the shares.

Proposition 10.4.4 With the notation as above, let s_{i_1}, ..., s_{i_m} be some shares, 1 ≤ m ≤ n − 1. These shares can reconstruct the secret S iff

c⊥ = (1, 0, ..., 0, c_{i_1}, 0, ..., 0, c_{i_m}, 0, ..., 0) ∈ C⊥,

where at least one c_{i_j} ≠ 0.

Proof. The claim follows from the fact that G·(c⊥)^T = 0 and that the secret S = a·g′_0 can be obtained iff g′_0 is a linear combination of g′_{i_1}, ..., g′_{i_m}.

If we look carefully one more time at Shamir's scheme, it is no surprise that it can be seen as an instance of the above construction with a Reed-Solomon code as the code C. Indeed, choose N = q − 1 and set x_i = α^i, where α is a primitive element of F_q. It is then quite easy to see that encoding the secret and the shares via the polynomial f as in Algorithm 10.4.2 is equivalent to encoding via the Reed-Solomon code RS_t(N, 1), cf. Definition 8.1.1 and Proposition 8.1.4. The only nuance is that in general we may assign some n ≤ N shares and not all N. Now we need to see that every collection of t shares reconstructs the secret. Using the above notation, let s_{i_1}, ..., s_{i_t} be the shares pooled together. According to Proposition 10.4.4, the dual of C = RS_t(N, 1) should contain a codeword with a 1 at the first position and at least one non-zero element among the positions i_1, ..., i_t. From Proposition 8.1.2 we have that RS_t(N, 1)⊥ = RS_{N−t}(N, N), and RS_{N−t}(N, N) is an MDS [N, N − t, t + 1] code. We now use Corollary 3.2.14 with the t + 1 positions 1, i_1, ..., i_t and are guaranteed to have a prescribed codeword. Therefore every collection of t shares reconstructs the secret. Having x_i = α^i is not really a restriction (Exercise 10.4.3).

In general the problem of constructing secret sharing schemes can be reduced to finding codewords of minimal weight in a dual code, as per Proposition 10.4.4. There are more advanced constructions based on error-correcting codes, in particular based on AG-codes; see the Notes for references. It is clear that if a group of participants can recover the secret by combining their shares, then any group of participants containing this group can also recover the secret. We call a group of participants a minimal access set if the participants of this group can recover the secret with their shares, while no proper subgroup of these participants can do so. From the preceding discussion it is clear that there is a one-to-one correspondence between the set of minimal access sets and the set of minimal weight codewords of the dual code C⊥ whose first coordinate is 1. Therefore, for a secret sharing scheme based on a code C, the problem of determining the access structure of the scheme reduces to the problem of determining the set of minimal weight codewords whose first coordinate is 1. Obviously the shares of the participants depend on the choice of the generator matrix G of the code C. However, by Proposition ??, this choice does not affect the access structure of the secret sharing scheme. Note that the set of minimal weight codewords whose first coordinate is 1 is a subset of the set of all minimal weight codewords.
The problem of determining the set of all minimal weight codewords of a code is known as the covering
problem. This problem is hard for an arbitrary linear code. In the following, let us discuss in more detail the access structure of secret sharing schemes based on special classes of linear codes. It is clear that any participant must be in at least one minimal access set; this is true for any secret sharing scheme. Now we ask the following further question: given a participant P_i, how many minimal access sets are there which contain P_i? This question can be answered if the dual code of the code used by the secret sharing scheme is a constant weight code. In the following proposition we suppose that C is a q-ary [n, k] code and that G = (g′_0, g′_1, ..., g′_{n−1}) is a generator matrix of C.

Proposition 10.4.5 Suppose C is a constant weight code. Then, in the secret sharing scheme based on C⊥, there are q^{k−1} minimal access sets. Moreover, we have the following:

(1) If g′_i is a scalar multiple of g′_0, 1 ≤ i ≤ n − 1, then every minimal access set contains the participant P_i. Such a participant is called a dictatorial participant.

(2) If g′_i is not a scalar multiple of g′_0, 1 ≤ i ≤ n − 1, then there are (q − 1)q^{k−2} minimal access sets which contain the participant P_i.

Proof. .........will be given later.........

The following is an interesting research problem: identify (or construct) linear codes which are good for secret sharing, that is, for which the covering problem can be solved, or for which the minimal weight codewords can be well characterized. Several classes of linear codes which are good for secret sharing have been identified; see the papers by C. Ding and J. Yuan.

10.4.1 Exercises

10.4.1 Suppose that some trusted party T wants to share a secret S ∈ Z_m between two participants A and B. For this, T generates some random number a ∈ Z_m and assigns it to A. T then assigns b = S − a mod m to B.
• Show that the scheme above is a (2, 2) threshold scheme. This scheme is an example of a split-knowledge scheme.
• Generalize the idea above to construct a (t, t) threshold scheme for arbitrary t.

10.4.2 Construct a (4, 7) Shamir threshold scheme and share the bit-string "1011" using it. Hint: represent the bit-string "1011" as an element of a finite field with more than 7 elements.

10.4.3 Remove the restriction that x_i be equal to α^i in the Reed-Solomon construction of Shamir's scheme by using Proposition 3.2.10.
10.5 Basics of stream ciphers. Linear feedback shift registers

In Section 10.1 we saw how block ciphers are used for the construction of symmetric cryptosystems. Here we give some basics of stream ciphers, i.e. ciphers that process information bitwise as opposed to blockwise. Stream ciphers are usually faster than block ciphers and have lower implementation costs. Nevertheless, stream ciphers appear to be more susceptible to cryptanalysis, so much care should be put into designing a secure cipher. In this section we concentrate on stream cipher designs that involve the linear feedback shift register (LFSR) as one of the building blocks.

The difference between block and stream ciphers is quite vague, since a block cipher can be turned into a stream cipher using a special mode of operation. Nevertheless, let us see what the characterizing features of such ciphers are. A stream cipher is defined via its stream of states S, the keystream K, and the stream of outputs C. Having an input (plaintext) stream P, one obtains C using S and K by operating successively on individual units of these streams. The streams C and K are obtained using some key, either secret or not. If these units are binary bits, we are dealing with a binary cipher.

Consider an infinite sequence (a stream) of key bits k_1, k_2, k_3, ... and a stream of plaintext bits p_1, p_2, p_3, .... Then we can form a ciphertext stream by simply adding the key stream and the plaintext stream bitwise: c_i = p_i ⊕ k_i, i = 1, 2, 3, .... One can stop at some moment n, thus obtaining an n-bit ciphertext from the n-bit key and the n-bit plaintext. If the k_i are chosen uniformly at random and independently, we have the one-time pad scheme. It can be shown that in the one-time pad, an eavesdropper who possesses only the ciphertext cannot say anything about the plaintext; in other words, the knowledge of the ciphertext does not shed any additional light on the plaintext for an eavesdropper. Moreover, an eavesdropper who knows even n key bits is completely uncertain about the (n + 1)-th bit. This is a classical example of an unconditionally secure cryptosystem, cf. Definition 10.1.8.

Although the above idea yields provable security guarantees, it has an essential drawback: the key should be at least as long as the plaintext, which is usual for unconditionally secure systems, see also Section 10.3.1. Clearly this requirement is quite impractical. That is why one usually does the following. One starts with a bitstring of a fixed size called a seed, and then, by performing some operations on this string, obtains a larger string (theoretically it can be infinite) which should "appear random" to an eavesdropper. Note that since the seed is finite we cannot talk about unconditional security anymore, only about computational security. Indeed, having a long enough key stream in the known-plaintext scenario, it is in principle possible to run an exhaustive search over all possible seeds to find the one that gives rise to the given key stream; in particular, all successive bits of the key stream will then be known.

Now let us present two commonly used types of stream ciphers: synchronous and self-synchronizing. Let P = {p_0, p_1, ...} be the plaintext stream, K = {k_0, k_1, ...} the keystream, C = {c_0, c_1, ...} the ciphertext stream, and S = {s_0, s_1, ...} the state stream. The synchronous
stream cipher is defined as follows:

s_{i+1} = f(s_i, k),  k_i = g(s_i, k),  c_i = h(k_i, p_i),  i = 0, 1, ....

Here s_0 is the initial state and f is the state function, which generates the next state from the previous one and also depends on a key. The k_i form the key stream via the function g; see Exercise 10.5.1 for a toy example. Finally, the ciphertext is formed by applying the output function h to the bits k_i and p_i. This cipher is called synchronous, since both Alice and Bob need to use the same key stream (k_i)_i. If some (non-)malicious insertions or deletions occur, synchronization is lost, so additional means for providing synchronization are necessary. Note that usually the function h is just a bitwise addition of the streams (k_i)_i and (p_i)_i. It is also very common for stream ciphers to have an initialization phase, in which only the states s_i are updated at first, and the update-and-output behaviour starts only at some later point in time. Therewith the key stream (k_i) gets more complicated and depends on more state bits. The self-synchronizing stream cipher is defined as

s_i = (c_{i−t}, ..., c_{i−1}),  k_i = g(s_i, k),  c_i = h(k_i, p_i),  i = 0, 1, ....

Here (c_{−t}, ..., c_{−1}) is a non-secret initial state. So the encryption/decryption depends only on some fixed number of ciphertext bits; therefore the output stream is able to recover from deletions and insertions.

Observe that if h is bitwise addition modulo 2, then the stream ciphers described above follow the idea of the one-time pad. The difference is that now one obtains the key stream (k_i)_i not fully randomly, but as a pseudorandom expansion of an initial state (seed) s_0. The LFSR is used as a building block in many stream ciphers to facilitate such a pseudorandom expansion. LFSRs have the advantage that they can be efficiently implemented in hardware, and their outputs have nice statistical properties. Moreover, LFSRs are closely related to so-called linear recurring sequences, which are readily studied via algebraic methods. Schematically an LFSR can be presented as in Figure 10.3.

Let us figure out what is going on in the diagram. First the notation. A square box is a delay box, sometimes called a "flip-flop". Its task is to pass its stored value on after each unit of time, set by a synchronizing clock. A circle with the value a_i in it performs an AND operation, i.e. multiplication modulo 2, of its input with the prescribed a_i. The plus sign in a circle means the XOR operation, i.e. addition modulo 2. Now, the square boxes are initialized with some values, namely the box D_i gets some value s_i ∈ {0, 1}, i = 0, ..., L − 1. When the first time unit comes to an end, the following happens: the value s_0 becomes an output bit. Then all values s_i, i = 1, ..., L − 1, are shifted from D_i to D_{i−1}. Simultaneously, for each i = 0, ..., L − 1 the value s_i goes to an AND-circle, is multiplied with a_i, and all these products are summed up by means of the plus-circles, so that the sum ⊕_{i=0}^{L−1} a_i s_i is formed. This sum is written to D_{L−1} and is called s_L. The same procedure takes place at the end of the next time unit: now s_1 is the output, the remaining values are shifted,
  • 331. 10.5. BASICS OF STREAM CIPHERS. LINEAR FEEDBACK SHIFT REGISTERS331 E DL−1 Es DL−2 s q q q Es D1 Es D0 Es Output T T T T aL−1 aL−2 a2 a1 T T T T T a0 e e e e' ' q q q ' ' Figure 10.3: Diagram of an LFSR and sL+1 = ⊕L i=1ai−1si is written to DL−1. Analogously one proceeds further. The name “Linear Feedback Shift Register” is clear now: we use only linear operations here (multiplication by ai’s and addition), the values that appear in D0, . . . , DL−2 give feedback to DL−1 by means of a sum of the type described, also the values are being shifted from Di to Di−1. Algebraically LFSRs are studied via the notion of linear recurring sequences, which we introduce next. Definition 10.5.1 Let L be a positive integer, let a0, . . . , aL−1 be some values from F2. A sequence S, which first L elements are s0, . . . , sL−1 that are values from F2 and the defining rule is sL+i = aL−1sL+i−1 + aL−2sL+i−2 + · · · + a0si, i ≥ 0, (10.3) is called the (L-th order) homogeneous linear recurring sequence in F2. The elements s0, . . . , sL−1 are said to form the initial state sequence. Obviously, a homogeneous linear recurring sequence represents an output of some LFSR and vice versa, so we will use the both notions interchangeably. Another important notion that comes along with the linear recurring sequences is the following. Definition 10.5.2 Let S be an L-th order homogeneous linear recurring se- quence in F2 defined by (10.3). Then the polynomial f(X) = XL + aL−1XL−1 + · · · + a0 ∈ F2[X] is called the characteristic polynomial of S. Remark 10.5.3 The characteristic polynomial is also sometimes defined as g(X) = 1+aL−1X +· · ·+a0XL and is called connection or feedback polynomial. We have g(X) = f(1/X). Everything that will be said about f(X) in the sequel remains true also for g(X).
[Figure 10.4: Diagrams of the three LFSRs (a), (b) and (c) of Example 10.5.4.]

Example 10.5.4 Figure 10.4 (a), (b) and (c) shows the diagrams of the LFSRs with characteristic polynomials X^2 + X + 1, X^2 + 1 and X^2 + X respectively. We removed the circles, and sometimes also the connecting lines, since we are working in F_2, so a_i ∈ {0, 1}. The table for case (a) looks like this:

D_0  D_1  output
 1    0     -
 1    1     0
 0    1     1
 1    0     1
 1    1     0
...  ...   ...

So we see that the output sequence is periodic with period 3. The value 3 is the maximum period one can get for L = 2. This is due to the fact that X^2 + X + 1 is irreducible and moreover primitive; see Theorem 10.5.8 below. For case (b) we have

D_0  D_1  output
 1    0     -
 0    1     0
 1    0     1
...  ...   ...

and the period is 2. For case (c) we have

D_0  D_1  output
 1    0     -
 1    1     0
 1    1     1
...  ...   ...

So the output sequence here is not periodic, but ultimately periodic, i.e. periodic starting from position 2, and the period here is 1. The non-periodicity is due to the fact that for f(X) = X^2 + X we have f(0) = 0, see Theorem 10.5.8.

Example 10.5.5 Let us see how one can handle LFSRs in Magma. In Magma one works with the connection polynomial (Remark 10.5.3). Suppose, for example, that we are given the connection polynomial f = X^6 + X^4 + X^3 + X + 1 and the initial state sequence (s_0, s_1, s_2, s_3, s_4, s_5) = (0, 1, 1, 1, 0, 1); then the next state
(s_1, s_2, s_3, s_4, s_5, s_6) can be computed as

PX<X>:=PolynomialRing(GF(2));
f:=X^6+X^4+X^3+X+1;
S:=[GF(2)|0,1,1,1,0,1];
LFSRStep(f,S);
[ 1, 1, 1, 0, 1, 1 ]

By writing

LFSRSequence(f,S,10);
[ 0, 1, 1, 1, 0, 1, 1, 1, 1, 0 ]

we get the first 10 sequence values s_0, ..., s_9. In Sage one can do the same in the following way:

con_poly=[GF(2)(i) for i in [1,0,1,1,0,1]]
init_state=[GF(2)(i) for i in [0,1,1,1,0,1]]
n=10
lfsr_sequence(con_poly, init_state, n)
[0, 1, 1, 1, 0, 1, 1, 1, 1, 0]

Here one provides the connection polynomial via its coefficients.

As we have mentioned, the characteristic polynomial plays an essential role in determining the properties of a linear recurring sequence and the associated LFSR. Next we summarize the results concerning the characteristic polynomial, but first let us make precise the notions of periodic and ultimately periodic sequences.

Definition 10.5.6 Let S = {s_i}_{i≥0} be a sequence for which there exists a positive integer P such that s_{P+i} = s_i for all i = 0, 1, .... Such a sequence is called periodic, and P is a period of S. If the property s_{P+i} = s_i holds for all i ≥ P_0 for some non-negative integer P_0, then the sequence is called ultimately periodic, also with period P. Note that a periodic sequence is also ultimately periodic.

Remark 10.5.7 Periodic and ultimately periodic sequences have many periods. It turns out that the least period always divides any other period. We will use the term period to mean the least period of a sequence.

Now the main result follows.

Theorem 10.5.8 Let S be an L-th order homogeneous linear recurring sequence and let f(X) ∈ F_2[X] be its characteristic polynomial. The following holds:

1. S is an ultimately periodic sequence with period P ≤ 2^L − 1.
2. If f(0) ≠ 0, then S is periodic.
3. If f(X) is irreducible over F_2, then S is periodic with period P such that P | (2^L − 1).
4. If f(X) is primitive over F_2 (i.e. f is the minimal polynomial of a primitive element of F_{2^L}), then S is periodic with period P = 2^L − 1.

Definition 10.5.9 A homogeneous linear recurring sequence S whose characteristic polynomial f(X) is primitive is called a maximal period sequence in F_2, or an m-sequence.
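As a quick illustration of Theorem 10.5.8 (4.) and Definition 10.5.9, the following snippet (ours; it reuses the lfsr sketch given after Remark 10.5.3, and the period helper is our own) checks that the primitive polynomial X^4 + X^3 + 1 of Exercise 10.5.2 generates an m-sequence of period 2^4 − 1 = 15.

def period(seq):
    # least P with seq[i + P] == seq[i] on the observed window (assumes periodicity)
    return next(P for P in range(1, len(seq))
                if all(seq[i] == seq[i + P] for i in range(len(seq) - P)))

out = lfsr([1, 0, 0, 1], [0, 1, 1, 0], 60)   # f(X) = X^4 + X^3 + 1: a_0..a_3 = 1,0,0,1
print(period(out))                           # 15 = 2^4 - 1: an m-sequence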
The notions and results above can be generalized to the case of an arbitrary finite field F_q. It is notable that one can compute the characteristic polynomial of an L-th order homogeneous linear recurring sequence S from any subsequence of length at least 2L by means of the algorithm of Berlekamp and Massey, which is essentially the one from Section 9.2.2 *** more details have to be provided here to show the connection. The explanation in 9.2.2 is a bit too technical. Maybe we should introduce a simple version of BM in Chapter 6, which we could then use here? ***. See also Exercise 10.5.3.

Naturally one is interested in obtaining sequences with a large period; therefore m-sequences are of primary interest for applications. These sequences have nice statistical properties: for example, the distribution of the patterns of length ≤ L is almost uniform. The notion of linear complexity is used as a tool for investigating the statistical properties of LFSR outputs. Roughly speaking, the linear complexity of a sequence is the minimal L such that the sequence is an L-th order homogeneous linear recurring sequence. Because of these nice statistical properties, LFSRs can be used as pseudorandom bit generators, see Notes.

An obvious cryptographic drawback of LFSRs is the fact that the whole output sequence can be reconstructed from just 2L bits of it, where L is the linear complexity of the sequence (a sketch of this reconstruction is given at the end of this subsection; cf. also Exercise 10.5.3). This obstructs using LFSRs directly as cryptographic primitives, in particular as key stream generators. Nevertheless, one can use LFSRs in certain combinations, add non-linearity, and obtain quite efficient and secure key stream generators for stream ciphers. Let us briefly describe three possibilities for such combinations.

• Nonlinear combination generator. Here one feeds the outputs of l LFSRs L_1, ..., L_l into a non-linear function f with l inputs. The output of f then becomes the key stream. The function f should be chosen to be correlation immune, i.e. there should be no correlation between the output of f and the outputs of any small subset of L_1, ..., L_l.

• Nonlinear filter generator. Here, at the end of every time unit, the L delay boxes give their values to a non-linear function g with L inputs. The output of g then becomes the key stream. The function g is chosen in such a way that its algebraic representation is dense.

• Clock-controlled generator. Here the output of one LFSR controls the clocks of the other LFSRs that compose the cipher. In this way non-linearity is introduced.

For some examples of the above, see Notes.
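The 2L-bit reconstruction mentioned above amounts, in the simplest case, to solving a linear system over F_2 (this is the content of Exercise 10.5.3 below). A minimal Python sketch of this attack (ours; it reuses the lfsr function from the sketch after Remark 10.5.3 and assumes the matrix of the exercise is invertible):

def recover_taps(s, L):
    # solve s[L+i] = a_{L-1} s[L+i-1] + ... + a_0 s[i], i = 0..L-1, over F_2;
    # assumes the L x L matrix (s[i+j]) of Exercise 10.5.3 is invertible
    A = [[s[i + j] for j in range(L)] + [s[L + i]] for i in range(L)]  # augmented
    for col in range(L):                       # Gauss-Jordan elimination modulo 2
        piv = next(r for r in range(col, L) if A[r][col] == 1)
        A[col], A[piv] = A[piv], A[col]
        for r in range(L):
            if r != col and A[r][col] == 1:
                A[r] = [(x + y) % 2 for x, y in zip(A[r], A[col])]
    return [A[i][L] for i in range(L)]         # (a_0, ..., a_{L-1})

bits = lfsr([1, 0, 0, 1], [0, 1, 1, 0], 8)     # 2L = 8 output bits of the LFSR above
print(recover_taps(bits, 4))                   # [1, 0, 0, 1]: the taps are recovered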
10.5.1 Exercises

10.5.1 Consider an example of a synchronous cipher defined by the following data. The initial state is s_0 = 10010. The function f shifts its argument by 3 positions to the right and adds 01110 bitwise. The function g sums up the bits at positions 2, 4 and 5 modulo 2 to obtain a keystream bit. Compute the first 6 key stream bits of this cipher.

10.5.2 a. The polynomial f(X) = X^4 + X^3 + 1 is primitive over F_2. Draw a diagram of an LFSR that has f as its characteristic polynomial. Let (0, 1, 1, 0) = (s_0, s_1, s_2, s_3) be the initial state. Compute the output of this LFSR up to the point where it is seen that the output sequence is periodic. What is the period?
b. Rewrite (a.) in terms of a connection polynomial. Take the same initial state and compute (e.g. with Magma) enough output values to see the periodicity.

10.5.3 Let s_0, ..., s_{2L−1} be the first 2L bits of an L-th order homogeneous linear recurring sequence defined by (10.3). If it is known that the matrix

( s_0      s_1   ...  s_{L−1}  )
( s_1      s_2   ...  s_L      )
( ...      ...   ...  ...      )
( s_{L−1}  s_L   ...  s_{2L−1} )

is invertible, show that it is possible to compute a_0, ..., a_{L−1}, i.e. to find out the structure of the underlying LFSR.

10.5.4 [CAS] The shrinking generator is an example of a clock-controlled generator. It is composed of two LFSRs, L_1 and L_2. The output of L_1 controls the output of L_2 in the following way: if the output bit of L_1 is one, then the output bit of L_2 is taken as an output bit of the whole generator; if the output bit of L_1 is zero, then the output bit of L_2 is discarded. So, in other words, the output of the generator forms a subsequence of the output of L_2, and this subsequence is masked by the 1's in the output of L_1. Write a procedure that implements the shrinking generator. Then use the output of the shrinking generator as a key stream k and define a stream cipher with it, i.e. a ciphertext is formed as c_i = p_i ⊕ k_i, where p is the plaintext stream. Compare your simulation results with the ones obtained with the ShrinkingGeneratorCipher class from Sage. A possible starting point is sketched below.
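A minimal Python sketch of such a procedure (ours; it reuses the lfsr function from Section 10.5, all register parameters are hypothetical toy choices, and the result should be cross-checked against Sage's ShrinkingGeneratorCipher):

def shrinking_generator(taps1, seed1, taps2, seed2, nbits, budget=10000):
    # keep the output bit of L_2 only where the control register L_1 outputs 1
    control = lfsr(taps1, seed1, budget)
    data = lfsr(taps2, seed2, budget)
    keystream = [b for a, b in zip(control, data) if a == 1]
    return keystream[:nbits]

plain = [1, 0, 1, 1, 0, 0, 1, 0]
key = shrinking_generator([1, 1], [1, 0], [1, 0, 0, 1], [0, 1, 1, 0], len(plain))
cipher = [p ^ k for p, k in zip(plain, key)]           # c_i = p_i XOR k_i
print(cipher, [c ^ k for c, k in zip(cipher, key)])    # the second list equals plain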
10.6 PKC systems using error-correcting codes

In this section we consider the public key encryption schemes due to McEliece (Section 10.6.1) and Niederreiter (Section 10.6.2). Both of these encryption schemes rely on the hardness of decoding random linear codes, as well as on the hardness of distinguishing a code with the prescribed structure from a random one. As we have seen, the problem of nearest codeword decoding is NP-hard. So the McEliece cryptosystem is one of the proposals to use an NP-hard problem as a basis; for some others see Section 10.2.3. As has been mentioned at the end of Section 10.2.1, quantum computer attacks pose a potential threat to classical cryptosystems like RSA (Section 10.2.1) and those based on the DLP (Section 10.2.2). On the other hand, no significant advantages of using a quantum computer in attacking the code-based schemes of McEliece and Niederreiter are known. Therefore this area of cryptography has attracted quite a lot of attention in recent years. See the Notes on recent developments.

10.6.1 McEliece encryption scheme

Let us now consider the public key cryptosystem of McEliece. It was proposed in 1978 and is in fact one of the oldest public key cryptosystems. The idea of the cryptosystem is to take a class of codes C for which there is an efficient bounded distance decoding algorithm. The secret code C ∈ C is given by a k×n generator matrix G. This G is scrambled into G′ = SGP by means of a k × k invertible matrix S and an n × n permutation matrix P. Denote by C′ the code with the generator matrix G′. Then C′ is equivalent to C, cf. Definition 2.5.15. The idea of the scrambling is that the code C′ should appear random to an attacker, so that it should not be possible to use the efficient decoding algorithm available for C to decrypt messages. More formally, the encryption scheme is defined by the procedures given in Algorithms 10.1, 10.2 and 10.3. Note that in these algorithms "choose" means "choose randomly from an appropriate set".

Algorithm 10.1 McEliece key generation
Input: System parameters:
- Length n
- Dimension k
- Alphabet size q
- Error-correcting capacity t
- A class C of [n, k] q-ary linear codes that have an efficient decoder that can correct up to t errors
Output: McEliece public/private key pair (PK, SK).
Begin
Choose C ∈ C represented by a generator matrix G and equipped with an efficient decoder D_C.
Choose an invertible q-ary k × k matrix S.
Choose an n × n permutation matrix P.
Compute G′ := SGP {a generator matrix of an equivalent [n, k] code}
PK := G′.
SK := (D_C, S, P).
return (PK, SK).
End

Let us see why the decryption procedure really yields the correct message from a ciphertext. We have c_1 = cP^{−1} = mSG + eP^{−1}. Since wt(eP^{−1}) = wt(e) = t, we have c_2 = D_C(c_1) = mS. The last step is then trivial. Initially McEliece proposed to use the class of binary Goppa codes (cf. Section 8.3.2) as the class C. Interestingly enough, this class has turned out to be pretty much the only secure choice up to now; see Section 10.6.3 for a discussion. As we saw in the procedures above, decryption is just decoding with respect to the code generated by G′. So if we are successful in "masking", for instance, a binary Goppa code C as a random code C′, then the adversary is faced with the problem of correcting t errors in a random code, which is assumed to be hard if t is large enough. More on this in Section 10.6.3. Let us consider a specific example.
Algorithm 10.2 McEliece encryption
Input:
- Plaintext m
- Public key PK = G′
Output: Ciphertext c.
Begin
Represent m as a vector from F_q^k.
Choose randomly a vector e ∈ F_q^n of weight t.
Compute c := mG′ + e. {encode and add noise; c is of length n}
return c
End

Algorithm 10.3 McEliece decryption
Input:
- Ciphertext c
- Private key SK = (D_C, S, P)
Output: Plaintext m.
Begin
Compute c_1 := cP^{−1}.
Compute c_2 := D_C(c_1).
Compute c_3 := c_2S^{−1}.
return c_3
End

Example 10.6.1 [CAS] We use Magma to construct a McEliece encryption scheme based on a binary Goppa code, encrypt a message with it and then decrypt. First we construct a Goppa code of length 31 and dimension 16 that efficiently corrects 3 errors (see also Example 12.5.23):

q:=2^5;
Px<x>:=PolynomialRing(GF(q));
g:=x^3+x+1;
a:=PrimitiveElement(GF(q));
L:=[a^i : i in [0..q-2]];
C:=GoppaCode(L,g); // a [31,16,7] binary Goppa code
C2:=GoppaCode(L,g^2);
n:=#L; k:=Dimension(C);

Note that we also define the code C2 generated by the square of the Goppa polynomial g. Although the two codes are equal, we need the code C2 later for decoding. *** add references *** Now the key generation part:

G:=GeneratorMatrix(C);
S:=Random(GeneralLinearGroup(k,GF(2)));
Determinant(S); // indeed an invertible map
1
p:=Random(Sym(n)); // a random permutation of an n-set
P:=PermutationMatrix(GF(2), p); // its matrix
GPublic:=S*G*P; // our public generator matrix

After we have obtained the public key, we can encrypt a message:
MessageSpace:=VectorSpace(GF(2),k);
m:=Random(MessageSpace); m;
(1 1 0 0 1 1 1 0 0 0 0 1 1 1 0 0)
m2:=m*GPublic;
e:=C ! 0; e[10]:=1; e[20]:=1; e[25]:=1; // add 3 errors
c:=m2+e;

Let us decrypt using the private key:

c1:=c*P^-1;
bool,c2:=Decode(C2,c1: Al:=Euclidean);
IS:=InformationSet(C);
ms:=MessageSpace ! [c2[i]: i in IS];
m_dec:=ms*S^-1; m_dec;
(1 1 0 0 1 1 1 0 0 0 0 1 1 1 0 0)

We see that m_dec equals m. Note that we applied the Euclidean algorithm for decoding a Goppa code (*** reference ***), but we had to apply it to the code generated by g^2 in order to correct all three errors. Since as a result of the decoding we obtain a codeword, not the message it encodes, we had to find an information set and then extract the subvector at the positions that correspond to this set (our generator matrices are in standard form, so we simply take the subvector).

10.6.2 Niederreiter's encryption scheme

The scheme proposed by Niederreiter in 1986 is dual to that of McEliece: instead of using generator matrices and words, this scheme uses parity check matrices and syndromes. Although different in terms of parameter sizes and the efficiency of en-/decryption, the two schemes of McEliece and Niederreiter can actually be shown to have equivalent security; see the end of this section. We now present how keys are generated and how en-/decryption is performed in the Niederreiter scheme in Algorithms 10.4, 10.5 and 10.6. Note that in these algorithms we use a syndrome decoder. Recall that the notion of a syndrome decoder is equivalent to the notion of a minimum distance decoder *** add this to the decoding section ***. The correctness of the en-/decryption procedures is shown analogously to the McEliece scheme, see Exercise 10.6.1. The only difference is that here we use a syndrome decoder, which returns a vector of smallest non-zero weight having the input syndrome, whereas in the McEliece case the output of the decoder is the codeword closest to the given word. Let us take a look at a specific example.

Example 10.6.2 [CAS] We work in Magma as in Example 10.6.1, considering the same binary Goppa code. So the first 8 lines that define the code are the same; we just add

t:=Degree(g);

The key generation part is quite similar as well:

H:=ParityCheckMatrix(C);
S:=Random(GeneralLinearGroup(n-k,GF(2)));
p:=Random(Sym(n)); P:=PermutationMatrix(GF(2), p);
HPublic:=S*H*P; // our public parity check matrix
Algorithm 10.4 Niederreiter key generation
Input: System parameters:
- Length n
- Dimension k
- Alphabet size q
- Error-correcting capacity t
- A class C of [n, k] q-ary linear codes that have an efficient syndrome decoder that corrects up to t errors
Output: Niederreiter public/private key pair (PK, SK).
Begin
Choose C ∈ C represented by a parity check matrix H and equipped with an efficient decoder D_C.
Choose an invertible q-ary (n − k) × (n − k) matrix S.
Choose an n × n permutation matrix P.
Compute H′ := SHP {a parity check matrix of an equivalent [n, k] code}
PK := H′.
SK := (D_C, S, P).
return (PK, SK).
End

Algorithm 10.5 Niederreiter encryption
Input:
- Plaintext m
- Public key PK = H′
Output: Ciphertext c.
Begin
Represent m as a vector from F_q^n of weight t.
Compute c := H′m^T. {the ciphertext is a syndrome}
return c
End

Algorithm 10.6 Niederreiter decryption
Input:
- Ciphertext c
- Private key SK = (D_C, S, P)
Output: Plaintext m.
Begin
Compute c_1 := S^{−1}c.
Compute c_2 := D_C(c_1). {The decoder returns an error vector of weight t.}
Compute c_3 := P^{−1}c_2.
return c_3
End
The encryption is a bit trickier than in Example 10.6.1, since our messages now are vectors of length n and weight t:

MessageSpace:=Subsets(Set([1..n]), t);
mm:=Random(MessageSpace);
mm:=[i: i in mm];
m:=C ! [0: i in [1..n]];
// insert errors at the given positions
for i in mm do m[i]:=1; end for;
c:=m*Transpose(HPublic); // the ciphertext

The decryption part is also a bit trickier, because the decoding function of Magma expects a word, not a syndrome. So we have to find a solution of the parity check linear system and then pass this solution to the decoding function:

c1:=c*Transpose(S^-1);
c22:=Solution(Transpose(H),c1); // find any solution
bool,c2:=Decode(C2,c22:Al:=Euclidean);
m_dec:=(c22-c2)*Transpose(P^-1);

One may check that m=m_dec holds.

Now we will show that the Niederreiter and McEliece encryption schemes indeed have equivalent security. In order to do so, we assume that the two schemes are generated from the same secret code C with a generator matrix G and a parity check matrix H. Assume further that the private key of the McEliece scheme is (S, G, P) and that of the Niederreiter scheme is (M, H, P), so that the public keys are G′ = SGP and H′ = MHP respectively. Let z = yH′^T be the ciphertext obtained by encrypting y with the Niederreiter scheme, and let c = mG′ + e be the ciphertext obtained from m with the McEliece scheme. Equivalence now means that if one is able to recover y from z, then one is also able to recover m from c, and vice versa. Therewith we show that the two systems based on the same code and the same secret permutation provide equivalent security.

Now assume that we can recover any y of weight ≤ t from z = yH′^T. We want to recover m from c = mG′ + e with wt(e) ≤ t. For y = e we have

yH′^T = eH′^T = mG′H′^T + eH′^T = cH′^T =: z,

with c = mG′ + e, since G′H′^T = SGPP^T H^T M^T = SGH^T M^T = 0, due to PP^T = Id_n and GH^T = 0. So if we can recover such a y from the above constructed z, we are able to recover e and thus m from its ciphertext c = mG′ + e. Analogously, assume that for any m and e of weight ≤ t we can recover them from c = mG′ + e. We now want to recover y of weight ≤ t from z = yH′^T. For e = y we have z = yH′^T = cH′^T, with c = mG′ + y any solution of the equation z = cH′^T. Now we can recover m from c and thus y.

10.6.3 Attacks

There are two types of attacks one may think of for code-based cryptosystems:
1. Generic decoding attacks. One tries to recover m from c using the code C'.
2. Structural attacks. One tries to recover S, G, P from the code C' given by G' in the McEliece scheme, or S, H, P from H' in the Niederreiter scheme.

Consider the McEliece encryption scheme. In an attack of type 1, the attacker tries to directly decode the ciphertext c using the code C' generated by the public generator matrix G'. Assuming C' is a random code, one may obtain complexity estimates for this type of attack. The best results in this direction are obtained using the family of algorithms that improve on information set decoding (ISD), see Section 6.2.3. Recall that the idea of (probabilistic) ISD is to find an error-free information set I and then decode as c = r_(I) G̃ for the received word r. Here the matrix G̃ is a generator matrix of an equivalent code that is systematic at the positions of I; we write G̃ in order to avoid confusion with the public generator matrix.

The first improvement of ISD, due to Lee and Brickell, is to allow some small number p of errors to occur in the set I. So we no longer require r_(I) = c_(I), but allow r_(I) = c_(I) + e_(I) with wt(e_(I)) ≤ p. We can now modify the algorithm in Algorithm 6.2 as in Algorithm 10.7. Note that since we know the number of errors that occurred, the if-condition has changed as well.

Algorithm 10.7 Lee-Brickell ISD
Input:
- Generator matrix G of an [n, k] code C
- Received word r
- Number of errors t that occurred, so that d(r, C) = t
- Number of trials Ntrials
- Parameter p
Output: A codeword c ∈ C such that d(r, c) = t, or "No solution found"
Begin
c := 0; Ntr := 0;
repeat
 Ntr := Ntr + 1;
 Choose a subset I of {1, . . . , n} of cardinality k.
 if G_(I) is invertible then
  G̃ := G_(I)^{-1} G
  for e_(I) of weight ≤ p do
   c̃ := (r_(I) + e_(I)) G̃
   if d(c̃, r) = t then
    return c̃
   end if
  end for
 end if
until Ntr = Ntrials
return "No solution found"
End
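One trial of the information-set selection above is easy to express in Magma. A hedged toy fragment (a random [50, 10] code; the names are illustrative, and in a real attack this sits inside the repeat-loop of Algorithm 10.7):

n := 50; k := 10;
G := GeneratorMatrix(RandomLinearCode(GF(2), n, k));
I := Random(Subsets({1..n}, k));              // candidate information set
GI := Submatrix(G, [1..k], Sort(Setseq(I)));  // G_(I)
Rank(GI) eq k;                                // true iff G_(I) is invertible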
Remark 10.6.3 In Algorithm 10.7 one may replace choosing a set I by choosing each time a random permutation matrix Π. Then one may compute rref(GΠ), therewith obtaining an information set. One must keep track of the applied permutations Π in order to "go back" after finding a solution in this way.

The probability of success in one trial of the Lee-Brickell variant is

p_LB = \binom{k}{p} \binom{n-k}{t-p} / \binom{n}{t},

compared to the probability

p_ISD = \binom{n-k}{t} / \binom{n}{t}

of the original probabilistic ISD. Since the for-loop of Algorithm 10.7 runs over all patterns e_(I) of weight at most p, the parameter p should be a small constant. In fact for small p, like p = 2, one obtains a complexity improvement which, although not asymptotic, is quite relevant in practice. There is a rich list of further improvements due to many researchers in the field, see Notes. The improvements basically consider different configurations of where a small number p of errors is allowed to be present, where only a block of l zeroes should be present, etc. Further, the choice of the next set I can be optimized, for example by changing just one element of the current I in a clever way. With all these techniques in mind, one obtains quite a considerable improvement of the ISD in practical attacks on the McEliece cryptosystem. In fact the original proposal of McEliece to use [1024, 524] binary Goppa codes correcting 50 errors is not a secure choice any more; one has to increase the parameters of the Goppa codes used.

Example 10.6.4 [CAS] Magma contains implementations of the "vanilla" probabilistic ISD, which was also considered in the original paper of McEliece, as well as Lee-Brickell's variant and several other improved algorithms. Let us try to attack the toy example considered in Example 10.6.1. So we copy all the instructions responsible for the code construction, key generation, and encryption.

... // as in Example 10.6.1

Then we use the commands

CPublic:=LinearCode(GPublic);
McEliecesAttack(CPublic,c,3);
LeeBrickellsAttack(CPublic,c,3,2);

to mount our toy attack. For this specific example it takes no time to execute both attacks. In both commands we first pass the code, then the received word, and then the number of errors to be corrected. In LeeBrickellsAttack the last parameter is exactly the parameter p from Algorithm 10.7; we set it to 2. The same commands can be used to correct errors in random codes. Below is an example:

C:=RandomLinearCode(GF(2),50,10);
c:=Random(C);
r:=c;
r[2]:=r[2]+1; r[17]:=r[17]+1; r[26]:=r[26]+1;
McEliecesAttack(C,r,3);
LeeBrickellsAttack(C,r,3,2);
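The success probabilities derived above translate directly into expected numbers of trials. A hedged back-of-the-envelope estimate in Magma for the original McEliece parameters (the cost per trial is deliberately ignored here):

n := 1024; k := 524; t := 50; p := 2;
pISD := Binomial(n-k, t)/Binomial(n, t);
pLB  := Binomial(k, p)*Binomial(n-k, t-p)/Binomial(n, t);
R := RealField(10);
Log(R!2, R!(1/pISD));  // log_2 of the expected number of plain ISD trials
Log(R!2, R!(1/pLB));   // the same for the Lee-Brickell variant with p = 2

The gap between the two logarithms quantifies the practical, though not asymptotic, advantage of allowing p errors inside the information set.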
Apart from decoding being hard for the public code C', it should be impossible to deduce the structure of the secret code C from the public code C'. Structural attacks of type 2 aim at exploiting this structure. As we have mentioned, binary Goppa codes have turned out to be pretty much the only secure choice up to now. There were quite a few attempts to propose other classes of codes for which efficient decoding algorithms are known. Alas, all of these proposals were broken; we name just a few: generalized Reed-Solomon (GRS) codes, Reed-Muller codes, BCH codes, algebraic-geometry codes of small genus, LDPC codes, quasi-cyclic codes; see Notes. In the next section we consider in detail how a prominent attack on GRS codes works. In particular, the weakness of the GRS codes implies, due to the equivalence of security, the weakness of the original proposal of Niederreiter, who suggested using these codes in his scheme.

10.6.4 The attack of Sidelnikov and Shestakov

Let C be the code GRS_k(a, b), where a consists of n mutually distinct entries of F_q and b consists of nonzero entries, cf. Definition 8.1.10. If this code is used in the McEliece PKC, then for an attacker the code C' with generator matrix G' = SGP is known, where S is an invertible k × k matrix and P = ΠD with Π an n × n permutation matrix and D an invertible diagonal matrix. The code C' is equal to GRS_k(a', b'), where a' = aΠ and b' = bP. In order to decode GRS_k(a', b') up to ⌊(n − k)/2⌋ errors it is enough to find a' and b'. The matrix S is not essential in masking G, since G' has a unique row-equivalent matrix (I_k | A') that is in reduced row echelon form. Here A' is a generalized Cauchy matrix (Definition 3.2.17), but it is a priori not evident how to recover a' and b' from it.

The code is MDS, hence all square submatrices of A' are invertible by Remark 3.2.16. In particular all entries of A' are nonzero. After multiplying the coordinates with nonzero constants we get a code which is generalized equivalent with the original one and is again of the form GRS_k(a', b' ∗ r), since r ∗ GRS_k(a', b') = GRS_k(a', b' ∗ r). So without loss of generality it may be assumed that the code has a generator matrix of the form (I_k | A') such that the last row and the first column of A' consist of ones. Without loss of generality it may also be assumed that a_{k−1} = ∞, a_k = 0 and a_{k+1} = 1 by Proposition 8.1.25. Then according to Proposition 8.1.17 and Corollary 8.1.19 there exists a vector c with entries c_i given by

c_i = b_i \prod_{t=1, t \neq i}^{k} (a_i − a_t)  if 1 ≤ i ≤ k,
c_i = b_i \prod_{t=1}^{k} (a_i − a_t)  if k + 1 ≤ i ≤ n,

such that A' has entries a_{ij} given by

a_{ij} = \frac{c_{j+k-1} c_i^{-1}}{a_{j+k-1} − a_i}  for 1 ≤ i ≤ k − 1, 1 ≤ j ≤ n − k + 1,

and

a_{kj} = c_{j+k-1} c_k^{-1}  for 1 ≤ j ≤ n − k + 1.
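The first step of the attack, removing S by passing to the reduced row echelon form, is a one-liner in a CAS. A hedged Magma fragment, using the public matrix of Example 10.6.5 below:

GPublic := Matrix(GF(7), 3, 7,
  [6,1,1,6,2,2,3,  3,4,1,1,5,4,3,  1,0,3,3,6,0,1]);
EchelonForm(GPublic);  // yields (I_3 | A') as displayed in Example 10.6.5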
Example 10.6.5 Let G' be the generator matrix of a code C' with entries in F_7 given by

G' = \begin{pmatrix} 6 & 1 & 1 & 6 & 2 & 2 & 3 \\ 3 & 4 & 1 & 1 & 5 & 4 & 3 \\ 1 & 0 & 3 & 3 & 6 & 0 & 1 \end{pmatrix}.

Then rref(G') = (I_3 | A') with

A' = \begin{pmatrix} 1 & 3 & 3 & 6 \\ 4 & 4 & 6 & 6 \\ 3 & 1 & 6 & 3 \end{pmatrix}.

G' is a public key and it is known that it is the generator matrix of a generalized Reed-Solomon code. So we want to find a in F_7^7 consisting of mutually distinct entries and b in F_7^7 with nonzero entries such that C' = GRS_3(a, b). Now C'' = (1, 4, 3, 1, 5, 5, 6) ∗ C' has a generator matrix of the form (I_3 | A) with

A = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 5 & 4 & 2 \\ 1 & 4 & 3 & 6 \end{pmatrix}.

We may assume without loss of generality that a_1 = 0 and a_2 = 1 by Proposition 8.1.25. ***...............***

10.6.5 Exercises

10.6.1 Show the correctness of the Niederreiter scheme.

10.6.2 Using the methods of Section 10.6.3, attack larger McEliece schemes. In the Goppa construction take
- m = 8, r = 16
- m = 9, r = 5
- m = 9, r = 7
Make observations that would answer the following questions:
- Which attack is faster: the plain ISD or Lee-Brickell's variant?
- What is the role of the parameter p? What is the optimal value of p in these experiments?
- Does the execution time differ from one run to the next, or does it stay the same?
- Is there any change in execution time when the attacks are run on random codes with the same parameters as above?
Try to experiment with other attack methods implemented in Magma: LeonsAttack, SternsAttack, CanteautChabaudsAttack.
Hint: For constructing Goppa polynomials use the command PrimitivePolynomial.
10.6.3 Consider binary Goppa codes of length 1024 and Goppa polynomial of degree 50.
(1) Give an upper bound on the number of these codes.
(2) What is the fraction of the number of these codes with respect to all binary [1024, 524] codes?
(3) What is the minimum distance of a random binary [1024, 524] code according to the Gilbert-Varshamov bound?

10.6.4 Give an estimate of the complexity of decoding 50 errors in a received word with respect to a binary [1024, 524, 101] code by means of covering set decoding.

10.6.5 Let α ∈ F_8 be a primitive element such that α^3 = α + 1. Let G' be the generator matrix given by

G' = \begin{pmatrix} \alpha^6 & \alpha^6 & \alpha & 1 & \alpha^4 & 1 & \alpha^4 \\ 0 & \alpha^3 & \alpha^3 & \alpha^4 & \alpha^6 & \alpha^6 & \alpha^4 \\ \alpha^4 & \alpha^5 & \alpha^3 & 1 & \alpha^2 & 0 & \alpha^6 \end{pmatrix}.

(1) Find a in F_8^7 consisting of mutually distinct entries and b in F_8^7 with nonzero entries such that G' is a generator matrix of GRS_3(a, b).
(2) Consider the 3 × 7 generator matrix G of the code RS_3(7, 1) with entry α^{(i−1)(j−1)} in the i-th row and the j-th column. Give an invertible 3 × 3 matrix S and a permutation matrix P such that G' = SGP.
(3) What is the number of pairs (S, P) of such matrices?

10.7 Notes

Some excellent references for an introduction to cryptography are [87, 117, 35].

10.7.1 Section 10.1

Computational security is concerned with practical attacks on cryptosystems, whereas unconditional security works with probabilistic models in which an attacker is assumed to possess unlimited computing power. A usual claim when working with unconditional security is an upper bound on the attacker's success probability. This probability is independent of the computing power of the attacker and is absolute in that sense. For instance, in the case of Shamir's secret sharing scheme (Section 10.4), no matter how much computing power a group of t − 1 participants has, it can do no better than guess the value of the secret; the probability of such a success is 1/q. More on these issues can be found in [117].

A couple of remarks on block ciphers that were used in the past. The Jefferson cylinder, invented by Thomas Jefferson in 1795 and independently by Étienne Bazeries, is a polyalphabetic cipher that was used by the U.S. Army in 1923-1942 under the name M-94. It was constructed as a rotor with 20–36 lettered discs, each of which provided a substitution at the corresponding position. For its time it had quite good cryptographic properties. Probably the best known historical cipher is the German Enigma. It had already been used for commercial purposes in the 1920s, but became famous for its use by the Nazi German military during World War II. Enigma is also a rotor-based polyalphabetic cipher. More on historical ciphers can be found in [87].
The Kasiski method aims at recovering the period of a polyalphabetic substitution cipher; it exploits the fact that repeated portions of plaintext may be encrypted with the same part of the keyword. More details are in [87].

The National Bureau of Standards (NBS, which later became the National Institute of Standards and Technology, NIST) initiated the development of DES (Data Encryption Standard) in the early 1970s. IBM's cryptography department, in particular its leaders Dr. Horst Feistel (recall the Feistel cipher) and Dr. W. Tuchman, contributed the most to the development. The evaluation process was also facilitated by the NSA (National Security Agency). The standard was finally approved and published in 1977 [90]. A lot of controversy has accompanied DES since its appearance. Some experts claimed that the developers could have intentionally added design trapdoors to the cipher, so that they, but no one else, would be able to cryptanalyze it. The 56-bit key size also raised concern, which eventually led to the need to adopt a new standard. Historical remarks on DES and its development can be found in [112]. Differential and linear cryptanalysis turned out to be the most successful theoretical attacks on DES. For the initial papers on these attacks, see [15, 82]. The reader may also visit http://www.ciphersbyritter.com/RES/DIFFANA.HTM for more references and the history of differential cryptanalysis. We also mention that differential cryptanalysis may have been known to the developers long before Biham and Shamir published their paper in 1990.

Since DES encryptions do not form a group, a triple application of DES, called triple DES, was proposed [66]. Although no effective cryptanalysis against triple DES has been proposed, it is barely used because its implementation is slow compared to AES.

In the mid-1990s it became apparent to the cryptography community that DES no longer provided a sufficient security level. So NIST announced a competition for a cipher that would replace DES and become the AES (Advanced Encryption Standard). The main criteria imposed on the future AES were resistance to linear and differential cryptanalysis, a faster and more effective implementation (compared to DES), and the ability to work with 128-bit plaintext blocks and 128-, 192-, and 256-bit keys; the number of rounds was not specified. After five years of selection, the cipher Rijndael, proposed by the Belgian researchers Joan Daemen and Vincent Rijmen, won. The cipher was officially adopted as the AES in 2001 [91]. Because resistance to linear and differential cryptanalysis was one of the milestones in the design of AES, J. Daemen and V. Rijmen carefully studied this question and showed how such resistance can be achieved within Rijndael. In the design they used what they called the wide trail strategy, a method devised specifically to counter linear and differential cryptanalysis. A description of the AES together with a discussion of the underlying design decisions and theory can be found in [45]; for the wide trail strategy see [46]. As for attacks on AES, up to now there is no attack that could break AES even theoretically, i.e. faster than exhaustive search, in a scenario where the unknown key stays the same. Several attacks, though, work on reduced AES with fewer than 10 rounds. For example, boomerang-type attacks are able to break 5–6 rounds of AES-128 much faster than exhaustive search.
For 6 rounds the boomerang attack has data complexity 2^71 128-bit blocks, memory complexity 2^33 blocks and time complexity 2^71 AES encryptions. This attack is mounted under a mixture of chosen-plaintext and adaptive chosen-ciphertext scenarios. Several other attacks apply to 5–7 rounds. Among them are the Square attack, proposed by Daemen and Rijmen themselves, the collision attack, partial sums, and impossible
differentials. For an overview of attacks on Rijndael see [40]. There are recent works on related-key attacks on AES, see [1]. It is possible to attack 12 rounds of AES-192 and 14 rounds of AES-256 in the related-key scenario. Still, the community considers it quite questionable whether these may be regarded as a real threat.

We would like to mention several other recent block ciphers. The cipher Serpent is an instance of an SP-network and finished second in the AES competition won by Rijndael. As prescribed by the selection committee, it also operates on 128-bit blocks and keys of sizes 128, 192, and 256 bits. Serpent has a strong security margin, prescribing 32 rounds. Some information online: http://www.cl.cam.ac.uk/~rja14/serpent.html. Next, the cipher Blowfish, proposed in 1993, is an instance of a Feistel cipher; it has 16 rounds and operates on 64-bit blocks with a default key size of 128 bits. Blowfish has so far resisted cryptanalysis and its implementation is rather fast, although it has some limitations that preclude its use in certain environments. Information online: http://www.schneier.com/blowfish.html. Twofish, a successor of Blowfish proposed by the same person, Bruce Schneier, was one of the five finalists in the AES competition. It has the same block and key sizes as all the AES contestants and has 16 rounds. Twofish is also a Feistel cipher and is likewise believed to resist cryptanalysis. Information online: http://www.schneier.com/twofish.html. It is noticeable that all these ciphers are in the public domain and are free for use in any software or hardware implementation. PRESENT is a lightweight block cipher that operates on plaintext blocks of only 64 bits with key lengths of 80 and 128 bits; PRESENT has 31 rounds [2]. There exist proposals with even smaller block lengths, see http://www.ecrypt.eu.org/lightweight/index.php/Block_ciphers.

10.7.2 Section 10.2

The concept of asymmetric cryptography was introduced by Diffie and Hellman in 1976 [48]. For an introduction to the subject and a survey of results see [87, 89]. The notion of a one-way as well as a trapdoor one-way function was also introduced by Diffie and Hellman in the same paper [48]. Rabin's scheme from Example 10.2.2 was introduced in [97] in 1979, and the ElGamal scheme was presented in [55]. The notion of a digital signature was also presented in the pioneering work [48], see also [88].

The RSA scheme was introduced in 1977 by Rivest, Shamir, and Adleman [100]. In the same paper they showed that computing the decryption exponent and factoring are equivalent. There is no known polynomial time algorithm for factoring integers. Still, there are quite a few algorithms with sub-exponential complexity. For a survey of existing methods, see [96]. Asymptotically the best known sub-exponential algorithm is the general number field sieve; it has an expected running time of O(exp(((64/9) · b)^{1/3} · (log b)^{2/3})), where b is the bit length of the number n that is to be factored [36]. The development of factoring algorithms changed the requirements on the RSA key size over time. In their original paper [100] the authors suggested the use of a 200 decimal digit modulus (664 bits). The sizes of 336 and 512 bits were also used. In 2010 the factorization of RSA-768 was announced. The modulus of 1024 bits used at present raises many questions as to whether it may be considered secure.
Therefore, for long-term security key sizes of 2048 or even 3072 bits are to be used. Quite remarkable is the work of Shor [108], who proposed an algorithm that can solve the integer factorization problem in polynomial time on a quantum computer.

The use of Z_n^* in the ElGamal scheme was proposed in [83]. For the use of the Jacobian of
a hyperelliptic curve see [41]. There are several methods for solving the DLP; we name just a few: baby-step giant-step, Pollard's rho and Pollard's lambda (or kangaroo), Pohlig-Hellman, index calculus. The fastest algorithms for solving the DLP in Z_p^* and F_{2^m} are variations of the index calculus algorithm. None of the above algorithms, applied to the multiplicative group of a finite field, has polynomial complexity. For an introduction to these methods the reader may consult [41]. Index calculus is an algorithm with sub-exponential complexity. These developments in DLP-solving algorithms affected the key size of the ElGamal scheme. In practice the key size grew from 512 to 768 and finally to 1024 bits. At the moment a 1024-bit key for ElGamal is considered to be the standard. The aforementioned work of Shor [108] also solves the DLP in polynomial time. Therefore the existence of a large enough quantum computer would jeopardize the ElGamal scheme.

Although some researchers doubt the possibility of constructing a large enough quantum computer, the area of post-quantum cryptography has evolved; it incorporates cryptosystems that are potentially resistant to quantum computer attacks. See [12] for an overview of the area, which includes lattice based, hash based, coding based, and multivariate based cryptography. Some references to multivariate asymmetric systems, digital signature schemes, and their cryptanalysis can be found in [49]. The knapsack cryptosystem by Merkle and Hellman has an interesting history. It was one of the first asymmetric cryptosystems. Its successful cryptanalysis showed that reliance on the hardness of the underlying problem alone may be misleading. For an interesting historical account, see the survey chapter of Diffie [47].

10.7.3 Section 10.3

Authentication codes were initially proposed by MacWilliams, Gilbert, and Sloane [58]. Introductory material on authentication codes is well exposed in [117]. Message authentication codes (MACs) are widely used in practice for authentication purposes. MACs are keyed hash functions. In the case of MACs one demands that a hash function provide compression (a message of arbitrary size is mapped to a fixed-size vector), ease of computation (it should be easy to compute an image knowing the key), and computation-resistance (it should be practically impossible to compute an image without knowing the key, even with some element-image pairs at one's disposal). More on MACs in [87].

Results and a discussion of the relation between authentication codes and orthogonal arrays can be found in [117, 116, 115]. Proposition 10.3.7 is due to Bierbrauer, Johansson, Kabatianskii, and Smeets [13]. By adding linear structure to the source state set, key space, tag space, and authentication mappings one obtains linear authentication codes that can be used in the study of distributed authentication systems [103].

10.7.4 Section 10.4

The notion of a secret sharing scheme was first introduced in 1979 by Shamir [106] and independently by Blakley [22]. We mention here some notions that were not mentioned in the main text. A secret sharing scheme is called perfect if knowledge of the shares of an unauthorized group (e.g. a group of t − 1 participants in Shamir's scheme) does not reduce the uncertainty about the secret itself. In terms of the entropy function it can be stated like this: H(S|A) = H(S), where S is the secret and A is the set of shares of an unauthorized group; moreover, we have H(S|B) = 0 for B being the set of shares of an authorized group.
In perfect secret sharing schemes it holds that the size of each share is at least the size of the secret.
If equality holds, such a scheme is called ideal. The notion of a secret sharing scheme can be generalized via the notion of an access structure. Using access structures one prescribes which subsets of participants can reconstruct the secret (authorized subsets) and which cannot (unauthorized subsets). The notion of a distribution of shares can also be formalized. More details on these notions and a treatment using probability theory can be found e.g. in [117].

McEliece and Sarwate [85] were the first to point out the connection between Shamir's scheme and the Reed-Solomon code construction. Some other works on relations between coding theory and secret sharing schemes include [70, 81, 94, 138]. More recent works concern applications of AG-codes to this subject. We mention the chapter of Duursma [52] and the work of Chen and Cramer [39]. In the latter two references the reader can also find the notion of secure multi-party computation, see [137]. The idea here is that several participants wish to compute the value of some publicly known function evaluated at their private values (like the shares above), in such a way that no participant can deduce the values of the other participants from the computed value of the public function and his or her own value. We also mention that, as was the case with authentication codes, information-theoretic perfectness can be traded off to obtain a scheme where the shares are smaller than the secret [23].

10.7.5 Section 10.5

Introductory material on LFSRs with a discussion of practical issues can be found in [87]. The notion of linear complexity is treated in [102], see also the materials online at http://www.ciphersbyritter.com/RES/LINCOMPL.HTM. A thorough treatment of linear recurring sequences is in [74].

Some examples of non-cryptographic uses of LFSRs, namely randomization in digital broadcasting: the Advanced Television Systems Committee (ATSC) standard for the digital television format (http://www.atsc.org/guide_default.html), Digital Audio Broadcasting (DAB) digital radio technology for broadcasting radio stations, and Digital Video Broadcasting - Terrestrial (DVB-T), the European standard for the broadcast transmission of digital terrestrial television (http://www.dvb.org/technology/standards/). An example of a nonlinear combination generator is E0, a stream cipher used in the Bluetooth protocol, see e.g. [64]; of a nonlinear filter generator, the knapsack generator [87]; of clock-controlled generators, A5/1 and A5/2, stream ciphers used to provide voice privacy in the Global System for Mobile communications (GSM) cellular telephone protocol, see http://web.archive.org/web/20040712061808/www.ausmobile.com/downloads/technical/Security+in+the+GSM+system+01052004.pdf.

10.7.6 Section 10.6

***
Chapter 11

The theory of Gröbner bases and its applications

Stanislav Bulygin

In this chapter we deal with methods in coding theory and cryptology based on polynomial system solving. As the main tool for this we use the theory of Gröbner bases, which is a well-established instrument in computational algebra. In Section 11.1 we give a brief overview of the topic of polynomial system solving. We start with the relatively easy methods of linearization and extended linearization. Then we give the basics of the more involved theory of Gröbner bases.

The problem we are dealing with in this chapter is the problem of polynomial system solving. We formulate this problem as follows: let F be a field and let P = F[X_1, . . . , X_n] be a polynomial ring over F in n variables X = (X_1, . . . , X_n). Let f_1, . . . , f_m ∈ P. We are interested in finding the set of solutions S ⊆ F^n of the polynomial system

f_1(X_1, . . . , X_n) = 0, . . . , f_m(X_1, . . . , X_n) = 0.  (11.1)

In other words, S is composed of those elements of F^n that satisfy all the equations above. In terms of algebraic geometry this problem may be formulated as follows. Given an ideal I ⊆ P, find the variety V_F(I) which it defines:

V_F(I) = {x = (x_1, . . . , x_n) ∈ F^n | f(x) = 0 for all f ∈ I}.

Since we are interested in applications to coding and cryptology, we will be working over finite fields, and we often want the solutions of the corresponding systems to lie in these finite fields, rather than in an algebraic closure. Recall that for every element α ∈ F_q the following holds: α^q = α. This means that if we add the equation X^q − X = 0 for a variable X to the polynomial system (11.1), we are guaranteed that the solutions for that variable lie in F_q.

After introducing tools for polynomial system solving in Section 11.1, we
give two concrete applications in Sections 11.2 and 11.3. In Section 11.2 we consider applications of Gröbner bases techniques to decoding linear codes, whereas Section 11.3 deals with methods of algebraic cryptanalysis of block ciphers. Due to space limitations many interesting topics related to these areas are not considered. We provide a short overview of them with references in the Notes section.

11.1 Polynomial system solving

11.1.1 Linearization techniques

We know how to solve systems of linear equations efficiently. Gaussian elimination is a standard tool for this job. If we are given a system of non-linear equations, a natural approach is to try to reduce this problem to a linear one, which we know how to solve. This simple idea leads to a technique called linearization. The technique works as follows: we replace every monomial occurring in a non-linear (polynomial) equation by a new variable. At the end we obtain a linear system with the same number of equations, but many new variables. The hope is that by solving this linear system we are able to get a solution to our initial non-linear problem. It is best to illustrate this approach on a concrete example.

Example 11.1.1 Consider a quadratic system in two unknowns x and y over the field F_3:

x^2 − y^2 − x + y = 0
−x^2 + x − y + 1 = 0
y^2 + y + x = 0
x^2 + x + y = 0

Introduce new variables a := x^2 and b := y^2. Therewith we have a linear system:

a − b − x + y = 0
−a + x − y + 1 = 0
b + y + x = 0
a + x + y = 0

This system has a unique solution, which may be found by Gaussian elimination: a = b = x = y = 1. Moreover, the solution for a and b is consistent with the conditions a = x^2, b = y^2. So although the system is quadratic, we are still able to solve it purely with methods of linear algebra.

It must be noted that the linearization technique works only very seldom. Usually the number of variables (i.e. monomials) in the system is much larger than the number of equations. Therefore one has to solve an underdetermined linear system, which has many solutions, among which it is hard to find a "real" one that stems from the original non-linear system.

Example 11.1.2 Consider a system in three variables x, y, z over the field F_16:

xy + yz + xz = 0
xyz + x + 1 = 0
xy + y + z = 0
It may be shown that over F_16 this system has a unique solution (1, 0, 0). If we replace the monomials in this system by new variables we end up with a linear system of 3 equations in 7 variables. This linear system has full rank. In particular the variables x, y, z are now free variables, whose values determine the values of the other variables. So this linear system has 16^3 solutions, of which only one provides a legitimate solution for the initial system. The other solutions do not have any meaning. E.g. we may show that the assignment x = 1, y = 1, z = 1 implies that the "variable" xy should be 0, which of course cannot be true, since both x and y are 1. So using the linearization technique boils down to sieving the set F_16^3 for a right solution, but this is nothing more than an exhaustive search for the initial system.

So the problem with the linearization technique is that we do not have enough linearly independent equations for solving. Here is where the idea of eXtended Linearization (XL) comes in handy. The idea of XL is to multiply the initial equations by all monomials up to a given degree (hopefully not too large) to generate new equations. Of course new variables will appear, since new monomials will appear. Still, if the system is "nice" enough, we may generate the necessary number of linearly independent equations to obtain a solution. Namely, we hope that after "extending" our system with new equations and doing Gaussian elimination, we will be able to find a univariate equation. Then we can solve it, plug in the obtained values, and proceed with a simplified system.

Example 11.1.3 Consider a small system in two unknowns x, y over the field F_4:

x^2 + y + 1 = 0
xy + y = 0

It is clear that the linearization technique does not work so well in this case, since the number of variables (3) is larger than the number of equations (2). Now multiply the two equations first by x and then by y. Therewith we obtain four new equations, which have the same solutions as the initial ones, so we may add them to the system. The new equations are:

x^3 + xy + x = 0
x^2 y + xy = 0
x^2 y + y^2 + y = 0
xy^2 + y^2 = 0

Here again the number of equations is lower than the number of variables. Still, by ordering the monomials in such a way that y^2 and y go leftmost in the matrix representation of the system and doing Gaussian elimination, we encounter a univariate equation y^2 = 0 (check this!). So we have a solution for y, namely y = 0. After substituting y = 0 in the first equation we have x^2 + 1 = 0, which is again a univariate equation. Over F_4 it has a unique solution x = 1. So by using linear algebra and univariate equation solving, we were able to obtain the solution (1, 0) of the system.

Algorithm 11.1 explains more formally how XL works. In our example it was enough to set D = 3. Usually one has to go much higher to get a result. In the next section we consider the technique of Gröbner bases, which is a more powerful tool. In some sense, it is a refined and improved version of XL.
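The linear-algebra step of Example 11.1.1 is easy to reproduce in a CAS. A minimal hedged Magma sketch, with the unknowns ordered (a, b, x, y) and −1 written as 2 in F_3:

F := GF(3);
A := Matrix(F, 4, 4, [1,2,2,1,  2,0,1,2,  0,1,1,1,  1,0,1,1]);
V := VectorSpace(F, 4);
b := V ! [0,2,0,0];          // constants moved to the right-hand side
Solution(Transpose(A), b);   // (1 1 1 1), i.e. a = b = x = y = 1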
Algorithm 11.1 XL(F, D)
Input:
- A system of polynomial equations F = {f_1 = 0, . . . , f_m = 0} of total degree d over the field F in variables x_1, . . . , x_n
- Parameter D
Output: a solution to the system F or the message "no solution found"
Begin
Dcurrent := d; Sol := ∅;
repeat
 Extend: Multiply each equation f_i ∈ F by all monomials of degree ≤ Dcurrent − d; denote the system so obtained by Sys;
 Linearize: Assign to each monomial appearing in Sys a new variable; order the new variables so that the x_i^a go leftmost in the matrix representation of the system, in blocks {x_i, x_i^2, . . . };
 Sys := Gauss(Sys);
 if there exists a univariate equation f(x_i) = 0 in Sys then
  solve f(x_i) = 0 over F and obtain a_i with f(a_i) = 0;
  Sol := Sol ∪ {(i, a_i)};
  if |Sol| = n then
   return Sol;
  end if
  Sys := Sys with a_i substituted for x_i;
 else
  Dcurrent := Dcurrent + 1;
 end if
until Dcurrent = D + 1
return "no solution found"
End
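The Extend and Linearize steps for Example 11.1.3 amount to row-reducing a 6 × 9 coefficient matrix. A hedged Magma sketch, with the monomial columns ordered y^2, y, x^3, x^2y, xy^2, x^2, xy, x, 1 as in the example:

M := Matrix(GF(4), 6, 9, [
  0,1,0,0,0,1,0,0,1,    // x^2 + y + 1
  0,1,0,0,0,0,1,0,0,    // xy + y
  0,0,1,0,0,0,1,1,0,    // x^3 + xy + x
  1,1,0,1,0,0,0,0,0,    // x^2*y + y^2 + y
  0,0,0,1,0,0,1,0,0,    // x^2*y + xy
  1,0,0,0,1,0,0,0,0 ]); // x*y^2 + y^2
EchelonForm(M);
v := VectorSpace(GF(4), 9) ! [1,0,0,0,0,0,0,0,0];
v in RowSpace(M);  // true: the univariate equation y^2 = 0 is reachable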
11.1.2 Gröbner bases

In the previous section we saw how one can solve systems of non-linear equations using linearization techniques. Speaking the language of algebraic geometry, we want to find the elements of the variety V(f_1, . . . , f_m), where

V(f_1, . . . , f_m) := {a ∈ F^n : f_i(a) = 0 for all 1 ≤ i ≤ m}

and F is a field. The target object of this section, the Gröbner basis technique, gives an opportunity to find this variety and also to solve many other important problems, like for example ideal membership, i.e. deciding whether a given polynomial may be obtained as a polynomial combination of a given set of polynomials. As we will see, the algorithm for finding Gröbner bases generalizes Gaussian elimination for linear systems on the one side and the Euclidean algorithm for finding the GCD of two univariate polynomials on the other side. We will see how this algorithm, Buchberger's algorithm, works and how Gröbner bases can be applied to finding a variety (system solving) and to some other problems. First of all, we need some definitions. *** should this go to Appendix? ***

Definition 11.1.4 Let R := F[x_1, . . . , x_n] be a polynomial ring in n variables over the field F. An ideal in R is a subset I of R with the following properties:
- 0 ∈ I;
- for every f, g ∈ I: f + g ∈ I;
- for every h ∈ R and every f ∈ I: h · f ∈ I.

So an ideal I is a subset of R closed under addition and closed under multiplication by elements of R. Let f_1, . . . , f_m ∈ R. It is easy to see that the object ⟨f_1, . . . , f_m⟩ := {a_1f_1 + · · · + a_mf_m | a_i ∈ R for all i} is an ideal. We say that ⟨f_1, . . . , f_m⟩ is the ideal generated by the polynomials f_1, . . . , f_m. From commutative algebra it is known that every ideal I has a finite system of generators, i.e. I = ⟨f_1, . . . , f_m⟩ for some f_1, . . . , f_m ∈ I. A Gröbner basis, which we define later, is a system of generators with special properties.

A monomial in R is a product of the form x_1^{a_1} · · · x_n^{a_n} with a_1, . . . , a_n non-negative integers. Denote X = {x_1, . . . , x_n} and by Mon(X) the set of all monomials in R.

Definition 11.1.5 A monomial ordering on R is any relation > on Mon(X) such that
- > is a total ordering on Mon(X), i.e. any two elements of Mon(X) are comparable;
- > is multiplicative, i.e. X^α > X^β implies X^α · X^γ > X^β · X^γ for all vectors γ with non-negative integer entries; here X^α = x_1^{α_1} · · · x_n^{α_n};
- > is a well-ordering, i.e. every non-empty subset of Mon(X) has a minimal element.

Example 11.1.6 Here are some orderings that are frequently used in practice.

1. Lexicographic ordering induced by x_1 > · · · > x_n: X^α >_lp X^β if and only if there exists an s such that α_1 = β_1, . . . , α_{s−1} = β_{s−1}, α_s > β_s.
2. Degree reverse lexicographic ordering induced by x_1 > · · · > x_n: X^α >_dp X^β if and only if |α| := α_1 + · · · + α_n > β_1 + · · · + β_n =: |β|, or if |α| = |β| and there exists an s such that α_n = β_n, . . . , α_{n−s+1} = β_{n−s+1}, α_{n−s} < β_{n−s}.

3. Block ordering or product ordering. Let X and Y be two ordered sets of variables, >_1 a monomial ordering on F[X] and >_2 a monomial ordering on F[Y]. The block ordering on F[X, Y] is the following: X^{α_1} Y^{β_1} > X^{α_2} Y^{β_2} if and only if X^{α_1} >_1 X^{α_2}, or X^{α_1} = X^{α_2} and Y^{β_1} >_2 Y^{β_2}.

Definition 11.1.7 Let > be a monomial ordering on R. Let f = Σ_α c_α X^α be a non-zero polynomial from R. Let α_0 be such that c_{α_0} ≠ 0 and X^{α_0} > X^α for all α ≠ α_0 with c_α ≠ 0. Then lc(f) := c_{α_0} is called the leading coefficient of f, lm(f) := X^{α_0} is called the leading monomial of f, and lt(f) := c_{α_0} X^{α_0} is called the leading term of f; moreover tail(f) := f − lt(f).

Having these notions we are ready to define the notion of a Gröbner basis.

Definition 11.1.8 Let I be an ideal in R. The leading ideal of I with respect to > is defined as L_>(I) := ⟨lt(f) | f ∈ I, f ≠ 0⟩. The notation L_>(I) is abbreviated to L(I) if it is clear which ordering is meant. A finite subset G = {g_1, . . . , g_m} of I is called a Gröbner basis for I with respect to > if L(I) = ⟨lt(g_1), . . . , lt(g_m)⟩. We say that the set F = {f_1, . . . , f_m} is a Gröbner basis if F is a Gröbner basis of the ideal ⟨F⟩.

Remark 11.1.9 Note that a Gröbner basis of an ideal is not unique. The so-called reduced Gröbner basis of an ideal is unique. By this one means a Gröbner basis G in which all elements have leading coefficient equal to 1 and no leading term of an element g ∈ G divides any of the terms of g', where g' ≠ g ∈ G.

Historically the first algorithm for computing Gröbner bases was proposed by Bruno Buchberger in 1965. In fact the very notion of the Gröbner basis was introduced by Buchberger in his Ph.D. thesis and was named after his Ph.D. advisor Wolfgang Gröbner. In order to be able to formulate the algorithm we need two more definitions.

Definition 11.1.10 Let f, g ∈ R \ {0} be two non-zero polynomials, and let lm(f) and lm(g) be the leading monomials of f and g respectively w.r.t. some monomial ordering. Denote m := lcm(lm(f), lm(g)). Then the s-polynomial of these two polynomials is defined as

spoly(f, g) = m/lm(f) · f − lc(f)/lc(g) · m/lm(g) · g.

Remark 11.1.11 1. If lm(f) = x_1^{a_1} · · · x_n^{a_n} and lm(g) = x_1^{b_1} · · · x_n^{b_n}, then m = x_1^{c_1} · · · x_n^{c_n}, where c_i = max(a_i, b_i) for all i. Therewith m/lm(f) and m/lm(g) are monomials.

2. Note that if we write f = lc(f) · lm(f) + f' and g = lc(g) · lm(g) + g', where lm(f') < lm(f) and lm(g') < lm(g), then

spoly(f, g) = m/lm(f) · (lc(f) · lm(f) + f') − lc(f)/lc(g) · m/lm(g) · (lc(g) · lm(g) + g')
= m · lc(f) + m/lm(f) · f' − m · lc(f) − lc(f)/lc(g) · m/lm(g) · g'
= m/lm(f) · f' − lc(f)/lc(g) · m/lm(g) · g'.

Therewith we "canceled out" the leading terms of both f and g.
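How the choice of ordering changes the leading data of Definition 11.1.7 is easy to see in a CAS. A small hedged Magma illustration of the orderings from Example 11.1.6:

P<x,y,z> := PolynomialRing(Rationals(), 3, "lex");
f := x*y^2 + y^3*z + x;
LeadingMonomial(f);   // x*y^2: lex looks at the exponent of x first
Q<a,b,c> := PolynomialRing(Rationals(), 3, "grevlex");
g := a*b^2 + b^3*c + a;
LeadingMonomial(g);   // b^3*c: degrevlex prefers total degree 4 over 3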
Example 11.1.12 In order to understand this notion better, let us see what the s-polynomials are in the case of linear and univariate polynomials.

linear: Let R = Q[x, y, z] and the monomial ordering be lexicographic with x > y > z. Let f = 3x + 2y − 10z and g = x + 5y − 5z; then lm(f) = lm(g) = x and m = x:

spoly(f, g) = f − 3/1 · g = 3x + 2y − 10z − 3x − 15y + 15z = −13y + 5z,

and this is exactly what one would do to cancel the variable x during Gaussian elimination.

univariate: Let R = Q[x]. Let f = 2x^5 − x^3 and g = x^2 − 10x + 1; then m = x^5 and

spoly(f, g) = f − 2/1 · x^3 · g = 2x^5 − x^3 − 2x^5 + 20x^4 − 2x^3 = 20x^4 − 3x^3,

and this is the first step of the polynomial division algorithm, which is used in the Euclidean algorithm for finding gcd(f, g).

To define the next notion we need for Buchberger's algorithm, we use the following result.

Theorem 11.1.13 Let f_1, . . . , f_m ∈ R \ {0} be non-zero polynomials in the ring R endowed with a monomial ordering and let f ∈ R be some polynomial. Then there exist polynomials a_1, . . . , a_m, h ∈ R with the following properties:
1. f = a_1 · f_1 + · · · + a_m · f_m + h;
2. lm(f) ≥ lm(a_i · f_i) for f ≠ 0 and every i such that a_i · f_i ≠ 0;
3. if h ≠ 0, then lm(h) is not divisible by any of the lm(f_i).
Moreover, if G = {f_1, . . . , f_m} is a Gröbner basis, then the polynomial h is unique.

Definition 11.1.14 Let F = {f_1, . . . , f_m} ⊂ R and f ∈ R. We define a normal form of f w.r.t. F to be any h as in Theorem 11.1.13. Notation: NF(f|F) := h.

Remark 11.1.15 1. If R = F[x] and f_1 := g ∈ R, then NF(f|{g}) is exactly the remainder of division of the univariate polynomial f by the polynomial g. So the notion of a normal form generalizes the notion of a remainder to the case of a multivariate polynomial ring.

2. The notion of a normal form is uniquely defined only if f_1, . . . , f_m is a Gröbner basis.

3. The normal form has a very important property: f ∈ I ⟺ NF(f|G) = 0, where G is a Gröbner basis of I. So by computing a Gröbner basis of the given ideal I and then computing the normal form of the given polynomial f we may decide whether f belongs to I or not.

The algorithm for computing a normal form proceeds as in Algorithm 11.2. In Algorithm 11.2 the function Exists_LT_Divisor(F,h) returns an index i such that lt(f_i) divides lt(h) if such an index exists, and 0 otherwise. Note that the algorithm may also be adapted so that it returns the polynomial combination of the f_i's that together with h satisfies conditions (1)–(3) of Theorem 11.1.13.
Algorithm 11.2 NF(f|F)
Input:
- Polynomial ring R with a monomial ordering
- Set of polynomials F = {f_1, . . . , f_m} ⊂ R
- Polynomial f ∈ R
Output: a polynomial h which satisfies (1)–(3) of Theorem 11.1.13 for the set F and the polynomial f with some a_i's from R
Begin
h := f;
i := Exists_LT_Divisor(F, h);
while h ≠ 0 and i ≠ 0 do
 h := h − lt(h)/lt(f_i) · f_i;
 i := Exists_LT_Divisor(F, h);
end while
return h
End

Example 11.1.16 Let R = Q[x, y] and the monomial ordering be the lexicographic ordering with x > y. Let f = x^2 − y^3 and F = {f_1, f_2} = {x^2 + x + y, x^3 + xy + y^3}. At the beginning of Algorithm 11.2, h = f. Now Exists_LT_Divisor(F,h)=1, so we enter the while-loop. In the while-loop the following assignment is made:

h := h − lt(h)/lt(f_1) · f_1 = x^2 − y^3 − (x^2/x^2) · (x^2 + x + y) = −x − y^3 − y.

We compute again Exists_LT_Divisor(F,h)=0. So we do not enter the loop a second time, and h = −x − y^3 − y is the normal form of f we were looking for.

Now we are ready to formulate Buchberger's algorithm for finding a Gröbner basis of an ideal: Algorithm 11.3. The main idea of the algorithm is: if after "canceling" the leading terms of the current pair (also called a critical pair) we cannot "divide" the result by the current set, then add the result to this set and add all new pairs to the set of critical pairs. The next example shows the algorithm in action.

Example 11.1.17 We build on Example 11.1.16. So R = Q[x, y] with the monomial ordering being the lexicographic ordering with x > y, and we have f_1 = x^2 + x + y, f_2 = x^3 + xy + y^3. The initialization phase yields G := {f_1, f_2} and Pairs := {(f_1, f_2)}. Now we enter the while-loop. We have to compute h = NF(spoly(f_1, f_2)|G). First,

spoly(f_1, f_2) = x · f_1 − f_2 = x^3 + x^2 + xy − x^3 − xy − y^3 = x^2 − y^3.

As we know from Example 11.1.16, NF(x^2 − y^3|G) = −x − y^3 − y, and this is non-zero. Therefore, we add f_3 := h to G and add the pairs (f_3, f_1) and (f_3, f_2) to Pairs. Recall that the pair (f_1, f_2) is no longer in Pairs, so now we have two elements there. In the second run of the loop we take the pair (f_3, f_1) and remove it from Pairs. Now spoly(f_3, f_1) = −xy^3 − xy + x + y and NF(−xy^3 − xy + x + y|G) = y^6 + 2y^4 − y^3 + y^2 =: f_4. We update the sets G and Pairs. Now Pairs = {(f_3, f_2), (f_4, f_1), (f_4, f_2), (f_4, f_3)} and G = {f_1, f_2, f_3, f_4}. Next take the pair (f_3, f_2). For this pair spoly(f_3, f_2) = −x^2 y^3 − x^2 y + xy + y^3 and NF(spoly(f_3, f_2)|G) = 0. It may be shown that likewise all the other pairs from the set Pairs reduce to 0 w.r.t. G. Therefore, the algorithm outputs G = {f_1, f_2, f_3, f_4} as a Gröbner basis of ⟨f_1, f_2⟩ w.r.t. the lexicographic ordering.
Algorithm 11.3 Buchberger(F)
Input:
- Polynomial ring R with a monomial ordering
- Normal form procedure NF
- Set of polynomials F = {f_1, . . . , f_m} ⊂ R
Output: Set of polynomials G ⊂ R such that G is a Gröbner basis of the ideal generated by the set F w.r.t. the monomial ordering
Begin
G := {f_1, . . . , f_m};
Pairs := {(f_i, f_j) | 1 ≤ i < j ≤ m};
while Pairs ≠ ∅ do
 Select a pair (f, g) ∈ Pairs;
 Remove the pair (f, g) from Pairs;
 h := NF(spoly(f, g)|G);
 if h ≠ 0 then
  for all p ∈ G do
   Add the pair (h, p) to Pairs;
  end for
  Add h to G;
 end if
end while
return G
End

Example 11.1.18 [CAS] Now we show how to compute the above examples in Singular and Magma. In Singular one has to execute the following code:

ring r=0,(x,y),lp;
poly f1=x2+x+y;
poly f2=x3+xy+y3;
ideal I=f1,f2;
ideal GBI=std(I);
GBI;

GBI[1]=y6+2y4-y3+y2
GBI[2]=x+y3+y

One may request the computation of the reduced Gröbner basis by switching on the option option(redSB). In the above example GBI is already reduced. Now if we compute the normal form of f1-f2 w.r.t. GBI, it should be zero:

NF(f1-f2,GBI);
0

It is also possible to track computations for small examples using LIB "teachstd.lib";. One should add this line at the beginning of the above piece of code together with the line printlevel=1;, which makes program comments visible. Then one should use standard(I) instead of std(I) to see the run in detail. Similarly, NFMora(f1-f2,I) should be used instead of NF(f1-f2,I). In Magma the following piece of code does the job:

P<x,y> := PolynomialRing(Rationals(), 2, "lex");
I:=[x^2+x+y,x^3+x*y+y^3];
G:=GroebnerBasis(I);
NormalForm(I[1]-I[2],G);
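Staying in the same Magma session, the solving application developed in the remainder of this section can already be previewed: for this zero-dimensional ideal, Magma's Variety lists the solutions over the ground field directly (a hedged sketch; compare the hand computation in Example 11.1.20 below):

J := ideal< P | I >;
Variety(J);   // [ <0, 0> ]: the unique rational solution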
Now that we have introduced the techniques necessary to compute Gröbner bases, let us demonstrate one of the main applications of Gröbner bases, namely polynomial system solving. The following result shows how one can solve a polynomial system of equations, provided one can compute a Gröbner basis w.r.t. the lexicographic ordering.

Theorem 11.1.19 Let f_1(X) = · · · = f_m(X) = 0 be a system of polynomial equations defined over F[X] with X = (x_1, . . . , x_n), such that it has finitely many solutions.¹ Let I = ⟨f_1, . . . , f_m⟩ be the ideal defined by the polynomials in the system and let G be a Gröbner basis for I with respect to >_lp induced by x_n < · · · < x_1. Then there are elements g_1, . . . , g_n ∈ G such that

g_n ∈ F[x_n], lt(g_n) = c_n x_n^{m_n},
g_{n−1} ∈ F[x_{n−1}, x_n], lt(g_{n−1}) = c_{n−1} x_{n−1}^{m_{n−1}},
. . .
g_1 ∈ F[x_1, . . . , x_n], lt(g_1) = c_1 x_1^{m_1},

for some positive integers m_i, i = 1, . . . , n, and elements c_i ∈ F \ {0}, i = 1, . . . , n.

It is clear how to solve the system now. After computing G, first solve the univariate equation g_n(x_n) = 0. Let a_1^{(n)}, . . . , a_{l_n}^{(n)} be the roots. For every a_i^{(n)} then solve g_{n−1}(x_{n−1}, a_i^{(n)}) = 0 to find possible values for x_{n−1}. Repeat this process until all the coordinates of all candidate solutions are found. The candidates form a finite set Can ⊆ F^n. Test all other elements of G on whether they vanish at the elements of Can. If there is some g ∈ G that does not vanish at some can ∈ Can, then discard can from Can. Since the number of solutions is finite, the above procedure terminates.

Example 11.1.20 Let us be more specific and give a concrete example of how Theorem 11.1.19 can be applied. Return to Example 11.1.17. Suppose we want to solve the system of equations x^2 + x + y = 0, x^3 + xy + y^3 = 0 over the rationals. We compute a Gröbner basis of the corresponding ideal and obtain that the elements f_4 = y^6 + 2y^4 − y^3 + y^2 and f_3 = −x − y^3 − y belong to the Gröbner basis. Since f_4 = 0 has finitely many solutions (at most 6 over the rationals) and for every fixed value of y the equation f_3 = 0 has exactly one solution in x, we actually know that our system has finitely many solutions, both over the rationals and over the algebraic closure. In order to find the solutions, we have to solve the univariate equation y^6 + 2y^4 − y^3 + y^2 = 0 for y. If we factorize, we obtain f_4 = y^2(y^4 + 2y^2 − y + 1), where y^4 + 2y^2 − y + 1 is irreducible over Q. So from the equation f_4 = 0 we only get y = 0 as a solution. Then for y = 0 the equation f_3 = 0 yields x = 0. Therefore, over the rationals the given system has the unique solution (0, 0).

Example 11.1.21 Let us consider another example. Consider the following

¹Rigorously speaking, we require the system to have finitely many solutions in F̄. Such systems (ideals) are called zero-dimensional.
system over F_2:

xy + x + y + z = 0,
xz + yz + y = 0,
x + yz + z = 0,
x^2 + x = 0,
y^2 + y = 0,
z^2 + z = 0.

Note that the field equations x^2 + x = 0, y^2 + y = 0, z^2 + z = 0 make sure that any solution of the first three equations actually lies in F_2. Since F_2 is a finite field, we automatically get that the system above has finitely many solutions (in fact no more than 2^3 = 8). One can show that the reduced Gröbner basis (see Remark 11.1.9) of the corresponding ideal w.r.t. the lexicographic ordering with x > y > z is G = {z^2 + z, y + z, x}. From this we obtain that the system in question has two solutions: (0, 0, 0) and (0, 1, 1).

In Sections 11.2 and 11.3 we will see many more situations where Gröbner bases are applied in the solving context. Gröbner basis techniques are also used for answering many other important questions. To end this section, we give one such application. *** should this go to Section 11.3? ***

Example 11.1.22 Sometimes it is necessary to obtain explicit equations relating certain variables from given implicit ones. The following example is quite typical in algebraic cryptanalysis of block ciphers. One of the main building blocks of modern block ciphers are the so-called S-Boxes, local non-linear transformations that in composition with other, often linear, mappings compose a secure block cipher. Suppose we have an S-Box S that transforms a 4-bit vector into a 4-bit vector in a non-linear way as follows. Consider a non-zero binary vector x as an element of F_{2^4} via the identification of F_2^4 with F_{2^4} = F_2[a]/⟨a^4 + a + 1⟩ done in the usual way, so that e.g. the vector (0, 1, 0, 0) is mapped to the primitive element a, and (0, 1, 0, 1) is mapped to a + a^3. Now if x is considered as an element of F_{2^4}, the S-Box S maps it to y = x^{−1} and then considers the result again as a vector via the above identification. The zero vector is mapped to the zero vector. Not going deeply into details, we just state that such a transformation can be represented over F_2 as a system of quadratic equations that implicitly relate the input variables x to the output variables y. The equations are

y_0x_0 + y_3x_1 + y_2x_2 + y_1x_3 + 1 = 0,
y_1x_0 + y_0x_1 + y_3x_1 + y_2x_2 + y_3x_2 + y_1x_3 + y_2x_3 = 0,
y_2x_0 + y_1x_1 + y_0x_2 + y_3x_2 + y_2x_3 + y_3x_3 = 0,
y_3x_0 + y_2x_1 + y_1x_2 + y_0x_3 + y_3x_3 = 0,

together with the field equations x_i^2 + x_i = 0 and y_i^2 + y_i = 0 for i = 0, . . . , 3. The equations do not describe the case where 0 is mapped to 0, so only the inversion is modeled. In certain situations it is preferable to have explicit relations that show how the output variables y depend on the input variables x. For this the following technique is used. Consider the above equations as polynomials in the polynomial ring F_2[y_0, . . . , y_3, x_0, . . . , x_3] with y_0 > · · · > y_3 > x_0 > · · · > x_3, w.r.t. the block ordering with the blocks being the y- and x-variables, where the ordering is degree reverse lexicographic in each block (see Example 11.1.6). In
this ordering, each monomial in the y-variables is larger than any monomial in the x-variables, regardless of their degrees. This ordering is a so-called elimination ordering for the y-variables. The reduced Gröbner basis of the ideal composed of the above equations consists of

x_i^2 + x_i, the field equations on x;
(x_0 + 1) · (x_1 + 1) · (x_2 + 1) · (x_3 + 1), which ensures that x is not the all-zero vector; and

y_3 + x_1x_2x_3 + x_0x_3 + x_1x_3 + x_2x_3 + x_1 + x_2 + x_3,
y_2 + x_0x_2x_3 + x_0x_1 + x_0x_2 + x_0x_3 + x_2 + x_3,
y_1 + x_0x_1x_3 + x_0x_1 + x_0x_2 + x_1x_2 + x_1x_3 + x_3,
y_0 + x_0x_1x_2 + x_1x_2x_3 + x_0x_2 + x_1x_2 + x_0 + x_1 + x_2 + x_3,

which give explicit relations for the y-variables in terms of the x-variables. Interestingly enough, the field equations together with the latter explicit equations describe the entire S-Box transformation, so the case 0 → 0 is also covered. Using similar techniques one may obtain other interesting properties of ideals, which come in handy in different applications.

11.1.3 Exercises

11.1.1 Let R = Q[x, y, z] and let F = {f_1, f_2} with f_1 = x^2 + xy + z^2, f_2 = y^2 + z, and let f = x^3 + 2y^3 − z^3. The monomial ordering is degree lexicographic. Compute NF(f|F). Use the procedure NFMora from Singular's teachstd.lib to check your result.²

11.1.2 Let R = F_2[x, y, z] and let F = {f_1, f_2} with f_1 = x^2 + xy + z^2, f_2 = xy + z^2. The monomial ordering is lexicographic. Compute a Gröbner basis of ⟨F⟩. Use the procedure standard from Singular's teachstd.lib to check your result.

11.1.3 [CAS] Recall that in Example 11.1.20 we came to the conclusion that the only solution of the system over the rationals is (0, 0). Use Singular's library solve.lib, in particular the command solve, to find also the complex solutions of this system.

11.1.4 Upgrade Algorithm 11.2 so that it also returns the a_i's from Theorem 11.1.13.

11.1.5 Prove the so-called product criterion: if polynomials f and g are such that lm(f) and lm(g) are co-prime, then NF(spoly(f, g)|{f, g}) = 0.

11.1.6 Do the following sets constitute Gröbner bases:

²By setting the printing level appropriately, procedures of teachstd.lib enable tracking their run. Therewith one may see exactly what the corresponding algorithm is doing.
1. F_1 := {xy + 1, yz + x + y + 2} ⊂ Q[x, y, z] with the monomial ordering being degree lexicographic?

2. F_2 := {x + 20, y + 10, z + 12, u + 1} ⊂ F_23[x, y, z, u] with the monomial ordering being the block ordering with blocks (x, y) and (z, u) and the degree reverse lexicographic ordering inside the blocks?

11.2 Decoding codes with Gröbner bases

As the first application of the Gröbner basis method we consider decoding linear codes. For clarity of presentation we put an emphasis on cyclic codes. We consider Cooper's philosophy, or the power sums method, in Section 11.2.1 and the method of generalized Newton identities in Section 11.2.2. In Section 11.2.3 we provide a brief overview of methods for decoding general linear codes.

11.2.1 Cooper's philosophy

Now we give an introduction to the so-called Cooper's philosophy, or the power sums method. This method uses the special form of a parity check matrix of a cyclic code. The main idea is to write the parity check equations with unknowns for the error positions and error values and then to solve with respect to these unknowns, after adding some natural restrictions on them.

Let F = F_{q^m} be the splitting field of X^n − 1 over F_q. Let a be a primitive n-th root of unity in F. If i is in the defining set of a cyclic code C (Definition ??), then

(1, a^i, . . . , a^{(n−1)i}) c^T = c_0 + c_1 a^i + · · · + c_{n−1} a^{(n−1)i} = c(a^i) = 0,

for every codeword c ∈ C. Hence (1, a^i, . . . , a^{(n−1)i}) is a parity check of C. Let {i_1, . . . , i_r} be a defining set of C. Then a parity check matrix H of C can be represented as a matrix with entries in F (see also Section 7.5.3):

H = \begin{pmatrix}
1 & a^{i_1} & a^{2i_1} & \dots & a^{(n-1)i_1} \\
1 & a^{i_2} & a^{2i_2} & \dots & a^{(n-1)i_2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & a^{i_r} & a^{2i_r} & \dots & a^{(n-1)i_r}
\end{pmatrix}.

Let c, r and e be the transmitted codeword, the received word and the error vector, respectively. Then r = c + e. Denote the corresponding polynomials by c(x), r(x) and e(x), respectively. If we apply the parity check matrix to r, we obtain

s^T := H r^T = H(c^T + e^T) = H c^T + H e^T = H e^T,

since H c^T = 0, where s is the syndrome vector. Define s_i = r(a^i) for all i = 1, . . . , n. Then s_i = e(a^i) for all i in the complete defining set, and these s_i are called the known syndromes. The remaining s_i are called the unknown syndromes. The vector s above has entries s = (s_{i_1}, . . . , s_{i_r}). Let t be the number of errors that occurred while transmitting c over a noisy channel. If the error vector is of weight t, then it is of the form

e = (0, . . . , 0, e_{j_1}, 0, . . . , 0, e_{j_l}, 0, . . . , 0, e_{j_t}, 0, . . . , 0).
More precisely, there are t indices $j_l$ with $1 \le j_1 < \cdots < j_t \le n$ such that $e_{j_l} \ne 0$ for all $l = 1, \ldots, t$ and $e_j = 0$ for all j not in $\{j_1, \ldots, j_t\}$. We obtain
\[ s_{i_u} = r(a^{i_u}) = e(a^{i_u}) = \sum_{l=1}^{t} e_{j_l} (a^{i_u})^{j_l}, \quad 1 \le u \le r. \qquad (11.2) \]
The $a^{j_1}, \ldots, a^{j_t}$, but also the $j_1, \ldots, j_t$, are the error locations, and the $e_{j_1}, \ldots, e_{j_t}$ are the error values. Define $z_l = a^{j_l}$ and $y_l = e_{j_l}$. Then $z_1, \ldots, z_t$ are the error locations, $y_1, \ldots, y_t$ are the error values, and the syndromes in (11.2) become generalized power sum functions
\[ s_{i_u} = \sum_{l=1}^{t} y_l z_l^{i_u}, \quad 1 \le u \le r. \qquad (11.3) \]
In the binary case the error values are $y_i = 1$, and the syndromes are the ordinary power sums.

Now we give a description of Cooper's philosophy. As the receiver does not know how many errors occurred, the upper bound t is replaced by the error-correcting capacity e, and some $z_l$'s are allowed to be zero, while assuming that the number of errors is at most the error-correcting capacity e. The following variables are introduced: $X_1, \ldots, X_r$, $Z_1, \ldots, Z_e$ and $Y_1, \ldots, Y_e$, where $X_u$ stands for the syndrome $s_{i_u}$, $1 \le u \le r$; $Z_l$ stands for the error location $z_l$ for $1 \le l \le t$, and 0 for $t < l \le e$; and finally $Y_l$ stands for the error value $y_l$ for $1 \le l \le t$, and any element of $\mathbb{F}_q \setminus \{0\}$ for $t < l \le e$. The syndrome equations (11.2) are rewritten in terms of these variables as power sums:
\[ f_u := \sum_{l=1}^{e} Y_l Z_l^{i_u} - X_u = 0, \quad 1 \le u \le r. \]
We also add some other equations in order to specify the range of values that can be achieved by our variables, namely
\[ \epsilon_u := X_u^{q^m} - X_u = 0, \quad 1 \le u \le r, \]
since $s_{i_u} \in F$; *** add field equations in the Appendix ***
\[ \eta_l := Z_l^{n+1} - Z_l = 0, \quad 1 \le l \le e, \]
since the $a^{j_l}$ are either n-th roots of unity or zero; and
\[ \lambda_l := Y_l^{q-1} - 1 = 0, \quad 1 \le l \le e, \]
since $y_l \in \mathbb{F}_q \setminus \{0\}$. We obtain the following set of polynomials in the variables $X = (X_1, \ldots, X_r)$, $Z = (Z_1, \ldots, Z_e)$ and $Y = (Y_1, \ldots, Y_e)$:
\[ F_C = \{f_u,\, \epsilon_u,\, \eta_l,\, \lambda_l : 1 \le u \le r,\ 1 \le l \le e\} \subset \mathbb{F}_q[X, Z, Y]. \qquad (11.4) \]
The zero-dimensional ideal $I_C$ generated by $F_C$ is called the CRHT-syndrome ideal associated to the code C, and the variety $V(F_C)$ defined by $F_C$ is called the CRHT-syndrome variety, after Chen, Reed, Helleseth and Truong. We have
$V(F_C) = V(I_C)$.

Initially, decoding of cyclic codes was essentially reduced to finding the reduced Gröbner basis of the CRHT-ideal. Unfortunately, the CRHT-variety has many spurious elements, i.e. elements that do not correspond to error positions/values. It turns out that adding more polynomials to the CRHT-ideal gives an opportunity to eliminate these spurious elements. By adding the polynomials
\[ \chi_{l,m} := Z_l Z_m\, p(n, Z_l, Z_m) = 0, \quad 1 \le l < m \le e, \]
to $F_C$, where
\[ p(n, X, Y) = \frac{X^n - Y^n}{X - Y} = \sum_{i=0}^{n-1} X^i Y^{n-1-i}, \qquad (11.5) \]
we ensure that for all l and m either $Z_l$ and $Z_m$ are distinct or at least one of them is zero. The resulting set of polynomials is
\[ F_C' := \{f_u,\, \epsilon_u,\, \eta_i,\, \lambda_i,\, \chi_{l,m} : 1 \le u \le r,\ 1 \le i \le e,\ 1 \le l < m \le e\} \subset \mathbb{F}_q[X, Z, Y]. \qquad (11.6) \]
The ideal generated by $F_C'$ is denoted by $I_C'$. By investigating the structure of $I_C'$ and its reduced Gröbner basis with respect to the lexicographic order induced by $X_1 < \cdots < X_r < Z_e < \cdots < Z_1 < Y_1 < \cdots < Y_e$, the following result may be proven.

Theorem 11.2.1 Every cyclic code C possesses a general error-locator polynomial $L_C$. This means that there exists a unique polynomial $L_C \in \mathbb{F}_q[X_1, \ldots, X_r, Z]$ that satisfies the following two properties:
• $L_C = Z^e + a_{e-1} Z^{e-1} + \cdots + a_0$ with $a_j \in \mathbb{F}_q[X_1, \ldots, X_r]$, $0 \le j \le e - 1$;
• given a syndrome $s = (s_{i_1}, \ldots, s_{i_r}) \in F^r$ corresponding to an error of weight $t \le e$ and error locations $\{k_1, \ldots, k_t\}$, if we evaluate $X_u = s_{i_u}$ for all $1 \le u \le r$, then the roots of $L_C(s, Z)$ are exactly $a^{k_1}, \ldots, a^{k_t}$ and 0 of multiplicity $e - t$; in other words,
\[ L_C(s, Z) = Z^{e-t} \prod_{i=1}^{t} (Z - a^{k_i}). \]
Moreover, $L_C$ belongs to the reduced Gröbner basis of the ideal $I_C'$, where it is the unique element that is a polynomial in Z of degree e. *** check this ***

Having this polynomial, decoding of the cyclic code C reduces to univariate factorization. The main effort here is finding the reduced Gröbner basis of $I_C'$. In general this is infeasible already for codes of moderate size. For small codes, though, it is possible to apply this technique successfully.

Example 11.2.2 As an example we consider finding the general error-locator polynomial for a binary cyclic BCH code C with parameters [15, 7, 5] that corrects 2 errors. This code has {1, 3} as a defining set. So here q = 2, m = 4, and n = 15. The field $\mathbb{F}_{16}$ is the splitting field of $X^{15} - 1$ over $\mathbb{F}_2$. In the above description we have to write equations for all syndromes that correspond to elements in
the complete defining set. Note that we may write the equations only for the elements from the defining set {1, 3}, as all the others are just consequences of those. Following the description above we write the generators $F_C'$ of the ideal $I_C'$ in the ring $\mathbb{F}_2[X_1, X_2, Z_1, Z_2]$:
\[
\begin{cases}
Z_1 + Z_2 - X_1,\\
Z_1^3 + Z_2^3 - X_2,\\
X_1^{16} - X_1,\quad X_2^{16} - X_2,\\
Z_1^{16} - Z_1,\quad Z_2^{16} - Z_2,\\
Z_1 Z_2\, p(15, Z_1, Z_2).
\end{cases}
\]
We suppress the equations $\lambda_1$ and $\lambda_2$, as the error values are over $\mathbb{F}_2$. In order to find the general error-locator polynomial we compute the reduced Gröbner basis G of the ideal $I_C'$ with respect to the lexicographic order induced by $X_1 < X_2 < Z_2 < Z_1$. The elements of G are:
\[
\begin{cases}
X_1^{16} + X_1,\\
X_2 X_1^{15} + X_2,\\
X_2^8 + X_2^4 X_1^{12} + X_2^2 X_1^3 + X_2 X_1^6,\\
Z_2 X_1^{15} + Z_2,\\
Z_2^2 + Z_2 X_1 + X_2 X_1^{14} + X_1^2,\\
Z_1 + Z_2 + X_1.
\end{cases}
\]
According to Theorem 11.2.1 the general error-locator polynomial $L_C$ is the unique element of G of degree 2 with respect to $Z_2$. So $L_C \in \mathbb{F}_2[X_1, X_2, Z]$ is
\[ L_C(X_1, X_2, Z) = Z^2 + Z X_1 + X_2 X_1^{14} + X_1^2. \]
Let us see how decoding using $L_C$ works. Let
\[ r = (1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1) \]
be a received word with 2 errors. In the field $\mathbb{F}_{16}$ with a primitive element a such that $a^4 + a + 1 = 0$, a is also a 15-th root of unity. Then the syndromes are $s_1 = a^2$, $s_3 = a^{14}$. Plug them into $L_C$ in place of $X_1$ and $X_2$ and obtain:
\[ L_C(Z) = Z^2 + a^2 Z + a^6. \]
Factorizing yields $L_C = (Z + a)(Z + a^5)$. According to Theorem 11.2.1, the exponents 1 and 5 are exactly the error locations minus 1, so the errors occurred at positions 2 and 6.
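The field computations in this example are small enough to replay in a few lines of plain Python. The following sketch recomputes the syndromes and the roots; the integer encoding of $\mathbb{F}_{16}$ elements (bit i of the integer is the coefficient of $a^i$) is an assumption of ours.

F16_MOD = 0b10011                  # x^4 + x + 1

def mul(u, v):
    r = 0
    while v:
        if v & 1: r ^= u
        v >>= 1
        u <<= 1
        if u & 0x10: u ^= F16_MOD
    return r

def power(u, e):
    r = 1
    for _ in range(e): r = mul(r, u)
    return r

a = 0b0010                          # the primitive element a
r = [1,1,0,1,0,0,0,0,0,0,1,1,1,0,1]

s1 = 0; s3 = 0                      # s_i = r(a^i), r_0 evaluated at a^0
for i, ri in enumerate(r):
    if ri:
        s1 ^= power(a, i)
        s3 ^= power(a, 3*i)
assert s1 == power(a, 2) and s3 == power(a, 14)

# specialize L_C(X1, X2, Z) = Z^2 + X1*Z + X2*X1^14 + X1^2 at the syndromes
c0 = mul(s3, power(s1, 14)) ^ mul(s1, s1)
roots = [z for z in range(16) if mul(z, z) ^ mul(s1, z) ^ c0 == 0]
assert set(roots) == {power(a, 1), power(a, 5)}   # error positions 2 and 6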
Example 11.2.3 [CAS] All the computations in the previous example may be carried out using the library decodegb.lib of Singular. The following Singular code yields the CRHT-ideal and its reduced Gröbner basis.

LIB "decodegb.lib";
// binary cyclic [15,7,5] code with defining set (1,3)
list defset=1,3;   // defining set
int n=15;          // length
int e=2;           // error-correcting capacity
int q=2;           // base field size
int m=4;           // degree extension of the splitting field
int sala=1;        // indicator to add additional equations as in (11.5)
def A=sysCRHT(n,defset,e,q,m,sala);
setring A;         // set the polynomial ring for the system 'crht'
option(redSB);     // compute reduced Groebner bases
ideal red_crht=std(crht);

Now, inspecting the ideal red_crht, we see which polynomial we should take as the general error-locator polynomial according to Theorem 11.2.1.

poly gen_err_loc_poly=red_crht[5];

At this point we have to pass to the splitting field in order to do our further computations.

list l=ringlist(basering);
l[1][4]=ideal(a^4+a+1);
def B=ring(l);
setring B;
poly gen_err_loc_poly=imap(A,gen_err_loc_poly);

We can now process our received vector and compute the syndromes:

matrix rec[1][n]=1,1,0,1,0,0,0,0,0,0,1,1,1,0,1;
matrix checkrow1[1][n];
matrix checkrow3[1][n];
int i;
number work=a;
for (i=0; i<=n-1; i++) { checkrow1[1,i+1]=work^i; }
work=a^3;
for (i=0; i<=n-1; i++) { checkrow3[1,i+1]=work^i; }
// compute syndromes
matrix s1mat=checkrow1*transpose(rec);
matrix s3mat=checkrow3*transpose(rec);
number s1=number(s1mat[1,1]);
number s3=number(s3mat[1,1]);

One can now substitute and solve:

poly specialized_gen=substitute(gen_err_loc_poly,X(1),s1,X(2),s3);
factorize(specialized_gen);
[1]:
   _[1]=1
   _[2]=Z(2)+(a)
   _[3]=Z(2)+(a^2+a)
[2]:
   1,1,1

One can also check that a^5=a^2+a. So we have seen that it is theoretically possible to encode all the information needed for decoding a cyclic code in one polynomial. Finding this polynomial, though, is quite a challenging task. Moreover, note that the polynomial coefficients $a_j \in \mathbb{F}_q[X_1, \ldots, X_r]$ may be quite dense, so it may be a problem even just to store the polynomial $L_C$. The method, nevertheless, provides efficient closed formulas for small codes that are relevant in practice. This method can be adapted to correct erasures and to find the minimum distance of a code.
More information on these issues is in the Notes.

11.2.2 Newton identities based method

In Section 7.5.2 and Section 7.5.3 we have seen how Newton identities can be used for efficient decoding of cyclic codes up to half the BCH bound. Now we want to generalize this method and be able to decode up to half the minimum distance. In order to correct more errors we have to pay a price: the systems we have to solve are no longer linear, but quadratic. This is exactly where Gröbner basis techniques come into play. Let us recall the necessary notions. Note that we change the notation a bit, as this will be convenient for the generalization. The error-locator polynomial is defined by
\[ \sigma(Z) = \prod_{l=1}^{t} (Z - z_l). \]
If this product is expanded,
\[ \sigma(Z) = Z^t + \sigma_1 Z^{t-1} + \cdots + \sigma_{t-1} Z + \sigma_t, \]
then the coefficients $\sigma_i$ are the elementary symmetric functions in the error locations $z_1, \ldots, z_t$:
\[ \sigma_i = (-1)^i \sum_{1 \le j_1 < j_2 < \cdots < j_i \le t} z_{j_1} z_{j_2} \cdots z_{j_i}, \quad 1 \le i \le t. \]
The syndromes $s_i$ and the coefficients $\sigma_i$ satisfy the following generalized Newton identities, see Proposition 7.5.8:
\[ s_i + \sum_{j=1}^{t} \sigma_j s_{i-j} = 0, \quad \text{for all } i \in \mathbb{Z}_n. \qquad (11.7) \]
Now suppose that the complete defining set of the cyclic code contains the 2t consecutive elements $b, \ldots, b + 2t - 1$ for some b. Then $d \ge 2t + 1$ by the BCH bound. Furthermore, the set of equations (11.7) for $i = b + t, \ldots, b + 2t - 1$ is a system of t linear equations in the unknowns $\sigma_1, \ldots, \sigma_t$ with the known syndromes $s_b, \ldots, s_{b+2t-1}$ as coefficients. Gaussian elimination solves this system with complexity $O(t^3)$. In this way we obtain the APGZ decoding algorithm, see Section 7.5.3. See Example 7.5.11 for the algorithm in action on a small example.
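Since (11.7) is an algebraic identity valid over any commutative ring, a quick numerical sanity check can be done in plain Python over the integers, sidestepping the finite-field arithmetic and the cyclic index structure; the sample locations and values below are made up for illustration.

# Check s_i + sigma_1 s_{i-1} + ... + sigma_t s_{i-t} = 0 for i >= t,
# with s_i = sum_l y_l z_l^i and sigma(Z) = prod_l (Z - z_l).
z = [2, 3, 5]                 # hypothetical error locations
y = [1, 4, 6]                 # hypothetical error values
t = len(z)

sigma = [1]                   # expand prod (Z - z_l); sigma[j] = sigma_j
for zl in z:
    sigma = [a - zl * b for a, b in zip(sigma + [0], [0] + sigma)]

def s(i):
    return sum(yl * zl**i for yl, zl in zip(y, z))

for i in range(t, 20):
    assert s(i) + sum(sigma[j] * s(i - j) for j in range(1, t + 1)) == 0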
One may go further and obtain closed formulas, or solve the decoding problem via the key equation, see Section ?? and Section ??. All the above mentioned algorithms from Chapter 7 decode up to the BCH error-correcting capacity, which is often strictly smaller than the true capacity. A general method was outlined by Berlekamp, Tzeng, Hartmann, Chien, and Stevens, where the unknown syndromes are treated as variables. We have
\[ s_{i+n} = s_i, \quad \text{for all } i \in \mathbb{Z}_n, \]
since $s_{i+n} = r(a^{i+n}) = r(a^i)$. Furthermore
\[ s_i^q = (e(a^i))^q = e(a^{iq}) = s_{qi}, \quad \text{for all } i \in \mathbb{Z}_n, \]
and
\[ \sigma_i^{q^m} = \sigma_i, \quad \text{for all } 1 \le i \le t. \]
So the zeros of the following set of polynomials $Newton_t$ in the variables $S_1, \ldots, S_n$ and $\sigma_1, \ldots, \sigma_t$ are considered:
\[ Newton_t \begin{cases} \sigma_i^{q^m} - \sigma_i, & \text{for all } 1 \le i \le t,\\ S_{i+n} - S_i, & \text{for all } i \in \mathbb{Z}_n,\\ S_i^q - S_{qi}, & \text{for all } i \in \mathbb{Z}_n,\\ S_i + \sum_{j=1}^{t} \sigma_j S_{i-j}, & \text{for all } i \in \mathbb{Z}_n. \end{cases} \qquad (11.8) \]
Solutions of $Newton_t$ are called generic, formal or one-step. Computing these solutions is considered as a preprocessing phase which has to be performed only once. In the actual decoder, for every received word r the variables $S_i$ are specialized to the actual values $s_i(r)$ for $i \in S_C$. Alternatively, one can solve $Newton_t$ together with the polynomials $S_i - s_i(r)$ for $i \in S_C$. This is called online decoding. Note that obtaining the general error-locator polynomial as in the previous subsection is an example of formal decoding: this polynomial has to be found only once.

Example 11.2.4 Let us consider an example of decoding using Newton identities in a case where the APGZ algorithm is not applicable. We consider a 3-error-correcting cyclic code of length 31 with defining set {1, 5, 7}. Note that the BCH error-correcting capacity of this code is 2. We are aiming now at correcting 3 errors. Let us write the corresponding ideal:
\[
\begin{cases}
\sigma_1 S_{31} + \sigma_2 S_{30} + \sigma_3 S_{29} + S_1,\\
\sigma_1 S_1 + \sigma_2 S_{31} + \sigma_3 S_{30} + S_2,\\
\sigma_1 S_2 + \sigma_2 S_1 + \sigma_3 S_{31} + S_3,\\
\sigma_1 S_{i-1} + \sigma_2 S_{i-2} + \sigma_3 S_{i-3} + S_i, \quad 4 \le i \le 31,\\
\sigma_i^{32} + \sigma_i, \quad i = 1, 2, 3,\\
S_{i+31} + S_i, \quad \text{for all } i \in \mathbb{Z}_{31},\\
S_i^2 + S_{2i}, \quad \text{for all } i \in \mathbb{Z}_{31}.
\end{cases}
\]
Note that the equations $S_{i+31} = S_i$ and $S_i^2 = S_{2i}$ imply
\[
\begin{cases}
S_1^2 + S_2, & S_1^4 + S_4, & S_1^8 + S_8, & S_1^{16} + S_{16},\\
S_3^2 + S_6, & S_3^4 + S_{12}, & S_3^8 + S_{24}, & S_3^{16} + S_{17},\\
S_5^2 + S_{10}, & S_5^4 + S_{20}, & S_5^8 + S_9, & S_5^{16} + S_{18},\\
S_7^2 + S_{14}, & S_7^4 + S_{28}, & S_7^8 + S_{25}, & S_7^{16} + S_{19},\\
S_{11}^2 + S_{22}, & S_{11}^4 + S_{13}, & S_{11}^8 + S_{26}, & S_{11}^{16} + S_{21},\\
S_{15}^2 + S_{30}, & S_{15}^4 + S_{29}, & S_{15}^8 + S_{27}, & S_{15}^{16} + S_{23},\\
S_{31}^2 + S_{31}.
\end{cases}
\]
Our intent is to write $\sigma_1, \sigma_2, \sigma_3$ in terms of the known syndromes $S_1, S_5, S_7$. The next step would be to compute the reduced Gröbner basis of this system with respect to some elimination order induced by $S_{31} > \cdots > S_8 > S_6 > S_4 > \cdots > S_2 > \sigma_1 > \sigma_2 > \sigma_3 > S_7 > S_5 > S_1$. Unfortunately, the computation is quite time consuming and the result is too huge to illustrate the idea. Rather, we
do online decoding, i.e. for a concrete received word r we compute the syndromes $S_1, S_5, S_7$, plug the values into the system, and then find the σ's. Let
\[ r = (0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1) \]
be a received word with three errors. The known syndromes we need are $s_1 = a^7$, $s_5 = a^{25}$ and $s_7 = a^{29}$. Substitute these values into the system above and compute the reduced Gröbner basis of the system. The reduced Gröbner basis with respect to the degree reverse lexicographic order (here it is possible to go without an elimination order, see Remark ??), restricted to the variables $\sigma_1, \sigma_2, \sigma_3$, is
\[ \begin{cases} \sigma_3 + a^5,\\ \sigma_2 + a^3,\\ \sigma_1 + a^7. \end{cases} \]
The corresponding values for the σ's give rise to the error-locator polynomial
\[ \sigma(Z) = Z^3 + a^7 Z^2 + a^3 Z + a^5. \]
Factoring this polynomial yields three roots, $a^4$, $a^{10}$, $a^{22}$, which indicate error positions 5, 11, and 23. Note also that we could have worked only with the equations for $S_1, S_5, S_7, S_3, S_{11}, S_{15}, S_{31}$, but the Gröbner basis computation is harder in that case.
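Before turning to a computer algebra session, the factorization claim is easy to double-check by brute force in plain Python; the integer encoding of $\mathbb{F}_{32}$ elements (bit i of the integer is the coefficient of $a^i$) is again our own convention.

F32_MOD = 0b100101                  # x^5 + x^2 + 1

def mul(u, v):
    r = 0
    while v:
        if v & 1: r ^= u
        v >>= 1
        u <<= 1
        if u & 0x20: u ^= F32_MOD
    return r

def power(u, e):
    r = 1
    for _ in range(e): r = mul(r, u)
    return r

a = 0b00010
sg1, sg2, sg3 = power(a, 7), power(a, 3), power(a, 5)   # the computed sigma's
roots = [z for z in range(32)
         if power(z, 3) ^ mul(sg1, mul(z, z)) ^ mul(sg2, z) ^ sg3 == 0]
log = {power(a, i): i for i in range(31)}               # discrete logs
print(sorted(log[z] + 1 for z in roots))                # prints [5, 11, 23]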
Example 11.2.5 [CAS] The following program carries out the above computation using decodegb.lib from Singular.

LIB "decodegb.lib";
int n=31;            // length
list defset=1,5,7;   // defining set
int t=3;             // number of errors
int q=2;             // base field size
int m=5;             // degree extension of the splitting field
def A=sysNewton(n,defset,t,q,m);
setring A;
// change the ring to work in the splitting field
list l=ringlist(basering);
l[1][4]=ideal(a^5+a^2+1);
def B=ring(l);
setring B;
ideal newton=imap(A,newton);
matrix rec[1][n]=0,0,1,0,0,1,1,1,1,0,1,1,0,0,1,1,0,1,0,0,0,0,0,1,0,0,1,1,0,0,1;
// compute the parity-check rows for the defining set (1,5,7)
// similarly to the example with CRHT
...
// compute the syndromes s1,s5,s7
// analogously to the CRHT-example
...
// substitute the known syndromes into the system
int i;
ideal specialize_newton;
for (i=1; i<=size(newton); i++)
{
  specialize_newton[i]=substitute(newton[i],S(1),s1,S(5),s5,S(7),s7);
}
option(redSB);
// find the sigmas
ideal red_spec_newt=std(specialize_newton);
// identify the values of sigma_1, sigma_2, and sigma_3
// find the roots of the error-locator polynomial
ring solve=(2,a),Z,lp;
minpoly=a5+a2+1;
poly error_loc=Z^3+(a^4+a^2)*Z^2+(a^3)*Z+(a^2+1);  // sigma's plugged in
factorize(error_loc);

So, as we see, by using Gröbner bases it is possible to go beyond the BCH error-correcting capacity. The price paid is the complexity of solving quadratic, as opposed to linear, systems. *** more stuff in notes ***

11.2.3 Decoding arbitrary linear codes

Now we outline a couple of ideas that may be used for decoding arbitrary linear codes up to the full error-correcting capacity.

Decoding affine variety codes with Fitzgerald-Lax

The following method generalizes the ideas of Cooper's philosophy to arbitrary linear codes. In this approach the main notion is that of an affine variety code. Let $P_1, \ldots, P_n$ be points in $\mathbb{F}_q^s$. It is possible to compute a Gröbner basis of the ideal $I \subseteq \mathbb{F}_q[U_1, \ldots, U_s]$ of polynomials that vanish exactly at these points. Define
\[ I_q := I + \langle U_1^q - U_1, \ldots, U_s^q - U_s \rangle, \]
so that $I_q$ is a zero-dimensional ideal with $V(I_q) = \{P_1, \ldots, P_n\}$. An affine variety code $C(I, L) = \phi(L)$ is the image of the evaluation map
\[ \phi : R \to \mathbb{F}_q^n, \quad \bar f \mapsto (f(P_1), \ldots, f(P_n)), \]
where $R := \mathbb{F}_q[U_1, \ldots, U_s]/I_q$, L is a vector subspace of R, and $\bar f$ is the coset of f in $\mathbb{F}_q[U_1, \ldots, U_s]$ modulo $I_q$. It is possible to show that every q-ary linear [n, k] code, or equivalently its dual, can be represented as an affine variety code for a certain choice of parameters. See Exercise 11.2.2 for such a construction in the case of cyclic codes.

In order to write a system of polynomial equations similar to the one in Section 11.2.1, one needs to generalize the CRHT approach to affine variety codes. Similarly to the CRHT method, the system of equations (or equivalently the ideal) is composed of a "parity-check" part and a "constraints" part. The parity-check part is constructed according to the evaluation map $\phi$. Now, as can be seen from Exercise 11.2.2, the points $P_1, \ldots, P_n$ encode positions in a vector, similarly to how the $a^i$ encode positions in the case of a cyclic code, a being a primitive n-th root of unity. Therefore, one needs to add the polynomials $g_l(X_{k1}, \ldots, X_{ks})$, $l = 1, \ldots, m$, $k = 1, \ldots, t$, where $g_1, \ldots, g_m$ generate $I_q$, for every error position. Adding other natural constraints, like field equations on the error values, and then computing a Gröbner basis of the combined ideal $I_C$ w.r.t. a certain elimination ordering, it is possible to recover both the error positions (i.e. the values of the "error points") and the error values. In general, finding I and L is quite technical, and it turns out that for random codes this method performs quite poorly, because of the complicated structure of $I_C$. The method may be quite efficient, though, if a code has more structure, as in the case of geometric codes (e.g. Hermitian codes). We mention also that there
are improvements of the approach of Fitzgerald and Lax which follow the same idea as the improvements for the CRHT-method. Namely, one adds polynomials that ensure that the error locations are distinct. It can be proven that affine variety codes possess a so-called multi-dimensional general error-locator polynomial, which is a generalization of the general error-locator polynomial from Theorem 11.2.1.

Decoding by embedding in an MDS code

Now we briefly outline a method that provides a system for decoding that is composed of at most quadratic equations. The main feature of the method is that we do not need field equations for the solution to lie in the correct domain. Let C be an $\mathbb{F}_q$-linear [n, k] code with error-correcting capacity e. Choose a parity check matrix H of C. Let $h_1, \ldots, h_r$ be the rows of H. Let $b_1, \ldots, b_n$ be a basis of $\mathbb{F}_q^n$. Let $B_s$ be the $s \times n$ matrix with $b_1, \ldots, b_s$ as rows, and let $B = B_n$. We say that $b_1, \ldots, b_n$ is an ordered MDS basis and B an MDS matrix if all the $s \times s$ submatrices of $B_s$ have rank s, for all $s = 1, \ldots, n$. Note that an MDS basis for $\mathbb{F}_q^n$ always exists if $n \le q$. By extending the initial field to a sufficiently large degree, we may assume that an MDS basis exists there. Since the parameters of a code do not change when going to a scalar extension, we may assume that our code C is defined over this sufficiently large $\mathbb{F}_q$ with $q \ge n$. Each row $h_i$ is then a linear combination of the basis $b_1, \ldots, b_n$; that is, there are constants $a_{ij} \in \mathbb{F}_q$ such that
\[ h_i = \sum_{j=1}^{n} a_{ij} b_j. \]
In other words, $H = AB$, where A is the $r \times n$ matrix with entries $a_{ij}$. For every i and j, $b_i * b_j$ is a linear combination of the basis vectors $b_1, \ldots, b_n$, so there are constants $\mu_l^{ij} \in \mathbb{F}_q$ such that
\[ b_i * b_j = \sum_{l=1}^{n} \mu_l^{ij} b_l. \]
The elements $\mu_l^{ij} \in \mathbb{F}_q$ are called the structure constants of the basis $b_1, \ldots, b_n$. Linear functions $U_{ij}$ in the variables $U_1, \ldots, U_n$ are defined as
\[ U_{ij} = \sum_{l=1}^{n} \mu_l^{ij} U_l. \]

Definition 11.2.6 For the received vector r, the ideal J(r) in the ring $\mathbb{F}_q[U_1, \ldots, U_n]$ is generated by the elements
\[ \sum_{l=1}^{n} a_{jl} U_l - s_j(r), \quad j = 1, \ldots, r, \]
where s(r) is the syndrome of r. The ideal I(t, U, V) in the ring $\mathbb{F}_q[U_1, \ldots, U_n, V_1, \ldots, V_t]$ is generated by the elements
\[ \sum_{j=1}^{t} U_{ij} V_j - U_{i,t+1}, \quad i = 1, \ldots, n. \]
Let J(t, r) be the ideal in $\mathbb{F}_q[U_1, \ldots, U_n, V_1, \ldots, V_t]$ generated by J(r) and I(t, U, V).

Now we are ready to state the main result of the method.

Theorem 11.2.7 Let B be an MDS matrix with structure constants $\mu_l^{ij}$ and linear functions $U_{ij}$. Let H be a parity check matrix of the code C such that $H = AB$ as above. Let $r = c + e$ be a received word with c in C the codeword sent and e the error vector. Suppose that the weight of e is not zero and at most e. Let t be the smallest positive integer such that J(t, r) has a solution (u, v) over $\bar{\mathbb{F}}_q$. Then wt(e) = t, and the solution is unique and satisfies $u = Be$. The error vector is recovered as $e = B^{-1} u$.
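The existence claim for MDS bases when $n \le q$ is easy to test experimentally. The following plain-Python sketch verifies by brute force that the Vandermonde basis over $\mathbb{F}_7$, with $b_i = (x_j^{i-1})_j$ for distinct nodes $x_j$, is an ordered MDS basis; the choice of field and nodes is ours.

from itertools import combinations

p, n = 7, 7
nodes = list(range(n))                          # distinct elements of F_7
B = [[pow(x, i, p) for x in nodes] for i in range(n)]

def rank_mod_p(rows):
    # Gaussian elimination over F_p
    m = [r[:] for r in rows]
    rk = 0
    for c in range(len(m[0])):
        piv = next((k for k in range(rk, len(m)) if m[k][c]), None)
        if piv is None:
            continue
        m[rk], m[piv] = m[piv], m[rk]
        inv = pow(m[rk][c], p - 2, p)           # inverse via Fermat
        m[rk] = [(v * inv) % p for v in m[rk]]
        for k in range(len(m)):
            if k != rk and m[k][c]:
                m[k] = [(u - m[k][c] * v) % p for u, v in zip(m[k], m[rk])]
        rk += 1
    return rk

for s in range(1, n + 1):                       # every s x s submatrix of B_s
    for cols in combinations(range(n), s):
        sub = [[B[i][j] for j in cols] for i in range(s)]
        assert rank_mod_p(sub) == s

Every square submatrix here is a Vandermonde minor with distinct nodes, which explains why the check succeeds.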
So, as we see, although we did not impose field equations on either the U- or the V-variables, we are still able to obtain a correct solution. For the case of cyclic codes, by going to a certain field extension $\mathbb{F}_q$ it may be shown that the system I(t, U, V) actually defines the generalized Newton identities. Therefore one corollary of the above theorem is that it is actually possible to work without the field equations in the method of Newton identities.

Decoding by normal form computations

Another method for arbitrary linear codes takes a different approach to how one represents code-related information. Below we outline the idea for binary codes. Let [X] be the commutative monoid generated by $X = \{X_1, \ldots, X_n\}$. The following mapping associates a vector of reduced exponents to a monomial:
\[ \psi : [X] \to \mathbb{F}_2^n, \quad \prod_{i=1}^{n} X_i^{a_i} \mapsto (a_1 \bmod 2, \ldots, a_n \bmod 2). \]
Now, let $w_1, \ldots, w_k$ be the rows of a generator matrix G of the binary [n, k] code C with error-correcting capacity e. Consider the ideal $I_C \subseteq K[X_1, \ldots, X_n]$, where K is an arbitrary field:
\[ I_C := \langle X^{w_1} - 1, \ldots, X^{w_k} - 1, X_1^2 - 1, \ldots, X_n^2 - 1 \rangle. \]
So the ideal $I_C$ encodes the information about the code C. The next theorem shows how one decodes using $I_C$.

Theorem 11.2.8 Let GB be the reduced Gröbner basis of $I_C$ w.r.t. some degree compatible monomial ordering. If $wt(\psi(NF(X^a, GB))) \le e$, then $\psi(NF(X^a, GB))$ is the error vector corresponding to the received word $\psi(X^a)$, i.e. $\psi(X^a) - \psi(NF(X^a, GB))$ is the codeword of C closest to $\psi(X^a)$.

Note that $I_C$ is a binomial ideal, and therefore GB consists of binomials. For binomial ideals the normal form of a monomial is again a monomial, so the computations in the theorem above are well defined. Using the special structure of $I_C$ it is possible to improve on Gröbner basis computations to obtain GB, compared to the usual techniques. It is remarkable that the code-related information, as well as the solution to the decoding problem, is represented by exponents of monomials, whereas in all the methods we considered before these data are encoded as values of certain variables.

11.2.4 Exercises

11.2.1 [CAS] Consider a binary cyclic code of length 21 with defining set (1, 3, 7, 9). This code has parameters [21, 7, 8], see Example 7.4.8 and Example 7.4.17. The BCH bound is 5, so we cannot correct more than 2 errors with the methods from Chapter 7. Use the full error-correction capacity and correct 3 errors in some random codeword using the methods from Section 11.2.1, Section 11.2.2, and decodegb.lib from Singular. Note that finding the general error-locator polynomial is computationally very intense; therefore use online decoding in the CRHT-method: plug in concrete values of the syndromes before computing a Gröbner basis.

11.2.2 Show how a cyclic code may be considered as an affine variety code from Section 11.2.3.
11.2.3 Using the method of normal forms, decode one error in a random codeword of the Hamming code (Example 2.2.14). Try different coefficient fields, as well as different monomial orderings. Do you always get the same result?

11.3 Algebraic cryptanalysis

In the previous section we have seen how polynomial system solving (via Gröbner bases) is used in the problem of decoding linear codes. In this section we briefly highlight another interesting application of polynomial system solving. Namely, we will be talking about algebraic cryptanalysis of block ciphers. Block ciphers were introduced in Chapter 10 as one of the main tools for providing secure symmetric communication. There we also mentioned that there exist methods for cryptanalyzing block ciphers, i.e. distinguishing them from random permutations and using this for recovering the secret key used for the encryption.

Traditional methods of cryptanalysis are statistical in nature. A cryptanalyst or attacker queries a cipher, seen as a black box set up with an unknown key, with (possibly chosen) plaintexts and receives the corresponding ciphertexts. By collecting many such pairs the cryptanalyst hopes to find statistical patterns that would distinguish the cipher in question from a random permutation. Algebraic cryptanalysis takes another approach. Here a cryptanalyst writes down a system of polynomial equations over a finite field (usually $\mathbb{F}_2$) which corresponds to the cipher in question, by modeling the operations done by the cipher during the encryption process (and also the key schedule) as algebraic (polynomial) equations. Therewith the obtained system of equations reflects the encryption process: the plaintext and ciphertext are parameters of the system, and the key is the unknown, represented e.g. by bit variables. After plugging in an actual plaintext/ciphertext pair the system should yield the unknown secret key as a solution. In theory, provided that the plaintext and key lengths coincide, an attacker needs only one pair of plaintext/ciphertext to recover the key³. This feature distinguishes the algebraic approach from the statistical one, where an attacker usually needs many pairs to observe some statistical pattern.

³ He/she may need a few pairs in case the sizes of the plaintext and the key do not coincide.

We proceed as follows. In Section 11.3.1 we describe a toy cipher, which will then be used to illustrate the idea outlined above. We will see how to write equations for the toy cipher in Section 11.3.2. We will also see that it may be possible to write equations in different ways, which can be important for actual solving. In Section 11.3.3 we address the question of writing equations for an arbitrary S-Box.

11.3.1 Toy example

As a toy block cipher we take an iterative (Definition 10.1.9) block cipher (Definition 10.1.3) with text/key length of 16 bits and a two-round encryption. Our toy cipher is an SP-network (Definition 10.1.12). Namely, in every round we have a layer of local substitutions (S-Boxes) followed by a permutation layer. Specifically, the encryption algorithm proceeds as in Algorithm 11.4. In this algorithm SBox inherits the main idea of the S-Box in the AES, see Section 10.1.4. Namely, we divide the state vector $w := (w_0, \ldots, w_{15})$ into four blocks of 4 consecutive bits. Then each block of four bits is considered as an
element of the field $\mathbb{F}_{16} \cong \mathbb{F}_2[x]/\langle x^4 + x + 1 \rangle$. The SBox then takes this element and outputs its inverse in $\mathbb{F}_{16}$ for non-zero inputs, or $0 \in \mathbb{F}_{16}$ otherwise. The element so obtained is then interpreted again as a vector over $\mathbb{F}_2$ of length 4. Next, the permutation layer, represented by Perm, acts on the entire 16-bit state vector. The bit at position i, $0 \le i \le 15$, is moved to position Pos(i), where
\[ Pos(i) = \begin{cases} 4i \bmod 15, & 0 \le i \le 14,\\ 15, & i = 15. \end{cases} \qquad (11.9) \]
So $Perm(w) = (w_{Pos(0)}, \ldots, w_{Pos(15)})$. Interestingly enough, this permutation provides optimal diffusion in the sense that full dependency is achieved already after 2 rounds, see Exercise 11.3.1. Schematically the encryption process of our toy cipher is depicted on Figure ... . *** add figure ***

Algorithm 11.4 Toy cipher encryption
Input: A 16-bit plaintext p and a 16-bit key k.
Output: A 16-bit ciphertext c.
Begin
  Perform initial key addition: w := p ⊕ k = AddKey(p, k).
  for i = 1, . . . , 2 do
    Perform the S-Box substitution: w := SBox(w).
    Perform the permutation: w := Perm(w).
    Add the key: w := AddKey(w, k) = w ⊕ k.
  end for
  The ciphertext is c := w.
  return c
End
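To make the description concrete, here is a minimal sketch of the toy cipher in plain Python. The packing conventions (bit $w_{4j}$ as the constant coefficient of the j-th $\mathbb{F}_{16}$ block, state and key as lists of 16 bits) are assumptions of ours, since the text does not fix them.

def gf16_mul(a, b):
    # multiply two elements of F16 = F2[x]/(x^4 + x + 1),
    # represented as 4-bit integers with bit 0 = constant term
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:                  # reduce by x^4 = x + 1
            a ^= 0b10011
    return r

def gf16_inv(a):
    # a^14 = a^(-1) for a != 0, and 0 -> 0, exactly as SBox is defined
    r = 1
    for _ in range(14):
        r = gf16_mul(r, a)
    return r

def sbox(w):
    # apply the 4-bit inversion S-Box to each of the four blocks
    out = []
    for j in range(4):
        nib = w[4*j] | (w[4*j+1] << 1) | (w[4*j+2] << 2) | (w[4*j+3] << 3)
        inv = gf16_inv(nib)
        out += [(inv >> t) & 1 for t in range(4)]
    return out

def perm(w):
    # bit at position i moves to position Pos(i) = 4*i mod 15, Pos(15) = 15
    v = [0] * 16
    for i in range(16):
        v[4*i % 15 if i < 15 else 15] = w[i]
    return v

def encrypt(p, k):
    w = [pi ^ ki for pi, ki in zip(p, k)]        # initial key addition
    for _ in range(2):                           # two rounds
        w = perm(sbox(w))
        w = [wi ^ ki for wi, ki in zip(w, k)]    # AddKey
    return w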
11.3.2 Writing down equations

Now let us turn to the question of how to write a system of equations that describes the encryption algorithm of Algorithm 11.4. We would like to write equations on the bit level, i.e. over $\mathbb{F}_2$. Denote by $p = (p_0, \ldots, p_{15})$ and $c = (c_0, \ldots, c_{15})$ the plaintext and ciphertext variables that appear as parameters in our system. Then $k = (k_0, \ldots, k_{15})$ are the unknown key variables. Let $x_i = (x_{i,0}, \ldots, x_{i,15})$, $i = 0, 1$, be the variables representing the results of the bitwise key additions, $y_i = (y_{i,0}, \ldots, y_{i,15})$, $i = 1, 2$, be the variables representing the outcomes of the S-Boxes, and $z_i = (z_{i,0}, \ldots, z_{i,15})$, $i = 1, 2$, be the results of the permutation layer. Now we can write the encryption process as the following system:
\[
\begin{cases}
x_0 = p + k,\\
y_i = SBox(x_{i-1}), & i = 1, 2,\\
z_i = Perm(y_i), & i = 1, 2,\\
x_1 = z_1 + k,\\
c = z_2 + k.
\end{cases} \qquad (11.10)
\]
Here SBox and Perm are some polynomial functions that act on the variable vectors according to Algorithm 11.4.

There are three operations performed in the algorithm: bitwise key addition, substitution via four 4-bit S-Boxes, and the permutation. The key addition is represented trivially as above, and one can write it on the bit level as, e.g. in the initial key addition, $x_{0,j} = p_j + k_j$, $0 \le j \le 15$. The permutation Perm also does not pose any problem. According to (11.9), the blocks $z_i = Perm(y_i)$, $i = 1, 2$, above are written as $z_{i,j} = y_{i, Pos^{-1}(j)}$, $0 \le j \le 15$, where $Pos^{-1}(j)$ may be easily computed; in fact in this case we have $Pos^{-1} = Pos$.

An interesting question is how to write equations over $\mathbb{F}_2$ that describe the S-Box transformation SBox. Since SBox is composed of four parallel S-Boxes that perform inversion in $\mathbb{F}_{16}$, we may concentrate on writing equations for one S-Box. Let $a = (a_0, a_1, a_2, a_3)$ be the input bits of the S-Box and $b = (b_0, b_1, b_2, b_3)$ the output bits. The way we defined the S-Box, we should consider $a \ne 0$ as an element of $\mathbb{F}_{16}$ and then compute $b = a^{-1}$ in $\mathbb{F}_{16}$; afterwards we regard b as a vector in $\mathbb{F}_2^4$. The all-zero vector is mapped to the all-zero vector. The describing equation for inversion over $\mathbb{F}_{16}$ for the case $a \ne 0$ is obviously simply $a \cdot b = 1$ or, incorporating the case $a = 0$, $b = a^{14}$. Let us concentrate on the case $a \ne 0$. We would like to rewrite the equation $a \cdot b = 1$ over $\mathbb{F}_{16}$ into a system of equations over $\mathbb{F}_2$ which involves the bit variables $a_i$ and $b_j$. In Example 11.1.22 we have seen what these equations are. But how can we obtain them? Using the identification $\mathbb{F}_{16} \cong \mathbb{F}_2[x]/\langle x^4 + x + 1 \rangle$ we identify the vectors $(a_0, a_1, a_2, a_3)$ and $(b_0, b_1, b_2, b_3)$ from $\mathbb{F}_2^4$ with $a = a_0 + a_1 x + a_2 x^2 + a_3 x^3$ and $b = b_0 + b_1 x + b_2 x^2 + b_3 x^3$. Keeping the rule $x^4 + x + 1 = 0$ in mind, we have to perform the multiplication $a \cdot b$ and collect the coefficients of the powers of x. We have (considering that $x^4 = x + 1$, $x^5 = x^2 + x$, $x^6 = x^3 + x^2$):
\[
\begin{aligned}
a \cdot b ={}& (a_0 + a_1 x + a_2 x^2 + a_3 x^3)(b_0 + b_1 x + b_2 x^2 + b_3 x^3)\\
={}& a_0 b_0 + (a_0 b_1 + a_1 b_0) x + (a_0 b_2 + a_2 b_0 + a_1 b_1) x^2 + (a_0 b_3 + a_3 b_0 + a_1 b_2 + a_2 b_1) x^3\\
& + (a_1 b_3 + a_3 b_1 + a_2 b_2) x^4 + (a_2 b_3 + a_3 b_2) x^5 + a_3 b_3 x^6\\
={}& a_0 b_0 + (a_0 b_1 + a_1 b_0) x + (a_0 b_2 + a_2 b_0 + a_1 b_1) x^2 + (a_0 b_3 + a_3 b_0 + a_1 b_2 + a_2 b_1) x^3\\
& + (a_1 b_3 + a_3 b_1 + a_2 b_2)(x + 1) + (a_2 b_3 + a_3 b_2)(x^2 + x) + a_3 b_3 (x^3 + x^2)\\
={}& (a_0 b_0 + a_1 b_3 + a_3 b_1 + a_2 b_2) + (a_0 b_1 + a_1 b_0 + a_1 b_3 + a_2 b_2 + a_2 b_3 + a_3 b_1 + a_3 b_2) x\\
& + (a_0 b_2 + a_1 b_1 + a_2 b_0 + a_2 b_3 + a_3 b_2 + a_3 b_3) x^2 + (a_0 b_3 + a_1 b_2 + a_2 b_1 + a_3 b_0 + a_3 b_3) x^3.
\end{aligned}
\]
So the vector representation of the product $a \cdot b$ is given by the four coefficients above, while the vector representation of $1 \in \mathbb{F}_{16}$ is (1, 0, 0, 0). By comparing the corresponding vector entries we obtain the following system over $\mathbb{F}_2$ that describes the S-Box:
\[
\begin{cases}
a_0 b_0 + a_1 b_3 + a_3 b_1 + a_2 b_2 = 1,\\
a_0 b_1 + a_1 b_0 + a_1 b_3 + a_2 b_2 + a_2 b_3 + a_3 b_1 + a_3 b_2 = 0,\\
a_0 b_2 + a_1 b_1 + a_2 b_0 + a_2 b_3 + a_3 b_2 + a_3 b_3 = 0,\\
a_0 b_3 + a_1 b_2 + a_2 b_1 + a_3 b_0 + a_3 b_3 = 0.
\end{cases}
\]
In order to fully describe the S-Box we must recall that our bit variables $a_i$ and $b_j$ live in $\mathbb{F}_2$. Therefore the field equations $a_i^2 + a_i = 0$ and $b_i^2 + b_i = 0$ for $0 \le i \le 3$ have to be added.
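These four equations can be checked mechanically. The following plain-Python brute force runs over all nonzero a and verifies them against $b = a^{14}$; the integer encoding (bit i of the integer is the coefficient of $x^i$) is again our convention.

def mul(u, v):                       # multiplication in F2[x]/(x^4+x+1)
    r = 0
    while v:
        if v & 1: r ^= u
        v >>= 1
        u <<= 1
        if u & 0x10: u ^= 0b10011
    return r

for aa in range(1, 16):
    bb = 1
    for _ in range(14):              # b = a^14 = a^(-1) for a != 0
        bb = mul(bb, aa)
    a = [(aa >> i) & 1 for i in range(4)]
    b = [(bb >> i) & 1 for i in range(4)]
    assert (a[0]*b[0] ^ a[1]*b[3] ^ a[3]*b[1] ^ a[2]*b[2]) == 1
    assert (a[0]*b[1] ^ a[1]*b[0] ^ a[1]*b[3] ^ a[2]*b[2]
            ^ a[2]*b[3] ^ a[3]*b[1] ^ a[3]*b[2]) == 0
    assert (a[0]*b[2] ^ a[1]*b[1] ^ a[2]*b[0] ^ a[2]*b[3]
            ^ a[3]*b[2] ^ a[3]*b[3]) == 0
    assert (a[0]*b[3] ^ a[1]*b[2] ^ a[2]*b[1] ^ a[3]*b[0] ^ a[3]*b[3]) == 0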
So now we have obtained exactly the implicit equations from Example 11.1.22. By adding field equations for all participating variables to the equations introduced above, we obtain a full description of the toy cipher under the assumption that no zero-inversion occurs in the S-Boxes; the probability of this event is computed in Exercise 11.3.2. Having a pair (p, c) encrypted with an unknown key k, we may plug the values of p and c into the system (11.10) and try to solve for the unknowns, in particular for the key variables k. Work out Exercise 11.3.3 to see the details.

Going back to Example 11.1.22, we recall that it is possible to obtain explicit relations between the inputs and outputs. Note also that these relations include the case $0 \to 0$ as well, if we remove the equation $(a_0+1)(a_1+1)(a_2+1)(a_3+1) = 0$. These explicit equations are:
\[
\begin{aligned}
b_0 &= a_0a_1a_2 + a_1a_2a_3 + a_0a_2 + a_1a_2 + a_0 + a_1 + a_2 + a_3,\\
b_1 &= a_0a_1a_3 + a_0a_1 + a_0a_2 + a_1a_2 + a_1a_3 + a_3,\\
b_2 &= a_0a_2a_3 + a_0a_1 + a_0a_2 + a_0a_3 + a_2 + a_3,\\
b_3 &= a_1a_2a_3 + a_0a_3 + a_1a_3 + a_2a_3 + a_1 + a_2 + a_3.
\end{aligned}
\]
These equations may be useful in the following approach. By having explicit equations of degree three that describe the S-Boxes, one may obtain equations of degree 3 · 3 = 9 in the key variables only. Indeed, one should do consecutive substitutions from equation to equation in the system (11.10). One proceeds by substituting the corresponding bit variables from $x_0 = p + k$ into $y_1 = SBox(x_0)$, therewith obtaining relations of the form $y_1 = f(p, k)$ of degree three in k (p is assumed to be known, as usual). Then substitute $y_1 = f(p, k)$ into $z_1 = Perm(y_1)$, and then these into $x_1 = z_1 + k$. One obtains relations of the form $x_1 = g(p, k)$, again of degree three in k. Now the next substitution of $x_1 = g(p, k)$ into $y_2 = SBox(x_1)$ increases the degree. Namely, because g is of degree three and SBox is of degree three, we obtain equations $y_2 = h(p, k)$ of degree 3 · 3 = 9. The following substitutions do not increase the degree, since all the remaining equations are linear.

The reason for wanting to obtain such equations in the key variables only is the possibility of using more than one pair of plain-/ciphertext encrypted with the same unknown key. By doing the above process for each such pair, we obtain each time 16 equations of degree 9 in the key variables k (the key stays the same). Note that if we used the implicit representation we could not eliminate the "intermediate" variables, such as $x_0, y_1, z_1$, etc. Moreover, these intermediate variables depend on the parameters p (and c), so these variables are all different for different plaintext/ciphertext pairs. The idea of the latter approach is to keep the number of variables as small as possible, but to increase the number of equations that relate them. In the theory and practice of solving polynomial systems it has been observed that solving overdetermined systems (i.e. with more equations than variables) has a positive effect on the complexity, and thus on the success, of solving the system in question.

Still, degree-9 equations are too hard to attack, and we would like to reduce the degree of our equations. Below we outline a general principle, known as the "meet-in-the-middle" principle, to reduce the degree. As the name suggests, we would like to obtain relations between variables in the middle, rather than at the end, of the encryption. For this we need to invert the second half of the cipher in question. In our case this means inverting the second round.
We have already noted that $Perm = Perm^{-1}$. Also, since the S-Box transformation is inversion in $\mathbb{F}_{16}$ with $0 \to 0$, we have $SBox = SBox^{-1}$. Now, similarly to the above substitution procedure, we do "forward" substitutions
\[ x_0 = p + k \;\to\; y_1 = SBox(x_0) \;\to\; z_1 = Perm(y_1), \]
obtaining at the end equations $z_1 = F(p, k)$ of degree 3, and then "backward" substitutions
\[ z_2 = c + k \;\to\; y_2 = Perm(z_2) \;\to\; x_1 = SBox(y_2) \;\to\; z_1 = x_1 + k, \]
obtaining equations $z_1 = G(c, k)$, also of degree 3. Equating the two, one obtains a system of 16 equations $F(p, k) = G(c, k)$ of degree 3 in the key variables k only. Repeating this process for each plain-/ciphertext pair, one may obtain as many equations (each time a multiple of 16) as one wants. One should not forget, of course, to include the field equations each time to make sure that the values of the variables stay in $\mathbb{F}_2$. Exercise 11.3.4 elaborates on solving using this approach.

11.3.3 General S-Boxes

In the previous section we have seen how to write equations for the S-Box given by the inversion function in the field $\mathbb{F}_{16}$. Although this idea was employed in the AES, a widely used cipher, cf. Section 10.1.4, this is not the standard way to define S-Boxes in block ciphers. Usually S-Boxes are defined via so-called look-up tables, i.e. tables which explicitly prescribe an output value for a given input value. Whereas we used the algebraic structure of the toy cipher in Section 11.3.1 to derive equations, it is not yet clear from that exposition how to write S-Box equations in the more general case of look-up table definitions.

As an illustrating example we will use a 3-bit S-Box. This S-Box is even smaller than the one employed in our toy cipher. Still, it has been proposed in one of the so-called light-weight block ciphers, PrintCIPHER. The look-up table for this S-Box, call it S, is as follows:

x    | 0 1 2 3 4 5 6 7
S(x) | 0 1 3 6 7 4 5 2

Here we use decimal representation for length-3 binary vectors. For example, the S-Box maps the vector 2 = (0, 1, 0) to the vector 3 = (1, 1, 0). One method we can use to obtain explicit relations for the output values is as follows. The S-Box S is a function $S : \mathbb{F}_2^3 \to \mathbb{F}_2^3$, which can be seen as a collection of functions $S_i : \mathbb{F}_2^3 \to \mathbb{F}_2$, $i = 0, 1, 2$, mapping input vectors to the bits at positions 0, 1 and 2, respectively. It is known *** recall?! *** that each function defined over a finite field is actually a polynomial function. Let us find a polynomial describing the function $S_0$. The look-up table in this case is as follows:

x     | 0 1 2 3 4 5 6 7
S0(x) | 0 1 1 0 1 0 1 0

Denote by $x_0, x_1, x_2$ the input bits. We have
\[
\begin{aligned}
S_0(x_0, x_1, x_2) ={}& S_0(0,0,0) \cdot (x_0-1)(x_1-1)(x_2-1) + S_0(1,0,0) \cdot x_0(x_1-1)(x_2-1)\\
& + S_0(0,1,0) \cdot (x_0-1)x_1(x_2-1) + S_0(1,1,0) \cdot x_0x_1(x_2-1)\\
& + S_0(0,0,1) \cdot (x_0-1)(x_1-1)x_2 + S_0(1,0,1) \cdot x_0(x_1-1)x_2\\
& + S_0(0,1,1) \cdot (x_0-1)x_1x_2 + S_0(1,1,1) \cdot x_0x_1x_2.
\end{aligned}
\]
Indeed, by assigning concrete values $(v_0, v_1, v_2)$ to $(x_0, x_1, x_2)$ we obtain $S_0(v_0, v_1, v_2) = S_0(v_0, v_1, v_2) \cdot 1 \cdot 1 \cdot 1$, and in every other summand at least one factor evaluates to zero, canceling that summand. Substituting the concrete values for $S_0$ from the look-up table, we obtain:
\[
\begin{aligned}
S_0(x_0, x_1, x_2) &= x_0(x_1-1)(x_2-1) + (x_0-1)x_1(x_2-1) + (x_0-1)(x_1-1)x_2 + (x_0-1)x_1x_2\\
&= x_1x_2 + x_0 + x_1 + x_2.
\end{aligned}
\]
Analogously we obtain polynomial expressions for $S_1$ and $S_2$:
\[ S_1(x_0, x_1, x_2) = x_0x_2 + x_1 + x_2, \qquad S_2(x_0, x_1, x_2) = x_0x_1 + x_2. \]
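The same interpolation can be phrased as the binary Möbius transform of the truth table. The following plain-Python sketch recovers the algebraic normal forms of $S_0, S_1, S_2$ and prints the three polynomials above; the monomial labels and bit conventions (bit 0 is the least significant) are ours.

S = [0, 1, 3, 6, 7, 4, 5, 2]

def anf(tt):
    # binary Moebius transform of a truth table of length 8;
    # afterwards c[m] is the coefficient of the monomial with support m
    c = list(tt)
    for step in (1, 2, 4):
        for u in range(8):
            if u & step:
                c[u] ^= c[u ^ step]
    return c

monos = ["1", "x0", "x1", "x0x1", "x2", "x0x2", "x1x2", "x0x1x2"]
for i in range(3):
    tt = [(S[x] >> i) & 1 for x in range(8)]
    coeffs = anf(tt)
    print("S%d =" % i, " + ".join(monos[m] for m in range(8) if coeffs[m]))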
Another technique, based on linear algebra, gives an opportunity to obtain different relations between the input and output variables. We are interested in relations of as low degree as possible, and usually these are quadratic relations. Let us demonstrate how to obtain bilinear relations for the S-Box S. Denote $y_i = S_i$, $i = 0, 1, 2$. So we are interested in finding relations of the form
\[ \sum_{0 \le i, j \le 2} a_{ij} x_i y_j = 0. \]
In order to do this, we treat the coefficients $a_{ij}$ as variables. Each assignment of values to $(x_0, x_1, x_2)$ yields a unique assignment of values to $(y_0, y_1, y_2)$ according to the look-up table. Each assignment of $(x_0, x_1, x_2)$, and thus of $(y_0, y_1, y_2)$, provides us with a linear equation in the $a_{ij}$ by plugging the assigned values into the relation $\sum_{0 \le i, j \le 2} a_{ij} x_i y_j = 0$, which should hold for every assignment. We may use the $2^3 = 8$ assignments of the x-variables to get 8 linear equations in the $3 \cdot 3 = 9$ variables $a_{ij}$. Each non-trivial solution of this homogeneous linear system provides us with a non-trivial bilinear relation between the x- and y-variables. Exercise 11.3.5 works out the details of this approach for the example of S. We just mention that, e.g.,
\[ x_0y_2 + x_1y_0 + x_1y_1 + x_2y_1 + x_2y_2 = 0 \]
is one such bilinear relation; in total there exist 2 linearly independent bilinear relations.
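The linear-algebra step is small enough to carry out explicitly. The following plain-Python sketch builds the 8 × 9 coefficient matrix and computes its GF(2) nullspace; the row/column conventions (rows indexed by inputs 0..7, columns by (i, j) in lexicographic order) are our choice.

S = [0, 1, 3, 6, 7, 4, 5, 2]

def bits(v):
    return [(v >> t) & 1 for t in range(3)]

rows = []
for x in range(8):
    xv, yv = bits(x), bits(S[x])
    rows.append([xv[i] * yv[j] for i in range(3) for j in range(3)])

# Gaussian elimination over GF(2), tracking pivot columns
pivots, r = [], 0
for c in range(9):
    piv = next((k for k in range(r, 8) if rows[k][c]), None)
    if piv is None:
        continue
    rows[r], rows[piv] = rows[piv], rows[r]
    for k in range(8):
        if k != r and rows[k][c]:
            rows[k] = [u ^ v for u, v in zip(rows[k], rows[r])]
    pivots.append(c)
    r += 1

print("rank =", r)                        # prints: rank = 7
free = [c for c in range(9) if c not in pivots]
for f in free:                            # one nullspace vector per free column
    sol = [0] * 9
    sol[f] = 1
    for row, p in zip(rows, pivots):
        sol[p] = row[f]
    terms = ["x%dy%d" % divmod(c, 3) for c in range(9) if sol[c]]
    print(" + ".join(terms), "= 0")

Running this reports rank 7 and prints the two independent relations; their sum is exactly the relation quoted above.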
Using exactly the same idea one may find other relations, e.g. general quadratic ones,
\[ \sum_{0 \le i, j \le 2} a_{ij} x_i y_j + \sum_{0 \le i < j \le 2} b_{ij} x_i x_j + \sum_{0 \le i < j \le 2} c_{ij} y_i y_j + \sum_{0 \le i \le 2} d_i x_i + \sum_{0 \le i \le 2} e_i y_i = 0, \]
and others that may be of interest. Clearly, the techniques of this section apply also to other S-Boxes defined by look-up tables. See Exercise 11.3.6 for the treatment of the S-Box coming from the block cipher PRESENT.

11.3.4 Exercises

11.3.1 Prove that in the toy cipher of Section 11.3.1 every ciphertext bit depends on every plaintext bit.

11.3.2 Considering that the inputs to the S-Boxes of the toy cipher are all uniformly distributed and independent random values, what is the probability that no zero-inversion occurs during the encryption?

11.3.3 [CAS] Using Magma and/or Singular and/or SAGE/PolyBoRi write an equation system representing the toy cipher from Section 11.3.1. When defining a base ring for your Gröbner basis computations, think about and experiment with the following questions:
• which ordering of variables works better?
• which monomial ordering is better? Try e.g. lexicographic and degree reverse lexicographic;
• does the result of the computation change when changing the ordering? Why?
• what happens if you remove the field equations?
• try explicit vs. implicit representations for the S-Box.

11.3.4 Work out the meet-in-the-middle approach of Section 11.3.2. For the substitution use the command subst in Singular.

11.3.5 Find the bilinear relations for the S-Box S using the linear algebra approach from Section 11.3.3. Compose a matrix for the homogeneous system as described in the text. The rows are indexed by the assignments of $(x_0, x_1, x_2)$ and the columns by the indices (i, j) of the variables $a_{ij}$ that are the coefficients of $x_i y_j$. Show that the rank of this matrix is 7, so that you can get 2 linearly independent solutions. Write down 2 linearly independent bilinear relations for S.

11.3.6 An S-Box in the block cipher PRESENT is a non-linear transformation of 4-bit vectors. Its look-up table is as follows:

x       | 0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
SBox(x) | 12 5  6  11 9  0  10 13 3  14 15 8  4  7  1  2

• Write down equations that relate the input bits explicitly to the output bits. What is the degree of these equations?
• Find all linearly independent bilinear relations and general quadratic relations between the inputs and outputs.

11.4 Notes
Chapter 12

Coding theory with computer algebra packages

Stanislav Bulygin

In this chapter we give a brief overview of three computer algebra systems: Singular, Magma, and GAP. We concentrate our attention on the features that are useful for this book. For other topics, as well as language semantics and syntax, the reader is referred to the corresponding web-sites.

12.1 Singular

As stated at www.singular.uni-kl.de: "SINGULAR is a Computer Algebra System for polynomial computations with special emphasis on the needs of commutative algebra, algebraic geometry, and singularity theory". In the context of this book, we use the functionality provided for AG-codes (brnoeth.lib), decoding linear codes via polynomial system solving (decodegb.lib), teaching cryptography (crypto.lib, atkins.lib) and Gröbner bases (teachstd.lib). Singular can be downloaded free of charge from http://www.singular.uni-kl.de/download.html for different platforms (Linux, Windows, Mac OS). The current version is 3-0-4. *** to change at the end *** The web-site provides an online manual at http://www.singular.uni-kl.de/Manual/latest/index.htm. Below we provide a list of commands that can be used to work with objects presented in this book, together with short descriptions. Examples of use can be found via the links given below; more examples occur throughout the book at the corresponding places. The functionality mentioned above is provided via libraries, not kernel functions. To load a library in Singular one has to type (brnoeth.lib as an example):

LIB "brnoeth.lib";

brnoeth.lib: Brill-Noether Algorithm, Weierstrass-SG and AG-codes by J. I. Farran Martin and C. Lossen (http://www.singular.uni-kl.de/Manual/latest/sing_1238.htm#SEC1297)

Description: Implementation of the Brill-Noether algorithm for solving the
Riemann-Roch problem and applications to Algebraic Geometry codes. The computation of Weierstrass semigroups is also implemented. The procedures are intended only for plane (singular) curves defined over a prime field of positive characteristic. For more information about the library see the end of the file brnoeth.lib.

Selected procedures:
- NSplaces: computes non-singular places with given degrees
- BrillNoether: computes a vector space basis of the linear system L(D)
- Weierstrass: computes the Weierstrass semigroup of C at P up to m
- AGcode_L: computes the evaluation AG code with divisors G and D
- AGcode_Omega: computes the residual AG code with divisors G and D
- decodeSV: decoding of a word with the basic decoding algorithm
- dual_code: computes the dual code

decodegb.lib: Decoding and minimum distance of linear codes with Gröbner bases by S. Bulygin (...)

Description: In this library we generate several systems used for decoding cyclic codes and finding their minimum distance. Namely, we work with Cooper's philosophy and generalized Newton identities. The original method of quadratic equations is worked out here as well. We also (for comparison) make it possible to work with the system of Fitzgerald-Lax. We provide some auxiliary functions for further manipulations and decoding. For an overview of the methods mentioned above, see the "Decoding codes with GB" section of the manual. For the vanishing ideal computation the algorithm of Farr and Gao is implemented.

Selected procedures:
- sysCRHT: generates the CRHT-ideal as in Cooper's philosophy
- sysNewton: generates the ideal with the generalized Newton identities
- syndrome: computes a syndrome w.r.t. the given check matrix
- sysQE: generates the system of quadratic equations for decoding
- errorRand: inserts random errors in a word
- randomCheck: generates a random check matrix
- mindist: computes the minimum distance of a code
- decode: decoding of a word using the system of quadratic equations
- decodeRandom: a procedure for manipulation with random codes
- decodeCode: a procedure for manipulation with the given code
- vanishId: computes the vanishing ideal for the given set of points
crypto.lib: Procedures for teaching cryptography by G. Pfister (http://www.singular.uni-kl.de/Manual/late...)

Description: The library contains procedures to compute the discrete logarithm, primality tests, and factorization, including elliptic curve methods. The library is intended to be used for teaching purposes but not for serious computations. A sufficiently high printlevel allows one to control each step, thus illustrating the algorithms at work.

atkins.lib: Procedures for teaching elliptic curve cryptography (primality test) by S. Steidel (http://www.singular.uni-kl.de/Manual/latest/sing_1281.htm#SEC1340)

Description: The library contains auxiliary procedures to compute the elliptic curve primality test of Atkin, as well as Atkin's test itself. The library is intended to be used for teaching purposes but not for serious computations. A sufficiently high printlevel allows one to control each step, thus illustrating the algorithms at work.

teachstd.lib: Procedures for teaching standard bases by G.-M. Greuel (http://www.singular.uni-kl.de/Manual/latest/sing_1344.htm#SEC1403)

Description: The library is intended to be used for teaching purposes, but not for serious computations. A sufficiently high printlevel allows one to control each step, thus illustrating the algorithms at work. The procedures are implemented exactly as described in the book 'A SINGULAR Introduction to Commutative Algebra' by G.-M. Greuel and G. Pfister (Springer 2002).

Selected procedures:
- tail: tail of f
- leadmonomial: leading monomial as poly (also for vectors)
- monomialLcm: lcm of monomials m and n as poly (also for vectors)
- spoly: s-polynomial of f [symmetric form]
- NFMora: normal form of i w.r.t. the Mora algorithm
- prodcrit: test for the product criterion
- chaincrit: test for the chain criterion
- standard: standard basis of an ideal/module

12.2 Magma

"Magma is a large, well-supported software package designed to solve computationally hard problems in algebra, number theory, geometry and combinatorics" – this is the formulation given at the official web-site http://magma.maths.usyd.edu.au/magma/. The current version is 2.15-7. *** to change at the end *** In this book we use illustrations with Magma for different coding constructions, general as well as more specific, such as AG-codes; also some machinery for working with algebraic curves, as well as a few procedures for cryptography. Although Magma is a non-commercial system, it is still not free of charge: one has to purchase a license to work with it. Details can be found at http://magma.maths.usyd.edu.au/magma/Ordering/ordering.shtml. Still, one can try to run simple Magma code in the so-called "Magma-Calculator"
(http://magma.maths.usyd.edu.au/calc/). *** All examples and exercises run successfully in this calculator. *** The online help system for Magma can be found at http://magma.maths.usyd.edu.au/magma/htmlhelp/MAGMA.htm. Next we briefly describe some procedures that come in handy when dealing with objects from this book. We list only a few commands to give a flavor of the functionality; one can get a lot more from the manual.

12.2.1 Linear codes

The full list of commands with descriptions can be found at http://magma.maths.usyd.edu.au/magma/htmlhelp/text1667.htm
- LinearCode: constructs a linear code as a vector subspace
- PermutationCode: permutes positions in a code
- RepetitionCode: constructs a repetition code
- RandomLinearCode: constructs a random linear code
- CyclicCode: constructs a cyclic code
- ReedMullerCode: constructs a Reed-Muller code
- HammingCode: constructs a Hamming code
- BCHCode: constructs a BCH code
- ReedSolomonCode: constructs a Reed-Solomon code
- GeneratorMatrix: yields the generator matrix
- ParityCheckMatrix: yields the parity check matrix
- Dual: constructs the dual code
- GeneratorPolynomial: yields the generator polynomial of the given cyclic code
- CheckPolynomial: yields the check polynomial of the given cyclic code
- Random: yields a random codeword
- Syndrome: yields the syndrome of a word
- Distance: yields the distance between words
- MinimumDistance: computes the minimum distance of a code
- WeightEnumerator: computes the weight enumerator of a code
- ProductCode: constructs a product code from the given two
- SubfieldSubcode: constructs a subfield subcode
- McEliecesAttack: runs a basic attack on the McEliece cryptosystem
- GriesmerBound: provides the Griesmer bound for the given parameters
- SpherePackingBound: provides the sphere packing bound for the given parameters
- BCHBound: provides the BCH bound for the given cyclic code
- Decode: decodes a code with standard methods
- MattsonSolomonTransform: computes the Mattson-Solomon transform
- AutomorphismGroup: computes the automorphism group of the given code

12.2.2 AG-codes

The full list of commands with descriptions can be found at http://magma.maths.usyd.edu.au/magma/htmlhelp/text1686.htm
- AGCode: constructs an AG-code
- AGDualCode: constructs a dual AG-code
- HermitianCode: constructs a Hermitian code
- GoppaDesignedDistance: returns the designed Goppa distance
- AGDecode: basic algorithm for decoding an AG-code

12.2.3 Algebraic curves

The full list of commands with descriptions can be found at http://magma.maths.usyd.edu.au/magma/htmlhelp/text1686.htm
- Curve: constructs a curve
- CoordinateRing: computes the coordinate ring of the given curve with Gröbner basis techniques
- JacobianMatrix: computes the Jacobian matrix
- IsSingular: tests whether the given curve has singularities
- Genus: computes the genus of a curve
- EllipticCurve: constructs an elliptic curve
- AutomorphismGroup: computes the automorphism group of the given curve
- FunctionField: computes the function field of the given curve
- Valuation: computes the valuation of the given function w.r.t. the given place
- GapNumbers: yields gap numbers
- Places: computes places of the given curve
- RiemannRochSpace: computes the Riemann-Roch space
- Basis: computes a sequence containing a basis of the Riemann-Roch space L(D) of the divisor D
- CryptographicCurve: given the finite field, computes an elliptic curve E over that field together with a point P on E such that the order of P is a large prime and the pair (E, P) satisfies the standard security conditions for being resistant to MOV and Anomalous attacks

12.3 GAP

In this section we consider the GAP computational discrete algebra system, "a system for computational discrete algebra, with particular emphasis on Computational Group Theory". GAP stands for Groups, Algorithms, Programming, http://www.gap-system.org. Although the primary concern of GAP is computations with groups, it also provides coding-oriented functionality via the GUAVA package, http://www.gap-system.org/Packages/guava.html. GAP can be downloaded for free from http://www.gap-system.org/Download/index.html. The current GAP version is 4.4.12, the current GUAVA version is 3.9. *** to change at the end *** As before, we only list here some procedures to give an idea of which things can be done with GUAVA/GAP. The package GUAVA is loaded as follows:

LoadPackage("guava");

The online manual for GUAVA can be found at http://www.gap-system.org/Manuals/pkg/guava3.9/htm/chap0.html

Selected procedures:
- RandomLinearCode: constructs a random linear code
- GeneratorMatCode: constructs a linear code via its generator matrix
- CheckMatCode: constructs a linear code via its parity check matrix
- HammingCode: constructs a Hamming code
- ReedMullerCode: constructs a Reed-Muller code
- GeneratorPolCode: constructs a cyclic code via its generator polynomial
- CheckPolCode: constructs a cyclic code via its check polynomial
- RootsCode: constructs a cyclic code via the roots of the generator polynomial
- BCHCode: constructs a BCH code
- ReedSolomonCode: constructs a Reed-Solomon code
- CyclicCodes: returns all cyclic codes of a given length
- EvaluationCode: constructs an evaluation code
- AffineCurve: sets a framework for working with an affine curve
- GoppaCodeClassical: constructs a classical geometric Goppa code
- OnePointAGCode: constructs a one-point AG-code
- PuncturedCode: constructs the punctured code of the given code
- DualCode: constructs the dual code of the given code
- UUVCode: constructs a code via the (u|u + v)-construction
- LowerBoundMinimumDistance: yields the best available lower bound on the minimum distance
- UpperBoundMinimumDistance: yields the best available upper bound on the minimum distance
- MinimumDistance: yields the minimum distance of the given code
- WeightDistribution: yields the weight distribution of the given code
- Decode: general decoding procedure

12.4 Sage

The Sage framework provides an opportunity to use the strengths of many open-source computer algebra systems (among them Singular and GAP) for developing effective code for solving different mathematical problems. The general framework is made possible through a Python interface. Sage is intended as an open-source alternative to commercial systems such as Magma, Maple, Mathematica, and Matlab. Sage provides tools for a wide variety of algebraic and combinatorial objects, among other things. For example, functionality for coding theory and cryptography is present, as well as functionality for working with algebraic curves. The web-page of the project is http://www.sagemath.org/. One can download Sage from http://www.sagemath.org/download.html. The reference manual for Sage is available at http://www.sagemath.org/doc/reference/. Now we briefly describe some commands that may come in handy while working with this book.

12.4.1 Coding Theory

Manual available at http://www.sagemath.org/doc/reference/coding.html. The coding functionality of Sage has a lot in common with that of GAP/GUAVA; in fact, for many commands Sage uses the implementations available from GAP.

Selected procedures:
- LinearCodeFromCheckMatrix: constructs a linear code via its parity check matrix
- RandomLinearCode: constructs a random linear code
- CyclicCodeFromGeneratingPolynomial: constructs a cyclic code via its generator polynomial
- QuadraticResidueCode: constructs a quadratic residue cyclic code
- ReedSolomonCode: constructs a Reed-Solomon code
- gilbert_lower_bound: computes the lower bound due to Gilbert
- permutation_automorphism_group: computes the permutation automorphism group of the given code
- weight_distribution: computes the weight distribution of a code

12.4.2 Cryptography

The manual is available at http://www.sagemath.org/doc/reference/cryptography.html.
Selected procedures/classes:
- SubstitutionCryptosystem: defines a substitution cryptosystem/cipher
- VigenereCryptosystem: defines the Vigenère cryptosystem/cipher
- lfsr_sequence: produces an output of the given LFSR
- SR: returns a small scale variant of the AES

12.4.3 Algebraic curves

The manual is available at http://www.sagemath.org/doc/reference/plane_curves.html.
Selected procedures/classes:
- EllipticCurve_finite_field: constructs an elliptic curve over a finite field
- trace_of_frobenius: computes the trace of Frobenius of an elliptic curve
- cardinality: computes the number of rational points of an elliptic curve
- HyperellipticCurve_finite_field: constructs a hyperelliptic curve over a finite field

12.5 Coding with computer algebra

12.5.1 Introduction

..............

12.5.2 Error-correcting codes

Example 12.5.1 Let us construct Example 2.1.6 for n = 5 using GAP/GUAVA. First, we need to define the list of codewords:
M := Z(2)^0 * [ [1,1,0,0,0],[1,0,1,0,0],[1,0,0,1,0],[1,0,0,0,1],[0,1,1,0,0],
[0,1,0,1,0],[0,1,0,0,1],[0,0,1,1,0],[0,0,1,0,1],[0,0,0,1,1] ];
In GAP, Z(q) is a primitive element of the field GF(q). So multiplying the list M by Z(2)^0 makes sure that the elements belong to GF(2). Now construct the code:
C:=ElementsCode(M,"Example 2.1.6 for n=5",GF(2));
a (5,10,1..5)1..5 Example 2.1.6 for n=5 over GF(2)
We can compute the minimum distance and the size of C as follows:
MinimumDistance(C);
2
Size(C);
10
Now the information on the code is updated:
Print(C);
a (5,10,2)1..5 Example 2.1.6 for n=5 over GF(2)
The block 1..5 gives a range for the covering radius of C. We treat it later in Section 3.2.24.

Example 12.5.2 Let us construct the Hamming [7, 4, 3] code in GAP/GUAVA and Magma. Both systems have a built-in command for this. In GAP:
C:=HammingCode(3,GF(2));
a linear [7,4,3]1 Hamming (3,2) code over GF(2)
Here the syntax is HammingCode(r,GF(q)), where r is the redundancy and GF(q) is the defining alphabet. We can extract a generator matrix as follows:
M:=GeneratorMat(C);;
Display(M);
 1 1 1 . . . .
 1 . . 1 1 . .
 . 1 . 1 . 1 .
 1 1 . 1 . . 1
Two semicolons indicate that we do not want the output of a command to be printed on the screen. Display provides a nice way to represent objects. In Magma we do it like this:
C:=HammingCode(GF(2),3);
C;
[7, 4, 3] Hamming code (r = 3) Linear Code over GF(2)
Generator matrix:
[1 0 0 0 1 1 0]
[0 1 0 0 0 1 1]
[0 0 1 0 1 1 1]
[0 0 0 1 1 0 1]
So here the order of the arguments is reversed.

Example 12.5.3 Let us construct the [7, 4, 3] binary Hamming code via its parity check matrix. In GAP/GUAVA we proceed as follows:
H1:=Z(2)^0*[[1,0,0],[0,1,0],[0,0,1],[1,1,0],[1,0,1],[0,1,1],[1,1,1]];;
H:=TransposedMat(H1);;
C:=CheckMatCode(H,GF(2));
a linear [7,4,1..3]1 code defined by check matrix over GF(2)
We can now check the defining property of the check matrix:
G:=GeneratorMat(C);;
Display(G*H1);
 . . .
 . . .
 . . .
 . . .
We can also compute syndromes:
c:=CodewordNr(C,7);
[ 1 1 0 0 1 1 0 ]
Syndrome(C,c);
[ 0 0 0 ]
e:=Codeword("1000000");;
Syndrome(C,c+e);
[ 1 0 0 ]
So we have taken the 7th codeword in the list of codewords of C and shown that its syndrome is 0. Then we introduced an error at the first position: the syndrome is non-zero. In Magma one can generate codes only by vector subspace generators. So the way to generate a code via its parity check matrix is to use the Dual command, see Example 12.5.4. Here we construct the Hamming code as in Example 12.5.2 and then proceed as above.
C:=HammingCode(GF(2),3);
H:=ParityCheckMatrix(C);
H;
[1 0 0 1 0 1 1]
[0 1 0 1 1 1 0]
[0 0 1 0 1 1 1]
G:=GeneratorMatrix(C);
G*Transpose(H);
[0 0 0]
[0 0 0]
[0 0 0]
[0 0 0]
Syndromes are handled as follows:
c:=Random(C);
Syndrome(c,C);
(0 0 0)
V:=AmbientSpace(C);
e:=V![1,0,0,0,0,0,0];
r:=c+e;
Syndrome(r,C);
(1 0 0)
Here we have taken a random codeword of C and computed its syndrome. Now, V is the ambient space of C, so the error vector e lives there, which is indicated by the coercion prefix V!.

Example 12.5.4 Let us start again with the binary Hamming code and see how dual codes are constructed in GAP and Magma. In GAP we have:
C:=HammingCode(3,GF(2));;
CS:=DualCode(C);
a linear [7,3,4]2..3 dual code
G:=GeneratorMat(C);;
H:=GeneratorMat(CS);;
Display(G*TransposedMat(H));
 . . .
 . . .
 . . .
 . . .
The same can be done in Magma. Moreover, we can make sure that the dual of the Hamming code is the predefined simplex code:
C:=HammingCode(GF(2),3);
CS:=Dual(C);
G:=GeneratorMatrix(CS);
S:=SimplexCode(3);
H:=ParityCheckMatrix(S);
G*Transpose(H);
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]

Example 12.5.5 Let us work out some examples in GAP and Magma that illustrate the notions of permutation equivalence and the permutation automorphism group. As a model example we take, as usual, the binary Hamming code. Next we show how equivalence can be checked in GAP/GUAVA:
C:=HammingCode(3,GF(2));;
p:=(1,2,3)(4,5,6,7);;
CP:=PermutedCode(C,p);
a linear [7,4,3]1 permuted code
IsEquivalent(C,CP);
true
So the codes C and CP are equivalent. We may compute a permutation that brings C to CP:
CodeIsomorphism( C, CP );
(4,5)
Interestingly, CP can be obtained from C by the transposition (4,5) alone. Let us check that this is indeed true:
CP2:=PermutedCode(C,(4,5));;
Display(GeneratorMat(CP)*TransposedMat(CheckMat(CP2)));
 . . .
 . . .
 . . .
 . . .
So indeed the codes CP and CP2 are the same. The permutation automorphism group can be computed via:
AG:=AutomorphismGroup(C);
Group([ (1,2)(5,6), (2,4)(3,5), (2,3)(4,6,5,7), (4,5)(6,7), (4,6)(5,7) ])
Size(AG);
168
So the permutation automorphism group of C has 5 generators and 168 elements. In Magma there is no immediate way to define permuted codes. One can still compute the permutation automorphism group, which is called the permutation group there:
C:=HammingCode(GF(2),3);
PermutationGroup(C);
Permutation group acting on a set of cardinality 7
Order = 168 = 2^3 * 3 * 7
(3, 6)(5, 7)
(1, 3)(4, 5)
(2, 3)(4, 7)
(3, 7)(5, 6)

12.5.3 Code constructions and bounds

Example 12.5.6 In this example we go through the above constructions in GAP and Magma. As a model code we consider the [15, 11, 3] binary Hamming code.
C:=HammingCode(4,GF(2));;
CP:=PuncturedCode(C);
a linear [14,11,2]1 punctured code
CP5:=PuncturedCode(C,[11,12,13,14,15]);
a linear [10,10,1]0 punctured code
So PuncturedCode(C) punctures C at the last position, and there is also a possibility to give the positions explicitly. The same syntax applies to the shortening construction.
CS:=ShortenedCode(C);
a linear [14,10,3]2 shortened code
CS5:=ShortenedCode(C,[11,12,13,14,15]);
a linear [10,6,3]2..3 shortened code
Next we extend a code and check the property described in Proposition 3.1.11.
CE:=ExtendedCode(C);
a linear [16,11,4]2 extended code
CEP:=PuncturedCode(CE);;
C=CEP;
true
A code C can be extended i times via ExtendedCode(C,i). Next we take the shortened code, augment it, and lengthen it.
CSA:=AugmentedCode(CS);;
d:=MinimumDistance(CSA);;
CSA;
a linear [14,11,2]1 code, augmented with 1 word(s)
CSL:=LengthenedCode(CS);
a linear [15,11,2]1..3 code, lengthened with 1 column(s)
By default the augmentation is done with the all-one vector. One can specify the vector v to augment with explicitly by AugmentedCode(C,v). One can also extend i times in the lengthening construction by LengthenedCode(C,i). Now we perform the same operations in Magma.
C:=HammingCode(GF(2),4);
CP:=PunctureCode(C, 15);
CP5:=PunctureCode(C, {11..15});
CS:=ShortenCode(C, 15);
CS5:=ShortenCode(C, {11..15});
CE:=ExtendCode(C);
CEP:=PunctureCode(CE,16);
C eq CEP;
true
CSA:=AugmentCode(CS);
CSL:=LengthenCode(CS);
One can also expurgate a code as follows:
CExp:=ExpurgateCode(C);
CExp;
[15, 10, 4] Cyclic Linear Code over GF(2)
Generator matrix:
[1 0 0 0 0 0 0 0 0 0 1 0 1 0 1]
[0 1 0 0 0 0 0 0 0 0 1 1 1 1 1]
[0 0 1 0 0 0 0 0 0 0 1 1 0 1 0]
[0 0 0 1 0 0 0 0 0 0 0 1 1 0 1]
[0 0 0 0 1 0 0 0 0 0 1 0 0 1 1]
[0 0 0 0 0 1 0 0 0 0 1 1 1 0 0]
[0 0 0 0 0 0 1 0 0 0 0 1 1 1 0]
[0 0 0 0 0 0 0 1 0 0 0 0 1 1 1]
[0 0 0 0 0 0 0 0 1 0 1 0 1 1 0]
[0 0 0 0 0 0 0 0 0 1 0 1 0 1 1]
We see that in fact the code CExp has more structure: it is cyclic, i.e. a cyclic shift of every codeword is again a codeword, cf. Chapter 7. One can also expurgate the codewords from a given list L by ExpurgateCode(C,L). In GAP this is done via ExpurgatedCode(C,L).

Example 12.5.7 Let us demonstrate how the direct product is constructed in GAP and Magma. We construct the direct product of the binary [15, 11, 3] Hamming code with itself. In GAP we do:
C:=HammingCode(4,GF(2));;
CProd:=DirectProductCode(C,C);
a linear [225,121,9]15..97 direct product code
In Magma:
C:=HammingCode(GF(2),4);
CProd:=DirectProduct(C,C);

Example 12.5.8 Now we go through some of the above sum constructions using GAP and Magma. As model codes for the summands we take the binary [7, 4, 3] and [15, 11, 3] Hamming codes. In GAP the direct sum and the (u|u + v)-construction are implemented.
C1:=HammingCode(3,GF(2));;
C2:=HammingCode(4,GF(2));;
C:=DirectSumCode(C1,C2);
a linear [22,15,3]2 direct sum code
CUV:=UUVCode(C1,C2);
a linear [22,15,3]2..3 U|U+V construction code
In Magma, along with the above commands, a command for juxtaposition is defined. The syntax of the commands is as follows:
C1:=HammingCode(GF(2),3);
C2:=HammingCode(GF(2),4);
C:=DirectSum(C1,C2);
CJ:=Juxtaposition(C2,C2); // [30, 11, 6] Cyclic Linear Code over GF(2)
CPl:=PlotkinSum(C1,C2);
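For reference: if C_1 is an [n_1, k_1, d_1] code and C_2 an [n_2, k_2, d_2] code, the direct sum is an [n_1 + n_2, k_1 + k_2, min(d_1, d_2)] code, and for two codes of the same length n the (u|u + v)-construction gives a [2n, k_1 + k_2, min(2d_1, d_2)] code. This matches both [22, 15, 3] outputs above; note that GUAVA accepts summands of different lengths in UUVCode, apparently by padding the shorter code with zeros. A quick sanity check of the direct sum in the GAP session above (a sketch; it assumes C, C1 and C2 are still defined):
MinimumDistance(C) = Minimum(MinimumDistance(C1), MinimumDistance(C2));
true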
Example 12.5.9 Let us construct a concatenated code in GAP and Magma. We concatenate the Hamming [17, 15, 3] code over F_16 as outer code with the binary [7, 4, 3] Hamming code as inner code. In GAP we do the following:
O:=[HammingCode(2,GF(16))];;
I:=[HammingCode(3,GF(2))];;
C:=BZCodeNC(O,I);
a linear [119,60,9]0..119 Blokh Zyablov concatenated code
In GAP there is a possibility to perform a generalized construction using many outer and inner codes; therefore the syntax uses square brackets to define lists. In Magma we proceed as below:
O:=HammingCode(GF(16),2);
I:=HammingCode(GF(2),3);
C:=ConcatenatedCode(O,I);

Example 12.5.10 Magma provides a way to construct an MDS code with parameters [q + 1, k, q − k + 2] over F_q, given a prime power q and a positive integer k. An example follows:
C:=MDSCode(GF(16),10); //[17, 10, 8] Cyclic Linear Code over GF(2^4)

Example 12.5.11 GAP and Magma provide commands for computing lower and upper bounds on the size and minimum distance of codes, as well as stored tables of best known codes. Let us first take a look at how this functionality is handled in GAP. The command UpperBoundSingleton(n,d,q) gives an upper bound on the size of codes of length n and minimum distance d over F_q. This also applies to non-linear codes. E.g.:
UpperBoundSingleton(25,10,2);
65536
In the same way one can compute the Hamming, Plotkin, and Griesmer bounds:
UpperBoundHamming(25,10,2);
2196
UpperBoundPlotkin(25,10,2);
1280
UpperBoundGriesmer(25,10,2);
512
Note that GAP does not require qd > (q − 1)n as in Theorem 3.2.29; if qd > (q − 1)n does not hold, shortening is applied. One can compute an upper bound which is the result of several bounds implemented in GAP:
UpperBound(25,10,2);
1280
Since the Griesmer bound is not in the list of bounds with which UpperBound works, we obtain a larger value. Analogously one can compute lower bounds:
LowerBoundGilbertVarshamov(25,10,2);
16
Here 16 = 2^4 is a lower bound on the size of a binary code of length 25 with minimum distance at least 10. One can access built-in tables (although somewhat outdated) as follows:
Display(BoundsMinimumDistance(50,25,GF(2)));
rec( n := 50,
k := 25, q := 2,
references := rec(
EB3 := [ %A Y. Edel J. Bierbrauer, %T Inverting Construction Y1, %R preprint, %D 1997 ],
Ja := [ %A D.B. Jaffe, %T Binary linear codes: new results on nonexistence, %D 1996, %O http://www.math.unl.edu/~djaffe/codes/code.ps.gz ] ),
construction := false,
lowerBound := 10,
lowerBoundExplanation := [ Lb(50,25)=10, by taking subcode of:, Lb(50,27)=10, by extending:, Lb(49,27)=9, reference: EB3 ],
upperBound := 12,
upperBoundExplanation := [ Ub(50,25)=12, by a one-step Griesmer bound from:, Ub(37,24)=6, by considering shortening to:, Ub(28,15)=6, otherwise extending would contradict:, Ub(29,15)=7, reference: Ja ] )
In Magma one can compute the bounds in the following way:
GriesmerBound(GF(2),25,10):
PlotkinBound(GF(2),25,10);
PlotkinBound(GF(2),25,10);
^
Runtime error in 'PlotkinBound': Require n <= 2*d for even weight binary case
PlotkinBound(GF(2),100,51);
34
SingletonBound(GF(2),25,10):
SpherePackingBound(GF(2),25,10):
GilbertVarshamovBound(GF(2),25,10);
9
GilbertVarshamovLinearBound(GF(2),25,10);
16
Note that the result for the Plotkin bound is different from the one computed by GAP, since Magma implements an improved bound treated in Remark 3.2.32. The colon at the end of a line suppresses the output. Access to the built-in database for given n and d is done as follows:
BDLCLowerBound(GF(2),50,10);
27
BDLCUpperBound(GF(2),50,10);
29
The corresponding commands for given n, k and given k, d start with the prefixes BKLC and BLLC, respectively.

12.5.4 Weight enumerator

Example 12.5.12 This example illustrates some functionality available for weight distribution computations in GAP and Magma. In GAP one can compute the weight enumerator of a code, as well as the weight enumerator of its dual via the MacWilliams identity.
C:=HammingCode(4,GF(2));;
CodeWeightEnumerator(C);
x_1^15+35*x_1^12+105*x_1^11+168*x_1^10+280*x_1^9+435*x_1^8+435*x_1^7+
280*x_1^6+168*x_1^5+105*x_1^4+35*x_1^3+1
CodeMacWilliamsTransform(C);
15*x_1^8+1
One interesting feature available in GAP is the drawing of weight histograms. It works as follows:
WeightDistribution(C);
[ 1, 0, 0, 35, 105, 168, 280, 435, 435, 280, 168, 105, 35, 0, 0, 1 ]
WeightHistogram(C);
[GAP prints an ASCII bar chart of the weight distribution over the weights 0 to 15; the tallest bars, of height 435, are at weights 7 and 8]
In Magma the analogous functionality looks as follows:
C:=HammingCode(GF(2),4);
WeightEnumerator(C);
$.1^15 + 35*$.1^12*$.2^3 + 105*$.1^11*$.2^4 + 168*$.1^10*$.2^5 +
280*$.1^9*$.2^6 + 435*$.1^8*$.2^7 + 435*$.1^7*$.2^8 + 280*$.1^6*$.2^9 +
168*$.1^5*$.2^10 + 105*$.1^4*$.2^11 + 35*$.1^3*$.2^12 + $.2^15
W:=WeightDistribution(C);
MacWilliamsTransform(15,11,2,W);
[ 0, 1, 8, 15 ]
That is, the dual code has one word of weight 0 and 15 words of weight 8. So WeightEnumerator(C) actually returns the homogeneous weight enumerator with $.1 and $.2 as variables.
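These outputs can be checked by hand against the MacWilliams identity, which expresses the homogeneous weight enumerator of the dual of a code C of length n over F_q as
W_{C^⊥}(X, Y) = (1/|C|) W_C(X + (q − 1)Y, X − Y).
The dual of the [15, 11, 3] binary Hamming code is the [15, 4, 8] simplex code, whose 15 nonzero words all have weight 8; this is exactly what the GAP output 15*x_1^8+1 and the Magma transform above display.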
12.5.5 Codes and related structures

12.5.6 Complexity and decoding

Example 12.5.13 In GAP/GUAVA and Magma, for a general linear code the idea of Definition 2.4.10 (1) is employed. In GAP such a decoding goes as follows:
C:=RandomLinearCode(15,5,GF(2));;
MinimumDistance(C);
5
# can correct 2 errors
c:=11101*C; # encoding
c in C;
true
r:=c+Codeword("01000000100000");
c1:=Decodeword(C,r);;
c1 = c;
true
m:=Decode(C,r); # obtain the initial message word
[ 1 1 1 0 1 ]
One can also obtain the syndrome table, i.e. a table of pairs (coset leader, syndrome), via SyndromeTable(C). The same idea is realized in Magma as follows:
C:=RandomLinearCode(GF(2),15,5); // can be a [15,5,5] code, correcting 2 errors
c:=Random(C);
e:=AmbientSpace(C) ! [0,1,0,0,0,0,0,0,1,0,0,0,0,0,0];
r:=c+e;
result,c1:=Decode(C,r);
result; // did decoding succeed?
true
c1 eq c;
true
There are more advanced decoding methods for general linear codes. More on that in Section 10.6.

12.5.7 Cyclic codes

Example 12.5.14 We have already constructed finite fields and worked with them in GAP and Magma. Let us take one more look at those notions and show some new ones. In GAP we handle finite fields as follows:
G:=GF(2^5);;
a:=PrimitiveRoot(G);
Z(2^5)
DefiningPolynomial(G);
x_1^5+x_1^2+Z(2)^0
a^5+a^2+Z(2)^0; # check
0*Z(2)
Pretty much the same functionality is provided in Magma:
G:=GF(2^5);
a:=PrimitiveElement(G);
DefiningPolynomial(G);
$.1^5+$.1^2+1
b:=G.1;
a eq b;
true
// define explicitly
P<x>:=PolynomialRing(GF(2));
p:=x^5+x^2+1;
F<z>:=ext<GF(2)|p>;
F;
Finite field of size 2^5

Example 12.5.15 Minimal polynomials are computed in GAP as follows:
a:=PrimitiveUnityRoot(2,17);;
MinimalPolynomial(GF(2),a);
x_1^8+x_1^7+x_1^6+x_1^4+x_1^2+x_1+Z(2)^0
In Magma it is done analogously:
a:=RootOfUnity(17,GF(2));
MinimalPolynomial(a,GF(2));
x^8 + x^7 + x^6 + x^4 + x^2 + x + 1

Example 12.5.16 Some examples of how to compute cyclotomic polynomials in GAP and Magma follow. In GAP:
CyclotomicPolynomial(GF(2),10);
x_1^4+x_1^3+x_1^2+x_1+Z(2)^0
In Magma it is done as follows:
CyclotomicPolynomial(10);
$.1^4 - $.1^3 + $.1^2 - $.1 + 1
Note that in Magma the cyclotomic polynomial is always defined over Q.

Example 12.5.17 Let us construct cyclic codes via roots in GAP and Magma. In GAP/GUAVA we proceed as follows:
C:=GeneratorPolCode(h,17,GF(2));; # h is from Example 6.1.41
CR:=RootsCode(17,[1],2);;
MinimumDistance(CR);;
CR;
a cyclic [17,9,5]3..4 code defined by roots over GF(2)
C=CR;
true
C2:=GeneratorPolCode(g,17,GF(2));; # g is from Example 6.1.41
CR2:=RootsCode(17,[3],2);;
C2=CR2;
true
So we first generated a cyclic code whose generator polynomial has a (predefined) primitive root of unity as a root. Then we took the first element that is not in the cyclotomic class of 1, namely 3, and constructed a cyclic code having the cube of the primitive root of unity as a root of its generator polynomial. Note that these results are in accordance with Example 12.5.15. We can also compute the number of all cyclic codes of a given length, e.g.:
NrCyclicCodes(17,GF(2));
8
In Magma we do the construction as follows:
a:=RootOfUnity(17,GF(2));
C:=CyclicCode(17,[a],GF(2));

Example 12.5.18 We can compute the Mattson-Solomon transform in Magma. This is done as follows:
F<x> := PolynomialRing(SplittingField(x^17-1));
f:=x^15+x^3+x;
A:=MattsonSolomonTransform(f,17);
A;
$.1^216*x^16 + $.1^177*x^15 + $.1^214*x^14 + $.1^99*x^13 + $.1^181*x^12 +
$.1^173*x^11 + $.1^182*x^10 + $.1^198*x^9 + $.1^108*x^8 + $.1^107*x^7 +
$.1^218*x^6 + $.1^91*x^5 + $.1^54*x^4 + $.1^109*x^3 + $.1^27*x^2 +
$.1^141*x + 1
InverseMattsonSolomonTransform(A,17) eq f;
true
So for the construction we need a field that contains a primitive n-th root of unity. We can also compute the inverse transform.

12.5.8 Polynomial codes

Example 12.5.19 Now we describe constructions of Reed-Solomon codes in GAP/GUAVA and Magma. In GAP we proceed as follows:
C:=ReedSolomonCode(31,5);
a cyclic [31,27,5]3..4 Reed-Solomon code over GF(32)
The construction of the extended code is somewhat different from the one in Definition 8.1.6: GUAVA's ExtendedReedSolomonCode(n,d) first constructs ReedSolomonCode(n-1,d-1) and then extends it. The code is defined over GF(n), so n should be a prime power.
CE:=ExtendedReedSolomonCode(31,5);
a linear [31,27,5]3..4 extended Reed Solomon code over GF(31)
Generalized Reed-Solomon codes are handled as follows:
R:=PolynomialRing(GF(2^5));;
a:=Z(2^5);;
L:=List([1,2,3,6,7,10,12,16,20,24,25,29],i->Z(2^5)^i);;
CG:=GeneralizedReedSolomonCode(L,4,R);;
So we define the polynomial ring R and the list of points L. Note that such a construction corresponds to the construction from Definition 8.1.10 with b = 1. In Magma we proceed as follows:
C:=ReedSolomonCode(31,5);
a:=PrimitiveElement(GF(2^5));
A:=[a^i:i in [1,2,3,6,7,10,12,16,20,24,25,29]];
B:=[a^i:i in [1,2,1,2,1,2,1,2,1,2,1,2]];
CG:=GRSCode(A,B,4);
So Magma gives an opportunity to construct generalized Reed-Solomon codes with an arbitrary b whose entries are non-zero.

Example 12.5.20 In Magma one can compute subfield subcodes. This is done as follows:
a:=RootOfUnity(17,GF(2));
C:=CyclicCode(17,[a],GF(2^8)); // splitting field size 2^8
CSS:=SubfieldSubcode(C);
C2:=CyclicCode(17,[a],GF(2));
C2 eq CSS;
true
CSS_4:=SubfieldSubcode(C,GF(4)); // [17, 13, 4] code over GF(2^2)
By default the prime subfield is taken for the construction.

Example 12.5.21 In Magma we can compute a trace code as is shown below (the corresponding GUAVA computation is slow):
C:=HammingCode(GF(16),3);
CT:=Trace(C);
CT:Minimal;
[273, 272] Linear Code over GF(2)
We can also specify a subfield to restrict to by giving it as a second parameter of Trace.

Example 12.5.22 In GAP/GUAVA, in order to construct an alternant code we proceed as follows:
a:=Z(2^5);;
P:=List([1,2,3,6,7,10,12,16,20,24,25,29],i->a^i);;
B:=List([1,2,1,2,1,2,1,2,1,2,1,2],i->a^i);;
CA:=AlternantCode(2,B,P,GF(2));
a linear [12,5,3..4]3..6 alternant code over GF(2)
By providing an extension field as the last parameter of AlternantCode, one constructs an extension code (as per Definition 8.2.1) of the one defined over the base field (in our example GF(2)), rather than the restriction construction as in Definition 8.3.1. In Magma one proceeds as follows:
a:=PrimitiveElement(GF(2^5));
A:=[a^i:i in [1,2,3,6,7,10,12,16,20,24,25,29]];
B:=[a^i:i in [1,2,1,2,1,2,1,2,1,2,1,2]];
CA:=AlternantCode(A,B,2);
CG:=GRSCode(A,B,2);
CGS:=SubfieldSubcode(Dual(CG));
CA eq CGS;
true
Here one can add a desired subfield for the restriction as in Definition 8.3.1 by giving it as another parameter at the end of the parameter list of AlternantCode.

Example 12.5.23 In GAP/GUAVA one can construct a Goppa code as follows:
x:=Indeterminate(GF(2),"x");;
g:=x^3+x+1;
C:=GoppaCode(g,15);
a linear [15,3,7]6..8 classical Goppa code over GF(2)
So the Goppa code C is constructed over the field where the polynomial g is defined. There is also a possibility to provide the list of non-roots L explicitly via GoppaCode(g,L). In Magma one needs to provide the list L explicitly.
P<x>:=PolynomialRing(GF(2^5));
G:=x^3+x+1;
a:=PrimitiveElement(GF(2^5));
L:=[a^i : i in [0..30]];
C:=GoppaCode(L,G);
C:Minimal;
[31, 16, 7] Goppa code (r = 3) Linear Code over GF(2)
The polynomial G should be defined in the polynomial ring over the extension field. The command C:Minimal only displays the description of C; no generator matrix is printed.

Example 12.5.24 Now we show how binary Reed-Muller codes can be constructed in GAP/GUAVA, and we also check the duality property from the previous proposition.
u:=5;; m:=7;;
C:=ReedMullerCode(u,m);;
C2:=ReedMullerCode(m-u-1,m);;
CD:=DualCode(C);
CD = C2;
true
In Magma one can do the above analogously:
u:=5; m:=7;
C:=ReedMullerCode(u,m);
C2:=ReedMullerCode(m-u-1,m);
CD:=Dual(C);
CD eq C2;
true

12.5.9 Algebraic decoding
Chapter 13

Bézout's theorem and codes on plane curves

Ruud Pellikaan

In this chapter affine and projective plane curves are defined. Bézout's theorem on the number of points in the intersection of two plane curves is proved. A class of codes from plane curves is introduced and the parameters of these codes are determined. Divisors and rational functions on plane curves will be discussed.

13.1 Affine and projective space

lines, planes, quadrics, coordinate transformations, pictures

13.2 Plane curves

Let F be a field and F̄ its algebraic closure. By an affine plane curve over F we mean the set of points (x, y) ∈ F̄^2 such that F(x, y) = 0, where F ∈ F[X, Y]. Here F = 0 is called the defining equation of the curve. The F-rational points of the curve with defining equation F = 0 are the points (x, y) ∈ F^2 such that F(x, y) = 0. The degree of the curve is the degree of F. Two plane curves with defining equations F = 0 and G = 0 have a component in common with defining equation H = 0 if F and G have a nontrivial factor H in common, that is, F = BH and G = AH for some A, B ∈ F[X, Y], and the degree of H is not zero. A curve with defining equation F = 0, F ∈ F[X, Y], is called irreducible if F is not divisible by any G ∈ F[X, Y] such that 0 < deg(G) < deg(F), and absolutely irreducible if F is irreducible when considered as a polynomial in F̄[X, Y].
The partial derivative with respect to X of a polynomial F = \sum f_{ij} X^i Y^j is defined by
F_X = \sum i f_{ij} X^{i-1} Y^j.
The partial derivative with respect to Y is defined similarly. A point (x, y) on an affine curve with equation F = 0 is singular if F_X(x, y) = F_Y(x, y) = 0, where F_X and F_Y are the partial derivatives of F with respect to X and Y, respectively. A regular point of a curve is a nonsingular point of the curve. A regular point (x, y) on the curve has a well-defined tangent line to the curve with equation
F_X(x, y)(X − x) + F_Y(x, y)(Y − y) = 0.

Example 13.2.1 The curve with defining equation X^2 + Y^2 = 0 can be considered over any field. The polynomial X^2 + Y^2 is irreducible in F_3[X, Y] but reducible in F_9[X, Y] and F_5[X, Y]. The point (0, 0) is an F-rational point of the curve over any field F, and it is the only singular point of this curve if the characteristic of F is not two.

A projective plane curve of degree d with defining equation F = 0 over F is the set of points (x : y : z) ∈ P^2(F̄) such that F(x, y, z) = 0, where F ∈ F[X, Y, Z] is a homogeneous polynomial of degree d. Let F = \sum f_{ij} X^i Y^j ∈ F[X, Y] be a polynomial of degree d. The homogenization F* of F is an element of F[X, Y, Z] and is defined by
F* = \sum f_{ij} X^i Y^j Z^{d−i−j}.
Then F*(X, Y, Z) = Z^d F(X/Z, Y/Z). If F = 0 defines an affine plane curve of degree d, then F* = 0 is the equation of the corresponding projective curve. A point at infinity of the affine curve with equation F = 0 is a point of the projective plane in the intersection of the line at infinity and the projective curve with equation F* = 0. So the points at infinity on the curve are all points (x : y : 0) ∈ P^2(F̄) such that F*(x, y, 0) = 0.
A projective plane curve is irreducible, respectively absolutely irreducible, if its defining homogeneous polynomial is irreducible, respectively absolutely irreducible. A point (x : y : z) on a projective curve with equation F = 0 is singular if F_X(x, y, z) = F_Y(x, y, z) = F_Z(x, y, z) = 0, and regular otherwise. Through a regular point (x : y : z) of the curve passes the tangent line with equation
F_X(x, y, z)X + F_Y(x, y, z)Y + F_Z(x, y, z)Z = 0.
If F ∈ F[X, Y, Z] is a homogeneous polynomial of degree d, then Euler's equation
X F_X + Y F_Y + Z F_Z = dF
holds. So the two definitions of the tangent line to a curve in the affine and the projective plane are consistent with each other. A curve is called regular or nonsingular if all its points are regular. In Corollary 13.3.13 it will be shown that a regular projective plane curve is absolutely irreducible.

Remark 13.2.2 Let F be a polynomial in F[X, Y] of degree d. Suppose that the field F has at least d + 1 elements. Then there exists an affine change of coordinates
such that the coefficients of U^d and V^d in F(U, V) are 1. This is seen as follows. The projective curve with the defining equation F* = 0 intersects the line at infinity in at most d points. Then there exist two F-rational points P and Q on the line at infinity and not on the curve. Choose a projective transformation of coordinates which transforms P and Q into (1 : 0 : 0) and (0 : 1 : 0), respectively. This change of coordinates leaves the line at infinity invariant and gives a polynomial F(U, V) such that the coefficients of U^d and V^d are not zero. An affine transformation can now transform these coefficients into 1. If for instance F = X^2 Y + X Y^2 ∈ F_4[X, Y] and α is a primitive element of F_4, then X = U + αV and Y = αU + V gives F(U, V) = U^3 + V^3. Similarly, for all polynomials F, G ∈ F[X, Y] of degree l and m there exists an affine change of coordinates such that the coefficients of V^l and V^m in F(U, V) and G(U, V), respectively, are 1.

Example 13.2.3 The Fermat curve F_m is a projective plane curve with defining equation
X^m + Y^m + Z^m = 0.
The partial derivatives of X^m + Y^m + Z^m are mX^{m−1}, mY^{m−1}, and mZ^{m−1}. So considered as a curve over the finite field F_q, it is regular if m is relatively prime to q.

Example 13.2.4 Suppose q = r^2. The Hermitian curve H_r over F_q is defined by the equation
U^{r+1} + V^{r+1} + 1 = 0.
The corresponding homogeneous equation is U^{r+1} + V^{r+1} + W^{r+1} = 0. Hence it has r + 1 points at infinity and it is the Fermat curve F_m over F_q with r = m − 1. The conjugate of a ∈ F_q over F_r is obtained by ā = a^r. So the equation can also be written as U Ū + V V̄ + W W̄ = 0. This looks like equating a Hermitian form over the complex numbers to zero and explains the terminology.
We will see in Section 3 that for certain constructions of codes on curves it is convenient to have exactly one point at infinity. We will give a transformation such that the new equation of the Hermitian curve has this property. Choose an element b ∈ F_q such that b^{r+1} = −1. There are exactly r + 1 of these, since q = r^2. Let P = (1 : b : 0). Then P is a point of the Hermitian curve. The tangent line at P has equation U + b^r V = 0. Multiplying with b gives the equation V = bU. Substituting V = bU in the defining equation of the curve gives that W^{r+1} = 0. So P is the only intersection point of the Hermitian curve and the tangent line at P. New homogeneous coordinates are chosen such that this tangent line becomes the line at infinity. Let X_1 = W, Y_1 = U and Z_1 = bU − V. Then the curve has homogeneous equation
X_1^{r+1} = b^r Y_1^r Z_1 + b Y_1 Z_1^r − Z_1^{r+1}
in the coordinates X_1, Y_1 and Z_1. Choose an element a ∈ F_q such that a^r + a = −1. There are r of these. Let X = X_1, Y = b Y_1 + a Z_1 and Z = Z_1. Then the curve has homogeneous equation
X^{r+1} = Y^r Z + Y Z^r
with respect to X, Y and Z. Hence the Hermitian curve has affine equation
X^{r+1} = Y^r + Y
with respect to X and Y. This last equation has (0 : 1 : 0) as the only point at infinity. To see that the number of affine F_q-rational points is r + (r + 1)(r^2 − r) = r^3, one argues as follows. The right-hand side of the equation X^{r+1} = Y^r + Y is the trace from F_q to F_r. The first r in the formula for the number of points corresponds to the elements of F_r; these are exactly the elements of F_q with zero trace. The remaining term corresponds to the elements of F_q with nonzero trace, since the equation X^{r+1} = β, β ∈ F_r^*, has exactly r + 1 solutions in F_q.

Example 13.2.5 The Klein curve has homogeneous equation
X^3 Y + Y^3 Z + Z^3 X = 0.
More generally we define the curve K_m by the equation
X^m Y + Y^m Z + Z^m X = 0.
Suppose that m^2 − m + 1 is relatively prime to q. The partial derivatives of the left-hand side of the equation are mX^{m−1} Y + Z^m, mY^{m−1} Z + X^m and mZ^{m−1} X + Y^m. Let (x : y : z) be a singular point of the curve K_m. If m is divisible by the characteristic, then x^m = y^m = z^m = 0. So x = y = z = 0, a contradiction. If m and q are relatively prime, then x^m y = −m y^m z = m^2 z^m x. So
(m^2 − m + 1) z^m x = x^m y + y^m z + z^m x = 0.
Therefore z = 0 or x = 0, since m^2 − m + 1 is relatively prime to the characteristic. But z = 0 implies x^m = −m y^{m−1} z = 0. Furthermore y^m = −m z^{m−1} x. So x = y = z = 0, which is a contradiction. Similarly x = 0 leads to a contradiction. Hence K_m is nonsingular if gcd(m^2 − m + 1, q) = 1.

13.3 Bézout's theorem

The principal theorem of algebra says that a polynomial of degree m in one variable with coefficients in a field has at most m zeros. If the field is algebraically closed and the zeros are counted with multiplicities, then the total number of zeros is equal to m. Bézout's theorem is a generalization of the principal theorem of algebra from one to several variables. It can be stated and proved in any number of variables, but only the two-variable case will be treated here, that is to say the case of plane curves. First we recall some well-known notions from commutative algebra.
Let R be a commutative ring with a unit. An ideal I in R is called a prime ideal if I ≠ R and for all f, g ∈ R: if fg ∈ I, then f ∈ I or g ∈ I. Let F be a field, let F be a polynomial in F[X, Y] which is not a constant, and let I be the ideal in F[X, Y] generated by F. Then I is a prime ideal if and only if F is irreducible.
Let R be a commutative ring with a unit. A nonzero element f of R is called a zero divisor if fg = 0 for some g ∈ R, g ≠ 0. The ring R is called an integral domain if it has no zero divisors. Let S be a commutative ring with a unit and let I be an ideal in S. The factor ring of S modulo I is denoted by S/I. Then I is a prime ideal if and only if S/I is an integral domain.
Let R be an integral domain. Define the relation ∼ on the set of pairs {(f, g) | f, g ∈ R, g ≠ 0} by (f_1, g_1) ∼ (f_2, g_2) if and only if there exists an h ∈ R, h ≠ 0, such that f_1 g_2 h = g_1 f_2 h. This is an equivalence relation. Its classes are called fractions. The class of (f, g) is denoted by f/g, and f is called the numerator and g the denominator. The field of fractions or quotient field of R consists of all fractions f/g with f, g ∈ R and g ≠ 0, and is denoted by Q(R). This is indeed a field, with addition and multiplication defined by
f_1/g_1 + f_2/g_2 = (f_1 g_2 + f_2 g_1)/(g_1 g_2) and (f_1/g_1) · (f_2/g_2) = (f_1 f_2)/(g_1 g_2).

Example 13.3.1 The quotient field of the integers Z is the rationals Q. The quotient field of the ring of polynomials F[X_1, . . . , X_m] is called the field of rational functions (in m variables) and is denoted by F(X_1, . . . , X_m).

Remark 13.3.2 If R is a commutative ring with a unit, then matrices with entries in R and the determinant of a square matrix can be defined as when R is a field. The usual properties of matrix addition and multiplication hold. If moreover R is an integral domain, then a square matrix M of size n has determinant zero if and only if there exists a nonzero r ∈ R^n such that rM = 0. This is seen by considering the same statement over the quotient field Q(R), where it is true, and clearing denominators.

Furthermore we define an algebraic construction, called the resultant of two polynomials, which measures whether they have a factor in common.

Definition 13.3.3 Let R be a commutative ring with a unit. Then R[Y] is the ring of polynomials in one variable Y with coefficients in R. Let F and G be two polynomials in R[Y] of degrees l and m, respectively. Then F = \sum_{i=0}^{l} f_i Y^i and G = \sum_{j=0}^{m} g_j Y^j, where f_i, g_j ∈ R for all i, j. Define the Sylvester matrix Sylv(F, G) of F and G as the square matrix of size l + m given by

             | f_0 f_1 ... f_l  0  ...  0  |
             |  0  f_0 f_1 ... f_l ...  0  |
             |  .   .   .   .   .   .   .  |
Sylv(F, G) = |  0  ...  0  f_0 f_1 ... f_l |
             | g_0 g_1 ... g_m  0  ...  0  |
             |  0  g_0 g_1 ... g_m ...  0  |
             |  .   .   .   .   .   .   .  |
             |  0  ...  0  g_0 g_1 ... g_m |
The first m rows consist of the cyclic shifts of the first row (f_0, f_1, . . . , f_l, 0, . . . , 0) and the last l rows consist of the cyclic shifts of row m + 1, which is (g_0, g_1, . . . , g_m, 0, . . . , 0). The determinant of Sylv(F, G) is called the resultant of F and G and is denoted by Res(F, G).

Proposition 13.3.4 If R is an integral domain and F and G are elements of R[Y], then Res(F, G) = 0 if and only if F and G have a nontrivial common factor.

Proof. If F and G have a nontrivial common factor, then F = BH and G = AH for some A, B and H in R[Y], where H has nonzero degree. So AF = BG for some A and B with deg(A) < m = deg(G) and deg(B) < l = deg(F). Write A = \sum a_i Y^i, F = \sum f_j Y^j, B = \sum b_r Y^r and G = \sum g_s Y^s. Rewrite the equation AF − BG = 0 as the system of equations
\sum_{i+j=k} a_i f_j − \sum_{r+s=k} b_r g_s = 0 for k = 0, 1, . . . , l + m − 1,
or as a matrix equation (a, −b) Sylv(F, G) = 0, where a = (a_0, a_1, . . . , a_{m−1}) and b = (b_0, b_1, . . . , b_{l−1}). Hence the rows of the matrix Sylv(F, G) are dependent in case F and G have a common factor, and so its determinant is zero. Thus we have shown that if F and G have a nontrivial common factor, then Res(F, G) = 0. The converse is also true; this is proved by reversing the argument.

Corollary 13.3.5 If F is an algebraically closed field and F and G are elements of F[Y], then Res(F, G) = 0 if and only if F and G have a common zero in F.

After this introduction on the resultant, we are in a position to prove a weak form of Bézout's theorem.

Proposition 13.3.6 Two plane curves of degrees l and m that do not have a component in common intersect in at most lm points.

Proof. A special case of Bézout is m = 1: a line which is not a component of a curve of degree l intersects this curve in at most l points. Stated differently, suppose that F is a polynomial in X and Y with coefficients in a field F and has degree l, and L is a nonconstant linear form; if F and L have more than l common zeros, then L divides F in F[X, Y]. A more general special case is when F is a product of linear terms. So if one of the curves is a union of lines and the other curve does not contain any of these lines as a component, then the number of points in the intersection is at most lm. This follows from the above special case. The third special case is G = XY − 1 and F arbitrary. Then
the first curve can be parameterized by X = T, Y = 1/T; substituting this in F gives a polynomial in T and 1/T of degree at most l; multiplying by T^l gives a polynomial of degree at most 2l, and therefore the intersection of these two curves has at most 2l points. It is not possible to continue like this, that is to say by parameterizing the second curve by rational functions in T: X = X(T) and Y = Y(T). The proof of the general case uses elimination theory. Suppose that we have two equations in two variables of degrees l and m, respectively, and we eliminate one variable. Then we get a polynomial in one variable of degree at most lm having as zeros the first coordinates of the common zeros of the two original polynomials. In geometric terms, we have two curves of degree l, respectively m, in the affine plane, and we project the points of the intersection to a line. If we can show that we get at most lm points on this line and we can choose the projection in such a way that no two points of the intersection project to one point on the line, then we are done.
We may assume that the field is algebraically closed, since by a common zero (x, y) of F and G we mean a pair (x, y) ∈ F̄^2 such that F(x, y) = G(x, y) = 0. Let F and G be polynomials in the variables X and Y of degrees l and m, respectively, with coefficients in a field F, which do not have a common factor in F[X, Y]. Then they do not have a nontrivial common factor in R[Y], where R = F[X], so Res(F, G) is not zero by Proposition 13.3.4. By Remark 13.2.2 we may assume that, after an affine change of coordinates, F and G are monic of degrees l and m, respectively, as polynomials in Y with coefficients in F[X]. Hence F = \sum_{i=0}^{l} f_i Y^i and G = \sum_{j=0}^{m} g_j Y^j, where f_i and g_j are elements of F[X] of degree at most l − i and m − j, respectively, and f_l = g_m = 1.
The square matrix Sylv(F, G) of size l + m has entries in F[X]. Taking the determinant gives the resultant Res(F, G), which is an element of R = F[X], that is to say a polynomial in X with coefficients in F. Its degree is at most lm. This can be seen by homogenizing F and G: then F* = \sum_{i=0}^{l} f_i^* Y^i, where f_i^* is a homogeneous polynomial in X and Z of degree l − i, and similarly for G*. The determinant D(X, Z) of the corresponding Sylvester matrix is homogeneous of degree lm, since D(TX, TZ) = T^{lm} D(X, Z). This is seen by dividing the rows and columns of the matrix by appropriate powers of T.
We claim that the zeros of the polynomial Res(F, G) are exactly the projections of the points in the intersection of the curves defined by the equations F = 0 and G = 0. Thus we claim that x is a zero of Res(F, G) if and only if there exists an element y ∈ F such that (x, y) is a common zero of F and G. Let F(x) and G(x) be the polynomials in F[Y] obtained from F and G by substituting x for X. The polynomials F(x) and G(x) again have degrees l and m in Y, since we assumed that F and G are monic polynomials in Y of degrees l and m, respectively. Now
Res(F(x), G(x)) = Res(F, G)(x),
that is to say, it makes no difference whether we substitute x for X first and take the resultant afterwards, or take the resultant first and make the substitution afterwards. The degrees of F and G have not diminished after the substitution. Let (x, y) be a common zero of F and G. Then y is a common zero of F(x) and G(x), so Res(F(x), G(x)) = 0 by Corollary 13.3.5, and therefore Res(F, G)(x) = 0. For the proof of the converse statement, one reads the above proof backwards.
Now we know that Res(F, G) is not identically zero and has degree at most lm, and therefore Res(F, G) has at most lm zeros. There is still a slight problem: it may happen that for a fixed zero x of Res(F, G) there exists more than one y such that (x, y) is a zero of F and G. This occasionally does happen. We will show that after a suitable coordinate change this does not occur. For every zero x of Res(F, G) there are at most min{l, m} elements y such that (x, y) is a zero of F and G. Therefore F and G have at most min{l^2 m, l m^2} zeros in common, hence the collection of lines incident with two distinct points of these zeros is finite. Hence we can find a point P that is not in the union of this finite collection of lines. Furthermore there exists a line L incident with P and not incident with any of the common zeros of F and G. In fact almost every point P and line L incident with P have the above-mentioned properties. Choose homogeneous coordinates such that P = (0 : 1 : 0) and L is the line at infinity. If P_1 = (x, y_1) and P_2 = (x, y_2) are zeros of F and G, then the line with equation X − xZ = 0 through the corresponding points (x : y_1 : 1) and (x : y_2 : 1) in the projective plane also contains P. This contradicts the choice made for P. So for every zero x of Res(F, G) there exists exactly one y such that (x, y) is a zero of F and G. Hence F and G have at most lm common zeros. This finishes the proof of the weak form of Bézout's theorem.

There are several reasons why the number of points in the intersection could be less than lm: F may not be algebraically closed; points of the intersection may lie at infinity; and multiplicities may occur. Take for instance F = X^2 − Y^2 + 1, G = Y and F = F_3. Then the two points of the intersection lie in F_9^2 and not in F_3^2. Let H = Y − 1. Then the two lines defined by G and H have no intersection in the affine plane. The homogenized polynomials G* = G and H* = Y − Z define curves in the projective plane which have exactly (1 : 0 : 0) in their intersection. Finally, the line with equation H = 0 is the tangent line to the conic defined by F at the point (0, 1), and this point has to be counted with multiplicity 2. In order to define the multiplicity of a point of intersection we have to localize the ring of polynomials.

Definition 13.3.7 Let P = (x, y) ∈ F^2. Let F[X, Y]_P be the subring of the field of fractions F(X, Y) consisting of all fractions A/B such that A, B ∈ F[X, Y] and B(P) ≠ 0. The ring F[X, Y]_P is called the localization of F[X, Y] at P.

We explain the use of localization for the definition of the multiplicity by analogy with the multiplicity of a zero of a polynomial in one variable. Let F = (X − a)^e G, where a ∈ F, F, G ∈ F[X] and G(a) ≠ 0. Then a is a zero of F with multiplicity e. The dimension of F[X]/(F) as a vector space over F is equal to the degree
of F. But the element G is invertible in the localization F[X]_a of F[X] at a. So the ideal generated by F in F[X]_a is equal to the ideal generated by (X − a)^e. Hence the dimension of F[X]_a/(F) over F is equal to e.

Definition 13.3.8 Let P be a point in the intersection of two affine curves X and Y defined by F and G, respectively. The intersection multiplicity I(P; X, Y) of X and Y at P is defined by
I(P; X, Y) = dim F[X, Y]_P /(F, G).

Without proof we state several properties of the intersection multiplicity. After a projective change of coordinates it may be assumed that the point P = (0, 0) is the origin of the affine plane. There is a unique way to write F as the sum of its homogeneous parts F = F_d + F_{d+1} + · · · + F_l, where F_i is homogeneous of degree i, and F_d ≠ 0 and F_l ≠ 0. The homogeneous polynomial F_d defines a union of lines over F̄, which are called the tangent lines of X at P. The point P is a regular point if and only if d = 1; the tangent line to X at P is defined by F_1 = 0 if d = 1. Similarly G = G_e + G_{e+1} + · · · + G_m. If the tangent lines of X at P are distinct from the tangent lines of Y at P, then the intersection multiplicity at P is equal to de. In particular, if P is a regular point of both curves and the tangent lines are distinct, then d = e = 1 and the intersection multiplicity is 1. The Hermitian curve over F_q, with q = r^2, has the property that every line in the projective plane with coefficients in F_q intersects the Hermitian curve in r + 1 distinct points or in exactly one point with multiplicity r + 1.

Definition 13.3.9 A cycle is a formal sum \sum m_P P of points of the projective plane P^2(F̄) with integer coefficients m_P such that m_P is nonzero for only finitely many P. The degree of a cycle is defined by deg(\sum m_P P) = \sum m_P. If the projective plane curves X and Y are defined by the equations F = 0 and G = 0, respectively, then the intersection cycle X · Y is defined by
X · Y = \sum_P I(P; X, Y) P.
Proposition 13.3.6 implies that this is indeed a cycle, that is to say, there are only finitely many points P such that I(P; X, Y) is not zero.

Example 13.3.10 Consider the curve X with homogeneous equation
X^a Y^c + Y^{b+c} Z^{a−b} + X^d Z^{a+c−d} = 0
with d < b < a. Let L be the line with equation X = 0. The intersection of L with X consists of the points P = (0 : 0 : 1) and Q = (0 : 1 : 0). The origin of the affine plane is mapped to P under the mapping (x, y) → (x : y : 1). The affine equation of the curve is
X^a Y^c + Y^{b+c} + X^d = 0.
The intersection multiplicity at P of X and L is equal to the dimension of F[X, Y]_0/(X, X^a Y^c + Y^{b+c} + X^d), which is b + c. The origin of the affine plane is mapped to Q under the mapping (x, z) → (x : 1 : z). The affine equation of the curve now becomes
X^a + Z^{a−b} + X^d Z^{a+c−d} = 0.
The intersection multiplicity at Q of X and L is equal to the dimension of F[X, Z]_0/(X, X^a + Z^{a−b} + X^d Z^{a+c−d}), which is a − b. Therefore
X · L = (b + c)P + (a − b)Q.
Let M be the line with equation Y = 0, let N be the line with equation Z = 0, and let R = (1 : 0 : 0). One shows similarly that
X · M = dP + (a + c − d)R and X · N = aQ + cR.

We state now as a fact the following strong version of Bézout's theorem.

Theorem 13.3.11 If X and Y are projective plane curves of degrees l and m, respectively, that do not have a component in common, then
deg(X · Y) = lm.

Corollary 13.3.12 Two projective plane curves of positive degree have a point in common.

Corollary 13.3.13 A regular projective plane curve is absolutely irreducible.

Proof. If F = GH is a factorization of F with factors of positive degree, we get F_X = G_X H + G H_X by the product or Leibniz rule for the partial derivative. So F_X is an element of the ideal generated by G and H, and similarly for the other two partial derivatives. Hence the set of common zeros of F_X, F_Y, F_Z and F contains the set of common zeros of G and H. The intersection of the curves with equations G = 0 and H = 0 is not empty by Corollary 13.3.12, since G and H have positive degrees. Therefore the curve has a singular point.

Remark 13.3.14 Notice that the assumption that the curve is a projective plane curve is essential. The equation X^2 Y − X = 0 defines a regular affine plane curve, but it is clearly reducible. However, one gets immediately from Corollary 13.3.13 that if F = 0 is an affine plane curve and the homogenization F* defines a regular projective curve, then F is absolutely irreducible. The affine curve with equation X^2 Y − X = 0 has the points (1 : 0 : 0) and (0 : 1 : 0) at infinity, and (0 : 1 : 0) is a singular point.
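To illustrate Definition 13.3.3 and the projection argument in the proof of Proposition 13.3.6 with a small worked example, take F = Y^2 − X and G = Y − X, monic in Y of degrees l = 2 and m = 1 with coefficients in F[X]. The Sylvester matrix has rows (−X, 0, 1), (−X, 1, 0) and (0, −X, 1), so
Res(F, G) = −X · 1 + 1 · X^2 = X(X − 1).
Its zeros x = 0 and x = 1 are exactly the first coordinates of the common zeros (0, 0) and (1, 1) of F and G, and there are lm = 2 intersection points, in accordance with Bézout's theorem.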
13.3.1 Another proof of Bézout's theorem by the footprint

13.4 Codes on plane curves

Let G be an irreducible element of F_q[X, Y] of degree m. Let P_1, . . . , P_n be n distinct points in the affine plane over F_q which lie on the plane curve defined by the equation G = 0. So G(P_j) = 0 for all j = 1, . . . , n. Consider the code
E(l) = {(F(P_1), . . . , F(P_n)) | F ∈ F_q[X, Y], deg(F) ≤ l}.
Let V_l be the vector space of all polynomials in the two variables X, Y, with coefficients in F_q and of degree at most l. Let P = {P_1, . . . , P_n}. Consider the evaluation map
ev_P : F_q[X, Y] → F_q^n
defined by ev_P(F) = (F(P_1), . . . , F(P_n)). This is a linear map that has E(l) as the image of V_l.

Proposition 13.4.1 Let k be the dimension and d the minimum distance of the code E(l). Suppose lm < n. Then d ≥ n − lm and
k = \binom{l+2}{2} if l < m, and k = lm + 1 − \binom{m−1}{2} if l ≥ m.

Proof. The monomials of the form X^α Y^β with α + β ≤ l form a basis of V_l. Hence V_l has dimension \binom{l+2}{2}. Let F ∈ V_l. If G is a factor of F, then the corresponding codeword ev_P(F) is zero. Conversely, if ev_P(F) = 0, then the curves with equations F = 0 and G = 0 have degrees at most l and m, respectively, and have the n points P_1, . . . , P_n in their intersection. Bézout's theorem and the assumption lm < n imply that F and G have a common factor. But G is irreducible. Therefore F is divisible by G. So the kernel of the evaluation map, restricted to V_l, is equal to G V_{l−m}, which is zero if l < m. Hence k = \binom{l+2}{2} if l < m, and
k = \binom{l+2}{2} − \binom{l−m+2}{2} = lm + 1 − \binom{m−1}{2}
if l ≥ m. The same argument with Bézout's theorem gives that a nonzero codeword has at most lm zeros, and therefore has weight at least n − lm. This shows that d ≥ n − lm.

Example 13.4.2 Conics, reducible and irreducible .............................

Remark 13.4.3 If F_1, . . . , F_k is a basis of V_l modulo G V_{l−m}, then (F_i(P_j) | 1 ≤ i ≤ k, 1 ≤ j ≤ n) is a generator matrix of E(l). So it is a parity check matrix for C(l), the dual of E(l). The minimum distance d^⊥ of C(l) is equal to the minimal number of dependent columns of this matrix. Hence for all t < d^⊥ and every subset Q of
P consisting of t distinct points P_{i_1}, . . . , P_{i_t}, the corresponding k × t submatrix must have maximal rank t. Let L_l = V_l/G V_{l−m}. Then the evaluation map ev_Q induces a surjective map from L_l to F_q^t. The kernel is the space of all functions F ∈ V_l which are zero at the points of Q, modulo G V_{l−m}; we denote it by L_l(Q). So dim(L_l(Q)) = k − t. Conversely, the dimension of L_l(Q) is at least k − t for all t-subsets Q of P. But in order to get a bound for d^⊥, we have to know that dim(L_l(Q)) = k − t for all t < d^⊥. The theory developed so far is not sufficient to get such a bound. The theorem of Riemann-Roch in the theory of algebraic curves gives an answer to this question. See Section ??. Section ?? gives another, more elementary, solution to this problem. Notice that the following inequality holds for the codes E(l):
k + d ≥ n + 1 − g, where g = (m − 1)(m − 2)/2.
In Section 7 we will see that g is the (arithmetic) genus. In Sections 3-6 the role of g will be played by the number of gaps of the (Weierstrass) semigroup of a point at infinity.

13.5 Conics, arcs and Segre

Proposition 13.5.1
m(3, q) = q + 1 if q is odd, and m(3, q) = q + 2 if q is even.

Proof. We have seen in Example ?? that m(3, q) is at least q + 1 for all q. If q is even, then m(3, q) is at least q + 2 by Example 3.2.12.
***Segre***
***Finite geometry and the Problems of Segre***

13.6 Cubic plane curves

13.6.1 Elliptic curves

13.6.2 The addition law on elliptic curves

13.6.3 Number of rational points on an elliptic curve

Manin's proof, Chahal

13.6.4 The discrete logarithm on elliptic curves

13.7 Quartic plane curves

13.7.1 Flexes and bitangents

13.7.2 The Klein quartic

13.8 Divisors

In the following, X is an irreducible smooth projective curve over an algebraically closed field F.
Definition 13.8.1 A divisor is a formal sum D = \sum_{P∈X} n_P P, with n_P ∈ Z and n_P = 0 for all but a finite number of points P. The support of a divisor is the set of points with nonzero coefficient. A divisor D is called effective if all coefficients n_P are non-negative (notation D ≥ 0). The degree deg(D) of the divisor D is \sum n_P.

Definition 13.8.2 Let X and Y be projective plane curves defined by the equations F = 0 and G = 0, respectively. Then the intersection divisor X · Y is defined by
X · Y = \sum I(P; X, Y) P,
where I(P; X, Y) is the intersection multiplicity of Definition ??. Bézout's theorem tells us that X · Y is indeed a divisor and that its degree is lm if the degrees of X and Y are l and m, respectively.

Let v_P = ord_P be the discrete valuation defined for functions on X in Definition ??.

Definition 13.8.3 If f is a rational function on X, not identically 0, we define the divisor of f to be
(f) = \sum_{P∈X} v_P(f) P.
So, in a sense, the divisor of f is a bookkeeping device that tells us where the zeros and poles of f are and what their multiplicities and orders are.

Theorem 13.8.4 The degree of the divisor of a rational function is 0.

Proof. Let X be a projective curve of degree l. Let f be a rational function on the curve X. Then f is represented by a quotient A/B of two homogeneous polynomials of the same degree, say m. Let Y and Z be the curves defined by the equations A = 0 and B = 0, respectively. Then v_P(f) = I(P; X, Y) − I(P; X, Z), since f = a/b = (a/h^m)(b/h^m)^{−1}, where H is a homogeneous linear form representing h such that H(P) ≠ 0. Hence
(f) = X · Y − X · Z.
So (f) is indeed a divisor and its degree is zero, since it is the difference of two intersection divisors of the same degree lm.

Example 13.8.5 Look at the curve of Example ??. We saw that f = x/(y + z) has a pole of order 2 in Q = (0 : 1 : 1). The line L with equation X = 0 intersects the curve in three points, namely P_1 = (0 : α : 1), P_2 = (0 : 1 + α : 1) and Q. So X · L = P_1 + P_2 + Q. The line M with equation Y = 0 intersects the curve in three points, namely P_3 = (1 : 0 : 1), P_4 = (α : 0 : 1) and P_5 = (1 + α : 0 : 1). So X · M = P_3 + P_4 + P_5. The line N with equation Y + Z = 0 intersects the curve only in Q. So X · N = 3Q. Hence
(x/(y + z)) = P_1 + P_2 − 2Q and (y/(y + z)) = P_3 + P_4 + P_5 − 3Q.
In this example it is not necessary to compute the intersection multiplicities, since they are a consequence of Bézout's theorem.
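The divisors in Example 13.8.5 also follow the recipe in the proof of Theorem 13.8.4: for f = x/(y + z) the numerator defines the line L and the denominator defines the line N, so
(f) = X · L − X · N = (P_1 + P_2 + Q) − 3Q = P_1 + P_2 − 2Q,
and likewise (y/(y + z)) = X · M − X · N. Each intersection divisor of a line with this cubic has degree 3 · 1 = 3, so the two principal divisors indeed have degree 0.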
Example 13.8.6 Let X be the Klein quartic with equation X^3 Y + Y^3 Z + Z^3 X = 0 of Example 13.2.5. Let P_1 = (0 : 0 : 1), P_2 = (1 : 0 : 0) and Q = (0 : 1 : 0). Let L be the line with equation X = 0. Then L intersects X in the points P_1 and Q. Since L is not tangent in Q, we see that I(Q; X, L) = 1. So the intersection multiplicity of X and L in P_1 is 3, since the multiplicities add up to 4. Hence X · L = 3P_1 + Q. Similarly we get for the lines M and N with equations Y = 0 and Z = 0, respectively, X · M = 3P_2 + P_1 and X · N = 3Q + P_2. Therefore
(x/z) = 3P_1 − P_2 − 2Q and (y/z) = P_1 + 2P_2 − 3Q.

Definition 13.8.7 The divisor of a rational function is called a principal divisor. We shall call two divisors D and D′ linearly equivalent if and only if D − D′ is a principal divisor; notation D ≡ D′. This is indeed an equivalence relation.

Definition 13.8.8 Let D be a divisor on a curve X. We define a vector space L(D) over F by
L(D) = {f ∈ F(X)* | (f) + D ≥ 0} ∪ {0}.
The dimension of L(D) over F is denoted by l(D).

Note that if D = \sum_{i=1}^{r} n_i P_i − \sum_{j=1}^{s} m_j Q_j with all n_i, m_j > 0, then L(D) consists of 0 and the functions in the function field that have zeros of multiplicity at least m_j at Q_j (1 ≤ j ≤ s) and that have no poles except possibly at the points P_i, with order at most n_i (1 ≤ i ≤ r). We shall show that this vector space has finite dimension. First we note that if D ≡ D′ and g is a rational function with (g) = D − D′, then the map f → fg shows that L(D) and L(D′) are isomorphic.

Theorem 13.8.9
(i) l(D) = 0 if deg(D) < 0,
(ii) l(D) ≤ 1 + deg(D).

Proof. (i) If deg(D) < 0, then for any function f ∈ F(X)*, we have deg((f) + D) < 0, that is to say, f ∉ L(D).
(ii) If f is not 0 and f ∈ L(D), then D′ = D + (f) is an effective divisor for which L(D′) has the same dimension as L(D), by our observation above. So without loss of generality D is effective, say D = \sum_{i=1}^{r} n_i P_i (n_i ≥ 0 for 1 ≤ i ≤ r). Again, assume that f is not 0 and f ∈ L(D). In the point P_i, we map f onto the corresponding element of the n_i-dimensional vector space (t_i^{−n_i} O_{P_i})/O_{P_i}, where t_i is a local parameter at P_i. We thus obtain a mapping of f onto the direct sum of these vector spaces (map the 0-function onto 0). This is a linear mapping. Suppose that f is in the kernel. This means that f does not have a pole in any of the points P_i, that is to say, f is a constant function. It follows that
l(D) ≤ 1 + \sum_{i=1}^{r} n_i = 1 + deg(D).
Example 13.8.10 Look at the curve of Examples ?? and 13.8.5. We saw that f = x/(y+z) and g = y/(y+z) are regular outside Q and have a pole of order 2 and 3, respectively, in Q = (0 : 1 : 1). So the functions 1, f and g have mutually distinct pole orders and are elements of L(3Q). Hence the dimension of L(3Q) is at least 3. We will see in Example 13.10.3 that it is exactly 3.

13.9 Differentials on a curve

Let X be an irreducible smooth curve with function field F(X).

Definition 13.9.1 Let V be a vector space over F(X). An F-linear map D : F(X) → V is called a derivation if it satisfies the product rule
D(fg) = f D(g) + g D(f).

Example 13.9.2 Let X be the projective line with function field F(X). Define D(F) = Σ_i i a_i X^{i−1} for a polynomial F = Σ_i a_i X^i ∈ F[X] and extend this definition to quotients by
D(F/G) = (G D(F) − F D(G)) / G^2.
Then D : F(X) → F(X) is a derivation.
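Example 13.9.2 can be checked mechanically. A minimal sympy sketch, over the rationals purely for illustration (the polynomials F and G below are our own arbitrary choices):

```python
# Extending D from polynomials to quotients by the quotient rule of Example 13.9.2.
import sympy as sp

X = sp.symbols('X')
F, G = X**3 + 2*X, X**2 + 1
quotient_rule = (G*sp.diff(F, X) - F*sp.diff(G, X)) / G**2   # D(F/G)
assert sp.simplify(quotient_rule - sp.diff(F/G, X)) == 0
```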
Definition 13.9.3 The set of all derivations D : F(X) → V will be denoted by Der(X, V). We denote Der(X, V) by Der(X) if V = F(X). The sum of two derivations D_1, D_2 ∈ Der(X, V) is defined by (D_1 + D_2)(f) = D_1(f) + D_2(f). The product of D ∈ Der(X, V) with f ∈ F(X) is defined by (fD)(g) = f D(g). In this way Der(X, V) becomes a vector space over F(X).

Theorem 13.9.4 Let t be a local parameter at a point P. Then there exists a unique derivation D_t : F(X) → F(X) such that D_t(t) = 1. Furthermore Der(X) is one-dimensional over F(X) and D_t is a basis element for every local parameter t.

Definition 13.9.5 A rational differential form or differential on X is an F(X)-linear map from Der(X) to F(X). The set of all rational differential forms on X is denoted by Ω(X). Again Ω(X) becomes a vector space over F(X) in the obvious way. Consider the map d : F(X) → Ω(X), where for f ∈ F(X) the differential df : Der(X) → F(X) is defined by df(D) = D(f) for all D ∈ Der(X). Then d is a derivation.

Theorem 13.9.6 The space Ω(X) has dimension 1 over F(X) and dt is a basis for every point P with local parameter t.

So for every point P with local parameter t_P, a differential ω can be represented in a unique way as ω = f_P dt_P, where f_P is a rational function. The obvious definition of "the value" of ω in P by ω(P) = f_P(P) has no meaning, since it depends on the choice of t_P. Despite this negative result it is possible to say whether ω has a pole or a zero at P of a certain order.

Definition 13.9.7 Let ω be a differential on X. The order or valuation of ω in P is defined by ord_P(ω) = v_P(ω) = v_P(f_P). The differential form ω is called regular if it has no poles. The regular differentials on X form an F[X]-module, which we denote by Ω[X].

This definition does not depend on the choices made. If X is an affine plane curve defined by the equation F = 0 with F ∈ F[X, Y], then Ω[X] is generated by dx and dy as an F[X]-module with the relation F_X dx + F_Y dy = 0, where F_X and F_Y are the partial derivatives of F.

Example 13.9.8 We again look at the curve X in P^2 given by X^3 + Y^3 + Z^3 = 0 in characteristic unequal to three. We define the sets U_x by U_x = {(x : y : z) ∈ X | y ≠ 0, z ≠ 0} and similarly U_y and U_z. Then U_x, U_y and U_z cover X, since there is no point on X where two coordinates are zero. It is easy to check that the three representations
ω = (y/z)^2 d(x/y) on U_x,  η = (z/x)^2 d(y/z) on U_y,  ζ = (x/y)^2 d(z/x) on U_z
define one differential on X. For instance, to show that η and ζ agree on U_y ∩ U_z one takes the equation (x/z)^3 + (y/z)^3 + 1 = 0, differentiates, and applies the formula d(f^{−1}) = −f^{−2} df to f = z/x. The only regular functions on X are constants, so one cannot represent this differential as g df with f and g regular functions on X.

Now the divisor of a differential is defined as for functions.

Definition 13.9.9 The divisor (ω) of the differential ω is defined by
(ω) = Σ_{P∈X} v_P(ω) P.

Of course, one must show that only finitely many coefficients in (ω) are not 0. Let ω be a differential and W = (ω). Then W is called a canonical divisor. If ω′ is another nonzero differential, then ω′ = fω for some rational function f. So (ω′) = W′ ≡ W and therefore the canonical divisors form one equivalence class. This class is also denoted by W. Now consider the space L(W). This space of rational functions can be mapped onto an isomorphic space of differential forms by f → fω. By the definition of L(W), the image of f under this mapping is a regular differential form, that is to say, L(W) is isomorphic to Ω[X].

Definition 13.9.10 Let X be a smooth projective curve over F. We define the genus g of X by g = l(W).

Example 13.9.11 Consider the differential dx on the projective line. Then dx is regular at all points P_a = (a : 1), since x − a is a local parameter in P_a and dx = d(x − a). Let Q = (1 : 0) be the point at infinity. Then t = 1/x is a local parameter in Q and dx = −t^{−2} dt. So v_Q(dx) = −2. Hence (dx) = −2Q and l(−2Q) = 0. Therefore the projective line has genus zero.
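The change of chart in Example 13.9.11 is a one-line computation. A sympy sketch (characteristic 0 only, for illustration):

```python
# Near Q = (1 : 0) the local parameter is t = 1/x, so dx = (d(1/t)/dt) dt.
import sympy as sp

t = sp.symbols('t')
f_Q = sp.diff(1/t, t)
print(f_Q)   # -1/t**2, hence v_Q(dx) = -2 and (dx) = -2Q
```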
The genus of a curve will play an important role in the following sections. For methods with which one can determine the genus of a curve, we must refer to textbooks on algebraic geometry. We mention one formula without proof, the so-called Plücker formula.

Theorem 13.9.12 If X is a nonsingular projective curve of degree m in P^2, then g = (m − 1)(m − 2)/2.

Example 13.9.13 The genus of a line and of a nonsingular conic are zero by Theorem 13.9.12. In fact a curve of genus zero is isomorphic to the projective line. For example the curve X with equation XZ − Y^2 = 0 of Example ?? is isomorphic to P^1, where the isomorphism is given by (x : y : z) → (x : y) = (y : z) for (x : y : z) ∈ X. The inverse map is given by (u : v) → (u^2 : uv : v^2).

Example 13.9.14 So the curve of Examples ??, 13.8.5 and 13.9.8 has genus 1 and, by the definition of genus, L(W) = F, so the regular differentials on X are the scalar multiples of the differential ω of Example 13.9.8.

For the construction of codes over algebraic curves that generalize Goppa codes, we shall need the concept of the residue of a differential at a point P. This is defined in accordance with our treatment of the local behavior of a differential ω.

Definition 13.9.15 Let P be a point on X, t a local parameter at P and ω = f dt the local representation of ω. The function f can be written as f = Σ_i a_i t^i. We define the residue Res_P(ω) of ω in the point P to be a_{−1}.

One can show that this algebraic definition of the residue does not depend on the choice of the local parameter t. One of the basic results in the theory of algebraic curves is known as the residue theorem. We only state the theorem.

Theorem 13.9.16 If ω is a differential on a smooth projective curve X, then
Σ_{P∈X} Res_P(ω) = 0.
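Theorem 13.9.16 can be seen at work on the projective line. A characteristic-0 sympy sketch for ω = dx/(x^2 − 1), an example chosen here; the poles are x = 1, x = −1 and possibly the point at infinity, where x = 1/t and dx = −dt/t^2:

```python
import sympy as sp

x, t = sp.symbols('x t')
w = 1 / (x**2 - 1)                                       # omega = dx/(x**2 - 1)
r_plus  = sp.residue(w, x, 1)                            #  1/2
r_minus = sp.residue(w, x, -1)                           # -1/2
w_inf = sp.simplify(w.subs(x, 1/t) * sp.diff(1/t, t))    # coefficient of dt at infinity
r_inf = sp.residue(w_inf, t, 0)                          #  0
assert r_plus + r_minus + r_inf == 0                     # the residues sum to zero
```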
13.10 The Riemann-Roch theorem

The following theorem, known as the Riemann-Roch theorem, is not only a central result in algebraic geometry with applications in other areas, but it is also the key to the new results in coding theory.

Theorem 13.10.1 Let D be a divisor on a smooth projective curve of genus g. Then, for any canonical divisor W,
l(D) − l(W − D) = deg(D) − g + 1.

We do not give the proof. The theorem allows us to determine the degree of canonical divisors.

Corollary 13.10.2 For a canonical divisor W, we have deg(W) = 2g − 2.

Proof. Everywhere regular functions on a projective curve are constant, that is to say, L(0) = F, so l(0) = 1. Substitute D = W in Theorem 13.10.1 and the result follows from Definition 13.9.10.

Example 13.10.3 It is now clear why in Example 13.8.10 the space L(3Q) has dimension 3. By Example 13.9.14 the curve X has genus 1, the degree of W − 3Q is negative, so l(W − 3Q) = 0. By Theorem 13.10.1 we have l(3Q) = 3.

At first sight, Theorem 13.10.1 does not look too useful. However, Corollary 13.10.2 provides us with a means to use it successfully.

Corollary 13.10.4 Let D be a divisor on a smooth projective curve of genus g and let deg(D) > 2g − 2. Then l(D) = deg(D) − g + 1.

Proof. By Corollary 13.10.2, deg(W − D) < 0, so by Theorem 13.8.9(i), l(W − D) = 0.

Example 13.10.5 Consider the code of Theorem ??. We embed the affine plane in a projective plane and consider the rational functions on the curve defined by G. By Bézout's theorem, this curve intersects the line at infinity, that is to say, the line defined by Z = 0, in m points. These are the possible poles of our rational functions, each with order at most l. So, in the terminology of Definition 13.8.8, we have a space of rational functions, defined by a divisor D of degree lm. Then Corollary 13.10.4 and Theorem ?? imply that the curve defined by G has genus at most (m − 1)(m − 2)/2. This is exactly what we find from the Plücker formula of Theorem 13.9.12.

Let m be a non-negative integer. Then l(mP) ≤ l((m − 1)P) + 1, by the same argument as in the proof of Theorem 13.8.9.

Definition 13.10.6 If l(mP) = l((m − 1)P), then m is called a (Weierstrass) gap of P. A non-negative integer that is not a gap is called a nongap of P.

The number of gaps of P is equal to the genus g of the curve, since l(iP) = i + 1 − g if i > 2g − 2 by Corollary 13.10.4, and 1 = l(0) ≤ l(P) ≤ · · · ≤ l((2g − 1)P) = g. If m ∈ N_0, then m is a nongap of P if and only if there exists a rational function which has a pole of order m in P and no other poles. Hence, if m_1 and m_2 are nongaps of P, then m_1 + m_2 is also a nongap of P. The nongaps form the Weierstrass semigroup in N_0. Let (ρ_i | i ∈ N) be an enumeration of all the nongaps of P in increasing order, so ρ_1 = 0. Let f_i ∈ L(ρ_i P) be such that v_P(f_i) = −ρ_i for i ∈ N. Then f_1, . . . , f_i provide a basis for the space L(ρ_i P). This will be the approach of Sections 3-7.
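Counting gaps is elementary once the Weierstrass semigroup is known. The sketch below (plain Python, parameters chosen by us) takes the semigroup generated by q and q+1, which Section 14.8 quotes as the Weierstrass semigroup of the point at infinity on the Hermitian curve y^q + y = x^{q+1}, a nonsingular plane curve of degree q+1, and checks that the number of gaps equals the Plücker genus q(q − 1)/2 of that curve:

```python
def is_nongap(m, q):
    # m is a nongap iff m = a*q + b*(q+1) for some integers a, b >= 0
    return any((m - a*q) % (q + 1) == 0 for a in range(m // q + 1))

q = 4
gaps = [m for m in range(1, 2*q*q) if not is_nongap(m, q)]
print(gaps)                        # [1, 2, 3, 6, 7, 11]
assert len(gaps) == q*(q - 1)//2   # number of gaps = genus
```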
The term l(W − D) in Theorem 13.10.1 can be interpreted in terms of differentials. We introduce a generalization of Definition 13.8.8 for differentials.

Definition 13.10.7 Let D be a divisor on a curve X. We define
Ω(D) = {ω ∈ Ω(X) | (ω) − D ≥ 0}
and we denote the dimension of Ω(D) over F by δ(D), called the index of speciality of D.

The connection with functions is established by the following theorem.

Theorem 13.10.8 δ(D) = l(W − D).

Proof. If W = (ω), we define a linear map φ : L(W − D) → Ω(D) by φ(f) = fω. This is clearly an isomorphism.

Example 13.10.9 If we take D = 0, then by Definition 13.9.10 there are exactly g linearly independent regular differentials on a curve X. So the differential of Example 13.9.8 is the only regular differential on X (up to a constant factor), as was already observed after Theorem 13.9.12.

13.11 Codes from algebraic curves

We now come to the applications to coding theory. Our alphabet will be F_q. Let F be the algebraic closure of F_q. We shall apply the theorems of the previous sections. A few adaptations are necessary, since, for example, we consider for functions in the coordinate ring only those that have coefficients in F_q. If the affine curve X over F_q is defined by a prime ideal I in F_q[X_1, . . . , X_n], then its coordinate ring F_q[X] is by definition equal to F_q[X_1, . . . , X_n]/I and its function field F_q(X) is the quotient field of F_q[X]. It is always assumed that the curve is absolutely irreducible; this means that the defining ideal is also prime in F[X_1, . . . , X_n]. Similar adaptations are made for projective curves. Notice that F(x_1, . . . , x_n)^q = F(x_1^q, . . . , x_n^q) for all F ∈ F_q[X_1, . . . , X_n]. So if (x_1, . . . , x_n) is a zero of F and F is defined over F_q, then (x_1^q, . . . , x_n^q) is also a zero of F. Let Fr : F → F be the Frobenius map defined by Fr(x) = x^q. We can extend this map coordinatewise to points in affine and projective space. If X is a curve defined over F_q and P is a point of X, then Fr(P) is also a point of X, by the above remark. A divisor D on X is called rational if the coefficients of P and Fr(P) in D are the same for every point P of X. The space L(D) will only be considered for rational divisors and is defined as before, but with the restriction of the rational functions to F_q(X). With these changes the stated theorems remain true over F_q, in particular the theorem of Riemann-Roch 13.10.1.
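The identity F(x_1, . . . , x_n)^q = F(x_1^q, . . . , x_n^q), which is what makes Fr act on the points of X, can be checked symbolically. A sympy sketch for q = 2 and the Klein quartic polynomial of Example 13.8.6 (our choice of example):

```python
# Freshman's dream over F_2: squaring a polynomial equals substituting squares.
import sympy as sp

X, Y, Z = sp.symbols('X Y Z')
F = X**3*Y + Y**3*Z + Z**3*X
lhs = sp.expand(F**2, modulus=2)                      # F(X,Y,Z)**2 mod 2
rhs = F.subs({X: X**2, Y: Y**2, Z: Z**2})             # F(X**2, Y**2, Z**2)
assert sp.expand(lhs - rhs, modulus=2) == 0
```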
Let X be an absolutely irreducible nonsingular projective curve over F_q. We shall define two kinds of algebraic geometry codes from X. The first kind generalizes Reed-Solomon codes, the second kind generalizes Goppa codes. In the following, P_1, P_2, . . . , P_n are rational points on X and D is the divisor P_1 + P_2 + · · · + P_n. Furthermore G is some other divisor that has support disjoint from D. Although it is not necessary to do so, we shall make more restrictions on G, namely 2g − 2 < deg(G) < n.

Definition 13.11.1 The linear code C(D, G) of length n over F_q is the image of the linear map α : L(G) → F_q^n defined by α(f) = (f(P_1), f(P_2), . . . , f(P_n)). Codes of this kind are called geometric Reed-Solomon codes.

Theorem 13.11.2 The code C(D, G) has dimension k = deg(G) − g + 1 and minimum distance d ≥ n − deg(G).

Proof. (i) If f belongs to the kernel of α, then f ∈ L(G − D), and since deg(G − D) = deg(G) − n < 0, Theorem 13.8.9(i) implies f = 0. So k = l(G), and the result follows from the assumption 2g − 2 < deg(G) and Corollary 13.10.4.
(ii) If α(f) has weight d > 0, then there are n − d points P_i, say P_{i_1}, P_{i_2}, . . . , P_{i_{n−d}}, for which f(P_i) = 0. Therefore f ∈ L(G − E), where E = P_{i_1} + · · · + P_{i_{n−d}}. Since f ≠ 0, Theorem 13.8.9(i) gives deg(G) − (n − d) ≥ 0, that is to say, d ≥ n − deg(G).

Note the analogy with the proof of Theorem ??.

Example 13.11.3 Let X be the projective line over F_{q^m}. Let n = q^m − 1. We define P_0 = (0 : 1), P_∞ = (1 : 0) and we define the divisor D as Σ_{j=1}^n P_j, where P_j = (β^j : 1), 1 ≤ j ≤ n. (Here β is a primitive nth root of unity.) We define G = aP_0 + bP_∞, a ≥ 0, b ≥ 0. By Theorem 13.10.1, L(G) has dimension a + b + 1 and one immediately sees that the functions (x/y)^i, −a ≤ i ≤ b, form a basis of L(G). Consider the code C(D, G). A generator matrix for this code has as rows (β^i, β^{2i}, . . . , β^{ni}) with −a ≤ i ≤ b. One easily checks that (c_1, c_2, . . . , c_n) is a codeword in C(D, G) if and only if Σ_{j=1}^n c_j (β^l)^j = 0 for all l with a < l < n − b. It follows that C(D, G) is a Reed-Solomon code. The subfield subcode with coordinates in F_q is a BCH code.

Example 13.11.4 Let X be the curve of Examples ??, 13.8.5, 13.8.10 and 13.10.3. Let G = 3Q, where Q = (0 : 1 : 1). We take n = 8, so D is the sum of the remaining rational points. The coordinates are given by

      Q  P_1 P_2 P_3 P_4 P_5 P_6 P_7 P_8
 x    0   0   0   1   α   ᾱ   1   α   ᾱ
 y    1   α   ᾱ   0   0   0   1   1   1
 z    1   1   1   1   1   1   0   0   0

where ᾱ = α^2 = 1 + α. We saw in Examples 13.8.10 and 13.10.3 that 1, x/(y+z) and y/(y+z) are a basis of L(3Q) over F and hence also over F_4. This leads to the following generator matrix for C(D, G):

 ( 1  1  1  1  1  1  1  1 )
 ( 0  0  1  α  ᾱ  1  α  ᾱ )
 ( ᾱ  α  0  0  0  1  1  1 )

By Theorem 13.11.2, the minimum distance is at least 5 and of course one immediately sees from the generator matrix that d = 5.
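The claim d = 5 in Example 13.11.4 is easy to confirm by listing all 64 codewords. A plain-Python sketch, with F_4 encoded as 0, 1, 2, 3 where 2 = α and 3 = ᾱ (an encoding chosen here) and the rows of the generator matrix above:

```python
from itertools import product

MUL = [[0, 0, 0, 0], [0, 1, 2, 3], [0, 2, 3, 1], [0, 3, 1, 2]]   # F_4 multiplication
G = [[1, 1, 1, 1, 1, 1, 1, 1],
     [0, 0, 1, 2, 3, 1, 2, 3],
     [3, 2, 0, 0, 0, 1, 1, 1]]

def codeword(msg):
    # coordinatewise sum over F_4 (addition is XOR) of msg[i] * row i of G
    return [MUL[msg[0]][a] ^ MUL[msg[1]][b] ^ MUL[msg[2]][c]
            for a, b, c in zip(*G)]

weights = [sum(1 for s in codeword(m) if s != 0)
           for m in product(range(4), repeat=3) if any(m)]
print(min(weights))   # 5, matching d >= n - deg(G) = 5 from Theorem 13.11.2
```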
We now come to the second class of algebraic geometry codes. We shall call these codes geometric Goppa codes.

Definition 13.11.5 The linear code C*(D, G) of length n over F_q is the image of the linear map α* : Ω(G − D) → F_q^n defined by α*(η) = (Res_{P_1}(η), Res_{P_2}(η), . . . , Res_{P_n}(η)).

The parameters are given by the following theorem.

Theorem 13.11.6 The code C*(D, G) has dimension k* = n − deg(G) + g − 1 and minimum distance d* ≥ deg(G) − 2g + 2.

Proof. Just as in Theorem 13.11.2, these assertions are direct consequences of Theorem 13.10.1 (Riemann-Roch), using Theorem 13.10.8 (making the connection between the dimension of Ω(G − D) and l(W − G + D)) and Corollary 13.10.2 (stating that the degree of a canonical divisor is 2g − 2).

Example 13.11.7 Let L = {α_1, . . . , α_n} be a set of n distinct elements of F_{q^m}. Let g be a polynomial in F_{q^m}[X] which is not zero at α_i for all i. The (classical) Goppa code Γ(L, g) is defined by
Γ(L, g) = {c ∈ F_q^n | Σ_i c_i/(X − α_i) ≡ 0 (mod g)}.
Let P_i = (α_i : 1), Q = (1 : 0) and D = P_1 + · · · + P_n. If we take for E the divisor of zeros of g on the projective line, then Γ(L, g) = C*(D, E − Q), and c ∈ Γ(L, g) if and only if
Σ_i c_i/(X − α_i) dX ∈ Ω(E − Q − D).
This is the reason that some authors extend the definition of geometric Goppa codes to subfield subcodes of codes of the form C*(D, G). It is a well-known fact that the parity check matrix of the Goppa code Γ(L, g) is equal to the following generator matrix of a generalized RS code:

 ( g(α_1)^{-1}            . . .  g(α_n)^{-1}            )
 ( α_1 g(α_1)^{-1}        . . .  α_n g(α_n)^{-1}        )
 (       ...                           ...              )
 ( α_1^{r−1} g(α_1)^{-1}  . . .  α_n^{r−1} g(α_n)^{-1}  )

where r is the degree of the Goppa polynomial g. So Γ(L, g) is the subfield subcode of the dual of a generalized RS code. This is a special case of the following theorem.

Theorem 13.11.8 The codes C(D, G) and C*(D, G) are dual codes.

Proof. From Theorem 13.11.2 and Theorem 13.11.6 we know that k + k* = n. So it suffices to take a word from each code and show that the inner product of the two words is 0. Let f ∈ L(G) and η ∈ Ω(G − D). By Definitions 13.11.1 and 13.11.5, the differential fη has no poles except possibly poles of order 1 in the points P_1, P_2, . . . , P_n. The residue of fη in P_i is equal to f(P_i) Res_{P_i}(η). By Theorem 13.9.16, the sum of the residues of fη over all the poles, that is to say, over the points P_i, is equal to zero. Hence we have
0 = Σ_{i=1}^n f(P_i) Res_{P_i}(η) = ⟨α(f), α*(η)⟩.
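The "well-known fact" above can be made concrete in the smallest possible setting, m = 1 over a prime field, so that no extension-field arithmetic is needed; p = 7, L and g below are our own choices, not the book's. The sketch builds the matrix of Example 13.11.7, finds a word it annihilates, and checks the defining congruence of Γ(L, g) directly, using (X − a)^{-1} ≡ −(X + a) g(a)^{-1} (mod X^2 + 1):

```python
from itertools import product

p, L = 7, [1, 2, 3, 4, 5, 6]
def g(a): return (a*a + 1) % p        # g(X) = X**2 + 1 has no roots in F_7
H = [[pow(a, j, p) * pow(g(a), -1, p) % p for a in L] for j in range(2)]

def syndrome_is_zero(c):
    # sum_i c_i * (X - alpha_i)^(-1) mod g, written out coefficientwise
    s1 = s0 = 0                       # coefficients of X and of 1
    for ci, a in zip(c, L):
        inv_ga = pow(g(a), -1, p)
        s1 = (s1 - ci * inv_ga) % p
        s0 = (s0 - ci * a * inv_ga) % p
    return s1 == 0 and s0 == 0

for c in product(range(p), repeat=len(L)):
    if any(c) and all(sum(h*x for h, x in zip(row, c)) % p == 0 for row in H):
        assert syndrome_is_zero(c)    # H c = 0 implies c is in Gamma(L, g)
        break
```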
Several authors prefer the codes C*(D, G) over geometric RS codes, but nonexperts in algebraic geometry probably feel more at home with polynomials than with differentials. That this is possible without loss of generality is stated in the following theorem.

Theorem 13.11.9 Let X be a curve defined over F_q. Let P_1, . . . , P_n be n rational points on X. Let D = P_1 + · · · + P_n. Then there exists a differential form ω with simple poles at the P_i such that Res_{P_i}(ω) = 1 for all i. Furthermore
C*(D, G) = C(D, W + D − G)
for all divisors G that have a support disjoint from the support of D, where W is the divisor of ω.

So one can do without differentials and the codes C*(D, G). However, it is useful to have both classes when treating decoding methods. These use parity checks, so one needs a generator matrix for the dual code. In the next sections we treat several examples of algebraic geometry codes. It is already clear that we find some good codes. For example, from Theorem 13.11.2 we see that such codes over a curve of genus 0 (the projective line) are MDS codes. In fact, Theorem 13.11.2 says that d ≥ n − k + 1 − g, so if g is small, we are close to the Singleton bound.

13.12 Rational functions and divisors on plane curves

This section will be finished together with the correction of Section 7: rational cycles, Frobenius, divisors, rational functions, discrete valuations, discrete valuation rings.

Example 13.12.1 Consider the curve X with homogeneous equation
X^a Y^c + Y^{b+c} Z^{a−b} + X^d Z^{a+c−d} = 0, with d < b < a,
as in Example 13.3.10. The divisor of the rational function x/z is
(x/z) = (X · L) − (X · N) = (b + c)P − bQ − cR.
The divisor of the rational function y/z is
(y/z) = (X · M) − (X · N) = dP − aQ − (a − d)R.
Hence the divisor of (x/z)^α (y/z)^β is
((b + c)α + dβ)P + (−bα − aβ)Q + (−cα + (a − d)β)R.
It has only a pole at Q if and only if cα ≤ (a − d)β. (This will serve as a motivation for the choice of the basis of R in Proposition ??.)

13.13 Resolution or normalization of curves

13.14 Newton polygon of plane curves

[?]
13.15 Notes

Goppa submitted his seminal paper [?] in June 1975 and it was published in 1977. Goppa also published three more papers in the eighties [?, ?, ?] and a book [?] in 1991. Most of this section is standard textbook material. See for instance [?, ?, ?, ?], to mention a few. Section 13.4 is a special case of Goppa's construction and comes from [?]. The Hermitian curves in Example 13.2.4 and their codes have been studied by many authors. See [?, ?, ?, ?]. The Klein curve goes back to F. Klein [?] and has been studied thoroughly, also over finite fields in connection with codes. See [?, ?, ?, ?, ?, ?, ?, ?].
14 Curves

14.1 Algebraic varieties

14.2 Curves

14.3 Curves and function fields

14.4 Normal rational curves and Segre's problems

14.5 The number of rational points

14.5.1 Zeta function
14.5.2 Hasse-Weil bound
14.5.3 Serre's bound
14.5.4 Ihara's bound
14.5.5 Drinfeld-Vlăduţ bound
14.5.6 Explicit formulas
14.5.7 Oesterlé's bound

14.6 Trace codes and curves

14.7 Good curves

14.7.1 Maximal curves
14.7.2 Shimura modular curves
14.7.3 Drinfeld modular curves
14.7.4 Tsfasman-Vlăduţ-Zink bound
14.7.5 Towers of Garcia-Stichtenoth
14.8 Applications of AG codes

14.8.1 McEliece cryptosystem with AG codes

14.8.2 Authentication codes

Here we consider an application of AG codes to authentication. Recall that in Chapter 10, Section 10.3.1 we started to consider authentication codes that are constructed via almost universal and almost strongly universal hash functions. They, in turn, can be constructed using error-correcting codes. We recall two methods of constructing authentication codes (almost strongly universal hash families, to be precise) from error-correcting codes:
1. Construct AU-families from codes as per Proposition 10.3.7 and then use Stinson's composition method, Theorem 10.3.10.
2. Construct ASU-families directly from error-correcting codes.
As an example we mentioned ASU-families constructed as in (1.) using Reed-Solomon codes, Exercise 10.3.2. Now we would like to move on and present a general construction of almost universal hash functions that employs AG codes. The following proposition formulates the result we need.

Proposition 14.8.1 Let C be an algebraic curve over F_q with N + 1 rational points P_0, P_1, . . . , P_N. Fix P = P_i for some i = 0, . . . , N and let WS(P) = {0, w_1, w_2, . . . } be the Weierstrass semigroup of P. Then for each j ≥ 1 one can construct an almost universal hash family ε-AU(N, q^j, q), where ε ≤ w_j/N.

Proof. Indeed, construct an AG code C = C_L(D, w_j P), where the divisor D is defined as D = Σ_{k≠i} P_k and P = P_i. So C is obtained as the image of the evaluation map for the functions that have a pole only at P, of order bounded by w_j. From ?? we have that the length of C is N, dim C = dim L(w_j P) = j, and d(C) ≥ N − deg(w_j P) = N − w_j. So 1 − d(C)/N ≤ w_j/N, and now the claim easily follows.

As an example of this proposition, we show next how one can obtain AU-families from Hermitian curves.

Proposition 14.8.2 For every prime power q and every i ≤ q, the Hermitian curve y^q + y = x^{q+1} over F_{q^2} yields an (i/q^2)-AU(q^3, q^{i^2+i}, q^2).

Proof. Recall from ?? that the Hermitian curve over F_{q^2} has q^3 + 1 rational points P_1, . . . , P_{q^3}, P_∞. Construct C = C_L(D, w_j P), where P = P_∞ is the place at infinity, D = Σ_{i=1}^{q^3} P_i, and WS(P) = {0, w_1, w_2, . . . }. It is known that the Weierstrass semigroup WS(P) is generated by q and q + 1. Let us show that w_{i(i+1)/2} = iq for all i ≤ q. We proceed by induction. For i = 1 we have w_1 = q, which is obviously true. Then suppose that for some i ≥ 1 we have w_{i(i−1)/2} = (i − 1)q and want to prove w_{i(i+1)/2} = iq. Clearly, for this we need to show that there are exactly i − 1 nongaps strictly between (i − 1)q and iq (these numbers themselves are not included in the count). So for the nongaps aq + b(q + 1)
that lie strictly between (i − 1)q and iq we have
(i − 1)q < aq + b(q + 1) < iq.
Thus, automatically, a < i. We then have
(i − a − 1) q/(q + 1) < b < (i − a) q/(q + 1).   (14.1)
So from here we see that 0 ≤ a ≤ i − 2, because for a = i − 1 we would need 0 < b < q/(q + 1), which is not possible. So there are i − 1 values of a, namely 0, . . . , i − 2, which could give rise to a nongap. The interval from (14.1) has length q/(q + 1) < 1, so it contains at most one integer. If i − a < q + 1, then (i − a − 1)q/(q + 1) < i − a − 1 < (i − a)q/(q + 1), and thus the integer i − a − 1 lies in that interval whenever i − a < q + 1. But for 0 ≤ a ≤ i − 2 the condition i − a < q + 1 is always fulfilled, since i ≤ q by assumption. Thus for every 0 ≤ a ≤ i − 2 there exists exactly one b, namely b = i − a − 1, such that aq + b(q + 1) lies strictly between (i − 1)q and iq. It is also easily seen that all these nongaps are different. So, indeed, w_{i(i+1)/2} = iq for all i ≤ q. Now the claim follows from Proposition 14.8.1.

As a consequence we have:

Corollary 14.8.3 Let a, b be positive integers such that b ≤ a ≤ 2b and q^a is a square. Then there exists a (2/q^b)-ASU(q^{5a/2+b}, q^{a·q^{2(a−b)}/2}, q^b).

Proof. Do the "Hermitian" construction from the previous proposition over F_{q^a} with i = q^{a−b}. Then the claim follows from Theorem 10.3.10 and Exercise 10.3.2.

To get some feeling for all these parameters, the reader is advised to solve Exercise 14.8.1.
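The counting in the proof above can be tested numerically: in the semigroup generated by q and q+1, the nongap of index i(i+1)/2 (with the nongaps listed in increasing order starting from 0, as in Section 13.10) is iq for all i ≤ q. A plain Python sketch, with the test values of q chosen by us:

```python
from math import comb

def nongaps(q, count):
    # the first `count` elements of the semigroup generated by q and q+1
    S, m = [], 0
    while len(S) < count:
        if any((m - a*q) % (q + 1) == 0 for a in range(m // q + 1)):
            S.append(m)          # S[0] = 0 and S[j] = w_j
        m += 1
    return S

for q in (2, 3, 4, 5, 7):
    S = nongaps(q, comb(q + 1, 2) + 1)
    assert all(S[comb(i + 1, 2)] == i*q for i in range(1, q + 1))
```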
Now we move on to (2.). We would like to show the direct construction of Xing et al. ?? that uses AG codes.

Theorem 14.8.4 Let C be an algebraic curve over F_q of genus g and let R be some set of rational points of C. Let G be a positive divisor such that |R| > deg(G) ≥ 2g + 1 and R ∩ supp(G) = ∅. Then there exists an ε-ASU(N, n, m) with N = q|R|, n = q^{deg(G)−g+1}, m = q, and ε = deg(G)/|R|.

Proof. Consider the set
H = {h_{(P,α)} : L(G) → F_q | h_{(P,α)}(f) = f(P) + α, P ∈ R, α ∈ F_q}.
Take H as the set of functions in the definition of an ASU-family; set X = L(G) and Y = F_q. Then |X| = q^{dim L(G)} = q^{deg(G)−g+1}, because deg(G) ≥ 2g + 1 > 2g − 2. It can be shown (see Exercise 14.8.2) that if deg(G) ≥ 2g + 1, then |H| = q|R|. It is also easy to see that for any a ∈ L(G) and any b ∈ F_q there exist exactly |R| = |H|/q functions from H that map a to b. This proves the first part of being ASU. As to the second part, consider
m̄ = max_{a_1≠a_2∈L(G); b_1,b_2∈F_q} |{h_{(P,α)} ∈ H | h_{(P,α)}(a_1) = b_1, h_{(P,α)}(a_2) = b_2}|
  = max_{a_1≠a_2∈L(G); b_1,b_2∈F_q} |{(P, α) ∈ R × F_q | (a_1 − a_2 − b_1 + b_2)(P) = 0, a_2(P) + α = b_2}|.
As a_1 − a_2 ∈ L(G) \ {0} and b_1 − b_2 ∈ F_q, we see that a_1 − a_2 − b_1 + b_2 ∈ L(G) \ {0}. Note that there cannot be more than deg(G) zeros of a_1 − a_2 − b_1 + b_2 among the points in R (cf. ??). Since α in (P, α) is uniquely determined by P ∈ R, we see that there are at most deg(G) pairs (P, α) ∈ R × F_q that satisfy both (a_1 − a_2 − b_1 + b_2)(P) = 0 and a_2(P) + α = b_2. In other words,
m̄ ≤ deg(G) = (deg(G)/|R|) · (|H|/q).
We can now take ε = deg(G)/|R| in Definition 10.3.8.

Again we present here a concrete result coming from Hermitian codes.

Corollary 14.8.5 Let q be a prime power and let an integer d with q^3 > d ≥ q(q − 1) + 1 be given. Then there exists a (d/q^3)-ASU(q^5, (q^2)^{d−q(q−1)/2+1}, q^2).

Proof. Consider again the Hermitian curve over F_{q^2}. Take any rational point P and construct C = C_L(D, G), where D = Σ_{P′≠P} P′ is the sum of all remaining rational points (there are q^3 of them), and G = dP. Then the claim follows directly from the previous theorem.

For a numerical example we refer again to Exercise 14.8.1.
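The key step in the proof of Theorem 14.8.4 is that a nonzero element of L(G) has at most deg(G) zeros on R. In the genus-0 case this is just the fact that a nonzero polynomial of degree at most deg(G) has at most deg(G) roots; the following sketch confirms it by brute force over F_5, with G = 2P_∞ and R = F_5 being our own illustrative choices:

```python
# Every nonzero polynomial of degree <= 2 over F_5 has at most 2 zeros on R,
# so m_bar <= deg(G) and eps = deg(G)/|R| = 2/5 in Theorem 14.8.4.
from itertools import product

q, R, degG = 5, range(5), 2
for coeffs in product(range(q), repeat=degG + 1):
    if any(coeffs):
        zeros = sum(1 for P in R
                    if sum(c * P**k for k, c in enumerate(coeffs)) % q == 0)
        assert zeros <= degG
```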
14.8.3 Fast multiplication in finite fields

14.8.4 Correlation sequences and pseudorandom sequences

14.8.5 Quantum codes

14.8.6 Exercises

14.8.1 Suppose we would like to obtain an authentication code with P_S = 2^{−20} ≥ P_I and log |S| ≥ 2^{34}. Give the parameters of such an authentication code using the following constructions. Compare the results.
• OA-construction as per Theorem 10.3.5.
• RS-construction as per Exercise 10.3.2.
• Hermitian construction as per Corollary 14.8.3.
• Hermitian construction as per Corollary 14.8.5.

14.8.2 Let H = {h_{(P,α)} : L(G) → F_q | h_{(P,α)}(f) = f(P) + α, P ∈ R, α ∈ F_q} be as in the proof of Theorem 14.8.4. Prove that if deg(G) ≥ 2g + 1, then |H| = q|R|.

14.9 Notes