
Contemporary Mathematics and Its Applications:

Monographs, Expositions and Lecture Notes

Print ISSN: 2591-7668


Online ISSN: 2591-7676

Series Editor
M Zuhair Nashed (University of Central Florida)

Editorial Board
Guillaume Bal, Gang Bao, Liliana Borcea, Raymond Chan, Adrian Constantin,
Willi Freeden, Charles W Groetsch, Mourad Ismail, Palle Jorgensen,
Marius Mitrea, Otmar Scherzer, Frederik J Simons, Edriss S Titi,
Luminita Vese, Hong-Kun Xu, Masahiro Yamamoto

This series aims to inspire new curricula and to integrate current research into texts. Its main scope is to publish:
– Cutting-edge Research Monographs
– Mathematical Plums
– Innovative Textbooks for capstone (special topics) undergraduate and graduate level
courses
– Surveys on recent emergence of new topics in pure and applied mathematics
– Advanced undergraduate and graduate level textbooks that may initiate new directions
and new courses within mathematics and applied mathematics curriculum
– Books emerging from important conferences and special occasions
– Lecture Notes on advanced topics
Monographs and textbooks on topics of interdisciplinary or cross-disciplinary interest are
particularly suitable for the series.

Published

Vol. 3 Introduction to Algebraic Coding Theory
       by Tzuong-Tsieng Moh

Vol. 2 Derived Functors and Sheaf Cohomology
       by Ugo Bruzzo & Beatriz Graña Otero

Vol. 1 Frontiers in Orthogonal Polynomials and q-Series
       edited by M Zuhair Nashed & Xin Li
Published by
World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE

Library of Congress Control Number: 2022000598

British Library Cataloguing-in-Publication Data


A catalogue record for this book is available from the British Library.

Contemporary Mathematics and Its Applications: Monographs, Expositions
and Lecture Notes — Vol. 3
INTRODUCTION TO ALGEBRAIC CODING THEORY
Copyright © 2022 by World Scientific Publishing Co. Pte. Ltd.
All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means,
electronic or mechanical, including photocopying, recording or any information storage and retrieval
system now known or to be invented, without written permission from the publisher.

For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance
Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy
is not required from the publisher.

ISBN 978-981-122-096-8 (hardcover)


ISBN 978-981-122-097-5 (ebook for institutions)
ISBN 978-981-122-098-2 (ebook for individuals)

For any available supplementary material, please visit
https://www.worldscientific.com/worldscibooks/10.1142/11849#t=suppl

Desk Editors: Soundararajan Raghuraman/Lai Fun Kwong

Typeset by Stallion Press


Email: [email protected]

Printed in Singapore
TO

My wife Ping
Preface

This is a compilation of lecture notes from a course for second-year graduate students at Purdue University, who are only required to have some background in abstract algebra. The development of algebraic coding theory parallels the development of algebra, with a delay of about one hundred years. Together, the lecture notes form the four parts of this book: Part I is about the elementary theory of algebraic coding; mathematically, the tools used are vector spaces and linear algebra. Part II is about various rings and the associated algebraic codes; a fast decoding method is presented. Part III is a survey of the useful parts of algebraic geometry. Part IV is about geometric coding theory.
The ultimate goal of this text is to introduce students to the modern geometric Goppa codes. Since elementary coding theory is assumed to be of interest mainly to students in engineering or computer science, and hence unknown to many students in mathematics, we begin with some discussion of this elementary material. The text progresses naturally to the ring-theoretic codes, mainly BCH codes, Reed–Solomon codes, and classical Goppa codes, which can easily be discussed in the context of polynomial rings, rational function rings, and formal power-series rings. A complete treatment of decoding ring codes is given in Part II.
One of the challenges is to present a survey of the useful parts of
algebraic geometry, which is an elegant and important topic in mathematics.
There are many ways to teach the subject. In the chapter on algebraic
geometry, Part III, we adopt Chevalley’s algebraic approach to the curve
theory and provide many examples to illustrate the theorems. Many of


the proofs are consigned to classic books by Zariski and Samuel [16],
Walker [15], Chevalley [10], and Mumford [13].
The final part of this text is devoted to the well-known geometric Goppa
codes [25]. Their early decoding processes depend on linear algebra only and
are elementary. For the decoding processes, we give explicit descriptions and
try to present them as naturally as possible. Further discussions involving
the remarkable concepts of Feng and Rao [21] on majority voting will also
be presented in this way.
The field of coding theory is too rich to be covered in a one-semester course. We have added appendices to discuss familiar topics such as convolution codes, the sphere-packing problem, other interesting codes, and Berlekamp's algorithm, which might benefit interested readers who wish to gain a wider understanding of the related materials.
This book is written to be concise. There are about 204 pages for a one-
semester course. We hope that it will be useful to students and working
algebraic geometers alike in understanding the booming field of coding. We
would like to thank W. Heinzer for reading the whole book and making
some valuable suggestions and B. Lucier for commenting on Parts I & II of
our manuscript.
We wish to thank Ms. Rochelle Kronzek, executive editor at World Scientific Publishing Company, for her constant enthusiasm in initiating this project. We are grateful to Ms. Lai Fun Kwong, managing editor at WSP, for her prompt communications and support during the writing of this book.
We wish to thank the anonymous referee who improved this book and
Mr. T. R. Soundararajan for taking care of the final form of the book.
About the Author

T. T. Moh is an Emeritus Professor of Mathematics at Purdue University, specializing in algebra and algebraic geometry. He received his PhD in mathematics from Purdue University in 1969 and became an Assistant Professor there afterward. He also spent time at the Institute for Advanced Study in Princeton and was on the faculty of the University of Minnesota before rejoining Purdue University in 1976.
Contents

Preface vii
About the Author ix
Introduction xv

Part I: Vector Space Codes 1

Chapter 1. Linear Codes 3
1.1 Real Curve Codes 3
1.2 Preliminaries 5
1.3 Vector Space Codes 9
1.4 Distances 12
1.5 Maximum Likelihood Decoding 12
1.6 Dual Codes 16
1.7 Hamming Code 17
1.8 Shannon's Theorem 21
1.9 Gilbert–Varshamov's Bound 24

Part II: Ring Codes 29

Chapter 2. Rings 31
2.1 Preliminaries 31
2.2 The Finite Field Fq 32
2.3 The Computer Programs for Finite Fields 37
2.4 The Total Quotient Rings 41
2.5 The Ring F[x] 42
2.6 Separability 52
2.7 Power Series Rings F[[x]] and Fields of Meromorphic Functions F((x)) 55

Chapter 3. Ring Codes 61
3.1 BCH Codes 63
3.2 Decoding a Primitive BCH Code 67
3.3 Reed–Solomon Codes 74
3.4 Classical Goppa Codes 80

Part III: Algebraic Geometry 87

Chapter 4. Algebraic Geometry 89
4.1 Affine Spaces and Projective Spaces 90
4.2 Affine Algebraic Varieties 93
4.3 Regular Functions and Rational Functions 95
4.4 Affine Algebraic Curves 100
4.5 Projective Algebraic Curves 104
4.6 Riemann's Theorem 110
4.7 Riemann–Roch Theorem I 118
4.8 Riemann–Roch Theorem II 122
4.9 Weil's Conjecture and Hasse–Weil Inequality 131

Part IV: Algebraic Geometric Codes 139

Chapter 5. Algebraic Curve Goppa Codes 141
5.1 Geometric Goppa Codes 141
5.2 Comparisons with Reed–Solomon Code 150
5.3 Improvement of Gilbert–Varshamov's Bound 153

Chapter 6. Decoding the Geometric Goppa Codes 159
6.1 Introduction 159
6.2 Error Locator 162
6.3 SV Algorithm 166
6.4 DU Algorithm 179
6.5 Feng–Rao's Majority Voting 190

Appendices 207

Appendix A Convolution Codes 209
A.1 Representation 209
A.2 Combining and Splitting 211
A.3 Smith Normal Form 211
A.4 Viterbi Algorithm 213

Appendix B Sphere-Packing Problem and Weight Enumerators 217
B.1 Golay Code 218
B.2 Uniformly Packed Code 219
B.3 Weight Enumerators 220

Appendix C Other Important Coding and Decoding Methods 223
C.1 Hadamard Codes 223
C.2 Reed–Muller Code 224
C.3 Constructing New Code from Old Codes 227
C.4 Threshold Decoding 229
C.5 Soft-Decision Decoding 231
C.6 Turbo Decoder 232
C.7 Low-Density Parity Check (LDPC) Codes 232

Appendix D Berlekamp's Decoding Algorithm 233
D.1 Berlekamp's Algorithm 234

References 237
Index 241
Introduction

All living beings use signals to communicate with each other. The signals,
also known as codes, can take the form of chemicals, sounds, colors, etc.
About two million years ago, humanity gained its own distinctiveness by creating abstract signals, languages. All languages can be seen as codes. Many historians try to decipher "lost" languages; the most famous example is probably written hieroglyphs, which were deciphered using the Rosetta stone. Since ancient times, poems have been used as a way of communicating oral tradition, including the Iliad and Odyssey in Greek, the Mahabharata and Ramayana in Sanskrit, and the Odes in Chinese. One advantage of the rhyme and structure of poetic verse over prose is that errors, if they occur, are easy to find. In other words, poetry is the first "error-detecting" form of communication.
In 1944, Erwin Schrödinger published a book entitled What is Life? The Physical Aspect of the Living Cell [3], in which he observed that chromosomes are code-scripts and are molecular in nature. Ergo, there must be a code of some kind that allows the molecules in a cell to carry information. This observation motivated many scientists to study the codes transmitted by living beings, eventually leading to the discovery of the double-helix structure of DNA and RNA by James D. Watson and Francis Crick.
Poetry uses rhymes and molecules use chemical bonds to detect and correct errors. It is natural to impose algebraic relations on the symbols of letters for the same purpose. A computer scientist, R. W. Hamming, used a computer, primitive by today's standards, to perform his research. At that time, scientists had to queue their work for the computer to process sequentially. If the computer found errors in a program (usually caused by the computer misreading the program), it would skip the task and proceed to the next one in the queue. The researcher would have to correct any errors in the faulty program, resubmit it to the queue, and wait several weeks for the computer to find time to work on it again. Hamming was annoyed by the wait and decided to create a code to prevent the computer from misreading the program. His invention of the "self-correcting code" introduces a vector-space structure on words. This was a great invention and was named after him. Let us introduce his idea as follows.
Let us consider GF(2) (= F2 ), the field of two elements {0, 1}, which will be called the set of letters, and a message a1 a2 a3 a4 , where ai ∈ {0, 1}. Hamming added three more symbols b1 b2 b3 by the following formulas:

b1 = a1 + a3 + a4 ,
b2 = a1 + a2 + a3 ,                                        (1)
b3 = a2 + a3 + a4 .

Then, Hamming used the seven symbols a1 a2 a3 a4 b1 b2 b3 to carry the message of four symbols a1 a2 a3 a4 . We may consider the following matrix multiplication, with

G = \begin{pmatrix}
1 & 0 & 0 & 0 & 1 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 1 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 1 & 1 & 0 & 1
\end{pmatrix}

and [a1 a2 a3 a4 ] × G = [a1 a2 a3 a4 b1 b2 b3 ]. The matrix G is called the generator
matrix. The ai ’s are called the message symbols. Furthermore, let
H = \begin{pmatrix}
1 & 1 & 0 \\
0 & 1 & 1 \\
1 & 1 & 1 \\
1 & 0 & 1 \\
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{pmatrix}
and [a1 a2 a3 a4 ] × G × H = [000]. The matrix H is called the check matrix
and bi ’s are called the check symbols.
The decoding process is as follows. Suppose that the computer, for whatever reasons, reads [a1 a2 a3 a4 b1 b2 b3 ] as [a′1 a′2 a′3 a′4 b′1 b′2 b′3 ], which might be different from the original string. However, this kind of error is infrequent, so we may reasonably assume that there is at most one error, i.e., either

[a′1 a′2 a′3 a′4 b′1 b′2 b′3 ] = [a1 a2 a3 a4 b1 b2 b3 ]

or

[a′1 a′2 a′3 a′4 b′1 b′2 b′3 ] = [a1 a2 a3 a4 b1 b2 b3 ] + [0 · · · 010 · · · 0].

The computer calculates

[a′1 a′2 a′3 a′4 b′1 b′2 b′3 ] × H = [c1 c2 c3 ].

If [c1 c2 c3 ] = [000], then the above defining equations (1) for b1 , b2 , b3 show that there is either no error or there is more than one error. Since we assumed that there is at most one error, we may conclude that there is no error, so the computer should take the message [a′1 a′2 a′3 a′4 ]. If [c1 c2 c3 ] ≠ [000], then we have

[a′1 a′2 a′3 a′4 b′1 b′2 b′3 ] × H = ([a1 a2 a3 a4 b1 b2 b3 ] + [0 · · · 010 · · · 0]) × H
= [0 · · · 010 · · · 0] × H
= [c1 c2 c3 ].

Therefore, [c1 c2 c3 ] must be one of the row vectors of the matrix H, and thus the computer locates the position of the error. The computer simply flips the bit at that position. In this way, the computer will correct the word [a′1 a′2 a′3 a′4 b′1 b′2 b′3 ] and take the first four bits of the corrected word as the message. This code not only detects the error but also corrects it. Note that the location of the error must be detected before it can be corrected, a process that will be referred to as finding the error locator.
However, there might be two or more errors, in which case the Hamming code fails and the above method decodes the message to a wrong word. One may assume that the probability of this is rather small. Hamming codes are used to eradicate a single error; they are ineffective if there are multiple errors.
For a longer message, we can chop it into blocks (each with four bits), padding the end if necessary. This produces a block code.
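As a concrete illustration of the encoding and decoding just described, here is a minimal Python sketch (our own, not from the text) of the [7, 4] Hamming code; it assumes at most one flipped bit, per the discussion above:

```python
# Hamming [7, 4] code over F_2, following equations (1).
G_ROWS = [(1,0,0,0,1,1,0),
          (0,1,0,0,0,1,1),
          (0,0,1,0,1,1,1),
          (0,0,0,1,1,0,1)]
H_ROWS = [(1,1,0),(0,1,1),(1,1,1),(1,0,1),(1,0,0),(0,1,0),(0,0,1)]

def encode(msg):                      # msg = [a1, a2, a3, a4]
    # word = msg x G (mod 2)
    return [sum(a*g for a, g in zip(msg, col)) % 2 for col in zip(*G_ROWS)]

def decode(word):                     # word = [a1..a4, b1..b3]
    # syndrome [c1 c2 c3] = word x H (mod 2)
    syn = tuple(sum(w*h for w, h in zip(word, col)) % 2
                for col in zip(*H_ROWS))
    if any(syn):                      # non-zero syndrome: the matching
        i = H_ROWS.index(syn)         # row of H locates the error
        word = word[:]
        word[i] ^= 1                  # flip the erroneous bit
    return word[:4]                   # take the message symbols

msg = [1, 0, 1, 1]
w = encode(msg)
w[2] ^= 1                             # corrupt one position
assert decode(w) == msg
```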
The principles of self-correction in Hamming codes are valid today and are widely used in communications through noisy channels. Academically, code means self-correcting code. All channels of communication are noisy to different degrees, and self-correcting codes have become prominent today. Furthermore, although the above Hamming code is merely a simple exercise in linear algebra, its implications were groundbreaking.

As important as Hamming's work, around that time, C. E. Shannon was developing his theory of information. Shannon wanted to know how to measure information. He took the idea of entropy from thermodynamics, where it is defined to be the logarithm of disorder. The second law of thermodynamics states that the entropy of an isolated system is always increasing; for instance, if one mixes cold water with hot water, one gets warm water. This law simultaneously explains the phenomena of cooling and diffusion and the direction of time. Shannon defines

information = −entropy.

We may normalize the information by adding 1 to avoid negatives. We have

information = 1 − entropy
as used in this book. Classically, let X be a discrete random variable, i.e., one whose range R = (x1 , x2 , . . .) is finite or countable, and let pi = P (X = xi ) be the probability for xi to happen. The entropy H(X) of X is defined by

H(X) = \sum_{p_i > 0} p_i \log_2 \frac{1}{p_i} = -\sum_{p_i > 0} p_i \log_2 (p_i).

Let us take a simple example. Suppose we check one out of two boxes (e.g., male and female) to provide some information. Before choosing a box, both boxes have an equal chance of being selected, and we only pick one. So, before the selection, the entropy is

(1/2) log2 2 + (1/2) log2 2 = 1 = 1 − information.

Therefore, the information is 0. After the selection, the entropy is

−1 · log2 1 = 0 = 1 − information.

Therefore, the information is 1. The information gained is

information = 1 − 0 = 1.
For four boxes, we may either use the definition of entropy directly or group the boxes into two subsets, {box 1, box 2} and {box 3, box 4}. We then pick one subset out of the two and one box from the chosen subset of two boxes, so

information = 1 + 1 = 2.
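As a quick numerical check of these computations (our illustration, not the book's), the following Python snippet evaluates H(X) for the uniform two-box and four-box cases:

```python
import math

def entropy(probs):
    # H(X) = -sum of p_i log2 p_i over the p_i > 0
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))                # 1.0: one binary choice
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0: two binary choices
```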
The material world tends to homogenize distributions; for instance, air tends to mix all its components uniformly. These are the results of the increase of entropy. On the other hand, living beings tend to select only some components from a mixture. These separations of components correspond to an increase of information. The natural world tends to increase entropy and decrease information; living beings tend to increase information and decrease entropy. An ancient Chinese philosopher, Lao Tzu (the founder of Taoism, cf. Tao Te Ching, Chapter 77) [2], said: “The heavenly Tao (i.e., the way of the material world) takes from those who have too much, and it gives to those who have little or nothing (alluding to a flat plain being made by the erosion of mountain tops and the filling of the valley bottoms). Ah, but the human way is different. Even the wealthiest leech the poor so they can have even more.” Later on, the second part of his phrase was repeated in Matthew's Gospel [1] 13:12, 25:29 and is known as the “Matthew effect” in modern sociology. In modern times, Lao Tzu's words can be rephrased as “the way of nature is to increase entropy; the way of living beings is to increase information (i.e., to decrease entropy).” According to the second law of thermodynamics, the amount of entropy will increase, or the amount of information will decrease, as a message passes through noisy channels, including the channel of time. Hamming code is a way of protecting parts of the information so that they decrease less, with the help of a proper decoding procedure. The meaning of Hamming's discovery can be better understood
using the framework of Shannon’s information theory. Hamming’s discovery
led to BCH codes, Reed–Solomon codes (which are widely used today), and
geometric Goppa codes. Similar to how a tall tree starts from a tiny seed, the
study of self-correcting codes eventually leads to many interesting problems
for algebraic-geometry enthusiasts.
Let us return to the codes of life, DNA and RNA. Today, they are broken down into basic units called genes. There is an important phenomenon called mutation, which is caused by environmental cosmic rays (lately, biologists have added free radicals as another mutation agent) and is a driving force behind evolution. The cosmic rays input energy and change the configuration of the molecules of the genes. The mutations can be considered as errors. Some mutations are detrimental and are not welcomed by living beings. How does the body deal with these mutations? There may be some mechanism, say a self-correcting function, to prevent those mutations from happening. We see that some genes are functional and some are non-coding genes or junk genes, and genetic engineers typically transplant only the functional genes to other species. We know that life is very economical, so why are the non-coding genes passed down through the generations? Perhaps they exist for a self-correcting purpose, similar to the additional symbols b1 b2 b3 in our example of the Hamming code.

It turns out that DNA has some proof-reading capabilities which RNA lacks, although how those proof-reading capabilities function is unclear. These capabilities slow down the rates of mutations considerably. The lack of proof-reading capabilities in RNA makes some viruses evolve rapidly, and this phenomenon causes many problems for an individual's health and may even cause a worldwide pandemic. Transplanting a single useful gene without the associated proof-reading capability might well be dangerous. The self-correcting codes that occur in nature might be better than all of our algebraic coding theory.
Similar to the codes of life, civilizations and cultures themselves may be viewed as the transmission of codes through time. Due to the decaying phenomena caused by historical events, thermodynamics, and cosmic rays, we may view the channel of time as a noisy channel. In his old age, Leonardo da Vinci worried about the decay of his masterpieces. Preservation of our heritage has become an important topic. One way might be to use self-correcting codes to prolong the useful period of our civilization and culture. Oral and written languages are important parts of our heritage. In all oral and written languages, there are many non-functional parts which serve as check symbols. It is dangerous to delete these parts.
We live in the age of technology. Messages are transmitted in sequences of 0's and 1's through space. Noisy channels make errors possible, and so self-correcting codes become vital to eradicate all errors (as long as the number of errors is small). Self-correcting codes are widely used in industry for a variety of applications including e-mail, telephone, remote sensing (e.g., photographs of Mars), CDs, etc. We will present some essentials of the theory in this book.
Using linear algebra, we have the salient Hamming codes. The next level
of coding theory is through the usage of ring theory, especially polynomials,
rational functions, and power series to produce BCH codes, Reed–Solomon
codes, and classical Goppa codes. The more advanced level of coding theory
is an application of algebraic geometry to geometric Goppa codes. The aim
of this book is to gradually bring interested readers to the most advanced
level of coding theory.
PART I

Vector Space Codes


Chapter 1

Linear Codes

1.1. Real Curve Codes

In this chapter, we lay the foundation for coding theory using linear algebra, although there are certainly many other ways. For instance, we could simply send multiple copies of the same message and determine the correct one bit by bit by a majority vote; this is called a repetition code. Alternatively, we could use real curve theory to construct a “self-correcting code” as follows.
Given data a0 , a1 , let us consider the line defined by the linear equation

y = f (x) = a0 x + a1 .

We transmit the values y0 = f (0), y1 = f (1) (and in general yi = f (i)) instead of {a0 , a1 }, making the observation that {a0 , a1 } and {y0 , y1 } determine each other. There is no way to tell if the transmitted {y0 , y1 } contain an error. To detect errors, assume that there is at most one error; we may transmit a group of three data, (y0 , y1 , y2 ). If (0, y0 ), (1, y1 ), and (2, y2 ) are on a line L3 ⊂ A^2_R , then there is no mistake, since we assume that there is at most one error. If they are not on a line, then there is a mistake. However, we cannot decide which one is an error. To correct one possible error, we should add two more symbols y2 = f (2), y3 = f (3) instead of just one more y2 and transmit {y0 , y1 , y2 , y3 }. Because we assume that there is at most one error, there must be at least three correct values. For any three correct values (f (i), f (j), f (k)) among the four values, the corresponding points (i, f (i)), (j, f (j)), (k, f (k)) will lie on a line L4 ⊂ A^2_R , and hence the remaining point (ℓ, f (ℓ)) is determined, i.e., since ℓ is correct, we just need to determine f (ℓ). That is, a brute-force search for the correct triple will reveal which three values are consistent (for a line) and hence determine the extra one.
Let us consider the problem of correcting two errors. Let us add one more datum f (4). Now, we have f (0), f (1), f (2), f (3), and f (4). Can we correct two errors? Say, (0, f (0)), (1, f (1)), (2, f (2)) lie on a line. However, (2, f (2)), (3, f (3)), (4, f (4)) may lie on a different line. Then, we cannot tell which line is the correct one, and we cannot correct the mistakes. We may modify the method of the previous paragraph to correct two errors. We shall add two more points {f (4), f (5)} and transmit {y0 , y1 , . . . , y4 , y5 }. Because we assume that there are at most two errors, there are at least four correct values. Furthermore, any four correct values will determine the line and hence the remaining two values. That is, a brute-force search for the correct four-tuple will reveal which four values are consistent (for a line). Thus, two errors can be corrected this way.
It is easy to generalize the above method to correct any number of errors.
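Here is a minimal Python sketch of the brute-force decoding just described for the one-error line code over the reals (the helper name decode_line is ours): given four received values with at most one error, it searches for a consistent triple and recovers the line.

```python
from itertools import combinations

def decode_line(ys):
    """ys = [y0, y1, y2, y3] with at most one wrong value.
    Returns (a0, a1) with y = a0*x + a1 fitting at least three points."""
    pts = list(enumerate(ys))                 # points (i, y_i)
    for triple in combinations(pts, 3):
        (x0, y0), (x1, y1), (x2, y2) = triple
        a0 = (y1 - y0) / (x1 - x0)            # slope through two points
        a1 = y0 - a0 * x0
        if abs(a0 * x2 + a1 - y2) < 1e-9:     # is the third consistent?
            return a0, a1
    raise ValueError("more than one error")

# y = 2x + 1 gives (1, 3, 5, 7); corrupt the third value:
print(decode_line([1, 3, 4, 7]))              # -> (2.0, 1.0)
```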
Instead of lines, we may use curves of higher degrees. We may consider all quadratic curves. A quadratic curve is defined by the equation

y = f (x) = a0 x^2 + a1 x + a2 ,

representing the original data {a0 , a1 , a2 }. We transmit the values y0 = f (0), y1 = f (1), and y2 = f (2) instead of {a0 , a1 , a2 }, making the observation that {a0 , a1 , a2 } and {y0 , y1 , y2 } determine each other. There is no way to tell if the transmitted {y0 , y1 , y2 } contain an error. To correct one possible error, we should add two more symbols y3 = f (3), y4 = f (4) and transmit {y0 , y1 , y2 , y3 , y4 }. Suppose that, on the other end, the receiver receives {y0 , y1 , y2 , y3 , y4 } with one possible error. The receiver then determines which four-tuple is consistent (for a quadratic curve) and then uses it to correct the fifth value. In general, we feel that if we consider curves of higher degrees and if we transmit sufficiently more points than necessary, say s more points, then we may correct ⌊s/2⌋ errors. However, a brute-force search for the correct tuple will be time-consuming. We may modify this curve code slightly to produce the Reed–Solomon code (see Section 3.3) with a fast decoding process.
Both repetition codes and real curve codes are time-consuming for
decoding. Instead, we focus on linear codes, which are more efficient and
have fast decoding methods. We follow the historical development of the
theory of self-correcting codes, primarily using techniques from linear
algebra.

1.2. Preliminaries

1.2.1. Groups and Fields


We study the foundations of algebra. At its kernel is group theory. Abstract group theory was an invention of Galois. It has turned out to be an important concept in mathematics, physics, etc. Let us give the usual definition of an abstract group.

Definition 1.1. Let G be a non-empty set. A binary operation · is a rule


to assign an element c = a · b, given any ordered pair of elements (a, b) in
G. If a · b ∈ G for all a, b ∈ G, then we say that G is closed under the binary
operation ·. If G is closed under · and has the following three properties,
then we say (G, ·) forms a group under the binary operation ·:
(1) Associative law: (a · b) · c = a · (b · c).
(2) The existence of identity: there exists an element e such that a · e =
e · a = a for all a ∈ G.
(3) The existence of inverse: for any element a, there exists an element b
such that a · b = b · a = e.
Sometimes, we omit the mention of · if it is obvious, and we simply say
that G is a group. If a · b = b · a holds for all a, b ∈ G, we say that G is an abelian group or commutative group. 

In primary and high schools, we studied the associative law, commutative law, and distributive law for the integers Z, and we may have noticed that the additive identity 0 is distinct from the multiplicative identity 1. Now, we assume that the same rules are satisfied by the two imposed operations, addition and multiplication, on the set of symbols. Furthermore, we include one more useful rule which is valid for the set of rational numbers Q, i.e., every non-zero element has an inverse. We call a non-empty set K with two operations that satisfies all the preceding requirements a field. So, algebraically, a field behaves like the rational numbers. Usually, we work on a set of symbols {a1 , . . . , an }. We take our symbols from a finite field Fq (see the following), and we call them letters in our discussions of coding theory. A field (K, +, ·) is defined as follows.

Definition 1.2. Let K be a set with two operations (+, ·), where + and · are binary operations between elements in K such that for all a, b, c ∈ K the following conditions are satisfied:

(1) (K, +) is an abelian group, with the identity denoted by 0;


(2) (K\{0}, ·) is an abelian group, with the identity denoted by 1;
(3) Distributive law: (a + b) · c = a · c + b · c.
Then, we say that (K, +, ·) is a field. Sometimes, we omit the mention of
+, · if it is obvious, and we simply state that K is a field. 

Example 1: The well-known rational numbers Q, real numbers R, and complex numbers C are all fields. Another important family of fields is Z/pZ, the field of equivalence classes [a]p modulo a prime number p:

[a]p = {all integers b such that p | a − b},

where p is a prime integer, and the addition and product are defined as

[a]p + [b]p = [a + b]p ,   [a]p · [b]p = [ab]p .

It is routine to show that Fp = Z/pZ is a field if p is a prime number (see Exercises). 
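As a tiny numerical illustration (ours, not the book's), one can verify the decisive field property of Z/pZ, the existence of multiplicative inverses for non-zero classes, for p = 7:

```python
p = 7  # a prime, so Z/7Z is a field
for a in range(1, p):
    # brute-force search for the inverse of [a]_p
    inv = next(b for b in range(1, p) if (a * b) % p == 1)
    print(f"[{a}]_{p} * [{inv}]_{p} = [1]_{p}")
```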
We have the following definition.

Definition 1.3. Let K be a field. Consider the set of repeated sums of the multiplicative identity 1 of K:

S = {1, 1 + 1, . . . , n · 1 = 1 + 1 + · · · + 1 (n terms), . . .}.

Then, either the set S does not contain 0, or the smallest n such that n · 1 = 0 must be a prime number p. In the first case, we say the field K has characteristic 0, and in the second case, we say the field has characteristic p. 

The only thing we have to establish in Definition 1.3 is that the smallest positive integer n with n · 1 = 0, if it exists, must be a prime number. Otherwise, let n = ℓ · m with 0 < ℓ, m < n; it follows that

n · 1 = ℓ · m · 1 = (ℓ · 1)(m · 1) = 0.

Since we have a field, either ℓ · 1 = 0 or m · 1 = 0. Therefore, n is not the smallest such integer, a contradiction; i.e., n is prime.
We need some basic knowledge of field theory. The reader is referred
to Commutative Algebra, Vol. I, p. 60, by Zariski and Samuel [16], for field
theory and the following corollary.

Corollary 1.4. If L is an overfield of K and x ∈ L, then x is algebraic


over K if and only if K(x) is a finite extension of K. In that case, if
n = [K(x) : K] (i.e., n = the vector space dimension of K(x) over K),
then the degree of the minimal polynomial of x in K[X] is n.

Proof. See the reference. 

The reader is referred to Commutative Algebra, Vol. I, p. 106, by Zariski


and Samuel [16], for the following theorem.

Theorem 1.5. If K is a field, then there exists an algebraic closure of K


and any two algebraic closures of K are K-isomorphic fields.

Proof. See the reference. 

1.2.2. Finite Fields


In coding theory, we are interested in a finite field L. Naturally, it contains some prime field Zp , which has exactly p elements. Therefore, L is a finite-dimensional vector space over Zp . Letting the dimension be n and putting a coordinate system on it, we conclude that L has precisely p^n elements, using the following theorem.

Theorem 1.6. If K is a finite field, then there are exactly p^n = q elements in K for some suitable prime number p and positive integer n. If we fix an algebraic closure Ω and only consider subfields of Ω, then the field of p^n elements exists uniquely and will be called F_{p^n} = Fq . We may use either Zp or Fp to denote a finite field of p elements.

Proof. See the above references. 

1.2.3. Vector Spaces


Using letters, we form messages (which may be ordinary words, sentences, and articles). Note that some letters may indicate an empty space or empty line. Given any positive integer k (i.e., the size of a page), we may chop a long article into many blocks of the same length k, just as we chop a long book into many pages, and pad the last block. We shall only work on a single block of size k. Note that messages are more than simple letters. If there is no relation among the letters of a message, then it is hard to locate and correct errors. It is to our advantage to introduce algebraic relations for the letters in a message. We shall put the simplest relation, the structure of a vector space (see later), on the letters in a single block of length k to help us locate and correct errors. We will make a detailed study of the coding theory thus discussed.

Before we go further, we need some elementary knowledge of Linear


Algebra. Let us define the concept of vector space.

Definition 1.7. Let (K, +, ·) be a field and V be a non-empty set. A vector space (V, +, ·, K) is given by a binary operation + between elements of V and a binary operation · : K × V → V such that, for a, b ∈ K and v, u ∈ V, the following conditions are satisfied:

(1) (V, +) is an abelian group.


(2) Associative law: (a · (b · v)) = (a · b) · v, for all a, b, v.
(3) Distributive law: (a + b) · v = a · v + b · v and a · (v + u) = a · v + a · u.
(4) 1 · v = v.

Sometimes, we omit the mention of (V, +, ·) if they are obvious and say V
is a vector space. 

A set of vectors {vi }i∈I is called a set of generating vectors if for any vector v ∈ V, we always have an expression v = \sum_{finite} a_i v_i . A set of vectors {vi }i∈I is called a set of linearly independent vectors if for any expression 0 = \sum_{finite} a_i v_i , we must have a_i = 0 for all i. A set of vectors {vi }i∈I is called a basis of V if it is a set of generating and linearly independent vectors. A common theorem on vector spaces is that, for a given vector space V, all bases have the same cardinality, which is called the dimension of the vector space V. A vector space is said to be finite-dimensional if it has a finite basis. In coding theory, we only use finite-dimensional vector spaces.
If we have two fields L ⊃ K, then clearly L is a vector space over K.

1.2.4. Matrices

Consider a matrix A. The matrix A is said to be in reduced row echelon form if it satisfies the following conditions: (1) Every non-zero row is of the form [0, . . . , 0, 1, . . .], where the first non-zero term is 1 and occurs at the nj th position; this term is usually called the pivot. (2) If i < j, then ni < nj . (3) In the nj th column, only that particular coefficient is 1, while all other coefficients are zeroes. It follows from a well-known theorem that, by row operations, any matrix can be reduced to its reduced row echelon form.

1.3. Vector Space Codes

In general, fixing an algebraic closure Ω of Zp , we will consider a finite field K of q = p^m elements between Ω and Zp . The field is uniquely determined by the number q and will be denoted by Fq . A detailed treatment of finite fields is given in Section 2.2.

Let us refer to the finite field Fq as our set of letters; then any message a1 a2 · · · ak can be considered as a horizontal vector [a1 a2 · · · ak ] in the vector space F_q^k .

A basis {e1 , . . . , ek } is called a standard basis if every element ei is of the form ei = (0, . . . , 0, 1, 0, . . . , 0). For the purpose of coding theory, we will only consider the standard basis in F_q^k , since a message is composed of letters with positions: a letter will be denoted by an element ai ∈ Fq , while a message will be denoted by \sum_i a_i e_i in the standard basis. Since we use horizontal vectors, all matrices of linear actions will act from the right. The theory of Gaussian reduced row echelon forms is significant, and we shall allow permutations of columns (which correspond to exchanging the positions of letters). The vector space F_q^k with the standard basis will be called the message space V.

Let n ≥ k. We consider another vector space F_q^n = U as the word space, and any vector in F_q^n = U will be called a word. We repeat our construction of the standard basis, a word being any vector \sum_i^n a_i e_i , and we again allow permutations of columns. A system of linear equations defines a subspace C of dimension k in F_q^n (= U). Any element of C will be called a code word. The subspace C will be referred to as the code space.

Definition 1.8. An [n, k] code is a k-dimensional subspace C, called the code space, of an n-dimensional vector space F_q^n (= U), called the word space, with the standard basis. Sometimes, the number k will be referred to as the rank of C. A coding is a one-to-one (i.e., injective) map from the message space V, which is another copy of the vector space F_q^k , to the code space C.

All concepts and theorems in coding theory are independent of the ordering of the basis vectors, so we will allow permutations of the standard bases in the message spaces and word spaces. Two [n, k] codes C1 , C2 are said to be equivalent if their code spaces C1 , C2 differ by permutations of the standard bases of the message space and word space. 

A k-dimensional subspace C is generally defined by n − k equations. To apply Gaussian elimination to simplify the system, we are restricted to using row operations and permutations of columns only. Therefore, we may assume that the system of equations is of the following standard form:

y1 = c1,1 x1 + · · · + c1,k xk ,
··· (2)
yn−k = cn−k,1 x1 + · · · + cn−k,k xk .

It follows that the generator matrix G can be written in the following standard form:

G = \begin{pmatrix}
1 & 0 & \cdots & 0 & c_{1,1} & \cdots & c_{n-k,1} \\
0 & \ddots & & \vdots & \vdots & & \vdots \\
\vdots & & \ddots & 0 & \vdots & & \vdots \\
0 & \cdots & 0 & 1 & c_{1,k} & \cdots & c_{n-k,k}
\end{pmatrix},

with [x1 · · · xk ] × G = [x1 · · · xk y1 · · · yn−k ]. The symbols xi will be called message symbols, and the symbols yi will be called check symbols. Define G′ using equation (2), where

G = (I | G′)

and I is the k × k identity matrix.

Observe that if we rewrite the system of equations in equation (2) in the form

y1 − c1,1 x1 − · · · − c1,k xk = 0,
· · ·                                                      (2′)
yn−k − cn−k,1 x1 − · · · − cn−k,k xk = 0,

then the check matrix H can be written as


H = \begin{pmatrix}
-c_{1,1} & \cdots & -c_{n-k,1} \\
\vdots & & \vdots \\
-c_{1,k} & \cdots & -c_{n-k,k} \\
1 & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & 1
\end{pmatrix}.

It is easy to verify that G × H = 0; hence, we have [x1 · · · xk ] × G × H = [0 · · · 0]. We may write

H = \begin{pmatrix} -G' \\ J \end{pmatrix},

where J is the (n − k) × (n − k) identity matrix.


For a longer message, we chop it into blocks, which are vectors in F_q^k , padding the end if necessary and applying the above method to each block. This is the so-called block code.
We may identify the message space V with the subspace {[x1 · · · xk 0 · · · 0] : xi ∈ Fq }. If there is no confusion, we will call this subspace the message space V.
Let us consider the coset space U/V, the image space of the mapping defined by H. It is called the syndrome space. It can be canonically identified with the subspace W = {[0 · · · 0 y1 · · · yn−k ] : yj ∈ Fq }. For any element a ∈ U, a × H is called the syndrome of a. We have the following proposition.

Proposition 1.9. Let a, b ∈ U. Then, a and b have the same syndrome, i.e., they belong to the same coset, iff a × H = b × H. In particular, a is in the code space iff a × H = 0. 
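To make these constructions concrete, here is a small Python sketch (our illustration) of a binary [5, 2] code: the entries of G′ are arbitrarily chosen for the example, G = (I | G′) and H = (−G′ over J) are assembled as above, and we check that code words have zero syndrome.

```python
import itertools

# A [5, 2] binary code: k = 2 message symbols, n - k = 3 check symbols.
Gp = [[1, 0, 1],        # G' is k x (n - k); these entries are arbitrary,
      [1, 1, 0]]        # chosen only for illustration
k, r = 2, 3

G = [[int(i == j) for j in range(k)] + Gp[i] for i in range(k)]   # (I | G')
H = [row[:] for row in Gp] + \
    [[int(i == j) for j in range(r)] for i in range(r)]  # (-G' over J); -1 = 1 in F_2

def times(v, M):        # row vector times matrix, over F_2
    return [sum(a*b for a, b in zip(v, col)) % 2 for col in zip(*M)]

for msg in itertools.product([0, 1], repeat=k):
    word = times(msg, G)                 # encode the message
    assert times(word, H) == [0] * r     # code words have zero syndrome

print(times([1, 0, 0, 0, 0], H))         # a non-code word: non-zero syndrome
```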

Example 2: The example of Hamming in the Introduction is a [7, 4] code. Its syndrome space W is {[0 0 0 0 y1 y2 y3 ] : yj ∈ Fq }. Let us consider a general [n, k] code C. From the check matrix H, we may add n × k zeroes to form the following n × n matrix H̄:

H̄ = \begin{pmatrix}
0 & \cdots & 0 & -c_{1,1} & \cdots & -c_{n-k,1} \\
\vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & -c_{1,k} & \cdots & -c_{n-k,k} \\
0 & \cdots & 0 & 1 & \cdots & 0 \\
\vdots & & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & 0 & \cdots & 1
\end{pmatrix}.

Then, it is easy to see that H̄^2 = H̄, and H̄ : a → a × H̄ is the projection of the whole space U onto the syndrome space W. 

1.4. Distances

The only natural metric on Fq is the discrete one, i.e., we have d(a, b) = 1 if a ≠ b and d(a, b) = 0 if a = b. We generalize this distance function to the n-dimensional vector space F_q^n (= U) by using the following definitions.

Definition 1.10. Let a = [a1 · · · an ], b = [b1 · · · bn ] ∈ U. Then, we define the Hamming distance between a and b as the product distance of the above-mentioned natural metric on Fq (which may be called a sum distance), i.e., d(a, b) = \sum_i d(ai , bi ), where d(ai , bi ) is the natural distance between elements of the field Fq . 

Definition 1.11. Let a = [a1 · · · an ] ∈ U. Then, we define the Hamming


weight of a as w(a) = d(a, 0). 

We have the following natural theorem.

Theorem 1.12. The Hamming distance is a distance, i.e.,

(1) d(a, a) = 0;
(2) d(a, b) = d(b, a);
(3) d(a, b) + d(b, c) ≥ d(a, c).

Proof. It is routine for all product distances. 

Proposition 1.13. We always have d(a, b) = w(a − b) = d(a − b, 0).

Proof. It is evident. 

Definition 1.14. The minimal distance of a set S is min{d(a, b) : a ≠ b; a, b ∈ S}. Sometimes, the minimal distance of a set S is called the Hamming distance of S. 
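The Hamming distance and weight translate directly into code; a minimal Python sketch (ours) follows:

```python
def hamming_distance(a, b):
    # number of coordinates where a and b differ
    return sum(x != y for x, y in zip(a, b))

def hamming_weight(a):
    # w(a) = d(a, 0), the number of non-zero coordinates
    return sum(x != 0 for x in a)

assert hamming_distance([1, 0, 1, 1], [1, 1, 1, 0]) == 2
assert hamming_weight([0, 1, 0, 1]) == 2
```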

1.5. Maximum Likelihood Decoding

From the above discussion, we know that a coding theory is determined by its code space, which is a subspace of the word space. In this section, we show that, given any subspace of a finite-dimensional vector space, a coding theory is determined naturally. This means that coding theory corresponds to the theory of subspaces of finite-dimensional vector spaces.

Let a subspace C of a finite-dimensional space U be given. Let us call C the code space and U the word space. The finite-dimensional subspace C is determined by a finite system of equations, and by Gaussian row operations and column permutations, we may assume it to be of the following form (cf. equation (2)):

y1 = c1,1 x1 + · · · + c1,k xk ,
· · ·
yn−k = cn−k,1 x1 + · · · + cn−k,k xk .

It is easy to see that the {x1 , . . . , xk } define a message space V. Let us transmit only vectors v from C. Suppose we have received a word v′ = [a′1 · · · a′n ], and we know that the code word v = [a1 · · · an ] is in the code space C, but perhaps v′ ∉ C. How do we recover the code word v? Let us assume that v′ = v + e, where e is the error vector. Since v = v′ − e, v and e uniquely determine each other once v′ is known. However, with only v′ known, we cannot uniquely determine e. We may select a possible e with e × H̄ = v′ × H̄, where H̄ is the matrix of Example 2 of the preceding section, and let v = v′ − e. Note that we have

v × H̄ = (v′ − e) × H̄ = 0.

Therefore, we conclude that v ∈ C, the code space. However, upon closer examination, we discover that we may use any element of the same coset as e. Up to now, there is no particular reason to pick any one of them. We have to find some criterion for the best possible selection.
Let us consider the real situation. Let us fix a position of the error vector e; a zero coordinate indicates that there is no error at that position, whereas a non-zero coordinate indicates that the letter at that position should be corrected. Let pi be the probability (which is due to the machine or the program and is independent of v′ ) that the error vector e at a particular position has the value i. For simplicity, we assume that the following definition is satisfied.

Definition 1.15 (Symmetric channel). If pi = pj (= p) for all i, j ≠ 0, and p0 > 1/2, then the channel is said to be a symmetric channel . 

Let q be the number of symbols i. Note that \sum_{i=0}^{q-1} p_i = 1; therefore, p = pi < 1/(2(q − 1)) for all i ≠ 0. Note that p0 is the probability that there is no error at this particular position. The condition p0 > 1/2 simply means that it is less likely to have an error. This is reasonable. We shall assume that we have a symmetric channel.

Proposition 1.16. For a symmetric channel, the probability P (e) for an error vector e to happen is p^{w(e)} p_0^{n−w(e)} , where w(e) is the Hamming weight of e and p0 = 1 − (q − 1)p is the probability of no error at a given position. Therefore, P (e) > P (e′ ) iff w(e) < w(e′ ).

Proof. It is evident. 
Let us continue our discussion of taking a subspace C as the code space; an ad hoc decoding method is the following maximum likelihood decoding method. Maximum likelihood decoding is to find an error vector e in the coset S of v′ such that

w(e) = d(e, 0) = min{w(u) : u ∈ S}.

This means that maximum likelihood decoding selects the correction with the least number of corrected positions. The previous proposition explains the meaning of the term maximum likelihood decoding. We further define the following.

Definition 1.17. A coset leader e of a coset S is an element e ∈ S such that

w(e) = d(e, 0) = min{w(u) : u ∈ S}. 

See Figure 1.1.

[Figure 1.1. Various spaces: the word space with its code space, message space, syndrome space, a coset with its syndrome, and the coset leader set.]

In the case of real numbers, maximum likelihood decoding orthogonally projects the vector v′ onto the subspace C. Usually, the orthogonal projection of v′ is unique. Figure 1.1, drawn over the field of real numbers, is misleading in the context of a finite field, where the coset leaders of a vector v′ form a set rather than a single point as in the real case. For instance, we have the following example.
Example 3: Let us consider a [6, 3] code C over the prime field F2 = Z/2Z with the following check matrix H:

H = \begin{pmatrix}
1 & 1 & 0 \\
0 & 1 & 1 \\
1 & 0 & 1 \\
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{pmatrix}.

Furthermore, we let H̄ be the following matrix:

H̄ = \begin{pmatrix}
0 & 0 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}.

Let S be the coset with syndrome [000111], i.e., s × H̄ = [000111] for any s ∈ S; then, it is easy to see that no element s with w(s) = 1 is in S, while there are three elements [100001], [010100], and [001010] in S with w(s) = 2. They are all coset leaders, and thus coset leaders are not unique. 
If there are several coset leaders for a coset, we will designate any one of them as the coset leader. The maximum likelihood decoding procedure is as follows: for any received word v′ , we find the coset S in which it lies (usually by finding the syndrome of v′ , which determines the coset S); then, we find a coset leader e, which is the most likely error vector in S. Finally, we correct the error by taking v = v′ − e. In engineering, we may keep a table of pairs {(syndrome, coset leader)} for the sake of convenience. Since the syndrome space is of dimension n − k, it consists of q^{n−k} elements. If n − k is small, we may precompute the table and decode accordingly.
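A short Python sketch (ours) of this table-based procedure for the [6, 3] code of Example 3: we group all words by syndrome, keep a minimum-weight element of each coset as its leader, and decode by subtracting the leader.

```python
import itertools

H = [(1,1,0),(0,1,1),(1,0,1),(1,0,0),(0,1,0),(0,0,1)]  # check matrix rows

def syndrome(word):
    return tuple(sum(w*h for w, h in zip(word, col)) % 2 for col in zip(*H))

# table: syndrome -> coset leader (a minimum-weight word of that coset)
leader = {}
for word in itertools.product([0, 1], repeat=6):
    s = syndrome(word)
    if s not in leader or sum(word) < sum(leader[s]):
        leader[s] = word

def decode(received):
    e = leader[syndrome(received)]              # most likely error vector
    return tuple((x - y) % 2 for x, y in zip(received, e))

r = (1, 0, 0, 1, 1, 1)
print(decode(r), syndrome(decode(r)))           # decoded word, zero syndrome
```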
Note that this particular decoding procedure may not be effective for all codes, and it may decode to a wrong word, so we shall look for other possible decoding procedures. The advantage of this procedure is that it exists for any code. Therefore, we may study any code with the decoding procedure of maximum likelihood decoding, even when it is not effective. In all cases, maximum likelihood decoding is an ad hoc method of decoding. Using it, every subspace C establishes a self-correcting code; therefore, there is a one-to-one correspondence between the set of self-correcting codes and the set of all subspaces.
Let us continue our discussion on possible codes.

1.6. Dual Codes

Let C be a linear code, i.e., C is a vector subspace of F_q^n . Let us introduce a pairing ⟨v | u⟩ on F_q^n as follows.

Definition 1.18. Let v = (v1 , . . . , vn ) and u = (u1 , . . . , un ) be two vectors in F_q^n with respect to a fixed basis. Note that we only allow a basis to be a permutation of the standard basis. We define the pairing ⟨v | u⟩ as

⟨v | u⟩ = \sum_{i=1}^{n} v_i u_i .

Note that a pairing is not an inner product: since there is no sense of size in F_q^n , we cannot say ⟨v | v⟩ > 0 if v ≠ 0. However, the dual space exists, and we define the following.

Definition 1.19. Let C⊥ = {v ∈ F_q^n : ⟨v | u⟩ = 0 for all u ∈ C}. If C is a linear code, then C⊥ is called the dual code of C. 
Proposition 1.20. Given C a linear code, its dual C⊥ is also a linear
code. If C is an [n, k] code, then C⊥ is an [n, n − k] code.
Proof. The proof is left to the reader. 

Exercises

(1) Prove that Z/pZ is a field, while Z/p^m Z is not a field for m ≥ 2.
(2) Let us consider the repetition code [a1 · · · an ] → [a1 a1 · · · a1 · · · an an · · · an ], where each digit repeats itself m times. Find the generator matrix, check matrix, and the minimal distance of this code.
(3) Prove that for a given binary [n, k] code with at least one word of odd
weight, all code words of even weight form an [n, k − 1] code.
(4) Let us consider the example of Exercise (2). If we want to correct two errors, how long should the code be? If we want to correct ℓ errors, how long should the code be?

(5) Prove that if C is an [n, k] code, then C⊥ is an [n, n − k] code.


(6) Let C be a code space. Show that
min{d(a, b) : a ≠ b ∈ C} = min{w(a) : a ≠ 0, a ∈ C}.
(7) Let C be the repetition code with n = 3 = m (cf. Exercise (2)). What
is C⊥ ?
(8) Let us use the line codes of Section 1.1 to correct one error. We received (1, 3, 3, 4). Assume that there is at most one error. Which of the four digits is an error, and what are the correct four digits?

1.7. Hamming Code

Let us consider the case of a binary code, i.e., q = 2, Fq = F2 . For n ≥ 3, a Hamming code is a code with the check matrix H in which the rows of H are all the non-zero elements of F_2^n . Note that the number of rows is 2^n − 1, H is a (2^n − 1) × n matrix, and the code thus constructed is a [2^n − 1, 2^n − n − 1] code. After a suitable permutation of rows, we may assume that the last n rows form an identity matrix J, and hence H is in standard form.

For the purpose of decoding, we may arrange the rows of H differently. We will represent elements of F_2^n by integers as follows. Recall the binary expansion of an integer i between 0 and 2^n − 1:

i = \sum_{j=1}^{n} a_j 2^{j-1} , 0 ≤ i ≤ 2^n − 1, a_j = 0, 1,

and we put [a1 · · · an ], which is an element of F_2^n , as the ith row of H. Now, suppose that a received word [b′1 · · · b′_{2^n −1} ] has at most one error. This means that if [b1 · · · b_{2^n −1} ] is the code word, then either

[b′1 · · · b′_{2^n −1} ] = [b1 · · · b_{2^n −1} ]

or

[b′1 · · · b′_{2^n −1} ] = [b1 · · · b_{2^n −1} ] + [0 · · · 010 · · · 0].

The computer calculates

[b′1 · · · b′_{2^n −1} ] × H = [c1 · · · cn ].

If [c1 · · · cn ] = [0 · · · 0], then we can see there is no error, because if there were a single error, we would have

[b′1 · · · b′_{2^n −1} ] × H = ([b1 · · · b_{2^n −1} ] + [0 · · · 010 · · · 0]) × H = [0 · · · 010 · · · 0] × H ≠ [0 · · · 0],

which is a contradiction. If [c1 · · · cn ] ≠ [0 · · · 0], then there must be an error, and by the above computation, it is easy to see that the error appears at the ith position, where

i = \sum_{j=1}^{n} c_j 2^{j-1} .

The method is effective in both locating and correcting the error, since error correction is simply switching 0 and 1. The shortcoming is that Hamming codes are unable to correct several errors (in particular, a burst of errors).
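A brief Python sketch (ours) of this decoding rule for n = 3, i.e., the [7, 4] Hamming code with the rows of H arranged by binary expansion, so that the syndrome, read as a binary integer, is exactly the error position:

```python
n = 3
# Row i of H (1 <= i <= 2^n - 1) is the binary expansion [a1 ... an] of i.
H = [[(i >> j) & 1 for j in range(n)] for i in range(1, 2**n)]

def decode(word):
    # syndrome [c1 ... cn] = word x H over F_2
    c = [sum(w*h for w, h in zip(word, col)) % 2 for col in zip(*H)]
    i = sum(cj << j for j, cj in enumerate(c))   # i = sum of c_j 2^(j-1)
    if i:                                        # non-zero syndrome:
        word = word[:]
        word[i - 1] ^= 1                         # flip the ith bit
    return word

code_word = [0] * 7                              # the zero word is a code word
received = code_word[:]; received[4] ^= 1        # one error, at position 5
assert decode(received) == code_word
```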
The Hamming code is the grandfather of all self-correcting codes, and it calls for more theoretical study. We have the following proposition.

Proposition 1.21. The minimal distance between pairs of elements of any Hamming code C is three. This means that

min{w(a) : a ≠ 0, a ∈ C} = 3.

Proof. We have a ∈ C ⇔ a × H = 0. This equation can be interpreted as a linear relation among the rows of H, i.e., the minimal distance is the minimal number of row vectors which are linearly dependent. Since all row vectors of H are non-zero, we have min(w(a)) > 1. Since any two row vectors are distinct, we have min(w(a)) > 2. We can easily find two rows which sum to a third row, so we conclude that min(w(a)) = 3. 
The concept of the above proposition is very important. So, we have the
following definition.

Definition 1.22. Let C be an [n, k] code. If

d = min{w(a) : a ≠ 0, a ∈ C},

then C will be called an [n, k, d] code. 

The following proposition was discovered by Singleton in 1964.


Proposition 1.23 (Singleton bound). Let C be an [n, k, d] code. Then,
d ≤ n − k + 1.

Proof. Let π be the projection of the word space onto the last n − d + 1 coordinates. Let us restrict π to the code space C. Let two code words c ≠ c′ , so that 0 ≠ c − c′ ∈ C. Then, there must be at least d non-zero coordinates in c − c′ . Clearly, if all of the last n − d + 1 coordinates of c − c′ were zeroes, then the number of non-zero coordinates of c − c′ would be at most n − (n − d + 1) = d − 1. This contradicts the definition of d. Hence, π(c − c′ ) ≠ 0, or π(c) ≠ π(c′ ). Therefore, the restriction of π is a one-to-one map on C. Looking at the dimensions, we conclude that k = dim(C) ≤ n − d + 1 and d ≤ n − k + 1. 

Definition 1.24. A code that satisfies the Singleton bound with equality,
i.e., d = n − k + 1, is called an MDS code (maximum distance separable
code). 

Note that all Hamming codes are [2^n − 1, 2^n − n − 1, 3] codes for n ≥ 3. For such a code to be an MDS code, we would need 3 = (2^n − 1) − (2^n − n − 1) + 1 = n + 1 ≥ 4, which is impossible. Therefore, no Hamming code is an MDS code. We have the following proposition.

Proposition 1.25. Let C be an [n, k, d] code. Then, for any received word a′ = [a′1 · · · a′n ] coming from a code word a = [a1 · · · an ] with at most ⌊(d − 1)/2⌋ errors, there is at most one element a ∈ C such that d(a, a′ ) ≤ ⌊(d − 1)/2⌋. Therefore, if the number of errors is bounded by ⌊(d − 1)/2⌋, then the decoded word is unique if it exists.

Proof. Suppose there is a code word a such that d(a, a′ ) ≤ ⌊(d − 1)/2⌋. If there were two elements a = [a1 · · · an ], b = [b1 · · · bn ] ∈ C satisfying the criteria of the proposition, then d(a, b) ≤ d(a, a′ ) + d(b, a′ ) < d, which is a contradiction.
Furthermore, the number of errors allowed is the maximum of d(a′ , a), where a is the code word and a′ is the received word. The last statement is obvious. 

Remark: In general, we may define a (possibly nonlinear) code as a subset M of a vector space F_q^n. Let r be a received word, which may be any element in F_q^n, with c ∈ M the original code word of r. Then there are two ways of decoding: (1) use maximum likelihood decoding (cf. Section 1.5). The problem is that it is neither always correct nor efficient. (2) The following is what we will do in coding theory. Define the minimal distance d as

d = min{d(a, b) : a ≠ b ∈ M}.

Given an integer t ≤ ⌊(d − 1)/2⌋, we return an error message when the number of errors is d(r, M) > t; otherwise, we have d(r, M) ≤ t. Note that this implies that there is a unique element c ∈ M such that d(r, c) ≤ t, and we correct r to c. Then, we construct a decoder which corrects up to t errors. The important properties of the decoder are to determine quickly if d(r, M) > t and to find quickly the correct c if d(r, M) ≤ t. What we can do now is not up to expectation: if there are less than or equal to t errors, the decoder will find the correct c for us; if d(r, M) > t, then the decoder will either find a c′ ∈ M such that d(c′, r) ≤ t (with a small probability) or return an error message when no such c′ can be found (with a large probability). □
Remark: Let us consider the example at the beginning of Section 1.1, in the linear case. If we only consider L3 = {(0, y0), (1, y1), (2, y2)}, then, since a line is determined by two of its points, two distinct lines can share at most one common point; therefore, two code words differ in at least two of the three coordinates, d = 2, and ⌊(d − 1)/2⌋ = 0. Thus, it cannot correct any error. If we consider L4 = {(0, y0), (1, y1), (2, y2), (3, y3)}, then, again, two distinct lines can share at most one point; therefore, d = 3 and ⌊(d − 1)/2⌋ = 1. Thus, it can correct one error. □
Note that according to the above proposition, a Hamming code may correct 1 = ⌊(3 − 1)/2⌋ error. This we already know. Later, we will construct [n, k, d] codes for large d and decode more than one error.
Another important property of the Hamming [2^n − 1, 2^n − n − 1, 3] codes C is that any word a = [a_1 · · · a_{2^n−1}] is within a distance of 1 of the code space, because a × H = [c_1, . . . , c_n] = [0 · · · 010 · · · 0] × H; therefore, (a + e) × H = [0 · · · 0], where e = [0 · · · 010 · · · 0] and a + e ∈ C. We define the following.
Definition 1.26. Let C be an [n, k, d] code. If all words in the word space F_q^n are within a distance of ⌊(d − 1)/2⌋ of C, then C will be called a perfect code. □
Definition 1.26 means that any word can be corrected, with at most t = ⌊(d − 1)/2⌋ errors, to a unique code word.
Let us compute the probability that a decoder which can correct t errors on an [n, k] code fails. For instance, for Hamming codes t = 1; if there are t + 1 or more errors, then the Hamming code decoder will not decode correctly. Let the channel have a probability p of transmitting a symbol incorrectly and q of transmitting it correctly. Then, p + q = 1 and

1 = 1^n = (p + q)^n = Σ_{i=0}^{n} C_n^i p^i q^{n−i}.

The probability r of failing to decode or decoding improperly is

r = Σ_{i=t+1}^{n} C_n^i p^i q^{n−i} = 1 − Σ_{i=0}^{t} C_n^i p^i q^{n−i}.
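As a quick numerical illustration (ours, not the book's), the following Python lines evaluate r for the [7, 4] Hamming code, where t = 1:

    from math import comb

    def failure_probability(n, t, p):
        # probability that more than t of the n symbols are corrupted,
        # each symbol being wrong independently with probability p
        q = 1 - p
        return 1 - sum(comb(n, i) * p**i * q**(n - i) for i in range(t + 1))

    print(failure_probability(7, 1, 0.01))   # roughly 0.002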
Exercises

(1) Show that a Hamming code is a perfect code (note that in the definition
of Hamming code, we assume that n ≥ 3).
(2) Let C be a binary perfect code of length n with minimum distance 7.
Show that n = 7 or n = 23.
(3) Let p be a prime number and q = pm . A q-ary Hamming code of length
(q n − 1)/(q − 1) is defined by a check matrix H with the following
properties: (1) the zero vector is not any row vector, (2) any two row
vectors are linearly independent, and (3) any non-zero vector is linearly
dependent on one of the row vectors. Show that it is a perfect code.
(4) Set up a computer program to decode a Hamming code.
(5) Show that if there are more than two errors in a received word r for a
Hamming code, then r will be decoded to a wrong code word.

1.8. Shannon’s Theorem

Shannon's theorem is a guiding light of coding theory. For all the different codes, we need common standards of measurement to compare them. We sometimes define the common standards for linear codes first and then generalize them to arbitrary codes. Let us consider the efficiency of codes first. We have the following definition.

Definition 1.27. The rate of information R of an [n, k] code is defined to


be k/n. 

Certainly, we have 0 < k/n ≤ 1, and we want the number k/n to be as large as possible. It is obvious that if k/n = 1, then the code cannot correct any error. Let us consider all codes (linear or otherwise). A code is defined as a subset M = {a_1, . . . , a_m}, with m elements, of the word space F_q^n = U. We may use maximum likelihood decoding to decode any received word a′ to a, where a ∈ M with d(a, a′) minimal (cf. Remark after Proposition 1.25).
We shall generalize the above definition to the general case. Let us use linear codes as guidance. For a linear code space of dimension k, the number of elements is q^k, and k/n = log_q(q^k)/n. Therefore, we naturally generalize the above definition to the following.
Definition 1.28. The rate of information R of a code M of m elements in F_q^n is defined to be

R(M) = n^{−1} log_q m. □

Let P_i be the probability of incorrect decoding for a_i ∈ M. Then,

P(M) = m^{−1} Σ_i P_i

is defined to be the average probability of incorrect decoding for the code M.

Definition 1.29. The rate of distance δ of an [n, k, d] code is defined to be δ = d/n. □

Proposition 1.30. The rate of information of an [n, k, d] code is at most 1 + 1/n − δ.

Proof. It follows from Proposition 1.23. □


It follows from the preceding proposition that as the rate of distance δ
gets larger for a linear code, the rate of information gets smaller. We are
working towards the statement of Shannon’s theorem [33], which has been
very influential to the development of coding theory. We will consider all
codes (linear or otherwise).
Suppose that the message is transmitted through a binary symmetric
channel, i.e., the symbols 0 and 1 have equal probability ℘ of being
transmitted incorrectly. The value of ℘ is < 1/2. We shall have a common
measurement of all channels of this kind. There are only two possible
outcomes, either correct or incorrect; the classical information function
℘ log2 ℘ + (1 − ℘) log2 (1 − ℘) gives a good measure, with the exception that
its values may be negative, so add 1 to the function to make the function
non-negative.
We define the following.

Definition 1.31. The capacity c(℘) of the transmission is


c(℘) = 1 + ℘ log2 ℘ + (1 − ℘) log2 (1 − ℘),
where ℘ < 1/2. 

The capacity is 1 if ℘ = 0 and close to 0 if ℘ is close to 1/2. We have


the following proposition.

Proposition 1.32. The capacity is a monotonically decreasing function of ℘ from 0 to 1/2.
Proof. Let c(x) = 1 + x log_2 x + (1 − x) log_2(1 − x) be the capacity function. Then, we have

c′(x) = log_2 x − log_2(1 − x) = log_2(x/(1 − x)),

which is negative as long as 0 < x < 1/2; hence, c(x) is monotonically decreasing between 0 and 1/2. □
Observe that the above definitions are consistent with all of the previous
definitions given in the context of linear codes. Now, we may state Shannon’s
theorem [33].
Theorem 1.33 (Shannon’s theorem). For any rate of information 0 <
R < capacity, there is a sequence of codes {Mn } such that the rate of
information of Mn , R(Mn ) > R, and the average probability of incorrect
decoding P (Mn ) → 0 as n → ∞. 

We will skip the proof of the above existence theorem. Although it is interesting conceptually, there is no constructive proof of Shannon's theorem. In light of the above Shannon's theorem, observe that the [2^n − 1, 2^n − n − 1, 3] Hamming codes C_n will have

R(C_n) → 1,
δ = 3/(2^n − 1) → 0,

i.e., the correcting power tends to zero as n → ∞. Note that the above theorem states that there is a sequence {M_n} with the average probability of incorrect decoding P(M_n) → 0. For the above sequence of Hamming codes with δ → 0, the probability of correct decoding satisfies 1 − P(M_n) → 0, i.e., the probability of incorrect decoding P(M_n) → 1. Indeed, a Hamming code can correct only one error; as the code gets longer, the probability of having more than one error gets larger, so the Hamming decoder fails with probability P(M_n) → 1 as the length of the code grows. In other words, Hamming codes are not the best codes that we should expect. We may wish to construct other sequences of linear codes that do fulfill some of the expectations. This gives a powerful incentive to search for long and useful codes, and it separates the study of codes into two branches: theoretical codes disregarding the decoding procedures (we may use the Remark after Proposition 1.25, searching for the code word by brute force, or we may use universal maximum likelihood decoding) and the decoding procedures themselves. Certainly, for practical applications, we cannot allow n → ∞.
The interested reader of Shannon's theorem is referred to the work of van Lint [9].
1.9. Gilbert–Varshamov’s Bound

Note that Shannon's theorem is an existence theorem. In this section, we prove a weaker but constructive result, the Gilbert–Varshamov bound.
Let us consider the n-dimensional vector space, the word space, U = F_q^n over a finite field F_q of q elements. Furthermore, let the vector α ∈ U and let B(α, s) be the ball of radius s ≤ n around α, i.e.,

B(α, s) = {β : d(α, β) ≤ s}.

Let us define its volume U(α, s) to be the number of elements inside B(α, s). Note that if ⌊s⌋ = r, then B(α, s) = B(α, r). We have the following geometric proposition.

Proposition 1.34. Let r be a positive integer. We have

U(α, r) = Σ_{i=0}^{r} C_n^i (q − 1)^i.

Note that it is independent of α, and we will denote the common number by U_q(n, r).

Proof. Let 0 < d(α, β) = i ≤ r. Then, α, β differ at i positions. There are C_n^i selections of the positions, and there are (q − 1)^i possible values for those different positions; hence the proposition. □
Let us consider a code M ⊂ F_q^n which may not be linear, and let m be the number of elements in M. We will say that this is an (n, m, s) code, where n is the dimension of the word space and s is less than or equal to the minimal Hamming distance between distinct elements of M. For a fixed pair (n, s) with n ≥ s ≥ 0, we may consider the maximal possible m as follows.

Definition 1.35. Let us use the notations of the preceding paragraph. Let
A(n, s) = max{m | an (n, m, d) code exists with d ≥ s}. □

Note that if ⌈s⌉ = r, then A(n, s) = A(n, r). For a sequence of (n_i, m_i, d_i) codes, we may assume that they have the same rate of distance δ, i.e., d_i = n_i δ. Then, we may consider the limit of the rate of information of the sequence to be lim_{i→∞} n_i^{−1} log_q m_i. More generally, we may define the following.

Definition 1.36. Let the limit sup of the rate of information, α(δ), be

α(δ) = lim sup_{n→∞} n^{−1} log_q A(n, δn). □

We define the entropy function H_q(x) as follows.

Definition 1.37. For 0 ≤ x ≤ (q − 1)/q, we define the entropy function H_q(x) as

H_q(0) = 0,
H_q(x) = x log_q(q − 1) − x log_q x − (1 − x) log_q(1 − x), x ≠ 0. □
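As a quick numerical sketch (ours, not the book's), for q = 2 the entropy function reduces to the familiar binary entropy −x log_2 x − (1 − x) log_2(1 − x):

    from math import log

    def H(q, x):
        # the q-ary entropy function H_q(x) for 0 <= x <= (q-1)/q
        if x == 0:
            return 0.0
        return (x * log(q - 1) - x * log(x) - (1 - x) * log(1 - x)) / log(q)

    print(H(2, 0.5))    # 1.0, the maximum of the binary entropy
    print(H(2, 0.11))   # about 0.5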
We need the following well-known Stirling formula:

log n! = n log n − n + O(log n) as n → ∞.

We have the following propositions.
Lemma 1.38. Let 0 ≤ λ ≤ (q − 1)/q and q ≥ 2. Then,

lim_{n→∞} n^{−1} log_q U_q(n, λn) = H_q(λ).

Proof. Let r = ⌊λn⌋. We separate the proof into two cases: Case 1, q = 2; Case 2, q ≥ 3.

Case 1. We suppose that q = 2. Then, we have λ ≤ 1/2, and

H_2(0) = 0,
H_2(x) = −x log_2 x − (1 − x) log_2(1 − x), x ≠ 0,

and

U_2(n, r) = Σ_{i=0}^{r} C_n^i.

Using Stirling's formula, we deduce that

n^{−1} log_2 (Σ_{i=0}^{r} C_n^i) ≥ n^{−1} log_2 C_n^r = n^{−1} log_2 (n!/(r!(n − r)!))
= n^{−1}(n log_2 n − r log_2 r − (n − r) log_2(n − r)) + n^{−1}O(log_2 n)
= log_2 n − (r/n) log_2((r/n)n) − (1 − r/n) log_2((1 − r/n)n) + n^{−1}O(log_2 n).

Since λn − ε = r for some 0 ≤ ε < 1, it is not hard to deduce that

n^{−1} log_2 C_n^r
≥ log_2 n − λ log_2(λn) − (1 − λ) log_2((1 − λ)n) + n^{−1}O(log_2 n)
≥ −λ log_2 λ − (1 − λ) log_2(1 − λ) + n^{−1}O(log_2 n)
= H_2(λ) + n^{−1}O(log_2 n).
So, we have proven one direction. For the other direction, we have

2^{−nH(λ)} = λ^{λn} (1 − λ)^{(1−λ)n} = (1 − λ)^n (λ/(1 − λ))^{λn}.

Therefore, we have

2^{−nH(λ)} Σ_{0≤i≤λn} C_n^i = Σ_{0≤i≤λn} C_n^i (1 − λ)^n (λ/(1 − λ))^{λn}
≤ Σ_{0≤i≤λn} C_n^i (1 − λ)^n (λ/(1 − λ))^i
= Σ_{0≤i≤λn} C_n^i λ^i (1 − λ)^{n−i}
≤ (λ + (1 − λ))^n = 1,

or

Σ_{0≤i≤λn} C_n^i ≤ 2^{nH(λ)}.

After we take log_2 of the above inequality, our Case 1 follows.


Case 2. We suppose that q ≥ 3 or (q − 1) ≥ 2. Then, for i ≤ r, we
always have
Cni−1 (q − 1)i−1 < Cni (q − 1)i .
For instance, for a proof, we have
Cni−1 (q − 1)i−1 < Cni (q − 1)i ⇔ i < (q − 1)(n − i + 1)
q−1
⇔ qi < (q − 1)(n + 1) ⇔ i < (n + 1)
q
q−1
⇐i<n ⇐ i ≤ r.
q
It follows from Proposition 1.34 and the above that
Cnr (q − 1)r ≤ Uq (n, r) ≤ (1 + r)Cnr (q − 1)r .
Now, we simply imitate the proof of Case 1 to deduce Case 2 (see
Exercises). 

Lemma 1.39. Let n, d be positive integers and d < n. Then, we have

A(n, d) ≥ q^n/U_q(n, d − 1) ≥ q^n/U_q(n, d).
Proof. The last inequality is obvious, since U_q(n, d) ≥ U_q(n, d − 1) > 0. Let C be a code whose number m of elements gives A(n, d). Then, we know that there is no word in F_q^n with distance d or more to all words in C. Otherwise, we may throw in the new element and increase the number m by 1. This is contradictory to the definition of m. Therefore, the balls of radius d − 1 around the code words cover the whole space, and we have the following:

A(n, d) × U_q(n, d − 1) ≥ q^n. □

It follows from Lemma 1.39 that A(n, λn) ≥ q^n/U_q(n, λn).
Proposition 1.40 (Gilbert–Varshamov's bound). We have

α(δ) ≥ 1 − H_q(δ).

Proof. We have

α(δ) = lim sup n^{−1} log_q A(n, δn)
≥ lim_{n→∞} (1 − n^{−1} log_q U_q(n, δn))
= 1 − H_q(δ). □
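The bound is easy to evaluate numerically. The following sketch (ours, in the notation above) computes the ball volume U_q(n, r) and compares the finite-n bound of Lemma 1.39 with its limit 1 − H_q(δ):

    from math import comb, log

    def ball_volume(q, n, r):
        # U_q(n, r) = sum of C(n, i) * (q-1)^i for i = 0, ..., floor(r)
        return sum(comb(n, i) * (q - 1)**i for i in range(int(r) + 1))

    def H(q, x):
        # the q-ary entropy function
        if x == 0:
            return 0.0
        return (x * log(q - 1) - x * log(x) - (1 - x) * log(1 - x)) / log(q)

    q, n, delta = 2, 1000, 0.11
    print(1 - log(ball_volume(q, n, delta * n), q) / n)  # about 0.50
    print(1 - H(q, delta))                               # also about 0.50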

We may compare Proposition 1.40 with Shannon's theorem. Consider the binary case, i.e., q = 2; then an (n, m, nδ) code with minimal Hamming distance nδ may correct ⌊(nδ − 1)/2⌋ errors (cf. Proposition 1.25). In this case, we observe that the rate of correcting errors is (nδ − 1)/(2n) = δ/2 − 1/(2n) for large values of n. To compensate for a noisy channel having a rate of errors ℘, we should choose a code with δ = 2℘. Then, we have the following inequality (note that log_2(2 − 1) = 0, and recall that α(δ) = α(2℘) is the limit sup of the rate of information):

α(2℘) ≥ 1 + 2℘ log_2 2℘ + (1 − 2℘) log_2(1 − 2℘).   (Gilbert–Varshamov's bound)

For comparison, in Shannon's theorem, we have the rate of information R(M):

R(M) ≥ 1 + ℘ log_2 ℘ + (1 − ℘) log_2(1 − ℘) − ε.   (Shannon's theorem)

Recall that 1 + x log_2 x + (1 − x) log_2(1 − x) is the capacity function (Definition 1.31), which is monotonically decreasing from 0 to 1/2 (Proposition 1.32); we may deduce that the right-hand side (dropping the small ε) of the second inequality, from Shannon's theorem, is always bigger than the corresponding number in the first inequality as long as 2℘ < 1/2, which is valid for the symmetric channels of interest (see Exercise (5)). Therefore, we conclude that the rate of information R(M) from Shannon's theorem satisfies a stronger inequality than the limit sup of the rate of information α(2℘) from Gilbert–Varshamov's bound. Hence, Shannon's theorem is stronger in this case. However, Shannon's theorem is an existence theorem, and Gilbert–Varshamov's bound is constructive. Therefore, each has its own advantage. For more than 20 years, the Gilbert–Varshamov bound, which serves as a standard to measure all codes, was met by the classical Goppa codes (cf. Proposition 3.22), and it was only surpassed by the geometric Goppa codes in the 1990s (see Theorem 5.11, Part IV).

Exercises

(1) Let Cn be the Hamming codes [2n − 1, 2n − n − 1, 3]. Let Rn be their


rates of information. Find limn→∞ Rn .
(2) Let Cn be the triple repetition code of length 3n, i.e., a message
[a1 a2 · · · an ] is sent to [a1 a1 a1 a2 a2 a2 · · · an an an ]. Let Rn be their rates
of information. Find limn→∞ Rn .
(3) Find the maximal value of Hq (x) − x.
(4) Finish the proof of Case 2 in Lemma 1.38.
(5) Prove the last paragraph of Section 1.9.
PART II

Ring Codes
Chapter 2

Rings

If we treat letters as unrelated symbols, then we have information theory, which cannot correct any error. If we treat sequences of letters as vectors in a vector space, then we have Hamming codes, which can correct one error. Naturally, we are interested in codes which may correct multiple errors. Therefore, we are very likely to involve more algebraic relations. Furthermore, it follows from Shannon's theorem that we may have to work with long codes to make better codes. However, a long code will introduce complexity for decoding. To overcome this complexity, we have to assume more algebraic structures to help us.
To correct more errors, and to work on longer codes, we have to study the polynomial rings F_q[x] over finite fields F_q and related topics. In this chapter, we cover the essential materials from ring theory and finite-field theory for future use.

2.1. Preliminaries

In mathematics, we have ring theory in addition to group theory. A ring R is a mathematical object which satisfies the following definition.

Definition 2.1. A ring (R, +, ×) is a set R with two binary operations +


and ×, such that R is closed under those two operations, and they satisfy
the following:
(1) Group law: (R, +) is an abelian group.
(2) Associative law: We always have a × (b × c) = (a × b) × c.

31
32 Introduction to Algebraic Coding Theory

(3) Distributive law: We always have a × (b + c) = a × b + a × c and


(a + b) × c = a × c + b × c.

Sometimes, we use · in place of ×. If a · b = b · a always, then we say


R is a commutative ring. The simplest ring is a ring that consists of only
0. It is called the zero ring and is not of interest. In this book, we assume
that any ring used is not a zero ring. If R is commutative, and there is a
multiplicative identity e = 0 (we usually write e as 1), then we say R is a
commutative ring with identity. In this book, all rings are commutative
rings with identity, if not stated explicitly otherwise. 

Remark: (1) In a ring R, if e = 0, then a = a · e = a · 0 = 0 for any element


a, and the ring is a zero ring.
(2) A non-zero element α in R which is not a zero divisor, i.e., βα = 0
or αβ = 0 ⇒ β = 0, will be called a regular element.
(3) It is easy to see that a commutative ring with identity R which
contains a field K must be a vector space over K. In some books, it is
called K-ring. We shall use this terminology. In coding theory, we only
involve K-rings. In this book, we assume that all rings are K-rings for
some field K. Every K-ring R can be written as K[{R}], i.e., a polynomial
ring on the elements of R.
(4) The rings R used in coding theory usually contain some finite field Fq
as the polynomial rings Fq [{x}], meromorphic power series rings Fq (({x}))
or rational function rings Fq ({x}) (see the following). Therefore, the rings
R can be considered as a vector space over Fq with a multiplication between
elements in Fq and R. 

As usual, we define an ideal I as a subring of the ring R with the property that r · I = {r · a : a ∈ I} ⊂ I for every r ∈ R. There is a canonical map π : R → R/I = {r + I : r ∈ R}.

2.2. The Finite Field Fq

Recall that letters in coding theory are picked up from a finite field. We wish
to discuss some basic structure of a finite field F. One of the basic properties
we assume is the following theorem from Theory of Groups, Vol. 1, p. 147,
by Kurosh [12], where we consider abelian additive group; note that all
theorems there will be applied in this book multiplicatively.
Theorem 2.2 (Fundamental Theorem of Finitely Generated Abelian Groups). Given any finitely generated abelian group G, G is isomorphic to (⊕_i Z) ⊕ (⊕_i Z_{c_i}) as an abelian additive group, where each ⊕ has only finitely many summands and c_i > 0 for all i.

Proof. See the reference. 

We have the following easy propositions for abelian groups.

Proposition 2.3. (1) Let G = Za ⊕ Zb and d = lcm(a, b). Then, every


element x ∈ G satisfies

dx = 0.

(2) Let G = Za ⊕ Zb , where a, b are co-prime. Then, G is cyclic of order


ab.

Proof. (1) It is easily provable. (2) Let c, d be selected such that ac + bd = 1. Let x be a generator of Z_a and y be a generator of Z_b. We claim that z = (dx, cy) is a generator of Z_a ⊕ Z_b.
We have the following computations:

az = (adx, acy) = (0, acy) = (0, (1 − bd)y) = (0, y + (−bd)y) = (0, y).

Similarly, we have bz = (x, 0). Therefore, z generates G, and G is cyclic. It


is easy to show that order(G) = ab. 

Note that if we write the group operation multiplicatively, the equation


dx = 0 in (1) will be written as xd = 1.

Proposition 2.4. Let G be a multiplicative subgroup of a finite field Fq .


Then, G is cyclic.

Proof. Since Fq is finite, G is a finitely generated abelian group. Let us


make another copy of G with the product replaced by summation. We
may simply call it G0 . By Theorem 2.2, we have G0 to be isomorphic to
⊕(Zci ). Note that this isomorphism will change the multiplication of G to
the summation in ⊕(Z_{c_i}). Let us make an induction on the number of factors n. In general, let us consider any finite group G which can be written as ⊕_{i=1}^n (Z_{c_i}). If n = 1, then our proposition is clearly true, since 1 will be a generator of Z_{c_1}, and Z_{c_1} is cyclic. Let us assume that the proposition is true for any abelian group with a number m < n of factors. If there were two c_i, c_j with d = lcm(c_i, c_j) < c_i c_j, then all elements in the subgroup (Z_{c_i} ⊕ Z_{c_j}), which has c_i c_j elements, would satisfy

dx = 0,

which again translates to the multiplication of the field:

x^d = 1.

Then, the above polynomial would have more than d distinct elements as solutions, which is impossible for a field. Hence, any two c_i, c_j are co-prime, and it follows from (2) of Proposition 2.3 that Z_{c_i} ⊕ Z_{c_j} = Z_{c_i c_j}. After this recombination, the factorization has only n − 1 factors, and our proposition follows. □
One interesting criterion for a polynomial f(x) ∈ F[x] to have a multiple root in an algebraic closure Ω of F is the derivative test. We have to define the derivative algebraically.

Definition 2.5. Let f(x) = Σ_i a_i x^i. Then, the derivative f′(x) is defined to be f′(x) = Σ_i i a_i x^{i−1}. □

It is easy to see that the derivative obeys the following formal rules.

Proposition 2.6. Let a ∈ F be a constant, and f(x), g(x) ∈ F[x]. The derivative operation has the following rules:

(1) a′ = 0,
(2) (f(x) + g(x))′ = f′(x) + g′(x),
(3) (f(x)g(x))′ = f′(x)g(x) + f(x)g′(x).

Proof. The proof is a routine check. □


We have the following criterion.

Proposition 2.7. An element β ∈ Ω is a multiple root of f(x) ∈ F[x] ⇐⇒ β is a common root of f(x) and f′(x).

Proof. Let β be a multiple root of f(x). Then, f(x) = (x − β)^2 h(x) with h(x) ∈ Ω[x]. Then, we have

f′(x) = 2(x − β)h(x) + (x − β)^2 h′(x) = (x − β)g(x)

for some g(x) ∈ Ω[x]. Therefore, β is a common root of f(x) and f′(x). On the other hand, if β is a root of f(x) and f′(x), we write f(x) = (x − β)r(x). We have

f′(x) = r(x) + (x − β)r′(x).

We conclude that β must be a root of r(x), and f(x) = (x − β)^2 s(x). □
We have the following proposition which gives us a lot of information
about the structures of finite fields.

Proposition 2.8. Let F_q be a finite field of q = p^n elements in an algebraic closure Ω of Z_p. Then, F_q^* = F_q \ 0 is a multiplicative cyclic group of order p^n − 1, and F_q is the solution set in Ω of the following equation:

x(x^{p^n − 1} − 1) = x^{p^n} − x = 0.

Furthermore, all solutions of the following equation in Ω are distinct, and the collection of the solutions is a field:

x^{p^n} − x = 0.

If L is another finite field in Ω with p^m elements with n | m, then L ⊃ F_q, and

x^{p^n − 1} − 1 | x^{p^m − 1} − 1.

Proof. The first statement follows trivially from the preceding proposition. For the second statement, let K_1 be the collection of all solutions of the equation f(x) = x^{p^n} − x = 0 in Ω. Since Ω is an algebraic closure of Z_p, the equation f(x) = 0 splits completely. By considering the derivative f′(x) = −1, the equation has no multiple root. Therefore, K_1 consists of p^n elements. Moreover, let y, z ∈ K_1; then we have

(y + z)^{p^n} = y^{p^n} + z^{p^n} = y + z, and (yz)^{p^n} = y^{p^n} z^{p^n} = yz.

Therefore, y + z, yz ∈ K_1. It is easy to check that all requirements of a field are satisfied to establish that K_1 is a field of p^n elements and thus equal to F_q. For the last part of the proposition, let m = ns; then we have p^m − 1 = p^{sn} − 1 = (p^n − 1)r. Therefore, x^{p^m − 1} − 1 = x^{(p^n − 1)r} − 1 = (x^{p^n − 1} − 1)g(x). Thus, L ⊃ F_q. □
It is not hard to see the following proposition.

Proposition 2.9. We have Ω = ∪m>0 Fpm .

Proof. Let β ∈ Ω. Then, Zp [β] is a finite extension of Zp . Therefore,


Zp [β] = Fpm for some m. 
Let us have the following common definition.

Definition 2.10. Let a mapping ρ be defined as ρ(α) = αp . The mapping


ρ is called the Frobenius map. 

We have the following usual proposition.

Proposition 2.11. The Frobenius map ρ is an automorphism of F_q of order m, where q = p^m.

Proof. It is a routine check to see that ρ is an automorphism, since all elements β are solutions of the equation x^{p^m} − x = 0. Then, all elements β ∈ F_q satisfy

β^{p^m} = β

and ρ^m(β) = β^{p^m} = β. Therefore, ρ^m = id. On the other hand, if ρ^ℓ = id for some ℓ < m, then we would have β^{p^ℓ} = β for all β ∈ F_q, which is impossible, since the polynomial x^{p^ℓ} − x has at most p^ℓ roots. □

Corollary 2.12. Let m = ℓs. Then, ρ^ℓ is an isomorphism of F_{p^m} which fixes F_{p^ℓ}. Note that the map ρ^ℓ as an isomorphism of F_{p^m} is of order s. Furthermore, it follows from Galois theory that any automorphism of F_{p^m} which fixes F_{p^ℓ} is of the form (ρ^ℓ)^j for some suitable j. □

We assume Galois theory and the above corollary. We define the


following.

Definition 2.13. A field K is said to be perfect if either its characteristic


is 0 or every element in K has a pth root in K, where p is its characteristic.


Proposition 2.14. A finite field is perfect.

Proof. It is easy to see that the Frobenius map ρ is one-to-one. The pigeonhole principle implies ρ is onto. Therefore, for any given α ∈ K, there is a β ∈ K such that

β^p = α.

Clearly, β is the pth root of α. □


2.3. The Computer Programs for Finite Fields

The coding and the decoding processes depend heavily on computer programs. This section is written for those readers who are not familiar with computer programs, especially students of pure mathematics.
Let us assume that p = 2 in this section. The reader is requested to discuss the cases of p > 2. Let the set U_m of all m-bit strings be a vector space of dimension m over F_2. We want to provide more algebraic structure to U_m. In fact, a finite field F_{2^m} over F_2 can be used to represent U_m. This is the right way to generalize a vector space in coding theory. The addition is
(1) The table of logarithms: We shall only consider examples. The reader may generalize the following construction process to the general setup for any prime field Z/pZ with p > 2 and any finite field F_{p^m}.
Let us consider F_{2^4} over F_2. Let us write F_{2^4} = F_2[α], where α satisfies the equation x^4 + x + 1 = 0. We have the following list of powers of α:
{1 = α^0, α, α^2, α^3, α + 1 = α^4, α^2 + α = α^5, α^3 + α^2 = α^6, α^3 + α + 1 = α^7, α^2 + 1 = α^8, α^3 + α = α^9, α^2 + α + 1 = α^{10}, α^3 + α^2 + α = α^{11}, α^3 + α^2 + α + 1 = α^{12}, α^3 + α^2 + 1 = α^{13}, α^3 + 1 = α^{14}, 1 = α^{15}}.
It is clear that 1, α, α^2, α^3 are linearly independent over F_2. Let us write them as [1000], [0100], [0010], [0001]; then the above list of the powers of α may be re-written as [1000], [0100], [0010], [0001], [1100], [0110], [0011], [1101], [1010], [0101], [1110], [0111], [1111], [1011], [1001], [1000]. One way to treat the problem of multiplication is to look up the list and find i, j for any two elements [a_1 a_2 a_3 a_4], [b_1 b_2 b_3 b_4] such that

[a_1 a_2 a_3 a_4] = α^i,
[b_1 b_2 b_3 b_4] = α^j,

and we define

log_α([a_1 a_2 a_3 a_4]) = i,
log_α([b_1 b_2 b_3 b_4]) = j.

Then, we have

log_α([a_1 a_2 a_3 a_4][b_1 b_2 b_3 b_4]) = i + j.

Let us find the residue value k of i + j modulo 15 = 2^4 − 1, i.e., k = i + j mod 15 and 0 ≤ k < 15. Now, we may look up the above list of powers again to find the corresponding value of α^k to determine the value of [a_1 a_2 a_3 a_4][b_1 b_2 b_3 b_4]. This method consists of looking up a table of logarithms of length 2^4 − 1 (in general, for the field F_{2^m}, a table of length 2^m − 1; if m is large, say m > 32, then the table will be beyond the memory of an ordinary computer and will not be feasible).
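As an illustration (ours, not the book's), the following Python sketch builds exactly this table of logarithms for F_{2^4}, representing a_1 + a_2 α + a_3 α^2 + a_4 α^3 as the integer a_1 + 2a_2 + 4a_3 + 8a_4 and using the relation α^4 = α + 1:

    def build_tables():
        exp = [0] * 15           # exp[i] is alpha^i as a 4-bit integer
        log = {}                 # log[v] = i whenever v = alpha^i
        v = 1
        for i in range(15):
            exp[i], log[v] = v, i
            v <<= 1              # multiply by alpha
            if v & 0b10000:      # reduce by alpha^4 = alpha + 1
                v ^= 0b10011
        return exp, log

    exp, log = build_tables()

    def mult(u, v):
        # product of two elements by looking up the logarithm table
        if u == 0 or v == 0:
            return 0
        return exp[(log[u] + log[v]) % 15]

    print(mult(0b0010, 0b1000))  # alpha * alpha^3 = alpha + 1, i.e., 0b0011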
(2) The table of multiplication: Let us consider the implementation of a finite field K of p^m elements. We may consider the pair K ⊃ Z_p. We shall implement the summation and the multiplication for the prime field Z_p first. The summation shall be the usual sum of integers followed by reduction mod p. For multiplication, there are two ways: either we multiply two numbers and then reduce mod p, or, if the number p of elements in Z_p is not too big, we use a multiplication table and look up the table to achieve the multiplication. Note that the size of the table is (1/2)(p − 1)(p − 2) after deleting all multiplications of the form 0 × a = a × 0 = 0 and 1 × a = a × 1 = a, and the size is the determining factor for the decision.
Let us program the field K, which is uniquely determined by the number p^m of elements in K, i.e., let K be the collection of all vectors of the form [a_1, a_2, . . . , a_m], where a_j ∈ Z_p. We define the summation as the summation of vectors. For multiplication, consider the polynomial x^{p^m} − x. There are many monic factors f(x) ∈ Z_p[x] which are irreducible over Z_p and of degree m (see Exercises). Let us select such a polynomial f(x). Let v = [a_0, a_1, . . . , a_{m−1}] and u = [b_0, b_1, . . . , b_{m−1}] be two elements in K; then we write

f_v = a_0 + a_1 x + a_2 x^2 + · · · + a_{m−1} x^{m−1},
f_u = b_0 + b_1 x + b_2 x^2 + · · · + b_{m−1} x^{m−1}

and compute

f_v × f_u mod (f(x)) = f_{vu}(x) = Σ_{j=0}^{m−1} c_j x^j,

and we define v × u = [c_0, c_1, . . . , c_{m−1}]. We have two ways to define the multiplication for the field K: if (1/2)(p^m − 1)(p^m − 2) is small enough, then we may form a table of multiplication; or, if (1/2)(p^m − 1)(p^m − 2) is too big, then for every multiplication we go through the above reduction modulo the polynomial f(x) to find the product (see (3)).
Let us discuss the following example.
Example 1: Let K be a finite field of 2^3 elements. Then, K ⊃ Z_2, and K is a vector space of dimension 3 over Z_2. Any element in K can be represented as [a_0, a_1, a_2], where a_j = 0, 1. Let us consider the polynomial f(x) = 1 + x + x^3. It has no root in Z_2; therefore, it is irreducible over Z_2 (see Exercises). Henceforth, we may use it to define the multiplication in K. For the sake of notation, let us use the ordinary numerals {0, 1, 2, . . . , 7}, with each written in binary expansion, and use zeroes to fill each to the length 3 as follows:

0 = [0, 0, 0], 1 = [1, 0, 0], 2 = [0, 1, 0], 3 = [1, 1, 0],
4 = [0, 0, 1], 5 = [1, 0, 1], 6 = [0, 1, 1], 7 = [1, 1, 1].

On the other hand, we represent [a_0, a_1, a_2] as a_0 + a_1 x + a_2 x^2, i.e., we represent [0, 1, 0] as x and [0, 0, 1] as x^2. Then, we have 2 × 2 (= x · x = x^2) = 4. Similarly, we may compute the following table.

× 2 3 4 5 6 7
2 4 6 3 1 7 5
3 6 5 7 4 1 2
4 3 7 6 2 5 1
5 1 4 2 7 3 6
6 7 1 5 3 2 4
7 5 2 1 6 4 3

For any other finite field, we use the above method to construct a multiplication matrix (a_i a_j) for all pairs of non-zero elements (a_i, a_j) in the field. Once we have the table, we only have to look up the table once to find the result of the multiplication of the pair (a_i, a_j).
Note that the table is symmetric, and the multiplication depends on the polynomial f(x) = x^3 + x + 1; for instance, 5 × 3 = 4, which comes from the following equation:

(x^2 + 1)(x + 1) = x^3 + x^2 + x + 1 = (x^3 + x + 1) + x^2 ≡ x^2 mod (x^3 + x + 1).

Note that x^{8−1} − 1 = (x + 1)(x^3 + x + 1)(x^3 + x^2 + 1) mod 2. The other possible selection of an irreducible polynomial of degree 3 is x^3 + x^2 + 1. If we use it to define the multiplication, then we have 5 × 3 = 2. It is easy to see that the definition of multiplication depends on the selection of the irreducible polynomial of degree 3. □
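A sketch (ours) of the modular-polynomial multiplication just described, for K of 2^3 elements with f(x) = x^3 + x + 1; it reproduces the table entry 5 × 3 = 4:

    # An element [a0, a1, a2] is the integer a0 + 2*a1 + 4*a2;
    # the modulus x^3 + x + 1 has the bit pattern 0b1011.

    def mult(u, v, modulus=0b1011, deg=3):
        prod = 0
        while v:                     # carry-less (polynomial) product
            if v & 1:
                prod ^= u
            u <<= 1
            v >>= 1
        for i in range(2 * deg - 2, deg - 1, -1):   # reduce mod f(x)
            if prod & (1 << i):
                prod ^= modulus << (i - deg)
        return prod

    print(mult(5, 3))   # 4, i.e., (x^2 + 1)(x + 1) = x^2 mod (x^3 + x + 1)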

(3) Linear feedback shift register: Let us consider another method. Let us assume that p = 2 in this subsection. Let us consider F_{2^4}. Let the defining equation be

f(x) = x^4 + x + 1 = 0.

Note that we have 1 = −1. Let F_{2^4} = F_2[α], where α satisfies the above equation. Let us represent [a_1 a_2 a_3 a_4] as a_1 + a_2 α + a_3 α^2 + a_4 α^3 and [b_1 b_2 b_3 b_4] as b_1 + b_2 α + b_3 α^2 + b_4 α^3. Then certainly, [a_1 a_2 a_3 a_4][b_1 b_2 b_3 b_4] can be represented as (a_1 + a_2 α + a_3 α^2 + a_4 α^3)(b_1 + b_2 α + b_3 α^2 + b_4 α^3) = (a_1 b_1) + (a_1 b_2 + a_2 b_1)α + · · · + (a_4 b_4)α^6 = c_1 + c_2 α + · · · + c_7 α^6. Now, we want to re-write the last expression as a polynomial in α of degree at most 3 by going mod the defining equation. In general, we may reduce any polynomial Σ_i c_i α^{i−1} to a polynomial of degree at most 3 using a linear feedback shift register, which simulates the process of modding out the defining equation as follows:

c_1 → c_2 → c_3 → c_4 → · · · → c_{ℓ−4} → c_{ℓ−3} → c_{ℓ−2} → c_{ℓ−1}
(the fallen-off leading term feeds back into the cells c_{ℓ−4} and c_{ℓ−3}: the LFSR for x^4 = x + 1.)

The process is as follows: Assume c_ℓ = 1 ≠ 0; then we push all terms one step rightward, and we make the changes c_{ℓ−4} → c_{ℓ−4} + 1 and c_{ℓ−3} → c_{ℓ−3} + 1. The remaining terms stay the same. The next step is pushing all terms one step rightward. The term c_{ℓ−1} will fall off the horizontal line. If c_{ℓ−1} is 0, then we do nothing, and we push again. If c_{ℓ−1} is 1, then we use the above diagram to feed back and make changes to the terms. We keep making shifts until there are only four terms, c_1, c_2, c_3, c_4, left. What is left is the result of the multiplication.
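The shift-and-feedback process is easy to simulate; the following sketch (ours) reduces an arbitrary polynomial in α modulo α^4 = α + 1:

    def lfsr_reduce(c):
        # c: list of bits, c[i] the coefficient of alpha^i; returns 4 bits
        c = list(c)
        while len(c) > 4:
            top = c.pop()        # the leading term falls off the register
            if top:              # feedback: alpha^k = alpha^(k-3) + alpha^(k-4)
                c[-3] ^= 1
                c[-4] ^= 1
        return c + [0] * (4 - len(c))

    print(lfsr_reduce([0, 0, 0, 0, 1]))   # alpha^4 -> [1, 1, 0, 0] = 1 + alpha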

Exercises

(1) Let Ω be an algebraic closure of F_p, where p is a prime number. Show that there is a unique field of order p^m in Ω.
(2) Find a quadratic irreducible equation f(x) over F_p.
(3) Show that a polynomial f(x) of degree n is irreducible over F_p ⇔

(1) f(x) | x^{p^n} − x,
(2) (f(x), x^{p^m} − x) = (1), for all m < n.

(4) Let α be a root of x^2 + x + 1 = 0 and β be a root of x^2 + x + α = 0 in an algebraic closure Ω of F_2. Show that F_4 = F_2[α] and F_16 = F_4[β].
(5) We use the notations of the preceding problem. Show that every element γ ∈ F_16 can be expressed as γ = a_3 αβ + a_2 β + a_1 α + a_0, where a_i = 0, 1 are elements in F_2. We may represent γ by the integer a_3 2^3 + a_2 2^2 + a_1 2 + a_0. Find γ^2 if γ is represented by 13.
(6) Find all group generators for F_{2^4}^*.
(7) Over the finite field F_{p^n}, show that

(α + β)^p = α^p + β^p.

(8) Find an element α which generates the multiplicative group F_{p^2}^*, and find its defining equation.
(9) Let δ ∈ F_{2^4} be a non-zero element. Find δ^{−1}.

2.4. The Total Quotient Rings

In the classical Goppa code¹ (see Section 3.4), we use total quotient rings. Given a ring R without non-zero zero divisors, which is called an integral domain, we may consider the quotient field of R, defined as the set of fractions r/s with s ≠ 0 and r/s ≡ r′/s′ iff rs′ = r′s. In the classical case of the integral domain Z, the ring of integers, its quotient field is the field of rational numbers Q. A possible generalization of the quotient field to the case of rings which are not integral domains is the total quotient ring.

Definition 2.15. If R is a ring with S equal to the set of regular elements, where S is a non-empty set, then a total quotient ring of R is a ring F, which is R × S (where the element [r, s] is written as r/s) with the equivalence relation that r/s = r′/s′ iff rs′ = r′s. Its identity is the class of b/b for all regular elements b. The set S is called the set of denominators. Then, the quotient ring, {r/s : r ∈ R, s ∈ S}, is called the total quotient ring S^{−1}R = RS^{−1}. □

The following examples are helpful.

Example 2: Let R be the ring of real numbers. Then, its total quotient
ring is itself. 

1 Valery Goppa (1939–), Soviet and Russian mathematician.


Example 3: Let R be the residue class ring K[x]/(g(x)). Then, an element f(x) ∈ R is a regular element ⇔ g(x) and f(x) are co-prime. In that case, there are elements h(x) and r(x) such that

h(x)g(x) + f(x)r(x) = 1.

Therefore,

1/f(x) = r(x).

It is easy to see that the total quotient ring of K[x]/(g(x)) is itself. We apply the total quotient ring to the classical Goppa code. Especially, if f(x) = Π(x − γ_i) with g(γ_i) ≠ 0, where the γ_i are all distinct, then any element of the form Σ c_i/(x − γ_i) ∈ K[x]/(g(x)) can be written as n(x)/f(x) with deg(n(x)) < deg(f(x)). □

2.5. The Ring F[x]

As we pointed out, the rings used in coding theory are usually K-rings (see Chapter 3), and it is easy to see that such a ring R can be expressed as K[{R}]. The simplest and most useful polynomial ring in coding theory is K[x], where x is a variable. Let us use R to denote the polynomial ring F[x] for any field F. We have the following definition.

Definition 2.16. Let us consider R. Let a polynomial f(x) = Σ_i c_i x^i be given. The degree function, deg(f(x)), of any polynomial f(x) is as follows:

deg(f(x)) = max{i : c_i ≠ 0} if f(x) ≠ 0, and deg(f(x)) = −∞ if f(x) = 0. □

Using the fact that the field F is an integral domain, we have the following basic properties of deg(f(x)):

deg(f(x)g(x)) = deg(f(x)) + deg(g(x)),
deg(f(x) + g(x)) ≤ max(deg(f(x)), deg(g(x))),

and it shows that F[x] is an integral domain.
and it shows that F[x] is an integral domain.


We have the following proposition.

Proposition 2.17. Let g(x), α_r(x), α_u(x), ω_r(x), f_u(x) be polynomials in R. If max(deg(α_u(x)ω_r(x)), deg(α_r(x)f_u(x))) < deg(g(x)), then α_u(x)ω_r(x) ≡ α_r(x)f_u(x) mod (g(x)) ⇒ α_u(x)ω_r(x) = α_r(x)f_u(x).

Proof. The proof is trivial. □

The following proposition is classically known, and the proof is left to


the reader.

Proposition 2.18 (Euclidean algorithm). Given any two polynomials f_1(x), f_2(x) ≠ 0 in R, there are polynomials β_1(x) and f_3(x) in R with deg(f_3(x)) < deg(f_2(x)) (note that f_3(x) = 0 ⇔ deg(f_3(x)) = −∞) and deg(β_1(x)) ≤ deg(f_1(x)) − deg(f_2(x)) such that

f_1(x) = β_1(x)f_2(x) + f_3(x). □

The above proposition can be used to find the GCD of two polynomials
f (x ), g(x ). The following process, known as the “long algorithm”, is
fundamentally important and is of interest for decoding purposes later in the
book. We pay attention to the degrees of the polynomials involved, which
turns out to be important in decoding programs. Note that the Euclidean
algorithm is fast in computing. The main point in the process of decoding is
that it will be modified to an Euclidean algorithm with a stopping strategy
depending on the degrees involved (cf. Proposition 3.12).

Proposition 2.19 (Long algorithm). Let f1 (x), f2 (x), f3 (x) be given


as in the preceding proposition. If f3 (x) = 0, the following sequence of
polynomials fi (x) for i = 3, . . . , s with all polynomials non-zero, except
possibly fs (x), can be defined inductively for j = 3, . . . , s − 1 as

fj−1 = βj−1 (x)fj (x) + fj+1 (x)

such that after we name nj = deg(fj (x)), mj = deg(βj (x)), we have (1)
nj+1 < nj and (2) mj−1 = nj−1 − nj . Furthermore, after repeated back-
substitution (see Proof ), we have the following equation for j ≥ 3:

αj (x)f1 (x) + γj (x)f2 (x) = fj (x),

with deg(αj (x)) ≤ n − nj−1 and deg(γj (x)) ≤ n − nj−1 , where n =


max(deg(f1 (x)), deg(f2 (x))).
Proof. We may apply the Euclidean algorithm to the pair f_2(x), f_3(x) and so on, until we reach the case f_s(x) = 0. The first part (1), (2) of the proposition is routine. Suppose that we have the second part for j = 4, . . . , ℓ with ℓ < s. Then, we have the following three equations:

f_{ℓ−1} = β_{ℓ−1}(x)f_ℓ(x) + f_{ℓ+1}(x),
α_{ℓ−1}(x)f_1(x) + γ_{ℓ−1}(x)f_2(x) = f_{ℓ−1}(x),
α_ℓ(x)f_1(x) + γ_ℓ(x)f_2(x) = f_ℓ(x).

Substituting the last two equations into the first one and collecting coefficients, we get the following equations:

α_{ℓ+1}(x) = α_{ℓ−1}(x) − α_ℓ(x)β_{ℓ−1}(x),
γ_{ℓ+1}(x) = γ_{ℓ−1}(x) − γ_ℓ(x)β_{ℓ−1}(x).

Note that

deg(α_{ℓ−1}(x)) ≤ n − n_{ℓ−2} ≤ n − n_ℓ,
deg(α_ℓ(x)β_{ℓ−1}(x)) ≤ n − n_{ℓ−1} + n_{ℓ−1} − n_ℓ = n − n_ℓ;
deg(γ_{ℓ−1}(x)) ≤ n − n_{ℓ−2} ≤ n − n_ℓ,
deg(γ_ℓ(x)β_{ℓ−1}(x)) ≤ n − n_{ℓ−1} + n_{ℓ−1} − n_ℓ = n − n_ℓ.

Therefore, we have

deg(α_{ℓ+1}(x)) ≤ n − n_ℓ,
deg(γ_{ℓ+1}(x)) ≤ n − n_ℓ. □
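A sketch (ours) of the long algorithm over F_2, carrying the multipliers α_j(x), γ_j(x) along; a polynomial is a list of bits, c[i] being the coefficient of x^i:

    def trim(f):
        while f and f[-1] == 0:
            f.pop()
        return f

    def add(f, g):                        # addition = subtraction over F_2
        h = [0] * max(len(f), len(g))
        for i, c in enumerate(f): h[i] ^= c
        for i, c in enumerate(g): h[i] ^= c
        return trim(h)

    def mul(f, g):
        h = [0] * (len(f) + len(g) - 1) if f and g else []
        for i, a in enumerate(f):
            if a:
                for j, b in enumerate(g):
                    h[i + j] ^= a & b
        return trim(h)

    def divmod_poly(f, g):                # one Euclidean division step
        f, q = f[:], []
        while len(f) >= len(g) > 0:
            d = len(f) - len(g)
            q = add(q, [0] * d + [1])
            f = add(f, [0] * d + g)
        return q, f

    def long_algorithm(f1, f2):
        # returns triples (alpha_j, gamma_j, f_j) with
        # alpha_j * f1 + gamma_j * f2 = f_j, as in Proposition 2.19
        rows = [([1], [], trim(f1[:])), ([], [1], trim(f2[:]))]
        while rows[-1][2]:
            (a0, g0, r0), (a1, g1, r1) = rows[-2], rows[-1]
            q, r = divmod_poly(r0, r1)
            rows.append((add(a0, mul(q, a1)), add(g0, mul(q, g1)), r))
        return rows

    # f1 = x^4 + x + 1, f2 = x^2 + x; the last non-zero f_j is gcd = 1
    for a, g, f in long_algorithm([1, 1, 0, 0, 1], [0, 1, 1]):
        print(a, g, f)

In a decoder, the loop would instead stop as soon as deg(f_j) falls below a prescribed threshold; that stopping strategy is the point of Proposition 3.12.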
A clever use of the above well-known proposition is a keystone of coding theory. Historically, the earliest application of the above theorem is in number theory. The continued fraction of π is the following:

π = 3 + 1/(7 + 1/(15 + 1/(1 + 1/(292 + · · · )))).

We get the sequence of numbers {3, 22/7, 333/106, 355/113, . . .} by truncating the continued fraction of π. As approximations of π, 3 was known to King Solomon and found in a classical Chinese mathematics book, Zhou Pai; 22/7 = 3 + 1/7 was known to Archimedes; and the number 355/113 = 3 + 1/(7 + 1/(15 + 1/1)) was known as Zu's number (Zu Chongzhi, AD 429–500, China). From number theory, we know that this sequence gives us the best possible approximations of π. For instance, if we restrict the denominator to be less than 1000, then 355/113 is the best approximation to π.
When we apply the long algorithm to coding problems, we have to know the place to truncate the process. Sugiyama, Kasahara, Hirasawa, and Namekawa [35] noticed the place to stop the above long algorithm for a modern application to decoding purposes (see Proposition 3.12).
From the above proposition, we can deduce that R = F[x] is a principal ideal ring, i.e., each ideal is generated by one element.

2.5.1. LFSR

Let us consider the problem of combining the above proposition with a computer. As is well known, the Euclidean algorithm can be implemented effectively using an LFSR (linear feedback shift register) in any computer. Let us consider

f_1(x) = β_1(x)f_2(x) + f_3(x)

as in the statement of the Euclidean algorithm. Let

f_1(x) = c_0 + c_1 x + c_2 x^2 + · · · + c_{n_1−1} x^{n_1−1} + c_{n_1} x^{n_1},
f_2(x) = a_0 + a_1 x + · · · + a_{n_2−1} x^{n_2−1} + x^{n_2}.

In general, polynomial divisions can be performed using an LFSR. Let us consider a simple case over F_2. We use the following LFSR to perform the division:

c_1 → c_2 → c_3 → c_4 → · · · → c_{ℓ−n_2} → · · · → c_{ℓ−2} → c_{ℓ−1}
(the fallen-off leading term feeds back, with taps a_0, . . . , a_{n_2−2}, a_{n_2−1}, into the cells c_{ℓ−n_2}, . . . , c_{ℓ−2}, c_{ℓ−1}.)

Let a polynomial g(x) = c_0 + c_1 x + · · · + c_ℓ x^ℓ be given, and we want to use f_2(x) to cut down the degree ℓ of g(x). If ℓ < n_2, then we will not do anything. The process is as follows: assume ℓ ≥ n_2 and c_ℓ = 1 ≠ 0; then c_{ℓ−n_2} → c_{ℓ−n_2} + a_0, c_{ℓ−n_2+1} → c_{ℓ−n_2+1} + a_1, and so on. If ℓ − 1 ≥ n_2, the next step is pushing all terms one step rightward. The term c_{ℓ−1} will fall off the horizontal line. If it is 0, then we do nothing, and we push again. If c_{ℓ−1} is 1, then we use the above diagram to feed back and make changes to the terms. We keep making shifts until there are n_2 or fewer terms, c_0, c_1, . . . , c_{n_2−1}, left. What is left is the result of the Euclidean algorithm. As an exercise, the reader is asked to set up an LFSR over F_3 using f_2(x) = 1 − x + x^2 − x^3.

2.5.2. Ideals
In the ring theoretical coding theory, the word space will be R/I, where
R = F[x], and I an ideal of R. The code space will be J/I, where J ⊃ I an
ideal. We have the following.

Proposition 2.20. Every ideal I of R = F[x] is of the form (g(x)).

Proof. We may use the preceding proposition to construct a generator


g(x). For an existence proof, note that the zero ideal is generated by 0,
otherwise, suppose I is non-zero. Let g(x) be a non-zero polynomial in I with
the smallest degree. Then g(x) is a generator by the Euclidean algorithm.


Corollary 2.21. The ring F[x]/I is a principal ideal ring.

Proof. Let J̄ be an ideal in R/I. Let J be the pre-image of J̄ in R. Then,


J is clearly an ideal in R. Let g(x) be its generator. Then clearly, g(x)
generates J̄. 
The following proposition is the Lagrange interpolation theorem, which can be considered as a special case of the Chinese remainder theorem.

Proposition 2.22. Let {β_i}_1^n, {α_i}_1^n be elements in F such that all β_i's are distinct. Then, there is a unique polynomial f(x) of degree at most n − 1 such that f(β_i) = α_i for all i. Moreover, f(x) is of the following form:

f(x) = Σ_i α_i (Π_{j≠i}(x − β_j)) / (Π_{j≠i}(β_i − β_j)).

Proof. It is easy to see that the f(x) defined above satisfies the requirements of the proposition. Suppose g(x) is another. Then, f(x) − g(x) has n roots β_i and is of degree at most n − 1. Therefore, f(x) − g(x) = 0 or f(x) = g(x). □

Corollary 2.23. Let {β_i} be a set of n distinct numbers. Let P_n be the set of all polynomials of degree ≤ n − 1. Then P_n is generated by {Π_{j≠i}(x − β_j)} as a vector space over F. □
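A quick sketch (ours) of the interpolation formula over a prime field Z_p:

    p = 7

    def lagrange(points):
        # points: list of (beta_i, alpha_i) with the beta_i distinct;
        # returns the coefficient list of f, f[k] for x^k
        n = len(points)
        f = [0] * n
        for i, (b_i, a_i) in enumerate(points):
            num, denom = [1], 1          # build prod_{j != i} (x - beta_j)
            for j, (b_j, _) in enumerate(points):
                if j != i:
                    num = [(c2 - b_j * c1) % p
                           for c1, c2 in zip(num + [0], [0] + num)]
                    denom = denom * (b_i - b_j) % p
            coeff = a_i * pow(denom, p - 2, p) % p     # alpha_i / denom
            for k, c in enumerate(num):
                f[k] = (f[k] + coeff * c) % p
        return f

    f = lagrange([(1, 2), (2, 5), (3, 3)])
    print(f)   # [1, 0, 1], i.e., f(x) = 1 + x^2 over Z_7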

We make the following simple generalization of the above Lagrange interpolation theorem.

Proposition 2.24. Let {β_i}_1^m, {α_i}_1^m be elements in F, where m ≤ n, all β_i's are distinct, and the α_i's may not be all distinct. Furthermore, we have positive integers m_i (i.e., the multiplicities) such that Σ_{i=1}^m m_i = n. Then, there is a unique polynomial f(x) of degree at most n − 1 such that f(x) − α_i = (x − β_i)^{m_i} f_i(x) for all i, for some suitable polynomials f_i(x). Moreover, there are polynomials g_i(x) with deg(g_i(x)) ≤ m_i − 1 that satisfy

Σ_i g_i(x) Π_{j≠i}(x − β_j)^{m_j} = 1

such that f(x) is of the following form:

f(x) = Σ_i α_i g_i(x) Π_{j≠i}(x − β_j)^{m_j}.
Proof. First, we prove the equation

Σ_i g_i(x) Π_{j≠i}(x − β_j)^{m_j} = 1.

We shall use the proof of the Chinese remainder theorem. Since the polynomials Π_{j≠i}(x − β_j)^{m_j} are co-prime, they generate the unit ideal, i.e., we have

Σ_i h_i(x) Π_{j≠i}(x − β_j)^{m_j} = 1

for some suitable h_i(x). We may not take h_i to be g_i because the degree restriction on g_i may not be satisfied. Let r = max{deg(h_i(x)) − m_i}. If r < 0, then we just let g_i(x) = h_i(x). Note that the degree restriction is satisfied. If r ≥ 0, let s = the number of h_i(x) such that deg(h_i(x)) − m_i = r. We make a double induction on r, s, i.e., we reduce the number s to 0, and then r will automatically drop. When r drops to negative, then we have found the g_i(x)'s. Note that we always have s > 1. Otherwise, s = 1, and there is a unique term of the highest degree, which cannot be canceled by any other term, and the above equation cannot be satisfied. Therefore, s ≥ 2. Let us pick any two terms of the highest degree, say corresponding to h_i(x) and h_j(x). We have r = deg h_i(x) − m_i = deg h_j(x) − m_j. For any c, h_i(x) and h_j(x) can be replaced by h_i(x) + c x^r (x − β_i)^{m_i} and h_j(x) − c x^r (x − β_j)^{m_j}, respectively, and the above equation is still satisfied. We may select c such that deg(h_i(x) + c x^r (x − β_i)^{m_i}) is smaller. Thus, at least one term drops out from the collection of the highest terms. We reduce the number s at least by 1. When the number s drops to zero, then r must drop. So, we find the g_i(x)'s by double induction.
Now, we have

Σ_i g_i(x) Π_{j≠i}(x − β_j)^{m_j} = 1 and
α_k (Σ_i g_i(x) Π_{j≠i}(x − β_j)^{m_j}) = α_k.

Therefore,

(Σ_i α_i g_i(x) Π_{j≠i}(x − β_j)^{m_j}) − α_k
= (Σ_i α_i g_i(x) Π_{j≠i}(x − β_j)^{m_j}) − α_k (Σ_i g_i(x) Π_{j≠i}(x − β_j)^{m_j})
= (Σ_{i≠k} α_i g_i(x) Π_{j≠i}(x − β_j)^{m_j}) − α_k (Σ_{i≠k} g_i(x) Π_{j≠i}(x − β_j)^{m_j})
= (x − β_k)^{m_k} f_k(x),

since every remaining term contains the factor (x − β_k)^{m_k}. It means that we may let f(x) = Σ_i α_i g_i(x) Π_{j≠i}(x − β_j)^{m_j}; then we have f(x) − α_k = (x − β_k)^{m_k} f_k(x) for all k. Furthermore, if there is another polynomial f*(x) with the same properties as f(x), namely

f(x) − α_k = (x − β_k)^{m_k} f_k(x) and f*(x) − α_k = (x − β_k)^{m_k} f_k*(x),

then f(x) − f*(x) will be divisible by (x − β_i)^{m_i} for all i, i.e., f(x) − f*(x) will be divisible by Π_i (x − β_i)^{m_i}, which has degree n, higher than n − 1. We conclude that f(x) − f*(x) = 0. □

2.5.3. A Ring Theoretical Presentation of a Hamming Code

Before we study the abstract theory of rings further, let us study an example
to illustrate the usage of ring theory to express the Hamming code and
introduce the readers to the next level of coding theory. Recall the [7, 4, 3]
Hamming code C having check matrix H_1 as follows:

    H_1 = [1 1 0]
          [0 1 1]
          [1 1 1]
          [1 0 1]
          [1 0 0]
          [0 1 0]
          [0 0 1]

Let us permute the rows of H_1 to redefine H as follows:

    H = [1 0 0]
        [0 1 0]
        [0 0 1]
        [1 1 0]
        [0 1 1]
        [1 1 1]
        [1 0 1]
Now, consider the ring F_2[x]/(1 + x + x^3) = K (= F_{2^3}). It is easy to see that the polynomial g(x) = 1 + x + x^3 is irreducible over F_2 (for instance, if g(x) could be factored, then one of the factors must be linear, so it would have a root in F_2; however, we have g(0) ≠ 0 and g(1) ≠ 0, so g(x) cannot be factored). So, K is a field and isomorphic to F_2^3 as a vector space. If β is the image of x under the quotient mapping, then as a vector space we have

K = {a_1 β^0 + a_2 β^1 + a_3 β^2 : a_i ∈ F_2}.

We can drop the powers of β and write the rows of the coefficients [a_1 a_2 a_3] as elements in the vector space K (= F_2^3) as

K = {[000], [100], [010], [001], [110], [011], [111], [101]}.

On the other hand, K \ {0} is a cyclic group multiplicatively, and we may write the elements of the field as K = {0, 1, β, β^2, β^3, β^4, β^5, β^6}, where β^7 = 1. The interesting thing about these two representations is that the orderings of elements are identical (this is accidental). For instance, the sixth element in the field representation is β^4, while

x^4 ≡ x + x^2 mod (1 + x + x^3).

Therefore,

β^4 = 0 + 1·β + 1·β^2,

and β^4 corresponds to [011], which is the sixth element in the vector space representation of K (= F_{2^3}).
We may write H = (1 β β^2 β^3 β^4 β^5 β^6)^T explicitly as

    H = [1 0 0]   (= 1)
        [0 1 0]   (= β)
        [0 0 1]   (= β^2)
        [1 1 0]   (= β^3)
        [0 1 1]   (= β^4)
        [1 1 1]   (= β^5)
        [1 0 1]   (= β^6)
We define the Hamming code space as all words a = [a_1 a_2 a_3 a_4 a_5 a_6 a_7] such that a × H = 0, i.e., Σ_{i=1}^7 a_i β^{i−1} = 0, where a_i ∈ F_2. Alternatively, we may write each word as a polynomial a(x) = Σ a_i x^{i−1} such that deg a(x) ≤ 6, and a(x) is a code word iff a(β) = 0 ∈ F_{2^3}. At this point, we may further involve some known mathematical techniques to handle the coding problem.
The condition a(β) = 0 ∈ F_{2^3} implies a(x) = (1 + x + x^3) × m(x). Therefore, deg m(x) ≤ 3 and m(x) = m_1 + m_2 x + m_3 x^2 + m_4 x^3, and all the m(x) form the message space. The relation of a(x) and m(x) is as follows:

a(x) = a_1 + a_2 x + a_3 x^2 + a_4 x^3 + a_5 x^4 + a_6 x^5 + a_7 x^6
= (1 + x + x^3)m(x)
= (1 + x + x^3)(m_1 + m_2 x + m_3 x^2 + m_4 x^3)
= m_1 + (m_1 + m_2)x + (m_2 + m_3)x^2 + (m_1 + m_3 + m_4)x^3 + (m_2 + m_4)x^4 + m_3 x^5 + m_4 x^6.

We have the following parametric form of the code space with the m_i's as the parameters:

a_1 = m_1,  a_2 = m_1 + m_2,
a_3 = m_2 + m_3,  a_4 = m_1 + m_3 + m_4,   (3)
a_5 = m_2 + m_4,  a_6 = m_3,
a_7 = m_4.

After we eliminate the parameters m_i's, we have the following system of defining equations:

a_1 = a_4 + a_6 + a_7,
a_2 = a_4 + a_5 + a_6,   (4)
a_3 = a_5 + a_6 + a_7.
Therefore, we have [a_4 a_5 a_6 a_7] × G = [a_4 a_5 a_6 a_7 a_1 a_2 a_3], where G is the following matrix:

    G = [1 0 0 0 1 1 0]
        [0 1 0 0 0 1 1]
        [0 0 1 0 1 1 1]
        [0 0 0 1 1 0 1]

Note that G is the same generator matrix used in the introduction.
We can stay within the ring of polynomials F_2[x] by starting with a message [m_1 . . . m_4], forming the polynomial m(x) = Σ_{i=1}^4 m_i x^{i−1} and the code word a(x) as

a(x) = (1 + x + x^3)m(x) = Σ_{i=1}^7 a_i x^{i−1},

and sending the encoded message as the coefficients [a_1 . . . a_7]. The receiver will check the received word [a′_1 . . . a′_7] and compute a′(β) = Σ_{i=1}^7 a′_i β^{i−1}.
Assume that there is at most one error in the received word. If the result of the preceding calculation is a′(β) = 0, then there is no error. Otherwise, a′(β) = β^j ≠ 0, and there is an error at the (j + 1)th spot of [a′_1 · · · a′_7]. After correcting the error and recovering the original polynomial a(x), one has the polynomial m(x) = a(x)/(1 + x + x^3) and the original message [m_1 · · · m_4]. We call the polynomial g(x) = 1 + x + x^3 the generator polynomial and the polynomial h(x) = (1 + x^7)/g(x) = 1 + x + x^2 + x^4 the check polynomial. Note that a(x) is a code word iff a(β) = 0 iff a(x) = c(x)g(x) iff (1 + x^7) | a(x)h(x).
The decoder works perfectly if there is no error or if there is one error. However, if there is more than one error, then (1) if the received word r happens to be a code word, the decoder will treat it as the original code word, or (2) if r is not a code word, the decoder will replace it by a wrong code word.
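A compact sketch (ours, not the book's) of this polynomial encoder and decoder, with polynomials over F_2 stored as bit lists:

    # powers 1, beta, ..., beta^6 in F_2[x]/(1 + x + x^3), i.e., the rows of H
    POWERS = [(1,0,0), (0,1,0), (0,0,1), (1,1,0), (0,1,1), (1,1,1), (1,0,1)]

    def encode(m):
        # multiply m(x), deg <= 3, by the generator g(x) = 1 + x + x^3
        a = [0] * 7
        for i, mi in enumerate(m):       # m has 4 bits
            if mi:
                for j in (0, 1, 3):      # the terms of g(x)
                    a[i + j] ^= 1
        return a

    def decode(a):
        # compute a'(beta); a non-zero value beta^j flags an error at spot j+1
        s = (0, 0, 0)
        for i, ai in enumerate(a):
            if ai:
                s = tuple(x ^ y for x, y in zip(s, POWERS[i]))
        a = a[:]
        if s != (0, 0, 0):
            a[POWERS.index(s)] ^= 1
        m = [a[0], 0, 0, a[6]]           # back-substitute a(x) = g(x)m(x):
        m[1] = a[1] ^ m[0]               # a_2 = m_1 + m_2
        m[2] = a[2] ^ m[1]               # a_3 = m_2 + m_3
        return m

    a = encode([1, 0, 1, 1])
    a[4] ^= 1                            # introduce a single error
    print(decode(a))                     # recovers [1, 0, 1, 1]

The back-substitution simply mirrors the parametric equations (3) above.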
The above example illustrates that Hamming codes can be discussed
purely in terms of polynomial rings. It broadens our horizons. This is
the next level of development in coding theory. Before we continue our
discussion of Fq [x], we must have some understanding of the finite field Fq .

Exercises

(1) Prove the Chinese remainder theorem for K[x], and use it to prove the
Lagrange interpolation theorem.
(2) Find the total quotient ring of Z/4Z.
(3) Let χ_γ(x) be the characteristic function of γ ∈ K^n, where K is a finite field, i.e., χ_γ(γ) = 1 and χ_γ(a) = 0 for all a ≠ γ. Show that χ_γ(x) can be written as a polynomial in n variables.
(4) Let π be any K-valued function on Kn , where K is a finite field. Show
that π can be written as a polynomial in n variables.
(5) Find two distinct polynomials f (x), g(x) ∈ Fp [x] such that f (γ) = g(γ)
for all γ ∈ Fp .

2.6. Separability

We have the following useful definition.

Definition 2.25. A polynomial f (x) ∈ K[x] is said to be separable if it


has no multiple root in an algebraic closure Ω of K. 

Proposition 2.26. Over a finite field Fq , every irreducible polynomial is


separable.

Proof. Let f(x) be an irreducible polynomial. If it has a multiple root, then the root must be a root of f′(x). It means that f(x) and f′(x) must have a common factor. With the assumption of irreducibility of f(x), this can happen only if f′(x) is the zero polynomial, i.e., f(x) = g(x^p) for some suitable g(x). Since a finite field is perfect, every coefficient of g(x) is a pth power of some other element. It is easy to see that f(x) = g(x^p) = h(x)^p for some suitable h(x). This contradicts the irreducibility of f(x). □
Let q be a power p^ℓ of the prime p. We wish to count the number of monic irreducible polynomials of degree r in F_{q^m}[x]. Let the number be I_r.

Proposition 2.27. Let s be any positive integer. We have

q^{ms} = Σ_{r|s} r·I_r.

Proof. Let us fix an algebraic closure Ω of F_{q^m}. Let α ∈ F_{q^{ms}} satisfy a monic irreducible polynomial f_α(x) ∈ F_{q^m}[x] of degree, say, r. From finite-field theory, we have r | s. From Proposition 2.26, we know that f_α(x) is separable. Any root of f_α(x) will generate a field extension over F_{q^m} of degree r. It follows from Proposition 2.8 that this field extension is contained in F_{q^{ms}}. We may group all roots (there are r of them) of f_α(x) together. Our proposition follows easily. □
Now we use the well-known Möbius inversion formula in number theory to find I_r. For this purpose, we have the following definition.

Definition 2.28. The Möbius function μ is defined as

μ(n) = 1, if n = 1;
μ(n) = (−1)^k, if n is a product of k distinct primes;
μ(n) = 0, otherwise. □
The following theorem is well known.

Theorem 2.29 (Möbius inversion formula). We have:

if g(n) = Σ_{d|n} f(d) for all n,
then f(n) = Σ_{d|n} μ(d) g(n/d) for all n.

Proof. (1) The Möbius function μ(n) satisfies

Σ_{d|n} μ(d) = 1 if n = 1, and Σ_{d|n} μ(d) = 0 otherwise.

(2) We have the following:

Σ_{d|n} μ(d) g(n/d) = Σ_{d|n} μ(n/d) g(d)
= Σ_{d|n} μ(n/d) Σ_{k|d} f(k)
= Σ_{k|n} Σ_{d|(n/k)} μ(n/(kd)) f(k)
= Σ_{k|n} (Σ_{d|(n/k)} μ(d)) f(k)
= f(n). □
Remark: The converse is also true. 
Recall that I_r is the number of monic irreducible polynomials of degree r in F_{q^m}[x]. It leads to the following proposition.

Proposition 2.30. If r ≥ 1, then I_r > 0. If r is a prime, then

r·I_r = q^{mr} − q^m > 0.

Otherwise, r·I_r = Σ_{d|r} μ(d) q^{mr/d} > q^{mr}(1 − q^{−mr/2+1}) > 0.

Proof. It is clear that every linear polynomial is irreducible, so r = 1 gives I_r > 0. If r is a prime, then it follows from Proposition 2.27 and Theorem 2.29 that

r·I_r = μ(1)q^{mr} + μ(r)q^m = q^{mr} − q^m > 0.

In general, for r ≥ 3, we have

r·I_r = Σ_{d|r} μ(d) q^{mr/d}
≥ q^{mr} − Σ_{d|r, d≥2} q^{mr/d}
> q^{mr} − q^{mr/2+1}
= q^{mr}(1 − q^{−mr/2+1})
> 0,

since the exponents mr/d with d | r, d ≥ 2 are distinct and at most mr/2, and mr/2 > 1, and our proposition is proved. □
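Numerically (our sketch, not the book's), the counts over F_2 come out as the classical values I_1 = 2, I_2 = 1, I_3 = 2, I_4 = 3; here Q stands for the cardinality q^m of the coefficient field:

    def mu(n):
        # the Moebius function by trial factorization
        k, d = 0, 2
        while d * d <= n:
            if n % d == 0:
                n //= d
                if n % d == 0:
                    return 0             # a squared prime factor
                k += 1
            d += 1
        if n > 1:
            k += 1
        return (-1) ** k

    def irreducible_count(Q, r):
        # r * I_r = sum over d | r of mu(d) * Q^(r/d)
        total = sum(mu(d) * Q ** (r // d)
                    for d in range(1, r + 1) if r % d == 0)
        return total // r

    print([irreducible_count(2, r) for r in (1, 2, 3, 4)])   # [2, 1, 2, 3]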
The above proposition shows the existence of the finite field F_{p^m} for any m in a fixed algebraic closure Ω. Each finite field F_{p^m} exists uniquely. We have the following diagram, where the lines indicate the inclusion relations (F_{p^a} ⊂ F_{p^b} iff a | b):

[Figure 2.1. The lattice of finite fields over F_p: above F_p sit F_{p^2}, F_{p^3}, F_{p^5}, F_{p^7}, F_{p^11}, F_{p^13}, F_{p^17}, . . .; above those sit F_{p^4}, F_{p^6}, F_{p^10}, F_{p^15}, F_{p^35}, F_{p^77}, . . ., and so on.]

2.7. Power Series Rings F[[x]] and Fields of Meromorphic
Functions F((x))

We shall consider formal power series in this section. First, the decoding
process of codes over F[x] depends on some properties established in this
section (see Proposition 2.36). Second, the important concept of the residue
on a Riemann surface (or a smooth algebraic curve), which is all-important
for the geometric Goppa codes, can be computed (see Proposition 4.42) with
the help of the materials in this section.
Let us consider the expression f(x) = Σ_{i=0}^∞ a_i x^i, where the coefficients
a_i lie in the field F. In the past, analysts considered the problem
of evaluating the expression at x = b and deduced many concepts of
convergence, divergence, etc. If a power series is divergent, then we are
likely to disregard it from the point of view of analysis. However, algebraically,
if we do not evaluate them (other than the trivial evaluation at x = 0), then
we may simply treat them as algebraic items, and they play their roles.
Henceforth, expressions of the form

f(x) = Σ_{i=0}^∞ a_i x^i

will be called formal power series or power series, and expressions of the
form

f(x) = Σ_{i=−m}^∞ a_i x^i

will be called formal meromorphic functions or meromorphic functions.
All formal power series with coefficients in F will be denoted by F[[x]].
All formal meromorphic functions with coefficients in F will be denoted
by F((x)).
 
Let f(x) = Σ_j a_j x^j, g(x) = Σ_k b_k x^k ∈ F((x)). We define, as usual,

f(x)·g(x) = Σ_i ( Σ_{j+k=i} a_j b_k ) x^i,
f(x) + g(x) = Σ_i (a_i + b_i) x^i.

We have the following proposition.

Proposition 2.31. The sets F[[x]] ⊂ F((x)) are closed under the usual
addition and multiplication.

Proof. The proposition follows by routine checking. 



Definition 2.32. Let f(x) = Σ_i c_i x^i be a formal power series or a formal
meromorphic function. We define the order of f(x), in symbols ord(f(x)), as

ord(f(x)) = min{i : c_i ≠ 0} if f(x) ≠ 0, and ord(f(x)) = ∞ if f(x) = 0. 

We have the following proposition.

Proposition 2.33. We have


ord(f (x)g(x)) = ord(f (x)) + ord(g(x)),
ord(f (x) + g(x)) ≥ min(ord(f (x)), ord(g(x))).

Proof. The proof is a routine check. 

Proposition 2.34. A power series f(x) is a unit in F[[x]] (i.e., f(x)^{−1}
exists) ⇐⇒ f(0) ≠ 0.

Proof. (=⇒) If f(x) is a unit, then there exists g(x) such that f(x)g(x) = 1.
Then, we have

ord(f(x)) + ord(g(x)) = 0.

Therefore, ord(f(x)) = 0 and f(0) ≠ 0.

(⇐=) If f(0) = a ≠ 0, then a^{−1}f(0) = 1. We may write a^{−1}f(x) as
1 − g(x) with ord(g(x)) ≥ 1. We have

(a^{−1}f(x))^{−1} = 1 + Σ_{i=1}^∞ g(x)^i.

It is easy to see that when we collect coefficients of like degrees in x on the
right-hand side and add them up, we only have finite sums. Therefore, the
right-hand side is a power series. It is then trivial to see that the inverse of
f(x) (= a^{−1}(a^{−1}f(x))^{−1}) exists. 
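As a small illustration (a sketch of ours in plain Python, working over Q with exact fractions; the names are ad hoc), one can compute the inverse of a unit of F[[x]] up to any truncation order by solving f·h = 1 coefficient by coefficient, exactly as in the proof above.

from fractions import Fraction

def series_inverse(f, N):
    """First N coefficients of 1/f(x); f is a coefficient list with f[0] != 0."""
    assert f[0] != 0, "units of F[[x]] are exactly the series with f(0) != 0"
    h = [Fraction(1) / f[0]]
    for n in range(1, N):
        # the coefficient of x^n in f*h must vanish: sum_{k=0}^n f[k] h[n-k] = 0
        s = sum((Fraction(f[k]) * h[n - k]
                 for k in range(1, min(n, len(f) - 1) + 1)), Fraction(0))
        h.append(-s / f[0])
    return h

print(series_inverse([1, -1], 6))   # 1/(1 - x) = 1 + x + x^2 + ...: all ones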
Note that a ring R is said to be an integral domain if f·g = 0 implies that
either f = 0 or g = 0. We have the following.

Proposition 2.35. Given any field F, the set F[[x]] is an integral domain
and F((x)) is a field.

Proof. The proof is a routine check. 

Let us use the above computations for polynomials. Let f(x) = Σ_{i=0}^m a_i x^i,
g(x) = Σ_{i=0}^n b_i x^i be two polynomials of degrees m, n, respectively.
For simplicity, let us assume that b_0 = g(0) ≠ 0. Then, f(x)/g(x) ∈ F[[x]].
We have

f(x)/g(x) = Σ_{i=0}^∞ c_i x^i.

The total number of coefficients of f(x) and g(x) is (m+1) + (n+1). Let us
compute the number of independent coefficients. We may multiply
f(x), g(x) by a common non-zero constant without changing f(x)/g(x).
Therefore, the number of independent coefficients should be (m + n + 1).
It is natural to ask if the first (m + n + 1) coefficients c_0, c_1, . . . , c_{m+n}
of the above power-series expansion uniquely determine the rational function
f(x)/g(x). We have the following natural proposition, which is significant
for decoding purposes.

Proposition 2.36. Given any polynomial h(x) of degree m + n, there is
at most one rational function f(x)/g(x), where f(x), g(x) are of degrees at most
m, n, respectively, with g(0) ≠ 0, such that the following holds in F[[x]]:

ord( f(x)/g(x) − h(x) ) ≥ m + n + 1.    (1)

Furthermore, (1) if such a rational function f(x)/g(x) exists, and there exists
another pair of polynomials (f_1(x), g_1(x)) with g_1(0) ≠ 0, and with f_1(x), g_1(x)
of degrees at most m − s, n − s, respectively, for some non-negative
integer s, then f_1(x)/g_1(x) = f(x)/g(x) ⇐⇒

ord( f_1(x)/g_1(x) − h(x) ) ≥ m + n − s + 1.    (2)

(2) If a factor a(x) of degree u of g(x) is known, then the uniqueness still
holds under the weaker condition

ord( f(x)/g(x) − h(x) ) ≥ m + n − u + 1.    (3)


Proof. Let us show the uniqueness. If fg∗ (x)
(x)
is another pair of polynomials,
where f (x), g (x) are of degrees at most m, n, respectively, with g ∗ (0) = 0
∗ ∗

and
 ∗ 
f (x)
ord − h(x) ≥ m + n + 1.
g ∗ (x)
Then, we have
 
f (x) f ∗ (x)
ord − ∗ ≥ m + n + 1.
g(x) g (x)
On the other hand, we have
f (x) f ∗ (x) f (x)g ∗ (x) − f ∗ (x)g(x)
− ∗ = .
g(x) g (x) g(x)g ∗ (x)
Note that g(0)g ∗ (0) = 0. We conclude

ord(f (x)g ∗ (x) − f ∗ (x)g(x))) ≥ m + n + 1.

Note that f (x)g ∗ (x) − f ∗ (x)g(x) is a polynomial of degree at most m + n,


if it is not zero. Note that for any non-zero polynomial, its degree is bigger
or equal to its order. Hence. it must be zero. We have
f (x) f ∗ (x)
= ∗ .
g(x) g (x)
Thus, we have the uniqueness. The proof of the part (1) of the proposition
is as follows. For ⇐=, we follow almost verbatim as above. For =⇒, we have
 
f1 (x)
ord − h(x) ≥ m + +n + 1 ≥ m + n − s + 1.
g1 (x)
Hence, the proof of (1) is complete. The proof of the part (2) of the
proposition is similar to the above and left to the reader. 

Example 4: The preceding proposition claims that there is at most one
rational function f(x)/g(x) that satisfies all conditions. We give the following
example to show that there may not exist a rational function f(x)/g(x) with the
said property. For instance, let h(x) = 1 + x^4 and m = n = 2; then there is no
f(x)/g(x) with deg(f(x)) ≤ 2 and deg(g(x)) ≤ 2 such that

ord( f(x)/g(x) − h(x) ) ≥ m + n + 1 = 5.

Suppose the contrary, that there are such polynomials f(x), g(x). Multiplying
the congruence f(x)/g(x) ≡ h(x) mod (x^5) by g(x) produces

f(x) ≡ g(x) + g(0)x^4 mod (x^5).

Since we have the restrictions deg(f(x)), deg(g(x)) ≤ 2 on the degrees, we
must have f(x) = g(x) and g(0) = 0, which contradicts the requirement
g(0) ≠ 0. 
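The uniqueness in Proposition 2.36 also suggests how to find f(x)/g(x) from the truncated series in practice: run the long (extended Euclidean) algorithm on x^{m+n+1} and the truncation of h(x), and stop once the remainder has degree ≤ m. This is exactly the stopping strategy used for decoding in Section 3.2. Below is a hedged sketch of ours in plain Python; the prime field F_7 and all names are our own choices for illustration. Run on this example's h(x) = 1 + x^4 with m = n = 2, it returns a candidate denominator with constant term 0, confirming that no admissible f(x)/g(x) exists.

P = 7                                   # work over F_7 for illustration

def deg(f):
    return max((i for i, c in enumerate(f) if c % P), default=-1)

def mul(f, g):
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] = (h[i + j] + a * b) % P
    return h

def sub(f, g):
    n = max(len(f), len(g))
    return [((f[i] if i < len(f) else 0) - (g[i] if i < len(g) else 0)) % P
            for i in range(n)]

def divmod_(a, b):
    q, r = [0] * max(deg(a) - deg(b) + 1, 1), a[:]
    inv = pow(b[deg(b)], P - 2, P)      # inverse of the leading coefficient
    while deg(r) >= deg(b):
        d, c = deg(r) - deg(b), r[deg(r)] * inv % P
        q[d] = c
        r = sub(r, mul([0] * d + [c], b))
    return q, r

def reconstruct(h, m, n):
    """Euclid on (x^{m+n+1}, h mod x^{m+n+1}); stop when deg(remainder) <= m."""
    f_prev, f_cur = [0] * (m + n + 1) + [1], (h + [0] * (m + n + 1))[:m + n + 1]
    a_prev, a_cur = [0], [1]            # cofactors of h: candidate denominators
    while deg(f_cur) > m:
        q, r = divmod_(f_prev, f_cur)
        f_prev, f_cur = f_cur, r
        a_prev, a_cur = a_cur, sub(a_prev, mul(q, a_cur))
    return f_cur, a_cur                 # candidate (f, g), up to a common constant

f, g = reconstruct([1, 0, 0, 0, 1], 2, 2)   # h(x) = 1 + x^4, m = n = 2
print(f, g)                                  # the candidate g has g(0) = 0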

Exercises

(1) Factor x^{16} − x over F_2 into a product of irreducible polynomials.
(2) Given a finite field F_p, show that the following polynomial f(x) is
separable:

f(x) = 1 + x + ··· + x^i/i! + ··· + x^{p−1}/(p−1)!.

(3) Let K be a field with ch(K) = p positive. Show that x^p − x − a is either
irreducible or splits completely into linear factors, where a ∈ K.
(4) Let z = x^2 + x^3. Show that [F_2((x)) : F_2((z))] = 2.
(5) Let K be a field of characteristic p and x, y be symbols. Show that
K(x, y) is inseparable over K(x^p, y^p).
Chapter 3

Ring Codes

From now on, we shall fix the ground field F_q. A linear code is defined
by a subspace C of an n-dimensional vector space F_q^n. Note that for the
purpose of decoding, we shall need more (algebraic) relations between the
vectors in the word space. Naturally, there shall be multiplicative relations
between the vectors in the word space. It means that we shall consider ring
structures. A simple ring is F_q[x]/(f(x)). It is easy to see that F_q^n can be
represented by F_q[x]/(f(x)) for any polynomial f(x) of degree n. In this
representation, we have the multiplicative structure of a ring in addition to the
additive structure of a vector space. We shall study the concepts of coding
theory in the context of the ring F_q[x]/(f(x)). To make our work easier,
we shall later select a good f(x) for coding purposes. We have the following
definition to begin with.

Definition 3.1. Given any polynomial h(x), the Hamming weight of h(x)
mod (f(x)) is defined by writing h̄(x) ∈ F_q[x]/(f(x)) as h̄(x) = Σ_{i=0}^{n−1} c_i x^i,
where n = deg(f(x)); the Hamming weight of h(x) mod (f(x)) is the
Hamming weight of [c_0, c_1, . . . , c_{n−1}]. Note that the preceding expression
h̄(x) is unique in any residue class with degree less than n. 

There is a convenient way to consider only code subspaces defined by
ideals Ī in F_q[x]/(f(x)), and in this case, the elements in Ī are called code
words. Note that F_q[x] is a principal ideal ring: for instance, let Ī be any ideal
in F_q[x]/(f(x)) and I be its pull-back in F_q[x], i.e., I = {g(x) : ḡ(x) ∈ Ī};
then I is a principal ideal. We write I = (h(x)). It is clear that Ī = (h̄(x)).
So we conclude that F_q[x]/(f(x)) is a principal ideal ring.


Since an ideal I = (f(x), h(x)) is generated by the greatest common
divisor h*(x) of f(x), h(x), we must have that h*(x) is a factor of f(x). It
is easy to see that any ideal Ī ≠ (0) of F_q[x]/(f(x)) is of the form (h̄(x)),
where h(x) is a factor of f(x) in F_q[x]. If h(x) = 1, then Ī = the whole
ring, or the code space = the word space; it is an uninteresting case. The
interesting case is that h(x) is a non-unit factor of f(x) in F_q[x].
If we take f(x) to be x^n − 1, then the coding theory will be very
interesting. In general, we have a cyclic code in the following sense.

Definition 3.2. A code space C of length n is said to be cyclic iff

[a_0 a_1 . . . a_{n−1}] ∈ C ⇔ [a_{n−1} a_0 . . . a_{n−2}] ∈ C. 

A cyclic code is rich in algebraic structures. We have the following
equivalent characterizations.

Proposition 3.3. Any ideal Ī ≠ (0) in F_q[x]/(x^n − 1) defines a cyclic code.
Conversely, any cyclic code of length n can be represented by an ideal Ī in
F_q[x]/(x^n − 1).

Proof. If Ī = (1), then the code space is the whole word space, and the
code space is cyclic. Let Ī ≠ (1) be an ideal in F_q[x]/(x^n − 1) and let a code
word c(x) = c_0 + c_1x + ··· + c_{n−1}x^{n−1} ∈ Ī. Then xc(x) ∈ Ī and xc(x) =
c_{n−1} + c_0x + ··· + c_{n−2}x^{n−1} since x^n = 1, and the code space is cyclic.

Conversely, if C is cyclic of length n, then the polynomials c(x) = c_0 + c_1x + ··· +
c_{n−1}x^{n−1} for all [c_0 c_1 . . . c_{n−1}] ∈ C form a set Ī. The cyclic property of C
implies xc(x) ∈ Ī; hence, x^i c(x) ∈ Ī for all i. It is easy to see that Ī is an ideal in
F_q[x]/(x^n − 1), and C can be represented by Ī. 
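A one-line check (a plain-Python sketch with names of our own) makes the correspondence in the proof concrete: multiplication by x in F_q[x]/(x^n − 1) is exactly the cyclic shift of Definition 3.2.

def shift_via_x(c):
    # coefficients of x*c(x) mod (x^n - 1): the top term x * x^{n-1} wraps to x^0
    return [c[-1]] + c[:-1]

print(shift_via_x([0, 1, 2, 3]))   # -> [3, 0, 1, 2]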

The case p | n in the above is not interesting because then we have
x^n − 1 = (x^{n/p} − 1)^p; therefore, (x^{n/p} − 1) represents a non-zero word whose
pth power is zero. This is against intuition and is a degenerate case. From
now on, we only discuss the non-degenerate cases and assume that n, p are
co-prime. Note that every non-zero ideal Ī can be generated by a factor of
x^n − 1, and these give precisely all the meaningful cyclic codes.

Example 1: Let us consider the Hamming code [7, 4, 3] as in Section 2.5.
It is a cyclic code and can be represented by the ideal (x^3 + x + 1) in
F_2[x]/(x^7 + 1) = F_2[x]/(x^7 − 1). 

Definition 3.4. Let the ideal Ī (or the cyclic code C) be generated by a monic
polynomial g(x) which is a factor of x^n − 1. Then g(x) is called the generator
polynomial of the cyclic code C = (ḡ(x)), and the polynomial h(x) =
(x^n − 1)/g(x) is called the check polynomial of C. 

Let Ω be an algebraic closure of F_q. We denote f(x) = x^n − 1. Under
our assumption that n, p are co-prime, we have

f′(x) = nx^{n−1},

which has only 0 as a root; furthermore, 0 is not a root of f(x). Therefore,
the polynomial x^n − 1 has no multiple roots in Ω by the derivative test
(cf. Proposition 2.7). Therefore, its divisor g(x), the generator polynomial,
has no multiple roots. Let {γ_j} be the set of all roots of g(x) in
Ω. Then, it is clear that r(x) ∈ the ideal (g(x)) iff r(γ_j) = 0 for all j.
Thus, the ideal (or subspace C) can be defined as {r(x) mod (x^n − 1) :
r(γ_j) = 0, ∀j}. In fact, we may not have to take all roots of g(x) in Ω;
a partial set of roots may be enough.

Proposition 3.5. Given any set {γ_j} ⊂ Ω such that all γ_j are roots
of x^n − 1 for a fixed n, let g(x) be the least common multiple of all the
monic irreducible polynomials satisfied by the γ_j. Then, g(x) is the
generator polynomial of a cyclic code C in F_q[x]/(x^n − 1).

Proof. Let g_i(x) be the monic irreducible polynomial satisfied by γ_i. Then,
it is clear that g_i(x) | x^n − 1 for all i. Therefore, x^n − 1 is a common
multiple of all the g_i(x). By definition, g(x) is their least common multiple,
so g(x) | x^n − 1. 

3.1. BCH Codes

Let us recall a significant property of the Hamming code as described by
Proposition 1.21, which concludes that the Hamming distance of such a code
is 3 by showing that in the matrix of all Hamming code words as column
vectors, any two column vectors are distinct, while any three column vectors
are linearly dependent. To study the Hamming distances of other codes, we
shall consider the well-known Vandermonde matrices, whose column vectors
are linearly independent. We have the following proposition.

Proposition 3.6. Let x_i ∈ F and M be the following n × n matrix:

M =
( 1          1          1          ...  1          )
( x_1        x_2        x_3        ...  x_n        )
( x_1^2      x_2^2      x_3^2      ...  x_n^2      )
( ...        ...        ...        ...  ...        )
( x_1^{n−1}  x_2^{n−1}  x_3^{n−1}  ...  x_n^{n−1}  )

Then, we have

det M = Π_{i>j} (x_i − x_j).

Proof. First, we treat all x_i as symbols. Subtracting the first column from
the second column, . . ., and the first column from the nth column, we may
extract x_i − x_1 from the ith column; we conclude that Π_{i>1}(x_i − x_1) |
det(M). By symmetry, we conclude

Π_{i>j} (x_i − x_j) | det(M).

Since both sides are polynomials of degree n(n − 1)/2, they can only differ
by a constant. Comparing the coefficients of Π_{i≥2} x_i^{i−1} on both sides, we
conclude that they must be equal. Since the proposition is true for the x_i as
symbols, it must be true for x_i as elements of the field F. 
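A quick exact check of the identity (a sketch of ours in plain Python with integer entries; the Leibniz expansion below is only practical for small n) compares det M with the product formula.

from itertools import permutations
from math import prod

def det_leibniz(M):
    """Determinant by the Leibniz expansion, in exact integer arithmetic."""
    n = len(M)
    def sgn(s):
        inv = sum(1 for i in range(n) for j in range(i + 1, n) if s[i] > s[j])
        return -1 if inv % 2 else 1
    return sum(sgn(s) * prod(M[r][s[r]] for r in range(n))
               for s in permutations(range(n)))

xs = [2, 3, 5, 7]
M = [[x ** r for x in xs] for r in range(len(xs))]   # row r carries the rth powers
lhs = det_leibniz(M)
rhs = prod(xs[i] - xs[j] for i in range(len(xs)) for j in range(i))
assert lhs == rhs
print(lhs)   # 240 = (3-2)(5-2)(5-3)(7-2)(7-3)(7-5)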

Let us further consider the ring F_q[x]/(x^n − 1) with the usual assumption
that n, p are co-prime. For the purpose of coding theory, we have the
following proposition.

Proposition 3.7. Let γ be a primitive nth root of unity (i.e., γ is a root
of x^n − 1 and not a root of x^s − 1 for any 0 < s < n) in an algebraic closure
Ω of the finite field F. Let g(x) be the least common multiple of all the monic
irreducible polynomials (i.e., satisfied by every one) of γ^ℓ, γ^{ℓ+1}, . . . , γ^{ℓ+δ−1},
where ℓ, δ are some non-negative integers less than n − 1. Let C be the cyclic
code with g(x) as the generator polynomial. Then, we have

min{Hamming wt(a) : a ∈ C, a ≠ 0} ≥ δ.

Proof. Recall that the code space is the ideal generated by g(x). Hence,
every code polynomial c(x) has γ^ℓ, γ^{ℓ+1}, . . . , γ^{ℓ+δ−1} among its roots. Suppose
that the proposition is false. Then, there is a non-zero code polynomial

c(x) = Σ_j c_{i_j} x^{i_j}

with s terms, where s < δ, such that c(γ^ℓ) = ··· = c(γ^{ℓ+δ−1}) = 0. Consider
the following system of equations:

Σ_j c_{i_j} γ^{ℓ i_j} = 0,
······
Σ_j c_{i_j} γ^{(ℓ+δ−1) i_j} = 0.

Among the above δ linear equations in fewer than δ variables c_{i_j}, we pick
the first s so that the number of equations matches the number of variables.
The coefficient matrix is the following:

N =
( γ^{ℓ i_1}        γ^{ℓ i_2}        ...  γ^{ℓ i_s}        )
( γ^{(ℓ+1) i_1}    γ^{(ℓ+1) i_2}    ...  γ^{(ℓ+1) i_s}    )
( ...              ...              ...  ...              )
( γ^{(ℓ+s−1) i_1}  γ^{(ℓ+s−1) i_2}  ...  γ^{(ℓ+s−1) i_s}  )

It suffices to show that the matrix N is non-singular; then all c_{i_j} must
be zero, which implies that c(x) is the zero polynomial. Contradiction!

Let us show that the matrix N is non-singular. Let us pull γ^{ℓ i_j} from the
jth column. Then, we have the following matrix L:

L =
( 1               1               ...  1               )
( γ^{i_1}         γ^{i_2}         ...  γ^{i_s}         )
( ...             ...             ...  ...             )
( γ^{i_1(s−1)}    γ^{i_2(s−1)}    ...  γ^{i_s(s−1)}    )

which is a Vandermonde matrix of rank s with x_j replaced by γ^{i_j}. Since
we have γ^{i_j} − γ^{i_k} ≠ 0 for i_j ≠ i_k ≤ n − 1, the matrices L and N are
non-singular. Therefore, all c_{i_j} = 0, and c(x) is the zero polynomial, contrary
to our assumption that c(x) is a non-zero polynomial. 
The following cyclic codes were discovered by Bose and Ray-Chaudhuri
(1960) and Hocquenghem (1959) and are known as BCH codes.

Definition 3.8. Let γ be a primitive nth root of unity in Ω. Let g(x)
be the least common multiple of all the monic irreducible polynomials of
γ^ℓ, γ^{ℓ+1}, . . . , γ^{ℓ+δ−1}, where ℓ, δ are some non-negative integers less than
n − 1. Then, the cyclic code C with g(x) as the generator polynomial in
F_q[x]/(x^n − 1) is called the BCH code of designed distance δ and length n.
Usually, we take ℓ = 1. If n = q^s − 1 (i.e., γ is a primitive element in F_{q^s}),
then it is called a primitive BCH code. 

We have the following corollary of the preceding proposition.

Corollary 3.9. The Hamming distance of a BCH code is at least δ.

Proof. It follows from the preceding proposition. 

The Hamming codes use the vector space structure of F_2^n. The BCH
codes use the ring structure of F_2[x]/(x^n − 1). Note that F_2[x]/(x^n − 1)
is isomorphic to F_2^n as a vector space, and furthermore, it has a rich ring
structure. The ring structure makes them better.

Example 2: Let us consider a BCH code C over F_{2^4}. Let us consider α, β,
where α satisfies x^2 + x + 1 = 0 over F_2 and β satisfies x^2 + x + α = 0 over F_{2^2}.
It is not hard to see that F_{2^2} = F_2[α] and F_{2^4} = F_{2^2}[β] = F_2[β]. Let γ = β,
satisfying the minimal equation x^4 + x + 1 = 0 over F_2. Then, certainly
γ^{15} − 1 = 0, and γ is a primitive 15th root of unity. Let us consider the
BCH code determined by γ, γ^2, γ^3, γ^4; we shall see that its minimal distance
is 5, so we may take δ = 5 below. By checking, one can conclude that γ, γ^2, γ^4
(together with their conjugates γ^8 and γ^7, γ^{11}, γ^{13}, γ^{14}) satisfy the
following equation:

1 + x + x^3 + x^4 + x^5 + x^7 + x^8 = 0,

and γ^3 (together with γ^6, γ^9, γ^{12}) satisfies the following equation:

1 + x + x^2 + x^3 + x^4 = 0.

So, the generator polynomial g(x) is the product of the above two
polynomials, which is

(1 + x + x^3 + x^4 + x^5 + x^7 + x^8)(1 + x + x^2 + x^3 + x^4)
= 1 + x^3 + x^6 + x^9 + x^{12},

and the check polynomial h(x) is (x^{15} − 1)/g(x) = 1 + x^3. It is easy to
see that the generator polynomial and the check polynomial multiply to
x^{15} − 1 = x^{15} + 1. Note that a non-zero code word m(x)g(x) with
deg(m(x)) ≤ 2 simply repeats the three coefficients of m(x) five times, so the
minimal Hamming distance of this code is exactly 5. The later Example 3
shows that this code can correct up to two errors; it is not a Hamming code. 
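The arithmetic claimed in this example can be verified mechanically. The following sketch (plain Python; the bit-mask representation of F_16 and all the names are our own) builds F_16 = F_2[x]/(x^4 + x + 1), locates the powers of γ at which each factor of g(x) vanishes, and confirms g(x)h(x) = x^15 + 1.

REDUCE = 0b10011                        # x^4 + x + 1; F_16 elements are 4-bit masks

def gf16_mul(a, b):
    """Carry-less product of two F_16 elements, reduced mod x^4 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0b10000:
            a ^= REDUCE
        b >>= 1
    return r

def gf16_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf16_mul(r, a)
    return r

def poly_eval(f, a):
    """Evaluate a polynomial with F_2 coefficients (low degree first) at a in F_16."""
    acc, power = 0, 1
    for c in f:
        if c:
            acc ^= power
        power = gf16_mul(power, a)
    return acc

g1 = [1, 1, 0, 1, 1, 1, 0, 1, 1]        # 1 + x + x^3 + x^4 + x^5 + x^7 + x^8
g2 = [1, 1, 1, 1, 1]                    # 1 + x + x^2 + x^3 + x^4
gamma = 0b0010                          # the class of x, a root of x^4 + x + 1

for name, f in (("degree-8 factor", g1), ("degree-4 factor", g2)):
    roots = [i for i in range(1, 16) if poly_eval(f, gf16_pow(gamma, i)) == 0]
    print(name, "vanishes at gamma^i for i in", roots)
# -> [1, 2, 4, 7, 8, 11, 13, 14] and [3, 6, 9, 12]

def f2_polmul(f, g):
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] ^= a & b
    return h

g_gen = f2_polmul(g1, g2)               # 1 + x^3 + x^6 + x^9 + x^12
assert f2_polmul(g_gen, [1, 0, 0, 1]) == [1] + [0] * 14 + [1]   # g(x)h(x) = x^15 + 1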

3.2. Decoding a Primitive BCH Code

For a useful code, the decoding process is significant. What we mean
by decoding is that there is an integer t such that, given a received
word r, the following are possible: (1) If there are at most t errors,
then the decoder will find the original code word. (2) If there are more
than t errors, then either we find a code word c within a distance ≤ t
from the received word (in general, if we find a code word, then with a
small probability, c may not be the originally sent code word), or we
return an error message to indicate that what was found is not even
a code word, and hence there are more than t errors. There are several
fast ways of decoding a primitive BCH code. Other than the one presented
in this section, there are other ways, for instance, Berlekamp's algorithm,
which is at least equally fast (see Appendix D). The advantage of the method
presented here is its mathematical simplicity.
Let us consider a primitive BCH code of length n over F_q with designed
distance δ. Let g(x) be the generating polynomial and c(x) = c_0 + c_1x +
··· + c_{n−1}x^{n−1} = s(x)g(x) be a code word.
During the transmission of the code word, there might be errors and
erasures, which means the positions of the erased data are known, while
the precise data at those positions are unavailable. Let the received word
be r̄(x) = ··· + E_i x^i + ···, where the i's indicate the erasure positions and
the E_i indicate the apparent values. We shall treat the letter E_i as 0 and define
r(x) = ··· + 0·x^i + ···. Let the set N of positions of erasures be known,
and let the number of erasures be u. Let the hypothetical error word
e(x) be defined by c(x) = r(x) − e(x). Clearly, given r(x), c(x) and e(x)
determine each other. In the following flowchart, given r(x), the decoder
will produce e(x) if the numbers of errors and erasures are within a limit.
Otherwise, it will return an error message.
We have the following Figure 3.1.

Figure 3.1. Flowchart of the decoder: the received word r is fed to the check-polynomial box; passing leads to the code-word box, failing leads to the decoder, whose proposed correction is checked again, ending at either a code word or an error message.

In Figure 3.1, upon receiving the received word, we let it pass the
check-polynomial box, which uses the agreed check polynomial h(x) to check
whether it is a code word. If it passes, then it goes through the pass box, and it
will be declared a code word; in other words, it flows through the left half of
Flowchart 1. If it fails the check, then it goes through the right half
of Flowchart 1 and starts with the decoder. The decoder¹ is based on the
assumption that the numbers of errors v and erasures u are limited by the
numerical condition 2v + u < δ, and we shall use the theory of power series
and the Euclidean algorithm. The decoder will produce an error word e (see
Section 3.2) such that r − e might be a code word. Even if the decoder
produces an error word e(x), we have to further test whether the assumption of
the limited number of errors, the numerical condition, is truly satisfied, by
testing if e(x) is the correct error word: we check c = r − e to see if it
is a code word using the check polynomial. If it is, then we pass on to the
block of code word. If it is not, then the decoding fails, and we return an
error message.

¹ We follow the decoding process of SKHN [35]. Another slightly faster decoding process
is Berlekamp's algorithm [4]. Please see Appendix D.
Let us concentrate on the decoder part. We assume that there are u
erasures and at most v errors. Let

e(x) = r(x) − c(x) = e_0 + e_1x + ··· + e_{n−1}x^{n−1}

be the hypothetical error vector. Note that we assume that there are at most
v + u non-zero e_i's. It follows from the remark after Proposition 1.25 that
if we assume that 2u + 2v < δ and c(γ^i) = 0 for i = 1, . . . , δ (which means that
e(γ^i) = r(γ^i) for i = 1, . . . , δ), then there is a unique code word within the
error range. Certainly, by brute force of checking all possibilities, we may
recover c(x). However, this is too slow. We use the improved numerical
condition 2v + u < δ, which is better than 2u + 2v < δ, and find a clever way
of solving the decoding problem. We introduce (following Peterson [30])
the following concepts: the error-locator polynomial, which gives the locations
of errors, and the error-evaluator polynomial ω(x), which gives the values at
the error locations.

Definition 3.10. Let M be the set of all places where either there is
an erasure or an error. We shall write the set M as the disjoint union
M = N ∪ L, with N consisting of the u erasures and L consisting of at most v
errors. The error-locator polynomial σ(x) is defined as

σ(x) = Π_{i∈M} (1 − γ^i x) = Π_{i∈N} (1 − γ^i x) · Π_{i∈L} (1 − γ^i x) = σ_1(x)σ_2(x).

Since the set N is known, the function σ_1(x) is a known polynomial of
degree u. Thus, σ(x) and σ_2(x) determine each other. The error-evaluator
polynomial ω(x) is defined as

ω(x) = Σ_{i∈M} ( e_i γ^i x Π_{j∈M\i} (1 − γ^j x) ).

Note that ω(x) is an unknown function. 

It is easy to see that deg(σ_1(x)) = u, deg(σ_2(x)) ≤ v, deg(σ(x)) ≤ u + v,
and deg(ω(x)) ≤ u + v. Note that if σ_2(x), ω(x) are found, then the roots
of σ_2(x) give us {γ^{−i} : i ∈ L}. We take the inverses of all elements in
the preceding set, and we have {γ^i : i ∈ L} and hence the locations of all errors.
Furthermore, we have −ω(γ^{−i})γ^i / σ′(γ^{−i}) = e_i for every erasure or error position i.
Therefore, if we can find σ_2(x) and ω(x), then we can find σ(x) and ω(x),
and we decode the message.
Furthermore, it suffices to find the rational function ω(x)/σ_2(x), with the
denominator having constant term 1, instead of finding them individually, for
the following two reasons: (1) Every factor of σ_2(x) is of the form (1 − γ^i x)
with i ∈ L, which is not a factor of ω(x); therefore, the above expression is
the reduced form of the said rational function. (2) The denominator σ_2(x)
is further normalized by σ_2(0) = 1. Therefore, once we find the rational
function ω(x)/σ_2(x) ∈ F_q[[x]], it follows from Proposition 2.36 and the following
Proposition 3.11 that if we write it in the reduced form and require it to
satisfy condition (2) of Proposition 2.36, then we have the polynomials
ω(x), σ_1(x), σ_2(x) and solve the decoding problem.

Now we shall apply Proposition 2.36. The degree of ω(x) is ≤ v + u,
the polynomial σ_1(x) is known with degree s = u, and the degree of σ_2(x)
is ≤ v. We have

v + u + v + u − u = 2v + u.

Now, we only use the following numerical condition for decoding:

2v + u = δ′ < δ.

Also, the rational function ω(x)/σ_2(x) is unknown at the very beginning. We have
the following proposition, which tells us that the first δ′ coefficients of the
power-series expansion of ω(x)/σ_2(x) satisfy the following equation with known
right-hand side. Furthermore, by the result of Proposition 2.36, the rational
function ω(x)/σ_2(x), which exists theoretically, is thus uniquely determined.

Proposition 3.11. We have the following expression in F_q[[x]]:

ord( ω(x)/σ(x) − Σ_{j=1}^{δ′} r(γ^j)x^j ) ≥ δ′ + 1.

Proof. We have

ω(x)/σ(x) = Σ_{i∈M} e_i γ^i x / (1 − γ^i x) = Σ_{i∈M} e_i Σ_{j=1}^∞ (γ^i x)^j
= Σ_{j=1}^∞ ( Σ_{i∈M} e_i γ^{ij} ) x^j = Σ_{j=1}^∞ e(γ^j) x^j.

Note that c(γ^j) = 0 for j = 1, . . . , δ′, and e(γ^j) = r(γ^j) for j = 1, . . . , δ′.
Therefore, we have

ord( ω(x)/σ(x) − Σ_{j=1}^{δ′} r(γ^j)x^j ) = ord( Σ_{j=1}^∞ e(γ^j)x^j − Σ_{j=1}^{δ′} r(γ^j)x^j )
≥ δ′ + 1.

On the other hand, Proposition 2.36 shows that the rational function
ω(x)/σ_2(x) is thus uniquely defined. Sometimes, one may establish the uniqueness
proposition first (as we proved Proposition 2.36 first); then we tie the
object to an equation (as in Proposition 3.11); finally, we use the
equation to show the existence. For decoding purposes, we need a fast way
to recover the rational function ω(x)/σ_2(x) from Proposition 3.11 (please see the
next proposition).
The above equation in the proposition, written slightly differently as
follows, is named by Berlekamp as the key equation (see Appendix D):

(1 + S(x))σ(x) ≡ ω(x) mod x^{δ′+1}.

Let us postpone the discussion of the method of Berlekamp to Appendix
D. Recall that σ_1(x) is a known function. Although a piece of the partial power-series
expansion of ω(x)/σ_2(x) is known by the preceding proposition, we have
to recover ω(x)/σ(x) fast. The process is as follows. Let f_1(x) be the polynomial
with deg(f_1(x)) < δ′ + 1 and

f_1(x) = σ_1(x) Σ_{j=1}^{δ′} r(γ^j)x^j mod (x^{δ′+1}).    (1′)

Note that since σ_1(x) is known, f_1(x) is known and thus uniquely
determined. The conclusion of the above proposition can be re-written as

ω(x) = σ(x) ( Σ_{i=1}^{δ′} r(γ^i)x^i ) + x^{δ′+1} h(x)
= σ_2(x)f_1(x) + x^{δ′+1} h*(x).

Therefore, ω(x) ∈ the ideal (f_1(x), x^{δ′+1}) ⊂ F_q[x]. Note that σ_2(0) = 1,
i.e., σ_2(x) is a unit in F_q[[x]]. We have the following interesting equation:

ω(x)/σ_2(x) = f_1(x) mod (x^{δ′+1}).    (1′′)

In fact, due to the uniqueness result of Proposition 2.19, the long
algorithm applied to f_1(x) and f_2(x) = x^{δ′+1} will provide a fast way to
find the rational function ω(x)/σ_2(x). It turns out to be one of the most useful
tools in decoding. We have the following proposition.
Proposition 3.12 (Euclidean algorithm with stopping strategy;
Sugiyama–Kasahara–Hirasawa–Namekawa). Let us assume that
there are non-negative integers u, v and δ′ = 2v + u, and polynomials
ω, σ = σ_1σ_2. Let deg(ω(x)) ≤ δ′, deg(σ_1) = u and deg(σ_2) ≤ v, where
σ_1(x) is the known function of Definition 3.10. We use the notation
of Proposition 2.19. Let f_1(x) be defined above in equation (1′), let
f_2(x) = x^{δ′+1}, and let f_i(x), n_i be defined as in Proposition 2.16. Let (the
stopping time) t be determined as t = 3 if n_3 ≤ v + u; otherwise, t is
determined by n_{t−1} > v + u ≥ n_t. For the case t = 3, we have

α_3(x)f_1(x) + γ_3(x)f_2(x) = 1·f_1(x) + 0·f_2(x) = f_3(x).

Otherwise (for t > 3), in the following equation:

α_t(x)f_1(x) + γ_t(x)f_2(x) = f_t(x),    (1)

we have

ω(x)/σ_2(x) = f_t(x)/α_t(x).

Furthermore, we have σ_2(x) = h(0)^{−1}α_t(x), where α_t(x) = x^s h(x) with
h(0) ≠ 0, and ω(x) = h(0)^{−1}f_t(x).

Proof. We would like to factor out α_t(x) from the above equation (1) and try
to get our conclusion directly. However, there are several technicalities. We
want to show that the operation of factoring can be performed in the power-series
ring F_q[[x]]. Note that we may assume 2v + u = δ′, and according
to Proposition 2.16, we have deg(α_t(x)) (≤ δ′ + 1 − n_{t−1}) ≤ v = n as in
Proposition 2.19, and deg f_t(x) = n_t ≤ u + v = m as in Proposition 2.36
in any case. Let α_t(x) = x^s h(x) with h(0) ≠ 0, so that h(x) is a unit in F_q[[x]].
Note that then

ord(α_t^{−1}) = −s

in F_q((x)), where s may be 0. Then, s ≤ v < 2v + u + 1 = δ′ + 1; therefore,
x^s is a factor of f_2(x), and it follows from the above equation (1) that x^s
is a factor of f_t(x). We may factor α_t(x) from equation (1) above and get
a power-series expression in F_q[[x]] for the left-hand side of equation (1).
We have

ord( f_t(x)/α_t(x) − f_1(x) ) = ord( γ_t(x)f_2(x)α_t^{−1} ) ≥ 2v + u + 1 − s
= (v + u) + (v − s) + 1.

It follows from the above equation (1′′) that we have

ord( ω(x)/σ_2(x) − f_1(x) ) ≥ 2v + u + 1.

It follows from Proposition 2.36(1) that

ω(x)/σ_2(x) = f_t(x)/α_t(x).

Since ω(x)/σ_2(x) is the reduced form of the rational function, and the condition
σ_2(0) = 1 makes the rational expression unique, our proposition follows
easily. 

From the above proposition, we easily find ω(x)/σ(x) = ω(x)/(σ_1(x)σ_2(x)), with σ_1(x)
known.

The following example will help us to understand the procedure.

Remark: The assumptions of the preceding proposition are satisfied if
there are at most u erasures and at most v errors. 

Example 3: Let us continue our study of the example in Section 3.1. It
is easy to see that the generator polynomial g(x) = 1 + x^3 + x^6 + x^9 + x^{12}.
With the number δ = 5, we expect to correct two errors. Let the code word
be c(x) = x + x^2 + x^4 + x^5 + x^7 + x^8 + x^{10} + x^{11} + x^{13} + x^{14} and the received
word be r(x) = x + x^2 + x^3 + x^4 + x^5 + x^6 + x^7 + x^8 + x^{10} + x^{11} + x^{13} + x^{14}.
We wish to recover the code word c(x) from the received word r(x). Let us
assume that there is no erasure, u = 0 (i.e., σ_1(x) = 1, σ(x) = σ_2(x)), and
v = 2. We take δ′ = 4.
We have the following computations:

r(γ) = γ^2,  r(γ^2) = γ + 1,
r(γ^3) = γ,  r(γ^4) = γ^2 + 1.

We shall start the long algorithm with (f_1, f_2) = (γ^2x + (γ+1)x^2 +
γx^3 + (γ^2+1)x^4, x^5). We have

1·f_1 + 0·f_2 = γ^2x + (γ+1)x^2 + γx^3 + (γ^2+1)x^4 = f_3,
f_2 + (1 + γ^7x)f_3 = x^5 + (1 + γ^7x)(γ^2x + (γ+1)x^2 + γx^3 + (γ^2+1)x^4)
= γ^2x + γ^{14}x^2 + γ^6x^3 = f_4,
f_3 + γ^2x·f_4 = γ^2x = f_5,

so that

(1 + γ^2x + γ^2(γ^3+γ+1)x^2)f_1 + γ^2x·f_2 = γ^2x = f_5.

We conclude that

ω(x)/σ(x) = γ^2x / (1 + γ^2x + (γ^2(γ^3+γ+1))x^2).

It means that we have

ω(x) = γ^2x,
σ(x) = 1 + γ^2x + (γ^2(γ^3+γ+1))x^2.

We find that the two roots of σ(x) are γ^{−3}, γ^{−6} (the simple way is computing
all σ(γ^{−i}), which takes at most 15 evaluations).

We have to find e_3, e_6. We evaluate ω(x)/(xσ′(x)) at x = γ^{−3}, γ^{−6}, and both
e_3, e_6 are 1. Therefore, we find that c(x) = r(x) − e(x) = x + x^2 + x^4 + x^5 +
x^7 + x^8 + x^{10} + x^{11} + x^{13} + x^{14} is a possible code word. A further test by
multiplying it with the check polynomial x^3 + 1 yields 0 mod (x^{15} + 1), or
c(x) = (x + x^2)(1 + x^3 + x^6 + x^9 + x^{12}) = (x + x^2)g(x). We conclude that c(x)
is a code word. We find that the original message is x + x^2. 
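The whole computation of Example 3 can be replayed mechanically. Below is a hedged sketch of ours in plain Python (using our own 4-bit-mask representation of F_16; nothing here is taken from a library). It computes the four syndromes, runs the long algorithm with the stopping rule of Proposition 3.12 (u = 0, v = 2, δ′ = 4), and reads off the error positions from the roots of the cofactor; since the code is binary, the error values are automatically 1.

REDUCE = 0b10011                         # F_16 = F_2[x]/(x^4 + x + 1)

def gmul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0b10000:
            a ^= REDUCE
        b >>= 1
    return r

def gpow(a, n):
    r = 1
    for _ in range(n):
        r = gmul(r, a)
    return r

def pdeg(f):
    return max((i for i, c in enumerate(f) if c), default=-1)

def pmul(f, g):                          # polynomials over F_16, low degree first
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] ^= gmul(a, b)
    return h

def pxor(f, g):
    n = max(len(f), len(g))
    return [(f[i] if i < len(f) else 0) ^ (g[i] if i < len(g) else 0)
            for i in range(n)]

def pdivmod(a, b):
    q, r = [0] * max(pdeg(a) - pdeg(b) + 1, 1), a[:]
    inv = gpow(b[pdeg(b)], 14)           # a^14 = a^{-1} in F_16*
    while pdeg(r) >= pdeg(b):
        d = pdeg(r) - pdeg(b)
        c = gmul(r[pdeg(r)], inv)
        q[d] ^= c
        r = pxor(r, pmul([0] * d + [c], b))
    return q, r

def peval(f, x):
    acc = 0
    for c in reversed(f):
        acc = gmul(acc, x) ^ c
    return acc

gamma = 0b0010
exps = {1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 13, 14}   # received word r(x) of Example 3
r = [1 if i in exps else 0 for i in range(15)]

f1 = [0] + [peval(r, gpow(gamma, j)) for j in range(1, 5)]   # syndromes r(gamma^j)
f2 = [0] * 5 + [1]                                           # x^{delta'+1} = x^5

fp, fc = f2, f1                          # remainders of the long algorithm
ap, ac = [0], [1]                        # cofactors of f1
while pdeg(fc) > 2:                      # stop once deg <= v + u = 2
    q, rem = pdivmod(fp, fc)
    fp, fc = fc, rem
    ap, ac = ac, pxor(ap, pmul(q, ac))

omega, sigma = fc, ac                    # omega/sigma_2, up to a common constant
positions = [i for i in range(15) if peval(sigma, gpow(gamma, 15 - i)) == 0]
print("error positions:", positions)     # -> [3, 6], i.e., e(x) = x^3 + x^6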

Exercises

(1) Show that a Hamming code is a cyclic code.


(2) Find a BCH code with minimal distance > δ.
(3) Write a computer program to decode the example in Section 3.2.
(4) Find an example of a received word with more than three errors in the
example in Section 3.2 which is decoded to a wrong code word.
(5) Find the probability that the received word r has more than two
errors for the example in Section 3.2. We assume that the probability
of error at any location is p.

3.3. Reed–Solomon Codes

The BCH codes introduced in the preceding sections are general with the
disadvantages of being complicated and hard to use. We discuss their simple
counterparts, the Reed–Solomon codes, in this section.
The reader is referred to the beginning of Chapter 1. There, we discussed
a possible way of coding using polynomial curves over the real numbers.
The process is as follows. Let [a_0, a_1, . . . , a_{k−1}] be the original message.
It determines a polynomial curve f(x) = Σ_{i=0}^{k−1} a_i x^i of degree at most
k − 1. Then, we send out [b_1, b_2, . . . , b_n] = [f(1), f(2), . . . , f(k), . . . , f(n)].
Assuming that there are at most (n−k)/2 errors, we may use the brute force of
checking all possibilities to decode the received message [b_1, b_2, . . . , b_n] as
follows. Knowing k, n, we may try all subsets of [b_1, b_2, . . . , b_n] of k + (n−k)/2
elements to see which one determines a polynomial curve of degree at most
k − 1. Once we find the curve, if it exists, we may use it to correct all errors.
The difficulty is the decoding process, since a brute-force way will be time-consuming
when n, k are large. Now, we shall change the coefficient field
from the real field R to a finite field K and rename it the Reed–Solomon
code (see the following definition). The important aspect of it is that there
is a fast way of decoding (as seen in the following).

Now, we shall use a finite field F_q in place of F_2, where q = 2^m. We
shall use the notation n = q − 1. Let γ be a primitive element of the field
F_q (i.e., the powers of γ exhaust all non-zero elements of F_q). We shall
consider the points {γ, γ^2, . . . , γ^n} instead of the {1, 2, . . . , n} of the real case.
Note that γ^n = 1.
We shall define a code. Let a = [a_0, a_1, . . . , a_{k−1}] be a message. We want
to make a code word [b_1, b_2, . . . , b_n]. Let P_k be the set of all polynomials of degree
less than k, where k < n, over F_q; then, P_k is a vector space of dimension
k over F_q. For any vector [a_0 a_1 . . . a_{k−1}] ∈ F_q^k, let us define f(x) = Σ a_i x^i,
and notice that f(x) ∈ P_k. Let the values at γ^i be b_i = f(γ^i) for i = 1, . . . , n
(note γ^n = 1). Thus, we define a one-to-one map from the message space
V = {a = [a_0 a_1 . . . a_{k−1}] : a_i ∈ F_q} to {f(x) = Σ a_i x^i ∈ P_k}, and then to
the code space C = {b = [b_1 b_2 ··· b_n] : b_i = f(γ^i), b ∈ F_q^n}. We have a code
which sends a message a = [a_0, a_1, . . . , a_{k−1}] to a code word b = [b_1, b_2, . . . , b_n]
in this way.

Proposition 3.13. Let us use the notations of the preceding paragraph.
The set {[b_1 b_2 . . . b_n] : there is a polynomial f(x) ∈ P_k with b_i = f(γ^i)} is
a k-dimensional subspace C of the vector space of all n-tuples.

Proof. Note that deg(f(x)) < k < n. Let us define a map π: P_k
→ F_q^n by π(f(x)) = [f(γ), f(γ^2), . . . , f(γ^n)]. Then clearly, π is a linear
transformation. The only thing we have to show is that π is a one-to-one
map, i.e., if f(γ^i) = 0 for i = 1, 2, . . . , n, then f = 0. Note that otherwise
x^n − 1 | f(x), and deg(f(x)) ≥ n if f ≠ 0. A contradiction. 

We have the following definition.

Definition 3.14. The code defined by C in the preceding proposition is
called a Reed–Solomon code [n, k] in the value form, or simply a Reed–Solomon
code [n, k]. 
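A minimal sketch of the evaluation map π (in plain Python; we use the prime field F_17 with primitive element 3 purely for illustration, rather than the fields F_{2^m} of the text):

P, g = 17, 3                  # q = 17, n = q - 1 = 16; 3 is a primitive element mod 17

def rs_encode(msg):
    """Send [a_0, ..., a_{k-1}] to [f(g^1), ..., f(g^n)] for f(x) = sum a_i x^i."""
    f = lambda x: sum(a * pow(x, i, P) for i, a in enumerate(msg)) % P
    return [f(pow(g, j, P)) for j in range(1, P)]

code_word = rs_encode([5, 1, 2])   # k = 3
print(code_word)

Since a non-zero polynomial of degree less than k has at most k − 1 roots, two distinct code words agree in at most k − 1 of the n positions, which is the content of the next proposition in miniature.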

Proposition 3.15. Let us fix a Reed–Solomon code [n, k] C. Let b, b* be
two distinct vectors in C. Then, d(b, b*) ≥ n − k + 1.

Proof. Let b = [b_1 b_2 ··· b_n] ≠ b* = [b*_1 b*_2 ··· b*_n] with b_i = f(γ^i), b*_i = f*(γ^i).
Since b ≠ b*, we have f(x) ≠ f*(x). Therefore, (f − f*)(x) has degree at most
k − 1 and at most k − 1 zeroes, i.e., at least n − k + 1 non-zeroes, among
{1, γ, γ^2, . . . , γ^{n−1}}. 

It follows from the remark after Proposition 1.25 that the Reed–Solomon
[n, k] code can correct up to (and including) (n−k)/2 errors.

Proposition 3.16. A Reed–Solomon code [n, k] is an [n, k, n − k + 1] code.

Proof. It follows from the preceding proposition that d ≥ n − k + 1. Let
f(x) = Π_{i=1}^{k−1} (x − γ^i). Then, it is clear that f(x) ∈ P_k and f(γ^i) = 0
for i = 1, . . . , k − 1. Let b_j = f(γ^j). We must have b_j ≠ 0 for j = k, . . . , n;
otherwise, its Hamming weight would be less than n − k + 1. Then, the Hamming
weight of [b_1 . . . b_n] is precisely n − k + 1. 

Proposition 3.17. A Reed–Solomon code in the value form is an MDS
code over F_q.

Proof. It follows from the definition of MDS codes in Section 1.7. 

Remark: Let us assume that q > 2, where q = 2^m. Although the Reed–Solomon
code in the value form is an MDS code over the field F_q, it is not
an MDS code over the bit field F_2. Note that if we consider a Reed–Solomon
code in the value form over the bit field, it will be an [mn, mk] code, with
minimal distance n − k + 1 ≤ (n − k)m + 1. Therefore, it is not an
MDS code over the bit field F_2 (cf. the definition of MDS codes in Section
2.4). Recall that according to Shannon's theorem, to get a sequence of good
codes, it is unlikely that we may have any restriction on the sizes of the blocks. Suppose
we keep the rate of information k/n = ℓ constant; then the rate of distance
((1 − ℓ + 1/n)n)/(nm) will go to zero as m → ∞, i.e., the correcting power
will tend to zero if we allow the size of the field q to get arbitrarily large.
Thus, the Reed–Solomon codes in the value form are not the ones forecasted
by Shannon's theorem. An additional drawback is that the multiplications
over a large field will become more time-consuming to carry out. 

Another way of defining a Reed–Solomon code is to imitate a BCH
code as follows. Note that over F_q, all irreducible polynomials of the {γ^j} are
of degree 1.

Definition 3.18. A Reed–Solomon code [n, k] in the coefficient
form, or simply a Reed–Solomon* [n, k] code, is a primitive BCH code
with length n = q − 1. The generator polynomial is of the form g(x) =
Π_{i=1}^{d−1} (x − γ^i), where d = n − k + 1. Any message word is a polynomial
h(x) of degree less than k = n − d + 1, and its code word is given by the coefficients
of the polynomial c(x) = h(x)·g(x). 

Example 4: These two codes, the Reed–Solomon code and the Reed–Solomon*
code, are seemingly different. However, the results may be the
same. For instance, let us consider the case k = 1 (d = n): the code
word for f(x) = 1 in a Reed–Solomon code is [1 1 ··· 1], and the code
word for h(x) = 1 in a Reed–Solomon* code is [α_1 . . . α_n], where the α_i are given
by the coefficients in the expansion of 1 × Π_{i=1}^{n−1} (x − γ^i) = 1 + 1·x +
1·x^2 + ··· + 1·x^{n−1}. 

We note that the dimension of a Reed–Solomon* [n, k] code is
n − (d − 1) = k = the dimension of the Reed–Solomon code [n, k]. In fact, we have
the following proposition.

Proposition 3.19. Let C be the code space of a Reed–Solomon code [n, k]
and C* be the code space of the corresponding Reed–Solomon* [n, k] code.
Then, C = C*.

Proof. We wish to show that C ⊂ C*. Since they are of the same
dimension, they must then be equal. Let us consider the basis
{x^j : 0 ≤ j ≤ k − 1} of the message space of C. The
code word of x^j in C is [γ^j γ^{2j} . . . γ^{nj}]. Let us interpret it as the coefficients
of the polynomial c(x) = Σ_{i=1}^n γ^{ji} x^i = Σ_{i=1}^n (γ^j x)^i (recall that x^n = 1).
It is easy to see the following identity:

c(x/γ^j) = x (x^n − 1)/(x − 1) = x Π_{i=1}^{n−1} (x − γ^i).

Replacing x by xγ^j in the above equation, note that

γ^{jn} x^n = x^n,
γ^j x − 1 = γ^j (x − γ^{n−j}).

Certainly, n − j ≥ n − k + 1 = d, and recalling that the generator polynomial is
g(x) = Π_{i=1}^{d−1} (x − γ^i), we conclude

c(x) = c(γ^j x / γ^j) = γ^j x (x^n − 1)/(γ^j x − 1) = x (x^n − 1)/(x − γ^{n−j})

and

c(x) = x Π_{0≤i≤n−1, i≠n−j} (x − γ^i) = g(x)h(x).

Therefore, c(x) ∈ (g(x)), and the coefficients of c(x), which correspond to
x^j, give an element of C*. Since this is true for any x^j, we conclude that
C ⊂ C*. Our proposition is established. 

As illustrated by the preceding proposition, the two different forms of
Reed–Solomon codes are equivalent and will both be called the Reed–Solomon
code. The Reed–Solomon code will later be generalized to a geometric
Goppa code (a type of algebraic-geometric code) (see Example 1 in
Chapter 5). The Reed–Solomon* code can be decoded as a BCH
code.

Example 5: Let us consider the following Reed–Solomon* code over F_{2^4}
with d = 9. Then, we have q = 16, n = 16 − 1 = 15, k = 7. The generator
polynomial g(x) is given by

g(x) = Π_{i=1}^{8} (x − γ^i),

where γ satisfies the equation x^4 + x + 1 = 0 and is given in Example 2 of
Section 3.1. According to the theory of Reed–Solomon codes, the code may
correct up to 4 errors. So we assume that v = 4, and there is no erasure,
u = 0.
Let us consider a received message r(x) = 1 + γ^6 + γx + γ^3x^2 + γ^7x^3 + γ^8x^4
+ γ^3x^5 + x^6 + γ^3x^7 + γx^8 + γ^2x^9 + γ^3x^{10} + γ^3x^{11} + γx^{12} + γ^2x^{13} + γ^{14}x^{14}.
We compute r(γ^i) as follows:

r(γ) = γ^3,  r(γ^2) = γ,
r(γ^3) = γ^7,  r(γ^4) = γ^7,
r(γ^5) = γ,  r(γ^6) = 0,
r(γ^7) = γ^9,  r(γ^8) = 0.

We start the long algorithm with f_1 = Σ_{j=1}^{8} r(γ^j)x^j = γ^3x + γx^2 +
γ^7x^3 + γ^7x^4 + γx^5 + γ^9x^7 and f_2 = x^9 as follows:

1·f_1 + 0·f_2 = f_3,
f_2 + (γ^{13} + γ^6x^2)f_3 = γx + γ^{14}x^2 + γ^6x^3 + γ^{13}x^4 + γ^2x^5 + γ^{13}x^6 = f_4,
f_3 + (1 + γ^{11}x)f_4 = γ^9x + γ^2x^2 + γx^4 + γ^6x^5 = f_5,
f_4 + (γ^9 + γ^7x)f_5 = γ^9x + γ^8x^2 + γ^5x^3 + γ^9x^4 = f_6.

Notice that deg(f_5) = 5 > v + u = 4 ≥ deg(f_6); then we shall stop. We
substitute back and have the following equation:

((γ^9 + γ^7x) + (γ^{13} + γ^6x^2) + (γ^9 + γ^7x)(1 + γ^{11}x)(γ^{13} + γ^6x^2))f_1
+ (1 + (γ^9 + γ^7x)(1 + γ^{11}x))f_2
= f_6 = γ^9x + γ^8x^2 + γ^5x^3 + γ^9x^4,

with

σ(x) = α_u(x) = (γ^9 + γ^7x) + (γ^{13} + γ^6x^2)
+ (γ^9 + γ^7x)(1 + γ^{11}x)(γ^{13} + γ^6x^2),
ω(x) = f_u(x) = γ^9x + γ^8x^2 + γ^5x^3 + γ^9x^4.

It is easy to see that σ(γ^{−5}) = σ(γ^{−7}) = σ(γ^{−10}) = σ(γ^{−11}) = 0.
Therefore, the error locations are 5, 7, 10, 11. Furthermore, we use the
formula −ω(γ^{−i})γ^i/σ′(γ^{−i}) = e_i for the error locations i. We find that e_5 = 1,
e_7 = γ^2, e_{10} = 1, e_{11} = γ^2 + 1. We conclude that the error polynomial e(x)
and the code polynomial c(x) are

e(x) = x^5 + γ^2x^7 + x^{10} + (γ^2 + 1)x^{11},
c(x) = r(x) − e(x) = 1 + γ^6 + γx + γ^3x^2 + γ^7x^3 + γ^8x^4
+ (γ^3 + 1)x^5 + x^6 + (γ^2 + γ^3)x^7 + γx^8 + γ^2x^9
+ (1 + γ^3)x^{10} + (γ^3 + γ^2 + 1)x^{11} + γx^{12} + γ^2x^{13} + γ^{14}x^{14}.

The corrected c(x) is a code polynomial, which can be checked by
multiplying with the following check polynomial:

Π_{i=9}^{15} (x − γ^i),

and showing the result is a multiple of x^{15} + 1 as follows:

((1 + γ + γ^3) + γ^3x + (γ + γ^2)x^2 + γ^3x^3
+ (1 + γ^2 + γ^3)x^4 + (1 + γ^3)x^5 + (1 + γ^3)x^6)(x^{15} + 1).

Therefore, the original message is

(1 + γ + γ^3) + γ^3x + (γ + γ^2)x^2 + γ^3x^3 + (1 + γ^2 + γ^3)x^4
+ (1 + γ^3)x^5 + (1 + γ^3)x^6. 

3.4. Classical Goppa Codes

Instead of using polynomials, we may use rational functions. We study
a class of codes, the classical Goppa² codes, which are generalizations of
BCH codes. It will be clear from the following definitions that most classical
Goppa codes are not cyclic codes. In Chapter 4, we show that both classical
Goppa codes (see Chapter 5, Example 3) and Reed–Solomon codes (see
Chapter 5, Example 1) can be generalized to geometric Goppa codes.
Let g(x) be any non-constant polynomial in F_q[x]. We shall consider the
ring F_q[x]. Let γ ∈ F_q be such that g(γ) ≠ 0. By the Euclidean algorithm,
there exist α(x) and r ≠ 0 such that

g(x) = α(x)(x − γ) + r.

It means that we have g(γ) = r ≠ 0, α(x) = (g(x) − g(γ))/(x − γ), and

(−r)^{−1}α(x)(x − γ) = 1 mod g(x).

Namely, (x − γ) has an inverse (−r)^{−1}α(x) mod g(x). We say that (x − γ)
is regular in F_q[x]/(g(x)). It is easy to see that in the polynomial ring
F_q[x], we have

(x − γ) · ( (−1/g(γ)) (g(x) − g(γ))/(x − γ) ) ≡ 1 mod g(x).

Therefore, in the total quotient ring of F_q[x]/(g(x)), we have

1/(x − γ) = (−1/g(γ)) (g(x) − g(γ))/(x − γ),

or we may write

1/(x − γ) ≡ (−1/g(γ)) (g(x) − g(γ))/(x − γ) mod g(x).    (1)

We have the following definition.

Definition 3.20. Let the γ_i be all distinct and G = {γ_1, . . . , γ_n} ⊂ F_q.
We define the (classical) Goppa code Γ(G, g(x)) with Goppa polynomial
g(x), where g(γ_i) ≠ 0 for 1 ≤ i ≤ n, to be the set of code words
c = [c_1, . . . , c_n] over the letter field F_q for which

Σ_{i=1}^n c_i/(x − γ_i) ≡ 0 mod (g(x)). 

² Valery Goppa (1939–), Soviet and Russian mathematician.
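The membership condition of Definition 3.20 is easy to test mechanically using equation (1). The sketch below is in plain Python; the choices F_11, g(x) = x^2 + 1, G = {0, 1, . . . , 7}, and the sample words are all hypothetical choices of ours. It computes Σ c_i/(x − γ_i) mod g(x) and exhibits a code word of weight 3 = deg g(x) + 1, matching the distance bound of Proposition 3.21 below.

P = 11
g = [1, 0, 1]                            # g(x) = x^2 + 1, which has no roots in F_11
G = list(range(8))                       # gamma_1, ..., gamma_8

def quot_and_value(gamma):
    """Write g(x) = (x - gamma) q(x) + g(gamma) by Horner's scheme."""
    acc, steps = 0, []
    for a in reversed(g):                # high degree first
        acc = (acc * gamma + a) % P
        steps.append(acc)
    value = steps.pop()                  # the final accumulator is g(gamma)
    return list(reversed(steps)), value  # q(x) low degree first, and g(gamma)

def syndrome(c):
    """sum_i c_i (x - gamma_i)^{-1} mod g(x), using equation (1)."""
    S = [0] * (len(g) - 1)
    for ci, gamma in zip(c, G):
        q, val = quot_and_value(gamma)
        coef = (-ci * pow(val, P - 2, P)) % P    # -c_i / g(gamma_i)
        for j, qj in enumerate(q):
            S[j] = (S[j] + coef * qj) % P
    return S

print(syndrome([1, 0, 0, 0, 0, 0, 0, 0]))   # nonzero: a weight-1 word is not a code word
print(syndrome([9, 8, 1, 0, 0, 0, 0, 0]))   # -> [0, 0]: a code word of weight 3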

The classical Goppa code will later be generalized to a geometric
Goppa code (a type of algebraic-geometric code) (see Example 3 in
Chapter 5).
The following example shows that the BCH codes are special cases
of classical Goppa codes.

Example 6: Let the Goppa polynomial g(x) be x^{d−1}, and G = {γ^{−i} : 1 ≤
i ≤ n−1}, where γ is a primitive (n−1)th root of unity in F_q. Let us consider
the resulting classical Goppa code Γ(G, g). Note that g(γ^{−i}) = (γ^{−i})^{d−1}.
Using equation (1) above, we have the following equations, all mod (g(x)):

Σ_{i=1}^{n−1} c_i/(x − γ^{−i}) ≡ 0
⇔ Σ_{i=1}^{n−1} (−c_i)/(γ^{−i})^{d−1} · (x^{d−1} − (γ^{−i})^{d−1})/(x − γ^{−i}) ≡ 0
⇔ Σ_{i=1}^{n−1} (−c_i)/(γ^{−i})^{d−1} Σ_{k=0}^{d−2} (γ^{−i})^{d−2−k} x^k = 0
⇔ Σ_{i=1}^{n−1} (−c_i) Σ_{k=0}^{d−2} γ^{i(k+1)} x^k = 0
⇔ Σ_{k=0}^{d−2} ( Σ_{i=1}^{n−1} (−c_i)γ^{i(k+1)} ) x^k = 0.

Since this is a polynomial of degree less than d − 1 = deg g(x), we must have
all coefficients zero, i.e.,

Σ_{i=1}^{n−1} (−c_i)γ^{i(k+1)} = 0, for k = 0, . . . , d − 2.

Let c(x) = Σ_{i=1}^{n−1} (−c_i)x^i. Then, we have c(γ^{k+1}) = 0 for
k + 1 = 1, . . . , d − 1. We conclude that the last condition is precisely the
condition for a BCH code. 
condition for a BCH code. 

It is clear that a classical Goppa code is linear. It is important to find
the other parameters. We have the following proposition.

Proposition 3.21. Let g(x) be a polynomial of degree s. Then the classical
Goppa code Γ(G, g) has rank n − s (i.e., the rank is the dimension of the
code space) and minimum distance ≥ s + 1.

Proof. Let us follow the definition of classical Goppa codes. From the
equation

Σ_{i=1}^n c_i/(x − γ_i) ≡ 0 mod (g(x)),

we see that the left-hand side can be written as n(x)/d(x), with d(x) =
Π(x − γ_i) and deg(n(x)) < n. Since (d(x), g(x)) = 1, c = [c_1, . . . , c_n]
is a code word ⇔ g(x) | n(x), or n(x) = g(x)h(x), where h(x) is of degree <
n − s. Since the classical Goppa code words are parameterized by the coefficients
of h(x), the rank is n − s. On the other hand, if c = [c_1, . . . , c_n] is a code
word with at most s of the c_i non-zero, then

Σ_{i=1}^n c_i/(x − γ_i) ≡ 0 mod (g(x))

can be rewritten as

Σ_{i=1}^n c_i/(x − γ_i) = Σ_{i∈I} c_i/(x − γ_i),

where I is an index set with at most s elements. It can be written as
n(x)/d(x) with deg(n(x)) < s and g(x) | n(x). Therefore, n(x) = 0. Due
to the linear-independence property of the 1/(x − γ_i), all c_i, i ∈ I, must be zero.
Hence, we conclude that the minimum distance is ≥ s + 1. 
The next proposition shows that the classical Goppa codes satisfy the
Gilbert–Varshamov bound.

Proposition 3.22. For n = q^m, let G_n = F_{q^m}. Then, there is a sequence of
classical Goppa codes Γ(G_n, g_n(x)) for some monic irreducible polynomials
g_n(x) such that lim_{n→∞} R(Γ(G_n, g_n(x))) ≥ 1 − H_q(δ), where R(M) is the
rate of information of the code M and δ is any number with 0 ≤ δ ≤
(q−1)/q. It means that the sequence of classical Goppa codes meets the Gilbert–Varshamov
bound.

Proof. Given δ with 0 ≤ δ ≤ (q−1)/q, let d_n < n be a sequence of integers
such that

lim_{n→∞} d_n/n = δ.

Given any positive integer m, let tn be a sequence of integers such that


n
tn ≤ m . We shall find an irreducible polynomial gn (x) of degree tn such
that the minimal Hamming distance of Γ(Gn , gn (x)) is at least dn . For
this purpose, all code words with Hamming weight less than dn should be
excluded from Γ(Gn , gn (x)). Let (c1 , . . . , cn ) be such a code word, i.e.,
n
c1
≡0 mod (gn (x)).
i=1
(x − γi )
c(x)
We may write the above rational function as h(x) , and the denominator
h(x) is co-prime to gn (x), which means gn (x) | c(x). Let us consider a code
word [c1 , . . . , cn ] of Hamming weight j; it means that deg c(x) ≤ (j − 1),
and thus, c(x) has at most j−1 tn irreducible factors of degree tn . The total
number of code words of Hamming weight j is Cnj (q − 1)j . Therefore, the
total number of irreducible polynomials which should be excluded is, by
Proposition 2.30,
dn −1  
j−1 dn
Cnj (q − 1)j < Uq (n, dn − 1).
j=1
t n tn

Let us count the total number of irreducible polynomials of degree tn . By


Proposition 2.30, with tn the composite number,
1 mtn
Itn > q (1 − q −mtn /2+1 ).
tn
To prove our proposition, it suffices to show that asymptotically, we may
find dn , tn with
dn 1
Uq (n, dn − 1) < q mtn (1 − q −mtn /2+1 ).
tn tn
Let us take a logarithm with base q and divide by n. Furthermore, let
m→  ∞ , (n → ∞ ) and dnn →  δ (note that then dnn−1 →  δ). It follows from
Proposition 2.30 that we are looking for
mtn
Hq (δ) ≤ lim .
n→∞ n
We may select tn to make the above an equality. Therefore, we show the
existence of the required irreducible polynomial gn (x). Further, note that it
follows from Proposition 3.21 that the information rate, R(Γ(Gn , gn (x))),
of the classical Goppa code Γ(Gn , gn (x)) is at least 1 − mtn
n , and we prove
that
lim R(Γ(Gn , gn (x)) ≥ 1 − Hq (δ).
n→∞ 

3.4.1. Decoding Classical Goppa Codes

Our process of decoding classical Goppa codes is similar to the decoding
process of a BCH code (see Section 3.2). Let g(x) be a polynomial of degree
2t. Then, the classical Goppa code Γ(G, g) has minimal distance ≥ 2t + 1.
We expect to correct t errors. The process of decoding is as follows.
Let r = [r_1, . . . , r_n] be the received word. Define the syndrome, S_r(x), as

S_r(x) = Σ_{i=1}^n r_i/(x − γ_i) mod (g(x)).

If the syndrome S_r(x) = 0, then r is a code word. If the syndrome S_r(x) ≠ 0,
then we apply the following process to decode. Note that (x − γ_i) and g(x)
are co-prime. Therefore, it follows from equation (1) before Definition 3.20
that

1/(x − γ_i) ≡ h_i(x) mod (g(x)),

where h_i(x) is a polynomial of degree less than 2t. Then, the syndrome
S_r(x) can be expressed as a polynomial h(x) which is equivalent to S_r(x)
mod (g(x)) and of degree less than 2t. The important fact is that S_r(x),
and hence h(x), is computable and known. Furthermore, let r = c + e, where
c is a code word and e is the error word, and let M = {i : e_i ≠ 0} be the set of
error locations of r, with cardinality(M) ≤ t. Then, S_c(x) = 0 and

S_r(x) ≡ S_e(x) = Σ_{i∈M} e_i/(x − γ_i) = Σ_{i∈M} e_i h_i(x) ≡ h(x) mod (g(x)).

As in decoding a BCH code, let α_r(x) = Π_{i∈M} (x − γ_i) be the error
locator of r, and let ω_r(x) = Σ_{i∈M} e_i Π_{j∈M\i} (x − γ_j), with degree < t, be the
error evaluator. Then, we have the following key equation:

S_e(x)α_r(x) = Σ_{i∈M} (e_i/(x − γ_i)) Π_{j∈M} (x − γ_j) = Σ_{i∈M} e_i Π_{j∈M\i} (x − γ_j),

so that

α_r(x)h(x) ≡ ω_r(x) mod (g(x)).

The decoding amounts to finding α_r(x) and ω_r(x). Since S_r(x) is equivalent to
h(x) mod (g(x)), we conclude that ω_r(x) is in the ideal generated by h(x)
and g(x). We use the long algorithm to find ω_r(x) and α_r(x) in a routine
way as follows. Let f_1(x) = h(x) and f_2(x) = g(x). Now, note that
deg(h(x)) < 2t = deg(g(x)). We have

1·f_1(x) + 0·f_2(x) = f_3(x),
......
α_u(x)f_1(x) + γ_u(x)f_2(x) = f_u(x),
......

where u is the first time that n_u = deg(f_u(x)) < t, i.e., n_{u−1} =
deg(f_{u−1}(x)) ≥ t. We have by Proposition 2.19 that deg(α_u(x)) (≤ 2t −
n_{u−1}) ≤ t. Combining the above equations, we have

α_r(x)(α_u(x)f_1(x)) ≡ α_u(x)ω_r(x) ≡ α_r(x)f_u(x) mod (g(x)).

Since the polynomials α_u(x)ω_r(x) and α_r(x)f_u(x) are of degrees
< 2t = deg(g(x)), we conclude

α_u(x)ω_r(x) = α_r(x)f_u(x),
f_u(x)/α_u(x) = ω_r(x)/α_r(x).

Now, it is a simple matter to finish the decoding process in the following
way: (1) Throw away the G.C.D. of α_u(x) and f_u(x), and call the resulting
polynomials α*_u(x), f*_u(x). Furthermore, we make α*_u(x) monic and adjust
f*_u(x) without changing the fraction. We shall still call them α*_u(x), f*_u(x);
then we have α_r(x) = α*_u(x) and ω_r(x) = f*_u(x). (2) Find all roots of α_r(x);
this gives us M. Let γ_i be a root of α_r(x). Compute ω_r(γ_i). Then, we have
e_i Π_{j∈M\i} (γ_i − γ_j), and hence e_i. We correct the word by taking c = r − e
and check c to see if it is a code word. If it is, then we are done. If not, then
there are more than t errors, and we return an error message.

Exercises

(1) Decode Reed–Solomon [15, 5, 11] code.


(2) Decode the classical Goppa code in the example of Section 3.4.
(3) Write a computer program to decode Reed–Solomon [15,7,9] code.
(4) We have a document to be preserved. The document is of length 70,000
letters (including all blanks, punctuation, etc.). The decay rate is
1% per ten years, and if 30% of the letters are ruined,
then the whole document is unreadable. How long will the document
last if it is written plainly, or if it is written with the Reed–Solomon code
[15,7,9]?
PART III

Algebraic Geometry
Chapter 4

Algebraic Geometry

In the last chapter, we discussed the Reed–Solomon codes, which are used
to evaluate a polynomial of degree at most k − 1 at n (≥ k) points in the
ring of polynomials F_q[x], where F_q is a field with q elements. We see
that it is equivalent to evaluate L(D) on P^1_{F_q} (see Example 1 of Section
5.1) with D = nP_∞ (see Section 4.3). Similarly, a classical Goppa code can
be considered as a code over P^1_{F_q}. We may extend the concept of Reed–Solomon
codes and classical Goppa codes to codes over any projective
smooth curve (instead of lines only). The Riemann–Roch theorem induces a
richer algebraic structure, and the corresponding codes will be more useful.
Now we advance to the next level of coding theory, in which we focus on
geometric Goppa codes. Our attention on the rings F_q[x] will be refocused
on algebraic functions of one variable, i.e., function rings of curves. We need
some knowledge of algebraic geometry, a rich and beautiful subject. It has
been there for two thousand years. Great minds of the past have rebuilt
it again and again. Some guiding lights from the past will be treated as
simple corollaries of theorems, and the foundations of algebraic geometry
might come last.
We will not try to give a comprehensive description of algebraic geometry
in this book. Instead, we emphasize the useful curve theory, especially the
Riemann–Roch theorem and Weil's theorem on the zeta function, which
gives us a count of the number of rational points on a smooth curve over
a finite field F_q, and simply discuss it to the extent that the reader will be
able to bear the burden with us. Sometimes, for the readers' understanding
of the subject, we have to give the general picture of algebraic geometry,
which may not be relevant to the study of coding theory. Most proofs will be

deferred to the standard books at hand, such as Commutative Algebra [16]
by Zariski and Samuel, Introduction to the Theory of Algebraic Functions
of One Variable [10] by Chevalley, Introduction to Algebraic Geometry [13]
by Mumford, and Algebraic Geometry [26] by Hartshorne. We give some
examples to illustrate the points.
Usually, we consider an algebraically closed ground field in algebraic
geometry, but in the context of coding theory, we have to discuss the case
where the ground field is finite, and hence, we have to consider a field which is
not algebraically closed. In this survey chapter, we assume that the ground
field is either algebraically closed (which includes the complex field C), the
real field R, or a finite field F_q, if not stated explicitly otherwise.

4.1. Affine Spaces and Projective Spaces

In plane Euclidean geometry over the real axis, we discuss quadratic curves:
ellipses, parabolas, hyperbolas. That is the beginning of algebraic geometry.
In the general setup which we intend to discuss, we: (1) replace the real
number axis by an axis over any field, (2) replace the word plane (i.e.,
two-dimensional) by any finite-dimensional affine or projective space, (3)
replace a single quadratic equation by any system of polynomial equations
(of any finite degrees).
The point (1) above is natural. The point (2) of increasing the dimension
is natural too. Let us consider the extension to projective spaces. The usual
point set {[a_1, . . . , a_n] : a_i ∈ K} (without the usual vector space structure)
is called an n-dimensional affine space A^n(K) over the field K.
is called an n-dimensional affine space An (K) over the field K.
It is well known that two distinct parallel lines in an affine plane never
meet; intuitively, they meet only at infinity. If we add the points at
infinity, then any two distinct lines will meet, either in the affine plane or
at infinity. Following the notations of Zariski and Samuel [16] (Vol. II,
p. 127), we have the algebro-geometric concept of completeness, which is
unrelated to the topological concept of completeness (the latter is defined
using Cauchy sequences, which in general do not exist for an abstract
field K). In algebraic geometry, we define an algebraic set to be complete
if every valuation of its quotient field has a center in the set. For instance,
an affine space is not complete because the valuation defined by the
subspace at infinity has no center in the affine space.1

1 Also see Mumford [13] Section 9, Chapter 1.



The set of all points in the affine space A^n(K) plus the points at infinity
is called the n-dimensional projective space P^n(K). It can be shown that
the projective space P^n(K) is complete. Rigorously, we define an equiva-
lence relation ≈ on the punctured (n + 1)-dimensional affine space
A^{n+1}(K)\{0} = {(a_0, a_1, . . . , a_n) : not all a_i = 0} as follows:

(a_0, a_1, . . . , a_n) ≈ (b_0, b_1, . . . , b_n) ⇔ a_i = t b_i for all i, for some common t ≠ 0.

We define the n-dimensional projective space P^n(K) as (A^{n+1}(K)\{0})/≈.
There is a natural embedding of A^n(K) into P^n(K) sending (a_1, . . . , a_n)
to (1, a_1, . . . , a_n). Let us further explain this point of view.
It is easy to see that P^0(K) is just a single point, {a_0 : a_0 ≠ 0}/≈.
For P^1(K), all points can be grouped into two sets: {(1, a_1) : a_1 ∈
K} ∪ ({(0, a_1) : a_1 ≠ 0}/≈) = A^1(K) ∪ P^0(K).
Let us consider the case of the real field R. The projective line P^1(R)
can be represented by the set of all lines in the plane passing through the
origin. Let such a line and the horizontal line span an angle θ. As long as
θ ≠ 90°, the line will intersect the vertical line x = 1 at the point (1, tan θ);
therefore, the lines passing through (0, 0) can be represented by the points
of the vertical line x = 1 together with one extra point P^0(K), the point
at ∞, corresponding to θ = 90°. Therefore, A^1(R) is the real line, and
P^1(R) is a circle. If the ground field is the complex field C, then A^1(C)
is the complex line, which is a real plane, and P^1(C) is the one-point
compactification of the real plane, i.e., a sphere.
In general, for any field K, we have

P^n(K) = A^n(K) ∪ P^{n−1}(K).
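For a finite field F_q, this recursion gives |P^n(F_q)| = q^n + q^{n−1} +
· · · + q + 1 = (q^{n+1} − 1)/(q − 1). The following minimal Python sketch
(an illustration, not part of the original text; it assumes q is a prime p,
so that F_p is just arithmetic mod p) checks the count by picking one
canonical representative per equivalence class:

    from itertools import product

    def projective_points(p, n):
        # One canonical representative per class of P^n(F_p), p prime:
        # scale so that the first non-zero coordinate equals 1.
        pts = []
        for v in product(range(p), repeat=n + 1):
            if any(v) and next(a for a in v if a != 0) == 1:
                pts.append(v)
        return pts

    for p, n in [(2, 1), (3, 2), (5, 2)]:
        expected = sum(p**i for i in range(n + 1))  # p^n + ... + p + 1
        assert len(projective_points(p, n)) == expected
        print(p, n, expected)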

In the case that the ground field is the real field R, we have P^2(R) =
A^2(R) ∪ P^1(R), i.e., an affine plane with a circle attached at infinity.
Topologically, we may view P^2(R) as a closed unit disc (the open disc
is homeomorphic to A^2(R)) with antipodal boundary points identified
(the boundary is homeomorphic to a circle; note that a pair of antipodal
points determines a line through the origin). The interior of the disc is
identified with A^2(R) by v → v/(1 − |v|), where v is a vector inside the
unit disc and |v| is the length of v. The real projective plane is a
non-orientable surface.
From the point of view of algebraic geometry, the affine spaces are not
complete2 (in the sense of valuations) over any field K, while the projective
spaces are.
We define a linear subspace L in P^n_K as the image of U\{0}, where U
is a positive-dimensional subspace of A^{n+1}_K treated as a vector space.
We define dim(L) = dim(U) − 1.
We have the following well-known theorem about vector spaces:

Proposition 4.1. Let U_1, U_2 be two subspaces of a vector space K^{n+1}.
Then, we have

dim(U_1 ∩ U_2) ≥ dim(U_1) + dim(U_2) − (n + 1). □

Proposition 4.2. Let L_1, L_2 be two linear subspaces of P^n_K. Then, we
have

dim(L_1 ∩ L_2) ≥ dim(L_1) + dim(L_2) − n. □

Corollary 4.3. Let L_1, L_2 be two linear subspaces of P^2_K with
dim(L_1) = dim(L_2) = 1. Then, the intersection of L_1, L_2 is not
empty. □

Even over a finite field, both A^n_K and P^n_K are finite sets. However,
the distinction pointed out by the preceding corollary is obvious. We say
that projective spaces are complete.3 The above corollary will be
generalized to Bézout's theorem (see Proposition 4.22).

2 In the notation of Zariski and Samuel [16].
3 In the sense of Zariski and Samuel [16].

4.2. Affine Algebraic Varieties

Although we only need curve theory for coding theory, we shall enjoy the
beautiful theory of algebraic geometry. Let us speak further about A^n_K.
There are two kinds of objects involved: (1) the algebraic objects of the
affine space A^n_K, namely the polynomial ring K[x_1, . . . , x_n] (over the
projective space P^n_K, the homogeneous polynomial ring K[x_0, x_1, . . . , x_n]_h
= {all homogeneous polynomials}) and the systems of equations there;
(2) the geometric objects of the affine space A^n_K, which are the solution
sets X(I) of ideals I of equations. Similarly, we have projective geometric
objects, i.e., the solution sets X(Ī) of homogeneous ideals of equations.
The relation between these two kinds of objects may be complicated in
general. Primarily, we take the point of view of algebra. We shall study
affine varieties first.
We have the following well-known Hilbert basis theorem.
We have the following well-known Hilbert basis theorem.

Proposition 4.4 (Hilbert basis theorem). The ideal generated by any
system of polynomials in the polynomial ring K[x_1, . . . , x_n] can be
generated by finitely many polynomials.

Proof. Zariski and Samuel Vol. I [16]. 

Note that X(I) may be empty even in the simplest case that I = (f),
where f(x_1, . . . , x_n) is a non-zero constant. Look at the following examples.

Example 1: In general, there is no one-to-one relation between the
algebraic objects and the geometric objects. Let K = R, the real field, and
f(x_1, . . . , x_n) = x_1^2 + 1; then there is no solution. Similarly, the
equation 1 = 0 has no solution, while it is easy to see that the ideals
(f) ≠ (1). Therefore, different ideals may produce the same (here, empty)
solution set, and we cannot always tell the ideals apart by looking at the
solution sets. However, if we extend the field R to the complex field C,
then f has the two solutions x_1 = i, −i, while the equation 1 = 0 still has
no solution. □
Example 2: It is very important to know the number of solutions of
equations for the construction of algebraic-geometric coding. This is a
non-trivial problem. We want to indicate that sometimes there is no solution


for a non-trivial equation. Let F_q be a finite field with q elements, where
q = p^m, and let

x_0^{q−1} + x_1^{q−1} + · · · + x_n^{q−1} = 0,

where 1 < n + 1 < p, be a homogeneous equation. Since x_i^{q−1} = 1 for
any x_i ≠ 0, the left-hand side at any non-zero point equals the number of
non-zero coordinates, which lies strictly between 0 and p and hence is not
0 in F_q. Then, in P^n_{F_q}, there is no solution. However, over a suitable
finite field extension of F_q, there will be solutions. □
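The following minimal brute-force check (a sketch, not part of the original
text; it assumes m = 1, so that q = p is prime and F_p is arithmetic mod p)
confirms this behavior for small cases:

    from itertools import product

    def has_projective_solution(p, n):
        # Does x_0^(p-1) + ... + x_n^(p-1) = 0 have a non-zero solution
        # over F_p?  (Scaling does not matter, so we test affine vectors.)
        return any(sum(pow(a, p - 1, p) for a in v) % p == 0
                   for v in product(range(p), repeat=n + 1) if any(v))

    print(has_projective_solution(5, 2))  # False: 1 < n + 1 < p
    print(has_projective_solution(5, 4))  # True: n + 1 = p, e.g. (1,1,1,1,1)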
We want to use algebra to study geometric objects. It would be better to
establish a one-to-one correspondence between the algebraic objects and
the geometric objects. Then, we have to go to some suitable extension of
the ground field K. If we consider all algebraic objects, such as rings and
ideals, it might be simplest to consider the ultimate algebraic field exten-
sion, the algebraic closure of K. In algebraic geometry, we usually assume
that the field K is algebraically closed from the very beginning. However,
in coding theory, we only consider finite fields for the letter fields; hence,
we may not consider algebraically closed fields only. Furthermore, if we
only consider a fixed ideal I, we may be satisfied with some finite field
extension of K. The advantage is that a finite extension of a finite field
stays finite. Therefore, we may stay with finite, non-algebraically closed
fields. There are many properties of finite fields we can use.
Even though we mainly consider finite fields, we may study the cases
where the ground field is algebraically closed. Let us see what happens if
we replace the field K by its algebraic closure K̄ or simply assume that K
is algebraically closed. Then, we have the following well-known theorem.
We need the following preliminary definition.
We need the following preliminary definition.

Definition 4.5. Let I be an ideal of a given ring R. The radical of I,
Rad I, is defined to be

Rad I = {f : f^m ∈ I for some positive integer m};

note that I ⊆ Rad I. We define a radical ideal to be an ideal I which
equals Rad I. □

We have the following important proposition.


Proposition 4.6 (Hilbert's nullstellensatz). Let K be an algebraically
closed field. Let I be an ideal of K[x_1, . . . , x_n] and S any subset of A^n_K.
Let

X(I) = {(a_1, . . . , a_n) ∈ K^n : g(a_1, . . . , a_n) = 0 for all g ∈ I},
J(S) = {g ∈ K[x_1, . . . , x_n] : g(a_1, . . . , a_n) = 0 for all (a_1, . . . , a_n) ∈ S}.

Then,

Rad I = J(X(I)) = {f ∈ K[x_1, . . . , x_n] : f(a_1, . . . , a_n) = 0
for all (a_1, . . . , a_n) ∈ X(I)}.

Proof. Zariski and Samuel Vol. II, p. 164 [16]. 

Example 3: Let C be the field of complex numbers. Note that C is
algebraically closed. Let I be a proper ideal (i.e., I ≠ (1)). Then, I ⊆ M
for some maximal ideal M, and C[x_1, . . . , x_n]/M ≅ C. Let (x̄_1, . . . , x̄_n) =
(a_1, . . . , a_n) be the images of the variables. Then clearly, M = (x_1 − a_1,
. . . , x_n − a_n), and f(a_1, . . . , a_n) = 0 for all f ∈ M ⊇ I. It means that
the ideal I has at least one common solution. □
If we have an algebraically closed ground field K, then we define an affine
algebraic variety to be the solution set of a radical ideal in K[x_1, . . . , x_n].
The above proposition then implies that, in the case of an algebraically
closed ground field K, there is a natural one-to-one correspondence between
the set of algebraic varieties (= the sets of solutions of systems of equations
from radical ideals) and the set of radical ideals. It means that in this
setup, the algebraic objects and the geometric objects reflect each other.

Exercises

(1) We define JacRad(I) = ∩_{m ∈ X(I)} m, the intersection of all maximal
ideals containing I. Any ring with the property that JacRad(I) =
Rad(I) for all ideals I will be called a Hilbert ring (or a Jacobson
ring). Show that the power series ring K[[x]] is not a Hilbert ring.
(2) Let f = x^2 y^3 ∈ K[x, y]. What is Rad((f))?
(3) Show that Rad((x^2 + 1)) ≠ Rad((1)) in the ring R[x], where R is the
field of real numbers.
(4) Find a finite set of generators for the ideal generated by {x^i − y^{i+1} :
i = 10, 11, . . .} in the polynomial ring K[x, y] of two variables.
(5) Is the ideal (x + 1, y^2 + 1) ⊆ R[x, y] prime? Radical? Here, R is the
field of real numbers.

4.3. Regular Functions and Rational Functions

From complex analysis on the extended complex plane C ∪ {∞} (i.e.,
P^1(C), which is a projective space), we have three kinds of functions to
be considered:

(1) All meromorphic functions without poles. By Liouville's theorem, these
consist of all constant functions.
(2) All meromorphic functions with finitely many poles. It can be shown
that this is the set of all rational functions C(x) in one variable.
(3) All meromorphic functions with infinitely many poles, i.e., we include
all meromorphic functions with essential singularities. Picard's great
theorem tells us that, with at most one exceptional value, such a
function takes every value infinitely many times. This class is not well
studied.

Apparently, the first set is too narrow; the field of constants is common
to many projective curves and will not tell the underlying geometric
sets apart. It may not be useful for our study. The third set is too big and
not well studied. Riemann selected the second set and proved an important
theorem (see Proposition 4.29) which tells the underlying geometric sets
apart by the concept of genus. Since then, the field of rational functions
has become an indispensable tool of algebraic geometry.
In the present book on algebraic coding theory, we need the results of
the Riemann–Roch theorem for algebraic functions of one variable over a
finite field. This line of work was started in 1882 by Dedekind4 and Weber.5
Even the concept of divisors was theirs. The form of the Riemann–Roch
theorem we need is due to Weil.6
If we consider all rational functions of an algebraic curve C over a field
K, then we have an infinite-dimensional vector space. We want to classify
curves. We define two algebraic curves to be birationally isomorphic iff
their rational function fields are isomorphic over K.

Definition 4.7. Two curves C, C′ are said to be birationally equivalent iff
their rational function fields F(C), F(C′) are K-isomorphic. □

However, this is only a qualitative statement about birational isomor-
phism. Riemann ingeniously picked out the rational functions with no worse
poles than prescribed ones. He was able to show that they form a finite-
dimensional vector space, and that the dimension is tied to a global
topological invariant, the genus. Later on, we establish the following
Riemann inequality, where d(D) is the degree of a divisor D (see Definition
4.24) and ℓ(D) is the dimension of L(D) (see Definition 4.24):

ℓ(D) ≥ d(D) + 1 − g,

4 German mathematician, 1831–1916.


5 German mathematician, 1842–1913.
6 André Weil (1906–1998), an influential French mathematician.

where g is a non-negative integer, the genus. Furthermore, if d(D) ≥
2g − 1, then

ℓ(D) = d(D) + 1 − g.
One of the central approaches of mathematics is to define a set of
functions R(S) on a set S and to study the relations between R(S) and S.
We may have differentiable or analytic functions on a set S, then we may
have differentiable or analytic geometry. In algebraic geometry, following
Riemann, we are more interested in rational functions which may have
poles inside a projective variety.
Before we study the projective varieties and the fields of rational
functions associated with them, let us consider an affine piece, an affine
variety. Given an affine algebraic variety X defined by an ideal J(X),
we define the ring of regular functions R[X] as K[x_1, . . . , x_n]/J(X) =
K[x̄_1, . . . , x̄_n], i.e., we only consider the polynomials induced from the
affine space A^n_K on X. Naturally, two induced polynomials are considered
identical iff they have identical values on the algebraic variety; this is the
view that the values f(P) of a function determine the function f.
Let us write f(P) as [f, P]. Over the complex numbers, a function [f, ∗]
is determined by f(P) for all points P. Après Grothendieck,7 in general, we
believe that a point P is determined by [∗, P] for all functions ∗ ∈ R[X].
Pushing one step further, we may identify a point P with an evaluation
map P : R[X] → R[X]/P, where the target is a field. Then clearly, we
require that P be a maximal ideal. We may generalize the above consider-
ations to a ring R. We have the following definition.
following definition.

Definition 4.8. A point P of a given affine algebraic variety X over
a field K with the ring of regular functions R[X] is a maximal ideal
P of R[X]. If K ≅ R[X]/P under the canonical map, then the point is
called a rational point. Otherwise, it is a non-rational point. In general,
given any noetherian ring R, we may define the point set as m-spec(R) =
{m : m a maximal ideal of R}.
Let X(I) be defined as {m ∈ m-spec(R) : I ⊆ m}. Let f ∈ R. We define
f(P) as the image of f under the canonical map π : R → R/P. □

Example 4: Let us consider the ring Z of all integers. Let I = (0). Then
clearly, all maximal ideals are of the form (p), where p is a prime number.

7 A. Grothendieck (1928–2014), stateless French mathematician, founder of modern
algebraic geometry.

We have X((0)) = {(p) : p prime}, X((p)) = {(p)}, and X((12)) = {(2), (3)}.
Moreover, we have Rad((12)) = (2) ∩ (3) = (6). □
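As a quick illustration of Example 4 in code (a sketch, not part of the
original text): in Z, the radical of (n) is generated by the product of the
distinct prime factors of n.

    from math import prod
    from sympy import primefactors

    def radical_generator(n):
        # Rad((n)) in Z is the ideal generated by the product of the
        # distinct prime divisors of n.
        return prod(primefactors(n))

    print(radical_generator(12))   # 6 : Rad((12)) = (6)
    print(radical_generator(8))    # 2 : Rad((8)) = (2)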

Example 5: Let X be the affine line A^1_R over the real field R. Then,
(x^2 + 1) is a maximal ideal of R[x]. It is a point. However, x^2 + 1 = 0
has no real solution, and it can be shown that R[x]/(x^2 + 1) ≅ C ≇ R,
where C is the field of complex numbers. So, the point (x^2 + 1) is not a
rational point. □

Definition 4.9. Let P be a maximal ideal. The degree of the field
extension [R[X]/P : K] is called the residue degree μ(P). Rational
points are the points with residue degree 1. □

We have the following proposition.

Proposition 4.10. A point P is rational ⇔ P = (x_1 − a_1, . . . , x_n − a_n),
where a_i ∈ K for all i.

Proof. (⇒) If P is rational, then R[X]/P = K, so x̄_i = a_i ∈ K for
all i. Then (x_1 − a_1, . . . , x_n − a_n) ⊆ P; since (x_1 − a_1, . . . , x_n − a_n)
is a maximal ideal, P = (x_1 − a_1, . . . , x_n − a_n).
(⇐) If P = (x_1 − a_1, . . . , x_n − a_n), then R[X]/P = K. □

We consider the case where K[x_1, . . . , x_n]/J(X) is an integral domain,
i.e., J(X) is a prime ideal. Then, the algebraic variety X will be called
an irreducible algebraic variety. We restrict our attention to the set of
irreducible algebraic varieties and, abusing the language, use the term
algebraic variety only for an irreducible algebraic variety, if not stated
explicitly otherwise. In this case, the total quotient ring of R[X] is a field.
We call it the rational function field F(X) of the algebraic variety X. If
the ideal J(X) stays prime under all field extensions of K, then we say
the algebraic variety X is absolutely irreducible. The ideal (x^2 + 1) of
R[x] is not absolutely irreducible, while the ideal (x + 1) is absolutely
irreducible.

Definition 4.11. We define the local ring of an affine algebraic variety
X at a point P as K[x̄_1, . . . , x̄_n]_P = {f/g : f, g ∈ K[x̄_1, . . . , x̄_n],
g ∉ P}. Algebraically, we define a local ring, (R, q), as a ring R with a
unique maximal ideal q. It is easy to see that the local ring of an affine
algebraic variety X at a point P is a local ring in the sense of algebra.
We recall the definition of its q-adic completion R̄ as the completion of R
with respect to the metric induced by the ideal q as follows. In general, let
(R, q) be a local ring; we define the order function as

ord_q(f) = max{m : f ∈ q^m} if f ≠ 0, and ord_q(0) = ∞.

Furthermore, we define the distance between f, g ∈ R as

d(f, g) = 2^{−ord_q(f−g)}.

It is routine to check that R forms a metric space with respect to the
distance function just defined. We complete R as a metric space and
call R̄ the q-adic completion. □
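A minimal Python sketch (an illustration, not part of the original text) of
this metric for the simplest local ring K[x]_(x) with q = (x), representing
truncated power series over Q by their coefficient lists:

    def ord_q(coeffs):
        # x-adic order: index of the first non-zero coefficient.
        for m, a in enumerate(coeffs):
            if a != 0:
                return m
        return float('inf')   # the zero element

    def dist(f, g):
        # d(f, g) = 2^(-ord_q(f - g)); an ultrametric on the ring.
        # (f and g are assumed to be lists of equal length.)
        diff = [a - b for a, b in zip(f, g)]
        m = ord_q(diff)
        return 0.0 if m == float('inf') else 2.0 ** (-m)

    f = [0, 0, 3, 1]    # 3x^2 + x^3
    g = [0, 0, 3, 5]    # 3x^2 + 5x^3
    print(ord_q(f))     # 2
    print(dist(f, g))   # 0.125 = 2^(-3), since f - g = -4x^3

Cauchy sequences in this metric are exactly the sequences whose coefficients
stabilize degree by degree, which is why the completion is the formal power
series ring K[[x]].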

There is a dimension theory of local rings involving the maximal length


of chains of prime ideals. The reader is advised to consult Zariski and
Samuel [16] for the definition.

Definition 4.12. A local ring (R, q) of dimension m is said to be
a regular local ring iff its maximal ideal q can be generated by m
elements {x_1, . . . , x_m}, which are called uniformization parameters. In
the equicharacteristic case (i.e., the residue field K = R/q and the quotient
field of R have the same characteristic; cf. Zariski and Samuel, Vol. II,
p. 304 [16], Cohen's theorem), the completion R̄ of a regular local ring R
is a formal power series ring in m variables. In this case, R̄ =
K̄[[x_1, . . . , x_m]], where K̄ is algebraic over K, and the point is rational
iff K̄ = K. A point P is said to be a regular point of an affine algebraic
variety X if the local ring at P is regular. □

Example 6: Let p be a prime number. Then, the local ring Z_(p) is
regular with quotient field Q of characteristic 0, while its residue field is
of characteristic p > 0. Therefore, it is not the equicharacteristic case. Its
completion is the ring of p-adic integers, which is not a formal power series
ring. □


Example 7: Let P = (x^2 + 1) be a maximal ideal in R[x], where R is
the real number field. Then, the local ring of R[x] at P is regular with the
residue field R[x]/(x^2 + 1) ≅ C, where C is the field of complex numbers.
In the completion of R[x]_(x^2+1), let t = x^2 + 1; then x^2 = −(1 − t),
and the element

x/√(1 − t) = x(1 − t)^{−1/2},

which makes sense in the completion via the binomial series, squares
to −1, i.e., it is a square root i of −1. Hence, the completion of
R[x]_(x^2+1) is C[[t]]. □

Exercises

(1) Let us consider a curve C over a field K defined by

xy + x^3 + y^3 = 0.

Is the local ring of C at (0, 0) regular?


(2) Let R be the field of real numbers. Show that there are no rational
points on the two varieties X_1, X_2 defined by the ideals (x^2 + 1) and
(1), respectively. Furthermore, show that there are non-rational points
in X_1 but not in X_2.
(3) Prove Eisenstein's criterion: Let f(x, y) ∈ K[x, y] be written as

f(x, y) = Σ_{i=0}^{n} f_i(x) y^i.

Suppose that (f_0(x), f_1(x), . . . , f_n(x)) = (1) and that there exists
b ∈ L, a finite extension of K, such that

f_n(b) ≠ 0,
f_i(b) = 0 for i < n,
(x − b)^2 ∤ f_0(x).

Then, f(x, y) is absolutely irreducible.


(4) Using the above Eisenstein's criterion, show that y^n − x is absolutely
irreducible.
(5) Let P_1 be the point (1, 0) on P^1_K with the local ring O_{P_1}. Let P_2
be the point (0, 1) on the curve defined by x^3 + y^3 = 1 with the local
ring O_{P_2}. Show that O_{P_1} is non-isomorphic to O_{P_2}. Show that
Ô_{P_1} is isomorphic to Ô_{P_2}.
(6) Let P = (x^2 + 1) be a maximal ideal in R[x], where R is the real
number field. Show that the local ring of R[x] at P is regular.

4.4. Affine Algebraic Curves

In coding theory, at present, the only part of algebraic geometry we use


is algebraic curve theory. An algebraic curve is determined by its rational
function field which is of transcendental degree 1 over a ground field K.

We shall start with affine algebraic geometry. We present a broad
discussion which will help us understand the subject better and open up
possible new applications. We have the following proposition.

Proposition 4.13 (Noether’s normalization lemma). Let R be an


integral domain which is a finitely generated extension ring of a field K.
Let r be the transcendental degree of the quotient field F of R over K.
Then, there is an algebraically independent subset {t1 , . . . , tr } such that R
is integral over K[t1 , . . . , tr ].

Proof. Zariski and Samuel Vol. I, p. 104 [16]. 

Dimension theory is an interesting subject. In linear algebra, we define
the dimension of a vector space as the number of elements of a basis. In
topology, we sometimes use the Čech theory of dimension. In analysis,
we have the Hausdorff dimension. In algebraic geometry, we define the
dimension of A^n_K (or P^n_K) to be n. The above lemma lays the ground
for us to define the dimension of an (irreducible affine) algebraic variety X
to be the transcendental degree of the field F(X) over the field K. A curve
is an algebraic variety X of dimension 1.
We have the following definition of smoothness by the Jacobian criterion;
the reader is referred to Section 2.6 for the definition of the derivative.

Definition 4.14. A rational point x = (a_1, . . . , a_N) is a smooth point
of an affine variety X = X((f_1, . . . , f_r)) ⊆ A^N_K of dimension n iff

rank[b_ij] = N − n,

where

b_ij = (∂f_i/∂x_j)(a_1, . . . , a_N).

Otherwise, it is called a singular point.


An algebraic variety is smooth if all rational points are smooth points.


Proposition 4.15. For our case of a perfect ground field, a point is regular
iff it is smooth.

Proof. Mumford [13] p. 343. 



Proposition 4.16. Let f (x, y) = 0 define an affine plane curve C. Then,


the collection of all the singular points of C, the singular locus, is defined
by the ideal (f (x, y), fx , fy ).

Proof. It follows from the definition. 

Proposition 4.17. Let f (x, y) = 0 define an absolutely irreducible


affine plane curve C. Then, the affine plane curve C is smooth iff
(f (x, y), fx , fy ) = (1).

Proof. It follows from the previous proposition. 


Example 8: Let n be a positive integer such that p ∤ n. Let us consider
the Fermat curve over F_{p^m} with equation

x^n + y^n − 1 = 0.

It follows from the above proposition that it is smooth for this affine
part. □
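A brute-force rendering of the Jacobian criterion (a sketch, not part of
the original text; it checks rational points over a prime field F_p only):

    from itertools import product

    def singular_points(p, n):
        # Points of the affine curve f = x^n + y^n - 1 over F_p at which
        # both partials f_x = n*x^(n-1) and f_y = n*y^(n-1) vanish.
        f  = lambda x, y: (pow(x, n, p) + pow(y, n, p) - 1) % p
        fx = lambda x, y: (n * pow(x, n - 1, p)) % p
        fy = lambda x, y: (n * pow(y, n - 1, p)) % p
        return [(x, y) for x, y in product(range(p), repeat=2)
                if f(x, y) == 0 and fx(x, y) == 0 and fy(x, y) == 0]

    print(singular_points(7, 3))   # [] : smooth, since 7 does not divide 3
    print(singular_points(7, 4))   # [] : smooth again

Indeed, with p ∤ n both partials vanish only at (0, 0), which does not lie
on the curve.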

In general, we may discuss even arithmetic cases. See the following
example.
Example 9: We may consider the plane curve C defined by f(x, y) =
y^2 − x(x − 1)(x − a) over a field K of characteristic not 2, where
a ≠ 0, 1. It is easy to see that (f, f_x, f_y) = (1). Hence, the affine curve C
is non-singular.
Let us homogenize the equation by introducing a variable z as
g(x, y, z) = y^2 z − x(x − z)(x − az). Clearly, g(x, y, 1) = f(x, y). It is clear
that the affine curves defined by g(1, y, z) and g(x, 1, z) are smooth. We say
the projective curve defined by g(x, y, z) is smooth. □
Let R be the rational function field of an algebraic variety over a ground
field K. The algebraic closure of K in R is called the field of constants.
For simplicity, we assume that K equals the field of constants, i.e., every
element in R outside K is transcendental over K.

Proposition 4.18. Let F be a finitely generated field extension of a field


K. If the ground field K is perfect, then there is a transcendental basis
{x1 , . . . , xn } of F over K such that F is separable over K(x1 , . . . , xn ).

Proof. Zariski and Samuel Vol. I, p. 105 [16]. 


In the case that the transcendental degree n = 1, since every finite
separable extension is a simple extension, if the field K is perfect, we may
reduce the field to K(x, y). It means that we find a plane model for the
curve; that is, any curve has a plane curve as a model (i.e., the two curves
share the same rational function field, or the two curves are birationally
isomorphic).
Note that a projective plane curve will have at most (q^3 − 1)/(q − 1) =
q^2 + q + 1 rational points (P^2_K has only that many points). A non-plane
curve may have many more smooth rational points (cf. Proposition 4.10).
If the number of smooth points matters in our discussion (see the next
chapter on geometric Goppa codes), then we may not be able to restrict
ourselves to smooth curves in the plane, while it is known that any curve
can be represented as a smooth space curve.
Let C be a curve; recall that the curve is said to be irreducible if the
ideal J(C) is prime. Let us consider the following example.

Example 10: Let the field K = R, the field of real numbers. Let J(C) =
(x^2 + 1) in R[x, y]. Then, J(C) is prime and C is irreducible. However,
if we extend the field R to the complex field C, then the generating
polynomial x^2 + 1 splits into the product (x + i)(x − i). So, the curve
splits into two lines. □

Definition 4.19. Let the field K be algebraically closed. Let C be an
irreducible plane curve defined by an equation f = 0, and let P =
(x − a, y − b) be a point of A^2_K. Let us translate the point to (0, 0) and
write f(x, y) = f_m(x, y) + higher-degree terms, where f_m ≠ 0 is the form
of lowest degree m, the order of f(x, y) at (0, 0). We have the following:
(1) C is smooth at (0, 0) iff m = 1; (2) if m ≥ 2 and f_m(x, y) splits into
distinct linear factors, then C has an ordinary singularity at (0, 0). □

Example 11: Let K = C, the complex field. Let f be one of the following
polynomials:

(1) f = x^2 + y^2 + x^3,
(2) f = x^2 + y^3.

Then, it is easy to see that in case (1), C has an ordinary singularity
at (0, 0), and in case (2), C does not have an ordinary singularity, or we
say the singularity at (0, 0) is not ordinary. □

In the next proposition, we state that, over an algebraically closed
ground field K, any curve C is birationally isomorphic to a plane curve
with only ordinary singularities. For a proof, the reader is referred to the
book on algebraic curve theory by Walker [15].

Proposition 4.20. Let the ground field K be algebraically closed. Then,
any curve C over K is birationally isomorphic to a plane curve which is
non-singular or is singular with only ordinary singularities. □

Remark: The above proposition will be used in Proposition 4.31
(Plücker's formula) to find the genus (see Proposition 4.29) of an algebraic
curve. Furthermore, if we discuss the notions of quadratic transformations
and monoidal transformations (or blow-ups) at those singular points, then
it is easy to see that any algebraic curve over an algebraically closed field
has a non-singular model.

Example 12: Let K = C and f = x^2 + y^3. Dividing by y^2, the equation
can be rewritten as z^2 + y = 0 with z = x/y, and the field K(x̄, ȳ) can
be rewritten as K(z̄, ȳ) = K(z̄), since ȳ = −z̄^2. Therefore, the curve C is
birationally isomorphic to a line, which is non-singular. □

Exercises

(1) Let C be a curve defined over K, with ch(K) = 3, by the following
equation:

xy + x^3 + y^3 = 0.

Find the local rings at (0, 0) and (1, 1) and their completions.
(2) Find the pole set of x for the curve P^1_K with the ring of regular
functions K[x].
(3) Show that the curve C with defining equation F = x^n + y^{n+1} is
birationally isomorphic to a line over any algebraically closed ground
field K.
(4) Show that x + y^n is absolutely irreducible.
(5) Show that the local rings of the curve x^3 + y^3 = 1 at (1, 0) and of the
affine line at the origin are not isomorphic, while their completions are.

4.5. Projective Algebraic Curves

The algebraic geometry used in coding theory is about smooth projective
curves. Let us discuss the historical development of algebraic geometry.
Affine algebraic geometry is incomplete (in the sense of valuations).
Desargues8 created projective geometry. Following Poncelet,9 we define a
8 G. Desargues (1591–1661), French mathematician.


9 J. Poncelet (1788–1867), French mathematician.

projective algebraic variety to be an affine algebraic variety with the points
at infinity added. To be precise, given an ideal I in K[x_1, . . . , x_n], we
define the homogeneous ideal I_h associated with it as the ideal generated
by

{x_0^d f(x_1/x_0, . . . , x_n/x_0) : f ∈ I, d = deg(f)},

and we define the projective completion as X(I_h). Then, we define a
projective variety as the projective completion of an affine algebraic
variety. Note that inside the projective variety, the affine piece defined by
x_0 = 1 is the original affine variety. Similarly, we may set one of
x_1, . . . , x_n to be 1; then we get the other n affine parts of the projective
variety. These n + 1 affine pieces form an affine covering of the projective
variety. A projective variety V is said to be smooth iff all its n + 1 affine
pieces are smooth.
variety V is said to be smooth iff all its n + 1 affine pieces are smooth.

Example 13: Let n be a positive integer such that p ∤ n. Let us consider
the projective Fermat curve over F_{p^m} with equation

x_1^n + x_2^n = x_0^n.

Let us consider the affine part defined by x_0 = 1. Let x = x_1/x_0,
y = x_2/x_0. Then, the above can be rewritten as

x^n + y^n = 1.

From the discussion of Example 8 in Section 4.4, we know it is smooth
for this affine part. Similarly, the other affine parts are smooth. So, it is
smooth as a projective curve. □

We define the rational functions of an algebraic variety (affine or
projective) as the elements of the rational function field F(X). Note that
there is a major difference between the concept of functions as elements of
the rational function field and the functions of calculus (say C[0, 1], the
set of continuous functions defined over [0, 1]). In the second case, the
functions have a common domain of definition.
For the rational function field F(X), the common domain of definition
of all rational functions is the empty set: given any rational point P, there
is a function with a pole at P. Indeed, let f be a non-constant rational
function not having a pole at P with f(P) = a; then 1/(f(x) − a) will have
a pole at P. Therefore, the common domain of definition is the empty set,
not the algebraic variety X. This is a point which is taken up by sheaf
theory and makes algebraic geometry more interesting.
We wish to generalize Corollary 4.3 on the intersection of two projective
lines in P^2_K. Let us consider any two homogeneous forms F, G of degrees
m, n, respectively. They define two not necessarily irreducible projective
plane algebraic curves C, D of degrees m, n, respectively. We say that
they have a common component if F, G have a non-constant common
factor. The intersection multiplicities of the two curves at the various
intersection points are complicated. Say, a straight line usually intersects
a smooth curve C at a point P with multiplicity 1; however, if the straight
line is tangent to the curve at the point P, then the intersection
multiplicity is at least 2. There is a proper way of counting. The following
Bézout's theorem is historically important; however, it will not be used in
this book, and we simply quote it here.

Proposition 4.22 (Bézout’s theorem). Let C, D be two not necessarily


irreducible projective plane algebraic curves of degree m, n, respectively. If
they do not have a common component, then they intersect at mn points
with intersection numbers properly counted.

Proof. This theorem follows from Walker’s book, Algebraic Curves [15].


Let us see the following example.

Example 14: Let us consider the two projective curves defined by
z^{n−1} x + y^n and x over a field F_q. Then, the intersection multiplicity
of these two curves at P = (0, 0, 1) is n. □

[Valuations and Places] We take the point of view of algebra in the study
of curve theory, following the book [10]. First, instead of a geometric
object, we are given a function field F of one variable over a perfect field
K, i.e., F contains an element x which is transcendental over K, and F is
algebraic of finite degree over K(x). Furthermore, we assume that K is
algebraically closed in F.
Zariski10 (cf. Vol. II, p. 110 [16]) defined the Riemann surface of F
as the collection of all K-valuation rings of F (see the following definition).
Since the valuation rings of curves are non-singular local rings (see the
following for a definition), the Riemann surface is a non-singular model of
the curve. Over the complex field C, the Riemann surface in the sense of
Zariski equals the Riemann surface of analysis as a set; however, they are
equipped with different topologies.
Let us recall that the K-valuation rings are defined as the subrings O
of F with the following properties:

(1) O ⊇ K,
(2) O ≠ F, and
(3) if x ∈ F\O, then x^{−1} ∈ O.

By a place P in F, we mean a subset P of F which is the ideal of


non-units of some valuation ring O.

[Divisors] In fact, all valuations discussed in this book are discrete
valuations of rank 1 (cf. [16] Vol. II, p. 42). Let f ∈ F\K be an element
and P a place. Define μ_P(f) = max{m : f ∈ P^m}, and write the divisor
(f) = Σ_P μ_P(f) P. Then, we know that the sum is a finite sum. Let us
write the zero divisor (f)_0 = Σ_P μ_P(f) P, where the sum is over the
places with μ_P(f) positive, and the pole divisor (f)_∞ = −Σ_P μ_P(f) P,
where the sum is over the places with μ_P(f) negative. Then clearly,
(f)_0 = (f^{−1})_∞ and (f)_∞ = (f^{−1})_0. We have the following
proposition.

Proposition 4.23. Let f ∈ F\K and let (f)_0 = Σ_P μ_P(f) P. Then,
Σ_P μ_P(f) ≤ [F : K(f)]. □

Proof. cf. [10] p. 15, Theorem 4. 

10 Oscar Zariski (1899–1986), Russian-born American mathematician, one of the most


influential algebraic geometers of the 20th century.

From the preceding proposition, we conclude that (f) = (f)_0 − (f)_∞
is a divisor in the sense of the next definition.

Definition 4.24. A divisor is D = Σ_i n_i P_i, where the sum is a finite
formal sum, all n_i are integers, and the P_i are places. It is clear that
divisors form a group D(F) with respect to addition. If all n_i ≥ 0, we say
that D is effective, in symbols, D ≥ 0. We define L(D) as

L(D) = {f ∈ F : (f) + D ≥ 0},

and we define ℓ(D) = dim_K(L(D)), and the degree of D as d(D) =
Σ_j n_j [O_j/P_j : K] = Σ_j n_j μ_j, where O_j is the valuation ring of P_j
and μ_j = [O_j/P_j : K]. □

In general, we define the divisor group11 of a smooth projective algebraic
curve C as the free abelian group generated by all points P_i (rational or
not). The concept of the divisor group is courtesy of Dedekind and Weber,
who used it to extend the work of Riemann over the complex field to an
arbitrary field, even a finite field. Let us see an example.

Example 15: Let us consider the real field R and the projective line P^1_R
over R. Let us consider the function f = x^2 + 1. It vanishes at the
maximal ideal P = (x^2 + 1), with residue degree μ_P = 2, and it has a
double pole at ∞, P_∞. Therefore, its divisor is D = (f) = P − 2P_∞, of
degree d((f)) = 1 · 2 − 2 · 1 = 0. □

We have the following proposition (cf. Chevalley [10], p. 18, Theorem 5).

Proposition 4.25. The divisor of any element f ≠ 0 of a field of algebraic
functions of one variable is of degree 0. □

Furthermore, we have the following proposition, which follows from
Chevalley [10], p. 18, Theorem 3 and its corollary. It is a generalization
of the classical Liouville theorem.

Proposition 4.26. Let us assume that the field K is perfect. Then, every
non-constant function f of a projective curve has some poles.

Proof. See Chevalley [10], p. 18, Theorem 3 and its corollary. □

11 Chevalley [10] uses the multiplicative format for the divisor group rather than the
additive format used in this book. Note that his L(D) is our L(−D).


Definition 4.27. Let (f ) = ni Pi . The collection of all zeroes and poles
will be called the support of f , supp(f ) = {Pi : ni = 0}. 

Example 16: Let us consider P^1_R, where R is the real field. Let P =
(x^2 + 1), let P_∞ be the point at ∞, and let f = x^2 + 1. Then, (f)_0 = P
and (f)_∞ = 2P_∞, while μ(P) = 2. Therefore, d((f)_0) = 2 = d((f)_∞). □

It is easy to see that the non-zero elements of F form a group under
multiplication, and we have (fg) = (f) + (g). Therefore, {(f) : 0 ≠ f ∈ F}
is a subgroup of the divisor group D(F), which is an abelian group. The
quotient is called the Picard group of F; i.e., two divisors D, E are said
to be linearly equivalent iff D − E = (f) for some 0 ≠ f ∈ F. For the
projective line P^1, for any two points P_1 and P_2, there is a function
f with (f) = P_1 − P_2; hence, any two divisors of the same degree are
linearly equivalent, and it follows from Proposition 4.25 that two linearly
equivalent divisors must be of the same degree. Therefore, the Picard
group of P^1 is isomorphic to the group of integers Z. This is an
interesting subject, though not needed in coding theory; it requires more
tools, and we will not go further into it. In the following example, we show
directly that not all curves are birationally equivalent.

Example 17: Let K be an algebraically closed field with ch(K) = p,
where p may be zero. Let us consider the following Fermat curves F_n:

x_1^n + x_2^n = x_0^n,

where n > 2 and p ∤ n (in case p = 0, the second condition p ∤ n is void).
We claim that F_n is not birationally equivalent to P^1_K, i.e., that the
rational function field F(F_n) ≠ K(t). Suppose it is; we may dehomogenize
the defining equation by setting x_0 = 1, and let x_1 = g(t)/f(t), x_2 =
h(t)/f(t) with (f(t), g(t), h(t)) = (1). We wish to deduce a contradiction
if f, g, h are not all in the field K. Suppose that there are such triples
f, g, h; we take a triple f, g, h with max(deg(f), deg(g), deg(h)) the
smallest possible. Note that if two of f, g, h are constants, then the third
one must be constant, which is impossible. The above defining equation
becomes

g(t)^n + h(t)^n = ∏_{i=0}^{n−1} (g(t) + ω^i h(t)) = f(t)^n,

where ω is a primitive nth root of unity. Since each common prime factor
p(t) of two of the (g(t) + ω^i h(t))'s must be a prime factor of g, h, and
hence a prime factor of f, the factors (g(t) + ω^i h(t)) are pairwise
co-prime. By unique factorization, we may rewrite the above equation as

g(t) + ω^i h(t) = c_i α_i(t)^n,

with deg(α_i) ≤ deg(f(t)). Now, we select three of them, with at least one
α_i non-constant. Take the three corresponding equations and eliminate
g, h; we get a new equation of the following form:

e_i α_i(t)^n + e_j α_j(t)^n + e_k α_k(t)^n = 0.

Absorbing the coefficients e_i, e_j, e_k into the polynomials (nth roots
exist since K is algebraically closed), we rewrite the above as

β_i(t)^n + β_j(t)^n = β_k(t)^n.

Note that max(deg(f), deg(g), deg(h)) > max(deg(β_i), deg(β_j), deg(β_k)),
contradicting the minimality of our choice. □

Exercises

(1) Let us assume that ch(K) > 2. Let us consider a projective elliptic
curve C in the plane whose affine part is defined by x_2^2 =
(x_1 − 2)(x_1 − 1)x_1. Show that C is not birationally equivalent
to P^1_K.
(2) Let us consider P^1_K with the ring of regular functions K[x]. Find the
divisor (x).
(3) Let C be a curve with its affine part defined by x^2 + y^3 + x^3 = 0.
What is the equation which defines its projective completion?
(4) Let C be a projective curve with its affine part defined by x^2 + y^3 +
x^3 = 0. What is the divisor (x)?
(5) Find the intersection number I(P, F ∩ G) for P = (0, 0), F = x + y + x^3,
G = x^2 − y^4 + x^3.

4.6. Riemann’s Theorem

An important mathematical approach to the study of geometric sets
is to study the relations between the geometric sets and the algebraic
objects defined over them. One example is the Poincaré conjecture: the
geometric sets, spheres, and the algebraic objects, the homotopy groups
of the spheres, determine each other. In fact, long before Poincaré,
Descartes studied the relations between algebraic curves and the functions
which define them.
There are the important Riemann's theorem (see Proposition 4.29) and
the Riemann–Roch theorem (see Proposition 4.35) for Riemann surfaces
over the complex numbers, or complex algebraic curves. Those theorems
have been generalized to higher dimensions and to algebraic cases, say,
over a finite field. In coding theory, we use the Riemann–Roch theorem
over a finite field.
Hartshorne stated ([11], p. 293) that if a reader is willing to accept the
statement of the Riemann–Roch theorem, he can read this chapter at a
much earlier stage of his study of algebraic geometry. This may not be a
bad idea, pedagogically, because in that way he will see some applications
of the general theory, and in particular will gain some respect for the
significance of the Riemann–Roch theorem. In contrast, the proof of the
Riemann–Roch theorem is not very enlightening. What Hartshorne talked
about was the algebraic geometry over an algebraically closed ground
field. Our fields are not even algebraically closed, and the usual proofs in
algebraic geometry over algebraically closed fields cannot be used without
deep reflection and modification. What we plan to do for Riemann's
theorem (see the following) and the Riemann–Roch theorem (see the next
section) are descriptive proofs: we give plenty of examples and refer the
hard parts of the proofs to Chevalley [10].

4.6.1. Riemann’s Theorem

Before we state the theorem, let us consider the following examples.

Example 18: Let us consider the simplest curve, a straight line in the
projective plane P^2_K, defined by x_2 = 0. Let us consider the pole set
{x_0 = 0, x_1 = 1, x_2 = 0} = {P}. It is not hard to see that the set of
all functions with at most one pole at P is {(a_0 x_0 + a_1 x_1)/x_0}, which
is a vector space of dimension 2 over K. In general, let D = nP. It is not
hard to see that the set of all functions with at most D = nP as poles is

{(a_0 x_0^n + a_1 x_0^{n−1} x_1 + · · · + a_n x_1^n)/x_0^n},

which is a vector space of dimension n + 1 = ℓ(D) over K. Let t = x_1/x_0.
Then, (a_0 x_0^n + a_1 x_0^{n−1} x_1 + · · · + a_n x_1^n)/x_0^n can be
rewritten as a_0 + a_1 t + · · · + a_n t^n, a polynomial in t of degree ≤ n.
This verifies Riemann's inequality (cf. Proposition 4.29), ℓ(D) ≥ d(D) +
1 − g, with g = 0. □

Example 19: Assume that ch(K) ≠ 2 and that K contains a square root
of every element in K. Let us consider another curve, an elliptic curve C
in the projective plane, defined by x_2^2 x_0 = (x_1 − a x_0)(x_1 − x_0) x_1,
where a ≠ 1, 0. This projective curve is smooth (cf. Example 9). For the
sake of our discussion, let us fix the affine piece A^2_K = {x_0 = 1} as the
points at finite distance. Let us consider the point at infinity {x_0 = 0 =
x_1, x_2 = 1} = {P_∞}. Let us consider the set L(nP_∞) of all functions
with at most n poles at P_∞, where n is a non-negative integer, and no
pole elsewhere. We claim that L(nP_∞) is a vector space of dimension n
over K for n ≥ 1.
We shall dehomogenize the equation by setting x = x_1/x_0, y = x_2/x_0.
Then, the defining equation can be rewritten as

g(x, y) = y^2 − (x − a)(x − 1)x.

In general, the functions f(x, y) in F(C) = K(x)[y]/(g(x, y)) are of the
following form:

f(x, y) = (g_0(x) + g_1(x) y)/h(x).

We may assume that (h(x), g_0(x), g_1(x)) = (1); otherwise, we may
reduce the fraction. We make the further assumption that h(x) splits
completely (otherwise, go to a finite extension of K; if the reader feels
more comfortable assuming that K is algebraically closed, then assume it).
Suppose that f(x, y) has no pole at finite distance (meaning in A^2_K).
We claim that h(x) is then a non-zero constant.
If not, we show that f has a pole at finite distance, and thus f ∉
L(nP_∞). Consider any non-constant factor x − β of h(x). We have the
following two cases: (1) β ≠ 0, 1, a; (2) β = 0, 1, or a. In case (1), the
intersection of the line x = β with the curve C consists of two distinct
points P_1 and P_2 (corresponding to two distinct non-zero values of y on
C with x = β). Since h(β) = 0 and (h(x), g_0(x), g_1(x)) = (1), g_0(β) and
g_1(β) are not both zero. In any situation, the numerator of f(x, y) cannot
be 0 at both non-zero values of y. Therefore, f must have a pole at finite
distance, and it is not allowed. In case (2), the intersection of the line
x = β with the curve C is the point P = (β, 0) counted twice. If
g_0(β) ≠ 0, then, since y = 0 at P, the numerator does not vanish at the
point P = (β, 0), so the function will have a double pole at P = (β, 0). It
is impossible. Therefore, g_0(β) = 0. We shall study the completion of the
local ring O_P at the point P. Let us discuss the situation where β = 0
(the other situations are similar). The defining equation is

y^2 = (x − a)(x − 1)x.

It is easy to see that y is a uniformization parameter and Ô_P = K[[y]].
We have

x = a^{−1} y^2 + · · · ,
g_0 = b_2 y^2 + · · · ,        (1)
g_1 = c_0 + c_1 y + · · · ,

where c_0 ≠ 0. It is easy to conclude that the numerator of f vanishes
at P to order exactly one, while the denominator vanishes at P to order
at least two. Therefore, f has a pole at finite distance. Our claim is thus
proved.
At P_∞, we set u = x_0/x_2 = y^{−1}, v = x_1/x_2 = x y^{−1}. The
defining equation becomes

u = (v − au)(v − u)v.

It is easy to see that ord_{P_∞}(v) = 1 and ord_{P_∞}(u) = 3. Hence,
ord_{P_∞}(y) = −3 and ord_{P_∞}(x) = −2. Therefore, a function of the
form g_0(x) + g_1(x) y, with i = deg(g_0) and j = deg(g_1), will have order
−2i or −2j − 3 at P_∞. It is easy to see that there is no function with a
simple pole at P_∞ and that the functions with at most n poles at P_∞
form a vector space of dimension n, i.e., ℓ(nP_∞) = n for n ≥ 1. In fact,
the curve is of genus 1, and we have equality in Riemann's theorem:
ℓ(nP_∞) = n = n + 1 − 1 = d(nP_∞) + 1 − g. □
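The dimension count ℓ(nP_∞) = n can be checked by listing pole orders.
A small counting sketch (an illustration, not part of the original text),
using the basis just derived, namely the monomials x^i y^j with j ∈ {0, 1},
whose pole order at P_∞ is 2i + 3j:

    def ell(n):
        # dim L(n P_inf): count monomials x^i y^j, j in {0, 1}, whose
        # pole order 2i + 3j at P_inf is at most n.  (y^2 reduces via
        # the defining equation, so j in {0, 1} suffices.)
        return len([(i, j) for j in (0, 1) for i in range(n + 1)
                    if 2 * i + 3 * j <= n])

    print([ell(n) for n in range(8)])   # [1, 1, 2, 3, 4, 5, 6, 7]

Note the gap at n = 1 (no function with a single simple pole at P_∞); for
n ≥ 1 = 2g − 1, we get ℓ(nP_∞) = n, as Riemann's theorem predicts for
g = 1.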
The shocking discovery is that the vector space L(nP) is finite-dimen-
sional in general. We can tell the difference between the two curves
discussed in Examples 18 and 19 by studying L(nP). This is one of
Riemann's great discoveries.
There are many relations between ℓ(D) and d(D). For instance, if
D ≥ 0, it can be shown that ℓ(D) ≤ d(D) + 1. Let us prove it in the
simple case that D = nP with P a rational point, i.e., Ô_P = K[[t]]. Let
us define a map π : L(D) → K^n as π(f) = (a_1, . . . , a_n), where f ∈ L(D)
and f = a_n t^{−n} + · · · + a_1 t^{−1} + · · · . Then clearly, π is a linear map
and ker(π) = K. Therefore, ℓ(D) ≤ n + 1 ≤ d(D) + 1.
Similarly, it is easy to see that if E ≥ D, i.e., E − D = H ≥ 0, then
dim(L(E)/L(D)) = ℓ(E) − ℓ(D) ≤ d(E) − d(D), i.e.,

ℓ(E) − d(E) ≤ ℓ(D) − d(D).

In particular, if E = D + P for some rational point P, then we have

ℓ(E) ≤ ℓ(D) + 1.

Proposition 4.28. We always have ℓ(E) − d(E) ≤ ℓ(D) − d(D) for any
two divisors E and D with E ≥ D.

Proof. Chevalley p. 21 [10]. □


We have the following Riemann's theorem. Note that our notations
differ from those of [10].

Proposition 4.29 (Riemann's theorem). Let X be a given projective
curve and D any divisor. Then, there is a minimal non-negative integer g,
which will be called the genus of X, such that

ℓ(D) ≥ d(D) + 1 − g.

Moreover, if d(D) ≥ 2g − 1, then

ℓ(D) = d(D) + 1 − g.

Proof. Chevalley p. 22 [10]. 

Proposition 4.30. If d(D) < 0, then ℓ(D) = 0.

Proof. Suppose there is an 0 ≠ f ∈ L(D). Then, we have (f) + D ≥ 0, so
0 ≤ d((f) + D) = d((f)) + d(D) = d(D) < 0. A contradiction. □

4.6.2. Plücker's Formula

Note that the genus does not change under a separable field extension of
the ground field K (cf. [10] Theorem 5, p. 99). We may extend the ground
field K to its algebraic closure Ω without changing the genus. Hence, we
shall state the following classical Plücker formula without the restriction
that the ground field be algebraically closed.

Proposition 4.31 (Plücker's formula). Let C be a smooth plane curve
of degree n. Then, the genus g of C is given by the following formula:

g = (n − 1)(n − 2)/2. □
Remark: If the curve C has only ordinary singularities P_i with
multiplicities {m_i} (i.e., singularities with m_i distinct tangent lines),
then the genus is

g = (n − 1)(n − 2)/2 − Σ_i m_i(m_i − 1)/2.

It is called the extended Plücker formula. □
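Both formulas are elementary to compute; a small sketch (an illustration,
not part of the original text):

    def genus_smooth(n):
        # Plucker's formula for a smooth plane curve of degree n.
        return (n - 1) * (n - 2) // 2

    def genus_ordinary(n, mults):
        # Extended Plucker formula: subtract m(m-1)/2 for each
        # ordinary singularity of multiplicity m.
        return genus_smooth(n) - sum(m * (m - 1) // 2 for m in mults)

    print([genus_smooth(n) for n in (1, 2, 3, 4, 5)])   # [0, 0, 1, 3, 6]
    print(genus_ordinary(4, [2]))                       # 2, cf. Example 22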

Example 20: Let us consider the elliptic curve of Example 19, with n = 3.
It is smooth, so by Plücker's formula, its genus g is 1. It follows from
Riemann's theorem that ℓ(D) ≥ d(D) + 1 − g = n for D = nP_∞, which
has been verified by direct computation. □

Example 21: Let us consider the Fermat curve x_1^3 + x_2^3 + x_0^3 = 0
over F_{2^m}. By previous discussions, we know it is smooth. By Plücker's
formula, its genus g is 1. It follows from Riemann's theorem that ℓ(D) ≥
d(D) always. Let us verify Riemann's theorem for some special divisors.
Let P_∞ = (0, 1, 1). Let us consider D = nP_∞ for some non-negative
integer n. Note that d(D) = n. Let us make a projective transformation π:

(1) π(x_1) = y_1 + y_2,
(2) π(x_2) = y_2,
(3) π(x_0) = y_0.

Then, the defining equation becomes y_1^3 + y_2 y_1^2 + y_2^2 y_1 + y_0^3.
Let us consider the affine part defined by setting y_1 = 1, with x = y_0,
y = y_2. The equation becomes

y^2 + y = x^3 + 1.

The function field F(C) is of degree 2 over F_{2^m}(x). In general, the
functions in F(C) are of the following form:

f(x, y) = (g_0(x) + g_1(x) y)/h(x).

We may assume that (h(x), g_0(x), g_1(x)) = (1). We make the further
assumption that h(x) splits completely (otherwise, go to a finite extension
of K, or assume the field is algebraically closed). Suppose that f(x, y) has
no pole at finite distance; we claim that h(x) is a non-zero constant.
If not, we show that f has a pole at finite distance, and thus f ∉
L(nP_∞). Consider any non-constant factor x − β of h(x). The intersection
of the line x = β with the curve C consists of two distinct points P_1 =
(β, y_1) and P_2 = (β, y_2) (corresponding to two distinct values, y_1, y_2,
of y on C with x = β, which can always be achieved if we go to an
algebraic extension of K). If the numerator of f(x, y) were zero at both
points, then we would have

g_0(β) + y_1 g_1(β) = 0,
g_0(β) + y_2 g_1(β) = 0.

If we treat y_1, y_2 as numbers and g_0(β), g_1(β) as unknowns, then,
since y_1 ≠ y_2, we conclude that

g_0(β) = 0,
g_1(β) = 0,

which contradicts our assumption that there is no common factor of
h(x), g_0(x), g_1(x)! Our claim is thus proved.
At P_∞ = (0, 0, 1), we set u = y_0/y_2 = x y^{−1}, v = y_1/y_2 = y^{−1}.
The defining equation becomes

v^2 + v = u^3 + v^3.

It is easy to see that ord_{P_∞}(v) = 3 and ord_{P_∞}(u) = 1. Hence,
ord_{P_∞}(y) = −3 and ord_{P_∞}(x) = −2. Therefore, a function of the
form g_0(x) + g_1(x) y, with i = deg(g_0) and j = deg(g_1), will have order
−2i or −2j − 3 at P_∞. It is easy to see that all functions with at most n
poles at P_∞ form a vector space of dimension n, for n ≥ 1. □

Example 22: Let us consider the curve x^2 − y^2 + x^3 + y^4 = 0 defined
over the complex numbers C. By the Jacobian criterion, the origin is the
only singularity, an ordinary double point. Hence, by the extended Plücker
formula, which works for such singular curves, its genus g is
(4 − 1)(4 − 2)/2 − 1 = 2 (cf. [15]).
From the above, we easily conclude that a smooth projective plane
curve of degree n is of genus (n − 1)(n − 2)/2. Therefore, for n =
1, 2, 3, 4, . . ., the genera are 0, 0, 1, 3, . . .. Since 2 does not occur in this
list, if there is a curve of genus 2, then there must be some singular points
in any of its planar models. □

If we consider all projective algebraic curves, the coarsest classification
is by their genera, i.e., two curves are in the same class if their genera
are the same. The genus is the discrete parameter in the classification of
smooth projective algebraic curves. If the ground field is the complex
number field C, then having the same genus simply means that the
underlying topological spaces are the same. However, analytically, the
structures can still be different. Thus, we have a further fine parameter
for the projective algebraic curves over the complex field C with the same
genus, namely, the variety of moduli of curves of genus g, which is a point
if g = 0, has dimension 1 if g = 1, and has dimension 3g − 3 if g ≥ 2.

4.6.3. Rational Curve


A curve C is said to be a rational curve if the rational function field of C
is isomorphic to K(x). We have the following proposition.

Proposition 4.32. Let F be the algebraic function field of one variable
of a curve C. Then, F ≅ K(x) ⇔ C has a rational point and g = 0.

Proof. Chevalley p. 23 [10]. 


Example 23: Let C be a plane curve over K = F_q(u, v), where u, v are
indeterminates, defined by

u x^m + v y^m = 1,

with m ≥ 2 and p ∤ m. It follows from Plücker's formula that the genus
of C is (m − 1)(m − 2)/2. We claim that there is no rational point. If
there is one, let its coordinates be (g(u, v)/f(u, v), h(u, v)/f(u, v)), where
f(u, v), g(u, v), h(u, v) are polynomials in u, v; we may further assume that
there is no common factor among all three polynomials. Then, the above
equation can be rewritten as

u g(u, v)^m + v h(u, v)^m = f(u, v)^m.

Let deg(f(u, v)) = a, deg(g(u, v)) = b, deg(h(u, v)) = c. Then, the degrees
of those three terms are 1 + mb, 1 + mc, ma, respectively, and the two
terms of highest degree must have cancelling highest forms. Since 1 + mb ≡
1 + mc ≡ 1 (mod m) while ma ≡ 0 (mod m), the only possibility is that
the highest forms of u g(u, v)^m and v h(u, v)^m cancel each other; but
the multiplicity of the factor u in the first is ≡ 1 (mod m) and in the
second is ≡ 0 (mod m), so the terms cannot cancel out. This is impossible.
Since there is no rational point, for m = 2, the curve C has genus g = 0
but is not a projective line. □

Exercises

(1) Show that the curve defined over F_{p^2} by the equation

a x^2 + b y^2 = 1

is birationally equivalent (cf. Definition 4.7) to a projective line for
any 0 ≠ a, b ∈ F_{p^2}, where p is an odd prime number.
(2) Finish the arguments of Example 22.
(3) Finish the arguments of Example 23.
(4) Prove Riemann's theorem for P^1_K directly.
(5) Given ch(K) > 2 and the curve C defined by x^2 − y^2 + x^3 + y^4 = 0,
let P be the point at infinity. Find L(P), L(2P), L(3P).

4.7. Riemann–Roch Theorem I

Note that Riemann's theorem is an inequality; there is a number missing
for it to be an equality. Roch, a student of chemistry who had a Ph.D. in
electromagnetism and worked under Riemann, found a natural explanation
of the missing number. Riemann passed away on July 20, 1866, at the age
of 40, and Roch, a few months later, died at the age of 26. Their work
has been generalized to functions of one variable over a finite field, which
is useful for coding theory. First, we have to introduce the abstract concept
of differentials. We shall follow Chevalley [10] in our treatment of the
Riemann–Roch theorem; it gives us a fast way of proving the theorem. We
quote [10] to show that the abstractly defined differential of the next
subsection is really the ordinary differential of classical analysis. Classically,
we have the following interesting remark.
Remark: Let us consider the complex case, where K = C, the field of
complex numbers. Let ω = f(x)dx be any classically named differential
and g(x) be any meromorphic function locally defined at a point P_j. Then,
we may treat ω as a linear functional on the meromorphic functions g(x)
locally defined at P_j, given by the (Cauchy) residue of g(x)f(x)dx at the
point P_j:

ω(g(x)) = (1/2πi) ∮_Γ g(x)f(x)dx
        = the residue of g(x)f(x)dx at the point P_j,

where Γ is a simple loop around P_j such that g(x)f(x)dx has no pole
on the loop nor other poles inside the loop, and i = √−1 as usual in
complex analysis. Note that ω(g(x)) ∈ C, and ω can be considered as a
linear functional in g(x). Furthermore, if we take the loop small enough so
that there is no pole inside, then the integral is 0. If we take the same
loop with the reverse direction, it encircles all the remaining poles on the
sphere, and the integration is still 0, which means that
Σ_{P_j} res_{P_j}(g(x)f(x)dx) = −0 = 0 for a global rational function
g(x). That is the important classical Cauchy theorem: for a rational
differential h(x)dx,

Σ_{P_j} res_{P_j}(h(x)dx) = 0.
Therefore, we may treat differentials as functionals on repartitions (see
Definition 4.32). Furthermore, the residue which was defined by integration
can be defined pure algebraically as with x a local uniformizing parameter
and given the (meromorphic) expression of h(x) = g(x)f (x) as


h(x) = aj xj .
j=−m
Algebraic Geometry 119

Then, we may simply define residue as a−1 and disregard the integration.
Certainly, we have to prove the residue thus defined is algebraically
sound. 

4.7.1. Repartitions and Differentials

In the above remark, we use an integration theory. In our present situation,


we are without the usage of integrations. Weil found the correct algebraic
way of treating differentials. We define repartitions and differentials for the
general cases as follows (Chevalley [10]).

Definition 4.33. A repartition ξ is an assignment to every place P a


function ξP such that there are only finitely many places P with ordP (ξP ) <
0. The collection of all repartitions is denoted by Ξ. Note that a rational
function f has only finitely many poles Pj . It is clear that we may assign the
rational function f to all places Pj . It is called the constant repartitions. The
collection of constant repartitions is isomorphic to F and will be denoted by
the same symbol F. Given any divisor D, we define Ξ(D) as the collection
of all repartition ξ such that

ordP (ξP ) + ordP (D) ≥ 0

for all P. 

We give the following general definition, (cf. [10] p. 30).

Definition 4.34. A differential ω is a (K-)linear functional on Ξ such that


there exists a divisor D with ω vanishing identically on Ξ(−D) + F, where
Ξ(−D) is defined in the previous definition. If D is any divisor which has
the stated property, then we say the differential ω is a multiple of D, and
we write ω ≡ 0( mod (D)) and ω ∈ Ω(D).
Let us define the divisor δ(ω) of a differential ω as the maximal D
such that ω ∈ Ω(D). Clearly, such a divisor exists and is unique (cf. [10]
p. 32. Note that Chevalley used an with a an ideal and considered an
maximal). We may reformulate Ω(D) = {ω : δ(ω)  D}. We define the
degree (ω) = (δ(ω)) and d(ω) as d(δ(ω)), the divisor of a differential is
called a canonical divisor.
Furthermore, we name i(D) the index of D to denote the dimension of
the vector space Ω(D). 
120 Introduction to Algebraic Coding Theory

4.7.2. Riemann–Roch Theorem


The above definitions are abstract and general. They provide a short proof
for a form of the Riemann–Roch theorem we need.
Let R be the field of algebraic functions of one variable of genus g, and
let D be a divisor of R. We have the following Riemann–Roch theorem.
Proposition 4.35 (Riemann–Roch theorem). Let us assume the
preceding paragraph. We have
(D) − i(D) = d(D) + 1 − g.

Proof. Chevalley, p. 30 [10]. 

4.7.3. Canonical Class


Let F be a function field of one variable over a perfect field K. Let ω = 0
be a differential of F over K. Then, we have the following proposition.

Proposition 4.36. The vector space of all differentials can be written


as Fω.

Proof. Let ω  ∈ Fω. Then, ω  is a differential (see the paragraph before


Theorem 5 on p. 31 of [10]). Furthermore, we refer to Theorem 5 on p. 31
of [10] to see that any differential ω  can be written as xω with x ∈ F. 
We have the following proposition about the important numbers d(ω)
and (ω).

Proposition 4.37. Let W be a canonical divisor. Then, we have the


following: (1) All canonical divisors W are linearly equivalent. (2) d(W) =
2g − 2. (3) (W) = g, where g is the genus of the curve C.

Proof. (1) It is easily provable. (2) Chevalley, p. 32 [10]. (3) Chevalley,


p. 30 [10]. 
It follows from the preceding proposition that if we know one non-zero
differential ω, then we know Fω and all differentials. What we plan to do in
the next section is to show that we can define dx as a non-zero differential.

4.7.4. Residue
In the example at the beginning of this section, we have the important
concept of residue of Cauchy over complex numbers C. We want to
Algebraic Geometry 121

generalize the concept to the cases of finite fields. Algebraically, we define


the residue field of a valuation O as O/P, where P is the only maximal
ideal of O. In our present situation, we have K a perfect field, hence O/P is
separable over K. Any differential ω when restricted to a place P will induce
a linear map on O/P as a finite-dimensional vector space over K. It can be
proved that any linear map of O/P can be represented as a :→ trace(a)σ,
where σ ∈ O/P is uniquely determined by the map (see [10], Lemma 1,
p. 48). We define the residue of ω to the unique element σ. Although the
definition of residue looks strange, it is really the generalization of Cauchy’s
definition by integration! However, we use the residues in the coding theory
only for rational points of the curve where the computations of residues of
differentials are simple (see Proposition 4.42).
Similar to the classical analysis, we have the following interesting
proposition.

Proposition 4.38. Let ω be a differential with poles {Pj }, then we have



resPj (ω) = 0.

Proof. Chevalley, p. 49 [10]. 


Proposition 4.39 (The existence theorem for differentials). Let

P1 , . . . , Pm be finitely many distinct rational points. Let D = Pi . Given
elements ri , . . . , rm in K such that

ri = 0.
i

Then there exists a differential ω of F with the divisor (ω) ∈ Ω(D) such
that resPi ω = ri .

Proof. Chevalley, p. 50 [10]. 


Remark: If there is another rational point P outside the set {P1 , . . . , Pm },

then we may push the residue to P , i.e., define r = − i ri , then the
 
condition r + i ri = 0 is automatically satisfied. Let D = Pi . Note
that Ω(D + P ) ⊂ Ω(D). 

Exercises
1
(1) Prove the existence theorem for differential for PK directly.
1
(2) Prove the approximation theorem for functions for PK directly.
122 Introduction to Algebraic Coding Theory

4.8. Riemann–Roch Theorem II

4.8.1. The differential dx for K(x) and F


From the Proposition 4.35 of the last section, we know that it suffices to
find just one non-zero differential ω, then we know all other differentials.
For this purpose, let us select x ∈ F such that F is finite separable over
K(x) (it can be done by Noether’s theorem). We shall first select a special
differential dx for K(x) and then extend dx to F. We have the following
proposition.

Proposition 4.40. Let us consider the field K(x) as selected in the


preceding paragraph. Let P∞ be the place at ∞ and Dj = −jP∞ . We
have (Dj ) = 0 for j = 1, 2 and i(D1 ) = 0, i(D2 ) = 1.

Proof. Since the degrees of Dj are negative, it follows from Proposition


4.29 that we have (Dj ) = 0. The second part of the proposition follows
from Riemann–Roch theorem since g = 0. 
Let us use the material on p. 102 in [10] for the following proposition.

Proposition 4.41. There is a unique differential ω ∈ Ω(D2 ) which has


value −1 on ξ = x1 at P∞ . 

We shall denote the unique differential by dx, or more precisely by


dxK(x) . Consider F ⊃ K(x) and a separable extension of K(x). We shall
use the usual method to extend dxK(x) to dxF . We have
dxF = cotraceK(x)/F (dxK(x) ),
where cotrace is defined on p. 105 in [10] as
cotraceK(x)/F (ω)(ξ) = ω(traceF/K(x) (ξ)),
while trace is the usual one. We may abuse the notation and use dx for the
field dxF . The only thing we have to know is the value of (dx)y = ydx at a
place P; we define it to be the residue of ydx at P (see the following).

4.8.2. Complete Local Ring


We shall quote the following theorem of Cohen (cf. [16] p. 304).

Proposition 4.42. An equicharacteristic complete local ring O admits a


field of representatives. 
Algebraic Geometry 123

The above proposition means that in our situation, a local ring O (we
only consider valuation rings) contains a perfect field K, hence the residue
map O → O/P = O induces an isomorphic on K. So, the characteristic of
K equal to the characteristic of O. This is the equicharacteristic case, and
O is K [[t]], where K is the field of representative. We have the following
proposition which will be useful for algebraic coding theory.

Proposition 4.43. Let C be a smooth projective curve, and let P be a


rational point, i.e., (O/P=K), and uniformization parameter t ∈ R and
xdy be a (classical) differential. Let ydx be expressed as
∞ 

j
ydx = cj t dt ∈ K((t))dt cr = 0.
r

Then, we have the following: (1) μp (ydx) = r; (2) resP (ydx) = c−1 .

Proof. Statement (1) follows from definition, and for (2), see Chevalley,
p. 110 corollary of Theorem 6 [10]. 
Remark: In our applications to coding theory, we restrict the above
proposition to the simple cases that the differentials only have simple poles,
i.e., r ≥ −1.
Note that in the classical complex case, every place is rational; this
result shows that this residue matches Cauchy residue.
Example 24: Let us consider the curve C = PC1 , where C is the field of
the complex numbers.
(1) We shall make computations according to Riemann–Roch theorem.
Let D = mP∞ , where m is an integer ≥ −1. Since the genus g = 0, we
have d(mP) ≥ 2g − 1 = −1; it follows from Proposition 4.28 (Riemann’s
theorem) that (mP) = d(mP) + 1 − g. Hence, it follows from Proposition
4.34 (Riemann–Roch theorem) that i(mP) = 0, i.e., there is no differential
ω with divisor δ(ω)  mP∞ . Let us consider D = −mP∞ , where m ≥ 2
is an integer. Let the regular function ring of PC1 \{0} be C[x]. Then, it
follows from Proposition 4.29 that (−mP) = 0. Further, it follows from
Proposition 4.34 (Riemann–Roch theorem) that
−i(−mP) = d(−mP) + 1 − g,
and hence, we have
i(−mP) = m − 1.
124 Introduction to Algebraic Coding Theory

Let fg(x)
(x)
dx ∈ Ω(−mP). Since the differential has no pole in A1C , g(x) must
be a non-zero constant; we may assume it is 1, i.e., the differential is
f (x)dx. It is easy to check that the vector space Ω(−mP) is generated
by {dx, xdx, . . . , xm−2 dx}.
(2) We wish to show that the differentials in the classical sense are
equivalent to linear functions on Ξ. First, we generalize our previous
concepts as follows.
Let f (x)dx be a differential in the classical sense. Let vp = the order of

f (x)dx at point P, i.e., f (x)dx = ( ∞ j
i=vp aj t )dt, where avp = 0 and t is a
 
uniformization parameter. Then, (f (x)dx) = vp P. Let D = j nj Pj
be a divisor. Then, we have f (x)dx ∈ Ω(D) ⇐⇒ (f (x)dx)  D. We
give a concrete argument for (2) directly. We separate the discussions
into the following two steps: (A) Every differential in the classical sense
induces a differential in the Weil’s sense, and this induction is a one to one
correspondence. (B) The induced map is a bijection between Ω(D) and the
linear functionals vanishing on Ξ(D) + F.
Proof of (A): We show that every differential in the classical sense can be
viewed as a (K-)linear function on Ξ which vanishes on Ξ(D) + F for some
suitable divisor D. Let ω = f (x)dx be a differential in the classical sense.
Then, for any repartition ξ and any point P ∈ PC1 (note that all points are
rational), let ξp be the function element specified by ξ at the point P, we
may define
ωP (ξ) = the residue of ξp ω at P.
Since for the above residue to be non-zero at P, either ω must have a pole
or ξP must have a pole, and both sets are finite, therefore there are only
finitely many P for the above residue to be non-zeroes, we may define

ω(ξ) = ωP (ξ).
P

From the above, we conclude that ω defines a linear function on Ξ.


It is clear that for any constant repartition ξ = g(x) ∈ F, ω(ξ) =

res(g(x)f (x)dx) = 0.
For any differential f (x)dx in the classical sense, let (f (x)dx) =
  
n P − j mj Pj , where ni , mj are non-negative; let D = i ni Pi −
i i i
j mj Pj . It is easy to see that (f (x)dx) = D, (f (x)dx)  D and
(f (x)dx) ∈ Ω(D). Furthermore, ωP (ξ) = 0 for all P, ξ ∈ Ξ(D). Therefore, ω
vanishes on Ξ(D)+F. We define a map from the differentials in the classical
sense ∈ Ω(D) to elements in the new sense which vanish on Ξ(D) + F.
Algebraic Geometry 125

Moreover, we claim that any non-zero differential ω = f (x)dx in the


classical sense cannot induce a zero-linear function. Let a be any point that
is neither a zero nor a pole of f (x). Consider a repartition ξ defined by
ξ = (x − a)−1 at P = (x − a) and 0 elsewhere. Then, ω(ξ) = f (a) = 0.
Proof of (B): We wish to show that the induced residue map is onto. We
observe that the classical differential ω ∈ Ω(D) will vanish on Ξ(D) + F for
any given divisor D. Therefore, the induced map by the residue will send
Ω(D) to a subspace of all linear functionals which vanish on Ξ(D) + F.
We wish to show that they have the same dimensions. Then, clearly it will
imply that the induced residue map is onto. We shall count the number of
linear independent classical differentials in Ω(−D) for any divisor D. We
claim that the number is max{0, d(D) − 1}.
Let −D = DA + D∞ , where DA is the affine part of −D and D∞ is

the part of −D at ∞. Consider DA = nj Pj with Pi = (x − aj ) and
 −nj
D∞ = mP∞ . Let h(x) = j (x − aj ) and f (x)dx ∈ Ω(−D), i.e.,

(f (x)dx)  −D.
 
Then, (f (x)dx) = j nj Pj + m P∞ + k k Pk with nj ≥ nj , m ≥ m
and k ≥ 0. We see that h(x)f (x) has no poles at finite distance, and thus,

it is a polynomial g(x) in x. Therefore (g(x)dx) is (nj − nj )Pi + (m +

( j nj )P∞ . Let t = x−1 be a uniformization parameter at ∞. We have
dx = −t−2 dt. Then, it follows that at the point ∞,
⎛ ⎞ ⎛ ⎞
 
v∞ (h(x)f (x)dx) = m + ⎝ nj ⎠ ≥ m + ⎝ nj ⎠ = −d(D),
j j

v∞ (g(x)) ≥ 2 − d(D).

It means that g(x) is a polynomial of degree ≤ d(D) − 2. The set of such


g(x) forms a vector space of dimension d(D) − 1 or 0.
We claim that the number of linear-independent linear functions on Ξ
which vanish on Ξ(−D) + F is max{0, d(D) − 1}.
Let us discuss a simple case, say, D = P = P∞ . Let ξ be any repartition.
It is easy to see that there is a rational function f (x) such that ξ − f (x) has
no pole. Let ξP (P) = a which may be 0. We shall consider ξ − f (x) − a,
then this repartition has a zero at P; therefore, it is in Ξ(−P) and
ξ ∈ Ξ(−P) + F. We conclude that a linear function which vanishes on
Ξ(−P) + F must vanish on any repartition ξ, which means the differential
which vanishes on Ξ(−P) + F must be the zero-linear function. Our claim is
126 Introduction to Algebraic Coding Theory

proved in this case. For the general cases, the reader is referred to Chevalley,
pp. 26–30 [10] or Exercise 4.
According to (1), (2), (3), differentials in the classical sense are the linear
functions which vanish on Ξ(D) + F. 
1
Example 25: Let C = PK with ch(K) > 2. Then, F (C) = K(x). Let
ω = (x2 +x3 )d(x2 )+dx be a differential. It is easy to see that ω = (1+2x3 +
2x4 )dx. At any point Pa = ((x − a)), the local ring at Pa is R = K[x](x−a)
with maximal ideal (x−a)R, where (x−a) is the uniformization parameter.
Then, ω = (1 + 2(x − a + a)3 + 2(x − a + a)4 )d(x − a) = f (x − a)d(x − a).
We have ordPa (ω) = ordPa (f (x − a)). We have one more point P∞ at ∞.
The local ring at P∞ is K[τ ](τ ) , where τ = x1 . The differential ω =
(1+2τ −3 +2τ −4 )d(τ −1 ) = −(τ −2 +2τ −5 +2τ −6 )dτ ) = −τ −6 (2+2τ +τ 4 )dτ .
So, we have that μP∞ (ω) = ordP∞ (−(τ −2 + 2τ −5 + 2τ −6 ) = −6. 
Example 26: In the definition of residue, we have to use the trace function
which can be illustrated by examples. If the point is not rational, then we
1
have to consider the trace function. Let us consider the curve PK and the
following differential:
2xdx
η= .
x2 + 1
Let the ground field K be C the complex field. By partial fraction, we have

with i = −1,

1 1
η= + dx.
x+i x−i
It is easy to see that the residues at (x + i), (x − i) are 1, 1, respectively.
At P∞ with uniformization parameter t = x−1 and x2 + 1 is an unit, η
becomes

η = (−2t−1 + · · · )dt.

It is easy to see that resPj η = 0. However, let us consider the case that
the ground field K = R the real field. At ∞, it has a residue −2 as before.
There is only one point P = (x2 + 1) at finite distant with a pole. Let
t = x2 + 1 be a uniformization parameter at this point. Then, we have

η = t−1 dt.

The coefficient of t−1 term is 1. However, the point is not rational. Its
residue will be defined as the trace of 1, note that 1 · 1 = 1 · 1 + 0 · i and
Algebraic Geometry 127

1 · i = 0 · 1 + 1 · i. It is easy to deduce that


resP (η) = T rC/R (I) = 2,

where T rC/R is the trace operation from C to R. We still have resPi η = 0.
Let us consider another differential ω defined as
2dx
ω= 2 .
x +1
Let the ground field K be C the complex field. By partial fraction, we have

with i = −1,

i −i
ω= + dx.
x+i x−i
It is easy to see that the residues at (x + i), (x − i) are i, −i, respectively.
At P∞ with uniformization parameter t = x−1 , dt = −x−2 dx + · · · . The
differential ω becomes
ω = (−2 + · · · )dt.

Hence, the residue is 0 at ∞. It is easy to see that resPj ω = 0. However,
let us consider the case that the ground field K = R the real field. At ∞, it
has a residue 0. There is only one point P = (x2 + 1) at finite distant with
a pole. Let t = x2 + 1 be a uniformization parameter at this point. Then

we have Example 7 in Section 4.3 where we show that x = i 1 − t in the
complete local ring C((t)):
ω = (x(t−1 ))dt = (it−1 + · · · )dt.
The coefficient of t−1 is i. Its residue will be defined as the trace of i(=

−1). Note that i · 1 = 0 · 1 + 1 · i and i · i = −1 · 1 + 0 · i; it is easy to
deduce
resP (ω) = T rC/R (i) = 0 + 0 = 0,

where T rC/R is the trace operation from C to R. We still have resPj η = 0.

1
Example 27: Let us consider the algebraic curve PK . Let η = f (x)dx =
r(x)
s(x) dx be any non-zero differential. For simplicity, we assume that s(x) can
be factored into linear polynomials (otherwise, go to a finite field extension
of K). By the theory of partial fractions, we have
 cjk
η = (u(x) + )dx.
(x − aj )k
j,k,aj
128 Introduction to Algebraic Coding Theory


The sum of all residues at finite distance is clearly cj1 . We shall show
that the residue at P∞ is its negative. Since the residue map is linear,
it suffices to check every term in the above formula. Note that at P∞ , a
uniformizing parameter is x−1 = t. Then, we have

u(x)dx = −t−2 u(t−1 )dt,


 
cjk −2 cjk tk
dx = −t dt, if k ≥ 2,
(x − aj )k (1 − aj t)k
 
cj1 −2 cj1 t
dx = −t dt.
(x − aj ) (1 − aj t)

Clearly, the residue at P∞ is − cj1 and combining with our previous

result that the sum of all residues at finite distance is clearly cj1 , we

have respj (η) = 0. 
Example 28: Let us consider Example 19 of the preceding section. As
usual, we assume that the field K is not of characteristic 2. We have an
equation which defines a projective curve C of genus 1 with a = 0, 1:

y 2 z = (x − az)(x − z)(x − 0z).

Let us dehomogenize the above equation by set z = 1; the equation becomes


y 2 = (x−a)(x−1)(x−0). Let us consider the differentials dx, dy. By implicit
differentiation, we have

3x2 − 2(1 + a)x + a
dy = dx.
2y
Therefore, dx and dy are linearly equivalent. Let us consider the differential
dx. We consider the curve C at finite distance as a covering of the affine line
A1K . At finite distance of the affine line A1K , let us consider the following:(1)
a rational point (x−β) with β = 0, 1, a; (2) a rational point with β = 0, 1, a;
(3) non-rational point f (x), where f (x) is a monic irreducible polynomial
of degree > 1 as follows.

(1) Our defining equation provides y = 0 and y is a uniformization


parameter. Then, we have x = y 2 +· · · and dx = (2y+· · · )dy. Therefore,
dx has a 0 of order 1 at 
these three points.
(2) The value of y must be ± (β − a)(β − 1)(β) = ±c, where c may or may
not be in K. We separate the discussions into following two subcases:
(A) Suppose that 0 = c ∈ K. Then, x−β is a uniformization parameter.
Since dx = d(x − β), then there is neither a zero nor a pole.
Algebraic Geometry 129

(B) Suppose that c ∈ K. Then, x − β is a uniformization parameter.


Since dx = d(x − β), then there is neither zero nor pole.
(3) Note that with our basic assumption that the field K is perfect,
f (x), f  (x) have no common root. Clearly, we have df (x) = f  (x)dx or
dx = ( f 1(x) )df (x). It can be shown that f (x) is a local uniformization
parameter in K[x](f (x)) and at the corresponding point(s) on the curve
C. Since the residue of ( f 1(x) ) is not zero modulo (f (x)), then there is
neither a zero nor a pole.

Let us consider the point at ∞. As pointed out in Example 19 in


Section 4.6, x = t−2 for a uniformization parameter t. Therefore, dx =
−2t−3 dt, i.e., it has three poles at ∞. We conclude that (dx) = P0 + P1 +
Pa − 3P∞ and degree of ((dx)) = 0 = 2g − 2 and g = 1. 

Proposition 4.44. We always have


i(D) = (W − D),
where W is a canonical divisor.

Proof. Let W=(f dx). We have


g ∈ L(W − D) ⇔
(g) + (W − D)  0 ⇔
(g) + (W)  D ⇔
(g) + (f dx)  D ⇔
gf dx ∈ Ω(D).
Therefore, there is a bijective map from L(W − D) to Ω(D). They must
be of the same dimension. Our proposition follows from the definitions of
(W − D) (cf. Definition 4.23) and of i(D) (cf. Definition 4.33). 
Example 29: Let us use the notations of Example 28. Let D = P0 +
P1 + Pa − 3P∞ . Certainly, we know that (dx)  D. Let f dx be another
differential such that (f dx)  D. Then clearly, we have (f )  0. Therefore,
f = const. Hence, i(D) = 1. Let us consider another example. Let D =
nP∞ , where n is a positive integer. Note that d(W − D ) < 0. Therefore,
0 = (W − D ) = i(D ). 

Proposition 4.45. If d(D) > 2g − 2, then i(D) = 0.

Proof. Clearly, d(W − D) < 0, and 0 = (W − D) = i(D). 


130 Introduction to Algebraic Coding Theory

In the previous discussions, sometimes we extend the ground field from


K to a separable extension K . We have the following proposition.

Proposition 4.46. If K is a separable extension of K (which is always


true for K a perfect field), then there is no change of genus g for the ground
field extension.

Proof. Chevalley, p. 99 [10]. 


The above proposition tells us that when we compute the genus, we
may extend the ground field to the algebraic closure of K and use the
corresponding result about genus from algebraic geometry, say, using the
Proposition 4.30 (Plücker’s formula).
Example 30: Let us consider Example 19 of the preceding section. We
have an equation which defines a curve C of genus 1 with a = 0, 1:
y 2 = (x − a)(x − 1)(x − 0).
Let us use the notations of Example 29. According to the Riemann–Roch
theorem, we have
(D) − i(D) = 0.
Since i(D) = 1, so (D) = 1. A generator of L(D) is y1 . Let us consider D =
nP∞ . Then, it follows from the Riemann–Roch theorem that (D ) = n
since we have i(D ) = 0 and g = 1. Clearly, the set {xk y j : j = 0, 1, and 2k+
3j ≤ n} forms a basis for L(D). 

Exercises

(1) Given a smooth and regular curve C, the Weierstrass gaps is the integers
i such that there is no rational function z which has a pole only at P
and ordP (z) = −i. Show that there are precisely g Weierstrass gaps
where g is the genus of C.
(2) For Example 29, show that (D) = 1 directly.
1
(3) Let P be a rational point on PK . Show that i(P) = 0 and (P) = 2.
1
(4) Given the curve C = PK , find a differential f (x)dx with residues 1 at
x = 1 and −1 at x = −1 with 0 residues elsewhere.
(5) Given the curve PC1 , where C is the field of complex numbers and any
divisor D, show that the dimension of all linear functional on Ξ which
vanishes on Ξ(−D) + F is max(0, d(D)-1).
Algebraic Geometry 131

4.9. Weil’s Conjecture and Hasse–Weil Inequality

Given an smooth projective algebraic curve C in algebraic geometry coding


theory, it is important to know the number of rational points on the curve
over a finite field Fq . Let us extend the ground field Fq to Fqr . Let Nr be
the number of rational points of C̄ over Fqr . The Weil’s conjecture is a
conjecture about the zeta function of C which is related to Nr .

Definition 4.47. The zeta function Z(t) is defined as


∞ 
 tr
Z(t) = exp Nr .
r=1
r 

Clearly, we know the function Z(t) iff we know all Nr . We shall try to


locate Z(t).
Proposition 4.48 (Weil’s conjecture for curves). We have for a
smooth projective curve over a finite field Fq of genus g the following:
P1 (t)
Z(t) = ,
(1 − t)(1 − qt)
where

2g
P1 (t) = (1 − αi t),
i=1

with | αi |= q.

Proof. The conjecture was proved by Weil, see [40]. 


Remark: The Weil’s conjecture for higher dimensional projective algebraic
varieties is established by Deligne (1974). 
Example 31: Let us consider C = Pq1r . It is easy to see that C has q r
rational points on A1qr and one point extra (the point at ∞). So, there are
q r + 1 points and Nr = q r + 1. Substituting it in the definition of Z(t),
we have
∞ 
 tr
r
Z(t) = exp (q + 1)
1
r
∞  ∞ 
 t r  t r
= exp (q r ) exp (1) .
1
r 1
r
132 Introduction to Algebraic Coding Theory

Recall that
 ∞
 
1 xr
ln =− .
1−x 1
r

Therefore, we deduce that


1
Z(t) = .
(1 − t)(1 − qt) 

Proposition 4.49 (Hasse–Weil’s inequality). Let C be a smooth and


absolute irreducible projective curve. Then, we have

| N1 − q − 1 |≤ 2g q.

Proof. We may deduce a proof in the curves case from Hartshorne’s book
“Algebraic Geometry” V Ex 1.10, Appendix C Ex 5.7. 

Example 32: Let p ≥ 5 and q = pm . Then, the Fermat curve xq−1 0 +


xq−1
1 + xq−1
2 has no solution as a projective curve. Therefore, N 1 = 0, and
g = (q−2)(q−3) . The Hasse–Weil’s inequality is to show that (q − 2)(q −
√ 2 √
3) q ≥ q + 1. Let f (q) = (q − 2)(q − 3) q − q − 1. It is easy to see that
f (5) > 0 and f  (x) > 0 for all x ≥ 5. It is easy to see f (q) > 0 for all p ≥ 5
and Hasse–Weil’s inequality is satisfied in those cases. 
Example 33: Let us consider the smooth curve C defined by x30 + x31 + x32
over Fq . It is easy to see that genus g = 1. Let us take q = 2. Let us count
the number of rational points on C. Since x3i = xi , it is easy to see that
two of xi ’s must be 1 and the third is 0. So, we have three rational points.
Let q = 22 . Let us count the number of rational points on C. Note that
α3 = 1 for any non-zero α ∈F22 . It is impossible to have two of x0 , x1 , x2 to
be 0 since the third one must be 0 and (0, 0, 0) is not a projective point. It
is also impossible to have x0 , x1 , x2 all non-zero. Note that then x3i = 1 for
all i, we have the impossible situation of 1 + 1 + 1 = 0. Therefore, we must
have exactly one zero. Note that if several xi = 0, then we may assume
that one of xi = 1. Then, we have the following:

(1) x0 = 0, x1 = 1, x2 arbitrarily non-zero ∈ F22 . There are three points.


(2) x1 = 0, x2 = 1, x0 arbitrarily non-zero ∈ F22 . There are three points.
(3) x2 = 0, x0 = 1, x1 arbitrarily non-zero ∈ F22 . There are three points.

So, there are totally nine points. This makes the non-restrictive
inequality in Hasse–Weil’s inequality an equality.
Algebraic Geometry 133

Let q = 23 . We shall use brute force to compute the number of rational


points. It is impossible to have two of x0 , x1 , x2 to be 0 since the third one
must be 0 and (0, 0, 0) is not a projective point. So, we have the following
two cases: (1) two of them are non-zero and the third zero; (2) all of them
are non-zero.

Case (1) Let us consider x2 = 0, x1 = 1. Then, we have

x30 = 1.

The above equation is satisfied only if x0 = 1. Therefore, by taking x0 = 0


or x1 = 0, we conclude that there are three rational points.

Case (2) We may take x2 = 1. Let α be a generator of the multiplicative


group F2∗3 . Then, x7 = 1 for any non-zero x. If x31 = 1, then x1 = 1, and we
must have x0 = 0. This point has been counted. Therefore, we have that
x1 = 1, or x1 = α, α2 , . . . , α6 . We want to show that for every possible value
of x1 , we may find one and only one value for x0 to satisfy the equation.
Let us consider x1 = αj with 1 ≤ j ≤ 6. Let x0 x¯0 be two solutions with
fixed x1 = αj . Then, we have

1 + α3j = αk = αk+7 = αk+14 = x30 = x̄30 .

Then, it is easy to see that


3
x̄0
= 1.
x0
Therefore, x̄0 = x0 , so the solution must be unique. Moreover, 3 is a factor
of one of k, k + 7, k + 14; therefore, the equation can be solved.
So, there are totally nine rational points.
Let q = 24 . We shall use brute force to compute the number of rational
points. Let α, β be defined by the following equations:

α2 + α + 1 = 0,
β 2 + β + α = 0.

It is easy to see that F22 = F2 [α] and F24 = F22 [β]. Let us count the
number of rational points on C. It is impossible to have two of x0 , x1 , x2
to be 0 since the third one must be 0 and (0, 0, 0) is not a projective point.
So, we have the following two cases: (1) two of them are non-zero and the
third zero; (2) all of them are non-zero.
134 Introduction to Algebraic Coding Theory

Case (1) Let us consider x2 = 0, x1 = 1. Then, we have


x30 = 1.
The above equation is satisfied only if x0 ∈ F22 ⊂ F24 . There are 3 possible
points. Therefore, by taking x0 = 0 or x1 = 0, we conclude that there are
9 rational points.
Case (2) We may take x2 = 1. Then, the defining equation can be rewritten
as
x30 + 1 = x31 .
Let y = x30 = 0. Since x1 = 0, we have
y 5 = x15
0 = 1,

(y + 1)5 = x15
1 = 1.

It is easy to see that the above equations can be rewritten as


y 4 + y 3 + y 2 + y + 1 = 0,
y 4 + y + 1 = 0.
Then, y = 0 or 1 which are impossible. Therefore, there is no rational point
in this case. Totally, we have 9 rational points.
We may use the Weil’s Conjecture for the same purpose to determine
the numbers of points for all F2r once we know the number N1 = 3 and
N2 = 9. Let us compute N3 . Note that
∞ 
 tr
Z(t) = exp Nr ,
r=1
r

P1 (t)
Z(t) = ,
(1 − t)(1 − qt)
where

2g
P1 (t) = (1 − αi t) = (1 − αt)(1 − βt)
i=1

since g = 1. Then, we have


ln Z(t) = 3t + 9t2 /2 + N3 t3 /3 + · · · = ln (1 − αt) + ln (1 − βt)
− ln (1 − t) − ln (1 − 2t) = (1 + 2 + α + β)t
+ (1 + 22 + α2 + β 2 )/2t2 + (1 + 23 + α3 + β 3 )/3t3 + · · · .
Algebraic Geometry 135

Comparing coefficients of the powers of t, we have the following equations:

(1) 1 + 2 + α + β = 3,
(2) 1 + 4 − α2 − β 2 = 9,
(3) 1 + 23 − α3 + β 3 = N3 .

Solving the first two equations, we have


√ √
α = −2, β = − −2.

Substituting into equation (3), we conclude that N3 = 9. It checks with our


previous computation.
Note that Hasse proved the following formula for any genus 1 smooth
projective curve:

Nr = 1 + q r − αr − β r .

Our result checks with the formula. 


Example 34: We want to show that the zeta function is not determined
by the genus of the curve. Let us consider the following genus one curve:

y 2 = x(x − 1)(x − α)

over F22 , with α = 0, 1 in F22 . Then, F22 consists of 0, 1, α, α + 1 and α


satisfies the following equation:

α2 + α + 1 = 0.

Let x = 0, 1, α, then y = 0. Let x = α+1, then y = 1, we have another point


at ∞; therefore, there are five rational points. In the previous example,
there are nine points over F22 . Therefore, the zeta functions must be
different. 
Example 35: Let us consider a curve C defined by x50 + x51 + x52 = 0 over
F24 . Let us count the number of rational points on C. It is impossible to
have two of x0 , x1 , x2 to be 0 since the third one must be 0 and (0, 0, 0) is
not a projective point. Let us consider the case that there is exactly one
zero. Then, we have

(1) x0 = 0, x1 = 1, x52 = 1. Note that x5 + 1 | x15 + 1 by the field equation


for F24 , there are 5 distinct x2 satisfying x52 = 1. There are five points.
(2) x1 = 0, x0 = 1, similarly there are five points.
(3) x2 = 0, x1 = 1, similarly there are five points.
136 Introduction to Algebraic Coding Theory

Suppose that there is no zero among x0 , x1 , x2 . We may take x0 = 1.


Let x51 = a = 0, then x52 = 1 + a = 0. It means that a = 0, 1. Since
(x51 )3 = 1 = (x52 )3 , we must have a3 = (a + 1)3 = 1. Or a, a + 1 ∈ F22 , Let
F22 = F2 [α]. Then, (a, a + 1) = (α, α + 1) or (α + 1, α). For the equations
x51 = a, x52 = 1 + a, there are five distinct solutions for each, so there are 25
solutions for one pair (a, a + 1). There are 50 solutions. The total number
of solutions is 65. Note that g = 6 and | 65 − 6 − 1 | ≤ 2 × 6 × 8. The
Hasse–Weil’s inequality is satisfied. 

Example 36: For the later applications in coding theory, let us consider
the Klein quartics curve over F2 or F22 or F24 defined by the equation
x3 y + y 3 + x = 0. Its genus is g = 3.
For coding theory, let us count the number of rational points. It is clear
that x = 0 ⇔ y = 0. We denote this point (0, 0) (or (1, 0, 0) as the projective
point) by P. It is easy to see that (1, 1) is not a point. After we homogenize
the equation, it becomes x3 y + y 3 z + xz 3 = 0. Let z = 0. Then, the curve
has other two points at ∞, (0, 0, 1) = P1 and (0, 1, 0) = P2 over any
field K. So, there are three rational points over F2 :

P = (1, 0, 0), P1 = (0, 0, 1), P2 = (0, 1, 0).

Now, let us count the points at the finite distance. Over F22 , we have x3 = 1
for all x = 0. Therefore, the affine equation gets reduced to y + 1 + x = 0
if x, y = 0, i.e., y = 1 + x, where x = 1, 0. Let α be a field generator of F22
over F2 , i.e., α2 + α + 1 = 0. Then, (α, α + 1) = P3 , (α + 1, α) = P4 are
the extra rational points on the curve. Therefore, over F22 , there are five
rational points. The two extra points are

P3 = (α, α + 1), P4 = (α + 1, α).

Let us consider the field F24 . Let F24 = F22 [β] with β satisfying β 2 +β+
α = 0. First, we consider x = α. Note that α3 = 1. Therefore, the equation
will be reduced to y 3 + y + α = 0. It is easy to see that y 3 + y + α =
(y + α + 1)(y 2 + (α + 1)y + (α + 1)). We have to solve y 2 + (α + 1)y + (α +
1) = 0. The two roots are y = α2 (β + 1), α2 β. We name (α, α2 (β)) = P5 ,
(α, α2 (β + 1)) = P6 . Similarly, we find P7 = (α + 1, (α + 1)2 (β + α),
P8 = (α + 1, (α + 1)2 (β + α + 1)). Thus, we have four more points:

P5 = (α, αβ + β), P6 = (α, αβ + β + α + 1),


P7 = (α + 1, αβ + +α + 1), P8 = (α + 1, αβ + α).
Algebraic Geometry 137

We may consider the possibility that y = α, α + 1. Let y = α. Then,


the equation will reduce to αx3 + x + 1 = 0. Replacing x by αu, and the
equation becomes u3 + u + α + 1 = 0, then the equation is similar to the
one we just discussed. We conclude that there are four more points:
P9 = (αβ + β + 1, α), P10 = (αβ + β + α, α),
P11 = (αβ, α + 1), P12 = (αβ + α, α + 1).
Let us consider y = αβ. Then, P13 = (β, αβ). Let us consider y =
α(β + 1). Then, P14 = (β + 1, αβ + α). Let us consider y = αβ + β + 1.
Then, P15 = (α + β, αβ + β + 1). Let us consider y = αβ + β + α. Then,
P16 = (α + β + 1, αβ + β + α).
There are 17 rational points over F24 . 

Exercises

(1) Count the number of solutions of x2 + y 2 = 1 over Z2 . Do the same


over Z5 .
(2) Do not use the propositions of this section. Show that a projective curve
of genus 0 over Fq has at most q + 1 rational smooth points.
(3) Let {Cs } be a sequence of curves with genus gs and the number of
rational smooth point Ns . Show that if Ns → ∞ , then gs → ∞ .
(4) Show that the numbers of rational points for the projective curve in
Example 33 satisfy the Hasse–Weil’s inequality.
(5) Use Example 33 and Weil’s conjecture to compute Weil’s zeta function
for that particular projective curve.
(6) What is the number of rational points for the projective curve in
Example 33 over the field F25 ?
(7) What is the number of rational points for the projective curve in
Example 33 over the field F26 ?
(8) Find the zeta function of the curve in Example 34.
This page intentionally left blank
PART IV

Algebraic Geometric Codes


This page intentionally left blank
Chapter 5

Algebraic Curve Goppa Codes

5.1. Geometric Goppa Codes

We use the term the theory of smooth algebraic projective curves over a finite
field Fq for the theory of functions of one variable over a finite field Fq .
As we all know that after Shannon, the theory of coding theory were
separated into two parts: (1) theoretical part about the existence of good
codes, (2) decoding procedures. Note that for Hamming code, or more
general for any vector-space code, we have the following diagram:
π
message space Fkq −
→ word space Fnq ,
where the map π is an injective map with the image of the code space.
For the later developments, we slightly modify the coding theory to the
following diagram:
σ σ
message space Fkq −→
1 2
function space −→ word space Fnq .
The first map σ1 is injective. Thus, we use functions to rename the messages,
and the second map σ2 is a homomorphism with the image of σ2 σ1 (Fkq ) =
as the code space. Usually, the map σ2 is an evaluation map which evaluates
a function f at an ordered n-tuple of points (P1 , . . . , Pn ). Thus, it maps a
function f to [f (P1 ), . . . , f (Pn )] ∈ Fnq . Note that σ2 σ1 will send the message
space to the word space, certainly we do not want to send any non-zero
message to zero; thus, we require that the composition σ2 σ1 is an injective
map on the message space. In our previous discussions, the message space
is naturally mapped to the code space. The theorists are mainly working on

141
142 Introduction to Algebraic Coding Theory

the function space, and the engineers work on the methods to correct the
errors after the transmissions of the code words. For Reed–Solomon codes,
the function space is all polynomials with degree k − 1 or less, which is a
subspace of all polynomials with degree n − 1 or less (these pair of vector
spaces are mapped to a pair of a code space ⊂ the word space = Fnq by
evaluations at a sequence of points). For classical Goppa codes, the function
space is the set of rational functions of the following form:
n
 ci
f= ≡0 mod (g(x))
i=1
(x − γi )

and σ2 (f ) is the n-tuples [c1 , . . . , cn ].


In 1981, Goppa discovered an amazing connection between the theory
of algebraic smooth curves over a finite field Fq and the theory of error-
correcting block q-ary codes. He allowed the function space to be a subspace
of L(D) for some divisors D on the smooth curve. The idea is quite simple
and generalizes the well-known construction of Reed–Solomon codes and
the classical Goppa codes. The Reed–Solomon codes use polynomials in one
variable over Fq (the rational functions over P1Fq with only pole at ∞) (see
Example 1), and the classical Goppa codes use the residue form over PF1q
(see Example 3). Goppa generalized those ideas using rational functions
or differentials on an algebraic projective curve (these two versions are
equivalent, see later discussions). In coding theory, we have message space
Fkq , the k-dimensional code space, and the word space U = Fnq . Taking any
basis of the k-dimensional subspace of the function space, we may specify

a basis {φi } for the subspace and a map π([a1 , . . . , ak ]) = i ai φi from the
message space to the function space, and further, define the second map
which complete the coding process. In terms of linear algebra, what we
require is

image(σ1 ) ∩ ker(σ2 ) = {0}.

Note that if the dimension of image(σ1 ) is k, then the rate of information


is k/n (cf. Definition 1.27). Certainly, we want to consider the maximally
possible k, i.e., we shall thus maximize the message space by
requiring in the rest of the book that

image(σ1 ) ⊕ ker(σ2 ) = the function space.

This part of transforming messages to function space is easy. We shall


discuss the map σ2 of the function space to Fnq first.
Algebraic Curve Goppa Codes 143

Definition 5.1 (Algebraic geometric code or a geometric Goppa


code in function form C L (B, D)). Let C be an absolutely irreducible
smooth projective algebraic curve of genus g over Fq . Consider an (ordered)
set {P1 , P2 , . . . , Pn } of distinct Fq rational points on C and a divisor B =
n
i=1 Pi and a divisor D on C. For simplicity, let us assume that the
support of D is disjoint from the support of B; therefore, f (Pj ) = ∞ for
f ∈ L(D) for all j (i.e., L(D) is the function space). The linear evaluation
map EvB (i.e., σ2 ) will send the linear space L(D) of rational functions on
C (associated to D) to the word space Fnq .
EvB : L(D)→ Fnq .
f → (f (P1 ), . . . , f (Pn )). 

Remark 1: If we take D as mP for some rational point P, then it is called


a one-point code. 
It is easy to see that the word space is Fnq in n-dimensional vector
space. We wish to find the parameters [n, k, d] for this code. Clearly,
n = dim(Fnq ). A function f ∈ L(D) is sent to [0 · · · 0] ⇔ f (Pj ) = 0 for
all j, i.e., f ∈ L(−B) ⇔ (since the supports of B and D are disjoint)
f ∈ L(D − B). Therefore, we have the code space which is canonically
isomorphic to L(D)/L(D − B); it means that with V = image(σ1 ), we
have
V ⊕ L(D − B) = L(D),
where V (=σ1 (message space)) is isomorphic to the code space. Let k =
dimension of V = dimension of code space, then we have
k = (D) − (D − B).
Once we have the function space, let {φ1 , . . . , φk } be the basis for the

subspace V , then a map π([a1 , . . . , ak ]) = i ai φi will define a map from
the message space to V ⊂ the function space.
The important cases for applications are 0 < d(D) < n; we find the
indices k and d. We have the following proposition.

Proposition 5.2. Let us use the notations of Definition 5.1. Let us


assume that 0 < d(D) < n and d(D) > g − 1, then (D − B) = 0 and
the code space is isomorphic to L(D), and
k = (D) ≥ d(D) − g + 1 > 0.
The minimal distance d satisfies
d ≥ n − d(D).
144 Introduction to Algebraic Coding Theory

We have the following:

n + 1 ≥ k + d ≥ n + 1 − g.

Proof. In this case, we have d(D − B) < 0 and (D − B) = 0. Therefore,


L(D − B) = {0}, and the code space V = L(D). In view of Riemann’s
theorem, we have

k = (D) ≥ d(D) − g + 1 > 0.

We prove the inequality d ≥ n − d(D). Let D be written as the difference


of two disjoint effective divisors D = D0 − D∞ . Since k > 0, we may

pick 0 = f ∈ L(D). Let B be Pj , where B Pj and f (Pj ) = 0.

Clearly, we have B B . Let us separate the proof into the following
two cases: either (1) d(B ) ≤ d(D) or (2) d(B ) > d(D). (1) Note that if
d(B ) ≤ d(D), then the distance d(f, 0) = n − d(B ) ≥ n − d(D), recall that
d = min{d(f, 0) : f ∈ L(D)}, which is the inequality d ≥ n − d(D) we wish
to prove.
The relation (f ) + D = (f )0 − (f )∞ + D0 − D∞ 0 means that
(f )0 D∞ , D0 (f )∞ and (f )0 B . Since B and D are disjoint, B ,
D∞ are disjoint, then it is easy to conclude that (f )0 D∞ + B . Let us
consider the case (2), i.e., d(B ) > d(D). Let d(D) = d(D0 )−d(D∞ ) = r−s,
where r = d(D0 ) and s = d(D∞ ). Then, d((f )0 ) ≥ d(D∞ ) + d(B ) >
s + d(D) = s + r − s = r ≥ d((f )∞ ). Therefore, f = 0. A contradiction to
our assumption that f = 0. We prove that

d ≥ n − d(D).

For the last inequalities, the first one is the classical Singleton bound
(Proposition 1.23). The second one comes from the two inequalities already
established for k and d of this proposition as

k + d ≥ d(D) − g + 1 + d ≥ n + 1 − g. 

Remark 2: The number k provides the rate of information k/n, and the
lower bound n − d(D) for d provides the bound number  (n−d(D) 2  for the
number t of corrections for the code (cf. the remark after Proposition 1.25,
as long as t ≤  (n−d(D)−1
2 , the code can correct t errors by brute force). The
lower bounds for the rank (d(D)−g+1) and the minimal distance (n−d(D))
are called the designed rank and designed minimal distance. 
Algebraic Curve Goppa Codes 145

Example 1: Let the curve C be the projective line PF1q over a finite field Fq .
Let n = q − 1 and β be a generator of the cyclic group F∗q . Let B =
Pβ + Pβ 2 + · · · + Pβ n , where Pβ j is the point that solves the equation
x − β j = 0, and D = (k − 1)P∞ , where P∞ is the point at infinity, and we
assume that k < n. Then, we have

L(D) = the set of all polynomials of degree less than k over Fq .

It is then easy to see that this is the classical Reed–Solomon code. 

Example 2: Let us consider curve C defined by x30 + x31 + x32 over


F22 . It is a curve of genus 1. According to Example 33 of Chapter 4,
there are nine rational points. We may take eight of them and call them

P1 , P2 , P3 , P4 , P5 , P6 , P7 , P8 , and call the other one P. Let B = 8i=1 Pi
and D = mP. Then, it follows from the preceding proposition that

k ≥ m − 1 + 1 = m,
d ≥ 8 − m.

If we select m = 3, then k ≥ 3, d ≥ 5 and the code will have at least a rate


of information 38 , and later on, we shall show that it corrects at least  d−1
2 
≥2 errors. It is distinct from the Hamming code which corrects only 1 error
(cf. the remark after Proposition 1.25). 
Originally, Goppa used the differentials on C to form the dual construc-
tion of the above as follows.

Definition 5.3 (Algebraic geometric code or a geometric Goppa


code in residue form C Ω (B, D)). Let C be an absolutely irreducible
smooth projective algebraic curve of genus g over Fq . Consider an (ordered)
set {P1 , P2 , . . . , Pn } of distinct Fq rational points on C and a divisor B =
n
i=1 Pi , and a Fq -divisor D on C. For simplicity, we assume that the
support of D is disjoint from the support of B. Let the space of differentials
be Ω(D − B) and the map σ2 be defined as the following map resB .

resB : Ω(D − B)→ Fnq ,


η → (resP1 η, . . . , resPn η). 

We wish to find the parameters [n, k, d] for this code. Clearly, n =


dim(Fnq ). Note that Ω(D) ⊂ Ω(D − B). A differential η ∈ Ω(D − B) is sent
146 Introduction to Algebraic Coding Theory

to [0 · · · 0] ⇔ resPj η = 0 for all j, i.e., (η) D ⇔ η ∈ Ω(D). There is a


vector subspace V ⊂ Ω(D − B) with V ⊕ Ω(D) = Ω(D − B), and
k = i(D − B) − i(D).

Proposition 5.4. A geometric Goppa code in residue form defines an


[n, k, d] code in the same way as above; then, we have n = dim(Fnq ),
k = n − ((D) − (D − B)),
and in particular if 2g − 2 < d(D), then
k = n − d(D) + g − 1 + (D − B),
and the minimal distance d satisfies
d ≥ d(D) − 2g + 2.

Proof. It follows from the previous discussion that


k = i(D − B) − i(D).
It follows from the Riemann–Roch theorem that
i(D) = (D) − d(D) − 1 + g,
i(D − B) = (D − B) − d(D − B) − 1 + g.
Therefore, we have
k = −(D) + d(D) + (D − B) − d(D − B))
= n − ((D) − (D − B)).
In case that 2g−2 < d(D), we use Proposition 4.44 Riemann–Roch theorem
to conclude that i(D) = 0 and (D) = d(D) + 1 − g. Therefore, we have
k = n − ((D) − (D − B))
= n − d(D) + g − 1 + (D − B).
Let us assume that there are more than n − (d(D) − 2g + 2) zero-residues
for a differential η at {P1 , . . . , Pn }, that is, under the residue map, res(η)
has more than n − (d(D) − 2g + 2) points of {P1 , . . . , Pn } with zero values.
It thus has less than (d(D) − 2g + 2) points of {P1 , . . . , Pn } with non-zero
values. We wish to prove that η = 0.

Let B = j Pj , where j runs through all points Pj with respj η = 0.
Note that η will have no poles at those Pj . Then, as an assumption, we
have d(B ) > n − (d(D) − 2g + 2) or d(B − B ) < d(D) − 2g + 2. Let us write
Algebraic Curve Goppa Codes 147

D = D0 − D∞ as both effective and disjoint. We have d(D) = d(D0 ) −


d(D∞ ) = r − s (for notation, see the proof of Proposition 5.2). Then, we
have (η)0 − (η)∞ D0 − D∞ − B, d((η)0 ) ≥ d(D0 ) = r, D∞ + (B − B )
(η)∞ , and d((η)∞ < s + (d(D) − 2g + 2) = r − 2g + 2. Therefore, we have
d((η)) = d((η)0 ) − d((η)∞ ) > 2g − 2. If η = 0, then d((η)) = 2g − 2. Hence,
η = 0. What we have proved is that any non-zero differential η cannot have
more than n − (d(D) − 2g + 2) zero-residues. Therefore, it has more than
or equal to (d(D) − 2g + 2) non-zero-residues. It means that we prove
d ≥ d(D) − 2g + 2. 
Remark 3: The lower bounds for the rank (n − d(D) + g − 1 + (D − B))
and the minimal distance (d(D) − 2g + 2)) are called the designed rank
and designed minimal distance. 
Example 3: Recall Definition 3.20 of classical Goppa code as follows.
Definition 3.20. Let G={γ1 , . . . , γn } ⊂ Fq . We define the (classical)
Goppa code Γ(G, g(x)) with Goppa polynomial g(x), where g(γi ) = 0 for
1 ≤ i ≤ n to be the set of code words c= [c1 , . . . , cn ] over the letter field
Fq for which
n
 ci
≡0 mod (g(x)).
i=1
(x − γi ) 
Let us consider a geometric Goppa code as follows. Let the curve C be
the projective line PF1q over a finite field Fq . Let β1 , . . . , βn be n distinct
elements in Fq (Note that n ≤ q). Let g(x) be a polynomial with g(βj ) = 0
for all j. Let us consider Pj equals the point defined by x − βj = 0 and P∞
equals the point at ∞. Let

D = (g(x))0 − P∞ = ordQ (g(x))Q − P∞
Q=P∞

and B = Pi . Let us pick any non-zero η ∈ Ω(D − B). Then,
r(x)
η = f (x)dx = dx.
s(x)

Since η has only poles at finite distance in B, then s(x) | (x − βj ). We

may assume that s(x) = (x−βi ). Check the order at P∞ . Note that given
deg(r(x)) = m, deg(s(x)) = n, the number of poles at ∞ for r(x)/s(x) is
m − n, and dx has two poles at ∞. Since η has at most one pole at ∞, we
have m − n + 2 ≤ 1 or m ≤ n − 1 or deg(r(x)) ≤ n − 1. So, we have by the
148 Introduction to Algebraic Coding Theory

theory of partial fractions that


 cj
η = f (x)dx = dx.
x − βj
Then, clearly cj = residue at Pj and the mapping which maps η to
[c1 , . . . , cn ] are the same for the residue map η :→ [resP1 , . . . , resPn ].
Furthermore, ordQ (r(x)) ≥ ordQ (g(x)) for all Q. Therefore, g(x) | r(x).
 ci
We conclude that η ∈ Ω(D − B) ⇔ x−βi ≡ 0 mod (g(x)). It is easy to
see that this is the classical Goppa Code. 

Both Reed–Solomon codes and classical Goppa codes lead to geometric


Goppa codes. We have a third definition of geometric Goppa codes as
follows.

Definition 5.5 (Algebraic geometric code or a geometric Goppa


code in primary form C p (B, D)). Let C be an absolutely irreducible
smooth projective algebraic curve of genus g over Fq . Consider an (ordered)
set {P1 , P2 , . . . , Pn } of distinct Fq rational points on C and a divisor B =
n
i=1 Pi and an Fq -divisor D on C. For simplicity, let us assume that the
support of D is disjoint from B. The primary Goppa code is defined as the

set of vectors [a1 , . . . , an ] ∈ Fnq such that aj f (Pj ) = 0 for all f ∈ L(D).


We wish to establish the relations between the three different versions


of geometric Goppa codes by proving the following propositions.

Proposition 5.6. The geometric Goppa code in function form CL (B, D)


is the dual code of the geometric Goppa code in primary form Cp (B, D).
The dimension of the code space of the geometric Goppa code in primary
form Cp (B, D) is n − ((D) − (D − B)).

Proof. It is evident from the definitions. 

We consider two codes are identical if both their word spaces are the
same Fnq and their code spaces are the same subspace.

Proposition 5.7. The geometric Goppa code in residue form CΩ (B, D)


is the geometric Goppa code in primary form Cp (B, D).

Proof. We prove the code space of CΩ (B, D) ⊂ the code space of


Cp (B, D). Then, by a dimension argument, since they are both of dimen-
sions n − ((D) − (D − B)), they must be equal.
Algebraic Curve Goppa Codes 149

Let f ∈ L(D) and η ∈ Ω(D − B). Then, (f ) −D and (η) D − B.


It follows that (f η) = (f ) + (η) −B. It means that all residues of f η
outside {Pj } are zeroes. So, we have (cf. Proposition 4.37)

f (Pj ) resPj (η) = 0.

Hence, we conclude that the code space of CΩ (B, D) and Cp (B, D) are each
the dual code of the code space of CL (B, D). Hence, the geometric Goppa
code in residue form CΩ (B, D) is the geometric Goppa code in primary form
Cp (B, D). 

Proposition 5.8. Let C be a smooth projective curve over Fq with distinct


n
rational places P1 , . . . , Pn , P and B = j=1 Pj ; then, there exists a
differential ω such that (ω) −B such that
resPj (ω) = 1, for j = 1, . . . , n.
Let U = B + (ω) or (ω) = U − B. Clearly, the supports of U, B are
disjoint. Let D be any divisor with the support of D disjoint from the support
of B. Let D = U − D (on the other hand, if we are given any divisor
D with the support of D disjoint from the support of B, we may define
D = U − D ). Then, the geometric Goppa code in residue form CΩ (B, D)
is isomorphic to the geometric Goppa code in function form CL (B, D ).

Proof. Clearly, it follows from the remark after Proposition 4.38 that the
differential ω exists. The divisor (ω) = −B + U, such that the support of
B is disjoint from the support of U. Let us define a map π : CL (B, D ) →
CΩ (B, D) by
π(f ) = f ω.
It is easy to see that
f ∈ L(D ) ⇔ (f ) + U − D 0
⇔ (f ) D − U ⇔ (f ) + (ω) D−U+U−B=D−B
⇔ f ω ∈ Ω(D − B).
Furthermore, π : L(D − B)→ Ω(D) by
f ∈ L(D − B) ⇔ (f ) + U − D − B 0 ⇔ (f ) B+D−U
⇔ (f ) + (ω) B + D − U + U − B = D ⇔ f ω ∈ Ω(D).
Hence, π : L(D )/L(D − B) → Ω(D − B)/Ω(D), and π induces an
isomorphism. 
150 Introduction to Algebraic Coding Theory

This theory is sensitive to the curves and divisors involved. For instance,
there are curves with very few rational points or even no rational points.
Clearly, the theory is bad or void in that case. For smooth plane curves,
2
since the plane PK has q 2 + q + 1 rational points, there are at most q 2 + q + 1
rational points on the plane curve, and the geometric Goppa code based on
it will be short. Or the divisor selected is poor for the application purposes
(we have to take the decoding process in consideration). Therefore, we
shall consider space curves. We are interested in special curves with many
rational points so that the selection is easy with special divisors which may
aid us in decoding.
We illustrate the above method by several examples as follows.

Example 4: Let us fix the ground field to be F24 . Let us consider curve C


defined by x50 + x51 + x52 over F24 . It follows from Example 35 in Section
4.9 that the number of rational points is 65 and the genus g = 6. Let us
take one rational point P, B= the sum of the remaining 64 points and
D = 37P. Then, d(D − B) = −27 < 0, and (D − B) = 0. Note that
10 = 2g − 2 < d(D) = 37. Therefore, for geometric Goppa code in residue
form (cf. Proposition 5.4), the rank k = 64−37+6−1 = 32, and the minimal
distance d ≥ 27. As for Goppa code in function form, the rank k = 32, and
the minimal distance d ≥ 27. We have the following data: n = 64, k = 32,
the rate of information is k/n = 32/64 = 1/2, and the number of maximal
correctable errors is (d − 1)/2 ≥ 13. Comparing with the Reed–Solomon
code over F24 , which has n = 24 − 1 = 15, 1 ≤ k < 15, say k = 7, then the
rate of information is closed to 1/2, and d = n − k + 1 = 9. It may correct
4 errors. It is a much shorter code with weaker correcting power. 

5.2. Comparisons with Reed–Solomon Code

As pointed out by the remark of Proposition 1.25, any finite code, linear or
not, can be decoded by brute force. In the next two chapters, we discuss two
faster ways of decoding algebraic curve Goppa codes and compare them with
a brute-force decoding. In this section, we show that if two codes are with
closed rates of information and closed rates of distances, then the longer
one is more precise. This is easy to understand. For instance, let C1 be
an [n, k, d] code which corrects  d−1
2  errors and C2 be an [2n, 2k, 2d] code
which corrects  2 . Let them work on a block of length 2n. Then, we use
2d−1

a decoder for C1 twice and a decoder for C2 once. Then, any received word
which can be decoded by the decoder for C1 can be done by the decoder
Algebraic Curve Goppa Codes 151

for C2 . On the other hand, a received word with  d−1 2 +1 errors for the first
n letters can only be decoded by the second decoder if the total number
2 . Hence, it is easy to see that C2 is
of errors is less than or equal to  2d−1
more precise. We show this point by simple computations.
The decoding process for Reed–Solomon codes are faster than the
known decoding algebraic geometry Goppa codes (see the next chapter).
However, the algebraic geometry Goppa codes are more precise than the
Reed–Solomon codes. The ground field controls the time required for
multiplication, which certainly affects the total speed of computation. For
our comparisons of different codes we shall fix a ground field. Let us fix the
ground field to be Fq = Fpm ; there are q + 1 smooth points in PF2q . There,
the length of code is ≤ q − 1 = 15 = n for Reed–Solomon codes, classical
Goppa codes, etc. The Weil’s theorem on the number of smooth rational
points of a projective curve gives us a much bigger number, and Example 4
of the preceding section shows that a projective curve C over F24 could
have 64 points, which is many more that 24 + 1 = 17 points. So, we have a
longer code. We discuss why a longer code is more precise.
We follow Pretzel [8], p. 69. A further advance of geometric Goppa codes in the future might be an improvement in the speed of decoding. If speed is not an issue, then the geometric Goppa code has the advantage of being more accurate, which will be illustrated in this section.
Let us consider a Reed–Solomon code over the field $F_{2^4}$ defined by all polynomials of degree $< 7$. Then, it is a $[15, 7, 9]$-code, where $15 = 2^4 - 1$, $7 = 7$, and $9 = 15 - 7 + 1$. So, the number of message symbols is 7, and it can correct $4 = (9-1)/2$ errors. Let us consider a geometric Goppa code based on the curve $x_0^5 + x_1^5 + x_2^5$ (cf. Example 4 in the preceding section). By Plücker's formula, its genus $g$ is 6. We have computed that there are 65 rational points (cf. Example 35 in Section 4.9). Let one of these 65 points be $P$. Let us consider a one-point code with $B =$ the sum of the other 64 points and $D = 37P$. Then, it is an $[n, k, d]$-code, where $n = 64$, $k = 32$, $d \geq 27$. Using the SV algorithm (see the following), it may correct $(27 - 6 - 1)/2 = 10$ errors. Let us process through the said Reed–Solomon code four times. Then, we process 28 message symbols, while we may process 32 symbols through the said geometric Goppa code.
Let us compare these two processors. We have $32 > 28$, with the geometric Goppa code carrying more messages. Another yardstick is the failure rate, i.e., the probability that one fails to recover the messages because there are more than the allowed number of errors, or mistakenly decodes to wrong messages. To simplify our notations, we say that the decoder always returns
an error message in those cases. Let us compute the probability for each processor to return an error message, i.e., the probability that more than the allowed number of errors appear. For the Reed–Solomon processor, if there are 5 or more errors, then the processor will not decode and will reject the received word. Let the channel have a probability $p$ of being incorrect and $q$ of being correct. Then, $p + q = 1$ and
$$1 = 1^n = (p+q)^n = \sum_{i=0}^{n} C^n_i p^i q^{n-i}$$
for a plain processor without self-correcting capability (processing one symbol at a time), and the probability of returning an error message is
$$r_0 = \sum_{i=1}^{1} C^1_i p^i q^{1-i} = p = 1 - q.$$

For the Reed–Solomon processor, the probability $r_1$ of returning an error message is
$$r_1 = \sum_{i=5}^{15} C^{15}_i p^i q^{15-i} = 1 - \sum_{i=0}^{4} C^{15}_i p^i q^{15-i}.$$
By the same arguments, using the SV algorithm for the geometric Goppa code, the probability $r_2$ of returning an error message is
$$r_2 = \sum_{i=11}^{64} C^{64}_i p^i q^{64-i} = 1 - \sum_{i=0}^{10} C^{64}_i p^i q^{64-i}.$$

Let us assign a numerical value to $p$. Let $r_0 = p = 0.01$. Then, the above numbers are $r_0 = 0.01$, $r_1 = 0.27627423 \times 10^{-6}$, and $r_2 = 0.595098292 \times 10^{-10}$. Therefore, if we use the Reed–Solomon processor four times, the probability $R_1$ of returning an error message is
$$R_1 = 1 - (1 - r_1)^4 = 0.11052 \times 10^{-5}.$$
We have $R_2 = r_2 = 0.595098292 \times 10^{-10}$. Therefore, the SV algorithm for the geometric Goppa code is more accurate than the Reed–Solomon code. We may consider a block of 70,000 message symbols. We have to use the plain processor 70,000 times, the Reed–Solomon processor 10,000 times, and the geometric Goppa processor 2,188 times. The probabilities of an error message for these three processors are
$$1 - (1 - r_0)^{70000} = 100\% \quad \text{for the plain processor},$$
$$1 - (1 - r_1)^{10000} = 0.27591868\% \quad \text{for Reed–Solomon, and}$$
$$1 - (1 - r_2)^{2188} = 0.00002188\% \quad \text{for the SV algorithm for the Goppa code}.$$

Later, in Section 6.4, we discuss the DU algorithm. Using it, we may correct up to 13 errors for this particular geometric Goppa code. Then, we have
$$r_3 = \sum_{i=14}^{64} C^{64}_i p^i q^{64-i} = 1 - \sum_{i=0}^{13} C^{64}_i p^i q^{64-i} = 0.1383190860 \times 10^{-10}$$
and $1 - (1 - r_3)^{2188} = 0.000000000655500732\%$, which is much better than the Reed–Solomon code.
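These binomial tails are easy to recompute. The following short script is our sketch, not part of the text; rounding in the last digits may differ slightly from the figures quoted above:

```python
from math import comb

def tail(n, t, p):
    """Probability of more than t symbol errors in a block of n symbols."""
    q = 1.0 - p
    return sum(comb(n, i) * p**i * q**(n - i) for i in range(t + 1, n + 1))

p = 0.01
r1 = tail(15, 4, p)    # Reed-Solomon [15, 7, 9]: corrects 4 errors
r2 = tail(64, 10, p)   # geometric Goppa code, SV algorithm: corrects 10
r3 = tail(64, 13, p)   # geometric Goppa code, DU algorithm: corrects 13

# failure rates over a run of 70,000 message symbols
print(1 - (1 - r1)**10000)   # Reed-Solomon processor used 10,000 times
print(1 - (1 - r2)**2188)    # Goppa processor (SV) used 2,188 times
print(1 - (1 - r3)**2188)    # Goppa processor (DU) used 2,188 times
```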

5.3. Improvement of Gilbert–Varshamov’s Bound

It follows from Proposition 1.33 (Shannon's theorem) that we may have to consider a sequence of codes with lengths tending to $\infty$ to achieve an asymptotically good result. On the other hand, Proposition 1.40 (Gilbert–Varshamov's bound) establishes a lower bound,
$$\alpha(\delta) \geq 1 - H_q(\delta).$$
In this section, we show that the above Gilbert–Varshamov bound can be improved by a family of geometric Goppa codes. We consider geometric Goppa codes based on a sequence of curves $C_\ell$ with $\ell \to \infty$. We require that the numbers $n_\ell$ of smooth rational points of $C_\ell$ tend to $\infty$ to satisfy the requirement of Proposition 1.33 (Shannon's theorem). Then, it follows from Proposition 4.49 that the genus $g_\ell$ must tend to $\infty$. Certainly, this consideration is only for theoretical purposes. For any application, we can only allow $\ell$ to become as large as possible.

5.3.1. Modular Curves and Shimura Modular Curves


For the theoretical part of coding theory, we are interested in a sequence of curves. There are well-known sequences of algebraic curves, called the modular curves, which we discuss briefly in this section.
From complex analysis, we view an elliptic curve $E$ as a torus, which can be represented as the quotient space $\mathbb{C}/L$, where the lattice $L$ is a rank 2 $\mathbb{Z}$-submodule of the complex numbers $\mathbb{C}$ generated by two complex numbers $\omega_1, \omega_2$ such that $\omega_1/\omega_2$ is not a real number. We may choose the order of $\omega_1, \omega_2$ such that $\mathrm{im}(\omega_1/\omega_2) > 0$. Then, $\omega_1', \omega_2'$ define the same lattice $L$ iff
$$\begin{pmatrix} \omega_1' \\ \omega_2' \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} \omega_1 \\ \omega_2 \end{pmatrix},$$
where the two-by-two matrix above is in the full modular group = the special linear group over $\mathbb{Z}$ = $SL_2(\mathbb{Z}) = \Gamma$. Let the principal congruence subgroup of level $N$ be
$$\Gamma(N) = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} : \det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = 1,\ a \equiv d \equiv 1,\ b \equiv c \equiv 0 \bmod N \right\}.$$

Two lattices $L, L'$ induce isomorphic elliptic curves iff $L = c \cdot L'$, where $c \in \mathbb{C}^* = \mathbb{C}\setminus\{0\}$. Then, we may normalize the pair $(\omega_1, \omega_2)$ by multiplying the pair by $\omega_2^{-1}$ to get $(z, 1)$. Note that we require $\mathrm{im}(z) > 0$. We have the following computation:
$$z' = \frac{\omega_1'}{\omega_2'} = \frac{a\omega_1 + b\omega_2}{c\omega_1 + d\omega_2} = \frac{az + b}{cz + d}.$$
We shall define (it is unorthodox from the point of view of algebra)
$$\alpha(z) = \begin{pmatrix} a & b \\ c & d \end{pmatrix} z = \frac{az + b}{cz + d}.$$

Let $H = \{z \in \mathbb{C} : \mathrm{im}(z) > 0\}$ be the upper half-plane of the complex plane. It is easy to see (recall $ad - bc = 1$) that $\mathrm{im}(\alpha(z)) > 0$, as in the following computation:
$$\mathrm{im}(z') = \mathrm{im}\left(\frac{az+b}{cz+d}\right) = \mathrm{im}\left(\frac{(az+b)(c\bar{z}+d)}{|cz+d|^2}\right) = \mathrm{im}\left(\frac{adz + bc\bar{z}}{|cz+d|^2}\right) = \frac{\mathrm{im}(z)}{|cz+d|^2}.$$
Therefore, $\alpha$ sends elements of $H$ to elements of $H$.
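A quick numeric check of this computation (ours, not part of the text) with a sample matrix in $SL_2(\mathbb{Z})$:

```python
# Check that alpha(z) = (az+b)/(cz+d) maps the upper half-plane to itself
# and that im(alpha(z)) = im(z)/|cz+d|^2.
def alpha(a, b, c, d, z):
    return (a * z + b) / (c * z + d)

a, b, c, d = 2, 1, 1, 1        # det = 2*1 - 1*1 = 1, so this lies in SL_2(Z)
z = 0.3 + 1.7j                 # a sample point with im(z) > 0
w = alpha(a, b, c, d, z)

assert abs(w.imag - z.imag / abs(c * z + d) ** 2) < 1e-12
assert w.imag > 0
print(w)
```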
Let $Y(N) = \Gamma(N)\backslash H$ and endow it with the quotient topology. It is known that $Y(N)$ is an affine curve. One can deduce that $Y(1)$ is the moduli curve of all isomorphism classes of elliptic curves. Let $X(N)$ be the compactification of $Y(N)$. Then, it is well known that $X(N)$ is a projective algebraic curve, and the $X(N)$, $N = 1, 2, \ldots$, form a sequence of curves.
Very similar to the modular curves, we have the Shimura curves, or Shimura modular curves. We shall only mention them in the simplest way. What we use in coding theory is the following Proposition 5.9.
Given a finite field $F_{p^2}$, let $\ell\ (\neq p)$ be a prime. Let $\Gamma_0(\ell)$ denote the following multiplicative subgroup of $SL_2(\mathbb{Z})$:
$$\Gamma_0(\ell) = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} : \det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = 1, \text{ and } c \equiv 0 \bmod \ell \right\}.$$
Then, $\Gamma_0(\ell)$ acts on the half-plane $H$, and $\Gamma_0(\ell)\backslash H$ is an affine curve $Y_0(\ell)$ (for a detailed discussion, see Tsfasman et al. [33]). We define the reduction of $Y_0(\ell)$ modulo the characteristic $p$ and complete it to a projective curve $X_0(\ell)\ (= C_\ell)$. Then, its genus is $\ell/12$. We have the following proposition.

Proposition 5.9 (Ihara–Tsfasman–Vlǎduţ–Zink). For a finite field $F_{p^2}$ with $p$ prime, there exists a sequence of curves $C_\ell$ (= Shimura modular curves $X_0(\ell)$) which have genus $g_\ell = \ell/12$ and number of rational points $n_\ell \geq (p-1)(\ell+1)/12$. Thus, $g_\ell/n_\ell \leq 1/(p-1)$.

Proof. Ihara [24], Tsfasman–Vlǎduţ–Zink [33]. □

Remark: The curves mentioned above are interesting. For instance, there are no smooth plane models for them if $\ell$ is large enough. Note that a projective plane $\mathbb{P}^2_{F_q}$ can be decomposed as $\mathbb{A}^2_{F_q} \cup \mathbb{A}^1_{F_q} \cup \{0\}$; its number of rational points is $q^2 + q + 1$. Therefore, a smooth plane curve has at most $q^2 + q + 1$ rational points; then, the Shimura curves $X_0(\ell)$ do not have a smooth and regular planar model if $\ell > 12(p^4 + p^2 + 1)/(p-1) - 1$, where $q = p^2$. Note that then we have the number of rational points $n_\ell \geq (p-1)(\ell+1)/12 > (p-1) \cdot 12(p^4+p^2+1)/((p-1) \cdot 12) = q^2 + q + 1$, the number of rational points of a plane. □

5.3.2. The Theorem of Tsfasman, Vlǎduţ, and Zink

We want to show that geometric Goppa codes can do better than the well-known Gilbert–Varshamov bound. Recall the entropy function $H_q(x)$, with $H_q(0) = 0$, defined for $0 < x < (q-1)/q$:
$$H_q(x) = x \log_q(q-1) - x \log_q(x) - (1-x)\log_q(1-x).$$
We have the following proposition, which will be used in the next theorem.
Proposition 5.10. The maximum value of $H_q(x) - x$ is attained at $x = (q-1)/(2q-1)\ (= \delta)$. The maximal value of $H_q(x) - x$ is $\log_q(2q-1) - 1$; i.e., we have $1 - H_q(\delta) = 2 - \delta - \log_q(2q-1)$.

Proof. Using calculus, we differentiate $H_q(x) - x$ and set the derivative to 0. We have
$$\log_q(q-1) - \log_q(x) - \log_q e + \log_q(1-x) + \log_q e - 1 = 0.$$
The above equation can be rewritten as
$$\log_q\big((q-1)(1-x)/(qx)\big) = 0$$
or
$$(q-1)(1-x)/(qx) = 1.$$
Solving the above equation, we get
$$x = (q-1)/(2q-1),$$
and substituting it in $H_q(x) - x$, we have
$$\frac{q-1}{2q-1}\log_q(q-1) - \frac{q-1}{2q-1}\log_q\frac{q-1}{2q-1} - \Big(1 - \frac{q-1}{2q-1}\Big)\log_q\Big(1 - \frac{q-1}{2q-1}\Big) - \frac{q-1}{2q-1} = \log_q(2q-1) - 1.$$
Using the second derivative test, we get
$$-\frac{\log_q e}{x} - \frac{\log_q e}{1-x} < 0 \quad \text{if } 0 < x < (q-1)/q.$$
Since the second derivative is negative, the point is a maximum. □
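A numeric sanity check of this proposition (our sketch, not part of the text), here with $q = 49$:

```python
from math import log

def Hq(q, x):
    """The q-ary entropy function."""
    return x*log(q - 1, q) - x*log(x, q) - (1 - x)*log(1 - x, q)

q = 49
delta = (q - 1) / (2*q - 1)
claimed = log(2*q - 1, q) - 1                  # claimed maximum of Hq(x) - x

assert abs(Hq(q, delta) - delta - claimed) < 1e-12
grid = (i / 100000 for i in range(1, int(100000 * (q - 1) / q)))
assert max(Hq(q, x) - x for x in grid) <= claimed + 1e-9
print(delta, claimed)
```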

We have the following theorem for coding theory.

Theorem 5.11. Let us use the notations of the previous Proposition 5.9. Moreover, we assume that $p \geq 7$ and $q = p^2$. Then, we have one-point codes on $X_0(\ell)$ with block lengths $n_\ell - 1$ tending to $\infty$ such that for $\delta = (q-1)/(2q-1)$, the relative minimum distances $d_\ell/(n_\ell - 1)$ tend to a limit $\geq \delta$, and the rates of information $k_\ell/(n_\ell - 1)$ tend to a limit $> 1 - H_q(\delta)$.

Proof. Let us consider the sequence of Shimura curves $X_0(\ell)$. Let us select one rational point $P$ and call the other rational points $P_1, \ldots, P_{n_\ell - 1}$. Let $B_\ell = \sum_{i=1}^{n_\ell - 1} P_i$. Let $D_\ell = \alpha_\ell P$ with $\alpha_\ell > 2g_\ell - 2$. Then, the rank $k_\ell$ and minimal distance $d_\ell$ satisfy
$$k_\ell \geq (n_\ell - 1) - \alpha_\ell + g_\ell,$$
$$d_\ell \geq \alpha_\ell - 2g_\ell + 2,$$
and the rate of distance $\delta_\ell$ and the rate of information $R_\ell$ are given as
$$R_\ell = k_\ell/(n_\ell - 1) \geq (n_\ell - \alpha_\ell + g_\ell - 1)/(n_\ell - 1),$$
$$\delta_\ell = d_\ell/(n_\ell - 1) \geq (\alpha_\ell - 2g_\ell + 2)/(n_\ell - 1).$$
Choose $\alpha_\ell$ so that $(\alpha_\ell - 2g_\ell + 2)/(n_\ell - 1) \to \delta$. Then, the limit of $\delta_\ell$ is at least $\delta$, and the limit of the rate of information $R_\ell$ (cf. Proposition 5.9) is at least
$$1 - \delta - \lim(g_\ell/n_\ell) \geq 1 - \delta - 1/(p-1).$$
Under our assumption $p \geq 7$, we want to prove that $\log_{p^2}(2p^2 - 1) > p/(p-1)$ for all $p \geq 7$. For $p = 7$, we have $\log_{7^2}(2 \cdot 49 - 1) = 1.1755 > 7/6 = 7/(7-1)$; i.e., the inequality $\log_{p^2}(2p^2 - 1) - p/(p-1) > 0$ is satisfied. It is not hard to see that the left-hand side is a monotonically increasing function of $p$. Therefore, we always have $\log_{p^2}(2p^2-1) > p/(p-1) = 1 + 1/(p-1)$ for all $p \geq 7$. Since $q = p^2$, the above result can be rewritten as $\log_q(2q-1) > p/(p-1) = 1 + 1/(p-1)$. It follows from the preceding Proposition 5.10 that (cf. Definition 1.36, Proposition 1.40)
$$\alpha(\delta) = \lim_{\ell \to \infty} R_\ell \geq 1 - \delta - \lim(g_\ell/n_\ell) \geq 1 - \delta - 1/(p-1) > 2 - \delta - \log_q(2q-1) = 1 - H_q(\delta). \qquad \square$$
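The key inequality of the proof can be verified numerically for several primes (our sketch, not part of the text):

```python
# Check that the Tsfasman-Vladut-Zink limit rate beats Gilbert-Varshamov
# at delta = (q-1)/(2q-1) for q = p^2, p >= 7.
from math import log

def Hq(q, x):
    return x*log(q - 1, q) - x*log(x, q) - (1 - x)*log(1 - x, q)

for p in [7, 11, 13, 17, 19, 23]:
    q = p * p
    delta = (q - 1) / (2*q - 1)
    gv  = 1 - Hq(q, delta)          # Gilbert-Varshamov lower bound
    tvz = 1 - delta - 1/(p - 1)     # limit rate from Theorem 5.11
    assert tvz > gv
    print(p, round(gv, 5), round(tvz, 5))
```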

Exercises

(1) Find the generator matrix for the code defined in Example 2.
(2) Find the check matrix for the code defined in Example 2.
(3) Write a computer program to encode any message with the code defined in Example 2.
(4) Show that the Gilbert–Varshamov bound is not reached by a sequence of Hamming codes.
Chapter 6

Decoding the Geometric Goppa Codes

6.1. Introduction

For a useful code, the decoding process is significant. What we mean by decoding is that there is an integer $t$ such that, given a received word $r$: (1) if there are at most $t$ errors, then the decoder will find the original code word; (2) if there are more than $t$ errors, then either we find a code word $c$ which is within distance $t$ of the received word $r$ (in general, if we find a code word, then with a small probability $c$ may not be the original sent code word), or we return an "error" message to indicate that what was found is not even a code word, and hence there are more than $t$ errors.
As we pointed out in the previous section (cf. Proposition 5.8), the three different forms of geometric Goppa codes determine each other. In the present chapter, we discuss only the decoding procedures for the primary form. Then, the decoding processes for the other two forms naturally follow.
Let us consider a primary code $C_p(B, D)$ based on a smooth curve $C$ over $F_q$ of genus $g$, where $B = P_1 + P_2 + \cdots + P_n$ is a sum of distinct rational points and $D$ is a divisor with degree $d(D) < n$ (cf. Proposition 5.2) and with the support of $D$ disjoint from $B$.
A long message is chopped into many blocks of length $n$ each. We shall concentrate on the decoding of one block. Hence, when we talk about messages, error words, words, etc., we always mean blocks of length $n$.
Recall that the primary Goppa code is defined as the set of vectors $[a_1, \ldots, a_n] \in F_q^n$ such that $\sum_i a_i f(P_i) = 0$ for all $f \in L(D)$; its dimension is $k = n - \ell(D) > 0$.

Let the minimal distance $d$ of the code be $\min\{d(a, b) : a \neq b \text{ are code words}\}$. Note that the only check we have to carry out for a vector $a = [a_1, \ldots, a_n]$ to be a code word is to check whether $\sum_{i=1}^n a_i \phi_j(P_i) = 0$ for a basis $\{\phi_j\}_{j=1}^{n-k}$ of the vector space $L(D)$. Later on, when we mention the check procedure, we mean either the preceding check or the use of a check matrix $(\phi_j(P_i))$ to check whether $(\phi_j(P_i))a^T = 0$, which is the same thing.
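The check procedure itself is one matrix-vector product. The following sketch (ours, not part of the text) abstracts the field arithmetic; the toy matrix is hypothetical, for illustration only:

```python
# Check procedure: a is a code word iff H a^T = 0, where H = (phi_j(P_i)).
# Over F_2, `add` is XOR and `mul` is AND; over F_{2^m}, substitute the
# corresponding field operations.
def is_codeword(H, a, add, mul, zero=0):
    for row in H:                      # one row per basis function phi_j
        s = zero
        for h, ai in zip(row, a):
            s = add(s, mul(h, ai))     # accumulate sum_i phi_j(P_i) * a_i
        if s != zero:
            return False               # non-zero syndrome: not a code word
    return True

H = [[1, 1, 0, 1],                     # hypothetical F_2 check matrix
     [0, 1, 1, 1]]
print(is_codeword(H, [1, 0, 1, 1], lambda x, y: x ^ y, lambda x, y: x & y))
```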
The theoretical constructions of codes will be of interest to the general public only if there are economical ways of decoding them. Although the remark after Proposition 1.25 points out that any finite code can be decoded by brute force, there are faster ways than that. We discuss the possible ways.
Most ways of decoding simply use matrix theory and hence are easy. There are two prominent methods, among many others: the Skorobogatov–Vlăduţ algorithm (SV algorithm) (1991) and the Duursma algorithm (DU algorithm) (1993), which adopts the majority-voting scheme of Feng and Rao (1993).
The shortcoming of the above-mentioned two methods is the slow process of solving linear equations. In general, they require $n^3$ steps (if we factor in some extra burden) to solve the system of linear equations, which is comparably slower than the process of decoding a Reed–Solomon code ($n^2$ steps). On the other hand, we already know that the geometric Goppa codes are more accurate than the Reed–Solomon codes (cf. Section 5.2), which is the advantage of geometric Goppa codes.
5.2) which is the advantage of geometric Goppa codes.
In the decoding procedures, let a non-negative integer $t$ be the number of errors to be corrected. Later, in the SV algorithm, $t \leq \lfloor\frac{d-g-1}{2}\rfloor$, and in the DU algorithm, $t \leq \lfloor\frac{d-1}{2}\rfloor$. The decoding procedures for correcting $t$ errors (cf. the remark after Proposition 1.25) consist, in general, of three steps, as in the following flowchart (Figure 6.1). In the flowchart, syndrm cal means syndrome calculation (see the subsection on syndrome calculation in Section 6.2), error e means the error vector $e$, and error means an error message, indicating that there are more than $t$ errors and hence the decoder fails. If the word $r$ already went through the check procedure (left half of the flowchart) and failed, then we let it go through the syndrm cal. If it passes the syndrm cal, which means it has either no error or more than $t$ errors, then, since it already failed the check, it must have more than $t$ errors, and it goes directly to error.

(1) The sender and the receiver agree on the generator matrix and the check matrix (cf. Exercises 4 and 5). Upon receiving the received word, the receiver either (A) lets it go to the left to check whether it is a code word, using the check matrix or the check procedure. If it is a code word (it may not be the original sent code word), then go to the next block of the message. If it is not, then apply the syndrome calculation (see Section 6.2) of the right column. Or (B) we may go to the right column directly and apply the syndrome calculation to the received word $r$. There are two possibilities: either it passes the calculation or it fails. If it passes the calculation and came from the left column with a failure of the check procedure or check matrix, that means the basic assumption $1 \leq wt(e) \leq t$ is not true; we conclude that there are more than $t$ errors, and we return an error message. If, instead, it comes to this calculation directly and passes, we have to check it by the check procedure: if it further passes the check, then it is a code word; if it fails, then it goes to SV or DU.
(2) We start either the SV algorithm (Section 6.3) or the DU algorithm (Section 6.4), under the basic assumption that there are $t$ or fewer errors. At the end of the procedures, we construct an error vector $e$.
(3) A further test will decide whether $r - e$ is a code vector. If it is, then we complete the decoding procedure and correct the errors (occasionally, it may correct more than $t$ errors) successfully. If it is not, then there are more than $t$ errors, and we return the message "error".

We have the following Figure 6.1.

[Figure 6.1. Decoding flowchart: the received word r enters either the check procedure or the syndrome calculation (syndrm cal); a word that fails is passed to the SV or DU algorithm, which produces an error vector e; a final check yields either a code word or the message error.]



6.2. Error Locator

Let us consider a primary code $C_p(B, D)$ based on a smooth curve $C$ over $F_q$ of genus $g$, where $q = 2^m$, $B = P_1 + P_2 + \cdots + P_n$ is a sum of distinct rational points, and $D$ is a divisor with degree $d(D) < n$ (cf. Proposition 5.2) and with the support of $D$ disjoint from $B$. Let a code word $c$ be transmitted and a word $r = c + e$ be received. The word $e$ is called the error word. The decoding process is to find $e$ with $1 \leq wt(e) \leq t$, given only $r$. Then, we find $c = r - e$.
Recall that the check procedure is to compute $\sum_j \phi_k(P_j) r_j$ for $\{\phi_k\}$ a basis for $L(D)$. Given any function $\phi$, the numbers $\phi(P_j) r_j$ for $j = 1, \ldots, n$ are our main tools of searching. We define the following.

Definition 6.1. For any rational functions $\phi, r \in$ the rational function field of $C$ with no poles in $B$, we define the pseudo-dot product (or a pairing) $\cdot$ as
$$\phi \cdot r = \sum_j \phi(P_j) r(P_j) = \sum_j \phi(P_j) r_j,$$
where $r_j = r(P_j)$, and call $\phi \cdot r$ the syndrome of $r$ with respect to $\phi$. Note that it is a number in $F_q$. Let $U$ be a divisor. We say that $r$ has no syndrome with respect to $L(U)$ if $\phi \cdot r = 0$ for all $\phi \in L(U)$. In particular, if there is no syndrome with respect to $L(D)$, then $r$ is a code word. We define the $\star$ product as
$$\phi \star r = [\phi(P_1)r(P_1), \ldots, \phi(P_n)r(P_n)] = [\phi(P_1)r_1, \ldots, \phi(P_n)r_n],$$
which is a word. □

Remark: It is easy to check that $(\psi\chi) \star e = (\chi\psi) \star e = \chi \star (\psi \star e)$.
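Both products are one-line computations once the point evaluations are available. The following sketch is ours, not part of the text; the field operations are abstracted (for $F_{2^4}$, one could use, e.g., the gf16_mul helper sketched in Section 6.3.8 below):

```python
# Pseudo-dot and star products of Definition 6.1, over an abstract field.
def dot(phi_vals, r_vals, add, mul):
    """phi . r = sum_j phi(P_j) r_j, a field element (the syndrome)."""
    s = 0
    for p, r in zip(phi_vals, r_vals):
        s = add(s, mul(p, r))
    return s

def star(phi_vals, r_vals, mul):
    """phi * r = [phi(P_1) r_1, ..., phi(P_n) r_n], a word."""
    return [mul(p, r) for p, r in zip(phi_vals, r_vals)]
```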



We first define an error locator theoretically, then find the properties of the error locator, and use these properties to locate the error positions. Once we get the error locator, we use it to find all error locations and the values of the error vector $e$ at those locations. In this way, we find the original code word $c = r - e$. It will be of great help if we know all locations of the error vector $e$. Even a partial knowledge of the error locations will be meaningful. We define the following:

Definition 6.2. Given a received word $r$ with the unknown original code word $c$, let $e = (e_1, \ldots, e_n)$ be an error word with $c = r + e$. The point $P_j$ is called an error location (with respect to $e$) if $e_j \neq 0$. A non-zero function $\theta$ is called an error locator (with respect to $e$) if $\theta(P_j) = 0$ for all error locations $P_j$ and $\theta$ has no poles among $P_1, \ldots, P_n$. Thus, $\theta \star e = [0, \ldots, 0]$ always. □

From an error locator $\theta$, we cannot determine the set of all error locations $M$ of $e$, since $\theta$ may be zero outside $M$. However, if the set $M' = \{P_j : \theta(P_j) = 0\}$ is not too big (see Proposition 6.6) compared with $M$ (we have $M' \supset M$), then we show later (see Proposition 6.8) that it is possible to determine $M$ and all coordinates $\{e_j : e_j \neq 0\}$ and solve the decoding problem.

6.2.1. Plan of Decoding

Let us assume that the received word $r$ fails the test of syndrome calculations; we further assume that $1 \leq wt(e) \leq t$. Then, we want to find an error locator. If we can find a non-zero solution $\{\alpha_i\}$ of a system of equations of the following form, then we determine an error locator $\theta = \sum \alpha_i \psi_i$ (see Proposition 6.7):
$$[\alpha_0, \ldots, \alpha_{v-1}] \cdot \begin{bmatrix} * & \cdots & \cdots & * \\ \cdots & S_{i,j} & \cdots & \cdots \\ \cdots & \cdots & \cdots & \cdots \\ * & \cdots & \cdots & * \end{bmatrix} = [0, \ldots, 0].$$
It is possible that we can find only trivial solutions; this simply means that our basic assumption $1 \leq wt(e) \leq t$ is wrong, i.e., there is no error or there are more than $t$ errors. We have to use the check procedure or the check matrix to decide whether there is no error or there are more than $t$ errors. Even if we find an error locator $\theta$, the basic assumption $1 \leq wt(e) \leq t$ may still be wrong, and we may find a faulty error locator $\theta$ and hence a faulty error vector $e$. We have to use the check procedure or the check matrix to decide.
We always assume that $1 \leq wt(e) \leq t$. After we find an error locator $\theta$ (which may be faulty), we want to find the error vector $e$ (which may be faulty). Let $M' = \{P_k : \theta(P_k) = 0\}$. We solve a system of equations of the following form, where $P_k \in M'$, to determine a unique error vector $e$, where the $*$'s are known numbers:
$$\begin{bmatrix} * & \cdots & \cdots & * \\ \cdots & \cdots & \cdots & \cdots \\ \cdots & \phi_j(P_k) & \cdots & \cdots \\ * & \cdots & \cdots & * \end{bmatrix} \cdot \begin{bmatrix} E_0 \\ \cdot \\ \cdot \\ E_{u-1} \end{bmatrix} = \begin{bmatrix} * \cdot r \\ \cdot \\ \cdot \\ * \cdot r \end{bmatrix}.$$
It is possible that the error vector $e$ we find is not the true one. This happens when our basic assumption $1 \leq wt(e) \leq t$ is not satisfied. We have to use the check procedure or the check matrix to decide whether $e$ is the true one by checking whether $r - e$ is a code word.

6.2.2. The General Properties of Error Locators and Syndrome Calculations

Let us consider $L(D)$. Let $\{\varphi_0, \ldots, \varphi_{\ell-1}\}$ be a basis of $L(D)$. Then, we may write $\{\varphi_i(P_j) r_j\}$ as an $\ell \times n$ matrix $(\varphi_i(P_j) r_j)$. Note that if every row adds up to 0, then $\varphi_i \cdot r = 0$ for all $i$, so there is no syndrome with respect to $L(D)$, and $r = c$, i.e., $e = 0$. The above is the check procedure. We use the following proposition to decide whether a received word $r$ has either more than $t$ errors or no error (cf. the remark after Proposition 1.25). It is an important proposition for understanding the decoding procedures; it shows that if we only allow $t$ or fewer errors in any block, we need only check all functions $\chi$ in $L(Y)$ with $d(Y) \geq t + 2g - 1$ and $\mathrm{support}(Y)$ disjoint from $\mathrm{support}(B)$, instead of checking all of $L(D)$. Note that even if there is no syndrome, this may be due to the false assumption that the number of errors is $\leq t$ (i.e., the true number of errors is $> t$). We have to test further to see whether it is a code word.

Proposition 6.3. Consider a received word $r$ with error word $e$ satisfying $wt(e) \leq t$. If for all functions $\chi$ in $L(Y)$, with $d(Y) \geq t + 2g - 1$ and $\mathrm{support}(Y)$ disjoint from $\mathrm{support}(B)$, we have $\chi \cdot e = 0$, then $e = 0$, i.e., there is no syndrome.

Proof. Note that $e$ is a code word in $C_p(B, Y)$, which can be identified with the residue form $C_\Omega(B, Y)$ (Proposition 5.7). Furthermore, it follows from Proposition 5.4 that the minimal weight of a non-zero code word is $\geq d(Y) - 2g + 2 \geq t + 2g - 1 - 2g + 2 = t + 1$. Then, either $e = 0$ or $wt(e) \geq t + 1$; hence, with the assumption that $wt(e) \leq t$, we have $e = 0$. □

6.2.3. Assumption of t Errors and the Error Locators

The following is a useful and important criterion for an error locator $\theta$. This proposition will also be applied in the DU algorithm in the next section.

Proposition 6.4. Assume that $1 \leq wt(e) \leq t$. Assume that $d(Y) \geq t + 2g - 1$ and $\mathrm{support}(Y)$ is disjoint from $\mathrm{support}(B)$. Let $\theta$ be a rational function without poles in $B$. Then, $\theta$ is an error locator for $e$ $\Leftrightarrow$ $\theta\chi \cdot e = 0$ for all $\chi \in L(Y)$.

Proof. Let $e' = \theta \star e = (\theta(P_1)e_1, \ldots, \theta(P_n)e_n)$. Note that the weight of $e'$ is $\leq t$, since $e'$ has at most $t$ non-zero coordinates.
($\Rightarrow$) Since $\theta$ is an error locator, we have $e' = 0$. We have
$$\theta\chi \cdot e = \chi \cdot (\theta \star e) = \chi \cdot e' = 0$$
for all $\chi \in L(Y)$.
($\Leftarrow$) Since $\chi \cdot e' = \chi \cdot (\theta \star e) = \theta\chi \cdot e = 0$ for all $\chi \in L(Y)$, the word $e'$ is a code word in $C_p(B, Y)$, which has minimal weight $\geq d(Y) - 2g + 2 \geq (2g + t - 1) - 2g + 2 = t + 1 > t$. Therefore, $e' = (0, 0, \ldots, 0)$, and $\theta$ is an error locator for $e$. □

6.2.4. Syndrome Calculations

We have the following proposition for syndrome calculations.

Proposition 6.5. Consider a received word $r$ with error word $e$ having $wt(e) \leq t$. If all functions $\chi$ in $L(U)$, with $d(U) \geq t + 2g - 1$ and $D \succeq U$, satisfy $\chi \cdot r = 0$, then $e = 0$, i.e., there is no syndrome and $r \in$ the code space.

Proof. Since $D \succeq U$, we have $L(U) \subset L(D)$, and $\chi \cdot c = 0$ for all $\chi \in L(D)$. Therefore, $\chi \cdot r = \chi \cdot e = 0$ for all $\chi \in L(U)$. We conclude that $e = 0$ by Proposition 6.3. □

It follows from the preceding proposition that if we consider a smaller divisor $U$ with $d(U) \geq t + 2g - 1$ and $D \succeq U$, then as long as the received word $r$ satisfies the following syndrome calculus, where $\varphi_0, \ldots, \varphi_{w-1}$ form a basis for $L(U)$, we can only conclude that either $r$ is a code word or there are more than $t$ errors:
$$\begin{bmatrix} \varphi_0(P_1) & \cdots & \cdots & \varphi_0(P_n) \\ \cdots & \cdots & \cdots & \cdots \\ \cdots & \cdots & \cdots & \cdots \\ \varphi_{w-1}(P_1) & \cdots & \cdots & \varphi_{w-1}(P_n) \end{bmatrix} \cdot \begin{bmatrix} r_1 \\ \cdot \\ \cdot \\ r_n \end{bmatrix} = \begin{bmatrix} 0 \\ \cdot \\ \cdot \\ 0 \end{bmatrix}.$$
Indeed, we have the following. (1) If the equation is satisfied, i.e., the answer is the zero vector $0$ on the right-hand side, then there is no syndrome with respect to $L(U) \subset L(D)$. However, we can only conclude that either $r$ is a code word or there are more than $t$ errors. If we used the check procedure or the check matrix beforehand to decide that there are errors, then certainly there are more than $t$ errors. Otherwise, we complete the basis $\varphi_0, \ldots, \varphi_{w-1}$ of $L(U)$ to a basis $\varphi_0, \ldots, \varphi_{\ell-1}$ of $L(D)$ and use the remaining part of the check procedure to complete the check. Thus, we can tell whether $r$ is a code word or there are more than $t$ errors.
Let us go back to the syndrome calculation without a prior check. (2) If the equation is not satisfied, i.e., the answer on the right-hand side is a non-zero vector, we assume that there are at most $t$ errors and use the syndrome table to find an error locator and the error vector $e$ (see Propositions 6.7 and 6.8). Certainly, we have to check whether $r - e$ is a code word by using the check matrix or the check procedure. If $r - e$ is a code word, then we succeed in correcting the errors. If $r - e$ is not a code word, then there are more than $t$ errors and the decoder fails. We need $d(U) \times n$ (sometimes only $(2g + t - 1) \times n$, as in the case of one-point codes) multiplications for the syndrome calculation of every block.
Suppose an error locator $\theta$ is found. We wish to estimate the size of the set $M' = \{P_i : \theta(P_i) = 0\}$. The following proposition gives an estimate.

Proposition 6.6. Let $A$ be a divisor. Then, any function $\theta \in L(A)$ can have at most $d(A)$ zeroes outside the support of $A$.

Proof. Let us write the divisor:
$$(\theta) + A = \sum_{Q_i \in \mathrm{support}(A)} n_i Q_i + \sum_{Q_j \notin \mathrm{support}(A)} n_j Q_j \equiv E + F.$$
If $F \not\succeq 0$ or $E \not\succeq 0$, then $E + F \not\succeq 0$, which is impossible. Thus, $F \succeq 0$ and $E \succeq 0$. We wish to show that $d(F) \leq d(A)$. Suppose the contrary, $d(F) > d(A)$. Then, $d(E + F) > d(A)$, and $d(A) = d((\theta) + A) > d(A)$, a contradiction. □

Since we usually select the divisor $A$ with $\mathrm{support}(A)$ disjoint from $\mathrm{support}(B)$, which contains $M'$, we conclude that $d(A) \geq \|M'\|$. This is an important estimate of the size of $M'$.

6.3. SV Algorithm

This is part of step (2) of Section 6.1 of this chapter. Let us consider a geometric Goppa code in primary form $C_p(B, D)$ based on a smooth projective curve $C$ with genus $g$ and suitable divisors $B, D$. Let the rank be $k = n - d(D) + g - 1$ (note that according to Proposition 4.28 (Riemann's theorem), if $d(D) \geq 2g - 1$, then we have $\ell(D) = d(D) + 1 - g$, and $k = n - \ell(D) = n - d(D) + g - 1$), the minimal distance be $d = d(D) - 2g + 2$, and the number of permissible errors be $t \leq \lfloor\frac{d-g-1}{2}\rfloor$. The SV algorithm requires many numerical inequalities. They are for (1) the syndrome calculation, to decide whether the syndrome of $r$ with respect to $U$ is 0, (2) the existence of an error locator $\theta$, and (3) the usage of $\theta$ to compute the error vector $e$.

6.3.1. Sufficient Relations

We want to find auxiliary divisors $U, A, Y, X$ which satisfy the following list of relations: (1) $D \succeq U$; (2) $D \succeq Y + A \succeq X$; (3) $d(U) \geq t + 2g - 1$; (4) $\ell(A) > t$; (5) $d(A) < d(D) - (2g - 2) - t$; (6) $d(X) \geq d(A) + 2g - 1$; (7) $d(Y) \geq t + 2g - 1$; (8) $\mathrm{support}(Y) \cap \mathrm{support}(B) = \mathrm{support}(A) \cap \mathrm{support}(B) = \emptyset$.
By the preceding Proposition 6.5, the divisor $U$, if it exists, can be used for the syndrome calculation. Do we know whether those divisors $D, U, A, Y, X$ exist? Let us consider a simple example of a one-point code; we want to show that the above list of relations can be satisfied in this case. This shows that the complicated sufficient relations can sometimes be satisfied. After all, we often use a one-point code.

6.3.2. One-Point Code

Let us consider one-point codes of a special kind.
Let us consider a one-point code $C_p(B, mP)$, where $m \geq 3g + 2t - 1$. We take $D = mP = X$, $A = bP$ with $g + t \leq b \leq m - 2g - t + 1$, and $U = Y = (m - b)P$.
We show that the above sufficient relations can be fulfilled. It is easy to check relations (1) $D \succeq U$, (2) $D \succeq Y + A \succeq X$, and (8) $\mathrm{support}(Y) \cap \mathrm{support}(B) = \emptyset$.
We use the following lemma to show that all the numerical conditions are satisfied. For (3), $d(U) = m - b = d(D) - d(A) \geq t + 2g - 1$. For (4), it follows from (i) of the following lemma. For (5), it follows from (ii) of the following lemma. For (6), it follows from (iii) of the following lemma, since $X = D$. For (7), it follows from (iii) of the following lemma. For (8), it follows from our selection of $Y, A, B$. It follows that all the numerical conditions are satisfied.
Lemma. If $0 \leq t$ ($t$ is the number of errors we wish to correct) and $m = d(D) \geq 3g + 2t - 1$, then there is a positive integer $b$ such that
$$t + g \leq b \leq d(D) - 2g - t + 1. \qquad (1)$$
Furthermore, since the divisor $A$ satisfies $d(A) = b$, we have
(i): $\ell(A) > t$;
(ii): $b = d(A) < d(D) - (2g - 2) - t = d(D) - 2g - t + 2$;
(iii): $d(D) - b \geq 2g + t - 1 > 2g - 2$.
Note that the designed minimal distance is $d = d(D) - 2g + 2$. Equation (1) means $t \leq \lfloor\frac{d-g-1}{2}\rfloor$.

Proof. Let us prove the last inequality first. Clearly, we have
$$2t \leq d(D) - 3g + 1 \Leftrightarrow t + g \leq d(D) - 2g - t + 1 = d - t - 1 \Leftrightarrow 2t \leq d - g - 1.$$
Then, it follows from (1) that
$$t \leq \frac{d-g-1}{2}.$$
We just pick an integer $b$ between $t + g$ and $d(D) - 2g - t + 1$. Furthermore, by Riemann's theorem, we have (i):
$$\ell(A) \geq d(A) + 1 - g = b + 1 - g \geq 1 + t > t.$$
Parts (ii) and (iii) are obvious. □
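The lemma's window for $b$ is a one-line computation. The following sketch (ours, not part of the text) checks the numbers that will be used in Example 1 below:

```python
# Numerical window of the lemma for a one-point SV code: g = 3, t = 3.
g, t = 3, 3
m = 3*g + 2*t - 1            # 14, the smallest allowed d(D)
lo, hi = t + g, m - 2*g - t + 1
print(m, lo, hi)             # 14 6 6, so b = 6, as chosen in Example 1
d = m - 2*g + 2              # designed minimal distance, here 10
assert (d - g - 1) // 2 >= t
```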

6.3.3. Syndrome Table

From now on, we shall assume that the list of relations is satisfied. We may have a divisor $U$ for the syndrome calculation and another divisor $Y$ for the following syndrome table (which is different from the syndrome calculation). We assume that $U = Y$. Let $\{\phi_0, \ldots, \phi_{u-1}\}$ be a basis for $L(X)$, let $\{\psi_0, \ldots, \psi_{v-1}\}$ be a basis for $L(A)$, and let $\{\chi_0, \ldots, \chi_{w-1}\}$ be a basis for $L(Y)$. Note that $v = \ell(A) > t$ by condition (4) of the list. Without loss of generality, we use the notation $S_{ij} = \psi_i \chi_j$ and
$$S_{i,j} = S_{ij} \cdot e = \psi_i \chi_j \cdot e$$
for the error word $e$. Note that if we let the received word be $r = c + e$, with $c$ the original code word and $e$ the error vector, then we have $D \succeq A + Y \succeq X$ and $S_{ij} = \psi_i \chi_j \in L(A + Y) \subset L(D)$. Therefore, we have $S_{ij} \cdot c = 0$ and
$$S_{i,j} = S_{ij} \cdot e = S_{ij} \cdot r - S_{ij} \cdot c = S_{ij} \cdot r.$$
In other words, although $e$ is still to be found, we know $S_{ij} \cdot e$ by computing $S_{ij} \cdot r$. We form a $v \times w$ matrix $[S_{i,j}]$ and call it the syndrome table.

6.3.4. Construction of Error Locators

The main reference is Proposition 6.3. Now, we construct an error locator using linear algebra with the help of the preceding propositions. Note that the following proposition is useful not only for the SV algorithm but is also a good inspiration for the DU algorithm of the next section.

Proposition 6.7. Assume that $1 \leq wt(e) \leq t$. Further assume that $d(Y) \geq t + 2g - 1$. The following system of equations has a non-trivial solution $[\alpha_0, \ldots, \alpha_{v-1}]$, where $v > t$:
$$\sum_{k=0}^{v-1} (S_{kj} \cdot r)\alpha_k = 0, \quad 0 \leq j \leq w - 1,\ w = \dim(L(Y)), \qquad (2)$$
which means, in matrix form (recall that $S_{i,j} = S_{ij} \cdot r$),
$$[\alpha_0, \ldots, \alpha_{v-1}] \cdot \begin{bmatrix} S_{0,0} & \cdots & \cdots & S_{0,w-1} \\ \cdots & \cdots & \cdots & \cdots \\ \cdots & \cdots & \cdots & \cdots \\ S_{v-1,0} & \cdots & \cdots & S_{v-1,w-1} \end{bmatrix} = [0, \ldots, 0].$$
The left null space of the above matrix is non-trivial. For any non-trivial solution $[\alpha_0, \ldots, \alpha_{v-1}]$, let $\theta = \sum \alpha_j \psi_j$. Then, $\theta$ is an error locator for $e$ in $L(A)$.

Proof. The first thing we want to show is that the above system of equations has a non-trivial solution. We have $\ell(A) = v > t$. Let $M$ be the set of error locations of $e$ (although we do not know $e$, the set $M$ exists theoretically). Let us consider the following system of equations:
$$\sum_j \psi_j(P_k)\alpha_j = 0, \quad P_k \in M.$$
There are at most $t$ equations in $v$ variables $\alpha_i$, and $v > t$. Hence, there must be a non-trivial solution $[\alpha_0, \ldots, \alpha_{v-1}]$. Let $\theta = \sum \psi_j \alpha_j$. Then, $\theta(P_k) = 0$ for all $P_k \in M$. Therefore, $\theta$ is an error locator for $e$, and $\theta \star e = 0$.
The only thing we have to verify is that the set $\{\alpha_0, \ldots, \alpha_{v-1}\}$ satisfies the above equation (2). It follows from Proposition 6.4 that $\theta\chi \cdot e = 0$ for all $\chi \in L(Y)$, in particular for all $\chi_j\ (\in L(Y))$. Hence, we have
$$\sum_k (S_{kj} \cdot r)\alpha_k = \sum_k (S_{kj} \cdot e)\alpha_k = \sum_k (\psi_k \chi_j \cdot e)\alpha_k = \chi_j \cdot (\theta \star e) = 0$$
for all $j$. Thus, equation (2) is satisfied by $\{\alpha_0, \ldots, \alpha_{v-1}\}$.
Conversely, let the set $\{\alpha_0, \ldots, \alpha_{v-1}\}$ satisfy the above equation (2). Let $\theta = \sum \psi_j \alpha_j$. It follows from Proposition 6.4 that it suffices to show that $\theta\chi \cdot e = 0$ for all $\chi \in L(Y)$, in particular for all $\chi_j \in L(Y)$. We have
$$\theta\chi_j \cdot e = \sum_k (\psi_k \chi_j \cdot e)\alpha_k = \sum_k (S_{kj} \cdot e)\alpha_k = \sum_k (S_{kj} \cdot r)\alpha_k = 0.$$
Thus, any non-trivial solution of equation (2) induces an error locator. □

6.3.5. Finding the Error Vector e

The following proposition tells us how to use an error locator $\theta$ to find $e$.

Proposition 6.8. Assume that $1 \leq wt(e) \leq t$. Let $\theta \in L(A)$ be an error locator (of $e$) and $M'$ be the set of positions $i$ such that $\theta(P_i) = 0$. Then, we have the following. (i) The cardinal number of $M'$ satisfies $\|M'\| \leq d(A)$. (ii) Recall that $\{\phi_0, \ldots, \phi_{u-1}\}$ is a basis of $L(X) \subset L(D)$. Moreover, given $d(X) \geq d(A) + 2g - 1$, if we solve the following system of equations:
$$\sum_{P_k \in M'} \phi_j(P_k)E_k = \phi_j \cdot r, \quad j = 0, \ldots, u - 1, \qquad (3)$$
where the $E_k$ are indeterminates, then a non-trivial solution exists uniquely. (iii) Furthermore, the error vector $e = [e_1, \ldots, e_n]$ is a solution set, with $e_j = 0$ for all $j \notin M'$.

Proof. (i) Since $M' \subset \mathrm{support}(B)$ and $\mathrm{support}(A) \cap \mathrm{support}(B) = \emptyset$, all elements of $M'$ are outside the support of $A$. It follows from Proposition 6.6 that $\|M'\| \leq d(A)$. (ii) Note that $L(X) \subset L(D)$ and $\phi_j \cdot r = \phi_j \cdot e$. We may replace all $\phi_j \cdot r$ in the above equation by $\phi_j \cdot e$. Let $M$ = the set of all error locations of $e$; then $M \subset M'$. We have
$$\sum_{k \in M'} \phi_j(P_k)E_k = \phi_j \cdot e = \phi_j \cdot r.$$
Since $e_k = 0$ for all $k \notin M$ and $M \subset M'$, clearly $e$ is a solution of the above equation. Thus, we know a solution exists. We wish to show the uniqueness of the solution. Let $e', e^*$ be two non-trivial solutions. Then, their difference $e' - e^*$ satisfies the following system of equations:
$$\phi_j \cdot (e' - e^*) = 0, \quad j = 0, \ldots, u - 1,$$
which means that $(e' - e^*) \in C_p(B, X)$, which has minimal distance $d = d(X) - 2g + 2 \geq d(A) + 1$ (by the sufficient relation (6)). On the other hand, $e', e^*$ have zero values outside $M'$, which has cardinality at most $d(A)$. We conclude that $e' - e^* = 0$, i.e., $e' = e^*$. Thus, we prove the uniqueness of the solution. This means that we may solve the system of equations (3) to find the error vector $e$. (iii) This is obvious from the above discussion. □

The preceding three propositions are the kernel of the decoding process. Note that we always assume $1 \leq wt(e) \leq t$. If the system of equations (2) produces only the trivial solution, then our assumption $1 \leq wt(e) \leq t$ is false, i.e., either there is no error or there are more than $t$ errors. The two cases can be separated by the check procedure or by using a check matrix. Even if it does produce an error vector $e$, it may be just an accident, and we still have to check $r - e$ to be sure. If $r - e$ is a code word, then we decode successfully. Otherwise, if $r - e$ is not a code word, then the decoder fails.
The above system of equations (3) can be written in matrix form, with $P_{i_k} \in M'$, as
$$\begin{bmatrix} \phi_0(P_{i_1}) & \cdots & \cdots & \phi_0(P_{i_m}) \\ \cdots & \cdots & \cdots & \cdots \\ \cdots & \cdots & \cdots & \cdots \\ \phi_{u-1}(P_{i_1}) & \cdots & \cdots & \phi_{u-1}(P_{i_m}) \end{bmatrix} \cdot \begin{bmatrix} E_{i_1} \\ \cdot \\ \cdot \\ E_{i_m} \end{bmatrix} = \begin{bmatrix} \phi_0 \cdot r \\ \cdot \\ \cdot \\ \phi_{u-1} \cdot r \end{bmatrix}. \qquad (4)$$

Let $E_{i_1} = e_{i_1}, \ldots, E_{i_m} = e_{i_m}$ be the solution of the preceding equation. Since we always assume that $1 \leq wt(e) \leq t$, i.e., that there are at most $t$ errors, which may not be true in the real situation, at this stage we have to further check whether $c = r - e$ is a code word. We may use the check matrix or the following procedure. Let $\ell = \ell(D)$ and $\{\phi_0, \ldots, \phi_{u-1}, \ldots, \phi_{\ell-1}\}$ be a basis of $L(D)$. Let us consider the following matrix equation:
$$\begin{bmatrix} \phi_u(P_{i_1}) & \cdots & \cdots & \phi_u(P_{i_m}) \\ \cdots & \cdots & \cdots & \cdots \\ \cdots & \cdots & \cdots & \cdots \\ \phi_{\ell-1}(P_{i_1}) & \cdots & \cdots & \phi_{\ell-1}(P_{i_m}) \end{bmatrix} \cdot \begin{bmatrix} e_{i_1} \\ \cdot \\ \cdot \\ e_{i_m} \end{bmatrix} = \begin{bmatrix} \phi_u \cdot r \\ \cdot \\ \cdot \\ \phi_{\ell-1} \cdot r \end{bmatrix}. \qquad (5)$$
If the above matrix equation is satisfied, then $\phi_j \cdot c = \phi_j \cdot (r - e) = 0$ for all $j = 0, \ldots, \ell - 1$, and $c$ is a code word. Otherwise, the decoding procedure fails, and there are more than $t$ errors.

6.3.6. Summary of SV Algorithm

Let $C$ be a smooth projective curve of genus $g$ over a finite field $F_q$, where $q = 2^m$. We assume that we have a primary Goppa code (cf. Definition 5.5) $C_p(B, D)$ which is equivalent to $C_\Omega(B, D)$ (cf. Proposition 5.7). Thus, we have an integer $n$. We shall assume that $n > d(D) > 0$ and $d(D) > g - 1$; thus, $\ell(D - B) = 0$ and $k = n - d(D) + g - 1$ (cf. Proposition 5.2). Let the minimal distance be $d = d(D) - 2g + 2$. Let $t = \lfloor\frac{d-g-1}{2}\rfloor$ be the maximal number of errors to be corrected (cf. the remark after Proposition 1.25).
Let us select auxiliary divisors $U, A, Y, X$. The relations between the divisors are $D \succeq U$, $D \succeq Y + A \succeq X$, and $d(U) \geq 2g + t - 1$, $\ell(A) > t$, $d(A) < d(D) - (2g - 2) - t$, $d(X) \geq d(A) + 2g - 1$, $d(Y) \geq t + 2g - 1$, $\mathrm{support}(Y) \cap \mathrm{support}(B) = \emptyset$.
Our consideration is of a one-point code of the special kind: we may select $m \geq 3g + 2t - 1$ and $b$ with $t + g \leq b \leq m - 2g - t + 1$ (according to the lemma in Section 6.3.2, such a $b$ exists). Let $P$ be a rational point disjoint from $B$, and let $D = mP$, $A = bP$, $U = Y = (m - b)P$, $X = D$. It is easy to see that all the numerical inequalities of this section are satisfied. This SV algorithm can decode up to $\lfloor\frac{d-g-1}{2}\rfloor$ errors, with a speed depending on the speed of solving a system of linear equations.
If we receive a word $r$, we use the agreed check matrix or the check procedure to test whether $r$ is a code word. If it is a code word, then we pass to the next block. If it is not a code word, then, using $L(U)$, we go through the syndrome calculation (Proposition 6.5). Or we may go to the syndrome calculation directly, without using the check matrix or the check procedure first. If the syndromes with respect to $L(U)$ are all 0's, then either $r$ is a code word or there are more than $t$ errors. A further test, using the check procedure or the check matrix on the received word $r$, will distinguish these two situations. If the syndromes with respect to $L(U)$ are not all 0's, we solve the system of equations (2) of Proposition 6.7 to find an error locator $\theta$. We use the error locator $\theta$ to determine the possible set of error locations $M'$ (cf. Proposition 6.6) (which may contain more than the error locations of $e$, while having cardinality $\leq d(A)$). Then, we solve the system of equations (3) of Proposition 6.8 to find the error vector $e$. At this stage, we have to further check whether $c = r - e$ is a code word. We may use the check matrix, or we may let $\ell = \ell(D)$ and $\{\phi_0, \ldots, \phi_u, \ldots, \phi_{\ell-1}\}$ be a basis of $L(D)$ and consider the following matrix equation (from equation (5)):
$$\begin{bmatrix} \phi_u(P_{i_1}) & \cdots & \cdots & \phi_u(P_{i_m}) \\ \cdots & \cdots & \cdots & \cdots \\ \cdots & \cdots & \cdots & \cdots \\ \phi_{\ell-1}(P_{i_1}) & \cdots & \cdots & \phi_{\ell-1}(P_{i_m}) \end{bmatrix} \cdot \begin{bmatrix} e_{i_1} \\ \cdot \\ \cdot \\ e_{i_m} \end{bmatrix} = \begin{bmatrix} \phi_u \cdot r \\ \cdot \\ \cdot \\ \phi_{\ell-1} \cdot r \end{bmatrix}.$$
If the above matrix equation is satisfied, then $\phi_i \cdot c = \phi_i \cdot (r - e) = 0$ for $i = u, \ldots, \ell - 1$. Furthermore, $\phi_i \cdot c = \phi_i \cdot (r - e) = 0$ for $i = 0, \ldots, u - 1$, since these equations were used to determine $e$. Therefore, $\phi_i \cdot c = 0$ for all $i$, and $c$ is a code word. Otherwise, the decoding procedure fails, and there are more than $t$ errors.

Let us consider the following example.

Example 1: Let us consider the Klein quartic projective curve over $F_{2^4}$ defined by the equation $x^3y + y^3 + x = 0$. We shall consider a one-point code; the genus is $g = 3$. Let us take $t = 3$, $b = 6$, $m = 14 = d(D)$. Then, all the numerical conditions are satisfied. This code has $n = 16$, $k = 4 = n - d(D) + g - 1$, and $d = d(D) - 2g + 2 = 10$.

Pre-computation: All pre-computations are carried out before decoding; hence, they will not be counted in the computing time.

(1) Let us count the number of rational points. There are 17 rational points $\{P_1, P_2, \ldots, P_{16}, P\}$ over $F_{2^4}$ (cf. Example 36 in Section 4.9; we shall use the notations of that example). We take $B = P_1 + \cdots + P_{16}$ and $D = 14P$, where $P$ = the origin.
(2) We take $A = 6P$, $U = Y = 8P$, $X = 14P$. It is easy to check that $d(D) = 14$, $d(A) = 6$, $d(Y) = 8$, $d(X) = 14 \geq 2g - 1 = 5$; therefore, it follows from Proposition 4.28 (Riemann's theorem) that $\ell(D) = d(D) + 1 - g$, $k = n - d(D) - 1 + g$, and $D \succeq U$, $D \succeq A + Y \succeq X$.
(3) By direct computation, we know the rank of this code is $n - d(D) + g - 1 = 4$ (cf. Exercises 4 and 5), and the designed minimal distance is $d \geq d(D) - 2g + 2 = 10$. The SV algorithm will correct $\lfloor\frac{d-g-1}{2}\rfloor \geq 3$ errors.
(4) We compute a basis of $L(D) = L(14P)$. It is easy to see that the following $\{f_0, f_3, f_5, f_6, f_7, f_8, f_9, f_{10}, f_{11}, f_{12}, f_{13}, f_{14}\}$ form a basis:
$$f_0 = 1,\quad f_3 = \frac{1}{x},\quad f_5 = \frac{y}{x^2},\quad f_6 = \frac{1}{x^2},\quad f_7 = \frac{y^2}{x^3},\quad f_8 = \frac{y}{x^3},$$
$$f_9 = \frac{1}{x^3},\quad f_{10} = \frac{y^2}{x^4},\quad f_{11} = \frac{y}{x^4},\quad f_{12} = \frac{1}{x^4},\quad f_{13} = \frac{y^2}{x^5},\quad f_{14} = \frac{y}{x^5}.$$
Note that $\mathrm{ord}_P(x) = 3$, $\mathrm{ord}_P(y) = 1$, and $\mathrm{ord}_P(f_i) = -i$. The reasons that they form a basis are the following: (1) $f_i \in L(14P)$; (2) they are linearly independent over $F_{2^4}$; (3) by Riemann's theorem, $\ell(14P) = 14 + 1 - g = 12$. We shall compute the following $12 \times 16$ matrix $C$:
$$C = \begin{bmatrix} f_0(P_1) & \cdots & \cdots & f_0(P_{16}) \\ f_3(P_1) & \cdots & \cdots & f_3(P_{16}) \\ \cdots & \cdots & \cdots & \cdots \\ f_{14}(P_1) & \cdots & \cdots & f_{14}(P_{16}) \end{bmatrix}.$$
It is easy to see that the first 4 of the $\{f_i\}$, namely $f_0, f_3, f_5, f_6$, form a basis for $L(A) = L(6P)$, and hence $v = 4$; the first 6, namely $f_0, f_3, f_5, f_6, f_7, f_8$, form a basis for $L(Y) = L(U) = L(8P)$, and hence $w = 6$.
We shall further compute $f_k f_j$ for $f_k \in L(A) = L(6P)$ and $f_j \in L(Y) = L(8P)$. We have the following relations:
$$f_k f_j = f_{k+j} \quad \text{for } (k,j) \neq (5,7),$$
$$f_5 f_7 = f_{12} + f_5 = \frac{y}{x^2} \cdot \frac{y^2}{x^3} = \frac{1}{x^4} + \frac{y}{x^2}.$$
Note that the last equation is equivalent to the defining equation of the curve, $x^3y + y^3 + x = 0$. We compute the generator matrix and the check matrix, which are $16 \times 4$ and $4 \times 16$ matrices, respectively (cf. Exercises 4 and 5).

6.3.7. Syndrome Calculation

Let us consider the received word $r$, code word $c$, and error word $e$. We compute $f_i \cdot r$ for $f_i = \chi_i \in L(Y) = L(U) = L(8P)$, i.e.,
$$\begin{bmatrix} f_0(P_1) & \cdots & \cdots & f_0(P_{16}) \\ f_3(P_1) & \cdots & \cdots & f_3(P_{16}) \\ \cdots & \cdots & \cdots & \cdots \\ f_8(P_1) & \cdots & \cdots & f_8(P_{16}) \end{bmatrix} \cdot \begin{bmatrix} r_1 \\ r_2 \\ \cdot \\ r_{16} \end{bmatrix} = \begin{bmatrix} f_0 \cdot r \\ f_3 \cdot r \\ \cdot \\ f_8 \cdot r \end{bmatrix}.$$
If $f_i \cdot r = 0$ for $i = 0, 3, 5, 6, 7, 8$, then either there are more than $t$ errors or $r$ is a code word. Only the complete check procedure can tell the difference. The total number of multiplications is $6 \times 16 = 96$. To finish the check procedure, we have to compute the following for the remaining $f_i \in L(D)$:
$$\begin{bmatrix} f_9(P_1) & \cdots & \cdots & f_9(P_{16}) \\ f_{10}(P_1) & \cdots & \cdots & f_{10}(P_{16}) \\ \cdots & \cdots & \cdots & \cdots \\ f_{14}(P_1) & \cdots & \cdots & f_{14}(P_{16}) \end{bmatrix} \cdot \begin{bmatrix} r_1 \\ r_2 \\ \cdot \\ r_{16} \end{bmatrix} = \begin{bmatrix} f_9 \cdot r \\ f_{10} \cdot r \\ \cdot \\ f_{14} \cdot r \end{bmatrix}.$$
The total number of computations, in the case that there is either no error or at least one error, is $12 \times 16 = 192$.
Note that $f_j = \psi_j \in L(A) = L(6P)$ for $j = 0, 3, 5, 6$. Further, note that $f_k = \varphi_k \in L(X) = L(D) = L(14P)$ for $k = 0, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14$, and $\psi_k \chi_j = f_k f_j$ for $k = 0, 3, 5, 6$ and $j = 0, 3, 5, 6, 7, 8$.

6.3.8. A Concrete Example of Transmission

We shall consider a concrete example. Let us set up the notations for the numerical computations involved in $F_{2^4}$; let $\alpha, \beta \in F_{2^4}$ be defined by $\alpha^2 + \alpha + 1 = 0$, $\beta^2 + \beta + \alpha = 0$. It is not hard to see that $F_{2^2} = F_2[\alpha]$ and $F_{2^4} = F_{2^2}[\beta]$. Let us consider a general element $a\alpha\beta + b\beta + c\alpha + d$, where $a, b, c, d$ are 0 or 1. Let us first represent it as a four-bit number $[abcd]$ and then represent the four-bit number $[abcd]$ as the integer $a2^3 + b2^2 + c2 + d$ in binary. So, all four-bit numbers become integers between 0 and 15. Let us order the points $\{P_i\}$ by their indices. Let us consider a concrete example. Certainly, in the real case, we only know the received word. However, for the convenience of discussion, let the code word $c$, error word $e$, and received word $r$ be as follows:
c = 12 12 10 9 2 1 13 3 6 9 2 15 1 0 5 0,
e = 0 0 0 0 0 0 0 0 0 0 0 0 1 14 0 15,
r = 12 12 10 9 2 1 13 3 6 9 2 15 0 14 5 15.
We use $L(8P)$ for the syndrome calculation. The received word $r$ will not pass the syndrome calculation; for instance, $f_0 \cdot r = 1 \cdot r$ is the sum of the $r$-row as elements in the field, which is $\neq 0$.
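The field arithmetic behind these integer labels can be made concrete with a few lines of code. The following sketch (ours, not part of the text) converts a label to the basis $1, \beta, \beta^2, \beta^3$ (the relations $\alpha = \beta^2 + \beta$ and $\beta^2 = \beta + \alpha$ give $\beta^4 = \beta + 1$), multiplies, reduces, and converts back; addition is simply bitwise XOR:

```python
# Arithmetic in F_16 under the book's labels [abcd] <-> a*ab + b*b + c*a + d,
# where a = alpha, b = beta. Since alpha = beta^2 + beta, we have
# beta^4 = beta + 1, so multiplication reduces modulo x^4 + x + 1.
def to_poly(v):
    a, b, c, d = (v >> 3) & 1, (v >> 2) & 1, (v >> 1) & 1, v & 1
    # a*alpha*beta + b*beta + c*alpha + d
    #   = a*beta^3 + (a+c)*beta^2 + (b+c)*beta + d
    return (a << 3) | ((a ^ c) << 2) | ((b ^ c) << 1) | d

def from_poly(p):
    p3, p2, p1, p0 = (p >> 3) & 1, (p >> 2) & 1, (p >> 1) & 1, p & 1
    a, c = p3, p2 ^ p3
    b = p1 ^ c
    return (a << 3) | (b << 2) | (c << 1) | p0

def gf16_mul(u, v):
    x, y, r = to_poly(u), to_poly(v), 0
    for i in range(4):                 # carry-less multiplication
        if (y >> i) & 1:
            r ^= x << i
    for i in range(7, 3, -1):          # reduce modulo beta^4 + beta + 1
        if (r >> i) & 1:
            r ^= 0b10011 << (i - 4)
    return from_poly(r)

gf16_add = lambda u, v: u ^ v          # addition is XOR in either basis

assert gf16_add(2, 3) == 1             # alpha + (alpha + 1) = 1
assert gf16_mul(2, 3) == 1             # alpha * (alpha + 1) = alpha^2 + alpha = 1
assert gf16_mul(3, 5) == 15            # consistent with f_5(P_6) = 15/3 = 5 below
```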
Finding an Error Locator (cf. Proposition 6.7):
The syndrome table $[S_{k,j}] = [S_{kj} \cdot r]$ can be formulated at once as follows:
k\j 0 3 5 6 7 8
0 8 11 5 15 2 1
3 11 15 1 3 9 7
5 5 1 9 7 11 6
6 15 3 7 14 6 13
We have to solve the following system of equations to find an error locator $\theta$:
$$\sum_{k=0,3,5,6} S_{k,j}\alpha_k = \sum_{k=0,3,5,6} (S_{kj} \cdot r)\alpha_k = 0, \quad j = 0, 3, 5, 6, 7, 8.$$
Explicitly, we have
$$8\alpha_0 + 11\alpha_3 + 5\alpha_5 + 15\alpha_6 = 0,$$
$$11\alpha_0 + 15\alpha_3 + 1\alpha_5 + 3\alpha_6 = 0,$$
$$5\alpha_0 + 1\alpha_3 + 9\alpha_5 + 7\alpha_6 = 0,$$
$$15\alpha_0 + 3\alpha_3 + 7\alpha_5 + 14\alpha_6 = 0,$$
$$2\alpha_0 + 9\alpha_3 + 11\alpha_5 + 6\alpha_6 = 0,$$
$$1\alpha_0 + 7\alpha_3 + 6\alpha_5 + 13\alpha_6 = 0.$$
A non-zero solution of the six equations is $\alpha_0 = 3$, $\alpha_3 = 8$, $\alpha_5 = 6$, $\alpha_6 = 1$. Note that the multiplication and addition are not the usual ones between integers: they are the operations on field elements. For instance, $2 + 3 = \alpha + (\alpha + 1) = 1 \neq 5$, and $2 \times 3 = \alpha \times (\alpha + 1) = 1$ instead of 6. The fastest way of solving a small system of linear equations is still Gaussian elimination. The number of multiplications involved is $6^2 \times 4/3 = 48$. The corresponding error locator is $\theta = 3f_0 + 8f_3 + 6f_5 + f_6$.
Let us find the zero set of $\theta$. We use the following table (Table 6.1), with the rows of $f_0, f_3, f_5, f_6$ pre-computed (say we want to compute $f_5(P_6)$: we have $f_5(P_6) = \frac{\alpha\beta + \beta + \alpha + 1}{\alpha^2} = \beta + 1 = 5$) and the values of $\theta$ computed by the formula $\theta = 3f_0 + 8f_3 + 6f_5 + f_6$.

Table 6.1.
      P1  P2  P3  P4  P5  P6  P7  P8  P9  P10 P11 P12 P13 P14 P15 P16
f0 :  1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
f3 :  0   0   3   2   3   3   2   2   14  13  10  8   15  12  9   11
f5 :  0   0   1   1   4   5   6   7   12  15  9   11  5   4   7   6
f6 :  0   0   2   3   2   2   3   3   8   10  14  13  9   11  12  15
θ  :  3   3   3   10  4   15  5   8   4   4   2   11  0   0   9   0

To find the zero set, we only have to add the row vectors $[f_j(P_1), \ldots, f_j(P_{16})]$ with the computed coefficients and observe the resulting 0 coordinates. For instance, the summation corresponding to $P_{14}$ is $3 \cdot 1 + 8 \cdot 12 + 6 \cdot 4 + 11$, which, written in terms of $\alpha, \beta$ with $\alpha^2 + \alpha + 1 = 0$ and $\beta^2 + \beta + \alpha = 0$, is $(\alpha+1) + \alpha\beta(\alpha\beta+\beta) + (\alpha+\beta)\beta + (\alpha\beta+\alpha+1) = \beta^2(\alpha^2+\alpha+1) = 0$. In general, it takes $4 \times 16 = 64$ multiplications to find the zero set of $\theta$. So, it is fast. For the present case, the zero set $M'$ is $\{P_{13}, P_{14}, P_{16}\}$.
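The zero set can also be extracted mechanically from the $f$-rows of Table 6.1, reusing the gf16_mul helper from the preceding sketch (again ours, for illustration):

```python
# Recompute theta = 3*f0 + 8*f3 + 6*f5 + f6 pointwise from Table 6.1
# and read off its zero positions. Requires gf16_mul from the sketch above.
f0 = [1] * 16
f3 = [0, 0, 3, 2, 3, 3, 2, 2, 14, 13, 10, 8, 15, 12, 9, 11]
f5 = [0, 0, 1, 1, 4, 5, 6, 7, 12, 15, 9, 11, 5, 4, 7, 6]
f6 = [0, 0, 2, 3, 2, 2, 3, 3, 8, 10, 14, 13, 9, 11, 12, 15]

theta = [gf16_mul(3, w) ^ gf16_mul(8, x) ^ gf16_mul(6, y) ^ z
         for w, x, y, z in zip(f0, f3, f5, f6)]
zero_set = [i + 1 for i, v in enumerate(theta) if v == 0]
print(zero_set)   # [13, 14, 16], i.e., M' = {P13, P14, P16}
```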

Finding the Error Vector e (cf. Proposition 6.8):
We have to solve the first six equations of system (3):
$$1E_{13} + 1E_{14} + 1E_{16} = 0,$$
$$15E_{13} + 12E_{14} + 11E_{16} = 8,$$
$$5E_{13} + 4E_{14} + 6E_{16} = 7,$$
$$9E_{13} + 11E_{14} + 15E_{16} = 3,$$
$$10E_{13} + 8E_{14} + 8E_{16} = 2,$$
$$13E_{13} + 13E_{14} + 10E_{16} = 4.$$
The matrix form of the above system of equations is as follows:
$$\begin{bmatrix} 1 & 1 & 1 \\ 15 & 12 & 11 \\ 5 & 4 & 6 \\ 9 & 11 & 15 \\ 10 & 8 & 8 \\ 13 & 13 & 10 \end{bmatrix} \cdot \begin{bmatrix} E_{13} \\ E_{14} \\ E_{16} \end{bmatrix} = \begin{bmatrix} 0 \\ 8 \\ 7 \\ 3 \\ 2 \\ 4 \end{bmatrix}.$$
According to Proposition 6.8, there will be a unique non-trivial solution. It is easy to see that $E_{13} = 1$, $E_{14} = 14$, $E_{16} = 15$ satisfy all the equations. For this step, it usually suffices to look at the first three equations. The number of multiplications needed is 9.

6.3.9. Recovering the Original Message

We have to further check whether the proposed word $c = r - e$ is really a code word. For that purpose, note that $f_i \cdot r = \sum_k f_i(P_k)r_k = \sum_{k \in M'} f_i(P_k)r_k + \sum_{k \notin M'} f_i(P_k)r_k$ and $f_i \cdot c = \sum_k f_i(P_k)c_k = \sum_{k \in M'} f_i(P_k)c_k + \sum_{k \notin M'} f_i(P_k)c_k$, and $e = c - r = c + r$ (since the characteristic is 2); for $k \notin M'$, $f_i(P_k)r_k = f_i(P_k)c_k$, while for $k \in M'$, $f_i(P_k)r_k + f_i(P_k)c_k = f_i(P_k)e_k$. Further, note that $f_i \cdot c = 0$ (which is what we want to prove!) if and only if $\sum_{k \in M'} f_i(P_k)e_k = f_i \cdot r$. Furthermore, for $i = 0, 3, 5, 6, 7, 8$, the equations have already been checked, and we have the following system of remaining equations in matrix form:
$$\begin{bmatrix} 10 & 8 & 14 \\ 4 & 5 & 6 \\ 5 & 4 & 7 \\ 11 & 9 & 15 \\ 15 & 12 & 9 \\ 6 & 7 & 5 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 14 \\ 15 \end{bmatrix} = \begin{bmatrix} 6 \\ 11 \\ 11 \\ 6 \\ 13 \\ 4 \end{bmatrix}.$$
For instance, for the first row, we have
$$10 \cdot 1 + 8 \cdot 14 + 14 \cdot 15 = (\alpha\beta + \alpha) + \alpha\beta(\alpha\beta + \beta + \alpha) + (\alpha\beta + \beta + \alpha)(\alpha\beta + \beta + \alpha + 1) = \beta + \alpha = 6.$$
Indeed, all the equations are satisfied, which implies that $f_i \cdot c = 0$ for all $0 \leq i \leq 14$. We conclude that $c$ is a code word, and we decode successfully. Note that if the above equations were not satisfied, then our decoder would fail. For this step, it takes 18 multiplications.
The total number of multiplications needed is as follows. (1) If there is no error: (A) if we use the check matrix, it is 64; (B) otherwise, if we use the syndrome calculation, it is 192. (2) If there are no more than $t$ errors, it is $192 + 48 + 9 + 64 + 18 = 331$. (3) If the decoder fails, it is $96 + 48 + 9 + 64 + 18 = 235$. We are processing 4 letters, which is 16 bits. Per bit, we have (1): (A) 4 multiplications, (B) 12 multiplications; (2) 20.69 multiplications; (3) 14.69 multiplications. The maximal number of multiplications for 16 bits ($k = 4$, and each block contains 4 message symbols) is 32.69, which means 2.04 per bit. A modern computer of 500 MHz can correct at least 15 million blocks, or 240 million bits, per second. □

Remark: The main mistake made by some books about the SV algorithm is in the syndrome calculation. It is taken to be $S_{kj} \cdot r = \psi_k\chi_j \cdot r$; if all the outcomes are zeroes, then it is faultily claimed that there is no syndrome and no error.
The true syndrome calculation uses Proposition 6.5. If a received word $r$ satisfies $\chi \cdot r = 0$ for all $\chi \in L(U)$, where $D \succeq U$ and $d(U) \geq t + 2g - 1$, then either there are more than $t$ errors or there is no error. The only way to show that there is no error is by using the check matrix or by computing $\phi_j \cdot r = 0$ for a basis $\{\phi_j\}$ of $L(D)$. □
Decoding the Geometric Goppa Codes 179

Exercises

(1) Show that [1, 1, 1, 2, 15, 12, 8, 2, 8, 6, 14, 11, 0, 1, 0, 0] is a code word in Example 1.
(2) Show that [5, 5, 15, 9, 3, 2, 14, 4, 12, 10, 14, 4, 0, 0, 1, 0] is a code word in Example 1.
(3) Show that [1, 1, 3, 1, 3, 13, 9, 11, 8, 15, 5, 13, 0, 0, 0, 1] is a code word in Example 1.
(4) Find a generator matrix for the code in Example 1.
(5) Find a check matrix for the code in Example 1.
(6) Write a computer program to decode the code in Example 1.
(7) Write down the details of decoding the geometric Goppa code $C_p(B, 37P)$ with $d(B) = 64$, based on $x_0^5 + x_1^5 + x_2^5$ with the ground field $F_{2^4}$ (cf. Example 35 in Section 4.9).
(8) Write a computer program to decode the code in Exercise 7.

6.4. DU Algorithm

The preceding SV algorithm only decodes up to $\lfloor\frac{d-g-1}{2}\rfloor$ errors, instead of the designed number $\lfloor\frac{d-1}{2}\rfloor$ (cf. the remark after Proposition 1.25), for a geometric Goppa code based on a smooth projective curve $C$ with genus $g$. The best geometric Goppa codes are based on smooth projective curves with large genus $g$; note that then $\lfloor\frac{d-g-1}{2}\rfloor \ll \lfloor\frac{d-1}{2}\rfloor$, so the shortcoming is sometimes serious. Note that if $g = 0$, there is no difference between $\lfloor\frac{d-g-1}{2}\rfloor$ and $\lfloor\frac{d-1}{2}\rfloor$, and if $g = 1$, the difference is small. In general, we assume that $g \geq 2$. Feng and Rao found a way to get around this by majority voting. Duursma followed their idea and the idea of Weierstrass gaps to extend the SV algorithm to the DU algorithm, which corrects errors up to the designed power of $\lfloor\frac{d-1}{2}\rfloor$. The DU algorithm is more complicated than the SV algorithm.

Summary of DU Algorithm

6.4.1. Pre-computation

Let $C$ be a smooth projective curve of genus $g$ ($g \geq 2$) over a finite field $F_q$, where $q = 2^m$. We assume that we have a primary Goppa code (cf. Definition 5.5) $C_p(B, D)$ which is equivalent to $C_\Omega(B, D)$ (cf. Proposition 5.7). Thus, we have an integer $n$. We shall assume that $n > d(D) > 0$ and $d(D) > g - 1$, and thus $\ell(D - B) = 0$ and $k = n - d(D) + g - 1$ (cf. Proposition 5.2). Let the minimal distance be $d = d(D) - 2g + 2$. Let us define $t = \lfloor\frac{d-1}{2}\rfloor$. We have the following list of relations. (1) Let $D$ be a divisor with $d(D) \geq 2g + 2t - 1$ (we may take $D = d(D)P$). (2) Let $D' = D - (d(D) - 2g - 2t + 1)P$ (if $D = d(D)P$, then $D' = (2g + 2t - 1)P$). Then, we have $d(D') = 2g + 2t - 1$ and $D \succeq D'$. Therefore, $L(D - B) \supset L(D' - B)$, and every code word in $C_p(B, D)$ is a code word in $C_p(B, D')$. We decode words in $C_p(B, D')$. One way to think about it is that the sender and the receiver should agree on the divisor $D'$ instead of $D$ at the beginning. (3) The rank is $k = n - d(D) + g - 1$ (note that according to Proposition 4.28 (Riemann's theorem), if $d(D) \geq 2g - 1$, then we have $\ell(D) = d(D) + 1 - g$, and $k = n - \ell(D) = n - d(D) + g - 1$). The minimal distance is $d' = d(D') - 2g + 2$, and the DU algorithm will correct $t = \lfloor\frac{d'-1}{2}\rfloor$ errors (see the remark in Section 6.6).
Now, we shall choose (4) an auxiliary divisor $A$ with $d(A) = t$ and with the support of $A$ disjoint from the support of $B$ (say, $A = tP$).
The secondary auxiliary divisor we use is (5) $A'$, where $A'$ is defined by $A' = D' - A - (2g - 1)P$ (if $A = tP$ and $D = d(D)P$, then $D' = (2t + 2g - 1)P$ and $A' = tP = A$). Then, $d(A') = t$. Note the important relation $D' = A + A' + (2g - 1)P$. We shall study $L(A + iP)$, $L(A' + iP)$, and $L(A + A' + iP)$ for $i = 1, \ldots, 2g - 1$ (see the following).
In this section and the next, we shall assume that the above list of relations is satisfied. The syndrome calculation is not affected (cf. Section 6.2); we just compute $\phi_i \cdot r$, where $\phi_i$ runs through a basis of $L(D') = L(A + A' + (2g - 1)P) \subset L(D)$ and $r$ is the received word. As usual, if the results of the syndrome calculation are zeroes, then either $r$ is a code word or there are more than $t$ errors. We use the check procedure or the check matrix to tell the difference. The problem is that if the result of the syndrome calculation is not zero, then we have to find an error locator $\theta$. Once we find the important error locator $\theta$, Proposition 6.13 (see below) will lead us to the decoding. We shall focus on finding $\theta$.

6.4.2. One-Point Code

Let us consider one-point codes of a special kind.
Let us consider a one-point code $C_p(B, mP)$, where $m = 2g + 2t - 1$. Note that for the SV algorithm, we need $m \geq 3g + 2t - 1$; i.e., in that case, we need $g$ more numerals of code for every block. We may simplify the above list of relations to (1) $D = (2g + 2t - 1)P$, (2) $D' = D = (2g + 2t - 1)P$, (3) the minimal distance $d = d(D) - 2g + 2$, (4) $A = tP$, and (5) $A' = A$. Under further study, what we really need are only (1) and (4), which we shall use in our study of the example. Note that $d = m - 2g + 2 = 2t + 1$ and $t = \frac{d-1}{2} = \lfloor\frac{d-1}{2}\rfloor$, which, according to the remark after Proposition 1.25, is the best we can do. Compared with the SV algorithm, the value $m = 2g + 2t - 1$ is much smaller than the value $3g + 2t - 1$ for the SV algorithm. The difference in sizes is $g$. We have to use Feng–Rao majority voting to create $g$ lines of values for the decoding purpose.

6.4.3. Ordering of Basis


For the convenience of ordering the bases of L(A + (2g − 1)P) and L(A′ +
(2g − 1)P), we shall generalize the classic ordering by Weierstrass gaps.
Let A0 = A − tP. Then, d(A0) = 0 and A + (2g − 1)P = A0 + (2g + t − 1)P. In
the following proposition, we study L(A + (2g − 1)P). The same statement
holds verbatim for L(A′ + (2g − 1)P).

Proposition 6.9. We have ℓ(A0 + (2g + t − 1)P) = g + t. We may select
a basis {ψi} for L(A0 + (2g + t − 1)P), where 0 ≤ i ≤ 2g + t − 1 with
g of the indices missing (so, there are precisely g + t of them), such that
ψi ∈ L(A0 + iP)\L(A0 + (i − 1)P) for 0 ≤ i ≤ 2g + t − 1.

Proof. It follows from Proposition 4.30 that, since for any canonical divisor
W, W − (A0 + (2g + t − 1)P) is a divisor of negative degree, we have
ℓ(W − (A0 + (2g + t − 1)P)) = 0. Then, it follows from the Riemann–Roch theorem
that ℓ(A0 + (2g + t − 1)P) = 2g + t − 1 + 1 − g = g + t. The other part of
the proposition follows from Proposition 4.28 since ψi ∈ L(A0 + iP) and
ℓ(A0 + iP) − ℓ(A0 + (i − 1)P) ≤ 1. □

It follows from the preceding proposition that we may make the following definition.

Definition 6.10. For any rational function θ ∈ L(A0 + (2g + t − 1)P), we
define νA(θ) = min{i : θ ∈ L(A0 + iP)} and call the g missing values of νA the
Weierstrass gaps of A. We shall call νA(θ) the Weierstrass index of θ. □

Later, we shall define the matrix [Si,j] with Si,j = ψi χj · e, where i = the
Weierstrass index of ψi and j = the Weierstrass index of χj.

Proposition 6.11. For a rational function θ ∈ L(A0 + (2g + t − 1)P) with
θ = bi ψi + Σ_{k<i} bk ψk and bi ≠ 0, we have νA(θ) = i.

Proof. It follows from the previous proposition. □



Remark: The classic ordering and the one-point code mentioned above in
the subsection are the ones with A0 = 0 and νA(ψi) = −ordP(ψi). □

Let us write A = Σi A(Qi)Qi, where A(Qi) is the coefficient of Qi in A.
We have the following proposition.

Proposition 6.12. For any rational function θ ∈ L(A0 + (2g + t − 1)P) =
L(A + (2g − 1)P), we have νA(θ) = d(A) − A(P) − ordP(θ) (for notation,
see the preceding paragraph), and given νA(θ) = i and νA′(θ′) = j, we have
νA+A′(θθ′) = i + j.

Proof. Let A = E + sP, where s = A(P) and P is not in the support of E.
Then, A0 = E + (s − t)P, and we have

θ ∈ L(A0 + iP)
⇔ θ ∈ L(E + (i + s − t)P)
⇔ −ordP(θ) ≤ i + s − t
⇔ −ordP(θ) − s + t ≤ i.

Thus, the minimal possible value of i is −ordP(θ) + t − s = d(A) − A(P) − ordP(θ).
Furthermore, if θ ∈ L(A0 + iP)\L(A0 + (i − 1)P) and θ′ ∈ L(A′0 +
jP)\L(A′0 + (j − 1)P), then θθ′ ∈ L(A0 + A′0 + (i + j)P)\L(A0 + A′0 +
(i + j − 1)P). □

6.4.4. Syndrome Table

Similarly, we shall construct bases {ψi} for L(A + (2g − 1)P) and {χi} for
L(A′ + (2g − 1)P), where 0 ≤ i ≤ 2g + t − 1, and {φi} for L(A + A′ + (3g − 1)P),
where 0 ≤ i ≤ 3g + 2t − 1. Let A0 = A − tP and ψi ∈
L(A0 + iP)\L(A0 + (i − 1)P). Note that D′ = A + A′ + (2g − 1)P; therefore,
we construct more φi than needed for coding purposes. Let us consider a
special kind of one-point code with A = A′ = tP. Then, we have A0 = 0 and
{χi = ψi}, with {ψi} and {φi} bases for L((2g + t − 1)P) and L((3g + 2t − 1)P),
respectively. Further, ψi ∈ L(iP)\L((i − 1)P), and D′ = (2g + 2t − 1)P.
We pre-compute the following relations:

ψi χj = ai,j φi+j + Σ_{k<i+j} bi,j,k φk,

where ai,j ≠ 0; without losing generality, we may assume ai,j = 1.



Let {φi : 0 ≤ i ≤ 2g + 2t − 1 (= d(D′))}. We construct the following
syndrome table, which is a (g + t) × (g + t) matrix S. Customarily, we use
the (i, j) entry (where i, j are not Weierstrass gaps) to denote

Si,j = ψi χj · e.

Note that these are not the usual sub-indices: i, j are the Weierstrass
indices. In every row and every column, there are g indices missing.
Since we do not know e, the above matrix exists only theoretically. However,
if 0 ≤ i, j ≤ 2g + t − 1 and i + j ≤ 2g + 2t − 1, then ψi χj ∈ L(D′) and

Si,j = ψi χj · e = ψi χj · r,

which is computable and known. For i + j > 2g + 2t − 1, the terms Si,j
are unknown; the reason is that we do not require ψi χj · c = 0. Hence,
ψi χj · e may not equal ψi χj · r, and we cannot use ψi χj · r to replace ψi χj · e.
Furthermore, as suggested by the SV algorithm, it would be better to
compute ψi χj · e for i + j up to 3g + 2t − 1. Note that 3g + 2t − 1 −
(2g + 2t − 1) = g. What we plan to do is to use Feng–Rao's majority
voting method (see Section 6.6) to find the next g values of ψi χj · e.
Temporarily, we put all Si,j, known or unknown, in matrix form; this matrix
is called the syndrome table. It is of the following (g + t) × (g + t) square
form, with ∗ for known numbers and ? for unknown numbers:

⎡∗ ··· ··· ∗ ∗⎤
⎢∗ ··· ··· ∗ ∗⎥
⎢∗ ··· ··· ∗ ∗⎥
⎢∗ ··· ··· ∗ ∗⎥
⎢∗ ··· ··· ∗ ?⎥
⎢∗ ···  ∗  ? ?⎥
⎣∗ ··· ∗ ? ? ?⎦

The indices of the top row (or the leftmost column) of the above table
run from 0 to 2g + t − 1; they are the orders of the functions. Recall that
there are g numbers missing. Therefore, there are 2g + t − 1 + 1 − g = g + t
terms. So, the above is really a (g + t) × (g + t) matrix.
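To visualize the known region, the following sketch (a minimal illustration; the index set {0, 3, 5, 6, 7, 8} is the one of the Klein quartic example in Section 6.5, where g = t = 3 and the gaps are 1, 2, 4) prints the ∗/? pattern of the (g + t) × (g + t) syndrome table:

```python
g, t = 3, 3
indices = [0, 3, 5, 6, 7, 8]        # non-gap Weierstrass indices among 0..2g+t-1 = 0..8
known_bound = 2 * g + 2 * t - 1     # entries with i + j <= 11 are computable from r

for i in indices:                    # a (g+t) x (g+t) = 6 x 6 table
    print(" ".join("*" if i + j <= known_bound else "?" for j in indices))
# The '?' entries with i + j > 2g + 2t - 1 fill the lower-right corner, g lines' worth.
```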

6.4.5. The Construction of Complete Syndrome Table


To begin with, we know the syndrome table entries {Si,j} for all 0 ≤ i, j ≤
2g + t − 1 with i + j ≤ 2g + 2t − 1. So, we know the part of this syndrome table
bounded by the line i + j = 2g + 2t − 1.

In general, we shall build the syndrome table line by line, for all i, j
with 0 ≤ i, j ≤ 2g + t − 1, from the line i + j = 2g + 2t − 1 + s to the
next line i + j = 2g + 2t − 1 + s + 1 for s = 0, . . . , g − 1. Every time
after we construct Si,j (see the next section) for all i, j with 0 ≤ i, j ≤
2g + t − 1 and i + j = 2g + 2t − 1 + s + 1, we try to find an
error locator θ row-wise or column-wise. There are two possibilities: either
(1) we find an error locator (then we proceed to decode), or (2) we cannot
find one (then we push on to find the Si,j on the next line, using the material
on Feng–Rao's majority voting in the next section). We shall handle case (1)
in this section. Let us start with s = 0.

6.4.6. The Construction of Error Locator: Step s = 0

As always, we assume that there is an error vector e with 1 ≤ wt(e) ≤ t,
which is completely unknown to us. With this meager information (we know
completely only the entries with Weierstrass indices i ≤ t, j ≤ 2g + t − 1,
or those with i ≤ 2g + t − 1, j ≤ t), let us consider the first case (the
column-wise case is treated similarly): we may be able to find an error
locator θ row-wise, i.e., by solving the following system of equations for {αi}:

Σ_{i≤t} Si,j αi = 0, j ≤ 2g + t − 1. (2′)

Note that the {Si,j} are known for i + j ≤ 2g + 2t − 1. The above system of
equations has more equations than variables (j runs up to 2g + t − 1, while
i ≤ t), so, in general, there may not be a non-zero solution. Assume that
there is a non-zero solution; then the non-zero solution {αi} can be extended
to a solution set of

Σ_{i=0}^{2g+t−1} Si,j αi = 0, j ≤ 2g + t − 1 (= w) (2)

by assigning the extra αi = 0 for t < i ≤ 2g + t − 1.

Let us still call the solution set {αi}. Let θ = Σ αi ψi. It follows from
Proposition 6.4 that θ is an error locator if we replace Y in that proposition
by A + (2g − 1)P = A0 + (2g + t − 1)P. Note that we assumed that there is a
non-zero solution {αi}. Similarly, we may solve the system of equations
column-wise to try to find an error locator (if we cannot find an error
locator either way for all points (i, j) on the line i + j = 2g + 2t − 1, then we use
the material of the next section to construct the syndrome table of the
next line i + j = 2g + 2t). If we can find an error locator θ this
way (or row-wise), then we may use the following proposition to solve the
decoding problem.

Proposition 6.13. Assume that 1 ≤ wt(e) ≤ t. Then, we have the
following: (1) If the above system of equations (2′) has a non-trivial solution
{αj} for j ≤ t, then θ = Σ_{j≤t} αj ψj is an error locator. Conversely, if
we suppose that there is an error locator θ = Σ_{j≤t} αj ψj, then the set
{αj} is a non-trivial solution of the system of equations (2′). (2) Suppose
that there is an error locator θ ∈ L(A), where A = A0 + tP. Let the set
M′ = {Pi : θ(Pi) = 0}. Then, its cardinality is at most t, and the following
system of equations

Σ_{Pk∈M′} φi(Pk)Ek = φi · r (= φi · e), i ≤ 2g + 2t − 1 (3′)

has a unique non-zero solution e|M′, where e|M′ is the restriction of
e to the coordinates determined by M′. Note that outside the positions
determined by M′, e has zero coordinates, so we can find e from the preceding
information.

Proof. (1) The first part follows from the discussion before the proposition,
which shows the existence of the error locator θ. Let us prove the second
part of (1). We have θ ⊙ e = 0 (⊙ denoting the coordinate-wise product
(θ(P1)e1, . . . , θ(Pn)en)). Hence, 0 = χj · (θ ⊙ e) = Σ_{i≤t} αi ψi · (χj ⊙ e) =
Σ_{i≤t} αi ψi χj · e = Σ_{i≤t} αi Si,j.

(2) Since θ ∈ L(A), it follows from Proposition 6.6 that the cardinality
of M′ is ≤ d(A) = t.

We have

φi · e = Σ_{Pk∉M′} φi(Pk)ek + Σ_{Pk∈M′} φi(Pk)ek = Σ_{Pk∈M′} φi(Pk)ek

since ek = 0 for all Pk ∉ M′. Clearly, e|M′ = e is a solution of (3′). Let e∗
be another solution. Then, the weight of e − e∗ is ≤ t, and e − e∗ will be a
solution of the homogeneous system of equations

Σ_{Pk∈M′} φi(Pk)Ek = 0, i ≤ 2g + 2t − 1. (5′)

It means that e − e∗ is a code word in Cp(B, A + A′ + (2g − 1)P), while
Cp(B, A + A′ + (2g − 1)P) has a minimal distance ≥ 2g + 2t − 1 − (2g − 2) =
2t + 1 > t ≥ wt(e − e∗). Therefore, e − e∗ = 0, and the solution is unique.
It means that we may solve the system of equations (3′) to find the error
vector e. □

6.4.7. The Construction of Error Locator: Step s + 1 < g


Assume that 1 ≤ wt(e) ≤ t. Suppose that by induction, we have the terms
Si,j for i ≤ t + s and j ≤ 2g + t − 1. We shall look for an error locator.
Suppose that we find a non-zero solution {αi} of the following system of
equations (if we cannot find an error locator either way for all points (i, j)
on the line i + j = 2g + 2t + s − 1, then we use the material of the next section
to construct the syndrome table of the next line i + j = 2g + 2t + s):

Σ_{i≤t+s} Si,j αi = 0, j ≤ 2g + t − 1. (2′)

Note that the number of equations is in general greater than the number
of variables, and there are no non-zero solutions in general. A non-zero
solution {αi} can be extended to a solution set of

Σ_{i=0}^{v} Si,j αi = 0, j ≤ w (= 2g + t − 1) (2)

by assigning the extra αi = 0, where v = 2g + t − 1. Let us still call the
solution set {αi}. Let θ = Σ αi ψi; then θ is an error locator in L(A0 +
(s + t)P) (cf. Proposition 6.4). The following proposition will guarantee
that we will be able to decode the message.

Proposition 6.14. Assume 1 ≤ wt(e) ≤ t. Then, we have the following.
(1) If the above system of equations (2′) has a non-trivial solution {αi} for
i ≤ t + s, then θ = Σ_{i≤t+s} αi ψi is an error locator. Conversely, if there is
an error locator θ = Σ_{i≤t+s} αi ψi, then the set {αi} is a non-trivial solution
of the system of equations (2′). (2) Let the set M′ = {Pi : θ(Pi) = 0}.
Then, its cardinality is at most s + t, and the following system of equations

Σ_{Pk∈M′} φℓ(Pk)Ek = φℓ · e (known; it equals φℓ · r for ℓ ≤ 2g + 2t − 1), ℓ ≤ 2g + 2t + s (3′)

has a unique non-zero solution e|M′, where e|M′ is the restriction of
e to the coordinates determined by M′. Note that outside the positions
determined by M′, e has zero coordinates, so we can find e from the preceding
information.

Proof. (1) The discussion before the proposition shows the existence of the
error locator θ. Let us prove the second part of (1). We have θ ⊙ e = 0. Hence,
0 = χj · (θ ⊙ e) = Σ_{i≤t+s} αi ψi · (χj ⊙ e) = Σ_{i≤t+s} αi ψi χj · e = Σ_{i≤t+s} αi Si,j.

(2) The Si,j = ψi χj · e are all known for 0 ≤ i, j ≤ 2g + t − 1 and
0 ≤ i + j ≤ 2g + 2t + s. Furthermore, by our pre-computation, we have

ψi χj · e = φi+j · e + Σ_{k<i+j} bi,j,k φk · e,

so the φℓ · e are all known for ℓ ≤ 2g + 2t + s. Since θ ∈ L(A + sP), it
follows from Proposition 6.6 that the cardinality of M′ is ≤ d(A + sP) =
s + t.

We have

φℓ · e = Σ_{Pk∉M′} φℓ(Pk)ek + Σ_{Pk∈M′} φℓ(Pk)ek = Σ_{Pk∈M′} φℓ(Pk)ek

since ek = 0 for all Pk ∉ M′. Therefore, the above equation (3′) has a
solution; clearly, e|M′ = e is a solution of (3′). Let e∗ be another solution.
Then, the weight of e − e∗ is ≤ s + t + 1, and e − e∗ will be a solution of
the homogeneous system of equations

Σ_{Pk∈M′} φℓ(Pk)Ek = 0, ℓ ≤ 2g + 2t + s. (5′)

It means that e − e∗ is a code word in Cp(B, A + A′ + (2g + s)P), while
Cp(B, A + A′ + (2g + s)P) has a minimal distance ≥ 2g + 2t + s − (2g − 2) =
2t + s + 2 > t + s + 1 ≥ wt(e − e∗). Therefore, e − e∗ = 0, and the solution
is unique. It means that we may solve the system of equations (3′) to find
the error vector e. □

Remark: In Proposition 6.20, we prove the theorem of majority voting.
There we need the statement that there is no error locator θ with νA(θ) ≤ t + s,
while the above proposition is one of a sequence of propositions showing that if
there is an error locator θ with νA(θ) ≤ t + s, then the decoding problem
can be solved. After we prove the sequence of propositions to that effect,
the only case which requires our attention has no error locator θ
with νA(θ) ≤ t + s. Hence, part of the assumptions of Proposition 6.20 is
justified. □

6.4.8. The Construction of Error Locator: Final Step s + 1 = g
This is the final step in our discussion of the usage of the syndrome table.
Assume that 1 ≤ wt(e) ≤ t. Although we do not know e, we do know
sufficiently many Si,j = ψi χj · e. We first show that there is an error locator θ.
Then, we expect to find the error word e.

The restrictions on the indices are 0 ≤ i, j ≤ 2g + t − 1 and i + j ≤
2g + 2t − 1 + (g − 1) + 1 = 3g + 2t − 1. Therefore, the syndrome table can
be constructed partially, up to the first Weierstrass index ≤ 2g + t − 1 (or
up to the second index ≤ 2g + t − 1) and i + j ≤ 3g + 2t − 1, and the value
φi · e can be found for all 0 ≤ i ≤ 3g + 2t − 1.

Let M be the set of error locations of e (although we do not know
e, the set M exists theoretically). Let us consider the following system of
equations:

Σ_{i≤g+t} ψi(Pk)αi = 0, Pk ∈ M.

There are at most t equations in t + 1 variables αi. Hence, there must be a
non-trivial solution [α0, . . . , αg+t]. Let θ = Σ ψi αi. Then, θ(Pk) = 0 for all
Pk ∈ M. Therefore, θ is an error locator for e, and θ ⊙ e = 0. The
trouble is that we do not know the set M: the above equations exist only
in the virtual world, and we are not allowed to solve them.

We shall replace the above equations indexed by the set M by the
following system of equations:

Σ_{i≤g+t} Si,j αi = 0, j ≤ 2g + t − 1. (2′)

We want to verify that the set {α0, . . . , αg+t} satisfies the above
equations (2′). It follows from Proposition 6.4 that θχ · e = 0 for all χ ∈
L(A′ + (2g − 1)P); in particular, it suffices to check this for a basis {χj} of
L(A′ + (2g − 1)P). Henceforth, we have

Σ_i Si,j αi = Σ_i ψi χj · e αi = χj · (θ ⊙ e) = 0

for all j. Therefore, equations (2′) are satisfied.

Conversely, let the non-zero set {α0, . . . , αg+t} satisfy equations (2′).
Let θ = Σ ψj αj. It follows from Proposition 6.4 that it suffices to show
that θχ · e = 0 for all χ ∈ L(A′ + (2g − 1)P), and it suffices to check this
for a basis {χj}. We have

θχj · e = Σ_i ψi χj · e αi = Σ_i Si,j αi = 0.

Thus, any non-trivial solution of equations (2′) induces an error locator. It
means that the concrete system of equations (2′) is equivalent to the virtual
equations indexed by the set M for θ.

We have the following proposition.

Proposition 6.15. Assume that 1 ≤ wt(e) ≤ t. Let the set M′ = {Pi :
θ(Pi) = 0}. Then, its cardinality is at most g + t, and the system of equations

Σ_{Pk∈M′} φi(Pk)Ek = φi · e, i ≤ 3g + 2t − 1 (3′)

has a unique non-zero solution e|M′, where e|M′ is the restriction of
e to the coordinates determined by M′. Note that outside the positions
determined by M′, e has zero coordinates, so we can find e from the preceding
information.

Proof. Since θ ∈ L(A + gP), it follows from Proposition 6.6 that the
cardinality of M′ is ≤ d(A + gP) = g + t.

Since c = r + e and the field Fq is of characteristic 2, we have ek = 0 and
ck = rk outside M′; hence Σ_{Pk∈M′} φi(Pk)ek = φi · e. The right-hand sides
are known: φi · e = φi · r for i ≤ 2g + 2t − 1, and for 2g + 2t − 1 < i ≤ 3g + 2t − 1
the values φi · e were determined above. Therefore, the above equation
(3′) has a solution; clearly, e|M′ = e is a solution of (3′). Let e∗ be another
solution. Then, the weight of e − e∗ is ≤ g + t, and e − e∗ will be a solution
of the homogeneous system of equations

Σ_{Pk∈M′} φi(Pk)Ek = 0, i ≤ 3g + 2t − 1. (5′)

It means that e − e∗ ∈ Cp(B, A + A′ + (3g − 1)P), while Cp(B, A + A′ +
(3g − 1)P) has a minimal distance ≥ 3g + 2t − 1 − (2g − 2) = g + 2t + 1 >
g + t ≥ wt(e − e∗). Therefore, e − e∗ = 0, and the solution is unique. It
means that we may solve the system of equations (3′) to find the error
vector e. □
So, the question is how to construct the syndrome table [Si,j] (where
Si,j = ψi χj · e) line by line without knowing e. The only things which
help us are the following: step by step until the last step, for s = 0, 1, . . . , g − 1,
we assume that for all i, j with i + j ≤ 2g + 2t − 1 + s, the Si,j are known,
and furthermore, we assume that we cannot find an error locator θ with
νA(θ) ≤ t + s row-wise (nor column-wise), i.e., the following two systems of
equations have only trivial solutions:

Σ_{i≤t+s} Si,j αi = 0, j ≤ 2g + t − 1, (2∗)

Σ_{j≤t+s} Si,j βj = 0, i ≤ 2g + t − 1. (2∗∗)

This fact will be critically important in the proof of Proposition 6.20. The
candidate values Si,j on the next line are classified as valid votes or invalid
votes, and the valid votes are further classified as correct or incorrect votes;
Proposition 6.20 states that correct votes exist and form the majority of all
valid votes. Thus, we shall collect all valid votes, and among the valid votes,
we look for the majority block, which must consist of the correct votes. Once
the correct vote is found, we make the vote unanimous by changing all
incorrect votes, invalid votes, and non-existent votes to the correct one.
Thus, the values on the line i + j = 2g + 2t + s will be decided correctly.

6.5. Feng–Rao’s Majority Voting

As suggested by the SV algorithm, it is helpful to know Si,j up to i + j ≤
3g + 2t − 1. Let us use the notation S|i,j to denote the submatrix [Su,v]
for u ≤ i, v ≤ j, where Su,v = ψu χv · e and i + j ≤ 2g + 2t − 1 + s + 1 ≤ 3g + 2t − 1.
Now, we study the terms Si,j for i + j = 2g + 2t − 1 + s + 1,
which means that in the following submatrix, all ∗ terms are known, and
we want to study the particular unknown term Si,j:

        ⎡ ∗  ···  ···   ∗  ⎤
S|i,j = ⎢···  ···  ···  ···⎥.
        ⎢ ∗  ···   ∗    ∗  ⎥
        ⎣ ∗  ···   ∗   Si,j⎦

6.5.1. Construction Process


We want to find the possible values of Si,j. Let us put a variable x at
that position. We use Gaussian elimination row-wise (resp. column-wise)
to eliminate the last row (resp. column) of S|i,j except at the (i, j) position.
Depending on the results of these elimination processes, we have a few
possibilities: (1) If all numbers in the last row become zero except the
last one, which is a linear function in x, then we set that linear function to
be zero, and thus it determines a value for x. Note that there might be
several elimination processes for the row operations and even more for the
column operations. The value of Si,j is determined uniquely (the values
determined by row operations and by column operations are identical).
(2) Otherwise, if it cannot be done, then the constructive way fails, and
the term Si,j cannot be determined. We use the following terminology to
clarify the situation.

Definition 6.16. Let (Si,j) be the coefficient matrix of equations (2∗)
or (2∗∗) after the proof of Proposition 6.15. If both the row-wise and the
column-wise eliminations mentioned above succeed in determining the value
at the (i, j) position, and the values of Si,j constructed row-wise and
column-wise are consistent, i.e., they are equal, then this value will be called
the valid vote of the individual Si,j. If the term Si,j cannot be determined
by elimination either row-wise or column-wise, or the solutions in either
case are not unique, or if they both exist and are unique but do not agree,
then we consider that there is no value, and we consider the vote
invalid. □

6.5.2. Relations Between All Coefficients of the Matrix (Si,j) for i + j = 2g + 2t − 1 + s + 1 with Fixed s

We shall assume that the Si,j are all known for i + j ≤ 2g + 2t − 1 + s. We are
working on the next line i + j = 2g + 2t − 1 + s + 1 to find the correct value
of Si,j by looking over all possible values.

Let us consider the valid votes. Our task is to find the unknown word
e. Before we can determine e, we shall try to find the ψi χj · e. For the ψi χj · e
in our discussion, we can prove that some are the correct numbers, while some
are not. We have the following definition.

Definition 6.17. Let us fix s with s + 1 ≤ g. Let us consider the
coefficient matrix (Si,j) of equations (2∗) or (2∗∗). If a valid vote Si,j
can be shown to satisfy Si,j = ψi χj · e, then it is called a correct vote. □

The above definition seems inoperable without knowing e beforehand.
The important trick is to show that some Si,j is correct without knowing
e. The correct Si,j, depending on the index i, may all be different; however,
there is a way to tell the correct ones and the incorrect ones apart: the correct
ones all determine the same φi+j · e through the different ψi χj · e. To be
precise, let us consider {φi : νA+A′(φi) = i, 0 ≤ i ≤ 3g + 2t − 1} ⊂ L(A + A′ + (3g − 1)P).
We may express ψi χj for i + j = 2g + 2t − 1 + s + 1 (note that s + 1 ≤ g)
as follows:

ψi χj = ai,j φi+j + Σ_{k<i+j} bi,j,k φk. (6)

The above is the linear expression of ψi χj in terms of the basis elements φi+j and
{φk : k < i + j}. The coefficients ai,j, bi,j,k can be pre-computed. Moreover,
ai,j ≠ 0; otherwise, νA+A′(ψi χj) < i + j. Therefore, since ai,j, all the bi,j,k,
and the φk · e are known, the value Si,j = ψi χj · e is uniquely determined
by the value of φi+j · e and vice versa. We shall consider Si,j as a virtual
vote, with φi+j · e, through the above formula (6), the common value voted
for by the virtual votes. In this way, we may group all virtual votes into blocks.
Later (see Proposition 6.20), we show that the majority block among all blocks
is the block of correct votes, and thus we can tell the correct value of Si,j.

A priori, we do not know if there is a correct one. Now, we shall search
for all possible values Si,j for fixed i, j with i + j = 2g + 2t − 1 + s + 1.
Among them, we shall determine which ones are valid and which ones form
the majority block, if there is any.

6.5.3. Possible Values of All Coefficients of the Matrix (Si,j) for i + j = 2g + 2t − 1 + s + 1 with Fixed s

Let us fix i, j with i + j = 2g + 2t − 1 + s + 1. In our situation, the matrix S|i,j
is not completely known, i.e., all entries are known except the Si,j term. If
the last row of S|i,j−1 is a linear combination of the preceding rows, i.e.,
the following equations can be solved:

Σ_{k<i} αk Sk,ℓ = Si,ℓ for ℓ < j, (7)

then let {α0, . . . , αi−1} be a solution, and Si,j can be defined as

Si,j = Σ_{k<i} αk Sk,j.

We will have a better understanding of the above procedure by using
linear algebra. Let us give Si,j some value to complete the matrix S|i,j.

Proposition 6.18. Let S|i,j = [Sk,ℓ] (k ≤ i, ℓ ≤ j) be a submatrix. Then,
(1) rank(S|i−1,j−1) = rank(S|i,j) ⇔ Si,j casts a valid vote; (2)
rank(S|i−1,j−1) = rank(S|i−1,j) = rank(S|i,j−1) ⇔ there is a unique way
to fill in the value of Si,j so that rank(S|i−1,j−1) = rank(S|i,j).

Proof. First, we shall prove claim (1).

(⇒) We have

rank(S|i−1,j−1) ≤ rank(S|i,j−1) ≤ rank(S|i,j).

So, they must all be equal. Then, by Gaussian elimination applied to the last row of
S|i,j, when the last row of S|i,j−1 becomes 0, the value of Si,j is determined.
Hence, our assigned arbitrary value for Si,j must be the determined one.
Suppose that there are two ways to give two different values β1, β2 for Si,j.
If we assign β2 to the matrix, then clearly the last row of S|i,j will become
[0, . . . , 0, a] with a ≠ 0. Then, we have

rank(S|i−1,j−1) = rank(S|i,j) − 1.

This is against our hypothesis. So, the value of Si,j is uniquely determined this
way. Similarly, its value is uniquely determined column-wise. Let the value
of Si,j be b row-wise and c column-wise. Let us take the value b for Si,j and
then apply the column-wise Gaussian elimination to the last column. We
get a new matrix with last row [0, . . . , 0, b − c]. It follows that b = c.

(⇐) If Si,j casts a valid vote, then by Gaussian elimination applied to the
last row, the last row will become [0, . . . , 0]. Let us then apply Gaussian
elimination to the last column. The last column will become [0, . . . , 0]^T. It is
easy to see that

rank(S|i−1,j−1) = rank(S|i,j).

Second, we shall prove claim (2). Using Gaussian elimination on the last
row, the last entry of the last row is 0; then the last row consists of 0's only.
Since Gaussian elimination is reversible, this determines the value of Si,j
uniquely. Now, apply Gaussian elimination to the last column, and we have
the last column consisting of 0's only. So, clearly

rank(S|i−1,j−1) = rank(S|i,j). □

We want to illustrate the relation between the rank of S|i,j and the existence
of error locators of a certain kind. Let us assume that the value of Si,j
casts a valid vote. Let us consider an error locator θ with νA(θ) = i and
θ ∈ L(A0 + (2g + t − 1)P). Let

θ = ai ψi + Σ_{k<i} bk ψk,

where ai ≠ 0. As usual, we assume it to be 1. It follows from Proposition
6.4 that we always have

θχℓ · e = χℓ · (θ ⊙ e) = χℓ · 0 = 0, for all ℓ ≤ 2g + t − 1.

Now, we slightly weaken the conditions in the above equation to the
following:

θχℓ · e = χℓ · (θ ⊙ e) = χℓ · 0 = 0, for all ℓ ≤ j,

or

ψi χℓ · e + Σ_{k<i} bk ψk χℓ · e = 0, for all ℓ ≤ j,

or, simply changing the notation,

Si,ℓ = Σ_{k<i} (−bk)Sk,ℓ, for all ℓ ≤ j.

Thus, the last row of S|i,j is a linear combination of the preceding rows.
We rewrite the above in the following way: the last row of S|i,j is a linear
combination of the preceding rows, i.e., the following equations can be solved:

Σ_{k<i} αk Sk,ℓ = Si,ℓ for ℓ < j. (7)

If there is a solution {α0, . . . , αi−1}, then Si,j = ψi χj · e must be

Si,j = Σ_{k<i} αk Sk,j.

Similarly, we may do it column-wise. Now, it is clear that the rank of S|i,j is
related to our voting procedure.
We shall again explain the above using linear algebra. Let us consider a
non-zero vector v = [b0, . . . , bi−1, 1, 0, . . . , 0], with the entry 1 at position i.
Then, v is in the left null-space of S|i,j ⇐ the rational function θ defined by

θ = ψi + Σ_{k<i} bk ψk

is an error locator with νA(θ) = i (cf. Proposition 6.4).

We have the following proposition.

Proposition 6.19. Assume that 1 ≤ wt(e) ≤ t, that there are no error
locators θ, θ′ such that νA(θ) < i, θ ∈ L(A0 + (2g + t − 1)P), or νA′(θ′) < j,
θ′ ∈ L(A′0 + (2g + t − 1)P), and that ψk ⊙ e is correctly defined for k < i and χk ⊙ e
is correctly defined for k < j. Furthermore, we assume that either (1) there
are error locators θ and θ′ with νA(θ) = i, θ ∈ L(A0 + (2g + t − 1)P), and
νA′(θ′) = j, θ′ ∈ L(A′0 + (2g + t − 1)P), or (2) there is at least one of
the two error locators θ, θ′ and Si,j is a valid vote. Then, Si,j casts a vote
that is inductively defined and correct. Furthermore, φi+j · e
(cf. equation (6) after Definition 6.17) is inductively defined and correct.

Proof. Let us assume (1). In the above discussion, we showed that equation
(7) can be solved and that it produces a value for Si,j. By Gaussian elimination
applied to the last row, the last row is reduced to [0, . . . , 0]. Now, apply the same
reasoning to the columns: the last column becomes [0, . . . , 0]^T. Therefore, we have
rank(S|i−1,j−1) = rank(S|i,j). It follows from the preceding proposition
that Si,j casts a valid vote. Let

θ = ai ψi + Σ_{k<i} bk ψk, (1)

where ai ≠ 0. As usual, dividing both sides by ai, we assume it to be 1.
Furthermore, the above expression is unique; otherwise, taking the difference
of two expressions, we would have an error locator with νA less than i, which
must be zero. Hence, the expression is unique. We have

0 = θ ⊙ e = ψi ⊙ e + Σ_{k<i} bk ψk ⊙ e.

It means that we have

ψi ⊙ e = Σ_{k<i} bk ψk ⊙ e,  ψi · e = Σ_{k<i} bk ψk · e

since the characteristic of the field is 2. Now, all the ψk ⊙ e are correctly defined,
as we already know; then ψi ⊙ e must be correctly defined and unique.

Moreover, if there is an error locator θ′ with νA′(θ′) = j, θ′ ∈ L(A′0 +
(2g + t − 1)P), then, repeating the above argument, we conclude that if

θ′ = aj χj + Σ_{k′<j} bk′ χk′,

where aj ≠ 0 (as usual, we assume it to be 1), then the expression is unique.
We have

0 = θ′ ⊙ e = χj ⊙ e + Σ_{k′<j} bk′ χk′ ⊙ e.

Since all the χk′ ⊙ e are correctly defined, as we already know, χj ⊙ e must
be correctly defined and unique. It means that we have

χj ⊙ e = Σ_{k′<j} bk′ χk′ ⊙ e,  χj · e = Σ_{k′<j} bk′ χk′ · e.

The computation of ψi χj · e can be carried out as

ψi χj · e = ψi · (χj ⊙ e) = ψi · (Σ_{k′<j} bk′ χk′ ⊙ e)
  = Σ_{k′<j} bk′ χk′ · (ψi ⊙ e)
  = Σ_{k′<j} bk′ χk′ · (Σ_{k<i} bk ψk ⊙ e)
  = Σ_{k<i, k′<j} bk bk′ ψk χk′ · e.

Although the term e is unknown, we do know φi · c = 0 for i ≤ 2g + 2t − 1.
Hence, φi · e = φi · r, and these are all known for i ≤ 2g + 2t − 1. For all h with
2g + 2t − 1 < h ≤ 2g + 2t − 1 + s, inductively, the φh · e are known, defined,
and correct. Then, it follows from the above formula that if assumption
(1) of this proposition is satisfied, it is easy to prove that ψi χj · e is inductively
defined and correct. Furthermore, since

ψi χj − φi+j = Σ_{k<i+j} bi,j,k φk,

it is easy to show that φi+j · e is inductively defined and correct.

Let us assume (2). We may assume that θ exists. Let us use the notations
above. Then, ψi ⊙ e is correctly defined. Furthermore, we have χj θ · e = 0,
and hence we have

Si,j = ψi χj · e = χj · (ψi ⊙ e).

It is easy to see that Si,j is correctly defined. The rest is easy. □

6.5.4. The Block of Correct Votes is in Majority


As we pointed out, we do not know the values of φℓ · e for 2g + 2t − 1 < ℓ.
Hence, we cannot test to find φℓ · e directly. We go over all Si,j for i + j =
2g + 2t − 1 + s + 1 and collect all valid votes. Recall that we assumed, as in
the remark of Section 6.4.7 (The Construction of Error Locator: Step
s + 1 < g), that there are no error locators θ and θ′ with νA(θ) ≤ t + s,
θ ∈ L(A + (2g − 1)P), or νA′(θ′) ≤ t + s, θ′ ∈ L(A′ + (2g − 1)P); otherwise,
we can decode (cf. Proposition 6.14). The following proposition says that
the majority of the valid votes is the correct one.

Proposition 6.20. Assume that 1 ≤ wt(e) ≤ t, 0 ≤ s, and s + 1 ≤ g.
We assume that there are no error locators θ and θ′ with νA(θ) ≤ t + s,
θ ∈ L(A + (2g − 1)P), or νA′(θ′) ≤ t + s, θ′ ∈ L(A′ + (2g − 1)P).
Then, among all valid votes, (the number of correct votes) − (the number of
incorrect votes) ≥ s + 1. Since s ≥ 0, the correct votes will be in the majority.
In particular, there is at least one correct vote.

Proof. Note that i + j = 2g + 2t − 1 + s + 1 and i, j ≤ 2g + t − 1. We
conclude that t + s + 1 ≤ i, j ≤ 2g + t − 1.

(1) We claim that there are at least g error locators θ with νA(θ) distinct
numbers between t + s + 1 and 2g + t − 1 and with the corresponding number
ai = 1 in equation (1) of Proposition 6.19.

Since there is no error locator θ with νA(θ) ≤ t + s, we may count all
error locators in L(A + (2g − 1)P). Let the divisor E be Σ_{Pi∈M} Pi, where
M is the set of error locations of e. Clearly, an error locator θ is a non-zero
element of L(A + (2g − 1)P − E). Note that d(A + (2g − 1)P − E) ≥ 2g − 1.
Therefore, it follows from Riemann's theorem that ℓ(A + (2g − 1)P − E) ≥ g.
Let {θi} be a basis. We may use a linear change of basis to make the
new basis elements have distinct values νA(θi) and the corresponding numbers
ai = 1 in equation (1) of Proposition 6.19. We still name the new
basis {θi}.

(2) Let us count the number mcor of correct valid votes and the
number minc of incorrect valid votes. According to the preceding
Proposition 6.19, mcor ≥ the number of pairs (i, j) with both error locators
θ, θ′, with νA(θ) = i, νA′(θ′) = j and θ ∈ L(A + (2g − 1)P), θ′ ∈
L(A′ + (2g − 1)P); and minc ≤ the number of pairs (i, j) with neither an error
locator θ with νA(θ) = i, θ ∈ L(A + (2g − 1)P), nor an error locator θ′ with
νA′(θ′) = j, θ′ ∈ L(A′ + (2g − 1)P).

Let the set J be the set of all integers {i : t + s + 1 ≤ i ≤ 2g + t − 1};
then there is a map π with π(Si,j) = i which sends {Si,j : i + j = 2g + 2t + s}
to J. Let I be the subset of J which is the collection of the {νA(θ)} of error
locators θ, where the corresponding number ai = 1 in equation (1) of
Proposition 6.19. It means that for every such index, there is exactly one
error locator. So, it follows from (1) that the cardinality |I| ≥ g.

(3) Let I1 be the subset of J which is the collection of the {νA′(θ′)} of error
locators θ′. Then, by arguments identical to (1) and (2), we have |I1| ≥ g.
Let us define a reflection σ of the interval J = {i : t + s + 1 ≤ i ≤
2g + t − 1}, i.e., σ(j) = 2g + 2t + s − j. Note that σ² = id. Then, it is easy
to see that i + j = 2g + 2t + s ⇔ i = σ(j):

i + j = 2g + 2t + s ⇔ i = (2g + 2t + s) − j = σ(j).

Let us define I′ = σ(I1). Then, it is easy to see that |I′| ≥ g. Given a valid
Si,j with i + j = 2g + 2t + s and i ∈ I ∪ I′, we have that either
θ ∈ L(A + (2g − 1)P) with νA(θ) = i exists, or, with j = σ(i),
θ′ ∈ L(A′ + (2g − 1)P) with νA′(θ′) = j exists. In either case,
Si,j is correct. Let the number of incorrect valid votes be minc. We have

minc ≤ |J \ (I ∪ I′)| = 2g − s − 1 − |I ∪ I′|.

Moreover, if i ∈ I ∩ I′, then, with j = σ(i), both θ ∈ L(A + (2g − 1)P)
and θ′ ∈ L(A′ + (2g − 1)P) exist. It follows from Proposition 6.19 that Si,j
is correct. Let the number of correct valid votes be mcor. We have

mcor ≥ |I ∩ I′|.

So, we have

mcor − minc ≥ |I ∩ I′| + |I ∪ I′| + s + 1 − 2g
            = |I| + |I′| + s + 1 − 2g ≥ s + 1.

So, we have proved that more than half of the valid votes are correct; they provide
the identical φi+j · e, and they are in the majority. □

Remark: After we tally all valid votes, separate them into blocks
according to the values of φi+j · e induced by them, and find the winner, which
is the correct one, we make the vote unanimous by changing all incorrect
votes, invalid votes, and non-existent votes to the correct one. Once we
complete the extra line Si,j with i + j = 2g + 2t − 1 + s + 1, we shall
try to find an error locator of order t + s + 1 row-wise or column-wise.
If we cannot find an error locator either way, then we decide the next
line i + j = 2g + 2t − 1 + s + 2. We thus proceed to the end; using the
material in Section 6.4.8 (The Construction of Error Locator: Final
Step s + 1 = g), we find the error locator θ and the error vector e. Finally,
we solve the decoding problem. □
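A minimal sketch of the voting step itself (hypothetical data structures; the coefficients ai,j = 1 and bi,j,k of equation (6) are assumed pre-computed, and the field arithmetic is supplied elsewhere): each valid vote Si,j implies a candidate value of φi+j · e, the candidates are tallied into blocks, and the majority block decides the line.

```python
from collections import Counter

def decide_line(valid_votes, implied_phi_value):
    """valid_votes: dict {(i, j): S_ij} of the valid votes on one line i + j = const.
    implied_phi_value(i, j, s_ij): the value of phi_{i+j}.e implied by S_ij via
    equation (6), i.e. s_ij minus the known lower-order terms b_{i,j,k} phi_k.e.
    Returns the majority value of phi_{i+j}.e (Proposition 6.20)."""
    blocks = Counter(implied_phi_value(i, j, s) for (i, j), s in valid_votes.items())
    winner, _ = blocks.most_common(1)[0]
    return winner  # afterwards, every S_ij on the line is reset to match this value

# Toy run with the s = 1 line of Example 2 below: phi6*phi7 = phi13 exactly, so the
# two valid votes S_{6,7} = S_{7,6} = 12 imply phi13.e = 12 directly, and 12 wins.
print(decide_line({(6, 7): 12, (7, 6): 12}, lambda i, j, s: s))   # 12
```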

Example 2: Let us consider a one-point code. Let us consider the Klein
quartic curve over F_{2^4} defined by the equation x³y + y³ + x = 0 (cf.
Example 36 in Section 4.9; we shall use the notation of Example 36).
Its genus is g = 3. Let us consider a one-point code Cp(B, D) with B =
P1 + · · · + P16 and D = (2g + 2t − 1)P = 11P with t = 3.

We take A = A′ = 3P, A0 = 0, and D′ = D = 11P. For the geometric
Goppa code Cp(B, D), we have the rank k = n − d(D) + g − 1 = 7 and the
designed minimal distance d = d(D) − 2g + 2 = 7. Using the DU algorithm,
we expect to correct 3 errors. Compared with our previous example for the SV
algorithm, we have the following table:

      SV Algorithm   DU Algorithm
n =        16             16
k =         4              7
d =        10              7
t =         3              3

Note that all other aspects of these two algorithms are comparable, but
the contents of their messages are different. For a block of length 16, the
SV algorithm carries information of 4 letters while the DU algorithm
carries information of 7 letters. Therefore, the same length of transmission
under the DU algorithm contains 7/4 the amount of information of the
SV algorithm. We shall make the following pre-computations.

(1) We pre-compute the generator matrix and the check matrix (cf.
Exercises 4 and 5).
(2) We compute a basis of L((3g + 2t − 1)P) = L(14P) = L(A + A′ +
(3g − 1)P). It is easy to see that the following {φ0, φ3, φ5, φ6, φ7, φ8, φ9,
φ10, φ11, φ12, φ13, φ14} form a basis:

φ0   φ3   φ5    φ6    φ7     φ8    φ9    φ10    φ11   φ12   φ13    φ14
1    1/x  y/x²  1/x²  y²/x³  y/x³  1/x³  y²/x⁴  y/x⁴  1/x⁴  y²/x⁵  y/x⁵

where ordP(φi) = −i. The reasons that they form a basis are the following:
(1) φi ∈ L(14P); (2) they are linearly independent (they have distinct pole
orders at P); (3) by Riemann's theorem, ℓ(14P) = 14 + 1 − g = 12.
We shall compute the following 12 × 16 matrix C:

    ⎡ φ0(P1)  · · ·  φ0(P16) ⎤
C = ⎢ φ3(P1)  · · ·  φ3(P16) ⎥.
    ⎢  · · ·   · · ·   · · ·  ⎥
    ⎣ φ14(P1) · · ·  φ14(P16)⎦

It is easy to see that the first 6 of the φi form a basis for L(A0 + (2g +
t − 1)P) = L(8P).

We shall consider another auxiliary divisor A′ + (2g − 1)P = (2g + t −
1)P = 8P. We have to express ψi χj = Σ_{k≤i+j} ak φk as follows (note that
χj ∈ L(A′ + (2g − 1)P) = L(8P) and ψi ∈ L(A0 + (2g + t − 1)P) = L(8P)):

φ0 φi = φi, for i ≤ 8,
φ3 φi = φi+3, for i ≤ 8,
φ5 φi = φi+5, for i ≤ 8 and i ≠ 7,
φ6 φi = φi+6, for i ≤ 8.

Using the defining equation y³ = x³y + x, we deduce the remaining
products as follows:

φ5 φ7 = y³/x⁵ = 1/x⁴ + y/x² = φ12 + φ5,
φ7 φ7 = y⁴/x⁶ = (x³y² + xy)/x⁶ = y²/x³ + y/x⁵ = φ7 + φ14.

The above are our pre-computations.

6.5.5. Syndrome Calculation

Let us consider the received word r. We use the check matrix or the check
procedure to determine if r is a code word. If it is a code word, then we pass
to the next block of the received message. The computation requires 7 × 16 =
112 multiplications. If it is not a code word, then we start the syndrome
calculation as follows. We compute φi · r for the φi ∈ L(8P), i.e.,

⎡ φ0(P1) · · · φ0(P16) ⎤   ⎡ r1 ⎤   ⎡ φ0 · r ⎤
⎢ φ3(P1) · · · φ3(P16) ⎥ · ⎢ r2 ⎥ = ⎢ φ3 · r ⎥.
⎢  · · ·  · · ·  · · ·  ⎥   ⎢  ·  ⎥   ⎢   ·   ⎥
⎣ φ8(P1) · · · φ8(P16) ⎦   ⎣ r16⎦   ⎣ φ8 · r ⎦

If φi · r = 0 for all these φi, then r is a received word with more than 3
errors (since it fails the check matrix test, it cannot be a code word); we
shall return an error message and pass to the next block of the received
message. The total number of multiplications is 6 × 16 = 96. If φi · r ≠ 0
for some φi, we have to compute the following to start the building of our
syndrome table:

⎡ φ9(P1)  · · · φ9(P16) ⎤   ⎡ r1 ⎤   ⎡ φ9 · r  ⎤
⎢ φ10(P1) · · · φ10(P16)⎥ · ⎢  ·  ⎥ = ⎢ φ10 · r ⎥.
⎣ φ11(P1) · · · φ11(P16)⎦   ⎣ r16⎦   ⎣ φ11 · r ⎦

The total number of computations for the case that there is an error is
9 × 16 = 144.

6.5.6. A Concrete Example of Transmission


We use the same representation of F_{2^4} as in Example 36 of Section 4.9.
Recall that we let α be a field generator of F_{2^2} over F2, i.e., α² + α + 1 = 0.
Let us consider the field F_{2^4}. Let F_{2^4} = F_{2^2}[β] with β satisfying β² + β + α =
0. It is easy to see that {αβ, β, α, 1} is a basis for F_{2^4} over F2, i.e., any
element r ∈ F_{2^4} can be written as a0 · 1 + a1 · α + a2 · β + a3 · αβ. We
represent r by the integer a0 + a1 · 2 + a2 · 2² + a3 · 2³ in the following tables.
Let us consider the example where the code word c, the error word e, and
the received word r are as follows:

c = 13 0 14 3 0 10 11 8 1 10 14 12 1 1 1 1,
e = 0 0 0 0 0 0 0 0 0 0 0 0 1 14 0 15,
r = 13 0 14 3 0 10 11 8 1 10 14 12 0 15 1 14.

Certainly, the receiver has only the received word r. We include the
original message c and the error word e just for our discussion. The whole
point is that given only r, we want to recover c.
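The arithmetic behind these integer labels can be made completely explicit. The following sketch (a minimal implementation of exactly this representation, with addition being bitwise XOR) multiplies in F_{2^4} = F_{2^2}[β], using α² = α + 1 and β² = β + α, and checks r = c + e coordinate-wise:

```python
def mul4(a, b):
    """Multiply in F_4 = F_2[alpha], alpha^2 = alpha + 1; 0..3 encode a0 + a1*alpha."""
    a0, a1 = a & 1, a >> 1
    b0, b1 = b & 1, b >> 1
    c0 = (a0 & b0) ^ (a1 & b1)               # alpha^2 contributes 1 ...
    c1 = (a0 & b1) ^ (a1 & b0) ^ (a1 & b1)   # ... plus alpha
    return c0 | (c1 << 1)

def mul16(a, b):
    """Multiply in F_16 = F_4[beta], beta^2 = beta + alpha; 0..15 encode c0 + c1*beta
    with c0, c1 in F_4 stored as the low/high pairs of bits (the labels of the text)."""
    a0, a1 = a & 3, a >> 2
    b0, b1 = b & 3, b >> 2
    t = mul4(a1, b1)                 # coefficient of beta^2 = beta + alpha
    c0 = mul4(a0, b0) ^ mul4(2, t)   # 2 encodes alpha
    c1 = mul4(a0, b1) ^ mul4(a1, b0) ^ t
    return c0 | (c1 << 2)

c = [13, 0, 14, 3, 0, 10, 11, 8, 1, 10, 14, 12, 1, 1, 1, 1]
e = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 14, 0, 15]
r = [13, 0, 14, 3, 0, 10, 11, 8, 1, 10, 14, 12, 0, 15, 1, 14]
assert all(ci ^ ei == ri for ci, ei, ri in zip(c, e, r))  # r = c + e in characteristic 2
assert mul16(8, 8) == 13                                  # (alpha*beta)^2 = alpha*beta + beta + 1
```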

6.5.7. Syndrome Tables


The received word will not pass the syndrome calculation. For instance,
φ3 · r = α ≠ 0 (we sometimes write α = 2, which may cause some confusion).
So, we look at the following syndrome table (Table 6.2).

Table 6.2 is constructed from the relation r = c + e and φi φj · r = φi φj · e for
all i + j ≤ 2g + 2t − 1 = 11. If there were an error locator θ with νA(θ) ≤ 3 and
θ ∈ L(8P), then the first two rows would be linearly dependent. Since they
are not, there is no such θ (note the requirement of Proposition 6.20). In
general, to check whether two row vectors of length 6 are linearly dependent
takes at most 6 multiplications.

Table 6.2.
φ0 , φ3 , φ5 , φ6 , φ7 , φ8
φ0 0, 8, 4, 3, 12, 8
φ3 8, 3, 8, 7, 11, 11
φ5 4, 8, 11, 11, S5,7 ,
φ6 3, 7, 11, S6,6 ,
φ7 12, 11, S7,5 ,
φ8 8, 11,

We shall construct, for s = 0, the terms Si,j for i + j = 2g + 2t −
1 + 0 + 1 = 12 (cf. Section 6.6). There are three terms Si,j on the line
i + j = 12 (with Si,j = φi φj · e) which are unknown,
i.e., φ5 φ7 · e = S5,7, φ6 φ6 · e = S6,6, φ7 φ5 · e = S7,5. Note that the terms
φ4 φ8 · e = S4,8 = φ8 φ4 · e = S8,4 are not included in our discussion since
φ4 does not exist.

Let us compute the following 3 × 3 submatrix S|5,5 of Table 6.2:

⎡0 8 4 ⎤   ⎡ 0   αβ      β     ⎤
⎢8 3 8 ⎥ = ⎢ αβ  α + 1   αβ    ⎥.
⎣4 8 11⎦   ⎣ β   αβ   αβ + α + 1⎦

It is easy to see by direct computation that the above matrix is non-
singular. It follows from Proposition 6.18 that S5,7 and S7,5 give only non-valid
votes, and the only remaining candidate S6,6 = 7 must be a correct vote (cf.
Propositions 6.18 and 6.20) (hence valid). By our previous computations,
we know that φ12 · e = φ6 φ6 · e = 7, and we shall use the value of φ12 · e = 7
to correct the values S5,7 = S7,5 = φ12 · e + φ5 · e. We have
S5,7 = S7,5 = φ5 φ7 · e = φ12 · e + φ5 · e = 7 + 4 = (β + α + 1) + β = 3. In
general, to find whether S5,7 casts a valid vote, we have to check whether rank(S|3,7) =
rank(S|3,6) = rank(S|5,6); it needs 13 multiplications. If the answer is yes,
then we can find its value, and hence its vote, with 2 more multiplications. If the
answer is no, we discard it. Since the matrix is always symmetric, we do
not have to do anything about S7,5. For S6,6, a similar argument will give
us 16 multiplications to determine whether the value constitutes a valid vote, and
if so, the vote. Therefore, in general, we need 31 multiplications to fill those
3 spots.
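The two facts used above are easy to verify mechanically with the mul16 of the previous sketch: the non-singularity of S|5,5 over F_{2^4}, and the forced correction S5,7 = S7,5 = 3 (a minimal check, not the general rank procedure).

```python
S55 = [[0, 8, 4],
       [8, 3, 8],
       [4, 8, 11]]

def det3(m):
    """3x3 determinant over F_16; in characteristic 2, '+' and '-' are both XOR."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return (mul16(a, mul16(e, i) ^ mul16(f, h))
            ^ mul16(b, mul16(d, i) ^ mul16(f, g))
            ^ mul16(c, mul16(d, h) ^ mul16(e, g)))

assert det3(S55) != 0         # so S5,7 and S7,5 cast invalid votes (Proposition 6.18)
phi12_e, phi5_e = 7, 4        # phi12.e from the correct vote S6,6; phi5.e = S0,5
assert phi12_e ^ phi5_e == 3  # S5,7 = S7,5 = phi12.e + phi5.e = 3, as in Table 6.3
```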

Table 6.3 is as follows.

Table 6.3.
φ0 , φ3 , φ5 , φ6 , φ7 , φ8
φ0 0, 8, 4, 3, 12, 8
φ3 8, 3, 8, 7, 11, 11
φ5 4, 8, 11, 11, 3, S5,8
φ6 3, 7, 11, 7, S6,7 ,
φ7 12, 11, 3, S7,6 ,
φ8 8, 11, S8,5 ,

Now, since there is no φ4, there is no error locator θ with νA(θ) = 4.
It means that νA(θ) ≤ 4 ⇔ νA(θ) ≤ 3. From the point of view of linear
algebra, there is no S4,8. Then, we may try to fill Table 6.3 for the next
step. We shall try to find the correct values of S5,8, S6,7, S7,6, S8,5.

Since the matrix S|5,5 is non-singular, S5,8 and S8,5 cast invalid votes (cf.
Proposition 6.18). We have s + 1 = 2, so (the number of correct votes) −
(the number of incorrect votes) must be ≥ 2 (cf. Proposition 6.20), and S6,7, S7,6
both cast correct votes. Then, it is routine to find out that S6,7 = 12 and S7,6 = 12.
By correcting all wrong votes and invalid votes, we have S5,8 = 12, S8,5 = 12.

In general, the computations of the ranks of the various submatrices use
30 multiplications, and finding the values of S5,8 (= S8,5) and S6,7 (=
S7,6) uses 2 + 3 = 5 multiplications. So, in total, it needs 35 multiplications.
Table 6.4 becomes as follows.

Table 6.4.
φ0 , φ3 , φ5 , φ6 , φ7 , φ8
φ0 0, 8, 4, 3, 12, 8
φ3 8, 3, 8, 7, 11, 11
φ5 4, 8, 11, 11, 3, 12
φ6 3, 7, 11, 7, 12, S6,8
φ7 12, 11, 3, 12, S7,7 ,
φ8 8, 11, 12, S8,6 ,

We check whether there is an error locator θ with νA(θ) ≤ 5 and θ ∈ L(8P),
or an error locator θ′ with νA′(θ′) ≤ 5 and θ′ ∈ L(8P). Now, since the
matrix S|5,5 is non-singular, we cannot find any. Therefore, we may use
Proposition 6.18. In general, we have to check whether the first three rows are
linearly dependent; it takes 12 + 5 = 17 multiplications.

So, we consider s = 2, s + 1 = 3. There are three terms to be considered:
S6,8, S7,7, S8,6. According to Proposition 6.20, among the valid votes, we
have (correct votes) − (incorrect votes) ≥ s + 1 = 3. So, all of them must
be valid and correct. On the other hand, s = 2 = g − 1 is the end of our
induction. So, we should make the last computation.

It can be checked that for the matrix S|6,7, with the rows denoted by
R0, R3, R5, R6, we have R6 = 6R5 + 8R3 + 3R0. Therefore, S6,8 = 4 = S8,6,
and S7,7 = φ14 · e + φ7 · e = 4 + 12 = β + (αβ + β) = αβ = 8.

In general, we have to compute the linear relation among the rows
R0, R3, R5, R6 of the matrix S|6,7. Using Gaussian elimination, it takes 26
multiplications, and it takes 3 more multiplications to find the value of S6,8.
So, in total, it takes 29 multiplications to find all of S6,8, S7,7, S8,6.
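Reusing mul16 from the sketch above, one can verify the row relation and the two new values directly from the known part of Table 6.4 (a minimal check, not the general elimination procedure):

```python
R0 = [0, 8, 4, 3, 12, 8]     # row phi0 (columns phi0, phi3, phi5, phi6, phi7, phi8)
R3 = [8, 3, 8, 7, 11, 11]    # row phi3
R5 = [4, 8, 11, 11, 3, 12]   # row phi5
R6 = [3, 7, 11, 7, 12]       # row phi6; the last entry S6,8 is still unknown

combo = [mul16(6, a) ^ mul16(8, b) ^ mul16(3, c) for a, b, c in zip(R5, R3, R0)]
assert combo[:5] == R6       # R6 = 6*R5 + 8*R3 + 3*R0 on the known columns
S68 = combo[5]               # extend the relation to the last column
S77 = S68 ^ 12               # S7,7 = phi14.e + phi7.e, with phi7.e = 12
print(S68, S77)              # 4 8
```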
We have the following complete Table 6.5.

Table 6.5.
φ0 , φ3 , φ5 , φ6 , φ7 , φ8
φ0 0, 8, 4, 3, 12, 8
φ3 8, 3, 8, 7, 11, 11
φ5 4, 8, 11, 11, 3, 12
φ6 3, 7, 11, 7, 12, 4
φ7 12, 11, 3, 12, 8,
φ8 8, 11, 12, 4,

It follows from Proposition 6.7 that the over-determined linear system
of equations Σi Si,j αi = 0, for i = 0, 3, 5, 6 and j = 0, 3, 5, 6, 7, 8, has a non-zero
solution {αi}, and it follows from Proposition 6.7 that θ = Σ αi φi is
an error locator. Now, θ = φ6 + 6φ5 + 8φ3 + 3φ0 is an error locator with
νA(θ) ≤ 6 and θ ∈ L(8P). Let us find the zero set of θ. We use the following
Table 6.6, with the rows of φ0, φ3, φ5, φ6 pre-computed and the last row, of
θ, computed from them.

Table 6.6.
P1 , P2 , P3 , P4 , P5 , P6 P7 P8 P9 P10 P11 P12 P13 P14 P15 P16
φ0 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
φ3 0, 0, 3, 2, 3, 3, 2, 2, 14, 13, 10, 8, 15, 12, 9, 11
φ5 0, 0, 1, 1, 4, 5, 6, 7, 12, 15, 9, 11, 5, 4, 7, 6
φ6 0, 0, 2, 3, 2, 2, 3, 3, 8, 10, 14, 13, 9, 11, 12, 15
θ 3, 3, 3, 10, 11, 13, 9, 15, 8, 3, 13, 9, 0, 0, 9, 0

In general, since θ(Pi) = φ6(Pi) + 6φ5(Pi) + 8φ3(Pi) + 3φ0(Pi), we use
the pre-computed values φj(Pi) to locate the zeroes of θ. It takes 4 × 16 = 64
multiplications. For the present case, the zero set M′ is {P13, P14, P16}.
Using Proposition 6.13, we have to solve the last set of equations as
follows:

1E1 + 1E2 + 1E3 = 0,
15E1 + 12E2 + 11E3 = 8,
5E1 + 4E2 + 6E3 = 4,
9E1 + 11E2 + 15E3 = 3,
10E1 + 8E2 + 13E3 = 12,
14E1 + 13E2 + 10E3 = 8.

According to Proposition 6.13, there will be a non-trivial solution.
It is easy to see that E1 = 1, E2 = 14, E3 = 15 satisfies all the
equations.

In general, there are at most g + t = 6 variables, and to solve this
system of equations, we need 72 multiplications.
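Since there are only 16³ = 4096 candidates for (E1, E2, E3), a brute-force search (a sketch reusing mul16 above; in practice one would use Gaussian elimination, as the multiplication counts assume) confirms the solution:

```python
from itertools import product

eqs = [([1, 1, 1], 0), ([15, 12, 11], 8), ([5, 4, 6], 4),
       ([9, 11, 15], 3), ([10, 8, 13], 12), ([14, 13, 10], 8)]

solutions = [E for E in product(range(16), repeat=3)
             if all(mul16(a, E[0]) ^ mul16(b, E[1]) ^ mul16(c, E[2]) == rhs
                    for (a, b, c), rhs in eqs)]
print(solutions)   # [(1, 14, 15)]: the error values at P13, P14, P16
```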
Therefore, we conclude

r = 13 0 14 3 0 10 11 8 1 10 14 12 0 15 1 14,
e =  0 0  0 0 0  0  0 0 0  0  0  0 1 14 0 15,
c = 13 0 14 3 0 10 11 8 1 10 14 12 1  1 1  1.

We have to further test whether c is a code word. We check the following
matrix equation:

⎡13 14  8⎤   ⎡ 1⎤   ⎡ 7⎤
⎢ 7  6  5⎥ · ⎢14⎥ = ⎢11⎥.
⎣ 6  7  4⎦   ⎣15⎦   ⎣11⎦

For instance, the first equation is

13 · 1 + 14 · 14 + 8 · 15
= (αβ + β + 1) · 1 + (αβ + β + α)² + αβ(αβ + β + α + 1)
= β + α + 1 = 7.

Similarly, we may verify the other two equations. Since the above
equation is satisfied, we know that c is a code word, and we successfully
decode. In general, we need 112 multiplications for an error-free block, 407
multiplications to correct a block with at most three errors, and 407
multiplications to return an error message (to indicate that there are more
than three errors). Since a block carries 7 letters, which is 28 bits, per bit it
takes 4, 14.5, and 14.5 multiplications for the error-free, error-correcting,
and failure cases, respectively. It is faster than the SV algorithm. □

Exercises

(1) Show that [0, 4, 12, 12, 1, 0, 0, 13, 12, 12, 3, 11, 0, 0, 0, 0] is a code word
in Example 2.
(2) Show that [13, 9, 9, 4, 0, 14, 1, 0, 13, 4, 9, 6, 0, 0, 0, 0] is a code word in
Example 2.
(3) Show that [14, 2, 12, 2, 0, 10, 0, 1, 2, 9, 12, 14, 0, 0, 0, 0] is a code word in
Example 2.
(4) Show that [10, 1, 14, 6, 0, 9, 0, 0, 13, 10, 15, 3, 1, 0, 0, 0] is a code word in
Example 2.
(5) Show that [0, 6, 3, 9, 0, 5, 0, 0, 6, 0, 5, 3, 0, 1, 0, 0] is a code word in
Example 2.
(6) Show that [1, 5, 14, 12, 0, 2, 0, 0, 13, 8, 5, 5, 0, 0, 1, 0] is a code word in
Example 2.
(7) Show that [1, 12, 9, 11, 0, 6, 0, 0, 14, 10, 5, 9, 0, 0, 0, 1] is a code word in
Example 2.
(8) Find a generator matrix for the code in Example 2.
(9) Find a check matrix for the code in Example 2.
(10) Write a computer program to decode the code of Example 2.
(11) Using the DU algorithm, write down the details of decoding the geometric
Goppa code Cp(B, 37P) based on the curve x0^5 + x1^5 + x2^5 with the ground field
F_{2^4} (cf. Section 6.8).
Appendices
Appendix A

Convolution Codes

A.1. Representation

The concept of a convolution code was introduced by Elias [19]. Any
sequence of data or a stream of data (a0, a1, . . . , an, . . .) with ai ∈ F can be
represented as a power series

f(x) = Σ_{i≥0} ai x^i.

Let us consider the coding process over F2. Given a stream over F2,
(a0, a1, a2, a3, . . . , an, . . .), i.e., ai ∈ F2, we may use the following process
to produce two (in general, one or any finite number of) data streams
(b0, b1, . . . , bn, . . .) and (c0, c1, . . . , cn, . . .).

[Encoder diagram: the input stream f(x) = (· · · , a1, a0) enters a three-cell
shift register (the leftmost box and two memory cells); one modulo-2 adder ⊕
tapping the register emits g1(x) = (· · · , b1, b0), and a second adder emits
g2(x) = (· · · , c1, c0).]

At the beginning, before the count of discrete time 0, 1, 2, . . ., we fill the
three boxes in the middle row and the two ⊕ with zeroes; then, at
time 0, a0 enters the leftmost box in the middle, and all other squares in
the middle row and the two ⊕ hold 0's. The two ⊕ at the top and
bottom denote modulo-2 additions. As time goes from n to n + 1, all values
move to the next positions according to the arrows, and then the two ⊕
combine the numbers by addition. The (n + 1)th coefficient will
occupy the leftmost box. Note that at times 0, 1, the resulting stream g1(x)
(resp. g2(x)) will receive two zeroes, which we disregard.

Mathematically, let us write

g1(x) = Σ_{i≥0} bi x^i,  g2(x) = Σ_{i≥0} ci x^i,

where

b0 = a0, b1 = a1, bn = an + an−2 for all n > 1,
c0 = a0, c1 = a1 + a0, cn = an + an−1 + an−2 for all n > 1.

Or we simply write

g1(x) = (1 + x²)f(x),
g2(x) = (1 + x + x²)f(x).
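The shift-register encoder is a few lines of code. The following Python sketch (a minimal illustration of the diagram above; appending two flush zeroes to the input for a finite run is our assumption) produces the interleaved stream h(x) = g1(x²) + xg2(x²) used in Sections A.2 and A.4:

```python
def encode(a):
    """Rate-1/2 convolutional encoder of the diagram: g1 = 1 + x^2, g2 = 1 + x + x^2.
    a: list of input bits a0, a1, ...; returns the interleaved output stream h."""
    h = []
    for n in range(len(a)):
        an1 = a[n - 1] if n >= 1 else 0   # memory cells start filled with zeroes
        an2 = a[n - 2] if n >= 2 else 0
        b = a[n] ^ an2                    # b_n = a_n + a_{n-2}, from g1 = (1 + x^2) f
        c = a[n] ^ an1 ^ an2              # c_n = a_n + a_{n-1} + a_{n-2}, from g2
        h += [b, c]                       # h(x) = g1(x^2) + x * g2(x^2)
    return h

# f(x) = 1 + x + x^4 + x^6 (plus two flush zeroes), as in the example of Section A.4:
print("".join(map(str, encode([1, 1, 0, 0, 1, 0, 1, 0, 0]))))   # 111010111101000111
```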
It seems that the encoder is simply a multiplication by a polynomial.
However, there is a catch: a multiplication by x (a delay of time by 1)
should be considered invertible! So, we should enlarge the polynomial
ring F[x] to the Laurent polynomial ring F[x]_(x) = {h(x)/x^d : h(x) ∈ F[x], d ≥ 0}.
If we let the Laurent polynomial ring F[x]_(x) act on the power series
ring F[[x]], then we find that F[[x]] is not even closed under the action induced by
x^{−1}. The natural thing to do is to enlarge the power series ring F[[x]] to
the field of formal meromorphic series F((x)). Recall that

F2((x)) = {Σ_{i≥−m} ai x^i : ai ∈ F2, m ∈ Z}.

In the above diagram, the encoded stream g1(x) is defined by g1(x) = (1 +
x²)f(x). Mathematically, we need only g1(x) to determine f(x). Suppose
there is no error; mathematically, f(x) = (1 + x²)^{−1} g1(x). However, there
is a problem: the recursive formula a0 = b0, a1 = b1, a2 = b2 + b0, a3 =
b3 + b1, a4 = b4 + b2 + b0, . . . is getting longer and longer without bound.
Therefore, we would require an unbounded number of memory units. Furthermore, if we have a
single error in g1(x), namely replacing g1(x) by g1(x) + x^n, then the inverse
will differ from f(x) at infinitely many places. Any encoder with the last
problem will be called a catastrophic encoder and will not be used; likewise, we
avoid a decoder requiring infinitely many memory units.

A.2. Combining and Splitting

It is clear that if we are allowed only one message series g1(x) to decode,
then the only good encoders which take one incoming stream of data f(x)
and produce one stream of data g1(x) are the multiplications by x^n. Those are
uninteresting. We should consider using several message series to decode
one message series. Let us first study a naive technique of combining
and splitting data streams. Let

f(x) = Σ_{i≥−m} ai x^i,

gj(x) = Σ_{i≥−m} bji x^i for j = 0, 1, . . . , r − 1.

Then, we have hj(x) and h(x) uniquely defined by the following equations:

f(x) = Σ_{j=0}^{n−1} x^j hj(x^n),

h(x) = Σ_{j=0}^{r−1} x^j gj(x^r).

It means that we may split one stream of data f(x) into n streams
of data hj(x) for j = 0, 1, 2, . . . , n − 1, in symbols Sn(f(x)) =
[h0(x), h1(x), . . . , hn−1(x)], and combine r streams of data gj(x) for
j = 0, 1, 2, . . . , r − 1 into one stream of data h(x), in symbols
Cr(g0(x), . . . , gr−1(x)) = h(x). The splitting and combining operations are
one-to-one and onto maps between F((x)) and F((x))^n or F((x))^r, while
they are non-linear with respect to the field F((x)).
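On finite truncations of the streams, splitting and combining are plain de-interleaving and interleaving. A minimal sketch (toy data; the bookkeeping for terms below x^0 is suppressed):

```python
def split(stream, n):
    """S_n: one stream into n streams, f(x) = sum_j x^j h_j(x^n)."""
    return [stream[j::n] for j in range(n)]

def combine(streams):
    """C_r: r streams into one stream, h(x) = sum_j x^j g_j(x^r)."""
    out = []
    for column in zip(*streams):   # one coefficient from each stream, in turn
        out.extend(column)
    return out

g1, g2 = [1, 0, 1, 1], [0, 1, 1, 0]   # toy truncated streams
h = combine([g1, g2])                 # [1, 0, 0, 1, 1, 1, 1, 0]
assert split(h, 2) == [g1, g2]        # splitting inverts combining
```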

A.3. Smith Normal Form

The way of splitting a data stream f(x) in Section A.1 falls outside the scheme
described in Section A.2. Let us have a detailed study of it. The splitting
can be written mathematically as

[f(x)] · [1 + x²  1 + x + x²] = [g1(x)  g2(x)].

Using some linear algebra, we may rewrite the above matrix equation as

[f(x)] · [1] · [1  0] · ⎡1 + x²  1 + x + x²⎤ = [g1(x)  g2(x)].
                        ⎣  x        1 + x  ⎦

It means that we have the following matrix equation:

[1 + x²  1 + x + x²] = [1] · [1  0] · ⎡1 + x²  1 + x + x²⎤.
                                      ⎣  x        1 + x  ⎦

The above is the well-known Smith normal form of a matrix over the P.I.D.
F2[x], as follows.

Proposition A.1. Let R be a P.I.D. and M an r × n matrix with entries
in R. Then, M can be written as

M = AΓB,

where (1) both A and B are invertible, with inverses having entries in R,
and (2) Γ is in diagonal form with the invariant factors γi of M on the
diagonal. Note that γ1 | γ2 | · · · | γr. □

Proof. Omitted. □

In our present example, we have A^{−1} = [1] and

B^{−1} = ⎡1 + x  −1 − x − x²⎤.
         ⎣ −x      1 + x²   ⎦

Therefore, we have the following identity:

[g1(x)  g2(x)] · ⎡1 + x  −1 − x − x²⎤ · ⎡1⎤ · [1] = [f(x)].
                 ⎣ −x      1 + x²   ⎦   ⎣0⎦

We conclude that with only finitely many memory units (in fact, at most 4),
we may recover f(x) if there is no error in g1(x) and g2(x). Furthermore,
if there are single errors in g1(x) and g2(x), say replacing them by
g1(x) + x^n and g2(x) + x^m, then the decoded result differs from f(x) by a
polynomial, not an infinitely long meromorphic series. Therefore, the encoder
is not a catastrophic encoder.
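Concretely, the first column of B^{−1} says that f(x) = (1 + x)g1(x) + x·g2(x) over F2: a finite-window, hence finite-memory, inverse. A quick check with polynomial arithmetic over F2 (coefficient lists, lowest degree first; a minimal sketch):

```python
def pmul(p, q):
    """Multiply polynomials over F_2 (coefficient lists, lowest degree first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] ^= pi & qj
    return out

def padd(p, q):
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return [x ^ y for x, y in zip(p, q)]

f  = [1, 1, 0, 0, 1, 0, 1]            # f(x) = 1 + x + x^4 + x^6
g1 = pmul([1, 0, 1], f)               # (1 + x^2) f(x)
g2 = pmul([1, 1, 1], f)               # (1 + x + x^2) f(x)
back = padd(pmul([1, 1], g1), pmul([0, 1], g2))   # (1 + x) g1 + x g2
assert back[:len(f)] == f and not any(back[len(f):])
```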

In general, we may consider r streams of data J = [f1(x), . . . , fr(x)],
where fi(x) ∈ F((x)); we may view it as a vector J in the r-dimensional
vector space F((x))^r. An encoder may be viewed as an r × n matrix M with
coefficients in the Laurent polynomial ring F[x]_(x), producing n streams of
data as a vector T in F((x))^n by the formula

JM = T.

According to the above Smith normal form, we may write M = AΓB; the
above equation can be rewritten as

JAΓB = T.

Since both A and B are invertible, Γ is uniquely determined by M. We have that Γ
is right invertible ⇐⇒ the invariant factors of M are invertible, i.e., they are
powers of x. Then, and only then, do we have a non-catastrophic encoder.

A.4. Viterbi Algorithm

There are several decoding methods. All are based on maximum-likelihood
decoding and are not very difficult. One is the Viterbi algorithm, using the
state diagrams, and another is sequential decoding (or the Fano algorithm),
using the tree diagram.

Among all decoding methods known today, the Viterbi algorithm is the
best. Let us discuss the Viterbi algorithm.

Let us consider an example with f(x) = 1 + x + x⁴ + x⁶, as in the
representation of Section A.1; then g1(x) = (1 + x²)f(x) = 1 + x + x² + x³ +
x⁴ + x⁸, g2(x) = (1 + x + x²)f(x) = 1 + x³ + x⁴ + x⁵ + x⁷ + x⁸, and h(x) =
g1(x²) + xg2(x²) = 1 + x + x² + x⁴ + x⁶ + x⁷ + x⁸ + x⁹ + x¹¹ + x¹⁵ + x¹⁶ + x¹⁷,
which represents the data stream [111010111101000111]. Note that we shall
transmit h(x); then we may recover the even part (g1(x²)) and the odd part
(xg2(x²)), hence g1(x) and g2(x), and then f(x). However, a noisy channel
produces a received word r(x) = 1 + x + x² + x⁶ + x⁸ + x⁹ + x¹¹ + x¹⁵ + x¹⁶ + x¹⁷,
which corresponds to [111000101101000111]. How do we recover the original
h(x)? More specifically, how do we recover the original data stream? We
shall consider all possible transformations: if the input bit is 1 (i.e., f(x) =
1 + · · ·), then the output is [11] (i.e., h(x) = 1 + x + · · ·); if the input bit
is 0, then the output is [00]. If we consider all possible sequences of
coefficients, then it is easy to see that after t steps, we have to consider 2^t
possible paths. The number is explosively large: it counts the first t places
of the binary expansions of all real numbers between 0 and 1, and it becomes
uncountably infinite as t → ∞. We shall prune the possible sequences of
output coefficient vectors, using the Hamming distance to the received word
r(x) as the criterion.
Let us consider the following Figure A.1 of paths for 1 ≤ t ≤ 8; for now, focus on 1 ≤ t ≤ 3.

[Figure A.1: the trellis of the four states 00, 10, 01, 11 for t = 1, . . . , 8. Each arrow is labeled input/output (e.g., 0/00, 1/11), and each node carries the accumulated Hamming distance to the received word.]

Figure A.1.

We write the input/output on each arrow (where the input is the corresponding coefficient of f(x) and the output consists of the corresponding coefficients of h(x)), and we write the Hamming distance between the output sequence and the truncated r at the tip of each arrow. For instance, for the path determined by the sequence of inputs [001] (f(x) = x^2), the output is [000011] (h(x) = x^4 + x^5 + · · · ); comparing with the received sequence r = [111000 · · · ], we find that the Hamming distance is 5. For the path determined by the sequence of inputs [110] (f(x) = 1 + x), the output is [111010] (h(x) = 1 + x + x^2 + x^4 + 0·x^5 + · · · ); comparing with the received sequence r = [111000 · · · ] (r(x) = 1 + x + x^2 + · · · ), we find that the Hamming distance is 1. Looking at the above diagram up to t = 3, we realize that since the minimal distance between the allowed sequences and the received one is 1, there must be some error(s). Furthermore, there are four pairs of paths where the two paths in the same pair start and end at the same states. Comparing the two paths in each pair, we may delete the one with the larger Hamming distance from the received r. This is the pruning method. We shall use the pruning method to keep only four paths under consideration at each t (instead of 2^t = 2^3 = 8 possible paths at t = 3) and extend the above diagram from t = 3 to t = 8. We have the following diagram, with the details left to the reader. We select the path with Hamming distance 2

[Figure A.2: the pruned trellis for t = 1, . . . , 8; the four surviving paths at t = 8 have accumulated Hamming distances 3, 3, 2, 3 at states 00, 10, 01, 11, respectively.]

Figure A.2.

to the received word r as the maximum-likelihood decoded result. From the selected path, we decide that the maximum-likelihood code word is [111010111101000111], which happens to be the original code word (see Figure A.2).
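The pruning procedure above is precisely the Viterbi algorithm: at each time step, keep only the best incoming path at each state. A minimal sketch in Python (the helper names and the state convention — the last two input bits — are ours) decoding the received word of this example:

```python
# Received bits of r(x) from the example above
r = [1,1,1,0,0,0,1,0,1,1,0,1,0,0,0,1,1,1]
pairs = [(r[2*i], r[2*i+1]) for i in range(len(r) // 2)]

def step(state, u):
    # state = (previous input bit, the bit before that)
    s1, s2 = state
    out = ((u + s2) % 2, (u + s1 + s2) % 2)   # g1 = 1 + x^2, g2 = 1 + x + x^2
    return (u, s1), out

def viterbi(pairs):
    # best[state] = (accumulated Hamming distance, decoded input bits so far)
    best = {(0, 0): (0, [])}
    for a, b in pairs:
        nxt = {}
        for state, (dist, path) in best.items():
            for u in (0, 1):
                ns, (o1, o2) = step(state, u)
                d = dist + (o1 != a) + (o2 != b)
                if ns not in nxt or d < nxt[ns][0]:
                    nxt[ns] = (d, path + [u])   # prune: keep the better path
        best = nxt
    return min(best.values())

dist, f_bits = viterbi(pairs)
print(dist, f_bits)   # expect 2 and [1,1,0,0,1,0,1,0,0], i.e., f = 1 + x + x^4 + x^6
```

The last two decoded bits are the zeros that flush the encoder's memory; the first seven give the coefficients of f(x).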
Appendix B

Sphere-Packing Problem and Weight Enumerators

To tile the real plane with tiles of the same size and shape, one may use equilateral triangles, squares, or regular hexagons; for all other regular shapes, there are always empty spaces left. Such a tiling is tight. We may instead consider the problem of filling the plane with discs of the same size, gaps allowed. One knows from experience that the familiar hexagonal arrangement of discs is tight, and in fact, this can be proved mathematically.

In 1611, Kepler conjectured that the best space-saving way to stack equal-sized balls in R^3 was the pyramidal arrangement already known to all fruit sellers in the market. In 1990, the old open problem was taken up by Hsiang, who published a controversial paper claiming a solution of this nearly 400-year-old conjecture. Later, Hales (1998), using a computer in an essential way, solved the problem. For the higher-dimensional cases, the only known solutions are for dimensions 8 and 24.
The relation between the sphere-packing problem and self-correcting codes is as follows. Let us take the plane example. Both the sender and the receiver select the centers of the discs as the permissible code words. If the receiver receives a point which is slightly different from the point of the original message, the distance between those two points indicates the measure of the error that occurred. As long as the received point is in a disc (which is likely), the receiver will decode it as the center of the disc. Naturally, we want to pack the space in the most efficient way, so that we may have the largest possible number of spheres in a region of the plane, which corresponds to the largest rate of information. We shall generalize the sphere-packing problem to higher dimensions.

B.1. Golay Code

Let us consider the beautiful [23,12] Golay code. In the vector space V = F_2^{23}, a ball of radius 3 centered at [00 · · · 00] contains

$$\sum_{i=0}^{3} C_i^{23} = 2048 = 2^{11}$$

points. It is easy to see that any ball of radius 3 contains exactly 2^{11} elements. In the vector space V, there are 2^{23} elements, so it is possible to have 2^{12} balls of radius 3 that do not overlap. Indeed this happens (see the following), and we have a code with 2^{12} code words of length 23 and information rate 12/23 = 0.52.
We shall follow the way of constructing the Reed–Solomon code to construct the Golay code. Let us consider the field F_{2^{11}}. All non-zero elements in F_{2^{11}} satisfy the following equation:

$$x^{2^{11}-1} + 1 = 0.$$

Note that

$$2^{11} - 1 = 23 \times 89.$$

Therefore, F_{2^{11}} contains a 23rd root of unity. We have the following decomposition:

$$x^{23} + 1 = (x+1)(x^{11} + x^9 + x^7 + x^6 + x^5 + x + 1)(x^{11} + x^{10} + x^6 + x^5 + x^4 + x^2 + 1) = (x+1)g(x)g^*(x),$$

where g^*(x) = x^{11} g(1/x). We have the following definition:

Definition B.1. The [23,12] Golay code consists of all polynomials h(x)g(x) in F_2[x]/(x^{23} − 1) with deg(h(x)) ≤ 11. □

Note that the [23,12] Golay code is not a Reed–Solomon code. We have the following proposition.

Proposition B.2. It is possible to have 2^{12} balls of radius 3 that do not overlap and pack V = F_2^{23}.

Proof. Omitted. □

The other Golay code is the [11,6] Golay code over F_3, where balls of radius 2 densely pack F_3^{11}.

B.2. Uniformly Packed Code

The perfect codes over a finite field are like the tilings of the plane. The only perfect codes are (1) repetition codes of odd length, (2) q-ary Hamming codes (cf. Exercise 1.5 (1), (2), (4)), and (3) the [23,12] binary Golay code and the [11,6] ternary Golay code. Otherwise, it is impossible to use balls of the same radius to tile a vector space F_q^n. Let us consider the simple case over F_2. We study

$$A(n,d) = \text{the largest integer } M \text{ such that there exist } M \text{ code words } \{x_1, \ldots, x_M\} \text{ with } d(x_i, x_j) \ge d \text{ if } i \ne j.$$

Let d be odd. Then, the number A(n,d) is the maximal number of balls of radius ⌊d/2⌋ that can be put in F_2^n without touching each other. There might be some extra space allowed. This is similar to the sphere-packing problem in R^3. We may form a code C with C = {x_1, . . . , x_M} which will correct ⌊d/2⌋ errors. In general, it is difficult to find the number A(n,d) except in some easy cases with n, d small or d ≥ n.

We may put more conditions on the code C to make it easier to construct. Let us define the distance d(x, C), the covering radius ρ(C) of a set C, and the minimal distance d as

$$d(x, C) = \min\{d(x, c) : c \in C\},$$
$$\rho(C) = \max\{d(x, C) : x \in F_2^n\},$$
$$d = \min\{d(c_1, c_2) : c_1 \ne c_2 \in C\}.$$
We have the following definition.

Definition B.3. Given a code C, which may not be linear, let e be a positive integer such that the minimal distance d of C satisfies d ≥ 2e + 1 and ρ(C) = e + 1. Then, C is said to be a uniformly packed code with parameter r if every word x with d(x, C) ≥ e has distance e or e + 1 to exactly r code words. □

Let us assume that C is a uniformly packed code, and let us consider the condition d(x, C) ≥ e. If d(x, C) = e, then there is a unique c ∈ C such that d(x, c) = e. Otherwise, let d(x, c_1) = d(x, c_2) = e with c_1 ≠ c_2; then, it follows from the triangle inequality that d(c_1, c_2) ≤ 2e < d, a contradiction. Note that in this situation, it is not hard to see that, given x, the sets of coordinates where the r code words c_1, c_2, . . . , c_r differ from x must be pairwise disjoint; otherwise, since the ground field has only the two elements 0, 1, the distance between two code words c_i with non-disjoint such sets would be ≤ 2e, contradicting d ≥ 2e + 1. Therefore, we have (r − 1)(e + 1) + e ≤ n, so we conclude that

$$r - 1 \le \frac{n-e}{e+1}, \qquad r \le \frac{n+1}{e+1}.$$

If d(x, C) = e + 1, then similarly we have

$$r \le \frac{n}{e+1}.$$

We define a nearly perfect code as a uniformly packed code with parameter $r = \lfloor \frac{n}{e+1} \rfloor$. After years of research, it turns out that there are few uniformly packed codes and nearly perfect codes (cf. [9], p. 126).

B.3. Weight Enumerators

For any code C, it is meaningful to compute the geometric configuration {d(c_i, c_j) : c_i, c_j ∈ C} of C defined by the Hamming distance. In general, the computation is complicated. We may restrict C to linear [n, k] codes only. Then, we may take c_i = 0 and define $A_\ell = |\{c_j : c_j \in C, \mathrm{wt}(c_j) = \ell\}|$. As a standard trick in mathematics, we form the following polynomial, which is called the weight enumerator of C:

$$A(z) = A_0 + A_1 z + \cdots + A_n z^n.$$

Clearly, A_0 = 1 and A(1) = q^k. We have the following example.

Example B1: Let C be the repetition code with generating matrix [11111]. Then, C = {[00000], [11111]}, and A(z) = 1 + z^5. □
However, to compute A(z) for large q, we still have to look over all q^k vectors in C. This is a formidable task. If n − k is small, then we may be able to compute the weight enumerator of the dual code C^⊥. The following proposition is due to MacWilliams.

Proposition B.4 (MacWilliams identity). Let A(z) be the weight enumerator of an [n, k] linear code C and B(z) be the weight enumerator of the dual code C^⊥. Then, A(z) and B(z) are related by the following formula:

$$B(z) = \frac{1}{q^k} \sum_{i=0}^{n} A_i (1-z)^i \bigl(1 + (q-1)z\bigr)^{n-i}.$$

Proof. The reader is referred to [6], p. 148 or [9], p. 41. □

Example B2: Let C be the [7, 4] Hamming code. Then, its dual code C^⊥ is a [7, 3] code. It turns out that each non-zero code word in C^⊥ has weight 4. Thus, the weight enumerator of C^⊥ is 1 + 7z^4. Therefore, the weight enumerator of C, which is considered as the dual code of C^⊥, is

$$\frac{1}{8}\left[(1+z)^7 + 7(1-z)^4(1+z)^3\right] = 1 + 7z^3 + 7z^4 + z^7. \qquad \Box$$
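Example B2 is easy to check by brute force. The following sketch (in Python; the generator matrix below is one standard choice for the [7,3] dual code and is our assumption, as the text does not fix one) enumerates C^⊥ and then applies the MacWilliams identity with q = 2 to recover A(z):

```python
from itertools import product

# a generator matrix of the [7,3] simplex code, the dual of the [7,4] Hamming code
H = [[1,1,0,1,1,0,0],
     [1,0,1,1,0,1,0],
     [0,1,1,1,0,0,1]]

def weight_enumerator(G, n):
    A = [0] * (n + 1)
    for m in product((0, 1), repeat=len(G)):
        w = sum(sum(m[i] * G[i][j] for i in range(len(G))) % 2 for j in range(n))
        A[w] += 1
    return A

def polymul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def polypow(p, e):
    r = [1]
    for _ in range(e):
        r = polymul(r, p)
    return r

n = 7
B = weight_enumerator(H, n)                  # [1,0,0,0,7,0,0,0] -> B(z) = 1 + 7z^4
A = [0] * (n + 1)
for i, Bi in enumerate(B):                   # MacWilliams with q = 2, dual k = 3
    term = polymul(polypow([1, -1], i), polypow([1, 1], n - i))
    for j, c in enumerate(term):
        A[j] += Bi * c
A = [c // 2**3 for c in A]
print(A)                                     # [1,0,0,7,7,0,0,1] -> 1 + 7z^3 + 7z^4 + z^7
```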
Appendix C

Other Important Coding and Decoding Methods

C.1. Hadamard Codes

The US spaceship Mariner (1969) carried a Hadamard code. We introduce the Hadamard matrix, which is defined as follows.

Definition C.1. A square matrix H_n of order n with entries +1 or −1 such that H_n H_n^T = nI is called a Hadamard matrix. □

Example C1: We give the following examples:

$$H_2 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} + & + \\ + & - \end{bmatrix},$$

$$H_4 = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix} = \begin{bmatrix} + & + & + & + \\ + & + & - & - \\ + & - & + & - \\ + & - & - & + \end{bmatrix}. \qquad \Box$$
It can be shown that a Hadamard matrix can exist only if its order is 1, 2, or 4m. In general, given an integer n = 4m, we do not know if there is a Hadamard matrix of order n; the smallest unknown case is n = 668. In H_n and −H_n, we replace −1 by 0. Thus, we have 2n rows, which are code words in F_2^n. It is easy to see that any two distinct code words are at distance n/2, except for complementary pairs, which are at distance n; hence the minimal distance is n/2. Thus, we have a code, a Hadamard code. If n is not a power of 2, then the Hadamard code is nonlinear. The best codes with n ≤ 2d are practically all nonlinear and related to Hadamard matrices.
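A quick way to experiment with these objects is the standard doubling construction H_{2n} = [[H_n, H_n], [H_n, −H_n]] (which yields a row permutation of the H_4 of Example C1; this construction is classical but not described in the text). A sketch in Python:

```python
import numpy as np

H2 = np.array([[1, 1], [1, -1]])
H4 = np.block([[H2, H2], [H2, -H2]])       # a row permutation of Example C1's H4
assert (H4 @ H4.T == 4 * np.eye(4)).all()  # the Hadamard condition H_n H_n^T = nI

# the Hadamard code: rows of H4 and -H4 with -1 replaced by 0
code = (np.vstack([H4, -H4]) > 0).astype(int)    # 2n = 8 words of length n = 4
dists = [(a != b).sum() for i, a in enumerate(code) for b in code[i + 1:]]
assert min(dists) == 2                           # minimal distance n/2
```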


C.2. Reed–Muller Code

There are several ways to present the Reed–Muller code, which is a binary code correcting several errors. The code was discovered by Muller in 1954 [29], and the decoding method is due to Reed, also in 1954 [31]. Its decoding method is easy; hence, it has been used on several occasions: for instance, during 1969–1977, all of NASA's (USA) Mariner-class deep-space probes were equipped with a Reed–Muller code (namely RM(5, 1); see the following definition).
Recall from Section 5.1 that a linear coding theory has the following diagram:

$$\text{message space } F_q^k \xrightarrow{\ \sigma_1\ } \text{function space} \xrightarrow{\ \sigma_2\ } \text{word space } F_q^n.$$

The first map σ_1 is injective. Thus, we use functions to rename the messages, and the second map σ_2 is injective on σ_1(F_q^k), whose image under σ_2 is the code space. Usually, the map σ_2 is an evaluation map which evaluates a function f at an ordered n-tuple of points (P_1, . . . , P_n); thus, it maps a function f to [f(P_1), . . . , f(P_n)] ∈ F_q^n. Note that σ_2σ_1 sends the message space to the word space; certainly, we do not want to send any non-zero message to zero. Thus, we require that the composition σ_2σ_1 be an injective map on the message space.
We shall use the set P(m, r), defined as follows, as the function space. Let P(m, r) denote the set of all polynomials of degree ≤ r ≤ m in m variables (x_1, . . . , x_m) over F_2. Note that, computation-wise, over F_2 we always have x^2 = x; hence, all monomials can be reduced, in the computational sense, to products of distinct x_i.

Let us consider F_2^m as the ground vector space. We have a simple way to represent P(m, r). Let n = 2^m. Then, there are n = 2^m vectors (or points) in F_2^m; let (v_0, v_1, . . . , v_{n-1}) denote a list of all the 2^m binary vectors in F_2^m in some order. Then, for each f ∈ P(m, r), we may define f(v_j) = f(a_{j1}, . . . , a_{jm}), where v_j = [a_{j1} · · · a_{jm}]. Furthermore, we may use integers n_j to represent the v_j by defining $n_j = \sum_i 2^{i-1} a_{ji}$. We have the following example of using integers to represent points in F_2^m.
Example C2: Let us consider m = 4. Then, we have the following Table C.1 for the values of the polynomials x_1, x_2, x_3, x_4. Numerically, x_i goes to 2^{i-1}, and the coefficients a_1, a_2, a_3, a_4 in the table determine the polynomial (x_i)^{a_i}, which in turn goes to 2^{(i-1)a_i}, where a_i = 1 or 0.

Table C.1.

          0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
    x_1   0  1  0  1  0  1  0  1  0  1  0  1  0  1  0  1
    x_2   0  0  1  1  0  0  1  1  0  0  1  1  0  0  1  1
    x_3   0  0  0  0  1  1  1  1  0  0  0  0  1  1  1  1
    x_4   0  0  0  0  0  0  0  0  1  1  1  1  1  1  1  1

For the polynomials, we use the formula (f + g)(vj ) = f (vj ) + g(vj ) to


find their values. In this way, we may construct a table for the values of all
polynomials. 
Furthermore, we get a binary vector of length n = 2^m via the mapping σ_2(f) = (f(v_0), . . . , f(v_{n-1})). The set of all vectors obtained in this way is called the rth order Reed–Muller code of length n = 2^m, or RM(m, r) for short. It is easy to see that RM(m, r) is a linear code [n, k, d] (in fact, the punctured Reed–Muller code is cyclic). We find the parameters k, d.

Proposition C.2. We have

$$k = 1 + C_1^m + \cdots + C_r^m.$$

Proof. The number of monomials of degree j which are products of distinct x_i is clearly C_j^m; hence, the number of monomials of degree ≤ r is clearly

$$1 + C_1^m + \cdots + C_r^m.$$

Therefore, we need only to show that any non-zero polynomial f will not be sent to the zero vector in F_2^n by σ_2, i.e., σ_2(f) = (f(v_0), f(v_1), . . . , f(v_{n-1})) ≠ (0, 0, . . . , 0). Suppose that h(x_1, . . . , x_m) = f is a linear combination of those monomials and f(v_i) = 0 for all i. If h(0, x_2, . . . , x_m) ≠ 0, then we can find values for x_2, . . . , x_m so that it is not zero, by mathematical induction. If h(0, x_2, . . . , x_m) = 0, then h(x_1, . . . , x_m) = x_1 g(x_2, . . . , x_m) and h(1, x_2, . . . , x_m) = g(x_2, . . . , x_m), so that we can find values for x_2, . . . , x_m such that h(v_i) is not zero, by mathematical induction. In any case, we have a contradiction if σ_2(f) = (0, 0, . . . , 0). □

Corollary C.3. The vector space of all polynomials in m variables of degree less than or equal to r, in which each variable appears in each term with exponent one or zero, is isomorphic to RM(m, r).

Proposition C.4. We have the minimal distance d = 2^{m-r}.

Proof. We first show that d ≤ 2^{m-r}. Let us use the notations of the proof of the preceding proposition. Let h = x_1 x_2 · · · x_r. Then, h = 0 whenever x_1 = 0 or x_2 = 0 or · · · or x_r = 0. Using set-theoretic inclusions to compute the number of zeroes among v_0, . . . , v_{n-1}, we may conclude that there are 2^m − 2^{m-r} zeroes among v_0, . . . , v_{n-1}. Therefore, the minimal weight, hence the minimal distance, is at most 2^{m-r}.

We wish to prove that d ≥ 2^{m-r}. Let us consider a polynomial h(x_1, x_2, . . . , x_m) ≠ 0 in which every variable appears at most once in every term. We have h(0, x_2, . . . , x_m) = 0 ⟹ h(x_1, . . . , x_m) = x_1 g(x_2, . . . , x_m). Similarly, h(1, x_2, . . . , x_m) = 0 ⟹ h(x_1, . . . , x_m) = (x_1 − 1)p(x_2, . . . , x_m). After we try every x_i, we conclude that either (1) there is an x_i, say x_1, such that h(0, x_2, . . . , x_m) ≠ 0 and h(1, x_2, . . . , x_m) ≠ 0, or (2) $h = \prod_{i=1}^{m}(x_i - \delta_i)$, where δ_i = 0 or 1. The second case happens only if r = m, and then 2^{m-r} = 1, so our proposition is certainly true. In the first case, both h(0, x_2, . . . , x_m) = g(x_2, . . . , x_m) ≠ 0 and h(1, x_2, . . . , x_m) = p(x_2, . . . , x_m) ≠ 0. By induction on the number of variables, we conclude that g(x_2, . . . , x_m) and p(x_2, . . . , x_m) each have at most 2^{m-1} − 2^{m-1-r} zeroes. Since the sets of zeroes of h coming from g(x_2, . . . , x_m) and p(x_2, . . . , x_m) lie in x_1 = 0 and x_1 = 1, respectively, they are disjoint. Therefore, h has at most (2^{m-1} − 2^{m-1-r}) + (2^{m-1} − 2^{m-1-r}) = 2^m − 2^{m-r} zeroes. So the minimal weight, hence the minimal distance, is at least 2^{m-r}. □

Proposition C.5. We have RM(m, r)^⊥ = RM(m, m − r − 1).

Proof. Let us first compute the dimensions:

$$\dim(RM(m,r)^\perp) = 2^m - (1 + C_1^m + \cdots + C_r^m) = C_{r+1}^m + \cdots + C_{m-1}^m + 1 = C_{m-r-1}^m + \cdots + C_1^m + 1 = \dim(RM(m, m-r-1)).$$

Now, we only need to show that every monomial in RM(m, m − r − 1) is in RM(m, r)^⊥, i.e., with $f = (\prod_{i \in I} x_i)(\prod_{j \in J} x_j)$, where $\prod_{i \in I} x_i \in RM(m, m-r-1)$ and $\prod_{j \in J} x_j \in RM(m, r)$, the following sum is always zero (note that after reducing the total degree of f by x_i^2 = x_i, we may take f to be a product of at most m − 1 distinct variables):

$$\sum_{i=0}^{n-1} f(v_i) = 2^{m - \deg(f)} \equiv 0 \bmod 2.$$

The above equation is obvious. □


The method of decoding RM codes is the threshold decoding described
in Section C.4. The reader is referred to [31].
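Before moving on, here is a sanity check on Propositions C.2 and C.4 (a sketch in Python; the point ordering differs from Table C.1, which does not affect the parameters, and the function names are ours) building a generator matrix of RM(m, r) from the monomials and verifying k and d for RM(3, 1):

```python
from itertools import combinations, product

def rm_generator(m, r):
    # rows = evaluation vectors of the monomials of degree <= r (Prop. C.2)
    pts = list(product((0, 1), repeat=m))          # the points v_0, ..., v_{2^m - 1}
    return [[int(all(p[i] for i in mono)) for p in pts]
            for deg in range(r + 1) for mono in combinations(range(m), deg)]

m, r = 3, 1
G = rm_generator(m, r)
n = 2**m
words = {tuple(sum(u[i] * G[i][j] for i in range(len(G))) % 2 for j in range(n))
         for u in product((0, 1), repeat=len(G))}
k = len(G)                                  # 1 + C(3,1) = 4
d = min(sum(w) for w in words if any(w))    # 2^(m-r) = 4 (Prop. C.4)
assert (k, d, len(words)) == (4, 4, 16)     # RM(3,1) is an [8,4,4] code
```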

C.3. Constructing New Code from Old Codes

C.3.1. Extend, Puncture, Shorten, Lengthen, Expurgate, and Augment a Code

Let us take an example to explain the meanings of these operations. Let C_i be a linear code with check matrix H_i. Let C_0 be a cyclic code with generator polynomial g_0(x) = x^3 + x + 1 and check matrix H_0 given by

$$H_0^T = \begin{bmatrix} 1 & 0 & 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 1 & 0 & 0 & 1 \end{bmatrix}.$$
Extend: Any code can be extended by annexing additional check symbols. The resulting code is not cyclic in general. We have the following example, with C_2 given by the check matrix

$$H_2^T = \begin{bmatrix} 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 & 0 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{bmatrix}.$$
Puncture: Any code can be punctured by deleting some of its check symbols. We have the example C_2 → C_0.

Expurgate: A cyclic code generated by the polynomial g(x) can be expurgated to form another cyclic code by multiplying an additional factor into the generator polynomial. The most common expurgated code is the code C_1 generated by g(x)(x − 1). We have the following example C_0 → C_1:

$$H_1^T = \begin{bmatrix} 1 & 0 & 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 1 & 0 & 0 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{bmatrix}.$$
Augment: A cyclic code generated by the polynomial g(x) may be augmented into another cyclic code of the same length whose generator polynomial is a factor of g(x). The most common augmented code is the one generated by g(x)/(x − 1) (we assume that this is allowed, i.e., g(1) = 0). We have the example C_1 → C_0 with C_0 a cyclic code generated by g_0(x) = g_1(x)/(x − 1).

Lengthen: If the generator polynomial of an original binary cyclic code of length n contains the factor (x + 1), then this cyclic code can be lengthened to a linear code of length n + 1 by annexing the vector [11 · · · 1] to the code's generator matrix. We have the example C_1 → C_2.

Shorten: Any code can be shortened by deleting message symbols. We have the example C_2 → C_1 with C_1 a cyclic code.

To summarize the preceding discussion: puncture C_2 → C_0, expurgate C_0 → C_1, augment C_1 → C_0, lengthen C_1 → C_2, and shorten C_2 → C_1, with extension producing C_2.

C.3.2. Direct Product of Cyclic Codes

See Section 3.5.

C.3.3. Concatenated Codes and Justensen Codes

The concept of concatenated codes was introduced by Forney [22] in 1966.
Let us consider a simple example. Suppose that we send out an e-mail which consists of several sentences in a certain set of letters. We may first encode the letters in a certain way. Say we have three letters; we may encode them in F_2^5 as

00000
11100
00111.

Note that the minimal distance of the three letters is 3, and we expect to
correct 1 error. Then, we may use some encoding scheme, say RS-code, to
encode the whole e-mail. The way to encode the letters will be called the
inner code, and the way to encode the e-mail will be called the outer code.
In general, we may encode any message by a code C1 (like the encoding of
all letters) which is called the inner code, and then encode the code words
C1 by another code C2 which is called the outer code. The most important
development along this direction is the Justensen codes [28].

C.4. Threshold Decoding

We may use threshold decoding to handle the problem of decoding several codes. We illustrate threshold decoding by the following example.

Example C2: Let C be a [7, 3] binary linear code with the check matrix

$$H = \begin{bmatrix} 1 & 1 & 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 1 & 0 \\ 1 & 1 & 1 & 0 & 0 & 0 & 1 \end{bmatrix}.$$

Let c = [c_0, c_1, . . . , c_6] be the code word sent and r = [r_0, r_1, . . . , r_6] = c + e be the received word, where e = [e_0, e_1, . . . , e_6] is the error vector. This code may correct 1 error. We shall consider the code in a different way. Analytically, the check matrix produces the following system of equations satisfied by c_0:

$$c_0 = c_1 + c_3,$$
$$c_0 = c_2 + c_4,$$
$$c_0 = c_5 + c_6.$$

Now, let us assume the channel is binary symmetric with error probability ℘ < 1/2. Then, we have

$$c_0 = \begin{cases} r_1 + r_3 & \text{if } e_1 + e_3 = 0, \\ r_2 + r_4 & \text{if } e_2 + e_4 = 0, \\ r_5 + r_6 & \text{if } e_5 + e_6 = 0, \\ r_0 & \text{if } e_0 = 0. \end{cases}$$

If there is one error, then we may use a majority vote to decide which value is correct for c_0. If two errors are allowed and there is a 2–2 tie in the vote, we favor the vote r_0: the event e_0 = 0 has probability 1 − ℘, while e_i + e_j = 0 has probability (1 − ℘)^2 + ℘^2 = 1 − ℘ − ℘(1 − 2℘) < 1 − ℘, so the single-symbol vote is the more reliable one, i.e., we decode c_0 = r_0. □
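A minimal sketch of this vote for the bit c_0 (in Python; the function name is ours):

```python
def decode_c0(r):
    # the three orthogonal check sums for c0, plus the direct observation r0
    votes = [r[1] ^ r[3], r[2] ^ r[4], r[5] ^ r[6], r[0]]
    ones = sum(votes)
    if ones != 2:
        return int(ones > 2)   # clear majority among the four votes
    return r[0]                # 2-2 tie: favor r0, as argued above
```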

Reed obtained a generalization of the above technique to RM codes (cf. [31]), which we illustrate by the following example. Later on, the method was applied to many codes and named threshold decoding. The interested reader is referred to MacWilliams and Sloane [7].

Example C3: Let us consider RM(3, 1), which is an [8, 4, 4] code. We have the following Table C.2.

Table C.2.

          0  1  2  3  4  5  6  7
    m_0   1  1  1  1  1  1  1  1
    m_1   0  1  0  1  0  1  0  1
    m_2   0  0  1  1  0  0  1  1
    m_3   0  0  0  0  1  1  1  1

We have the following method of decoding.


Threshold Decoding: Let the original message be [m0 , m1 , m2 , m3 ], the
code word be c = [c0 , c1 , . . . , c7 ] = m0 [1, 1, . . . , 1] + m1 [0, 1, 0, 1, 0, 1, 0, 1] +
m2 [0, 0, 1, 1, 0, 0, 1, 1] + m3 [0, 0, 0, 0, 1, 1, 1, 1] and the received word be r =
[r0 , r1 , . . . , r7 ]. Then, the relations between mi , cj are

$$c_0 = m_0,\ c_1 = m_0 + m_1,\ c_2 = m_0 + m_2,\ c_3 = m_0 + m_1 + m_2,\ c_4 = m_0 + m_3,$$
$$c_5 = m_0 + m_1 + m_3,\ c_6 = m_0 + m_2 + m_3,\ c_7 = m_0 + m_1 + m_2 + m_3.$$

The above relations can be rewritten as follows:

$$m_1 = c_0 + c_1 = c_2 + c_3 = c_4 + c_5 = c_6 + c_7,$$
$$m_2 = c_0 + c_2 = c_1 + c_3 = c_4 + c_6 = c_5 + c_7,$$
$$m_3 = c_0 + c_4 = c_1 + c_5 = c_2 + c_6 = c_3 + c_7.$$

Clearly, we do not know the c_i's and only know the r_i's. Assume that there is at most 1 error. Let us consider the value of m_1; the values of m_2, m_3 can be found similarly. At least three of the four values r_0 + r_1, r_2 + r_3, r_4 + r_5, r_6 + r_7 must be correct. Therefore, a majority vote will decide the correct value of m_1. If there are two errors, then the vote might be tied; in that case, the value is indeterminate. After we find the values of m_1, m_2, m_3, we are left to find m_0. Note that

$$[c_0, c_1, \ldots, c_7] = m_0[1, 1, \ldots, 1] + m_1[0, 1, \ldots, 0, 1] + m_2[0, 0, 1, \ldots, 1, 1] + m_3[0, 0, \ldots, 1, 1].$$

Once we find the correct values of m_1, m_2, m_3, let us compute

$$[r_0, r_1, \ldots, r_7] + m_1[0, 1, \ldots, 0, 1] + m_2[0, 0, 1, \ldots, 1, 1] + m_3[0, 0, \ldots, 1, 1].$$
The result of the above sum should be close to either [1, 1, . . . , 1] or [0, 0, . . . , 0]. A majority vote then gives us the correct value of m_0 (except in the case of a 4–4 tie, when the value of m_0 is indeterminate). □
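The whole procedure fits in a few lines. A sketch (in Python; the function names are ours) that decodes a received RM(3, 1) word with at most one error, returning None when a vote is tied:

```python
def majority(votes):
    s = sum(votes)
    return None if 2 * s == len(votes) else int(2 * s > len(votes))

ROWS = ([0,1,0,1,0,1,0,1], [0,0,1,1,0,0,1,1], [0,0,0,0,1,1,1,1])

def decode_rm31(r):
    m1 = majority([r[0]^r[1], r[2]^r[3], r[4]^r[5], r[6]^r[7]])
    m2 = majority([r[0]^r[2], r[1]^r[3], r[4]^r[6], r[5]^r[7]])
    m3 = majority([r[0]^r[4], r[1]^r[5], r[2]^r[6], r[3]^r[7]])
    if None in (m1, m2, m3):
        return None                        # a tied vote: indeterminate
    ms = (m1, m2, m3)
    s = [r[j] ^ (ms[0] & ROWS[0][j]) ^ (ms[1] & ROWS[1][j]) ^ (ms[2] & ROWS[2][j])
         for j in range(8)]
    m0 = majority(s)                       # s should be near all-ones or all-zeros
    return None if m0 is None else (m0, m1, m2, m3)

# e.g. message (1,0,1,0) encodes to [1,1,0,0,1,1,0,0]; flip one bit and decode
assert decode_rm31([0,1,0,0,1,1,0,0]) == (1, 0, 1, 0)
```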

C.5. Soft-Decision Decoding

The practice of transmitting signals electronically involves wave functions


through a medium. After a white noise is added to the wave function,
there is a distortion to the signals. Usually, one cannot decide for certain
if the signal s is 0 or 1 (we only consider binary bits). One can associate
reliability probabilities p0 and p1 (note that p0 + p1 = 1) to the particular
signal s being 0 and 1. We usually decide the bit is 0 if p0 ≥ 1/2 and
1 if p0 ≤ 1/2. We consider that if p0 is close to 1, then it is reliable to
assign the signal to 0, and if it is close to 0, then it is reliable to assign the
signal to 1. If it is close to 1/2, then the reliability is weak. One may use
the values of p0 and p1 to decode the sequence of signals. This is what is
called soft-decision decoding. On the other hand, if we only use the bits,
without considering their reliability probabilities, to decode then it is called
hard-decision decoding. So far in this book, we only consider hard-decision
decoding.
Let us consider that the original message is c = (c_0 c_1 · · · c_n) and the received word is r = (r_0 r_1 · · · r_n), with the reliability sequence $(p_0^{(1)}, \ldots, p_0^{(n)})$. Forney's generalized minimum distance (GMD) decoder [22] essentially erases the d − 1 least reliable symbols and then uses an algebraic decoder to correct the remaining received word.
A more interesting method is to use the reliability probabilities $(p_0^{(1)}, \ldots, p_0^{(n)})$ themselves. Let us consider a 3-repetition code which, for a certain bit, produces the reliability probabilities (0.51, 0.51, 0.12) for (0, 0, 0). A hard-decision decoder (which we have considered so far) will first decide that the bits received are (0, 0, 1) (since 0.51 > 0.50 and 0.12 < 0.50) and then use the majority rule to decide that the original bit is 0. A soft-decision decoder will take the reliability probabilities into consideration. It would take the average reliability probability as the deciding factor; the average reliability probability is

$$\frac{1}{3}(0.51 + 0.51 + 0.12) = 0.38.$$

Therefore, the original bit is more likely to be 1. In general, it can be shown that the soft-decision decoder is better.
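The two decisions of this example, side by side (a sketch in Python):

```python
p0 = [0.51, 0.51, 0.12]            # reliability that each received bit is 0

hard_bits = [0 if p >= 0.5 else 1 for p in p0]       # (0, 0, 1)
hard = int(sum(hard_bits) > len(hard_bits) / 2)      # majority rule -> 0
soft = int(sum(p0) / len(p0) < 0.5)                  # average 0.38 < 0.5 -> 1
print(hard, soft)                                    # 0 1
```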

C.6. Turbo Decoder

In 1993, at the IEEE International Conference on Communications in Geneva, a pair of French engineers, Claude Berrou and Alain Glavieux, announced their turbo codes [17], which come very close to Shannon's theoretical limits. It was surprising news to the world of coding theory.

Berrou likens the code to a crossword puzzle in which one would-be solver (decoder) receives the "across" clues and the other receives the "down" clues. After the decoders update their proposed solutions, they compare notes and update their solutions again. They continue this process until either they have repeated it a limited number of times (say, 18 times) or they reach a consensus about the original message.

C.7. Low-Density Parity Check (LDPC) Codes

There is another code which rivals the turbo codes: the low-density parity-check (LDPC) codes [23], created in 1958 by Robert Gallager, who was then a Ph.D. student at MIT. This code is sometimes called the Gallager code. First, Gallager used sparse matrices for the generator matrices. Second, Gallager used a decoder for every bit and let the decoders talk among themselves, thus creating a huge rumor mill with thousands or tens of thousands of talkers. The patent on LDPC codes was held by Codex Corp. until it expired without ever being used; one of the reasons was that it was technically infeasible to build such a rumor mill in the 1950s.
Appendix D

Berlekamp's Decoding Algorithm

Berlekamp's decoding algorithm is the fastest decoding method for Reed–Solomon codes today. In industry, it is the main tool to decode the Reed–Solomon codes. Recall that in Section 3.2 we define the error-locator polynomial σ(x) and the error-evaluator polynomial ω(x) as follows. Let M = the set of error locations {γ^j}; then

$$\sigma(x) = \prod_{i \in M} (1 - \gamma^i x) = 1 + \cdots,$$

$$\omega(x) = \sum_{i \in M} e_i \gamma^i x \prod_{j \in M \setminus i} (1 - \gamma^j x).$$

Let

$$r(x) = r_0 + r_1 x + \cdots + r_{n-1} x^{n-1} = \text{the received word},$$

$$S(x) = \sum_{i=1}^{2t} r(\gamma^i) x^i.$$

Then, we have the following equation:

$$S(x)\sigma(x) = \omega(x) \bmod x^{2t+1}.$$

We may slightly modify the above equation as

$$(1 + S(x))\sigma(x) = (\sigma(x) + \omega(x)) \bmod x^{2t+1}.$$

We may abuse the notation to write σ(x) + ω(x) as ω(x) (for the sake of induction later on) and get the following key equation:

$$(1 + S(x))\sigma(x) = \omega(x) \bmod x^{2t+1}.$$


A central problem in coding theory is: given S(x), how do we quickly produce σ(x) and ω(x) with deg(σ(x)), deg(ω(x)) ≤ t? One of the simple methods is to look at the conditions and consider

$$S(x) = s_1 x + s_2 x^2 + \cdots + s_{2t} x^{2t},$$
$$\sigma(x) = 1 + a_1 x + a_2 x^2 + \cdots + a_t x^t,$$

where the s_i are known and the a_i are unknown. Since deg(ω(x)) ≤ t, we set the coefficients of x^{t+1}, . . . , x^{2t} in the product S(x)σ(x) to be zeroes and get the following linear equations:

$$s_t a_1 + s_{t-1} a_2 + \cdots + s_1 a_t + s_{t+1} = 0,$$
$$s_{t+1} a_1 + s_t a_2 + \cdots + s_2 a_t + s_{t+2} = 0,$$
$$\cdots$$
$$s_{2t-1} a_1 + s_{2t-2} a_2 + \cdots + s_t a_t + s_{2t} = 0.$$

Since we know that there are coefficients {a_i} satisfying the above equations, the above system of equations is consistent and produces a solution for the indeterminates a_1, . . . , a_t. We just solve it, find σ(x), and take ω(x) = S(x)σ(x) mod x^{2t+1}. However, this method of solving systems of equations is slow. In this section, we reproduce the original method of Berlekamp [4], which is still the fastest method today.

D.1. Berlekamp's Algorithm

Note that we already know the existence of σ(x) and ω(x); it follows from Proposition 2.56 that they are unique. The only problem is how to find them fast. Note that we know deg(σ(x)) ≤ t, deg(ω(x)) ≤ t, and deg(S(x)) ≤ 2t. Berlekamp's idea is not to solve the above system of linear equations directly, but rather to find a sequence of σ^{(k)}(x) and ω^{(k)}(x) with σ^{(2t)}(x) = σ(x) and ω^{(2t)}(x) = ω(x). The key equation is generalized to a sequence of equations of the following form:

$$(1 + S(x))\sigma^{(k)}(x) = \omega^{(k)}(x) \bmod x^{k+1}, \qquad (D1_k)$$

with the degree restrictions $\deg(\sigma^{(k)}(x)), \deg(\omega^{(k)}(x)) \le \frac{k+1}{2}$.

We define σ^{(0)} = 1. Inductively, suppose that we have constructed σ^{(k)}(x), ω^{(k)}(x); we want to define σ^{(k+1)}(x), ω^{(k+1)}(x). Let us look at one more term as follows:

$$(1 + S(x))\sigma^{(k)}(x) = \omega^{(k)}(x) + \Delta^{(k)} x^{k+1} \bmod x^{k+2}, \qquad (E_k)$$
where Δ^{(k)}, a scalar, is the coefficient of x^{k+1} in the above equation. If Δ^{(k)} = 0, then we may take σ^{(k+1)}(x) = σ^{(k)}(x) and ω^{(k+1)}(x) = ω^{(k)}(x) and continue our inductive process of construction. Suppose Δ^{(k)} ≠ 0. Then, it is complicated to define them; we have to do something more. We introduce two more functions τ^{(k)} and γ^{(k)}, defined by the following equations:

$$\Delta(\sigma^{(k)}) = \sigma^{(k)} - \sigma^{(k+1)} = \Delta^{(k)} x\tau^{(k)},$$
$$\Delta(\omega^{(k)}) = \omega^{(k)} - \omega^{(k+1)} = \Delta^{(k)} x\gamma^{(k)}.$$

Inductively, after we define τ^{(k)} and γ^{(k)}, we have to define τ^{(k+1)} and γ^{(k+1)}. Let us take (E_k) − (D1_{k+1}) and simplify. Then, we deduce the following equation:

$$(1 + S(x))\tau^{(k)} = \gamma^{(k)} + x^k \bmod x^{k+1}. \qquad (D2_k)$$

Inductively, we define τ^{(k+1)} and γ^{(k+1)} in one of the following two ways: if Δ^{(k)} = 0, let

$$\tau^{(k+1)} = x\tau^{(k)}, \quad \text{and} \quad \gamma^{(k+1)} = x\gamma^{(k)}, \qquad (D3)$$

or if Δ^{(k)} ≠ 0, let

$$\tau^{(k+1)} = \frac{\sigma^{(k)}}{\Delta^{(k)}}, \quad \text{and} \quad \gamma^{(k+1)} = \frac{\omega^{(k)}}{\Delta^{(k)}}. \qquad (D4)$$
The critical thing is to control the degrees of σ^{(k)}(x), ω^{(k)}(x): we wish them to be less than or equal to (k+1)/2.

Initially, Berlekamp adds two more integer-valued functions D(k), B(k) and defines σ^{(0)} = 1, ω^{(0)} = 1, τ^{(0)} = 1, γ^{(0)} = 0, and D(0) = 0, B(0) = 0. Inductively, we have two cases:

$$\text{Case 1}: \quad \Delta^{(k)} = 0, \quad \text{or} \quad \Delta^{(k)} \ne 0 \text{ and } D(k) > \tfrac{k+1}{2}, \quad \text{or} \quad \Delta^{(k)} \ne 0,\ D(k) = \tfrac{k+1}{2}, \text{ and } B(k) = 0;$$

$$\text{Case 2}: \quad \Delta^{(k)} \ne 0 \text{ and } D(k) < \tfrac{k+1}{2}, \quad \text{or} \quad \Delta^{(k)} \ne 0,\ D(k) = \tfrac{k+1}{2}, \text{ and } B(k) = 1.$$

In Case 1, we define τ^{(k+1)}, γ^{(k+1)} by equation (D3) and set

$$(D(k+1), B(k+1)) = (D(k), B(k)).$$
In Case 2, we define τ^{(k+1)}, γ^{(k+1)} by equation (D4) and set

$$(D(k+1), B(k+1)) = (k + 1 - D(k), 1 - B(k)).$$
It is easy to see that B(k) only takes values 0 or 1. We have the following
propositions.

Proposition D.1. We always have the following: (1) deg(σ^{(k)}) ≤ D(k), with equality if B(k) = 1. (2) deg(ω^{(k)}) ≤ D(k) − B(k), with equality if B(k) = 0. (3) deg(τ^{(k)}) ≤ k − D(k), with equality if B(k) = 0. (4) deg(γ^{(k)}) ≤ k − D(k) − (1 − B(k)), with equality if B(k) = 1.

Proof. See [4]. □

Proposition D.2. For each k, we have

$$\omega^{(k)}\tau^{(k)} - \sigma^{(k)}\gamma^{(k)} = x^k.$$

Proof. See [4]. □

Proposition D.3. If σ and ω are any pair of polynomials which satisfy

$$\sigma(0) = 1 \quad \text{and} \quad (1 + S)\sigma = \omega \bmod x^{k+1},$$

let D = max{deg σ, deg ω}. Then, there exist polynomials u and v such that

$$u(0) = 1, \quad v(0) = 0,$$
$$\deg u \le D - D(k), \quad \deg v \le D - [k - D(k)],$$
$$\sigma = u\sigma^{(k)} + v\tau^{(k)},$$
$$\omega = u\omega^{(k)} + v\gamma^{(k)}.$$

Proof. See [4]. □

Proposition D.4. If σ and ω are relatively prime, σ(0) = 1, and (1 + S)σ = ω mod x^{k+1}, then

(1) either deg σ ≥ D(k) + 1 − B(k) ≥ D(k), or deg ω ≥ D(k), or both;

(2) if deg(σ) ≤ (k+1)/2 and deg(ω) ≤ k/2, then σ = σ^{(k)} and ω = ω^{(k)}.

Proof. See [4]. □


We let k = 2t, then we inductively construct σ, ω and finish the
Berlekamp’s algorithm.
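For concreteness, here is a sketch (in Python, over a prime field F_p chosen for simplicity; real Reed–Solomon decoders work over F_{2^m}) of the closely related Berlekamp–Massey form of this iteration — a standard equivalent reformulation, not the book's exact notation — which produces the error-locator σ(x) from the syndromes s_1, . . . , s_{2t}; the variable names are ours:

```python
def berlekamp_massey(S, p):
    # S = [s_1, ..., s_2t]; returns the error-locator sigma as a coefficient
    # list [1, a_1, ..., a_L] over F_p, lowest degree first
    C, B = [1], [1]        # current locator and the last saved copy
    L, m, b = 0, 1, 1      # current degree, shift since last save, last discrepancy
    for n in range(len(S)):
        d = S[n]                               # the discrepancy, cf. Delta^(k)
        for i in range(1, L + 1):
            d = (d + C[i] * S[n - i]) % p
        if d == 0:
            m += 1                             # nothing to fix; just shift, cf. (D3)
            continue
        coef = d * pow(b, p - 2, p) % p        # Delta / (saved Delta), cf. (D4)
        T = C[:]
        if len(B) + m > len(C):
            C = C + [0] * (len(B) + m - len(C))
        for i in range(len(B)):
            C[i + m] = (C[i + m] - coef * B[i]) % p
        if 2 * L <= n:                         # the degree must grow, cf. Case 2
            L, B, b, m = n + 1 - L, T, d, 1
        else:
            m += 1
    return C

# single error with locator 1 - 2x over F_7: syndromes satisfy s_{i+1} = 2 s_i
sigma = berlekamp_massey([3, 6, 5, 3], 7)
assert sigma == [1, 5]      # 1 + 5x = 1 - 2x over F_7
```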
References

[1] Matthew: Matthew's Gospel, AD 80–90.
[2] Lao Tzu: Tao Te Ching. The oldest excavated portion dates to the late 4th century BC. One English translation: Fall River Press.
[3] Schrödinger, E. What is Life? The Physical Aspect of the Living Cell. Cambridge: At the University Press; New York: The Macmillan Company, 1945.
[4] Berlekamp, E.R. Algebraic Coding Theory. New York: McGraw-Hill, 1968.
[5] Berlekamp, E.R. (ed). Key Papers in The Development of Coding Theory.
New York: IEEE press, 1974.
[6] McEliece, R.J. The Theory of Information and Coding. Encyclopedia of
Mathematics and Its Applications, Vol. 3, Reading, Mass: Addison-Wesley,
1977.
[7] MacWilliams, F.J. and Sloane, N.J.A. The Theory of Error-Correcting Codes. Amsterdam: North-Holland Publishing Co., 1977.
[8] Pretzel, O. Codes and Algebraic Curves. Clarendon Press, Oxford, 1998.
[9] van Lint, J.H. Introduction to Coding Theory. Springer, 1999.
[10] Chevalley, C. Introduction to the Theory of Algebraic Functions of One
Variable, AMS, Providence, RI, 1951.
[11] Hartshorne, R. Algebraic Geometry. Springer-Verlag, 1977.
[12] Kurosh, A.G. The Theory of Groups, Chelsea, New York, NY, 1955.
[13] Mumford, D. Introduction to Algebraic Geometry. Springer-Verlag, 1964.
[14] Van der Waerden, B. Modern Algebra. Frederick Ungar Publ. Co., 1950.
[15] Walker, R.J. Algebraic Curves, Princeton University, Princeton, Dover
reprint (1962).
[16] Zariski, O. and Samuel, P. Commutative Algebra, Vols. I & II. D. Van Nostrand Co., 1960.
[17] Berrou, C. and Glavieux, A. Near optimum error correcting coding and decoding: Turbo codes. IEEE Trans. Comm., 44(10): 1261–1271, October 1996.

[18] Burton, H.O. and Weldon, E.J. Cyclic product codes. IEEE Trans. Inf.
Theory, IT-11: 433–439, 1965.
[19] Elias, P. Coding for noisy channels. IRE Conv. Record, part 4, 37–46, 1955.
[20] Duursma, I.M. Decoding codes from curves and cyclic codes. Ph.D. dissertation, Eindhoven University of Technology, 1993.
[21] Feng, G.L. and Rao, T.R.N. A simple approach for construction of algebraic-
geometric codes from affine plane curves. IEEE Trans. on Inf. Theory, 40,
1003–1012, 1994.
[22] Forney, G.D.Jr. Generalized minimum distance decoding. IEEE Trans. on
Inf. Theory, 12(2): 125–131, April 1966.
[23] Gallager, R.G. Low-density parity-check codes. IRE Trans. Inf. Theory.,
IT-8: 21–28. January 1962.
[24] Ghorpade, S. and Datta, M. Remarks on Tsfasman–Boguslavsky conjec-
ture and higher weights of projective Reed–Muller codes. In Arithmetic,
Geometry, Cryptography and Coding Theory, Providence, RI: AMS, 2017,
pp. 157–169.
[25] Goppa,V.D. A new class of linear error-correcting codes. Probl. Inf. Trans.,
6: 207–21, 1970.
[26] Hartmann,C.R.P. and Tzeng, K.K. Generalizations of BCH bound. Inf.
Control, 20: 489–498, 1972.
[27] Ihara, Y. Congruence relations and Shimura curves. Proc. Symp. Pure Math., 33(2): 291–311, 1979.
[28] Justensen, J. A class of constructive asymptotically good algebraic codes.
IEEE Trans. Inf. Theory, 18: 652–656, 1972.
[29] Muller, D.E. Metric Properties of Boolean Algebra and Their Application to
Switching Circuits. Report No. 46, Digital Computer Laboratory, Univ. of
Illinois, April 1954.
[30] Peterson, W.W. Encoding and error-correction procedures for the Bose-
Chaudhuri codes. IRE Trans. Inf. Theory, October 1960.
[31] Reed, I.S. A class of multiple-error-correcting codes and the decoding
scheme. J. IRE Trans. Inf. Theory, September 1954.
[32] Reed, I.S. and Solomon, G. Polynomial codes over certain finite fields. J. Soc.
Ind. Appl. Math., June 1960.
[33] Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J.,
27: 379–423, 623–656, 1948.
[34] Skorobogatov, A.N. and Vlǎduţ, S.C. On the decoding of algebraic-
geometric codes. IEEE Trans. Inf. Theory, 36: 1051–1060, 1990.
[35] Sugiyama, Y. Kasahara, M. Hirasawa, S., and Namekawa, T. A method for
solving key equation for decoding Goppa codes. Inf. Control, 27: 87–99,
1975.
[36] Tsfasman, M.A., Vlǎduţ, S.C., and Zink, T. Modular curves, Shimura curves and Goppa codes better than the Varshamov–Gilbert bound. Math. Nachr., 109: 21–28, 1982.
[37] Xing, C. Nonlinear codes from algebraic curves improving the Tsfasman-
Vlǎduţ-Zink bound. IEEE Trans. Inf. Theory, 49(7): 1652–1657, 2003.
[38] Hasse, H. Zur Theorie der abstrakten elliptischen Funktionenkörper I, II & III. Crelle's Journal, 175: 193–208, 1936.
[39] Serre, J.P. Sur le nombre des points rationnels d'une courbe algébrique sur un corps fini. C.R. Acad. Sci. Paris, 296: 397–402, 1983.
[40] Weil, A. Numbers of solutions of equations in finite fields. Bull. Amer. Math. Soc., 55: 497–508, 1949.