Chapman Mathematical Notes

Daniel Alpay

Exercises
in Applied
Mathematics
With a View toward Information Theory,
Machine Learning, Wavelets,
and Statistical Physics
Chapman Mathematical Notes

Series Editor
Daniele C. Struppa, Chapman University, Orange, CA, USA

Editorial Board Members


Fabrizio Colombo, Politecnico di Milano, Milan, Italy
Peter Jipsen, Chapman University, Orange, CA, USA
Andrew N. Jordan, Chapman University, Orange, CA, USA
Andrew Moshier, Chapman University, Orange, CA, USA
Marco Panza, Chapman University, Orange, CA, USA
Irene Sabadini, Politecnico di Milano, Milan, Italy
Alain Yger, University of Bordeaux, Bordeaux, France

This series, hosted by Chapman University, will publish high-quality research volumes in a variety of mathematical areas such as mathematical analysis, mathematical physics, logic, and history and philosophy of mathematics. Monographs,
proceedings of conferences, and topical collections of articles will appear in this
series. The goal is to provide timely dissemination of state-of-the-art research in
rapidly developing areas, as well as surveys of important areas of research. Some of
these volumes may also be used as graduate level textbooks.
All volumes in this series will be selected after a careful peer-review process on
the basis of their scientific quality, the importance and the timeliness of the topics,
and the clarity of exposition.
Daniel Alpay
Schmid College of Science and Technology
Chapman University
Orange, CA, USA

ISSN 3005-1509 ISSN 3005-1517 (electronic)


Chapman Mathematical Notes
ISBN 978-3-031-51821-8 ISBN 978-3-031-51822-5 (eBook)
https://doi.org/10.1007/978-3-031-51822-5

Mathematics Subject Classification: 15-02, 94-02, 80-01, 14-01

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland
AG 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This book is published under the imprint Birkhäuser, www.birkhauser-science.com by the registered
company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Paper in this product is recyclable.


It is a pleasure to thank the Foster G. and
Mary McGaw Professorship in Mathematical
Sciences which supported this research. It is
also a pleasure to dedicate this work to my
son Raphael.
Contents

1 Prologue 1
1.1 A Guide Throughout the Book 2
1.2 Introduction 5
1.3 The Source Partition Theorem: Entropy 11
1.4 Another Motivation for H 13
1.5 Shannon's Theorem for the Binary Symmetric Channel (BSC) 14
1.6 Repetition Code and Hamming Code 15
1.7 Maximum Entropy and a First Connection with Statistical Physics 20
1.8 The Case of Continuous Signals 23
1.9 Some Questions Related to Statistical Physics 25
1.10 Some Questions and Tools in Quantum Mechanics 27
1.11 Some Questions and Tools in Quantum Information Theory 29
1.12 Machine Learning 31
1.13 Solutions of the Exercises 32
1.14 Some Specific Exercises and Questions 44

Part I Algebra

2 Complements in Linear Algebra 49
2.1 Complex Numbers 49
2.2 Beyond Complex Numbers 68
2.3 Vector Spaces and Linear Transformations 71
2.4 Hermitian Forms and Positive Operators 81
2.5 The Perceptron Convergence Theorem 84
2.6 The Scalar Case 88
2.7 Solutions of the Exercises 93

3 Positive Semi-Definite Matrices 133
3.1 Matrices 133
3.2 Schur Complement 159
3.3 Newton's Formulas 168
3.4 Hermitian Matrices 174
3.5 Positive Semi-Definite Matrices 184
3.6 Tensor Products and the Tensor Algebra 200
3.7 Completely Positive Maps 204
3.8 Maximum Entropy Analysis and the Yule-Walker Equation 206
3.9 Solutions of the Exercises 209

4 Algebra and Error-Correcting Codes 267
4.1 Sets, Functions, and Groups 267
4.2 Rings, Ideals, and Fields 274
4.3 Finite Fields 279
4.4 Splitting Field 283
4.5 Error-Correcting Codes: Generalities 285
4.6 Linear Block Codes 287
4.7 A Code Which Corrects Two Errors 290
4.8 Cyclic Codes 291
4.9 Solutions of the Exercises 294

Part II Analysis

5 Complements in Real and Complex Analysis 315
5.1 Warm-Up Questions and Exercises 315
5.2 Some Known Facts: Bolzano, Rolle, and Cauchy et les autres 319
5.3 Convex and Concave Functions 322
5.4 Stirling's Formula 327
5.5 Ordinary Differential Equations 329
5.6 Liouville's Theorem 337
5.7 Multivariable Calculus and Differentials 342
5.8 Optimization 350
5.9 Analytic Functions 353
5.10 Solution of the Exercises 363

6 Complements in Functional Analysis 405
6.1 Topologies and Metric Spaces 405
6.2 Hilbert Spaces 422
6.3 The Lebesgue Space L2(R, dt) 427
6.4 Operators in Banach and Hilbert Space 429
6.5 The Fourier Transform and Distributions 435
6.6 Legendre and Zak 444
6.7 Positive Definite Functions and Kernels 447
6.8 Positive Operators 454
6.9 Reproducing Kernel Hilbert Spaces 457
6.10 Bochner's Theorem 464
6.11 The Fock Space 466
6.12 Hermitian Forms and Reproducing Kernel Krein Spaces 468
6.13 Solutions of the Exercises 470

Part III Probability and Applications

7 Probability Theory 521
7.1 Review on Finite Probability Spaces 521
7.2 Random Variables in Finite Probability Spaces 530
7.3 Conditional Probabilities 539
7.4 Probability Densities 546
7.5 Sigma-Algebras and General Probability Spaces 548
7.6 Markov Chains 551
7.7 Second-Order Stationary Processes: Discrete Case 553
7.8 Loève's Theorem 557
7.9 Solutions of the Exercises 559

8 Entropy: Discrete Case 597
8.1 Proof of the Source Partition Theorem 597
8.2 Properties of the Entropy 604
8.3 Other Measures of Entropy 610
8.4 Capacity 611
8.5 Metrics on the Set of Random Variables and Entropy 616
8.6 Boltzmann-Gibbs Distribution 618
8.7 Solutions of the Exercises 619

9 Thermodynamics 635
9.1 Thermodynamics 635
9.2 Fluids and the State Equation 641
9.3 The Ideal Gas 642
9.4 Nonideal Gas 644
9.5 Solutions of the Exercises 647

References 667
Index 685
Name Index 693
Chapter 1
Prologue

The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point.

Claude Shannon [387], A mathematical theory of communication (1948)

There is, essentially, only one problem in statistical thermodynamics: the distribution of a given amount of energy E over N identical systems or, perhaps better, to determine the distribution of an assembly of N identical systems over the possible states in which this assembly can find itself, given that the energy of the assembly is a constant E.

Erwin Schrödinger, [381, p. 1]

. . . reconcile the time-reversibility of classical mechanics and the irreversibility of thermodynamics.

Peter Whittle, on the Ehrenfest model, [439, pp. 136–137]

The above quotations pertain to information theory, equilibrium thermodynamics, and statistical physics (be it classical or quantum), three captivating topics with numerous (and sometimes unexpected) interactions. The latter occur at the level of the mathematical tools being used and, more importantly, via the feedback and communication between these fields through the problems and their solutions. The methods used can be very involved, even at the finite-dimensional linear algebra level, and even more so when functional analysis is used. When one adds to the list the theory of error-correcting codes, then finite fields and algebraic curves come into play, in particular the Riemann-Roch theorem (or, more precisely, its counterpart1 in the setting of algebraic curves over a finite field).

1 Neither the original theorem nor its algebraic counterpart is discussed in the present book.


In this book, we present a collection of mathematical exercises with solutions, modeled on our previous books [15, 16, 18], which aims at preparing the reader to study statistical physics, equilibrium thermodynamics, information theory (and the associated theory of error-correcting codes), and their various connections. We have therefore added a number of notes and remarks to motivate the exercises. The theory of communication and signal theory are in the background, and a part of the exercises has been chosen from the theory of wavelets and machine learning. These exercises should give the student a good preparation to enter the fascinating fields of harmonic analysis and signal processing at the graduate level.

To quote from the book of Daniel Amit [42, p. xiv] (and replacing he there by he/she and his by his/her),

Had I had my ideal reader, he/she would have read this chapter superficially on first
reading. Then, if the rest of the book would have kept up his/her interest, he/she
would have come back to reread the introduction.

1.1 A Guide Throughout the Book

The present book is intended for senior undergraduates and beginning graduate students majoring in mathematics, physics, or engineering. These students will have heard about notions such as entropy, machine learning, wavelets, and quantum channels, but will not necessarily have the tools to understand these concepts. The majority of them may even have written some "machine learning" programs without being aware of the deep mathematical tools behind the theory. These tools include in particular positive semi-definite matrices2 (which appear, among other places, in principal component analysis, kernel methods, quantum channels, and as covariance matrices), elementary functional analysis (for instance, in the various approximation theorems in neural network theory), and probability (in particular the entropy function and some introduction to stochastic processes and generalized stochastic processes). As a sample application, we mention the approximation of a pre-assigned probability distribution using a restricted Boltzmann machine (see [5]).
2 Often called positive in the book.

To guide the students in their journey to research in the abovementioned (and definitely also other) fields, we present exercises from a number of domains, both theoretical and more applied, and use as a roadmap Shannon's communication channel and its quantum counterpart.
A large part of the book is in the finite-dimensional setting, and indeed it is fair to say that little more than the theory of finite probability spaces is needed to get a first understanding of information theory. Similarly, coding theory stays at the level of finite mathematics on a first approach, and only linear algebra is really used in the setting of finite quantum channels, although the notions and results are more involved (or, maybe, less well known). Finally, we note that the proof of the perceptron convergence theorem in R^N uses finite-dimensional linear algebra (in particular, the Cauchy-Schwarz inequality and the triangle inequality for the Euclidean norm in R^N), even if the two classes of feature vectors to be separated are not finite (see [312] and [330, Chapter 5]). The proof of the theorem is presented in the form of a guided exercise in Sect. 2.5.
We now survey some of the mathematical tools discussed in the sequel.
1. Real analysis: the theory of convex functions plays an important role, as well as the Legendre transform. The latter is defined by

L(f)(x) = sup_β (βx − f(β)),     (1.1.1)

where f is convex. For a C^2(R) function with strictly positive second derivative, the Legendre transform can be written as

L(f)(x) = x (f^(1))^{−∘}(x) − f((f^(1))^{−∘}(x)),     (1.1.2)

where we denote by f^(1) the derivative of f and by g^{−∘} the inverse of the function g with respect to composition (see Exercise 1.1.1; see also the discussion in [309, pp. 26–28], where the case of concave functions is considered).
2. Multivariable calculus, in particular the notion of exact differential: The first law of thermodynamics asserts that the difference of energy between two equilibrium states, say E_1 and E_2, of a given system does not depend on the path chosen to go from one equilibrium state to the other. It is therefore defined by an exact differential, arising from a potential. The above difference E_2 − E_1 is the sum of the transfer of thermal energy Q_{1→2} (the heat) and of the transfer of energy W_{1→2} due to work:

E_2 − E_1 = Q_{1→2} + W_{1→2}.

In general, neither Q_{1→2} nor W_{1→2} is defined by an exact differential, and the above expression is written as

dE = δQ + δW,     (1.1.3)

where the δ in δQ and δW points out that the corresponding expression is not an exact differential. The decomposition (1.1.3) is unique, in the sense that heat is, by this very definition, the amount of exchanged energy which is not work (of course, one has to define work in an independent way; this is the energy obtained from purely mechanical systems). (See, e.g., the discussions in [128, p. 45 and p. 78] and [356, p. 35].) This exchange is at the microscopic level. In practice, the quantity of heat is expressed in terms of the calorimetric coefficients ([128, p. 78]; see Definition 5.7.12).
3. Probability theory and its various connections with other fields, at the elementary level (i.e., without sigma-algebras): This setting allows one to consider interesting questions (and, as we mentioned already, to prove the asymptotic partition theorem), but it is not an elaborate enough tool to study, for instance, Markov chains.
4. Measure theory, in particular to go from the setting of finite probability spaces to the case of infinite probability spaces: Measure theory is also needed to define in a correct way the Lebesgue spaces such as L^2(R, dx), used in the theory of signals indexed by continuous time and in quantum information theory.
5. The notion of Hilbert spaces and their operators: The space L^2(R, dx) was mentioned just above in the fourth item. The notion of compact operator is important, for instance, in the study of Karhunen-Loève expansions. Interestingly enough, the Arzelà-Ascoli theorem is used in the arguments (see [52, Appendix]).
6. The notion of positivity, be it a positive semi-definite matrix3 or a positive operator, permeates the book and is one of the key players in the present work: In linear system theory, positivity is the mathematical translation of the notion of dissipativity. Positive definite functions appear in particular as covariances of second-order stochastic processes, and completely positive maps intervene in the definition of a quantum channel. Moreover, a density is a positive functional on positive observables, while a macro-state can be seen as a positive linear form on an algebra (see, e.g., [57, p. 212]). Projections are positive operators, and so are effects, i.e., positive operators less than the identity (with respect to the partial ordering of positive semi-definite matrices; see Definition 3.5.4 for the latter). Last, but not least, we mention one of the usual suspects for the analyst, namely, the reproducing kernel Hilbert space associated with a positive definite function.

Before presenting the first exercise, we mention that the term Question will be reserved for exercises for which no solution is given in the book.

3 We will often use the terminology positive rather than positive semi-definite here; see Remark 3.5.2.

Exercise 1.1.1 Prove (1.1.2) and compute the Legendre transform of ln Z(β), with Z(β) = (1 + e^{βE})/2.
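Before attempting the exercise, one may run a quick numerical sanity check of (1.1.1) against (1.1.2). The following Python sketch is only illustrative: the value E = 1, the test points, and the brute-force grid are arbitrary choices, not taken from the book.

    import numpy as np

    # f(beta) = ln Z(beta) with Z(beta) = (1 + exp(beta*E))/2, as in Exercise 1.1.1.
    E = 1.0
    f = lambda b: np.log((1.0 + np.exp(b * E)) / 2.0)

    def legendre_grid(x, betas):
        # Direct evaluation of (1.1.1): L(f)(x) = sup_beta (beta*x - f(beta)).
        return np.max(betas * x - f(betas))

    def legendre_closed_form(x):
        # Formula (1.1.2): beta = (f^(1))^{-o}(x), obtained here by inverting
        # f'(beta) = E*exp(beta*E)/(1 + exp(beta*E)), valid for x in (0, E).
        beta = np.log(x / (E - x)) / E
        return x * beta - f(beta)

    betas = np.linspace(-60.0, 60.0, 200001)
    for x in (0.1, 0.3, 0.5, 0.8):
        print(x, legendre_grid(x, betas), legendre_closed_form(x))

The two columns agree up to the grid resolution, which is the content of (1.1.2) for this particular f.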

Question 1.1.2 More generally, compute the Legendre transform of ln Z(β), where Z(β) is the partition function defined in Exercise 1.7.3.

1.2 Introduction

The notion of entropy, or maybe more appropriately the notions of entropy, form a common link between information theory, equilibrium thermodynamics, statistical physics, and quantum mechanics. We refer to the paper The many facets of entropy [435] for a survey. To define the thermodynamic entropy, recall first Carnot's theorem, which states that for a reversible heat engine acting between two heat sources, it holds that

Q_1/T_1 + Q_2/T_2 = 0.

In this equation, T_1 and T_2 are the respective temperatures of the sources, and Q_1 and Q_2 denote the quantities of heat respectively exchanged with these sources. Building on the work of Sadi Carnot (whose short life, 1796–1832, reminds one of the short lives of Abel, Galois,4 Riemann, and others), Clausius introduced the entropy S in thermodynamics in 1865 via its differential dS in the formula

dS = δQ/T.     (1.2.1)

In this expression, δQ is the variation of heat in a given process and is not an exact differential (meaning that the quantity of heat is not a state function), and 1/T is the integrating factor (T being the temperature). Formula (1.2.1) expresses the fact that, in physics (as opposed to everyday usage), heat and temperature are two different concepts, related by entropy.

A mathematical notation for the differential δQ would rather be ω_Q, and we will sometimes use the latter, writing

ω_Q = T dS     (1.2.2)

rather than (1.2.1).

4 We refer to the articles [373] and [403], which set the record straight on some misconceptions about the life and death of Evariste Galois.


From a different point of view, Boltzmann defined entropy via his celebrated formula

S = k_B ln W,     (1.2.3)

where k_B is Boltzmann's constant (which, in the case of a gas, allows one to make the link between temperature and the kinetic energy of the gas particles) and W is the number of micro-states compatible with a given macro-state. Let us recall at this stage the formula

k_B = R/N,     (1.2.4)

where R is the gas constant per mole and N is Avogadro's number, that is (by definition), the number of molecules of a gas in a mole. We also mention that the Lagrange multiplier β appearing in the partition function (1.7.3) below can be written as

β = 1/(k_B T),     (1.2.5)

allowing one to relate temperature in the classical and quantum settings.


We rewrite (1.2.3) as

S = −k_B Σ_{i=1}^{W} (1/W_i) ln(1/W_i),     (1.2.6)

where W_i = W for i = 1, . . . , W, so that each 1/W_i is the uniform probability 1/W. This formula still makes sense for general probability distributions and also appears in this case in Boltzmann's work and, forgetting the (fundamental) constant k_B, allows one to make the link with Shannon's entropy (1948; see [387]; one then replaces ln by the logarithm in base 2, denoted by log2) and with von Neumann's entropy (1927; see [431, p. 257]; see (1.2.9) below).

Historically, the time arrow goes from Clausius to Shannon, but pedagogically,
a better and common strategy is to begin with Shannon’s entropy, and this is the
path we have chosen here. In the quotation at the beginning of this chapter, Shannon
had in mind to reproduce efficiently the given message. In the process, he basically
created in the 1940s information theory, a mathematical theory which considers
the problem of transmitting information over a (in general noisy) communication
channel from an information source to a destination, with a number of errors as
small as possible and with a rate of transmission as close to 1 as possible.5 (See
Fig. 1.1). This figure is only a very schematic diagram, of course. In particular,
the encoder should be divided into a source encoder and a channel encoder, and
similarly for the decoder (see [173, p. 3]; see Figs. 1.2 and 1.3).
5 Two obviously contradicting goals.

Fig. 1.1 Communication channel (schematic): source, coding, channel, decoding, decoded message, with noise entering at the channel

Fig. 1.2 Encoder: source encoder followed by channel encoder

Fig. 1.3 Decoder: channel decoder followed by signal decoder

In the above, the term transmitting may be a bit misleading, since the communication channel can be, for instance, a compact disk, and then the information may be impaired but stays at the same place. The term information is also a bit misleading and is not taken in its usual everyday sense. For instance, one will characterize a language in terms of the frequency of appearance of letters (or, more generally, pairs of letters and their transition frequencies), without considering the meaning of a specific sequence of letters. We wrote in the above frequency rather than probability on purpose. One will first obtain the frequencies by empirical means and then use them as probabilities in computations. This is not at all an innocent approach; it is related to ergodicity and will be flawed in certain cases.
One can summarize the first main challenges of information theory as follows:

Challenges 1.2.1
(a) Is it possible to transmit data efficiently in the presence of noise?
(b) How can one organize and transmit the data efficiently6?

6 When the noise can be neglected, this is noiseless coding theory, a topic not considered here.

In the above questions, a number of terms need to be defined:


1. Information source: In a simplified but useful first approach, an information source will just be a finite probability space

{a_1, . . . , a_K}     (1.2.7)

with probability distribution

P{a_k} = p_k,  k = 1, . . . , K.     (1.2.8)

The number (compare with (1.2.6))

H = − Σ_{k=1}^{K} p_k log2 p_k     (1.2.9)

(where log2 denotes the logarithm in base 2 and with the convention that 0 log2 0 = 0) is called the entropy of the information source (a small numerical illustration of H is given right after this list). In other words, H = E(X), where X is the random variable taking the value −log2(p_k) with probability p_k and where E denotes the mathematical expectation. The entropy plays a key role in the theory to solve the communication problems mentioned in Challenges 1.2.1. As recalled in [342, p. 8], this quantity was first defined (up to a multiplicative constant) in the setting of thermodynamics by Boltzmann and Gibbs (see also the paper of Szilard [402]). Shannon (see [387, p. 19]) mentions the appearance of the function H in statistical mechanics. As discussed in [69], the fact that the same term is used for quantities pertaining to a priori completely different situations has created a lot of confusion.
More involved information sources can also be considered, in particular ergodic sources and band-limited continuous signals. Moreover, it should be noted that not all errors of transmission carry the same weight (quite often, a false detection is less important than the lack of detection, as has been realized by anyone who has been under a missile attack). This disparity is translated by a distortion rate function, which assigns a weight, say d(u, v), to the fact that u was sent and v was received. Let P(u, v) denote the probability that v was received when u was sent. The number

d = Σ_{u,v: u≠v} d(u, v) P(u, v)     (1.2.10)

is called the average distortion.



2. Communication channel: At a first stage, and for an information source defined as above, a communication channel will be characterized by a finite set of transition probabilities. If Y ∈ Ω is received after X ∈ Ω is sent, we are given the transition probabilities

p_{jk} := P(Y = a_j | X = a_k),  j, k = 1, . . . , K.     (1.2.11)

In the case of the binary symmetric channel with probability of error 1 − p, the alphabet is {0, 1}, and the transition probabilities are

P(Y = 0 | X = 0) = P(Y = 1 | X = 1) = p,
P(Y = 1 | X = 0) = P(Y = 0 | X = 1) = 1 − p.     (1.2.12)

The probability of error is 1 − p when the inputs are independent.


3. Rate of transmission: Assuming the answer to question (a) above is positive, redundancy will be necessary. The rate of transmission will measure this redundancy.

Note that different alphabets may appear at the input and at the output of the channel (see, e.g., the erasure channel in Exercise 8.4.6). Transition probabilities are then defined accordingly. Note also that a more precise model of a channel would introduce (among other things) a cost function.
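As announced after the definition of H, here is a small numerical illustration of the entropy (1.2.9); the example distribution is an arbitrary choice, not taken from the book, and serves only to show the convention 0 log2 0 = 0 and the bound H ≤ log2 K attained at the uniform distribution.

    import numpy as np

    def entropy(p):
        # H = -sum_k p_k log2 p_k, with the convention 0 log2 0 = 0.
        p = np.asarray(p, dtype=float)
        nz = p > 0
        return float(-np.sum(p[nz] * np.log2(p[nz])))

    p = [0.5, 0.25, 0.125, 0.125, 0.0]   # arbitrary example on K = 5 symbols
    print(entropy(p))                    # 1.75 bits
    print(entropy([1/5] * 5))            # log2 5, the maximal value for K = 5
    print(np.log2(5))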
The second and third quotations at the beginning of the chapter, taken from the
books [381] of Erwin Schrödinger and [439] of Peter Whittle, respectively, go to
the very core and first problem of statistical physics: to explain thermodynamics
and in particular the second law of thermodynamics, using classical mechanics.
This leads from the beginning to apparent contradictions, since the differential
equations governing mechanics are reversible in time, while the second law implies
an arrow of time. Thermodynamics deals with global variables of a given system
(for instance, a gas) in equilibrium (more precisely, in thermodynamic equilibrium,
a term which we will not define at this point), such as temperature, pressure,
volume, and chemical potential, while applying mechanics is at the local level of
the molecules of the gas, i.e., macro versus micro. There is one key hypothesis
when going from the micro-level to the macro-level:

All the microstates compatible with a given macro-state have equal probability,

which is translated into Boltzmann's formula (1.2.3). The term probability says it all; as for the (subsequent) theory of information, statistical physics (as its name would indicate) makes use of probability theory.

The previous discussion hints at a wide range of interacting domains, in mathematics, physics, and engineering, and a student entering this web needs to know quite a number of results and methods. This is where the present book intervenes.

In this prologue, we outline some of the connections mentioned above and discuss
some of the questions and mathematical machinery to be used. We review informally
the following topics, to give the reader a general idea of the questions considered in
information theory and statistical physics:
1. The source partition theorem (and introduction to entropy).
2. Another asymptotic motivation of the entropy (the thermodynamic limit).
3. Shannon’s theorem for the binary symmetric channel.
4. The simplest error-correcting codes (the repetition code and the Hamming code).
5. The maximum entropy principle, the Boltzmann-Gibbs distribution, and the
related minimum cross-entropy principle.
6. The case of continuous signals. We mention Shannon’s sampling theorem and
Loève expansion theorem.
7. A simple combinatorial example, to illustrate the main axiom of statistical
physics.
8. Discussion of tools from quantum mechanics and in particular the tensor product
and the Fock space.
9. Some questions and tools in quantum information theory.
The above list shows that the highlights of the book could be divided into
four groups, along the dichotomies finite dimensional/infinite dimensional and
classical/quantum. The finite-dimensional setting already gives the main ideas
(asymptotic partition theorem and quantum channels), but the true analysis requires
measure theory, the theory of stochastic processes and functional analysis in the
classical setting, and the theory of operator algebras for the quantum setting.
Remark 1.2.2 The communication channel diagram allows one to set various topics/courses in perspective:
1. Stochastic processes: the topic is needed in particular to model the noise in a communication system. The relevant study requires measure theory.
2. Information theory: to understand the theory beyond error-correcting codes and to study new kinds of channels.
3. Error-correcting codes: to transmit the data with a probability of error as small as possible, while getting as close as possible to the capacity of the channel. The tools are mostly algebraic, using, for instance, the theory of finite fields.
4. Statistical physics: to understand the notion of entropy introduced in information theory in a wider setting, comparing it with the other definitions of entropy.
5. Wavelets: one needs to compress the data to have efficient transmission. Wavelet theory requires deep tools from Hilbert space theory and functional analysis.
6. Quantum information theory and quantum error-correcting codes.

1.3 The Source Partition Theorem: Entropy

To motivate the introduction of the entropy via (1.2.9), we present now the source partition theorem, also called the asymptotic partition theorem or the asymptotic equipartition theorem. For the concept of product probability, see Definition 7.1.7. It is fascinating, at least to the author, that the entropy function appears in the source partition theorem in such a natural way, in the asymptotic behavior of a combinatorial problem (by the latter we mean: how many important sequences of length n are there as the length of the sequences increases).
Theorem 1.3.1 Let Ω = {a_1, . . . , a_K} be a finite probability space with probability distribution

P(a_k) = p_k,  k = 1, . . . , K,     (1.3.1)

and entropy

H = − Σ_{k=1}^{K} p_k log2 p_k  (log2 denotes the logarithm in base 2),

and let, for every n ∈ N, the space Ω^n be endowed with the corresponding product probability. Then for every ϵ ∈ (0, 1) and every δ > 0 there exists n_0 ∈ N such that, for every n ≥ n_0, the set {a_1, . . . , a_K}^n can be partitioned as a union of two non-overlapping sets,

{a_1, . . . , a_K}^n = A_n(ϵ, δ) ∪ B_n(ϵ, δ),

with

P(B_n(ϵ, δ)) < ϵ

and

(1 − ϵ) 2^{n(H−δ)} ≤ Card A_n(ϵ, δ) ≤ 2^{n(H+δ)}.

Remarks 1.3.2
(a) We have

P(A_n(ϵ, δ)) ≥ 1 − ϵ,

and, intuitively, the theorem says that there are 2^{nH} important sequences as n → ∞. These are the typical sequences (there are various more precise ways to define typical sequences; see the proof of the theorem and Exercise 8.1.7). The set B_n(ϵ, δ) corresponds to rare events in the language of large deviations theory. (See [411, §4.5, pp. 26–27] for a discussion of the interpretation of the entropy in the setting of large deviations.)
(b) We will see that H ≤ log2 K and that equality holds if and only if the probability distribution is uniform:

p(a_k) = 1/K,  k = 1, . . . , K.

All sequences of length n then have the same probability, namely K^{−n}. In that case, 2^{nH} = K^n, and of course the theorem, although still correct, cannot provide any improvement.
(c) The proof of the theorem (see Sect. 8.1) requires little more than the law of large numbers (see Theorem 7.2.12 for the latter).
(d) Assume the random variables are not independent. Is there a similar theorem? The answer is yes for stationary ergodic sources; one needs more advanced tools in mathematical analysis and probability to state and prove the result. In particular, one considers infinite sequences indexed by the integers and needs to define a probability measure on Ω^Z.
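The following simulation, a minimal sketch with an arbitrary three-symbol source (not taken from the book), illustrates Remark (a): for an i.i.d. sequence, −(1/n) log2 P(sequence) concentrates around H, which is exactly what makes the roughly 2^{nH} typical sequences the "important" ones.

    import numpy as np

    rng = np.random.default_rng(0)
    p = np.array([0.5, 0.3, 0.2])                      # illustrative source, K = 3
    H = -np.sum(p * np.log2(p))

    for n in (10, 100, 1000, 10000):
        seq = rng.choice(len(p), size=n, p=p)          # one i.i.d. sequence of length n
        minus_log_prob = -np.sum(np.log2(p[seq])) / n  # -(1/n) log2 of its probability
        print(n, round(float(minus_log_prob), 4), " H =", round(float(H), 4))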
We also note that the notion of entropy allows one to define the capacity of a noisy channel (see Definition 8.4.5). For a channel with input X and output Y, the capacity is defined by

C = sup (H(X) + H(Y) − H(X, Y)),     (1.3.2)

and is computed over all the output probabilities (or input probabilities). It is a key quantity in the statement of Shannon's theorem on coding in the presence of noise (see Theorem 1.5.6 below for a version for the binary symmetric channel).
Exercise 1.3.3 Compute the capacity of the channel with transition probabilities (1.2.11) when K = 2.
Hints: Following [364, p. 117], it is useful to introduce quantities q_0 and q_1 such that

( p_00  p_10 ) ( q_0 )   ( p_00 log2 p_00 + p_10 log2 p_10 )
( p_01  p_11 ) ( q_1 ) = ( p_01 log2 p_01 + p_11 log2 p_11 ).     (1.3.3)

It may also be useful to use the formula

H(X) + H(Y) − H(X, Y) = H(Y) − H(Y|X).

See (8.1.11) for the definition of the conditional entropy H(Y|X) and (8.2.18) for the mutual information.
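A numerical check of Exercise 1.3.3 for the binary symmetric channel can be obtained by maximizing H(Y) − H(Y|X) over the input distribution; the sketch below (the value p = 0.9 and the grid are illustrative choices, not from the book) recovers the formula C(p) of Theorem 1.5.6.

    import numpy as np

    def mutual_information(px0, p):
        # I(X;Y) = H(Y) - H(Y|X) for the binary symmetric channel (1.2.12):
        # correct transmission with probability p, input distribution (px0, 1 - px0).
        h = lambda q: 0.0 if q in (0.0, 1.0) else -q*np.log2(q) - (1-q)*np.log2(1-q)
        py0 = px0 * p + (1 - px0) * (1 - p)    # probability that Y = 0
        return h(py0) - h(p)                    # H(Y|X) = h(p) for every input law

    p = 0.9
    grid = np.linspace(0.0, 1.0, 10001)
    C_numeric = max(mutual_information(x, p) for x in grid)
    C_formula = 1 + p*np.log2(p) + (1-p)*np.log2(1-p)
    print(C_numeric, C_formula)                 # both about 0.531, attained at the uniform input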

1.4 Another Motivation for H

Another asymptotic estimate which allows one to introduce the entropy H in a natural way is obtained via the Maxwell-Boltzmann statistics (see also Exercise 7.1.19). For every n ∈ N, we consider N_n distinguishable objects that can be placed inside K different boxes (for instance, books and bookshelves). Thus, there are N_{1,n} objects in the first box, . . . , N_{K,n} objects in the K-th box. Let N_n = N_{1,n} + · · · + N_{K,n}. There are

( N_n choose N_{1,n}, N_{2,n}, . . . , N_{K,n} ) := N_n! / (N_{1,n}! · · · N_{K,n}!)     (1.4.1)

possible ways to arrange these objects in the boxes. Assume that

lim_{n→∞} N_{k,n} = ∞,  k = 1, . . . , K,     (1.4.2)

and that the thermodynamic limit

lim_{n→∞} N_{k,n}/N_n = p_k > 0,  k = 1, . . . , K,     (1.4.3)

is in force. One then has:

∀ ϵ > 0, ∃ n_0 such that:

n ≥ n_0  =⇒  | (1/N_n) log2 ( N_n choose N_{1,n}, N_{2,n}, . . . , N_{K,n} ) + Σ_{k=1}^{K} p_k log2 p_k | < ϵ.

See Exercise 5.4.3. The result is completely different when the objects are not distinguishable; one then gets the Bose-Einstein statistics; see Exercise 7.1.22.
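The limit just stated can be observed numerically; the proportions below are an arbitrary illustrative choice, and lgamma is used only to keep the factorials manageable.

    import math

    def log2_multinomial(counts):
        # log2 of N!/(N_1! ... N_K!), computed via lgamma to avoid huge integers.
        N = sum(counts)
        return (math.lgamma(N + 1) - sum(math.lgamma(c + 1) for c in counts)) / math.log(2)

    p = (0.5, 0.3, 0.2)                          # illustrative proportions, not from the book
    H = -sum(q * math.log2(q) for q in p)

    for N in (100, 1000, 100000):
        counts = [round(q * N) for q in p]       # N_{k,n} with N_{k,n}/N_n -> p_k
        print(N, round(log2_multinomial(counts) / sum(counts), 4), " H =", round(H, 4))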
Remarks 1.4.1 We note the following:
(1) The quantity −Σ_{k=1}^{K} p_k ln p_k (we have moved without warning to the natural logarithm . . . ) is then called the thermodynamical probability by Boltzmann (see [332, p. 1]).
(2) The expression (1.4.1) is called the multinomial coefficient. It appears also in the case of independent trials with more than two possible outcomes; see, e.g., [337, p. 108].
(3) The approach to obtain the entropy presented in this section uses Stirling's formula rather than probability. This example is the starting point of Kolmogorov's paper [260, §2], to the effect that

it is not clear why information theory should be based so essentially on probability theory

and ([260, §9, p. 39])

Information theory must precede probability theory, and not be based on it. By the very essence of this discipline, the foundations of information theory have a finite combinatorial character.

See also [286].

1.5 Shannon's Theorem for the Binary Symmetric Channel (BSC)

We now present Shannon's theorem for the binary symmetric channel (see (1.2.12)) and take the formulation from [242, p. 4]. We first need to define the Hamming distance, codes (see Definition 4.5.2), and nearest neighbor decoding. In the statement, GF(2) = Z/2Z is the Galois field with two elements.
Definition 1.5.1 The number of non-zero entries of x ∈ (GF(2))^n is called the Hamming weight (or the weight) of x, denoted by w_H(x), and the Hamming distance between two elements x and y of (GF(2))^n is the number of entries where these elements differ. It is the weight of their difference:

d_H(x, y) = w_H(x − y).     (1.5.1)

Definition 1.5.2 An (n, M, d) code C over GF(2) is a subset of M vectors of (GF(2))^n with minimal distance d. When the code is linear with dimension k, we have M = 2^k, and one uses the terminology and notation [n, k, d] code.
Exercise 1.5.3 (see, for instance, [109, p. 25]) Define, for x and y in (GF(2))^n,

x ⊙ y = (x_1 y_1, x_2 y_2, . . . , x_n y_n).     (1.5.2)

Show that

w_H(x − y) = w_H(x) + w_H(y) − 2 w_H(x ⊙ y).     (1.5.3)

Remark 1.5.4 Viewing (1.5.2) as the counterpart of the intersection of two sets, formula (1.5.3) suggests that w_H(x − y) is the counterpart of the symmetric difference of sets; see (4.1.19) for the latter, and see (4.1.20), which is the set-theoretical counterpart of (1.5.2), to pursue this analogy. The solution of the exercise takes advantage of this similarity.
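A brute-force check of (1.5.3), with vectors of (GF(2))^n represented as 0/1 tuples (an arbitrary representation choice, not from the book), may help before writing the proof.

    import itertools

    def w_H(x):                              # Hamming weight
        return sum(x)

    def d_H(x, y):                           # Hamming distance = weight of the difference over GF(2)
        return sum(a != b for a, b in zip(x, y))

    def hadamard(x, y):                      # x ⊙ y of (1.5.2): entrywise product
        return tuple(a * b for a, b in zip(x, y))

    n = 4                                    # exhaustive check of (1.5.3) for n = 4
    for x in itertools.product((0, 1), repeat=n):
        for y in itertools.product((0, 1), repeat=n):
            assert d_H(x, y) == w_H(x) + w_H(y) - 2 * w_H(hadamard(x, y))
    print("identity (1.5.3) verified for all pairs in (GF(2))^4")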
Definition 1.5.5 The nearest neighbor decoding consists in choosing the code word
closest to the received word in the Hamming distance.
Theorem 1.5.6 (Shannon; see [242, p. 4] for this formulation) Let p ∈ [0, 1/2) in (1.2.12), and let

C(p) = 1 + p log2 p + (1 − p) log2(1 − p)

be its capacity (see Exercise 1.3.3). Let ε and δ be two pre-assigned strictly positive numbers. There exists N = N(ε, δ, p) such that for every n ≥ N, there exists a code C ⊂ (GF(2))^n satisfying:
(1) The rate of transmission R = (log2 M)/n is such that

C(p) − ε < R < C(p) + ε.     (1.5.4)

(2) The probability of error using the nearest neighbor decoding is less than δ.

1.6 Repetition Code and Hamming Code

The entropy plays a central role in the definition of the capacity of a channel (see (1.3.2)) and in the statement of Shannon's theorem on communication in the presence of noise. The proof of that latter theorem is not constructive. Error-correcting codes allow one to achieve efficient communication in practice. Before presenting the first example of an error-correcting code, we mention that communication (say over the binary channel (1.2.12) with probability of error p < 1/2) is always possible with probability of error as small as we wish, the price to pay being the rate of transmission going to 0. To that effect, it is enough to send each input 2N + 1 times and, using the majority rule, decide what was sent. The probability of incorrect decoding is then

P_N(error) = Σ_{k=N+1}^{2N+1} (2N+1 choose k) p^k (1 − p)^{2N+1−k}.     (1.6.1)

For instance, when 2N + 1 = 3, P_3 = 3p^2 − 2p^3 < p for p ∈ (0, 1/2).
Exercise 1.6.1 Show that

lim_{N→∞} P_N = 0.     (1.6.2)

At least two proofs are possible. The first uses the bounds (5.4.8) for n! given by Stirling's formula, while the other uses the law of large numbers. See also, for instance, [131, p. 43] for a similar discussion.
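The behavior of (1.6.1) and of the limit (1.6.2) can be observed numerically; the bit-error probability p = 0.1 below is an arbitrary illustrative choice, not from the book.

    from math import comb

    def repetition_error(N, p):
        # P_N(error) of (1.6.1): majority decoding fails iff more than N of the
        # 2N+1 transmitted copies are flipped (p = probability of a single bit error).
        n = 2 * N + 1
        return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(N + 1, n + 1))

    p = 0.1
    for N in (1, 2, 5, 10, 20):
        print(2 * N + 1, repetition_error(N, p))
    # P_1 = 3p^2 - 2p^3 = 0.028, and P_N tends to 0, at the price of a rate 1/(2N+1).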
The repetition code does correct one error but is quite inefficient. Hamming was the first to develop a nontrivial code that corrects errors (see Remark 1.6.2). We now present the simplest Hamming code in the setting of the Galois field GF(2) (also written Z/2Z). Rather than sending four bits of information, x_0, x_1, x_2, x_3, we send 7 bits x_0, . . . , x_6 defined by

(x_0 x_1 x_2 x_3 x_4 x_5 x_6) = (x_0 x_1 x_2 x_3) ( 1 0 0 0 1 1 0 )
                                                  ( 0 1 0 0 1 0 1 )
                                                  ( 0 0 1 0 0 1 1 )
                                                  ( 0 0 0 1 1 1 1 )
                              = (x_0 x_1 x_2 x_3) (I_4  P)
                              = (x_0 x_1 x_2 x_3  x_0+x_1+x_3  x_0+x_2+x_3  x_1+x_2+x_3),     (1.6.3)

with

P = ( 1 1 0 )
    ( 1 0 1 )
    ( 0 1 1 )
    ( 1 1 1 ).

Let

G = (I_4  P)  and  H = (−P^t  I_3).     (1.6.4)

G is called the generating matrix of the code, and the Hamming code is the left range of G. The matrix H (or sometimes its transpose H^t) is called the parity-check matrix. The numbers x_4, x_5, x_6 are called the parity-check bits.
Note that the columns of H give all the integers from 1 to 7 in base 2. We also remark that H G^t = 0.
Historical Remark 1.6.2 Hamming's code appears first in the 1948 paper of Shannon [387, §17, p. 44] (with a reference to Hamming as being the founder of the method), with generating matrix

( 1 1 1 0 0 0 0 )
( 1 0 0 1 1 0 0 )
( 0 1 0 1 0 1 0 )
( 1 1 0 1 0 0 1 ).     (1.6.5)

Note that this matrix is not of the form (1.6.4). Hamming's paper on error-correcting codes appeared in 1950; see [199]. There, Hamming also mentions (see [199, p. 160]) that the first article on error-correcting codes seems to be due to Golay and appeared in 1949; see [181].
Exercise 1.6.3 With H as in (1.6.4), show that a vector x ∈ (Z/2Z)^7 is a code word if and only if x H^t = 0.
Now suppose that we send x = (x_0 x_1 x_2 x_3 x_4 x_5 x_6) and receive

y = x + e,

where

e = (e_0 e_1 e_2 e_3 e_4 e_5 e_6)

is the error. We compute

y H^t = x H^t + e H^t,

where x H^t equals 0 since x is a code word, and e H^t is the syndrome.
Exercise 1.6.4 The map e ↦ e H^t is one-to-one and onto between the errors of Hamming weight 1 and the rows of H^t.
Hence the Hamming code can correct one error, and its probability of incorrect decoding is

P_e = 1 − (1 − p)^7 − 7p(1 − p)^6 = 21p^2 + O(p^3),     (1.6.6)

and its rate is 4/7.
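The encoding (1.6.3) and the syndrome decoding just described fit in a few lines of Python; in the sketch below the message and the flipped position are arbitrary illustrative choices, and all arithmetic is taken modulo 2.

    import numpy as np

    P = np.array([[1, 1, 0],
                  [1, 0, 1],
                  [0, 1, 1],
                  [1, 1, 1]])
    G = np.hstack([np.eye(4, dtype=int), P])      # generating matrix (I_4 | P)
    H = np.hstack([P.T, np.eye(3, dtype=int)])    # parity-check matrix; -P^t = P^t mod 2

    assert not (G @ H.T % 2).any()                # H G^t = 0

    # The syndrome e H^t of a weight-1 error identifies the flipped position.
    syndrome_to_error = {tuple(H.T[i] % 2): i for i in range(7)}

    x = np.array([1, 0, 1, 1])                    # illustrative message
    codeword = x @ G % 2
    received = codeword.copy()
    received[5] ^= 1                              # flip one bit
    s = tuple(received @ H.T % 2)
    corrected = received.copy()
    if any(s):
        corrected[syndrome_to_error[s]] ^= 1
    print(codeword, received, corrected)          # corrected equals the original codeword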


Admittedly, the rate for this specific example is only somewhat higher than the rate of the repetition code for N = 3. But the point is elsewhere: the Hamming code was the first instance (besides the repetition code, which is inefficient as N → ∞) of an error-correcting code, and it hinted that it might be possible to correct a larger number of errors in an efficient way.
More generally (see the following exercise), one can define a linear code, still called a Hamming code, by an m × (2^m − 1) matrix whose columns are the coefficients, in base 2, of the numbers from 1 to 2^m − 1.
Exercise 1.6.5 Construct Hamming error-correcting codes of length 2^m − 1, with m parity bits, correcting one error.
Even more generally, one can replace the Galois field GF(2) by the finite field GF(q), where q = p^u, with p prime and u ∈ N. Writing the coefficients of the numbers 1 to q^m − 1 in base q,

t = a_0 + a_1 q + · · · + a_{m−1} q^{m−1},  a_j ∈ {0, . . . , q − 1} (not all a_j equal to 0),

one builds the parity-check matrix H ∈ (GF(q))^{m×(q^m − 1)} of a code which corrects one error.
Historical Remark 1.6.6 The first example of an error-correcting code correcting
two or more errors appeared in [85] more than a decade after Hamming’s code,
using the theory of Galois fields (finite fields). As a rule, the various generations of
codes that followed each used some specific and often quite involved mathematical
tools, for instance:
• Theory of algebraic structures (ideals of rings), for cyclic codes.
• Linear system theory, and realization of rational functions, for convolutional
codes.

• The theory of algebraic curves, and algebraic geometry, for Goppa codes.
• Probability theory and information theory, for polar codes.
Remark 1.6.7 The Hamming distance d_H (see Definition 1.5.1) is a distance, or a metric, in the functional analysis sense, meaning that for all x, y, z ∈ (Z/2Z)^N the following conditions are in force:

d_H(x, y) ≥ 0,
d_H(x, y) = 0 ⇐⇒ x = y,
d_H(x, y) = d_H(y, x),
d_H(x, z) ≤ d_H(x, y) + d_H(z, y).

See Definition 6.1.8 and Sect. 6.1 for complements on metric spaces.
A fixed-length code is a subset (in general, we will want a linear code, i.e., a linear subspace) of (Z/2Z)^N; see Definition 4.5.2. Requiring that the minimum distance between any two different code words be at least 2d + 1, we can correct d errors by decoding the received element y as the code element closest to y in the Hamming distance.
Question 1.6.8 Check on the matrix G in (1.6.5) that the Hamming code corrects one error.
Hint: It suffices to show that the Hamming distance between two code words is at least 3. Another, more direct, way is to compute

(Y_4 + Y_5 + Y_6 + Y_7, Y_2 + Y_3 + Y_6 + Y_7, Y_1 + Y_3 + Y_5 + Y_7),     (1.6.7)

where the received word is (Y_1, . . . , Y_7). In case of one error, one obtains its location in the received word using (1.6.7); see [200, p. 41], [387, p. 44].
The Hamming code is a special case of a cyclic code (see Sect. 4.8). Staying in the setting of GF(2), these are defined as linear subspaces, say C, of (GF(2))^n with the property that

(a_0, a_1, . . . , a_{n−1}) ∈ C  =⇒  (a_{n−1}, a_0, . . . , a_{n−2}) ∈ C.     (1.6.8)

Note that

(a_{n−1} a_0 · · · a_{n−2}) = (a_0 a_1 · · · a_{n−1}) ( 0_{(n−1)×1}  I_{n−1}    )
                                                     ( 1            0_{1×(n−1)} ),

where we identify the vector (a_0, a_1, . . . , a_{n−1}) with the row matrix (a_0 a_1 · · · a_{n−1}).
Question 1.6.9 Is the code with generating matrix (1.6.4) cyclic?
Hint: A general proof, involving the notion of equivalent codes, is given in
Exercise 4.8.8. Here, at this stage of the book, it is enough for the student to check
cyclicity, or lack thereof, directly on a basis of the code.

Exercise 1.6.10 Prove that the .(7, 4, 3) Hamming code is perfect, that is, the union
of the (pairwise non-intersecting) closed balls of radius 1 centered at the code points
is the whole of .(GF (2))7 .
The study of cyclic codes involves the factorization of the polynomial $X^n - 1 \in
(GF(2))[X]$ into irreducible factors. The Hamming code of length 15 corresponds
to the factor $M(X) = X^4 + X + 1$ of $X^{15} + 1$ (see, e.g., [390, p. 65]).
Exercise 1.6.11 Divide $X^{15} + 1$ by $X^4 + X + 1$ in $(GF(2))[X]$.
More generally, we have the following factorization of $X^{15} + 1$ into irreducible
polynomials of $(GF(2))[X]$:

  X^{15} + 1 = (X+1)(X^2+X+1)(X^4+X+1)(X^4+X^3+1)(X^4+X^3+X^2+X+1).    (1.6.9)
See, e.g., [345, p. 58] and the discussion after Exercise 4.3.5.
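Exercise 1.6.11 and the factorization (1.6.9) are easy to verify by machine. The following minimal sketch (ours, not from the book) encodes a polynomial of $(GF(2))[X]$ as a Python integer whose bit $i$ is the coefficient of $X^i$, divides $X^{15}+1$ by $X^4+X+1$, and multiplies out the right-hand side of (1.6.9).

```python
def gf2_mul(a, b):
    """Carry-less product of two GF(2)[X] polynomials encoded as integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def gf2_divmod(a, b):
    """Division with remainder in GF(2)[X] (integers encode polynomials)."""
    q = 0
    db = b.bit_length()
    while a.bit_length() >= db:
        shift = a.bit_length() - db
        q ^= 1 << shift
        a ^= b << shift
    return q, a

p15 = (1 << 15) | 1                  # X^15 + 1
m4 = (1 << 4) | (1 << 1) | 1         # X^4 + X + 1

q, r = gf2_divmod(p15, m4)
print(f"quotient bits: {q:b}, remainder: {r:b}")   # remainder should be 0

# factorization (1.6.9): (X+1)(X^2+X+1)(X^4+X+1)(X^4+X^3+1)(X^4+X^3+X^2+X+1)
factors = [0b11, 0b111, 0b10011, 0b11001, 0b11111]
prod = 1
for f in factors:
    prod = gf2_mul(prod, f)
assert prod == p15 and r == 0
print("factorization (1.6.9) checked")
```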
Remark 1.6.12 The factorization of .X15 + 1 in .R[X] is completely different; see
Exercise 2.1.27. Of course, the comparison is somewhat fallacious since in each
of these instances, the .+1 in .X15 + 1 designates two different objects, belonging,
respectively, to .GF (2) and .R.
Remark 1.6.13 Another example of factorization in terms of irreducible factors,
this time in $(GF(3))[X]$, would be

  X^8 - 1 = (X-1)(X^2-2X+2)(X^2+1)(X-2)(X^2-X+2).
See (4.9.9) and the discussion above that equation.


Codes have to be adapted to the noise (for instance, to correct bursts of errors).
Developing codes involves various mathematical tools, such as linear system theory
(convolutional codes), algebraic geometry (Goppa codes), and others. We also
mention structured matrices and fast algorithms to invert them. We state, for
instance, the following question.
Question 1.6.14 Let $\mathbb{F}$ be a finite field of characteristic 2, and let $X_1, \ldots, X_r \in \mathbb{F}$,
assumed all different from 0 and pairwise distinct. We do not know the $X_j$, but the
sums

  S_1 = X_1 + \cdots + X_r
  S_2 = X_1^2 + \cdots + X_r^2
  \vdots                                          (1.6.10)
  S_{2r} = X_1^{2r} + \cdots + X_r^{2r}

are given. How does one recover $X_1, \ldots, X_r$ from $S_1, \ldots, S_{2r}$?


The solution involves inverting a Hankel matrix (i.e., a matrix constant along its
anti-diagonals; see Remark 3.2.24).
Exercise 1.6.15 Let $s_1, \ldots, s_r$ be the elementary symmetric polynomial functions of the numbers $X_1, \ldots, X_r$. Show that, with $S_1, S_2, \ldots$ as in (1.6.10),

  \begin{pmatrix} S_1 & S_2 & \cdots & S_r \\ S_2 & S_3 & \cdots & S_{r+1} \\ \vdots & & & \vdots \\ S_r & S_{r+1} & \cdots & S_{2r-1} \end{pmatrix} \begin{pmatrix} s_r \\ s_{r-1} \\ \vdots \\ s_1 \end{pmatrix} = \begin{pmatrix} S_{r+1} \\ S_{r+2} \\ \vdots \\ S_{2r} \end{pmatrix}.    (1.6.11)

Hint: Let $s(x) = (1 + xX_1) \cdots (1 + xX_r) = 1 + s_1 x + s_2 x^2 + \cdots + s_r x^r$. Compute
$X_i^{j+r} s(X_i^{-1})$ and sum on $i$. See [78]. See Sect. 3.3 for another approach.
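The identity (1.6.11) can be tested numerically in a small field of characteristic 2. The sketch below is ours, not taken from [78]: the field $GF(8) = GF(2)[x]/(x^3 + x + 1)$, the encoding of its elements as 3-bit integers, and the choice of $r = 3$ elements are our own choices. It checks every row of the Hankel system.

```python
def gf8_mul(a, b):
    """Multiplication in GF(8) = GF(2)[x]/(x^3 + x + 1); elements are 3-bit integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    while r.bit_length() > 3:                      # reduce modulo x^3 + x + 1
        r ^= 0b1011 << (r.bit_length() - 4)
    return r

def gf8_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf8_mul(r, a)
    return r

X = [1, 2, 3]                                      # three distinct nonzero field elements
r = len(X)

# power sums S_1, ..., S_{2r} (addition in characteristic 2 is XOR)
S = {j: 0 for j in range(1, 2 * r + 1)}
for j in S:
    for x in X:
        S[j] ^= gf8_pow(x, j)

# elementary symmetric functions s_1, s_2, s_3 of X_1, X_2, X_3
s1 = X[0] ^ X[1] ^ X[2]
s2 = gf8_mul(X[0], X[1]) ^ gf8_mul(X[0], X[2]) ^ gf8_mul(X[1], X[2])
s3 = gf8_mul(gf8_mul(X[0], X[1]), X[2])
s = [s3, s2, s1]                                   # ordered as in the column vector of (1.6.11)

# row i of (1.6.11): S_i s_r + S_{i+1} s_{r-1} + ... + S_{i+r-1} s_1 = S_{i+r}
for i in range(1, r + 1):
    lhs = 0
    for c in range(r):
        lhs ^= gf8_mul(S[i + c], s[c])
    assert lhs == S[i + r], f"row {i} fails"
print("Hankel system (1.6.11) verified in GF(8)")
```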

1.7 Maximum Entropy and a First Connection with Statistical Physics

To shed another light on the entropy H defined by (1.2.9), consider the following
problem:
Problem 1.7.1 Given $K \in \mathbb{N}$ and $K + 1$ real numbers $r_1, \ldots, r_K$ and $r$, find the
“best” probability distribution such that

  \sum_{k=1}^{K} p_k r_k = r    (1.7.1)

(of course, by convexity, we need $r \in [\min r_k, \max r_k]$).


The problem is not well posed as stated (except in the case where $K = 2$ and $r_1 \neq r_2$,
for which there is a unique solution and the term “best” need not be specified
or explained). When $K > 2$, the term “best” can be given various interpretations,
by maximizing some pre-assigned (symmetric) function of $K$ variables
$f(p_1, \ldots, p_K)$. When one takes $f$ to be the entropy, one is led (see Exercise 8.6.4)
to the Boltzmann–Gibbs distribution

  p_k(\beta) = \frac{e^{-\beta r_k}}{Z(\beta)}, \qquad k = 1, \ldots, K,    (1.7.2)

where the quantity $Z(\beta)$, called the partition function, is defined by

  Z(\beta) = \sum_{k=1}^{K} e^{-\beta r_k},    (1.7.3)

and $\beta$ is the solution of the transcendental equation

  \sum_{k=1}^{K} r_k e^{-\beta r_k} = Z(\beta)\, r.    (1.7.4)

Remark 1.7.2 In the theory of artificial neural networks, the distribution (1.7.2)
appears under the name of softmax distribution (see [7, p. 14]).
The letter Z used in the notation stems from the German word Zustandssumme,
where Zustand is German for state; see [381, p. 13]. The partition function allows
one to compute most, if not all, of the other relevant quantities in problems where
it appears. For instance (see Exercise 8.6.6), the entropy corresponding to the
Boltzmann–Gibbs distribution is given by (for $\beta \neq 0$)

  H = -\beta^2 \frac{d}{d\beta}\left(\frac{\ln Z(\beta)}{\beta}\right).    (1.7.5)

Exercise 1.7.3 Suppose $\min r_k < \max r_k$, and let $r \in (\min r_k, \max r_k)$. Show that
Eq. (1.7.4) has a unique solution.
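Exercise 1.7.3 guarantees that (1.7.4) has a unique solution; since (as its solution shows) the mean $\sum_k p_k(\beta) r_k$ is strictly decreasing in $\beta$, that solution can be computed by bisection. The following sketch is ours (the function names and numerical values are not from the book); applied to $r_k = k$, $k = 1, \ldots, 6$, and $r = 4.5$, it reproduces the value $x = e^{-\beta} \approx 1.449$ of the Brandeis dice problem discussed below.

```python
import numpy as np

def boltzmann_gibbs(beta, r_vals):
    """The distribution (1.7.2): p_k(beta) proportional to exp(-beta * r_k)."""
    logits = -beta * r_vals
    logits = logits - logits.max()          # subtract the max for numerical stability
    w = np.exp(logits)
    return w / w.sum()

def solve_beta(r_vals, r, lo=-50.0, hi=50.0, tol=1e-12):
    """Bisection for (1.7.4): the mean of p(beta) is strictly decreasing in beta."""
    mean = lambda beta: float(boltzmann_gibbs(beta, r_vals) @ r_vals)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean(mid) > r:
            lo = mid                        # mean too large: increase beta
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

r_vals = np.arange(1, 7, dtype=float)       # faces of a die
beta = solve_beta(r_vals, 4.5)
p = boltzmann_gibbs(beta, r_vals)
print(f"beta = {beta:.6f}, x = exp(-beta) = {np.exp(-beta):.4f}")
print("p =", np.round(p, 4), " mean =", round(float(p @ r_vals), 4))
```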
We now consider the limiting case .β = 0. Then, .pk (0) = 1/K, .k = 1, . . . , K, is
the uniform probability distribution.
Exercise 1.7.4 Assume as above $\min r_k < \max r_k$. What happens in the preceding
analysis when $r = \frac{1}{K}\sum_{k=1}^{K} r_k$, i.e., when a solution to (1.7.1) corresponds to the uniform
probability distribution $p_k = 1/K$, $k = 1, \ldots, K$?
In the paper [237] the so-called Brandeis dice problem is defined as follows:
Problem 1.7.5 (see [237, p. 31]) A die is tossed N times, and the average number
of spots is equal to 4.5, and not 3.5, as would be the case for a fair die. What can
we say about the probability distribution of the spots of the die?
In the abovementioned paper, Jaynes uses the maximum entropy analysis and the
Boltzmann distribution to determine the distribution of the spots. Note that this
approach has been criticized, in particular in the paper [424].
Exercise 1.7.6 In the paper [237, p. 34], Eq. (1.7.4) (for the case under consideration
there) is written as

  3x^7 - 5x^6 + 9x - 7 = 0, \qquad \text{with } x = e^{-\beta}.    (1.7.6)

Is this possible?
Remark 1.7.7 The next example corresponds to Planck's oscillator; see, e.g., [381,
p. 20]. The result is also mentioned (with $b = 0$) in [195, p. 46]. The distribution
corresponds then to the Boltzmann–Gibbs distribution with countably many energy levels
$k\beta$, $k = 1, 2, \ldots$, and is shown in [195] to correspond to the maximum entropy
of a random variable with that countable number of values, under a fixed energy
constraint

  \sum_{k=1}^{\infty} k p_k = E.    (1.7.7)

The result presented in Exercise 5.2.8 is used in the arguments.


Exercise 1.7.8 In the notation of the previous remark, we assume that (1.7.7)
holds.
(1) Compute the partition function .Z(β) when .rk = kβ + b, where .β > 0 and b is
real.
(2) Compute the corresponding entropy.
(3) Assume .β = 2b. Find an equivalent of .(Z(β))N as N goes to infinity.
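As a quick numerical sanity check for part (1) of this exercise (our own sketch, with arbitrary test values of $\beta$ and $b$), one can compare a truncation of the series defining $Z(\beta)$ with the geometric-series closed form obtained in the solution in Sect. 1.13:

```python
import math

def Z_series(beta, b, n_terms=200):
    """Truncation of Z(beta) = sum over k >= 0 of exp(-(k*beta + b))."""
    return sum(math.exp(-(k * beta + b)) for k in range(n_terms))

def Z_closed(beta, b):
    """Geometric series: exp(-b) / (1 - exp(-beta)), valid for beta > 0."""
    return math.exp(-b) / (1.0 - math.exp(-beta))

beta, b = 0.7, 0.35                              # arbitrary test values, with beta = 2b
print(Z_series(beta, b), Z_closed(beta, b))
print("1/(2 sinh(beta/2)) =", 1.0 / (2.0 * math.sinh(beta / 2)))  # same value when beta = 2b
```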
It is important to compute the entropy associated with the Boltzmann distribution.
We defer it to Exercise 8.6.6.
Remark 1.7.9 It is a fascinating fact that one of the main, if not the main,
distributions in statistical physics appears in such a way. Such distributions (in
the continuous case) play an important role in Gibbs’ work (see [175, p. 32]) and
are called there canonical. In the classical works, it appeared as a consequence of
Stirling formula, counting the number of possibilities of arrangements of particles.
It also appears, in a maybe more fundamental way, as a consequence of fluid statics
applied to the ideal gas. See [292, pp. 1–3] and Exercise 9.2.3.
The function Z, or more precisely a variation of it, appears in the theory of large
deviations in probability. In fact, this theory is one of the bridges between statistical
physics and probability. Indeed, consider an expression of the form

  Z(\beta) = \sum_{k=1}^{K} p_k e^{-\beta r_k},

where $p_1, p_2, \ldots, p_K$ form a finite probability distribution. One can write

  Z(\beta) = E(e^{-\beta X}),

where $X$ is a finite random variable, taking the values $r_1, \ldots, r_K$ with respective
probabilities $p_1, \ldots, p_K$, and where $E$ denotes mathematical expectation in the
underlying probability space. Then, if $X_1, X_2, \ldots$ is a sequence of independent
random variables in the same probability space, with the same distribution as $X$,
and if $E_0 < E(X)$, one can write (see [309, p. 42])

  P\left(\sum_{k=1}^{K} X_k < K E_0\right) \le e^{K \min_{\beta \ge 0}(E_0 \beta + \ln Z(\beta))}.    (1.7.8)
Remark 1.7.10 The weak law of large numbers (see Theorem 7.2.12) asserts that
the sequence of averages $\frac{1}{K}\sum_{k=1}^{K} X_k$, $K = 1, 2, \ldots$, converges in probability to
$E(X)$. Equation (1.7.8) gives a measure of how far this average can differ (can
largely deviate) from $E(X)$ in terms of the (hopefully strictly negative) coefficient
$\min_{\beta \ge 0}(E_0 \beta + \ln Z(\beta))$.
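A small Monte Carlo experiment (ours; the distribution of $X$, the values of $K$ and $E_0$, and the sample size are arbitrary choices) compares the empirical probability on the left of (1.7.8) with the exponential bound on the right:

```python
import numpy as np

rng = np.random.default_rng(1)
r = np.array([0.0, 1.0, 2.0])            # values of X
p = np.array([0.2, 0.5, 0.3])            # their probabilities; E(X) = 1.1
K, E0 = 40, 0.9                          # threshold strictly below the mean
n_trials = 100_000

# empirical estimate of P(sum of the X_k < K * E0)
samples = rng.choice(r, size=(n_trials, K), p=p)
emp = np.mean(samples.sum(axis=1) < K * E0)

# right-hand side of (1.7.8): exp(K * min over beta >= 0 of (E0*beta + ln Z(beta)))
betas = np.linspace(0.0, 10.0, 10_001)
lnZ = np.log(np.exp(-np.outer(betas, r)) @ p)
bound = np.exp(K * np.min(E0 * betas + lnZ))

print(f"empirical probability {emp:.3e} <= Chernoff-type bound {bound:.3e}")
```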

Remark 1.7.11 It is interesting to note that the Gibbs distribution is a special case
of a large family of probability distributions, characterized by the Hammersley-
Clifford theorem. The result pertains to the theory of Markov random fields on
graphs and uses the notion of clique in a graph. We send the reader to the references
[74, 189, 252] for more information.

1.8 The Case of Continuous Signals

In the case of a signal $x(t)$ indexed by real time, perturbed by a noise $w(t)$, quite a
number of approaches are possible, of which we mention three: one signal oriented,
the second one more noise oriented, and the third one using a special Hilbert space
of entire functions. In the first two cases, one replaces the signal $x$ by the series of its
Fourier coefficients in an appropriately chosen orthonormal basis, in $L_2(\mathbb{R}, dt)$
and $L_2([-T, T], dt)$, respectively (with $dt$ being the Lebesgue measure). In the
signal-oriented approach, one assumes that the signal is band limited, i.e., has a
representation of the form

  x(t) = \frac{1}{2\pi} \int_{-w_0}^{w_0} e^{itw} \sigma(w)\, dw,

where the integration variable is denoted by $w$ to emphasize that it denotes angular
frequency (linked to the frequency $f$ by $w = 2\pi f$) and applies Shannon's sampling
theorem (see Theorem 6.9.13) to write

  x(t) = \sum_{n \in \mathbb{Z}} x\left(\frac{\pi n}{w_0}\right) \frac{\sin(w_0 t - \pi n)}{w_0 t - \pi n}.    (1.8.1)

The functions $t \mapsto \sqrt{\frac{w_0}{\pi}}\, \frac{\sin(w_0 t - \pi n)}{w_0 t - \pi n}$ with $n \in \mathbb{Z}$ form an orthonormal family of
$L_2(\mathbb{R}, dt)$, and in particular

  \int_{\mathbb{R}} |x(t)|^2\, dt = \frac{\pi}{w_0} \sum_{n \in \mathbb{Z}} \left| x\left(\frac{\pi n}{w_0}\right) \right|^2

for x of the form (1.8.1). They are orthogonal, but do not form an orthonormal
basis (see Exercise 6.5.13 for the latter). Orthogonality can be seen either using
Plancherel’s equality (see Exercise 6.5.4) or using a contour integral (see Exer-
cise 5.9.16). It is important to note the following:
Remark 1.8.1 The function

  t \mapsto \frac{\sin \pi t}{\pi t} \quad (\text{with value at 0 equal to 1})

is a wavelet, that is, it defines a multiresolution analysis of the Lebesgue space
$L_2(\mathbb{R}, dx)$ (see Definition 6.3.2).
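The sampling formula (1.8.1) is easy to illustrate numerically. In the sketch below (our own; the band limit $w_0$, the test signal, and the truncation level are arbitrary choices), a band-limited signal is reconstructed from its samples $x(\pi n / w_0)$ by a truncated version of the series (1.8.1).

```python
import numpy as np

w0 = 2.0 * np.pi                            # band limit (angular frequency)

def x(t):
    # a simple band-limited test signal: angular frequencies 0.3*w0 and 0.7*w0 lie inside (-w0, w0)
    return np.cos(0.3 * w0 * t) + 0.5 * np.sin(0.7 * w0 * t)

def reconstruct(t, n_max=400):
    """Truncation of (1.8.1): sum over |n| <= n_max of x(pi n / w0) * sin(w0 t - pi n)/(w0 t - pi n)."""
    n = np.arange(-n_max, n_max + 1)
    samples = x(np.pi * n / w0)
    # np.sinc(z) = sin(pi z)/(pi z), so sin(w0 t - pi n)/(w0 t - pi n) = np.sinc(w0 t / pi - n)
    kernel = np.sinc(w0 * np.asarray(t)[..., None] / np.pi - n)
    return kernel @ samples

t_test = np.linspace(-1.0, 1.0, 7)
# truncation error; it decreases (slowly) as n_max grows
print(np.max(np.abs(reconstruct(t_test) - x(t_test))))
```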

A number of related questions arise. Given a sequence $(a_n)_{n \in \mathbb{Z}}$ of complex numbers,
say subject to

  \sum_{n \in \mathbb{Z}} |a_n|^2 < \infty \quad \text{or} \quad \sum_{n \in \mathbb{Z}} |a_n| < \infty,

what does the function

  x(t) = \sum_{n \in \mathbb{Z}} a_n \frac{\sin(w_0 t - \pi n)}{w_0 t - \pi n}

represent? If we replace the above orthonormal family by the normalized Hermite functions,
can we say something? A new ingredient then appears, namely invariance under the
Fourier transform, since the Hermite functions are the eigenfunctions of the latter.
One then has interesting links with various classes of Gelfand–Shilov spaces (in
particular of Roumieu and Beurling type; see [290]) and with strong algebras (see [22]).
In the noise-oriented approach, one restricts the analysis to a finite interval, say
$[-T, T]$, and builds an orthonormal basis for the space $L_2([-T, T], dt)$ using the
integral operator

  (K_T y)(t) = \int_{-T}^{T} K(t, u)\, y(u)\, du

associated with the covariance function of the noise, $K(t, s) = E(w(t)w(s))$ (where,
as usual, $E$ denotes mathematical expectation), restricted to $[-T, T] \times [-T, T]$.
When the covariance function is jointly continuous in $t, s$, the operator $K_T$ is of
Hilbert–Schmidt class, and one can derive a representation, both of the noise and of
the signal, along a basis of eigenfunctions of $K_T$.
Finally, we mention another important approach, using the Bargmann transform
(see Remark 6.11.3). It is therefore clear from the previous discussion that the
constructions in the continuous case involve more elaborate tools, from functional
analysis and stochastic processes.
1.9 Some Questions Related to Statistical Physics

At the microscopic level, laws of physics are reversible with respect to time, but
not necessarily at the macroscopic level; a main question in statistical physics is to
understand this phenomenon. Although numerous persons have contributed to this
question, it is well to recall the names of Maxwell (1860), Boltzmann (1860), and
Gibbs (1902), where the years refer to the approximate dates of their relevant
contributions. It is important to remark that the development of
statistical physics began at a time when the atomic hypothesis was still in doubt,
and not widely accepted. In this context, it is interesting to note the following
milestones:
• Already in 1738, Daniel Bernoulli assumed that gases were made of particles
(molecules).
• In 1828, Brown [93] showed that the movement of the microscopic particles
observable in a microscope has no organic origin.
• In 1889, Gouy [187] firmly established that the movement of the above particles
results only from the internal thermal (molecular) unrest of the given liquid. This
possibility was stated first in 1877 by Delsaux.
• In 1905, Einstein explained (see [139, 141]) that the movements of microscopic
particles in a fluid are caused by the collisions of the particles with the molecules
of the underlying liquid. In the above quoted paper, Einstein does not claim that
this movement is the same as the Brownian motion; the fact that it is indeed the
same is explained in later papers (see the collection [141]). Einstein gives the
formula for the average of the square of the displacement of the particles, found
proportional to the time. In his inaugural dissertation [138] and in his paper [140]
(translated in [141]), he gives estimates of the size of molecules of sugar and
shows in [141, pp. 61–62] that his formulas lead to the number $P = 6.2 \cdot 10^{-10}$ m
for the size of the molecule of sugar (revised to $4.9 \cdot 10^{-10}$ m in a supplement to
the article) and to the estimate $N = 6.56 \cdot 10^{23}$ for the Avogadro number.
• Langevin [278] reproves Einstein’s formula using the fundamental theorem of
mechanics and probabilistic methods. He writes what is (to quote Jean-Pierre
Kahane) the first stochastic differential equation.
These are only a few milestones, in a fascinating chapter of history of sciences.
Reading the papers mentioned above will give a more complete picture. The papers
of Gouy and Langevin can be found on the site gallica.bnf.fr of the Bibliothèque
Nationale de France.
Historical Remark 1.9.1 It is often written that Brown discovered the
phenomenon that bears his name. But this is not true. The movement of microscopic
particles had been observed much earlier. The paper [421] quotes and gives the
translation of a French paper by J. Inghen-Housz, published in 1784, where the
(not yet so named) Brownian motion is briefly described. We also mention Bywater,
who described the Brownian motion in 1819. (See [2] for more historical remarks
and references.) As mentioned above, Brown showed that the movement of these
particles has no organic origin. As he writes in page 9 of the version of his paper
(available online at https://bibdigital.rjb.csic.es):

But hence I inferred that these molecules were not limited to organic bodies, nor
even to their products.
As remarked in [95], the term molecule used by Brown really means the
microscopic particle he is observing. We refer to [95] for a historical survey. We also
refer to Jean-Pierre Kahane’s lecture on Vimeo at https://vimeo.com/97833419.
Given a macro-state isolated and at equilibrium, the fundamental hypothesis of
statistical physics states that all micro-states compatible with this macro-state have
the same probability. Let us give now a classical example, taken from probability,
to clarify the notion of macro-state and micro-state. Consider a sequence of 2N
independent coin flips, with heads and tails, the coin being moreover fair. A macro-
state may be thought as the number of tails, say T , appearing in the sequence. The
number of micro-states compatible with this macro-state is equal to
 
2N
.
T

and is called the statistical weight of the macro-state. Its logarithm (in basis 2 in
general, and multiplied by an appropriated constant, which we will ignore at this
stage) is called the entropy of the macro-state and is maximal for .T = N. For large
N, Stirling’s formula (see (5.4.8), (5.4.9)) leads to
 
2N (2N)!
. log2 = log2 ∼ 2N. (1.9.1)
N (N!)2

Exercise 1.9.2 Give a precise approximation rather than the informal equa-
tion (1.9.1).
Exercise 1.9.3 (see [429, p. 13]) Prove that

  P(T = N) \sim \frac{1}{\sqrt{\pi N}},

as $N \to \infty$.
It follows from Eq. (1.13.7) in the Solution of Exercise 1.9.2 that

  \lim_{N \to \infty} \frac{\log_2 \binom{2N}{N}}{\log_2 2^{2N}} = 1,

i.e., from the entropy point of view (i.e., taking logarithms), the number of accessible
states is asymptotically equal to the total number of states. But the quotient

  \frac{\binom{2N}{N}}{2^{2N}}

goes to 0 as $N$ goes to infinity.
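Both statements are easy to observe numerically. The following sketch (ours) prints, for growing $N$, the ratio $\log_2\binom{2N}{N}/(2N)$ (which tends to 1, illustrating (1.9.1)), the probability $P(T = N)$ multiplied by $\sqrt{\pi N}$ (which tends to 1, illustrating Exercise 1.9.3), and the quotient $\binom{2N}{N}/2^{2N}$ (which tends to 0).

```python
from math import comb, log2, pi, sqrt

for N in (10, 100, 1000, 10000):
    c = comb(2 * N, N)
    ratio_entropy = log2(c) / (2 * N)           # tends to 1, see (1.9.1)
    p_central = c / 2 ** (2 * N)                # = P(T = N), tends to 0
    scaled = p_central * sqrt(pi * N)           # tends to 1, see Exercise 1.9.3
    print(N, round(ratio_entropy, 6), f"{p_central:.3e}", round(scaled, 6))
```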
Statistical physics and information theory have numerous points of contact. For
the latter, even in the absence of noise, probability theory is more convenient to
give models for information sources and channels (for instance, via the transition
probability for the discrete memoryless channel). For the former, even in the case of the
study of a “too large” number of objects governed a priori by the laws of mechanics,
it is hopeless (as remarked in the introduction of every book on statistical physics)
to try to solve the underlying differential equations.
Remark 1.9.4 An interesting analogy (which, as all analogies, has its limitations)
is given in the book [318, p. 239]. The book was written and published prior to the
development of digital pictures and speaks of pictures taken on a photographic film,
but the idea is the same (with silver halide crystals replaced by pixels). Consider
thus a picture made of a large number of pixels. The micro-state is the full data of
all the information of each pixel. Because of the finite resolution of human eye, one
could change a number of pixels, and our eye will still see the same picture, the
macro-state. The analogy has also a point of contact with wavelets. Any book on
wavelets will illustrate the process of degradation of a picture when averaging the
pixels. A final, and unrelated, point: after having been almost wiped out by digital
methods, photographic film is coming back to life.

1.10 Some Questions and Tools in Quantum Mechanics

Everyday experience shows the irreversibility of most, if not all, phenomena, in


apparent contradiction with the laws of physics, which are reversible with respect
to time. A paper of Émile Borel [82] makes it clear how the slightest error in
measurement at the microscopic level leads to errors at the macroscopic level
very quickly. Statistical physics aims at understanding this question, by getting
global (i.e., large scale) information from micro-scale information: for instance, to
determine, and understand, the global properties of a gas from the local properties
of its molecules. It is surprising that this is possible at all.
An important notion in statistical physics which percolated to mathematics, and
in particular to information theory, is that of ergodicity.
There are more similarities between the system made of the molecules of (say
an ideal) gas in a given closed box and the output of an information source in
information theory than meets the eye. More precisely, in both cases:
1. Ergodicity: time average is equal to space average.
2. An underlying finiteness assumption: finite number of energy levels in the first
case and a finite alphabet in the second case.
Following [57, p. 212], it is interesting to note that observables (which are


densities in the classical setting) should be seen as continuous linear forms, i.e.,
as distributions on an appropriate space of test functions.
Motivated, or more precisely driven, by quantum mechanics, the question arose
of considering the non-commutative versions of different notions from classical
analysis, and in particular of Lebesgue spaces. An important aspect of this problem
is the definition of a non-commutative probability space. In quantum probability and
quantum information theory, a finite probability space is replaced by a finite-dimensional Hilbert
space, typically $\mathbb{C}^N$ with the inner product

  \langle z, w \rangle = \sum_{u=1}^{N} \overline{w_u}\, z_u.    (1.10.1)

The events are replaced by orthogonal projections, and random variables are
replaced by self-adjoint linear maps (now called observables ). Note that a priori,
a probability space has no linear structure (Hida’s white noise space and spaces
obtained from the Bochner-Minlos theorem are of course counterexamples to this
claim) while we now introduce a linear structure on a non-commutative probability
space.
The counterpart of a probability distribution is now a state. We now give the
definition in the finite-dimensional case.
Definition 1.10.1 A state is a positive semi-definite matrix with trace 1 (the trace
of a matrix is the sum of its diagonal elements; see Definition 3.1.58).
The state is pure if it cannot be decomposed as a nontrivial convex combination of
other states. Otherwise it is called a mixed state. A unit vector $a$ defines the state
$aa^*$ since

  \operatorname{Tr} aa^* = a^* a = 1.
We have:
Exercise 1.10.2 Show that the state .aa ∗ where .a ∈ Cn with .a ∗ a = 1 is pure, and
that any pure state is of this form.
Hint: The orthogonal projection .In − aa ∗ plays an important role in the proof.
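In the finite-dimensional setting these notions are easy to experiment with. The sketch below (ours; the dimensions and test vectors are arbitrary) builds the pure state $aa^*$ from a unit vector, checks that it has trace 1 and satisfies $\rho^2 = \rho$ (equivalently, that it is of rank one), contrasts it with a mixed state, and also checks that the tensor (Kronecker) product of two states is again a state.

```python
import numpy as np

def is_state(rho, tol=1e-12):
    """Positive semi-definite matrix with trace 1."""
    herm = np.allclose(rho, rho.conj().T, atol=tol)
    eigs = np.linalg.eigvalsh((rho + rho.conj().T) / 2)
    return herm and eigs.min() > -tol and abs(np.trace(rho) - 1) < tol

rng = np.random.default_rng(0)

# a pure state a a^* built from a unit vector a in C^3
a = rng.normal(size=3) + 1j * rng.normal(size=3)
a = (a / np.linalg.norm(a)).reshape(-1, 1)
pure = a @ a.conj().T
print(is_state(pure), np.allclose(pure @ pure, pure))      # True True (rank one, idempotent)

# a mixed state: a nontrivial convex combination of two pure states
b = np.zeros((3, 1), dtype=complex); b[0, 0] = 1.0
mixed = 0.5 * pure + 0.5 * (b @ b.conj().T)
print(is_state(mixed), np.allclose(mixed @ mixed, mixed))   # True False

# the Kronecker (tensor) product of two states is again a state
print(is_state(np.kron(pure, mixed)))                       # True
```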
Note that a finite probability distribution $(p_1, \ldots, p_N)$ can be seen as the
eigenvalues of a positive semi-definite matrix with trace 1. Furthermore, the product
of probability spaces is replaced by the tensor product of Hilbert spaces, and the
product probability is replaced by the tensor product of the states. We note that if $A$
and $B$ are positive matrices of the same dimension, so are the matrices

  aA + bB, with $a, b$ positive real numbers,
  A \circ B, the Schur, or Hadamard, product,
  A^{1/2} B A^{1/2}, which appears in the theory of quantum effects.
These operators have their own importance, keep positivity, but will not lead to a
state in general, and are not defined if A and B have different sizes. On the other
hand, the tensor product of two states will still be a state and is defined even when
the underlying spaces have different dimensions. To give more motivation to the
appearance of the tensor product, it is best to go to the continuous case and consider
two particles, with wave functions $f_1(r_1, t)$ and $f_2(r_2, t)$, respectively. Then the
system consisting of the two particles is characterized by the function

  f(r_1, r_2, t) = f_1(r_1, t) f_2(r_2, t)

(see, e.g., [110, p. 162]), which belongs to the tensor product of the spaces where these
functions live (see [326, Proposition 6.2, p. 111]). If, for a given (and in fact every) $t$, the
function $r_1 \mapsto f_1(r_1, t)$ and the function $r_2 \mapsto f_2(r_2, t)$ belong to Hilbert spaces
$\mathcal{H}_1$ and $\mathcal{H}_2$, respectively, then the function $(r_1, r_2) \mapsto f_1(r_1, t) f_2(r_2, t)$ belongs
to the (non-symmetric) tensor product $\mathcal{H}_1 \otimes \mathcal{H}_2$. More particles will lead to the
non-symmetric Fock space

F(H) = CΩ ⊕ H ⊕ (H ⊗ H) ⊕ · · ·
. (1.10.2)

where .⊗ denotes the tensor product. In the case of fermions (resp. bosons),
one considers the corresponding symmetric or anti-symmetric Fock space. The
symmetric Fock space is identified with Bargmann’s Fock space (see [61, 62] for
the latter) in [396].
Note that the space of (linear bounded) operators from $\mathcal{F}(\mathcal{H})$ into itself is an
important example of a non-commutative probability space, when endowed with
the inner product

  \langle A, B \rangle = \operatorname{Tr}(B^* \rho A),

where $\rho$ is a positive nuclear (i.e., trace class) operator with trace equal to 1, i.e., a
state.
Finally, we note that there are a number of key differences between quantum
information theory and classical information theory, not all on quantum side.

1.11 Some Questions and Tools in Quantum Information Theory

A classical channel is characterized by probability transitions; in the case of the
finite alphabet (1.2.7), and when the input of the channel is denoted by $X$ and the
output by $Y$, we are given the conditional probabilities

  P_{kj} = P(Y = a_j \mid X = a_k), \qquad k, j = 1, \ldots, K.

We note that

  \sum_{j=1}^{K} P_{kj} = 1, \qquad k = 1, 2, \ldots, K,

and hence $P$ is a (right) stochastic matrix. This fact is used in the following exercise:
Exercise 1.11.1 Show that a transition matrix $P = (P_{kj})_{k,j=1}^{K}$ sends probability
distributions linearly into probability distributions, meaning that if

  p = \begin{pmatrix} p_1 & p_2 & \cdots & p_K \end{pmatrix} \in \mathbb{C}^{1 \times K}

represents a probability distribution, then $q = pP$ also represents a probability
distribution.
Of course the multiplication of a probability distribution by an arbitrary real number
will not be a probability distribution, but the linearity allows one to consider convex
combinations of probability distributions; the image is then still a probability
distribution.
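A two-line check of Exercise 1.11.1 (our own, with an arbitrary $3 \times 3$ right stochastic matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.random((3, 3))
P /= P.sum(axis=1, keepdims=True)       # make each row sum to 1 (right stochastic)

p = np.array([0.2, 0.5, 0.3])           # an input probability distribution
q = p @ P                               # the output distribution
print(q, q.sum())                       # entries are nonnegative and sum to 1
```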
In the quantum setting, it would seem natural to define a quantum channel as a
linear map from $\mathbb{C}^{n \times n}$ into $\mathbb{C}^{m \times m}$ sending states into states. But one requires more:
the map, say $\varphi$, should in fact be completely positive, meaning that for every $N \in \mathbb{N}$,
if the block matrix $(A_{kj})_{k,j=1}^{N}$ is positive, so is the block matrix $(\varphi(A_{kj}))_{k,j=1}^{N}$.
These are exactly the maps of the form

  \varphi(A) = \sum_{u=1}^{U} M_u^* A M_u    (1.11.1)

where the matrices $M_u \in \mathbb{C}^{n \times m}$ are such that

  \sum_{u=1}^{U} M_u M_u^* = I_n.

This latter condition ensures that if $\rho$ is a state, so is

  \rho' = \sum_{u=1}^{U} M_u^* \rho M_u.

The matrices .M1 , . . . , MU are called the interaction operators (or Kraus operators)
and give a model for the quantum noise. In the finite-dimensional case, maps of the
form (1.11.1) were characterized by Choi (see [101] and Sect. 3.7). It is important
to remark that the tensor product of two positive semi-definite matrices is a positive
semi-definite matrix (see Exercise 3.6.4) and that maps of the form (1.11.1) are
stable under tensor products (see Exercise 3.7.7). We see therefore deep connections
between linear algebra and (finite) quantum channels and, more generally, between
functional analysis and quantum channels, one of the main links being the notion of
tensor product. The latter is quite involved in the infinite-dimensional case.
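The following sketch (ours; the dimensions and the way the interaction operators are generated are our own choices) builds a map of the form (1.11.1) from randomly generated operators $M_u$ normalized so that $\sum_u M_u M_u^* = I_n$, and checks that it sends a state on $\mathbb{C}^n$ to a state on $\mathbb{C}^m$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, U = 2, 3, 2                       # input dimension, output dimension, number of Kraus operators

# Build M_1, ..., M_U in C^{n x m} with sum_u M_u M_u^* = I_n:
# stack them into one n x (U*m) matrix whose rows are orthonormal.
A = rng.normal(size=(U * m, n)) + 1j * rng.normal(size=(U * m, n))
Q, _ = np.linalg.qr(A)                  # Q has orthonormal columns: Q^* Q = I_n
V = Q.conj().T                          # n x (U*m), with V V^* = I_n
M = [V[:, u * m:(u + 1) * m] for u in range(U)]
assert np.allclose(sum(Mu @ Mu.conj().T for Mu in M), np.eye(n))

def channel(rho):
    """The completely positive map (1.11.1): rho -> sum_u M_u^* rho M_u."""
    return sum(Mu.conj().T @ rho @ Mu for Mu in M)

# apply it to a state on C^n (here a pure state)
a = rng.normal(size=(n, 1)) + 1j * rng.normal(size=(n, 1))
a /= np.linalg.norm(a)
rho = a @ a.conj().T
out = channel(rho)
eigs = np.linalg.eigvalsh(out)
print(out.shape, round(np.trace(out).real, 12), bool(eigs.min() >= -1e-12))  # (3, 3) 1.0 True
```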
Exercise 1.11.2 Give an example of a linear map on matrices, not of the
form (1.11.1), and which sends states into states.
In connection with the previous exercise, see also Exercise 3.7.9.

1.12 Machine Learning

Machine learning, a very important subfield of artificial intelligence, is a multidisci-


plinary domain, both in methods and applications. It uses tools from a host of fields
in mathematics and statistical physics. In a way similar to statistical physics, the
purpose (or, one purpose) of machine learning is to find structure in large sets and
discover emergent properties from large data sets (see [379]). While not the topic of
the present book, the student will find relevant tools used in machine learning in a
number of places. We mention in particular:
1. Linear algebra. Linear algebra is needed in particular in the study of neural
networks and principal component analysis, and familiarity with computations
with block matrices is needed. For instance, the well-known inversion formula
for $2 \times 2$ matrices

  \begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}

has far-reaching generalizations for block matrices (see Formula (3.2.7), which
we rewrite below):

  \begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \begin{pmatrix} A^{-1} + A^{-1} B (D^{\square})^{-1} C A^{-1} & -A^{-1} B (D^{\square})^{-1} \\ -(D^{\square})^{-1} C A^{-1} & (D^{\square})^{-1} \end{pmatrix},

where $A, B, C, D$ are matrices of appropriate sizes and the matrices $A$ and $D^{\square} =
D - C A^{-1} B$ are assumed invertible. (A numerical check of this block-inversion
formula is sketched after this list.)
Also from linear algebra, the notion of hyperplane permeates most, if not all,
classification methods, beginning with the perceptron convergence algorithm
(see Sect. 2.5).
2. Graph theory. In a number of situations (such as the study and synthesis of
macro-molecules or the study of social networks), the data is given in the form
of a graph, and it is not efficient to reduce the information to a family of feature
vectors, losing in the process the underlying geometric structure. Here comes
into play graph theory and the corresponding methods. When the graph is a
lattice, one can define the discrete counterpart of partial derivatives along the
axis directions. In general, one can only define the Laplacian. It appears (what
else?) that it can be defined (say, in the finite-dimensional case) in terms of a
positive operator, whose spectral structure plays a key role in the study of the
data (in a way similar to principal component analysis).
3. Functional analysis. We mention in particular metric spaces, which are used in
the K-nearest neighbor method. Hilbert spaces also play an important role, both
in the finite- and infinite-dimensional settings. Positive definite kernels also play
an important role in machine learning, in the theory of support vector machines
(SVM) and the associated so-called kernel trick. Recall first that SVM is a
supervised classification problem for feature vectors in $\mathbb{R}^M$ belonging to two
different families. The aim is to separate these two families by a hyperplane.
Sometimes the families are not separable in $\mathbb{R}^M$, but are separable in a larger space. The kernel
trick consists in taking a positive definite kernel, say $k(t, s)$, with $t, s \in \mathbb{R}^M$.
The map $s \mapsto k(\cdot, s)$ allows one to consider the various problems in a much larger
space (namely, the reproducing kernel Hilbert space with reproducing kernel
$k(t, s)$), where hopefully the (images of the) two families can be separated by
a hyperplane. We take this opportunity to also mention that a positive definite
kernel on a set $\Omega$ defines in a natural way a metric on $\Omega$, namely,

  d_k(t, s) = \sqrt{k(t, t) + k(s, s) - 2\operatorname{Re} k(t, s)}, \qquad t, s \in \Omega,

(see (6.7.15)), possibly after modding out points corresponding to the same kernel
function:

  s \sim t \iff k(\cdot, t) = k(\cdot, s).

When applied to $\Omega = \mathbb{R}^M$ and to support vector machines, the inclusion of the
set of feature vectors inside a larger space amounts to replacing the Euclidean
metric of $\mathbb{R}^M$ by the metric associated with the kernel. The metric also induces
a Borel measure on the set, in some sense more adapted to the set.
4. Probability theory, information theory, and statistical physics. The notion
of entropy is also used there. The Boltzmann-Gibbs distribution from statistical
physics plays an important role.
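A quick numerical verification of the block-inversion formula recalled in item 1 above (our own sketch, with random blocks shifted to keep them safely invertible):

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 3, 2
A = rng.normal(size=(p, p)) + p * np.eye(p)      # shifted so that A is (very likely) invertible
B = rng.normal(size=(p, q))
C = rng.normal(size=(q, p))
D = rng.normal(size=(q, q)) + q * np.eye(q)

Ainv = np.linalg.inv(A)
S = D - C @ Ainv @ B                             # the Schur complement D_square = D - C A^{-1} B
Sinv = np.linalg.inv(S)

block_inverse = np.block([
    [Ainv + Ainv @ B @ Sinv @ C @ Ainv, -Ainv @ B @ Sinv],
    [-Sinv @ C @ Ainv,                   Sinv],
])
big = np.block([[A, B], [C, D]])
print(np.allclose(block_inverse, np.linalg.inv(big)))    # True
```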

1.13 Solutions of the Exercises

Solution of Exercise 1.1.1 Assuming that $f$ is in $C^2(\mathbb{R})$, the minimum occurs for
$\beta$ such that

  x = f^{(1)}(\beta),

i.e., $\beta = (f^{(1)})^{\circ -1}(x)$. The result follows by plugging this value of $\beta$ into $\beta x - f(\beta)$.
Before computing the Legendre transform of $\ln Z(\beta)$, we first remark that

  (\ln Z(\beta))^{(2)} = \frac{E^2 e^{\beta E}}{(1 + e^{\beta E})^2} \ge 0,

and so $\ln Z(\beta)$ is convex. See also Remark 1.13.4 (there, $-\beta$ is used rather than $\beta$,
but this does not change the convexity property; the computations seem nicer with
the present choice). Let now

  \varphi(\beta) = \beta x - \ln Z(\beta).

Then

  \frac{d\varphi}{d\beta} = x - \frac{E e^{\beta E}}{1 + e^{\beta E}},

which has a zero at $\beta$ such that

  \frac{x}{E} = \frac{1}{1 + e^{-\beta E}},    (1.13.1)

that is,

  \beta = -\frac{1}{E} \ln\left(\frac{E}{x} - 1\right).    (1.13.2)
The Legendre transform is then equal to

  L(x) = x\beta - \ln\left(\frac{1 + e^{\beta E}}{2}\right)
       = x\beta - \ln\left(\frac{1 + e^{-\beta E}}{2 e^{-\beta E}}\right)
       = x\beta - \ln\left(\frac{E}{2x e^{-\beta E}}\right)   (where we have used (1.13.1))
       = \ln 2 + \beta(x - E) - \ln\frac{E}{x}
       = \ln 2 - \frac{x - E}{E} \ln\left(\frac{E}{x} - 1\right) - \ln\frac{E}{x},

where we have used (1.13.2) to go from the fourth line to the last line.
Setting $\epsilon = \frac{x}{E}$, this last expression can be written as

  L(x) = \ln 2 + (1 - \epsilon) \ln\left(\frac{1}{\epsilon} - 1\right) + \ln \epsilon
       = \ln 2 + \epsilon \ln \epsilon + (1 - \epsilon) \ln(1 - \epsilon)
       = \ln 2 - h(\epsilon),

where

  h(\epsilon) = -\epsilon \ln \epsilon - (1 - \epsilon) \ln(1 - \epsilon)

(for the latter, compare with (8.2.12), where $\log_2$ is used; see also Remark 1.13.1),
which will be equal to 0 for $\epsilon = 1/2$. ⨆

Solution of Exercise 1.3.3 We follow [364, pp. 117–118]. Let (e0 , e1 ) and (b0 , b1 )
be, respectively, the input and output probability vectors, and use (8.2.18):

I (X, Y ) = H (Y ) + H (X) − H (X, Y ) = H (Y ) − H (Y |X),


.

where we refer to (8.1.11) for the definition of H (Y |X). We have

H (Y ) − H (Y |X) = −b0 log2 b0 − b1 log2 b1 + p00 e0 log2 p00 +


.
+ p01 e1 log2 p01 + p10 e0 log2 p10 + p11 e1 log2 p11 .

By (1.3.3)

p00 e0 log2 p00 + p01 e1 log2 p01 + p10 e0 log2 p10 + p11 e1 log2 p11 =
= e0 (p00 log2 p00 + p10 log2 p10 )+
+e1 (p01 log2 p01 + p11 log2 p11 )
.
= e0 (p00 q0 + p10 q1 ) + e1 (p01 q0 + p11 q1 )
= q0 (p00 e0 + p01 e1 ) + q1 (p10 e0 + p11 e1 )
= b0 q0 + b1 q1 .

We thus need to compute the maximum of the function

f (b0 , b1 ) = −b0 log2 b0 − b1 log2 b1 + b0 q0 + b1 q1


.

when restricted to probability distributions. Replacing b1 = 1 − b0 , we have a


function of one variable, with derivative

df (b0 , 1 − b0 ) 1
. = (− ln b0 + ln(1 − b0 )) + q0 − q1
db0 ln 2
vanishing at $b_0$ such that

  \frac{1 - b_0}{b_0} = 2^{q_1 - q_0}.

Thus, $b_0 = \dfrac{2^{q_0}}{2^{q_0} + 2^{q_1}}$, the maximum being $C = \log_2(2^{q_0} + 2^{q_1})$, and hence

  2^C = 2^{q_0} + 2^{q_1}.



Remark 1.13.1 When the channel is symmetric, i.e., p0 = p1 = p, we have

. C = 1 − H (p),

where H (p) = −p log2 p−(1−p) log2 (1−p) (see also Theorem 1.5.6 and (8.2.12)
for the latter). When p = 1/2, the capacity is equal to C = 0. When the channel is
“trivial” (i.e., p = 0 or p = 1), the capacity is 1.
Solution of Exercise 1.5.3 Set

A = {i : xi = 1}
. and B = {i : yi = 1} . (1.13.3)

Then,

wH (x ⊙ y) = Card A ∩ B
.

and

  w_H(x) + w_H(y) = \operatorname{Card} A + \operatorname{Card} B.

To conclude, and recalling that we work in Z/2Z, we observe that xi − yi = 1 if


and only if (xi , yi ) = (1, 0) or (xi , yi ) = (0, 1). Thus,

. wH (x − y) = Card AΔB,

where Δ is the symmetric difference of A and B (see (4.1.19)). The result follows
then from

Card AΔB = Card A + Card B − 2 Card A ∩ B.


.



Solution of Exercise 1.6.1 Recall that $p \in (0, \frac{1}{2})$, and in particular

  0 < \frac{p}{1-p} < 1.

We first note that

  P_N(\text{error}) \le \sum_{k=N+1}^{2N+1} \binom{2N+1}{k} p^k (1-p)^{2N+1-k}
                    \le \frac{2N+1}{N+1} \frac{(2N)!}{(N!)^2} \sum_{k=N+1}^{2N+1} p^k (1-p)^{2N+1-k},

since $\binom{2N+1}{k} \le \binom{2N+1}{N+1} = \frac{2N+1}{N+1}\binom{2N}{N}$ for $k \ge N+1$. Hence, using (5.4.8),

  \frac{(2N)!}{(N!)^2} \le e^{\frac{1}{24N} - \frac{2}{12N+1}}\, \frac{\sqrt{2\pi\, 2N}\,(2N)^{2N}}{2\pi N^{2N+1}} = e^{\frac{1}{24N} - \frac{2}{12N+1}}\, \frac{4^N}{\sqrt{\pi N}},

while, since $0 < \frac{p}{1-p} < 1$,

  (1-p)^{2N+1} \sum_{k=N+1}^{2N+1} \left(\frac{p}{1-p}\right)^k \le (1-p)^{2N+1}\, \frac{\left(\frac{p}{1-p}\right)^{N+1}}{1 - \frac{p}{1-p}} = \frac{(p(1-p))^{N+1}}{1-2p}.

Combining these estimates,

  P_N(\text{error}) \le e^{\frac{1}{24N} - \frac{2}{12N+1}} \cdot \frac{2N+1}{N+1} \cdot \frac{p(1-p)}{(1-2p)\sqrt{\pi N}} \cdot (4p(1-p))^N,

where all the factors preceding $(4p(1-p))^N$ remain bounded as $N \to \infty$.
This expression goes to 0 exponentially fast since $0 < 4p(1-p) < 1$ for $p \in (0, 1/2)$. ⨆

Solution of Exercise 1.6.3 Assume first $x \in (GF(2))^{1\times 7}$ to be in the left range of
$G$. Thus,

  x = \begin{pmatrix} x_0 & x_0 P \end{pmatrix},

where $x_0 \in (GF(2))^{1\times 4}$. Thus

  x H^t = -x_0 P + x_0 P = 0.

Conversely, if $x H^t = 0$, and setting $x = \begin{pmatrix} x_0 & x_1 \end{pmatrix}$, with $x_0 \in (GF(2))^{1\times 4}$, we have

  x_0 P + x_1 = 0,

and so $x = x_0 \begin{pmatrix} I_4 & P \end{pmatrix}$ belongs to the left range of $G$. ⨆



Solution of Exercise 1.6.4 This is because the seven rows of H t describe the seven
numbers 1, . . . , 7 in basis 2. ⨆

Solution of Exercise 1.6.5 We take the code defined by the parity-check matrix
$H \in (\mathbb{Z}/2\mathbb{Z})^{m \times (2^m - 1)}$ whose columns are the integers from 1 to $2^m - 1$ written in base 2.
Solution of Exercise 1.6.10 Each closed ball of radius 1 has $1 + \binom{7}{1} = 8 = 2^3$
elements, and there are $2^4$ code words. But

  \operatorname{Card}(GF(2))^7 = 2^7 = 2^3 \times 2^4.

Remark 1.13.2 More generally, let $m \ge 3$. The $(2^m-1,\, 2^m-1-m,\, 3)$ Hamming
code (recall the notations in Definition 4.5.2) is also perfect, since every closed
ball of radius 1 now contains $1 + (2^m - 1) = 2^m$ elements, and there are $2^{2^m-1-m}$
code words, so that

  2^{2^m-1} = 2^{2^m-1-m} \cdot 2^m.

We note that

  \lim_{m \to \infty} \frac{2^m - 1 - m}{2^m - 1} = 1,

but of course the probability of error (1.6.6) now becomes

  1 - (1-p)^{2^m-1} - (2^m-1)\, p\, (1-p)^{2^m-2},

which does not go to 0, but to 1, as $m \to \infty$.


Solution of Exercise 1.6.11 Performing the long division of $X^{15}+1$ by $X^4+X+1$
in $(GF(2))[X]$ (recall that addition and subtraction coincide), we obtain successively

  X^{15} + 1 = X^{11}(X^4+X+1) + X^{12} + X^{11} + 1
  X^{12} + X^{11} + 1 = X^{8}(X^4+X+1) + X^{11} + X^{9} + X^{8} + 1
  X^{11} + X^{9} + X^{8} + 1 = X^{7}(X^4+X+1) + X^{9} + X^{7} + 1
  X^{9} + X^{7} + 1 = X^{5}(X^4+X+1) + X^{7} + X^{6} + X^{5} + 1
  X^{7} + X^{6} + X^{5} + 1 = X^{3}(X^4+X+1) + X^{6} + X^{5} + X^{4} + X^{3} + 1
  X^{6} + X^{5} + X^{4} + X^{3} + 1 = X^{2}(X^4+X+1) + X^{5} + X^{4} + X^{2} + 1
  X^{5} + X^{4} + X^{2} + 1 = X(X^4+X+1) + X^{4} + X + 1
  X^{4} + X + 1 = 1 \cdot (X^4+X+1) + 0.

Hence,

  X^{15} + 1 = (X^4 + X + 1)(X^{11} + X^{8} + X^{7} + X^{5} + X^{3} + X^{2} + X + 1)

in $(GF(2))[X]$. ⨆

Solution of Exercise 1.6.15 Recall that we are in characteristic 2 and that the $X_i$
are assumed different from 0. Since the factor $1 + X_i^{-1} X_i = 1 + 1$ of $s(X_i^{-1})$ vanishes in
characteristic 2, we have

  X_i^{j+r} s(X_i^{-1}) = X_i^{j+r} + s_1 X_i^{j+r-1} + \cdots + s_r X_i^{j} = 0, \qquad i = 1, \ldots, r, \; j = 0, 1, \ldots

Summing these equations from $i = 1$ to $i = r$, we obtain

  S_{j+r} + s_1 S_{j+r-1} + \cdots + s_r S_j = 0, \qquad j = 0, 1, \ldots,

and hence the result. ⨆



Solution of Exercise 1.7.3 Let

  u(\beta) = \frac{\sum_{k=1}^{K} r_k e^{-\beta r_k}}{Z(\beta)}.

Then,

  u'(\beta) = \frac{\left(\sum_{k=1}^{K}(-r_k^2)e^{-\beta r_k}\right)\left(\sum_{k=1}^{K}e^{-\beta r_k}\right) - \left(\sum_{k=1}^{K} r_k e^{-\beta r_k}\right)\left(\sum_{k=1}^{K}(-r_k)e^{-\beta r_k}\right)}{\left(\sum_{k=1}^{K}e^{-\beta r_k}\right)^2}
            = \frac{\left(\sum_{k=1}^{K} r_k e^{-\beta r_k}\right)^2 - \left(\sum_{k=1}^{K} r_k^2 e^{-\beta r_k}\right)\left(\sum_{k=1}^{K} e^{-\beta r_k}\right)}{\left(\sum_{k=1}^{K}e^{-\beta r_k}\right)^2},

which is negative, thanks to the Cauchy–Schwarz inequality applied to the vectors

  \left(r_1 e^{-\frac{\beta r_1}{2}}\; \cdots\; r_K e^{-\frac{\beta r_K}{2}}\right) \quad \text{and} \quad \left(e^{-\frac{\beta r_1}{2}}\; \cdots\; e^{-\frac{\beta r_K}{2}}\right).

There will be equality in the Cauchy–Schwarz inequality if and only if the two
corresponding vectors are linearly dependent, that is, if and only if

  \left(r_1 e^{-\frac{\beta r_1}{2}}\; \cdots\; r_K e^{-\frac{\beta r_K}{2}}\right) = t \cdot \left(e^{-\frac{\beta r_1}{2}}\; \cdots\; e^{-\frac{\beta r_K}{2}}\right)

for some $t \in \mathbb{R}$. This will hold if and only if $r_1 = \cdots = r_K$, which cannot be, since
we assume $\min r_k < \max r_k$. So the function $u$ is strictly decreasing. The result
follows since $u(-\infty) = \max r_k$ and $u(\infty) = \min r_k$. ⨆

Remark 1.13.3 This classical computation appears in more general settings later in
the book (see, e.g., Exercise 6.6.5; for a reference in the literature, see, for instance,
[52, Lemma 3.6.3, p. 78]).
Remark 1.13.4 In the notation of the Solution of Exercise 1.7.3 we have

  u'(\beta) = -\frac{d^2 \ln Z(\beta)}{d\beta^2} \le 0,

so that $\ln Z(\beta)$ is convex.


Solution of Exercise 1.7.4 The choice β = 0 in (1.7.2) corresponds to pk (0) =
1/K, k = 1, . . . , K. Since the Eq. (1.7.4) has a unique solution, the uniform
distribution is the only solution corresponding to β = 0. ⨆

Remark 1.13.5 When one views the parameter β as the inverse of temperature,
β = 0 gives an infinite temperature, corresponding to maximum disorder, and
a uniform distribution. On the other hand, the case β = ∞ corresponds to the
distribution with probability 1 for the minimal value to be achieved and probability
0 for the other values.
Solution of Exercise 1.7.6 Equation (1.7.6) has two solutions, the trivial solution
x = 1 and the solution x = 1.449.... But Eq. (1.7.4) has a unique solution.
So (1.7.6) does not arise from (1.7.4). The paper [237] indeed mentions the solution
x = 1.449..., but it does not come from (1.7.6). (See Remark 1.13.6.) ⨆

Remark 1.13.6 In the case of a dice, and with r = 4.5, Eq. (1.7.4) becomes (with
x = e−β )

.2(x + 2x 2 + 3x 3 + 4x 4 + 5x 5 + 6x 6 ) = 9(x + x 2 + x 3 + x 4 + x 5 + x 6 ),

so that x solves

3x 5 + x 4 − x 3 − 3x 2 − 5x − 7 = 0,
.

with unique real solution x = 1.449 . . . One has

3x 7 − 5x 6 + 9x − 7 = (x − 1)2 (3x 5 + x 4 − x 3 − 3x 2 − 5x − 7)
.

and this may have been the source of the confusion in [237].
Solution of Exercise 1.7.8 We have (with positive $\beta$)

  Z(\beta) = e^{-b} \sum_{k=0}^{\infty} e^{-\beta k}
           = e^{-b}\, \frac{1}{1 - e^{-\beta}}
           = \frac{e^{\beta/2 - b}}{e^{\beta/2} - e^{-\beta/2}},

which is equal to $\dfrac{1}{2\sinh(\beta/2)}$ when $\beta = 2b$.
We now compute the associated entropy. We have

  p_k = \frac{e^{-\beta k - b}}{Z(\beta)} = \frac{e^{-\beta k}}{\sum_{j=0}^{\infty} e^{-\beta j}} = e^{-\beta k}(1 - e^{-\beta}).    (1.13.4)

Thus, using

  \sum_{k=1}^{\infty} k z^k = \frac{z}{(1-z)^2}, \qquad |z| < 1,

we can write

  -\sum_{k=0}^{\infty} p_k \ln p_k = \sum_{k=0}^{\infty} e^{-k\beta}(1 - e^{-\beta})\left(\beta k - \ln(1 - e^{-\beta})\right)
                                   = \frac{\beta e^{-\beta}}{1 - e^{-\beta}} - \ln(1 - e^{-\beta})
                                   = \frac{H(e^{-\beta})}{1 - e^{-\beta}},

where $H(p) = -p \ln p - (1-p)\ln(1-p)$.
Taking into account (1.13.4) and (1.7.7), we can relate the quantities $\beta$, $E$, and
$b$. We have

  \sum_{k=0}^{\infty} e^{-\beta k}(1 - e^{-\beta})(\beta k + b) = E,

so that

  \beta(1 - e^{-\beta}) \sum_{k=0}^{\infty} k e^{-\beta k} = E - b,

and so

  \beta\, \frac{e^{-\beta}}{1 - e^{-\beta}} = E - b.

Therefore

  e^{-\beta} = \frac{E - b}{E - b + \beta}.

Finally, when $\beta = 2b$, we can write

  (Z(\beta))^N \sim \frac{e^{-Nb}}{1 - e^{-2Nb}},

as $N \to \infty$, and we leave it to the reader to make this statement precise. ⨆



The above exercise illustrates, on a simple example, another aspect of the deep
relationships between the Boltzmann-Gibbs distribution and Shannon’s entropy.
Remark 1.13.7 The function (Z(β))N is the partition function of a system of N
independent systems, each with partition function Z(β). The preceding approxima-
tion, trivial in the case at hand, is a special case of an important result, related to the
theory of large deviations, and using the saddle method to estimate asymptotically
certain integrals.
By and by he heard a giggling and a laughing in the chimney, and
the next minute he saw a tiny girl, as big as a doll, come tumbling
down and jump on to the hearth in front of him.
At first the little boy was dreadfully frightened, but the tiny girl
began to dance so prettily, and to nod her head at him in such a
friendly way, that he forgot to be afraid.
“What do they call you, little girl?” said he.
“My name is Ownself,” said she proudly. “What is yours?”
“My name,” he answered, laughing very hard, “is My Ownself.”
Then the two children began to play together as if they had known
each other all their lives. They danced, and they sang, and they
roasted chestnuts before the fire, and they tickled the house-cat’s
ears. Then the fire commenced to flicker, and it grew dimmer and
dimmer; so the little boy took the poker and stirred up the embers.
And a hot coal tumbled out and rolled on to Ownself’s tiny foot. And,
oh! how she screamed! Then she wept, and flew into such a rage
that the little boy got frightened and hid behind the door.
Just then a squeaky voice called down the chimney: “Ownself!
Ownself! What wicked creature hurt you?”
“My Ownself! My Ownself!” she screamed back.
“Then come here, you troublesome little Fairy,” cried the voice
angrily.
And a Fairy mother, slipper in hand, came hurrying down the
chimney; and catching Ownself, she whipped her soundly and
carried her off, saying:—
“What’s all this noise about, then? If you did it your ownself, there’s
nobody to blame but yourself!”
THE SICK-BED ELVES

From China
Wang Little-Third-One lay stretched on his bed of bamboo laths,
where a low fever kept him. He complained to every one, especially
to his friend the Magician who came to see him.
The Magician was very wise, so he gave Wang a drink of something
delicious and cool, and went away.
When Little-Third-One had drunk this, his fever fell, and he was able
to enjoy a little sleep. He was awakened by a slight noise. The night
was come. The room was lighted by the full moon, which threw a
bright gleam through the open door.
Then he saw that the room was full of insects that were moving and
flying hither and thither. There were white ants that gnaw wood,
bad-smelling bugs, enormous cockroaches, mosquitoes, and many
many flies. And they were all buzzing, gnashing their teeth, or
falling.
As Little-Third-One looked, he saw something move on the
threshold. A small man, not bigger than a thumb, advanced with
cautious steps. In his hand he held a bow; a sword was hanging by
his side.
Little-Third-One, looking closer, saw two dogs as big as shirt-buttons,
running in front of the little man. They suddenly stopped. The archer
approached nearer to the bed, and held out his bow, and discharged
a tiny arrow. A cockroach that was crawling before the dogs, made a
bound, fell on its back, kicked, and was motionless. The arrow had
run through it.
Behind the little man, other little men had come. Some rode on small
horses, and were armed with swords, and still others were on foot.
All these huntsmen scattered about the room, and ran or rode, to
and fro, shooting arrows, and brandishing their swords; until
hundreds and hundreds of insects were killed. At first the
mosquitoes escaped, but, as they cannot fly for long, every time one
of them settled on the wall, it was transfixed by a huntsman.
Soon none were left of all the insects that had broken the silence
with their buzzing, their gnashing of teeth, and their falling.
A horseman then galloped around the room, looking from right to
left. He gave a signal. All the huntsmen called their dogs, went to
the door, and disappeared.
Little-Third-One had not moved, for fear that he should disturb the
hunt. At last he went peacefully to sleep, and woke the next day
cured. When his friend the Magician came to see him, Little-Third-
One told him about the mysterious huntsmen, and his friend the
Magician smiled.
HOW PEEPING KATE WAS PISKEY-LED

From Cornwall
’Tis Hallowe’en Night, Teddy, my boy. Don’t go out on the moor, or
near the Gump, for the Piskeys and the Spriggans are abroad,
waiting to mislead straying mortals. Many are the men and women
that the Little People have whisked away on Hallowe’en Night; and
the poor mortals have never been heard of since.
Sit down, Teddy, my boy, crack these nuts, and eat these red apples;
and I’ll tell you how Peeping Kate was Piskey-led.
I have heard the old folks say how long ago—maybe a hundred
years or so—the Squire of Pendeen had a housekeeper, an elderly
dame, called Kate Tregeer.
Well, one Hallowe’en Night, some spices and other small things were
wanted for the feasten-tide, and Kate would not trust any one to go
for them except herself. So she put on her red coat and high
steeple-crowned hat, and walked to Penzance. She bought the
goods and started for home.
It was a bright moonlight night, and though no wind was blowing,
the leaves of the trees were murmuring with a hollow sound. And
Kate could hear strange rustlings in the bushes by the side of the
road.
She had walked a very long time, and her basket was so heavy that
she began to feel tired. Her legs bent under her and she could
scarcely stand up. Just then she beheld, a little in front of her, a man
on horseback. And she could tell by the proud way he sat that he
was a gentleman-born.
She was very glad to see him, and as he was going slowly, she soon
overtook him; and when she came up, his horse stood stock-still.
“My dear Master,” she said, “how glad I am to see you. Don’t you
know me? I’m Kate Tregeer of Pendeen; and I can’t tell you how
hard I’ve worked all day.”
Then she explained to him how she had walked to Penzance, and
was now so tired that she could not stand up. But the gentleman
made no reply.
“My dear Master,” said she, “I’m footsore and leg-weary. I’ve got as
far as here, you see, but I can get no farther. Do have pity on a poor
unfortunate woman, and take her behind you. I can ride well
enough on your horse’s back without a saddle or pillion.”
But still the gentleman made no reply.
“My dear Master,” she said again, “My! but you’re a fine-looking man!
How upright you sit on your horse! But why don’t you answer me?
Are you asleep? One would think you were taking a nap; and your
horse, too, it is standing so still!”
Not having any word in reply to this fine speech, Kate called out as
loud as she could: “Even if you are a gentleman-born, you needn’t
be so stuck-up that you won’t speak to a poor body afoot!”
Still he never spoke, though Kate thought that she saw him wink at
her.
This vexed her the more. “The time was when the Tregeers were
among the first in the parish, and were buried with the gentry! Wake
up and speak to me!” screamed she in a rage. And then she took up
a stone, and threw it at the horse. The stone rolled back to her feet,
and the animal did not even whisk its tail.
Kate now got nearer, and saw that the rider had no hat on, nor was
there any hair on his bald head. She touched the horse, and felt
nothing but a bunch of furze. She rubbed her eyes and saw at once,
to her great astonishment, that it was no gentleman and horse at
all, only a smooth stone half buried in a heap of furze. And there she
was still far away from Pendeen, with her heavy basket, and her legs
so tired that she could scarcely move. And then she saw that she
had come a short distance only, and knew that she must be
bewitched.
Well, on she went; and seeing a light at her left hand she thought
that it shone from the window of a house where she might rest
awhile. So she made for it straight across the moor, floundering
through bogs, and tripping over bunches of furze. And still the light
was always just ahead, and it seemed to move from side to side.
Then suddenly it went out, and she was left standing in a bog. The
next minute she found herself among furze-ricks and pigsties, in the
yard of Farmer Boslow, miles away from Pendeen.
She opened the door of an old outhouse, and entered, hoping to get
a few hours’ rest. There she lay down on straw and fell asleep; but
she was soon wakened by some young pigs who were rooting
around in the straw. That was too much for Kate. So up she got, and
as she did so she heard the noise of a flail. And seeing a glimmer of
light in a barn near by, she crept softly to a little window in the barn,
and peeped to find what was going on.
At first she could see only two rush-wicks burning in two old iron
lamps. Then through the dim light she saw the slash-flash of a flail
as it rose and fell, and beat the barn floor. She stood on tiptoes, and
stuck her head in farther, and whom did she see, wielding the flail,
but a little old man, about three feet high, with hair like a bunch of
rushes, and ragged clothes. His face was broader than it was long,
and he had great owl-eyes shaded by heavy eyebrows from which
his nose poked like a pig’s snout. Kate noticed that his teeth were
crooked and jagged, and that at each stroke of the flail, he kept
moving his thin lips around and around, and thrusting his tongue in
and out. His shoulders were broad enough for a man twice his
height, and his feet were splayed like a frog’s.
“Well! Well!” thought Kate. “This is luck! To see the Piskey threshing!
For ever since I can remember I have heard it said that the Piskey
threshed corn for Farmer Boslow on winter nights, and did other odd
jobs for him the year round. But I would not believe it. Yet here he
is!”
Then she reached her head farther in, and beheld a score of little
men helping the Piskey. Some of them were lugging down the
sheaves, and placing them handy for him; and others were carrying
away the straw from which the grain had been threshed. Soon a
heap of corn was gathered on the floor, as clean as if it had been
winnowed.
In doing this the Piskey raised such a dust that it set him and some
of the little men sneezing. And Kate, without stopping to think,
called out:—
“God bless you, little men!”
Quick as a wink the lights vanished, and a handful of dust was
thrown into her eyes, which blinded her so that for a moment she
could not see. And then she heard the Piskey squeak:—
“I spy thy face,
Old Peeping Kate,
I’ll serve thee out,
Early and late!”
Kate, when she heard this, felt very uneasy, for she remembered
that the Little People have a great spite against any one who peeps
at them, or pries into their doings.
The night being clear, she quickly found her way out of a crooked
lane, and ran as fast as she could, and never stopped until she
reached the Gump. There she sat down to rest awhile.
After that she stood up; and turn whichever way she might the same
road lay before her. Then she knew that the Piskey was playing her a
trick. So she ran down a hill as fast as she could, not caring in what
direction she was going, so long as she could get away from the
Piskey.
After running a long while, she heard music and saw lights at no
great distance. Thinking that she must be near a house, she went
over the downs toward the lights, feeling ready for a jig, and
stopping now and then to dance around and around to the strains of
the music.
But instead of arriving at a house, in passing around some high
rocks she came out on a broad green meadow, encircled with furze
and rocks. And there before her she saw a whole troop of Spriggans
holding an Elfin Fair. It was like a feasten-day. Scores of little booths
were standing in rows, and were covered with tiny trinkets such as
buckles of silver and gold glistening with Cornish diamonds, pins
with jewelled heads, brooches, rings, bracelets, and necklaces of
crystal beads, green and red or blue and gold; and many other
pretty things new to Kate.
There were lights in all directions—lanterns no bigger than Foxgloves
were hanging in rows; and on the booths, rushlights in tulip-cups
shone among Fairy goodies such as Kate had never dreamed of. Yet
with all these lights there was such a shimmer over everything that
she got bewildered, and could not see as plainly as she wished.
She did not care to disturb the Little People until she had looked at
all that was doing. So she crept softly behind the booths and
watched the Spriggans dancing. Hundreds of them, linked hand in
hand, went whirling around so fast as to make her dizzy. Small as
they were, they were all decked out like rich folk, the little men in
cocked hats and feathers, blue coats gay with lace and gold buttons,
breeches and stockings of lighter hue, and tiny shoes with diamond
buckles.
Kate could not name the colours of the little ladies’ dresses, which
were of all the hues of Summer blossoms. The vain little things had
powdered their hair, and decked their heads with ribbons, feathers,
and flowers. Their shoes were of velvet and satin, and were high-
heeled and pointed. And such sparkling black eyes as all the little
ladies had, and such dimpled cheeks and chins! And they were
merry, sprightly, and laughing.
All the Spriggans were capering and dancing around a pole wreathed
with flowers. The pipers, standing in their midst, played such lively
airs that Kate never in all her life had wanted to dance more. But
she kept quite still, for she did not wish the Little People to know
that she was there. She was determined to pocket some of the
pretty things in the booths, and steal softly away with them. She
thought how nice a bright pair of diamond buckles would look on her
best shoes, and how fine her Sunday cap would be ornamented with
a Fairy brooch.
So she raised her hand and laid it on some buckles, when—oh! oh!—
she felt a palmful of pins and needles stick into her fingers like red-
hot points; and she screamed:—
“Misfortune take you, you bad little Spriggans!”
[Illustration: “She saw a whole troop of Spriggans holding an Elfin Fair”]
Immediately the lights went out, and she felt hundreds of the Little
People leap on her back, and her neck, and her head. At the same
moment others tripped up her heels, and laid her flat on the ground,
and rolled her over and over.