crypto.pdf notes
by
Steven Gordon
Crypto 22.03
4 January 2022 (r1973)
Contents
List of Tables
Glossary
I Introduction
1 Introduction
5 Number Theory
5.1 Divisibility and Primes
5.2 Modular Arithmetic
5.3 Fermat’s and Euler’s Theorems
5.4 Discrete Logarithms
5.5 Computationally Hard Problems
13 RSA
13.1 RSA Algorithm
13.2 Analysis of RSA
13.3 Implementations of RSA
13.4 RSA in OpenSSL
13.4.1 RSA Key Generation in OpenSSL
13.4.2 RSA Signing in OpenSSL (Sender)
13.4.3 RSA Encryption in OpenSSL (Sender)
13.4.4 RSA Decryption in OpenSSL (Receiver)
13.4.5 RSA Verification in OpenSSL (Receiver)
13.4.6 RSA OpenSSL Exercises
13.5 RSA in Python
V Authentication
16 Hash Functions and MACs
16.1 Informal Overview of Hashes and MACs
16.2 Introduction to Hash Functions
16.3 Properties of Cryptographic Hash Functions
16.4 Introduction to Message Authentication Codes
Index
List of Figures
20.1 Scaling the classical number field sieve (NFS) vs. Shor’s quantum algorithm for factoring
List of Tables
20.1 Worst Case Brute Force Attempts with Classical and Quantum Algorithms
Glossary
ACK Acknowledgement. Packet or frame type usually sent upon successful receipt
of data.
AP Access Point. Device in wireless LAN that bridges wired and wireless
segments.
ASCII American Standard Code for Information Interchange. Format for mapping
English characters to 7 bit values.
ATM Asynchronous Transfer Mode. Wired technology used in core and access
networks.
BSD Berkeley Software Distribution. The original open source variant of Unix,
now a popular Linux alternative for servers.
BSSID Basic Service Set Identifier. Unique to a wireless LAN AP; normally the
AP MAC address.
CBC Cipher Block Chaining. Mode of operation used to allow symmetric block
ciphers to encrypt data larger than a block size.
CCA Chosen Ciphertext Attack. Attack category where the attacker can select
ciphertext values and learn the corresponding plaintext values.
CFB Cipher Feedback mode. Mode of operation used to allow symmetric block
ciphers to encrypt data larger than a block size.
CLI Command Line Interface. User interface to a computer that involves typing
text based commands.
CPA Chosen Plaintext Attack. Attack category where the attacker can select
plaintext and obtain the corresponding ciphertext.
CTR Counter mode. Mode of operation used to allow symmetric block ciphers to
encrypt data larger than a block size.
CTS Clear To Send. Wireless LAN control frame sent in response to RTS.
DES Data Encryption Standard. Symmetric key cipher. Not recommended for
use.
DDoS Distributed Denial of Service. DoS attack coming from many computers.
DNS Domain Name System. Maps human friendly domain names to computer
readable IP addresses.
DoS Denial of Service. Attack on a server or network that prevents normal users
from accessing the service.
ECB Electronic Code Book. Mode of operation used to allow symmetric block
ciphers to encrypt data larger than a block size.
ECIES Elliptic Curve Integrated Encryption Scheme. Combines ECDH for key
exchange with symmetric key data encryption.
ESSID Extended Service Set Identifier. Name given to wireless LAN network;
multiple APs may be in the same network.
FTP File Transfer Protocol. Application layer protocol for transferring files
between client and server. Uses TCP.
FSF Free Software Foundation. Organisation that promotes the use of free (as in
freedom), open source software.
GNU GNU’s Not Unix. A free operating system, using free, open source software.
Often combined with Linux kernel to produce GNU/Linux.
HMAC Hash-based MAC. Message authentication code function that uses existing
hash algorithms. That is, converts hash functions into MAC functions.
HTTPS HTTP Secure. HTTP on top of SSL/TLS, to provide secure web browsing.
IANA Internet Assigned Numbers Authority. Organisation that defines the use of
Internet numbers such as ports and protocol numbers.
ICMP Internet Control Message Protocol. Protocol for testing and diagnostics in
the Internet. Used by ping.
IETF Internet Engineering Task Force. Organisation that defines standards for
Internet technologies, including IP, TCP and HTTP.
ISAKMP Internet Security Association and Key Management Protocol. Security protocol
for key exchange.
KPA Known Plaintext Attack. Attack category where the attacker knows pairs of
plaintext/ciphertext.
LAN Local Area Network. Network usually covering offices, homes and buildings.
Layer 1 and 2 technology.
MD5 Message Digest 5 hash function. Cryptographic hash function that is still
widely used, but no longer considered secure for many purposes.
NIC Network Interface Card. Device in a computer that connects the computer
to a network.
NTP Network Time Protocol. Protocol for clients to synchronise their clocks to
more accurate time servers.
OFB Output Feedback mode. Mode of operation used to allow symmetric block
ciphers to encrypt data larger than a block size.
OWASP Open Web Application Security Project. Project that keeps track of
common attacks on web applications and provides advice on securing apps.
PHY Physical Layer. Lowest layer in Internet and OSI layer architectures. Deals
with transmitting bits as signals.
PSEC Provably Secure Elliptic Curve Encryption. Data encryption using ECC.
PSK Pre-Shared Key. Secret cryptographic key that two parties have exchanged in
advance.
QKD Quantum Key Distribution. A secret key sharing protocol based on quantum
technology.
RAM Random Access Memory. Short term, volatile storage area for computers.
RFC Request for Comments. Type of standard used by IETF. The standards for
IP, TCP and DNS are RFCs.
RSA Rivest Shamir Adleman cipher. Public key cryptographic cipher used for
confidentiality, authentication and digital signatures.
RTT Round Trip Time. Time for a message to travel from source to destination
and then back to the source.
SCP Secure Copy. Command and protocol for transferring files securely from one
computer to another.
SDH Synchronous Digital Hierarchy. Wide area network technology used across
cities and countries.
SMTP Simple Mail Transfer Protocol. Application layer protocol for transferring
email between computers.
SSL Secure Sockets Layer. Protocol for securing application data that uses TCP
for communications. Replaced by TLS but still referred to.
Introduction
Chapter 1
Introduction
• LaTeX source for the book (including all the .tex, images and style files) as well as
selected examples: https://sandilands.info/crypto/source/
Video
Introduction to Cryptography Study Notes (6 min; Dec 2021)
https://www.youtube.com/watch?v=s7HmFPvw8nc
Chapter 2
Cryptography Concepts and Terminology
This chapter defines key concepts and terminology in cryptography. These have likely
already been covered in a subject that introduces computer security (e.g. an
introduction to networking or IT security). Therefore it is quite brief, serving mainly as a
refresher and to set the scene for subsequent chapters.
Presentation slides that accompany this chapter can be downloaded in the following
formats: slides only (PDF); slides with notes (PDF, ODP, PPTX).
Video
Cryptography Concepts and Terminology mini-lecture (14 min; Feb 2020)
https://www.youtube.com/watch?v=fx4nGsoum6A
Authentication ensures that the individual is who she claims to be (the authentic or
genuine person) and not an impostor.
Example of authentication: check username and password when user logs into system.
Example of authorisation: check that user is authorised to access a particular document.
Example of accounting: record logs of who accesses files and provide summary reports.
This book does not attempt to cover everything. The scope is:
• How: encrypt the original data; anyone can see the encrypted data, but only
authorised individuals can decrypt to see the original data
• Used for both sending data across network and storing data on a computer system
While encryption is used to provide different services in cryptography, the main service
is confidentiality: keeping data secret. In the following we talk about using encryption for
confidentiality. Later we will see that the same encryption mechanisms can also provide
other services such as authentication, integrity and digital signatures.
Figure 2.1 shows a simple model of a system that uses encryption for confidentiality.
Assume two users, A and B, want to communicate confidentially. User A has a plaintext
message to send to B. User A first encrypts that plaintext using a key. The output
ciphertext is sent to user B (e.g. across the Internet). We assume the attacker, user C,
can intercept anything sent; in this case they see the ciphertext. User B receives the
ciphertext and decrypts it. If the correct key and algorithm are used, then the output of the
decryption is the original plaintext.
The aim of the attacker is to find the plaintext. They can either do some analysis
of the ciphertext to try to discover the plaintext, or try to find the key (if the attacker
knows Key 2, they can decrypt the same as user B).
In symmetric key crypto, Key 1 and Key 2 are identical (symmetry of the keys).
In public key crypto, Key 1 is the public key of B and Key 2 is the private key of B
(asymmetry of the keys).
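The symmetric case of this model can be sketched in a few lines of Python. This is a toy illustration only: it assumes a repeating-key XOR stands in for a real encryption algorithm (it is not secure), and the function name xor_cipher and the key value are hypothetical, not taken from the book.

```python
# Toy illustration of the encryption model (NOT a secure cipher).
# Assumption: XOR with a repeating key stands in for a real algorithm.
from itertools import cycle

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the repeating key; the same call encrypts and decrypts."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))

plaintext = b"This is a super secret message. "
key1 = b"hypothetical-key"      # used by user A to encrypt
key2 = key1                     # symmetric key crypto: Key 2 equals Key 1

ciphertext = xor_cipher(plaintext, key1)   # what attacker C can intercept
recovered = xor_cipher(ciphertext, key2)   # what user B obtains

print(ciphertext.hex())
print(recovered)
```

Decrypting with any other key does not return the original plaintext, which is why the attacker tries to learn Key 2.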
Given the above scenario, the important terms in cryptography are:
Chapter 3
Software Tools
This chapter lists common tools referred to within the rest of the book. The purpose is
to make you aware of the tools; not to teach you how to use the tools. While some setup
and basic usage instructions may be given, you can normally find detailed instructions
by searching online, or within the tool help or manual pages.
While your output of the ls command may be different it should show 32, meaning
the file size is 32 bytes (note the text contains 32 characters, including the space at the
end; the -n option means no new line character is added).
File: crypto/tools.tex, r1962
$ cat demo.txt
This is a super secret message. $
The command prompt does not start on a new line since our text file does not finish
with a new line character.
Now let’s look at the file in hex and then binary format (using the -b option):
$ xxd demo.txt
00000000: 5468 6973 2069 7320 6120 7375 7065 7220 This is a super
00000010: 7365 6372 6574 206d 6573 7361 6765 2e20 secret message.
$ xxd -b demo.txt
00000000: 01010100 01101000 01101001 01110011 00100000 01101001 This i
00000006: 01110011 00100000 01100001 00100000 01110011 01110101 s a su
0000000c: 01110000 01100101 01110010 00100000 01110011 01100101 per se
00000012: 01100011 01110010 01100101 01110100 00100000 01101101 cret m
00000018: 01100101 01110011 01110011 01100001 01100111 01100101 essage
0000001e: 00101110 00100000 .
2. The hex (or binary) values, with some spacing to ease the readability
3. The ASCII character (if it is printable) for the corresponding byte. A dot is shown
if the ASCII character is not printable (e.g. the DELete or ESCape characters).
xxd has a variety of command line options, which are well described in the man
page. For example, to show 8 bytes per line (column size), grouped into 2 sets of 4 bytes,
and limited in length to just the first 16 bytes:
$ xxd -c 8 -g 4 -l 16 demo.txt
00000000: 54686973 20697320 This is
00000008: 61207375 70657220 a super
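The same hex and binary views can be produced without xxd. The following is a minimal Python 3 sketch, assuming the message bytes are typed in directly rather than read from demo.txt; the variable name data is illustrative.

```python
# Reproduce the hex and binary views of the demo message in Python.
data = b"This is a super secret message. "   # same text as demo.txt, trailing space included

print(len(data))                 # number of bytes in the message
print(data[:8].hex())            # first 8 bytes as hex digits
print(" ".join(f"{byte:08b}" for byte in data[:6]))   # first 6 bytes as 8-bit binary
```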
Combined with other commands, such as cut and grep, you can extract information
of interest. For example, show the binary representation of the first 64 bytes of the file
/bin/ls, 8 bytes per line:
$ bc
bc 1.07.1
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006, 2008, 2012-2017 Free
Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type ‘warranty’.
1 + 2
3
20 - 13
7
7 * 6 + 2
44
7 * (6 + 2)
56
56 / 8
7
54 / 10
5
Note that the last division gives the quotient as the answer, not a fraction. By default,
fractions are not used, but can easily be enabled by setting the scale parameter as follows:
scale=2
54/10
5.40
10^2
100
2^3
8
The real use of bc in this book comes when performing operations on large numbers.
As bc is an arbitrary precision calculator, it will perform any calculation without
approximating (although beware, some calculations will take a long time).
2^10
1024
2^100
1267650600228229401496703205376
2^1000
10715086071862673209484250490600018105614048117055336074437503883703\
51051124936122493198378815695858127594672917553146825187145285692314\
04359845775746985748039345677748242309854210746050623711418779541821\
53046474983581941267398767559165543946077062914571196477686542167660\
429831652624386837205668069376
2^10000000
...
While there are faster algorithms than what bc uses, it can be used for modular
exponentiation.
29401^19231
11791936741673782277951361412655628509750802626058442595065879112837\
30645660979602186783941907308557893020948598603221372351480244103370\
...
27540919410894776657722419140083914356072020143002078956241640716425\
878094269792146304397724529078575760209188791401
29401^19231 % 37669
35694
quit
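Python integers are also arbitrary precision, so the same calculation can be sketched there. The built-in three-argument pow() reduces modulo the third argument during exponentiation, so the full power is never built. The variable names below are illustrative.

```python
# Modular exponentiation with Python's arbitrary-precision integers.
base, exponent, modulus = 29401, 19231, 37669   # values from the bc session above

slow = (base ** exponent) % modulus    # builds the full power first, as in the bc expression
fast = pow(base, exponent, modulus)    # reduces modulo 37669 while exponentiating

print(fast)
print(len(str(base ** exponent)))      # number of decimal digits in the full power
```

Both approaches give the same result; for the key sizes used in public key cryptography only the three-argument form is practical.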
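To perform (normal) logarithms, you need to start bc using the -l option to load the
math library. Then the l() function can be used to calculate the natural logarithm, or,
by dividing two natural logarithms, the logarithm in any base. Be careful with the scale,
however: results are rounded to a fixed number of decimal places, so small errors can
appear in the last digits.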
$ bc -l
bc 1.07.1
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006, 2008, 2012-2017 Free
Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type ‘warranty’.
l(2.718)
.99989631572895196894
l(100)/l(10)
2.00000000000000000000
l(32)/l(2)
5.00000000000000000004
l(1024)/l(2)
10.00000000000000000010
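For comparison, here is a minimal sketch of the same logarithms with Python's standard math module. It uses floating point, so it has its own rounding behaviour; for exact base-2 logarithms of large integers, the integer method bit_length() is one way to avoid floating point altogether.

```python
import math

print(math.log(2.718))                  # natural logarithm, like l(2.718) in bc
print(math.log(100) / math.log(10))     # change of base, like l(100)/l(10)
print(math.log2(1024))                  # base-2 logarithm computed directly

# For exact work on large integers, bit_length avoids floating point entirely:
print((2 ** 1000).bit_length() - 1)     # floor(log2(2^1000))
```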
To see the details of the LCG algorithm used, look in the Bash source code; after
downloading and unpacking the source, look in the file variables.c, search for the
function brand. You can also see that the seed is based on the current time and process
ID.
A little bit of text processing will return just the random value (omitting the other
output produced by xxd). Let’s use cut to grab the 2nd field, considering the output as
space separated/delimited:
$ cat /dev/urandom | xxd -l 16 -g 16 | cut -d " " -f 2
313be197c436bebf074a2da3599a0ce0
Read the man pages for an explanation of the Linux kernel random number source
device /dev/urandom and the related /dev/random. The section 7 man page gives an
overview, while the section 4 man page gives more technical details on the two devices.
$ man -S7 random
$ man -S4 urandom
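Programs rarely need to read the device file directly. As a hedged sketch, Python's standard library exposes the operating system's random source through os.urandom and the secrets module; the variable names are illustrative.

```python
import os
import secrets

raw = os.urandom(16)              # 16 bytes from the operating system's random source
print(raw.hex())

token = secrets.token_hex(16)     # 16 random bytes, returned as 32 hex digits
print(token)
```

The output differs on every run, so no expected values are shown.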
You can write the hash value to a file, and then use that file to perform a check:
$ sha256sum demo.txt > demo.sha256
$ cat demo.sha256
12e38182116f070ef1a4d8961692787aa57add87d5496c4daf402279bc71c0b6 demo.txt
$ sha256sum -c demo.sha256
demo.txt: OK
A change to the file should result in failure of the check (if the hash is not recomputed):
$ cat demo.txt
This is a super secret message. $
$ echo -n "This is a super secret message! " > demo.txt
$ sha256sum -c demo.sha256
demo.txt: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match
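The same check can be sketched with Python's standard hashlib module. This assumes the two messages are given as byte strings rather than read from demo.txt; the variable names are illustrative.

```python
import hashlib

original = b"This is a super secret message. "
modified = b"This is a super secret message! "   # one character changed

digest_original = hashlib.sha256(original).hexdigest()
digest_modified = hashlib.sha256(modified).hexdigest()

print(digest_original)
print(digest_modified)
print("OK" if digest_original == digest_modified else "FAILED")
```

Changing a single character produces a completely different 64-hex-digit hash value, so the comparison fails.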
You can also use OpenSSL (Section 3.2) to apply hash functions.
3.2 OpenSSL
3.2.1 Overview of OpenSSL
OpenSSL (https://www.openssl.org/) is a program and library that supports many different
cryptographic operations, including:
• Hash functions
• Certificate creation
• Digital signatures
While the primary purpose of OpenSSL is as a library, i.e. you write software that calls
OpenSSL to perform cryptographic operations for your software, it also is a standalone
program with a command-line interface. While we only use the standalone program, once
you are familiar with it, you should be able to use the library.
OpenSSL supports different operations or commands, with the name of the command
following openssl. For example, to perform symmetric key encryption the command is
enc and on the command line you run:
$ openssl enc
$ openssl help
Standard commands
asn1parse ca ciphers cms
...
...
In reverse order:
For cipher and message digest commands, you can read the common format in the
man pages man enc and man dgst, respectively. Most of the standard commands have
their own man page, e.g. man rsa, man x509. Note that there are sometimes multiple
commands that can be used to perform the same cryptographic operation. For example,
you can generate Rivest Shamir Adleman cipher (RSA) key pairs using either genrsa
or genpkey commands. This is mainly for compatibility reasons, that is, over time new
commands have been added and the old command maintained.
Some common commands you will see in this book include:
$ openssl version
OpenSSL 1.1.1 11 Sep 2018
$ openssl list -cipher-algorithms
AES-128-CBC
AES-128-CBC-HMAC-SHA1
...
SM4-ECB
SM4-OFB
$ openssl list -cipher-commands
aes-128-cbc aes-128-ecb aes-192-cbc aes-192-ecb
aes-256-cbc aes-256-ecb aria-128-cbc aria-128-cfb
aria-128-cfb1 aria-128-cfb8 aria-128-ctr aria-128-ecb
aria-128-ofb aria-192-cbc aria-192-cfb aria-192-cfb1
aria-192-cfb8 aria-192-ctr aria-192-ecb aria-192-ofb
Section 3.1.3 shows different ways to generate random numbers in Linux. OpenSSL has
its own PRNG which is also considered cryptographically strong. This is accessed using
the rand command and specifying the number of bytes to generate. To get hex output,
use the -hex option:
$ openssl rand -hex 16
89978d4960720a750f35d569bcf28494
You can also output to a file and view the file with xxd:
On Linux, the OpenSSL rand command normally uses output from /dev/urandom to
seed (initialise) its PRNG. Read the man page for more information.
3.3 Python
Python (www.python.org) is a programming language that is seeing increasing use in
networking applications that require cryptography. We use it for examples in this book
as it is relatively quick to pick up and start building prototype applications with custom
or existing cryptographic mechanisms.
As with most programming languages, including Java, PHP and C++, libraries are available
that already implement common cryptographic mechanisms; you can focus your
efforts on developing applications, not implementing encryption ciphers and hash algorithms.
However there is currently no single standard cryptography library for Python;
several are available. In this book we use the cryptography package, as introduced
in Section 3.3.1. Another common library, not used in this book, is PyNaCl
(https://pynacl.readthedocs.io/), which is based on libsodium and NaCl.
To use classical ciphers, the PyCipher package is used, which is introduced in Section 3.3.2.
$ cd pycipher-master/
$ sudo python setup.py install
$ python setup.py test
This installs and tests the latest version. Depending on the version, some tests may
fail. In my case it ran 41 tests, but 2 tests failed (using the Porta algorithm). Do not
use the algorithms that failed the tests.
Using pycipher
A quick example of encrypting and decrypting with pycipher is below. Other ciphers
include: Beaufort, Foursquare, Enigma, Polybius, Bifid, ADFGVX, Coltrans, Playfair,
and Vigenere. Details on the ciphers supported and how to use them are in the latest
documentation.
$ python
Python 2.7.3 (default, Feb 27 2014, 20:00:17)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pycipher
>>> pycipher.Caesar(3).encipher("hello")
'KHOOR'
>>> pycipher.Caesar(3).decipher("khoor")
'HELLO'
>>> quit()
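To see what the Caesar cipher does internally, here is a minimal pure Python 3 sketch that does not use pycipher. The function name caesar is hypothetical, and dropping non-letters is a choice made for this sketch rather than a statement about pycipher's behaviour.

```python
import string

def caesar(text: str, shift: int) -> str:
    """Shift each letter by `shift` positions; output is upper case, non-letters are dropped."""
    letters = string.ascii_uppercase
    return "".join(
        letters[(letters.index(ch) + shift) % 26]
        for ch in text.upper()
        if ch in letters
    )

print(caesar("hello", 3))     # encipher with key 3
print(caesar("khoor", -3))    # decipher by shifting back
```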
Chapter 4
Statistics for Communications and Security
\[ n^x \times n^y = n^{x+y} \]
\[ \frac{n^x}{n^y} = n^{x-y} \]
\[ \log_n(x \times y) = \log_n(x) + \log_n(y) \]
\[ \log_n\left(\frac{x}{y}\right) = \log_n(x) - \log_n(y) \]
Example 4.1 (Properties of Exponentials). Properties can be applied to simplify calculations:
\begin{align*}
2^{12} &= 2^{2+10} \\
&= 2^2 \times 2^{10} \\
&= 4 \times 1024 \\
&= 4096
\end{align*}
With this property of exponentials, if you can remember the values of $2^1$ to $2^{10}$ then
you can approximate most values of $2^b$ that you come across in communications and
security. Table 4.1 gives the exact or approximate decimal value for b-bit numbers.
File: crypto/statistics.tex, r1791
Exponent, b (bits)    Exact value of $2^b$    Approx. value of $2^b$
0                     1                       -
1                     2                       -
2                     4                       -
3                     8                       -
4                     16                      -
5                     32                      -
6                     64                      -
7                     128                     -
8                     256                     -
9                     512                     -
10                    1,024                   1,000 = $10^3$
11                    -                       2,000
12                    -                       4,000
13                    -                       8,000
14                    -                       16,000
...
19                    -                       512,000
20                    -                       1,000,000 = $10^6$
21                    -                       $2 \times 10^6$
22                    -                       $4 \times 10^6$
23                    -                       $8 \times 10^6$
...
29                    -                       $512 \times 10^6$
30                    -                       $10^9$
31                    -                       $2 \times 10^9$
32                    -                       $4 \times 10^9$
33                    -                       $8 \times 10^9$
...
39                    -                       $512 \times 10^9$
40                    -                       $10^{12}$
50                    -                       $10^{15}$
60                    -                       $10^{18}$
70                    -                       $10^{21}$
$x \times 10$         -                       $10^{3x}$
Example 4.2 (Properties of Exponentials with Binary Values). Properties and approximations
can be used to perform large calculations:
\begin{align*}
\frac{2^{128}}{2^{100}} &= 2^{128-100} \\
&= 2^{28} \\
&= 2^8 \times 2^{20} \\
&\approx 256 \times 10^6 \\
&\approx 10^8
\end{align*}
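The approximation rule behind these examples, that every factor of 2^10 is treated as 10^3, can be checked with a short Python sketch. The function name approx_power_of_two is hypothetical; the loop compares the approximation with the exact value and prints the relative error.

```python
def approx_power_of_two(b: int) -> float:
    """Approximate 2^b using 2^10 ~ 10^3: split b into tens and a remainder."""
    tens, remainder = divmod(b, 10)
    return (2 ** remainder) * 10.0 ** (3 * tens)

for b in (12, 28, 32, 40):
    exact = 2 ** b
    approx = approx_power_of_two(b)
    print(b, exact, f"{approx:.3g}", f"error {100 * (exact - approx) / exact:.1f}%")
```

The error grows slowly with b because each factor of 1,024 is rounded down to 1,000, which is acceptable for order-of-magnitude estimates.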
4.2 Counting
Definition 4.1 (Number of Binary Values). Given an n-bit number, there are $2^n$ possible
values.
Example 4.5 (Number of IP Addresses). An IP address is a 32-bit value. There are $2^{32}$
or approximately $4 \times 10^9$ possible IP addresses.
Example 4.6 (Number of Keys). If choosing a 128-bit encryption key randomly, then
there are $2^{128}$ possible values of the key.
Video
Number of Binary Values (5 min; Jan 2015)
https://www.youtube.com/watch?v=AJU0BgwkXLU
Definition 4.2 (Fixed Length Sequences). Given a set of n items, there are $n^k$ possible
k-item sequences, assuming repetition is allowed.
Example 4.7 (Sequences of PINs). A user chooses a 4-digit PIN for a bank card. As
there are 10 possible digits, there are $10^4$ possible PINs to choose from.
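These counts are small enough to confirm by enumeration. The sketch below, using Python's standard itertools, lists every 4-digit PIN and compares the count with n^k; the variable names are illustrative.

```python
from itertools import product

digits = "0123456789"
pins = ["".join(p) for p in product(digits, repeat=4)]   # every 4-digit sequence

print(len(pins), 10 ** 4)     # n^k sequences with n = 10, k = 4
print(2 ** 32)                # number of 32-bit values (IP addresses)
print(2 ** 128)               # number of 128-bit keys
```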
Video
Fixed Length Sequences (7 min; Jan 2015)
https://www.youtube.com/watch?v=9srF2V1f1gU
Definition 4.3 (Pigeonhole Principle). If n objects are distributed over m places, and if
n > m, then some places receive at least two objects.
Video
Pigeonhole Principle (2 min; Jan 2015)
https://www.youtube.com/watch?v=sz9yPCGW2D4
Example 4.9 (Pigeonhole Principle on Balls). There are 20 balls to be placed in 5 boxes.
At least one box will have at least two balls. If the balls are distributed in a uniform
random manner among the boxes, then on average there will be 4 balls in each box.
Video
Pigeonhole Principle with Uniform Random Distribution (1 min; Jan 2015)
https://www.youtube.com/watch?v=PDCuL_SExu0
Example 4.10 (Pigeonhole Principle on Hash Functions). A hash function takes a 100-bit
input value and produces a 64-bit hash value. There are $2^{100}$ possible inputs distributed
to $2^{64}$ possible hash values. Therefore at least some input values will map to
the same hash value, that is, a collision occurs. If the hash function distributes the input
values in a uniform random manner, then on average, there will be
$2^{100}/2^{64} = 2^{36} \approx 6.4 \times 10^{10}$ (using $2^{36} = 2^6 \times 2^{30} \approx 64 \times 10^9$)
different input values mapping to the same hash value.
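The same effect can be observed at a scale small enough to enumerate. The following sketch assumes a toy hash, tiny_hash (hypothetical), that keeps only the first 8 bits of SHA-256, and feeds it every 12-bit input; the sizes are chosen for illustration and are not from the book.

```python
import hashlib
from collections import Counter

INPUT_BITS = 12    # hypothetical small sizes so every input can be enumerated
HASH_BITS = 8

def tiny_hash(value: int) -> int:
    """Toy hash: first byte (8 bits) of SHA-256 over the 2-byte input."""
    return hashlib.sha256(value.to_bytes(2, "big")).digest()[0]

buckets = Counter(tiny_hash(v) for v in range(2 ** INPUT_BITS))

print(2 ** INPUT_BITS / 2 ** HASH_BITS)   # average inputs per hash value
print(max(buckets.values()))              # size of the largest bucket
```

With 4,096 inputs and only 256 hash values, the pigeonhole principle guarantees that the largest bucket holds at least 16 inputs.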
Video
Pigeonhole Principle and Hash Functions (5 min; Jan 2015)
https://www.youtube.com/watch?v=5xjMuZIMLLk
Video
Factorial and arranging balls (2 min; Jan 2015)
https://www.youtube.com/watch?v=Ay_E8bsOXJw
Example 4.12 (Factorial and English Letters). The English alphabet has 26 letters,
a–z. There are $26! \approx 4 \times 10^{26}$ ways to arrange those 26 letters.
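A quick way to check such values is Python's standard math.factorial. As a small sanity check on the idea of arrangements, the sketch also enumerates the orderings of three letters with itertools.permutations; the variable names are illustrative.

```python
import math
from itertools import permutations

arrangements = math.factorial(26)
print(arrangements)                 # exact number of arrangements of 26 letters
print(f"{arrangements:.1e}")        # the same value in scientific notation

small = ["".join(p) for p in permutations("abc")]
print(small, len(small))            # 3! = 6 arrangements of three letters
```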
Video
Arranging English Letters (2 min; Jan 2015)
https://www.youtube.com/watch?v=ksilZXfwuQs
possible keys.
Video
Number of keys for ideal block cipher (6 min; Jan 2015)
https://www.youtube.com/watch?v=iQBLbz0w99s
Example 4.14 (Pairs of Coloured Balls). There are four coloured balls: Red, Green,
Blue and Yellow. The number of different coloured pairs of balls is 4×3/2 = 6. They are:
RG, RB, RY, GB, GY, BY. Repetitions are not allowed (as they won’t produce different
coloured pairs), meaning RR is not a valid pair. Ordering doesn’t matter, meaning RG is
the same as GR.
Example 4.15 (Pairs of Network Devices). A computer network has 10 devices. The
number of links needed to create a full-mesh topology is 10 × 9/2 = 45.
Example 4.16 (Pairs of Key Sharers). There are 50 users in a system, and each user
shares a single secret key with every other user. The number of keys in the system is
50 × 49/2 = 1,225.
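The pair counts in these examples follow the formula n(n − 1)/2 and can be checked in Python. The helper name number_of_pairs is hypothetical; itertools.combinations and math.comb are standard library functions.

```python
from itertools import combinations
from math import comb

balls = ["R", "G", "B", "Y"]
pairs = ["".join(p) for p in combinations(balls, 2)]   # unordered, no repetition
print(pairs)

def number_of_pairs(n: int) -> int:
    """Number of unordered pairs that can be formed from n distinct items."""
    return n * (n - 1) // 2

print(number_of_pairs(10), number_of_pairs(50))
print(comb(50, 2))     # the same count using the binomial coefficient
```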
Video
Number of Pairs from n Items (5 min; Jan 2015)
https://www.youtube.com/watch?v=ZykkvK_Hu5g
4.4 Probability
In this chapter when referring to a “random” number it means taken from a uniform
random distribution. That means there is equal probability of selecting each value from
the set.
Example 4.17 (Probability of Selecting Coloured Ball). There are five coloured balls in
a box: red, green, blue, yellow and black. The probability of selecting the yellow ball is
1/5.
Example 4.18 (Probability of Selecting Backoff Value). IEEE 802.11 (WiFi) involves a
station selecting a random backoff from 0 to 15. The probability of selecting 5 is 1/16.
Video
Probability of Selecting a Particular Value from a Set (2 min; Jan 2015)
https://www.youtube.com/watch?v=hB5Hs4QPUUQ
Definition 4.8 (Total Expectation). For a set of n events which are mutually exclusive
and exhaustive, where for event i the expected value is $E_i$ given probability $P_i$, then the
total expected value is:
\[ E = \sum_{i=1}^{n} E_i P_i \]
Video
Total Expectation Definition (1 min; Jan 2015)
https://www.youtube.com/watch?v=HiHIE9oFeiU
Example 4.19 (Total Expectation of Packet Delay). Average packet delay for packets
in a network is 100 ms along path 1 and 150 ms along path 2. Packets take path 1 30% of
the time, and take path 2 70% of the time. The average packet delay across both paths
is: 100 × 0.3 + 150 × 0.7 = 135 ms.
Video
Total Expectation and Packet Delay (3 min; Jan 2015)
https://www.youtube.com/watch?v=-yxbhR-EeHQ
Example 4.20 (Total Expectation of Password Length). In a network with 1,000 users,
150 users choose a 6-character password, 500 users choose a 7-character password, 250
users choose a 9-character password and 100 users choose a 10-character password. The
average password length is 6 × 0.15 + 7 × 0.5 + 9 × 0.25 + 10 × 0.1 = 7.65 characters.
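Both total expectation examples can be reproduced with a short Python sketch. It uses exact fractions to avoid floating point rounding; the function name total_expectation is hypothetical.

```python
from fractions import Fraction

def total_expectation(events):
    """events: iterable of (expected value E_i, probability P_i) pairs."""
    events = list(events)
    assert sum(p for _, p in events) == 1, "probabilities must be exhaustive"
    return sum(e * p for e, p in events)

# Packet delay: 100 ms on path 1 (30%), 150 ms on path 2 (70%)
delay = total_expectation([(100, Fraction(30, 100)), (150, Fraction(70, 100))])

# Password length: share of the 1,000 users choosing each length
length = total_expectation([
    (6, Fraction(150, 1000)), (7, Fraction(500, 1000)),
    (9, Fraction(250, 1000)), (10, Fraction(100, 1000)),
])
print(float(delay), float(length))
```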
Video
Total Expectation and Password Selection (3 min; Jan 2015)
https://www.youtube.com/watch?v=zTX7ENu-F20
Video
Number of Attempts Needed to Randomly Select a Value (1 min; Jan 2015)
https://www.youtube.com/watch?v=brDlrkuiH50
Example 4.21 (Number of Attempts in Choosing Number). One person has chosen
a random number between 1 and 10. Another person attempts to guess the random
number. The best case is that they guess the chosen number on the first attempt. The
worst case is that they try all other numbers before finally getting the correct number,
that is 10 attempts. If the process is repeated 1000 times (that is, one person chooses
a random number, the other guesses, then the person chooses another random number,
and the other guesses again, and so on), then on average 10% of time it will take 1
attempt (best case), 10% of the time it will take 2 attempts, 10% of the time it will take
3 attempts, . . . , and 10% of the time it will take 10 attempts (worst case). The average
number of attempts is therefore 5.5, or approximately half the number of possible values.
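The averaging in this example is a direct application of total expectation: each of the n outcomes has probability 1/n, so the exact average is (n + 1)/2, which is close to n/2 when n is large. A minimal Python sketch, with the hypothetical function name average_attempts:

```python
def average_attempts(n: int) -> float:
    """Average guesses when each of n values is equally likely to be the secret."""
    return sum(range(1, n + 1)) / n     # equals (n + 1) / 2

print(average_attempts(10))
print(average_attempts(2 ** 16), 2 ** 16 / 2)   # (n + 1) / 2 is close to n / 2 for large n
```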
Video
Attempts to select a value between 1 and 10 (5 min; Jan 2015)
https://www.youtube.com/watch?v=nQUda8Uq-Ho
Example 4.22 (Number of Attempts in Choosing Key). A user has chosen a random
128-bit encryption key. There are $2^{128}$ possible keys. It takes an attacker on average
$2^{128}/2 = 2^{127}$ attempts to find the key. If instead a 129-bit encryption key was used, then
the attacker would take on average $2^{129}/2 = 2^{128}$ attempts. (Increasing the key length
by 1 bit doubles the number of attempts required by the attacker to guess the key.)
Video
Attempts to guess a secret key (3 min; Jan 2015)
https://www.youtube.com/watch?v=8IttaYPN4MA
4.5 Collisions
Definition 4.10 (Birthday Paradox). Given n random numbers selected from the range
1 to d, the probability that at least two numbers are the same is:
\[ p(n; d) \approx 1 - \left(\frac{d-1}{d}\right)^{n(n-1)/2} \]
Example 4.23 (Two People Have Same Birthday). Given a group of 10 people, the
probability that at least two people have the same birth date (not year) is:
\[ p(10; 365) \approx 1 - \left(\frac{364}{365}\right)^{10(9)/2} = 11.6\% \]
Definition 4.10 can be re-arranged to find the number of values needed to obtain a
specified probability that at least two numbers are the same:
\[ n(p; d) \approx \sqrt{2d \ln\left(\frac{1}{1-p}\right)} \]
Example 4.24 (Group Size for Birthday Matching). How many people in a group are
needed such that the probability of at least two of them having the same birth date is
50%?
\[ n(0.5; 365) \approx \sqrt{2 \times 365 \times \ln\left(\frac{1}{1-0.5}\right)} = 22.49 \]
So 23 people in a group means there is approximately a 50% chance that at least two have
the same birth date.
Example 4.25 (Group Size for Hash Collision). Given a hash function that outputs a
64-bit hash value, how many attempts are needed to give a 50% chance of a collision?
\begin{align*}
n(0.5; 2^{64}) &\approx \sqrt{2 \times 2^{64} \times \ln\left(\frac{1}{1-0.5}\right)} \\
&\approx \sqrt{2^{64}} \\
&= 2^{32}
\end{align*}
Following Example 4.25, the number of attempts to produce a collision when using
an n-bit hash function is approximately $2^{n/2}$.
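The two birthday approximations above translate directly into Python. The function names collision_probability and values_needed are hypothetical; the calls reproduce the three worked examples.

```python
import math

def collision_probability(n: int, d: int) -> float:
    """Approximate p(n; d) = 1 - ((d - 1) / d) ** (n (n - 1) / 2)."""
    return 1 - ((d - 1) / d) ** (n * (n - 1) / 2)

def values_needed(p: float, d: int) -> float:
    """Approximate n(p; d) = sqrt(2 d ln(1 / (1 - p)))."""
    return math.sqrt(2 * d * math.log(1 / (1 - p)))

print(f"{collision_probability(10, 365):.1%}")     # 10 people, 365 birth dates
print(f"{values_needed(0.5, 365):.2f}")            # group size for a 50% chance
print(f"{math.log2(values_needed(0.5, 2 ** 64)):.2f}")   # exponent of 2, close to 32
```

For the 64-bit hash the result is slightly above 2^32 because the factor sqrt(2 ln 2) ≈ 1.18 is dropped in the rounded answer.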
Chapter 5
Number Theory
This chapter introduces basic concepts of number theory. These concepts are useful when
studying several aspects of cryptography, especially public key cryptosystems.
Presentation slides that accompany this chapter can be downloaded in the following
formats: slides only (PDF); slides with notes (PDF, ODP, PPTX).
Example 5.1 (Divides). 3 divides 12, since 12 = 4 × 3. Also, 3 is a divisor of 12, or 3|12.
Definition 5.2 (Greatest Common Divisor). gcd(a, b) returns the greatest common divisor
of integers a and b. There are efficient algorithms for finding the gcd, e.g. the Euclidean
algorithm.
Example 5.2 (Greatest Common Divisor). gcd(12, 20) = 4, since the divisors of 12 are
(1, 2, 3, 4, 6, 12) and the divisors of 20 are (1, 2, 4, 5, 10, 20).
Definition 5.3 (Relatively Prime). Two integers, a and b, are relatively prime if gcd(a, b) =
1.
Example 5.3 (Relatively Prime). gcd(7, 12) = 1, since the divisors of 7 are (1, 7) and
the divisors of 12 are (1, 2, 3, 4, 6, 12). Therefore 7 and 12 are relatively prime to each
other.
Exercise 5.1 (Relatively Prime). How many positive integers less than 10 are relatively
prime with 10?
Solution 5.1 (Relatively Prime). There are 9 positive integers less than 10, i.e. 1, 2, 3, . . . , 9.
For an integer a to be relatively prime to 10, gcd(a, 10) = 1 must hold. The divisors of 10
are 1, 2, 5 and 10. As the even integers have a divisor of 2, they cannot be relatively prime
with 10. That leaves 1, 3, 5, 7 and 9. gcd(5, 10) = 5 and therefore 5 is not relatively
prime with 10. The integers 1, 3, 7 and 9 cannot be divided by 2, 5 or 10, and therefore
all have a greatest common divisor with 10 of 1. Hence 1, 3, 7 and 9 are less than 10 and
relatively prime with 10. The answer is 4.
File: crypto/number.tex, r1963
Video
Divisibility, Greatest Common Divisor and Relatively Prime (10 min; Apr 2021)
https://www.youtube.com/watch?v=c5t9WuP8C1w
Definition 5.4 (Prime Number). An integer p > 1 is a prime number if and only if its
only divisors are +1, −1, +p and −p.
Example 5.4 (Prime Number). The divisors of 13 are (1, 13), that is, 1 and itself.
Therefore 13 is a prime number. The divisors of 15 are (1, 3, 5, 15). Since the divisors
include numbers other than 1 and itself, 15 is not prime.
Definition 5.5 (Prime Factors). Any integer a > 1 can be factored as:
\[ a = p_1^{a_1} \times p_2^{a_2} \times \cdots \times p_t^{a_t} \]
where $p_1 < p_2 < \ldots < p_t$ are prime numbers and where each $a_i$ is a positive integer.
Example 5.5 (Prime Factors). The following are examples of integers expressed as prime
factors:
$13 = 13^1$
$15 = 3^1 \times 5^1$
$24 = 2^3 \times 3^1$
$50 = 2^1 \times 5^2$
$560 = 2^4 \times 5^1 \times 7^1$
$2800 = 2^4 \times 5^2 \times 7^1$
Exercise 5.2 (Integers as Prime Factors). Find the prime factors of 12870, 12936 and
30607.
Solution 5.2 (Integers as Prime Factors). A naive approach (which works for these
small examples) is to check if the number is divisible by primes, in increasing order. For
example, is 12870 divisible by 2? Yes, so 2 is a prime factor. Is the result, 6435, divisible
by 2? No. Is 6435 divisible by 3? Yes, and so on.
A “cheat” is to use software to find the factors. On the Linux command line there is a
command called factor.
The answers are:
$12870 = 2 \times 3^2 \times 5 \times 11 \times 13$
$12936 = 2^3 \times 3 \times 7^2 \times 11$
$30607 = 127 \times 241$
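The naive approach described in the solution can be sketched as trial division. The function name prime_factors is hypothetical. The loop tries every integer divisor rather than only primes, but a composite candidate can never divide the remaining value because its prime factors have already been removed.

```python
def prime_factors(n):
    """Trial division: return a dict mapping each prime factor to its exponent."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:          # divide out d as many times as possible
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:                      # whatever remains is itself prime
        factors[n] = factors.get(n, 0) + 1
    return factors

for value in (12870, 12936, 30607):
    print(value, prime_factors(value))
```

This only works in reasonable time for small numbers, which is the point of the prime factorization problem.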
Definition 5.6 (Prime Factorization Problem). There are no known efficient, non-
quantum algorithms that can find the prime factors of a sufficiently large number.
Example 5.6 (Prime Factorization Problem). The RSA Challenge involved researchers
attempting to factor large numbers. The size of a number is measured in bits or decimal
digits. Some records held over time are:
1991: 330 bits or 100 digits
2005: 640 bits or 193 digits
2009: 768 bits or 232 digits
Factoring the 768-bit number took the equivalent of 2000 years on a single core 2.2 GHz
computer. Current algorithms such as RSA rely on numbers of 1024, 2048 and even 4096
bits in length.
Video
Prime Numbers and Prime Factorization (11 min; Apr 2021)
https://www.youtube.com/watch?v=i_LXZjK7Z98
Definition 5.7 (Euler’s Totient Function). Euler’s totient function, φ(n), is the number
of positive integers less than n and relatively prime to n. Also written as ϕ(n) or Tot(n).
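Definition 5.8 (Properties of Euler’s Totient). Several useful properties of Euler’s totient
are:
φ(1) = 1
For prime p, φ(p) = p − 1
For distinct primes p and q, and n = pq, φ(n) = φ(p) × φ(q) = (p − 1) × (q − 1)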
Example 5.7 (Euler’s Totient Function). The integers relatively prime to 10, and less
than 10, are: 1, 3, 7, 9. There are 4 such numbers. Therefore φ(10) = 4.
The integers relatively prime to 11, and less than 11, are: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
There are 10 such numbers. Therefore φ(11) = 10. The property φ(p) = p − 1 for prime p
could also be used, since 11 is prime.
Since 7 is prime, φ(7) = 6.
Since 77 = 7 × 11, then φ(77) = φ(7 × 11) = 6 × 10 = 60.
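The totient values in this example can be checked by counting directly from Definition 5.7. This is a brute-force sketch (the function name totient is hypothetical) and is only practical for small n.

```python
from math import gcd

def totient(n):
    """Count the positive integers less than n that are relatively prime to n.
    phi(1) is defined as 1."""
    if n == 1:
        return 1
    return sum(1 for a in range(1, n) if gcd(a, n) == 1)

for n in (7, 10, 11, 77):
    print(n, totient(n))
```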
Video
Euler’s Totient Function (8 min; Apr 2021)
https://www.youtube.com/watch?v=lnmbs-rPT-I
Example 5.8 (mod operator). The following are several examples of mod:
3 mod 7 = 3, since 0 × 7 + 3 = 3
9 mod 7 = 2, since 1 × 7 + 2 = 9
10 mod 7 = 3, since 1 × 7 + 3 = 10
(−3) mod 7 = 4, since (−1) × 7 + 4 = −3
Definition 5.11 (Congruent modulo). Two integers a and b are congruent modulo n if
(a mod n) = (b mod n). The congruence relation is written as:
a ≡ b (mod n)
When the modulus is known from the context, it may be written simply as a ≡ b.
3 ≡ 10 (mod 7)
14 ≡ 4 (mod 10)
3 ≡ 11 (mod 8)
Z7 = {0, 1, 2, 3, 4, 5, 6}
• Two integers a and b are congruent modulo n if (a mod n) = (b mod n), which is
written as
a ≡ b (mod n)
• (mod n) operator maps all integers into the set of integers Zn = {0, 1, . . . , (n−1)}
Example 5.11 (Modular Addition). The following are several examples of modular
addition:
2 + 3 (mod 7) = 5 (mod 7) = 5 mod 7 = 5 (mod 7)
2 + 6 (mod 7) = 8 (mod 7) = 8 mod 7 = 1 (mod 7)
6 + 6 (mod 7) = 12 (mod 7) = 12 mod 7 = 5 (mod 7)
3 + 4 (mod 7) = 7 (mod 7) = 7 mod 7 = 0 (mod 7)
Example 5.13 (Modular Subtraction). For brevity, the modulus is sometimes omitted
and = is used in place of ≡. In mod 7:
6 − 3 = 6 + AI(3) = 6 + 4 = 10 = 3 (mod 7)
6 − 1 = 6 + AI(1) = 6 + 6 = 12 = 5 (mod 7)
1 − 3 = 1 + AI(3) = 1 + 4 = 5 (mod 7)
While the first two examples obviously give answers as we expect from normal subtraction,
the third does as well. 1 − 3 = −2, and in mod 7, −2 ≡ 5 since −1 × 7 + 5 = (−2).
Recall Z7 = {0, 1, 2, 3, 4, 5, 6}.
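The subtraction examples can be reproduced in Python, whose % operator returns a non-negative result when the modulus is positive. The helper names additive_inverse and mod_subtract are hypothetical, chosen to mirror the AI() notation used above.

```python
def additive_inverse(a, n):
    """AI(a) in mod n: the value that adds with a to give 0 (mod n)."""
    return (-a) % n

def mod_subtract(a, b, n):
    """a - b in mod n, computed as a + AI(b)."""
    return (a + additive_inverse(b, n)) % n

print(additive_inverse(3, 7))   # 4
print(mod_subtract(6, 3, 7))    # 3
print(mod_subtract(1, 3, 7))    # 5
print((-3) % 7)                 # 4
```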
Video
Modular Addition, Additive Inverse and Modular Subtraction (13 min; Apr 2021)
https://www.youtube.com/watch?v=9uQe-7Fux9w
2 × 3 = 6 (mod 7)
2 × 6 = 12 = 5 (mod 7)
3 × 4 = 12 = 5 (mod 7)
Definition 5.17 (Multiplicative Inverse). a is a multiplicative inverse of b in mod n if
a×b ≡ 1 (mod n). For brevity, MI(a) may be used to indicate the multiplicative inverse
of a. a has a multiplicative inverse in (mod n) if a is relatively prime to n.
Example 5.15 (Multiplicative Inverse in mod 7). 2 and 7 are relatively prime, therefore
2 has a multiplicative inverse in mod 7.
φ(7) = 6, meaning 1, 2, 3, 4, 5 and 6 are relatively prime with 7, and therefore all of
those numbers have a MI in mod 7.
Example 5.16 (Multiplicative Inverse in mod 8). 3 and 8 are relatively prime, therefore
3 has a multiplicative inverse in mod 8.
4 and 8 are NOT relatively prime, therefore 4 does not have a multiplicative inverse in
mod 8. φ(8) = 4, and therefore only 4 numbers (1, 3, 5, 7) have a MI in mod 8.
In mod 7:
5 ÷ 2 = 5 × MI(2) = 5 × 4 = 20 ≡ 6
In mod 8:
7 ÷ 3 = 7 × MI(3) = 7 × 3 = 21 ≡ 5
7 ÷ 4 is undefined, since 4 does not have a multiplicative inverse in mod 8.
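The multiplicative inverse and division examples can be checked with a short sketch. It uses the extended Euclidean algorithm, which is one common way to find an inverse and is an assumption here rather than the method the notes prescribe. The function names are hypothetical.

```python
def multiplicative_inverse(a, n):
    """Return MI(a) in mod n, or None if a and n are not relatively prime.
    Extended Euclid: keep s such that s * a is congruent to r (mod n)."""
    r0, r1 = a % n, n
    s0, s1 = 1, 0
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        s0, s1 = s1, s0 - q * s1
    if r0 != 1:                    # gcd(a, n) != 1, so no inverse exists
        return None
    return s0 % n

def mod_divide(a, b, n):
    """a divided by b in mod n, i.e. a * MI(b); None if MI(b) does not exist."""
    inverse = multiplicative_inverse(b, n)
    if inverse is None:
        return None
    return (a * inverse) % n

print(multiplicative_inverse(2, 7))  # 4
print(mod_divide(5, 2, 7))           # 6
print(mod_divide(7, 3, 8))           # 5
print(mod_divide(7, 4, 8))           # None: 4 has no inverse in mod 8
```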
Definition 5.19 (Properties of Modular Arithmetic).
Video
Modular Multiplication, Multiplicative Inverse (6 min; Apr 2021)
https://www.youtube.com/watch?v=hh0Nb_Gp-w0
Definition 5.21 (Fermat’s Theorem 2). If p is prime and a is a positive integer, then:
\[ a^{p} \equiv a \pmod{p} \]
There are two forms of Fermat’s theorem—use whichever form is most convenient.
Example 5.18 (Fermat’s theorem). What is $27^{42} \bmod 43$? Since 43 is prime and 42 =
43 − 1, this matches Fermat’s Theorem form 1. Therefore the answer is 1.
Example 5.19 (Fermat’s theorem). What is $640^{163} \bmod 163$? Since 163 is prime, this
matches Fermat’s Theorem form 2. Therefore the answer is 640, or simplified to 640 mod
163 = 151.
Definition 5.22 (Euler’s Theorem 1). For every a and n that are relatively prime:
\[ a^{\varphi(n)} \equiv 1 \pmod{n} \]
\[ a^{\varphi(n)+1} \equiv a \pmod{n} \]
Note that there are two forms of Euler’s theorem—use the most relevant form.
Example 5.20 (Euler’s theorem). Show that $37^{40} \bmod 41 = 1$. Since n = 41, which is
prime, then φ(41) = 40. As 37 is also prime, 37 and 41 are relatively prime. Therefore
Euler’s Theorem form 1 holds.
Example 5.21 (Euler’s theorem). What is $13794^{4621} \bmod 4757$? Factoring 4757 into
primes gives 67 × 71. Therefore φ(4757) = φ(67) × φ(71) = 66 × 70 = 4620. Since
4621 = φ(4757) + 1, this follows Euler’s Theorem form 2, giving an answer of 13794, or
simplified to 13794 mod 4757 = 4280.
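The Fermat and Euler examples can be verified numerically with Python's built-in three-argument pow, which performs modular exponentiation. This is only a check of the worked examples, not a proof of the theorems.

```python
# Fermat's theorem, form 1: a^(p-1) mod p = 1 for prime p, a not divisible by p
print(pow(27, 42, 43))

# Fermat's theorem, form 2: a^p mod p = a mod p for prime p
print(pow(640, 163, 163), 640 % 163)

# Euler's theorem, form 1: a^phi(n) mod n = 1 when gcd(a, n) = 1
print(pow(37, 40, 41))

# Euler's theorem, form 2: a^(phi(n)+1) mod n = a mod n, with phi(4757) = 4620
print(pow(13794, 4621, 4757), 13794 % 4757)
```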
Video
Fermat’s and Euler’s Theorems (16 min; Apr 2021)
https://www.youtube.com/watch?v=k0NBKQ8W90U
$2^3 \bmod 7 = 8 \bmod 7 = 1$
$3^4 \bmod 7 = 81 \bmod 7 = 4$
$3^6 \bmod 8 = 729 \bmod 8 = 1$
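For small numbers the power can be computed first and then reduced, as in the examples above. For the large exponents used in cryptography that is impractical, so implementations reduce after every multiplication. The sketch below uses the square-and-multiply technique, which is a standard method but is an addition here and not something these notes describe. The function name mod_exp is hypothetical and the modulus is assumed to be greater than 1.

```python
def mod_exp(base, exponent, modulus):
    """Square-and-multiply: process the exponent bit by bit, reducing
    modulo `modulus` after each multiplication so values stay small."""
    result = 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:                     # current lowest bit is 1
            result = (result * base) % modulus
        base = (base * base) % modulus       # square for the next bit
        exponent >>= 1
    return result

print(mod_exp(2, 3, 7), mod_exp(3, 4, 7), mod_exp(3, 6, 8))
```

Python's built-in pow(base, exponent, modulus) gives the same results.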
Video
Modular Exponentiation (1 min; Apr 2021)
https://www.youtube.com/watch?v=Nj6GxMeYRO4
\[ i = \log_a(b) \]
The above definition is for normal arithmetic, not for modular arithmetic. Logarithm
in normal arithmetic is the inverse operation of exponentiation. In modular arithmetic,
modular logarithm is more commonly called discrete logarithm. Note we replace n with
p—the reason will become apparent shortly.
\[ i = \mathrm{dlog}_{a,p}(b) \]
Video
Normal and Discrete Logarithms (3 min; Apr 2021)
https://www.youtube.com/watch?v=T17nhLEWwoA
Definition 5.28 (Hard Problem: Integer Factorisation). If p and q are unknown primes,
given n = pq, find p and q.
Also known as prime factorisation. While someone who knows p and q can easily
calculate n, an attacker who knows only a sufficiently large n cannot feasibly find p and q.
Definition 5.29 (Hard Problem: Euler’s Totient). Given composite n, find φ(n).
Definition 5.30 (Hard Problem: Discrete Logarithms). Given b, a, and p, find i such
that $i = \mathrm{dlog}_{a,p}(b)$.
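To see why the discrete logarithm is considered hard, consider the obvious brute-force search. The sketch below assumes the usual meaning of the notation, namely that $i = \mathrm{dlog}_{a,p}(b)$ is an exponent satisfying $a^i \bmod p = b$. The function name discrete_log is hypothetical.

```python
def discrete_log(a, b, p):
    """Try exponents i = 0, 1, 2, ... until a^i mod p equals b.
    Returns the smallest such i, or None if there is none."""
    value = 1                      # a^0
    for i in range(p):
        if value == b % p:
            return i
        value = (value * a) % p
    return None

print(discrete_log(3, 4, 7))   # 4, since 3^4 = 81 and 81 mod 7 = 4
print(discrete_log(2, 3, 7))   # None: powers of 2 in mod 7 are only 1, 2, 4
```

The loop runs up to p times, so for a prime p of thousands of bits the search is infeasible, while computing $a^i \bmod p$ in the forward direction stays fast.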
Video
Computationally Hard Problems for Cryptography (4 min; Apr 2021)
https://www.youtube.com/watch?v=fJXdXNxeNVs
Part III
Chapter 6
Classical Ciphers
This chapter introduces several historical or classical ciphers. While these ciphers are no
longer used, they are simple enough to perform operations by hand while demonstrating
important concepts used in the design of most symmetrical ciphers used today. The
actual history of the ciphers is not presented here; you can find that in most cryptography
textbooks or via searches online.
Presentation slides that accompany this chapter can be downloaded in the following
formats: slides only (PDF); slides with notes (PDF, ODP, PPTX).
In the examples we will assume the Caesar cipher (and most other classical ciphers)
operates on case-insensitive English plaintext. That is, the character set is a through to z.
However it can also be applied to any language or character set, so long as the character
set is agreed upon by the users.
Exercise 6.1 (Caesar Cipher Encryption). Using the Caesar cipher, encrypt plaintext
hello with key 3.
Solution 6.1 (Caesar Cipher Encryption). To encrypt the plaintext hello with the key
3, each letter in the plaintext is encrypted by shifting 3 positions to the right in the
alphabet. The letter 3 positions to the right of h is K, as illustrated below:
a b c d e f g h i j K l m n o p q r s t u v w x y z
The letter 3 positions to the right of l is O (noting that there are two l’s in the plaintext,
so there will be two O’s in the ciphertext):
a b c d e f g h i j k l m n O p q r s t u v w x y z
Video
Caesar Cipher Encryption Example (2 min; Feb 2020)
https://www.youtube.com/watch?v=HILcygVamnU
Question 6.1 (How many keys are possible in the Caesar cipher?). If the Caesar cipher
is operating on the characters a–z, then how many possible keys are there? Is a key of 0
possible? Is it a good choice? What about a key of 26?
Video
Number of Keys in Caesar Cipher (3 min; Feb 2020)
https://www.youtube.com/watch?v=Uk1k_GA_2Y0
Exercise 6.2 (Caesar Cipher Decryption). You have received the ciphertext TBBQOLR.
You know the Caesar cipher was used with key n. Find the plaintext.
Solution 6.2 (Caesar Cipher Decryption). To decrypt the ciphertext TBBQOLR with the
key n, each letter in the ciphertext is decrypted by shifting n=13 positions to the left in
the alphabet. The letter 13 positions to the left of T is g, as illustrated below:
A B C D E F g H I J K L M N O P Q R S T U V W X Y Z
Therefore, the first three letters of the plaintext so far are goo. You can continue as
above to find the final plaintext is goodbye.
Video
Caesar Cipher Decryption Example (3 min; Feb 2020)
https://www.youtube.com/watch?v=N6YwWnkXh8M
We will now look at the Caesar cipher from a mathematical perspective. By treating
each letter in the alphabet as a number, we can write equations that define the encrypt
and decrypt operations on each letter.
In the equations, P is the numerical value of a plaintext letter. Letters are numbered
in alphabetical order starting at 0. That is, a=0, b=1, . . . , z=25. Similarly, K and C
are the numerical values of the key and ciphertext letter, respectively. Shifting to the
right in encryption is addition, while shifting to the left in decryption is subtraction. To
cater for the wrap around (e.g. when the letter z is reached), the last step is to mod by
the total number of characters in the alphabet.
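The description above translates directly into code. The following is a minimal sketch assuming the usual forms C = (P + K) mod 26 and P = (C − K) mod 26. The function names are hypothetical, input is assumed to contain only the letters a to z, and, following the convention in these notes, ciphertext is shown in uppercase and plaintext in lowercase.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def caesar_encrypt(plaintext, key):
    """C = (P + K) mod 26 for each letter."""
    shifted = [ALPHABET[(ALPHABET.index(ch) + key) % 26] for ch in plaintext.lower()]
    return "".join(shifted).upper()

def caesar_decrypt(ciphertext, key):
    """P = (C - K) mod 26 for each letter."""
    shifted = [ALPHABET[(ALPHABET.index(ch) - key) % 26] for ch in ciphertext.lower()]
    return "".join(shifted)

print(caesar_encrypt("hello", 3))
print(caesar_decrypt("TBBQOLR", ALPHABET.index("n")))
```

A key given as a letter, such as n, is converted to its number (13) with ALPHABET.index.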
Solution 6.3 (Caesar Cipher Decryption). Key p means K = 15. The first ciphertext
letter is S, so C1 = 18. Using the decrypt equation:
P1 = (C1 − K) mod 26
= (18 − 15) mod 26
= 3 mod 26
= 3
P2 = (C2 − K) mod 26
= (3 − 15) mod 26
= (−12) mod 26
= 14
P3 = (C3 − K) mod 26
= (21 − 15) mod 26
= (6) mod 26
= 6
Therefore the third plaintext letter is g, and the entire plaintext is dog.
Video
Caesar Cipher Decryption using Mathematical Approach (4 min; Feb 2020)
https://www.youtube.com/watch?v=yvLYP7zxnkA
Note that the pycipher package needs to be installed and imported first (see Sec-
tion 3.3.2).
Exercise 6.4 (Caesar Brute Force). The ciphertext FRUURJVBCANNC was obtained using
the Caesar cipher. Find the plaintext using a brute force attack.
Solution 6.4. As a naive approach, try all possible keys, and then check the plaintext
values obtained. If one is recognisable, then you have most likely found the correct plaintext.
Without any knowledge of which key was used, one approach is to try the keys in order.
For example, try key 1, and then key 2, then key 3. (In theory you could try key 0, but
we know that in the Caesar cipher it does nothing.)
The range function in Python produces values inclusive of the lower limit and exclusive
of the upper limit. That is, range(0, 26) produces the values from 0 to 25.
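A brute force attack along these lines can be sketched as follows. The notes use the pycipher package; this version instead uses a small self-contained decrypt function so that it runs without extra packages, and the function names are hypothetical.

```python
def caesar_decrypt(ciphertext, key):
    """Shift each uppercase ciphertext letter `key` positions to the left."""
    plaintext = ""
    for ch in ciphertext.upper():
        shifted = (ord(ch) - ord("A") - key) % 26
        plaintext += chr(shifted + ord("a"))
    return plaintext

def brute_force(ciphertext):
    """Return a dict mapping every key 0..25 to the candidate plaintext."""
    return {key: caesar_decrypt(ciphertext, key) for key in range(0, 26)}

for key, candidate in brute_force("FRUURJVBCANNC").items():
    print(key, candidate)
```

Reading down the 26 candidates, only one is recognisable English, and the key printed beside it is the key that was used.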
The results of the brute force are formatted to show the key (it is slightly different
from the Python code output).
Video
Brute Force Attack on Caesar Cipher with Python (5 min; Feb 2020)
https://www.youtube.com/watch?v=GpfoaxcxHWs
Question 6.2 (How many attempts for Caesar brute force?). What is the worst, best
and average case of number of attempts to brute force ciphertext obtained using the
Caesar cipher?
There are 26 letters in the English alphabet. The key can therefore be one of 26
values, 0 through to 25. The key of 26 is equivalent to a key of 0, since it will encrypt
to the same ciphertext. The same applies for all values greater than 25. While a key of
0 is not very smart, let’s assume it is a valid key.
The best case for the attacker is that the first key they try is the correct key (i.e. 1
attempt). The worst case is the attacker must try all the wrong keys until they finally
try the correct key (i.e. 26 attempts). Assuming the encrypter chose the key randomly,
there is equal probability that the attacker will find the correct key in 1 attempt (1/26),
as in 2 attempts (1/26), as in 3 attempts (1/26), and as in 26 attempts (1/26). The
average number of attempts can be calculated as (26+1)/2 = 13.5.
Assumption 6.1 (Recognisable Plaintext upon Decryption). The decrypter will be able
to recognise that the plaintext is correct (and therefore the key is correct). Decrypting
ciphertext using the incorrect key will not produce the original plaintext. The decrypter
will be able to recognise that the key is wrong, i.e. the decryption will produce unrecog-
nisable output.
Question 6.3 (Is plaintext always recognisable?). The Caesar cipher examples use
recognisably correct plaintext, i.e. English words. But is the correct plaintext always recognisable?
What if the plaintext was a different language? Or compressed? Or it was an image or
video? Or binary file, e.g. .exe? Or a set of characters chosen randomly, e.g. a key or
password?
The correct plaintext is recognisable if it contains some structure. That is, it does
not appear random. It is common in practice to add structure to the plaintext, making
it relatively easy to recognise the correct plaintext. For example, network packets have
headers/trailers or error detecting codes. Later we will see cryptographic mechanisms
that can be used to ensure that the correct plaintext will be recognised. For now, let’s
assume it can be.
There are two ways to improve the Caesar cipher:
Exercise 6.5 (Decrypt Monoalphabetic Cipher). Decrypt the ciphertext QSWBSR using
the permutation chosen in the previous example.
Solution 6.5 (Decrypt Monoalphabetic Cipher). A simple lookup on the mapping de-
fined in the example returns the plaintext secret.
Video
Mono-alphabetic Substitution Cipher Example (3 min; Feb 2020)
https://www.youtube.com/watch?v=RWoDvO2WQ0A
Question 6.4 (How many keys in English monoalphabetic cipher?). How many possible
keys are there for a monoalphabetic cipher that uses the English lowercase letters? What
is the length of an actual key?
Consider the number of permutations possible. The example used a single permuta-
tion chosen by the two parties.
Video
Number of Keys in an English Monoalphabetic Substitution Cipher (3 min; Feb 2020)
https://www.youtube.com/watch?v=XXCHks0vMW0
Solution 6.6 (Brute Force on Monoalphabetic Cipher). With a 26 letter alphabet, there
are 26! permutations or keys. The average number of keys to try in a brute force attack
is (26! + 1)/2, or approximately half of them, 26!/2. The Python code can try $10^9$ keys
per second. Therefore the average brute force time, T, is:
\[ T = \frac{(26! + 1)/2}{10^{9}} \approx \frac{2 \times 10^{26}}{10^{9}} \approx 2 \times 10^{17} \text{ seconds} \approx 64 \text{ million centuries} \]
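The arithmetic can be reproduced in Python. The rate of $10^9$ keys per second is taken from the solution; the conversion assumes 365-day years, which is an approximation.

```python
import math

keys = math.factorial(26)                  # number of permutations of 26 letters
average_attempts = (keys + 1) / 2          # average case for brute force
keys_per_second = 10 ** 9

seconds = average_attempts / keys_per_second
centuries = seconds / (365 * 24 * 3600 * 100)

print("{:.2e} seconds".format(seconds))
print("{:.1f} million centuries".format(centuries / 10 ** 6))
```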
Video
Brute Force Attack Time on English Monoalphabetic Cipher (7 min; Feb 2020)
https://www.youtube.com/watch?v=c4gLyX9mwgM
Definition 6.4 (Frequency Analysis Attack). Find (portions of the) key and/or plaintext
by using insights gained from comparing the actual frequency of letters in the ciphertext
with the expected frequency of letters in the plaintext. Can be expanded to analyse sets
of letters, e.g. digrams, trigrams, n-grams, words.
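The first step of a frequency analysis attack is counting how often each letter occurs in the ciphertext. A minimal sketch, with the hypothetical function name letter_frequencies:

```python
from collections import Counter

def letter_frequencies(text):
    """Return (letter, relative frequency) pairs, most frequent first.
    Non-letters are ignored and letters are compared case-insensitively."""
    letters = [ch for ch in text.upper() if ch.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    return [(letter, count / total) for letter, count in counts.most_common()]

for letter, frequency in letter_frequencies("QSWBSR QSWBSR"):
    print(letter, round(frequency, 3))
```

The attacker then compares these ciphertext frequencies with the expected frequencies of plaintext letters. The same counting approach extends to digrams and trigrams by counting substrings instead of single letters.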
The letter frequencies of the figure above are based on Peter Norvig’s analysis of
Google Books N-Gram Dataset. Norvig is Director of Research at Google. His website
has more details on the analysis.
Solution 6.7 (Break a Monoalphabetic Cipher). See the steps under the section
“Frequency Analysis of Monoalphabetic Cipher” on the following website:
sandilands.info/sgordon/classical-ciphers-frequency-analysis-examples
Exercise 6.8 (Playfair Matrix Construction). Construct the Playfair matrix using key-
word australia.
Solution 6.8 (Playfair Matrix Construction). We write the keyword in a 5-by-5 matrix,
starting as:
a u s t r
l i
Note that we don’t write the letter a multiple times in the matrix, and the letter i
also represents the letter j.
Now we fill the remainder of the matrix with the English letters in alphabetical order.
Again, no duplicate letters are included.
a u s t r
l i b c d
e f g h k
m n o p q
v w x y z
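The matrix construction steps above can be sketched in code. The function name playfair_matrix is hypothetical; the letter j is merged into i, as in the solution.

```python
def playfair_matrix(keyword):
    """Build the 5-by-5 Playfair matrix as a list of five 5-letter strings:
    keyword letters first (no duplicates), then the rest of the alphabet."""
    cells = []
    for ch in keyword.lower() + "abcdefghijklmnopqrstuvwxyz":
        if ch == "j":              # i and j share one cell
            ch = "i"
        if ch.isalpha() and ch not in cells:
            cells.append(ch)
    return ["".join(cells[row * 5:row * 5 + 5]) for row in range(5)]

for row in playfair_matrix("australia"):
    print(" ".join(row))
```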
Video
Playfair Cipher Matrix Construction (3 min; Feb 2020)
https://www.youtube.com/watch?v=5QGiCkZidE4
Algorithm 6.4 (Playfair Encryption). Split the plaintext into pairs of letters. If a pair
has identical letters, then insert a special letter x in between. If the resulting number of
letters is odd, then pad with a special letter x.
Locate the plaintext pair in the Playfair matrix. If the pair is on the same column,
then shift each letter down one cell to obtain the resulting ciphertext pair. Wrap when
necessary. If the plaintext pair is on the same row, then shift to the right one cell.
Otherwise, the first ciphertext letter is that on the same row as the first plaintext letter
and same column as the second plaintext letter, and the second ciphertext letter is that
on the same row as the second plaintext letter and same column as the first plaintext
letter.
Repeat for all plaintext pairs.
Playfair decryption uses the same matrix and reverses the rules. That is, move up
(instead of down) if on the same column, move left (instead of right) if on the same row.
Finally, the padded special letters need to be removed. This can be done based upon
knowledge of the language. For example, if the intermediate plaintext from decryption
is helxlo, then as that word doesn’t exist, the x is removed to produce hello.
Exercise 6.9 (Playfair Encryption). Find the ciphertext if the Playfair cipher is used
with keyword australia and plaintext hello.
Solution 6.9 (Playfair Encryption). The Playfair matrix from the previous exercise is:
a u s t r
l i b c d
e f g h k
m n o p q
v w x y z
First split the plaintext into pairs: he, ll and o. As the second pair has identical
letters, insert a special character x and move the second l into the third pair. The
resulting pairs are:
he lx lo
Now for each pair, apply the rules to find the corresponding ciphertext pair.
For plaintext pair he:
a u s t r
l i b c d
e f g h k
m n o p q
v w x y z
As the pair are on the same row, the ciphertext pair is taken as the letters to the
right:
a u s t r
l i b c d
e f g h k
m n o p q
v w x y z
The first two letters of the ciphertext are KF.
The second pair, lx, is on different rows and columns:
a u s t r
l i b c d
e f g h k
m n o p q
v w x y z
The ciphertext pair is taken from the same row and column, but reversed in order:
a u s t r
l i b c d
e f g h k
m n o p q
v w x y z
The second pair of ciphertext is BV.
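The rules of Algorithm 6.4 can be sketched as a single function, which may help when checking the remaining pair by hand. The function name playfair_encrypt is hypothetical, and the handling of j (merged into i) is the same assumption used for the matrix.

```python
def playfair_encrypt(plaintext, keyword):
    """Encrypt with the Playfair rules: same column -> down,
    same row -> right, otherwise swap columns within each letter's row."""
    # Build the matrix as a flat list of 25 letters, row by row
    cells = []
    for ch in keyword.lower() + "abcdefghijklmnopqrstuvwxyz":
        ch = "i" if ch == "j" else ch
        if ch.isalpha() and ch not in cells:
            cells.append(ch)
    position = {ch: divmod(index, 5) for index, ch in enumerate(cells)}

    # Split into pairs, inserting x between identical letters and as padding
    letters = ["i" if ch == "j" else ch for ch in plaintext.lower() if ch.isalpha()]
    pairs = []
    i = 0
    while i < len(letters):
        first = letters[i]
        if i + 1 < len(letters) and letters[i + 1] != first:
            second = letters[i + 1]
            i += 2
        else:
            second = "x"
            i += 1
        pairs.append((first, second))

    ciphertext = []
    for first, second in pairs:
        r1, c1 = position[first]
        r2, c2 = position[second]
        if c1 == c2:       # same column: shift down, wrapping
            ciphertext += [cells[((r1 + 1) % 5) * 5 + c1], cells[((r2 + 1) % 5) * 5 + c2]]
        elif r1 == r2:     # same row: shift right, wrapping
            ciphertext += [cells[r1 * 5 + (c1 + 1) % 5], cells[r2 * 5 + (c2 + 1) % 5]]
        else:              # rectangle: keep own row, take the other letter's column
            ciphertext += [cells[r1 * 5 + c2], cells[r2 * 5 + c1]]
    return "".join(ciphertext).upper()

print(playfair_encrypt("hello", "australia"))
```

The first four letters printed match the pairs KF and BV found above; the last two are the ciphertext for the third pair, lo.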
Video
Encryption with the Playfair Cipher (7 min; Feb 2020)
https://www.youtube.com/watch?v=7kmTq35mLzA
Question 6.5 (Does Playfair cipher always map a letter to the same ciphertext letter?).
Using the Playfair cipher with keyword australia, encrypt the plaintext hellolove.
With the Playfair cipher, if a letter occurs multiple times in the plaintext, will that
letter always encrypt to the same ciphertext letter?
If a pair of letters occurs multiple times, will that pair always encrypt to the same
ciphertext pair?
Is the Playfair cipher subject to frequency analysis attacks?
Video
Playfair Cipher and Frequency Analysis (4 min; Feb 2020)
https://www.youtube.com/watch?v=gPmRhuXd5j0
For example, when encrypting a set of plaintext letters with a polyalphabetic cipher, a
monoalphabetic cipher with a particular key is used to encrypt the first letter, and then the
same monoalphabetic cipher is used but with a different key to encrypt the second letter.
The key used for the monoalphabetic cipher is determined by the key (or keyword) for
the polyalphabetic cipher.
• Vigenère Cipher: uses Caesar cipher, but Caesar key changes each letter based on
keyword
• One Time Pad: same as Vigenère/Vernam, but random key as long as plaintext
Video
Encryption with Vigenere Cipher and Python (4 min; Feb 2020)
https://www.youtube.com/watch?v=r8xsofoAdNI
Exercise 6.10 (Vigenère Cipher Encryption). Use Python (or other software tools) to
encrypt the plaintext centralqueensland with the following keys with the Vigenère
cipher, and investigate any possible patterns in the ciphertext: cat, dog, a, giraffe.
Solution 6.10 (Vigenère Cipher Encryption). Using the pycipher library:
$ python3
Python 3.6.9 (default, Nov 7 2019, 10:44:02)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pycipher
>>> pycipher.Vigenere("cat").encipher("centralqueensland")
'EEGVRTNQNGEGULTPD'
>>> pycipher.Vigenere("dog").encipher("centralqueensland")
'FSTWFGOEAHSTVZGQR'
>>> pycipher.Vigenere("a").encipher("centralqueensland")
'CENTRALQUEENSLAND'
>>> pycipher.Vigenere("giraffe").encipher("centralqueensland")
'IMETWFPWCVESXPGVU'
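To see what the library is doing, the Vigenère cipher can also be written by hand as a Caesar cipher whose key changes with each letter. This is a self-contained sketch, not the pycipher implementation; the function name vigenere_encrypt is hypothetical and the input is assumed to contain only letters.

```python
def vigenere_encrypt(plaintext, keyword):
    """Apply a Caesar shift to each letter, where the shift for position i
    is the numerical value of keyword letter i (repeating the keyword)."""
    ciphertext = []
    for i, ch in enumerate(plaintext.lower()):
        shift = ord(keyword[i % len(keyword)].lower()) - ord("a")
        ciphertext.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("A")))
    return "".join(ciphertext)

for keyword in ("cat", "dog", "a", "giraffe"):
    print(keyword, vigenere_encrypt("centralqueensland", keyword))
```

With keyword a every shift is 0, so the ciphertext equals the plaintext, which is one of the patterns the exercise asks you to look for.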
Video
Vigenere Python Examples (4 min; Feb 2020)
https://www.youtube.com/watch?v=VqjDjocUqKY
While the Vigenère cipher improves on monoalphabetic ciphers, it still has a weakness.
The approach for breaking the cipher is:
The following shows an example of breaking the Vigenère cipher, although it is not
necessary to be able to do this yourself manually.
Example 6.4 (Breaking Vigenère Cipher). Ciphertext ZICVTWQNGRZGVTWAVZHCQYGLMGJ
has repetition of VTW. That suggests repetition in the plaintext at the same position,
which would be true if the keyword repeated at the same position.
012345678901234567890123456
ZICVTWQNGRZGVTWAVZHCQYGLMGJ
That is, it is possible the key letter at position 3 is repeated at position 12. That in
turn suggests a keyword length of 9 or 3.
ciphertext ZICVTWQNGRZGVTWAVZHCQYGLMGJ
length=3: 012012012012012012012012012
length=9: 012345678012345678012345678
An attacker would try both keyword lengths. With a keyword length of 9, the attacker
then performs Caesar cipher frequency analysis on every 9th letter. Eventually they find
plaintext is wearediscoveredsaveyourself and keyword is deceptive.
This attack may require some trial-and-error, and will be more likely to be successful
when the plaintext is very long. See the Stallings textbook, from which the example is
taken, for further explanation.
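The first step of the attack in Example 6.4, finding repeated sequences in the ciphertext and the distance between them, can be automated. A minimal sketch, with the hypothetical function name repeated_sequences:

```python
def repeated_sequences(ciphertext, length=3):
    """Return a dict mapping each sequence of `length` letters that occurs
    more than once to the list of positions (from 0) where it starts."""
    positions = {}
    for i in range(len(ciphertext) - length + 1):
        positions.setdefault(ciphertext[i:i + length], []).append(i)
    return {seq: found for seq, found in positions.items() if len(found) > 1}

repeats = repeated_sequences("ZICVTWQNGRZGVTWAVZHCQYGLMGJ")
for sequence, found in repeats.items():
    print(sequence, found, "distance", found[1] - found[0])
```

The distance between the two occurrences is 9, so the candidate keyword lengths are the divisors of 9 greater than 1, i.e. 3 and 9.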
Video
Cryptanalysis of Vigenere Cipher (4 min; Feb 2020)
https://www.youtube.com/watch?v=u0FtomzkoTQ
\[ c_i = p_i \oplus k_i \]
The Vernam cipher is essentially a binary form of the Vigenère cipher. The mathematical
form of Vigenère encryption adds the plaintext and key and mods by 26 (where
there are 26 possible characters). In binary, there are 2 possible characters, so the
equivalent is to add the plaintext and key and mod by 2. This is identical to the XOR
operation.
To demonstrate the Vernam cipher, we will use Python to perform the XOR (⊕)
operation.
The Python code defines a function called xor that takes two strings representing
bits, and returns a string representing the XOR of those bits. The actual XOR is performed
on integers using the Python hat (^) operator. The rest is formatting as strings.
Exercise 6.11 (Vernam Cipher Encryption). Using the Vernam cipher, encrypt the
plaintext 011101010101000011011001 with the key 01011.
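A function like the xor function described above could look as follows. This is a sketch written from that description, not the code from the notes. It assumes that, as in the Vigenère cipher, a key shorter than the plaintext is repeated; the helper name vernam_encrypt is hypothetical.

```python
def xor(bits_a, bits_b):
    """XOR two equal-length strings of bits and return a string of bits."""
    value = int(bits_a, 2) ^ int(bits_b, 2)            # XOR on integers
    return format(value, "0{}b".format(len(bits_a)))   # keep leading zeros

def vernam_encrypt(plaintext_bits, key_bits):
    """Repeat the key to the plaintext length (an assumption), then XOR."""
    repeats = len(plaintext_bits) // len(key_bits) + 1
    keystream = (key_bits * repeats)[:len(plaintext_bits)]
    return xor(plaintext_bits, keystream)

ciphertext = vernam_encrypt("011101010101000011011001", "01011")
print(ciphertext)
print(vernam_encrypt(ciphertext, "01011"))   # XOR again to decrypt
```

Because XOR is its own inverse, applying the same function with the same key to the ciphertext returns the plaintext.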
Video
Vernam Cipher using Bits and XOR (7 min; Feb 2020)
https://www.youtube.com/watch?v=oXsyZhZrRE4
• Encrypting plaintext with random key means output ciphertext will be random
– E.g. XOR plaintext with a random key produces random sequence of bits in
ciphertext
Example 6.5 (Attacking OTP). Consider a variant of Vigenère cipher that has 27 char-
acters (including a space). An attacker has obtained the ciphertext:
ANKYODKYUREPFJBYOJDSPLREYIUNOFDOIUERFPLUYTS
There are many other legible plaintexts obtained with other keys. There is no way for the
attacker to know the correct plaintext.
The example shows that even a brute force attack on an OTP is unsuccessful. Even
if the attacker could try all possible keys (the plaintext is 43 characters long and so
there are $27^{43} \approx 10^{61}$ keys), they would find many possible plaintext values that make
sense. The example shows two such plaintext values that the attacker obtained. Which
one is the correct plaintext? They both make sense (in English). The attacker has no
way of knowing. In general, there will be many plaintext values that make sense from a
brute force attack, and the attacker has no way of knowing which is the correct (original)
plaintext. Therefore a brute force attack on an OTP is ineffective.
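The reason can be made concrete: for any candidate plaintext of the right length, there is a key that decrypts the ciphertext to it. The sketch below assumes the 27-character variant numbers A to Z as 0 to 25 and the space as 26, and that encryption adds the key mod 27; both are assumptions. The two candidate plaintexts are made up for illustration and are not the ones from the example.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "   # assumed ordering, space last

def otp_decrypt(ciphertext, key):
    """P = (C - K) mod 27 for each character."""
    return "".join(ALPHABET[(ALPHABET.index(c) - ALPHABET.index(k)) % 27]
                   for c, k in zip(ciphertext, key))

def key_for(ciphertext, plaintext):
    """The key that would decrypt `ciphertext` to `plaintext`: K = (C - P) mod 27."""
    return "".join(ALPHABET[(ALPHABET.index(c) - ALPHABET.index(p)) % 27]
                   for c, p in zip(ciphertext, plaintext))

ciphertext = "ANKYODKYUREPFJBYOJDSPLREYIUNOFDOIUERFPLUYTS"
# Hypothetical candidate plaintexts, padded with spaces to 43 characters
candidates = ["ATTACK AT DAWN".ljust(len(ciphertext)),
              "RETREAT AT NOON".ljust(len(ciphertext))]

for candidate in candidates:
    key = key_for(ciphertext, candidate)
    print(repr(otp_decrypt(ciphertext, key)), "with key", key)
```

Every candidate has a matching key, and since the real key was chosen at random, all of these keys are equally likely.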
Let’s finish our coverage of classical substitution ciphers with a summary of the OTP:
The practical limitations are significant. The requirement that the key must be as long
as the plaintext, random and never repeated (if it is repeated then the same problems
arise as in the original Vernam cipher) means large random values must be created. But
creating a large amount of random data is actually difficult. Imagine you wanted to use an
OTP for encrypting large data transfers (multiple gigabytes) across a network. Multiple
gigabytes of random data must be generated for the key, which is time consuming (seconds
to hours) for some computers. Also, the key must be exchanged, usually over a network,
with the other party in advance. So to encrypt a 1GB file you need a 1GB random key.
Both the key and file must be sent across the network, i.e. a total of 2GB. This is a very
inefficient use of the network: a maximum of 50% efficiency.
Later we will see real ciphers that work with a relatively small, fixed length key (e.g.
128 bits) and provide sufficient security.
Video
One-Time Pad as an Unbreakable Cipher (7 min; Feb 2020)
https://www.youtube.com/watch?v=GSsDofkajD4
• Substitution: replace one (or more) character in plaintext with another from the
entire possible character set
Video
Rail Fence Transposition Cipher Example (2 min; Feb 2020)
https://www.youtube.com/watch?v=35OM0S_MDcg
Definition 6.7 (Rows Columns Cipher Encryption). Select a number of columns m and
permute the integers from 1 to m to be the key. Write the plaintext row-by-row over
m columns. Read column-by-column, in order of the columns determined by the key, to
obtain the ciphertext.
Be careful with the decryption process; it is often confusing. It must reverse the
encryption steps so that the original plaintext is produced.
Exercise 6.13 (Rows Columns Encryption). Consider the plaintext securityandcryptography
with key 315624. Using the rows columns cipher, find the ciphertext.
Solution 6.12 (Rows Columns Encryption). With a key of 315624, we write the plain-
text row-by-row across 6 columns:
3 1 5 6 2 4
s e c u r i
t y a n d c
r y p t o g
r a p h y x
A special letter, x in this case, is used to pad to fill the last row. This padding must
be agreed upon in advance by the sender and receiver.
Now read column-by-column, starting with column indicated by the key as 1, i.e.
EYYA. Then column 2: RDOY. The resulting ciphertext is EYYARDOYSTRRICGXCAPPUNTH.
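The encryption steps can be sketched in code. The function name rows_columns_encrypt is hypothetical, and the key is assumed to be given as a string of single digits, as in the exercise.

```python
def rows_columns_encrypt(plaintext, key, pad="x"):
    """Write the plaintext row-by-row over len(key) columns, pad the last
    row, then read the columns in the order given by the key digits."""
    m = len(key)
    text = plaintext.lower()
    if len(text) % m:
        text += pad * (m - len(text) % m)
    rows = [text[i:i + m] for i in range(0, len(text), m)]
    # Column indexes sorted by their key digit: the column labelled 1 first
    order = sorted(range(m), key=lambda col: int(key[col]))
    return "".join(row[col] for col in order for row in rows).upper()

first = rows_columns_encrypt("securityandcryptography", "315624")
print(first)
print(rows_columns_encrypt(first, "315624"))   # apply the cipher a second time
```

The second call shows the effect of encrypting the ciphertext again with the same key.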
Video
Encrypting with Rows/Columns Transposition Cipher (3 min; Feb 2020)
https://www.youtube.com/watch?v=nZdtNiDLKJ0
Example 6.6 (Rows Columns Multiple Encryption). Assume the ciphertext from the
previous example has been encrypted again with the same key. The resulting ciphertext
is YYCPRRCTEOIPDRAHYSGUATXN. Now let’s view how the cipher has “mixed up” the letters
of the plaintext. If the plaintext letters are numbered by position from 01 to 24, their
order (split across two rows) is:
01 02 03 04 05 06 07 08 09 10 11 12
13 14 15 16 17 18 19 20 21 22 23 24
After the first encryption, the numbers reveal a pattern: increasing by 6 within groups
of 4. This is because of the 6 columns and 4 rows. After the second encryption, it is not
so obvious to identify patterns.
The point is that while a single application of the transposition cipher did not seem
to offer much security (in terms of hiding patterns), adding the second application of the
cipher offers an improvement. This principle of repeated applications of simple operations
is used in modern ciphers.
In summary:
• But combining transposition ciphers with substitution ciphers, and repeated appli-
cations, practical security can be achieved
Video
Multiple Rounds of Rows/Columns Transposition Cipher (5 min; Feb 2020)
https://www.youtube.com/watch?v=oxQBHCWqe6A
Chapter 7
Encryption and Attacks
Chapter 6 introduced concepts of encryption using classical ciphers. This chapter
formalises these concepts, in Section 7.1 defining the building blocks for encryption in
modern ciphers, in particular in symmetric key cryptography. Section 7.2 looks at encryption
from the attacker’s point of view. Understanding the approaches attackers can take is
necessary to be able to build secure systems that will withstand attacks. Section 7.3
and Section 7.4 outline the general design approaches to the two types of symmetric key
ciphers: block ciphers and stream ciphers.
Further details about encryption and attacks are covered in subsequent chapters,
including details on Data Encryption Standard (DES) (Chapter 8) and Advanced
Encryption Standard (AES) (Chapter 9). The alternative to symmetric key encryption,
public key cryptography, is introduced in Part IV.
Presentation slides that accompany this chapter can be downloaded in the following
formats: slides only (PDF); slides with notes (PDF, ODP, PPTX).
Video
Encryption Building Blocks (13 min; Mar 2020)
https://www.youtube.com/watch?v=ZVF2kYPnm3g
Figure 7.1 shows the general model for encrypting for confidentiality that we have seen
previously.
All ciphers until about the 1960’s were symmetric key ciphers. The encrypter and
decrypter used the same key, i.e. symmetry between the keys. The key must be shared
between the two users and kept secret.
File: crypto/encryption.tex, r1965
66 CHAPTER 7. ENCRYPTION AND ATTACKS
A new form of cryptography was designed in the 1960’s and 1970’s, where the en-
crypter uses one key and the decrypter uses a different but related key. The keys are
asymmetric. One of the keys is kept secret, while the other can be disclosed, i.e. made
public.
We will focus on symmetric key ciphers initially, and return to public-key ciphers
later.
We often use simple mathematical notation to describe the steps. E() is a function
that takes two inputs: key K and plaintext P. It returns ciphertext C as output. E()
represents the encryption algorithm. D() is the decryption algorithm.
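As a minimal sketch of this notation, the following Python defines a toy E() and D(). The cipher used here (XOR of every byte with a one-byte key) and the example values are assumptions chosen only to keep the example short; it is not a secure cipher. The point is simply that C = E(K, P) and P = D(K, C).

```python
def E(K: int, P: bytes) -> bytes:
    """Toy encryption: XOR every plaintext byte with the one-byte key K."""
    return bytes(b ^ K for b in P)


def D(K: int, C: bytes) -> bytes:
    """Toy decryption: XOR is its own inverse, so D repeats the same step."""
    return bytes(b ^ K for b in C)


K = 0x5A              # shared secret key (illustrative value)
P = b"hello"
C = E(K, P)           # C = E(K, P)
assert C != P         # the ciphertext differs from the plaintext
assert D(K, C) == P   # P = D(K, C): the same key recovers the plaintext
```

Because the same key K is passed to both functions, this is a symmetric key cipher in the sense described above.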
Symmetric key encryption is the oldest form of encryption and involves both parties
(e.g. sender and receiver) knowing the same secret key. Plaintext is encrypted with
the secret key, and the ciphertext is decrypted with that secret key. If anyone else (i.e.
an attacker) learns the secret key, then the system is not secure.
For symmetric key encryption to be secure, the algorithm must be well designed
(strong, not easy to break) and the secret key must be kept secret. AES is an example of
a strong algorithm, and it uses keys of length 128 bits or longer. One of the challenges of
symmetric key encryption is informing the receiver of the secret key in advance: it must
be done in a secure manner.
Product systems: multiple stages of substitutions and permutations, e.g. Feistel
network, Substitution Permutation Network (SPN)
Symmetric key ciphers are designed around two basic operations: substitution and
permutation. We have seen these operations when looking at classical ciphers. We also
saw the principle that repeating the operations can make a cipher more secure. Modern
ciphers are designed using these two basic operations, but repeated multiple times. For
example, perform a substitution and then permutation, then repeat. The result is a
“product system”.
The Feistel network and SPN are two common design principles for modern ciphers
and will be mentioned later when discussing block ciphers like AES and DES.
Block cipher: processes one block of elements at a time, typically 64 or 128 bits
Stream cipher: processes input elements continuously, e.g. 1 byte at a time, by XORing
plaintext with a keystream
Originally the idea was that block ciphers were suitable for processing large amounts of
data when there were no strict time constraints. Stream ciphers were fast and suitable for
real-time applications. For example, for encrypting real-time voice, as the data (plaintext)
is generated, it needs to be quickly encrypted and then the ciphertext transmitted across
a network. By encrypting only a small amount of plaintext at a time and using the
extremely fast XOR operation, stream ciphers could perform the encryption without
introducing significant delay.
However, nowadays, with dedicated hardware support for block ciphers like AES, there
is not a significant difference in performance (delay) of block and stream ciphers. Hence
we see block ciphers (in particular, AES) used in scenarios for which stream ciphers were
originally designed.
We will focus on block ciphers initially, and return to stream ciphers later.
While no longer recommended or in widespread use, DES was the first cipher that
saw widespread use. The primary limitation of DES, however, was that its key was only
56 bits long, and so it eventually became subject to a brute force attack.
While Triple DES, which used the original DES but expanded the key length, was
popular for a while, a new cipher was needed to perform well on a variety of hardware
platforms. AES was standardised in 2001 and continues to be the recommended symmetric
key block cipher for most applications today. There are no known practical attacks
that cannot be defended.
DES and AES are covered in depth later.
Figure 7.3 lists common symmetric key encryption block ciphers starting with DES,
through to around the time of AES. Most block ciphers operate on blocks of 64 or 128
bits, and support a range of key lengths. There are three main design principles: Feistel
network or structure, Substitution Permutation Network, or Lai-Massey.
AES is still highly recommended for most applications. There have been newer
proposals since then, however very few are standards or see widespread usage. A recent
trend is developing “lightweight” ciphers that perform well on very small devices, e.g.
sensors.
A detailed review of block ciphers is Roberto Avanzi’s “A Salad of Block Ciphers:
The State of the Art in Block Ciphers and their Analysis”, 2017, which is available for
free at https://eprint.iacr.org/2016/1171.pdf
Video
Attacks on Encryption (28 min; Mar 2020)
https://www.youtube.com/watch?v=yuiGyCx3WFA
7.2. ATTACKS ON ENCRYPTION 69
– Assumptions about what attacker knows and can do, e.g. intercept messages,
modify messages
– Requirements of the system/users, e.g. confidentiality, authentication
Table 7.1: Worst Case Brute Force Time for Different Keys
Table 7.1 shows, for different key lengths, the time it takes to try every key if a single
computer could make attempts at one of three rates: 10^9 per second, 10^12 per second, or
10^15 per second. These are not necessarily realistic speeds, although they roughly represent
lower and upper limits for today’s computing power.
While this table presents the worst case time, in most cases, it is not much different
from the average time. Recall the average time is about half of the worst case time. For
a 128 bit key at 10^15 decrypts per second, the worst case time is about 1 × 10^16 years,
and the average time is about 0.5 × 10^16 years. That is, both about 10^16 years. With such large
times, cutting the time in half makes no practical difference.
Note that the last line is for a key for a monoalphabetic English cipher. There are
26! possible keys which is equivalent to a binary key of about 88 bits.
For comparison, the age of the Earth is approximately 4 × 10^9 years and the age of
the universe is approximately 1.3 × 10^10 years.
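The figures quoted above can be checked with a few lines of Python. This is only a rough sketch: the function name and the assumption of a 365-day year are ours, and the rate of 10^15 attempts per second is the illustrative rate used in the discussion, not a measured speed.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def worst_case_years(num_keys: int, rate_per_sec: float) -> float:
    """Years needed to try every key at the given rate (attempts per second)."""
    return num_keys / rate_per_sec / SECONDS_PER_YEAR


# 128-bit key at 10^15 attempts per second
worst = worst_case_years(2 ** 128, 1e15)
average = worst / 2
print(f"128-bit key: worst case {worst:.1e} years, average {average:.1e} years")

# Monoalphabetic cipher: 26! keys, expressed as an equivalent binary key length
mono_keys = math.factorial(26)
print(f"26! = {mono_keys:.2e} keys, equivalent to about {math.log2(mono_keys):.1f} bits")
```

Both the worst case and the average come out on the order of 10^16 years, which is why halving the time makes no practical difference.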
7.2.3 Cryptanalysis
Cryptanalysis is the “smart” approach to breaking ciphers. The attacker uses knowledge
of the ciphers, as well as expected patterns in ciphertext and plaintext to find unknown
information (e.g. keys or plaintext).
Attacks on ciphers can be classified based on how much information an attacker is
assumed to know to successfully perform the attack. We describe different classifications
in the following.
The common assumption is that an attacker knows the encryption algorithm and
ciphertext, and that they had no influence over the choice of ciphertext. This is referred
to as a ciphertext only attack. A cipher that can be defeated by a ciphertext only attack is
the weakest case we will consider, since this attack assumes the least information.
However if a cipher cannot be defeated by a ciphertext only attack, then it still may be
defeated if the attacker has additional information. The following defines these additional
attacks. They all assume the attacker has the same information as a ciphertext only
attack (i.e. encryption algorithm and ciphertext), but also make additional assumptions
about other known information and the ability to select/influence values. Generally, the
more information an attacker knows or can control, the easier their task of defeating a
cipher.
– encryption algorithm
– ciphertext
– one or more plaintext–ciphertext pairs formed with the secret key
• E.g. attacker has intercepted past ciphertexts and somehow discovered their
corresponding plaintexts
• All pairs encrypted with the same secret key (which is unknown to attacker)
In a Known Plaintext Attack (KPA), the attacker also has access to one or more pairs
of plaintext/ciphertext. That is, assume the known ciphertext, C_known, was obtained
using key K_unknown and plaintext P_unknown (either of which the attacker is trying to find).
The attacker also knows at least C_1 and P_1, where C_1 is the output of encrypting P_1 with
key K_unknown. That is, the attacker knows a pair (P_1, C_1). They may also know other
pairs (obtained using the same key K_unknown).
How could an attacker know past plaintext/ciphertext pairs? A simple example is
if the plaintext messages were only valid for a limited time, after which they become
public, such as the coordinates at which a public event is to take place. Before the event
takes place the coordinates are encrypted and secret. But after the event takes place, the
attacker has learnt the value of the coordinates/plaintext
(without knowing the key).
Generally, the more pairs of plaintext/ciphertext known, the easier it is to defeat a
cipher.
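To make the known plaintext attack concrete, here is a small Python sketch. The cipher is a hypothetical, deliberately weak one (add the key to every byte, modulo 256) and all messages and key values are made up for illustration. The attacker knows the algorithm, the target ciphertext C_known and one pair (P_1, C_1), and uses the pair to test candidate keys.

```python
def toy_encrypt(key: int, plaintext: bytes) -> bytes:
    """Hypothetical weak cipher: add the key to every byte (mod 256)."""
    return bytes((b + key) % 256 for b in plaintext)


def toy_decrypt(key: int, ciphertext: bytes) -> bytes:
    """Inverse of toy_encrypt: subtract the key from every byte (mod 256)."""
    return bytes((b - key) % 256 for b in ciphertext)


# Set up by the users (the key is unknown to the attacker)
k_unknown = 0x2F
p1 = b"MEET AT NOON"
c1 = toy_encrypt(k_unknown, p1)
c_known = toy_encrypt(k_unknown, b"NEW PLAN")

# Attacker: try every key and keep those that map P_1 to C_1
candidates = [k for k in range(256) if toy_encrypt(k, p1) == c1]
assert candidates == [k_unknown]
print(toy_decrypt(candidates[0], c_known))
```

With a real cipher the pair does not reveal the key so directly, but it still gives the attacker a way to confirm whether a guessed key is correct.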
– encryption algorithm
– ciphertext
– plaintext message chosen by attacker, together with its corresponding cipher-
text generated with the secret key
– encryption algorithm
– ciphertext
– ciphertext chosen by attacker, together with its corresponding decrypted plain-
text generated with the secret key
In a Chosen Ciphertext Attack (CCA) the attacker chooses a ciphertext, and obtains
the corresponding plaintext, in an attempt to discover a secret key. Note in this attack,
the aim is to find the secret key. If the attacker has a way to obtain plaintext from
a chosen ciphertext, then they could simply intercept ciphertext to find plaintext. A
CCA normally involves the attacker tricking a user to decrypt ciphertext and provide the
plaintext.
There are variations of the above types of attacks, and the details of the attacks may
be quite different, however this classification is sufficient to demonstrate that successful
cryptanalysis depends partially on the amount of information known to the attacker.
• One-time pad is the only unconditionally secure cipher (but not very practical)
Time: usually measured as number of operations, since real time depends on implemen-
tation and computer specifics
While time to break the cipher is the metric of interest, it is usually simplified to
number of operations. For cryptanalysis, successful attacks should take fewer operations
than brute force. That is, an attack that takes more operations than a brute force attack
is considered an unsuccessful attack.
Often attacks require intermediate values to be stored in memory while performing
the attack. The less memory needed, the better the attack.
As seen in the previous classification, known plaintext, chosen plaintext and chosen
ciphertext attacks all require the attacker to know additional information. The more in-
formation necessary for the attack to be successful, the poorer the attack is. For example,
a known plaintext attack that will be successful if 1,000,000 pairs of plaintext/ciphertext
are known, is better than a known plaintext attack that requires 2,000,000 pairs.
Video
Measuring Attacks on Ciphers (4 min; Mar 2021)
https://www.youtube.com/watch?v=3tfqACxHUSA
Video
Block Cipher Design Principles (9 min; Mar 2021)
https://www.youtube.com/watch?v=ULKX4Xqpclg
Let’s look at some simple, ideal block ciphers to illustrate basic concepts, which will
then lead to common design principles used to create block ciphers in use today. In its
simplest form, a block cipher maps an n-bit plaintext block to an n-bit ciphertext block,
with the exact mapping determined by the cipher design and selected key. The mapping
can be viewed as a lookup table.
Figure 7.5 is an example of a 2-bit ideal block cipher. The table shows input plaintext
blocks in the left column, different keys in the top row, and the resulting output ciphertext
block in the body of the table. To be used for sending a confidential message, both the
sender and receiver would know the table (e.g. stored in memory on their devices, or
some way to calculate the table) and agree upon the key to use. For a given plaintext
block, the sender looks up the column of the key to find the output ciphertext to send. The
receiver looks up the received ciphertext in the column of the key, and the row determines the
plaintext.
Exercise 7.1 (Encrypt with Ideal Cipher 1). Encrypt the message Tokyo using the above
ideal 2-bit block cipher 1 with key K6.
Solution 7.1 (Encrypt with Ideal Cipher 1). As the example block cipher operates on
2-bit binary blocks, but a five letter English message is to be encrypted we will make
assumptions about the encoding and mode of operation to be used.
First, we will assume ASCII (or UTF-8) encoding is to be used (see Section B.1.4).
Each letter will map to an 8-bit value, i.e. T = 01010100, o = 01101111, k = 01101011,
and y = 01111001. The resulting plaintext in binary is 40 bits:
0101010001101111011010110111100101101111
Second, as we have 40 bits of plaintext, but a 2-bit block cipher, we will assume each 2-
bit block of plaintext will be encrypted (20 blocks in total), and the resulting 20 ciphertext
blocks will be concatenated to produce final 40 bit ciphertext. This naive approach is
referred to as the Electronic Code Book mode of operation. Modes of operations are
discussed in Chapter 11. The 20 plaintext blocks are:
01 01 01 00 01 10 11 11 01 10 10 11 01 11 10 01 01 10 11 11
Consider the 1st plaintext block of 01, using key K6, looking up the block cipher table
returns a ciphertext block of 10. We now have the first of 20 ciphertext blocks and can
move on to the 2nd ciphertext block. It turns out the next plaintext block is the same
as the first (01), and since the same key is used (K6), the same ciphertext block will
be output (10). In fact, the first three plaintext blocks are the same, so the ciphertext
blocks so far are:
10 10 10
The 4th plaintext block is 00. Looking up in the table with key K6 produces output
ciphertext block 11. We now have:
10 10 10 11
Continuing with all 20 plaintext blocks will produce ciphertext blocks:
10 10 10 11 10 00 01 01 10 00 00 01 10 01 00 10 10 00 01 01
Concatenating all ciphertext blocks together produces the ciphertext:
1010101110000101100000011001001010000101
Should the ciphertext be encoded as ASCII/UTF-8 to complete the encryption? It
could be, but note that some of the characters may not be printable (e.g. ESCape
or ACK). For block ciphers we typically operate on binary plaintext and ciphertext.
Encoding and decoding between binary and other formats is not normally part of the
cipher, so we will leave the ciphertext as a sequence of bits.
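The steps of this solution can be reproduced with a short Python sketch. The mapping for key K6 below is read off the worked solution above (the full table is in Figure 7.5); the function names are our own, and the block-by-block lookup is the Electronic Code Book approach assumed in the solution.

```python
# Mapping for key K6 (plaintext block -> ciphertext block), taken from the solution above
K6 = {"00": "11", "01": "10", "10": "00", "11": "01"}


def to_bits(message: str) -> str:
    """Encode a message as a string of bits, 8 bits per ASCII character."""
    return "".join(f"{byte:08b}" for byte in message.encode("ascii"))


def ecb_encrypt(bits: str, table: dict, block_size: int = 2) -> str:
    """Encrypt each block independently by table lookup (ECB mode)."""
    blocks = [bits[i:i + block_size] for i in range(0, len(bits), block_size)]
    return "".join(table[block] for block in blocks)


def ecb_decrypt(bits: str, table: dict, block_size: int = 2) -> str:
    """Decrypt by looking up each block in the inverted table."""
    inverse = {c: p for p, c in table.items()}
    return ecb_encrypt(bits, inverse, block_size)


plaintext_bits = to_bits("Tokyo")
ciphertext_bits = ecb_encrypt(plaintext_bits, K6)
print(ciphertext_bits)
assert ecb_decrypt(ciphertext_bits, K6) == plaintext_bits
```

Note that the table can only be inverted for decryption because every ciphertext block appears exactly once in the K6 column.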
The above exercise identified several issues that arise when applying an ideal block
cipher:
• Repetition of plaintext blocks: undesirable. Make block size larger and use mode
of operation that obscures repetition
• Key space: larger block size needed to allow more keys in ideal block cipher
• Implementing an ideal block cipher: how are they generated? can all values be
stored?
Figure 7.6 shows a different 2-bit ideal block cipher. It maps plaintext to ciphertext
in a different order than cipher 1.
This example is just used for illustrative purposes. If you had an ideal block cipher
that covered every permutation of plaintext values, then only a single cipher is needed.
Question 7.1 (What is plaintext with key K13, ciphertext 11 with ideal cipher 2?).
What is plaintext with key K13, ciphertext 11 with ideal cipher 2?
Decryption also involves a lookup. In the column for key K13, identify the ciphertext
11, and the row indicates the original plaintext 10.
Question 7.2 (What is plaintext with key K4, ciphertext 11 with ideal cipher 2?). What
is plaintext with key K4, ciphertext 11 with ideal cipher 2?
Same cipher, same ciphertext but different key. However in column of K4 there are
two values of ciphertext 11. So we cannot determine for sure what was the original
plaintext: 00 or 10. This actually is a trick question, since the cipher design is in error.
A cipher must be reversible, so decryption is possible. This is an example of a cipher
design error that includes an irreversible mapping.
Figure 7.7 shows the fixed cipher: it is now reversible, and decryption is possible for
all values of key and ciphertext.
Question 7.3 (How many bits are needed to represent the key in cipher 2?). The example
2-bit ideal block cipher 2 (as well as cipher 1) list 24 different keys (or mappings from
plaintext to ciphertext). How many bits are needed to represent a key for this cipher?
Firstly, why are 24 keys listed? With a 2-bit block, there are 2^2 = 4 possible blocks,
i.e. 00, 01, 10, and 11. There are 4! = 24 different ways to arrange those 4 plaintext
blocks to produce ciphertext, i.e. 24 permutations of the plaintext blocks. A key is used
to select the distinct permutation.
With key length of 1 bit, we can represent 2^1 = 2 possible keys. With a key length
of 2 bits, we can represent 2^2 = 4 possible keys. With a key length of 3 bits, we can
represent 2^3 = 8 possible keys. With a key length of 4 bits, we can represent 2^4 = 16
possible keys. With a key length of 5 bits, we can represent 2^5 = 32 possible keys. That
is, a key length of 4 bits is not enough to represent our 24 keys, but a key length of 5 is.
Therefore we need a 5-bit key for this ideal 2-bit block cipher.
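The same reasoning can be written as a small Python function, shown here as a sketch (the function name is ours). It counts the permutations of the 2^n possible blocks and finds the smallest number of bits that can index all of them.

```python
import math


def ideal_cipher_key_bits(block_bits: int) -> int:
    """Smallest key length (in bits) that can select any permutation of the blocks."""
    num_blocks = 2 ** block_bits
    num_keys = math.factorial(num_blocks)     # one key per permutation
    return (num_keys - 1).bit_length()        # bits needed to index num_keys values


for n in (2, 3, 4):
    keys = math.factorial(2 ** n)
    print(f"{n}-bit block: {keys} keys, {ideal_cipher_key_bits(n)}-bit key")
```

Even going from a 2-bit to a 4-bit block, the key length grows much faster than the block size, which hints at the problem examined later for realistic block sizes.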
Question 7.4 (How to reduce repetition of plaintext blocks?). With a 2-bit ideal block
cipher, with a long plaintext, many of plaintext blocks will repeat. This is bad for security
(see Modes of Operation). What can you change in the design of an ideal block cipher
that reduces repetition of plaintext blocks?
Increasing the block size for a block cipher will reduce the chance of block repetition.
Recall the first example of the 2-bit ideal block cipher encrypting Tokyo. The plaintext
was 40 bits, resulting in 20 blocks. As there are only 2^2 = 4 different plaintext values,
there will be repetition. On average (if the plaintext was random, which is not likely but
it simplifies the analysis), each plaintext value will be repeated 20/4 = 5 times.
If however a 3-bit ideal block cipher was used, there would be 2^3 = 8 different plaintext
values. There would be 14 blocks (40/3, with the last block having just 1 bit of plaintext).
On average, each plaintext value will be repeated 14/8, which is less than 2 times.
Increasing to a 4-bit ideal block cipher gives 16 different plaintext values, 10 blocks,
and a possibility there will be no repetition. Of course if the plaintext is much longer
than 40 bits, then repetition is still likely.
Figure 7.8 illustrates the impact of different block sizes for an example 80 bit plaintext
(whereas the previous example was a 40 bit plaintext).
Note that with a block size of 3 bits, the last block contains 2 bits of plaintext and 1
bit of padding. Padding is needed as all blocks must be the same size (since block ciphers
operate on fixed sized blocks). There are different schemes for padding, e.g. bit padding,
zero padding and PKCS7.
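The effect of block size on the number of blocks, padding and repetition can be explored with the following Python sketch. It uses zero padding (one of the schemes named above) purely because it is the simplest to show; the function name is our own and the 40-bit plaintext is the one from Exercise 7.1.

```python
from collections import Counter


def split_blocks(bits: str, block_size: int) -> list:
    """Split a bit string into fixed-size blocks, zero padding the last block."""
    padding = (-len(bits)) % block_size
    bits = bits + "0" * padding
    return [bits[i:i + block_size] for i in range(0, len(bits), block_size)]


tokyo = "0101010001101111011010110111100101101111"   # 40-bit plaintext
for n in (2, 3, 4):
    blocks = split_blocks(tokyo, n)
    block, count = Counter(blocks).most_common(1)[0]
    print(f"{n}-bit blocks: {len(blocks)} blocks, most frequent block {block} appears {count} times")
```

For an 80-bit plaintext and 3-bit blocks, the same function produces 27 blocks, the last of which holds 2 bits of plaintext and 1 bit of padding.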
• n-bit block cipher takes n bit plaintext and produces n bit ciphertext
• Design trade-offs:
The trade-offs are conflicting, meaning ideal block ciphers are good in theory, but in
practice we need a different design approach.
Exercise 7.2 (Ideal 64-bit Block Cipher). Consider an ideal 64-bit block cipher. How
many different keys are possible? How many bits are needed to store a single
key? How much space is required to store the mappings?
Solution 7.2 (Ideal 64-bit Block Cipher). We will not attempt to list all keys. With
64-bit blocks, there are (2^64)! different permutations or mappings, meaning (2^64)! possible
keys. To store a single key, we need about log2((2^64)!) bits. Our software calculator will
not handle this, not even bc. So let’s try Wolfram Alpha, which returns 1.15398 × 10^21.
That means about 10^21 bits are needed to store a key. That is approximately 125,000,000
TB. If someone wanted to send a short encrypted message to you, they would first need
to exchange a 125,000,000 TB key with you. Hence we see an ideal block cipher with
large blocks is not practical due to the key length.
For storage of the mappings, consider if you had to create a table similar to the 2-bit
ideal block ciphers. There are (2^64)! columns, representing the keys. There are 2^64 rows,
representing the possible plaintext values. Each cell in the table contains a 64-bit, or 8
Byte, ciphertext value. So the storage space needed is (2^64)! × 2^64 × 8 Bytes. If you attempt
to calculate this you will quickly see it is not practical to store the entire table.
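Although the factorial itself is far too large to compute, its logarithm is not. As a sketch (using the log-gamma function from Python's standard math module, where ln(N!) = lgamma(N + 1)), the key length can be estimated without Wolfram Alpha. The conversion to terabytes uses the rounded value of 10^21 bits and 1 TB = 10^12 Bytes, as assumed in the solution.

```python
import math

N = 2 ** 64                                   # number of possible 64-bit blocks

# ln(N!) = lgamma(N + 1); dividing by ln(2) converts the result to bits
key_bits = math.lgamma(N + 1) / math.log(2)
print(f"log2((2^64)!) is about {key_bits:.5e} bits")

# Rounding to 10^21 bits, as in the solution, and converting to terabytes
rounded_bits = 1e21
terabytes = rounded_bits / 8 / 1e12
print(f"about {terabytes:,.0f} TB per key")
```

The floating point result is only an approximation, but it agrees with the value quoted in the solution to the precision shown there.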
Video
Ideal Block Cipher (8 min; Mar 2021)
https://www.youtube.com/watch?v=-LjOIGjURGs
To overcome the limitations of ideal block ciphers, Horst Feistel designed a general
scheme that is practical in the sense of implementation and key lengths, but still achieves
suitable security.
• Feistel proposed applying two or more simple ciphers in sequence so final result is
cryptographically stronger than component ciphers
• Approach:
For example, with a 64-bit block cipher, there are 2^64 possible mappings/keys, meaning
the key length is log2(2^64) = 64 bits.
• Diffusion
• Confusion
Diffusion and confusion are concepts introduced by Claude Shannon. See a summary
of Shannon’s contributions in telecommunications, digital circuits and cryptography in
Chapter C.
You don’t need to know the details of the Feistel structure. Just be aware that it is
a design principle used in many block ciphers, including DES.
– Block size, e.g. 64, 128 bits: larger values lead to more diffusion
– Key size, e.g. 128 bits: larger values lead to more confusion, resistance against
brute force
– Number of rounds, e.g. 16 rounds
– Subkey generation algorithm: should be complex
– Round function F : should be complex
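For readers who would like to see how these parameters fit together, below is a minimal sketch of the commonly described Feistel structure in Python. Everything specific in it is an assumption for illustration: the round function (built from SHA-256), the four subkey values and the 64-bit block split into two 32-bit halves. It is not DES, and the subkeys are supplied directly rather than produced by a subkey generation algorithm.

```python
import hashlib


def round_function(right: int, subkey: int) -> int:
    """Hypothetical round function F: mixes a 32-bit half block with a subkey."""
    data = right.to_bytes(4, "big") + subkey.to_bytes(4, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


def feistel(block: int, subkeys: list) -> int:
    """Process a 64-bit block with one Feistel round per subkey."""
    left, right = block >> 32, block & 0xFFFFFFFF
    for k in subkeys:
        # New left is the old right; new right is old left XOR F(old right, subkey)
        left, right = right, left ^ round_function(right, k)
    return (right << 32) | left          # final swap of the two halves


subkeys = [0x0F1E2D3C, 0x4B5A6978, 0x8796A5B4, 0xC3D2E1F0]   # illustrative values
plaintext = 0x0123456789ABCDEF
ciphertext = feistel(plaintext, subkeys)

# Decryption uses the same structure with the subkeys in reverse order
assert feistel(ciphertext, subkeys[::-1]) == plaintext
```

A useful property to notice is that F itself never needs to be inverted: the structure is reversible whatever round function is chosen.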
Video
Ideal Block Cipher vs Feistel Structure (2 min; Mar 2021)
https://www.youtube.com/watch?v=5-zWE7GjjaQ
Figure 7.10 illustrates the general operation of a stream cipher encryption and de-
cryption. The sender uses a shared secret key K and an algorithm to generate effectively
a random stream of bits. This random stream of bits is XORed with the plaintext bits
as needed.
The receiver uses the same key and algorithm, which in turn generates the same
random stream of bits. When XORed with the ciphertext, the original plaintext is
output.
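The operation of a stream cipher can be sketched in Python as follows. The keystream generator here (hashing the key together with a counter) is a hypothetical stand-in chosen so the example is self-contained; it is not one of the stream ciphers used in practice, and the key and message are made up.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Hypothetical keystream generator: hash the key with a running counter."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_with_keystream(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR each data byte with the matching keystream byte."""
    return bytes(d ^ s for d, s in zip(data, keystream(key, len(data))))


key = b"shared secret"
ciphertext = xor_with_keystream(key, b"attack at dawn")
# The receiver generates the same keystream, so the same function decrypts
assert xor_with_keystream(key, ciphertext) == b"attack at dawn"
```

Encryption and decryption are the same operation because XORing twice with the same keystream bit returns the original bit.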
An issue when using stream ciphers is that a key cannot be re-used. This is usually
addressed by introducing an initialisation value or vector (IV).
• Encrypting two different plaintexts with the same key leads to key re-use attack
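Why key re-use is a problem can be seen with a few lines of Python. In this sketch the keystream bytes and the two messages are made-up values; the keystream simply stands in for whatever stream a cipher would generate from one key.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


keystream = bytes.fromhex("8f3a5c17e2904bd6")   # the same keystream, used twice
p1 = b"PAY 0100"
p2 = b"PAY 9999"
c1 = xor_bytes(p1, keystream)
c2 = xor_bytes(p2, keystream)

# The keystream cancels out: C1 XOR C2 = P1 XOR P2
assert xor_bytes(c1, c2) == xor_bytes(p1, p2)

# If the attacker also knows (or guesses) P1, the second plaintext follows directly
assert xor_bytes(xor_bytes(c1, c2), p1) == p2
```

The attacker never learns the key, yet obtains information about both plaintexts, which is why the keystream must differ for every message.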
Question 7.5 (When can key re-use attack be successful if IV is used?). If a stream
cipher is using an n-bit Initialisation Vector/Value (IV), but the same key, under what
conditions is a key re-use attack possible? Assume the IV increments every time an
encrypt operation is performed.
– 1998: DeepCrack
– 2006: COPACABANA
• Developed by EFF
• 80 × 10^9 keys/sec
Video
DeepCrack Brute Force on DES (1 min; Mar 2021)
https://www.youtube.com/watch?v=SIH2lgfV3Ao
• See www.sciengines.com
Using the above example, we can roughly estimate what it would cost today to brute
force DES.
A simplification of Moore’s law is that computers double their speed every 1.5 years.
In practice it is not that simple, but it is a useful rule to estimate the cost of brute force
today. It means in 1.5 years time, you could buy a computer that is double the speed of a
new computer today, and at the same cost. Alternatively, you could buy a lower specced
computer, which is the same speed as a new computer today, but at half the cost of today’s
computer.
Assuming computers halve in cost every 1.5 years, between 2006 and 2020 is 14 years.
Over 15 years, there are 10 1.5 year periods, so the cost would halve 10 times. (Again
since this is an estimate, let’s use 15 years instead of 14). If you halve $10,000 10 times,
you get about $9.77. That is, a $10 computer today can brute force DES in 8.6 days.
As brute force attacks can be parallelised easily, you could spend $100 on 10 computers
(or buy a $100 computer) and break DES in less than a day. DES is not secure against
a brute force attack (and hasn’t been for a long time).
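The estimate above is easy to reproduce. The following Python sketch uses the figures from the discussion ($10,000, 8.6 days, a halving period of 1.5 years); the function name is ours and the result is only as good as the simplified version of Moore's law it relies on.

```python
def cost_after(years: float, cost_then: float, halving_period: float = 1.5) -> float:
    """Cost of the same computing power after `years`, halving every period."""
    return cost_then / 2 ** (years / halving_period)


cost_now = cost_after(15, 10_000)        # $10,000 machine, 15 years later
print(f"${cost_now:.2f} for a machine that takes 8.6 days")

# Brute force parallelises: 10 machines each search one tenth of the keys
days_single = 8.6
machines = 10
print(f"{machines} machines: {days_single / machines:.2f} days for about ${machines * cost_now:.0f}")
```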
Video
SciEngines Copacabana Brute Force on DES (3 min; Mar 2021)
https://www.youtube.com/watch?v=8RGD7ckwoBI
• Biclique Attack
FPGAs are essentially computer processors programmed for a specific task, in this
case, decrypting with AES very fast. For about $12,800 a RIVYERA could decrypt
AES-128 at a rate of 500 × 10^6 keys per second.
A known plaintext attack on AES is called the Biclique attack. The RIVYERA
implementation of the Biclique attack could decrypt AES-128 at a rate of 945 × 10^6
keys per sec, about twice that of a brute force.
Now let’s consider what it would take to break AES.
• Biclique attack about 2 to 4 times faster, but requires 2^88 known plaintext/ciphertext
pairs
Applying the same logic from analysis of DES brute force and Moore’s law (i.e. every
1.5 years halve cost or double speed), we can perform a rough analysis of the cost/time
to break AES-128. The numbers (dollars, years) are so large that even if the
approximations are incorrect by a factor of 1,000,000,000 (e.g. reducing 10^14 years to
100,000 years), it is still impossible to break AES-128.
Video
SciEngines Rivyera Attack on AES (4 min; Mar 2021)
https://www.youtube.com/watch?v=KC3Z3yp0s5k
7.7. EXAMPLE: MEET-IN-THE-MIDDLE ATTACK 85
• Encrypt plaintext with one key, then encrypt output with another key
Double encryption was a (naive) option for extending the key length of DES. It
effectively would double the key length from 56 bits to 112 bits. A new cipher would not
have to be designed or analysed, and existing software/hardware implementations could
be used.
But a meet-in-the-middle attack makes Double-DES (or double encryption on any
block cipher) insecure.
• With two known plaintext, ciphertext pairs, probability of successful attack is al-
most 1
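The idea of the attack can be sketched in Python on a toy cipher. Note the cipher below is hypothetical (an 8-bit block, an 8-bit key and a fixed lookup table built from powers of 3 modulo 257); it is not Bob's Cipher from the exercise and not DES, and the secret key and plaintexts are made-up values. The attacker encrypts a known plaintext with every possible first key, decrypts the matching ciphertext with every possible second key, and looks for values that meet in the middle.

```python
# Toy 8-bit block cipher with an 8-bit key (hypothetical, for illustration only)
SBOX = [pow(3, x, 257) - 1 for x in range(256)]      # a fixed permutation of 0..255
INV_SBOX = [0] * 256
for x, y in enumerate(SBOX):
    INV_SBOX[y] = x


def E(k: int, p: int) -> int:
    return SBOX[p ^ k]


def D(k: int, c: int) -> int:
    return INV_SBOX[c] ^ k


def double_encrypt(k1: int, k2: int, p: int) -> int:
    return E(k2, E(k1, p))


def meet_in_the_middle(pairs: list) -> list:
    """Return every (k1, k2) consistent with all known (plaintext, ciphertext) pairs."""
    p1, c1 = pairs[0]
    # Forward: intermediate value X = E(k1, P1) for every possible k1
    forward = {}
    for k1 in range(256):
        forward.setdefault(E(k1, p1), []).append(k1)
    # Backward: decrypt C1 with every possible k2 and look for a matching X
    candidates = [(k1, k2) for k2 in range(256) for k1 in forward.get(D(k2, c1), [])]
    # Use the remaining pairs to discard false matches
    return [(k1, k2) for k1, k2 in candidates
            if all(double_encrypt(k1, k2, p) == c for p, c in pairs[1:])]


secret = (0x3A, 0xC5)
pairs = [(p, double_encrypt(*secret, p)) for p in (0x0D, 0x19, 0x7E)]
assert secret in meet_in_the_middle(pairs)
```

The two loops perform 2 × 256 cipher operations rather than the 256 × 256 needed to try every key pair, which is why doubling the key length in this way does not double the effective strength.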
Figure 7.15 shows an example 5-bit block cipher with a 3-bit key. To encrypt, look in
the left column to find the row of the plaintext, then look for the column corresponding
to the key. The intersection of row and column gives the ciphertext.
This example block cipher is used in the Meet-in-the-Middle attack exercise.
Exercise 7.3 (Meet-in-the-Middle Attack). Figure 7.15 shows an example 5-bit block
cipher, referred to as Bob’s Cipher. A double version of Bob’s cipher, called Double-Bob,
was used by two users to exchange multiple encrypted messages using the same 6-bit
secret key. You have obtained the plaintext/ciphertext pairs of two of those messages:
(P1 , C1 ) = (01101, 11111) and (P2 , C2 ) = (11001, 11011). Using a meet-in-the-middle
attack, find the secret key.
Solution 7.3 (Meet-in-the-Middle Attack). Figure 7.16 shows notes on performing the
attack. Figure 7.17 shows calculations of the performance of the attack, and compares
to an attack on Double-DES.
Video
Meet-in-the-Middle attack on 5-bit block cipher (52 min; Feb 2016)
https://www.youtube.com/watch?v=AwNlaN1w9jg
• Different variations:
Figure 7.18 shows the concept of Triple Encryption, where two different keys are
used. This effectively doubles the key strength compared to the original cipher. Another
variation (not shown) would be to use three different keys, effectively tripling the key
strength.
Note that if you use the same key for each step, then because of the E-D-E approach,
this reverts to the original cipher. That is, if you use Triple-DES but use the same key
in each step, this reverts to (single) DES. The benefit of this is that you can have an
implementation of Triple-DES (which is built on the implementations of DES), and allow
the user to choose a key to suit their needs: 1 key for DES, 2 keys for 112-bit security, 3
keys for 168-bit security.
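The claim that E-D-E with a single repeated key reverts to the original cipher can be checked on any cipher. The Python sketch below uses a hypothetical toy cipher (8-bit block, 8-bit key, a fixed lookup table) in place of DES; only the E-D-E ordering is taken from the text.

```python
# Toy cipher standing in for DES (hypothetical, for illustration only)
SBOX = [pow(3, x, 257) - 1 for x in range(256)]      # a fixed permutation of 0..255
INV_SBOX = [0] * 256
for x, y in enumerate(SBOX):
    INV_SBOX[y] = x


def E(k: int, p: int) -> int:
    return SBOX[p ^ k]


def D(k: int, c: int) -> int:
    return INV_SBOX[c] ^ k


def triple_ede(k1: int, k2: int, k3: int, p: int) -> int:
    """Encrypt with k1, decrypt with k2, encrypt with k3."""
    return E(k3, D(k2, E(k1, p)))


# Same key in every step: the D step undoes the first E, leaving a single encryption
assert all(triple_ede(k, k, k, p) == E(k, p) for k in range(256) for p in range(256))

# Two-key variant: the first key is used again in the third step
c = triple_ede(0x11, 0x22, 0x11, 0x5A)
assert D(0x11, E(0x22, D(0x11, c))) == 0x5A       # decryption is D-E-D
```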
Video
Theoretical Attacks on DES and AES (2 min; Mar 2021)
https://www.youtube.com/watch?v=H001PSmfgMc
7.8. EXAMPLE: CRYPTANALYSIS ON TRIPLE-DES AND AES 89
This chapter provides details of Data Encryption Standard (DES), with concepts demon-
strated via a simplified, educational version called Simplified-DES. Many of the details
serve mainly as reference, with little discussion.
Presentation slides that accompany this chapter can be downloaded in the following
formats: slides only (PDF); slides with notes (PDF, ODP, PPTX).
8.2 Simplified-DES
To understand the details of a cipher, it often helps if you can perform the encryption
(or decryption) steps yourself. However as common block ciphers operate on blocks of
64 bits or larger, and use similar sized keys, it is difficult to manually and efficiently
perform operations. Therefore, to illustrate the principles of selected real ciphers, sim-
plified versions have been developed. This section presents Simplified Data Encryption
Standard (S-DES), which is a cut-down version of DES. For example, S-DES operates
on 8-bit blocks, uses a 10-bit key and has only 2 rounds. As it is designed using
the same principles as (real) DES but using smaller values, it is possible to step through
an example encryption by hand. For some this can be a powerful way to understand
the operations used in real DES. It is important however to note that S-DES is just for
education; it is not a real cipher used in practice today or in the past. You will only find
it referred to in textbooks and university classes.
92 CHAPTER 8. DATA ENCRYPTION STANDARD
Figure 8.1 shows the key generation and encryption steps of S-DES. Key generation,
shown on the left, is used to generate round keys and is the same algorithm when used
for both encryption and decryption. That is, the encrypter and decrypter will generate
the exact same round keys.
The encrypter starts with a shared secret key 10 bits long and 8 bits of plaintext.
Two sub-keys, or round keys, K1 and K2 are generated using the key generation steps,
which involve Permutations and Left Shifts.
Encryption applies an Initial Permutation, then a round function fk (with details to
be shown shortly), SWaps the two halves of the 8 bit output, then reapplies the round
function, but using the 2nd round key as input. Encryption ends with the inverse of the
Initial Permutation.
Figure 8.2 shows the key generation and decryption. Decryption is in fact identical to
encryption, except the round keys are used in the opposite order. That is, for encryption
round key K1 is used first, then round key K2 . For decryption, K2 is used first and then
K1 .
Figure 8.3 shows the details of the round function, fk . Note that the same steps are
applied in the 2nd round, but instead K2 is used as the round key. Operations include
Expand and Permutate, XOR, S-boxes and a Permutation of 4 bits. The 8 bits output
(left half and right half) are then input to the SWap block (swapping the two halves).
Definitions of the permutations and S-boxes follow.
Definition 8.1 (S-DES Permutations). Permutations used in S-DES:
P10 (permutate)
Input : 1 2 3 4 5 6 7 8 9 10
Output: 3 5 2 7 4 10 1 9 8 6
P8 (select and permutate)
Input : 1 2 3 4 5 6 7 8 9 10
Output: 6 3 7 4 8 5 10 9
P4 (permutate)
Input : 1 2 3 4
Output: 2 4 3 1
EP (expand and permutate)
Input : 1 2 3 4
Output: 4 1 2 3 2 3 4 1
IP (initial permutation)
Input : 1 2 3 4 5 6 7 8
Output: 2 6 3 1 4 8 5 7
As an example, permutation P4 takes a 4-bit input and produces a 4-bit output. The
1st bit of the input becomes the 4th bit of the output. The 2nd bit of the input becomes
the 1st bit of the output. The 3rd bit of the input becomes the 3rd bit of the output.
The 4th bit of the input becomes the 2nd bit of the output.
The permutations are fixed. That is they are always these exact permutations, and
known by the encrypter, decrypter and attacker.
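The way a permutation table is read can be sketched in a few lines of Python. This is an illustration only: the function name permute and the choice to hold bits in a string are assumptions, not part of the S-DES definition. Each table entry is treated as the 1-based input position that supplies the corresponding output bit.

```python
# Permutation tables copied from Definition 8.1. Each entry is the
# (1-based) input position that supplies the corresponding output bit.
P10 = [3, 5, 2, 7, 4, 10, 1, 9, 8, 6]
P4 = [2, 4, 3, 1]


def permute(bits: str, table: list[int]) -> str:
    """Rearrange (or select from) the characters of bits according to table."""
    return "".join(bits[position - 1] for position in table)


# A symbolic input makes the P4 mapping visible: 'a' is input bit 1, etc.
print(permute("abcd", P4))
# P10 applied to the 10-bit key used in Exercise 8.1.
print(permute("1010000010", P10))
```

The same helper works for P8 and EP, since selecting fewer bits or repeating bits is only a matter of what the table lists.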
• LS-1: left shift by 1 position
• LS-2: left shift by 2 positions
• IP−1 : inverse of IP, such that X = IP−1 (IP(X))
• SW: swap the halves
• fK : a round function using round key K
• F: internal function in each round
Definition 8.2 (S-DES S-Boxes). S-Box considered as a matrix: input used to select
row/column; selected element is output
4-bit input: bit1 , bit2 , bit3 , bit4
bit1 bit4 specifies row (0, 1, 2 or 3 in decimal)
bit2 bit3 specifies column
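\[
S0 = \begin{bmatrix} 01 & 00 & 11 & 10 \\ 11 & 10 & 01 & 00 \\ 00 & 10 & 01 & 11 \\ 11 & 01 & 11 & 10 \end{bmatrix}
\qquad
S1 = \begin{bmatrix} 00 & 01 & 10 & 11 \\ 10 & 00 & 01 & 11 \\ 11 & 00 & 01 & 00 \\ 10 & 01 & 00 & 11 \end{bmatrix}
\]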
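As a small sketch of the row/column selection in Definition 8.2, the lookup can be written in Python as follows. The function name sbox_lookup and the string representation of bits are assumptions made for illustration.

```python
# S-boxes from Definition 8.2, one inner list per row.
S0 = [["01", "00", "11", "10"],
      ["11", "10", "01", "00"],
      ["00", "10", "01", "11"],
      ["11", "01", "11", "10"]]
S1 = [["00", "01", "10", "11"],
      ["10", "00", "01", "11"],
      ["11", "00", "01", "00"],
      ["10", "01", "00", "11"]]


def sbox_lookup(sbox: list[list[str]], nibble: str) -> str:
    """Map a 4-bit input string to the 2-bit output selected in sbox."""
    row = int(nibble[0] + nibble[3], 2)     # bit1 bit4 select the row
    column = int(nibble[1] + nibble[2], 2)  # bit2 bit3 select the column
    return sbox[row][column]


# Input 0110 to S0: row 00 (0), column 11 (3).
print(sbox_lookup(S0, "0110"))
```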
Exercise 8.1 (Encrypt with S-DES). Show that when the plaintext 01110010 is en-
crypted using S-DES with key 1010000010 that the ciphertext obtained is 01110111.
Video
Simplified DES Example (44 min; Jan 2016)
https://www.youtube.com/watch?v=3jGMCyOXOV8
Solution 8.1 (Encrypt with S-DES). The input 10-bit key, K, is: 1010000010. Then
the steps for generating the two 8-bit round keys, K1 and K2 , are:
2. Left shift by 1 position both the left and right halves: 00001 11000
4. Left shift by 2 positions the left and right halves: 00100 00011
2. Assume the input from step 1 is in two halves, L and R: L=1010, R=1001
5. Input the left half of step 4 into S-Box S0 and the right half into S-Box S1:
7. XOR output from step 6 with L from step 2: 0111 XOR 1010 = 1101
8. Now we have the output of step 7 as the left half and the original R as the right
half. Swap the halves and move to round 2: 1001 1101
13. XOR output of step 12 with left half from step 8: 0111 XOR 1001 = 1110
14. Input output from step 13 and right half from step 8 into inverse IP
So our encrypted result of plaintext 01110010 with key 1010000010 is: 01110111
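The whole calculation can be checked with a short Python sketch. This is an illustrative implementation under stated assumptions, not a reference one: the function names are invented here, bits are held in strings for readability, the order of key generation steps (P10, LS-1, P8, LS-2, P8) follows the solution steps above, and the inverse of IP is derived from IP rather than taken from a table.

```python
P10 = [3, 5, 2, 7, 4, 10, 1, 9, 8, 6]
P8 = [6, 3, 7, 4, 8, 5, 10, 9]
P4 = [2, 4, 3, 1]
EP = [4, 1, 2, 3, 2, 3, 4, 1]
IP = [2, 6, 3, 1, 4, 8, 5, 7]
# Inverse of IP, derived so that IP_INV(IP(X)) = X.
IP_INV = [IP.index(i) + 1 for i in range(1, 9)]
S0 = [[1, 0, 3, 2], [3, 2, 1, 0], [0, 2, 1, 3], [3, 1, 3, 2]]
S1 = [[0, 1, 2, 3], [2, 0, 1, 3], [3, 0, 1, 0], [2, 1, 0, 3]]


def permute(bits, table):
    return "".join(bits[i - 1] for i in table)


def left_shift(bits, n):
    return bits[n:] + bits[:n]  # circular left shift


def xor(a, b):
    return "".join("1" if x != y else "0" for x, y in zip(a, b))


def sbox(box, nibble):
    row = int(nibble[0] + nibble[3], 2)
    column = int(nibble[1] + nibble[2], 2)
    return format(box[row][column], "02b")


def round_keys(key):
    """Generate K1 and K2 from the 10-bit key."""
    k = permute(key, P10)
    left, right = left_shift(k[:5], 1), left_shift(k[5:], 1)
    k1 = permute(left + right, P8)
    left, right = left_shift(left, 2), left_shift(right, 2)
    k2 = permute(left + right, P8)
    return k1, k2


def f_k(bits, round_key):
    """Round function: only the left half is changed."""
    left, right = bits[:4], bits[4:]
    t = xor(permute(right, EP), round_key)
    t = permute(sbox(S0, t[:4]) + sbox(S1, t[4:]), P4)
    return xor(left, t) + right


def crypt(block, keys):
    bits = f_k(permute(block, IP), keys[0])
    bits = bits[4:] + bits[:4]  # SW: swap the halves
    return permute(f_k(bits, keys[1]), IP_INV)


def encrypt(plaintext, key):
    return crypt(plaintext, round_keys(key))


def decrypt(ciphertext, key):
    k1, k2 = round_keys(key)
    return crypt(ciphertext, (k2, k1))  # same steps, round keys reversed


print(encrypt("01110010", "1010000010"))
```

Decryption reuses the same crypt function with the round keys swapped, which mirrors the relationship between Figure 8.1 and Figure 8.2.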
In summary, S-DES:
• If we know the plaintext and corresponding ciphertext, can we determine the key? Very hard
The general design of S-DES follows the same principles as DES, although the algo-
rithm parameters differ.
• S-DES vs DES
• Rounds: 2 vs 16
• F: 4 bits vs 32 bits
• S-Boxes: 2 vs 8
The following section presents the details of DES. This is primarily for reference (or
as evidence of the similarities and differences with S-DES). You are not expected to know
the details of the DES operations.
8.3. DETAILS OF DES 97
Figure 8.4 shows the overall steps in DES encryption. The details of each block are
shown in the following.
Figure 8.5 shows the initial permutation and its inverse. The table is read row-by-row.
So the 58th input bit becomes the 1st output bit. The 50th input bit becomes the
2nd output bit. And the 7th input bit becomes the 64th output bit.
Figure 8.6 shows the details of a single round of encryption, i.e. the round function.
Similar to S-DES, it takes the right half, applies an expand and permutate (E), XOR
with the round key, applies S-Boxes, and then a final permutate (P).
Figure 8.7 shows E and P which are used within a round of DES.
Figure 8.8 shows the first 4 S-Boxes. Each S-Box takes a 6 bit input. The first and
last bit are used to determine the row, and the middle 4 bits determine the column. The
result is a decimal value within the range 0 to 15, which determines the 4 bit output.
See https://en.wikipedia.org/wiki/DES_supplementary_material for an example
of reading the S-Boxes.
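The row and column selection for a DES S-Box can be sketched as below. Only the index calculation described above is shown; the S-Box tables themselves are not reproduced, and the function name des_sbox_index is an assumption for illustration.

```python
def des_sbox_index(six_bits: str) -> tuple[int, int]:
    """Return (row, column) selected by a 6-bit S-Box input string."""
    row = int(six_bits[0] + six_bits[5], 2)  # first and last bit
    column = int(six_bits[1:5], 2)           # middle four bits
    return row, column


# Input 011011: outer bits 01 give the row, inner bits 1101 give the column.
print(des_sbox_index("011011"))
```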
Figure 8.9 shows the last 4 S-Boxes.
Figure 8.12 shows the schedule of left shifts indicating how many bits are shifted left
when a Left Shift is applied in each round for key generation.
Now let’s encrypt using DES. You can use the list command in OpenSSL to see the
-cipher-algorithms and -cipher-commands (see Section 3.2.3). You will note there
are different variants of DES, such as DES-ECB, DES-CBC and DES-CFB. The second
identifier specifies the mode of operation. Modes of operation are covered in Chapter 11,
but in short, these allow DES, which operates on 64-bit blocks, to be used to encrypt
arbitrary sized plaintexts. The simplest mode of operation, but least secure, is Electronic
Code Book (ECB). We will use ECB in this example.
Symmetric key encryption in OpenSSL is performed using the enc operation. In the
simplest form, we specify the algorithm then the input file and output file (in our case,
ciphertext1.bin). If we don’t specify a secret key, then OpenSSL will prompt for a
password and then convert that to a secret key.
$ xxd -l 96 ciphertext1.bin
0000000: 5361 6c74 6564 5f5f f253 8361 b87d 1a3e Salted__.S.a.}.>
0000010: 30ed be95 5b38 ebf9 a013 ca64 bbf4 03ea 0...[8.....d....
0000020: 3ebb cdf8 483d 5a12 acd8 bc75 140c 920b >...H=Z....u....
0000030: da41 7376 edc3 b9bd 59c4 a5ce 0a67 408a .Asv....Y....g@.
0000040: d23e 10ee 7ac3 f5b6 4f09 4aaf 88e4 1f96 .>..z...O.J.....
0000050: 3171 7277 91a7 100c ac04 7871 dd39 cf4c 1qrw......xq.9.L
The lack of output from the diff command indicates the files plaintext1.in and
plaintext1.out are identical. We’ve retrieved the original plaintext.
xxd was used to view the first 96 bytes, in hexadecimal, of the ciphertext. The first
8 bytes contain the special string Salted__ meaning the DES key was generated using
a password and a salt. The salt is stored in the next 8 bytes of ciphertext, i.e. the value
f2538361b87d1a3e in hexadecimal. So when decrypting, the user supplies the password
and OpenSSL combines it with the salt to derive the 64-bit DES key.
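As a quick check of the layout just described, the first bytes shown in the xxd output can be split in Python. This is a sketch only: the function name split_salted_header is hypothetical, and it does not reproduce how OpenSSL derives the key from the password and salt.

```python
def split_salted_header(data: bytes) -> tuple[bytes, bytes, bytes]:
    """Split ciphertext into the 8-byte marker, the 8-byte salt and the rest."""
    magic, salt, body = data[:8], data[8:16], data[16:]
    if magic != b"Salted__":
        raise ValueError("no salt header found")
    return magic, salt, body


# First 24 bytes of ciphertext1.bin, copied from the xxd output above.
first_bytes = bytes.fromhex(
    "53616c7465645f5f" "f2538361b87d1a3e" "30edbe955b38ebf9"
)
magic, salt, body = split_salted_header(first_bytes)
print(magic, salt.hex())
```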
Section 8.4.2 shows a more detailed example where the key and IV are specified.
$ ls -l
total 4
-rw-r--r-- 1 sgordon sgordon 72 Nov 11 16:39 plaintext.txt
$ xxd -c 8 plaintext.txt
0000000: 4865 6c6c 6f2e 2054 Hello. T
0000008: 6869 7320 6973 206f his is o
0000010: 7572 2073 7570 6572 ur super
0000018: 2073 6563 7265 7420 secret
0000020: 6d65 7373 6167 652e message.
0000028: 204b 6565 7020 6974 Keep it
0000030: 2073 6563 7265 7420 secret
0000038: 706c 6561 7365 2e20 please.
0000040: 476f 6f64 6279 652e Goodbye.
$ xxd -b -c 8 plaintext.txt
0000000: 01001000 01100101 01101100 01101100 01101111 00101110 00100000 01010100 Hello. T
0000008: 01101000 01101001 01110011 00100000 01101001 01110011 00100000 01101111 his is o
0000010: 01110101 01110010 00100000 01110011 01110101 01110000 01100101 01110010 ur super
0000018: 00100000 01110011 01100101 01100011 01110010 01100101 01110100 00100000  secret
0000020: 01101101 01100101 01110011 01110011 01100001 01100111 01100101 00101110 message.
0000028: 00100000 01001011 01100101 01100101 01110000 00100000 01101001 01110100  Keep it
0000030: 00100000 01110011 01100101 01100011 01110010 01100101 01110100 00100000  secret
0000038: 01110000 01101100 01100101 01100001 01110011 01100101 00101110 00100000 please.
0000040: 01000111 01101111 01101111 01100100 01100010 01111001 01100101 00101110 Goodbye.
1. Create a short text message with echo. The -n option is used to ensure no newline is
added to the end. There are two things about this message that will be important
later: the length is a multiple of 8 characters (9 by 8 characters) and the word
secret appears twice (in particular positions).
5. Show the message in hexadecimal and binary using xxd. From now on, we’ll only
look at the hexadecimal values (not binary).
To encrypt with DES-ECB we need a secret key (as well as IV). You can choose your
own values. For security, they should be randomly chosen. We saw in Chapter 3 different
ways to generate random values. Let’s use OpenSSL’s rand twice: the first will be for
the secret key and the second for the IV.
8.4. DES IN OPENSSL 105
Now encrypt the plaintext using DES-ECB. The IV and Key are taken from the
outputs of the OpenSSL PRNG above. Importantly, we use the -nopad option at the end:
Now look at the output ciphertext. First note it is the same length as the plaintext
(as expected, when no padding is used). And on initial view, the ciphertext looks random
(as expected). But on closer inspection you see there is some structure: the 4th and 7th
lines of the xxd output are the same. This is because they correspond to the encryption of
the same original plaintext " secret " (recall that word was repeated in the plaintext, in
positions such that each occurrence fills a 64-bit block). Since ECB is used, repetitions in input
plaintext blocks will result in repetitions in output ciphertext blocks. This is insecure
(especially for long plaintext). Another mode of operation, like CBC, should be used.
$ ls -l
total 8
-rw-r--r-- 1 sgordon sgordon 72 Nov 11 16:42 ciphertext.bin
-rw-r--r-- 1 sgordon sgordon 72 Nov 11 16:39 plaintext.txt
$ xxd -c 8 ciphertext.bin
0000000: 56dc b368 d9ef 0793 V..h....
0000008: 7be4 a87d e26d c2f1 {..}.m..
0000010: e042 bbe6 9e00 6d37 .B....m7
0000018: f1e9 7163 cb4a 38d8 ..qc.J8.
0000020: 5394 a92f 8cf2 ac72 S../...r
0000028: 5064 be07 f67c d807 Pd...|..
0000030: f1e9 7163 cb4a 38d8 ..qc.J8.
0000038: a31c 0efd cd0b dd03 ........
0000040: 0486 7e2d 00ad 762d ..~-..v-
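The repetition can also be found programmatically. The following Python sketch splits data into 8-byte (64-bit) blocks and reports any block that occurs more than once. The helper name repeated_blocks is an assumption for illustration; the plaintext and ciphertext bytes are copied from the xxd outputs in this section.

```python
from collections import defaultdict

plaintext = (
    b"Hello. This is our super secret message. Keep it secret please. Goodbye."
)
ciphertext = bytes.fromhex(
    "56dcb368d9ef0793" "7be4a87de26dc2f1" "e042bbe69e006d37"
    "f1e97163cb4a38d8" "5394a92f8cf2ac72" "5064be07f67cd807"
    "f1e97163cb4a38d8" "a31c0efdcd0bdd03" "04867e2d00ad762d"
)


def repeated_blocks(data: bytes, block_size: int = 8) -> dict[bytes, list[int]]:
    """Map each repeated block to the (0-based) block indexes where it occurs."""
    positions = defaultdict(list)
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        positions[block].append(offset // block_size)
    return {block: idx for block, idx in positions.items() if len(idx) > 1}


print(repeated_blocks(plaintext))
print(repeated_blocks(ciphertext))
```

Both calls report the same block indexes, which is exactly the information ECB leaks to an attacker who sees only the ciphertext.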
And look at the decrypted value. Of course, it matches the original plaintext message.
$ ls -l
total 12
-rw-r--r-- 1 sgordon sgordon 72 Nov 11 16:42 ciphertext.bin
-rw-r--r-- 1 sgordon sgordon 72 Nov 11 16:39 plaintext.txt
-rw-r--r-- 1 sgordon sgordon 72 Nov 11 16:43 received.txt
$ cat received.txt
Hello. This is our super secret message. Keep it secret please. Goodbye.
$ xxd -c 8 received.txt
0000000: 4865 6c6c 6f2e 2054 Hello. T
0000008: 6869 7320 6973 206f his is o
0000010: 7572 2073 7570 6572 ur super
0000018: 2073 6563 7265 7420 secret
Now let’s try to decrypt again, but this time using the wrong key. I’ve changed the
last hexadecimal digit of the key from “1” to “2”.
$ openssl enc -des-ecb -d -in ciphertext.bin -out received2.txt -iv
a499056833bb3ac1 -K 001e53e887ee55f2 -nopad
Looking at the decrypted message, it is random. We didn’t obtain the original plaintext.
Normally, when padding is used, the padding that OpenSSL adds when encrypting acts as a
simple checksum: invalid padding after decrypting allows most incorrectly deciphered
messages to be automatically detected.
$ ls -l
total 16
-rw-r--r-- 1 sgordon sgordon 72 Nov 11 16:42 ciphertext.bin
-rw-r--r-- 1 sgordon sgordon 72 Nov 11 16:39 plaintext.txt
-rw-r--r-- 1 sgordon sgordon 72 Nov 11 16:46 received2.txt
-rw-r--r-- 1 sgordon sgordon 72 Nov 11 16:43 received.txt
$ xxd -c 8 received2.txt
0000000: 0346 e59e c22d 403f .F...-@?
0000008: 63ff 28fd eb6b 387d c.(..k8}
0000010: b52f d595 06c0 342f ./....4/
0000018: f419 3569 e383 c857 ..5i...W
0000020: 0a77 0b49 6f62 cb64 .w.Iob.d
0000028: 8265 d419 51f3 ea12 .e..Q...
0000030: f419 3569 e383 c857 ..5i...W
0000038: f296 33f3 5cf4 d359 ..3.\..Y
0000040: e205 4018 0ce0 34f5 ..@...4.
However the checksum used within OpenSSL is not perfect, so it shouldn’t be relied
upon for secure authentication (i.e. checking the received message is correct). Chapter 17
discusses different ways for the receiver to be sure they have obtained the original message.
Video
DES Encryption using OpenSSL (13 min; Jan 2012)
https://www.youtube.com/watch?v=VdE21ku7SMs
Solution 8.2 (DES Key Generation). It is important that any symmetric key is generated
randomly. Using the OpenSSL rand operation is a good approach. See Section 3.2.4 and/or
Section 8.4.2 for examples.
The key must be 64 bits (8 bytes or 16 hex digits).
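For readers working in Python rather than at the OpenSSL command line, a key of the right size could be generated along these lines. This is an alternative sketch, not the method used in the referenced sections; it relies on the standard library secrets module, and the variable name key_hex is arbitrary.

```python
import secrets

# 8 random bytes = 64 bits, printed as 16 hexadecimal digits.
key_hex = secrets.token_hex(8)
print(key_hex)
```

The resulting string has the same form as the value passed with -K in the OpenSSL examples.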
8.5. DES IN PYTHON 107
Exercise 8.3 (DES Encryption). Create a message in a plain text file and after using
DES, send the ciphertext to the person you shared the key with.
Solution 8.3 (DES Encryption). See OpenSSL examples in Section 8.4.2. The sender
and receiver should agree upon the mode of operation, an IV (recommended to be random
in general, although not needed for ECB) and the use of padding (recommended to be
used).
• cryptography.io/en/latest/hazmat/primitives/symmetric-encryption/
Chapter 9
This chapter provides details of Advanced Encryption Standard (AES), with concepts
demonstrated via a simplified, educational version called Simplified Advanced Encryption
Standard (S-AES). Many of the details serve mainly as reference, with little discussion.
Presentation slides that accompany this chapter can be downloaded in the following
formats: slides only (PDF); slides with notes (PDF, ODP, PPTX).
• 1999: 3DES (168-bit key). NIST recommended 3DES be used (DES only for legacy
systems)
• 1997: NIST called for proposals for new Advanced Encryption Standards
The process for determining the algorithm to be selected for AES was performed as a
multi-round competition run by NIST. There are varying criteria for selecting a winner.
110 CHAPTER 9. ADVANCED ENCRYPTION STANDARD
– General Security
– Software and hardware implementations (needs to be efficient)
– Low RAM/ROM requirements (e.g. for smart cards)
– Ability to change keys quickly
– Potential to use parallel processors
Rijndael, a proposal by Vincent Rijmen and Joan Daemen, was selected as the winner
for the following reasons:
• Low memory requirements: good, except encryption and decryption require sepa-
rate space
• Key flexibility: supports on-the-fly change of keys and different size of keys/blocks
Key parameters of Rijndael were the block and key sizes. While Rijndael supports
various sizes, the eventual NIST standard settled on a single block size (128 bits) with three key
lengths (128, 192 and 256 bits).
• For details of AES see the Stallings textbook, AES on Wikipedia or the AES stan-
dard from NIST
9.2. SIMPLIFIED-AES 111
9.2 Simplified-AES
• Educational purposes only. Mohammad A. Musa, Edward F. Schaefer and Stephen
Wedig (2003) A Simplified AES Algorithm and its Linear and Differential Crypt-
analyses, Cryptologia, 27:2, 148-177, DOI: 10.1080/0161-110391891838
• Operations:
S-AES operates on 16-bit blocks, with some operations on 8-bit words and others on
4-bit nibbles. For example, a 16-bit block is equivalent to two 8-bit words or four 4-bit
nibbles.
Figure 9.1, Figure 9.2, and Figure 9.3 show the encryption, decryption and key gen-
eration algorithms, respectively. The operations used in the algorithms are defined later.
Figure 9.1 shows the overall steps for S-AES key expansion and encryption. The
key generation takes a 16-bit secret key and expands that into three 16-bit round keys. The
first round key K0 is simply the original key. The next two round keys, K1 and K2 , are
generated by an expansion algorithm. Figure 9.3 shows that algorithm for K1 .
S-AES encryption operates on 16-bit blocks of plaintext. To encrypt, there is an initial
add key, and then two rounds, where the 2nd round does not include the mix columns
operation.
Figure 9.2 shows the decryption operations. Note that it is similar to encryption in
reverse, with all operations replaced with their inverse operations. The same round keys
are used as in encryption, but in the opposite order.
Figure 9.3 shows the key generation operations for generating round key K1 . Similar
steps are used to generate K2 , where the input is K1 and a different round constant.
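Definition 9.1 (S-AES State Matrix). S-AES operates on a 16-bit state matrix, viewed
as 4 nibbles
\[
\begin{bmatrix} b_0 b_1 b_2 b_3 & b_8 b_9 b_{10} b_{11} \\ b_4 b_5 b_6 b_7 & b_{12} b_{13} b_{14} b_{15} \end{bmatrix}
=
\begin{bmatrix} S_{0,0} & S_{0,1} \\ S_{1,0} & S_{1,1} \end{bmatrix}
\]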
While S-AES operates on 16-bits at a time, those bits are viewed as a state matrix
of 4 nibbles. Note the matrix is filled columnwise, with the first 8 bits (2 nibbles) in the
first column.
The following shows operations based on the state matrix.
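Definition 9.2 (S-AES Shift Row, Add Key and Rotate Nibble operations). S-AES
Shift Row:
\[
\begin{bmatrix} S_{0,0} & S_{0,1} \\ S_{1,0} & S_{1,1} \end{bmatrix}
\rightarrow
\begin{bmatrix} S_{0,0} & S_{0,1} \\ S_{1,1} & S_{1,0} \end{bmatrix}
\]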
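The column-wise filling of the state matrix and the Shift Row operation can be sketched in Python. The function names to_state and shift_row are illustrative, and the sketch assumes that b0 is the most significant bit of the 16-bit block.

```python
def to_state(block: int) -> list[list[int]]:
    """Split a 16-bit block into four nibbles, filling the matrix column by column."""
    n0, n1, n2, n3 = [(block >> shift) & 0xF for shift in (12, 8, 4, 0)]
    return [[n0, n2],   # row 0: S0,0 S0,1
            [n1, n3]]   # row 1: S1,0 S1,1


def shift_row(state: list[list[int]]) -> list[list[int]]:
    """Swap the two nibbles in the second row; the first row is unchanged."""
    return [state[0][:], [state[1][1], state[1][0]]]


state = to_state(0x1234)
print(state)
print(shift_row(state))
```

Using the block 0x1234 makes the ordering easy to follow, since each nibble value shows its original position in the block.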
The left-most 2 bits in a nibble determine the row, and the right-most 2 bits in the
nibble determine the column. The output nibble is based on the S-Box. The Inverse
S-Box is used in decryption.
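Definition 9.4 (S-AES Mix Columns). Mix the columns in the state matrix by performing
a matrix multiplication.
Mix Columns:
\[
\begin{bmatrix} S'_{0,0} & S'_{0,1} \\ S'_{1,0} & S'_{1,1} \end{bmatrix}
=
\begin{bmatrix} 1 & 4 \\ 4 & 1 \end{bmatrix}
\begin{bmatrix} S_{0,0} & S_{0,1} \\ S_{1,0} & S_{1,1} \end{bmatrix}
\]
Inverse Mix Columns:
\[
\begin{bmatrix} S'_{0,0} & S'_{0,1} \\ S'_{1,0} & S'_{1,1} \end{bmatrix}
=
\begin{bmatrix} 9 & 2 \\ 2 & 9 \end{bmatrix}
\begin{bmatrix} S_{0,0} & S_{0,1} \\ S_{1,0} & S_{1,1} \end{bmatrix}
\]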
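Definition 9.5 (S-AES Mix Columns (Simple)). Mix the columns in the state matrix
by performing the following calculations.
Mix Columns:
\begin{align*}
S'_{0,0} &= S_{0,0} \oplus (0100 \times S_{1,0}) \\
S'_{1,0} &= (0100 \times S_{0,0}) \oplus S_{1,0} \\
S'_{0,1} &= S_{0,1} \oplus (0100 \times S_{1,1}) \\
S'_{1,1} &= (0100 \times S_{0,1}) \oplus S_{1,1}
\end{align*}
Inverse Mix Columns:
\begin{align*}
S'_{0,0} &= (1001 \times S_{0,0}) \oplus (0010 \times S_{1,0}) \\
S'_{1,0} &= (0010 \times S_{0,0}) \oplus (1001 \times S_{1,0}) \\
S'_{0,1} &= (1001 \times S_{0,1}) \oplus (0010 \times S_{1,1}) \\
S'_{1,1} &= (0010 \times S_{0,1}) \oplus (1001 \times S_{1,1})
\end{align*}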
For multiplication, look up the result in Figure 9.4.
Figure 9.4 shows the GF(2^4) multiplication table in binary. The green column is used
in encryption (Mix Columns) and the two blue columns are used in decryption (Inverse
Mix Columns). For example with encryption, when multiplying a value by 4 (0100 in
binary), look up the value in the first column (e.g. 0111) and the answer will be in the
green column (e.g. 1111).
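Instead of a lookup table, the multiplication can be computed directly. The Python sketch below assumes the reduction polynomial x^4 + x + 1 (binary 10011), which is the one used for S-AES in the Musa, Schaefer and Wedig paper but is not stated in this section; the function names gf16_mul and mat_mul are illustrative.

```python
def gf16_mul(a: int, b: int) -> int:
    """Multiply two nibbles in GF(2^4), reducing by x^4 + x + 1 (assumed)."""
    result = 0
    for _ in range(4):
        if b & 1:
            result ^= a       # addition in GF(2^4) is XOR
        b >>= 1
        a <<= 1               # multiply a by x
        if a & 0x10:
            a ^= 0b10011      # reduce when the degree reaches 4
    return result


def mat_mul(m: list[list[int]], s: list[list[int]]) -> list[list[int]]:
    """2x2 matrix product over GF(2^4), as used by (Inverse) Mix Columns."""
    return [[gf16_mul(m[i][0], s[0][j]) ^ gf16_mul(m[i][1], s[1][j])
             for j in range(2)]
            for i in range(2)]


MIX = [[1, 4], [4, 1]]
INV_MIX = [[9, 2], [2, 9]]

print(format(gf16_mul(0b0100, 0b0111), "04b"))  # the table example: 4 x 7
state = [[6, 4], [12, 0]]                       # arbitrary example state
print(mat_mul(INV_MIX, mat_mul(MIX, state)) == state)
```

Multiplying the two matrices together gives the identity matrix, which is why Inverse Mix Columns undoes Mix Columns.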
Now let’s compare S-AES to the real AES, specifically AES-128.
• S-AES
9.3. SIMPLIFIED-AES EXAMPLE 115
• AES-128
Solution 9.1 (Encrypt with S-AES). See the PDF of the solution at:
https://sandilands.info/sgordon/teaching/reports/simplified-aes-example-v2.pdf
Both the Key (note uppercase -K) and IV were specified on the command line as a
hexadecimal string. With AES-128, they must be 32 hex digits (128 bits). You may
choose any value you wish.
Use the list operation in OpenSSL to see the variants of AES supported by OpenSSL
(see Section 3.2.3).
You can select the algorithms to test, e.g. AES, DES and Message Digest 5 hash
function (MD5):
$ openssl speed aes-128-cbc des md5
...
The ’numbers’ are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
9.4. AES IN OPENSSL 117
The output shows the progress, the versions and options used for OpenSSL and then
a summary table at the end. Focus on the summary table, and the last line (for aes-
128-cbc) in the example above. The speed test encrypts as many b Byte input plaintexts
as possible in a period of 3 seconds. Different size inputs are used, i.e. b = 16, 64,
256, 1024 and 8192 Bytes. The summary table reports the encryption speed in Bytes
per second. So if 25955833 16-Byte plaintext values are encrypted in 3 seconds, then
the speed reported in the summary table is 25955833 × 16 ÷ 3 ≈ 138 million Bytes per
second. You can see that value (138,894.09kB/s) in the table above. So AES using 128
bit key and CBC can encrypt about 138 MB/sec when small plaintext values are used
and 155 MB/sec when plaintext values are 8192 Bytes.
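The arithmetic behind the summary table can be reproduced in a couple of lines of Python. This is only a worked calculation; the function name throughput is arbitrary, and the 3 second duration is the nominal test period, so the result differs slightly from the value OpenSSL reports.

```python
def throughput(blocks_processed: int, block_bytes: int, seconds: float) -> float:
    """Bytes processed per second."""
    return blocks_processed * block_bytes / seconds


rate = throughput(25955833, 16, 3)
print(f"{rate / 1000:.2f}k bytes per second")
```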
Normally OpenSSL implements all algorithms in software. However recent Intel CPUs
include instructions specifically for AES encryption, a feature referred to as AES-NI. If
an application such as OpenSSL uses these special instructions, then part of the AES
encryption is performed directly by the CPU. This is usually much faster (compared to
using general instructions). To run a speed test that uses the Intel AES-NI, use the evp
option:
$ openssl speed -evp aes-128-cbc
...
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-128-cbc 689927.75k 729841.81k 745383.38k 747226.84k 747784.87k
Compare the values to the original results. In the original test we achieved 138
MB/sec. Using the Intel AES hardware encryption we get a speed of 689 MB/sec, about
5 times faster.
Solution 9.2 (AES Key Generation). It is important that any symmetric key is generated
randomly. Using the OpenSSL rand operation is a good approach. See Section 3.2.4 for
examples.
The users need to select a key length: 128, 192 or 256 bits.
Exercise 9.3 (AES Encryption). Create a message in a plain text file and after using
AES, send the ciphertext to the person you shared the key with.
Solution 9.3 (AES Encryption). See OpenSSL examples in Section 9.4.1. The sender
and receiver should agree upon the mode of operation, an IV (recommended to be random
in general, although not needed for ECB) and the use of padding (recommended to be
used).
Exercise 9.5 (AES Performance Benchmarking). Perform speed tests on AES using
both the software and hardware implementations (if available). Compare and discuss
the impact of the following on performance: key length; software vs hardware; different
computers (e.g. compare the performance with another person).
Solution 9.5 (AES Performance Benchmarking). See OpenSSL examples in Section 9.4.2.
The performance of AES-128, AES-192 and AES-256 should be compared. Also, compare
software implementation of AES (default when running OpenSSL) with the hardware im-
plementation (-evp) if supported by your computer.
• https://cryptography.io/en/latest/hazmat/primitives/symmetric-encryption/
Chapter 10
120 CHAPTER 10. PSEUDORANDOM NUMBER GENERATORS
Chapter 11
This chapter presents common modes of operation available with symmetric block ciphers.
Modes of operation allow the block ciphers to be applied to inputs greater than the block
size. The differences in design lead to different security and performance tradeoffs. This
chapter is primarily for reference, presenting the modes but with little explanation of
each.
Presentation slides that accompany this chapter can be downloaded in the following
formats: slides only (PDF); slides with notes (PDF, ODP, PPTX).
• Block cipher: operates on fixed length b-bit input to produce b-bit ciphertext
• Naive approach: Break plaintext into b-bit blocks (padding if necessary) and apply
cipher on each block independently
– ECB
122 CHAPTER 11. BLOCK CIPHER MODES OF OPERATION
We will not cover each mode of operation in detail, but rather present them so you
are aware of some of the common modes. For more technical details of some of these
modes of operation, including discussion of padding, error propagation and the use of
initialisation vectors, see NIST Special Publication 800-38A Recommendation for Block
Cipher Modes of Operation: Methods and Techniques. Additional (newer) modes of
operation are in the NIST SP 800-38 series, such as 800-38C CCM, 800-38D GCM and
800-38E XTS-AES.
• Problem: with long message, repetition in plaintext may cause repetition in cipher-
text
• Input to encryption algorithm is XOR of next 64 bits of plaintext and preceding 64
bits of ciphertext
• Initialisation Vector (IV) must be known by sender/receiver, but secret from at-
tacker
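The chaining described in these points can be sketched in Python. To keep the example self-contained, a toy 64-bit "block cipher" (XOR with the key, then rotate the bytes) stands in for DES; it is NOT secure and is purely an assumption for illustration, as are all function names. The all-zero IV is used only so the example is reproducible.

```python
BLOCK = 8  # 64-bit blocks, as in DES


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt_block(block: bytes, key: bytes) -> bytes:
    """Insecure stand-in for a real block cipher such as DES."""
    mixed = xor_bytes(block, key)
    return mixed[1:] + mixed[:1]


def toy_decrypt_block(block: bytes, key: bytes) -> bytes:
    return xor_bytes(block[-1:] + block[:-1], key)


def blocks(data: bytes) -> list[bytes]:
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_encrypt(plaintext: bytes, key: bytes) -> bytes:
    return b"".join(toy_encrypt_block(b, key) for b in blocks(plaintext))


def cbc_encrypt(plaintext: bytes, key: bytes, iv: bytes) -> bytes:
    previous, out = iv, []
    for block in blocks(plaintext):
        # XOR the plaintext block with the preceding ciphertext block (or IV).
        previous = toy_encrypt_block(xor_bytes(block, previous), key)
        out.append(previous)
    return b"".join(out)


def cbc_decrypt(ciphertext: bytes, key: bytes, iv: bytes) -> bytes:
    previous, out = iv, []
    for block in blocks(ciphertext):
        out.append(xor_bytes(toy_decrypt_block(block, key), previous))
        previous = block
    return b"".join(out)


key = bytes(range(8))
iv = bytes(8)
message = b"AAAAAAAA" * 2  # two identical plaintext blocks

print(blocks(ecb_encrypt(message, key)))      # identical ciphertext blocks
print(blocks(cbc_encrypt(message, key, iv)))  # different ciphertext blocks
```

With ECB the two identical plaintext blocks produce identical ciphertext blocks, while with CBC the second block is encrypted after being combined with the first ciphertext block, so the repetition disappears.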
11.3. CIPHER BLOCK CHAINING MODE 123
11.7 XTS-AES
XTS-AES is a mode of operation designed for AES to be used to encrypt stored data
(e.g. disk drives). Compared to CBC, it improves the ability for a receiver to detect if
the ciphertext has been changed.
• XTS-AES designed for encrypting stored data (as opposed to transmitted data)
• Overcomes potential attack on CBC whereby one block of the ciphertext is changed
by the attacker, and that change does not affect all other blocks
• See Stallings Chapter 6.7 for details and differences to transmitted data encryption
Part IV
Chapter 12
This chapter summarises key concepts in public key cryptography. These concepts will
be demonstrated when looking at specific algorithms, including RSA (Chapter 13), Diffie-
Hellman Key Exchange (Chapter 14) and Elliptic Curve Cryptography (Chapter 15).
Presentation slides that accompany this chapter can be downloaded in the following
formats: slides only (PDF); slides with notes (PDF, ODP, PPTX).
With symmetric key encryption, assume the sender generates a random key. The
receiver of the encrypted data must also know that key in order to decrypt the data. But
how does the receiver learn the key? If the sender sends the key unencrypted then an
attacker can learn the key and it is no longer secret. If the sender encrypts the key, then
the same problem arises: how do they get the second key (which is used to encrypt the
first key) to the receiver?
Public key encryption can solve this problem, as we will see in the following slides.
File: crypto/public.tex, r1944
132 CHAPTER 12. PUBLIC KEY CRYPTOGRAPHY
Symmetric key encryption has been the main form of cryptography for a long time.
It wasn’t until the 1960s and 1970s that public key cryptography was designed.
– Keys are generated using a known algorithm (they are not chosen randomly like
symmetric keys)
• Public key, PU
• Private key, PR
• Ciphers: if encrypt with one key in the pair, can only successfully decrypt with the
other key in the pair
Consider all the students in the class. With public key crypto, each student would
generate their own key pair. They could tell everyone their public key (e.g. yell it out in
class, print on the screen and show), but they must keep their private key secret. Note
that the keys are related: an algorithm is used to generate them (they are not randomly
chosen like symmetric key encryption secret keys). That algorithm must be designed
such that it is practically impossible for someone to find the private key if they know the
public key.
The encryption/decryption algorithms in public key crypto are designed such that if
you encrypt plaintext with one key in the pair, then you can only successfully decrypt the
ciphertext if using the other key from that pair. For example, if you encrypt a message
with the public key of Steve, then you can only decrypt the ciphertext if you know the
private key of Steve.
Some public key ciphers also work in the other direction: if you encrypt a message
with the private key of Steve, then you can only decrypt the ciphertext if you know the
public key of Steve. We will see this in digital signatures.
This assumes User A (on the left) already knows the public key of user B. Since it is
PUBLIC there is no problem with A knowing B’s public key. However in practice, there
are problems with A being sure that the public key does indeed belong to B (maybe it
is someone pretending to be B). We don’t cover that here, but in the chapter on digital
certificates we will see this issue (of knowing whose public key it is) be addressed.
• Diffie-Hellman
• Elliptic Curve
Video
Concepts of Public Key Cryptography (21 min; Apr 2021)
https://www.youtube.com/watch?v=9ZFm9i_uYvM
Chapter 13
RSA
This chapter presents the RSA algorithm, as an example of public key cryptography.
Presentation slides that accompany this chapter can be downloaded in the following
formats: slides only (PDF); slides with notes (PDF, ODP, PPTX).
As we will see, the plaintext and ciphertext are integers. Any data can be represented
in binary, and then split into blocks, where each block is taken as an input to RSA.
More information about Rivest, Shamir and Adleman is given in Chapter C.
• Step 1: Users generate RSA key pairs using RSA Key Generation Algorithm
The following will show the algorithms used in steps 1, 3 and 4. For now we assume
the users can exchange public keys, noting that public keys do not need to be kept secret.
For example, one method to exchange public keys over a network is to simply email the
public key, unencrypted. It doesn’t matter if an attacker intercepts the public key, since,
by definition, it is public to everyone.
Later we will see that the exchange of public keys is in fact harder than it seems.
File: crypto/rsa.tex, r1945
136 CHAPTER 13. RSA
Video
Introduction to RSA (4 min; Apr 2021)
https://www.youtube.com/watch?v=29Jpc-rvH8w
Algorithm 13.1 (RSA Key Generation). Each user generates their own key pair
1. Choose primes p and q
2. Calculate n = pq
3. Select e: gcd(φ(n), e) = 1, 1 < e < φ(n)
4. Find d ≡ e^{-1} (mod φ(n))
The user keeps p, q and d private. The values of e and n can be made public.
• Public key of user, PU = {e, n}
• Private key of user, PR = {d, n}
Note that the private key includes both d and n, however the same n is also included
in the public key. So while n is included in the private key, it is not actually private.
This describes the conceptual view of the RSA public and private key. Implementations
of RSA may store additional information in the keys, especially the private key.
Video
RSA Key Generation Algorithm (6 min; Apr 2021)
https://www.youtube.com/watch?v=bKuaAv8LsFY
Exercise 13.1 (RSA Key Generation). Assume user A chose the primes p = 17 and
q = 11. Find the public and private keys of user A.
Solution 13.1 (RSA Key Generation). First calculate n:
n = p×q
= 17 × 11
= 187
Now find φ(n) using the property of Euler’s totient (Definition 5.8):
Next we choose a number relatively prime with 160, which will be e. Or in other
words, the greatest common divisor of e and 160 is 1. There are multiple values possible
for e. We need to choose just one value, and at this point, any of those values. Let’s
start small. As 160 is even, the even numbers will not be relatively prime with 160 so we
can ignore them. What about 3? As 3 is prime and is not a divisor of 160, then 3 and
160 are relatively prime. So e = 3 is a valid choice. There are other valid choices (e.g. 7,
9, 11, . . . ), but we will go with 3.
Now we need to find the multiplicative inverse of 3 modulo 160. That is, find a d such
that:
3 × d ≡ 1 (mod 160)
The extended Euclidean algorithm can efficiently find a multiplicative inverse. But for
now, as we are using small numbers, we can use trial and error. Note that the condition
can be satisfied if we can find a d that satisfies the following, for an integer a:
3 × d = (a × 160) + 1
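With numbers this small, the trial-and-error search is easy to carry out in Python. The sketch below follows the steps of this solution using p = 17, q = 11 and e = 3; the variable names are illustrative, and the built-in pow with a negative exponent (Python 3.8 or later) is used only as a cross-check of the search.

```python
from math import gcd

p, q = 17, 11
n = p * q                      # modulus, part of both keys
phi = (p - 1) * (q - 1)        # Euler's totient of n for two distinct primes

# Small values of e that are relatively prime with phi.
candidates = [e for e in range(2, 20) if gcd(e, phi) == 1]
e = candidates[0]

# Trial and error: try d = 1, 2, 3, ... until e * d leaves remainder 1.
d = next(x for x in range(1, phi) if (e * x) % phi == 1)

# Cross-check with the built-in modular inverse.
assert d == pow(e, -1, phi)

print("n =", n, "phi =", phi)
print("candidate e values:", candidates)
print("e =", e, "d =", d)
```

For realistic key sizes the search loop is hopeless, which is why the extended Euclidean algorithm (or an equivalent library routine) is used in practice.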
Video
RSA Key Generation Example (14 min; Feb 2015)
https://www.youtube.com/watch?v=_57utzfyyPY
Note the conceptual simplicity of the encryption and decryption algorithms, com-
pared to DES and AES. Also note that the decryption algorithm is in fact identical to
encryption—it is only the variable names that have changed.
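The point that encryption and decryption are the same operation can be seen in a minimal Python sketch. It assumes the key pair from Exercise 13.1 with e = 3, so n = 187 and d = 107 (the value satisfying 3 × d ≡ 1 (mod 160)); the function name rsa_apply and the message value are illustrative. Numbers this small offer no security.

```python
def rsa_apply(value: int, exponent: int, n: int) -> int:
    """Modular exponentiation: used for both encryption and decryption."""
    if not 0 <= value < n:
        raise ValueError("value must be in the range 0 to n-1")
    return pow(value, exponent, n)


e, d, n = 3, 107, 187          # assumed key pair from Exercise 13.1

M = 8
C = rsa_apply(M, e, n)         # encrypt with the public key {e, n}
recovered = rsa_apply(C, d, n) # decrypt with the private key {d, n}
print(M, C, recovered)
```

Only the exponent differs between the two calls, which is what makes RSA conceptually simpler than the multi-step round structures of DES and AES.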
Video
RSA Encryption and Decryption (2 min; Apr 2021)
https://www.youtube.com/watch?v=lQLJy6XVRuY
For RSA to be usable it must meet the following usability and security requirements:
2. Successful decryption: Encryption with one key of a key pair (e.g. PU) can only be
successfully decrypted with the other key of the key pair (e.g. PR)
3. Computational efficiency: Easy to calculate M^e mod n and C^d mod n for all values
of M < n
We will not show how RSA meets these requirements yet (it is covered in more depth
later), but RSA does indeed meet these requirements.
The 1st requirement is that if a message is encrypted, then the decryption of the
resulting ciphertext will produce the original message.
The 2nd requirement is that you can only use keys in the same key pair; using the
wrong key will produce incorrect results.
The 3rd requirement is that users can easily perform the encrypt and decrypt op-
erations. By “easily” we mean within reasonable time (i.e. seconds, not thousands of
years).
The 4th requirement is that an attacker cannot find the private value d or the message.
The 5th requirement is that, even if the attacker knows old plaintext values and the
corresponding ciphertext (which was obtained using the same key pair), they should not
be able to find d or M .
Looking at the algorithms it is not immediately obvious how the security requirements
are met. That is because, for example, the encryption algorithm is an equation with 4
variables (C, M , e, n), of which 3 are known to the attacker. Why can’t the attacker
re-arrange the equation and find the value of the unknown variable M ? We will see some
analysis of the security later.
• RSA encryption uses one key of a key pair, while decryption must use the other
key of that same key pair
• In practice, RSA is primarily used for authentication, i.e. signing and verifying
messages
Why does confidentiality work? Since the receiver is the only user that knows their
private key, then they are the only user that can decrypt the ciphertext.
Why does authentication work? Since the sender is the only user that knows their
private key, then they are the only user that can sign the message/plaintext. And the
receiver can verify it came from that user if the signature decrypts successfully with the
sender’s public key.
Figures 13.1 and 13.2 illustrate how the key pair is used in RSA to provide either
confidentiality or authentication. Note that such a feature (ability to use keys in either
direction) is included in some, but not all, public key cryptography ciphers.
Figure 13.1 shows RSA used to provide confidentiality of the message M. User A is
on the left and user B is on the right. The operations E() and D() correspond to the
encrypt and decrypt algorithms of RSA, respectively. User A encrypts the message using
user B’s public key, $PU_B$. The ciphertext is sent to user B. User B then decrypts using
their own private key, $PR_B$.
Figure 13.2 shows RSA used to provide authentication of the message M. The operations
E() and D() correspond to the encrypt and decrypt algorithms of RSA, respectively,
however they are more commonly referred to as signing and verification operations,
respectively. User A encrypts/signs the message using their own private key, $PR_A$. The
ciphertext/signed message is sent to user B. User B then decrypts/verifies using user A’s
public key, $PU_A$.
Exercise 13.2 (RSA Encryption for Confidentiality). Assume user B wants to send a
confidential message to user A, where that message, M, is 8. Find the ciphertext that B
will send A.
Solution 13.2 (RSA Encryption for Confidentiality). For confidentiality, the sender
encrypts using the receiver’s public key. From the previous key generation exercise, the
public key of user A is $PU_A = \{e = 3, n = 187\}$. With $M = 8$, the RSA encryption
algorithm can be applied:
$C = M^e \bmod n$
$= 8^3 \bmod 187$
$= 512 \bmod 187$
$= 138$
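The calculation above can be checked with a few lines of Python. This is a minimal sketch using the toy values from the exercise; the function name rsa_encrypt is an illustrative assumption and not part of any library, and textbook RSA without padding should not be used for real data.

```python
def rsa_encrypt(m: int, e: int, n: int) -> int:
    """Textbook RSA encryption: C = M^e mod n (no padding, toy use only)."""
    if not 0 <= m < n:
        raise ValueError("message must satisfy 0 <= M < n")
    # Three-argument pow performs modular exponentiation efficiently.
    return pow(m, e, n)


# Public key of user A from the exercise: e = 3, n = 187.
ciphertext = rsa_encrypt(8, 3, 187)
print(ciphertext)
```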
Video
RSA Encryption Example (11min; Feb 2015)
https://www.youtube.com/watch?v=fdGGErmf9E8
Exercise 13.3 (RSA Decryption for Confidentiality). Show that user A successfully
decrypts the ciphertext.
Solution 13.3 (RSA Decryption for Confidentiality). User A receives the ciphertext,
$C = 138$ from B, and decrypts using their own private key $PR_A = \{d = 107, n = 187\}$.
$M = C^d \bmod n$
$= 138^{107} \bmod 187$
Be careful at this stage. Some calculators will approximate the exponentiation (the
calculator applications in Ubuntu 18.08 and Windows 10 do not, but older desk calcu-
lators will). You may try an arbitrary precision calculator, such as bc (see Chapter 3).
The output from the exponentiation using bc is:
138^107
92696267009151974112580966494142469075148237762435797813883675229744\
10315603725576855575549455980054411733018856229158449793951447981059\
64058537231504845445105996494390906329961481123710256232656386293889\
6109715508026034609979392
Then performing the mod gives:
138^107 % 187
8
Therefore user A has successfully decrypted the ciphertext, obtaining the original
plaintext, M = 8.
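As an alternative to bc, Python integers have arbitrary precision, so the same check can be sketched as follows. The function name rsa_decrypt is an assumption for illustration; the values are those of the exercise. Using three-argument pow reduces modulo n at every step, so the very large intermediate power never has to be formed.

```python
def rsa_decrypt(c: int, d: int, n: int) -> int:
    """Textbook RSA decryption: M = C^d mod n (no padding, toy use only)."""
    return pow(c, d, n)


# Private key of user A from the exercise: d = 107, n = 187.
plaintext = rsa_decrypt(138, 107, 187)

# The long way, as done with bc: full exponentiation, then the mod.
long_way = (138 ** 107) % 187

print(plaintext, long_way)
```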
13.2. ANALYSIS OF RSA 141
n = pq
1 < e < φ(n)
$ed \equiv 1 \pmod{\varphi(n)}$ or $d \equiv e^{-1} \pmod{\varphi(n)}$
Here we see why the key generation algorithm is designed as it is. Decryption will
only work (that is, produce the original plaintext) if the top equation is true. Note that
$(M^e)^d = M^{ed}$. So the condition is that if you take the plaintext M and raise it to the
power ed then the answer must be the original M (in mod n). For this to be true, e and
d must be chosen appropriately; it will not work for just any value of e and d. Using
Euler’s theorem it can be shown that it will be true if e and d are multiplicative inverses
of each other in mod φ(n).
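The condition can be checked exhaustively for the small key used in the exercises. The sketch below assumes p = 11 and q = 17 (the prime factors of n = 187) and e = 3; the variable names are illustrative. It derives d as the inverse of e in mod φ(n), confirms that every M < n survives an encrypt/decrypt round trip, and shows that an arbitrary other exponent does not work.

```python
from math import gcd

p, q = 11, 17             # assumed primes; n = 187 as in the exercises
n = p * q
phi = (p - 1) * (q - 1)   # Euler's totient of n
e = 3

assert gcd(e, phi) == 1   # e must have an inverse in mod phi(n)
d = pow(e, -1, phi)       # multiplicative inverse (needs Python 3.8+)

# Round trip M -> M^e -> (M^e)^d for every possible message M < n.
round_trip_ok = all(pow(pow(m, e, n), d, n) == m for m in range(n))

# An exponent that is not the inverse of e fails for at least one message.
bad_d = d + 1
bad_round_trip_ok = all(pow(pow(m, e, n), bad_d, n) == m for m in range(n))

print(d, round_trip_ok, bad_round_trip_ok)
```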
Now we consider the guidelines for choosing values of parameters in RSA key gener-
ation.
• Choosing e
• Choosing d
• Choosing p and q
As we saw in the exercise, key generation involves selecting values for p, q and e
(where e influences the value of d as it is the multiplicative inverse).
As e is a public value, a small value can be selected (since a brute force is not relevant;
the attacker already knows it) and in fact, many users can use the same value as each
other. For example, OpenSSL defaults to using $e = 2^{16} + 1 = 65537$ for all keypairs
generated. That is, by default everyone using OpenSSL to generate keypairs will have
the same value of e. This value is small, meaning encryption is reasonably fast.
As d is the multiplicative inverse of e, a small e means d will be large. This is good,
because d must be kept private; large values are not subject to brute force attack. But it
makes decryption slow, since it involves $C^d$, which often means taking one very large number
C and raising it to the power of another very large number d. We will see later there are
algorithms that can speed up the decryption process.
The primes p and q should be chosen randomly (again, they are private, so should be
hard for an attacker to guess). A common approach is to choose a large odd number and
then check if it is prime. There are primality testing algorithms that can either prove
the number selected is prime, or give high confidence that it is prime (i.e. probabilistic
test). When RSA is used for signatures (its most common use), probabilistic testing is
sufficient (it is faster than testing for provable primes).
Now we look at why RSA is considered secure by considering the possible attacks on
RSA.
The three mathematical attacks require the attacker to solve computationally hard
problems. That is, when large values are used,
Video
Analysis of RSA (6 min; Apr 2021)
https://www.youtube.com/watch?v=CX2-Crudguk
Video
Avenues of attack on RSA (10 min; Feb 2015)
https://www.youtube.com/watch?v=ywlzcE3eQzQ
13.3. IMPLEMENTATIONS OF RSA 143
• Modular arithmetic, especially exponentiation, can be slow with very large numbers
(1000’s of bits)
• Also Euler’s theorem and Chinese Remainder Theorem can simplify calculations
• Implementations of RSA often store and use intermediate values to speed up de-
cryption
While there are methods to speed up decryption in RSA (see the next slide), it is still
significantly slower than encryption in practice.
• Encryption:
$C = M^e \bmod n$
• Decryption:
$M = C^d \bmod n$
• Public exponent, e
• Private exponent, d
• Exponent1, $d_p = d \bmod (p - 1)$
• Exponent2, $d_q = d \bmod (q - 1)$
We see the parameters used within OpenSSL. p, q, n, e and d are normal. However
$d_p$, $d_q$ and $q_{inv}$ are intermediate values introduced and stored as part of the private key.
They are used to speed up the decryption calculation. The decryption algorithm is split
into multiple steps using these intermediate values, such that it is significantly faster than
if using a single step. However the end result is still the same.
While you don’t need to know what the intermediate steps are, it is useful to know
that these intermediate values exist, as you will see them when using RSA in practice
(e.g. generating keys with OpenSSL).
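For readers who are curious, the following is a minimal sketch of how the intermediate values can be used, based on the Chinese Remainder Theorem. It assumes the toy key from the earlier exercises (p = 11, q = 17, d = 107) and the usual definition of the coefficient as the inverse of q in mod p; the function name rsa_decrypt_crt is illustrative and not an OpenSSL API.

```python
def rsa_decrypt_crt(c: int, p: int, q: int, dp: int, dq: int, qinv: int) -> int:
    """Decrypt c using two small exponentiations instead of one large one."""
    m1 = pow(c, dp, p)              # result in mod p
    m2 = pow(c, dq, q)              # result in mod q
    h = (qinv * (m1 - m2)) % p      # combine the two partial results
    return m2 + h * q


p, q, d = 11, 17, 107               # toy private key values (assumed)
dp = d % (p - 1)                    # exponent1
dq = d % (q - 1)                    # exponent2
qinv = pow(q, -1, p)                # coefficient (needs Python 3.8+)

message = rsa_decrypt_crt(138, p, q, dp, dq, qinv)
print(dp, dq, qinv, message)
```

Each exponentiation works with numbers about half the length of n, which is where the speed-up comes from with realistic key sizes.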
13.4. RSA IN OPENSSL 145
sh1/NRJOohTYtsDgvH49CpAaT9R7w42eBRfUHOv7H9KeYyv3GNlARyzXouM4WtIb
dFkMqrwrQyEIkl73l8VdXXDZtQ/xByDOjPMBxvosNM2f9jcw2BbctslbvpaJ2Mk2
oW892h8CgYEAlNOwWPMCzlyvhHqSba22clMmZRIVYzOa/g80rQq7nmAI7QoXY02/
JcOxOFBA7xK8XUToG/7hPLQ5VQfwxxpogDYvC6W0drt3z3VR8rSmN11/sUgU/4aX
9mgEnwHlukKFJ9B3MFB1niX8z542RxHUW4FGT62BGW0Ka8T7DX+sCO8CgYEAgLYG
/L7oY6ekMXnKbIK1HKyvRV0k2YGOArxmdr5Uzgw0bA3lzytAfal+Bwq8NThSgl5p
WLqNaJ1SFTcUQh1PZeYq2h3lF0IlkeFnouYIcdLHghYFtun6ZS4+Pks7zgqgr2s0
XfdWhKbIIzO/+XogkA89zzDn1GRb5dt5wPTT5r8CgYEA29235n/Hw7wzOJyao6nO
3rjCZon4/V2G800VJF5hhAqCX5KDLd0KIMbaHaxsjW+n79CqZSUz3kZtpSXBXRJ7
SIXoCYljaoxdJ6SkVED6uFmcZ+3iwioxXzpIFIW0ZZj5S/WgBkPsioAJ6Cp5S8zh
BFB15UA+JWFH2SRabjXf0+4=
-----END PRIVATE KEY-----
The genpkey command takes an algorithm (RSA) as an option, and that algorithm
may have further specific options. In this example we set the RSA key length to 2048
bits and used a public exponent of 3. Omitting these -pkeyopt options will revert to the
default values. The private key (and public key information) is output to a file.
The private key file is encoded with Base64. To view the values:
4f:ba:96:f4:5d:88:1c:aa:47:4c:c1:9e:75:9d:47:
d0:7c:c1:6d:bf:8c:cd:8d:83:7c:aa:e7:54:ba:a9:
1e:8b:52:cf:76:8c:4c:d8:01:3c:f5:75:ff:33:fa:
a7:14:2f:e9:da:be:63:92:2d:d0:35:d7:4a:90:a5:
b2:a3:12:32:35:7a:c2:48:92:aa:ca:95:08:ae:d4:
dd:13:41:b7:b1:0b:d7:4d:c2:d2:3b:f4:6e:fd:51:
1f:98:39:1f:02:58:9c:98:cb:f6:4f:79:4a:ec:af:
59:9c:48:62:fb:7d:a7:16:fd:a1:b6:86:1f:39:fd:
77:48:0a:6d:8f:5a:04:72:49:4b:b4:33:6a:c7:77:
12:fe:00:cb:a4:6a:e6:27:97:0e:20:75:d4:ce:d1:
c9:94:37:2c:4a:ad:5a:f9:69:00:4f:85:9c:4a:b4:
55:79
publicExponent: 3 (0x3)
privateExponent:
70:3d:80:19:9d:5d:16:a3:3b:73:2f:3b:30:71:59:
28:a5:db:3c:6b:61:e4:15:c2:71:1f:be:ad:f3:a4:
c6:88:83:18:ce:65:6a:d2:bc:4d:cb:25:ea:57:4c:
d8:ad:f7:fc:af:22:c4:77:a7:17:02:06:cb:0c:77:
5d:53:fa:4a:da:e8:67:b1:5b:c9:88:1d:b0:dd:4e:
05:bf:a1:a6:77:d3:74:e0:c6:e0:a3:34:c5:56:35:
27:0f:4d:93:b0:13:1c:2f:88:81:14:4e:68:da:8a:
fd:d6:49:2a:5d:de:5e:57:a8:71:ef:8d:d1:c6:14:
5c:e1:df:a4:5d:88:8f:ff:bd:c4:97:54:70:bb:e8:
09:21:90:9d:77:68:31:08:51:24:0f:31:f1:34:3a:
62:38:76:02:b6:b3:11:fc:05:70:26:02:07:3b:60:
e9:2c:2c:5e:2d:fa:d8:da:12:87:94:15:05:8f:4b:
01:62:1a:05:5a:53:7c:2d:9a:fd:43:a1:2c:1b:00:
a4:96:b3:ff:61:0e:0d:ef:80:df:00:16:4b:b7:1c:
27:41:92:99:a9:a7:60:98:aa:80:56:14:37:d4:35:
6d:aa:4e:d0:7b:21:4d:9f:c1:43:ad:a2:a8:96:f0:
27:a0:a9:53:5f:f9:7f:8a:59:3f:39:99:bd:4d:9e:
a3
prime1:
00:df:3d:88:85:6c:84:35:8b:07:46:b7:db:a4:84:
91:ab:7c:b9:97:9b:20:14:cd:68:7d:16:cf:03:90:
19:6d:90:0d:63:8f:23:14:f4:9e:b8:a5:89:d4:78:
61:66:9c:1a:8b:e7:5c:29:fe:51:db:0e:55:ff:8b:
e9:2a:a7:9c:c0:51:46:91:78:8e:b2:19:33:b7:2f:
fa:ec:0e:f9:53:0c:3f:89:ec:1f:7f:49:e3:f1:9c:
06:ee:82:d8:97:63:c7:bb:b8:b2:c8:78:b0:6d:38:
fb:37:6d:51:6a:9a:be:89:41:e9:77:84:41:a6:23:
8f:a1:a7:78:94:3f:82:0d:67
prime2:
00:c1:11:0a:7b:1e:5c:95:7b:76:4a:36:af:a2:c4:
0f:ab:03:06:e8:0b:b7:46:42:55:04:1a:99:b2:1d:
7f:35:12:4e:a2:14:d8:b6:c0:e0:bc:7e:3d:0a:90:
1a:4f:d4:7b:c3:8d:9e:05:17:d4:1c:eb:fb:1f:d2:
9e:63:2b:f7:18:d9:40:47:2c:d7:a2:e3:38:5a:d2:
1b:74:59:0c:aa:bc:2b:43:21:08:92:5e:f7:97:c5:
5d:5d:70:d9:b5:0f:f1:07:20:ce:8c:f3:01:c6:fa:
2c:34:cd:9f:f6:37:30:d8:16:dc:b6:c9:5b:be:96:
89:d8:c9:36:a1:6f:3d:da:1f
exponent1:
00:94:d3:b0:58:f3:02:ce:5c:af:84:7a:92:6d:ad:
b6:72:53:26:65:12:15:63:33:9a:fe:0f:34:ad:0a:
bb:9e:60:08:ed:0a:17:63:4d:bf:25:c3:b1:38:50:
40:ef:12:bc:5d:44:e8:1b:fe:e1:3c:b4:39:55:07:
f0:c7:1a:68:80:36:2f:0b:a5:b4:76:bb:77:cf:75:
51:f2:b4:a6:37:5d:7f:b1:48:14:ff:86:97:f6:68:
04:9f:01:e5:ba:42:85:27:d0:77:30:50:75:9e:25:
fc:cf:9e:36:47:11:d4:5b:81:46:4f:ad:81:19:6d:
0a:6b:c4:fb:0d:7f:ac:08:ef
exponent2:
00:80:b6:06:fc:be:e8:63:a7:a4:31:79:ca:6c:82:
b5:1c:ac:af:45:5d:24:d9:81:8e:02:bc:66:76:be:
54:ce:0c:34:6c:0d:e5:cf:2b:40:7d:a9:7e:07:0a:
bc:35:38:52:82:5e:69:58:ba:8d:68:9d:52:15:37:
14:42:1d:4f:65:e6:2a:da:1d:e5:17:42:25:91:e1:
67:a2:e6:08:71:d2:c7:82:16:05:b6:e9:fa:65:2e:
3e:3e:4b:3b:ce:0a:a0:af:6b:34:5d:f7:56:84:a6:
c8:23:33:bf:f9:7a:20:90:0f:3d:cf:30:e7:d4:64:
5b:e5:db:79:c0:f4:d3:e6:bf
coefficient:
00:db:dd:b7:e6:7f:c7:c3:bc:33:38:9c:9a:a3:a9:
ce:de:b8:c2:66:89:f8:fd:5d:86:f3:4d:15:24:5e:
61:84:0a:82:5f:92:83:2d:dd:0a:20:c6:da:1d:ac:
6c:8d:6f:a7:ef:d0:aa:65:25:33:de:46:6d:a5:25:
c1:5d:12:7b:48:85:e8:09:89:63:6a:8c:5d:27:a4:
a4:54:40:fa:b8:59:9c:67:ed:e2:c2:2a:31:5f:3a:
48:14:85:b4:65:98:f9:4b:f5:a0:06:43:ec:8a:80:
09:e8:2a:79:4b:cc:e1:04:50:75:e5:40:3e:25:61:
47:d9:24:5a:6e:35:df:d3:ee
Check by looking at the individual values. Only the public key values are included:
alice@node1:~$ openssl pkey -in pubkey-alice.pem -pubin -text
-----BEGIN PUBLIC KEY-----
MIIBIDANBgkqhkiG9w0BAQEFAAOCAQ0AMIIBCAKCAQEAqFxAJmwLofTZLMbYyKoF
vPjI2qES1iCjqa+eBO13KczEpTWYIDwadLC434LzRQTz+wa0JrN6ooMKMJKzC/33
cEhcm4oJrkwsiUv1CJ9yebO9L1EqUPTPKAFPupb0XYgcqkdMwZ51nUfQfMFtv4zN
jYN8qudUuqkei1LPdoxM2AE89XX/M/qnFC/p2r5jki3QNddKkKWyoxIyNXrCSJKq
ypUIrtTdE0G3sQvXTcLSO/Ru/VEfmDkfAlicmMv2T3lK7K9ZnEhi+32nFv2htoYf
Of13SAptj1oEcklLtDNqx3cS/gDLpGrmJ5cOIHXUztHJlDcsSq1a+WkAT4WcSrRV
eQIBAw==
-----END PUBLIC KEY-----
Public-Key: (2048 bit)
Modulus:
00:a8:5c:40:26:6c:0b:a1:f4:d9:2c:c6:d8:c8:aa:
05:bc:f8:c8:da:a1:12:d6:20:a3:a9:af:9e:04:ed:
77:29:cc:c4:a5:35:98:20:3c:1a:74:b0:b8:df:82:
f3:45:04:f3:fb:06:b4:26:b3:7a:a2:83:0a:30:92:
b3:0b:fd:f7:70:48:5c:9b:8a:09:ae:4c:2c:89:4b:
f5:08:9f:72:79:b3:bd:2f:51:2a:50:f4:cf:28:01:
4f:ba:96:f4:5d:88:1c:aa:47:4c:c1:9e:75:9d:47:
d0:7c:c1:6d:bf:8c:cd:8d:83:7c:aa:e7:54:ba:a9:
1e:8b:52:cf:76:8c:4c:d8:01:3c:f5:75:ff:33:fa:
a7:14:2f:e9:da:be:63:92:2d:d0:35:d7:4a:90:a5:
b2:a3:12:32:35:7a:c2:48:92:aa:ca:95:08:ae:d4:
dd:13:41:b7:b1:0b:d7:4d:c2:d2:3b:f4:6e:fd:51:
1f:98:39:1f:02:58:9c:98:cb:f6:4f:79:4a:ec:af:
59:9c:48:62:fb:7d:a7:16:fd:a1:b6:86:1f:39:fd:
77:48:0a:6d:8f:5a:04:72:49:4b:b4:33:6a:c7:77:
12:fe:00:cb:a4:6a:e6:27:97:0e:20:75:d4:ce:d1:
c9:94:37:2c:4a:ad:5a:f9:69:00:4f:85:9c:4a:b4:
55:79
Exponent: 3 (0x3)
To sign the message you need to calculate its hash and then encrypt that hash using
your private key. To create a hash of a message (without encrypting):
alice@node1:~$ openssl dgst -sha1 message-alice.txt
SHA1(message-alice.txt)= 064774b2fb550d8c1d7d39fa5ac5685e2f8b1ca6
OpenSSL has an option to calculate the hash and then sign it using a selected private
key. The output will be a file containing the signature.
alice@node1:~$ openssl dgst -sha1 -sign privkey-alice.pem -out sign-alice.bin
message-alice.txt
alice@node1:~$ ls -l
total 16
-rw-r--r-- 1 sgordon users 28 2012-03-04 15:14 message-alice.txt
-rw-r--r-- 1 sgordon users 1704 2012-03-04 14:58 privkey-alice.pem
-rw-r--r-- 1 sgordon users 451 2012-03-04 15:08 pubkey-alice.pem
-rw-r--r-- 1 sgordon users 256 2012-03-04 15:20 sign-alice.bin
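The hash-then-sign idea can be sketched in Python using only the standard library. This is a toy illustration, not what OpenSSL does internally: it assumes the small key from the earlier exercises (n = 187, e = 3, d = 107), reduces the SHA-1 hash modulo n so that it fits the tiny modulus, and uses no padding. The function names and the message text are hypothetical.

```python
import hashlib

N, E, D = 187, 3, 107   # toy RSA key from the exercises (assumed)


def toy_hash(message: bytes) -> int:
    """SHA-1 of the message, reduced mod N so it fits the toy modulus."""
    digest = hashlib.sha1(message).digest()
    return int.from_bytes(digest, "big") % N


def toy_sign(message: bytes) -> int:
    """Sign: raise the hash to the private exponent."""
    return pow(toy_hash(message), D, N)


def toy_verify(message: bytes, signature: int) -> bool:
    """Verify: raise the signature to the public exponent, compare hashes."""
    return pow(signature, E, N) == toy_hash(message)


msg = b"Hello Bob, this is Alice."
sig = toy_sign(msg)
print(sig, toy_verify(msg, sig))
```

With a real 2048-bit key the full hash is signed (after padding) rather than a reduced value, so the output is a 256-byte signature as seen in the file listing.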
Note that direct RSA encryption should only be used on small files, with length less
than the length of the key. If you want to encrypt large files then use symmetric key
encryption. Two approaches to do this with OpenSSL: (1) generate a random key to be
used with a symmetric cipher to encrypt the message and then encrypt the key with RSA;
(2) use the smime operation, which combines RSA and a symmetric cipher to automate
approach 1.
Now Alice sends the following to Bob:
• Optionally, if she hasn’t done so in the past, her public key, pubkey-alice.pem
Solution 13.4 (RSA Key Generation). See the examples of genpkey and pkey commands
in Section 13.4.1.
Exercise 13.5 (RSA Signing). Create a message in a file, sign that message using the
dgst command, and then send the message and signature to another person.
13.5. RSA IN PYTHON 151
Solution 13.5 (RSA Signing). Use a text editor, such as nano, to create a file containing
a message. See the examples of dgst in Section 13.4.2.
Exercise 13.7 (RSA Performance Test). Using the OpenSSL speed command, com-
pare the performance of RSA encrypt/sign operation against the RSA decrypt/verify
operation.
Solution 13.7 (RSA Performance Test). You can select the rsa algorithm using the
speed command, so that the performance test is only for RSA (and doesn’t include AES
etc.).
• https://cryptography.io/en/latest/hazmat/primitives/asymmetric/
Chapter 14
This chapter presents the Diffie–Hellman key exchange algorithm, which was the first
example of public key cryptography.
Presentation slides that accompany this chapter can be downloaded in the following
formats: slides only (PDF); slides with notes (PDF, ODP, PPTX).
It is important to note that DHKE is a “key exchange” protocol. The purpose is for
two users to exchange a secret key. Once a secret key has been exchanged with DHKE,
the two users can then use that secret key for other purposes (e.g. for encrypting data
using AES).
File: crypto/dh.tex, r1968
154 CHAPTER 14. DIFFIE–HELLMAN KEY EXCHANGE
If you do not know what a discrete logarithm is, it is worth refreshing your knowledge
in number theory from Chapter 5.
Algorithm 14.1 (Diffie–Hellman Key Exchange). One-time setup. A and B agree upon
public values prime p and generator g, where g < p and g is a primitive root of p.
Protocol.
3. A → B: send $PU_A$
7. B → A: send $PU_B$
The values p and g are either agreed upon in advance, or selected by one user and
sent to the other in the first message. Both values are public; the attacker is assumed to
know them.
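The requirement that g is a primitive root of p can be checked directly when p is small. The sketch below is a brute force check suitable only for toy values such as the p = 19 and g = 10 used in this chapter’s exercise; the function name is an assumption for illustration.

```python
def is_primitive_root(g: int, p: int) -> bool:
    """True if the powers g^1 .. g^(p-1) mod p hit every value 1 .. p-1.

    Assumes p is prime. Brute force, so only practical for small p.
    """
    seen = set()
    value = 1
    for _ in range(p - 1):
        value = (value * g) % p
        seen.add(value)
    return len(seen) == p - 1


print(is_primitive_root(10, 19))  # generator used in the exercise
print(is_primitive_root(7, 19))   # 7^3 mod 19 = 1, so the powers repeat early
```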
When two users need to exchange a shared secret, one of them initiates the protocol.
Users A and B actually perform the same steps, but just with different values. First a
private value $PR$ is randomly selected. Then a public value $PU$ is calculated. Both users
exchange their public $PU$ values (and the attacker may learn them). Finally, both users
calculate their private values K based on their own $PR$ and the received $PU$. The values
and calculations are designed such that the K calculated by each user will be the same.
K is the shared secret key.
Note that the parameters have different variables or names in different sources. You
may also see:
• prime p: q
• generator g: α
• secret K: s
Exercise 14.1 (Diffie–Hellman Key Exchange). Assume two users, A and B, have agreed
to use DHKE with prime $p = 19$ and generator $g = 10$. Assuming A randomly chose
private $PR_A = 7$ and B randomly chose private $PR_B = 8$, find the shared secret key.
14.1. DIFFIE–HELLMAN KEY EXCHANGE ALGORITHM 155
$PU_A = g^{PR_A} \bmod p$
$= 10^7 \bmod 19$
$= 10000000 \bmod 19$
$= 15$
A sends $PU_A = 15$ to B.
Now consider from user B’s perspective. B calculates their public value using their
chosen $PR_B = 8$:
$PU_B = g^{PR_B} \bmod p$
$= 10^8 \bmod 19$
$= 100000000 \bmod 19$
$= 17$
B sends $PU_B = 17$ to A.
B can also calculate their version of the shared secret:
$K_B = PU_A^{PR_B} \bmod p$
$= 15^8 \bmod 19$
$= 2562890625 \bmod 19$
$= 5$
As A has received B’s public value, A can also calculate their version of the shared
secret:
$K_A = PU_B^{PR_A} \bmod p$
$= 17^7 \bmod 19$
$= 410338673 \bmod 19$
$= 5$
In summary, A and B have exchanged public values and then calculated a shared
secret key of $K = 5$. Figure 14.1 illustrates the DHKE steps.
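The steps of the exercise can be reproduced in Python as a quick check. This is a minimal sketch with the toy values from the exercise; the variable names are illustrative assumptions. In practice the private values would be chosen randomly and be far larger.

```python
p, g = 19, 10          # public parameters agreed by A and B
pr_a, pr_b = 7, 8      # private values chosen by A and B

# Each user calculates a public value and sends it to the other.
pu_a = pow(g, pr_a, p)
pu_b = pow(g, pr_b, p)

# Each user combines their own private value with the received public value.
k_a = pow(pu_b, pr_a, p)
k_b = pow(pu_a, pr_b, p)

print(pu_a, pu_b, k_a, k_b)
```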
Video
Diffie-Hellman Key Exchange with Example (18 min; Mar 2015)
https://www.youtube.com/watch?v=p_UhlXDOlfU
While we don’t show it here, it can easily be proved that DHKE will produce the
same value of K for both users.
Modular exponentiation, while slow with big numbers, is easy to calculate, i.e. can be
achieved in seconds or less.
The inverse operation of modular exponentiation, referred to as a discrete logarithm,
is hard to calculate. With large enough values, it is considered impossible to calculate.
Question 14.1 (Prove Identical Keys in DHKE). Prove that user A and user B will
always calculate the same shared secret key in DHKE. That is, prove that KA = KB .
Video
Proof of Identical Keys in DHKE (5 min; Mar 2015)
https://www.youtube.com/watch?v=y5G8YMA_sDU
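One way to sketch the argument for Question 14.1 is to substitute the definitions of the public values into each user’s calculation. The derivation below uses the chapter’s own symbols and relies only on the standard property that reducing mod p before or after exponentiation gives the same result.

```latex
\begin{align*}
K_A &= PU_B^{PR_A} \bmod p
     = \left(g^{PR_B} \bmod p\right)^{PR_A} \bmod p
     = g^{PR_B \cdot PR_A} \bmod p \\
K_B &= PU_A^{PR_B} \bmod p
     = \left(g^{PR_A} \bmod p\right)^{PR_B} \bmod p
     = g^{PR_A \cdot PR_B} \bmod p
\end{align*}
```

Since $PR_B \cdot PR_A = PR_A \cdot PR_B$, both expressions are the same power of g in mod p, so $K_A = K_B$.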
14.3. MAN-IN-THE-MIDDLE ATTACK ON DHKE 157
Question 14.2 (Brute Force Attack on PR in DHKE). Assuming you have intercepted
$PU_A = 15$ from the DHKE exercise, how would you perform a brute force attack to find
$PR_A$? How could such a successful brute force attack be prevented in practice?
Exercise 14.2 (Discrete Logarithm Attack in DHKE). Assuming a brute force attack is
not possible, write an equation that the attacker would have to solve to find $PR_A$.
$PU_A = g^{PR_A} \bmod p$
There is one unknown variable in the equation of four variables. The equation consists
of modular exponentiation. The inverse operation is modular logarithm, or more
commonly discrete logarithm, which can be written as:
$PR_A = \mathrm{dlog}_{g,p}(PU_A)$
which can be read as “given the base g and modulus p, find the index (or exponent)
$PR_A$ that produces the result $PU_A$”.
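For the toy values of the exercise, the discrete logarithm can be found simply by trying every exponent. The sketch below illustrates this; the function name dlog_brute_force is an assumption. The loop runs up to p − 1 times, which is trivial for p = 19 but infeasible when p is thousands of bits long.

```python
from typing import Optional


def dlog_brute_force(g: int, p: int, target: int) -> Optional[int]:
    """Return the smallest exponent x in 1..p-1 with g^x mod p == target."""
    value = 1
    for exponent in range(1, p):
        value = (value * g) % p      # value is now g^exponent mod p
        if value == target:
            return exponent
    return None                      # target is not a power of g


# Attacker knows g = 10, p = 19 and intercepts PU_A = 15.
recovered_pr_a = dlog_brute_force(10, 19, 15)
print(recovered_pr_a)
```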
• 2016: Discrete logarithm with 768 bit prime p was solved within 5300 core years
on 2.2GHz Xeon E5-2660 processor
Exercise 14.3 (MITM Attack on DHKE). Consider the “Diffie–Hellman Key Exchange”
exercise where user A chooses $PR_A = 7$ and B chooses $PR_B = 8$. Show how a MITM
attack can be performed such that an attacker Q can decrypt any communications between A
and B that use the secret shared between A and B.
Solution 14.3 (MITM Attack on DHKE). In a MITM attack, the attacker Q intercepts
messages between A and B, and masquerades as A to B, and as B to A. So when A sends
its public value $PU_A$ to B, it is intercepted by Q. Q then masquerades as B: selecting
its own $PR_{QA}$, calculating a $PU_{QA}$ and sending it back to A. A and Q calculate a shared
secret key which will be identical. Without authentication of messages, A thinks it is
communicating with B (since it sent a message to B, and received a reply from who it
thinks is B).
Q then performs a DHKE with B, and B thinks this is with A. The end result is that
A and Q have a shared secret, and B and Q have another shared secret, and both A and
B think their shared secret is with each other.
Figure 14.2 illustrates the MITM attack, where Q chooses random private value $PR_{QA} = 4$
for the DHKE with A and $PR_{QB} = 12$ for the DHKE with B.
Now assume the shared secrets are used as keys in a symmetric key cipher. When
A encrypts a message and sends it to B, Q can intercept and decrypt it, since Q knows A’s
shared secret (9). Q can then encrypt the message with B’s shared secret (11) and send
it on to B. B receives and decrypts, and subsequently responds to A. The exchange of
encrypted data continues between A and B, without them noticing that Q is intercepting
and decrypting the data.
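The two separate exchanges in the attack can be simulated with the values from the exercise. This is a minimal sketch; the variable names are illustrative, and message interception is modelled simply by Q substituting its own public values.

```python
p, g = 19, 10
pr_a, pr_b = 7, 8        # private values of A and B
pr_qa, pr_qb = 4, 12     # Q's private values for each victim

pu_a = pow(g, pr_a, p)   # sent by A, intercepted by Q
pu_b = pow(g, pr_b, p)   # sent by B, intercepted by Q
pu_qa = pow(g, pr_qa, p) # Q sends this to A, pretending to be B
pu_qb = pow(g, pr_qb, p) # Q sends this to B, pretending to be A

k_a = pow(pu_qa, pr_a, p)    # secret A believes is shared with B
k_b = pow(pu_qb, pr_b, p)    # secret B believes is shared with A
k_qa = pow(pu_a, pr_qa, p)   # Q's copy of A's secret
k_qb = pow(pu_b, pr_qb, p)   # Q's copy of B's secret

print(k_a, k_qa, k_b, k_qb)
```

A and B end up with different secrets, but neither can tell, because every message between them passes through Q.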
Video
Man-in-the-Middle Attack on Diffie-Hellman Key Exchange (16 min; Mar 2015)
https://www.youtube.com/watch?v=Jokkhl8kq4c
14.4. IMPLEMENTATIONS OF DHKE 159
• Newer protocols allow for an exchange of values (e.g. a Group Exchange protocol)
As p and g are public and known to the attacker, using the same values all the time
should not be a problem. Exchanging values involves extra communication overhead and
also processing overhead. However, following the principle of changing keys frequently to
give an attacker less chance to compromise them, many protocols now support the ability
to change the public parameters.
For this demo, we use the scenario of user Alice on node1 and Bob on node2. Take
note of the prompt to see who is performing each command.
The first step is to generate the Diffie-Hellman (DH) global public parameters, saving
them in the file dhparam.pem. We use the OpenSSL genpkey command, using the algorithm
DH and the -genparam option:
........................................................................+..
..................................................................+........
..............................................+........................+...
.....++*++*++*
Now let’s display the generated global public parameters, first in the encoded form,
then in the text form:
alice@node1:~$ cat dhparam.pem
-----BEGIN DH PARAMETERS-----
MIGHAoGBAOZVzJ4E8766527Mp3FD71xEUYdmFan4tPcSuPO99H7n9xfAm7WytmRQ
gxNn2dz4X58FKLzVMY+x2rLyPOd8SLa3OB7tE+gKFMymswteN//lPbFeLWtyei78
7lGJNnjVDpqJFmo1nldMTDyl5Z+ueZJP5vGGs2ouvem/Cf5N5QRTAgEC
-----END DH PARAMETERS-----
Each user can use the public parameters to generate their own private and public key,
saving them in their respective files. Similar to RSA, the DH private key file also stores
the public key information.
alice@node1:~$ openssl genpkey -paramfile dhparam.pem -out dhprivkey-alice.pem
26:d8:f1:11:8c:92:37:a4:51:01:40:8d:bf:fe:6c:
fd:95:b0:11:a0:16:e4:e0:ab:8a:ef:06:01:e8:36:
a4:52:b8:bb:88:be:7c:a7:1e:4f:22:f9:7a:a6:5f:
83:58:ee:69:34:8d:12:27:d6:5d:b6:e5:36:41:d1:
a6:54:2a:a4:be:4b:4a:dc:75:fa:c8:16:af:79:a8:
e3:f5:09:7f:83:13:e7:b7:25:df:37:ea:dc:8c:77:
4e:20:33:df:a9:9c:95:cc:ef:33:3b:f4:02:b0:66:
19:8c:30:48:1e:2a:83:87:5c
prime:
00:e6:55:cc:9e:04:f3:be:ba:e7:6e:cc:a7:71:43:
ef:5c:44:51:87:66:15:a9:f8:b4:f7:12:b8:f3:bd:
f4:7e:e7:f7:17:c0:9b:b5:b2:b6:64:50:83:13:67:
d9:dc:f8:5f:9f:05:28:bc:d5:31:8f:b1:da:b2:f2:
3c:e7:7c:48:b6:b7:38:1e:ed:13:e8:0a:14:cc:a6:
b3:0b:5e:37:ff:e5:3d:b1:5e:2d:6b:72:7a:2e:fc:
ee:51:89:36:78:d5:0e:9a:89:16:6a:35:9e:57:4c:
4c:3c:a5:e5:9f:ae:79:92:4f:e6:f1:86:b3:6a:2e:
bd:e9:bf:09:fe:4d:e5:04:53
generator: 2 (0x2)
The other user uses the same public parameters, dhparam.pem, to generate their
private/public key:
bob@node2:~$ openssl genpkey -paramfile dhparam.pem -out dhprivkey-bob.pem
4c:3c:a5:e5:9f:ae:79:92:4f:e6:f1:86:b3:6a:2e:
bd:e9:bf:09:fe:4d:e5:04:53
generator: 2 (0x2)
The users must exchange their public keys. To do so, they must first extract their
public keys into separate files using the pkey command
Bob would perform a similar command as above with his keys (not shown).
We can view the public keys:
alice@node1:~$ openssl pkey -pubin -in dhpub-alice.pem -text
-----BEGIN PUBLIC KEY-----
MIIBIDCBlQYJKoZIhvcNAQMBMIGHAoGBAOZVzJ4E8766527Mp3FD71xEUYdmFan4
tPcSuPO99H7n9xfAm7WytmRQgxNn2dz4X58FKLzVMY+x2rLyPOd8SLa3OB7tE+gK
FMymswteN//lPbFeLWtyei787lGJNnjVDpqJFmo1nldMTDyl5Z+ueZJP5vGGs2ou
vem/Cf5N5QRTAgECA4GFAAKBgQDZq9eMk9/d65INV9ZRMSbY8RGMkjekUQFAjb/+
bP2VsBGgFuTgq4rvBgHoNqRSuLuIvnynHk8i+XqmX4NY7mk0jRIn1l225TZB0aZU
KqS+S0rcdfrIFq95qOP1CX+DE+e3Jd836tyMd04gM9+pnJXM7zM79AKwZhmMMEge
KoOHXA==
-----END PUBLIC KEY-----
PKCS#3 DH Public-Key: (1024 bit)
public-key:
00:d9:ab:d7:8c:93:df:dd:eb:92:0d:57:d6:51:31:
26:d8:f1:11:8c:92:37:a4:51:01:40:8d:bf:fe:6c:
fd:95:b0:11:a0:16:e4:e0:ab:8a:ef:06:01:e8:36:
a4:52:b8:bb:88:be:7c:a7:1e:4f:22:f9:7a:a6:5f:
83:58:ee:69:34:8d:12:27:d6:5d:b6:e5:36:41:d1:
a6:54:2a:a4:be:4b:4a:dc:75:fa:c8:16:af:79:a8:
e3:f5:09:7f:83:13:e7:b7:25:df:37:ea:dc:8c:77:
4e:20:33:df:a9:9c:95:cc:ef:33:3b:f4:02:b0:66:
19:8c:30:48:1e:2a:83:87:5c
prime:
00:e6:55:cc:9e:04:f3:be:ba:e7:6e:cc:a7:71:43:
ef:5c:44:51:87:66:15:a9:f8:b4:f7:12:b8:f3:bd:
f4:7e:e7:f7:17:c0:9b:b5:b2:b6:64:50:83:13:67:
d9:dc:f8:5f:9f:05:28:bc:d5:31:8f:b1:da:b2:f2:
3c:e7:7c:48:b6:b7:38:1e:ed:13:e8:0a:14:cc:a6:
b3:0b:5e:37:ff:e5:3d:b1:5e:2d:6b:72:7a:2e:fc:
ee:51:89:36:78:d5:0e:9a:89:16:6a:35:9e:57:4c:
4c:3c:a5:e5:9f:ae:79:92:4f:e6:f1:86:b3:6a:2e:
bd:e9:bf:09:fe:4d:e5:04:53
generator: 2 (0x2)
After exchanging public keys, i.e. the files dhpub-alice.pem and dhpub-bob.pem,
each user can derive the shared secret. Alice uses her private key and Bob’s public
key to derive a secret, in this case a 128-byte binary value written into the file
secret-alice.bin:
alice@node1:~$ openssl pkeyutl -derive -inkey dhprivkey-alice.pem -peerkey
dhpub-bob.pem -out secret-alice.bin
Bob does the same using his private key and Alice’s public key to produce his secret
in the file secret-bob.bin:
14.6. DHKE IN PYTHON 163
The secrets should be the same. Although there is no need for Bob to send his secret
file to Alice, if he did, then Alice can use cmp to compare the files, or even xxd to manually
inspect the binary values:
alice@node1:~$ cmp secret-alice.bin secret-bob.bin
alice@node1:~$ xxd secret-alice.bin
0000000: b7cb b892 b541 7810 d8ec d089 6c89 3c19  .....Ax.....l.<.
0000010: e8e1 27d8 66ee dac8 684a f0bd 0a7f e7d3  ..'.f...hJ......
0000020: 3643 8654 fddf 4399 e58e 2c7c 3d33 9532  6C.T..C...,|=3.2
0000030: f693 edf2 c9a0 40e8 58b8 38de 74a5 c0b0  ......@.X.8.t...
0000040: 64ab 4006 a3cd d795 2cef d0fc 2b0f d1ab  d.@.....,...+...
0000050: d1e5 1a2a 3431 e3fa ba63 f7cf 1c61 ff65  ...*41...c...a.e
0000060: d9cd c85d c5fe 5c50 c543 aaeb de49 8501  ...]..\P.C...I..
0000070: 6cf1 66a6 87b6 ddec 835c b4b1 3d9d e2fe  l.f......\..=...
alice@node1:~$ xxd secret-bob.bin
0000000: b7cb b892 b541 7810 d8ec d089 6c89 3c19  .....Ax.....l.<.
0000010: e8e1 27d8 66ee dac8 684a f0bd 0a7f e7d3  ..'.f...hJ......
0000020: 3643 8654 fddf 4399 e58e 2c7c 3d33 9532  6C.T..C...,|=3.2
0000030: f693 edf2 c9a0 40e8 58b8 38de 74a5 c0b0  ......@.X.8.t...
0000040: 64ab 4006 a3cd d795 2cef d0fc 2b0f d1ab  d.@.....,...+...
0000050: d1e5 1a2a 3431 e3fa ba63 f7cf 1c61 ff65  ...*41...c...a.e
0000060: d9cd c85d c5fe 5c50 c543 aaeb de49 8501  ...]..\P.C...I..
0000070: 6cf1 66a6 87b6 ddec 835c b4b1 3d9d e2fe  l.f......\..=...
Now both Alice and Bob have a shared secret, securely exchanged across a public
network using DHKE.
• https://cryptography.io/en/latest/hazmat/primitives/asymmetric/
Chapter 15
RSA (Chapter 13) and Diffie-Hellman (Chapter 14) are two widely-used public key
cryptography algorithms. Their security depends on the difficulty of factoring large integers
into primes and solving discrete logarithms for integers, respectively. Their problem,
however, is that keys are relatively large (e.g. 2048 bits for RSA). This leads to high
communications overhead when exchanging keys in security protocols, and possibly performance
limitations when implementing on low-cost computers.
Elliptic Curve Cryptography (ECC) is another, newer approach to public key
cryptography. Mathematical operations are performed on an elliptic curve, where some
operations can be easy if certain values are known, but practically impossible if those values
are unknown. This is similar to the integer factorisation and discrete logarithm problems
that make RSA and Diffie-Hellman secure. In fact, the problem is solving a discrete
logarithm on an elliptic curve (rather than for integers as in Diffie-Hellman).
The main benefit of ECC is in performance. Specifically, to achieve a similar level of
security as RSA and Diffie-Hellman, ECC has much smaller key sizes: 100’s of bits vs
1000’s of bits. Chapter 18 gives common recommended key lengths for RSA,
Diffie-Hellman and ECC. In the past, RSA and (normal) Diffie-Hellman were favoured as ECC
was relatively new. But now ECC is used in many applications, e.g. secret key exchange
regularly uses the elliptic curve form of Diffie-Hellman rather than the normal,
integer-based form¹.
This chapter gives a brief, as simple-as-possible, introduction to ECC.
Presentation slides that accompany this chapter can be downloaded in the following
formats: slides only (PDF); slides with notes (PDF, ODP, PPTX).
$y^2 = x^3 + ax + b$
File: crypto/elliptic.tex, r1949
¹ When referring to “Diffie-Hellman”, we normally mean the algorithm based on integer discrete
logarithms. However there is a Diffie-Hellman algorithm based on elliptic curve discrete logarithms. We
will refer to this as “ECDH”.
166 CHAPTER 15. ELLIPTIC CURVE CRYPTOGRAPHY
The constraints on a and b specify the relationship between the values, i.e. you cannot
necessarily choose any values. We will not go into that detail here.
Figure 15.1 shows an example elliptic curve where a = −3 and b = 5, plotted for x
values from -4 to 4. An elliptic curve always mirrors itself about the horizontal (red)
axis.
Definition 15.2 (Addition Operation with an Elliptic Curve). Select two points on the
curve, A and B, and draw a straight line through them. The line will intersect with the
curve at a third point, R (and no other points). The horizontal inverse of point R is
defined as the addition of A and B.
$A + B = -R$
See the following figure for an example of this concept. Note the points, A, B, R and
-R are just (x, y) coordinates.
Figure 15.2 shows the concept of addition. Adding the points A and B results in the
point shown as A+B. There is always a third point that intersects the curve on the line
between A and B, and there is always an inverse of this point.
Note that we could continue the addition. For example, with A+B, add another point
C, to arrive at a new point A+B+C. And so on.
Rather than adding two different points, we can simply add a single point to itself.
The same concepts apply.
Figure 15.3 shows the self addition of point P. When adding a single point P to itself,
the line that intersects P is chosen as the line tangent to P. So P+P = 2P.
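The geometric description of addition and self addition can be turned into a small calculation. The sketch below uses ordinary floating point coordinates on the example curve with a = −3 and b = 5; the points chosen, the function names and the tolerance are assumptions for illustration. It does not handle the special case of a vertical line (adding a point to its own inverse).

```python
import math

A_COEF, B_COEF = -3.0, 5.0   # curve y^2 = x^3 - 3x + 5


def on_curve(point, tol=1e-9):
    """Check that a point satisfies the curve equation, within a tolerance."""
    x, y = point
    return abs(y * y - (x ** 3 + A_COEF * x + B_COEF)) < tol


def add_points(p1, p2):
    """Add two points: find the third intersection R, then return -R."""
    (x1, y1), (x2, y2) = p1, p2
    if p1 == p2:
        slope = (3 * x1 * x1 + A_COEF) / (2 * y1)   # tangent line at the point
    else:
        slope = (y2 - y1) / (x2 - x1)               # line through both points
    x3 = slope * slope - x1 - x2
    y3 = slope * (x1 - x3) - y1                     # already mirrored: this is -R
    return (x3, y3)


A = (0.0, math.sqrt(5.0))
B = (1.0, math.sqrt(3.0))
A_plus_B = add_points(A, B)
two_A = add_points(A, A)
print(A_plus_B, two_A)
```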
We can continue to add P.
15.1. OVERVIEW OF ELLIPTIC CURVE CRYPTOGRAPHY 167
Figure 15.4 shows P + 2P = 3P. Then we can add P again to get 4P and so on.
Figure 15.5 shows NP. In this example N=13. That is, we start with point P, and
add P twelve times, resulting in the point 13P.
So now we know the concept of point addition on an elliptic curve, how can that be
used for cryptography?
As with other public key systems, elliptic curve cryptography relies on the fact that
it is easy for the user to generate the public and private key, but practically impossible
for an attacker to find the private key from the public key.
Why is that the case? So far we said NP is found by adding P to itself N − 1 times, that
is, it takes N − 1 addition operations. So an attacker could simply start with P, and keep
adding P until they get an answer of NP. Now they know how many additions were needed,
i.e. the private value N.
However if N is large enough the attacker’s method will be practically impossible. And
for the user to generate NP when they know N, there is a shortcut that is practically
achievable.
– Calculate P, P + P = 2P = $2^1 P$, 2P + 2P = 4P = $2^2 P$, 4P + 4P = 8P = $2^3 P$,
..., $2^{255} P$ (255 additions)
– Write N as binary expansion, e.g.:
∗ $N = 233 = 2^7 + 2^6 + 2^5 + 2^3 + 2^0$
∗ $NP = 2^7 P + 2^6 P + 2^5 P + 2^3 P + 2^0 P$
∗ In this example, there are 4 point additions
∗ Maximum number of point additions for 256-bit N is 255
– Calculate NP using the binary expansion
– Maximum number of point additions for 256-bit N: 255 + 255 = 510
In summary, knowing the b-bit value N, the user needs to perform about 2 × b point
additions. This is easy. But the attacker, who doesn’t know N, must perform about $2^b$
point additions, which is practically impossible.
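As a rough check of these counts, the following Python sketch counts the doublings and additions used by the binary-expansion shortcut for a given N. The function name `shortcut_cost` is an illustrative assumption and does not come from any library.

```python
def shortcut_cost(n):
    """Return (doublings, additions) needed to compute nP via binary expansion."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    doublings = n.bit_length() - 1      # P -> 2P -> 4P ... up to the top bit of n
    additions = bin(n).count("1") - 1   # join the selected powers of two
    return doublings, additions


print(shortcut_cost(233))          # (7, 4)
print(shortcut_cost(2**256 - 1))   # (255, 255)
```

For N = 233 only the doublings up to $2^7 P$ are needed, followed by the 4 additions from the example. The worst case for a 256-bit N is 255 + 255 = 510 operations, compared with roughly $2^{256}$ for an attacker stepping through P, 2P, 3P, and so on.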
• But to ensure all values contained within finite coordinate space, modular arith-
metic is used
• p is a prime number
The figures and examples given previously showed an elliptic curve without modular
arithmetic. But in elliptic curve cryptography, modular arithmetic is used. The same
principles, and the reasoning why it is hard for the attacker, still apply. The plots of the
elliptic curve in modular arithmetic look different, however: they now have distinct points
in a finite coordinate space. Search online for examples.
Next we will see how elliptic curves are applied to build cryptographic mechanisms.
The most common applications are for secret key exchange, especially with Elliptic
Curve Diffie-Hellman (ECDH), and digital signatures with Elliptic Curve Digital Signa-
ture Algorithm (ECDSA). We will look at ECDH in the following.
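Algorithm 15.1 (Elliptic Curve Diffie-Hellman Key Exchange). Assume users A and B
have EC key pairs: $PU_A = NP$, $PR_A = N$, $PU_B = MP$, $PR_B = M$.
1. User A calculates secret $S_A = N \cdot PU_B = NMP$ using shortcut point addition.
2. User B calculates secret $S_B = M \cdot PU_A = MNP$ using shortcut point addition.
Both users arrive at the same secret, since NMP = MNP.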
An attacker that knows the public keys and initial point P has to find either N or
M . If those numbers are large enough, this is practically impossible.
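The exchange can be sketched in Python on a toy curve. Everything concrete here is an assumption chosen for illustration: the curve $y^2 = x^3 + 2x + 2 \pmod{17}$ with base point P = (5, 1), the private values 3 and 9, and the helper names `add` and `multiply`. Real systems use standardised curves with values of about 256 bits.

```python
# Toy curve y^2 = x^3 + 2x + 2 (mod 17); base point (5, 1) has order 19.
# These tiny parameters are an illustrative assumption, not a real curve.
P_MOD, A = 17, 2
BASE = (5, 1)
INF = None  # point at infinity (identity element)


def add(p1, p2):
    """Add two curve points using modular arithmetic."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF                       # a point plus its mirror image
    if p1 == p2:                         # tangent line (point doubling)
        s = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:                                # line through two distinct points
        s = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (s * s - x1 - x2) % P_MOD
    y3 = (s * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def multiply(n, point):
    """Compute n * point with the double-and-add shortcut."""
    result, addend = INF, point
    while n > 0:
        if n & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        n >>= 1
    return result


n_priv, m_priv = 3, 9                    # hypothetical private keys N and M
pu_a = multiply(n_priv, BASE)            # PU_A = N P
pu_b = multiply(m_priv, BASE)            # PU_B = M P
s_a = multiply(n_priv, pu_b)             # A computes N * PU_B
s_b = multiply(m_priv, pu_a)             # B computes M * PU_A
print(s_a == s_b)
```

Only the public points `pu_a` and `pu_b` cross the network. On this toy curve an attacker can try all 19 multiples of P, which is exactly why practical curves use very large values.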
Until now we have referred to general or example elliptic curves without specifying
the parameter values. In practice, users of ECC do not select their own parameters, but
rather use standardised parameters.
SECG in SEC 2 defined a large set of curves. The NIST curves were a subset of the
SEC 2 curves. NSA Suite B curves are a subset of NIST curves.
For a selected curve, you can see the detailed parameters. For example, for the
secp256k1 curve:
B: 7 (0x7)
Generator (uncompressed):
04:79:be:66:7e:f9:dc:bb:ac:55:a0:62:95:ce:87:
0b:07:02:9b:fc:db:2d:ce:28:d9:59:f2:81:5b:16:
f8:17:98:48:3a:da:77:26:a3:c4:65:5d:a4:fb:fc:
0e:11:08:a8:fd:17:b4:48:a6:85:54:19:9c:47:d0:
8f:fb:10:d4:b8
Order:
00:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:
ff:fe:ba:ae:dc:e6:af:48:a0:3b:bf:d2:5e:8c:d0:
36:41:41
Cofactor: 1 (0x1)
Public/private key pairs can be generated from named curve (e.g. secp256k1) or by
first outputting curve parameters to a file. Here we will show the latter:
alice@node1:~$ openssl ecparam -name secp256k1 -out secp256k1.pem
alice@node1:~$ openssl ecparam -in secp256k1.pem -genkey -noout -out alice-key.pem
Alternatively, you could combine the above two commands into a single command, by
specifying the -name of the curve rather than the -in file. The OpenSSL Command Line
Elliptic Curve Operations wiki explains the different options, as well as ensuring
parameters are in a format that can be used by different versions of OpenSSL.
Once the curve parameters file (e.g. secp256k1.pem) is generated, you can use the
genpkey, pkey and pkeyutl operations in a similar manner as with Diffie-Hellman in
Section 14.5.
Part V
Authentication
Chapter 16
This chapter introduces two primitives used in authentication and data integrity: cryp-
tographic hash functions and Message Authentication Codes. While these primitives can
be based on symmetric key ciphers (and occasionally public key ciphers), in many cases
they are custom-designed algorithms to meet the specific needs for authentication.
Presentation slides that accompany this chapter can be downloaded in the following
formats: slides only (PDF); slides with notes (PDF, ODP, PPTX).
• Hash functions
– Takes message as input and returns short, unique and random-looking output
– Different inputs will produce different outputs
– Also called: Modification Detection Code (MDC), unkeyed hash function
– Output called: hash (h), digital fingerprint, imprint, message digest
– h = H(M)
• Message Authentication Codes (MACs)
– Takes message and a secret key as input and returns short, unique and random-looking
output
– Different inputs (key and/or data) will produce different outputs
– Also called: keyed hash function
– Output called: tag (t), code or MAC
– t = MAC(K, M)
File: crypto/hash.tex, r1951
176 CHAPTER 16. HASH FUNCTIONS AND MACS
– Given the output (hash/tag), attacker cannot find the input message
– Given one message, attacker cannot find another message with same output
(hash/tag)
– Attacker cannot find any two messages that produce same output (hash/tag)
Note that there is different terminology used for the properties. The names in paren-
theses are an alternative form.
The first two properties are similar from a security perspective: most algorithms
that have one property also have the other. However the third property of (strong)
collision resistance is harder to provide. That is, some algorithms may have the first two
properties, but not the third of (strong) collision resistance.
– preimage, 2nd preimage, collision resistance (if attacker can perform chosen
message attack)
– none
– preimage, 2nd preimage, collision resistance (if attacker can perform chosen
message attack)
– preimage resistant
16.2. INTRODUCTION TO HASH FUNCTIONS 177
A hash function is an algorithm that usually takes any sized input, like a file or a
message, and produces a short (e.g. 128 bit, 512 bit) random looking output, the hash
value. If you apply the hash function on the same input, you will always get the exact
same hash value as output. In practice, if you apply the hash function on two different
inputs, you will get two different hash values as output.
• Message authentication
• Digital signatures
• Storing passwords
Hash functions are important in many areas of security. They are typically used to
create a fingerprint/signature/digest of some input data, and then later that fingerprint
is used to identify if the data has been changed. However they also have uses for hiding
original data (storing passwords) and generating random data. Different applications may
have slightly different requirements regarding the security (and performance) properties
of hash functions.
There are three general approaches to design hash functions:
Based on Block Ciphers Well-known and studied block ciphers are used with a mode
of operation to produce a hash function. Generally, less efficient than customised
hash functions.
Customised Hash Functions Functions designed for the specific purpose of hashing.
Disadvantage is they haven’t been studied as much as block ciphers, so harder to
design secure functions.
Credit: ECRYPT CSA Algorithms, Key Size and Protocols Report, 2018
Figure 16.1 shows selected hash functions, classified for legacy or future use. It is
taken from the ECRYPT-CSA 2018 report on Algorithms, Key Sizes and Protocols.
The authors classified hash functions as legacy, meaning secure for near future, and
future, meaning secure for medium term. It includes historical hash functions that are no
longer recommended, such as MD5, RIPEMD-128 and SHA-1. There are many other hash
functions. Wikipedia has a nice comparison.
Exercise 16.1 (Number of Collisions). If H1 takes fixed length 200-bit messages as input,
and produces an 80-bit hash value as output, are collisions possible?
Solution 16.1 (Number of Collisions). Yes. In this simplistic example (since hash
functions normally take variable length messages), there are $2^{200}$ possible different
inputs. A hash function maps an input to an output hash value. There are $2^{80}$ possible
output hash values. That means at least two of the inputs must map to the same output hash
value, i.e. a collision. Assuming the hash function distributes the pre-images to hash
values in a uniformly random manner, then on average, each hash value has
$2^{200-80} = 2^{120}$ pre-images.
The point is that if the input message length is larger than the output hash value
(and in practice, it always is), then collisions are theoretically possible. One aspect of
designing cryptographically secure hash functions is to make it practically impossible for
an attacker to find useful collisions.
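To see why a short output makes collisions easy to find, the following Python sketch searches for a collision in a deliberately weak hash. The weak hash is an assumption made for illustration: it keeps only the first 16 bits of SHA-256, and the names `tiny_hash` and `find_collision` are hypothetical.

```python
import hashlib
from itertools import count


def tiny_hash(message, bits=16):
    """First `bits` bits of SHA-256, used as a deliberately weak toy hash."""
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big")
    return digest >> (256 - bits)


def find_collision(bits=16):
    """Hash distinct messages until two of them share a hash value."""
    seen = {}
    for i in count():
        msg = str(i).encode()
        h = tiny_hash(msg, bits)
        if h in seen:
            return seen[h], msg, h
        seen[h] = msg


m1, m2, h = find_collision()
print(m1, m2, hex(h))
```

With only $2^{16}$ possible outputs the search must succeed within $2^{16} + 1$ messages, and typically succeeds after a few hundred. Real hash functions use output lengths large enough that the same search is impractical.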
Now let’s restate the general requirements of a cryptographic hash function.
Properties: Satisfies one or more of the properties: Pre-image Resistant, Second Pre-
image Resistant, Collision Resistant
Now let’s define several common required properties of cryptographic hash functions.
Informally, it is hard to invert the hash function. That is, given the output hash
value, it is hard to find the original input message.
Definition 16.4 (Second Pre-image Resistant Property). For any given x, it is
computationally infeasible to find $y \neq x$ with H(y) = H(x). Also called weak collision
resistant property.
To break this property, the attacker is trying to find a collision. That is, two input
messages x and y that produce the same output hash value. Importantly, the attacker
cannot choose x. They are given x and must find a different message y that produces a
collision.
To break this property, again the attacker is trying to find a collision. However in this
case the attacker has the freedom to find any messages x and y that produce a collision.
This freedom makes it easier for the attacker to perform an attack against this property
than against the Second Pre-image Resistant property.
Exercise 16.2 (Brute Force Attack on Hash Function). Consider a hash function to
be selected for use for digital signatures. Assume an attacker has compute capabilities
to calculate $10^{12}$ hashes per second and is prepared to wait for approximately 10 days
for a brute force attack. Find the minimum hash value length that the hash function should
support, such that a brute force attack is not possible.
Solution 16.2 (Brute Force Attack on Hash Function). There are two cases to consider.
If the hash function and network are subject to a chosen message attack, then the hash
function should support all three properties. The Preimage and Second Preimage Resistant
properties require effort of approximately $2^b$ for the attacker. But attacking the
Collision Resistant property requires significantly less effort, $2^{b/2}$. Therefore the
hash length, b, must be sufficient so that an attack on the Collision Resistant property is
not possible.
\begin{align*}
\frac{2^{b/2}}{10^{12}} &> 10 \times 24 \times 60 \times 60 \\
2^{b/2} &> 10^{12} \times 10 \times 86400 \\
b/2 &> \log_2(8.64 \times 10^{17}) \\
b/2 &> 59.583 \\
b &> 119.168
\end{align*}
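The arithmetic for the collision case can be reproduced with a few lines of Python. This is only a sketch of the calculation above; the variable names are illustrative.

```python
import math

rate = 10**12                  # hashes per second (from the exercise)
seconds = 10 * 24 * 60 * 60    # 10 days
attempts = rate * seconds      # total hashes the attacker can compute

half_bits = math.log2(attempts)              # b/2 must exceed this value
min_bits = math.floor(2 * half_bits) + 1     # smallest integer b with b > 2 * log2(attempts)
print(half_bits, min_bits)
```

The smallest whole number of bits satisfying the inequality is 120, so the collision case requires a hash length of at least 120 bits under these assumptions.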
we will see in depth in Chapter 17. However note that hash functions do not use any
secret key as input. A variation is to introduce a secret key as input, resulting in a keyed
hash function.
– h = H(K, M )
– Keyed hash function or Message Authentication Code (MAC)
• Hashes and MACs can be used for message authentication, but hashes also used
for multiple other purposes
Now we will shift our focus to MACs, first looking at the general design approaches.
The motivation for different design approaches is similar to that for hash function
design approaches.
Definition 16.6 (Computation Resistance of MAC). Given one or more text-tag pairs,
$[x_i, \mathrm{MAC}(K, x_i)]$, it is computationally infeasible to compute any text-tag pair
$[y, \mathrm{MAC}(K, y)]$ for a new input $y \neq x_i$.
Assume an attacker has intercepted messages (text) and the corresponding MACs
(tags). They have i such text-tag pairs. Now there is a new message y. It should
be practically impossible for the attacker to find the corresponding tag of y, that is,
MAC(K, y).
Given what the attacker must do, the security of MACs can be defined based on the
effort of brute force attacks.
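As a concrete illustration of t = MAC(K, M), the following sketch uses HMAC-SHA256 from the Python standard library. The choice of HMAC is an assumption made here for the example, and the key, messages and helper names are hypothetical.

```python
import hashlib
import hmac


def make_tag(key, message):
    """Compute t = MAC(K, M), here with HMAC-SHA256."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_tag(key, message, tag):
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


key = b"shared secret key K"
msg = b"transfer 100 to Bob"
tag = make_tag(key, msg)

print(verify_tag(key, msg, tag))                      # True
print(verify_tag(key, b"transfer 900 to Bob", tag))   # False
```

An attacker who intercepts `msg` and `tag` but does not know `key` has no practical way to produce the tag for a different message, which is the computation resistance property.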
This chapter shows how messages can be authenticated, including ensuring data integrity,
using various cryptographic primitives, especially hash functions and MACs from Chap-
ter 16.
Presentation slides that accompany this chapter can be downloaded in the following
formats: slides only (PDF); slides with notes (PDF, ODP, PPTX).
1. Disclosure: encryption
184 CHAPTER 17. AUTHENTICATION AND DATA INTEGRITY
Figure 17.1 shows symmetric key encryption used for confidentiality. On the left is
the sender A, and on the right is the receiver B. In the middle (between the dashed lines)
is the information sent from A to B. Only B (and A) can recover the plaintext. However
in some cases this also provides:
• Source Authentication: A is only other user with key; B knows it must have come
from A
• Data Authentication: successfully decrypted implies data has not been modified
The source and data authentication assumes that the decryptor (B) can recognise that
the result of the decryption, i.e. the output plaintext, is correct.
The assumption about being able to recognise the correct plaintext is explored next.
The typical answer for above is yes, the plaintext was sent by A and nothing has
been modified. This is because the plaintext “makes sense”. Our knowledge of most
ciphers (using the English language) is that if the wrong key is used or the ciphertext has
been modified, then decrypting will produce an output that does not make sense (not a
combination of English words).
Based on the previous argument, the answer is no. Or more precisely, either the
plaintext was not sent by A, or the ciphertext was modified along the way. This is because
the plaintext makes no sense, and we were expecting it to do so.
• Solutions:
Figure 17.3 shows a different scheme where only the hash value is encrypted. The
receiver can verify that nothing has been changed. This scheme provides authentica-
tion, but does not attempt to provide confidentiality. This is useful in reducing any
computation overhead when confidentiality is not required.
where H(M) = H(M′). As a result, the attacker can modify M to M′, but leave the
remainder of the sent information, E(K, H(M)), as is. They forward M′||E(K, H(M))
to B. User B decrypts with the key shared with A, then compares the hash value with
H(M′). They match. Therefore user B trusts the message, but in fact it has been subject
to a modification attack.
Figure 17.4 shows a scheme that provides authentication, but without using any
encryption. Avoiding encryption can be desirable in very resource constrained environments.
S is a secret value shared by A and B. Concatenating the secret with the message, and
then hashing the result, allows the receiver to verify the plaintext is correct, and keeps
the secret confidential.
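The scheme of sending M||H(M||S) can be sketched in Python as follows. SHA-256 is an assumed choice of hash function, and the function names `send` and `receive` are hypothetical.

```python
import hashlib
import hmac

HASH_LEN = 32  # SHA-256 output length in bytes


def send(message, secret):
    """Sender A: transmit M || H(M || S)."""
    return message + hashlib.sha256(message + secret).digest()


def receive(packet, secret):
    """Receiver B: recompute H(M || S) and compare with the received hash."""
    message, received = packet[:-HASH_LEN], packet[-HASH_LEN:]
    expected = hashlib.sha256(message + secret).digest()
    return message if hmac.compare_digest(expected, received) else None


secret = b"value S shared by A and B"
packet = send(b"meet at noon", secret)
print(receive(packet, secret))                       # b'meet at noon'
print(receive(b"meet at nine" + packet[12:], secret))  # None
```

The message travels in the clear, so this provides authentication only. The secret S never appears in the packet; only its effect on the hash value does.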
Exercise 17.2 (Attack of Authentication with Hash of Shared Secret). If a hash function
did not have the Preimage Resistant property, then demonstrate an attack on the scheme
in Figure 17.4.
Solution 17.2 (Attack of Authentication with Hash of Shared Secret). The attacker
intercepts the message M||H(M||S). If the Preimage Resistant property does not hold,
then it is possible for an attacker, given a hash value, to find the original input, i.e. the
preimage. That is, the attacker finds M||S. Since they also know M, it is easy to find S,
i.e. the remaining bits. The attacker now knows the shared secret and could masquerade
as A.
In Section 17.5 we will see the role of hash functions in digital signatures.
• MACs have advantage over hashes in that if encryption is defeated, then MAC still
provides integrity
• But two keys must be managed: encryption key and MAC key
Definition 17.1 (Encrypt-then-MAC). The sender encrypts the message M with symmetric
key encryption, then applies a MAC function on the ciphertext. The ciphertext
and the tag are sent, as follows:
E(K1, M)||MAC(K2, E(K1, M))
Definition 17.2 (MAC-then-Encrypt). The sender applies a MAC function on the plaintext,
appends the result to the plaintext, and then encrypts both. The ciphertext is sent,
as follows:
E(K1, M||MAC(K2, M))
Definition 17.3 (Encrypt-and-MAC). The sender encrypts the plaintext, as well as
applying a MAC function on the plaintext, then combines the two results. The ciphertext
joined with the tag is sent, as follows:
E(K1 , M )||MAC(K2 , M )
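The three combinations differ only in what is fed to the MAC and what is fed to the cipher, which the following Python sketch makes explicit. Two assumptions are made purely for illustration: the cipher is a toy keystream built from SHA-256 (it has no nonce and must not be used for real data), and the MAC is HMAC-SHA256. All function names are hypothetical.

```python
import hashlib
import hmac

TAG_LEN = 32  # HMAC-SHA256 tag length in bytes


def toy_cipher(key, data):
    """XOR data with a SHA-256 based keystream; toy stand-in for E(K1, .).

    Applying it twice with the same key recovers the input.
    """
    out = bytearray()
    for i in range(0, len(data), 32):
        pad = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        out.extend(d ^ p for d, p in zip(data[i:i + 32], pad))
    return bytes(out)


def mac(key, data):
    return hmac.new(key, data, hashlib.sha256).digest()


def encrypt_then_mac(k1, k2, m):
    c = toy_cipher(k1, m)
    return c + mac(k2, c)                    # ciphertext || MAC of ciphertext


def mac_then_encrypt(k1, k2, m):
    return toy_cipher(k1, m + mac(k2, m))    # E(K1, M || MAC(K2, M))


def encrypt_and_mac(k1, k2, m):
    return toy_cipher(k1, m) + mac(k2, m)    # E(K1, M) || MAC(K2, M)


def open_encrypt_then_mac(k1, k2, packet):
    """Receiver for Encrypt-then-MAC: check the tag before decrypting."""
    c, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    if not hmac.compare_digest(mac(k2, c), tag):
        return None
    return toy_cipher(k1, c)


k1, k2 = b"encryption key K1", b"MAC key K2"
packet = encrypt_then_mac(k1, k2, b"attack at dawn")
print(open_encrypt_then_mac(k1, k2, packet))
```

With Encrypt-then-MAC the receiver can reject a modified packet without decrypting anything, whereas with MAC-then-Encrypt the tag is only visible after decryption.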
– StackExchange
– Section 1 and 2 of Authenticated Encryption by J Black
– But cannot prove which user created the data since two users have the same
key
– Can prove that data came from only 1 possible user, since only 1 user has the
private key
• Digital signature
A digital signature has the same purpose as a handwritten signature: to prove that a
document (or message or file) is approved by and originated from one particular person.
If a message is signed, the signer cannot claim they did not sign it (since they are the
only person that could create the signature). Similarly, someone cannot pretend to be
someone else, since they cannot create that other person’s signature. Of course
handwritten signatures are imprecise and sometimes forgeable. Digital signatures are much
more secure, making it practically impossible for someone to forge a signature or modify
a signed document without it being noticed.
In practice, a digital signature of a message is created by first calculating a hash of
that message, and then encrypting that hash value with the private key of the signer.
The signature is then attached to the message.
The hash function is not necessary for security, but makes signatures practical (the
signature is short fixed size, no matter how long the message is).
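The hash-then-sign idea can be sketched with textbook RSA in Python. The following is a toy under stated assumptions: RSA is chosen as the public key algorithm, the key pair is built from two known Mersenne primes, and no padding is applied, so it is far from secure. The function names are hypothetical.

```python
import hashlib

# Toy key pair from two Mersenne primes (illustrative assumption only:
# too small and too structured for real use, and no padding is applied).
p, q = 2**127 - 1, 2**89 - 1
n = p * q
e = 65537                              # public exponent (part of PU_A)
d = pow(e, -1, (p - 1) * (q - 1))      # private exponent (part of PR_A)


def digest(message):
    """H(M) as an integer smaller than n."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(message):
    """Signer: apply the private key operation to the hash of the message."""
    return pow(digest(message), d, n)


def verify(message, signature):
    """Verifier: apply the public key operation and compare with own hash."""
    return pow(signature, e, n) == digest(message)


doc = b"I owe Bob 10 dollars"
sig = sign(doc)
print(verify(doc, sig))
print(verify(b"I owe Bob 99 dollars", sig))
```

The signature is a single number below n regardless of the document length, which illustrates why hashing first makes signatures practical.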
• Signing
– User A signs a message by encrypting hash of message with own private key:
$S = E(PR_A, H(M))$
– User attaches signature S to message M and sends to user B
• Verification
Key Management
Chapter 18
– Ciphers to use
– Key lengths or hash lengths
– Security level
• ECRYPT-CSA Project 2018 report on Algorithms, Key Size and Protocols (PDF)
194 CHAPTER 18. KEY DISTRIBUTION AND MANAGEMENT
Three different levels of security are given: legacy, current (near-term) and future
(long-term). Current or future levels of security should be used, although legacy levels
may still be secure for some cases.
Chapter 19
Digital Certificates
196 CHAPTER 19. DIGITAL CERTIFICATES
Part VII
Advances in Cryptography
Chapter 20
This chapter contains brief notes on concepts related to quantum computing and quantum
cryptography. The intention is to be able to understand the role of quantum computing
with respect to attacking ciphers, as well as the security mechanism quantum cryptogra-
phy provides.
Disclaimer: These are very rough notes. A lack of time and in-depth understanding
of quantum computing on my part means there are likely errors, some parts may be
confusing, and the presentation is quite poor (mainly definitions which are not actually
definitions; insufficient examples or diagrams). However it should be enough to gain an
idea what role quantum technology plays in cryptography. The plan is to update the
content after feedback from others.
This information comes from a collection of resources, including Wikipedia pages,
news articles and videos. Some, but definitely not all, of those sources are, in no particular
order:
• https://quantum.country/
• https://blogs.iu.edu/sciu/2019/07/13/quantum-computing-parallelism/
• https://arxiv.org/abs/quant-ph/0507023
• http://www.columbia.edu/~jpp2139/decoherence-superconducting-qubitsWEBv2.pdf
• https://www.ibm.com/quantum-computing/
• https://qiskit.org/textbook/preface.html
Presentation slides that accompany this chapter can be downloaded in the following
formats: slides only (PDF); slides with notes (PDF, ODP, PPTX).
200 CHAPTER 20. QUANTUM COMPUTING AND CRYPTOGRAPHY
There are two important issues about measuring a qubit. First, the result will either
be 0 or 1. However when the qubit is in a superposition state of
$\alpha|0\rangle + \beta|1\rangle$, then we don’t know in advance which value will be output
from the measurement. But we do know that with probability $|\alpha|^2$ it will be bit 0 and
with probability $|\beta|^2$ it will be bit 1. By controlling the weights, α and β, we can
increase the probability that a useful output will be measured.
The other issue is that upon measurement, the qubit reverts to one of the basis states.
It will no longer be a superposition of states.
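The probabilistic outcome of a measurement can be imitated on a classical computer. The sketch below only samples measurement results for chosen weights; it is not a quantum simulation, and the weights, the seed and the function name `measure` are assumptions made for illustration.

```python
import math
import random


def measure(alpha, beta, rng):
    """Sample one measurement of alpha|0> + beta|1>; returns 0 or 1."""
    p0 = abs(alpha) ** 2
    if not math.isclose(p0 + abs(beta) ** 2, 1.0):
        raise ValueError("state must be normalised")
    return 0 if rng.random() < p0 else 1


rng = random.Random(1)                         # fixed seed for repeatability
alpha, beta = math.sqrt(0.8), math.sqrt(0.2)   # 80% chance of 0, 20% chance of 1
counts = [0, 0]
for _ in range(10_000):
    counts[measure(alpha, beta, rng)] += 1
print(counts)
```

Each individual call returns just a single bit. The weights only become visible after preparing and measuring the same state many times.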
Quantum entanglement is another concept, which you may hear about when referring
to quantum communications and quantum teleportation. We will not cover it in any
depth here, but present a simple example in the following.
Entanglement can be achieved for example by firing a laser at a crystal that causes
two photons to split but be entangled.
Example 20.2 (qubit Entanglement). If 2 qubits are entangled, then if one qubit is
measured to be 0, then the other qubit will also be measured to be 0 (and similar if
measured as 1).
This definition of quantum computation is quite vague. How are the states of the
qubits modified? Using logic gates to form circuits. One point to note is that at the
end the result is measured. As noted before, measuring a quantum system will return
some binary value with some probability and collapse any superpositions. This means
that any speed-up that can potentially be obtained by quantum computing needs to be
achieved before the measurement.
Definition 20.8 (Classical Computer Circuits). Circuits in classical computers are built
from logic gates, such as AND, NOT, OR, XOR, NAND and NOR.
Note that AND and NOT gates are the universal set: everything else can be built
from them.
A single-qubit gate takes a single qubit as input and produces a single qubit as output.
Now we can arrive at a simple definition of a quantum computer.
We are only covering a digital quantum computer. The topics of quantum simulation
and quantum annealing are not covered here.
For example, if the two qubits are constructed so that β = 0 and δ = 0, and
α = γ = $1/\sqrt{2}$, then there is 50% probability of measuring 00 and 50% probability of
measuring 10. There is no chance of measuring 01 or 11.
Now we get to the benefit of quantum computing.
Definition 20.12 (Quantum Parallelism). Consider a circuit that takes x as input and
returns f (x) as output. Normally, passing in an input, sees the function applied once,
and one output produced. Using quantum gates, such as a Fredkin gate, if x is a quantum
register with a superposition of states, it is passed as input and the function is applied
once. But the function operates on all of the states of the quantum register, returning
output that contains information about the function applied to all states.
The parallelism that can be achieved is the promising feature of quantum computing.
The following example aims to illustrate the idea.
Example 20.3 (Classical Function). Consider the function f (x) = 3x mod 8. Assume
we want to calculate all possible answers for x = 0, 1, 2, . . . , 7. With a classical computer
we would have a 3-bit input to a circuit that calculates f (x), i.e. performs the modular
multiplication. To find all possible answers we would calculate f (0) = 0, f (1) = 3,
f (2) = 6, f (3) = 1, f (4) = 4, f (5) = 7, f (6) = 2, and f (7) = 5. The function/circuit is
applied 8 times.
The above example used decimal values, but also consider their binary values, i.e. the
function is applied to 8 values: 000, 001, 010, 011, 100, 101, 110 and 111.
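For reference, the classical calculation is a simple loop in Python, applying the function once for each of the 8 inputs:

```python
def f(x):
    """The example function f(x) = 3x mod 8."""
    return (3 * x) % 8


# Apply the function to every 3-bit input, one input at a time.
for x in range(8):
    print(f"{x:03b} -> {f(x):03b}   f({x}) = {f(x)}")
```

The loop body runs 8 times, which is the cost that quantum parallelism aims to avoid.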
Example 20.4 (Quantum Function). Now consider the same function, f (x) = 3x mod 8,
but implemented with a quantum circuit. We initialise a quantum register with 3 qubits.
This register is in a superposition of 8 states at once: 000, 001, 010, 011, 100, 101, 110
and 111. The quantum register is input to the circuit. The output register will have 3
qubits in a superposition that contains all 8 answers. By applying the function/circuit
only once, we obtain an output that has information about all 8 answers. This represents
a speedup of a factor of 8 compared to the classical example!
20.2. QUANTUM ALGORITHMS 203
While this is a contrived example with many real flaws, it aims to demonstrate that
quantum parallelism is achieved by the fact that the quantum calculation is on all
states of the quantum register, rather than just a single value as in classical computing.
You should already recognise a problem with the above example. While the output
quantum register contains qubits in a superposition that contains information about all
8 answers, when we measure the output register we get just one of those answers with
some probability, i.e. the measurement problem. If the probabilities were all equal, i.e.
12.5%, then when we measure the output we would get a value of 000 with probability
12.5%. If we did it again, we may get 011 with probability 12.5%. So the answer is
essentially useless to us; we’d need to calculate 8 times, resulting in the same effort as a
classical computer. Quantum algorithms are designed so that the weights/probabilities
of the output do give the “correct” answer with high probability.
Definition 20.13 (Quantum Algorithm). Quantum algorithms are usually a combination
of classical algorithms/computations and quantum computations. First, pre-processing
is performed using classical techniques. Then the input quantum register is
prepared, a quantum calculation performed, and output quantum register is measured.
There may be some post-processing of the result with classical techniques. If the result
is as desired, then exit, otherwise repeat the process. Repetition is usually needed due
to both errors in quantum calculations and the probabilistic nature of the result.
The main point to note is that “quantum” algorithms actually are a hybrid of classical
algorithms and quantum calculations.
The following are two examples of quantum algorithms which are relevant to cryp-
tography.
Definition 20.14 (Grover’s Search Algorithm). Consider a database of N unstructured
data items (e.g. not sortable). Search is performed by applying a boolean function on
input that returns true if it is the correct answer. Classical search takes $O(N)$
applications of the function. Grover’s quantum search algorithm takes $O(\sqrt{N})$
applications of the function.
Grover’s search algorithm can be used for a brute-force attack. For example with
a symmetric key cipher, assume we have a function that decrypts the ciphertext and
returns true if the obtained plaintext is correct.
Table 20.1: Worst Case Brute Force Attempts with Classical and Quantum Algorithms
Table 20.1 shows the worst case number of attempts for a brute-force attack on a key, using
either a classical algorithm or Grover’s quantum search algorithm. Note that
$\sqrt{2^N} = 2^{N/2}$.
While the quantum algorithm produces a significant speedup, with regards to protecting
symmetric key ciphers against brute force attacks using quantum computers, an easy
solution is to double the key length. That is, if a 128-bit key was recommended as secure
against brute force attacks using today’s classical computers, then to be secure against
brute force attacks with future quantum computers, use a 256-bit key. While using a
double length key incurs a performance drop for AES, it is not so substantial that it makes
AES too slow to use, and it does not require a new algorithm design.
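The effect of doubling the key length can be checked with a short Python sketch. The key lengths below are illustrative choices rather than values copied from Table 20.1, and the counts ignore constant factors and the cost of each quantum operation.

```python
def classical_attempts(key_bits):
    """Worst case brute force on a classical computer: 2^N."""
    return 2 ** key_bits


def grover_attempts(key_bits):
    """Worst case with Grover's search: sqrt(2^N) = 2^(N/2), for even N."""
    return 2 ** (key_bits // 2)


for bits in (64, 128, 256):
    print(bits, classical_attempts(bits), grover_attempts(bits))

# A 256-bit key against Grover costs as much as a 128-bit key classically.
print(grover_attempts(256) == classical_attempts(128))
```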
Now let’s look at the promising benefits of quantum computing regarding breaking
ciphers, factoring numbers. Recall that integer factorisation is a problem that public
key algorithms, such as RSA, are built around. That is, the security of RSA depends
on the difficulty of integer factorisation. Let’s look at how the best known algorithms
on classical and quantum computers perform (we will not look at how those algorithms
actually work).
Definition 20.15 (Integer Factorisation with General Number Field Sieve). Given an
integer N, find its prime factors. A general number field sieve on a classical computer
takes subexponential time, about $2^{O(N^{1/3})}$.
The paper A Blueprint For Building a Quantum Computer by Rodney Van Meter and
Clare Horsman, published in Communications of the ACM, October 2013, has compared
the speeds for specific implementations of algorithms on classical and quantum computers.
Note that the following results are mainly theoretical, estimating the performance based
on several actual measurements with smaller numbers.
Credit: Figure 1 from A Blueprint For Building a Quantum Computer by Van Meter and Horsman, Communications of the ACM, Oct 2013.
Figure 20.1: Scaling the classical number field sieve (NFS) vs. Shor’s quantum algorithm
for factoring
Figure 20.1 shows estimated time to factor a L-bit number. The number field sieve on
the solid black line is using a classical computer. The cross on that line is for the point
of L=768 bits and 3300 CPU years. The NIST recommended key length is L=2048 bits.
The lines labelled with Shor are using a quantum computer. The four lines for Shor are
20.3. ISSUES IN QUANTUM COMPUTING 205
different algorithms and architectures, as well as different quantum clock speeds (1 Hz vs
1 MHz).
One way to read the figure is to look at the number of bits that can be factored in 1
year. A 1 GHz classical computer using number field sieve could factor a 500-bit number.
A quantum computer using Shor’s algorithm and with a 1 Hz clock could factor an 80-bit
number. But with a 1 MHz clock it could factor an 8000-bit number.
Is it likely that quantum computers will break RSA in the near future? Michele
Mosca and Marco Piani, from evolutionQ and the Global Risk Institute, interviewed 22
experts in quantum computing, and one question was about the likelihood of quantum
computers being a significant threat to public-key cryptosystems in the future.
Credit: Quantum Threat Timeline Report, Michele Mosca and Marco Piani, from evolutionQ and the Global Risk Institute, 2019.
Definition 20.18 (Errors in Quantum Computing). Errors frequently occur due to var-
ious reasons including: decay of individual qubits; environmental defects that impact
multiple qubits; interference between qubits and other systems; accidental measurement
of qubits; and even loss of qubits. Significant research effort is on designing error cor-
recting schemes.
Error correcting schemes introduce an overhead, and one concern is that the overhead
needed to deal with errors may mean quantum computing does not produce significant
advantages over classical computing.
Figure 20.3: Quantum error rates vs qubits and intended direction of Google Quantum
Research
Figure 20.3, taken from A Preview of Bristlecone, Google’s New Quantum Processor
by Google Quantum AI Lab, illustrates the conceptual relationship between error rates
and qubits. The error correction threshold indicates error rates below this are needed for
error correction to work.
Definition 20.19 (Cooling). For qubits to maintain coherence, quantum circuits need
to be very cold, approaching 0 Kelvin or −273 °C.
• IBM: 5- and 16-qubit machines available for free; 20-qubit machine available via
cloud; 53-qubit machine (2019)
• D-Wave systems: 2000Q has 2048 qubits, but uses a different technology (quantum
annealing) that cannot be used to run Shor’s algorithm
20.4. QUANTUM CRYPTOGRAPHY 207
Note that while quantum computers can be used to break cryptographic mechanisms
(e.g. using Shor’s algorithm), quantum cryptography is a separate topic within quantum
systems that is about creating cryptographic mechanisms. Quantum cryptographic
mechanisms will use quantum computers.
Definition 20.21 (Quantum Key Distribution (informal)). The aim of Quantum Key
Distribution (QKD) is for two parties to exchange a secret key (similar to DHKE). A
chooses random bits, as well as a corresponding random modification of states (called
the sending basis). Applying these together using a fixed scheme, A generates and sends
photons in quantum states. B chooses their own random measuring basis and measures the
photons. A then informs B of the sending basis, allowing B to recognise which of the
measured photons to consider (i.e. those where the measuring basis and sending basis
match). B uses the resulting bits as a secret key, but only after confirming with A that
there are no errors in the key (e.g. by sending a challenge encrypted with the key).
Credit: Bennett and Brassard, Quantum cryptography: Public key distribution and coin tossing, Theoretical Computer Science, Dec 2014,
Copyright Elsevier.
Figure 20.4 is taken from the original 1984 article by Bennett and Brassard, which was
re-published by Elsevier in the journal Theoretical Computer Science in 2014. BB84 is
a scheme still used for quantum key distribution. The paper, in section III, has a nice
explanation of the protocol.
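The basis-matching (sifting) step described in Definition 20.21 can be illustrated with a small classical simulation. This is only a sketch under simplifying assumptions: photons are modelled as a bit plus a basis label, the channel is ideal with no attacker, and the names bb84_sift, "+" and "x" are illustrative choices that do not come from the paper.

```python
import random

def bb84_sift(n_photons, rng):
    """Simulate BB84 sifting on an ideal channel with no attacker (toy model)."""
    # A chooses random bits and a random sending basis for each photon.
    a_bits = [rng.randint(0, 1) for _ in range(n_photons)]
    a_bases = [rng.choice("+x") for _ in range(n_photons)]
    # B chooses an independent random measuring basis for each photon.
    b_bases = [rng.choice("+x") for _ in range(n_photons)]
    # Matching basis: B reads A's bit. Mismatched basis: B's result is random.
    b_bits = [bit if ab == bb else rng.randint(0, 1)
              for bit, ab, bb in zip(a_bits, a_bases, b_bases)]
    # A announces the sending bases; both sides keep only matching positions.
    keep = [i for i in range(n_photons) if a_bases[i] == b_bases[i]]
    a_key = [a_bits[i] for i in keep]
    b_key = [b_bits[i] for i in keep]
    return a_key, b_key

a_key, b_key = bb84_sift(32, random.Random(1))
print(len(a_key), a_key == b_key)
```

In this model roughly half of the photons survive sifting, and the two keys always agree because nothing disturbs the photons between A and B.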
Definition 20.22 (QKD security (informal)). An attacker C tries to learn the secret
key between A and B, without A or B knowing. Therefore the attacker has to measure
the photons sent by A. However, as the photons are a superposition of states, when C
measures them, they are changed. As a result, B will receive changed photons, and when
they check the secret key with A, the check will fail.
208 CHAPTER 20. QUANTUM COMPUTING AND CRYPTOGRAPHY
The security of quantum key distribution depends on the measurement problem, i.e.
that measuring a quantum superposition state changes the state. The attacker cannot
measure the communications between A and B without changing the communications.
It is easy for A and B to recognise if the communications have been changed.
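To see why the change is easy to recognise, the following sketch simulates an attacker C who measures every photon in a randomly chosen basis and resends what was observed (an intercept-and-resend attack). It is a toy classical model, not a physical simulation: the function names, the basis labels "+" and "x", and the rule that a measurement in the wrong basis returns a uniformly random bit are all simplifying assumptions.

```python
import random

def measure(bit, prepared_basis, measuring_basis, rng):
    """Same basis returns the prepared bit; a different basis returns a random bit."""
    return bit if prepared_basis == measuring_basis else rng.randint(0, 1)

def sifted_error_rate(n_photons, attacker_present, rng):
    """Fraction of sifted key bits where B's bit differs from A's bit."""
    errors = 0
    sifted = 0
    for _ in range(n_photons):
        a_bit, a_basis = rng.randint(0, 1), rng.choice("+x")
        bit, basis = a_bit, a_basis
        if attacker_present:
            # C measures in a random basis and resends the observed bit,
            # so the photon B receives is now prepared in C's basis.
            c_basis = rng.choice("+x")
            bit = measure(bit, basis, c_basis, rng)
            basis = c_basis
        b_basis = rng.choice("+x")
        b_bit = measure(bit, basis, b_basis, rng)
        if a_basis == b_basis:  # this position survives sifting
            sifted += 1
            errors += (b_bit != a_bit)
    return errors / sifted

rng = random.Random(2022)
print(sifted_error_rate(20000, False, rng))  # no attacker: no errors in this model
print(sifted_error_rate(20000, True, rng))   # attacker: about a quarter of bits differ
```

In this model C guesses the wrong basis half of the time, and each wrong guess gives B a wrong bit half of the time, so about 25% of the sifted bits disagree and the key check between A and B fails.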
Additional Resources
Appendix A
Cryptography is a large, complex topic. However, even if the details are not understood,
we can still apply concepts from cryptography to design secure systems. This chapter
lists some common assumptions that are made about cryptographic techniques as well
as some principles that are used in designing secure systems. Although in theory the
assumptions do not always hold, they are true in many practical situations (and when
they are not true, it will be made clear).
A.1 Assumptions
A.1.1 Encryption
A1. Symmetric key cryptography is also called conventional or secret-key cryptography.
A3. In symmetric key crypto, the same secret key, K, is used for encryption, E(), and
decryption, D(). The secret is shared between two entities, i.e. K_AB.
A4. In public key crypto, there is a pair of keys, public (PU) and private (PR). One key
from the pair is used for encryption, the other is used for decryption. Each entity
has their own pair, e.g. (PU_A, PR_A).
A6. Decrypting ciphertext with the correct key will produce the original plaintext. The
decrypter will be able to recognise that the plaintext is correct (and therefore the
key is correct). E.g. P = D(K_AB, C) or M = D(PR_A, C).
A7. Decrypting ciphertext using the incorrect key will not produce the original plain-
text. The decrypter will be able to recognise that the key is wrong, i.e. the decryp-
tion will produce unrecognisable output.
File: crypto/secassume.tex, r1697
212 APPENDIX A. CRYPTOGRAPHY ASSUMPTIONS AND PRINCIPLES
A9. An attacker knows which algorithm is being used, and any public parameters of
the algorithm.
A11. An attacker does not know secret values (e.g. symmetric secret key K_AB or private
key PR_A).
A12. Brute force attacks requiring greater than 2^80 operations are impossible.
A14. An entity receiving a message with attached MAC that successfully verifies, knows
that the message has not been modified and originated at one of the owners of the
MAC secret key.
A17. Given a hash value, h, it is impossible to find another message M' that also has a
hash value of h.
A18. It is impossible to find two messages, M and M', that have the same hash value.
A20. An entity receiving a message with an attached digital signature knows that the
message originated from the signer of the message.
A22. Any entity can obtain the correct public key of any other entity.
A.2. PRINCIPLES 213
A23. Pseudo-random number generators (PRNG) can generate effectively true random
numbers.
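As a rough illustration of why Assumption A12 treats 2^80 operations as out of reach, the sketch below estimates how long that many operations would take. The operation rates are arbitrary assumptions chosen for illustration (they are not measurements of any real attacker), and brute_force_years is a hypothetical helper name.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def brute_force_years(key_bits, ops_per_second):
    """Years needed to perform 2**key_bits operations at the given rate."""
    return 2 ** key_bits / ops_per_second / SECONDS_PER_YEAR

# Assumed attacker speeds, in operations per second (illustrative only).
for rate in (1e9, 1e12, 1e15):
    print(f"{rate:.0e} ops/s: {brute_force_years(80, rate):.3g} years")
```

At the assumed rate of 10^9 operations per second the estimate is in the tens of millions of years. Each extra bit doubles the work, which is why the threshold is stated as a power of two.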
A.2 Principles
P1. Experience: Algorithms that have been used over a long period are less likely to
have security flaws than newer algorithms.
P2. Performance: Symmetric key algorithms are significantly faster than public key
algorithms.
P5. Key Re-use: The more times a key is used, the greater the chance of an attacker
discovering that key.
P6. Multi-layer Security: Using multiple overlapping security mechanisms can increase
the security of a system.
Appendix B
Data Formats
• 10 digits
Keys such as SPACE, TAB and ENTER are usually not considered printable.
Primarily seen in applications dealing with user input, e.g. passwords.
File: crypto/formats.tex, r1766
216 APPENDIX B. DATA FORMATS
B.1.4 ASCII
American Standard Code for Information Interchange (ASCII) is a common standard
for representing keyboard/computer characters in a digital format. Also referred to as
the International Reference Alphabet and a subset of Unicode, there are 128 characters
in the ASCII character set. Section B.1 shows the mappings to decimal values, while
Section B.2 shows the mapping to 7-bit binary values (take the 3 bits from the column
and then the 4 bits from the row).
B.1.5 Hexadecimal
A character set with 16 characters:
0 1 2 3 4 5 6 7 8 9 A B C D E F
When communicating binary data (to humans), it is sometimes represented in hexadecimal
as it uses four times fewer characters (4 bits per character) and has less chance
of reading/writing errors.
Examples of using hexadecimal to illustrate binary data include: secret keys, public
key pair values, very large numbers (e.g. large primes), ciphertext, and addresses.
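The saving in characters can be checked with a short Python sketch. It uses only built-in features; the example bytes (the ASCII codes of "Hello") are chosen purely for illustration. Note that Python prints hexadecimal digits in lowercase, whereas the character set above is written in uppercase.

```python
data = bytes([0x48, 0x65, 0x6C, 0x6C, 0x6F])  # the five bytes of ASCII "Hello"

binary_str = "".join(f"{byte:08b}" for byte in data)  # 8 characters per byte
hex_str = data.hex()                                  # 2 characters per byte

print(binary_str)                        # 0100100001100101011011000110110001101111
print(hex_str)                           # 48656c6c6f
print(len(binary_str) // len(hex_str))   # 4

# Hexadecimal is just another way of writing the same bytes.
assert bytes.fromhex(hex_str) == data
```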
B.1.6 Base64
An alternative to hexadecimal representation of binary data is using Base64 encoding.
Base64 is a character set of 64 characters:
• 10 digits
The = character is used to indicate padding (and is not part of the 64 characters).
See online resources for an explanation of padding.
Base64 maps 6 bits to a character and therefore is more concise than hexadecimal. It
is often used when communicating binary data in text-based protocols in networks (e.g.
including binary data in an HTML page or email).
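A minimal sketch using Python's standard base64 module shows the encoding in practice and compares its length with hexadecimal. The input "Hello" is an arbitrary example value.

```python
import base64

data = b"Hello"
b64_str = base64.b64encode(data).decode("ascii")
hex_str = data.hex()

print(b64_str)                      # SGVsbG8=
print(len(hex_str), len(b64_str))   # 10 8

# Decoding recovers the original bytes; the trailing "=" is padding.
assert base64.b64decode(b64_str) == data
```

The five input bytes are 40 bits, which do not divide evenly into 6-bit groups, so the encoder pads the output to a multiple of four characters.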
To convert ASCII characters to their decimal value, in a Linux Bash terminal you can
use printf (newlines have been added below to make the output clearer):
$ printf '%d' "'A"
65
$ printf '%d' "'a"
97
$ printf '%d' "'!"
33
$ printf '%d' "'~"
126
You are advised to simply look up the table or find another tool, rather than use the
Bash commands as above.
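Python is one such alternative tool. The built-in functions ord() and chr() convert between a character and its decimal value; the short sketch below repeats the four Bash examples.

```python
# ord() maps a character to its decimal value.
for ch in "Aa!~":
    print(ch, ord(ch))   # prints 65, 97, 33 and 126 for the four characters

# chr() is the inverse mapping.
assert chr(65) == "A"
```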
repository. The code below is version 4c0faec. An example of the output from running
the conversion functions follows the code.
def hex_to_binary(h):
    # Note: bin() does not keep leading zero bits
    return bin(int(h, 16))[2:]

def binary_to_hex(bi):
    return hex(int(bi, 2))[2:]

def binary_to_bytes(bi):
    return hex_to_bytes(binary_to_hex(bi))

def bytes_to_binary(b):
    return hex_to_binary(bytes_to_hex(b))

def text_to_binary(s):
    return bytes_to_binary(text_to_bytes(s))

def binary_to_text(bi):
    return bytes_to_text(binary_to_bytes(bi))

def base64_to_binary(b64):
    return bytes_to_binary(base64_to_bytes(b64))

def binary_to_base64(bi):
    return bytes_to_base64(binary_to_bytes(bi))


def letter_to_number(c, charset="lowercase"):
    '''
    Convert a single character into a number
    Converts a -> 0, b -> 1, c -> 2, ... or
    if uppercase A -> 0, B -> 1, C -> 2, ...
    '''

    if charset == "uppercase":
        return ord(c) - 65
    else:
        return ord(c) - 97

def number_to_letter(n, charset="lowercase"):
    '''
    Convert a number into a single character
    See letter_to_number(c) - this is the opposite
    '''

    if charset == "uppercase":
        return chr(n + 65)
    else:
        return chr(n + 97)

def text_to_numbers(text, charset="lowercase"):
    '''
    Convert a string into a list of numbers
    :Example:
    - input: str = "abc"
    - output: list = [0, 1, 2]
    '''

    return [letter_to_number(c, charset) for c in text]
B.3. CONVERSIONS USING PYTHON 221

def numbers_to_text(nums, charset="lowercase"):
    '''
    Convert a list of numbers into a string
    See text_to_numbers(text) - this is the opposite
    '''

    return ''.join([number_to_letter(n, charset) for n in nums])


if __name__=='__main__':
    import sys
    import argparse
    import logging  # needed for the log-level handling below

    # Process command line arguments
    parser = argparse.ArgumentParser(
        description="Convert between different formats for cryptography",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog='''
example (command-line):
  $ python conversions.py
''')
    parser.add_argument("-l", "--log",
        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"])
    args = parser.parse_args()

    # Enable logging based on command line input
    if args.log is None:
        numeric_log_level = logging.ERROR
    else:
        numeric_log_level = getattr(logging, args.log.upper(), None)
    if not isinstance(numeric_log_level, int):
        raise ValueError('Invalid log level: %s' % args.log)
    logging.basicConfig(level=numeric_log_level)

    data1_str = "Hello"
    data1_bytes = text_to_bytes(data1_str)
    data1_b64 = text_to_base64(data1_str)
    data1_hex = text_to_hex(data1_str)
    data1_bin = text_to_binary(data1_str)
    data1_list = text_to_list(data1_str)

    print("Converting Text to ...")
    print("   Text:" + str(data1_str))
    print("   Bytes :" + str(data1_bytes))
    print("   Base64:" + str(data1_b64))
    print("   Hex   :" + str(data1_hex))
    print("   Binary:" + str(data1_bin))
    print("   List  :" + str(data1_list))

    data2_b64 = "SGVsbG8="
    data2_bytes = base64_to_bytes(data2_b64)
    data2_str = base64_to_text(data2_b64)
    data2_hex = base64_to_hex(data2_b64)
    data2_bin = base64_to_binary(data2_b64)

    print("Converting Base64 to ...")
224 APPENDIX C. ORGANISATIONS AND PEOPLE IN CRYPTOGRAPHY
While studying for his Bachelor’s degree in computer science in 1974, Ralph Merkle
developed a set of puzzles that allowed two users to agree upon a shared secret key by
exchanging messages over an insecure channel, even if they have no common secrets
known beforehand. This was unique, as up until then it was normally assumed that users
must manually exchange a secret before they can send messages. Ralph continued his
studies in a PhD with Martin Hellman as his adviser.
In 1976 Whitfield Diffie and Martin Hellman used Merkle’s scheme as motivation
for their own, improving the security by basing the problem of the attacker on solving
discrete logarithms (Merkle’s puzzles only involved quadratic complexity problems, much
easier than discrete logarithms). Their scheme, called Diffie-Hellman key exchange, was
the first secure example of public key cryptography. It is still in use today, in particular
in TLS and SSH (e.g. when you SSH into another computer).
In the 1990’s it was announced that Clifford Cocks and others at GCHQ had designed
similar public key cryptography concepts earlier than Merkle, Diffie and Hellman.
be decrypted by the other key of the pair. The strength of RSA is based on the difficulty
of factoring large numbers into their prime factors.
Although there were other public key algorithms developed, before RSA symmetric
key encryption was primarily used in practice. With RSA patented, Rivest, Shamir and
Adleman co-founded RSA Security to commercialise the use of the algorithm. In 2006 it
was acquired by EMC for $US2 billion. RSA is mainly used for digital signatures and
authentication tokens. Verisign was formed as a spin-off company from RSA Security
that used the algorithm to sign digital certificates.
Rivest, Shamir and Adleman continue their cryptography research. Rivest developed
ciphers RC2, RC4, RC5 and RC6 and hash functions MD2, MD4, MD5 and MD6; Shamir
discovered differential cryptanalysis; Adleman is a leader of DNA computing and coined
the term ’computer virus’.
and therefore proved that there is no solution for the Entscheidungsproblem (“decision
problem”).
Turing then worked at Princeton, obtaining his PhD in 1938, which introduced ordi-
nal logic and the computing oracle, which has been highly influential in computational
complexity theory.
In 1938 Turing returned to England, and during World War II worked for the
British code breaking organisation (which is now GCHQ) in Bletchley Park. He made
major contributions to breaking the Enigma cryptosystem used by Germans, as well as
developing a secure voice scrambler and using statistical techniques to break codes. In
1948 Turing led the development of one of the first computers. As a contribution to
artificial intelligence, he also developed the Turing test, a way to determine if a machine
is “intelligent”. He also developed LU decomposition, a method used to solve matrix
equations.
Turing was convicted and chemically castrated for being homosexual in 1952. He
committed suicide in 1954.
Shannon and others used principles of information theory to make substantial win-
nings in Las Vegas casinos and on the stock market.
In the early 1930’s Hedy Lamarr acted in several movies in Europe, before moving to
Hollywood in 1938. She had a leading role in multiple top movies in the 1940’s, alongside
the most popular actors of the time.
While acting during World War II, Lamarr was inspired to contribute to the war
effort and worked with George Antheil on inventions. They focussed on remote control
torpedoes, in particular how to design communications between a ship and torpedo so
that the signal could not be jammed. They developed a method of rapidly switching or
“hopping” between different frequencies (initially 88 frequencies, matching the number
of keys on the piano of Antheil). An attacker would need to transmit on all frequencies
to jam the signal, which would require too much power, making the attack impractical.
Lamarr and Antheil were granted a US patent in 1942.
Although Lamarr did not commercialise the technique, it started to be used by the
US military in the 1960’s, and more widely in the 1990’s. The concept of frequency
hopping serves as the basis of spread spectrum communications used today. It is used in
Bluetooth, WiFi and CDMA mobile phones.
Lamarr continued acting, gaining a star on the Hollywood Walk of Fame, as well as
being inducted into the Inventors Hall of Fame.
the US government were unlikely to stop the exportation of a book that could be legally
purchased.
Zimmermann continues activities in security and privacy, developing ZRTP for en-
crypted real-time VOIP calls, and founding Silent Circle which offers secure text, email
and phones.
• Ross Anderson
• Daniel J. Bernstein
• Dan Boneh
• Joan Daemen
Appendix D
• A new major version will be released (if necessary) at the start of each teaching term.
That is currently March (03), July (07) and November (11). If no significant updates
are made between teaching terms, then a new major version may be skipped. The
major versions will be named by year and month, e.g. 20.03, 20.07, 20.11, 21.03.
• Minor versions will be released to fix bugs, typos and formatting issues. They
may contain new content (e.g. new chapter or new section), so long as the existing
chapters and sections are not re-numbered (e.g. new chapters will be added at the
end of the book). Apart from this, they will not contain significant changes to the
content. The minor versions will be identified by the Subversion (SVN) revision
number on the first page of the book.
Crypto 20.03
r1671, 1 March 2020: First public release of the book.
Crypto 22.03
r1972, 4 January 2022: Replaced many images with own images or Creative Commons
images (e.g. DES, AES, Authentication, Classical chapters); changed slides from 4:3 to
16:9 aspect ratio; several additional examples (e.g. Block Cipher Design Principles); new
videos for Encryption and Number Theory chapters.
230 APPENDIX D. VERSIONS OF THIS BOOK
Index
/dev/random, 16
/dev/urandom, 15

accounting, 6
additive inverse, 35
asymmetric cryptography, see public key cryptography
asymmetric key crypto, see public-key crypto
attacks
  brute force, see brute force attack
  frequency analysis, see frequency analysis
authentication, 6
authorisation, 6
availability, 5

base64, 218
bc, 13
brute force attack, 48
  example, 48
  monoalphabetic cipher, 51
  Python, 48

Caesar cipher, 45
  definition, 45, 47
  example, 45–47
  Python, 48
CIA, 5
cipher, 7
ciphers
  Caesar, see Caesar cipher
  monoalphabetic, see monoalphabetic cipher
  Playfair, see Playfair cipher
  polyalphabetic, see polyalphabetic cipher
ciphertext, 7
cmp
  example, 163
confidentiality, 5
congruence, 34
congruent modulo, 34
conventional crypto, see symmetric key crypto
cryptanalysis, 7
cryptography, 7
cryptology, 7
cut, 12
  example, 15

decryption, 7, 102
diff
  example, 102, 116
digital signature, 149
discrete logarithm, 39
divides, 31
divisor, 31

echo
  example, 103
encryption, 6, 7, 102
Euler’s totient, 33

Feistel network, 67
Fermat’s theorem, 37
frequency analysis, 52
  monoalphabetic, 52

gcd(), see greatest common divisor
git
  example, 20
greatest common divisor, 31

integer factorisation, 41
integrity, 5

key, 7

Linux, 11
  Ubuntu, 11
logarithm, 38

md5sum, 16
mod, 34
mode of operation, 103
modular addition, 35
modular arithmetic, 34
modular division, 37
modular exponentiation, 38
modular logarithm
  see discrete logarithm, 39
modular multiplication, 36
modular subtraction, 35
modulus, 34
monoalphabetic cipher, 50
  definition, 50
  example, 50
multiplicative inverse, 36

One-Time Pad, 60
OpenSSL, 17
  dgst, 149
  enc, 102
  genpkey, 145, 159
  help, 17
  list, 18
  padding, 103
  pkey, 146
  pkeyparam, 160
  pkeyutl, 149, 162
  rand, 19, 104
  smime, 150
  speed, 116
  version, 18

passwords
  salt, 103
permutation, 50, 67
plaintext, 7
Playfair cipher, 54
  definition, 54, 55
  example, 54, 55
polyalphabetic cipher, 56
  definition, 56
prime factorisation, 33
prime factors, 32
prime number, 32
primitive root, 39
public key cryptography, 131, 145
public-key crypto, 65
pycipher, 20
Python, 20

rail fence cipher, 62
random numbers
  /dev/random, 16
  /dev/urandom, 15
  $RANDOM, 15
relatively prime, 31
rows columns cipher, 62
RSA, 145

S-DES, 91
security protections, 5
sha1sum, 16
sha256sum, 16
sha512sum, 16
shared-key crypto, see symmetric key crypto
Simplified-DES, see S-DES
single-key crypto, see symmetric key crypto
SPN, 67
substitution, 67
symmetric key crypto, 65

transposition, 61

Ubuntu, see Linux

Vernam cipher, 59
Vigenère cipher, 57
  example, 57

wc
  example, 103

XOR
  Python, 59
xxd, 11
  example, 15, 19, 103, 116