Bitcoin_Keys_and_Addresses-1
As was previously introduced in the Elliptic Curve Groups post, the linkage between
Bitcoin’s private and public keys is determined by a specific elliptic curve known as
secp256k1. Recall that the curve’s parameters are as follows:
• The secp256k1 curve is non-singular and is represented using its short Weierstrass
form. We denoted the resulting group by (E(Fp ), ⊕p ), where
E(Fp ) = {(x, y) ∈ Fp × Fp : y^2 ≡ x^3 + 7 (mod p)} ∪ {O}
and O denotes the point at infinity and is the identity element of the group. Here is a
Euclidean representation of this curve when p = 163 (it is not feasible to show it
for p = 2^256 − 2^32 − 2^9 − 2^8 − 2^7 − 2^6 − 2^4 − 1).
© 2018 Bassam El Khoury Seguias
• The base point G = (xG , yG ) has the following coordinates:
xG ≡ 55066263022277343669578718895168534326250603453777594175500187360389116729240 (mod p)
yG ≡ 32670510020758816978083085130507043184471273380659243275938904335757337482424 (mod p)
• The order n of the base point G is
n = 115792089237316195423570985008687907852837564279074904382605163141518161494337
• Recall that n denotes the order of G, and must divide #E(Fp ), i.e., the order of
E(Fp ). The cofactor h is equal to #E(Fp )/n, which in this case is equal to 1.
That means that the order of G is equal to that of E(Fp ), i.e., n = #E(Fp ). Since
n is prime, the order of E(Fp ) is also prime. As a result, (E(Fp ), ⊕p ) is a cyclic
group and any of its elements other than the identity O could serve as a generator.
We also saw that Bitcoin’s private and public keys obey the following architecture:
1. A private key m is a 256-bit long scalar chosen from the set F∗n ≡ Fn − {0}.
2. A public key M is an element of the subgroup {G}. M is derived from m by adding
G to itself a total of m times. Addition refers to the elliptic curve group binary
operation ⊕p . More specifically, M = m ⊗p G ≡ G ⊕p G ... ⊕p G (m times). It is
a 512-bit long string denoting the elliptic curve point (x, y). It is an element of the
set {G} which in this case is equivalent to E(Fp ). Both x and y are 256-bit long.
The most important observation was that one can efficiently calculate M from m
using e.g., the double-and-add method, but that deriving m from M is thought to be
intractable. We saw that this conclusion is a manifestation of the exponential hardness
of the Elliptic Curve Discrete Logarithm Problem (ECDLP).
In what follows we include four Python methods, the first three of which feed into the
method entitled mul_scalar that performs elliptic-curve point multiplication. The first
two methods were sourced from [3]:
1. extended_euclidean_algorithm(a, b): it takes two integers a and b and returns
a three-tuple consisting of gcd(a, b) and the Bézout coefficients x and y that satisfy
ax + by = gcd(a, b) (refer to Groups and Finite Fields):
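A minimal sketch of such a method is given below. The iterative formulation and the variable names are our own assumptions and are not necessarily those of [3]. The helper mod_inverse is a hypothetical name that we add to show how the Bézout coefficient x yields a modular inverse, which is what the point-addition formulas need.

```python
def extended_euclidean_algorithm(a, b):
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
    # Loop invariants: old_r == a*old_x + b*old_y and r == a*x + b*y.
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def mod_inverse(a, p):
    """Inverse of a modulo p (hypothetical helper, requires gcd(a, p) == 1)."""
    g, x, _ = extended_euclidean_algorithm(a % p, p)
    if g != 1:
        raise ValueError("a has no inverse modulo p")
    return x % p
```

For example, extended_euclidean_algorithm(240, 46) returns (2, -9, 47), and indeed 240·(−9) + 46·47 = 2.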
3. add_points(A, B, p, p1, p2): it adds two points p1 and p2 on the short
Weierstrass form elliptic curve whose equation is
E : y^2 ≡ x^3 + Ax + B (mod p)
The rules for adding two points were outlined in the Elliptic Curve Groups post.
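The addition rules and the double-and-add method can be sketched as follows. This is an illustration under our own assumptions rather than the original listing: the point at infinity O is represented by None, modular inverses are obtained with Python's built-in pow(x, -1, p) (available from Python 3.8 onwards) instead of the extended Euclidean method, and the signature we give to mul_scalar is hypothetical.

```python
def add_points(A, B, p, p1, p2):
    """Add p1 and p2 on y^2 = x^3 + A*x + B (mod p). None stands for O.

    B does not appear in the addition formulas; it is kept so that the
    signature matches the one used in the text.
    """
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % p == 0:
        return None  # p2 is the inverse of p1, so the sum is O
    if p1 == p2:
        # Slope of the tangent line (point doubling).
        slope = (3 * x1 * x1 + A) * pow((2 * y1) % p, -1, p) % p
    else:
        # Slope of the chord through two distinct points.
        slope = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (slope * slope - x1 - x2) % p
    y3 = (slope * (x1 - x3) - y1) % p
    return (x3, y3)


def mul_scalar(A, B, p, m, point):
    """Compute m * point with the double-and-add method."""
    result = None    # running sum, starts at O
    addend = point   # holds point * 2^i at iteration i
    while m > 0:
        if m & 1:
            result = add_points(A, B, p, result, addend)
        addend = add_points(A, B, p, addend, addend)
        m >>= 1
    return result
```

On the small textbook curve y^2 ≡ x^3 + 2x + 2 (mod 17), the point (5, 1) has order 19, so mul_scalar(2, 2, 17, 2, (5, 1)) returns (6, 3) and mul_scalar(2, 2, 17, 19, (5, 1)) returns None.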
include below an example of a python code that does this. But first, we specify the
parameters of the elliptic curve group associated with the secp256k1 curve:
• p_dec denotes the decimal value of the order of the underlying finite field.
• G_dec corresponds to the decimal coordinates (mod p_dec) of the base point G.
• n_dec is the decimal value of the order of {G}, the subgroup generated by G.
• A_dec and B_dec are the parameters of the secp256k1 curve represented in short
Weierstrass form. A_dec = 0, and B_dec = 7.
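The parameters listed above can be written down as follows. The variable names follow the text; storing G_dec as a tuple and the helper is_on_curve are our own assumptions, added only to sanity-check the values.

```python
# secp256k1 domain parameters in decimal notation.
p_dec = 2**256 - 2**32 - 2**9 - 2**8 - 2**7 - 2**6 - 2**4 - 1

# Base point G stored as a tuple (xG, yG).
G_dec = (
    55066263022277343669578718895168534326250603453777594175500187360389116729240,
    32670510020758816978083085130507043184471273380659243275938904335757337482424,
)

# Order of the subgroup generated by G.
n_dec = 115792089237316195423570985008687907852837564279074904382605163141518161494337

# Short Weierstrass coefficients: y^2 = x^3 + A*x + B (mod p).
A_dec = 0
B_dec = 7


def is_on_curve(point, A=A_dec, B=B_dec, p=p_dec):
    """Check that point satisfies the curve equation (hypothetical helper)."""
    x, y = point
    return (y * y - (x * x * x + A * x + B)) % p == 0
```

Calling is_on_curve(G_dec) is a quick way to catch a mistyped digit in any of the constants.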
Next, we generate a random private key in decimal notation that we assign to the variable
priv_key_dec.
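One possible way of drawing such a key is sketched below. The choice of Python's secrets module and the function name generate_private_key are our own assumptions; any cryptographically secure source of randomness would serve the same purpose.

```python
import secrets

# Order of the subgroup generated by the secp256k1 base point G.
n_dec = 115792089237316195423570985008687907852837564279074904382605163141518161494337


def generate_private_key(n=n_dec):
    """Return a uniformly random integer m with 1 <= m <= n - 1."""
    # secrets.randbelow(n - 1) is uniform on [0, n - 2]; shifting by 1
    # excludes 0, which is not a valid private key.
    return 1 + secrets.randbelow(n - 1)


priv_key_dec = generate_private_key()
```

The general-purpose random module is deliberately avoided here because it is not designed for cryptographic use.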
There is one caveat, however. It is possible for the randomly generated private key
not to be big enough to fill all of the 256 bits (recall that the private key can be any
positive integer less than n). If this is the case, we need to add enough leading 0's
to its hexadecimal representation to ensure that the final length is 256 bits (i.e., 64
nibbles). The following Python method is one way of completing the hexadecimal
representation whenever needed:
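A minimal sketch of such a padding method follows. The function name to_hex_256 is hypothetical, and we rely on Python's format specification "064x", which left-pads with 0's up to 64 hexadecimal digits.

```python
def to_hex_256(priv_key_dec):
    """Return the 64-nibble (256-bit) hexadecimal string of a private key."""
    if not 0 <= priv_key_dec < 2**256:
        raise ValueError("the key must fit in 256 bits")
    # "064x": lowercase hexadecimal, zero-padded on the left to 64 digits.
    return format(priv_key_dec, "064x")
```

For instance, to_hex_256(255) returns a string made of 62 zeros followed by ff.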
Private keys - WIF representation
Another format for representing private keys is the Wallet Import Format, or WIF for
short. The WIF format is used whenever a private key is imported or exported from one
wallet to another. The Quick Response code (QR) of a private key is usually displayed
in WIF format. To perform WIF encoding, the following sequential procedure (also known
as the base58Check encoding procedure) is implemented:
1. Insert a version prefix of 128 (decimal) or 80 (hexadecimal) at the beginning of
the original private key.
2. Perform a double SHA256 on the binary representation of the newly prefixed key.
3. Store the first 4 bytes (i.e., 8 nibbles) of the resulting digest in a checksum variable.
4. Append the checksum to the end of the prefixed key.
5. Encode the result in base 58.
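Steps 1 to 4 of the procedure above can be sketched as follows, leaving the base 58 conversion of step 5 aside. The function name prefix_and_checksum and the convention of passing hexadecimal strings are our own assumptions.

```python
import binascii
import hashlib


def prefix_and_checksum(version_hex, payload_hex):
    """Return version + payload + checksum as one hexadecimal string."""
    # Step 1: insert the version prefix in front of the payload.
    prefixed = version_hex + payload_hex
    # Step 2: double SHA256 on the binary representation.
    raw = binascii.unhexlify(prefixed)
    digest = hashlib.sha256(hashlib.sha256(raw).digest()).digest()
    # Step 3: the checksum is the first 4 bytes (8 nibbles) of the digest.
    checksum = digest[:4].hex()
    # Step 4: append the checksum to the prefixed payload.
    return prefixed + checksum
```

With a version prefix of 80 and a 64-nibble private key, the result is 74 nibbles long (37 bytes).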
The steps are self-explanatory, except possibly for the last one. A base 58 encoding is
similar in concept to any other base transformation. The alphabet used in this case
consists of the following 58 elements:
123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz
The rationale for base 58 encoding is explained in the original Bitcoin client source code:
• Its alphabet consists of all digits (except for 0) and all lower and upper case ISO
basic Latin alphabet symbols (except for O, I, and l). The reason for excluding
them is due to the striking resemblance (when using certain fonts) of I and l, and
of 0 and O. Their inclusion would possibly result in addresses and keys that
visually look similar but that are actually different.
• "A string with non-alphanumeric characters is not as easily accepted as an
account number". Limiting the alphabet to alphanumeric characters is safer.
• The exclusion of any punctuation character is motivated by the fact that "e-mails
usually won't line-break if there's no punctuation to break at".
4. Since the previous quotient was 0, we stop the process and conclude that the base
58 representation of 19,099 is given by 6gJ.
Here is an example of Python code that applies this procedure to any non-negative
integer i:
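A minimal sketch of this conversion is given below. The name base58_encode_int follows the one used later in the text, while the body is our own reconstruction based on repeated division by 58.

```python
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode_int(i):
    """Return the base 58 representation of a non-negative integer i."""
    if i < 0:
        raise ValueError("i must be non-negative")
    if i == 0:
        return BASE58_ALPHABET[0]
    symbols = []
    while i > 0:
        # The remainder selects the next symbol, least significant first.
        i, remainder = divmod(i, 58)
        symbols.append(BASE58_ALPHABET[remainder])
    return "".join(reversed(symbols))
```

For instance, 19,099 = 5·58^2 + 39·58 + 17, and the symbols at positions 5, 39 and 17 of the alphabet are 6, g and J, so base58_encode_int(19099) returns 6gJ.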
In what follows, we show how the base58Check encoding can be implemented in python.
Note that it is always recommended to rely on existing implementations such as the one
used by the Bitcoin client or as part of other libraries developed specifically for python.
The one we include below is for educational purposes and we built it from scratch with
the sole intention of illustrating the process:
• Our base58Check method takes two arguments:
• We will see shortly that the same method is used to derive the Bitcoin address
associated with a given public key in the context of a P2PKH or P2SH
transaction. In this case, different version prefixes will be used.
• The hashing function sha256 is applied to the binary version of the prefixed
private key. To get to binary, we use Python's binascii.unhexlify(hex_str)
method which acts on a hexadecimal string. There must be an even number of hex
digits for it to work or an error gets raised. In our case, the even length constraint
is always observed since we enforce a 256-bit long string (i.e., 64 nibbles).
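The full base58Check procedure can be sketched as follows. This is an educational illustration under our own assumptions: in particular, we assume that the two arguments are the version prefix and the payload, both given as hexadecimal strings, and that every leading 0-byte is mapped to the symbol 1.

```python
import binascii
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode_int(i):
    """Base 58 representation of a positive integer ('' for 0)."""
    encoded = ""
    while i > 0:
        i, remainder = divmod(i, 58)
        encoded = BASE58_ALPHABET[remainder] + encoded
    return encoded


def base58check(version_hex, payload_hex):
    """Base58Check-encode payload_hex under the given version prefix."""
    prefixed = version_hex + payload_hex
    raw = binascii.unhexlify(prefixed)
    # Checksum: first 4 bytes of the double SHA256 digest.
    checksum = hashlib.sha256(hashlib.sha256(raw).digest()).hexdigest()[:8]
    full = binascii.unhexlify(prefixed + checksum)
    # Leading 0-bytes do not change the integer value, so each of them
    # is encoded explicitly as the symbol '1'.
    n_zero = len(full) - len(full.lstrip(b"\x00"))
    return "1" * n_zero + base58_encode_int(int.from_bytes(full, "big"))
```

As a usage example, base58check("80", key_hex) yields the WIF encoding of a 64-nibble private key key_hex, while base58check("00", "00" * 20) yields 21 copies of the symbol 1 followed by 4oLvT2.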
A private key encoded in WIF format will always start with a 5. To see why, note that
the base58Check method creates a 37-byte long string (a byte-long version prefix, a
32-byte-long private key, and a 4-byte-long checksum) that it transforms into decimal
notation before feeding into the base58_encode_int method. The version prefix is set
to '80' in hexadecimal notation. The smallest and largest sequences of 74 nibbles (i.e.,
37 bytes) that can be formed with an '80' prefix are respectively given by:
8000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 00
80FF FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF
FFFF FFFF FFFF FFFF FFFF FF
When these hexadecimal strings get transformed to decimal representation and then fed
to base58_encode_int, we respectively obtain:
5HpHagT65TZzG1PH3CSu63k8DbpvD8s5ip4nEB3kEsreAbmahZy
5Km2kuu7vtFDPpxywn4u3NLu8iSdrqhxWT8tUKjeEXs2fDqZ9iN
Due to the nature of the base 58 encoding scheme (which works like any other base),
the image of any valid string of 74 nibbles will be confined to this range, and hence is
bound to start with a 5.
An exercise similar to the one carried out for WIF-encoded keys reveals that all
WIF-compressed formats start with either K or L.
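The range argument can be checked numerically with the sketch below. The function first_symbols is a hypothetical helper, and we assume that a WIF-compressed string encodes 38 bytes (the '80' prefix, the 32-byte key, a one-byte suffix and the 4-byte checksum) as opposed to 37 bytes for plain WIF.

```python
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode_int(i):
    """Base 58 representation of a positive integer."""
    encoded = ""
    while i > 0:
        i, remainder = divmod(i, 58)
        encoded = BASE58_ALPHABET[remainder] + encoded
    return encoded


def first_symbols(n_bytes, version_hex="80"):
    """First base 58 symbol of the smallest and largest n_bytes-long strings
    that start with the given version byte."""
    smallest = int(version_hex + "00" * (n_bytes - 1), 16)
    largest = int(version_hex + "ff" * (n_bytes - 1), 16)
    return base58_encode_int(smallest)[0], base58_encode_int(largest)[0]
```

Here first_symbols(37) returns ('5', '5') and first_symbols(38) returns ('K', 'L'). Since the encoding preserves the ordering of integers of equal byte length, every string in between starts with a symbol in the same range.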
More specifically, Bitcoin addresses are strings of alphanumeric characters that can
start with either "1" or "3". Note that there is also a new address type known as
Bech32 that starts with bc1 instead. It is a segwit address but is not widely adopted
(< 0.8% of existing Bitcoins as of the time of this writing [2]). We will not cover it in
this post and the reader interested in learning more about it can refer to e.g., [1].
Fundamentally, the two types of addresses (i.e., starting with "1" or with "3")
correspond to the following two cases:
1. The destination of funds is a single recipient (person or entity) that has full
control over the funds and as a result, can spend them as she pleases.
2. The destination of funds is a more complex structure that specifies certain rules
that need to be met in order for the funds to be spent or unlocked.
The first type is known as a Pay to Public Key Hash address or P2PKH. These
addresses always start with "1". The rationale for the name stems from the fact that all
that is needed to create the address is a hash of the public key as we will see shortly. In
order to spend the funds, the recipient signs a new transaction using her private key. A
two-step verification mechanism is then conducted: First, the system compares the
address used as a source of funds with the one derived from the signer’s public key. In
case of a match, a second step validates whether the signature provided corresponds to
the sender’s public key or not. A match would indicate that the signer is the legitimate
owner and hence can spend the funds without further constraints. We will discuss the
details of P2PKH transactions in a later post.
The second type is known as a Pay to Script Hash address or P2SH. These
addresses always start with "3". They tend to be more complex than their P2PKH
counterpart in the sense that certain rules must be observed in order to unlock the
funds. These rules require more than the provision of a single public key hash and of a
signature derived from an appropriate private key. Applicable rules or conditions are
captured in a construct known as a redeem script. The rationale for the name P2SH
stems from the fact that all that is needed to create the address is a hash of the script.
An example of a script would be an M-of-N multisignature, whereby it is required to
have a minimum of M out of a total of N permissible signatures in order to unlock and
spend the funds associated with that address. A single entity cannot spend them and
hence a single private key is not enough. We will discuss the details of P2SH
transactions in a later post.
P2PKH addresses
In order to derive the Bitcoin address associated with a given public key we make use
of two one-way hash functions, namely SHA256 and RIPEMD-160. Whereas SHA256 outputs
256-bit long digests (i.e., 32 bytes), RIPEMD-160 outputs 160-bit long digests (i.e.,
20 bytes). The procedure is as follows:
1. Given a public key (in compressed or uncompressed format), apply SHA256 on its
binary representation.
2. Apply RIPEMD-160 on the binary representation of the previous SHA256 digest.
3. Conduct Base58Check encoding on the previous digest using a version prefix of
'00'. The result is the desired Bitcoin address.
Note that the last step is similar to that used to encode private keys in WIF format.
There are two differences however:
2. Since adding a prefix of ’00’ does not change the integer value of the bit-sequence,
we need a specifier to distinguish a string that starts with leading 0’s from one
that does not. The way Base58Check does it is by mapping the leading 0-byte to
a 1.
Here is Python code that generates the P2PKH address from a compressed or
uncompressed public key:
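The sketch below is one way of implementing the three steps. The function names are our own, and we assume that hashlib.new("ripemd160") is available, which depends on the OpenSSL build that Python is linked against. The split between p2pkh_address and p2pkh_address_from_hash is a design choice that keeps the encoding step usable even where RIPEMD-160 is not.

```python
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58check(version, payload):
    """Base58Check-encode payload (bytes) under a one-byte version (bytes)."""
    data = version + payload
    data += hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]
    n_zero = len(data) - len(data.lstrip(b"\x00"))
    value = int.from_bytes(data, "big")
    encoded = ""
    while value > 0:
        value, remainder = divmod(value, 58)
        encoded = BASE58_ALPHABET[remainder] + encoded
    # Each leading 0-byte is represented by the symbol '1'.
    return "1" * n_zero + encoded


def hash160(data):
    """RIPEMD-160 of the SHA256 digest (steps 1 and 2)."""
    sha = hashlib.sha256(data).digest()
    return hashlib.new("ripemd160", sha).digest()  # needs OpenSSL support


def p2pkh_address_from_hash(pubkey_hash):
    """Step 3: Base58Check with version prefix 00."""
    return base58check(b"\x00", pubkey_hash)


def p2pkh_address(pubkey_hex):
    """Address of a compressed or uncompressed public key given in hex."""
    return p2pkh_address_from_hash(hash160(bytes.fromhex(pubkey_hex)))
```

For example, the 20-byte hash 010966776006953d5567439e5e39f86a0d273bee is encoded as 16UwLL9Risc3QfPqBUvKofHmBQ7wMtjvM.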
P2SH addresses
The procedure used to derive a P2SH address is similar to that employed to derive
P2PKH addresses. The difference is two-fold:
1. The argument is now a redeem script as opposed to a public key (we will discuss
the details of scripts in the Bitcoin transactions post).
2. The version prefix is set to '05' as opposed to '00'. Consequently, there is no need
to add a leading 1 when conducting the Base58Check encoding.
52210232cfef1f9ec45bef08062640963aa8d6b15062c9c9e51c26682369969ba9101a
21029e1f52d753a7c68fb17adaa0b19f6b02f1266245186bc487c691743b6086ed5021
03d0fabbd163dd3a6ccf382b5e640622e9075f2676443499195d9b5f3e4c11993b53ae
For those interested, this script was retrieved from the blockchain (e.g., using
blockchain.info) from the transaction with the following id:
38d8d5ad0fad303f7cebd9b7363f22d80f22576ba36846ae44e83ac32615472c
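Reading the script above byte by byte shows that it has the shape of an M-of-N multisignature script. The sketch below parses it under our own assumptions: the function name parse_multisig is hypothetical, and we rely on the standard Bitcoin opcode values, in which 0x51 to 0x60 stand for the numbers 1 to 16 and 0xae stands for OP_CHECKMULTISIG.

```python
REDEEM_SCRIPT_HEX = (
    "52210232cfef1f9ec45bef08062640963aa8d6b15062c9c9e51c26682369969ba9101a"
    "21029e1f52d753a7c68fb17adaa0b19f6b02f1266245186bc487c691743b6086ed5021"
    "03d0fabbd163dd3a6ccf382b5e640622e9075f2676443499195d9b5f3e4c11993b53ae"
)

OP_1, OP_16, OP_CHECKMULTISIG = 0x51, 0x60, 0xAE


def parse_multisig(script_hex):
    """Return (M, public keys in hex, N) for an M-of-N multisig script."""
    script = bytes.fromhex(script_hex)
    if len(script) < 3 or script[-1] != OP_CHECKMULTISIG:
        raise ValueError("not a multisignature script")
    if not (OP_1 <= script[0] <= OP_16 and OP_1 <= script[-2] <= OP_16):
        raise ValueError("unexpected M or N opcode")
    m, n = script[0] - 0x50, script[-2] - 0x50
    keys, pos = [], 1
    while pos < len(script) - 2:
        length = script[pos]  # push opcode: number of key bytes that follow
        if length not in (33, 65):
            raise ValueError("unexpected public key length")
        keys.append(script[pos + 1:pos + 1 + length].hex())
        pos += 1 + length
    if pos != len(script) - 2 or len(keys) != n:
        raise ValueError("malformed script")
    return m, keys, n
```

Applied to REDEEM_SCRIPT_HEX, the parser reports a 2-of-3 script with three 33-byte compressed public keys.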
Here is Python code that generates the P2SH address associated with this script:
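A minimal sketch follows. As before, the function names are our own and hashlib.new("ripemd160") is assumed to be supported by the underlying OpenSSL build. The only change with respect to the P2PKH case is the version prefix of 05 and the fact that the hashed argument is a redeem script.

```python
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

REDEEM_SCRIPT_HEX = (
    "52210232cfef1f9ec45bef08062640963aa8d6b15062c9c9e51c26682369969ba9101a"
    "21029e1f52d753a7c68fb17adaa0b19f6b02f1266245186bc487c691743b6086ed5021"
    "03d0fabbd163dd3a6ccf382b5e640622e9075f2676443499195d9b5f3e4c11993b53ae"
)


def base58check(version, payload):
    """Base58Check-encode payload (bytes) under a one-byte version (bytes)."""
    data = version + payload
    data += hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]
    n_zero = len(data) - len(data.lstrip(b"\x00"))
    value = int.from_bytes(data, "big")
    encoded = ""
    while value > 0:
        value, remainder = divmod(value, 58)
        encoded = BASE58_ALPHABET[remainder] + encoded
    return "1" * n_zero + encoded


def p2sh_address_from_hash(script_hash):
    """Base58Check with version prefix 05 applied to a 20-byte script hash."""
    return base58check(b"\x05", script_hash)


def p2sh_address(script_hex):
    """P2SH address of a redeem script given as a hexadecimal string."""
    sha = hashlib.sha256(bytes.fromhex(script_hex)).digest()
    script_hash = hashlib.new("ripemd160", sha).digest()  # needs OpenSSL support
    return p2sh_address_from_hash(script_hash)
```

Calling p2sh_address(REDEEM_SCRIPT_HEX) then returns the address associated with the script shown earlier, provided RIPEMD-160 is available on the system.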
An exercise similar to the one carried out for WIF-encoded keys reveals that all P2SH
addresses start with a 3.
Below is a chart summarizing the interrelation between private keys, public keys,
P2PKH and P2SH addresses:
References
[1] Bech32. https://en.bitcoin.it/wiki/Bech32.