Module 2
BITCOIN
Symmetric Ciphers
To talk about ciphers we will use examples with our friends Alice and Bob, who are trying to
communicate securely while an attacker tries to eavesdrop on their conversation.
To communicate securely, they share a secret key K. They both know the secret key,
but the attacker doesn't know anything about it. They use a cipher, which is a pair of
algorithms (E: encryption, D: decryption). These algorithms work as follows:
1. the encryption algorithm E takes the message M and the key K as inputs and outputs a
ciphertext, which is the encryption of the message M using the key K. The ciphertext is then
transmitted to Bob somehow. It could be transmitted over the internet or stored in an
encrypted file system, it doesn't matter.
2. when the ciphertext reaches Bob, he plugs it into the decryption algorithm D using the
same key K and gets back the original message M.
The reason we say that these are symmetric ciphers is that both the encrypter and decrypter
actually use the same key K. As we'll see later in the course, there are ciphers where the
encrypter uses one key and the decrypter uses a different key. But now we just focus on
symmetric ciphers, where both sides use the same key.
Substitution Cipher
The simplest historical cipher is the substitution cipher. A key for a substitution cipher
is a substitution table that says how to map each letter. So, in our example:
A goes to C
B goes to W
C goes to N
...so on...
until Z, that goes to A
The encryption of a certain message (e.g. "BCZA") using this key would be done by substituting
one letter at a time. B becomes W, C becomes N, Z becomes A, and A becomes C. So the
encryption of BCZA is WNAC, and this defines the ciphertext. Similarly, we can decrypt the
ciphertext using the same key and of course we'll get back the original message.
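The substitution above can be sketched in a few lines of Python. Only the four table entries A, B, C and Z come from the text; the other 22 entries are a hypothetical completion, added here just so that the table is a full permutation of the alphabet.

```python
import string

ALPHABET = string.ascii_uppercase

# Only these four entries come from the text; the rest of the table is a
# hypothetical completion so that the key is a full permutation.
KNOWN = {"A": "C", "B": "W", "C": "N", "Z": "A"}


def complete_key(known):
    """Fill the unspecified entries with the unused letters, in order."""
    used = set(known.values())
    spare = [c for c in ALPHABET if c not in used]
    key = dict(known)
    for letter in ALPHABET:
        if letter not in key:
            key[letter] = spare.pop(0)
    return key


def encrypt(key, message):
    """Substitute one letter at a time using the table."""
    return "".join(key[ch] for ch in message)


def decrypt(key, ciphertext):
    """Invert the table and substitute back."""
    inverse = {v: k for k, v in key.items()}
    return "".join(inverse[ch] for ch in ciphertext)


KEY = complete_key(KNOWN)
print(encrypt(KEY, "BCZA"))  # WNAC
print(decrypt(KEY, "WNAC"))  # BCZA
```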
Caesar Cipher
There is one example related to the substitution cipher called the Caesar cipher. The Caesar
cipher, actually, is not really a cipher at all. The reason is that it doesn't have a key. What a
Caesar cipher is, is basically a substitution cipher where the substitution is fixed. Namely, it's a
shift by three. So A becomes D, B becomes E, C becomes F and so on and so forth. It's a fixed
substitution that's applied to all plaintext messages.
There's no key, so an attacker who knows how the encryption scheme works can easily decrypt
the message. Nothing about the scheme is secret or random, and therefore decryption is very easy
once you understand how the scheme actually works.
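As a small illustration, the fixed shift by three can be written as follows. This is only a sketch for upper-case messages without spaces; the function names are made up for the example.

```python
import string

ALPHABET = string.ascii_uppercase
SHIFT = 3  # fixed, publicly known shift: there is no secret key


def caesar_encrypt(message):
    return "".join(ALPHABET[(ALPHABET.index(ch) + SHIFT) % 26] for ch in message)


def caesar_decrypt(ciphertext):
    return "".join(ALPHABET[(ALPHABET.index(ch) - SHIFT) % 26] for ch in ciphertext)


print(caesar_encrypt("ABC"))  # DEF
print(caesar_decrypt("DEF"))  # ABC
```

Anyone who knows the scheme can run the decryption function, which is exactly why it offers no security.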
Going back to the general substitution cipher, the first question is: how big is the key space?
How many different keys are there, assuming we have 26 letters? The number of keys is 26
factorial, because a key is essentially a permutation of all 26 letters. 26 factorial is about
2^88. This is a perfectly fine size for a key space. We will see adequately secure ciphers with
key spaces that are roughly of this size. However, the substitution cipher is still terribly
insecure.
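The size of the key space can be checked directly; the snippet below simply computes 26 factorial and its base-2 logarithm.

```python
import math

keys = math.factorial(26)      # number of permutations of 26 letters
bits = math.log2(keys)         # size of the key space in bits

print(keys)
print(round(bits, 1))          # a little over 88
```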
Let's take an encrypted message using the substitution cipher. All I know is that the plaintext is
in English and that the letter E is the most frequent letter in English. It appears 12.7% of the time
in standard English texts. I'm going to count how many times every letter appears. The most
common letter in the ciphertext is going to be the encryption of the letter E with very high
probability. So I'm able to recover one entry in the key table. The next most common letters in
English are T (9.1%) and A (8.1%). So we can likely figure out their mappings as well. The
remaining letters in English appear with roughly similar frequencies, other than some rare
letters like Q and X.
We figured out three entries in the key table, but what do we do next? The next idea is to use
frequencies of pairs of letters ("digrams"). In English, the most common pairs of letters are
things like, HE, AN, IN, TH. So I know that the most common pair of letters in the ciphertext is
likely to be the encryption of one of these four pairs. By trial and error we can figure out more
entries.
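The counting step of this attack is easy to sketch. The helper names below are made up for the example, and the functions only produce the frequency tables; matching them against English statistics is still done by trial and error, as described above.

```python
from collections import Counter


def letter_frequencies(ciphertext):
    """Letters of the ciphertext, most common first."""
    letters = [ch for ch in ciphertext.upper() if ch.isalpha()]
    return Counter(letters).most_common()


def digram_frequencies(ciphertext):
    """Pairs of adjacent letters, most common first.

    Spaces and punctuation are dropped first, so pairs that straddle a
    word boundary are counted too (a simplification).
    """
    letters = [ch for ch in ciphertext.upper() if ch.isalpha()]
    pairs = ["".join(pair) for pair in zip(letters, letters[1:])]
    return Counter(pairs).most_common()


sample = "XXYXZ"
print(letter_frequencies(sample)[0])  # ('X', 3)
```

The most common ciphertext letter is then a good candidate for the encryption of E, and the most common digrams are candidates for HE, AN, IN and TH.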
The substitution cipher is vulnerable to the worst possible type of attack, namely a
ciphertext-only attack: given just the ciphertext, the attacker can recover the decryption
key and therefore the original plaintext.
Vigenère cipher
Let's move from the Roman era to the Renaissance. We'll look at a cipher designed by Vigenère,
who lived in the sixteenth century. In a Vigenère cipher, the key is a word. In this case the
word is "CRYPTO", with six letters in it.
To encrypt a message, you write the message under the key. In this case the message is "WHAT
A NICE DAY TODAY" and then you replicate the key as many times as needed to cover the
message. Then you encrypt by adding the key letters to the message letters, modulo 26. For
example, if you add Y and A, you get Z. If you add T and A, you get U (in this example, adding
A moves a letter forward by one position). You do this for all the letters. And remember,
whenever you add, you add modulo 26. So if you go past Z, you go back to A.
Decryption is just as easy as encryption. You write the ciphertext under the key, replicate
the key as needed, and then subtract the key from the ciphertext to get the original
plaintext message.
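Here is a minimal sketch of the encryption and decryption just described. It assumes the letter numbering implied by the examples in the text (Y plus A gives Z), so a key letter A shifts by one position; many descriptions of the cipher number A as zero instead. Spaces are removed from the message before encrypting.

```python
import string

ALPHABET = string.ascii_uppercase


def shift_of(key_letter):
    # Convention implied by the text's examples (Y + A = Z): A shifts by 1.
    return ALPHABET.index(key_letter) + 1


def vigenere_encrypt(key, message):
    out = []
    for i, ch in enumerate(message):
        k = shift_of(key[i % len(key)])          # key repeated as needed
        out.append(ALPHABET[(ALPHABET.index(ch) + k) % 26])
    return "".join(out)


def vigenere_decrypt(key, ciphertext):
    out = []
    for i, ch in enumerate(ciphertext):
        k = shift_of(key[i % len(key)])
        out.append(ALPHABET[(ALPHABET.index(ch) - k) % 26])
    return "".join(out)


ciphertext = vigenere_encrypt("CRYPTO", "WHATANICEDAYTODAY")
print(ciphertext)                                # ZZZJUCLUDTUNWGCQS
print(vigenere_decrypt("CRYPTO", ciphertext))    # WHATANICEDAYTODAY
```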
Like the substitution cipher, the Vigenère cipher falls to a ciphertext-only attack. The
interesting idea here is addition mod 26. We'll see why later, but it's executed very poorly here.
Rotor Machines
Single rotor machines
Fast-forward now to the late nineteenth and early twentieth centuries, when everything became
electric and people wanted to design ciphers that use electric motors. These ciphers are called
rotor machines because they use rotors. An early example is the Hebern machine, which uses a
single rotor.
The secret key is embedded inside the disc, which rotates by one notch every time you press a
key on the typewriter. What does the disc do? It encodes a substitution table, so the disc
itself is the secret key.
In this case, if you press C as the first letter, output would be the letter T. And then the disc
would rotate by one notch. After rotating, the new substitution table becomes the second one
shown in the image. You see that E, basically, moves up, and then the remaining letters move
down. You can imagine this is basically a two dimensional rendering of the disc rotating by one
notch. Then you press the next letter. And the disc rotates again. You notice again N moved up
and the remaining letters moved down.
In particular, if we hit the letter C three times, the first time the output would be T, then S, and
then K. This is how the single rotor machine works. It was again broken using letter, digram
and trigram frequencies. Given enough ciphertext, it's not that hard to directly recover the
secret key and then the message. Again, a ciphertext-only attack.
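The stepping behaviour can be sketched as a substitution table that rotates by one notch after every key press. In the wiring below, only the first three entries (K, S, T, which follow from C giving T, then S, then K) and the last two (N, E, the letters said to move up) are taken from the description above; the 21 entries in between are a hypothetical filler.

```python
import string

ALPHABET = string.ascii_uppercase

# Entries derived from the text: the table starts K, S, T and ends N, E.
# The middle of the table is a hypothetical filler for illustration.
START, END = "KST", "NE"
MIDDLE = [c for c in ALPHABET if c not in START + END]
WIRING = list(START) + MIDDLE + list(END)


def rotor_encrypt(wiring, message):
    table = list(wiring)
    out = []
    for ch in message:
        out.append(table[ALPHABET.index(ch)])
        table = table[-1:] + table[:-1]   # rotate by one notch
    return "".join(out)


def rotor_decrypt(wiring, ciphertext):
    table = list(wiring)
    out = []
    for ch in ciphertext:
        out.append(ALPHABET[table.index(ch)])
        table = table[-1:] + table[:-1]   # same rotation as the sender
    return "".join(out)


print(rotor_encrypt(WIRING, "CCC"))  # TSK
```

The same plaintext letter now encrypts to different ciphertext letters, which hides single-letter frequencies better than a fixed table, but not well enough.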
Enigma Machine
To resist these frequency attacks, rotor machines became more and more complicated, culminating
in the Enigma. It uses three, four, or five rotors, as there are different versions of the
Enigma machine. Here you see an example of the Enigma machine with four rotors.
The secret key is the initial setting of the rotors. In the case of four rotors there would be
26^4 possible different keys (about 2^18), a relatively small key space. As you type on the
typewriter the rotors rotate at different rates and output the appropriate letters of the
ciphertext.
Today a cipher with a key space of 2^18 can be broken very quickly by brute force. But even
at the time, the British cryptographers at Bletchley Park were able to mount ciphertext-only
attacks on the Enigma machine. They were able to decrypt German ciphers during World War
Two, and that played an important role in many different battles during the war.
Computer Era
After the war, the mechanical age ended and the digital age began, in which people use computers.
The US government realized that industry was starting to buy a lot of digital equipment, and
they wanted industry to use a good cipher. So the government put out a request for proposals
for a data encryption standard.
A hash function can take as input a string of any size and produces a fixed-size output. We
will see hash functions with 256-bit output, since Bitcoin uses one of this kind.
It has to be efficiently computable: it must be possible to compute the output in a reasonable
length of time.
1. collision-free: it is infeasible to find two different inputs x and y such that
H(x) = H(y). This doesn't mean that there are no collisions at all. In fact, we map strings
of any size to 256 bits, so there are only 2^256 possible outputs. But, thanks to the
structure of H, it is really difficult to find two inputs that map to the same hash;
2. pre-image resistance: if we know the hash H(x) it must be really difficult to figure out
what x is;
3. second pre-image resistance: given an output value y (the hash of an element x), it must be
really difficult to find another element x' that hashes to y.
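The first two points (any input size, fixed 256-bit output, fast to compute) can be seen with Python's standard hashlib module. The helper name H is just a shorthand chosen for this example.

```python
import hashlib


def H(data: bytes) -> bytes:
    """SHA-256 of the input, as 32 raw bytes (256 bits)."""
    return hashlib.sha256(data).digest()


for message in (b"", b"abc", b"x" * 1_000_000):
    digest = H(message)
    # Input sizes differ wildly; the output is always 256 bits long.
    print(len(message), len(digest) * 8, digest.hex()[:16])
```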
Now, let's look at these properties and their importance in more detail.
Collision resistance
Methods to find a collision
Let's see how to find a collision of a 256-bit hash function, to understand how difficult it is.
We would have to pick 2^130 randomly chosen inputs to have a 99.8% chance that at least two of
them collide. This works no matter what the hash function is. But it takes a very, very long
time to do: it is necessary to compute the hash function 2^130 times.
We can say that if every computer ever made by humanity had been computing since the beginning
of the entire Universe up to now, the odds that they would have found a collision are still
infinitesimally small. So this method clearly takes too long.
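To get a feeling for this generic method without waiting forever, one can run it against a deliberately tiny hash. The sketch below truncates SHA-256 to 24 bits (an assumption made only for this demonstration) and tries inputs until two of them collide, which typically happens after a few thousand attempts rather than 2^130.

```python
import hashlib


def truncated_hash(data: bytes, bits: int = 24) -> int:
    """First `bits` bits of SHA-256: a deliberately weak, tiny hash."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)


def find_collision(bits: int = 24):
    """Hash distinct inputs until two of them share the same output."""
    seen = {}
    counter = 0
    while True:
        data = str(counter).encode()
        h = truncated_hash(data, bits)
        if h in seen:
            return seen[h], data, counter + 1
        seen[h] = data
        counter += 1


a, b, tries = find_collision()
print(a, b, tries)
```

Each extra output bit makes the search noticeably longer, and at 256 bits the same loop becomes hopeless.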
Now the question is: is there some other method that could be used on a particular hash function
in order to find a collision?
Well, there are many hash functions for which it is really easy to find a collision. For example
the function that computes the input modulo 2^256. It just selects the last 256 bits of the input.
One collision would be the values 3 and 3 + 2^256.
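This particular collision is easy to check, treating inputs as integers:

```python
def weak_hash(x: int) -> int:
    """Keep only the last 256 bits of the input."""
    return x % 2**256


print(weak_hash(3) == weak_hash(3 + 2**256))  # True
```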
One thing to note is that there's no hash function which has been proven to be collision free.
There are just some that people have tried really, really hard to find collisions and haven't
succeeded. And so we choose to believe that those are collision free.
We know that if x and y have the same hash, then it's safe to assume that they are the same,
otherwise that would be a collision.
Suppose that we had a really big file and we wanted to be able to recognize it later when shown
another file. One way to do that would be to save the whole big file and then compare it
with the new file.
But, since we have hashes that we believe are collision free, it's more efficient to just remember
the hash of the original file. Then if someone shows us a new file claiming that it's the same, we
can compute the hash of that new file and compare the hashes. If the hashes are the same, then
we conclude that the files must be the same. And that gives us a very efficient way to remember
things we've seen before.
Of course, this is useful because the hash is only 256 bits, while the original file might be
really big. So a hash is useful as a message digest. We'll see later on why it's important to
use a hash as a message digest.
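A rough sketch of this use of hashes follows. In-memory byte streams stand in for the files, and the function name is made up for the example; with real files one would pass a file opened in binary mode.

```python
import hashlib
import io


def stream_digest(stream, chunk_size=65536):
    """SHA-256 of a binary stream, read in chunks so it never sits fully in memory."""
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.digest()


original = io.BytesIO(b"a really big file " * 100_000)
remembered = stream_digest(original)          # we keep only these 32 bytes

same = io.BytesIO(b"a really big file " * 100_000)
other = io.BytesIO(b"a really big file " * 99_999 + b"a really big file!")

print(stream_digest(same) == remembered)      # True
print(stream_digest(other) == remembered)     # False
```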
Now we ask someone who didn't see the coin flip, but only the hash output, to figure out what
the hashed string was. That, of course, is really easy: in two steps he can compute the hashes
of the two possible strings and check which one matches.
The adversary was able to guess because there were only a couple of possible input values. So it
needs to be the case that no value of x is particularly likely. The way we're going to fix this
problem with common values of x, like heads and tails, is to take x and concatenate it with a
value r, chosen from a random distribution. If H(r | x) is published instead of H(x), it's
infeasible to find x.
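Both situations can be sketched as follows. The strings "heads" and "tails" and the 32-byte length of r are assumptions made for the example.

```python
import hashlib
import secrets


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def guess_coin(hash_output: bytes):
    """The adversary hashes both possible inputs and looks for a match."""
    for candidate in (b"heads", b"tails"):
        if H(candidate) == hash_output:
            return candidate
    return None


flip = b"heads"
print(guess_coin(H(flip)))        # b'heads': the plain hash hides nothing

r = secrets.token_bytes(32)       # random value the adversary does not know
print(guess_coin(H(r + flip)))    # None: the same guessing strategy now fails
```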
Application of pre-image resistance
We want to do something called a commitment. This is the digital analogue of taking a value, a
number, sealing it in an envelope and putting that envelope out on the table, where everyone can
see it. Doing that, you've committed to what's in the envelope and, as long as it stays closed,
it's secret from everyone else. Later, you can open the envelope and reveal the value that was
sealed inside. So you commit to a value and reveal it later. We want to do that in a digital
sense.
1. commit to a message: returns two values, a commitment and a key. The commitment plays the
role of the envelope put on the table; the key is necessary to open it
2. later you reveal the original message to someone and let him verify that it corresponds to
what you committed to, providing him the commitment and the key
With these two operations we want to ensure that:
1. given the commitment (the envelope on the table), someone just looking at it can't figure
out what the message is (pre-image resistance)
2. when you commit to what's in the envelope, you can't change your mind later (collision
resistance)
How to commit a message
In order to commit to a message, we're going to generate a random 256-bit value and call
it the key.
Then we're going to return the hash of the key concatenated with the message, as the
commitment.
When we give someone the key and the message, he can compute the same hash of the key
concatenated with the message and check that it matches the commitment.
So the hash function is used both in the commitment and in the verification. And we use the
random key to improve the security, as explained in the first paragraph regarding pre-image
resistance.
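The two operations can be sketched directly from this description, using SHA-256 as the hash function. The function names commit and verify are chosen for the example.

```python
import hashlib
import secrets


def commit(message: bytes):
    """Return (commitment, key): the sealed envelope and what opens it."""
    key = secrets.token_bytes(32)                        # random 256-bit key
    commitment = hashlib.sha256(key + message).digest()  # H(key | message)
    return commitment, key


def verify(commitment: bytes, key: bytes, message: bytes) -> bool:
    """Recompute H(key | message) and compare it with the commitment."""
    return hashlib.sha256(key + message).digest() == commitment


commitment, key = commit(b"heads")
# The commitment can be published right away; key and message come later.
print(verify(commitment, key, b"heads"))   # True
print(verify(commitment, key, b"tails"))   # False
```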
The idea is that we're given a puzzle ID, chosen from a very spread-out probability distribution,
and a target set Y, which someone wants to make the hash function fall into. We want to find a
solution x such that H(ID | x) ∈ Y. So, Y is a target range or a set of hash results that we
want to match, ID specifies a particular puzzle, and x is a solution to the puzzle. The
puzzle-friendly property implies that there's no solving strategy for this puzzle that is better
than just trying random values of x.
So if we want to define a puzzle that's difficult to solve, we can do it using a hash function,
as long as we can generate puzzle IDs in a suitably random way. We're going to use this puzzle
later, when we talk about Bitcoin mining. That's the sort of computational puzzle we're going
to use.
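A small sketch of such a puzzle, under the assumption that the target set Y is "all hash values below a given number". With the target used here, about one hash in 65,536 falls into Y, so the search finishes quickly; a smaller target makes the puzzle harder.

```python
import hashlib
import secrets


def solve_puzzle(puzzle_id: bytes, target: int):
    """Try values of x until H(ID | x), read as an integer, falls below target."""
    x = 0
    while True:
        candidate = x.to_bytes(8, "big")
        h = int.from_bytes(hashlib.sha256(puzzle_id + candidate).digest(), "big")
        if h < target:
            return candidate, x + 1
        # For a puzzle-friendly hash, counting upwards is as good as random guessing.
        x += 1


puzzle_id = secrets.token_bytes(32)   # puzzle ID from a spread-out distribution
target = 2 ** (256 - 16)              # Y = hashes whose first 16 bits are zero
solution, tries = solve_puzzle(puzzle_id, target)
print(solution.hex(), tries)
```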
SHA-256
We've talked about three properties of hash functions and one application of each of those. Now
let's talk very briefly about the particular hash function we're going to use. There are lots of
hash functions used for cryptographic purposes, but Bitcoin uses SHA-256, which is a pretty good
one. Basically, it works like this:
1. it takes the message you're hashing, and it breaks it up into blocks that are 512 bits in size.
The message size, in general, isn't necessarily a multiple of the block size. To make it a
multiple of the block size, we use some kind of padding (e.g. a 1 followed by a certain number
of 0s)
2. you start with a 256-bit value called the IV, specified in the standards document, and the
first block. This 768-bit string goes through a special function c (the compression function)
that outputs a 256-bit string
3. then the compression function is applied to the concatenation of the first output and the
second block
4. the process is repeated until the end of the blocks; the hash is the final 256-bit output
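The four steps can be sketched as a loop. This is a toy with the same shape, not SHA-256 itself: the compression function is a stand-in built from hashlib, the IV is a placeholder rather than the value from the standard, and the padding is the simplified one mentioned above (the real standard also encodes the message length in the padding).

```python
import hashlib

BLOCK_SIZE = 64              # 512 bits
STATE_SIZE = 32              # 256 bits
TOY_IV = bytes(STATE_SIZE)   # placeholder IV, not the one from the standard


def pad(message: bytes) -> bytes:
    """Simplified padding: a 1 bit, then 0 bits up to a multiple of the block size."""
    padded = message + b"\x80"
    return padded + b"\x00" * (-len(padded) % BLOCK_SIZE)


def toy_compress(state: bytes, block: bytes) -> bytes:
    """Stand-in for the compression function c: 768 bits in, 256 bits out."""
    assert len(state) + len(block) == STATE_SIZE + BLOCK_SIZE
    return hashlib.sha256(state + block).digest()


def toy_hash(message: bytes) -> bytes:
    state = TOY_IV
    padded = pad(message)
    for i in range(0, len(padded), BLOCK_SIZE):
        # Feed the previous output and the next block into c.
        state = toy_compress(state, padded[i:i + BLOCK_SIZE])
    return state             # the final 256-bit output is the hash


print(toy_hash(b"abc").hex())
```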
So a regular pointer gives you a way to retrieve the information. A hash pointer lets us get
the information back and also verify that it hasn't changed. So a hash pointer tells us where
something is and what its value was: in addition to the location, it stores the hash of the
value that this data had when we last saw it.
Blockchain
For example, here is a linked list that we built with hash pointers. This data structure is
called a blockchain. It is just like a regular linked list, where you have a series of blocks
containing data and a pointer to the previous block in the list. Here the previous-block pointer
is replaced with a hash pointer. So it says where the previous block is and what the value of
that entire block was. We're going to store the head of the list as a regular hash pointer.
Tamper evidence
A use case for this is a tamper-evident log: we want to build a log data structure that stores a
bunch of data, so that we can add data at the end of the log and detect if somebody tries to
mess with data already present in some block of the log. That's what tamper evidence means.
Why does a blockchain give us this tamper-evident property? Let's see what happens if an
adversary wants to go back and tamper with data that's in the middle of the chain, and he wants
to do it in such a way that we, the holders of the hash pointer at the head, won't be able to
detect it.
Tamper trial
The adversary changes the contents of the block with the lightning symbol. Therefore, the hash
of this entire block changes, since the hash function is collision free. So we could detect the
inconsistency between this data and the hash pointer in the following block, unless the adversary
also tampers with that hash pointer.
If he tampers with the hash pointer then these two match up. But now the content of the following
block has changed. That means that its hash is not going to match the hash pointer stored in the
block after it. We're going to detect the inconsistency between the contents of this block and
the hash, unless the adversary also tampers with the last block.
But now, the hash of this block doesn't match the hash that we hold. The adversary can't tamper
with that, because this is the value we remembered as being the head of the list.
Conclusion
So the upshot of this is that, if the adversary wants to tamper with data anywhere in this entire
chain, in order to keep the story consistent he's going to have to tamper with hash pointers all
the way back to the beginning. And he's ultimately going to run into a road block, because he
won't be able to tamper with the head of the list.
So we can build a blockchain like this containing as many blocks as we want, going back to
some special block at the beginning of the list which we might call the genesis block. And that's
a tamper-evident log built out of the blockchain.
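The argument can be replayed on a small chain. This is a sketch under a few assumptions: blocks are plain dictionaries, a block's hash covers its data together with its stored previous-block hash, and the genesis block points to a placeholder value of 32 zero bytes.

```python
import hashlib

GENESIS_PREV = bytes(32)   # placeholder "no previous block" value (assumption)


def block_hash(data: bytes, prev_hash: bytes) -> bytes:
    """Hash of an entire block: its hash pointer plus its data."""
    return hashlib.sha256(prev_hash + data).digest()


def build_chain(items):
    chain, prev = [], GENESIS_PREV
    for data in items:
        chain.append({"data": data, "prev_hash": prev})
        prev = block_hash(data, prev)
    return chain, prev          # prev is the head hash that we remember


def verify_chain(chain, head_hash):
    prev = GENESIS_PREV
    for block in chain:
        if block["prev_hash"] != prev:
            return False        # a hash pointer does not match the block before it
        prev = block_hash(block["data"], block["prev_hash"])
    return prev == head_hash    # the last block must match the remembered head


chain, head = build_chain([b"entry 1", b"entry 2", b"entry 3"])
print(verify_chain(chain, head))     # True

chain[1]["data"] = b"forged entry"   # tamper in the middle of the chain
print(verify_chain(chain, head))     # False

# The adversary repairs the next block's hash pointer as well...
chain[2]["prev_hash"] = block_hash(chain[1]["data"], chain[1]["prev_hash"])
print(verify_chain(chain, head))     # still False: the head we hold differs
```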
Merkle tree
Another useful data structure we can build using hash pointers is a binary tree. We can build a
binary tree with hash pointers called Merkle tree, after Ralph Merkle who invented it.
Suppose we have a bunch of data blocks (e.g. the ones represented on the bottom line of the
figure). We're going to take consecutive pairs of these data blocks, and for each pair we're
going to build a data structure that has two hash pointers, one to each of the two blocks. We
do the same with all the other data blocks.
Tamper evidence
Then we move a level up and build new blocks, each containing hash pointers to its two children.
And so on, all the way back up to the root of the tree. At the end, just like before, we're
going to remember just the hash pointer at the head of the tree.
Suppose that we want to go down through the hash pointers to any point in the tree and make sure
that the data hasn't been tampered with. As we just showed with the blockchain, if an adversary
tampers with some data block at the bottom of the tree, the hash pointer one level up won't match.
So he'll have to tamper with the hash pointers on all the upper levels, up to the root.
To prove that a data block belongs to the tree, it is enough to check the hashes along the path
from the bottom of the tree to the root. That takes about log2(n) items that we need to verify,
and about O(log n) time for us to verify them. And so, even if there's a large number of data
blocks in the Merkle tree, we can verify a proof of membership in a relatively short time.
Conclusions
So Merkle trees have various advantages:
1. the tree holds many items but we just need to remember the one root hash (only 256 bits)
2. we can verify membership in a Merkle tree in logarithmic time and logarithmic space
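A compact sketch of a Merkle tree with membership proofs follows. Two details are assumptions of the example rather than something stated above: a parent is the hash of its two children's hashes concatenated, and a level with an odd number of nodes duplicates its last hash.

```python
import hashlib


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(blocks):
    """Return all levels of hashes, from the leaves up to the root."""
    level = [H(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                 # odd count: duplicate the last hash
            level = level + [level[-1]]
            levels[-1] = level
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def membership_proof(levels, index):
    """Sibling hashes on the path from leaf `index` to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify_membership(root, block, proof):
    h = H(block)
    for sibling, sibling_is_left in proof:
        h = H(sibling + h) if sibling_is_left else H(h + sibling)
    return h == root


blocks = [f"block {i}".encode() for i in range(8)]
levels = build_levels(blocks)
root = levels[-1][0]                        # the only value we must remember
proof = membership_proof(levels, 5)
print(len(proof))                                    # 3 hashes for 8 blocks
print(verify_membership(root, blocks[5], proof))     # True
print(verify_membership(root, b"forged", proof))     # False
```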
Sorted Merkle tree to prove non-membership
A variant of the Merkle tree is the sorted Merkle tree. In this kind of tree, we take the data
blocks at the bottom and we sort them into some order: alphabetical, lexicographic, numeric, or
some other order that we agree on. Once we've sorted the Merkle tree, we can prove that a
particular block is not in the Merkle tree.
We can do that simply by showing a path to the item that's just before where that item would be
and a path to the item just after where it would be. This proves that both of these items are in
the Merkle tree and that they are consecutive. Therefore there is no space in between them for
the element we are searching for. So even non-membership proofs in a sorted Merkle tree are very
efficient.
For example, a directed acyclic graph can be constructed using hash pointers and we'll be able to
verify membership very efficiently. This is a general trick we will see over and over throughout
the distributed data structures and the algorithms that we will talk about during the course.
Digital Signatures
Digital Signature Definition
A digital signature is supposed to be just like a signature on paper, only in digital form. And
there are mainly two things that we want from signatures:
1. only the owner can make the signature, but everyone seeing it can verify its validity
2. the signature must be tied to a particular document, so that nobody can copy it and use it
to sign another document
In fact, a signature is not just a signature, but certifies your agreement or endorsement of a
particular document.
Signature scheme
How can we build this in a digital form using cryptography?
There are three operations that we need to be able to do:
1. Generate a pair of keys: sk = secret signing key, and pk = public verification key, whose
length in bits is given by keysize. So we will need an operation
(sk, pk) := generateKeys(keysize). sk will be the key used to make signatures and pk will be
the key that lets other parties verify them.
2. Sign documents: takes the secret key sk and a message and returns the signature sig. So we
will need an operation sig := sign(sk, message).
3. Signature verification: takes the public key pk, the message and the supposed signature,
and returns yes or no, depending on whether the signature is valid or not.
These three operations constitute a signature scheme. The first two can be randomized
algorithms, while the verification will always be deterministic. In fact, generateKeys had
better be randomized, because it must generate different keys for different people.
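To make the three operations concrete, here is a toy sketch built on textbook RSA with tiny primes. This is a swapped-in technique chosen because it fits in a few lines of standard-library Python; it is not the scheme Bitcoin uses, and with keys this small it is completely insecure. The function names mirror generateKeys, sign and verify from the text.

```python
import hashlib
import math
import secrets


def is_prime(n: int) -> bool:
    """Trial division; fine for the tiny primes used in this toy."""
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))


def random_prime(bits: int) -> int:
    while True:
        candidate = secrets.randbits(bits) | (1 << (bits - 1)) | 1
        if is_prime(candidate):
            return candidate


def generate_keys(keysize: int = 32):
    """Randomized: returns (sk, pk) for a toy RSA modulus of about keysize bits."""
    e = 65537
    while True:
        p, q = random_prime(keysize // 2), random_prime(keysize // 2)
        phi = (p - 1) * (q - 1)
        if p != q and math.gcd(e, phi) == 1:
            n = p * q
            return (n, pow(e, -1, phi)), (n, e)


def digest_mod(message: bytes, n: int) -> int:
    """Hash of the message, reduced so that it fits under the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(sk, message: bytes) -> int:
    n, d = sk
    return pow(digest_mod(message, n), d, n)


def verify(pk, message: bytes, sig: int) -> bool:
    """Deterministic: the same inputs always give the same answer."""
    n, e = pk
    return pow(sig, e, n) == digest_mod(message, n)


sk, pk = generate_keys()
sig = sign(sk, b"pay 5 coins to Bob")
print(verify(pk, b"pay 5 coins to Bob", sig))    # True
```

Only the holder of sk can produce sig, while anyone holding pk can run verify.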
Signature requirements
The requirements for the signatures, at a slightly more technical level, are the following two:
1. A valid signature must always verify. So if I sign a message with my secret key sk and
someone tries to check it with my public key pk, it has to be validated correctly.
2. It must be impossible to forge signatures. An adversary who knows my public key and
sees my signatures on some messages can't forge my signature on some other message.
Signatures Unforgeability
Suppose that an attacker claims that he can forge signatures and a judge wants to verify if it's
true or not.
We use generateKeys to obtain a secret signing key and a public verification key. We give the
secret key to the judge and the public key to both parties. So the judge can make signatures,
while the attacker knows only the public key and can check whether a signature is valid or not.
We will allow the attacker to see signatures on documents of his choice.
1. the attacker sends a message m0, the judge signs it and sends it back
2. then he sends another message m1, the judge signs it and sends it back
3. the previous steps can be repeated over and over, until the attacker is satisfied
4. after that, the attacker picks a new message m, one the judge has not signed, and tries to
forge a signature on it
5. the judge runs the verification algorithm to check whether it verifies or not
We will base our unforgeability definition on the chances of the attacker winning this game.
Def. The signature scheme is unforgeable if the attacker has only a negligible chance of
successfully forging a signature, no matter what algorithm the attacker is using.
Practical considerations
Good randomness
The algorithms that we talked about are randomized, so we need a good source of randomness.
In fact, bad randomness will lead to an insecure algorithm. Attacks on the source of randomness
are a favourite of intelligence agencies, the people who know what kinds of attacks are
likely to be successful.
This problem (a limit on the size of the message that can be signed) is easily fixed by signing
the hash of the message rather than the message itself. So the message can be really big, but
the hash will only be 256 bits. And because hash functions are collision free, it's safe to use
the hash of the message as the input to the digital signature scheme.
A nice thing we will see later, is that it is also possible to sign a hash pointer. And if you sign a
hash pointer then the signature covers or protects the whole structure, not just the hash pointer
itself, but also everything it points to. For example, if you sign the hash pointer at the end of a
blockchain, the result is that you are digitally signing the entire contents of that blockchain.
ECDSA
Bitcoin uses a particular digital signature scheme called ECDSA, the Elliptic Curve Digital
Signature Algorithm, which is a US government standard.
We won't go into all the details of how ECDSA works, since it relies on some extremely hairy
math.
ECDSA needs good randomness, and this is very important with elliptic curves. If you use bad
randomness when generating a key, the key is probably not secure. In addition, with ECDSA,
even if the key is perfectly secure and the bad randomness only affects signature generation,
this can also leak the private key.
The important thing is that there's no central point of control: no one is in charge of it. The
system operates in an entirely decentralized way.
Bitcoin Address
This is the way the Bitcoin system manages identities, which in this case are called
addresses. A Bitcoin address, in fact, is just a public key or the hash of a public key. It's an
identity that someone made up, out of thin air, as part of this decentralized identity
management scheme.
Now the obvious question that arises when you're talking about decentralized identity
management is: how private is this? On the one hand, the addresses made up this way are not
connected to your real-world identity. You can execute a randomized algorithm and it will make
some kind of pk that looks random. Nothing exists initially to connect that to who you are.
The bad news is that the identity makes a series of statements and performs a series of
operations over time. People can see that the same identity is behind a certain series of
actions, and they can start to connect the dots. An observer can link these things together
over time and make inferences.
So, at the beginning there's no tie to a real-world identity, but on the other hand a pattern of
behavior of an address emerges over time.
A simplified Cryptocurrency
Now that we start to talk about cryptocurrencies, you'll see how these pieces fit together and
why cryptographic operations such as hash functions and digital signatures are really useful. In
this section we will talk about simplified cryptocurrencies that give us ideas about how systems
like Bitcoin work. It's going to require about ten more lectures in order to really explain how
Bitcoin works and what that means. But let's start the discussion with the simple GoofyCoin, the
simplest cryptocurrency we can imagine.
Goofy Coin
There are just a couple of rules in GoofyCoin:
1. Goofy can create new coins whenever he wants. The new coins, at the beginning, belong to
him. So, there’s a CreateCoin operation, a uniqueCoinID and a Digital Signature that Goofy
So Goofy can create new coins with a simple statement that he's making a new coin with a
unique coin ID. And then whoever owns a coin can pass it on to someone else by signing a
statement, saying pass on this coin to person X. It is possible to verify the validity of a coin by
simply following the chain and verifying all the signatures along the way.
Double-spending attacks are one of the key problems that a Cryptocurrency has to solve.
GoofyCoin does not solve the double spending attack and therefore it is not secure. In order to
build a usable Cryptocurrency, we need to have some solution to the double-spending problem.
Scrooge Coin
So let's design ScroogeCoin, that is going to be like GoofyCoin, except it will solve the double-
spending problem in a particular way. The key idea is that Scrooge is going to publish the history
of all the transactions in a blockchain, that will be digitally signed by Scrooge. So anyone can
check the data blocks. Each block will have one transaction in it and a hash pointer to the
previous block in the history.
Then Scrooge will take the hash pointer, which represents this entire structure, digitally sign it
and publish it. Now anybody can verify that Scrooge really signed this hash pointer. And then
they can follow this chain all the way back and see the entire history of all the transactions of
ScroogeCoin, since the beginning.
Transactions types
In ScroogeCoin there are two kinds of transactions:
coins.
Check transactions validity
This transaction is valid if four conditions are met:
1. the consumed coins are valid: they were really created in previous transactions
2. double-spending check: the consumed coins have not already been spent in previous
transactions
3. the total value of the newly created coins is equal to the total value of the destroyed coins
4. the digital signatures of the owners of all the consumed coins are valid
In this case Scrooge will accept it and insert this block into the blockchain, so everyone can
check it. In this scheme the coins are immutable: they are just created with a specific owner
and then destroyed. But every operation is still possible. For example, to subdivide a coin, it
is enough to create a new transaction that consumes the coin and produces two new coins of the
same total value.
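The four checks can be sketched as a single function. Everything about the data layout is an assumption of this example: coins are small records with an id, a value and an owner, Scrooge keeps a dictionary of unspent coins, and signature checking is represented by the set of owners whose signatures on the transaction have already been verified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coin:
    coin_id: str
    value: int
    owner: str


def is_valid_payment(consumed, created, unspent, signers):
    """consumed/created: lists of Coin; unspent: dict coin_id -> Coin;
    signers: owners whose signatures on this transaction verified."""
    ids = [c.coin_id for c in consumed]
    # Conditions 1 and 2: every consumed coin exists, is listed once and is
    # still unspent. Both checks collapse into a lookup in the unspent table.
    if len(set(ids)) != len(ids):
        return False
    if any(unspent.get(c.coin_id) != c for c in consumed):
        return False
    # Condition 3: value created equals value destroyed.
    if sum(c.value for c in consumed) != sum(c.value for c in created):
        return False
    # Condition 4: the owner of every consumed coin has signed.
    return all(c.owner in signers for c in consumed)


coin = Coin("c1", 5, "alice")
unspent = {"c1": coin}
split = [Coin("c2", 3, "bob"), Coin("c3", 2, "alice")]   # subdividing a coin
print(is_valid_payment([coin], split, unspent, {"alice"}))   # True
print(is_valid_payment([coin], split, unspent, {"bob"}))     # False
```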
Centralization Problem
In this case the problem is Scrooge himself. Scrooge says: don't worry, I'm honest. But if
Scrooge starts misbehaving, then we're going to have a problem. Or if Scrooge just gets bored of
the whole ScroogeCoin scheme and stops doing the things that he is supposed to do, then the
system won't operate anymore.
The problem we have here is centralization: although Scrooge is happy with the system, the
users of it might not be. In order to improve on ScroogeCoin it is necessary to decentralize the
system.
In order to do that, we need to figure out how to provide the same services as Scrooge, but in a
decentralized way, in which no particular party is the only trusted one. We need to figure out:
1. how everyone can agree upon a single published blockchain, on which transactions are valid
and which transactions actually occurred
2. how we can assign IDs to things in a decentralized way
Bitcoin transactions
At the moment, to simplify the model, we will suppose that single transactions are added to
blocks, and not sets of transactions.
The amount of Bitcoin in Alice's and Bob's accounts will continue to change as they
transfer money to and receive money from other users. In this kind of system, in which
transactions are inserted one by one in the blockchain, it is still necessary to know the amount
of money available in each account to check the validity of a transaction. Without extra
bookkeeping it would be necessary to look at all the transactions since the beginning to know a
balance. That's very cumbersome, so this model would need an additional structure to remember
each user's current balance.
In this model the transactions explicitly specify a number of inputs and a number of outputs (two
arrays). In addition, each transaction has a unique identifier. So, let's see how it would work with
this model:
1. the first transaction has no input, because it corresponds to new currency being created. It
has an output of 25 Bitcoins going to Alice. It's a creation transaction, so no signature is
needed. It's identified uniquely by Id 1.
2. then Alice wants to send some money to Bob. She refers explicitly to transaction 1,
where these coins come from. So the input of this transaction will be output index 0 of
the first transaction. The transaction has two outputs: 17 Bitcoin going to Bob and 8
Bitcoin going back to Alice. The idea is to fully consume the output of the previous
transaction and send the excess back to herself; the address that receives the excess is
called a change address.
3. then Bob makes transaction 3, paying Carol and himself, referring to output index 0 of
transaction 2. And Alice makes transaction 4 using the money from transaction 2, output
index 1.
Conceptually one could think that maybe this isn't much different than just maintaining a
separate data structure which tracks account values for each person. The nice thing is that now
this data structure is embedded within the blockchain itself.
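The three steps above can be sketched in Python as follows. This is only an illustration: the
names Output, Tx and apply_tx are hypothetical, signatures are left out entirely, and the map of
unspent outputs is an assumption used here to make the validity checks easy to read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Output:
    value: int
    owner: str


@dataclass(frozen=True)
class Tx:
    tx_id: int
    inputs: tuple   # (previous tx_id, output index) pairs
    outputs: tuple  # Output objects


def apply_tx(unspent, tx):
    """Check tx against the unspent outputs, then update them in place."""
    if tx.inputs:  # a creation transaction has no inputs, so nothing to check
        if len(set(tx.inputs)) != len(tx.inputs):
            raise ValueError("the same output is referenced twice")
        for ref in tx.inputs:
            if ref not in unspent:
                raise ValueError(f"output {ref} is missing or already spent")
        total_in = sum(unspent[ref].value for ref in tx.inputs)
        total_out = sum(out.value for out in tx.outputs)
        if total_out > total_in:
            raise ValueError("outputs exceed inputs")
        for ref in tx.inputs:
            del unspent[ref]  # the referenced outputs are fully consumed
    for index, out in enumerate(tx.outputs):
        unspent[(tx.tx_id, index)] = out


unspent = {}
# Transaction 1: creation of 25 Bitcoin for Alice.
apply_tx(unspent, Tx(1, (), (Output(25, "Alice"),)))
# Transaction 2: Alice pays 17 to Bob and sends 8 back to herself as change.
apply_tx(unspent, Tx(2, ((1, 0),), (Output(17, "Bob"), Output(8, "Alice"))))
```

After these two calls only the two outputs of transaction 2 remain spendable, which is exactly
the information an account balance would otherwise have to track.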
With this model it is possible to merge values. Suppose that there are two different transactions
that send money to Bob. Bob might create a new transaction that has two inputs and only one
output, so they can be spent together. Similarly, it is possible to do joint payments very easily.
For example, suppose that Carol and Bob both want to pay David. They can make a transaction
with two inputs that are actually owned by two different people and combine the value to pay
David. The only extra thing is that two separate signatures are required, one by Carol and one by
Bob.
1. metadata: some housekeeping information. For example, the size of the transaction, which
is its dimension in bytes and not the amount of money. Then there's the number of inputs and
outputs. Then there's a hash of the entire transaction, which is its unique ID and lets us use
hash pointers. There's also a lock_time parameter, which we will see in detail later.
2. inputs: an array of inputs that all have the same form. Every input refers to a specific
previous transaction, or more precisely to its hash and to the index of the output of that
transaction being spent. Then there's the signature, which is necessary to prove ownership of
the previous transaction output.
3. outputs: an array of outputs that all have the same form. The sum of all outputs has to be
less than or equal to the sum of all inputs. Each output must go to a specific public key. Each
output appears as a script that we will see in detail later.
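A rough Python sketch of the three parts is given below. The field names and the placeholder
strings are hypothetical, and JSON is used as a stand-in for the real binary serialization; the
only point is that the ID is a hash computed over the transaction contents.

```python
import hashlib
import json


def transaction_id(tx: dict) -> str:
    """Double SHA-256 over a serialization of the transaction contents."""
    # JSON is an assumption of this sketch, not the real wire format.
    body = json.dumps(
        {"lock_time": tx["lock_time"], "inputs": tx["inputs"], "outputs": tx["outputs"]},
        sort_keys=True,
    ).encode()
    return hashlib.sha256(hashlib.sha256(body).digest()).hexdigest()


tx = {
    "lock_time": 0,
    "inputs": [
        {"prev_tx": "<hash of previous transaction>", "index": 0, "signature": "<signature>"}
    ],
    "outputs": [
        {"value": 17, "script": "<script>"},
        {"value": 8, "script": "<script>"},
    ],
}
# Metadata derived from the contents.
tx["num_inputs"] = len(tx["inputs"])
tx["num_outputs"] = len(tx["outputs"])
tx["hash"] = transaction_id(tx)
```

Because the ID depends on every input and output, changing any of them produces a different
hash, which is what makes hash pointers to a transaction meaningful.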
Common OP instructions
The most common OP instructions are the following:
The first two instructions in this script are simply data instructions:
OP_EQUALVERIFY: checks whether the two hashes on the top of the stack are equal. If so, they
are removed. If not, an error is returned and the transaction is not valid.
OP_CHECKSIG: now we have the public key and the signature left on the stack. This last
operation checks whether the signature is valid for that public key. It returns true if the
signature is valid, false otherwise.
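The way these instructions work on the stack can be sketched with a tiny interpreter. Several
things here are assumptions of the sketch: the script is a Python list instead of bytes, OP_DUP
and OP_HASH160 are included so that the example can run end to end, plain SHA-256 stands in for
the real hash, and check_signature is a hypothetical callback replacing real signature
verification.

```python
import hashlib


def hash_stand_in(data: bytes) -> bytes:
    # Assumption: the real OP_HASH160 applies RIPEMD-160 to a SHA-256 digest.
    return hashlib.sha256(data).digest()


def run_script(script, check_signature):
    """Run a list of data values and instruction names; return True on success."""
    stack = []
    try:
        for item in script:
            if isinstance(item, bytes):      # data instruction: push it
                stack.append(item)
            elif item == "OP_DUP":
                stack.append(stack[-1])
            elif item == "OP_HASH160":
                stack.append(hash_stand_in(stack.pop()))
            elif item == "OP_EQUALVERIFY":
                if stack.pop() != stack.pop():
                    return False             # error: the transaction is invalid
            elif item == "OP_CHECKSIG":
                public_key = stack.pop()
                signature = stack.pop()
                stack.append(check_signature(public_key, signature))
            else:
                return False                 # unknown instruction
    except IndexError:
        return False                         # not enough values on the stack
    return bool(stack) and stack[-1] is True


def toy_check(public_key, signature):
    # Toy signature check, for illustration only.
    return signature == b"signed-by:" + public_key


script = [
    b"signed-by:alice-key", b"alice-key",            # supplied by the spender
    "OP_DUP", "OP_HASH160", hash_stand_in(b"alice-key"),
    "OP_EQUALVERIFY", "OP_CHECKSIG",                 # specified by the sender
]
valid = run_script(script, toy_check)
```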
Script Properties
The main properties of Bitcoin language and scripts are the following:
every Bitcoin script can only produce two outcomes. It can either execute successfully or
return an error. During transaction validation, if there's any error while the script is
executing, the whole transaction is invalid and shouldn't be accepted into the
blockchain
the Bitcoin scripting language is very small: there's only room for 256 instructions, since each
one is represented by one byte. 15 of them are currently disabled and 75 are reserved, so they
don't have any meaning yet and could be assigned later on.
the Bitcoin scripting language includes instructions for basic arithmetic, basic logic,
throwing errors, and cryptography, such as hash functions and signature verification.
There's a small bug in the multi-signature (MULTISIG) instruction: it pops an extra data value
off the stack and ignores it. So, when writing scripts, it is necessary to deal with it by
pushing an extra dummy value onto the stack. It is now considered a feature of the Bitcoin
language, because it has been there from the beginning and the cost of removing it is much
higher than the damage it causes.
Proof-of-burn
Proof-of-burn is a script that can never be redeemed. If you have a proof-of-burn, it's
provable that those coins have been destroyed: there's no possible way for them to be spent. To
implement a proof-of-burn it's necessary to insert an OP_RETURN instruction, which throws an
error as soon as it is reached, no matter what instructions preceded it. The data that comes
after OP_RETURN is ignored, so this is an opportunity to specify arbitrary data in a script that
will remain in the blockchain.
insert arbitrary data into the blockchain, for example to timestamp a document and prove that
you knew some data at a specific time. In this case it is possible to create a very low value
Bitcoin transaction that's a proof-of-burn. So you can destroy a very small amount of
currency and, in exchange, write something into the blockchain, which should be kept
forever.
some alternative coin systems can promote their new currencies by forcing people to destroy
Bitcoin in order to gain coins in the new system. We will see more about this in future
lectures.
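As an illustration of the timestamping use, the sketch below builds the bytes of a script made
of OP_RETURN followed by a single data push. The function name burn_script is hypothetical, and
committing to the SHA-256 digest of the document (instead of the document itself) is a choice
made for this example to keep the data small.

```python
import hashlib

OP_RETURN = 0x6A  # opcode byte of OP_RETURN


def burn_script(data: bytes) -> bytes:
    """Return the bytes of an unspendable script: OP_RETURN, then one data push."""
    if not 1 <= len(data) <= 75:
        # A single length byte can only announce pushes of 1 to 75 bytes.
        raise ValueError("this sketch only handles 1 to 75 bytes of data")
    return bytes([OP_RETURN, len(data)]) + data


document = b"contents of the document to timestamp"
commitment = hashlib.sha256(document).digest()  # 32 bytes
script = burn_script(commitment)
```

Anyone who later receives the document can hash it again and compare the result with the data
stored after OP_RETURN in the published transaction.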
Pay-to-script-hash
To use Bitcoin, a sender must specify a script exactly. An ordinary consumer wouldn't be able
to specify it if, for example, he is ordering something online and a MULTISIG script is
required. As a consumer, he just wants to send the money using a simple address. In response to
that problem, there's a feature in Bitcoin that lets the sender specify just a hash of the
script that is needed to redeem the coins. The script acts as follows:
The sender specifies the hash of the script and it is put on the top of the stack
The receiver specifies, as a data value, the script corresponding to the previous
hash
The algorithm checks whether the hash of this data corresponds to the one specified by the
sender
If the two hashes match, the top data value from the stack is reinterpreted as instructions, so
it's executed a second time as a script.
Pay-to-script-hash is an alternative to the standard way of making Bitcoin payments, which is
called pay-to-public-key. It is a useful feature that was not available in the first version of
the protocol. It removes complexity from the sender and brings an additional efficiency gain,
since with pay-to-script-hash the output scripts are as small as possible: they just specify a
hash, and all of the complexity is pushed to the input scripts.
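The two-stage check described above can be sketched as follows. The names script_hash and
redeem are hypothetical, SHA-256 is a stand-in for the hash actually used by the protocol, and
execute is a placeholder callback for running the revealed script.

```python
import hashlib


def script_hash(script: bytes) -> bytes:
    # Assumption: plain SHA-256 replaces the real hash function here.
    return hashlib.sha256(script).digest()


def redeem(output_hash: bytes, revealed_script: bytes, execute) -> bool:
    """Stage 1: compare hashes. Stage 2: run the revealed data as a script."""
    if script_hash(revealed_script) != output_hash:
        return False
    return execute(revealed_script)


# The receiver publishes only the hash of the (possibly complex) script.
receiver_script = b"<2-of-3 MULTISIG script>"
address_hash = script_hash(receiver_script)

# Later the receiver reveals the script in order to spend the coins.
spent = redeem(address_hash, receiver_script, execute=lambda script: True)
```

The sender never needs to see receiver_script: paying to address_hash is enough.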
Custom script
There isn't much creativity in the scripts people actually use. One reason is that Bitcoin
nodes have a whitelist of scripts and refuse to accept scripts that they consider nonstandard.
This doesn't mean that those scripts can't be used at all; it just makes them harder to use. We
will talk more about it later.
Alice creates a MULTISIG transaction that requires two out of three people to sign in order to
redeem the coins. These people are Alice, Bob and Judy. Judy is a judge, who will come
into play only if there's a dispute
Alice signs the transaction, redeeming some coins that she owns. At this point these coins are
held in escrow between Alice, Bob and Judy, and any two of them can specify where the
coins should go.
At this point Bob can safely send the goods to Alice and sign the transaction that releases
the money to him.
If the goods arrive and correspond to what Alice expected, she can release the money
to Bob by signing the transaction. The money will be sent to Bob without the need for
Judy's intervention.
This is what happens if both are honest. Otherwise Alice could ask for her money back, and
Bob might not agree to sign the transaction that releases the money to Alice.
So now it's Judy's turn to decide who's right and sign the transaction that releases the money
to either Alice or Bob. In both cases, since only two signatures are required, the
money will be sent to one of them.
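The two-out-of-three rule behind this escrow can be sketched in a few lines. The function
can_release is hypothetical and works on names instead of real signatures; it only shows which
combinations of signers are able to move the coins.

```python
def can_release(signers, participants=("Alice", "Bob", "Judy"), required=2):
    """True when enough distinct escrow participants have signed."""
    return len(set(signers) & set(participants)) >= required


# Honest case: Alice and Bob both sign the payment to Bob, Judy is not needed.
honest = can_release(["Alice", "Bob"])
# Dispute: Bob refuses to sign the refund, so Alice alone cannot move the coins...
stuck = can_release(["Alice"])
# ...until Judy sides with one of the two parties.
resolved = can_release(["Alice", "Judy"])
```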
Green addresses
Another cool application is what are called green addresses. Imagine that Alice wants to pay
Bob, who's offline. Bob can't check the blockchain to see if the transaction that Alice is
sending is valid. This can happen for any reason, for example because Bob doesn't have time to
connect and check, or doesn't have a connection.
Normally a transaction is considered confirmed once its block is followed by six other blocks,
which takes about an hour. To solve the problem of the recipient not being able to access the
blockchain, it is necessary to introduce a third party: the bank.
Alice can ask her bank to transfer money to Bob. The bank will deduct some money from Alice's
account and make a transaction to Bob from one of its green addresses.
The money to Bob will come directly from a bank-controlled address, with a guarantee of no
double spending. If Bob trusts the bank, he can accept the transaction as soon as he receives it.
The money will eventually be his when it's confirmed in the blockchain.
This guarantee isn't enforced by the Bitcoin system; it is a real-world guarantee, so the bank
must be trustworthy. If the bank ever double spends, its system is going to collapse quickly.
This has already happened to two famous services that implemented green addresses:
Instawallet
Mt. Gox
For these reasons, green addresses aren't used as much in Bitcoin as when they were first
proposed. They require putting too much trust in the bank.
Efficient micro-payments
Suppose that Alice is a customer who wants to pay Bob a low amount of money for some
service. So maybe Bob is Alice's wireless service provider, and Alice wants to pay a small
amount of money for every minute she uses it.
It would be very inefficient to create a Bitcoin transaction for every minute of conversation:
there would be too many low-value transactions and too many fees.
A nice solution would be to combine all these small payments into one big payment at the end.
To do this we can:
start with a MULTISIG transaction that pays the maximum amount Alice would ever need
to spend, and that requires both Alice's and Bob's signatures to release the coins
after the first minute of conversation, Alice signs a transaction sending one coin to Bob
and returning the rest to herself.
after another minute, Alice signs another transaction, paying two coins to Bob and the rest
to herself. At this point Bob hasn't signed anything yet.
Alice repeats this procedure for every minute of usage. These transactions aren't published
on the blockchain, since Bob's signature is missing.
When Bob wants to get his money, he can sign the last transaction and publish it on the
blockchain. He will receive the money he deserves, and the remainder will be sent back to
Alice. The other transactions will never be inserted into the blockchain.
It is impossible to redeem two different transactions generated by Alice, since they are all
technically double spends of the same initial transaction. If both parties are operating
normally, Bob will never sign any transaction but the last one, so the blockchain won't
actually see any attempt at double spending.
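The protocol above can be sketched as a small Python class. The class name Channel and its
methods are hypothetical, and signatures are reduced to bookkeeping: a half-signed transaction
is simply one that Bob has not yet countersigned.

```python
class Channel:
    """Toy model of the micro-payment protocol between Alice and Bob."""

    def __init__(self, deposit):
        self.deposit = deposit  # locked by the initial MULTISIG transaction
        self.half_signed = []   # signed by Alice only, never published

    def alice_pays(self, total_to_bob):
        """Alice signs a new transaction replacing the previous split."""
        if not 0 <= total_to_bob <= self.deposit:
            raise ValueError("payment exceeds the deposit")
        self.half_signed.append(
            {"to_bob": total_to_bob, "to_alice": self.deposit - total_to_bob}
        )

    def bob_closes(self):
        """Bob countersigns and publishes only the last transaction."""
        if not self.half_signed:
            raise ValueError("Alice has not signed anything yet")
        return self.half_signed[-1]


channel = Channel(deposit=10)
for minute in range(1, 4):      # three minutes of service
    channel.alice_pays(minute)  # one more coin to Bob each minute
published = channel.bob_closes()
```

Only the published transaction reaches the blockchain; the earlier half-signed ones are
discarded because they spend the same deposit.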
Lock time
A problem with the micro-payment protocol is that Bob might never sign any of the
transactions, so all the money that Alice first sent to the MULTISIG address would remain
blocked. How can Alice get her money back? She can use another feature called lock time.
Before the micro-payment protocol starts, Alice and Bob both sign a transaction which
refunds all of Alice's money back to her, but is locked until some time in the future.
So before Alice signs the first transaction for the first minute of service, she requires this
refund transaction from Bob. If a certain time T is reached and Bob hasn't signed any of the
small transactions that Alice sent, she can publish this transaction, which refunds all of the
money directly to her.
To do this, we use the lock_time parameter in the metadata of Bitcoin transactions. It is
possible to specify a value different from 0 for the lock time, which tells the miners not to
publish the transaction until some point in time, based on the timestamps that are put into
blocks.
With this feature Alice knows that she can get her money back if Bob never signs the
transactions.
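A simplified view of the rule miners apply can be written as a single comparison. The function
name is hypothetical, and the sketch only covers lock times expressed as timestamps, compared
against the timestamp of the block being built.

```python
def refund_can_be_published(lock_time: int, block_time: int) -> bool:
    """A lock time of 0 means no lock; otherwise the block time must have passed it."""
    return lock_time == 0 or lock_time < block_time


T = 1_700_000_000  # hypothetical refund time agreed by Alice and Bob (Unix seconds)
too_early = refund_can_be_published(T, block_time=T - 600)
after_deadline = refund_can_be_published(T, block_time=T + 600)
```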
Other applications
There are many other possible applications of the Bitcoin protocol, more complicated than the
ones explained so far, for example:
Bitcoin blocks
Why are transactions grouped together into blocks? There are a couple of reasons:
a block creates a single unit of work for miners that is bigger than an individual
transaction. Hashing and adding metadata for every single transaction would be too expensive.
the presence of blocks makes the hash chain shorter, since we only need one block for a
large number of transactions. So it is also easier to verify the blockchain data
structure.
Blockchain data structure
The blockchain is the combination of two different hash-based data structures:
block header: contains all the metadata for the block. The header is the most important part
for the mining puzzle, since we have seen that its hash has to start with a large number of
zeros for the block to be inserted into the blockchain. The header also contains the nonce that
the miners can change to solve the puzzle and an indication of how difficult it was to find the
block. Finally there's a reference to the block body, namely the hash of the Merkle tree root.
So the header contains all the information that is hashed during mining.
Merkle tree of transactions: a long list of transactions organised inside this tree. Inside
this tree there's a special transaction: the coinbase transaction. This is where the creation
of new Bitcoin happens. It looks almost like a normal transaction, with a value
corresponding to the current block reward plus all the transaction fees of the block. The
difference is that there's no reference to an output of a previous transaction, since it's a
creation of new coins. There's also a special coinbase parameter that can be populated with
arbitrary data. There are no real limits on what miners can put in there.
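The Merkle root referenced by the header can be computed with a short loop. The sketch below
assumes the leaves are already transaction hashes, pairs them with double SHA-256, and
duplicates the last hash when a level has an odd number of entries; the function names are
chosen for this example.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(tx_hashes):
    """Combine transaction hashes pairwise until a single root remains."""
    if not tx_hashes:
        raise ValueError("a block contains at least the coinbase transaction")
    level = list(tx_hashes)  # copy, so the caller's list is not modified
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: pair the last hash with itself
        level = [
            double_sha256(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


leaves = [double_sha256(name) for name in (b"coinbase", b"tx-1", b"tx-2")]
root = merkle_root(leaves)
```

Changing any transaction changes its leaf hash, and therefore the root stored in the header.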
Block display
The best way to understand the structure of blocks is to view them on websites that provide
this feature, for example blockchain.info and many others that let:
The Bitcoin network runs over TCP and has a random topology: nodes peer randomly with other
nodes, and new nodes can join at any time. It is possible to download the Bitcoin client today
and start a new Bitcoin node with the same rights and capabilities as every other existing node.
The network is very dynamic: nodes are coming and going all the time. Although there's no
explicit way to leave the network, if a node doesn't communicate for three hours, other nodes
start to forget it.
Here’s an example of a small network of seven nodes, connected in a random fashion. If a new
node wants to join the network, it sends a message to one existing node that's already on the
network, called its seed node. The message says “tell me all the peers that you have”. Then
the new node asks the same of the nodes connected to the first one, and so on. At the
end of the iteration it has a list of peers to make connections with. The new node chooses a
number of peers from the list to connect to and becomes a fully operating node.
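The discovery procedure amounts to a breadth-first walk starting from the seed node. In the
sketch below the seven-node topology is invented for the example, and ask_peers is a
hypothetical stand-in for sending the "tell me all the peers that you have" message.

```python
from collections import deque

# Hypothetical topology: node number -> nodes it is connected to.
network = {
    1: [2, 6], 2: [1, 4, 5], 3: [4, 7], 4: [2, 3],
    5: [2, 6], 6: [1, 5, 7], 7: [3, 6],
}


def discover_peers(seed, ask_peers):
    """Ask the seed for its peers, then ask those peers, and so on."""
    known = {seed}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for peer in ask_peers(node):
            if peer not in known:
                known.add(peer)
                queue.append(peer)
    return known


candidates = discover_peers(seed=4, ask_peers=lambda node: network[node])
```

The new node would then pick some of the candidates and open connections to them.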
Network purposes
The network maintains the blockchain, and to publish a transaction the entire network needs to
hear about it. There's a simple flooding algorithm to make this happen, sometimes called gossip
protocol.
Let’s use our small network again as an example. Suppose that node 4 hears about a new
transaction. To spread gossip, someone tells the news to as many people as he can. Node 4
does the same, announcing the transaction to all the nodes it is connected to (nodes 2 and 3).
Each node maintains a list of all the transactions it has heard about that haven't been put
into the blockchain yet. The two nodes add the new transaction to their list and then decide
whether or not to forward it to the nodes they are connected to.
Of course, if a node receives a notification of a transaction it has already heard about, it
won’t spread the transaction again. So, when all the nodes know the transaction, the spreading
process stops.
Besides ignoring transactions it has already heard about, a node can decide not to propagate
a transaction it has never heard about before. In fact, it checks whether the transaction is
valid by running the validation script we’ve seen before. It will refuse transactions that try
to redeem coins already spent and transactions in nonstandard formats. A double spending
attempt can be identified even before one of the transactions is inserted into the blockchain.
If a node receives two unconfirmed transactions that try to spend the same money, it won’t
spread the second one.
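The flooding behaviour can be simulated on a small graph. The topology below is invented for
the example (only the links of node 4 to nodes 2 and 3 come from the text), and is_valid is a
hypothetical placeholder for the validation checks each node performs.

```python
from collections import deque

# Hypothetical topology: node number -> nodes it is connected to.
network = {
    1: [2, 6], 2: [1, 4, 5], 3: [4, 7], 4: [2, 3],
    5: [2, 6], 6: [1, 5, 7], 7: [3, 6],
}


def flood(neighbors, origin, tx, is_valid=lambda tx: True):
    """Spread tx from origin; return each node's pending list and the message count."""
    pending = {node: set() for node in neighbors}
    pending[origin].add(tx)
    queue = deque([origin])
    messages = 0
    while queue:
        node = queue.popleft()
        for peer in neighbors[node]:
            messages += 1
            if tx in pending[peer] or not is_valid(tx):
                continue              # already heard, or rejected: not forwarded
            pending[peer].add(tx)
            queue.append(peer)        # the peer forwards it in its turn
    return pending, messages


pending, messages = flood(network, origin=4, tx="A -> B")
```

Every node forwards the transaction at most once, so the process stops by itself once all
nodes have it in their pending list.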
Mining logic
Since the Bitcoin network is peer-to-peer and anybody can join, there is always the
possibility of a node not following this exact protocol. It can forward double spends,
nonstandard transactions and invalid transactions. That’s why it’s important that every node
repeats the checks.
In addition, it is possible that when a transaction is announced to a node, the node isn’t
aware of some other transactions that are already known by other nodes on the network. Let’s
take as an example a double spending attempt where Alice tries to pay Bob and Charlie with the
same money.
Maybe the first node that receives the A -> C transaction isn’t aware of the other A -> B
transaction that was transmitted a few moments earlier on the other side of the network. So
node 1 considers it valid and spreads it to its neighbors (node 6 in the image). Node 6 has
already received the transaction A -> B, so it refuses the new one and keeps the previous one.
The network in this case can be divided in two, but that’s temporary. When one of these two
transactions is put into a block, the other nodes will see it, consider the other transaction
they’re holding invalid, and drop it.
In a situation like this, nodes usually keep the first transaction they hear about unless it
becomes clearly invalid. But there’s no authority that forces them to follow this behavior, so
every node is free to implement any other logic it likes.
Block propagation
The same flooding algorithm used for transactions is used to announce new blocks around the
network. In this case, the nodes verify that the new block is valid by computing the hash of
its header, so they can make sure that it starts with a sufficient number of zeroes to meet the
difficulty target. Besides validating the header, the nodes also have to check that all the
transactions contained in the block are valid and new.
In addition, a node shouldn't forward a block unless it builds on the node's own view of the
current longest chain. The nodes have a view of the blockchain, and they should only forward
new blocks if they come at the very end of the chain, not at some earlier point. This helps
to avoid forks.
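The three checks a node makes before forwarding a block can be sketched as below. The block is
a plain dictionary with hypothetical field names, tx_is_valid is a placeholder callback, and
reading the digest as a big-endian integer is a simplification made for this example.

```python
import hashlib


def meets_target(header: bytes, target: int) -> bool:
    """The header hash, read as a number, must not exceed the difficulty target."""
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return int.from_bytes(digest, "big") <= target


def should_forward(block, chain_tip_hash, target, tx_is_valid):
    return (
        meets_target(block["header"], target)                      # puzzle solved
        and all(tx_is_valid(tx) for tx in block["transactions"])   # valid and new
        and block["previous_hash"] == chain_tip_hash               # extends our tip
    )


block = {
    "header": b"example header bytes",
    "previous_hash": "tip",
    "transactions": ["coinbase", "A -> B"],
}
easy_target = 2**256 - 1  # every hash meets this target
forward = should_forward(block, "tip", easy_target, tx_is_valid=lambda tx: True)
```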
Network statistics
The average time for a new block to propagate over the whole network is around 30 seconds. This
shows that the protocol isn’t very efficient, probably because it was designed to be simple and
so that every node is equal to the others. The network isn’t optimized for fast communication.
But for Bitcoin it’s more important to have a decentralized structure where all the nodes are
equal.
It’s difficult to count the number of online nodes. Some researchers have estimated that over
a million IP addresses connect in a month that are both running the Bitcoin protocol and acting
as a node. However, the number of full nodes that are permanently connected and validate every
transaction they hear about is lower: only about 5 to 10 thousand.
As an alternative to fully validating nodes, there are lightweight nodes (thin clients or
simple payment verification clients). These are the vast majority of nodes on the Bitcoin
network. The difference is that these nodes don't attempt to store the entire blockchain. They
only store the pieces they need to verify the specific transactions they care about.
For example, if you run a wallet, your wallet might want to be a simple payment verification
(SPV) node. If somebody sends money to you, you'll act as a node: you will download the bits of
the blockchain necessary to verify that the person sending you the money actually owned it.
Then you’ll check that the transaction actually gets included in the blockchain.
An SPV client like this doesn't have the full security level of a fully validating node. When
it hears about a new block, it can only check the block header. It can't check the validity of
the transactions inside it, since it doesn't own the entire blockchain. It can only validate
the transactions that actually affect it. So it is essentially trusting the fully validating
nodes about the validity of the other transactions.
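One way a lightweight client can check that a transaction is included in a block, knowing only
the Merkle root from the header, is to follow a Merkle path. The sketch below is an
illustration with hypothetical names: the proof is a list of sibling hashes, each flagged with
the side it sits on.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def verify_merkle_proof(tx_hash, proof, merkle_root):
    """Rebuild the path from tx_hash up to the root using the sibling hashes."""
    current = tx_hash
    for sibling, sibling_is_left in proof:
        pair = sibling + current if sibling_is_left else current + sibling
        current = double_sha256(pair)
    return current == merkle_root


# A four-transaction tree built by hand for the example.
a, b, c, d = (double_sha256(name) for name in (b"tx-a", b"tx-b", b"tx-c", b"tx-d"))
ab, cd = double_sha256(a + b), double_sha256(c + d)
root = double_sha256(ab + cd)

# To check tx-c the client needs only d (on the right) and ab (on the left).
included = verify_merkle_proof(c, [(d, False), (ab, True)], root)
```

The client downloads two hashes instead of all four transactions; for a large block the saving
is much bigger, since the path grows with the logarithm of the number of transactions.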
Visa network can handle an average of 2000 transactions per second around the world
Paypal can handle 100 transactions per second
Bitcoin cryptography
Another important limitation people are worried about in the long term is that Bitcoin's
cryptography is fixed. It consists of only a couple of hash algorithms and only one signature
algorithm, based on the elliptic curve secp256k1. In the future this algorithm could be broken.
The same concern applies to the hash function SHA-1 included in Bitcoin, because this
function already has some known cryptographic weaknesses.
Software releases
To change some aspects, it would be necessary to release a new version of the software and hope
that everybody accepts it. That would be a hard-forking change, and it can't be assumed
that every node would upgrade. If only a group of nodes accepted the new version, the
blockchain would split, and each node would accept only one branch of it, depending on the
version of its software. That is clearly unacceptable.
By contrast, there's an approach called soft forking, which tries to avoid the creation of
permanent forks. The idea is that it is possible to add new features to the Bitcoin protocol
only if they restrict the set of valid transactions or of valid blocks. That way, nodes that
don't upgrade will still consider the new blocks valid, while blocks created with the
older version might be rejected by the upgraded nodes. In this way, the nodes are encouraged to
install the latest version of the software.
An example of this kind of release was pay-to-script-hash. Old nodes only check that the hash
of the data value is equal to the value specified in the output script. New nodes also perform
a second step: they check that the data value is itself a valid script and run it. So
pay-to-script-hash transactions were considered valid by the old nodes too.
Other changes might require a hard fork, for example introducing new opcodes into Bitcoin,
changing the limits on block or transaction sizes, or fixing many bugs, such as the bug in
multi-signature transactions that pops an extra value off the stack.