Bitcoins the hard way: Using the raw Bitcoin protocol
Ken Shirriff’s blog
Xerox Alto restoration, IC switch roles engineering, chargers, and whatever
Bitcoins the hard way: Using the raw Bitcoin protocol
(Feb 23: I have a fresh article that covers the technical details of mining. If you like this article, check out my mining article too.)
This blog post starts with a quick overview of Bitcoin and then leaps into the low-level details: creating a Bitcoin address, making a transaction, signing the transaction, feeding the transaction into the peer-to-peer network, and observing the results.
A quick overview of Bitcoin
To simplify slightly, bitcoins consist of entries in a distributed database that keeps track of the ownership of bitcoins. Unlike a bank, bitcoins are not tied to users or accounts. Instead bitcoins are possessed by a Bitcoin address, for example 1KKKK6N21XKo48zWKuQKXdvSsCf95ibHFa .
A key innovation of Bitcoin is how transactions are recorded in the distributed database through mining. Transactions are grouped into blocks and about every ten minutes a fresh block of transactions is sent out, becoming part of the transaction log known as the blockchain, which indicates the transaction has been made (more-or-less) official. Bitcoin mining is the process that puts transactions into a block, to make sure everyone has a consistent view of the transaction log. To mine a block, miners must find an enormously infrequent solution to an (otherwise-pointless) cryptographic problem. Finding this solution generates a mined block, which becomes part of the official block chain.
Mining is also the mechanism for fresh bitcoins to come in the system. When a block is successfully mined, fresh bitcoins are generated in the block and paid to the miner. This mining bounty is large – presently twenty five bitcoins per block (about $Nineteen,000). In addition, the miner gets any fees associated with the transactions in the block. Because of this, mining is very competitive with many people attempting to mine blocks. The difficulty and competitiveness of mining is a key part of Bitcoin security, since it ensures that nobody can flood the system with bad blocks.
The peer-to-peer network
Blocks and transactions are identified by a 256-bit cryptographic hash of their contents. This hash value is used in numerous places in the Bitcoin protocol. In addition, finding a special hash is the difficult task in mining a block.
Diving into the raw Bitcoin protocol
It turns out that actually using the Bitcoin protocol is firmer than I expected. As you will see, the protocol is a bit of a jumble: it uses big-endian numbers, little-endian numbers, fixed-length numbers, variable-length numbers, custom-made encodings, DER encoding, and a multiplicity of cryptographic algorithms, seemingly arbitrarily. As a result, there’s a lot of annoying manipulation to get data into the right format.
The 2nd complication with using the protocol directly is that being cryptographic, it is very unforgiving. If you get one byte wrong, the transaction is rejected with no clue as to where the problem is.
The final difficulty I encountered is that the process of signing a transaction is much more difficult than necessary, with a lot of details that need to be correct. In particular, the version of a transaction that gets signed is very different from the version that actually gets used.
Bitcoin addresses and keys
Bitcoin uses a multiplicity of keys and addresses, so the following diagram may help explain them. You begin by creating a random 256-bit private key. The private key is needed to sign a transaction and thus transfer (spend) bitcoins. Thus, the private key must be kept secret or else your bitcoins can be stolen.
The Elliptic Curve DSA algorithm generates a 512-bit public key from the private key. (Elliptic curve cryptography will be discussed later.) This public key is used to verify the signature on a transaction. Inconveniently, the Bitcoin protocol adds a prefix of four to the public key. The public key is not exposed until a transaction is signed, unlike most systems where the public key is made public.
The next step is to generate the Bitcoin address that is collective with others. Since the 512-bit public key is inconveniently large, it is hashed down to one hundred sixty bits using the SHA-256 and RIPEMD hash algorithms. The key is then encoded in ASCII using Bitcoin’s custom-built Base58Check encoding.[Ten] The resulting address, such as 1KKKK6N21XKo48zWKuQKXdvSsCf95ibHFa , is the address people publish in order to receive bitcoins. Note that you cannot determine the public key or the private key from the address. If you lose your private key (for example by throwing out your hard drive), your bitcoins are lost forever.
Eventually, the Wallet Interchange Format key (WIF) is used to add a private key to your client wallet software. This is simply a Base58Check encoding of the private key into ASCII, which is lightly reversed to obtain the 256-bit private key. (I was nosey if anyone would use the private key above to steal my eighty cents of bitcoins, and sure enough someone did.)
To summarize, there are three types of keys: the private key, the public key, and the hash of the public key, and they are represented externally in ASCII using Base58Check encoding. The private key is the significant key, since it is required to access the bitcoins and the other keys can be generated from it. The public key hash is the Bitcoin address you see published.
I used the following code snippet to generate a private key in WIF format and an address. The private key is simply a random 256-bit number. The ECDSA crypto library generates the public key from the private key. The Bitcoin address is generated by SHA-256 hashing, RIPEMD-160 hashing, and then Base58 encoding with checksum. Eventually, the private key is encoded in Base58Check to generate the WIF encoding used to come in a private key into Bitcoin client software. Note: this Python random function is not cryptographically strong; use a better function if you’re doing this for real.
Inwards a transaction
The diagram above shows a sample transaction “C”. In this transaction, .005 BTC are taken from an address in Transaction A, and .003 BTC are taken from an address in Transaction B. (Note that arrows are references to the previous outputs, so are rearwards to the flow of bitcoins.) For the outputs, .003 BTC are directed to the very first address and .004 BTC are directed to the 2nd address. The leftover .001 BTC goes to the miner of the block as a fee. Note that the .015 BTC in the other output of Transaction A is not spent in this transaction.
Each input used must be entirely spent in a transaction. If an address received one hundred bitcoins in a transaction and you just want to spend one bitcoin, the transaction must spend all 100. The solution is to use a 2nd output for switch, which comebacks the ninety nine leftover bitcoins back to you.
Transactions can also include fees. If there are any bitcoins left over after adding up the inputs and subtracting the outputs, the remainder is a fee paid to the miner. The fee isn’t rigorously required, but transactions without a fee will be a low priority for miners and may not be processed for days or may be discarded entirely. A typical fee for a transaction is 0.0002 bitcoins (about twenty cents), so fees are low but not trivial.
By hand creating a transaction
Following the specification, the unsigned transaction can be assembled fairly lightly, as shown below. There is one input, which is using output zero (the very first output) from transaction 81b4c832. . Note that this transaction hash is inconveniently reversed in the transaction. The output amount is 0.00091234 bitcoins (91234 is 0x016462 in hex), which is stored in the value field in little-endian form. The cryptographic parts – scriptSig and scriptPubKey – are more complicated and will be discussed later.
Here’s the code I used to generate this unsigned transaction. It’s just a matter of packing the data into binary. Signing the transaction is the hard part, as you’ll see next.
How Bitcoin transactions are signed
By performing several steps, anyone can verify that the transaction is authorized by B. Very first, B’s public key must correspond to B’s address in the previous transaction, proving the public key is valid. (The address can lightly be derived from the public key, as explained earlier.) Next, B’s signature of the transaction can be verified using the B’s public key in the transaction. These steps ensure that the transaction is valid and authorized by B. One unexpected part of Bitcoin is that B’s public key isn’t made public until it is used in a transaction.
With this system, bitcoins are passed from address to address through a chain of transactions. Each step in the chain can be verified to ensure that bitcoins are being spent validly. Note that transactions can have numerous inputs and outputs in general, so the chain branches out into a tree.
The Bitcoin scripting language
The Script language is remarkably sophisticated, with about eighty different opcodes. It includes arithmetic, bitwise operations, string operations, conditionals, and stack manipulation. The language also includes the necessary cryptographic operations (SHA-256, RIPEMD, etc.) as primitives. In order to ensure that scripts terminate, the language does not contain any looping operations. (As a consequence, it is not Turing-complete.) In practice, however, only a few types of transactions are supported.
In order for a Bitcoin transaction to be valid, the two parts of the redemption script must run successfully. The script in the old transaction is called scriptPubKey and the script in the fresh transaction is called scriptSig. To verify a transaction, the scriptSig executed followed by the scriptPubKey. If the script completes successfully, the transaction is valid and the Bitcoin can be spent. Otherwise, the transaction is invalid. The point of this is that the scriptPubKey in the old transaction defines the conditions for spending the bitcoins. The scriptSig in the fresh transaction must provide the data to sate the conditions.
In a standard transaction, the scriptSig shoves the signature (generated from the private key) to the stack, followed by the public key. Next, the scriptPubKey (from the source transaction) is executed to verify the public key and then verify the signature.
As voiced in Script, the scriptSig is: The scriptPubKey is:
When this code executes, PUSHDATA very first shoves the signature to the stack. The next PUSHDATA shoves the public key to the stack. Next, OP_DUP duplicates the public key on the stack. OP_HASH160 computes the 160-bit hash of the public key. PUSHDATA shoves the required Bitcoin address. Then OP_EQUALVERIFY verifies the top two stack values are equal – that the public key hash from the fresh transaction matches the address in the old address. This proves that the public key is valid. Next, OP_CHECKSIG checks that the signature of the transaction matches the public key and signature on the stack. This proves that the signature is valid.
Signing the transaction
The thickest complication is the signature shows up in the middle of the transaction, which raises the question of how to sign the transaction before you have the signature. To avoid this problem, the scriptPubKey script is copied from the source transaction into the spending transaction (i.e. the transaction that is being signed) before computing the signature. Then the signature is turned into code in the Script language, creating the scriptSig script that is embedded in the transaction. It shows up that using the previous transaction’s scriptPubKey during signing is for historical reasons rather than any logical reason. For transactions with numerous inputs, signing is even more complicated since each input requires a separate signature, but I won’t go into the details.
One step that tripped me up is the hash type. Before signing, the transaction has a hash type constant temporarily appended. For a regular transaction, this is SIGHASH_ALL (0x00000001). After signing, this hash type is liquidated from the end of the transaction and appended to the scriptSig.
Another annoying thing about the Bitcoin protocol is that the signature and public key are both 512-bit elliptic curve values, but they are represented in totally different ways: the signature is encoded with DER encoding but the public key is represented as plain bytes. In addition, both values have an extra byte, but placed inconsistently: SIGHASH_ALL is put after the signature, and type four is put before the public key.
Debugging the signature was made more difficult because the ECDSA algorithm uses a random number.[Legitimate] Thus, the signature is different every time you compute it, so it can’t be compared with a known-good signature.
Update (Feb 2014): An significant side-effect of the signature switching every time is that if you re-sign a transaction, the transaction’s hash will switch. This is known as Transaction Malleability. There are also ways that third parties can modify transactions in trivial ways that switch the hash but not the meaning of the transaction. Albeit it has been known for years, malleability has recently caused big problems (Feb 2014) with MtGox (press release).
With these complications it took me a long time to get the signature to work. Eventually, however, I got all the bugs out of my signing code and succesfully signed a transaction. Here’s the code snippet I used.
The final scriptSig contains the signature along with the public key for the source address ( 1MMMMSUb1piy2ufrSguNUdFmAcvqrQF8M5 ). This proves I am permitted to spend these bitcoins, making the transaction valid.