This article is the tenth in a guide to the more technical side of Bitcoin and the blockchain, written to be accessible even to readers who are not expert coders. It continues a series designed to lead the reader gradually into what many call the “rabbit hole”.
The main reference for this article is the book “Mastering Bitcoin” by Andreas M. Antonopoulos, from which the images have been taken.
The Bitcoin Blockchain
The blockchain is a data structure consisting of an ordered, back-linked list of blocks of transactions, which can be stored as a flat file or in a simple database. Bitcoin Core stores the blockchain metadata using Google’s LevelDB database. Each block is identified by a hash, obtained by applying SHA256 twice to the block header. Each block is linked to the previous one (its parent) through the previous block hash field in its header. This sequence of hashes links each block to its parent, creating a chain that runs backwards all the way to the first block ever created (the genesis block).
Although each block has only one parent (there is only one previous block hash field), it can temporarily have multiple children. This happens during a “fork”, a temporary situation that can arise when two new blocks are mined at approximately the same time. Because the previous block hash field sits inside the block header, it affects the current block’s hash: if the parent’s hash changed, the current block’s hash would change accordingly, and so would the hash of every block built on top of it.
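The dependency between a block's hash and its parent's hash can be illustrated with a minimal sketch. The `toy_header` layout (parent hash followed by an arbitrary payload) is a hypothetical simplification, not the real Bitcoin header format.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """SHA256 applied twice, as Bitcoin does for block hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def toy_header(prev_hash: bytes, payload: bytes) -> bytes:
    # Hypothetical simplified header: 32-byte parent hash, then a payload.
    return prev_hash + payload


# Build a three-block toy chain.
genesis = toy_header(bytes(32), b"genesis")
child = toy_header(double_sha256(genesis), b"block 1")
grandchild = toy_header(double_sha256(child), b"block 2")
child_hash_before = double_sha256(child)

# Tamper with the first block: its hash changes, so a child that
# points to it must carry a different parent hash, and therefore
# gets a different hash of its own.
tampered_genesis = toy_header(bytes(32), b"tampered")
tampered_child = toy_header(double_sha256(tampered_genesis), b"block 1")
child_hash_after = double_sha256(tampered_child)

# The grandchild still commits to the old child hash, so the link breaks.
link_still_valid = grandchild[:32] == child_hash_after
print(child_hash_before != child_hash_after, link_still_valid)
```

In this toy model, changing one block forces every descendant to be rebuilt, which is the cascade effect the previous block hash field creates.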
Structure of a block
A block is a “container” data structure that aggregates transactions to be included in the public ledger, the blockchain. A block is composed of a header, containing metadata, followed by a list of transactions (a complete block takes up about 10,000 times the header space).
The block header consists of three sets of metadata:
- A reference to the hash from the previous block;
- Difficulty, Timestamp and Nonce (values related to mining);
- The Merkle Tree Root.
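As a rough sketch, the header fields listed above can be modelled as a small Python structure. The field names are illustrative choices, and the sketch assumes the commonly documented 80-byte layout, which also includes a version field not discussed in the list and stores the difficulty target in a compact 4-byte form (here called `bits`).

```python
import struct
from dataclasses import dataclass

# Assumed layout: little-endian version, 32-byte previous block hash,
# 32-byte merkle root, then timestamp, bits (compact difficulty target)
# and nonce as 4-byte unsigned integers.
HEADER_FORMAT = "<L32s32sLLL"


@dataclass
class BlockHeader:
    version: int
    prev_block_hash: bytes  # 32 bytes, in serialized byte order
    merkle_root: bytes      # 32 bytes, in serialized byte order
    timestamp: int          # seconds since the Unix epoch
    bits: int               # compact encoding of the difficulty target
    nonce: int

    def serialize(self) -> bytes:
        """Pack the six fields into the raw header bytes."""
        return struct.pack(
            HEADER_FORMAT,
            self.version,
            self.prev_block_hash,
            self.merkle_root,
            self.timestamp,
            self.bits,
            self.nonce,
        )

    @classmethod
    def parse(cls, raw: bytes) -> "BlockHeader":
        """Rebuild a header from its raw bytes."""
        return cls(*struct.unpack(HEADER_FORMAT, raw))


# Placeholder values, only to show the round trip.
header = BlockHeader(1, bytes(32), bytes(32), 1231006505, 0x1D00FFFF, 0)
raw = header.serialize()
print(len(raw), BlockHeader.parse(raw) == header)
```

The header stays the same small size no matter how many transactions the block holds, because the transactions are represented only by the merkle root.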
Identifiers: Block Header Hash and Block Height
The primary identifier of a block is its cryptographic hash, a digital fingerprint made by hashing the block header twice with SHA256. The resulting 32-byte hash is called the block hash, or more precisely the block header hash. This is the hash of the first block ever created:
000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f
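The double hashing described above can be tried on the genesis block itself. The sketch below rebuilds the 80-byte header from its publicly known field values (version, merkle root, timestamp, compact target and nonce) and assumes the usual convention that hashes are displayed in the reverse of their serialized byte order.

```python
import hashlib
import struct

# Publicly known merkle root of the genesis block, in display order.
GENESIS_MERKLE_ROOT = (
    "4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b"
)


def double_sha256(data: bytes) -> bytes:
    """SHA256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def block_hash_hex(header: bytes) -> str:
    """Hash the header and reverse the bytes for display."""
    return double_sha256(header)[::-1].hex()


genesis_header = (
    struct.pack("<L", 1)                           # version
    + bytes(32)                                    # no parent: all zeros
    + bytes.fromhex(GENESIS_MERKLE_ROOT)[::-1]     # merkle root
    + struct.pack("<LLL", 1231006505, 0x1D00FFFF, 2083236893)
    # timestamp, compact difficulty target, nonce
)

genesis_hash = block_hash_hex(genesis_header)
print(genesis_hash)
```

Any node can repeat this calculation on a header it receives, which is why the hash does not need to be transmitted along with the block.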
Note that the block hash is not included in the block data structure, neither when the block is transmitted over the network nor when it becomes part of the blockchain. Instead, it is calculated by each node when the block is received from the network. Another way to identify a block is through its position in the blockchain, called the block height. The first block created has a height of “0”, while on January 1st, 2017 the chain had reached a height of about 446,000. Unlike the block hash, the height (which is also not part of the data structure of a block) is not a unique identifier, since the same height can identify several blocks during a fork.
The Genesis Block
The first block, or “block zero”, is called the genesis block. Every node always starts with a blockchain of at least one block, because the genesis block is statically encoded in the client software so that it cannot be altered (it is the foundation on which a reliable blockchain is built). Every node therefore knows the genesis block’s hash and structure, when it was created, and so on.
Looking for the hash of “block zero” in any block explorer, you’ll find a page describing the content of this block, with a URL containing the hash:
The block contains a hidden message. The coinbase transaction input contains the text “The Times 03/Jan/2009 Chancellor on brink of second bailout for banks”. By quoting a headline from The Times, the message serves as proof of the earliest date on which the block could have been created, and recalls the importance of an independent monetary system.
Linking Blocks in the Blockchain
A full node keeps a local copy of the blockchain, starting from the genesis block. The copy is constantly updated: whenever a new block is found, the node receives it from the network, validates it, and attaches it to lengthen the chain. To establish the link, the node examines the previous block hash in the incoming block header, which identifies the parent the new block attaches to.
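The linking step can be sketched with a toy in-memory index. `ToyChain` and `make_header` are hypothetical names introduced for illustration; the sketch assumes the previous block hash occupies bytes 4 to 36 of an 80-byte header, skips all validation, and simply rejects blocks whose parent is unknown.

```python
import hashlib
import struct


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def make_header(prev_hash: bytes, nonce: int) -> bytes:
    """Hypothetical helper: 80-byte header with placeholder fields."""
    return struct.pack("<L32s32sLLL", 1, prev_hash, bytes(32), 0, 0, nonce)


class ToyChain:
    """Toy index of known blocks, keyed by block hash."""

    def __init__(self, genesis_header: bytes):
        genesis_hash = double_sha256(genesis_header)
        self.height_by_hash = {genesis_hash: 0}
        self.tip = genesis_hash

    def connect(self, header: bytes) -> bool:
        prev_hash = header[4:36]  # previous block hash field
        if prev_hash not in self.height_by_hash:
            return False  # parent unknown: cannot link (no validation here)
        block_hash = double_sha256(header)
        height = self.height_by_hash[prev_hash] + 1
        self.height_by_hash[block_hash] = height
        if height > self.height_by_hash[self.tip]:
            self.tip = block_hash  # the longest branch becomes the tip
        return True


genesis = make_header(bytes(32), 0)
chain = ToyChain(genesis)
block_a = make_header(double_sha256(genesis), 1)
block_b = make_header(double_sha256(genesis), 2)  # competing child: a fork
block_c = make_header(double_sha256(block_a), 3)
stray = make_header(b"\x11" * 32, 4)              # parent never seen

results = [chain.connect(h) for h in (block_a, block_b, block_c, stray)]
print(results)
```

Here `block_a` and `block_b` share height 1, which shows why height alone does not uniquely identify a block during a fork.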
Merkle Trees
Each block in the Bitcoin blockchain contains a summary of all transactions in the block, using a merkle tree, or binary hash tree, a data structure used to efficiently summarize and verify the integrity of large data sets. Merkle trees are binary trees containing cryptographic hashes. The term “tree” is used in computer science to describe a branched data structure, but these trees are usually displayed upside down with the “root” at the top and the “leaves” at the bottom of a diagram.
Merkle trees are used in Bitcoin to summarize all transactions in a block, producing an overall fingerprint (hash) of the entire set of transactions and providing a very efficient way to verify whether a transaction is included in a block. A merkle tree is built by recursively hashing pairs of nodes until only one hash remains, called the merkle root. The cryptographic hash algorithm used in Bitcoin merkle trees is SHA256 applied twice.
When N data elements are hashed and summarized in a merkle tree, it is possible to check whether any data element is included in the structure with at most 2 * log2(N) calculations, making this a very efficient data structure.
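To get a feel for how slowly the proof grows, the following sketch computes the number of hashes in a merkle path, roughly log2(N) rounded up, and the resulting size at 32 bytes per hash. The function names are illustrative, and the sketch counts only the path hashes, which stays within the bound given above.

```python
HASH_SIZE = 32  # bytes per SHA256 hash


def merkle_path_length(tx_count: int) -> int:
    """Number of sibling hashes needed to link one leaf to the root.

    Equals ceil(log2(tx_count)), computed with integers only.
    """
    if tx_count < 1:
        raise ValueError("a block needs at least one transaction")
    return (tx_count - 1).bit_length()


def merkle_path_bytes(tx_count: int) -> int:
    """Size of the merkle path in bytes."""
    return merkle_path_length(tx_count) * HASH_SIZE


for n in (16, 512, 2048, 65535):
    print(n, merkle_path_length(n), merkle_path_bytes(n))
# 16 4 128
# 512 9 288
# 2048 11 352
# 65535 16 512
```

Going from 16 to 65,535 transactions multiplies the block size by thousands, while the path only grows from 4 to 16 hashes.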
Since a merkle tree is a binary tree, it needs an even number of “leaves”. If there is an odd number of transactions to summarize, the hash of the last transaction is duplicated to create an even number of leaf nodes, which is known as a “balanced tree”.
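The construction described so far, including the duplication of the last hash, can be sketched as follows. The leaves here are hashes of placeholder byte strings rather than real transactions, and byte-order conventions used when displaying hashes are ignored.

```python
import hashlib
from typing import List


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    """Hash pairs of nodes level by level until one hash remains."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Odd count: duplicate the last hash. This applies to
            # every level of the tree, not only to the leaves.
            level.append(level[-1])
        level = [
            double_sha256(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


# Placeholder "transactions"; real leaves would be transaction hashes.
tx_hashes = [double_sha256(tx) for tx in (b"tx-a", b"tx-b", b"tx-c")]
root = merkle_root(tx_hashes)
print(root.hex())
```

With three leaves, the result is the same as for the four leaves a, b, c, c, and the root is always 32 bytes regardless of how many transactions are summarized.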
The same method used to build a four-transaction tree can be generalized to build trees of any size. In Bitcoin it is common to have several hundred to more than a thousand transactions in a single block, which are summarized in exactly the same way, producing only 32 bytes of data as the single merkle root.
Merkle Trees and Simplified Payment Verification (SPV)
Merkle trees are widely used by SPV nodes. SPV nodes do not have all transactions and do not download complete blocks, only block headers. To verify that a transaction is included in a block, without having to download all the transactions in the block, they use an authentication path or merkle path.
Consider, for example, an SPV node interested in incoming payments to an address in its wallet. The SPV node will establish a bloom filter on its peer connections to limit the transactions received only to those containing the addresses of interest. When a peer sees a transaction that matches the bloom filter, it will send that block using a merkleblock message. The merkleblock message contains the block header and a merkle path linking the transaction of interest to the merkle root in the block. The SPV node can use this merkle path to connect the transaction to the block and verify that the transaction is included in the block. The SPV node also uses the block header to connect the block to the rest of the blockchain. The combination of these two links, between the transaction and the block and between the block and the blockchain, shows that the transaction is recorded in the blockchain. Ultimately, the SPV node will have received less than one kilobyte of data for the block header and the merkle path, which is more than a thousand times less than a complete block (about 1 megabyte).
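The check an SPV node performs on a merkle path can be sketched as below. This is a simplified model: the path is represented as a list of (sibling hash, sibling-is-on-the-right) pairs, which is an assumption made for readability and not the actual encoding of the merkleblock message. All function names are hypothetical.

```python
import hashlib
from typing import List, Tuple


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def build_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """All tree levels, from the leaves up to the root."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        level = list(levels[-1])
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash
        levels.append([
            double_sha256(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ])
    return levels


def merkle_path(leaves: List[bytes], index: int) -> List[Tuple[bytes, bool]]:
    """What a full node would compute: sibling hashes from leaf to root."""
    path = []
    for level in build_levels(leaves)[:-1]:
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        sibling = index ^ 1  # neighbour within the same pair
        path.append((level[sibling], sibling > index))
        index //= 2
    return path


def verify_path(leaf: bytes, path: List[Tuple[bytes, bool]], root: bytes) -> bool:
    """What an SPV node would compute: rebuild the root from leaf and path."""
    current = leaf
    for sibling, sibling_is_right in path:
        pair = current + sibling if sibling_is_right else sibling + current
        current = double_sha256(pair)
    return current == root


leaves = [double_sha256(b"tx-%d" % i) for i in range(5)]
root = build_levels(leaves)[-1][0]   # taken from the block header
path = merkle_path(leaves, 4)        # supplied by a full-node peer

print(len(path), verify_path(leaves[4], path, root))
print(verify_path(double_sha256(b"forged"), path, root))
```

The verifier only needs the transaction hash, the path and the merkle root from the header; it never sees the other transactions in the block.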
Besides the “main” blockchain (mainnet), there are other blockchains used for testing: currently testnet, segnet, and regtest (with simnet currently being worked on).