Hive & Bitcoin: Analytics on Blockchain data with SQL

2016-04-28 --- Jörn Franke

You can now analyze the Bitcoin Blockchain using Hive and the hadoopcryptoledger library with the new HiveSerde plugin.

Basically, you can link any data that you have loaded into Hive with Bitcoin Blockchain data. For example, you can link Blockchain data with important events in history to investigate what causes Bitcoin exchange rates to increase or decrease.

The site provides several examples of how to use SQL in Hive to perform calculations on Blockchain data.

Of course, you can calculate nearly anything you can imagine using the Bitcoin Blockchain data as input. Furthermore, you can link the data with other data.
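
As a rough sketch of what such a link could look like, the following query joins the Blockchain table with a table of dated events. The table “HistoricalEvents” and its columns are hypothetical and only serve as an illustration. The query also assumes that the Blockchain table is called “BitcoinBlockchain” and exposes the block timestamp as a Unix epoch column named “time”; check your own table definition before using it.

```sql
-- Hypothetical table of dated events; it is not part of hadoopcryptoledger.
CREATE TABLE IF NOT EXISTS HistoricalEvents (
  event_date  STRING,  -- formatted as 'yyyy-MM-dd'
  description STRING
);

-- Assumption: BitcoinBlockchain has a Unix epoch column named `time`
-- and a `blocksize` column. Adjust the names to match your table.
SELECT e.event_date,
       e.description,
       COUNT(*)         AS blocks_on_that_day,
       AVG(b.blocksize) AS avg_block_size
FROM HistoricalEvents e
JOIN BitcoinBlockchain b
  -- Convert the block timestamp to a 'yyyy-MM-dd' string to match event_date.
  ON from_unixtime(b.`time`, 'yyyy-MM-dd') = e.event_date
GROUP BY e.event_date, e.description;
```

The result has one row per event, with the number of blocks created on that day and their average size. Exchange rate data, loaded into its own Hive table, could be joined on the date in the same way.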

Although accessing Bitcoin Blockchain data is rather fast for analytics, you can optimize your analytics further by extracting frequently used data from the Blockchain and storing it in a format optimized for analytics, such as the columnar format ORC in Hive.

The following simple example shows how you can do this. I assume that the Bitcoin Blockchain data is represented as the table “BitcoinBlockchain” and that you want to copy the Merkle root hash of each Bitcoin block, the block size and the version number into the table “BlockAnalytics”, which is optimized for analytics:

CREATE TABLE BlockAnalytics STORED AS ORC AS SELECT hashmerkleroot, blocksize, version FROM BitcoinBlockchain;
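
Once the table “BlockAnalytics” exists, analytical queries can run against the ORC copy instead of the raw Blockchain data. A minimal sketch, using only the columns copied above, that summarizes blocks per version number:

```sql
-- Count the blocks and summarize their size for each block version.
SELECT version,
       COUNT(*)       AS block_count,
       AVG(blocksize) AS avg_block_size,
       MAX(blocksize) AS max_block_size
FROM BlockAnalytics
GROUP BY version
ORDER BY version;
```

Because ORC is a columnar format, a query like this only needs to read the columns it references.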

Of course, you can access the tables in Hive with analytics and visual analytics tools, such as Tableau, Matlab, SAS, R, SAP Lumira, D3.js etc.

In the coming weeks, further extensions are planned to be published: