Blockchain analytics has become a trending topic in recent years. This topic is of interest not only for public blockchains, such as Bitcoin or Ethereum and their Altcoins, but also for private/permissioned blockchains based on various technologies. Nevertheless, there are many challenges involved, such as the large data volumes, the inefficient format for analytics, state data, crypto-exchange data, privacy, dark blockchain data, sharded chains, off-chain peer-to-peer exchanges, the need for combined batch/stream analytics, the secure and reliable collection of the raw data, and visualisation.
As you can see, blockchain analytics is not trivial, but I will walk you through it step by step. Interestingly, there are only a few academic publications on this topic, and they cover only highly specific analysis use cases or technologies for the analytical database.
Blockchain analytics is not limited to analysing the behaviour of cryptocurrencies. Complex interactions take place on blockchains that allow them, such as Ethereum, for example the execution of business processes and various interactions according to contracts.
What are Blockchains?
Practically usable public blockchains have existed for over a decade and have been run on a large scale with a huge audience since then. They are basically decentralised, distributed databases that allow transactions to be executed in a decentralised manner, where everyone can verify that they have been executed and what the result was. The failure of one or more nodes can be recovered from with little or no impact on the ability to execute transactions. Their initial motivation was to provide cryptocurrencies, which are an alternative payment method independent of any authority that issues a currency. Modern blockchains, such as Ethereum, can also have multiple cryptocurrencies on the same blockchain.
Enabling complex contracts and governance structures
Nowadays they have been significantly extended to implement complex contracts, asset ownership and even (inter-)organisational governance structures. Not all of the blockchains support this. For example, Bitcoin only allows for rather simple exchanges between two or more parties. Ethereum and others allow virtually any types of contracts and structures by providing a Turing-complete language that is executed on the blockchain.
One example is a decentralized autonomous organisation (DAO) on the Ethereum blockchain, which is an organisation without management but based on rules as defined in its contract – public and auditable for all. There are various forms of DAOs and related applications, e.g. to bring together companies and freelancers, decentralised finance (DeFi) that aims at providing financial intermediaries as smart contracts, the integration of hundreds of organisations as part of a decentralised supply chain management, or insurance contracts potentially linked with reliable data from Internet-of-Things (IoT) devices on the blockchain.
Usually each blockchain consists of several networks, e.g. the Bitcoin blockchain consists of TestNet3 and MainNet. The first one is for testing new changes to the software and networks. Hence, it is not used for real transactions. MainNet is the blockchain used by everyone for secure transactions. Ethereum has a similar set of networks.
Altcoins usually use a modified version of the software used by another coin (e.g. the one of Bitcoin or the one of Ethereum), but basically build their own set of networks independent of the network of the original blockchain. However, there are exceptions, for example Namecoin uses the power of the Bitcoin network to do its mining to increase its chances of being a reliable blockchain for identity (Auxiliary Proof of Work (AuxPoW)).
The lightning network for Bitcoin proposes an additional network, which is strongly connected to the Bitcoin blockchain, but allows transactions in a much cheaper and faster way as they run on a side-chain or off-chain channel.
Also, the new Ethereum 2.0 blockchain, which migrates from Proof-of-Work (PoW) to Proof-of-Stake (PoS), will use two linked blockchains in parallel for at least some time, so that the PoS blockchain can be used safely and all potential risks can be evaluated without endangering transactions.
State of blockchain analysis
Existing methods for blockchain analysis have been largely based on blockchain explorers (see also at the end of this blog post), which are centrally organised, do not take into account security aspects of the analysis and are not flexible enough for new types of analysis. Aside from this, there has been little innovation related to blockchain analytics, especially on secure and flexible analysis architectures, as it is a very complex topic and difficult to do correctly.
What analysis can you do on Blockchain data?
There are numerous use cases. Here are some examples:
- Compare traditional payment systems with blockchains
- Economical and security analysis of consensus mechanisms needed to operate a blockchain.
- Blockchain economics – fees, actors (e.g. users, miners, stakers, mining pools, staking pools) and capacity
- Analysing decentralised automated financial markets (DeFi)
- Money laundering activities
- Voting results of blockchain software upgrades
- Donations analysis, e.g. several open source software, such as GnuPG, or public non profit web sites, such as the Internet Archive, support donation in cryptocurrencies
- Bribery activities
- Currency fraud activities
- Realtime analysis of digital goods markets (cf. SorareData for Sorare)
- Salary payments, some organisations, such as Internet Archive, pay part or the full salary of employees in cryptocurrency
- Network analysis – who is involved in a blockchain, such as developers, validators, miners or auditors. Power concentration is one of the important aspects
- Timeseries analysis of economic aspects in the blockchain, such as transaction fees, mining rate, stakes, difficulty, transaction execution speed, network governance structures (e.g. mining concentration), network voting on new versions of the software with new features
- Analysis of unusual data stored on the blockchain, such as images, messages
- Analysis of Dark Blockchain Data – unsuccessful transactions, forks due to the CAP theorem, general disagreements of network participants, spamming of fraudulent transactions
- Analysis of the Dark Web and use of cryptocurrencies
- Location mining of network participants based on IP addresses sending a transaction
- Analysis on resilience of a blockchain, such as
  - Tendency of forks in a blockchain and how quickly they are resolved (see a Bitcoin fork monitoring tool here). Forks can occur naturally due to the CAP-theorem or on purpose if there is a disagreement by the participants on the future of the blockchain. In the latter case the fork will not be resolved or only resolved much later if the participants agree to rejoin.
- Revealing anonymized participants in transactions
  - Based on addresses that are published somewhere else on the web (e.g. forums) and connected or connectable with the identity of a person or organisation
  - Based on inputs and outputs of anonymization actors on blockchains, such as Mixers
  - Using clustering algorithms to deduce transactions belonging to the same user
- Lost crypto asset analysis: How many crypto assets have been lost, i.e. their owner no longer has access to the key for them. Note: This analysis is impossible to do correctly or to estimate realistically, even though several publications try to do this. It is impossible to know from blockchain data whether an owner has lost their private key. Sometimes you will find stories in which a person claims to have lost a certain amount of cryptocurrency, but we do not know if this is true, and we cannot deduce from those single cases that make it to the news how this affects the population, i.e. the sample is too small and not representative
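To make the clustering use case from the list above a bit more concrete, here is a minimal sketch of one commonly discussed approach for Bitcoin-like blockchains: the common-input heuristic, which assumes that all input addresses of one transaction are controlled by the same user. The transaction layout and the address names are made up for illustration, and the heuristic can be wrong (e.g. for mixers), so treat the clusters as hypotheses and not as proof.

```python
from collections import defaultdict


def cluster_addresses(transactions):
    """Group addresses that appear together as inputs of a transaction."""
    parent = {}

    def find(addr):
        # Union-find lookup with path halving
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for tx in transactions:
        inputs = tx["inputs"]
        for addr in inputs:
            find(addr)  # register addresses, including single inputs
        for other in inputs[1:]:
            union(inputs[0], other)

    clusters = defaultdict(set)
    for addr in parent:
        clusters[find(addr)].add(addr)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical transactions: only the input addresses matter here.
sample = [
    {"inputs": ["addr1", "addr2"]},
    {"inputs": ["addr2", "addr3"]},
    {"inputs": ["addr4"]},
]
print(cluster_addresses(sample))
```

In this toy data, addr1, addr2 and addr3 end up in one cluster because addr2 links the first two transactions, while addr4 stays on its own.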
The Blockchain Analytics Process
The following diagram illustrates the process of blockchain analytics. We will address the data to be processed in the next section and focus in the following paragraphs on the individual steps.
I will explain the two main parts of the process in the subsequent paragraphs:
- Based on data directly related to processing and operating the blockchain (left side)
- Based on data related to external data connected to the blockchain data (right side)
Both can be interlinked to complement the information. For example, certain actors provide in the blockchain data their signature or „graffiti“ (see e.g. here). This signature can be linked with real information about the actor from the web provided intentionally or unintentionally by the actor. This can be also found in popular blockchain explorers.
Collection of information on the blockchain
Processing of information on the blockchain starts with collecting blockchain data. In the past this has been, and to some extent still is, rather easy. You use one of the many public open source technologies for the blockchain that you want to analyse and let the tool download and verify the blockchain. While it can consume a lot of space (e.g. Bitcoin and Ethereum are each several hundred gigabytes in size) and some time, it is still feasible.
However, running just one node to collect blockchain data might expose you to attacks that tamper with the information this one node receives. While it is very difficult to get incorrect transactions accepted on the full blockchain, an attacker might flood nodes used for blockchain analysis with wrong information so that they in turn produce wrong analyses, possibly published on the web readable for all (see also below for a more detailed description of the issue).
Furthermore, popular blockchain technologies have recently addressed the problem of network capacity and transaction speed by introducing new techniques, such as sharded chains or off-chain/side-chain transactions. The bottom line for analyses of those blockchains is that certain transactions might not be collected as they are related to the blockchain, but do not happen on the blockchain. While it is probably not possible to get all transactions, a more sophisticated deployment of analysis nodes might help to address this to some extent.
Finally, some data, such as dark blockchain data, cannot be collected without modifying the original client to keep such data or without also using the log files and network traces of those clients in the analysis.
Collection of external data
External data is all data not found on the blockchain but on the web. Examples are exchange rates of cryptocurrencies with traditional currencies (fiat money) or identities of blockchain actors. Other data could be the geocoding of IP addresses to locations or mailing lists/wikis/documents of developers writing the blockchain software or of actors within the blockchain discussing operational questions (e.g. „how to run a miner“ or „how to do staking“).
Further afield you find related news events that might have impacted transaction behaviour on the blockchain. Forums are another frequent source of information as they may contain blockchain addresses, and the identities behind those addresses can be linked via their forum activity.
Some organisations, such as the Internet Archive, accept donations in cryptocurrency. Their identity can be seen publicly on their website, which makes it possible to analyse who donates to them and where the donations flow to.
There is no real limit on what could be relevant for analysis and any type of analysis can be imagined (e.g. here for an analysis of Bitcoin transaction on the Darknet or here for donations to political campaigns in the United States).
Before the data can be analysed, it needs to be transformed. The challenges are difficult and differ between the two types of data.
Transforming Blockchain Data
Blockchain data has three issues. First, it is in a format useful for transactions, but highly inefficient for analysis. For example, to get the balance of an account you need to process the whole blockchain from the beginning up to the point in time for which you want the balance. The underlying format used by most blockchains is also not very efficient in the sense that you cannot skip blocks that are unnecessary for the analysis. While each blockchain node has to do this once to be able to participate in a blockchain, for analysis you have to do it every time you conduct an analysis. Blockchain nodes also usually store only the current state and not historical states, which are probably the most valuable for analysis. Additionally, blockchain nodes store this in databases, such as LevelDB or RocksDB, which are not suitable for analysis purposes. Hence, it needs to be converted to a format optimised for analysis, such as Apache ORC or Apache Parquet.
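The cost of the first issue can be illustrated with a small sketch. It assumes a strongly simplified, account-based ledger with made-up field names (`time`, `from`, `to`, `amount`); real blockchain formats are considerably more involved. The point is only that every balance query has to scan the transactions from the very beginning.

```python
def balance_at(transactions, account, until_time):
    """Replay all transfers up to until_time to obtain one account balance."""
    balance = 0
    for tx in transactions:  # always starts at the first transaction
        if tx["time"] > until_time:
            break  # assumes the transactions are ordered by time
        if tx["to"] == account:
            balance += tx["amount"]
        if tx["from"] == account:
            balance -= tx["amount"]
    return balance


# Hypothetical ledger with three transfers.
ledger = [
    {"time": 1, "from": "coinbase", "to": "alice", "amount": 50},
    {"time": 2, "from": "alice", "to": "bob", "amount": 20},
    {"time": 3, "from": "bob", "to": "alice", "amount": 5},
]
print(balance_at(ledger, "alice", 2))  # 30
print(balance_at(ledger, "alice", 3))  # 35
```

An analysis-friendly format avoids this repeated scan by storing derived values, such as balances per point in time, in a layout that can be filtered and aggregated directly.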
Second, the transactions and possibly complex smart contracts need to be executed/verified to get the so-called state data containing probably the more interesting information. Obviously it requires that the data is complete.
Third, data from blockchain clients related to the blockchain (network data, log data etc.) needs to be prepared and put in an efficient format for searching and/or analysis.
Transforming external data
Transforming external data also has multiple issues. First, you need to know where to find it. This is not obvious as there are many websites on the public web and also many on the dark net. Second, you need to know that it is correct and that no one seeded wrong information. In the end, we do not know who is really behind an address. Someone might post an innocent address for receiving donations for a good cause in a forum for trading illegal products or services. Third, there is a large heterogeneity of information that needs to be matched and integrated – a problem with many solutions, none of which is easy.
Analysing blockchain and external data
Once the data is transformed, it can be analysed. The complexity here really lies in understanding blockchain data in detail. This usually requires relying on documentation, but in most cases the analyst also needs to look into the source code of the blockchain software to understand it better. More complex blockchain software, such as Ethereum, can have complex contracts – each of them a knowledge domain in itself, ranging from various cryptocurrencies with very different properties over cryptokitties to agriculture exchanges.
Analysing external data also involves a lot of learning – each source can be very different and requires different expertise. For example, geocoding of IP addresses requires knowing much about the limitations of free databases for doing that and also knowledge of geospatial analysis. Understanding the dark net and its data is also very different from the public web. Additionally, it is not something taught regularly in schools or universities.
As you can see, analysis of this type of data requires a lot of different expertise and guidance by very senior experts. You cannot just hand over the data for an analysis and expect any valuable result within a week from someone who has never seen this data. In this case it would take months or years.
Finally, I recommend that experts create reproducible, high-quality data products that focus on specific analysis aspects and can be reused by others for their analyses. This will increase the quality of the analysis and help people with little prior knowledge to get into the topic more easily.
Blockchain data is the core of any blockchain technology. It describes blocks of transactions, links between blocks (the chain) and links between transactions (input and output). You may wonder why there need to be links between blocks as there are already links between transactions. Basically, the links between blocks confirm that a transaction has happened (it was included in a block).
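The links between blocks can be sketched in a few lines. This is a simplified illustration and not the block format of any real blockchain: each block simply stores the SHA-256 hash of the JSON representation of its predecessor, and the field names `prev_hash` and `transactions` are made up.

```python
import hashlib
import json


def block_hash(block):
    """Hash the block content (a simplified stand-in for a real header hash)."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def build_chain(list_of_tx_lists):
    """Create blocks where each block references the hash of the previous one."""
    chain, prev = [], "0" * 64
    for txs in list_of_tx_lists:
        block = {"prev_hash": prev, "transactions": txs}
        chain.append(block)
        prev = block_hash(block)
    return chain


def chain_is_consistent(chain):
    """Check that every block references the hash of its predecessor."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain = build_chain([["a->b:5"], ["b->c:2"], ["c->a:1"]])
print(chain_is_consistent(chain))        # True
chain[0]["transactions"] = ["a->b:500"]  # tamper with an old block
print(chain_is_consistent(chain))        # False
```

Changing a transaction in an old block changes the hash of that block, so the reference stored in the following block no longer matches.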
State Data is any data derived fully from the data in the blockchain by verifying (note: I omit the term execution on purpose as it might be confusing) the transactions. For example, some transactions move an asset, such as cryptocurrencies, from one account to another. State data would in this case be the balance of account X at time T. As you can imagine, state data can grow a lot, especially when taking into account the full history of a blockchain. This common problem is known as state explosion. However, it might not be as drastic as one might think as, for example, in Ethereum there are economic incentives to prevent it and further mechanisms address this. For example, using a lot of state requires a lot of fees. Obviously this limits to some extent what your contract can do for the sake of efficiency of the blockchain.
State data is very interesting for analysis as it contains timeseries of various aspects of the blockchain. Unfortunately, it has to be 1) verified (i.e. created), 2) continuously updated and 3) checked for correctness. More on this in the paragraph on collecting blockchain data.
State data can represent a huge variety of information from different domains as it represents (intermediate) results of smart contracts. Some of this information might not be documented and its logic only understood by people directly involved in those contracts. For example, just by looking at the contract and the states it generates you may not be able to determine if it is commodity related or a cryptocurrency market maker. Of course, you can make a more or less educated guess.
Creating a timeseries of all states of a blockchain can be very time consuming. For instance, Ethereum full nodes only store the current state and only Ethereum archival nodes store the full state history. The latter currently require approximately 10x more space.
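One way to think about the trade-off between storing only the current state and storing the full history is periodic snapshots. The following sketch is not how Ethereum clients work internally; it is a made-up, minimal example that stores a copy of all balances every few transfers and replays only the transfers after the nearest snapshot.

```python
import bisect


def apply_transfer(state, transfer):
    """Apply one (sender, receiver, amount) transfer to a balance dictionary."""
    sender, receiver, amount = transfer
    state[sender] = state.get(sender, 0) - amount
    state[receiver] = state.get(receiver, 0) + amount


def build_snapshots(transfers, every):
    """Return [(position, balances)] taken after every `every` transfers."""
    balances, snapshots = {}, [(0, {})]
    for position, transfer in enumerate(transfers, start=1):
        apply_transfer(balances, transfer)
        if position % every == 0:
            snapshots.append((position, dict(balances)))
    return snapshots


def state_after(transfers, snapshots, n):
    """State after the first n transfers, replaying only from the last snapshot."""
    positions = [position for position, _ in snapshots]
    start, snapshot = snapshots[bisect.bisect_right(positions, n) - 1]
    state = dict(snapshot)
    for transfer in transfers[start:n]:
        apply_transfer(state, transfer)
    return state


# Hypothetical transfers; "genesis" stands for newly issued units.
transfers = [
    ("genesis", "alice", 50),
    ("alice", "bob", 20),
    ("bob", "carol", 5),
    ("carol", "alice", 1),
]
snapshots = build_snapshots(transfers, every=2)
print(state_after(transfers, snapshots, 3))
```

More frequent snapshots mean faster historical queries but more storage, which is the same tension as between full nodes and archival nodes.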
Dark Blockchain data
Dark blockchain data is any data that is produced by a blockchain node, but that is not relevant enough in the long term to be stored on the blockchain. For example, log files contain information, such as IP addresses, about the other nodes with which a blockchain node has interacted. This information can be used for network analysis.
Other examples are blockchain forks, especially those that do not lead to new blockchains, i.e. that are not persistent. They happen naturally, by overload or due to stress on the blockchain nodes due to attacks, such as spamming the network with fraudulent transactions.
This data is not captured systematically by blockchain nodes and may require modifications to the blockchain software to be collected.
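If an analytics node does record all block announcements it observes, non-persistent forks can be found by looking for parent blocks with more than one child. The sketch below assumes a made-up record layout with `hash` and `parent` fields, e.g. as it could be parsed from log files of a modified client.

```python
from collections import defaultdict


def find_forks(observed_blocks):
    """Return parent hashes that have more than one observed child block."""
    children = defaultdict(set)
    for block in observed_blocks:
        children[block["parent"]].add(block["hash"])
    return {
        parent: sorted(hashes)
        for parent, hashes in children.items()
        if len(hashes) > 1
    }


# Hypothetical observations of one analytics node.
observed = [
    {"hash": "b1", "parent": "b0"},
    {"hash": "b2", "parent": "b1"},
    {"hash": "b2x", "parent": "b1"},  # competing block on top of b1
    {"hash": "b3", "parent": "b2"},
]
print(find_forks(observed))  # {'b1': ['b2', 'b2x']}
```

A standard node would typically forget the losing block b2x, which is why this kind of data needs to be captured explicitly.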
External data valuable for blockchain analytics can come from various sources. The following table provides some examples.
|Source|Description|
|---|---|
|Blockchain Explorer|Blockchain explorers (e.g. blockchain.com, etherscan.io or beaconcha.in) are probably historically the oldest source of blockchain data.|
|Crypto-Exchanges|Crypto-exchanges are usually run as central websites and they offer to exchange one (crypto-)currency against another. Very few run directly on the blockchain (e.g. Uniswap).|
|Real world investigations|Real world investigations may bring up insights related to blockchain activities that cannot be found on the web (e.g. in the case of Silk Road).|
|Events|Events happening in the world may be relatable to economic and other activities on the blockchain.|
|Dark Web|The dark web also provides sources of information that can be used for blockchain analysis as most sites there support payments in cryptocurrencies. Additionally, a network analysis of the dark web can provide additional insights.|
|Court decisions|Court decisions can be mined to see how they impacted blockchain activities.|
|Content|Blockchains allow managing content where the content itself is stored in a different network (see Swarm or IPFS).|
Privacy is an under-investigated aspect of blockchain analytics. Despite being public, blockchains do contain private data and thus are subject to the General Data Protection Regulation (GDPR). This may sound odd to people not dealing much with privacy, but it is logical. A person who publishes private data, e.g. a message in a social network, does not give consent that this data is automatically processed to create profiles or other analyses about that person. This might be different for celebrities or publicly known people.
Thinking about this, it would be a horrific vision that everything that has been published by a person can be used against this person. Hence, the right to be forgotten exists.
While it is often claimed that most blockchains are anonymous, it is surprisingly easy to determine the identities behind blockchain transactions based on external data.
It has been acknowledged in the literature that privacy is a problem under current regulation:
- Bitcoin nodes are joint controllers according to GDPR and thus need to demonstrate compliance according to GDPR
- Smart contracts are a form of solely automated processing according to GDPR
This may also affect whether secure blockchain analytics can be done legally – so you should always consult a data privacy expert when doing it, even if the data is public.
Distributed chains are an additional new challenge for blockchain analysis, particularly because one node in the blockchain does not store all the transactions, up to the extreme that certain transactions are only known between two parties. The mechanisms for distributing chains are not mutually exclusive – one can use one or more, depending on the requirements.
The underlying concept of sharded chains is basically that one node does not store the full blockchain, but only a part of it, called a shard. To avoid that a shard gets lost, blockchains supporting sharding try to control the number of shards, as this number should not grow too large. Furthermore, the more shards there are, the more difficult it is to get a transaction processed.
The number of shards defines the network capacity for transactions that can be processed at the same time.
Theoretically, a node can still receive information on all shards for the purpose of analysis, but verification will need more time and thus exclude „real-time“ analysis. Hence, for the purpose of analysis one may also want to use several nodes to collect the data.
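How the collection work could be split across several analytics nodes can be sketched as follows. Both rules are assumptions for illustration only: the mapping of an account to a shard by hashing and the round-robin assignment of shards to collector nodes are not taken from any specific blockchain, and the node names are made up.

```python
import hashlib


def shard_of(account, num_shards):
    """Map an account to a shard by hashing its identifier (illustrative rule)."""
    digest = hashlib.sha256(account.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def assign_shards(num_shards, collectors):
    """Distribute shards round-robin over the available analytics nodes."""
    plan = {name: [] for name in collectors}
    for shard in range(num_shards):
        plan[collectors[shard % len(collectors)]].append(shard)
    return plan


plan = assign_shards(8, ["node-a", "node-b", "node-c"])
print(plan)
print(shard_of("alice", 8))  # some shard number between 0 and 7
```

Each collector then only has to verify the shards assigned to it, which keeps the verification time per node lower than following all shards on a single node.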
Ethereum 2.0 wants to implement shards on its blockchain with initially 64 shards. There are several considerations for this, for example, if shards should only host data or also execute contracts. There are also further considerations of a hierarchy of nodes and what amount of data they store. This will be also relevant for blockchain analytics as it can help to determine the optimal network setup for those endeavours. Finally, there are a lot of different mechanisms suggested for Ethereum 2.0 to avoid attacks against the blockchain (cf. here). Additionally, new actors come into play, such as validators, slashers and beacon clients. Those should be complemented with blockchain analytics tools to get a valid view on the state of the blockchain. However, clearly blockchain analytics is much more difficult and complex to achieve in this scenario.
Aside from sharding there is also another mechanism, where transactions run off-chain, but with reference to the blockchain.
One prominent approach is the lightning network, which has initially been connected to Bitcoin, but can in theory also be used for other blockchains. In a lightning network a channel is opened on the blockchain between two participants for a specific sum of cryptocurrency. However, all transactions related to this sum happen off-chain in the lightning network, e.g. participant A sends 5 units of a cryptocurrency to participant B and after some time participant B sends 2 units of cryptocurrency to participant A.
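The example with participants A and B can be written down as a small sketch. It is a toy model and not the actual lightning protocol (no signatures, no commitment transactions); the initial deposit of 10 units is an assumption. It only shows that the intermediate payments change balances without touching the blockchain.

```python
class Channel:
    """Two-party payment channel; only opening and closing touch the blockchain."""

    def __init__(self, a, b, deposit_a, deposit_b):
        # The deposits are what would be recorded on-chain at opening
        self.balances = {a: deposit_a, b: deposit_b}
        self.off_chain_updates = 0

    def pay(self, sender, receiver, amount):
        if amount <= 0 or self.balances[sender] < amount:
            raise ValueError("invalid amount or insufficient channel balance")
        self.balances[sender] -= amount
        self.balances[receiver] += amount
        self.off_chain_updates += 1  # not visible on the blockchain

    def close(self):
        """Return the final balances, which would be settled on-chain."""
        return dict(self.balances)


channel = Channel("A", "B", deposit_a=10, deposit_b=0)
channel.pay("A", "B", 5)
channel.pay("B", "A", 2)
print(channel.close())            # {'A': 7, 'B': 3}
print(channel.off_chain_updates)  # 2
```

For blockchain analytics this means that only the opening and the final settlement are observable on-chain, while the two payments in between are not.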
This also works if for example participant A wants to send units of cryptocurrency to participant C, but does not have a channel with participant C. Participant A only needs to have a channel with other participants connected directly or indirectly to participant C. For example, if participant B is connected to participant C then participant B can route the transaction to participant C. The transactions are routed through a kind of onion routing network and thus difficult to trace.
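The idea of reaching participant C through participant B can be sketched as a path search over the channel graph. This is a simplification under stated assumptions: real lightning routing also considers channel capacities and fees and wraps the route in onion encryption, none of which is modelled here.

```python
from collections import deque


def find_route(channels, source, target):
    """Breadth-first search for a path of channels from source to target."""
    neighbours = {}
    for a, b in channels:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    queue, seen = deque([[source]]), {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of channels connects the two participants


# Hypothetical open channels between participants.
channels = [("A", "B"), ("B", "C"), ("C", "D")]
print(find_route(channels, "A", "C"))  # ['A', 'B', 'C']
print(find_route(channels, "A", "E"))  # None
```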
A lightning network provides a foundation for processing more transactions than any centralised system could potentially do. The original lightning network whitepaper claims that „trustless“ complex smart contract transactions can be executed on the lightning network. However, it can probably not support arbitrarily complex smart contracts. Moreover, this is clearly riskier than on a shard as the lightning network is not public and has several obfuscation layers. This might be extremely risky in case the contracts are linked to other assets outside the blockchain or used as collateral. Furthermore, if the routing changes during a transaction then it can also be more expensive than originally anticipated.
Blockchain analytics is also helpful in the case of lightning networks. However, it is even more difficult compared to sharding as the onion routing makes the analysis of those transactions inherently difficult. Here, external blockchain data can also be extremely valuable, for example, some participants provide transaction fee revenue on their website.
An architecture for secure collection of blockchain data
The following diagram illustrates one possible high-level architecture for secure collection of blockchain data.
In the diagram you can see different blockchain networks. For example, you can have a Bitcoin blockchain network and an Ethereum blockchain network depending on your analysis. You can participate in any number and type of blockchain networks.
Normal nodes are nodes that run the standard software for the corresponding blockchain. While those nodes can provide some of the data relevant for blockchain analytics, one may use dedicated special analytics nodes which collect the same data plus the additional data indicated above (e.g. log files or network traffic). Not all nodes are connected in the diagram, to indicate that it is impossible to collect all possible dark blockchain data and that there always remains some uncertainty as the blockchain network is open. There is a special reason why you want to have multiple special analytics nodes within one blockchain and why they need to be put in different data centres of different organisations, which I will present below.
The analytics aggregator collects all the information from special analytics nodes, crawls the public web and the dark web. The latter may require also to deploy several analytic nodes within the dark web as it is also distributed in nature.
In principle, the same architecture can be also used for sharded blockchains and off-chain transactions, but to get the same level of information more analytics nodes need to be deployed.
Blockchain Stream and Batch Analytics
So far I have said little about the timing of analytics, i.e. when we need the analytics results. This again depends on your use case. If you are, for example, trading cryptocurrencies then you may opt for near real-time analytics on the current situation to act fast. On the other hand, if you are interested in longer term timeseries or simulations of the future, you may opt for batch analysis running potentially for hours. These aspects are addressed by the lambda architecture.
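The basic idea of the lambda architecture can be sketched in a few lines: a batch layer recomputes a view over the full history, a speed layer tracks what has arrived since the last batch run, and queries merge both. The class and function names below are made up for illustration; in practice each layer would be a separate system rather than a Python object.

```python
from collections import Counter


def batch_view(historical_transactions):
    """Batch layer: recompute received totals per address over the full history."""
    totals = Counter()
    for tx in historical_transactions:
        totals[tx["to"]] += tx["amount"]
    return totals


class SpeedLayer:
    """Speed layer: incrementally track transactions not yet in the batch view."""

    def __init__(self):
        self.recent = Counter()

    def on_transaction(self, tx):
        self.recent[tx["to"]] += tx["amount"]


def query(batch, speed, address):
    """Serving layer: merge the batch view with the recent increments."""
    return batch[address] + speed.recent[address]


# Hypothetical data: two historical transfers and one that just arrived.
history = [{"to": "alice", "amount": 10}, {"to": "bob", "amount": 4}]
batch = batch_view(history)
speed = SpeedLayer()
speed.on_transaction({"to": "alice", "amount": 3})
print(query(batch, speed, "alice"))  # 13
```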
What is secure blockchain analytics?
I have already indicated above that one needs to think about secure blockchain analytics. This has several implications as I will explain now.
Collect the data relevant for blockchain analytics reliably: People make decisions based on blockchain analytics. For example, they may decide based on a certain analysis outcome to execute a transaction, move a contract to the next step or provide products/services outside the blockchain. Hence, it is crucial to collect the blockchain data in a reliable manner.
This is relatively easy in centralised systems where the person/organisation doing the analysis usually has full control over the system executing the transactions. Of course special care needs to be taken to avoid bugs or race conditions, but this is the same in blockchains. Nevertheless, in blockchains the nodes collecting the data might be subject to special flooding attacks by others, i.e. they see only the transactions of a few nodes and thus can be misinformed or informed with a delay. Hence, I explained above that you need to deploy several analytics nodes, ideally with IP addresses coming from different organisations, so that people cannot detect to which organisation an analytics node belongs. For instance, imagine you are a large bank that makes decisions based on blockchain analytics: if you are misinformed, you may make the wrong decision with catastrophic consequences, such as instability of financial markets or bankruptcy.
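One simple safeguard that follows from deploying several analytics nodes is to cross-check what they report before using the data. The sketch below compares the block hash that each node reports for the same block height; the node names, the hash values and the agreement threshold are assumptions for illustration.

```python
from collections import Counter


def agreed_block_hash(reports, min_agreement):
    """Return the hash reported by at least min_agreement nodes, else None."""
    if not reports:
        return None
    block_hash, votes = Counter(reports.values()).most_common(1)[0]
    return block_hash if votes >= min_agreement else None


def deviating_nodes(reports, accepted_hash):
    """List the nodes whose report differs from the accepted hash."""
    return sorted(node for node, h in reports.items() if h != accepted_hash)


# Hypothetical reports for one block height from four analytics nodes.
reports = {
    "node-dc1": "00ab",
    "node-dc2": "00ab",
    "node-dc3": "00ab",
    "node-dc4": "ffff",  # possibly isolated or fed with wrong data
}
accepted = agreed_block_hash(reports, min_agreement=3)
print(accepted)                            # 00ab
print(deviating_nodes(reports, accepted))  # ['node-dc4']
```

A deviating node is not necessarily under attack, as it may simply lag behind or sit on a short-lived fork, but it is a signal worth investigating before acting on its data.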
Financial transactions in centralised systems have complex data structures and they are also not always perfectly available for analysis (e.g. due to temporal differences between interrelated transactions). Blockchain analytics, though, also has significant complexity as usually one can express much more in a smart contract and more data from complex unstructured data sources must be sourced to do analysis.
Need for a common ontology for analysis
All blockchains are different with different underlying data models and so are the external data sources. At the moment, no ontology for blockchain data exists. This means each organisation wanting to do blockchain analytics needs to create it. While there are some standardisations by schema.org on financial products or financial services, they are not complete with respect to blockchain analytics. However, they serve as a good starting point.
Every analytics solution needs to have visualisations. The challenge here is that one needs to visualise potentially millions of transactions or millions of addresses and many different stakeholders potentially acting in different networks at the same time. Hence, the important part for your analysis software is that you can guide the analysis results by providing a story and not just a simple diagram.
Graphs consisting of nodes and edges are usually a popular choice for visualisation of complex relationships across networks. Users may zoom in or zoom out as well as navigate along the edges to explore the relations or do aggregations. Graphs are not good at representing time. This can be mitigated by providing a „play“ function where, like in a video, the graph is updated to the timepoint that is currently played.
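The data side of such a „play“ function is straightforward if every edge carries a timestamp. The following sketch only computes which nodes and edges are visible at each timepoint; the edge layout `(source, target, time)` is an assumption and the actual drawing is left to whatever visualisation library you use.

```python
def graph_at(edges, timepoint):
    """Return the nodes and edges that exist up to the given timepoint."""
    visible = [(src, dst) for src, dst, t in edges if t <= timepoint]
    nodes = sorted({node for edge in visible for node in edge})
    return nodes, visible


def play(edges, timepoints):
    """Yield one graph frame per timepoint, like the frames of a video."""
    for t in timepoints:
        yield t, graph_at(edges, t)


# Hypothetical transfers between addresses with the time they occurred.
edges = [("A", "B", 1), ("B", "C", 2), ("A", "C", 4)]
for t, (nodes, visible) in play(edges, [1, 2, 3, 4]):
    print(t, nodes, len(visible))
```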
Maps are a common visualisation for any transaction, contract or stakeholders involved in those. However, determining the location of a transaction or contract execution is difficult in blockchains as usually only the IP addresses of the nodes that are contacted are known (note: it is a common misunderstanding that the IP address is that of the user executing the transaction – this is wrong). Furthermore, it is not always easy to map an IP address to a location. While there are free databases that allow this at the country level, more detailed levels are usually commercial and require more frequent updates. The quality of those is unknown and difficult to assess without doing very sophisticated test setups.
However, maps are still useful, for example, to show where stakeholders are located (e.g. miners, stakers or the development team) and to illustrate relevant insights from external data sources (e.g. locations mentioned in forums or any other related location information).
Representing maps over time is also a challenging endeavour. One may visualise flows between entities or different compositions of entities over time (e.g. new accounts entering or old accounts leaving). Similarly to graphs, one can also imagine a kind of video player functionality to represent this.
Finally, there are plenty of visualisations useful for transaction analysis. The Vega Visualization Grammar is the source to find and develop any type of imaginable visualisation. Other work provides a view on the blockchain using 3D and virtual reality techniques.
Last, but not least, it always matters what message you want to convey with a visualisation. Visualisations can be misleading or simply wrong. Invest some time in literature research on this topic and in trying out different ones – be critical about them and document the limitations.
Software for Analysing Blockchains
Blockchain data nowadays amounts to hundreds of gigabytes, and even more if you need to take the states of complex contracts into account. Furthermore, the amount of external data grows as well. Especially the raw data (e.g. websites) can be huge before the relevant content is extracted.
Collecting blockchain data is still the domain of the blockchain-specific software that is also used for managing the transactions. The raw data is either taken directly from this software or extracted through REST and other interfaces in formats such as JSON. Such an interface was introduced around 2010 with a special patch to enable so-called blockchain explorers to collect this data.
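To give an impression of such an extraction, here is a minimal sketch that asks a node for a block in JSON form. It assumes a Bitcoin Core style node with the JSON-RPC interface enabled and uses its `getblockhash` and `getblock` methods; the URL and the credentials are placeholders you need to replace, and other blockchain software offers different interfaces.

```python
import base64
import json
import urllib.request


def rpc_payload(method, params, request_id=0):
    """Build the JSON-RPC request body for one call."""
    return {"jsonrpc": "1.0", "id": request_id, "method": method, "params": list(params)}


def call_node(url, user, password, method, params):
    """Send one JSON-RPC call to the node and return the 'result' field."""
    body = json.dumps(rpc_payload(method, params)).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    if reply.get("error"):
        raise RuntimeError(reply["error"])
    return reply["result"]


def fetch_block(url, user, password, height):
    """Fetch the block at `height` including its decoded transactions."""
    block_hash = call_node(url, user, password, "getblockhash", [height])
    # Verbosity 2 asks the node to include the decoded transactions.
    return call_node(url, user, password, "getblock", [block_hash, 2])
```

With a locally running node, a call could look like `fetch_block("http://127.0.0.1:8332/", "rpcuser", "rpcpassword", 0)`, where the user name and password are hypothetical. For bulk analysis, parsing the raw block files directly is usually much faster than issuing one request per block.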
After this patch, the first blockchain explorers appeared on the web (see here or here). They are still used today by most people doing blockchain analysis, as they provide an easy and convenient search interface to find specific blockchain actors or transactions. Later, those tools were enhanced with external data related to blockchains (e.g. exchange rates or the identity of actors).
The hadoopcryptoledger library is an open source library for blockchain analytics on data of the Ethereum and Bitcoin blockchains using Big Data tools, such as Apache Flink, Apache Spark, Apache Hive and Apache Hadoop.
There are many other open source software projects, but all are rather small and specific in scope.
Besides software largely parsing the blockchain and extracting state, very valuable analysis tools come from the software engineering domain for smart contracts, such as fuzzing, mutation testing, profiling or formal validation.
Further software for analysing blockchain data has been investigated here.
External Data from Public and Dark Web
Since analysing public external data basically means analysing data of any type and content, especially unstructured text, there is a large set of tools to choose from. I can therefore only refer to the large body of literature on this.
Collecting data from the dark web is also a topic in itself with a variety of tools and methods (e.g. here or here).
The Role of Cloud Offerings
Cloud offerings can play an important role in blockchain analytics. The data volume is already large when only blockchain data is taken into account. External and dark web data are several orders of magnitude larger. The infrastructure effort for collecting dark blockchain data or external data is huge. Furthermore, collecting it securely as described above is difficult without large data centres and standardised services. Thus, cloud offerings should be seriously considered when doing secure blockchain analysis.
Additionally, cloud infrastructure or your own data centres are needed to run compute-intensive artificial intelligence (AI) algorithms on the data to discover new patterns and relations. Federated learning could also be integrated directly into blockchain software, so that the cloud only analyses an aggregated view.
Instead of building the infrastructure for secure blockchain analysis yourself, you can leverage the capabilities of the many Software-as-a-Service (SaaS) solutions that are nowadays available on the market. They usually also include various external public and dark web data ready for analysis.
Nevertheless, many cloud offerings related to blockchain analytics are very opaque. It is unclear which analysis approach is taken, no verifiable open source software is used, and the data and its origin are unclear. Furthermore, there are clear conflicts of interest: many offerings (including blockchain explorers) also provide additional services, such as cryptocurrency exchanges or initial coin offering (ICO) services.
Obviously, using the cloud for analysis also contradicts the decentralised nature of blockchains. Thus, other means need to be investigated to avoid the centralisation of analysis (see also here on how to execute AI model training and prediction in smart contracts).
Blockchain analytics, despite having existed since around 2011, is still in its infancy. There are only a few small open source offerings, there is no harmonised data model for performing reproducible analysis, and the cloud offerings are opaque.
Nevertheless, the demand for secure blockchain analysis is clearly huge, as smart contracts and complex financial services are realised on the blockchain with large transaction volumes.
The upcoming upgrades of popular public blockchains not only mean a smart contract volume that is several orders of magnitude larger, but also imply that sophisticated secure blockchain analytics is needed to make them sustainable. Additionally, new types of smart contracts, so-called smart audit contracts, will automatically process insights from blockchain analytics on the blockchain itself.
Potentially this will mean that secure blockchain analytics aspects are taken into account for future blockchain software and network upgrades.