Secure Blockchain Analytics

Blockchain analytics has become a trending topic in recent years. It is of interest not only for public blockchains, such as Bitcoin, Ethereum and their Altcoins, but also for private/permissioned blockchains based on various technologies. Nevertheless, there are many challenges involved, such as the large data volumes, the inefficient format for analytics, state…

AI Applications and Systems for Deep Logic and Probabilistic Networks

This blog post describes the integration of deep learning, logic and probabilistic reasoning to enable advanced artificial intelligence tasks. Combining these completely different sets of AI approaches will be one of the key advances for supporting AI-driven business processes in the coming years. Furthermore, I describe challenges for operating such complex AI systems…

GPUs, FPGAs, TPUs for Accelerating Intelligent Applications

Intelligent applications are part of our everyday life. One observes a constant flow of new algorithms, models and machine learning applications. Some require ingesting a lot of data, some require a lot of compute resources and some address real-time learning. Dedicated hardware capabilities can thus support some of those, but not all. Many…

Collaborative Data Science: About Storing, Reusing, Composing and Deploying Machine Learning Models

Why is this important? Machine learning has re-emerged in recent years because new Big Data platforms provide the means to train models on more data, to make them more complex, and to combine several models into an even more intelligent predictive/prescriptive analysis. This requires storing as well as exchanging machine learning models to enable…

Automated Machine Learning (AutoML) and Big Data Platforms

Although machine learning has existed for decades, the typical data scientist (as the role is called today) still has to go through a manual, labor-intensive process of extracting the data, cleaning, feature extraction, regularization, training, finding the right model, testing, selecting and deploying it. Furthermore, for most machine learning scenarios you do…

HadoopCryptoLedger library: a vision for the coming years

The first commit to the HadoopCryptoLedger library was made on 26 March 2016. Since then, a lot of new functionality has been added, such as support for major Big Data platforms including Hive / Flink / Spark. Furthermore, besides Bitcoin, Altcoins based on Bitcoin (e.g. Namecoin, Litecoin or Bitcoin Cash) and Ethereum (including Altcoins) have…

Ethereum & Analytics: Explore the blockchain using Hadoop, Hive, Flink and Spark

HadoopCryptoLedger release 1.1.0 added support for another well-known cryptocurrency: Ethereum and its Altcoins. As with its Bitcoin and Altcoin support, you can use the library with many different frameworks in the Hadoop ecosystem: Hadoop MR, Apache Hive, Apache Flink, Apache Spark and the Apache Spark Datasource API. Furthermore, you can use it with various…

Mapred vs MapReduce – The API question of Hadoop and impact on the Ecosystem

In this blog post I describe the difference between the mapred.* and mapreduce.* APIs in Hadoop with respect to custom InputFormats and OutputFormats. Additionally, I discuss the impact of having both APIs on the Hadoop ecosystem and related Big Data platforms, such as Apache Flink, Apache Hive and Apache Spark. Finally,…

Big Data Analytics on Bitcoin's first Altcoin: NameCoin

This blog post is about analyzing the Namecoin blockchain using different Big Data technologies based on the HadoopCryptoLedger library. Currently, this library enables you to analyze the Bitcoin blockchain and Altcoins based on Bitcoin (incl. segregated witness), such as Namecoin, Litecoin, Zcash etc., on Big Data platforms, such as Hadoop, Hive, Flink and Spark. A…

Templates, low footprint mode, improved integration with Spark for the HadoopOffice library for reading/writing Excel files on Big Data platforms

Although it may seem like only a small improvement, version 1.0.4 of the HadoopOffice library has a lot of new features for reading/writing Excel files: Templates, so you can define complex documents with diagrams or other features in MS Excel and fill them with data or formulas from your Big Data platform in…