HadoopOffice – A Vision for the coming Years

HadoopOffice is already since more than a year available (first commit: 16.10.2016). Currently it supports Excel formats based on the Apache POI parsers/writers. Meanwhile a lot of functionality has been added, such as: Support for .xlsx and .xls formats – reading and writing Encryption/Decryption Support Support for Hadoop mapred.* and mapreduce.* APIs Support for Spark… HadoopOffice – A Vision for the coming Years weiterlesen

Big Data Analytics on Bitcoin‘s first Altcoin: NameCoin

This blog post is about analyzing the Namecoin Blockchain using different Big Data technologies based on the HadoopCryptoLedger library. Currently, this library enables you to analyze the Bitcoin blockchain and Altcoins based on Bitcoin (incl. segregated witness), such as Namecoin, Litecoin, Zcash etc., on Big Data platforms, such as Hadoop, Hive, Flink and Spark. A… Big Data Analytics on Bitcoin‘s first Altcoin: NameCoin weiterlesen

Spending Time on Quality in Your Big Data Open Source Projects

Open source libraries are nowadays a critical part of the economy. They are used in commercial and non-commercial applications directly or indirectly affecting virtually any human being. Ensuring quality should be at the heart of each open source project. Verifying that an open source project ensures quality is mandatory for each stakeholder of this project,… Spending Time on Quality in Your Big Data Open Source Projects weiterlesen

Leverage the Power of Apache Flink to analyze the Bitcoin Blockchain

The hadoopcryptoledger library has been enhanced with a datasource for Apache Flink. This means you can use the Big Data processing framework Apache Flink to analyze the Bitcoin Blockchain. It also includes an example that counts the total number of transactions in the Bitcoin blockchain. Of course given the power of Apache Flink you can think… Leverage the Power of Apache Flink to analyze the Bitcoin Blockchain weiterlesen

Big Data Lab in the Cloud with Hadoop+Spark+R+Python

This is an update of the second big data lab for the cloud. Similar to previous versions, this document described how you can create a Big Data Lab in the cloud on Amazon EMR. Besides some major upgrades to the newest Amazon Hadoop AMI (3.6.0) Spark (1.3.0) and R, it includes now also the possibility… Big Data Lab in the Cloud with Hadoop+Spark+R+Python weiterlesen

Example projects for using various NoSQL and Big Data technologies

Recently, I published on github.com several example Java projects for using various NoSQL technologies: cassandra-tutorial : Apache Cassandra tutorial (Column-oriented database) mongodb-tutorial : Mongo DB tutorial (Document database) neo4j-tutorial : Neo4J (Graph Database) redis-tutorial : Redis (Key/Value Store) solr-tutorial : Apache SolrCloud (Search technology) Other example Java projects aim at standardized big data processing platforms:… Example projects for using various NoSQL and Big Data technologies weiterlesen