Big Data Analytics on Excel files using Hadoop/Hive/Flink/Spark

Today we have released HadoopOffice v1.1.0 with major enhancements: Based on the latest Apache POI 3.17 Apache Hive: Query Excel files and write tables to Excel files using the Hive Serde Apache Flink support for Flink Table API and Flink DataSource/DataSink Signing and verification of signatures of Excel files Example to use the HadoopOffice library… Big Data Analytics on Excel files using Hadoop/Hive/Flink/Spark weiterlesen

HadoopOffice – A Vision for the coming Years

HadoopOffice is already since more than a year available (first commit: 16.10.2016). Currently it supports Excel formats based on the Apache POI parsers/writers. Meanwhile a lot of functionality has been added, such as: Support for .xlsx and .xls formats – reading and writing Encryption/Decryption Support Support for Hadoop mapred.* and mapreduce.* APIs Support for Spark… HadoopOffice – A Vision for the coming Years weiterlesen

HadoopCryptoLedger library a vision for the coming Years

The first commit of the HadoopCryptoLedger has been on 26th March of 2016. Since then a lot of new functionality has been added, such as support for major Big Data platforms including Hive / Flink / Spark. Furthermore, besides Bitcoin, Altcoins based on Bitcoin (e.g. Namecoin, Litecoin or Bitcoin Cash) and Ethereum (including Altcoins) have… HadoopCryptoLedger library a vision for the coming Years weiterlesen

Ethereum & Analytics: Explore the blockchain using Hadoop, Hive, Flink and Spark

HadoopCryptoLedger release 1.1.0 added support for another well-known cryptocurrency: Ethereum and its Altcoins. Of course similar to its Bitcoin & Altcoin support you can use the library with many different frameworks in the Hadoop ecosystem: Hadoop MR Apache Hive Apache Flink Apache Spark and Apache Spark Datasource API Furthermore, you can use it with various… Ethereum & Analytics: Explore the blockchain using Hadoop, Hive, Flink and Spark weiterlesen

Mapred vs MapReduce – The API question of Hadoop and impact on the Ecosystem

I will describe in this blog post the difference between the mapred.* and mapreduce.* API in Hadoop with respect to the custom InputFormats and OutputFormats. Additionally I will write on the impact of having both APIs on the Hadoop Ecosystem and related Big Data platforms, such as Apache Flink, Apache Hive and Apache Spark. Finally,… Mapred vs MapReduce – The API question of Hadoop and impact on the Ecosystem weiterlesen

Big Data Analytics on Bitcoin‘s first Altcoin: NameCoin

This blog post is about analyzing the Namecoin Blockchain using different Big Data technologies based on the HadoopCryptoLedger library. Currently, this library enables you to analyze the Bitcoin blockchain and Altcoins based on Bitcoin (incl. segregated witness), such as Namecoin, Litecoin, Zcash etc., on Big Data platforms, such as Hadoop, Hive, Flink and Spark. A… Big Data Analytics on Bitcoin‘s first Altcoin: NameCoin weiterlesen

Spending Time on Quality in Your Big Data Open Source Projects

Open source libraries are nowadays a critical part of the economy. They are used in commercial and non-commercial applications directly or indirectly affecting virtually any human being. Ensuring quality should be at the heart of each open source project. Verifying that an open source project ensures quality is mandatory for each stakeholder of this project,… Spending Time on Quality in Your Big Data Open Source Projects weiterlesen

Templates, low footprint mode, improved integration with Spark for the HadoopOffice library for reading/writing Excel files on Big data platforms

Although it seems to be that it was only a small improvement, version 1.0.4 of the HadoopOffice library has a lot of new features for reading/writing Excel files: Templates, so you can define complex documents with diagrams or other features in MSExcel and fill it with data or formulas from your Big Data platform in… Templates, low footprint mode, improved integration with Spark for the HadoopOffice library for reading/writing Excel files on Big data platforms weiterlesen

Leverage the Power of Apache Flink to analyze the Bitcoin Blockchain

The hadoopcryptoledger library has been enhanced with a datasource for Apache Flink. This means you can use the Big Data processing framework Apache Flink to analyze the Bitcoin Blockchain. It also includes an example that counts the total number of transactions in the Bitcoin blockchain. Of course given the power of Apache Flink you can think… Leverage the Power of Apache Flink to analyze the Bitcoin Blockchain weiterlesen

Reading/Writing Excel documents with the HadoopOffice library on Hadoop and Spark – First release

Reading/Writing office documents, such as Excel, has been always challenging on Big data platforms. Although many libraries exist for reading/writing office documents, they have never been really integrated in Hadoop or Spark and thus lead to a lot of development efforts. There are several use cases for using office documents jointly with Big data technologies:… Reading/Writing Excel documents with the HadoopOffice library on Hadoop and Spark – First release weiterlesen