Big Data Analytics on Excel files using Hadoop/Hive/Flink/Spark
Today we have released HadoopOffice v1.1.0 with major enhancements: it is based on the latest Apache POI 3.17; Apache Hive: query Excel files and write tables to Excel files using the Hive SerDe; Apache Flink: support for the Flink Table API and Flink DataSource/DataSink; signing and verification of signatures of Excel files; example to use the HadoopOffice library…
Category: office
HadoopOffice – A Vision for the coming Years
HadoopOffice has been available for more than a year (first commit: 16.10.2016). Currently it supports Excel formats based on the Apache POI parsers/writers. In the meantime, a lot of functionality has been added, such as: support for the .xlsx and .xls formats (reading and writing); encryption/decryption support; support for the Hadoop mapred.* and mapreduce.* APIs; support for Spark…
Mapred vs MapReduce – The API question of Hadoop and impact on the Ecosystem
In this blog post I describe the difference between the mapred.* and mapreduce.* APIs in Hadoop with respect to custom InputFormats and OutputFormats. Additionally, I discuss the impact of having both APIs on the Hadoop ecosystem and related Big Data platforms, such as Apache Flink, Apache Hive and Apache Spark. Finally,…
Templates, low footprint mode, improved integration with Spark for the HadoopOffice library for reading/writing Excel files on Big data platforms
Although it may look like only a small improvement, version 1.0.4 of the HadoopOffice library has a lot of new features for reading/writing Excel files: templates, so you can define complex documents with diagrams or other features in MS Excel and fill them with data or formulas from your Big Data platform in…
Reading/Writing Excel documents with the HadoopOffice library on Hadoop and Spark – First release
Reading/writing office documents, such as Excel, has always been challenging on Big Data platforms. Although many libraries exist for reading/writing office documents, they have never been properly integrated into Hadoop or Spark, which leads to a lot of development effort. There are several use cases for using office documents jointly with Big Data technologies:…
Sneak Preview – HadoopOffice: Processing Office documents using the Hadoop Ecosystem – The example of Excel files
In this blog post I present a sneak preview of the hadoopoffice library, which will enable you to process Office files, such as MS Excel, using the Hadoop ecosystem, including Hive/Spark. It currently contains only an ExcelInputFormat, which is based on Apache POI. Additionally, it contains an example that demonstrates how an Excel input file…