Zukunft-Innovation-Technik (ZuInnoTe) – Digitalize your business

Tag: xlsx

Big Data Analytics on Excel files using Hadoop/Hive/Flink/Spark

March 17, 2018, by Jörn Franke in analytics, data warehouse, flink, hive, office, tech

Today we have released HadoopOffice v1.1.0 with major enhancements:

- Based on the latest Apache POI 3.17
- Apache Hive: query Excel files and write tables to Excel files using the Hive SerDe
- Apache Flink: support for the Flink Table API and Flink DataSource/DataSink
- Signing and verification of signatures of Excel files
- Example to use the HadoopOffice library…

Reading/Writing Excel documents with the HadoopOffice library on Hadoop and Spark – First release

January 8, 2017, by Jörn Franke in analytics, big data, office, tech

Reading and writing office documents, such as Excel files, has always been challenging on big data platforms. Although many libraries exist for reading and writing office documents, they have never been properly integrated into Hadoop or Spark, which leads to considerable development effort. There are several use cases for using office documents jointly with big data technologies:…