Sneak Preview – HadoopOffice: Processing Office documents using the Hadoop Ecosystem – The example of Excel files

I present in this blog post the sneak preview of the hadoopoffice library that will enable you to process Office files, such as MS Excel, using the Hadoop Ecosystem including Hive/Spark. It currently contains only an ExcelInputFormat, which is based on Apache POI. Additionally, it contains an example that demonstrates how an Excel input file… Sneak Preview – HadoopOffice: Processing Office documents using the Hadoop Ecosystem – The example of Excel files weiterlesen