Tag: csv
-
Collaborative Data Science: About Storing, Reusing, Composing and Deploying Machine Learning Models
Why is this important? Machine learning has re-emerged in recent years because new Big Data platforms make it possible to train models on more data, to build more complex models, and to combine several models into an even more intelligent predictive/prescriptive analysis. This requires storing as well as exchanging machine learning models to enable…
-
Sneak Preview – HadoopOffice: Processing Office documents using the Hadoop Ecosystem – The example of Excel files
In this blog post I present a sneak preview of the hadoopoffice library, which will enable you to process Office files, such as MS Excel, using the Hadoop ecosystem, including Hive/Spark. It currently contains only an ExcelInputFormat, which is based on Apache POI. Additionally, it contains an example that demonstrates how an Excel input file…