Tag: big data
-
Revisiting Big Data Formats: Apache Iceberg, Delta Lake and Apache Hudi
Novel Big Data formats such as Apache Parquet, Apache ORC, or Apache Avro were a game changer years ago for processing massive amounts of data efficiently, as I wrote in a previous blog post (alongside the Big Data platforms leveraging them). Nowadays we see the emergence of new Big Data formats, such as…
-
A Study on using a Rust-based dynamic Module system in WebAssembly for processing Data
I have also published the source code for this study on Codeberg (EU Git hosting) and GitHub (US Git hosting). Nowadays we have a plethora of programming languages and platforms at our fingertips. They have different advantages and disadvantages depending on the use case and preferences. Often many different combinations of components are used for data…
-
HadoopOffice – A Vision for the coming Years
HadoopOffice has been available for more than a year (first commit: 16.10.2016). Currently it supports Excel formats based on the Apache POI parsers/writers. Since then, a lot of functionality has been added, such as: support for .xlsx and .xls formats (reading and writing); encryption/decryption support; support for the Hadoop mapred.* and mapreduce.* APIs; support for Spark…