analytics – Zukunft-Innovation-Technik (ZuInnoTe)

From Structured Query Languages (SQL) to Dataframe Languages (DL)

Feb. 12, 2024

—

by

Structured Query Languages (SQL) have existed since the 1970s and were first standardized around 1986 by the American National Standards Institute (ANSI). Their purpose was to provide a human-understandable language for querying data in tables in database management systems. This makes SQL a domain-specific language. Much later they were also adopted to query…

Revisiting Big Data Formats: Apache Iceberg, Delta Lake and Apache Hudi

Aug. 5, 2023

—

by

Jörn Franke

in analytics, big data, cloud, flink, hadoop, hive, spark

Novel Big Data formats, such as Apache Parquet, Apache ORC or Apache Avro, were a game changer years ago for processing massive amounts of data efficiently, as I wrote in a previous blog post (aside from the Big Data platforms leveraging them). Nowadays we see the emergence of new Big Data formats, such as…

The Question of Maintenance of pre-trained Machine Learning Embeddings

Feb. 28, 2021

—

by

Jörn Franke

in analytics, artificial intelligence, business, data science, machine learning

I will address in this post the issue of maintenance of large pretrained embeddings within Artificial Intelligence (AI) services. While this issue has some links to ethical aspects (see for example the European Commission’s guidelines on trustworthy AI or here), the focus here is on maintainability of those embeddings as part of MLOps. Software Maintenance…

Secure Blockchain Analytics

Jan. 20, 2021

—

by

Jörn Franke

in analytics, big data, bitcoin, blockchain, business, data science, ethereum

Blockchain analytics has become a trending topic in recent years. This topic is of interest not only for public blockchains, such as Bitcoin or Ethereum and their Altcoins, but also for private/permissioned blockchains based on various technologies. Nevertheless, there are many challenges involved, such as the large data volumes, the inefficient format for analytics, state…

Big Data Analytics on Excel files using Hadoop/Hive/Flink/Spark

Mar. 17, 2018

—

by

Jörn Franke

in analytics, data warehouse, flink, hive, office, tech

Today we have released HadoopOffice v1.1.0 with major enhancements: based on the latest Apache POI 3.17; Apache Hive: query Excel files and write tables to Excel files using the Hive Serde; Apache Flink support for the Flink Table API and Flink DataSource/DataSink; signing and verification of signatures of Excel files; example to use the HadoopOffice library…

HadoopOffice – A Vision for the coming Years

Jan. 2, 2018

—

by

Jörn Franke

in analytics, flink, hive, office, streaming, tech

HadoopOffice has been available for more than a year (first commit: 16.10.2016). Currently it supports Excel formats based on the Apache POI parsers/writers. Meanwhile, a lot of functionality has been added, such as: support for .xlsx and .xls formats (reading and writing); encryption/decryption support; support for the Hadoop mapred.* and mapreduce.* APIs; support for Spark…

HadoopCryptoLedger library a vision for the coming Years

Dec. 29, 2017

—

by

Jörn Franke

in analytics, big data, blockchain, flink, hadoop, hive, spark, streaming, tech

The first commit of HadoopCryptoLedger was on 26 March 2016. Since then a lot of new functionality has been added, such as support for major Big Data platforms including Hive / Flink / Spark. Furthermore, besides Bitcoin, Altcoins based on Bitcoin (e.g. Namecoin, Litecoin or Bitcoin Cash) and Ethereum (including Altcoins) have…

Ethereum & Analytics: Explore the blockchain using Hadoop, Hive, Flink and Spark

Nov. 14, 2017

—

by

Jörn Franke

in analytics, big data, tech

HadoopCryptoLedger release 1.1.0 added support for another well-known cryptocurrency: Ethereum and its Altcoins. Of course, similar to its Bitcoin & Altcoin support, you can use the library with many different frameworks in the Hadoop ecosystem: Hadoop MR, Apache Hive, Apache Flink, Apache Spark and the Apache Spark Datasource API. Furthermore, you can use it with various…

Mapred vs MapReduce – The API question of Hadoop and impact on the Ecosystem

Oct. 20, 2017

—

by

Jörn Franke

in analytics, big data, flink, graph, hive, office, spark, tech

In this blog post I will describe the difference between the mapred.* and mapreduce.* APIs in Hadoop with respect to custom InputFormats and OutputFormats. Additionally, I will discuss the impact of having both APIs on the Hadoop Ecosystem and related Big Data platforms, such as Apache Flink, Apache Hive and Apache Spark. Finally,…

Big Data Analytics on Bitcoin’s first Altcoin: NameCoin

Oct. 10, 2017

—

by

Jörn Franke

in altcoin, analytics, big data, bitcoin, flink, hive, tech

This blog post is about analyzing the Namecoin Blockchain using different Big Data technologies based on the HadoopCryptoLedger library. Currently, this library enables you to analyze the Bitcoin blockchain and Altcoins based on Bitcoin (incl. segregated witness), such as Namecoin, Litecoin, Zcash etc., on Big Data platforms, such as Hadoop, Hive, Flink and Spark. A…

Category: analytics