data warehouse – Zukunft-Innovation-Technik (ZuInnoTe)

Big Data Analytics on Excel files using Hadoop/Hive/Flink/Spark

March 17, 2018

—

in analytics, data warehouse, flink, hive, office, tech

Today we have released HadoopOffice v1.1.0 with major enhancements:

- Based on the latest Apache POI 3.17
- Apache Hive: Query Excel files and write tables to Excel files using the Hive Serde
- Apache Flink support for Flink Table API and Flink DataSource/DataSink
- Signing and verification of signatures of Excel files
- Example to use the HadoopOffice library…

Hive & Bitcoin: Analytics on Blockchain data with SQL

Apr. 28, 2016

—

by

Jörn Franke

in analytics, big data, bitcoin, data warehouse, hive, tech

You can now analyze the Bitcoin Blockchain using Hive and the hadoopcryptoledger library with the new HiveSerde plugin. This lets you link any data that you have loaded into Hive with Bitcoin Blockchain data. For example, you can link Blockchain data with important events in history to determine what causes Bitcoin exchange rates to increase or…

Hive Optimizations with Indexes, Bloom-Filters and Statistics

July 25, 2015

—

by

Jörn Franke

in big data, data warehouse, hive, tech

This blog post describes how Storage Indexes, Bitmap Indexes, Compact Indexes, Aggregate Indexes, Covering Indexes/Materialized Views, Bloom-Filters and statistics can increase performance with Apache Hive to enable a real-time data warehouse. Furthermore, I will address how index paradigms change due to big data volumes. Generally it is recommended to use fewer traditional indexes, but focus on storage indexes…

Category: data warehouse
