Category: big data
-
From Structured Query Languages (SQL) to Dataframe Languages (DL)
Structured Query Languages (SQL) have existed since the 1970s and were first standardized around 1986 by the American National Standards Institute (ANSI). Their purpose was to provide a human-understandable language for querying data in tables in database management systems. This makes SQL a domain-specific language. Much later they were also adopted to query…
-
Revisiting Big Data Formats: Apache Iceberg, Delta Lake and Apache Hudi
Years ago, novel Big Data formats, such as Apache Parquet, Apache ORC or Apache Avro, were the game changer for processing massive amounts of data efficiently, as I wrote in a previous blog post (aside from the Big Data platforms leveraging them). Nowadays we see the emergence of new Big Data formats, such as…
-
eBPF for modern cloud-based data centres
I have also published the source code related to this article on Codeberg (EU Git hosting) and Github (US Git hosting). Modern cloud data centres offer different levels of abstraction to run applications, such as virtual machines, containers, micro-VMs (see also my article on Unikernels), functions, virtual data-processing clusters and so on. As those abstractions…
-
A Study on using a Rust-based dynamic Module system in WebAssembly for processing Data
I have also published the source code of this on Codeberg (EU Git hosting) and Github (US Git hosting). Nowadays we have a plethora of programming languages and platforms at our fingertips. They have different advantages and disadvantages depending on the use case and preferences. Often many different combinations of components are used for data…
-
Secure Blockchain Analytics
Blockchain analytics has become a trending topic in recent years. It is of interest not only for public blockchains, such as Bitcoin or Ethereum and their Altcoins, but also for private/permissioned blockchains based on various technologies. Nevertheless, there are many challenges involved, such as the large data volumes, the inefficient format for analytics, state…
-
AI Applications and Systems for Deep Logic and Probabilistic Networks
This blog post describes the integration of deep learning, logic and probabilistic reasoning to enable advanced artificial intelligence tasks. The combination of completely different sets of AI approaches will be one of the key advances supporting AI-driven business processes in the coming years. Furthermore, I describe challenges for operating such complex AI systems…
-
GPUs, FPGAs, TPUs for Accelerating Intelligent Applications
Intelligent applications are part of our everyday life. One observes a constant flow of new algorithms, models and machine learning applications. Some require ingesting a lot of data, some require a lot of compute resources and some address real-time learning. Dedicated hardware can thus support some of those, but not all. Many…
-
Collaborative Data Science: About Storing, Reusing, Composing and Deploying Machine Learning Models
Why is this important? Machine learning has re-emerged in recent years as new Big Data platforms provide the means to train models on more data, make them more complex and combine several models for an even more intelligent predictive/prescriptive analysis. This requires storing as well as exchanging machine learning models to enable…
-
Automated Machine Learning (AutoML) and Big Data Platforms
Although machine learning has existed for decades, the typical data scientist, as you would call the role today, would still have to go through a manual, labor-intensive process of extracting the data, cleaning, feature extraction, regularization, training, finding the right model, testing, selecting and deploying it. Furthermore, for most machine learning scenarios you do…
-
HadoopCryptoLedger library: a vision for the coming years
The first commit of the HadoopCryptoLedger was on 26 March 2016. Since then a lot of new functionality has been added, such as support for major Big Data platforms including Hive, Flink and Spark. Furthermore, besides Bitcoin, Altcoins based on Bitcoin (e.g. Namecoin, Litecoin or Bitcoin Cash) and Ethereum (including Altcoins) have…