Author: Jörn Franke
-
The Role of Improvement Proposals in Large-Scale Open Software Systems
Many organisations embark on complex software systems supporting critical business processes involving humans to deliver value. For instance, they may deploy a large data platform that is shared between various business divisions within the organisation and/or with other organisations. While there is often a lot of money invested when creating the software before the first…
-
From Structured Query Languages (SQL) to Dataframe Languages (DL)
Structured Query Languages (SQL) have existed since the 1970s and were first standardized around 1986 by the American National Standards Institute (ANSI). Their purpose was to provide a human-understandable language to query data in tables in database management systems. This means SQL is a domain-specific language. Much later they were also adopted to query…
-
Novel ways of providing identity to automated cross-cloud processes – Workload Identity Federation and SPIFFE
Growing cybercriminal activities led to a boom of access rights solutions for user access to data and systems. Recently they have been augmented further by the provisioning of FIDO2-based hardware tokens (also available as open source, such as Nitrokey) to ensure the identity of a user and then being able to provide the right…
-
Revisiting Big Data Formats: Apache Iceberg, Delta Lake and Apache Hudi
Years ago, novel Big Data formats such as Apache Parquet, Apache ORC or Apache Avro were the game changer for processing massive amounts of data efficiently, as I wrote in a previous blog post (aside from the Big Data platforms leveraging them). Nowadays we see the emergence of new Big Data formats, such as…
-
Provenance for Data, AI Model and Software Artifacts – Combining OIDC and short-lived private keys
Provenance (Wikipedia) is an important concept in information technology: it essentially says that a digital artifact, such as a dataset, an AI model or software, meets the expectations of the artifact consumer. Expectations can be of a different nature; for instance, they can describe how it was generated, that it has been subject to certain automated…
-
eBPF for modern cloud-based data centres
I have also published the source code related to this article on Codeberg (EU Git hosting) and GitHub (US Git hosting). Modern cloud data centres offer different levels of abstraction to run applications, such as virtual machines, containers, micro-VMs (see also my article on Unikernels), functions, virtual data-processing clusters and so on. As those abstractions…
-
A Study on using a Rust-based dynamic Module system in WebAssembly for processing Data
I have also published the source code for this on Codeberg (EU Git hosting) and GitHub (US Git hosting). Nowadays we have a plethora of programming languages and platforms at our fingertips. They have different advantages and disadvantages depending on the use case and preferences. Often many different combinations of components are used for data…
-
Modern Cloud Application Delivery: WASM and WASI
I described in a previous blog post that modularity will play a key role in future enterprise applications. This is demonstrated in the current trends of serverless functions or containerized architectures. However, those solutions are not perfect: Given the trend of many different computing architectures, such as ARM on servers, Internet of Things (IoT) Edge…
-
Semantic Versioning for Artificial Intelligence (AI) 1.0.0
Artificial Intelligence (AI) is increasingly becoming part of applications that cater for the needs of many people. While AI is part of software products, it has a very different velocity and less predictable needs for change. Especially if it addresses open-ended domains, such as natural language processing (NLP), where the content can change or…