Category: cloud
-
Novel ways of providing identity to automated cross-cloud processes – Workload Identity Federation and SPIFFE
Growing cybercriminal activity has led to a boom in access-rights solutions for user access to data and systems. Recently, these have been further augmented by the provisioning of FIDO2-based hardware tokens (also available as open source, such as Nitrokey) to ensure the identity of a user and then being able to provide the right…
-
Revisiting Big Data Formats: Apache Iceberg, Delta Lake and Apache Hudi
Novel Big Data formats, such as Apache Parquet, Apache ORC or Apache Avro, were a game changer years ago for processing massive amounts of data efficiently, as I wrote in a previous blog post (aside from the Big Data platforms leveraging them). Nowadays we see the emergence of new Big Data formats, such as…
-
Provenance for Data, AI Model and Software Artifacts – Combining OIDC and short-lived private keys
Provenance (Wikipedia) is an important concept in information technology: it essentially says that a digital artifact, such as a dataset, an AI model or software, meets the expectations of the artifact consumer. Expectations can be of a different nature; for instance, it can describe how it was generated, that it has been subject to certain automated…
-
eBPF for modern cloud-based data centres
I have also published the source code related to this article on Codeberg (EU Git hosting) and GitHub (US Git hosting). Modern cloud data centres offer different levels of abstraction to run applications, such as virtual machines, containers, micro-VMs (see also my article on Unikernels), functions, virtual data-processing clusters and so on. As those abstractions…
-
A Study on using a Rust-based dynamic Module system in WebAssembly for processing Data
I have also published the source code of this on Codeberg (EU Git hosting) and GitHub (US Git hosting). Nowadays we have a plethora of programming languages and platforms at our fingertips. They have different advantages and disadvantages depending on the use case and preferences. Often many different combinations of components are used for data…
-
Modern Cloud Application Delivery: WASM and WASI
I described in a previous blog post that modularity will play a key role in future enterprise applications. This is demonstrated in the current trends of serverless functions or containerized architectures. However, those solutions are not perfect: Given the trend of many different computing architectures, such as ARM on servers, Internet of Things (IoT) Edge…
-
Big Data Lab in the Cloud with Hadoop+Spark+R+Python
This is an update of the second Big Data Lab for the cloud. Similar to previous versions, this document describes how you can create a Big Data Lab in the cloud on Amazon EMR. Besides some major upgrades to the newest Amazon Hadoop AMI (3.6.0), Spark (1.3.0) and R, it now also includes the possibility…
-
Update: Next Generation Big Data Lab V2 in the Cloud
Recently, I presented the first version of the Big Data Lab in the cloud. I have now extended that version, keeping most of its features while providing upgrades for important software components. It still runs on Amazon EMR, but with the newest Amazon AMI (including Amazon Linux). It now features…
-
Example projects for using various NoSQL and Big Data technologies
Recently, I published several example Java projects on github.com for using various NoSQL technologies: cassandra-tutorial: Apache Cassandra tutorial (column-oriented database); mongodb-tutorial: MongoDB tutorial (document database); neo4j-tutorial: Neo4j (graph database); redis-tutorial: Redis (key/value store); solr-tutorial: Apache SolrCloud (search technology). Other example Java projects aim at standardized big data processing platforms:…