I have also published the source code related to this article on Codeberg (EU Git hosting) and GitHub (US Git hosting).
Modern cloud data centres offer different levels of abstraction for running applications, such as virtual machines, containers, micro-VMs (see also my article on Unikernels), functions, virtual data-processing clusters and so on. As those abstractions go beyond traditional infrastructure, they impose new challenges, for example in networking, security, performance and storage. Isolation is especially crucial for reducing the attack surface, increasing data privacy and enabling true multi-tenancy in the face of often higher complexity.
I focus here on a specific level of isolation that is enabled by an increasingly popular feature of the Linux kernel: the extended Berkeley Packet Filter (eBPF). This concept, complementary to the cgroups kernel isolation feature, makes it possible to reach any level of abstraction for advanced cloud infrastructures by making the behaviour of the Linux kernel programmable without changing the kernel source code. This provides a great deal of flexibility and avoids the lengthy process of getting something integrated directly into the Linux kernel source code.
I will start in the following section with an overview of what eBPF is. Then I elaborate on some of its use cases. As eBPF is functionality at the heart of the operating system, the kernel, I will briefly investigate when using it makes sense. Nowadays the Linux kernel provides a plethora of eBPF program types, and I will give a synopsis of some of the more common ones. We then turn to practice: I present the typical architecture of an eBPF application and the security aspects of running one. I finish with some well-known libraries and frameworks that help with implementing eBPF applications and a case study on implementing eBPF applications in Rust with aya-rs.
While eBPF is extremely powerful, it can also introduce complexity and requires very skilled developers, especially for use cases that go beyond a single host/kernel and involve a distributed system of multiple kernel instances/nodes.
What is eBPF?
eBPF has its roots in the classic Berkeley Packet Filter (BPF), but the two are fundamentally different. eBPF programs extend the Linux kernel: they are small programs, written in a special instruction set, that are interpreted (or just-in-time compiled) by a special virtual machine inside the kernel. They are loaded dynamically into the kernel and communicate with a user space application. eBPF first appeared in the mainline kernel in 2014, and new features have been added in kernel releases regularly since then.
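To make the idea of "small programs interpreted in a virtual machine" more tangible, here is a toy sketch in Rust. It is deliberately not the real eBPF instruction set: real eBPF has eleven 64-bit registers, a much richer instruction set and a verifier that checks every program before it runs. The `Insn` enum, the `run` function and the example program are hypothetical names made up for this illustration. The sketch only shows the principle: a host interprets a tiny, restricted program against some input and gets a result back.

```rust
// Toy register machine, loosely inspired by the idea behind eBPF.
// NOT the real eBPF instruction set; every name here is illustrative.

#[derive(Clone, Copy, Debug)]
enum Insn {
    /// regs[dst] = imm
    MovImm { dst: usize, imm: u64 },
    /// regs[dst] = packet[off], or 0 if `off` is out of bounds
    LoadByte { dst: usize, off: usize },
    /// If regs[reg] == imm, skip the next `skip` instructions
    JumpIfEq { reg: usize, imm: u64, skip: usize },
    /// Stop and return regs[0] as the result
    Exit,
}

/// Interprets `prog` against `packet`.
/// Returns None for an invalid register or a program that ends without Exit.
/// Jumps only go forward, so every program terminates.
fn run(prog: &[Insn], packet: &[u8]) -> Option<u64> {
    let mut regs = [0u64; 4];
    let mut pc = 0usize;
    while pc < prog.len() {
        match prog[pc] {
            Insn::MovImm { dst, imm } => *regs.get_mut(dst)? = imm,
            Insn::LoadByte { dst, off } => {
                *regs.get_mut(dst)? = u64::from(packet.get(off).copied().unwrap_or(0));
            }
            Insn::JumpIfEq { reg, imm, skip } => {
                if *regs.get(reg)? == imm {
                    pc = pc.saturating_add(skip);
                }
            }
            Insn::Exit => return Some(regs[0]),
        }
        pc = pc.saturating_add(1);
    }
    None
}

/// Example program: return 1 if the first byte is 0x45
/// (IPv4 with a 20-byte header), otherwise 0.
fn ipv4_check_program() -> Vec<Insn> {
    vec![
        Insn::LoadByte { dst: 1, off: 0 },
        Insn::MovImm { dst: 0, imm: 0 },
        Insn::JumpIfEq { reg: 1, imm: 0x45, skip: 1 },
        Insn::Exit,
        Insn::MovImm { dst: 0, imm: 1 },
        Insn::Exit,
    ]
}

fn main() {
    let prog = ipv4_check_program();
    println!("{:?}", run(&prog, &[0x45, 0x00])); // prints Some(1)
    println!("{:?}", run(&prog, &[0x60, 0x00])); // prints Some(0)
}
```

Restricting the toy machine to forward jumps and bounds-checked loads is a design choice that mirrors, in a very simplified way, why kernel-side programs can be run with less risk than arbitrary kernel code.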
eBPF programs can be loaded at runtime; they do not need to be compiled into the kernel or loaded as a kernel module. Getting functionality accepted into the kernel source code takes a long time nowadays, so eBPF programs reach Linux distributions and end users much sooner. Furthermore, because they run in a safe virtual environment inside the kernel, they are much less risky to run than functionality implemented directly in the kernel or as a kernel module. Additionally, they can be efficiently monitored and managed by a user space application (e.g. a command line tool).
Most eBPF programs focus on observing. Some can make decisions (e.g. about network traffic). The real power, however, comes from the connection to a user space program that can, based on the observations, provide further information to the eBPF program or take any type of action (e.g. terminating an application, reducing the CPU usage of an application etc.).
Virtually any functionality inside the kernel can be extended by eBPF programs and I will later explain some of the functionality that can be extended.
eBPF has been extremely successful in cloud data centres and essentially enables cloud operating systems (systems spawning many different resources) with increased security, reliability and observability.
What are the use cases?
Traditionally, there were many different tools that required specific extensions in the Linux kernel, such as perf, tcpdump and others. Furthermore, there were specific drivers in the kernel to allow, for instance, software defined networks and other types of virtualization.
While these approaches work, they were only available to specific tools and were only changed when those tools required it. Furthermore, changes to the Linux kernel can take some time to be integrated, and it takes much longer still to update all devices to a kernel version that supports the change, if this is possible at all.
With eBPF those use cases are still possible, but any software can leverage them using standardized eBPF programs. These are also more performant and can interact in simpler ways across different technologies (e.g. compute, network or storage). The aforementioned applications have also switched to eBPF. Furthermore, changes to or adaptations of logic in the Linux kernel can often be made without changing the Linux kernel itself: simply load a changed eBPF program at runtime and that is all that needs to be done.
However, it does not stop there. Thanks to this standardization, more complex systems can be built that leverage multiple types of eBPF programs (see below), which leaves few limits on what can be built.
For example, cloud data centres of large tech companies deploy changes in a cost-efficient way that were not possible before, e.g.:
- Meta enforces transit encryption at scale in a heterogeneous application landscape
- Netflix does network analytics in a multi-cloud environment
- Cilium uses eBPF to „provide high-performance networking, multi-cluster and multi-cloud capabilities, advanced load balancing, transparent encryption, extensive network security capabilities, transparent observability, and much more“
- AWS uses eBPF for securing workloads and improved networking
- Software-defined networking by Microsoft or Google
- DDoS protection at Cloudflare
- Continuous profilers in distributed systems without manual collection of traces
- Implement a specialised memory swapping approach for memory intensive databases
- Behavioural analysis of parts of applications to identify and prevent malicious behaviour
- Virtualizing remote storage in a secure fashion
- Many more
eBPF has gained some of its capabilities only recently, so a lot of software only scratches the surface of what is possible. Possible future use cases include:
- Control plane hardening of cloud APIs – eBPF allows fine-grained control at the network and process level, which machine learning applications can support by detecting anomalous behaviour and taking automated action
- Virtual battery – executing computation when renewable energy is available nearby (e.g. running an AI solution locally to classify your pictures, or controlling the energy consumption of cloud workloads)
- Modularization at runtime without needing to modularize at design time – one can learn which functionality of a library a certain application actually uses and allow access only to that functionality; all other functionality is removed from memory, and calling it leads to an error or to lazy loading
What is the challenge?
eBPF is extremely powerful and one can combine different programs to deliver a superior cloud datacentre experience. It requires a lot of Linux knowledge, strong programming skills in system languages, such as C(++) or Rust, and experience in designing, testing and troubleshooting distributed systems.
However, as Kubernetes shows, for example, such a system can be badly designed and introduce a lot of complexity for users and developers who often only want to deploy simple systems consisting of a couple of applications. This complexity in turn makes it unreliable and insecure. Additionally, it has proven insufficient for multi-tenant and secure applications (as even the Kubernetes project itself explains).
Finding the right level of complexity, that is, exposing a simple interface for managing a set of eBPF applications for the use cases mentioned above, is not trivial. The interface should be simple, and different stakeholders require different levels of abstraction, not a single one that serves none of them well because it has itself become complex and difficult to manage.
System complexity is generally determined by the number of elements and the number of relations between them, and the art is to reduce both. Elements can be hardware and software components. Relations are the communication between elements through implicit and explicit interfaces, including cross-cutting concerns such as security, reliability and observability.
Some types of eBPF programs
eBPF makes it possible to modify many aspects of the Linux kernel. There are individual program types, such as:
- eXpress Data Path (XDP)
- traffic control (tc)
- sockets (filter)
- lightweight tunnels (LWT)
- Applications/System calls – this is very powerful as one can „hook“ into virtually any kernel, application or library function
- Linux Security Modules
- … many more and continuously evolving
Different program types have access to different contexts and can take different actions in relation to that context. For instance, a traffic control (TC) classifier has access to a view of the sk_buff kernel structure, while an XDP program has access to a view described by the xdp_md kernel structure (e.g. Linux 6.0 definition). Both can take different actions related to their context (e.g. XDP vs TC), implement custom logic and communicate with their user space program (see below). Furthermore, a traffic classifier can instruct the kernel to do different things with a packet than an XDP program can. These are just examples from the networking eBPF programs. You have to study the kernel to find out more about other types of eBPF programs.
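The difference in actions can be sketched in plain Rust. The numeric verdict values below mirror the kernel's UAPI headers (`enum xdp_action` in linux/bpf.h and the `TC_ACT_*` constants in linux/pkt_cls.h). Everything else is an assumption made for illustration: the functions `ipv4_protocol`, `xdp_verdict`, `tc_verdict` and `sample_frame` are hypothetical, the "drop UDP" policy is arbitrary, and the code runs in user space on a byte slice instead of on a real kernel context. It also assumes untagged Ethernet frames (no VLAN header).

```rust
#![allow(dead_code)]

// XDP verdicts, values as in `enum xdp_action` (linux/bpf.h).
#[repr(u32)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum XdpAction { Aborted = 0, Drop = 1, Pass = 2, Tx = 3, Redirect = 4 }

// A subset of the TC verdicts, values as in linux/pkt_cls.h.
#[repr(i32)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum TcAction { Unspec = -1, Ok = 0, Shot = 2, Redirect = 7 }

const ETH_HDR_LEN: usize = 14;
const ETHERTYPE_IPV4: u16 = 0x0800;
const IPPROTO_UDP: u8 = 17;

/// Returns the IPv4 protocol number of an Ethernet frame, or None if the
/// frame is too short or not IPv4. Every access is bounds-checked, similar
/// in spirit to the checks the eBPF verifier demands before packet access.
fn ipv4_protocol(frame: &[u8]) -> Option<u8> {
    let ethertype = u16::from_be_bytes([*frame.get(12)?, *frame.get(13)?]);
    if ethertype != ETHERTYPE_IPV4 {
        return None;
    }
    // The protocol field is at byte offset 9 of the IPv4 header.
    frame.get(ETH_HDR_LEN + 9).copied()
}

/// Hypothetical policy expressed with XDP verdicts: drop UDP, pass the rest.
fn xdp_verdict(frame: &[u8]) -> XdpAction {
    match ipv4_protocol(frame) {
        Some(IPPROTO_UDP) => XdpAction::Drop,
        _ => XdpAction::Pass,
    }
}

/// The same policy expressed with TC verdicts.
fn tc_verdict(frame: &[u8]) -> TcAction {
    match ipv4_protocol(frame) {
        Some(IPPROTO_UDP) => TcAction::Shot,
        _ => TcAction::Ok,
    }
}

/// Builds a minimal Ethernet + IPv4 frame carrying the given protocol number.
fn sample_frame(protocol: u8) -> Vec<u8> {
    let mut frame = vec![0u8; ETH_HDR_LEN + 20];
    frame[12..14].copy_from_slice(&ETHERTYPE_IPV4.to_be_bytes());
    frame[ETH_HDR_LEN] = 0x45; // IPv4, header length 5 * 4 bytes
    frame[ETH_HDR_LEN + 9] = protocol;
    frame
}

fn main() {
    let udp = sample_frame(IPPROTO_UDP);
    println!("XDP: {:?} ({})", xdp_verdict(&udp), xdp_verdict(&udp) as u32);
    println!("TC:  {:?} ({})", tc_verdict(&udp), tc_verdict(&udp) as i32);
}
```

The same decision ends up as a different return code depending on the hook, which is one reason why logic cannot simply be copied between program types.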
Obviously, since they run in the kernel and can have a critical impact on performance, there are some limitations on them. However, given the architecture of eBPF applications (see next section), those are often not really a limitation but a help that focuses the developer on implementing safe, reliable and performant software.
Even more important, and often not well understood, is that you can combine them in arbitrary ways. For example, a security scanner could learn an application's typical behaviour when calling kernel/library functions and block unexpected network traffic that suddenly occurs immediately afterwards.
Another example: kernels running on different compute nodes in different data centres can be instructed to route data traffic to data centres that currently have a high supply of renewable energy, and local application system/library calls can be executed remotely and transparently without moving the application. This can lead to a form of virtualization in which the concept of data centres or regions is no longer needed.
The typical architecture of an eBPF application
A generic architecture of an eBPF application is illustrated in the following figure:
It consists of the following elements:
- An application in user space that loads the eBPF program(s) into the kernel
- One or more eBPF programs of a specific type (see previous section) that are executed by the eBPF virtual machine inside the kernel
- Data structures of different types („maps“) that the eBPF program can use, optionally also to exchange data with the user space application
- eBPF kernel functions – different eBPF functions that can be accessed depending on the type of eBPF program
The architecture looks simple, but it requires a strong understanding of how Linux works. One often ends up reading the kernel source code to understand how things work when building an eBPF application. Performance is crucial, so the eBPF program loaded into the kernel should do only minimal work and delegate the rest to the user space application. This requires an efficient means of communication and the right data structures between them.
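The division of labour described above can be simulated in user space. In the following sketch an ordinary `HashMap` behind a mutex stands in for an eBPF map, and two threads stand in for the program running on several cores. This is an analogy only: real maps are created through the bpf() system call and accessed from the eBPF side through helpers such as `bpf_map_lookup_elem` and `bpf_map_update_elem`. The names `PacketCounts`, `on_packet` and `noisy_destinations` are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Stand-in for an eBPF hash map keyed by destination IPv4 address.
type PacketCounts = Arc<Mutex<HashMap<[u8; 4], u64>>>;

/// "Kernel side": do the minimum per packet, here only counting.
fn on_packet(counts: &PacketCounts, dst: [u8; 4]) {
    let mut map = counts.lock().unwrap();
    *map.entry(dst).or_insert(0) += 1;
}

/// "User space side": read the shared map and apply the actual policy.
/// Returns all destinations seen at least `threshold` times, sorted.
fn noisy_destinations(counts: &PacketCounts, threshold: u64) -> Vec<[u8; 4]> {
    let map = counts.lock().unwrap();
    let mut out: Vec<[u8; 4]> = map
        .iter()
        .filter(|(_, n)| **n >= threshold)
        .map(|(dst, _)| *dst)
        .collect();
    out.sort();
    out
}

fn main() {
    let counts: PacketCounts = Arc::new(Mutex::new(HashMap::new()));

    // Two workers simulate the same program running in parallel on two cores.
    let workers: Vec<_> = (0..2)
        .map(|_| {
            let counts = Arc::clone(&counts);
            thread::spawn(move || {
                for _ in 0..50 {
                    on_packet(&counts, [169, 254, 169, 254]);
                }
                on_packet(&counts, [10, 0, 0, 1]);
            })
        })
        .collect();
    for worker in workers {
        worker.join().unwrap();
    }

    // prints [[169, 254, 169, 254]]
    println!("{:?}", noisy_destinations(&counts, 100));
}
```

The per-packet function only updates a counter, while the more expensive decision (what counts as "noisy" and what to do about it) stays in the user space part.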
This becomes even more complex if multiple eBPF programs of different types need to coordinate their actions with each other and everything can be executed in parallel on multiple cores.
Finally, I do not show here how eBPF applications work at the distributed level, e.g. across different compute nodes. There you can, for instance, simply redirect network traffic to other compute nodes or even remotely call a kernel/application function that was originally only called locally.
Running an eBPF application securely
Obviously you should leverage all options of your toolchain to develop secure programs. For instance, you can use Rust, as it helps to develop safer applications. In your toolchain you can then use Miri to detect further unsafe behaviour.
An eBPF application requires elevated privileges to run. While it is easy to simply run it as root, you must NOT do so. Running it as root gives it too much power and often opens up additional security issues and backdoors. Instead, give it only the capabilities it needs and run it as a non-root user.
Additionally, you should introduce other layers of defence. Do not expose an interface to your eBPF application to the public or a large audience; make it reachable only through dedicated networks for the administrators who need to access it, so that normal user traffic cannot reach it. This requires careful planning of the control plane and data plane network architectures, also taking proper multi-tenancy aspects into account.
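As a small sketch of what "only the capabilities it needs" can look like in practice, the following Rust program inspects the effective capability set of the current process by reading the `CapEff` line of /proc/self/status. The capability numbers are the ones defined by Linux (CAP_BPF and CAP_PERFMON were added in Linux 5.8). Which capabilities a given eBPF application really needs depends on the program type and kernel version, so the selection printed here is an assumption: networking programs typically also need CAP_NET_ADMIN and tracing programs CAP_PERFMON. The helper names `parse_cap_eff` and `has_cap` are hypothetical.

```rust
use std::fs;

// Capability numbers as defined in linux/capability.h.
const CAP_NET_ADMIN: u32 = 12;
const CAP_SYS_ADMIN: u32 = 21;
const CAP_PERFMON: u32 = 38;
const CAP_BPF: u32 = 39;

/// Extracts the effective capability bitmask from the text of
/// /proc/<pid>/status (the hexadecimal value on the "CapEff:" line).
fn parse_cap_eff(status: &str) -> Option<u64> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("CapEff:"))
        .and_then(|hex| u64::from_str_radix(hex.trim(), 16).ok())
}

/// True if capability number `cap` is set in `mask`.
fn has_cap(mask: u64, cap: u32) -> bool {
    mask & (1u64 << cap) != 0
}

fn main() {
    let status = fs::read_to_string("/proc/self/status").unwrap_or_default();
    match parse_cap_eff(&status) {
        Some(mask) => {
            let wanted = [
                ("CAP_BPF", CAP_BPF),
                ("CAP_PERFMON", CAP_PERFMON),
                ("CAP_NET_ADMIN", CAP_NET_ADMIN),
            ];
            for (name, cap) in wanted {
                println!("{}: {}", name, has_cap(mask, cap));
            }
            // CAP_SYS_ADMIN is far broader than most eBPF loaders need.
            if has_cap(mask, CAP_SYS_ADMIN) {
                eprintln!("warning: running with CAP_SYS_ADMIN (root-like privileges)");
            }
        }
        None => eprintln!("could not read CapEff (not running on Linux?)"),
    }
}
```

A loader could run such a check at start-up and refuse to continue when it detects far more privileges than it expects.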
Tools and libraries
There are plenty of tools and libraries for developing eBPF programs and corresponding user space applications.
The dominant programming language is C, as it is also the development language of the kernel. More recently it has also become possible to use Rust to develop eBPF programs and the corresponding user space applications. Rust is safe by default and can detect a lot of unsafe behaviour at compile time. C does not offer this: it permits unsafe behaviour by default, and even experts find it very difficult to write safe programs in it. Nevertheless, a Rust program still needs to be well written, and using Rust does not automatically make you safe. In fact, you can explicitly mark code as unsafe, and you may be tempted to do so just because the compiler finds issues in your code. However, if you keep this to a minimum you will face considerably fewer issues than with C(++) (e.g. in the mobile operating system Android they went down significantly due to the introduction of Rust).
Most of the libraries for eBPF also have bindings to other programming languages, such as Python. While those bindings let you develop the user space application in another programming language, the eBPF programs that are loaded into the kernel can currently only be developed in C or Rust. Some of the libraries are:
- Aya – a newer framework written completely in Rust. It does not require a C library. Although it is used in production by different companies, it is not yet as mature as the other libraries.
- libbpf – a C library for which a Rust wrapper also exists. The library is mature; the wrapper less so.
- bcc – a C library with a Rust wrapper. It is mature and also offers a Python interface for the user space application.
While those libraries make it much easier to develop eBPF programs and the corresponding user space applications, you still need a lot of specific knowledge about Linux and operating systems, and you should not be shy about browsing the Linux kernel source code. It becomes even more complex if your eBPF application works across compute nodes in a distributed system.
Case study on eBPF programs
I developed a couple of open source eBPF applications to investigate the capabilities of eBPF. I chose Rust as the programming language and Aya as the framework to leverage Rust to its full potential. The use case was to identify ways to protect cloud metadata services, such as AWS IMDSv2 or the Azure Instance Metadata Service. Those services are exposed inside compute instances, such as virtual machines or containers, to enable automatic authentication towards other cloud services (e.g. object stores) and to provide more information on the environment the compute instance is running in.
The following eBPF applications were developed:
- A traffic control (TC) eBPF application that only allows a specific user to access the metadata service (which can normally be accessed by everyone on the compute instance)
- A socket eBPF application that creates a raw socket and uses an eBPF program to filter „suspicious“ communication with the metadata service, so that the user space application does not need to process all packets
- A uprobe that hooks into the OpenSSL shared library to read TLS traffic in plaintext without needing to know the private key used to encrypt the communication (provided the program that communicates through TLS runs on the same machine). Although metadata services are non-encrypted endpoints, it shows what further capabilities can be developed with eBPF, especially at the kernel/application function layer.
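The decision logic behind the first application can be sketched in a few lines of plain Rust. This is not the published code of the case study and it does not use Aya. It is a user space illustration under stated assumptions: the metadata service is reachable at the link-local address 169.254.169.254 (the address used by AWS and Azure), the input is a raw IPv4 header, and the sender's UID is simply passed in as a parameter, whereas a real TC program would have to obtain it inside the kernel. The names `Verdict`, `ipv4_destination`, `decide` and `header_to` are hypothetical.

```rust
use std::net::Ipv4Addr;

/// Link-local address of the instance metadata service on AWS and Azure.
const METADATA_ADDR: Ipv4Addr = Ipv4Addr::new(169, 254, 169, 254);

#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    Allow,
    Deny,
}

/// Reads the destination address (bytes 16..20) from an IPv4 header.
/// Returns None if the header is truncated.
fn ipv4_destination(ip_header: &[u8]) -> Option<Ipv4Addr> {
    let b = ip_header.get(16..20)?;
    Some(Ipv4Addr::new(b[0], b[1], b[2], b[3]))
}

/// Denies traffic to the metadata service unless it comes from `allowed_uid`.
/// All other traffic, including headers too short to parse, is left alone,
/// since this sketch only guards the metadata endpoint.
fn decide(ip_header: &[u8], sender_uid: u32, allowed_uid: u32) -> Verdict {
    match ipv4_destination(ip_header) {
        Some(dst) if dst == METADATA_ADDR && sender_uid != allowed_uid => Verdict::Deny,
        _ => Verdict::Allow,
    }
}

/// Builds a minimal 20-byte IPv4 header addressed to `dst` (for the demo only).
fn header_to(dst: Ipv4Addr) -> [u8; 20] {
    let mut header = [0u8; 20];
    header[0] = 0x45; // IPv4, header length 5 * 4 bytes
    header[16..20].copy_from_slice(&dst.octets());
    header
}

fn main() {
    let to_metadata = header_to(METADATA_ADDR);
    let elsewhere = header_to(Ipv4Addr::new(10, 0, 0, 1));
    println!("{:?}", decide(&to_metadata, 1000, 1000)); // prints Allow
    println!("{:?}", decide(&to_metadata, 1001, 1000)); // prints Deny
    println!("{:?}", decide(&elsewhere, 1001, 1000)); // prints Allow
}
```

Keeping the policy in a small pure function like `decide` makes it easy to unit test before it is moved into an actual eBPF program.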
eBPF is a novel and powerful concept in the Linux kernel that is already widely used today. While some of its possibilities already existed in other systems or even in Linux, making them available as loadable programs that implement their own functionality, executed in a highly secure virtual interpreter with high performance characteristics, opens up a vast range of opportunities. This does not exist in a similar fashion in other operating systems.
However, this also comes at a cost. Delivering a complex security, networking and observability application requires a lot of expertise in programming and in the Linux kernel. Making such an application easy to configure and understand, so that it does not introduce new attack surfaces, is challenging. Unnecessarily complex abstractions, such as Kubernetes, especially suffer from this, leading to security and performance issues. It will still take many years to find simple developer and user abstractions that avoid those issues.
Some abstractions already show their potential: serverless functions, serverless databases or serverless Big Data jobs. Nevertheless, especially at the database and data processing layer, they can still be improved a lot, also by bringing eBPF into the picture.
Tools such as micro-VMs (e.g. Firecracker) that have the explicit objective of doing one thing and doing it right are going in a good direction. Nevertheless, this does not help if one has 100 tools each doing a different thing and again faces the complexity of combining them so that they work well together (this issue occurs in Kubernetes, but also in cloud platforms). Here you should aim for the CUPID principles.
This concept may also inspire other types of application programming interfaces (APIs). For instance, it is often difficult to extract large data volumes through REST APIs in a form suitable for a specific need. It often involves fetching too much data, which increases processing time, networking time and memory needs. If one could instead provide a program that is executed by a safe virtual interpreter („virtual API interpreter“) and that retrieves only the data needed or communicates only updates to the data, this could bring large benefits compared to traditional query languages. However, we are currently very far from conceptualizing this and proving that it indeed brings benefits.
Maybe one day they can enable perfect modularity at a simple layer of abstraction.