Novel ways of providing identity to automated cross-cloud processes – Workload Identity Federation and SPIFFE

Growing cybercriminal activity has led to a boom of access-rights solutions for user access to data and systems. Recently these have been augmented further by provisioning hardware FIDO2-based tokens (also available as open source, such as Nitrokey) to verify the identity of a user and then grant that user the right permissions. However, application access to data (or application-to-application, A2A) is still often stuck, on-premise and also cross-cloud, with traditional username/password- or certificate-based access, which is problematic from a security perspective and one of the causes of many data breaches.

In this blog post I will present standard mechanisms to avoid this within the same cloud, illustrated for different popular cloud providers. Then I will focus on novel mechanisms to ensure this cross-cloud using Workload Identity Federation or SPIFFE Verifiable Identity Documents. I will continue with which scenarios work and which do not work yet, taking especially into account "serverless" (fully managed by the cloud provider) workloads. Finally, I will give an outlook on future needs.

What is wrong with current application-to-application access?

Many companies use static credentials (username and password, or a certificate) for application-to-application access, e.g. a web application accesses a database using the username/password of a so-called "technical user". Other similar approaches are cloud secret/access keys (e.g. in AWS), account access keys, API tokens, access tokens etc. Alternatively, some use certificate-based access. Certificate-based access is somewhat safer than username/password access, because with a username/password it is easier to try out different combinations until the right one is found. However, this matters much less if the password is reasonably long and random.

While it is good practice to rotate (change) certificates and passwords regularly, in practice this is not always done, or only at very long intervals (e.g. months or years).

This means the risk that they get stolen or leaked (intentionally or by accident) and abused is high. Systems nowadays are complex, with many different components, which increases this risk further. Once stolen, the credentials can often be used to take over organisations or even entire cloud environments. While complex cloud environments allow fine-granular permissions to be defined, one often finds the opposite, because people do not know the clouds well and configure very permissive policies just to "get things done".

All these factors increase the risk of static credentials, but there are solutions, which due to their high need for automation unfortunately often exist only in the cloud.

How is it solved within the same cloud?

As written before, we would like to ensure the following properties:

  • No static credentials that are stored anywhere
  • Credentials must be short-lived, i.e. minutes to hours, so that if they leak by accident they are most likely already worthless

Within the same cloud, e.g. within AWS or Microsoft Azure, there are proprietary mechanisms to ensure this.

For example, in AWS you can assign a role to the instance profile of an EC2 VM. This role provides permissions, for instance, to access S3. Within this EC2 VM any process can connect to the Instance Metadata Service (IMDSv2), which is only accessible inside the VM. From there, it can get temporary security credentials that have the permissions to access S3. Those credentials are short-lived and they are not stored anywhere (they can even be limited to the scope of the same VM). This type of access works with any type of compute in AWS that needs to access another AWS service and is thus not limited to EC2; the same or a similar concept exists for Lambda functions, Glue jobs, Fargate containers etc.
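To make the IMDSv2 flow described above more tangible, here is a minimal sketch (role name is a placeholder) of the two requests a process inside the VM constructs: a PUT to obtain a session token, then a GET for the temporary role credentials.

```python
import urllib.request

# Link-local IMDS endpoint, only reachable from inside the VM
IMDS_BASE = "http://169.254.169.254"

def imds_session_token_request(ttl_seconds=21600):
    # Step 1 (IMDSv2): request a session token via PUT; IMDSv1-style
    # unauthenticated GETs are rejected when IMDSv2 is enforced
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )

def role_credentials_request(session_token, role_name):
    # Step 2: fetch temporary credentials for the instance-profile role
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/meta-data/iam/security-credentials/{role_name}",
        headers={"X-aws-ec2-metadata-token": session_token},
    )
```

On an actual EC2 VM you would execute both requests with `urllib.request.urlopen()`; the second response is a JSON body containing `AccessKeyId`, `SecretAccessKey`, `Token` and an `Expiration` timestamp. The AWS SDKs perform exactly this dance transparently.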

Similarly, Microsoft Azure allows assigning Managed Identities with corresponding permissions to an Azure VM. Such an identity provides permissions, for instance, to access Azure Blob Storage. Again, any process in the Azure VM can connect to the Instance Metadata Service, which is only accessible inside the VM. There it can get temporary security credentials that have the permissions to access Azure Blob Storage. Those credentials are short-lived and they are not stored anywhere. This type of access works with any type of compute in Azure that needs to access another Azure service, such as Azure Functions, Azure Synapse, Azure Container Instances etc.
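The Azure side looks very similar; a sketch of the request a process on the VM builds against the Azure Instance Metadata Service identity endpoint (the resource URI here targets Blob Storage):

```python
from urllib.parse import urlencode
import urllib.request

# Azure IMDS identity endpoint, only reachable from inside the VM
AZURE_IMDS = "http://169.254.169.254/metadata/identity/oauth2/token"

def managed_identity_token_request(resource="https://storage.azure.com/"):
    # Returns a short-lived OAuth2 access token for the managed identity
    # assigned to the VM; the "Metadata: true" header is mandatory so that
    # the request cannot be forwarded from outside the VM
    query = urlencode({"api-version": "2018-02-01", "resource": resource})
    return urllib.request.Request(
        f"{AZURE_IMDS}?{query}",
        headers={"Metadata": "true"},
    )
```

Executed with `urllib.request.urlopen()` on an Azure VM, the response is a JSON body with `access_token` and `expires_on`; the Azure SDKs wrap this in their credential classes.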

However, all those cloud-specific solutions have the issue that they cannot be used to access resources cross-cloud.

For example, one cannot use them to access Azure Blob Storage from an AWS Lambda function, because AWS does not know anything about the existence of Azure Blob Storage and certainly cannot issue valid credentials for it. Conversely, Microsoft Azure does not know anything about AWS Lambda and would have trouble validating its identity.

Hence, other approaches are needed to make this available cross-cloud.

How can it be solved cross-clouds?

Theoretically, one could use cross-cloud approaches similar to those that work within a single cloud. Practically, the different cloud providers would need to agree on a standard for how to do it. Currently, there are different emerging concepts with different cloud and technology support. I will describe the following here:

  • Workload Identity Federation: This is a more general concept that includes also serverless applications and cloud-specific services and is based on OpenID Connect (OIDC)
  • Secure Production Identity Framework for Everyone (SPIFFE): This is a specific concept mostly limited to and supported by Kubernetes. It is often used in connection with OIDC, but that is not strictly required.

Both work also in on-premise or other data centre scenarios.

Workload Identity Federation

Workload Identity Federation was mentioned early by Google in a blog post to enable secure keyless access from other clouds/on-premise to Google Cloud infrastructure.

While other cloud providers also tried to tackle this (e.g. AWS IAM Roles Anywhere), they still relied on static credentials/certificates for each workload, which is not only cumbersome to operate (e.g. rotating/revoking credentials) but also non-standardized, and it is very difficult to keep a proper overview of which types of workload have access where. In theory, AWS IAM Roles Anywhere could also be used to imitate something like workload identity federation, but it would still be a lot of effort.

The approach presented by Google was different. It relied on an already very well-established standard, OpenID Connect (OIDC), and made it easy to create policies on cloud resources that link a unique identifier of a workload to a policy. Originally, OIDC was mostly used for human users, but it is also useful for application-to-application (A2A) access as described here.

I will present now a more abstract scenario and later demonstrate a concrete example.

In this abstract scenario a workload (hosted in one cloud/data centre) needs to access a destination service (hosted in another cloud/data centre).

We see in this diagram the following components:

  • Workload
    • Workload Identity Provider (OIDC): The Workload Identity Provider issues a JWT token for the workload running in the cloud or data centre. This is usually injected into the workload when it starts and, depending on the runtime, refreshed regularly. The provider needs to be registered once with the destination service's identity manager, so that it can issue valid JWT tokens that are accepted by the destination cloud. The workload identity provider must have a way to identify a workload locally and inject the right JWT token into it, i.e. a default/standard OIDC identity provider will not work.
    • Workload Cloud/Data Centre: The workload runs in one cloud/data centre and needs to call a service in another cloud/data centre without using static credentials/certificates. It provides its JWT token to the temporary token provider in the other cloud/data centre.
  • Destination
    • Cloud/Data Centre Identity Manager: This component is federated with the identity providers of workloads running in different clouds. It validates, for components (e.g. cloud/data centre services) in the same cloud/data centre, the identity of workloads in other clouds.
    • Temporary Token Provider: The temporary token provider issues the workload from the other cloud a temporary, short-lived access token for the services it should have access to in the same cloud. This is based on the JWT token and verified by the Cloud/Data Centre Identity Manager.
    • Cloud/Data Centre Service: This service can be called by the workload running in the other cloud/data centre based on the issued temporary token.

Since all tokens are short-lived and randomly generated, the risk that they leak is low, and even if they do leak they can often no longer be used because they have already expired. Furthermore, it is easier to grant only minimal access for a given token based on the identity of the workload.
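To make the role of the JWT token concrete, here is a minimal sketch (all names and claim values are hypothetical). A JWT consists of three base64url-encoded parts; the destination identity manager verifies the signature against the registered provider's public keys and then checks claims such as the issuer (`iss`), the workload identifier (`sub`), the audience (`aud`) and the expiry (`exp`). Below we only decode the claims for illustration:

```python
import base64
import json

def decode_jwt_claims(jwt_token):
    # A JWT is <header>.<payload>.<signature>; decode the payload part.
    # NOTE: real validation must verify the signature first!
    payload_b64 = jwt_token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

# Hypothetical claims as a destination cloud would inspect them
claims = {"iss": "https://workload-idp.example.com",
          "sub": "my-workload-1", "aud": "destination-cloud",
          "exp": 1700000000}
token = ".".join([
    base64.urlsafe_b64encode(json.dumps({"alg": "RS256"}).encode()).decode().rstrip("="),
    base64.urlsafe_b64encode(json.dumps(claims).encode()).decode().rstrip("="),
    "signature-placeholder",
])
assert decode_jwt_claims(token)["sub"] == "my-workload-1"
```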

Let us explore a more concrete example, described here from the perspective of AWS and here from the perspective of Github:

In this example, we have Github Actions (a CI/CD pipeline running in the Github cloud) accessing an AWS S3 bucket (e.g. to store some pipeline artifacts). This is done without static credentials – AWS also advises against using static credentials (see here).

Here the Github OIDC provider is registered with AWS IAM as a federated provider. The JWT token is inserted in the Github action flow transparently.

On the AWS side an IAM role is created once that describes the minimal permissions the role should have (in this case S3 access) and which Github Actions in which Github repository can assume the role. This role defines in its trust relationships the workload that should be allowed to use the role. Example:

"Condition": {
  "StringEquals": {
    "token.actions.githubusercontent.com:aud": "sts.amazonaws.com",
    "token.actions.githubusercontent.com:sub": "repo:<EXAMPLE-ORG>/example-repo:environment:prod"
  }
}

Here, only the Github Actions pipeline of the repository <EXAMPLE-ORG>/example-repo, running in the prod environment, is allowed to assume the AWS role.
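For context, the condition shown above sits inside a trust policy of roughly the following shape (the account ID is a placeholder); the federated principal references the Github OIDC provider registered in IAM:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<ACCOUNT-ID>:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com",
          "token.actions.githubusercontent.com:sub": "repo:<EXAMPLE-ORG>/example-repo:environment:prod"
        }
      }
    }
  ]
}
```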

Within the Github Actions pipeline, a special action is called to request a temporary token from the AWS Security Token Service (STS) using the injected JWT token.

This temporary token can then be used by the Github Actions pipeline to access AWS S3.
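Under the hood, the action performs an STS AssumeRoleWithWebIdentity call; a sketch of the request it builds (role ARN and session name are placeholders):

```python
from urllib.parse import urlencode

def assume_role_with_web_identity_url(role_arn, session_name, jwt_token):
    # AssumeRoleWithWebIdentity is one of the few STS calls that needs no
    # signed request: the injected OIDC JWT itself proves the caller's identity
    params = {
        "Action": "AssumeRoleWithWebIdentity",
        "Version": "2011-06-15",
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "WebIdentityToken": jwt_token,
    }
    return "https://sts.amazonaws.com/?" + urlencode(params)
```

The STS response contains a temporary `AccessKeyId`, `SecretAccessKey`, `SessionToken` and an `Expiration` timestamp, which the pipeline then uses like regular (but short-lived) AWS credentials.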

One important point is that you need to trust the cloud provider hosting the original workload to provide correct workload identities and to have implemented this securely, so that one workload cannot impersonate another.

Furthermore, special care should be taken (e.g. software supply chain security) to avoid running malicious code (especially from third-party libraries). While short-lived credentials with minimal permissions reduce the risk, you should still take care of a secure software supply chain.

Of course, the concepts here work with multiple clouds/data centres and technologies as they are based on well-established standards. However, the more different clouds/data centres you need to integrate, the more complex managing their integration can become.

Secure Production Identity Framework for Everyone (SPIFFE)

Secure Production Identity Framework for Everyone (SPIFFE) is mostly limited to the Kubernetes world and has not been widely adopted for other types of compute, although that is technically possible.

Technically, in most clouds SPIFFE uses in the background very similar or even the same mechanisms as workload identity federation, although other technical realizations are possible.

One key concept is the SPIRE server, which can roughly be compared to the workload-specific identity provider. Many cloud providers realize this by associating an own OIDC provider with a Kubernetes cluster that knows and can distinguish between different workloads in the cluster (e.g. OVH, AWS, Google, Azure etc.) – this often works without SPIRE/SPIFFE, but can be used together with it.

The SPIRE agent is deployed on each node of the cluster and provides insights on workloads to the SPIRE server. It also assigns so-called SPIFFE IDs to the workloads, which can be used in access policies of cloud services so that only a specific workload can access them. Furthermore, the SPIRE agent can provide, for example, temporary JWT tokens (JWT SPIFFE Verifiable Identity Documents, JWT-SVIDs) to the workload, which are then used to obtain a temporary token from the destination cloud provider. This can then be used to access a service of that cloud provider, given the right permissions.
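A SPIFFE ID is a URI of the form spiffe://&lt;trust domain&gt;/&lt;workload path&gt;, and destination-side access policies match on exactly this string. A small sketch of how such an ID (here with a hypothetical Kubernetes-style workload path) decomposes:

```python
from urllib.parse import urlparse

def parse_spiffe_id(spiffe_id):
    # SPIFFE IDs are URIs: spiffe://<trust domain>/<workload path>
    parsed = urlparse(spiffe_id)
    if parsed.scheme != "spiffe":
        raise ValueError("not a SPIFFE ID")
    return parsed.netloc, parsed.path

# Hypothetical ID for a workload in namespace "prod" with service account
# "payment-backend", as SPIRE's Kubernetes attestation conventionally emits
trust_domain, workload_path = parse_spiffe_id(
    "spiffe://example.org/ns/prod/sa/payment-backend")
assert trust_domain == "example.org"
```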

The workload needs to run certain steps to finally access the destination cloud service (see example here). This means more complex scenarios, e.g. Spark (a distributed data processing engine) on Kubernetes, do not yet work out of the box.

Cross-cloud scenarios and their maturity

Find here some cross-cloud authentication scenarios. The table describes a workload type (often a specific cloud service), the workload source cloud/data centre, the destination service, the destination cloud, the mechanism and some further information.

Note: This list is not complete and will likely evolve over time. For SPIFFE only one representative scenario is chosen, but since it is related to Kubernetes it works similarly independent of where you deploy it. Also, this table includes only out-of-the-box scenarios. Workload Identity Federation is based on well-established standards, so with some manual "plumbing" you can always make any combination of cloud services work. This is not recommended and, if not done carefully, also a security risk.

| Workload Type | Workload Source Cloud/Data Centre | Destination Service | Destination Cloud | Mechanism | Further Information/Examples |
|---|---|---|---|---|---|
| Github Action | Github | * (All, e.g. S3) | AWS | Workload Identity Federation (OIDC) | Use IAM roles to connect GitHub Actions to actions in AWS |
| Gitlab CI/CD Pipelines | Gitlab | * (All, e.g. S3) | AWS | Workload Identity Federation (OIDC) | Configure OpenID Connect in AWS to retrieve temporary credentials |
| * (All services that can be assigned a service account) | Google | * | AWS, Azure etc. | Workload Identity Federation (OIDC) | Access AWS using a Google Cloud platform native workload identity |
| Managed Kubernetes | Azure, Google Cloud, OVH etc. | * (All) | AWS | SPIFFE | AWS OIDC Authentication |

Some cross-cloud scenarios for keyless workload authentication supported out-of-the-box

As you can see, there are some cross-cloud scenarios, but important ones are still missing. For instance, assume there is an Azure Synapse Spark job that should access S3. At the moment there is no identity provider in Microsoft Azure that can give Azure Synapse Spark jobs an identity, and thus workload identity federation does not work. Similarly the other way around: an AWS Glue job accessing Azure Blob Storage is also not possible via workload identity federation. Currently, for both you need to use static credentials/certificates, which you should not use, because this is inherently insecure.

Unfortunately, for many cloud-managed services, such as AWS Glue, AWS Lambda, Azure Functions, Azure Synapse Analytics etc., it is at the moment not possible to use workload identity federation out of the box (and I would strongly advise against "manually" trying to make it work, e.g. as exemplified here for AWS workloads authenticating to Azure). Most advanced in this respect currently seems to be Google Cloud, as it has workload identity federation baked into many of its services through the service account that you can assign to them. By registering Google Cloud as an identity provider in any cloud, you can grant access to service accounts of individual Google Cloud workloads.

Conclusion

Authentication and authorization of workloads across clouds/data centres of different organisations is of growing importance. Until recently this was done mostly using static credentials/certificates. The issue with those is that they are rotated very rarely or not at all, and if they leak they offer powerful access to enterprise data assets/critical workloads from anywhere. Leaking can happen in many different ways, such as software supply chain attacks, programming bugs (e.g. logging of secrets) or simply through administrators with access to secret stores.

Workload Identity Federation, or SPIFFE as a technology-specific concept, address this using already well-established standards such as OpenID Connect (OIDC). The idea is that they provide workloads with an identity that can be recognized in other clouds/data centres. With these, only short-lived credentials (a few minutes/hours) are used, which are often already invalid by the time they leak. Furthermore, they are rotated very frequently in an automated fashion without any custom rotation logic.

However, this requires that workloads provide essentially their own identity provider, which is not yet the case for all cloud workloads, but the possibilities are growing every day.

Software-as-a-Service (SaaS) solutions also embark on this journey. Predominantly we find here Continuous Integration (CI)/Continuous Deployment (CD) services, as they are often subject to attacks: they store a lot of static credentials/certificates and/or have access to sensitive secret stores.

One should be very careful, though, about which workload identity providers to trust and make sure they have adequate security measures in place. Also, when registering them you should be careful to update the registration (e.g. client secrets) regularly. It makes perfect sense to have some isolation: do not register just one workload identity provider per cloud/data centre, but apply proper isolation and use multiple identity providers per cloud/data centre.

Finally, keep in mind that cross-cloud scenarios are complex, also security-wise – despite mechanisms such as workload identity federation. Try to keep cross-cloud connections to a minimum to reduce complexity, also from a security perspective. Otherwise it will become very hard to keep an overview, and many cross-cloud security monitoring tools currently have no way to monitor/assess cross-cloud workloads properly.
