Organizing for Serverless from a Kubernetes-based Platform Team perspective


Serverless is full of promise, even for a team already running many containers efficiently. The question is: what do you need to know to be successful? Let's discover that together.

Published Sep 20, 2024
Last Modified Sep 30, 2024
As a team using a Kubernetes-based platform, whether on Amazon Elastic Kubernetes Service or not, you may have operated with the widely employed paradigm: the application teams build and run code packaged in containers, and the platform team builds and runs the Kubernetes-based platform that runs those containers. This platform usually includes the container orchestrator, Kubernetes, running stateless microservices, but also the CI/CD pipelines, the Infrastructure as Code (IaC) for stateful services such as databases and messaging services from AWS, and the monitoring infrastructure. Serverless, with its higher level of abstraction, tends to change the operating model. Application and infrastructure (platform) tend to merge. The combined teams should reorganize accordingly to fully reap the benefits of the technology.
These benefits include:
  • a higher frequency of delivery,
  • a shorter lead time to production,
  • better alignment between revenue-generating IT load and IT costs, as recommended by The Frugal Architect: "Systems that last align cost to business",
  • more application units managed by the same platform team,
  • self-documented code with AWS Step Functions, (1)
  • less code liability: "Code is a liability. Less code is less liability." (Jeff Atwood?),
  • less toil overall, particularly around security and reliability.
In the following sections, you will learn what to change to fully take advantage of the higher level of abstraction provided by serverless and the higher level of management provided by AWS, leading to the benefits listed above.
A last note before we go: there is no dichotomy of serverless versus containers. AWS offers more than one way to run containers serverlessly, for example AWS Fargate and AWS Lambda. It is also rare for an application architecture to be purely serverless or purely non-serverless. You can absolutely have a container running on an Amazon EC2 instance of a managed node group from an Amazon EKS cluster communicating with an Amazon DynamoDB table.
While the title and introduction contrast the two most opposite implementations, containers running on Amazon EKS on one side and an equivalent AWS Lambda and AWS Step Functions implementation on the other, in the rest of the article I will navigate freely between the various shades of grey in between.
(1) From the overview page: "Step Functions is a visual workflow service that helps developers use AWS services to build distributed applications, automate processes, and orchestrate microservices". With Step Functions, the workflow, or state machine, that was tightly integrated into your code receives a visual representation and an optimized execution runtime.
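To make the "self-documenting" point concrete, here is a minimal sketch of an Amazon States Language (ASL) definition expressed as plain data. The workflow, state names, and Lambda ARNs are all invented for illustration; the Step Functions console would render the resulting JSON as a diagram.

```python
import json

# A minimal, hypothetical Amazon States Language (ASL) definition.
# The state machine itself documents the workflow: validate an order,
# then charge the customer. The Lambda ARNs are placeholders.
order_workflow = {
    "Comment": "Order processing workflow (illustrative only)",
    "StartAt": "ValidateOrder",
    "States": {
        "ValidateOrder": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:eu-west-1:123456789012:function:ValidateOrder",
            "Next": "ChargeCustomer",
        },
        "ChargeCustomer": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:eu-west-1:123456789012:function:ChargeCustomer",
            "End": True,
        },
    },
}

# This JSON document is what you would hand to Step Functions; the
# console renders it visually, so the workflow documents itself.
asl_json = json.dumps(order_workflow, indent=2)
print(asl_json)
```

Because the control flow lives in this declarative document rather than scattered across application code, reading the workflow no longer requires reading the business logic.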

Monolith, microservices and smaller microservices

I need to start by clarifying some vocabulary that tends to lead to misunderstandings. From Wikipedia, a monolithic system is a system that is integrated into one whole, analogous to a monolith. This sentence can apply to any code depending on where you put the borders around the system. From Wikipedia, a microservice is an architectural pattern that arranges an application as a collection of loosely coupled, fine-grained services, communicating through lightweight protocols. A container running the code implementing all the features of a domain can be considered both a microservice, as it communicates with the containers implementing the other domains via lightweight protocols, e.g., HTTPS requests or a message bus, and a monolith, as it integrates in one whole all the features of the domain. With serverless, you certainly have microservices, but you rarely have monoliths, i.e., one unit, e.g., a Lambda function, packaging all the features of a domain.
A microservice in a container often translates, in the serverless paradigm, into business logic in AWS Lambda functions and workflows in AWS Step Functions. On many occasions, it is possible to run the code from the container in a single Lambda function, which then runs a monolith. But this has a number of drawbacks listed in this article (https://docs.aws.amazon.com/lambda/latest/operatorguide/monolith.html): larger packages, hard-to-enforce least privilege, and code that is harder to upgrade, maintain, reuse, and test.
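To contrast with the monolithic handler, here is a sketch of what a single-purpose Lambda function looks like: it validates an order and nothing else, leaving charging, shipping, and notifications to other functions with their own narrow IAM roles. The event shape and validation rules are invented for illustration.

```python
# Sketch of a single-purpose Lambda handler. Because it only validates
# an order, its package stays small, its IAM role stays narrow, and it
# can be tested in isolation. The event shape here is hypothetical.
def handler(event, context=None):
    order = event.get("order", {})
    errors = []
    if not order.get("items"):
        errors.append("order has no items")
    if order.get("total", 0) <= 0:
        errors.append("order total must be positive")
    return {"valid": not errors, "errors": errors}

print(handler({"order": {"items": ["book"], "total": 25}}))
```

Each downstream concern (payment, shipping) would get an equally small function, with Step Functions tying them together.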
Having multiple AWS Lambda functions and AWS Step Functions state machines doesn't mean an increase in complexity or cognitive load. The elements still form a single unit, packaged and deployed as a whole. This whole is no longer a container but the IaC automating the deployment of the different sub-units of the system. However, infrastructure code is not as clearly separated from application code with serverless as it is with containers.

Infrastructure as Code or Enterprise Integration Patterns as Code

With serverless, you are exposing the internals of the "monolith" running in your container to the "outside world", while staying inside your AWS account. Some bits of the code are going to run in Lambda functions coordinated by a Step Functions state machine. The creation of the Lambda and Step Functions resources can be implemented using the same tools used to create the infrastructure in the container case, e.g., Terraform or AWS CloudFormation. However, as the creation of these resources is an integral part of the application design and code, you can't let it be managed by the platform team via tickets written by the application development team. Such a setup would create too much friction for the application developers to be efficient. But having developers learn the domain-specific languages (DSLs) of these IaC implementations increases their cognitive load and slows them down as well.
Modern Infrastructure as Code tools like the AWS Cloud Development Kit (CDK) or Pulumi reduce the effort needed by application developers to learn the language, as these tools use traditional programming languages like TypeScript, Python, and Java. An application developer can reuse their knowledge of the language, but still has to learn how to create these resources using a new framework. AWS CDK proposes different levels of abstraction. Level 1 constructs map directly to a single AWS CloudFormation resource that will be generated during the deployment. Level 2 constructs still map directly to a CloudFormation resource, but with a level of abstraction making the usage more intuitive and embedding security best practices. Level 3 constructs are opinionated patterns combining multiple AWS CloudFormation resources with default property configurations, see: https://docs.aws.amazon.com/cdk/v2/guide/constructs.html.
To fully take advantage of serverless, you want to maintain in resource provisioning the same level of abstraction that serverless offers on top of compute. So you are going to target level 3 constructs. However, you may not always find the pattern that fits your need in the list provided by the CDK team, e.g., https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_ecs_patterns.ApplicationLoadBalancedFargateService.html, or among the ones contributed by the community on https://serverlessland.com/patterns. On top of that, the level of abstraction provided still stays very close to resource provisioning, instead of adopting the generic language of Enterprise Integration Patterns, for example. So you will, or should, develop your own L3 constructs, allowing your developers to focus on application design more than on cloud resource creation and connection. You may consider these constructs an element of the system delivered by the platform team for serverless.
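To illustrate what such an in-house pattern encapsulates, here is a toy, framework-free sketch: one call expands into several resources with opinionated defaults. A real implementation would be a CDK construct class; every resource name, property, and threshold below is invented for illustration only.

```python
# Toy sketch of an in-house "L3-style" pattern: a single call expands
# into a queue, a dead-letter queue, a function, and an alarm, all with
# opinionated defaults. A real implementation would subclass a CDK
# Construct; all names and values here are invented.
def queued_function_pattern(name: str, handler: str) -> dict:
    return {
        f"{name}Queue": {"Type": "AWS::SQS::Queue",
                         "Properties": {"RedrivePolicy": {"maxReceiveCount": 3}}},
        f"{name}DLQ": {"Type": "AWS::SQS::Queue", "Properties": {}},
        f"{name}Fn": {"Type": "AWS::Lambda::Function",
                      "Properties": {"Handler": handler,
                                     "Runtime": "python3.12",
                                     "Timeout": 30}},
        f"{name}Alarm": {"Type": "AWS::CloudWatch::Alarm",
                         "Properties": {"MetricName": "ApproximateAgeOfOldestMessage",
                                        "Threshold": 300}},
    }

resources = queued_function_pattern("OrderIntake", "app.handler")
print(sorted(resources))
```

The developer asks for "a function fed by a queue" in the vocabulary of the pattern; the platform team's construct decides what that means in terms of cloud resources and defaults.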

CI/CD

Not only the IaC changes when comparing Kubernetes-based and serverless-based workloads, but also the pipelines and tools to build and deploy the applications. The concepts remain the same, with the test pyramid, stages, and no-downtime deployments. But you simply can't reuse the pipelines from your container platform for your serverless one, even if some tools may stay the same. If you used Amazon Inspector to scan your containers for vulnerabilities before deploying to production, you can continue to use it for Lambda functions, as it offers an implementation for this runtime as well. But you won't be able to use your GitOps-style continuous delivery tool, e.g., Argo CD, for Lambda functions the same way you used it to deploy your containers on Kubernetes.
On the deployment side, not only will you probably not use the same tools, but you also won't use exactly the same principles. With a container-based platform, each container would have its own pipeline, and each deployment would ensure backward compatibility with the rest of the containers and other consumers. That way, any pipeline can be triggered independently at any point in time without risking an issue at the larger system level. However, with serverless, one package corresponding to the previous container implementation deploys multiple components. You need to pay attention to the sequencing of the deployment or ensure backward compatibility between the different internal components of the package. For example, deploying an AWS Step Functions version that can complete its workflow correctly only when a specific version of an AWS Lambda function is deployed may cause downtime. There are also new ways to implement rolling updates, blue/green, and canary deployments with Lambda and Step Functions: https://aws.amazon.com/blogs/compute/implementing-canary-deployments-of-aws-lambda-functions-with-alias-traffic-shifting/ and https://docs.aws.amazon.com/step-functions/latest/dg/example-alias-version-deployment.html.
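The alias traffic shifting behind Lambda canary deployments boils down to weighted routing between two function versions. Here is a minimal, self-contained simulation of that idea; the version labels and the 10% weight are invented for illustration.

```python
import random

# Minimal simulation of Lambda alias traffic shifting: the alias sends
# a fraction `weight` of invocations to the new version and the rest to
# the stable one. In a real canary you would raise `weight` stepwise
# while watching alarms. Version labels here are invented.
def route(weight: float, rng: random.Random) -> str:
    return "v2-canary" if rng.random() < weight else "v1-stable"

rng = random.Random(42)  # seeded for reproducibility
sample = [route(0.1, rng) for _ in range(10_000)]
canary_share = sample.count("v2-canary") / len(sample)
print(f"canary share ~ {canary_share:.2%}")
```

The point of the simulation: with serverless, this shifting is a property of the alias configuration, not of a fleet of pods being progressively replaced.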

Organizational consequences

As mentioned earlier, the responsibility boundaries shift left, i.e., toward the developers, when moving from a container workload to a serverless workload. First, AWS takes a bigger share of the responsibility with serverless. Then the dichotomy, application teams developing containers and a platform team running the system that runs the containers, doesn't exist in those terms with serverless. Some organizations may decide not to deploy a platform team for serverless workloads as the responsibility taken by AWS increases. However, with the increase in cognitive load for the application teams, the need for an enabling team, one that helps a stream-aligned team overcome obstacles and detect missing capabilities, increases. A serverless platform team doesn't have to run the system that runs the containers, but it has plenty of opportunities to make the application teams productive with serverless.
The responsibilities a serverless platform team can take on include:
1- The creation of high-level IaC patterns allowing the developers to focus on application development, abstracted from the underlying infrastructure.
2- The implementation of CI/CD pipeline blueprints to get developers quickly started with their development cycles.
3- The setup of stages to validate applications before they reach production, and/or mechanisms decoupling deployment from release.
4- The provisioning of development environments. With serverless, developers need to be able to test on AWS without interfering with other developers.
5- The implementation of elements of non-functional requirements that can be decoupled from the application implementation, like backup, infrastructure for A/B testing, monitoring, and security.
These responsibilities are a mix of new ones coming from serverless and ones that already existed with kubernetes-based platforms.
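On point 4 above, one common way to keep developers from interfering with each other in a shared AWS account is a naming convention that derives an isolated stack or resource prefix per developer and branch. The convention sketched here is an assumption for illustration, not an AWS requirement.

```python
import re

# Sketch of a naming convention for per-developer sandboxes: derive a
# unique, CloudFormation-friendly stack prefix from the application,
# developer, and branch names so concurrent deployments to the same
# account don't collide. The rules themselves are an assumption.
def sandbox_prefix(app: str, developer: str, branch: str) -> str:
    raw = f"{app}-{developer}-{branch}"
    # Keep only alphanumerics and hyphens, collapse repeats, cap length.
    cleaned = re.sub(r"[^A-Za-z0-9-]+", "-", raw).strip("-")
    cleaned = re.sub(r"-{2,}", "-", cleaned)
    return cleaned[:63].lower()

print(sandbox_prefix("orders", "maria", "feature/new-checkout"))
# -> orders-maria-feature-new-checkout
```

A platform team can bake such a convention into its pipeline blueprints and IaC patterns so each developer gets a disposable, non-conflicting copy of the stack.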

Switching costs

Transitioning from a Kubernetes-based platform to a serverless-based platform generates multiple switching costs. Changes in architectures, skillsets, organization, and mental models all generate costs. These costs have to be balanced against the expected returns: opportunity costs versus daily rates, more efficient operations, and better-aligned IT costs and revenues. Not all companies are equal regarding these benefits. Companies where excellence in IT has a visible impact on revenue have a better balance between costs and benefits when moving from containers to serverless than companies where IT is a cost center and better IT doesn't impact revenue significantly. In other words, if the motivation for the change is to reduce costs to increase margin, your cost/benefit balance may not tilt positively.

More serverless

In the introduction, we saw that implementations are not simply non-serverless on one side and serverless on the other; they often mix components of both worlds. So you may decide to gradually progress toward more serverless, evolving the organization and the mindset smoothly. One team we worked with decided to adopt Amazon ECS with AWS Fargate alongside their Amazon EKS clusters. They used tools like Crossplane and AWS Controllers for Kubernetes to manage the Amazon ECS resources as they were doing with their Kubernetes resources. This allowed them to maintain most of their existing paradigm while delegating more work to AWS and reaping some more benefits from the serverless stack. They followed the model "Kubernetes is my control plane but not my data plane" from our colleague Massimo Re Ferre in his blog posts Kubernetes as a platform vs. Kubernetes as an API.

Conclusion

We went through the changes in mental model, tools and organization for a kubernetes-centric team when adopting serverless.
I highlighted benefits gained with serverless, like self-documented workflows with Step Functions.
I showed that monoliths in AWS Lambda-based serverless, though possible, are an antipattern.
I demonstrated the impact on the IaC when working with serverless and the need to change the tooling.
This impact also becomes apparent on the CI/CD pipelines both on the tooling and on the pipeline configuration.
I summarized and extended the changes in roles and responsibilities between the application teams and the platform team presented throughout the sections. I concluded that the cost/benefit equation depends on the relationship between IT excellence and revenue.
Finally, I highlighted the possibility of gradually moving toward more serverless implementations, as a glide path requiring fewer changes at once.
 

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
