Scale to Zero in EKS with KEDA
Learn how Kubernetes Event-Driven Autoscaling (KEDA) overcomes HPA limitations in Amazon EKS, optimizing "scale to zero" and autoscaling for cloud-native apps.
Published Nov 27, 2024
Imagine you’re running multi-tenant, production-grade Kubernetes clusters on Amazon EKS, relying on the Horizontal Pod Autoscaler (HPA) to autoscale your workloads. While HPA will serve you well in general, you will encounter some limitations as your clusters grow and face increasing demands. This series of blogs looks at alternative autoscaling options for such situations.
In this blog, we’ll explore the limitations of HPA and how we can address these challenges by adopting KEDA (Kubernetes Event-Driven Autoscaling).
Kubernetes provides HPA to adjust the number of pod replicas based on CPU and memory utilization. While this is sufficient for many use cases, HPA comes with several limitations:
- No Scaling to Zero: HPA cannot scale workloads to zero; once no pods are running, resource metrics like CPU and memory utilization no longer exist, so HPA’s minimum replica count is one. This is inefficient for intermittent workloads where scaling to zero would optimize costs.
- Limited Metric Sources: Out of the box, HPA scales only on resource metrics like CPU and memory. It cannot react to custom events or HTTP traffic without additional adapters.
- Dependence on Metrics Aggregators: HPA requires integration with external metrics providers, such as the Kubernetes Metrics Server, Prometheus Adapter, or the deprecated k8s-cloudwatch-adapter, which adds operational complexity.
- Inaccuracy in Scaling Decisions: HPA’s scaling decisions may not align precisely with actual target metrics, as the scaling algorithm often lags behind the real-time metrics of the system.
As your clusters grow, the variety of workloads you manage increases, demanding more sophisticated autoscaling strategies. This often results in the need to scale workloads not only on internal metrics but also on external metrics from services like Amazon Managed Prometheus, AWS CloudWatch, or Kafka. Kubernetes’ External Metrics API allows autoscaling based on these metrics, but there’s a key limitation: only one metrics server can serve external metrics (external.metrics.k8s.io) per cluster. This forces you to choose a single external metrics server (e.g., the CloudWatch Adapter or KEDA), which is particularly problematic when you need to scale on multiple event sources simultaneously.
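If you want to check which backend currently owns the external metrics API in a cluster, you can inspect the corresponding APIService object (shown here for the v1beta1 version of the API):

```sh
# Only one service can back the external metrics API at a time
kubectl get apiservice v1beta1.external.metrics.k8s.io -o wide
```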
The k8s-cloudwatch-adapter has been the go-to solution for many to perform scaling based on metrics from AWS CloudWatch, and has been particularly useful for autoscaling based on SQS queue length. However, since Kubernetes v1.22 this adapter has been deprecated, and AWS has stopped maintaining it. Instead, AWS now recommends using KEDA for event-driven autoscaling.
KEDA (Kubernetes Event-driven Autoscaler) is a lightweight, open-source component that allows Kubernetes workloads to autoscale based on external event-driven metrics. It works alongside HPA, extending its capabilities without duplicating its functionality. KEDA is flexible, enabling scaling for different Kubernetes workloads (e.g., Deployments, StatefulSets, Jobs) based on event sources outside the cluster.
Some of KEDA’s key features include:
- Scaling workloads based on external events such as AWS SQS, Kafka, Prometheus metrics, or custom event sources.
- Support for scaling to and from zero, addressing one of the primary limitations of HPA.
- Extensibility, allowing users to create custom scalers if needed.
- Ability to scale different types of Kubernetes workloads such as Deployments, StatefulSets, and Jobs based on different types of events.
- Ability to scale custom resources as long as the target Custom Resource defines a /scale subresource.
Furthermore, KEDA is a CNCF project, is widely used for event-driven scaling in Kubernetes, and has strong community backing.
How KEDA Works
The diagram below illustrates how KEDA integrates with the Kubernetes Horizontal Pod Autoscaler (HPA), external event sources, and the Kubernetes etcd data store:
KEDA has two components that perform two key roles within Kubernetes:
- Agent: The KEDA operator acts as an agent, activating and deactivating Kubernetes workloads to scale them up or down (including scaling to and from zero). It manages HPA for applications and ensures that the desired number of replicas matches the current load.
- Metrics: KEDA also acts as a Kubernetes metrics server, exposing event-driven metrics (like queue length or stream lag) to HPA. These metrics inform HPA when to scale up or down based on external event sources.
There is an optional third component as well: admission webhooks, which validate resource configuration changes. We will not be configuring any admission webhooks in this blog, to avoid complexity.
KEDA is installed in the cluster as two key components:
- keda-operator: Manages the scaling logic and maintains the application’s HPA.
- keda-operator-metrics-apiserver: Acts as the metrics server, exposing external metrics to Kubernetes.
To streamline the deployment of KEDA into your EKS cluster, you can use Helm and Terraform as follows.
First, create an IAM role that allows KEDA to communicate with AWS services like CloudWatch or SQS:
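A minimal sketch using IAM Roles for Service Accounts (IRSA); it assumes your cluster’s OIDC provider is already defined in Terraform as aws_iam_openid_connect_provider.eks, and the role and policy names are illustrative:

```hcl
# Illustrative IAM role for the KEDA operator via IRSA.
resource "aws_iam_role" "keda_operator" {
  name = "keda-operator-role" # example name

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRoleWithWebIdentity"
      Principal = { Federated = aws_iam_openid_connect_provider.eks.arn }
      Condition = {
        StringEquals = {
          # Trust only the keda-operator service account in the keda namespace
          "${replace(aws_iam_openid_connect_provider.eks.url, "https://", "")}:sub" = "system:serviceaccount:keda:keda-operator"
        }
      }
    }]
  })
}

# Read-only access to the event sources KEDA will poll.
resource "aws_iam_role_policy" "keda_operator" {
  name = "keda-operator-policy" # example name
  role = aws_iam_role.keda_operator.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["sqs:GetQueueAttributes", "cloudwatch:GetMetricData"]
      Resource = "*" # scope down to specific queues/metrics in production
    }]
  })
}
```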
With the IAM role created, KEDA can now be deployed using Helm. Below is a sample Terraform snippet showing how to install KEDA using a Helm release:
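A sketch of such a release, assuming the Terraform Helm provider is configured against your cluster; note that the exact values key for the service-account annotation varies between chart versions, so check the chart’s values.yaml:

```hcl
resource "helm_release" "keda" {
  name             = "keda"
  repository       = "https://kedacore.github.io/charts"
  chart            = "keda"
  namespace        = "keda"
  create_namespace = true
  # version = "x.y.z" # pin to a chart version you have tested

  # Annotate the operator's service account with the IRSA role so KEDA
  # can query SQS/CloudWatch. The key name depends on the chart version.
  set {
    name  = "serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
    value = aws_iam_role.keda_operator.arn
  }
}
```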
Once installed, we can verify the health of KEDA’s components by running:
kubectl get pods -n keda
You should see output similar to the following (pod name suffixes and ages will differ):
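```
NAME                                               READY   STATUS    RESTARTS   AGE
keda-operator-5b9f5d6b8c-7xkq2                     1/1     Running   0          2m
keda-operator-metrics-apiserver-7f6b54d8d9-h4v8n   1/1     Running   0          2m
```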
To demonstrate scaling a workload based on an external metric, let’s take a look at the architecture below.
We have deployed a demo application that polls an SQS queue and writes a record to a DynamoDB table upon receiving a message.
Sample deployment file
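A sketch of the consumer Deployment; the image, queue URL, table name, and service account are placeholders for your own resources:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sqs-consumer
  namespace: keda-test
spec:
  replicas: 0              # KEDA activates the workload, so it can start at zero
  selector:
    matchLabels:
      app: sqs-consumer
  template:
    metadata:
      labels:
        app: sqs-consumer
    spec:
      # IRSA service account with sqs:ReceiveMessage/DeleteMessage and dynamodb:PutItem
      serviceAccountName: sqs-consumer
      containers:
        - name: consumer
          image: <account-id>.dkr.ecr.<region>.amazonaws.com/sqs-consumer:latest
          env:
            - name: QUEUE_URL
              value: https://sqs.<region>.amazonaws.com/<account-id>/keda-test-queue
            - name: TABLE_NAME
              value: keda-test-table
```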
KEDA ScaledObject
This is a key component of the whole orchestration: it defines the criteria for autoscaling the workload based on external metrics or event-driven triggers. It specifies the target deployment, the scaling behaviour, and the metrics or events (like queue length) that KEDA should monitor, allowing the workload to scale dynamically in response to real-time demand.
Below is a sample ScaledObject.
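A sketch matching the demo above; the queue URL, region, and thresholds are illustrative. With identityOwner set to operator, the scaler queries the queue using the IAM role attached to the KEDA operator (the IRSA role created earlier):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sqs-consumer-scaler
  namespace: keda-test
spec:
  scaleTargetRef:
    name: sqs-consumer       # the Deployment to scale
  minReplicaCount: 0         # allow scale to zero when the queue is empty
  maxReplicaCount: 10
  pollingInterval: 30        # seconds between queue-length checks
  cooldownPeriod: 120        # seconds of inactivity before scaling back to zero
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.<region>.amazonaws.com/<account-id>/keda-test-queue
        queueLength: "5"     # target messages per replica
        awsRegion: <region>
        identityOwner: operator  # use the KEDA operator's IRSA role
```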
The demonstration below shows all of this in action:
Terminal 1 (top-left corner)
kubectl get pods -n keda-test -w
This keeps watching the pods in the keda-test namespace; note that there are no pods at the start, and the count grows as the queue length increases (see Terminal 2).
Terminal 2 (bottom-left corner)
This is running a script that prints the SQS queue length on the screen.
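A minimal version of such a script, assuming QUEUE_URL is set to your queue’s URL:

```sh
# Print the approximate queue depth every few seconds
while true; do
  aws sqs get-queue-attributes \
    --queue-url "$QUEUE_URL" \
    --attribute-names ApproximateNumberOfMessages \
    --query 'Attributes.ApproximateNumberOfMessages' \
    --output text
  sleep 5
done
```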
Terminal 3 (top-right corner)
This is running a script that pushes messages to the SQS queue.
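Something like the following, again assuming QUEUE_URL is set:

```sh
# Push 100 test messages onto the queue to trigger scaling
for i in $(seq 1 100); do
  aws sqs send-message \
    --queue-url "$QUEUE_URL" \
    --message-body "test-message-$i" > /dev/null
done
```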
Terminal 4 (bottom-right corner)
This is running a script that counts the number of entries in the DynamoDB table; note that the item count stays at 0 while there are no pods at the start, but then starts increasing as the pods come up to process the messages in the queue.
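A minimal version, assuming TABLE_NAME is set to the demo table’s name (a full Scan is fine for a small demo table):

```sh
# Print the live item count every few seconds
while true; do
  aws dynamodb scan \
    --table-name "$TABLE_NAME" \
    --select COUNT \
    --query 'Count' \
    --output text
  sleep 5
done
```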
By adopting KEDA, you can unlock the ability to scale Kubernetes workloads based on external event-driven metrics, overcoming the limitations of HPA. This allows you to cater to diverse scaling requirements, enabling more efficient and cost-effective scaling for your workloads. Whether it’s scaling based on external metrics from AWS CloudWatch or event sources like Kafka, KEDA provides a flexible and extensible solution to meet your growing scaling needs.