Supercharge Your EKS Scaling with KEDA and Application Load Balancer (ALB) Metrics
Learn how to use Application Load Balancer (ALB) metrics with KEDA on EKS to auto-scale workloads based on demand for efficient resource management.
Published Nov 27, 2024
In part 1 of this blog series, we explored the basics of using Kubernetes Event-Driven Autoscaling (KEDA) with Amazon Elastic Kubernetes Service (EKS) to scale workloads down to zero based on demand. In this part, we’ll take it a step further by discussing how to leverage Application Load Balancer (ALB) metrics as triggers for KEDA to automatically scale workloads up or down.
ALB provides key metrics such as `RequestCount` and `TargetResponseTime`, which can be used to gauge incoming traffic and application performance. By leveraging these metrics, you can scale based on demand without having to provision excessive resources. Scaling based on `RequestCount` is particularly useful for traffic-heavy applications, while `TargetResponseTime` is helpful for applications where latency is a critical factor. KEDA can consume these metrics via Amazon CloudWatch and use them as triggers to scale pods dynamically, making your workload more responsive and cost-efficient.

Let’s go through the configuration to enable autoscaling with KEDA using CloudWatch metrics from ALB.
1. Install KEDA
If you haven’t already installed KEDA on your Kubernetes cluster, you can do so with the following commands:
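A typical Helm-based install, using the chart names from the official KEDA documentation:

```bash
# Add the official KEDA Helm chart repository
helm repo add kedacore https://kedacore.github.io/charts
helm repo update

# Install the KEDA operator into its own namespace
helm install keda kedacore/keda --namespace keda --create-namespace
```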
You can find detailed installation steps for KEDA in Part 1 of this blog.
2. Create an AWS IAM Role for Accessing ALB Metrics
Your Kubernetes cluster needs permission to read CloudWatch metrics. You can create an AWS IAM role with the necessary permissions and associate it with your Kubernetes service account.
Attach this policy to the IAM role used by the KEDA operator:
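A minimal sketch of such a policy; the read-only CloudWatch actions below are what the aws-cloudwatch scaler needs to query metrics:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:GetMetricData",
        "cloudwatch:ListMetrics"
      ],
      "Resource": "*"
    }
  ]
}
```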
3. Verify ALB Metrics in CloudWatch
Make sure your ALB metrics are being sent to CloudWatch. By default, ALB sends several key metrics, such as `RequestCount`, `TargetResponseTime`, and `HTTPCode_ELB_5XX_Count`. For this example, we’ll use `RequestCount`.

4. Create a ScaledObject
A `ScaledObject` is a KEDA custom resource that defines the criteria for scaling. Here’s an example YAML file for scaling based on `RequestCount`.
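A sketch of such a `ScaledObject`, assuming a Deployment named `my-app` and a placeholder ALB dimension value (substitute your own load balancer and region; note that the aws-cloudwatch scaler expresses the target as `targetMetricValue`):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: alb-requestcount-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: my-app                      # hypothetical Deployment to scale
  minReplicaCount: 1
  maxReplicaCount: 10
  cooldownPeriod: 300                 # seconds to wait before scaling back down
  triggers:
    - type: aws-cloudwatch
      metadata:
        namespace: AWS/ApplicationELB
        dimensionName: LoadBalancer
        dimensionValue: app/my-alb/1234567890abcdef  # placeholder; copy from your ALB
        metricName: RequestCount
        metricStat: Sum
        targetMetricValue: "100"      # requests per window that trigger a scale-out
        minMetricValue: "0"
        metricCollectionTime: "300"   # look back 300 s (5 minutes) in CloudWatch
        metricStatPeriod: "300"
        awsRegion: us-east-1          # assumption; use your ALB's region
        identityOwner: operator       # permissions come from the KEDA operator's role
```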
In this configuration:
- `targetMetricValue`: the number of requests per 5-minute window that triggers scaling.
- `metricCollectionTime`: how far in the past (in seconds) the scaler should query AWS CloudWatch.
- `cooldownPeriod`: the time KEDA waits before evaluating scaling changes again.
- `identityOwner`: whether permissions for CloudWatch come via Pod Identity (`pod`) or from the KEDA operator itself (`operator`).
5. Configure TriggerAuthentication for Pod Identity
If you’re using Pod Identity to grant access to AWS services, you’ll need to define a `TriggerAuthentication` resource for KEDA, as shown below. This configuration allows KEDA to authenticate with AWS using the IAM Roles for Service Accounts (IRSA) role associated with the pod. Ensure the attached IAM role has the necessary permissions to access CloudWatch metrics, as previously specified.
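A minimal sketch, using KEDA’s `aws-eks` pod identity provider (the resource name is illustrative):

```yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-aws-trigger-auth
  namespace: default
spec:
  podIdentity:
    provider: aws-eks   # authenticate via the IAM role bound to the pod's service account (IRSA)
```

To use it, point the trigger’s `authenticationRef` at this resource in the `ScaledObject` instead of relying on `identityOwner: operator`.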
6. Apply the ScaledObject and TriggerAuthentication
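Assuming the manifests above are saved locally (file names are illustrative):

```bash
# Apply the authentication resource first, then the scaler
kubectl apply -f keda-trigger-auth.yaml
kubectl apply -f alb-scaledobject.yaml
```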
- Monitor the scaling behaviour: as traffic increases, the number of pods should scale up, and as traffic decreases, it should scale down based on the thresholds specified (see the commands after this list).
- You can adjust the `targetMetricValue` and `metricCollectionTime` parameters based on your traffic patterns for optimised scaling.
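One way to watch this in action, assuming the `default` namespace (KEDA manages a standard Horizontal Pod Autoscaler under the hood):

```bash
# KEDA creates an HPA named keda-hpa-<scaledobject-name>; watch its metrics and replica count
kubectl get hpa -n default -w

# Watch pods come and go as traffic rises and falls
kubectl get pods -n default -w
```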
Best Practices
- Experiment with Target Values: Depending on your application’s traffic, you might need to tune `targetMetricValue` to find the ideal threshold.
- Monitor Costs: Since CloudWatch metric queries can incur costs, consider limiting the query frequency or time range as appropriate.
- Set Max Replicas Carefully: Set `maxReplicaCount` based on the maximum capacity your cluster can handle.
- Combine with Other Metrics: You can configure multiple triggers in a single `ScaledObject`, allowing you to scale based on a combination of metrics (e.g. `RequestCount` and `TargetResponseTime`), as sketched below.
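A sketch of such a multi-trigger setup; only the `triggers` list of the earlier `ScaledObject` is shown, and the latency threshold is an illustrative value:

```yaml
  triggers:
    - type: aws-cloudwatch
      metadata:
        namespace: AWS/ApplicationELB
        dimensionName: LoadBalancer
        dimensionValue: app/my-alb/1234567890abcdef  # placeholder
        metricName: RequestCount
        metricStat: Sum
        targetMetricValue: "100"
        minMetricValue: "0"
        metricCollectionTime: "300"
        metricStatPeriod: "300"
        awsRegion: us-east-1
        identityOwner: operator
    - type: aws-cloudwatch
      metadata:
        namespace: AWS/ApplicationELB
        dimensionName: LoadBalancer
        dimensionValue: app/my-alb/1234567890abcdef  # placeholder
        metricName: TargetResponseTime
        metricStat: Average
        targetMetricValue: "0.5"    # scale out when average latency exceeds ~500 ms (illustrative)
        minMetricValue: "0"
        metricCollectionTime: "300"
        metricStatPeriod: "300"
        awsRegion: us-east-1
        identityOwner: operator
```

KEDA evaluates each trigger independently, and the underlying HPA scales to the highest replica count any trigger demands.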
Integrating KEDA with ALB metrics unlocks a dynamic, event-driven scaling solution that keeps your applications consistently performing at their best, regardless of fluctuating demands. This approach goes beyond traditional static or CPU-based autoscaling by adapting to real-time traffic directly from your ALB metrics, allowing you to optimize resource usage precisely when and where it’s needed most. With this setup, you’ll be able to handle peak loads effortlessly while scaling back during quiet periods, leading to substantial cost savings and a far more efficient infrastructure.