Supercharge Your EKS Scaling with KEDA and Application Load Balancer (ALB) Metrics

Learn how to use Application Load Balancer (ALB) metrics with KEDA on EKS to auto-scale workloads based on demand for efficient resource management.

Published Nov 27, 2024
In part 1 of this blog series, we explored the basics of using Kubernetes Event-Driven Autoscaling (KEDA) with Amazon Elastic Kubernetes Service (EKS) to scale workloads down to zero based on demand. In this part, we’ll take it a step further by discussing how to leverage Application Load Balancer (ALB) metrics as triggers for KEDA to automatically scale workloads up or down.

Why Scale with ALB Metrics?

ALB provides key metrics such as RequestCount and TargetResponseTime, which can be used to gauge incoming traffic and application performance. By leveraging these metrics, you can scale based on demand without having to provision excessive resources. Scaling based on RequestCount is particularly useful for traffic-heavy applications, while TargetResponseTime is helpful for applications where latency is a critical factor. KEDA can consume these metrics via Amazon CloudWatch and use them as triggers to scale pods dynamically, making your workload more responsive and cost-efficient.

Setting Up KEDA with ALB Metrics

Let’s go through the configuration to enable autoscaling with KEDA using CloudWatch metrics from ALB.

1. Install KEDA

If you haven’t already installed KEDA on your Kubernetes cluster, you can install it with Helm. You can find detailed installation steps for KEDA in Part 1 of this blog.
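For reference, a typical Helm-based installation using the upstream chart defaults looks like this:

```shell
# Add the official KEDA Helm repository and install the operator
# into its own namespace (names here are the chart defaults)
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace
```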
2. Create an AWS IAM Role for Accessing ALB Metrics
Your Kubernetes cluster needs permission to read CloudWatch metrics. You can create an AWS IAM role with the necessary permissions and associate it with your Kubernetes service account.
Attach this policy to the IAM role used by the KEDA operator:
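A minimal policy granting read-only access to CloudWatch metrics might look like the following sketch; depending on your setup you may want to scope the actions or resources more tightly:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:GetMetricData",
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:ListMetrics"
      ],
      "Resource": "*"
    }
  ]
}
```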

3. Set Up CloudWatch Metrics for ALB

Make sure your ALB metrics are being sent to CloudWatch. By default, ALB sends several key metrics, such as RequestCount, TargetResponseTime, and HTTPCode_ELB_5XX_Count. For this example, we’ll use RequestCount.
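You can confirm the metrics are flowing with the AWS CLI; the command below lists RequestCount metrics published under the ALB namespace:

```shell
# List RequestCount metrics published by ALBs in this account/region
aws cloudwatch list-metrics \
  --namespace AWS/ApplicationELB \
  --metric-name RequestCount
```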

4. Define a KEDA ScaledObject

A ScaledObject is a KEDA custom resource that defines the criteria for scaling. Here’s an example YAML file for scaling based on RequestCount:
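The sketch below assumes a Deployment named `my-app` and uses a placeholder ALB dimension value — replace both with your own:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: alb-requestcount-scaler
spec:
  scaleTargetRef:
    name: my-app                 # Deployment to scale (assumed name)
  minReplicaCount: 1
  maxReplicaCount: 10
  cooldownPeriod: 300
  triggers:
    - type: aws-cloudwatch
      metadata:
        namespace: AWS/ApplicationELB
        metricName: RequestCount
        dimensionName: LoadBalancer
        dimensionValue: app/my-alb/50dc6c495c0c9188   # placeholder; use your ALB
        metricStat: Sum
        metricStatPeriod: "300"
        metricCollectionTime: "300"
        targetMetricValue: "100"
        minMetricValue: "0"
        awsRegion: us-east-1
        identityOwner: operator
```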
In this configuration:
  • targetMetricValue: The number of requests per 5-minute window that triggers scaling.
  • metricCollectionTime: How far back in time (in seconds) the scaler queries AWS CloudWatch.
  • cooldownPeriod: How long KEDA waits after the last active trigger before scaling back down.
  • identityOwner: Whether CloudWatch permissions come from Pod Identity (pod) or from the KEDA operator itself (operator).
If you’re using Pod Identity to grant access to AWS services, you’ll need to define a TriggerAuthentication resource for KEDA as follows:
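A minimal TriggerAuthentication using IRSA-based pod identity could look like this (the resource name is illustrative):

```yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-aws-auth
spec:
  podIdentity:
    provider: aws-eks   # use the IAM role bound to the pod's service account (IRSA)
```

The ScaledObject trigger would then reference it via an `authenticationRef` pointing at `keda-aws-auth`, with `identityOwner` set to `pod`.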
This configuration allows KEDA to authenticate with AWS using the assigned IAM role for service account (IRSA) associated with the pod. Ensure the IAM role attached has the necessary permissions to access CloudWatch metrics, as previously specified.

Applying and Testing the Configuration

  1. Apply the ScaledObject and TriggerAuthentication resources to your cluster.
  2. Monitor the scaling behavior. As traffic increases, the number of pods should scale up, and as traffic decreases, it should scale down based on the thresholds specified.
  3. Adjust the targetMetricValue and metricCollectionTime parameters based on your traffic patterns for optimized scaling.
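To exercise the scaler, you can generate sustained traffic against the ALB and watch the replicas react. The commands below are one way to do this, using the `hey` load-generation tool and a placeholder ALB hostname:

```shell
# Send 5 minutes of traffic at 50 concurrent connections (placeholder hostname)
hey -z 5m -c 50 http://my-alb-1234567890.us-east-1.elb.amazonaws.com/

# In another terminal, watch the pods scale up and back down
kubectl get pods -w
```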

Best Practices

  • Experiment with Target Values: Depending on your application’s traffic, you might need to tune the targetValue to find the ideal threshold.
  • Monitor Costs: Since CloudWatch metrics can incur costs, consider limiting the query frequency or time range as appropriate.
  • Set Max Replicas Carefully: Set maxReplicaCount based on the maximum capacity your cluster can handle.
  • Combine with Other Metrics: You can configure multiple triggers in a single ScaledObject, allowing you to scale based on a combination of metrics (e.g., RequestCount and TargetResponseTime).

Conclusion

Integrating KEDA with ALB metrics unlocks a dynamic, event-driven scaling solution that keeps your applications consistently performing at their best, regardless of fluctuating demands. This approach goes beyond traditional static or CPU-based autoscaling by adapting to real-time traffic directly from your ALB metrics, allowing you to optimize resource usage precisely when and where it’s needed most. With this setup, you’ll be able to handle peak loads effortlessly while scaling back during quiet periods, leading to substantial cost savings and a far more efficient infrastructure.