
EKS Cost Optimization: Dynamic Scaling with EventBridge & Lambda

Amazon EKS enables cost optimization by automating the scaling down of clusters during off-peak hours using AWS Lambda and EventBridge, reducing EC2 expenses.

Published Mar 21, 2025
Amazon EKS offers flexibility in deploying Kubernetes applications, with EC2 instances being a popular choice for hosting nodes. This guide focuses on optimizing costs by automating the scaling down of EKS clusters during off-peak hours. By leveraging AWS Lambda functions and Amazon EventBridge, you can reduce EC2 compute expenses without sacrificing operational efficiency.

Understanding the Components

Amazon EventBridge:
  • Central hub for event-driven architecture.
  • Integrates various AWS services and custom applications.
  • Captures events from AWS services, SaaS apps, and custom sources.
AWS Lambda:
  • Serverless computing service.
  • Executes code in response to triggers.
  • Supports multiple programming languages.
  • Executes custom logic in response to events received from EventBridge.
Amazon Elastic Kubernetes Service (EKS):
  • Managed Kubernetes service on AWS.
  • Simplifies deployment, management, and scaling of containerized applications.
  • Eliminates the need to manage underlying infrastructure.
  • Enables efficient resource utilization and scaling based on demand.
[Architecture diagram]

Steps to follow:
Step 1: Configure an AWS IAM policy for the EKS cluster
Step 2: Create an IAM role with the new policy for the Lambda functions
Step 3: Create Lambda functions for scale up and scale down
Step 4: Create an EventBridge schedule for scale down
Step 5: Create an EventBridge schedule for scale up
Step 1: Configure an AWS IAM policy for the EKS cluster
First, create an IAM policy for the EKS cluster. It allows the eks:ListNodegroups, eks:UpdateNodegroupConfig, and eks:DescribeNodegroup actions.
In the IAM console -> Access management -> Policies -> Create policy
On the JSON tab, enter the following policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "eks:ListNodegroups",
        "eks:UpdateNodegroupConfig",
        "eks:DescribeNodegroup"
      ],
      "Resource": [
        "arn:aws:eks:CLUSTER_REGION:ACCOUNT_ID:cluster/CLUSTER_NAME",
        "arn:aws:eks:CLUSTER_REGION:ACCOUNT_ID:nodegroup/CLUSTER_NAME/*/*"
      ]
    }
  ]
}
(Replace CLUSTER_REGION, ACCOUNT_ID, and CLUSTER_NAME with your cluster details.)
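If you prefer to script this step rather than use the console, the same policy document can be generated programmatically before uploading it. The helper name and the example region, account ID, and cluster name below are illustrative placeholders, not values from this walkthrough:

```python
import json

def build_eks_scaling_policy(region, account_id, cluster_name):
    """Build the IAM policy document above with cluster details filled in."""
    cluster_arn = f"arn:aws:eks:{region}:{account_id}:cluster/{cluster_name}"
    nodegroup_arn = f"arn:aws:eks:{region}:{account_id}:nodegroup/{cluster_name}/*/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "eks:ListNodegroups",
                    "eks:UpdateNodegroupConfig",
                    "eks:DescribeNodegroup",
                ],
                "Resource": [cluster_arn, nodegroup_arn],
            }
        ],
    }

# Example with placeholder values:
print(json.dumps(build_eks_scaling_policy("us-east-1", "123456789012", "demo-cluster"), indent=2))
```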
Step 2: Create an IAM role with the new policy for the Lambda functions
In the IAM console -> Access management -> Roles -> Create role
Select the trusted entity type AWS service and the use case Lambda.
Proceed to the permissions step and select the policy that you created in the previous step.
Proceed to the review step and enter a role name of your choice.
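For reference, choosing the AWS service / Lambda trusted entity in the console attaches a trust policy like the one below. The commented-out boto3 call is a hedged sketch of the equivalent scripted creation; the role name in it is hypothetical:

```python
import json

# Trust policy that lets Lambda assume the role; the console generates this
# automatically when you pick the "AWS service / Lambda" trusted entity.
lambda_trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Equivalent scripted creation (hypothetical role name):
# import boto3
# iam = boto3.client("iam")
# iam.create_role(
#     RoleName="eks-scheduler-lambda-role",
#     AssumeRolePolicyDocument=json.dumps(lambda_trust_policy),
# )

print(json.dumps(lambda_trust_policy, indent=2))
```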
Step 3: Create Lambda functions for scale up and scale down
Step 3.1: Create a Lambda function for scale up
In Lambda -> Create function
Under Permissions, select Use an existing role and choose the role that you created in the previous step.
Next, add the Python code below in the code section and deploy it.
import boto3

def describe_nodegroup(client, cluster_name, nodegroup_name):
    return client.describe_nodegroup(
        clusterName=cluster_name,
        nodegroupName=nodegroup_name
    )['nodegroup']['scalingConfig']

def update_nodegroup_config(client, cluster_name, nodegroup_name, scaling_config):
    client.update_nodegroup_config(
        clusterName=cluster_name,
        nodegroupName=nodegroup_name,
        scalingConfig=scaling_config
    )

def update_nodegroup_sizes(eks, cluster_name, nodegroup_sizes):
    for nodegroup_name, size in nodegroup_sizes.items():
        current_size = describe_nodegroup(eks, cluster_name, nodegroup_name)
        if size != current_size['desiredSize']:
            update_nodegroup_config(eks, cluster_name, nodegroup_name, {'desiredSize': size})
            print(f"Updated desired size for node group {nodegroup_name} to {size}")
        else:
            print(f"Desired size is already {size} for node group {nodegroup_name}")

def update_nodegroup_limits(eks, cluster_name, nodegroup_limits, key):
    for nodegroup_name, limit in nodegroup_limits.items():
        current_limit = describe_nodegroup(eks, cluster_name, nodegroup_name)
        if limit != current_limit[key]:
            update_nodegroup_config(eks, cluster_name, nodegroup_name, {key: limit})
            print(f"Updated {key} for node group {nodegroup_name} to {limit}")
        else:
            print(f"{key.capitalize()} is already {limit} for node group {nodegroup_name}")

def lambda_handler(event, context):
    eks = boto3.client('eks')
    cluster_name = "CLUSTER_NAME"

    nodegroup_sizes = {
        'NODE_GROUP_1': ACTUAL_SIZE,
        'NODE_GROUP_2': ACTUAL_SIZE,
        'NODE_GROUP_3': ACTUAL_SIZE
    }

    nodegroup_minsizes = {
        'NODE_GROUP_1': MIN_SIZE,
        'NODE_GROUP_2': MIN_SIZE,
        'NODE_GROUP_3': MIN_SIZE
    }

    nodegroup_maxsizes = {
        'NODE_GROUP_1': MAX_SIZE,
        'NODE_GROUP_2': MAX_SIZE,
        'NODE_GROUP_3': MAX_SIZE
    }

    # Raise the max limit first so the new desired size is always within bounds.
    update_nodegroup_limits(eks, cluster_name, nodegroup_maxsizes, 'maxSize')
    update_nodegroup_sizes(eks, cluster_name, nodegroup_sizes)
    update_nodegroup_limits(eks, cluster_name, nodegroup_minsizes, 'minSize')
(Replace CLUSTER_NAME, the node group names, and the ACTUAL_SIZE, MIN_SIZE, and MAX_SIZE placeholders with your cluster details.)
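One optional refinement, not part of the original walkthrough: instead of hardcoding the cluster details, the handler could read them from Lambda environment variables, so the same deployed code serves several clusters. The variable names and the serialized "name=size" format below are hypothetical choices:

```python
import os

# Hypothetical environment variables:
#   EKS_CLUSTER_NAME -> "demo-cluster"
#   NODEGROUP_SIZES  -> "ng-1=2,ng-2=3"
def load_scaling_config(env):
    """Parse cluster name and per-node-group desired sizes from an env mapping."""
    cluster_name = env.get("EKS_CLUSTER_NAME", "CLUSTER_NAME")
    sizes = {}
    for pair in env.get("NODEGROUP_SIZES", "").split(","):
        if pair:
            name, size = pair.split("=")
            sizes[name] = int(size)
    return cluster_name, sizes

# Inside lambda_handler you would call: load_scaling_config(os.environ)
cluster, sizes = load_scaling_config(
    {"EKS_CLUSTER_NAME": "demo-cluster", "NODEGROUP_SIZES": "ng-1=2,ng-2=3"}
)
print(cluster, sizes)
```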
Step 3.2: Create a Lambda function for scale down
Following the same process as the scale-up function, create the scale-down function using the Python code below:
import boto3

def lambda_handler(event, context):
    eks = boto3.client("eks")
    cluster_name = "CLUSTER_NAME"
    nodegroup_names = ["NODE_GROUP_1", "NODE_GROUP_2", "NODE_GROUP_3"]
    new_desiredSize = 0
    new_minSize = 0
    new_maxSize = 1

    # Update scaling configuration for all node groups
    for nodegroup_name in nodegroup_names:
        response = eks.update_nodegroup_config(
            clusterName=cluster_name,
            nodegroupName=nodegroup_name,
            scalingConfig={
                "desiredSize": new_desiredSize,
                "minSize": new_minSize,
                "maxSize": new_maxSize
            }
        )
        # Print response if needed for debugging
        # print(response)
(Replace CLUSTER_NAME and the node group names with your cluster details.)
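EKS rejects a scaling configuration in which desiredSize falls outside the [minSize, maxSize] range, so a small local check before calling the API can catch mistakes early. This helper is an illustrative addition, not part of the original function:

```python
def validate_scaling_config(min_size, desired_size, max_size):
    """Check the ordering EKS enforces: 0 <= minSize <= desiredSize <= maxSize."""
    if not (0 <= min_size <= desired_size <= max_size):
        raise ValueError(
            f"Invalid scaling config: min={min_size}, desired={desired_size}, max={max_size}"
        )
    return {"minSize": min_size, "desiredSize": desired_size, "maxSize": max_size}

# The scale-down values used above pass the check:
print(validate_scaling_config(0, 0, 1))
# -> {'minSize': 0, 'desiredSize': 0, 'maxSize': 1}
```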
Step 4: Create an EventBridge schedule for scale down
In EventBridge -> Scheduler -> Create schedule
Next, give the schedule a name and a pattern. You can create a cron-based schedule to scale down the EKS node groups.
Next, select Lambda as the target.
In the Invoke section, select the scale-down function that you want to trigger.
Next, review and create the schedule.
Step 5: Create an EventBridge schedule for scale up
Following the same process as the scale-down schedule, create a new schedule for the scale-up function.
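The two schedules can also be expressed as EventBridge Scheduler cron expressions, which use a six-field format: cron(minute hour day-of-month month day-of-week year). The helper below builds such expressions; the times, schedule name, and ARN placeholders in the comments are example assumptions, not values from this article:

```python
def schedule_cron(hour, minute=0, days="MON-FRI"):
    """Build an EventBridge Scheduler cron expression for a daily trigger."""
    return f"cron({minute} {hour} ? * {days} *)"

# Example: scale down at 19:00 and scale up at 07:00 on weekdays.
scale_down_expr = schedule_cron(19)
scale_up_expr = schedule_cron(7)
print(scale_down_expr)  # cron(0 19 ? * MON-FRI *)

# Scripted creation sketch (hypothetical name and placeholder ARNs):
# import boto3
# scheduler = boto3.client("scheduler")
# scheduler.create_schedule(
#     Name="eks-scale-down",
#     ScheduleExpression=scale_down_expr,
#     FlexibleTimeWindow={"Mode": "OFF"},
#     Target={"Arn": "LAMBDA_FUNCTION_ARN", "RoleArn": "SCHEDULER_ROLE_ARN"},
# )
```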

Conclusion

This article discussed scaling processes for Amazon Elastic Kubernetes Service (EKS) clusters, focusing specifically on those running on EC2 instances. It outlined the cost structure associated with running EKS clusters and suggested a strategy to reduce costs by scaling down non-production clusters during off-peak hours, such as at night. The proposed solution uses AWS Lambda functions with Python scripts and Amazon EventBridge to automate the scale-down process.
 
