Provisioning a Cost-Effective and Scalable EKS Cluster with Managed Spot Instance Node Groups and Auto-Scaling Policies

A managed Amazon EKS cluster was set up with Spot Instances and auto-scaling for cost-effective scalability. After testing, the cluster was deleted, automatically removing the Spot Instances, demonstrating a streamlined approach to managing EKS resources.

Published Nov 11, 2024
What are Spot Instances?
Spot Instances are a type of Amazon EC2 instance that lets you use spare AWS computing capacity at a discount of up to 90% compared to On-Demand instances. They are cost-effective but come with the trade-off that AWS can interrupt and reclaim them with little notice (usually a two-minute warning) if the capacity is needed for other customers.
Key Characteristics of Spot Instances:
  • Significant Cost Savings: Since Spot Instances use excess AWS capacity, they’re much cheaper than standard On-Demand instances.
  • Interruption Flexibility: Spot Instances are ideal for workloads that can tolerate interruptions, as they may be terminated by AWS when the capacity is needed elsewhere.
  • Dynamic Availability: Spot Instance availability and price can fluctuate depending on demand for EC2 resources in specific regions or instance types.
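Because availability and pricing fluctuate, it can be worth a quick look at recent Spot prices before settling on instance types. As an optional check, the AWS CLI can show the recent price history (the instance type and region below are just examples):
  # Show recent Spot price history for t3.medium Linux instances in eu-west-1
  aws ec2 describe-spot-price-history \
    --instance-types t3.medium \
    --product-descriptions "Linux/UNIX" \
    --region eu-west-1 \
    --max-items 5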
Why are we using Spot Instances to create the EKS cluster?
We use Spot Instances in this setup to significantly reduce costs. Spot Instances are available at a discount of up to 90% compared to On-Demand instances because they utilize unused EC2 capacity. This makes them an excellent choice for workloads that can handle interruptions, as AWS can reclaim Spot Instances when needed.
For an EKS cluster, Spot Instances are ideal for:
  • Cost Efficiency: Lower infrastructure costs, especially for workloads that don’t need constant uptime.
  • Scalable, Fault-Tolerant Applications: Suitable for batch processing, CI/CD, web servers, and other stateless or flexible workloads that can handle periodic interruptions.
By combining Spot Instances with auto-scaling, we get both cost savings and dynamic scaling to match resource demand, maximizing efficiency.
Tools Required for Setting Up Amazon EKS with Spot Instances
  1. eksctl: To create and manage EKS clusters.
  2. kubectl: To interact with and manage Kubernetes clusters.
  3. awscli: To configure AWS credentials and interact with AWS services.
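Before starting, it is worth confirming that all three tools are installed and that your AWS credentials are configured. A quick sanity check looks like this:
  # Verify the tools are installed
  eksctl version
  kubectl version --client
  aws --version
  # Confirm AWS credentials are configured correctly
  aws sts get-caller-identity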
Step 1: Create the EKS Cluster Without Any Node Groups
Create an EKS cluster without a node group using the eksctl command (eksctl create cluster --name=devopstronaut-cluster --region=eu-west-1 --without-nodegroup). By default, eksctl creates a node group with m5.large instances, so we used the --without-nodegroup option to skip creating a default node group.
Go to the EKS console to verify that the cluster was successfully created by eksctl.
If you view the created EKS cluster in the console, you'll see that it was created without any node groups.
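If you prefer the CLI over the console, the same verification can be done with the commands below; the cluster should report ACTIVE and have no node groups yet:
  # Check the cluster status (expect "ACTIVE")
  aws eks describe-cluster --name devopstronaut-cluster --region eu-west-1 \
    --query "cluster.status" --output text
  # List node groups for the cluster (expect none at this stage)
  eksctl get nodegroup --cluster devopstronaut-cluster --region eu-west-1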
Step 2: Create a Managed Node Group with Spot Instances
Add a managed node group with Spot Instances using the following separate eksctl command: eksctl create nodegroup --name devops-spot-ng --cluster devopstronaut-cluster --region eu-west-1 --nodes 2 --nodes-min 1 --nodes-max 3 --instance-types "t3.medium,t3.large" --managed --spot
Explanation of the flags:
  • --cluster: Specifies the name of the existing EKS cluster to which the node group will be added.
  • --name: Names the node group for easy identification.
  • --region: Specifies the AWS region.
  • --nodes: Sets the initial desired number of nodes (in this case, 2).
  • --nodes-min and --nodes-max: Define the minimum and maximum number of nodes for auto-scaling.
  • --instance-types: Specifies multiple instance types for Spot Instances, improving availability by allowing AWS to select from these instance types.
  • --managed: Ensures that the node group is managed by AWS for updates and scaling.
  • --spot: Specifies that this node group should use Spot Instances.
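As an alternative to the long command line, eksctl also accepts a config file. The sketch below is intended to be roughly equivalent to the flags above; the field names follow the eksctl ClusterConfig schema, so double-check them against your eksctl version:
  # Write an eksctl config file describing the same managed Spot node group (sketch)
  cat > spot-nodegroup.yaml <<'EOF'
  apiVersion: eksctl.io/v1alpha5
  kind: ClusterConfig
  metadata:
    name: devopstronaut-cluster
    region: eu-west-1
  managedNodeGroups:
    - name: devops-spot-ng
      instanceTypes: ["t3.medium", "t3.large"]
      spot: true
      desiredCapacity: 2
      minSize: 1
      maxSize: 3
  EOF
  # Create the node group from the config file
  eksctl create nodegroup --config-file=spot-nodegroup.yaml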
If you check in the console, you'll see that the node group has been created.
In the Spot Requests section of the EC2 console, you can view the Spot Instances created for the node group.
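The same information is available from the CLI; for example, active Spot requests in the region can be listed with:
  # List active Spot Instance requests in eu-west-1
  aws ec2 describe-spot-instance-requests \
    --filters "Name=state,Values=active" \
    --region eu-west-1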
Step 3: Configure the kubectl Context for the EKS Cluster
Set the Kubernetes context for the EKS cluster using the following command: aws eks --region eu-west-1 update-kubeconfig --name devopstronaut-cluster
Use kubectl get ns to view namespaces and kubectl get nodes to check the status of the nodes in the cluster, verifying that the setup is complete.
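In practice, the verification looks something like this:
  # Point kubectl at the new cluster and confirm the active context
  aws eks --region eu-west-1 update-kubeconfig --name devopstronaut-cluster
  kubectl config current-context
  # Basic sanity checks: namespaces and node status
  kubectl get ns
  kubectl get nodes -o wide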
Step 4: Enable Auto-Scaling Policies
To enable --asg-access (Auto Scaling Group access) permissions for an existing node group, you'll need to update the IAM role linked to the node group with the permissions that --asg-access would normally apply.
Open the node group for which you want to enable the auto-scaling policy, then select View in IAM to access its associated IAM role.
The IAM role will open in a new tab. Click Add inline policy to create a custom policy for auto-scaling permissions.
Use the Policy editor to create the inline policy with the necessary permissions, then click Next to proceed.
Enter a name for the policy, then click Create policy to save it.
The policy is now attached to the IAM role.
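If you prefer to do this from the CLI instead of the console, a minimal sketch of such an inline policy is shown below. It mirrors the kind of Auto Scaling permissions the Cluster Autoscaler typically needs (the exact policy that eksctl's --asg-access generates may differ slightly); the role name and policy name here are placeholders.
  # Write a minimal auto-scaling policy document (sketch; tighten resources as needed)
  cat > asg-access-policy.json <<'EOF'
  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": [
          "autoscaling:DescribeAutoScalingGroups",
          "autoscaling:DescribeAutoScalingInstances",
          "autoscaling:DescribeLaunchConfigurations",
          "autoscaling:DescribeTags",
          "autoscaling:SetDesiredCapacity",
          "autoscaling:TerminateInstanceInAutoScalingGroup",
          "ec2:DescribeLaunchTemplateVersions"
        ],
        "Resource": "*"
      }
    ]
  }
  EOF
  # Attach it as an inline policy to the node group's IAM role (replace the role name)
  aws iam put-role-policy \
    --role-name <node-instance-role-name> \
    --policy-name eks-asg-access \
    --policy-document file://asg-access-policy.json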
Alternate Method to Enable Auto-Scaling Policies: Recreate the Node Group with --asg-access
To apply --asg-access using eksctl, you'll need to delete the existing node group and recreate it with the --asg-access flag.
Note: Deleting and recreating a node group can disrupt workloads, so ensure you manage workload migration appropriately to avoid downtime.
Step 4a: Drain Each Node in the Node Group
Before deleting the node group, drain the nodes to safely migrate workloads to other nodes or prepare them for deletion.
For each node in the node group, use the kubectl drain command to evict all workloads from the node safely. Run this command to drain the node (kubectl drain ip-192-168-7-16.eu-west-1.compute.internal --ignore-daemonsets --delete-emptydir-data):
  • <node-name>: Replace this with the name of the node you want to drain.
  • --ignore-daemonsets: Ensures that daemonset-managed pods are not evicted (daemonsets will be automatically removed when the node is deleted).
  • --delete-emptydir-data: Deletes data in emptyDir volumes, as this data is not preserved when pods are rescheduled.
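If the node group has several nodes, a small loop can drain them all in one pass. This sketch assumes the eks.amazonaws.com/nodegroup label that EKS applies to managed node group nodes:
  # Drain every node that belongs to the devops-spot-ng node group
  for node in $(kubectl get nodes -l eks.amazonaws.com/nodegroup=devops-spot-ng \
      -o jsonpath='{.items[*].metadata.name}'); do
    kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
  done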
If you encounter a Pod Disruption Budget (PDB) error while draining the node, run the following command to get details of the PDBs in the kube-system namespace: kubectl get pdb -n kube-system
Run this command to delete the Pod Disruption Budget (kubectl delete pdb coredns -n kube-system).
Now, the other node has also been drained.
Do not use kubectl delete node: kubectl drain is specifically designed to safely evict pods from a node. Using kubectl delete node will forcibly remove the node without draining it first, which can disrupt your workloads.
Step 4b: Delete the Node Group
Delete the existing node group using the eksctl command (eksctl delete nodegroup --cluster devopstronaut-cluster --name devops-spot-ng --region eu-west-1).
In the console, the node group status will show as 'deleting' while the removal process is underway.
Recreate the Node Group with --asg-access
Recreate the node group with the required --asg-access permissions using this command (eksctl create nodegroup --name devops-spot-ng --cluster devopstronaut-cluster --region eu-west-1 --nodes 2 --nodes-min 1 --nodes-max 3 --instance-types "t3.medium,t3.large" --managed --spot --asg-access).
--asg-access: Grants the IAM permissions required by the Cluster Autoscaler to manage the Auto Scaling Group (ASG) that backs the node group. This flag is useful for enabling dynamic scaling based on cluster workload needs.
If you check in the console, you'll see that the node group has been created.
Click View in IAM next to the Node IAM role ARN, which will open in a new tab. There, you'll see the inline policy that was created for the auto-scaling permissions.
Select the inline policy to view the permissions that were automatically configured for auto-scaling.
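The same check can be scripted; a rough CLI equivalent (with the role name derived from the node group's role ARN) would be:
  # Look up the node group's IAM role and list its inline policies
  NODE_ROLE_ARN=$(aws eks describe-nodegroup \
    --cluster-name devopstronaut-cluster \
    --nodegroup-name devops-spot-ng \
    --region eu-west-1 \
    --query "nodegroup.nodeRole" --output text)
  aws iam list-role-policies --role-name "${NODE_ROLE_ARN##*/}"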
Run the kubectl get nodes command to verify the newly created nodes.
You should be able to see the newly created Spot Instances.
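To confirm from kubectl that the nodes really are Spot capacity, you can surface the capacity-type label that EKS sets on managed nodes (the CAPACITYTYPE column should read SPOT):
  # Show capacity type and instance type for each node
  kubectl get nodes -L eks.amazonaws.com/capacityType -L node.kubernetes.io/instance-type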
Step 5: Delete the EKS Cluster
Delete the EKS cluster using this command (eksctl delete cluster --name=devopstronaut-cluster). This command will remove the entire cluster, including associated resources.
As part of the cluster deletion process, the associated Spot Instances are also removed.
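A final check from the CLI confirms that nothing is left behind:
  # Expect no clusters in the region once deletion completes
  eksctl get cluster --region eu-west-1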
Conclusion:
We created an Amazon EKS cluster with a managed Spot Instance node group and enabled auto-scaling permissions for dynamic scaling. After verifying the setup, we deleted the cluster, which also removed the Spot Instances automatically. This process highlights an efficient, cost-effective approach to managing scalable EKS resources.
Keep Learning, Keep EKS-ing!!
Feel free to reach out to me if you have any other queries or suggestions. Stay connected on LinkedIn: Mahendran Selvakumar
Stay connected on Medium: https://devopstronaut.com/
 
