Run Kubernetes Clusters for Less with Amazon EC2 Spot and Karpenter
Learn how to run Kubernetes clusters for up to 90% off with Amazon Elastic Kubernetes Service (EKS), Amazon EC2 Spot Instances, and Karpenter - all in less than 60 minutes.
Christian Melendez
Amazon Employee
Published Sep 7, 2023
Last Modified Apr 26, 2024
One of the main cost factors for Kubernetes clusters is the compute layer of the data plane. Running Kubernetes clusters on Amazon EC2 Spot Instances is a great way to reduce your compute costs significantly. When using Spot Instances, you can get up to a 90% price discount compared to On-Demand Instances.
Spot is a great match for stateless, fault-tolerant, and flexible workloads such as big data, containerized workloads, CI/CD, web servers, high-performance computing (HPC), and test & development workloads. Containers often match these characteristics, which makes them Spot-friendly. For non-Spot-friendly workloads, like stateful applications within your cluster, you can continue using On-Demand Instances.
To optimize data plane capacity further, you can add nodes when pods are unschedulable due to insufficient capacity, or remove nodes when they’re no longer needed. For automatic node adjustment, use either Cluster Autoscaler (CA) or Karpenter. Both tools support Spot, and in this tutorial I’ll focus on Karpenter.
I’ll walk you through the steps you need to follow to configure an EKS cluster with Spot Instances and Karpenter. Additionally, I’ll show you how to configure a workload to see Karpenter in action by provisioning the required capacity using Spot Instances.
Karpenter is an open-source node provisioning project built for Kubernetes. As new pods keep arriving in your cluster, either because you increased the number of replicas manually, through a Horizontal Pod Autoscaler (HPA) policy, or through a Kubernetes Event-driven Autoscaling (KEDA) event, at some point your data plane nodes will be at full capacity, leaving you with pending (unschedulable) pods. The Karpenter controller reacts to this problem: it aggregates the resource requests of these pending pods and evaluates their scheduling constraints (resource requests, node selectors, affinities, tolerations, and topology spread constraints). Then, Karpenter provisions the right nodes to meet the requirements of these pending pods.
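To make the aggregation step more concrete, here’s a minimal Python sketch of the idea: sum the requests of the pending pods, then look for the smallest node shape that fits them all. The `PodRequest` and `InstanceType` classes, the `example.*` names, and all numbers are assumptions for illustration; this is not Karpenter’s actual code, and it ignores scheduling constraints and system-reserved capacity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodRequest:
    """Resource requests of one pending pod (hypothetical simplified model)."""
    cpu_millicores: int
    memory_mib: int


@dataclass(frozen=True)
class InstanceType:
    """A candidate node shape; the values used below are illustrative."""
    name: str
    cpu_millicores: int
    memory_mib: int


def aggregate_requests(pods):
    """Sum CPU and memory requests across all pending pods."""
    total_cpu = sum(p.cpu_millicores for p in pods)
    total_mem = sum(p.memory_mib for p in pods)
    return total_cpu, total_mem


def smallest_fitting(instance_types, pods):
    """Return the smallest instance type that can hold every pending pod, or None."""
    need_cpu, need_mem = aggregate_requests(pods)
    candidates = [
        it for it in instance_types
        if it.cpu_millicores >= need_cpu and it.memory_mib >= need_mem
    ]
    # Prefer the shape with the least CPU, then the least memory.
    return min(candidates, key=lambda it: (it.cpu_millicores, it.memory_mib), default=None)


# Ten pending pods, each requesting 0.5 vCPU and 1 GiB (made-up values).
pending = [PodRequest(cpu_millicores=500, memory_mib=1024) for _ in range(10)]
catalog = [
    InstanceType("example.large", 2000, 8192),
    InstanceType("example.xlarge", 4000, 16384),
    InstanceType("example.2xlarge", 8000, 32768),
]

print(aggregate_requests(pending))
print(smallest_fitting(catalog, pending).name)
```

In this toy catalog only the largest shape can hold all ten pods, so a single node is enough. The real controller weighs many more inputs, but the "aggregate, then fit" intuition is the same.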
One of the main advantages of using Karpenter is the simplicity of configuring Spot best practices like instance type diversification (multiple families, sizes, generations, etc.) in what Karpenter calls a `NodePool`. If you’re getting started with Spot in Amazon Elastic Kubernetes Service (EKS) or are struggling with the complexity of configuring multiple node groups, I recommend using Karpenter. However, if you’re already using CA and want to start spending less, you can find the detailed configuration to use Spot with CA here.
✅ AWS experience | Advanced - 300 |
---|---|
⏱ Time to complete | 75 minutes |
💰 Cost to complete | < $10.00 USD |
🧩 Prerequisites | - AWS Account - AWS CLI - Kubernetes CLI (kubectl) - Terraform CLI - Helm |
📢 Feedback | Any feedback, issues, or just a 👍 / 👎 ? |
💾 Code | Download the code |
🛠 Contributors | @jakeskyaws |
⏰ Last Updated | 2024-4-26 |
- You need access to an AWS account with IAM permissions to create an EKS cluster, and an AWS Cloud9 environment if you're running the commands listed in this tutorial.
- Install and configure the AWS CLI
- Install the Kubernetes CLI (kubectl)
- Install the Terraform CLI
- Install Helm (the package manager for Kubernetes)
💡 Tip: You can skip this step if you already have a Cloud9 environment or if you’re planning to run all steps on your own computer. Just make sure you have the proper permissions listed in the prerequisites section of this tutorial.
💡 Tip: You can control in which region to launch the Cloud9 environment by setting the `AWS_REGION` environment variable.
I’ve prepared an AWS CloudFormation template to create a Cloud9 environment. It has all the tools to follow this tutorial like kubectl and Terraform CLI. You can either create the CloudFormation stack through the AWS Console, or do it through the command line. You can download the CloudFormation template here. I'm going to give you all the commands you need to run to create the stack using the CLI.
💡 IMPORTANT: You need to use the same IAM user/role both in the AWS Console and the AWS CLI setup. Otherwise, when you try to open the Cloud9 environment you won't have permission to do so.
Before you create the CloudFormation stack, you need to get a public subnet ID to launch the Cloud9 instance with Internet access. Once you get it, set the following environment variable:
Now, let’s create the Cloud9 environment by running the following command:
Wait 3-5 minutes after CloudFormation finishes, then open the Cloud9 console and open the environment. From now on, you’ll be running all commands in this tutorial in the Cloud9 terminal.
Cloud9 normally manages IAM credentials dynamically. This isn’t currently compatible with the EKS IAM authentication, so you need to disable it and rely on the IAM role instead. To do so, run the following commands in the Cloud9 terminal:
To confirm that you have all the CLI tools needed for this tutorial, input these commands:
💡 NOTE: If the CloudFormation stack has not reached the "CREATE_COMPLETE" status, the CLI tools may not have been installed yet. Please wait until the stack completes before proceeding with any CLI commands.
💡 Tip: The Terraform template used in this tutorial is using an On-Demand managed node group to host the Karpenter controller. However, if you have an existing cluster, you can use an existing node group with On-Demand instances to deploy the Karpenter controller. To do so, you need to follow the Karpenter getting started guide.
In this step you'll create an Amazon EKS cluster using the EKS Blueprints for Terraform project. The Terraform template you’ll use in this tutorial creates a VPC, an EKS control plane, and a Kubernetes service account along with an IAM role, and associates them using IAM Roles for Service Accounts (IRSA) to let Karpenter launch instances. Additionally, the template adds the Karpenter node role to the `aws-auth` ConfigMap to allow nodes to connect, and creates an On-Demand managed node group for the `kube-system` and `karpenter` namespaces.
To create the cluster, run the following commands:
Once complete (after waiting about 15 minutes), run the following command to update the `kube.config` file to interact with the cluster through `kubectl`:
💡 Tip: If you’re using a different region or changed the name of the cluster, you can get the previous command for your setup from the Terraform output by running this command: `terraform output -raw configure_kubectl`.
You need to make sure you can interact with the cluster and that the Karpenter pods are running:
The EKS cluster already has a static managed node group configured in advance for the `kube-system` and `karpenter` namespaces, and it’s the only one you’ll need. For the rest of the pods, Karpenter will launch nodes through a NodePool CRD. The NodePool sets constraints on the nodes that can be created by Karpenter and the pods that can run on those nodes. A single Karpenter NodePool is capable of handling many different pod shapes, and for this tutorial you’ll only create the default `NodePool`.
💡 Tip: Karpenter simplifies data plane capacity management using an approach called group-less auto scaling: Karpenter doesn’t rely on node groups, which map to Auto Scaling groups, to launch nodes. Over time, clusters that run different types of applications (requiring different capacity types) on node groups end up with a complex configuration and operational model where node groups must be defined and provisioned in advance.
Before you continue, you need to enable your AWS account to launch Spot Instances if you haven't launched any yet. To do so, create the service-linked role for Spot by running the following command:
If the role has already been successfully created, you will see:
You don't need to worry about this error. You only needed to run the above command to make sure you have the service-linked role to launch Spot Instances.
Now, you need to create two environment variables that we’ll use next. The values you need can be obtained from the Terraform output variables. Make sure you’re in the same folder where the Terraform `main.tf` file lives and run the following command:
💡 NOTE: If you're working with an existing EKS cluster, make sure to set the proper values for the previous environment variables, as we'll use those values to set up the Karpenter provisioner.
Let’s create a default `NodePool` by running the following commands:
Karpenter is now active and ready to begin provisioning nodes.
Let me highlight a few important settings from the default `NodePool` you just created:
- `requirements`: Here’s where you define the type of nodes Karpenter can launch. Be as flexible as possible and let Karpenter choose the right instance type based on the pod requirements. For this `NodePool`, you’re saying Karpenter can launch either Spot or On-Demand Instances, from families including `c`, `m`, and `r`, with a minimum of 4 vCPUs and 8 GiB of memory. With this configuration, you’re choosing around 150 instance types from the 700+ available today in AWS. Read the next section to understand why this is important.
- `limits`: This is how you constrain the maximum amount of resources that the `NodePool` will manage. Karpenter can launch instances with different specs, so instead of limiting a maximum number of instances (as you’d typically do in an Auto Scaling group), you define a maximum of vCPUs or memory to limit the number of nodes to launch. Karpenter provides a metric to monitor the percentage usage of this `NodePool` based on the limits you configure.
- `disruption`: Karpenter does a great job at launching only the nodes you need, but as pods come and go, at some point the cluster capacity can end up in a fragmented state. To avoid fragmentation and optimize the compute nodes in your cluster, you can enable consolidation. When enabled, Karpenter automatically discovers disruptible nodes and spins up replacements when needed.
- `expireAfter`: Here’s where you define when a node will be deleted. This is useful to force new nodes with up-to-date AMIs. In this example we have set the value to 7 days.
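As a rough illustration of how `requirements` and `limits` narrow down what can be launched, here’s a small Python sketch. The `Offering` class, the helper functions, the inclusive reading of "minimum", and the vCPU limit of 100 are all assumptions made for this example. It doesn’t reproduce the NodePool manifest or Karpenter’s implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offering:
    """One instance type in one capacity type (a simplified, hypothetical model)."""
    instance_type: str
    capacity_type: str  # "spot" or "on-demand"
    vcpus: int
    memory_gib: int

    @property
    def family_letter(self):
        # First character of the type name, e.g. "c" for "c6i.xlarge".
        return self.instance_type[0]


def matches_requirements(offering, capacity_types, families, min_vcpus, min_memory_gib):
    """Check an offering against NodePool-style requirements (inclusive minimums assumed)."""
    return (
        offering.capacity_type in capacity_types
        and offering.family_letter in families
        and offering.vcpus >= min_vcpus
        and offering.memory_gib >= min_memory_gib
    )


def within_limits(current_vcpus, offering, vcpu_limit):
    """Return True if launching the offering keeps total vCPUs at or under the limit."""
    return current_vcpus + offering.vcpus <= vcpu_limit


offerings = [
    Offering("c6i.xlarge", "spot", 4, 8),
    Offering("m6i.large", "spot", 2, 8),        # too few vCPUs
    Offering("r6i.2xlarge", "on-demand", 8, 64),
    Offering("t3.xlarge", "spot", 4, 16),       # family not in c, m, r
]

allowed = [
    o.instance_type
    for o in offerings
    if matches_requirements(o, {"spot", "on-demand"}, {"c", "m", "r"}, 4, 8)
]
print(allowed)

# With a hypothetical limit of 100 vCPUs and 96 already in use:
print(within_limits(96, offerings[0], 100))
print(within_limits(96, offerings[2], 100))
```

The more offerings survive the filter, the more pools Karpenter can choose from, which is why flexible requirements matter for Spot.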
You can also learn more about which other configuration properties are available for a `NodePool` here.
As you noticed, with the above `NodePool` we’re basically letting Karpenter choose from a diverse set of instance types to launch the best instance type possible. If it’s an On-Demand Instance, Karpenter uses the `lowest-price` allocation strategy to launch the cheapest instance type that has available capacity. When you use multiple instance types, you’re less likely to hit the InsufficientInstanceCapacity error.
If it’s a Spot Instance, Karpenter uses the `price-capacity-optimized` (PCO) allocation strategy. PCO looks at both price and capacity availability to launch from the Spot Instance pools that are the least likely to be interrupted and have the lowest possible price. For Spot Instances, applying diversification is key. Spot Instances are spare capacity that EC2 can reclaim when it needs it. Karpenter allows you to diversify extensively so that reclaimed Spot Instances are replaced automatically with instances from other pools where capacity is available.
You’re now going to see Karpenter in action. Your default `NodePool` can launch both On-Demand and Spot Instances, but Karpenter considers the constraints you configure within a pod to launch the right node(s). Let’s create a Deployment with a nodeSelector to run the pods on Spot Instances. To do so, run the following command:
As there are no nodes that match the pod’s requirements, all pods will be `Pending`, making Karpenter react and launch the nodes, similar to this output:
Review the Karpenter logs to see what’s happening while you wait for the new node to be ready. Create the following alias:
Karpenter logs should look similar to this (I’m including only the lines I want to highlight):
By reading the logs, you can see that Karpenter:
- Noticed there were 10 pending pods, and decided it could fit all pods in only one node.
- Is considering the kubelet and kube-proxy `Daemonsets` (2 additional pods), and is aggregating all resources needed for 12 pods. Moreover, Karpenter noticed that 100 instance types match these requirements.
- Launched a `c7g.2xlarge` Spot Instance in `eu-west-2a`, as this was the pool with the most spare capacity at the lowest price.
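The trade-off in that last point, spare capacity versus price, can be pictured with a toy heuristic. The exact logic EC2 applies for `price-capacity-optimized` isn’t described in this tutorial, so treat the sketch below purely as an assumption: the `SpotPool` class, the 1 to 10 capacity score, the threshold, and the prices are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpotPool:
    """A Spot pool is one instance type in one Availability Zone."""
    instance_type: str
    zone: str
    price_per_hour: float  # made-up price
    capacity_score: int    # made-up 1-10 score; higher means more spare capacity


def pick_pool(pools, min_capacity_score=7):
    """Toy heuristic: drop pools with little spare capacity, then take the cheapest.

    Assumes `pools` is non-empty.
    """
    healthy = [p for p in pools if p.capacity_score >= min_capacity_score]
    if not healthy:
        # Nothing looks healthy, so fall back to the pool with the most spare capacity.
        return max(pools, key=lambda p: p.capacity_score)
    return min(healthy, key=lambda p: p.price_per_hour)


pools = [
    SpotPool("c7g.2xlarge", "eu-west-2a", 0.11, 9),
    SpotPool("m6i.2xlarge", "eu-west-2b", 0.09, 3),  # cheapest, but little spare capacity
    SpotPool("r6g.2xlarge", "eu-west-2c", 0.14, 8),
]

chosen = pick_pool(pools)
print(chosen.instance_type, chosen.zone)
```

Note how the cheapest pool loses here: picking on price alone would land on a pool that is more likely to be interrupted.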
Karpenter launched only one node for all pending pods. However, putting all your eggs in the same basket is not recommended: if you lose that node, you’ll need to wait for Karpenter to provision a replacement node (which can be fast, but you’ll still see an impact). To avoid this, and to make the workload more highly available, let’s spread the pods across multiple AZs. Let’s configure a Topology Spread Constraint (TSP) within the `Deployment`.
Before you continue, remove the stateless `Deployment`:
💡 NOTE: To see pods being spread across AZs with similar instance sizes, wait until the pods and the existing EC2 instances launched by Karpenter are removed.
To configure a TSP, add the following snippet between the `nodeSelector` and the `containers` block from the `workload.yaml` file you downloaded before:
💡 Tip: You can download the full version of the deployment manifest including the TSP here.
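To build some intuition for what a topology spread constraint enforces, here’s a short Python sketch of the skew rule: a pod may land in a zone only if that zone’s pod count, after adding the pod, exceeds the least-loaded zone by no more than the maximum skew. The maximum skew of 1, the zone names, and the "first allowed zone" placement order are assumptions for the example. The real Kubernetes scheduler scores candidate nodes in more detail.

```python
def allowed_zones(pods_per_zone, max_skew=1):
    """Zones where one more pod keeps the skew within max_skew."""
    lowest = min(pods_per_zone.values())
    return [
        zone for zone, count in pods_per_zone.items()
        if (count + 1) - lowest <= max_skew
    ]


def spread(pod_count, zones, max_skew=1):
    """Place pods one by one, always into the first zone that satisfies the constraint."""
    placement = {zone: 0 for zone in zones}
    for _ in range(pod_count):
        zone = allowed_zones(placement, max_skew)[0]
        placement[zone] += 1
    return placement


print(spread(10, ["eu-west-2a", "eu-west-2b", "eu-west-2c"]))
```

With ten pods and three zones, no zone ends up more than one pod ahead of the others, so losing a single node or zone affects only part of the workload.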
Create the stateless `Deployment` again. If you downloaded the manifest from GitHub, you can simply run:
Then, you can review the Karpenter logs and notice how different the actions are. Wait one minute and you should see the pods running within three nodes in different AZs:
You should see an output similar to this:
You can simulate a Spot interruption to test the resiliency of your applications. As I said before, Spot is spare capacity offered at steep discounts in exchange for returning it when EC2 needs the capacity back. Spot interruptions come with a two-minute notice before EC2 reclaims the instance. Karpenter can watch for these interruptions (the cluster you created with Terraform is already configured this way). When this happens, Karpenter starts a new node as soon as it sees the Spot interruption warning. Karpenter’s average node startup time means that, generally, there is sufficient time for the new node to become ready and for the pods to move to it before the old node is reclaimed.
You can simulate a Spot interruption using AWS Fault Injection Simulator (FIS), either through the console or with the Amazon EC2 Spot Interrupter CLI.
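To see why the two-minute notice is usually enough, here’s a tiny Python calculation of the slack left once a replacement node is up and the pods have restarted. The startup and restart timings are hypothetical placeholders, not measurements from this tutorial. Plug in the numbers you observe in your own cluster.

```python
SPOT_NOTICE_SECONDS = 120  # the two-minute interruption notice


def slack_before_reclaim(node_startup_seconds, pod_restart_seconds,
                         notice_seconds=SPOT_NOTICE_SECONDS):
    """Seconds left after the replacement is serving; negative means pods may be cut off."""
    return notice_seconds - (node_startup_seconds + pod_restart_seconds)


# Hypothetical timings: node ready in 55 s, pods restarted in 20 s.
print(slack_before_reclaim(55, 20))

# A slower case: large image pulls push the total past the notice window.
print(slack_before_reclaim(100, 40))
```

A negative result suggests the workload should tolerate a brief loss of capacity, for example by running more replicas than strictly needed.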
In this tutorial, I’ll use a CloudFormation template to create an FIS experiment template, and then run an experiment to send a Spot interruption to one randomly selected instance launched by Karpenter. You first need to download the CloudFormation template:
Now let's create the FIS experiment template by running the following command:
Now, you’ll need two extra terminals: 1) to monitor the nodes’ `STATUS`, and 2) for the Karpenter logs. In one terminal, watch the nodes using this command:
In another terminal, run the following commands:
In the third terminal, run the following command to send a Spot interruption:
Review what happens by looking at the Karpenter logs. As soon as the Spot interruption warning lands, Karpenter immediately cordons and drains the node, and also launches a replacement instance:
You can also go back to the terminal where you listed all the nodes, and you'll see how the interrupted instance was cordoned, and when the new instance was launched.
Alternatively, to visualize the consolidation process, you can use eks-node-viewer. `eks-node-viewer` is a tool for visualizing dynamic node usage within a cluster. It was originally developed as an internal tool at AWS for demonstrating consolidation with Karpenter. It displays the scheduled pod resource requests vs the allocatable capacity on the node.
To launch it, execute the following in a new Cloud9 terminal tab:
💡 Tip: You might end up seeing only one/two Spot nodes running, and if you review the Karpenter logs, you’ll see that it was because of the consolidation process.
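The basic question behind consolidation can be sketched in a few lines of Python: can the pods on one node be repacked into the free capacity of the remaining nodes? The function below is a simplified assumption that uses first-fit-decreasing on CPU only, with made-up numbers. Karpenter’s actual consolidation logic takes more factors into account.

```python
def can_consolidate(candidate_pods_cpu, other_nodes_free_cpu):
    """Return True if every pod on the candidate node fits on the remaining nodes.

    Uses first-fit-decreasing on CPU millicores only.
    """
    free = list(other_nodes_free_cpu)
    for pod_cpu in sorted(candidate_pods_cpu, reverse=True):
        for i, available in enumerate(free):
            if available >= pod_cpu:
                free[i] -= pod_cpu
                break
        else:
            # No remaining node has room for this pod.
            return False
    return True


# Pods on the candidate node request 500m, 500m, and 250m of CPU.
print(can_consolidate([500, 500, 250], [1000, 300]))
print(can_consolidate([500, 500, 250], [400, 400, 400]))
```

In the first case the node can be drained and removed; in the second, the free capacity is too fragmented even though the total (1200m) looks close to the 1250m needed.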
You can still launch On-Demand Instances in a cluster that’s also running Spot Instances for those non-Spot-friendly workloads. Continue using the default Karpenter `NodePool` you created before, but make sure you’re configuring the workload properly. One way of doing it is to use the same approach as for the Spot-friendly workload: a `nodeSelector`.
Deploy the following application to simulate a stateful workload:
Same as before, wait one minute, and you should see all pods running and one On-Demand node running:
You can review the Karpenter logs as well. You’ll see behavior similar to what you saw with the Spot-friendly workload.
When you’re done with this tutorial, remove the two deployments you created:
Wait 30 seconds until the nodes that Karpenter launched are gone (due to consolidation), then remove all resources:
Using Spot Instances for your Kubernetes data plane nodes helps you reduce compute costs. As long as your workloads are fault-tolerant, stateless, and can use a variety of instance types, you can use Spot. Karpenter simplifies configuring your EKS cluster with a high degree of instance type diversification, and provisions only the capacity you need.
You can learn more about using Karpenter on EKS with this hands-on workshop
Karpenter Blueprints is a repository that includes a list of common workload scenarios.
Dive deeper into the Karpenter concepts here
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.