Run Kubernetes Clusters for Less with Amazon EC2 Spot and Karpenter
Learn how to run Kubernetes clusters for up to 90% off with Amazon Elastic Kubernetes Service (EKS), Amazon EC2 Spot Instances, and Karpenter - all in less than 60 minutes.
If you’re getting started with Spot in Amazon Elastic Kubernetes Service (EKS) or are struggling with the complexity of configuring multiple node groups, I recommend using Karpenter. However, if you’re already using Cluster Autoscaler (CA) and want to start spending less, you can find the detailed configuration to use Spot with CA here.

| ✅ AWS experience | Advanced - 300 |
| --- | --- |
| ⏱ Time to complete | 75 minutes |
| 💰 Cost to complete | < $10.00 USD |
| 🧩 Prerequisites | AWS Account, AWS CLI, Kubernetes CLI (kubectl), Terraform CLI, Helm |
| 📢 Feedback | Any feedback, issues, or just a 👍 / 👎 ? |
| 💾 Code | Download the code |
| 🛠 Contributors | @jakeskyaws |
| ⏰ Last Updated | 2024-04-26 |
- You need access to an AWS account with IAM permissions to create an EKS cluster, and an AWS Cloud9 environment if you're running the commands listed in this tutorial.
- Install and configure the AWS CLI
- Install the Kubernetes CLI (kubectl)
- Install the Terraform CLI
- Install Helm (the package manager for Kubernetes)
💡 Tip: You can skip this step if you already have a Cloud9 environment or if you’re planning to run all steps on your own computer. Just make sure you have the proper permissions listed in the pre-requisites section of this tutorial.
💡 Tip: You can control in which region to launch the Cloud9 environment by setting the AWS_REGION environment variable.
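For example (us-east-2 is just an illustrative choice; use whichever region you prefer):

export AWS_REGION=us-east-2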
💡 IMPORTANT: You need to use the same IAM user/role both in the AWS Console and in the AWS CLI setup. Otherwise, when you try to open the Cloud9 environment you won't have permissions to do it.
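A quick way to confirm which identity the AWS CLI is currently using is:

aws sts get-caller-identity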
export C9PUBLICSUBNET='<<YOUR PUBLIC SUBNET ID GOES HERE>>'
wget https://raw.githubusercontent.com/build-on-aws/run-kubernetes-clusters-for-less-with-amazon-ec2-spot-and-karpenter/main/cloud9-cnf.yaml
aws cloudformation deploy --stack-name EKSKarpenterCloud9 --parameter-overrides C9PublicSubnet=$C9PUBLICSUBNET --template-file cloud9-cnf.yaml --capabilities "CAPABILITY_IAM"
Once the Cloud9 environment is ready, open it and disable the AWS managed temporary credentials:
aws cloud9 update-environment --environment-id ${C9_PID} --managed-credentials-action DISABLE
rm -vf ${HOME}/.aws/credentials
Verify that all the CLI tools you need are installed:
aws --version
kubectl version --client=true -o json
terraform version
helm version
💡 NOTE: If the CloudFormation stack has not reached the "CREATE_COMPLETE" status, the CLI tools may not have been installed yet. Please wait until the stack completes before proceeding with any CLI commands.
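If you prefer to check the stack status from the command line instead of the console, one way is (assuming the stack name used earlier):

aws cloudformation describe-stacks --stack-name EKSKarpenterCloud9 --query "Stacks[0].StackStatus" --output text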
💡 Tip: The Terraform template used in this tutorial uses an On-Demand managed node group to host the Karpenter controller. However, if you have an existing cluster, you can use an existing node group with On-Demand Instances to deploy the Karpenter controller. To do so, follow the Karpenter getting started guide.
The Terraform template creates an EKS cluster, updates the aws-auth configmap to allow nodes to connect, and creates an On-Demand managed node group for the kube-system and karpenter namespaces. Download the template and deploy the cluster by running the following commands:
wget https://raw.githubusercontent.com/build-on-aws/run-kubernetes-clusters-for-less-with-amazon-ec2-spot-and-karpenter/main/cluster/terraform/main.tf
helm registry logout public.ecr.aws
export TF_VAR_region=$AWS_REGION
terraform init
terraform apply -target="module.vpc" -auto-approve
terraform apply -target="module.eks" -auto-approve
terraform apply --auto-approve
Once the cluster is ready, run the following command to update the kube.config file and interact with the cluster through kubectl:
aws eks --region $AWS_REGION update-kubeconfig --name spot-and-karpenter
💡 Tip: If you’re using a different region or changed the name of the cluster, you can get the command for your setup from the Terraform output by running: terraform output -raw configure_kubectl
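To confirm that kubectl can reach the new cluster, you can list the nodes of the On-Demand managed node group:

kubectl get nodes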
Check that the Karpenter controller pods are running:
$ kubectl get pods -n karpenter
NAME READY STATUS RESTARTS AGE
karpenter-5f97c944df-bm85s 1/1 Running 0 15m
karpenter-5f97c944df-xr9jf 1/1 Running 0 15m
The On-Demand managed node group hosts the pods in the kube-system and karpenter namespaces, and it’s the only node group you’ll need. For the rest of the pods, Karpenter launches nodes through a NodePool CRD. The NodePool sets constraints on the nodes that Karpenter can create and the pods that can run on those nodes. A single Karpenter NodePool is capable of handling many different pod shapes, and for this tutorial you’ll only create the default NodePool.

💡 Tip: Karpenter simplifies data plane capacity management with an approach called group-less auto scaling: it no longer uses node groups (which map to Auto Scaling groups) to launch nodes. Over time, clusters that run different types of applications requiring different capacity types end up with a complex configuration and operational model in which node groups must be defined and provisioned in advance.
If this is the first time you’re using Spot Instances in this account, create the EC2 Spot service-linked role (the command is safe to re-run):
aws iam create-service-linked-role --aws-service-name spot.amazonaws.com || true
If the role already exists, you’ll see the following error, which you can safely ignore:
An error occurred (InvalidInput) when calling the CreateServiceLinkedRole operation: Service role name AWSServiceRoleForEC2Spot has been taken in this account, please try a different suffix.
Next, export a couple of environment variables you’ll need to configure Karpenter. Go to the folder where the main.tf file lives and run the following commands:
export CLUSTER_NAME=$(terraform output -raw cluster_name)
export KARPENTER_NODE_IAM_ROLE_NAME=$(terraform output -raw node_instance_profile_name)
💡 NOTE: If you're working with an existing EKS cluster, make sure to set the proper values for the previous environment variables, as we'll use those values to set up the Karpenter NodePool and EC2NodeClass.
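For example, with an existing cluster you would set them to something like this (placeholder values; replace them with your cluster name and the IAM role attached to the nodes Karpenter launches):

export CLUSTER_NAME="my-existing-cluster"
export KARPENTER_NODE_IAM_ROLE_NAME="KarpenterNodeRole-my-existing-cluster"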
Create the default NodePool and its EC2NodeClass by running the following command:
cat <<EOF | kubectl apply -f -
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    metadata:
      labels:
        intent: apps
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        - key: "karpenter.k8s.aws/instance-cpu"
          operator: Gt
          values: ["4"]
        - key: "karpenter.k8s.aws/instance-memory"
          operator: Gt
          values: ["8191"] # 8 * 1024 - 1
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default
      kubelet:
        systemReserved:
          cpu: 100m
          memory: 100Mi
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 168h # 7 * 24h = 168h
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${CLUSTER_NAME}
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${CLUSTER_NAME}
  role: ${KARPENTER_NODE_IAM_ROLE_NAME}
  tags:
    project: build-on-aws
    IntentLabel: apps
    KarpenterNodePoolName: default
    NodeType: default
    intent: apps
    karpenter.sh/discovery: ${CLUSTER_NAME}
EOF
Here are a few things to highlight from the NodePool you just created:

- requirements: Here’s where you define the type of nodes Karpenter can launch. Be as flexible as possible and let Karpenter choose the right instance type based on the pod requirements. For this NodePool, you’re saying Karpenter can launch either Spot or On-Demand Instances from the c, m, and r families, with a minimum of 4 vCPUs and 8 GiB of memory. With this configuration, you’re choosing around 150 instance types from the 700+ available today in AWS. Read the next section to understand why this is important.
- limits: This is how you constrain the maximum amount of resources that the NodePool will manage. Karpenter can launch instances with different specs, so instead of limiting a maximum number of instances (as you’d typically do in an Auto Scaling group), you define a maximum amount of vCPUs or memory to limit the number of nodes to launch. Karpenter provides a metric to monitor the percentage usage of this NodePool based on the limits you configure.
- disruption: Karpenter does a great job of launching only the nodes you need, but as pods come and go, at some point the cluster capacity can end up in a fragmented state. To avoid fragmentation and optimize the compute nodes in your cluster, you can enable consolidation. When enabled, Karpenter automatically discovers disruptable nodes and spins up replacements when needed.
- expireAfter: Here’s where you define when a node will be deleted. This is useful to force replacing nodes with new ones running an up-to-date AMI. In this example the value is set to 7 days.
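The manifest above doesn’t set any limits, so here is a rough sketch of what a limits block could look like if you wanted to cap this NodePool (illustrative values only):

spec:
  limits:
    cpu: "1000"     # the NodePool stops launching nodes once it manages ~1000 vCPUs
    memory: 1000Gi  # or ~1000 GiB of memory, whichever is reached first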
You can learn more about the configuration options of a NodePool here. With this NodePool, we’re basically letting Karpenter choose from a diverse set of instance types to launch the most suitable instance possible. If it’s an On-Demand Instance, Karpenter uses the lowest-price allocation strategy to launch the cheapest instance type that has available capacity. When you use multiple instance types, you can avoid the InsufficientInstanceCapacity error. If it’s a Spot Instance, Karpenter uses the price-capacity-optimized (PCO) allocation strategy. PCO looks at both price and capacity availability to launch from the Spot Instance pools that are the least likely to be interrupted and have the lowest possible price. For Spot Instances, applying diversification is key: Spot Instances are spare capacity that can be reclaimed by EC2 when it is required, and Karpenter allows you to diversify extensively so reclaimed Spot Instances are replaced automatically with instances from other pools where capacity is available.

The NodePool can launch both On-Demand and Spot Instances, but Karpenter considers the constraints you configure within a pod to launch the right node(s). Let’s create a Deployment with a nodeSelector to run the pods on Spot Instances. To do so, run the following command:
cat <<EOF > workload.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stateless
spec:
  replicas: 10
  selector:
    matchLabels:
      app: stateless
  template:
    metadata:
      labels:
        app: stateless
    spec:
      nodeSelector:
        intent: apps
        karpenter.sh/capacity-type: spot
      containers:
        - image: public.ecr.aws/eks-distro/kubernetes/pause:v1.29.0-eks-1-29-latest
          name: app
          resources:
            requests:
              cpu: 512m
              memory: 512Mi
EOF
kubectl apply -f workload.yaml
The pods will be Pending at first, making Karpenter react and launch the nodes, similar to this output:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
stateless-5c77994ccb-4mtsp 0/1 Pending 0 3s
stateless-5c77994ccb-4mtsp 0/1 Pending 0 3s
stateless-5c77994ccb-4mtsp 0/1 Pending 0 3s
stateless-5c77994ccb-4mtsp 0/1 Pending 0 3s
stateless-5c77994ccb-4mtsp 0/1 Pending 0 3s
stateless-5c77994ccb-4mtsp 0/1 Pending 0 3s
stateless-5c77994ccb-4mtsp 0/1 Pending 0 3s
stateless-5c77994ccb-4mtsp 0/1 Pending 0 3s
stateless-5c77994ccb-4mtsp 0/1 Pending 0 3s
stateless-5c77994ccb-4mtsp 0/1 Pending 0 3s
To see what Karpenter is doing behind the scenes, create this alias to read its logs:
alias kl='kubectl -n karpenter logs -l app.kubernetes.io/name=karpenter --all-containers=true -f --tail=20'
You should see logs similar to these:
$ kl
{"level":"INFO","time":"2024-01-28T21:14:32.625Z","logger":"controller.provisioner","message":"computed new nodeclaim(s) to fit pod(s)","commit":"1072d3b","nodeclaims":1,"pods":10}
{"level":"INFO","time":"2024-01-28T21:14:32.652Z","logger":"controller.provisioner","message":"created nodeclaim","commit":"1072d3b","nodepool":"default","nodeclaim":"default-8blnj","requests":{"cpu":"5330m","memory":"5360Mi","pods":"14"},"instance-types":"c4.2xlarge, c4.4xlarge, c5.2xlarge, c5.4xlarge, c5a.2xlarge and 95 other(s)"}
{"level":"INFO","time":"2024-01-28T21:14:36.823Z","logger":"controller.nodeclaim.lifecycle","message":"launched nodeclaim","commit":"1072d3b","nodeclaim":"default-8blnj","nodepool":"default","provider-id":"aws:///eu-west-2a/i-094792dc93778aa2a","instance-type":"c7g.2xlarge","zone":"eu-west-2a","capacity-type":"spot","allocatable":{"cpu":"7810m","ephemeral-storage":"17Gi","memory":"14003Mi","pods":"58","vpc.amazonaws.com/pod-eni":"38"}}
From these logs you can see that Karpenter:

- Noticed there were 10 pending pods and decided it could fit all of them on a single node.
- Took into account the DaemonSet pods that run on every node, such as kube-proxy (2 additional pods), and aggregated the resources needed for 12 pods. Moreover, Karpenter noticed that 100 instance types match these requirements.
- Launched a c7g.2xlarge Spot Instance in eu-west-2a, as this was the pool with the most spare capacity at the lowest price.
Next, let’s make the workload even more Spot-friendly by spreading the pods across Availability Zones. Start by deleting the existing Deployment:
kubectl delete deployment stateless
💡 NOTE: To see pods being spread within AZs with similar instance sizes, wait until the pods and the existing EC2 instances launched by Karpenter are removed.
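For example, you can confirm that both the pods and the Spot nodes are gone before moving on:

kubectl get pods -l app=stateless
kubectl get nodes -l karpenter.sh/capacity-type=spot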
To spread the pods within multiple Availability Zones, add the following topologySpreadConstraints (TSP) block between the nodeSelector and the containers block of the workload.yaml file you downloaded before:
      topologySpreadConstraints:
        - labelSelector:
            matchLabels:
              app: stateless
          maxSkew: 1
          minDomains: 2
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
💡 Tip: You can download the full version of the deployment manifest including the TSP here.
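If you edit the file by hand, the pod template spec should end up looking roughly like this (abridged sketch; the downloadable manifest is the reference):

    spec:
      nodeSelector:
        intent: apps
        karpenter.sh/capacity-type: spot
      topologySpreadConstraints:
        - labelSelector:
            matchLabels:
              app: stateless
          maxSkew: 1
          minDomains: 2
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
      containers:
        - image: public.ecr.aws/eks-distro/kubernetes/pause:v1.29.0-eks-1-29-latest
          name: app
          resources:
            requests:
              cpu: 512m
              memory: 512Mi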
Deploy the Deployment again. If you downloaded the manifest from GitHub, you can simply run:
kubectl apply -f workload.yaml
After a few seconds, check which Spot nodes Karpenter launched and in which Availability Zones:
kubectl get nodes -L karpenter.sh/capacity-type,beta.kubernetes.io/instance-type,topology.kubernetes.io/zone -l karpenter.sh/capacity-type=spot
You should see an output similar to this:
NAME STATUS ROLES AGE VERSION CAPACITY-TYPE INSTANCE-TYPE ZONE
ip-10-0-102-121.eu-west-2.compute.internal NotReady <none> 1s v1.29.0-eks-5e0fdde spot m7g.2xlarge eu-west-2c
ip-10-0-36-60.eu-west-2.compute.internal NotReady <none> 4s v1.29.0-eks-5e0fdde spot c7g.2xlarge eu-west-2a
ip-10-0-92-180.eu-west-2.compute.internal NotReady <none> 4s v1.29.0-eks-5e0fdde spot c7g.2xlarge eu-west-2b
When EC2 needs Spot capacity back, it sends a Spot interruption warning two minutes before reclaiming the instance. Karpenter watches for these warnings, and the NodePool starts a new node as soon as it sees one. Karpenter’s average node startup time means that, generally, there is sufficient time for the new node to become ready and to move the pods to the new node before the instance is reclaimed. To simulate a Spot interruption, you’ll use AWS Fault Injection Service (FIS). Start by downloading the CloudFormation template that defines the FIS experiment:
wget https://raw.githubusercontent.com/build-on-aws/run-kubernetes-clusters-for-less-with-amazon-ec2-spot-and-karpenter/main/fis/spotinterruption.yaml
Then create the FIS experiment template by deploying the CloudFormation stack:
aws cloudformation deploy --stack-name fis-spot-and-karpenter --template-file spotinterruption.yaml --capabilities "CAPABILITY_NAMED_IAM"
Before launching the experiment, open two terminals: 1) to watch the nodes and their STATUS, and 2) to follow the Karpenter logs. In one terminal, watch the nodes using this command:
kubectl get nodes -L karpenter.sh/capacity-type,beta.kubernetes.io/instance-type,topology.kubernetes.io/zone -l karpenter.sh/capacity-type=spot --watch
In the other terminal, follow the Karpenter logs:
alias kl='kubectl -n karpenter logs -l app.kubernetes.io/name=karpenter --all-containers=true -f --tail=20';
kl
Now launch the Spot interruption experiment with FIS:
FIS_EXP_TEMP_ID=$(aws cloudformation describe-stacks --stack-name fis-spot-and-karpenter --query "Stacks[0].Outputs[?OutputKey=='FISExperimentID'].OutputValue" --output text)
aws fis start-experiment --experiment-template-id $FIS_EXP_TEMP_ID --no-cli-pager
Within a few seconds you should see Karpenter reacting to the interruption in the logs, similar to this output:
{"level":"INFO","time":"2024-01-29T08:47:30.575Z","logger":"controller.interruption","message":"initiating delete from interruption message","commit":"1072d3b","queue":"karpenter-spot-and-karpenter","messageKind":"SpotInterruptionKind","nodeclaim":"default-4w54b","action":"CordonAndDrain","node":"ip-10-0-36-60.eu-west-2.compute.internal"}
{"level":"INFO","time":"2024-01-29T08:47:30.603Z","logger":"controller.node.termination","message":"tainted node","commit":"1072d3b","node":"ip-10-0-36-60.eu-west-2.compute.internal"}
{"level":"INFO","time":"2024-01-29T08:47:31.963Z","logger":"controller.provisioner","message":"found provisionable pod(s)","commit":"1072d3b","pods":"default/stateless-7956bd8d4c-48mj9, default/stateless-7956bd8d4c-spsqr, default/stateless-7956bd8d4c-sm4cp","duration":"18.833162ms"}
{"level":"INFO","time":"2024-01-29T08:47:31.963Z","logger":"controller.provisioner","message":"computed new nodeclaim(s) to fit pod(s)","commit":"1072d3b","nodeclaims":1,"pods":3}
{"level":"INFO","time":"2024-01-29T08:47:31.997Z","logger":"controller.provisioner","message":"created nodeclaim","commit":"1072d3b","nodepool":"default","nodeclaim":"default-6p2qb","requests":{"cpu":"1746m","memory":"1776Mi","pods":"7"},"instance-types":"c4.2xlarge, c4.4xlarge, {"level":"INFO","time":"2024-01-29T08:47:34.823Z","logger":"controller.nodeclaim.lifecycle","message":"launched nodeclaim","commit":"1072d3b","nodeclaim":"default-6p2qb","nodepool":"default","provider-id":"aws:///eu-west-2a/i-005ca44c327470d09","instance-type":"m7g.2xlarge","zone":"eu-west-2a","capacity-type":"spot","allocatable":{"cpu":"7810m","ephemeral-storage":"17Gi","memory":"29158Mi","pods":"58","vpc.amazonaws.com/pod-eni":"38"}}
You can also use eks-node-viewer, a tool for visualizing dynamic node usage within a cluster. It was originally developed as an internal tool at AWS for demonstrating consolidation with Karpenter, and it displays the scheduled pod resource requests vs. the allocatable capacity on the node. Simply run:
eks-node-viewer
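If eks-node-viewer isn’t already installed in your environment, one way to install it (as described in the project’s README at the time of writing) is:

go install github.com/awslabs/eks-node-viewer/cmd/eks-node-viewer@latest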
💡 Tip: You might end up seeing only one or two Spot nodes running, and if you review the Karpenter logs, you’ll see that this is because of the consolidation process.
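If you want to confirm that consolidation kicked in, one way is to filter the Karpenter logs for consolidation events, for example:

kubectl -n karpenter logs -l app.kubernetes.io/name=karpenter --all-containers=true --tail=500 | grep -i consolidat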
For workloads that can’t tolerate interruptions, such as stateful applications, you can launch On-Demand Instances with the same NodePool you created before. But make sure you’re configuring the workload properly. One way of doing it is to use a similar approach to the Spot-friendly workload, by using a nodeSelector. Run the following command to deploy a workload that requests On-Demand capacity:
cat <<EOF > workload-stateful.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stateful
spec:
  replicas: 7
  selector:
    matchLabels:
      app: stateful
  template:
    metadata:
      labels:
        app: stateful
    spec:
      nodeSelector:
        intent: apps
        karpenter.sh/capacity-type: on-demand
      containers:
        - name: app
          image: public.ecr.aws/eks-distro/kubernetes/pause:v1.29.0-eks-1-29-latest
          resources:
            requests:
              cpu: 512m
              memory: 512Mi
EOF
kubectl apply -f workload-stateful.yaml
After a few seconds, check that Karpenter launched an On-Demand node for these pods:
$ kubectl get nodes -L karpenter.sh/capacity-type,beta.kubernetes.io/instance-type,topology.kubernetes.io/zone -l karpenter.sh/capacity-type=on-demand
NAME STATUS ROLES AGE VERSION CAPACITY-TYPE INSTANCE-TYPE ZONE
ip-10-0-107-229.eu-west-2.compute.internal Ready <none> 13s v1.29.0-eks-5e0fdde on-demand c6g.2xlarge eu-west-2c
When you’re done with the tutorial, remove the two Deployments you created:
kubectl delete deployment stateless
kubectl delete deployment stateful
Then destroy the cluster and the FIS resources:
export TF_VAR_region=$AWS_REGION
terraform destroy -target="module.eks_blueprints_addons" --auto-approve
terraform destroy -target="module.eks" --auto-approve
terraform destroy --auto-approve
aws cloudformation delete-stack --stack-name fis-spot-and-karpenter
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.