Managing Asynchronous Tasks with SQS and EFS Persistent Storage in Amazon EKS
Run background tasks in a job queue and leverage scalable, multi-availability zone storage.
Step 1: Configure Cluster Environment Variables
Step 2: Verify or Create the IAM Role for Service Accounts
Step 3: Verify the EFS CSI Driver Add-On Is Installed
Step 4: Run the Sample Batch Application
Step 5: Prepare and Deploy the Batch Container
Step 6: Create the Multi-Architecture Image
Step 7: Deploy the Kubernetes Job
Step 8: Enable Permissions for Batch Processing Jobs on SQS
Step 9: Create a Kubernetes Secret
Step 10: Deploy the Kubernetes Job With Queue Integration
Step 11: Create the PersistentVolume and PersistentVolumeClaim for EFS
- Install the latest version of kubectl. To check your version, run:
kubectl version --short
- Install the latest version of eksctl. To check your version, run:
eksctl info
- Install Python 3.9+. To check your version, run:
python3 --version
. - Install Docker or any other container engine equivalent to build the container.
| About | |
| --- | --- |
| ✅ AWS Level | Intermediate - 200 |
| ⏱ Time to complete | 30 minutes |
| 🧩 Prerequisites | AWS Account |
| 📢 Feedback | Any feedback, issues, or just a 👍 / 👎 ? |
| ⏰ Last Updated | 2023-09-29 |
- First, confirm that you are operating within the correct cluster context. This ensures that any subsequent commands are sent to the intended Kubernetes cluster. You can verify the current context by executing the following command:
kubectl config current-context
- Define the CLUSTER_NAME environment variable for your EKS cluster. Replace the sample values for the cluster name and region.
export CLUSTER_NAME=$(aws eks describe-cluster --region us-east-1 --name batch-quickstart --query "cluster.name" --output text)
- Define the CLUSTER_REGION environment variable for your EKS cluster. Replace the sample value for the cluster region.
export CLUSTER_REGION=$(aws eks describe-cluster --name ${CLUSTER_NAME} --region us-east-1 --query "cluster.arn" --output text | cut -d: -f4)
- Define the ACCOUNT_ID environment variable for the account associated with your EKS cluster.
export ACCOUNT_ID=$(aws eks describe-cluster --name ${CLUSTER_NAME} --region ${CLUSTER_REGION} --query "cluster.arn" --output text | cut -d':' -f5)
- Make sure the required service accounts for this tutorial are correctly set up in your cluster:
kubectl get sa -A | egrep "efs-csi-controller|ecr"
default ecr-sa 0 31m
kube-system efs-csi-controller-sa 0 30m
- If the ecr-sa service account is not listed, create a Kubernetes service account for Amazon ECR:
eksctl create iamserviceaccount \
--region ${CLUSTER_REGION} \
--name ecr-sa \
--namespace default \
--cluster batch-quickstart \
--attach-policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly \
--approve
- Check that the EFS CSI driver is installed:
eksctl get addon --cluster ${CLUSTER_NAME} --region ${CLUSTER_REGION} | grep efs
aws-efs-csi-driver v1.5.8-eksbuild.1 ACTIVE 0
In this step, you will run the sample batch application locally. The Python script reads an input.csv file, performs data manipulation using randomization for demonstration, and writes the processed data back to an output.csv file. This serves as a hands-on introduction before we deploy this application to Amazon ECR and EKS.
- Create a Python script named batch_processing.py and paste the following contents:
import csv
import time
import random


def read_csv(file_path):
    with open(file_path, 'r') as f:
        reader = csv.reader(f)
        data = [row for row in reader]
    return data


def write_csv(file_path, data):
    with open(file_path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerows(data)


def process_data(data):
    processed_data = [["ID", "Value", "ProcessedValue"]]
    for row in data[1:]:
        id, value = row
        processed_value = float(value) * random.uniform(0.8, 1.2)
        processed_data.append([id, value, processed_value])
    return processed_data


def batch_task():
    print("Starting batch task...")
    # Read data from CSV
    input_data = read_csv('input.csv')
    # Process data
    processed_data = process_data(input_data)
    # Write processed data to CSV
    write_csv('output.csv', processed_data)
    print("Batch task completed.")


if __name__ == "__main__":
    batch_task()
- In the same directory as your Python script, create a file named input.csv and paste the following contents:
ID,Value
1,100.5
2,200.3
3,150.2
4,400.6
5,300.1
6,250.4
7,350.7
8,450.9
9,500.0
10,600.8
- Run the Python script:
python3 batch_processing.py
Starting batch task...
Batch task completed.
An output.csv file will be generated, containing the processed data with an additional column for the processed values:
ID,Value,ProcessedValue
1,100.5,101.40789448456849
2,200.3,202.2013222517103
3,150.2,139.82822974457673
4,400.6,470.8262553815611
5,300.1,253.4504054915937
6,250.4,219.48492376021267
7,350.7,419.3203869922816
8,450.9,495.56898757853986
9,500.0,579.256459785631
10,600.8,630.4063443182313
- In the same directory as the other files you created, create a Dockerfile and paste the following contents:
FROM python:3.8-slim
COPY batch_processing.py /
COPY input.csv /
CMD ["python", "/batch_processing.py"]
- Build the Docker image:
docker build -t batch-processing-image .
- Create a new private Amazon ECR repository:
aws ecr create-repository --repository-name batch-processing-repo --region ${CLUSTER_REGION}
- Authenticate the Docker CLI to your Amazon ECR registry:
aws ecr get-login-password --region ${CLUSTER_REGION} | docker login --username AWS --password-stdin ${ACCOUNT_ID}.dkr.ecr.${CLUSTER_REGION}.amazonaws.com
- Tag your container image for the ECR repository:
docker tag batch-processing-image:latest ${ACCOUNT_ID}.dkr.ecr.${CLUSTER_REGION}.amazonaws.com/batch-processing-repo:latest
- Push the tagged image to the ECR repository:
docker push ${ACCOUNT_ID}.dkr.ecr.${CLUSTER_REGION}.amazonaws.com/batch-processing-repo:latest
- Create and start new builder instances for the batch service:
docker buildx create --name batchBuilder
docker buildx use batchBuilder
docker buildx inspect --bootstrap
- Build and push the images for your batch service to Amazon ECR:
docker buildx build --platform linux/amd64,linux/arm64 -t ${ACCOUNT_ID}.dkr.ecr.${CLUSTER_REGION}.amazonaws.com/batch-processing-repo:latest . --push
- Verify that the multi-architecture image is in the ECR repository:
aws ecr list-images --repository-name batch-processing-repo --region ${CLUSTER_REGION}
- Get the details of your ECR URL:
echo ${ACCOUNT_ID}.dkr.ecr.${CLUSTER_REGION}.amazonaws.com/batch-processing-repo:latest
- Create a Kubernetes Job manifest file named batch-job.yaml and paste the following contents. Replace the sample value in image with your ECR URL.
apiVersion: batch/v1
kind: Job
metadata:
  name: my-batch-processing-job
spec:
  template:
    spec:
      serviceAccountName: ecr-sa
      containers:
      - name: batch-processor
        image: 123456789012.dkr.ecr.us-west-1.amazonaws.com/batch-processing-repo:latest
      restartPolicy: Never
- Apply the Job manifest to your EKS cluster:
kubectl apply -f batch-job.yaml
job.batch/my-batch-processing-job created
- Monitor the Job execution:
kubectl get jobs
NAME COMPLETIONS DURATION AGE
my-batch-processing-job 1/1 8s 11s
- Create an Amazon SQS queue that will serve as our job queue:
aws sqs create-queue --queue-name eks-batch-job-queue
{
"QueueUrl": "https://sqs.us-west-1.amazonaws.com/123456789012/eks-batch-job-queue"
}
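Optionally, you can put a test message on the new queue so that a queue-aware batch job has work to pick up. The snippet below is a minimal sketch (not part of the original tutorial) using boto3; it assumes boto3 is installed locally, your AWS credentials and region are configured, and the QueueUrl returned above replaces the sample value.

```python
import json

import boto3

# Replace with the QueueUrl returned by `aws sqs create-queue` above.
QUEUE_URL = "https://sqs.us-west-1.amazonaws.com/123456789012/eks-batch-job-queue"

# boto3 reads credentials and region from your local AWS configuration.
sqs = boto3.client("sqs")

# Send a small JSON payload describing a unit of work for the batch job.
response = sqs.send_message(
    QueueUrl=QUEUE_URL,
    MessageBody=json.dumps({"input_file": "input.csv"}),
)
print("Sent message:", response["MessageId"])
```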
- Annotate the existing Amazon ECR service account with Amazon SQS permissions.
eksctl create iamserviceaccount \
--region ${CLUSTER_REGION} \
--cluster batch-quickstart \
--namespace default \
--name ecr-sa \
--attach-policy-arn arn:aws:iam::aws:policy/AmazonSQSFullAccess \
--override-existing-serviceaccounts \
--approve
- Generate an Amazon ECR authorization token:
ECR_TOKEN=$(aws ecr get-login-password --region ${CLUSTER_REGION})
- Create the Kubernetes Secret called regcred in the default namespace:
kubectl create secret docker-registry regcred \
--docker-server=${ACCOUNT_ID}.dkr.ecr.${CLUSTER_REGION}.amazonaws.com \
--docker-username=AWS \
--docker-password="${ECR_TOKEN}" \
-n default
secret/regcred created
- Create a Kubernetes Job manifest file named batch-job-queue.yaml and paste the following contents. Replace the sample values for image with your ECR URL and value with your SQS queue URL.
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-processing-job-with-queue
spec:
  template:
    spec:
      containers:
      - name: batch-processor
        image: 123456789012.dkr.ecr.us-west-1.amazonaws.com/batch-processing-repo:latest
        env:
        - name: SQS_QUEUE_URL
          value: "https://sqs.us-west-1.amazonaws.com/123456789012/eks-batch-job-queue"
      restartPolicy: Never
      serviceAccountName: ecr-sa
      imagePullSecrets:
      - name: regcred
- Apply the Job manifest to your EKS cluster:
kubectl apply -f batch-job-queue.yaml
job.batch/batch-processing-job-with-queue created
- Monitor the Job execution:
kubectl get jobs
NAME COMPLETIONS DURATION AGE
batch-processing-job-with-queue 1/1 8s 13s
my-batch-processing-job 1/1 8s 16m
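Note that the sample batch_processing.py script shown earlier does not yet read from the queue; the SQS_QUEUE_URL environment variable is simply made available to the container. The sketch below is one illustrative way (not the tutorial's code) the script could be extended to long-poll the queue with boto3 before running the batch logic. It assumes boto3 is added to the container image, for example with a RUN pip install boto3 line in the Dockerfile, and that the IAM role attached to the ecr-sa service account supplies SQS credentials inside the pod.

```python
import os

import boto3


def poll_queue():
    """Long-poll the SQS queue passed in via the Job's environment and
    process each received message with the existing batch logic."""
    queue_url = os.environ["SQS_QUEUE_URL"]
    # Inside the pod, boto3 picks up credentials from the IAM role for
    # the service account (IRSA) configured earlier.
    sqs = boto3.client("sqs")

    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,  # long polling
    )
    for message in response.get("Messages", []):
        print("Processing message:", message["Body"])
        # ... call the existing batch_task() here ...
        sqs.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message["ReceiptHandle"],
        )


if __name__ == "__main__":
    poll_queue()
```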
- Echo and save your EFS file system's DNS name for the next step. This assumes the FILE_SYSTEM_ID environment variable is set to the ID of your EFS file system (for example, fs-0ff53d77cb74d6474):
echo $FILE_SYSTEM_ID.efs.$CLUSTER_REGION.amazonaws.com
- Create a YAML file named batch-pv-pvc.yaml and paste the following contents. Replace the sample value for server with your EFS URL.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  nfs:
    path: /
    server: fs-0ff53d77cb74d6474.efs.us-east-1.amazonaws.com
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-claim
spec:
  storageClassName: efs-sc
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
- Apply the PV and PVC to your Kubernetes cluster:
kubectl apply -f batch-pv-pvc.yaml
persistentvolume/efs-pv created
persistentvolumeclaim/efs-claim created
- Create a Kubernetes Job manifest file named update-batch-job.yaml and paste the following contents. Replace the sample value in image with your ECR URL.
apiVersion: batch/v1
kind: Job
metadata:
  name: new-batch-job
  namespace: default
spec:
  template:
    spec:
      serviceAccountName: ecr-sa
      containers:
      - name: batch-processor
        image: 123456789012.dkr.ecr.us-west-1.amazonaws.com/batch-processing-repo:latest
        volumeMounts:
        - name: efs-volume
          mountPath: /efs
      volumes:
      - name: efs-volume
        persistentVolumeClaim:
          claimName: efs-claim
      restartPolicy: OnFailure
- Apply the Job manifest to your EKS cluster:
kubectl apply -f update-batch-job.yaml
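The Job above mounts the EFS volume at /efs inside the container, but the sample script still writes output.csv to the container's ephemeral filesystem. The sketch below shows one way (an illustrative change, not the tutorial's code) the write step could target the mount so results persist on EFS; you would rebuild and push the image after making such a change.

```python
import os

# Mount path defined in the Job manifest's volumeMounts section.
EFS_MOUNT_PATH = "/efs"


def output_path(filename="output.csv"):
    """Return a path on the EFS mount if it is present, otherwise fall
    back to the working directory (for example, when running locally)."""
    if os.path.isdir(EFS_MOUNT_PATH):
        return os.path.join(EFS_MOUNT_PATH, filename)
    return filename

# In batch_task(), write the processed data to the persistent volume:
#     write_csv(output_path(), processed_data)
```

Because the PersistentVolume uses the ReadWriteMany access mode on EFS, jobs running in different Availability Zones can read and write the same output location.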
- Create a Kubernetes Job Queue manifest file named update-batch-job-queue.yaml and paste the following contents. Replace the sample values for image with your ECR URL and value with your SQS queue URL.
apiVersion: batch/v1
kind: Job
metadata:
  name: new-batch-processing-job-queue
  namespace: default
spec:
  template:
    spec:
      containers:
      - name: batch-processor
        image: 123456789012.dkr.ecr.us-west-1.amazonaws.com/batch-processing-repo:latest
        env:
        - name: SQS_QUEUE_URL
          value: "https://sqs.us-west-1.amazonaws.com/123456789012/eks-batch-job-queue"
        volumeMounts:
        - name: efs-volume
          mountPath: /efs
      volumes:
      - name: efs-volume
        persistentVolumeClaim:
          claimName: efs-claim
      restartPolicy: OnFailure
      serviceAccountName: ecr-sa
      imagePullSecrets:
      - name: regcred
- Apply the Job Queue manifest to your EKS cluster:
kubectl apply -f update-batch-job-queue.yaml
- View the logs of the Job's pod to confirm the batch task completed. Replace the sample pod name with your pod's name (you can find it with kubectl get pods):
kubectl logs -f new-batch-job-k267b
Starting batch task...
Batch task completed.
- When you are finished, clean up the resources created in this tutorial. Replace the sample values with your SQS queue URL and ECR repository name:
# Delete the SQS Queue
aws sqs delete-queue --queue-url YOUR_SQS_QUEUE_URL
# Delete the ECR Repository
aws ecr delete-repository --repository-name YOUR_ECR_REPO_NAME --force
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.