Managing Asynchronous Tasks with SQS and EFS Persistent Storage in Amazon EKS

Run background tasks in a job queue and leverage scalable, multi-availability zone storage.

Leah Tucker
Amazon Employee
Published Sep 29, 2023
Last Modified Jun 26, 2024
When it comes to managing background work in a Kubernetes environment, the use case isn't solely confined to massive data crunching or real-time analytics. More often, you'll be focused on nuanced tasks like data syncing, file uploads, or other asynchronous activities that work quietly in the background. Amazon SQS, with its 256 KB message size limitation, is especially adept at queuing metadata or status flags that indicate whether jobs are complete or still pending. When combined with Amazon EFS, which offers secure, multi-AZ storage for larger data objects, SQS is free to specialize in task orchestration. This pairing yields two key advantages: it allows for modular operation by segregating different aspects of your background tasks, and it capitalizes on the scalable, multi-availability zone architecture of EFS to meet your evolving storage needs without service interruptions.
Building on the Amazon EKS cluster from part 1 of our series, this tutorial dives into the deployment of batch jobs and job queues. The cluster configuration from the previous tutorial includes the EFS CSI Driver Add-On, an IAM Role for Service Accounts (IRSA) for the EFS CSI Driver, and an OpenID Connect (OIDC) endpoint. For part 1 of this series, see Building an Amazon EKS Cluster Preconfigured to Run Asynchronous Batch Tasks. To complete the last half of this tutorial, you'll need the EFS CSI Driver Add-On set up on your cluster. For instructions, see Designing Scalable and Versatile Storage Solutions on Amazon EKS with the Amazon EFS CSI.
You'll also integrate Amazon SQS with your Amazon EKS cluster, build a batch processing application, containerize the application and deploy to Amazon ECR, then use an Amazon SQS job queue to run your batch tasks. In the second half, we'll shift gears to the EFS CSI Driver, which allows us to keep our data intact across multiple nodes while running batch workloads.
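To make the division of labor between SQS and EFS concrete, here is a minimal Python sketch in which the queue message carries only metadata and a pointer to data stored on EFS. The field names (job_id, status, data_path), the mount path, and the build_job_message helper are hypothetical, and the byte limit simply mirrors the 256 KB figure quoted above, so check the current SQS quota before relying on it.

```python
import json

# Limit quoted in the article (256 KB), treated here as 256 * 1024 bytes.
# Assumption: verify the current SQS quota before depending on this value.
MAX_SQS_BYTES = 256 * 1024


def build_job_message(job_id: str, efs_path: str, status: str = "pending") -> str:
    """Return a small JSON message that points at data stored on EFS."""
    body = json.dumps({"job_id": job_id, "status": status, "data_path": efs_path})
    if len(body.encode("utf-8")) > MAX_SQS_BYTES:
        raise ValueError("message too large for the queue; keep the payload on EFS")
    return body


if __name__ == "__main__":
    # Hypothetical job ID and EFS mount path, for illustration only.
    print(build_job_message("job-001", "/mnt/efs/batch/input.csv"))
```

Keeping the payload on shared storage and sending only its path means the message stays small no matter how large the input file grows.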

Prerequisites

Before you begin this tutorial, you need to:
  • Install the latest version of kubectl. To check your version, run: kubectl version --client.
  • Install the latest version of eksctl. To check your version, run: eksctl info.
  • Install Python 3.9+. To check your version, run: python3 --version.
  • Install Docker or any other container engine equivalent to build the container.

About
✅ AWS Level: Intermediate - 200
⏱ Time to complete: 30 minutes
🧩 Prerequisites: AWS Account
📢 Feedback: Any feedback, issues, or just a 👍 / 👎 ?
⏰ Last Updated: 2023-09-29

Step 1: Configure Cluster Environment Variables

Before interacting with your Amazon EKS cluster using Helm or other command-line tools, it's essential to define specific environment variables that encapsulate your cluster's details. These variables will be used in subsequent commands, ensuring that they target the correct cluster and resources.
  1. First, confirm that you are operating within the correct cluster context, so that subsequent commands are sent to the intended Kubernetes cluster. You can verify the current context by running: kubectl config current-context.
  1. Define the CLUSTER_NAME environment variable for your EKS cluster. Replace the sample value for cluster name.
  1. Define the CLUSTER_REGION environment variable for your EKS cluster. Replace the sample value for cluster region.
  1. Define the ACCOUNT_ID environment variable for the account associated with your EKS cluster.
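Once the three variables are exported, a script can read and validate them before calling any AWS service. The following is a minimal Python sketch under stated assumptions: the helper names and the sample values are hypothetical, and the registry hostname follows the standard `<account>.dkr.ecr.<region>.amazonaws.com` pattern (other AWS partitions use a different suffix).

```python
import os
from typing import Mapping

REQUIRED = ("CLUSTER_NAME", "CLUSTER_REGION", "ACCOUNT_ID")


def load_cluster_env(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the variables from this step, failing early if any is unset."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError(f"unset environment variables: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED}


def ecr_registry(cfg: Mapping[str, str]) -> str:
    """Private ECR registry hostname for the account and Region."""
    return f"{cfg['ACCOUNT_ID']}.dkr.ecr.{cfg['CLUSTER_REGION']}.amazonaws.com"


if __name__ == "__main__":
    # Hypothetical sample values; pass os.environ (the default) in real use.
    sample = {
        "CLUSTER_NAME": "batch-quickstart",
        "CLUSTER_REGION": "us-east-2",
        "ACCOUNT_ID": "111122223333",
    }
    print(ecr_registry(load_cluster_env(sample)))
```

Failing early on a missing variable is cheaper than discovering an empty value halfway through an image push.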

Step 2: Verify or Create the IAM Role for Service Accounts

In this section, we will verify that the required IAM roles for service accounts are properly set up in your Amazon EKS cluster. These roles let your pods call AWS services with scoped permissions. Since container images for batch workloads are typically stored in private container registries, we will create a service account specifically for Amazon ECR.
Make sure the required service accounts for this tutorial are correctly set up in your cluster:
The expected output should look like this:
Optionally, if you do not already have these service accounts set up, or if you receive an error, the following commands will create the service accounts. Note that you must have an OpenID Connect (OIDC) endpoint associated with your cluster before you run these commands.
  1. To create a Kubernetes service account for Amazon ECR:
The EFS CSI Driver does not have an AWS managed policy, so there are a few additional steps to create the service account. For instructions, see Designing Scalable and Versatile Storage Solutions on Amazon EKS with the Amazon EFS CSI.

Step 3: Verify the EFS CSI Driver Add-On Is Installed

In this section, we'll verify that the EFS CSI Driver managed add-on is properly installed and active on your Amazon EKS cluster. The EFS CSI Driver is crucial for enabling Amazon EFS to work seamlessly with Kubernetes, allowing you to mount EFS file systems as persistent volumes for your batch workloads.
  1. Check that the EFS CSI driver is installed:
The expected output should look like this:
If the EFS CSI Driver Add-On is not installed on your cluster, see Designing Scalable and Versatile Storage Solutions on Amazon EKS with the Amazon EFS CSI.

Step 4: Run the Sample Batch Application

In this section, we'll delve into the sample batch application that's part of this tutorial. This Python-based batch processing app serves as a practical example to demonstrate how you can read, process, and write data in batches. It reads data from an input.csv file, performs data manipulation using randomization for demonstration, and writes the processed data back to an output.csv file. This serves as a hands-on introduction before we deploy this application to Amazon ECR and EKS.
  1. Create a Python script named batch_processing.py and paste the following contents:
  1. In the same directory as your Python script, create a file named input.csv and paste the following contents:
  1. Run the Python script:
The expected output should look like this:
Additionally, an output.csv file will be generated, containing the processed data with an additional column for the processed values:
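Separately from the tutorial's own script, the read, process, and write flow described in this step can be sketched as follows. This is not the author's batch_processing.py: the column names (id, value, processed_value), the 0.8 to 1.2 random factor, and the function names are assumptions chosen for illustration.

```python
import csv
import os
import random
import tempfile
from typing import Iterable


def process_rows(rows: Iterable[dict], rng: random.Random) -> list:
    """Add a 'processed_value' column; the random factor is for demonstration only."""
    out = []
    for row in rows:
        value = float(row["value"])
        out.append({**row, "processed_value": round(value * rng.uniform(0.8, 1.2), 2)})
    return out


def run_batch(input_path: str, output_path: str, seed=None) -> int:
    """Read input_path, process every row, write output_path, return the row count."""
    rng = random.Random(seed)
    with open(input_path, newline="") as src:
        processed = process_rows(csv.DictReader(src), rng)
    if not processed:
        return 0
    with open(output_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=list(processed[0].keys()))
        writer.writeheader()
        writer.writerows(processed)
    return len(processed)


if __name__ == "__main__":
    # Work in a temporary directory so the demo leaves no files behind.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "input.csv")
        dst = os.path.join(tmp, "output.csv")
        with open(src, "w", newline="") as f:
            f.write("id,value\n1,10\n2,20\n")
        print(run_batch(src, dst), "rows processed")
```

Passing a seed makes the random factor reproducible, which helps when comparing output files between runs.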

Step 5: Preparing and Deploying the Batch Container

In this section, we'll package the batch processing application in a container built from the ground up and upload it to Amazon Elastic Container Registry (Amazon ECR). Upon completing this section, you'll have a Docker container image securely stored in a private ECR repository, primed for deployment on EKS.
  1. In the same directory as the other files you created, create a Dockerfile and paste the following contents:
  1. Build the Docker image:
  1. Create a new private Amazon ECR repository:
  1. Authenticate the Docker CLI to your Amazon ECR registry:
  1. Tag your container image for the ECR repository:
  1. Push the tagged image to the ECR repository:

Step 6: Create the Multi-Architecture Image

To ensure that your batch application can run on the different hardware architectures that nodes in your Kubernetes cluster may use, it's vital to create a multi-architecture container image. This step uses Docker's buildx tool to accomplish this. By the end of this section, you will have built and pushed a multi-architecture container image to Amazon ECR, making it accessible for deployment on your Amazon EKS cluster.
  1. Create and start new builder instances for the batch service:
  1. Build and push the images for your batch service to Amazon ECR:
  1. Verify that the multi-architecture image is in the ECR repository:
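If you script the build, it helps to assemble the buildx invocation in one place. The sketch below only constructs the command line; the image URI, the repository name, and the choice of linux/amd64 and linux/arm64 as target platforms are assumptions, so adjust them to match the node types in your cluster.

```python
import shlex


def buildx_command(image_uri: str, platforms=("linux/amd64", "linux/arm64")) -> list:
    """Return the argument list for a multi-architecture build that pushes on success."""
    return [
        "docker", "buildx", "build",
        "--platform", ",".join(platforms),
        "--tag", image_uri,
        "--push",
        ".",  # build context: the directory containing the Dockerfile
    ]


if __name__ == "__main__":
    # Hypothetical repository name and account ID.
    uri = "111122223333.dkr.ecr.us-east-2.amazonaws.com/batch-processing-repo:latest"
    print(shlex.join(buildx_command(uri)))
```

Returning a list rather than a string lets you pass the result directly to subprocess.run without shell quoting issues.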

Step 7: Deploy the Kubernetes Job

In this section, we'll transition to deploying your containerized batch processing application as a Kubernetes Job on your Amazon EKS cluster. Your batch tasks, encapsulated in a container and stored in a private ECR repository, will now be executed in a managed, scalable environment within EKS.
  1. Get the details of your ECR URL:
  1. Create a Kubernetes Job manifest file named batch-job.yaml and paste the following contents. Replace the sample value in image with your ECR URL.
  1. Apply the Job manifest to your EKS cluster:
The expected output should look like this:
  1. Monitor the Job execution:
The response output should show that the jobs have completed:
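Separately from the YAML manifest used in this step, the same kind of Job can be generated programmatically, which is convenient when the image URL comes from a variable. This is a minimal sketch, not the tutorial's batch-job.yaml: the names, the backoffLimit value, and the batch_job_manifest helper are assumptions. kubectl accepts JSON as well as YAML, so the output can be piped to kubectl apply -f -.

```python
import json


def batch_job_manifest(name: str, image: str, env=None, service_account=None) -> dict:
    """Build a batch/v1 Job with a single container that is not restarted in place."""
    container = {"name": name, "image": image}
    if env:
        container["env"] = [{"name": k, "value": v} for k, v in env.items()]
    pod_spec = {"containers": [container], "restartPolicy": "Never"}
    if service_account:
        pod_spec["serviceAccountName"] = service_account
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {"backoffLimit": 2, "template": {"spec": pod_spec}},
    }


if __name__ == "__main__":
    # Hypothetical Job name and image URL.
    job = batch_job_manifest(
        "batch-job",
        "111122223333.dkr.ecr.us-east-2.amazonaws.com/batch-processing-repo:latest",
    )
    print(json.dumps(job, indent=2))
```

The optional env argument is how a queue URL or other settings could later be injected without editing the manifest by hand.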

Step 8: Enable Permissions for Batch Processing Jobs on SQS

In this section, we'll dive into the orchestration of batch processing jobs in a Kubernetes cluster, leveraging Amazon SQS as a job queue. Additionally, you'll learn how to extend the permissions of an existing Kubernetes service account. In our case, we'll annotate the Amazon ECR service account to include Amazon SQS access, thereby creating a more versatile and secure environment for your batch jobs.
  1. Create an Amazon SQS queue that will serve as our job queue:
You should see the following response output. Save the URL of the queue for subsequent steps.
  1. Annotate the existing Amazon ECR service account with Amazon SQS permissions.

Step 9: Create a Kubernetes Secret

Now we'll create a Kubernetes secret so that pods in your cluster can pull container images from your private Amazon ECR repository. You might be wondering whether this ECR secret will survive pod restarts, especially considering that ECR tokens are only valid for 12 hours. The secret object itself persists across pod restarts, but Kubernetes does not refresh the token stored inside it, so image pulls that rely on this secret will fail once the token expires. For Jobs that run beyond the scope of this tutorial, regenerate the token and recreate the secret on a schedule shorter than 12 hours.
  1. Generate an Amazon ECR authorization token:
  1. Create the Kubernetes Secret called “regcred” in the "default" namespace:
The expected output should look like this:
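As a separate illustration of what this step stores, the sketch below shows how a registry credential becomes the JSON payload held by a docker-registry secret. It assumes the raw authorization token is the base64 encoding of "AWS:<password>", as returned by the ECR GetAuthorizationToken API; the helper names and sample values are hypothetical, and no AWS call is made.

```python
import base64
import json


def decode_ecr_token(authorization_token: str) -> tuple:
    """Split a base64 'user:password' token into (username, password)."""
    decoded = base64.b64decode(authorization_token).decode("utf-8")
    user, _, password = decoded.partition(":")
    return user, password


def docker_config_json(registry: str, username: str, password: str) -> str:
    """Build the .dockerconfigjson payload that a docker-registry secret carries."""
    auth = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return json.dumps(
        {"auths": {registry: {"username": username, "password": password, "auth": auth}}}
    )


if __name__ == "__main__":
    # Fake token for illustration; a real one comes from Amazon ECR.
    fake_token = base64.b64encode(b"AWS:example-password").decode("ascii")
    user, password = decode_ecr_token(fake_token)
    print(docker_config_json("111122223333.dkr.ecr.us-east-2.amazonaws.com", user, password))
```

Because the password is embedded in the secret at creation time, the secret is only as fresh as the token it was built from.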

Step 10: Deploy the Kubernetes Job With Queue Integration

In this section, we'll orchestrate a Kubernetes Job that is tightly integrated with an Amazon SQS queue. This integration is crucial for handling batch processing tasks in a more distributed and scalable manner. By leveraging SQS, you can decouple the components of a cloud application to improve scalability and reliability. We'll start by creating a Kubernetes Job manifest that includes environment variables for the SQS queue URL. This ensures that your batch processing application can interact with the SQS queue to consume messages and possibly trigger more complex workflows.
  1. Create a Kubernetes Job manifest file named batch-job-queue.yaml and paste the following contents. Replace the sample values for image with your ECR URL and value with your SQS queue URL.
  1. Apply the Job manifest to your EKS cluster:
The expected output should look like this:
  1. Monitor the Job execution:
When the Job is completed, you'll see the completion status in the output:
Congratulations! You've successfully deployed a batch processing job to your EKS cluster with an integrated Amazon SQS job queue. This setup allows you to manage and scale your batch jobs more effectively, leveraging the full power of Amazon EKS and AWS services.
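Inside the container, the application side of this integration is a receive, process, delete loop. The sketch below is a minimal version under stated assumptions: drain_queue and InMemoryQueue are hypothetical names, and InMemoryQueue is a stand-in that mimics only the two client methods used here so the example runs without AWS credentials.

```python
from typing import Callable


def drain_queue(sqs, queue_url: str, handle: Callable[[str], None], max_polls: int = 10) -> int:
    """Receive, process, and delete messages until a poll comes back empty."""
    done = 0
    for _ in range(max_polls):
        resp = sqs.receive_message(
            QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=5
        )
        messages = resp.get("Messages", [])
        if not messages:
            break
        for msg in messages:
            handle(msg["Body"])
            # Delete only after the handler returns, so a message whose processing
            # failed becomes visible again once its visibility timeout passes.
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
            done += 1
    return done


class InMemoryQueue:
    """Stand-in exposing the two client methods used above; not part of boto3."""

    def __init__(self, bodies):
        self._pending = [
            {"Body": body, "ReceiptHandle": f"rh-{i}"} for i, body in enumerate(bodies)
        ]
        self.deleted = []

    def receive_message(self, QueueUrl, MaxNumberOfMessages=1, WaitTimeSeconds=0):
        batch = self._pending[:MaxNumberOfMessages]
        return {"Messages": batch} if batch else {}

    def delete_message(self, QueueUrl, ReceiptHandle):
        self._pending = [m for m in self._pending if m["ReceiptHandle"] != ReceiptHandle]
        self.deleted.append(ReceiptHandle)


if __name__ == "__main__":
    queue = InMemoryQueue(["job-1", "job-2"])
    print(drain_queue(queue, "https://example.invalid/queue", print), "messages handled")
```

With boto3 installed, a real client created with boto3.client("sqs") can be passed in place of InMemoryQueue, with the queue URL read from whichever environment variable your Job manifest defines.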

Step 11: Create the PersistentVolume and PersistentVolumeClaim for EFS

In this section, you'll create a PersistentVolume (PV) and PersistentVolumeClaim (PVC) that will use the EFS storage class. This will provide a persistent storage layer for your Kubernetes Jobs. This builds upon the previous tutorial at Designing Scalable and Versatile Storage Solutions on Amazon EKS with the Amazon EFS CSI, where you set up environment variables for your EFS URL.
  1. Echo and save your EFS URL for the next step:
  1. Create a YAML file named batch-pv-pvc.yaml and paste the following contents. Replace the sample value for server with your EFS URL.
  1. Apply the PV and PVC to your Kubernetes cluster:
The expected output should look like this:
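As a separate illustration of this step, the PersistentVolume and PersistentVolumeClaim pair can also be generated from the EFS URL. This is a sketch, not the tutorial's batch-pv-pvc.yaml: the resource names, the 5Gi size, the efs-sc storage class name, and the use of an nfs volume source (inferred from the server field mentioned in this step) are assumptions. With the EFS CSI driver, a csi volume source that references the file system ID is a common alternative.

```python
import json


def efs_pv_and_pvc(server: str, size: str = "5Gi", storage_class: str = "efs-sc") -> tuple:
    """Return (PersistentVolume, PersistentVolumeClaim) dicts sharing one storage class."""
    pv = {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": "efs-pv"},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteMany"],  # many pods on many nodes may mount it
            "persistentVolumeReclaimPolicy": "Retain",
            "storageClassName": storage_class,
            "nfs": {"server": server, "path": "/"},
        },
    }
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": "efs-claim"},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }
    return pv, pvc


if __name__ == "__main__":
    # Hypothetical EFS DNS name; substitute the URL saved earlier in this step.
    pv, pvc = efs_pv_and_pvc("fs-0123456789abcdef0.efs.us-east-2.amazonaws.com")
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [pv, pvc]}, indent=2))
```

The claim binds to the volume because both declare the same storage class and access mode and the requested size fits the volume's capacity.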

Step 12: Implement Persistent Storage With Amazon EFS

In this section, we'll enhance your Kubernetes Jobs to use Amazon EFS for persistent storage. Building on the previous tutorial at Designing Scalable and Versatile Storage Solutions on Amazon EKS with the Amazon EFS CSI, where you set up an EFS-based 'StorageClass,' you'll add a Persistent Volume Claim (PVC) to your existing Job manifests. Due to the immutable nature of Jobs, you'll also adopt a versioning strategy. Instead of updating existing Jobs, you'll create new ones with different names but similar specs, allowing for historical tracking and version management through labels and annotations.
  1. Create a Kubernetes Job manifest file named update-batch-job.yaml and paste the following contents. Replace the sample value in image with your ECR URL.
  1. Apply the Job manifest to your EKS cluster:
  1. Create a Kubernetes Job Queue manifest file named update-batch-job-queue.yaml and paste the following contents. Replace the sample values for image with your ECR URL and value with your SQS queue URL.
  1. Apply the Job Queue manifest to your EKS cluster:
You can watch the logs of the job to see it processing the batch task:
The expected output should look like this:
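Separately from the manifests in this step, the versioning strategy described above (create a new Job instead of editing an existing one) can be sketched as a small transformation. The BASE_JOB contents, the claim name, the volume name, the label key, and the /data mount path are all hypothetical.

```python
import copy

# Hypothetical starting manifest, standing in for a Job created earlier.
BASE_JOB = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "batch-job"},
    "spec": {
        "template": {
            "spec": {
                "containers": [{"name": "batch", "image": "example/batch:latest"}],
                "restartPolicy": "Never",
            }
        }
    },
}


def with_efs_volume(job: dict, version: str, claim_name: str, mount_path: str = "/data") -> dict:
    """Return a renamed copy of the Job that mounts the PVC; the input is left untouched."""
    new_job = copy.deepcopy(job)
    new_job["metadata"]["name"] = f"{job['metadata']['name']}-{version}"
    new_job["metadata"].setdefault("labels", {})["version"] = version
    pod_spec = new_job["spec"]["template"]["spec"]
    pod_spec.setdefault("volumes", []).append(
        {"name": "efs-storage", "persistentVolumeClaim": {"claimName": claim_name}}
    )
    for container in pod_spec["containers"]:
        container.setdefault("volumeMounts", []).append(
            {"name": "efs-storage", "mountPath": mount_path}
        )
    return new_job


if __name__ == "__main__":
    updated = with_efs_volume(BASE_JOB, "v2", "efs-claim")
    print(updated["metadata"]["name"])
```

Deriving each new Job from a copy keeps the earlier manifest available for comparison, and the version label makes it possible to select related Jobs later.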

Clean Up

After finishing with this tutorial, for better resource management, you may want to delete the specific resources you created.
If you enjoyed this tutorial, found any issues, or have feedback for us, please send it our way!

Conclusion

You've successfully orchestrated batch processing tasks in your Amazon EKS cluster using Amazon SQS and EFS! You've not only integrated SQS as a robust job queue but also leveraged the EFS CSI Driver for persistent storage across multiple nodes. This tutorial has walked you through the setup of your Amazon EKS cluster, the deployment of a Python-based batch processing application, and its containerization and storage in Amazon ECR. You've also learned how to create multi-architecture images and deploy them as Kubernetes Jobs. Furthermore, you've extended the capabilities of your Kubernetes Jobs by integrating them with Amazon SQS and providing persistent storage through Amazon EFS.
To continue your journey, set up the Cluster Autoscaler for dynamic scaling, explore EFS Lifecycle Management to automate moving files between storage classes, or enable EFS Intelligent-Tiering to optimize costs.
 

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
