GenAI on Outposts: Bringing AI to the Edge

Authors: Abeer Naffa, Sr SA-Hybrid Cloud; Davi Garcia, Sr SA-Migration Specialist; and Federico D'Alessio, Principal Solution Architect

Last month, I worked closely with Davi and Federico on testing and deploying a GenAI solution on AWS Outposts, using Amazon EKS as the foundation. Our primary objective was to enable GenAI capabilities at the edge. The setup lets you test various AI models at the edge, with low-latency responses and enhanced data privacy for inference tasks. We also developed a detailed, step-by-step walkthrough that automates the deployment of the EKS cluster on Outposts and simplifies the process. By extending AWS infrastructure and services to on-premises environments, this hybrid cloud approach helps meet the data residency demands of GenAI workloads.

Prerequisites

Before proceeding, ensure you meet the following prerequisites:

An AWS account with access to Amazon EKS and AWS Outposts.
Appropriate IAM permissions to create EKS clusters, manage EC2 instances, and work with other AWS services.
Availability of the g4dn.12xlarge GPU instance type in your AWS Outposts configuration.
The latest versions of the AWS Command Line Interface (AWS CLI, v2 recommended), kubectl, and eksctl installed, plus Helm, which is used in Step 5.

Disclaimer: Before following the steps outlined below, please ensure they comply with your organization's security protocols and corporate policies.

Walkthrough

Step 1: Prepare the Cluster Configuration File

Create a configuration file for your EKS cluster that specifies the use of g4dn.12xlarge instances on AWS Outposts. Save the following content into a file named cluster-config.yaml:
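As a minimal sketch of what such a file could look like, the snippet below writes an eksctl ClusterConfig through a shell heredoc. The cluster name, node group name, Region, capacity, account ID, and Outpost ARN are all placeholder assumptions, not the values used in our environment. The sketch also assumes the control plane runs in the Region and only the node group is placed on the Outpost.

```shell
# Write a hypothetical eksctl ClusterConfig; every value below is a placeholder.
cat > cluster-config.yaml <<'EOF'
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: genai-outposts            # hypothetical cluster name
  region: us-west-2               # replace with your Region

iam:
  withOIDC: true                  # OIDC provider, used later for the controller's IAM role

nodeGroups:
  - name: outpost-gpu-ng          # hypothetical self-managed node group name
    instanceType: g4dn.12xlarge
    desiredCapacity: 1            # assumption: a single GPU node
    privateNetworking: true
    outpostARN: "arn:aws:outposts:us-west-2:111122223333:outpost/op-0123456789abcdef0"  # replace
EOF
```

Keeping the node group in the same file as the cluster definition means the same file can be reused when the node group is created.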

Make sure to update the region and the Outpost ARN before saving this configuration as cluster-config.yaml.

Step 2: Create the EKS Cluster on AWS

Once the configuration file is ready and saved, deploy your EKS cluster from it.
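One way to do this with eksctl is sketched below. It assumes the node group is created separately in the next step, which is why the `--without-nodegroup` flag is used. The function name `create_cluster` is only an illustrative wrapper, not part of eksctl.

```shell
# Create the EKS control plane from the config file, skipping node groups for now.
# Argument 1 (optional): path to the config file, defaulting to cluster-config.yaml.
create_cluster() {
  eksctl create cluster -f "${1:-cluster-config.yaml}" --without-nodegroup
}

# Example invocation:
#   create_cluster cluster-config.yaml
```

Cluster creation typically takes several minutes; eksctl prints progress while the underlying CloudFormation stacks are created.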

Step 3: Create the EKS Self-Managed Node Group on AWS Outposts

After the cluster is set up, add the self-managed node group configured for your Outpost.
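A possible sketch of this step follows, assuming the node group (with its Outpost ARN) is defined in the same config file that was used to create the cluster. Both function names are illustrative.

```shell
# Create the node groups defined in the config file (here, the Outposts GPU node group).
# Argument 1 (optional): path to the config file, defaulting to cluster-config.yaml.
create_outpost_nodegroup() {
  eksctl create nodegroup -f "${1:-cluster-config.yaml}"
}

# List the nodes to confirm that the GPU instances have joined the cluster.
list_nodes() {
  kubectl get nodes -o wide
}

# Example invocation:
#   create_outpost_nodegroup cluster-config.yaml && list_nodes
```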

Install the NVIDIA GPU Operator so that the GPUs on the node group are available to Kubernetes workloads.
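The operator is commonly installed from NVIDIA's Helm repository. The sketch below assumes Helm is installed; the release name `gpu-operator` and the namespace `gpu-operator` are choices made for illustration.

```shell
# Install the NVIDIA GPU Operator from NVIDIA's Helm repository.
install_gpu_operator() {
  helm repo add nvidia https://helm.ngc.nvidia.com/nvidia &&
  helm repo update &&
  helm install gpu-operator nvidia/gpu-operator \
    --namespace gpu-operator --create-namespace --wait
}

# Example invocation:
#   install_gpu_operator
```

Once the operator pods are running, the GPU nodes should advertise an `nvidia.com/gpu` resource that pods can request.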

Step 4: Deploy the AWS Load Balancer Controller

To manage ingress traffic for applications running on the cluster, deploy the AWS Load Balancer Controller. The controller requires an IAM role with specific permissions.

Download the controller's IAM policy document, then create an IAM policy from it.
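As a rough sketch, the policy document can be fetched from the controller's GitHub repository and registered with the AWS CLI. The URL below points at the `main` branch as an assumption; in practice you would likely pin it to the release tag of the controller version you install. The policy name is also an assumed choice.

```shell
# Assumption: policy taken from the main branch; pin to a release tag for real deployments.
LBC_POLICY_URL="https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/main/docs/install/iam_policy.json"

# Download the policy document and create the IAM policy from it.
create_lbc_policy() {
  curl -fsSL -o iam_policy.json "$LBC_POLICY_URL" &&
  aws iam create-policy \
    --policy-name AWSLoadBalancerControllerIAMPolicy \
    --policy-document file://iam_policy.json
}

# Example invocation:
#   create_lbc_policy
```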

Create an IAM role and a Kubernetes service account named aws-load-balancer-controller in the kube-system namespace for the AWS Load Balancer Controller, and annotate the service account with the IAM role's ARN.
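eksctl can perform all three actions in a single command. The sketch below assumes the cluster already has an IAM OIDC provider associated and that the policy is named AWSLoadBalancerControllerIAMPolicy in the standard `aws` partition; the role name is a hypothetical choice.

```shell
# Create the IAM role and the annotated Kubernetes service account in one step.
# Argument 1: cluster name. Argument 2: 12-digit AWS account ID.
create_lbc_service_account() {
  eksctl create iamserviceaccount \
    --cluster "$1" \
    --namespace kube-system \
    --name aws-load-balancer-controller \
    --role-name AmazonEKSLoadBalancerControllerRole \
    --attach-policy-arn "arn:aws:iam::$2:policy/AWSLoadBalancerControllerIAMPolicy" \
    --approve
}

# Example invocation (placeholder values):
#   create_lbc_service_account genai-outposts 111122223333
```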

Install the AWS Load Balancer Controller.
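A common way to install the controller is through the `eks` Helm chart repository, reusing the service account created earlier. This is a sketch under that assumption; depending on your network setup, additional chart values (for example the Region or VPC ID) may be required.

```shell
# Install the AWS Load Balancer Controller with Helm, reusing the existing service account.
# Argument 1: cluster name.
install_lbc() {
  helm repo add eks https://aws.github.io/eks-charts &&
  helm repo update &&
  helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
    --namespace kube-system \
    --set clusterName="$1" \
    --set serviceAccount.create=false \
    --set serviceAccount.name=aws-load-balancer-controller
}

# Example invocation (placeholder cluster name):
#   install_lbc genai-outposts
```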

Verify that the controller is installed.
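For instance, assuming the controller was installed into kube-system under its default deployment name, a check could look like this:

```shell
# Show the controller deployment; READY should match the desired replica count.
verify_lbc() {
  kubectl get deployment aws-load-balancer-controller --namespace kube-system
}

# Example invocation:
#   verify_lbc
```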

Step 5: Deploy NVIDIA GPU-Enabled Sample Apps

Next, deploy a sample application that uses GPU resources on the Amazon EKS cluster running on Outposts. In this step, you use Helm to deploy an application named open-webui-ollama, configured to use NVIDIA GPUs.
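The sketch below shows one way such a deployment might look, assuming the community Open WebUI Helm chart with its bundled Ollama subchart. The chart choice, the `open-webui` namespace, and the value keys are assumptions and may differ from the chart and values used in our setup. Ingress settings are left out because they depend on your chart and network configuration.

```shell
# Deploy Open WebUI with its bundled Ollama backend, requesting one NVIDIA GPU.
# Argument 1 (optional): target namespace, defaulting to the hypothetical "open-webui".
deploy_open_webui() {
  helm repo add open-webui https://helm.openwebui.com/ &&
  helm repo update &&
  helm install open-webui-ollama open-webui/open-webui \
    --namespace "${1:-open-webui}" --create-namespace \
    --set ollama.enabled=true \
    --set ollama.ollama.gpu.enabled=true \
    --set ollama.ollama.gpu.type=nvidia \
    --set ollama.ollama.gpu.number=1
}

# Example invocation:
#   deploy_open_webui
```

Check the chart's own values file before relying on these keys, since chart versions can rename them.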

Step 6: Verify the Deployment and Monitor the WebUI Readiness

After deploying the open-webui application, verify that the deployment was successful and monitor the readiness of the WebUI, which may take approximately 5 to 10 minutes to become fully operational. Here's how:

Check Pod Status: Start by checking that the pods are running correctly, and keep watching them until they all reach the Running state.
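A minimal example, assuming the application lives in a namespace called `open-webui` (a hypothetical name; pass your own as the first argument):

```shell
# Stream pod status changes until interrupted with Ctrl+C.
# Argument 1 (optional): namespace, defaulting to the hypothetical "open-webui".
watch_pods() {
  kubectl get pods --namespace "${1:-open-webui}" --watch
}

# Example invocation:
#   watch_pods
```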

Check Service and Ingress: WebUI readiness depends on both the service and the ingress, so verify that both resources exist and are configured correctly.
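Both resource types can be listed in one call. As before, the `open-webui` namespace is an assumed placeholder.

```shell
# List services and ingresses together; the ingress ADDRESS column fills in once the ALB exists.
# Argument 1 (optional): namespace, defaulting to the hypothetical "open-webui".
check_service_and_ingress() {
  kubectl get service,ingress --namespace "${1:-open-webui}"
}

# Example invocation:
#   check_service_and_ingress
```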

Monitor Ingress Readiness: Ingress provisioning can take a few minutes, especially when integrating with an AWS Application Load Balancer, so monitor readiness by observing the annotations and events associated with the ingress.
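The helpers below sketch two ways to inspect the ingress: a full description that includes annotations and events, and a JSONPath query that prints only the load balancer hostname once it has been assigned. The namespace default and function names are illustrative assumptions.

```shell
# Show annotations and recent events for every ingress in the namespace.
# Argument 1 (optional): namespace, defaulting to the hypothetical "open-webui".
describe_ingress() {
  kubectl describe ingress --namespace "${1:-open-webui}"
}

# Print only the load balancer hostname(s); empty output means the ALB is not ready yet.
ingress_hostname() {
  kubectl get ingress --namespace "${1:-open-webui}" \
    -o jsonpath='{.items[*].status.loadBalancer.ingress[*].hostname}'
}

# Example invocation:
#   describe_ingress && ingress_hostname
```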

Open the ingress address in your browser and start a conversation with the model you choose.

Conclusion

By integrating Amazon EKS with AWS Outposts and using GPUs, businesses can deploy powerful edge-specific applications that require intensive computation and low-latency processing. This setup empowers organizations to bring GenAI capabilities directly to the edge, maximizing the potential of their on-premises and hybrid cloud environments.

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
