Hosting DeepSeek-R1 on Amazon EKS
In this tutorial, we’ll walk you through how to host DeepSeek models on AWS using Amazon EKS Auto Mode.
Tiago Reichert
Amazon Employee
Published Jan 29, 2025
Last Modified Feb 3, 2025
In this tutorial, we’ll walk you through how to host DeepSeek-R1 model on AWS using Amazon EKS. We are using Amazon EKS Auto Mode for the the flexibility and scalability that it provides, while eliminating the need for you to manage the Kubernetes control plane, compute, storage, and networking components.
- Open and Accessible: DeepSeek's open-source approach democratizes AI development, allowing more organizations and researchers to access and experiment with its advanced language models.
- Improved Reasoning Capabilities: DeepSeek R1 leverages Chain of Thought (CoT) reasoning, which allows the model to break down complex problems into smaller, more manageable steps. This enhances the model's ability to solve tasks like math problems and logical puzzles.
- Simplified Hosting on Amazon EKS: By hosting DeepSeek on Amazon EKS Auto Mode, you can eliminate the need to manage the underlying Kubernetes infrastructure, enabling you to focus on deploying and using the models.
For this tutorial, we’ll use the DeepSeek-R1-Distill-Llama-8B distilled model. While it requires fewer resources (like GPU) compared to the full DeepSeek-R1 model with 671B parameters, it provides a lighter, though less powerful, option compared to the full model.
If you'd prefer to deploy the full DeepSeek-R1 model, replace the distilled model in the vLLM configuration.
We’ll use AWS CloudShell for the setup in this tutorial to simplify the process.

We'll use Terraform to easily provision the infrastructure, including a VPC, ECR repository, and an EKS cluster with Auto Mode enabled.
For GPU support, we need to create a custom NodePool.
We’ll deploy the DeepSeek-R1-Distill-Llama-8B model using vLLM. To simplify the process, we’ve provided a sed command that allows you to easily set the model name and parameters.
Initially, the pod might be in a Pending state while EKS Auto Mode provisions the underlying EC2 instances with the required GPU drivers.
If your pod is stuck in a Pending state for several minutes, confirm that your AWS account has sufficient service quota to launch the required instances. Check the quota limits for G or P instances.
For more information, refer to the AWS EC2 Instance Quotas documentation.
Note: Those quotas are based on vCPUs, not the number of instances, so be sure to request accordingly.
You will see the log entry Application startup complete once the deployment is ready.
Next, we can create a local proxy to interact with the model using a curl request.
The response may take a few seconds to build, depending on the complexity of the model’s output. You can monitor the progress via the deepseek-deployment logs.
While direct API requests work fine, let’s build a more user-friendly Chatbot UI to interact with the model. The source code for the UI is already available in the GitHub repository.
Wait a few seconds for the load balancer to get provisioned.
To access the Chatbot UI, you'll need the username and password stored in a Kubernetes secret.
To access the Chatbot UI, you'll need the username and password stored in a Kubernetes secret.
After logging in, you'll see a new Chatbot tab where you can interact with the model!

By following these steps, you can effectively deploy the DeepSeek R1 model on Amazon EKS, leveraging its flexible scaling options and granular resource control to optimize costs while maintaining high performance. The solution leverages Kubernetes' native capabilities and EKS features like Auto Mode to deliver a highly configurable deployment that can be precisely tailored to your operational requirements and budget constraints.
For more patterns, such as deploying on Neuron and open-source Karpenter, follow the deepseek-using-vllm-on-eks GitHub repository.
Tiago Reichert is a Sr. Containers SA at AWS, focused on helping startups across Latin America to optimize their container strategies. With a deep passion for Containers, DevOps, and SaaS, he collaborates with businesses to design scalable and efficient cloud solutions. Tiago also actively contributes to the tech community as an organizer of KCD Brazil and meetups focused on promoting cloud-native technologies.
Lucas Duarte is a Sr. Containers SA at AWS, dedicated to supporting ISV customers in AMER through AWS Container services. Beyond his Solutions Architect role, Lucas brings extensive hands-on experience in Kubernetes and DevOps leadership. He's been a key contributor to multiple companies in Brazil, driving DevOps excellence.
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.