
Hosting DeepSeek-R1 on Amazon EKS

In this tutorial, we’ll walk you through how to host DeepSeek models on AWS using Amazon EKS Auto Mode.

Tiago Reichert
Amazon Employee
Published Jan 29, 2025
Last Modified Feb 3, 2025
In this tutorial, we’ll walk you through how to host the DeepSeek-R1 model on AWS using Amazon EKS. We are using Amazon EKS Auto Mode for the flexibility and scalability it provides, while eliminating the need for you to manage the Kubernetes control plane, compute, storage, and networking components.

Why Deploy DeepSeek on Amazon EKS?

  • Open and Accessible: DeepSeek's open-source approach democratizes AI development, allowing more organizations and researchers to access and experiment with its advanced language models.
  • Improved Reasoning Capabilities: DeepSeek R1 leverages Chain of Thought (CoT) reasoning, which allows the model to break down complex problems into smaller, more manageable steps. This enhances the model's ability to solve tasks like math problems and logical puzzles.
  • Simplified Hosting on Amazon EKS: By hosting DeepSeek on Amazon EKS Auto Mode, you can eliminate the need to manage the underlying Kubernetes infrastructure, enabling you to focus on deploying and using the models.

Deploying DeepSeek-R1 on Amazon EKS Auto Mode

For this tutorial, we’ll use the DeepSeek-R1-Distill-Llama-8B distilled model. It requires fewer resources (such as GPU memory) than the full 671B-parameter DeepSeek-R1 model, making it a lighter, though less powerful, option.
If you'd prefer to deploy the full DeepSeek-R1 model, replace the distilled model in the vLLM configuration.
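For instance, the only change would be the model ID passed to vLLM in the sed command shown later in this tutorial. Treat this as a sketch, not a tested configuration: the full model's weights are on the order of hundreds of gigabytes, so it needs a far larger multi-GPU setup than the distilled model used here.

# Hypothetical variant of the sed command from the deployment step below,
# pointing vLLM at the full model instead of the 8B distill. Illustrative only;
# the full model will not fit on the single-GPU nodes this tutorial assumes.
sed -i "s|__MODEL_NAME_AND_PARAMETERS__|deepseek-ai/DeepSeek-R1 --max_model 2048|g" manifests/deepseek-deployment-gpu.yaml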

Install PreReqs

We’ll use AWS CloudShell for the setup in this tutorial to simplify the process.
# Installing kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

# Install Terraform
sudo yum install -y yum-utils
sudo yum-config-manager --add-repo https://rpm.releases.hashicorp.com/AmazonLinux/hashicorp.repo
sudo yum -y install terraform
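A quick check to confirm both tools installed correctly before moving on:

# Verify the installations
kubectl version --client
terraform -version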

Create an Amazon EKS Cluster with Auto Mode using Terraform

We'll use Terraform to easily provision the infrastructure, including a VPC, ECR repository, and an EKS cluster with Auto Mode enabled.
# Clone the GitHub repo with the manifests
git clone -b v0.1 https://github.com/aws-samples/deepseek-using-vllm-on-eks
cd deepseek-using-vllm-on-eks

# Apply the Terraform configuration
terraform init
terraform apply -auto-approve

# After Terraform finishes, configure kubectl with the new EKS cluster
$(terraform output configure_kubectl | jq -r)
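Before moving on, it's worth confirming that kubectl is now talking to the new cluster:

# Confirm kubectl can reach the new cluster's API server
kubectl cluster-info

# An empty node list is normal at this point; EKS Auto Mode provisions nodes on demand
kubectl get nodes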

Create an EKS Auto Mode NodePool

For GPU support, we need to create a custom NodePool.
# Create a custom NodePool with GPU support
kubectl apply -f manifests/gpu-nodepool.yaml

# Check if the NodePool is in 'Ready' state
kubectl get nodepool/gpu-nodepool
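The actual manifest ships in the repository, so you don't need to write it yourself. As a rough sketch of what such a NodePool looks like (the field values below are assumptions for illustration, not a copy of the repo's file), it combines instance requirements with a node label and a GPU taint so that only GPU workloads land on these nodes:

# Illustrative sketch only -- the tutorial applies manifests/gpu-nodepool.yaml from the repo.
cat <<'EOF' > /tmp/gpu-nodepool-sketch.yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu-nodepool
spec:
  template:
    metadata:
      labels:
        owner: data-engineer          # matched later by 'kubectl get nodes -l owner=data-engineer'
    spec:
      nodeClassRef:
        group: eks.amazonaws.com      # EKS Auto Mode's built-in NodeClass
        kind: NodeClass
        name: default
      requirements:
        - key: eks.amazonaws.com/instance-category
          operator: In
          values: ["g"]               # G-family GPU instances; adjust to your quota
      taints:
        - key: nvidia.com/gpu
          effect: NoSchedule          # keep non-GPU workloads off these nodes
EOF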

Deploy the DeepSeek Model

We’ll deploy the DeepSeek-R1-Distill-Llama-8B model using vLLM. To simplify the process, we’ve provided a sed command that allows you to easily set the model name and parameters.
# Use the sed command to replace the placeholder with the model name and configuration parameters
sed -i "s|__MODEL_NAME_AND_PARAMETERS__|deepseek-ai/DeepSeek-R1-Distill-Llama-8B --max_model 2048|g" manifests/deepseek-deployment-gpu.yaml

# Deploy the DeepSeek model on Kubernetes
kubectl apply -f manifests/deepseek-deployment-gpu.yaml

# Check the pods in the 'deepseek' namespace
kubectl get po -n deepseek
Initially, the pod might be in a Pending state while EKS Auto Mode provisions the underlying EC2 instances with the required GPU drivers.
If your pod is stuck in a Pending state for several minutes, confirm that your AWS account has sufficient service quota to launch the required instances. Check the quota limits for G or P instances.
For more information, refer to the AWS EC2 Instance Quotas documentation.
Note: These quotas are based on vCPUs, not the number of instances, so be sure to request accordingly.
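If you prefer the CLI to the console, the pod's events will usually name the constraint, and Service Quotas can report your current vCPU limit. The quota code below is the one used for "Running On-Demand G and VT instances" at the time of writing; confirm it in the Service Quotas console, since codes can change.

# Inspect scheduling events for pods in the namespace
kubectl get events -n deepseek --sort-by=.lastTimestamp

# Current vCPU quota for G and VT instances (quota code assumed; verify before relying on it)
aws service-quotas get-service-quota --service-code ec2 --quota-code L-DB2E81BA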
# Wait for the pod to reach the 'Running' state
watch -n 1 kubectl get po -n deepseek

# Verify that a new Node has been created
kubectl get nodes -l owner=data-engineer

# Check the logs to confirm that vLLM has started
kubectl logs deployment.apps/deepseek-deployment -n deepseek
You will see the log entry Application startup complete once the deployment is ready.

Interact with the DeepSeek LLM

Next, we can create a local proxy to interact with the model using a curl request.
1
2
3
4
5
6
7
8
9
10
11
12
13
# Set up a proxy to forward the service port to your local terminal
kubectl port-forward svc/deepseek-svc -n deepseek 8080:80 > port-forward.log 2>&1 &

# Send a curl request to the model
curl -X POST "http://localhost:8080/v1/chat/completions" -H "Content-Type: application/json" --data '{
  "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
  "messages": [
    {
      "role": "user",
      "content": "What is Kubernetes?"
    }
  ]
}'
The response may take a few seconds to generate, depending on the complexity of the model’s output. You can monitor the progress via the deepseek-deployment logs.
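Since vLLM exposes an OpenAI-compatible API, you can also ask for a streamed response and watch tokens arrive as they are generated (a sketch using the same endpoint; curl's -N flag disables output buffering):

# Stream the response token-by-token instead of waiting for the full completion
curl -N -X POST "http://localhost:8080/v1/chat/completions" -H "Content-Type: application/json" --data '{
  "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
  "messages": [{"role": "user", "content": "What is Kubernetes?"}],
  "stream": true
}'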

Build a Chatbot UI for the Model

While direct API requests work fine, let’s build a more user-friendly Chatbot UI to interact with the model. The source code for the UI is already available in the GitHub repository.
# Retrieve the ECR repository URI created by Terraform
export ECR_REPO=$(terraform output ecr_repository_uri | jq -r)

# Build the container image for the Chatbot UI
docker build -t $ECR_REPO:0.1 chatbot-ui/application/.

# Login to ECR and push the image
aws ecr get-login-password | docker login --username AWS --password-stdin $ECR_REPO
docker push $ECR_REPO:0.1

# Update the deployment manifest to use the image
sed -i "s#__IMAGE_DEEPSEEK_CHATBOT__#$ECR_REPO:0.1#g" chatbot-ui/manifests/deployment.yaml

# Generate a random password for the Chatbot UI login
sed -i "s|__PASSWORD__|$(openssl rand -base64 12 | tr -dc A-Za-z0-9 | head -c 16)|" chatbot-ui/manifests/deployment.yaml

# Deploy the UI and create the ingress class required for load balancers
kubectl apply -f chatbot-ui/manifests/ingress-class.yaml
kubectl apply -f chatbot-ui/manifests/deployment.yaml

# Get the URL for the load balancer to access the application
echo http://$(kubectl get ingress/deepseek-chatbot-ingress -n deepseek -o json | jq -r '.status.loadBalancer.ingress[0].hostname')
Wait a few seconds for the load balancer to be provisioned.
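You can watch the ingress until the load balancer's hostname appears in the ADDRESS column (press Ctrl+C to stop):

# Watch the ingress until the ALB hostname is populated
kubectl get ingress/deepseek-chatbot-ingress -n deepseek -w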
To access the Chatbot UI, you'll need the username and password stored in a Kubernetes secret.
echo -e "Username=$(kubectl get secret deepseek-chatbot-secrets -n deepseek -o jsonpath='{.data.admin-username}' | base64 --decode)\nPassword=$(kubectl get secret deepseek-chatbot-secrets -n deepseek -o jsonpath='{.data.admin-password}' | base64 --decode)"
After logging in, you'll see a new Chatbot tab where you can interact with the model!
By following these steps, you can deploy the DeepSeek-R1 model on Amazon EKS, taking advantage of flexible scaling options and granular resource control to optimize costs while maintaining high performance. The solution leverages Kubernetes' native capabilities and EKS features like Auto Mode to deliver a highly configurable deployment that can be tailored to your operational requirements and budget constraints.
For more patterns, such as deploying on AWS Neuron or with open-source Karpenter, see the deepseek-using-vllm-on-eks GitHub repository.
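Clean Up

When you're done experimenting, remove the resources to avoid ongoing charges. A minimal teardown sketch: delete the Kubernetes objects first, so the load balancer created by the ingress is removed before Terraform tears down the VPC. If the ECR repository still contains images, you may need to delete them before Terraform can remove it.

# Remove the Chatbot UI and its ingress (this deletes the ALB)
kubectl delete -f chatbot-ui/manifests/deployment.yaml
kubectl delete -f chatbot-ui/manifests/ingress-class.yaml

# Remove the model deployment and the GPU NodePool
kubectl delete -f manifests/deepseek-deployment-gpu.yaml
kubectl delete -f manifests/gpu-nodepool.yaml

# Destroy the Terraform-managed infrastructure (EKS cluster, VPC, ECR repository)
terraform destroy -auto-approve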

Authors

Tiago Reichert is a Sr. Containers SA at AWS, focused on helping startups across Latin America to optimize their container strategies. With a deep passion for Containers, DevOps, and SaaS, he collaborates with businesses to design scalable and efficient cloud solutions. Tiago also actively contributes to the tech community as an organizer of KCD Brazil and meetups focused on promoting cloud-native technologies.
Lucas Duarte is a Sr. Containers SA at AWS, dedicated to supporting ISV customers in AMER through AWS Container services. Beyond his Solutions Architect role, Lucas brings extensive hands-on experience in Kubernetes and DevOps leadership. He's been a key contributor to multiple companies in Brazil, driving DevOps excellence.
 

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
