
Deploy Deepseek-R1: Guide to run multiple variants on AWS

Run multiple Deepseek variants, such as R1 and the distilled Llama and Qwen models, on the GPUs that suit you best, in your own AWS account.

Published Jan 29, 2025
Hi everyone,
Deepseek-R1 is everywhere. So we have done the heavy lifting for you and worked out how to run each variant on the cheapest, highest-availability GPUs. All of these configurations have been tested with vLLM for high throughput, and they autoscale with the Tensorfuse serverless runtime.
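If you just want to see a variant respond before setting up any infrastructure, here is a minimal sketch that serves one of the distilled checkpoints with vLLM's OpenAI-compatible server. The model ID is the public Hugging Face checkpoint; the flag values are illustrative placeholders to tune to your GPU, not our tested configuration.

```bash
# Minimal sketch: serve a distilled R1 variant with vLLM's
# OpenAI-compatible server on a single GPU.
pip install vllm

# Flag values are illustrative, not our tested configuration;
# size --max-model-len and --tensor-parallel-size to your hardware.
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
  --max-model-len 8192 \
  --tensor-parallel-size 1
```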
The table below summarizes the configurations you can run.
[Table: Supported GPU types for each variant of Deepseek-R1]

Take it for an experimental spin

You can find the Dockerfile and all configurations in the GitHub repo below. Simply spin up a GPU VM with your cloud provider, clone the repo, and build and run the image from the Dockerfile.
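As a rough sketch, assuming a GPU VM with Docker and the NVIDIA Container Toolkit already installed (`<repo-url>` and `<repo-dir>` stand in for the repo linked above):

```bash
# Sketch: build and run the image on a GPU VM.
# <repo-url> and <repo-dir> are placeholders for the repo linked above.
git clone <repo-url>
cd <repo-dir>

docker build -t deepseek-r1 .

# Hand the container all GPUs and expose the inference server on port 8000.
docker run --gpus all -p 8000:8000 deepseek-r1
```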

Deploy a production-ready service on AWS using Tensorfuse

If you are looking to use Deepseek-R1 models in your production application, follow our detailed guide to deploy them on your AWS account using Tensorfuse.
The guide covers all the steps needed to run open-source models in production (a sample request against the finished endpoint follows this list):
1. Deploying with the vLLM inference engine for high throughput
2. Autoscaling based on traffic
3. Preventing unauthorized access with token-based authentication
4. Configuring a TLS endpoint with a custom domain
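Once the service is live, clients talk to it like any OpenAI-compatible API. Here is a sketch of a request against the finished endpoint; the domain and token are placeholders, and bearer-token authentication is an assumption standing in for whatever token scheme the guide configures in step 3.

```bash
# Sketch: call the deployed, TLS-terminated endpoint.
# your-domain.example.com and $API_TOKEN are placeholders.
curl https://your-domain.example.com/v1/chat/completions \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
        "messages": [{"role": "user", "content": "Why is the sky blue?"}]
      }'
```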
 
