
Deploying DeepSeek-R1-Distill-Llama-8B on an AWS M7i Instance
This is a community post discussing how to deploy DeepSeek-R1-Distill-Llama-8B on an AWS EC2 M7i instance.
To get started, clone the vLLM repository and move into it:

git clone https://github.com/vllm-project/vllm.git
cd vllm

Next, build a CPU-only vLLM image from the repository's Dockerfile.cpu:
sudo docker build -f Dockerfile.cpu -t vllm-cpu-env --shm-size=4g .
The --shm-size flag allows the container to access the host's shared memory. vLLM uses PyTorch, which uses shared memory to share data between processes under the hood, particularly for tensor parallel inference.

Once the image is built, start the OpenAI-compatible vLLM server with the distilled model on CPU:
sudo docker run -it --rm --network=host vllm-cpu-env --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B --device cpu --max_model_len 500
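Note that --shm-size was passed to docker build above, where it only sizes /dev/shm for build-time steps. At serving time, the same flag can also be given to docker run, which matters if PyTorch needs shared memory for inter-process communication (for example, with tensor parallel inference). A minimal sketch with the flag added at runtime; the 4g value simply mirrors the build step and is an assumption, not tuned guidance:

sudo docker run -it --rm --network=host --shm-size=4g vllm-cpu-env --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B --device cpu --max_model_len 500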
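The container runs in the foreground, so use a second terminal to query the server. Before sending a prompt, you can check that the server is ready by listing the models it exposes; this assumes the default port 8000:

curl http://localhost:8000/v1/models

Once DeepSeek-R1-Distill-Llama-8B shows up in the response, send a test completion request: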
curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B", "prompt": "Explain why the sun emits immense energy", "temperature": 0, "max_tokens": 32}'
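Since DeepSeek-R1-Distill-Llama-8B is a chat-tuned model, the chat endpoint, which applies the model's chat template, may give better-structured answers. Here is a sketch of the same question against /v1/chat/completions, assuming the server above is still running:

curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B", "messages": [{"role": "user", "content": "Explain why the sun emits immense energy"}], "temperature": 0, "max_tokens": 32}'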
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.