
Alibaba's Wan AI video generation models on AWS ECS
Initial tests of Wan AI 2.1 video generation models
Didier Durand
Amazon Employee
Published Mar 3, 2025
Wan AI, the entity responsible for Alibaba Cloud's large-scale generative models, has just released
Wan 2.1, a “comprehensive and open suite of video foundation models that pushes the boundaries of video generation”.
Alibaba has released four variants of Wan 2.1 - T2V-1.3B, T2V-14B, I2V-14B-720P and I2V-14B-480P -
which generate images and videos from text and image input. This article will focus on the small
(1.3 billion parameters) model with text-to-video capabilities: Wan2.1-T2V-1.3B. See their website for all details.
The Wan GitHub repo is at https://github.com/Wan-Video/Wan2.1.
To reproduce our tests on your own, in particular to get the Dockerfile we used on ECS, please fork our repo: https://github.com/didier-durand/llms-in-clouds
Our goal is to evaluate those Wan models in a fully isolated and portable environment, including AWS ECS. The Wan 2.1 project does not yet supply a Dockerfile to build and execute the project.
So, we developed one (documented below) that can be used to run the video generation on a GPU-equipped laptop or in a cloud-based container service. In our case, that service is AWS Elastic Container Service (ECS), because it gives easy access to multiple types of GPUs via ad hoc EC2 instance types.
To have a comparison basis for the output, we used the prompt published by Nvidia for their
Cosmos project, launched in January 2025, to see how different models interpret the same words.
It is (see Cosmos’ Github repo for prompt and resulting video): "A sleek, humanoid robot stands in a vast warehouse filled with neatly stacked cardboard boxes on industrial shelves. The robot's metallic body gleams under the bright, even lighting, highlighting its futuristic design and intricate joints. A glowing blue light emanates from its chest, adding a touch of advanced technology. The background is dominated by rows of boxes, suggesting a highly organized storage system. The floor is lined with wooden pallets, enhancing the industrial setting. The camera remains static, capturing the robot's poised stance amidst the orderly environment, with a shallow depth of field that keeps the focus
on the robot while subtly blurring the background for a cinematic effect."
The Wan project has added a nice interactive interface, based on Hugging Face's Gradio, to more easily enter prompts and see the resulting videos.

The video generated from this prompt with 50 diffusion steps (whose processing lasts approximately 44.5 minutes for a video duration of 5 s - see the last line of the execution log on the AWS ECS cluster)
has been uploaded to YouTube.
For this test, Wan2.1-T2V-1.3B has been deployed in an ECS cluster on an EC2 instance of type
g6.12xlarge, featuring 4 x NVIDIA L4 Tensor Core GPUs with 24 GB of memory per GPU.
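As an illustration, here is a minimal sketch (not the actual task definition from our repo) of how a GPU-enabled ECS task definition could be registered for this container; the family name, image URI, memory value and host path are placeholders to adapt to your account:

```bash
# Hypothetical ECS task definition reserving 1 GPU for the Wan container.
# All names, the image URI and the memory value are placeholders.
cat > wan-task-def.json <<'EOF'
{
  "family": "wan2-1-t2v",
  "requiresCompatibilities": ["EC2"],
  "containerDefinitions": [
    {
      "name": "wan2-1-t2v",
      "image": "<account-id>.dkr.ecr.<region>.amazonaws.com/wan2.1:latest",
      "memory": 61440,
      "essential": true,
      "portMappings": [{ "containerPort": 7860, "hostPort": 7860 }],
      "resourceRequirements": [{ "type": "GPU", "value": "1" }],
      "mountPoints": [{ "sourceVolume": "model", "containerPath": "/home/model" }]
    }
  ],
  "volumes": [{ "name": "model", "host": { "sourcePath": "/home/model" } }]
}
EOF
aws ecs register-task-definition --cli-input-json file://wan-task-def.json
```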
This section describes the main aspects of this Dockerfile (available here) in case you need to customize it for reuse in your environment:
- It is based on the image pytorch/pytorch:2.6.0-cuda12.6-cudnn9-devel (sourced directly from Docker Hub - size: 13 GB+ when stored in a Docker registry), which brings PyTorch, Nvidia's CUDA and all their dependencies on top of Python 3.11 and Ubuntu 22.04 (Jammy Jellyfish).
- The Linux environment variable LD_LIBRARY_PATH has to be extended to allow dynamic loading of the Mesa libraries that PyTorch needs at execution time.
- The Linux environment variable PYTORCH_CUDA_ALLOC_CONF is set to `expandable_segments:True` to optimize GPU memory usage and avoid errors of type `torch.OutOfMemoryError: CUDA out of memory`.
- The application will look for the model weights in the container directory /home/model/Wan-AI/<name of the model>. For this test, the model is Wan2.1-T2V-1.3B, whose size is approximately 14 GB when loaded onto the GPU (see the output of the `nvidia-smi` command below). This respectable size restricts the choice of GPU that can be used for video generation: the GPU board memory must accommodate those weights plus the associated computation binaries.
- The Wan model weights are NOT included in the image, to keep it generic and not too big. They are supposed to be accessible via a volume mounted when the `docker run` command is executed. So, import the model files onto the host system before starting the container and [mount the corresponding directory](https://docs.docker.com/engine/storage/volumes/#syntax) (see the example run command after this list). On ECS, when the EC2 instance providing the compute capacity starts, we run a command copying the model from S3 (to avoid fetching it from HuggingFace on each start) onto that instance.
- Several environment variables (`$MODEL`, `$MODEL_DIR`, `$LAUNCHER`) are defined to provide additional flexibility: a single image can be used and its configuration changed dynamically.
- The exposed port 7860 is the standard one used by Gradio.
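As an illustration, here is a hedged sketch of how the weights can be fetched and the container run outside of ECS (the bucket name, local paths and image tag are placeholders; `--gpus all` requires the NVIDIA Container Toolkit on the host):

```bash
# Copy the model weights onto the host once (from S3 here, to avoid re-downloading
# from HuggingFace on each start); bucket name and paths are placeholders.
aws s3 sync s3://<your-bucket>/Wan-AI/Wan2.1-T2V-1.3B /home/model/Wan-AI/Wan2.1-T2V-1.3B

# Run the image, mounting the weights and exposing Gradio's port 7860.
docker run --rm --gpus all \
  -p 7860:7860 \
  -v /home/model:/home/model \
  -e MODEL=Wan2.1-T2V-1.3B \
  -e MODEL_DIR=/home/model/Wan-AI \
  wan2.1:latest
```

Once the container is up, the Gradio interface is reachable on port 7860 of the host (or of whatever fronts the ECS service).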
For readability purposes, the Dockerfile available on GitHub is sketched below.
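(The sketch reflects the points listed above and is not a verbatim copy of the repo's file; the Mesa package names, the launcher script path and its `--ckpt_dir` flag are assumptions.)

```dockerfile
# Sketch of a Dockerfile along the lines described above (assumptions flagged inline).
FROM pytorch/pytorch:2.6.0-cuda12.6-cudnn9-devel

# Mesa/OpenGL runtime libraries needed by PyTorch dependencies
# (exact package names are assumptions for Ubuntu 22.04).
RUN apt-get update && \
    apt-get install -y --no-install-recommends git libgl1-mesa-glx libglib2.0-0 && \
    rm -rf /var/lib/apt/lists/*

# Extend the dynamic loader path so the Mesa libraries are found at run time.
ENV LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:${LD_LIBRARY_PATH}

# Mitigate GPU memory fragmentation and some CUDA out-of-memory errors.
ENV PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

# Configuration knobs so a single image can serve several Wan models.
ENV MODEL=Wan2.1-T2V-1.3B
ENV MODEL_DIR=/home/model/Wan-AI
# Launcher path and flags are assumptions; check the Wan2.1 repo for the Gradio scripts.
ENV LAUNCHER=gradio/t2v_1.3B_singleGPU.py

# Wan 2.1 source code and its Python dependencies.
WORKDIR /home/Wan2.1
RUN git clone https://github.com/Wan-Video/Wan2.1.git . && \
    pip install --no-cache-dir -r requirements.txt

# Gradio's standard port.
EXPOSE 7860

# The weights are expected under ${MODEL_DIR}/${MODEL}, mounted at run time.
CMD ["sh", "-c", "python ${LAUNCHER} --ckpt_dir ${MODEL_DIR}/${MODEL}"]
```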
When started as an ECS task in an ECS service, the logs emitted by Wan2.1-T2V-1.3B are the following.
When given the prompt described above, the model performs 50 iterations, each lasting approximately 53.5 s (50 x ~53.5 s ≈ 44.5 min). So, the total computing duration is 44 min 34 s, as per the last line of the logs below.
As expected, only 1 GPU out of the 4 available is used by the 1.3B mono-GPU model, as shown by the `nvidia-smi` command.
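For reference, a quick way to watch that usage on the container instance (assuming shell access, e.g. via SSM Session Manager):

```bash
# Refresh GPU utilization every 5 seconds: only one of the four L4 GPUs should
# show the ~14 GB taken by the Wan2.1-T2V-1.3B weights and activations.
watch -n 5 nvidia-smi
```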
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.