Efficiently download LLM weights from HuggingFace to S3

Interactive CFN-wrapped utility on AWS console to easily and efficiently download multiple models from HuggingFace and store them on S3

Didier Durand
Amazon Employee
Published Mar 4, 2025

Rationale

Providers of LLM models tend to release them in families of multiple members. Quite often, you need to compare several of those members to select the right one with the right compromise between latency and quality for the responses.
Additionally, those models tend to be large: 50GB+ is not at all uncommon for high-end models. So, you cannot download them on the fly from HuggingFace (HF) each time, for obvious reasons (cost, latency, security, etc.). The simple CloudFormation (CFN) template cfn-hf-download.yaml creates a mechanism running on AWS that downloads LLM models from the HuggingFace site and stores them in S3 for reuse. The resulting CFN stack mainly consists of an AWS CodeBuild project that can be triggered from the AWS console whenever a new model needs to be downloaded.
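As a rough illustration of what such a build has to do, the sketch below turns a comma-separated model list into the commands needed to fetch each model with the HF CLI and copy it to S3. It is not the buildspec from cfn-hf-download.yaml: the helper names, the local work directory, and the bucket, path and model IDs used when calling it are assumptions made for illustration.

```python
from __future__ import annotations

import subprocess


def parse_model_list(raw: str) -> list[str]:
    """Split a comma-separated model list, dropping blanks and stray spaces."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def build_commands(model_id: str, bucket: str, path: str,
                   work_dir: str = "/tmp/models") -> list[list[str]]:
    """Return the two commands for one model: HF download, then S3 upload."""
    # Hypothetical layout: one local folder and one S3 prefix per model ID.
    local_dir = f"{work_dir.rstrip('/')}/{model_id}"
    prefix = path.strip("/")
    key_prefix = f"{prefix}/{model_id}" if prefix else model_id
    target = f"s3://{bucket}/{key_prefix}/"
    return [
        ["huggingface-cli", "download", model_id, "--local-dir", local_dir],
        ["aws", "s3", "sync", local_dir, target],
    ]


def download_all(raw_models: str, bucket: str, path: str,
                 dry_run: bool = True) -> list[list[str]]:
    """Build (and, unless dry_run, execute) the commands for every model."""
    commands = []
    for model_id in parse_model_list(raw_models):
        for cmd in build_commands(model_id, bucket, path):
            commands.append(cmd)
            if not dry_run:
                # Stop at the first failing command instead of continuing.
                subprocess.run(cmd, check=True)
    return commands
```

With `dry_run=True` (the default here), `download_all` only returns the command lists, which makes it easy to review what would run before anything is downloaded.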
Solution architecture
The goal is to allow fast(er), efficient, scalable and safe archiving and retrieval of large models when they are deployed on AWS Cloud: they never transit via a personal laptop or any other machine with reduced bandwidth. Similarly, for inferences and other ML activities, those models are retrieved directly from S3 instead of HF.
Those stored models can also be used for multiple activities: production inference, of course, but also distillation, quantization, fine-tuning, etc. on AWS services like Amazon Elastic Container Service (ECS) or SageMaker.
To make life easy for users, i.e. to avoid changes or updates to the CFN stack after it is instantiated (for example, when a new list of models has to be downloaded) and to avoid having to grant CFN privileges to users, we define the target bucket, target path and LLM model list as AWS Systems Manager (SSM) parameters. SSM-authorized users can easily update them via the AWS console, with no need for credentials on the CFN stack or on the CodeBuild project (except the right to launch it).
We also support the HuggingFace access token, as an optional SSM Parameter. It may be required to authenticate to HF when the license of a given model requires some pre-download agreement on the HuggingFace web site.
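To make the role of these parameters concrete, here is a minimal sketch of how a script could read them, assuming a boto3 SSM client is passed in. The parameter names and the placeholder value that stands for "no token" are hypothetical: the real names and defaults are defined by the CFN template.

```python
from __future__ import annotations

# Hypothetical parameter names, for illustration only.
PARAMETER_NAMES = {
    "bucket": "/hf-download/BucketName",
    "path": "/hf-download/PathToModels",
    "models": "/hf-download/LlmModels",
    "token": "/hf-download/HfToken",
}
# Assumption: an unset optional token is stored as this placeholder value.
TOKEN_PLACEHOLDER = "none"


def read_settings(ssm_client) -> dict:
    """Read the download settings from SSM Parameter Store."""

    def value_of(key: str) -> str:
        # WithDecryption also covers the case of a SecureString token.
        response = ssm_client.get_parameter(
            Name=PARAMETER_NAMES[key], WithDecryption=True
        )
        return response["Parameter"]["Value"]

    token = value_of("token")
    return {
        "bucket": value_of("bucket"),
        "path": value_of("path"),
        # The model list is stored as one comma-separated string.
        "models": [m.strip() for m in value_of("models").split(",") if m.strip()],
        "token": None if token.strip().lower() == TOKEN_PLACEHOLDER else token,
    }
```

In a real environment, the client would come from `boto3.client("ssm")`, and the caller would need `ssm:GetParameter` permission on those parameters.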
Note: if you wonder what the UnicitySuffix parameter is used for, it extracts a chunk of the unique CFN stack UUID. This chunk is then appended to all resource names to avoid duplicate-name issues when multiple instances of this stack template are instantiated simultaneously.
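A CloudFormation stack ID is an ARN whose last path segment is a UUID, so the idea behind the suffix can be sketched outside CFN as follows. Which chunk of the UUID the template actually keeps is not stated here: taking the first dash-separated group is an assumption, as are the function name, the sample ARN and the resource base name.

```python
def unicity_suffix(stack_id: str) -> str:
    """Return the first group of the UUID that ends a CFN stack ID (assumed chunk)."""
    # Stack ID shape: arn:aws:cloudformation:<region>:<account>:stack/<name>/<uuid>
    uuid_part = stack_id.rsplit("/", 1)[-1]
    return uuid_part.split("-")[0]


# Made-up stack ID used only to show the shape of the result.
example_stack_id = (
    "arn:aws:cloudformation:us-east-1:123456789012:"
    "stack/hf-download/51af3dc0-da77-11e4-872e-1234567db123"
)
resource_name = f"hf-download-project-{unicity_suffix(example_stack_id)}"
print(resource_name)  # hf-download-project-51af3dc0
```

Inside a template, this kind of extraction is typically written with the `Fn::Split` and `Fn::Select` intrinsic functions applied to the `AWS::StackId` pseudo parameter.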

Usage

  1. If needed, before instantiation, update the default values of the parameters named LlmModels, PathToModels and BucketName in the stack template with your own values. The list of models is a comma-separated string. Note: do not put the HF token in the CFN template; you’ll update the corresponding AWS SSM parameter after CFN has created it.
  2. Instantiate the CloudFormation stack via the AWS Console, AWS CLI or any SDK. Under normal circumstances, you’ll never have to update it again: everything happens by changing parameter values.
  3. If needed, update the various parameters (S3 bucket, model list and path, access token) in SSM Parameter Store via the AWS Console interactive UI.
  4. Start the build project via the AWS CodeBuild console to download the models to your S3 bucket. The HF CLI will handle the download and save it.
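Steps 3 and 4 are described for the console, but the same two actions can also be scripted. The sketch below assumes boto3 clients for SSM and CodeBuild are passed in; the function name, the parameter name and the project name are hypothetical and must be replaced with the names created by your stack.

```python
from __future__ import annotations


def request_download(ssm_client, codebuild_client, models: list[str],
                     parameter_name: str, project_name: str) -> str:
    """Store a new model list in SSM, then start the CodeBuild project."""
    # The model list is stored as a single comma-separated string.
    value = ",".join(m.strip() for m in models if m.strip())
    # Overwrite=True updates the existing parameter created by the stack.
    ssm_client.put_parameter(Name=parameter_name, Value=value, Overwrite=True)
    response = codebuild_client.start_build(projectName=project_name)
    # The build ID can be used to follow the build in the console or logs.
    return response["build"]["id"]
```

A call could look like `request_download(boto3.client("ssm"), boto3.client("codebuild"), ["example-org/model-a"], "/hf-download/LlmModels", "hf-download-project")`, where every name is a placeholder.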
Notes:
  • The IAM CodeBuild execution role limits what the CodeBuild project is authorized to do. It can only call CloudWatch Logs, SSM and S3 to log activity, access SSM parameters and save models in the S3 bucket. If your environment requires it, you can further tighten the permissions by restricting the resources to the specific log streams, parameters and bucket in use.
  • On the cost efficiency side, CodeBuild uses a pay-as-you-go pricing model: "There are no upfront costs or minimum fees. You pay only for the resources you use. You are charged for compute resources based on the duration it takes your build to run. The rate depends on the selected compute type". So, the CodeBuild project in the CFN template won't cost you anything if you don't use it. The same goes for standard SSM parameters and IAM roles. For all those services, only API calls (from console or CLI) will be charged.
  • The GitHub workflow lint_cfn.yaml attached to the repository checks the quality of the CFN template each time we commit a new version. It raises an error as soon as cfn-lint finds an error or a warning in template syntax. See Actions page for details.
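Regarding the first note above, tightening the execution role could look like the following sketch, which builds an IAM policy document scoped to one log group, a given set of parameters and one bucket prefix. This is not the policy from the template: the function name, the exact list of actions and all names passed in are assumptions, and the log group ARN assumes CodeBuild's default `/aws/codebuild/<project>` naming.

```python
from __future__ import annotations

import json


def tightened_policy(region: str, account_id: str, project_name: str,
                     parameter_names: list[str], bucket: str, prefix: str) -> dict:
    """Build a resource-scoped policy document for the build role (sketch)."""
    log_group = (f"arn:aws:logs:{region}:{account_id}:"
                 f"log-group:/aws/codebuild/{project_name}")
    # Parameter ARNs embed the name, which needs exactly one leading slash.
    parameters = [
        f"arn:aws:ssm:{region}:{account_id}:parameter/{name.lstrip('/')}"
        for name in parameter_names
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow",
             "Action": ["logs:CreateLogGroup", "logs:CreateLogStream",
                        "logs:PutLogEvents"],
             "Resource": [log_group, f"{log_group}:*"]},
            {"Effect": "Allow",
             "Action": ["ssm:GetParameter", "ssm:GetParameters"],
             "Resource": parameters},
            {"Effect": "Allow",
             "Action": ["s3:PutObject"],
             "Resource": [f"arn:aws:s3:::{bucket}/{prefix.strip('/')}/*"]},
            # Assumption: the upload tool lists the bucket before copying.
            {"Effect": "Allow",
             "Action": ["s3:ListBucket"],
             "Resource": [f"arn:aws:s3:::{bucket}"]},
        ],
    }


policy = tightened_policy("us-east-1", "123456789012", "hf-download-project",
                          ["/hf-download/LlmModels"], "my-bucket", "llm/models")
print(json.dumps(policy, indent=2))
```

The account ID, region and names above are placeholders. Review the generated document against what your build actually calls before attaching it to the role.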

Execution in AWS CodeBuild

As an example, below is the final part of the execution log for the download of 3 IBM Granite 3.2 models
 

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
