Extending AWS Capacity Blocks for Uninterrupted ML workloads

Automatically extend Capacity Blocks duration for your ML workloads.

Kareem Abdol-Hamid
Amazon Employee
Published Jun 9, 2025
Authored by: Arpit Sapra & Kareem Abdol-Hamid
How long do you need compute capacity to serve your workload? The answer depends on your traffic or utilization pattern. For some use cases, compute needs are unpredictable, such as market-driven analysis or event-oriented applications. For others, such as batch processing jobs, the duration of the compute requirement is well defined.
For AI/ML workloads, knowing the duration of your workload is even more important, as GPUs are a costly resource! While long-term commitments provide the largest discounts for GPU-based Amazon EC2 instances, we have observed in our work with startups that customers often look for a cost-effective, short-term commitment to use GPU capacity in the cloud. Customers running training, fine-tuning, or hyperparameter-tuning workloads are particularly keen to reserve GPU capacity at a discount for a duration of days, weeks, or months.
To meet this customer need, AWS launched Amazon EC2 Capacity Blocks for ML in 2023. With Capacity Blocks (CBs), you can reserve accelerated compute instances for up to six months in cluster sizes of one to 64 instances (512 GPUs or 1024 Trainium chips), giving you the flexibility to run a broad range of ML workloads. Capacity Blocks also let you extend your reservation when you need it for a few more days or weeks, eliminating concerns about interruptions in the middle of your workload! You don’t want your GPUs sitting idle once your work is finished, but you also don’t want them shut down in the middle of a critical workload!
In this post, we explore Capacity Blocks Manager (CBM), a simple sample that tracks and manages AWS Capacity Blocks reservations, alerts you to reservations that are expiring soon, automates the extension workflow, and implements an approval process so that you maintain full control over extensions. CBM lets you enjoy the savings Capacity Blocks provide while automatically finding capacity for as long as you need it!

Deep dive into Capacity Blocks Manager

Capacity Blocks Manager is available on AWS’s GitHub under aws-samples. It automates the management of AWS Capacity Block compute environments using a CDK-deployed API and Lambda function, and it supports extension logic, approval workflows, and secure API-key-based access. These environments represent existing Capacity Reservations (what's created when you launch CBs) with additional parameters to automate extensions and configure notifications.

How Does it Work?

Capacity Blocks Manager spins up a simple serverless architecture that you interact with through an API.

This API allows you to create compute environments with the following configurations:
Here we define how long the Capacity Block should be extended and when to check for an extension (2 days before the end date in this case). The system can later use that information to extend the reservation automagically.
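To make these two settings concrete, here is a minimal Python sketch that models an environment with an extension duration and a look-ahead window, then works out when the extension check would start and what the new end date would be. The class, field names, dates, and reservation ID (`ComputeEnvironment`, `extension_days`, `lookahead_days`, `cr-EXAMPLE`) are hypothetical illustrations, not the actual CBM request format; please check the repository for the real configuration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ComputeEnvironment:
    """Hypothetical model of a compute environment (all names are assumptions)."""
    capacity_reservation_id: str
    end_date: datetime   # when the current Capacity Block reservation ends
    extension_days: int  # how long each extension should last
    lookahead_days: int  # how far ahead of end_date to start checking

    def check_start(self) -> datetime:
        """Earliest time at which an extension should be considered."""
        return self.end_date - timedelta(days=self.lookahead_days)

    def extended_end(self) -> datetime:
        """End date the reservation would have after one extension."""
        return self.end_date + timedelta(days=self.extension_days)


env = ComputeEnvironment(
    capacity_reservation_id="cr-EXAMPLE",  # placeholder, not a real reservation
    end_date=datetime(2025, 6, 20, 11, 30, tzinfo=timezone.utc),
    extension_days=7,
    lookahead_days=2,
)
print("Start checking at:", env.check_start().isoformat())
print("End date after extension:", env.extended_end().isoformat())
```

With a 2-day look-ahead, the check window opens two days before the reservation ends, which leaves time for an approval step if one is configured.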
Finally, it runs a Lambda function at a predefined interval, which automatically checks your compute environments, extends any capacity that is within its look-ahead range, and notifies an administrator if an approval is required. This allows you to set the level of control you desire over these automated extensions.
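The periodic check described above can be sketched as a small decision function. This is only an illustration of the logic, assuming each environment carries a look-ahead window and an approval flag; the names (`Environment`, `Action`, `decide`, `run_check`) are hypothetical and are not taken from the CBM code. It also assumes that an already-expired reservation is left alone, which the post does not specify.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class Action(Enum):
    NONE = "none"
    EXTEND = "extend"
    REQUEST_APPROVAL = "request_approval"


@dataclass
class Environment:
    """Hypothetical environment record (field names are assumptions)."""
    name: str
    end_date: datetime
    lookahead: timedelta
    requires_approval: bool
    approved: bool = False


def decide(env: Environment, now: datetime) -> Action:
    """Decide what a scheduled check should do for one environment."""
    if now >= env.end_date:
        return Action.NONE  # assumption: expired reservations are not extended
    if now < env.end_date - env.lookahead:
        return Action.NONE  # not yet inside the look-ahead range
    if env.requires_approval and not env.approved:
        return Action.REQUEST_APPROVAL  # notify an administrator first
    return Action.EXTEND


def run_check(envs: list[Environment], now: datetime) -> dict[str, Action]:
    """Evaluate every environment, as a scheduled function might on each run."""
    return {env.name: decide(env, now) for env in envs}


now = datetime(2025, 6, 19, tzinfo=timezone.utc)
soon = datetime(2025, 6, 20, tzinfo=timezone.utc)
later = datetime(2025, 6, 30, tzinfo=timezone.utc)
envs = [
    Environment("training", soon, timedelta(days=2), requires_approval=False),
    Environment("finetune", soon, timedelta(days=2), requires_approval=True),
    Environment("batch", later, timedelta(days=2), requires_approval=False),
]
results = run_check(envs, now)
for name, action in results.items():
    print(name, "->", action.value)
```

Keeping the decision separate from the actual extension call makes the approval behavior easy to test without touching any AWS resources.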
Use Capacity Blocks Manager today by heading over to the installation instructions in the aws-samples GitHub repository. For more guidance on how to configure CBM, please check the GitHub repository pages.
Disclaimer: This sample is for demonstration purposes. For production, please adapt it to your use case and ensure that extensions and reservations stay within your use case's needs. For any questions about Capacity Blocks or this tool, please reach out to the authors.
 

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
