Getting started with GPU infrastructure on AWS: Practical tips for obtaining GPU instances

With overwhelming demand for AI, access to GPU accelerators has become critical. In this post, you will learn a simple 3-step process to effectively obtain and leverage GPU resources on AWS.

Daniel Wirjo
Amazon Employee
Published Apr 22, 2025
This post is co-written by Yudho Diponegoro, Senior Solutions Architect and Sumit Jadhav, Senior GTM Specialist, Accelerated Computing.

Introduction: Getting started with GPUs

With overwhelming demand for AI, access to GPU accelerators has become critical for organizations developing machine learning models, running inference workloads, and processing complex computations. AWS offers various GPU options to meet demands, but securing capacity requires strategic planning. In this post, you will learn a simple 3-step process to effectively obtain and leverage GPU resources on AWS:
  1. Create your strategic demand plan
  2. Determine the optimal AWS requirements for your workloads, including capacity product, GPU instance types, and region
  3. Request the appropriate service quotas and complete the prerequisites before you get started

Create your demand plan

A demand plan is the process of forecasting the demand for resources so they can be delivered to meet your needs. Demand planning considers factors such as workload patterns, resource requirements, and budget constraints to ensure you have the right GPU capacity when needed.
Create a spreadsheet with the following headings (a minimal code sketch follows the list):
  • Use Case (e.g. distributed training)
  • GPU Accelerator (e.g. NVIDIA A100)
  • Memory/VRAM (e.g. 40GB)
  • Desired Price per Accelerator (i.e. based on your budget)
  • Start Date (i.e. based on your project timelines)
  • End Date (i.e. based on your project timelines)
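If you prefer to keep the plan in code, here is a minimal sketch that writes the same columns to a CSV you can share with your account team. The rows, prices, and dates below are illustrative placeholders, not AWS pricing:

```python
import csv

# Illustrative demand plan rows; the values are placeholders for your own workloads.
demand_plan = [
    {
        "Use Case": "distributed training",
        "GPU Accelerator": "NVIDIA A100",
        "Memory/VRAM": "40GB",
        "Desired Price per Accelerator": "2.50 USD/hr",
        "Start Date": "2025-06-01",
        "End Date": "2025-09-30",
    },
    {
        "Use Case": "real-time inference",
        "GPU Accelerator": "NVIDIA L4",
        "Memory/VRAM": "24GB",
        "Desired Price per Accelerator": "0.80 USD/hr",
        "Start Date": "2025-06-01",
        "End Date": "2026-06-01",
    },
]

# Write the plan to a CSV that opens directly as a spreadsheet.
with open("gpu_demand_plan.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(demand_plan[0].keys()))
    writer.writeheader()
    writer.writerows(demand_plan)
```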

Determine optimal AWS requirements across capacity products, instance types and region

Once you understand your GPU compute requirements, budget, and timelines, you can align your demand plan with AWS requirements. For optimal pricing and capacity, contact your AWS account team or sales support.

Determining instance types

  • Instance Type: For example, p4d.24xlarge corresponds to NVIDIA A100 GPUs. See Accelerated Computing Instance Types.
  • GPU Accelerators per Instance: Find this under the "GPU" column on Accelerated Computing Instance Types. For example:
    • p4d (NVIDIA A100), p5 (NVIDIA H100), p5e, p5en (NVIDIA H200) instances have 8 GPUs.
    • G4 (NVIDIA T4), G5 (NVIDIA A10G), G6 (NVIDIA L4) and G6e (NVIDIA L40S) instances vary from 1 to 8 GPUs depending on the instance size: for example, the 12xlarge and 24xlarge sizes have 4 GPUs per instance, while 48xlarge has 8 GPUs per instance. You can also look these up programmatically, as shown in the sketch after this list.
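As a minimal sketch, assuming boto3 is installed and AWS credentials are configured, you can query GPU count and VRAM per instance type via the EC2 DescribeInstanceTypes API. The instance types listed are just examples:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Example instance types to inspect; substitute the ones from your demand plan.
resp = ec2.describe_instance_types(
    InstanceTypes=["p4d.24xlarge", "g5.12xlarge", "g6e.48xlarge"]
)

for it in resp["InstanceTypes"]:
    for gpu in it.get("GpuInfo", {}).get("Gpus", []):
        print(
            f"{it['InstanceType']}: {gpu['Count']}x "
            f"{gpu['Manufacturer']} {gpu['Name']} "
            f"({gpu['MemoryInfo']['SizeInMiB'] // 1024} GiB VRAM each)"
        )
```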

Determining region availability

  • AWS Region: GPU availability depends heavily on the AWS region, and it fluctuates over time as demand shifts. One rule of thumb is that accelerators are more readily available in US regions such as us-east-1, us-west-2 and us-east-2. You can also check which regions offer a given instance type programmatically, as shown below.
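As a minimal sketch, the EC2 DescribeInstanceTypeOfferings API reports whether a region (or Availability Zone) offers a given instance type. The candidate regions below are illustrative:

```python
import boto3

INSTANCE_TYPE = "p4d.24xlarge"
# Illustrative candidate regions; adjust to your compliance and latency needs.
CANDIDATE_REGIONS = ["us-east-1", "us-east-2", "us-west-2", "eu-west-1"]

for region in CANDIDATE_REGIONS:
    ec2 = boto3.client("ec2", region_name=region)
    resp = ec2.describe_instance_type_offerings(
        LocationType="availability-zone",
        Filters=[{"Name": "instance-type", "Values": [INSTANCE_TYPE]}],
    )
    azs = sorted(o["Location"] for o in resp["InstanceTypeOfferings"])
    print(f"{region}: {azs if azs else 'not offered'}")
```

Note that an offering means the instance type exists in that location, not that capacity is available at any given moment.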

Determining the appropriate capacity product

  • AWS Capacity Product: AWS provides a range of GPU capacity products tailored to different workload requirements, offering flexibility in duration, pricing, and management. Choosing the right capacity product involves aligning your workload needs with the capabilities of Amazon EC2 and Amazon SageMaker AI.

Selecting EC2 vs SageMaker

The choice depends on your desired level of control and automation:
  • EC2 offers raw compute power with full infrastructure control, ideal for teams that require extensive customization and are comfortable managing scaling, monitoring, and maintenance.
  • SageMaker is a fully managed service that simplifies machine learning workflows by automating infrastructure management, scaling, and maintenance.
Instance availability, pricing, and the process to secure resources vary between EC2 and SageMaker, so it's important to determine the desired capacity product for your workload when planning demand.
Note: Instance types under SageMaker have an ml prefix, for example, ml.p4d.24xlarge corresponds to SageMaker, while p4d.24xlarge (without prefix) corresponds to Amazon EC2.

Flexible access to premium GPUs: NVIDIA A100, H100, H200

NVIDIA A100, H100, and H200 GPUs are available on p4d, p5, and p5e/p5en instances, respectively.
If you require the premium GPUs and your demand is variable and short-term (less than a year), consider leveraging our flexible capacity services. These allow you to reserve GPUs for a specific period of time, similar to booking a hotel. They are suitable for use cases such as ML research and experiments, and model training and fine-tuning.
  • EC2 Capacity Blocks for ML
  • SageMaker HyperPod flexible training plans
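As a minimal sketch of the "booking" workflow with boto3, you can search for Capacity Block offerings in a date window and then purchase one. The instance type, count, dates, and region are illustrative; check the current EC2 API reference for full parameter details:

```python
import boto3
from datetime import datetime, timedelta, timezone

ec2 = boto3.client("ec2", region_name="us-east-2")

# Search for two p5 instances for a 7-day (168-hour) block in the next month.
now = datetime.now(timezone.utc)
offerings = ec2.describe_capacity_block_offerings(
    InstanceType="p5.48xlarge",   # illustrative
    InstanceCount=2,              # illustrative
    CapacityDurationHours=168,
    StartDateRange=now,
    EndDateRange=now + timedelta(days=30),
)

for o in offerings["CapacityBlockOfferings"]:
    print(o["CapacityBlockOfferingId"], o["StartDate"], o["UpfrontFee"])

# To reserve, purchase a specific offering (this incurs an upfront charge):
# ec2.purchase_capacity_block(
#     CapacityBlockOfferingId=offerings["CapacityBlockOfferings"][0]["CapacityBlockOfferingId"],
#     InstancePlatform="Linux/UNIX",
# )
```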

On-demand access to cost-effective GPUs: NVIDIA T4, A10G, L4, L40S

NVIDIA T4, A10G, L4, and L40S GPUs are available on g4dn, g5, g6 and g6e instances.
G series instances are available on Amazon EC2 on-demand, with the option of On-Demand Capacity Reservations (ODCR) to additionally ensure that you have access to EC2 capacity when you need it, for as long as you need it, starting either immediately or on a future date. See concepts for Amazon EC2 reservations.
G series instances are also available on Amazon SageMaker HyperPod on-demand.
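As a minimal boto3 sketch of creating an ODCR (the instance type, Availability Zone, and count are illustrative), keep in mind that an active reservation is billed whether or not instances are running in it:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Reserve capacity for two g5.48xlarge instances in a specific AZ,
# open-ended until you cancel it (EndDateType="unlimited").
reservation = ec2.create_capacity_reservation(
    InstanceType="g5.48xlarge",     # illustrative
    InstancePlatform="Linux/UNIX",
    AvailabilityZone="us-east-1a",  # illustrative
    InstanceCount=2,                # illustrative
    EndDateType="unlimited",
)

print(reservation["CapacityReservation"]["CapacityReservationId"])
```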

Additional options

  • For workloads that tolerate interruptions, P and G instances are available as Spot Instances.
  • For predictable long-term workloads, consider Instance Savings Plans (ISPs) with discounted pricing for 1-year and 3-year commitment options.
  • For further optimization, AWS Trainium and AWS Inferentia are purpose-built accelerators for training and inference. They require porting your workload to the AWS Neuron SDK, with the ability for low-level optimization via the Neuron Kernel Interface (NKI). Contact your AWS account team to explore joint engineering engagements for your workload.
  • Note that older generation instances such as P3 (NVIDIA V100) are no longer available, so consider newer generation instances for improved price-performance.

Execute prerequisites and service quotas before getting started

Before you can start using GPU resources, you must request the appropriate AWS service quotas. These vary depending on the AWS service you use, and typically apply per AWS region and per AWS account.
Each service quota has an identifier. For example, EC2 Running On-Demand G and VT instances corresponds to L-DB2E81BA. With the identifier, you can programmatically request and manage the status of a service quota increase via the CLI or Amazon Q Developer, in addition to the AWS Console.
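For example, here is a minimal boto3 sketch that reads the current value of the quota above (L-DB2E81BA, measured in vCPUs) and requests an increase. The desired value is illustrative:

```python
import boto3

sq = boto3.client("service-quotas", region_name="us-east-1")

# Check the current quota value (vCPUs across G and VT instance families).
quota = sq.get_service_quota(ServiceCode="ec2", QuotaCode="L-DB2E81BA")
print(quota["Quota"]["QuotaName"], "=", quota["Quota"]["Value"])

# Request an increase, e.g. to 192 vCPUs (enough for one g5.48xlarge).
resp = sq.request_service_quota_increase(
    ServiceCode="ec2", QuotaCode="L-DB2E81BA", DesiredValue=192.0
)
print("Request ID:", resp["RequestedQuota"]["Id"],
      "Status:", resp["RequestedQuota"]["Status"])
```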

Quotas for EC2 Capacity Blocks for ML

EC2 Capacity Blocks is self-service without any service quota increase requirements. However, see prerequisites.

Quotas for EC2 On-Demand and Spot

EC2 On-Demand and Spot quotas are measured in vCPUs rather than instances. For example, a p4d.24xlarge has 96 vCPUs, so running a single instance requires the Running On-Demand P instances quota to cover at least 96 vCPUs. Separate vCPU-based quotas govern Spot Instance requests for each instance family.

Quotas for SageMaker HyperPod Flexible Training Plans

Evaluate the following quotas before creating a plan with SageMaker HyperPod flexible training plans (a sketch for finding these quotas programmatically follows the lists below):
  1. Number of training plans per Region (search training-plan-total_count)
  2. Reserved-capacity quota for a specific instance (search reserved-capacity-ml-<instance-type>)
And depending on your needs:
  1. Cluster Usage quota for a specific instance (search for cluster usage)
  2. Training jobs quota for a specific instance (search for training job usage)
  3. Maximum size of EBS volume in GB for a SageMaker HyperPod cluster instance (recommended value is 2000 GB)
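Because these quotas are found by searching their names, a minimal boto3 sketch that lists SageMaker quotas and filters on a keyword (here training-plan, taken from the list above) can save time:

```python
import boto3

sq = boto3.client("service-quotas", region_name="us-east-1")

KEYWORD = "training-plan"  # e.g. matches training-plan-total_count from the list above

# Page through all SageMaker quotas and print the ones matching the keyword.
paginator = sq.get_paginator("list_service_quotas")
for page in paginator.paginate(ServiceCode="sagemaker"):
    for q in page["Quotas"]:
        if KEYWORD.lower() in q["QuotaName"].lower():
            print(q["QuotaCode"], q["QuotaName"], "=", q["Value"])
```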

Quotas for SageMaker HyperPod

Evaluate the following quotas before creating a cluster with SageMaker HyperPod:
  1. Maximum number of instances allowed per SageMaker HyperPod cluster*
  2. Total number of instances allowed across SageMaker HyperPod clusters*
  3. Cluster usage for specific instances (search for cluster usage)
  4. Cluster usage for the controller (head) node instances*
  5. Maximum size of EBS volume in GB for a SageMaker HyperPod cluster instance (recommended value is 2000 GB)
*HyperPod with Slurm requires additional controller (head) nodes. However, HyperPod on EKS does not require head nodes, as it relies on the Kubernetes control plane.

Quotas for SageMaker AI

For SageMaker AI training jobs and inference endpoints, request usage quotas for the specific ml instance type, for example ml.p4d.24xlarge for training job usage or ml.g5.12xlarge for endpoint usage.

Having trouble with service quotas?

Why is my service quota low? Or, why is my service quota increase request rejected?
Many customers need to get up and running quickly; however, newly created AWS accounts typically start with minimal quotas due to their limited usage and billing history. In fact, your account's actual quota value may be less than the AWS default quota value if the account was recently created or if you use it minimally. As a result, it's crucial to proactively request increases for limits that impact your workloads.
To accelerate quota requests, consider:
  • Engaging your AWS account team with a Case ID associated with the service quota increase request
  • Adding a Use Case Description in the case created by the service quota increase request

Conclusion

Obtaining GPU instances on AWS requires careful planning and an understanding of the available options. By developing a comprehensive demand plan, selecting the right instance types and regions, and utilizing the appropriate AWS capacity products, you can ensure you have the GPU resources needed for your AI and ML workloads.

Next Steps

Start creating your demand plan using the format below, and engage your AWS account team or contact sales support.
  • Use Case (e.g. distributed training)
  • GPU Accelerator (e.g. NVIDIA A100)
  • Memory/VRAM (e.g. 40GB)
  • Desired Price per Accelerator (i.e. based on your budget)
  • Start Date (i.e. based on your project timelines)
  • End Date (i.e. based on your project timelines)
  • Instance Type
  • GPU Accelerators per Instance
  • AWS Region
  • AWS Capacity Product
To secure optimal pricing and capacity, your AWS account team can also provide guidance on:
  1. Navigating service quotas and proof-of-concepts
  2. Selecting suitable AWS capacity products for your specific use cases
  3. Planning for future GPU needs based on AWS roadmap
  4. Negotiating long-term commitments
  5. Accessing early availability of new GPUs
     

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
