
Getting started with GPU infrastructure on AWS: Practical tips for obtaining GPU instances
With overwhelming demand for AI, access to GPU accelerators has become critical. In this post, you will learn a simple 3-step process to effectively obtain and leverage GPU resources on AWS.
Contents:
- Introduction: Getting started with GPUs
- Determine optimal AWS requirements across capacity products, instance types, and Region
- Determining Region availability
- Determining the appropriate capacity product
- Flexible access to premium GPUs: NVIDIA A100, H100, H200
- SageMaker HyperPod flexible training plans
- On-demand access to cost-effective GPUs: NVIDIA T4, A10G, L4, L40S
- Execute prerequisites and service quotas before getting started
- Quotas for EC2 Capacity Blocks for ML
- Quotas for EC2 On-Demand and Spot
- Quotas for SageMaker HyperPod flexible training plans
The process consists of three steps:
1. Create your strategic demand plan.
2. Determine the optimal AWS requirements, including capacity product, GPU instance types, and Region, for your workloads.
3. Complete the appropriate service quota increases and prerequisites before you get started.
Your strategic demand plan should capture the following for each workload:
- Use Case (e.g. distributed training)
- GPU Accelerator (e.g. NVIDIA A100)
- Memory/VRAM (e.g. 40 GB)
- Desired Price per Accelerator (based on your budget)
- Start Date (based on your project timelines)
- End Date (based on your project timelines)
- Instance Type: For example, `p4d.24xlarge` corresponds to NVIDIA A100 GPUs. See Accelerated Computing Instance Types.
- GPU Accelerators per Instance: Find this under the "GPU" column on Accelerated Computing Instance Types, or look it up programmatically as shown in the sketch below. For example:
  - P4d (NVIDIA A100), P5 (NVIDIA H100), and P5e/P5en (NVIDIA H200) instances have 8 GPUs each.
  - G4 (NVIDIA T4), G5 (NVIDIA A10G), G6 (NVIDIA L4), and G6e (NVIDIA L40S) instances have 1, 4, or 8 GPUs depending on the instance size: the `12xlarge` and `24xlarge` sizes have 4 GPUs per instance, while the `48xlarge` size has 8 GPUs per instance.
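If you prefer to verify accelerator counts and VRAM programmatically, here is a minimal sketch using boto3 and the EC2 DescribeInstanceTypes API; the instance types queried are illustrative.

```python
import boto3

# Sketch: query GPU details for candidate instance types via the
# EC2 DescribeInstanceTypes API (instance types here are illustrative).
ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.describe_instance_types(
    InstanceTypes=["p4d.24xlarge", "g5.12xlarge", "g6e.48xlarge"]
)

for it in response["InstanceTypes"]:
    for gpu in it.get("GpuInfo", {}).get("Gpus", []):
        print(
            f'{it["InstanceType"]}: {gpu["Count"]}x '
            f'{gpu["Manufacturer"]} {gpu["Name"]}, '
            f'{gpu["MemoryInfo"]["SizeInMiB"] // 1024} GiB VRAM each'
        )
```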
- AWS Region: GPU availability depends heavily on the AWS Region, and demand fluctuates over time. One rule of thumb is that accelerators are more available in US Regions such as `us-east-1`, `us-west-2`, and `us-east-2` (see the sketch below).
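To check where an instance type is offered, here is a minimal sketch using the EC2 DescribeInstanceTypeOfferings API (the target instance type is illustrative). Note that an offering means the type exists in a Region; it does not guarantee that capacity is available right now.

```python
import boto3

# Sketch: find the Regions that offer a given instance type.
# An offering means the type exists there, not that capacity
# is currently available.
TARGET = "p4d.24xlarge"  # illustrative

regions = [
    r["RegionName"]
    for r in boto3.client("ec2", region_name="us-east-1")
    .describe_regions()["Regions"]
]

for region in regions:
    ec2 = boto3.client("ec2", region_name=region)
    offerings = ec2.describe_instance_type_offerings(
        LocationType="region",
        Filters=[{"Name": "instance-type", "Values": [TARGET]}],
    )["InstanceTypeOfferings"]
    if offerings:
        print(f"{TARGET} is offered in {region}")
```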
- AWS Capacity Product: AWS provides a range of GPU capacity products tailored to different workload requirements, offering flexibility in duration, pricing, and management. Choosing the right capacity product involves aligning your workload needs with the capabilities of Amazon EC2 and Amazon SageMaker AI.
- EC2 offers raw compute power with full infrastructure control, ideal for teams that require extensive customization and are comfortable managing scaling, monitoring, and maintenance.
- SageMaker is a fully managed service that simplifies machine learning workflows by automating infrastructure management, scaling, and maintenance.
Note: Instance types under SageMaker have an `ml` prefix; for example, `ml.p4d.24xlarge` corresponds to SageMaker, while `p4d.24xlarge` (without the prefix) corresponds to Amazon EC2.
NVIDIA A100, H100, and H200 GPUs are available on P4d, P5, and P5e/P5en instances, respectively.
- EC2 Capacity Blocks for ML provides an up-front quote based on the instance type and period you select, and you can extend reservations (see the Capacity Blocks sketch after this list).
- Flexible training plans (generally available since December 2024) apply a similar concept to capacity blocks for SageMaker HyperPod, which supports managed multi-node distributed training with Kubernetes via Amazon Elastic Kubernetes Service (Amazon EKS) or with Slurm.
- To find availability, use the AWS Management Console, the SearchTrainingPlanOfferings API, or the CLI (see the training plan sketch after this list).
- For workloads that tolerate interruptions, P and G instances are available as Spot Instances (see the Spot sketch after this list).
- For predictable long-term workloads, consider Instance Savings Plans (ISPs) with discounted pricing for 1-year and 3-year commitment options.
- For further optimization, AWS Trainium and AWS Inferentia are purpose-built accelerators for training and inference. They require porting your workload to the AWS Neuron SDK, with the option of low-level optimization through the Neuron Kernel Interface (NKI). Contact your AWS account team to explore joint engineering engagements for your workload.
- Note that older generation instances such as P3 (NVIDIA V100) are no longer available, so consider newer generation instances for improved price-performance.
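As a sketch of the Capacity Blocks flow, the snippet below requests up-front quotes with the EC2 DescribeCapacityBlockOfferings API; the instance type, count, and duration are illustrative and should match your demand plan.

```python
import boto3

# Sketch: request up-front quotes for an EC2 Capacity Block for ML.
# Instance type, count, and duration are illustrative values.
ec2 = boto3.client("ec2", region_name="us-east-1")

offerings = ec2.describe_capacity_block_offerings(
    InstanceType="p5.48xlarge",  # NVIDIA H100 instances
    InstanceCount=2,
    CapacityDurationHours=48,    # a 2-day block
)["CapacityBlockOfferings"]

for offer in offerings:
    print(
        f'{offer["InstanceCount"]}x {offer["InstanceType"]} '
        f'from {offer["StartDate"]} to {offer["EndDate"]}: '
        f'{offer["UpfrontFee"]} {offer["CurrencyCode"]} up front'
    )
```

To purchase a block, pass the chosen CapacityBlockOfferingId to the PurchaseCapacityBlock API.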
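For training plans, here is a minimal sketch of the API route with boto3, assuming the SageMaker search_training_plan_offerings call; the parameter values are illustrative.

```python
import boto3

# Sketch: search for SageMaker flexible training plan offerings.
sm = boto3.client("sagemaker", region_name="us-east-1")

response = sm.search_training_plan_offerings(
    InstanceType="ml.p5.48xlarge",
    InstanceCount=4,
    DurationHours=168,                     # one week
    TargetResources=["hyperpod-cluster"],  # or ["training-job"]
)

# Inspect the returned offerings before creating a training plan.
for offering in response["TrainingPlanOfferings"]:
    print(offering)
```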
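For the Spot route, the launch is a standard RunInstances call with InstanceMarketOptions set; the AMI ID below is a placeholder.

```python
import boto3

# Sketch: launch a G5 instance as a Spot Instance for
# interruption-tolerant work.
ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder: use a Deep Learning AMI
    InstanceType="g5.xlarge",         # 1x NVIDIA A10G
    MinCount=1,
    MaxCount=1,
    InstanceMarketOptions={
        "MarketType": "spot",
        "SpotOptions": {
            "SpotInstanceType": "one-time",
            "InstanceInterruptionBehavior": "terminate",
        },
    },
)
print(response["Instances"][0]["InstanceId"])
```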
For SageMaker HyperPod flexible training plans, check the following quotas in the Service Quotas console:
- Number of training plans per Region (search for `training-plan-total_count`)
- Reserved capacity quota for a specific instance type (search for `reserved-capacity-ml-<instance-type>`)
- Cluster usage quota for a specific instance type (search for "cluster usage")
- Training jobs quota for a specific instance type (search for "training job usage")
- Maximum size of EBS volume in GB for a SageMaker HyperPod cluster instance (the recommended value is 2000 GB)
For SageMaker HyperPod clusters, also check:
- Maximum number of instances allowed per SageMaker HyperPod cluster*
- Total number of instances allowed across SageMaker HyperPod clusters*
- Cluster usage for specific instance types (search for "cluster usage")
- Cluster usage for the controller (head) node instances*
- Maximum size of EBS volume in GB for a SageMaker HyperPod cluster instance (the recommended value is 2000 GB)
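Quota discovery and increase requests can also be scripted. Here is a minimal sketch using the Service Quotas API; the quota-name filter and quota code below are illustrative placeholders.

```python
import boto3

# Sketch: discover SageMaker quota codes by name, then request an
# increase. The name filter and quota code are illustrative.
sq = boto3.client("service-quotas", region_name="us-east-1")

paginator = sq.get_paginator("list_service_quotas")
for page in paginator.paginate(ServiceCode="sagemaker"):
    for quota in page["Quotas"]:
        if "cluster usage" in quota["QuotaName"]:
            print(quota["QuotaCode"], quota["QuotaName"], quota["Value"])

# With the QuotaCode in hand, submit the increase request (this opens
# a support case, which you can then reference with your account team):
# sq.request_service_quota_increase(
#     ServiceCode="sagemaker",
#     QuotaCode="L-XXXXXXXX",  # placeholder quota code
#     DesiredValue=8,
# )
```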
To help expedite a service quota increase request, consider:
- Engaging your AWS account team with the Case ID associated with the service quota increase request
- Adding a Use Case Description to the case created by the service quota increase request
To recap, your complete demand plan should capture:
- Use Case (e.g. distributed training)
- GPU Accelerator (e.g. NVIDIA A100)
- Memory/VRAM (e.g. 40 GB)
- Desired Price per Accelerator (based on your budget)
- Start Date (based on your project timelines)
- End Date (based on your project timelines)
- Instance Type
- GPU Accelerators per Instance
- AWS Region
- AWS Capacity Product
Finally, your AWS account team can help with:
- Navigating service quotas and proofs of concept
- Selecting suitable AWS capacity products for your specific use cases
- Planning for future GPU needs based on the AWS roadmap
- Negotiating long-term commitments
- Accessing early availability of new GPUs
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.