
Cost Effectively Run Simulation Workloads on AWS
Optimize simulation workloads on AWS using strategic GPU selection and spot pricing techniques.
Kareem Abdol-Hamid
Amazon Employee
Published May 13, 2025
Last Modified May 22, 2025
document prepared by Aaron Klotnia and Kareem Abdol-Hamid from the Startups Specialist Organization
Over the past few years, GPU demand has been driven primarily by large machine learning workloads, which favor high memory bandwidth, large memory capacity, and high compute. However, not all GPUs are created equal, and not all workloads require the same hardware. Simulation workloads are compute bound: memory is less crucial to a successful, scalable workload, and clock speed is key. This matters for many industries, including robotics, life sciences, and even traditional ML workloads like reinforcement learning.
Teams experimenting with new workloads tend to reach for GeForce GPUs because of their higher clock speeds and relatively low total cost of ownership. However, these have limitations as you begin to scale: they can be difficult to source, you have to rack and stack your own machines, and when you need to scale for bursty workloads you can't fall back on the cloud, because GeForce cards aren't built for data centers.
If you're running workloads like these, you're likely weighing how to right-size them, along with the best price-performance and orchestration options in the cloud. Fortunately, a fairly traditional price-performance approach can be applied to these use cases.
Before we start, let’s discuss what GPUs are available on AWS.
The following figures are current as of April 2025; always double-check the official documentation for the most up-to-date information.

AWS provides a wide range of GPUs; however, for simulation workloads you typically see heavy utilization of desktop GPUs like the GeForce RTX 4080 and 5090. For those unfamiliar: foundation model training gains more from specialized tensor units than from faster clock speeds, whereas simulation workloads, which are latency-sensitive and less parallelizable, benefit immediately from every extra MHz. Also, most simulation workloads don't need more than 32 GB of GPU memory.
The following table provides clock speeds for data center GPUs and those commonly used for simulation workloads.

Note: in practice, p* instances are not recommended for these types of simulation workloads, so we provide Capacity Block pricing in the tables above.
Although NVIDIA's frontier GPUs draw a lot of focus despite their lower clock speeds, AWS does provide the L4 and L40S, which have clock speeds comparable to the desktop GPUs that are ideal for simulation workloads. The GeForce GPUs are great for prototyping, but as you scale your workloads and require resiliency and longer-running jobs, the cloud becomes a better fit.
So how do you leverage the cloud to utilize GPUs while still keeping your workloads cost effective? How do you traditionally leverage the cloud if your workloads are flexible, can run during off hours, and don't need to run at specific times? The answer is Spot pricing.
To take advantage of Spot pricing, you need to use Spot Instances. Spot Instances are spare EC2 capacity that can save you up to 90% off On-Demand prices, with the caveat that AWS can interrupt them with a 2-minute notification. When you use Spot Instances, you are taking advantage of Spot pricing.
Unlike static On-Demand EC2 instance pricing, Spot pricing is dynamic and based on long-term supply and demand trends for each Spot capacity pool. Spot capacity pools are combinations of EC2 instance family, size, and Availability Zone. For example, the current Spot price you pay for a g5.xlarge running in az-1 in US-East-1 is different from the Spot price you pay for a g5.xlarge running in az-2 in US-East-2. This raises the question of how you can implement logic to ensure that you are running Spot Instances from the lowest-cost Spot capacity pools.
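To see what a given Spot capacity pool currently costs, you can query the Spot price history with the AWS CLI. The sketch below is only illustrative; the instance type and Availability Zone are examples, not recommendations:

```bash
# Fetch the most recent Spot price for g5.xlarge Linux instances in a single AZ
aws ec2 describe-spot-price-history \
  --instance-types g5.xlarge \
  --product-descriptions "Linux/UNIX" \
  --availability-zone us-east-1a \
  --max-items 1 \
  --query "SpotPriceHistory[*].[AvailabilityZone,InstanceType,SpotPrice,Timestamp]" \
  --output table
```

Running the same query against different Availability Zones or instance sizes shows how prices differ across Spot capacity pools.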
However, price isn't the only part of the equation. Another important thing to understand when using spot instances is the fact that they can be interrupted. Therefore, when using spot instances the question often becomes “How can I run spot instances for as low of a spot price as possible WHILE minimizing the likelihood of spot interruption?” The answer is by being as diversified as possible AND using the appropriate allocation strategy.
When using Spot Instances, being able to provision instances from a greater number of Spot capacity pools increases the likelihood that your workload can be completed with minimal interruption. For example, if you test your workload on a g5.xlarge and want to exclusively use g5.xlarge instances across the six different Availability Zones in US-East-1, you are able to provision instances from six different Spot capacity pools. You can calculate this number by multiplying the number of instance types and Availability Zones your workload is able to use: 6 AZs * 1 instance type = 6 Spot capacity pools. Six Spot capacity pools is a good start, but in an ideal situation you want to be able to use as many Spot capacity pools as possible. To increase that number, we use diversification: being flexible and using as many instance types and Availability Zones as possible that can support our workload. There are a number of different types of flexibility we can use, including the following (see the example command after this list):
- Size: If we update our configuration to provision g5.2xlarges and g5.4xlarges in addition to g5.xlarges, we are now able to take advantage of 18 different Spot capacity pools (3 instance types * 6 AZs = 18 Spot capacity pools).
- Instance Family: If we also provision g6.xlarges, g6.2xlarges, and g6.4xlarges, we are now able to take advantage of 36 Spot capacity pools (6 instance types * 6 AZs = 36 Spot capacity pools).
- Region: If we are able to run our workload in either US-East-1 (6 AZs) or US-East-2 (3 AZs) with our current config, we are now able to take advantage of a total of 54 Spot capacity pools (6 instance types * 9 AZs = 54 Spot capacity pools).
- Time: Spot capacity is inherently excess EC2 capacity. By running your workload at night or over the weekend, you can reduce the likelihood of interruption.
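One convenient way to gauge how well a diversified configuration is likely to fare, before you launch anything, is the EC2 Spot placement score feature. The following is a sketch; the instance types, target capacity, and Regions are illustrative and should match your own configuration:

```bash
# Score the likelihood that a diversified Spot request of this size will succeed
aws ec2 get-spot-placement-scores \
  --instance-types g5.xlarge g5.2xlarge g5.4xlarge g6.xlarge g6.2xlarge g6.4xlarge \
  --target-capacity 10 \
  --region-names us-east-1 us-east-2 \
  --output table
```

Higher scores indicate that the requested capacity is more likely to be fulfilled with fewer interruptions.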
Now that we have 54 Spot capacity pools, how do we figure out which of those pools we want to provision an instance from? The answer is allocation strategy. The allocation strategies you can use depend on the service your workload is using. In the case of ParallelCluster, you can take advantage of the following three allocation strategies (lowest-price is the default):
- lowest-price: Spot Instances will be provisioned from the Spot capacity pool with the lowest current Spot price. We don't typically recommend this strategy because, while it provides the lowest possible price, it doesn't take into account the likelihood that the instance will be interrupted.
- capacity-optimized: Spot Instances will be provisioned from the Spot capacity pool with the most excess Spot capacity.
- price-capacity-optimized: Spot Instances will be provisioned from the Spot capacity pool with a combination of the lowest price and the most excess Spot capacity. This is the strategy we typically recommend since it takes both price and availability into account, whereas the other two strategies consider only one or the other.

Further information on available Spot allocation strategies can be found in the AWS Batch and AWS ParallelCluster documentation.
The above serves as an introduction to best practices for leveraging GPU-based Spot Instances. For more Spot best practices, see the official AWS documentation: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-best-practices.html.
There are a few ways to orchestrate your compute workloads on AWS; work with your platform and AWS support teams to find the architecture that works best for you. Typically you will decide between what's familiar and what works well in a cloud-native approach. We'll cover two common architectures, but we encourage you to look at other resources and guides as well.
We’ll start with a more “traditional” orchestration layer utilizing Slurm and ParallelCluster.

In the above architecture diagram, a request is routed through the internet gateway and the router to the head node. The head node then responds to the request, orchestrates the scaling logic of the compute fleet, and is responsible for attaching new nodes to the cluster. To configure ParallelCluster to use GPU-based Spot Instances, you need to include them in the Scheduling section of your cluster configuration file, which drives Slurm. An example configuration that includes both the GPU-based EC2 instances that support your architecture AND deploys them via Spot is included below:
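The exact file depends on your cluster, but a minimal sketch of the Scheduling section of a ParallelCluster 3 configuration matching the description below might look like this; the queue name, compute resource names, instance counts, and subnet ID are placeholders:

```yaml
# Sketch of the Scheduling section of a ParallelCluster 3 cluster configuration.
# Queue/compute resource names, counts, and the subnet ID are placeholders.
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: gpu-spot
      CapacityType: SPOT                      # use the Spot purchase option
      AllocationStrategy: capacity-optimized  # favor pools with the most spare capacity
      Networking:
        SubnetIds:
          - subnet-0123456789abcdef0          # placeholder subnet
      ComputeResources:
        - Name: g6-xlarge
          Instances:
            - InstanceType: g6.xlarge
          MinCount: 0
          MaxCount: 16
        - Name: g6-2xlarge
          Instances:
            - InstanceType: g6.2xlarge
          MinCount: 0
          MaxCount: 16
        - Name: g6e-xlarge
          Instances:
            - InstanceType: g6e.xlarge
          MinCount: 0
          MaxCount: 16
        - Name: g6e-2xlarge
          Instances:
            - InstanceType: g6e.2xlarge
          MinCount: 0
          MaxCount: 16
```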
In the case of the above configuration file, we are configuring our compute queue to deploy g6.xlarge, g6.2xlarge, g6e.xlarge, and g6e.2xlarge EC2 instances with the Spot purchase option and the capacity-optimized allocation strategy. By using configurations like this, we can allow our ParallelCluster environment to leverage GPU-based Spot Instances.
Further information about possible values in the Scheduling section of the ParallelCluster configuration file can be found here: https://docs.aws.amazon.com/ParallelCluster/latest/ug/Scheduling-v3.html.
Next we'll take a "Cloud Native" approach; that is, we utilize natively integrated cloud services for ease of use and reduced operational overhead.

In the above architecture diagram:
- User creates a job container, uploads the container image to Amazon Elastic Container Registry (ECR) or another container registry (for example, DockerHub), and registers a job definition in AWS Batch.
- User submits jobs to a job queue in AWS Batch.
- AWS Batch pulls the image from the container registry and processes the jobs in the queue with GPU-based EC2 Spot Instances. The specific instance types used are based on your configuration, an example of which is included below.
- Input and output data from each job is stored in an S3 bucket.
An example of a CloudFormation template that includes both the GPU-based EC2 instances that support your architecture AND deploys them via Spot is included below:
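The full template will vary with your networking and IAM setup, but a minimal sketch of the relevant AWS Batch resources might look like the following; the subnet, security group, and instance profile parameters are placeholders you supply for your own environment:

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: Sketch of an AWS Batch compute environment and job queue backed by GPU Spot instances

Parameters:
  Subnets:
    Type: List<AWS::EC2::Subnet::Id>      # placeholder: your VPC subnets
  SecurityGroupId:
    Type: AWS::EC2::SecurityGroup::Id     # placeholder: your security group
  EcsInstanceProfileArn:
    Type: String                          # placeholder: ECS instance profile ARN

Resources:
  GpuSpotComputeEnvironment:
    Type: AWS::Batch::ComputeEnvironment
    Properties:
      Type: MANAGED
      State: ENABLED
      ComputeResources:
        Type: SPOT
        AllocationStrategy: SPOT_PRICE_CAPACITY_OPTIMIZED  # balance price and interruption risk
        MinvCpus: 0
        MaxvCpus: 256
        InstanceTypes:                    # diversify across GPU instance families
          - g6
          - g6e
          - g5
        Subnets: !Ref Subnets
        SecurityGroupIds:
          - !Ref SecurityGroupId
        InstanceRole: !Ref EcsInstanceProfileArn

  GpuSpotJobQueue:
    Type: AWS::Batch::JobQueue
    Properties:
      JobQueueName: gpu-spot-queue
      Priority: 1
      State: ENABLED
      ComputeEnvironmentOrder:
        - Order: 1
          ComputeEnvironment: !Ref GpuSpotComputeEnvironment
```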
Deploy:
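Assuming the sketch above is saved as batch-gpu-spot.yaml (a placeholder filename), a deployment could look like this, with your own subnet, security group, and instance profile values substituted in:

```bash
aws cloudformation deploy \
  --template-file batch-gpu-spot.yaml \
  --stack-name gpu-spot-batch \
  --parameter-overrides \
      Subnets=subnet-0123456789abcdef0 \
      SecurityGroupId=sg-0123456789abcdef0 \
      EcsInstanceProfileArn=arn:aws:iam::111122223333:instance-profile/ecsInstanceRole
```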
Submit a Test Job:
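As a simple smoke test, you can register a minimal job definition that requests one GPU and submit it to the queue created above; the container image and names here are placeholders:

```bash
# Register a minimal GPU job definition (image and name are placeholders)
aws batch register-job-definition \
  --job-definition-name gpu-smoke-test \
  --type container \
  --container-properties '{
      "image": "nvidia/cuda:12.4.1-base-ubuntu22.04",
      "command": ["nvidia-smi"],
      "resourceRequirements": [
        {"type": "VCPU",   "value": "4"},
        {"type": "MEMORY", "value": "16384"},
        {"type": "GPU",    "value": "1"}
      ]
    }'

# Submit a test job to the Spot-backed queue
aws batch submit-job \
  --job-name gpu-smoke-test-1 \
  --job-queue gpu-spot-queue \
  --job-definition gpu-smoke-test
```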
The queue will launch whichever Spot GPU size (g6*, g6e*, g5*) is cheapest and available, courtesy of SPOT_PRICE_CAPACITY_OPTIMIZED. Just like that, we've created production-sized clusters utilizing the cost-effective nature of Spot Instances!
You should now be armed with the foundational resources and content to try out your own workloads on AWS. Experiment to find which individual GPU gives the best price/performance value with Spot for your workloads. Then scale up to leverage Spot at scale with your orchestration tooling of choice and enable resilient, cost-effective, scalable workloads.
Happy hunting!
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.