Run a PyTorch Model on AWS Inferentia2

Build a simple multilayer perceptron (MLP) model in PyTorch and run it on AWS Inferentia2

Tanner McRae
Amazon Employee
Published Jul 5, 2024
In this blog post, I’ll demonstrate how to deploy a simple multilayer perceptron (MLP) on AWS Inferentia2. We’ll start with a classic problem: predicting California housing prices. I’ll build a small neural network, train it on a CPU, and run inference on an Amazon EC2 inf2.xlarge instance.
Even though the model is small enough to run on a CPU, the purpose of this blog is to demonstrate how to run it on Inf2.


AWS Inferentia2 & Neuron: Inferentia2 is AWS’s purpose-built inference accelerator, available on Amazon EC2 Inf2 instances. To use these accelerators, AWS created a software development kit (SDK) called AWS Neuron. AWS Neuron includes a deep learning compiler, runtime, and tools that are natively integrated into TensorFlow, PyTorch, and Apache MXNet.
Setup: For these experiments, we’ll use PyTorch 2.1 with Python 3.10 on an inf2.xlarge instance running an Ubuntu 20 AMI.

A Simple Neural Network

To see how Neuron compilation and inference work, let’s build a simple neural network for California housing price prediction. You can access this dataset through Hugging Face, scikit-learn datasets, or other online sources.
Define the Neural Network:
The following neural network takes in our features and outputs a predicted house value.
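A minimal sketch of such a network, assuming 8 input features (as in the scikit-learn copy of the California housing dataset). The class name `HousingMLP` and the hidden layer sizes are illustrative assumptions, chosen only to land in the same parameter range as the model described in this post.

```python
import torch
from torch import nn


class HousingMLP(nn.Module):
    """Small MLP for regression. Layer sizes are illustrative assumptions."""

    def __init__(self, in_features: int = 8, hidden=(256, 128, 64)):
        super().__init__()
        layers = []
        prev = in_features
        for width in hidden:
            # Fully connected layer followed by a ReLU non-linearity.
            layers += [nn.Linear(prev, width), nn.ReLU()]
            prev = width
        # Single output unit: the predicted house value.
        layers.append(nn.Linear(prev, 1))
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


model = HousingMLP()
# Count trainable parameters (weights and biases of every linear layer).
n_params = sum(p.numel() for p in model.parameters())
```

With these assumed sizes the model has 43,521 parameters, which is in the same range as the roughly 50k-parameter model used here.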
Let’s also write a simple training script to train our model. Since this is such a small model (~50k parameters), we can train it on our dataset in under 30 seconds on a CPU.
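A training script along these lines could look like the sketch below. It assumes mean squared error loss and the Adam optimizer; the function name `train_regressor` and its default hyperparameters are hypothetical, and saving the `state_dict` (rather than the whole module) is also an assumption.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_regressor(model, X, y, epochs=20, batch_size=64, lr=1e-3,
                    checkpoint_path=None):
    """Train `model` on CPU and return the mean training loss per epoch.

    X has shape (n_samples, n_features); y has shape (n_samples, 1).
    """
    loss_fn = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    dataset = TensorDataset(X, y)
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

    history = []
    model.train()
    for _ in range(epochs):
        total = 0.0
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            optimizer.step()
            # Weight by batch size so the epoch mean is exact.
            total += loss.item() * xb.size(0)
        history.append(total / len(dataset))

    if checkpoint_path is not None:
        # Assumption: the checkpoint holds only the weights (state_dict).
        torch.save(model.state_dict(), checkpoint_path)
    return history
```

Passing `checkpoint_path="model.pt"` writes the trained weights to the file name used in this post.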
Train the model
Now let’s clean our dataset and kick off the training job.
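The exact cleaning steps depend on where the dataset comes from. As a hedged sketch, the hypothetical helper below drops rows with missing or non-finite values, holds out a test split, and standardizes the features using statistics from the training rows only. The function name, the 80/20 split, and the standardization step are all assumptions.

```python
import numpy as np
import torch


def clean_and_split(features, targets, test_fraction=0.2, seed=0):
    """Drop non-finite rows, split, standardize, and return float32 tensors."""
    X = np.asarray(features, dtype=np.float32)
    y = np.asarray(targets, dtype=np.float32).reshape(-1, 1)

    # Keep only rows where every feature and the target are finite.
    keep = np.isfinite(X).all(axis=1) & np.isfinite(y).all(axis=1)
    X, y = X[keep], y[keep]

    # Shuffle once, then carve off the held-out test rows.
    order = np.random.default_rng(seed).permutation(len(X))
    n_test = int(len(X) * test_fraction)
    test_idx, train_idx = order[:n_test], order[n_test:]

    # Standardize with training statistics to avoid leaking test data.
    mean = X[train_idx].mean(axis=0)
    std = X[train_idx].std(axis=0) + 1e-8
    X = (X - mean) / std

    to_t = torch.from_numpy
    return (to_t(X[train_idx]), to_t(y[train_idx]),
            to_t(X[test_idx]), to_t(y[test_idx]))
```

If scikit-learn is installed, `sklearn.datasets.fetch_california_housing(return_X_y=True)` returns a feature array and a target array that can be passed straight to this helper. The resulting training tensors can then be fed to whatever training function you use.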
After training, we have a checkpoint saved in the model.pt file. We’ll use it to compile the model using NeuronX for inference.
Save an Example
When compiling a model using Neuron, we need an example input to run a trace.
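One way to do this, sketched under the assumption that the example is a single preprocessed feature row stored with `torch.save`, is shown below. The helper name `save_example` and the file name `example.pt` are hypothetical.

```python
import torch


def save_example(features: torch.Tensor, path: str = "example.pt") -> torch.Tensor:
    """Save the first row of `features` as a (1, n_features) example input."""
    # Slicing with [:1] keeps the batch dimension the model expects.
    example = features[:1].clone()
    torch.save(example, path)
    return example
```

The traced model is compiled for the shape of this example, so the example should have the same shape and dtype as the inputs you plan to send at inference time.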

Install Neuron SDK & Dependencies

First, we install all the Neuron dependencies. You can use an existing deep learning AMI (DLAMI). I opted to use a vanilla Ubuntu AMI and install the dependencies myself.
Execute the following commands on the instance.
Next, we’ll install the torch-neuronx package.
And that’s it!
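A quick way to confirm the Python side of the install is to check that the packages can be found by the interpreter. The sketch below uses only the standard library. The module names `torch_neuronx` and `neuronxcc` are what I believe the torch-neuronx and neuronx-cc packages install, so treat them as assumptions and adjust if your environment differs.

```python
import importlib.util


def missing_packages(names=("torch", "torch_neuronx", "neuronxcc")):
    """Return the names from `names` that cannot be found for import."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All Neuron Python packages were found.")
```

This only checks that the packages are installed, not that the Neuron driver or device is working.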

Compile the Model

The PyTorch-Neuron trace() API (torch_neuronx.trace) compiles a PyTorch model for execution on Inferentia2 and returns a module that can be serialized as TorchScript.
Note: This function is analogous to torch.jit.trace().
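A minimal sketch of the compile step, assuming a trained model and an example input tensor are already in memory. The wrapper name `compile_for_neuron` and the output file name `model_neuron.pt` are hypothetical. The `torch_neuronx` import is placed inside the function because the package is only available on machines with the Neuron SDK installed.

```python
import torch


def compile_for_neuron(model, example, out_path="model_neuron.pt"):
    """Trace `model` with the Neuron compiler and save it as TorchScript."""
    # Imported lazily: torch_neuronx exists only where the Neuron SDK is installed.
    import torch_neuronx

    model.eval()  # disable training-only behavior before tracing
    traced = torch_neuronx.trace(model, example)
    # The traced module is TorchScript, so torch.jit.save can serialize it.
    torch.jit.save(traced, out_path)
    return traced
```

Before calling it, the weights from the model.pt checkpoint need to be loaded into a model instance. If the checkpoint is a `state_dict`, that is `model.load_state_dict(torch.load("model.pt"))`.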

Make a Prediction
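A hedged sketch of the inference step: load the compiled TorchScript file and call it on an input of the same shape used for tracing. The function name `predict` is hypothetical. To my understanding, `torch_neuronx` must be imported before loading a Neuron-compiled model so its custom operators are registered; the import is wrapped in a try/except here so the same function can also load plain TorchScript on a machine without the SDK.

```python
import torch

try:
    # Needed on Inf2 so the Neuron operators are registered before loading.
    import torch_neuronx  # noqa: F401
except ImportError:
    # Without the SDK, only ordinary (CPU) TorchScript files can be loaded.
    torch_neuronx = None


def predict(model_path, example):
    """Load a saved TorchScript model and run one forward pass."""
    model = torch.jit.load(model_path)
    with torch.no_grad():
        return model(example)
```

The output is a tensor with one predicted value per input row. If the target was scaled during training, the same scaling has to be inverted to read it as a house value.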


In this blog post, we trained a basic PyTorch model and used the Neuron SDK to run inference on an Amazon EC2 inf2.xlarge instance. This is a simple example, but the same steps can be extended to run larger and more complex models on Inferentia2.

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
