Run a PyTorch Model on AWS Inferentia2

Build a simple multilayer perceptron (MLP) model in PyTorch and run it on AWS Inferentia2

Tanner McRae
Amazon Employee
Published Jul 5, 2024
Last Modified Jul 23, 2024
In this blog post, I'll demonstrate how to deploy a simple multilayer perceptron (MLP) on AWS Inferentia2. We'll start with a classic problem: predicting California housing prices. I'll build a simple neural network and deploy it on an Amazon EC2 inf2.xlarge instance.
Even though the model is small enough to run on a CPU, the purpose of this post is to demonstrate how to run it on Inf2.

Background

AWS Inferentia2 & Neuron: AWS Inferentia2 is a purpose-built machine learning accelerator. To use these accelerators, AWS created a software development kit (SDK) called AWS Neuron, which includes a deep learning compiler, runtime, and tools that are natively integrated into TensorFlow, PyTorch, and Apache MXNet.
Setup: For these experiments, we'll use PyTorch 2.1 with Python 3.10 on an inf2.xlarge instance running an Ubuntu 20 AMI.

A Simple Neural Network

To see how Neuron compilation and inference work, let’s build a simple neural network for California housing price prediction. You can access this dataset through Hugging Face, scikit-learn datasets, or other online sources.
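If you just want to poke at the data before training, scikit-learn can fetch it directly. A quick sketch (the training code below pulls the same data from Hugging Face instead):

from sklearn.datasets import fetch_california_housing

# Download the dataset as a pandas DataFrame
housing = fetch_california_housing(as_frame=True)
print(housing.frame.head())   # 8 features plus the MedHouseVal target
print(housing.frame.shape)    # (20640, 9)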
Define the Neural Network:
The following neural network takes in our features and outputs a predicted house value.
import torch.nn as nn

class HousingPriceModel(nn.Module):
    def __init__(self, input_size):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_size, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(64, 1)  # predict a single value
        )

    def forward(self, x):
        return self.layers(x)
Let’s also write a simple training script to train our model. Since this is such a small model (~50k parameters), we can train it in < 30 seconds on our dataset with a CPU.
import numpy as np
import torch
from torch.optim import Adam

def train_model(model, train_loader, val_loader, epochs=10, batch_size=16, lr=1e-3):
    loss_fn = nn.MSELoss()
    optimizer = Adam(model.parameters(), lr=lr)

    for epoch in range(epochs):
        train_loss, val_loss = 0.0, 0.0

        # Train
        model.train()
        for inputs, target in train_loader:
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = loss_fn(outputs, target.unsqueeze(1))
            loss.backward()
            optimizer.step()
            train_loss += loss.item()

        # Validation
        model.eval()
        with torch.no_grad():
            for inputs, target in val_loader:
                outputs = model(inputs)
                loss = loss_fn(outputs, target.unsqueeze(1))
                val_loss += loss.item()

        # Calculate metrics
        rmse_train_loss = np.sqrt(train_loss / len(train_loader))
        rmse_val_loss = np.sqrt(val_loss / len(val_loader))

        # Print metrics
        print(f"Epoch [{epoch+1}/{epochs}], RMSE: {rmse_train_loss:.4f}")
        print(f"Epoch [{epoch+1}/{epochs}], Validation RMSE: {rmse_val_loss:.4f}")

    # Save our checkpoint
    checkpoint = {'state_dict': model.state_dict()}
    torch.save(checkpoint, 'model.pt')
Train the model
Now let's clean our dataset and kick off the training job.
import pandas as pd
from datasets import load_dataset
from sklearn.model_selection import train_test_split
from torch.utils.data import DataLoader

# Download dataset and put it in a DataFrame
data = load_dataset("leostelon/california-housing")['train']
df = pd.DataFrame(data)

# Split the DataFrame into training and validation
train_df, val_df = train_test_split(df, test_size=0.2, random_state=42)

# Clean the dataset (see the sketch of clean_dataset below)
train_dataset = clean_dataset(train_df)
val_dataset = clean_dataset(val_df)

# Create data loaders
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=16)

# Get input size
input_size = train_dataset.X.shape[1]

# Get the model
model = HousingPriceModel(input_size)

# Train the model
train_model(model, train_loader, val_loader)
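The clean_dataset helper isn't shown in the original post, so here is a minimal sketch of what it might look like: drop missing values, encode the features, scale the target down by 10,000 (the inference code at the end multiplies by 10,000 to undo this), and wrap everything in a small Dataset that exposes the feature matrix as .X, matching how the code above uses it. The column name and preprocessing choices are assumptions.

import torch
from torch.utils.data import Dataset

class HousingDataset(Dataset):
    """Hypothetical helper: wraps feature/target tensors and exposes .X."""
    def __init__(self, X, y):
        self.X = X
        self.y = y

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

def clean_dataset(df):
    # Assumed target column name; adjust to match the actual DataFrame
    target_col = 'median_house_value'
    df = df.dropna()

    # One-hot encode any categorical columns (assumption)
    features = pd.get_dummies(df.drop(columns=[target_col]))

    X = torch.tensor(features.values.astype('float32'))
    # Scale the target down; invoke() multiplies by 10,000 to undo this
    y = torch.tensor(df[target_col].values.astype('float32')) / 10000.0

    # Standardize features so training converges quickly
    X = (X - X.mean(dim=0)) / X.std(dim=0)
    return HousingDataset(X, y)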
After training, we have a checkpoint saved in the model.pt file. We’ll use it to compile the model using NeuronX for inference.
Save an Example
When compiling a model with Neuron, we need an example input to run a trace.
# Extract a single example
for batch in train_loader:
    # Get the inputs and ignore the targets for now
    example_input, _ = batch
    # Select the first example and keep the batch dimension
    example_input = example_input[0:1]
    # We only need the first batch, so break the loop
    break

# Save the example
torch.save(example_input, 'example_input.pt')

Install Neuron SDK & Dependencies

First, we install all the Neuron dependencies. You can use an existing deep learning AMI (DLAMI). I opted to use a vanilla Ubuntu AMI and install the dependencies myself.
Execute the following commands on the instance.
# Configure Linux for Neuron repository updates
. /etc/os-release
sudo tee /etc/apt/sources.list.d/neuron.list > /dev/null <<EOF
deb https://apt.repos.neuron.amazonaws.com ${VERSION_CODENAME} main
EOF

wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | sudo apt-key add -

# Update OS packages
sudo apt-get update -y

# Install OS headers
sudo apt-get install linux-headers-$(uname -r) -y

# Install git
sudo apt-get install git -y

# Install Neuron Driver
sudo apt-get install aws-neuronx-dkms=2.* -y

# Install Neuron Runtime
sudo apt-get install aws-neuronx-collectives=2.* -y
sudo apt-get install aws-neuronx-runtime-lib=2.* -y

# Install Neuron Tools
sudo apt-get install aws-neuronx-tools=2.* -y

# Add PATH
export PATH=/opt/aws/neuron/bin:$PATH
Next, we’ll install the torch-neuronx package.
# Set pip repository pointing to the Neuron repository
python -m pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com

# Install Neuron Compiler and Framework
python -m pip install neuronx-cc==2.* --pre torch-neuronx==2.1.* torchvision
And that’s it!
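To confirm the driver and tools can see the accelerator, list the Neuron devices (neuron-ls ships with aws-neuronx-tools; on an inf2.xlarge you should see a single Inferentia2 device with two NeuronCores):

# List the visible Neuron devices and their NeuronCores
neuron-ls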

Compile the Model

The PyTorch Neuron trace() API compiles a PyTorch model for execution on Inferentia2 and returns a module that can be serialized as TorchScript.
Note: This function is analogous to torch.jit.trace().
import torch
import torch_neuronx

# Use the model definition from the training job.
# If you've been following the tutorial, the input size is 11.
model = HousingPriceModel(11)

# Load the checkpoint.
checkpoint = torch.load('model.pt', map_location=torch.device('cpu'))

# Extract the state dictionary
model_state_dict = checkpoint['state_dict']

# Load the state dictionary into the model
model.load_state_dict(model_state_dict)

# Load the example we exported in the previous steps.
example_input = torch.load('example_input.pt')

# Compile the model
model_neuron = torch_neuronx.trace(model, example_input)

# Save the TorchScript for inference
filename = 'model_neuron.pt'
torch.jit.save(model_neuron, filename)
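Before moving on, it's worth sanity-checking that the compiled model agrees with the original. A minimal check, assuming example_input is still in scope (small numerical differences from compilation are expected):

# Run the same example through the CPU model and the compiled model
model.eval()
with torch.no_grad():
    cpu_output = model(example_input)
neuron_output = model_neuron(example_input)

print(cpu_output.item(), neuron_output.item())
# Outputs should agree within a small tolerance
torch.testing.assert_close(neuron_output, cpu_output, rtol=1e-2, atol=1e-3)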

Make a Prediction

import torch_xla.core.xla_model as xm

# Load the saved TorchScript model into memory.
model_neuron = torch.jit.load(filename)

# Get an XLA device
device = xm.xla_device()

# Move the model to the XLA device
# (defaults to a NeuronCore on an inf2 instance).
model_neuron = model_neuron.to(device)

def invoke(example):
    # Make a prediction using the Neuron model.
    xla_example = example.to(device)
    prediction = model_neuron(xla_example)

    # Get the model's prediction, round it to the nearest whole number,
    # and adjust it back to the original scale by multiplying by 10,000.
    price = round(prediction.item()) * 10000
    return {
        "house_value": price
    }

print(invoke(example_input))
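For a rough sense of latency, time repeated invocations. A minimal sketch (the first call is slower while things warm up, and absolute numbers depend on the instance and model):

import time

# Warm up, then time 100 predictions
invoke(example_input)
start = time.perf_counter()
for _ in range(100):
    invoke(example_input)
elapsed = time.perf_counter() - start
print(f"Average latency: {elapsed / 100 * 1000:.2f} ms")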

Takeaways

In this blog post, we trained a basic PyTorch model and used the Neuron SDK to run inference on an Amazon EC2 inf2.xlarge instance. This is a simple example, but it can be expanded to run larger and more complex models on these alternative accelerators.
 

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
