Run a PyTorch Model on AWS Inferentia2

Build a simple multilayer perceptron (MLP) model in PyTorch and run it on AWS Inferentia2

Tanner McRae
Amazon Employee
Published Jul 5, 2024
Last Modified Jul 23, 2024
In this blog post, I'll demonstrate how to deploy a simple multilayer perceptron (MLP) on AWS Inferentia2. We'll start with a classic problem: predicting California housing prices. I'll build a simple neural network and deploy it on an Amazon EC2 inf2.xlarge instance.
Even though the model is small enough to run on a CPU, the purpose of this post is to demonstrate how to run it on Inf2.

Background

AWS Inferentia2 & Neuron: AWS Inferentia2 is a purpose-built machine learning accelerator. To use these accelerators, AWS created a software development kit (SDK) called AWS Neuron, which includes a deep learning compiler, runtime, and tools that are natively integrated into TensorFlow, PyTorch, and Apache MXNet.
Setup: For these experiments, we'll use PyTorch 2.1 with Python 3.10 on an inf2.xlarge instance running an Ubuntu 20 AMI.

A Simple Neural Network

To see how Neuron compilation and inference work, let’s build a simple neural network for California housing price prediction. You can access this dataset through Hugging Face, scikit-learn datasets, or other online sources.
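If you just want to poke at the data before training, scikit-learn can fetch it directly. A quick sketch (the training code below pulls the same data from Hugging Face instead):

from sklearn.datasets import fetch_california_housing

# Download the dataset as a pandas DataFrame
housing = fetch_california_housing(as_frame=True)
print(housing.frame.head())   # 8 features plus the MedHouseVal target
print(housing.frame.shape)    # (20640, 9)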
Define the Neural Network:
The following neural network takes in our features and outputs a predicted house value.
import torch.nn as nn

class HousingPriceModel(nn.Module):
    def __init__(self, input_size):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_size, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(64, 1)  # predict a single value
        )

    def forward(self, x):
        return self.layers(x)
Let’s also write a simple training script to train our model. Since this is such a small model (~50k parameters), we can train it in < 30 seconds on our dataset with a CPU.
import numpy as np
import torch
from torch.optim import Adam

def train_model(model, train_loader, val_loader, epochs=10, batch_size=16, lr=1e-3):
    loss_fn = nn.MSELoss()
    optimizer = Adam(model.parameters(), lr=lr)

    for epoch in range(epochs):
        train_loss, val_loss = 0.0, 0.0

        # Train
        model.train()
        for inputs, target in train_loader:
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = loss_fn(outputs, target.unsqueeze(1))
            loss.backward()
            optimizer.step()
            train_loss += loss.item()

        # Validation
        model.eval()
        with torch.no_grad():
            for inputs, target in val_loader:
                outputs = model(inputs)
                loss = loss_fn(outputs, target.unsqueeze(1))
                val_loss += loss.item()

        # Calculate metrics
        rmse_train_loss = np.sqrt(train_loss / len(train_loader))
        rmse_val_loss = np.sqrt(val_loss / len(val_loader))

        # Print metrics
        print(f"Epoch [{epoch+1}/{epochs}], RMSE: {rmse_train_loss:.4f}")
        print(f"Epoch [{epoch+1}/{epochs}], Validation RMSE: {rmse_val_loss:.4f}")

    # Save our checkpoint
    checkpoint = {'state_dict': model.state_dict()}
    torch.save(checkpoint, 'model.pt')
Train the model
Now let's clean our dataset and kick off the training job.
import pandas as pd
from datasets import load_dataset
from sklearn.model_selection import train_test_split
from torch.utils.data import DataLoader

# Download dataset and put it in a DataFrame
data = load_dataset("leostelon/california-housing")['train']
df = pd.DataFrame(data)

# Split the DataFrame into training and validation
train_df, val_df = train_test_split(df, test_size=0.2, random_state=42)

# Clean the dataset (see the sketch of clean_dataset below)
train_dataset = clean_dataset(train_df)
val_dataset = clean_dataset(val_df)

# Create data loaders
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=16)

# Get input size
input_size = train_dataset.X.shape[1]

# Get the model
model = HousingPriceModel(input_size)

# Train the model
train_model(model, train_loader, val_loader)
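The clean_dataset helper isn't shown in the original post, so here is a minimal sketch of what it might look like: drop missing values, encode the features, scale the target down by 10,000 (the inference code at the end multiplies by 10,000 to undo this), and wrap everything in a small Dataset that exposes the feature matrix as .X, matching how the code above uses it. The column name and preprocessing choices are assumptions.

import torch
from torch.utils.data import Dataset

class HousingDataset(Dataset):
    """Hypothetical helper: wraps feature/target tensors and exposes .X."""
    def __init__(self, X, y):
        self.X = X
        self.y = y

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

def clean_dataset(df):
    # Assumed target column name; adjust to match the actual DataFrame
    target_col = 'median_house_value'
    df = df.dropna()

    # One-hot encode any categorical columns (assumption)
    features = pd.get_dummies(df.drop(columns=[target_col]))

    X = torch.tensor(features.values.astype('float32'))
    # Scale the target down; invoke() multiplies by 10,000 to undo this
    y = torch.tensor(df[target_col].values.astype('float32')) / 10000.0

    # Standardize features so training converges quickly
    X = (X - X.mean(dim=0)) / X.std(dim=0)
    return HousingDataset(X, y)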
After training, we have a checkpoint saved in the model.pt file. We’ll use it to compile the model using NeuronX for inference.
Save an Example
When compiling a model with Neuron, we need an example input to run a trace.
# Extract a single example
for batch in train_loader:
    # Get the inputs and ignore the targets for now
    example_input, _ = batch
    # Select the first example and keep the batch dimension
    example_input = example_input[0:1]
    # We only need the first batch, so break the loop
    break

# Save the example
torch.save(example_input, 'example_input.pt')

Install Neuron SDK & Dependencies

First, we install all the Neuron dependencies. You can use an existing deep learning AMI (DLAMI). I opted to use a vanilla Ubuntu AMI and install the dependencies myself.
Execute the following commands on the instance.
# Configure Linux for Neuron repository updates
. /etc/os-release
sudo tee /etc/apt/sources.list.d/neuron.list > /dev/null <<EOF
deb https://apt.repos.neuron.amazonaws.com ${VERSION_CODENAME} main
EOF

wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | sudo apt-key add -

# Update OS packages
sudo apt-get update -y

# Install OS headers
sudo apt-get install linux-headers-$(uname -r) -y

# Install git
sudo apt-get install git -y

# Install Neuron Driver
sudo apt-get install aws-neuronx-dkms=2.* -y

# Install Neuron Runtime
sudo apt-get install aws-neuronx-collectives=2.* -y
sudo apt-get install aws-neuronx-runtime-lib=2.* -y

# Install Neuron Tools
sudo apt-get install aws-neuronx-tools=2.* -y

# Add PATH
export PATH=/opt/aws/neuron/bin:$PATH
Next, we’ll install the torch-neuronx package.
# Set pip repository pointing to the Neuron repository
python -m pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com

# Install Neuron Compiler and Framework
python -m pip install neuronx-cc==2.* --pre torch-neuronx==2.1.* torchvision
And that’s it!
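To confirm the driver and tools can see the accelerator, list the Neuron devices (neuron-ls ships with aws-neuronx-tools; on an inf2.xlarge you should see a single Inferentia2 device with two NeuronCores):

# List the visible Neuron devices and their NeuronCores
neuron-ls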

Compile the Model

The PyTorch Neuron trace() API compiles a PyTorch model for execution on Inferentia2 and returns a module that can be serialized as TorchScript.
Note: This function is analogous to torch.jit.trace().
import torch
import torch_neuronx

# Use the model definition from the training job.
# If you've been following the tutorial, the input size is 11.
model = HousingPriceModel(11)

# Load the checkpoint.
checkpoint = torch.load('model.pt', map_location=torch.device('cpu'))

# Extract the state dictionary
model_state_dict = checkpoint['state_dict']

# Load the state dictionary into the model
model.load_state_dict(model_state_dict)

# Load the example we exported in the previous steps.
example_input = torch.load('example_input.pt')

# Compile the model
model_neuron = torch_neuronx.trace(model, example_input)

# Save the TorchScript for inference
filename = 'model_neuron.pt'
torch.jit.save(model_neuron, filename)
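Before moving on, it's worth sanity-checking that the compiled model agrees with the original. A minimal check, assuming example_input is still in scope (small numerical differences from compilation are expected):

# Run the same example through the CPU model and the compiled model
model.eval()
with torch.no_grad():
    cpu_output = model(example_input)
neuron_output = model_neuron(example_input)

print(cpu_output.item(), neuron_output.item())
# Outputs should agree within a small tolerance
torch.testing.assert_close(neuron_output, cpu_output, rtol=1e-2, atol=1e-3)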

Make a Prediction

import torch_xla.core.xla_model as xm

# Load the saved TorchScript model into memory.
model_neuron = torch.jit.load(filename)

# Get an XLA device
device = xm.xla_device()

# Move the model to the XLA device
# (defaults to a NeuronCore on an inf2 instance).
model_neuron = model_neuron.to(device)

def invoke(example):
    # Make a prediction using the Neuron model.
    xla_example = example.to(device)
    prediction = model_neuron(xla_example)

    # Get the model's prediction, round it to the nearest whole number,
    # and adjust it back to the original scale by multiplying by 10,000.
    price = round(prediction.item()) * 10000
    return {
        "house_value": price
    }

print(invoke(example_input))
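For a rough sense of latency, time repeated invocations. A minimal sketch (the first call is slower while things warm up, and absolute numbers depend on the instance and model):

import time

# Warm up, then time 100 predictions
invoke(example_input)
start = time.perf_counter()
for _ in range(100):
    invoke(example_input)
elapsed = time.perf_counter() - start
print(f"Average latency: {elapsed / 100 * 1000:.2f} ms")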

Takeaways

In this blog post, we trained a basic PyTorch model and used the Neuron SDK to run inference on an Amazon EC2 inf2.xlarge instance. This is a simple example, but it can be expanded to run larger and more complex models on these alternative accelerators.
 

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
