“Rustifying” Serverless: Boost AWS Lambda performance with Rust

Discover how to deploy Rust functions using AWS SAM and cargo-lambda. Learn to integrate Rust into Python Lambda functions with PyO3 and maturin. Find out how Rust can optimize Lambda functions, including developing Lambda extensions, without requiring a rewrite of your existing codebase.

Published Dec 17, 2023
Meet GammaRay, a fictional AWS dashboard company that expanded to include S3 tools due to customer demand. Initially, they developed a Python and Serverless-based MVP that was well-received. As popularity grew, new features were added, and over time performance issues and rising IT costs emerged. Recognizing Python's limitations, they integrated Rust to improve performance and cost efficiency, focusing on Rust's strengths in runtime performance and concurrency. The goal was to reduce Lambda invocation times and costs.
This post will describe the process they used to transition their existing serverless application to Rust.
Rewriting everything in Rust from scratch wasn't practical; they had a working product and couldn't halt development for a complete rewrite. This post outlines several strategies:
  1. Initially integrating Rust with existing Python Lambdas to avoid a total overhaul.
  2. Gradually transitioning entire Lambdas to Rust.
  3. Sharing common behaviors across the Lambda fleet using extensions.
Each strategy varies in its impact on performance improvement, development effort required, and whether the solution is broad-based or specifically targeted.
Let’s start with the high-level architecture to better understand the issues they were facing.

High-Level Architecture

Their application, a straightforward serverless solution, integrates an API Gateway with multiple Lambda functions. It features a dedicated authorizer Lambda for validating requests against a DynamoDB table, ensuring secure access. While the application contains many Lambdas, this post will focus on a Lambda responsible for listing all S3 buckets in a customer's account.
Developed primarily in Python, the application also includes additional Lambdas created using Node.js and Java. For deployment, they leverage AWS SAM, ensuring a streamlined and efficient process.
Having gained a clear understanding of their application's structure and functionality, let's delve into the specific solutions they implemented to enhance its performance and capabilities.

Rust Bindings

We will zoom into the 'List Buckets' Lambda, a key component in the serverless application, widely utilized across various parts of the application. Enhancing its performance was crucial due to its complexity and broad impact. Instead of a full-scale rewrite in Rust, which posed considerable risks, they opted to partially optimize the code using Rust bindings. This decision balanced development ease against the potential for performance gains. Their approach focused on using PyO3 and Maturin, tools that seamlessly integrate Rust functionalities into Python, enriching their application with the efficiency of Rust while maintaining Python's flexibility.
After examining the Python code, they concluded that optimizing two parts of it would have the greatest impact on performance: the boto3 call that fetches all available S3 buckets, and the API call that determines the region of each S3 bucket.
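The second part, one API call per bucket, is the kind of work that benefits from running concurrently. The following is a minimal sketch of that idea using only the standard library, not the code from the original Lambda. The function `lookup_region` is a hypothetical stand-in for the real S3 call, and spawning one thread per bucket is an assumption made for brevity; a real implementation would more likely use async tasks with a concurrency limit.

```rust
use std::collections::HashMap;
use std::thread;

/// Hypothetical stand-in for the per-bucket region API call.
/// A real implementation would call S3 through an SDK client.
fn lookup_region(bucket: &str) -> String {
    // Placeholder rule so the sketch runs without network access.
    if bucket.ends_with("-eu") {
        "eu-west-1".to_string()
    } else {
        "us-east-1".to_string()
    }
}

/// Resolve the region of every bucket, using one scoped thread per bucket.
fn regions_for(buckets: &[String]) -> HashMap<String, String> {
    thread::scope(|s| {
        // Spawn all lookups first so they run concurrently.
        let handles: Vec<_> = buckets
            .iter()
            .map(|b| s.spawn(move || (b.clone(), lookup_region(b))))
            .collect();
        // Then join them and collect the (bucket, region) pairs.
        handles
            .into_iter()
            .map(|h| h.join().expect("lookup thread panicked"))
            .collect::<HashMap<_, _>>()
    })
}
```

Calling `regions_for(&buckets)` returns a map from bucket name to region, with total latency close to that of the slowest single lookup rather than the sum of all lookups.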

Using PyO3

PyO3 is a Rust library that facilitates the use of Python and Rust together, allowing both languages to operate smoothly within the same program.
The Rust side starts by defining a Python module and adding a new class definition to it.
Next, we define the actual Python class as a Rust struct, complete with associated methods. The struct includes a constructor that initializes the Rust AWS SDK client, so the Python class can be instantiated once, outside the Lambda handler, for better performance during both warm and cold starts.

Using Maturin

After finalizing the code, we package the binary as a valid .whl package using Maturin.
To minimize the binary size, build it in release mode: debug binaries are too large for Lambda ZIP packaging because of the 50 MB limit on zipped deployment packages. If you need a debug build of the Rust code, packaging the function as a Docker image is the only viable option.
Ensure the binary is compiled for the correct architecture and platform to avoid LIBC errors during Rust binding execution. This can be achieved using Zig or by adding the manylinux2014 platform option. Maturin provides a container for compiling the binary on the correct platform, as detailed in their documentation.
Incorporating the new package into your Python code is straightforward. Import the module and initialize the class, ensuring the class is initialized outside the handler for better performance during both cold and warm starts.
The Lambda deployment process remains unchanged. However, SAM caches your packages, so you might find that your latest changes are not reflected in the rebuilt application. To resolve this, delete the cache folder or build with the '--no-cached' flag.

Code Hints

When integrating Rust bindings into your Python module, consider adding code hints. Since your .whl package doesn't include the source code, features like auto-complete in your IDE won't be available. Create a .pyi file as a stub for your implemented classes and methods. Place this file in the root directory of your Rust library. Maturin will include this file in your .whl package.

Performance Gains

For testing, we used two tools: one for measuring cold latency, developed by Lumigo and Yan Cui, and another for warm latency and cost analysis, developed by Alex Casalboni.

Warm Start Performance

At 1024 MB of memory, the pure Python Lambda and the Lambda with Rust bindings deliver comparable performance. The Rust-backed version, however, reaches similar performance with just 256 MB of memory. This is a recurring theme: Rust typically outperforms Python at lower memory configurations.

Cost Gains

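Rust is about three times more cost-effective across the board. Note the spike in Rust's cost at 1 GB of memory: Lambda bills by allocated memory multiplied by duration, so when extra memory yields a negligible speedup, the higher per-millisecond price is not offset by a shorter run.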
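To make the cost relationship concrete, here is a small sketch of the compute portion of Lambda billing (memory in GB multiplied by duration in seconds multiplied by a per-GB-second price). The per-request fee is left out, the price is passed in as a parameter because it varies by region and architecture, and any durations you plug in are your own measurements, not figures from this post.

```rust
/// Compute-only cost of a single invocation in USD.
/// `price_per_gb_second` is supplied by the caller because it depends on
/// region and architecture; the per-request fee is not included.
fn invocation_cost(memory_mb: u32, duration_ms: f64, price_per_gb_second: f64) -> f64 {
    let gb = f64::from(memory_mb) / 1024.0;
    let seconds = duration_ms / 1000.0;
    gb * seconds * price_per_gb_second
}
```

For example, with hypothetical durations of 100 ms at 256 MB and 95 ms at 1024 MB, `invocation_cost(1024, 95.0, price)` is 3.8 times `invocation_cost(256, 100.0, price)`: four times the memory for a 5% shorter run. That is the shape of the spike described above.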

Cold Start Performance

The pure Python Lambda is roughly three times slower. We believe this is because we replaced all Boto3 references with the AWS Rust SDK, eliminating Boto3 from our Lambda package and ceasing any Boto3 code initialization.

Rust Lambda

Their efforts on the "List Buckets" Lambda yielded remarkable results. By leveraging Rust bindings, they were able to enhance performance and reduce costs without a complete overhaul. Additionally, they identified another critical Lambda, the authorization Lambda, whose optimization could significantly boost their application's responsiveness.
This Lambda is triggered for every request in the system, so improving its performance benefits the entire application. Moreover, because it is simple, rewriting it in Rust is viable and carries minimal risk.
Such a rewrite, though more demanding in development effort, promises substantial performance improvements.
Upon examining the Lambda's handler, they found that it retrieves the authorization header and verifies it against data in DynamoDB.
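The core of that handler can be sketched in a few lines. This is an illustrative, standard-library-only version under stated assumptions: `TokenStore` and `InMemoryStore` are hypothetical names standing in for the DynamoDB lookup, and `Decision` is a simplified result rather than the policy document a real API Gateway authorizer returns.

```rust
use std::collections::{HashMap, HashSet};

/// Outcome of an authorization check (simplified; hypothetical type).
#[derive(Debug, PartialEq)]
enum Decision {
    Allow,
    Deny,
}

/// Hypothetical abstraction over the token lookup; the real Lambda
/// would query a DynamoDB table here.
trait TokenStore {
    fn contains(&self, token: &str) -> bool;
}

/// In-memory store used only to make the sketch runnable.
struct InMemoryStore(HashSet<String>);

impl TokenStore for InMemoryStore {
    fn contains(&self, token: &str) -> bool {
        self.0.contains(token)
    }
}

/// Read the authorization header (case-insensitively) and check it against the store.
fn authorize(headers: &HashMap<String, String>, store: &dyn TokenStore) -> Decision {
    let token = headers
        .iter()
        .find(|(name, _)| name.eq_ignore_ascii_case("authorization"))
        .map(|(_, value)| value.as_str());
    match token {
        Some(t) if store.contains(t) => Decision::Allow,
        // A missing header and an unknown token are both rejected.
        _ => Decision::Deny,
    }
}
```

Keeping the lookup behind a trait means the decision logic can be unit-tested without any AWS access, which helps keep the risk of the rewrite low.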
For a Rust Lambda, they use AWS SAM combined with cargo-lambda. AWS SAM manages the Lambda's packaging and deployment, while cargo-lambda, a cargo sub-command, oversees the Lambda's compilation for the runtime environment. AWS SAM directly interfaces with cargo-lambda, simplifying the process for those unfamiliar with the command.
Before diving into Rust Lambda development, understanding the inner workings of a Lambda upon invocation is crucial.

Lambda Runtime

The Lambda Service, developed by AWS engineers, orchestrates the Lambda flow and is beyond our control.
The execution environment, essentially a configured container, encompasses three primary components:
  1. The runtime API, acting as an interface with the Lambda service, handles tasks like receiving event details and error reporting.
  2. Your code, which processes event details.
  3. The Lambda runtime, translating API interactions for your code, invoking your handler for new events, and communicating responses back to the runtime API.
Not all runtimes are created equal. AWS provides managed, out-of-the-box runtimes for languages like Python and Node.js. If your chosen language lacks an AWS-managed runtime, you need a custom one, and AWS doesn't offer a managed Rust runtime.
Fortunately, AWS provides a Rust runtime in the form of a crate, lambda_runtime. While it can't be selected directly from the console, it can be added to your code as a dependency, which makes it possible to build a Lambda in Rust.
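The division of labor between the runtime API, the runtime, and your handler can be illustrated with a simplified loop. This is a conceptual sketch, not the implementation of the AWS crate: `RuntimeApi` and `FakeApi` are hypothetical names, the real runtime API is an HTTP interface, and the real "next event" call blocks until an event arrives instead of returning `None`.

```rust
use std::collections::VecDeque;

/// Hypothetical view of the runtime API: hand out the next event and
/// accept a response or an error for a given request id.
trait RuntimeApi {
    fn next_event(&mut self) -> Option<(String, String)>; // (request id, payload)
    fn post_response(&mut self, request_id: &str, body: String);
    fn post_error(&mut self, request_id: &str, message: String);
}

/// The runtime's job: fetch an event, call the handler, report the outcome.
fn run<A, H>(api: &mut A, handler: H)
where
    A: RuntimeApi,
    H: Fn(&str) -> Result<String, String>,
{
    while let Some((request_id, payload)) = api.next_event() {
        match handler(&payload) {
            Ok(body) => api.post_response(&request_id, body),
            Err(message) => api.post_error(&request_id, message),
        }
    }
}

/// In-memory fake so the loop can run without the Lambda service.
struct FakeApi {
    pending: VecDeque<(String, String)>,
    responses: Vec<(String, String)>,
    errors: Vec<(String, String)>,
}

impl RuntimeApi for FakeApi {
    fn next_event(&mut self) -> Option<(String, String)> {
        self.pending.pop_front()
    }
    fn post_response(&mut self, request_id: &str, body: String) {
        self.responses.push((request_id.to_string(), body));
    }
    fn post_error(&mut self, request_id: &str, message: String) {
        self.errors.push((request_id.to_string(), message));
    }
}
```

The handler passed to `run` only sees a payload and returns a result; everything about fetching events and reporting outcomes stays in the runtime.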
Now, we are set to craft our Rust Lambda, which always comprises two segments:
  1. Runtime implementation
  2. Handler code

Coding time

We build an asynchronous service using Tokio, initializing global resources like AWS SDK clients outside the handler to improve both cold and warm start latency. We then create a service function that calls the Lambda handler within a closure, and finally run the service. For those acquainted with Tower, it's a Tower service.
That is the runtime implementation.
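The "initialize once, call the handler in a closure" pattern can be shown without Tokio or the AWS SDK. In this hedged sketch, `Client`, `make_service`, and the `CLIENTS_BUILT` counter are hypothetical names used for illustration; the real code would build an SDK client and hand the closure to the runtime crate's service function.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how many times the (hypothetical) client was constructed.
static CLIENTS_BUILT: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for an expensive-to-build AWS SDK client.
struct Client {
    table: String,
}

impl Client {
    fn new(table: &str) -> Self {
        CLIENTS_BUILT.fetch_add(1, Ordering::SeqCst);
        Client { table: table.to_string() }
    }
}

/// Handler: receives the shared client by reference on every invocation.
fn handler(client: &Client, event: &str) -> String {
    format!("{}:{}", client.table, event)
}

/// Build the client once, then return a closure that serves every event with it.
fn make_service() -> impl Fn(&str) -> String {
    let client = Client::new("tokens"); // runs once, during initialization
    move |event: &str| handler(&client, event)
}
```

Calling the closure returned by `make_service()` any number of times leaves `CLIENTS_BUILT` at 1, which is the effect the pattern aims for: construction cost is paid during initialization, not per invocation.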
Next is crafting the actual handler. In Rust, defining types is essential. The AWS Rust team provides the aws_lambda_events crate, adding necessary types to your project. This crate is crucial for parsing event payloads efficiently. To minimize the final binary size, use the specific feature for your event type.
Regarding the Lambda's definition in AWS SAM, the build method we use is cargo-lambda (the 'rust-cargolambda' BuildMethod). Ensure cargo-lambda is installed beforehand, as AWS SAM does not install it for you. With a custom runtime, the handler name is always 'bootstrap', and the runtime is one of the 'provided' family, such as 'provided.al2'.
Now, let's assess the performance, mirroring our approach for the previous solution.

Performance Gains

Warm Start Performance

The results are exceptional. Rust consistently delivers impressive performance, even with a memory allocation as low as 128 MB.

Cost Gains

Rust proves to be more economical, reducing costs by a factor of two to three.

Cold Start Performance

The most notable difference is observed in cold starts. The Rust Lambda is approximately eight times faster than the Python Lambda.

Lambda Extensions

As the application expanded and more teams joined to introduce new features, a shared need across all Lambdas for robust analytics was noticed.
Each Lambda execution often encompasses multiple actions, each of which is recorded as an event. These events are sent to an SQS queue and later processed by another Lambda. To avoid duplicating this code across Lambdas, we sought a solution that could be integrated easily while improving performance and resilience through features like batching and retrying. Enter Lambda Extensions.

Behind the Scenes

Remember, our Lambda execution environment consists of three elements: the Runtime API, the Lambda Runtime, and our handler code.
Lambda extensions introduce an additional process in this environment, running separately with its own lifecycle events and distinct API interactions. They communicate with handler code via inter-process communication, such as a local HTTP server.
Extensions are particularly valuable for tasks like analytics. In runtimes limited in parallelism or asynchronous capabilities, extensions enable a sort of parallel processing. You can initiate a task, continue with your primary code, and then synchronize at the end of the execution.
Furthermore, extensions benefit from a unique lifecycle feature. Unlike the standard Lambda handler that freezes upon completing its task, an extension can continue processing post-handler execution. This capability allows for additional computations, like analytics, to occur without delaying the response time. The only trade-off here is financial, as extended execution time incurs additional costs.
Why Rust for extensions? Efficiency and dependency management. Since extensions share computational resources with your handler, highly efficient code is paramount. Moreover, if your extension requires runtime dependencies (like the Python runtime), packaging them increases both size and memory footprint.

Back to Code

We use Cargo Lambda for extension development, the same tool AWS SAM uses to build Rust Lambda functions. Cargo Lambda's '--extension' flag is tailored for building and deploying extensions as layers.
Build your extension with the 'build' sub-command and the '--extension' flag, in release mode to keep the zipped layer under 50 MB.
Deploying the extension as a layer is straightforward with the 'deploy' sub-command and the same flag.
Just as with a Rust-based Lambda, developing an extension requires crafting both the extension “handler” and the runtime.
In the code, we start by establishing a channel, then set up a server using Axum, and kick off the extension handler. The handler is responsible for pulling messages from the channel and forwarding them to SQS.
You can communicate with the extension using any standard HTTP library.
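The channel-plus-worker structure described above can be sketched with the standard library alone. Note the substitutions: this sketch uses `std::sync::mpsc` and a plain thread in place of Tokio, omits the Axum HTTP server entirely, and replaces the SQS client with a hypothetical `Sink` callback. The batching by `batch_size` is an assumption about how batching might be done, not the extension's actual logic.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical sink; the extension in the post forwards to SQS instead.
type Sink = Box<dyn FnMut(Vec<String>) + Send>;

/// Start a background worker that drains the channel and flushes in batches.
fn start_worker(batch_size: usize, mut sink: Sink) -> (Sender<String>, JoinHandle<()>) {
    let (tx, rx): (Sender<String>, Receiver<String>) = mpsc::channel();
    let handle = thread::spawn(move || {
        let mut batch: Vec<String> = Vec::with_capacity(batch_size);
        // `recv` returns Err once every sender is dropped, which ends the loop.
        while let Ok(message) = rx.recv() {
            batch.push(message);
            if batch.len() >= batch_size {
                // Hand over the full batch and start a fresh one.
                sink(std::mem::take(&mut batch));
            }
        }
        // Flush whatever is left before shutting down.
        if !batch.is_empty() {
            sink(batch);
        }
    });
    (tx, handle)
}
```

In the real extension, the HTTP server's request handler would hold the `Sender` and push each incoming event into the channel, so the Lambda handler's HTTP call returns as soon as the event is queued rather than waiting for SQS.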

Performance Gains

Testing the performance enhancements presented challenges, as AWS doesn’t provide comprehensive metrics for extensions.
We approached this by testing from the client’s perspective, using curl for an end-to-end assessment. Tests were conducted with and without the extension.
The results showed response times roughly 100 to 200 ms faster with the extension.
However, there's an increase in cold start times, around 100 ms in our scenario. This was a trade-off we accepted for the benefits of encapsulation and improved warm start performance.


In conclusion, the successful integration of Rust into a serverless application underscores the effectiveness of various approaches:
  1. Initially integrating Rust with existing Python Lambdas requires less development effort compared to other strategies but has a lesser effect in terms of performance improvement.
  2. Gradually transitioning entire Lambdas to Rust requires more development effort but has the greatest effect on performance.
  3. Sharing common behaviors across the Lambda fleet using extensions makes it possible to improve performance in multiple Lambdas with a single change.