
“Rustifying” Serverless: Boost AWS Lambda performance with Rust
Discover how to deploy Rust functions using AWS SAM and cargo-lambda. Learn to integrate Rust into Python Lambda functions with PyO3 and maturin. Find out how Rust can optimize Lambda functions, including developing Lambda extensions, without requiring a rewrite of your existing codebase.
Published Dec 17, 2023
Meet GammaRay, a fictional AWS dashboard company, expanded to include S3 tools due to customer demand. Initially, they developed a Python and Serverless-based MVP that was well-received. As popularity grew, new features were added. Over time, performance issues and rising IT costs emerged. Realizing Python's limitations, they integrated Rust to improve performance and cost efficiency, focusing on Rust's strengths in runtime performance and concurrency. This strategy aimed to reduce Lambda invocation times and costs.
This post will describe the process they used to transition their existing serverless application to Rust.
Rewriting everything in Rust from scratch wasn't practical; they had a working product and couldn't halt development for a complete rewrite. This post outlines several strategies:
- Initially integrating Rust with existing Python Lambdas to avoid a total overhaul.
- Gradually transitioning entire Lambdas to Rust.
- Sharing common behaviors across the Lambda fleet using extensions.
Each strategy varies in its impact on performance improvement, development effort required, and whether the solution is broad-based or specifically targeted.
Let’s start with the high-level architecture to better understand the issues they were facing.

Developed primarily in Python, the application also includes additional Lambdas created using Node.js and Java. For deployment, they leverage AWS SAM, ensuring a streamlined and efficient process.
Having gained a clear understanding of their application's structure and functionality, let's delve into the specific solutions they implemented to enhance its performance and capabilities.
We will zoom into the 'List Buckets' Lambda, a key component in the serverless application, widely utilized across various parts of the application. Enhancing its performance was crucial due to its complexity and broad impact. Instead of a full-scale rewrite in Rust, which posed considerable risks, they opted to partially optimize the code using Rust bindings. This decision balanced development ease against the potential for performance gains. Their approach focused on using PyO3 and Maturin, tools that seamlessly integrate Rust functionalities into Python, enriching their application with the efficiency of Rust while maintaining Python's flexibility.
After examining the Python code, they concluded that optimizing the following sections would have the greatest impact on performance.

The code comprises two main parts: the first makes a boto3 call to fetch all available S3 buckets, and the second involves an API call to determine the region of each S3 bucket.
PyO3 is a Rust library that facilitates the use of Python and Rust together, allowing both languages to operate smoothly within the same program.
This is Rust code defining a Python module. We are adding a new class definition to the Python module.

Next, we define the actual Python class as a Rust struct, complete with associated methods. We've included a constructor to initialize the Rust AWS SDK client, which we'll use to initialize the Python class outside the Lambda handler for enhanced performance during both hot and cold starts.

After finalizing the code, we package the binary as a valid .whl package using Maturin.

Ensure the binary is compiled for the correct architecture and platform to avoid LIBC errors during Rust binding execution. This can be achieved using Zig or by adding the
manylinux2014
platform option. Maturin provides a container for compiling the binary on the correct platform, as detailed in their documentation.
The Lambda deployment process remains unchanged. However, when using SAM, which caches your packages, you might encounter an issue where your latest changes are not reflected in the rebuilt application. To resolve this, delete the cache folder or use the 'no-cache' flag.
When integrating Rust bindings into your Python module, consider adding code hints. Since your .whl package doesn't include the source code, features like auto-complete in your IDE won't be available. Create a .pyi file as a stub for your implemented classes and methods. Place this file in the root directory of your Rust library. Maturin will include this file in your .whl package.

For testing, we used two tools: one for measuring cold latency, developed by Lumigo and Yan Cui, and another for warm latency and cost analysis, developed by Alex Casalboni.



Their efforts on the "List Buckets" Lambda yielded remarkable results. By leveraging Rust bindings, they were able to enhance performance and reduce costs without a complete overhaul. Additionally, they identified another critical Lambda—the authorization Lambda—that, upon optimization, could significantly boost their application's responsiveness.

This Lambda is triggered for every request in our system. Thus, enhancing its performance would beneficially impact the entire application. Moreover, due to its simplicity, rewriting this Lambda in Rust is viable and carries minimal risk.
Such a rewrite, though more demanding in development effort, promises substantial performance improvements.
Upon examining the Lambda's handler, they found that it retrieves the authorization header and verifies it against data in DynamoDB.

For a Rust Lambda, they use AWS SAM combined with cargo-lambda. AWS SAM manages the Lambda's packaging and deployment, while cargo-lambda, a cargo sub-command, oversees the Lambda's compilation for the runtime environment. AWS SAM directly interfaces with cargo-lambda, simplifying the process for those unfamiliar with the command.
Before diving into Rust Lambda development, understanding the inner workings of a Lambda upon invocation is crucial.

The Lambda Service, developed by AWS engineers, orchestrates the Lambda flow and is beyond our control.
The execution environment, essentially a configured container, encompasses three primary components:
- The runtime API, acting as an interface with the Lambda service, handles tasks like receiving event details and error reporting.
- Your code, which processes event details.
- The Lambda runtime, translating API interactions for your code, invoking your handler for new events, and communicating responses back to the runtime API.
Not all runtimes are created equally. AWS provides out-of-the-box runtimes for languages like Python and Node.js. If your chosen language lacks an AWS-provided runtime, creating one becomes necessary. As AWS doesn't offer a Rust runtime, they had to develop it.
Fortunately, AWS provides a Rust runtime in the form of a crate. While it can't be selected directly from the console, it can be seamlessly integrated into your code, facilitating the creation of a Lambda based on Rust.
Now, we are set to craft our Rust Lambda, which always comprises two segments:
- Runtime implementation
- Handler code

This is the runtime implementation
Next is crafting the actual handler. In Rust, defining types is essential. The AWS Rust team provides the aws_lambda_events crate, adding necessary types to your project. This crate is crucial for parsing event payloads efficiently. To minimize the final binary size, use the specific feature for your event type.

Regarding the Lambda's definition in AWS SAM, the build method we utilize is cargo-lambda. Ensure cargo-lambda is installed beforehand, as AWS SAM does not automate this. When using your runtime, the handler name remains constant, labeled as 'bootstrap', and the runtime is designated as 'provided'.

Now, let's assess the performance, mirroring our approach for the previous solution.



As the application expanded and more teams joined to introduce new features, a shared need across all Lambdas for robust analytics was noticed.


Lambda extensions introduce an additional process in this environment, functioning separately with its own lifecycle events and distinct API interactions. They communicate with handler code via inter-process communication methods, such as HTTP servers.
Extensions are particularly valuable for tasks like analytics. In runtimes limited in parallelism or asynchronous capabilities, extensions enable a sort of parallel processing. You can initiate a task, continue with your primary code, and then synchronize at the end of the execution.

Why Rust for extensions? Efficiency and dependency management. Since extensions share computational resources with your handler, highly efficient code is paramount. Moreover, if your extension requires runtime dependencies (like the Python runtime), packaging them increases both size and memory footprint.
We use Cargo Lambda for extension development, the same tool AWS SAM utilizes for Lambda function builds. Cargo Lambda, particularly its 'extension' sub-command, is tailored for building and deploying extensions as layers.
Build your extension with the 'build' command and the 'extension' flag, ensuring you're in release mode to keep the zipped layer under 50 MB.
Deploying the extension as a layer is straightforward with the 'deploy' sub-command.
Just as with a Rust-based Lambda, developing an extension requires crafting both the extension “handler” and the runtime.
In the code, we start by establishing a channel, then set up a server using Axum, and kick off the extension handler. The handler is responsible for pulling messages from the channel and forwarding them to SQS.


You can communicate with the extension using any standard HTTP library.

Testing the performance enhancements presented challenges, as AWS doesn’t provide comprehensive metrics for extensions.
We approached this by testing from the client’s perspective, using curl for an end-to-end assessment. Tests were conducted with and without the extension.

However, there's an increase in cold start times, around 100 ms in our scenario. This was a trade-off we accepted for the benefits of encapsulation and improved warm start performance.

In conclusion, the successful integration of Rust into a serverless application underscores the effectiveness of various approaches:
- Initially integrating Rust with existing Python Lambdas requires less development effort compared to other strategies but has a lesser effect in terms of performance improvement.
- Gradually transitioning entire Lambdas to Rust requires more development effort but has the greatest effect on performance.
- Sharing common behaviors across Lambda extensions provides the ability to improve performance in multiple Lambdas with a single change.