
AWS Lambda performance with Java 21: x86 vs arm64 - Part 1: Initial measurements and comparisons

On July 18, 2024, AWS announced AWS Lambda support for SnapStart for Java functions that use the ARM64 architecture. It's time for some measurements and comparisons!

Published Jul 29, 2024

Introduction

Until now I hadn't measured the performance (warm and cold start times) of Lambda functions using the Java 21 runtime for some use cases (like making a DynamoDB request) on the arm64 architecture, because arm64 didn't support SnapStart. A Lambda function with the Java 21 runtime on the x86_64 architecture with SnapStart enabled (and even more so with the additional priming optimization) will outperform a Lambda function on the arm64 architecture. But on July 18, 2024, AWS announced that AWS Lambda now supports SnapStart for Java functions that use the arm64 architecture. So it now makes sense to measure the cold and warm start times while also taking the choice of the Lambda function's architecture into account. It's worth noting that with the current AWS Lambda pricing, for the same memory setting and execution duration, a Lambda function with the arm64 architecture is approximately 25% cheaper than one with the x86_64 architecture.

Measuring the cold and warm starts for the example application

In our experiment we'll re-use the application introduced in the article AWS Lambda SnapStart - Measuring Java 21 Lambda cold starts. Here is the code for the sample application. There are basically two main Lambda functions, both of which respond to API Gateway requests: one creates a product with the given id (see the PutProductFunction Lambda function) and the other retrieves a product by its id (see the GetProductByIdFunction Lambda function). You can use both Lambda functions with and without SnapStart enabled. There is an additional Lambda function, GetProductByIdWithPrimingFunction, which I wrote to independently measure the effect of DynamoDB request priming for the SnapStart-enabled Lambda function. You can read more about the effect of priming in my article AWS Lambda SnapStart - Measuring priming, end to end latency and deployment time.
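For reference, priming for a SnapStart-enabled Java function is typically implemented with the CRaC runtime hooks from the org.crac library: the handler registers itself as a Resource and issues a DynamoDB request in beforeCheckpoint, so that the classes and connections involved are already initialized when the snapshot is taken. The sketch below only illustrates this general pattern; the class, table, and key names are hypothetical and not taken from the sample repository.

```java
import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.GetItemRequest;

import java.util.Map;

// Illustrative sketch of DynamoDB request priming with CRaC hooks (not the repository code).
public class GetProductByIdWithPrimingHandler implements Resource {

    private static final DynamoDbClient DYNAMO_DB = DynamoDbClient.create();

    public GetProductByIdWithPrimingHandler() {
        // Register this object so that beforeCheckpoint/afterRestore are invoked by SnapStart.
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(Context<? extends Resource> context) {
        // Prime the DynamoDB request path: the result is irrelevant, the goal is to have
        // the classes involved in the request loaded and initialized before the snapshot.
        DYNAMO_DB.getItem(GetItemRequest.builder()
                .tableName("Products")                                       // hypothetical table name
                .key(Map.of("id", AttributeValue.builder().s("0").build()))  // hypothetical key
                .build());
    }

    @Override
    public void afterRestore(Context<? extends Resource> context) {
        // Nothing to do after restore in this sketch.
    }
}
```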
To enable SnapStart on all Lambda functions, uncomment the corresponding SnapStart definition in the SAM template.
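As the original snippet isn't reproduced here, the following is a sketch of the standard SAM SnapStart setting that such a Globals section typically contains (the exact template in the repository may differ slightly):

```yaml
Globals:
  Function:
    # Enable SnapStart for all functions; it applies to published function versions
    SnapStart:
      ApplyOn: PublishedVersions
```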
If you'd like to use SnapStart only for individual Lambda functions rather than all of them, you have to apply this SnapStart definition at the level of the individual Lambda function instead of in the global function section.
All Lambda functions have the following settings as the starting point:
  • 1024 MB memory setting
  • Default Apache HTTP client used to talk to the DynamoDB database
  • Java compilation options "-XX:+TieredCompilation -XX:TieredStopAtLevel=1", which proved to provide a very good trade-off between cold and warm start times (see the sketch after this list)
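As an assumption about how these options are wired up (the sample repository may configure this differently), JVM flags like these are commonly passed to the Lambda Java runtime via the JAVA_TOOL_OPTIONS environment variable, for example in the SAM template:

```yaml
Globals:
  Function:
    Environment:
      Variables:
        # Stop tiered compilation at level 1 for faster startup
        JAVA_TOOL_OPTIONS: "-XX:+TieredCompilation -XX:TieredStopAtLevel=1"
```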
In the SAM template I added the possibility to define the Lambda architecture in the global Lambda function section.
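As the original snippet isn't reproduced here, the following is a sketch of what such an Architectures setting in the SAM Globals section typically looks like:

```yaml
Globals:
  Function:
    Architectures:
      # - x86_64
      - arm64
```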
Just uncomment the architecture you'd like to use for your Lambda functions.
Even though Java is "write once, run everywhere", for my arm64 measurements I nevertheless compiled and built the application jar file on a t4g AWS EC2 instance with a Graviton processor (which is based on the arm64/aarch64 architecture), having first installed Amazon Corretto 21 for Linux aarch64. You can find this jar here.
I've also re-measured everything for the x86_64 architecture to have comparable results using the same latest Corretto Java 21 runtime version, which at the time of my measurements was Java 21.v17.
The results of the experiment below were based on reproducing more than 100 cold and approximately 100,000 warm starts. For this (and for the experiments from my previous article) I used the load test tool hey, but you can use whatever tool you like, such as Serverless-artillery or Postman. It's also important to be aware that I started the measurements right after a fresh (re-)deployment of the application's source code. Please also note the impact of the SnapStart snapshot tiered cache on the cold starts: the first invocations are generally slower, and subsequent ones become quicker until a certain number of invocations is reached.
Now let's put all measurements for the "get product by existing id" use case together (Lambda functions GetProductByIdFunction and GetProductByIdWithPrimingFunction for the SnapStart-enabled and priming measurements). I'll refer to the approaches by their number:
  1. x86_64, no SnapStart enabled
  2. arm64, no SnapStart enabled
  3. x86_64, SnapStart enabled without priming
  4. arm64, SnapStart enabled without priming
  5. x86_64, SnapStart enabled with DynamoDB request priming
  6. arm64, SnapStart enabled with DynamoDB request priming
Cold (c) and warm (w) start times in ms:
| Approach | c p50 | c p75 | c p90 | c p99 | c p99.9 | c max | w p50 | w p75 | w p90 | w p99 | w p99.9 | w max |
|----------|-------|-------|-------|-------|---------|-------|-------|-------|-------|-------|---------|-------|
| 1 | 3554 | 3615 | 3666 | 3800 | 4109 | 4112 | 5.42 | 6.01 | 6.88 | 14.09 | 40.98 | 1654 |
| 2 | 3835 | 3904 | 3983 | 4047 | 4332 | 4336 | 5.96 | 6.66 | 7.69 | 16.01 | 43.68 | 1845 |
| 3 | 1794 | 1847 | 2091 | 2204 | 2240 | 2240 | 5.37 | 5.96 | 6.93 | 15.88 | 51.64 | 1578 |
| 4 | 1845 | 1953 | 2592 | 2763 | 2793 | 2796 | 5.91 | 6.56 | 7.63 | 16.75 | 63.52 | 1779 |
| 5 | 803 | 870 | 1104 | 1258 | 1440 | 1441 | 5.55 | 6.25 | 7.45 | 15.50 | 63.52 | 449 |
| 6 | 910 | 1001 | 1377 | 1623 | 1685 | 1686 | 6.05 | 6.72 | 7.81 | 16.66 | 74.68 | 551 |

Conclusion

In this article we compared measurements of the cold and warm start times of a Lambda function connecting to a DynamoDB database for 3 use cases:
  • without SnapStart enabled on the Lambda function
  • with SnapStart enabled on the Lambda function but without priming optimization
  • with SnapStart enabled on the Lambda function and with priming of the DynamoDB request
We saw that with the x86_64 architecture all cold and warm start times were lower compared to the arm64 architecture. But as **arm64 architecture Lambda pricing is 25% cheaper than x86_64 architecture**, this introduces a very interesting cost-performance trade-off.
For our measurements, for all 3 use cases:
  • Lambda cold start times with the arm64 architecture were, for many percentiles, only 10-20% (and only in very rare cases 25-27%) slower than with the x86_64 architecture.
  • Lambda warm start times with the arm64 architecture were, for many percentiles, only 5-10% slower than with the x86_64 architecture.
So the choice of the arm64 architecture is quite reasonable for this sample application. As SnapStart support for the arm64 architecture has only recently been introduced, I also expect some performance improvements in the future. Please do your own measurements for your use case!
In the next part of this series we'll do the same performance measurements, but with the Lambda memory set to different values between 256 and 2048 MB, and compare the results.
 
