Why Async Lambda with AWS AppSync?

Let's chat about the new feature for AWS AppSync and why it's awesome for generative AI applications.

Derek Bingham
Amazon Employee
Published Jun 12, 2024
Last Modified Jun 14, 2024

TL;DR

AWS AppSync's latest addition of asynchronous Lambda function invocations is an awesome feature for handling long-running tasks, particularly in generative AI applications. It allows API requests to return immediately while Lambda functions execute in the background, significantly improving scalability and responsiveness. It is particularly useful when making requests to an LLM that, in some instances, may take some time to respond (depending on the model used).

Introduction

When building modern applications, we developers are always on the lookout for tools and patterns that help us build the best application we can, with the least amount of effort and, preferably, the least code! (Come on, we all know more code = more bugs, right?)
GraphQL with AWS AppSync has always been a great way of integrating and building with an LLM. As a data aggregator, AWS AppSync makes building more complex GenAI application patterns, like Retrieval-Augmented Generation (RAG), much easier. AWS AppSync also has WebSocket support built in, out of the box, via subscriptions, which makes streaming responses from LLMs a much simpler task.
Previously, AWS AppSync only supported synchronous Lambda invocations, which posed challenges for long-running operations due to the 30-second timeout limit. Developers had to implement workarounds, such as offloading tasks to other services or to other functions via SQS, or settling for a direct synchronous invocation. The new asynchronous invocation eliminates these additional steps, enabling a more seamless execution of long-running tasks. Let's take a look at simple architecture flow diagrams of the before and after this feature became available.

Architecture

Before (offloading)

Before asynchronous invocations
As you can see above, a lot of extra services are involved when calling, in this case, Amazon Bedrock. The client makes a query against the GraphQL API hosted on AWS AppSync. This query synchronously invokes a Lambda function, which has to put the contents of the query on an SQS queue to avoid hitting a timeout. The message is then picked up and processed by a second Lambda function, which feeds the prompt into Amazon Bedrock. Bedrock uses the specified LLM to send response tokens in batches back to that Lambda, which invokes a mutation on the AWS AppSync GraphQL API to deliver those tokens to the user via WebSockets.
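To make the offloading step concrete, here's a minimal sketch of what the first Lambda in this "before" pattern might do. The event shape follows the standard AppSync Lambda resolver event, but the queue URL and message fields are illustrative assumptions, and the actual SQS send call is only indicated in comments:

```javascript
// Sketch of the "before" pattern: the first Lambda, invoked synchronously
// by AppSync, does no real work itself. It only packages the GraphQL
// arguments as an SQS message so that a second Lambda can make the slow
// Bedrock call without hitting AppSync's 30-second timeout.

// Pure helper: build the SQS SendMessage parameters from the AppSync event.
// (Field names inside MessageBody are hypothetical.)
function buildQueueMessage(event, queueUrl) {
  return {
    QueueUrl: queueUrl,
    MessageBody: JSON.stringify({
      // GraphQL field arguments forwarded for the worker Lambda.
      prompt: event.arguments.prompt,
    }),
  };
}

// In the real handler the message would then be sent with the AWS SDK v3:
//   const params = buildQueueMessage(event, process.env.QUEUE_URL);
//   await sqsClient.send(new SendMessageCommand(params));
//   return 'QUEUED'; // all AppSync gets back from the synchronous call
```

Note how the synchronous Lambda can only acknowledge that the work was queued; the actual result has to travel back through the second Lambda and a mutation.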

After (asynchronous invocation)

After asynchronous invocations
Now things have changed dramatically, as we can invoke the Lambda function asynchronously from AWS AppSync. There are far fewer moving pieces and everything is a lot simpler: the called Lambda function responds immediately to AWS AppSync, then continues working by sending the prompt to Amazon Bedrock and subsequently invoking a mutation on AWS AppSync with the response tokens from Bedrock, which are then sent to the user via WebSockets.
So, in scenarios where AI model inference or complex data processing takes longer, the API can respond immediately, and the results can be delivered later via AppSync WebSockets once processing is complete. This approach not only enhances the user experience but also allows for better resource management and error handling through Lambda's built-in retry mechanisms and failure destinations, like a dead-letter queue (DLQ).
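The call-back step in this flow can be sketched as follows. The background Lambda pushes each batch of tokens by calling a mutation on the AppSync API over HTTPS; here the sendChunk mutation, its fields, and the environment variables are assumptions for illustration, and the HTTP call itself is only shown in comments:

```javascript
// Sketch: the background Lambda publishes each batch of Bedrock tokens to
// subscribers by calling a mutation on the AppSync GraphQL API.
// The mutation name and fields below are hypothetical.
const SEND_CHUNK_MUTATION = `
  mutation SendChunk($conversationId: ID!, $text: String!) {
    sendChunk(conversationId: $conversationId, text: $text) {
      conversationId
      text
    }
  }
`;

// Pure helper: build the JSON body AppSync expects for a GraphQL request.
function buildMutationRequest(conversationId, text) {
  return {
    query: SEND_CHUNK_MUTATION,
    variables: { conversationId, text },
  };
}

// For each batch of tokens, the handler would POST this body to the
// AppSync endpoint, e.g.:
//   await fetch(process.env.APPSYNC_URL, {
//     method: 'POST',
//     headers: { 'Content-Type': 'application/json', 'x-api-key': apiKey },
//     body: JSON.stringify(buildMutationRequest(id, tokenBatch)),
//   });
```

Because the mutation fires once per token batch, subscribers connected over WebSockets see the response stream in near real time rather than waiting for the full completion.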

Implementation

To implement an asynchronous invocation of an AWS Lambda function from AppSync, we simply change the resolver's call to the function (note: this works the same for both VTL and JavaScript resolver types). We use an invocationType of Event, which enables the resolver to call the Lambda function asynchronously.
Example JavaScript resolver
So, as you can see above in the JavaScript resolver's request function, all we do is specify the invocationType attribute as Event. Also note that the response function of the resolver can return a static response to indicate to the caller that AppSync has received the request for processing; in this case it's 'OK'.
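For reference, a minimal JavaScript resolver along these lines might look like the following (the payload fields are illustrative, not prescriptive):

```javascript
// AppSync JavaScript resolver sketch: invoke a Lambda data source
// asynchronously by setting invocationType to 'Event'.
export function request(ctx) {
  return {
    operation: 'Invoke',
    // 'Event' = fire-and-forget: AppSync does not wait for the function result.
    invocationType: 'Event',
    payload: {
      field: ctx.info.fieldName,
      arguments: ctx.arguments,
    },
  };
}

export function response(ctx) {
  // Return a static acknowledgement immediately; the real result arrives
  // later via a subscription once the Lambda finishes its work.
  return 'OK';
}
```

With a synchronous resolver the only change would be omitting invocationType (it defaults to RequestResponse), which is what makes this such a small migration.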

Notes and Limitations

With these new asynchronous Lambda invocations from AppSync, there are a few things worth noting before you go off and change all your AppSync resolvers :). The final response payload size for subscriptions cannot exceed 240 KB, which means careful design is needed if your use case involves large data processing tasks.
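One simple way to work within that limit is to split a large response into pieces and publish each piece through its own mutation. A hedged sketch (the chunk size is a conservative assumption leaving headroom for envelope overhead and multi-byte characters, not an exact calculation against the 240 KB limit):

```javascript
// Sketch: split a large LLM response into pieces that stay safely under
// AppSync's 240 KB subscription payload limit, so each piece can be
// published via its own mutation.
const MAX_CHUNK_CHARS = 50_000; // conservative; well under 240 KB as UTF-8

function chunkResponse(text, maxChars = MAX_CHUNK_CHARS) {
  const chunks = [];
  for (let i = 0; i < text.length; i += maxChars) {
    chunks.push(text.slice(i, i + maxChars));
  }
  return chunks;
}

// Each chunk would then be sent in order, e.g.:
//   for (const piece of chunkResponse(fullText)) {
//     await publishViaMutation(conversationId, piece); // hypothetical helper
//   }
```

The subscriber can reassemble the pieces client-side, which also happens to match the token-streaming model most LLM UIs already use.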
However, despite these constraints, it's fair to say that the asynchronous Lambda resolver capability in AppSync is a significant improvement for developers, facilitating more efficient handling of long-running operations and enhancing the robustness of serverless applications.

In Summary

Key Benefits:

  1. Improved Scalability and Responsiveness: Immediate API responses while processing tasks in the background.
  2. Simplified Architecture: Reduces the need for additional steps to handle long-running tasks.
  3. Enhanced Error Handling: Built-in retry mechanisms and failure destinations improve reliability.
  4. Use Case Versatility: Ideal for generative AI, data processing, and other long-running operations.

Things to remember:

  1. Payload Size Limitations: 240 KB for subscription responses.
  2. Design Adjustments: Necessary for managing large payloads within the provided limits.

Conclusion

Thanks for reading, and I hope this article has made you think more about using AWS AppSync when building applications that harness GenAI and Amazon Bedrock.

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
