The Computeless RAG Tool ⭐️

Hey folks! I’m excited to share a project that combines serverless and generative AI: the Computeless RAG tool. This tool taps into private data, like your company's internal databases, and leverages GenAI without requiring traditional AWS compute resources. Let’s dive into how this tool is built.

1. What is the Computeless RAG Tool? 🤷🏽‍♂️

The Computeless RAG tool uses AWS AppSync to manage GenAI-driven queries without traditional compute resources. Because AppSync's JavaScript resolvers orchestrate the RAG pipeline, there are no Lambda functions to write or run, which keeps the architecture simpler and can reduce both latency and cost.

Key components include:

Amazon Bedrock for foundational models.
Pinecone as the vector database.
AppSync JS resolvers as the orchestrator.

This integration demonstrates AppSync's potential to serve as the backbone of a serverless GenAI system.

2. Technological Overview 🛠️

Let's explore the key technologies behind the Computeless RAG tool:

2.1 AWS AppSync

AWS AppSync serves as the central orchestrator. Its JavaScript pipeline resolvers run the steps in order and carry context from one step to the next throughout the query and response process.

2.2 Pinecone

Pinecone serves as the vector database. It stores the internal data as vector embeddings and quickly retrieves the entries closest to a user query.

2.3 AWS Secrets Manager

AWS Secrets Manager is used to secure sensitive credentials like the Pinecone API key, crucial for interfacing with the vector database.

2.4 Amazon Bedrock

Amazon Bedrock provides the AI power in the setup, serving two main functions:

Amazon Titan Text Embeddings: This model transforms user queries into vector embeddings, crucial for querying the Pinecone database to fetch relevant data.
Anthropic Claude 3 Haiku: After data retrieval, this model processes the information to generate accurate and contextually appropriate responses to the queries, leveraging its advanced natural language processing capabilities.

2.5 Integration Flow

The integration of these technologies is orchestrated through several steps:

Secure API Key Retrieval: Retrieve the Pinecone API key from AWS Secrets Manager to ensure secure access to the database.
Embedding Generation: Convert the input text (a user query, or a piece of context to store) into a vector embedding using the Titan model.
Data Storage: When populating the database, insert the vector (and the corresponding text) into the Pinecone database for later retrieval.
Data Retrieval: When answering a question, use the query embedding to locate the most relevant vectors within the Pinecone database.
Response Generation: Employ Anthropic’s Claude 3 Haiku to formulate a final, contextual answer based on the retrieved data and the initial query.

3. Architecture and Workflow 🏗️

Let's now break down the architecture and workflow of the Computeless RAG tool, providing a detailed look at how each component interacts.

3.1 Architectural Overview

The architecture diagram illustrates the orchestration of services and the data flow in the tool. Each GraphQL query or mutation sent to the AppSync API triggers a JavaScript pipeline resolver. This resolver runs a sequence of functions that either call a service or data source, or transform data for the next step.

3.2 AppSync Pipeline Resolver Functions

AWS AppSync's pipeline resolvers are vital for complex data operations that require multiple steps. They enable you to define a sequence of function calls, each transforming the output and passing it to the next. This is ideal for workflows where data needs to be fetched, transformed, and used to generate responses, just like the Computeless RAG tool.

Why Use Pipeline Resolvers?

Sequential Logic Execution: Pipeline resolvers execute operations in sequence, where each step depends on the previous one, making them perfect for the use case.
Decoupling Logic and Data Sources: Each function in the pipeline can use different data sources or none at all, creating a cleaner architecture by separating data retrieval, processing, and response generation.
Efficiency and Performance: By handling data flow within AppSync, we reduce the need for external orchestration and cut down on latency from multiple network calls.
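To make the sequential model concrete, here is a small local simulation of how a pipeline hands data from one function to the next. Treat it as a sketch only: runPipeline is a hypothetical helper, not an AppSync API, and the two steps are stand-ins. In AppSync, the service itself invokes each function and provides ctx.stash and ctx.prev.result.

```javascript
// Hypothetical local stand-in for the AppSync pipeline runtime (illustration only).
// Every step receives the same ctx: ctx.stash persists across steps and
// ctx.prev.result holds whatever the previous step returned.
function runPipeline(steps, args) {
  const ctx = { arguments: args, stash: {}, prev: { result: undefined } };
  for (const step of steps) {
    ctx.prev = { result: step(ctx) };
  }
  return ctx;
}

// Stand-in steps: the first one stashes a value, the second one reads both
// the stash and the previous step's result.
const fetchSecret = (ctx) => {
  ctx.stash.apiKey = 'dummy-key';
  return 'secret loaded';
};
const useSecret = (ctx) =>
  `${ctx.prev.result}; key length ${ctx.stash.apiKey.length}`;

const finalCtx = runPipeline([fetchSecret, useSecret], { query: 'hello' });
console.log(finalCtx.prev.result); // secret loaded; key length 9
```

The point of the sketch is the data flow: nothing outside the pipeline has to carry state between steps.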

Each function in the pipeline lives in its own JS file in the api/resolvers/functions folder of the SAM project. This organization makes deployment and updates clear and manageable. Here’s how each function integrates into the pipeline:

3.2.1 Get Pinecone API Key Function (`getPineconeApiKeyFunction.js`)

This function initiates the pipeline by securely retrieving the Pinecone API key from AWS Secrets Manager using an HTTP data source. The key is stored in ctx.stash, making it accessible to subsequent functions.
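Here is a minimal sketch of what such a function could look like. It assumes the HTTP data source points at the regional Secrets Manager endpoint and signs requests with IAM; the stash field name pineconeApiKey is my assumption, and the code in the repo may differ. In a real AppSync file both handlers would be exported, and error handling is left out for brevity.

```javascript
// Sketch of an AppSync JS function on an HTTP data source for Secrets Manager.
// Assumption: the data source signs requests with IAM (SigV4).

// Build the GetSecretValue call for the secret created in the prerequisites.
function request(ctx) {
  return {
    method: 'POST',
    resourcePath: '/',
    params: {
      headers: {
        'content-type': 'application/x-amz-json-1.1',
        'x-amz-target': 'secretsmanager.GetSecretValue',
      },
      body: JSON.stringify({ SecretId: 'pineconeApiKey' }),
    },
  };
}

// Parse the HTTP response and stash the key for the next pipeline functions.
// The stash field name is an assumption made for this sketch.
function response(ctx) {
  const secret = JSON.parse(ctx.result.body);
  ctx.stash.pineconeApiKey = secret.SecretString;
  return true;
}
```

Keeping the key in ctx.stash rather than returning it means it never becomes part of the GraphQL response.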

3.2.2 Generate Embedding Function (`generateEmbeddingFunction.js`)

With the API key secured, this function generates vector embeddings from the user's query. It uses an HTTP data source to send the query to Amazon Bedrock, which leverages the Titan Text Embeddings model to transform the query into vector embeddings. This step is crucial for accurately matching the query with relevant data in Pinecone.
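As a rough sketch, the embedding step could look like the following. It assumes an HTTP data source for the Bedrock runtime endpoint with IAM signing, and it reads the text from a hypothetical argument called text, since the real argument name depends on the schema. Handlers would be exported in a real AppSync file.

```javascript
// Sketch of the embedding step (assumed HTTP data source: Bedrock runtime).

// Ask Titan Text Embeddings to embed the input text.
// `ctx.arguments.text` is a hypothetical argument name.
function request(ctx) {
  return {
    method: 'POST',
    resourcePath: '/model/amazon.titan-embed-text-v1/invoke',
    params: {
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({ inputText: ctx.arguments.text }),
    },
  };
}

// Titan returns the vector under `embedding`; returning it makes it
// available to the next function as ctx.prev.result.
function response(ctx) {
  return JSON.parse(ctx.result.body).embedding;
}
```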

3.2.3 Store Embedding Function (`storeEmbeddingFunction.js`)

This function is used to store an embedding in the Pinecone database. It uses an HTTP data source to execute this operation.
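A possible shape for this function is sketched below, using Pinecone's upsert endpoint on the index host. The stash field pineconeApiKey, the arguments id and text, and the choice to keep the source text in metadata.text are all assumptions of mine for illustration.

```javascript
// Sketch of the store step (assumed HTTP data source: the Pinecone index host).

// Upsert one vector. ctx.prev.result is assumed to hold the embedding
// returned by the previous function; `id` and `text` are hypothetical arguments.
function request(ctx) {
  return {
    method: 'POST',
    resourcePath: '/vectors/upsert',
    params: {
      headers: {
        'content-type': 'application/json',
        'Api-Key': ctx.stash.pineconeApiKey,
      },
      body: JSON.stringify({
        vectors: [
          {
            id: ctx.arguments.id,
            values: ctx.prev.result,
            // Keep the original text next to the vector so it can be
            // returned as context at query time.
            metadata: { text: ctx.arguments.text },
          },
        ],
      }),
    },
  };
}

// Pinecone reports how many vectors were written.
function response(ctx) {
  return JSON.parse(ctx.result.body).upsertedCount === 1;
}
```

Instead of taking the id as an argument, a real resolver could generate one with util.autoId() from @aws-appsync/utils.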

3.2.4 Search Embeddings Function (`searchEmbeddingsFunction.js`)

This function uses an embedding to search the Pinecone database for the most relevant entries. It uses an HTTP data source to perform the query, retrieving data that closely matches the user's initial inquiry based on the previously generated embeddings.
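The search step could be sketched like this, using Pinecone's query endpoint. The topK value of 3 and the metadata.text field are assumptions for illustration, as is the pineconeApiKey stash field.

```javascript
// Sketch of the search step (assumed HTTP data source: the Pinecone index host).

// Query the index with the embedding produced by the previous function.
function request(ctx) {
  return {
    method: 'POST',
    resourcePath: '/query',
    params: {
      headers: {
        'content-type': 'application/json',
        'Api-Key': ctx.stash.pineconeApiKey,
      },
      body: JSON.stringify({
        vector: ctx.prev.result,
        topK: 3, // assumed value; tune for your data
        includeMetadata: true, // needed to get the stored text back
      }),
    },
  };
}

// Return only the stored text of each match, best match first.
function response(ctx) {
  const matches = JSON.parse(ctx.result.body).matches || [];
  return matches.map((match) => match.metadata.text);
}
```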

3.2.5 Build Prompt Function (`buildPromptFunction.js`)

This function does not interact with any data sources and is responsible for assembling the final prompt. It combines the initial query, contextual data from Pinecone, and specific instructions for the language model to create a comprehensive input. This step is crucial for ensuring that the AI model generates relevant and accurate responses.
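To give an idea of what the prompt assembly could look like, here is a small sketch. The instruction wording, the buildPrompt helper, and the query argument name are mine, not necessarily what the repo uses. Since the function has no backing service, the sketch assumes a NONE data source, where the payload returned by the request handler comes back as ctx.result.

```javascript
// Hypothetical helper: combine instructions, retrieved passages and the question.
function buildPrompt(question, passages) {
  const context = passages.map((p, i) => `${i + 1}. ${p}`).join('\n');
  return [
    'Answer the question using only the context below.',
    'If the context does not contain the answer, say that you do not have the necessary information.',
    '',
    'Context:',
    context,
    '',
    `Question: ${question}`,
  ].join('\n');
}

// ctx.prev.result is assumed to be the list of passages from the search step;
// `ctx.arguments.query` is an assumed argument name.
function request(ctx) {
  return { payload: buildPrompt(ctx.arguments.query, ctx.prev.result) };
}

// With a NONE data source the payload is echoed back as ctx.result.
function response(ctx) {
  return ctx.result;
}
```

Telling the model explicitly what to do when the context is insufficient is what makes a "negative" question produce a refusal instead of a guess.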

3.2.6 Invoke Model Function (`invokeModelFunction.js`)

This function sends a prompt to Anthropic’s Claude 3 Haiku model via Amazon Bedrock using an HTTP data source. It passes the generated answer down the pipeline by storing it in ctx.stash.
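Below is a sketch of how this call could be written with the Anthropic Messages format on Bedrock. The max_tokens value and the stash field name answer are assumptions, and the prompt is assumed to arrive as ctx.prev.result from the previous function.

```javascript
// Sketch of the model invocation (assumed HTTP data source: Bedrock runtime).

// Send the assembled prompt to Claude 3 Haiku using the Messages format.
function request(ctx) {
  return {
    method: 'POST',
    resourcePath: '/model/anthropic.claude-3-haiku-20240307-v1:0/invoke',
    params: {
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({
        anthropic_version: 'bedrock-2023-05-31',
        max_tokens: 512, // assumed limit for this sketch
        messages: [
          { role: 'user', content: [{ type: 'text', text: ctx.prev.result }] },
        ],
      }),
    },
  };
}

// The generated text is in the first content block of the response.
// `answer` is an assumed stash field name.
function response(ctx) {
  const body = JSON.parse(ctx.result.body);
  ctx.stash.answer = body.content[0].text;
  return ctx.stash.answer;
}
```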

3.3 Pipeline Resolvers

3.3.1 `embedContext` Mutation

The embedContext mutation is how the Pinecone database gets populated with data. It converts the submitted text into an embedding and stores it in the vector store, making it available for retrieval by the rag query.

The mutation triggers a pipeline resolver composed of the following functions in sequence:

getPineconeApiKeyFunction
generateEmbeddingFunction
storeEmbeddingFunction

3.3.2 `rag` Query

The rag query is the core functionality of the Computeless RAG tool, designed to retrieve and generate responses based on user queries. This GraphQL query runs a semantic search on the Pinecone database based on the user query, then generates an answer using Claude 3 Haiku.

The query triggers a pipeline resolver composed of the following functions in sequence:

getPineconeApiKeyFunction
generateEmbeddingFunction
searchEmbeddingsFunction
buildPromptFunction
invokeModelFunction
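Besides the functions, a pipeline resolver has its own pair of handlers that run before the first function and after the last one. A minimal sketch for the rag resolver could look like this, assuming the last function left the generated text in a stash field called answer (a name I made up). The output field matches the QueryOutput type of the schema.

```javascript
// Sketch of the pipeline resolver's own handlers (exported in a real AppSync file).

// Runs before the first function; nothing to prepare in this sketch.
function request(ctx) {
  return {};
}

// Runs after the last function; shape the result as the QueryOutput type.
// `ctx.stash.answer` is an assumed field name.
function response(ctx) {
  return { output: ctx.stash.answer };
}
```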

4. SAM Project Structure 📝

The AWS Serverless Application Model (SAM) simplifies creating and deploying serverless applications on AWS. In the Computeless RAG tool project, the SAM template (template.yaml) and the GraphQL schema (schema.graphql) are key components defining the infrastructure and API interface.

4.1 Overview of `template.yaml`

The template.yaml file defines the resources necessary for deploying the Computeless RAG tool on AWS. Here’s a breakdown of the primary components:

AWS::Serverless::GraphQLApi: This resource creates the AppSync API. It uses the GraphQL schema provided in the schema.graphql file. The API is configured with API keys for authentication and connected to various data sources and resolvers for handling operations.
AWS::Serverless::GraphQLApi - Functions: Functions are defined for each step in the AppSync pipeline resolver, mapped to specific JavaScript files. Those functions have been covered in the AppSync Pipeline Resolver Functions section above.
AWS::IAM::Role: Defines the roles required for the AppSync data sources to interact with AWS services securely. Each role includes policies that grant necessary permissions for actions like retrieving secrets or invoking AI models.
AWS::AppSync::DataSource: The SAM template defines three AppSync HTTP data sources. They let the pipeline resolver functions make HTTP calls to Pinecone, Amazon Bedrock, and AWS Secrets Manager.

4.2 Understanding `schema.graphql`

The schema.graphql file defines the GraphQL schema used by the AppSync API. Here’s the structure:

This schema sets up a simple API with a single type of query (rag) that accepts a string and returns a QueryOutput type containing a string field output. This setup handles the Q&A functionality of the tool, allowing users to submit queries and receive text responses.

It also exposes a mutation embedContext that we will use to populate the Pinecone database.
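From a client's point of view, calling the rag query is a plain HTTPS POST to the AppSync endpoint. The sketch below builds such a request. The argument name query is an assumption (the schema only tells us that rag accepts a string), and buildRagRequest, apiUrl, and apiKey are placeholders of mine.

```javascript
// Hypothetical helper that builds the HTTP request for the `rag` query.
// The API key is sent in the x-api-key header, as AppSync expects for
// API key authentication.
function buildRagRequest(apiUrl, apiKey, question) {
  return {
    url: apiUrl,
    options: {
      method: 'POST',
      headers: { 'content-type': 'application/json', 'x-api-key': apiKey },
      body: JSON.stringify({
        // Assumed argument name: `query`.
        query: 'query Rag($query: String!) { rag(query: $query) { output } }',
        variables: { query: question },
      }),
    },
  };
}

// Usage against a deployed API (not executed here):
// const { url, options } = buildRagRequest(API_URL, API_KEY, 'your question');
// const { data } = await (await fetch(url, options)).json();
// console.log(data.rag.output);
```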

4.3 The AppSync resolver functions folder

As mentioned in section 3, the api/resolvers/functions folder contains the JS files for the AppSync functions that the pipeline resolvers execute whenever the rag query or the embedContext mutation is invoked.

4.4 Deploying the SAM project

Deploying this SAM project involves several steps streamlined by the SAM CLI. Here’s how to deploy the project:

Pre-Requisites:

Clone the project on your local computer
Install AWS CLI and configure it with your AWS account credentials.
Install the AWS SAM CLI.
Create a Pinecone account and a Pinecone index with 1536 dimensions (the size of the vectors generated by Amazon Titan Text Embeddings), then copy the Pinecone API key and the new index's host from your account.
Create a plaintext secret in AWS Secrets Manager. Name the secret pineconeApiKey and set its value to the API key copied from your Pinecone account.
Update the YOUR_PINECONE_INDEX_HOST in the template.yaml file with the value you copied from your Pinecone account.
Make sure to enable the following models on the Amazon Bedrock Console:
- amazon.titan-embed-text-v1
- anthropic.claude-3-haiku-20240307-v1:0

Build the Project:

Navigate to the project directory in your terminal.
Run the command: sam build. This command prepares the deployment by building any dependencies specified in the template.

Deploy the Project:

After building the project, deploy it by running: sam deploy --guided.
- The guided deployment process will prompt you to enter parameters such as the stack name, AWS region, and any parameters required by the template.
- Confirm the settings and proceed with the deployment. The CLI will handle the creation of all specified resources and provide you with an output that includes the URL of the deployed AppSync API.

5. Testing the Computeless RAG Tool 🧪

Now that the Computeless RAG Tool is successfully deployed, let's go ahead and test it.

5.1 Populating the Pinecone index

To test the tool, we first need some data in the Pinecone database. That is what the embedContext mutation we deployed is for. Let's add a few entries to the Pinecone index.

Navigate to the AWS AppSync console
Select the API you just deployed. It should be named ComputelessRagApi
Click on Queries
Paste the following in the query editor

Run all 9 mutations by clicking on the Run button and selecting the corresponding mutation

5.2 Asking questions

With "private" data stored in the Pinecone index, we can now query the tool.

Navigate to the AWS AppSync console
Select the API you just deployed. It should be named ComputelessRagApi
Click on Queries
Paste the following in the query editor

Run both queries by clicking on the Run button and selecting the corresponding query.
- With the negative query, the tool should mention it does not have the necessary information to answer your question.
- With the positive query, it should provide you with an answer incorporating information that you have inserted into your Pinecone index in the previous section.

5.3 Cleanup

In order to remove the AWS resources created in this tutorial, just run the following command: sam delete

6. Conclusion 🌅

Developing the Computeless RAG tool has showcased how seamlessly serverless architectures can integrate with generative AI to handle complex queries from private datasets, leveraging AWS AppSync, Amazon Bedrock, and Pinecone.

Potential Improvements:

Amazon Cognito: Use it for user authentication and authorization, so that users can only access data from their own company. Cognito integrates seamlessly with AWS AppSync.
AWS Secrets Manager: Each API request fetches secrets, which has two downsides:
1. Costs for secret retrieval with each request, balanced by savings from not using compute resources like Lambda functions.
2. Secrets may appear in plaintext in CloudWatch logs when logging is enabled, exposing the Pinecone API key. Amazon CloudWatch Logs data protection can mitigate this.
Data Ingestion with Bedrock Knowledge Bases: For the data ingestion part in the vector datastore, using Bedrock Knowledge Bases could be an effective option.
Data Retrieval with Bedrock Knowledge Bases: Instead of manually embedding the user query and running the vector search on Pinecone, the Retrieve API for Amazon Bedrock Knowledge Bases offers an "AWS managed" way of doing things. This approach combines the functionality of generateEmbeddingFunction and searchEmbeddingsFunction into a single function, which streamlines the pipeline.
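To illustrate the last idea, here is what a single retrieval function could look like with the Knowledge Bases Retrieve API. This is not part of the project: it assumes an HTTP data source for the Bedrock Agent Runtime endpoint with IAM signing, a placeholder knowledge base ID, an assumed argument name query, and an arbitrary numberOfResults.

```javascript
// Sketch only: one function replacing the embed + search steps.
// KNOWLEDGE_BASE_ID is a placeholder, not a real identifier.
const KNOWLEDGE_BASE_ID = 'YOUR_KNOWLEDGE_BASE_ID';

// Call the Retrieve API; the service embeds the query and searches for us.
function request(ctx) {
  return {
    method: 'POST',
    resourcePath: `/knowledgebases/${KNOWLEDGE_BASE_ID}/retrieve`,
    params: {
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({
        retrievalQuery: { text: ctx.arguments.query }, // assumed argument name
        retrievalConfiguration: {
          vectorSearchConfiguration: { numberOfResults: 3 }, // assumed value
        },
      }),
    },
  };
}

// Keep only the text of each retrieved chunk for the prompt-building step.
function response(ctx) {
  const results = JSON.parse(ctx.result.body).retrievalResults || [];
  return results.map((r) => r.content.text);
}
```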

Additional Considerations

I acknowledge that other solutions are possible, and my point here is to present AppSync as one of the options among many, offering the luxury of choice.

Step Functions: Step Functions can also avoid compute and orchestrate your RAG pipeline, but with a different cost structure. You pay for executions and GB-seconds, which can add up depending on your use case. Compute time adds to the overall cost and network hops can introduce some additional latency.
Lambda with Bedrock: Using AWS Lambda functions with Amazon Bedrock can handle complex queries and integrate with various data sources. This approach involves costs associated with Lambda invocations and execution time, potentially adding latency.
Lambda with Bedrock Knowledge Bases: Combining Lambda with Bedrock Knowledge Bases enhances capabilities by providing managed knowledge bases. This approach also involves costs associated with Lambda invocations and execution time, potentially adding latency.

Regardless of the chosen method, using Bedrock incurs a cost. The key difference is that with Lambda or Step Functions, compute time is an additional cost. The advantage of AppSync for this use-case is that it offers 30 seconds of "free" compute time to connect to data sources and execute the linear RAG logic, potentially offering cost savings compared to other options.

In summary, I am not trying to convince you that AppSync is the go-to solution for all cases. Rather, I aim to share that AppSync is a lesser-known yet viable option for building serverless, GenAI-driven applications. This adds to the range of choices available to developers, enabling them to select the best tool for their specific needs.

Thank you for joining me on this exploration of the Computeless RAG Tool! I hope you enjoyed reading this as much as I enjoyed building it. Find the code in the following repo.

