AWS Bedrock Generative AI Application Architecture

Building an enterprise-level application using Large Language Models (LLMs) with Amazon Bedrock

Published Apr 14, 2024

ABSTRACT

The motivation of this article is to present a high-level solution architecture that can be used for Questions and Answers and/or conversational AI tasks. It illustrates how each of the selected AWS services provides authentication, compute, storage, and analytics to build an enterprise-level application using Large Language Models (LLMs) with Amazon Bedrock.

ARCHITECTURAL SOLUTION

[Figure: AWS Bedrock Generative AI Application Architecture diagram]
The sample architecture allows the LLM-powered application to perform the Questions and Answers task with Retrieval Augmented Generation (RAG) using LangChain.
The AWS services listed below, each with a brief description, summarise the corresponding areas of work in this architectural diagram.
1. Amazon Athena is used to query the Questions & Answers logs to perform the required data analytics (e.g. accuracy of query responses); a query sketch appears after this list.
2. Amazon S3 stores the structured and unstructured data: the Questions & Answers logs and the private documents used to improve context relevancy.
3. Amazon DynamoDB provides the serverless NoSQL database that stores the history of the Questions & Answers (a storage sketch appears after this list).
4. Amazon Bedrock provides a managed service with API access to the foundation models (Cohere Embed and Claude 3 Haiku). Claude 3 Haiku is used for multilingual query responses and Cohere Embed is used for vector embeddings (an invocation sketch appears after this list).
5. Amazon Cognito authenticates the users for access to the Questions and Answers application.
6. Amazon API Gateway with AWS Lambda implements the fully managed backend API endpoint: Amazon API Gateway secures and monitors the APIs, while AWS Lambda responds to events and automatically manages the compute resources (a handler sketch appears after this list).
7. Amazon Aurora PostgreSQL provides a scalable vector store via the pgvector extension to hold the embeddings. These embeddings are used in RAG to improve context relevancy and accuracy (a similarity-search sketch appears after this list).
8. Amazon Elastic Container Service (ECS) performs the optional web crawling, data parsing (depending on document type support) and embedding tasks, together with hybrid search (semantic search combined with filtering and keyword search) to find the similar documents used in reconstructing the LLM prompts.
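
For item 1, a minimal sketch of an accuracy query over the Q&A logs using boto3 and Athena. The database name (genai_app_db), the qa_logs table and its columns, and the S3 output location are all illustrative assumptions:

```python
import time
import boto3

athena = boto3.client("athena")

# Hypothetical query: per-intent answer accuracy derived from user feedback.
query = """
    SELECT intent,
           COUNT(*) AS total_queries,
           AVG(CASE WHEN feedback = 'thumbs_up' THEN 1.0 ELSE 0.0 END) AS accuracy
    FROM qa_logs
    GROUP BY intent
"""

execution = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "genai_app_db"},  # assumed database name
    ResultConfiguration={"OutputLocation": "s3://qa-logs-bucket/athena-results/"},  # assumed bucket
)

# Poll until the query completes, then print the result rows.
query_id = execution["QueryExecutionId"]
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    for row in rows:
        print([col.get("VarCharValue") for col in row["Data"]])
```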
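
For item 3, a minimal sketch of writing and reading the conversation history, assuming a hypothetical qa_history table keyed by session_id (partition key) and timestamp (sort key):

```python
import time
import uuid
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("qa_history")  # assumed table name

def save_turn(session_id: str, question: str, answer_text: str) -> None:
    """Persist one question/answer turn of a conversation."""
    table.put_item(
        Item={
            "session_id": session_id,
            "timestamp": int(time.time() * 1000),
            "turn_id": str(uuid.uuid4()),
            "question": question,
            "answer": answer_text,
        }
    )

def load_history(session_id: str) -> list:
    """Return the turns of a conversation, oldest first."""
    response = table.query(
        KeyConditionExpression=Key("session_id").eq(session_id),
        ScanIndexForward=True,  # ascending by sort key (timestamp)
    )
    return response["Items"]
```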
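
For item 4, a minimal sketch of invoking both foundation models through the Bedrock runtime API. The model IDs are the public Bedrock identifiers; max_tokens and the choice of the multilingual Embed variant are assumptions:

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def embed(texts: list[str]) -> list[list[float]]:
    """Create vector embeddings with Cohere Embed."""
    response = bedrock.invoke_model(
        modelId="cohere.embed-multilingual-v3",
        body=json.dumps({"texts": texts, "input_type": "search_query"}),
    )
    return json.loads(response["body"].read())["embeddings"]

def generate(prompt: str) -> str:
    """Generate a (multilingual) response with Claude 3 Haiku."""
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 512,  # assumed limit
            "messages": [{"role": "user", "content": prompt}],
        }),
    )
    return json.loads(response["body"].read())["content"][0]["text"]
```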
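
For item 6, a minimal sketch of the Lambda handler behind the API Gateway endpoint (the Cognito authorizer is configured on the API Gateway side); answer() is the hypothetical pipeline function sketched in the LLM PIPELINE section below:

```python
import json

def lambda_handler(event, context):
    """Handle a POST from API Gateway carrying the user's question."""
    body = json.loads(event.get("body") or "{}")
    question = body.get("question", "")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"answer": answer(question)}),  # hypothetical pipeline function
    }
```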
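
For item 7, a sketch of a cosine-similarity lookup against Aurora PostgreSQL with pgvector; the connection details and the documents table schema (CREATE EXTENSION vector; a text column plus an embedding vector column) are assumptions:

```python
import psycopg2

# Assumed Aurora endpoint and credentials.
conn = psycopg2.connect(
    host="my-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com",
    dbname="vectordb", user="app_user", password="***",
)

def top_k_similar(query_embedding: list[float], k: int = 4) -> list[str]:
    """Return the k document chunks closest to the query embedding.

    pgvector's <=> operator computes cosine distance, so ascending
    order returns the most similar documents first.
    """
    vector_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"
    with conn.cursor() as cur:
        cur.execute(
            "SELECT content FROM documents "
            "ORDER BY embedding <=> %s::vector LIMIT %s",
            (vector_literal, k),
        )
        return [row[0] for row in cur.fetchall()]
```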

LLM PIPELINE

The high-level task pipeline for the LLM-powered application would be as follows (a code sketch tying the steps together appears after the list):
  1. Generate embeddings (Cohere Embed) for query matching, and classify the query to identify its intent.
  2. Use RAG to retrieve documents similar to the query (based on cosine similarity) from the vector database.
  3. Augment the user query with the similar documents.
  4. Rewrite the query and construct the LLM prompt for the foundation model (Claude 3 Haiku).
  5. Apply guardrails to the returned LLM response.
  6. Format the answer and respond to the user.
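
A minimal sketch tying the six steps together, reusing the hypothetical embed, top_k_similar and generate helpers from the architecture section above. Query classification is omitted and the guardrail in step 5 is a placeholder; a production system could use Guardrails for Amazon Bedrock instead:

```python
def answer(user_query: str) -> str:
    # 1. Generate the query embedding (Cohere Embed); query classification
    #    is omitted in this sketch.
    query_vec = embed([user_query])[0]

    # 2./3. Retrieve similar documents from the vector database (cosine
    #       similarity) and augment the user query with them.
    context = "\n\n".join(top_k_similar(query_vec, k=4))

    # 4. Rewrite the query into a grounded prompt for Claude 3 Haiku.
    prompt = (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {user_query}"
    )
    raw_answer = generate(prompt)

    # 5. Guard-rail the returned response (placeholder check only).
    if not raw_answer.strip():
        return "Sorry, no grounded answer was found in the documents."

    # 6. Format and return the answer to the user.
    return raw_answer.strip()
```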
In the model evaluation, we could prepare a simple evaluation table (with 👍, 👎) to measure the relevancy and accuracy of the application's responses to user queries; an illustrative layout follows. This table could also serve as a source for Reinforcement Learning from Human Feedback (RLHF), which can be used to fine-tune the foundation model (Claude 3 Haiku in this example). Moreover, while building the solution, we could use the Amazon Bedrock Chat playground to compare prompt outputs with the application's responses to the same user queries.
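
An illustrative layout for such an evaluation table (the columns and the sample row are hypothetical):

  User Query                   | Application Response                      | Rating | Notes
  "What is the refund policy?" | "Refunds are processed within 14 days..." | 👍     | grounded in the source document
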
[Figure: Amazon Bedrock - Claude 3 Haiku with LLM parameter settings]
