Build Retrieval Augmented Generation (RAG) based Generative AI (GenAI) application with Amazon Bedrock
Quick GenAI RAG-based application prototype using Large Language Models (LLM) with Amazon Bedrock
Published May 19, 2024
The main objective of this article is to share a quick and easy way to prototype a Large Language Model (LLM) application with Retrieval Augmented Generation (RAG) using Amazon Bedrock. We use information from a few HTML documents with general descriptions of the Transformer model architecture and the Attention mechanism. These documents are embedded and form our knowledge base on the subject. This approach can easily be adapted to use different models with advanced RAG to evaluate LLM responses for the required tasks.
The following steps, and the libraries they rely on, make up the LLM RAG application prototype; illustrative code sketches for each step follow the list.
- Create a boto3 client to connect programmatically and make inference requests to Foundation Models (FM) hosted in Amazon Bedrock (e.g. Cohere Command R).
- Load and chunk the HTML documents with the `unstructured` library.
- Embed the document chunks in batches using the Cohere Embed (English) model hosted in Amazon Bedrock.
- Use the `hnswlib` package to index the document chunk embeddings. This ensures efficient similarity search during retrieval. For simplicity, we use `hnswlib` as the vector library for our knowledge base.
- The chatbot decides whether it needs to consult external information from the knowledge base before responding. If so, it determines an optimal set of search queries to use for document retrieval.
- The document search is performed by the `knn_query()` method from the `hnswlib` library. Given a user query message, it returns the document chunks that are most similar to the query. We can define the number of document chunks to return with the `retrieve_top_k` parameter. If there are matching documents, the retrieved document chunks are passed as documents in a new query message sent to the FM (Cohere Command R+).
- Display the external information from RAG with citations and the retrieved document chunks, based on the `retrieve_top_k` parameter.
- The chat history is updated for the next user query.
- Use an alternative Foundation Model (FM) hosted in Amazon Bedrock (e.g. Claude 3 Sonnet) to handle general questions whenever the Cohere models do not know the answers. LangChain `LLMChain` and `ConversationBufferMemory` are used to establish the conversation and store the chat history.
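To make these steps concrete, the sketches below show one possible implementation. They are illustrative, not definitive: model IDs, file names, helper names, and parameter values are assumptions to adapt to your own account and region. First, the boto3 client for Bedrock inference requests:

```python
import boto3

# "bedrock-runtime" is the service endpoint for model inference requests;
# the region is an assumption, pick one where the models are enabled.
bedrock_runtime = boto3.client(
    service_name="bedrock-runtime",
    region_name="us-east-1",
)
```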
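Next, loading and chunking the HTML documents with the `unstructured` library, assuming the pages have been saved locally (the file names are placeholders):

```python
from unstructured.partition.html import partition_html
from unstructured.chunking.title import chunk_by_title

# Placeholder file names for the saved HTML source documents.
sources = ["transformer_architecture.html", "attention_mechanism.html"]

chunks = []
for path in sources:
    elements = partition_html(filename=path)  # parse the HTML into document elements
    chunks.extend(str(chunk) for chunk in chunk_by_title(elements))  # group into chunks
```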
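Embedding the chunks in batches with the Cohere Embed (English) model could look like the following; the batch size of 96 reflects Cohere's per-request limit on the number of texts:

```python
import json

def embed_texts(texts, input_type="search_document", batch_size=96):
    """Embed texts in batches with Cohere Embed (English) on Bedrock."""
    embeddings = []
    for i in range(0, len(texts), batch_size):
        body = json.dumps({
            "texts": texts[i:i + batch_size],
            # "search_document" for indexing chunks, "search_query" for user queries
            "input_type": input_type,
        })
        response = bedrock_runtime.invoke_model(
            modelId="cohere.embed-english-v3",
            body=body,
            accept="application/json",
            contentType="application/json",
        )
        embeddings.extend(json.loads(response["body"].read())["embeddings"])
    return embeddings

doc_embeddings = embed_texts(chunks)
```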
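Indexing the chunk embeddings with `hnswlib`; the HNSW parameters below (`ef_construction`, `M`, `ef`) are reasonable starting values, not tuned ones:

```python
import hnswlib
import numpy as np

dim = len(doc_embeddings[0])  # embedding dimension reported by the model

# Build an approximate nearest-neighbour index over the chunk embeddings
# using inner-product similarity.
index = hnswlib.Index(space="ip", dim=dim)
index.init_index(max_elements=len(doc_embeddings), ef_construction=200, M=16)
index.add_items(np.asarray(doc_embeddings), list(range(len(doc_embeddings))))
index.set_ef(50)  # query-time accuracy/speed trade-off; must be >= k
```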
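Retrieval and grounded generation in one sketch: `knn_query()` returns the `retrieve_top_k` most similar chunks, which are passed in the `documents` field of a new request to Command R+. The `title`/`snippet` fields and the helper names (`retrieve`, `answer_with_rag`) are assumptions for illustration. (For the query-decision step, the Bedrock Cohere API also accepts a `search_queries_only` flag that returns only the generated search queries.)

```python
retrieve_top_k = 3  # how many document chunks to retrieve per query

def retrieve(query):
    """Return the retrieve_top_k chunks most similar to the query."""
    query_emb = embed_texts([query], input_type="search_query")
    labels, _ = index.knn_query(np.asarray(query_emb), k=retrieve_top_k)
    return [{"title": f"chunk-{i}", "snippet": chunks[i]} for i in labels[0]]

def answer_with_rag(message, chat_history=None):
    """Send the user message plus retrieved documents to Command R+."""
    body = json.dumps({
        "message": message,
        "chat_history": chat_history or [],  # [{"role": "USER"/"CHATBOT", "message": ...}]
        "documents": retrieve(message),      # grounding documents for the response
    })
    response = bedrock_runtime.invoke_model(
        modelId="cohere.command-r-plus-v1:0",
        body=body,
        accept="application/json",
        contentType="application/json",
    )
    result = json.loads(response["body"].read())
    # "citations" maps spans of the answer back to the retrieved chunks.
    return result["text"], result.get("citations", [])
```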
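A sketch of the fallback path with LangChain, wiring `LLMChain` and `ConversationBufferMemory` to Claude 3 Sonnet; the prompt template is a placeholder:

```python
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate
from langchain_community.chat_models import BedrockChat

# Claude 3 Sonnet handles general questions the Cohere models cannot answer.
claude = BedrockChat(
    client=bedrock_runtime,
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
)

prompt = PromptTemplate(
    input_variables=["history", "question"],
    template="Previous conversation:\n{history}\n\nQuestion: {question}\nAnswer:",
)

# ConversationBufferMemory injects the stored chat history into {history}.
fallback_chain = LLMChain(
    llm=claude,
    prompt=prompt,
    memory=ConversationBufferMemory(memory_key="history"),
)

reply = fallback_chain.predict(question="What is self-attention?")
```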
Finally, Streamlit is used to build the user interface for the LLM chatbot with RAG.
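A minimal Streamlit sketch for the chat UI, reusing the hypothetical `answer_with_rag()` helper from the retrieval sketch:

```python
import streamlit as st

st.title("RAG Chatbot with Amazon Bedrock")

# Streamlit reruns the script on every interaction, so keep the
# conversation in session state.
if "messages" not in st.session_state:
    st.session_state.messages = []

for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

if user_query := st.chat_input("Ask about Transformers or Attention"):
    st.session_state.messages.append({"role": "user", "content": user_query})
    with st.chat_message("user"):
        st.markdown(user_query)

    answer, citations = answer_with_rag(user_query)
    st.session_state.messages.append({"role": "assistant", "content": answer})
    with st.chat_message("assistant"):
        st.markdown(answer)
```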